The LEADS Database at ICPSR: Identifying Important Social Science Studies for Archiving
Presentation prepared for the 2006 Annual Meeting of IASSIST
Friday, May 26, 2006
LEADS at ICPSR
We would like to know the “universe” of social science data that have been collected
Identification
Data-PASS and ICPSR would like to know how much social science data are “at risk” of being lost or have already been lost
Appraisal
We would also like to know which “at risk” social science data are important enough to be archived
What is LEADS?
LEADS is a database of records containing information about scientific studies that may have produced social science data
LEADS contains descriptive information about various scientific studies that have been identified.
LEADS also contains information that can be used to determine the “fit” and “value” of a scientific study
LEADS keeps a record of all human (staff) decisions that have been made about the fit and value of a scientific study.
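As a rough illustration of the kind of record described above, here is a minimal Python sketch. The class names, field names, and methods (`LeadRecord`, `Decision`, `record_decision`, `latest`) are hypothetical and are not taken from the actual LEADS schema; the sketch only shows one way to keep descriptive information together with an append-only history of staff decisions about fit and value.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Decision:
    """One staff judgment about a study (hypothetical fields)."""
    staff: str
    decided_on: date
    aspect: str  # "fit" or "value"
    note: str


@dataclass
class LeadRecord:
    """A study that may have produced social science data (hypothetical schema)."""
    title: str
    source: str  # e.g. "NSF", "NIH", "nomination"
    abstract: Optional[str] = None
    decisions: List[Decision] = field(default_factory=list)

    def record_decision(self, decision: Decision) -> None:
        # Decisions are appended, never overwritten, so the full history is kept.
        self.decisions.append(decision)

    def latest(self, aspect: str) -> Optional[Decision]:
        # Most recently recorded decision for the given aspect, if any.
        matches = [d for d in self.decisions if d.aspect == aspect]
        return matches[-1] if matches else None
```

Keeping decisions in a list rather than a single field is one simple way to preserve every human judgment instead of only the most recent one.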
Sources of Records in LEADS
NSF research grant awards downloaded from nsf.gov
NIH research grant awards downloaded from CRISP
Prospective searches of topical areas/journals
Researcher nominations (self or other)
NSF Grant Awards in LEADS (pre-screening)
LEADS contains 17,194 awards made by NSF
LEADS spans 30 years of awards - 1976 to 2005
LEADS spans 53 NSF organizations that award grants
Of the 53 organizations, the 4 organizations with the most records screened (each contributing 1,000+ records) were:
SES: Social and Economic Sciences
BCS: Behavioral and Cognitive Sciences
DMS: Mathematical Sciences
IOB: Integrative and Organism Biology
Total # NSF Grant Awards by Year
[Figure: NSF grants reviewed by ICPSR (y-axis, 0 to 900) by start year (x-axis, 1970 to 2010)]
Screening Criteria
• Social science and/or behavioral science
• Original or primary data collection proposed, including assembling a database from existing (archival) sources
Activity in NSF Grants (n=17,194)
Type of Activity Proposed                  %       N
Not Social Science                      47.9   8,237
Training/Workshop/Conf.                  3.8     656
Social Science:
  Primary Data Collection               13.6   2,336
  Secondary Analysis                     2.1     363
  Primary & Secondary (combination)      1.2     201
  No Data Collection or Analysis         2.9     505
No Abstract                             15.5   2,664
Flagged & Other                         13.0   2,232
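The percentages in the activity table can be reproduced from the counts. The short Python sketch below copies the counts from the table, confirms that they sum to the 17,194 screened awards, and recomputes each share rounded to one decimal; the variable names are illustrative only.

```python
# Counts copied from the "Activity in NSF Grants" table.
counts = {
    "Not Social Science": 8237,
    "Training/Workshop/Conf.": 656,
    "Primary Data Collection": 2336,
    "Secondary Analysis": 363,
    "Primary & Secondary (combination)": 201,
    "No Data Collection or Analysis": 505,
    "No Abstract": 2664,
    "Flagged & Other": 2232,
}

# Total number of screened awards implied by the counts.
total = sum(counts.values())

# Share of all screened awards, rounded to one decimal as in the table.
percents = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<36}{percents[name]:>6.1f}{n:>8,}")
```

Taken together, the primary and combination rows (2,336 + 201 = 2,537 awards) form the pool used later for follow-up on archiving prospects.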
Types of Research Activity NSF has Awarded by Year
[Figure: number of NSF awards (y-axis, 0 to 250) by year (x-axis, 1985 to 2010). Series: # NSF awards proposing new social science data collection (incl. combo); # NSF awards proposing social science secondary analyses]
***Abstracts become widely available 1987+***
Most Prevalent Social Science Primary Data Collection Awards by NSF Organization
NSF Organization                        % of total for division   n of awards
Antarctic Sciences Div                                     51.1            38
Arctic Science Div                                         38.4            81
Behavioral & Cognitive Sciences                            19.2           391
Social & Economic Sciences                                 29.7         1,847
Research, Evaluation & Communication                       40.0            10
Information & Intelligent Systems                          11.9            97
Other Fields Coded During Screening
• Topic/Discipline
• Data Collection Methodology
• Sampling Characteristics
Topic/Discipline in NSF Awards for Primary Social Science Data Collection
[Figure: bar chart of # of NSF awards (y-axis) by topic/discipline (x-axis). Categories: Aging, Health, Child Care, Education, Criminal Justice, Demography, Political Science, Economics, Psychology. Bar labels as extracted: 1034, 55, 119, 317, 192, 316, 342, 430]
An additional 1,594 records coded “General Social Science”
Type of Data Collection Method/Design in NSF Awards for Primary Social Science Data Collection
[Figure: bar chart of # of NSF awards (y-axis) by method/design (x-axis). Categories: Survey, Experimental, Evaluation, Historical/Archival, Interview (?), Field/Observational, Content Analysis, Focus Groups, Other Method, Not Specified. Bar labels as extracted: 408, 493, 35, 410, 498, 304, 112, 34, 351, 357]
NSF Awards for Social Science Primary Data: Proposed Sampling Method
Sampling Method                      Percent of Total       N
Probability Sample Proposed                       5.3     123
Non-Probability Sample Proposed                   1.0      23
Not Specified/Missing                            93.7   2,190
NSF Awards for Social Science Primary Data: Type of Sampling Frame Proposed
Sampling Frame                       Percent of Total       N
U.S. - National                                  17.5     408
U.S. - Regional                                  14.4     336
International – Including U.S.                    9.2     214
International – Excluding U.S.                   16.5     385
Not Specified/Missing                            42.5     993
NSF Awards for Social Science Primary Data: Proposed Sample Size
Sample Size                          Percent       N
1,000+                                   5.2     122
100-999                                  6.5     151
< 100                                    3.0      69
Not Specified/Missing                   85.3   1,994
NSF Awards for Social Science Primary Data: Race/Ethnic Distribution of Sample
Race/Ethnic Distribution             Percent       N
Multiple Races                           6.6     154
Single Race Study                        4.8     113
Not Specified/Missing                   88.6   2,069
Any Whites                               2.1      50
Any African Americans                    4.2      97
Any Latinos                              3.0      70
Any Asians                               3.9      90
Any Other Non-Whites                     1.7      39
Following-Up: Prospects for Data Archiving
• N=2,336 Primary Social Science Data Collection Awards
• N=201 Combined Data Collection Activity and Secondary Data, Social Science
Research Steps:
• Select ~10-20 records per week
• Generate updated contact information for PI
• Determine if “obviously” archived already (ICPSR, Roper, Odum, Murray, Sociometrics, Google)
• Review related citations
• Review other NSF awards made to PI
• Contact PI (Data Produced? Data Archived? Data Still Available?)
Other Qualitative Fields in LEADS
• Description of how the collection fits within the scope of important social science studies
• Description of the value of the study for archiving
• Priority ranking
• Citations
• PI communication
NSF Active/Closed LEADS (n=97)
[Figure: bar chart of record counts by status (y-axis, 0 to 50): Research, 47; Email PI, 20; RC-Arch, 17; Plan Deposit, 6; Appraisal, 4; RC-Other, 3]
Problems archiving studies…
• PI unsure where data are stored
• Data are in an old format that we may or may not be able to recover
• Physical condition (storage media or documentation) has deteriorated
• Paper copy documentation only, incomplete documentation
• No English language documentation
NIH records in LEADS
• We screened NIH awards for (1) social science/behavioral, (2) original data & (3) quantitative
• All NIH Institutes (1990-2001)
• NICHD, NIA, NIMH, NINR, AHRQ, NIAAA, NIDA, Clinical Center, NIDCD, FIC, NCI, NHLBI, NIDDK (all years)
• 172,196 - total # awards screened
• 6,381 - selected awards
Challenges & Limitations
• Size and scope of this project
• Need for PI cooperation
• Screening error rate has not been quantified
• Addressing the ambiguous records
• Collaborative projects and continuation projects have not been eliminated
Conclusions
• NIH & NSF award databases are a valuable source of information about studies “at risk” of being lost
• PI grant abstracts are highly variable regarding amount of detail about research aims & methodology
• Preliminary results suggest that few studies have been archived, although the rate is higher for NSF
• The large number of unarchived studies requires us to use appraisal methods to determine a particular study’s value for archiving