Preserving Social Science Data Through Archival Collaboration

Micah Altman, Senior Research Scientist, Harvard University

This Talk…

• The roadmap: Past, Present, Future
• Collaborators & Co-conspirators: Ken Bollen, Jonathan Crabtree, Darrell Donakowski, Myron Gutmann, Gary King, Lois Timms-Ferrara, Amy Pienta, Marc Maynard

What? -- Digital Social-Science Data

DIGITAL
• Optical: DVD, CD
• Magnetic: Tapes, ‘Floppies’
• Paper: cards, tapes

SOCIAL SCIENCE
• Social: class, crime, social movements, culture, folklore, family
• Economic: wealth, prosperity, labor, business, equity
• Psychology: cognition, attitudes, stereotypes
• Politics: justice, democracy, public policy, public administration, international conflict

DATA
• Raw measurements
• Numeric tables
• Administrative records (& email)
• Video and audio interviews, transcripts (& blogs)

Data Access is the Key To Science

• Science is not (only) about being scientific
• Scientific progress requires community: Competition and cooperation
• In the pursuit of common goals
• Without access to the same materials: no community exists
• The value of an article that can’t be replicated: ?
• Scholarly articles are summaries, not the actual research results
• But: Data access is spotty by field
• Movement to require data access with publication
• Finding the data is still hard
• Hard for journal editors to verify
• If you find it, how do you know it’s the same?
• Replication projects: most published articles in social science cannot be replicated

Data Access is the Key To Democracy

• Statistics = state-istics
• The state tax authority: counting people, estimating wealth
• Reformers use data to assess the performance of the state
• Science informs public policy continually
• In modern democracy: the public needs a direct source of information

How Data Is Lost

Data Intentionally Discarded
“It was just too long ago, I generally keep data for something like 10 years beyond the last time I do something with them.”
“Destroyed, in accord with APA 5-year post-publication rule.”

Unintentional Hardware Problems
“Some data were collected, but the data file was lost in a technical malfunction.”

Destroyed for Confidentiality Reasons
“The material … was considered sensitive data. Institutional review boards … required us to promise to destroy the data after a certain period of time ...”

Acts of Nature
“The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.”

Discarded or Lost in a Move
“As I retired … Unfortunately, I simply didn’t have the room to store these data sets at my house.”

Obsolescence
“Speech recordings stored on a LISP Machine …, an experimental computer which is long obsolete.”

Simply Lost
“For all I know, they are on a [University] server, but it has been literally years and years since the research was done, and my files are long gone.”

Identifying Data at Risk

• Past grants and awards
• Private research organizations
• Polling organizations
• Journals and researcher associations

Collaboration for Preservation

• Partnership Agreements
  - Agreement to establish good practice
  - Preservation copies of data collected
  - Transfer Protocol: in case of archival failure
• Cooperating Operations
  - Central database of leads for acquisition
  - Development of shared procedures
  - Review of acquisitions
• Joint “Not-bad” practices
  - Identification & selection
  - Metadata
  - Security
  - Confidentiality
• Shared Catalog
  - Unified Discovery
  - Content exchange
  - Layered Services

Data Rescued Examples

• U.S. Information Agency Surveys
  - Directly informed U.S. foreign policy through surveys of foreign public opinion
  - Previously, only surveys from 1970-1990 were held in the National Archives
  - Collaboration between NARA and Roper to create a much more complete series spanning 1950-1990
  - Surveys conducted in Europe, Latin America, and Asian countries
  - Recent subjects include nuclear arms control, US-Soviet relations, the US strike on Libya, the Soviet invasion of Afghanistan, economic matters, terrorism, economic summits, drug trafficking, democratization, and conflicts in El Salvador and Nicaragua
• Longitudinal Study of Personality Development, by Jack and Jeanne Humphrey Block
  - The most intensive study of human personality development in existence
  - Thirty-year longitudinal study
  - Mixed methods: quantitative, audio, video
  - More than 100 instruments, and thousands of measures (variables)
  - Resulted in more than 100 publications
  - (Also shows how whiny kids are more likely to grow up to be conservatives.)
• National Network of State Polls
  - Diverse membership of 50 members in 38 states
  - Covers a tremendous range of local and national issues
  - Data imminently at risk

Selected Topics & Sponsors

• Political activity, political activism, voting behavior, protest activity, voter registration, fundraising, political alienation, relationship to the Black community, feminism, racial identity, attitudes toward abortion, attitudes toward federal programs; television viewing habits, effects of having children on the marriage, giving too much/little independence, discipline, overscheduling, overprotecting, measuring levels of success in teaching values, self-control, good citizenship, good money habits, religion, worries that parents have of the future facing their children; problems facing parents and children, from drugs, sex, and violence to the lack of various family and religious values; daycare, mothers working, childrearing, taxes, government spending, morals, children’s issues, economy, jobs, education, crime, health care, social security, local school administration, standardized testing, impact of poor scores on teachers, higher academic standards needed, too much/little homework, summer school, teachers, administrators, quality of academics, discipline matters, class size, level of science and math skills taught, Shakespeare, life skills, athletics, citizenship; role of the US in the world and assessing US performance, terrorism, war in Iraq, respondent-identified level of understanding of foreign affairs, US and foreign aid, assisting emerging democracies, enhancing national security, image of the US abroad; seriousness of welfare problems (abuse, fraud, generational, etc.); assessing list of remedies (limit duration, require job training, provide day care, unannounced visits, business tax breaks for hiring recipients, penalize recipients who have more children, etc.); profiling welfare recipients (e.g. more likely to be better/worse parents, lazy or hardworking, from troubled families); defining the American Ideal, how to teach kids what it means to be American, national identity, appreciation of freedoms in the US, importance of voting, ashamed of nation's history of racism, job US does in teaching immigrant children, bilingualism, fly an American flag; most about the meaning of the rights the Constitution guarantees, assessing the level of appreciation of those rights in the US and how it is perceived by the international community; aging; Money Managers; union organizations, employers, and labor market institutions; tort law reforms; crime and urbanization; law and social control; natural disasters; awareness of self

• NSF, NIH, The Danforth Foundation, The Ford Foundation, The David and Lucile Packard Foundation, the Ewing Marion Kauffman Foundation, State Farm Insurance, Ronald McDonald House Charities, Advertising Council, American Federation of Teachers, the Annenberg Institute, the George Gund Foundation, the National School Boards Association, U.S. Department of Education, GE Foundation, Nellie Mae Education Foundation, Wallace Foundation, Bill & Melinda Gates Foundation, Pew Charitable Trusts, National Constitution Center, Alliance for Aging Research, American Federation for Aging Research, the MacArthur Foundation, NIMH

Data-PASS Shared Catalog

• A unified catalog of the partners’ entire holdings
• Completes the unification of social science data that was the dream of the first Council of Social Science Data Archives in 1969
• Discovery Services
  - Simple & fielded search
  - Virtual collection browsing
• Metadata delivery
  - Descriptive study, file, & variable information
  - Provenance metadata
  - Human and OAI interfaces
• Enhanced Delivery
  - Proxy delivery
  - Replication
  - Layered analysis services

Catalog Distributed Architecture

[Diagram. Components: Search Shared Catalog, Data Mirror, Metadata Catalog, Harvester, Online Catalog, Online Analysis. Connectors: <XSL> Crosswalk, proxy, OAI.]

• View information on data: through the catalog; link to data at partner site
• Access data: with extraction and analysis, through the catalog; direct to partner sites
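
The harvesting step in the architecture above can be pictured with a short sketch. It assumes that "OAI" refers to the OAI-PMH protocol, and it uses that protocol's standard `ListRecords` verb and `oai_dc` metadata prefix. The base URL, the sample response, and the function names are hypothetical and are not taken from the Data-PASS catalog.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a partner's OAI endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Pull the record identifiers out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical response, shortened to the parts the sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:study-1</identifier></header></record>
    <record><header><identifier>oai:example:study-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # archive.example.org is a placeholder, not a real partner site.
    print(list_records_url("https://archive.example.org/oai"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also fetch the URL, follow resumption tokens, and pass each record through a crosswalk before loading it into the shared catalog; those steps are left out here.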

The Dataverse Network*

Includes integrated developments in web application software, networking, data citation standards, and statistical methods designed to put some of the universe of data and data sharing practices on firmer ground. It facilitates the public preservation and distribution of persistent, authorized, and verifiable research data.

• In production
• Will migrate Shared Catalog this summer

Virtually-Hosted Archiving
• The importance of being virtual …
  - Nothing to install
  - Dynamic collections: local and federated
• Institutionally supported
  - Persistent identifiers and citations
  - No worries about file formats changing, backups, etc.
  - All the initial setup work is done for the depositor
• Depositors retain total control over
  - Content
  - Access
  - Presentation

*Successor to “VDC”

http://dvn.iq.harvard.edu/dvn

Better Data Citations: Persistent IDs and Universal Numeric Fingerprints

• Persistent IDs get you from a journal article to the data
• UNFs verify that it’s the same data as cited
• Same UNF regardless of hardware, operating system, statistical software, database, or spreadsheet software
• UNFs combine: generalized rounding (desiccation), normalization (canonicalization), fingerprinting (cryptographic hash, e.g. SHA256)
• Available as: C++, R-stats language, Stata, SAS, S-Plus
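
The three ingredients named on this slide (rounding, normalization, fingerprinting) can be illustrated with a toy sketch. This is not the UNF specification and not any of the listed implementations: the function names, the 7-digit default, the "NA" marker, and the newline separator are assumptions made for illustration only.

```python
import base64
import hashlib


def canonical(value, digits=7):
    """Round a number to `digits` significant digits and render it in a
    fixed exponential notation, so the text form does not depend on how
    a particular package stored or printed the value."""
    if value is None:
        return "NA"  # hypothetical marker for a missing value
    return format(float(value), f".{digits - 1}e")


def toy_fingerprint(values, digits=7):
    """Hash the canonical forms of all values, in order, with SHA-256."""
    digest = hashlib.sha256()
    for value in values:
        digest.update(canonical(value, digits).encode("utf-8"))
        digest.update(b"\n")  # separator keeps adjacent values distinct
    return base64.b64encode(digest.digest()).decode("ascii")


if __name__ == "__main__":
    as_cited = [1, 2.5, None]
    # Same data after a round trip that added floating-point noise.
    as_found = [1.0, 2.50000000001, None]
    print(toy_fingerprint(as_cited) == toy_fingerprint(as_found))
```

Because rounding happens before hashing, differences below the chosen precision do not change the fingerprint, while any change above that precision does.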

Future: Replication as Institutional Insurance

Data-PASS Syndicated Storage Project
• Schema driven: capture inter-archival preservation commitments
• Asymmetric: resource commitments proportional to holdings
• Versioned: versioned data and citations
• Integration: LOCKSS + DVN technology, archival workflows

• External Causes of Preservation Failure
  - Third party attacks
  - Institutional funding
  - Change in legal regimes
• Quis custodiet ipsos custodes?
  - Unintentional curatorial modification
  - Loss of institutional knowledge & skills
  - Intentional curatorial deaccessioning
  - Change in institutional mission
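
One way to picture replication as insurance against unintentional modification is a checksum audit across the copies held by different archives. The sketch below is only an illustration of that idea. It is not the LOCKSS polling protocol or the Data-PASS design, and the archive names and function name are hypothetical.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas):
    """Compare SHA-256 digests of every copy of one file.

    `replicas` maps an archive name to the bytes that archive holds.
    Returns the digest held by the most copies and the sorted names of
    the archives whose copy disagrees with it.
    """
    digests = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in replicas.items()
    }
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    suspects = sorted(
        name for name, digest in digests.items() if digest != majority_digest
    )
    return majority_digest, suspects


if __name__ == "__main__":
    copies = {
        "archive_a": b"id,score\n1,0.5\n",
        "archive_b": b"id,score\n1,0.5\n",
        "archive_c": b"id,score\n1,0.6\n",  # silently altered copy
    }
    _, suspects = audit_replicas(copies)
    print(suspects)
```

A simple majority only helps when most copies are intact, which is one reason to spread replicas across institutions that do not share the same failure causes.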

• Preservation now follows the research life cycle

• Future preservation should be planned at the beginning of the cycle

For More Information

• Data-PASS Project: http://www.icpsr.umich.edu/DATAPASS/
• Shared Catalog: http://vdc.hmdc.harvard.edu/dataverse/DATAPASS/
• Dataverse Network Software: http://TheData.Org
• Get a dataverse hosted by IQSS: http://dvn.iq.harvard.edu/dvn