Identifying psychological research data in the digital environment

Identifying psychological research data in the digital environment. Erich Weichselgartner, Leibniz Institute for Psychology Information (ZPID), Trier, Germany. IDSC of IZA/GESIS/RatSWD Workshop: Persistent Identifiers for the Social Sciences, University Club, Bonn, Feb. 1‐2, 2011.

DESCRIPTION

Weichselgartner, E. (2011, February). Identifying psychological research data in the digital environment. (PDF) IDSC of IZA/GESIS/RatSWD Workshop: Persistent Identifiers for the Social Sciences, Bonn.

TRANSCRIPT

Page 1: Identifying psychological research data in the digital environment

Identifying psychological research data in the digital environment

Erich Weichselgartner

Leibniz Institute for Psychology Information (ZPID), Trier, Germany

IDSC of IZA/GESIS/RatSWD Workshop: Persistent Identifiers for the Social Sciences

University Club, Bonn – Feb. 1‐2, 2011

Page 2: Identifying psychological research data in the digital environment

PsychData

Walter Schneider

A name is not a unique identifier

Walter Schneider

Weichselgartner: PsychData. Bonn, February 1, 2011

Page 3: Identifying psychological research data in the digital environment

Make anonymous

Sj2

Page 4: Identifying psychological research data in the digital environment

Major issues in Psychology (outline)

• It is not common practice in Psychology to share data

• Ethical principles: Confidentiality, privacy

• Lack of standardization of instruments: documentation very laborious, context important

• Change culture of the field

• Point out advantages for researcher, community, society

• PsychData: Discipline specific repository (archive)

Page 5: Identifying psychological research data in the digital environment

PsychData is an archive of primary research data in psychology. It was developed at the Leibniz Institute for Psychology Information (ZPID) in Trier, Germany, with partial funding by the German Research Foundation (DFG).

Goals

1. Acquisition

2. Documentation

3. Preservation (long‐term archiving)

4. Access (distribution)

5. Direct research support (limited resources!)

   1. Tools to incorporate data sharing in the initial design of a study

   2. Direct deposit via web interface

Page 6: Identifying psychological research data in the digital environment

Very limited resources (2 x 0.5 staff positions).

Selection criteria based on quality:

• Large surveys

• Studies of unique populations

• Studies conducted at unique times

• Longitudinal studies

• Non‐replicability (data replication not feasible, excessively costly or prohibitive)

• Scientific value

• Citation, research, and educational use as published in refereed scientific publications

Active solicitation required, hardly any volunteer donors!

Page 7: Identifying psychological research data in the digital environment

Volume

• ~ 60 studies, annual growth ~ 10 studies

• ~ 80 data sets

• ~ 40 million data points

Data reuse

• Possible since June 2004

• ~ 10 requests per year

Page 8: Identifying psychological research data in the digital environment

Exemplary Study

The Munich Twin Study (GOLD): Genetic Oriented Longitudinal Study of Differential Development (in preparation)

• Long‐term study, begun in 1937 with 180 monozygotic (identical) and dizygotic (fraternal) twins.

• Five waves

• Genetic vs. Environmental Determinants of Traits, Motives, Self‐Referential Cognitions, and Volitional Control in Old Age

• http://www.mpipf‐muenchen.mpg.de/BCD/PROJECTS/gold_g.htm

Page 9: Identifying psychological research data in the digital environment

Benefits of data sharing (in Psychology)

• Provide incentives, rewards, and recognition for scientists who share and archive data.

• Citation is a primary scholarly indicator of value (Lyon, 2007)

• Sharing research data is associated with increased citation rate (Piwowar, Day & Fridsma, 2007)

• Make data sets citable as scholarly publications; establish citation standard

• Long‐Lived Digital Data Collections (NSF, USA)

• "Strategies for location‐independent identification of data objects, such as Digital Object Identifiers and permanent Universal Resource Locators (URLs) need to be developed and broadly applied to address this problem."

• Digital Repositories Programme (JISC, UK)

• Project STD‐DOI "Publication and Citation of Scientific Primary Data" (funded by the German Research Foundation, 2003‐2005)

Page 10: Identifying psychological research data in the digital environment

Make data citable (as a unique piece of work and not only a part of a publication); this requires

• Persistent identification

• Long‐term availability (resolver, data)

• Reliability

Possible solution: The DOI System (International DOI Foundation). Components:

• a specified numbering syntax;

• a resolution service (based on the Handle System);

• a data model system (including the indecs Data Dictionary);

• policies and procedures for the implementation of DOI names through a federation of Registration Agencies.

Page 11: Identifying psychological research data in the digital environment

The Digital Object Identifier (DOI®) System

The DOI System provides a framework for

• persistent identification,

• managing intellectual content,

• managing metadata,

• linking customers with content suppliers,

• facilitating electronic commerce, and

• enabling automated management of media.

DOI names can be used for any form of management of any data, whether commercial or non‐commercial.

Page 12: Identifying psychological research data in the digital environment

The DOI System

• Examples

• doi:10.1000/182

• doi:10.1594/PANGAEA.484677

• The prefix identifies the registrant of the name, and the suffix is chosen by the registrant and identifies the specific object associated with that DOI

• http://dx.doi.org/10.1000/182

• http://dx.doi.org/10.1594/PANGAEA.484677

• DOIs can be incorporated into Web pages much like current links. But instead of pointing to a specific Web location, the DOI sends the browser off to a database, where it retrieves and displays whatever information the publisher chooses to offer.
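The prefix/suffix split and the resolver links on this slide can be illustrated with a short Python sketch. The names `Doi`, `parse_doi`, and `resolver_url` are hypothetical helpers, not part of any DOI library. The check that a prefix starts with `10.` is an assumption based on the two example names, and the sketch does no URL encoding of special characters in the suffix.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    """Hypothetical container for the two parts of a DOI name."""
    prefix: str  # identifies the registrant of the name
    suffix: str  # chosen by the registrant, identifies the specific object


def parse_doi(name: str) -> Doi:
    """Split a DOI name such as 'doi:10.1000/182' into prefix and suffix."""
    text = name.strip()
    # Drop the optional 'doi:' label used in the slide's examples.
    if text.lower().startswith("doi:"):
        text = text[4:]
    # Split at the first slash only; the suffix itself may contain slashes.
    prefix, sep, suffix = text.partition("/")
    # Assumption: prefixes start with '10.', as in both example names.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {name!r}")
    return Doi(prefix, suffix)


def resolver_url(doi: Doi, resolver: str = "http://dx.doi.org/") -> str:
    """Build the resolver link in the form shown on the slide."""
    return f"{resolver}{doi.prefix}/{doi.suffix}"


if __name__ == "__main__":
    for name in ("doi:10.1000/182", "doi:10.1594/PANGAEA.484677"):
        doi = parse_doi(name)
        print(doi.prefix, doi.suffix, resolver_url(doi))
```

The link only names the resolver and the DOI, never the current location of the object, which is what allows the registrant to move the data without breaking citations.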

Page 13: Identifying psychological research data in the digital environment

The DOI System

• Registration agencies

• CrossRef, OPOCE, DataCite (GER: GESIS, TIB, ZB MED), etc.

• On May 1st 2005 the TIB became the world's first DOI registration agency for scientific primary data

• Publication agents (data centers, e.g. PsychData)

• Long‐term archive

Page 14: Identifying psychological research data in the digital environment

The primary role of Registration Agencies (RAs) is to provide services to Registrants: allocating DOI® name prefixes, registering DOI names and providing the necessary infrastructure to allow Registrants to declare and maintain metadata and state data. This service is expected to encompass quality assurance measures, so that the integrity of the DOI® system as a whole is maintained at the highest possible level (delivering reliable and consistent results to users). This includes ensuring that state data is accurate and up‐to‐date and that metadata is consistent and complies with both DOI system Kernel and appropriate Application Profile standards.

Page 15: Identifying psychological research data in the digital environment

Registration Agency: DataCite

• DataCite is focused on improving the scholarly infrastructure around datasets. There will be a set of activities around establishing and sharing best‐practices, identifying and solving some of the unique issues that arise with datasets.

• DataCite is focused on working with data centres and organisations that hold data. The details of their business models, workflows, and other requirements do not appear to be identical to those of publishers producing traditional journals.

• DataCite has a business model that meets the needs of non‐commercial and sometimes smaller organisations; larger national‐scale organisations (e.g., TIB, BL) carry the basic infrastructure costs and will reclaim where appropriate within their domain.

Page 16: Identifying psychological research data in the digital environment

Publication Agent

Data publications are processed by publication agents. Besides the publication tasks the agents are also responsible for long‐term archiving of primary data ("data library"). Each agent covers its own thematic field.

Page 17: Identifying psychological research data in the digital environment

Responsibilities of Publication Agent

Infrastructure and services

“Datasets should be easy to find, easy to access, easy to use.”

• How to identify data set? Persistent identification.

• "Strategies for location‐independent identification of data objects, such as Digital Object Identifiers and permanent Universal Resource Locators (URLs) need to be developed and broadly applied to address this problem." (NSF, 2005)

• Discovery

• Metadata elements providing data history, authorship, and access information

• Catalogues, search engines

• Data access/Release policies

• Legal restrictions

• Property rights, confidentiality, privacy

Page 18: Identifying psychological research data in the digital environment

Requirements for PID infrastructure

• Trustworthy, secure, reliable, sustainable (e.g., defined service level agreements)

• Acceptance in community (PUB + COM)

• Standardized, interoperable; provides guidelines

• Added value

• resource discovery (related works, derivatives)

• track citation, usage logs

Page 19: Identifying psychological research data in the digital environment

Advantages of the DOI System

• Well established (e.g., all major publishers)

• Stable, redundant, no downtimes

• 1:1 relationship between metadata and identifier (use metadata to find identifier)

• "cited‐by linking"
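One way to read the 1:1 point above is as a two-way index: each identifier maps to exactly one metadata record, and each record maps back to exactly one identifier. The Python sketch below is a toy illustration of that idea only. The class and field names, the sample DOI, and the sample record are all hypothetical, and nothing here reflects how the DOI System or a registration agency actually stores or searches metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    """Hypothetical minimal metadata record; field names are illustrative."""
    creator: str
    year: int
    title: str


class IdentifierRegistry:
    """Toy in-memory registry keeping identifiers and metadata one-to-one."""

    def __init__(self) -> None:
        self._by_id = {}        # identifier -> metadata
        self._by_metadata = {}  # metadata -> identifier

    def register(self, identifier: str, metadata: Metadata) -> None:
        # Reject anything that would break the 1:1 relationship.
        if identifier in self._by_id:
            raise ValueError(f"identifier already registered: {identifier}")
        if metadata in self._by_metadata:
            raise ValueError("metadata already bound to another identifier")
        self._by_id[identifier] = metadata
        self._by_metadata[metadata] = identifier

    def find_identifier(self, metadata: Metadata) -> Optional[str]:
        """Use metadata to find the identifier (None if unknown)."""
        return self._by_metadata.get(metadata)

    def find_metadata(self, identifier: str) -> Optional[Metadata]:
        """Use the identifier to find its metadata (None if unknown)."""
        return self._by_id.get(identifier)


if __name__ == "__main__":
    registry = IdentifierRegistry()
    # Hypothetical record and DOI, for illustration only.
    record = Metadata("Example, A.", 2011, "Example study")
    registry.register("10.99999/example.1", record)
    print(registry.find_identifier(record))
```

Because the mapping works in both directions, a reader who knows only the author, year, and title of a data set could still recover the identifier needed to cite it.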

Page 20: Identifying psychological research data in the digital environment

References

• A fair share. The concept of sharing primary data is generating unnecessary angst in the psychology community. (7 December 2006). Nature, 444, 653‐654.

• Breckler, S. (2009). Psychology needs to develop mechanisms for data sharing. APA Monitor Online, 40 (2).

• Fienberg S. E., Martin M. E., Straf M. L. (1985). Sharing research data. Washington, D.C.: National Academy Press.

• Guilford, J. P. (1954). Psychometric Methods. McGraw‐Hill, New York, 2nd edition.

• Piwowar H.A., Day R.S., Fridsma D.B. (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

• Roberts, F. S. (1979). Measurement theory: with applications to decisionmaking, utility, and the social sciences. Encyclopedia of Mathematics and its Applications, Vol. 7, Addison‐Wesley, Reading, MA.

• Sobal, J. (1982). The Role of Secondary Data Analysis in Teaching the Social Sciences. Library Trends, 30, 479‐488.

• Weichselgartner, E. (2008). PsychData: An archive for primary research data in Psychology. Keeping the Records of Science Accessible: Can we afford it? High‐level strategic conference organized by the Alliance for Permanent Access, the European Science Foundation (ESF) and the Hungarian Scientific Research Fund (OTKA) in Budapest, Hungary, November 4, 2008.

• Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The Poor Availability of Psychological Research Data for Reanalysis. American Psychologist, 61, 726‐28.

Page 21: Identifying psychological research data in the digital environment

Bibliography

Azar, B. (1999). Psychology needs to develop mechanisms for data sharing. APA Monitor, 30 (8).

Dockser M. A.: My Data, Your Data, Our Data. Wall Street Journal, 2010/04/13.

Editorial: Data for eternity. Nature Geoscience 3, 219 (2010).

Klopp, T. (2010). OPEN DATA. Forscher sollen ihre Daten teilen. ZEIT ONLINE.

Mervis, J.: NSF to Ask Every Grant Applicant for Data Management Plan. Science Insider, 2010/05/05.

Procter, M. (1993). Analyzing other researchers' data. In N. Gilbert (Ed.), Researching social life (pp. 255‐269). London: Sage.

Sieber, J. E. (Ed.). (1991). Sharing social science data. Advantages and challenges. Thousand Oaks, CA: Sage.

Sieber, J. E. (1997). Credit allocation in psychology. Science and Engineering Ethics, 3, 261‐264.

Tucker, Jennifer (2009). Motivating Subjects: Data Sharing in Cancer Research. Falls Church, VA (Dissertation)

Events

CNR‐ISTI (Italy) Workshop: GLOBAL SCIENTIFIC DATA INFRASTRUCTURES: THE BIG DATA CHALLENGES. Hotel La Palma, Island of Capri, Italy, 12‐13 May 2011

Page 22: Identifying psychological research data in the digital environment

Other Initiatives

Europe

• CESSDA: Council of European Social Science Data Archives (http://www.cessda.org/)

• UKDA: The UK Data Archive (http://www.data‐archive.ac.uk/)

USA

• CHILDES: Child Language Data Exchange System (http://childes.psy.cmu.edu/)

• Henry A. Murray Research Archive at Harvard University (http://www.murray.harvard.edu/)

• Journal of Statistics Education Data Archive (http://www.amstat.org/publications/jse/jse_data_archive.htm)

Page 23: Identifying psychological research data in the digital environment

Contact

http://www.psychdata.de/

[email protected]

PsychData team: Thomas Bäumer, Ina Dehnhard, Armin Günther, Günter Krampen, Jutta von Maurice, Leo Montada, Sebastian Mühlböck, Erich Weichselgartner

Partly funded by the German Research Foundation

Member of

Page 24: Identifying psychological research data in the digital environment

About ZPID (http://www.zpid.de/index.php?lang=EN):

• ZPID’s objective is to provide a comprehensive, sustainable, and professionally based documentation and communication of information in the field of psychology focusing on the German‐speaking countries.

• Founded in 1977 at the University of Trier.

• Non‐profit organization – co‐funded by the Federal Republic of Germany and the German States.

• Member of the Leibniz Association (association of 86 scientific research institutions).

• Quality Assurance by External Evaluation, Scientific Advisory Board and Supervisory Board.

• Annual budget ~ US$ 2.5 million (without competition‐based grants).

• ~ 30 scientific and administrative staff.
