Gaining credit for sharing research data



Page 1: Gaining credit for sharing research data

Varsha Khodiyar, PhD

Data Curation Editor, Scientific Data

Nature Publishing Group

@varsha_khodiyar

@scientificdata

Tweet with #SDJPN16

Gaining credit for sharing research data

Data publishing with Scientific Data RIKEN Center for Life Science Technologies 4th March 2016

Page 2: Gaining credit for sharing research data

My background

• Joined Scientific Data in October 2014

• Professional data curator since 2003

• PhD in Molecular Biology from the University of Leicester

• Contributed to the Human Genome Project as member of the Human Gene Nomenclature Committee (HGNC)

• Gene Ontology curator for 8 years, at University College London, UK

• 3 years of open data publishing experience

Page 3: Gaining credit for sharing research data

Why share research data?

Page 4: Gaining credit for sharing research data

Generating research data is expensive

Just 18.1% of NIH grant applications were funded in 2014*

• Hours spent writing grants?

• Hours spent reviewing grants?

Resources are finite/expensive

• Modified animals

• Specialized reagents

Time and effort taken in the laboratory to generate good, valid data

* report.nih.gov/success_rates/Success_ByIC.cfm

Page 5: Gaining credit for sharing research data

Irreproducibility of published science

Figure 1 - Ioannidis JPA. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009) doi:10.1038/ng.295

Page 6: Gaining credit for sharing research data

Withholding data impacts on human health

Clinical study reports, detailed data and software code available at Dryad Digital Repository doi:10.5061/dryad.bv8j6 and www.Study329.org

Page 7: Gaining credit for sharing research data

Sharing data promotes

• Diversity of analyses and opinion

• New research: testing of new hypotheses, new analysis methods, meta-analyses to create new datasets, studies on data collection methods

• Education of new researchers

• Increased return on investment in research

Vickers AJ: Whose data set is it anyway? Sharing raw data from randomized trials. Trials 2006, 7:15

Hrynaszkiewicz I, Altman DG: Towards agreement on best practice for publishing raw clinical trial data. Trials 2009, 10:17

Page 8: Gaining credit for sharing research data

Researchers already share data

• Most researchers are sharing data, and using the data of others

• Direct contact between researchers (on request) is a common way of sharing data

• Repositories are the second most common method of sharing

Kratz and Strasser (2015) doi: 10.1371/journal.pone.0117619

Page 9: Gaining credit for sharing research data

Some problems…

• Sharing upon request relies heavily on trust

• Informally stored data associated with published works disappear at a rate of ~17% per year (Vines et al. 2014; doi: 10.1016/j.cub.2013.11.014)

• Datasets not referenced in a manuscript are essentially invisible (a.k.a. “dark data”)

• If data are available, they are often not interpretable or reusable because sufficient detail is not included

• Data producers do not get appropriate credit for their work
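To get a feel for what a loss of ~17% per year means over time, here is a minimal sketch. It assumes the slide's figure can be read as a constant proportional loss compounding each year, which is a simplification for illustration and not necessarily how Vines et al. modelled it; the function name `fraction_remaining` is made up for this example.

```python
def fraction_remaining(years: int, annual_loss: float = 0.17) -> float:
    """Fraction of datasets still obtainable after `years`.

    Assumption (not from the source): the same proportion of the
    remaining datasets is lost every year, so losses compound.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 - annual_loss) ** years


if __name__ == "__main__":
    # Print the surviving fraction at a few illustrative time points.
    for years in (1, 5, 10, 20):
        share = fraction_remaining(years)
        print(f"after {years:>2} years: {share:.1%} still obtainable")
```

Under this simplified reading, fewer than one in five informally stored datasets would still be obtainable ten years after publication.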

Page 10: Gaining credit for sharing research data

www.nature.com/scientificdata

Page 11: Gaining credit for sharing research data

Credit – Scholarly credit for publishing data; all publications are indexed and citeable.

Reuse – Standardized and detailed descriptions enable easier reuse of published research data.

Quality – Rigorous peer review on technical quality and reusability. An Editorial Board of experts in their fields maintains community standards.

Discovery – Curated, machine-readable metadata for dataset discovery. Validated links to published data in each article.

Open – Use of CC-BY licence for articles and CC0 for metadata. Promote use of open licences for published data.

Service – Commitment to excellent service for authors and readers.

Page 12: Gaining credit for sharing research data

What is a Data Descriptor?

Page 13: Gaining credit for sharing research data

Data Descriptors have human and machine readable components

• Human readable representation of study, i.e. article (HTML & PDF)

• Machine readable representation of study, i.e. metadata

Page 14: Gaining credit for sharing research data

Comparison of Data Descriptor to traditional article

Traditional article:

• Synthesis

• Analysis

• Conclusions

Data Descriptor:

• What did I do to generate the data?

• How was the data processed?

• Where is the data?

• Who did what and when?

Data Descriptors contain methods and technical analyses supporting the quality of the measurements. They do not contain tests of new scientific hypotheses.

Page 15: Gaining credit for sharing research data

What types of data can be published?

• Decades old dataset

• Standalone dataset

• Data that has been used in an analysis article

• Large consortium dataset

• Data from a single experiment

• Data that the researcher finds valuable and that others might find useful too

• Data associated with a high impact analysis article

Page 16: Gaining credit for sharing research data

When can a Data Descriptor be published?

Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data:

• Before the analysis has been published

• Publication alongside analysis article

• After data analysis has been published

• Authors not intending to analyse data

Page 17: Gaining credit for sharing research data

Scientific Data accepts submissions from all quantitative research disciplines

Page 18: Gaining credit for sharing research data

Helping authors find the right place for their data

Page 19: Gaining credit for sharing research data

Scientific Data’s Repository List

Browse our recommended data repositories online.

• We currently list almost 80 repositories, across biological, medical, physical and social sciences

• When required, we provide guidance to authors on the best place to store their data

www.nature.com/sdata/data-policies/repositories

Page 20: Gaining credit for sharing research data

Generation of machine readable metadata

Page 21: Gaining credit for sharing research data

Metadata at Scientific Data

• We want to capture metadata about the dataset being described in each Data Descriptor

• The manuscript captures human readable metadata needed for data reuse

• The curated metadata records capture machine readable metadata needed for machine based data discovery

Page 22: Gaining credit for sharing research data

ISA-Tab format for machine readable metadata

• Study workflow

• Key sample characteristics needed for data discovery

• Relates samples to data files

• Shows location of dataset

• Uses controlled vocabularies and ontologies (where possible)
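As a rough illustration of how a tab-delimited metadata table can relate samples to data files and expose sample characteristics for discovery, here is a minimal sketch. The column headers follow ISA-Tab naming conventions, but the table is simplified into a single sheet and every value, file name and helper function in it is invented for this example; it is not one of Scientific Data's curated records.

```python
import csv
import io

# Hypothetical, simplified table: ISA-Tab style column names, invented values.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[organism part]"
    "\tSample Name\tRaw Data File\n"
    "donor1\tHomo sapiens\tliver\tsample1\tsample1_run1.fastq\n"
    "donor1\tHomo sapiens\tliver\tsample1\tsample1_run2.fastq\n"
    "donor2\tMus musculus\tbrain\tsample2\tsample2_run1.fastq\n"
)


def read_rows(table_text):
    """Parse the tab-delimited text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))


def files_by_sample(table_text):
    """Relate each sample name to the data files derived from it."""
    mapping = {}
    for row in read_rows(table_text):
        mapping.setdefault(row["Sample Name"], []).append(row["Raw Data File"])
    return mapping


def samples_with(table_text, characteristic, value):
    """Find samples whose named characteristic equals the given value."""
    column = f"Characteristics[{characteristic}]"
    rows = read_rows(table_text)
    return sorted({row["Sample Name"] for row in rows if row[column] == value})


if __name__ == "__main__":
    print(files_by_sample(STUDY_TABLE))
    print(samples_with(STUDY_TABLE, "organism", "Homo sapiens"))
```

Because the table is plain tab-delimited text, the same lookup works with any CSV-capable tool, which is part of what makes this kind of metadata machine readable.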

Page 23: Gaining credit for sharing research data

Use of community endorsed ontologies and controlled vocabularies

Controlled vocabulary = list of standardized phrases of scientific concepts

Ontology = controlled vocabulary with defined relationships between terms
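The difference between the two definitions can be sketched in a few lines. The terms, the `is_a` relationships and the dataset identifiers below are hypothetical and chosen only for illustration; they are not taken from any real ontology release.

```python
# Controlled vocabulary: just an agreed list of standardized phrases.
CONTROLLED_VOCABULARY = {"cell", "neuron", "motor neuron", "hepatocyte"}

# Ontology: the same terms plus defined relationships between them.
# Here each term maps to its direct "is_a" parents (hypothetical example).
IS_A = {
    "cell": set(),
    "neuron": {"cell"},
    "hepatocyte": {"cell"},
    "motor neuron": {"neuron"},
}


def is_valid_term(term):
    """A controlled vocabulary can only say whether a phrase is allowed."""
    return term in CONTROLLED_VOCABULARY


def ancestors(term):
    """An ontology can also say what broader terms a phrase falls under."""
    seen = set()
    stack = list(IS_A[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A[parent])
    return seen


def search(annotations, query):
    """Return ids of records annotated with `query` or any narrower term."""
    return sorted(
        record_id
        for record_id, term in annotations.items()
        if term == query or query in ancestors(term)
    )


if __name__ == "__main__":
    datasets = {"dataset1": "motor neuron", "dataset2": "hepatocyte"}
    print(search(datasets, "neuron"))
    print(search(datasets, "cell"))
```

With relationships in place, a search for a broad term can also return datasets annotated with narrower terms, which a flat list of phrases cannot do.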

Page 24: Gaining credit for sharing research data

Structured Summary table from curated metadata

Investigation file

Study file

Sample characteristics reported in Structured Summary table:

• Organism

• Organism part

• Cell line

• Geographical location

• Environment type

Page 25: Gaining credit for sharing research data

Viewing the metadata

Page 26: Gaining credit for sharing research data

Metadata for data discovery

Search by:

• Data Repositories

• Experiment design

• Measurements made

• Technologies used

• Factor types

• Sample Characteristics

• Organism

• Environment types

• Geographic locations

scientificdata.isa-explorer.org

Page 27: Gaining credit for sharing research data

Citing Data

Page 28: Gaining credit for sharing research data

Citing my own data

1. In the article text

2. In the Data Citation section

Page 29: Gaining credit for sharing research data

Citing data I’ve reused

1. In the article text

2. In the References section

Page 30: Gaining credit for sharing research data

Clinical researchers support sharing, but…

Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570

• Sharing de-identified data via repositories should be required (236 respondents, 74%)

• Investigators should share de-identified data on request (229 respondents, 72%)

Page 31: Gaining credit for sharing research data

…clinical data producers have specific concerns

Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570

Page 32: Gaining credit for sharing research data

Example initiatives for sharing clinical data

Yale Open Data Access (YODA) & Clinical Study Data Request (CSDR) projects:

• Data Use Agreements (DUAs)

• Controlled access environment

• Scientific validity of reanalysis checked

• Independent governance

• Data anonymisation checks

http://yoda.yale.edu/

https://www.clinicalstudydatarequest.com/

Page 33: Gaining credit for sharing research data

Clinical data publication at Scientific Data

• Identify repositories able to archive clinical data

• Work with identified repositories to establish workflows for peer review and publication, whilst maintaining patient privacy

• Facilitate a specialist peer review process for clinical data, for example ensuring peer reviewers have agreed to the terms of the data use agreement

Hrynaszkiewicz, I., Khodiyar, V., Hufton, A. & Sansone, S. A. Publishing descriptions of non-public clinical datasets: guidance for researchers, repositories, editors and funding organisations. bioRxiv http://dx.doi.org/10.1101/021667 (2015).

Page 34: Gaining credit for sharing research data

A robust data-on-request workflow?

Page 35: Gaining credit for sharing research data

Published Data Descriptor with clinical data

Data Records section details how to access the data

Page 36: Gaining credit for sharing research data

Links to restricted access data

• Data Citations link to repository

• Data files requiring permission to access

• Freely accessible data files

Page 37: Gaining credit for sharing research data

Data Reuse stories

Page 38: Gaining credit for sharing research data

Data reuse by (some of) the same researchers

Page 39: Gaining credit for sharing research data

Data reuse by other researchers in the same field

“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”

Professor Daniele Marinazzo

Page 40: Gaining credit for sharing research data

Data reuse and citation by researchers

According to Google Scholar, cited 43 times! (February 2016)

Page 41: Gaining credit for sharing research data

Data reuse by the non-research community

www.bbc.co.uk/news/science-environment-33057402

Page 42: Gaining credit for sharing research data

Data reuse by the non-research community

http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html

Page 43: Gaining credit for sharing research data

Data Descriptors…

• …enable you to gain scholarly credit for your data gathering efforts.

• …are human AND machine readable.

• …can be published with, or independently of, an analysis article.

• …can be published at any point in the research workflow.

• …allow the publication and discovery of clinical data, whilst maintaining your patients' privacy.

• …result in greater reuse and citation by fellow members of your research community.

• …extend the impact of your research data by enabling access to and reuse by the non-research community.

Page 44: Gaining credit for sharing research data

Get more from your data

• Preserve it

• Encourage reuse

• Get credit for it

Visit nature.com/sdata

Email [email protected]

Tweet @ScientificData #SDJPN16