Managing and sharing data

Managing and sharing data. Sarah Jones, DCC, University of Glasgow. Twitter: @sjDCC. ERC Workshop on Research Data Management and Sharing, 18-19 September 2014, Brussels. Funded by:

Uploaded by sarah-jones on 29-Nov-2014


DESCRIPTION

Presentation given at the European Research Council workshop on research data management and sharing in Brussels on 18th-19th September 2014. The presentation covers the benefits of and drivers for research data management (RDM), points to relevant tools and resources, and closes with some open questions for discussion.

TRANSCRIPT

Page 1: Managing and sharing data

Managing and sharing data

Sarah Jones, DCC, University of Glasgow

Twitter: @sjDCC

ERC Workshop on Research Data Management and Sharing, 18-19 September 2014, Brussels

Funded by:

Page 2: Managing and sharing data

European Research Council policy

Commitment to open science from the start:

"it is the firm intention of the ERC Scientific Council to issue specific guidelines for the mandatory deposit in open access repositories of research results – that is, publications, data and primary materials – obtained thanks to ERC grants, as soon as pertinent repositories become operational."

Statement on Open Access, December 2006

Image CC BY-SA 3.0 by Greg Emmerich www.flickr.com/photos/gemmerich/6365692655

Page 3: Managing and sharing data

Why make data available?

Page 4: Managing and sharing data

Sharing leads to breakthroughs

www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0

“It was unbelievable. It's not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.”

Dr John Trojanowski, University of Pennsylvania

... increases the speed of discovery

Page 5: Managing and sharing data

Returns for institutions

“If an institution spent A$10 million on data, what would be the return? The answer is: more publications; an increased citation count; more grants; greater profile; and more collaboration.”

Dr Ross Wilkinson, ANDS www.ariadne.ac.uk/issue72/oar-2013-rpt

Page 6: Managing and sharing data

Researchers get a citation boost

“Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression.”

Piwowar, H., Day, R. and Fridsma, D. (2007) Sharing detailed research data is associated with increased citation rate. DOI: 10.1371/journal.pone.0000308

Page 7: Managing and sharing data

But, there are also barriers...

Who owns the data?

• Researchers?
• University?
• Commercial partners?
• Funders?
• …

People are often misinformed about who owns the data. Ownership is particularly hard to determine in international projects or in projects involving industry.

Restrictions on sharing

• Patentable data
• Commercial sensitivities
• Personal, identifiable data
• Lack of consent
• …

There are legitimate reasons to agree embargo periods, impose conditions, or share only some of the data. However, these are often given as reasons not to share data at all.

www.dcc.ac.uk/sites/default/files/documents/events/workshops/IHW-2013/UKDA-barriers-to-data-sharing.pdf

Page 8: Managing and sharing data

And opportunity costs

By Emilio Bruna http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690

For his most recent paper:

1. Double-checking the main dataset and reformatting to submit to Dryad: 5 hours

2. Creating complementary file and preparing metadata: 3 hours

3. Submission of these two files and the metadata to Dryad: 45 minutes

4. Preparing a map of the locations: 1 hour

5. Submission of map to Figshare: 15 minutes

6. Cleaning up and documenting the code, uploading it to GitHub: 25 hours

7. Cost of archiving in Dryad: US$90

8. Page charges: $600
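The items above add up to the figures in the title of Bruna's post (35 hours, 690). The following is a minimal Python sketch of that tally, assuming both fees are in US dollars (the page charge is listed without a currency) and converting minutes to decimal hours. The variable names are illustrative only.

```python
# Itemised opportunity costs as listed on the slide (Emilio Bruna's breakdown).
# Minutes are converted to decimal hours: 45 min = 0.75 h, 15 min = 0.25 h.
CODE_TASK = "Cleaning up, documenting and uploading code to GitHub"

time_hours = {
    "Double-checking and reformatting the main dataset for Dryad": 5.0,
    "Creating complementary file and preparing metadata": 3.0,
    "Submitting the two files and metadata to Dryad": 0.75,
    "Preparing a map of the locations": 1.0,
    "Submitting the map to Figshare": 0.25,
    CODE_TASK: 25.0,
}

# Assumption: both fees are in US dollars.
fees_usd = {
    "Dryad archiving": 90,
    "Page charges": 600,
}

total_hours = sum(time_hours.values())
total_fees = sum(fees_usd.values())
# Fraction of the total time spent preparing code rather than data.
code_share = time_hours[CODE_TASK] / total_hours

print(f"Total time: {total_hours:g} hours")
print(f"Total fees: US${total_fees}")
print(f"Share of time spent on code: {code_share:.0%}")
```

Run as a script, it prints a total of 35 hours and US$690, which matches the post's title. Roughly 71% of the time went on the code rather than the data.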

Page 9: Managing and sharing data

What needs to change?

Conclusions from Emilio Bruna:

• Develop a better system of incentives from the community for archiving data and code

• Teach our students how to do this NOW - it’s much easier if you develop good habits early

• Minimise the actual and opportunity costs

We need to stop telling people “You should” and get better at telling people “Here’s how”

Page 10: Managing and sharing data

What is involved in data curation

• Data Management Planning
• Data creation
• Annotating / documenting data
• Analysis, use, versioning
• Storage and backup
• Publishing papers and data
• Preparing for deposit
• Archiving and sharing
• Licensing
• Citing…

(Diagram of data curation stages: Plan, Create, Document, Use, Publish, Share)

Page 11: Managing and sharing data

Data Management Plans

Brief plans to determine how data will be created, managed and shared. DMPs usually cover:

1. Description of data to be collected / created

2. Standards and methodologies for data collection & management

3. Any issues or restrictions due to ethics and intellectual property

4. Plans for data sharing and access

5. Strategy for long-term preservation

DMPs are often submitted as part of grant applications, but are useful whenever you’re creating data.

Page 13: Managing and sharing data

Managing and sharing data: a best practice guide

http://data-archive.ac.uk/media/2894/managingsharing.pdf

Page 14: Managing and sharing data

Training materials

FOSTER project (www.fosteropenscience.eu)

• Open science training
• Courses across the EU
• Portal to OA materials
• Guidance on Horizon 2020

MANTRA (http://datalib.edina.ac.uk/mantra)

• Free online training course
• Aimed at PhD students
• Case studies, quizzes, etc.
• Data handling tutorials:
  – R
  – SPSS
  – ArcGIS
  – NVivo

Page 15: Managing and sharing data

DCC tools catalogue

A catalogue of RDM tools for different audiences. Tools for researchers focus on data handling, managing workflows, citation and impact.

www.dcc.ac.uk/resources/external/tools-services

Page 16: Managing and sharing data

Tools to help with RDM activities

impactstory.org

owncloud.org

thedata.org

www.datacite.org

dataup.cdlib.org

www.myexperiment.org

www.taverna.org.uk

www.labtrove.org

Documentation & metadata

Workflow management

Storage & collaboration

Citation & impact

Page 17: Managing and sharing data

Metadata standards catalogue

Use standards wherever possible for interoperability

www.dcc.ac.uk/resources/metadata-standards

Page 18: Managing and sharing data

Data repositories

http://databib.org

http://service.re3data.org/search

Page 19: Managing and sharing data

1. How do you foster open science?

• Make it feasible to comply – provide tools and infrastructure

• Train people early in their careers

• Incentivise openness

• Listen to researchers and learn from their experience about what doesn’t work

• Follow up on any demands made in policies

Page 20: Managing and sharing data

2. Who is responsible for providing infrastructure and support?

• Funders
• Discipline
• Institution
• Third-party services
• National provider

Data centres e.g. via NERC

Institutional support for discipline-specific tools e.g. Monash MeRC partnership on tools like OMERO

National brokerage of deals with third-party providers e.g. Jisc Janet deals with Arkivum

And what about co-ordination?

Page 22: Managing and sharing data

Thanks – any questions?

DCC guidance, tools and case studies: www.dcc.ac.uk/resources

Follow us on twitter: @digitalcuration and #ukdcc