Publishing the Full Research Data Lifecycle

Anita de Waard, VP Research Data Collaborations, Elsevier RDM Services
[email protected]
May 20, 2016
Publishing The Full Research Cycle To Support Open Science
Container Strategies for Data & Software Preservation that Promote Open Science, Notre Dame, IN



Page 3: Publishing the Full Research Data Lifecycle


Collaborate and Analyse: Hivebench

www.hivebench.com

Page 4: Publishing the Full Research Data Lifecycle

Manage, Store, Preserve:

Data Rescue: Preserving Data At Risk

http://www.codata.org/task-groups/data-at-risk/dar-workshops

Software Rescue: Preserving Executable Content

https://olivearchive.org/

Page 5: Publishing the Full Research Data Lifecycle

Manage, Store, Preserve: Mendeley Data

https://data.mendeley.com/

• Linked to published papers – or not
• Linked to GitHub – or not
• Versioning and provenance
• Allowing different licenses

Page 6: Publishing the Full Research Data Lifecycle

Share, Publish: Research Elements

[Diagram labels: Data articles, Software articles, Method articles, Protocols, Video articles, Hardware articles, Lab resources, Full Research paper]

• Brief article types designed to communicate a specific element of the research cycle
• Complementary to full research papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and fully citable
• Allow citable post-publication updates
• Primarily Open Access (CC-BY)
• Published in multidisciplinary and domain-specific journals

https://www.elsevier.com/books-and-journals/research-elements

Page 7: Publishing the Full Research Data Lifecycle

Share, Publish: SoftwareX

http://www.journals.elsevier.com/softwarex/

• Submissions to SoftwareX are composed of:
- A short article describing the software, with a focus on the impact of the software in the research community and re-usability across disciplines
- A “metadata table” containing information about the software and key metrics
- A permanent link to a software repository (GitHub) where the software and code are stored and maintained by Elsevier and made freely available
• Peer Review
- Follows a simple reviewer questionnaire, available from the SoftwareX website, that evaluates usability and scientific impact of the software
- Less attention is placed on the technical quality of the software

Page 8: Publishing the Full Research Data Lifecycle

Share, Publish: SoftwareX

[Workflow diagram. Labels: data uploaded on Mendeley Data; code/software deposited to GitHub; software article submitted to SoftwareX; peer-review process; accepted; software article published, live stats shown; code/software forked to the journal GitHub repository (open source); data is publicly available on Mendeley Data (CC-BY); software updates; metadata; bi-directional links; CC-BY linked]

Page 9: Publishing the Full Research Data Lifecycle


Discover: Datasearch

http://datasearchdemo.elsevier.com/indexed#

Page 10: Publishing the Full Research Data Lifecycle

Reuse: Reproducibility Papers

• The first Reproducibility Paper was published recently: http://www.sciencedirect.com/science/article/pii/S0306437915301113
• It is linked to this paper: http://www.sciencedirect.com/science/article/pii/S0306437915000472
• The data is hosted here: https://data.mendeley.com/datasets/xz6gv65m6d/6
• To reproduce the experiment, the journal requires source code for the software components, together with installation scripts; we suggest that authors host their code on GitHub
• In addition to the source code, we recommend that authors submit a virtual machine in which all appropriate software components are already installed, so that the experiment can be reproduced on a wide variety of platforms. Authors are to submit their experiments using either ReproZip or Docker.
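
The submission requirements on this slide can be read as a checklist. The following is only a minimal sketch in Python, assuming a hypothetical `Submission` record; the class, its field names and the `check_submission` helper are illustrative and are not part of any journal system. It separates what the slide says is required (source code, installation scripts, a ReproZip or Docker package) from what is suggested or recommended (GitHub hosting, a virtual machine).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Packaging tools named on the slide.
ACCEPTED_PACKAGERS = {"reprozip", "docker"}


@dataclass
class Submission:
    """Hypothetical description of what an author provides (names are assumptions)."""
    source_code_url: Optional[str] = None
    has_install_scripts: bool = False
    packager: Optional[str] = None        # "ReproZip" or "Docker"
    has_virtual_machine: bool = False


def check_submission(sub: Submission) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): errors for required items, warnings for recommended ones."""
    errors: List[str] = []
    warnings: List[str] = []
    if not sub.source_code_url:
        errors.append("source code is required")
    elif "github.com" not in sub.source_code_url:
        warnings.append("hosting the code on GitHub is suggested")
    if not sub.has_install_scripts:
        errors.append("installation scripts are required")
    if (sub.packager or "").lower() not in ACCEPTED_PACKAGERS:
        errors.append("package the experiment with ReproZip or Docker")
    if not sub.has_virtual_machine:
        warnings.append("a virtual machine image is recommended")
    return errors, warnings


# Placeholder repository URL, for illustration only.
complete = Submission("https://github.com/example/repo", True, "Docker", True)
print(check_submission(complete))  # ([], [])
```

Keeping errors and warnings apart mirrors the slide's wording, which distinguishes "requires" from "suggest" and "recommend".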

Page 11: Publishing the Full Research Data Lifecycle

Discover, Reuse and Cite:

• ICSU-WDS/RDA Publishing Data Service Working group, merged with National Data Service pilot
• Cross-stakeholder, with support and input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
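
To make the idea of a data-literature link concrete, here is a minimal sketch in Python of what one link record and a lookup over a collection of links might look like. This is an assumption-laden illustration: the class names, field names and identifiers below are hypothetical placeholders, and they do not reproduce the Scholix schema or the API of the prototype service.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScholarlyObject:
    identifier: str   # e.g. a DOI (placeholder values are used below)
    kind: str         # "literature" or "dataset"


@dataclass(frozen=True)
class Link:
    """One assertion that two objects are related (hypothetical structure)."""
    source: ScholarlyObject
    target: ScholarlyObject
    provider: str     # who contributed the link, e.g. a publisher or a repository


def links_for(identifier: str, links: List[Link]) -> List[Link]:
    """Find every link that touches an object, whichever end it sits on."""
    return [
        link for link in links
        if identifier in (link.source.identifier, link.target.identifier)
    ]


# Placeholder identifiers, not real DOIs.
paper = ScholarlyObject("doi:10.1234/example-paper", "literature")
dataset = ScholarlyObject("doi:10.1234/example-dataset", "dataset")
links = [Link(paper, dataset, provider="example-publisher")]

# The same link is found starting from the paper or from the dataset.
print(links_for(paper.identifier, links) == links_for(dataset.identifier, links))  # True
```

Looking a link up from either end is what lets a reader move from a paper to its data and from a dataset back to the papers that use it.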

Page 12: Publishing the Full Research Data Lifecycle


Discover, Reuse and Cite:

https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier

Page 13: Publishing the Full Research Data Lifecycle

Publishing The Full Research Cycle Requires Networks of Collaboration:

Force11:
- Multi-stakeholder, member-driven organisation
- Unites scholars, tool developers, librarians, publishers, funding agencies, etc.
- E.g. Software citation group, akin to Data Citation Group

National Data Service:
- Multi-stakeholder group, based around supercomputing centres
- Aims to be a ‘connective tissue’ between data creation, curation, storage, etc. projects
- Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need
- E.g. Datasearch, Data Linking systems

RDA:
- Co-leading Data publishing, linking group
- Co-leading Cost Recovery group, part of RDA US Sustainability effort
- Active in Chemistry, Earth Science groups, starting IG on Data Search
- SciDataCon, Sept 11-16, Denver, CO

Page 14: Publishing the Full Research Data Lifecycle

• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data

Anita de Waard, [email protected]

Thank you! Questions?

Page 15: Publishing the Full Research Data Lifecycle

Share and Publish, Current Status:

[Diagram nodes: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered 1 to 4 to match the steps]

1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes,) dataset gets posted to repository
4. Researcher reports (post-hoc) to Institution and Funder

Page 16: Publishing the Full Research Data Lifecycle

Share and Publish, Issues:

[Diagram nodes: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered 1 to 4]

i. Too much work for researchers
ii. Data posting not mandatory
iii. No link between data and paper
iv. Funders/Institutions informed as an afterthought

Page 17: Publishing the Full Research Data Lifecycle

Share and Publish, Proposal:

[Diagram nodes: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered 1 to 4 to match the steps]

1. Researcher creates datasets and posts to repository (under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
- NB this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting

i. Less Work!
ii. More Data Stored!
iii. Better Linking!
iv. Better Tracking!
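
The proposed workflow can be sketched as a small state model. This is a minimal sketch in Python under stated assumptions: the `Dataset` and `Repository` classes, their method names and the notification strings are all hypothetical, chosen only to mirror the four steps on the slide, and do not describe any existing repository system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    title: str
    embargoed: bool = True
    linked_paper: Optional[str] = None   # identifier of the paper, once published


@dataclass
class Repository:
    """Hypothetical repository that notifies stakeholders as a side effect."""
    datasets: List[Dataset] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)

    def deposit(self, dataset: Dataset) -> None:
        # Step 1: the dataset is posted under embargo.
        dataset.embargoed = True
        self.datasets.append(dataset)
        # Step 2: the funder is notified automatically.
        self.notifications.append(f"funder: dataset deposited: {dataset.title}")

    def publish_paper(self, dataset: Dataset, paper_id: str) -> None:
        # Step 3: publication lifts the embargo and links data to paper.
        dataset.embargoed = False
        dataset.linked_paper = paper_id
        # Step 4: funder and institution both receive a report.
        for recipient in ("funder", "institution"):
            self.notifications.append(
                f"{recipient}: paper {paper_id} published; embargo lifted on {dataset.title}"
            )


repo = Repository()
data = Dataset("example measurements")          # placeholder title
repo.deposit(data)
repo.publish_paper(data, "example-paper-id")    # placeholder identifier
print(data.embargoed, data.linked_paper, len(repo.notifications))  # False example-paper-id 3
```

Because notification and linking happen inside the repository calls, the researcher does no separate reporting, which is the "less work" and "better tracking" argument of the proposal.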

Page 18: Publishing the Full Research Data Lifecycle

A Maslow Hierarchy for Research Data:

https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data