publishing the full research data lifecycle
| 1
Anita de Waard, VP Research Data Collaborations, Elsevier RDM [email protected]
May 20, 2016
Publishing The Full Research Cycle To Support Open Science
Container Strategies for Data & Software Preservation that Promote Open Science, Notre Dame, IN
| 2
Source: JISC, "How and why you should manage your research data: a guide for researchers", Caroline Ingram, published 7 January 2016
Research Data Life Cycle:
| 4
Manage, Store, Preserve:
Data Rescue: Preserving Data At Risk
http://www.codata.org/task-groups/data-at-risk/dar-workshops
Software Rescue: Preserving Executable Content
https://olivearchive.org/
| 5
Manage, Store, Preserve: Mendeley Data
https://data.mendeley.com/
• Linked to published papers – or not
• Linked to GitHub – or not
• Versioning and provenance
• Allowing different licenses
| 6
Share, Publish: Research Elements
https://www.elsevier.com/books-and-journals/research-elements
Article types: Data articles, Software articles, Method articles, Protocols, Video articles, Hardware articles, Lab resources, Full Research paper
• Brief article types designed to communicate a specific element of the research cycle
• Complementary to full research papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and fully citable
• Allow citable post-publication updates
• Primarily Open Access (CC-BY)
• Published in multidisciplinary and domain-specific journals
| 7
Share, Publish: SoftwareX
http://www.journals.elsevier.com/softwarex/
• Submissions to SoftwareX are composed of:
- A short article describing the software, with a focus on the impact of the software in the research community and re-usability across disciplines
- A “metadata table” containing information about the software and key metrics
- A permanent link to a software repository (GitHub) where the software and code are stored and maintained by Elsevier and made freely available
• Peer review:
- Follows a simple reviewer questionnaire, available from the SoftwareX website, that evaluates usability and scientific impact of the software
- Less attention is placed on the technical quality of the software
| 8
Share, Publish: SoftwareX
[Workflow diagram. Labels: code/software deposited to GitHub; data uploaded on Mendeley Data; software article submitted to SoftwareX; peer-review process; accepted; software article published, live stats shown; code/software forked to the journal GitHub repository (open source); data is publicly available on Mendeley Data (CC-BY); metadata; bi-directional links; CC-BY; linked; software updates]
| 9
Discover: Datasearch
http://datasearchdemo.elsevier.com/indexed#
| 10
Reuse: Reproducibility Papers
• The first Reproducibility Paper was published recently: http://www.sciencedirect.com/science/article/pii/S0306437915301113
• It is linked to this paper: http://www.sciencedirect.com/science/article/pii/S0306437915000472
• The data is hosted here: https://data.mendeley.com/datasets/xz6gv65m6d/6
• To reproduce the experiment, the journal requires source code for the software components, together with installation scripts; we suggest that authors host their code on GitHub
• In addition to the source code, we recommend that authors submit a virtual machine in which all appropriate software components are readily installed, so that the experiment can be reproduced on a wide variety of platforms. Authors are to submit their experiments using either ReproZip or Docker.
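The submission requirements on this slide can be read as a simple checklist. The sketch below is only an illustration of that reading: the `ReproSubmission` record, its field names, and the `missing_items` helper are hypothetical and are not part of any journal system; only the required items (source code, installation scripts, ReproZip or Docker packaging) come from the slide.

```python
from dataclasses import dataclass

# Packaging formats named on the slide; everything else here is hypothetical.
ACCEPTED_ENVIRONMENTS = {"reprozip", "docker"}


@dataclass
class ReproSubmission:
    """Hypothetical record of what an author supplies with a reproducibility paper."""
    has_source_code: bool
    has_install_scripts: bool
    code_url: str = ""      # e.g. a GitHub repository (suggested, not required)
    environment: str = ""   # format used to package the experiment, if any


def missing_items(sub: ReproSubmission) -> list[str]:
    """Return human-readable gaps; an empty list means the checklist passes."""
    gaps = []
    if not sub.has_source_code:
        gaps.append("source code for the software components")
    if not sub.has_install_scripts:
        gaps.append("installation scripts")
    if sub.environment.lower() not in ACCEPTED_ENVIRONMENTS:
        gaps.append("experiment packaged with ReproZip or Docker")
    return gaps


if __name__ == "__main__":
    draft = ReproSubmission(
        has_source_code=True,
        has_install_scripts=False,
        code_url="https://github.com/example/experiment",  # placeholder URL
        environment="Docker",
    )
    for gap in missing_items(draft):
        print("missing:", gap)
```

The GitHub URL is optional in this sketch because the slide only suggests it, while the packaging format is checked because the slide states it as a requirement.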
| 11
Discover, Reuse and Cite:
• ICSU-WDS/RDA Publishing Data Service Working Group, merged with National Data Service pilot
• Cross-stakeholder, with support and input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
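To make the idea of aggregated data-literature links concrete, here is a minimal sketch of an index that stores links from several providers and answers lookups from either the paper or the dataset side. It is an assumption-laden illustration: the `Link` fields are a simplification and not the Scholix schema, `LinkIndex` is a hypothetical name, and the identifiers in the usage example are placeholders rather than real DOIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Simplified, hypothetical data-literature link (not the Scholix schema)."""
    source: str    # identifier of one object, e.g. a paper
    target: str    # identifier of the other object, e.g. a dataset
    provider: str  # who asserted the link (publisher, repository, ...)


class LinkIndex:
    """Stores each link once and answers lookups from either end."""

    def __init__(self) -> None:
        self._by_id = defaultdict(set)

    def add(self, link: Link) -> None:
        # Index under both identifiers so the link is discoverable
        # from the paper and from the dataset.
        self._by_id[link.source].add(link)
        self._by_id[link.target].add(link)

    def related(self, identifier: str) -> set:
        """Return identifiers linked to `identifier`, in either direction."""
        out = set()
        for link in self._by_id.get(identifier, ()):
            out.add(link.target if link.source == identifier else link.source)
        return out


if __name__ == "__main__":
    index = LinkIndex()
    # Placeholder identifiers; two providers assert the same paper-dataset link.
    index.add(Link("doi:10.0000/paper-1", "doi:10.0000/dataset-1", "publisher"))
    index.add(Link("doi:10.0000/paper-1", "doi:10.0000/dataset-1", "repository"))
    print(index.related("doi:10.0000/dataset-1"))
```

Keeping the provider on each link preserves where an assertion came from, while `related` collapses duplicate assertions from different sources into a single result.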
| 12
Discover, Reuse and Cite:
https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier
| 13
Publishing The Full Research Cycle Requires Networks of Collaboration:
Force11:
- Multi-stakeholder, member-driven organisation
- Unites scholars, tool developers, librarians, publishers, funding agencies, etc.
- E.g. Software citation group, akin to Data Citation Group
National Data Service:
- Multi-stakeholder group, based around supercomputing centres
- Aims to be a ‘connective tissue’ between data creation, curation, storage, etc. projects
- Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need
- E.g. Datasearch, Data Linking systems
RDA:
- Co-leading Data publishing, linking group
- Co-leading Cost Recovery group, part of RDA US Sustainability effort
- Active in Chemistry, Earth Science groups, starting IG on Data Search
- SciDataCon, Sept 11-16, Denver, CO
| 14
Thank you! Questions?
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data
Anita de Waard, [email protected]
| 15
Share and Publish, Current Status:
[Diagram. Actors and objects: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper]
1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes,) dataset gets posted to repository
4. Researcher reports (post-hoc) to Institution and Funder
| 16
Share and Publish, Issues:
[Diagram. Actors and objects: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper]
i. Too much work for researchers
ii. Data posting not mandatory
iii. No link between data and paper
iv. Funders/Institutions informed as an afterthought
| 17
Share and Publish, Proposal:
[Diagram. Actors and objects: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper]
1. Researcher creates datasets and posts to repository (under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
- NB this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting
i. Less Work!
ii. More Data Stored!
iii. Better Linking!
iv. Better Tracking!
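The four proposed steps can be sketched as a small workflow model. This is a minimal illustration under stated assumptions, not a description of any existing system: the `Dataset` and `Workflow` classes, their fields, and the notification messages are all hypothetical, and notifications are simply collected in a list instead of being sent anywhere.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical dataset record in a repository."""
    title: str
    embargoed: bool = True              # deposits start under embargo
    linked_paper: Optional[str] = None  # set when the paper is published


@dataclass
class Workflow:
    """Sketch of the four proposed steps; notifications are collected in a list."""
    notifications: list = field(default_factory=list)

    def deposit(self, title: str) -> Dataset:
        # Step 1: dataset is posted to the repository under embargo.
        dataset = Dataset(title)
        # Step 2: the funder is notified automatically at deposit time.
        self.notifications.append(("funder", f"dataset deposited: {title}"))
        return dataset

    def publish_paper(self, dataset: Dataset, paper_id: str) -> None:
        # Step 3: publication lifts the embargo and links data to paper.
        dataset.embargoed = False
        dataset.linked_paper = paper_id
        # Step 4: funder and institution both receive a report.
        for recipient in ("funder", "institution"):
            self.notifications.append(
                (recipient, f"published {paper_id}; embargo lifted: {dataset.title}")
            )


if __name__ == "__main__":
    wf = Workflow()
    ds = wf.deposit("survey-data")        # placeholder dataset title
    wf.publish_paper(ds, "paper-1")       # placeholder paper identifier
    for recipient, message in wf.notifications:
        print(recipient, "<-", message)
```

In this sketch, reporting is a side effect of deposit and publication, so the researcher performs no separate reporting step, which is the "Less Work!" and "Better Tracking!" point of the proposal.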
| 18
A Maslow Hierarchy for Research Data:
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data