data sharing with dataverse

18
Data Publication and Preservation Panel FNLM's Workshop, Changing Publication Practices in the COVID-19 Era October 28, 2020 Mercè Crosas, Ph.D., Harvard University scholar.harvard.edu/mercecrosas @mercecrosas Data Sharing with Dataverse

Upload: others

Post on 22-May-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Sharing with Dataverse

Data Publication and Preservation PanelFNLM's Workshop, Changing Publication Practices in the COVID-19 Era

October 28, 2020

Mercè Crosas, Ph.D., Harvard Universityscholar.harvard.edu/mercecrosas @mercecrosas

Data Sharing with Dataverse

Page 2: Data Sharing with Dataverse

Data Repositories becoming part of the Research Lifecycle

Data from companies, hospitals, services

Data from government

Data collected for research (experiments,

observations, etc)

Active Research(Data cleaning,

processing, analysis, modeling)

Publish or Disseminate Research Outputs

Journals,Preprint and OA

Repositories

Data Repositories(institutional,

global, domain, generalist)

Article

Data used in Research Study(+ Code, Workflows,Provenance, Documentation, Containers)Reuse and Reproducibility

Input Output

Page 3: Data Sharing with Dataverse

Data Repositories becoming part of the Research Lifecycle

Data from companies, hospitals, services

Data from government

Data collected for research (experiments,

observations, etc)

Active Research(Data cleaning,

processing, analysis, modeling)

Publish or Disseminate Research Outputs

Journals,Preprint and OA

Repositories

Data Repositories(institutional,

global, domain, generalist)

Article

Data used in Research Study(+ Code, Workflows,Provenance, Documentation, Containers)Reuse and Reproducibility

Input Output

Page 4: Data Sharing with Dataverse

Dataverse powers Data Repositories around the world

● Open-source software for data repositories

● 62 Dataverse repositories world-wide

● A community of users and contributors

● 100K searchable datasets, 800K files

● 20M file downloads

● Open to all researchers and research fields

(dataverse.org) Harvard Dataverse (dataverse.harvard.edu)

Page 5: Data Sharing with Dataverse

Key Dataverse Features

Page 6: Data Sharing with Dataverse

Organization of a Dataverse Repository

Dataverse dataset File

Collection of datasetsOwn administrationOwn branding (& can be embedded in your site)

CitationMetadataVersioningTerms/permissionsCollection of Files

CitationPreview/ExploreMetadataVersioningPermissions

Page 7: Data Sharing with Dataverse

Features for FAIR and Responsible Data Sharingü Data Citation with DOI for datasets and files

ü Credit to data authors

ü Link from data to related article

ü Standards-based and custom metadata

ü DDI, Schema.org, DataCite, Dublin Core, OAI-ORE, OpenAire, JSON-LD

ü Access controls (open vs guestbook vs restricted ) with licenses and terms of use

ü Versioning and provenance

ü Descriptive Statistics generated from variables in tabular data files

ü Conversion to multiple formats of tabular data files

ü Flexible upload of large data files (>> 5GB): Web UI, API, Standalone Client

ü Integration with external tools through extensive API

ü Make Data Count (coming soon)

Page 8: Data Sharing with Dataverse

Data Citation, with DataCite DOI, fully compliant with Force11 Joint Declaration of Data Citation Principles

FINDABILITY:Full, standard data citation automatically generated

Page 9: Data Sharing with Dataverse

Rich support for Metadata Standards in human- and machine-readable formats.

FINDABILITY AND REUSABILITY: Support for multiple metadata standards

Page 10: Data Sharing with Dataverse

Optional request accessfeature for restricted data

ACCESSIBILITY:Access terms available for restricted data

Page 11: Data Sharing with Dataverse

Deaccession reason in dataset landing page when data not longer available

ACCESSIBILITY:Metadata always available

Page 12: Data Sharing with Dataverse

Extensive variable metadata (descriptive statistics) automatically derived from tabular data file in DDI format

INTEROPERABLE: Rich statistics metadata derived from each variable in data file

Page 13: Data Sharing with Dataverse

FAIR Controlled Vocabularies can be added in metadata template

INTEROPERABLE: Enable use of standard vocabularies

Page 14: Data Sharing with Dataverse

• CC0 as default waiver for open data;• Optional Licenses and custom Terms;• Tiered-access for restricted data

REUSABLE: Licenses, Terms, and Tiered-Access to Data

Page 15: Data Sharing with Dataverse

What’s Next

Page 16: Data Sharing with Dataverse

Support for Sensitive Data in Dataverse

Non-Sensitive DataTags (TODAY)

● Data uploaded to Dataverse via one of the current options

● Stored locally

Sensitive DataTags (FUTURE RELEASE)

● Stored in a Trusted Remote Storage Agent,accessed through notary service

● Metadata published in Dataverse

Blue

Green

Yellow

Orange

Red

CrimsonOnly metadata and no link to data; data stored outside network (maximum sensitivity)

Publicly open, no barriers

Publicly open, but need to register to access

Restricted, need to be granted permissions, but non-sensitive

Requires Data Use Agreement (DUA); requires data enclave (moderate sensitivity)Requires DUA; stricter security requirements and audits (high sensitivity)

Page 17: Data Sharing with Dataverse

Dataverse + Impact + OpenDP

Dataset and file metadata

Data Use Agreement (DUA)Set by Data Owner

Find sensitive dataset

Notary Service

Data User

Public Repository

Blue Green Yellow

Sensitive Data FilesLarge Data Files

Trusted Remote Storage Agents (TRSA) or data enclaves

Orange Red

Secure Compute Environment to run Differentially Private statistics

Orange Red

Differentially private statistical release of the data

Page 18: Data Sharing with Dataverse

Data Publication and Preservation PanelFNLM's Workshop, Changing Publication Practices in the COVID-19 Era

October 28, 2020

Mercè Crosas, Ph.D., Harvard Universityscholar.harvard.edu/mercecrosas @mercecrosas

THANKS