towards fairness in research data - vr.se · •discover research datasets via etsin and b2find and...

28
CSC – Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus CSC – Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Towards FAIRness in research data Per Öster, 3 October 2018

Upload: others

Post on 30-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

CSC – Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskusCSC – Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus

Towards FAIRness in research data

Per Öster, 3 October 2018

Page 2: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

2

Towards FAIR data!

Anyone expecting that an advertising company or a retailer of merchandise and media will take us there?

Page 3: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

17.10.20183

Page 4: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Secure Compute Clouds

Supporting sample

logistics

• Federated Authentication

• Authorization

• Dataset registry

• Data transfer hub

• Policy and Legal Framework

Services and

Coordination

High speed encrypted data

transfer

GridFTP/Globus/Aspera

Secure data access remote API

( GA4GH )

Sequencing centers

Data

Users

EGA

at

Data Archiving

Bringing users

to data

Data Generation

Managing Access

Data Owner

Data Access Agreement

Data Access Committee

Data Request

Authorization Management Tools

( EGA and CSC REMS )

4

Page 5: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

From field measurements to open data

17.10.20185

Questions:

Sensitive data?

Requirements on Authentication and authorization?

Page 6: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Instrument

Measuring PCs:Raw data at the stations

File servers at stations:Raw data and field diaries, cal documents

File servers in Helsinki:Raw and intermediate data,documents, scripts

SMEAR database: Processed data in Helsinki

ICOS, EBAS,... databases: Near real time and processed data outside UH

Routine data processing = (- unit conversion)- calibration correction- quality check, gapfilling- averaging over space or time

SMEARdata flow

A/D conversionunit conversion

IDA (CSC data service):Raw data & document archive, database datasets

Field documentation

Researchers,Data processing server

Feedback on data quality

Metadata

Metadata

Metadata

17.10.20186

https://avaa.tdata.fi/web/smart/smear

Page 7: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

10/17/20187

Analysis

Publicat ion

Review Conceptualisat ion

Data gathering

Open access

Scient ific blogs Collaborat ive

bibliographies

Alternat ive Reputat ion

system s

Cit izens science

Open code

Open w orkflow s

Open annotat ion

Open data

Pre-print

Data-intensive

2!

Sci-starter.com

Runmycode.org

ArXiv

Roar.eprints.org

Impact Story

Altmetric.com

Mendeley.com Academia.edu

Researchgate.com

Openannotation.org

Datadryad.org

Myexperiment.org

Figshare.com

An#emerging#ecosystem#of#services#and#standards#

I t 's real!

Page 8: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Support in All Phases of Research Process

20168

PlanCustomer PortalExpertsGuidesWebsitesTrainingService Desk

Produce & Collect

DataInternational resourcesModellingSoftwareSupercomputers

AnalyseCloud ServicesTrainingData scienceComputingSoftware

StoreB2SAFEB2SHAREHPC ArchiveIDADatabasesResearch long-term preservation (LTP)

Share & Publish

AVAAB2DROPB2SHAREDatabankEtsinFunetFileSender

Page 9: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Termination. Company may terminate your access to all or any part of the Service at any time, with or without cause, with or without notice, effective immediately, which may result in the forfeiture and destruction of all information associated with your account, including User Submissions. If you wish to terminate your account, you may do so by following instructions available on the Site. Any fees paid hereunder are non-refundable. All provisions of the Terms of Use which by their nature should survive termination shall survive termination, including, without limitation, ownership provisions, warranty disclaimers, indemnity and limitations of liability.

Page 10: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

ARTICLE 2: DISCLAIMER1. The Service is provided "as is" and the Provider disclaims any and all

representations and warranties, whether express or implied, including;- but not limited to;- implied warranties of title, merchantability, fitness for any particular purpose or non-infringement. The Provider does not promise any specific results, effects or outcome from the use of the Service.

2. …3. The Provider reserves the right to change, reduce, interrupt or discontinue

the Service or parts of it at any time.4. No one has a right to use the Service; the Provider reserves the right to

exclude certain Users.

Page 11: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Are the commercial services sufficient?

• Nice complement but can not serve as the fundamental infrastructure for

research data of national and international interest

• Need for publicly funded and operated infrastructure

17.10.201811

e-ScienceDataFactory

Page 12: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s
Page 13: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

THE FAIR DATA PRINCIPLES

• FINDABLE:

o Data are assigned a globally unique and eternally persistent identifier.

o Data are described with rich metadata.

o (Meta)data are registered or indexed in a searchable resource.

o metadata specify the data identifier.

• ACCESSIBLE:o (Meta)data are retrievable by their identifier using a standardized communications protocol.

o The protocol is open, free, and universally implementable.

o The protocol allows for an authentication and authorization procedure, where necessary.

o Metadata are accessible, even when the data are no longer available.

• INTEROPERABLE

o (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

o (Meta)data use vocabularies that follow FAIR principles.

o (Meta)data include qualified references to other (meta)data.

• RE-USABLE:

o Meta(data) have a plurality of accurate and relevant attributes.

o Meta(data) are released with a clear and accessible data usage license.

o Meta(data) are associated with their provenance.

o Meta(data) meet domain-relevant community standards.

17.10.201813

https://www.force11.org/group/fairgroup/fairprinciples

Page 14: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

14

• Described in relevant catalog with enough detail

• Landing page with globally unique identifierF• Can be retrieved over the internet

• Versioning and lifecycle documented

• Tombstone page if data is deletedA

• Common, documented, and open formatsI

• Well documented and intelligible

• Rights clearly statedRRE

SE

AR

CH

DA

TA FINDABLE

ACCESSIBLE

INTEROPERABLE

RE-USABLE

Page 15: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Plan

Discover,produce& collect

Analyse

Store

Share

Re-use

• Plan data management with DMPTuuli

• Discover research datasets via Etsin and B2Find

and make sure your data is discoverable, too

• Store data needed in analysis in CSC’s user

directories or within your cloud environment

• Store stable data in IDA, B2SHARE, HPC Archive

or in dedicated databases (Kaivos etc.)

• Share stored data via B2SHARE or IDA, send large

files with FUNET FileSender

• Reuse open research datasets: geo-informatics

data (PaiTuli), speech and text corpora (Language

Bank) and from various fields of science (AVAA)

Services to support the data lifecycle

Page 16: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

The EUDAT Service Suite

Page 17: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s
Page 18: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

https://research.csc.fi/data-management-and-analytics

Services to support the data lifecycle

Page 19: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Fairdata services

• IDA – store research data

• ETSIN – find research data

• QVAIN – describe metadata

• FAIRDATA-PAS – Preserve research outputs

• In addition (1) metadata repository and (2)

authentication solution

• Authentication, ontology, and other concurrent

services

Page 20: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Components

EtsinQvainIDAFairdata PASmanagement

interface

AAIFairdata PAS

packaging service

PASsolution Metax Metadata repository

Sto

rag

e

Ava

ilab

ility

Man

age

me

nt

De

scri

bti

ng

Front end

Back end

Data producer

Data user Data

preserver

Page 21: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Persistent Identifiers

Metadata repository

Resolver

PID

Data file

License

Configuration files

Read me

Landing page

Page 22: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Operational data

• Active data

• Not citable

• Dynamic, volatile

22

Page 23: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Published data

• Static and persistent

• Assigned a PID

• Product of specific research

• Possibly low reusability

• Needed for review and transparence

23

Page 24: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Generic Research data

• Validated by and for researchers

• Possible to cite

• Documented (metadata) and versioned

• Can be dynamic or cumulative

24

Page 25: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

25

Page 26: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

The Target

Local- Storage- Backup- Computing

National- Sharing- Long term preservation

International- Sharing- Collaboration- Publishing

26

Page 27: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

Acknowledgment

• Timo Vesala, INAR RI, Helsinki University

• Ville Tenhunen, Helsinki Univeristy

• Jessica Parland-von Essen, CSC and Fairdata.fi

• Tommi Nyrönen, ELIXIR-Finland Head of Node, CSC

• Mikael Linden, CSC

• Damien Lecarpentier, EUDAT, CSC

17.10.201827

Page 28: Towards FAIRness in research data - vr.se · •Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too •Store data needed in analysis in CSC’s

facebook.com/CSCfi

twitter.com/CSCfi

youtube.com/CSCfi

linkedin.com/company/csc---it-center-for-science

Kuvat CSC:n arkisto ja Thinkstock

github.com/CSCfi

CSC – IT Center for Science Ltd

Per Öster

Director, Research Infrastructures & Policy

[email protected]