Mendeley Data FAIR hackathon
WHAT IS FAIR DATA?
I need data. What should I do?
Findable:
F1. (meta)data are assigned a globally unique and persistent identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the data it describes;
F4. (meta)data are registered or indexed in a searchable resource;
http://www.nature.com/articles/sdata201618
Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol;
A1.1. the protocol is open, free, and universally implementable;
A1.2. the protocol allows for an authentication and authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer available;
■ http://www.nature.com/articles/sdata201618
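Principle A1 can be illustrated with plain HTTP. The following is a minimal sketch, assuming the identifier is an HTTP URL and that the server honours an `Accept` header (an assumption, not something the slides state). `build_metadata_request` is a hypothetical helper name, and the URL is the demo FAIR Data Point address used elsewhere in this deck. The request is only constructed, never sent.

```python
from urllib.request import Request


def build_metadata_request(identifier: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET request for the metadata behind an identifier."""
    # HTTP is an open, free, universally implementable protocol (A1.1).
    # The Accept header asks for a machine-readable serialization (assumed to be supported).
    return Request(identifier, headers={"Accept": media_type}, method="GET")


request = build_metadata_request("http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp")
```

Where authentication is needed (A1.2), an `Authorization` header could be added in the same way.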
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other (meta)data;
■ http://www.nature.com/articles/sdata201618
Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes;
R1.1. (meta)data are released with a clear and accessible data usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community standards;
■ http://www.nature.com/articles/sdata201618
[Figure: FAIR transformation; Analysis transformation]
FAIRNESS LEVELS
Layers shown for every level: PID; Metadata (intrinsic); 'provenance' (user defined); Data (elements)
■ Non-FAIR
■ Findable, Usable for Humans
■ FAIR metadata
■ FAIR data - restricted access
■ FAIR data - Open Access
■ FAIR data - Open Access/Functionally Linked
FAIR DATA TOOLS
FAIR DATA POINT
A particular class of FAIR Data System that provides access to datasets in a FAIR manner. The datasets can be external or internal to the FAIR Data Point. The source data can be either a non-FAIR dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR transformations on the fly.
FAIR Data Resource
non-FAIR Data Resource
FAIR DATA POINT
Client: Who are you? Can I trust you?
FAIR Data Point: Here is information about myself.
FDP Metadata: Who is responsible? FDP license? Description?
Client (reads FDP Metadata): Ok, now that I know you, tell me what you have to offer.
FAIR Data Point: Here is information about my catalog of datasets.
Catalog Metadata
Client (reads Catalog Metadata): Tell me more about your genomic dataset.
FAIR Data Point: This is the information about the genomic dataset.
Dataset Metadata: License? Publisher? Last modified date? Theme?
Client (reads Dataset Metadata): In which forms is the dataset available?
FAIR Data Point: This is the information about the dataset distributions.
Distribution Metadata: Access or download URL? Format? Size? Media type?
Client (reads Dataset Metadata): Tell me more about the dataset content.
FAIR Data Point: This is the information about the data record of the dataset.
Data record Metadata: Types? Domain? Range?
Client (reads Dataset, distribution, data record metadata): Ok, now that I know what you have, give me the data.
FAIR Data Point: Here is my data.
FAIR Data
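The exchange above is a top-down walk through the metadata layers. The sketch below illustrates that walk under stated assumptions: `FAKE_FDP` is a hand-written stand-in for the responses of a FAIR Data Point, the identifiers and titles are made up, and `walk` is a hypothetical helper, not part of any FAIR Data Point API.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for FAIR Data Point responses.
# Keys are identifiers; "children" lists the identifiers of the next layer down.
FAKE_FDP: Dict[str, dict] = {
    "fdp": {"layer": "FDP", "title": "Example FDP", "children": ["fdp/catalog1"]},
    "fdp/catalog1": {"layer": "Catalog", "title": "Example catalog",
                     "children": ["fdp/catalog1/dataset1"]},
    "fdp/catalog1/dataset1": {"layer": "Dataset", "title": "Example dataset",
                              "children": ["fdp/catalog1/dataset1/csv"]},
    "fdp/catalog1/dataset1/csv": {"layer": "Distribution", "title": "CSV distribution",
                                  "children": []},
}


def walk(identifier: str, fetch: Callable[[str], dict]) -> List[str]:
    """Depth-first walk from the FDP metadata down to the distributions."""
    metadata = fetch(identifier)
    visited = [f"{metadata['layer']}: {metadata['title']}"]
    for child in metadata["children"]:
        visited.extend(walk(child, fetch))
    return visited


steps = walk("fdp", FAKE_FDP.__getitem__)
```

Passing `fetch` as a parameter keeps the walk independent of the transport; a real client would replace the dictionary lookup with an HTTP request.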
FAIR DATA POINT - ARCHITECTURE
FAIR API / GUI
Metadata Provider FAIR Accessor
Metrics Gatherer Security Enforcer
FAIR Metadata FAIR Data
FAIR Data Point metadata
Title Responsible institution(s) Contact FAIR API version License …
FDP METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp>
    dct:alternative "DTL FDP"@en ;
    dct:description "The DTL FAIR Data Point hosts the FAIR Data versions of datasets that have been made FAIR during BYODs as well as other relevant life sciences datasets"@en ;
    dct:subject "FAIR Data" , "Life Sciences" ;
    dct:title "DTL FAIR Data Point"@en ;
    <http://www.re3data.org/schema/3-0#api> <http://dtls.nl/fdp#api=1> ;
    <http://www.re3data.org/schema/3-0#catalog>
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/comparativeGenomics> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/patient-registry> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/textmining> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/transcriptomics> ;
    <http://www.re3data.org/schema/3-0#institution> <http://dtls.nl> ;
    <http://www.re3data.org/schema/3-0#institutionCountry> <http://lexvo.org/id/iso3166/NL> ;
    <http://www.re3data.org/schema/3-0#lastUpdate> "2016-10-27"^^xsd:date ;
    <http://www.re3data.org/schema/3-0#software> "FAIR Data Point" ;
    <http://www.re3data.org/schema/3-0#startDate> "2016-10-27"^^xsd:date ;
    a <http://www.re3data.org/schema/3-0#Repository> ;
    rdfs:label "DTL FAIR Data Point"@en ;
    <http://xmlns.com/foaf/0.1/landingpage> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html> .
FAIR Data Point metadata
Catalog metadata
Title Theme taxonomy Issued date …
CATALOG METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank>
    dct:hasVersion "1.0" ;
    dct:identifier "biobank" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:title "Rd connect's biobank catalog"@en ;
    a dcat:Catalog ;
    rdfs:label "Rd connect's biobank catalog"@en ;
    dcat:dataset <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1> ;
    dcat:themeTaxonomy <http://dbpedia.org/resource/Biobank> , <http://edamontology.org/topic_3337> .
FAIR Data Point metadata
Catalog 1 metadata
Dataset metadata
Title Publisher License Theme(s) Version …
DATASET METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1>
    dct:creator <http://orcid.org/0000-0002-1215-167X> ;
    dct:hasVersion "1.0" ;
    dct:identifier "77350-collection1" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-1215-167X> ;
    dct:title "Galliera Genetic Bank"@en ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#dataRecord> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/datarecord/77350-collection1-datarecord-1> ;
    a dcat:Dataset ;
    rdfs:label "Galliera Genetic Bank"@en ;
    rdfs:seeAlso <http://catalogue.rd-connect.eu/web/galliera-genetic-bank/bb_home> ;
    dcat:distribution
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/csv> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/distributionTurtle> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/ldf> ;
    dcat:keyword "Galliera Genetic Bank" , "biobank" ;
    dcat:landingPage <http://ggb.galliera.it> ;
    dcat:theme <http://dbpedia.org/resource/Biobank> , <http://edamontology.org/topic_3337> , <http://www.orpha.net/ORDO/Orphanet_1023> …
FAIR Data Point metadata
Catalog 1 metadata
Dataset 1 metadata
Distribution metadata
Title Media type Download/access URL License …
DISTRIBUTION METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/distributionTurtle>
    dct:description "Ring14 biobank turtle distribution"@en ;
    dct:hasVersion "1.0" ;
    dct:identifier "distributionTurtle" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2016-07-07"^^xsd:date ;
    dct:title "Ring14 biobank turtle distribution"@en ;
    a dcat:Distribution ;
    rdfs:label "Ring14 biobank turtle distribution"@en ;
    dcat:downloadURL <http://semlab1.liacs.nl:8080/rdc-demo-dataset/RING_14_dummy-Biobank.ttl> ;
    dcat:mediaType "text/turtle" .

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/ldf>
    dct:description "Ring14 biobank linked data fragment distribution"@en ;
    dct:hasVersion "1.0" ;
    dct:identifier "ldf" ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:title "Ring14 biobank linked data fragment distribution"@en ;
    a dcat:Distribution ;
    rdfs:label "Ring14 biobank linked data fragment distribution"@en ;
    dcat:accessURL <http://dev-vm.fair-dtls.surf-hosted.nl:5050/ring14-biosample> .
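A client could use the distribution metadata to decide how to fetch the data. The sketch below is illustrative only: it copies the identifiers, media type and URLs of the two distributions above into plain dictionaries (a simplification; a real client would parse the RDF), and `find_download_url` is a hypothetical helper name.

```python
from typing import List, Optional

# Values copied from the two distribution descriptions; the dict layout is an assumption.
DISTRIBUTIONS = [
    {
        "identifier": "distributionTurtle",
        "mediaType": "text/turtle",
        "downloadURL": "http://semlab1.liacs.nl:8080/rdc-demo-dataset/RING_14_dummy-Biobank.ttl",
    },
    {
        "identifier": "ldf",
        "accessURL": "http://dev-vm.fair-dtls.surf-hosted.nl:5050/ring14-biosample",
    },
]


def find_download_url(distributions: List[dict], media_type: str) -> Optional[str]:
    """Return the download URL of the first distribution offered in the given media type."""
    for distribution in distributions:
        if distribution.get("mediaType") == media_type and "downloadURL" in distribution:
            return distribution["downloadURL"]
    return None  # no direct download in that media type


turtle_url = find_download_url(DISTRIBUTIONS, "text/turtle")
csv_url = find_download_url(DISTRIBUTIONS, "text/csv")
```

The `ldf` distribution is skipped because it only offers an access URL (a service endpoint), not a file to download.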
FAIR Data Point metadata
Catalog metadata
Dataset metadata
Distribution metadata
Data record metadata
Type Domain Range …
DATA RECORD METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/datarecord/77350-collection1-datarecord-1>
    dct:hasVersion "1.0" ;
    dct:identifier "77350-collection1-datarecord-1" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-1215-167X> ;
    dct:title "Galliera Genetic Bank datarecord metadata" ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#refersTo> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/csv> ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#rmlMapping> <https://git.lumc.nl/biosemantics/ring14-fdp-metadata/raw/bd01b84fb792ae3860fdda646e9cb96a1a11205c/rml/biobank/RING_14_biobank_mapping.ttl> ;
    a <http://rdf.biosemantics.org/ontologies/fdp-o#DataRecord> ;
    rdfs:label "Galliera Genetic Bank datarecord metadata" .
<#ring14-biobank-id-resource>
    rml:logicalSource <#inputFile>;
    rr:subjectMap [
        rr:template "http://rdf.biosemantics.org/dataset/ring14/resource/identifier/{Sample ID}" ;
        rr:class <http://rdf.biosemantics.org/ontologies/rd-connect/21f6df30_1f72_45fb_bfc1_2b3d1af1410a>
    ];
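The `rr:template` in the mapping fragment above turns one source row into a subject URI by filling in the `{Sample ID}` placeholder. The sketch below shows the idea under stated assumptions: `expand_template` is a hypothetical helper, the row value is made up, and percent-encoding with `urllib.parse.quote` is an approximation of the IRI-safe encoding a real RML processor applies.

```python
import re
from urllib.parse import quote

# Template string copied from the mapping fragment.
TEMPLATE = "http://rdf.biosemantics.org/dataset/ring14/resource/identifier/{Sample ID}"


def expand_template(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the percent-encoded value from one source row."""

    def substitute(match):
        column = match.group(1)
        # Encode everything except unreserved characters so the result stays a valid URI.
        return quote(str(row[column]), safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, template)


row = {"Sample ID": "S 001"}  # made-up row value, not taken from the dataset
subject = expand_template(TEMPLATE, row)
```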
FAIR Data Point metadata
Catalog 2 metadata
Catalog 1 metadata
Dataset 1 metadata
Distribution 1.a metadata
Data record metadata
Distribution 1.b metadata
Dataset 2 metadata
Distribution 2.a metadata
Data record metadata
Distribution 2.b metadata
Dataset 3 metadata
Distribution 3.a metadata
Data record metadata
FAIR DATA POINT
[Figure: FAIR Data Points for Biobank (Biobank Database), Patient Registry, UNIPROT and HPA]
METADATA LAYERS
Layer | Description | Example | Standard
FDP (Data repository) | Information about the FDP as a data repository | PID, title, description, license, owner, API version, etc. | RE3Data
Catalog | Information about the catalog of datasets offered | PID, title, description, publisher, etc. | W3C DCAT #Catalog
Dataset | Information about each of the offered datasets | Publisher, issue date, theme, etc. | W3C DCAT #Dataset
Distribution | Information about how the dataset is distributed | AccessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Distribution
Data record | Information about the actual data, types, identifiers, etc. | Data items types, identifiers, domain, range, etc. | RML
OAI-PMH
DEMO FAIR DATA POINT
API: http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html
GUI: http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/
FAIR DATA POINT
Metadata Provider FAIR Accessor
Metrics Gatherer Security Enforcer
EXISTING DATA REPOSITORIES
Metadata Provider FAIR Accessor
Metrics Gatherer Security Enforcer
EXTENDING EXISTING DATA REPOSITORIES
Metadata Provider FAIR Accessor
Metrics Gatherer Security Enforcer
+
Metadata Provider
FAIR Data Accessor
Metrics Gatherer Access Controller
EUDAT Current Components
Current Solution Components
FAIR DATA ECOSYSTEM
Create Publish Annotate Find
Open RDF Knowledge Annotator (ORKA)
DataFAIRport
Development started in October 2016
FAIR Metadata
metadata index
retrieves metadata
search interfaces (GUI and API)
Development started in October 2016. Based on OpenRefine.
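The search side of the ecosystem is described as retrieving metadata, building a metadata index and exposing search interfaces. The sketch below shows one simple way such an index could work, a word-level inverted index. It is an illustration under stated assumptions, not the FAIR Data Search Engine's implementation; `build_index`, `search` and the record identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the identifiers whose metadata text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for identifier, text in records.items():
        for word in text.lower().split():
            index[word].add(identifier)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return identifiers matching every word of the query, sorted for stable output."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Titles borrowed from the metadata examples in this deck; the identifiers are made up.
records = {
    "dataset-a": "Galliera Genetic Bank biobank",
    "dataset-b": "Ring14 biobank turtle distribution",
}
index = build_index(records)
```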
FAIRIFICATION PROCESS
■ Retrieve original data
■ Dataset identification and analysis
■ Definition of the semantic model
■ Data transformation
■ License assignment
■ Metadata definition
■ FAIR Data resource (data, metadata, license) deployment
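The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the input is CSV text, the "semantic model" is a plain dictionary mapping columns to predicate URIs, and every name and URL (`fairify`, `example.org`, the column names) is hypothetical rather than part of the FAIRifier.

```python
import csv
import io
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def fairify(csv_text: str, model: dict, license_uri: str, metadata: Dict[str, str]) -> dict:
    """Turn CSV text into a bundle of triples, license and metadata."""
    # Retrieve and analyse the original data: read rows keyed by column name.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    triples: List[Triple] = []
    # Data transformation: apply the semantic model to every row.
    for row in rows:
        subject = model["base"] + row[model["id_column"]]
        for column, predicate in model["predicates"].items():
            triples.append((subject, predicate, row[column]))
    # License assignment, metadata definition and deployment are reduced to bundling here.
    return {"data": triples, "license": license_uri, "metadata": metadata}


model = {
    "base": "http://example.org/patient/",
    "id_column": "id",
    "predicates": {"diagnosis": "http://example.org/hasDiagnosis"},
}
resource = fairify(
    "id,diagnosis\np1,ring14\n",
    model,
    "http://example.org/license",
    {"title": "Demo dataset"},
)
```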
FAIRIFICATION
Original dataset
FAIR Data Resource: FAIR Format, Metadata, License
submit, generate
Generic semantic model
Resolution Services: ARTA Service, Identifier Service, Vocabulary Service
Input: Non-FAIR Dataset, Metadata, License
Output: FAIR Data Resource (FAIR Format, Metadata, License)
FAIR Data Model Registry
FAIRIFIER
■ Transform non-FAIR datasets into FAIR Data Resources (dataset in FAIR format, license and metadata)
■ Data munging
■ Semantic modeling
■ License definition
■ Metadata definition and extraction
■ Data publication
FAIR DATA MODEL REGISTRY
FAIR Data Model Registry
Dataset
Data Model
Dataset
Data Model
Dataset
Data Model
FAIRIFICATION - NEW DATASET TYPE
Original dataset
FAIR Data Resource: FAIR Format, Metadata, License
submit, generate
FAIR Data Model Registry: store Non-FAIR - FAIR mapping
FAIRIFICATION - RECURRING DATASET TYPE
Original dataset
FAIR Data Resource: FAIR Format, Metadata, License
submit, generate
FAIR Data Model Registry: query, retrieve Non-FAIR - FAIR mapping
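The two cases differ only in whether a Non-FAIR to FAIR mapping already exists in the registry: a new dataset type stores one, a recurring type queries and retrieves it. A minimal sketch of that behaviour follows, assuming a dataset type can be recognised by its set of column names (an assumption made for illustration; the slides do not say how types are matched). `DataModelRegistry` and its methods are hypothetical names.

```python
from typing import Dict, FrozenSet, Iterable, Optional


class DataModelRegistry:
    """In-memory stand-in for a registry of Non-FAIR to FAIR mappings."""

    def __init__(self) -> None:
        self._mappings: Dict[FrozenSet[str], dict] = {}

    def store(self, columns: Iterable[str], mapping: dict) -> None:
        """New dataset type: remember the mapping under the dataset's column set."""
        self._mappings[frozenset(columns)] = mapping

    def query(self, columns: Iterable[str]) -> Optional[dict]:
        """Recurring dataset type: retrieve a stored mapping, or None if the type is new."""
        return self._mappings.get(frozenset(columns))


registry = DataModelRegistry()
first_lookup = registry.query(["id", "diagnosis"])  # nothing stored yet
registry.store(["id", "diagnosis"], {"diagnosis": "http://example.org/hasDiagnosis"})
second_lookup = registry.query(["diagnosis", "id"])  # column order does not matter
```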
■ A particular class of FAIR Data System to provide support for data interoperability;
■ Supports publication and access to FAIR data;
■ Fosters an ecosystem of applications and services;
■ Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable;
■ Supports citations of datasets and data items;
■ Provides metrics for data usage and citation;
DataFAIRport
FAIR Data Search Engine
FAIRifier + (Meta)Data Publication
Metadata storage
Data storage (optional)
Transformation Services Registry (optional)
FAIR Data Point
FAIRPORT
DataFAIRport: Find, Access, Interoperate & Re-use Data
Stewardship API FAIR Data API
(Meta)Data Storage component
Metadata storage
Data storage
DataVerse EUDAT Data Repository
Semantic resolver Ontology storage
Data storage API / FAIR Data API
Data usage policy
Management component
GUI (Data publishing, search, mgmt)
Data Mgmt App
FAIR Data System
Metrics storage
Data Consumer
Data Producer
Data Consumer Apps (e.g. APInatomy, BRAIN, etc.)
Data Mgmt App
Data Stewardship Apps
■ Allow third-party annotation on existing knowledge bases
■ Capture the provenance of the annotator and the original statement
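The two points above imply that an annotation keeps three things together: the original statement, the proposed statement, and who proposed it. The sketch below is one possible data structure for that, not ORKA's actual model; `Annotation`, `annotate`, the example triple and the placeholder ORCID are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Annotation:
    original: Triple   # the statement as found in the knowledge base
    proposed: Triple   # the statement as the annotator thinks it should read
    annotator: str     # who made the annotation, e.g. an ORCID URL


def annotate(original: Triple, new_object: str, annotator: str) -> Annotation:
    """Propose a new object for a statement while keeping the original untouched."""
    subject, predicate, _ = original
    return Annotation(original, (subject, predicate, new_object), annotator)


statement = ("http://example.org/gene1", "http://example.org/associatedWith", "disease A")
annotation = annotate(statement, "disease B", "http://orcid.org/0000-0000-0000-0000")
```

Keeping the original statement inside the annotation means the knowledge base itself is never modified by third parties.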
Open RDF Knowledge Annotator (ORKA)
DEMO: http://dev-vm.fair-dtls.surf-hosted.nl:8080/#/
ANNOTATIONS GO TO NANOPUB STORE
TOOLS ROADMAP (Dec 16, Jan 17, Feb 17, Mar 17)
FAIR Data Point
■ Version 1: Metadata editor, release metadata, POST, FAIR accessor
■ Version 1.1: Reintroduce OAI-PMH compliance
■ Version 1.2: Update notification
FAIR Data Search Engine
■ Beta 1: Crawler, metadata index and search GUI
■ Beta 2: Improved search GUI, search API
FAIRifier
■ Beta 1: OpenRefine + RDF plugin, publication to FAIR Data Point
■ Beta 2: Metadata definition and extraction (RML), license picker
FAIR Data Model Registry
■ Alpha 1: Start of the integration work
ORKA
■ Beta 1: Definition of 2-3 use cases
■ Beta 2: Extended with features required by the use cases
Data FAIRport
■ Alpha 1: Start of the integration work
TECHNOLOGY TRANSFER EVENTS
FAIR HACKATHON - GOALS
■ Align solutions with FAIR Data Point specifications.
■ Metadata content
■ API
■ Data
FAIR HACKATHON OUTCOME
■ FAIR data model for solutions content;
■ Architecture of the required adjustments/extensions;
■ Technical specification of the adjustments/extensions;
■ Proof-of-concept of the adjusted solution;
FAIR HACKATHONS
RDRF
MOLGENIS FAIR HACKATHON
DTL’S FAIR HACKATHONS ROADMAP
■ EUDAT (pilot project ongoing)
■ EGA (July 6-8 2016)
■ Molgenis (Oct 19-20 2016)
■ Patient registry solution providers (Oct 25-27 2016)
■ Mendeley (Nov 18 2016)
■ Quaero Systems (Nov 24 2016)
■ tranSMART (TBD)
■ phenotypeDB (TBD)
■ Euretos Knowledge Platform (TBD)
■ NIH, Australian National Data Services, Brazilian open government data, …
BRING YOUR OWN DATA - BYOD
■ Goals:
  ■ Learn how to make data linkable “hands-on” with experts
  ■ Create a “telling story” to demonstrate its use
  ■ Make FAIR Data at the source
■ Composition:
  ■ Data owners – specialists on given datasets
  ■ Data interoperability experts
  ■ Domain experts
Source: Marco Roos
Domain Expert
Data Owner FAIR Data Expert
BYOD
BYOD
Non-FAIR Dataset
Metadata
Non-FAIR Dataset
FAIR Data Transformation
Ontology
Ontology
FAIR datasets
FAIR datasets
BYOD Planning
Preparation, Execution, Follow Up
BYOD Planning
Preparation
Identify Plan
Datasets; Attendees' profile; Output data access; Tentative dates; Tentative venue; Costs; Funds
Coordination; Set date; Invite attendees; Set venue; Catering; Lodging; Financial planning
Publicity; Working document; Preparatory calls; Data hosting; Software hosting; Documentation hosting
BYOD Planning
Execution
Day One
Introduction; SW, LD, Ontology intro; Use case intro; Workgroups division; Working sessions; WWW/TTTALA
Day Two
Progress report; Working sessions; Groups reports; WWW/TTTALA
Day Three
Data integration; Answer driving question; Explore data; Demo improvement; Final report; WWW/TTTALA
BYOD Planning
Follow-Up
D+15
Report difficulties; Clarifications; Next steps
D+45
Report difficulties; Clarifications; Next steps
Implementation
Expand FAIRification; Implement solution; Scale-up solution; Deploy
[Figure: Core FAIR Technology (DTL) and FAIR Data E.T., with BYOD, FAIR HACKATHON, BBMRI 2.0, FAIR-dICT, RD Connect, ODEX4all, myFAIR and Elixir Excellerate]
RELATED PROJECTS
ODEX4all
FAIR-dICT
myFAIR
QUESTIONS?