data fairport prototype & demo - presentation to elsevier, jul 10, 2015

Data FAIRport Skunkworks First Prototype Demo of Legacy Data Repository Discovery and Interoperability (Mark D Wilkinson, Presentation to Elsevier, Jul, 2015)

Upload: mark-wilkinson

Post on 06-Aug-2015


TRANSCRIPT

Page 1: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Data FAIRport Skunkworks: First Prototype Demo of Legacy Data Repository Discovery and Interoperability

(Mark D Wilkinson, Presentation to Elsevier, Jul, 2015)

Page 2: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

This presentation is licensed CC-BY

Mark Wilkinson ([email protected])

https://goo.gl/YEdwwB

@markmoby

Page 3: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

EU Lead: Mark Wilkinson

Isaac Peral Distinguished Researcher, CBGP-UPM, Madrid

USA Lead: Michel Dumontier

Associate Professor, Biomedical Informatics, Stanford, USA

FAIRport Project Lead: Barend Mons

Professor, Leiden University Medical Centre, Netherlands

Data FAIRport Skunkworks

Common repository access via meta-meta-descriptors and homogenous accessors

Page 4: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What is a FAIRport?

● Findable - (meta)data should be uniquely and persistently identifiable

● Accessible - identifiers should provide a mechanism for (meta)data access, including authentication, access protocol, license, etc.

● Interoperable - (meta)data should be machine-accessible, using a machine-parseable syntax and, where possible, shared common vocabularies.

● Reusable - there should be sufficient machine-readable metadata that it is possible to “integrate like-with-like”, and that component data objects can be precisely and comprehensively cited post-integration.

Page 5: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Problem

Page 6: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

End-user view of “The Problem”

Tissue rejection experimental context. Today, I’m looking for microarray data of human liver cells on a time-course following liver transplant.

What repositories could contain such data?

● GEO? EUDat? FigShare? Dryad? Atlas?

● What fields in those repositories would I need to search, using what vocabularies, to find the microarray studies that are relevant?

Page 7: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Dissecting the problem

There are a lot of repositories!

General Purpose: Dataverse, Dryad, EUDat, Figshare, etc.

Special Purpose: PDB, UniProt, NCBI, GEO, Atlas, EnsEMBL

Page 8: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Dissecting the problem

Lack of harmonized metadata structures, or even rich descriptions of the contents of these repositories, hinders us from (for example):

● knowing where we can look for certain types of data

● knowing if two repositories contain records about the same thing

● Cross-referencing or “joining” across repositories to integrate disparate data about the same thing

● Knowing which repository I could/should deposit my data to (and how)

Page 9: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

“Skunkworks” Challenge

If we wanted to enable this kind of FAIR discovery and integration over myriad repositories, what infrastructure (existing/new) would we need?

Page 10: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

“Skunkworks” Challenge

If we wanted to enable this kind of FAIR discovery and integration over myriad repositories, what infrastructure (existing/new) would we need?

Discussions with Tim Clark revealed that the core objectives of Skunkworks were very similar to those of Force 11 Data Citation Implementation Working Group Team 4 - “Common repository interfaces”

...so we joined forces :-)

Page 11: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Solution?

Page 12: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Shared Metadata Descriptors?

They already exist! (e.g. DCAT)

But they are not (yet) widely implemented...

...and they are not sufficiently rich: they only describe “core” metadata

We need to query richer metadata like experimental context and domain-specific data elements

Page 13: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

So... extend DCAT?

Page 14: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

So... extend DCAT?

...extend it where?

...too many specialist domains & data-types

resistance to harmonization

resistance to implementation (time, money, expertise, ‘just don’t care’)

attempting to impose standards is a mug’s game!

Page 15: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Common provider-implemented API?

Page 16: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Common provider-implemented API?

a la TDWG/TAPIR and caBIO

...too many specialist domains & data-types

resistance to harmonization

resistance to implementation (time, money, expertise, ‘just don’t care’)

attempting to impose standards is a mug’s game!

Page 17: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Where else could the solution be?

What exactly *is* our problem?

Page 18: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?

Data Record (e.g. XML, RDF)

Page 19: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?

Data Record (e.g. XML, RDF)

Data Schema (e.g. XMLS, RDFS)

Defines

Page 20: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?

Data Record (e.g. XML, RDF)

Data Schema (e.g. XMLS, RDFS)

Metadata Record (e.g. DCAT-compliant RDF)

Defines

Describes

Page 21: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?

Data Record (e.g. XML, RDF)

Data Schema (e.g. XMLS, RDFS)

Metadata Record (e.g. DCAT-compliant RDF) (IF the repository uses DCAT)

DCAT RDFS Schema (IF the repository uses DCAT…)

Defines

Describes

Defines

Page 22: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?

Data Record (e.g. XML, RDF)

Data Schema (e.g. XMLS, RDFS)

Metadata Record (e.g. DCAT-compliant RDF) (IF the repository uses DCAT)

DCAT RDFS Schema (IF the repository uses DCAT…)

Defines

Describes

Defines

If everyone used DCAT, we could at least query the core metadata of all repositories…

...but they don’t...

...and core isn’t rich enough anyway...

Page 23: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?

XML Data Record

XMLS Data Schema

DCAT RDF Metadata Record

RDF Data Record

RDFS Data Schema

UniProt RDF Metadata Record

ACEDB Data Record

ACEDB Data Schema

DragonDB Form Metadata Record

DCAT RDFS Schema

UniProt RDFS Metadata Schema

DragonDB Form Metadata Schema

REALITY

Page 24: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?


Repositories don’t all use DCAT Schema

Page 25: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?


Those that use DCAT Schema, use only parts of it

Page 26: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?


Those that don’t use DCAT use a myriad of alternatives (some very loosely defined)

Page 27: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?


And don’t necessarily use all elements of those alternatives either

Page 28: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?


So we need to find a way to do RICH queries over all of these?

Page 29: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What exactly *is* our problem?


We need a way to describe the descriptors...

Page 30: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Desiderata of meta-meta descriptors

● Must describe legacy data (i.e. not just DCAT or other “modern” data)

● Must describe a multitude of data formats (XML, RDF, Key/Value, etc.)

● Must be capable of describing any kind of value constraint, e.g. plain text, numerical, arbitrary CV, rdfs:range, or equivalent OWL construct

● Must be modular, identifiable, shareable, and reusable (to stem the proliferation of new formats)

● Must be hierarchical to allow composite re-use of shared descriptors

● Must use standard technologies, and re-use existing vocabularies if possible

● Must be extremely lightweight and “trivial” to create

● Must NOT require the participation of the repository host (no buy-in required)

Page 31: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Solution? (or at least, our best attempt to date!)

Page 32: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Exemplar use-cases:

● A piece of software that can generate a “sensible” data submission form for any repository

(at the Force 2015 meeting a few months ago I gave a presentation of a working example of this… so I won’t repeat that today…)

● A piece of software that can generate a “sensible” query form/interface for any repository

(demonstration of this today!)

Skunkworks Task #1 - [F]indable

Invent harmonized cross-repository meta-descriptors

Page 33: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

“FAIR Profiles”

FAIR Profiles provide a common way to describe a repository’s metadata

(and data, for that matter!)

Page 34: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015


What FAIR Profiles do

Page 35: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

XML Data Record

XMLS Data Schema

DCAT RDF Metadata Record

RDF Data Record

RDFS Data Schema

UniProt RDF Metadata Record

ACEDB Data Record

ACEDB Data Schema

DragonDB Form Metadata Record

DCAT RDFS Schema

UniProt RDFS Metadata Schema

DragonDB Form Metadata Schema

FAIR Profile of DCAT Schema

FAIR Profile of UniProt Metadata Schema

FAIR Profile of DragonDB Metadata Schema

What FAIR Profiles do

Page 36: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015


Though they are potentially describing very different things (from Web FORM fields to OWL Ontologies!), all FAIR Profiles are written using the same vocabulary and structure, defined by...


Page 38: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The FAIR Profile Schema

Page 39: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Repo. Data Record (e.g. XML, RDF)

Repo. Data Schema (e.g. XMLS, RDFS)

Repository Metadata Record

Repository Metadata Schema

Defines

Describes

Defines

Defines

Describes

Repository’s FAIR Profile

FAIR Profile Schema

Page 40: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Repo. Data Record (e.g. XML, RDF)

Repo. Data Schema (e.g. XMLS, RDFS)

Repository Metadata Record

Repository Metadata Schema

Defines

Defines

Describes

Repository’s FAIR Profile

FAIR Profile Schema

Page 41: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

FAIR Profile Schema

A very small OWL vocabulary for writing meta-meta-descriptors

[Diagram of the FAIR Profile Schema. Labels shown: FAIR Profile; Metadata; dc:provenance; FAIR Class; hasProperty; FAIR Property; describes class; owl:Class (URI or de novo definition); describes property; rdf:Property (owl:ObjectProperty or owl:DatatypeProperty); minCount (xsd:integer); maxCount (xsd:integer); allowedValues (xsd:anyURI); skos:preferredLabel (rdf:langString), shown twice]

http://datafairport.org/schema/FAIR-schema.owl
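A minimal sketch of how the labels in this diagram could be held in code, assuming the obvious reading (a profile has classes, a class has properties, a property carries minCount, maxCount and allowedValues). The Python names `FairProfile`, `FairClass` and `FairProperty` and the example.org URI are illustrative assumptions, not taken from FAIR-schema.owl.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FairProperty:
    """Describes one predicate of the target repository's schema."""
    describes_property: str               # URI of the predicate being described
    label: str                            # human-readable label
    min_count: int = 0                    # minCount: lowest number of values allowed
    max_count: Optional[int] = None       # maxCount: None means "no upper bound"
    allowed_values: Optional[str] = None  # allowedValues: URI of a SKOS Concept Scheme


@dataclass
class FairClass:
    """Describes one class of the target repository's schema."""
    describes_class: str                  # URI of the class being described
    label: str
    properties: List[FairProperty] = field(default_factory=list)


@dataclass
class FairProfile:
    """A profile is a labelled collection of class descriptions."""
    label: str
    classes: List[FairClass] = field(default_factory=list)

    def required_properties(self) -> List[str]:
        """Return the predicates that must occur at least once."""
        return [p.describes_property
                for c in self.classes
                for p in c.properties
                if p.min_count >= 1]


# Example: a tiny profile of a DCAT-style dataset description.
# The concept-scheme URI is a hypothetical placeholder.
profile = FairProfile(
    label="Minimal dataset profile (illustrative)",
    classes=[FairClass(
        describes_class="http://www.w3.org/ns/dcat#Dataset",
        label="Dataset",
        properties=[
            FairProperty("http://purl.org/dc/terms/title", "title",
                         min_count=1, max_count=1),
            FairProperty("http://www.w3.org/ns/dcat#theme", "theme",
                         allowed_values="http://example.org/schemes/themes"),
        ],
    )],
)
```

A form generator of the kind mentioned in the exemplar use-cases could walk such a structure and emit one input per property.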


Page 43: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

xsd:anyURI

allowedValues

Page 44: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

URI must resolve to a SKOS Concept Scheme

Describes the constraints on the possible values for a predicate in the target-Repository’s metadata Schema

xsd:anyURI

allowedValues

Page 45: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

URI must resolve to a SKOS Concept Scheme

Describes the constraints on the possible values for a predicate in the target-Repository’s metadata Schema

NOTE: we cannot use rdfs:range because we are meta-modelling a schema! The predicate is a CLASS at the meta-model level, so use of rdfs:range is not appropriate.

xsd:anyURI

allowedValues
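As a rough illustration of what the allowedValues constraint buys a client: once the concept scheme has been resolved, checking a candidate value is a membership test. The sketch below stands in for a resolved SKOS Concept Scheme with a plain dictionary; `TISSUE_SCHEME`, `check_values` and the example.org URIs are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a resolved SKOS Concept Scheme:
# concept URI -> preferred label.
TISSUE_SCHEME: Dict[str, str] = {
    "http://example.org/tissue/liver": "liver",
    "http://example.org/tissue/kidney": "kidney",
}


def check_values(values: List[str],
                 scheme: Dict[str, str],
                 min_count: int = 0,
                 max_count: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems (empty if the values are valid)."""
    problems = []
    if len(values) < min_count:
        problems.append(f"expected at least {min_count} value(s), got {len(values)}")
    if max_count is not None and len(values) > max_count:
        problems.append(f"expected at most {max_count} value(s), got {len(values)}")
    for value in values:
        if value not in scheme:
            problems.append(f"{value} is not a concept in the scheme")
    return problems


ok = check_values(["http://example.org/tissue/liver"], TISSUE_SCHEME, 1, 1)
bad = check_values(["http://example.org/tissue/hair"], TISSUE_SCHEME, 1, 1)
```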

Page 46: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

A FAIR Profile (an RDF document that follows the FAIR Profile Schema)

This

Metadata Record

Metadata Schema

Fair Profile

Fair Profile Schema

Page 47: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

A FAIR Profile (an RDF document that follows the FAIR Profile Schema)

This

Metadata Record

Metadata Schema

Fair Profile

Fair Profile Schema

(As an aside: we believe this document is an implementation of the ISO 11179 standard for metadata descriptors; however, we have not yet formally mapped our concepts to theirs. That mapping will happen soon, and the mapping alone is sufficient to become ISO 11179-compliant. As such, the FAIR Profile Schema is a schema for creating ISO 11179-compliant descriptors…)

Page 48: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What a FAIR Profile is:

A meta-description of the (meta)data in a repository

Page 49: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What a FAIR Profile is:

A meta-description of the (meta)data in a repository

What a FAIR Profile is NOT:

THE meta-description of the (meta)data in a repository

Page 50: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What a FAIR Profile is:

A meta-description of the (meta)data in a repository

if you were to view it from a particular “perspective”

(also known as a “lens*” over the data)

* Scientific Lenses to Support Multiple Views over Linked Chemistry Data; DOI:10.1007/978-3-319-11964-9_7

Page 51: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What a FAIR Profile is:

A meta-description of the (meta)data in a repository

if you were to view it from a particular “perspective”

(also known as a “lens*” over the data)

this is where the FAIRport approach becomes

distinctly powerful!

Page 52: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What a FAIR Profile is:

A meta-description of the (meta)data in a repository

if you were to view it from a particular “perspective”

(also known as a “lens*” over the data)

but first, look at the other FAIRport components

Page 53: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Skunkworks Task #2 - [A]ccessible

Are there already access layer definitions?

Page 54: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

remind myself to say that many of these components are standalone

you don’t have to implement everything, all at once.

Page 55: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Skunkworks Task #2 - [A]ccessible

Are there already access layer definitions?

The W3C Linked Data Platform (LDP): a set of behaviors for providing a unified (albeit simplistic!) access layer for “records” contained in any Web resource

Page 56: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

LDP sits at a URL waiting

Page 57: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

GET

Client calls HTTP GET on the URL (that’s all!)

Page 58: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

??

LDP communicates with the repository

(how? entirely up to you!)

Page 59: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Repository returns data “about available records” (how? entirely up to you!)

??

Page 60: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

LDP returns an RDF representation of the list of the records’ URLs

<RDF> URL1 URL2 URL3 URL4 URL5 URL6 ... </RDF>
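For a concrete feel of such a listing, here is a small sketch that serialises a container and its record URLs as Turtle, using the `ldp:contains` predicate that the provider code later in this talk also uses. The function name `container_turtle` and the example.org URLs are assumptions for illustration; a real server would normally use an RDF library rather than string formatting.

```python
from typing import Iterable

LDP_PREFIX = "@prefix ldp: <http://www.w3.org/ns/ldp#> .\n\n"


def container_turtle(container_url: str, record_urls: Iterable[str]) -> str:
    """Serialise a container and the URLs of its records as Turtle."""
    members = ", ".join(f"<{url}>" for url in record_urls)
    body = f"<{container_url}> a ldp:BasicContainer"
    if members:
        # Only emit ldp:contains when there is at least one record.
        body += f" ;\n    ldp:contains {members}"
    return LDP_PREFIX + body + " .\n"


listing = container_turtle(
    "http://example.org/accessor",
    ["http://example.org/accessor/URL1", "http://example.org/accessor/URL2"],
)
```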

Page 61: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

GET URL6

The URLs (should) point back to the LDP server

Page 62: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

??

LDP communicates with the repository about that record

??

Page 63: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

LDP returns DCAT Distributions for all available formats of that record that the repository provides

<RDF>

<dcat:Distribution> <format xml> URL6a

<dcat:Distribution> <format html> URL6b

</RDF>
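A hedged sketch of what the record-level response might look like as Turtle: one `dcat:Distribution` per available format, each with a download URL. `distributions_turtle` and the example.org URLs are hypothetical, and writing the media type as a plain string literal on `dcat:mediaType` is a simplification chosen for readability.

```python
from typing import Dict

DCAT_PREFIX = "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n\n"


def distributions_turtle(record_url: str, formats: Dict[str, str]) -> str:
    """formats maps a media type to the URL serving the record in that format."""
    nodes = []
    blocks = []
    for i, (media_type, url) in enumerate(sorted(formats.items())):
        node = f"_:dist{i}"  # one blank node per distribution
        nodes.append(node)
        blocks.append(
            f"{node} a dcat:Distribution ;\n"
            f'    dcat:mediaType "{media_type}" ;\n'
            f"    dcat:downloadURL <{url}> .\n"
        )
    head = ""
    if nodes:
        head = f"<{record_url}> dcat:distribution {', '.join(nodes)} .\n\n"
    return DCAT_PREFIX + head + "\n".join(blocks)


record_rdf = distributions_turtle(
    "http://example.org/accessor/URL6",
    {
        "application/xml": "http://example.org/repo/URL6a",
        "text/html": "http://example.org/repo/URL6b",
    },
)
```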

Page 64: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

You directly call the repository using the URL of

your choice

GET URL6a

Page 65: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Repository returns you the data you requested

Content-type: application/xml

<data><data> Yummy Data Here!</data></data>….

(Note: most repositories already do this part! So we’re half-way there :-) )
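The whole walk-through above is three GETs in a row. The sketch below shows that shape from the client side, with two loud assumptions: the HTTP layer is replaced by an injected `fetch` function backed by an in-memory dictionary, and the RDF responses are modelled as already-parsed Python dicts (`contains`, `distributions`). All names and URLs are illustrative.

```python
from typing import Any, Callable


def drill_down(fetch: Callable[[str], Any],
               accessor_url: str,
               record_index: int,
               media_type: str) -> Any:
    """Follow the container -> record -> distribution chain with plain GETs."""
    listing = fetch(accessor_url)                       # GET the accessor URL
    record_url = listing["contains"][record_index]      # pick one record URL
    record = fetch(record_url)                          # GET the record's descriptor
    download_url = record["distributions"][media_type]  # pick a format
    return fetch(download_url)                          # GET the data from the repository


# In-memory stand-in for the Web: URL -> (already parsed) response.
FAKE_WEB = {
    "http://example.org/accessor": {
        "contains": ["http://example.org/accessor/URL6"],
    },
    "http://example.org/accessor/URL6": {
        "distributions": {
            "application/xml": "http://example.org/repo/URL6a",
            "text/html": "http://example.org/repo/URL6b",
        },
    },
    "http://example.org/repo/URL6a": "<data>Yummy Data Here!</data>",
    "http://example.org/repo/URL6b": "<html>Yummy Data Here!</html>",
}

result = drill_down(FAKE_WEB.__getitem__, "http://example.org/accessor",
                    0, "application/xml")
```

Swapping `FAKE_WEB.__getitem__` for a function that performs a real HTTP GET and parses the RDF would leave `drill_down` itself unchanged.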

Page 66: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The first time I wrote one of these from scratch, it was about 170 lines of code, and took less than 4 hours

(including reading the W3C documentation!)

Page 67: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015


These LDP servers may exist completely independently of any other FAIRport component, as a way of fulfilling the “A” (Accessible) aspect of the FAIR Data Principles

https://www.force11.org/group/fairgroup/fairprinciples

Page 68: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015


However, when one of these is associated with a FAIR Profile we call it a

“FAIR Accessor”

Page 69: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015


**** As of July 8, 2015, there are now support libraries (in Perl) for this part of the implementation: https://goo.gl/NB13Fz (the location of this code will change soon, sorry!)

Page 70: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Live Demo….

http://biordf.org/cgi-bin/RD_Connect/EHDN_Accessor

DISCLAIMER: This demonstration was written during an RD Connect workshop, and the demonstration references various aspects of the RD Connect project (including having ‘RD_Connect’ in the URL). HOWEVER, I am not affiliated with RD Connect, I do not speak for RD Connect in any way, and RD Connect does not endorse any of these ideas, products, behaviours, or anything else presented in this talk. In addition, the data and metadata presented here is all completely fake, and this public demo raises no privacy concerns.

The demo is presented using the Tabulator extension to Chrome, in order to nicely format the RDF for human readability.

Page 71: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Page 72: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

These elements are defined by the LDP Specification

Page 73: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Page 74: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

A SKOS Concept Scheme describing the “nature” of the data in the repository

Page 75: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Optional - the link to the meta-descriptors of each record in the repository (paginated using HTTP headers defined by LDP)
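A sketch of how a client might follow that pagination, assuming the server advertises the next page in an HTTP `Link` header with `rel="next"`. The header value and URLs below are made up for illustration, and `next_page_url` is a hypothetical helper, not part of any FAIRport library.

```python
import re
from typing import Optional

# <url> followed by zero or more ";param" sections
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# rel="a b" or rel=a
_REL = re.compile(r'\brel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))')


def next_page_url(link_header: str) -> Optional[str]:
    """Return the URL whose rel includes 'next', or None if there is none."""
    for url, params in _LINK.findall(link_header):
        rel = _REL.search(params)
        if rel is None:
            continue
        relations = (rel.group(1) or rel.group(2) or "").split()
        if "next" in relations:
            return url
    return None


header = ('<http://www.w3.org/ns/ldp#Page>; rel="type", '
          '<http://example.org/accessor?page=2>; rel="next"')
following = next_page_url(header)
```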

Page 76: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Record’s Meta-descriptor

Page 77: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Record’s Meta-descriptor

Metadata about the record (which metadata is completely at the discretion of the data owner!)

Page 78: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Record’s Meta-descriptor

Ontological information about the type of record… click through to:

Page 79: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

rdf:type SIO:SIO_001027 (Medical Health Record)

Page 80: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Record’s Meta-descriptor

There are 3 DCAT Distributions for this record

Page 81: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

The Record’s Meta-descriptor

Each distribution has its own format and download URL (distributions are completely optional, and up to the provider)

This would also be the place to put license, accessibility, or authentication metadata!

Page 82: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

This provides an API-free way of accessing any record in any dataset

Every step is just HTTP GET with standard metadata following the DCAT ontology

Incremental drill-down from repository-level all the way to an individual record

Useful metadata at all levels

Access is 100% under provider-control at all levels

Page 83: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

This is NOT intended to be “efficient”!!

However, the alternative is to invent a query API, and then force all repositories to implement it…

Not Gonna Happen!

This is lightweight, and easy to implement

Trade power for (hopefully) wider adoption...

Page 84: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

    package EHDN_Accessor;  # this should be the same as your filename!
    use strict;
    use warnings;
    use JSON;
    use Spreadsheet::XLSX::Reader::LibXML;
    use base 'FAIR::Accessor';

    # Service-level metadata: plain tag/value definitions
    my $config = {
        title => 'European Huntington Disease Network Data Accessor',
        serviceTextualDescription => 'Server for some EHDN Data',
        textualAccessibilityInfo => "The information from this server requires no authentication",  # this could also be a URI describing the accessibility
        mechanizedAccessibilityInfo => "",  # this must be a URI to an RDF document
        textualLicenseInfo => "CC-BY",  # this could also be a URI to the license info
        mechanizedLicenseInfo => "",  # this must be a URI to an RDF document
        baseURI => "",  # I don't know what this is used for yet, but I have a feeling I will need it!
        ETAG_Base => "EHDN_Accessor_For_RegInfo",
        localNamespaces => {
            ehdn => 'http://ehdn.org/some/items/',
            ehdnpred => 'http://ehdn.org/some/predicates/',
        },
        localMetadataElements => [qw(ehdnpred:fromHospital ehdnpred:lastevaluatedDate)],
    };

    my $service = EHDN_Accessor->new(%$config);
    $service->handle_requests;

    # Repository-level call: metadata about the repository plus the list of record URLs
    sub get_all_meta_URIs {
        my ($starting_at_record, $path_info) = @_;
        $path_info ||= "";

        my %result = (
            'dc:title' => "EHDN Accessor Server",
            'dcat:description' => "the prototype Accessor server for EHDN",
            'dcat:identifier' => "handle:12345566798",
            'dcat:keyword' => ["medical records", "rare diseases", "EHDN", "Linked Data Platform", 'HTT', 'huntington'],
            'dcat:landingPage' => 'http://www.euro-hd.net/html/network',
            'dcat:language' => 'en',
            'dcat:publisher' => 'http://www.euro-hd.net',
            'dcat:temporal' => 'http://reference.data.gov.uk/id/quarter/2006-Q1',
            'dcat:theme' => 'http://biordf.org/DataFairPort/ConceptSchemes/Huntingtons.rdf',
            'daml:has-Technical-Lead' => "Summer Student Joe",
            'daml:has-Administrative-Contact' => "John Doe",
            'daml:has-Program-Manager' => "Jane Doe",
            'daml:has-Principle-Investigator' => "Big Doctor",
        );

        my $BASE_URL = "http://" . $ENV{'SERVER_NAME'} . $ENV{'REQUEST_URI'} . $path_info;
        my @known_records = (
            $BASE_URL . "/479-467-29X",
            $BASE_URL . "/768-599-467",
        );
        $result{'void:entities'} = scalar(@known_records);  # THE TOTAL *NUMBER* OF RECORDS THAT CAN BE SERVED
        $result{'ldp:contains'} = \@known_records;  # the listref of record ids

        return encode_json(\%result);
    }

    # Record-level call: the available distributions (formats) and metadata for one record
    sub get_distribution_URIs {
        my ($self, $ID, $PATH_INFO) = @_;
        my (%response, %formats, %metadata);

        $formats{'text/html'} = 'http://myserver.org/ThisScript/record/479-467-29X.html';
        $formats{'application/rdf+xml'} = 'http://myserver.org/ThisScript/record/479-467-29X.rdf';

        $metadata{'rdf:type'} = ['edam:data_0006', 'sio:SIO_000088'];
        extractDataFromSpreadsheet(\%metadata, $ID);

        $response{distributions} = \%formats;
        $response{metadata} = \%metadata if (keys %metadata);  # only set it if you can provide something

        my $response = encode_json(\%response);
        return $response;
    }

    # Look up the row whose first column matches $ID and copy selected cells into the metadata
    sub extractDataFromSpreadsheet {
        my ($metadata, $ID) = @_;

        my $db_file = "registry3-enrolment.xlsx.xlsx";
        my $excel = Spreadsheet::XLSX::Reader::LibXML->new();
        my $workbook = $excel->parse($db_file);
        my ($sheet) = $workbook->worksheets;
        my ($first, $last) = $sheet->row_range;

        foreach my $row ($first .. $last) {
            next unless ($sheet->get_cell($row, 0)->value eq $ID);
            my $cell = $sheet->get_cell($row, 5);
            $metadata->{'dcat:updateDate'} = $cell->value;
            $cell = $sheet->get_cell($row, 1);
            $metadata->{'dcat:releaseDate'} = $cell->value;
            $cell = $sheet->get_cell($row, 3);
            $metadata->{'ehdnpred:enrollmentState'} = $cell->value;
            last;
        }
    }

This is the only code that a provider must implement… and much (almost half!) of it is just tag/value definitions

If they don’t want to implement the full set of drill-down behaviors then the code is even smaller!

(This is the actual code - 68 lines - running the demo you just saw. Most of the heavy-lifting is handled by the libraries I published yesterday)
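For readers who do not use Perl, the two payloads that the listing above builds can be sketched in a few lines of Python. This mirrors only the JSON keys visible in the listing (`ldp:contains`, `void:entities`, `distributions`, `metadata`); it is not a port of the FAIR::Accessor library, and the function names and example.org URLs are assumptions.

```python
import json
from typing import Dict, List, Optional


def get_all_meta_uris(base_url: str, record_ids: List[str]) -> str:
    """Repository-level payload: descriptive metadata plus the record URLs."""
    records = [f"{base_url}/{record_id}" for record_id in record_ids]
    result = {
        "dc:title": "EHDN Accessor Server",
        "void:entities": len(records),   # total number of records that can be served
        "ldp:contains": records,         # the list of record URLs
    }
    return json.dumps(result)


def get_distribution_uris(base_url: str, record_id: str,
                          metadata: Optional[Dict[str, object]] = None) -> str:
    """Record-level payload: one download URL per available format."""
    response: Dict[str, object] = {
        "distributions": {
            "text/html": f"{base_url}/record/{record_id}.html",
            "application/rdf+xml": f"{base_url}/record/{record_id}.rdf",
        }
    }
    if metadata:  # only include metadata when there is something to report
        response["metadata"] = metadata
    return json.dumps(response)


repo_payload = json.loads(get_all_meta_uris(
    "http://example.org/accessor", ["479-467-29X", "768-599-467"]))
record_payload = json.loads(get_distribution_uris(
    "http://example.org/accessor", "479-467-29X"))
```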

Page 85: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Skunkworks Task #3 - [I]nteroperable

This is “the holy grail”!!

Page 86: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Skunkworks Task #3 - [I]nteroperable

This is “the holy grail”!!

This is where the FAIR Profile reveals its utility

“what it IS” vs. “what it IS NOT”

Page 87: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

What a FAIR Profile is:

A meta-description of the (meta)data in a repository

if you were to view it from a particular “perspective”

(also known as a “lens” over the data)

Page 88: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Skunkworks Task #3 - [I]nteroperable

“FAIR Projectors”

A FAIR Projector is a (potentially) small, modular, reusable Web-based service that “projects” data from a repository into the format described by a FAIR Profile

Page 89: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Skunkworks Task #3 - [I]nteroperable

“FAIR Projectors”

A FAIR Projector is a (potentially) small, modular, reusable Web-based service that “projects” data from a repository into the format described by a FAIR Profile

http://linkeddatafragments.org/
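To make “projecting” less abstract, here is a minimal sketch: a native record is turned into triples according to a per-profile mapping from field name to predicate and value converter. Everything named here (`project`, `TISSUE_TERMS`, the example.org URIs) is hypothetical; real ontology term URIs would replace the placeholders.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
# field name in the native record -> (predicate URI, value converter)
Mapping = Dict[str, Tuple[str, Callable[[str], str]]]

# Hypothetical lookup from free-text tissue names to ontology term URIs.
TISSUE_TERMS = {
    "liver": "http://example.org/anatomy/liver",
    "kidney": "http://example.org/anatomy/kidney",
}

# Two different "lenses" over the same native record.
TISSUE_PROFILE: Mapping = {
    "tissue": ("http://example.org/profile/hasTissue", TISSUE_TERMS.__getitem__),
}
LABEL_PROFILE: Mapping = {
    "name": ("http://www.w3.org/2000/01/rdf-schema#label", str),
}


def project(record: Dict[str, str], subject: str, mapping: Mapping) -> List[Triple]:
    """Emit one triple per mapped field that is present in the record."""
    triples = []
    for field_name, (predicate, convert) in mapping.items():
        if field_name in record:
            triples.append((subject, predicate, convert(record[field_name])))
    return triples


native_record = {"name": "sample 42", "tissue": "liver"}
subject_uri = "http://example.org/record/1"
as_tissue = project(native_record, subject_uri, TISSUE_PROFILE)
as_label = project(native_record, subject_uri, LABEL_PROFILE)
```

The same record yields different triples under each mapping, which is the “lens” idea: the repository is untouched, and only the projection changes.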

Page 90: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Linked Data Fragments (LDF): RESTful access to RDF data resources

RESTful hypermedia controls (e.g. pagination) defined by Hydra W3C Community Group

http://www.hydra-cg.com/
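A toy version of the idea, assuming a triple-pattern style of request: the client supplies any of subject, predicate and object, and gets back one page of matching triples plus enough information to ask for the next page. The in-memory triples, the `fragment` function and the shape of its return value are illustrative assumptions, not the LDF or Hydra wire format.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("ex:allele1", "rdf:type", "ex:Allele"),
    ("ex:allele2", "rdf:type", "ex:Allele"),
    ("ex:allele3", "rdf:type", "ex:Allele"),
    ("ex:allele1", "ex:gene", "ex:CHO"),
]


def fragment(triples: List[Triple],
             s: Optional[str] = None,
             p: Optional[str] = None,
             o: Optional[str] = None,
             page: int = 1,
             page_size: int = 2) -> Dict[str, object]:
    """Return one page of the triples matching the (s, p, o) pattern."""
    pattern = (s, p, o)
    matches = [t for t in triples
               if all(want is None or want == got
                      for want, got in zip(pattern, t))]
    start = (page - 1) * page_size
    has_next = start + page_size < len(matches)
    return {
        "triples": matches[start:start + page_size],
        "total": len(matches),                      # size of the whole result
        "next_page": page + 1 if has_next else None,
    }


first = fragment(TRIPLES, p="rdf:type")
second = fragment(TRIPLES, p="rdf:type", page=2)
```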

Page 91: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

implementedBy

Page 92: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

implementedBy

GET

Page 93: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

implementedBy

2 Options for a projector:

Page 94: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

implementedBy

2 Options for a projector: Direct Access to Repository

Page 95: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

implementedBy

2 Options for a projector: OR access via a FAIR Accessor

Page 96: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

implementedBy

Client receives

Page 97: Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015

Stage 1: Kinds of questions we can ask

● How do I access the records in Repo X?
→ HTTP GET (Accessor URL)

● How do I access the records in Repo X in XML?
→ HTTP GET (Accessor URL)
→ HTTP GET (DCAT Distribution URL)

● Can I please have the “biological tissue” field in Repository X as FMA Ontology terms?
→ Search FAIRport Registry
→ Find matching FAIR Profile + Projector
→ HTTP GET (Projector URL)
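The third question above implies a lookup step before any GET. As a sketch under stated assumptions (the talk does not show the registry's data model), a registry can be imagined as a list of entries linking a repository field and target vocabulary to a FAIR Profile and its Projector. `RegistryEntry`, `find_projector` and all URLs are hypothetical.

```python
from typing import List, NamedTuple, Optional


class RegistryEntry(NamedTuple):
    repository: str
    field: str          # the native field being projected
    vocabulary: str     # the vocabulary the projection uses for its values
    profile_url: str    # where the FAIR Profile can be fetched
    projector_url: str  # where the matching Projector can be called


REGISTRY: List[RegistryEntry] = [
    RegistryEntry("Repo X", "biological tissue", "FMA",
                  "http://example.org/profiles/repo-x-tissue-fma",
                  "http://example.org/projectors/repo-x-tissue-fma"),
    RegistryEntry("Repo X", "biological tissue", "free text",
                  "http://example.org/profiles/repo-x-tissue-text",
                  "http://example.org/projectors/repo-x-tissue-text"),
]


def find_projector(registry: List[RegistryEntry], repository: str,
                   field: str, vocabulary: str) -> Optional[RegistryEntry]:
    """Return the first entry matching repository, field and vocabulary."""
    for entry in registry:
        if (entry.repository, entry.field, entry.vocabulary) == (
                repository, field, vocabulary):
            return entry
    return None


match = find_projector(REGISTRY, "Repo X", "biological tissue", "FMA")
```

The client would then issue an HTTP GET on `match.projector_url`.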

FAIR Projector:

The first time I wrote one of these from scratch, it was about 300 lines of Perl code and took about 6 hours (including reading the LDF documentation!), and it projected three different FAIR Profiles

The next thing on my TODO list is to write libraries to make this easier; however, this is a much trickier thing to do!

Live Demo of a FAIR Projector

This demo is done over my own database:

http://antirrhinum.net

It will project the “Allele” slice of that database into three different forms, using 3 different profiles. The demo uses a FAIR Accessor (as described in the previous demo)

<?xml version="1.0" standalone="yes"?>
<Allele class="Allele" value="cho">
  <Source>
    <gene class="Locus" value="CHO" />
  </Source>
  <Location class="Laboratory" value="Schwarz-Sommer" />
  <Description>
    <Phenotype class="#Text" value="Habit:" />
    <Phenotype class="#Text" value="Leaves: Thin and narrow starting from the 6th leaf. Buckled and warped along the axis of the leaf." />
    <Phenotype class="#Text" value="Seedlings:" />
    <Phenotype class="#Text" value="Cotyledons: no obvious change" />
    <Phenotype class="#Text" value="Hypocotyl:" />
    <Phenotype class="#Text" value="Inflorescence:" />
    <Phenotype class="#Text" value="Flowers: Conspicuous sepal to petal transformations, particularly in 165E genetic background. Petals unfused. Carpel sometimes unfused and stunted." />
    <Phenotype class="#Text" value="_____________" />
    <Phenotype class="#Text" value="Upper lip:" />
    <Phenotype class="#Text" value="Lower lip:" />
    <Phenotype class="#Text" value="Bumps:" />
    <Phenotype class="#Text" value="Seed: reduced germimation of mutant seed." />
    <Phenotype class="#Text" value="Roots: root growth retarded; roots sometimes absent." />
    <Phenotype class="#Text" value="Remarks: F2 74:25, though usually mutants are under-represented in F2 populations." />
    <Phenotype class="#Text" value="Remarks: Identical phenotype to Des (Despenteado)" />
    <Recessive />
  </Description>
  <Phenotype_picture class="Phenotype_Picture" value="cho~a" />
  <Phenotype_picture class="Phenotype_Picture" value="cho~b" />
  <Phenotype_picture class="Phenotype_Picture" value="cho-0" />
  <Phenotype_picture class="Phenotype_Picture" value="cho-1" />
  <Phenotype_picture class="Phenotype_Picture" value="zss_pict0027" />
  <Phenotype_picture class="Phenotype_Picture" value="zss_pict0028" />
  <Phenotype_picture class="Phenotype_Picture" value="zss_pict0029" />
  <Expression_pattern_of class="Locus" value="FIM">
    <Description class="#Text" value="FIM extends to first whorl" />
    <Pick_me_to_call class="#txt" value="FIM_in_cho">
      <Pick_me_to_call-2 class="#txt" value="FIM_in_cho.jpg" />
    </Pick_me_to_call>
    <Photo_by class="Author" value="Wilkinson MD" />
  </Expression_pattern_of>
  <Expression_pattern_of class="Locus" value="GLO">
    <Description class="#Text" value="GLO expression in first whorl organs that ectopically express FIM" />
    <Pick_me_to_call class="#txt" value="GLO_in_cho">
      <Pick_me_to_call-2 class="#txt" value="GLO_in_cho.jpg" />
    </Pick_me_to_call>
    <Photo_by class="Author" value="Wilkinson MD" />
  </Expression_pattern_of>
  <Multi_mutant class="Multi_mutant" value="cho_fim-679" />
  <Multi_mutant class="Multi_mutant" value="cho_fim-1" />
  <Multi_mutant class="Multi_mutant" value="cho_fis" />
  <Multi_mutant class="Multi_mutant" value="cho_def-gli" />
  <Multi_mutant class="Multi_mutant" value="cho_inco" />
  <Multi_mutant class="Multi_mutant" value="cho_glo-1" />
  <Multi_mutant class="Multi_mutant" value="cho_glo-3D" />
</Allele>

The raw data, from the repository
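To give a feel for what a Projector has to work with, here is a small sketch that reads a trimmed excerpt of the raw record with Python's standard `xml.etree.ElementTree`. The element and attribute names come from the record itself; the `summarize_allele` helper and the choice of fields are illustrative only, and the original Projector was written in Perl, not Python.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the "cho" Allele record.
RECORD_XML = """<?xml version="1.0" standalone="yes"?>
<Allele class="Allele" value="cho">
  <Source> <gene class="Locus" value="CHO" /> </Source>
  <Location class="Laboratory" value="Schwarz-Sommer" />
  <Phenotype_picture class="Phenotype_Picture" value="cho~a" />
  <Phenotype_picture class="Phenotype_Picture" value="cho~b" />
</Allele>"""


def summarize_allele(xml_text):
    """Pull the allele name, its source gene and its picture ids out of a record."""
    root = ET.fromstring(xml_text)
    gene = root.find("./Source/gene")
    return {
        "allele": root.get("value"),
        "gene": gene.get("value") if gene is not None else None,
        "pictures": [pic.get("value") for pic in root.findall("Phenotype_picture")],
    }


print(summarize_allele(RECORD_XML))
```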

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:FAI="http://datafairport.org/schemas/FAIR-schema.owl#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:s="http://www.w3.org/2000/01/rdf-schema#">
  <FAI:FAIRClass rdf:about="">
    <FAI:hasProperty rdf:resource="http://datafairport.org/sampledata/profileschemaproperty/0c0c0c3c-5ce1-4df6-98d3-4c22a75748ea"/>
    <FAI:hasProperty rdf:resource="http://datafairport.org/sampledata/profileschemaproperty/94037d0d-3d8e-4fc4-bd24-dafb85520089"/>
    <FAI:onClassType rdf:resource="http://purl.obolibrary.org/obo/SO_0001023"/>
    <dcterms:provenance rdf:resource="#Profile"/>
    <s:label>FAIR Class of Allele</s:label>
  </FAI:FAIRClass>
  <FAI:FAIRProfile rdf:about="#Profile">
    <FAI:hasClass rdf:resource=""/>
    <dcterms:description>FAIR Profile Allele record properties, using textual descriptions and links to Gene Records</dcterms:description>
    <dcterms:identifier>doi:Mark.Dragon.P1</dcterms:identifier>
    <dcterms:license>Anyone may use this freely</dcterms:license>
    <dcterms:organization>wilkinsonlab.info</dcterms:organization>
    <dcterms:title>FAIR Profile of Descriptive Allele records</dcterms:title>
    <rdf:type rdf:resource="http://purl.org/dc/terms/ProvenanceStatement"/>
    <s:label>FAIR Profile Allele</s:label>
  </FAI:FAIRProfile>
  <FAI:FAIRProperty rdf:about="http://datafairport.org/sampledata/profileschemaproperty/0c0c0c3c-5ce1-4df6-98d3-4c22a75748ea">
    <FAI:allowedValues rdf:resource="../ConceptSchemes/xsdstring"/>
    <FAI:onPropertyType rdf:resource="http://purl.org/dc/terms/description"/>
    <s:label>description</s:label>
  </FAI:FAIRProperty>
  <FAI:FAIRProperty rdf:about="http://datafairport.org/sampledata/profileschemaproperty/94037d0d-3d8e-4fc4-bd24-dafb85520089">
    <FAI:allowedValues rdf:resource="../ConceptSchemes/SequenceOntologyGene704"/>
    <FAI:maxCount>1</FAI:maxCount>
    <FAI:minCount>1</FAI:minCount>
    <FAI:onPropertyType rdf:resource="http://purl.obolibrary.org/obo/so_variant_of"/>
    <s:label>variant of</s:label>
  </FAI:FAIRProperty>
</rdf:RDF>

FAIR Profile #1 - “Descriptive”
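A client that wants to know what a Profile promises can read its FAIRProperty entries. The sketch below does this with the standard library only, treating the RDF/XML as ordinary XML. That shortcut works for this particular serialization but is an assumption: a real RDF parser would be more robust, because the same graph can be serialized in other ways. The `list_properties` helper is illustrative and not part of the prototype; the excerpt is trimmed to the “variant of” property.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "FAI": "http://datafairport.org/schemas/FAIR-schema.owl#",
    "s": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Trimmed excerpt of the "Descriptive" Profile.
PROFILE_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:FAI="http://datafairport.org/schemas/FAIR-schema.owl#"
    xmlns:s="http://www.w3.org/2000/01/rdf-schema#">
  <FAI:FAIRProperty rdf:about="http://datafairport.org/sampledata/profileschemaproperty/94037d0d-3d8e-4fc4-bd24-dafb85520089">
    <FAI:allowedValues rdf:resource="../ConceptSchemes/SequenceOntologyGene704"/>
    <FAI:maxCount>1</FAI:maxCount>
    <FAI:minCount>1</FAI:minCount>
    <FAI:onPropertyType rdf:resource="http://purl.obolibrary.org/obo/so_variant_of"/>
    <s:label>variant of</s:label>
  </FAI:FAIRProperty>
</rdf:RDF>"""


def _int_or_none(text):
    """Convert a cardinality string to int, keeping None for 'not stated'."""
    return int(text) if text is not None else None


def list_properties(profile_xml):
    """List each FAIRProperty with its predicate, value constraint and cardinality."""
    root = ET.fromstring(profile_xml)
    properties = []
    for prop in root.findall("FAI:FAIRProperty", NS):
        predicate = prop.find("FAI:onPropertyType", NS)
        allowed = prop.find("FAI:allowedValues", NS)
        properties.append({
            "label": prop.findtext("s:label", default=None, namespaces=NS),
            "predicate": predicate.get(RDF_RESOURCE) if predicate is not None else None,
            "allowed_values": allowed.get(RDF_RESOURCE) if allowed is not None else None,
            "min_count": _int_or_none(prop.findtext("FAI:minCount", default=None, namespaces=NS)),
            "max_count": _int_or_none(prop.findtext("FAI:maxCount", default=None, namespaces=NS)),
        })
    return properties


for entry in list_properties(PROFILE_XML):
    print(entry)
```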

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:FAI="http://datafairport.org/schemas/FAIR-schema.owl#" xmlns:dc="http://purl.org/dc/terms/" xmlns:s="http://www.w3.org/2000/01/rdf-schema#">
  <FAI:FAIRClass rdf:about="">
    <FAI:hasProperty rdf:resource="http://datafairport.org/sampledata/profileschemaproperty/6e35cbde-3e6a-430d-be22-e87507c71827"/>
    <FAI:onClassType rdf:resource="http://purl.obolibrary.org/obo/SO_0001023"/>
    <dc:provenance rdf:resource="#Profile"/>
    <s:label>FAIR Class of Allele</s:label>
  </FAI:FAIRClass>
  <FAI:FAIRProfile rdf:about="#Profile">
    <FAI:hasClass rdf:resource=""/>
    <dc:description>FAIR Profile the Image portion of an Allele record using SIO:Image classification</dc:description>
    <dc:identifier>doi:Mark.Dragon.P2</dc:identifier>
    <dc:license>Anyone may use this freely</dc:license>
    <dc:organization>wilkinsonlab.info</dc:organization>
    <dc:title>FAIR Profile the Image portion of an Allele record</dc:title>
    <rdf:type rdf:resource="http://purl.org/dc/terms/ProvenanceStatement"/>
    <s:label>FAIR Profile of Allele Images(SIO)</s:label>
  </FAI:FAIRProfile>
  <FAI:FAIRProperty rdf:about="http://datafairport.org/sampledata/profileschemaproperty/6e35cbde-3e6a-430d-be22-e87507c71827">
    <FAI:allowedValues rdf:resource="../ConceptSchemes/SIOOntologyImage81"/>
    <FAI:onPropertyType rdf:resource="http://semanticscience.org/ontology/SIO_000205"/>
    <s:label>is represented by</s:label>
  </FAI:FAIRProperty>
</rdf:RDF>

FAIR Profile #2 - “Image using SIO ontology”

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:FAI="http://datafairport.org/schemas/FAIR-schema.owl#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:s="http://www.w3.org/2000/01/rdf-schema#">
  <FAI:FAIRClass rdf:about="">
    <FAI:hasProperty rdf:resource="http://datafairport.org/sampledata/profileschemaproperty/4fbb6c39-1bbe-49bf-af36-c966c3e233a1"/>
    <FAI:onClassType rdf:resource="http://purl.obolibrary.org/obo/SO_0001023"/>
    <dcterms:provenance rdf:resource="#Profile"/>
    <s:label>FAIR Class of Allele</s:label>
  </FAI:FAIRClass>
  <FAI:FAIRProfile rdf:about="#Profile">
    <FAI:hasClass rdf:resource=""/>
    <dcterms:description>FAIR Profile the Image portion of an Allele record using EDAM:Image classification</dcterms:description>
    <dcterms:identifier>doi:Mark.Dragon.P3</dcterms:identifier>
    <dcterms:license>Anyone may use this freely</dcterms:license>
    <dcterms:organization>wilkinsonlab.info</dcterms:organization>
    <dcterms:title>FAIR Profile the Image portion of an Allele record</dcterms:title>
    <rdf:type rdf:resource="http://purl.org/dc/terms/ProvenanceStatement"/>
    <s:label>FAIR Profile Allele Images (EDAM)</s:label>
  </FAI:FAIRProfile>
  <FAI:FAIRProperty rdf:about="http://datafairport.org/sampledata/profileschemaproperty/4fbb6c39-1bbe-49bf-af36-c966c3e233a1">
    <FAI:allowedValues rdf:resource="../ConceptSchemes/EDAMOntologyImage2968"/>
    <FAI:onPropertyType rdf:resource="http://semanticscience.org/ontology/SIO_000205"/>
    <s:label>is represented by</s:label>
  </FAI:FAIRProperty>
</rdf:RDF>

FAIR Profile #3 - “Image using EDAM ontology”


FAIR Accessor Step 1

(http://antirrhinum.net/cgi-bin/LDP/Alleles)


FAIR Accessor Step 2

(click on “cho” in the list of alleles)

The Projector takes the application/xml distribution and projects it...

XML of the Allele records projected using the “descriptive” FAIR Profile

http://biordf.org/cgi-bin/DataFairPort/DragonDB_LDF_Profiler/DragonDB_Allele_ProfileAlleleDescriptions/

XML of the Allele records (SAME XML!) projected using the “SIO Image” FAIR Profile

http://biordf.org/cgi-bin/DataFairPort/DragonDB_LDF_Profiler/DragonDB_Allele_ProfileImagesSIO/

http://semanticscience.org/resource/SIO_000081

XML of the Allele records (SAME XML!) projected using the “EDAM Image” FAIR Profile

http://biordf.org/cgi-bin/DataFairPort/DragonDB_LDF_Profiler/DragonDB_Allele_ProfileImagesEDAM/

http://edamontology.org/data_2968


Pagination controls from the Hydra ontology...

This was a very “lightweight” demo, but it made the point:

Same data, different lenses (FAIR Profiles)

In this demo, all three Projections used the same FAIR Accessor

It is obviously more efficient to reach into the database directly and skip the Accessor altogether. Yes!

However, the point of this demo was to show the “worst case scenario”, where you need to FAIR Project something that you have absolutely no control over
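The “same data, different lenses” idea can be sketched in a few lines. The function below takes the same Allele XML and emits triples whose image class depends on the lens chosen. The predicate and the two image class URIs are the ones shown in the Profiles and demo slides; the example.org URI scheme for alleles and images, and the `project_images` function itself, are assumptions made for illustration and do not reproduce the actual Projector output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
IS_REPRESENTED_BY = "http://semanticscience.org/ontology/SIO_000205"
SIO_IMAGE = "http://semanticscience.org/resource/SIO_000081"
EDAM_IMAGE = "http://edamontology.org/data_2968"

# Trimmed excerpt of the "cho" Allele record.
RECORD_XML = """<Allele class="Allele" value="cho">
  <Phenotype_picture class="Phenotype_Picture" value="cho-0" />
  <Phenotype_picture class="Phenotype_Picture" value="cho-1" />
</Allele>"""


def project_images(xml_text, image_class, base="http://example.org/"):
    """Project the picture fields of one record as (subject, predicate, object) triples.

    `base` is a hypothetical URI prefix; a real Projector would use the
    repository's own identifiers.
    """
    root = ET.fromstring(xml_text)
    allele = base + "allele/" + quote(root.get("value"), safe="")
    triples = []
    for pic in root.findall("Phenotype_picture"):
        image = base + "image/" + quote(pic.get("value"), safe="")
        triples.append((allele, IS_REPRESENTED_BY, image))
        triples.append((image, RDF_TYPE, image_class))
    return triples


# Same XML, two lenses: only the class assigned to each image differs.
for lens in (SIO_IMAGE, EDAM_IMAGE):
    for triple in project_images(RECORD_XML, lens):
        print(triple)
```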

Stage 2: Leverage the Modularity

[Diagram, built up over five slides. Labels: implementedBy (twice); Repository X; Repository Y; Merged data to be cross-queried]
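Once two repositories project into the same Profile, merging is just a union of triples. The sketch below illustrates this with in-memory sets. The repository contents, the example.org URIs and the `alleles_by_gene` helper are all invented for illustration; only the “variant of” predicate comes from the Descriptive Profile.

```python
from collections import defaultdict

VARIANT_OF = "http://purl.obolibrary.org/obo/so_variant_of"

# Hypothetical projections of two repositories into the same FAIR Profile.
repo_x = {
    ("http://x.example.org/allele/a1", VARIANT_OF, "http://example.org/gene/G1"),
}
repo_y = {
    ("http://y.example.org/allele/b7", VARIANT_OF, "http://example.org/gene/G1"),
    ("http://y.example.org/allele/b9", VARIANT_OF, "http://example.org/gene/G2"),
}


def alleles_by_gene(triples):
    """Index allele URIs by the gene they are a variant of."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == VARIANT_OF:
            index[obj].add(subject)
    return index


# Because both sides share one Profile, the merge needs no mapping step.
merged = repo_x | repo_y
index = alleles_by_gene(merged)
for gene, alleles in sorted(index.items()):
    print(gene, sorted(alleles))
```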

Main features of FAIR Profiles

● Do not require repository participation - anyone can write a Profile. Most of the time it should be possible to write an Accessor too, even by screen-scraping!

● Provide an end-user-purpose-driven, potentially non-comprehensive “view” on a repository

● FAIR Profiles of any given repository facet may be different! They may use different vocabularies or may interpret fields differently, depending on the needs of the Profile author

● FAIR Profiles can/should be indexed and shared (e.g. in a FAIRport Registry), to facilitate cross-repository interoperability and integration

● There is no (obvious) reason why a FAIR Profile could not be used to describe the DATA in the repository, not just the metadata…
  o my Antirrhinum Allele example does exactly that!

● FAIR Profiles can be used both at the “read” and at the “write” end of data publishing… (Force 11 Oxford meeting demo was for “write” interfaces)

Main features of FAIRport Platform

● GET GET GET!! We didn’t invent any new technology or API :-) :-)

● All components are modular and re-usable, and will often be written by 3rd parties
  o → encourages the creation of an ecosystem of these lightweight, discoverable little data transformers

● All components are identified by URL, and can be “cobbled together” in whatever way a client needs on a particular day (and this can happen automatically!)

● Because everything is identified by a URL, and we only use HTTP GET, components can be “chained” (e.g. the Projector calls GET on the URL of another Projector)

→ i.e. I don’t care how the Projector or Accessor work “under the hood”, it’s all the same GET to me!
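The chaining idea can be sketched as follows: a component knows only the URL of whatever sits upstream and a way to call GET on it. Everything here is hypothetical (the function names, the example.org URL, and the dictionary that stands in for the Web so the example runs offline); it shows the shape of the idea, not the prototype's code.

```python
from urllib.request import urlopen


def http_get(url):
    """Plain HTTP GET, returning the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def make_chained_component(upstream_url, transform, get=http_get):
    """Build a component whose output is transform(GET upstream_url).

    The component never needs to know whether the upstream URL is a
    repository, an Accessor, or another Projector.
    """
    def component():
        return transform(get(upstream_url))
    return component


# Offline stand-in for the Web, so the sketch runs without a network.
fake_web = {"http://example.org/accessor/cho": "cho"}

projector = make_chained_component(
    "http://example.org/accessor/cho",
    str.upper,                  # placeholder for a real projection step
    get=fake_web.__getitem__,   # swap in http_get for real requests
)
print(projector())  # prints CHO
```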

Skunkworks Participants

● Mark Wilkinson
● Michel Dumontier
● Barend Mons
● Tim Clark
● Jun Zhao
● Paolo Ciccarese
● Paul Groth
● Erik van Mulligen
● Luiz Olavo Bonino da Silva Santos
● Matthew Gamble
● Carole Goble
● Joël Kuiper
● Morris Swertz
● Erik Schultes
● Mercè Crosas
● Adrian Garcia
● Philip Durbin
● Jeffrey Grethe
● Katy Wolstencroft
● Sudeshna Das
● M. Emily Merrill

Working Examples

- One (small) dataset (the Allele slice of my own DragonDB): http://antirrhinum.net
  An example record in the Repository's native format is here: http://antirrhinum.net/cgi-bin/ace/generic/xml/DragonDB?name=cho;class=Allele

- Three different FAIR Profiles that could be applied to Allele records (from ANY repository) - one with textual descriptions and gene cross-references, the other two with phenotypic images described using the SIO ontology or the EDAM ontology (respectively). This is the "F" in FAIR, since these can (in principle) be searched and queried in order to find various representations of your data of interest. Profiles are associated - in a many-to-many relationship - with specific repositories via “Projectors” (see below). A Repository may project into many different Profiles, and many Repositories may project their data into the same Profile.
  * http://biordf.org/DataFairPort/ProfileSchemas/Allele_Profile_Descriptive.rdf
  * http://biordf.org/DataFairPort/ProfileSchemas/Allele_Profile_EDAM.rdf
  * http://biordf.org/DataFairPort/ProfileSchemas/Allele_ProfileSIO.rdf

- a "FAIR Accessor" that provides a Linked Data Platform-compliant way to retrieve all of the URIs for the Allele records, as well as their various representations (described as DCAT Distributions). This is the "A" in FAIR.
  http://antirrhinum.net/cgi-bin/LDP/Alleles

- a "FAIR Projector" that takes the data from the Allele records and "projects" it as RDF that is compliant with whichever Profile you choose. This is the "I" in FAIR.
  http://biordf.org/cgi-bin/DataFairPort/DragonDB_LDF_Profiler
  If you call HTTP GET on that URL, it will report which FAIR Profiles it is capable of projecting, and using which FAIR Accessor (if any). In this example, all three Projections use an Accessor, and use the same Accessor in each case. This is the ‘worst case scenario’, as it represents the slowest, most roundabout way to access a Repository’s records - it would generally only be used if the Repository provides no externally-facing API of its own! (that’s why the demo runs so slowly…) I did the demo this way so that everyone could see every component “working together”.

Three “Projections” of the DragonDB Allele Data (note that most of the process above is achieved simply by calling GET on the URLs below!!)

http://biordf.org/cgi-bin/DataFairPort/DragonDB_LDF_Profiler/DragonDB_Allele_ProfileAlleleDescriptions
http://biordf.org/cgi-bin/DataFairPort/DragonDB_LDF_Profiler/DragonDB_Allele_ProfileImagesSIO
http://biordf.org/cgi-bin/DataFairPort/DragonDB_LDF_Profiler/DragonDB_Allele_ProfileImagesEDAM
