fairy stories

FAIRy stories for Christmas Carole Goble The University of Manchester, UK [email protected] ELIXIR-UK, FAIRDOM, ISBE, BioExcel CoE, Software Sustainability Institute Open PHACTS SWAT4HCLS 2017, 5th Dec 2017, Rome


Post on 22-Jan-2018


TRANSCRIPT

Page 1: FAIRy Stories

FAIRy stories

for Christmas

Carole Goble, The University of Manchester, UK [email protected]

ELIXIR-UK, FAIRDOM, ISBE, BioExcel CoE, Software Sustainability Institute, Open PHACTS

SWAT4HCLS 2017, 5th Dec 2017, Rome

Page 2: FAIRy Stories

Once upon a time in a land far, far away lived a King …

Who wanted all data to be FAIR….

Page 3: FAIRy Stories
Page 4: FAIRy Stories

Mark D. Wilkinson, Michel Dumontier,

IJsbrand Jan Aalbersberg, Gabrielle Appleton,

Myles Axton, Arie Baak,

Niklas Blomberg, Jan-Willem Boiten,

Luiz Bonino da Silva Santos, Philip E. Bourne,

Jildau Bouwman, Anthony J. Brookes,

Tim Clark, Mercè Crosas,

Ingrid Dillo, Olivier Dumon, Scott Edmunds,

Chris T. Evelo, Richard Finkers,

Alejandra Gonzalez-Beltran, Alasdair J.G. Gray,

Paul Groth, Carole Goble,

Jeffrey S. Grethe, Jaap Heringa,

Peter A.C. ’t Hoen, Rob Hooft,

Tobias Kuhn, Ruben Kok,

Joost Kok, Scott J. Lusher,

Maryann E. Martone, Albert Mons,

Abel L. Packer, Bengt Persson,

Philippe Rocca-Serra, Marco Roos,

Rene van Schaik, Susanna-Assunta Sansone,

Erik Schultes, Thierry Sengstag,

Ted Slater, George Strawn,

Morris A. Swertz, Mark Thompson,

Johan van der Lei, Erik van Mulligen,

Jan Velterop, Andra Waagmeester,

Peter Wittenburg, Katherine Wolstencroft,

Jun Zhao, Barend Mons

Wilkinson Dumontier Schultes

Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18

Page 5: FAIRy Stories

Queens…

And FAIRY GODMOTHERS

Page 6: FAIRy Stories

Machine Processable Metadata

• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing

Page 7: FAIRy Stories

FAIR spread across the lands ……

VIVO/SciTS Conferences 6-8 August 2014, Austin, TX

Page 8: FAIRy Stories

FAIR spread across the lands ……

Page 9: FAIRy Stories

Stakeholder FAIR Awareness

UK Institutional Research Data Management guidance*

* Jisc: Final Report FAIR in Practice, Nov 2017

Government, Funder, Publisher, National & International Infrastructures…

Institutional

Researchers

FAIR spread across the lands …… BUT not

necessarily all the peoples

Page 10: FAIRy Stories

FAIR spread across the lands ……

Page 11: FAIRy Stories

Moral: Names are important

Spinning (metadata) straw into gold

Be careful what you promise…

Page 12: FAIRy Stories

Me Too!

staking claims

we { are | will be | always have been } FAIR

a rallying flag

Page 13: FAIRy Stories

Hype

Curve

Page 14: FAIRy Stories

http://dx.doi.org/10.1101/225490

http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/

http://fairmetrics.org/

Page 15: FAIRy Stories

Beware…

beauty is in the

eye of the

beholder

What’s FAIR from a cataloguer’s perspective may be useless from a biologist’s viewpoint

Page 16: FAIRy Stories

My Semantic FAIRy Stories

The Scientist and

the FAIR Commons

The MAGIC

Research Object

little semantics and

the big Web

Page 17: FAIRy Stories

The Scientists and the FAIR Research Commons

Supporting mixed types and many researchers

FAIR

Page 18: FAIRy Stories

The Scientists and the

FAIR Research

Commons

Find: ID resolution, faceted navigation, search, RDF, SPARQL endpoint, APIs

A Commons for Workflows

myexperiment.org

A Commons for Systems Biology Projects

fairdomhub.org

investigation

study

assay/analysis

data

models

SOPs

Page 19: FAIRy Stories

Community & Project Commons

Structured organisation across standards and types

Federation over autonomous resources

Laissez-Faire

Independent Users

Ecosystem of types, stores and metadata

Page 20: FAIRy Stories

Own little houses: from straw to bricks

Permission controls
Staged sharing
Licenses
Negotiated access
Embargos
Open

Page 21: FAIRy Stories

Schema
Dublin Core
DataCite, DCAT, Bioschemas

Catalogue Level

Investigation
Studies
Assay/Analysis

Content level

Persistent Identifiers

Content level
subject thematic standards

Content level

Stratified Linked Data

Page 22: FAIRy Stories

Getting the best FAIR metadata….

FAIR Access

– myExperiment -> open

– FAIRDOM -> friends and family

– Hand over straw houses to FAIRDOMHub

“The Tragedy of the Commons”*

– Metadata quality and quantity

– Identifier hygiene

– Curation & contributions

– Public good vs personal burden

– Incorporation into processes

– Community socialisation - obligation mismatches. Credit!

*Mark Musen, https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/

Page 23: FAIRy Stories

project PIs, funders
time burden, distrust

project PIs, funders
PALs – juniors, advocates and Cinderellas
templates, tools
benefit

Page 24: FAIRy Stories

Moral: Incentives

Page 25: FAIRy Stories

Bake in

“Semantic Nudging”

Ontologies stealthily embedded in Excel spreadsheet templates

Added value - Model execution

Vanity, guilt, shaming

Automation

rightfield.org.uk
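
The "semantic nudging" idea above, where ontology terms sit behind the cells of a spreadsheet template, can be sketched in a few lines. This is a minimal illustration and not how RightField itself is implemented: the vocabulary, the `EX:` term identifiers and the `nudge` helper are all hypothetical, and the spreadsheet column is stood in for by plain strings.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary: label -> term identifier.
# The "EX:" identifiers are placeholders, not real ontology IDs.
ALLOWED_TERMS = {
    "microarray": "EX:0001",
    "rna-seq": "EX:0002",
    "mass spectrometry": "EX:0003",
}


def nudge(value):
    """Map a free-text cell value to a term, or suggest the closest label.

    Returns (term_id, suggestion): term_id is set when the value matches
    the vocabulary, otherwise suggestion holds the nearest allowed label
    (or None when nothing is close enough).
    """
    key = value.strip().lower()
    if key in ALLOWED_TERMS:
        return ALLOWED_TERMS[key], None
    close = get_close_matches(key, list(ALLOWED_TERMS), n=1, cutoff=0.6)
    return None, (close[0] if close else None)


# A column of cell values as a researcher might type them.
for cell in ["RNA-Seq", "mass spectrometery", "something else"]:
    term_id, suggestion = nudge(cell)
    print(cell, "->", term_id or f"not in vocabulary, did you mean {suggestion!r}?")
```

The design point is that the researcher keeps typing into a familiar spreadsheet while the template quietly steers values towards terms that tools can process later.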

Page 26: FAIRy Stories

Cinderella?

The Spreadsheet

Page 27: FAIRy Stories

“The Last Mile”* -> The First Mile

FAIR from bench to cloud

Last mile - Infrastructure view

First mile - researcher / resource view

* Dimitrios Koureas et al., Community engagement: The ‘last mile’ challenge for European research e-infrastructures, Research Ideas and Outcomes 2: e9933 (20 Jul 2016) https://doi.org/10.3897/rio.2.e9933

Page 28: FAIRy Stories

the generic vs specific zig zag path

Page 29: FAIRy Stories

The MAGIC Research OBJECT

GENERIC Framework For exchange, reproducibility,

Preservation, active artefacts

Universal Catering, bottomless content

FAIR

Page 30: FAIRy Stories

The FAIR Research Object import, exchange, portability, maintenance

ISA-TAB

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369

Page 31: FAIRy Stories

workflow engine

Workflow Run Provenance

Inputs, Outputs

Intermediates
Parameters
Configs

Narrative

Exchange: between people & platforms
Commons: store, catalogue & archive
Reproduce: preserve, port, repair
Activate: re-compute, mix, compare, evolve

The FAIR Workflow Research Object

Page 32: FAIRy Stories

researchobject.org

Bechhofer et al (2013) Why linked data is not enough for scientists, https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/

Standards-based generic

metadata framework for

bundling internal and external

resources with context

citable reproducible packaging

Data used and results produced in study
Methods employed to produce/analyse data
Provenance and settings for the experiments
People involved in the investigation
Annotations about these resources: understanding & interpretation

Page 33: FAIRy Stories

Linking across ROs and into the Linked Open Data Cloud

• Recording & linking together the components of an experiment

• Linking across experiments.

• Linked ROs

• A Semantic Web of Research Objects

• Resource References – a bottomless pot

Page 34: FAIRy Stories

Technology Independent.

The least possible. The simplest feasible. Low tech.

Low user overhead and thin client

Graceful degradation.

FAIR ROs Desiderata

Page 35: FAIRy Stories

Construction, Content, Profile, Types

Identification: to locate things
Aggregates: to link things together
Annotations: about things & their relationships

Type Checklists: what should be there
Provenance: where it came from
Versioning: its evolution
Dependencies: what else is needed

Manifest checklist
Type Checklists: describing what should be there

Container

Metadata

Objects
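
The three construction elements above (identification, aggregation, annotation) can be illustrated with a small manifest builder. This is a sketch under stated assumptions: the field names `id`, `aggregates`, `annotations`, `about` and `content` are loosely modelled on Research Object manifests and should be checked against the specifications at researchobject.org, and the file paths and both helper functions are hypothetical.

```python
import json


def build_manifest(ro_id, resources, annotations):
    """Assemble a minimal manifest: an identifier, the aggregated
    resources, and annotations that point at those resources."""
    return {
        "id": ro_id,
        "aggregates": [{"uri": uri} for uri in resources],
        "annotations": [
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


def dangling_annotations(manifest):
    """List annotation targets that are not aggregated by the manifest."""
    aggregated = {entry["uri"] for entry in manifest["aggregates"]}
    return [
        ann["about"]
        for ann in manifest["annotations"]
        if ann["about"] not in aggregated
    ]


# Hypothetical contents of one Research Object.
manifest = build_manifest(
    "/",
    ["data/results.csv", "workflow/analysis.cwl"],
    [("data/results.csv", "annotations/provenance.ttl")],
)
print(json.dumps(manifest, indent=2))
print("dangling annotation targets:", dangling_annotations(manifest))
```

Keeping the manifest as plain JSON is in the spirit of the desiderata on the previous slide: low tech, and readable without special tooling.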

Page 36: FAIRy Stories

Construction

http://www.researchobject.org/specifications/

RO Model

Identifiers: URI, RRI, DOI, ORCID

W3C Web Annotation Vocabulary

Open Archives Initiative Object Reuse and Exchange (OAI-ORE)

Aggregation

Annotation

Container

Page 37: FAIRy Stories

Content

Profiles. Progression Levels

Container

Page 38: FAIRy Stories

Profile

http://purl.org/minim/description

W3C Shape Specs

*Gamble, Zhao, Klyne, Goble. "MIM: A Minimum Information Model Vocabulary and Framework for Scientific Linked Data", IEEE eScience 2012, Chicago, USA (October 2012), http://dx.doi.org/10.1109/eScience.2012.6404489

validators / viewers

Minim model for defining checklists*

multiple profiles for different consumers

Generic

Specifics

RO-SHOW

Container
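
The checklist idea on this slide, with multiple profiles for different consumers, can be sketched as follows. The code does not use the Minim vocabulary itself: the profile names, the MUST/SHOULD levels as plain dictionary keys, and the `evaluate` helper are assumptions made for illustration.

```python
# Hypothetical profiles: each maps a requirement level to the items a
# Research Object is expected to contain for that kind of consumer.
PROFILES = {
    "generic": {
        "MUST": ["manifest"],
        "SHOULD": ["license"],
    },
    "workflow_run": {
        "MUST": ["workflow", "inputs", "outputs"],
        "SHOULD": ["provenance"],
    },
}


def evaluate(checklist, present):
    """Check the items present in a Research Object against a checklist.

    Returns (satisfied, missing): satisfied is True when every MUST item
    is present, and missing lists the absent items per level.
    """
    missing = {
        level: [item for item in items if item not in present]
        for level, items in checklist.items()
    }
    return not missing.get("MUST"), missing


# The same object can pass one profile and fail another.
contents = {"workflow", "inputs", "outputs"}
for name, checklist in PROFILES.items():
    satisfied, missing = evaluate(checklist, contents)
    print(name, "satisfied" if satisfied else "not satisfied", missing)
```

Separating the generic check from the specific ones mirrors the Generic / Specifics split on the slide: the container stays the same, only the checklist applied to it changes.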

Page 39: FAIRy Stories

Linked Data Pharmacological Discovery Platform Data Releases
Dataset “build”

RO Library
Earth Sciences

Public Health Learning Systems

Asthma Research e-Lab sharing and computing statistical cohort studies

Happy Endings!

ISA based Packaging, Systems Biology commons & publishing

Managing distributed unmovable large datasets for Biomedical HTS analytic pipelines *

* Chard et al., I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets, https://doi.org/10.1109/BigData.2016.7840618

Page 40: FAIRy Stories

Happy Ending – Workflows

Biomedical HTS analytic pipelines

Manifest description of CWL workflows + rich context + provenance + other objects + snapshots

Precision medicine
NGS pipelines regulation*

*Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al., Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783

EDAM

Biomolecular modelling

Portable Workflows

Page 41: FAIRy Stories

BagIt, JSON(-LD), schema.org

https://dokie.li/

https://linkedresearch.org/

Manifest: Schema.org, JSON-LD, RDF
Archive: .tar.gz

Reproducible Document Stack project

eLife, Substance and Stencila

BagIt data profile + schema.org JSON-LD annotations

Many Roads
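
One of the roads above, a BagIt-style package carrying schema.org JSON-LD annotations, can be sketched with the standard library alone. This is an illustration under assumptions, not a complete BagIt implementation: it writes only the `bagit.txt` declaration, a `data/` payload directory and a SHA-256 payload manifest. The `metadata/annotations.jsonld` location and the `make_bag` helper are hypothetical choices.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload, annotations):
    """Write payload files under data/, a checksum manifest, and a
    JSON-LD annotation file (hypothetical location) into bag_dir."""
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True)
    lines = []
    for name, content in sorted(payload.items()):
        (bag / "data" / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        lines.append(f"{digest}  data/{name}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    (bag / "metadata").mkdir()
    (bag / "metadata" / "annotations.jsonld").write_text(
        json.dumps(annotations, indent=2), encoding="utf-8"
    )
    return bag


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(
        Path(tmp) / "bag",
        {"results.csv": b"a,b\n1,2\n"},
        {"@context": "https://schema.org", "@type": "Dataset", "name": "Example results"},
    )
    manifest_text = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    declaration = (bag / "bagit.txt").read_text(encoding="utf-8")
    print(manifest_text, end="")
```

The resulting directory can then be archived (for example as .tar.gz) for exchange, and the checksums let a consumer verify the payload on arrival.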

Page 42: FAIRy Stories

Morals

Incremental, open frameworks are hard work

– Extensive reuse of standards is tricky

– Too Generic vs Too Specific

– Multi-element type & nesting challenges

– ROs with a Purpose

– Examples & templates

Representational Beauty vs Tools

– Easy to make, hard to consume

– Be specific, be developer friendly

– Profiles & tools critical

Patience is a virtue

Page 43: FAIRy Stories

Bioschemas:

Little Semantics and

the big web

Being and keeping light,

small and viral

FAIR

Page 44: FAIRy Stories

Structured data markup for web pages

Schema.org adds simple structured metadata markup to web pages & sitemaps for harvesting, search and summary snippet making.

Search engines often highlight websites containing Schema.org

Widespread commercial and open source infrastructure creates a low barrier to adoption
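
As a concrete illustration of the markup described on this slide, the sketch below renders a schema.org description as a JSON-LD script element that could be placed in a web page. The property names (`name`, `description`, `url`, `license`, `keywords`) are standard schema.org properties. The dataset, the example.org URL and the `jsonld_script` helper are hypothetical.

```python
import json


def jsonld_script(metadata):
    """Render a metadata dictionary as an embeddable JSON-LD script tag."""
    body = json.dumps(metadata, indent=2)
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical dataset description.
metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example expression dataset",
    "description": "Illustrative record used to show structured markup.",
    "url": "https://example.org/datasets/1",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["transcriptomics", "example"],
}

snippet = jsonld_script(metadata)
print(snippet)
```

Because the markup rides along in ordinary web pages, it needs no dedicated API or feed on the publisher's side.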

Page 45: FAIRy Stories

Goldilocks & the 3 Use Cases

Standardised metadata mark-up

Metadata published & harvested without APIs or special feeds

3 Use Cases

1. Finding/Citing
2. Summary snippets
3. Metadata exchange / ingest

Goldilocks
• Reuse ubiquitous commercial platform
• The least possible change, the max possible reuse
• Minimum properties – 6
• Reuse domain ontologies – we are not reinventing them!

Commodity Off the Shelf tools
App eco-system

Repository Level
Content type level
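
The use cases on this slide depend on metadata being harvested straight from web pages, without APIs or special feeds. A minimal sketch of such a harvester, using only Python's standard `html.parser`, is given below. The page content, the `JsonLdHarvester` class and the `summary` helper are hypothetical, and a production harvester would also need crawling, sitemap handling and fuller error handling.

```python
import json
from html.parser import HTMLParser


class JsonLdHarvester(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than abort the harvest


def harvest(html):
    """Return every JSON-LD object embedded in an HTML document."""
    parser = JsonLdHarvester()
    parser.feed(html)
    return parser.items


def summary(item):
    """Build a one-line summary snippet from harvested metadata."""
    return f'{item.get("name", "untitled")}: {item.get("description", "")}'


# Hypothetical page carrying schema.org markup next to an ordinary script.
PAGE = """<html><head>
<script>var unrelated = 1;</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "CreativeWork",
 "name": "Intro to workflows", "description": "A short training course."}
</script>
</head><body><h1>Intro to workflows</h1></body></html>"""

harvested = harvest(PAGE)
for item in harvested:
    print(summary(item))
```

The same harvested dictionaries can feed all three use cases: indexing for finding and citing, summary snippets, and metadata ingest into a registry.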

Page 46: FAIRy Stories

Standardised metadata mark-up

Metadata published & harvested without APIs or special feeds

Commodity Off the Shelf tools
App eco-system

Repository Level
Content type level

Goldilocks & the 3 Use Cases

Page 47: FAIRy Stories

Training materials
Events

Organizations Data

Software Lab Protocols

schema.org tailored to the Biosciences for FAIR
simple structured metadata markup on web pages & sitemaps

bio.tools

Page 48: FAIRy Stories

schema.org tailored to the Biosciences
simple structured metadata markup on web pages & sitemaps

• Specific for life sciences
• Extends existing Schema.org types
• Focused on few types and well defined relationships
• Minimum properties for finding and accessing data
• Best practices for selected properties
• Managed by Bioschemas.org

• Generic data model
• Generous list of properties to describe data types
• Managed by Schema.org

Page 49: FAIRy Stories

Tailored schema.org to improve Findability and Accessibility in Bioscience

Layer of constraints + documentation + extensions

Leyla Garcia. Poster & Flashtalk

Page 50: FAIRy Stories

2-3 Oct 2017, Hinxton, ~50 people

Ideally 6 concepts
Reuse ontologies

schema.org
Real mark-up
Tools

Find, Cite, Snippets, Metadata exchange

Community

Page 51: FAIRy Stories

http://www.france-bioinformatique.fr/en/training_material

https://search.google.com/structured-data/testing-tool

Applied Drupal 7 schema.org extension
Took about 2 hours

Included in TeSS in an hour [Niall Beard]

Page 52: FAIRy Stories

MORALs

Community Buy-in Worth it

• First specs & main mechanism for training

• Google / Schema & ELIXIR support

• Research Schemas for European Open Science Cloud pilot

Goldilocks works but is hard work

• Types & Profiles debates

• Elegance vs best for tools

• Reuse domain ontologies

• Validation, mark-up & harvesting tools

Trolls

Page 53: FAIRy Stories

How are we FAIRing?

Different levels with different emphasis
It’s an ecosystem, not a single solution

• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing

Page 54: FAIRy Stories

smart rebrand launch

Still hard, same stuff

Rally big communities and grassroots initiatives

Examine our capabilities

There is no magic

Page 55: FAIRy Stories

FAIRy Land PEST

Political

Economic

Social

Technical

Page 56: FAIRy Stories

Platform & user buy-in from the get-go

Passionate, dedicated leadership

Seeding critical mass

Community

Tools Driver

Bottom-up initiatives fostered by big umbrella infrastructures

FAIR Semantic Village*

Simple & Lightweight

Ramps not revolutions

FAIR with a PURPOSE & With PEOPLE

FAIR

Support typical developer – Familiarity – JSON, APIs

*Deb McGuinness

Page 57: FAIRy Stories

Research for FAIR

FAIR representation

• The Semantic Web

Automated metadata

• Deep learning, machine learning, AI

• Text Mining, Ontology mapping

Social metadata

• User Experience, Crowd Sourcing

• Choice architecture

FAIR action

• Blockchain

• Virtualised & remote execution

• Image processing

• Preservation & portability

• Provenance tracking, object trajectories

• Engineering & Design, Ethics, Social Sciences

Research +

Developer Practitioner

practices

Page 58: FAIRy Stories

Mark Robinson, Norman Morrison, Paul Groth, Tim Clark, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Daniel Garijo, Catarina Martins, Iain Buchan, Caroline Jay, David De Roure, Oscar Corcho, Steve Pettifer, Khalid Belhajjame, Jun Zhao, Phil Crouch, Lilian Gorea, Oluwatomide Fasugba

Stian Soiland-Reyes, Michael Crusoe, Rafael Jimenez, Alasdair Gray, Barend Mons, Sean Bechhofer

Michel Dumontier, Mark Wilkinson, Leyla Garcia, Stuart Owen, Katy Wolstencroft, Finn Bacall, Alan Williams, Wolfgang Mueller, Olga Krebs, Jacky Snoep, Matthew Gamble, Raul Palma, Mark Musen

http://www.researchobject.org

http://www.myexperiment.org

http://wf4ever.org

http://www.fair-dom.org

http://www.fairdomhub.org

http://seek4science.org

http://rightfield.org.uk

http://www.bioschemas.org

http://www.commonwl.org

http://www.bioexcel.eu

http://www.openphacts.org

Page 59: FAIRy Stories