Keynote speech - Carole Goble - Jisc Digital Festival 2015


Upload: jisc

Post on 15-Jul-2015


Page 1: Keynote speech - Carole Goble - Jisc Digital Festival 2015

RARE and FAIR Science: Reproducibility and Research Objects

Professor Carole Goble FREng FBCS

The University of Manchester UK

The Software Sustainability Institute

carole.goble@manchester.ac.uk

Jisc Digital Festival 9-10 March 2015 ICC Birmingham UK

Knowledge Turning Flow

Barriers to Cure

» Access to scientific resources

» Coordination and Collaboration

» Flow of Information

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Josh Sommer]

[Pettifer Attwood]

http://getutopia.com

Virtual Witnessing

Scientific publications

» announce a result

» convince readers the result is correct

“papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension”

Jill Mesirov, Broad Institute, 2010

Accessible Reproducible Research, Science 22 January 2010, Vol. 327 no. 5964, pp. 415-416, DOI 10.1126/science.1179653

Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer

Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

datasets, data collections, standard operating procedures, software, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware

Morin et al., Shining Light into Black Boxes, Science 13 April 2012, 336(6078): 159-160

Ince et al., The case for open computer programs, Nature 482, 2012

Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads:

7 studies listed necessary details

26 no access to primary data sets, broken links to home websites

31 no software version, parameters, exact version of genomic reference sequence

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
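The missing details counted above (software version, parameters, exact reference version) are cheap to capture at run time. A minimal sketch of what such a record could look like follows. The function names, field names, and example values are all hypothetical illustrations, not taken from the talk or from any particular tool.

```python
import hashlib
import json
import platform
import sys


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def make_run_record(tool, tool_version, parameters, reference_name, reference_bytes):
    """Collect the details a reader would need to repeat one analysis step."""
    return {
        "tool": tool,
        "tool_version": tool_version,
        "parameters": parameters,
        # A checksum pins the exact reference content, not just its name.
        "reference": {
            "name": reference_name,
            "sha256": sha256_of(reference_bytes),
        },
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


# Hypothetical values, for illustration only.
record = make_run_record(
    tool="example-aligner",
    tool_version="0.0.0",
    parameters={"max_mismatches": 2},
    reference_name="example-reference-build",
    reference_bytes=b"ACGT",
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Writing this record next to every output file is one way to make the "necessary details" available without relying on the methods section.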

Broken software, Broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retract 3 Science papers and 2 papers in other journals

» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006, vol. 314 no. 5807, 1856-1857

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?”, Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1-8

Tools, Standards

Machine actionable

Formats, Reporting, Policies, Practices

Record and Automate Everything

recomputation.org

sciencecodemanifesto.org

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Honest Error Science is messy

Inherent

Reinhart/Rogoff Austerity economics

Thomas Herndon

Nature Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Nick D Kim, strange-matter.net

Norman Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, Resource intensive

Unrecognised, Trolled

Just gathering the bits

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

http://myexperiment.org

Research Objects

Compound Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Relate (scattered) resources

Metadata Objects that carry Research Context

Research Objects

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system

MRC-funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus, Pivot and Profile

Profile around methods, workflows, scripts, software, data, figures…

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reporting

• Design, protocols, samples, software, models…

• Just Enough Results Model

• Common and specific elements

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config

Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

1. Science Changes. So does the Lab

BioSTIF

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

RO as Instrument Materials Method

Input Data

Software

Output Data

Config

Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (software and hardware infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

Diagram labels: transparency, dependencies, steps, provenance, portability, robustness, preservation, access, available, description, intelligible, standards, common APIs, licensing, common metadata, change management, versioning, packaging

Machine actionable


Provenance - the link between doing and reporting
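One way to read "the link between doing and reporting" is that each step records what it consumed and produced as it runs, so the report can be generated from the record instead of from memory. The sketch below shows that idea with a plain Python decorator. The `recorded` decorator, the in-memory `provenance_log`, and the two example steps are hypothetical and are not the provenance model used by the projects named in the talk.

```python
import functools
import time

# Hypothetical in-memory log; a real system would persist this.
provenance_log = []


def recorded(func):
    """Wrap a step so each call appends its inputs and output to the log."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        provenance_log.append({
            "step": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "finished_at": time.time(),
        })
        return result
    return wrapper


@recorded
def normalise(values):
    """Scale values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


@recorded
def largest(values):
    """Return the maximum value."""
    return max(values)


peak = largest(normalise([1, 1, 2]))

# The log now holds one entry per step, in execution order.
for entry in provenance_log:
    print(entry["step"], entry["args"], "->", entry["result"])
```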

Reproduce by Reading: Archived Record

Retain the Process / Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument

Retain the bits

The IT Crowd Series 3 Episode 4

The Internet

service

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects

the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log
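To make "the secret is the manifest" concrete, here is a minimal sketch of a manifest that lists each bundled part with a role and a checksum. It is an illustration only: the field names (`aggregates`, `role`, `sha256`, `bytes`), the paths, and the file contents are hypothetical, and this is not the manifest format defined by the Research Object specifications.

```python
import hashlib
import json


def build_manifest(resources):
    """Build a manifest dict from {path: (role, content_bytes)}."""
    aggregates = []
    for path in sorted(resources):
        role, content = resources[path]
        aggregates.append({
            "path": path,
            "role": role,
            # The checksum lets a recipient verify each part is unchanged.
            "sha256": hashlib.sha256(content).hexdigest(),
            "bytes": len(content),
        })
    return {"aggregates": aggregates}


# Hypothetical bundle contents, mirroring the kinds of part listed above.
bundle = {
    "workflow/definition.txt": ("workflow definition", b"step1 -> step2"),
    "data/input.csv": ("input data", b"a,b\n1,2\n"),
    "data/output.csv": ("output data", b"sum\n3\n"),
    "config/params.json": ("parameter config", b'{"threshold": 0.5}'),
    "provenance/run.log": ("provenance log", b"step1 ok\nstep2 ok\n"),
}
manifest = build_manifest(bundle)
print(json.dumps(manifest, indent=2))
```

The design point is that the manifest, not the folder layout, says what each file is for, so the same bundle can be interpreted on another platform.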

Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Profiles

NISO-JATS


Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what?

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

Page 2: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Knowledge Turning Flow

Barriers to Cure

raquo Access to scientific resources

raquo Coordination and Collaboration

raquo Flow of Information

httpforatv20100423Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Josh Sommer]

[Pettifer Attwood]

httpgetutopiacom

Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

datasetsdata collectionsstandard operating proceduressoftwarealgorithmsconfigurationstools and appscodesworkflows scriptscode librariesservicessystem software infrastructure compilers hardware

Morin et al Shining Light into Black BoxesScience 13 April 2012 336(6078) 159-160

Ince et al The case for open computer programs Nature 482 2012

Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

7 studies listed necessary details

26 no access to primary data sets broken links to home websites

31 no sw version parameters exact version of genomic reference

sequence

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practicesldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

recomputationorg

sciencecodemanifestoorg

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Nick D Kim strange-matternet

Norman Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

httpmyexperimentorg

Research Objects

Compound Investigations Research Products

Multi-various ProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context

Research Objects

bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested

bull multi ndashtyped stewarded sited authored

bull span research researchers platforms time

bull cite resolve steward

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system

MRC funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reportingbull Design protocols samples

software modelshellipbull Just Enough Results Modelbull Common and specific elements

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

RO as Instrument Materials Method

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

regenerate figure

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

1 Science Changes So does the Lab

BioSTIF

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

Uncertainty

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

stepsprovenance

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Provenance ndash the link between doing and reporting

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

The IT Crowd Series 3 Episode 4

The Internet

serviceScience as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Metadata Objectsthe secret is the manifesthellip

Workflow definition

Data (inputs outputs) Parameter configsProvenance log

Hettne et al Structuring research methods and data with the research object model genomics workflows as a case study 2014 httpwwwjbiomedsemcomcontentpdf2041-1480-5-41pdf

myRDM

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Depth and Coverage Profiles

NISO-JATS

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -gt Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing httparxivorgabs12100530Stodden Reproducible Research Standard Intl J Comm Law amp Policy 13 2009

RARE amp FAIR Knowledge Turns with Research Objects

httpmerchandisethedoctorwhositecouk

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researcher Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

Reality Check

Jorge Cham wwwphdcomicscom

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipCiting what

Research Currencies

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorg

httpmyexperimentorg

httpwwwbioveleuAlan WilliamsNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

httpwwwmygridorguk

Page 3: Keynote speech - Carole Goble - Jisc Digital Festival 2015

[Pettifer Attwood]

httpgetutopiacom

Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

datasetsdata collectionsstandard operating proceduressoftwarealgorithmsconfigurationstools and appscodesworkflows scriptscode librariesservicessystem software infrastructure compilers hardware

Morin et al Shining Light into Black BoxesScience 13 April 2012 336(6078) 159-160

Ince et al The case for open computer programs Nature 482 2012

Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

7 studies listed necessary details

26 no access to primary data sets broken links to home websites

31 no sw version parameters exact version of genomic reference

sequence

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practicesldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

recomputationorg

sciencecodemanifestoorg

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Honest Error Science is messy

Inherent

Reinhart/Rogoff austerity economics, Thomas Herndon

Nature Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor reporting

» Unavailable resources: results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Nick D Kim, strange-matter.net

Norman Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, resource intensive

Unrecognised, trolled

Just gathering the bits

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

http://myexperiment.org

Research Objects

Compound Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Relate (scattered) resources

Metadata Objects that carry Research Context

Research Objects

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• multi–typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system

MRC-funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus, Pivot and Profile

Profile around methods, workflows, scripts, software, data, figures…

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reporting

• Design, protocols, samples, software, models…

• Just Enough Results Model

• Common and specific elements

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate, rerun

repeat

re-examine

repurpose

recreate

reuse

restore, reconstruct, review

regenerate, revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

1. Science Changes. So does the Lab

BioSTIF

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency: dependencies, steps, provenance

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Provenance – the link between doing and reporting

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

The IT Crowd Series 3 Episode 4

The Internet

service

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects: the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log

Hettne et al, Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
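The manifest idea can be illustrated with a few lines of code: a single metadata file that lists every bundled resource (workflow definition, data, parameter configs, provenance log) together with its role and a checksum. This is a minimal sketch only. The function `build_manifest`, the `aggregates` key, the field names and the demo file names are assumptions made for illustration; they do not reproduce the Research Object model or any published manifest format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(bundle_dir, roles):
    """List each bundled file with its role, size and checksum.

    `roles` maps a path relative to `bundle_dir` to a free-text role
    such as "workflow definition" or "provenance log". The layout is
    hypothetical and only meant to show the idea of a manifest.
    """
    bundle = Path(bundle_dir)
    entries = []
    for relative_path in sorted(roles):
        file_path = bundle / relative_path
        entries.append(
            {
                "path": relative_path,
                "role": roles[relative_path],
                "bytes": file_path.stat().st_size,
                "sha256": sha256_of(file_path),
            }
        )
    return {"aggregates": entries}


if __name__ == "__main__":
    # Demo bundle with placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as bundle_dir:
        roles = {
            "workflow.txt": "workflow definition",
            "inputs.csv": "input data",
            "params.json": "parameter config",
            "provenance.log": "provenance log",
        }
        for name in roles:
            (Path(bundle_dir) / name).write_text("placeholder for " + name + "\n")
        print(json.dumps(build_manifest(bundle_dir, roles), indent=2))
```

Because the manifest carries the checksums, a recipient can check that the bundle they received is the one that was described, without opening any of the individual files.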

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Profiles

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what?

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority

http://www.rse.ac.uk

BUT……

two years’ time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk


Page 5: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

datasetsdata collectionsstandard operating proceduressoftwarealgorithmsconfigurationstools and appscodesworkflows scriptscode librariesservicessystem software infrastructure compilers hardware

Morin et al Shining Light into Black BoxesScience 13 April 2012 336(6078) 159-160

Ince et al The case for open computer programs Nature 482 2012

Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

7 studies listed necessary details

26 no access to primary data sets broken links to home websites

31 no sw version parameters exact version of genomic reference

sequence

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practicesldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

recomputationorg

sciencecodemanifestoorg

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Nick D Kim strange-matternet

Norman Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

httpmyexperimentorg

Research Objects

Compound Investigations Research Products

Multi-various ProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context

Research Objects

bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested

bull multi ndashtyped stewarded sited authored

bull span research researchers platforms time

bull cite resolve steward

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system

MRC funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reportingbull Design protocols samples

software modelshellipbull Just Enough Results Modelbull Common and specific elements

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

RO as Instrument Materials Method

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

regenerate figure

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

1 Science Changes So does the Lab

BioSTIF

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

Uncertainty

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

RO as Instrument, Materials, Method

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
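
As a minimal sketch of what recording the Materials (parameters, algorithm seeds) and the Laboratory (systems software) entries above might look like in practice, the Python below fixes a seed and writes down the interpreter and operating system alongside the parameters. The function name `capture_setup`, the record layout and the `threshold` parameter are assumptions made for illustration; they are not part of any Research Object tooling named in the talk.

```python
import json
import platform
import random


def capture_setup(seed: int, parameters: dict) -> dict:
    """Fix the random seed and return a JSON-serialisable setup record.

    The keys ("materials", "laboratory") are hypothetical labels that
    mirror the slide's vocabulary, not a standard schema.
    """
    random.seed(seed)  # same seed -> same pseudo-random draws on a rerun
    return {
        "materials": {"seed": seed, "parameters": parameters},
        "laboratory": {
            "python": platform.python_version(),
            "implementation": platform.python_implementation(),
            "os": platform.system(),
            "machine": platform.machine(),
        },
    }


# Two runs with the same recorded seed draw the same number.
record = capture_setup(seed=42, parameters={"threshold": 0.05})
first_draw = random.random()
capture_setup(seed=42, parameters={"threshold": 0.05})
second_draw = random.random()

print(json.dumps(record, indent=2))
print("draws match:", first_draw == second_draw)
```

A record like this only covers a small part of the laboratory; library versions, services and hardware would need their own entries.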

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency / dependencies

steps / provenance

portability

robustness

preservation

access / available

description / intelligible

standards / common APIs

licensing

standards / common metadata

change management / versioning

packaging

Machine actionable

Provenance – the link between doing and reporting
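
One way to picture provenance as the link between doing and reporting is a log written automatically while the analysis runs. The Python sketch below is an assumption-laden illustration: the decorator `recorded`, the in-memory `PROVENANCE_LOG` and the two toy steps are all hypothetical, and a real system would persist the log and use an agreed vocabulary rather than these ad hoc keys.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

PROVENANCE_LOG = []  # hypothetical in-memory log; a real one would be persisted


def _digest(value) -> str:
    """Short SHA-256 digest of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def recorded(step):
    """Decorator that appends one provenance entry per call of `step`."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        result = step(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "inputs": _digest({"args": args, "kwargs": kwargs}),
            "output": _digest(result),
            "finished_at": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@recorded
def normalise(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


@recorded
def threshold(values, cutoff=0.3):
    """Keep only values at or above the cutoff."""
    return [v for v in values if v >= cutoff]


kept = threshold(normalise([1, 2, 3, 4]), cutoff=0.3)
print(kept)
for entry in PROVENANCE_LOG:
    print(entry["step"], entry["inputs"], "->", entry["output"])
```

The researcher only calls the analysis functions; the record of which step ran on which inputs is a side effect, which is the low-friction behaviour the talk argues for.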

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image), a black box though. docker.com

Reproduce by Running: Active Instrument. Retain the bits

The Internet

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects: the secret is the manifest…

Workflow definition

Data (inputs, outputs), Parameter configs, Provenance log

Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
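
To make the manifest idea concrete, here is a minimal Python sketch that bundles the four kinds of resource listed above (workflow definition, data, parameter configs, provenance log) and writes a manifest describing them. The file names, the `role` labels and the JSON layout are assumptions for illustration only; this is not the manifest format defined by the Research Object model.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file, so a reader can verify they hold the same bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path, roles: dict) -> dict:
    """Describe each aggregated file by path, role and checksum.

    `roles` maps a path relative to `root` to a free-text role label
    (hypothetical labels, chosen here to echo the slide).
    """
    aggregates = [
        {"path": rel_path, "role": role, "sha256": sha256_of(root / rel_path)}
        for rel_path, role in sorted(roles.items())
    ]
    return {"aggregates": aggregates}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Toy stand-ins for the real resources.
    (root / "workflow.txt").write_text("align -> filter -> plot\n")
    (root / "input.csv").write_text("sample,value\na,1\nb,2\n")
    (root / "params.json").write_text(json.dumps({"cutoff": 0.3}))
    (root / "provenance.log").write_text("align finished\nfilter finished\n")

    roles = {
        "workflow.txt": "workflow definition",
        "input.csv": "input data",
        "params.json": "parameter config",
        "provenance.log": "provenance log",
    }
    manifest = build_manifest(root, roles)
    (root / "manifest.json").write_text(json.dumps(manifest, indent=2))

print(json.dumps(manifest, indent=2))
```

The manifest is the only file that knows how the others relate, which is why the slide calls it the secret.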

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what?

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk


Morin et al., Shining Light into Black Boxes, Science, 13 April 2012, 336(6078), 159-160

Ince et al., The case for open computer programs, Nature 482, 2012

Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads:

7 studies listed necessary details

26: no access to primary data sets, broken links to home websites

31: no sw version, parameters, exact version of genomic reference sequence

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Genetics 13 (2012)

Broken software, Broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retracted 3 Science papers and 2 papers in other journals

» One paper cited by 364

[Figure: The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)]

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol 314, no 5807, 1856-1857

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
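
The kind of error described above, two swapped data columns producing a mirror-image result, can in principle be caught by a small regression test on a reference whose handedness is known. The Python sketch below is purely illustrative and is not Chang's program or the actual correction: it uses the fact that swapping two coordinate columns negates the scalar triple product of a point set. The reference points and the function names are assumptions.

```python
def signed_volume(a, b, c, d):
    """Scalar triple product (b-a) . ((c-a) x (d-a)).

    Its sign encodes the handedness of the four points; a mirror
    image has the opposite sign.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def swap_columns(points, i, j):
    """Return a copy of `points` with coordinate columns i and j exchanged."""
    swapped = []
    for p in points:
        q = list(p)
        q[i], q[j] = q[j], q[i]
        swapped.append(tuple(q))
    return swapped


# Hypothetical reference with known (positive) handedness.
reference = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

original = signed_volume(*reference)
flipped = signed_volume(*swap_columns(reference, 0, 1))

print("original handedness:", original)
print("after column swap:  ", flipped)

# A regression test would assert the expected sign on every run.
assert original > 0, "reference structure has unexpected handedness"
```

A check like this costs a few lines, which is the point of the slide that follows: untested inherited code is where such errors hide.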

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?”, Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8

Tools, Standards, Machine actionable, Formats, Reporting, Policies, Practices

Record and Automate Everything

recomputation.org

sciencecodemanifesto.org

republic of science

regulation of science

institution cores, libraries

Merton’s four norms of scientific behaviour (1942)

public services

Honest Error: Science is messy

Inherent

Reinhart/Rogoff, Austerity economics, Thomas Herndon

Nature, Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

Scientific method

» Tainted resources

» Black boxes

» Poor reporting

» Unavailable resources: results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Nick D Kim, strange-matter.net

Norman Morrison

Do a Replication Study? No thanks! Not FAIR

Hard, Resource intensive, Unrecognised, Trolled, Just gathering the bits

Cross-Institutional e-Laboratory

Scattered parts, Subject specific, General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

http://myexperiment.org

Research Objects

Compound Investigations Research Products

Multi-various Products, Platforms/Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Relate (scattered) resources. Metadata Objects that carry Research Context

Research Objects

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system

MRC funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus, Pivot and Profile

Profile around methods, workflows, scripts, software, data, figures…

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions



Page 8: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practicesldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

recomputationorg

sciencecodemanifestoorg

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Nick D Kim strange-matternet

Norman Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

http://myexperiment.org

Research Objects

Compound Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Relate (scattered) resources. Metadata Objects that carry Research Context

Research Objects

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system

MRC funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus, Pivot and Profile

Profile around methods, workflows, scripts, software, data, figures…

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, Not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reporting

• Design, protocols, samples, software, models…

• Just Enough Results Model

• Common and specific elements

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility? Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

1. Science Changes. So does the Lab

BioSTIF

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed. Updated resources

Uncertainty

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
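The Laboratory layer of this breakdown (software and hardware infrastructure, systems software) is something a script could record automatically next to each run. A minimal Python sketch, assuming only the standard library; the function name `describe_laboratory` and the field names are hypothetical illustrations, not part of any Research Object specification:

```python
import json
import platform


def describe_laboratory() -> dict:
    """Collect a small, JSON-serialisable description of the runtime environment.

    The keys are illustrative only; a real profile would likely record far more
    (installed library versions, container image, hardware details).
    """
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
    }


# Print the record so it can be stored alongside the experiment's outputs.
lab = describe_laboratory()
print(json.dumps(lab, indent=2, sort_keys=True))
```

Storing such a record with the outputs is one low-friction way to keep the Setup visible when the lab later changes.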

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, provenance

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Provenance – the link between doing and reporting

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits

The IT Crowd Series 3 Episode 4

The Internet

service

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects: the secret is the manifest…

Workflow definition

Data (inputs, outputs), Parameter configs, Provenance log

Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
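The manifest idea on this slide can be sketched as a small metadata object that lists each aggregated resource (workflow definition, data, parameter configs, provenance log) with a role and a checksum. This is a hedged illustration only: the field names, roles and file paths below are hypothetical and do not reproduce the actual Research Object manifest format.

```python
import hashlib
import json
from typing import Dict, Tuple


def sha256_of(content: bytes) -> str:
    """Return the hex SHA-256 digest of a resource's bytes."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(resources: Dict[str, Tuple[str, bytes]]) -> dict:
    """Build a manifest from a mapping of path -> (role, content).

    Entries are sorted by path so the manifest is stable across runs.
    """
    return {
        "aggregates": [
            {"path": path, "role": role, "sha256": sha256_of(content)}
            for path, (role, content) in sorted(resources.items())
        ]
    }


# Hypothetical bundle contents, standing in for real files.
example = {
    "workflow/analysis.t2flow": ("workflow-definition", b"<workflow/>"),
    "data/input.csv": ("input-data", b"id,value\n1,0.5\n"),
    "config/params.json": ("parameter-config", b'{"seed": 42}'),
    "provenance/run.log": ("provenance-log", b"run started\n"),
}

manifest = build_manifest(example)
print(json.dumps(manifest, indent=2))
```

The checksums let a recipient confirm that the bits they received are the bits that were described, which is a precondition for either reading or rerunning the bundle.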

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Profiles

NISO-JATS

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

BUT……

two years' time, when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

Page 11: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Record and AutomateEverything

recomputationorg

sciencecodemanifestoorg

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Nick D Kim strange-matternet

Norman Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

http://myexperiment.org

Research Objects

Compound Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Relate (scattered) resources. Metadata Objects that carry Research Context

Research Objects

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system

MRC-funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus, Pivot and Profile

Profile around methods, workflows, scripts, software, data, figures…

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reporting

• Design, protocols, samples, software, models…

• Just Enough Results Model

• Common and specific elements

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

1. Science Changes. So does the Lab

BioSTIF

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
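
The "Laboratory" layer of the setup (systems software, underlying libraries) is the part most easily left unrecorded. A minimal sketch of capturing it alongside an experiment follows, assuming a Python-based analysis; the function name `capture_laboratory` and the shape of the record are illustrative assumptions, not something the talk prescribes.

```python
import json
import platform
import sys
from importlib import metadata


def capture_laboratory(packages):
    """Record the 'laboratory' layer: OS, interpreter, and library versions.

    `packages` is a list of distribution names the analysis depends on
    (a hypothetical input chosen by the researcher).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed here: record the gap instead of guessing a version.
            versions[name] = None
    return {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "packages": versions,
    }


if __name__ == "__main__":
    # Example: snapshot the environment next to the results it produced.
    print(json.dumps(capture_laboratory(["pip", "numpy"]), indent=2))
```

Storing this record next to the outputs gives a later reader a starting point when a rerun no longer matches, although it does not by itself preserve the environment.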

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted from Freire 2013]

transparency, dependencies

steps, provenance

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Provenance – the link between doing and reporting
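
One way to picture provenance as the link between doing and reporting is to have each analysis step log what it consumed and produced while it runs. The sketch below assumes a Python script; the decorator `recorded`, the list `PROVENANCE_LOG` and the two toy steps are hypothetical names for illustration, not part of any tool named in the talk.

```python
import functools
from datetime import datetime, timezone

# In-memory provenance log; a real setup would persist this with the results.
PROVENANCE_LOG = []


def recorded(step):
    """Wrap an analysis step so every call is appended to the log."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        result = step(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "inputs": {"args": list(args), "kwargs": kwargs},
            "output": result,
            "finished_at": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@recorded
def normalise(values):
    """Toy step: scale values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


@recorded
def largest(values):
    """Toy step: pick the largest value."""
    return max(values)
```

Calling `largest(normalise([1, 1, 2]))` leaves two entries in `PROVENANCE_LOG`, one per step in the order they ran, which is the raw material a report or manifest could later cite.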

Reproduce by Reading: Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument

Retain the bits

The IT Crowd Series 3 Episode 4

The Internet

service

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects: the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log

Hettne et al, Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
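
The manifest idea above can be sketched as a small index that lists each bundled resource, its role in the investigation, and a checksum. This is a minimal illustration in Python under stated assumptions: the field names (`aggregates`, `path`, `role`, `sha256`) and the function `build_manifest` are chosen here for the example and are not the Research Object specification's own vocabulary.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Checksum a file so later readers can tell whether the bits changed."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(root, roles):
    """Build a manifest for the files under `root`.

    `roles` maps a relative file path to its role, for example
    "workflow definition", "input data", "parameter config"
    or "provenance log" (labels assumed for this sketch).
    """
    root = Path(root)
    aggregates = []
    for rel_path, role in sorted(roles.items()):
        aggregates.append({
            "path": rel_path,
            "role": role,
            "sha256": sha256_of(root / rel_path),
        })
    return {"aggregates": aggregates}
```

Serialising the returned dictionary as JSON next to the files gives a single entry point that says what each part is for, which is the job the manifest does in a bundle.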

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Profiles

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

BUT……

two years’ time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

Page 14: Keynote speech - Carole Goble - Jisc Digital Festival 2015

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Nick D Kim strange-matternet

Norman Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

httpmyexperimentorg

Research Objects

Compound Investigations Research Products

Multi-various ProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context

Research Objects

bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested

bull multi ndashtyped stewarded sited authored

bull span research researchers platforms time

bull cite resolve steward

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system

MRC funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs
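
The release paradigm on this slide (versions, dependencies, identifiers for citation, links between research objects) can be sketched in a few lines. This is only an illustration under assumptions: the `Release` class, its field names and the `example:` identifiers are hypothetical and are not taken from any Research Object tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """One immutable release of a research object (all names are illustrative)."""
    identifier: str                         # hypothetical persistent identifier
    version: str                            # e.g. "1.0"
    note: str                               # what changed in this release
    previous: Optional["Release"] = None    # the release this one was derived from
    depends_on: tuple = ()                  # identifiers of linked research objects

    def next_release(self, identifier, version, note, depends_on=None):
        """Create a follow-on release that keeps a link back to this one."""
        deps = self.depends_on if depends_on is None else tuple(depends_on)
        return Release(identifier, version, note, previous=self, depends_on=deps)

    def history(self):
        """Return version strings from the first release to this one."""
        chain, node = [], self
        while node is not None:
            chain.append(node.version)
            node = node.previous
        return list(reversed(chain))

    def citation(self):
        """Return a citable string that pins the exact version."""
        return f"{self.identifier} (version {self.version})"


# Hypothetical identifiers, used only to exercise the sketch.
v1 = Release("example:ro/1", "1.0", "initial data and scripts")
v2 = v1.next_release("example:ro/2", "1.1", "revised data",
                     depends_on=["example:ro/other"])
print(v2.history())
print(v2.citation())
```

Each release is frozen, so a change always produces a new, separately citable release rather than editing an old one in place.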

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reporting

• Design, protocols, samples, software, models…

• Just Enough Results Model

• Common and specific elements

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

1. Science Changes. So does the Lab

BioSTIF

“The questions don’t change but the answers do” (Dan Reed)

The lab is not fixed

Updated resources

Uncertainty

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
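
The "Laboratory" layer above (software and hardware infrastructure, systems software) is the part most easily lost after an experiment has run. A minimal sketch of recording it alongside the results follows, using only the Python standard library. The function name `describe_laboratory` and the chosen fields are assumptions made for illustration, not part of any tool named in the talk.

```python
import json
import platform
import sys
from importlib import metadata


def describe_laboratory(packages=()):
    """Record the software and hardware setup a computation ran in.

    `packages` lists distribution names to look up; a package that is not
    installed is recorded as None rather than raising an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "packages": versions,
    }


# The second name is deliberately one that should not exist.
snapshot = describe_laboratory(["pip", "a-package-that-is-not-installed"])
print(json.dumps(snapshot, indent=2))
```

Storing such a snapshot next to the outputs gives a later reader a starting point for judging whether a difference in results comes from the method or from a changed lab.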

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted from Freire, 2013]

transparency, dependencies

steps, provenance

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Provenance – the link between doing and reporting

Reproduce by Reading

Archived Record

Retain the Process, Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

The IT Crowd, Series 3, Episode 4

The Internet

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects

the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log
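
A manifest of this kind can be sketched as a small JSON document that lists every aggregated resource with its role and a checksum. The sketch below is an assumption-laden illustration: the field names (`aggregates`, `role`, `sha256`), the file paths and the `build_manifest` helper are hypothetical and do not follow the Research Object specification.

```python
import hashlib
import json


def sha256_of(content: bytes) -> str:
    """Return the hex SHA-256 digest of a resource's bytes."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(title: str, resources: dict) -> dict:
    """List each aggregated resource with a role, size and checksum.

    `resources` maps a path inside the bundle to a (role, content) pair.
    The field names are illustrative, not the Research Object specification.
    """
    aggregates = []
    for path in sorted(resources):
        role, content = resources[path]
        aggregates.append({
            "path": path,
            "role": role,
            "sha256": sha256_of(content),
            "bytes": len(content),
        })
    return {"title": title, "aggregates": aggregates}


# Hypothetical bundle contents, held in memory so the sketch runs anywhere.
example_resources = {
    "workflow/analysis.t2flow": ("workflow-definition", b"<workflow/>"),
    "data/input.csv": ("input-data", b"id,value\n1,0.5\n"),
    "data/output.csv": ("output-data", b"id,score\n1,0.9\n"),
    "config/params.json": ("parameter-config", b'{"seed": 42}'),
    "provenance/run.log": ("provenance-log", b"run started\nrun finished\n"),
}

manifest = build_manifest("Example research object", example_resources)
print(json.dumps(manifest, indent=2))
```

The checksums let a recipient confirm that the bits received are the bits that were packaged, while the roles carry the research context that a plain archive would lose.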

Hettne et al, Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Profiles

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what?

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Page 17: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Cross-Institutional e-Laboratory

Scattered parts Subject specific General resources

Fragmented Landscape

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

httpmyexperimentorg

Research Objects

Compound Investigations Research Products

Multi-various ProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context

Research Objects

bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested

bull multi ndashtyped stewarded sited authored

bull span research researchers platforms time

bull cite resolve steward

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system

MRC funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reportingbull Design protocols samples

software modelshellipbull Just Enough Results Modelbull Common and specific elements

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

RO as Instrument Materials Method

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

regenerate figure

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

1 Science Changes So does the Lab

BioSTIF

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

Uncertainty

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

stepsprovenance

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Provenance ndash the link between doing and reporting

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

The IT Crowd Series 3 Episode 4

The Internet

service

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects

the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log
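The manifest idea above can be sketched as a small Python script that lists every file in a bundle with a role and a checksum. This is a simplified illustration under stated assumptions: the field names (`files`, `path`, `role`, `sha256`), the role labels and the demo file names are hypothetical and do not follow the Research Object Bundle specification.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir, roles):
    """List every file under bundle_dir with its role and checksum.

    `roles` maps a relative path to a hypothetical role label such as
    "workflow", "input", "output", "config" or "provenance".
    """
    bundle_dir = Path(bundle_dir)
    entries = []
    for path in sorted(p for p in bundle_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bundle_dir).as_posix()
        entries.append({
            "path": relative,
            "role": roles.get(relative, "unclassified"),
            "sha256": sha256_of(path),
        })
    return {"files": entries}


def demo():
    """Build a manifest for a tiny throwaway bundle with made-up contents."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "workflow.txt").write_text("step 1: align\nstep 2: count\n")
        (root / "inputs.csv").write_text("sample,value\na,1\n")
        (root / "params.json").write_text(json.dumps({"seed": 42}))
        roles = {
            "workflow.txt": "workflow",
            "inputs.csv": "input",
            "params.json": "config",
        }
        return build_manifest(root, roles)


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Recomputing the checksums later and comparing them with the stored manifest is one simple way to notice that a bundled file has changed or gone missing.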

Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Metadata Profiles

NISO-JATS

Zhao et al. 2013

Method Matters

Make reproducible -> Born reproducible

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl. J. Comm. Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what?

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority.

http://www.rse.ac.uk

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms, codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

Page 18: Keynote speech - Carole Goble - Jisc Digital Festival 2015

http://myexperiment.org

Research Objects

Compound Investigations, Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Relate (scattered) resources

Metadata Objects that carry Research Context

Research Objects

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system

MRC-funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus, Pivot and Profile

Profile around methods, workflows, scripts, software, data, figures…

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R. Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs
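A software-like release history with versions, forks and merges can be sketched as a small graph of interlinked, identified releases. The Python below is only an illustration: the `Release` class, the `ancestors` helper and the `ro:` identifiers are hypothetical names invented for this sketch, standing in for citable ids such as DOIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One released version of a research object (hypothetical model)."""
    identifier: str      # citable id, e.g. a DOI; placeholder strings here
    version: str
    parents: tuple = ()  # identifiers of the releases this one was derived from


def ancestors(identifier, releases):
    """Return every release that `identifier` was derived from, each listed once."""
    found = []
    pending = list(releases[identifier].parents)
    while pending:
        current = pending.pop()
        if current in found:
            continue  # already visited via another branch
        found.append(current)
        pending.extend(releases[current].parents)
    return found


def example_releases():
    """A tiny made-up history: v1, a revision, a fork, and a merge of both."""
    history = [
        Release("ro:v1", "1.0"),
        Release("ro:v2", "2.0", parents=("ro:v1",)),
        Release("ro:fork", "1.0-lab-b", parents=("ro:v1",)),
        Release("ro:v3", "3.0", parents=("ro:v2", "ro:fork")),
    ]
    return {release.identifier: release for release in history}


if __name__ == "__main__":
    print(sorted(ancestors("ro:v3", example_releases())))
```

Because each release names its parents, a citation of the merged release can be traced back to everything it was built from, which is the forward-looking counterpart to rerunning a single frozen snapshot.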

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reporting

• Design, protocols, samples, software, models…

• Just Enough Results Model

• Common and specific elements

http://www.seek4science.org

http://www.fair-dom.org

http://isatools.org

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance




Page 21: Keynote speech - Carole Goble - Jisc Digital Festival 2015

bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested

bull multi ndashtyped stewarded sited authored

bull span research researchers platforms time

bull cite resolve steward

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system

MRC funded multi-site collaboration to support safe use of patient and research data for medical research

STELAR e-Lab

Platform 1

Platform 2

Platform 3

Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reportingbull Design protocols samples

software modelshellipbull Just Enough Results Modelbull Common and specific elements

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

RO as Instrument Materials Method

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

regenerate figure

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

1 Science Changes So does the Lab

BioSTIF

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

Uncertainty

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
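A minimal sketch of why the slide counts parameters and algorithm seeds among the Materials: if they are recorded alongside the result, a rerun can be checked against the original. The toy `run_experiment` function, the field names and the seed value are all hypothetical stand-ins, not anything from the talk.

```python
import json
import random


def run_experiment(params: dict) -> dict:
    """Toy 'experiment': the mean of seeded random draws (stands in for a real analysis)."""
    rng = random.Random(params["seed"])  # local RNG, so the seed fully determines the draws
    draws = [rng.gauss(params["mean"], params["sd"]) for _ in range(params["n"])]
    return {"estimate": sum(draws) / len(draws)}


# Materials: parameters and the algorithm seed are written down, not left implicit.
materials = {"seed": 2015, "mean": 0.0, "sd": 1.0, "n": 1000}
first = run_experiment(materials)

# Serialise the record, then rerun from it as a later reader would.
record = json.dumps({"materials": materials, "result": first})
restored = json.loads(record)
rerun = run_experiment(restored["materials"])

# Compare the rerun with the recorded result.
print(rerun == restored["result"])
```

This only covers the Materials layer. The Instruments and Laboratory layers (library versions, system software) can still change underneath the same seed, which is the decay the earlier slides describe.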

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, provenance

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Provenance – the link between doing and reporting

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

The IT Crowd Series 3 Episode 4

The Internet

service

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects: the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log

Hettne et al, Structuring research methods and data with the research object model: genomics workflows as a case study, 2014 http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
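The manifest idea on this slide can be sketched roughly as follows: one metadata document lists every aggregated resource (workflow definition, input and output data, parameter configs, provenance log) with a role and a checksum. This is an illustrative sketch only. The `build_manifest` helper, the role labels, the JSON layout and the file names are assumptions made for the example, not the Research Object model's actual vocabulary.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(content: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes (used here as a fixity check)."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(resources: dict[str, tuple[str, bytes]]) -> dict:
    """List every aggregated resource with its role and checksum.

    `resources` maps a path to (role, content); the roles are hypothetical labels.
    """
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "aggregates": [
            {"path": path, "role": role, "sha256": sha256_of(content)}
            for path, (role, content) in sorted(resources.items())
        ],
    }


# Hypothetical contents of one research object.
example = {
    "workflow.t2flow": ("workflow-definition", b"<workflow/>"),
    "data/input.csv": ("input-data", b"a,b\n1,2\n"),
    "data/output.csv": ("output-data", b"sum\n3\n"),
    "config/params.json": ("parameter-config", b'{"threshold": 0.5}'),
    "provenance.log": ("provenance-log", b"run 1 ok\n"),
}

manifest = build_manifest(example)
print(json.dumps(manifest, indent=2))
```

Keeping the checksums in the manifest rather than in each file means a reader can verify the whole package from a single document.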

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Metadata Profiles

NISO-JATS

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons, not Repository

Best Practices for Scientific Computing http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what?

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

Page 25: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reportingbull Design protocols samples

software modelshellipbull Just Enough Results Modelbull Common and specific elements

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

RO as Instrument Materials Method

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

regenerate figure

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

1 Science Changes So does the Lab

BioSTIF

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

Uncertainty

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

stepsprovenance

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Provenance ndash the link between doing and reporting

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

The IT Crowd Series 3 Episode 4

The Internet

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects

the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log

Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
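
The manifest idea above can be sketched in a few lines: a manifest lists every aggregated item (workflow definition, data, parameter configs, provenance log) with enough metadata to check later that nothing has drifted. This is an illustrative sketch under stated assumptions, not the Research Object model itself; the field names (`aggregates`, `path`, `role`, `sha256`) and the role labels are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path, roles: dict[str, str]) -> dict:
    """Describe each aggregated file by relative path, role and checksum.

    `roles` maps a relative path to a free-text role label. Both the labels
    and the manifest field names are hypothetical, not a published schema.
    """
    entries = []
    for rel_path, role in sorted(roles.items()):
        entries.append({
            "path": rel_path,
            "role": role,
            "sha256": sha256_of(root / rel_path),
        })
    return {"aggregates": entries}


def verify_manifest(root: Path, manifest: dict) -> list[str]:
    """Return the paths that are missing or whose checksum has changed."""
    changed = []
    for entry in manifest["aggregates"]:
        target = root / entry["path"]
        if not target.exists() or sha256_of(target) != entry["sha256"]:
            changed.append(entry["path"])
    return changed
```

A checksum per item is one simple design choice for detecting decay: `verify_manifest` returns an empty list while the packaged files are intact and names the affected paths once an input, config or log has been altered or removed.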

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Profiles

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al. 2013

Method Matters

Make reproducible → Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl. J. Comm. Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority

http://www.rse.ac.uk

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

Page 26: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm

Not a static document paradigm

Reproduce looks backwards → Release looks forwards

» Science, methods, data change → agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

Aggregated Commons Infrastructure

Consistent Comparative Reporting

• Design, protocols, samples, software, models…

• Just Enough Results Model

• Common and specific elements

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

Page 30: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Aggregated Commons Infrastructure

Consistent Comparative Reportingbull Design protocols samples

software modelshellipbull Just Enough Results Modelbull Common and specific elements

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

RO as Instrument Materials Method

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

regenerate figure

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

1 Science Changes So does the Lab

BioSTIF

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

Uncertainty

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

stepsprovenance

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Provenance ndash the link between doing and reporting

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

The IT Crowd Series 3 Episode 4

The Internet

serviceScience as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Metadata Objectsthe secret is the manifesthellip

Workflow definition

Data (inputs outputs) Parameter configsProvenance log

Hettne et al Structuring research methods and data with the research object model genomics workflows as a case study 2014 httpwwwjbiomedsemcomcontentpdf2041-1480-5-41pdf

myRDM

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Depth and Coverage Profiles

NISO-JATS


Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents, covering a representative range of funders, discipline and seniority.

http://www.rse.ac.uk

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk


Public data sets

My algorithm

RO Workflow as Instrument

BioSTIF

My data set

Public software

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

regenerate figure

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

1. Science Changes. So does the Lab

BioSTIF

“The questions don’t change but the answers do”

Dan Reed

The lab is not fixed

Updated resources

Uncertainty

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair






Page 36: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

stepsprovenance

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Provenance ndash the link between doing and reporting

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

The IT Crowd Series 3 Episode 4

The Internet

Science as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Metadata Objects

the secret is the manifest…

Workflow definition

Data (inputs, outputs)

Parameter configs

Provenance log
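The manifest idea above can be sketched in a few lines: a single record that lists every aggregated resource, its role, and a checksum. This is a hedged illustration only. The field names (`aggregates`, `role`, `sha256`), the file names, and the helper functions are hypothetical and do not follow the published Research Object model or any bundle format.

```python
import hashlib
import json

# Roles taken from the slide's list of what the object aggregates.
ROLES = {"workflow", "input", "output", "config", "provenance"}


def build_manifest(resources):
    """Build a manifest for {path: (role, content_bytes)}.

    Field names are illustrative, not a published schema.
    """
    entries = []
    for path in sorted(resources):
        role, content = resources[path]
        if role not in ROLES:
            raise ValueError(f"unknown role {role!r} for {path}")
        entries.append({
            "path": path,
            "role": role,
            "bytes": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),
        })
    return {"aggregates": entries}


def changed_paths(manifest, resources):
    """Return paths that are missing or whose content no longer matches."""
    changed = []
    for entry in manifest["aggregates"]:
        current = resources.get(entry["path"])
        if current is None:
            changed.append(entry["path"])
            continue
        if hashlib.sha256(current[1]).hexdigest() != entry["sha256"]:
            changed.append(entry["path"])
    return changed


# Hypothetical contents of one object.
example = {
    "workflow.t2flow": ("workflow", b"step1 -> step2"),
    "inputs/samples.csv": ("input", b"id,value\n1,0.5\n"),
    "config/params.json": ("config", b'{"threshold": 0.1}'),
    "provenance.log": ("provenance", b"step1 ok\nstep2 ok\n"),
}
manifest = build_manifest(example)
print(json.dumps(manifest, indent=2))
```

Because the manifest carries checksums, a later reader can call `changed_paths` to see which parts have drifted since packaging, without opening any of the files.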

Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf

myRDM

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Depth and Coverage Profiles

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -> Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530

Stodden, Reproducible Research Standard, Intl. J. Comm. Law & Policy, 13, 2009

RARE & FAIR Knowledge Turns with Research Objects

http://merchandise.thedoctorwhosite.co.uk

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researcher Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

Reality Check

Jorge Cham, www.phdcomics.com

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Citing what

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority

http://www.rse.ac.uk

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms / codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

Page 44: Keynote speech - Carole Goble - Jisc Digital Festival 2015

The IT Crowd Series 3 Episode 4

The Internet

serviceScience as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

Workflows

Virtual Machines

Portable Packaging

Metadata Objectsthe secret is the manifesthellip

Workflow definition

Data (inputs outputs) Parameter configsProvenance log

Hettne et al Structuring research methods and data with the research object model genomics workflows as a case study 2014 httpwwwjbiomedsemcomcontentpdf2041-1480-5-41pdf

myRDM

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Depth and Coverage Profiles

NISO-JATS

NISO-JATS

Depth and Coverage Metadata Profiles

Zhao et al 2013

Method Matters

Make reproducible -gt Born

Be smart about reproducibility

Think Commons not Repository

Best Practices for Scientific Computing httparxivorgabs12100530Stodden Reproducible Research Standard Intl J Comm Law amp Policy 13 2009

RARE amp FAIR Knowledge Turns with Research Objects

httpmerchandisethedoctorwhositecouk

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researcher Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

Reality Check

Jorge Cham wwwphdcomicscom

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipCiting what

Research Currencies

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority

http://www.rse.ac.uk

BUT……

• two years' time when the paper is written

• reviewers want additional work

• statistician wants more runs

• analysis may need to be repeated

• post-doc leaves, student arrives

• new data, revised data

• updated versions of algorithms/codes

• sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk

Page 57: Keynote speech - Carole Goble - Jisc Digital Festival 2015

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

http://myexperiment.org

http://www.biovel.eu

Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble CBE FREng FBCS

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

http://www.mygrid.org.uk
