Post on 22-Jul-2020

ELIXIR: The Federated data infrastructure for Europe’s life-science research

www.elixir-europe.org @ELIXIREurope

A network of data Nodes

• ELIXIR Nodes are funded nationally

• ELIXIR Nodes build on national strengths and priorities

• ELIXIR Nodes provide a national framework for long-term resource management

de.NBI: The German Network for Bioinformatics Infrastructure

de.NBI consortium
• 39 project partners
• 30 institutions
• 8 service centers
• designated national German node in ELIXIR

www.denbi.de

ELIXIR Common Services – our federated infrastructure platforms

Data deposition: ENA, EGA, PDBe, EuropePMC, …

Data management: Genome annotation, Data management plans

Added value data resources: UniProt, Ensembl, OrphaNet, …

Data Interoperability: Standards, Identifiers, Ontologies

Bioinformatics tools: Bio.tools, Containers, Galaxy

Compute: Secure data transfer, cloud computing, AAI

Training: TeSS, Data Carpentry, eLearning

ELIXIR Structure

Five Platforms for Compute, Data, Tools and Interoperability, complemented by Use Cases for Marine metagenomics, Rare diseases, Human data, Plant sciences, Proteomics, Metabolomics and Galaxy

Use cases under review:
• Microbial biotechnology
• Food and nutrition
• Human Cell Atlas
• Human copy number variation

 

OPINION ARTICLE

The future of metabolomics in ELIXIR [version 2; referees: 2 approved, 1 approved with reservations]

Merlijn van Rijswijk [1,2], Charlie Beirnaert [3], Christophe Caron [4], Marta Cascante [5], Victoria Dominguez [4], Warwick B. Dunn [6], Timothy M. D. Ebbels [7], Franck Giacomoni [8], Alejandra Gonzalez-Beltran [9], Thomas Hankemeier [2,10], Kenneth Haug [11], Jose L. Izquierdo-Garcia [12,13], Rafael C. Jimenez [14], Fabien Jourdan [15], Namrata Kale [11], Maria I. Klapa [16], Oliver Kohlbacher [17-19], Kairi Koort [20,21], Kim Kultima [22], Gildas Le Corguillé [4,23], Pablo Moreno [11], Nicholas K. Moschonas [16,24], Steffen Neumann [25], Claire O’Donovan [11], Martin Reczko [26], Philippe Rocca-Serra [9], Antonio Rosato [27], Reza M. Salek [11], Susanna-Assunta Sansone [9], Venkata Satagopam [28], Daniel Schober [25], Ruth Shimmo [20,21], Rachel A. Spicer [11], Ola Spjuth [29], Etienne A. Thévenot [30], Mark R. Viant [6], Ralf J. M. Weber [6], Egon L. Willighagen [31], Gianluigi Zanetti [32], Christoph Steinbeck [33]

1. ELIXIR-NL, Dutch Techcentre for Life Sciences, Utrecht, 3503 RM, Netherlands
2. Netherlands Metabolomics Center, Leiden, 2333 CC, Netherlands
3. ADReM, Department of Mathematics and Computer Science, University of Antwerp, Antwerp, 2020, Belgium
4. ELIXIR-FR, French Institute of Bioinformatics, Gif-sur-Yvette, F-91198, France
5. Department of Biochemistry and Molecular Biomedicine, Faculty of Biology, Universitat de Barcelona, Barcelona, 08028, Spain
6. School of Biosciences, Phenome Centre Birmingham and Birmingham Metabolomics Training Centre, University of Birmingham, Birmingham, B15 2TT, UK
7. Computational and Systems Medicine, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, UK
8. INRA, UNH, Human Nutrition Unit, PFEM, Metabolism Exploration Platform, MetaboHUB-Clermont, Clermont Auvergne University, Clermont-Ferrand, F-63000, France
9. Oxford e-Research Centre, Engineering Science Department, University of Oxford, Oxford, OX1 3QG, UK
10. Leiden Academic Centre for Drug Research, Leiden University, Leiden, 2300 RA, Netherlands
11. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
12. Centro Nacional Investigaciones Cardiovasculares, Madrid, 28029, Spain
13. CIBER de Enfermedades Respiratorias, Madrid, 28029, Spain
14. ELIXIR Hub, Cambridge, CB10 1SD, UK
15. Toxalim, UMR 1331, Université de Toulouse, Toulouse, F-31300, France
16. Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology – Hellas (FORTH/ICE-HT), Patras, GR-26504, Greece
17. Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
18. Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
19. Center for Bioinformatics, University of Tübingen, Tübingen, 72076, Germany
20. The Centre of Excellence in Neural and Behavioural Sciences, Tallinn, Tallinn, 10120, Estonia
21. School of Natural Sciences and Health, Tallinn University, 10120, 10120, Estonia
22. Department of Medical Sciences, Uppsala University, Uppsala, 752 36, Sweden
23. UPMC, CNRS, FR2424, ABiMS, Station Biologique, Roscoff, F-29680, France

F1000Research 2017, 6(ELIXIR):1649. Last updated: 08 NOV 2017

 

OPINION ARTICLE

A community proposal to integrate proteomics activities in ELIXIR [version 1; referees: 2 approved]

Juan Antonio Vizcaíno [1], Mathias Walzer [1], Rafael C. Jiménez [2], Wout Bittremieux [3], David Bouyssié [4], Christine Carapito [4], Fernando Corrales [5], Myriam Ferro [4], Albert J.R. Heck [6,7], Peter Horvatovich [8], Martin Hubalek [9], Lydie Lane [10,11], Kris Laukens [3], Fredrik Levander [12], Frederique Lisacek [13,14], Petr Novak [15], Magnus Palmblad [16], Damiano Piovesan [17], Alfred Pühler [18], Veit Schwämmle [19], Dirk Valkenborg [20-22], Merlijn van Rijswijk [23,24], Jiri Vondrasek [9], Martin Eisenacher [25], Lennart Martens [26,27], Oliver Kohlbacher [28-31]

1. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
2. ELIXIR Hub, Cambridge, CB10 1SD, UK
3. Department of Mathematics and Computer Science, University of Antwerp, Antwerp, 2020, Belgium
4. French Proteomics Infrastructure ProFI, Grenoble (EDyP U1038, CEA/Inserm/Grenoble Alpes University), Toulouse (IPBS, Université de Toulouse, CNRS, UPS), Strasbourg (LSMBO, IPHC UMR7178, CNRS-Université de Strasbourg), France
5. ProteoRed, Proteomics Unit, Centro Nacional de Biotecnología (CSIC), Madrid, 28049, Spain
6. Biomolecular Mass Spectrometry and Proteomics, Bijvoet Centre for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, 3548 CH, Netherlands
7. Netherlands Proteomics Center, Utrecht, 3584 CH, Netherlands
8. Analytical Biochemistry, Department of Pharmacy, University of Groningen, Groningen, 9713 AV, Netherlands
9. Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 1, 117 20, Czech Republic
10. CALIPHO Group, SIB Swiss Institute of Bioinformatics, Geneva, 1015, Switzerland
11. Department of Human Protein Science, Faculty of Medicine, University of Geneva, Geneva, 1205, Switzerland
12. National Bioinformatics Infrastructure Sweden (NBIS), SciLifeLab, Department of Immunotechnology, Lund University, Lund, 223 62, Sweden
13. Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1015, Switzerland
14. Computer Science Department, University of Geneva, Geneva, 1205, Switzerland
15. Institute of Microbiology, Czech Academy of Sciences, Prague 1, 117 20, Czech Republic
16. Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, 2333 ZA, Netherlands
17. Department of Biomedical Sciences, University of Padova, Padova, I-35121, Italy
18. Center for Biotechnology, Bielefeld University, Bielefeld, 33615, Germany
19. Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, 5230, Denmark
20. Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, 3500, Belgium
21. Center for Proteomics, University of Antwerp, Antwerpen, 2000, Belgium
22. Applied Bio & Molecular Systems, VITO, Mol, BE-2400, Belgium
23. Netherlands Metabolomics Centre, Utrecht, 3511 GC, Netherlands
24. Dutch Techcentre for Life Sciences / ELIXIR-NL, Utrecht, 3511 GC, Netherlands
25. Medical Bioinformatics, Medizinisches Proteom-Center, Ruhr-University Bochum, Bochum, 44801, Germany
26. VIB-UGent Center for Medical Biotechnology, Ghent, 9052, Belgium
27. Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
28. Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, 72074, Germany
29. Center for Bioinformatics Tübingen, University of Tübingen, Tübingen, 72074, Germany
30. Quantitative Biology Center, University of Tübingen, Tübingen, 72074, Germany
31. Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany

F1000Research 2017, 6:875. Last updated: 08 NOV 2017

Open data requires infrastructure

Open access life science data is intensively reused

Biosimulation market worth $1bn/yr (2015)
http://www.marketsandmarkets.com/Market-Reports/biosimulation-market-838.html

What are ELIXIR Core Data Resources?

• A set of data resources that are of fundamental importance to the broad life science community and the long-term preservation of biological data

• They provide complete collections of generic value to life science, and show high levels of usage, scientific quality and service

ELIXIR Core Data Resources – fundamentally important to life-science research

• 16 Core Data Resources nominated

• ELIXIR is committed to Open Access as a core principle for publicly funded research.

• Discussions on-going with Nodes, Resources and funders on high-quality, non-Open Access resources

• ELIXIR Core Data Resources should reflect this commitment and have terms of use or a license that enables the reuse and remixing of data.

• See “Identifying ELIXIR Core Data Resources”

• Agreed collectively by 21 Node directors

https://www.elixir-europe.org/platforms/data/core-data-resources

Large impact on science

• ELIXIR Core Data Resources – over 16 000 citations of key papers in 2015

• Plus direct citations of data records and identifiers in scientific literature

• > 20 000 articles with data citations (2014)

• > 88 000 direct citations of accessions in full-text open access articles (2014)

• The ELIXIR Data Platform “metrics” group is working on a standard methodology

[Figure: Citations of key papers (EuropePMC 2015) for ELIXIR Core Data Resources]

An infrastructure for bioeconomy innovation

2010-2015:

30 771 patents* referred to bioinformatics repositories

*Patterns of database citations in articles and patents indicate long-term scientific and industry value: https://f1000research.com/articles/5-160/v1

Towards a Global coalition to sustain Core Data Resources

• Call for Action published in Nature in March 2017

• Full text of article available as pre-print in bioRxiv

• June workshop in London with international funders

• Great interest in Core Data Resources (outcome and method)

• Outcomes taken into HIRO meeting following day

• Working Group established to take forward next steps

Changing landscape with many actors

• Highly distributed data-generating & monitoring

• Distributed analysis requires reference datasets (organized centrally, locally or in distributed networks)

• Manage Legal requirements in transnational settings

International Resources

National data centres

Institutional data centres

ELIXIR Position Paper on FAIR data management in the life sciences

1. Open sharing of research data is a core principle

2. Data Management is crucial to science

3. Data should be submitted to deposition databases

4. All data submitted to Open Data archives should align with community-defined standards

5. ELIXIR Nodes implement FAIR for their respective nations

6. Professional skills, adequate resources and appropriate funding are needed for Data Management and infrastructure

Blomberg N and ELIXIR Consortium. ELIXIR position paper on FAIR data management in the life sciences. F1000Research 2017, 6(ELIXIR):1857 (document) (doi: 10.7490/f1000research.1114985.1)

“Whenever possible, biological research data should be submitted to the recommended community deposition databases”

• The ELIXIR Deposition Databases meet the technical quality and governance criteria expected of ELIXIR Core Data Resources

• See “Identifying ELIXIR Core Data Resources”

• Agreed collectively by 21 Node directors

• International collaborative effort

https://elixir-europe.org/platforms/data/elixir-deposition-databases

“All data submitted to Open Data archives must be annotated in accordance with community-defined standards”

https://elixir-europe.org/platforms/interoperability

“FAIR data management requires professional skills and adequate resources”

Bring your own data workshops

• Problem-centered workshops

• Integration experts – Data resources – Users

• With national nodes or pan-European projects

“ELIXIR Nodes are the national implementation of a harmonised FAIR Data Management programme for the life sciences”

Findability

How do you find a needle in a federated haystack?

Bioschemas: “schema.org markup for life sciences – minimum properties needed for finding data”

http://bioschemas.org

Carole Goble, Alisdair Gray – ELIXIR-UK
Rafael Jimenez – ELIXIR Hub

Bioschemas.org feeds: Search engines, Registries, Data Aggregators

• Standardised metadata

• Metadata publish and harvest without APIs or special feeds

• Feed bio registries and aggregators
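The markup idea in the list above can be sketched in Python: a resource describes a dataset with a handful of schema.org properties and embeds the result in its landing page as JSON-LD, where search engines and aggregators can harvest it without a dedicated API. This is only a minimal sketch, not an official Bioschemas profile. The record name, URL and identifier are hypothetical placeholders, and the chosen properties are an assumed minimal set.

```python
import json


def dataset_jsonld(name, description, url, identifier, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Wrap JSON-LD in the script element that crawlers read from a web page."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record, used only to illustrate the markup.
record = dataset_jsonld(
    name="Example proteomics dataset",
    description="Illustrative record used to show the markup.",
    url="https://example.org/datasets/EX0001",
    identifier="EX0001",
    keywords=["proteomics", "mass spectrometry"],
)
snippet = as_script_tag(record)
print(snippet)
```

Pasting the printed snippet into the HTML of a dataset landing page is all a publisher would need to do under these assumptions; harvesting then works through ordinary web crawling.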

A community initiative built on top of schema.org to improve Findability and Accessibility in Life Sciences

• Rapid markup
• Exposed to harvesting
• Find

Bioschemas applies to both major data resources and smaller datasets

Bioschemas progress

Stages: Use case, Gap analysis, Spec, Test, Adoption, Applications

Data repositories ✓ ✓ ✓ ✓ ✓ ✓
Datasets ✓ ✓ ✓ ✓ ✓
Beacons ✓ ✓ ✓ ✓ ✓
Samples ✓ ✓ ✓ ✓ ✓
Protein annotations ✓ ✓ ✓ ✓ ✓
Biological Entity ✓ ✓
Event ✓ ✓ ✓ ✓ ✓ ✓
Training material ✓ ✓ ✓ ✓ ✓ ✓
Tools ✓ ✓ ✓

omicsDI

Early adopters

Google research blog: Facilitating the discovery of public datasets

“Research schemas” as emerging federation architecture in EOSC

[Diagram labels: EOSC Catalogue; domains: Life, Earth, ...; per domain: Dataset index, Scientific File, PID, Common Access; Data; Services: Compute, Storage, Transfer, …]

ELIXIR Compute Platform

https://www.elixir-europe.org/platforms/compute

Targeting a seamless workflow: a researcher may use their electronic identity to securely create a scientific software analysis environment, and use the environment to access large biological data resources stored on a cloud.

Reliable electronic identification of users (ELIXIR ID) is needed to access the key services and capacities of ELIXIR.

• You can link existing user accounts to create your ELIXIR ID today at www.elixir-europe.org

• ELIXIR AAI allows users to continue using their federated academic, corporate or social media identity by linking it to a personal ELIXIR ID.

• The ELIXIR service providers connected to ELIXIR AAI benefit from centralised user identity and access management services.

• Protocols: SAML2, OpenID Connect.

• https://www.elixir-europe.org/services/compute/aai
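Since OpenID Connect is one of the listed protocols, the first step a service provider would take can be sketched in Python: building the authorization request that redirects a user to the identity service. This is a generic OpenID Connect authorization code request, not ELIXIR AAI's documented configuration. The endpoint, client identifier and redirect URI below are hypothetical; a real integration would take them from the provider's registration details.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical values, used only for illustration.
AUTHORIZATION_ENDPOINT = "https://aai.example.org/oidc/authorize"
CLIENT_ID = "my-analysis-portal"
REDIRECT_URI = "https://portal.example.org/callback"


def build_authorization_url(endpoint, client_id, redirect_uri, scopes=("openid",)):
    """Return (url, state, nonce) for an OpenID Connect authorization code request."""
    state = secrets.token_urlsafe(16)  # checked on callback to guard against CSRF
    nonce = secrets.token_urlsafe(16)  # ties the returned ID token to this request
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
        "nonce": nonce,
    }
    return endpoint + "?" + urlencode(params), state, nonce


url, state, nonce = build_authorization_url(
    AUTHORIZATION_ENDPOINT,
    CLIENT_ID,
    REDIRECT_URI,
    scopes=("openid", "profile", "email"),
)
print(url)
```

The service would store `state` and `nonce` in the user's session, send the browser to `url`, and compare both values when the user returns with an authorization code.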

ELIXIR Authentication and Authorization Infrastructure (AAI)

o 359 Home Organisation IdPs enabled for login (via eduGAIN)

o 987 ELIXIR users

o 155 groups created in ELIXIR AAI

o 61 registered Resource Providers

ELIXIR Cloud & Compute

ELIXIR Cloud capacities surveyed here: DK, DE, EBI, FI, FR, SUI confirmed capacity

> 60 000 compute cores

> 24 000 TB of storage

> 3 000 compute users

ELIXIR Cloud WG: towards interoperable clouds

Data storage and transfer, coupled to security

ELIXIR Industry Strategy

ELIXIR Innovation and SME Forum

Previous Events

Node-hosted events that present to companies the free tools and services made available through ELIXIR

• 8 events since 2014
• 350 companies have attended
• 50% of forum attendees, on average, are from the industry sector
• 95% attendee satisfaction rate
• Lots of networking opportunities

Upcoming Events

• Cambridge – UK, 24-25 January 2018: Enabling Discoverability in Bio-Data Innovation
• Munich – Germany (Dates TBA): Biotechnology

Themes:

• Human Data: FI, ES, CH
• Rare Disease: FR
• Marine: NO
• Plant Sciences: NL
• Multi-domain: BE, DK

SPEAKERS:

• Wim Haentjens (European Commission, DG Research & Innovation – Agri-food unit)
• Peer Bork (European Molecular Biology Laboratory)
• Silvia Miret Catalan (Director Nutrition & Health Discover at Unilever)

DATA RESOURCE SHOWCASE | TRAINING | FLASH-TALK

ELIXIR Innovation and SME Forums – attendees – Quantitative Indicator

[Chart: Total, Private and Academics attendees per event: Copenhagen 2014, Wageningen 2015, Basel 2015, Oslo 2016, Helsinki 2017, Barcelona 2017, Brussels 2017, Paris 2017]

Outcome from innovation events: Qualitative Indicator

Node - collaboration

Service - exchange

Node - collaboration

ELIXIR in numbers

• 21 Members and 1 Observer

• ~ 180 institutes involved

• 600+ staff

• 16 Core Data Resources

• 23 Implementation Studies ongoing or soon to start

• 17 papers in ELIXIR F1000 channel

• 264 live events in TeSS

• 350 companies attended Innovation and SME programme
