identity management – life sciences perspective

Identity management – life sciences perspective Ugis Sarkans European Bioinformatics Institute


Post on 23-Feb-2016

Page 1: Identity management –  life sciences perspective

Identity management – life sciences perspective

Ugis Sarkans

European Bioinformatics Institute

Page 2: Identity management –  life sciences perspective

European Bioinformatics Institute

• Outstation of the European Molecular Biology Laboratory
• International organisation created by treaty (cf. CERN, ESA)
• EMBL-EBI has 400 staff, €30 million budget, several million users
• 15-year history of service provision and scientific excellence
• Sited at the Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, after European competition

[Figure: 2008 funding sources]

Page 3: Identity management –  life sciences perspective

EMBL-EBI Mission

• To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress

• To contribute to the advancement of biology through basic investigator-driven research in bioinformatics

• To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators

• To help disseminate cutting-edge technologies to industry

Page 4: Identity management –  life sciences perspective

Application areas:

• Life sciences
• Medicine
• Agriculture
• Pharmaceuticals
• Biotechnology
• Environment
• Bio-fuels
• Cosmeceuticals
• Nutraceuticals
• Consumer products
• Personal genomes
• Etc…

Data domains and EMBL-EBI resources:

• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: EMBL-Bank
• Gene expression: ArrayExpress
• Proteomes: UniProt, PRIDE
• Protein families, motifs and domains: InterPro
• Protein structure: PDBe
• Protein interactions: IntAct
• Chemical entities: ChEBI, ChEMBL
• Pathways: Reactome
• Systems: BioModels
• Literature and ontologies: CiteXplore, GO

Comprehensive, universal, integrated…

Page 5: Identity management –  life sciences perspective

Challenges facing information infrastructure for life sciences

• The growth of biomedical data is faster than Moore's law
• Data is generated in a geographically distributed manner, but needs to be tightly integrated for interpretation
• Data analysis algorithms need to be applied to combined datasets at the raw data level
• Human research subject data (clinical data) needs to be integrated with bio-molecular data, raising privacy issues and the need for highly controlled access
• Data analysis algorithms are becoming more compute-intensive, hence the need for parallelisation

Page 6: Identity management –  life sciences perspective

Dynamic growth response

[Figure: log(data volume) plotted against time; curve shown: available disk space]

Page 7: Identity management –  life sciences perspective

Dynamic growth response

[Figure: log(data volume) plotted against time; curves shown: data to be stored, available disk space]
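The comparison in this figure can be read as two exponentials, which are straight lines on a log scale. The following is a minimal sketch of that reading, assuming each quantity grows with a fixed doubling time. The function name and all numbers are illustrative assumptions and are not taken from the slides.

```python
import math


def crossover_time(data0, data_doubling, disk0, disk_doubling):
    """Return the time at which stored data overtakes disk capacity.

    Both quantities are modelled as v(t) = v0 * 2 ** (t / doubling).
    On a log scale these are straight lines, so the crossover is the
    point where the two lines intersect. Returns None if the data
    never overtakes the capacity.
    """
    if data0 >= disk0:
        return 0.0  # already at or over capacity
    # Slopes of the two lines on a log2 scale.
    data_rate = 1.0 / data_doubling
    disk_rate = 1.0 / disk_doubling
    if data_rate <= disk_rate:
        return None  # data grows no faster than capacity
    # Solve log2(data0) + data_rate * t == log2(disk0) + disk_rate * t.
    return math.log2(disk0 / data0) / (data_rate - disk_rate)


# Hypothetical inputs: 1 unit of data doubling every year,
# 4 units of disk doubling every 2 years.
t = crossover_time(data0=1.0, data_doubling=1.0, disk0=4.0, disk_doubling=2.0)
print(t)
```

The time unit is whatever unit the doubling times are given in. If the data curve has the shorter doubling time, a crossover always exists, however large the initial capacity margin is.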

Page 8: Identity management –  life sciences perspective

What is ELIXIR?

• An EU Framework 7 Preparatory Phase Project
• Coordinated by Prof Janet Thornton, Director EMBL-EBI
• To construct a plan for the operation of a sustainable infrastructure for biological information in Europe
• €4.5 million grant awarded May 2007, three-year term
• 32-member consortium engaging many of Europe’s main bioinformatics funding agencies and research institutes
• Deliverables are memoranda of understanding to fund the implementation phase, which could cost €500 million
• Interested parties should register as stakeholders via the ELIXIR website: www.elixir-europe.org

Page 9: Identity management –  life sciences perspective

ESFRI: The European Strategy Forum on Research Infrastructures

• Created by the Commission in February 2002
• Adopted by the Competitiveness Council in April 2002
• Representatives of EU Member States, Associated States, and one representative of the European Commission
• Chairman: Prof Carlo Rizzuto (Sincrotrone Trieste S.c.p.A.-ELETTRA, IT)
• To support a coherent approach to policy-making on research infrastructures in Europe
• To act as an incubator for international negotiations about concrete initiatives

Page 10: Identity management –  life sciences perspective

European Roadmap for Research Infrastructures

• 35 ‘mature’ projects for new large scale Research Infrastructures
• Based on an international peer review process
• Covers all scientific areas, regardless of possible location
• Likely to be realized in the next 10 to 20 years
• Supported by a relevant European partnership or intergovernmental research organisation
• Impact on science and technology development at international level
• Support new ways of doing science in Europe
• Contribute to the enhancement of the European Research Area

Page 11: Identity management –  life sciences perspective

Roadmap projects summary

• 6 Social Science & Humanities
• 8 Environmental Sciences
• 3 Energy
• 6 Biomedical and Life Sciences
• 7 Material Sciences
• 5 Astronomy, Astro-, Nuclear and Particle Physics
• 1 Computer and Data Treatment (transverse)

http://cordis.europa.eu/esfri/

Page 12: Identity management –  life sciences perspective

Cost of 35 Mature ESFRI RI Projects

• Physics: £3,600
• Materials: £4,500
• Energy: £2,200
• Biomedical: £1,600
• Environment: £1,300
• Computing: £300M
• Social Science

Total Capital Cost = €13,696 Million

Page 13: Identity management –  life sciences perspective

The ten ESFRI BMS RI


Page 14: Identity management –  life sciences perspective


ELIXIR Scientific & Technical Structure

Page 15: Identity management –  life sciences perspective

BMS Support of the European Grand Challenges

ELIXIR will provide infrastructure for the other ESFRI BMS RI.

Page 16: Identity management –  life sciences perspective

BioMedBridges

• Call 8 (Research) Topic 2.3.2 “Clustering the ESFRI BMS.”
• Coordinated by Janet Thornton
• To create the links between the ESFRI BMS RI
• €10.6M over 4 years, 21 participating organisations, 12 WP
• To “build bridges” between the infrastructures
• Deliverables are infrastructure components that will link data from the different domains of the ESFRI BMS RI to ELIXIR Core Datasets
• It is anticipated that these components will be incorporated into ELIXIR Construction Phase
• ESFRI BMS RIs will be doing the work
• e-Infrastructure Advisory Panel: GÉANT, DANTE, EGI.eu, PRACE

Page 17: Identity management –  life sciences perspective

BioMedBridges Structure of Proposal

• WP1 Management
• WP2 Outreach and inreach
• WP3 ESFRI BMS Standards Description and Harmonization
• WP4 Technical integration
• WP5 Secure access
• Five Use Cases WP6 – WP10
– WP6 Interoperability of large scale image data sets from different biological scales
– WP7 PhenoBridge - crossing the species bridge between mouse and human
– WP8 Personalized Medicine - integrating complex data sets to understand disease pathogenesis and improve biomarker and treatment selection
– WP9 From cells to molecules - integrating structural data
– WP10 Integrating disease related data and terminology from samples of different types
• WP11 Technology Watch
• WP12 Training

Page 18: Identity management –  life sciences perspective
Page 19: Identity management –  life sciences perspective

EMBL-EBI: Most important data collections

Genomes & Genes
1. Ensembl: Joint project with Sanger Institute - high-quality annotation of vertebrate genomes
2. Ensembl Genomes: Environment for genome data from other taxa
3. 1000 Genomes: Catalogue of human variation from major world populations
4. EGA*: European Genome-phenome Archive* – genotype, phenotype and sequences from individual subjects and controls
5. ENA: European Nucleotide Archive – all DNA & RNA, nextgen reads and traces

Transcription
6. ArrayExpress: Archive of transcriptomics and other functional genomics data
7. Expression Atlas: Differentially expressed genes in tissues, cells, disease states & treatments

Protein
8. UniProt: Archive of protein sequences and functional annotation
9. InterPro: Integrated resource for protein families, motifs and domains
10. PRIDE: Public data repository for proteomics data
11. PDBe: Protein and other macromolecular structure and function

Small molecules
12. ChEBI: Chemical entities of biological interest
13. ChEMBL: Bioactive compounds, drugs and drug-like molecules, properties and activities

Processes
14. IntAct: Public repository for molecular interaction data
15. Reactome: Biochemical pathways and reactions in human biology
16. BioModels: Mathematical models of cellular processes

Ontologies
17. GO: Gene Ontology, consistent descriptions of gene products

Scientific literature
18. CiteXplore: Bibliographic query system

* Requires authentication

Page 20: Identity management –  life sciences perspective

Data supporting publication – typical lifecycle

[Diagram labels: author, reviewer; submitted manuscript, published manuscript; restricted data, public data]
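One possible reading of this lifecycle is that data stays restricted to the author and reviewers while the manuscript is under review, and becomes public once the manuscript is published. The following is a minimal sketch under that assumption. The class, field and user names are hypothetical and do not describe any EMBL-EBI system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted manuscript"
    PUBLISHED = "published manuscript"


@dataclass
class Dataset:
    author: str
    reviewers: set = field(default_factory=set)
    stage: Stage = Stage.SUBMITTED

    def can_read(self, user):
        """Restricted while under review; public once published."""
        if self.stage is Stage.PUBLISHED:
            return True
        return user == self.author or user in self.reviewers

    def publish(self):
        """Move the dataset from restricted to public."""
        self.stage = Stage.PUBLISHED


# Hypothetical users for illustration.
ds = Dataset(author="alice", reviewers={"bob"})
before = [ds.can_read(u) for u in ("alice", "bob", "carol")]
ds.publish()
after = ds.can_read("carol")
```

The point of the sketch is that the only change at publication is the stage, so the same identity check serves both the restricted and the public phase.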

Page 21: Identity management –  life sciences perspective

European Genome-phenome Archive (EGA)

• Primary archive for any data consented for research but not for fully public distribution
• All data must be de-identified and in accordance with the informed consent
• Controlled access to the data
• Distributed access policy:
– Data Access Committee (DAC)
– data release policy – data access application and data access agreement
• EGA supports only data access decisions that are based on the original informed consent
• Authorized users have personal accounts in our system
• Access to the data requires account password
• Data decryption requires a separate key that must be requested and is sent off line
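The controls listed above combine two independent checks before a download: the account password, and a Data Access Committee decision for the specific dataset. The following is a minimal sketch of that combination, assuming an in-memory store. `AccessRegistry`, its methods and the dataset identifier are hypothetical names, not the EGA API, and the separately delivered decryption key is deliberately not modelled because it is sent off line.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AccessRegistry:
    """Hypothetical in-memory stand-in for an account and permission store."""

    def __init__(self):
        self._accounts = {}      # user -> (salt, password hash)
        self._approvals = set()  # (user, dataset) pairs approved by a DAC

    def create_account(self, user, password):
        salt = os.urandom(16)
        self._accounts[user] = (salt, hash_password(password, salt))

    def record_dac_approval(self, user, dataset):
        self._approvals.add((user, dataset))

    def may_download(self, user, password, dataset):
        """True only if the password matches AND a DAC approved this dataset."""
        if user not in self._accounts:
            return False
        salt, stored = self._accounts[user]
        # Constant-time comparison of the stored and supplied password hashes.
        if not hmac.compare_digest(stored, hash_password(password, salt)):
            return False
        return (user, dataset) in self._approvals


# Hypothetical user and dataset for illustration.
registry = AccessRegistry()
registry.create_account("alice", "s3cret")
registry.record_dac_approval("alice", "DATASET-1")
ok = registry.may_download("alice", "s3cret", "DATASET-1")
wrong_password = registry.may_download("alice", "guess", "DATASET-1")
not_approved = registry.may_download("alice", "s3cret", "DATASET-2")
```

Keeping the approval per (user, dataset) pair reflects the distributed policy: each DAC decides only on its own datasets, and a valid account alone grants nothing.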

HSF - 20.1.2011

Page 22: Identity management –  life sciences perspective

EGA works with Data Access Committees (DAC)


Page 23: Identity management –  life sciences perspective

Mechanics of secure data access

Components: FTP Client, EGA secure layer, Secure Server

(1) FTP client requests a whole file for download (with username/password).
(2) EGA verifies the user and provides the list of authorized files.
(3) EGA provides the archival encryption key and file path in the archive. This requires a secure API to facilitate access into the EGA master database.
(4) Requested BAM data is decrypted and re-encrypted using the client key.
(5) Secure Server responds to FTP requests directly; the FTP client downloads the custom-encrypted file.

Note: Authentication of FTP clients is inherently insecure; we may have to require FTPS-compliant clients (RFC 4217).
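The re-encryption in step (4) can be sketched as follows. This is an illustration only: the slides do not name a cipher, so a toy SHA-256 keystream XOR stands in for real encryption and must not be used in production, where a vetted cipher would take its place. All function names, keys and data are hypothetical.

```python
import hashlib
import os


def keystream_xor(key, nonce, data):
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.

    Stand-in for a real cipher, NOT for production use. Applying it
    twice with the same key and nonce returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def reencrypt_for_client(archived, archive_key, archive_nonce, client_key):
    """Step (4): decrypt with the archival key, re-encrypt with the client key."""
    plaintext = keystream_xor(archive_key, archive_nonce, archived)
    client_nonce = os.urandom(16)
    return client_nonce, keystream_xor(client_key, client_nonce, plaintext)


# Hypothetical keys and data for illustration.
archive_key, client_key = os.urandom(32), os.urandom(32)
archive_nonce = os.urandom(16)
original = b"example BAM bytes" * 5
archived = keystream_xor(archive_key, archive_nonce, original)

client_nonce, delivered = reencrypt_for_client(
    archived, archive_key, archive_nonce, client_key
)
# The client recovers the data using only its own key and nonce.
recovered = keystream_xor(client_key, client_nonce, delivered)
```

The design point is that the archival key never leaves the secure layer: the client only ever sees bytes encrypted under its own key.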

Page 24: Identity management –  life sciences perspective

Acknowledgements

• Andrew Lyall, ELIXIR project manager
• Paul Flicek, Ilkka Lappalainen, EGA
• Alvis Brazma, Functional Genomics, BioMedBridges security