Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden Ola Spjuth SNIC, UPPMAX and Science for Life Laboratory Uppsala University, Sweden [email protected]

Upload: uppsala-university

Post on 09-Jan-2017


Page 1: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Ola Spjuth

SNIC, UPPMAX and Science for Life Laboratory

Uppsala University, Sweden

[email protected]

Page 2: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Ola Spjuth

• Associate Professor in Pharmaceutical Bioinformatics

• Guest Researcher

• Co-Director

• Manager of Bioinformatics Compute and Storage facility

Page 3: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Page 4: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

2003: First sequenced human genome, 13 years for $3 billion

Page 5: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

2015: Human whole genome sequenced in 3 days for ~$1150

Massively parallel sequencing…

…requires supercomputers for analysis and storage

Page 6: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

2010: Science for Life Laboratory inaugurated

An internationally leading center that develops and applies large-scale technologies for molecular biosciences with a focus on health and environment.

National platform since 2013

Stockholm node

Uppsala node

Page 7: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Data generation and delivery

1. Sample transfer

2. Data delivery

3. Analysis

Scientists

www.uppmax.uu.se/uppnex

High-performance computers and large-scale storage for bioinformatics analysis.

Page 8: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Page 9: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Sequence production 2014:

• Generated > 120 Tbp of sequence data

• 13.7 Gbp/hour, 3.8 Mbp/sec (on average)
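
The average rates quoted above follow from the yearly total. A minimal sketch of that arithmetic, assuming decimal prefixes (1 Tbp = 10^12 bp) and a 365-day year of continuous production; neither assumption is stated on the slide, and the function name is only illustrative:

```python
# Back-of-the-envelope check of the average sequencing rates quoted above.
# Assumptions (not stated on the slide): decimal prefixes (1 Tbp = 1e12 bp)
# and a 365-day year of continuous production.

TOTAL_BP_2014 = 120e12          # "> 120 Tbp" generated in 2014 (lower bound)
HOURS_PER_YEAR = 365 * 24       # 8760 hours
SECONDS_PER_HOUR = 3600


def average_rates(total_bp: float) -> tuple[float, float]:
    """Return (Gbp per hour, Mbp per second) for a yearly total in base pairs."""
    bp_per_hour = total_bp / HOURS_PER_YEAR
    bp_per_second = bp_per_hour / SECONDS_PER_HOUR
    return bp_per_hour / 1e9, bp_per_second / 1e6


gbp_per_hour, mbp_per_second = average_rates(TOTAL_BP_2014)
print(f"{gbp_per_hour:.1f} Gbp/hour, {mbp_per_second:.1f} Mbp/sec")
# prints "13.7 Gbp/hour, 3.8 Mbp/sec"
```

Since the slide gives the total as a lower bound (> 120 Tbp), the computed rates are lower bounds as well.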

Page 10: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Hardware resources

milou: HP cluster of 208 nodes

pica: 6 (7) PB Hitachi storage

halvan: 2 TB high-memory computer

nestor: 48 nodes production cluster

meles: 547 TB Hitachi storage

mosler: 24 nodes, 223 TB

smog: 100 nodes, ~300 TB

Fast network via SUNET

Backup via SNIC

Long-term storage at SweStore

2015: 250 nodes

2016: 200 new nodes

+1 PB

+2 PB

Page 11: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

A national e-Infrastructure for NGS

Software + reference data

Support

Education

Compute resources

Storage resources

Efficiency + automation

Page 12: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

What we sequenced at NGI /

Page 13: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Chipster workbench on UPPMAX

UpCloud – smog - (OpenStack)

Page 14: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Managing Virtual Machine Images

• Open catalogue of VMIs

• Hosted at Uppsala University

www.bioimg.org

M. Dahlö, F. Haziza, A. Kallio, E. Korpelainen, E. Bongcam-Rudloff, and O. Spjuth. BioImg.org: A catalogue of virtual machine images for the life sciences. Accepted in Bioinformatics and Biology Insights.

Page 15: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Mosler overview

• e-Infrastructure for working with sensitive data

• Copy of Norwegian solution (TSD)

• Designed to look like UPPMAX clusters

Page 16: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Mosler specifications

• High-performance computing in a virtualized environment (OpenStack)

• 2-factor authentication

• Restricted data transfer in/out

• Only accessible over remote desktop (ThinLinc) via Mosler dashboard

• Aim: Compliant with all laws and regulations for analyzing sensitive data in Sweden

Page 17: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Data hosting use case

[Diagram labels: Consortia; DBA; Consortium member; MyResearch; Mosler virtual environment (storage, compute); Data hosting; Data syncing; Access, analysis]

Page 18: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Data extraction use case

[Diagram labels: Manager; DBA; Scientist; LifeGene; Mosler virtual environment (storage, compute)]

1. Request for data

2. Approval

3. Data extraction

4. Data transfer

5. Access, analysis
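
The five numbered steps of the data extraction use case form a strict sequence. The sketch below shows one way such an ordering could be enforced in software. It is purely illustrative: the `Step` and `ExtractionRequest` names are hypothetical and do not describe how Mosler or LifeGene actually implement the process.

```python
from enum import IntEnum


class Step(IntEnum):
    """The five steps listed on the slide, in order."""
    REQUEST = 1      # 1. Request for data
    APPROVAL = 2     # 2. Approval
    EXTRACTION = 3   # 3. Data extraction
    TRANSFER = 4     # 4. Data transfer
    ACCESS = 5       # 5. Access, analysis


class ExtractionRequest:
    """Hypothetical tracker that only lets the steps happen in order."""

    def __init__(self) -> None:
        self.completed: list[Step] = []

    def advance(self, step: Step) -> None:
        """Record a step, rejecting anything that skips or repeats one."""
        if len(self.completed) == len(Step):
            raise ValueError("workflow already complete")
        expected = Step(len(self.completed) + 1)
        if step != expected:
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)


request = ExtractionRequest()
for step in Step:                # iterates in definition order, 1 to 5
    request.advance(step)
print([s.name for s in request.completed])
```

In this sketch, data transfer cannot be recorded before approval: calling `advance(Step.TRANSFER)` on a fresh request raises `ValueError`.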

Page 19: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Nov 2014

20M € total grant

4M € IT-infrastructure

Page 20: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

X-Ten System

• First system able to deliver the $1000 genome

• Each run: 1.2 TB data

• 16 human genomes (30X)

• 3 days per run

• Population-scale genomics

• 15K genomes per year
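
The X-Ten figures above imply a rough storage footprint. A minimal sketch of that arithmetic, assuming that the 1.2 TB per run covers the 16 genomes of that run, that units are decimal (1 TB = 10^12 bytes), and that only raw run output is counted, with no intermediate files or analysis results. These assumptions are not stated on the slide.

```python
# Rough storage estimate from the X-Ten figures quoted above.
# Assumptions (not stated on the slide): 1.2 TB per run covers all 16
# genomes of that run, decimal units, and only raw run output is counted.

BYTES_PER_RUN = 1.2e12       # "Each run: 1.2 TB data"
GENOMES_PER_RUN = 16         # "16 human genomes (30X)"
GENOMES_PER_YEAR = 15_000    # "15K genomes per year"

bytes_per_genome = BYTES_PER_RUN / GENOMES_PER_RUN
per_genome_gb = bytes_per_genome / 1e9
yearly_pb = bytes_per_genome * GENOMES_PER_YEAR / 1e15

print(f"{per_genome_gb:.0f} GB per genome, {yearly_pb:.3f} PB per year")
# prints "75 GB per genome, 1.125 PB per year"
```

Under these assumptions, a year at full capacity produces on the order of a petabyte of raw data.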

Swedish Genome Initiative

Call for a reference variation database (1000 genomes) and for whole human genome sequencing (half price).

Goal: 5,000 genomes in 2015, 10,000 genomes in 2016

Page 21: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

[Figure: NGI-Stockholm Production (Jan-12 to Dec-15). X axis: production date (Aug-11 to Jan-16). Y axis: gigabases (0 to 1,000,000). Series: data production; conservative prediction (60% of maximum production).]

Page 22: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Whole Genome Sequencing

• Data on a new scale, 80% expected to be sensitive: new challenges

• Funding for IT-infrastructure from the KAW foundation

– Resources for data production (2 M EUR)

– Resources for scientists (2 M EUR)

• A national security project funded by the Swedish Research Council (5 M EUR over 4 years): SNIC-Sens

Page 23: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

SNIC-Sens

• 4-year project, started Jan 2015

• Project owner: SNIC (Ann-Charlotte Sonnhammer)

• Project leader: Ola Spjuth (until end of this week)

• Aims:

– Specifications for analyzing sensitive data in SNIC (hardware, legal, contracts, processes etc.)

– Evaluation of the use of public cloud providers (Google, Amazon)

– Make available e-Infrastructure for production and research of data generated at NGI, as a blueprint for other domains

Page 24: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

SNIC-Sens roadmap

• Information classification workshop (21/5)

• Risk/vulnerability analysis (2/6)

• Specifications for hardware procurement

• Public tender (end of this week)

• Installation and testing of production system (Aug-Sept)

• Installation, configuration and testing of research system (Q3-Q4)

• Research system online (Q1 2016)

Page 25: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

Two pilots for clinical data management

Page 26: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

CML, Lucia Cavelier

Page 27: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden

MDR, Åsa Melhus