(BAC208) Bursting to the Cloud: Deploying a Hybrid Cloud Storage Solution with AWS | AWS re:Invent...

November 13, 2014, Las Vegas, Nevada

Aaron Black, Director of Informatics, Inova Translational Medicine Institute (ITMI)

Ron Bianchini, President and CEO, Avere Systems

Vision, Validation, and Innovation

• Five hospital + ambulatory healthcare system

• Largest healthcare system in Northern VA

• Two million patient visits/year

• 1,700 beds

• 20,000 deliveries/year

• Inova Translational Medicine Institute (ITMI) started in 2010

Inova Translational Medicine Institute (ITMI)

• Personalized (precision) medicine

ITMI research studies

Preterm birth study (2011)

• Molecular associations with preterm birth

• 365 preterm birth trios, 590 full-term trios

• WGS + ‘omics + clinical data

• Biobank: blood, buccal mucosa, saliva, cord blood, placenta

Longitudinal study (2012)

• WGS + ‘omics + clinical data on 5,000 → 10,000 family trios (currently 2,800 genomes)

• Longitudinal study (≥18 yrs)

• Blood, saliva, urine, cord blood, placenta

• DNA, RNA, protein, epigenetic + clinical data

Congenital disorders study (2012)

• Mostly NICU-based

• Any other patient with a “congenital/genetic” disorder

and there’s more...

• Additional studies

• Diabetes

• Obesity

• Heart implants

• Ethics

• Clinical pharmacogenomics testing

• Anticipating disease-specific genetic panels in 2015

• Predictive analytics

Community health system

Large, diverse patient cohort

Family trio-based data sets

Longitudinal

Non-disease specific

Clinic linked to R&D

• Data diversity

• Over 100 countries of origin represented

• Data quality

• CLIA-regulated lab

• Complete medical records

• Whole genome sequence

• Interoperable and comparable data

• Consistent SOPs across studies

• Consistent data formats / platforms

• Industry best practices

• Access to policy makers

• Patients

• 8,300 subjects (enrolled), > 110 different countries of birth

• Banked samples

• >200,000 banked samples

• Whole genome sequences

• 5,745

• 28,750,000,000

• Diagnoses

• 46,500 patient diagnoses

• Lab results

• 1,245,612 discrete lab results

• 91,881 surveys and case report forms

• 1,470,772 discrete variables from case report forms and surveys

ITMI omics

• Whole genome sequences

• CGI and Illumina – high quality

• RNA-Seq

• Gene expression

• MicroRNA-Seq

• Gene regulation

• Methylation (Infinium 450k arrays)

• Gene regulation

• External enrichment

• Reference annotations: open and commercial sources

Scalability needed!

Research data – production systems

Agile and iterative

Secure

Resilient

Learn and collaborate

Integrate data across services

Enable disease prediction models

Represent all informatics activities

Enable fast data discovery

Improve clinical care through better insight

Learn

Predict

Report

Manage

Store: Securely manage data at scale

ITMI informatics challenges

• Petabyte-scale storage

• Execution: how to set up an effective, large-scale data store

• Cost: on-premises initial costs in the tens of millions of dollars

• Data durability: average of 2% decay/mo* unacceptable

• HIPAA compliance

• Support for obscure bioinformatics tools

• Data movement: between AWS and on-premises

• Hundreds of millions of files

• Large files: up to 0.5 TB each

• Encryption: difficult to ensure integrity given large file sizes

• Fluctuating HPC demands and benchmarks

• Balancing development and support
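
The integrity concern for very large files can be illustrated with a streaming checksum. This is a minimal sketch, not ITMI's actual tooling: the function names `sha256_of_file` and `verify_copy` and the 8 MiB chunk size are assumptions chosen for illustration. It hashes a file in fixed-size chunks so that a file of up to 0.5 TB never has to fit in memory, then compares digests of the source and the transferred copy.

```python
import hashlib

# 8 MiB per read; an arbitrary choice for this sketch, not a recommendation.
CHUNK_SIZE = 8 * 1024 * 1024


def sha256_of_file(path, chunk_size=CHUNK_SIZE):
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, copy_path):
    """Return True when the source and its copy have identical digests."""
    return sha256_of_file(source_path) == sha256_of_file(copy_path)
```

In practice the digest of the source would likely be computed once and stored alongside the file, so that a later check only needs to re-read the copy.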

• On-premises: SGI UV2K / NetApp

– Large in-memory processing

– 16 TB memory

– 1024 CPU cores

– 1 PB magnetic storage

– 40 TB SSD storage

• 10+ Linux servers

• Numerous virtual machines

– Applications and DBs (Oracle, MSSQL, PostgreSQL, MySQL)

Why ITMI uses AWS

• Facilitates movement of biological data from vendors

• Lower data storage costs

• Saved >$10 million in up-front costs

• 1.3 PB storage (mostly bio data)

• 7,000,000+ files (scripts for ETL)

• Only pay for usage per month

• Biological data QC and analysis workflows

• Flexible number of Amazon EC2 instances

• Linux clusters

• Hadoop/Cloudera big data engine

• Custom bioinformatics

• Quick proofs of concept

• Easy to share data with collaborators

• Easy to deploy web applications

ITMI hybrid cloud with AWS & Avere

[Diagram labels: bio materials; omic data; reporting & analysis; on-premises Inova bioinformatics staff; on-premises laboratory & clinical data; on-premises HPC cluster & storage; Inova; Amazon Web Services; HIPAA compliant]

ITMI hybrid cloud with AWS & Avere

• Analysis time cut from days (or never finishing) to hours!

• Agile and targeted analysis

• Faster outcomes

• Improved patient care!

• Improved prediction!

Edge Core Architecture

[Diagram labels: Clients & Servers; LAN / WAN; Avere Edge Filer (low latency read, write & metadata ops; add performance); Core Filer(s): NetApp, EMC, Oracle]

• Edge filer performance optimized

• Read caching, Write posting

• Clustering for linear scaling

• Core filer capacity optimized

• High density disk

• Latency independent

• Heterogeneous global namespace

• On-line data migration
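
The read caching and write posting named above can be sketched as a small write-back cache. This is a toy illustration under stated assumptions, not Avere's implementation: the `CoreFiler` and `EdgeFiler` classes and their methods are hypothetical, the cache never evicts, and "write posting" is interpreted here as acknowledging a write at the edge and pushing it to the core later.

```python
class CoreFiler:
    """Hypothetical stand-in for slow, capacity-optimized storage."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0   # number of reads that actually reached the core
        self.writes = 0  # number of writes that actually reached the core

    def read(self, key):
        self.reads += 1
        return self.blocks.get(key)

    def write(self, key, value):
        self.writes += 1
        self.blocks[key] = value


class EdgeFiler:
    """Hypothetical edge cache: reads are cached, writes are posted later."""

    def __init__(self, core):
        self.core = core
        self.cache = {}
        self.dirty = set()  # keys written at the edge but not yet on the core

    def read(self, key):
        # Only a cache miss goes to the core filer.
        if key not in self.cache:
            self.cache[key] = self.core.read(key)
        return self.cache[key]

    def write(self, key, value):
        # Acknowledge immediately; the core sees the data only on flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Push each dirty key once, however many times it was overwritten.
        for key in sorted(self.dirty):
            self.core.write(key, self.cache[key])
        self.dirty.clear()
```

With this shape, repeated reads of the same key and repeated overwrites before a flush each cost the core filer a single operation, which is the sense in which the edge absorbs load.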

Success Stories

• High performance

• Up to 50x traditional NAS

• Cost Savings

• Up to 80% savings vs NAS

• Remote Office/WAN

• Hide 98% WAN latency

• Public Benchmarks

• 80% footprint, WAN neutral

Hybrid Cloud Storage

[Diagram labels: Clients & Servers; LAN / WAN; Avere Edge Filer (low latency read, write & metadata ops; add performance); Core Filer(s): NetApp, EMC, on-premises object storage, Amazon S3; AOS 3.0; AOS 4.0; FlashMove]

• S3 bucket is treated as a Core filer

• All prior AOS features apply

• 50:1 offload

• GNS

• Online in/out migration

• Public Benchmarks performance neutral

• Ultimate cloud on-ramp!
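
The slides do not define the 50:1 figure. One plausible reading, assumed here, is that about 50 operations are served at the edge for every one that reaches the core filer (for example, the S3 bucket). Under that assumption, the sketch below works out what fraction of operations reach the core and the resulting average latency. The function names and the latency values in the usage note are made up for illustration.

```python
def core_fraction(offload_ratio):
    """Fraction of operations reaching the core filer for an N:1 offload."""
    return 1.0 / (offload_ratio + 1.0)


def average_latency_ms(offload_ratio, edge_ms, core_ms):
    """Average latency when edge hits cost edge_ms and core trips cost core_ms."""
    miss = core_fraction(offload_ratio)
    return (1.0 - miss) * edge_ms + miss * core_ms
```

For example, with an assumed 1 ms edge latency and 52 ms core latency, `average_latency_ms(50, 1.0, 52.0)` gives 2 ms, because only 1 operation in 51 pays the long round trip.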

This is the Inova Translational Medicine Institute use case

[Diagram labels: customer premises with on-premises object storage, NAS (NetApp), NAS (Isilon); Amazon Web Services with Amazon S3 as Core Filer; AOS 4.0]

Cloudbursting

[Diagram labels: Clients & Servers; LAN / WAN; Avere Edge Filer; Core Filer(s); Virtual FXT in cloud; Amazon EC2; AOS 4.5]

Best-In-Class NAS 100% AWS Cloud Enabled

• Storage: Amazon S3

• Compute: Amazon EC2

• No vendor-specific hardware

• No over-priced disks
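
The sizing arithmetic behind cloudbursting can be sketched as follows. This is an assumption-laden illustration, not an AWS or Avere API: the function name `instances_to_burst` and the `cores_per_instance` parameter are hypothetical. Only the 1,024 on-premises CPU cores come from the slides. Work that exceeds on-premises capacity is rounded up to whole cloud instances.

```python
import math

ON_PREM_CORES = 1024  # on-premises CPU core count stated earlier in the slides


def instances_to_burst(pending_cores, cores_per_instance,
                       on_prem_cores=ON_PREM_CORES):
    """Return how many cloud instances cover demand beyond on-prem capacity.

    pending_cores: cores requested by queued jobs (hypothetical input).
    cores_per_instance: cores per cloud instance (hypothetical input).
    """
    if cores_per_instance <= 0:
        raise ValueError("cores_per_instance must be positive")
    overflow = max(0, pending_cores - on_prem_cores)
    return math.ceil(overflow / cores_per_instance)
```

When demand fits on premises the function returns 0, so cloud compute is only paid for during peaks.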

The Only Best-In-Class NAS that is 100% enabled for AWS

[Diagram labels: Cloud Gateway Products; Traditional NAS Products]

• Personalized medicine is happening at ITMI

• Many drivers for preventative personalized medicine

• Multiple and complicated barriers for broader success

• Keys for IT / informatics

• Leverage both cloud and on-premises HPC

• Use the right tools (there are many)

• Collaborate

• Be secure!

• Agile, scalable, resilient, durable

http://bit.ly/awsevals