AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

AWS Government, Education, and Nonprofits Symposium | Washington, DC | June 24, 2014 - June 26, 2014 | Science as a Service on AWS | Ravi K Madduri

Upload: ravi-madduri

Post on 21-Jun-2015

Category: Science

DESCRIPTION

We present our work on creating sustainable science services using Globus, Amazon Web Services, and the Galaxy framework. We focus on Globus Genomics as a successful use case.

TRANSCRIPT

Page 1: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

AWS Government, Education, and Nonprofits Symposium

Washington, DC | June 24, 2014 - June 26, 2014

[email protected]

Science as a Service on AWS

Ravi K Madduri

Page 2: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Page 3: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Outline

• CI Mission and Introduction of Science as a Service
• Motivation
– Why is this important?
• Separation of concerns – Going far together
• Examples of Science as a Service
• Focus on Globus Genomics as a success story
– Announcing Globus Genomics AWS Test Drive

Page 4: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Our Vision for a 21st Century Discovery Infrastructure

Provide more capability for people at lower cost by delivering science as a service

www.globus.org

Page 5: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Two Broader Themes

• Productivity of Researchers
– Time spent performing administrative tasks vs. time spent doing science
– Reproducibility
• Sustainability of scientific software
– Reduction in funding for science

Page 6: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports

Page 7: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Page 8: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Page 9: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

42%

Page 10: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Presenting 21st Century Discovery Infrastructure

Page 11: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Going Far Together

Separation of Concerns

Page 12: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Our Science Stack
• Galaxy
– Interactive execution
– Creation, Execution, Sharing, Discovering Workflows
• Globus
– Data management
– Identity Management
• AWS
– EC2, EBS, S3, SNS, Spot, Route 53, CloudFormation

SaaS

PaaS

IaaS

Page 13: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Time-consuming tasks in science
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature

Page 14: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus: Fast, reliable data transfer

Data Source, Data Destination

1. User initiates transfer request
2. Globus moves and syncs files
3. Globus notifies user
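
The three steps on this slide (request, move and sync, notify) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Endpoint`, `transfer`, and the checksum-based skip rule are hypothetical names and choices made for illustration, and none of it is the Globus API or how the service is implemented.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical stand-in for a storage endpoint: file path -> contents."""
    name: str
    files: dict = field(default_factory=dict)


def checksum(data: bytes) -> str:
    """Content fingerprint used to decide whether a file is already in sync."""
    return hashlib.sha256(data).hexdigest()


def transfer(source, destination, paths, notify):
    """Step 1: accept a request. Step 2: move and sync. Step 3: notify."""
    copied, skipped = [], []
    for path in paths:
        data = source.files[path]
        existing = destination.files.get(path)
        # Sync: skip files whose contents already match at the destination.
        if existing is not None and checksum(existing) == checksum(data):
            skipped.append(path)
            continue
        destination.files[path] = data
        copied.append(path)
    # Notify the user once the whole request has been processed.
    notify(f"Transfer {source.name} -> {destination.name}: "
           f"{len(copied)} copied, {len(skipped)} already in sync")
    return copied, skipped


# Illustrative data only; the endpoint and file names are made up.
lab = Endpoint("lab", {"a.fastq": b"ACGT", "b.fastq": b"TTGA"})
cloud = Endpoint("cloud", {"a.fastq": b"ACGT"})
messages = []
copied, skipped = transfer(lab, cloud, ["a.fastq", "b.fastq"], messages.append)
print(messages[0])
```

Passing the notification function in as a parameter keeps the sketch independent of any particular channel (email, web UI, and so on).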

Page 15: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Amazon S3 Endpoints

Page 16: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus: Sharing off existing systems

Data Source

1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus tracks shared files; no need to move files to cloud storage!
3. User B logs into Globus and accesses shared file
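
The sharing model on this slide, where files stay on the owner's system and only the grants are tracked, can be sketched as a tiny access-control list. Everything here (`SharedStore`, `share`, `can`, `read`, the user and group names) is a hypothetical illustration and not the Globus sharing API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical model: files stay in place; only grants are recorded."""
    files: dict                                   # path -> contents
    groups: dict = field(default_factory=dict)    # group name -> set of users
    grants: list = field(default_factory=list)    # (path, principal, permissions)

    def share(self, path, principal, permissions):
        # Step 1: the owner picks a file, a user or group, and permissions.
        if path not in self.files:
            raise KeyError(path)
        # Step 2: record the grant; no data is copied anywhere.
        self.grants.append((path, principal, frozenset(permissions)))

    def can(self, user, path, permission):
        for granted_path, principal, permissions in self.grants:
            # A principal is either a known group or a single user name.
            members = self.groups.get(principal, {principal})
            if granted_path == path and user in members and permission in permissions:
                return True
        return False

    def read(self, user, path):
        # Step 3: another user accesses the shared file where it already lives.
        if not self.can(user, path, "read"):
            raise PermissionError(f"{user} may not read {path}")
        return self.files[path]


# Illustrative data only; paths, users, and groups are made up.
store = SharedStore(files={"/data/run1.bam": b"reads"},
                    groups={"lab-group": {"user_b"}})
store.share("/data/run1.bam", "lab-group", {"read"})
print(store.read("user_b", "/data/run1.bam"))
```

Because `share` only appends a grant, the number of copies of the data never changes, which is the point the slide makes about not moving files to cloud storage.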

Page 17: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

MyProxy

Globus: Federated identity

Page 18: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus Transfer

>25,000 registered users; >150 daily
50 PB moved; >1B files
10x (or better) performance vs. scp
99.9% availability
Entirely hosted on Amazon

Page 19: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus: Data publication service

[Diagram labels: Community; Collection; Policies (Metadata, Access Control, License, Storage, Curation Workflow); Dataset (Data, Metadata)]

Page 20: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports

Page 21: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus Science Stack in Action

Data Management

[Diagram labels: Sequencing Centers; Public Data; Storage; Local Cluster/Cloud; Seq Center; Research Lab; Globus SaaS; links labeled FTP, SCP, HTTP, others]

Globus provides a
• High-performance
• Fault-tolerant
• Secure
file transfer service between all data endpoints

Data Analysis

Globus Genomics on Amazon EC2

[Diagram labels: Galaxy Data Libraries; Fastq; Ref Genome; Alignment; Variant Calling; Picard; GATK]

• Analytical tools are automatically run on the scalable compute resources when possible

Galaxy Based Workflow Management System
• Globus integrated within Galaxy
• Web-based UI
• Drag-drop workflow creation
• Easily modify workflows with new tools

Page 22: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Flexible, scalable, affordable genomics analysis for all biologists

Page 23: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus Genomics
• Analysis tools profiled for optimal performance
• Workload management for parallel execution
• Resources provisioned on demand
• High performance, reliable data movement
• Seamless access using institution’s credentials
• Best practice + extensible, customizable pipelines

Page 24: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus Climate

Page 25: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Globus Materials

Page 26: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Cardiovascular Research

Page 27: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Proton Cancer Treatment

No. Histories   Execution Time (s)   No. Per Hour   On-demand Cost ($2.10)   Spot Cost ($0.50)
1.5B            570                  6              $35                      $9
1B              445                  8              $27                      $7
0.5B            283                  12             $18                      $5
0.25B           170                  21             $10                      $2
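
As a quick check on the table, the "No. Per Hour" column is consistent with the number of whole runs that fit into 3600 seconds at the listed execution time, and the spot column is several times cheaper than the on-demand column. The sketch below only reproduces those two relationships from the transcribed figures; how the dollar amounts themselves were derived is not stated on the slide, so it does not attempt to recompute them.

```python
# Rows transcribed from the slide: (histories, execution time in seconds,
# runs per hour, on-demand cost in $, spot cost in $).
rows = [
    ("1.5B", 570, 6, 35, 9),
    ("1B", 445, 8, 27, 7),
    ("0.5B", 283, 12, 18, 5),
    ("0.25B", 170, 21, 10, 2),
]

results = []
for histories, seconds, per_hour, on_demand, spot in rows:
    whole_runs = 3600 // seconds      # complete runs that fit in one hour
    ratio = on_demand / spot          # on-demand cost relative to spot cost
    results.append((histories, whole_runs, ratio))
    print(f"{histories:>6}: {whole_runs:2d} runs/hour (slide: {per_hour}), "
          f"on-demand/spot = {ratio:.1f}x")
```

On these figures the on-demand column is between 3.6 and 5 times the spot column.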

Page 28: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Usage has been promising

[Figure: Instance Hours and Cost ($) by month, January through June]

2.5 Million Core hours

Page 29: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Exome: 3 – 12hrs ~1hr

Whole Genome: ~22hrs ~10hrs

RNA-Seq: 1 – 12hrs ~minutes

Page 30: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Diversity of collaborations

Dobyns Lab
Cox Lab
Volchenboum Lab
Olopade Lab
Nagarajan Lab

Page 31: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Common misconceptions
• Cloud is expensive
• Cloud is insecure
• It takes a long time to move data and it's hard
• Cloud is about VMs and we got VMs
• My codes won’t run on the cloud
• Cloud is not HPC-enough
• Amazon will be acquired or will file for bankruptcy
– What happens to my data?

Page 32: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Possible Solutions

• Outreach
• Case studies with TCO for various domains and problem types
• Compliance
• Transparency in Billing

Page 33: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Our Vision for a 21st Century Discovery Infrastructure

To make advanced computational capabilities available to all researchers at substantially lower cost

Page 34: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

We’re “all in” on cloud

Identify time-consuming activities amenable to automation and outsourcing, and deliver them as high-quality, low-touch SaaS

Extract common elements as a research data management automation PaaS

Leverage IaaS for reliability, economies of scale

Page 35: AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Thank you to our sponsors!