High Performance Computing with AWS

Jafar Shameem and David Pellerin, Business Development, HPC


DESCRIPTION

More and more, the scalable, on-demand infrastructure provided by AWS is being used by researchers, scientists and engineers in Life Sciences, Finance and Engineering to solve bigger problems, answer complex questions and run larger simulations. In this session we start by talking about the supercomputing-class performance and high-performance storage available to scientists and engineers at their fingertips. We then go over examples of how startups are innovating and how large enterprises are extending their HPC environments. Finally, we walk through some of the common questions that come up as organizations start leveraging AWS for their high performance computing needs.

TRANSCRIPT

Page 1: High Performance Computing with AWS

Jafar Shameem and David Pellerin

High Performance Computing with AWS

Business Development, HPC

Page 2: High Performance Computing with AWS

• Migrate entire HPC applications and datacenters to the cloud
• Use cloud capabilities to create entirely new HPC applications
• Augment on-premise HPC resources with cloud capacity

How are Organizations Using Cloud for HPC?

Page 3: High Performance Computing with AWS

• Security: Deploy applications and store data in a secure, highly configurable VPC environment
• Agility: Deploy the right infrastructure for each technical computing job, at the right time
• Scalability: Add and subtract servers in minutes to optimize time-to-results
• Cost Savings: Pay only for what you use; don’t pay for idle or outdated servers

Why AWS for High-Performance Computing?

Page 4: High Performance Computing with AWS

[Chart: Rigid On-Premise Resources vs. Elastic Resources. With rigid on-premise capacity, the gap between predicted and actual demand produces either waste (over-provisioning) or user/customer dissatisfaction (under-provisioning); with elastic resources, capacity is scaled to actual demand.]

AWS for Agility

Page 5: High Performance Computing with AWS

Many purchase models to support different needs:

• On-Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
• Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
• Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads (see the sketch after this list).
• Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance-related workloads.
• Free Tier: Get started on AWS with free usage and no commitment. For POCs and getting started.
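To make the Spot model concrete, here is a minimal sketch of requesting Spot capacity with the boto3 SDK (a present-day library, not the tooling shown in the talk). The AMI ID, key pair name, instance type and price ceiling are hypothetical placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Ask for 10 Spot instances with a ceiling on the hourly price we are
# willing to pay; the request is fulfilled only while the Spot price
# stays at or below that ceiling.
response = ec2.request_spot_instances(
    SpotPrice="0.25",                      # maximum hourly bid, as a string
    InstanceCount=10,
    Type="one-time",                       # do not re-request after interruption
    LaunchSpecification={
        "ImageId": "ami-12345678",         # hypothetical AMI with the workload installed
        "InstanceType": "c3.8xlarge",      # hypothetical instance type
        "KeyName": "hpc-keypair",          # hypothetical key pair
    },
)

for req in response["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```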

Page 6: High Performance Computing with AWS

Massive scale allows AWS to constantly reduce costs, while improving quality and reliability.

TCO of cloud is much lower than on-premise IT when all costs are considered.

Result? Large-scale datacenter-to-cloud migrations are in progress every day.

AWS for Scale

Page 7: High Performance Computing with AWS

Scalable Computing: Go From Just One Instance…

Page 8: High Performance Computing with AWS

To Thousands… in Just Minutes!
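As a rough sketch of what "thousands in minutes" looks like at the API level, here is a boto3 example (a modern SDK, not the speakers' own tooling); the AMI ID and instance counts are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A single call can launch a whole fleet; EC2 launches up to MaxCount
# instances, and never fewer than MinCount, subject to account limits.
response = ec2.run_instances(
    ImageId="ami-12345678",        # hypothetical AMI with the HPC stack baked in
    InstanceType="c3.xlarge",
    MinCount=1,
    MaxCount=500,
)

print("Launched", len(response["Instances"]), "instances")
```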

Page 9: High Performance Computing with AWS

Instance types (memory, EC2 Compute Units, virtual cores / local storage):

• Micro: 613 MB, up to 2 ECUs
• Small: 1.7 GB, 1 EC2 Compute Unit, 1 virtual core
• Medium: 3.7 GB, 2 EC2 Compute Units, 1 virtual core
• Large: 7.5 GB, 4 EC2 Compute Units, 2 virtual cores
• Extra Large: 15 GB, 8 EC2 Compute Units, 4 virtual cores
• High-CPU Medium: 1.7 GB, 5 EC2 Compute Units, 2 virtual cores
• High-CPU XL: 7 GB, 20 EC2 Compute Units, 8 virtual cores
• Hi-Mem XL: 17.1 GB, 6.5 EC2 Compute Units, 2 virtual cores
• Hi-Mem 2XL: 34.2 GB, 13 EC2 Compute Units, 4 virtual cores
• Hi-Mem 4XL: 68.4 GB, 26 EC2 Compute Units, 8 virtual cores
• High I/O 4XL: 60.5 GB, 35 EC2 Compute Units, 2 x 1024 GB SSD-based local instance storage
• High Storage 8XL: 117 GB, 35 EC2 Compute Units, 24 x 2 TB instance store
• Cluster Compute 4XL: 23 GB, 33.5 EC2 Compute Units
• Cluster Compute 8XL: 60.5 GB, 88 EC2 Compute Units
• Cluster GPU 4XL: 22 GB, 33.5 EC2 Compute Units, 2 x NVIDIA Tesla “Fermi” M2050 GPUs
• Cluster High Mem 8XL: 244 GB, 88 EC2 Compute Units, SSD instance storage

Choose the Right Instance Type for the Job
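For the Cluster Compute and Cluster GPU instance types, tightly coupled MPI jobs are normally launched into a cluster placement group to get low-latency, full-bisection 10 Gb networking within one Availability Zone. A minimal boto3 sketch; the AMI ID and group name are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A "cluster" placement group keeps instances close together on the network.
ec2.create_placement_group(GroupName="cfd-cluster", Strategy="cluster")

# Launch eight Cluster Compute instances into the group.
ec2.run_instances(
    ImageId="ami-12345678",            # hypothetical HVM AMI
    InstanceType="cc2.8xlarge",        # Cluster Compute 8XL from the list above
    MinCount=8,
    MaxCount=8,
    Placement={"GroupName": "cfd-cluster"},
)
```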

Page 10: High Performance Computing with AWS

On-Premise: experiment infrequently; failure is expensive ($ millions); less innovation.

Cloud: experiment often; fail quickly at a low cost (nearly $0); more innovation.

AWS for Innovation

Page 11: High Performance Computing with AWS

Focus on innovation

Leave the muck of infrastructure management to AWS

http://eddie.niese.net/20090313/dont-pity-incompetence/

Page 12: High Performance Computing with AWS

• Engineering: CAD and CAE for aerospace, defense, structures, consumer products
• Life Sciences: For basic research, drug discovery, genomics, and translational medicine
• Energy and Geophysics: Including seismic processing, reservoir estimation, high-energy simulation, wind energy modeling, GIS
• Financial Services and Insurance: Including valuation and risk analytics

And Many More!

HPC Applications Running on AWS Today

Page 13: High Performance Computing with AWS

HPC for Engineering

Scalable Computing for CAD/CAE/EDA

Page 14: High Performance Computing with AWS

AWS for Engineering

• Computer-Aided Design, Simulation, Analysis, Visualization

– For development of commercial and military products

– Aerospace, automotive, civil, construction, energy, others

– Across industries, the trend is Simulation-Driven Design

• Examples

– Computer-Aided Design (CAD) including 3D models

– Electronic Design Automation (EDA)

– Computational Fluid Dynamics (CFD)

– Finite Element Analysis (FEA) and Thermal Analysis

– Crash Analysis

– Failure and Hazard Analysis

Page 15: High Performance Computing with AWS

CFD for Turbine Engine Design

• Time accurate fluid dynamics

• SBIR-funded project for the US Air Force Research Laboratory (AFRL)

• SAS 70 Type II certification and VPN-level access required

• Additional security measures:

– Uploaded and downloaded data was encrypted

– Dedicated EC2 cluster instances were provisioned

– Data was purged upon completion of the run

“The results of this case were impressive. Using Amazon EC2 the large-scale, time accurate simulation was turned around in just 72 hours with computing infrastructure costs well below $1,000.”

http://aws.amazon.com/solutions/case-studies/aerodynamic-solutions/

Page 16: High Performance Computing with AWS

• Commercial provider of mixed-signal ASICs for X-ray and gamma ray detection and imaging
• Needs to perform very large Monte Carlo simulations using as many as 4000 server nodes

• Computing workloads are highly variable, project-driven

• Building an on-premise cluster to handle peak loads would be cost prohibitive

• Solution: EC2 3rd-generation High-Memory instances

• Up to 80% savings by using Spot instances on EC2

Radiation Simulation for ASIC Design

Page 17: High Performance Computing with AWS

1) Customer Managed Application Hosting

• Customer has account with AWS and manages infrastructure

• Customer maintains traditional software vendor relationships

• Software vendor offers license flexibility (BYOL)

2) Vendor Managed Hosting to Augment On-Premise Application

• Client-Server model for acceleration of batch tasks

• Customer pays software vendor for AWS-hosted services

• Customer does not need to manage low-level infrastructure

3) Vendor Managed Software-as-a-Service

• Pay-per-use, fully web-based including GUI

Scenarios for Technical Software

Page 18: High Performance Computing with AWS

Trusted by Enterprises Worldwide

Page 19: High Performance Computing with AWS

HPC for Life Sciences

Customer Case-Studies

Page 20: High Performance Computing with AWS

And a rich history in Life Sciences

Page 21: High Performance Computing with AWS

AWS Public Data Sets

• A centralized repository of public datasets

• Seamless integration with cloud based applications

• No charge to the community

• Some of the datasets available today:

– 1000 Genomes Project

– Ensembl

– GenBank

– Illumina – Jay Flatley Human Genome Dataset

– YRI Trio Dataset

– The Cannabis Sativa Genome

– UniGene

– Influenza Virus

– PubChem

• Tell us what else you’d like for us to host …
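Public Data Sets are plain S3 buckets, so they can be read without an AWS account using anonymous (unsigned) requests. A minimal boto3 sketch; the bucket name and prefix follow the commonly documented layout of the 1000 Genomes data set and may differ.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned client: no credentials are needed for public buckets.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a few objects from the 1000 Genomes public data set
# (bucket name and prefix assumed; check the data set's documentation).
resp = s3.list_objects_v2(Bucket="1000genomes", Prefix="phase1/", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```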

Page 22: High Performance Computing with AWS

Open Source ecosystem

• NCBI BLAST

• Crossbow

• CloudBurst

• Myrna

• Clovr

• BioPerl Max

• VIPDAC

• Superfamily

• Cloud-Coffee

• BioNimbus

• GMOD

• CloudAligner

• CRdata

• SeqWare

• Blend

• StormSeq

• BioConductor

Get links to AMIs at: https://github.com/mndoci/mndoci.github.com/wiki/Life-Science-Apps-on-AWS

MIT StarCluster, Sun Grid Engine, Condor, Torque, Slurm, Rocks, Chef, Puppet

Page 23: High Performance Computing with AWS

The number of cluster nodes can be scaled up or down depending on computational needs.

Page 24: High Performance Computing with AWS

• Remove constraints: capex, operational skills, processing limitations
• Focus on the problem: not the technical challenges of large compute clusters
• Achieve more: perform bigger, more complex jobs in much reduced time
• Iterate around the problem: do more and afford to take more risks as the cost of experimentation is reduced

Why AWS?

Page 25: High Performance Computing with AWS

Data Transfer

• AWS Import/Export

– Move large amounts of data into and out of AWS

– Data Migration, Content Distribution, DR, etc.

• AWS Direct Connect

– Secure private link to AWS

– 1Gbps, 10Gbps connectivity

– You can also co-locate hardware in AWS DX locations

• Bandwidth Optimization Solutions

– Commercial providers – Aspera, Riverbed, Attunity, etc.

– Open Source – Tsunami UDP, Globus Online
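Alongside the bandwidth-optimization options above, plain S3 multipart uploads already parallelize a single large transfer over many connections. A minimal boto3 sketch; the file path and bucket name are hypothetical.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart, parallel upload: a large file is split into chunks that are
# sent over several connections at once, which helps fill a fat pipe
# such as a Direct Connect link.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # use multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=16,
)

s3.upload_file(
    Filename="/data/seismic/survey_0421.segy",   # hypothetical local file
    Bucket="my-hpc-input-data",                  # hypothetical bucket
    Key="seismic/survey_0421.segy",
    Config=config,
)
```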


Page 26: High Performance Computing with AWS

• Relational Database Service: fully managed database (MySQL, Oracle, MSSQL)
• DynamoDB: NoSQL, schemaless, provisioned-throughput database
• SimpleDB: NoSQL, schemaless, for smaller datasets
• S3: object datastore, up to 5 TB per object, 99.999999999% durability
• Redshift: petabyte-scale data warehousing service, fully managed

Storage Options
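As a small illustration of the provisioned-throughput model mentioned for DynamoDB, here is a boto3 sketch that creates a hypothetical job-tracking table and writes one item; the table and attribute names are made up.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Read and write capacity are declared up front; that is the
# "provisioned throughput" referred to above.
dynamodb.create_table(
    TableName="SimulationRuns",
    AttributeDefinitions=[{"AttributeName": "RunId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "RunId", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)

# Wait for the table to become ACTIVE before writing to it.
dynamodb.get_waiter("table_exists").wait(TableName="SimulationRuns")

dynamodb.put_item(
    TableName="SimulationRuns",
    Item={"RunId": {"S": "run-0001"}, "Status": {"S": "QUEUED"}},
)
```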

Page 27: High Performance Computing with AWS

1.3 trillion objects in S3; 835k peak transactions per second

Page 28: High Performance Computing with AWS

Glacier

Long term cold storage

From $0.01 per GB/Month

99.999999999% durability

Archival

“Every day our genome sequencers produce terabytes of data. As our company moves into the clinical space, we face a legal requirement to archive patient data for years that would drastically raise the cost of storage. Thanks to Amazon Glacier’s secure and scalable solution, we will be able to provide cost-effective, long-term storage and thereby eliminate a barrier to providing whole genome sequencing for medical treatment of cancer and other genetic diseases.” - Keith Raffel, Senior Vice President and Chief Commercial Officer, Complete Genomics
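A minimal boto3 sketch of the kind of archival workflow described in the quote above: create a vault and upload one archive. The vault name and file path are hypothetical; the archive ID in the response is what you keep in order to retrieve the data later.

```python
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")

# Hypothetical vault for long-term genome archives ("-" means the
# account that owns the credentials).
glacier.create_vault(accountId="-", vaultName="genome-archive")

with open("/data/archive/run_2013_04.tar.gz", "rb") as f:   # hypothetical file
    response = glacier.upload_archive(
        accountId="-",
        vaultName="genome-archive",
        archiveDescription="Whole-genome sequencing run, compressed",
        body=f,
    )

# Store this ID: retrievals are asynchronous jobs keyed on the archive ID.
print(response["archiveId"])
```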

Page 29: High Performance Computing with AWS

Elastic MapReduce: managed, elastic Hadoop cluster

• Integrates with S3 & DynamoDB
• Leverage Hive & Pig analytics scripts
• Integrates with instance types such as Spot

Features:
• Scalable: Use as many or as few compute instances running Hadoop as you want. Modify the number of instances while your job flow is running.
• Integrated with other services: Works seamlessly with S3 as origin and output. Integrates with DynamoDB.
• Comprehensive: Supports languages such as Hive and Pig for defining analytics, and allows complex definitions in Cascading, Java, Ruby, Perl, Python, PHP, R, or C++.
• Cost effective: Works with Spot instance types.
• Monitoring: Monitor job flows from within the management console.
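A minimal boto3 sketch of launching a transient Elastic MapReduce cluster that runs one Hive script from S3 and shuts down when done. Cluster size, instance types, release label, bucket and script names are all hypothetical, and the default EMR IAM roles are assumed to exist.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="genomics-analytics",
    ReleaseLabel="emr-5.36.0",                    # any current release label
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 10,
        "KeepJobFlowAliveWhenNoSteps": False,     # terminate after the steps finish
    },
    Steps=[{
        "Name": "variant-report",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script",
                     "--args", "-f", "s3://my-analytics-bucket/report.q"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://my-analytics-bucket/logs/",
)

print("Cluster:", response["JobFlowId"])
```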


Page 30: High Performance Computing with AWS

EMR Jobs

[Chart: EMR clusters launched over time. 3.7 M clusters launched since May 2010.]

Page 31: High Performance Computing with AWS

Crossbow

• Align billions of reads and find SNPs

– Reuse software components: Hadoop Streaming

h" p://bowI eAbio.sourceforge.net/crossbow2

• Map: Bowtie (Langmead et al., 2009)

– Find best alignment for each read

– Emit (chromosome region, alignment)

• Reduce: SOAPsnp (Li et al., 2009)

– Scan alignments for divergent columns

– Accounts for sequencing error, known SNPs

• Shuffle: Hadoop

– Group and sort alignments by region


Searching for SNPs with Cloud Computing.

Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL (2009) Genome Biology. 10:R134 
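Crossbow's split between Bowtie and SOAPsnp maps directly onto the Hadoop Streaming contract: the mapper emits tab-separated (region, alignment) pairs on stdout, the shuffle groups and sorts them by region, and the reducer sees each region's records consecutively. A toy sketch of that shape (placeholders only, not Crossbow's actual code):

```python
#!/usr/bin/env python3
# mapper.py - stand-in for the Bowtie step: read sequencing reads from
# stdin and emit tab-separated (region, alignment) pairs.
import sys

def align(read):
    # Placeholder for a real aligner; returns (region, alignment record).
    return "chr1:0-1000000", read.strip()

for line in sys.stdin:
    region, alignment = align(line)
    print(f"{region}\t{alignment}")
```

```python
#!/usr/bin/env python3
# reducer.py - stand-in for the SOAPsnp step: the shuffle has already grouped
# and sorted lines by region, so records for one region arrive consecutively.
import sys

def call_snps(region, alignments):
    # Placeholder for a real SNP caller scanning the grouped alignments.
    print(f"{region}\t{len(alignments)} alignments scanned")

current_region, alignments = None, []
for line in sys.stdin:
    region, alignment = line.rstrip("\n").split("\t", 1)
    if region != current_region and current_region is not None:
        call_snps(current_region, alignments)
        alignments = []
    current_region = region
    alignments.append(alignment)

if current_region is not None:
    call_snps(current_region, alignments)
```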

Page 32: High Performance Computing with AWS

Worldwide research and development

The Amazon Virtual Private Cloud was a unique option that offered an additional level of security and an ability to integrate with other aspects of our infrastructure.

“AWS enables Pfizer’s WRD to explore specific difficult or deep scientific questions in a timely, scalable manner and helps Pfizer make better decisions more quickly” Dr. Michael Miller, Head of HPC for R&D, Pfizer

Page 33: High Performance Computing with AWS

Spiral Genetics

• Alignment, Variant Calling, Annotation

• Turnaround time:
– Targeted: less than 40 minutes
– Exome: less than 2 hours
– Whole Genome: less than 5 hours

Page 34: High Performance Computing with AWS

• Workflows can be easily defined and automated with integrated Galaxy Platform capabilities

• Data movement is streamlined with integrated Globus file-transfer functionality

• Resources can be provisioned on-demand with Amazon Web Services cloud based infrastructure

Globus Genomics

Page 35: High Performance Computing with AWS

Syapse: Bringing Omics into Routine Medical Use

[Diagram: Laboratory Testing, Test Results, Clinical Use; the Syapse Semantic Data Platform with the Syapse Omics Medical Record, Physician Portal, and Discovery applications.]

Page 36: High Performance Computing with AWS

Harvard Medical School, The Laboratory for Personalized Medicine

• Run EC2 clusters to analyze entire genomes
• Leverage Spot instances in workflows: one day’s worth of effort resulted in 50% cost savings

“The AWS solution is stable, robust, flexible, and low cost. It has everything to recommend it.” Dr. Peter Tonellato, LPM, Center for Biomedical Informatics, Harvard Medical School

Page 37: High Performance Computing with AWS

Illumina BaseSpace

• Data Analysis

– Alignment, Assembly, QC, Analysis

• Share data with colleagues

• Access high quality and diverse datasets

Page 38: High Performance Computing with AWS

We are here to help

Enterprise Support, Trusted Advisor, Professional Services, Sales and Solutions Architects

Page 39: High Performance Computing with AWS

Thank You

Jafar Shameem ([email protected])

David Pellerin

([email protected])

http://aws.amazon.com/