hpc on aws
DESCRIPTION
Overview of HPC on Amazon Web Services

TRANSCRIPT
HPC on Amazon Web Services
Deepak Singh, Amazon Web Services
Dec 17, 2010
Image: Simon Cockell under CC-BY
the new reality
lots and lots and lots and lots and lots of data
lots and lots and lots and lots and lots of compute
lots and lots and lots and lots and lots of people
lots and lots and lots and lots and lots of places
constant change
goal
innovate
innovate in a new reality
optimize the most valuable resource
compute, storage, workflows, memory,
transmission, algorithms, cost, …
people drive innovation
Credit: Pieter Musterd, under a CC-BY-NC-ND license
make people productive
Credit: Pieter Musterd, under a CC-BY-NC-ND license
challenges
Your Idea → Successful Product
Great Idea, Not Prioritized
Resource Contention
Tight Budgets
Shared Resources
enter the cloud
infrastructure services
building blocks
Undifferentiated Heavy Lifting
pay as you go
pay for what you use
on demand
programmable
import boto
import boto.emr
from boto.emr.step import StreamingStep
from boto.emr.bootstrap_action import BootstrapAction
import time

# set your aws keys and S3 bucket, e.g. from environment or .boto
AWSKEY =
SECRETKEY =
S3_BUCKET =
NUM_INSTANCES = 1

conn = boto.connect_emr(AWSKEY, SECRETKEY)

bootstrap_step = BootstrapAction("download.tst",
    "s3://elasticmapreduce/bootstrap-actions/download.sh", None)

step = StreamingStep(name='Wordcount',
    mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
    cache_files=["s3n://" + S3_BUCKET + "/boto.mod#boto.mod"],
    reducer='aggregate',
    input='s3n://elasticmapreduce/samples/wordcount/input',
    output='s3n://' + S3_BUCKET + '/output/wordcount_output')

jobid = conn.run_jobflow(name="testbootstrap",
    log_uri="s3://" + S3_BUCKET + "/logs",
    steps=[step],
    bootstrap_actions=[bootstrap_step],
    num_instances=NUM_INSTANCES)

print "finished spawning job (note: starting still takes time)"

state = conn.describe_jobflow(jobid).state
print "job state = ", state
print "job id = ", jobid
while state != u'COMPLETED':
    print time.localtime()
    time.sleep(30)
    state = conn.describe_jobflow(jobid).state
    print "job state = ", state
    print "job id = ", jobid

print "final output can be found in s3://" + S3_BUCKET + "/output" + TIMESTAMP
print "try: $ s3cmd sync s3://" + S3_BUCKET + "/output" + TIMESTAMP + " ."
Connect to Elastic MapReduce
Install packages
Set up mappers & reducers
job state
elastic
Capacity
Time
Real demand
Elastic capacity
On demand → Faster to market
Pay as you go → Maintain focus
Pay to play → Efficiency
Elastic resources → Capacity planning
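The elasticity point can be made concrete with a toy cost comparison: fixed capacity must be provisioned for peak demand and billed around the clock, while elastic capacity tracks real demand hour by hour. The demand curve below is hypothetical, and the instance-hour price is simply the cc1 on-demand rate quoted elsewhere in the talk.

```python
# Toy cost comparison: fixed capacity vs. elastic, pay-as-you-go.
# All demand numbers here are hypothetical, for illustration only.

hourly_demand = [2, 2, 4, 10, 40, 10, 4, 2]  # instances needed each hour
price_per_instance_hour = 1.60               # e.g. the cc1 on-demand rate

# Fixed capacity is sized for the peak and paid for every hour,
# whether it is used or not.
fixed_capacity = max(hourly_demand)
fixed_cost = fixed_capacity * len(hourly_demand) * price_per_instance_hour

# Elastic capacity follows real demand hour by hour.
elastic_cost = sum(hourly_demand) * price_per_instance_hour

print("fixed:   $%.2f" % fixed_cost)
print("elastic: $%.2f" % elastic_cost)
```

With a spiky demand curve like this, the elastic bill is a fraction of the fixed one; with flat, fully utilized demand, the two converge.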
Computing with Amazon EC2
Credit: Angel Pizzaro, U. Penn
Credit: Tom Fifield: U. Melbourne
standard “m1”
high cpu “c1”
high memory “m2”
http://aws.amazon.com/ec2/instance-types/
EC2 instance types
listening to customers
new EC2 instance type
cluster compute instances
http://aws.amazon.com/ec2/instance-types/
2 * Xeon 5570 (“Intel Nehalem”)
23 GB RAM
10 gbps Ethernet
1690 GB local disk
HVM-based virtualization
$1.60 / hr
10 gbps
Placement Group
full bisection bandwidth
HPC on EC2 =
EC2 instance +
high bandwidth, low latency networking
http://aws.amazon.com/ec2/hpc-applications/
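In practice, that combination comes from launching cluster compute instances into a placement group. A minimal boto sketch, assuming the EC2 API of this era (`create_placement_group` and the `placement_group` argument to `run_instances`); the AMI id, group name, and instance count below are placeholders, not values from the talk.

```python
# Sketch: launch cluster compute instances into a placement group
# so they share the 10 gbps, full-bisection-bandwidth fabric.

def cluster_launch_params(ami, group, count):
    # keyword arguments for EC2Connection.run_instances: every
    # instance joins the same placement group
    return dict(image_id=ami,
                min_count=count, max_count=count,
                instance_type="cc1.4xlarge",
                placement_group=group)

def launch_hpc_cluster():
    # requires AWS credentials (environment or ~/.boto); defined
    # here but not invoked
    import boto
    conn = boto.connect_ec2()
    conn.create_placement_group("my-hpc-group")
    return conn.run_instances(
        **cluster_launch_params("ami-XXXXXXXX",  # a cluster compute (HVM) AMI
                                "my-hpc-group", 8))
```

The AMI must be an HVM image, since cluster compute instances use HVM-based virtualization.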
Linpack benchmark
880-instance CC1 cluster
Performance: 41.82 TFlops*
*#231 in the most recent Top 500 rankings
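A rough back-of-envelope on that Linpack number, assuming the Xeon 5570's 2.93 GHz clock and 4 double-precision flops per cycle per core (Nehalem figures that are not stated on the slide):

```python
# Back-of-envelope on the 880-instance Linpack run. Clock speed and
# flops/cycle are assumed Nehalem figures, not from the slides.
instances = 880
sockets, cores, ghz, flops_per_cycle = 2, 4, 2.93, 4

peak_per_instance = sockets * cores * ghz * flops_per_cycle  # GFlops
peak_total = instances * peak_per_instance / 1000.0          # TFlops

measured = 41.82                                             # TFlops, from the slide
efficiency = measured / peak_total

hourly_cost = instances * 1.60                               # $/hr at the cc1 rate

print("theoretical peak: %.1f TFlops" % peak_total)
print("Linpack efficiency: %.0f%%" % (100 * efficiency))
print("cluster cost: $%d/hour" % hourly_cost)
```

That works out to roughly half of theoretical peak, and a Top 500-class cluster billing at about $1,400 per hour with no capital outlay.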
CFD
Molecular Modeling
Sequence Analysis
Engineering Design
Energy Trading
…
high I/O applications
standard “m1”
high cpu “c1”
high memory “m2”
http://aws.amazon.com/ec2/instance-types/
cluster compute “cc1”
EC2 instance types
HPC is evolving
cluster GPU instances
http://aws.amazon.com/ec2/instance-types/
HPC on EC2 =
EC2 instance +
high bandwidth, low latency networking
+ GPU
http://aws.amazon.com/ec2/hpc-applications/
2 * Xeon 5570 (“Intel Nehalem”)
22 GB RAM
10 gbps Ethernet
1690 GB local disk
HVM-based virtualization
$2.10 / hr
2 * Tesla M2050 GPU
standard “m1”
high cpu “c1”
high memory “m2”
http://aws.amazon.com/ec2/instance-types/
cluster compute “cc1”
EC2 instance types
cluster GPU “cg1”
CFD
Molecular Dynamics
Financial Modeling
Rendering
Video Processing
…
What is your interest?
“90 percent scaling efficiency on clusters of up to 128 GPUs”
-- Mental Images iRay
Getting Started
http://aws.amazon.com/hpc-applications
4 steps
15 minutes
ecosystem
ISV ecosystem
MathWorks
mental images
RevUp Render
Elemental Technologies...
HPC with AWS
EC2 instance +
high bandwidth, low latency networking
+ Tesla GPU*
*optional
On demand → Faster to market
Pay as you go → Maintain focus
Pay to play → Efficiency
Elastic resources → Capacity planning
make people productive
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Your Idea → Successful Product
Great Idea, Not Prioritized
http://aws.amazon.com/hpc-applications
[email protected]
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood, James Hamilton
& Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license