![Page 1: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/1.jpg)
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Adam Boeglin, HPC Solutions Architect
Monday, October 31, 2016
Launch a thousand core HPC cluster in minutes with AWS CfnCluster
![Page 2: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/2.jpg)
Webinar Highlights
• What is CfnCluster and when to use it
• Architecture guidance to fit your security models
• How to install and configure CfnCluster
• Demo: Review of CfnCluster and managing compute at scale
![Page 3: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/3.jpg)
Introduction to CfnCluster
• AWS CloudFormation + Cluster = CfnCluster
• Simple to install, easy to manage
• Everything you need to get a cluster up and running in minutes
• Head node with scheduler
• Shared NFS Storage
  • /home
  • /shared
• OpenMPI
• Compute nodes that grow and shrink on demand
![Page 4: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/4.jpg)
Workloads Well Suited for CfnCluster
• Computational Fluid Dynamics
• Semiconductor Design
• Weather Modeling
• Genomics and Molecular Simulation
• Seismic and reservoir simulations
• 3D rendering and visualizations
• … anything that uses a traditional HPC scheduler
![Page 5: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/5.jpg)
Cluster HPC and Grid HPC
Cluster HPC: Tightly coupled, latency-sensitive applications. Use larger EC2 compute instances, placement groups, and Enhanced Networking.
Grid HPC: Loosely coupled, pleasingly parallel. Requires very little node-to-node interaction.
Grids of Clusters: Use a grid strategy on the cloud to run a group of parallel, individually clustered HPC jobs.
![Page 6: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/6.jpg)
Computational Fluid Dynamics: ANSYS Fluent
• AWS c4.8xlarge
• 140M cells
• F1 car CFD benchmark
http://www.ansys-blog.com/simulation-on-the-cloud/
![Page 7: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/7.jpg)
https://aws.amazon.com/hpc/cfncluster/
![Page 8: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/8.jpg)
Configuration Options
• Operating System
  • Amazon Linux
  • CentOS 6
  • CentOS 7
  • Ubuntu 14.04
• Scheduler
  • Sun Grid Engine (SGE)
  • OpenLava
  • Torque
  • SLURM
• Storage Size & IOPS
• EBS & Instance Store Encryption
• Scaling Speed & Limits
• Provisioning Scripts
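As a rough sketch of how these options might appear in the configuration file, the script below writes a sample cluster section to a scratch file. The key names follow the CfnCluster configuration docs as best understood here; the section label `custom`, every value, and the `https://example.com/provision.sh` URL are illustrative assumptions. The queue-size estimate assumes that "a thousand cores" is counted in vCPUs and that each c4.8xlarge provides 36 of them.

```shell
#!/bin/bash
# Sketch only: writes a sample config to a scratch file, not ~/.cfncluster/config.
# All values below are illustrative assumptions, not recommendations.

target_vcpus=1000        # "thousand core" target from the talk title
vcpus_per_node=36        # a c4.8xlarge instance provides 36 vCPUs
# Ceiling division: smallest node count that reaches the target.
max_nodes=$(( (target_vcpus + vcpus_per_node - 1) / vcpus_per_node ))

sample_config="$(mktemp)"
cat > "$sample_config" <<EOF
[cluster default]
base_os = centos7
scheduler = sge
compute_instance_type = c4.8xlarge
initial_queue_size = 0
max_queue_size = ${max_nodes}
ebs_settings = custom
post_install = https://example.com/provision.sh

[ebs custom]
volume_type = io1
volume_size = 500
volume_iops = 2000
encrypted = true
EOF

echo "max_queue_size = ${max_nodes}"
```

Check each key against the configuration docs for your installed version before copying any of it into a real config.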
![Page 9: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/9.jpg)
Many AWS services to tie it all together
• CloudFormation manages the state of the cluster
• Amazon CloudWatch & Auto Scaling let the compute fleet grow and shrink on demand
• Amazon SQS & Amazon SNS allow compute nodes to signal to the master when they’re online
• AWS Identity and Access Management (IAM) allows for fine-grained access control
• Amazon S3 stores the CloudFormation templates
![Page 10: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/10.jpg)
Standalone CfnCluster
[Diagram components: Internet Gateway (IGW); availability zone region-1a with Master Server and Auto Scaling Compute Fleet; CloudFormation, Amazon S3, DynamoDB, Amazon SQS, CloudWatch]
![Page 11: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/11.jpg)
Isolated CfnCluster
[Diagram components: Internet Gateway (IGW); Public Subnet with VPC NAT gateway; Bastion Server; Private Subnet with Master Server and Auto Scaling Compute Fleet; CloudFormation, Amazon S3, DynamoDB, Amazon SQS, CloudWatch]
Private Subnet Route Table
• VPC Traffic -> Local
• 0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
• VPC Traffic -> Local
• 0.0.0.0/0 -> Internet Gateway
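One way the private-subnet placement might be expressed in the configuration file is sketched below. The helper `write_vpc_section`, the section label `isolated`, and all IDs and address ranges are hypothetical placeholders; the key names are taken from the CfnCluster configuration docs as best understood here. The cluster section would also need `vpc_settings = isolated` to point at this section, and compute subnet settings are left out, so consult the docs for those.

```shell
#!/bin/bash
# Sketch only: prints a VPC section for a cluster whose master node lives in
# the private subnet. The section name "isolated" and all IDs are placeholders.

write_vpc_section() {
    # usage: write_vpc_section VPC_ID PRIVATE_SUBNET_ID BASTION_CIDR
    cat <<EOF
[vpc isolated]
vpc_id = $1
master_subnet_id = $2
use_public_ips = false
ssh_from = $3
EOF
}

# Placeholder IDs in the same style the slides use for snapshot IDs.
write_vpc_section vpc-xxxxx subnet-xxxxx 10.0.0.0/24
```

Restricting `ssh_from` to the bastion's address range is a design choice for this example, not something the slides prescribe.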
![Page 12: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/12.jpg)
Isolated CfnCluster w/ VPN
[Diagram components: Internet Gateway (IGW); Public Subnet with VPC NAT gateway; Private Subnet with Master Server and Auto Scaling Compute Fleet; Corporate Data Center with Engineer and VPN Connection; CloudFormation, Amazon S3, DynamoDB, Amazon SQS, CloudWatch]
Private Subnet Route Table
• VPC Traffic -> Local
• Corp IP Range -> VPN
• 0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
• VPC Traffic -> Local
• Corp IP Range -> VPN
• 0.0.0.0/0 -> Internet Gateway
![Page 13: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/13.jpg)
Private CfnCluster w/ VPN & Proxy
[Diagram components: Private Subnet with Master Server and Auto Scaling Compute Fleet; Corporate Data Center with Proxy Server, VPN Connection, and Internet Connection; CloudFormation, Amazon S3, DynamoDB, Amazon SQS, CloudWatch]
Private Subnet Route Table
• VPC Traffic -> Local
• Corp IP Range -> VPN
• 0.0.0.0/0 -> VPN
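With all outbound traffic routed over the VPN, the cluster nodes would presumably need to be told about the corporate proxy. The sketch below shows how that might look using the `proxy_server` key from the CfnCluster configuration docs; the proxy address and the section label `private` are made-up placeholders, and the rest of the cluster and VPC settings are omitted.

```shell
#!/bin/bash
# Sketch only: writes a cluster section with a proxy_server line to a scratch
# file. The proxy address is a made-up placeholder.

proxy_url="http://10.0.0.10:3128"   # hypothetical corporate proxy

scratch_config="$(mktemp)"
cat > "$scratch_config" <<EOF
[cluster default]
vpc_settings = private
proxy_server = ${proxy_url}
EOF

# Show the line that was written.
grep '^proxy_server' "$scratch_config"
```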
![Page 14: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/14.jpg)
Creating an IAM User
• Create an IAM user with administrative privileges
  • Fine-grained access controls can be added later
• Generate an access key and secret key, and keep them safe
![Page 15: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/15.jpg)
Create an SSH Key
• Generate or import the key you’ll use for user login
![Page 16: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/16.jpg)
Installing the CfnCluster CLI
• On your desktop or a bastion server
$ sudo pip install cfncluster
![Page 17: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/17.jpg)
Creating the Base Configuration
• First, create the base config required to start a cluster.
$ cfncluster configure
![Page 18: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/18.jpg)
Edit the configuration file to meet your needs
• Reference the configuration docs
  • http://cfncluster.readthedocs.io/en/latest/configuration.html
$ vim ~/.cfncluster/config
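After editing, it can help to read a value back before launching anything. The sketch below is one possible way to do that with awk; `config_get` is a hypothetical helper written for this example, not part of the CfnCluster CLI, and it is demonstrated on a scratch file rather than the real `~/.cfncluster/config`.

```shell
#!/bin/bash
# Sketch only: read one key from an INI-style file so an edit can be
# double-checked before launching. config_get is a hypothetical helper.

config_get() {
    # usage: config_get FILE SECTION KEY
    awk -F '=' -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }
        in_section && NF >= 2 {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}

# Demo against a scratch file rather than the real config.
demo_config="$(mktemp)"
printf '%s\n' '[global]' 'cluster_template = default' \
              '[cluster default]' 'scheduler = sge' > "$demo_config"

config_get "$demo_config" "cluster default" scheduler
```

Pointing the first argument at `~/.cfncluster/config` would read the real file; a key that is missing from the section prints nothing.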
![Page 19: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/19.jpg)
Launch the Cluster
$ cfncluster create mycluster
• Cluster creation usually takes ~15 minutes
• Completely managed by CloudFormation
![Page 20: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/20.jpg)
Submit your first job
    [ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub
    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -pe mpi 2
    #$ -S /bin/bash
    #
    module load openmpi-x86_64
    mpirun -np 2 hostname
    [ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
    Your job 1 ("hw.qsub") has been submitted
    [ec2-user@ip-10-0-0-17 ~]$ qstat
    job-ID  prior    name     user      state  submit/start at      queue             slots  ja-task-ID
    ------------------------------------------------------------------------------------------------
         1  0.55500  hw.qsub  ec2-user  r      02/01/2015 05:57:25  [email protected]  2
    [ec2-user@ip-10-0-0-17 ~]$ ls -l
    total 8
    -rw-rw-r-- 1 ec2-user ec2-user 110 Feb  1 05:57 hw.qsub
    -rw-r--r-- 1 ec2-user ec2-user  26 Feb  1 05:57 hw.qsub.o1
    [ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
    ip-10-0-0-44
    ip-10-0-0-45
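To run the same kind of job with a different slot count, the script can be generated rather than edited by hand. The sketch below assumes the same SGE directives, `mpi` parallel environment, and `openmpi-x86_64` module shown in the session above; `make_mpi_job` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: print an SGE job script that runs a command under mpirun.
# make_mpi_job is a hypothetical helper; the directives mirror hw.qsub.

make_mpi_job() {
    # usage: make_mpi_job SLOTS COMMAND [ARGS...]
    local slots="$1"
    shift
    cat <<EOF
#!/bin/bash
#\$ -cwd
#\$ -j y
#\$ -pe mpi ${slots}
#\$ -S /bin/bash
module load openmpi-x86_64
mpirun -np ${slots} $*
EOF
}

# Print a four-slot variant of the hello-world job.
make_mpi_job 4 hostname
```

On the master node, the output could be redirected to a file such as `hw4.qsub` and submitted with `qsub hw4.qsub`, as in the session above.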
![Page 21: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/21.jpg)
EBS Snapshots for Software & Storage Management
• Install your applications and store any working data to /shared
• Create a snapshot of that volume
• Re-use that snapshot every time you launch your cluster
ebs_snapshot_id = snap-xxxxx
[Diagram components: Master Server; Root & Home Volume (/ & /home); NFS Shared Volume (/shared); Amazon EBS Snapshot (snap-xxxxx)]
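The `ebs_snapshot_id` line sits in an EBS section that the cluster section refers to. A minimal sketch of that wiring follows, assuming the section label `custom` (an arbitrary name chosen here) and a hypothetical helper `write_ebs_section` that does a loose sanity check on the ID before printing the lines.

```shell
#!/bin/bash
# Sketch only: prints the config lines that point /shared at a snapshot.
# The section name "custom" is an arbitrary label chosen for this example.

write_ebs_section() {
    # usage: write_ebs_section SNAPSHOT_ID
    case "$1" in
        snap-?*) ;;   # loosely looks like a snapshot ID
        *) echo "expected an ID like snap-xxxxx" >&2; return 1 ;;
    esac
    cat <<EOF
[cluster default]
ebs_settings = custom

[ebs custom]
ebs_snapshot_id = $1
EOF
}

write_ebs_section snap-xxxxx
```

The snapshot itself can be taken from the EC2 console or with the AWS CLI, for example `aws ec2 create-snapshot --volume-id <volume-id>`, where the volume ID is that of the volume mounted at /shared.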
![Page 22: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/22.jpg)
Upgrading Hardware is Easy!
• Simple upgrade from Ivy Bridge to Haswell
1. Let all compute nodes stop
2. Edit ~/.cfncluster/config and change
compute_instance_type = c3.8xlarge
to
compute_instance_type = c4.8xlarge
3. Update the cluster
$ cfncluster update mycluster
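Step 2 can also be scripted. The sketch below uses `sed` to rewrite the `compute_instance_type` line and keeps a backup of the previous file; `set_instance_type` is a hypothetical helper, shown here against a scratch copy rather than the real config.

```shell
#!/bin/bash
# Sketch only: swap the compute instance type in a config file.
# set_instance_type is a hypothetical helper, demonstrated on a scratch copy.

set_instance_type() {
    # usage: set_instance_type FILE NEW_TYPE
    # Edits FILE in place and leaves the previous version in FILE.bak.
    sed -i.bak "s/^compute_instance_type *=.*/compute_instance_type = $2/" "$1"
}

scratch_config="$(mktemp)"
printf '%s\n' '[cluster default]' 'compute_instance_type = c3.8xlarge' > "$scratch_config"

set_instance_type "$scratch_config" c4.8xlarge
grep '^compute_instance_type' "$scratch_config"
```

Run against `~/.cfncluster/config`, this would be followed by the `cfncluster update mycluster` command from step 3.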
![Page 23: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/23.jpg)
Demo: Launching a Cluster
![Page 24: Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58ee05811a28aba5258b45c7/html5/thumbnails/24.jpg)
Thank you!