The High Performance Computing Roadmap · June 4, 2020
TRANSCRIPT
1
The High Performance Computing Roadmap
FUT-1438
Jay Kruemcke
Senior Product Manager – SUSE High Performance Computing
@mr_sles
2
Agenda
1. Why HPC?
2. Customer challenges
3. What SUSE brings to HPC
4. Where are we going?
3
Why HPC?
Worldwide HPC revenue expected to reach over $19.95 billion by 2023 ¹
Big data combined with HPC creating new solutions, adding many new users/buyers to the HPC space (AI/ML/DL and HPDA are hot new areas)
SUSE runs on 21 of the top 50 supercomputers (7 RH, 9 CentOS) ²
SUSE dominates top 100, CentOS gains share in “smaller” supercomputers ²
Commercial OS Share in Top 500 (represents 100 supercomputers in the list): SUSE 53%, RH 24%, bullx 17%, Ubuntu 6% ²
¹ Hyperion Research, November 2019
² Top500 Supercomputer Report, November 2019
4
Cloud Computing For HPC Will Grow Faster
Source: Hyperion Research, November 2019
• Total HPC spending is projected to reach $44B in 2022
• Over 70% of HPC sites run some jobs in public clouds
• Over 10% of all HPC jobs are now running in clouds (primarily hybrid)
• Public clouds are cost-effective for some jobs but up to 10x more expensive for others, depending on where data resides
• Private and hybrid cloud use is growing faster
5
Customer Pain Points And Challenges
Time to Solution
“I need to maximize application performance, scale workloads, and minimize overhead.”
• Parallel software is lacking with many applications needing a major re-design
• Segmented into commercial and scientific, and there is not enough collaboration
Maintenance
“My IT staff doesn’t have time to update and test all the different software components.”
• Better management software needed; update deployment approach to leverage HPC and cloud infrastructure
• Stack components provided by multiple vendors, making it more challenging to maintain
Complexity
“Composing a working HPC environment is difficult, time-consuming, and requires experts.”
• Clusters are hard to use and manage as they become more complex in heterogeneous environments
• Storage access time and data management are becoming new bottlenecks
6
SUSE Linux Enterprise High Performance Computing
HPC bundle with supported HPC packages – beyond an OS
Supports AArch64 (Arm) and x86-64
Many IHV/ISV/CSP partnerships
Multiple service life options
Competitive cluster node pricing model
7
SuperMUC Petascale system runs SUSE on Lenovo ThinkSystem
Geophysicists use earthquake simulation software to investigate seismic waves beneath Earth’s surface
Calculations involved in this kind of simulation are so complex that they push even supercomputers to their limits
8
Selected SUSE HPC Projects
SUSE Linux for HPC & the HPC module
SUSE Enterprise Storage
SUSE Package Hub
HPC Containers
Arm: the emerging platform
HPC in the Cloud
Accelerator enablement
9
Why SUSE Linux For HPC?
Enterprise Linux with Enterprise support
• Security incidents require quick response to address system vulnerabilities
More than just an OS - HPC software included and supported
• SLE HPC includes popular HPC software such as Slurm and Open MPI
• Deployment templates for Head Nodes, Compute Nodes, Dev Nodes
Aggressively priced subscriptions
• SUSE Linux for HPC priced for large and small HPC configurations
Proven track record in HPC
• 50% of the Top 100 HPC systems are running SUSE Linux or SLE-based OS
10
SUSE Linux HPC Module
Simplify access to supported HPC packages
All packages supported by SUSE via SUSE Linux Enterprise HPC
Available for x86 and Arm-based platforms
SLE HPC 12 and SLE HPC 15
MUNGE
ScaLAPACK
genders
11
Installing The HPC Module
6/3/2020
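The screenshot for this slide did not survive in the transcript. As a rough sketch only, the Python snippet below assembles the kind of commands typically used to enable the module and install a few packages on SLE 15 SP1. The product identifier `sle-module-hpc/15.1/x86_64`, the package selection, and the helper name `hpc_module_commands` are assumptions made for illustration and are not taken from the slide; `SUSEConnect --list-extensions` reports the exact identifier for a given system.

```python
import shlex


def hpc_module_commands(version="15.1", arch="x86_64",
                        packages=("slurm", "munge", "genders")):
    """Build (but do not run) the commands for enabling the HPC module.

    The product identifier format and the default package names are
    assumptions; confirm them with `SUSEConnect --list-extensions`.
    """
    product = f"sle-module-hpc/{version}/{arch}"  # assumed identifier
    return [
        ["SUSEConnect", "--list-extensions"],  # show available modules
        ["SUSEConnect", "-p", product],        # register the HPC module
        ["zypper", "install", *packages],      # install selected packages
    ]


# Print the commands for review instead of executing them.
for cmd in hpc_module_commands():
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run on any machine; on a registered SLE system the printed lines would be run as root.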
12
SUSE HPC Reference Architecture
13
Cloud-Ready HPC
Cloud is being optimized for HPC workloads through performance, scalability and cost efficiency, enabling you to extend your HPC environment to the cloud on-demand.
Dynamically burst to the cloud to complement your on-premises capabilities, or even fully migrate entire HPC environments and workflows.
14
HPC In The Cloud
HPC “all-in” the cloud
• Includes the head, compute and storage nodes, with no hardware infrastructure to maintain
• Optimized cost and performance for scale-out applications
HPC bursting to hybrid/public clouds
• Address changing capacity needs
• Extend HPC jobs to the Cloud for on-demand scale and flexibility
[Diagrams: Local Network and Cloud]
15
Catalyst UK program: HPE, Arm, SUSE, and three leading UK universities establish one of the largest Arm-based supercomputer deployments in the world
Goal: Propel the Arm HPC ecosystem and exascale computing in the UK
• More than 12,000 Arm-based cores running across three universities
• 64 Apollo 70 systems per site
• Two 32-core Cavium ThunderX2 processors per system
• Running SUSE Linux Enterprise for High Performance Computing
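The "more than 12,000 cores" figure follows from the per-site numbers on this slide. A minimal arithmetic check, with constant and function names chosen here purely for illustration:

```python
# Figures taken from the slide.
SITES = 3                   # three universities
SYSTEMS_PER_SITE = 64       # Apollo 70 systems per site
PROCESSORS_PER_SYSTEM = 2   # ThunderX2 processors per system
CORES_PER_PROCESSOR = 32    # cores per ThunderX2 processor


def total_cores(sites, systems_per_site, processors_per_system,
                cores_per_processor):
    """Multiply out the deployment hierarchy to get a core count."""
    return (sites * systems_per_site
            * processors_per_system * cores_per_processor)


cores_per_site = total_cores(1, SYSTEMS_PER_SITE,
                             PROCESSORS_PER_SYSTEM, CORES_PER_PROCESSOR)
cores_total = total_cores(SITES, SYSTEMS_PER_SITE,
                          PROCESSORS_PER_SYSTEM, CORES_PER_PROCESSOR)

print(cores_per_site)  # 4096
print(cores_total)     # 12288
```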
16
The Spectrum Of AI Solutions
Artificial Intelligence: Examples are Google Maps and game play.
Machine Learning: Examples are cyber security, autonomous vehicles and F1 racing.
Neural Networks: Examples are facial and voice recognition.
Deep Learning: Examples are disease identification and energy demand optimization.
Convolutional Neural Networks: Examples are image/video recognition and medical image analysis.
Transfer Learning: For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.
17
SUSE Participation In OpenACC
OpenACC is a directive-based programming model designed to provide performance and portability for CPUs, GPUs, and other accelerators
SUSE joined OpenACC to simplify access to accelerator technology for SUSE HPC customers
18
SUSE Package Hub
• High-quality, up-to-date packages delivered by openSUSE Factory
• Easy to install via zypper or YaST
• Built and maintained by the community of users
• Approved and curated by SUSE
• No charge
About 1000 packages available for x86-64
More than 500 packages available for Arm
Enterprise User
SUSE Package Hub
Upstream packages
Package        Category
TensorFlow     ML Framework
Caffe2         Framework
Theano         Deep learning library
Numpy*         Math library
Pytorch*       ML library
ArmNN          ML Framework
clustershell   Administrative
robinhood      Administrative
singularity    Runtime
*planned
19
Key HPC Partnerships
20
SUSE Enterprise Storage
Ceph-based, software-defined storage
Backup/archival HPC storage
IO500 benchmark-ranked
Easy to manage with openATTIC
Certified with HPE DMF
21
“Thanks to the stability and ease of management of the SUSE solution, we have significantly reduced the time we spend managing live and archived data. This keeps our internal team free to focus on driving new value for the university and its life-changing research projects.”
“SUSE Enterprise Storage has already brought clear improvements to our deep learning projects, one of which requires two million files in a single directory. Putting these files into SUSE Enterprise Storage has increased performance more than ten times compared with the previous storage solution.”
Steve Cousins
Supercomputer Engineer
University of Maine System
22
SUSE Enterprise Storage Solution For HPC - Ceph
Tier 2 Storage Use Case
Low Latency Storage (Lustre, XFS, NFS etc)
HPC Compute Cluster
SUSE Enterprise Storage
• Use Cases:
• Primary Storage (Certain Use Cases)
• Nearline or Archival Storage
• Home Directories
• Certified with HPE Data Management Framework (DMF) and iRODS*
23
HPC Storage Use Case:
Large European Energy Company
Active Tier: Hot Data
Dormant Tier: Cold Data
HPC/AI Compute Cluster
High-Performance Storage
Scale-out NAS
Parallel File Systems
All-Flash File System
HPE Data Management Framework: Tiered data management
Tape
DMF zero watt storage
Object Storage & Cloud
Tier 0 Storage needs
- Clustered file system
- Lustre
- 10 PiB, 240 Gb/sec
Tier 1 Storage needs (SUSE / Ceph)
- Object Storage, resilient
- Widely used, affordable
- Automatic access
- 5 PiB
24
SLES HPC Lifecycle Roadmap*
Timeline shown: 2017 through 2025
• SLES 12 HPC SP3: FCS Sept 2017, “Normal” SP overlap, ESPOS, LTSS
• SLES 12 HPC SP4: FCS 4Q 2018, “Normal” SP overlap, ESPOS, LTSS
• SLES 12 HPC SP5: ESPOS, LTSS
• SLE HPC 15: FCS Q2 2018, “Normal” SP overlap, ESPOS, LTSS
• SLE HPC 15 SP1: FCS Q2 2019, “Normal” SP overlap, ESPOS, LTSS
• SLE HPC 15 SP2: ESPOS, LTSS
*NOTE: All future dates are estimates for illustration purposes and are not intended as committed dates.
25
Strategic Directions
Enable and exploit new HPC hardware
Shift HPC Module focus to utilities
Blend in AI/ML support
Simplify HPC in the Cloud experience
Improve Day 1 and Day 2 experience