fabric software - ornlcomputing.ornl.gov/workshops/smc16/docs/session3/2016-09-01...fabric software...
Dr. Robert W. Wisniewski, Chief Software Architect Extreme Scale Computing
Smoky Mountain Conference 2016
Intel Confidential - For Disclosure Under NDA Only
Compute Memory/Storage
Fabric Software
Intel Silicon Photonics
Legal Disclaimer
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Intel, the Intel logo, Xeon, Xeon Phi, Core, and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at {most relevant URL to your product}.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Copyright © 2016, Intel Corporation. All rights reserved.
Agenda
► Big Data
– Today
– Today and tomorrow
• Known capabilities on the horizon
• Longer term potential
Aurora | Science From Day One!
Extreme performance for a broad range of compute and data-centric workloads
Focus Areas
• Transportation: Combustion
– Increase internal combustion engine efficiency by potentially 25%-50% with lower emissions through improved high-pressure designs with improved ignition
• Biological Science: Biofuels / Disease Control
– Enhanced extraction of biofuels from biomass by modeling the bottlenecks in bioconversion to enable rational design of superior catalysts
– Exploration of evolution path of protein structures and extracting function from protein sequence and genomic context
• Renewable Energy: Wind Turbine Design / Placement
– Enhanced blade and bearing endurance and optimized turbine placement within a field
– Optimize variable renewable energy injection into power grid over days and geographic regions
• Materials Science: Batteries / Solar Cells
– Materials design enabling the discovery of specific materials with higher energy and power densities, better stability and safety, and longer lifetimes
– Creating optimized materials to improve photovoltaic efficiency and lowering manufacturing costs
• Accelerators: Design / Data Analysis
– Global optimization of accelerator design with integrated simulators for guiding, focusing and accelerating fields and assessment of stability
– Tools that combine theory, modeling and analysis to interpret vast collections of experimental data from neutron, electron and x-ray accelerators to discover new material and molecule properties
• Climate Science: Dynamic Climate Systems
– Dynamic ecological and chemical evolution of the climate system through models that utilize observations, simulation and reanalysis data from multiple sources; improved hydrologic and carbon cycle processes
Training: Argonne Training Program on Extreme-Scale Computing
Public Access: US Industry and International
Personalized Medicine
Deep learning
The Four Vs of Big Data
• Volume
– 50 ZB by 2020
– 7B people each with potentially multiple devices
– 7B x N appliances each feeding information
• Velocity
– Smart cars each streaming 100s of sensors
– Data feeds, news, financial, etc.
• Variety
– Structured, static
– Video, picture
– Audio
• Veracity
– Uncertainty
– Inaccurate
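A quick back-of-the-envelope calculation gives a feel for the Volume figures above. The sketch assumes decimal units (1 ZB = 10^21 bytes) and treats the slide's "50 ZB" and "7B people" as round numbers; the even per-person split is only an illustration and is not a figure from the talk.

```python
# Back-of-the-envelope scale check for the "Volume" bullet.
# Assumptions (not from the talk): decimal SI units, data spread evenly per person.
ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes

total_bytes = 50 * ZETTABYTE  # "50 ZB by 2020"
people = 7 * 10**9            # "7B people"

per_person_tb = total_bytes / people / TERABYTE
print(f"~{per_person_tb:.1f} TB per person")
```

Under these assumptions the total works out to several terabytes per person, which is why the later slides focus on keeping data close to compute.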
Big Data Gone Wrong
Big Data (and AI) Gone Wrong
Big data frameworks: Hadoop, Spark, Cassandra, etc.
SQL stores
NoSQL stores
In-memory stores
Connectors
Data mining
Recommendation engines
Customer behavior modeling
BI analytics
Real time analytics
Big data analytics
Current common practice
• Limited performance
• Many layers of dependencies
• Low ROI on HW investment
• Run on state-of-art hardware
• Built with a patchwork of math libs
• Under-exploiting hardware performance features
Data sources
Finance
Social media
Marketing
IoT
Mfg
…
Problem Statement
Agenda
• Big Data
– Today
– Today and tomorrow
► Known capabilities on the horizon
• Longer term potential
Compute
Figure labels: Processor; Compute Node; I/O Node or Adjacent; Remote Storage
Addressing the memory and I/O walls
Keeping data closer to compute → better data-intensive app performance and energy efficiency
Parallel File System (Hard Drive Storage)
SSD Storage
Local Memory
Enough Capacity to Support Local Application Storage
Local Processing Node Temporal Storage
Faster Checkpointing
Quicker Recovery
Better App Performance
Today Tomorrow
Higher Bandwidth. Lower Latency and Capacity
Caches
Parallel File System (Hard Drive Storage)
In-Package High Bandwidth
Memory
Non-Volatile Memory
Burst Buffer Storage
Caches
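The "Burst Buffer Storage" tier and the "Faster Checkpointing" claim above can be illustrated with a small sketch. It is not Intel's implementation: the class name `BurstBufferCheckpointer` is hypothetical, and two temporary directories stand in for the fast tier (burst buffer or non-volatile memory) and the slow tier (parallel file system). The application blocks only for the fast write; a background thread drains the data to the slow tier.

```python
import os
import queue
import shutil
import tempfile
import threading


class BurstBufferCheckpointer:
    """Write checkpoints to a fast tier, then drain them to a slow tier in the background."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = fast_dir
        self.slow_dir = slow_dir
        self._pending = queue.Queue()
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def checkpoint(self, step, payload):
        """Blocking part: only the write to the fast tier."""
        path = os.path.join(self.fast_dir, f"ckpt_{step:06d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # make sure the fast tier really holds the data
        self._pending.put(path)   # hand off to the background drainer
        return path

    def _drain(self):
        while True:
            path = self._pending.get()
            if path is None:      # sentinel: stop draining
                self._pending.task_done()
                break
            shutil.copy2(path, self.slow_dir)
            self._pending.task_done()

    def close(self):
        """Wait until every checkpoint has reached the slow tier."""
        self._pending.put(None)
        self._pending.join()
        self._drainer.join()


# Usage: temporary directories stand in for the two storage tiers.
with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
    ckpt = BurstBufferCheckpointer(fast, slow)
    for step in range(3):
        state = bytes([step]) * 1024  # stand-in for application state
        ckpt.checkpoint(step, state)  # returns once the fast tier has it
    ckpt.close()
    drained = sorted(os.listdir(slow))
    print(drained)
```

The design choice to note is the decoupling: compute resumes as soon as the fast tier has the checkpoint, while the slower copy proceeds off the critical path.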
SSF: Enabling Configurability & Scalability from components to racks to clusters to supercomputers
“Rack”
“Cluster”
• Xeon or Xeon-Phi – based on workload needs
• Compute flexibly aggregated
• Lowest latency compute to compute interconnect
• I/O Topologies for best performance
• Configurable I/O bandwidth director switch
• Burst buffer to decouple storage from I/O
“Chassis”: Simple, dense, & configurable
“Memory”: Enough capacity to support apps
“I/O”: Adaptive & configurable
“Compute”: “Right sized” & configurable
Knights Landing: Next-Gen Intel® Xeon Phi™ processor
>3 TF peak DP performance
3X faster ST performance vs. KNC
5X faster MCDRAM vs. DDR4 DIMMs
CPU
MCDRAM
DDR
NAND SSD
Hard Disk Drives
Intel Directions (From IDF 2016)
• Commitment to open source with optimized machine learning frameworks (Caffe, Theano) and libraries (Intel® Math Kernel Library – Deep Learning Neural Network, Intel Deep Learning SDK).
• Disclosure of the next-generation Intel® Xeon Phi™ processor, codename Knights Mill, with enhanced variable precision and flexible, high-capacity memory.
• Today we completed the acquisition of Nervana Systems, bringing together the Intel engineers who create the Intel® Xeon® and Intel Xeon Phi processors with Nervana’s machine learning experts to advance the AI industry faster than would have otherwise been possible.
3D XPoint™ Technology
1000X faster than NAND
1000X endurance of NAND
10X denser than DRAM
CPU
DDR
INTEL® DIMMS
Intel® Optane™ SSD
NAND SSD
Hard Disk Drives
Big Data & Machine Learning Challenge
Problem:
• Big data needs high performance computing.
• Many big data applications leave performance on the table → not optimized for underlying hardware.
Solution:
• A performance library provides building blocks to be easily integrated into big data analytics workflows.
Volume
Velocity Variety
Value
Intel® Data Analytics Acceleration Library (Intel® DAAL)
• Python, Java & C++ APIs
• Can be used with many platforms (Hadoop*, Spark*, R*, …) but not tied to any of them
• Flexible interface to connect to different data sources (CSV, SQL, HDFS, …)
• Windows*, Linux*, and OS X*
• Developed by same team as the industry-leading Intel® Math Kernel Library
• Open source; free community-supported and commercial premium-supported options
• Also included in Parallel Studio XE suites
An Intel-optimized library that provides building blocks for all data analytics stages, from data preparation to data mining & machine learning
Intel DAAL Overview
Industry leading performance, C++/Java/Python library for machine learning and deep learning optimized for Intel® Architectures.
(De-)Compression; PCA; Statistical moments; Variance matrix; QR, SVD, Cholesky; Apriori
Linear regression; Naïve Bayes; SVM; Classifier boosting
K-means; EM GMM
Collaborative filtering
Neural Networks
Pre-processing → Transformation → Analysis → Modeling → Validation → Decision Making
Scientific/Engineering
Web/Social
Business
Intel® Math Kernel Library
• Speeds math processing for machine learning, scientific, engineering, financial, and design applications
• Includes functions for dense and sparse linear algebra (BLAS, LAPACK, PARDISO), FFTs, vector math, summary statistics and more
• De facto standard APIs for easy switching from other math libraries
• Highly optimized, threaded and vectorized to maximize processor performance
Energy
Financial Analytics
Engineering Design
Digital Content Creation
Science & Research
Signal Processing
Components of Intel MKL 2017
Linear Algebra
• BLAS
• LAPACK
• ScaLAPACK
• Sparse BLAS
• Sparse Solvers
• Iterative
• PARDISO*
• Cluster Sparse Solver
Fast Fourier Transforms
• Multidimensional
• FFTW interfaces
• Cluster FFT
Vector Math
• Trigonometric
• Hyperbolic
• Exponential
• Log
• Power
• Root
• Vector RNGs
Summary Statistics
• Kurtosis
• Variation coefficient
• Order statistics
• Min/max
• Variance-covariance
And More…
• Splines
• Interpolation
• Trust Region
• Fast Poisson Solver
Deep Neural Networks (New)
• Convolution
• Pooling
• Normalization
• ReLU
• Softmax
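To make the Summary Statistics entries above concrete, here is a minimal plain-Python sketch of what those quantities mean. It does not use Intel MKL and does not reflect its API; the function name `summary_statistics` is hypothetical, and the sketch assumes population (biased) moments and non-excess kurtosis, which may differ from a given library's defaults.

```python
import math
import statistics


def summary_statistics(sample):
    """Return a few of the summary statistics named on the slide for a 1-D sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Population (biased) central moments; a library may default to unbiased estimators.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    std = math.sqrt(m2)
    ordered = sorted(sample)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),  # one example of an order statistic
        "variance": m2,
        "variation_coefficient": std / mean,   # standard deviation relative to the mean
        "kurtosis": m4 / m2 ** 2,              # non-excess kurtosis (a normal distribution gives 3)
    }


stats = summary_statistics([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```

An optimized library computes the same quantities over large, possibly multi-dimensional datasets with threading and vectorization; the definitions stay the same.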
© SAFFRON TECHNOLOGY 2016
NATURAL INTELLIGENCE
Associative Memory
Scale
COMPUTING POWER
Experience-based Reasoning
RAISED WITHOUT RULES (OR MODELS)
A Cognitive Platform that Learns and Adapts Automatically
BORN OF NEUROSCIENCE AND DATA SCIENCE
Accuracy
FOUNDATIONAL BELIEFS OUTCOMES
CONTEXTUAL MATRIX CORRELATION AND DE-CORRELATION
JIM MEMORY
A massive hypergraph of connections and coincidences
Semantic and statistical de-correlation depending on question
Community name: OpenHPC
Web Address: www.openhpc.community
Goals for the HPC software community
• Provide a common platform to the HPC community that works across multiple segments and on which end-users can collaborate and innovate.
• Simplify the installation, configuration, and ongoing maintenance of an HPC software stack
• Receive contributions and feedback from the community
• Enable developers to focus on their differentiation and unique area, rather than having to spend effort on developing, testing, and maintaining a whole stack
• Deliver integrated hardware and software innovations to ease the path to extreme scale
Participation in OpenHPC as of 8/14/2016
OpenHPC is a Linux Foundation project, initiated by Intel, that gained wide participation right away
The goal is to collaboratively advance the state of the software ecosystem
Governing board is composed of Platinum members (Intel, Dell, HPE, SUSE) plus representatives from the Silver, Academic, and Technical committees
Members
Performance Peak Framework: OSS Project and Product
PROJECT
Upstream source communities: GNU, Linux, Parallel File system, Resource Manager
University Community, OEM Community
Integrates and tests HPC stacks and makes them available as OS
Contributors include Intel, OEMs, ISVs, labs, academia
Continuous Integration Environment
– Build Environment & Source Control
– Bug Tracking
– User & Dev Forums
– Collaboration tools
– Validation Environment
Stacks: Base HPC Stack, OEM Stack, University Stack
RRVs, cadence 6~12 mo
“RRV” = Relevant and Reliable Version
PRODUCT
Intel HPC Orchestrator
Supported HPC Stack
– Premium Features
– Advanced Integration Testing
– Testing at scale
– Validated updates
– Level 3 Support across stack
Agenda
• Big Data
– Today
– Today and tomorrow
• Known capabilities on the horizon
► Longer term potential
*Source: BDEC report (Reed et al.) - to be released
Convergence
HW vs SW
Convergence
?
Data Management for Big Data (Long-Term View)
• Smooth and automatic representation between
– Application data structure in memory
– Representation and access to NVRAM
– Storage to disk
• Moving compute to data
• Application makes system call
– make_permanent(*data), make_durable(*data)
Figure labels: RAM, nvram
main()
A[100][100][100];
graph_node {
    int value;
    edge e1;
}
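The `make_durable(*data)` call above is a proposal, not an existing system call, so the following is only a sketch of the idea under stated assumptions. A memory-mapped file stands in for NVRAM, the one-field layout of `graph_node` is hypothetical, and `make_durable` is emulated with an msync-style flush of the mapping. Real persistent-memory durability may need different mechanisms.

```python
import mmap
import os
import struct
import tempfile

NODE = struct.Struct("<i")  # hypothetical layout: graph_node with a single int value


def make_durable(region):
    """Hypothetical stand-in for the slide's make_durable(*data).

    Here it only flushes the mapped region to its backing file (msync-style).
    """
    region.flush()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nvram.img")  # file standing in for NVRAM
    with open(path, "wb") as f:
        f.truncate(mmap.PAGESIZE)          # reserve one zero-filled page

    # Map the "NVRAM" into the address space and update the structure in place.
    with open(path, "r+b") as f:
        region = mmap.mmap(f.fileno(), mmap.PAGESIZE)
        NODE.pack_into(region, 0, 42)      # graph_node.value = 42, written like ordinary memory
        make_durable(region)               # explicit durability point
        region.close()

    # A later run (or a restart) reads the same bytes back from the same layout.
    with open(path, "rb") as f:
        (value,) = NODE.unpack(f.read(NODE.size))

print(value)
```

The point of the sketch is the programming model: the application updates its in-memory data structure directly and names a durability point, rather than serializing the structure into a separate file format.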