TRANSCRIPT
Enabling the Future of Artificial Intelligence

Contents
• AI Overview
• Intel Nervana AI products
  ▪ Hardware
  ▪ Software
  ▪ Intel Nervana Deep Learning Platform
• Learn more - Intel Nervana AI Academy
Artificial Intelligence, Machine Learning & Deep Learning
Why Now?

Bigger Data
• Image: 1,000 KB per picture
• Audio: 5,000 KB per song
• Video: 5,000,000 KB per movie

Better Hardware
• Transistor density doubles every 18 months
• Cost per GB in 1995: $1000.00; cost per GB in 2017: $0.02

Smarter Algorithms
• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models
Sharing
• Companies share algorithms and topologies
• Their gold is:
  ▪ Data
  ▪ Trained models
  ▪ Talent
Machine Learning Types

Supervised
• Teach desired behavior with labeled data, then infer on new data
• Labeled data in, classified data out (contrasted with the unsupervised case in the sketch after this list)

Unsupervised
• Make inferences with unlabeled data and discover patterns
• Unlabeled data in, clustered data out

Semi-supervised
• A combination of supervised and unsupervised learning
• Labeled and unlabeled data in, classified data out

Reinforcement
• Act in an environment to maximize reward; build autonomous agents that learn
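To make the supervised/unsupervised distinction concrete, here is a minimal illustrative sketch in Python using scikit-learn. It is not part of the original deck; the synthetic data and model choices are assumptions.

    # Supervised vs. unsupervised learning on synthetic 2-D data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))              # 200 samples, 2 features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # labels for the supervised case

    # Supervised: teach the desired behavior with labeled data, then infer.
    clf = LogisticRegression().fit(X, y)
    print(clf.predict(X[:5]))                  # inferred labels for new data

    # Unsupervised: no labels; discover structure (clusters) in the data.
    km = KMeans(n_clusters=2, n_init=10).fit(X)
    print(km.labels_[:5])                      # discovered cluster assignments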
Training

Forward propagation: input data flows through the network and produces an output, e.g. probabilities [0.10, 0.15, 0.20, ..., 0.05] over the classes person, cat, dog, bike.
The output is compared against the expected output, e.g. the one-hot label [0, 1, 0, ..., 0] for "cat", yielding a penalty (error or cost).
Back propagation: the penalty is propagated backwards through the network to update its weights.
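The forward/penalty/backward loop above can be sketched in a few lines of numpy. This toy single-layer softmax classifier over the four example classes is illustrative only; the shapes, values, and learning rate are assumptions, not Intel's code.

    # One training step: forward propagation, penalty, back propagation.
    import numpy as np

    classes = ["person", "cat", "dog", "bike"]
    x = np.random.randn(16)                  # one input sample (16 features, assumed)
    W = 0.01 * np.random.randn(4, 16)        # weights for the 4 classes
    target = np.array([0., 1., 0., 0.])      # expected output: "cat"

    # Forward propagation: raw scores -> softmax probabilities
    scores = W @ x
    p = np.exp(scores - scores.max())
    p /= p.sum()                             # e.g. [0.10, 0.15, 0.20, ...]

    # Penalty (error or cost): cross-entropy against the expected output
    loss = -np.sum(target * np.log(p))

    # Back propagation: gradient of the loss w.r.t. W, then a descent step
    grad_scores = p - target                 # dLoss/dScores for softmax + CE
    grad_W = np.outer(grad_scores, x)
    W -= 0.1 * grad_W                        # gradient-descent update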
Inference

Forward propagation only: new data flows through the trained network and produces output probabilities, e.g. [0.02, 0.85, 0.07, ..., 0.01] over person, cat, dog, bike; here the model predicts "cat".
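Inference reuses only the forward pass. Continuing the toy training sketch above (reusing its classes list), the prediction is simply the arg-max of the output probabilities:

    # Inference: forward pass, then pick the most probable class.
    probs = np.array([0.02, 0.85, 0.07, 0.01])   # example output from the slide
    print(classes[int(np.argmax(probs))])        # -> "cat"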
Deep Learning Use Cases
• Healthcare: tumor detection
• Transport: automated driving
• Finance: time-series search
• Agriculture: robotics
• Energy: oil & gas search
• Proteomics: sequence analysis
• Consumer: speech/text search
• Consumer: smart speakers
Intel Nervana Deep Learning Portfolio

Enabling (Nervana Graph; MKL-DNN and other math libraries; HW transformers and non-x86 libraries)
• Accelerate framework optimization on IA; open source
• For framework developers and Intel
• Multi-node optimizations
• Extend to non-datacenter inference products and use cases

Product software (frameworks)
• Frameworks for developers
• Back-end APIs to Nervana Graph

Deep learning platform (Nervana Deep Learning Studio; Nervana Cloud, Intel branded; Titanium: HW management)
• Data scientist and developer DL productivity tools
• DL cloud service for POCs, developers, and academics
• DL appliance for DL-as-a-Service

Systems (Deep Learning Systems: node and rack reference designs; channel sales)
• Enable direct and end customers with the Deep Learning System portfolio
• Intel-branded offering under investigation

Products (datacenter; edge, client, gateway)
• Comprehensive product portfolio
• General-purpose x86
• Dedicated DL NPU accelerators

Research and application support (Intel Brain Data Scientist Team; BDM & direct optimization team)
• Research new AI usages and models
• Develop POCs with customers to apply AI methods
• Enable customers to deploy products
The AI Datacenter: all purpose, highly parallel, flexible acceleration, deep learning

Intel® Xeon® Scalable Processors: the most agile AI platform
• Scalable performance for the widest variety of AI and other datacenter workloads, including breakthrough deep learning training and inference

Intel® Xeon Phi™ Processor (Knights Mill): faster DL training
• Scalable performance optimized for even faster deep learning training and select highly-parallel datacenter workloads*

Intel® FPGA: enhanced DL inference
• Scalable acceleration for real-time deep learning inference with higher efficiency, across a wide range of workloads and configurations

Crest Family: deep learning by design
• Scalable acceleration with the best performance for intensive deep learning training and inference, period

*Knights Mill (KNM); "select" = single-precision, highly-parallel workloads that generally scale to more than 100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth, e.g. energy (reverse time migration), deep learning training, etc. All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Performance Drivers for AI Workloads
• Compute
• Bandwidth
• SW optimizations
Most Agile AI Platform: Intel® Xeon® Scalable Processors

• Built-in ROI: begin your AI journey today using existing, familiar infrastructure
• Potent performance: up to 2.2x deep learning training and inference performance vs. the prior generation(1); up to 113x with software optimizations(2)
• Production-ready: robust support for the full range of AI deployments

(1,2) Configuration details on slides 18, 20, 24. Source: Intel measured as of November 2016.
Workload breadth: classic ML, deep learning, reasoning, emerging AI, analytics, and more.
Scalable performance for the widest variety of AI and other datacenter workloads, including deep learning.
AI Performance: Generation over Generation

Advance previous-generation AI workload performance with Intel® Xeon® Scalable Processors.

Inference throughput: up to 2.4x higher Neon ResNet-18 inference throughput on the Intel® Xeon® Platinum 8180 processor compared to the Intel® Xeon® processor E5-2699 v4.

Training throughput: up to 2.2x higher Neon ResNet-18 training throughput on the Intel® Xeon® Platinum 8180 processor compared to the Intel® Xeon® processor E5-2699 v4.

Inference throughput batch size: 1; training throughput batch size: 256. Configuration details on slides 18, 20. Source: Intel measured as of June 2017.
Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
Up to 3.4x Integer Matrix Multiply Performance on Intel® Xeon® Platinum 8180 Processor
Configuration details on slide 24. Source: Intel measured as of June 2017.
[Chart] Matrix multiply performance on a 1S Intel® Xeon® Platinum 8180 processor relative to a 1S Intel® Xeon® processor E5-2699 v4 (baseline 1.0; GEMM performance measured in GFLOPS; higher is better):
• SGEMM, single-precision floating-point general matrix multiply (FP32): 2.3x
• IGEMM, integer general matrix multiply (INT8): 3.4x

Enhanced matrix multiply performance on Intel® Xeon® Scalable Processors. 8-bit IGEMM will be available in Intel® Math Kernel Library (Intel® MKL) 2018 Gold, to be released by end of Q3 2017.
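One way to see why INT8 GEMM can outpace FP32 SGEMM is to emulate the arithmetic: operands are quantized to 8 bits and products accumulate in int32, so each vector instruction processes four times as many elements. A hedged numpy sketch; the per-matrix scaling scheme is an assumption, and MKL's actual cblas IGEMM API is not shown.

    # Emulate INT8 GEMM with int32 accumulation and compare to FP32.
    import numpy as np

    A = np.random.randn(64, 64).astype(np.float32)
    B = np.random.randn(64, 64).astype(np.float32)

    # Quantize each matrix to int8 with a per-matrix scale (an assumption)
    sa, sb = 127 / np.abs(A).max(), 127 / np.abs(B).max()
    A8 = np.round(A * sa).astype(np.int8)
    B8 = np.round(B * sb).astype(np.int8)

    # Integer GEMM with int32 accumulation, then rescale back to float
    C_int = A8.astype(np.int32) @ B8.astype(np.int32)
    C_approx = C_int / (sa * sb)

    print(np.abs(C_approx - A @ B).max())   # small quantization error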
Up to 2.4x Higher Inference Throughput on Intel® Xeon® Platinum 8180 Processor

The Intel® Xeon® Platinum processor delivers higher inference throughput across different frameworks.
[Chart] Inference throughput shown in images/second, 2S Intel® Xeon® Processor E5-2699 v4 (22C, 2.2GHz) vs. 2S Intel® Xeon® Platinum 8180 Processor (28C, 2.5GHz):

Framework    Topology                Batch size    E5-2699 v4    Platinum 8180
Caffe        AlexNet                 1024          1146          2656
Caffe        GoogLeNet v1            1024          405           814
Caffe        ResNet-50               1024          118           226
Caffe        VGG-19                  256           62            136
TensorFlow   AlexNet ConvNet         1024          2135          3382
TensorFlow   GoogLeNet ConvNet       1024          427           658
TensorFlow   VGG ConvNet             256           140           248
MXNet        AlexNet                 1024          1093          2439
MXNet        VGG-19                  256           155           333
MXNet        Inception V3            1024          164           250
MXNet        ResNet-50               256           79            115
Neon         AlexNet ConvNet         1024          1305          2889
Neon         GoogLeNet v1 ConvNet    1024          445           1036
Neon         ResNet-18               1024          286           672
Inference throughput measured with FP32 instructions. Inference with INT8 will be higher. Additional optimizations may further improve performance.
Source: Intel measured as of June 2017.
AI Performance: Software + Hardware

Inference throughput: up to 138x higher GoogleNet v1 inference throughput with Intel-optimized Caffe and Intel® MKL on the Intel® Xeon® Platinum 8180 processor compared to BVLC Caffe on the Intel® Xeon® processor E5-2699 v3.

Inference using FP32. Batch sizes: Caffe GoogleNet v1: 256; AlexNet: 256. Configuration details on slides 18, 25. Source: Intel measured as of June 2017.
Training throughput: up to 113x higher AlexNet training throughput with Intel-optimized Caffe and Intel® MKL on the Intel® Xeon® Platinum 8180 processor compared to BVLC Caffe on the Intel® Xeon® processor E5-2699 v3.

Deliver significant AI performance with hardware and software optimizations on Intel® Xeon® Scalable Processors: optimized frameworks plus optimized Intel® MKL libraries.
Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
Intel® Xeon® Scalable Processor Multi-node Performance

Source: Intel measured as of August 2017.
[Chart] ResNet-50 time to train (hours), weak-scaling minibatch on Intel® Xeon® Scalable processors (SKX-6148, SKX-8180*); the global minibatch is scaled across nodes at MB-32 per node, then held at 11264 by dropping to MB-24 and MB-16 per node:

Global minibatch (nodes)    Time to train (hours)
32 (1 node)                 496.0
64 (2 nodes)                247.5
128 (4 nodes)               130.3
256 (8 nodes)               62.9
512 (16 nodes)              30.0
1024 (32 nodes)             15.1
2048 (64 nodes)             7.8
4096 (128 nodes)            3.9
8192 (256 nodes)            2.0
11264 (352 nodes)           1.5
11264 (470 nodes)           1.1
11264 (704 nodes)           0.8
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Performance Drivers for AI Workloads: Compute and Bandwidth

Compute

Higher number of operations per second
• Intel® Xeon® Platinum 8180 processor (1 socket): up to 3570 GFLOPS on SGEMM (FP32); up to 5185 GOPS on IGEMM (INT8)

Increased parallelism and vectorization
• Intel® Xeon® Scalable processors offer Intel® AVX-512 with up to two 512-bit FMA units computing in parallel per core(1) (a back-of-envelope sketch follows this list)

Higher number of cores
• Up to 28 cores in Intel® Xeon® Scalable processors

Bandwidth

High throughput, low latency
• Intel® Xeon® Scalable processors offer up to 6 DDR4 channels per socket and a new mesh architecture
• Intel® Xeon® Platinum 8180 processor: up to 199 GB/s of STREAM Triad performance on a 2-socket system

Efficient large caches
• Intel® Xeon® Scalable processors offer increased private local mid-level cache (MLC), up to 1 MB per core

(1) Two 512-bit FMA units available on Intel® Xeon® Platinum processors, Intel® Xeon® Gold processors, and the Intel® Xeon® Gold 5122 processor. Configuration details on slide 23. Source: Intel measured as of June 2017.
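A back-of-envelope check of how these compute drivers combine: peak FP32 throughput per socket is cores x clock x FMA units x FP32 lanes x 2 ops per FMA. The sketch assumes an illustrative 2.5 GHz AVX-512 clock; the 3570 GFLOPS SGEMM figure above is Intel's measured number, not derived here.

    # Theoretical peak FP32 GFLOPS for a 28-core AVX-512 socket (sketch).
    fma_units   = 2          # up to 2 512-bit FMA units per core
    lanes_fp32  = 512 // 32  # 16 FP32 lanes per 512-bit vector
    ops_per_fma = 2          # multiply + add
    ghz         = 2.5        # assumed AVX-512 clock, illustrative only
    cores       = 28

    peak_gflops = cores * ghz * fma_units * lanes_fp32 * ops_per_fma
    print(peak_gflops)       # 28 * 2.5 * 2 * 16 * 2 = 4480 GFLOPS (theoretical)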
Crest Family: Deep Learning by Design (2017)

Scalable acceleration with the best performance for intensive deep learning training and inference, period.

Custom hardware
▪ Unprecedented compute density
▪ Large reduction in time-to-train

Blazing data access
▪ 32 GB of in-package memory via HBM2 technology
▪ 8 terabits/s of memory access speed

High-speed scalability
▪ 12 bi-directional high-bandwidth links
▪ Seamless data transfer via interconnects
Intel Nervana Lake Crest NPU Architecture

[Diagram] Floorplan (not to scale) showing, on a silicon interposer:
• 12 processing clusters
• 4 HBM2 stacks, each with an HBM PHY and memory controller
• 12 inter-chip links (ICL) with an inter-chip controller (ICC) for chip-to-chip scaling
• Management CPU, PCI Express x16 controller & DMA, and SPI, I2C, GPIO interfaces
FlexPoint™ Numerical Format

Float16
• 11-bit mantissa precision (-1024 to 1023)
• Individual 5-bit exponent per value

Flex16
• 16-bit mantissa: 45% more precision than Float16 (-32,768 to 32,767)
• Tensor-wide shared 5-bit exponent

[Diagram] A Float16 tensor (values such as 929, -045, -195, ...) stores a separate exponent with each entry (DEC=8, DEC=7, ...), while the equivalent Flex16 tensor (values such as -13487, 29475, 22630, ...) shares a single exponent (DEC=8) across the whole tensor.
Flex16 accuracy on par with Float32 but with much smaller cores
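A minimal sketch of the Flex16 idea: integer mantissas that share one exponent per tensor. The encoding below is a simplification for illustration; the exponent choice and rounding are assumptions, not the actual Flexpoint definition.

    # Toy shared-exponent encoding: int16 mantissas, one exponent per tensor.
    import numpy as np

    def flex16_encode(t):
        # Choose the shared exponent so the largest value fits in int16
        exp = int(np.ceil(np.log2(np.abs(t).max() + 1e-30))) - 15
        mant = np.clip(np.round(t / 2.0**exp), -32768, 32767).astype(np.int16)
        return mant, exp

    def flex16_decode(mant, exp):
        return mant.astype(np.float32) * 2.0**exp

    x = np.random.randn(3, 3).astype(np.float32)
    mant, exp = flex16_encode(x)
    print(np.abs(flex16_decode(mant, exp) - x).max())   # small round-trip error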
Diversity in Deep Networks

• Variety in network topology
  ▪ Recurrent NNs common for NLP/ASR; DAGs for GoogLeNet; networks with memory; ...
• But there are a few well-defined building blocks:
  ▪ Convolutions, common for image recognition tasks
  ▪ GEMMs for recurrent network layers (which could be sparse)
  ▪ ReLU, tanh, softmax

[Images] GoogLeNet, a recurrent NN, and a CNN (AlexNet)
Intel® Math Kernel Library (Intel® MKL)

• Optimized AVX2 and AVX-512 instructions
• Supports Intel® Xeon® processors and Intel® Xeon Phi™ processors
• Optimized for common deep learning operations:
  ▪ GEMM (useful in RNNs and fully connected layers; see the sketch after this list)
  ▪ Convolutions
  ▪ Pooling
  ▪ ReLU
  ▪ Batch normalization
• Coming soon: Winograd-based convolutions
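As a reminder of why GEMM matters for recurrent layers, one vanilla RNN step is essentially two matrix multiplies plus a nonlinearity. A hedged numpy sketch with assumed sizes:

    # One vanilla RNN step expressed as two GEMMs + tanh (illustrative only).
    import numpy as np

    batch, n_in, n_hid = 32, 128, 256
    x  = np.random.randn(batch, n_in).astype(np.float32)
    h  = np.zeros((batch, n_hid), dtype=np.float32)
    Wx = 0.01 * np.random.randn(n_in, n_hid).astype(np.float32)
    Wh = 0.01 * np.random.randn(n_hid, n_hid).astype(np.float32)

    # Input-to-hidden GEMM + hidden-to-hidden GEMM, then the nonlinearity
    h = np.tanh(x @ Wx + h @ Wh)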
Naïve Convolution
https://en.wikipedia.org/wiki/Convolutional_neural_network
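The naïve algorithm the slide refers to can be written directly as nested loops over output positions, with a dot product of the kernel against each input patch. A plain numpy sketch (single channel, stride 1, no padding assumed):

    # Naive 2-D convolution: nested loops, one patch dot product per output.
    import numpy as np

    def naive_conv2d(image, kernel):
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1), dtype=image.dtype)
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # dot product of the kernel with one image patch
                out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
        return out

    img = np.random.randn(8, 8).astype(np.float32)
    k   = np.random.randn(3, 3).astype(np.float32)
    print(naive_conv2d(img, k).shape)   # (6, 6)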
Cache-Friendly Convolution
arxiv.org/pdf/1602.06709v1.pdf
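The cache-friendly formulation in the cited paper reorders activations so the innermost dimension is a small channel block that matches the SIMD width and stays resident in cache. A hedged numpy sketch of the layout change only; the block size 16 is an assumption matching AVX-512's 16 FP32 lanes:

    # Reorder NCHW activations into channel-blocked NCHW{16} layout.
    import numpy as np

    N, C, H, W, B = 1, 64, 28, 28, 16          # B = channel block size
    x = np.random.randn(N, C, H, W).astype(np.float32)

    # NCHW -> (N, C/B, H, W, B): one channel block is now contiguous in
    # memory, so the innermost loop vectorizes and reuses cache lines.
    x_blocked = np.ascontiguousarray(
        x.reshape(N, C // B, B, H, W).transpose(0, 1, 3, 4, 2))
    print(x_blocked.shape)                      # (1, 4, 28, 28, 16)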
Intel® MKL and Intel® MKL-DNN for Deep Learning

Deep learning frameworks run on Xeon, Xeon Phi, and FPGA through the Intel® Math Kernel Library (Intel® MKL) and Intel® MKL-DNN.

Intel® MKL
• DNN primitives plus a wide variety of other math functions
• C DNN APIs (C++ in the future)
• Binary distribution
• Free community license; premium support available as part of Parallel Studio XE
• Broad-usage DNN primitives; not specific to individual frameworks
• Quarterly update releases

Intel® MKL-DNN
• DNN primitives only
• C/C++ DNN APIs
• Open-source DNN code*
• Apache 2.0 license
• Multiple variants of DNN primitives as required for framework integrations
• Rapid development, ahead of Intel MKL releases

* GEMM matrix multiply building blocks are binary
Deep Learning Software: A Many-to-Many Problem

Many users, many frameworks, many hardware platforms: the engineering effort is a combinatorial explosion that will only worsen as hardware x software x topologies x quantization schemes expands.
Nervana Graph: The Project

Components:
• An intermediate representation (IR) for deep learning (sketched after this list)
  ▪ Data flow with common tensor computational primitives, plus scheduling and side-effecting memory management
• Compiler backends for the Nervana Graph IR
  ▪ Nervana GPU kernels (cuDNN soon), MKL-DNN, Lake Crest, ...
• Connectors to other deep learning frameworks
  ▪ Currently planning TensorFlow, Caffe2, MXNet, and PyTorch support
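To make the IR idea concrete, here is a deliberately toy, hypothetical graph IR. It is NOT the Nervana Graph API, just a sketch of a data-flow graph of tensor primitives plus a backend walk that would emit device-specific kernels:

    # Hypothetical miniature graph IR: nodes of tensor ops, walked by a backend.
    class Node:
        def __init__(self, op, *inputs):
            self.op, self.inputs = op, inputs

    # A front end builds a data-flow graph of common tensor primitives...
    x = Node("placeholder")
    w = Node("variable")
    y = Node("relu", Node("dot", x, w))

    # ...and a backend "transformer" walks it in dependency order, emitting
    # device-specific code (MKL-DNN on CPU, GPU kernels, Lake Crest, ...).
    def compile_for(node, backend):
        for inp in node.inputs:
            compile_for(inp, backend)
        print(f"[{backend}] emit kernel for {node.op}")

    compile_for(y, "mkl-dnn")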
Performance Optimization on Modern Platforms

Utilize all the cores
• OpenMP, MPI, TBB, ...
• Reduce synchronization events and serial code
• Improve load balancing

Vectorize / SIMD
• Unit-strided access per SIMD lane
• High vector efficiency
• Data alignment

Efficient memory/cache use
• Blocking
• Data reuse
• Prefetching
• Memory allocation

Hierarchical parallelism
• Fine-grained parallelism within a node; sub-domain: (1) multi-level domain decomposition (e.g. across layers), (2) data decomposition (layer parallelism)
• Coarse-grained parallelism across nodes: domain decomposition (a minimal sketch follows this list)

Scaling
• Improve load balancing
• Reduce synchronization events and all-to-all communications
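As flagged in the coarse-grained bullet above, the canonical multi-node pattern for deep learning is data-parallel gradient averaging. A minimal sketch using mpi4py (assumed installed); run under e.g. mpiexec -n 4 python script.py:

    # Data-parallel gradient averaging across nodes via MPI all-reduce.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    local_grad = np.random.randn(1024).astype(np.float32)  # this node's gradients

    avg_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, avg_grad, op=MPI.SUM)       # all-to-all reduction
    avg_grad /= comm.Get_size()                            # average across nodes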
Intel® Nervana™ Deep Learning Studio: Compress the Innovation Cycle to Accelerate Time-to-Solution

What it is: a comprehensive software suite that allows groups of data scientists to shorten the "innovation cycle" and develop custom, enterprise-grade deep learning solutions in record time. Available as part of Intel® Nervana™ Cloud and the Intel® Nervana™ Deep Learning System.

Users: primarily data scientists; secondarily software developers who take trained deep learning models and integrate them into their applications.

Why it's important: developing a deep learning solution is both time-consuming and expensive, because costly data scientists spend too much time wrangling data and manually executing hundreds of experiments to find the right network topology and combination of parameters to reach a converged model that fits their use case.

Stack: data (images, video, text, speech, tabular, time series) feeding deep learning frameworks (Neon, more coming soon), running on Intel® Nervana™ Deep Learning Studio over Intel® Nervana™ hardware.

Learn more: intelnervana.com
High-Level Workflow

The data scientist takes a dataset through: label, import dataset, build model (drawing on a model library), train, and deploy (to edge or cloud/server), producing a trained model.

Multiple interface options: the ncloud command-line interface, interactive notebooks, and a user interface.
Intel® Nervana™ AI Academy

✓ Intel Developer Zone for Artificial Intelligence
✓ Deep learning frameworks, libraries, and additional tools
✓ Workshops, webinars, meetups, and remote access

software.intel.com/ai/academy | intelnervana.com
Visual Understanding Research @ Intel Labs China

Innovate in cutting-edge visual cognition and machine learning technologies for smart computing, to enable novel usages and user experiences.

Face analysis & emotion recognition: 2D/3D face & emotion engine; face analysis technology; multimodal emotion recognition; ...

Visual parsing & multimodal analysis: automatic image/video captioning; visual question answering; ...

Efficient DNN design & compression: efficient CNN algorithm design; DNN model compression; ...

[Diagram] Deep learning based visual recognition (fully connected layer, loss, 128-dimensional features)
Legal Disclaimer & Optimization Notice

• INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
• Copyright © 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Download your free 30-day trial of Intel® Parallel Studio XE Beta 2018 today > Insert your custom URL with its unique tagging into the sample link below:
https://intelcustomer.az1.qualtrics.com/SE/?SID=SV_09AEJgAYdKezL6d&Q_JFE=0

Don't forget to do the following...
Please use this slide as a template for promoting your own product and URL; for example, the link above promotes Intel® Parallel Studio XE Beta 2018.

Code that performs exceptionally.

We will email you an evaluation survey after this presentation, so please check your inbox.
P.S. Everyone who completes the survey will receive a personal certificate of training completion!

[Insert a product box photo here.]