TRANSCRIPT
Enabling the Future of Artificial Intelligence

Contents
• AI Overview
• Intel Nervana AI products
  ▪ Hardware
  ▪ Software
  ▪ Intel Nervana Deep Learning Platform
• Learn more - Intel Nervana AI Academy
Artificial Intelligence, Machine Learning & Deep Learning
Why Now?

Bigger Data
• Image: 1,000 KB per picture
• Audio: 5,000 KB per song
• Video: 5,000,000 KB per movie

Better Hardware
• Transistor density doubles every 18 months
• Cost per GB in 1995: $1000.00; cost per GB in 2017: $0.02

Smarter Algorithms
• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models
Sharing
• Companies share algorithms and topologies
• Their gold is:
  ▪ Data
  ▪ Trained models
  ▪ Talent
Machine Learning Types

Supervised
• Teach desired behavior with labeled data, then infer on new data
• Labeled data in, classified data out (contrasted with the unsupervised case in the sketch after this list)

Unsupervised
• Make inferences with unlabeled data and discover patterns
• Unlabeled data in, clustered data out

Semi-supervised
• A combination of supervised and unsupervised learning
• Labeled and unlabeled data in, classified data out

Reinforcement
• Act in an environment to maximize reward; build autonomous agents that learn
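To make the supervised/unsupervised distinction concrete, here is a minimal illustrative sketch in Python using scikit-learn. It is not part of the original deck; the synthetic data and model choices are assumptions.

    # Supervised vs. unsupervised learning on synthetic 2-D data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))              # 200 samples, 2 features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # labels for the supervised case

    # Supervised: teach the desired behavior with labeled data, then infer.
    clf = LogisticRegression().fit(X, y)
    print(clf.predict(X[:5]))                  # inferred labels for new data

    # Unsupervised: no labels; discover structure (clusters) in the data.
    km = KMeans(n_clusters=2, n_init=10).fit(X)
    print(km.labels_[:5])                      # discovered cluster assignments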
Training

Forward propagation: input data flows through the network and produces an output, e.g. probabilities [0.10, 0.15, 0.20, ..., 0.05] over the classes person, cat, dog, bike.
The output is compared against the expected output, e.g. the one-hot label [0, 1, 0, ..., 0] for "cat", yielding a penalty (error or cost).
Back propagation: the penalty is propagated backwards through the network to update its weights.
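The forward/penalty/backward loop above can be sketched in a few lines of numpy. This toy single-layer softmax classifier over the four example classes is illustrative only; the shapes, values, and learning rate are assumptions, not Intel's code.

    # One training step: forward propagation, penalty, back propagation.
    import numpy as np

    classes = ["person", "cat", "dog", "bike"]
    x = np.random.randn(16)                  # one input sample (16 features, assumed)
    W = 0.01 * np.random.randn(4, 16)        # weights for the 4 classes
    target = np.array([0., 1., 0., 0.])      # expected output: "cat"

    # Forward propagation: raw scores -> softmax probabilities
    scores = W @ x
    p = np.exp(scores - scores.max())
    p /= p.sum()                             # e.g. [0.10, 0.15, 0.20, ...]

    # Penalty (error or cost): cross-entropy against the expected output
    loss = -np.sum(target * np.log(p))

    # Back propagation: gradient of the loss w.r.t. W, then a descent step
    grad_scores = p - target                 # dLoss/dScores for softmax + CE
    grad_W = np.outer(grad_scores, x)
    W -= 0.1 * grad_W                        # gradient-descent update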
Inference

Forward propagation only: new data flows through the trained network and produces output probabilities, e.g. [0.02, 0.85, 0.07, ..., 0.01] over person, cat, dog, bike; here the model predicts "cat".
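Inference reuses only the forward pass. Continuing the toy training sketch above (reusing its classes list), the prediction is simply the arg-max of the output probabilities:

    # Inference: forward pass, then pick the most probable class.
    probs = np.array([0.02, 0.85, 0.07, 0.01])   # example output from the slide
    print(classes[int(np.argmax(probs))])        # -> "cat"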
Deep Learning Use Cases
• Healthcare: tumor detection
• Transport: automated driving
• Finance: time-series search
• Agriculture: robotics
• Energy: oil & gas search
• Proteomics: sequence analysis
• Consumer: speech/text search
• Consumer: smart speakers
Intel Nervana Deep Learning Portfolio

Enabling (Nervana Graph; MKL-DNN and other math libraries; HW transformers and non-x86 libraries)
• Accelerate framework optimization on IA; open source
• For framework developers and Intel
• Multi-node optimizations
• Extend to non-datacenter inference products and use cases

Product software (frameworks)
• Frameworks for developers
• Back-end APIs to Nervana Graph

Deep learning platform (Nervana Deep Learning Studio; Nervana Cloud, Intel branded; Titanium: HW management)
• Data scientist and developer DL productivity tools
• DL cloud service for POCs, developers, and academics
• DL appliance for DL-as-a-Service

Systems (Deep Learning Systems: node and rack reference designs; channel sales)
• Enable direct and end customers with the Deep Learning System portfolio
• Intel-branded offering under investigation

Products (datacenter; edge, client, gateway)
• Comprehensive product portfolio
• General-purpose x86
• Dedicated DL NPU accelerators

Research and application support (Intel Brain Data Scientist Team; BDM & direct optimization team)
• Research new AI usages and models
• Develop POCs with customers to apply AI methods
• Enable customers to deploy products
The AI Datacenter: all purpose, highly parallel, flexible acceleration, deep learning

Intel® Xeon® Scalable Processors: the most agile AI platform
• Scalable performance for the widest variety of AI and other datacenter workloads, including breakthrough deep learning training and inference

Intel® Xeon Phi™ Processor (Knights Mill): faster DL training
• Scalable performance optimized for even faster deep learning training and select highly-parallel datacenter workloads*

Intel® FPGA: enhanced DL inference
• Scalable acceleration for real-time deep learning inference with higher efficiency, across a wide range of workloads and configurations

Crest Family: deep learning by design
• Scalable acceleration with the best performance for intensive deep learning training and inference, period

*Knights Mill (KNM); "select" = single-precision, highly-parallel workloads that generally scale to more than 100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth, e.g. energy (reverse time migration), deep learning training, etc. All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Performance Drivers for AI Workloads
• Compute
• Bandwidth
• SW optimizations
Most Agile AI Platform: Intel® Xeon® Scalable Processors

• Built-in ROI: begin your AI journey today using existing, familiar infrastructure
• Potent performance: up to 2.2x deep learning training and inference performance vs. the prior generation(1); up to 113x with software optimizations(2)
• Production-ready: robust support for the full range of AI deployments

(1,2) Configuration details on slides 18, 20, 24. Source: Intel measured as of November 2016.
Workload breadth: classic ML, deep learning, reasoning, emerging AI, analytics, and more.
Scalable performance for the widest variety of AI and other datacenter workloads, including deep learning.
AI Performance: Generation over Generation

Advance previous-generation AI workload performance with Intel® Xeon® Scalable Processors.

Inference throughput: up to 2.4x higher Neon ResNet-18 inference throughput on the Intel® Xeon® Platinum 8180 processor compared to the Intel® Xeon® processor E5-2699 v4.

Training throughput: up to 2.2x higher Neon ResNet-18 training throughput on the Intel® Xeon® Platinum 8180 processor compared to the Intel® Xeon® processor E5-2699 v4.

Inference throughput batch size: 1; training throughput batch size: 256. Configuration details on slides 18, 20. Source: Intel measured as of June 2017.
Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
Up to 3.4x Integer Matrix Multiply Performance on Intel® Xeon® Platinum 8180 Processor
Configuration details on slide 24. Source: Intel measured as of June 2017.
[Chart] Matrix multiply performance on a 1S Intel® Xeon® Platinum 8180 processor relative to a 1S Intel® Xeon® processor E5-2699 v4 (baseline 1.0; GEMM performance measured in GFLOPS; higher is better):
• SGEMM, single-precision floating-point general matrix multiply (FP32): 2.3x
• IGEMM, integer general matrix multiply (INT8): 3.4x

Enhanced matrix multiply performance on Intel® Xeon® Scalable Processors. 8-bit IGEMM will be available in Intel® Math Kernel Library (Intel® MKL) 2018 Gold, to be released by end of Q3 2017.
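One way to see why INT8 GEMM can outpace FP32 SGEMM is to emulate the arithmetic: operands are quantized to 8 bits and products accumulate in int32, so each vector instruction processes four times as many elements. A hedged numpy sketch; the per-matrix scaling scheme is an assumption, and MKL's actual cblas IGEMM API is not shown.

    # Emulate INT8 GEMM with int32 accumulation and compare to FP32.
    import numpy as np

    A = np.random.randn(64, 64).astype(np.float32)
    B = np.random.randn(64, 64).astype(np.float32)

    # Quantize each matrix to int8 with a per-matrix scale (an assumption)
    sa, sb = 127 / np.abs(A).max(), 127 / np.abs(B).max()
    A8 = np.round(A * sa).astype(np.int8)
    B8 = np.round(B * sb).astype(np.int8)

    # Integer GEMM with int32 accumulation, then rescale back to float
    C_int = A8.astype(np.int32) @ B8.astype(np.int32)
    C_approx = C_int / (sa * sb)

    print(np.abs(C_approx - A @ B).max())   # small quantization error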
Up to 2.4x Higher Inference Throughput on Intel® Xeon® Platinum 8180 Processor

The Intel® Xeon® Platinum processor delivers higher inference throughput across different frameworks.
[Chart] Inference throughput shown in images/second, 2S Intel® Xeon® Processor E5-2699 v4 (22C, 2.2GHz) vs. 2S Intel® Xeon® Platinum 8180 Processor (28C, 2.5GHz):

Framework    Topology                Batch size    E5-2699 v4    Platinum 8180
Caffe        AlexNet                 1024          1146          2656
Caffe        GoogLeNet v1            1024          405           814
Caffe        ResNet-50               1024          118           226
Caffe        VGG-19                  256           62            136
TensorFlow   AlexNet ConvNet         1024          2135          3382
TensorFlow   GoogLeNet ConvNet       1024          427           658
TensorFlow   VGG ConvNet             256           140           248
MXNet        AlexNet                 1024          1093          2439
MXNet        VGG-19                  256           155           333
MXNet        Inception V3            1024          164           250
MXNet        ResNet-50               256           79            115
Neon         AlexNet ConvNet         1024          1305          2889
Neon         GoogLeNet v1 ConvNet    1024          445           1036
Neon         ResNet-18               1024          286           672
Inference throughput measured with FP32 instructions. Inference with INT8 will be higher. Additional optimizations may further improve performance.
Source: Intel measured as of June 2017.
AI Performance: Software + Hardware

Inference throughput: up to 138x higher GoogleNet v1 inference throughput with Intel-optimized Caffe and Intel® MKL on the Intel® Xeon® Platinum 8180 processor compared to BVLC Caffe on the Intel® Xeon® processor E5-2699 v3.

Inference using FP32. Batch sizes: Caffe GoogleNet v1: 256; AlexNet: 256. Configuration details on slides 18, 25. Source: Intel measured as of June 2017.
Training throughput: up to 113x higher AlexNet training throughput with Intel-optimized Caffe and Intel® MKL on the Intel® Xeon® Platinum 8180 processor compared to BVLC Caffe on the Intel® Xeon® processor E5-2699 v3.

Deliver significant AI performance with hardware and software optimizations on Intel® Xeon® Scalable Processors: optimized frameworks plus optimized Intel® MKL libraries.
Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
Intel® Xeon® Scalable Processor Multi-node Performance

Source: Intel measured as of August 2017.
[Chart] ResNet-50 time to train (hours), weak-scaling minibatch on Intel® Xeon® Scalable processors (SKX-6148, SKX-8180*); the global minibatch is scaled across nodes at MB-32 per node, then held at 11264 by dropping to MB-24 and MB-16 per node:

Global minibatch (nodes)    Time to train (hours)
32 (1 node)                 496.0
64 (2 nodes)                247.5
128 (4 nodes)               130.3
256 (8 nodes)               62.9
512 (16 nodes)              30.0
1024 (32 nodes)             15.1
2048 (64 nodes)             7.8
4096 (128 nodes)            3.9
8192 (256 nodes)            2.0
11264 (352 nodes)           1.5
11264 (470 nodes)           1.1
11264 (704 nodes)           0.8
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Performance Drivers for AI Workloads: Compute and Bandwidth

Compute

Higher number of operations per second
• Intel® Xeon® Platinum 8180 processor (1 socket): up to 3570 GFLOPS on SGEMM (FP32); up to 5185 GOPS on IGEMM (INT8)

Increased parallelism and vectorization
• Intel® Xeon® Scalable processors offer Intel® AVX-512 with up to two 512-bit FMA units computing in parallel per core(1) (a back-of-envelope sketch follows this list)

Higher number of cores
• Up to 28 cores in Intel® Xeon® Scalable processors

Bandwidth

High throughput, low latency
• Intel® Xeon® Scalable processors offer up to 6 DDR4 channels per socket and a new mesh architecture
• Intel® Xeon® Platinum 8180 processor: up to 199 GB/s of STREAM Triad performance on a 2-socket system

Efficient large caches
• Intel® Xeon® Scalable processors offer increased private local mid-level cache (MLC), up to 1 MB per core

(1) Two 512-bit FMA units available on Intel® Xeon® Platinum processors, Intel® Xeon® Gold processors, and the Intel® Xeon® Gold 5122 processor. Configuration details on slide 23. Source: Intel measured as of June 2017.
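A back-of-envelope check of how these compute drivers combine: peak FP32 throughput per socket is cores x clock x FMA units x FP32 lanes x 2 ops per FMA. The sketch assumes an illustrative 2.5 GHz AVX-512 clock; the 3570 GFLOPS SGEMM figure above is Intel's measured number, not derived here.

    # Theoretical peak FP32 GFLOPS for a 28-core AVX-512 socket (sketch).
    fma_units   = 2          # up to 2 512-bit FMA units per core
    lanes_fp32  = 512 // 32  # 16 FP32 lanes per 512-bit vector
    ops_per_fma = 2          # multiply + add
    ghz         = 2.5        # assumed AVX-512 clock, illustrative only
    cores       = 28

    peak_gflops = cores * ghz * fma_units * lanes_fp32 * ops_per_fma
    print(peak_gflops)       # 28 * 2.5 * 2 * 16 * 2 = 4480 GFLOPS (theoretical)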
Crest Family: Deep Learning by Design (2017)

Scalable acceleration with the best performance for intensive deep learning training and inference, period.

Custom hardware
▪ Unprecedented compute density
▪ Large reduction in time-to-train

Blazing data access
▪ 32 GB of in-package memory via HBM2 technology
▪ 8 terabits/s of memory access speed

High-speed scalability
▪ 12 bi-directional high-bandwidth links
▪ Seamless data transfer via interconnects
Intel Nervana Lake Crest NPU Architecture

[Diagram] Floorplan (not to scale) showing, on a silicon interposer:
• 12 processing clusters
• 4 HBM2 stacks, each with an HBM PHY and memory controller
• 12 inter-chip links (ICL) with an inter-chip controller (ICC) for chip-to-chip scaling
• Management CPU, PCI Express x16 controller & DMA, and SPI, I2C, GPIO interfaces
FlexPoint™ Numerical Format

Float16
• 11-bit mantissa precision (-1024 to 1023)
• Individual 5-bit exponent per value

Flex16
• 16-bit mantissa: 45% more precision than Float16 (-32,768 to 32,767)
• Tensor-wide shared 5-bit exponent

[Diagram] A Float16 tensor (values such as 929, -045, -195, ...) stores a separate exponent with each entry (DEC=8, DEC=7, ...), while the equivalent Flex16 tensor (values such as -13487, 29475, 22630, ...) shares a single exponent (DEC=8) across the whole tensor.
Flex16 accuracy on par with Float32 but with much smaller cores
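A minimal sketch of the Flex16 idea: integer mantissas that share one exponent per tensor. The encoding below is a simplification for illustration; the exponent choice and rounding are assumptions, not the actual Flexpoint definition.

    # Toy shared-exponent encoding: int16 mantissas, one exponent per tensor.
    import numpy as np

    def flex16_encode(t):
        # Choose the shared exponent so the largest value fits in int16
        exp = int(np.ceil(np.log2(np.abs(t).max() + 1e-30))) - 15
        mant = np.clip(np.round(t / 2.0**exp), -32768, 32767).astype(np.int16)
        return mant, exp

    def flex16_decode(mant, exp):
        return mant.astype(np.float32) * 2.0**exp

    x = np.random.randn(3, 3).astype(np.float32)
    mant, exp = flex16_encode(x)
    print(np.abs(flex16_decode(mant, exp) - x).max())   # small round-trip error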
Diversity in Deep Networks

• Variety in network topology
  ▪ Recurrent NNs common for NLP/ASR; DAGs for GoogLeNet; networks with memory; ...
• But there are a few well-defined building blocks:
  ▪ Convolutions, common for image recognition tasks
  ▪ GEMMs for recurrent network layers (which could be sparse)
  ▪ ReLU, tanh, softmax

[Images] GoogLeNet, a recurrent NN, and a CNN (AlexNet)
Intel® Math Kernel Library (Intel® MKL)

• Optimized AVX2 and AVX-512 instructions
• Supports Intel® Xeon® processors and Intel® Xeon Phi™ processors
• Optimized for common deep learning operations:
  ▪ GEMM (useful in RNNs and fully connected layers; see the sketch after this list)
  ▪ Convolutions
  ▪ Pooling
  ▪ ReLU
  ▪ Batch normalization
• Coming soon: Winograd-based convolutions
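As a reminder of why GEMM matters for recurrent layers, one vanilla RNN step is essentially two matrix multiplies plus a nonlinearity. A hedged numpy sketch with assumed sizes:

    # One vanilla RNN step expressed as two GEMMs + tanh (illustrative only).
    import numpy as np

    batch, n_in, n_hid = 32, 128, 256
    x  = np.random.randn(batch, n_in).astype(np.float32)
    h  = np.zeros((batch, n_hid), dtype=np.float32)
    Wx = 0.01 * np.random.randn(n_in, n_hid).astype(np.float32)
    Wh = 0.01 * np.random.randn(n_hid, n_hid).astype(np.float32)

    # Input-to-hidden GEMM + hidden-to-hidden GEMM, then the nonlinearity
    h = np.tanh(x @ Wx + h @ Wh)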
Naïve Convolution
https://en.wikipedia.org/wiki/Convolutional_neural_network
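The naïve algorithm the slide refers to can be written directly as nested loops over output positions, with a dot product of the kernel against each input patch. A plain numpy sketch (single channel, stride 1, no padding assumed):

    # Naive 2-D convolution: nested loops, one patch dot product per output.
    import numpy as np

    def naive_conv2d(image, kernel):
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1), dtype=image.dtype)
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # dot product of the kernel with one image patch
                out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
        return out

    img = np.random.randn(8, 8).astype(np.float32)
    k   = np.random.randn(3, 3).astype(np.float32)
    print(naive_conv2d(img, k).shape)   # (6, 6)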
Cache-Friendly Convolution
arxiv.org/pdf/1602.06709v1.pdf
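The cache-friendly formulation in the cited paper reorders activations so the innermost dimension is a small channel block that matches the SIMD width and stays resident in cache. A hedged numpy sketch of the layout change only; the block size 16 is an assumption matching AVX-512's 16 FP32 lanes:

    # Reorder NCHW activations into channel-blocked NCHW{16} layout.
    import numpy as np

    N, C, H, W, B = 1, 64, 28, 28, 16          # B = channel block size
    x = np.random.randn(N, C, H, W).astype(np.float32)

    # NCHW -> (N, C/B, H, W, B): one channel block is now contiguous in
    # memory, so the innermost loop vectorizes and reuses cache lines.
    x_blocked = np.ascontiguousarray(
        x.reshape(N, C // B, B, H, W).transpose(0, 1, 3, 4, 2))
    print(x_blocked.shape)                      # (1, 4, 28, 28, 16)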
Intel® MKL and Intel® MKL-DNN for Deep Learning

Deep learning frameworks run on Xeon, Xeon Phi, and FPGA through the Intel® Math Kernel Library (Intel® MKL) and Intel® MKL-DNN.

Intel® MKL
• DNN primitives plus a wide variety of other math functions
• C DNN APIs (C++ in the future)
• Binary distribution
• Free community license; premium support available as part of Parallel Studio XE
• Broad-usage DNN primitives; not specific to individual frameworks
• Quarterly update releases

Intel® MKL-DNN
• DNN primitives only
• C/C++ DNN APIs
• Open-source DNN code*
• Apache 2.0 license
• Multiple variants of DNN primitives as required for framework integrations
• Rapid development, ahead of Intel MKL releases

* GEMM matrix multiply building blocks are binary
Deep Learning Software: A Many-to-Many Problem

Many users, many frameworks, many hardware platforms: the engineering effort is a combinatorial explosion that will only worsen as hardware x software x topologies x quantization schemes expands.
Nervana Graph: The Project

Components:
• An intermediate representation (IR) for deep learning (sketched after this list)
  ▪ Data flow with common tensor computational primitives, plus scheduling and side-effecting memory management
• Compiler backends for the Nervana Graph IR
  ▪ Nervana GPU kernels (cuDNN soon), MKL-DNN, Lake Crest, ...
• Connectors to other deep learning frameworks
  ▪ Currently planning TensorFlow, Caffe2, MXNet, and PyTorch support
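To make the IR idea concrete, here is a deliberately toy, hypothetical graph IR. It is NOT the Nervana Graph API, just a sketch of a data-flow graph of tensor primitives plus a backend walk that would emit device-specific kernels:

    # Hypothetical miniature graph IR: nodes of tensor ops, walked by a backend.
    class Node:
        def __init__(self, op, *inputs):
            self.op, self.inputs = op, inputs

    # A front end builds a data-flow graph of common tensor primitives...
    x = Node("placeholder")
    w = Node("variable")
    y = Node("relu", Node("dot", x, w))

    # ...and a backend "transformer" walks it in dependency order, emitting
    # device-specific code (MKL-DNN on CPU, GPU kernels, Lake Crest, ...).
    def compile_for(node, backend):
        for inp in node.inputs:
            compile_for(inp, backend)
        print(f"[{backend}] emit kernel for {node.op}")

    compile_for(y, "mkl-dnn")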
Performance Optimization on Modern Platforms

Utilize all the cores
• OpenMP, MPI, TBB, ...
• Reduce synchronization events and serial code
• Improve load balancing

Vectorize / SIMD
• Unit-strided access per SIMD lane
• High vector efficiency
• Data alignment

Efficient memory/cache use
• Blocking
• Data reuse
• Prefetching
• Memory allocation

Hierarchical parallelism
• Fine-grained parallelism within a node; sub-domain: (1) multi-level domain decomposition (e.g. across layers), (2) data decomposition (layer parallelism)
• Coarse-grained parallelism across nodes: domain decomposition (a minimal sketch follows this list)

Scaling
• Improve load balancing
• Reduce synchronization events and all-to-all communications
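As flagged in the coarse-grained bullet above, the canonical multi-node pattern for deep learning is data-parallel gradient averaging. A minimal sketch using mpi4py (assumed installed); run under e.g. mpiexec -n 4 python script.py:

    # Data-parallel gradient averaging across nodes via MPI all-reduce.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    local_grad = np.random.randn(1024).astype(np.float32)  # this node's gradients

    avg_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, avg_grad, op=MPI.SUM)       # all-to-all reduction
    avg_grad /= comm.Get_size()                            # average across nodes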
Intel® Nervana™ Deep Learning Studio: Compress the Innovation Cycle to Accelerate Time-to-Solution

What it is: a comprehensive software suite that allows groups of data scientists to shorten the "innovation cycle" and develop custom, enterprise-grade deep learning solutions in record time. Available as part of Intel® Nervana™ Cloud and the Intel® Nervana™ Deep Learning System.

Users: primarily data scientists; secondarily software developers who take trained deep learning models and integrate them into their applications.

Why it's important: developing a deep learning solution is both time-consuming and expensive, because costly data scientists spend too much time wrangling data and manually executing hundreds of experiments to find the right network topology and combination of parameters to reach a converged model that fits their use case.

Stack: data (images, video, text, speech, tabular, time series) feeding deep learning frameworks (Neon, more coming soon), running on Intel® Nervana™ Deep Learning Studio over Intel® Nervana™ hardware.

Learn more: intelnervana.com
High-Level Workflow

The data scientist takes a dataset through: label, import dataset, build model (drawing on a model library), train, and deploy (to edge or cloud/server), producing a trained model.

Multiple interface options: the ncloud command-line interface, interactive notebooks, and a user interface.
Intel® Nervana™ AI Academy

✓ Intel Developer Zone for Artificial Intelligence
✓ Deep learning frameworks, libraries, and additional tools
✓ Workshops, webinars, meetups, and remote access

software.intel.com/ai/academy | intelnervana.com
Visual Understanding Research @ Intel Labs China

Innovate in cutting-edge visual cognition and machine learning technologies for smart computing, to enable novel usages and user experiences.

Face analysis & emotion recognition: 2D/3D face & emotion engine; face analysis technology; multimodal emotion recognition; ...

Visual parsing & multimodal analysis: automatic image/video captioning; visual question answering; ...

Efficient DNN design & compression: efficient CNN algorithm design; DNN model compression; ...

[Diagram] Deep learning based visual recognition (fully connected layer, loss, 128-dimensional features)
Legal Disclaimer & Optimization Notice

• INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
• Copyright © 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Download your free 30-day trial of Intel® Parallel Studio XE Beta 2018 today > Insert your custom URL with its unique tagging into the sample link below:
https://intelcustomer.az1.qualtrics.com/SE/?SID=SV_09AEJgAYdKezL6d&Q_JFE=0

Don't forget to do the following...
Please use this slide as a template for promoting your own product and URL; for example, the link above promotes Intel® Parallel Studio XE Beta 2018.

Code that performs exceptionally.

We will email you an evaluation survey after this presentation, so please check your inbox.
P.S. Everyone who completes the survey will receive a personal certificate of training completion!

[Insert a product box photo here.]