An Introduction to Deep Learning with Apache MXNet (November 2017)
Julien Simon, AI/ML Evangelist, EMEA
@julsimon
November 8th, 2017
An introduction to Deep Learning
with Apache MXNet
Agenda
• The Advent of Deep Learning
• Deep Learning Applications
• Apache MXNet Overview
• Apache MXNet Demos (there will be code, ha!)
Artificial Intelligence: design software applications which
exhibit human-like behavior, e.g. speech, natural language
processing, reasoning or intuition
Machine Learning: teach machines to learn without being
explicitly programmed
Deep Learning: using neural networks, teach machines to
learn from complex data where features cannot be explicitly
expressed
The Advent of Deep Learning
• Algorithms
• Data
• GPUs & Acceleration
• Programming models
Selected customers running AI on AWS
Amazon AI: Artificial Intelligence In The Hands Of Every Developer
• Services: Amazon Rekognition (Vision), Amazon Polly (Speech), Amazon Lex (Chat)
• Platforms: Amazon ML, Spark & EMR, Kinesis, Batch, ECS
• Engines: MXNet, TensorFlow, Caffe, Theano, PyTorch, CNTK
• Infrastructure: CPU, GPU, FPGA, IoT, Mobile
Hardware innovation for Deep Learning
• Nvidia Volta GPU, October 25th
https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-with-up-to-8-nvidia-tesla-v100-gpus-p3/
https://devblogs.nvidia.com/parallelforall/inside-volta/
• Intel Skylake CPU, November 6th
https://aws.amazon.com/fr/blogs/aws/now-available-compute-intensive-c5-instances-for-amazon-ec2/
Deep Learning Applications
Image Classification
Same breed?
Humans: 5.1%
https://news.developer.nvidia.com/expedia-ranking-hotel-images-with-deep-learning/
• Expedia have over 10M images from 300,000 hotels
• Using great images boosts conversion
• Using Keras and EC2 GPU instances, they fine-tuned a pre-trained Convolutional Neural Network using 100,000 images
• Hotel descriptions now automatically feature the best available images
• 17,000 images from Instagram
• 7 brands
• Inception v3 model, pre-trained on ImageNet
• Fine-tuning with TensorFlow and EC2 GPU instances
• Additional work on color extraction
https://technology.condenast.com/story/handbag-brand-and-color-detection
Object Detection
https://github.com/precedenceguo/mx-rcnn
https://github.com/zhreshold/mxnet-yolo
https://www.oreilly.com/ideas/self-driving-trucks-enter-the-fast-lane-using-deep-learning
In June 2017, TuSimple drove an autonomous truck for 200 miles from Yuma, AZ to San Diego, CA
Real-Time Pose Estimation
https://github.com/dragonfly90/mxnet_Realtime_Multi-Person_Pose_Estimation
Machine Translation
https://aws.amazon.com/blogs/ai/train-neural-machine-translation-models-with-sockeye/
Amazon Echo
Natural Language Processing & Text-to-Speech
Apache MXNet overview
Apache MXNet
• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT
• High Performance: near linear scaling across hundreds of GPUs
• Most Open: accepted into the Apache Incubator
• Best On AWS: optimized for Deep Learning on AWS
Anatomy of a Deep Learning Model
Inputs: Image, Video, Speech, Text
Building blocks
• Convolution: mx.sym.Convolution(data, kernel=(5,5), num_filter=20)
• Pooling: mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2))
  Example on the 2x2 window [[4, 2], [2, 0]]: Max = 4, Avg = 2
• LSTM: lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)
• Fully connected: mx.sym.FullyConnected(data, num_hidden=128)
• Embedding: mx.symbol.Embedding(data, input_dim, output_dim=k)
  Example: Queen is mapped to [0.2, -0.1, ..., 0.7]
  cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman)
• Activation: mx.sym.Activation(data, act_type="xxxx"), where act_type is "relu", "tanh", "sigmoid" or "softrelu"
• Output and training: mx.sym.SoftmaxOutput, mx.model.FeedForward, model.fit
Applications
• Image Labels: Bicycle, People, Road, Sport
• Image Caption: “People Riding Bikes”
• Machine Translation: “People Riding Bikes” translated as “Οι άνθρωποι ιππασίας ποδήλατα”
• Neural Art, Face Search, Image Segmentation, Events
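Two of the building blocks on this slide, pooling and activation functions, are simple enough to sketch in plain Python. This is only an illustration of what the operators compute on single values, not how MXNet implements them. The helper names `pool2x2` and `ACTIVATIONS` are made up for this sketch, and the 2x2 window is the one shown on the slide.

```python
import math

def pool2x2(window, mode="max"):
    """Reduce one 2x2 window (a list of two rows) to a single value."""
    values = [v for row in window for v in row]
    if mode == "max":
        return max(values)
    if mode == "avg":
        return sum(values) / len(values)
    raise ValueError("unknown pooling mode: " + mode)

# The four act_type values listed on the slide, written as scalar functions.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "softrelu": lambda x: math.log(1.0 + math.exp(x)),  # also called softplus
}

# Pooling window taken from the slide.
window = [[4, 2],
          [2, 0]]
max_pooled = pool2x2(window, "max")  # largest value in the window
avg_pooled = pool2x2(window, "avg")  # mean of the four values
print("max:", max_pooled, "avg:", avg_pooled)

# Evaluate each activation on a few sample inputs.
for name, fn in ACTIVATIONS.items():
    print(name, [round(fn(x), 3) for x in (-1.0, 0.0, 1.0)])
```

In a real network these operations run over whole tensors, sliding the pooling window with the given stride.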
CPU or GPU: your choice
mod = mx.mod.Module(lenet)
mod = mx.mod.Module(lenet, context=mx.gpu(0))
mod = mx.mod.Module(lenet,
context=(mx.gpu(7), mx.gpu(8), mx.gpu(9)))
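When a module is given several contexts, training is data-parallel: each batch is divided among the devices. The sketch below shows that idea in plain Python. `split_batch` and the device names are hypothetical, and MXNet's own slicing logic may differ in detail.

```python
def split_batch(batch_size, devices):
    """Assign a contiguous (start, end) slice of one batch to each device,
    as evenly as possible. Illustration only, not MXNet's implementation."""
    base, extra = divmod(batch_size, len(devices))
    slices, start = {}, 0
    for i, device in enumerate(devices):
        # The first `extra` devices take one additional sample.
        size = base + (1 if i < extra else 0)
        slices[device] = (start, start + size)
        start += size
    return slices

# Hypothetical device names, standing in for three GPU contexts.
devices = ["gpu(0)", "gpu(1)", "gpu(2)"]
assignment = split_batch(128, devices)
for device, (start, end) in assignment.items():
    print(device, "samples", start, "to", end - 1)
```

Each device then computes gradients on its own slice, and the results are aggregated before the weights are updated.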
Distributed Training with MXNet
[Chart: scaling from 1 to 256 GPUs for Inception v3, ResNet and AlexNet, compared with ideal linear scaling; 88% efficiency]
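To make the efficiency figure concrete: scaling efficiency is the measured speedup divided by the number of GPUs. The sketch below assumes the 88% figure refers to the largest point on the chart (256 GPUs), which the transcript does not state explicitly. The function names are made up for this example.

```python
def speedup(num_gpus, efficiency):
    """Effective speedup over a single GPU at a given scaling efficiency."""
    return num_gpus * efficiency

def scaling_efficiency(num_gpus, measured_speedup):
    """Fraction of ideal linear scaling actually achieved."""
    return measured_speedup / num_gpus

# Assumption: 88% efficiency at 256 GPUs.
effective = speedup(256, 0.88)
print("256 GPUs at 88% efficiency behave like about", round(effective), "ideal GPUs")
```

Under that assumption, 256 GPUs deliver roughly the throughput of 225 perfectly scaled GPUs.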
Apache MXNet demos
1. Image classification: using pre-trained models
Imagenet, multiple CNNs, MXNet
2. Image classification: fine-tuning a pre-trained model
CIFAR-10, ResNet-50, Keras + MXNet
3. Image classification: learning from scratch
MNIST, MLP & LeNet, MXNet
4. Machine Translation: translating German to English
News, LSTM, Sockeye + MXNet
Demo #1 – Image classification: using a pre-trained model
*** VGG16
[(0.46811387, 'n04296562 stage'), (0.24333163,
'n03272010 electric guitar'), (0.045918692, 'n02231487
walking stick, walkingstick, stick insect'),
(0.03316205, 'n04286575 spotlight, spot'),
(0.021694135, 'n03691459 loudspeaker, speaker, speaker
unit, loudspeaker system, speaker system')]
*** ResNet-152
[(0.8726753, 'n04296562 stage'), (0.046159592,
'n03272010 electric guitar'), (0.041658506, 'n03759954
microphone, mike'), (0.018624334, 'n04286575
spotlight, spot'), (0.0058045341, 'n02676566 acoustic
guitar')]
*** Inception v3
[(0.44991142, 'n04296562 stage'), (0.43065304,
'n03272010 electric guitar'), (0.067580454, 'n04456115
torch'), (0.012423956, 'n02676566 acoustic guitar'),
(0.0093934005, 'n03250847 drumstick')]
https://medium.com/@julsimon/an-introduction-to-the-mxnet-api-part-5-9e78534096db
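The lists above are the highest-probability classes out of the model's output vector. A minimal sketch of that ranking step in plain Python follows. `top_k` is a hypothetical helper, and the input reuses the five ResNet-152 values from the slide in shuffled order rather than a full 1,000-class output.

```python
def top_k(probabilities, labels, k=5):
    """Return the k most likely (probability, label) pairs, best first."""
    ranked = sorted(zip(probabilities, labels), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

# ResNet-152 values from the slide, deliberately out of order.
labels = [
    "n04286575 spotlight, spot",
    "n04296562 stage",
    "n02676566 acoustic guitar",
    "n03759954 microphone, mike",
    "n03272010 electric guitar",
]
probabilities = [0.018624334, 0.8726753, 0.0058045341, 0.041658506, 0.046159592]

top3 = top_k(probabilities, labels, k=3)
for probability, label in top3:
    print(round(probability, 4), label)
```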
Demo #2 – Image classification: fine-tuning a model
CIFAR-10 data set
• 60,000 images in 10 classes
• 32x32 color images
Initial training
• ResNet-50 CNN
• 200 epochs
• 82.12% validation accuracy
Cars vs. horses
• 88.8% validation accuracy
https://medium.com/@julsimon/keras-shoot-out-part-3-fine-tuning-7d1548c51a41
Demo #2 – Image classification: fine-tuning a model
• Freezing all layers but the last one
• Fine-tuning on "cars vs. horses" for 10 epochs
• 2 minutes on 1 GPU (p2)
• 98.8% validation accuracy
Epoch 10/10
10000/10000 [==============================] - 12s
loss: 1.6989 - acc: 0.9994 - val_loss: 1.7490 - val_acc: 0.9880
2000/2000 [==============================] - 2s
[1.7490020694732666, 0.98799999999999999]
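Freezing all layers but the last one means only the final layer's weights are updated during fine-tuning. The sketch below shows the pattern with a stand-in `Layer` class, so it runs without any framework. In Keras the equivalent is setting `layer.trainable = False` on the layers to freeze. The layer names and parameter counts are invented for illustration and are not those of ResNet-50.

```python
class Layer:
    """Minimal stand-in for a framework layer: name, parameter count, trainable flag."""
    def __init__(self, name, num_params):
        self.name = name
        self.num_params = num_params
        self.trainable = True

def freeze_all_but_last(layers):
    """Mark every layer except the last as frozen (not updated in training)."""
    for layer in layers[:-1]:
        layer.trainable = False
    layers[-1].trainable = True
    return layers

def trainable_params(layers):
    """Count the parameters that training would still update."""
    return sum(layer.num_params for layer in layers if layer.trainable)

# Hypothetical model: two feature-extraction blocks and a classifier.
layers = [Layer("block_1", 10000), Layer("block_2", 50000), Layer("classifier", 2050)]
before = trainable_params(layers)
after = trainable_params(freeze_all_but_last(layers))
print("trainable parameters:", before, "before freezing,", after, "after")
```

With far fewer weights to update, each epoch is cheaper, which is consistent with the short training time reported above.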
Demo #3 – Image classification: learning from scratch
MNIST data set
70,000 hand-written digits
28x28 grayscale images
https://medium.com/@julsimon/training-mxnet-part-1-mnist-6f0dc4210c62
Multi-Layer Perceptron vs. Handmade-Digits-From-Hell™
784/128/64/10, ReLU, AdaGrad, 100 epochs
97.51% validation accuracy
[[ 0.839 0.034 0.039 0.009 0. 0.008 0.066 0.002 0. 0.004]]
[[ 0. 0.988 0.001 0.003 0.001 0.001 0.002 0.003 0.001 0.002]]
[[ 0.006 0.01 0.95 0.029 0. 0.001 0.004 0. 0. 0.]]
[[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
[[ 0. 0.001 0.005 0.001 0.982 0.001 0. 0.007 0. 0.002]]
[[ 0.001 0.001 0. 0.078 0. 0.911 0.01 0. 0. 0.]]
[[ 0.003 0. 0.019 0. 0.005 0.004 0.863 0. 0.105 0.001]]
[[ 0.001 0.008 0.098 0.033 0. 0. 0. 0.852 0.004 0.004]]
[[ 0.001 0. 0.006 0. 0. 0.001 0.002 0. 0.991 0.]]
[[ 0.002 0.158 0.007 0.117 0.082 0.001 0. 0.239 0.17 0.224]]
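For a sense of scale, the 784/128/64/10 layout can be turned into a parameter count. The sketch assumes plain fully connected layers with one bias per output unit, which the slide does not spell out. `dense_params` is a made-up helper.

```python
def dense_params(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 784 inputs (28x28 pixels), two hidden layers, 10 output classes.
layer_sizes = [784, 128, 64, 10]
total = dense_params(layer_sizes)
print("parameters:", total)
```

Most of the parameters sit in the first layer, which connects every pixel to every one of the 128 hidden units.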
LeNet CNN vs. Handmade-Digits-From-Hell™
ReLU instead of tanh, 10 epochs, AdaGrad
99.20% validation accuracy
[[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
[[ 0. 0. 0.001 0. 0.998 0. 0. 0.001 0. 0.]]
[[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[[ 0. 0. 0. 0.001 0. 0. 0. 0.999 0. 0.]]
[[ 0. 0. 0.006 0. 0. 0. 0. 0. 0.994 0.]]
[[ 0. 0. 0. 0.001 0.001 0. 0. 0.001 0.001 0.996]]
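Each row above is a probability vector over the ten digits, and the predicted digit is the index of the largest value. The sketch below compares the last row of each model, assuming the rows correspond to handmade digits 0 to 9 in order. The diagonal pattern suggests this, but the transcript does not state it. `predict` is a hypothetical helper.

```python
def predict(probabilities):
    """Return (predicted digit, its probability) for one output row."""
    best = max(range(len(probabilities)), key=lambda digit: probabilities[digit])
    return best, probabilities[best]

# Last row of each model's output, copied from the slides.
mlp_row = [0.002, 0.158, 0.007, 0.117, 0.082, 0.001, 0.0, 0.239, 0.17, 0.224]
lenet_row = [0.0, 0.0, 0.0, 0.001, 0.001, 0.0, 0.0, 0.001, 0.001, 0.996]

mlp_digit, mlp_confidence = predict(mlp_row)
lenet_digit, lenet_confidence = predict(lenet_row)
print("MLP predicts", mlp_digit, "with probability", mlp_confidence)
print("LeNet predicts", lenet_digit, "with probability", lenet_confidence)
```

Under that assumption, the MLP spreads its probability over several digits on the last sample, while LeNet picks one digit with high confidence.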
Demo #4 – Machine Translation: German to English
• Sockeye
• 5.8M sentences (news headlines), 5 hours of training on 8 GPUs (p2)
./translate.sh "Chopin zählt zu den bedeutendsten Persönlichkeiten der Musikgeschichte Polens ."
Chopin is one of the most important personalities of Poland’s history
./translate.sh "Hotelbetreiber müssen künftig nur den Rundfunkbeitrag bezahlen, wenn ihre Zimmer auch eine Empfangsmöglichkeit bieten ."
in the future , hotel operators must pay only the broadcasting fee if their
rooms also offer a reception facility .
https://aws.amazon.com/blogs/ai/train-neural-machine-translation-models-with-sockeye/
Anything you dream is fiction, and anything
you accomplish is science, the whole history
of mankind is nothing but science fiction.
Ray Bradbury
Resources
https://aws.amazon.com/ai/
https://aws.amazon.com/blogs/ai/
https://mxnet.io
https://github.com/gluon-api/
https://reinvent.awsevents.com/ watch this space ;)
https://medium.com/@julsimon/
https://aws.amazon.com/fr/events/semaine-ia/
Thank you!
Julien Simon, AI/ML Evangelist, EMEA
@julsimon
https://aws.amazon.com/evangelists/julien-simon/