Microsoft
Cognitive
Toolkit
Deep learning at Microsoft
• Microsoft Cognitive Services
• Skype Translator
• Cortana
• Bing
• HoloLens
• Microsoft Research
Microsoft Cognitive Services
ImageNet: Microsoft 2015 ResNet
ImageNet classification top-5 error (%)
• ILSVRC2010, NEC America: 28.2
• ILSVRC2011, Xerox: 25.8
• ILSVRC2012, AlexNet: 16.4
• ILSVRC2013, Clarifai: 11.7
• ILSVRC2014, VGG: 7.3
• ILSVRC2014, GoogLeNet: 6.7
• ILSVRC2015, ResNet: 3.5
All 5 Microsoft entries took 1st place this year: ImageNet classification,
ImageNet localization, ImageNet detection, COCO detection, and COCO segmentation
Image Similarity
Goal: given query image, find similar images.
• Customer: Anonymous ISV (Azure Partner)
• Task: given a retail image, find same product on competitor websites (to compare price).
• Existing solution: based solely on mining text information from the websites of Target, Macy's, etc.
• Customer asked for individual similarity measures (e.g. texture, neck style, etc.).
Bing / Bing Ads
Microsoft Translator http://translate.it
PowerPoint plug-in for translating speech to subtitles
Microsoft’s historic speech breakthrough
• Microsoft 2016 research system for
conversational speech recognition
• 5.9% word-error rate
• enabled by CNTK’s multi-server scalability
[W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke,
D. Yu, G. Zweig: “Achieving Human Parity in Conversational
Speech Recognition,” https://arxiv.org/abs/1610.05256]
Microsoft Customer Support Agent
Microsoft Cognitive Toolkit (CNTK)
• Microsoft’s open-source deep-learning toolkit
• https://github.com/Microsoft/CNTK
• Created by Microsoft Speech researchers (Dong Yu et al.) in 2012,
“Computational Network Toolkit”
• On GitHub since Jan 2016 under MIT license
• Python support since Oct 2016 (beta), rebranded as “Cognitive Toolkit”
• External contributions e.g. from MIT, Stanford and NVidia
Microsoft Cognitive Toolkit (CNTK)
• Over 80% of Microsoft’s internal DL workload runs on CNTK
• 1st-class on Linux and Windows, docker support
• Python, C++, C#, Java
• Internal == External
CNTK – The Fastest Toolkit
| Network | Caffe | CNTK | MXNet | TensorFlow | Torch |
|---|---|---|---|---|---|
| FCN5 (1024) | 55.329 ms | 51.038 ms | 60.448 ms | 62.044 ms | 52.154 ms |
| AlexNet (256) | 36.815 ms | 27.215 ms | 28.994 ms | 103.960 ms | 37.462 ms |
| ResNet (32) | 143.987 ms | 81.470 ms | 84.545 ms | 181.404 ms | 90.935 ms |
| LSTM (256) (v7 benchmark in parentheses) | - | 43.581 ms (44.917 ms) | 288.142 ms (284.898 ms) | - (223.547 ms) | 1130.606 ms (906.958 ms) |

Source: http://dlbench.comp.hkbu.edu.hk/ (benchmarking by HKBU, Version 8). Single Tesla K80 GPU, CUDA: 8.0, cuDNN: v5.1
Caffe: 1.0rc5 (39f28e4), CNTK: 2.0 Beta10 (1ae666d), MXNet: 0.93 (32dc3a2), TensorFlow: 1.0 (4ac9c09), Torch: 7 (748f5e3)
“CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.”
Theano only supports 1 GPU
Achieved with 1-bit gradient quantization algorithm
[Figure: bar chart, speed comparison (samples/second), higher = better; note: December 2015. Toolkits: CNTK, Theano, TensorFlow, Torch 7, Caffe. Series: 1 GPU, 1 x 4 GPUs, 2 x 4 GPUs (8 GPUs). Vertical axis from 0 to 80000.]
Superior performance
Scalability
What is new in CNTK 2.0?
https://esciencegroup.com/2016/11/10/cntk-revisited-a-new-deep-learning-toolkit-release-from-microsoft/
Microsoft has now released a major upgrade of the software
and rebranded it as part of the Microsoft Cognitive
Toolkit. This release is a major improvement over the initial
release.
There are two major changes from the first release that you
will see when you begin to look at the new release. First is
that CNTK now has a very nice Python API and, second, the
documentation and examples are excellent.
Installing the software from the binary builds is very easy on
both Ubuntu Linux and Windows.
CNTK Other Advantages
• Python and C++ API
  • Mostly implemented in C++
  • Low level + high level Python API
• Extensibility
  • User functions and learners in pure Python
• Readers
  • Distributed, highly efficient built-in data readers
• Internal == External
The Microsoft Cognitive Toolkit (CNTK)
• CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks, supporting relevant network types and applications.
• CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.
MNIST Handwritten Digits (OCR)
• Data set of handwritten digits with
  • 60,000 training images
  • 10,000 test images
• Each image is: 28 x 28 pixels
[Figure: sample handwritten digits with their corresponding labels]
Multi-layer perceptron
• Input: 28 x 28 pix image, flattened to 784 pixels (x)
• Deep Model
  • D: i = 784, O = 400, a = relu
  • D: i = 400, O = 200, a = relu
  • D: i = 200, O = 10, a = None (10 nodes)
• Weights
  • 784 x 400 + 400 bias
  • 400 x 200 + 200 bias
  • 200 x 10 + 10 bias
• Outputs: z0 z1 z2 z3 z4 z5 z6 z7 z8 z9
• softmax: $p_i = \dfrac{e^{z_i}}{\sum_{j=0}^{9} e^{z_j}}$
• Example output: 0.08 0.08 0.10 0.17 0.11 0.09 0.08 0.08 0.13 0.01
https://github.com/Microsoft/CNTK/tree/master/Tutorials
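As a rough illustration of the two quantities on this slide, the sketch below computes a softmax over raw scores and counts the trainable parameters of the 784, 400, 200, 10 layer stack. It uses only the Python standard library, not CNTK; the helper names `softmax` and `count_parameters` are made up for this example.

```python
import math

def softmax(z):
    """Map raw scores z to probabilities that sum to 1."""
    m = max(z)                               # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def count_parameters(sizes):
    """Weights (d_in * d_out) plus one bias per output node, summed over layers."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(sizes, sizes[1:]))

# Layer sizes taken from the slide: 784 -> 400 -> 200 -> 10.
layer_sizes = [784, 400, 200, 10]
n_params = count_parameters(layer_sizes)

# Ten equal scores give a uniform distribution over the ten digits.
probs = softmax([0.0] * 10)
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large scores.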
Loss function
• Input: 28 x 28 pix image
• Model(w, b)
• Predicted Probabilities (p): 0.08 0.08 0.10 0.17 0.11 0.09 0.08 0.08 0.13 0.01
• Label One-hot encoded (Y): 0 0 0 1 0 0 0 0 0 0
• Cross entropy error: $ce = -\sum_{j=0}^{9} y_j \log p_j$
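A small worked example may help here: with a one-hot label, the cross-entropy sum keeps only the term for the true class. The sketch below plugs in the label and predicted probabilities from the slide; the function name `cross_entropy` is just a local helper, not the CNTK operator.

```python
import math

def cross_entropy(y, p):
    """ce = -sum_j y_j * log(p_j); terms with y_j == 0 contribute nothing."""
    return -sum(y_j * math.log(p_j) for y_j, p_j in zip(y, p) if y_j != 0)

y = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]     # one-hot label from the slide
p = [0.08, 0.08, 0.10, 0.17, 0.11, 0.09, 0.08, 0.08, 0.13, 0.01]
ce = cross_entropy(y, p)               # reduces to -log(0.17)
```

The loss shrinks toward 0 as the probability assigned to the labeled class approaches 1.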
CNTK Model
Example: 2-hidden layer feed-forward NN
    $h_1 = \sigma(W_1 x + b_1)$                      h1 = sigmoid (x @ W1 + b1)
    $h_2 = \sigma(W_2 h_1 + b_2)$                    h2 = sigmoid (h1 @ W2 + b2)
    $P = \mathrm{softmax}(W_{out} h_2 + b_{out})$    P = softmax (h2 @ Wout + bout)
with input $x \in \mathbb{R}^M$ and one-hot label $y \in \mathbb{R}^J$
and cross-entropy training criterion
    $ce = y^T \log P$                                ce = cross_entropy (P, y)
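To make the four lines concrete, here is a minimal forward pass written with plain Python lists instead of CNTK tensors. The dimensions (M = 4, 5 hidden units, J = 3), the random initialization, and the helper names `matvec`, `add` and `init` are assumptions chosen for illustration only.

```python
import math
import random

def matvec(x, W):
    """Row vector times matrix: (x @ W)[j] = sum_i x[i] * W[i][j]."""
    return [sum(x_i * row[j] for x_i, row in zip(x, W)) for j in range(len(W[0]))]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-u)) for u in v]

def softmax(v):
    m = max(v)
    e = [math.exp(u - m) for u in v]
    s = sum(e)
    return [u / s for u in e]

def init(rows, cols, rng):
    """Small random weights; the scheme is an arbitrary choice for this sketch."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
M, H, J = 4, 5, 3                       # hypothetical input, hidden and label sizes
W1, b1 = init(M, H, rng), [0.0] * H
W2, b2 = init(H, H, rng), [0.0] * H
Wout, bout = init(H, J, rng), [0.0] * J

x = [0.5, -1.0, 0.25, 2.0]              # one input sample
y = [0, 1, 0]                           # its one-hot label

h1 = sigmoid(add(matvec(x, W1), b1))
h2 = sigmoid(add(matvec(h1, W2), b2))
P = softmax(add(matvec(h2, Wout), bout))
ce = sum(y_j * math.log(p_j) for y_j, p_j in zip(y, P))   # y^T log P, as on the slide
```

With the sign convention of the slide, `ce` is a log-probability (at most 0) that training tries to maximize.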
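CNTK Model
[Figure: computation graph. Inputs x and y; parameters W1, b1, W2, b2, Wout, bout; two stages of •, +, σ produce h1 and h2; a final •, +, softmax produces P; cross_entropy combines P and y into ce.]
    h1 = sigmoid (x @ W1 + b1)
    h2 = sigmoid (h1 @ W2 + b2)
    P = softmax (h2 @ Wout + bout)
    ce = cross_entropy (P, y)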
CNTK Model
• Nodes: functions (primitives)
  • Can be composed into reusable composites
• Edges: values
  • Incl. tensors, sparse
• Automatic differentiation
  • ∂F / ∂in = ∂F / ∂out ∙ ∂out / ∂in
• Deferred computation → execution engine
• Editable, clonable
LEGO-like composability allows CNTK to support a wide range of networks & applications
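The chain-rule bullet can be illustrated with a toy reverse-mode differentiator over scalar nodes. This is a teaching sketch, not how CNTK is implemented: the `Value` class and `backward` function are invented for this example, and each node simply stores ∂out / ∂in for every input.

```python
import math

class Value:
    """A graph node: a value plus, for each input, the local derivative d(out)/d(in)."""
    def __init__(self, data, inputs=()):
        self.data = data
        self.inputs = inputs          # tuple of (input_node, d_out / d_in)
        self.grad = 0.0

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data, ((self, other.data), (other, self.data)))

def sigmoid(v):
    s = 1.0 / (1.0 + math.exp(-v.data))
    return Value(s, ((v, s * (1.0 - s)),))

def backward(out):
    """Propagate dF/d(out) back through the graph in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.inputs:
                visit(parent)
            order.append(node)

    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.inputs:
            parent.grad += node.grad * local   # dF/d_in = dF/d_out * d_out/d_in

x, w, b = Value(2.0), Value(-0.5), Value(0.25)
h = sigmoid(x * w + b)
backward(h)
```

After `backward(h)`, `w.grad`, `x.grad` and `b.grad` hold the derivatives of `h` with respect to each leaf.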
Authoring networks as functions
• “model function”
  • features → predictions
  • defines the model structure & parameter initialization
  • holds parameters that will be learned by training
• “criterion function”
  • (features, labels) → (training loss, additional metrics)
  • defines training and evaluation criteria on top of the model function
  • provides gradients w.r.t. training criteria
[Figure: computation graph of the 2-hidden layer network, from inputs x and y to ce]
Authoring networks as functions
• CNTK model: neural networks are functions
  • pure functions
  • with “special powers”:
    • can compute a gradient w.r.t. any of its nodes
    • external deity can update model parameters
• user specifies network as function objects:
  • formula as a Python function (low level, e.g. LSTM)
  • function composition of smaller sub-networks (layering)
  • higher-order functions (equiv. of scan, fold, unfold)
  • model parameters held by function objects
• “compiled” into the static execution graph under the hood
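The ideas of layering, higher-order functions, and parameters held by function objects can be mimicked in a few lines of ordinary Python. The sketch below is an analogy only: `sequential`, `dense` and `fold` are toy helpers written for this example and are not the CNTK layers library.

```python
from functools import reduce

def sequential(*layers):
    """Compose layers left to right: sequential(f, g)(x) == g(f(x))."""
    def model(x):
        return reduce(lambda value, layer: layer(value), layers, x)
    return model

def dense(weight, bias):
    """A toy scalar 'layer' whose parameters live inside the function object."""
    params = {"w": weight, "b": bias}
    def layer(x):
        return params["w"] * x + params["b"]
    layer.params = params             # stays reachable so a trainer can update it
    return layer

def relu(x):
    return max(0.0, x)

def fold(step, initial):
    """Higher-order function: run step over a sequence and return the final state."""
    def apply(sequence):
        return reduce(step, sequence, initial)
    return apply

first = dense(2.0, -1.0)
model = sequential(first, relu, dense(0.5, 3.0))
before = model(4.0)                   # relu(2 * 4 - 1) * 0.5 + 3
first.params["w"] = 0.0               # an outside party updates a parameter in place
after = model(4.0)                    # relu(0 * 4 - 1) * 0.5 + 3
running_sum = fold(lambda state, x: state + x, 0.0)([1.0, 2.0, 3.0])
```

The model itself stays a plain function of its input, while the update to `first.params` shows how something outside the function can change its behavior between calls.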
Layers lib: full list of layers/blocks
• layers/blocks.py:
  • LSTM(), GRU(), RNNUnit()
  • Stabilizer(), identity
  • ForwardDeclaration(), Tensor[], SparseTensor[], Sequence[], SequenceOver[]
• layers/layers.py:
  • Dense(), Embedding()
  • Convolution(), Convolution1D(), Convolution2D(), Convolution3D(), Deconvolution()
  • MaxPooling(), AveragePooling(), GlobalMaxPooling(), GlobalAveragePooling(), MaxUnpooling()
  • BatchNormalization(), LayerNormalization()
  • Dropout(), Activation()
  • Label()
• layers/higher_order_layers.py:
  • Sequential(), For(), operator >>, (function tuples)
  • ResNetBlock(), SequentialClique()
• layers/sequence.py:
  • Delay(), PastValueWindow()
  • Recurrence(), RecurrenceFrom(), Fold(), UnfoldFrom()
• models/models.py:
  • AttentionModel()
CNTK workflow
Script configures and executes through CNTK Python APIs…
• reader
  • minibatch source
  • task-specific deserializer
  • automatic randomization
  • distributed reading
• network
  • model function
  • criterion function
  • CPU/GPU execution engine
  • packing, padding
• trainer
  • SGD (momentum, Adam, …)
  • minibatching
[Figure: workflow diagram with the corpus as input and the model as output]
As easy as 1-2-3
    from cntk import *

    # reader
    def create_reader(path, is_training):
        ...

    # network
    def create_model_function():
        ...
    def create_criterion_function(model):
        ...

    # trainer (and evaluator)
    def train(reader, model):
        ...
    def evaluate(reader, model):
        ...

    # main function
    model = create_model_function()

    reader = create_reader(..., is_training=True)
    train(reader, model)

    reader = create_reader(..., is_training=False)
    evaluate(reader, model)
Workflow
• prepare data
• configure reader, network, learner (Python)
• train:
    mpiexec --np 16 --hosts server1,server2,server3,server4 \
        python my_cntk_script.py
Prepare data: reader
    def create_reader(map_file, mean_file, is_training):
        # image preprocessing pipeline
        transforms = [
            ImageDeserializer.crop(crop_type='Random', ratio=0.8, jitter_type='uniRatio'),
            ImageDeserializer.scale(width=image_width, height=image_height, channels=num_channels,
                                    interpolations='linear'),
            ImageDeserializer.mean(mean_file)
        ]
        # deserializer
        return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
            features=StreamDef(field='image', transforms=transforms),
            labels=StreamDef(field='label', shape=num_classes)
        )), randomize=is_training,
            epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)
• automatic on-the-fly randomization important for large data sets
• readers compose, e.g. image → text caption
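The reader's `randomize` and `epoch_size` settings can be pictured with a small generator: training reshuffles on every sweep and repeats forever, evaluation does one ordered sweep. This is a simplified stand-in written for illustration; it shuffles a full index list per sweep, which is an assumption of this sketch and not a description of CNTK's actual randomization scheme.

```python
import random

def minibatches(num_samples, batch_size, is_training, seed=0):
    """Yield lists of sample indices; reshuffle every sweep when training."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    while True:
        if is_training:
            rng.shuffle(indices)              # new order for each sweep over the data
        for start in range(0, num_samples, batch_size):
            yield indices[start:start + batch_size]
        if not is_training:
            return                            # evaluation: a single full data sweep

eval_batches = list(minibatches(10, 4, is_training=False))
train_stream = minibatches(10, 4, is_training=True)
first_sweep = [next(train_stream) for _ in range(3)]   # 4 + 4 + 2 samples
```

Because only indices are shuffled, the data itself can stay on disk and be fetched as each minibatch is requested.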
Distributed training
• prepare data
• configure reader, network, learner (Python)
• train: -- distributed!
    mpiexec --np 16 --hosts server1,server2,server3,server4 \
        python my_cntk_script.py
Workflow
• prepare data
• configure reader, network, learner (Python)
• train:
    mpiexec --np 16 --hosts server1,server2,server3,server4 \
        python my_cntk_script.py
• deploy
  • offline (Python): apply model file-to-file
  • your code: embed model through C++ API
  • online: web service wrapper through C#/Java API
CNTK Unique Features
• Symbolic loops over sequences with dynamic scheduling
• Turn graph into parallel program through minibatching
• Unique parallel training algorithms (1-bit SGD, Block Momentum)
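The core idea behind 1-bit gradient quantization can be sketched as follows: each gradient value is sent as a single bit, and the quantization error is kept locally and added to the next minibatch's gradient. The code below is a simplified illustration under stated assumptions: it works on a flat Python list, uses the mean of each sign group as the reconstruction level, and the names `one_bit_quantize` and `decode` are invented here. It is not CNTK's implementation.

```python
def one_bit_quantize(gradient, residual):
    """Quantize (gradient + carried-over residual) to one bit per value.

    Returns (bits, (low, high) reconstruction levels, new residual).
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    bits = [c >= 0.0 for c in corrected]
    pos = [c for c, bit in zip(corrected, bits) if bit]
    neg = [c for c, bit in zip(corrected, bits) if not bit]
    # Reconstruction levels: mean of each group (0.0 if a group is empty).
    high = sum(pos) / len(pos) if pos else 0.0
    low = sum(neg) / len(neg) if neg else 0.0
    decoded = [high if bit else low for bit in bits]
    new_residual = [c - d for c, d in zip(corrected, decoded)]   # error feedback
    return bits, (low, high), new_residual

def decode(bits, levels):
    """What a receiving worker reconstructs from the bits and the two levels."""
    low, high = levels
    return [high if bit else low for bit in bits]

gradient = [0.5, -0.25, 1.5, -0.75]
bits, levels, residual = one_bit_quantize(gradient, [0.0] * 4)
sent = decode(bits, levels)
```

Only `bits` and the two `levels` need to cross the network; `residual` stays on the sender and is passed back in with the next gradient so that the quantization error is not lost.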
Symbolic Loops over Sequential Data
extend our example to a recurrent network (RNN)
    $h_1(t) = \sigma(W_1 x(t) + H_1 h_1(t-1) + b_1)$      h1 = sigmoid(x @ W1 + past_value(h1) @ H1 + b1)
    $h_2(t) = \sigma(W_2 h_1(t) + H_2 h_2(t-1) + b_2)$    h2 = sigmoid(h1 @ W2 + past_value(h2) @ H2 + b2)
    $P(t) = \mathrm{softmax}(W_{out} h_2(t) + b_{out})$   P = softmax(h2 @ Wout + bout)
    $ce(t) = L^T(t) \log P(t)$                            ce = cross_entropy(P, L)
    $\sum_{corpus} ce(t) = \max$
no explicit notion of time
Symbolic Loops over Sequential Data
[Figure: computation graph of the recurrent network. As in the feed-forward graph, but h1 and h2 each feed back through a delay node (z-1) and a recurrent weight matrix (H1, H2) into the sum of their own layer.]
    h1 = sigmoid(x @ W1 + past_value(h1) @ H1 + b1)
    h2 = sigmoid(h1 @ W2 + past_value(h2) @ H2 + b2)
    P = softmax(h2 @ Wout + bout)
    ce = cross_entropy(P, L)
• CNTK automatically unrolls cycles → deferred computation
• Efficient and composable
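For comparison with the symbolic form, this is what the first recurrent layer looks like when the time loop is written out by hand. It is a plain-Python sketch with made-up sizes (1 input, 2 hidden units) and an assumed zero initial state; `run_recurrence`, `matvec` and `add` are helper names chosen for this example.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-u)) for u in v]

def matvec(x, W):
    """Row vector times matrix: (x @ W)[j] = sum_i x[i] * W[i][j]."""
    return [sum(x_i * row[j] for x_i, row in zip(x, W)) for j in range(len(W[0]))]

def add(*vectors):
    return [sum(values) for values in zip(*vectors)]

def run_recurrence(xs, W1, H1, b1):
    """h1(t) = sigmoid(x(t) @ W1 + h1(t-1) @ H1 + b1), with h1(-1) assumed to be 0."""
    h_prev = [0.0] * len(b1)          # what past_value yields at the sequence start
    outputs = []
    for x_t in xs:                    # the explicit time loop that the symbolic form hides
        h_t = sigmoid(add(matvec(x_t, W1), matvec(h_prev, H1), b1))
        outputs.append(h_t)
        h_prev = h_t
    return outputs

W1 = [[0.5, -0.5]]                    # hypothetical 1-input, 2-hidden-unit layer
H1 = [[0.1, 0.0], [0.0, 0.1]]
b1 = [0.0, 0.0]
hs = run_recurrence([[1.0], [0.0], [-1.0]], W1, H1, b1)
```

In the symbolic version the same dependency is expressed once through `past_value`, and the toolkit takes care of stepping through time.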
Batch-Scheduling of Variable-Length Sequences
• minibatches containing sequences of different lengths are automatically packed and padded
• CNTK handles the special cases:
  • past_value operation correctly resets state and gradient at sequence boundaries
  • non-recurrent operations just pretend there is no padding (“garbage-in/garbage-out”)
  • sequence reductions
[Figure: sequences 1 to 7 of different lengths packed into the rows of a minibatch. Vertical axis: parallel sequences; horizontal axis: time steps computed in parallel; unused slots at the end of a row are padding.]
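A rough picture of packing and padding: short sequences are laid end to end in a row as long as they fit within the length of the longest sequence, and leftover slots are filled with a padding marker. The greedy first-fit strategy, the `PAD` marker, and the function name `pack_sequences` are assumptions of this sketch, not a description of CNTK's scheduler.

```python
PAD = None   # marker for unused slots (hypothetical choice for this sketch)

def pack_sequences(sequences):
    """Greedily pack sequences into rows as wide as the longest one; pad the rest."""
    width = max(len(seq) for seq in sequences)
    rows = []                                   # each row: sequences laid end to end
    for seq in sequences:
        for row in rows:
            if sum(len(s) for s in row) + len(seq) <= width:
                row.append(seq)                 # first row with enough room
                break
        else:
            rows.append([seq])                  # no row had room: start a new one
    packed = []
    for row in rows:
        flat = [item for seq in row for item in seq]
        packed.append(flat + [PAD] * (width - len(flat)))
    return packed

seqs = [list("aaaaaa"), list("bbb"), list("cc"), list("dddd")]
packed = pack_sequences(seqs)
```

A real implementation also has to record where each sequence starts inside a row, since that is where recurrent state must be reset.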
![Page 49: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/49.jpg)
Microsoft
Cognitive
Toolkit
• minibatches containing sequences of different lengths are automatically packed and padded
• CNTK handles the special cases:• past_value operation correctly resets state and gradient at sequence boundaries
• non-recurrent operations just pretend there is no padding (“garbage-in/garbage-out”)
• sequence reductions
pa
ralle
l se
qu
en
ces
time steps computed in parallel
padding
sequence 1
sequence 2 sequence 3
sequence 4
sequence 5 sequence 6
sequence 7
Batch-Scheduling of Variable-Length Sequences
![Page 50: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/50.jpg)
Microsoft
Cognitive
Toolkit
• minibatches containing sequences of different lengths are automatically packed and padded
• CNTK handles the special cases:• past_value operation correctly resets state and gradient at sequence boundaries
• non-recurrent operations just pretend there is no padding (“garbage-in/garbage-out”)
• sequence reductions
pa
ralle
l se
qu
en
ces
time steps computed in parallel
padding
sequence 1
sequence 2 sequence 3
sequence 3
sequence 5 sequence 6
sequence 7
Batch-Scheduling of Variable-Length Sequences
![Page 56: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/56.jpg)
• minibatches containing sequences of different lengths are automatically packed and padded
• speed-up is automatic:
[Figure: sequence 1 through sequence 7 packed into rows (axis: parallel sequences), with padding; time steps computed in parallel]
[Chart: Speed comparison on RNNs. Naïve, single sequence: 1. Optimized, multi-sequence: >20.]
Batch-Scheduling of Variable-Length Sequences
![Page 57: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/57.jpg)
• Data-parallelism: distribute minibatch over workers, all-reduce partial gradients
Data-Parallel Training
[Figure: node 1, node 2, node 3 combining their partial gradients through an all-reduce (Σ)]
![Page 58: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/58.jpg)
• Data-parallelism: distribute minibatch over workers, all-reduce partial gradients
Data-parallel training
[Figure: node 1, node 2, node 3]
• ring algorithm: O(2 (K-1)/K M), i.e. O(1) w.r.t. K
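The ring cost can be checked with a small NumPy simulation. This is a sketch under assumptions: K is taken to be the number of workers and M the number of gradient values (the slide does not define them), and `ring_allreduce` is a hypothetical name, not a CNTK function. The simulation runs a reduce-scatter followed by an all-gather and counts how many values each worker sends.

```python
import numpy as np


def ring_allreduce(grads):
    """Simulate ring all-reduce. Returns (summed gradient per worker,
    number of values sent by each worker)."""
    K = len(grads)
    # Every worker splits its gradient into K chunks.
    chunks = [[c.copy() for c in np.array_split(np.asarray(g, dtype=float), K)]
              for g in grads]
    sent = [0] * K

    # Reduce-scatter: after K-1 steps, worker k holds the full sum of chunk (k+1) % K.
    for step in range(K - 1):
        outgoing = [chunks[k][(k - step) % K].copy() for k in range(K)]
        for k in range(K):
            c = (k - step) % K
            nxt = (k + 1) % K
            chunks[nxt][c] = chunks[nxt][c] + outgoing[k]
            sent[k] += outgoing[k].size

    # All-gather: circulate the finished chunks around the ring.
    for step in range(K - 1):
        outgoing = [chunks[k][(k + 1 - step) % K].copy() for k in range(K)]
        for k in range(K):
            c = (k + 1 - step) % K
            chunks[(k + 1) % K][c] = outgoing[k]
            sent[k] += outgoing[k].size

    return [np.concatenate(ch) for ch in chunks], sent


rng = np.random.default_rng(0)
K, M = 3, 12
grads = [rng.normal(size=M) for _ in range(K)]
results, sent = ring_allreduce(grads)
print(sent, 2 * (K - 1) * M // K)  # values sent per worker vs. 2 (K-1)/K M
```

Each worker sends 2 (K-1) chunks of M/K values, so the per-worker traffic approaches 2M as K grows instead of growing with K.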
![Page 59: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/59.jpg)
How to reduce communication cost:
communicate less each time
• 1-bit SGD: [F. Seide, H. Fu, J. Droppo, G. Li, D. Yu: “1-Bit Stochastic Gradient Descent...Distributed Training of Speech DNNs”, Interspeech 2014]
  • quantize gradients to 1 bit per value
  • trick: carry over quantization error to next minibatch
[Figure: GPU 1, GPU 2, GPU 3 exchanging the minibatch gradient, 1-bit quantized with residual]
Data-parallel training
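The quantize-and-carry-over trick can be sketched in NumPy. The details here are assumptions chosen for illustration and may differ from the cited paper and from CNTK: values are thresholded at zero, each of the two groups is reconstructed as its mean, and `one_bit_quantize` is a hypothetical name.

```python
import numpy as np


def one_bit_quantize(grad, residual):
    """Quantize (grad + residual) to one bit per value.
    Returns (bits, decoded values, new residual)."""
    corrected = grad + residual            # add the error carried over from last time
    bits = corrected >= 0                  # the 1 bit that would be communicated
    # Assumed reconstruction levels: the mean of each group.
    pos = corrected[bits].mean() if bits.any() else 0.0
    neg = corrected[~bits].mean() if (~bits).any() else 0.0
    decoded = np.where(bits, pos, neg)     # what the receiver reconstructs
    new_residual = corrected - decoded     # quantization error, kept locally
    return bits, decoded, new_residual


rng = np.random.default_rng(0)
residual = np.zeros(8)
grad_sum = np.zeros(8)
decoded_sum = np.zeros(8)
for _ in range(5):                         # five minibatches
    grad = rng.normal(size=8)
    bits, decoded, residual = one_bit_quantize(grad, residual)
    grad_sum += grad
    decoded_sum += decoded

# Nothing is lost, only delayed: applied updates plus the pending residual
# equal the sum of the true gradients.
print(np.allclose(decoded_sum + residual, grad_sum))
```

Because the error is fed back into the next minibatch, the quantization noise does not accumulate over time.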
![Page 60: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/60.jpg)
How to reduce communication cost:
communicate less each time
• 1-bit SGD: [F. Seide, H. Fu, J. Droppo, G. Li, D. Yu: “1-Bit Stochastic Gradient Descent...Distributed Training of Speech DNNs”, Interspeech 2014]
  • quantize gradients to 1 bit per value
  • trick: carry over quantization error to next minibatch
communicate less often
• Automatic MB sizing [F. Seide, H. Fu, J. Droppo, G. Li, D. Yu: “On Parallelizability of Stochastic Gradient Descent...”, ICASSP 2014]
• Block momentum [K. Chen, Q. Huo: “Scalable training of deep learning machines by incremental block training…,” ICASSP 2016]
  • Very recent, very effective parallelization method
  • Combines model averaging with error-residual idea
Data-Parallel Training
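A block-momentum style update can be sketched as follows. This is a minimal sketch, assuming a common formulation in which workers train independently for a block of data, their models are averaged, and the averaged change is filtered with a momentum term before being applied to the global model. The function name and the parameters `momentum` and `block_lr` are hypothetical, and the cited paper should be consulted for the exact method.

```python
import numpy as np


def block_momentum_update(w_global, delta_prev, worker_models, momentum, block_lr):
    """One synchronization step: average the worker models, then apply a
    momentum-filtered version of the averaged change to the global model."""
    w_avg = np.mean(worker_models, axis=0)        # plain model averaging
    g = w_avg - w_global                          # change proposed by this block
    delta = momentum * delta_prev + block_lr * g  # carry part of earlier updates forward
    return w_global + delta, delta


w0 = np.zeros(2)
workers = [np.array([1.0, 1.0]), np.array([3.0, 1.0])]  # models after local training
w1, d1 = block_momentum_update(w0, np.zeros(2), workers, momentum=0.5, block_lr=1.0)
print(w1, d1)
```

With `momentum=0` and `block_lr=1` the update reduces to plain model averaging; the momentum term is what carries information across blocks.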
![Page 61: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/61.jpg)
Benchmark Result of Parallel Training on CNTK
1bit/BMUF Speedup Factors in LSTM Training

| | 4 GPUs | 8 GPUs | 16 GPUs | 32 GPUs | 64 GPUs |
|---|---|---|---|---|---|
| 1bit-average | 2.9 | 5.4 | 8.0 | | |
| 1bit-peak | 3.3 | 6.7 | 10.8 | | |
| BMUF-average | 3.7 | 6.9 | 13.8 | 25.5 | 43.7 |
| BMUF-peak | 4.1 | 8.1 | 14.1 | 27.3 | 54.0 |

• Training data: 2,670 hours of speech from real traffic of VS, SMD, and Cortana
• About 16 and 20 days to train the DNN and LSTM, respectively, on 1 GPU
Credit: Yongqiang Wang, Kai Chen, Qiang Huo
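As a rough reading aid, the speedup factors can be turned into parallel efficiency (speedup divided by GPU count) and an estimated wall-clock time. This sketch assumes that the five values 3.7, 6.9, 13.8, 25.5, 43.7 are the BMUF-average series for 4 to 64 GPUs, and that the 20-day single-GPU LSTM figure is the baseline for those speedups; neither mapping is stated explicitly on the slide.

```python
gpus = [4, 8, 16, 32, 64]
bmuf_average = [3.7, 6.9, 13.8, 25.5, 43.7]   # assumed mapping of chart values
lstm_days_1gpu = 20                            # "about 20 days" on 1 GPU

# Parallel efficiency: fraction of ideal linear speedup actually achieved.
efficiency = [s / g for s, g in zip(bmuf_average, gpus)]

# Estimated training time if the average speedup held for the whole run.
est_days = [lstm_days_1gpu / s for s in bmuf_average]

for g, e, d in zip(gpus, efficiency, est_days):
    print(f"{g:>2} GPUs: efficiency {e:.0%}, about {d:.1f} days")
```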
![Page 62: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/62.jpg)
Results
• Achievement
  • Almost linear speedup without degradation of model quality
  • Verified for training DNN, CNN, LSTM up to 64 GPUs for speech recognition, image classification, OCR, and click prediction tasks
  • Released in CNTK as a critical differentiator
  • Used for enterprise scale production data loads
  • Production tools in other companies such as iFLYTEK and Alibaba
![Page 63: Deep learning at Microsoft - Microsoft Cognitive Toolkit · PDF fileCognitive Toolkit Deep learning at Microsoft ... Cognitive Toolkit Image Similarity ... •Microsoft’s open-source](https://reader036.vdocuments.mx/reader036/viewer/2022081517/5abde2317f8b9a7e418c3463/html5/thumbnails/63.jpg)
Where to begin?
On GitHub: https://github.com/Microsoft/CNTK/wiki
Tutorials: https://www.cntk.ai/pythondocs/tutorials.html (latest release), https://github.com/Microsoft/CNTK/tree/master/Tutorials (latest)
Azure Notebooks: try the pre-hosted tutorials for free at https://notebooks.azure.com/cntk/libraries/tutorials
Seek help on Stack Overflow: http://stackoverflow.com/search?q=cntk (please add cntk tag)