Deep Learning, Neural Networks and AI Explained


tommaso.gritti@braincreators.com

@tommasogritti

Outline

● Deep Neural Network intuition
● Embeddings
● Transfer Learning
● Tips

Deep Neural Network omnipresence

https://trends.google.com/trends/explore?date=2008-03-09%202017-04-09&q=artificial%20intelligence,machine%20learning,deep%20learning

… or almost

Human brain: 10^11 neurons, 10^4 synapses per neuron, 10^16 "operations" per second, 250M neurons per mm^3, 180,000 km of "wires", 25 watts.

Deep Neural Networks sound cool. A GPU: 8×10^12 operations per second, 500 watts, 5760 (small) cores, $2000.

Toy example

Num website visits | Num pages visited | Avg time on page | Converted?
------------------ | ----------------- | ---------------- | ----------
1                  | 13                | 55s              | 1
2                  | 1                 | 141s             | 1
1                  | 8                 | 10s              | 0
3                  | 5                 | 127s             | 0
2                  | 3                 | 18s              | 0

"Num website visits" does not seem to influence the output.

"Num pages visited" above 9 seems to be a good threshold, but even when it is 1 a person can convert ⇒ no simple threshold.

"Avg time on page" above 128s seems to be a good threshold, but even at 55s a person can convert ⇒ no simple threshold.

Toy example

Since no single feature gives a clean threshold, combine them: assign a weight to each input (w1 = ??, w2 = ??, w3 = ??), multiply each input by its weight, and sum. A sum > 0 means converted, < 0 means not converted. For the first row (1, 13, 55):

1*w1 + 13*w2 + 55*w3 > 0

With weights w1 = -7.04, w2 = 0.28, w3 = 0.12, the first row (1, 13, 55) gives a weighted sum of 3.58 > 0 ⇒ YES, converted.

With the same weights, the fourth row (3, 5, 127) gives a weighted sum of -3.76 < 0 ⇒ NO, not converted.
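This single-node classifier fits in a few lines of Python (a sketch; the weights and input rows are the slide's, the function and variable names are mine):

```python
def predict(inputs, weights):
    """Single neuron: multiply each input by its weight, sum, threshold at 0."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > 0 else 0  # 1 = converted, 0 = not converted

weights = [-7.04, 0.28, 0.12]          # one weight per input feature
print(predict([1, 13, 55], weights))   # first row  -> 1 (converted)
print(predict([3, 5, 127], weights))   # fourth row -> 0 (not converted)
```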

Different weight vectors give different "methods". Four of them, applied to the same inputs (Num website visits = 3, Num pages visited = 5, Avg time on page = 127):

Method #1: w = (-7.04, 0.28, 0.12)
Method #2: w = (-2.4, 0.91, 0.013)
Method #3: w = (-3.9, 0.21, 0.03)
Method #4: w = (-1.1, 0.83, 0.18)

Toy example

Now feed the inputs (1, 13, 55) to all four methods in parallel and combine their four outputs into a final estimate. The three inputs form the input layer, Methods #1-#4 form a hidden layer, and the final estimate is the output layer.

Deep = lots of hidden layers
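As a sketch in Python/NumPy (the hidden-layer weights are the four methods above; the output-layer weights and the thresholding are my assumptions for illustration):

```python
import numpy as np

x = np.array([1.0, 13.0, 55.0])             # input layer: the three features

# Hidden layer: Methods #1-#4, one row of weights per method.
W_hidden = np.array([[-7.04, 0.28, 0.12],
                     [-2.4,  0.91, 0.013],
                     [-3.9,  0.21, 0.03],
                     [-1.1,  0.83, 0.18]])
hidden = W_hidden @ x                       # four weighted sums
hidden = np.maximum(hidden, 0)              # thresholding at 0 (ReLU-style)

# Output layer: combine the four method outputs into a final estimate
# (equal weighting is an assumption; a real network learns these too).
w_out = np.array([0.25, 0.25, 0.25, 0.25])
final_estimate = w_out @ hidden
print(final_estimate > 0)                   # True -> predicted converted
```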

Lots of configurations

http://www.asimovinstitute.org/neural-network-zoo/

Open source toolkits

Neural Networks - Take Home Message

● Applicable to endless domains: object recognition, medical imaging, automotive, finance, robotics, natural language processing, translation systems, speech recognition
● At the simplest level, only a series of nodes doing sums & thresholding
● Lots of variety
● Lots of open source tools

Embeddings

Context: object recognition
Automatically classify product images into 1000s of categories (e.g. "Dress", "Boot").

Image Classifier (old school)

Image dataset → image features → classifier
● Image features: edges, contrast, local patterns, colors
● Classifier: AdaBoost, SVM, Random Forests, Neural Network

Image Features (old school)

Input image → feature extraction → image features: a vector V = [0.2, -0.3, 0.15, 0.75, 0.11, …, 0.93]. The classifier then makes its decision as f(V) > 0.
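A sketch of the old-school decision step in Python (V is the slide's example vector; the linear f and its weights are my stand-in for AdaBoost/SVM/etc.):

```python
import numpy as np

# Hand-engineered features (edges, contrast, local patterns, colors)
# collected into one vector; the values are the slide's example.
V = np.array([0.2, -0.3, 0.15, 0.75, 0.11, 0.93])

# A linear classifier f(V) = w·V + b as a placeholder for the real model.
w = np.array([1.0, -0.5, 0.8, 1.2, 0.3, 0.9])   # assumed weights
b = -1.0                                        # assumed bias
print((w @ V + b) > 0)   # True -> e.g. classified as "Dress"
```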

Effort (old school)

Image dataset: 10% · image features: 45% · classifier: 45%

...still in use today

Dataset

Data gathered from 100s of scraped webshops: 5 million products, uncategorised. To get labels:

● Keyword filtering
● Visual clustering
● Human inspection

⇒ ~500 labelled classes, ~1000 images / class

Image classifier (the new way)

Feed the ~500 labelled classes, ~1000 images / class to a Deep Convolutional Neural Network (DCNN) and train it with backpropagation + gradient descent:

● Forward pass: push a training image (label: "pans") through the network; at first it predicts the wrong label, e.g. "shoe"
● Backpropagation + gradient descent = update the weights "towards the target"
● Repeat for all training images
● Repeat till stopping criteria
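A minimal sketch of this training loop in Python with PyTorch (the model, data, and hyperparameters here are placeholders, not the ones from the talk):

```python
import torch
import torch.nn as nn

# Stand-in model: a real DCNN and the ~500-class product dataset would
# go here; these shapes are placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 500))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One dummy batch of 8 random "images" standing in for the training set.
train_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 500, (8,)))]

for epoch in range(10):                  # repeat till stopping criteria
    for images, labels in train_loader:  # repeat for all training images
        logits = model(images)           # forward pass -> predicted labels
        loss = loss_fn(logits, labels)   # how far from the target labels?
        optimizer.zero_grad()
        loss.backward()                  # backpropagation
        optimizer.step()                 # gradient descent: update weights
```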

Effort (the new way)

Dataset (~500 labelled classes, ~1000 images / class): 50% · Deep Convolutional Neural Network (DCNN): 50%

What is going on in the network?

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

As an image (a dress, a pan) flows through the network towards its predicted label, successive layers respond to increasingly high-level patterns: low-level image "concepts" in the early layers, abstract image "concepts" in the deeper ones.

Embedding = self-learnt descriptors

The activations at an abstract level act as a learnt descriptor of the image's concept: a dress becomes the vector [0.2, -0.3, 0.15, 0.75, 0.11, …, 0.93].

Distance in embedding space

Map images a, b, and c to embedding vectors E(a), E(b), E(c). When b and c are visually similar and a is not:

d(E(a), E(b)) >> d(E(c), E(b))

i.e. similar images end up close together in embedding space.
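A sketch of how such distances can be computed and used to sort items (Euclidean distance on made-up embedding vectors; the real embeddings come from the trained network):

```python
import numpy as np

def distance(e1, e2):
    """Euclidean distance between two embedding vectors."""
    return np.linalg.norm(e1 - e2)

# Made-up 4-d embeddings standing in for network activations.
E = {"a": np.array([0.9, 0.1, 0.0, 0.3]),
     "b": np.array([0.1, 0.8, 0.7, 0.2]),
     "c": np.array([0.2, 0.7, 0.6, 0.1])}

print(distance(E["a"], E["b"]) > distance(E["c"], E["b"]))  # True

# "Sorted on embedding" = ordering a collection by distance to a query item.
items = sorted(E, key=lambda k: distance(E[k], E["b"]))
print(items)  # ['b', 'c', 'a']
```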

Bracelets (unsorted) vs. bracelets (sorted on embedding)

Shoes (unsorted) vs. shoes (sorted on embedding)

Iterative refinement

Newly discovered classes → re-train classifier → results:

● 95% of the 5M products classified with confidence > 96%
● More than 250 new labelled categories

Context: identity recognition
Automatically recognize celebrities (e.g. Jennifer Aniston, LL Cool J) from red carpet events.

Embedding training

Triplet loss: train the network to discriminate between triplets of images. Starting from a random embedding initialization, training moves the two matching images of each triplet closer together and the mismatched one further away, producing the trained embedding.
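A sketch of the triplet loss in Python (the standard margin-based formulation; the talk does not spell out its exact variant, so this is the common max(0, …) form on toy vectors):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull anchor-positive together; push anchor-negative apart by >= margin."""
    d_pos = np.linalg.norm(anchor - positive)   # same identity
    d_neg = np.linalg.norm(anchor - negative)   # different identity
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.1, 0.9])   # anchor
p = np.array([0.2, 0.8])   # positive (same person)
n = np.array([0.9, 0.1])   # negative (different person)
print(triplet_loss(a, p, n))  # 0.0: this triplet already satisfies the margin
```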

Celebrity identifier

NLP - Word embeddings

https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa

With a different network setup we can learn an embedding for words: each word is represented by a vector. These vectors let us explore very interesting relationships learnt automatically from the data:

● King - man + woman → queen
● Paris - France + Italy → Rome
● Obama - USA + Russia → Putin
● President - power → prime minister
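A sketch of this vector arithmetic in Python with toy 3-d vectors (real word embeddings have hundreds of dimensions; these numbers are invented so the analogy comes out):

```python
import numpy as np

# Toy vectors chosen so that king - man + woman lands nearest to queen.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "man":   np.array([0.1, 0.7, 0.1]),
    "woman": np.array([0.1, 0.1, 0.8]),
}

query = vocab["king"] - vocab["man"] + vocab["woman"]

# Nearest word to the query vector, excluding the input words themselves.
nearest = min((w for w in vocab if w not in {"king", "man", "woman"}),
              key=lambda w: np.linalg.norm(vocab[w] - query))
print(nearest)  # queen
```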

Embeddings - Take Home Message

● From feature engineering to data collection
● Neural networks automatically learn relevant high-level abstractions
● Embedding spaces are very useful for exploring data
● Application areas: retrieval or ranking tasks (e.g. product recommendation, customer segmentation), classification

Transfer learning

ImageNet: 1.5 million training examples, 1000 categories. Training time: ~ days on the best GPUs.

Transfer Learning

Randomly initialized weights + ImageNet ⇒ a network trained to classify 1000 classes: it classifies images in those 1000 classes correctly (>90%).

New data, new classes? Start from the ImageNet-trained network and fine-tune the model (update the weights):

● Faster training time
● Better performance
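A sketch of fine-tuning in Python with torchvision (ResNet-18 is my choice of pre-trained model for illustration; the talk does not name one, and the class count is hypothetical):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from weights trained on ImageNet (1000 classes)...
model = models.resnet18(pretrained=True)

# ...replace the 1000-way output layer with one sized for the new classes
# (20 is a made-up number for illustration)...
num_new_classes = 20
model.fc = nn.Linear(model.fc.in_features, num_new_classes)

# ...then fine-tune: update the weights with a small learning rate so the
# ImageNet features are adapted to the new data rather than destroyed.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# (then run the usual forward pass / backprop / gradient descent loop)
```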

Sharing pre-trained models

● Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo
● A common format to share pre-trained models
● Active discussion and contributions

Transfer Learning - recommendations

Two axes: the size of your dataset and the similarity of your data to the data the model was trained on.

                | similar data               | different data
small dataset   | use existing embedding     | use activations from earlier in the network
large dataset   | fine-tune complete network | fine-tune complete network (or start from scratch)

Transfer Learning - Take Home Message

● Faster progress
● Training works even with much smaller amounts of data
● Check the closest available model before starting from scratch

Should we all go deep?

Some questions you should ask

● What is the performance of the baseline?
○ What can be achieved with a simpler system?
○ Can we start testing the value proposition with a simpler system?
● How much training data is required?
● Do we have the data, can we acquire it, or how long would it take to collect?
● Do we need labeled data, or can we use unlabeled data?
● How well does it work on data it has never seen? (Generalization / overfitting)
● What are the failure cases?
● How reliable is the confidence of the prediction?
● Can we explain why a prediction has been made?

tommaso.gritti@braincreators.com

@tommasogritti

Thank you!
