Deep Learning, Neural Networks and AI Explained


tommaso.gritti@braincreators.com

@tommasogritti

Outline

● Deep Neural Network intuition
● Embeddings
● Transfer Learning
● Tips

Deep Neural Network omnipresence

https://trends.google.com/trends/explore?date=2008-03-09%202017-04-09&q=artificial%20intelligence,machine%20learning,deep%20learning

… or almost

Human brain: 10^11 neurons, 10^4 synapses per neuron, 10^16 "operations" per second, 250M neurons per mm^3, 180,000 km of "wires", 25 watts.

Deep Neural Networks sound cool. A GPU: 8×10^12 operations per second, 500 watts, 5760 (small) cores, $2000.

Toy example

Num website visits | Num pages visited | Avg time on page | Converted?
------------------ | ----------------- | ---------------- | ----------
1                  | 13                | 55s              | 1
2                  | 1                 | 141s             | 1
1                  | 8                 | 10s              | 0
3                  | 5                 | 127s             | 0
2                  | 3                 | 18s              | 0

"Num website visits" does not seem to influence the output.

"Num pages visited" above 9 seems to be a good threshold, but even when it is 1 a person can convert ⇒ no simple threshold.

"Avg time on page" above 128s seems to be a good threshold, but even at 55s a person can convert ⇒ no simple threshold.

Toy example

Since no single feature gives a clean threshold, combine them: assign a weight to each input (w1 = ??, w2 = ??, w3 = ??), multiply each input by its weight, and sum. A sum > 0 means converted, < 0 means not converted. For the first row (1, 13, 55):

1*w1 + 13*w2 + 55*w3 > 0

With weights w1 = -7.04, w2 = 0.28, w3 = 0.12, the first row (1, 13, 55) gives a weighted sum of 3.58 > 0 ⇒ YES, converted.

With the same weights, the fourth row (3, 5, 127) gives a weighted sum of -3.76 < 0 ⇒ NO, not converted.
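This single-node classifier fits in a few lines of Python (a sketch; the weights and input rows are the slide's, the function and variable names are mine):

```python
def predict(inputs, weights):
    """Single neuron: multiply each input by its weight, sum, threshold at 0."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > 0 else 0  # 1 = converted, 0 = not converted

weights = [-7.04, 0.28, 0.12]          # one weight per input feature
print(predict([1, 13, 55], weights))   # first row  -> 1 (converted)
print(predict([3, 5, 127], weights))   # fourth row -> 0 (not converted)
```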

Different weight vectors give different "methods". Four of them, applied to the same inputs (Num website visits = 3, Num pages visited = 5, Avg time on page = 127):

Method #1: w = (-7.04, 0.28, 0.12)
Method #2: w = (-2.4, 0.91, 0.013)
Method #3: w = (-3.9, 0.21, 0.03)
Method #4: w = (-1.1, 0.83, 0.18)

Toy example

Now feed the inputs (1, 13, 55) to all four methods in parallel and combine their four outputs into a final estimate. The three inputs form the input layer, Methods #1-#4 form a hidden layer, and the final estimate is the output layer.

Deep = lots of hidden layers
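As a sketch in Python/NumPy (the hidden-layer weights are the four methods above; the output-layer weights and the thresholding are my assumptions for illustration):

```python
import numpy as np

x = np.array([1.0, 13.0, 55.0])             # input layer: the three features

# Hidden layer: Methods #1-#4, one row of weights per method.
W_hidden = np.array([[-7.04, 0.28, 0.12],
                     [-2.4,  0.91, 0.013],
                     [-3.9,  0.21, 0.03],
                     [-1.1,  0.83, 0.18]])
hidden = W_hidden @ x                       # four weighted sums
hidden = np.maximum(hidden, 0)              # thresholding at 0 (ReLU-style)

# Output layer: combine the four method outputs into a final estimate
# (equal weighting is an assumption; a real network learns these too).
w_out = np.array([0.25, 0.25, 0.25, 0.25])
final_estimate = w_out @ hidden
print(final_estimate > 0)                   # True -> predicted converted
```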

Lots of configurations

http://www.asimovinstitute.org/neural-network-zoo/

Open source toolkits

Neural Networks - Take Home Message

● Applicable to endless domains: object recognition, medical imaging, automotive, finance, robotics, natural language processing, translation systems, speech recognition
● At the simplest level, only a series of nodes doing sums & thresholding
● Lots of variety
● Lots of open source tools

Embeddings

Context: object recognition
Automatically classify product images into 1000s of categories (e.g. "Dress", "Boot").

Image Classifier (old school)

Image dataset → image features → classifier
● Image features: edges, contrast, local patterns, colors
● Classifier: AdaBoost, SVM, Random Forests, Neural Network

Image Features (old school)

Input image → feature extraction → image features: a vector V = [0.2, -0.3, 0.15, 0.75, 0.11, …, 0.93]. The classifier then makes its decision as f(V) > 0.
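A sketch of the old-school decision step in Python (V is the slide's example vector; the linear f and its weights are my stand-in for AdaBoost/SVM/etc.):

```python
import numpy as np

# Hand-engineered features (edges, contrast, local patterns, colors)
# collected into one vector; the values are the slide's example.
V = np.array([0.2, -0.3, 0.15, 0.75, 0.11, 0.93])

# A linear classifier f(V) = w·V + b as a placeholder for the real model.
w = np.array([1.0, -0.5, 0.8, 1.2, 0.3, 0.9])   # assumed weights
b = -1.0                                        # assumed bias
print((w @ V + b) > 0)   # True -> e.g. classified as "Dress"
```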

Effort (old school)

Image dataset: 10% · image features: 45% · classifier: 45%

...still in use today

Dataset

Data gathered from 100s of scraped webshops: 5 million products, uncategorised. To get labels:

● Keyword filtering
● Visual clustering
● Human inspection

⇒ ~500 labelled classes, ~1000 images / class

Image classifier (the new way)

Feed the ~500 labelled classes, ~1000 images / class to a Deep Convolutional Neural Network (DCNN) and train it with backpropagation + gradient descent:

● Forward pass: push a training image (label: "pans") through the network; at first it predicts the wrong label, e.g. "shoe"
● Backpropagation + gradient descent = update the weights "towards the target"
● Repeat for all training images
● Repeat till stopping criteria
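A minimal sketch of this training loop in Python with PyTorch (the model, data, and hyperparameters here are placeholders, not the ones from the talk):

```python
import torch
import torch.nn as nn

# Stand-in model: a real DCNN and the ~500-class product dataset would
# go here; these shapes are placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 500))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One dummy batch of 8 random "images" standing in for the training set.
train_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 500, (8,)))]

for epoch in range(10):                  # repeat till stopping criteria
    for images, labels in train_loader:  # repeat for all training images
        logits = model(images)           # forward pass -> predicted labels
        loss = loss_fn(logits, labels)   # how far from the target labels?
        optimizer.zero_grad()
        loss.backward()                  # backpropagation
        optimizer.step()                 # gradient descent: update weights
```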

Effort (the new way)

Dataset (~500 labelled classes, ~1000 images / class): 50% · Deep Convolutional Neural Network (DCNN): 50%

What is going on in the network?

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

As an image (a dress, a pan) flows through the network towards its predicted label, successive layers respond to increasingly high-level patterns: low-level image "concepts" in the early layers, abstract image "concepts" in the deeper ones.

Embedding = self-learnt descriptors

The activations at an abstract level act as a learnt descriptor of the image's concept: a dress becomes the vector [0.2, -0.3, 0.15, 0.75, 0.11, …, 0.93].

Distance in embedding space

Map images a, b, and c to embedding vectors E(a), E(b), E(c). When b and c are visually similar and a is not:

d(E(a), E(b)) >> d(E(c), E(b))

i.e. similar images end up close together in embedding space.
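A sketch of how such distances can be computed and used to sort items (Euclidean distance on made-up embedding vectors; the real embeddings come from the trained network):

```python
import numpy as np

def distance(e1, e2):
    """Euclidean distance between two embedding vectors."""
    return np.linalg.norm(e1 - e2)

# Made-up 4-d embeddings standing in for network activations.
E = {"a": np.array([0.9, 0.1, 0.0, 0.3]),
     "b": np.array([0.1, 0.8, 0.7, 0.2]),
     "c": np.array([0.2, 0.7, 0.6, 0.1])}

print(distance(E["a"], E["b"]) > distance(E["c"], E["b"]))  # True

# "Sorted on embedding" = ordering a collection by distance to a query item.
items = sorted(E, key=lambda k: distance(E[k], E["b"]))
print(items)  # ['b', 'c', 'a']
```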

Bracelets (unsorted) vs. bracelets (sorted on embedding)

Shoes (unsorted) vs. shoes (sorted on embedding)

Iterative refinement

Newly discovered classes → re-train classifier → results:

● 95% of the 5M products classified with confidence > 96%
● More than 250 new labelled categories

Context: identity recognition
Automatically recognize celebrities (e.g. Jennifer Aniston, LL Cool J) from red carpet events.

Embedding training

Triplet loss: train the network to discriminate between triplets of images. Starting from a random embedding initialization, training moves the two matching images of each triplet closer together and the mismatched one further away, producing the trained embedding.
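A sketch of the triplet loss in Python (the standard margin-based formulation; the talk does not spell out its exact variant, so this is the common max(0, …) form on toy vectors):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull anchor-positive together; push anchor-negative apart by >= margin."""
    d_pos = np.linalg.norm(anchor - positive)   # same identity
    d_neg = np.linalg.norm(anchor - negative)   # different identity
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.1, 0.9])   # anchor
p = np.array([0.2, 0.8])   # positive (same person)
n = np.array([0.9, 0.1])   # negative (different person)
print(triplet_loss(a, p, n))  # 0.0: this triplet already satisfies the margin
```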

Celebrity identifier

NLP - Word embeddings

https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa

With a different network setup we can learn an embedding for words: each word is represented by a vector. These vectors let us explore very interesting relationships learnt automatically from the data:

● King - man + woman → queen
● Paris - France + Italy → Rome
● Obama - USA + Russia → Putin
● President - power → prime minister
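A sketch of this vector arithmetic in Python with toy 3-d vectors (real word embeddings have hundreds of dimensions; these numbers are invented so the analogy comes out):

```python
import numpy as np

# Toy vectors chosen so that king - man + woman lands nearest to queen.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "man":   np.array([0.1, 0.7, 0.1]),
    "woman": np.array([0.1, 0.1, 0.8]),
}

query = vocab["king"] - vocab["man"] + vocab["woman"]

# Nearest word to the query vector, excluding the input words themselves.
nearest = min((w for w in vocab if w not in {"king", "man", "woman"}),
              key=lambda w: np.linalg.norm(vocab[w] - query))
print(nearest)  # queen
```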

Embeddings - Take Home Message

● From feature engineering to data collection
● Neural networks automatically learn relevant high-level abstractions
● Embedding spaces are very useful for exploring data
● Application areas: retrieval or ranking tasks (e.g. product recommendation, customer segmentation), classification

Transfer learning

ImageNet: 1.5 million training examples, 1000 categories. Training time: ~ days on the best GPUs.

Transfer Learning

Randomly initialized weights + ImageNet ⇒ a network trained to classify 1000 classes: it classifies images in those 1000 classes correctly (>90%).

New data, new classes? Start from the ImageNet-trained network and fine-tune the model (update the weights):

● Faster training time
● Better performance
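A sketch of fine-tuning in Python with torchvision (ResNet-18 is my choice of pre-trained model for illustration; the talk does not name one, and the class count is hypothetical):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from weights trained on ImageNet (1000 classes)...
model = models.resnet18(pretrained=True)

# ...replace the 1000-way output layer with one sized for the new classes
# (20 is a made-up number for illustration)...
num_new_classes = 20
model.fc = nn.Linear(model.fc.in_features, num_new_classes)

# ...then fine-tune: update the weights with a small learning rate so the
# ImageNet features are adapted to the new data rather than destroyed.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# (then run the usual forward pass / backprop / gradient descent loop)
```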

Sharing pre-trained models

● Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo
● A common format to share pre-trained models
● Active discussion and contributions

Transfer Learning - recommendations

Two axes: the size of your dataset and the similarity of your data to the data the model was trained on.

                | similar data               | different data
small dataset   | use existing embedding     | use activations from earlier in the network
large dataset   | fine-tune complete network | fine-tune complete network (or start from scratch)

Transfer Learning - Take Home Message

● Faster progress
● Training works even with much smaller amounts of data
● Check the closest available model before starting from scratch

Should we all go deep?

Some questions you should ask

● What is the performance of the baseline?
○ What can be achieved with a simpler system?
○ Can we start testing the value proposition with a simpler system?
● How much training data is required?
● Do we have the data, can we acquire it, or how long would it take to collect?
● Do we need labeled data, or can we use unlabeled data?
● How well does it work on data it has never seen? (Generalization / overfitting)
● What are the failure cases?
● How reliable is the confidence of the prediction?
● Can we explain why a prediction has been made?

tommaso.gritti@braincreators.com

@tommasogritti

Thank you!
