
Deep Learning

Niklas Wahlstrom

Department of Information Technology, Uppsala University, Sweden

September 21, 2017

Guest lecture: Empirical modeling (HT 2017)


Deep Learning: Caption generation

Generate captions automatically from images

Xu, K., Lei Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R. S., and Bengio, Y. Show, attend and tell: neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, July, 2015.


Deep learning: AI Go player

An AI defeated a human professional for the first time in the game of Go

Silver, D. et al. Mastering the game of Go with deep neural networks and tree search, Nature, Vol 529, 484–489 (2016)


Outline

1. What is a neural network (NN)?

2. Why do deep neural networks work so well?

a) Why neural networks?

b) Why deep?

3. Some comments, pointers and summary


Skin cancer – background

One recent result on the use of deep learning in medicine: detecting skin cancer (February 2017)

Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M. and Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542, 115–118, February, 2017.

Some background figures (from the US) on skin cancer:

- Melanomas represent less than 5% of all skin cancers, but account for 75% of all skin-cancer-related deaths.

- Early detection is absolutely critical. Estimated 5-year survival rate for melanoma: over 99% if detected in its earlier stages and 14% if detected in its later stages.


Skin cancer – taxonomy used

Image copyright Nature (doi:10.1038/nature21056)


Skin cancer – task

Image copyright Nature (doi:10.1038/nature21056)


Skin cancer – solution (ultrabrief)

Start from a neural network trained on 1.28 million images (transfer learning).

Make minor modifications to this model, specializing it to the present situation.

Learn new model parameters using 129 450 clinical images (∼100 times more images than any previous study).

[Figure: an unseen image is passed to the model, which outputs a prediction.]


Skin cancer – indication of the results

$$\text{sensitivity} = \frac{\text{true positive}}{\text{positive}}, \qquad \text{specificity} = \frac{\text{true negative}}{\text{negative}}$$
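As a small illustration of the two measures, the sketch below computes sensitivity and specificity from confusion-matrix counts. The counts are hypothetical and chosen only for the example; they are not results from the skin cancer study.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positives that the model flags as positive."""
    positives = true_positives + false_negatives
    return true_positives / positives


def specificity(true_negatives, false_positives):
    """Fraction of actual negatives that the model flags as negative."""
    negatives = true_negatives + false_positives
    return true_negatives / negatives


# Hypothetical counts for illustration only.
tp, fn = 45, 5     # 50 malignant cases, 45 detected
tn, fp = 90, 10    # 100 benign cases, 90 correctly cleared

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A classifier that outputs a score can trade one measure against the other by moving its decision threshold.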


Constructing an NN for regression

A neural network (NN) is a nonlinear function $y = g_\theta(\varphi)$ from an input variable ϕ to an output variable y, parameterized by θ.

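Linear regression models the relationship between a continuous target variable y and an input variable ϕ,

$$y = \sum_{i=1}^{n} \varphi_i \theta_i + \theta_0 = \varphi^T \theta,$$

where θ contains the parameters, composed of the “weights” $\theta_i$ and the offset (“bias”) term $\theta_0$,

$$\theta = \begin{pmatrix} \theta_0 & \theta_1 & \theta_2 & \cdots & \theta_n \end{pmatrix}^T, \qquad \varphi = \begin{pmatrix} 1 & \varphi_1 & \varphi_2 & \cdots & \varphi_n \end{pmatrix}^T.$$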
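The following minimal sketch shows how prepending a constant 1 to the input lets the offset be handled as an ordinary weight. The function name `linear_predict` and the numeric values are assumptions made for the example.

```python
def linear_predict(phi, theta):
    """Evaluate y = phi^T theta, where theta = [theta_0, theta_1, ..., theta_n]
    and phi = [phi_1, ..., phi_n] is augmented with a leading 1."""
    augmented = [1.0] + list(phi)
    return sum(p * t for p, t in zip(augmented, theta))


# Hypothetical parameters: offset 0.5 and weights 2.0, -1.0.
theta = [0.5, 2.0, -1.0]
phi = [3.0, 4.0]

y = linear_predict(phi, theta)   # 0.5 + 2.0*3.0 - 1.0*4.0
print(y)
```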


Generalized linear regression

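We can generalize this by introducing nonlinear transformations of the predictor $\varphi^T \theta$,

$$y = \sigma(\varphi^T \theta).$$

[Figure: a single unit with inputs 1, ϕ1, ..., ϕn, weights θ0, ..., θn, activation σ and output y.]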

We call σ(x) the activation function. Two common choices are:

Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$

ReLU: $\sigma(x) = \max(0, x)$
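A single unit with either activation function can be sketched as below. The helper name `unit` and the parameter values are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def unit(phi, theta, activation):
    """Generalized linear regression: y = activation(theta_0 + sum_i theta_i * phi_i)."""
    predictor = theta[0] + sum(t * p for t, p in zip(theta[1:], phi))
    return activation(predictor)


# Hypothetical parameters; the predictor is 0.5 + 2.0*1.0 - 1.0*3.0 = -0.5.
theta = [0.5, 2.0, -1.0]
phi = [1.0, 3.0]

y_sigmoid = unit(phi, theta, sigmoid)   # strictly between 0 and 1
y_relu = unit(phi, theta, relu)         # negative predictor is clipped to 0
print(y_sigmoid, y_relu)
```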

Let us consider an example of a feed-forward NN, where “feed-forward” indicates that the information flows from the input layer to the output layer.


Neural network - construction

A NN is a sequential construction of several linear regression models.

[Figure: network with inputs 1, ϕ1, ..., ϕn, one layer of hidden units z1, ..., zM (each with activation σ) and output y. Column labels: Inputs, Hidden units, Outputs.]

$$
\begin{aligned}
z_1 &= \sigma\Big(\theta^{(1)}_{01} + \sum_{j=1}^{n} \theta^{(1)}_{j1} \varphi_j\Big) \\
z_2 &= \sigma\Big(\theta^{(1)}_{02} + \sum_{j=1}^{n} \theta^{(1)}_{j2} \varphi_j\Big) \\
&\;\;\vdots \\
z_M &= \sigma\Big(\theta^{(1)}_{0M} + \sum_{j=1}^{n} \theta^{(1)}_{jM} \varphi_j\Big) \\
y &= \theta^{(2)}_0 + \sum_{m=1}^{M} \theta^{(2)}_m z_m
\end{aligned}
$$

In matrix form, with $\varphi = (\varphi_1 \ \cdots \ \varphi_n)^T$ and the offsets collected in $b_1$ and $b_2$:

$$z = \sigma(W_1^T \varphi + b_1^T), \qquad y = W_2^T z + b_2^T,$$

$$b_1 = \begin{bmatrix} \theta^{(1)}_{01} & \cdots & \theta^{(1)}_{0M} \end{bmatrix}, \qquad
W_1 = \begin{bmatrix} \theta^{(1)}_{11} & \cdots & \theta^{(1)}_{1M} \\ \vdots & \ddots & \vdots \\ \theta^{(1)}_{n1} & \cdots & \theta^{(1)}_{nM} \end{bmatrix},$$

$$b_2 = \begin{bmatrix} \theta^{(2)}_0 \end{bmatrix}, \qquad
W_2 = \begin{bmatrix} \theta^{(2)}_1 \\ \vdots \\ \theta^{(2)}_M \end{bmatrix}.$$
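A forward pass through a network with one hidden layer can be sketched in plain Python as follows. This is a minimal sketch, assuming W1 is stored as an n x M nested list, b1 as a list of M offsets, W2 as a list of M output weights and b2 as a scalar; all numeric values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(phi, W1, b1, W2, b2, activation=sigmoid):
    """Compute z = activation(W1^T phi + b1) and y = W2^T z + b2."""
    n_hidden = len(b1)
    z = [
        activation(b1[m] + sum(W1[j][m] * phi[j] for j in range(len(phi))))
        for m in range(n_hidden)
    ]
    y = b2 + sum(W2[m] * z[m] for m in range(n_hidden))
    return y, z


# Hypothetical network with n = 2 inputs and M = 3 hidden units.
W1 = [[1.0, -1.0, 0.5],
      [0.0, 2.0, -0.5]]
b1 = [0.0, 0.0, 0.0]
W2 = [1.0, 1.0, 2.0]
b2 = -1.0

y_zero, z_zero = forward([0.0, 0.0], W1, b1, W2, b2)
y_ones, z_ones = forward([1.0, 1.0], W1, b1, W2, b2)
print(y_zero, y_ones)
```

With a zero input and zero offsets, every hidden unit outputs σ(0) = 0.5, which makes the result easy to check by hand.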


With two layers of hidden units:

[Figure: network with inputs 1, ϕ1, ..., ϕn, two layers of hidden units z(1) and z(2), and output y. Column labels: Inputs, Hidden units, Hidden units, Outputs.]

$$
\begin{aligned}
z^{(1)} &= \sigma(W_1^T \varphi + b_1^T) \\
z^{(2)} &= \sigma(W_2^T z^{(1)} + b_2^T) \\
y &= W_3^T z^{(2)} + b_3^T
\end{aligned}
$$

The model learns better using a deep network (several layers) instead of a wide and shallow network.


Why neural networks?

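Continuous multiplication gate

A neural network with only four hidden units can model multiplication of two numbers arbitrarily well.

[Figure: inputs ϕ1 and ϕ2 feed four hidden units with activation f through weights ±λ; the hidden units are combined with output weights ±µ to give y ≈ ϕ1 ∗ ϕ2.]

If we choose $\mu = \frac{1}{4 \lambda^2 f''(0)}$ then $y \to \varphi_1 \varphi_2$ when $\lambda \to 0$.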

Henry W. Lin and Max Tegmark. (2016) Why does deep and cheap learning work so well?, arXiv
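The gate can be checked numerically with a short sketch. It assumes the wiring described by Lin and Tegmark: hidden units f(λϕ1 + λϕ2), f(−λϕ1 − λϕ2), f(λϕ1 − λϕ2) and f(−λϕ1 + λϕ2), combined with output weights µ, µ, −µ, −µ. The activation is an assumed choice, softplus, for which f''(0) = 1/4; the sigmoid itself would not work here because its second derivative at 0 is zero.

```python
import math


def softplus(x):
    """Smooth activation f(x) = log(1 + e^x), with f''(0) = 1/4."""
    return math.log1p(math.exp(x))


SOFTPLUS_SECOND_DERIVATIVE_AT_ZERO = 0.25


def multiplication_gate(phi1, phi2, lam=0.01):
    """Approximate phi1 * phi2 with four hidden units and mu = 1 / (4 lam^2 f''(0))."""
    mu = 1.0 / (4.0 * lam ** 2 * SOFTPLUS_SECOND_DERIVATIVE_AT_ZERO)
    hidden = [
        softplus(lam * phi1 + lam * phi2),
        softplus(-lam * phi1 - lam * phi2),
        softplus(lam * phi1 - lam * phi2),
        softplus(-lam * phi1 + lam * phi2),
    ]
    output_weights = [mu, mu, -mu, -mu]
    return sum(w * h for w, h in zip(output_weights, hidden))


coarse = multiplication_gate(3.0, -2.0, lam=0.1)
fine = multiplication_gate(3.0, -2.0, lam=0.01)
print(coarse, fine)   # both near -6, the second one closer
```

Shrinking λ reduces the approximation error, at the price of a larger µ and therefore more sensitivity to rounding.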


A regression example

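Input: $u \in \mathbb{R}^{1000}$

Output: $y \in \mathbb{R}$

Task: Model a quadratic relationship between y and u

Linear regression

$$y = u_1 u_1 \theta_{1,1} + u_1 u_2 \theta_{1,2} + \cdots + u_{1000} u_{1000} \theta_{1000,1000} = \varphi^T \theta$$

where

$$\varphi = \begin{bmatrix} u_1 u_1 & u_1 u_2 & \ldots & u_{1000} u_{1000} \end{bmatrix}^T, \qquad \theta = \begin{bmatrix} \theta_{1,1} & \theta_{1,2} & \ldots & \theta_{1000,1000} \end{bmatrix}^T$$

Requires ≈ 1’000 ∗ 1’000 / 2 = 500’000 parameters!

Neural network

To model all products with a neural network we would need 4 ∗ 500’000 = 2 ∗ 10^6 hidden units and hence 2 billion parameters...

[Figure: network with inputs u1, ..., u1000, hidden units z1, ..., z2’000’000 and output y.]

1000 ∗ (2 ∗ 10^6) + 2 ∗ 10^6 ≈ 2 ∗ 10^9 parameters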
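The parameter counts on this slide can be reproduced with a few lines of arithmetic. The variable names are illustrative; the exact number of distinct products u_i u_j with i ≤ j is shown next to the rounded figure used on the slide.

```python
n_inputs = 1000

# Distinct products u_i * u_j with i <= j, and the rounded figure n^2 / 2.
exact_products = n_inputs * (n_inputs + 1) // 2
approx_products = n_inputs * n_inputs // 2

# Four hidden units per product (one multiplication gate each).
hidden_units = 4 * approx_products

# Input-to-hidden weights plus hidden-to-output weights.
nn_parameters = n_inputs * hidden_units + hidden_units

print(exact_products, approx_products, hidden_units, nn_parameters)
```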


A regression example (cont.)

Input: $u \in \mathbb{R}^{1000}$

Assume that only 10 of the regressors $u_i u_j$ are of importance

Linear regression

$$y = u_1 u_1 \theta_{1,1} + u_1 u_2 \theta_{1,2} + \cdots + u_{1000} u_{1000} \theta_{1000,1000} = \varphi^T \theta$$

You probably want to regularize, but 500’000 parameters are still required!

Neural network

To model 10 products with a neural network we would need 4 ∗ 10 hidden units, leading to only ≈ 40’000 parameters!

[Figure: network with inputs u1, ..., u1000, hidden units z1, ..., z40 and output y.]

1000 ∗ 40 + 40 = 40’040
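To make the count concrete, the sketch below builds a one-hidden-layer network that models a handful of chosen products using four hidden units per product. It is an illustration under assumptions: the softplus activation (f''(0) = 1/4), the gate wiring from Lin and Tegmark, and the helper names `build_product_network`, `predict` and `count_parameters` are all choices made for this example.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def build_product_network(n_inputs, products, lam=0.01):
    """products is a list of (i, j, coefficient) describing coefficient * u_i * u_j.
    Returns one weight row per hidden unit and one output weight per hidden unit."""
    mu = 1.0 / (4.0 * lam ** 2 * 0.25)   # mu = 1 / (4 lam^2 f''(0)) for softplus
    signs = [(1, 1, 1), (-1, -1, 1), (1, -1, -1), (-1, 1, -1)]
    hidden_weights, output_weights = [], []
    for i, j, coefficient in products:
        for sign_i, sign_j, sign_out in signs:
            row = [0.0] * n_inputs
            row[i] += sign_i * lam
            row[j] += sign_j * lam
            hidden_weights.append(row)
            output_weights.append(sign_out * mu * coefficient)
    return hidden_weights, output_weights


def predict(u, hidden_weights, output_weights):
    return sum(
        w_out * softplus(sum(w * x for w, x in zip(row, u)))
        for row, w_out in zip(hidden_weights, output_weights)
    )


def count_parameters(hidden_weights, output_weights):
    return sum(len(row) for row in hidden_weights) + len(output_weights)


# Small check: y = 2 * u_0 * u_1 - u_2 * u_2 with 5 inputs.
small_W, small_w = build_product_network(5, [(0, 1, 2.0), (2, 2, -1.0)])
u = [1.0, 2.0, 3.0, 0.5, -1.0]
approx = predict(u, small_W, small_w)
exact = 2.0 * u[0] * u[1] - u[2] ** 2

# The slide's setting: 1000 inputs and 10 important products.
big_W, big_w = build_product_network(1000, [(k, k + 1, 1.0) for k in range(10)])
n_parameters = count_parameters(big_W, big_w)
print(approx, exact, n_parameters)
```

In practice the network would not be given the important products; it would have to find them during training. The sketch only shows that 40 hidden units are enough to represent them.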


Why deep? - A regression example

- Consider the same example. Now we want a model with complexity corresponding to polynomials of degree 1’000.

- Keep 250 products in each layer ⇒ 250 ∗ 4 = 1’000 hidden units.

[Figure: network with inputs u1, ..., u1000, ten hidden layers with units z(1)1, ..., z(1)1000 through z(10)1, ..., z(10)1000, and output y.]

Each layer of products can double the polynomial degree, so 10 layers reach degree 2^10 = 1’024 ≈ 1’000. The network has

10 ∗ 10^6 + 10^3 ≈ 10^7 parameters.

Linear regression would require ≈ 1000^1000 / 1000! parameters to model such a relationship...
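The two counts can be compared directly. The sketch below counts the weights of the deep network layer by layer (offsets are ignored, as on the slide) and evaluates the slide's estimate 1000^1000 / 1000! on a log scale, since the number itself is far too large to read comfortably.

```python
import math

# 1000 inputs, ten hidden layers of 4 * 250 units each, one output.
layer_widths = [1000] + [4 * 250] * 10 + [1]
deep_parameters = sum(a * b for a, b in zip(layer_widths, layer_widths[1:]))

# Degree reachable when each layer of products doubles the degree.
reachable_degree = 2 ** 10

# log10 of 1000^1000 / 1000!, using lgamma(n + 1) = ln(n!).
log10_linear = 1000 * math.log10(1000) - math.lgamma(1001) / math.log(10)

print(deep_parameters, reachable_degree, round(log10_linear, 1))
```

The deep network needs about 10^7 weights, while the slide's estimate for linear regression has more than 400 digits.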


Why deep? - Image classification

Example: Image classification

Input: pixels of an image

Output: object identity

Each hidden layer extracts increasingly abstract features.

Zeiler, M. D. and Fergus, R. Visualizing and understanding convolutional networks. Computer Vision - ECCV (2014).


Deep neural networks

Deep learning methods allow a machine to make use of raw data to automatically discover the representations (abstractions) that are necessary to solve a particular task.

It is accomplished by using multiple levels of representation. Each level transforms the representation at the previous level into a new and more abstract representation,

$$z^{(l+1)} = f\big(W^{(l+1)} z^{(l)} + b^{(l+1)}\big),$$

starting from the input (raw data) $z^{(0)} = u$.

Key aspect: The layers are not designed by human engineers, they are generated from (typically lots of) data using a learning procedure and lots of computations.
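The layer recursion can be written as a short loop. This is a minimal sketch, assuming each W is stored as a list of rows (one row per unit of the next level), ReLU as the function f, and hand-picked weights; in a real model the weights would come from the learning procedure.

```python
def relu(x):
    return max(0.0, x)


def layer(W, b, z, f):
    """One level: f(W z + b), with W given as a list of rows."""
    return [f(b_k + sum(w * z_i for w, z_i in zip(row, z))) for row, b_k in zip(W, b)]


def deep_forward(u, layers, f=relu):
    """Apply z(l+1) = f(W(l+1) z(l) + b(l+1)) starting from z(0) = u.
    Returns the representation at every level."""
    z = list(u)
    representations = [z]
    for W, b in layers:
        z = layer(W, b, z, f)
        representations.append(z)
    return representations


# Hypothetical two-level network: 2 inputs -> 2 units -> 1 unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[2.0, 1.0]], [0.5]),
]

levels = deep_forward([3.0, 1.0], layers)
print(levels)
```

Here f is applied at every level, as in the formula above; a regression network would typically leave the last level linear.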


Some comments - Why now?

Neural networks have been around for more than fifty years. Why have they become so popular now (again)?

To solve really interesting problems you need:

1. Efficient learning algorithms

2. Efficient computational hardware

3. A lot of labeled data!

These three factors have not been fulfilled to a satisfactory level until the last 5-10 years.


Some pointers

A book has recently been written:
I. Goodfellow, Y. Bengio and A. Courville. Deep learning. MIT Press, 2016

http://www.deeplearningbook.org/

A well written introduction:
LeCun, Y., Bengio, Y., and Hinton, G. (2015) Deep learning, Nature, 521(7553), 436–444.

Details about the multiplication gate:
Lin, H. W. and Tegmark, M. (2016) Why does deep and cheap learning work so well?, arXiv

You will also find more material than you can possibly want here

http://deeplearning.net/


Summary

A neural network (NN) is a nonlinear function $y = g_\theta(u)$ from an input variable u to an output variable y.

We can think of an NN as a sequential/recursive construction of several generalized linear regressions.

Deep learning refers to learning NNs with several hidden layers. It allows for data-driven models that automatically learn representations of data (features) with multiple layers of abstraction.

A deep NN is very parameter efficient when modelling high-dimensional, complex data.


Thank you!
