Lecture 11: Neural Networks


Page 1: Lecture11 - neural networks

Introduction to Machine Learning

Lecture 11: Neural Networks

Albert Orriols i Puig – [email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture11 - neural networks

Recap of Lectures 5-10: Data classification

Decision trees (C4.5)

Instance-based learners (kNN and CBR)


Page 3: Lecture11 - neural networks

Recap of Lectures 5-10: Data classification

Probabilistic-based learners

$P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}$

Linear/polynomial classifier


Page 4: Lecture11 - neural networks

Today’s Agenda

Why Neural Networks?

Looking into a Brain

Neural Networks

Starting from the Beginning: Perceptrons

Multi-layer Perceptrons


Page 5: Lecture11 - neural networks

Why Neural Networks?

Brain vs. machines:

Machines are tremendously faster than brains in well-defined problems:

Invert matrices, solve differential equations, etc.

Brains are tremendously faster and more accurate than machines on ill-defined problems or problems that require a lot of processing:

Recognizing characters or objects on TV

Let's simulate our brains with artificial neural networks!

Massive parallelism

Neurons interchanging signals


Page 6: Lecture11 - neural networks

Looking into a Brain

10^11 neurons of more than 20 different types

0.001 seconds of neuron switching time

10^4 to 10^5 connections per neuron

0.1 seconds of scene recognition time


Page 7: Lecture11 - neural networks

Artificial Neural Networks

Borrow some ideas from the nervous systems of animals:

$a_i = g(in_i) = g\left(\sum_j W_{j,i}\, a_j\right)$


THE PERCEPTRON (McCulloch & Pitts)
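A minimal sketch of this unit equation in Python (the function name and the choice of a sigmoid for g are my own illustration, not from the slides):

```python
import numpy as np

def unit_activation(a, W_i, g=lambda s: 1.0 / (1.0 + np.exp(-s))):
    """a_i = g(in_i) = g(sum_j W_ji * a_j): weighted sum, then activation g."""
    in_i = np.dot(W_i, a)  # linear input in_i of unit i
    return g(in_i)

print(unit_activation(np.array([1.0, 0.5]), np.array([0.2, -0.4])))  # prints 0.5
```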

Page 8: Lecture11 - neural networks

Adaline: Adaptive Linear Element

Adaptive linear combiner cascaded with a hard-limiting quantizer

Linear output transformed to binary by means of a threshold device

Training = adjusting the weights

Activation functions


Page 9: Lecture11 - neural networks

Adaline

Note that Adaline implements the function

$f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i$

This defines a threshold when the output is zero:

$f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i = 0$


Page 10: Lecture11 - neural networks

Adaline

Let's assume that we have two variables:

$f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_2 w_2$

Therefore,

$f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_2 w_2 = 0 \;\Rightarrow\; x_2 = -\frac{w_1}{w_2}\, x_1 - \frac{w_0}{w_2}$

So, Adaline is drawing a linear discriminant that divides the space into two regions: it is a linear classifier.
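A small runnable sketch of this two-variable Adaline in Python (the weight values are made up for illustration):

```python
def adaline_classify(x1, x2, w0, w1, w2):
    # Linear combiner followed by a hard-limiting quantizer
    s = w0 + x1 * w1 + x2 * w2
    return 1 if s >= 0 else -1

# Made-up weights: the decision boundary is the line x2 = -x1 + 1
w0, w1, w2 = -1.0, 1.0, 1.0
print(adaline_classify(2.0, 2.0, w0, w1, w2))  # +1 (above the line)
print(adaline_classify(0.0, 0.0, w0, w1, w2))  # -1 (below the line)
```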


Page 11: Lecture11 - neural networks

Adaline

So, we have a cool way to create linear classifiers.

But are linear classifiers enough to tackle our problems?

Can you draw a line that separates the examples of class white and class black in the last example?


Page 12: Lecture11 - neural networks

Moving to More Flexible NNs

So, we want to classify problems such as XOR. Any idea?

Polynomial discriminant functions

In this system:

$f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_1^2 w_{11} + x_1 x_2 w_{12} + x_2^2 w_{22} + x_2 w_2 = 0$

Page 13: Lecture11 - neural networks

Moving to More Flexible NNs

With appropriate values of w, we can fit data that is not linearly separable.

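A quick sketch (my own illustration, not from the slides) of how a single cross-term x1*x2 makes XOR separable by such a polynomial discriminant:

```python
# XOR with inputs encoded as {-1, +1}: no line w0 + x1*w1 + x2*w2 = 0 separates it,
# but the cross-term x1*x2 alone does (the weights below are made up).
def poly_discriminant(x1, x2, w0=0.0, w1=0.0, w2=0.0, w12=-1.0):
    s = w0 + x1 * w1 + x2 * w2 + x1 * x2 * w12
    return 1 if s >= 0 else -1

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2), '->', poly_discriminant(x1, x2))
# (-1,-1) -> -1, (-1,+1) -> +1, (+1,-1) -> +1, (+1,+1) -> -1
```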

Page 14: Lecture11 - neural networks

Even more Flexible: Multi-layer NN

So, we want to classify problems such as XOR. Any other idea?

Madaline: Multiple Adalines connected

This also enables the network to solve non-separable problems


Page 15: Lecture11 - neural networks

But Step Down… How Do I Learn w?

We have seen that different structures enable us to define different functions.

But the key is to get a proper estimation of w

There are many algorithms:

Perceptron rule

α-LMS

α-perceptron

May’s algorithm

Backpropagation

We are going to see two examples: α-LMS and backprop.


Page 16: Lecture11 - neural networks

Weight Learning in Adaline

Recall that we want to adjust w.


Page 17: Lecture11 - neural networks

Weight Learning in Adaline

Weight learning with the α-LMS algorithm:

Incrementally update the weights as

$W_{k+1} = W_k + \alpha\, \frac{\varepsilon_k X_k}{|X_k|^2}$

The error is the difference between the expected and the actual output:

$\varepsilon_k = d_k - W_k^T X_k$

A change in the weights affects the error:

$\Delta\varepsilon_k = \Delta(d_k - W_k^T X_k) = -X_k^T\, \Delta W_k$

And the weight change is

$\Delta W_k = W_{k+1} - W_k = \alpha\, \frac{\varepsilon_k X_k}{|X_k|^2}$

Therefore,

$\Delta\varepsilon_k = -X_k^T\, \alpha\, \frac{\varepsilon_k X_k}{|X_k|^2} = -\alpha\, \varepsilon_k$

Page 18: Lecture11 - neural networks

Weight Learning in Adaline

$\Delta W_k = \alpha\, \frac{\varepsilon_k X_k}{|X_k|^2} \qquad \Delta\varepsilon_k = -X_k^T\, \Delta W_k$
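A runnable sketch of the α-LMS update above (the toy data, learning rate, and training loop are my own illustration):

```python
import numpy as np

def alpha_lms_train(X, d, alpha=0.1, epochs=50):
    """Adaline training with alpha-LMS: W <- W + alpha * eps_k * X_k / |X_k|^2."""
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend x0 = 1 so w0 acts as the bias
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            eps_k = d_k - W @ x_k             # error of the linear combiner
            W += alpha * eps_k * x_k / (x_k @ x_k)
    return W

# Made-up linearly separable data: desired output +1 when x1 + x2 > 1, else -1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]], dtype=float)
d = np.array([-1, -1, -1, 1, 1], dtype=float)
W = alpha_lms_train(X, d)
print(np.sign(np.hstack([np.ones((len(X), 1)), X]) @ W))  # should recover d
```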

Page 19: Lecture11 - neural networks

Backpropagation

α-LMS works for networks with a single layer. But what happens in networks with multiple layers?

Backpropagation (Rumelhart, 1986): the most influential development in NNs in the 1980s.

Here, we present the method conceptually (the math details are in the papers).

Let's assume a network with:

Three neurons in the input layer

Two neurons in the output layer


Page 20: Lecture11 - neural networks

Backpropagation

Strategy:

Compute the gradient of the error:

$\hat{\nabla}_k = \frac{\partial \varepsilon_k^2}{\partial W_k}$

Adjust the weights in the direction opposite to the instantaneous error gradient.

Now, $W_k$ is a vector that contains all the weights of the net.


Page 21: Lecture11 - neural networks

Backpropagation

Algorithm:

1. Insert a new example $X_k$ into the network and sweep it forward until getting the output $y$.

2. Compute the square error of this example:

$\varepsilon_k^2 = \sum_{i=1}^{N_y} \varepsilon_{ik}^2 = \sum_{i=1}^{N_y} (d_{ik} - y_{ik})^2$

For example, for two outputs (disregarding $k$):

$\varepsilon^2 = (d_1 - y_1)^2 + (d_2 - y_2)^2$

3. Propagate the error to the previous layer (back-propagation). How?

Steepest descent: compute the derivative of the square error, $\delta$, for each Adaline.

Artificial Intelligence Machine Learning

Page 22: Lecture11 - neural networks

Backpropagation Example

Example borrowed from: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html


Page 23: Lecture11 - neural networks

Backpropagation Example

1. Sweep the weights forward.


Page 24: Lecture11 - neural networks

Backpropagation Example

2. Backpropagate the error.


Page 25: Lecture11 - neural networks

Backpropagation Example

3. Modify the weights of each neuron.


Page 26: Lecture11 - neural networks

Backpropagation Example

3.bis. Do the same for each neuron.


Page 27: Lecture11 - neural networks

Backpropagation Example

3.bis2. Continue until reaching the output.


Page 28: Lecture11 - neural networks

Backpropagation for a Two-Layer Net.

That is, the algorithm is:

1. Find the instantaneous square error derivative:

$\delta_j^{(l)} = -\frac{1}{2}\, \frac{\partial \varepsilon^2}{\partial s_j^{(l)}}$

This tells us how sensitive the square output error of the network is to changes in the linear output $s$ of the associated Adaline.

2. Expanding the error term, we get

$\delta_1^{(2)} = -\frac{1}{2}\, \frac{\partial \left[ (d_1 - y_1)^2 + (d_2 - y_2)^2 \right]}{\partial s_1^{(2)}} = -\frac{1}{2}\, \frac{\partial \left( d_1 - \mathrm{sgm}(s_1^{(2)}) \right)^2}{\partial s_1^{(2)}}$

3. And recognizing that $d_1$ is independent of $s_1^{(2)}$:

$\delta_1^{(2)} = \left( d_1 - \mathrm{sgm}(s_1^{(2)}) \right) \mathrm{sgm}'(s_1^{(2)}) = \varepsilon_1^{(2)}\, \mathrm{sgm}'(s_1^{(2)})$

Page 29: Lecture11 - neural networks

Backpropagation for a Two-Layer Net.

That is, the algorithm is (continued):

4. Similarly, for the hidden layer we have

$\delta_1^{(1)} = -\frac{1}{2}\, \frac{\partial \varepsilon^2}{\partial s_1^{(1)}} = -\frac{1}{2} \left( \frac{\partial \varepsilon^2}{\partial s_1^{(2)}}\, \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \frac{\partial \varepsilon^2}{\partial s_2^{(2)}}\, \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}} \right)$

5. That is,

$\delta_1^{(1)} = \delta_1^{(2)}\, \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \delta_2^{(2)}\, \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}}$

which yields

$\delta_1^{(1)} = \delta_1^{(2)}\, \frac{\partial \left[ w_{10}^{(2)} + \sum_{i=1}^{3} w_{i1}^{(2)}\, \mathrm{sgm}(s_i^{(1)}) \right]}{\partial s_1^{(1)}} + \delta_2^{(2)}\, \frac{\partial \left[ w_{20}^{(2)} + \sum_{i=1}^{3} w_{i2}^{(2)}\, \mathrm{sgm}(s_i^{(1)}) \right]}{\partial s_1^{(1)}}$

$\delta_1^{(1)} = \left[ \delta_1^{(2)}\, w_{11}^{(2)} + \delta_2^{(2)}\, w_{21}^{(2)} \right] \mathrm{sgm}'(s_1^{(1)})$

Page 30: Lecture11 - neural networks

Backpropagation for a Two-Layer Net.

Defining

$\varepsilon_1^{(1)} \triangleq \delta_1^{(2)}\, w_{11}^{(2)} + \delta_2^{(2)}\, w_{21}^{(2)}$

we obtain

$\delta_1^{(1)} = \varepsilon_1^{(1)}\, \mathrm{sgm}'(s_1^{(1)})$

Implementation details of each Adaline
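Putting the whole derivation together, here is a compact runnable sketch of one backpropagation step for a two-layer net with 3 hidden Adalines and 2 outputs (the input size, initial weights, and learning rate are my own assumptions):

```python
import numpy as np

def sgm(s):
    # Sigmoid activation; note sgm'(s) = sgm(s) * (1 - sgm(s))
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(x, d, W1, W2, alpha=0.5):
    """One steepest-descent update: forward sweep, then delta^(2) and delta^(1)."""
    # Forward sweep (bias folded in as a constant 1 input)
    a1 = sgm(W1 @ np.append(x, 1.0))          # hidden-layer outputs
    y = sgm(W2 @ np.append(a1, 1.0))          # network outputs
    # Output layer: delta^(2) = eps^(2) * sgm'(s^(2)), with eps^(2) = d - y
    delta2 = (d - y) * y * (1 - y)
    # Hidden layer: eps^(1)_i = sum_j delta_j^(2) * w_ij^(2), then times sgm'(s^(1))
    eps1 = W2[:, :-1].T @ delta2              # exclude the bias column
    delta1 = eps1 * a1 * (1 - a1)
    # Move the weights opposite to the instantaneous error gradient
    W2 += alpha * np.outer(delta2, np.append(a1, 1.0))
    W1 += alpha * np.outer(delta1, np.append(x, 1.0))
    return W1, W2

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 3))  # 3 hidden Adalines, 2 inputs + bias (assumed)
W2 = rng.normal(scale=0.5, size=(2, 4))  # 2 output Adalines, 3 hidden + bias
for _ in range(1000):
    W1, W2 = backprop_step(np.array([1.0, 0.0]), np.array([1.0, 0.0]), W1, W2)
```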

Page 31: Lecture11 - neural networks

Next Class

Support Vector Machines


Page 32: Lecture11 - neural networks

Introduction to Machine Learning

Lecture 11: Neural Networks

Albert Orriols i Puig – [email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull