Supervised learning. Multilayer neural networks


Page 1: Supervised learning. Multilayer neural networks

CS 2710 Foundations of AI, Lecture 23
Milos Hauskrecht, [email protected], 5329 Sennott Square

Supervised learning. Multilayer neural networks.

Announcements
• Homework 10: due on Thursday, December 2, 2004
• Final exam: December 14, 2004, 1:00pm-3:00pm
  – Location: Sennott Square 5129
  – Closed book
  – Cumulative

Page 2: Supervised learning. Multilayer neural networks

Linear units

Linear regression:
$$f(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$$

Logistic regression:
$$f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g\Big(w_0 + \sum_{j=1}^{d} w_j x_j\Big)$$

[Unit diagrams: inputs $1, x_1, x_2, \ldots, x_d$ weighted by $w_0, w_1, w_2, \ldots, w_d$; the logistic unit passes the weighted sum $z$ through the sigmoid $g$.]

On-line gradient update (the same for both models):
$$w_j \leftarrow w_j + \alpha\,(y - f(\mathbf{x}))\,x_j$$
$$w_0 \leftarrow w_0 + \alpha\,(y - f(\mathbf{x}))$$
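The update rule can be written directly in code. A minimal sketch in NumPy (function and variable names are my own, not from the slides); the same function covers both the linear and the logistic unit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, logistic=False):
    """One on-line gradient step; the rule is identical for the
    linear (regression) and logistic (classification) unit."""
    xb = np.concatenate(([1.0], x))    # prepend the constant bias input 1
    z = w @ xb                         # weighted sum w_0 + sum_j w_j x_j
    f = sigmoid(z) if logistic else z  # unit output f(x)
    return w + alpha * (y - f) * xb    # w_j <- w_j + alpha (y - f(x)) x_j
```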

Limitations of basic linear units

Linear regression:
$$f(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$$

Logistic regression:
$$f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g\Big(w_0 + \sum_{j=1}^{d} w_j x_j\Big)$$

In both cases the function is linear in the inputs!! Linear decision boundary!!

Page 3: Supervised learning. Multilayer neural networks

Extensions of simple linear units

• Use feature (basis) functions to model nonlinearities
• $\phi_j(\mathbf{x})$: an arbitrary function of $\mathbf{x}$

Linear regression:
$$f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$$

Logistic regression:
$$f(\mathbf{x}) = g\Big(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\Big)$$

[Unit diagram: feature values $1, \phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \ldots, \phi_m(\mathbf{x})$ weighted by $w_0, w_1, w_2, \ldots, w_m$.]
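As a brief illustrative sketch (the helper names and the quadratic feature set are my own choices; the slides only require the $\phi_j$ to be arbitrary functions of $\mathbf{x}$), the extended unit applies the same weighted sum to transformed inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extended_unit(w, x, phis, logistic=False):
    """Extended linear unit: f(x) = w_0 + sum_j w_j phi_j(x),
    optionally passed through the sigmoid g for logistic regression."""
    z = w[0] + sum(wj * phi(x) for wj, phi in zip(w[1:], phis))
    return sigmoid(z) if logistic else z

# Example: quadratic feature functions of a 2-d input
phis = [lambda x: x[0], lambda x: x[1],
        lambda x: x[0] ** 2, lambda x: x[1] ** 2, lambda x: x[0] * x[1]]
w = np.zeros(1 + len(phis))  # w_0 plus one weight per feature function
```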

Regression with the quadratic model.

• Limitation of the plain linear model: a linear hyper-plane only
• A non-linear surface can fit the data better

Page 4: Supervised learning. Multilayer neural networks

Classification with the linear model.

• The logistic regression model defines a linear decision boundary
• Example: 2 classes (blue and red points)

[Figure: 2-D data with two classes and the resulting linear decision boundary.]

Linear decision boundary
• the logistic regression model is not optimal here, but not that bad

[Figure: 2-D data where a line separates the two classes reasonably well.]

Page 5: Supervised learning. Multilayer neural networks

When does logistic regression fail?

• Example in which the logistic regression model fails

[Figure: 2-D data for which no linear decision boundary separates the classes well.]

Limitations of linear units.

• Logistic regression does not work for parity functions: no linear decision boundary exists

[Figure: 2-D parity (XOR-like) data.]

Solution: a model of a non-linear decision boundary

Page 6: Supervised learning. Multilayer neural networks

Example. Regression with polynomials.

Regression with polynomials of degree m:
• Data points: pairs of $\langle x, y \rangle$
• Feature functions: m feature functions
$$\phi_i(x) = x^i, \qquad i = 1, 2, \ldots, m$$
• Function to learn:
$$f(x, \mathbf{w}) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i$$
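In code, the degree-m polynomial model is just the extended linear unit with $\phi_i(x) = x^i$. A small sketch (the function name is my own):

```python
def poly_f(x, w):
    """f(x, w) = w_0 + sum_{i=1..m} w_i * x**i, with m = len(w) - 1.
    The i = 0 term contributes w_0 since x**0 = 1."""
    return sum(wi * x ** i for i, wi in enumerate(w))
```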

Learning with extended linear units

Feature (basis) functions model nonlinearities:

Linear regression:
$$f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$$

Logistic regression:
$$f(\mathbf{x}) = g\Big(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\Big)$$

Important property:
• The problem of learning the weights is the same as it was for the linear units
• Trick: we have changed the inputs, but the model is still linear in the new inputs (and in the weights)

Page 7: Supervised learning. Multilayer neural networks

Learning with feature functions.

Function to learn:
$$f(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{i=1}^{k} w_i \phi_i(\mathbf{x})$$

On-line gradient update for the $\langle \mathbf{x}, y \rangle$ pair:
$$w_j \leftarrow w_j + \alpha\,(y - f(\mathbf{x}, \mathbf{w}))\,\phi_j(\mathbf{x})$$
$$w_0 \leftarrow w_0 + \alpha\,(y - f(\mathbf{x}, \mathbf{w}))$$

Gradient updates are of the same form as in the linear and logistic regression models.

Example. Regression with polynomials of degree m.

$$f(x, \mathbf{w}) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i$$

On-line update for the $\langle x, y \rangle$ pair:
$$w_j \leftarrow w_j + \alpha\,(y - f(x, \mathbf{w}))\,x^j$$
$$w_0 \leftarrow w_0 + \alpha\,(y - f(x, \mathbf{w}))$$
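Putting the model and the update together gives a complete on-line learner for the polynomial example. A sketch assuming NumPy; the defaults for alpha and the number of passes are my own:

```python
import numpy as np

def fit_poly_online(data, m, alpha=0.01, n_passes=100):
    """On-line gradient updates for degree-m polynomial regression.
    data: sequence of (x, y) pairs with scalar x and y."""
    w = np.zeros(m + 1)
    for _ in range(n_passes):
        for x, y in data:
            phi = np.array([x ** i for i in range(m + 1)])  # phi_0 = 1 is the bias
            err = y - w @ phi                               # y - f(x, w)
            w += alpha * err * phi                          # w_j <- w_j + alpha*err*x**j
    return w
```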

Page 8: Supervised learning. Multilayer neural networks

Multi-layered neural networks

• An alternative way to introduce nonlinearities to regression/classification models
• Idea: cascade several simple neural models with logistic units, much like neuron connections

Multilayer neural network

• Cascades multiple logistic regression units
• Also called a multilayer perceptron (MLP)
• Example: a (2-layer) classifier with non-linear decision boundaries

[Network diagram: input layer $1, x_1, x_2, \ldots, x_d$; hidden layer of logistic units with outputs $z_1^{(1)}, z_2^{(1)}$ and weights $w^{(1)}$; output-layer unit with input $z_1^{(2)}$ and weights $w^{(2)}_{1,0}, w^{(2)}_{1,1}, w^{(2)}_{1,2}$, computing $p(y = 1 \mid \mathbf{x})$.]
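A minimal sketch of the forward pass of this 2-layer network (the matrix layout and names are my own: W1 holds the first-layer weights $w^{(1)}$ and W2 the output weights $w^{(2)}$, each with the bias in position 0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, W2):
    """2-layer MLP: hidden logistic units feeding one logistic output unit.
    W1: (n_hidden, d+1), W2: (n_hidden+1,). Returns p(y=1 | x, w)."""
    h = sigmoid(W1 @ np.concatenate(([1.0], x)))     # hidden outputs z(1)
    return sigmoid(W2 @ np.concatenate(([1.0], h)))  # output unit
```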

Page 9: Supervised learning. Multilayer neural networks

Multilayer neural network

• Models non-linearities through logistic regression units
• Can be applied to both regression and binary classification problems

[Network diagram: input layer → hidden layer → output layer, with the output unit as an option:
classification: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w})$, or
regression: $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$.]

Multilayer neural network

• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers)
• The output layer determines whether it is a regression or a binary classification problem

[Network diagram: input layer → hidden layers → output layer, with the same output option:
classification: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w})$, or
regression: $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$.]

Page 10: Supervised learning. Multilayer neural networks

Learning with MLP

• How to learn the parameters of the neural network? The online gradient descent algorithm:
  – weight updates are based on the online error $J_{\text{online}}(D_i, \mathbf{w})$:
$$w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{\text{online}}(D_i, \mathbf{w})$$
• We need to compute gradients for the weights in all units
• They can be computed in one backward sweep through the net!!!
• The process is called back-propagation

Backpropagation

Notation:
$x_i(k)$ – output of unit $i$ on level $k$
$z_i(k)$ – input to the sigmoid function on level $k$
$w_{i,j}(k)$ – weight between units $j$ and $i$ on levels $(k-1)$ and $k$

$$z_i(k) = w_{i,0}(k) + \sum_j w_{i,j}(k)\, x_j(k-1)$$
$$x_i(k) = g(z_i(k))$$

[Diagram: units on the (k-1)-th, k-th and (k+1)-th levels, connected by weights $w_{i,j}(k)$ and $w_{l,i}(k+1)$.]

Page 11: Supervised learning. Multilayer neural networks

Backpropagation

Update weight $w_{i,j}(k)$ using a data point $D_u = \langle \mathbf{x}, y \rangle$:
$$w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w})$$

Let
$$\delta_i(k) = \frac{\partial}{\partial z_i(k)} J_{\text{online}}(D_u, \mathbf{w})$$

Then:
$$\frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w}) = \frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial z_i(k)} \frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k)\, x_j(k-1)$$

such that $\delta_i(k)$ is computed from $x_i(k)$ and the next layer's $\delta_l(k+1)$:
$$\delta_i(k) = \Big(\sum_l \delta_l(k+1)\, w_{l,i}(k+1)\Big)\, x_i(k)\,\big(1 - x_i(k)\big)$$

Last unit (the same as for the regular linear units):
$$\delta_i(K) = -\big(y - f(\mathbf{x}, \mathbf{w})\big)$$

It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!!!
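The delta recursion translates into a short backward sweep. A sketch assuming sigmoid units throughout and my own data layout (xs[k] is the vector of outputs $x(k)$ of level k, with xs[0] the inputs; W[k] is the weight matrix into level k+1, bias weights in column 0):

```python
import numpy as np

def backward_sweep(xs, W, y, f_out):
    """Compute delta_i(k) for every level in one backward pass.
    Returns a list of delta vectors, one per level 1..K (single output unit)."""
    K = len(W)
    deltas = [None] * K
    deltas[K - 1] = np.array([-(y - f_out)])  # last unit: delta(K) = -(y - f(x,w))
    for k in range(K - 2, -1, -1):
        # sum_l delta_l(k+1) w_{l,i}(k+1), skipping the bias column
        back = W[k + 1][:, 1:].T @ deltas[k + 1]
        deltas[k] = back * xs[k + 1] * (1 - xs[k + 1])  # times x_i(k)(1 - x_i(k))
    return deltas
```

The weight gradient for a level is then the outer product of that level's deltas with the previous level's outputs, exactly as in the formula $\delta_i(k)\, x_j(k-1)$ above.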

Learning with MLP

• Online gradient descent algorithm; weight update:
$$w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w})$$

$$\frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w}) = \frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial z_i(k)} \frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k)\, x_j(k-1)$$

so
$$w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$$

where
$x_j(k-1)$ – the j-th output of the (k-1)-th layer
$\delta_i(k)$ – derivative computed via backpropagation
$\alpha$ – a learning rate

Page 12: Supervised learning. Multilayer neural networks

Online gradient descent algorithm for MLP

Online-gradient-descent (D, number of iterations)
  initialize all weights $w_{i,j}(k)$
  for i = 1 : number of iterations do
    select a data point $D_u = \langle \mathbf{x}, y \rangle$ from D
    set the learning rate $\alpha$
    compute the outputs $x_j(k)$ for each unit
    compute the derivatives $\delta_i(k)$ via backpropagation
    update all weights (in parallel): $w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$
  end for
  return weights w
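A minimal end-to-end sketch of this algorithm for a 2-hidden-unit network like the one in the XOR example on the next slides (the initialization, learning rate, and iteration count are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_gradient_descent(D, n_hidden=2, alpha=0.5, n_iters=20000, seed=0):
    """Online gradient descent for a 2-layer MLP with sigmoid units."""
    rng = np.random.default_rng(seed)
    d = len(D[0][0])
    W1 = rng.normal(0.0, 1.0, (n_hidden, d + 1))  # initialize all weights
    W2 = rng.normal(0.0, 1.0, n_hidden + 1)       # (bias weight in position 0)
    for _ in range(n_iters):
        x, y = D[rng.integers(len(D))]            # select a data point <x, y>
        xb = np.concatenate(([1.0], x))           # compute outputs for each unit
        h = sigmoid(W1 @ xb)
        hb = np.concatenate(([1.0], h))
        f = sigmoid(W2 @ hb)
        d2 = -(y - f)                             # derivatives via backpropagation
        d1 = d2 * W2[1:] * h * (1 - h)
        W2 -= alpha * d2 * hb                     # update all weights (in parallel):
        W1 -= alpha * np.outer(d1, xb)            # w <- w - alpha * delta * x(k-1)
    return W1, W2

# XOR data (the example on the next slides): no linear boundary exists,
# but two hidden units suffice
D = [(np.array([0.0, 0.0]), 0), (np.array([0.0, 1.0]), 1),
     (np.array([1.0, 0.0]), 1), (np.array([1.0, 1.0]), 0)]
W1, W2 = online_gradient_descent(D)
```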

Xor example.

• A linear decision boundary does not exist

[Figure: the XOR data in 2-D; the two classes cannot be separated by a line.]

Page 13: Supervised learning. Multilayer neural networks


Xor example. Linear unit


Xor example. Neural network with 2 hidden units

Page 14: Supervised learning. Multilayer neural networks


Xor example. Neural network with 10 hidden units

MLP in practice

• Optical character recognition: 20x20 pixel digits
  – automatic sorting of mail
  – a 5-layer network with multiple outputs: 10 outputs (0, 1, ..., 9), 20x20 = 400 inputs

  layer   neurons   weights
    5        10       3000
    4       300       1200
    3      1200      50000
    2       784       3136
    1      3136      78400