Non-Bayes classifiers. Linear discriminants, neural networks.

TRANSCRIPT
Discriminant functions (1)

Bayes classification rule:
$$P(w_1|x) - P(w_2|x) \gtrless 0 \;?\; w_1 : w_2$$

Instead we might try to find a function:
$$f_{w_1,w_2}(x) \gtrless 0 \;?\; w_1 : w_2$$

$f_{w_1,w_2}(x)$ is called a discriminant function.

$\{x \mid f_{w_1,w_2}(x) = 0\}$ is the decision surface.
Discriminant functions (2)

(Figure: two examples of Class 1 and Class 2 samples separated by a linear decision surface.)

Linear discriminant function:
$$f_{w_1,w_2}(x) = w^T x + w_0$$

The decision surface is a hyperplane: $w^T x + w_0 = 0$.
Linear discriminant – perceptron cost function

Replace $w \leftarrow [w^T, w_0]^T$ and $x \leftarrow [x^T, 1]^T$.

Thus the decision function is now $f_{w_1,w_2}(x) = w^T x$ and the decision surface is $w^T x = 0$.

Perceptron cost function:
$$J(w) = \sum_{x} \delta_x\, w^T x$$

where
$$\delta_x = \begin{cases} -1, & \text{if } x \in w_1 \text{ and } w^T x < 0 \\ +1, & \text{if } x \in w_2 \text{ and } w^T x > 0 \\ 0, & \text{if } x \text{ is classified correctly} \end{cases}$$
Linear discriminant – perceptron cost function

Perceptron cost function:
$$J(w) = \sum_{x} \delta_x\, w^T x$$

(Figure: Class 1 and Class 2 samples on either side of the decision surface.)

The value of $J(w)$ is proportional to the sum of distances of all misclassified samples to the decision surface.

If the discriminant function separates the classes perfectly, then $J(w) = 0$. Otherwise $J(w) > 0$, and we want to minimize it.

$J(w)$ is continuous and piecewise linear, so we might try to use a gradient descent algorithm.
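As an illustration, here is a minimal NumPy sketch of this cost function (the function and variable names are my own, not from the slides); it assumes augmented sample vectors, so the decision surface is $w^T x = 0$:

```python
import numpy as np

def perceptron_cost(w, X1, X2):
    """Perceptron cost J(w) = sum of delta_x * w^T x, where delta_x = -1
    for class-1 samples with w^T x < 0, +1 for class-2 samples with
    w^T x > 0, and 0 for correctly classified samples.
    X1, X2 are (n_samples, dim) arrays of augmented vectors for w1, w2."""
    s1 = X1 @ w          # scores of class-1 samples; misclassified if < 0
    s2 = X2 @ w          # scores of class-2 samples; misclassified if > 0
    return -s1[s1 < 0].sum() + s2[s2 > 0].sum()
```

By construction every term in the sum is non-negative, and the cost is 0 exactly when no sample is misclassified.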
Linear discriminant – perceptron algorithm

Gradient descent:
$$w(t+1) = w(t) - \rho_t \left. \frac{\partial J(w)}{\partial w} \right|_{w = w(t)}$$

At points where $J(w)$ is differentiable,
$$\frac{\partial J(w)}{\partial w} = \sum_{x\ \mathrm{misclassified}} \delta_x\, x$$

Thus
$$w(t+1) = w(t) - \rho_t \sum_{x\ \mathrm{misclassified}} \delta_x\, x$$

The perceptron algorithm converges when the classes are linearly separable, under some conditions on the learning rate $\rho_t$.
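A minimal sketch of the update rule above, assuming a constant learning rate and samples given per class (names are my own; samples on the surface are counted as misclassified so training starts from $w = 0$):

```python
import numpy as np

def perceptron_train(X1, X2, rho=1.0, max_iter=1000):
    """Perceptron algorithm: w(t+1) = w(t) - rho * sum of delta_x * x
    over currently misclassified samples. Converges (here: stops with
    zero misclassifications) if the classes are linearly separable."""
    # augment with a constant-1 coordinate so w absorbs the bias w0
    A1 = np.hstack([X1, np.ones((len(X1), 1))])
    A2 = np.hstack([X2, np.ones((len(X2), 1))])
    w = np.zeros(A1.shape[1])
    for _ in range(max_iter):
        m1 = A1[A1 @ w <= 0]     # class-1 samples on the wrong side
        m2 = A2[A2 @ w >= 0]     # class-2 samples on the wrong side
        if len(m1) == 0 and len(m2) == 0:
            return w             # J(w) = 0: perfect separation
        grad = -m1.sum(axis=0) + m2.sum(axis=0)  # sum of delta_x * x
        w = w - rho * grad
    return w
```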
Sum of error squares estimation

We want to find a discriminant function
$$f_{w_1,w_2}(x) = w^T x$$
whose output is similar to $y(x)$.

Let $y(x) = \pm 1$ denote the desired output function: $+1$ for one class and $-1$ for the other.

Use the sum of error squares as the similarity criterion:
$$J(w) = \sum_{i=1}^{N} \left( y_i - w^T x_i \right)^2, \qquad \hat{w} = \arg\min_w J(w)$$
Sum of error squares estimation

Minimize the mean square error:
$$\frac{\partial J(w)}{\partial w} = -2 \sum_{i=1}^{N} \left( y_i - x_i^T w \right) x_i = 0$$

Thus
$$\hat{w} = \left( \sum_{i=1}^{N} x_i x_i^T \right)^{-1} \sum_{i=1}^{N} x_i y_i$$
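The closed-form estimate above is just the solution of the normal equations $X^T X \hat{w} = X^T y$; a short NumPy sketch (function name is my own):

```python
import numpy as np

def least_squares_weights(X, y):
    """Sum-of-error-squares estimate:
    w_hat = (sum_i x_i x_i^T)^(-1) sum_i x_i y_i,
    i.e. the solution of the normal equations X^T X w = X^T y.
    X is (N, dim); y is (N,) with desired outputs +1 / -1."""
    # X.T @ X == sum_i x_i x_i^T,  X.T @ y == sum_i x_i y_i
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the linear system is numerically preferable to forming the inverse explicitly, though it matches the formula on the slide.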
Neurons
Artificial neuron.

(Figure: inputs $x_1, x_2, \ldots, x_l$ with weights $w_1, w_2, \ldots, w_l$, a bias weight $w_0$, and a threshold function $f$.)

The figure above represents an artificial neuron calculating:
$$y = f\!\left( \sum_{i=1}^{l} w_i x_i + w_0 \right)$$
Artificial neuron.

Threshold functions $f$:

Step function:
$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Logistic function:
$$f(x) = \frac{1}{1 + e^{-ax}}$$
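A small sketch of both threshold functions and the neuron computation from the previous slide (names are my own; the step function here maps the boundary $x = 0$ to 1, matching the definition above):

```python
import numpy as np

def step(x):
    """Step threshold: 1 for x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

def logistic(x, a=1.0):
    """Logistic threshold f(x) = 1 / (1 + exp(-a*x)); a controls the slope."""
    return 1.0 / (1.0 + np.exp(-a * x))

def neuron(x, w, w0, f=logistic):
    """Artificial neuron: y = f(sum_i w_i * x_i + w0)."""
    return f(np.dot(w, x) + w0)
```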
Combining artificial neurons

(Figure: inputs $x_1, x_2, \ldots, x_l$ feeding a network of neurons. Multilayer perceptron with 3 layers.)
Discriminating ability of the multilayer perceptron

Since a 3-layer perceptron can approximate any smooth function, it can approximate the optimal discriminant function of two classes:
$$F(x) = P(w_1|x) - P(w_2|x)$$
Training of multilayer perceptron

(Figure: neurons of layer $r-1$ and layer $r$, each applying $f$; the output $y_k^{r-1}$ of neuron $k$ in layer $r-1$ is multiplied by weight $w_{jk}^r$ and contributes to the activation $v_j^r$ of neuron $j$ in layer $r$, whose output is $y_j^r$.)
Training and cost function

Desired network output for input $x(i)$: $y(i)$.
Trained network output for input $x(i)$: $\hat{y}(i)$.

Cost function for one training sample:
$$E(i) = \frac{1}{2} \sum_{m=1}^{k_L} \left( y_m(i) - \hat{y}_m(i) \right)^2$$

Total cost function:
$$J = \sum_{i=1}^{N} E(i)$$

Goal of the training: find the values of the weights $w_{jk}^r$ which minimize the cost function $J$.
Gradient descent

Denote:
$$w_j^r = \left[ w_{j0}^r, w_{j1}^r, \ldots, w_{jk_{r-1}}^r \right]^T$$

Gradient descent:
$$w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial J}{\partial w_j^r}$$

Since $J = \sum_{i=1}^{N} E(i)$, we might want to update the weights after processing each training sample separately:
$$w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial E(i)}{\partial w_j^r}$$
Gradient descent

Chain rule for differentiating composite functions:
$$\frac{\partial E(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r(i)} \frac{\partial v_j^r(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r(i)}\, y^{r-1}(i)$$

Denote:
$$\delta_j^r(i) = \frac{\partial E(i)}{\partial v_j^r(i)}$$
Backpropagation

If $r = L$, then
$$\delta_j^L(i) = \frac{\partial E(i)}{\partial v_j^L(i)} = \frac{\partial}{\partial v_j^L(i)} \frac{1}{2} \sum_{m=1}^{k_L} \left( f(v_m^L(i)) - y_m(i) \right)^2 = \left( \hat{y}_j(i) - y_j(i) \right) f'(v_j^L(i)) = e_j(i)\, f'(v_j^L(i))$$

If $r < L$, then
$$\delta_j^{r-1}(i) = \frac{\partial E(i)}{\partial v_j^{r-1}(i)} = \sum_{k=1}^{k_r} \frac{\partial E(i)}{\partial v_k^r(i)} \frac{\partial v_k^r(i)}{\partial v_j^{r-1}(i)} = \left[ \sum_{k=1}^{k_r} \delta_k^r(i)\, w_{kj}^r \right] f'(v_j^{r-1}(i))$$
Backpropagation algorithm

• Initialization: initialize all weights with random values.
• Forward computations: for each training vector $x(i)$ compute all $v_j^r(i)$, $\hat{y}(i)$.
• Backward computations: for each $i$, $j$ and $r = L, L-1, \ldots, 2$ compute $\delta_j^{r-1}(i)$.
• Update weights:
$$w_j^r(\text{new}) = w_j^r(\text{old}) + \Delta w_j^r, \qquad \Delta w_j^r = -\mu \sum_{i=1}^{N} \delta_j^r(i)\, y^{r-1}(i)$$
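The steps above can be sketched for a small network with logistic thresholds; this is a minimal per-sample variant (one update per training vector, as suggested on the gradient-descent slide), with all names my own:

```python
import numpy as np

def f(v):
    """Logistic threshold function; note f'(v) = f(v) * (1 - f(v))."""
    return 1.0 / (1.0 + np.exp(-v))

def train_step(Ws, x, y, mu=0.5):
    """One backpropagation update for one training sample.
    Ws is a list of weight matrices; layer r computes
    y^r = f(W^r @ [y^(r-1); 1]) (the trailing 1 carries the bias).
    Cost: E = 1/2 * sum_m (y_m - yhat_m)^2."""
    # forward computations: store every layer's output
    ys = [x]
    for W in Ws:
        ys.append(f(W @ np.append(ys[-1], 1.0)))
    # backward computations: output-layer delta
    # delta_j^L = (yhat_j - y_j) * f'(v_j^L)
    delta = (ys[-1] - y) * ys[-1] * (1.0 - ys[-1])
    for r in range(len(Ws) - 1, -1, -1):
        grad = np.outer(delta, np.append(ys[r], 1.0))  # dE/dW^r
        if r > 0:
            # delta_j^(r-1) = [sum_k delta_k^r w_kj^r] * f'(v_j^(r-1));
            # drop the bias column before propagating back
            delta = (Ws[r][:, :-1].T @ delta) * ys[r] * (1.0 - ys[r])
        Ws[r] = Ws[r] - mu * grad  # update weights
    return Ws
```

Repeatedly calling `train_step` over the training set implements the per-sample form of the algorithm; summing the gradients over all $i$ before updating gives the batch form shown on the slide.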
MLP issues

• What is the best network configuration?
• How to choose a proper learning parameter $\mu$?
• When should training be stopped?
• Should another threshold function $f$ or cost function $J$ be chosen?