Non-Bayes classifiers. Linear discriminants, neural networks.

TRANSCRIPT
Discriminant functions (1)

Bayes classification rule:
$$P(w_1|x) - P(w_2|x) \gtrless 0 \;?\; w_1 : w_2$$

Instead we might try to find a function:
$$f_{w_1,w_2}(x) \gtrless 0 \;?\; w_1 : w_2$$

$f_{w_1,w_2}(x)$ is called a discriminant function.

$\{x \mid f_{w_1,w_2}(x) = 0\}$ is the decision surface.
Discriminant functions (2)

(Figure: two examples of Class 1 and Class 2 samples separated by a linear decision surface.)

Linear discriminant function:
$$f_{w_1,w_2}(x) = w^T x + w_0$$

The decision surface is a hyperplane: $w^T x + w_0 = 0$.
Linear discriminant – perceptron cost function

Replace $w \leftarrow [w^T, w_0]^T$ and $x \leftarrow [x^T, 1]^T$.

Thus the decision function is now $f_{w_1,w_2}(x) = w^T x$ and the decision surface is $w^T x = 0$.

Perceptron cost function:
$$J(w) = \sum_{x} \delta_x\, w^T x$$

where
$$\delta_x = \begin{cases} -1, & \text{if } x \in w_1 \text{ and } w^T x < 0 \\ +1, & \text{if } x \in w_2 \text{ and } w^T x > 0 \\ 0, & \text{if } x \text{ is classified correctly} \end{cases}$$
Linear discriminant – perceptron cost function

Perceptron cost function:
$$J(w) = \sum_{x} \delta_x\, w^T x$$

(Figure: Class 1 and Class 2 samples on either side of the decision surface.)

The value of $J(w)$ is proportional to the sum of distances of all misclassified samples to the decision surface.

If the discriminant function separates the classes perfectly, then $J(w) = 0$. Otherwise $J(w) > 0$, and we want to minimize it.

$J(w)$ is continuous and piecewise linear, so we might try to use a gradient descent algorithm.
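As an illustration, here is a minimal NumPy sketch of this cost function (the function and variable names are my own, not from the slides); it assumes augmented sample vectors, so the decision surface is $w^T x = 0$:

```python
import numpy as np

def perceptron_cost(w, X1, X2):
    """Perceptron cost J(w) = sum of delta_x * w^T x, where delta_x = -1
    for class-1 samples with w^T x < 0, +1 for class-2 samples with
    w^T x > 0, and 0 for correctly classified samples.
    X1, X2 are (n_samples, dim) arrays of augmented vectors for w1, w2."""
    s1 = X1 @ w          # scores of class-1 samples; misclassified if < 0
    s2 = X2 @ w          # scores of class-2 samples; misclassified if > 0
    return -s1[s1 < 0].sum() + s2[s2 > 0].sum()
```

By construction every term in the sum is non-negative, and the cost is 0 exactly when no sample is misclassified.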
Linear discriminant – perceptron algorithm

Gradient descent:
$$w(t+1) = w(t) - \rho_t \left. \frac{\partial J(w)}{\partial w} \right|_{w = w(t)}$$

At points where $J(w)$ is differentiable,
$$\frac{\partial J(w)}{\partial w} = \sum_{x\ \mathrm{misclassified}} \delta_x\, x$$

Thus
$$w(t+1) = w(t) - \rho_t \sum_{x\ \mathrm{misclassified}} \delta_x\, x$$

The perceptron algorithm converges when the classes are linearly separable, under some conditions on the learning rate $\rho_t$.
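A minimal sketch of the update rule above, assuming a constant learning rate and samples given per class (names are my own; samples on the surface are counted as misclassified so training starts from $w = 0$):

```python
import numpy as np

def perceptron_train(X1, X2, rho=1.0, max_iter=1000):
    """Perceptron algorithm: w(t+1) = w(t) - rho * sum of delta_x * x
    over currently misclassified samples. Converges (here: stops with
    zero misclassifications) if the classes are linearly separable."""
    # augment with a constant-1 coordinate so w absorbs the bias w0
    A1 = np.hstack([X1, np.ones((len(X1), 1))])
    A2 = np.hstack([X2, np.ones((len(X2), 1))])
    w = np.zeros(A1.shape[1])
    for _ in range(max_iter):
        m1 = A1[A1 @ w <= 0]     # class-1 samples on the wrong side
        m2 = A2[A2 @ w >= 0]     # class-2 samples on the wrong side
        if len(m1) == 0 and len(m2) == 0:
            return w             # J(w) = 0: perfect separation
        grad = -m1.sum(axis=0) + m2.sum(axis=0)  # sum of delta_x * x
        w = w - rho * grad
    return w
```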
Sum of error squares estimation

We want to find a discriminant function
$$f_{w_1,w_2}(x) = w^T x$$
whose output is similar to $y(x)$.

Let $y(x) = \pm 1$ denote the desired output function: $+1$ for one class and $-1$ for the other.

Use the sum of error squares as the similarity criterion:
$$J(w) = \sum_{i=1}^{N} \left( y_i - w^T x_i \right)^2, \qquad \hat{w} = \arg\min_w J(w)$$
Sum of error squares estimation

Minimize the mean square error:
$$\frac{\partial J(w)}{\partial w} = -2 \sum_{i=1}^{N} \left( y_i - x_i^T w \right) x_i = 0$$

Thus
$$\hat{w} = \left( \sum_{i=1}^{N} x_i x_i^T \right)^{-1} \sum_{i=1}^{N} x_i y_i$$
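The closed-form estimate above is just the solution of the normal equations $X^T X \hat{w} = X^T y$; a short NumPy sketch (function name is my own):

```python
import numpy as np

def least_squares_weights(X, y):
    """Sum-of-error-squares estimate:
    w_hat = (sum_i x_i x_i^T)^(-1) sum_i x_i y_i,
    i.e. the solution of the normal equations X^T X w = X^T y.
    X is (N, dim); y is (N,) with desired outputs +1 / -1."""
    # X.T @ X == sum_i x_i x_i^T,  X.T @ y == sum_i x_i y_i
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the linear system is numerically preferable to forming the inverse explicitly, though it matches the formula on the slide.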
Neurons
Artificial neuron.

(Figure: inputs $x_1, x_2, \ldots, x_l$ with weights $w_1, w_2, \ldots, w_l$, a bias weight $w_0$, and a threshold function $f$.)

The figure above represents an artificial neuron calculating:
$$y = f\!\left( \sum_{i=1}^{l} w_i x_i + w_0 \right)$$
Artificial neuron.

Threshold functions $f$:

Step function:
$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Logistic function:
$$f(x) = \frac{1}{1 + e^{-ax}}$$
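A small sketch of both threshold functions and the neuron computation from the previous slide (names are my own; the step function here maps the boundary $x = 0$ to 1, matching the definition above):

```python
import numpy as np

def step(x):
    """Step threshold: 1 for x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

def logistic(x, a=1.0):
    """Logistic threshold f(x) = 1 / (1 + exp(-a*x)); a controls the slope."""
    return 1.0 / (1.0 + np.exp(-a * x))

def neuron(x, w, w0, f=logistic):
    """Artificial neuron: y = f(sum_i w_i * x_i + w0)."""
    return f(np.dot(w, x) + w0)
```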
Combining artificial neurons

(Figure: inputs $x_1, x_2, \ldots, x_l$ feeding a network of neurons. Multilayer perceptron with 3 layers.)
Discriminating ability of the multilayer perceptron

Since a 3-layer perceptron can approximate any smooth function, it can approximate the optimal discriminant function of two classes:
$$F(x) = P(w_1|x) - P(w_2|x)$$
Training of multilayer perceptron

(Figure: neurons of layer $r-1$ and layer $r$, each applying $f$; the output $y_k^{r-1}$ of neuron $k$ in layer $r-1$ is multiplied by weight $w_{jk}^r$ and contributes to the activation $v_j^r$ of neuron $j$ in layer $r$, whose output is $y_j^r$.)
Training and cost function

Desired network output for input $x(i)$: $y(i)$.
Trained network output for input $x(i)$: $\hat{y}(i)$.

Cost function for one training sample:
$$E(i) = \frac{1}{2} \sum_{m=1}^{k_L} \left( y_m(i) - \hat{y}_m(i) \right)^2$$

Total cost function:
$$J = \sum_{i=1}^{N} E(i)$$

Goal of the training: find the values of the weights $w_{jk}^r$ which minimize the cost function $J$.
Gradient descent

Denote:
$$w_j^r = \left[ w_{j0}^r, w_{j1}^r, \ldots, w_{jk_{r-1}}^r \right]^T$$

Gradient descent:
$$w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial J}{\partial w_j^r}$$

Since $J = \sum_{i=1}^{N} E(i)$, we might want to update the weights after processing each training sample separately:
$$w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial E(i)}{\partial w_j^r}$$
Gradient descent

Chain rule for differentiating composite functions:
$$\frac{\partial E(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r(i)} \frac{\partial v_j^r(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r(i)}\, y^{r-1}(i)$$

Denote:
$$\delta_j^r(i) = \frac{\partial E(i)}{\partial v_j^r(i)}$$
Backpropagation

If $r = L$, then
$$\delta_j^L(i) = \frac{\partial E(i)}{\partial v_j^L(i)} = \frac{\partial}{\partial v_j^L(i)} \frac{1}{2} \sum_{m=1}^{k_L} \left( f(v_m^L(i)) - y_m(i) \right)^2 = \left( \hat{y}_j(i) - y_j(i) \right) f'(v_j^L(i)) = e_j(i)\, f'(v_j^L(i))$$

If $r < L$, then
$$\delta_j^{r-1}(i) = \frac{\partial E(i)}{\partial v_j^{r-1}(i)} = \sum_{k=1}^{k_r} \frac{\partial E(i)}{\partial v_k^r(i)} \frac{\partial v_k^r(i)}{\partial v_j^{r-1}(i)} = \left[ \sum_{k=1}^{k_r} \delta_k^r(i)\, w_{kj}^r \right] f'(v_j^{r-1}(i))$$
Backpropagation algorithm

• Initialization: initialize all weights with random values.
• Forward computations: for each training vector $x(i)$ compute all $v_j^r(i)$, $\hat{y}(i)$.
• Backward computations: for each $i$, $j$ and $r = L, L-1, \ldots, 2$ compute $\delta_j^{r-1}(i)$.
• Update weights:
$$w_j^r(\text{new}) = w_j^r(\text{old}) + \Delta w_j^r, \qquad \Delta w_j^r = -\mu \sum_{i=1}^{N} \delta_j^r(i)\, y^{r-1}(i)$$
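The steps above can be sketched for a small network with logistic thresholds; this is a minimal per-sample variant (one update per training vector, as suggested on the gradient-descent slide), with all names my own:

```python
import numpy as np

def f(v):
    """Logistic threshold function; note f'(v) = f(v) * (1 - f(v))."""
    return 1.0 / (1.0 + np.exp(-v))

def train_step(Ws, x, y, mu=0.5):
    """One backpropagation update for one training sample.
    Ws is a list of weight matrices; layer r computes
    y^r = f(W^r @ [y^(r-1); 1]) (the trailing 1 carries the bias).
    Cost: E = 1/2 * sum_m (y_m - yhat_m)^2."""
    # forward computations: store every layer's output
    ys = [x]
    for W in Ws:
        ys.append(f(W @ np.append(ys[-1], 1.0)))
    # backward computations: output-layer delta
    # delta_j^L = (yhat_j - y_j) * f'(v_j^L)
    delta = (ys[-1] - y) * ys[-1] * (1.0 - ys[-1])
    for r in range(len(Ws) - 1, -1, -1):
        grad = np.outer(delta, np.append(ys[r], 1.0))  # dE/dW^r
        if r > 0:
            # delta_j^(r-1) = [sum_k delta_k^r w_kj^r] * f'(v_j^(r-1));
            # drop the bias column before propagating back
            delta = (Ws[r][:, :-1].T @ delta) * ys[r] * (1.0 - ys[r])
        Ws[r] = Ws[r] - mu * grad  # update weights
    return Ws
```

Repeatedly calling `train_step` over the training set implements the per-sample form of the algorithm; summing the gradients over all $i$ before updating gives the batch form shown on the slide.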
MLP issues

• What is the best network configuration?
• How to choose a proper learning parameter $\mu$?
• When should training be stopped?
• Should another threshold function $f$ or cost function $J$ be chosen?