TRANSCRIPT
04/19/23, RBF Networks, M.W. Mak

Radial Basis Function Networks

1. Introduction
2. Finding RBF Parameters
3. Decision Surface of RBF Networks
4. Comparison between RBF and BP
1. Introduction

MLPs are highly non-linear in the parameter space, so gradient descent can get trapped in local minima. RBF networks avoid this problem by dividing the learning into two independent processes: first determine the basis-function parameters, then solve for the output weights w.
RBF networks implement the function

s(x) = w_0 + \sum_{i=1}^{M} w_i \phi(\|x - c_i\|)

The weights w_i and the centers c_i can be determined separately, giving a fast learning algorithm.

Basis function types:

\phi(r) = r^2 \log(r)                 (thin-plate spline)
\phi(r) = \exp(-r^2 / 2\sigma^2)      (Gaussian)
\phi(r) = (r^2 + \sigma^2)^{1/2}      (multiquadric)
\phi(r) = (r^2 + \sigma^2)^{-1/2}     (inverse multiquadric)
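As a concrete sketch of the sum above for the Gaussian case (Python with NumPy; the function name and array shapes are my own choices, not from the slides):

```python
import numpy as np

def rbf_output(x, centers, sigmas, w):
    """s(x) = w0 + sum_i w_i * exp(-||x - c_i||^2 / (2 * sigma_i^2)).

    x: (n,) input; centers: (M, n); sigmas: (M,); w: (M+1,), w[0] = bias w0.
    """
    r2 = np.sum((centers - x) ** 2, axis=1)   # squared distances ||x - c_i||^2
    a = np.exp(-r2 / (2.0 * sigmas ** 2))     # Gaussian basis activations
    return w[0] + np.dot(w[1:], a)
```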
For Gaussian basis functions,

s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2} \right)

Assume the variances across the dimensions are equal:

s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\frac{1}{2\sigma_i^2} \sum_{j=1}^{n} (x_{pj} - c_{ij})^2 \right)
To write this in matrix form, let

a_{pi} = \phi(\|x_p - c_i\|), \quad so that \quad s(x_p) = \sum_{i=0}^{M} w_i a_{pi}, \quad where a_{p0} = 1.

Then

\begin{bmatrix} s(x_1) \\ s(x_2) \\ \vdots \\ s(x_N) \end{bmatrix}
=
\begin{bmatrix}
1 & a_{11} & a_{12} & \cdots & a_{1M} \\
1 & a_{21} & a_{22} & \cdots & a_{2M} \\
\vdots & & & & \vdots \\
1 & a_{N1} & a_{N2} & \cdots & a_{NM}
\end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}

i.e. s = A w, and when A is square and invertible the weights follow from w = A^{-1} s.
2. Finding the RBF Parameters

Use the K-means algorithm to find the centers c_i:

c_i = \frac{1}{N_i} \sum_{x_p \in \Theta_i} x_p

where \Theta_i is the set of N_i training samples assigned to the i-th cluster.
K-means Algorithm

Step 1: K initial cluster centers are chosen randomly from the samples to form K groups.
Step 2: Each new sample is added to the group whose mean is closest to the sample.
Step 3: Adjust the mean of that group to take account of the new point.
Step 4: Repeat Step 2 until the distance between the old means and the new means of all clusters is smaller than a predefined tolerance.
Outcome: There are K clusters whose means represent the centroids of the clusters.

Advantages:
(1) A fast and simple algorithm.
(2) Reduces the effect of noisy samples.
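The four steps can be sketched as follows (Python with NumPy; the tolerance, iteration cap, and tie-handling for empty clusters are my own choices):

```python
import numpy as np

def kmeans(X, K, tol=1e-4, max_iter=100, seed=0):
    """Plain K-means: random initial centers drawn from the samples, then
    alternate assignment and mean update until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # step 1
    for _ in range(max_iter):
        # step 2: assign each sample to the group with the nearest mean
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 3: recompute each group mean (keep old center if group empty)
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(K)])
        # step 4: stop when all means moved less than the tolerance
        if np.linalg.norm(new_centers - centers, axis=1).max() < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```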
Use the K-nearest-neighbor rule to find the function widths:

\sigma_i^2 = \frac{1}{K} \sum_{k=1}^{K} \| c_k - c_i \|^2

where the c_k are the K nearest neighbors of c_i. The objective is to cover the training points so that a smooth fit of the training samples can be achieved.
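A minimal sketch of this width heuristic (Python with NumPy; the function name is my own):

```python
import numpy as np

def knn_widths(centers, K):
    """sigma_i^2 = (1/K) * sum of squared distances from c_i to its
    K nearest neighboring centers (excluding c_i itself)."""
    d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)           # exclude the center itself
    nearest = np.sort(d2, axis=1)[:, :K]   # K smallest squared distances
    return np.sqrt(nearest.mean(axis=1))
```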
[Figure: centers and widths found by K-means and K-NN]
Determining the weights w using the least-squares method:

E = \sum_{p=1}^{N} \left( d_p - \sum_{j=0}^{M} w_j \phi(\|x_p - c_j\|) \right)^2

where d_p is the desired output for pattern p. In matrix form,

E = (d - Aw)^T (d - Aw)

Setting \partial E / \partial w = 0 gives

w = (A^T A)^{-1} A^T d
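In practice the normal-equations formula above is usually evaluated with a least-squares solver rather than an explicit inverse; a sketch (Python with NumPy, function name my own):

```python
import numpy as np

def rbf_weights(A, d):
    """Least-squares weights for s = A w.  Mathematically this is
    w = (A^T A)^{-1} A^T d; lstsq solves the same problem via SVD,
    which stays stable when A^T A is close to singular."""
    w, *_ = np.linalg.lstsq(A, d, rcond=None)
    return w
```

Here A would be the design matrix with a leading column of ones (for the bias w_0) followed by the basis activations.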
Let E be the total squared error between the actual output and the target output d = (d_1, d_2, \ldots, d_N)^T:

E = (d - Aw)^T (d - Aw)
  = d^T d - d^T A w - w^T A^T d + w^T A^T A w

\frac{\partial E}{\partial w} = -2 A^T d + 2 A^T A w = 0

\Rightarrow A^T A w = A^T d \Rightarrow w = (A^T A)^{-1} A^T d
Note that

\frac{\partial}{\partial x}(x^T y) = \frac{\partial}{\partial x}(y^T x) = y, \qquad
\frac{\partial}{\partial x}(x^T A x) = A x + A^T x

Problems:
(1) Susceptible to round-off error.
(2) No solution if A^T A is singular.
(3) If A^T A is close to singular, we get very large components in w.
Reasons:
(1) Inaccuracy in forming A^T A.
(2) If A is ill-conditioned, a small change in A introduces a large change in (A^T A)^{-1}.
(3) If A^T A is close to singular, dependent columns exist in A^T A, e.g. two parallel straight lines.
Singular matrix:

\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} 1 \\ 0 \end{bmatrix}

The two equations describe parallel lines, so no solution exists. If the lines are nearly parallel, they intersect at (x \to \infty, y \to -\infty) or (x \to -\infty, y \to \infty), i.e. the magnitude of the solution becomes very large; hence overflow will occur. The effect of the large components could be cancelled out only if the machine precision were infinite.
If the machine precision is finite, we get a large error. For example, solving a nearly singular system such as

\begin{bmatrix} 1 & 2 \\ 2 & 4.000001 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} = b

with right-hand-side components on the order of 10^{38} produces intermediate quantities that are huge and should nearly cancel; with finite precision the cancellation is inexact and the computed solution carries a large error.

Solution: Singular Value Decomposition.
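A sketch of the SVD remedy (Python with NumPy; the matrix is the nearly singular example above, while the right-hand side and tolerance rule are my own illustrative choices):

```python
import numpy as np

# Nearly singular system: the second row is almost 2x the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.000001]])
b = np.array([1.0, 2.0])   # exact solution: x = 1, y = 0

# Build a pseudo-inverse from the SVD, truncating tiny singular values
# instead of letting their reciprocals blow up the solution.
U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)   # drop negligible singular values
x = Vt.T @ (s_inv * (U.T @ b))            # x = V S^+ U^T b
```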
RBF learning process:

x_p → K-means → c_i
c_i → K-Nearest Neighbor → σ_i
(x_p, c_i, σ_i) → Basis Functions → A
A → Linear Regression → w
RBF learning by gradient descent

Let

a_i(x_p) = \exp\left( -\frac{1}{2} \sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{\sigma_{ij}^2} \right)
\quad and \quad
e(x_p) = d(x_p) - s(x_p),

and

E = \frac{1}{2} \sum_{p=1}^{N} e^2(x_p).

We have \frac{\partial E}{\partial w_i}, \frac{\partial E}{\partial \sigma_{ij}}, and \frac{\partial E}{\partial c_{ij}}. Apply gradient descent.
We have the following update equations:

w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(x_p) a_i(x_p), \quad i = 1, 2, \ldots, M

w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(x_p), \quad when i = 0

\sigma_{ij}(t+1) = \sigma_{ij}(t) + \eta_\sigma \sum_{p=1}^{N} e(x_p) w_i a_i(x_p) \frac{(x_{pj} - c_{ij})^2}{\sigma_{ij}^3(t)}

c_{ij}(t+1) = c_{ij}(t) + \eta_c \sum_{p=1}^{N} e(x_p) w_i a_i(x_p) \frac{x_{pj} - c_{ij}}{\sigma_{ij}^2(t)}
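One batch step of these update equations can be sketched as follows (Python with NumPy; the function name, learning-rate values, and array layout are my own choices):

```python
import numpy as np

def rbf_gd_step(X, d, w, c, sigma, eta_w=0.05, eta_c=0.01, eta_s=0.01):
    """One batch gradient-descent step on E = 0.5 * sum_p e(x_p)^2.

    X: (N, n) inputs, d: (N,) targets, w: (M+1,) weights (w[0] = bias),
    c: (M, n) centers, sigma: (M, n) per-dimension widths."""
    diff = X[:, None, :] - c[None, :, :]                        # x_pj - c_ij
    a = np.exp(-0.5 * np.sum(diff ** 2 / sigma ** 2, axis=2))   # a_i(x_p)
    e = d - (w[0] + a @ w[1:])                                  # e(x_p)
    g = e[:, None] * a * w[None, 1:]       # shared factor, uses w_i(t)
    w_new, c_new, s_new = w.copy(), c.copy(), sigma.copy()
    w_new[0] += eta_w * e.sum()                                 # bias update
    w_new[1:] += eta_w * (e @ a)                                # weight update
    c_new += eta_c * np.einsum('pi,pij->ij', g, diff / sigma ** 2)
    s_new += eta_s * np.einsum('pi,pij->ij', g, diff ** 2 / sigma ** 3)
    return w_new, c_new, s_new, e
```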
Elliptical Basis Function Networks

\phi_j(x_p) = \exp\left\{ -\frac{1}{2} (x_p - \mu_j)^T \Sigma_j^{-1} (x_p - \mu_j) \right\}

\mu_j: function centers
\Sigma_j: covariance matrices

The k-th network output is

y_k(x_p) = \sum_{j=0}^{J} w_{kj} \phi_j(x_p)

[Figure: EBF network with inputs x_1, x_2, \ldots, x_n, basis functions \phi_1, \phi_2, \ldots, \phi_M, and outputs y_1(x), \ldots, y_K(x)]
K-means and sample covariance

K-means:

\mu_j = \frac{1}{N_j} \sum_{x \in \Theta_j} x, \quad where x \in \Theta_j if \|x - \mu_j\| \le \|x - \mu_k\| \; \forall k \ne j

Sample covariance:

\Sigma_j = \frac{1}{N_j} \sum_{x \in \Theta_j} (x - \mu_j)(x - \mu_j)^T

Alternatively, the parameters can be estimated by the EM algorithm.
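The per-cluster estimates and the elliptical basis evaluation can be sketched as follows (Python with NumPy; function names and the test data are my own):

```python
import numpy as np

def cluster_params(X, labels, K):
    """Per-cluster mean and sample covariance for an EBF network:
    mu_j = (1/N_j) sum x,  Sigma_j = (1/N_j) sum (x - mu_j)(x - mu_j)^T."""
    mus, covs = [], []
    for j in range(K):
        Xj = X[labels == j]
        mu = Xj.mean(axis=0)
        diff = Xj - mu
        mus.append(mu)
        covs.append(diff.T @ diff / len(Xj))
    return np.array(mus), np.array(covs)

def ebf_basis(x, mu, cov):
    """phi_j(x) = exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu))."""
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))
```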
EBF vs. RBF networks

[Figure: decision regions of an RBFN with 4 centers and an EBFN with 4 centers on a two-class problem (Class 1, Class 2); both axes range from -3 to 3]
Elliptical Basis Function Networks

[Figure: surface plot of output 1 of an EBF network (bias, no rescale, gamma = 1), data file 'nxor.ebf 4.Y.N.1.dat'; the output ranges from about -1 to 2 over the input square [-3, 3] x [-3, 3]]
RBFN for Pattern Classification

Decision element: MLP uses hyperplanes; RBF uses kernel functions.

The probability density function (also called the conditional density function or likelihood) of the k-th class is defined as p(x | C_k).
According to Bayes' theorem, the posterior probability is

P(C_k | x) = \frac{p(x | C_k) P(C_k)}{p(x)}

where P(C_k) is the prior probability and

p(x) = \sum_r p(x | C_r) P(C_r)

It is possible to use a common pool of M basis functions, labeled by an index j, to represent all of the class-conditional densities, i.e.

p(x | C_k) = \sum_{j=1}^{M} p(x | j) P(j | C_k)
[Figure: each class-conditional density p(x | C_k) drawn as a mixture of the shared densities p(x | 1), p(x | 2), \ldots, p(x | M), with mixing coefficients P(1 | C_k), \ldots, P(M | C_k):

p(x | C_k) = \sum_{j=1}^{M} p(x | j) P(j | C_k) ]
Hence

p(x) = \sum_k \sum_{j=1}^{M} p(x | j) P(j | C_k) P(C_k) = \sum_{j=1}^{M} p(x | j) P(j)

and the posterior becomes

P(C_k | x) = \frac{\sum_{j=1}^{M} p(x | j) P(j | C_k) P(C_k)}{\sum_{j'=1}^{M} p(x | j') P(j')}

Multiplying each term by P(j)/P(j),

P(C_k | x) = \sum_{j=1}^{M} \frac{P(j | C_k) P(C_k)}{P(j)} \cdot \frac{p(x | j) P(j)}{\sum_{j'=1}^{M} p(x | j') P(j')}
P(C_k | x) = \sum_{j=1}^{M} P(C_k | j) \cdot \frac{p(x | j) P(j)}{\sum_{j'=1}^{M} p(x | j') P(j')} = \sum_{j=1}^{M} w_{kj} \phi_j(x)

Hidden node's output \phi_j(x) = P(j | x): the posterior probability of the j-th set of features being present in the input x.

Weight w_{kj} = P(C_k | j): the posterior probability of class membership, given the presence of the j-th set of features.

No bias term.
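This decomposition can be checked numerically (Python with NumPy; all probability values below are made-up illustrative numbers, not from the slides):

```python
import numpy as np

# Made-up numbers: M = 3 shared basis densities, K = 2 classes.
p_x_given_j = np.array([0.5, 0.2, 0.1])        # p(x | j) at some fixed x
P_j_given_Ck = np.array([[0.7, 0.2, 0.1],      # P(j | C_k), rows sum to 1
                         [0.1, 0.3, 0.6]])
P_Ck = np.array([0.4, 0.6])                    # class priors P(C_k)

# Direct Bayes: P(C_k | x) = p(x | C_k) P(C_k) / p(x)
p_x_given_Ck = P_j_given_Ck @ p_x_given_j      # sum_j p(x|j) P(j|C_k)
post_direct = p_x_given_Ck * P_Ck / (p_x_given_Ck @ P_Ck)

# RBF form: P(C_k | x) = sum_j w_kj * phi_j(x)
P_j = P_Ck @ P_j_given_Ck                      # P(j) = sum_k P(j|C_k) P(C_k)
phi = p_x_given_j * P_j / (p_x_given_j @ P_j)  # phi_j(x) = P(j | x)
w = P_j_given_Ck * P_Ck[:, None] / P_j         # w_kj = P(C_k | j)
post_rbf = w @ phi
```

Both routes give the same class posteriors, which is exactly the identity derived above.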
Comparison of RBF and MLP

                         RBF networks                       MLP
Learning speed           Very fast                          Very slow
Convergence              Almost guaranteed                  Not guaranteed
Response time            Slow                               Fast
Memory requirement       Very large                         Small
Hardware implementation  IBM ZISC036, Nestor Ni1000         Voice Direct 364
                         www-5.ibm.com/fr/cdlab/zisc.html   www.sensoryinc.com
Generalization           Usually better                     Usually poorer

To learn more about NN hardware, see http://www.particle.kth.se/~lindsey/HardwareNNWCourse/home.html