TRANSCRIPT
04/19/23, RBF Networks, M.W. Mak

Radial Basis Function Networks

1. Introduction
2. Finding RBF Parameters
3. Decision Surface of RBF Networks
4. Comparison between RBF and BP
1. Introduction

MLPs are highly non-linear in the parameter space, so gradient descent can get trapped in local minima. RBF networks avoid this problem by dividing the learning into two independent processes: first determine the basis-function parameters, then solve for the output weights w.
RBF networks implement the function

s(x) = w_0 + \sum_{i=1}^{M} w_i \phi(\|x - c_i\|)

The weights w_i and the centers c_i can be determined separately, giving a fast learning algorithm.

Basis function types:

\phi(r) = r^2 \log(r)                 (thin-plate spline)
\phi(r) = \exp(-r^2 / 2\sigma^2)      (Gaussian)
\phi(r) = (r^2 + \sigma^2)^{1/2}      (multiquadric)
\phi(r) = (r^2 + \sigma^2)^{-1/2}     (inverse multiquadric)
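As a concrete sketch of the sum above for the Gaussian case (Python with NumPy; the function name and array shapes are my own choices, not from the slides):

```python
import numpy as np

def rbf_output(x, centers, sigmas, w):
    """s(x) = w0 + sum_i w_i * exp(-||x - c_i||^2 / (2 * sigma_i^2)).

    x: (n,) input; centers: (M, n); sigmas: (M,); w: (M+1,), w[0] = bias w0.
    """
    r2 = np.sum((centers - x) ** 2, axis=1)   # squared distances ||x - c_i||^2
    a = np.exp(-r2 / (2.0 * sigmas ** 2))     # Gaussian basis activations
    return w[0] + np.dot(w[1:], a)
```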
For Gaussian basis functions,

s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2} \right)

Assume the variances across the dimensions are equal:

s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\frac{1}{2\sigma_i^2} \sum_{j=1}^{n} (x_{pj} - c_{ij})^2 \right)
To write this in matrix form, let

a_{pi} = \phi(\|x_p - c_i\|), \quad so that \quad s(x_p) = \sum_{i=0}^{M} w_i a_{pi}, \quad where a_{p0} = 1.

Then

\begin{bmatrix} s(x_1) \\ s(x_2) \\ \vdots \\ s(x_N) \end{bmatrix}
=
\begin{bmatrix}
1 & a_{11} & a_{12} & \cdots & a_{1M} \\
1 & a_{21} & a_{22} & \cdots & a_{2M} \\
\vdots & & & & \vdots \\
1 & a_{N1} & a_{N2} & \cdots & a_{NM}
\end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}

i.e. s = A w, and when A is square and invertible the weights follow from w = A^{-1} s.
2. Finding the RBF Parameters

Use the K-means algorithm to find the centers c_i:

c_i = \frac{1}{N_i} \sum_{x_p \in \Theta_i} x_p

where \Theta_i is the set of N_i training samples assigned to the i-th cluster.
K-means Algorithm

Step 1: K initial cluster centers are chosen randomly from the samples to form K groups.
Step 2: Each new sample is added to the group whose mean is closest to the sample.
Step 3: Adjust the mean of that group to take account of the new point.
Step 4: Repeat Step 2 until the distance between the old means and the new means of all clusters is smaller than a predefined tolerance.
Outcome: There are K clusters whose means represent the centroids of the clusters.

Advantages:
(1) A fast and simple algorithm.
(2) Reduces the effect of noisy samples.
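The four steps can be sketched as follows (Python with NumPy; the tolerance, iteration cap, and tie-handling for empty clusters are my own choices):

```python
import numpy as np

def kmeans(X, K, tol=1e-4, max_iter=100, seed=0):
    """Plain K-means: random initial centers drawn from the samples, then
    alternate assignment and mean update until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # step 1
    for _ in range(max_iter):
        # step 2: assign each sample to the group with the nearest mean
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 3: recompute each group mean (keep old center if group empty)
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(K)])
        # step 4: stop when all means moved less than the tolerance
        if np.linalg.norm(new_centers - centers, axis=1).max() < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```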
Use the K-nearest-neighbor rule to find the function widths:

\sigma_i^2 = \frac{1}{K} \sum_{k=1}^{K} \| c_k - c_i \|^2

where the c_k are the K nearest neighbors of c_i. The objective is to cover the training points so that a smooth fit of the training samples can be achieved.
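A minimal sketch of this width heuristic (Python with NumPy; the function name is my own):

```python
import numpy as np

def knn_widths(centers, K):
    """sigma_i^2 = (1/K) * sum of squared distances from c_i to its
    K nearest neighboring centers (excluding c_i itself)."""
    d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)           # exclude the center itself
    nearest = np.sort(d2, axis=1)[:, :K]   # K smallest squared distances
    return np.sqrt(nearest.mean(axis=1))
```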
[Figure: centers and widths found by K-means and K-NN]
Determining the weights w using the least-squares method:

E = \sum_{p=1}^{N} \left( d_p - \sum_{j=0}^{M} w_j \phi(\|x_p - c_j\|) \right)^2

where d_p is the desired output for pattern p. In matrix form,

E = (d - Aw)^T (d - Aw)

Setting \partial E / \partial w = 0 gives

w = (A^T A)^{-1} A^T d
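In practice the normal-equations formula above is usually evaluated with a least-squares solver rather than an explicit inverse; a sketch (Python with NumPy, function name my own):

```python
import numpy as np

def rbf_weights(A, d):
    """Least-squares weights for s = A w.  Mathematically this is
    w = (A^T A)^{-1} A^T d; lstsq solves the same problem via SVD,
    which stays stable when A^T A is close to singular."""
    w, *_ = np.linalg.lstsq(A, d, rcond=None)
    return w
```

Here A would be the design matrix with a leading column of ones (for the bias w_0) followed by the basis activations.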
Let E be the total squared error between the actual output and the target output d = (d_1, d_2, \ldots, d_N)^T:

E = (d - Aw)^T (d - Aw)
  = d^T d - d^T A w - w^T A^T d + w^T A^T A w

\frac{\partial E}{\partial w} = -2 A^T d + 2 A^T A w = 0

\Rightarrow A^T A w = A^T d \Rightarrow w = (A^T A)^{-1} A^T d
Note that

\frac{\partial}{\partial x}(x^T y) = \frac{\partial}{\partial x}(y^T x) = y, \qquad
\frac{\partial}{\partial x}(x^T A x) = A x + A^T x

Problems:
(1) Susceptible to round-off error.
(2) No solution if A^T A is singular.
(3) If A^T A is close to singular, we get very large components in w.
Reasons:
(1) Inaccuracy in forming A^T A.
(2) If A is ill-conditioned, a small change in A introduces a large change in (A^T A)^{-1}.
(3) If A^T A is close to singular, dependent columns exist in A^T A, e.g. two parallel straight lines.
Singular matrix:

\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} 1 \\ 0 \end{bmatrix}

The two equations describe parallel lines, so no solution exists. If the lines are nearly parallel, they intersect at (x \to \infty, y \to -\infty) or (x \to -\infty, y \to \infty), i.e. the magnitude of the solution becomes very large; hence overflow will occur. The effect of the large components could be cancelled out only if the machine precision were infinite.
If the machine precision is finite, we get a large error. For example, solving a nearly singular system such as

\begin{bmatrix} 1 & 2 \\ 2 & 4.000001 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} = b

with right-hand-side components on the order of 10^{38} produces intermediate quantities that are huge and should nearly cancel; with finite precision the cancellation is inexact and the computed solution carries a large error.

Solution: Singular Value Decomposition.
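A sketch of the SVD remedy (Python with NumPy; the matrix is the nearly singular example above, while the right-hand side and tolerance rule are my own illustrative choices):

```python
import numpy as np

# Nearly singular system: the second row is almost 2x the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.000001]])
b = np.array([1.0, 2.0])   # exact solution: x = 1, y = 0

# Build a pseudo-inverse from the SVD, truncating tiny singular values
# instead of letting their reciprocals blow up the solution.
U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)   # drop negligible singular values
x = Vt.T @ (s_inv * (U.T @ b))            # x = V S^+ U^T b
```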
RBF learning process:

x_p → K-means → c_i
c_i → K-Nearest Neighbor → σ_i
(x_p, c_i, σ_i) → Basis Functions → A
A → Linear Regression → w
RBF learning by gradient descent

Let

a_i(x_p) = \exp\left( -\frac{1}{2} \sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{\sigma_{ij}^2} \right)
\quad and \quad
e(x_p) = d(x_p) - s(x_p),

and

E = \frac{1}{2} \sum_{p=1}^{N} e^2(x_p).

We have \frac{\partial E}{\partial w_i}, \frac{\partial E}{\partial \sigma_{ij}}, and \frac{\partial E}{\partial c_{ij}}. Apply gradient descent.
We have the following update equations:

w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(x_p) a_i(x_p), \quad i = 1, 2, \ldots, M

w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(x_p), \quad when i = 0

\sigma_{ij}(t+1) = \sigma_{ij}(t) + \eta_\sigma \sum_{p=1}^{N} e(x_p) w_i a_i(x_p) \frac{(x_{pj} - c_{ij})^2}{\sigma_{ij}^3(t)}

c_{ij}(t+1) = c_{ij}(t) + \eta_c \sum_{p=1}^{N} e(x_p) w_i a_i(x_p) \frac{x_{pj} - c_{ij}}{\sigma_{ij}^2(t)}
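One batch step of these update equations can be sketched as follows (Python with NumPy; the function name, learning-rate values, and array layout are my own choices):

```python
import numpy as np

def rbf_gd_step(X, d, w, c, sigma, eta_w=0.05, eta_c=0.01, eta_s=0.01):
    """One batch gradient-descent step on E = 0.5 * sum_p e(x_p)^2.

    X: (N, n) inputs, d: (N,) targets, w: (M+1,) weights (w[0] = bias),
    c: (M, n) centers, sigma: (M, n) per-dimension widths."""
    diff = X[:, None, :] - c[None, :, :]                        # x_pj - c_ij
    a = np.exp(-0.5 * np.sum(diff ** 2 / sigma ** 2, axis=2))   # a_i(x_p)
    e = d - (w[0] + a @ w[1:])                                  # e(x_p)
    g = e[:, None] * a * w[None, 1:]       # shared factor, uses w_i(t)
    w_new, c_new, s_new = w.copy(), c.copy(), sigma.copy()
    w_new[0] += eta_w * e.sum()                                 # bias update
    w_new[1:] += eta_w * (e @ a)                                # weight update
    c_new += eta_c * np.einsum('pi,pij->ij', g, diff / sigma ** 2)
    s_new += eta_s * np.einsum('pi,pij->ij', g, diff ** 2 / sigma ** 3)
    return w_new, c_new, s_new, e
```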
Elliptical Basis Function Networks

\phi_j(x_p) = \exp\left\{ -\frac{1}{2} (x_p - \mu_j)^T \Sigma_j^{-1} (x_p - \mu_j) \right\}

\mu_j: function centers
\Sigma_j: covariance matrices

The k-th network output is

y_k(x_p) = \sum_{j=0}^{J} w_{kj} \phi_j(x_p)

[Figure: EBF network with inputs x_1, x_2, \ldots, x_n, basis functions \phi_1, \phi_2, \ldots, \phi_M, and outputs y_1(x), \ldots, y_K(x)]
K-means and sample covariance

K-means:

\mu_j = \frac{1}{N_j} \sum_{x \in \Theta_j} x, \quad where x \in \Theta_j if \|x - \mu_j\| \le \|x - \mu_k\| \; \forall k \ne j

Sample covariance:

\Sigma_j = \frac{1}{N_j} \sum_{x \in \Theta_j} (x - \mu_j)(x - \mu_j)^T

Alternatively, the parameters can be estimated by the EM algorithm.
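The per-cluster estimates and the elliptical basis evaluation can be sketched as follows (Python with NumPy; function names and the test data are my own):

```python
import numpy as np

def cluster_params(X, labels, K):
    """Per-cluster mean and sample covariance for an EBF network:
    mu_j = (1/N_j) sum x,  Sigma_j = (1/N_j) sum (x - mu_j)(x - mu_j)^T."""
    mus, covs = [], []
    for j in range(K):
        Xj = X[labels == j]
        mu = Xj.mean(axis=0)
        diff = Xj - mu
        mus.append(mu)
        covs.append(diff.T @ diff / len(Xj))
    return np.array(mus), np.array(covs)

def ebf_basis(x, mu, cov):
    """phi_j(x) = exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu))."""
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))
```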
EBF vs. RBF networks

[Figure: decision regions of an RBFN with 4 centers and an EBFN with 4 centers on a two-class problem (Class 1, Class 2); both axes range from -3 to 3]
Elliptical Basis Function Networks

[Figure: surface plot of output 1 of an EBF network (bias, no rescale, gamma = 1), data file 'nxor.ebf 4.Y.N.1.dat'; the output ranges from about -1 to 2 over the input square [-3, 3] x [-3, 3]]
RBFN for Pattern Classification

Decision element: MLP uses hyperplanes; RBF uses kernel functions.

The probability density function (also called the conditional density function or likelihood) of the k-th class is defined as p(x | C_k).
According to Bayes' theorem, the posterior probability is

P(C_k | x) = \frac{p(x | C_k) P(C_k)}{p(x)}

where P(C_k) is the prior probability and

p(x) = \sum_r p(x | C_r) P(C_r)

It is possible to use a common pool of M basis functions, labeled by an index j, to represent all of the class-conditional densities, i.e.

p(x | C_k) = \sum_{j=1}^{M} p(x | j) P(j | C_k)
[Figure: each class-conditional density p(x | C_k) drawn as a mixture of the shared densities p(x | 1), p(x | 2), \ldots, p(x | M), with mixing coefficients P(1 | C_k), \ldots, P(M | C_k):

p(x | C_k) = \sum_{j=1}^{M} p(x | j) P(j | C_k) ]
Hence

p(x) = \sum_k \sum_{j=1}^{M} p(x | j) P(j | C_k) P(C_k) = \sum_{j=1}^{M} p(x | j) P(j)

and the posterior becomes

P(C_k | x) = \frac{\sum_{j=1}^{M} p(x | j) P(j | C_k) P(C_k)}{\sum_{j'=1}^{M} p(x | j') P(j')}

Multiplying each term by P(j)/P(j),

P(C_k | x) = \sum_{j=1}^{M} \frac{P(j | C_k) P(C_k)}{P(j)} \cdot \frac{p(x | j) P(j)}{\sum_{j'=1}^{M} p(x | j') P(j')}
P(C_k | x) = \sum_{j=1}^{M} P(C_k | j) \cdot \frac{p(x | j) P(j)}{\sum_{j'=1}^{M} p(x | j') P(j')} = \sum_{j=1}^{M} w_{kj} \phi_j(x)

Hidden node's output \phi_j(x) = P(j | x): the posterior probability of the j-th set of features being present in the input x.

Weight w_{kj} = P(C_k | j): the posterior probability of class membership, given the presence of the j-th set of features.

No bias term.
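This decomposition can be checked numerically (Python with NumPy; all probability values below are made-up illustrative numbers, not from the slides):

```python
import numpy as np

# Made-up numbers: M = 3 shared basis densities, K = 2 classes.
p_x_given_j = np.array([0.5, 0.2, 0.1])        # p(x | j) at some fixed x
P_j_given_Ck = np.array([[0.7, 0.2, 0.1],      # P(j | C_k), rows sum to 1
                         [0.1, 0.3, 0.6]])
P_Ck = np.array([0.4, 0.6])                    # class priors P(C_k)

# Direct Bayes: P(C_k | x) = p(x | C_k) P(C_k) / p(x)
p_x_given_Ck = P_j_given_Ck @ p_x_given_j      # sum_j p(x|j) P(j|C_k)
post_direct = p_x_given_Ck * P_Ck / (p_x_given_Ck @ P_Ck)

# RBF form: P(C_k | x) = sum_j w_kj * phi_j(x)
P_j = P_Ck @ P_j_given_Ck                      # P(j) = sum_k P(j|C_k) P(C_k)
phi = p_x_given_j * P_j / (p_x_given_j @ P_j)  # phi_j(x) = P(j | x)
w = P_j_given_Ck * P_Ck[:, None] / P_j         # w_kj = P(C_k | j)
post_rbf = w @ phi
```

Both routes give the same class posteriors, which is exactly the identity derived above.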
Comparison of RBF and MLP

                         RBF networks                       MLP
Learning speed           Very fast                          Very slow
Convergence              Almost guaranteed                  Not guaranteed
Response time            Slow                               Fast
Memory requirement       Very large                         Small
Hardware implementation  IBM ZISC036, Nestor Ni1000         Voice Direct 364
                         www-5.ibm.com/fr/cdlab/zisc.html   www.sensoryinc.com
Generalization           Usually better                     Usually poorer

To learn more about NN hardware, see http://www.particle.kth.se/~lindsey/HardwareNNWCourse/home.html