
Slide 1

Radial Basis Function Networks
M.W. Mak

1. Introduction

2. Finding RBF Parameters

3. Decision Surface of RBF Networks

4. Comparison between RBF and BP

Slide 2

1. Introduction

MLPs are highly non-linear in the parameter space, so gradient-descent training suffers from local minima. RBF networks avoid this problem by dividing the learning into two independent processes: the basis-function parameters are found first, and the output weights w are then obtained by linear regression.

Slide 3

RBF networks implement the function

    s(x) = w_0 + \sum_{i=1}^{M} w_i \phi(\| x - c_i \|)

The weights w_i and the centers c_i can be determined separately.

Because the two sets of parameters are decoupled, learning is fast.

Basis function types (sketched in code below):

    \phi(r) = r^2 \log(r)                      (thin-plate spline)
    \phi(r) = \exp(-r^2 / (2\sigma^2))         (Gaussian)
    \phi(r) = (r^2 + \sigma^2)^{1/2}           (multiquadric)
    \phi(r) = (r^2 + \sigma^2)^{-1/2}          (inverse multiquadric)
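As an illustration (not part of the slides), these four basis functions might be written as follows; the width parameter sigma and the function names are assumptions for the sketch.

```python
import numpy as np

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 sigma^2))
    return np.exp(-np.square(r) / (2.0 * sigma ** 2))

def multiquadric(r, sigma=1.0):
    # phi(r) = (r^2 + sigma^2)^(1/2)
    return np.sqrt(np.square(r) + sigma ** 2)

def inverse_multiquadric(r, sigma=1.0):
    # phi(r) = (r^2 + sigma^2)^(-1/2)
    return 1.0 / np.sqrt(np.square(r) + sigma ** 2)

def thin_plate_spline(r):
    # phi(r) = r^2 log(r), with phi(0) = 0 by convention
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out
```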

Slide 4

For Gaussian basis functions,

    s(x_p) = w_0 + \sum_{i=1}^{M} w_i \phi(\| x_p - c_i \|)
           = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2} \right)

Assuming the variances are equal across all dimensions,

    s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\frac{1}{2\sigma_i^2} \sum_{j=1}^{n} (x_{pj} - c_{ij})^2 \right)

Slide 5

To write this in matrix form, let a_{pi} = \phi(\| x_p - c_i \|), so that

    s(x_p) = \sum_{i=0}^{M} w_i a_{pi},   where a_{p0} = 1.

Stacking the N training patterns,

    \begin{bmatrix} s(x_1) \\ s(x_2) \\ \vdots \\ s(x_N) \end{bmatrix}
    =
    \begin{bmatrix} 1 & a_{11} & a_{12} & \cdots & a_{1M} \\
                    1 & a_{21} & a_{22} & \cdots & a_{2M} \\
                    \vdots &    &        &        & \vdots \\
                    1 & a_{N1} & a_{N2} & \cdots & a_{NM} \end{bmatrix}
    \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}

i.e. s = A w, so w = A^{-1} s when A is square and invertible (the general least-squares solution is derived on slide 11).
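As an illustration (not from the slides), the design matrix A for Gaussian basis functions can be built as follows; function and variable names are illustrative, and the widths sigma_i correspond to the equal-variance form on slide 4.

```python
import numpy as np

def design_matrix(X, centers, sigmas):
    """X: (N, n) inputs; centers: (M, n); sigmas: (M,) widths."""
    # squared distances ||x_p - c_i||^2, shape (N, M)
    sq_dist = np.square(X[:, None, :] - centers[None, :, :]).sum(axis=-1)
    phi = np.exp(-sq_dist / (2.0 * np.square(sigmas)[None, :]))
    # prepend the bias column a_{p0} = 1
    return np.hstack([np.ones((phi.shape[0], 1)), phi])

# The network outputs for all N patterns are then simply:  s = A @ w
```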

Slide 6

2. Finding the RBF Parameters

Use the K-means algorithm to find the centers c_i: each center is placed at the mean of the training samples assigned to its cluster.

Slide 7

K-means Algorithm

Step 1: K initial cluster centers are chosen randomly from the samples to form K groups.
Step 2: Each sample is added to the group whose mean is closest to that sample.
Step 3: The mean of each group is adjusted to take the new points into account.
Step 4: Repeat steps 2 and 3 until the distance between the old means and the new means of all clusters is smaller than a predefined tolerance. (A code sketch of this procedure is given below.)
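A minimal sketch of this procedure in its batch form, assuming numpy; the slides describe a per-sample (online) variant, but the stopping rule of step 4 is the same. The function and parameter names are illustrative.

```python
import numpy as np

def k_means(X, K, tol=1e-4, max_iter=100, seed=None):
    """X: (N, n) samples; returns the (K, n) cluster means and the sample labels."""
    rng = np.random.default_rng(seed)
    # step 1: choose K initial centers randomly from the samples
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # step 2: assign each sample to the group whose mean is closest
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # step 3: recompute the mean of each group (keep the old mean if a group is empty)
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(K)])
        # step 4: stop when the means move less than the tolerance
        if np.linalg.norm(new_centers - centers) < tol:
            return new_centers, labels
        centers = new_centers
    return centers, labels
```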

Slide 8

Outcome: there are K clusters, and the mean of each cluster is its centroid.

Advantages: (1) it is a fast and simple algorithm; (2) it reduces the effect of noisy samples.

Slide 9

Use the K-nearest-neighbor rule to find the function width \sigma_i:

    \sigma_i^2 = \frac{1}{K} \sum_{k=1}^{K} \| c_k - c_i \|^2

where c_k is the k-th nearest neighbor of c_i. The objective is to cover the training points so that a smooth fit of the training samples can be achieved.
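A minimal numpy sketch of this width heuristic; rbf_widths and its arguments are illustrative names, not from the slides.

```python
import numpy as np

def rbf_widths(centers, K=2):
    """sigma_i from the K nearest neighboring centers of each center c_i."""
    # pairwise distances between centers, shape (M, M)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # sort each row; column 0 is the zero distance from a center to itself
    nearest = np.sort(dist, axis=1)[:, 1:K + 1]
    sigma_sq = np.mean(np.square(nearest), axis=1)   # sigma_i^2 as defined above
    return np.sqrt(sigma_sq)
```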

Slide 10

[Figure: centers and widths found by K-means and K-nearest neighbors.]

Slide 11

Determining the weights w using the least-squares method: minimize

    E = \sum_{p=1}^{N} \left( d_p - \sum_{j=0}^{M} w_j \phi(\| x_p - c_j \|) \right)^2

where d_p is the desired output for pattern p. In matrix form,

    E = (d - Aw)^T (d - Aw)

Setting \partial E / \partial w = 0 gives

    w = (A^T A)^{-1} A^T d
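Taken literally, the formula above can be evaluated as in the following minimal numpy sketch (solving the normal equations A^T A w = A^T d rather than forming the explicit inverse). The next slides explain why this direct approach is numerically fragile.

```python
import numpy as np

def solve_weights_normal_equations(A, d):
    """w from the normal equations A^T A w = A^T d (A: (N, M+1), d: (N,))."""
    # np.linalg.solve raises LinAlgError when A^T A is singular
    return np.linalg.solve(A.T @ A, A.T @ d)
```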

Slide 12

Let E be the total squared error between the actual output and the target output, with d = (d_1, d_2, ..., d_N)^T:

    E = (d - Aw)^T (d - Aw)
      = d^T d - d^T A w - w^T A^T d + w^T A^T A w

    \frac{\partial E}{\partial w} = -A^T d - A^T d + 2 A^T A w = -2 A^T d + 2 A^T A w = 0

    \Rightarrow A^T A w = A^T d \Rightarrow w = (A^T A)^{-1} A^T d

Slide 13

Note the matrix-calculus identities used in the derivation:

    \frac{\partial}{\partial x}(x^T y) = y, \qquad
    \frac{\partial}{\partial x}(x^T A y) = A y, \qquad
    \frac{\partial}{\partial x}(x^T A x) = (A + A^T) x

Problems with the solution w = (A^T A)^{-1} A^T d:

(1) It is susceptible to round-off error.
(2) There is no solution if A^T A is singular.
(3) If A^T A is close to singular, we get very large components in w.

Slide 14

Reasons:

(1) Inaccuracy in forming A^T A.
(2) If A is ill-conditioned, a small change in A introduces a large change in (A^T A)^{-1}.
(3) If A^T A is close to singular, dependent columns exist in A^T A; geometrically, the equations correspond to two (nearly) parallel straight lines in the (x, y) plane.

Slide 15

Singular matrix:

    \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
    \begin{bmatrix} x \\ y \end{bmatrix}
    =
    \begin{bmatrix} 0 \\ 1 \end{bmatrix}

The two rows describe parallel lines, so there is no solution. If the lines are only nearly parallel, they intersect at a point where x and y are extremely large (one tending to +\infty and the other to -\infty). So the magnitude of the solution becomes very large; hence overflow will occur. The effect of the large components could cancel out only if the machine precision were infinite.

Slide 16

If the machine precision is finite, we get a large error. For example, when the matrix entry 4 above is perturbed to 4.000001, the computed solution components grow to the order of \pm 10^{38}; with finite machine precision, the terms that should cancel exactly instead leave a residual of the order of 10^{33} rather than 0.

Solution: Singular Value Decomposition
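A minimal numpy sketch of this remedy: solve s = A w through the SVD of A, zeroing the reciprocals of negligible singular values instead of letting them blow up. The rcond threshold and the function name are illustrative choices.

```python
import numpy as np

def solve_weights_svd(A, d, rcond=1e-10):
    """Least-squares weights via the SVD of A, discarding tiny singular values."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    keep = S > rcond * S.max()
    S_inv = np.zeros_like(S)
    S_inv[keep] = 1.0 / S[keep]          # reciprocals of the retained singular values
    return Vt.T @ (S_inv * (U.T @ d))    # w = V S^+ U^T d

# Equivalently: w = np.linalg.pinv(A) @ d, or np.linalg.lstsq(A, d, rcond=None)[0]
```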

Slide 17

RBF learning process:

    x_p                 --K-means-->             centers c_i
    c_i                 --K-nearest neighbor-->  widths \sigma_i
    x_p, c_i, \sigma_i  --basis functions-->     design matrix A
    A, d                --linear regression-->   weights w
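Putting the stages of this diagram together, a minimal self-contained sketch might look as follows. It assumes scikit-learn's KMeans for the K-means step and scipy's cdist for the distance computations; the function names (train_rbf, predict_rbf) and default settings are illustrative, not from the slides.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def train_rbf(X, d, M, knn=2):
    # K-means -> centers c_i
    centers = KMeans(n_clusters=M, n_init=10).fit(X).cluster_centers_
    # K nearest neighboring centers -> widths sigma_i
    nn = np.sort(cdist(centers, centers), axis=1)[:, 1:knn + 1]
    sigmas = np.sqrt(np.mean(nn ** 2, axis=1))
    # basis functions -> design matrix A (with a bias column)
    A = np.hstack([np.ones((len(X), 1)),
                   np.exp(-cdist(X, centers) ** 2 / (2 * sigmas ** 2))])
    # linear regression -> weights w
    w = np.linalg.lstsq(A, d, rcond=None)[0]
    return centers, sigmas, w

def predict_rbf(X, centers, sigmas, w):
    A = np.hstack([np.ones((len(X), 1)),
                   np.exp(-cdist(X, centers) ** 2 / (2 * sigmas ** 2))])
    return A @ w
```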

Slide 18

RBF learning by gradient descent. Let

    e(x_p) = d(x_p) - s(x_p),
    \phi_i(x_p) = \exp\left( -\frac{1}{2} \sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{\sigma_{ij}^2} \right),
    E = \frac{1}{2} \sum_{p=1}^{N} e^2(x_p).

We then compute \partial E / \partial w_i, \partial E / \partial c_{ij} and \partial E / \partial \sigma_{ij}, and apply gradient descent.

Slide 19

We have the following update equations (a code sketch follows below):

    w_i(t) = w_i(t-1) + \eta_w \sum_{p=1}^{N} e(x_p)\, \phi_i(x_p), \qquad i = 1, 2, \ldots, M

    w_i(t) = w_i(t-1) + \eta_w \sum_{p=1}^{N} e(x_p), \qquad \text{when } i = 0

    \sigma_{ij}(t) = \sigma_{ij}(t-1) + \eta_\sigma \sum_{p=1}^{N} e(x_p)\, w_i\, \phi_i(x_p)\, \frac{(x_{pj} - c_{ij})^2}{\sigma_{ij}^3(t-1)}

    c_{ij}(t) = c_{ij}(t-1) + \eta_c \sum_{p=1}^{N} e(x_p)\, w_i\, \phi_i(x_p)\, \frac{x_{pj} - c_{ij}}{\sigma_{ij}^2(t-1)}

where \eta_w, \eta_c and \eta_\sigma are the learning rates.
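A minimal numpy sketch of one batch update with these equations; the learning-rate names (eta_w, eta_c, eta_s) and the function name are illustrative.

```python
import numpy as np

def rbf_gd_step(X, d, w0, w, C, S, eta_w=0.01, eta_c=0.01, eta_s=0.01):
    """One batch update. X: (N, n); d: (N,); w0: bias; w: (M,); C, S: (M, n)."""
    diff = X[:, None, :] - C[None, :, :]                             # x_pj - c_ij, (N, M, n)
    phi = np.exp(-0.5 * np.sum(diff ** 2 / S[None] ** 2, axis=-1))   # phi_i(x_p), (N, M)
    e = d - (w0 + phi @ w)                                           # e(x_p), (N,)
    w0_new = w0 + eta_w * e.sum()                                    # bias update (i = 0)
    w_new = w + eta_w * (e @ phi)                                    # weight updates (i >= 1)
    c_new = C + eta_c * np.einsum('p,i,pi,pij->ij', e, w, phi, diff / S[None] ** 2)
    s_new = S + eta_s * np.einsum('p,i,pi,pij->ij', e, w, phi, diff ** 2 / S[None] ** 3)
    return w0_new, w_new, c_new, s_new
```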

Slide 20

Elliptical Basis Function networks:

    \phi_j(x_p) = \exp\left\{ -\frac{1}{2} (x_p - \mu_j)^T \Sigma_j^{-1} (x_p - \mu_j) \right\}

    y_k(x_p) = \sum_{j=0}^{J} w_{kj}\, \phi_j(x_p)

\mu_j: function centers
\Sigma_j: covariance matrices

[Network diagram: inputs x_1, ..., x_n feed basis functions \phi_1, ..., \phi_M plus a bias node; outputs y_1(x), ..., y_K(x).]

Slide 21

K-means and sample covariance.

K-means: assign x to cluster j if \| x - \mu_j \| \le \| x - \mu_k \| for all k \ne j, and set

    \mu_j = \frac{1}{N_j} \sum_{x \in \text{cluster } j} x

Sample covariance:

    \Sigma_j = \frac{1}{N_j} \sum_{x \in \text{cluster } j} (x - \mu_j)(x - \mu_j)^T

Alternatively, the EM algorithm can be used to estimate \mu_j and \Sigma_j.
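A minimal numpy sketch of these estimates and of the resulting elliptical basis functions. Assumptions: every cluster is non-empty, np.cov's 1/(N_j - 1) normalization stands in for the 1/N_j above, and a small ridge term (not on the slides) keeps \Sigma_j invertible; all names are illustrative.

```python
import numpy as np

def ebf_parameters(X, labels, M, ridge=1e-6):
    """Cluster means mu_j and sample covariances Sigma_j from K-means labels."""
    mus, covs = [], []
    for j in range(M):
        Xj = X[labels == j]                      # assumes every cluster is non-empty
        mus.append(Xj.mean(axis=0))
        # np.cov normalizes by (N_j - 1); the slide uses 1/N_j
        covs.append(np.cov(Xj, rowvar=False) + ridge * np.eye(X.shape[1]))
    return np.array(mus), np.array(covs)

def ebf_design_matrix(X, mus, covs):
    """phi_j(x_p) = exp(-0.5 (x_p - mu_j)^T Sigma_j^{-1} (x_p - mu_j))."""
    cols = []
    for mu, cov in zip(mus, covs):
        diff = X - mu                                         # (N, n)
        maha = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
        cols.append(np.exp(-0.5 * maha))
    return np.column_stack(cols)   # (N, J); a bias column can be prepended as on slide 5
```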

Slide 22

EBF vs. RBF networks

[Figure: decision regions for a two-class problem (Class 1, Class 2) over the square [-3, 3] x [-3, 3]. Left: RBFN with 4 centers. Right: EBFN with 4 centers.]

Slide 23

Elliptical Basis Function Networks

[Figure: output 1 of an EBF network (bias, no rescale, gamma = 1) plotted as a surface over [-3, 3] x [-3, 3]; data file 'nxor.ebf 4.Y.N.1.dat'.]

Slide 24

RBFN for Pattern Classification

MLP: decision boundaries formed by hyperplanes. RBF: decision boundaries formed by kernel functions.

The probability density function (also called the conditional density function or likelihood) of the k-th class is defined as p(x | C_k).

Slide 25

According to Bayes' theorem, the posterior probability is

    P(C_k | x) = \frac{p(x | C_k)\, P(C_k)}{p(x)}

where P(C_k) is the prior probability and

    p(x) = \sum_r p(x | C_r)\, P(C_r).

It is possible to use a common pool of M basis functions, labeled by an index j, to represent all of the class-conditional densities, i.e.

    p(x | C_k) = \sum_{j=1}^{M} p(x | j)\, P(j | C_k).

Slide 26

[Diagram: the class-conditional density p(x | C_k) is a mixture of the shared basis densities p(x | 1), p(x | 2), ..., p(x | M), with mixing coefficients P(1 | C_k), ..., P(M | C_k).]

    p(x | C_k) = \sum_{j=1}^{M} p(x | j)\, P(j | C_k)

Slide 27

The unconditional density of x becomes

    p(x) = \sum_k p(x | C_k)\, P(C_k)
         = \sum_k \sum_{j=1}^{M} p(x | j)\, P(j | C_k)\, P(C_k)
         = \sum_{j=1}^{M} p(x | j)\, P(j),
    \qquad \text{where } P(j) = \sum_k P(j | C_k)\, P(C_k).

Substituting into Bayes' theorem,

    P(C_k | x) = \frac{\sum_{j=1}^{M} p(x | j)\, P(j | C_k)\, P(C_k)}{\sum_{j'=1}^{M} p(x | j')\, P(j')}.

Slide 28

Rearranging, the posterior becomes a weighted sum of normalized basis-function outputs:

    P(C_k | x) = \sum_{j=1}^{M} \frac{P(j | C_k)\, P(C_k)}{P(j)} \cdot \frac{p(x | j)\, P(j)}{\sum_{j'=1}^{M} p(x | j')\, P(j')}
               = \sum_{j=1}^{M} w_{kj}\, \phi_j(x)

Hidden node's output \phi_j(x) = P(j | x): the posterior probability of the j-th set of features being present in the input x.

Weight w_{kj} = P(C_k | j): the posterior probability of class membership, given the presence of the j-th set of features.

There is no bias term.
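A minimal numpy sketch of this probabilistic reading, assuming Gaussian basis densities p(x | j) with a shared spherical width (an assumption for illustration; the slides leave the basis densities general). priors[j] stands for P(j) and W[k, j] for P(C_k | j); all names are illustrative.

```python
import numpy as np

def class_posteriors(x, mus, sigma, priors, W):
    """P(C_k|x) for one input. x: (n,); mus: (M, n); priors: (M,) = P(j); W: (K, M) = P(C_k|j)."""
    n = x.shape[0]
    # basis densities p(x|j): spherical Gaussians with shared width sigma
    sq_dist = np.sum((x - mus) ** 2, axis=1)
    px_j = np.exp(-sq_dist / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2) ** (n / 2)
    # hidden-node outputs phi_j(x) = P(j|x) = p(x|j) P(j) / sum_j' p(x|j') P(j')
    phi = px_j * priors
    phi /= phi.sum()
    # P(C_k|x) = sum_j P(C_k|j) P(j|x); note there is no bias term
    return W @ phi
```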

Slide 29

Comparison of RBF and MLP

                          RBF networks                          MLP
Learning speed            Very fast                             Very slow
Convergence               Almost guaranteed                     Not guaranteed
Response time             Slow                                  Fast
Memory requirement        Very large                            Small
Hardware implementation   IBM ZISC036, Nestor Ni1000            Voice Direct 364
                          (www-5.ibm.com/fr/cdlab/zisc.html)    (www.sensoryinc.com)
Generalization            Usually better                        Usually poorer

To learn more about NN hardware, see http://www.particle.kth.se/~lindsey/HardwareNNWCourse/home.html
