Internet Engineering Jacek Mazurkiewicz, PhD Softcomputing Part 13: Radial Basis Function Networks


Page 1:
Internet Engineering
Jacek Mazurkiewicz, PhD
Softcomputing
Part 13: Radial Basis Function Networks

Page 2:
Contents
• Overview
• The Models of Function Approximator
• The Radial Basis Function Networks
• RBFN's for Function Approximation
• The Projection Matrix
• Learning the Kernels
• Bias-Variance Dilemma
• The Effective Number of Parameters
• Model Selection

Page 3:
RBF
• Linear models have been studied in statistics for about 200 years, and the theory applies to RBF networks, which are just one particular type of linear model.
• However, the fashion for neural networks, which started in the mid-80s, has given rise to new names for concepts already familiar to statisticians.

Page 4:
Typical Applications of NN
• Pattern Classification: $l = f(\mathbf{x})$, $l \in C = \{1, \dots, N\}$
• Function Approximation: $\mathbf{y} = f(\mathbf{x})$, $\mathbf{x} \in X \subseteq \mathbb{R}^n$, $\mathbf{y} \in Y \subseteq \mathbb{R}^m$
• Time-Series Forecasting: $\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{x}_{t-2}, \mathbf{x}_{t-3}, \dots)$

Page 5:
Function Approximation
Unknown function: $f : X \to Y$, $X \subseteq \mathbb{R}^m$, $Y \subseteq \mathbb{R}^n$
Approximator: $\hat{f} : X \to Y$

Page 6:
Linear Models

$f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$

Fixed basis functions $\phi_i$; weights $w_i$.
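The linear model above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical helper names, not part of the slides:

```python
# A linear model f(x) = sum_i w_i * phi_i(x) with fixed basis functions.
def linear_model(x, weights, basis):
    """Evaluate f(x) given weights w_i and fixed basis functions phi_i."""
    return sum(w * phi(x) for w, phi in zip(weights, basis))

# Example: a polynomial basis phi_i(x) = x**i for i = 0, 1, 2.
basis = [lambda x, i=i: x ** i for i in range(3)]
value = linear_model(2.0, [1.0, 0.5, 0.25], basis)  # 1 + 0.5*2 + 0.25*4 = 3.0
```

Note that only the weights enter linearly; the basis functions themselves stay fixed, which is what makes this a linear model.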

Page 7:
Linear Models

[Figure: a single-layer network; inputs $x_1, \dots, x_n$ (feature vectors) feed hidden units $\phi_1, \dots, \phi_m$, which perform decomposition, feature extraction, and transformation; their outputs are weighted by $w_1, \dots, w_m$ and summed into the output $y$.]

Linearly weighted output: $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$

Page 8:
Example Linear Models
• Polynomial: $f(x) = \sum_i w_i x^i$, with $\phi_i(x) = x^i$, $i = 0, 1, 2, \dots$
• Fourier Series: $f(x) = \sum_k w_k \exp(j 2\pi k \omega_0 x)$, with $\phi_k(x) = \exp(j 2\pi k \omega_0 x)$, $k = 0, 1, 2, \dots$

Both are instances of $f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$.

Page 9:
Single-Layer Perceptrons as Universal Approximators

[Figure: inputs $x_1, \dots, x_n$, sigmoidal hidden units $\phi_1, \dots, \phi_m$, weights $w_1, \dots, w_m$, output $y$.]

With a sufficient number of sigmoidal units, $f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$ can be a universal approximator.

Page 10:
Radial Basis Function Networks as Universal Approximators

[Figure: inputs $x_1, \dots, x_n$, radial-basis-function hidden units $\phi_1, \dots, \phi_m$, weights $w_1, \dots, w_m$, output $y$.]

With a sufficient number of radial-basis-function units, $f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$ can also be a universal approximator.

Page 11:
Non-Linear Models

$f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$

Basis functions $\phi_i$: adjusted by the learning process. Weights: $w_i$.

Page 12:
Radial Basis Functions

Three parameters for a radial function $\phi_i(\mathbf{x}) = \phi(\lVert \mathbf{x} - \mathbf{x}_i \rVert)$:
• Center: $\mathbf{x}_i$
• Distance measure: $r = \lVert \mathbf{x} - \mathbf{x}_i \rVert$
• Shape: $\phi$

Page 13:
Typical Radial Functions
• Gaussian: $\phi(r) = e^{-r^2/(2\sigma^2)}$, $\sigma > 0$, $r \ge 0$
• Hardy Multiquadratic (1971): $\phi(r) = \sqrt{r^2 + c^2}$, $c > 0$, $r \ge 0$
• Inverse Multiquadratic: $\phi(r) = c / \sqrt{r^2 + c^2}$, $c > 0$, $r \ge 0$
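The three radial functions are straightforward to implement; a pure-Python sketch (function names are ours):

```python
import math

def gaussian(r, sigma):
    # phi(r) = exp(-r^2 / (2 sigma^2)), sigma > 0
    return math.exp(-r * r / (2.0 * sigma * sigma))

def multiquadratic(r, c):
    # Hardy multiquadratic: phi(r) = sqrt(r^2 + c^2), c > 0
    return math.sqrt(r * r + c * c)

def inverse_multiquadratic(r, c):
    # phi(r) = c / sqrt(r^2 + c^2), c > 0; equals 1 at r = 0 for every c
    return c / math.sqrt(r * r + c * c)
```

At $r = 0$ the Gaussian and the inverse multiquadratic both equal 1, which matches the plots on the following pages; the Gaussian and inverse multiquadratic decay with distance (localized response), while the multiquadratic grows.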

Page 14:
Gaussian Basis Function ($\sigma = 0.5, 1.0, 1.5$)

[Figure: plots of $\phi(r) = e^{-r^2/(2\sigma^2)}$ for $\sigma = 0.5$, $1.0$, $1.5$.]

Page 15:
Inverse Multiquadratic

[Figure: plots of $\phi(r) = c/\sqrt{r^2 + c^2}$ over $r \in [-10, 10]$ for $c = 1, 2, 3, 4, 5$; all curves peak at $\phi(0) = 1$.]

Page 16:
Most General RBF

$\phi_i(\mathbf{x}) = \phi\big( (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \big)$

[Figure: three elliptical receptive fields with centers $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$, $\boldsymbol{\mu}_3$.]

The basis $\{\phi_i : i = 1, 2, \dots\}$ is `near' orthogonal.

Page 17:
Properties of RBF's
• On-center, off-surround
• Analogies with localized receptive fields found in several biological structures, e.g.:
  – visual cortex
  – ganglion cells

Page 18:
The Topology of RBF

[Figure: inputs $x_1, \dots, x_n$ (feature vectors), hidden units, and output units $y_1, \dots, y_m$; the hidden layer performs a projection, the output layer an interpolation.]

As a function approximator.

Page 19:
The Topology of RBF

[Figure: inputs $x_1, \dots, x_n$ (feature vectors), hidden units representing subclasses, and output units $y_1, \dots, y_m$ representing classes.]

As a pattern classifier.

Page 20:
• Radial basis function (RBF) networks are feed-forward networks trained using a supervised training algorithm.
• The activation function is selected from a class of functions called basis functions.
• They usually train much faster than BP networks.
• They are less susceptible to problems with non-stationary inputs.

Page 21:
• Popularized by Broomhead and Lowe (1988) and Moody and Darken (1989), RBF networks have proven to be a useful neural network architecture.
• The major difference between RBF and BP networks is the behavior of the single hidden layer.
• Rather than using the sigmoidal or S-shaped activation function as in BP, the hidden units in RBF networks use a Gaussian or some other basis kernel function.

Page 22:
The idea

[Figure: training data sampled from an unknown function $y(x)$ that is to be approximated.]

Page 23:
The idea

[Figure: the basis functions (kernels) and the learned function $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$, superimposed on the training data.]

Page 24:
The idea

[Figure: the basis functions (kernels) and the learned function $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$, evaluated at a non-training sample.]

Page 25:
The idea

[Figure: the learned function $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$, evaluated at a non-training sample.]

Page 26:
Radial Basis Function Networks as Universal Approximators

[Figure: RBF network with inputs $x_1, \dots, x_n$, kernels $\phi_1, \dots, \phi_m$, and weights $w_1, \dots, w_m$.]

$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$

Training set: $T = \{ (\mathbf{x}^{(k)}, y^{(k)}) \}_{k=1}^{p}$

Goal: $y^{(k)} \approx f(\mathbf{x}^{(k)})$ for all $k$

$\min \; SSE = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 = \sum_{k=1}^{p} \left( y^{(k)} - \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}^{(k)}) \right)^2$
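The training objective can be written directly in code. A minimal pure-Python sketch with our own helper names, using Gaussian kernels with fixed centers:

```python
import math

def gaussian_kernel(x, mu, sigma):
    # phi_i(x) = exp(-(x - mu_i)^2 / (2 sigma^2)), a 1-D Gaussian kernel
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

def design_matrix(xs, centers, sigma):
    # Phi[k][i] = phi_i(x^(k)): one row per training sample, one column per kernel
    return [[gaussian_kernel(x, mu, sigma) for mu in centers] for x in xs]

def sse(weights, Phi, ys):
    # SSE = sum_k ( y^(k) - sum_i w_i phi_i(x^(k)) )^2
    total = 0.0
    for row, y in zip(Phi, ys):
        f = sum(w * p for w, p in zip(weights, row))
        total += (y - f) ** 2
    return total

xs = [0.0, 0.5, 1.0]
ys = [1.0, 0.0, 1.0]
Phi = design_matrix(xs, centers=[0.0, 1.0], sigma=0.4)
```

With all weights zero, the SSE is simply $\sum_k (y^{(k)})^2$; any weight vector that brings $f$ closer to the targets reduces it.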

Page 27:
Learn the Optimal Weight Vector

[Figure: RBF network with inputs $x_1, \dots, x_n$ and weights $w_1, \dots, w_m$.]

Training set: $T = \{ (\mathbf{x}^{(k)}, y^{(k)}) \}_{k=1}^{p}$; goal: $y^{(k)} \approx f(\mathbf{x}^{(k)})$ for all $k$, where $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$.

$\min \; SSE = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 = \sum_{k=1}^{p} \left( y^{(k)} - \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}^{(k)}) \right)^2$

Page 28:
Regularization

Training set: $T = \{ (\mathbf{x}^{(k)}, y^{(k)}) \}_{k=1}^{p}$; goal: $y^{(k)} \approx f(\mathbf{x}^{(k)})$ for all $k$.

$\min \; C = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2, \qquad \lambda_i \ge 0$

If regularization is unneeded, set $\lambda_i = 0$.

Page 29:
Learn the Optimal Weight Vector

Minimize $C = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2$

$\dfrac{\partial C}{\partial w_j} = -2 \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right) \dfrac{\partial f(\mathbf{x}^{(k)})}{\partial w_j} + 2 \lambda_j w_j = -2 \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) + 2 \lambda_j w_j = 0$

$\Rightarrow \quad \sum_{k=1}^{p} f^*(\mathbf{x}^{(k)}) \, \phi_j(\mathbf{x}^{(k)}) + \lambda_j w_j^* = \sum_{k=1}^{p} y^{(k)} \phi_j(\mathbf{x}^{(k)})$

Page 30:
Learn the Optimal Weight Vector

$\sum_{k=1}^{p} f^*(\mathbf{x}^{(k)}) \, \phi_j(\mathbf{x}^{(k)}) + \lambda_j w_j^* = \sum_{k=1}^{p} y^{(k)} \phi_j(\mathbf{x}^{(k)})$

Define
$\boldsymbol{\varphi}_j = \left( \phi_j(\mathbf{x}^{(1)}), \dots, \phi_j(\mathbf{x}^{(p)}) \right)^T$
$\mathbf{f}^* = \left( f^*(\mathbf{x}^{(1)}), \dots, f^*(\mathbf{x}^{(p)}) \right)^T$
$\mathbf{y} = \left( y^{(1)}, \dots, y^{(p)} \right)^T$

Then $\boldsymbol{\varphi}_j^T \mathbf{f}^* + \lambda_j w_j^* = \boldsymbol{\varphi}_j^T \mathbf{y}, \qquad j = 1, \dots, m$

Page 31:
Learn the Optimal Weight Vector

$\boldsymbol{\varphi}_j^T \mathbf{f}^* + \lambda_j w_j^* = \boldsymbol{\varphi}_j^T \mathbf{y}$ for $j = 1, 2, \dots, m$

Define
$\boldsymbol{\Phi} = \left( \boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_m \right)$
$\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_m)$
$\mathbf{w}^* = \left( w_1^*, w_2^*, \dots, w_m^* \right)^T$

Then $\boldsymbol{\Phi}^T \mathbf{f}^* + \boldsymbol{\Lambda} \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$

Page 32:
Learn the Optimal Weight Vector

$\boldsymbol{\Phi}^T \mathbf{f}^* + \boldsymbol{\Lambda} \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$

$\mathbf{f}^* = \left( f^*(\mathbf{x}^{(1)}), f^*(\mathbf{x}^{(2)}), \dots, f^*(\mathbf{x}^{(p)}) \right)^T$, with $f^*(\mathbf{x}^{(k)}) = \sum_{i=1}^{m} w_i^* \phi_i(\mathbf{x}^{(k)})$, so

$\mathbf{f}^* = \begin{pmatrix} \phi_1(\mathbf{x}^{(1)}) & \phi_2(\mathbf{x}^{(1)}) & \cdots & \phi_m(\mathbf{x}^{(1)}) \\ \phi_1(\mathbf{x}^{(2)}) & \phi_2(\mathbf{x}^{(2)}) & \cdots & \phi_m(\mathbf{x}^{(2)}) \\ \vdots & \vdots & & \vdots \\ \phi_1(\mathbf{x}^{(p)}) & \phi_2(\mathbf{x}^{(p)}) & \cdots & \phi_m(\mathbf{x}^{(p)}) \end{pmatrix} \begin{pmatrix} w_1^* \\ w_2^* \\ \vdots \\ w_m^* \end{pmatrix} = \boldsymbol{\Phi} \mathbf{w}^*$

Hence $\boldsymbol{\Phi}^T \boldsymbol{\Phi} \mathbf{w}^* + \boldsymbol{\Lambda} \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$

Page 33:
Learn the Optimal Weight Vector

$\left( \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda} \right) \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$

$\mathbf{w}^* = \left( \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda} \right)^{-1} \boldsymbol{\Phi}^T \mathbf{y} = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$

$\mathbf{A}^{-1}$: variance matrix; $\boldsymbol{\Phi}$: design matrix
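The closed-form solution $\mathbf{w}^* = (\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \boldsymbol{\Lambda})^{-1}\boldsymbol{\Phi}^T\mathbf{y}$ can be sketched in pure Python with a small Gaussian-elimination solver. This is an illustration only (our own helper names); in practice one would call a linear-algebra library:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def optimal_weights(Phi, y, lam):
    """w* = (Phi^T Phi + Lambda)^(-1) Phi^T y, with Lambda = diag(lam)."""
    p, m = len(Phi), len(Phi[0])
    A = [[sum(Phi[k][i] * Phi[k][j] for k in range(p)) + (lam[i] if i == j else 0.0)
          for j in range(m)] for i in range(m)]         # A = Phi^T Phi + Lambda
    b = [sum(Phi[k][i] * y[k] for k in range(p)) for i in range(m)]  # Phi^T y
    return solve(A, b)
```

For a diagonal $\boldsymbol{\Phi} = \mathrm{diag}(1, 2)$ and $\mathbf{y} = (1, 4)^T$, setting $\boldsymbol{\Lambda} = \mathbf{0}$ recovers the exact interpolant $\mathbf{w}^* = (1, 2)^T$, while $\lambda_i = 1$ shrinks the weights toward zero.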

Page 34:
The Empirical-Error Vector

$\mathbf{w}^* = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$

[Figure: the unknown function produces the targets $\mathbf{y}$ from inputs $x_1^{(k)}, \dots, x_n^{(k)}$; the network with kernels $\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_m$ and weights $w_1^*, \dots, w_m^*$ produces $\mathbf{f}^* = \boldsymbol{\Phi} \mathbf{w}^*$.]

Page 35:
The Empirical-Error Vector

$\mathbf{w}^* = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}, \qquad \mathbf{f}^* = \boldsymbol{\Phi} \mathbf{w}^*$

Error vector:
$\mathbf{e} = \mathbf{y} - \mathbf{f}^* = \mathbf{y} - \boldsymbol{\Phi} \mathbf{w}^* = \mathbf{y} - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y} = \left( \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \right) \mathbf{y} = \mathbf{P} \mathbf{y}$

Page 36:
Sum-Squared-Error

Error vector: $\mathbf{e} = \mathbf{P} \mathbf{y}$, where $\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T$, $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}$, $\boldsymbol{\Phi} = (\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_m)$

$SSE = \sum_{k=1}^{p} \left( y^{(k)} - f^*(\mathbf{x}^{(k)}) \right)^2 = (\mathbf{P}\mathbf{y})^T (\mathbf{P}\mathbf{y}) = \mathbf{y}^T \mathbf{P}^2 \mathbf{y}$

If $\boldsymbol{\Lambda} = \mathbf{0}$, the RBFN's learning algorithm is to minimize SSE (MSE).

Page 37:
The Projection Matrix

$\mathbf{e} = \mathbf{P}\mathbf{y}$, $\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^T$, $\mathbf{A} = \boldsymbol{\Phi}^T\boldsymbol{\Phi} + \boldsymbol{\Lambda}$, $SSE = \mathbf{y}^T \mathbf{P}^2 \mathbf{y}$

For $\boldsymbol{\Lambda} = \mathbf{0}$: $\mathbf{e} \perp \mathrm{span}(\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_m)$, and $\mathbf{P}(\mathbf{P}\mathbf{y}) = \mathbf{P}\mathbf{e} = \mathbf{e} = \mathbf{P}\mathbf{y}$, so

$SSE = \mathbf{y}^T \mathbf{P} \mathbf{y}$
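These identities can be checked on a tiny example of our own: $p = 3$ samples, a single constant basis function $\phi_1 \equiv 1$, and $\boldsymbol{\Lambda} = \mathbf{0}$, where the optimal fit is simply the mean of the targets:

```python
# One constant basis function: Phi = (1, 1, 1)^T, y = (1, 2, 3)^T, Lambda = 0.
phi = [1.0, 1.0, 1.0]
y = [1.0, 2.0, 3.0]

A = sum(v * v for v in phi)                 # A = Phi^T Phi = 3 (a scalar, since m = 1)
w = sum(v * t for v, t in zip(phi, y)) / A  # w* = A^(-1) Phi^T y = mean(y) = 2
e = [t - w * v for v, t in zip(phi, y)]     # e = y - Phi w* = (-1, 0, 1)

orth = sum(ev * v for ev, v in zip(e, phi))     # e . phi: should be 0
sse = sum(ev * ev for ev in e)                  # SSE = e^T e = 2
ytPy = sum(t * ev for t, ev in zip(y, e))       # y^T P y = y . (P y) = y . e
```

The residual sums to zero (orthogonal to the constant basis vector), and $\mathbf{y}^T \mathbf{P} \mathbf{y}$ equals the SSE, as the slide states.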

Page 38:
RBFN's as Universal Approximators

[Figure: multi-output RBF network with inputs $x_1, \dots, x_n$, kernels $\phi_1, \phi_2, \dots, \phi_m$, weights $w_{11}, w_{12}, \dots, w_{lm}$, and outputs $y_1, \dots, y_l$.]

Training set: $T = \{ (\mathbf{x}^{(k)}, \mathbf{y}^{(k)}) \}_{k=1}^{p}$

Kernels: $\phi_j(\mathbf{x}) = \exp\left( -\dfrac{\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2}{2 \sigma_j^2} \right)$

Page 39:
What to Learn?
• Weights $w_{ij}$'s
• Centers $\boldsymbol{\mu}_j$'s of the $\phi_j$'s
• Widths $\sigma_j$'s of the $\phi_j$'s
• Number of $\phi_j$'s: model selection

Page 40:
One-Stage Learning

$\Delta w_{ij} = \eta_1 \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)})$

$\Delta \boldsymbol{\mu}_j = \eta_2 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \dfrac{\mathbf{x}^{(k)} - \boldsymbol{\mu}_j}{\sigma_j^2}$

$\Delta \sigma_j = \eta_3 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \dfrac{\lVert \mathbf{x}^{(k)} - \boldsymbol{\mu}_j \rVert^2}{\sigma_j^3}$

Page 41:
One-Stage Learning

The simultaneous update of all three sets of parameters may be suitable for non-stationary environments or an on-line setting.

$\Delta w_{ij} = \eta_1 \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)})$

$\Delta \boldsymbol{\mu}_j = \eta_2 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \dfrac{\mathbf{x}^{(k)} - \boldsymbol{\mu}_j}{\sigma_j^2}$

$\Delta \sigma_j = \eta_3 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \dfrac{\lVert \mathbf{x}^{(k)} - \boldsymbol{\mu}_j \rVert^2}{\sigma_j^3}$
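For Gaussian kernels, one such gradient step on a single training pair can be sketched as follows. This is our own single-output, 1-D toy (variable names are ours); for brevity the center and width updates reuse the freshly updated weight rather than the old one:

```python
import math

def phi(x, mu, sigma):
    # Gaussian kernel phi_j(x) = exp(-(x - mu_j)^2 / (2 sigma_j^2))
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

def predict(x, w, mu, sigma):
    return sum(w[j] * phi(x, mu[j], sigma[j]) for j in range(len(w)))

def one_stage_step(x, y, w, mu, sigma, eta=(0.1, 0.1, 0.1)):
    """One gradient step on (x, y): update weights, centers, and widths in place."""
    err = y - predict(x, w, mu, sigma)
    for j in range(len(w)):
        k = phi(x, mu[j], sigma[j])
        w[j] += eta[0] * err * k
        mu[j] += eta[1] * err * w[j] * k * (x - mu[j]) / sigma[j] ** 2
        sigma[j] += eta[2] * err * w[j] * k * (x - mu[j]) ** 2 / sigma[j] ** 3
    return err

w, mu, sigma = [0.0, 0.0], [0.0, 1.0], [0.5, 0.5]
before = abs(1.0 - predict(0.2, w, mu, sigma))
one_stage_step(0.2, 1.0, w, mu, sigma)
after = abs(1.0 - predict(0.2, w, mu, sigma))
```

With a small enough learning rate, a single step reduces the error on the presented sample (`after < before`); repeated over a stream of samples, this is the on-line behavior the slide describes.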

Page 42:
Two-Stage Training

[Figure: multi-output RBF network with inputs $x_1, \dots, x_n$, kernels $\phi_1, \dots, \phi_m$, weights $w_{11}, \dots, w_{lm}$, and outputs $y_1, \dots, y_l$.]

Step 1 determines:
• Centers $\boldsymbol{\mu}_j$'s of the $\phi_j$'s
• Widths $\sigma_j$'s of the $\phi_j$'s
• Number of $\phi_j$'s

Step 2 determines the $w_{ij}$'s, e.g., using batch learning.

Page 43:

Train the Kernels

Page 44:

Unsupervised Training

Page 45:

Methods
• Subset Selection
  – Random Subset Selection
  – Forward Selection
  – Backward Elimination
• Clustering Algorithms
  – K-MEANS
  – LVQ
• Mixture Models
  – GMM

Page 46:

Subset Selection

Page 47:

Random Subset Selection

• Randomly choose a subset of points from the training set.
• Sensitive to the initially chosen points.
• Use adaptive techniques to tune:
  – centers
  – widths
  – number of points

Page 48:

Clustering Algorithms

Page 49:

Clustering Algorithms

Page 50:

Clustering Algorithms

Page 51:
Clustering Algorithms

[Figure: data grouped into four clusters, with centers labeled 1 to 4.]

Page 52:
Questions - How should the user choose the kernel?
• The problem is similar to that of selecting features for other learning algorithms:
  – a poor choice makes learning very difficult;
  – a good choice lets even poor learners succeed.
• The requirement placed on the user is thus critical:
  – can this requirement be lessened?
  – is a more automatic selection of features possible?

Page 53:

Goal Revisited
• Ultimate goal: generalization, i.e., minimize the prediction error.
• Goal of our learning procedure: minimize the empirical error.

Page 54:
Badness of Fit
• Underfitting
  – A model (e.g., a network) that is not sufficiently complex can fail to fully detect the signal in a complicated data set, leading to underfitting.
  – Produces excessive bias in the outputs.
• Overfitting
  – A model (e.g., a network) that is too complex may fit the noise, not just the signal, leading to overfitting.
  – Produces excessive variance in the outputs.

Page 55:
Underfitting/Overfitting Avoidance
• Model selection
• Jittering
• Early stopping
• Weight decay
  – Regularization
  – Ridge regression
• Bayesian learning
• Combining networks

Page 56:
Best Way to Avoid Overfitting
• Use lots of training data, e.g.:
  – 30 times as many training cases as there are weights in the network;
  – for noise-free data, 5 times as many training cases as weights may be sufficient.
• Don't arbitrarily reduce the number of weights for fear of underfitting.

Page 57:
Badness of Fit

[Figure: an underfitted and an overfitted model on the same data.]

Page 58:
Badness of Fit

[Figure: an underfitted and an overfitted model on the same data.]

Page 59:
Bias-Variance Dilemma

Underfit: large bias, small variance.
Overfit: small bias, large variance.

Page 60:
Bias-Variance Dilemma

Underfit: large bias, small variance.
Overfit: small bias, large variance.

More on overfitting:
• It easily leads to predictions that are far beyond the range of the training data.
• It produces wild predictions in multilayer perceptrons, even with noise-free data.

Page 61:
Bias-Variance Dilemma

[Figure: the trade-off between bias and variance; moving from more underfit to more overfit, the bias shrinks while the variance grows.]

Page 62:
Bias-Variance Dilemma

[Figure: a set of candidate functions (e.g., depending on the number of hidden nodes used) around the true model; noise displaces the targets, and the solutions obtained with training sets 1, 2, and 3 each differ from the true model by a bias.]

Page 63:
Bias-Variance Dilemma

[Figure: a set of candidate functions (e.g., depending on the number of hidden nodes used) around the true model; the spread of the solutions obtained from different training sets is the variance.]

Page 64:
Model Selection

[Figure: the set of candidate functions (e.g., depending on the number of hidden nodes used), the true model, the noise, the bias, and the variance.]

Page 65:
Bias-Variance Dilemma

[Figure: the true model $g(\mathbf{x})$, the observations $y(\mathbf{x}) = g(\mathbf{x}) + \varepsilon$, and the approximator $f(\mathbf{x})$.]

Page 66:
Bias-Variance Dilemma

$E\!\left[ (y - f(\mathbf{x}))^2 \right] = E\!\left[ (g(\mathbf{x}) + \varepsilon - f(\mathbf{x}))^2 \right]$
$= E\!\left[ (g(\mathbf{x}) - f(\mathbf{x}))^2 + 2\varepsilon (g(\mathbf{x}) - f(\mathbf{x})) + \varepsilon^2 \right]$
$= E\!\left[ (g(\mathbf{x}) - f(\mathbf{x}))^2 \right] + 2 E\!\left[ \varepsilon (g(\mathbf{x}) - f(\mathbf{x})) \right] + E[\varepsilon^2]$

The middle term is $0$ and the last term is constant, so the goal $\min E\!\left[ (y - f(\mathbf{x}))^2 \right]$ is equivalent to $\min E\!\left[ (g(\mathbf{x}) - f(\mathbf{x}))^2 \right]$.

Page 67:
Bias-Variance Dilemma

$E\!\left[ (g(\mathbf{x}) - f(\mathbf{x}))^2 \right] = E\!\left[ \big( g(\mathbf{x}) - E[f(\mathbf{x})] + E[f(\mathbf{x})] - f(\mathbf{x}) \big)^2 \right]$
$= E\!\left[ \big( g(\mathbf{x}) - E[f(\mathbf{x})] \big)^2 \right] + E\!\left[ \big( E[f(\mathbf{x})] - f(\mathbf{x}) \big)^2 \right] + 2 E\!\left[ \big( g(\mathbf{x}) - E[f(\mathbf{x})] \big)\big( E[f(\mathbf{x})] - f(\mathbf{x}) \big) \right]$

The cross term vanishes, since $g(\mathbf{x})$ and $E[f(\mathbf{x})]$ are deterministic:
$E\!\left[ \big( g(\mathbf{x}) - E[f(\mathbf{x})] \big)\big( E[f(\mathbf{x})] - f(\mathbf{x}) \big) \right]$
$= g(\mathbf{x}) E[f(\mathbf{x})] - g(\mathbf{x}) E[f(\mathbf{x})] - E[f(\mathbf{x})]^2 + E[f(\mathbf{x})]^2 = 0$

Page 68:
Bias-Variance Dilemma

$E\!\left[ (y - f(\mathbf{x}))^2 \right] = E[\varepsilon^2] + E\!\left[ (g(\mathbf{x}) - f(\mathbf{x}))^2 \right]$
$= E[\varepsilon^2] + \big( g(\mathbf{x}) - E[f(\mathbf{x})] \big)^2 + E\!\left[ \big( f(\mathbf{x}) - E[f(\mathbf{x})] \big)^2 \right]$
$= \text{noise} + \text{bias}^2 + \text{variance}$

The noise term cannot be minimized; minimize both bias² and variance.
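The decomposition can be verified numerically. A Monte Carlo sketch with a toy setup of our own: the predictor is simply the mean of the training targets, the true model is $g(x) = x^2$, and the noise is Gaussian with $\sigma = 0.5$:

```python
import random

random.seed(0)
g = lambda x: x * x                   # true model g(x)
sigma = 0.5                           # noise std, so the noise term is sigma^2
xs = [i / 10.0 for i in range(10)]    # fixed training inputs
x0 = 0.8                              # test point
trials = 5000

preds, total = [], 0.0
for _ in range(trials):
    ys = [g(x) + random.gauss(0.0, sigma) for x in xs]  # fresh noisy training set
    f = sum(ys) / len(ys)             # constant predictor: mean of training targets
    preds.append(f)
    y0 = g(x0) + random.gauss(0.0, sigma)               # fresh noisy observation at x0
    total += (y0 - f) ** 2

mse = total / trials                                    # E[(y - f)^2] at x0
mean_f = sum(preds) / trials                            # E[f]
bias2 = (g(x0) - mean_f) ** 2                           # (g - E[f])^2
variance = sum((f - mean_f) ** 2 for f in preds) / trials
noise = sigma ** 2
# mse should be close to noise + bias2 + variance
```

Over many simulated training sets, the measured prediction error matches noise + bias² + variance up to Monte Carlo error, illustrating that the three terms really do account for the whole error.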

Page 69:
Model Complexity vs. Bias-Variance

$E\!\left[ (y - f(\mathbf{x}))^2 \right] = E[\varepsilon^2] + \big( g(\mathbf{x}) - E[f(\mathbf{x})] \big)^2 + E\!\left[ \big( f(\mathbf{x}) - E[f(\mathbf{x})] \big)^2 \right] = \text{noise} + \text{bias}^2 + \text{variance}$

[Figure: as model complexity (capacity) grows, bias² falls while variance rises.]

Page 70:
Bias-Variance Dilemma

$E\!\left[ (y - f(\mathbf{x}))^2 \right] = E[\varepsilon^2] + \big( g(\mathbf{x}) - E[f(\mathbf{x})] \big)^2 + E\!\left[ \big( f(\mathbf{x}) - E[f(\mathbf{x})] \big)^2 \right] = \text{noise} + \text{bias}^2 + \text{variance}$

[Figure: as model complexity (capacity) grows, bias² falls while variance rises; the total error is minimized at an intermediate complexity.]

Page 71:
Example (Polynomial Fits)

$y = x^2 + 2\exp(-16x^2)$

Page 72:
Example (Polynomial Fits)

[Figure: polynomial fits to noisy samples of the target function.]

Page 73:
Example (Polynomial Fits)

[Figure: fits of degree 1, 5, 10, and 15.]

Page 74:
Variance Estimation

Mean: $\mu$

Variance: $\sigma^2 = \dfrac{1}{p} \sum_{i=1}^{p} (x_i - \mu)^2$

In general, these are not available.

Page 75:
Variance Estimation

Mean: $\bar{x} = \dfrac{1}{p} \sum_{i=1}^{p} x_i$

Variance: $\hat{\sigma}^2 = s^2 = \dfrac{1}{p - 1} \sum_{i=1}^{p} (x_i - \bar{x})^2$

We lose 1 degree of freedom (to the estimated mean).
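In code, the sample estimators are one-liners (a stdlib sketch; note the $p - 1$ divisor):

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Divide by p - 1: one degree of freedom is lost to the estimated mean.
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
```

For example, for the data $\{1, 2, 3\}$ the sample mean is $2$ and the sample variance is $(1 + 0 + 1)/2 = 1$.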

Page 76:
Simple Linear Regression

$y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$

[Figure: data scattered about the regression line $y = \beta_0 + \beta_1 x$.]

Page 77:
Simple Linear Regression

$y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$

[Figure: the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.]

Page 78:
Mean Squared Error (MSE)

$y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \qquad \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

$\hat{\sigma}^2 = MSE = \dfrac{SSE}{p - 2}$

We lose 2 degrees of freedom (to the two estimated coefficients).

Page 79:
Variance Estimation

$\hat{\sigma}^2 = MSE = \dfrac{SSE}{p - m}$

We lose $m$ degrees of freedom; $m$ is the number of parameters of the model.

Page 80:
The Number of Parameters

[Figure: RBF network with inputs $x_1, \dots, x_n$ and weights $w_1, \dots, w_m$.]

$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$

Number of degrees of freedom: $m$.

Page 81:
The Effective Number of Parameters ($\gamma$)

[Figure: RBF network with inputs $x_1, \dots, x_n$ and weights $w_1, \dots, w_m$.]

$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$

The projection matrix: $\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T$, $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}$

For $\boldsymbol{\Lambda} = \mathbf{0}$: $\gamma = p - \mathrm{trace}(\mathbf{P}) = m$

Page 82:
The Effective Number of Parameters ($\gamma$)

The projection matrix: $\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T$, $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}$

Proof that $\gamma = p - \mathrm{trace}(\mathbf{P}) = m$ for $\boldsymbol{\Lambda} = \mathbf{0}$:

$\mathrm{trace}(\mathbf{P}) = \mathrm{trace}\!\left( \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \right) = p - \mathrm{trace}\!\left( \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \right)$
$= p - \mathrm{trace}\!\left( \boldsymbol{\Phi} \left( \boldsymbol{\Phi}^T \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^T \right) = p - \mathrm{trace}\!\left( \left( \boldsymbol{\Phi}^T \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^T \boldsymbol{\Phi} \right) = p - \mathrm{trace}(\mathbf{I}_m) = p - m$

Hence $\gamma = p - \mathrm{trace}(\mathbf{P}) = m$.
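The trace identity can be checked directly on a tiny example of our own (a single constant basis function, $p = 3$, $m = 1$, $\boldsymbol{\Lambda} = \mathbf{0}$):

```python
# Phi = (1, 1, 1)^T, A = Phi^T Phi = 3, P = I_3 - Phi A^(-1) Phi^T.
p, m = 3, 1
phi = [1.0, 1.0, 1.0]
A = sum(v * v for v in phi)   # scalar, since m = 1

# P[i][j] = delta_ij - phi_i * phi_j / A
P = [[(1.0 if i == j else 0.0) - phi[i] * phi[j] / A for j in range(p)]
     for i in range(p)]

trace_P = sum(P[i][i] for i in range(p))   # each diagonal entry is 1 - 1/3
gamma = p - trace_P                        # effective number of parameters
```

Here trace(P) = 3 · (1 − 1/3) = 2 = p − m, so γ = p − trace(P) = 1 = m, as the proof above states.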

Page 83:
Regularization

Cost = Empirical error (SSE) + Model's penalty

$C = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2 = \sum_{k=1}^{p} \left( y^{(k)} - \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2$

The penalty term punishes models with large weights.

Page 84:
Regularization

$C = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2$ (SSE plus a penalty on large weights)

Without the penalty ($\lambda_i = 0$), there are $m$ degrees of freedom available to minimize the SSE (the cost). The effective number of parameters is $\gamma = m$.

Page 85:
Regularization

$C = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2$ (SSE plus a penalty on large weights)

With the penalty ($\lambda_i > 0$), the liberty to minimize the SSE is reduced. The effective number of parameters is $\gamma < m$.

Page 86:
Variance Estimation

$\hat{\sigma}^2 = MSE = \dfrac{SSE}{p - \gamma}$

We lose $\gamma$ degrees of freedom.

Page 87:
Variance Estimation

$\hat{\sigma}^2 = MSE = \dfrac{SSE}{\mathrm{trace}(\mathbf{P})}$

Page 88:
Model Selection
• Goal
  – Choose the fittest model
• Criteria
  – Least prediction error
• Main tools (estimate model fitness)
  – Cross-validation
  – Projection matrix
• Methods
  – Weight decay (ridge regression)
  – Pruning and growing RBFN's

Page 89:
Empirical Error vs. Model Fitness
• Ultimate goal: generalization, i.e., minimize the prediction error.
• Goal of our learning procedure: minimize the empirical error (MSE).

Page 90:
Estimating Prediction Error
• When you have plenty of data, use independent test sets:
  – e.g., use the same training set to train different models, and choose the best model by comparing on the test set.
• When data is scarce, use:
  – Cross-Validation
  – Bootstrap

Page 91:
Cross Validation
• The simplest and most widely used method for estimating prediction error.
• Partition the original set in several different ways and compute an average score over the different partitions, e.g.:
  – K-fold Cross-Validation
  – Leave-One-Out Cross-Validation
  – Generalized Cross-Validation
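A K-fold partition can be sketched in a few lines (our own helper; contiguous folds, no shuffling):

```python
def k_fold_splits(n, k):
    """Partition indices 0..n-1 into k contiguous folds; yield (train, test) pairs."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)   # spread the remainder evenly
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        test = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test
```

Each index appears in exactly one test fold, so averaging the validation score over the k splits uses every sample once for testing; leave-one-out is the special case k = n.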

Page 92:
Comparison of RBF and MLP

MLP                                RBF
Global hyperplane                  Local receptive field
EBP                                LMS
Local minima                       Serious local minima
Smaller number of hidden neurons   Larger number of hidden neurons
Shorter computation time           Longer computation time
Longer learning time               Shorter learning time

Page 93:
Comparison of RBF and MLP

                          RBF networks                         MLP
Learning speed            Very fast                            Very slow
Convergence               Almost guaranteed                    Not guaranteed
Response time             Slow                                 Fast
Memory requirement        Very large                           Small
Hardware implementation   IBM ZISC036, Nestor Ni1000           Voice Direct 364
                          (www-5.ibm.com/fr/cdlab/zisc.html)   (www.sensoryinc.com)
Generalization            Usually better                       Usually poorer