Radial Basis Function Networks


Page 1:

Radial Basis Function Networks

Page 2:

Introduction

In this lecture we will look at RBFs: networks where the activation of a hidden unit is based on the distance between the input vector and a prototype vector.

Radial basis functions have a number of interesting properties:

• Strong connections to other disciplines: function approximation, regularization theory, density estimation, and interpolation in the presence of noise [Bishop, 1995]
• RBFs allow for a straightforward interpretation of the internal representation produced by the hidden layer
• RBFs have training algorithms that are significantly faster than those for MLPs
• RBFs can be implemented as support vector machines

Page 3:

Radial Basis Functions

• Discriminant function flexibility: non-linear
  – But with sets of linear parameters at each layer
  – Provably general function approximators, given sufficient nodes
• Error function: mixed; different error criteria are typically used for the hidden vs. output layers
  – Hidden: input approximation
  – Output: training error
• Optimization: simple least squares; with hybrid training the solution is unique

Page 4:

Two-layer Non-linear NN

The non-linear functions are radially symmetric 'kernels' with free parameters.

Page 5:

Input to Hidden Mapping

• Each region of feature space is covered by means of a radially symmetric function
  – The activation of a hidden unit is determined by the DISTANCE between the input vector x and a prototype vector µ
• Choice of radial basis function
  – Although several forms of radial basis may be used, Gaussian kernels are the most commonly used
  – The Gaussian kernel may have a full-covariance structure, which requires D(D+3)/2 parameters to be learned, or a diagonal structure, with only (D+1) independent parameters
• In practice, a trade-off exists between using a small number of basis functions with many parameters or a larger number of less flexible functions [Bishop, 1995] (see the sketch below)
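
To make the parameter counts concrete, here is a minimal numpy sketch of the two kernel forms; the function names are ours, and the (D+1)-parameter case is implemented with a single shared spread per unit:

```python
import numpy as np

def gaussian_rbf_spherical(x, mu, sigma):
    """(D+1) parameters per hidden unit: D for the center mu
    plus one shared spread sigma."""
    return np.exp(-np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))

def gaussian_rbf_full(x, mu, cov):
    """D(D+3)/2 parameters per hidden unit: D for the center mu
    plus D(D+1)/2 for the symmetric covariance matrix cov."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d))
```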

Page 6:

Builds up a function out of 'spheres'

Page 7:

RBF vs. NN

Page 8: (figure)

Page 9: (figure)

Page 10:

RBFs have their origins in techniques for performing exact interpolation.

These techniques place a basis function at each of the training examples of f(x), and compute the coefficients w_k so that the "mixture model" has zero error at the examples.

Page 11:

Formally: Exact Interpolation

Goal: find a function h(x) such that h(x_i) = t_i, for inputs x_i and targets t_i.

$$y = h(x) = \sum_{i=1}^{n} w_i \,\varphi\!\left(\lVert x - x_i \rVert\right)$$

$$\begin{pmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1n} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2n} \\ \vdots & \vdots & & \vdots \\ \varphi_{n1} & \varphi_{n2} & \cdots & \varphi_{nn} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}, \qquad \varphi_{ij} = \varphi\!\left(\lVert x_i - x_j \rVert\right)$$

In matrix form: $\Phi \vec{w} = \vec{t}$. Solve for the weights: $\vec{w} = \Phi^{-1} \vec{t}$.

Recall the kernel trick: the matrix $\Phi$ depends on the inputs only through the pairwise values $\varphi(\lVert x_i - x_j \rVert)$.
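
A minimal numpy sketch of exact interpolation with Gaussian basis functions (the function names and the Gaussian choice of $\varphi$ are ours):

```python
import numpy as np

def fit_exact(X, t, sigma=1.0):
    """Place one Gaussian basis function at every training point and
    solve Phi w = t exactly; Phi is n x n and is nonsingular for
    distinct points (Micchelli's theorem, next slide)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Phi = np.exp(-dists ** 2 / (2.0 * sigma ** 2))
    return np.linalg.solve(Phi, t)        # w = Phi^{-1} t

def h(x, X, w, sigma=1.0):
    """h(x) = sum_i w_i * phi(||x - x_i||); zero error at each x_i."""
    d = np.linalg.norm(X - x, axis=-1)
    return w @ np.exp(-d ** 2 / (2.0 * sigma ** 2))
```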

Page 12:

Solving conditions

• Micchelli's Theorem: if the points x_i are distinct, then the Φ matrix will be nonsingular.

Mhaskar and Micchelli, Approximation by superposition of sigmoidal and radial basis functions, Advances in Applied Mathematics, 13, 350-373, 1992.

Page 13: (figure)

Page 14:

Radial Basis Function Networks

n outputs, m basis functions

$$y_k = h_k(x) = \sum_{i=1}^{m} w_{ki}\,\varphi_i(x) + w_{k0}, \qquad \text{e.g. } \varphi_i(x) = \exp\!\left(-\frac{\lVert x - \mu_i \rVert^2}{2\sigma_i^2}\right)$$

$$= \sum_{i=0}^{m} w_{ki}\,\varphi_i(x), \qquad \varphi_0(x) = 1$$

Rewrite this as:

$$\vec{y} = W \vec{\varphi}(x), \qquad \vec{\varphi}(x) = \begin{pmatrix} \varphi_0(x) \\ \vdots \\ \varphi_m(x) \end{pmatrix}$$
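
As a sketch, the same forward pass in numpy (function name and shapes are our choices):

```python
import numpy as np

def rbf_forward(x, mu, sigma, W):
    """y = W phi(x), with phi_0(x) = 1 absorbing the biases w_k0.
    mu: (m, D) centers, sigma: (m,) spreads, W: (n, m + 1) weights."""
    d2 = np.sum((mu - x) ** 2, axis=1)        # ||x - mu_i||^2, shape (m,)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))    # phi_1 ... phi_m
    return W @ np.concatenate(([1.0], phi))   # prepend phi_0 = 1
```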

Page 15:

Learning

RBFs are commonly trained following a hybrid procedure that operates in two stages:

• Unsupervised selection of RBF centers
  – RBF centers are selected so as to match the distribution of training examples in the input feature space
  – This is the critical step in training, normally performed in a slow iterative manner
  – A number of strategies are used to solve this problem
• Supervised computation of output vectors
  – Hidden-to-output weight vectors are determined so as to minimize the sum-squared error between the RBF outputs and the desired targets
  – Since the outputs are linear, the optimal weights can be computed using fast, linear least-squares methods

Page 16:

Least squares solution for output weights

Given L input vectors x_l with labels t_l, form the least-squares error function:

$$E = \sum_{l=1}^{L} \sum_{k=1}^{n} \left\{ y_k(x_l) - t_l^k \right\}^2, \qquad t_l^k \text{ is the } k\text{th value of the target vector } t_l$$

$$E = \sum_{l=1}^{L} \left(\vec{y}(x_l) - \vec{t}_l\right)^{\mathsf T} \left(\vec{y}(x_l) - \vec{t}_l\right)$$

Stacking the activations $\vec{\varphi}(x_l)^{\mathsf T}$ as the rows of $\Phi$ and the targets $\vec{t}_l^{\mathsf T}$ as the rows of $T$:

$$E = \operatorname{tr}\!\left[(\Phi W^{\mathsf T} - T)^{\mathsf T} (\Phi W^{\mathsf T} - T)\right]$$

$$\frac{\partial E}{\partial W} = 0 \;\Rightarrow\; 2\,\Phi^{\mathsf T}\Phi\, W^{\mathsf T} - 2\,\Phi^{\mathsf T} T = 0$$

$$W^{\mathsf T} = \left(\Phi^{\mathsf T}\Phi\right)^{-1} \Phi^{\mathsf T} T$$
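
In numpy, a sketch of this normal-equation solution (using lstsq for numerical stability rather than inverting $\Phi^{\mathsf T}\Phi$ explicitly):

```python
import numpy as np

def output_weights(Phi, T):
    """Solve W^T = (Phi^T Phi)^{-1} Phi^T T.
    Phi: (L, m + 1) basis activations, first column all ones (phi_0);
    T:   (L, n) target vectors as rows.
    Returns W with shape (n, m + 1)."""
    Wt, *_ = np.linalg.lstsq(Phi, T, rcond=None)  # least-squares solve
    return Wt.T
```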

Page 17:

Solving for input parameters

• Now we have a linear solution, if we know the input parameters. How do we set the input parameters?
  – Gradient descent, like back-prop (a sketch follows below)
  – Density estimation viewpoint: by looking at the meaning of the RBF network for classification, the input parameters can be estimated via density estimation and/or clustering techniques
  – We will skip the details in lecture
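
Though the lecture skips the details, here is a minimal sketch of the gradient-descent option for Gaussian basis functions with spherical spreads; the analytic gradient is standard, but the code and names are ours, and biases are omitted for brevity:

```python
import numpy as np

def center_gradient_step(X, T, mu, sigma, W, lr=0.01):
    """One gradient step on the centers mu for the sum-squared error.
    X: (L, D) inputs, T: (L, n) targets, mu: (m, D) centers,
    sigma: (m,) spreads, W: (n, m) hidden-to-output weights."""
    diff = X[:, None, :] - mu[None, :, :]               # (L, m, D)
    Phi = np.exp(-np.sum(diff ** 2, axis=-1)
                 / (2.0 * sigma ** 2))                  # (L, m)
    R = Phi @ W.T - T                                   # residuals, (L, n)
    # dE/dmu_j = 2 sum_l (sum_k r_lk w_kj) phi_j(x_l) (x_l - mu_j) / sigma_j^2
    G = (R @ W) * Phi                                   # (L, m)
    grad = 2.0 * np.einsum('lj,ljd->jd', G, diff) / sigma[:, None] ** 2
    return mu - lr * grad
```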

Page 18:

Unsupervised methods overview

• Random selection of centers
  – The simplest approach is to randomly select a number of training examples as RBF centers
  – This method has the advantage of being very fast, but the network will likely require an excessive number of centers
  – Once the center positions have been selected, the spread parameters σj can be estimated, for instance, from the average distance between neighboring centers
• Clustering
  – Alternatively, RBF centers may be obtained with a clustering procedure such as the k-means algorithm (see the sketch below)
  – The spread parameters can be computed as before, or from the sample covariance of the examples of each cluster
• Density estimation
  – The position of the RBF centers may also be obtained by modeling the feature-space density with a Gaussian Mixture Model, using the Expectation-Maximization algorithm
  – The spread parameters for each center are automatically obtained from the covariance matrices of the corresponding Gaussian components
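
A sketch of the clustering option using scikit-learn's k-means; the nearest-center spread heuristic is one choice among the options mentioned above:

```python
import numpy as np
from sklearn.cluster import KMeans

def centers_by_kmeans(X, m):
    """Cluster the inputs to place m RBF centers, then set each spread
    from the distance to its nearest neighboring center."""
    mu = KMeans(n_clusters=m, n_init=10).fit(X).cluster_centers_
    d = np.linalg.norm(mu[:, None, :] - mu[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)       # ignore self-distances
    return mu, d.min(axis=1)          # centers and per-center spreads
```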

Page 19:

Probabilistic Interpretation of RBFs for Classification

$$P(C_k \mid x) = \sum_{j} P(C_k \mid j)\, P(j \mid x)$$

(introduce a mixture over the basis functions j)

Page 20:

Prob RBF cont'd

where

• Basis function outputs: posterior probability of the jth feature (basis function) given x, $P(j \mid x)$
• Weights: class probability given the jth feature value, $P(C_k \mid j)$

Page 21:

Radial Basis Function networks: multi-class mixtures of Gaussians

• With Gaussian basis functions, normalized RBFs have the interpretation of a mixture-of-Gaussians probability density estimate for all the data, together with a class-based posterior on these basis functions.
• So we can train the first-layer parameters using a mixture of Gaussians via EM (coming soon); a sketch follows below.
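
A sketch of that first-layer fit with scikit-learn's EM-based Gaussian mixture (the tooling choice is ours; the lecture covers EM itself later):

```python
from sklearn.mixture import GaussianMixture

def first_layer_by_em(X, m):
    """Fit an m-component Gaussian mixture with EM; the component
    means become the RBF centers and the component covariances
    give the spreads directly."""
    gmm = GaussianMixture(n_components=m, covariance_type='full').fit(X)
    return gmm.means_, gmm.covariances_
```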