Radial Basis Functions: An Introduction
Prof. Sarat K. Patra, Senior Member, IEEE
National Institute of Technology, Rourkela
Odisha, India
Email: [email protected]
Presentation Outline
Books and reference materials:
• S. Haykin, Neural Networks: A Comprehensive Foundation, Pearson Education.
• C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press.
• B. Mulgrew, "Applying radial basis functions," IEEE Signal Processing Magazine, vol. 13, no. 2, 1996.
What are we going to cover
Introduction
Soft computing Techniques
NN Architectures
Linear and non-linearly separable
Basis Functions
Regularized RBF; Generalized RBF
RBF Training and Examples
Difference with MLP
Conclusion
Different NN Architectures
• Perceptron (Only one neuron)
– Linear decision boundary
– Limited functionality
• MLP
• RBF
• Recurrent networks
• Self organizing maps
• Many more
Linear and Non-linearly Separable
• Take a 2-input, single-output network
• Plot each category's output in the input space using different symbols
• Take the inputs in the "x-y" plane
• Can you find a line separating the points into 2 categories? (A plotting sketch follows this list.)
– Yes – linearly separable (OR, AND gates)
– No – non-linearly separable (EX-OR gate)
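A minimal sketch of this visual check, assuming numpy and matplotlib are available (the gate data and plotting choices are my own illustration, not from the lecture):

```python
# Plot the truth tables of OR and XOR in the input plane, using a different
# marker for each output class, as described above.
import numpy as np
import matplotlib.pyplot as plt

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
gates = {"OR (linearly separable)": np.array([0, 1, 1, 1]),
         "XOR (not linearly separable)": np.array([0, 1, 1, 0])}

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, (name, y) in zip(axes, gates.items()):
    for cls, marker in [(0, "o"), (1, "x")]:
        pts = inputs[y == cls]
        ax.scatter(pts[:, 0], pts[:, 1], marker=marker, s=100, label=f"output {cls}")
    ax.set_title(name)
    ax.legend()
plt.show()
```

For OR, a single straight line separates the "o" point from the "x" points; for XOR no such line exists.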
Why network models beyond the MLP?
• The MLP is already a universal approximator, but…
• The MLP can have many local minima.
• It is often too slow to train an MLP.
• Sometimes it is extremely difficult to optimize the structure of an MLP.
• There may exist other network architectures, in terms of the number of elements in each layer, whose performance could be superior to the one used.
Radial Basis Function (RBF) Networks
RBFNs are artificial neural networks for application to problems of supervised learning:
Regression
Classification
Parametric Regression
• Parametric regression: the form of the function is known, but not the parameter values.
• Typically, the parameters (and the dependent and independent variables) have physical meaning.
• E.g., fitting a straight line to a set of points.
Nonparametric Regression
• No prior knowledge of the true form of the function.
• Using many free parameters which have no physical meaning.
• The model should be able to represent a very broad class of functions.
Classification
• Purpose: assign previously unseen patterns to their respective classes.
• Training: previous examples of each class.
• Output: a class out of a discrete set of classes.
• Classification problems can be made to look like nonparametric regression.
Time Series Prediction
• Estimate the next value and future values of a sequence, such as Ŝ(n+1) = f(S(n), S(n−1), …).
• The problem is that it is usually not an explicit function of time. Normally, time series are modeled as auto-regressive in nature, i.e. the outputs, suitably delayed, are also the inputs.
• Creating the training set from the available historical sequence first requires choosing how many and which delayed outputs affect the next output (a sketch follows).
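A small illustrative helper, assuming numpy (the function name and the example values are my own, not from the lecture), that turns a historical sequence into delayed-input/next-output training pairs for a chosen number of delays d:

```python
import numpy as np

def make_training_pairs(series, d):
    """Return inputs X of shape (N-d, d) and targets t of shape (N-d,)."""
    series = np.asarray(series, dtype=float)
    # Each row of X holds d consecutive past values; t holds the next value.
    X = np.stack([series[i:i + d] for i in range(len(series) - d)])
    t = series[d:]
    return X, t

# e.g. with d = 2: inputs (S(n-2), S(n-1)) predict the target S(n)
X, t = make_training_pairs([0.1, 0.5, 0.2, 0.9, 0.3], d=2)
```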
Supervised Learning in RBFN
• Neural networks, including radial basis function networks, are nonparametric models, and their weights (and other parameters) have no particular meaning in relation to the problems to which they are applied.
• Estimating values for the weights of a neural network (or the parameters of any nonparametric model) is never the primary goal in supervised learning.
• The primary goal is to estimate the underlying function (or at least to estimate its output at certain desired values of the input).
The idea of RBFNN
The MLP is one way to get non-linearity. The other is to use a generalized linear discriminant function:
$$y = \sum_j w_j \,\phi_j(\mathbf{x})$$
The idea of RBFNN
For a Radial Basis Function (RBF), the basis function is radially symmetric with respect to the input: its value is determined by the distance from the data point to the RBF center.
The Gaussian Kernel

$$\phi_j(\mathbf{x}) = \exp\!\left(-\|\mathbf{x} - \mathbf{c}_j\|^2 / (2\sigma_j^2)\right)$$

where $\mathbf{c}_j$ represents the center and $\sigma_j$ the width, and $\|\cdot\|$ is the distance measure. For the Euclidean distance,

$$\|\mathbf{x} - \mathbf{c}_j\|^2 = \sum_{m=1}^{M} (x_m - c_{jm})^2$$
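A minimal sketch of the Gaussian RBF defined above, assuming numpy (the function name is my own):

```python
import numpy as np

def gaussian_rbf(x, c, sigma):
    """phi(x) = exp(-||x - c||^2 / (2 sigma^2)), with Euclidean distance."""
    x, c = np.asarray(x, float), np.asarray(c, float)
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))

# The value depends only on the distance from x to the center c:
print(gaussian_rbf([1.0, 0.0], c=[0.0, 0.0], sigma=1.0))  # ~0.6065
print(gaussian_rbf([0.0, 1.0], c=[0.0, 0.0], sigma=1.0))  # same, by radial symmetry
```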
Cover’s Theorem
"A complex pattern-classification problem cast in high-dimensional space nonlinearly is more likely to be linearly separable than in a low-dimensional space" (Cover, 1965).
Radial Basis Function Networks
• In its most basic form, a Radial-Basis Function (RBF) network involves three layers with entirely different roles.
• The input layer is made up of source nodes that connect the network to its environment.
• The second layer, the only hidden layer, applies a nonlinear transformation from the input space to the hidden space.
• The output layer is linear, supplying the response of the network to the activation pattern applied to the input layer.
The idea of RBFNN
• For an RBFNN, we expect that the function to be learnt can be expressed as a linear superposition of a number of RBFs.
(Figure: a function described as a linear superposition of three basis functions.)
RBF Structure
RBFNN: a two-layer network. Free parameters:
– the network weights w in the 2nd layer
– the form of the basis functions
– the number of basis functions
– the locations of the basis functions
E.g., for a Gaussian RBFNN these are the number, the centers and the widths of the basis functions.
(Figure: two-layer RBF network with input x, weights w and output y.)
Some Theory
Given a set of N distinct points $\{\mathbf{x}_i \in \mathbb{R}^{m_0} \mid i = 1, 2, \dots, N\}$ and a corresponding set of N real numbers $\{d_i \in \mathbb{R}^1 \mid i = 1, 2, \dots, N\}$, find a function $F: \mathbb{R}^{m_0} \to \mathbb{R}^1$ that satisfies the interpolation condition
$$F(\mathbf{x}_i) = d_i, \quad i = 1, 2, \dots, N$$
The radial-basis-function technique consists of choosing a function F of the form
$$F(\mathbf{x}) = \sum_{i=1}^{N} w_i \,\phi(\|\mathbf{x} - \mathbf{x}_i\|)$$
Some Theory
Micchelli’s Theorem
Let $\{\mathbf{x}_i\}_{i=1}^{N}$ be a set of distinct points in $\mathbb{R}^{m_0}$. Then the N-by-N interpolation matrix $\mathbf{\Phi}$, whose ji-th element is $\phi_{ji} = \phi(\|\mathbf{x}_j - \mathbf{x}_i\|)$, is non-singular.
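A small numerical illustration of the theorem (my own check, assuming numpy; it is not a proof): for distinct points and a Gaussian $\phi$, the interpolation matrix comes out full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))                      # 10 distinct points in R^3
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Phi = np.exp(-dists ** 2 / 2.0)                       # Gaussian phi, sigma = 1

print(np.linalg.matrix_rank(Phi))                     # 10 -> full rank, non-singular
```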
Regularization Networks
The regularization network is a universal approximator
The regularization network has the best approximation property
The solution computed by the regularization network is optimal.
Generalized RBF Networks
• When N is large, the one-to-one correspondence between the training input data and the Green's functions produces a regularization network that may be considered expensive. Hence:
• an approximation of the regularization network.
Generalized RBF Networks
• The approach taken involves searching for a suboptimal solution in a lower-dimensional space that approximates the regularized solution (Galerkin's method):
$$F^*(\mathbf{x}) = \sum_{i=1}^{m_1} w_i \,\varphi_i(\mathbf{x})$$
where $\{\varphi_i(\mathbf{x}) \mid i = 1, 2, \dots, m_1 \le N\}$ is a new set of linearly independent basis functions and the $w_i$ constitute a new set of weights.
• We set $\varphi_i(\mathbf{x}) = G(\|\mathbf{x} - \mathbf{t}_i\|)$, $i = 1, 2, \dots, m_1$, where the set of centers $\{\mathbf{t}_i \mid i = 1, 2, \dots, m_1\}$ is to be determined.
• Note that this particular choice of basis functions is the only one that guarantees that, in the case of $m_1 = N$ and $\mathbf{x}_i = \mathbf{t}_i$, $i = 1, 2, \dots, N$, the correct solution is consistently recovered.
(Figure: localized vs. non-localized basis functions.)
RBF Structure (2)
• Universal approximation: a Gaussian RBFNN is capable of approximating any function.
Exact Interpolation
• The idea of the RBFNN is that we 'interpolate' the target function using the sum of a number of basis functions.
• To illustrate this idea, we consider the special case of exact interpolation, in which the number of basis functions M equals the number of data points N (M = N) and all the basis functions are centered at the data points. We want the target values to be exactly interpolated by the summation of the basis functions.
Exact Interpolation
$$y(\mathbf{x}^n) = \sum_{j=1}^{M} w_j \,\phi_j(\|\mathbf{x}^n - \mathbf{c}_j\|) = t^n, \quad \text{for } n = 1, \dots, N$$
or, in matrix form, $\mathbf{\Phi}\mathbf{w} = \mathbf{t}$.
Since M = N, $\mathbf{\Phi}$ is a square matrix and is non-singular in general cases, so the result is $\mathbf{w} = \mathbf{\Phi}^{-1}\mathbf{t}$.
RBF Output with 3 centers
1-dimensional problem; center locations: (−1, 0, 1)
RBF Output with 4 centers (EX-OR)
(Plots for σ² = 0.1 and σ² = 1.0.)
RBF Output with 4 centers
(Plots for σ² = 0.1 and σ² = 1.0.)
An example of exact interpolation
For a Gaussian RBF (1-D input), 21 data points are generated by y = sin(πx) plus noise (strength = 0.2).
The target data points are indeed exactly interpolated, but the generalization performance is not good, as the sketch below illustrates.
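A hedged reconstruction of this example, assuming numpy (the input range [−1, 1] and the random seed are my own assumptions; the width σ = 0.1 follows a later slide): 21 noisy samples of y = sin(πx), one Gaussian basis function per data point, and weights from $\mathbf{w} = \mathbf{\Phi}^{-1}\mathbf{t}$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 21)                        # 21 data points (assumed range)
t = np.sin(np.pi * x) + 0.2 * rng.standard_normal(21) # targets with noise strength 0.2

sigma = 0.1
Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
w = np.linalg.solve(Phi, t)                           # exact interpolation: Phi w = t
print(np.max(np.abs(Phi @ w - t)))                    # ~0: every training point is hit

# Evaluated on a fine grid, the fit oscillates between the training points,
# which is the poor generalization noted above.
x_test = np.linspace(-1.0, 1.0, 201)
y_test = np.exp(-(x_test[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2)) @ w
```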
The hybrid training procedure
• The number of basis functions need not be equal to the number of data points. Actually, in a typical situation, M should be much less than N.
• The centers of the basis functions are no longer constrained to lie at the input data points. Instead, the determination of the centers becomes part of the training process.
• Instead of having a common width parameter σ, each basis function can have its own width, which is also to be determined by learning.
An example of RBFNN
Exact interpolation, σ = 0.1; RBFNN, 4 basis functions, σ = 0.4
The hybrid training procedure
• Unsupervised learning in the first layer: fix the basis functions using only knowledge of the input data. For Gaussian RBFs, this often includes deciding the number, the locations and the widths of the RBFs.
• Supervised learning in the second layer: determine the network weights in the second layer. If we choose the sum-of-squares error, this becomes a quadratic optimization problem, which is easy to solve.
• In summary, hybrid training avoids using supervised learning simultaneously in two layers, and greatly reduces the computational cost. (A sketch follows.)
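A sketch of the hybrid procedure, assuming scikit-learn is available for the unsupervised step (the function name, the single shared width σ and the bias handling are my own choices): K-means fixes the centers, then the second-layer weights minimize the sum-of-squares error in closed form.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rbf(X, t, M, sigma):
    # Unsupervised step: M centers from K-means on the inputs only.
    centres = KMeans(n_clusters=M, n_init=10).fit(X).cluster_centers_
    # Design matrix of Gaussian basis functions, plus phi_0 = 1 as a bias term.
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-d ** 2 / (2 * sigma ** 2))])
    # Supervised step: quadratic problem, solved directly by least squares.
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return centres, w
```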
Basis function optimization
The form of the basis function is predefined, and is often chosen to be Gaussian.
The number of basis functions often has to be determined by trial, e.g. through monitoring the generalization performance.
The key issue in unsupervised learning is to determine the locations and the widths of the basis functions.
Algorithms for basis function optimization
Subsets of data points.
• Randomly select a number of input data points as basis function centers.
• The width can be chosen to be equal for all basis functions and given by some multiple of the average distance between the basis function centers. (A sketch follows.)
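A minimal sketch of this heuristic, assuming numpy (the function name and the width multiple are my own illustration):

```python
import numpy as np

def random_centres(X, M, width_factor=2.0, rng=np.random.default_rng()):
    # Pick M input points at random, without replacement, as centers.
    centres = X[rng.choice(len(X), size=M, replace=False)]
    # Common width: a multiple of the mean pairwise distance between centers.
    d = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    avg = d[np.triu_indices(M, k=1)].mean()
    return centres, width_factor * avg
```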
Algorithms for basis function optimization
Gaussian mixture models.
• The basis functions are chosen to model the density distribution of the input data (intuitively, we want the centers of the basis functions to lie in high-density regions). We may assume the input data is generated by a mixture of Gaussian distributions; optimizing the probability density model then returns the basis function centers and widths. (A sketch follows.)
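A sketch of this approach, assuming scikit-learn (the function name and the choice of spherical covariances are mine): the fitted mixture means give the centers, and the spherical variances give the squared widths.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_basis(X, M):
    # Fit an M-component Gaussian mixture to the input density.
    gmm = GaussianMixture(n_components=M, covariance_type="spherical").fit(X)
    centres = gmm.means_
    widths = np.sqrt(gmm.covariances_)   # one sigma per component
    return centres, widths
```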
Algorithms for basis function optimization
Clustering algorithms.
• In this approach the input data is assumed to consist of a number of clusters. Each cluster corresponds to one basis function, with the cluster center as the basis function center. The width can be set to some multiple of the average distance between all centers.
K-means clustering algorithm (1)
• The algorithm partitions the data points into K disjoint subsets (K is predefined).
• The clustering criteria are:
– the cluster centers are set in the high-density regions of the data;
– a data point is assigned to the cluster whose center is at the minimum distance.
• Mathematically, this is equivalent to minimizing the sum-of-squares clustering function given below.
K-means clustering algorithm (2)
$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \|\mathbf{x}^n - \mathbf{c}_j\|^2$$
where $S_j$ is the j-th cluster, containing $N_j$ data points, and
$$\mathbf{c}_j = \frac{1}{N_j} \sum_{n \in S_j} \mathbf{x}^n$$
is the mean of the data points in cluster j.
K-means clustering algorithm (3)
• Step 1: Initially, randomly assign data points to one of the K clusters. Each data point then has a cluster label.
• Step 2: Calculate the mean of each cluster.
• Step 3: Check whether each data point has the right cluster label. For each data point, calculate its distances to all K centers. If the minimum distance is not to its own cluster center, update the cluster identity of the data point to the one that gives the minimum distance.
• Step 4: After checking all data points in an epoch, if no updating has occurred, i.e. J has reached its minimum value, stop. Otherwise, go back to Step 2. (A sketch follows.)
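A minimal numpy implementation of the steps above (a sketch, not the lecturer's code; empty clusters are not handled):

```python
import numpy as np

def k_means(X, K, rng=np.random.default_rng()):
    labels = rng.integers(K, size=len(X))             # Step 1: random assignment
    while True:
        # Step 2: the mean of each cluster becomes its center.
        centres = np.stack([X[labels == j].mean(axis=0) for j in range(K)])
        # Step 3: reassign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):        # Step 4: no update -> J minimal
            return centres, labels
        labels = new_labels
```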
An example of data clustering
(Figures: the data before clustering and after clustering.)
The network training
• The network output after clustering
$$y(\mathbf{x}) = \sum_{j=0}^{K} w_j \,\phi_j(\mathbf{x})$$
where $\phi_j(\mathbf{x}) = \exp(-\|\mathbf{x} - \mathbf{c}_j\|^2 / (2\sigma_j^2))$ for $j > 0$ is the Gaussian RBF, the $\mathbf{c}_j$ are the centers obtained by clustering, and $\phi_0(\mathbf{x}) = 1$ is the bias term.
The output error is
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( \sum_{j=0}^{M} w_j \,\phi_j(\mathbf{x}^n) - t^n \right)^2$$
RBF in Time series Prediction
• We will show an example of using an RBFNN for time-series prediction.
• Time-series prediction: predict the system behavior based on its history.
• Suppose the time course of a system is denoted as {S(1), S(2), …, S(n)}, where S(n) is the system state at time step n. The task is to predict the system behavior at step n+1 based on the knowledge of its history, i.e. {S(n), S(n−1), S(n−2), …}. This is possible for many problems in which system states are correlated over time.
RBF in Time series Prediction
• Consider a simple example, the logistic map, in which the system state x is updated iteratively according to the equation below.
• Our task is to predict the value of x at any step based on its values in the previous two steps, i.e. to estimate $x_n$ based on $x_{n-2}$ and $x_{n-1}$.
$$x_{n+1} = r\,x_n (1 - x_n)$$
Generating training data from the logistic map
• The logistic map, though simple, shows many interesting behaviors. (More detail can be found at http://mathworld.wolfram.com/LogisticMap.html.)
• The data-collection process:
– Choose r = 4 and the initial value of x to be 0.3.
– Iterate the logistic map 500 steps, and collect 100 examples from the last 100 iterations (chopping the data into triplets, each triplet giving one input-output pair). (A sketch follows.)
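A sketch of this recipe, assuming numpy (variable names are mine; note that chopping 100 values into overlapping triplets actually yields 98 input-output pairs):

```python
import numpy as np

r, x = 4.0, 0.3
xs = []
for _ in range(500):                                  # iterate the logistic map
    x = r * x * (1.0 - x)
    xs.append(x)
xs = np.array(xs[-100:])                              # keep the last 100 iterations

# Triplets (x_{n-2}, x_{n-1}) -> x_n as input-output pairs.
X = np.stack([xs[:-2], xs[1:-1]], axis=1)
t = xs[2:]
```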
Generating training data from the logistic map
(Figures: the input data space; the time course of the system state.)
Clustering the input data
• We cluster the input data by using the K-means clustering algorithm.
• We choose K = 4. The clustering result returns the centers of the basis functions and the scale of the widths.
The training result of RBFNN
The relationship between $x_n$ and $x_{n-2}$
The training result of RBFNN
The relationship between $x_n$ and $x_{n-1}$
Time series predicted data
Comparison with MLP
RBF
• Simple structure: one hidden layer, linear combination at the output layer.
• Simple training: the hybrid procedure of clustering plus a quadratic error function.
• Localized representation: the input space is covered by a number of localized basis functions; a given input typically significantly activates only a limited number of hidden units (those within a close distance).
MLP
• Complicated structure: often many layers and many hidden units.
• Complicated training: optimizing multiple layers together, with local minima and slow convergence.
• Distributed representation: for a given input, typically many hidden units will be activated.
Comparison with MLP (2)
• Different ways of interpolating data
MLP: data are classified by hyper-planes. RBF: data are classified according to clusters
Shortcomings of RBFNN
• Unsupervised learning implies that an RBFNN may only achieve a sub-optimal solution, since the training of the basis functions does not consider the information in the output distribution.
Example: a basis function is chosen based only on the density of the input data, which gives p(x). It does not match the real output function h(x).
Shortcomings of RBFNN
Example: the output function is determined by only one input component; the other component is irrelevant. Because its first-layer training is unsupervised, the RBFNN is unable to detect this irrelevant component, whereas an MLP may do so (the network weights connected to irrelevant components will tend to have smaller values).
Some Theory
The XOR problem: (x1 OR x2) AND NOT (x1 AND x2)
Summary
• The structure of an RBF network is unusual in that the constitution of its hidden units is entirely different from that of its output units.
• Tikhonov’s regularization theory provides a sound mathematical basis for the formulation of RBF networks.
• The Green's function G(x, ξ) plays a central role in the theory.
Queries?