TRANSCRIPT
Nonparametric Density Estimation
Riu Baring, CIS 8526 Machine Learning
Temple University, Fall 2007
Christopher M. Bishop, Pattern Recognition and Machine Learning, Chapter 2.5. Some slides from http://courses.cs.tamu.edu/rgutier/cpsc689_f07/
Overview
Density Estimation
Given: a finite set x1,…,xN
Task: to model the probability distribution p(x)

Parametric Distribution
Governed by adaptive parameters, e.g. the mean and variance of a Gaussian distribution
Need a procedure to determine suitable values for the parameters
Discrete rv: binomial and multinomial distributions
Continuous rv: Gaussian distributions
Nonparametric Method
Attempt to estimate the density directly from the data, without making any parametric assumptions about the underlying distribution.
Nonparametric Density Estimation
Histogram
Divide the sample space into a number of bins and approximate the density at the center of each bin by the fraction of points in the training data that fall into the corresponding bin:

p(x) = (1/N) * [number of x_i falling in the bin containing x] / [width of the bin containing x]
Histogram
Parameter: bin width
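The histogram rule above can be sketched in a few lines of Python. This is an illustrative implementation assuming NumPy; the function name and defaults are mine, not from the slides:

```python
import numpy as np

def histogram_density(data, x, num_bins=10):
    """Histogram rule: p(x) = (count in bin containing x) / (N * bin width)."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    width = (hi - lo) / num_bins
    counts, _ = np.histogram(data, bins=num_bins, range=(lo, hi))
    # Index of the bin containing x (clip so x == hi falls in the last bin).
    i = min(int((x - lo) / width), num_bins - 1)
    return counts[i] / (len(data) * width)
```

For data spread uniformly over [0, 1] the estimate is close to 1 everywhere; note that it is piecewise constant, jumping at the bin edges.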
Histogram - Drawbacks
The discontinuities of the estimate are not due to the underlying density; they are only an artifact of the chosen bin locations. These discontinuities make it very difficult (for the naïve analyst) to grasp the structure of the data.
A much more serious problem is the curse of dimensionality: the number of bins grows exponentially with the number of dimensions, so in high dimensions we would require a very large number of examples, or else most of the bins would be empty.
Nonparametric DE
Kernel Density Estimator
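The KDE slides in the original deck were mostly figures. As a concrete reference, here is a minimal sketch of a Gaussian-kernel density estimator with bandwidth h (the KDE parameter mentioned in the closing slide); the function name and default bandwidth are illustrative, not from the slides:

```python
import numpy as np

def kde(data, x, h=0.3):
    """Gaussian kernel density estimate:
    p(x) = (1/N) * sum_n N(x | x_n, h^2)."""
    data = np.asarray(data, dtype=float)
    z = (x - data) / h
    return np.exp(-0.5 * z**2).sum() / (len(data) * h * np.sqrt(2.0 * np.pi))
```

Unlike the histogram, this estimate is smooth in x, but it still requires keeping all N training points.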
k-nearest-neighbors
To estimate p(x):
Consider a small sphere centered on the point x
Allow the radius of the sphere to grow until it contains K data points
k-nearest-neighbors
Data set comprising N points, with Nk points in class Ck, so that sum_k Nk = N
Suppose the sphere centered on x has volume V and contains K points in total, Kk of them from class Ck

Density estimates:
Class-conditional density: p(x|Ck) = Kk / (Nk V)
Unconditional density: p(x) = K / (N V)
Class prior: p(Ck) = Nk / N
Posterior probability of class membership:
p(Ck|x) = p(x|Ck) p(Ck) / p(x) = Kk / K
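These quantities can be checked numerically. Below is a 1-D sketch in which the "sphere" is an interval of length 2r; the function and variable names are my own, chosen to mirror the formulas:

```python
import numpy as np

def knn_class_densities(X, y, x, K=5):
    """Grow a 1-D 'sphere' (interval) around x until it holds K points, then
    apply p(x|Ck) = Kk/(Nk*V), p(x) = K/(N*V), p(Ck) = Nk/N, p(Ck|x) = Kk/K."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    N = len(X)
    dist = np.abs(X - x)
    order = np.argsort(dist)
    r = dist[order[K - 1]]        # radius reaching the K-th nearest neighbour
    V = 2.0 * r                   # "volume" of a 1-D sphere of radius r
    in_sphere = y[order[:K]]      # labels of the K points inside the sphere
    per_class = {}
    for c in np.unique(y):
        Nk = int((y == c).sum())
        Kk = int((in_sphere == c).sum())
        per_class[c] = {
            "p(x|Ck)": Kk / (Nk * V),
            "p(Ck)": Nk / N,
            "p(Ck|x)": Kk / K,
        }
    return per_class, K / (N * V)  # per-class terms and the mixture p(x)
```

By construction, Bayes' rule holds exactly: p(x|Ck) p(Ck) / p(x) = Kk/K for every class.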
k-nearest-neighbors
To classify a new point x:
Identify the K nearest neighbors from the training data
Assign x to the class having the largest number of representatives
Parameter: K
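The classification rule above reduces to a majority vote among the K nearest neighbors. A minimal sketch (names are illustrative; assumes NumPy and Euclidean distance):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, K=3):
    """Assign x to the class with the most representatives among its K
    nearest neighbours in the training data."""
    X_train = np.asarray(X_train, dtype=float)
    d = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(d)[:K]                  # indices of K closest points
    votes = Counter(np.asarray(y_train)[nearest])
    return votes.most_common(1)[0][0]            # majority class
```

Note that nothing is "trained": every prediction scans the full training set, which is exactly the storage/computation cost raised in the closing slide.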
My thoughts
KDE and KNN require the entire training data set to be stored, which leads to expensive computation
Both have "parameters" to tweak:
KDE: bandwidth, h
KNN: K