Nonparametric Density Estimation Riu Baring CIS 8526 Machine Learning Temple University Fall 2007 Christopher M. Bishop, Pattern Recognition and Machine Learning, Chapter 2.5 Some slides from http://courses.cs.tamu.edu/rgutier/cpsc689_f07/


Page 1:

Nonparametric Density Estimation

Riu Baring
CIS 8526 Machine Learning

Temple University
Fall 2007

Christopher M. Bishop, Pattern Recognition and Machine Learning, Chapter 2.5
Some slides from http://courses.cs.tamu.edu/rgutier/cpsc689_f07/

Page 2:

Overview

Density Estimation
  Given: a finite set x1,…,xN
  Task: to model the probability distribution p(x)

Parametric Distribution
  Governed by adaptive parameters
    Mean and variance – Gaussian Distribution
  Need procedure to determine suitable values for the parameters
  Discrete rv – binomial and multinomial distributions
  Continuous rv – Gaussian distributions

Page 3:

Nonparametric Method

Attempt to estimate the density directly from the data without making any parametric assumptions about the underlying distribution

Nonparametric Density Estimation

Page 4:

Histogram

Divide the sample space into a number of bins and approximate the density at the center of each bin by the fraction of points in the training data that fall into the corresponding bin, divided by the bin width

$$p(x) = \frac{1}{N} \cdot \frac{[\text{number of } x_i \text{ falling in the bin containing } x]}{[\text{width of the bin containing } x]}$$

Page 5:

Histogram

Parameter: bin width
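A minimal sketch of the histogram estimate in one dimension, assuming equal-width bins aligned to an arbitrary origin. The names `histogram_density`, `bin_width`, and `origin` are illustrative and do not come from the slides.

```python
import math
import random

def histogram_density(data, x, bin_width, origin=0.0):
    """Histogram estimate: (points in the bin containing x) / (N * bin_width)."""
    # Index of the equal-width bin that contains x.
    target_bin = math.floor((x - origin) / bin_width)
    # Count the training points that fall in that same bin.
    count = sum(
        1 for v in data if math.floor((v - origin) / bin_width) == target_bin
    )
    return count / (len(data) * bin_width)

# Illustrative data: 1000 draws from a standard Gaussian (an assumption).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]

# The same query point evaluated with three different bin widths.
for width in (0.05, 0.5, 2.0):
    print(width, histogram_density(data, 0.1, width))
```

Very small widths tend to give a spiky estimate that follows individual points, while very large widths smooth away the structure of the density.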


Page 6:

Histogram - Drawbacks

The discontinuities of the estimate are not due to the underlying density; they are only an artifact of the chosen bin locations
  These discontinuities make it very difficult (to the naïve analyst) to grasp the structure of the data

A much more serious problem is the curse of dimensionality, since the number of bins grows exponentially with the number of dimensions
  In high dimensions we would require a very large number of examples or else most of the bins would be empty
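The exponential growth can be made concrete with a small sketch. It assumes 10 bins per dimension and 1000 points drawn uniformly from the unit hypercube; both numbers and the function names are illustrative choices, not values from the slides.

```python
import random

def num_bins(bins_per_dim, dims):
    """Total number of bins when every dimension is split into bins_per_dim."""
    return bins_per_dim ** dims

def fraction_occupied(n_points, bins_per_dim, dims, seed=0):
    """Fraction of bins that contain at least one of n_points uniform samples."""
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_points):
        # rng.random() lies in [0, 1), so each index lies in 0..bins_per_dim-1.
        cell = tuple(int(rng.random() * bins_per_dim) for _ in range(dims))
        occupied.add(cell)
    return len(occupied) / num_bins(bins_per_dim, dims)

for d in (1, 2, 3, 6):
    print(d, num_bins(10, d), fraction_occupied(1000, 10, d))
```

With 6 dimensions there are already a million bins, so 1000 points can occupy at most 0.1% of them.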

Page 7:

Nonparametric DE
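
Following Bishop, Section 2.5 (the chapter these slides cite), the local estimate shared by the kernel and nearest-neighbor methods can be sketched as below. The symbols $\mathcal{R}$, $P$, $K$, and $V$ follow that text and are assumptions here, since they are not defined on the slides themselves.

$$
P = \int_{\mathcal{R}} p(x)\,dx, \qquad K \simeq N P, \qquad P \simeq p(x)\,V \quad\Longrightarrow\quad p(x) = \frac{K}{N V}
$$

Here $\mathcal{R}$ is a small region containing $x$, $P$ is its probability mass, $K$ is the number of the $N$ data points that fall inside it (written k on the k-nearest-neighbors slides), and $V$ is its volume. Fixing $V$ and counting $K$ leads to the kernel density estimator; fixing $K$ and finding the required $V$ leads to k-nearest-neighbors.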

Page 8:

Nonparametric DE

Page 9:

Nonparametric DE

Page 10:

Kernel Density Estimator

Page 11:

Kernel Density Estimator
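
A minimal sketch of a kernel density estimator in one dimension, assuming a Gaussian kernel with bandwidth h as in Bishop, Section 2.5. The kernel choice and the function name `gaussian_kde` are assumptions made for illustration.

```python
import math

def gaussian_kde(data, x, h):
    """p(x) = (1/N) * sum over n of a Gaussian centered on x_n with std dev h."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    total = sum(math.exp(-((x - xn) ** 2) / (2.0 * h * h)) for xn in data)
    return norm * total / len(data)

# Illustrative data: three points, evaluated with two bandwidths.
data = [0.0, 1.0, 2.0]
for h in (0.1, 1.0):
    print(h, gaussian_kde(data, 0.5, h))
```

Every evaluation loops over the whole training set, so the cost of one query grows linearly with the number of stored points.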

Page 12:

k-nearest-neighbors

To estimate p(x):
  Consider small sphere centered on the point x
  Allow the radius of the sphere to grow until it contains k data points
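
The growing-sphere procedure can be sketched in one dimension, where the "sphere" of radius r around x is the interval [x - r, x + r] with volume V = 2r and the estimate is k / (N V). The function name `knn_density` is illustrative, and the sketch assumes the k-th nearest point does not coincide with x (otherwise V would be zero).

```python
def knn_density(data, x, k):
    """k-NN density estimate in 1-D: k / (N * V), with V = 2 * radius."""
    # Distances from x to every training point, smallest first.
    distances = sorted(abs(x - xn) for xn in data)
    # Radius at which the interval around x first contains k points.
    radius = distances[k - 1]
    volume = 2.0 * radius
    return k / (len(data) * volume)

data = [0.0, 1.0, 2.0, 3.0]
print(knn_density(data, 1.5, 2))
```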

Page 13:

k-nearest-neighbors

Data set comprising $N_k$ points in class $C_k$, so that $\sum_k N_k = N$

Suppose the sphere has volume $V$ and contains $k_k$ points from class $C_k$

Density estimate:
$$p(x \mid C_k) = \frac{k_k}{N_k V}$$

Unconditional density:
$$p(x) = \frac{k}{N V}$$

Class prior:
$$p(C_k) = \frac{N_k}{N}$$

Posterior probability of class membership:
$$p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)} = \frac{k_k}{k}$$

Page 14:

k-nearest-neighbors

To classify new point x
  Identify K nearest neighbors from training data
  Assign to the class having the largest number of representatives

Parameter, K
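
The classification rule can be sketched as follows, assuming Euclidean distance and training data stored as (point, label) pairs. The names `knn_posterior` and `knn_classify` and the tie-breaking behavior are illustrative choices, not part of the slides.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_posterior(train, x, k):
    """Estimate p(C | x) as (neighbors in class C) / k over the k nearest points."""
    neighbors = sorted(train, key=lambda pair: sq_dist(pair[0], x))[:k]
    counts = Counter(label for _, label in neighbors)
    return {label: count / k for label, count in counts.items()}

def knn_classify(train, x, k):
    """Assign x to the class with the most representatives among its k neighbors."""
    posterior = knn_posterior(train, x, k)
    # On a tie, max() keeps the class whose neighbor was encountered first,
    # which is the class of the nearer neighbor.
    return max(posterior, key=posterior.get)

train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
print(knn_classify(train, (0.2, 0.1), 3))
print(knn_classify(train, (5.5, 5.5), 3))
```

Each query sorts the full training set by distance, which is simple but scales with the size of the stored data.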


Page 15:

My thoughts

KDE and KNN require the entire training data set to be stored
  Leads to expensive computation

Tweak “parameters”
  KDE: bandwidth, h
  KNN: K
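
One possible way to tweak the KDE bandwidth is to compare candidate values of h by their average log-likelihood on held-out points. This is a sketch under stated assumptions: the hold-out scheme, the Gaussian kernel, the candidate list, and the synthetic data are illustrative and not taken from the slides.

```python
import math
import random

def gaussian_kde(train, x, h):
    """1-D Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    total = sum(math.exp(-((x - xn) ** 2) / (2.0 * h * h)) for xn in train)
    return norm * total / len(train)

def heldout_log_likelihood(train, heldout, h):
    """Average log density of the held-out points under the KDE built on train."""
    # The 1e-300 floor guards against log(0) when a tiny h underflows to zero.
    return sum(
        math.log(max(gaussian_kde(train, x, h), 1e-300)) for x in heldout
    ) / len(heldout)

# Illustrative data: 300 standard Gaussian draws, split 200 / 100.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(300)]
train, heldout = data[:200], data[200:]

candidates = [0.01, 0.05, 0.2, 0.5, 1.0, 3.0]
best_h = max(candidates, key=lambda h: heldout_log_likelihood(train, heldout, h))
print(best_h)
```

The same hold-out idea applies to choosing K for KNN, with classification accuracy on the held-out points in place of the log-likelihood.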