Kernel Methods, Arie Nakhmani
Post on 18-Jan-2018
Outline
Kernel Smoothers
Kernel Density Estimators
Kernel Density Classifiers

Kernel Smoothers: The Goal
Estimate a function from noisy observations when the parametric model for the function is unknown.
The resulting function should be smooth.
The level of smoothness should be set by a single parameter.

Example
N = 100 sample points. Is the result smooth enough?

Exponential Smoother
A smaller smoothing parameter gives a smoother, but more delayed, line.
Simple
Sequential
Single parameter
Single-value memory
Too rough
Delayed

Moving Average Smoother
Example with m = 11. A larger m gives a smoother, but straightened, line.
Sequential
Single parameter: the window size m
Memory for m values
Irregularly smooth
What if we have a p-dimensional problem with p > 1?

Nearest Neighbors Smoother
Example with m = 16 at a target point $x_0$. A larger m gives a smoother, but biased, line.
Not sequential
Single parameter: the number of neighbors m
Trivially extended to any number of dimensions
Memory for m values
Depends on the definition of the metric
Not smooth enough
Biased end-points

Low Pass Filter
Example: a 2nd-order Butterworth filter. Why do we need kernel smoothers?
The same filter applied to a log function.
Smooth
Simply extended to any number of dimensions
Effectively 3 parameters: type, order, and bandwidth
Biased end-points
Inappropriate for some functions (depends on bandwidth)

Kernel Average Smoother
Nadaraya-Watson kernel-weighted average at a target point $x_0$: $\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$
with the kernel: $K_\lambda(x_0, x) = D\left(\frac{|x - x_0|}{h(x_0)}\right)$
$h(x_0) = |x_0 - x_{[m]}|$ for the nearest neighbor smoother, where $x_{[m]}$ is the m-th closest sample to $x_0$
$h(x_0) = \lambda$ for the locally weighted average

Popular Kernels
Epanechnikov kernel: $D(t) = \frac{3}{4}(1 - t^2)$ for $|t| \le 1$, and 0 otherwise
Tri-cube kernel: $D(t) = (1 - |t|^3)^3$ for $|t| \le 1$, and 0 otherwise
Gaussian kernel: $D(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

Non-Symmetric Kernel
Kernel example: which kernel is that?
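The Nadaraya-Watson average and the three popular kernels can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the slides: the function names (`kernel_average`, `smooth`), the fixed window width of 0.2, and the noisy sine test signal are all assumptions made for the example. Only N = 100 comes from the slides.

```python
import numpy as np


def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, zero elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)


def tricube(t):
    """D(t) = (1 - |t|^3)^3 for |t| <= 1, zero elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, (1.0 - np.abs(t)**3)**3, 0.0)


def gaussian(t):
    """Standard normal density; it has infinite support."""
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)


def kernel_average(x0, x, y, width, D=epanechnikov):
    """Nadaraya-Watson estimate at x0 using a fixed window width."""
    w = D(np.abs(x - x0) / width)   # kernel weight of every sample
    total = w.sum()
    if total == 0.0:
        return float("nan")         # no sample falls inside the window
    return float(np.dot(w, y) / total)


def smooth(x, y, grid, width, D=epanechnikov):
    """Evaluate the kernel average at every point of `grid`."""
    return np.array([kernel_average(x0, x, y, width, D) for x0 in grid])


if __name__ == "__main__":
    # Hypothetical data: N = 100 noisy samples of a sine curve.
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, 100))
    y = np.sin(4.0 * x) + rng.normal(0.0, 0.3, x.size)
    grid = np.linspace(0.0, 1.0, 5)
    for D in (epanechnikov, tricube, gaussian):
        print(D.__name__, np.round(smooth(x, y, grid, 0.2, D), 3))
```

Replacing the fixed `width` with the distance from `x0` to its m-th closest sample would give the adaptive nearest-neighbor window. The estimates at the two ends of `grid` average over one-sided neighborhoods, which is where the end-point bias appears.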
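Kernel Average Smoother
Single parameter: window width
Smooth
Trivially extended to any number of dimensions
Memory-based method: little or no training is required
Depends on the definition of the metric
Biased end-points

Local Linear Regression
Kernel-weighted average minimizes: $\min_{\alpha(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0)]^2$
Local linear regression minimizes: $\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0) - \beta(x_0)\,x_i]^2$

Local Linear Regression
Solution: $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\,x_0 = b(x_0)^T \left(B^T W(x_0) B\right)^{-1} B^T W(x_0)\, y$
where: $b(x)^T = (1, x)$, $B$ is the $N \times 2$ matrix whose i-th row is $b(x_i)^T$, and $W(x_0)$ is the $N \times N$ diagonal matrix whose i-th diagonal element is $K_\lambda(x_0, x_i)$
Other representation: $\hat{f}(x_0) = \sum_{i=1}^{N} l_i(x_0)\, y_i$, where the weights $l_i(x_0)$ form the equivalent kernel

Equivalent Kernels
(Figure: equivalent kernels of local linear regression at a target point $x_0$.)

Local Polynomial Regression
Why stop at local linear fits? Let's minimize: $\min_{\alpha(x_0),\,\beta_j(x_0),\,j=1,\dots,d} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\left[y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\,x_i^j\right]^2$

Variance Compromise

Conclusions
Local linear fits can help bias dramatically at the boundaries at a modest cost in variance.
Local linear fits are more reliable for extrapolation.
Local quadratic fits do little at the boundaries for bias, but increase the variance a lot.
Local quadratic fits tend to be most helpful in reducing bias due to curvature in the interior of the domain.
The window width $\lambda$ controls the tradeoff between bias and variance. A larger $\lambda$ gives lower variance but higher bias.

Local Regression in $\mathbb{R}^p$
Radial kernel: $K_\lambda(x_0, x) = D\left(\frac{\|x - x_0\|}{\lambda}\right)$

Popular Kernels
Epanechnikov kernel
Tri-cube kernel
Gaussian kernel

Example

Higher Dimensions
The boundary estimation is problematic.
Many sample points are needed to reduce the bias.
Local regression is less useful for p > 3.
It's impossible to maintain localness (low bias) and sizeable samples (low variance) at the same time.

Structured Kernels
Non-radial kernel: $K_{\lambda, A}(x_0, x) = D\left(\frac{(x - x_0)^T A\,(x - x_0)}{\lambda}\right)$
Coordinates or directions can be downgraded or omitted by imposing restrictions on A.
The covariance can be used to adapt the metric A (related to the Mahalanobis distance).
Projection-pursuit model

Structured Regression
Divide into a set $(X_1, X_2, \dots, X_q)$ with q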