Prototype Methods and Nearest-Neighbors Methods
Next Class
Unsupervised Learning: Clustering Methods
Introduction
• Model-free methods
• Prototype methods
  • K-means clustering
  • Learning Vector Quantization (LVQ)
  • Gaussian mixtures
• k-Nearest-Neighbors classifiers
• Summary
Model-free Methods for Classification
• Highly unstructured
• Can be highly effective as black-box prediction engines
• Not useful for understanding the nature of the relationship between the features and the class outcomes
• Nearest-neighbor techniques also work well for regression in low dimensions
• In high-dimensional feature spaces, the bias-variance tradeoff does not work out as favorably for regression as it does for classification
Prototype Methods
• Class labels take values in {1, 2, …, K}
• Training data: N examples (x1, g1), (x2, g2), …, (xN, gN), where x holds the features and g is the class label
• Prototype methods represent the training data by a set of points in the feature space
• Prototypes are typically not examples from the training sample, except in 1-NN classification
K-Means Clustering
• A method for finding clusters and cluster centers in a set of unlabeled data
• Choose a number of cluster centers (cluster representatives), say R
• The K-means procedure iteratively moves the centers to minimize the within-cluster distances
• Objects within a cluster are closer to each other than to objects in other clusters
• Given an initial set of centers (chosen at random), the K-means algorithm iterates:
  • For each center, identify the subset of training points (its cluster) that is closer to it than to any other center
  • The mean vector of the features for the data points in each cluster becomes the new center for that cluster
• Iterate until convergence (see the sketch below)
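The two-step iteration above can be written compactly. Below is a minimal sketch using NumPy; the function name, the random initialization from training points, and defaults such as n_iter are illustrative assumptions, not from the slides:

```python
import numpy as np

def kmeans(X, R, n_iter=100, seed=0):
    """Basic K-means: X is (N, p); returns R centers and cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=R, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: the mean vector of each cluster becomes its new center
        new_centers = np.array([X[labels == r].mean(axis=0) if np.any(labels == r)
                                else centers[r] for r in range(R)])
        if np.allclose(new_centers, centers):  # iterate until convergence
            break
        centers = new_centers
    return centers, labels
```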
K-means Clustering in Classification
• Apply K-means clustering to the training data from each class separately, with R prototypes per class
• Assign a class label to each of the K × R prototypes
• Classify a new example x to the class of the closest prototype (see the sketch below)
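A sketch of this classification scheme, assuming the kmeans function from the previous sketch is in scope; names such as fit_prototypes are illustrative:

```python
import numpy as np

def fit_prototypes(X, g, R):
    """Run K-means separately per class; label each prototype with its class."""
    protos, labels = [], []
    for k in np.unique(g):
        centers, _ = kmeans(X[g == k], R)   # R prototypes for class k
        protos.append(centers)
        labels.append(np.full(R, k))
    return np.vstack(protos), np.concatenate(labels)  # K x R labeled prototypes

def classify(x0, protos, proto_labels):
    """Assign x0 to the class of the closest prototype."""
    return proto_labels[np.linalg.norm(protos - x0, axis=1).argmin()]
```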
Learning Vector Quantization (LVQ)
• LVQ: Kohonen (1989)
• Prototypes are placed strategically with respect to the decision boundaries, in an ad-hoc way
• Online algorithm: training points attract prototypes of the correct class and repel other prototypes
• As with online neural nets, Robbins-Monro stochastic approximation is used: the learning rate decreases for later examples
• Several algorithms exist
• Drawback: LVQ is defined by algorithms rather than by optimization of a fixed criterion
LVQ1 Algorithm
• Choose R initial prototypes for each class: mj(k), j = 1,…,R; k = 1,…,K
• Sample a training point xi at random, and let (j,k) index the closest prototype mj(k) to xi
• If the class label of xi equals k, move the prototype closer to the sampled training point:
  mj(k) ← mj(k) + ε (xi − mj(k)), where ε is the learning rate
• If the class label of xi is not k, move the prototype away from the training point:
  mj(k) ← mj(k) − ε (xi − mj(k))
• Repeat the above steps, decreasing the learning rate ε toward zero with each iteration (see the sketch below)
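A sketch of LVQ1 with the two update rules above; the linear learning-rate decay and the defaults are assumptions for illustration:

```python
import numpy as np

def lvq1(X, g, protos, proto_labels, eps0=0.1, n_iter=10000, seed=0):
    """LVQ1: prototypes are attracted by training points of their own class
    and repelled by training points of other classes."""
    rng = np.random.default_rng(seed)
    protos = protos.astype(float).copy()
    for t in range(n_iter):
        eps = eps0 * (1 - t / n_iter)       # learning rate decays toward zero
        i = rng.integers(len(X))            # sample a training point at random
        j = np.linalg.norm(protos - X[i], axis=1).argmin()  # closest prototype
        if proto_labels[j] == g[i]:
            protos[j] += eps * (X[i] - protos[j])   # move toward the point
        else:
            protos[j] -= eps * (X[i] - protos[j])   # move away from the point
    return protos
```

In practice the prototypes are often initialized with the K-means solution for each class, as the next slide illustrates.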
LVQ for Simulated Gaussian Mixtures Data
• Using the K-means solution as initial prototypes for each class
• The prototypes have tended to move away from the decision boundaries, and away from the prototypes of the other classes
MDA: Mixtures of Gaussians
• MDA (mixture discriminant analysis) can also be considered a prototype method
• Each cluster within a class is described by a Gaussian density, with a centroid and a covariance matrix (possibly scalar)
• The EM algorithm is used to estimate the prototypes, based on likelihoods
MDA: EM Algorithm
• E-step: each observation is assigned a weight (responsibility) for each cluster, based on the likelihood under the corresponding Gaussian
• An observation close to the center of a cluster will most likely get weight 1 for that cluster and weight 0 for the other clusters
• Observations halfway between two clusters divide their weight accordingly
• M-step: each example contributes to the weighted mean and covariance of every cluster
• Soft-clustering method: an example may be assigned to more than one cluster with some weight (see the sketch below)
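A minimal sketch of the E- and M-steps for one mixture with spherical (scalar-covariance) Gaussian components; the initialization and the variance floor are assumptions:

```python
import numpy as np

def em_spherical_mixture(X, R, n_iter=50, seed=0):
    """EM for a mixture of R spherical Gaussians (scalar covariances)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    mu = X[rng.choice(N, size=R, replace=False)]   # centroids
    var = np.full(R, X.var())                      # scalar variance per cluster
    pi = np.full(R, 1.0 / R)                       # mixing proportions
    for _ in range(n_iter):
        # E-step: responsibilities, proportional to each Gaussian's likelihood
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        dens = pi * np.exp(-0.5 * d2 / var) / (2 * np.pi * var) ** (p / 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: every observation contributes to every cluster's
        # weighted mean and variance
        w = resp.sum(axis=0)
        mu = (resp.T @ X) / w[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = np.maximum((resp * d2).sum(axis=0) / (p * w), 1e-12)
        pi = w / N
    return mu, var, pi, resp
```

For MDA this fit is carried out within each class, and the returned resp matrix holds the soft cluster assignments described above.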
K-means and MDA
k-Nearest Neighbors Classifier
• Memory-based; no model fitting required
• Given a query point x0, find the k training points x(r), r = 1,…,k, closest to x0 in a specified distance (metric)
• Classify according to the majority vote among these k nearest neighbors
• Ties are broken at random
• Simplicity may dictate using Euclidean distance between points in feature space
• First standardize each of the features to have mean zero and variance 1, so that all features contribute equally to the metric
• Euclidean distance is not invariant to scale and rotation changes (see the sketch below)
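A sketch of the classifier just described, with standardization and random tie-breaking; the helper name and defaults are illustrative:

```python
import numpy as np
from collections import Counter

def knn_classify(x0, X, g, k=5, seed=0):
    """k-NN with Euclidean distance on standardized features;
    g is an array of class labels."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs, x0s = (X - mu) / sd, (x0 - mu) / sd   # mean 0, variance 1 per feature
    idx = np.argsort(np.linalg.norm(Xs - x0s, axis=1))[:k]  # k nearest neighbors
    votes = Counter(g[idx])                   # majority vote among the k labels
    top = max(votes.values())
    winners = [c for c, v in votes.items() if v == top]
    return np.random.default_rng(seed).choice(winners)  # ties broken at random
```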
Example
• As k increases, bias increases and variance decreases
• Cover and Hart (1967): asymptotically, the error rate of the 1-nearest-neighbor classifier is never more than twice the Bayes error rate (stated in full below)
• This assumes no bias: fixed dimension and an asymptotically space-filling training set
• Details in the book
• Invariance under rotations could be important
• Other metrics may sometimes be better
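A sketch of the standard form of the Cover and Hart bound, with E* the Bayes error rate, E_1NN the asymptotic 1-NN error rate, and K the number of classes:

```latex
E^{*} \;\le\; E_{1\mathrm{NN}} \;\le\; E^{*}\left(2 - \tfrac{K}{K-1}\,E^{*}\right) \;\le\; 2E^{*}
```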
k-Nearest Neighbors
k-Nearest Neighbors: Choice of k
Example: A Comparative Study
• Two problems, each with ten independent features distributed U[0,1]
• Easy: one relevant feature, Y = I(X1 > 1/2); the other 9 features are noise
• Difficult: Y = I( sign{ ∏_{j=1}^{3} (Xj − 1/2) } > 0 ); the other 7 features are noise
• Performance can vary by problem
• With the best choice of tuning parameters, K-means and LVQ outperform nearest neighbors on the first problem, but perform similarly on the second (see the sketch below)
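A sketch of the simulation setup for the two problems; the function name and sample size are illustrative assumptions:

```python
import numpy as np

def simulate(problem, N=200, seed=0):
    """One realization of the 'easy' or 'difficult' problem:
    ten independent U[0,1] features."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(N, 10))
    if problem == "easy":
        y = (X[:, 0] > 0.5).astype(int)   # depends on X1 only; 9 noise features
    else:
        # sign of the product of the first three centered features; 7 noise features
        y = (np.sign(np.prod(X[:, :3] - 0.5, axis=1)) > 0).astype(int)
    return X, y
```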
Comparison of K-means, LVQ and Nearest Neighbors
• Blue (k-means)
• Red (LVQ)
• Means ± one sd of misclassification errors
• Ten realizations for each problem
Choice of Distance Measure?
• N-N classification assumes that the class probabilities are roughly constant in a neighborhood
• Thus simple averages give good estimates
• Here the class probabilities vary in the horizontal direction
• If we knew this, we could define neighborhoods differently, and hence reduce the bias of our method
Comparison of Various N-N Methods