Prototype Methods and Nearest-Neighbors Methods
Next Class
Unsupervised Learning: Clustering Methods
Introduction
• Model-free methods
• Prototype methods
  • K-means clustering
  • Learning Vector Quantization (LVQ)
  • Gaussian mixtures
• k-Nearest-Neighbors classifiers
• Summary
Model-free Methods for Classification
• Highly unstructured
• Can be highly effective as black-box prediction engines
• Not useful for understanding the nature of the relationship between the features and the class outcomes
• Nearest-neighbor techniques also work well for regression in low dimensions
• In high-dimensional feature spaces, the bias-variance tradeoff does not work out as favorably for regression as it does for classification
Prototype Methods
• Class labels take values in {1, 2, …, K}
• Training data: N examples (x1, g1), (x2, g2), …, (xN, gN), where x holds the features and g is the class label
• Prototype methods represent the training data by a set of points in the feature space
• Prototypes are typically not examples from the training sample, except in 1-NN classification
K-Means Clustering
• A method for finding clusters and cluster centers in a set of unlabeled data
• Choose a number of cluster centers (cluster representatives), say R
• The K-means procedure iteratively moves the centers to minimize the within-cluster distances
• Objects within a cluster are closer to each other than to objects in other clusters
• Given an initial set of centers (chosen at random), the K-means algorithm iterates:
  • For each center, identify the subset of training points (its cluster) that is closer to it than to any other center
  • The mean vector of the features for the data points in each cluster becomes the new center for that cluster
• Iterate until convergence (see the sketch below)
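The two-step iteration above can be written compactly. Below is a minimal sketch using NumPy; the function name, the random initialization from training points, and defaults such as n_iter are illustrative assumptions, not from the slides:

```python
import numpy as np

def kmeans(X, R, n_iter=100, seed=0):
    """Basic K-means: X is (N, p); returns R centers and cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=R, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: the mean vector of each cluster becomes its new center
        new_centers = np.array([X[labels == r].mean(axis=0) if np.any(labels == r)
                                else centers[r] for r in range(R)])
        if np.allclose(new_centers, centers):  # iterate until convergence
            break
        centers = new_centers
    return centers, labels
```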
K-means Clustering in Classification
• Apply K-means clustering to the training data from each class separately, with R prototypes per class
• Assign a class label to each of the K × R prototypes
• Classify a new example x to the class of the closest prototype (see the sketch below)
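A sketch of this classification scheme, assuming the kmeans function from the previous sketch is in scope; names such as fit_prototypes are illustrative:

```python
import numpy as np

def fit_prototypes(X, g, R):
    """Run K-means separately per class; label each prototype with its class."""
    protos, labels = [], []
    for k in np.unique(g):
        centers, _ = kmeans(X[g == k], R)   # R prototypes for class k
        protos.append(centers)
        labels.append(np.full(R, k))
    return np.vstack(protos), np.concatenate(labels)  # K x R labeled prototypes

def classify(x0, protos, proto_labels):
    """Assign x0 to the class of the closest prototype."""
    return proto_labels[np.linalg.norm(protos - x0, axis=1).argmin()]
```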
Learning Vector Quantization (LVQ)
• LVQ: Kohonen (1989)
• Prototypes are placed strategically with respect to the decision boundaries, in an ad-hoc way
• Online algorithm: training points attract prototypes of the correct class and repel other prototypes
• As with online neural nets, Robbins-Monro stochastic approximation is used: the learning rate decreases for later examples
• Several algorithms exist
• Drawback: LVQ is defined by algorithms rather than by optimization of a fixed criterion
LVQ1 Algorithm
• Choose R initial prototypes for each class: mj(k), j = 1,…,R; k = 1,…,K
• Sample a training point xi at random, and let (j,k) index the closest prototype mj(k) to xi
• If the class label of xi equals k, move the prototype closer to the sampled training point:
  mj(k) ← mj(k) + ε (xi − mj(k)), where ε is the learning rate
• If the class label of xi is not k, move the prototype away from the training point:
  mj(k) ← mj(k) − ε (xi − mj(k))
• Repeat the above steps, decreasing the learning rate ε toward zero with each iteration (see the sketch below)
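A sketch of LVQ1 with the two update rules above; the linear learning-rate decay and the defaults are assumptions for illustration:

```python
import numpy as np

def lvq1(X, g, protos, proto_labels, eps0=0.1, n_iter=10000, seed=0):
    """LVQ1: prototypes are attracted by training points of their own class
    and repelled by training points of other classes."""
    rng = np.random.default_rng(seed)
    protos = protos.astype(float).copy()
    for t in range(n_iter):
        eps = eps0 * (1 - t / n_iter)       # learning rate decays toward zero
        i = rng.integers(len(X))            # sample a training point at random
        j = np.linalg.norm(protos - X[i], axis=1).argmin()  # closest prototype
        if proto_labels[j] == g[i]:
            protos[j] += eps * (X[i] - protos[j])   # move toward the point
        else:
            protos[j] -= eps * (X[i] - protos[j])   # move away from the point
    return protos
```

In practice the prototypes are often initialized with the K-means solution for each class, as the next slide illustrates.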
LVQ for Simulated Gaussian Mixtures Data
• Using the K-means solution as initial prototypes for each class
• The prototypes have tended to move away from the decision boundaries, and away from the prototypes of the other classes
MDA: Mixtures of Gaussians
• MDA (mixture discriminant analysis) can also be considered a prototype method
• Each cluster within a class is described by a Gaussian density, with a centroid and a covariance matrix (possibly scalar)
• The EM algorithm is used to estimate the prototypes, based on likelihoods
MDA: EM Algorithm
• E-step: each observation is assigned a weight (responsibility) for each cluster, based on the likelihood under the corresponding Gaussian
• An observation close to the center of a cluster will most likely get weight 1 for that cluster and weight 0 for the other clusters
• Observations halfway between two clusters divide their weight accordingly
• M-step: each example contributes to the weighted mean and covariance of every cluster
• Soft-clustering method: an example may be assigned to more than one cluster with some weight (see the sketch below)
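A minimal sketch of the E- and M-steps for one mixture with spherical (scalar-covariance) Gaussian components; the initialization and the variance floor are assumptions:

```python
import numpy as np

def em_spherical_mixture(X, R, n_iter=50, seed=0):
    """EM for a mixture of R spherical Gaussians (scalar covariances)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    mu = X[rng.choice(N, size=R, replace=False)]   # centroids
    var = np.full(R, X.var())                      # scalar variance per cluster
    pi = np.full(R, 1.0 / R)                       # mixing proportions
    for _ in range(n_iter):
        # E-step: responsibilities, proportional to each Gaussian's likelihood
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        dens = pi * np.exp(-0.5 * d2 / var) / (2 * np.pi * var) ** (p / 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: every observation contributes to every cluster's
        # weighted mean and variance
        w = resp.sum(axis=0)
        mu = (resp.T @ X) / w[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = np.maximum((resp * d2).sum(axis=0) / (p * w), 1e-12)
        pi = w / N
    return mu, var, pi, resp
```

For MDA this fit is carried out within each class, and the returned resp matrix holds the soft cluster assignments described above.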
K-means and MDA
k-Nearest Neighbors Classifier
• Memory-based; no model fitting required
• Given a query point x0, find the k training points x(r), r = 1,…,k, closest to x0 in a specified distance (metric)
• Classify according to the majority vote among these k nearest neighbors
• Ties are broken at random
• Simplicity may dictate using Euclidean distance between points in feature space
• First standardize each of the features to have mean zero and variance 1, so that all features contribute equally to the metric
• Euclidean distance is not invariant to scale and rotation changes (see the sketch below)
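A sketch of the classifier just described, with standardization and random tie-breaking; the helper name and defaults are illustrative:

```python
import numpy as np
from collections import Counter

def knn_classify(x0, X, g, k=5, seed=0):
    """k-NN with Euclidean distance on standardized features;
    g is an array of class labels."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs, x0s = (X - mu) / sd, (x0 - mu) / sd   # mean 0, variance 1 per feature
    idx = np.argsort(np.linalg.norm(Xs - x0s, axis=1))[:k]  # k nearest neighbors
    votes = Counter(g[idx])                   # majority vote among the k labels
    top = max(votes.values())
    winners = [c for c, v in votes.items() if v == top]
    return np.random.default_rng(seed).choice(winners)  # ties broken at random
```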
Example
• As k increases, bias increases and variance decreases
• Cover and Hart (1967): asymptotically, the error rate of the 1-nearest-neighbor classifier is never more than twice the Bayes error rate (stated in full below)
• This assumes no bias: fixed dimension and an asymptotically space-filling training set
• Details in the book
• Invariance under rotations could be important
• Other metrics may sometimes be better
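A sketch of the standard form of the Cover and Hart bound, with E* the Bayes error rate, E_1NN the asymptotic 1-NN error rate, and K the number of classes:

```latex
E^{*} \;\le\; E_{1\mathrm{NN}} \;\le\; E^{*}\left(2 - \tfrac{K}{K-1}\,E^{*}\right) \;\le\; 2E^{*}
```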
k-Nearest Neighbors
k-Nearest Neighbors: Choice of k
Example: A Comparative Study
• Two problems, each with ten independent features distributed U[0,1]
• Easy: one relevant feature, Y = I(X1 > 1/2); the other 9 features are noise
• Difficult: Y = I( sign{ ∏_{j=1}^{3} (Xj − 1/2) } > 0 ); the other 7 features are noise
• Performance can vary by problem
• With the best choice of tuning parameters, K-means and LVQ outperform nearest neighbors on the first problem, but perform similarly on the second (see the sketch below)
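A sketch of the simulation setup for the two problems; the function name and sample size are illustrative assumptions:

```python
import numpy as np

def simulate(problem, N=200, seed=0):
    """One realization of the 'easy' or 'difficult' problem:
    ten independent U[0,1] features."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(N, 10))
    if problem == "easy":
        y = (X[:, 0] > 0.5).astype(int)   # depends on X1 only; 9 noise features
    else:
        # sign of the product of the first three centered features; 7 noise features
        y = (np.sign(np.prod(X[:, :3] - 0.5, axis=1)) > 0).astype(int)
    return X, y
```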
Comparison of K-means, LVQ and Nearest Neighbors
• Blue (k-means)
• Red (LVQ)
• Means ± one sd of misclassification errors
• Ten realizations for each problem
Choice of Distance Measure?
• N-N classification assumes that the class probabilities are roughly constant in a neighborhood
• Thus simple averages give good estimates
• Here the class probabilities vary in the horizontal direction
• If we knew this, we could define neighborhoods differently, and hence reduce the bias of our method
Comparison of Various N-N Methods