Nearest Neighbour and Clustering
Nearest Neighbour and Clustering

Nearest Neighbor vs. Clustering:

- Nearest neighbor: used for prediction as well as consolidation.
  Clustering: used mostly for consolidating data into a high-level view and general grouping of records into like behaviors.
- Nearest neighbor: the space is defined by the problem to be solved (supervised learning).
  Clustering: the space is a default n-dimensional space, is defined by the user, or is a predefined space driven by past experience (unsupervised learning).
- Nearest neighbor: generally uses only distance metrics to determine nearness.
  Clustering: can use metrics other than distance to determine the nearness of two records, for example linking two points together.
K Nearest Neighbors

Advantages:
- Nonparametric architecture
- Simple
- Powerful
- Requires no training time

Disadvantages:
- Memory intensive
- Classification/estimation is slow
K Nearest Neighbors

The key issues involved in training this model include setting:
- the variable K (e.g., by validation techniques such as cross-validation)
- the type of distance metric, for example the Euclidean measure:
  $\mathrm{Dist}(X, Y) = \sqrt{\sum_{i=1}^{D} (X_i - Y_i)^2}$
(Figure: K Nearest Neighbors example. The plot shows the stored training-set patterns, the input pattern X to be classified, and the Euclidean distance measure to the nearest three patterns.)
- Store all input data in the training set.
- For each pattern in the test set:
  - Search for the K nearest patterns to the input pattern using a Euclidean distance measure.
  - For classification, compute the confidence for each class as Ci/K, where Ci is the number of patterns among the K nearest patterns belonging to class i.
  - The classification for the input pattern is the class with the highest confidence.
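As an illustration (not part of the original slides), here is a minimal Python sketch of this procedure; the toy arrays train_X and train_y and the helper knn_classify are invented for the example:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=3):
    """Return (predicted class, confidence) for input pattern x using
    Euclidean distance to the k nearest training patterns."""
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))   # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of the K nearest patterns
    classes, counts = np.unique(train_y[nearest], return_counts=True)
    best = np.argmax(counts)
    return classes[best], counts[best] / k               # confidence = Ci / K

# Toy data: two classes in 2-D.
train_X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.2, 4.8]])
train_y = np.array([0, 0, 1, 1])
print(knn_classify(train_X, train_y, np.array([1.1, 1.0]), k=3))  # class 0 with confidence 2/3
```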
Training parameters and typical settings: number of nearest neighbors

- The number of nearest neighbors (K) should be chosen based on cross-validation over a number of K settings.
- k = 1 is a good baseline model to benchmark against.
- A good rule of thumb is that k should be less than the square root of the total number of training patterns.
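A quick way to pick K by cross-validation (a sketch assuming scikit-learn is available; load_digits is used here only as a small stand-in dataset):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)        # small handwritten-digit dataset
max_k = int(np.sqrt(len(X)))               # rule of thumb: k below sqrt(#training patterns)

for k in [1, 3, 5, 7, 11, max_k]:
    clf = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation accuracy
    print(f"k={k:3d}  mean accuracy={scores.mean():.3f}")
```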
Training parameters and typical settings: input compression

- Since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification.
- Using input compression will usually result in slightly worse performance.
- Sometimes compression improves performance, because it performs an automatic normalization of the data that can equalize the effect of each input in the Euclidean distance measure.
Nearest Neighbour and Clustering

- Among the oldest techniques used in data mining.
- Like records are grouped or clustered together and put into the same grouping.
- The nearest neighbor prediction technique is quite close to clustering: to find a prediction value for one record, look for records with similar predictor values in the historical database.
- Use the prediction value of the record which is nearest to the unknown record.
- Example: a laundry uses clustering.
- In business, clusters are more dynamic: which cluster a record falls into may change daily or monthly, so it is difficult to decide.
- Another nearest neighbor example: the income group of one's neighbours.
- The best way to predict an unknown person's income is possibly to choose the closest persons.
- The nearest neighbour prediction algorithm works on a database in very much the same way.
- Many factors enter into the nearness condition: the person's location, school attended, degree attained, etc.
Business Score Card

- Measures critical to business success deal with ease of deployment and real-world problems: avoiding serious mistakes as well as achieving big successes.
- A data mining technique needs to be easy to use and deployable in as automated a fashion as possible.
- It should provide clear, understandable answers, and answers that can be converted into ROI.
BSC

- Automation: nearest neighbor methods are relatively automated, although some preprocessing is performed in converting predictors into values that can be used in a measure of distance.
- Unordered categorical predictors (e.g., eye color) need to be defined in terms of their distance from each other and when there is a match (whether blue is close to brown).
- Clarity: excellent for clear explanation of why a prediction was made. A single example, or a set of examples, can be extracted from the historical database as evidence for why a prediction should or should not be made.
- The system can also communicate when it is not confident of its prediction.
- ROI: since the individual records of the nearest neighbors are returned directly, without altering the database, it is possible to understand all facets of business behavior and thus derive a more complete estimate of the ROI, not just from the prediction but from a variety of different factors.
Where to use clustering and nearest neighbor prediction

- Applications range from personal bankruptcy prediction to computer recognition of human handwriting.
- Clustering for clarity: like records are grouped together, giving a high-level view of what is going on in the database.
- Clustering as segmentation gives a bird's-eye view of the business.
- Commercial offerings: PRIZM and Microvision.
- These offerings group the population into segments by demographic information.
- The clustering information is then used by the end user to tag the customers in the database.
- The business user gets a high-level view of what is happening within each cluster.
- Having worked with these clusters, users will know more about how their customers react.
Clustering for outlier analysis

- Clustering is done to the point where some records stick out as outliers.
- Example: profit across stores or departments.
Nearest Neighbour for Prediction

- One particular object can be closer to another object than to a third object.
- People have an innate sense of ordering on a variety of objects: an apple is closer to an orange than to a tomato; a Toyota Corolla is closer to a Honda Civic than to a Porsche.
- This sense of ordering places objects in time and space and makes sense in the real world.
- This definition of nearness, which seems to be ubiquitous, also allows us to make predictions.
- The nearest neighbor prediction algorithm can be simply stated: objects that are near to each other will have similar prediction values as well. Thus if you know the prediction value of one of the objects, you can predict it for its nearest neighbors.
- Classic nearest neighbor example: text retrieval. Define a document, then look for more such documents.
- Nearest neighbor looks for the important characteristics shared with documents that have been marked as interesting.
- The technique can be used in a wide variety of places; successful use depends on preformatting the data so that nearness can be calculated and individual records can be defined.
- This is easy for text retrieval but not for time-series data such as stock prices, where there is no inherent order.
Application Score Card

- Rules are seldom used for prediction here; they are used for unsupervised learning.
- Clusters: the underlying prediction method for nearest neighbor technology is nearness in some feature space. This is the same underlying metric used for most clustering algorithms, although for nearest neighbor the feature space is shaped in such a way as to facilitate a particular prediction.
- Links: nearest neighbor techniques can be used for link analysis as long as the data is preformatted so that the predictor values to be linked fall within the same record.
- Outliers: nearest neighbor techniques are particularly good at detecting outliers, since they effectively create a space within which it is possible to determine when a record is out of place.
- Rules: one strength of nearest neighbor techniques is that they take all the predictors into account to some degree, which is helpful for prediction but makes for a complex model that cannot easily be described as a rule. The systems are also generally optimized for prediction of new records rather than for exhaustive extraction of interesting rules from the database.
- Sequences: nearest neighbor techniques have been used successfully to make predictions on time sequences; the time values need to be encoded in the records.
- Text: most text retrieval systems are based on nearest neighbor technology, and most of the remaining breakthroughs come from further refinements of the predictor-weighting algorithms and the distance calculations.
General Idea

- Nearest neighbor is a refinement of clustering in the sense that both use distance in some feature space to create either structure in the data or predictions.
- Nearest neighbor is a way of automatically determining the weighting of the importance of the predictors and how the distance will be measured within the feature space.
- Clustering is the special case in which the importance of each predictor is considered to be equivalent.
- Example: taking a set of people and clustering friends.
- There is no single best way to cluster. Is clustering on financial status better than on eye color or on food habits?
- If clustering has no specific purpose and is done just to group the data, probably all of them are OK.
- The reasons for clustering are often ill defined, as clusters are used more often for exploration and summarization than for prediction.
How are trade-offs made when determining which records fall into which clusters?

- Example: aged vs. young, classical vs. rock.
- When clustering a large number of records, these trade-offs are explicitly defined by the clustering algorithm.
Difference between clustering and nearest neighbor

- Main distinction: clustering is an unsupervised learning technique, while nearest neighbor prediction is a supervised learning technique.
- Unsupervised: there is no particular reason (target) driving the creation of the models.
- Supervised: the models are built for prediction, so the patterns presented are what matter most.
How is the space for clustering and nearest neighbor defined?

- Clustering: an n-dimensional space, assigning one predictor to each dimension.
- Nearest neighbor: predictors are also mapped to dimensions, but those dimensions are literally stretched or compressed according to how important the particular predictor is in making the prediction.
- Stretching a dimension makes it more important than the others.
- The distance between a cluster and a given data point is measured from the center of mass of the cluster.
- The center of mass of the cluster can be calculated as the average of the predictor values.
- Clusters can be defined solely by their center, or by their center with some radius attached, in which case all points that fall within the radius are classified into that cluster.
- The center record is a prototypical record.
- Normal database records are mapped onto the n-dimensional space; 2 or 3 dimensions are easy to visualize, more dimensions become complex.
How is nearness defined?

- Clustering and nearest neighbor both work with an n-dimensional space, with one record being close to or far from another record.
- Nearness can be determined naively: any record in the historical database that is exactly the same as the record to be predicted is considered close, and anything else is far away.
Difficulty with this strategy:

- It is unlikely that exact matches of records exist in the database, and a perfectly matching record may be spurious.
- Better results are obtained by taking a vote among several nearby records.
Two other distances:

- Manhattan distance: adds up the differences between each predictor of the historical record and the record to be predicted.
- Euclidean distance (Pythagorean): the distance between two points in n dimensions, obtained by squaring the differences of the predictor values for the two records, summing them, and taking the square root of the sum.
Distance between records xyz and abc:
- Age: 6
- Salary: 3100
- Color of eyes: 0
- Gender: 1
- Income: 1 (high = 3, medium = 2, low = 1)
- Total difference = 3108
- The difference is dominated by salary; whether the other predictors match or not hardly matters.
- To balance the predictors, use normalized values, e.g. on a scale from 0 to 100.
- The maximum difference between salaries in the data set is 16543; between xyz and abc it is 3100, which is 19% of the maximum.
- Normalized total difference: 6 + 19 + 0 + 100 + 100 = 225.
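A sketch of this normalization step (the per-predictor maximum differences below are assumed values chosen so that the result matches the slide's 6 + 19 + 0 + 100 + 100 = 225):

```python
def normalized_diffs(rec_a, rec_b, max_diff):
    """Express each predictor difference as a percentage (0-100) of the
    largest difference observed for that predictor in the data set."""
    return [100.0 * abs(a - b) / m if m else 0.0
            for a, b, m in zip(rec_a, rec_b, max_diff)]

# Hypothetical xyz/abc records: [age, salary, eye-colour code, gender code, income code]
xyz      = [40, 61543, 1, 1, 3]
abc      = [34, 58443, 1, 0, 2]
max_diff = [100, 16543, 1, 1, 1]   # assumed per-predictor maximum differences

diffs = normalized_diffs(xyz, abc, max_diff)
print([round(d) for d in diffs], "total =", round(sum(diffs)))  # [6, 19, 0, 100, 100] total = 225
```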
Weighting the dimensions: distance with a purpose

- When a very high income record (Mukesh Ambani) is added, an outlier is created when clustering on age and income; normalizing does not help in this case.
- When "near" is defined, how important is each dimension's contribution? Answer: it depends on what is to be accomplished.
Calculating dimension weights

- There are several automatic ways of calculating the importance of different dimensions.
- Example: in document classification and prediction, the dimensions of the space are often the individual words contained in the documents.
- Example: whether the word "entrepreneur" occurs or not. A word such as "the" occurs many times and is of little significance; the earlier word is significant.
Weights:

1. Inverse frequency is often used: if "the" occurred in 10000 documents, its weight is 1/10000 = 0.0001; if "entrepreneur" occurred in 100 documents, its weight is 1/100 = 0.01.
2. The importance of the word for the topic to be predicted: if the topic is starting a small business, words such as "entrepreneur" and "venture capital" will be given higher weights.
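A sketch of inverse-frequency weighting (the tiny document collection is invented for illustration):

```python
from collections import Counter

docs = [
    "the entrepreneur met the venture capital firm",
    "the weather report for the weekend",
    "the entrepreneur starting a small business",
    "the quarterly report of the business",
]

# Document frequency: in how many documents does each word occur?
df = Counter()
for doc in docs:
    df.update(set(doc.split()))

# Inverse-frequency weight: rare words get large weights, common words small ones.
weights = {word: 1.0 / count for word, count in df.items()}
print(weights["the"])           # occurs in all 4 documents -> 0.25
print(weights["entrepreneur"])  # occurs in 2 documents     -> 0.5
```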
- Data mining on documents is a special situation: many dimensions, and all dimensions are binary.
- Other business problems mix binary (gender), categorical (eye color) and numeric (revenue) dimensions.
- Each dimension is weighted depending on its relevance to the topic to be predicted; the weight can be calculated as the correlation between the predictor and the prediction value.
- Or as the conditional probability that the prediction has a certain value given that the predictor has a certain value.
- Dimension weights can also be calculated via algorithmic search: random weights are tried initially and then slowly modified to improve the accuracy of the system.
Hierarchical and nonhierarchical clustering

- Hierarchical clustering builds from small clusters up to big ones; it is unsupervised learning.
- Depending on the application, fewer or more clusters may be desired.
- Extreme case: as many clusters as there are records. In that case the records are optimally similar within a cluster (each cluster holds only one) but different from the other clusters.
- Such a clustering probably cannot find useful patterns: there is no summary information and the data is not understood any better.
- Fewer clusters than the original number of records is better.
- Advantage of hierarchical clustering: it allows end users to choose from either many clusters or only a few.
- Hierarchical clustering can be viewed as a tree: smaller clusters merge to create the next-highest level of clusters, which merge again at that level, and so on.
- The user can decide what number of clusters adequately summarizes the data while still providing useful information.
- A single cluster gives a great summarization but does not provide any specific information.
Two algorithms for hierarchical clustering:

1. Agglomerative: agglomerative techniques start with as many clusters as there are records, each cluster containing one record. The clusters nearest to each other are merged, and this continues until there is a single cluster containing all records at the top of the hierarchy.
2. Divisive: divisive techniques take the opposite approach. They start with all records in one cluster, split it into smaller pieces, and then try to split those further.
- Nonhierarchical clustering is faster to create from the historical database, but the user must make decisions about the number of clusters desired or the nearness required, often running the algorithm multiple times.
- It either starts with an arbitrary clustering and iteratively improves it by shuffling records, or it creates clusters by taking one record at a time according to the chosen criteria.
Nonhierarchical clustering

Two nonhierarchical approaches:
1. Single-pass methods: the database is passed through only once in order to create the clusters.
2. Reallocation methods: records are moved or reallocated from one cluster to another to create better clusters, which requires multiple passes through the database. Both remain fast compared to hierarchical clustering.
Algorithm for the single-pass technique:
- Read a record from the database and determine the cluster it best fits into (by a measure of nearness).
- If even the nearest cluster is still far away, start a new cluster with this record.
- Read the next record and repeat.
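A minimal sketch of such a single-pass ("leader") procedure; the threshold and the toy data are arbitrary choices for illustration:

```python
import numpy as np

def single_pass_cluster(records, threshold):
    """Assign each record to the nearest existing cluster centre,
    or start a new cluster if even the nearest centre is too far away."""
    centres, members = [], []
    for rec in records:
        if centres:
            d = [np.linalg.norm(rec - c) for c in centres]
            i = int(np.argmin(d))
            if d[i] <= threshold:                         # close enough: join this cluster
                members[i].append(rec)
                centres[i] = np.mean(members[i], axis=0)  # update the cluster centre
                continue
        centres.append(rec.astype(float))                 # nearest cluster still far away:
        members.append([rec])                             # new cluster with this record
    return centres, members

data = np.array([[1, 1], [1.2, 0.8], [8, 8], [8.1, 7.9], [1.1, 1.1]], dtype=float)
centres, members = single_pass_cluster(data, threshold=2.0)
print(len(centres), "clusters")   # -> 2 clusters
```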
- Reading records is expensive, so the single-pass method scores better on speed.
- Problem: it can produce overly large clusters, because decisions made early are never revisited and the sequence in which records are processed matters.
- Reallocation solves this problem by readjusting the clusters, optimizing similarity.
Algorithm for reallocation:
1. Preset the number of clusters desired.
2. Randomly pick a record to become the center, or seed, of each of these clusters.
3. Go through the database and assign each record to the nearest cluster.
4. Recalculate the centers of the clusters.
5. Repeat steps 3 and 4 until there is a minimum of reallocation between clusters.
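This is essentially the k-means idea; a minimal sketch (the toy data, the fixed random seed and the iteration cap are illustrative choices):

```python
import numpy as np

def reallocation_cluster(records, n_clusters, n_iter=20, seed=0):
    """Seed the clusters with random records, then repeat: assign every
    record to its nearest centre and recalculate the centres."""
    rng = np.random.default_rng(seed)
    centres = records[rng.choice(len(records), n_clusters, replace=False)]
    for _ in range(n_iter):
        # step 3: assign each record to the nearest cluster centre
        d = np.linalg.norm(records[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 4: recalculate the centres of the clusters
        new_centres = np.array([records[labels == c].mean(axis=0)
                                if np.any(labels == c) else centres[c]
                                for c in range(n_clusters)])
        if np.allclose(new_centres, centres):   # minimal reallocation: stop
            break
        centres = new_centres
    return labels, centres

data = np.array([[1, 1], [1.2, 0.9], [0.8, 1.1], [8, 8], [7.9, 8.2], [8.3, 7.8]], float)
labels, centres = reallocation_cluster(data, n_clusters=2)
print(labels)   # two groups of three records each
```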
- The records initially assigned may not be good fits, but by recalculating the centers, clusters that actually match better are formed.
- The centers move toward regions of high density and away from outliers.
- Predefining the number of clusters may be a worse idea than letting it be driven by the data.
There is no one right answer as to how clustering is to be done.
Hierarchical clustering (HC)

- Hierarchical clustering has an advantage over nonhierarchical clustering: the clusters are defined solely by the data (no predetermined number).
- The number of clusters can be increased or decreased by moving down or up the hierarchy.
- The hierarchy can be built either from the top, by dividing further, or from the bottom, by merging records at every level.
- Merges or splits are usually done two clusters at a time.

Agglomerative algorithm:
1. Start with as many clusters as there are records, with one record in each cluster.
2. Combine the two nearest clusters into a larger cluster.
3. Continue until only one cluster remains.
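A naive sketch of this agglomerative procedure (quadratic-time, single-link nearness; the toy points are invented):

```python
import numpy as np

def agglomerative(records, target_clusters=1):
    """Start with one record per cluster and repeatedly merge the two
    nearest clusters until target_clusters remain."""
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-link nearness: the closest pair of records between the clusters
                d = min(np.linalg.norm(records[i] - records[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]       # merge the two nearest clusters
        del clusters[b]
    return clusters

data = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], float)
print(agglomerative(data, target_clusters=3))   # -> [[0, 1], [2, 3], [4]]
```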
Divisive technique algorithm:
1. Start with one cluster that contains all the records in the database.
2. Determine the division of the existing cluster that best maximizes similarity within clusters and dissimilarity between clusters.
3. Divide the cluster and repeat on the two smaller clusters.
4. Stop when some minimum threshold of cluster size or total number of clusters has been reached, or when there is only one record left in the cluster.
- Divisive techniques are quite expensive to compute: they consider separating a cluster into every possible smaller cluster and pick the best split (e.g., minimum average distance).
- Agglomerative techniques are therefore preferred; the decisions to be made are which clusters to merge.
- Join the clusters whose resulting merged cluster has the minimum total distance between all records: Ward's method. It produces a symmetric hierarchy and is good at recovering cluster structure, but it is sensitive to outliers and has difficulty recovering elongated structures.
The merge decision can be made in several ways:

- Join the clusters whose nearest records are as near as possible: the single-link method. Because clusters can be joined on a single nearest pair of records, this technique can create long, snakelike clusters and is not good at extracting the classical spherical, compact clusters.
- Join the clusters whose most distant records are as near as possible: the complete-link method. All records are linked within some maximum distance; this favours compact clusters.
- Join the clusters where the average distance between all pairs of records is as small as possible: the group-average link method. It takes both the nearest and the most distant records into account, and the resulting clusters range from elongated single-link-style clusters to tight complete-link-style clusters.
Implementation of the KNN method for object recognition
Outline
Introduction.
Description of the problem.
Description of the method.
Image library.
Process of identification.
Example.
Future work.
Introduction

- Generally speaking, the problem of object recognition is how to teach a computer to recognize different objects in a picture.
- This is a nontrivial problem. Some of the main difficulties are separating an object from the background (especially in the presence of clutter or occlusions in the background) and the ability to recognize an object under different lighting.
Introduction

- In this research I am trying to improve the accuracy of object recognition by implementing the KNN method with a new weighted Hamming-Levenshtein distance that I developed.
Description of the problem

- The problem of object recognition can be divided into two parts: 1) locating an object in the picture; 2) identifying the object.
- For example, assume that we have the following picture:
Description of the problem.
Description of the problem

- ...and we have the following library of images that we will use for object identification:
Description of the problem

- Our goal is to identify and locate objects from our library in the picture.
Description of the problem

- In this research I have developed a method of object identification, assuming that we already know the location of the object; I am going to develop the location method in my future work.
Description of the method

- We will use the KNN method to identify objects.
- For example, assume we need to identify an object X in a given picture. Let us consider the space of pictures generated by the image of X and the images from our library.
Description of the method

- In this space we pick, say, the 5 images closest to X, and identify X by finding the plurality class of those nearest pictures.

(Figure: points X, A1, A2, A3, B1, B2, B3, C1, C2 in the picture space; the nearest neighbors of X are A1, B1, A2, B2, A3.)
Description of the method

- In order to use the KNN method we need to introduce a measure of similarity between two pictures.
- First of all, to say something about the similarity between pictures, we need some idea of the shape of the objects in these pictures. To do this we use an edge-detection method (the Sobel method, for example).
Description of the method

- Next, we turn the edge-detected picture into a bit array by thresholding intensities to 0 or 1. In fact, we are going to keep the images in the library in this form.
Description of the method

- Now, in order to compare two pictures, we need to compare two 2-dimensional bit arrays.
- It may seem natural to use the traditional Hamming distance for bitstrings, which is defined as follows: given two bitstrings of the same dimension, the Hamming distance is the minimum number of symbol changes needed to change one bitmap into the other.
Description of the method

- For example, the Hamming distance between (A) 10001001 and (B) 11100000 is 4.
- Notice that the Hamming distance between (A) 10001001 and (C) 10010010 is also 4, but intuitively one can regard (C) as a better match for (A) than (B).
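A two-line Hamming distance that reproduces the example above (a sketch; strings are used here in place of bit arrays):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bitstrings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming("10001001", "11100000"))  # A vs B: 4
print(hamming("10001001", "10010010"))  # A vs C: also 4, yet (C) looks like a better match
```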
Description of the method

- We can modify the Hamming distance using the idea of the Levenshtein distance, which is usually used for comparing text strings and is obtained by finding the cheapest way to transform one string into another. The transformations are the one-step operations of insertion, deletion and substitution, and each transformation has a certain cost.
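A standard dynamic-programming Levenshtein distance with unit costs (a sketch; the weighted variant described in these slides adds per-pixel weights on top of this idea):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("10001001", "10010010"))  # 2, versus a Hamming distance of 4
```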
Description of the method

- Also, since different parts of an image have different levels of importance in the recognition process, we can assign a weight value to each pixel of an image and use it in the definition of the distance. For example, we can eliminate the background of a picture by assigning zero weight to the corresponding pixels.
Description of the method

- To get the weighted Hamming-Levenshtein distance between two pictures, we divide each bitstring into several substrings of the same length, compare corresponding substrings using the Levenshtein distance, and sum all these distances multiplied by the average weight of each substring.
Image library

- Each object in the library is represented by several images taken with different lighting and from different sides.
- Each image in the library is represented by two 2-dimensional arrays: the first contains the edge-detected picture turned into a bit array, and the second contains the weight values assigned to each pixel.
Process of identification

- To identify an object, we turn its edge-detected image into a bit array by thresholding intensities to 0 or 1.
- Then we measure the distance between this image and each image in our library, using the corresponding weight arrays and the weighted Hamming-Levenshtein distance.
- Using the KNN method, we identify the object.
Example

- Below are some results of object identification obtained using the method described above.
Example. Assume that we have the image library
with the following edge-detected images
of objects and weighted images.
Example

- Let us try to identify the following picture (Picture 1).
Example. We compare this picture with each
image in our library, and we get the
following table of distances.
Example

- If we select the three closest neighbors of our Picture 1, we can identify it as Bear.

Distances from Picture 1:
Bear 1: 876
Bear 2: 21009
Bear 3: 24495
Cat 1: 27401
Cat 2: 25986
Cat 3: 24538
Dog 1: 21629
Dog 2: 26809
Dog 3: 25546
Example

- Let us do similar calculations for these two pictures: Picture 2 and Picture 3.
          Picture 2   Picture 3
Bear 1    31678       32629
Bear 2    24644       23790
Bear 3    31662       32150
Cat 1     1864        28687
Cat 2     22798       25655
Cat 3     22242       25824
Dog 1     23087       1577
Dog 2     25679       24042
Dog 3     25785       23880
Future work

- Develop a method for locating an object in the picture.
- Develop an idea of a reasonable weight distribution on the images in the library.
- Improve the identification algorithm to allow comparison of pictures of different sizes.
- Continue to work on improving the definition of the weighted Hamming-Levenshtein distance.
Introduction: Optical Character Recognition (OCR)

- Predict the label of each image using the classification function learned from training.
- OCR is basically a classification task on multivariate data: pixel values are the variables, and each type of character is a class.
- Objective: to recognise images of handwritten digits based on classification methods for multivariate data.

Handwritten Digit data
- 16 x 16 (= 256 pixel) grey-scale images of digits in the range 0-9: $X_i = [x_{i1}, x_{i2}, \dots, x_{i,256}]$ with $x_{ij} \in [0, 1]$.
- Labels $y_i \in \{0,1,2,3,4,5,6,7,8,9\}$.
- 9298 labelled samples; training set ~ 1000 images; test set randomly selected from the full database.
- Basic idea: correctly identify the digit given an image.

(Figure: a sample 16 x 16 digit image.)
Dimension reduction - PCA

- PCA is done on the mean-centered images.
- The eigenvectors of the 256 x 256 covariance matrix are called the eigendigits (256-dimensional).
- The larger an eigenvalue, the more important the corresponding eigendigit.
- The i-th PC of an image X is $y_i = e_i^\top X$.
(Figures: the average digit image and a grid of eigendigits, each shown as a 16 x 16 image.)
PCA (continued)

- Based on the eigenvalues, the first 64 PCs were found to be significant; the variance captured is ~ 92.74%.
- Any image can be represented by its PCs: $Y = [y_1, y_2, \dots, y_{64}]$.
- Reduced data matrix with 64 variables: Y is a 1000 x 64 matrix.
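A sketch of this reduction step, using scikit-learn's small 8 x 8 digit set as a stand-in for the 16 x 16 data (and 20 components instead of 64):

```python
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)     # 8x8 digit images, flattened to 64 pixels
X_mean = X.mean(axis=0)
I = X - X_mean                          # mean-centre the images

# SVD of the centred data: rows of Vt are the "eigendigits".
U, S, Vt = np.linalg.svd(I, full_matrices=False)
var = S ** 2 / (len(X) - 1)             # eigenvalues = variance along each component
explained = var.cumsum() / var.sum()

n_pc = 20                               # keep the leading components
print(f"{n_pc} PCs explain {100 * explained[n_pc - 1]:.2f}% of the variance")
Y = I @ Vt[:n_pc].T                     # reduced data matrix (n_samples x n_pc)
print(Y.shape)
```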
(Figure: cumulative percentage of variance explained vs. number of principal components used; 64 components explain 92.74% of the variance.)
Interpreting the PCs as Image Features

- The eigenvectors are a rotation of the original axes to more meaningful directions; the PCs are the projections of the data onto these new axes.
- Image reconstruction: the original image can be reconstructed by projecting the PCs back onto the old axes.
- Using the most significant PCs gives a reconstructed image that is close to the original image.
- These features can be used for carrying out further investigations, e.g. classification!
Image Reconstruction

- Mean-centered image: $I = X - X_{mean}$.
- PCs as features: $y_i = e_i^\top I$, so $Y = [y_1, y_2, \dots, y_{64}]^\top = E^\top I$, where $E = [e_1\ e_2\ \cdots\ e_{64}]$.
- Reconstruction: $X_{recon} = E\,Y + X_{mean}$.

(Figures: an actual image from the test set and its reconstructions using all 256, 150, and 64 principal components.)
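A sketch of the reconstruction formulas above, again using scikit-learn's small digit set as a stand-in and 20 components instead of 64:

```python
import numpy as np
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
X_mean = X.mean(axis=0)

# Leading eigenvectors ("eigendigits") as the columns of E.
U, S, Vt = np.linalg.svd(X - X_mean, full_matrices=False)
n_pc = 20
E = Vt[:n_pc].T                      # shape (n_pixels, n_pc)

x = X[0]                             # one image
y_pc = E.T @ (x - X_mean)            # its principal components: Y = E^T (x - x_mean)
x_recon = E @ y_pc + X_mean          # reconstruction: x_recon = E Y + x_mean

err = np.linalg.norm(x - x_recon) / np.linalg.norm(x)
print(f"relative reconstruction error with {n_pc} PCs: {err:.3f}")
```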
Normality test on PCs

(Figures: QQ plots of sample quantiles versus standard normal quantiles for principal components 1, 3, 5, 10, 20, 30, 40, 50 and 60.)
Classification

- Principal components are used as the features of the images.
- LDA, assuming multivariate normality of the feature groups and a common covariance matrix.
- Fisher discriminant procedure, which assumes only a common covariance matrix.
Classification (contd.)

- Equal costs of misclassification are assumed.
- Misclassification error rates: APER (apparent error rate) based on the training data, and AER (actual error rate) on the validation data.
- Error rates using different numbers of PCs were compared, averaged over several random samplings of training and validation data from the full data set.
Performing LDA

- The prior probability of each class was taken as the frequency of that class in the data.
- Equality of the covariance matrices is a strong assumption; the error rates were used to check the validity of this assumption.
- $S_{pooled}$ was used as the common covariance matrix.
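A rough sketch of LDA with frequency priors and a pooled covariance matrix (scikit-learn's 8 x 8 digits stand in for the 16 x 16 data, raw pixels stand in for the PCs, and the small ridge added to S_pooled is a stability hack not mentioned in the slides):

```python
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
classes = np.unique(y)
priors = np.array([(y == c).mean() for c in classes])        # class frequencies as priors
means = np.array([X[y == c].mean(axis=0) for c in classes])

# Pooled (common) covariance matrix S_pooled, plus a small ridge for invertibility.
n, p = X.shape
S = sum((X[y == c] - means[i]).T @ (X[y == c] - means[i])
        for i, c in enumerate(classes)) / (n - len(classes))
S_inv = np.linalg.inv(S + 1e-3 * np.eye(p))

# Linear discriminant score per class; assign each image to the highest score.
scores = X @ S_inv @ means.T - 0.5 * np.sum(means @ S_inv * means, axis=1) + np.log(priors)
pred = classes[scores.argmax(axis=1)]
print("APER (training error rate): %.3f" % (pred != y).mean())
```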
Fisher Discriminant Results (r = 2 discriminants)

APER:
No of PCs   256    150    64
APER %      32     34.5   37.4

AER:
No of PCs   256    150    64
AER %       45     42     40

Both AER and APER are very high.
Fisher Discriminant Results (r = 7 discriminants)

APER:
No of PCs   256    150    64
APER %      3.2    4.8    7.9

AER:
No of PCs   256    150    64
AER %       14.1   12.4   10.8

- Considerable improvement in AER and APER.
- Performance is close to LDA.
- Using 64 PCs is better.
Fisher Discriminant Results (r = 9, all discriminants)

APER:
No of PCs   256    150    64
APER %      1.6    4.3    6.4

AER:
No of PCs   256    150    64
AER %       13.21  10.55  9.86

- No significant performance gain over r = 7.
- Error rates are ~ LDA (as expected!).
Nearest Neighbour Classifier
- Finds the nearest neighbour in the training set to a test image and assigns its label to the test image.
- No assumption about the distribution of the data.
- Euclidean distance is used to find the nearest neighbour.

(Figure: a test point among Class 1 and Class 2 points; its nearest neighbour belongs to Class 2, so the test point is assigned to Class 2.)
K-Nearest Neighbour Classifier
(KNN) Compute the k nearest neighbours and assign the class by majority vote.

(Figure: with k = 3 the test point has 2 votes for Class 1 and 1 vote for Class 2, so it is assigned to Class 1.)
1-NN Classification Results:
No of PCs   256    150    64
AER %       7.09   7.01   6.45

- Test error rates have improved compared to LDA and Fisher.
- Using 64 PCs gives better results.
- Using higher values of k does not improve the recognition rate.
Misclassification in NN:
Confusion matrix (rows: actual digit, columns: digit recognised as):

         0     1     2    3    4    5    6    7    8    9
0     1376     0     4    2    0    5   12    2    0    0
1        0  1113     1    0    1    0    2    0    2    0
2       22     9   728   17    4    4    6   16   18    2
3        4     0     4  690    2   26    0    4    6    3
4        3    15     9    0  687    0    7    2    4   32
5        9     3    12   37    5  517   32    0   23    9
6       10     3     5    0    3    2  714    0    3    2
7        0     6     1    0   19    0    0  657    1   20
8        8    11     1   26    7    7    8    5  547   13
9        6     1     2    0   23    0    0   32    0  664
Euclidean distances between transformed images of the same class can be very high.