CLUSTER ANALYSIS
Cluster analysis is a data analysis tool for classification problems. Its objective is
to classify observations into groups (clusters) such that the degree of association is strong
between members of the same cluster and weak between members of different clusters.
Once the clusters are formed, the behavior of people or objects can be predicted from their
cluster membership. Cluster analysis is also called segmentation or taxonomy analysis.
Cluster analysis classifies observations into groups that are not known in advance, whereas
discriminant analysis classifies them into known groups. That is, cluster analysis does not
require any a priori information about cluster membership, whereas discriminant analysis
requires prior knowledge of the cluster membership of each object.
Clustering procedures
Clustering procedures are broadly classified as hierarchical and non-hierarchical
clustering. In hierarchical clustering, the data are not partitioned into a fixed set of clusters
in a single step. Instead, a series of partitions takes place, which may run from a single cluster
containing all observations to n clusters each containing a single object. The procedure builds
a hierarchy of clusters, which can be represented by a dendrogram or tree diagram.
Hierarchical clustering is further subdivided into agglomerative methods, which
successively combine individual observations into groups, and divisive methods, which
successively divide the n objects into smaller groups.
Agglomerative method
The agglomerative method produces a series of partitions in n-1 steps, where n is the number of
observations. The first partition contains n single-observation clusters and the last
contains one group of all n observations. At any given stage the agglomerative method
joins the two clusters that are closest.
Agglomerative methods consist of linkage methods, variance methods and centroid
methods. Linkage methods consist of single linkage, complete linkage and average
linkage.
SINGLE LINKAGE
This is also known as the nearest neighbor technique. In this method, the distance
between the closest pair of objects in two groups is taken as the distance between
the two groups.
The distance between two clusters r and s is given by
D(r, s) = \min \{ d(i, j) \}
where i is an observation in cluster r and j is an observation in cluster s.
The distance between two clusters is the shortest link between the clusters.
PROBLEM: The following table represents the co-ordinates of 8 cities in India. The
objective of the clustering is to group the cities based on their similarities so that the
degree of association is strong within clusters and weak between clusters.
CITIES   X     Y
C1      -57    28
C2       54   -65
C3       46    79
C4        8   111
C5      -36    52
C6      -22   -76
C7       34   129
C8       74     6
[Figure: agglomerative and divisive clustering of objects A, B, C, D, E]
Step 1: Select a distance measure. The distance measure used is the Euclidean distance:
D(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}
where i and j are individual observations and p is the number of variables.
In our case, p = 2, and i and j vary from 1 to 8.
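As a minimal sketch of Step 1, the Euclidean distance can be computed directly from the coordinate table. The names `CITIES` and `euclidean` are illustrative choices, not part of the original text.

```python
import math

# Coordinates of the eight cities, copied from the table in the text.
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def euclidean(a, b):
    """Euclidean distance between two observations with the same p variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(round(euclidean(CITIES["C1"], CITIES["C2"]), 2))  # 144.81
```

The function works for any number of variables p, since it simply pairs up the coordinates of the two observations.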
Step 2: Formulate the distance matrix.
CITIES  C1      C2     C3     C4     C5     C6     C7     C8
C1      0
C2      144.81  0
C3      114.9   144.2  0
C4      105.4   181.9  49.7   0
C5      31.9    147.6  152.5  73.6   0
C6      109.7   76.8   166.3  189.4  128.8  0
C7      135.9   195    51.4   31.7   104    212.5  0
C8      132.8   73.7   78.2   124    119.2  126.2  129.3  0
Step 3: Find the minimum value in the distance matrix. Combine those two cities into a
cluster. In our case, it is C4 and C7.
Step 4: After forming the cluster, calculate the distance between the new cluster and each of the
remaining unclustered cities. These are obtained as follows:
D (C47, C1) = min {D (C4, C1), D (C7, C1)}
= min (105.4, 135.9)
= 105.4
D (C47, C2) = min {D (C4, C2), D (C7, C2)}
= min (181.9, 195)
= 181.9
Similarly, after calculating the rest of the values, the following distance matrix is formed:
CITIES  C1      C2     C3     C47    C5     C6     C8
C1      0
C2      144.81  0
C3      114.9   144.2  0
C47     105.4   181.9  49.7   0
C5      31.9    147.6  152.5  73.6   0
C6      109.7   76.8   166.3  189.4  128.8  0
C8      132.8   73.7   78.2   124    119.2  126.2  0
Repeating Step 3: the minimum distance in this matrix is 31.9, the distance between
C1 and C5. Forming the new cluster C15 and proceeding further, the distance matrix is
CITIES  C15     C2     C3     C47    C6     C8
C15     0
C2      144.81  0
C3      114.9   144.2  0
C47     73.6    181.9  49.7   0
C6      109.7   76.8   166.3  189.4  0
C8      119.2   73.7   78.2   124    126.2  0
Repeating Step 3: the minimum distance in this matrix is 49.7, the distance between
C3 and C47. Forming the new cluster C347 and proceeding further, the distance matrix is
CITIES  C15     C2     C347   C6     C8
C15     0
C2      144.81  0
C347    73.6    144.2  0
C6      109.7   76.8   166.3  0
C8      119.2   73.7   78.2   126.2  0
Repeating Step 3: the minimum distance in this matrix is 73.6, the distance between
C15 and C347. Forming the new cluster C15347 and proceeding further, the distance matrix is
CITIES  C15347  C2     C6     C8
C15347  0
C2      144.2   0
C6      109.7   76.8   0
C8      78.2    73.7   126.2  0
Repeating Step 3: the minimum distance in this matrix is 73.7, the distance between
C2 and C8. Forming the new cluster C28 and proceeding further, the distance matrix is
CITIES  C15347  C28    C6
C15347  0
C28     78.2    0
C6      109.7   76.8   0
Repeating Step 3: the minimum distance in this matrix is 76.8, the distance between
C28 and C6. Forming the new cluster C268 and proceeding further, the distance matrix is
CITIES  C15347  C268
C15347  0
C268    78.2    0
At the last stage the two clusters are joined, at a distance of 78.2, to form a single cluster
containing all 8 cities. A dendrogram summarizes the various stages.
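The single linkage stages can be sketched in a few lines of code. This is an illustrative sketch, not the author's procedure: it takes the distance matrix exactly as printed in Step 2 as input (the names `LABELS`, `LOWER`, `build_dist` and `single_linkage` are assumptions made for the example) and re-derives every cluster-to-cluster distance from the original pairwise distances instead of updating the matrix in place.

```python
from itertools import combinations

LABELS = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8"]
# Lower triangle of the distance matrix as printed in the text (row i, columns < i).
LOWER = [
    [],
    [144.81],
    [114.9, 144.2],
    [105.4, 181.9, 49.7],
    [31.9, 147.6, 152.5, 73.6],
    [109.7, 76.8, 166.3, 189.4, 128.8],
    [135.9, 195.0, 51.4, 31.7, 104.0, 212.5],
    [132.8, 73.7, 78.2, 124.0, 119.2, 126.2, 129.3],
]


def build_dist(labels, lower):
    """Map each unordered pair of labels to its printed distance."""
    dist = {}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            dist[frozenset((labels[i], labels[j]))] = value
    return dist


def single_linkage(labels, dist):
    """Return the merges as (cluster_r, cluster_s, distance), in order."""
    def gap(pair):
        r, s = pair
        # Single linkage: shortest link between any member of r and any member of s.
        return min(dist[frozenset((a, b))] for a in r for b in s)

    clusters = [frozenset([name]) for name in labels]
    merges = []
    while len(clusters) > 1:
        r, s = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(r), sorted(s), gap((r, s))))
        clusters = [c for c in clusters if c not in (r, s)] + [r | s]
    return merges


for r, s, d in single_linkage(LABELS, build_dist(LABELS, LOWER)):
    print(r, "+", s, "at", d)
```

Recomputing the minimum over all member pairs at every stage is slower than updating the matrix, but for 8 cities it keeps the sketch short and makes the definition of D(r, s) explicit.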
COMPLETE LINKAGE
This is also called the furthest neighbor method, and it is the opposite of the single linkage
method. Here the distance between two clusters is the maximum distance between an
observation in one cluster and an observation in the other, that is, the longest link between the
clusters:
D(r, s) = \max \{ d(i, j) \}
where i is an observation in cluster r and j is an observation in cluster s.
The distance between every possible pair of objects, one from each cluster, is calculated and the
maximum is taken as the distance between the clusters.
[Figure: dendrogram for the SINGLE LINKAGE example. Horizontal axis: CUSTOMERS (the eight cities); vertical axis: DISTANCE.]
PROBLEM: The following table represents the co-ordinates of 8 cities in India. The
objective of the clustering is to group the cities based on their similarities so that the
degree of association is strong within clusters and weak between clusters.
CITIES   X     Y
C1      -57    28
C2       54   -65
C3       46    79
C4        8   111
C5      -36    52
C6      -22   -76
C7       34   129
C8       74     6
Step 1: Select a distance measure. The distance measure used is the Euclidean distance:
D(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}
where i and j are individual observations and p is the number of variables.
In our case, p = 2, and i and j vary from 1 to 8.
Step 2: Formulate the distance matrix.
CITIES  C1      C2     C3     C4     C5     C6     C7     C8
C1      0
C2      144.81  0
C3      114.9   144.2  0
C4      105.4   181.9  49.7   0
C5      31.9    147.6  152.5  73.6   0
C6      109.7   76.8   166.3  189.4  128.8  0
C7      135.9   195    51.4   31.7   104    212.5  0
C8      132.8   73.7   78.2   124    119.2  126.2  129.3  0
Step 3: Find the minimum value in the distance matrix. Combine those two cities into a
cluster. In our case, it is C4 and C7.
Step 4: After forming the cluster, calculate the distance between the clustered and the
remaining unclustered individuals. These are obtained as follows:
D (C47, C1) = max {D (C4, C1), D (C7, C1)}
= max (105.4, 135.9)
= 135.9
D (C47, C2) = max {D (C4, C2), D (C7, C2)}
= max (181.9, 195)
= 195
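The two sample calculations above can be reproduced with a short sketch. The dictionary `DIST` holds only the four printed distances needed here, and the function name `complete_link` is an illustrative choice.

```python
# Distances from the printed matrix that involve C4 or C7 (illustrative subset).
DIST = {
    frozenset(("C4", "C1")): 105.4, frozenset(("C7", "C1")): 135.9,
    frozenset(("C4", "C2")): 181.9, frozenset(("C7", "C2")): 195.0,
}


def complete_link(cluster_r, cluster_s, dist):
    """Complete linkage: the largest distance over all cross-cluster pairs."""
    return max(dist[frozenset((i, j))] for i in cluster_r for j in cluster_s)


print(complete_link(["C4", "C7"], ["C1"], DIST))  # 135.9
print(complete_link(["C4", "C7"], ["C2"], DIST))  # 195.0
```

Swapping `max` for `min` in the same function gives the single linkage rule, which is the only difference between the two methods.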
Similarly, after calculating the rest of the values, the following distance matrix is formed:
CITIES  C1      C2     C3     C47    C5     C6     C8
C1      0
C2      144.81  0
C3      114.9   144.2  0
C47     135.9   195    51.4   0
C5      31.9    147.6  152.5  104    0
C6      109.7   76.8   166.3  212.5  128.8  0
C8      132.8   73.7   78.2   129.3  119.2  126.2  0
Repeating Step 3: the minimum distance in this matrix is 31.9, the distance between
C1 and C5. Forming the new cluster C15 and proceeding further, the distance matrix is
CITIES  C15     C2     C3     C47    C6     C8
C15     0
C2      147.6   0
C3      152.5   144.2  0
C47     135.9   195    51.4   0
C6      128.8   76.8   166.3  212.5  0
C8      132.8   73.7   78.2   129.3  126.2  0
Repeating Step 3: the minimum distance in this matrix is 51.4, the distance between
C3 and C47. Forming the new cluster C347 and proceeding further, the distance matrix is
CITIES  C15     C2     C347   C6     C8
C15     0
C2      147.6   0
C347    152.5   195    0
C6      128.8   76.8   212.5  0
C8      132.8   73.7   129.3  126.2  0
Repeating Step 3: the minimum distance in this matrix is 73.7, the distance between
C2 and C8. Forming the new cluster C28 and proceeding further, the distance matrix is
CITIES  C15     C28    C347   C6
C15     0
C28     147.6   0
C347    152.5   195    0
C6      128.8   126.2  212.5  0
Repeating Step 3: the minimum distance in this matrix is 126.2, the distance between
C6 and C28. Forming the new cluster C268 and proceeding further, the distance matrix is
CITIES  C15     C268   C347
C15     0
C268    147.6   0
C347    152.5   212.5  0
Repeating Step 3: the minimum distance in this matrix is 147.6, the distance between
C15 and C268. Forming the new cluster C15268 and proceeding further, the distance matrix is
CITIES  C15268  C347
C15268  0
C347    212.5   0
At the last stage the two clusters are joined to form a single cluster containing all 8
cities. A dendrogram summarizes the various stages.
[Figure: dendrogram for the COMPLETE LINKAGE example. Horizontal axis: CUSTOMERS (the eight cities); vertical axis: DISTANCE.]
AVERAGE LINKAGE
In this method, the distance between two clusters is defined as the average
distance over all possible pairs of objects, where each pair is made up of one object
from each cluster.
The distance is given by
D(r, s) = \frac{1}{n_r n_s} \sum_{i \in r} \sum_{j \in s} d_{ij}
where d_{ij} is the distance between object i in cluster r and object j in cluster s,
the summation is over all possible pairings of objects between the two
clusters, and n_r and n_s are the numbers of objects in the two clusters.
PROBLEM: The following table represents the co-ordinates of 8 cities in India. The
objective of the clustering is to group the cities based on their similarities so that the
degree of association is strong within clusters and weak between clusters.
CITIES   X     Y
C1      -57    28
C2       54   -65
C3       46    79
C4        8   111
C5      -36    52
C6      -22   -76
C7       34   129
C8       74     6
Step 1: Select a distance measure. The distance measure used is the Euclidean distance:
D(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}
where i and j are individual observations and p is the number of variables.
In our case, p = 2, and i and j vary from 1 to 8.
Step 2: Formulate the distance matrix.
CITIES  C1      C2     C3     C4     C5     C6     C7     C8
C1      0
C2      144.81  0
C3      114.9   144.2  0
C4      105.4   181.9  49.7   0
C5      31.9    147.6  152.5  73.6   0
C6      109.7   76.8   166.3  189.4  128.8  0
C7      135.9   195    51.4   31.7   104    212.5  0
C8      132.8   73.7   78.2   124    119.2  126.2  129.3  0
Step 3: Find the minimum value in the distance matrix. Combine those two cities into a
cluster. In our case, it is C4 and C7.
Step 4: After forming the cluster, calculate the distance between the new cluster and each of the
remaining unclustered cities. These are obtained from
D(r, s) = \frac{1}{n_r n_s} \sum_{i \in r} \sum_{j \in s} d_{ij}
where n_r and n_s are the numbers of objects in clusters r and s respectively, and d_{ij} is the
distance between object i of the first cluster and object j of the second cluster.
In our case
D(C47, C1) = \frac{1}{2 \times 1} (105.4 + 135.9) = 120.65
D(C47, C2) = \frac{1}{2 \times 1} (181.9 + 195) = 188.45
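A minimal sketch of the average linkage formula, checked against the two sample values above. `DIST` contains only the printed distances needed for this step, and `average_link` is an illustrative name.

```python
# Distances from the printed matrix that involve C4 or C7 (illustrative subset).
DIST = {
    frozenset(("C4", "C1")): 105.4, frozenset(("C7", "C1")): 135.9,
    frozenset(("C4", "C2")): 181.9, frozenset(("C7", "C2")): 195.0,
}


def average_link(cluster_r, cluster_s, dist):
    """Average linkage: mean distance over all n_r * n_s cross-cluster pairs."""
    total = sum(dist[frozenset((i, j))] for i in cluster_r for j in cluster_s)
    return total / (len(cluster_r) * len(cluster_s))


print(round(average_link(["C4", "C7"], ["C1"], DIST), 2))  # 120.65
print(round(average_link(["C4", "C7"], ["C2"], DIST), 2))  # 188.45
```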
Similarly after calculating the other distances, distance matrix is as follows
CITIES C1 C2 C3 C47 C5 C6 C8
C1 0
C2 144.81 0
C3 114.9 144.2 0
C47 120.65 188.45 50.55 0
C5 31.9 147.6 152.5 88.8 0
C6 109.7 76.8 166.3 200.95 128.8 0
C8 132.8 73.7 78.2 126.65 119.2 126.2 0
REPEATING STEP 3 Taking this distance matrix and minimum distance is 31.9 which is
the distance between C1 & C5. Forming the new cluster C15 and proceeding further, the
distance matrix is
CITIES C15 C2 C3 C47 C6 C8
C15 0
C2 146.21 0
C3 133.7 144.2 0
C47 104.73 188.45 50.55 0
C6 119.25 76.8 166.3 200.95 0
C8 126 73.7 78.2 126.65 126.2 0
REPEATING STEP 3 Taking this distance matrix and minimum distance is 50.55 which
is the distance between C47 & C3. Forming the new cluster C347 and proceeding further,
the distance matrix is
Cities C15 C2 C347 C6 C8
C15 0
C2 146.21 0
C347 119.22 166.33 0
C6 119.25 76.8 183.63 0
C8 126 73.7 102.43 126.2 0
REPEATING STEP 3 Taking this distance matrix and minimum distance is 73.7 which is
the distance between C2 & C8. Forming the new cluster C28 and proceeding further, the
distance matrix is
CITIES C15 C28 C347 C6
C15 0
C28 136.11 0
C347 119.22 134.37 0
C6 119.25 101.5 183.63 0
Repeating Step 3: the minimum distance in this matrix is 101.5, the distance
between C6 and C28. Forming the new cluster C268 and proceeding further,
the distance matrix is
CITIES C15 C268 C347
C15 0
C268 127.68 0
C347 119.22 159 0
REPEATING STEP 3 Taking this distance matrix and minimum distance is 119.22
which is the distance between C15 & C347. Forming the new cluster C15347 and
proceeding further, the distance matrix is
CITIES C15347 C268
C15347 0
C268 143.34 0
At the last stage the two clusters are joined to form a single cluster containing all the 8
cities. The dendrogram that summarizes the various stages is shown below
[Figure: dendrogram for the AVERAGE LINKAGE example. Horizontal axis: CUSTOMERS (the eight cities); vertical axis: DISTANCE.]
NON- HIERARCHICAL CLUSTERING
Non-hierarchical clustering groups a set of units into a pre-determined
number of groups, using an iterative algorithm, in such a way that the
'within group' variation is minimized and the 'between group' variation is maximized. Starting
from an initial classification, units are transferred from one group to another until no further
improvement can be made.
K-MEANS CLUSTERING
Algorithm for K-Means
Step 1: Decide the number of final clusters. This is represented by k.
Step 2: Form the initial clusters by subdividing the complete data into k groups. This is done
by using the following formula:
\frac{k \, [\mathrm{Sum}(i) - \mathrm{Min}]}{\mathrm{Max} - \mathrm{Min}} + 1
where i refers to the individual observations,
Sum(i) refers to the sum of the variable values of the ith observation,
Max refers to the maximum of the Sum(i) values,
Min refers to the minimum of the Sum(i) values.
Step 3: Compute the cluster centers.
Step 4: Compute the error for the partition, which is given by
E[P(n, k)] = \sum_{i=1}^{n} D[i, l(i)]^2
where l(i) is the cluster containing the ith case,
k is the number of clusters,
n is the number of cases.
Step 5: Check whether moving any case from one cluster to another
results in a reduction in E. The following value is calculated for each case i and each other cluster l:
R[l(i), l] = \frac{n(l) \, D(i, l)^2}{n(l) + 1} - \frac{n(l(i)) \, D(i, l(i))^2}{n(l(i)) - 1}
where n(l) refers to the number of cases in the lth cluster and
l(i) refers to the cluster containing the ith case. A negative value means that moving case i to
cluster l reduces E.
Step 6: Repeat steps 3, 4 and 5 until moving any case from one cluster to another does
not result in a reduction in E.
Step 7: The resulting partition represents the final clusters. Compute the final cluster centers.
PROBLEM: The following table represents the co-ordinates of 8 cities in India. The
objective of the clustering is to group the cities based on their similarities so that the
degree of association is strong within clusters and weak between clusters.
CITIES   X     Y
C1      -57    28
C2       54   -65
C3       46    79
C4        8   111
C5      -36    52
C6      -22   -76
C7       34   129
C8       74     6
Step 1: Initially the number of clusters has to be decided. This is done arbitrarily; in our case let
us take k = 3. Then the cities in each of the k clusters have to be decided. To do that,
the variables are converted to a single measure using the following formula:
\frac{k \, [\mathrm{Sum}(i) - \mathrm{Min}]}{\mathrm{Max} - \mathrm{Min}} + 1
where Sum(i) = sum of the variable values of the ith observation,
Min = minimum of the Sum(i) values,
Max = maximum of the Sum(i) values.
In our case it is as follows
CITIES X Y Sum Single measure
C1 -57 28 -29 1.793
C2 54 -65 -11 2
C3 46 79 125 3.563
C4 8 111 119 3.494
C5 -36 52 16 2.310
C6 -22 -76 -98 1
C7 34 129 163 4
C8 74 6 80 3.046
Sample calculation
K = 3 max = 163 min = - 98 sum (i) = -29
Measure = (3 * [-29 + 98] / [163 + 98]) + 1
= 1.793
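The single measure can be computed for all eight cities with a short sketch. The names `CITIES` and `single_measure` are illustrative; the formula is the one given in the text, with k = 3.

```python
# Coordinates of the eight cities, copied from the table in the text.
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def single_measure(total, k, lo, hi):
    """k * (Sum(i) - Min) / (Max - Min) + 1, as defined in the text."""
    return k * (total - lo) / (hi - lo) + 1


sums = {city: x + y for city, (x, y) in CITIES.items()}
lo, hi = min(sums.values()), max(sums.values())
measures = {city: round(single_measure(s, 3, lo, hi), 3) for city, s in sums.items()}

print(measures["C1"])  # 1.793
```

The measure always runs from 1 (for the smallest sum) to k + 1 (for the largest sum), which is why C6 scores 1 and C7 scores 4 in the table.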
Grouping the cities into clusters based on the measures results in following clusters
CLUSTER MEMBERS
I C1, C2, C6
II C3, C4, C5, C8
III C7
Step 2: Find the cluster centers
CLUSTER CLUSTER CENTERS
X Y
I -8.33 -37.67
II 23 62
III 34 129
For cluster I,
Center = (C1 + C2 + C6) / 3
= [(-57 + 54 - 22) / 3, (28 - 65 - 76) / 3]
= (-8.33, -37.67)
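Step 3: Find the distance between the ith observation and the center of the kth cluster:
D(i, k) = \sqrt{\sum_{j} [a(i, j) - b(k, j)]^2}
where a(i, j) is the value of the jth variable for the ith observation and b(k, j) is the jth
co-ordinate of the kth cluster center. For example,
D(1, 1) = \sqrt{(-57 + 8.33)^2 + (28 + 37.67)^2}
= 81.74
Similarly, calculate the distances for all the other observations.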
Step 4: The error for any particular partition is
E[P(n, k)] = \sum_{i=1}^{n} D[i, l(i)]^2
where l(i) is the cluster containing the ith case,
n is the number of observations (here n = 8),
k is the number of clusters (here k = 3).
E[P(8, 3)] = D[1,1]^2 + D[2,1]^2 + D[3,2]^2 + D[4,2]^2 + D[5,2]^2 +
D[6,1]^2 + D[7,3]^2 + D[8,2]^2
= 25731.34
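The cluster centers and the partition error can be checked with a small sketch. The names `PARTITION`, `center` and `partition_error` are illustrative. The sketch uses unrounded centers, so its result may differ from the hand calculation in the second decimal place.

```python
# Coordinates of the eight cities, copied from the table in the text.
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}
# Initial partition from the text.
PARTITION = {"I": ["C1", "C2", "C6"], "II": ["C3", "C4", "C5", "C8"], "III": ["C7"]}


def center(members):
    """Mean X and mean Y of the cities in one cluster."""
    xs = [CITIES[m][0] for m in members]
    ys = [CITIES[m][1] for m in members]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def partition_error(partition):
    """Sum of squared distances from each city to its own cluster center."""
    total = 0.0
    for members in partition.values():
        cx, cy = center(members)
        total += sum((CITIES[m][0] - cx) ** 2 + (CITIES[m][1] - cy) ** 2
                     for m in members)
    return total


print([round(v, 2) for v in center(PARTITION["I"])])  # [-8.33, -37.67]
print(round(partition_error(PARTITION), 2))
```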
Step 5: We now have to check whether moving any city from one
cluster to another results in a reduction of the error E.
R[l(i), l] = \frac{n(l) \, D(i, l)^2}{n(l) + 1} - \frac{n(l(i)) \, D(i, l(i))^2}{n(l(i)) - 1}
where n(l) is the number of observations in the lth cluster and
l(i) is the cluster containing the ith observation.
Iteration 1: For the 1st city, the calculation is as follows:
D(1,1)^2 = (-57 + 8.33)^2 + (28 + 37.67)^2 = 6681.31
D(1,2)^2 = (-57 - 23)^2 + (28 - 62)^2 = 7556
D(1,3)^2 = (-57 - 34)^2 + (28 - 129)^2 = 18482
R[1(1),2] = (4/5) * 7556 - (3/2) * 6681.31 = -3977.165
Since R is negative, shifting city 1 from cluster 1 to cluster 2 reduces the error, so we shift
city 1 from cluster 1 to cluster 2.
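The transfer check for city 1 can be written as a one-line function. This is a sketch under an assumption: it uses the n(l(i)) - 1 denominator for the source cluster, which is what the factor 3/2 in the worked numbers implies. The name `transfer_value` and its parameter names are illustrative.

```python
def transfer_value(n_target, d2_target, n_source, d2_source):
    """R for moving one case: negative means the move reduces the error E.

    n_target, d2_target: size of the candidate cluster and squared distance to its center.
    n_source, d2_source: size of the current cluster and squared distance to its center.
    Assumes n_source > 1 (a case is not moved out of a single-member cluster).
    """
    return (n_target * d2_target / (n_target + 1)
            - n_source * d2_source / (n_source - 1))


# City 1: currently in cluster I (3 members), candidate cluster II (4 members).
r = transfer_value(n_target=4, d2_target=7556, n_source=3, d2_source=6681.31)
print(round(r, 3))  # -3977.165
```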
Iteration 2
CLUSTER MEMBERS
I C2, C6
II C1, C3, C4, C5, C8
III C7
New cluster centers are
CLUSTER CLUSTER CENTERS
X Y
I 16 -70.5
II 7 55.2
III 34 129
For the 1st city, the values of R[l(i), l] are calculated as follows:
D(1,1)^2 = (-57 - 16)^2 + (28 + 70.5)^2 = 15031.25
D(1,2)^2 = (-57 - 7)^2 + (28 - 55.2)^2 = 4835.84
D(1,3)^2 = (-57 - 34)^2 + (28 - 129)^2 = 18482
R[1(2),1] = (2/3) * 15031.25 - (5/4) * 4835.84 > 0
R[1(2),3] = (1/2) * 18482 - (5/4) * 4835.84 > 0
Shifting city 1 to another cluster does not reduce the error, so we move on to the
next city.
For the 2nd city, the values of R[l(i), l] are calculated as follows:
D(2,1)^2 = (54 - 16)^2 + (-65 + 70.5)^2 = 1474.25
D(2,2)^2 = (54 - 7)^2 + (-65 - 55.2)^2 = 16657.04
D(2,3)^2 = (54 - 34)^2 + (-65 - 129)^2 = 38036
R[2(1),2] = (5/6) * 16657.04 - (2/1) * 1474.25 > 0
R[2(1),3] = (1/2) * 38036 - (2/1) * 1474.25 > 0
Shifting city 2 to another cluster does not reduce the error, so we move on to the
next city.
For the 3rd city, the values of R[l(i), l] are calculated as follows:
D(3,1)^2 = (46 - 16)^2 + (79 + 70.5)^2 = 23250.25
D(3,2)^2 = (46 - 7)^2 + (79 - 55.2)^2 = 2087.44
D(3,3)^2 = (46 - 34)^2 + (79 - 129)^2 = 2644
R[3(2),1] = (2/3) * 23250.25 - (5/4) * 2087.44 > 0
R[3(2),3] = (1/2) * 2644 - (5/4) * 2087.44 < 0
Shifting city 3 to the third cluster reduces the error, so city 3 is moved to the
third cluster.
Iteration 3
CLUSTER MEMBERS
I C2, C6
II C1, C4, C5, C8
III C3, C7
New cluster centers are
CLUSTER CLUSTER CENTERS
X Y
I 16 -70.5
II -2.75 49.25
III 40 104
For the 1st city, the values of R[l(i), l] are calculated as follows:
D(1,1)^2 = (-57 - 16)^2 + (28 + 70.5)^2 = 15031.25
D(1,2)^2 = (-57 + 2.75)^2 + (28 - 49.25)^2 = 3394.625
D(1,3)^2 = (-57 - 40)^2 + (28 - 104)^2 = 15185
R[1(2),1] = (2/3) * 15031.25 - (4/3) * 3394.625 > 0
R[1(2),3] = (2/3) * 15185 - (4/3) * 3394.625 > 0
Shifting city 1 to another cluster does not reduce the error, so we move on to the
next city.
For the 2nd city, the values of R[l(i), l] are calculated as follows:
D(2,1)^2 = (54 - 16)^2 + (-65 + 70.5)^2 = 1474.25
D(2,2)^2 = (54 + 2.75)^2 + (-65 - 49.25)^2 = 16273.625
D(2,3)^2 = (54 - 40)^2 + (-65 - 104)^2 = 28757
R[2(1),2] = (4/5) * 16273.625 - (2/1) * 1474.25 > 0
R[2(1),3] = (2/3) * 28757 - (2/1) * 1474.25 > 0
Shifting city 2 to another cluster does not reduce the error, so we move on to the
next city.
For the 3rd city, the values of R[l(i), l] are calculated as follows:
D(3,1)^2 = (46 - 16)^2 + (79 + 70.5)^2 = 23250.25
D(3,2)^2 = (46 + 2.75)^2 + (79 - 49.25)^2 = 3261.625
D(3,3)^2 = (46 - 40)^2 + (79 - 104)^2 = 661
R[3(3),1] = (2/3) * 23250.25 - (2/1) * 661 > 0
R[3(3),2] = (4/5) * 3261.625 - (2/1) * 661 > 0
Shifting city 3 to another cluster does not reduce the error, so we move on to the
next city.
For the 4th city, the values of R[l(i), l] are calculated as follows:
D(4,1)^2 = (8 - 16)^2 + (111 + 70.5)^2 = 33006.25
D(4,2)^2 = (8 + 2.75)^2 + (111 - 49.25)^2 = 3928.625
D(4,3)^2 = (8 - 40)^2 + (111 - 104)^2 = 1073
R[4(2),1] = (2/3) * 33006.25 - (4/3) * 3928.625 > 0
R[4(2),3] = (2/3) * 1073 - (4/3) * 3928.625 < 0
Shifting city 4 to the third cluster reduces the error, so city 4 is moved to the third
cluster.
Iteration 4
CLUSTER MEMBERS
I C2, C6
II C1, C5, C8
III C3, C4,C7
New cluster centers are
CLUSTER CLUSTER CENTERS
X Y
I 16 -70.5
II -6.33 28.6
III 29.33 106.33
For the 1st city, the values of R[l(i), l] are calculated as follows:
D(1,1)^2 = (-57 - 16)^2 + (28 + 70.5)^2 = 15031.25
D(1,2)^2 = (-57 + 6.33)^2 + (28 - 28.6)^2 = 2567.8
D(1,3)^2 = (-57 - 29.33)^2 + (28 - 106.33)^2 = 13588.45
R[1(2),1] = (2/3) * 15031.25 - (3/2) * 2567.8 > 0
R[1(2),3] = (3/4) * 13588.45 - (3/2) * 2567.8 > 0
Shifting city 1 to another cluster does not reduce the error, so we move on to the
next city.
For the 2nd city, the values of R[l(i), l] are calculated as follows:
D(2,1)^2 = (54 - 16)^2 + (-65 + 70.5)^2 = 1474.25
D(2,2)^2 = (54 + 6.33)^2 + (-65 - 28.6)^2 = 12400.6
D(2,3)^2 = (54 - 29.33)^2 + (-65 - 106.33)^2 = 29962.5
R[2(1),2] = (3/4) * 12400.6 - (2/1) * 1474.25 > 0
R[2(1),3] = (3/4) * 29962.5 - (2/1) * 1474.25 > 0
Shifting city 2 to another cluster does not reduce the error, so we move on to the
next city.
For the 3rd city, the values of R[l(i), l] are calculated as follows:
D(3,1)^2 = (46 - 16)^2 + (79 + 70.5)^2 = 23250.25
D(3,2)^2 = (46 + 6.33)^2 + (79 - 28.6)^2 = 5278.58
D(3,3)^2 = (46 - 29.33)^2 + (79 - 106.33)^2 = 1024.81
R[3(3),1] = (2/3) * 23250.25 - (3/2) * 1024.81 > 0
R[3(3),2] = (3/4) * 5278.58 - (3/2) * 1024.81 > 0
Shifting city 3 to another cluster does not reduce the error, so we move on to the
next city.
For the 4th city, the values of R[l(i), l] are calculated as follows:
D(4,1)^2 = (8 - 16)^2 + (111 + 70.5)^2 = 33006.25
D(4,2)^2 = (8 + 6.33)^2 + (111 - 28.6)^2 = 6995.10
D(4,3)^2 = (8 - 29.33)^2 + (111 - 106.33)^2 = 476.778
R[4(3),1] = (2/3) * 33006.25 - (3/2) * 476.778 > 0
R[4(3),2] = (3/4) * 6995.10 - (3/2) * 476.778 > 0
Shifting city 4 to another cluster does not reduce the error, so we move on to the
next city.
For the 5th city, the values of R[l(i), l] are calculated as follows:
D(5,1)^2 = (-36 - 16)^2 + (52 + 70.5)^2 = 17710.25
D(5,2)^2 = (-36 + 6.33)^2 + (52 - 28.6)^2 = 1427.86
D(5,3)^2 = (-36 - 29.33)^2 + (52 - 106.33)^2 = 7219.75
R[5(2),1] = (2/3) * 17710.25 - (3/2) * 1427.86 > 0
R[5(2),3] = (3/4) * 7219.75 - (3/2) * 1427.86 > 0
Shifting city 5 to another cluster does not reduce the error, so we move on to the
next city.
For the 6th city, the values of R[l(i), l] are calculated as follows:
D(6,1)^2 = (-22 - 16)^2 + (-76 + 70.5)^2 = 1474.25
D(6,2)^2 = (-22 + 6.33)^2 + (-76 - 28.6)^2 = 11186.70
D(6,3)^2 = (-22 - 29.33)^2 + (-76 - 106.33)^2 = 35878.98
R[6(1),2] = (3/4) * 11186.70 - (2/1) * 1474.25 > 0
R[6(1),3] = (3/4) * 35878.98 - (2/1) * 1474.25 > 0
Shifting city 6 to another cluster does not reduce the error, so we move on to the
next city.
For the 7th city, the values of R[l(i), l] are calculated as follows:
D(7,1)^2 = (34 - 16)^2 + (129 + 70.5)^2 = 40124.25
D(7,2)^2 = (34 + 6.33)^2 + (129 - 28.6)^2 = 11706.6
D(7,3)^2 = (34 - 29.33)^2 + (129 - 106.33)^2 = 535.73
R[7(3),1] = (2/3) * 40124.25 - (3/2) * 535.73 > 0
R[7(3),2] = (3/4) * 11706.6 - (3/2) * 535.73 > 0
Shifting city 7 to another cluster does not reduce the error, so we move on to the
next city.
For the 8th city, the values of R[l(i), l] are calculated as follows:
D(8,1)^2 = (74 - 16)^2 + (6 + 70.5)^2 = 9216.25
D(8,2)^2 = (74 + 6.33)^2 + (6 - 28.6)^2 = 6963.66
D(8,3)^2 = (74 - 29.33)^2 + (6 - 106.33)^2 = 12061.5
R[8(2),1] = (2/3) * 9216.25 - (3/2) * 6963.66 < 0
Shifting city 8 to the first cluster reduces the error, so city 8 is reallocated to the first cluster.
Iteration 5
CLUSTER MEMBERS
I C2, C6, C8
II C1, C5
III C3, C4,C7
New cluster centers are
CLUSTER CLUSTER CENTERS
X Y
I 35.33 -45
II -46.5 40
III 29.33 106.33
For the 1st city, the values of R[l(i), l] are calculated as follows:
D(1,1)^2 = (-57 - 35.33)^2 + (28 + 45)^2 = 13853.83
D(1,2)^2 = (-57 + 46.5)^2 + (28 - 40)^2 = 254.25
D(1,3)^2 = (-57 - 29.33)^2 + (28 - 106.33)^2 = 13588.45
R[1(2),1] = (3/4) * 13853.83 - (2/1) * 254.25 > 0
R[1(2),3] = (3/4) * 13588.45 - (2/1) * 254.25 > 0
Shifting city 1 to another cluster does not reduce the error, so we move on to the
next city.
For the 2nd city, the values of R[l(i), l] are calculated as follows:
D(2,1)^2 = (54 - 35.33)^2 + (-65 + 45)^2 = 748.57
D(2,2)^2 = (54 + 46.5)^2 + (-65 - 40)^2 = 21125.25
D(2,3)^2 = (54 - 29.33)^2 + (-65 - 106.33)^2 = 29962.5
R[2(1),2] = (2/3) * 21125.25 - (3/2) * 748.57 > 0
R[2(1),3] = (3/4) * 29962.5 - (3/2) * 748.57 > 0
Shifting city 2 to another cluster does not reduce the error, so we move on to the
next city.
For the 3rd city, the values of R[l(i), l] are calculated as follows:
D(3,1)^2 = (46 - 35.33)^2 + (79 + 45)^2 = 15489.8
D(3,2)^2 = (46 + 46.5)^2 + (79 - 40)^2 = 10077.25
D(3,3)^2 = (46 - 29.33)^2 + (79 - 106.33)^2 = 1024.81
R[3(3),1] = (3/4) * 15489.8 - (3/2) * 1024.81 > 0
R[3(3),2] = (2/3) * 10077.25 - (3/2) * 1024.81 > 0
Shifting city 3 to another cluster does not reduce the error, so we move on to the
next city.
For the 4th city, the values of R[l(i), l] are calculated as follows:
D(4,1)^2 = (8 - 35.33)^2 + (111 + 45)^2 = 25082.9
D(4,2)^2 = (8 + 46.5)^2 + (111 - 40)^2 = 8011.25
D(4,3)^2 = (8 - 29.33)^2 + (111 - 106.33)^2 = 476.778
R[4(3),1] = (3/4) * 25082.9 - (3/2) * 476.778 > 0
R[4(3),2] = (2/3) * 8011.25 - (3/2) * 476.778 > 0
Shifting city 4 to another cluster does not reduce the error, so we move on to the
next city.
For the 5th city, the values of R[l(i), l] are calculated as follows:
D(5,1)^2 = (-36 - 35.33)^2 + (52 + 45)^2 = 14496.97
D(5,2)^2 = (-36 + 46.5)^2 + (52 - 40)^2 = 254.25
D(5,3)^2 = (-36 - 29.33)^2 + (52 - 106.33)^2 = 7219.75
R[5(2),1] = (3/4) * 14496.97 - (2/1) * 254.25 > 0
R[5(2),3] = (3/4) * 7219.75 - (2/1) * 254.25 > 0
Shifting city 5 to another cluster does not reduce the error, so we move on to the
next city.
For the 6th city, the values of R[l(i), l] are calculated as follows:
D(6,1)^2 = (-22 - 35.33)^2 + (-76 + 45)^2 = 4247.7
D(6,2)^2 = (-22 + 46.5)^2 + (-76 - 40)^2 = 14056.25
D(6,3)^2 = (-22 - 29.33)^2 + (-76 - 106.33)^2 = 35878.98
R[6(1),2] = (2/3) * 14056.25 - (3/2) * 4247.7 > 0
R[6(1),3] = (3/4) * 35878.98 - (3/2) * 4247.7 > 0
Shifting city 6 to another cluster does not reduce the error, so we move on to the
next city.
For the 7th city, the values of R[l(i), l] are calculated as follows:
D(7,1)^2 = (34 - 35.33)^2 + (129 + 45)^2 = 30277.77
D(7,2)^2 = (34 + 46.5)^2 + (129 - 40)^2 = 14401.25
D(7,3)^2 = (34 - 29.33)^2 + (129 - 106.33)^2 = 535.73
R[7(3),1] = (3/4) * 30277.77 - (3/2) * 535.73 > 0
R[7(3),2] = (2/3) * 14401.25 - (3/2) * 535.73 > 0
Shifting city 7 to another cluster does not reduce the error, so we move on to the
next city.
For the 8th city, the values of R[l(i), l] are calculated as follows:
D(8,1)^2 = (74 - 35.33)^2 + (6 + 45)^2 = 4096.37
D(8,2)^2 = (74 + 46.5)^2 + (6 - 40)^2 = 15676.25
D(8,3)^2 = (74 - 29.33)^2 + (6 - 106.33)^2 = 12061.5
R[8(1),2] = (2/3) * 15676.25 - (3/2) * 4096.37 > 0
R[8(1),3] = (3/4) * 12061.5 - (3/2) * 4096.37 > 0
After calculating all the R[l(i), l] values we find that all are positive, so no transfer reduces
the error. The final clusters are therefore as follows.
FINAL SOLUTION
CLUSTER MEMBERS
I C2, C6, C8
II C1, C5
III C3, C4,C7
New cluster centers are
CLUSTER CLUSTER CENTERS
X Y
I 35.33 -45
II -46.5 40
III 29.33 106.33
Error = D(2,1)^2 + D(6,1)^2 + D(8,1)^2 + D(1,2)^2 + D(5,2)^2 + D(3,3)^2
+ D(4,3)^2 + D(7,3)^2
= 748.57 + 4247.7 + 4096.37 + 254.25 + 254.25 + 1024.81 + 476.78 + 535.73
= 11638.46
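The whole transfer procedure can be sketched as a short loop. This is an illustrative reading of the worked example, not a definitive K-means implementation: it scans the cities in order C1 to C8, moves the first city whose R value is negative, recomputes the centers and starts again from C1. The names `find_transfer` and `transfer_until_stable` are assumptions made for the example, and the starting partition is the one obtained from the single measure in the text.

```python
# Coordinates of the eight cities, copied from the table in the text.
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def center(members):
    """Mean X and mean Y of the cities in one cluster."""
    xs = [CITIES[m][0] for m in members]
    ys = [CITIES[m][1] for m in members]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def find_transfer(clusters):
    """Return (city, source, target) for the first move with R < 0, else None."""
    centers = [center(c) for c in clusters]
    for city in CITIES:  # scanned in the order C1 .. C8
        src = next(i for i, c in enumerate(clusters) if city in c)
        n_src = len(clusters[src])
        if n_src == 1:
            continue  # moving the only member would empty the cluster
        loss = n_src * sq_dist(CITIES[city], centers[src]) / (n_src - 1)
        for tgt, members in enumerate(clusters):
            if tgt == src:
                continue
            n_tgt = len(members)
            gain = n_tgt * sq_dist(CITIES[city], centers[tgt]) / (n_tgt + 1)
            if gain - loss < 0:  # this difference is R[l(i), l]
                return city, src, tgt
    return None


def transfer_until_stable(clusters):
    """Apply one transfer at a time until no move reduces the error."""
    clusters = [list(c) for c in clusters]
    while True:
        move = find_transfer(clusters)
        if move is None:
            return clusters
        city, src, tgt = move
        clusters[src].remove(city)
        clusters[tgt].append(city)


start = [["C1", "C2", "C6"], ["C3", "C4", "C5", "C8"], ["C7"]]
final = transfer_until_stable(start)
print([sorted(c) for c in final])
# [['C2', 'C6', 'C8'], ['C1', 'C5'], ['C3', 'C4', 'C7']]
```

Because every accepted transfer strictly lowers the error, the loop cannot cycle and stops once all R values are non-negative.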
APPLICATIONS OF CLUSTER ANALYSIS
1. Segmenting the market
2. Understanding buyer behavior.
3. Product positioning
4. New product development
5. Selecting test markets.