cluster analysis


Uploaded by: vicky225066

Posted on: 31-Dec-2015


CLUSTER ANALYSIS

Cluster analysis is a data analysis tool for classification problems. Its objective is to classify observations into groups (clusters) such that the degree of association is strong between members of the same cluster and weak between members of different clusters. Once the clusters are formed, the behavior of people or objects can be predicted from their cluster membership. Cluster analysis is also called segmentation analysis or taxonomy analysis.

Cluster analysis classifies observations into groups that are not known in advance, whereas discriminant analysis classifies observations into known groups. That is, cluster analysis does not require any a priori information about cluster membership, whereas discriminant analysis requires prior knowledge of the cluster membership of each object.

Clustering procedures

Clustering procedures are broadly classified as hierarchical and non-hierarchical clustering. In hierarchical clustering, the data are not partitioned into a fixed number of clusters in a single step. Instead, a series of partitions takes place, which may run from a single cluster containing all observations to n clusters each containing a single object. The result is a hierarchy of clusters, which can be represented by a dendrogram (tree diagram).

Hierarchical clustering is further subdivided into agglomerative methods, which successively combine individual observations into groups, and divisive methods, which successively divide the n objects into smaller groups.

Agglomerative method

The agglomerative method proceeds through a series of n - 1 merging steps, where n is the number of observations. It starts with n single-observation clusters and ends with one group containing all n observations. At each stage, the agglomerative method joins the two clusters that are closest to each other.


Agglomerative methods include linkage methods, variance methods and centroid methods. The linkage methods are single linkage, complete linkage and average linkage.

SINGLE LINKAGE

This is also known as the nearest-neighbor technique. In this method, the distance between the closest pair of objects, one from each group, is taken as the distance between the two groups.

The distance between two clusters r and s is given by

$$D(r, s) = \min_{i \in r,\; j \in s} d(i, j)$$

where i and j are observations in clusters r and s respectively. In other words, the distance between two clusters is the shortest link between them.
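A minimal Python sketch of the agglomerative procedure with single linkage, assuming the eight-city coordinates used in the problem below. The names CITIES, single_linkage and agglomerate are illustrative only. For simplicity it recomputes each cluster distance from the original city-to-city distances instead of updating a matrix; for single linkage this gives the same distances as the hand method, but it is slow for large data sets.

```python
import math
from itertools import combinations

# Coordinates (X, Y) of the eight cities from the problem table
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def single_linkage(r, s):
    """D(r, s) = min d(i, j) over every city i in cluster r and j in cluster s."""
    return min(math.dist(CITIES[i], CITIES[j]) for i in r for j in s)


def agglomerate(linkage):
    """Perform the n - 1 merging steps; return (r, s, distance) for each merge."""
    clusters = [(name,) for name in CITIES]  # start: n single-city clusters
    merges = []
    while len(clusters) > 1:
        # Join the two clusters that are closest under the chosen linkage
        r, s = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((r, s, linkage(r, s)))
        clusters = [c for c in clusters if c not in (r, s)] + [r + s]
    return merges


for r, s, d in agglomerate(single_linkage):
    print("".join(r), "+", "".join(s), "at", round(d, 1))
```

Passing a different linkage function to agglomerate (for example one that uses max instead of min) changes the method without changing the merging loop.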

PROBLEM: The following table gives the coordinates of 8 cities in India. The objective of the clustering is to group the cities based on their similarities, so that the degree of association is strong within clusters and weak between clusters.

CITIES X Y

C1 -57 28

C2 54 -65

C3 46 79

C4 8 111

C5 -36 52

C6 -22 -76

C7 34 129

C8 74 6

Step 1: Select a distance measure. The distance measure used is Euclidean distance:

$$D(i, j) = \left[ \sum_{k=1}^{p} (x_{ik} - x_{jk})^2 \right]^{1/2}$$

where i and j are individual observations, x_ik is the value of the kth variable for observation i, and p is the number of variables. In our case, p = 2 and i, j range from 1 to 8.

[Figure: agglomerative versus divisive hierarchical clustering of objects A, B, C, D and E]

Step 2: Formulate the distance matrix

CITIES C1 C2 C3 C4 C5 C6 C7 C8

C1 0

C2 144.8 0

C3 114.9 144.2 0

C4 105.4 181.9 49.7 0

C5 31.9 147.6 86.3 73.6 0

C6 109.7 76.8 169.3 189.4 128.8 0

C7 135.9 195.0 51.4 31.6 104.1 212.5 0

C8 132.8 73.8 78.2 124.0 119.2 126.3 129.3 0
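The distance matrix of Step 2 can be reproduced with a few lines of Python. This sketch assumes the coordinates from the problem table; CITIES and D are illustrative names, and math.dist computes the Euclidean distance defined in Step 1.

```python
import math
from itertools import combinations

# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}

# Euclidean distance for every unordered pair of cities (p = 2 variables)
D = {(a, b): math.dist(CITIES[a], CITIES[b]) for a, b in combinations(CITIES, 2)}

# Print the lower triangle of the distance matrix, one row per city
names = list(CITIES)
for i, a in enumerate(names):
    row = [f"{D[(b, a)]:.1f}" for b in names[:i]] + ["0"]
    print(a, " ".join(row))

# Step 3: the pair with the smallest distance is merged first
closest = min(D, key=D.get)
print("closest pair:", closest, round(D[closest], 1))
```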

Step 3: Find the minimum value in the distance matrix and combine those two cities into a cluster. In our case, the minimum is 31.6, between C4 and C7.

Step 4: After forming the cluster, calculate the distance between the new cluster and each of the remaining unclustered cities. With single linkage, these are obtained as follows:

D (C47, C1) = min {D (C4, C1), D (C7, C1)}

= min (105.4, 135.9)

= 105.4

D (C47, C2) = min {D (C4, C2), D (C7, C2)}

= min (181.9, 195)

= 181.9

Similarly, after calculating the rest of the values, the following distance matrix is formed:

CITIES C1 C2 C3 C47 C5 C6 C8

C1 0

C2 144.8 0

C3 114.9 144.2 0

C47 105.4 181.9 49.7 0

C5 31.9 147.6 86.3 73.6 0

C6 109.7 76.8 169.3 189.4 128.8 0

C8 132.8 73.8 78.2 124.0 119.2 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 31.9, which is the distance between C1 and C5. Forming the new cluster C15 and proceeding further, the distance matrix is

CITIES C15 C2 C3 C47 C6 C8

C15 0

C2 144.8 0

C3 86.3 144.2 0

C47 73.6 181.9 49.7 0

C6 109.7 76.8 169.3 189.4 0

C8 119.2 73.8 78.2 124.0 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 49.7, which is the distance between C3 and C47. Forming the new cluster C347 and proceeding further, the distance matrix is

CITIES C15 C2 C347 C6 C8

C15 0

C2 144.8 0

C347 73.6 144.2 0

C6 109.7 76.8 169.3 0

C8 119.2 73.8 78.2 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 73.6, which is the distance between C15 and C347. Forming the new cluster C15347 and proceeding further, the distance matrix is

CITIES C15347 C2 C6 C8

C15347 0

C2 144.2 0

C6 109.7 76.8 0

C8 78.2 73.8 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 73.8, which is the distance between C2 and C8. Forming the new cluster C28 and proceeding further, the distance matrix is

CITIES C15347 C28 C6

C15347 0

C28 78.2 0

C6 109.7 76.8 0

REPEATING STEP 3: The minimum distance in this matrix is 76.8, which is the distance between C28 and C6. Forming the new cluster C268 and proceeding further, the distance matrix is

CITIES C15347 C268

C15347 0

C268 78.2 0

At the last stage, the two clusters are joined at a distance of 78.2 to form a single cluster containing all 8 cities. The dendrogram that summarizes the various stages is shown below.

[Figure: Single linkage dendrogram of the eight cities. Horizontal axis: cities (labelled CUSTOMERS); vertical axis: distance, 0 to 80.]

COMPLETE LINKAGE

This is also called the furthest-neighbor method, and it is the opposite of the single linkage method. Here the distance between two clusters is the maximum distance between any pair of observations, one from each cluster; that is, the distance between the two clusters is the longest link between them:

$$D(r, s) = \max_{i \in r,\; j \in s} d(i, j)$$

where i and j are observations in clusters r and s respectively. The distance between every possible object pair is calculated, and the maximum distance is taken as the distance between the clusters.
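A minimal sketch of complete linkage that follows Steps 2 to 4 literally: it keeps a distance matrix and, after each merge, sets the distance from the new cluster to every other cluster to the larger of the two old distances. It assumes the eight-city coordinates from the problem; the function name complete_linkage_merges is hypothetical.

```python
import math
from itertools import combinations

# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def complete_linkage_merges(points):
    """Agglomerate with complete linkage, updating the distance matrix after each merge."""
    clusters = [(name,) for name in points]
    # Step 2: distance matrix keyed by unordered pairs of clusters
    dist = {frozenset((r, s)): math.dist(points[r[0]], points[s[0]])
            for r, s in combinations(clusters, 2)}
    merges = []
    while len(clusters) > 1:
        # Step 3: the smallest entry identifies the two clusters to join
        pair = min(dist, key=dist.get)
        r, s = sorted(pair, key=clusters.index)
        merges.append((r, s, dist[pair]))
        new = r + s
        others = [c for c in clusters if c not in (r, s)]
        # Step 4: distance to the new cluster is the larger of the two old distances
        for k in others:
            dist[frozenset((new, k))] = max(dist[frozenset((r, k))], dist[frozenset((s, k))])
        # Drop every entry that still refers to r or s
        dist = {p: d for p, d in dist.items() if r not in p and s not in p}
        clusters = others + [new]
    return merges


for r, s, d in complete_linkage_merges(CITIES):
    print("".join(r), "+", "".join(s), "at", round(d, 1))
```

Replacing max with min in the Step 4 update turns the same loop into single linkage.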

PROBLEM: The following table gives the coordinates of 8 cities in India. The objective of the clustering is to group the cities based on their similarities, so that the degree of association is strong within clusters and weak between clusters.

CITIES X Y

C1 -57 28

C2 54 -65

C3 46 79

C4 8 111

C5 -36 52

C6 -22 -76

C7 34 129

C8 74 6

Step 1: Select a distance measure. The distance measure used is Euclidean distance:

$$D(i, j) = \left[ \sum_{k=1}^{p} (x_{ik} - x_{jk})^2 \right]^{1/2}$$

where i and j are individual observations, x_ik is the value of the kth variable for observation i, and p is the number of variables. In our case, p = 2 and i, j range from 1 to 8.

Step 2: Formulate the distance matrix

CITIES C1 C2 C3 C4 C5 C6 C7 C8

C1 0

C2 144.8 0

C3 114.9 144.2 0

C4 105.4 181.9 49.7 0

C5 31.9 147.6 86.3 73.6 0

C6 109.7 76.8 169.3 189.4 128.8 0

C7 135.9 195.0 51.4 31.6 104.1 212.5 0

C8 132.8 73.8 78.2 124.0 119.2 126.3 129.3 0

Step 3: Find the minimum value in the distance matrix and combine those two cities into a cluster. In our case, the minimum is 31.6, between C4 and C7.

Step 4: After forming the cluster, calculate the distance between the new cluster and each of the remaining unclustered cities. With complete linkage, these are obtained as follows:

D (C47, C1) = max {D (C4, C1), D (C7, C1)}

= max (105.4, 135.9)

= 135.9

D (C47, C2) = max {D (C4, C2), D (C7, C2)}

= max (181.9, 195)

= 195

Similarly, after calculating the rest of the values, the following distance matrix is formed:

CITIES C1 C2 C3 C47 C5 C6 C8

C1 0

C2 144.8 0

C3 114.9 144.2 0

C47 135.9 195.0 51.4 0

C5 31.9 147.6 86.3 104.1 0

C6 109.7 76.8 169.3 212.5 128.8 0

C8 132.8 73.8 78.2 129.3 119.2 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 31.9, which is the distance between C1 and C5. Forming the new cluster C15 and proceeding further, the distance matrix is

CITIES C15 C2 C3 C47 C6 C8

C15 0

C2 147.6 0

C3 114.9 144.2 0

C47 135.9 195.0 51.4 0

C6 128.8 76.8 169.3 212.5 0

C8 132.8 73.8 78.2 129.3 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 51.4, which is the distance between C3 and C47. Forming the new cluster C347 and proceeding further, the distance matrix is

CITIES C15 C2 C347 C6 C8

C15 0

C2 147.6 0

C347 135.9 195.0 0

C6 128.8 76.8 212.5 0

C8 132.8 73.8 129.3 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 73.8, which is the distance between C2 and C8. Forming the new cluster C28 and proceeding further, the distance matrix is

CITIES C15 C28 C347 C6

C15 0

C28 147.6 0

C347 135.9 195.0 0

C6 128.8 126.3 212.5 0

REPEATING STEP 3: The minimum distance in this matrix is 126.3, which is the distance between C28 and C6. Forming the new cluster C268 and proceeding further, the distance matrix is

CITIES C15 C268 C347

C15 0

C268 147.6 0

C347 135.9 212.5 0

REPEATING STEP 3: The minimum distance in this matrix is 135.9, which is the distance between C15 and C347. Forming the new cluster C15347 and proceeding further, the distance matrix is

CITIES C15347 C268

C15347 0

C268 212.5 0

At the last stage, the two clusters are joined at a distance of 212.5 to form a single cluster containing all 8 cities. The dendrogram that summarizes the various stages is shown below.

[Figure: Complete linkage dendrogram of the eight cities. Horizontal axis: cities (labelled CUSTOMERS); vertical axis: distance.]


AVERAGE LINKAGE

In this method, the distance between two clusters is defined as the average distance between all possible pairs of objects, where each pair is made up of one object from each cluster. The distance is given by

$$D(r, s) = \frac{1}{n_r n_s} \sum_{i \in r} \sum_{j \in s} d(i, j)$$

where d(i, j) is the distance between object i in cluster r and object j in cluster s, the summation is over all possible pairings of objects between the two clusters, and n_r and n_s are the numbers of objects in clusters r and s.
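The average linkage formula translates directly into code. This sketch assumes the eight-city coordinates from the problem; average_linkage is an illustrative name. It evaluates three cluster distances that arise in the worked example; because it uses unrounded city distances, the results can differ from the hand calculation (which uses one-decimal distances) in the second decimal place.

```python
import math

# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def average_linkage(r, s):
    """(1 / (n_r * n_s)) times the sum of d(i, j) over all i in r and j in s."""
    total = sum(math.dist(CITIES[i], CITIES[j]) for i in r for j in s)
    return total / (len(r) * len(s))


print(round(average_linkage(["C4", "C7"], ["C1"]), 2))                          # D(C47, C1)
print(round(average_linkage(["C1", "C5"], ["C3", "C4", "C7"]), 2))              # D(C15, C347)
print(round(average_linkage(["C1", "C5", "C3", "C4", "C7"], ["C2", "C6", "C8"]), 2))  # D(C15347, C268)
```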

PROBLEM: The following table gives the coordinates of 8 cities in India. The objective of the clustering is to group the cities based on their similarities, so that the degree of association is strong within clusters and weak between clusters.

CITIES X Y

C1 -57 28

C2 54 -65

C3 46 79

C4 8 111

C5 -36 52

C6 -22 -76

C7 34 129

C8 74 6

Step 1: Select a distance measure. The distance measure used is Euclidean distance:

$$D(i, j) = \left[ \sum_{k=1}^{p} (x_{ik} - x_{jk})^2 \right]^{1/2}$$

where i and j are individual observations, x_ik is the value of the kth variable for observation i, and p is the number of variables. In our case, p = 2 and i, j range from 1 to 8.

Step 2: Formulate the distance matrix

CITIES C1 C2 C3 C4 C5 C6 C7 C8

C1 0

C2 144.8 0

C3 114.9 144.2 0

C4 105.4 181.9 49.7 0

C5 31.9 147.6 86.3 73.6 0

C6 109.7 76.8 169.3 189.4 128.8 0

C7 135.9 195.0 51.4 31.6 104.1 212.5 0

C8 132.8 73.8 78.2 124.0 119.2 126.3 129.3 0

Step 3: Find the minimum value in the distance matrix and combine those two cities into a cluster. In our case, the minimum is 31.6, between C4 and C7.

Step 4: After forming the cluster, calculate the distance between the new cluster and each of the remaining unclustered cities using

$$D(r, s) = \frac{1}{n_r n_s} \sum_{i \in r} \sum_{j \in s} d(i, j)$$

where n_r and n_s are the numbers of objects in clusters r and s, and d(i, j) is the distance between object i of the first cluster and object j of the second cluster.

In our case:

D(C47, C1) = [1 / (2 * 1)] * (105.4 + 135.9) = 120.65

D(C47, C2) = [1 / (2 * 1)] * (181.9 + 195.0) = 188.45

Similarly, after calculating the other distances, the distance matrix is as follows:

CITIES C1 C2 C3 C47 C5 C6 C8

C1 0

C2 144.8 0

C3 114.9 144.2 0

C47 120.65 188.45 50.55 0

C5 31.9 147.6 86.3 88.85 0

C6 109.7 76.8 169.3 200.95 128.8 0

C8 132.8 73.8 78.2 126.65 119.2 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 31.9, which is the distance between C1 and C5. Forming the new cluster C15 and proceeding further, the distance matrix is as follows (for example, D(C15, C47) = [1 / (2 * 2)] * (105.4 + 135.9 + 73.6 + 104.1) = 104.75):

CITIES C15 C2 C3 C47 C6 C8

C15 0

C2 146.2 0

C3 100.6 144.2 0

C47 104.75 188.45 50.55 0

C6 119.25 76.8 169.3 200.95 0

C8 126.0 73.8 78.2 126.65 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 50.55, which is the distance between C47 and C3. Forming the new cluster C347 and proceeding further, the distance matrix is as follows (for example, D(C15, C347) = [1 / (2 * 3)] * (114.9 + 105.4 + 135.9 + 86.3 + 73.6 + 104.1) = 103.37):

CITIES C15 C2 C347 C6 C8

C15 0

C2 146.2 0

C347 103.37 173.7 0

C6 119.25 76.8 190.4 0

C8 126.0 73.8 110.5 126.3 0

REPEATING STEP 3: The minimum distance in this matrix is 73.8, which is the distance between C2 and C8. Forming the new cluster C28 and proceeding further, the distance matrix is

CITIES C15 C28 C347 C6

C15 0

C28 136.1 0

C347 103.37 142.1 0

C6 119.25 101.55 190.4 0

REPEATING STEP 3: The minimum distance in this matrix is 101.55, which is the distance between C28 and C6. Forming the new cluster C268 and proceeding further, the distance matrix is

CITIES C15 C268 C347

C15 0

C268 130.48 0

C347 103.37 158.2 0

REPEATING STEP 3: The minimum distance in this matrix is 103.37, which is the distance between C15 and C347. Forming the new cluster C15347 and proceeding further, the distance matrix is

CITIES C15347 C268

C15347 0

C268 147.11 0

At the last stage, the two clusters are joined at a distance of 147.11 to form a single cluster containing all 8 cities. The dendrogram that summarizes the various stages is shown below.

[Figure: Average linkage dendrogram of the eight cities. Horizontal axis: cities (labelled CUSTOMERS); vertical axis: distance, 0 to 200.]
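The worked example computes each average-linkage distance from the original city-to-city distances. An equivalent shortcut, sketched here and not used in the worked example, updates the matrix by weighting the two old distances by cluster size: D(k, r+s) = [n_r D(k, r) + n_s D(k, s)] / (n_r + n_s). It assumes the eight-city coordinates; average_linkage_merges is a hypothetical name. Because it uses unrounded distances, its merge distances can differ from the hand-computed ones in the second decimal place.

```python
import math
from itertools import combinations

# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def average_linkage_merges(points):
    """Average linkage with a size-weighted matrix update after each merge."""
    clusters = [(name,) for name in points]
    dist = {frozenset((r, s)): math.dist(points[r[0]], points[s[0]])
            for r, s in combinations(clusters, 2)}
    merges = []
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        r, s = sorted(pair, key=clusters.index)
        merges.append((r, s, dist[pair]))
        new = r + s
        others = [c for c in clusters if c not in (r, s)]
        for k in others:
            # D(k, r+s) = [n_r * D(k, r) + n_s * D(k, s)] / (n_r + n_s)
            dist[frozenset((new, k))] = (len(r) * dist[frozenset((r, k))]
                                         + len(s) * dist[frozenset((s, k))]) / len(new)
        dist = {p: d for p, d in dist.items() if r not in p and s not in p}
        clusters = others + [new]
    return merges


for r, s, d in average_linkage_merges(CITIES):
    print("".join(r), "+", "".join(s), "at", round(d, 2))
```

The update gives exactly the all-pairs average because the sum over pairs between r+s and k splits into the sums for r and for s.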

NON-HIERARCHICAL CLUSTERING

Non-hierarchical clustering partitions a set of units into a predetermined number of groups, using an iterative algorithm, so that the variation within groups is minimized and the variation between groups is maximized. Starting from an initial classification, units are transferred from one group to another until no further improvement can be made.

K-MEANS CLUSTERING

Algorithm for K-Means

Step 1: Decide the number of final clusters. This is represented by k.

Step 2: Form the initial clusters by subdividing the complete data into k groups. This is done by computing, for each observation, the single measure

$$\frac{k\,[\mathrm{Sum}(i) - \mathrm{Min}]}{\mathrm{Max} - \mathrm{Min}} + 1$$

where i refers to the individual observations, Sum(i) is the sum of the variable values of the ith observation, Max is the maximum of the Sum(i) values and Min is the minimum of the Sum(i) values. The measure therefore ranges from 1 to k + 1.

Step 3: Compute the cluster centers.

Step 4: Compute the error for the partition, which is given by

$$E[P(n, k)] = \sum_{i=1}^{n} D[i, l(i)]^2$$

where l(i) is the cluster containing the ith case, D[i, l(i)] is the distance of case i from the center of that cluster, k is the number of clusters and n is the number of cases.

Step 5: Check whether moving any case from one cluster to another reduces E. The following value is calculated for each case i and each other cluster l:

$$R[l(i), l] = \frac{n(l)\, D(i, l)^2}{n(l) + 1} - \frac{n(l(i))\, D(i, l(i))^2}{n(l(i)) - 1}$$

where n(l) is the number of cases in the lth cluster and l(i) is the cluster containing the ith case. A negative R means that moving case i to cluster l reduces E by |R|.

Step 6: Repeat Steps 3, 4 and 5 until no movement of a case from one cluster to another reduces E.

Step 7: The resulting partition gives the final clusters. Compute the final cluster centers.

PROBLEM: The following table gives the coordinates of 8 cities in India. The objective of the clustering is to group the cities based on their similarities, so that the degree of association is strong within clusters and weak between clusters.

CITIES X Y

C1 -57 28

C2 54 -65

C3 46 79

C4 8 111

C5 -36 52

C6 -22 -76

C7 34 129

C8 74 6

Step 1: First, the number of clusters has to be decided. This is done arbitrarily; in our case, let us take k = 3. Next, the cities in each of the k clusters have to be decided. To do that, the two variables are converted to a single measure using the formula

$$\frac{k\,[\mathrm{Sum}(i) - \mathrm{Min}]}{\mathrm{Max} - \mathrm{Min}} + 1$$

where Sum(i) is the sum of the variables of observation i, Min is the minimum of Sum(i) and Max is the maximum of Sum(i).

In our case it is as follows

CITIES X Y Sum Single measure

C1 -57 28 -29 1.793

C2 54 -65 -11 2

C3 46 79 125 3.563

C4 8 111 119 3.494

C5 -36 52 16 2.310

C6 -22 -76 -98 1

C7 34 129 163 4

C8 74 6 80 3.046

Sample calculation

K = 3 max = 163 min = - 98 sum (i) = -29

Measure = (3 * [-29 + 98] / [163 + 98]) + 1

= 1.793
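The single-measure calculation for all eight cities can be sketched in Python as follows, assuming the coordinates from the table; single_measures is a hypothetical name. The sketch only computes the measure; the assignment of cities to the initial clusters is taken from the grouping table.

```python
# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def single_measures(points, k):
    """Map each observation to k * (Sum(i) - Min) / (Max - Min) + 1."""
    sums = {name: sum(values) for name, values in points.items()}  # Sum(i)
    lo, hi = min(sums.values()), max(sums.values())                 # Min, Max
    return {name: k * (s - lo) / (hi - lo) + 1 for name, s in sums.items()}


measures = single_measures(CITIES, k=3)
for name, m in measures.items():
    print(name, round(m, 3))
```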

Grouping the cities into clusters based on these measures results in the following clusters:

CLUSTER MEMBERS

I C1, C2, C6

II C3, C4, C5, C8

III C7


Step 2: Find the cluster centers

CLUSTER CLUSTER CENTERS

X Y

I -8.33 -37.67

II 23 62

III 34 129

For cluster I,

Center = (C1 + C2 + C6) / 3

= [(-57 + 54 - 22) / 3, (28 - 65 - 76) / 3]

= (-25/3, -113/3) = (-8.33, -37.67)

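Step 3: Find the distance between the ith observation and the kth cluster center:

$$D(i, k) = \left[ \sum_{j} \{ a(i, j) - b(k, j) \}^2 \right]^{1/2}$$

where a(i, j) is the value of the jth variable for observation i and b(k, j) is the jth coordinate of the center of cluster k. For city 1 and cluster I, whose center is (-25/3, -113/3):

D(1, 1) = [(-57 + 25/3)² + (28 + 113/3)²]^(1/2) = 81.73

Similarly, calculate the distances for all the other observations.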

Step 4: The error for a particular partition is

$$E[P(n, k)] = \sum_{i=1}^{n} D[i, l(i)]^2$$

where l(i) is the cluster containing the ith case, n is the number of observations (here n = 8) and k is the number of clusters (here k = 3).

E[P(8, 3)] = D[1,1]² + D[2,1]² + D[3,2]² + D[4,2]² + D[5,2]² + D[6,1]² + D[7,3]² + D[8,2]²

= 25731.33
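The center and error calculations can be sketched as follows, assuming the coordinates and the initial partition listed above; center, partition_error and PARTITION are illustrative names. The centers are kept unrounded.

```python
# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}
# Initial partition obtained from the single measure
PARTITION = {"I": ["C1", "C2", "C6"], "II": ["C3", "C4", "C5", "C8"], "III": ["C7"]}


def center(members):
    """Cluster center: mean X and mean Y of the member cities."""
    xs, ys = zip(*(CITIES[c] for c in members))
    return sum(xs) / len(xs), sum(ys) / len(ys)


def partition_error(partition):
    """E[P(n, k)]: sum of squared distances from each city to its own cluster center."""
    total = 0.0
    for members in partition.values():
        cx, cy = center(members)
        total += sum((CITIES[c][0] - cx) ** 2 + (CITIES[c][1] - cy) ** 2 for c in members)
    return total


for name, members in PARTITION.items():
    print(name, [round(v, 2) for v in center(members)])
print("E =", round(partition_error(PARTITION), 2))
```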

Step 5: We now check whether moving any city from one cluster to another reduces the error E. For each city i and each other cluster l we calculate

$$R[l(i), l] = \frac{n(l)\, D(i, l)^2}{n(l) + 1} - \frac{n(l(i))\, D(i, l(i))^2}{n(l(i)) - 1}$$

where n(l) is the number of observations in the lth cluster and l(i) is the cluster containing the ith observation. A negative value of R means that the move reduces E.
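As a check on the R formula, the sketch below computes R for moving city 1 from cluster I to cluster II and compares it with the change in E obtained by recomputing the error of both partitions. It assumes the eight-city coordinates and uses unrounded centers; the function names are illustrative. The formula is only defined when the city's current cluster has more than one member, because of the n(l(i)) - 1 denominator.

```python
# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def center(members):
    """Mean X and mean Y of the member cities."""
    xs, ys = zip(*(CITIES[c] for c in members))
    return sum(xs) / len(xs), sum(ys) / len(ys)


def sq_dist(city, point):
    """Squared Euclidean distance between a city and a point."""
    (x, y), (px, py) = CITIES[city], point
    return (x - px) ** 2 + (y - py) ** 2


def error(partition):
    """E: sum of squared distances of each city to its own cluster center."""
    return sum(sq_dist(c, center(m)) for m in partition for c in m)


def transfer_value(city, own, other):
    """R[l(i), l]: change in E if `city` moves from cluster `own` to cluster `other`."""
    n_own, n_other = len(own), len(other)
    return (n_other / (n_other + 1) * sq_dist(city, center(other))
            - n_own / (n_own - 1) * sq_dist(city, center(own)))


before = [["C1", "C2", "C6"], ["C3", "C4", "C5", "C8"], ["C7"]]
after = [["C2", "C6"], ["C1", "C3", "C4", "C5", "C8"], ["C7"]]
R = transfer_value("C1", before[0], before[1])
print(round(R, 2), round(error(after) - error(before), 2))
```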

Iteration 1: For the 1st city, the calculation is as follows:

D(1,1)² = (-57 + 25/3)² + (28 + 113/3)² = 6680.56

D(1,2)² = (-57 - 23)² + (28 - 62)² = 7556

D(1,3)² = (-57 - 34)² + (28 - 129)² = 18482

R[1(1),2] = (4/5) * 7556 - (3/2) * 6680.56 = -3976.03

R[1(1),3] = (1/2) * 18482 - (3/2) * 6680.56 = -779.83

Both moves reduce the error, but the move to cluster II reduces it more, so we move city 1 from cluster I to cluster II.

Iteration 2

CLUSTER MEMBERS

I C2, C6

II C1, C3, C4, C5, C8

III C7

New cluster centers are

CLUSTER CLUSTER CENTERS

X Y

I 16 -70.5

II 7 55.2

III 34 129

For the 1st city, the calculation of R[l(i), l] is as follows:

D(1,1)² = (-57 - 16)² + (28 + 70.5)² = 15031.25

D(1,2)² = (-57 - 7)² + (28 - 55.2)² = 4835.84

D(1,3)² = (-57 - 34)² + (28 - 129)² = 18482

R[1(2),1] = (2/3) * 15031.25 - (5/4) * 4835.84 > 0

R[1(2),3] = (1/2) * 18482 - (5/4) * 4835.84 > 0

Shifting city 1 to another cluster does not reduce the error, so we move on to the next city.

For the 2nd city, the calculation of R[l(i), l] is as follows:

D(2,1)² = (54 - 16)² + (-65 + 70.5)² = 1474.25

D(2,2)² = (54 - 7)² + (-65 - 55.2)² = 16657.04

D(2,3)² = (54 - 34)² + (-65 - 129)² = 38036

R[2(1),2] = (5/6) * 16657.04 - (2/1) * 1474.25 > 0

R[2(1),3] = (1/2) * 38036 - (2/1) * 1474.25 > 0

Shifting city 2 to another cluster does not reduce the error, so we move on to the next city.

For the 3rd city, the calculation of R[l(i), l] is as follows:

D(3,1)² = (46 - 16)² + (79 + 70.5)² = 23250.25

D(3,2)² = (46 - 7)² + (79 - 55.2)² = 2087.44

D(3,3)² = (46 - 34)² + (79 - 129)² = 2644

R[3(2),1] = (2/3) * 23250.25 - (5/4) * 2087.44 > 0

R[3(2),3] = (1/2) * 2644 - (5/4) * 2087.44 < 0

Shifting city 3 to cluster III reduces the error. Therefore we move city 3 to cluster III.

Iteration 3

CLUSTER MEMBERS

I C2, C6

II C1, C4, C5, C8

III C3, C7

New cluster centers are

CLUSTER CLUSTER CENTERS

X Y

I 16 -70.5

II -2.75 49.25

III 40 104

For the 1st city, the calculation of R[l(i), l] is as follows:

D(1,1)² = (-57 - 16)² + (28 + 70.5)² = 15031.25

D(1,2)² = (-57 + 2.75)² + (28 - 49.25)² = 3394.625

D(1,3)² = (-57 - 40)² + (28 - 104)² = 15185

R[1(2),1] = (2/3) * 15031.25 - (4/3) * 3394.625 > 0

R[1(2),3] = (2/3) * 15185 - (4/3) * 3394.625 > 0

Shifting city 1 to another cluster does not reduce the error, so we move on to the next city.

For the 2nd city, the calculation of R[l(i), l] is as follows:

D(2,1)² = (54 - 16)² + (-65 + 70.5)² = 1474.25

D(2,2)² = (54 + 2.75)² + (-65 - 49.25)² = 16273.625

D(2,3)² = (54 - 40)² + (-65 - 104)² = 28757

R[2(1),2] = (4/5) * 16273.625 - (2/1) * 1474.25 > 0

R[2(1),3] = (2/3) * 28757 - (2/1) * 1474.25 > 0

Shifting city 2 to another cluster does not reduce the error, so we move on to the next city.

For the 3rd city, the calculation of R[l(i), l] is as follows:

D(3,1)² = (46 - 16)² + (79 + 70.5)² = 23250.25

D(3,2)² = (46 + 2.75)² + (79 - 49.25)² = 3261.625

D(3,3)² = (46 - 40)² + (79 - 104)² = 661

R[3(3),1] = (2/3) * 23250.25 - (2/1) * 661 > 0

R[3(3),2] = (4/5) * 3261.625 - (2/1) * 661 > 0

Shifting city 3 to another cluster does not reduce the error, so we move on to the next city.

For the 4th city, the calculation of R[l(i), l] is as follows:

D(4,1)² = (8 - 16)² + (111 + 70.5)² = 33006.25

D(4,2)² = (8 + 2.75)² + (111 - 49.25)² = 3928.625

D(4,3)² = (8 - 40)² + (111 - 104)² = 1073

R[4(2),1] = (2/3) * 33006.25 - (4/3) * 3928.625 > 0

R[4(2),3] = (2/3) * 1073 - (4/3) * 3928.625 < 0

Shifting city 4 to cluster III reduces the error, so we move city 4 to cluster III.

Iteration 4

CLUSTER MEMBERS

I C2, C6

II C1, C5, C8

III C3, C4,C7

New cluster centers are

CLUSTER CLUSTER CENTERS

X Y

I 16 -70.5

II -6.33 28.67

III 29.33 106.33

(Exact values: II = (-19/3, 86/3) and III = (88/3, 319/3).)

For the 1st city, the calculation of R[l(i), l] is as follows:

D(1,1)² = (-57 - 16)² + (28 + 70.5)² = 15031.25

D(1,2)² = (-57 + 19/3)² + (28 - 86/3)² = 2567.56

D(1,3)² = (-57 - 88/3)² + (28 - 319/3)² = 13589.56

R[1(2),1] = (2/3) * 15031.25 - (3/2) * 2567.56 > 0

R[1(2),3] = (3/4) * 13589.56 - (3/2) * 2567.56 > 0

Shifting city 1 to another cluster does not reduce the error, so we move on to the next city.

For the 2nd city, the calculation of R[l(i), l] is as follows:

D(2,1)² = (54 - 16)² + (-65 + 70.5)² = 1474.25

D(2,2)² = (54 + 19/3)² + (-65 - 86/3)² = 12413.56

D(2,3)² = (54 - 88/3)² + (-65 - 319/3)² = 29963.56

R[2(1),2] = (3/4) * 12413.56 - (2/1) * 1474.25 > 0

R[2(1),3] = (3/4) * 29963.56 - (2/1) * 1474.25 > 0

Shifting city 2 to another cluster does not reduce the error, so we move on to the next city.

For the 3rd city, the calculation of R[l(i), l] is as follows:

D(3,1)² = (46 - 16)² + (79 + 70.5)² = 23250.25

D(3,2)² = (46 + 19/3)² + (79 - 86/3)² = 5272.22

D(3,3)² = (46 - 88/3)² + (79 - 319/3)² = 1024.89

R[3(3),1] = (2/3) * 23250.25 - (3/2) * 1024.89 > 0

R[3(3),2] = (3/4) * 5272.22 - (3/2) * 1024.89 > 0

Shifting city 3 to another cluster does not reduce the error, so we move on to the next city.

For the 4th city, the calculation of R[l(i), l] is as follows:

D(4,1)² = (8 - 16)² + (111 + 70.5)² = 33006.25

D(4,2)² = (8 + 19/3)² + (111 - 86/3)² = 6984.22

D(4,3)² = (8 - 88/3)² + (111 - 319/3)² = 476.89

R[4(3),1] = (2/3) * 33006.25 - (3/2) * 476.89 > 0

R[4(3),2] = (3/4) * 6984.22 - (3/2) * 476.89 > 0

Shifting city 4 to another cluster does not reduce the error, so we move on to the next city.

For the 5th city, the calculation of R[l(i), l] is as follows:

D(5,1)² = (-36 - 16)² + (52 + 70.5)² = 17710.25

D(5,2)² = (-36 + 19/3)² + (52 - 86/3)² = 1424.56

D(5,3)² = (-36 - 88/3)² + (52 - 319/3)² = 7220.56

R[5(2),1] = (2/3) * 17710.25 - (3/2) * 1424.56 > 0

R[5(2),3] = (3/4) * 7220.56 - (3/2) * 1424.56 > 0

Shifting city 5 to another cluster does not reduce the error, so we move on to the next city.

For the 6th city, the calculation of R[l(i), l] is as follows:

D(6,1)² = (-22 - 16)² + (-76 + 70.5)² = 1474.25

D(6,2)² = (-22 + 19/3)² + (-76 - 86/3)² = 11200.56

D(6,3)² = (-22 - 88/3)² + (-76 - 319/3)² = 35880.56

R[6(1),2] = (3/4) * 11200.56 - (2/1) * 1474.25 > 0

R[6(1),3] = (3/4) * 35880.56 - (2/1) * 1474.25 > 0

Shifting city 6 to another cluster does not reduce the error, so we move on to the next city.

For the 7th city, the calculation of R[l(i), l] is as follows:

D(7,1)² = (34 - 16)² + (129 + 70.5)² = 40124.25

D(7,2)² = (34 + 19/3)² + (129 - 86/3)² = 11693.56

D(7,3)² = (34 - 88/3)² + (129 - 319/3)² = 535.56

R[7(3),1] = (2/3) * 40124.25 - (3/2) * 535.56 > 0

R[7(3),2] = (3/4) * 11693.56 - (3/2) * 535.56 > 0

Shifting city 7 to another cluster does not reduce the error, so we move on to the next city.

For the 8th city, the calculation of R[l(i), l] is as follows:

D(8,1)² = (74 - 16)² + (6 + 70.5)² = 9216.25

D(8,2)² = (74 + 19/3)² + (6 - 86/3)² = 6967.22

D(8,3)² = (74 - 88/3)² + (6 - 319/3)² = 12061.89

R[8(2),1] = (2/3) * 9216.25 - (3/2) * 6967.22 = -4306.67 < 0

R[8(2),3] = (3/4) * 12061.89 - (3/2) * 6967.22 = -1404.42 < 0

Both moves reduce the error, but the move to cluster I reduces it more, so we reallocate city 8 to cluster I.

Iteration 5

CLUSTER MEMBERS

I C2, C6, C8

II C1, C5

III C3, C4,C7

New cluster centers are

CLUSTER CLUSTER CENTERS

X Y

I 35.33 -45

II -46.5 40

III 29.33 106.33

(Exact values: I = (106/3, -45) and III = (88/3, 319/3).)

For the 1st city, the calculation of R[l(i), l] is as follows:

D(1,1)² = (-57 - 106/3)² + (28 + 45)² = 13854.44

D(1,2)² = (-57 + 46.5)² + (28 - 40)² = 254.25

D(1,3)² = (-57 - 88/3)² + (28 - 319/3)² = 13589.56

R[1(2),1] = (3/4) * 13854.44 - (2/1) * 254.25 > 0

R[1(2),3] = (3/4) * 13589.56 - (2/1) * 254.25 > 0

Shifting city 1 to another cluster does not reduce the error, so we move on to the next city.

For the 2nd city, the calculation of R[l(i), l] is as follows:

D(2,1)² = (54 - 106/3)² + (-65 + 45)² = 748.44

D(2,2)² = (54 + 46.5)² + (-65 - 40)² = 21125.25

D(2,3)² = (54 - 88/3)² + (-65 - 319/3)² = 29963.56

R[2(1),2] = (2/3) * 21125.25 - (3/2) * 748.44 > 0

R[2(1),3] = (3/4) * 29963.56 - (3/2) * 748.44 > 0

Shifting city 2 to another cluster does not reduce the error, so we move on to the next city.

For the 3rd city, the calculation of R[l(i), l] is as follows:

D(3,1)² = (46 - 106/3)² + (79 + 45)² = 15489.78

D(3,2)² = (46 + 46.5)² + (79 - 40)² = 10077.25

D(3,3)² = (46 - 88/3)² + (79 - 319/3)² = 1024.89

R[3(3),1] = (3/4) * 15489.78 - (3/2) * 1024.89 > 0

R[3(3),2] = (2/3) * 10077.25 - (3/2) * 1024.89 > 0

Shifting city 3 to another cluster does not reduce the error, so we move on to the next city.

For the 4th city, the calculation of R[l(i), l] is as follows:

D(4,1)² = (8 - 106/3)² + (111 + 45)² = 25083.11

D(4,2)² = (8 + 46.5)² + (111 - 40)² = 8011.25

D(4,3)² = (8 - 88/3)² + (111 - 319/3)² = 476.89

R[4(3),1] = (3/4) * 25083.11 - (3/2) * 476.89 > 0

R[4(3),2] = (2/3) * 8011.25 - (3/2) * 476.89 > 0

Shifting city 4 to another cluster does not reduce the error, so we move on to the next city.

For the 5th city, the calculation of R[l(i), l] is as follows:

D(5,1)² = (-36 - 106/3)² + (52 + 45)² = 14497.44

D(5,2)² = (-36 + 46.5)² + (52 - 40)² = 254.25

D(5,3)² = (-36 - 88/3)² + (52 - 319/3)² = 7220.56

R[5(2),1] = (3/4) * 14497.44 - (2/1) * 254.25 > 0

R[5(2),3] = (3/4) * 7220.56 - (2/1) * 254.25 > 0

Shifting city 5 to another cluster does not reduce the error, so we move on to the next city.

For the 6th city, the calculation of R[l(i), l] is as follows:

D(6,1)² = (-22 - 106/3)² + (-76 + 45)² = 4248.11

D(6,2)² = (-22 + 46.5)² + (-76 - 40)² = 14056.25

D(6,3)² = (-22 - 88/3)² + (-76 - 319/3)² = 35880.56

R[6(1),2] = (2/3) * 14056.25 - (3/2) * 4248.11 > 0

R[6(1),3] = (3/4) * 35880.56 - (3/2) * 4248.11 > 0

Shifting city 6 to another cluster does not reduce the error, so we move on to the next city.

For the 7th city, the calculation of R[l(i), l] is as follows:

D(7,1)² = (34 - 106/3)² + (129 + 45)² = 30277.78

D(7,2)² = (34 + 46.5)² + (129 - 40)² = 14401.25

D(7,3)² = (34 - 88/3)² + (129 - 319/3)² = 535.56

R[7(3),1] = (3/4) * 30277.78 - (3/2) * 535.56 > 0

R[7(3),2] = (2/3) * 14401.25 - (3/2) * 535.56 > 0

Shifting city 7 to another cluster does not reduce the error, so we move on to the next city.

For the 8th city, the calculation of R[l(i), l] is as follows:

D(8,1)² = (74 - 106/3)² + (6 + 45)² = 4096.11

D(8,2)² = (74 + 46.5)² + (6 - 40)² = 15676.25

D(8,3)² = (74 - 88/3)² + (6 - 319/3)² = 12061.89

R[8(1),2] = (2/3) * 15676.25 - (3/2) * 4096.11 > 0

R[8(1),3] = (3/4) * 12061.89 - (3/2) * 4096.11 > 0

After calculating all the values of R[l(i), l], we find that all are positive, so no move reduces the error. Therefore the final clusters are:

FINAL SOLUTION

CLUSTER MEMBERS

I C2, C6, C8

II C1, C5

III C3, C4,C7

The final cluster centers are

CLUSTER CLUSTER CENTERS

X Y

I 35.33 -45

II -46.5 40

III 29.33 106.33

Error = D(2,1)² + D(6,1)² + D(8,1)² + D(1,2)² + D(5,2)² + D(3,3)² + D(4,3)² + D(7,3)²

= 748.44 + 4248.11 + 4096.11 + 254.25 + 254.25 + 1024.89 + 476.89 + 535.56

= 11638.50
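The whole transfer procedure can be sketched in Python. This is an illustration under stated assumptions, not a reference implementation: it starts from the initial partition given by the single measure, scans the cities in order, moves the first city that has a negative R to the cluster with the most negative R, recomputes the centers and restarts the scan, and never empties a cluster. All function names are hypothetical.

```python
# Coordinates (X, Y) of the eight cities
CITIES = {
    "C1": (-57, 28), "C2": (54, -65), "C3": (46, 79), "C4": (8, 111),
    "C5": (-36, 52), "C6": (-22, -76), "C7": (34, 129), "C8": (74, 6),
}


def center(members):
    xs, ys = zip(*(CITIES[c] for c in members))
    return sum(xs) / len(xs), sum(ys) / len(ys)


def sq_dist(city, point):
    (x, y), (px, py) = CITIES[city], point
    return (x - px) ** 2 + (y - py) ** 2


def error(clusters):
    """E: sum of squared distances of each city to its own cluster center."""
    return sum(sq_dist(c, center(m)) for m in clusters for c in m)


def transfer_value(city, own, other):
    """R[l(i), l]: change in E if `city` moves from `own` to `other`."""
    n_own, n_other = len(own), len(other)
    return (n_other / (n_other + 1) * sq_dist(city, center(other))
            - n_own / (n_own - 1) * sq_dist(city, center(own)))


def k_means_transfer(clusters):
    """Repeat full scans until no single move reduces E."""
    clusters = [list(m) for m in clusters]
    moved = True
    while moved:
        moved = False
        for city in CITIES:
            own = next(m for m in clusters if city in m)
            if len(own) == 1:
                continue  # moving the only member would empty the cluster
            value, target = min(
                ((transfer_value(city, own, m), m) for m in clusters if m is not own),
                key=lambda pair: pair[0],
            )
            if value < 0:
                own.remove(city)
                target.append(city)
                moved = True
                break  # centers change, so restart the scan from the first city
    return clusters


final = k_means_transfer([["C1", "C2", "C6"], ["C3", "C4", "C5", "C8"], ["C7"]])
print([sorted(m) for m in final], round(error(final), 2))
```

The scanning order and the choice of the most negative R are design decisions; a different order can end in a different local optimum.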


APPLICATIONS OF CLUSTER ANALYSIS

1. Segmenting the market

2. Understanding buyer behavior.

3. Product positioning

4. New product development

5. Selecting test markets.
