classification & clustering
Computer Science
Universiteit Maastricht
Institute for Knowledge and Agent Technology
Classification & Clustering
Pieter Spronck
http://www.cs.unimaas.nl/p.spronck
22 Apr 2023
Binary Division of Marbles
Big vs. Small
Transparent vs. Opaque
Marble Attributes
- Size (big vs. small)
- Transparency (transparent vs. opaque)
- Shininess (shiny vs. dull)
- Colouring (monochrome vs. polychrome)
- Colour (blue, green, yellow, …)
- …
Grouping of Marbles
“Marbles”
“Honouring All Distinctions”
“Colour Coding”
“Natural Grouping”
if transparent then
    if coloured glass then group 1
    else group 3
else group 2
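The grouping rule on this slide can be written as a small function. This is only a sketch: the function name `natural_group` and the two boolean parameters are illustrative choices, not names from the slides.

```python
def natural_group(transparent: bool, coloured_glass: bool) -> int:
    """Return the group number (1, 2 or 3) for a marble.

    Mirrors the slide's rule: opaque marbles form group 2; transparent
    marbles are split by whether the glass is coloured (1) or clear (3).
    """
    if transparent:
        return 1 if coloured_glass else 3
    return 2


# The glass colour is irrelevant for opaque marbles, so both calls give 2.
print(natural_group(False, True), natural_group(False, False))
```

Note that the rule is a two-level decision tree, which is the same structure as a dendrogram with one split on transparency and a second split on glass colour.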
Types of Clusters
- Uniquely classifying clusters
- Overlapping clusters
- Probabilistic clusters
- Dendrograms
Uniquely Classifying Clusters
Overlapping Clusters
Probabilistic Clustering
Cluster   Green   Blue   Typical samples
1         1.0     0.0
2         0.0     1.0
3         0.1     0.9
4         0.5     0.5
Dendrogram
- opaque
- transparent
  - not clear
  - clear
Classification
- Ordering of entities into groups based on their similarity
- Minimisation of within-group variance
- Maximisation of between-group variance
- Exhaustive and exclusive
- Principal technique: clustering
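To make the two variance criteria concrete, the sketch below computes the within-group and between-group sums of squares for one numeric attribute. The marble diameters and both groupings are invented for illustration; the slides give no such data.

```python
from statistics import mean


def within_between(groups):
    """Return (within-group, between-group) sums of squares.

    groups: list of lists of numbers, one inner list per group.
    """
    everything = [x for g in groups for x in g]
    grand_mean = mean(everything)
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return within, between


# Hypothetical marble diameters in millimetres.
good = [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]   # similar marbles together
poor = [[10.0, 21.0, 12.0], [20.0, 11.0, 22.0]]   # similar marbles mixed up

print(within_between(good))
print(within_between(poor))
```

The two sums always add up to the same total for a fixed set of values, so lowering the within-group variance and raising the between-group variance are two views of the same goal.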
Reasons for Classification
- Descriptive power
- Parsimony
- Maintainability
- Versatility
- Identification of distinctive attributes
Typology vs. Taxonomy
- Typology: conceptual
- Taxonomy: empirical
Typology
- Define conceptual attributes
- Select appropriate attributes
- Create typology matrix (substruction)
- Insert empirical entities in matrix
- Extend matrix if necessary
- Reduce matrix if necessary
Defining Conceptual Attributes
- Meaningful
- Focus on ideal types
- Order of importance
- Exhaustive domains
Conceptual Marble Attributes
Typology Matrix
                   Transparency
Colouring          Opaque        Transparent
Monochrome
Polychrome
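A typology matrix like this one is the Cartesian product of the attribute domains, with empirical entities placed in the matching cell. The sketch below builds the 2 x 2 matrix and counts entities per cell; the marble list and the function name `typology_matrix` are hypothetical.

```python
from collections import Counter
from itertools import product

COLOURING = ("monochrome", "polychrome")
TRANSPARENCY = ("opaque", "transparent")


def typology_matrix(entities):
    """Map every (colouring, transparency) cell to its number of entities.

    Cells are created from the conceptual attributes first, so empty
    cells still appear in the result with a count of 0.
    """
    counts = Counter(entities)
    return {cell: counts.get(cell, 0) for cell in product(COLOURING, TRANSPARENCY)}


# Hypothetical marbles, described by the two conceptual attributes.
marbles = [
    ("monochrome", "transparent"),
    ("monochrome", "opaque"),
    ("polychrome", "transparent"),
    ("monochrome", "transparent"),
]

for cell, count in typology_matrix(marbles).items():
    print(cell, count)
```

An entity whose values fall outside the declared domains is silently ignored here, which is one way to notice that the matrix needs extending.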
Matrix Extension
Transparency:            Transparent            Opaque
Glass:                   Clear    Not clear     Clear    Not clear
Colouring    Size
Monochrome   Big
             Small
Polychrome   Big
             Small
Reduction
- Functional reduction
- Pragmatic reduction
- Numerical reduction
- Reduction by using criterion types
Functional Reduction
Transparency:            Transparent            Opaque
Glass:                   Clear    Not clear     Clear    Not clear
Colouring    Size
Monochrome   Big
             Small
Polychrome   Big
             Small
Functionally Reduced Matrix
Transparency:            Transparent            Opaque
Glass:                   Clear    Not clear
Colouring    Size
Monochrome   Big
             Small
Polychrome   Big
             Small
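The step from the extended matrix to the functionally reduced one can be sketched in code. The sketch assumes, as the reduced matrix suggests, that the reduction drops the glass distinction for opaque marbles; the function names are illustrative.

```python
from itertools import product

# Attribute domains as they appear in the extended matrix.
DOMAINS = {
    "colouring": ("monochrome", "polychrome"),
    "size": ("big", "small"),
    "transparency": ("transparent", "opaque"),
    "glass": ("clear", "not clear"),
}


def full_matrix(domains):
    """Return one dict per cell of the full attribute cross-product."""
    names = list(domains)
    return [dict(zip(names, combo)) for combo in product(*domains.values())]


def functional_reduction(cells):
    """Drop the 'glass' attribute for opaque cells and merge duplicates."""
    reduced = []
    for cell in cells:
        if cell["transparency"] == "opaque":
            cell = {k: v for k, v in cell.items() if k != "glass"}
        if cell not in reduced:
            reduced.append(cell)
    return reduced


cells = full_matrix(DOMAINS)
reduced = functional_reduction(cells)
print(len(cells), len(reduced))  # 16 cells before, 12 after
```

The counts match the two matrices: four columns by four rows before the reduction, three columns by four rows after it.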
Pragmatic Reduction
Transparency:            Transparent            Opaque
Glass:                   Clear    Not clear     Clear    Not clear
Colouring    Size
Monochrome   Big
             Small
Polychrome   Big
             Small
Pragmatically Reduced Matrix
Transparency:            Transparent            Opaque
Glass:                   Clear    Not clear
Size     Colouring
Small    Monochrome
         Polychrome
Big
Criticising Typological Classification
- Reification
- Resilience
- Problematic attribute selection
- Unmanageability
Taxonomy
- Define empirical attributes
- Select appropriate attributes
- Create entity matrix
- Apply clustering technique
- Analyse clusters
Empirical Attributes
Big, single colour, lots of colours, green glass, transparent, blue, yellow, white, dull, shiny
Selecting Attributes
- Size (big/small)
- Colour (yellow, green, blue, red, white…)
- Colouring (monochrome/polychrome)
- Shininess (shiny/dull)
- Transparency (transparent/opaque)
- Glass colour (clear, green, …)
Entity Matrix
Big  Monochrome  Shiny  Transparent      Big  Monochrome  Shiny  Transparent
N    Y           Y      Y                N    Y           Y      N
N    Y           Y      Y                N    Y           Y      N
N    Y           Y      Y                N    Y           Y      N
N    Y           Y      Y                N    Y           Y      N
N    Y           Y      Y                N    Y           Y      N
N    N           N      N                N    N           Y      Y
Y    N           N      N                Y    N           Y      Y
Y    Y           Y      N
Automatic Clustering Parameters
- Agglomerative vs. divisive
- Monothetic vs. polythetic
- Outliers permitted
- Limits to number of clusters
- Form of linkage (single, complete, average)
- …
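Several of these parameters show up directly in code. The sketch below is a minimal agglomerative (bottom-up) clustering with a selectable linkage and a limit on the number of clusters. It assumes Hamming distance between Y/N attribute strings, which the slides do not specify, and it is not claimed to reproduce the clustering shown in the deck.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attribute positions in which two entities differ."""
    return sum(x != y for x, y in zip(a, b))


# Linkage = how to turn pairwise entity distances into a cluster distance.
LINKAGE = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerate(items, k, linkage="average"):
    """Merge the two closest clusters until only k clusters remain."""
    link = LINKAGE[linkage]
    clusters = [[item] for item in items]
    while len(clusters) > k:
        def cluster_distance(pair):
            i, j = pair
            return link([hamming(a, b) for a in clusters[i] for b in clusters[j]])

        i, j = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[i] += clusters[j]   # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Attribute order: big, monochrome, shiny, transparent.
patterns = ["NYYY", "NYYN", "NNNN", "YNNN", "NNYY", "YNYY", "YYYN"]
for cluster in agglomerate(patterns, 3, linkage="complete"):
    print(cluster)
```

Because every attribute contributes to the distance, this is a polythetic method; a monothetic method would split on one attribute at a time.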
Automatic Clustering
NYYY   small, monochrome, shiny, transparent
*NNN   polychrome, dull, opaque
*NYY   polychrome, shiny, transparent
YYYN   big, monochrome, shiny, opaque
NYYN   small, monochrome, shiny, opaque
Polythetic to Monothetic
NYYY   small, monochrome, shiny, transparent
*NNN   polychrome, dull, opaque
*NYY   polychrome, shiny, transparent
NYYN   small, monochrome, shiny, opaque
*YYN   monochrome, shiny, opaque
Analysing Clusters
Cluster labels: “Vanilla”, “Stone”, “Tiger”, “Classic”
Cluster descriptions:
- small, monochrome, shiny, transparent
- polychrome, dull, opaque
- polychrome, shiny, transparent
- small, monochrome, shiny, opaque
Criticising Taxonomical Classification
- Dependent on specimens
- Difficult to generalise
- Difficult to label
- Biased towards academic discipline
- Not the “last word”
Typology vs. Taxonomy
Typology                                          Taxonomy
Conceptual                                        Empirical
Subjective                                        Objective
Manual                                            (Mostly) automatic
Less discriminative                               More discriminative
Goes awry when there are insufficient insights    Goes awry when there are insufficient specimens
Operational Classification
- Typology (conceptual)
- Taxonomy (empirical)
- Operational typology (conceptual + empirical)
Automated Clustering Methods
- Iterative distance-based clustering: the k-means method
- Incremental clustering: the Cobweb method
- Probability-based clustering: the EM algorithm
k-Means Method
- Iterative distance-based clustering
- Divisive
- Polythetic
- Predefined number of clusters (k)
- Outliers permitted
k-Means (pass 1)
k = 2
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
k-Means (pass 2)
Cluster average: small, monochrome, shiny, transparent
Cluster average: small, polychrome, dull, opaque
k = 2
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
k-Means (pass 3)
Cluster average: small, monochrome, shiny, transparent
Cluster average: big, polychrome, dull, opaque
k = 2
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
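The passes above follow the usual k-means loop: assign each marble to the nearest cluster average, recompute the averages, and repeat until the assignment stops changing. The sketch below applies that loop to the rows of the Entity Matrix slide, coded as 0/1 vectors. The choice of seeds and the use of squared Euclidean distance are assumptions, not taken from the slides.

```python
def kmeans(points, seeds, max_passes=100):
    """Plain k-means on numeric tuples; k is the number of seeds."""
    centroids = [tuple(float(v) for v in s) for s in seeds]
    assignment = None
    for _ in range(max_passes):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no marble changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the average of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Rows of the entity matrix; attribute order: big, monochrome, shiny, transparent.
marbles = ["NYYY"] * 5 + ["NNNN", "YNNN", "YYYN"] + ["NYYN"] * 5 + ["NNYY", "YNYY"]
points = [tuple(int(ch == "Y") for ch in m) for m in marbles]

# Assumed seeds: one "NYYY" marble and the "NNNN" marble.
assignment, centroids = kmeans(points, seeds=[points[0], points[5]])
for c, centroid in enumerate(centroids):
    print(c, assignment.count(c), [round(v, 2) for v in centroid])
```

A centroid component above 0.5 reads as "mostly yes" for that attribute, which is how a cluster average such as "small, monochrome, shiny, transparent" can be read off. Different seeds can lead to a different final clustering.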
Cobweb Algorithm
- Incremental clustering
- Agglomerative
- Polythetic
- Dynamic number of clusters
- Outliers permitted
Cobweb Procedure
- Builds a tree by adding instances to it
- Uses a Category Utility function to determine the quality of the clustering
- Changes the tree structure if this positively influences the Category Utility (by merging nodes or splitting nodes)
- “Cutoff” value may be used to group sufficiently similar instances together
Category Utility
- Measure for quality of clustering
- The better the predictive value of the average attribute values of the instances in the clusters for the individual attribute values, the higher the CU will be

$$
CU(C_1, \ldots, C_k) = \frac{1}{k} \sum_{l=1}^{k} \Pr[C_l] \sum_{i} \sum_{j} \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)
$$
Category Utility for “Size” (1)
Clusters: C1, C2
a) Pr[size=big|C1] = 1/3
b) Pr[size=big|C2] = 1/3
c) Pr[size=big] = 1/3
d) Pr[C1] = 1/2
e) Pr[size=small|C1] = 2/3
f) Pr[size=small|C2] = 2/3
g) Pr[size=small] = 2/3
h) Pr[C2] = 1/2
CU = (d((a² - c²) + (e² - g²)) + h((b² - c²) + (f² - g²)))/2 = 0
Category Utility for “Size” (2)
Clusters: C1, C2
a) Pr[size=big|C1] = 2/3
b) Pr[size=big|C2] = 0
c) Pr[size=big] = 1/3
d) Pr[C1] = 1/2
e) Pr[size=small|C1] = 1/3
f) Pr[size=small|C2] = 1
g) Pr[size=small] = 2/3
h) Pr[C2] = 1/2
CU = (d((a² - c²) + (e² - g²)) + h((b² - c²) + (f² - g²)))/2 = ((1/2)((1/3) + (-1/3)) + (1/2)((-1/9) + (5/9)))/2 = 1/9
Category Utility for “Size” (3)
Clusters: C1, C2
a) Pr[size=big|C1] = 1
b) Pr[size=big|C2] = 0
c) Pr[size=big] = 1/3
d) Pr[C1] = 1/3
e) Pr[size=small|C1] = 0
f) Pr[size=small|C2] = 1
g) Pr[size=small] = 2/3
h) Pr[C2] = 2/3
CU = (d((a² - c²) + (e² - g²)) + h((b² - c²) + (f² - g²)))/2 = ((1/3)((8/9) + (-4/9)) + (2/3)((-1/9) + (5/9)))/2 = 2/9
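The three "Size" calculations can be checked with a few lines of code. The sketch computes Category Utility for a single nominal attribute using exact fractions. The marble counts per cluster are assumptions chosen to match the stated probabilities, since the slides give only the probabilities themselves.

```python
from fractions import Fraction


def category_utility(clusters):
    """Category Utility for one nominal attribute.

    clusters: list of lists, each inner list holding the attribute value
    of every instance in that cluster.
    """
    everything = [v for c in clusters for v in c]
    n = len(everything)
    values = set(everything)
    total = Fraction(0)
    for c in clusters:
        p_cluster = Fraction(len(c), n)
        # Sum over values of Pr[a = v | C]^2 - Pr[a = v]^2
        gain = sum(
            Fraction(c.count(v), len(c)) ** 2 - Fraction(everything.count(v), n) ** 2
            for v in values
        )
        total += p_cluster * gain
    return total / len(clusters)


# Assumed marble counts consistent with the probabilities in cases (1) to (3).
case1 = [["big", "small", "small"], ["big", "small", "small"]]
case2 = [["big", "big", "small"], ["small", "small", "small"]]
case3 = [["big", "big"], ["small", "small", "small", "small"]]

print(category_utility(case1))  # 0
print(category_utility(case2))  # 1/9
print(category_utility(case3))  # 2/9
```

The value rises from 0 to 2/9 as the clusters become better predictors of size, with the highest value for the split that separates big from small completely.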
Cobweb Example
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
Cobweb Result Example
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
Cobweb Numerical
- Probability of values of attributes of instances in a cluster is based on the standard deviation from the estimate for the mean value
- Acuity is presumed variance in attribute values
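For numeric attributes, the sum of squared probabilities in the Category Utility is commonly replaced by the integral of a squared normal density. The block below is the textbook form of that substitution, not a formula taken from these slides. The symbols are assumptions of this sketch: $\sigma_{il}$ is the standard deviation of attribute $a_i$ within cluster $C_l$, and $\sigma_i$ is its standard deviation over all instances.

```latex
% Squared normal density integrates to 1 / (2 \sqrt{\pi} \sigma):
\int f(a_i)^2 \, da_i = \frac{1}{2\sqrt{\pi}\,\sigma_i}

% Substituting this for the sums of squared probabilities gives:
CU(C_1, \ldots, C_k) = \frac{1}{k} \sum_{l=1}^{k} \Pr[C_l] \,
    \frac{1}{2\sqrt{\pi}} \sum_{i} \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)
```

A cluster holding a single instance would have $\sigma_{il} = 0$ and an unbounded CU, which is why a presumed variance (the acuity) is needed for each attribute.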
Disadvantages of Previous Methods
- Fast and hard to judge
- Dependent on initial setup
- Ad-hoc limitations
- Hard to escape from local minima
Probability-based Clustering
- Finite mixture models
- Each cluster is defined by a vector of probabilities for instances to have certain values for their attributes, and a probability for instances to reside in the cluster
- Clustering equals searching for optimal sets of probabilities for a sample set
Expectation-Maximisation (EM)
- Probability-based clustering
- Divisive
- Polythetic
- Predefined number of clusters (k)
- Outliers permitted
EM Procedure
- Select k cluster vectors randomly
- Calculate cluster probabilities for each instance (under the assumption that the instance attributes are independent)
- Use calculations to re-estimate values
- Repeat until increase in quality becomes negligible
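The procedure above can be sketched for yes/no attributes as a mixture of independent Bernoulli variables. Everything named here is an assumption of the sketch: the function `em_bernoulli`, the fixed number of passes in place of a convergence test, the clamping constant `EPS`, and the starting values, which are passed in explicitly so the example is reproducible.

```python
EPS = 1e-6  # assumed clamp so that no probability becomes exactly 0 or 1


def em_bernoulli(points, priors, probs, passes=20):
    """EM for a mixture of independent yes/no attributes.

    points: list of 0/1 tuples; priors: initial Pr[cluster];
    probs: per cluster, initial Pr[attribute = 1 | cluster].
    """
    priors = list(priors)
    probs = [list(p) for p in probs]
    k, n_attr = len(priors), len(points[0])
    resp = []
    for _ in range(passes):
        # Expectation: cluster probabilities for each instance.
        resp = []
        for x in points:
            joint = []
            for c in range(k):
                p = priors[c]
                for xi, pi in zip(x, probs[c]):
                    p *= pi if xi else 1.0 - pi   # attributes assumed independent
                joint.append(p)
            total = sum(joint)
            resp.append([j / total for j in joint])
        # Maximisation: re-estimate the values from the weighted instances.
        for c in range(k):
            weight = sum(r[c] for r in resp)
            priors[c] = weight / len(points)
            for i in range(n_attr):
                est = sum(r[c] * x[i] for r, x in zip(resp, points)) / weight
                probs[c][i] = min(max(est, EPS), 1.0 - EPS)
    return priors, probs, resp


# Rows of the entity matrix; attribute order: big, monochrome, shiny, transparent.
marbles = ["NYYY"] * 5 + ["NNNN", "YNNN", "YYYN"] + ["NYYN"] * 5 + ["NNYY", "YNYY"]
points = [tuple(int(ch == "Y") for ch in m) for m in marbles]

priors, probs, resp = em_bernoulli(
    points,
    priors=[0.5, 0.5],
    probs=[[0.6, 0.3, 0.4, 0.4], [0.2, 0.8, 0.9, 0.5]],
)
print([round(p, 3) for p in priors])
```

Unlike k-means, each marble ends up with a probability of membership in every cluster rather than a single hard assignment.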
EM Result Example
             pC     pbig   pmonochrome   pshiny   ptransparent
Cluster 1    0.2    0.6    0.3           0.4      0.4
Cluster 2    0.8    0.2    0.8           0.9      0.5

Cluster 1                       Cluster 2
.2*.4*.3*.4*.6 = 0.0058         .8*.8*.8*.9*.5 = 0.2304
.2*.4*.7*.6*.6 = 0.0202         .8*.8*.2*.1*.5 = 0.0064
.2*.6*.7*.4*.4 = 0.0134         .8*.2*.2*.9*.5 = 0.0144
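The six products can be recomputed from the cluster parameters. In the sketch below, the three marbles are inferred from the factors in the products (for example, a factor .4 where pbig = 0.6 means the marble is small); they are not stated on the slide. The final normalisation into membership probabilities is the standard next step and is added here as an illustration.

```python
from math import prod

ATTRS = ("big", "monochrome", "shiny", "transparent")

CLUSTERS = {
    "C1": {"prior": 0.2, "big": 0.6, "monochrome": 0.3, "shiny": 0.4, "transparent": 0.4},
    "C2": {"prior": 0.8, "big": 0.2, "monochrome": 0.8, "shiny": 0.9, "transparent": 0.5},
}


def joint(instance, params):
    """Pr[cluster] times the probability of each attribute value of the instance."""
    return params["prior"] * prod(
        params[a] if instance[a] else 1.0 - params[a] for a in ATTRS
    )


def memberships(instance):
    """Normalise the joint probabilities so that they sum to 1."""
    joints = {name: joint(instance, p) for name, p in CLUSTERS.items()}
    total = sum(joints.values())
    return {name: j / total for name, j in joints.items()}


# Marbles inferred from the factors used in the slide's products.
small_mono_shiny_opaque = dict(big=False, monochrome=True, shiny=True, transparent=False)
small_poly_dull_opaque = dict(big=False, monochrome=False, shiny=False, transparent=False)
big_poly_shiny_transparent = dict(big=True, monochrome=False, shiny=True, transparent=True)

for marble in (small_mono_shiny_opaque, small_poly_dull_opaque, big_poly_shiny_transparent):
    print({name: round(joint(marble, p), 4) for name, p in CLUSTERS.items()},
          {name: round(m, 3) for name, m in memberships(marble).items()})
```

The larger of the two products in each row indicates the more likely cluster, and the third marble is a close call between the two.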
The Essence of Classification
- A successful classification defines fundamental characteristics
- A classification can never be better than the attributes it is based upon
- There is no magic formula