CS6220: Data Mining Techniques - web.cs.ucla.edu/~yzsun/classes/2013spring_cs6220/slides/...
TRANSCRIPT
CS6220 DATA MINING TECHNIQUES
Instructor: Yizhou Sun (yzsun@ccs.neu.edu)
April 10, 2013
Chapter 11: Advanced Clustering Analysis
Chapter 10. Cluster Analysis: Basic Concepts and Methods
• Beyond K-Means
  • K-means
  • EM-algorithm
  • Kernel K-means
• Clustering Graphs and Network Data
• Summary
2
Recall K-Means
• Objective function
  • $J = \sum_{j=1}^{k} \sum_{i:\, C(i)=j} \|x_i - c_j\|^2$
  • Total within-cluster variance
• Re-arrange the objective function
  • $J = \sum_{j=1}^{k} \sum_{i} w_{ij} \|x_i - c_j\|^2$
  • where $w_{ij} = 1$ if $x_i$ belongs to cluster $j$; $w_{ij} = 0$ otherwise
• Looking for
  • The best assignment $w_{ij}$
  • The best center $c_j$
3
Solution of K-Means
• Iterations
  • Step 1: Fix centers $c_j$, find assignment $w_{ij}$ that minimizes $J$
    • => $w_{ij} = 1$ if $\|x_i - c_j\|^2$ is the smallest
  • Step 2: Fix assignment $w_{ij}$, find centers that minimize $J$
    • => set the first derivative of $J$ to 0: $\frac{\partial J}{\partial c_j} = -2 \sum_{i} w_{ij} (x_i - c_j) = 0$
    • => $c_j = \frac{\sum_i w_{ij} x_i}{\sum_i w_{ij}}$
    • Note: $\sum_i w_{ij}$ is the total number of objects in cluster $j$
4
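The two alternating steps above translate directly into code. Below is a minimal NumPy sketch of the iterations (the function name and random initialization are illustrative assumptions, not code from the slides):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: alternate Step 1 and Step 2 until centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: fix centers c_j, assign each x_i to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Step 2: fix assignments w_ij, move each center to its cluster mean
        new_centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign, centers
```

Each iteration can only decrease $J$, which is why the alternation converges (to a local optimum).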
Limitations of K-Means
• K-means has problems when clusters are of differing
  • Sizes
  • Densities
  • Non-spherical shapes
11
Limitations of K-Means: Different Density and Size
12
Limitations of K-Means: Non-Spherical Shapes
13
Fuzzy Set and Fuzzy Cluster
• Clustering methods discussed so far
  • Every data object is assigned to exactly one cluster
• Some applications may need fuzzy or soft cluster assignment
  • Ex.: an e-game could belong to both entertainment and software
• Methods: fuzzy clusters and probabilistic model-based clusters
• Fuzzy cluster: a fuzzy set $S$ given by a membership function $F_S: X \to [0, 1]$ (value between 0 and 1)
14
Probabilistic Model-Based Clustering
• Cluster analysis is to find hidden categories
• A hidden category (i.e., probabilistic cluster) is a distribution over the data space, which can be mathematically represented using a probability density function (or distribution function)
  • Ex.: categories for digital cameras sold: consumer line vs. professional line; density functions $f_1$, $f_2$ for $C_1$, $C_2$ obtained by probabilistic clustering
• A mixture model assumes that a set of observed objects is a mixture of instances from multiple probabilistic clusters, and conceptually each observed object is generated independently
• Our task: infer a set of $k$ probabilistic clusters that is most likely to generate $D$ using the above data generation process
15
16
Mixture Model-Based Clustering
• A set $C$ of $k$ probabilistic clusters $C_1, \ldots, C_k$ with probability density functions $f_1, \ldots, f_k$, respectively, and their probabilities $\omega_1, \ldots, \omega_k$
• Probability of an object $o$ generated by cluster $C_j$: $P(o \mid C_j) = \omega_j f_j(o)$
• Probability of $o$ generated by the set of clusters $C$: $P(o \mid C) = \sum_{j=1}^{k} \omega_j f_j(o)$
• Since objects are assumed to be generated independently, for a data set $D = \{o_1, \ldots, o_n\}$ we have $P(D \mid C) = \prod_{i=1}^{n} P(o_i \mid C)$
• Task: find a set $C$ of $k$ probabilistic clusters s.t. $P(D \mid C)$ is maximized
The EM (Expectation Maximization) Algorithm
• The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models
• E-step: assigns objects to clusters according to the current fuzzy clustering or parameters of probabilistic clusters
  • $w_{ij}^{t} = p(z_i = j \mid \theta_j^{t}, x_i) \propto p(x_i \mid C_j^{t}, \theta_j^{t})\, p(C_j^{t})$
• M-step: finds the new clustering or parameters that minimize the sum of squared error (SSE) or maximize the expected likelihood
  • Under univariate normal distribution assumptions:
    • $\mu_j^{t+1} = \frac{\sum_i w_{ij}^{t} x_i}{\sum_i w_{ij}^{t}}$,  $\sigma_j^2 = \frac{\sum_i w_{ij}^{t} (x_i - c_j^{t})^2}{\sum_i w_{ij}^{t}}$,  $p(C_j^{t}) \propto \sum_i w_{ij}^{t}$
• More about mixture models and EM algorithms: http://www.stat.cmu.edu/~cshalizi/350/lectures/29/lecture-29.pdf
17
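As a concrete illustration of the E- and M-steps under the univariate normal assumption, here is a small NumPy sketch (the quantile-based initialization is a simplifying assumption for this example, not part of the slides):

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=50):
    """EM for a 1-D Gaussian mixture.
    E-step: w_ij proportional to p(x_i | C_j) p(C_j); M-step: weighted mean/var/prior."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # simple deterministic init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities w_ij (each row sums to 1)
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        nj = w.sum(axis=0)
        mu = (w * x[:, None]).sum(axis=0) / nj
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / nj
        pi = nj / len(x)
    return mu, var, pi
```

On well-separated data the estimated means converge to the component centers within a few dozen iterations.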
K-Means: A Special Case of the Gaussian Mixture Model
• When each Gaussian component has covariance matrix $\sigma^2 I$
  • Soft K-means
• When $\sigma^2 \to 0$
  • Soft assignment becomes hard assignment
18
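The limit can be checked numerically: with equal-weight Gaussian components $N(\mu_j, \sigma^2 I)$, the E-step responsibilities sharpen into hard 0/1 assignments as $\sigma^2$ shrinks. A small illustrative sketch (function name assumed):

```python
import numpy as np

def soft_assign(x, mu, sigma2):
    """Soft K-means responsibilities for equal-weight 1-D Gaussian components
    with variance sigma2: w_ij proportional to exp(-(x_i - mu_j)^2 / (2 sigma2))."""
    logits = -((x[:, None] - mu[None, :]) ** 2) / (2 * sigma2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stabilization
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```

For a point at $x = 1$ with centers at 0 and 3, a large $\sigma^2$ gives nearly uniform weights, while a tiny $\sigma^2$ drives the weight of the nearer center to 1, i.e., the hard K-means assignment.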
Advantages and Disadvantages of Mixture Models
• Strengths
  • Mixture models are more general than partitioning
  • Clusters can be characterized by a small number of parameters
  • The results may satisfy the statistical assumptions of the generative models
• Weaknesses
  • Converge to a local optimum (overcome: run multiple times with random initialization)
  • Computationally expensive if the number of distributions is large or the data set contains very few observed data points
  • Need large data sets
  • Hard to estimate the number of clusters
19
Kernel K-Means
• How to cluster the following data?
• A non-linear map $\phi: R^n \to F$
  • Maps a data point into a higher/infinite dimensional space: $x \to \phi(x)$
• Dot product matrix $K_{ij}$
  • $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$
20
Solution of Kernel K-Means
• Objective function under the new feature space
  • $J = \sum_{j=1}^{k} \sum_i w_{ij} \|\phi(x_i) - c_j\|^2$
• Algorithm
  • By fixing assignment $w_{ij}$:
    • $c_j = \frac{\sum_i w_{ij} \phi(x_i)}{\sum_i w_{ij}}$
  • In the assignment step, assign the data points to the closest center:
    • $d(x_i, c_j) = \left\| \phi(x_i) - \frac{\sum_{i'} w_{i'j} \phi(x_{i'})}{\sum_{i'} w_{i'j}} \right\|^2 = \phi(x_i) \cdot \phi(x_i) - 2\, \frac{\sum_{i'} w_{i'j}\, \phi(x_i) \cdot \phi(x_{i'})}{\sum_{i'} w_{i'j}} + \frac{\sum_{i'} \sum_{l} w_{i'j} w_{lj}\, \phi(x_{i'}) \cdot \phi(x_l)}{\left(\sum_{i'} w_{i'j}\right)^2}$
• Do not really need to know $\phi(x)$, but only $K_{ij}$
21
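The three-term expansion depends on $\phi$ only through dot products, so it can be evaluated from the kernel matrix alone. A NumPy sketch (the function name and the 0/1 assignment-matrix encoding are illustrative assumptions):

```python
import numpy as np

def kernel_kmeans_distances(K, W):
    """Squared feature-space distances d(x_i, c_j) computed from the kernel
    matrix K (n x n) and a 0/1 assignment matrix W (n x k), via the expansion
    K_ii - 2 sum_{i'} w_{i'j} K_{i,i'} / n_j + sum_{i',l} w_{i'j} w_{lj} K_{i',l} / n_j^2."""
    nj = W.sum(axis=0)                                   # cluster sizes n_j
    t1 = np.diag(K)[:, None]                             # phi(x_i) . phi(x_i)
    t2 = -2 * (K @ W) / nj                               # cross term with cluster mean
    t3 = np.einsum('ij,il,lj->j', W, K, W) / nj ** 2     # squared norm of cluster mean
    return t1 + t2 + t3
```

With a linear kernel $K = X X^T$ this reproduces ordinary Euclidean distances to the cluster means, which gives an easy sanity check.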
Advantages and Disadvantages of Kernel K-Means
• Advantages
  • The algorithm is able to identify non-linear structures
• Disadvantages
  • The number of cluster centers needs to be predefined
  • The algorithm is complex in nature, and its time complexity is large
• References
  • Kernel k-means and Spectral Clustering, by Max Welling
  • Kernel k-means, Spectral Clustering and Normalized Cut, by Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis
  • An Introduction to Kernel Methods, by Colin Campbell
22
Chapter 10. Cluster Analysis: Basic Concepts and Methods
• Beyond K-Means
  • K-means
  • EM-algorithm for Mixture Models
  • Kernel K-means
• Clustering Graphs and Network Data
• Summary
23
Clustering Graphs and Network Data
• Applications
  • Bi-partite graphs, e.g., customers and products, authors and conferences
  • Web search engines, e.g., click-through graphs and Web graphs
  • Social networks, friendship/coauthor graphs
24
Clustering books about politics [Newman, 2006]
Algorithms:
• Graph clustering methods
  • Density-based clustering: SCAN (Xu et al., KDD'2007)
  • Spectral clustering
  • Modularity-based approach
  • Probabilistic approach
  • Nonnegative matrix factorization
  • ...
25
SCAN Density-Based Clustering of Networks
• How many clusters?
• What size should they be?
• What is the best partitioning?
• Should some points be segregated?
26
An Example Network
Application: given only information about who associates with whom, could one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?
A Social Network Model
• Cliques, hubs, and outliers
  • Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group
  • Individuals who are hubs know many people in different groups but belong to no single group. Politicians, for example, bridge multiple groups
  • Individuals who are outliers reside at the margins of society. Hermits, for example, know few people and belong to no group
• The Neighborhood of a Vertex
  • Define $\Gamma(v)$ as the immediate neighborhood of a vertex $v$ (i.e., the set of people that an individual knows)
27
Structure Similarity
• The desired features tend to be captured by a measure we call structural similarity:
  • $\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\, |\Gamma(w)|}}$
• Structural similarity is large for members of a clique and small for hubs and outliers
28
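On an adjacency-set representation the measure is a one-liner. The SCAN paper includes $v$ in its own neighborhood $\Gamma(v)$, and this sketch follows that convention (the function name is an assumption):

```python
import math

def structural_similarity(adj, v, w):
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|),
    where Gamma(v) is the neighbor set of v plus v itself."""
    gv = adj[v] | {v}
    gw = adj[w] | {w}
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))
```

Two vertices of a triangle share their entire neighborhoods and score 1.0, while a vertex hanging off a clique scores much lower.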
Structural Connectivity [1]
• $\varepsilon$-Neighborhood: $N_\varepsilon(v) = \{ w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon \}$
• Core: $CORE_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$
• Direct structure reachable: $DirREACH_{\varepsilon,\mu}(v, w) \Leftrightarrow CORE_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$
• Structure reachable: transitive closure of direct structure reachability
• Structure connected: $CONNECT_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V: REACH_{\varepsilon,\mu}(u, v) \wedge REACH_{\varepsilon,\mu}(u, w)$
[1] M. Ester, H.-P. Kriegel, J. Sander & X. Xu (KDD'96), "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases"
29
Structure-Connected Clusters
• Structure-connected cluster $C$
  • Connectivity: $\forall v, w \in C: CONNECT_{\varepsilon,\mu}(v, w)$
  • Maximality: $\forall v, w \in V: v \in C \wedge REACH_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$
• Hubs
  • Do not belong to any cluster
  • Bridge to many clusters
• Outliers
  • Do not belong to any cluster
  • Connect to fewer clusters
30
[Slides 31-43: a step-by-step run of SCAN on a 14-node example network (vertices 0-13) with parameters $\mu = 2$ and $\varepsilon = 0.7$. Structural similarities computed along the way include 0.63, 0.75, 0.67, 0.82, 0.73, 0.51, and 0.68; clusters grow from core vertices along edges with similarity $\ge \varepsilon$, while the remaining vertices end up as hubs or outliers.]
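The walkthrough above can be condensed into a short sketch of the SCAN clustering loop: a simplified reading of Xu et al. (KDD'07) that grows clusters breadth-first from core vertices and leaves the rest unclustered (the hub-vs-outlier disambiguation step is omitted here):

```python
from collections import deque
import math

def scan(adj, eps=0.7, mu=2):
    """Cluster a graph given as {vertex: set-of-neighbors}.
    Returns (vertex -> cluster id) and the list of unclustered hubs/outliers."""
    gamma = {v: adj[v] | {v} for v in adj}          # neighborhood incl. v itself
    def sigma(v, w):
        return len(gamma[v] & gamma[w]) / math.sqrt(len(gamma[v]) * len(gamma[w]))
    def eps_nbhd(v):
        return {w for w in gamma[v] if sigma(v, w) >= eps}
    label, cid = {}, 0
    for v in adj:
        if v in label or len(eps_nbhd(v)) < mu:
            continue                                # only unvisited cores seed clusters
        cid += 1
        queue = deque([v])
        while queue:
            u = queue.popleft()
            n = eps_nbhd(u)
            if len(n) < mu:
                continue                            # non-core members are not expanded
            for w in n:                             # direct structure reachability
                if w not in label:
                    label[w] = cid
                    queue.append(w)
    unclustered = [v for v in adj if v not in label]
    return label, unclustered
```

On two triangles joined through a single bridge vertex, the triangles come out as two clusters and the bridge is left unclustered, exactly the hub behavior pictured in the slides.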
Running Time
• Running time = O(|E|)
  • For sparse networks: = O(|V|)
[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)
44
Spectral Clustering
• Reference: ICDM'09 tutorial by Chris Ding
• Example
  • Clustering Supreme Court justices according to their voting behavior
45
Example (Continued)
46
Spectral Graph Partition
• Min-Cut
  • Minimize the number of cut edges
47
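A standard relaxation of the min-cut objective uses the graph Laplacian: the sign pattern of the eigenvector for the second-smallest eigenvalue (the Fiedler vector) gives a two-way partition. A minimal NumPy sketch of generic spectral bisection with the unnormalized Laplacian (this is not code from the tutorial):

```python
import numpy as np

def spectral_bipartition(A):
    """Two-way partition of a graph with symmetric adjacency matrix A (n x n):
    split on the sign of the Fiedler vector of the Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]                 # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)
```

On a barbell graph (two triangles joined by one edge), the sign split recovers the two triangles, i.e., the minimum cut.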
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
• A Tutorial on Spectral Clustering, by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf
51
Chapter 10. Cluster Analysis: Basic Concepts and Methods
• Beyond K-Means
  • K-means
  • EM-algorithm
  • Kernel K-means
• Clustering Graphs and Network Data
• Summary
52
Summary
• Generalizing K-Means
  • Mixture Model, EM-Algorithm, Kernel K-Means
• Clustering Graph and Networked Data
  • SCAN: density-based algorithm
  • Spectral clustering
53
Announcement
• HW 3 due tomorrow
• Course project due next week
  • Submit final report, data, code (with readme), evaluation forms
  • Make an appointment with me to explain your project
    • I will ask questions according to your report
• Final Exam
  • 4/22, 3 hours, in class; covers the whole semester with different weights
  • You can bring two A4 cheat sheets, one for content before the midterm and the other for content after the midterm
• Interested in research?
  • My research area: information/social network mining
54
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
2
Recall K-Means
bull Objective function
bull 119869 = ||119909119894 minus 119888119895||2
119862 119894 =119895119896119895=1
bull Total within-cluster variance
bull Re-arrange the objective function
bull 119869 = 119908119894119895||119909119894 minus 119888119895||2
119894119896119895=1
bull Where 119908119894119895 = 1 119894119891 119909119894 119887119890119897119900119899119892119904 119905119900 119888119897119906119904119905119890119903 119895 119908119894119895 = 0 119900119905ℎ119890119903119908119894119904119890
bull Looking for
bull The best assignment 119908119894119895
bull The best center 119888119895
3
Solution of K-Means
bull Iterations
bull Step 1 Fix centers 119888119895 find assignment 119908119894119895 that minimizes 119869
bull =gt 119908119894119895 = 1 119894119891 ||119909119894 minus 119888119895||2 is the smallest
bull Step 2 Fix assignment 119908119894119895 find centers that minimize 119869
bull =gt first derivative of 119869 = 0
bull =gt 120597119869
120597119888119895= minus2 119908119894119895(119909119894 minus 119888119895) =119894
119896119895=1 0
bull =gt119888119895 = 119908119894119895119909119894119894
119908119894119895119894
bull Note 119908119894119895119894 is the total number of objects in cluster j
4
Limitations of K-Means
bull K-means has problems when clusters are of differing
bull Sizes
bull Densities
bull Non-Spherical Shapes
11
Limitations of K-Means Different Density and Size
12
Limitations of K-Means Non-Spherical Shapes
13
Fuzzy Set and Fuzzy Cluster
bull Clustering methods discussed so far
bull Every data object is assigned to exactly one cluster
bull Some applications may need for fuzzy or soft cluster assignment
bull Ex An e-game could belong to both entertainment
and software
bull Methods fuzzy clusters and probabilistic model-based clusters
bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)
14
Probabilistic Model-Based Clustering
bull Cluster analysis is to find hidden categories
bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)
Ex categories for digital cameras sold
consumer line vs professional line
density functions f1 f2 for C1 C2
obtained by probabilistic clustering
A mixture model assumes that a set of observed objects is a mixture
of instances from multiple probabilistic clusters and conceptually
each observed object is generated independently
Our task infer a set of k probabilistic clusters that is mostly likely to
generate D using the above data generation process
15
16
Mixture Model-Based Clustering
bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1
hellip fk respectively and their probabilities ω1 hellip ωk
bull Probability of an object o generated by cluster Cj is
bull Probability of o generated by the set of cluster C is
Since objects are assumed to be generated
independently for a data set D = o1 hellip on we have
Task Find a set C of k probabilistic clusters st P(D|C) is maximized
The EM (Expectation Maximization) Algorithm
bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy
clustering or parameters of probabilistic clusters
bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895
119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895
119905 119901(119862119895119905)
bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions
bull 120583119895119905+1 =
119908119894119895119905 119909119894119894
119908119894119895119905
119894 120590119895
2 = 119908119894119895
119905 119909119894minus1198881198951199052
119894
119908119894119895119905
119894 119901 119862119895
119905 prop 119908119894119895119905
119894
bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf
17
K-Means Special Case of Gaussian Mixture Model
bull When each Gaussian component with covariance matrix 1205902119868
bull Soft K-means
bull When 1205902 rarr 0
bull Soft assignment becomes hard assignment
18
Advantages and Disadvantages of Mixture Models
bull Strength
bull Mixture models are more general than partitioning
bull Clusters can be characterized by a small number of parameters
bull The results may satisfy the statistical assumptions of the generative models
bull Weakness
bull Converge to local optimal (overcome run multi-times w random
initialization)
bull Computationally expensive if the number of distributions is large or the
data set contains very few observed data points
bull Need large data sets
bull Hard to estimate the number of clusters
19
Kernel K-Means
bull How to cluster the following data
bull A non-linear map 120601119877119899 rarr 119865
bull Map a data point into a higherinfinite dimensional space
bull 119909 rarr 120601 119909
bull Dot product matrix 119870119894119895
bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt
20
Solution of Kernel K-Means
bull Objective function under new feature space
bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2
119894119896119895=1
bull Algorithm
bull By fixing assignment 119908119894119895
bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894
bull In the assignment step assign the data points to the closest
center
bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime
119908119894prime119895119894prime
2
=
120601 119909119894 sdot 120601 119909119894 minus 2 119908
119894prime119895119894prime120601 119909119894 sdot120601 119909
119894prime
119908119894prime119895119894prime+ 119908
119894prime119895119908119897119895120601 119909
119894primesdot120601(119909119897)119897119894prime
( 119908119894prime119895)^2 119894prime
21
Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Recall K-Means
bull Objective function
bull 119869 = ||119909119894 minus 119888119895||2
119862 119894 =119895119896119895=1
bull Total within-cluster variance
bull Re-arrange the objective function
bull 119869 = 119908119894119895||119909119894 minus 119888119895||2
119894119896119895=1
bull Where 119908119894119895 = 1 119894119891 119909119894 119887119890119897119900119899119892119904 119905119900 119888119897119906119904119905119890119903 119895 119908119894119895 = 0 119900119905ℎ119890119903119908119894119904119890
bull Looking for
bull The best assignment 119908119894119895
bull The best center 119888119895
3
Solution of K-Means
bull Iterations
bull Step 1 Fix centers 119888119895 find assignment 119908119894119895 that minimizes 119869
bull =gt 119908119894119895 = 1 119894119891 ||119909119894 minus 119888119895||2 is the smallest
bull Step 2 Fix assignment 119908119894119895 find centers that minimize 119869
bull =gt first derivative of 119869 = 0
bull =gt 120597119869
120597119888119895= minus2 119908119894119895(119909119894 minus 119888119895) =119894
119896119895=1 0
bull =gt119888119895 = 119908119894119895119909119894119894
119908119894119895119894
bull Note 119908119894119895119894 is the total number of objects in cluster j
4
Limitations of K-Means
bull K-means has problems when clusters are of differing
bull Sizes
bull Densities
bull Non-Spherical Shapes
11
Limitations of K-Means Different Density and Size
12
Limitations of K-Means Non-Spherical Shapes
13
Fuzzy Set and Fuzzy Cluster
bull Clustering methods discussed so far
bull Every data object is assigned to exactly one cluster
bull Some applications may need for fuzzy or soft cluster assignment
bull Ex An e-game could belong to both entertainment
and software
bull Methods fuzzy clusters and probabilistic model-based clusters
bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)
14
Probabilistic Model-Based Clustering
bull Cluster analysis is to find hidden categories
bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)
Ex categories for digital cameras sold
consumer line vs professional line
density functions f1 f2 for C1 C2
obtained by probabilistic clustering
A mixture model assumes that a set of observed objects is a mixture
of instances from multiple probabilistic clusters and conceptually
each observed object is generated independently
Our task infer a set of k probabilistic clusters that is mostly likely to
generate D using the above data generation process
15
16
Mixture Model-Based Clustering
bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1
hellip fk respectively and their probabilities ω1 hellip ωk
bull Probability of an object o generated by cluster Cj is
bull Probability of o generated by the set of cluster C is
Since objects are assumed to be generated
independently for a data set D = o1 hellip on we have
Task Find a set C of k probabilistic clusters st P(D|C) is maximized
The EM (Expectation Maximization) Algorithm
bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy
clustering or parameters of probabilistic clusters
bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895
119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895
119905 119901(119862119895119905)
bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions
bull 120583119895119905+1 =
119908119894119895119905 119909119894119894
119908119894119895119905
119894 120590119895
2 = 119908119894119895
119905 119909119894minus1198881198951199052
119894
119908119894119895119905
119894 119901 119862119895
119905 prop 119908119894119895119905
119894
bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf
17
K-Means Special Case of Gaussian Mixture Model
bull When each Gaussian component with covariance matrix 1205902119868
bull Soft K-means
bull When 1205902 rarr 0
bull Soft assignment becomes hard assignment
18
Advantages and Disadvantages of Mixture Models
bull Strength
bull Mixture models are more general than partitioning
bull Clusters can be characterized by a small number of parameters
bull The results may satisfy the statistical assumptions of the generative models
bull Weakness
bull Converge to local optimal (overcome run multi-times w random
initialization)
bull Computationally expensive if the number of distributions is large or the
data set contains very few observed data points
bull Need large data sets
bull Hard to estimate the number of clusters
19
Kernel K-Means
bull How to cluster the following data
bull A non-linear map 120601119877119899 rarr 119865
bull Map a data point into a higherinfinite dimensional space
bull 119909 rarr 120601 119909
bull Dot product matrix 119870119894119895
bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt
20
Solution of Kernel K-Means
bull Objective function under new feature space
bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2
119894119896119895=1
bull Algorithm
bull By fixing assignment 119908119894119895
bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894
bull In the assignment step assign the data points to the closest
center
bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime
119908119894prime119895119894prime
2
=
120601 119909119894 sdot 120601 119909119894 minus 2 119908
119894prime119895119894prime120601 119909119894 sdot120601 119909
119894prime
119908119894prime119895119894prime+ 119908
119894prime119895119908119897119895120601 119909
119894primesdot120601(119909119897)119897119894prime
( 119908119894prime119895)^2 119894prime
21
Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
• Cliques, hubs, and outliers
• Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group
• Individuals who are hubs know many people in different groups but belong to no single group; politicians, for example, bridge multiple groups
• Individuals who are outliers reside at the margins of society; hermits, for example, know few people and belong to no group
• The Neighborhood of a Vertex
27
Define Γ(v) as the immediate neighborhood of a vertex v (i.e., the set of people that an individual knows)
Structure Similarity
• The desired features tend to be captured by a measure we call structural similarity:
σ(v, w) = |Γ(v) ∩ Γ(w)| / √(|Γ(v)| · |Γ(w)|)
• Structural similarity is large for members of a clique and small for hubs and outliers
28
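The similarity measure is easy to compute directly. A small sketch, under the assumption (as in the SCAN paper) that Γ(v) contains v itself; the helper names are my own:

```python
from math import sqrt

def gamma(adj, v):
    """Gamma(v): v's neighbors together with v itself."""
    return adj[v] | {v}

def sigma(adj, v, w):
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / sqrt(len(gv) * len(gw))
```

In a triangle every pair has σ = 1, while the center of a star and one of its leaves share only each other, giving σ = 2/√(2(d+1)) for center degree d: large inside cliques, small toward hubs.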
Structural Connectivity [1]
• ε-Neighborhood: N_ε(v) = {w ∈ Γ(v) | σ(v, w) ≥ ε}
• Core: CORE_{ε,μ}(v) ⇔ |N_ε(v)| ≥ μ
• Direct structure reachable: DirREACH_{ε,μ}(v, w) ⇔ CORE_{ε,μ}(v) ∧ w ∈ N_ε(v)
• Structure reachable: transitive closure of direct structure reachability
• Structure connected: CONNECT_{ε,μ}(v, w) ⇔ ∃u ∈ V : REACH_{ε,μ}(u, v) ∧ REACH_{ε,μ}(u, w)
[1] M. Ester, H.-P. Kriegel, J. Sander & X. Xu (KDD'96), "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases"
29
Structure-Connected Clusters
• Structure-connected cluster C
• Connectivity: ∀v, w ∈ C : CONNECT_{ε,μ}(v, w)
• Maximality: ∀v, w ∈ V : v ∈ C ∧ REACH_{ε,μ}(v, w) ⇒ w ∈ C
• Hubs
• Do not belong to any cluster
• Bridge to many clusters
• Outliers
• Do not belong to any cluster
• Connect to fewer clusters
30
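Putting the definitions together, here is a minimal SCAN sketch (a toy implementation of the definitions above, not the authors' code): compute ε-neighborhoods, grow clusters from core vertices by direct structure reachability, then classify the leftover vertices as hubs (neighbors in at least two clusters) or outliers.

```python
from collections import deque
from math import sqrt

def scan(adj, eps=0.7, mu=2):
    """Minimal SCAN: returns (clusters, hubs, outliers) for adjacency sets."""
    def gamma(v):
        return adj[v] | {v}
    def sigma(v, w):
        gv, gw = gamma(v), gamma(w)
        return len(gv & gw) / sqrt(len(gv) * len(gw))
    def eps_nbhd(v):
        return {w for w in gamma(v) if sigma(v, w) >= eps}
    label = {v: None for v in adj}
    cid = 0
    for v in adj:
        if label[v] is not None or len(eps_nbhd(v)) < mu:
            continue                          # already clustered, or not a core
        cid += 1                              # grow a new cluster from core v
        queue = deque([v])
        while queue:
            u = queue.popleft()
            nbhd = eps_nbhd(u)
            if len(nbhd) < mu:
                continue                      # border vertex: labeled, not expanded
            for w in nbhd:
                if label[w] is None:
                    label[w] = cid
                    queue.append(w)
    hubs, outliers = set(), set()
    for v in adj:                             # classify unclustered vertices
        if label[v] is None:
            near = {label[w] for w in adj[v] if label[w] is not None}
            (hubs if len(near) >= 2 else outliers).add(v)
    clusters = {}
    for v, c in label.items():
        if c is not None:
            clusters.setdefault(c, set()).add(v)
    return clusters, hubs, outliers
```

On a toy graph of two triangles joined through a bridging vertex, with a pendant vertex hanging off the bridge, the triangles come out as clusters, the bridge as a hub, and the pendant as an outlier.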
[Figures, slides 31–43: step-by-step run of the SCAN algorithm on a 14-vertex example network (vertices 0–13) with parameters μ = 2 and ε = 0.7. Structural similarities such as 0.63, 0.75, 0.67, 0.82, 0.73, 0.68, and 0.51 are computed one vertex at a time; edges with σ ≥ ε attach vertices to clusters grown from core vertices, while low-similarity vertices (e.g., σ = 0.51) end up as hubs or outliers.]
Running Time
• Running time = O(|E|)
• For sparse networks: = O(|V|)
[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)
44
Spectral Clustering
• Reference: ICDM'09 Tutorial by Chris Ding
• Example
• Clustering Supreme Court justices according to their voting behavior
45
Example (Continued)
46
Spectral Graph Partition
• Min-Cut
• Minimize the number of cut edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
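The relaxed objectives on these slides lead to an eigenvector computation. As a concrete illustration (a sketch of the standard ratio-cut relaxation, not code from the slides): with graph Laplacian L = D − A, minimizing x^T L x over unit vectors orthogonal to the all-ones vector yields the eigenvector of the second-smallest eigenvalue (the Fiedler vector), and thresholding its entries at zero gives a two-way partition.

```python
import numpy as np

def fiedler_bipartition(A):
    """Split a graph in two by the sign of the Fiedler vector.

    L = D - A is the graph Laplacian; its eigenvector for the
    second-smallest eigenvalue minimizes the relaxed cut objective
    x^T L x subject to x ⟂ 1, ||x|| = 1.
    """
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = vecs[:, 1]              # skip the constant eigenvector
    return (fiedler >= 0).astype(int)
```

On two triangles joined by a single edge, the Fiedler vector takes one sign on one triangle and the opposite sign on the other, recovering the obvious minimum cut.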
Other References
• A Tutorial on Spectral Clustering, by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf
51
Chapter 10. Cluster Analysis: Basic Concepts and Methods
• Beyond K-Means
• K-means
• EM-algorithm
• Kernel K-means
• Clustering Graphs and Network Data
• Summary
52
Summary
• Generalizing K-Means
• Mixture Model, EM-Algorithm, Kernel K-Means
• Clustering Graph and Networked Data
• SCAN: a density-based algorithm
• Spectral clustering
53
Announcement
• HW 3 due tomorrow
• Course project due next week
• Submit final report, data, code (with readme), evaluation forms
• Make an appointment with me to explain your project
• I will ask questions according to your report
• Final Exam
• 4/22, 3 hours, in class; covers the whole semester, with different weights
• You can bring two A4 cheat sheets: one for content before the midterm and the other for content after the midterm
• Interested in research?
• My research area: information/social network mining
54
Limitations of K-Means Different Density and Size
12
Limitations of K-Means Non-Spherical Shapes
13
Fuzzy Set and Fuzzy Cluster
bull Clustering methods discussed so far
bull Every data object is assigned to exactly one cluster
bull Some applications may need for fuzzy or soft cluster assignment
bull Ex An e-game could belong to both entertainment
and software
bull Methods fuzzy clusters and probabilistic model-based clusters
bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)
14
Probabilistic Model-Based Clustering
bull Cluster analysis is to find hidden categories
bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)
Ex categories for digital cameras sold
consumer line vs professional line
density functions f1 f2 for C1 C2
obtained by probabilistic clustering
A mixture model assumes that a set of observed objects is a mixture
of instances from multiple probabilistic clusters and conceptually
each observed object is generated independently
Our task infer a set of k probabilistic clusters that is mostly likely to
generate D using the above data generation process
15
16
Mixture Model-Based Clustering
bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1
hellip fk respectively and their probabilities ω1 hellip ωk
bull Probability of an object o generated by cluster Cj is
bull Probability of o generated by the set of cluster C is
Since objects are assumed to be generated
independently for a data set D = o1 hellip on we have
Task Find a set C of k probabilistic clusters st P(D|C) is maximized
The EM (Expectation Maximization) Algorithm
bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy
clustering or parameters of probabilistic clusters
bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895
119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895
119905 119901(119862119895119905)
bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions
bull 120583119895119905+1 =
119908119894119895119905 119909119894119894
119908119894119895119905
119894 120590119895
2 = 119908119894119895
119905 119909119894minus1198881198951199052
119894
119908119894119895119905
119894 119901 119862119895
119905 prop 119908119894119895119905
119894
bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf
17
K-Means Special Case of Gaussian Mixture Model
bull When each Gaussian component with covariance matrix 1205902119868
bull Soft K-means
bull When 1205902 rarr 0
bull Soft assignment becomes hard assignment
18
Advantages and Disadvantages of Mixture Models
bull Strength
bull Mixture models are more general than partitioning
bull Clusters can be characterized by a small number of parameters
bull The results may satisfy the statistical assumptions of the generative models
bull Weakness
bull Converge to local optimal (overcome run multi-times w random
initialization)
bull Computationally expensive if the number of distributions is large or the
data set contains very few observed data points
bull Need large data sets
bull Hard to estimate the number of clusters
19
Kernel K-Means
bull How to cluster the following data
bull A non-linear map 120601119877119899 rarr 119865
bull Map a data point into a higherinfinite dimensional space
bull 119909 rarr 120601 119909
bull Dot product matrix 119870119894119895
bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt
20
Solution of Kernel K-Means
bull Objective function under new feature space
bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2
119894119896119895=1
bull Algorithm
bull By fixing assignment 119908119894119895
bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894
bull In the assignment step assign the data points to the closest
center
bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime
119908119894prime119895119894prime
2
=
120601 119909119894 sdot 120601 119909119894 minus 2 119908
119894prime119895119894prime120601 119909119894 sdot120601 119909
119894prime
119908119894prime119895119894prime+ 119908
119894prime119895119908119897119895120601 119909
119894primesdot120601(119909119897)119897119894prime
( 119908119894prime119895)^2 119894prime
21
Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application: given only information about who associates with whom,
could one identify clusters of individuals with common interests or
special relationships (families, cliques, terrorist cells)?
A Social Network Model
• Cliques, hubs, and outliers
• Individuals in a tight social group, or clique, know many of the same
people, regardless of the size of the group
• Individuals who are hubs know many people in different groups but belong
to no single group. Politicians, for example, bridge multiple groups
• Individuals who are outliers reside at the margins of society. Hermits, for
example, know few people and belong to no group
• The Neighborhood of a Vertex
27
Define Γ(v) as the immediate neighborhood of a vertex v (i.e., the set
of people that an individual knows).
Structure Similarity
• The desired features tend to be captured by a measure we
call structural similarity:
σ(v, w) = |Γ(v) ∩ Γ(w)| / √(|Γ(v)| · |Γ(w)|)
• Structural similarity is large for members of a clique and small
for hubs and outliers
28
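Taking Γ(v) as the closed neighborhood (the vertex together with its neighbors), this measure is only a few lines of code. A minimal sketch over a toy adjacency-set graph (the graph itself is illustrative):

```python
from math import sqrt

def neighborhood(graph, v):
    """Closed neighborhood Gamma(v): the vertex v plus its adjacent vertices."""
    return graph[v] | {v}

def structural_similarity(graph, v, w):
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
    nv, nw = neighborhood(graph, v), neighborhood(graph, w)
    return len(nv & nw) / sqrt(len(nv) * len(nw))

# Toy graph: a triangle {0, 1, 2} with a pendant vertex 3 hanging off vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(structural_similarity(graph, 0, 1))   # clique members: 1.0
print(structural_similarity(graph, 2, 3))   # toward the pendant: ~0.71
```

As the slide says, similarity is maximal inside the clique and drops toward the loosely attached vertex.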
Structural Connectivity [1]
• ε-Neighborhood: N_ε(v) = {w ∈ Γ(v) | σ(v, w) ≥ ε}
• Core: CORE_{ε,μ}(v) ⇔ |N_ε(v)| ≥ μ
• Direct structure reachable: DirREACH_{ε,μ}(v, w) ⇔ CORE_{ε,μ}(v) ∧ w ∈ N_ε(v)
• Structure reachable: transitive closure of direct structure
reachability
• Structure connected: CONNECT_{ε,μ}(v, w) ⇔ ∃u ∈ V: REACH_{ε,μ}(u, v) ∧ REACH_{ε,μ}(u, w)
[1] M. Ester, H.-P. Kriegel, J. Sander & X. Xu (KDD'96), "A Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases"
29
Structure-Connected Clusters
• Structure-connected cluster C
• Connectivity: ∀v, w ∈ C: CONNECT_{ε,μ}(v, w)
• Maximality: ∀v ∈ C, ∀w ∈ V: REACH_{ε,μ}(v, w) ⇒ w ∈ C
• Hubs
• Do not belong to any cluster
• Bridge to many clusters
• Outliers
• Do not belong to any cluster
• Connect to fewer clusters
30
[Slides 31–43: step-by-step animation of the SCAN algorithm on a 14-vertex example network (vertices 0–13) with μ = 2 and ε = 0.7; structural similarities such as 0.63, 0.75, 0.67, 0.82, 0.73, 0.51, and 0.68 are computed edge by edge to identify cores and grow structure-connected clusters.]
Running Time
• Running time = O(|E|)
• For sparse networks: = O(|V|)
[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)
44
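Putting the preceding definitions together, the SCAN procedure can be sketched in plain Python (the adjacency-set graph encoding, closed neighborhoods, and the defaults ε = 0.7 and μ = 2 mirror the slides' example; this is a sketch, not the authors' implementation):

```python
from collections import deque
from math import sqrt

def gamma(g, v):
    """Closed neighborhood of v in an adjacency-set graph."""
    return g[v] | {v}

def sigma(g, v, w):
    gv, gw = gamma(g, v), gamma(g, w)
    return len(gv & gw) / sqrt(len(gv) * len(gw))

def scan(g, eps=0.7, mu=2):
    nbr = {v: {w for w in gamma(g, v) if sigma(g, v, w) >= eps} for v in g}
    cores = {v for v in g if len(nbr[v]) >= mu}       # CORE: |N_eps(v)| >= mu
    label, clusters = {}, []
    for c in sorted(cores):
        if c in label:
            continue
        cluster, queue = set(), deque([c])
        while queue:                                  # grow by direct structure reachability
            v = queue.popleft()
            if v in label:
                continue
            label[v] = len(clusters)
            cluster.add(v)
            if v in cores:                            # only cores extend the frontier
                queue.extend(nbr[v] - cluster)
        clusters.append(cluster)
    return clusters, set(g) - set(label)              # leftovers: hubs or outliers

g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters, rest = scan(g)
print(clusters, rest)   # two triangle clusters; no hubs or outliers remain
```

Each edge's similarity is examined a constant number of times, which is where the O(|E|) running time above comes from.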
Spectral Clustering
• Reference: ICDM'09 tutorial by Chris Ding
• Example
• Clustering Supreme Court justices according to their voting
behavior
45
Example (Continued)
46
Spectral Graph Partition
• Min-Cut
• Minimize the number of cut edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
• "A Tutorial on Spectral Clustering" by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf
51
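A standard way to attack these cut objectives (covered in the von Luxburg tutorial above) is to relax them to an eigenproblem on the graph Laplacian L = D − A and split vertices by the sign of the Fiedler vector, the eigenvector of the second-smallest eigenvalue. A self-contained power-iteration sketch (the shift and starting vector are arbitrary implementation choices):

```python
def fiedler_vector(adj, iters=500):
    """Approximate the Fiedler vector of L = D - A by power iteration on shift*I - L."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    shift = 2 * max(deg)                              # makes shift*I - L positive semidefinite
    v = [((i * 37) % 11) - 5.0 for i in range(n)]     # deterministic starting vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]                     # deflate the constant eigenvector of L
        w = [shift * v[i] - deg[i] * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Two triangles joined by the bridge edge (2, 3): the sign split cuts that edge.
A = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
sides = [x > 0 for x in fiedler_vector(A)]
print(sides)   # vertices 0-2 land on one side, 3-5 on the other
```

Projecting out the all-ones vector (the eigenvector for eigenvalue 0) each iteration leaves the Fiedler direction as the dominant one, so the sign pattern approximately minimizes the cut.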
Chapter 10. Cluster Analysis: Basic Concepts and Methods
• Beyond K-Means
• K-means
• EM-algorithm
• Kernel K-means
• Clustering Graphs and Network Data
• Summary
52
Summary
• Generalizing K-Means
• Mixture Model, EM-Algorithm, Kernel K-Means
• Clustering Graph and Networked Data
• SCAN: density-based algorithm
• Spectral clustering
53
Announcement
• HW 3 due tomorrow
• Course project due next week
• Submit final report, data, code (with readme), evaluation forms
• Make an appointment with me to explain your project
• I will ask questions according to your report
• Final Exam
• 4/22, 3 hours, in class; covers the whole semester with different weights
• You can bring two A4 cheat sheets: one for content before the midterm and the other for content after the midterm
• Interested in research?
• My research area: information/social network mining
54
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Fuzzy Set and Fuzzy Cluster
bull Clustering methods discussed so far
bull Every data object is assigned to exactly one cluster
bull Some applications may need for fuzzy or soft cluster assignment
bull Ex An e-game could belong to both entertainment
and software
bull Methods fuzzy clusters and probabilistic model-based clusters
bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)
14
Probabilistic Model-Based Clustering
bull Cluster analysis is to find hidden categories
bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)
Ex categories for digital cameras sold
consumer line vs professional line
density functions f1 f2 for C1 C2
obtained by probabilistic clustering
A mixture model assumes that a set of observed objects is a mixture
of instances from multiple probabilistic clusters and conceptually
each observed object is generated independently
Our task infer a set of k probabilistic clusters that is mostly likely to
generate D using the above data generation process
15
16
Mixture Model-Based Clustering
bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1
hellip fk respectively and their probabilities ω1 hellip ωk
bull Probability of an object o generated by cluster Cj is
bull Probability of o generated by the set of cluster C is
Since objects are assumed to be generated
independently for a data set D = o1 hellip on we have
Task Find a set C of k probabilistic clusters st P(D|C) is maximized
The EM (Expectation Maximization) Algorithm
bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy
clustering or parameters of probabilistic clusters
bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895
119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895
119905 119901(119862119895119905)
bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions
bull 120583119895119905+1 =
119908119894119895119905 119909119894119894
119908119894119895119905
119894 120590119895
2 = 119908119894119895
119905 119909119894minus1198881198951199052
119894
119908119894119895119905
119894 119901 119862119895
119905 prop 119908119894119895119905
119894
bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf
17
K-Means Special Case of Gaussian Mixture Model
bull When each Gaussian component with covariance matrix 1205902119868
bull Soft K-means
bull When 1205902 rarr 0
bull Soft assignment becomes hard assignment
18
Advantages and Disadvantages of Mixture Models
bull Strength
bull Mixture models are more general than partitioning
bull Clusters can be characterized by a small number of parameters
bull The results may satisfy the statistical assumptions of the generative models
bull Weakness
bull Converge to local optimal (overcome run multi-times w random
initialization)
bull Computationally expensive if the number of distributions is large or the
data set contains very few observed data points
bull Need large data sets
bull Hard to estimate the number of clusters
19
Kernel K-Means
bull How to cluster the following data
bull A non-linear map 120601119877119899 rarr 119865
bull Map a data point into a higherinfinite dimensional space
bull 119909 rarr 120601 119909
bull Dot product matrix 119870119894119895
bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt
20
Solution of Kernel K-Means
bull Objective function under new feature space
bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2
119894119896119895=1
bull Algorithm
bull By fixing assignment 119908119894119895
bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894
bull In the assignment step assign the data points to the closest
center
bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime
119908119894prime119895119894prime
2
=
120601 119909119894 sdot 120601 119909119894 minus 2 119908
119894prime119895119894prime120601 119909119894 sdot120601 119909
119894prime
119908119894prime119895119894prime+ 119908
119894prime119895119908119897119895120601 119909
119894primesdot120601(119909119897)119897119894prime
( 119908119894prime119895)^2 119894prime
21
Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Probabilistic Model-Based Clustering
bull Cluster analysis is to find hidden categories
bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)
Ex categories for digital cameras sold
consumer line vs professional line
density functions f1 f2 for C1 C2
obtained by probabilistic clustering
A mixture model assumes that a set of observed objects is a mixture
of instances from multiple probabilistic clusters and conceptually
each observed object is generated independently
Our task infer a set of k probabilistic clusters that is mostly likely to
generate D using the above data generation process
15
16
Mixture Model-Based Clustering
bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1
hellip fk respectively and their probabilities ω1 hellip ωk
bull Probability of an object o generated by cluster Cj is
bull Probability of o generated by the set of cluster C is
Since objects are assumed to be generated
independently for a data set D = o1 hellip on we have
Task Find a set C of k probabilistic clusters st P(D|C) is maximized
The EM (Expectation Maximization) Algorithm
bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy
clustering or parameters of probabilistic clusters
bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895
119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895
119905 119901(119862119895119905)
bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions
bull 120583119895119905+1 =
119908119894119895119905 119909119894119894
119908119894119895119905
119894 120590119895
2 = 119908119894119895
119905 119909119894minus1198881198951199052
119894
119908119894119895119905
119894 119901 119862119895
119905 prop 119908119894119895119905
119894
bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf
17
K-Means Special Case of Gaussian Mixture Model
bull When each Gaussian component with covariance matrix 1205902119868
bull Soft K-means
bull When 1205902 rarr 0
bull Soft assignment becomes hard assignment
18
Advantages and Disadvantages of Mixture Models
bull Strength
bull Mixture models are more general than partitioning
bull Clusters can be characterized by a small number of parameters
bull The results may satisfy the statistical assumptions of the generative models
bull Weakness
bull Converge to local optimal (overcome run multi-times w random
initialization)
bull Computationally expensive if the number of distributions is large or the
data set contains very few observed data points
bull Need large data sets
bull Hard to estimate the number of clusters
19
Kernel K-Means
bull How to cluster the following data
bull A non-linear map 120601119877119899 rarr 119865
bull Map a data point into a higherinfinite dimensional space
bull 119909 rarr 120601 119909
bull Dot product matrix 119870119894119895
bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt
20
Solution of Kernel K-Means
bull Objective function under new feature space
bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2
119894119896119895=1
bull Algorithm
bull By fixing assignment 119908119894119895
bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894
bull In the assignment step assign the data points to the closest
center
bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime
119908119894prime119895119894prime
2
=
120601 119909119894 sdot 120601 119909119894 minus 2 119908
119894prime119895119894prime120601 119909119894 sdot120601 119909
119894prime
119908119894prime119895119894prime+ 119908
119894prime119895119908119897119895120601 119909
119894primesdot120601(119909119897)119897119894prime
( 119908119894prime119895)^2 119894prime
21
Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
• Reference: ICDM'09 Tutorial by Chris Ding
• Example
• Clustering Supreme Court justices according to their voting behavior
45
Example (Continued)
46
Spectral Graph Partition
• Min-Cut
• Minimize the # of cut edges
47
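The figures for these slides are not reproduced here, but the min-cut idea can be illustrated with the standard spectral recipe: build the graph Laplacian L = D − A and split vertices by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). A sketch using NumPy; the example graph is mine, not from the lecture:

```python
import numpy as np

# Two triangles joined by a single bridge edge (2, 3) — the natural min-cut.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A        # unnormalized graph Laplacian L = D - A
eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
fiedler = eigvecs[:, 1]               # eigenvector of the second-smallest eigenvalue

part_a = {v for v in range(n) if fiedler[v] > 0}
part_b = set(range(n)) - part_a
print(sorted([part_a, part_b], key=min))  # [{0, 1, 2}, {3, 4, 5}]
```

The recovered bipartition cuts only the bridge edge, which is the minimum cut for this graph.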
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
• A Tutorial on Spectral Clustering by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
• Beyond K-Means
• K-means
• EM-algorithm
• Kernel K-means
• Clustering Graphs and Network Data
• Summary
52
Summary
• Generalizing K-Means
• Mixture Model, EM-Algorithm, Kernel K-Means
• Clustering Graph and Networked Data
• SCAN: density-based algorithm
• Spectral clustering
53
Announcement
• HW 3 due tomorrow
• Course project due next week
• Submit final report, data, code (with readme), evaluation forms
• Make an appointment with me to explain your project
• I will ask questions according to your report
• Final Exam
• 4/22, 3 hours, in class, covers the whole semester with different weights
• You can bring two A4 cheat sheets: one for content before the midterm and the other for content after the midterm
• Interested in research?
• My research area: information/social network mining
54
Advantages and Disadvantages of Mixture Models
bull Strength
bull Mixture models are more general than partitioning
bull Clusters can be characterized by a small number of parameters
bull The results may satisfy the statistical assumptions of the generative models
bull Weakness
bull Converge to local optimal (overcome run multi-times w random
initialization)
bull Computationally expensive if the number of distributions is large or the
data set contains very few observed data points
bull Need large data sets
bull Hard to estimate the number of clusters
19
Kernel K-Means
bull How to cluster the following data
bull A non-linear map 120601119877119899 rarr 119865
bull Map a data point into a higherinfinite dimensional space
bull 119909 rarr 120601 119909
bull Dot product matrix 119870119894119895
bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt
20
Solution of Kernel K-Means
bull Objective function under new feature space
bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2
119894119896119895=1
bull Algorithm
bull By fixing assignment 119908119894119895
bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894
bull In the assignment step assign the data points to the closest
center
bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime
119908119894prime119895119894prime
2
=
120601 119909119894 sdot 120601 119909119894 minus 2 119908
119894prime119895119894prime120601 119909119894 sdot120601 119909
119894prime
119908119894prime119895119894prime+ 119908
119894prime119895119908119897119895120601 119909
119894primesdot120601(119909119897)119897119894prime
( 119908119894prime119895)^2 119894prime
21
Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
[Slides 31–43: a step-by-step run of the SCAN algorithm on an example network with vertices 0–13, using μ = 2 and ε = 0.7. Each step shows the structural similarity between the current vertex and its neighbors (values such as 0.63, 0.75, 0.82, 0.73, 0.68, 0.51); neighbors with similarity ≥ 0.7 form the ε-neighborhood, core vertices grow structure-connected clusters, and the remaining vertices end up as hubs or outliers.]
Running Time
• Running time = O(|E|); for sparse networks, = O(|V|)
[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)
44
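The procedure illustrated on slides 31–43 can be written compactly. A minimal sketch of SCAN under the definitions above (the `eps`/`mu` parameters mirror ε and μ; hubs and outliers are simply left unlabeled here rather than distinguished):

```python
from collections import deque

def scan(adj, eps=0.7, mu=2):
    """Sketch of SCAN: label structure-connected clusters.
    Vertices left unlabeled are hubs or outliers (not separated here)."""
    def sigma(v, w):                      # structural similarity
        gv, gw = adj[v] | {v}, adj[w] | {w}
        return len(gv & gw) / (len(gv) * len(gw)) ** 0.5

    def eps_nbhd(v):                      # N_eps(v)
        return {w for w in adj[v] if sigma(v, w) >= eps}

    labels, cid = {}, 0
    for v in adj:
        if v in labels or len(eps_nbhd(v)) < mu:
            continue                      # only unvisited cores seed clusters
        cid += 1
        labels[v] = cid
        queue = deque([v])
        while queue:
            u = queue.popleft()
            n = eps_nbhd(u)
            if len(n) < mu:
                continue                  # non-core members do not expand
            for w in n:                   # direct structure reachability
                if w not in labels:
                    labels[w] = cid
                    queue.append(w)
    return labels

# Two triangles {0,1,2} and {3,4,5} joined through vertex 6 (a hub).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 6},
       3: {4, 5, 6}, 4: {3, 5}, 5: {3, 4}, 6: {2, 3}}
labels = scan(adj)
print(labels)  # two clusters; vertex 6 stays unclassified (hub)
```

Each ε-neighborhood lookup touches only a vertex's incident edges, which is where the O(|E|) running time quoted above comes from.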
Spectral Clustering
• Reference: ICDM'09 tutorial by Chris Ding
• Example
  • Clustering Supreme Court justices according to their voting behavior
45
Example (Continued)
46
Spectral Graph Partition
• Min-Cut
  • Minimize the number of cut edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
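The relaxation behind these objective functions can be illustrated with the classic Fiedler-vector split. A minimal sketch (assuming the unnormalized Laplacian L = D − A and a two-way partition by the sign of the eigenvector for the second-smallest eigenvalue):

```python
import numpy as np

def fiedler_partition(adj_matrix):
    """Two-way spectral partition: split vertices by the sign of the
    Fiedler vector of the unnormalized Laplacian L = D - A."""
    A = np.asarray(adj_matrix, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh sorts eigenvalues ascending
    fiedler = eigvecs[:, 1]               # second-smallest eigenvalue
    return fiedler >= 0                   # boolean cluster assignment

# Two triangles joined by a single bridge edge (2, 3): the minimum cut.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
part = fiedler_partition(A)
print(part)  # separates {0, 1, 2} from {3, 4, 5}
```

The sign split recovers the min-cut across the bridge edge; which side gets `True` depends on the eigenvector's arbitrary sign.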
Other References
• A Tutorial on Spectral Clustering, by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488[0].pdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
• HW 3 due tomorrow
• Course project due next week
  • Submit final report, data, code (with readme), and evaluation forms
  • Make an appointment with me to explain your project
  • I will ask questions based on your report
• Final Exam
  • 4/22, 3 hours, in class; covers the whole semester with different weights
  • You can bring two A4 cheat sheets: one for content before the midterm and the other for content after the midterm
• Interested in research?
  • My research area: information/social network mining
54
Kernel K-Means
• How to cluster the following data?
• A non-linear map φ: Rⁿ → F
  • Maps a data point into a higher- or infinite-dimensional space: x → φ(x)
• Dot-product matrix K_ij
  • K_ij = ⟨φ(x_i), φ(x_j)⟩
20
Solution of Kernel K-Means
• Objective function under the new feature space:
  J = Σ_{j=1}^{k} Σ_i w_ij ||φ(x_i) − c_j||²
• Algorithm
  • By fixing the assignment w_ij: c_j = Σ_i w_ij φ(x_i) / Σ_i w_ij
  • In the assignment step, assign each data point to the closest center:
    d(x_i, c_j) = ||φ(x_i) − Σ_{i′} w_{i′j} φ(x_{i′}) / Σ_{i′} w_{i′j}||²
                = φ(x_i)·φ(x_i) − 2 Σ_{i′} w_{i′j} φ(x_i)·φ(x_{i′}) / Σ_{i′} w_{i′j} + Σ_{i′} Σ_l w_{i′j} w_{lj} φ(x_{i′})·φ(x_l) / (Σ_{i′} w_{i′j})²
21
Do not really need to know φ(x), but only K_ij
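The expansion above can be checked numerically. A minimal sketch: the distances are computed from the kernel matrix alone, and for a linear kernel K = XXᵀ they must coincide with ordinary Euclidean distances to the cluster means:

```python
import numpy as np

def kernel_distances(K, labels, k):
    """||phi(x_i) - c_j||^2 from the kernel matrix alone, following the
    expansion: K_ii - 2 S_{i'} w_{i'j} K_{ii'} / n_j
               + S_{i',l} w_{i'j} w_{lj} K_{i'l} / n_j^2."""
    W = np.eye(k)[labels]                      # one-hot assignments, n x k
    sizes = np.maximum(W.sum(axis=0), 1)       # n_j, guarded against 0
    term2 = (K @ W) / sizes                    # cross term x_i . c_j
    term3 = np.diag(W.T @ K @ W) / sizes ** 2  # center norm c_j . c_j
    return np.diag(K)[:, None] - 2 * term2 + term3[None, :]

def kernel_kmeans(K, k, n_iter=20, seed=0):
    """Kernel k-means sketch: alternate assignments via kernel distances."""
    labels = np.random.default_rng(seed).integers(k, size=K.shape[0])
    for _ in range(n_iter):
        labels = kernel_distances(K, labels, k).argmin(axis=1)
    return labels

# Sanity check for a linear kernel: compare against explicit centers.
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
lab = np.array([0, 0, 1, 1])
centers = np.array([X[:2].mean(0), X[2:].mean(0)])
explicit = ((X[:, None] - centers[None]) ** 2).sum(-1)
print(np.allclose(kernel_distances(X @ X.T, lab, 2), explicit))  # True
```

With a non-linear kernel (e.g., RBF) the same code clusters in the implicit feature space F without ever forming φ(x), which is the point of the note above.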
Advantages and Disadvantages of Kernel K-Means
• Advantages
  • The algorithm can identify non-linear cluster structures
• Disadvantages
  • The number of cluster centers must be predefined
  • The algorithm is complex in nature, and its time complexity is large
• References
  • Kernel k-means and Spectral Clustering, by Max Welling
  • Kernel k-means, Spectral Clustering and Normalized Cut, by Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis
  • An Introduction to Kernel Methods, by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
• Applications
  • Bipartite graphs, e.g., customers and products, authors and conferences
  • Web search engines, e.g., click-through graphs and Web graphs
  • Social networks, friendship/coauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms
• Graph clustering methods
  • Density-based clustering: SCAN (Xu et al., KDD'07)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN: Density-Based Clustering of Networks
• How many clusters?
• What size should they be?
• What is the best partitioning?
• Should some points be segregated?
26
An Example Network
Application: given only information about who associates with whom, could one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Solution of Kernel K-Means
bull Objective function under new feature space
bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2
119894119896119895=1
bull Algorithm
bull By fixing assignment 119908119894119895
bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894
bull In the assignment step assign the data points to the closest
center
bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime
119908119894prime119895119894prime
2
=
120601 119909119894 sdot 120601 119909119894 minus 2 119908
119894prime119895119894prime120601 119909119894 sdot120601 119909
119894prime
119908119894prime119895119894prime+ 119908
119894prime119895119908119897119895120601 119909
119894primesdot120601(119909119897)119897119894prime
( 119908119894prime119895)^2 119894prime
21
Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Advatanges and Disadvantages of Kernel K-Means
bull Advantages
bull Algorithm is able to identify the non-linear structures
bull Disadvantages
bull Number of cluster centers need to be predefined
bull Algorithm is complex in nature and time complexity is large
bull References
bull Kernel k-means and Spectral Clustering by Max Welling
bull Kernel k-means Spectral Clustering and Normalized Cut by
Inderjit S Dhillon Yuqiang Guan and Brian Kulis
bull An Introduction to kernel methods by Colin Campbell
22
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm for Mixture Models
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
23
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Clustering Graphs and Network Data
bull Applications
bull Bi-partite graphs eg customers and products authors and conferences
bull Web search engines eg click through graphs and Web graphs
bull Social networks friendshipcoauthor graphs
24
Clustering books about politics [Newman 2006]
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Algorithms bull Graph clustering methods
bull Density-based clustering SCAN (Xu et al KDDrsquo2007)
bull Spectral clustering
bull Modularity-based approach
bull Probabilistic approach
bull Nonnegative matrix factorization
bull hellip
25
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
• Structure-connected cluster C
• Connectivity: ∀v, w ∈ C : CONNECT_{ε,μ}(v, w)
• Maximality: ∀v, w ∈ V : v ∈ C ∧ REACH_{ε,μ}(v, w) ⇒ w ∈ C
• Hubs
• Do not belong to any cluster
• Bridge to many clusters
• Outliers
• Do not belong to any cluster
• Connect to fewer clusters
(Figure: an example network with a labeled hub and a labeled outlier)
30
Algorithm (ε = 0.7, μ = 2)
(Slides 31–43: a step-by-step run of SCAN on an example network with vertices 0–13, computing structural similarities edge by edge — e.g., 0.63, 0.75, 0.67, 0.82, 0.73, 0.51, 0.68 — and growing structure-connected clusters around core vertices.)
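The step-by-step animation condenses to a short procedure; the following is a hypothetical minimal implementation following the slide-29 definitions (closed neighborhoods; a reached border vertex keeps the first cluster label it receives), not the authors' code:

```python
from math import sqrt
from collections import deque

def sigma(adj, v, w):
    # Structural similarity on closed neighborhoods.
    gv, gw = adj[v] | {v}, adj[w] | {w}
    return len(gv & gw) / sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    # N_eps(v): members of Gamma(v) with similarity >= eps.
    return {w for w in adj[v] | {v} if sigma(adj, v, w) >= eps}

def scan(adj, eps=0.7, mu=2):
    # Grow one cluster per unclassified core vertex via direct
    # structure reachability; what remains is a hub or outlier.
    label = {}                      # vertex -> cluster id
    next_id = 0
    for v in adj:
        if v in label or len(eps_neighborhood(adj, v, eps)) < mu:
            continue                # not an unclassified core vertex
        label[v] = next_id
        queue = deque([v])
        while queue:
            u = queue.popleft()
            if len(eps_neighborhood(adj, u, eps)) < mu:
                continue            # border vertex: labeled, not expanded
            for w in eps_neighborhood(adj, u, eps):
                if w not in label:
                    label[w] = next_id
                    queue.append(w)
        next_id += 1
    unclassified = sorted(set(adj) - set(label))
    return label, unclassified

# Two triangles bridged through vertex 6 (which ends up a hub).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 6},
       3: {4, 5}, 4: {3, 5}, 5: {3, 4, 6},
       6: {2, 5}}
label, rest = scan(adj, eps=0.7, mu=2)
print(label)   # two clusters: {0, 1, 2} and {3, 4, 5}
print(rest)    # the bridging vertex is left unclassified
```

Each vertex is enqueued at most once and each ε-neighborhood is computed from its incident edges, which is where the O(|E|) running time on the next slide comes from.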
Running Time
• Running time = O(|E|)
• For sparse networks: O(|E|) = O(|V|)
[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)
44
Spectral Clustering
• Reference: ICDM'09 Tutorial by Chris Ding
• Example:
• Clustering Supreme Court justices according to their voting behavior
45
Example (Continued)
46
Spectral Graph Partition
• Min-Cut
• Minimize the # of cut edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
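The min-cut objectives above are relaxed into eigenvector problems; here is a minimal spectral bisection sketch (an illustrative assumption using the standard unnormalized Laplacian and a Fiedler-vector sign split, not necessarily the exact objective on these slides):

```python
import numpy as np

def spectral_bisect(adj_matrix):
    # Unnormalized graph Laplacian L = D - A.
    A = np.asarray(adj_matrix, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]            # eigenvector of the 2nd-smallest eigenvalue
    # The sign pattern of the Fiedler vector approximates the minimum cut.
    return fiedler >= 0

# Two triangles (vertices 0-2 and 3-5) joined by a single bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
side = spectral_bisect(A)
print(side)   # one triangle on each side of the cut
```

The split recovers the single-edge cut between the two triangles; which side is labeled True is arbitrary, since an eigenvector's sign is.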
Other References
• "A Tutorial on Spectral Clustering" by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf
51
Chapter 10. Cluster Analysis: Basic Concepts and Methods
• Beyond K-Means
• K-means
• EM-algorithm
• Kernel K-means
• Clustering Graphs and Network Data
• Summary
52
Summary
• Generalizing K-Means
• Mixture Model; EM-Algorithm; Kernel K-Means
• Clustering Graph and Networked Data
• SCAN: density-based algorithm
• Spectral clustering
53
Announcement
• HW 3 due tomorrow
• Course project due next week
• Submit final report, data, code (with readme), evaluation forms
• Make an appointment with me to explain your project
• I will ask questions based on your report
• Final Exam
• 4/22, 3 hours, in class; covers the whole semester with different weights
• You can bring two A4 cheat sheets: one for content before the midterm and the other for content after the midterm
bull Interested in research bull My research area Informationsocial network mining
54
SCAN Density-Based Clustering of Networks
bull How many clusters
bull What size should they be
bull What is the best partitioning
bull Should some points be segregated
26
An Example Network
Application Given simply information of who associates with whom
could one identify clusters of individuals with common interests or
special relationships (families cliques terrorist cells)
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
A Social Network Model
bull Cliques hubs and outliers
bull Individuals in a tight social group or clique know many of the same
people regardless of the size of the group
bull Individuals who are hubs know many people in different groups but belong
to no single group Politicians for example bridge multiple groups
bull Individuals who are outliers reside at the margins of society Hermits for
example know few people and belong to no group
bull The Neighborhood of a Vertex
27
v
Define () as the immediate
neighborhood of a vertex (ie the set
of people that an individual knows )
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Structure Similarity
bull The desired features tend to be captured by a measure we
call Structural Similarity
bull Structural similarity is large for members of a clique and small
for hubs and outliers
|)(||)(|
|)()(|)(
wv
wvwv
28
v
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Structural Connectivity [1]
bull -Neighborhood
bull Core
bull Direct structure reachable
bull Structure reachable transitive closure of direct structure
reachability
bull Structure connected
)(|)()( wvvwvN
|)(|)( vNvCORE
)()()( vNwvCOREwvDirRECH
)()()( wuRECHvuRECHVuwvCONNECT
[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
29
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
Structure-Connected Clusters
bull Structure-connected cluster C
bull Connectivity
bull Maximality
bull Hubs
bull Not belong to any cluster
bull Bridge to many clusters
bull Outliers
bull Not belong to any cluster
bull Connect to less clusters
)( wvCONNECTCwv
CwwvREACHCvVwv )(
hub
outlier
30
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
31
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
063
32
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
075
067
082
33
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
34
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
067
35
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
bull HW 3 due tomorrow
bull Course project due next week bull Submit final report data code (with readme) evaluation forms
bull Make appointment with me to explain your project
bull I will ask questions according to your report
bull Final Exam bull 422 3 hours in class cover the whole semester with different
weights
bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm
bull Interested in research bull My research area Informationsocial network mining
54
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
073
073
073
36
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
37
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
38
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
068
39
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
051
40
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
41
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07 051
051
068
42
13
9
10
11
7
8
12
6
4
0
1 5
2
3
Algorithm
= 2
= 07
43
Running Time
bull Running time = O(|E|) bull For sparse networks = O(|V|)
[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44
Spectral Clustering
bull Reference ICDMrsquo09 Tutorial by Chris Ding
bull Example
bull Clustering supreme court justices according to their voting
behavior
45
Example Continue
46
Spectral Graph Partition
bull Min-Cut
bull Minimize the of cut of edges
47
Objective Function
48
Minimum Cut with Constraints
49
New Objective Functions
50
Other References
bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf
51
Chapter 10 Cluster Analysis Basic Concepts and Methods
bull Beyond K-Means
bull K-means
bull EM-algorithm
bull Kernel K-means
bull Clustering Graphs and Network Data
bull Summary
52
Summary
bull Generalizing K-Means
bull Mixture Model EM-Algorithm Kernel K-Means
bull Clustering Graph and Networked Data
bull SCAN density-based algorithm
bull Spectral clustering
53
Announcement
• HW 3 due tomorrow
• Course project due next week
• Submit final report, data, code (with readme), evaluation forms
• Make an appointment with me to explain your project
• I will ask questions according to your report
• Final Exam
• 4/22, 3 hours, in class; covers the whole semester with different weights
• You can bring two A4 cheat sheets: one for content before the midterm and the other for content after the midterm
• Interested in research? My research area: information/social network mining
54