Machine Learning for Data Science (CS4786)
Lecture 3: Clustering

Course Webpage: http://www.cs.cornell.edu/Courses/cs4786/2017fa/
CLUSTERING CRITERION

Minimize total within-cluster dissimilarity:

    M1 = \sum_{j=1}^{K} \sum_{x_s, x_t \in C_j} dissimilarity(x_t, x_s)

Maximize between-cluster dissimilarity:

    M2 = \sum_{x_s, x_t : c(x_s) \neq c(x_t)} dissimilarity(x_t, x_s)

Maximize the smallest between-cluster dissimilarity:

    M3 = \min_{x_s, x_t : c(x_s) \neq c(x_t)} dissimilarity(x_t, x_s)

Minimize the largest within-cluster dissimilarity:

    M4 = \max_{j \in [K]} \max_{x_s, x_t \in C_j} dissimilarity(x_t, x_s)
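For a fixed clustering, all four criteria are straightforward to evaluate. A small sketch (my own illustration, not from the slides; it takes squared Euclidean distance as the dissimilarity, which the slides leave abstract):

```python
from itertools import combinations

def criteria(clusters):
    """clusters: list of clusters, each a list of points (tuples).
    Each cluster needs at least two points so M4 is defined.
    Returns (M1, M2, M3, M4)."""
    def d(a, b):  # dissimilarity: squared Euclidean distance
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Within-cluster dissimilarities, each unordered pair counted once.
    within = [d(s, t) for C in clusters for s, t in combinations(C, 2)]
    # Between-cluster dissimilarities over all cross-cluster pairs.
    between = [d(s, t)
               for Ci, Cj in combinations(clusters, 2)
               for s in Ci for t in Cj]

    M1 = sum(within)   # total within-cluster dissimilarity (minimize)
    M2 = sum(between)  # total between-cluster dissimilarity (maximize)
    M3 = min(between)  # smallest between-cluster dissimilarity (maximize)
    M4 = max(within)   # largest within-cluster dissimilarity (minimize)
    return M1, M2, M3, M4
```

For two tight, well-separated clusters such as `[[(0,0),(0,1)], [(5,0),(5,1)]]`, M1 and M4 are small while M2 and M3 are large, matching the intent of the criteria.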
CLUSTERING CRITERION
minimizing M1 ≡ maximizing M2
minimizing M5 ≡ minimizing M6
Let's Build an Algorithm
CLUSTERING CRITERION

(This slide repeats the four criteria M1–M4 listed above.)
SINGLE LINK CLUSTERING

Initialize n clusters, assigning each point x_t to its own cluster.
Until there are only K clusters, do:
1. Find the closest two clusters and merge them into one cluster.
2. Update the between-cluster distances (stored in the proximity matrix):

    dissimilarity(C_i, C_j) = \min_{t \in C_i, s \in C_j} dissimilarity(x_t, x_s)
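The merge loop above can be sketched directly in code. This is my own naive O(n^3) illustration, not the lecture's implementation, again assuming squared Euclidean distance as the dissimilarity:

```python
def single_link(points, K):
    """Agglomerative single-link clustering down to K clusters.

    points: list of coordinate tuples.
    Returns a list of clusters, each a list of point indices.
    """
    # Start with n singleton clusters, one per point.
    clusters = [[i] for i in range(len(points))]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Single-link proximity: smallest pairwise dissimilarity across clusters.
    def proximity(ci, cj):
        return min(dist2(points[t], points[s]) for t in ci for s in cj)

    while len(clusters) > K:
        # Find the closest two clusters...
        i, j = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: proximity(clusters[ab[0]], clusters[ab[1]]),
        )
        # ...and merge them into one cluster (j > i, so pop(j) is safe).
        clusters[i] += clusters.pop(j)
    return clusters
```

A real implementation would maintain the proximity matrix incrementally instead of recomputing it, but the structure of the loop is the same.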
Demo
SINGLE LINK OBJECTIVE

Objective for single-link (maximize the smallest between-cluster dissimilarity):

    M3 = \min_{x_s, x_t : c(x_s) \neq c(x_t)} \|x_s - x_t\|_2^2

Single-link clustering is optimal for this objective!
CLUSTERING CRITERION

(This slide repeats the four criteria M1–M4 listed above.)
SINGLE LINK OBJECTIVE

Objective for single-link:

    M3 = \min_{x_s, x_t : c(x_s) \neq c(x_t)} \|x_s - x_t\|_2^2

Single-link clustering is optimal for this objective!

Proof: Say c is the solution produced by single-link clustering, and let c' ≠ c be any other clustering into K clusters. Then there exist t, s such that c'(x_t) ≠ c'(x_s) but c(x_t) = c(x_s): the c' boundary separates some pair that single-link placed in the same cluster.

Key observation: the distance of any pair of points merged by single-link (the edges on the tree) is smaller than the smallest between-cluster dissimilarity of c:

    \min_{t, s : c(x_t) \neq c(x_s)} dissimilarity(x_t, x_s) > \max_{(t, s) \text{ merged (edges on the tree)}} dissimilarity(x_t, x_s)

Since x_t and x_s are connected by tree edges, the c' boundary must cut one of those edges, so the smallest between-cluster dissimilarity of c' is at most that edge's dissimilarity, which is smaller than c's objective value.

(Figure: x_t and x_s on opposite sides of the c' boundary, joined by tree edges a and b.)
CLUSTERING CRITERION

Minimize within-cluster average dissimilarity:

    M6 = \sum_{j=1}^{K} \sum_{s \in C_j} dissimilarity(x_s, C_j)
       = \sum_{j=1}^{K} \sum_{s \in C_j} \frac{1}{|C_j|} \sum_{t \in C_j, t \neq s} dissimilarity(x_s, x_t)
       = \sum_{j=1}^{K} \frac{1}{|C_j|} \sum_{s \in C_j} \sum_{t \in C_j, t \neq s} \|x_s - x_t\|_2^2

Minimize within-cluster variance, where r_j = \frac{1}{n_j} \sum_{x \in C_j} x:

    M5 = \sum_{j=1}^{K} \sum_{t \in C_j} \|x_t - r_j\|_2^2
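The claimed equivalence of minimizing M5 and M6 rests on a standard identity: within each cluster, \sum_{s,t \in C_j} \|x_s - x_t\|^2 = 2 |C_j| \sum_{t \in C_j} \|x_t - \mu_j\|^2, so M6 = 2·M5 for every clustering. A quick numeric check of this (my own illustration, not from the slides):

```python
import random

random.seed(0)
# A toy clustering: two clusters of random 2-D points.
clusters = [
    [(random.random(), random.random()) for _ in range(5)],
    [(random.random() + 3.0, random.random()) for _ in range(7)],
]

def sq(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def m5(clusters):
    # Sum of squared distances to each cluster's centroid r_j.
    total = 0.0
    for C in clusters:
        r = tuple(sum(coords) / len(C) for coords in zip(*C))
        total += sum(sq(x, r) for x in C)
    return total

def m6(clusters):
    # (1/|C_j|)-weighted sum of within-cluster pairwise squared distances
    # (including s = t costs nothing, since those terms are zero).
    total = 0.0
    for C in clusters:
        total += sum(sq(xs, xt) for xs in C for xt in C) / len(C)
    return total

# M6 is exactly twice M5, so the two criteria order clusterings identically.
assert abs(m6(clusters) - 2 * m5(clusters)) < 1e-9
```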
CLUSTERING CRITERION

minimizing M1 ≡ maximizing M2
minimizing M5 ≡ minimizing M6
Let's Build an Algorithm
CLUSTERING CRITERION

Minimize within-cluster variance, where r_j = \frac{1}{|C_j|} \sum_{x_t \in C_j} x_t:

    M5 = \sum_{j=1}^{K} \sum_{x_t \in C_j} \|x_t - r_j\|_2^2

Minimize within-cluster weighted scatter:

    M6 = \sum_{j=1}^{K} \frac{1}{|C_j|} \sum_{x_s \in C_j} \sum_{x_t \in C_j, x_t \neq x_s} \|x_s - x_t\|_2^2
       = \sum_{j=1}^{K} \sum_{x_s \in C_j} dissimilarity(x_s, C_j)
Demo
K-MEANS CLUSTERING

For all j ∈ [K], initialize cluster centroids r_j^0 randomly and set m = 1.

Repeat until convergence (or until patience runs out):
1. For each t ∈ {1, ..., n}, set the cluster identity of the point:

       c^m(x_t) = \argmin_{j \in [K]} \|x_t - r_j^{m-1}\|

2. For each j ∈ [K], set the new representative as:

       r_j^m = \frac{1}{|C_j^m|} \sum_{x_t \in C_j^m} x_t

3. m ← m + 1
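The loop above translates almost line for line into code. A minimal sketch (my own, not the lecture's implementation), using squared Euclidean distance and initializing centroids at randomly chosen data points:

```python
import random

def kmeans(points, K, iters=100, seed=0):
    """Lloyd's algorithm as on the slide: assign points, recompute
    centroids, repeat until the assignment stops changing."""
    rng = random.Random(seed)
    # Initialize the K centroids r_j^0 at randomly chosen distinct points.
    centroids = rng.sample(points, K)

    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    assign = None
    for _ in range(iters):
        # Step 1: c^m(x_t) = argmin_j ||x_t - r_j^{m-1}||.
        new_assign = [min(range(K), key=lambda j: sq(x, centroids[j]))
                      for x in points]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        # Step 2: r_j^m = mean of the points currently assigned to cluster j.
        for j in range(K):
            members = [x for x, c in zip(points, assign) if c == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return assign, centroids
```

Since the objective only reaches a local minimum (next slides), practical use typically restarts from several random initializations and keeps the best run.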
K-means objective:

    \sum_{j=1}^{K} \sum_{t \in C_j} \Big\| x_t - \frac{1}{|C_j|} \sum_{s \in C_j} x_s \Big\|^2 = \min_{r_1, ..., r_K} \sum_{j=1}^{K} \sum_{t \in C_j} \|x_t - r_j\|^2
K-MEANS CONVERGENCE

The k-means algorithm converges to a local minimum of the objective

    O(c; r_1, ..., r_K) = \sum_{j=1}^{K} \sum_{t : c(x_t) = j} \|x_t - r_j\|_2^2

For any clustering c, minimizing over the representatives recovers M5:

    M5 = \min_{r_1, ..., r_K} O(c; r_1, ..., r_K)

and k-means minimizes the objective over both c and r_1, ..., r_K.

Proof: The cluster-assignment step improves the objective,

    O(c^{m-1}; r_1^{m-1}, ..., r_K^{m-1}) ≥ O(c^m; r_1^{m-1}, ..., r_K^{m-1})

(by definition of c^m(x_t)), and computing centroids improves it again,

    O(c^m; r_1^{m-1}, ..., r_K^{m-1}) ≥ O(c^m; r_1^m, ..., r_K^m)

(by the fact that the centroid is the minimizer).
Fact: Centroid is Minimizer

    \forall r_j: \sum_{t \in C_j} \Big\| x_t - \frac{1}{|C_j|} \sum_{s \in C_j} x_s \Big\|^2 \le \sum_{t \in C_j} \|x_t - r_j\|^2
Proof: Let \mu_j = \frac{1}{|C_j|} \sum_{t \in C_j} x_t. Then

    \sum_{t \in C_j} \|x_t - r_j\|^2
      = \sum_{t \in C_j} \|x_t - \mu_j + \mu_j - r_j\|^2
      = \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 + 2 \sum_{t \in C_j} (x_t - \mu_j)^\top (\mu_j - r_j)
      = \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 + 2 \Big( \sum_{t \in C_j} x_t - |C_j| \mu_j \Big)^\top (\mu_j - r_j)
      = \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2
      \ge \sum_{t \in C_j} \|x_t - \mu_j\|^2

(the cross term vanishes because \sum_{t \in C_j} x_t = |C_j| \mu_j).
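The fact is easy to sanity-check numerically: for a random cluster, no candidate representative beats the centroid μ_j. This check is my own illustration (with a small tolerance for floating-point rounding), not part of the lecture:

```python
import random

random.seed(1)
# A random 2-D cluster C_j of 20 points.
C = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]

# Centroid mu_j = (1/|C_j|) * sum of the points.
mu = tuple(sum(v) / len(C) for v in zip(*C))

def scatter(r):
    # Sum over the cluster of ||x_t - r||^2.
    return sum(sum((xi - ri) ** 2 for xi, ri in zip(x, r)) for x in C)

best = scatter(mu)
# Try many random candidates r_j: none does better than the centroid.
for _ in range(1000):
    r = (random.uniform(-3, 3), random.uniform(-3, 3))
    assert scatter(r) >= best - 1e-9  # tolerance for rounding only
```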