Intelligent Database Systems Lab
N.Y.U.S.T. I. M.
Determining the best K for clustering transactional datasets –
A coverage density-based approach
Presenter : Lin, Shu-Han
Authors : Hua Yan, Keke Chen, Ling Liu, Joonsoo Bae
Data & Knowledge Engineering (DKE) 68 (2009) 28–48
Outline
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
Motivation
Cluster transactional datasets, a special kind of categorical data
Time complexity: $O(dmN^2 \log N)$
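| Name | Buy |
|---|---|
| Jane | Coke, Milk |
| Mary | Coke, Pepsi |
| Tom | Milk, Water |
| Denny | Milk, Juice |
| Tina | Juice, Red Wine, Pepsi |

Boolean values

| Name | Coke | Milk | Pepsi | Water | Juice | Red Wine |
|---|---|---|---|---|---|---|
| Jane | 1 | 1 | 0 | 0 | 0 | 0 |
| Mary | 1 | 0 | 1 | 0 | 0 | 0 |
| Tom | 0 | 1 | 0 | 1 | 0 | 0 |
| Denny | 0 | 1 | 0 | 0 | 1 | 0 |
| Tina | 0 | 0 | 1 | 0 | 1 | 1 |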
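The conversion from the purchase list to Boolean values can be sketched in a few lines of Python. This is only an illustration of the encoding shown on the slide; the function and variable names (`to_boolean_rows`, `transactions`, `items`) are hypothetical and do not come from the paper.

```python
# Purchase data copied from the slide; all names below are illustrative.
transactions = {
    "Jane": ["Coke", "Milk"],
    "Mary": ["Coke", "Pepsi"],
    "Tom": ["Milk", "Water"],
    "Denny": ["Milk", "Juice"],
    "Tina": ["Juice", "Red Wine", "Pepsi"],
}

# Column order of the Boolean table on the slide.
items = ["Coke", "Milk", "Pepsi", "Water", "Juice", "Red Wine"]


def to_boolean_rows(transactions, items):
    """Return one 0/1 list per customer, with one entry per item column."""
    rows = {}
    for name, bought in transactions.items():
        bought_set = set(bought)
        rows[name] = [int(item in bought_set) for item in items]
    return rows


boolean_rows = to_boolean_rows(transactions, items)
for name, row in boolean_rows.items():
    print(name, *row)
```

Each transaction becomes a sparse row with one column per distinct item, which is why treating transactional data as ordinary categorical data gets expensive as the item vocabulary grows.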
Objectives
Design ACTD (Agglomerative Clustering algorithm with Transactional-cluster-modes Dissimilarity), a method tailored to transactional data, for use instead of ACE (Agglomerative Categorical clustering with Entropy criterion)
Find the best K
More efficiently
Methodology – Overview of SCALE
SCALE (Sampling, Clustering structure Assessment, cLustering & domain-specific Evaluation)
[Figure: SCALE framework overview; labels: Agglomerative (ACE, ACTD), BKPlot, DMDI]
Methodology – ACTD: Intra-cluster similarity
Coverage Density: $\frac{7}{N_k \times M_k} = \frac{7}{3 \times 3} = \frac{7}{9}$
Transactional-cluster-mode: a subset of items
Item ratios: c: 1, b: 2/3, a: 2/3
Against .8, in this case only c is the transactional-cluster-mode
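One way to read the example numbers is sketched below in Python. The sketch rests on several assumptions that the slide does not spell out: $N_k$ is taken to be the number of transactions in the cluster, $M_k$ the number of distinct items, the numerator the total number of item occurrences, and an item is taken to belong to the transactional-cluster-mode when its ratio is at least the coverage density. The three-transaction cluster is hypothetical, chosen only so that the ratios come out as 1, 2/3 and 2/3.

```python
from fractions import Fraction

# Hypothetical cluster: c appears in 3 of 3 transactions, a and b in 2 of 3.
cluster = [{"a", "b", "c"}, {"b", "c"}, {"a", "c"}]


def coverage_density(cluster):
    """Item occurrences divided by (transactions x distinct items).

    Assumes a non-empty cluster of non-empty transactions.
    """
    n_k = len(cluster)                    # assumed meaning of N_k
    m_k = len(set().union(*cluster))      # assumed meaning of M_k
    occurrences = sum(len(t) for t in cluster)
    return Fraction(occurrences, n_k * m_k)


def item_ratios(cluster):
    """Fraction of the cluster's transactions that contain each item."""
    n_k = len(cluster)
    all_items = sorted(set().union(*cluster))
    return {i: Fraction(sum(i in t for t in cluster), n_k) for i in all_items}


def cluster_mode(cluster):
    """Items whose ratio reaches the coverage density (assumed rule)."""
    cd = coverage_density(cluster)
    return {i for i, r in item_ratios(cluster).items() if r >= cd}


cd = coverage_density(cluster)
ratios = item_ratios(cluster)
mode = cluster_mode(cluster)
print(cd, ratios, mode)
```

Under these assumptions the density is 7/9 (about .8), and only c clears it, which agrees with the slide's remark that only c is the transactional-cluster-mode.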
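Methodology – ACTD: Inter-cluster similarity
Transactional-cluster-mode dissimilarity
Time complexity: $O(dmN^2 \log N)$ → $O(MN^2 \log N)$
Examples:
$1 - \frac{3+3}{3 \times 2} = 0$
$1 - \frac{3+3}{5 \times 2} = 1 - \frac{6}{10} = \frac{2}{5}$
$1 - \frac{3+3}{6 \times 2} = 1 - \frac{6}{12} = \frac{1}{2}$
Range: [0, .5]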
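The three worked examples are consistent with a dissimilarity of the form $1 - \frac{|I_i| + |I_j|}{2\,|I_i \cup I_j|}$, where $I_i$ and $I_j$ are the transactional-cluster-modes of two clusters. That general form is an inference from the numbers on the slide, not a definition quoted from the paper, and the item sets in the Python sketch below are hypothetical. They are picked so that the union sizes are 3, 5 and 6.

```python
from fractions import Fraction


def mode_dissimilarity(mode_i, mode_j):
    """1 - (|I_i| + |I_j|) / (2 * |I_i union I_j|), inferred from the examples.

    Assumes both modes are non-empty sets of items.
    """
    union_size = len(mode_i | mode_j)
    return 1 - Fraction(len(mode_i) + len(mode_j), 2 * union_size)


base = {"a", "b", "c"}

identical = mode_dissimilarity(base, {"a", "b", "c"})  # union of 3 items
partial = mode_dissimilarity(base, {"c", "d", "e"})    # union of 5 items
disjoint = mode_dissimilarity(base, {"d", "e", "f"})   # union of 6 items
print(identical, partial, disjoint)
```

With this form, identical modes give 0 and fully disjoint modes give 1/2, which matches the stated range of [0, .5].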
Methodology – DMDI
Valleys, where the curve changes dramatically
Experiments – Performance
Experiments – Quality
Experiments – Quality on sample dataset
With noise
Conclusions
ACTD: the coverage density-based method is promising for transactional datasets
Faster and more stable than the entropy-based method
The agglomerative hierarchical clustering algorithm and DMDI can help to find the best K
Comments
Advantage …
Drawback …
Application …