Data clustering
TRANSCRIPT
![Page 1: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/1.jpg)
DATA CLUSTERING
![Page 2: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/2.jpg)
DATA: Data is any raw material or unorganized information.
CLUSTER: A cluster is a group of objects that belong to the same class. In a database, a cluster is a set of tables that share common columns and are physically stored together as one table.
Data Clustering
![Page 3: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/3.jpg)
Data clustering is a technique in which information that is logically similar is physically stored together.
Clustering is “the process of organizing objects into groups whose members are similar in some way.”
In clustering, objects with similar properties are placed in one class of objects (e.g., Nic, lib).
DATA CLUSTERING
![Page 4: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/4.jpg)
![Page 5: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/5.jpg)
Why clustering?
A few good reasons ...
Simplification (e.g., lib)
Pattern detection (e.g., fb img)
Useful in data concept construction
Unsupervised learning process
A procedure that identifies groups in the data.
![Page 6: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/6.jpg)
Where do we use data clustering?
Data Mining
Pattern Recognition
Speech Recognition
Text Mining
Web Analysis
Marketing
Medical Diagnostics
Image Processing
Applications of Data Clustering
![Page 7: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/7.jpg)
A good clustering method will produce high-quality clusters with high intra-class similarity and low inter-class similarity.
The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
What Is Good Clustering?
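The criterion of high intra-class similarity and low inter-class similarity can be checked numerically. The sketch below is only an illustration: it assumes Euclidean distance as the (inverse) similarity measure, and the function names and toy points are hypothetical, not taken from the slides. A small average distance within clusters together with a large average distance between clusters suggests a good result.

```python
from itertools import combinations
from math import dist


def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points taken from different clusters."""
    pairs = [dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a
             for q in b]
    return sum(pairs) / len(pairs)


# Hypothetical toy data: two tight, well-separated groups.
good = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
# The same six points, deliberately split badly across the two groups.
bad = [[(1, 1), (8, 8), (2, 1)], [(1, 2), (8, 9), (9, 8)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name,
          round(mean_intra_distance(clusters), 2),
          round(mean_inter_distance(clusters), 2))
```

For the well-separated grouping, the intra-cluster average should come out much smaller than the inter-cluster average; for the bad split, the two values move toward each other.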
![Page 8: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/8.jpg)
Good Clustering
![Page 9: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/9.jpg)
Data mining is the process of discovering information from large amounts of data, using pattern recognition technologies and mathematical techniques.
Data mining is widely used in many domains, such as retail, finance, telecommunications and social media.
Data Clustering in Data Mining
(The analysis step of the "Knowledge Discovery in Databases" process, or KDD)
![Page 10: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/10.jpg)
Partitioning Methods
Hierarchical Methods
Density-Based Methods
Grid-Based Methods
Model-Based Clustering Methods
Major Clustering Approaches
![Page 11: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/11.jpg)
Partitioning method: Construct a partition of a database D of n objects into a set of k clusters.
Given k, find a partition of k clusters that optimizes the chosen partitioning criterion.
Heuristic methods: k-means and k-medoids algorithms
k-means (MacQueen ’67): Each cluster is represented by the center of the cluster.
k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw ’87): Each cluster is represented by one of the objects in the cluster.
Partitioning Methods
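The difference between the two cluster representatives can be shown on a single cluster. This is a minimal sketch, assuming Euclidean distance; the helper names and the sample points (including the outlier) are made up for illustration. The centroid is a computed mean and need not be a data point, while the medoid is always one of the cluster's own objects.

```python
from math import dist


def centroid(points):
    """Mean point of the cluster (the k-means representative)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """Cluster member with the smallest total distance to all members
    (the k-medoids representative)."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


# Hypothetical cluster with one outlier at (10, 10).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (10.0, 10.0)]
c = centroid(cluster)
m = medoid(cluster)
print("centroid:", c, "is a data point:", c in cluster)
print("medoid:  ", m, "is a data point:", m in cluster)
```

In this example the outlier pulls the centroid away from the three nearby points, whereas the medoid stays on one of them, which is one common reason for preferring k-medoids on noisy data.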
![Page 12: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/12.jpg)
Given k, the k-means algorithm is implemented in 4 steps:
1. Partition the objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters of the current partition. The centroid is the center (mean point) of the cluster.
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when there are no more new assignments.
The K-Means Clustering Method
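The four steps can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: it assumes Euclidean distance, uses an arbitrary round-robin initial partition, and adds a `max_iter` cap and an empty-cluster fallback that the slides do not mention. The sample points are hypothetical.

```python
from math import dist


def centroid(points):
    """Mean point of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100):
    # Step 1: partition the objects into k nonempty subsets.
    # Round-robin is an arbitrary choice; it needs len(points) >= k.
    labels = [i % k for i in range(len(points))]
    centers = []
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster of the current partition.
        centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: if a cluster has become empty, reuse an input point.
            centers.append(centroid(members) if members else points[j])
        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        # Step 4: stop when no assignment changes, otherwise repeat from Step 2.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Hypothetical data: two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

The result depends on the initial partition; a different starting assignment can converge to a different local optimum, which is why practical implementations usually restart several times.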
![Page 13: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/13.jpg)
[Figure: four scatter-plot panels, each with x and y axes running from 0 to 10]
The K-Means Clustering Method: Example
![Page 14: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/14.jpg)
Create a hierarchical decomposition of the set of data (or objects) using some criterion
Hierarchical Clustering
![Page 15: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/15.jpg)
Hierarchical Clustering
Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it needs a termination condition.
Agglomerative (AGNES): bottom-up
Divisive (DIANA): top-down
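[Figure: hierarchy over objects a, b, c, d, e. Read bottom-up (agglomerative), a and b merge into ab, d and e into de, c and de into cde, and ab and cde into abcde. Read top-down (divisive), the same steps run in reverse.]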
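The bottom-up (agglomerative) direction can be sketched as follows. This is an illustration under stated assumptions: it uses single-link (closest pair) distance between clusters, which the slides do not specify, and the coordinates for a to e are invented so that the merges come out as ab, de, cde, abcde. The `max_distance` parameter plays the role of the termination condition.

```python
from itertools import combinations
from math import dist, inf


def single_link(c1, c2, points):
    """Cluster distance = distance between the closest pair of members."""
    return min(dist(points[a], points[b]) for a in c1 for b in c2)


def agglomerate(points, max_distance=inf):
    """Repeatedly merge the two closest clusters (bottom-up).

    Stops when one cluster remains or when the closest pair of clusters
    is farther apart than max_distance (the termination condition).
    Returns the final clusters and the list of merged cluster names.
    """
    clusters = [frozenset([name]) for name in points]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_link(pair[0], pair[1], points))
        if single_link(a, b, points) > max_distance:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append("".join(sorted(a | b)))
    return clusters, history


# Hypothetical coordinates, chosen only to reproduce the merge order above.
points = {"a": (0, 0), "b": (1, 0), "c": (5, 0), "d": (8, 0), "e": (9.5, 0)}

_, full_history = agglomerate(points)
print(full_history)

stopped, _ = agglomerate(points, max_distance=3.5)
print(sorted("".join(sorted(c)) for c in stopped))
```

With no distance limit the procedure runs until everything is in one cluster; with a limit it stops earlier and returns several clusters, which shows how the termination condition replaces the parameter k.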
![Page 16: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/16.jpg)
Density-based: based on connectivity and density functions
Grid-based: based on a multiple-level granularity structure
Model-based: A model is hypothesized for each of the clusters, and the idea is to find the best fit of the data to that model.
Other Algorithms
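To make the density-based idea concrete, here is a compact sketch of DBSCAN, one well-known density-based algorithm. The slides do not name a specific algorithm, so DBSCAN is chosen here only as an example; the parameter names `eps` and `min_pts`, the Euclidean distance, and the sample points are assumptions. Points with at least `min_pts` neighbours within `eps` are treated as dense, clusters grow by connecting dense points, and isolated points are labelled as noise.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not dense
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster_id
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # j is dense, so keep expanding from it
                queue.extend(nearby)
    return labels


# Hypothetical data: two dense squares and one isolated point.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (5, 15)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike k-means, this approach does not need the number of clusters in advance, and it marks outliers explicitly instead of forcing them into a cluster.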
![Page 17: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/17.jpg)
Scalability: We need highly scalable clustering algorithms to deal with large databases. Scalability is the ability of a system to handle a growing amount of work in a capable manner.
Ability to deal with different kinds of attributes: Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
High dimensionality: The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
Ability to deal with noisy data: Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
Interpretability: The clustering results should be interpretable, comprehensible, and usable.
Requirements of Clustering in Data Mining
![Page 18: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/18.jpg)
Conclusion
In this presentation, I tried to give the basic concept of clustering by first providing the definition of clustering and then the definitions of some related terms. I gave some examples to elaborate the concept. Then I presented different approaches to data clustering and discussed some algorithms that implement those approaches. The partitioning and hierarchical methods of clustering were explained. The applications of clustering were also discussed, with the examples of medical image databases and data mining using data clustering.
![Page 19: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/19.jpg)
![Page 20: Data clustring](https://reader035.vdocuments.mx/reader035/viewer/2022070515/5877da961a28abaa6c8b6143/html5/thumbnails/20.jpg)
Thank You…