Presenter: Keng-Yu Lin. Authors: Amir Ahmad, Lipika Dey. PRL, 2011


Post on 23-Feb-2016








Research Progress Report

A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets

Presenter: Keng-Yu Lin
Authors: Amir Ahmad, Lipika Dey
PRL, 2011

Intelligent Database Systems Lab, National Yunlin University of Science and Technology (N.Y.U.S.T. I.M.)

Outlines

Motivation
Objectives
Methodology
Experiments
Conclusions
Comments

Motivation

Almost all subspace clustering algorithms proposed so far are designed for numeric datasets.

Objectives

This paper presents a k-means type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical datasets.

Methodology

k-means clustering algorithm

1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
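The k-means steps above can be sketched in Python as follows. This is a minimal illustration of plain k-means on numeric vectors only, not the subspace algorithm for mixed data proposed by Ahmad and Dey. The function names `k_means` and `squared_distance`, the choice of K random objects as initial centroids, the squared Euclidean distance, and the `max_iter` cap are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means on numeric data (illustrative sketch, max_iter >= 1).

    Returns (centroids, assignments, cost), where cost is the sum of
    squared distances from each object to its assigned centroid.
    """
    rng = random.Random(seed)
    # Step 1: place K initial centroids. Picking K of the objects at
    # random is an assumption; the slides do not say how to initialise.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = []

    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]

        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                # Assumption: an empty group keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    # The metric being minimised: within-group sum of squared distances.
    cost = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, cost


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, assignments, cost = k_means(data, k=2)
    print(centroids, assignments, cost)
```

Categorical attributes have no mean, so this sketch cannot be applied directly to mixed datasets; handling them is what the paper's algorithm addresses.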

Experiments

Vote dataset
error rate: 4.8%
Zaki et al. error rate: 3.8%

Mushroom datasets
error rate: 4.1%
Zaki et al. error rate: 0.3%

DNA datasets
error rate: 17%

Australian credit data
error rate: 13.9%
Huang et al. (2005) error rate: 15%

Conclusions

This paper presented a clustering algorithm for subspace clustering for mixed numeric and categorical data.

Comments

Advantage

Applications

Subspace clustering.
