A matrix density based algorithm to hierarchically co-cluster documents and words

Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
Authors: Bhushan Mandhani, Sachindra Joshi, Krishna Kummamuru


Post on 03-Jan-2016



Page 2

Outline

Motivation
Objective
Introduction
Background
Rowset Partitioning and Submatrix Agglomeration (RPSA)
Experimental results
Conclusions
Personal Opinion

Page 3

Motivation

With the explosion of unstructured information, it has become increasingly important to organize the information in a comprehensible and navigable manner.

Page 4

Objective

A hierarchical arrangement of documents is very useful for browsing a document collection, as the popularity of Yahoo and Google shows.

This paper proposes an algorithm that hierarchically clusters documents to address this problem.

Page 5

Introduction

90s -> 100 thousand pages; 2002 -> 2 billion pages
It has become increasingly important to organize the information
Manual organization is accurate, but not always feasible
Tools are needed to automatically arrange documents into labeled hierarchies
The paper proposes RPSA -> a two-step partitional-agglomerative algorithm

Page 6

Background

Vector Model for Documents
Evaluation of Clustering Quality
Evaluation of Hierarchical Clustering

Page 7

Vector Model for Documents

We have d documents
Document i is represented by a vector
$t_{ij}$ is the number of occurrences of word j in document i
Term Frequency (TF)
Inverse Document Frequency (IDF)
Unitized TF-IDF
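The formulas on this slide did not survive in the transcript, so the following is only a minimal sketch of one common way to build unitized TF-IDF vectors. It assumes the IDF variant log(d / df_j), where df_j is the number of documents containing word j, and it assumes "unitized" means scaling each document vector to unit Euclidean length. The function name `unitized_tfidf` is hypothetical, and the paper may use a different weighting.

```python
import math

def unitized_tfidf(counts):
    """counts[i][j] is t_ij, the number of occurrences of word j in document i.

    Returns one TF-IDF vector per document, scaled to unit Euclidean length.
    Assumption: IDF is log(d / df_j); the slides do not state the exact variant.
    """
    d = len(counts)
    n_words = len(counts[0]) if d else 0
    # Document frequency: in how many documents does word j occur?
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_words)]
    # Words that occur in no document get weight 0 to avoid division by zero.
    idf = [math.log(d / df_j) if df_j else 0.0 for df_j in df]
    vectors = []
    for row in counts:
        weighted = [t * w for t, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        # An all-zero vector cannot be normalized and is returned unchanged.
        vectors.append([x / norm for x in weighted] if norm else weighted)
    return vectors
```

Under this variant a word that occurs in every document receives weight zero, so it does not influence the clustering.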

Page 8

Evaluation of Clustering Quality

1. Purity:

2. Entropy: $E_j = -\sum_i p_{ij} \log p_{ij}$
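As a hedged illustration of the two measures, the sketch below computes both for a single cluster. It assumes the usual definitions: p_ij is the fraction of documents in cluster j that belong to class i, purity is the largest such fraction, and entropy is the negated sum of p_ij log p_ij. The purity formula is missing from the transcript, so that definition is an assumption, and the function name `cluster_purity_entropy` is hypothetical.

```python
import math
from collections import Counter

def cluster_purity_entropy(labels):
    """labels: true class labels of the documents in one non-empty cluster.

    Returns (purity, entropy) under the standard definitions assumed here:
    purity is the share of the dominant class, entropy is -sum(p * log p).
    """
    n = len(labels)
    fractions = [count / n for count in Counter(labels).values()]
    purity = max(fractions)
    entropy = -sum(p * math.log(p) for p in fractions)
    return purity, entropy
```

A cluster drawn from a single class has purity 1 and entropy 0; mixing classes lowers purity and raises entropy.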

Page 9

Evaluation of Hierarchical Clustering

Page 10

Rowset Partitioning and Submatrix Agglomeration (RPSA)

Two-step partitional-agglomerative algorithm
1st step: The Partitioning Step
2nd step: The Agglomerative Step

Page 11

The Partitioning Step

Define the density of submatrices

a row r, a column c

a set R of rows, a set C of columns
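The density formulas themselves are missing from the transcript. The sketch below assumes that the density of a submatrix is the average of its entries, which is a common definition but is not confirmed by the slides. The function names are hypothetical. A row r is measured against a column set C, and a column c against a row set R, as special cases of the same average.

```python
def submatrix_density(matrix, rows, cols):
    """Average entry of the submatrix picked out by row set `rows` and column set `cols`.

    Assumption: density means the mean of the selected entries.
    """
    rows, cols = list(rows), list(cols)
    if not rows or not cols:
        return 0.0
    total = sum(matrix[r][c] for r in rows for c in cols)
    return total / (len(rows) * len(cols))

def row_density(matrix, r, cols):
    """Density of a single row r restricted to the columns in `cols`."""
    return submatrix_density(matrix, [r], cols)

def column_density(matrix, c, rows):
    """Density of a single column c restricted to the rows in `rows`."""
    return submatrix_density(matrix, rows, [c])
```

With a document-word matrix, a dense submatrix corresponds to a group of documents that share a group of heavily weighted words, which is what makes it a candidate co-cluster.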

Page 12

The Partitioning Step

Generating a Leaf Cluster

Page 13

The Partitioning Step

Choice of Leader Documents

The length of a document is the sum of the entries of the TF-IDF vector representing that document

Documents with relatively large lengths were observed to be better leader documents for the algorithm above
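The observation about leader documents can be sketched as follows. This is only an illustration under a simplifying assumption: it picks the single longest unassigned document as leader, whereas the slides only say that "relatively large" lengths work better, so the paper's actual selection rule may differ. The names `document_length` and `pick_leader` are hypothetical.

```python
def document_length(vector):
    """Length of a document: the sum of the entries of its TF-IDF vector."""
    return sum(vector)

def pick_leader(tfidf_rows, candidates):
    """Return the index of the longest document among `candidates`.

    Assumption: the longest remaining document is chosen as leader.
    """
    return max(candidates, key=lambda i: document_length(tfidf_rows[i]))
```

For example, with rows `[[0.1, 0.2], [0.5, 0.5], [0.4, 0.0]]` and candidates `[0, 1, 2]`, document 1 has the largest length and would be chosen.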

Page 14

The Partitioning Step

The Complete Partitioning Algorithm

Page 15

The Partitioning Step

Complexity Analysis
The time complexity is O(mz)
The space complexity is O(z)

Page 16

The Agglomerative Step

Reduce the number of clusters
The similarity measure between two clusters for merging
Flat Clustering
Hierarchical Clustering
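The slide does not spell out the similarity measure or the merge rule, so the following is a generic sketch of greedy agglomeration rather than the paper's procedure. It repeatedly merges the most similar pair of clusters until a target count is reached. The `similarity` argument is a caller-supplied function and the name `agglomerate` is hypothetical. One possible reading of the slide is that stopping at a chosen number of clusters gives a flat clustering, while merging down to one cluster and keeping the merge history gives a hierarchy.

```python
from itertools import combinations

def agglomerate(clusters, similarity, target):
    """Greedily merge the most similar pair until `target` clusters remain.

    clusters:   iterable of collections of document ids (the leaf clusters)
    similarity: function taking two frozensets and returning a number
                (assumption: supplied by the caller, not taken from the paper)
    Returns (final_clusters, history), where history lists the merged pairs
    in order and can be read as a merge tree.
    """
    clusters = [frozenset(c) for c in clusters]
    history = []
    while len(clusters) > max(target, 1):
        # Pick the pair with the highest similarity score.
        a, b = max(combinations(clusters, 2), key=lambda pair: similarity(*pair))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(a | b)
        history.append((a, b))
    return clusters, history
```

This naive version rescans all pairs after every merge, which is fine for a small number of leaf clusters but is not meant to reflect the complexity bounds stated in the slides.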

Page 17

The Agglomerative Step

Complexity Analysis
The time complexity is O( )
The space complexity is O( )

zm2

Page 18

Experimental results-Flat Clustering

Data Sets

Page 19

Experimental results-Flat Clustering

Results

Page 20

Experimental results-Flat Clustering

Page 21

Experimental results-Hierarchical Clustering

Data Sets

Page 22

Experimental results-Hierarchical Clustering

Data Sets

Page 23

Experimental results-Hierarchical Clustering

Results

Page 24

Conclusions

RPSA is comparable with or better than the best k-means run

Its performance does not degrade on small data sets

Its purity on hierarchical clustering is acceptable

Page 25

Personal Opinion
