extending the growing hierarchal som for clustering documents in graphs domain
DESCRIPTION
Extending the Growing Hierarchal SOM for Clustering Documents in Graphs domain. Presenter : Cheng- Hui Chen Authors : Mahmoud F. Hussin , Mahmoud R. farra and Yasser El- Sonbaty IJCNN, 2008. Outlines. Motivation Objectives Methodology Experiments Conclusions Comments. - PowerPoint PPT PresentationTRANSCRIPT
Intelligent Database Systems Lab
國立雲林科技大學National Yunlin University of Science and Technology
1
Extending the Growing Hierarchal SOM for Clustering Documents in Graphs domain
Presenter : Cheng-Hui Chen Authors : Mahmoud F. Hussin, Mahmoud R. farra and Yasser El-Sonbaty
IJCNN, 2008
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
2
Outlines Motivation Objectives Methodology Experiments Conclusions Comments
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Motivation· The variants of SOM are limited by the fact that they
use only the VSM for document representations. ─ It does not represent any relation between the words.─ The space complexity to the VSM.
· The sentences being broken down into their individual components without any representation of the sentence structure.
3
term B
term A
*
*
D1
D2
θ
d
river raftingmild
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Objectives· Using graphs to represent documents helped
the salient features of data through using edges to represent relations and using vertices to represent words.
· The decrease the space complexity comparing to the VSM.
· The extend the GHSOM to work in the graph domain to enhance the quality of clusters.
4
rafting
river
mild
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Methodology
5
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Enhance the DIG to work with G-GHSOM· The Document Index Graph (DIG) model
─ the DIG for representing the document and Exploited it in the document clustering.
· For example (the document table of word "river" is shown)─ River rafting. (doc1)─ mild river rafting. (doc2)─ River fishing. (doc3)
6
e1 S0(1)
e0 S0(0)
river
3
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Enhance the DIG to work with G-GHSOM
7
1
2
1
2
3
4
34
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Enhance the DIG to work with G-GHSOM· Single-word similarity measure
· Two document vectors similarity measure
· The total similarity is the integration
8
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Ment of the G-GHSOM to work with graph· Neuron Initialization
─ Detecting the matching list to calculate the phrase based similarity.
9
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Ment of the G-GHSOM to work with graph
10
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Ment of the G-GHSOM to work with graph
11
Gin
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Experiments· Data set
─ Reuter’s news articles (RNA)─ University of Waterloo and Canadian Web sites (UW-
CAN)
12
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Experiments
13
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Conclusions· The extend the GHSOM to a new graph based
GHSOM: (G-GHSOM) to enhance the quality of the document clustering.─ G-GHSOM works successfully with graph domain and
achieves a better quality clustering than TGHSOM in document clustering.
· The enhanced the DIG model to work with GHSOM algorithm.
14
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
15
Comments· Advantages
─ Enhance the quality of the document clustering· Application
─ SOM─ Clustering