Web Search Clustering and Labeling with Hidden Topics
Presenter: Chien-Hsing Chen. Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha. 2009.TALIP.40.
![Page 1: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/1.jpg)
Intelligent Database Systems Lab
National Yunlin University of Science and Technology (國立雲林科技大學)
![Page 2: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/2.jpg)
Outline: Motivation, Objective, Method, Experiments, Conclusion, Comment
![Page 3: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/3.jpg)
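d1: ezPeer+ music downloads, music previews, lyrics, MP3, music site - 蔡依林 (Jolin Tsai) - albums over the years. ezPeer+ – 蔡依林 (Jolin Tsai) - J1 Live Concert full audio/video record, J-game, 看我72變 (Magic), 城堡 (Castle), J9 Party selections, Jolin J-Top greatest hits, 舞孃 (Dancing Diva), Jolin Tsai 唯舞獨尊 (Dancing Forever) concert preview edition & remix album & 花 ... web.ezpeer.com/singer/s120.html - Cached - Similar pages
d2: ezPeer+ music downloads, music previ… 花蝴蝶 ("flower butterfly") sounds great… web.ezpeer.com/singer/s120.html - Cached - Similar pages
Snippets are usually noisier, less topic-focused, and much shorter than full documents: does 花 ("flower") mean the same thing in d1 and d2?
As a result, similarity evaluation between snippets may not be successful.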
Motivation
d3: {He is an author}   d4: {The writer is standing behind you} (related in meaning, yet no content word in common)
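To make the d3/d4 problem concrete, here is a minimal Python sketch of term-overlap similarity (bag-of-words cosine). The naive tokenizer and the small `STOPWORDS` set are hypothetical simplifications; once function words are removed, the two sentences share no term, so their similarity is zero even though "author" and "writer" are related.

```python
import math
from collections import Counter

# Hypothetical stopword list, just large enough for this example.
STOPWORDS = {"he", "is", "an", "the", "you"}

def bag_of_words(text):
    """Lowercase, split on whitespace, and drop stopwords (naive tokenizer)."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

d3 = bag_of_words("He is an author")
d4 = bag_of_words("The writer is standing behind you")
print(cosine(d3, d4))  # 0.0: no shared term, although "author" ~ "writer"
```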
![Page 4: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/4.jpg)
Evaluate similarity with reference to a set of hidden topics
di: {He is an author}   dj: {The writer is standing behind you}
(a document may be related to multiple topics)
Objective
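A toy illustration of the objective, assuming a hypothetical hand-made table `WORD_TOPICS` of topic weights per word (in the actual approach the topics come from a trained topic model, not from a table like this). Mapping each snippet to the average topic weights of its known words gives di and dj a non-zero similarity, unlike a term-overlap comparison.

```python
import math

# Hypothetical p(topic | word) values, invented for illustration only.
WORD_TOPICS = {
    "author":   {"literature": 0.8, "show_business": 0.2},
    "writer":   {"literature": 0.7, "show_business": 0.3},
    "standing": {"sports": 0.5, "politics": 0.5},
    "behind":   {"sports": 0.6, "politics": 0.4},
}

def topic_vector(text):
    """Average the topic weights of the words found in WORD_TOPICS."""
    words = [w for w in text.lower().split() if w in WORD_TOPICS]
    vec = {}
    for w in words:
        for topic, p in WORD_TOPICS[w].items():
            vec[topic] = vec.get(topic, 0.0) + p / len(words)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse {topic: weight} vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

di = topic_vector("He is an author")
dj = topic_vector("The writer is standing behind you")
print(cosine(di, dj) > 0)  # True: both snippets share the "literature" topic
```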
![Page 5: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/5.jpg)
Framework
Figure: framework overview, including label candidate generation. Example: di → topic 10, dj → topic 10; label candidates: music, movie, radio player → labels: music, movie.
![Page 6: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/6.jpg)
LDA
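Vocabulary: music, movie, author, kill, writer, book, …
Notation: k indexes topics (K = 60), m indexes documents, and n indexes word positions; z_{m,n} is the topic assigned to word w_{m,n}, the n-th word of document m. Example: topic k = 10 is "show business".
Figure: three documents (m = 1, 2, 3) shown as binary word-occurrence vectors with their topic assignments (z1/w1, z2/w2, z3/w3); example topics: politics, entertainment, show business, education, culture, health.
The word "music" in topic 10 can explain the occurrence of the words in documents m = 1, 2, 3.
In the training step, a keyword becomes related to a topic when it often occurs in documents about that topic.
Two distributions are involved: one refers to the topics k (per document), the other refers to the vocabulary (per topic).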
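The training step can be sketched with collapsed Gibbs sampling, one standard way to estimate LDA. This is a minimal illustration, not the authors' implementation: `alpha`, `beta`, `iters`, and the toy documents (word ids over the slide's vocabulary) are assumptions, and K = 2 is used instead of K = 60 to keep the example small. `theta` holds each document's distribution over topics and `phi` each topic's distribution over the vocabulary.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids."""
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_mk = [[0] * K for _ in docs]       # words in doc m assigned to topic k
    n_kt = [[0] * V for _ in range(K)]   # times word t is assigned to topic k
    n_k = [0] * K                        # total words assigned to topic k
    z = []                               # z[m][n]: topic of the n-th word of doc m
    # Random initialization of every token's topic.
    for m, doc in enumerate(docs):
        z_m = []
        for t in doc:
            k = rng.randrange(K)
            z_m.append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
        z.append(z_m)
    for _ in range(iters):
        for m, doc in enumerate(docs):
            for n, t in enumerate(doc):
                k = z[m][n]
                # Remove the current assignment from the counts.
                n_mk[m][k] -= 1
                n_kt[k][t] -= 1
                n_k[k] -= 1
                # Full conditional p(z_{m,n} = j | rest), up to a constant.
                weights = [
                    (n_mk[m][j] + alpha) * (n_kt[j][t] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[m][n] = k
                n_mk[m][k] += 1
                n_kt[k][t] += 1
                n_k[k] += 1
    theta = [[(n_mk[m][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for m, doc in enumerate(docs)]
    phi = [[(n_kt[k][t] + beta) / (n_k[k] + V * beta) for t in range(V)]
           for k in range(K)]
    return theta, phi

vocab = ["music", "movie", "author", "kill", "writer", "book"]
docs = [[0, 1, 4, 5], [0, 1], [0]]  # toy documents as word ids
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab), iters=50)
```

Every row of `theta` and `phi` is a probability distribution, because the smoothed counts are normalized by their row totals.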
![Page 7: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/7.jpg)
LDA
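Vocabulary: music, movie, author, move, writer, book, …
Word-occurrence vectors: 1 1 0 0 1 1 … and 1 1 0 1 1 1 …
Notation as before: k = topic (here topic 10, K = 60), m = document, n = word; z_{m,n} is the topic of word w_{m,n} (example shown for z1, w1).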
![Page 8: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/8.jpg)
LDA
(continued: same vocabulary, vectors, and notation as the previous slide)
Topic k: 1, 2, 3, 4, …, 9, 10, 11, 12, …, 60
dm: 0.2, 0.1, 0.4, 0.3, …, 0.2, 0.9, 0.1, 0.2, …, 0.1
p(·|·) = ?
![Page 9: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/9.jpg)
LDA
(continued: same example as the previous slide, with the topic weights of dm over topics 1 to 60)
p(·|·) = ?
p(·|·) = 1/60, i.e. 1/K with K = 60
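For a new snippet, the topic distribution can be inferred while the topic-word distribution stays fixed. The sketch below assumes one common simplification: each word's topic is resampled with weight (count of topic k in the snippet + alpha) × phi[k][word], after a uniform random start in which every topic has probability 1/K (1/60 when K = 60). `infer_topics`, `alpha`, and the toy `phi` are hypothetical.

```python
import random

def infer_topics(doc, phi, alpha=0.1, iters=100, seed=0):
    """Estimate p(topic | new snippet) with phi (topics x vocabulary) held fixed."""
    rng = random.Random(seed)
    K = len(phi)
    # Uniform initialization: each topic is drawn with probability 1/K.
    z = [rng.randrange(K) for _ in doc]
    n_k = [0] * K
    for k in z:
        n_k[k] += 1
    for _ in range(iters):
        for n, t in enumerate(doc):
            n_k[z[n]] -= 1
            weights = [(n_k[k] + alpha) * phi[k][t] for k in range(K)]
            z[n] = rng.choices(range(K), weights=weights)[0]
            n_k[z[n]] += 1
    total = len(doc) + K * alpha
    return [(n_k[k] + alpha) / total for k in range(K)]

# Toy phi: 2 topics over a 3-word vocabulary (hypothetical values).
phi = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
theta_new = infer_topics([0, 0, 2], phi)
```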
![Page 10: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/10.jpg)
Framework
![Page 11: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/11.jpg)
Notation: t indexes the t-th term in the vocabulary V, and k indexes the k-th topic.
Similarity between di and dj
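The slide names the t-th term of V and the k-th topic as the ingredients of the similarity between di and dj. A sketch under an explicit assumption: each snippet has a term part (weights indexed by t) and a topic part (weights indexed by k), and the similarity mixes the two cosine scores with a hypothetical weight `lam`. It shows how the topic part can keep two snippets similar when their terms do not overlap.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def snippet_similarity(terms_i, topics_i, terms_j, topics_j, lam=0.5):
    """Hypothetical mix: lam * topic cosine + (1 - lam) * term cosine.

    terms_*  : weights of the t-th term of the vocabulary V
    topics_* : weights of the k-th hidden topic
    """
    return lam * cosine(topics_i, topics_j) + (1 - lam) * cosine(terms_i, terms_j)

# Term order: author, writer, standing, behind (toy values; no shared terms).
terms_i, terms_j = [1, 0, 0, 0], [0, 1, 1, 1]
# Toy topic weights (hypothetical): both snippets lean toward the same topic.
topics_i, topics_j = [0.8, 0.2], [0.7, 0.3]
print(round(snippet_similarity(terms_i, topics_i, terms_j, topics_j), 3))  # 0.494
```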
![Page 12: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/12.jpg)
Framework
similarity matrix between snippets
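Once pairwise similarities are available, the snippets can be grouped directly from the similarity matrix. The sketch assumes average-link agglomerative clustering with a hypothetical stopping `threshold`; the slides do not state which clustering algorithm or threshold is used, and the toy topic vectors are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(items, sim):
    """Pairwise similarity matrix; the diagonal is set to 1."""
    n = len(items)
    return [[1.0 if i == j else sim(items[i], items[j]) for j in range(n)]
            for i in range(n)]

def average_link_clusters(S, threshold):
    """Merge the two clusters with the highest average pairwise similarity
    until no pair of clusters reaches `threshold` (hypothetical stopping rule)."""
    clusters = [[i] for i in range(len(S))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(S[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Toy topic vectors (hypothetical): snippets 0-1 lean to one topic, 2-3 to another.
snippets = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
S = similarity_matrix(snippets, cosine)
print(average_link_clusters(S, threshold=0.5))  # [[0, 1], [2, 3]]
```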
![Page 13: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/13.jpg)
Label Candidate Generation
Values of the term "music" for each topic k over the collection D:
k = 1: 14, k = 2: 18, …, k = 10: 38, …, k = 60: 9
Label candidates:
music
radio player
mp3
CD
…
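A possible way to turn such per-topic values into label candidates, assuming they are counts of each term under each topic in D: keep the terms whose dominant topic is the target topic and rank them by that count. Only the "music" row comes from the slide; the other counts and the function `label_candidates` are hypothetical.

```python
def label_candidates(term_topic_counts, topic, top_n=4):
    """Terms whose dominant topic is `topic`, ranked by their count in it."""
    ranked = [
        (counts[topic], term)
        for term, counts in term_topic_counts.items()
        if counts and max(counts, key=counts.get) == topic
    ]
    ranked.sort(reverse=True)
    return [term for _, term in ranked[:top_n]]

COUNTS = {
    "music": {1: 14, 2: 18, 10: 38, 60: 9},  # values from the slide
    "radio player": {10: 21, 4: 5},          # hypothetical
    "mp3": {10: 17, 2: 3},                   # hypothetical
    "CD": {10: 12, 1: 4},                    # hypothetical
    "election": {1: 30, 10: 1},              # hypothetical, dominant topic is 1
}
print(label_candidates(COUNTS, topic=10))  # ['music', 'radio player', 'mp3', 'CD']
```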
![Page 14: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/14.jpg)
Label assignment for clustering snippets
Figure: label candidates generated from D and topic k (music, radio player, mp3, CD, …) go through label assignment for the cluster containing di and dj; assigned labels: music, CD.
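Label assignment can be sketched as scoring each candidate by the share of the cluster's snippets that contain it and keeping the best few. The scoring rule, `min_coverage`, `max_labels`, and the two English snippets standing in for di and dj are hypothetical; a naive substring match is used for brevity.

```python
def assign_labels(cluster_snippets, candidates, max_labels=2, min_coverage=0.5):
    """Rank candidates by the fraction of snippets in the cluster that mention
    them (case-insensitive substring match); keep up to `max_labels`."""
    scored = []
    for cand in candidates:
        hits = sum(cand.lower() in s.lower() for s in cluster_snippets)
        coverage = hits / len(cluster_snippets)
        if coverage >= min_coverage:
            scored.append((coverage, cand))
    # sort() is stable, so equal coverage keeps the original candidate order.
    scored.sort(key=lambda pair: -pair[0])
    return [cand for _, cand in scored[:max_labels]]

cluster = [
    "Jolin Tsai music downloads, MP3 and CD albums",  # stands in for di (hypothetical)
    "New music CD releases this week",               # stands in for dj (hypothetical)
]
candidates = ["music", "radio player", "mp3", "CD"]
print(assign_labels(cluster, candidates))  # ['music', 'CD']
```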
![Page 15: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/15.jpg)
Framework
Figure: framework overview, including label candidate generation. Example: di → topic 10; dj → topic 4, topic 10; label candidates: music, movie, radio player → labels: music, movie.
![Page 16: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/16.jpg)
Experiment
Datasets: Wikipedia dataset, VnExpress dataset
![Page 17: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/17.jpg)
Experimental dataset
The Web dataset consists of 2,357 snippets in 9 categories.
20 queries were submitted to Google, obtaining about 150 distinct snippets.
![Page 18: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/18.jpg)
F-measure
Experiments
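The F-measure commonly used for clustering evaluation matches each true category with its best cluster and weights the per-category F1 by category size: F = Σ_i (n_i / n) max_j F(i, j), with precision n_ij / n_j and recall n_ij / n_i. The sketch assumes this standard definition; the slide does not spell out which variant the experiments use, and the toy labels are invented.

```python
from collections import Counter

def clustering_f_measure(true_labels, cluster_ids):
    """F = sum_i (n_i / n) * max_j F1(i, j) over true classes i and clusters j."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    joint = Counter(zip(true_labels, cluster_ids))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision, recall = n_ck / n_k, n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total

labels = ["music", "music", "movie", "movie"]
print(clustering_f_measure(labels, [0, 0, 1, 1]))  # 1.0
print(round(clustering_f_measure(labels, [0, 0, 0, 1]), 4))  # 0.7333
```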
![Page 19: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/19.jpg)
Experiments
![Page 20: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/20.jpg)
Experiments
![Page 21: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/21.jpg)
Experiments
![Page 22: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/22.jpg)
Experiments
![Page 23: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/23.jpg)
Experiments
![Page 24: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/24.jpg)
Experiments
![Page 25: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/25.jpg)
Clustering snippets with hidden topics; labeling clusters using hidden topic analysis.
Conclusion
![Page 26: Web Search Clustering and Labeling with Hidden Topics](https://reader034.vdocuments.mx/reader034/viewer/2022051421/5681611b550346895dd073bf/html5/thumbnails/26.jpg)
Advantages: clusters are labeled with the help of hidden topics; the method works even though snippets are small.
Two datasets: 2,357 and 150 snippets (in our work: more than 2 million snippets).
Disadvantage: depends less on the snippets themselves.
Application: snippets are useful for making sense of results.
My Comment