Topic Hierarchy Construction for the Organization of Multi-Source User Generated Contents

Date: 2013/09/17
Source: SIGIR’13
Authors: Zhu, Xingwei; Ming, Zhao-Yan; Zhu, Xiaoyan; Chua, Tat-Seng
Advisor: Dr. Jia-ling Koh
Speaker: Wei, Chang


Outline

• Introduction
• Approach
• Experiment
• Conclusion

iPhone 5s? iPhone 5c?

Multi-Source User Generated Contents

Problem Formulation

• Goal: Given a root topic C and its information source set S_C, we aim to build and continuously update a topic hierarchy H for C, so that the information in S_C is organized according to its relevant topics.

• In this paper, S_C = {Blogger, Twitter, community QA site (cQA)}

Outline

• Introduction
• Approach
  • Framework
  • Topic Term Identification
  • Topic Relation Identification
  • Topic Hierarchy Generation
  • Topic Hierarchy Update
• Experiment
• Conclusion

Framework

Topic Term Identification

[Figure: flow diagram with the components User Generated Contents, TF-IDF, Potential Grounding Topics, Heuristic Rules, Grounding Topic Set, External Sources, Heuristic Rules, Final Candidate Topic Set]

Grounding Topic Set

[Figure: TF-IDF example with the documents Blog 1, QA 1, QA 2, Tweet 1 and Tweet 2 and the terms Apple Inc., T-Mobile, iPhone, iOS, Price, 64-bit and Smartphone; the selected terms are iPhone, Apple Inc., T-Mobile and iOS]

• Blogs:
  • Use the content and title
  • Double the weights of terms in titles
  • Use the top 5 terms
• cQAs:
  • Use the question title, description and the best answers
  • Use the top 5 terms
• Tweets:
  • Use the content
  • Use the top 1 term
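The selection rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the smoothed IDF variant, the document dictionary layout, and the names `tokenize`, `term_counts`, `top_terms`, `TOP_K` and `TITLE_WEIGHT` are all hypothetical. Only the double title weight for blogs and the top-5/top-1 cut-offs come from the slide.

```python
import math
import re
from collections import Counter

# Cut-offs from the slide: top 5 terms for blogs and cQA, top 1 for tweets.
TOP_K = {"blog": 5, "cqa": 5, "tweet": 1}
TITLE_WEIGHT = 2  # the slide doubles the weight of title terms for blogs


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the slides do not specify one)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_counts(doc):
    """Weighted term frequencies for one document.

    `doc` is a dict with 'source' and optional 'title' and 'body' (assumed layout).
    For cQA, 'body' would hold the description and the best answers.
    """
    counts = Counter(tokenize(doc.get("body", "")))
    weight = TITLE_WEIGHT if doc["source"] == "blog" else 1
    for term in tokenize(doc.get("title", "")):
        counts[term] += weight
    return counts


def top_terms(docs):
    """Return, for each document, its highest-scoring TF-IDF terms."""
    per_doc = [term_counts(d) for d in docs]
    doc_freq = Counter(term for counts in per_doc for term in counts)
    n_docs = len(docs)
    result = []
    for doc, counts in zip(docs, per_doc):
        scores = {
            # Smoothed IDF, one common variant (assumed here).
            term: tf * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, tf in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[: TOP_K[doc["source"]]])
    return result


# Made-up documents, for illustration only.
docs = [
    {"source": "blog", "title": "iPhone price", "body": "the new phone is out"},
    {"source": "tweet", "body": "the iphone camera"},
    {"source": "cqa", "title": "the question", "body": "is the screen good"},
]
grounding = top_terms(docs)
```

Terms that occur in every document get an IDF of zero under this variant, so common words such as "the" fall to the bottom of each ranking without a stop-word list.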

Topic Set Extension

• What we already have:
  • Grounding topic set
• What it lacks:
  • Middle-level topics
• How to get middle-level topics:
  • Search engine: 2 patterns
    • * such as <slot>
    • <slot> of *
  • WordNet: direct hypernym
  • Wikipedia: category tags
• Final candidate topic set:
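As a small illustration of the two search patterns, the sketch below fills `<slot>` with each known topic to produce query strings. The function name `build_pattern_queries` is hypothetical, and how the queries are sent to a search engine and how the `*` position is parsed from the results is not covered by the slides, so it is left out.

```python
# The two lexical patterns listed on the slide. "<slot>" is filled with a known
# topic; "*" marks where a candidate middle-level topic is expected to appear.
PATTERNS = ["* such as <slot>", "<slot> of *"]


def build_pattern_queries(grounding_topics):
    """Map each grounding topic to the query strings obtained by filling <slot>."""
    return {
        topic: [pattern.replace("<slot>", topic) for pattern in PATTERNS]
        for topic in grounding_topics
    }


queries = build_pattern_queries(["iPhone", "iOS"])
```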


Topic Relation Identification

[Figure: three example topics (iPhone, iPhone 5s, Apple Inc.) connected pairwise by directed edges labeled e(r(t_A, t_B)), e(r(t_B, t_A)), e(r(t_A, t_C)), e(r(t_C, t_A)), e(r(t_B, t_C)) and e(r(t_C, t_B))]

Denote r(t_A, t_B) as a sub-topic relation, which means t_B is a sub-topic of t_A

Evidences from the Information Source Set

• The evidence for a pair of topics is the cosine similarity between their corresponding contexts
• V = (smart phone, price, buy, iOS, Android)
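A minimal sketch of the cosine similarity between two topic contexts, assuming each context is a bag of terms. The function name `cosine_similarity` and the two example contexts are hypothetical; only the vocabulary items are taken from the slide.

```python
import math
from collections import Counter


def cosine_similarity(context_a, context_b):
    """Cosine similarity between two bag-of-words context vectors."""
    vec_a, vec_b = Counter(context_a), Counter(context_b)
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context has no direction
    return dot / (norm_a * norm_b)


# Hypothetical contexts built from the vocabulary V on the slide.
context_iphone = ["smart phone", "price", "buy", "iOS"]
context_ios = ["smart phone", "iOS", "Android"]
similarity = cosine_similarity(context_iphone, context_ios)
```

Cosine similarity is symmetric, so on its own it indicates that two topics are related but not which one is the sub-topic.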

Evidences from Wikipedia

Pointwise Mutual Information (PMI)
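The formula on this slide did not survive in the transcript. As a reference point, the standard definition is PMI(a, b) = log(p(a, b) / (p(a) p(b))), and the sketch below estimates it from co-occurrence counts. How the counts are obtained from Wikipedia, and whether the authors use exactly this variant, are assumptions; the function name `pmi` and the numbers are illustrative only.

```python
import math


def pmi(count_a, count_b, count_ab, total):
    """PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), estimated from counts.

    count_a, count_b: number of articles mentioning each topic (assumed unit)
    count_ab: number of articles mentioning both topics
    total: number of articles in the collection
    """
    if min(count_a, count_b, count_ab) == 0:
        return float("-inf")  # no co-occurrence observed
    p_a, p_b, p_ab = count_a / total, count_b / total, count_ab / total
    return math.log(p_ab / (p_a * p_b))


# Hypothetical counts, for illustration only.
score = pmi(count_a=200, count_b=50, count_ab=40, total=10_000)
```

A positive score means the two topics co-occur more often than independence would predict.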


Evidences from WordNet


Evidences from Search Engine Results

• Pattern-based evidences
• Query = "t_A such as t_B and" root topic
• The evidence is set to 1 if the search engine returns more than ζ results that contain this query; otherwise it is set to 0.
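The thresholding rule can be sketched as follows. The function name `pattern_evidence` and the `hit_count` callable are hypothetical stand-ins for whatever search API is used, and the hit counts in the example are made up.

```python
def pattern_evidence(t_a, t_b, root_topic, hit_count, zeta):
    """Return 1 if the pattern query yields more than `zeta` results, else 0.

    `hit_count` is a hypothetical callable that takes a query string and
    returns the number of results; a real search API would be plugged in here.
    """
    query = f'"{t_a} such as {t_b} and" {root_topic}'
    return 1 if hit_count(query) > zeta else 0


# Stand-in for a search engine, with made-up counts for illustration.
fake_hits = {'"smartphone such as iPhone and" Apple': 1200}
evidence = pattern_evidence(
    "smartphone", "iPhone", "Apple", lambda q: fake_hits.get(q, 0), zeta=100
)
```

Unlike cosine similarity, this evidence is directional: swapping the two topics produces a different query and usually a different result.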

Combine Evidences



Topic Hierarchy Generation


Edge Weighting


Hierarchy Pruning

• Use the Chu-Liu/Edmonds optimum branching algorithm
  • Every non-root node has only one parent, and the sum of the edge weights is maximized
• Remove:
  • (1) the nodes that are not reachable from the root topic
  • (2) the leaf nodes that are not in the grounding topic set
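The two removal rules can be sketched as below. The sketch assumes the optimum branching step has already been run and its result is available as a child-to-parent map; it does not implement Chu-Liu/Edmonds itself. The function name `prune_hierarchy`, the data layout, and the choice to repeat rule (2) until no such leaf remains are assumptions, and the example topics are made up.

```python
def prune_hierarchy(parent_of, root, grounding_topics):
    """Apply the two removal rules to a branching given as a child -> parent map.

    Assumes the optimum-branching step has already run, so every node has at
    most one parent. Returns the pruned child -> parent map.
    """
    # Build parent -> children lists for traversal.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    # Rule 1: keep only nodes reachable from the root topic.
    reachable, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in reachable:
                reachable.add(child)
                stack.append(child)
    kept = {c: p for c, p in parent_of.items() if c in reachable}

    # Rule 2: drop leaves outside the grounding topic set. Repeated until
    # stable (an assumption), because removing a leaf can expose a new leaf.
    while True:
        parents = set(kept.values())
        bad = [n for n in kept if n not in parents and n not in grounding_topics]
        if not bad:
            return kept
        for node in bad:
            del kept[node]


# Hypothetical branching, for illustration only.
tree = {
    "iPhone 5s": "iPhone",
    "Hardware": "iPhone",
    "64-bit": "Hardware",
    "Accessory": "iPhone",
    "Case": "Accessory",
    "Android": "Google",  # not connected to the root
}
pruned = prune_hierarchy(tree, root="iPhone", grounding_topics={"iPhone 5s", "64-bit"})
```

In this example "Android" is dropped by rule (1), "Case" by rule (2), and "Accessory" on the next pass because it has become a leaf outside the grounding set.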

Topic Hierarchy Update

Outline

• Introduction
• Approach
  • Framework
  • Topic Term Identification
  • Topic Relation Identification
  • Topic Hierarchy Generation
  • Topic Hierarchy Update
• Experiment
• Conclusion

Topic Term Identification

Topic Hierarchy Generation

Hierarchy Update


Conclusion

• Given a root topic, we used evidence from multiple UGC sources to identify topic terms and the sub-topic relations between them. A graph-based algorithm was then applied to these topic terms to generate and update the topic hierarchy, on which the UGCs can be organized according to their relevant topics.