Segmentation of Multi-Sentence Questions: Towards Effective Question Retrieval in cQA Services

Kai Wang, Zhao-Yan Ming, Xia Hu, Tat-Seng Chua
SIGIR'10

Speaker: Hsin-Lan, Wang
Date: 2011/03/07

Outline

Introduction
Question Sentence Detection
  Sequential Pattern Mining
  Syntactic Shallow Pattern Mining
  Model Learning
Multi-Sentence Question Segmentation
  Building Graphs for Question Threads
  Propagating the Closeness Scores
  Segmentation-aided Retrieval
Experiment
Conclusion

Introduction

cQA: Community-based Question Answering services

Introduction

This paper introduces a new graph-based approach to segmenting multi-sentence questions.

Basic idea:
  Detect question sentences
  Measure the closeness score
  Model their relationships to form a graph
  Use the graph to propagate the closeness scores
  Group topically related sentences

Question Sentence Detection

Human-generated content on the Web is usually informal.

Solution: use salient sequential and syntactic patterns as features to build a question detector.

Question Sentence Detection

Sequential Pattern Mining

A sequential pattern is also referred to as a Labeled Sequential Pattern, written S→C, where C is the class label that the sequence S is classified to.

A sequence is defined to be a series of tokens from sentences, and the class is binary: {Q, NQ}.
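
As a rough illustration of how a labeled sequential pattern S→C might be applied to a sentence, the sketch below checks whether the tokens of S occur in order inside a tokenized sentence and, if so, reports the label C. Allowing gaps between matched tokens is an assumption, and the pattern and sentence are made up for illustration.

```python
from typing import List, Sequence, Tuple

# A labeled sequential pattern S -> C: a token sequence S plus a class label C.
LabeledPattern = Tuple[Tuple[str, ...], str]


def contains_subsequence(tokens: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `tokens` in order (gaps allowed; an assumption)."""
    remaining = iter(tokens)
    # `tok in remaining` advances the iterator, so token order is enforced.
    return all(tok in remaining for tok in pattern)


def matched_labels(tokens: Sequence[str], patterns: List[LabeledPattern]) -> List[str]:
    """Return the class label of every pattern whose sequence matches `tokens`."""
    return [label for seq, label in patterns if contains_subsequence(tokens, seq)]


# Hypothetical pattern and sentence, for illustration only.
patterns = [(("any1", "know", "what"), "Q")]
sentence = "does any1 know what this error means".split()
print(matched_labels(sentence, patterns))
```

In a detector, each matched pattern would typically become one feature of the sentence rather than a final decision.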

Question Sentence Detection

Sequential Pattern Mining

The purpose is to extract a set of frequent subsequences of words that are indicative of questions.

POS tags are applied to all tokens except some keywords, e.g. <any1, know, what> → <any1, VB, what>.
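
The generalization step can be sketched as follows: keywords are kept as they are and every other token is replaced by its POS tag. The keyword set and the tiny tag lookup are hypothetical stand-ins (a real system would call a POS tagger), and the `UNK` fallback is likewise an assumption.

```python
from typing import Dict, List, Set

# Hypothetical keyword list and toy tag table, for illustration only.
KEYWORDS: Set[str] = {"any1", "what"}
TOY_POS: Dict[str, str] = {"know": "VB"}


def generalize(tokens: List[str],
               keywords: Set[str] = KEYWORDS,
               pos_lookup: Dict[str, str] = TOY_POS) -> List[str]:
    """Keep keywords verbatim; replace every other token with its POS tag."""
    # Tokens missing from the toy table fall back to "UNK" (an assumption).
    return [t if t in keywords else pos_lookup.get(t, "UNK") for t in tokens]


print(generalize(["any1", "know", "what"]))  # ['any1', 'VB', 'what']
```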

Question Sentence Detection

Syntactic Shallow Pattern Mining

Question Sentence Detection

Model Learning

It is natural to mine certain patterns from questions, but it becomes unnatural to identify characteristics for non-questions.

Solution: One-class SVM

Training data: all questions ending with question marks are assumed to form an initial set of positive examples.
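
The way the initial positive set is assembled can be sketched in a few lines. This covers only the seeding heuristic, not the one-class SVM itself; the function name and sample sentences are made up for illustration.

```python
from typing import List


def seed_positive_examples(sentences: List[str]) -> List[str]:
    """Pick sentences ending with a question mark as initial positive examples."""
    return [s for s in sentences if s.rstrip().endswith("?")]


# Made-up sentences, for illustration only.
sentences = [
    "My laptop will not boot.",
    "Is the battery dead?",
    "any1 know what to do",  # a question without '?', missed by this heuristic
]
print(seed_positive_examples(sentences))
```

Sentences such as the third one are the reason a learned detector is needed: the question mark only seeds the positive class.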

Multi-Sentence Question Segmentation

Building Graphs for Question Threads

Vq: question sentence vertex set
Vc: context sentence vertex set

Model the question thread as a weighted graph (V, E).
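
One possible in-memory layout for such a graph is sketched below. It assumes the vertex set is the union of Vq and Vc, identifies each sentence by its position in the thread, and uses a trailing question mark as a stand-in for the question detector. The class name, field names, and sample thread are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class QuestionThreadGraph:
    sentences: List[str]
    vq: List[int] = field(default_factory=list)  # question sentence vertices
    vc: List[int] = field(default_factory=list)  # context sentence vertices
    weights: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def set_weight(self, u: int, v: int, w: float) -> None:
        """Store the weight of edge (u, v)."""
        self.weights[(u, v)] = w


def build_graph(sentences: List[str],
                is_question: Callable[[str], bool]) -> QuestionThreadGraph:
    """Split sentence indices into Vq and Vc using the supplied detector."""
    graph = QuestionThreadGraph(sentences=list(sentences))
    for i, sentence in enumerate(sentences):
        (graph.vq if is_question(sentence) else graph.vc).append(i)
    return graph


# Made-up thread; the lambda is a placeholder for a real question detector.
thread = ["My laptop will not boot.", "Is the battery dead?", "How can I test it?"]
graph = build_graph(thread, lambda s: s.endswith("?"))
graph.set_weight(1, 0, 0.5)  # placeholder weight, not a computed score
print(graph.vq, graph.vc)
```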

Multi-Sentence Question Segmentation

Building Graphs for Question Threads

Directed edge (u→v):
  KL-divergence
  Coherence
  Coreference
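
The slide formulas are not part of this transcript, so the following is only a generic sketch of KL-divergence between two sentences. It assumes each sentence is modeled as an add-alpha smoothed unigram distribution over their joint vocabulary; the smoothing scheme and function names are assumptions. Because the measure is asymmetric, it suits a directed edge.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def unigram_model(tokens: List[str], vocab: Set[str], alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed unigram distribution over `vocab` (smoothing is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(u_tokens: List[str], v_tokens: List[str], alpha: float = 1.0) -> float:
    """KL(P_u || P_v) over the joint vocabulary of the two sentences."""
    vocab = set(u_tokens) | set(v_tokens)
    p = unigram_model(u_tokens, vocab, alpha)
    q = unigram_model(v_tokens, vocab, alpha)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)


# Made-up token lists, for illustration only.
u = ["my", "laptop", "battery", "battery"]
v = ["battery"]
print(kl_divergence(u, v), kl_divergence(v, u))  # the two directions differ
```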

Multi-Sentence Question Segmentation

Building Graphs for Question Threads

Undirected edge (u-v):
  Cosine Similarity
  Distance: proportional to the number of sentences between u and v.
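
A minimal sketch of the two symmetric measures, assuming raw term-frequency vectors for cosine similarity and sentence positions within the thread for distance. The `scale` constant and the function names are hypothetical; the transcript gives only "proportional to".

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(u_tokens: List[str], v_tokens: List[str]) -> float:
    """Cosine similarity between raw term-frequency vectors."""
    a, b = Counter(u_tokens), Counter(v_tokens)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def sentence_distance(i: int, j: int, scale: float = 1.0) -> float:
    """Distance proportional to the number of sentences strictly between i and j."""
    return scale * max(abs(i - j) - 1, 0)


print(cosine_similarity(["dead", "battery"], ["battery", "test"]))
print(sentence_distance(0, 3))  # two sentences lie between positions 0 and 3
```

Both functions return the same value when their arguments are swapped, which is what makes them usable on an undirected edge.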

Multi-Sentence Question Segmentation

Building Graphs for Question Threads

Undirected edge (u-v):
  Coherence
  Coreference

Multi-Sentence Question Segmentation

Propagating the Closeness Scores

Multi-Sentence Question Segmentation

Propagating the Closeness Scores

Sort the edges in Er by closeness score: <e1, e2, … , en>.

The extraction process terminates at em when one of the following criteria is met:

Multi-Sentence Question Segmentation

Propagating the Closeness Scores

Example:
  final edge set: {(q1,c1), (q2,c2), (q1,c2)}
  question segments: (q1 – c1, c2), (q2 – c2)
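
The step from the final edge set to question segments in the example above can be reproduced with a simple grouping, sketched here under the assumption that each edge is a (question, context) pair. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_segments(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group context sentences under the question sentence they are linked to."""
    segments: Dict[str, List[str]] = defaultdict(list)
    for question, context in edges:
        if context not in segments[question]:  # skip duplicate edges
            segments[question].append(context)
    return dict(segments)


final_edges = [("q1", "c1"), ("q2", "c2"), ("q1", "c2")]
print(group_segments(final_edges))  # {'q1': ['c1', 'c2'], 'q2': ['c2']}
```

Note that c2 appears in both segments: a context sentence may be shared by several questions.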

Multi-Sentence Question Segmentation

Segmentation-aided Retrieval

Experiments

Evaluation of Question Detection

Dataset: collected by issuing getByCategory API queries to Yahoo! Answers.

Three datasets are generated:
  Pattern Mining Set: 350k sentences extracted from 60k question threads.
  Training Set: 130k sentences from another 60k question threads.
  Testing Set: two annotators are asked to tag 2004 question sentences and 2039 non-question sentences.

Experiments

Evaluation of Question Detection

Experiments

Direct Assessment of Multi-Sentence Question Segmentation via User Study

Experiments

Performance Evaluation on Question Retrieval with Segmentation Model

Conclusion

Presents a new approach to segmenting multi-sentence questions.

Separates question sentences from non-question sentences and aligns them according to their closeness scores.