Segmentation of Multi-Sentence Questions: Towards Effective Question Retrieval in cQA Services
Kai Wang, Zhao-Yan Ming, Xia Hu, Tat-Seng Chua
SIGIR '10
Speaker: Hsin-Lan, Wang
Date: 2011/03/07
Outline
Introduction
Question Sentence Detection
  Sequential Pattern Mining
  Syntactic Shallow Pattern Mining
  Model Learning
Multi-Sentence Question Segmentation
  Building Graphs for Question Threads
  Propagating the Closeness Scores
  Segmentation-aided Retrieval
Experiment
Conclusion
Introduction
cQA: Community-based Question Answering services
Introduction
This paper introduces a new graph-based approach to segmenting multi-sentence questions.
Basic idea:
  Detect question sentences
  Measure the closeness score
  Model their relationships to form a graph
  Use the graph to propagate the closeness scores
  Group topically related sentences
Question Sentence Detection
Problem: Human-generated content on the Web is usually informal.
Solution: Use salient sequential and syntactic patterns as features to build a question detector.
Question Sentence Detection
Sequential Pattern Mining
A sequential pattern is also referred to as a Labeled Sequential Pattern.
It has the form S→C, where C is the class label assigned to the sequence S.
A sequence is defined as a series of tokens from a sentence, and the class is binary: {Q, NQ}.
Question Sentence Detection
Sequential Pattern Mining
The purpose is to extract a set of frequent subsequences of words that are indicative of questions.
A POS tagger is applied to all tokens except certain keywords, e.g. <any1, know, what> → <any1, VB, what>
Question Sentence Detection
Syntactic Shallow Pattern Mining
Question Sentence Detection
Model Learning
Problem: It is unnatural to use patterns mined from questions to identify the characteristics of non-questions.
Solution: One-class SVM
Training data: all questions ending with question marks are assumed to form the initial set of positive examples.
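A rough sketch of how the training data for such a detector might be assembled follows. It covers only the data preparation described on the slide (selecting sentences that end with a question mark, then encoding each sentence by the patterns it contains); the one-class SVM itself is not shown. The function names and the binary feature encoding are assumptions made for illustration.

```python
def initial_positive_examples(sentences):
    """Keep sentences ending with a question mark as the initial positive set."""
    return [s for s in sentences if s.rstrip().endswith("?")]


def pattern_features(tokens, patterns):
    """Binary vector with one entry per pattern: 1 if the pattern is an
    ordered (gapped) subsequence of `tokens`, else 0. Encoding is an assumption."""
    def contains(sequence, pattern):
        remaining = iter(sequence)
        return all(item in remaining for item in pattern)

    return [1 if contains(tokens, p) else 0 for p in patterns]


sentences = [
    "Does any1 know what this is?",
    "I bought it last week.",
    "how do i fix it ",
]
positives = initial_positive_examples(sentences)

patterns = [("any1", "VB", "what"), ("how", "VB")]
vector = pattern_features(["any1", "VB", "what"], patterns)
```

Vectors of this kind could then be passed to a one-class classifier, for example `OneClassSVM` from scikit-learn, fitted on the positive examples only. Note that the third sentence is a question without a question mark, which is exactly the case the initial positive set misses.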
Multi-Sentence Question Segmentation
Building Graphs for Question Threads
Vq: question sentence vertex set
Vc: context sentence vertex set
Model the question thread as a weighted graph (V, E).
Multi-Sentence Question Segmentation
Building Graphs for Question Threads
Directed edge (u→v):
KL-divergence
Coherence
Coreference
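The slides do not preserve the formulas for the directed-edge features, so the following is only a sketch of the first one. It computes the KL-divergence between unigram language models of two sentences and shows that the measure is asymmetric, which is why it suits a directed edge. The additive smoothing, the example sentences, and the function names are assumptions; how the divergence is converted into an edge weight is not shown here.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over `vocab` (smoothing is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


u = "my laptop will not boot".split()
v = "what could cause a laptop not to boot".split()
vocab = set(u) | set(v)

p_u = unigram_model(u, vocab)
p_v = unigram_model(v, vocab)

kl_uv = kl_divergence(p_u, p_v)  # divergence in the direction u -> v
kl_vu = kl_divergence(p_v, p_u)  # generally a different value for v -> u
```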
Multi-Sentence Question Segmentation
Building Graphs for Question Threads
Undirected edge (u-v):
Cosine Similarity
Distance: proportional to the number of sentences between u and v.
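The two symmetric features above might be computed as in the sketch below. It assumes raw term counts for the cosine similarity and a proportionality constant of 1 for the distance; the slides do not state the term weighting or the constant, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_u, tokens_v):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    cu, cv = Counter(tokens_u), Counter(tokens_v)
    dot = sum(cu[w] * cv[w] for w in cu)
    norm_u = math.sqrt(sum(c * c for c in cu.values()))
    norm_v = math.sqrt(sum(c * c for c in cv.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_distance(index_u, index_v):
    """Number of sentences strictly between two positions in the thread
    (proportionality constant assumed to be 1)."""
    return max(abs(index_u - index_v) - 1, 0)


sim = cosine_similarity("my laptop will not boot".split(),
                        "why will my laptop not start".split())
dist = sentence_distance(0, 3)
```

Both functions give the same result when their arguments are swapped, which matches the use of an undirected edge.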
Multi-Sentence Question Segmentation
Building Graphs for Question Threads
Undirected edge (u-v):
Coherence
Coreference
Multi-Sentence Question Segmentation
Propagating the Closeness Scores
Sort the edges in Er by closeness score: <e1, e2, …, en>
The extraction process terminates at em when one of the following criteria is met:
Multi-Sentence Question Segmentation
Propagating the Closeness Scores
Example:
Final edge set: {(q1,c1), (q2,c2), (q1,c2)}
Question segments: (q1 – c1, c2), (q2 – c2)
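The example can be reproduced with a short sketch that groups the extracted edges by their question vertex. It assumes each edge is a (question, context) pair; the function name `build_segments` is hypothetical, and the edge extraction and termination criteria are not modeled here.

```python
from collections import defaultdict


def build_segments(edges):
    """Group context sentences under each question sentence they are linked to.
    A context sentence may belong to more than one question segment."""
    segments = defaultdict(list)
    for question, context in edges:
        if context not in segments[question]:
            segments[question].append(context)
    return {q: sorted(cs) for q, cs in segments.items()}


final_edges = [("q1", "c1"), ("q2", "c2"), ("q1", "c2")]
segments = build_segments(final_edges)
```

With the edge set from the example, `segments` maps q1 to c1 and c2, and q2 to c2, so the shared context sentence c2 appears in both segments.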
Multi-Sentence Question Segmentation
Segmentation-aided Retrieval
Experiments
Evaluation of Question Detection
Dataset: collected by issuing getByCategory API queries to Yahoo! Answers.
Three datasets were generated:
  Pattern Mining Set: 350k sentences extracted from 60k question threads.
  Training Set: 130k sentences from another 60k question threads.
  Testing Set: 2004 question sentences and 2039 non-question sentences, tagged by two annotators.
Experiments
Evaluation of Question Detection
Experiments
Direct Assessment of Multi-Sentence Question Segmentation via User Study
Experiments
Performance Evaluation on Question Retrieval with Segmentation Model
Conclusion
Presents a new approach for segmenting multi-sentence questions.
Separates question sentences from non-question sentences and aligns them according to their closeness scores.