G. Cong, L. Wang, C. Lin, Y. Song, and Y. Sun. (2008)
Presenter: Tan Kent Loong r04944005
Finding Question-Answer Pairs from Online Forums
Motivation
● Forums contain a huge amount of user-generated content on a variety of topics.
  ○ A rich source of human knowledge
  ○ Largely unstructured
Flow Chart
Forum → Threads → Question Detection → Answer Detection
Question Detection
● Rule-based methods
  ○ 5W1H question words
  ○ Sentence ends with '?'
    ■ 30% of questions do not end with a question mark.
      ● I am wondering where I can visit in Bangkok.
      ● I am having doubt about changing tyre.
    ■ 9% of sentences ending with a question mark are not questions.
      ● Like to enjoy a long walk while enjoying great sights and tastes?
      ● Only have three days to explore this city?
Rule-based methods are not good enough!
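To make the weakness of the two rules concrete, here is a minimal Python sketch of a rule-based detector. The function names and the exact form of the 5W1H rule (first word of the sentence) are assumptions for illustration, not the implementation evaluated in the paper.

```python
import re

# Question words for the 5W1H rule (who, what, when, where, why, how).
FIVE_W_ONE_H = {"who", "what", "when", "where", "why", "how"}


def ends_with_question_mark(sentence: str) -> bool:
    """Rule 1: the sentence ends with '?'."""
    return sentence.strip().endswith("?")


def starts_with_5w1h(sentence: str) -> bool:
    """Rule 2 (assumed form): the first word is a 5W1H question word."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return bool(words) and words[0] in FIVE_W_ONE_H


def is_question_rule_based(sentence: str) -> bool:
    """Flag a sentence as a question if either rule fires."""
    return ends_with_question_mark(sentence) or starts_with_5w1h(sentence)


examples = [
    "I am wondering where I can visit in Bangkok.",  # a question, but missed
    "Only have three days to explore this city?",    # not a question, but flagged
    "Where can you find a job?",                     # handled correctly
]
for s in examples:
    print(is_question_rule_based(s), "|", s)
```

The first two examples come from the slide and show one miss and one false alarm, which is the motivation for learning patterns instead of hand-writing rules.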
Labeled Sequential Pattern
1. Pre-process each sentence into POS tags.
   “where can you find a job” → “where can PRP VB DT NN”
2. Build the sequence database.
   <a, d, e, f> → Q
   <a, f, e, f> → Q
   <d, a, f> → NQ
3. Calculate the support and confidence of each pattern.
   - <a, e, f> → Q has 66.7% support and 100% confidence
   - <a, f> → Q has 66.7% support and 66.7% confidence
4. Set a minimum support threshold and a minimum confidence threshold.
F1 score = 97.4%
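The support and confidence figures can be checked with a short Python sketch over the toy sequence database from the slide. It assumes support is the fraction of all sequences that contain the pattern and carry the label, and confidence is the fraction of sequences containing the pattern that carry the label; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

LabeledSequence = Tuple[Sequence[str], str]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the pattern items occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_and_confidence(
    pattern: Sequence[str], label: str, database: List[LabeledSequence]
) -> Tuple[float, float]:
    """Support and confidence of the labeled pattern `pattern -> label`."""
    labels_of_matches = [lab for seq, lab in database if is_subsequence(pattern, seq)]
    matching = sum(1 for lab in labels_of_matches if lab == label)
    support = matching / len(database)
    confidence = matching / len(labels_of_matches) if labels_of_matches else 0.0
    return support, confidence


# Toy database from the slide: Q = question, NQ = non-question.
database = [
    (("a", "d", "e", "f"), "Q"),
    (("a", "f", "e", "f"), "Q"),
    (("d", "a", "f"), "NQ"),
]

print(support_and_confidence(("a", "e", "f"), "Q", database))  # about 0.667 and 1.0
print(support_and_confidence(("a", "f"), "Q", database))       # about 0.667 and 0.667
```

Patterns whose support and confidence both clear the chosen thresholds would then be kept as features for the question classifier.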
Answer Detection
● Observation: many-to-many
  ○ Multiple questions and answers within the same thread.
    ■ One question may have multiple replies.
    ■ One post may contain answers to multiple questions.
● Treat as a traditional document retrieval problem
  ○ Cosine similarity
  ○ Query likelihood language model
  ○ KL-divergence language model
● Classification method
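As an illustration of the retrieval view, the simplest of the listed scorers, cosine similarity, can be sketched in Python over raw term counts. Whitespace tokenization and the absence of TF-IDF weighting are simplifying assumptions, and the example sentences are made up.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the bag-of-words count vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


question = "where can i find a good hotel in bangkok"
candidates = [
    "world hotel is a good hotel in bangkok",
    "my tyre needs changing soon",
]
# Rank candidate answers by similarity to the question, highest first.
ranked = sorted(candidates, key=lambda a: cosine_similarity(question, a), reverse=True)
print(ranked)
```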
Answer Detection
KL-divergence
Think of it as a “distance” between the question language model and the answer language model.
p(w|Ma): probability of keyword w appearing in the candidate answer.
p(w|Mq): probability of keyword w appearing in the question.
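A minimal Python sketch of the KL-divergence score, using unigram models p(w|Ma) and p(w|Mq). The add-one smoothing, the whitespace tokenization, and the direction KL(Ma || Mq) are assumptions made here for illustration; the slide does not specify them.

```python
import math
from collections import Counter
from typing import Dict, Set


def unigram_model(text: str, vocabulary: Set[str], alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed unigram model p(w|M) over a fixed vocabulary (add-alpha smoothing)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) = sum_w p(w) * log(p(w) / q(w)); zero when the models agree."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_score(question: str, answer: str) -> float:
    """Assumed scoring direction: KL(Ma || Mq). Lower means a closer match."""
    vocabulary = set(question.lower().split()) | set(answer.lower().split())
    m_q = unigram_model(question, vocabulary)
    m_a = unigram_model(answer, vocabulary)
    return kl_divergence(m_a, m_q)


question = "where can i find a good hotel in bangkok"
print(kl_score(question, "world hotel is a good hotel in bangkok"))
print(kl_score(question, "my tyre needs changing soon"))
```

With these toy sentences the on-topic answer gets the smaller divergence, so candidates would be ranked in ascending order of the score.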
Cons of the retrieval and classification methods: they do not consider the relationships among candidate answers or forum-specific features.
a1: world hotel is good but I prefer century hotel
a2: world hotel has a very good restaurant
a2 (generator) → a1 (offspring)
Answer Detection
PageRank (without hyperlinks)
Graph-Based Propagation
● Calculate the weight based on
  ○ The probability assigned by the language model of generating one candidate answer from the other candidate answer
  ○ The distance of the candidate answer from the question
  ○ The authority of the author of the candidate answer:
    author(ag ; #reply2, #start)
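One plausible way to combine the three signals is a weighted sum, sketched below in Python. The functional form, the coefficients lambda_dist and lambda_author, and the function name are all assumptions for illustration; the author authority is taken as a precomputed number because its exact formula is not recoverable from the slide.

```python
def edge_weight(
    kl_generation: float,
    distance_to_question: int,
    author_authority: float,
    lambda_dist: float = 0.5,    # hypothetical coefficient
    lambda_author: float = 0.5,  # hypothetical coefficient
) -> float:
    """Weight of the edge pointing at a generator answer (assumed form).

    kl_generation: KL-divergence between the two candidate answers' language
        models (smaller means one is more likely to generate the other).
    distance_to_question: number of posts between the question and the
        generator answer, starting at 1 for the post right after the question.
    author_authority: precomputed authority score of the generator's author.
    """
    if distance_to_question < 1:
        raise ValueError("distance_to_question must be at least 1")
    lm_term = 1.0 / (1.0 + kl_generation)      # in (0, 1], 1 when KL is 0
    distance_term = 1.0 / distance_to_question  # closer answers count more
    return lm_term + lambda_dist * distance_term + lambda_author * author_authority


print(edge_weight(kl_generation=0.2, distance_to_question=1, author_authority=0.3))
print(edge_weight(kl_generation=0.2, distance_to_question=4, author_authority=0.3))
```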
1. Propagation without initial score
2. Propagation with initial score
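Both propagation variants can be approximated with a PageRank-style power iteration over the weighted answer graph. The Python sketch below is an assumption-laden stand-in, not the paper's exact formulas: edges are assumed to run from offspring to generator, "without initial score" is modeled as a uniform teleport distribution, and "with initial score" as teleporting in proportion to each answer's initial ranking score. The damping value and function name are hypothetical.

```python
from typing import List, Optional


def propagate(
    weights: List[List[float]],
    initial: Optional[List[float]] = None,
    damping: float = 0.85,
    iterations: int = 50,
) -> List[float]:
    """PageRank-style score propagation.

    weights[o][g] is the weight of the edge from offspring o to generator g.
    initial holds positive initial scores per answer, or None for the variant
    without initial scores.
    """
    n = len(weights)
    if initial is None:
        teleport = [1.0 / n] * n
    else:
        total = sum(initial)
        teleport = [s / total for s in initial]

    # Row-normalize outgoing weights; nodes without outgoing edges
    # redistribute their score according to the teleport distribution.
    rows = []
    for o in range(n):
        row_sum = sum(weights[o])
        rows.append([w / row_sum for w in weights[o]] if row_sum > 0 else teleport[:])

    scores = teleport[:]
    for _ in range(iterations):
        scores = [
            (1 - damping) * teleport[g]
            + damping * sum(scores[o] * rows[o][g] for o in range(n))
            for g in range(n)
        ]
    return scores


# Answers 0 and 2 are offspring of answer 1, so answer 1 collects their score.
graph = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
print(propagate(graph))                             # without initial score
print(propagate(graph, initial=[0.2, 0.3, 0.5]))    # with initial score
```

The scores always sum to 1, so they can be compared directly across the two variants.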
Integration with other methods
1. Graph-based propagation → classification
2. Lexical mapping, e.g. “why → because”
Evaluation
Summary
Forum → Threads → Question Detection (Labeled Sequential Pattern) → Answer Detection (enhanced with Graph-Based Propagation)
References
1. Finding Question-Answer Pairs from Online Forums
   http://research.microsoft.com/en-us/people/cyl/sigir2008-gao-msra.pdf
2. PageRank without hyperlinks: Structural re-ranking using links induced by language models
   https://www.cs.cornell.edu/home/llee/papers/lmpagerank.home.html
Thank you