Natural Language Based Reformulation Resource and Web Exploitation for Question Answering
Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu
University of Southern California
Presented By: Soobia Afroz
Introduction
The degree of difficulty depends on how closely a given corpus matches the question, NOT on the question itself
Q: When was the UN founded?
A: The UN was formed in January 1942.
A: The name "United Nations", coined by United States President Franklin D. Roosevelt, was first used in the "Declaration by United Nations" of 1 January 1942, during the Second World War, when representatives of 26 nations pledged their Governments to continue fighting together against the Axis Powers.
Larger text => Good Answers => Validation in original text
Paraphrasing questions:
• Create semantically equivalent paraphrases of the questions
• Match the answer string with any of the paraphrases
• Question paraphrases + retrieval engine: find documents containing correct answers
• Rank and select better answers
• Questions are paraphrased automatically by TextMap
Example:
“How did Mahatma Gandhi die?”
“How deep is Crater Lake?”
“Who invented the cotton gin?”
Automatic Paraphrases of questions:
How the system works:
• Parse questions
• Identify the answer type of the question
• Reformulate the question (average reformulations: 3.14)
• Match at parse-tree level
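The answer-type step can be illustrated with a minimal sketch. The rules and type labels below (DATE, PERSON, LOCATION, QUANTITY, OTHER) are hypothetical stand-ins chosen for illustration; the presented system derives the answer type from a parse of the question, not from surface patterns like these.

```python
import re

# Hypothetical surface rules: (pattern, answer type).
# These labels are assumptions, not the typology used by the presented system.
ANSWER_TYPE_RULES = [
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^how (deep|tall|long|far|high)\b", re.I), "QUANTITY"),
    (re.compile(r"^how (many|much)\b", re.I), "QUANTITY"),
]


def guess_answer_type(question: str) -> str:
    """Return the first matching answer type, or "OTHER" if no rule fires."""
    q = question.strip()
    for pattern, answer_type in ANSWER_TYPE_RULES:
        if pattern.search(q):
            return answer_type
    return "OTHER"


if __name__ == "__main__":
    for q in ["When was the UN founded?",
              "How deep is Crater Lake?",
              "Who invented the cotton gin?",
              "How did Mahatma Gandhi die?"]:
        print(q, "->", guess_answer_type(q))
```

A question such as "How did Mahatma Gandhi die?" falls through to OTHER here, which shows why surface rules alone are not enough and a parse is used in practice.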
1. Syntactic reformulations
• Turn a question into declarative form, e.g.,
2. Inference Reformulations
3. Reformulation Chains
4. Generation
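A syntactic reformulation turns a question into a declarative form with a slot for the answer. The sketch below is a rough string-level approximation, assuming three hand-written patterns and a hypothetical `<ANSWER>` placeholder; the presented system reformulates and matches at the parse-tree level, which this sketch does not attempt.

```python
import re

ANSWER = "<ANSWER>"  # hypothetical slot marker, not taken from the slides

# Hypothetical question patterns and their declarative templates.
PATTERNS = [
    (re.compile(r"^who (?P<vp>.+?)\?*$", re.I), "{answer} {vp}"),
    (re.compile(r"^when was (?P<np>.+) (?P<verb>\w+)\?*$", re.I),
     "{np} was {verb} in {answer}"),
    (re.compile(r"^how (?P<adj>\w+) is (?P<np>.+?)\?*$", re.I),
     "{np} is {answer} {adj}"),
]


def to_declarative(question):
    """Return a declarative template with an answer slot, or None."""
    q = question.strip()
    for pattern, template in PATTERNS:
        m = pattern.match(q)
        if m:
            return template.format(answer=ANSWER, **m.groupdict())
    return None


def extract_answer(declarative, sentence):
    """Match a candidate sentence against the template and return the slot text."""
    before, after = declarative.split(ANSWER)
    # If the slot is at the end, read up to optional sentence-final punctuation.
    tail = re.escape(after) if after else r"[.?!]?$"
    regex = re.escape(before) + r"(?P<answer>[^.?!]+?)" + tail
    m = re.search(regex, sentence.strip(), re.I)
    return m.group("answer").strip() if m else None


if __name__ == "__main__":
    template = to_declarative("Who invented the cotton gin?")
    print(template)
    print(extract_answer(template, "Eli Whitney invented the cotton gin."))
```

Questions that fit none of the patterns return None. Covering cases like "How did Mahatma Gandhi die?" is where inference reformulations and reformulation chains come in.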
Information Retrieval and the Web
TREC (Text Retrieval Conference)
IR system for Webclopedia
Web
Web based IR system
Query Reformulation module
Web Search engine
Sentence Ranking module
1. Query Reformulation module
Previous attempts:
• Simple, exhaustive string-based manipulations
• Transformation grammars
• Learning algorithms
Current attempt:
• Analyze how people naturally form queries to find answers
• Randomly selected 50 TREC8 questions
• Manually produced the simplest queries that yield the most Web pages containing answers
• Analyzed the manually produced queries and categorized them into seven ‘natural’ techniques that were used to form a natural language question
• Derived algorithms that replicate each of the observed techniques
Query Reformulation Techniques
2. Sentence Ranking module
• Produce a list of Boolean queries for each question using all the query reformulation techniques
• Retrieve the top ten results for each query using a web search engine
• Retrieve the documents, strip HTML, segment the text into sentences
• Each sentence is ranked according to 2 schemas:
Score w.r.t. query terms:
-- Each word in the query is assigned a weight
-- Each quoted term in the query has a weight equal to the sum of the weights of its words
-- Each sentence has a weight equal to the weighted overlap with the query terms
Score w.r.t. answers:
-- Tag sentences using BBN’s IdentiFinder (a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities)
-- Score sentences according to the overlap with the answer type, checked against the answer type and the semantic entities found by IdentiFinder
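The query-term score can be sketched as follows. This is a minimal illustration under stated assumptions: the word weights are made-up values, a quoted term is represented as a multi-word string, and a quoted term counts only when its words appear contiguously in the sentence. The slides do not say how word weights are assigned. The answer-type score is not shown, since it depends on a named-entity tagger such as IdentiFinder.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(phrase_tokens, tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def score_sentence(sentence, query_terms, word_weights, default_weight=1.0):
    """Weighted overlap between a sentence and the query terms.

    A multi-word (quoted) term weighs the sum of the weights of its words.
    word_weights and default_weight are hypothetical inputs.
    """
    tokens = tokenize(sentence)
    score = 0.0
    for term in query_terms:
        term_tokens = tokenize(term)
        if not term_tokens:
            continue
        weight = sum(word_weights.get(w, default_weight) for w in term_tokens)
        if contains_phrase(term_tokens, tokens):
            score += weight
    return score


def rank_sentences(sentences, query_terms, word_weights):
    """Sort sentences by descending query-term score."""
    return sorted(sentences,
                  key=lambda s: score_sentence(s, query_terms, word_weights),
                  reverse=True)


if __name__ == "__main__":
    weights = {"cotton": 2.0, "gin": 2.0, "invented": 1.0}  # made-up weights
    query = ["cotton gin", "invented"]  # "cotton gin" stands for a quoted term
    candidates = [
        "Gin is a spirit; cotton is a fiber.",
        "The cotton gin changed agriculture.",
        "Eli Whitney invented the cotton gin.",
    ]
    for s in rank_sentences(candidates, query, weights):
        print(score_sentence(s, query, weights), s)
```

Requiring contiguity for quoted terms keeps a sentence that merely mentions "cotton" and "gin" separately from outranking one that contains the phrase.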
Evaluation of the results:
Reformulations led to more correct answers when used in conjunction with a large corpus like the Web.
Conclusion
Likelihood of finding correct answers is increased by QR
IR module produces higher quality answer candidates
Scoring precision is increased for answer candidates
A strong match with a reformulation provides additional confidence in the correctness of the answer