Natural Language Based Reformulation Resource and Web Exploitation for Question Answering Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu University of Southern California Presented By: Soobia Afroz

Natural Language Based Reformulation Resource and Web Exploitation for Question Answering

Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu
University of Southern California

Presented By: Soobia Afroz

Introduction

The degree of difficulty depends on how closely a given corpus matches the question, NOT on the question itself.

Q: When was the UN founded?

A: The UN was formed in January 1942.

A: The name "United Nations", coined by United States President Franklin D. Roosevelt, was first used in the "Declaration by United Nations" of 1 January 1942, during the Second World War, when representatives of 26 nations pledged their Governments to continue fighting together against the Axis Powers.

Larger text => Good Answers => Validation in original text

Paraphrasing questions:

Create semantically equivalent paraphrases of the questions

Match the answer string with any of the paraphrases

• Question paraphrases + retrieval engine: find documents containing correct answers

• Rank and select better answers

• Automatically paraphrase questions with TextMap

Example:

“How did Mahatma Gandhi die?”

“How deep is Crater Lake?”

“Who invented the cotton gin?”
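The idea of matching answer strings against question paraphrases can be sketched roughly as follows. This is only an illustration under stated assumptions: the two paraphrases are hand-written for this example, the `<ANSWER>` placeholder and the function names are hypothetical, and the matching is done on plain strings with regular expressions, whereas the presented system matches at the parse-tree level.

```python
import re

ANSWER_SLOT = "<ANSWER>"  # hypothetical placeholder marking where the answer goes

# A run of capitalized tokens, used here as a crude stand-in for an answer phrase.
SLOT_REGEX = r"(?P<answer>[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"


def compile_paraphrase(paraphrase):
    """Turn a declarative paraphrase with one answer slot into a regex."""
    before, after = paraphrase.split(ANSWER_SLOT)
    parts = []
    if before:
        # Literal text is matched case-insensitively; the slot stays case-sensitive.
        parts.append("(?i:" + re.escape(before) + ")")
    parts.append(SLOT_REGEX)
    if after:
        parts.append("(?i:" + re.escape(after) + ")")
    return re.compile("".join(parts))


def find_answers(paraphrases, sentences):
    """Return the answer string from each sentence that matches any paraphrase."""
    patterns = [compile_paraphrase(p) for p in paraphrases]
    found = []
    for sentence in sentences:
        for pattern in patterns:
            match = pattern.search(sentence)
            if match:
                found.append(match.group("answer"))
                break  # one match per sentence is enough for this sketch
    return found


# Hand-written paraphrases of "Who invented the cotton gin?" (assumed, not system output).
PARAPHRASES = [
    "<ANSWER> invented the cotton gin",
    "the cotton gin was invented by <ANSWER>",
]
SENTENCES = [
    "Eli Whitney invented the cotton gin in 1793.",
    "The cotton gin was invented by Eli Whitney.",
    "Cotton was a major crop.",
]
print(find_answers(PARAPHRASES, SENTENCES))
```

Both matching sentences yield the same answer string even though they phrase the fact differently, which is the point of generating several paraphrases per question.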

Automatic Paraphrases of questions:

How the system works:

• Parse questions

• Identify the answer type of the question

• Reformulate the question (average reformulations: 3.14)

• Match at parse-tree level
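To make the answer-type step more concrete, here is a toy sketch. It assumes a small set of hypothetical labels (`DATE`, `PERSON`, `QUANTITY`, `LOCATION`) and uses question-prefix rules as a stand-in; the presented system works from a parse of the question, and its actual inventory of answer types is not shown in these slides.

```python
# Hypothetical prefix rules; longer prefixes come first so "how deep" wins over "how".
ANSWER_TYPE_RULES = [
    ("how deep", "QUANTITY"),
    ("how many", "QUANTITY"),
    ("when", "DATE"),
    ("who", "PERSON"),
    ("where", "LOCATION"),
]


def guess_answer_type(question):
    """Return a coarse expected answer type, or "UNKNOWN" if no rule applies."""
    normalized = question.lower().strip()
    for prefix, label in ANSWER_TYPE_RULES:
        if normalized.startswith(prefix):
            return label
    return "UNKNOWN"


for q in ["When was the UN founded?",
          "How deep is Crater Lake?",
          "Who invented the cotton gin?",
          "How did Mahatma Gandhi die?"]:
    print(q, "->", guess_answer_type(q))
```

The last question falls through to `UNKNOWN`, which illustrates why prefix rules alone are not enough and a parse of the question is more robust.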

1. Syntactic reformulations

• Turn a question into declarative form, e.g.,

2. Inference Reformulations

3. Reformulation Chains

4. Generation

Information Retrieval and the Web

TREC (Text Retrieval Conference)

IR system for Webclopedia

Web

Web based IR system

Query Reformulation module

Web Search engine

Sentence Ranking module

1. Query Reformulation module

Previous attempts:

• Simple, exhaustive string-based manipulations

• Transformation grammars

• Learning algorithms

Current attempt:

• Analyze how people naturally form queries to find answers

• Randomly selected 50 TREC8 questions

• Manually produced the simplest queries that yield the most Web pages containing answers

• Analyzed the manually produced queries and categorized them into seven ‘natural’ techniques that were used to form a natural language question

• Derived algorithms that replicate each of the observed techniques

Query Reformulation Techniques

2. Sentence Ranking module

• Produce a list of Boolean queries for each question using all the query reformulation techniques

• Retrieve the top ten results for each query using a Web search engine

• Retrieve the documents, strip the HTML, and segment the text into sentences

• Rank each sentence according to two schemes:

Score w.r.t. query terms:

-- Each word in the query is assigned a weight

-- Each quoted term in the query has a weight equal to the sum of the weights of its words

-- Each sentence has a weight equal to its weighted overlap with the query terms

Score w.r.t. answers:

-- Tag sentences using BBN’s IdentiFinder (a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities)

-- Score sentences according to their overlap with the answer type, checking the expected answer type against the semantic entities found by IdentiFinder
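The two scoring schemes can be sketched as below. This is a minimal sketch under several assumptions: the word weights are supplied by the caller (the slides do not say how they are set), the Boolean operators `AND`, `OR`, `NOT` are simply dropped, all function names are hypothetical, and the entity types are passed in as a plain set that stands in for the output of a tagger such as IdentiFinder, which is not called here. How the two scores are combined is not stated in the slides, so they are kept separate.

```python
import re


def tokenize(text):
    """Lowercase and split text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query_terms(query):
    """Split a Boolean query into quoted phrases and single words."""
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]+"', " ", query)
    words = [w for w in tokenize(rest) if w not in {"and", "or", "not"}]
    return phrases + words


def term_weight(term, word_weights, default=1.0):
    """A quoted term weighs the sum of the weights of its words."""
    return sum(word_weights.get(w, default) for w in tokenize(term))


def contains(sentence_tokens, term_tokens):
    """True if term_tokens occurs as a contiguous run in sentence_tokens."""
    n = len(term_tokens)
    return any(sentence_tokens[i:i + n] == term_tokens
               for i in range(len(sentence_tokens) - n + 1))


def query_score(sentence, query, word_weights):
    """Weighted overlap between a sentence and the query terms."""
    tokens = tokenize(sentence)
    return sum(term_weight(t, word_weights)
               for t in parse_query_terms(query)
               if contains(tokens, tokenize(t)))


def answer_type_score(entity_types, expected_type):
    """1.0 if the tagged entity types include the expected answer type, else 0.0.

    entity_types is assumed to come from an external named-entity tagger.
    """
    return 1.0 if expected_type in entity_types else 0.0


# Illustrative weights and query (assumed values, not taken from the slides).
WEIGHTS = {"cotton": 2.0, "gin": 2.0, "invented": 1.0}
QUERY = '"cotton gin" AND invented'
for s in ["Eli Whitney invented the cotton gin in 1793.",
          "He invented a gin.",
          "Cotton was grown widely."]:
    print(query_score(s, QUERY, WEIGHTS), s)
```

A sentence only earns a quoted term's weight when the whole phrase appears contiguously, so "He invented a gin." scores for `invented` but not for `"cotton gin"`.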