
1

SIMS 290-2: Applied Natural Language Processing

Marti Hearst, November 15, 2004

 

2

Question Answering

Today:
– Introduction to QA
– A typical full-fledged QA system
– A very simple system, in response to this
– An intermediate approach

Wednesday:
– Using external resources
  – WordNet
  – Encyclopedias, gazetteers
– Incorporating a reasoning system
– Machine learning of mappings
– Other question types (e.g., biography, definitions)

3

A of Search Types

What is the typical height of a giraffe? (Question/Answer)

What are some good ideas for landscaping my client’s yard? (Browse and Build)

What are some promising untried treatments for Raynaud’s disease? (Text Data Mining)

4 (Adapted from slide by Surdeanu and Pasca)

Beyond Document Retrieval

Document Retrieval
– Users submit queries corresponding to their information needs.
– System returns (voluminous) list of full-length documents.
– It is the responsibility of the users to find information of interest within the returned documents.

Open-Domain Question Answering (QA)
– Users ask questions in natural language.
  “What is the highest volcano in Europe?”
– System returns list of short answers.
  “… Under Mount Etna, the highest volcano in Europe, perches the fabulous town …”
– A real use for NLP

5

Questions and Answers

What is the height of a typical giraffe?

The result can be a simple answer, extracted from existing web pages.
Can specify with keywords or a natural language query

– However, most web search engines are not set up to handle questions properly.

– Get different results using a question vs. keywords

6

7

8

9

10 (Adapted from slide by Surdeanu and Pasca)

The Problem of Question Answering

What is the nationality of Pope John Paul II?
… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…

When was the San Francisco fire?
… were driven over it. After the ceremonial tie was removed - it burned in the San Francisco fire of 1906 – historians believe an unknown Chinese worker probably drove the last steel spike into a wooden tie. If so, it was only…

Where is the Taj Mahal?
… list of more than 360 cities around the world includes the Great Reef in Australia, the Taj Mahal in India, Chartre’s Cathedral in France, and Serengeti National Park in Tanzania. The four sites Japan has listed include…

11 (Adapted from slide by Surdeanu and Pasca)

The Problem of Question Answering

What is the nationality of Pope John Paul II?
… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…

Natural language question, not keyword queries

Short text fragment, not URL list

12 (Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI)

Question Answering from text

With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers
QA: give the user a (short) answer to their question, perhaps supported by evidence
An alternative to standard IR

The first problem area in IR where NLP is really making a difference.

13 (Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI)

People want to ask questions…

Examples from AltaVista query log:
– who invented surf music?
– how to make stink bombs
– where are the snowdens of yesteryear?
– which english translation of the bible is used in official catholic liturgies?
– how to do clayart
– how to copy psx
– how tall is the sears tower?

Examples from Excite query log (12/1999):
– how can i find someone in texas
– where can i find information on puritan religion?
– what are the 7 wonders of the world
– how can i eliminate stress
– What vacuum cleaner does Consumers Guide recommend

14 (Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI)

A Brief (Academic) History

In some sense question answering is not a new research area. Question answering systems can be found in many areas of NLP research, including:

Natural language database systems
– A lot of early NLP work on these

Problem-solving systems
– STUDENT (Winograd ’77)
– LUNAR (Woods & Kaplan ’77)

Spoken dialog systems
– Currently very active and commercially relevant

The focus on open-domain QA is new. First modern system: MURAX (Kupiec, SIGIR ’93):
– Trivial Pursuit questions
– Encyclopedia answers

FAQFinder (Burke et al. ’97)
TREC QA competition (NIST, 1999–present)

15 (Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI)

AskJeeves

AskJeeves is probably the most hyped example of “question answering”. How it used to work:
– Do pattern matching to match a question to their own knowledge base of questions
– If a match is found, return a human-curated answer to that known question
– If that fails, fall back to regular web search
– (Seems to be more of a meta-search engine now)

A potentially interesting middle ground, but a fairly weak shadow of real QA

16 (Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI)

Question Answering at TREC

The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., “When was Mozart born?”. It has really pushed the field forward.

The document set
– Newswire textual documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now
– Well-formed lexically, syntactically and semantically (reviewed by professional editors)

The questions
– Hundreds of new questions every year; the total is ~2400

Task
– Initially extract at most 5 answers: long (250 bytes) and short (50 bytes)
– Now extract only one exact answer
– Several other sub-tasks added later: definition, list, biography

17

Sample TREC questions

1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?

2. What was the monetary value of the Nobel Peace Prize in 1989?

3. What does the Peugeot company manufacture?

4. How much did Mercury spend on advertising in 1993?

5. What is the name of the managing director of Apricot Computer?

6. Why did David Koresh ask the FBI for a word processor?

7. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?

18

TREC Scoring

For the first three years systems were allowed to return 5 ranked answer snippets (50/250 bytes) to each question.

Mean Reciprocal Rank Scoring (MRR):
– Each question assigned the reciprocal rank of the first correct answer. If correct answer at position k, the score is 1/k.
– 1, 0.5, 0.33, 0.25, 0.2, 0 for positions 1, 2, 3, 4, 5, 6+

Mainly Named Entity answers (person, place, date, …)

From 2002 on, the systems are only allowed to return a single exact answer and the notion of confidence has been introduced.
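To make the metric concrete, here is a minimal sketch of MRR computation (the ranks in the example are made up, not TREC data):

```python
# Minimal sketch of Mean Reciprocal Rank (MRR) scoring.
# ranks[i] is the 1-based position of the first correct answer returned
# for question i, or None if no correct answer was returned at all.
def mean_reciprocal_rank(ranks):
    scores = [1.0 / r if r is not None else 0.0 for r in ranks]
    return sum(scores) / len(scores)

# Hypothetical run over three questions: correct answer at rank 1, at rank 3,
# and never found.
print(mean_reciprocal_rank([1, 3, None]))  # (1 + 0.33 + 0) / 3 = about 0.44
```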

19 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Top Performing Systems

In 2003, the best performing systems at TREC could answer approximately 60-70% of the questions.
Approaches and successes have varied a fair deal.

Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000-2003
– Notably Harabagiu, Moldovan et al. (SMU/UTD/LCC)

Statistical systems are starting to catch up
– The AskMSR system stressed how much could be achieved by very simple methods with enough text (and now has various copycats)
– People are experimenting with machine learning methods

A middle ground is to use a large collection of surface matching patterns (ISI)

20

Example QA System

This system contains many components used by other systems, but is more complex in some ways.
Most work was completed in 2001; there have been advances by this group and others since then.
Next slides based mainly on:
– Paşca and Harabagiu, High-Performance Question Answering from Large Text Collections, SIGIR ’01
– Paşca and Harabagiu, Answer Mining from Online Documents, ACL ’01
– Harabagiu, Paşca, and Maiorano, Experiments with Open-Domain Textual Question Answering, COLING ’00

21 (Adapted from slide by Surdeanu and Pasca)

QA Block Architecture

[Diagram: Q → Question Processing → Passage Retrieval → Answer Extraction → A, with a Document Retrieval engine feeding Passage Retrieval and with WordNet, a named entity recognizer (NER), and a parser used as shared resources. Question Processing passes keywords and the question semantics downstream.]

– Question Processing: captures the semantics of the question; selects keywords for passage retrieval (PR)
– Passage Retrieval: extracts and ranks passages using surface-text techniques
– Answer Extraction: extracts and ranks answers using NL techniques

22 (Adapted from slide by Surdeanu and Pasca)

Question Processing Flow

[Diagram: Q → question parsing → construction of the question representation → answer type detection and keyword selection, producing the question semantic representation, the AT category, and the keywords.]

23 (Adapted from slide by Surdeanu and Pasca)

Question Stems and Answer Types

– Q555: What was the name of Titanic’s captain? (question stem: What; answer type: Person)
– Q654: What U.S. Government agency registers trademarks? (question stem: What; answer type: Organization)
– Q162: What is the capital of Kosovo? (question stem: What; answer type: City)
– Q661: How much does one ton of cement cost? (question stem: How much; answer type: Quantity)

Other question stems: Who, Which, Name, How hot...
Other answer types: Country, Number, Product...

Identify the semantic category of expected answers

24 (Adapted from slide by Surdeanu and Pasca)

Detecting the Expected Answer Type

In some cases, the question stem is sufficient to indicate the answer type (AT)
– Why → REASON
– When → DATE

In many cases, the question stem is ambiguous. Examples:
– What was the name of Titanic’s captain?
– What U.S. Government agency registers trademarks?
– What is the capital of Kosovo?

Solution: select additional question concepts (AT words) that help disambiguate the expected answer type. Examples:
– captain
– agency
– capital

25

Answer Type Taxonomy

Encodes 8707 English concepts to help recognize the expected answer type
Mapping to parts of WordNet done by hand

Can connect to Noun, Adj, and/or Verb subhierarchies

26 (Adapted from slide by Surdeanu and Pasca)

Answer Type Detection Algorithm

Select the answer type word from the question representation
– Select the word(s) connected to the question; some “content-free” words are skipped (e.g., “name”)
– From the previous set, select the word with the highest connectivity in the question representation

Map the AT word into a previously built AT hierarchy
– The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g., “writer” → PERSON

Select the AT(s) from the first hypernym(s) associated with a semantic category
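A rough sketch of that hypernym walk using NLTK's WordNet interface. The CATEGORY_SYNSETS table is a tiny, hypothetical stand-in for the hand-built answer type hierarchy, not the actual resource used by the system:

```python
from nltk.corpus import wordnet as wn  # assumes the NLTK WordNet data is installed

# Hypothetical stand-in for the hand-built AT hierarchy: a few WordNet
# synsets mapped to semantic answer-type categories.
CATEGORY_SYNSETS = {
    wn.synset('person.n.01'): 'PERSON',
    wn.synset('organization.n.01'): 'ORGANIZATION',
    wn.synset('location.n.01'): 'LOCATION',
}

def detect_answer_type(at_word):
    """Walk up the hypernym chain of the AT word's first noun sense and
    return the first semantic category encountered, if any."""
    senses = wn.synsets(at_word, pos=wn.NOUN)
    if not senses:
        return None
    frontier = [senses[0]]
    while frontier:
        synset = frontier.pop(0)
        if synset in CATEGORY_SYNSETS:
            return CATEGORY_SYNSETS[synset]
        frontier.extend(synset.hypernyms())
    return None

# 'oceanographer' and 'captain' should both walk up to person.n.01 -> PERSON.
print(detect_answer_type('oceanographer'))
print(detect_answer_type('captain'))
```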

27 (Adapted from slide by Surdeanu and Pasca)

Answer Type Hierarchy

[Diagram: a fragment of the PERSON subtree of the answer type hierarchy, containing WordNet concepts such as scientist/man of science (researcher, oceanographer, chemist), inhabitant/dweller/denizen (American, islander/island-dweller, westerner), and performer/performing artist (actor, actress, dancer, ballet dancer, tragedian). Two example questions are mapped into it: “What researcher discovered the vaccine against Hepatitis-B?” and “What is the name of the French oceanographer who owned Calypso?”; both AT words (researcher, oceanographer) resolve to the answer type PERSON.]

28 (Adapted from slide by Surdeanu and Pasca)

Evaluation of Answer Type Hierarchy

This evaluation done in 2001.
Controlled the variation of the number of WordNet synsets included in the answer type hierarchy.
Test on 800 TREC questions.

Hierarchy coverage vs. precision score (50-byte answers):
– 0%: 0.296
– 3%: 0.404
– 10%: 0.437
– 25%: 0.451
– 50%: 0.461

The derivation of the answer type is the main source of unrecoverable errors in the QA system

29 (Adapted from slide by Surdeanu and Pasca)

Keyword Selection

The Answer Type indicates what the question is looking for, but it provides insufficient context to locate the answer in a very large document collection.
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.

30 (Adapted from slide by Surdeanu and Pasca)

Lexical Terms Extraction

Questions are approximated by sets of unrelated words (lexical terms)
Similar to bag-of-words IR models

Questions (from the TREC QA track) and their lexical terms:
– Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
– Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
– Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
– Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer

31 (Adapted from slide by Surdeanu and Pasca)

Keyword Selection Algorithm

1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)
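A much simplified sketch of this cascade using NLTK part-of-speech tags; it approximates only heuristics 1, 2, 6 and 7 (the real system relies on a parser and a named entity recognizer, and the stopword list here is illustrative):

```python
import re
import nltk  # assumes the punkt tokenizer and POS tagger models are installed

STOPWORDS = {'the', 'a', 'an', 'of', 'is', 'was', 'does', 'did', 'do',
             'what', 'who', 'where', 'when', 'how', 'much'}

def select_keywords(question):
    """Simplified keyword-selection cascade: quoted terms first, then proper
    nouns, then remaining common nouns, then verbs."""
    keywords = []
    # Heuristic 1: non-stopwords inside quotation marks.
    for quoted in re.findall(r'"([^"]+)"', question):
        keywords += [w for w in quoted.split() if w.lower() not in STOPWORDS]
    tagged = nltk.pos_tag(nltk.word_tokenize(question))
    # Heuristic 2 (approximation): proper nouns (NNP/NNPS).
    keywords += [w for w, t in tagged if t.startswith('NNP') and w not in keywords]
    # Heuristics 5-6 (approximation): remaining common nouns.
    keywords += [w for w, t in tagged if t in ('NN', 'NNS') and w not in keywords]
    # Heuristic 7: verbs that are not auxiliaries caught by the stopword list.
    keywords += [w for w, t in tagged
                 if t.startswith('VB') and w.lower() not in STOPWORDS
                 and w not in keywords]
    return keywords

print(select_keywords("What does the Peugeot company manufacture?"))
# roughly: ['Peugeot', 'company', 'manufacture']
```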

32 (Adapted from slide by Surdeanu and Pasca)

Keyword Selection Examples

What researcher discovered the vaccine against Hepatitis-B?
Hepatitis-B, vaccine, discover, researcher

What is the name of the French oceanographer who owned Calypso?
Calypso, French, own, oceanographer

What U.S. government agency registers trademarks?
U.S., government, trademarks, register, agency

What is the capital of Kosovo?
Kosovo, capital

33 (Adapted from slide by Surdeanu and Pasca)

Passage Retrieval

[Diagram: the QA block architecture from before, with the Passage Retrieval block highlighted: it receives keywords from Question Processing, uses Document Retrieval, and extracts and ranks passages using surface-text techniques before passing them to Answer Extraction.]

34 (Adapted from slide by Surdeanu and Pasca)

Passage Extraction Loop

Passage Extraction Component
– Extracts passages that contain all selected keywords
– Passage size is dynamic
– Start position is dynamic

Passage quality and keyword adjustment
– In the first iteration, use the first 6 keyword selection heuristics
– If the number of passages is lower than a threshold → the query is too strict → drop a keyword
– If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
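A minimal sketch of that feedback loop. The retrieval function and the threshold values are assumptions for illustration, not the system's actual settings:

```python
MIN_PASSAGES, MAX_PASSAGES = 5, 100   # illustrative thresholds

def passage_loop(ranked_keywords, retrieve_passages):
    """ranked_keywords: keywords in priority order (heuristics 1-6 first).
    retrieve_passages: stand-in for a boolean engine returning passages
    that contain ALL of the given keywords."""
    active = list(ranked_keywords)
    dropped = []
    passages = retrieve_passages(active)
    for _ in range(2 * len(ranked_keywords)):   # bound the number of adjustments
        if len(passages) < MIN_PASSAGES and len(active) > 1:
            dropped.append(active.pop())        # too strict: drop a keyword
        elif len(passages) > MAX_PASSAGES and dropped:
            active.append(dropped.pop())        # too relaxed: add one back
        else:
            break
        passages = retrieve_passages(active)
    return passages
```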

35 (Adapted from slide by Surdeanu and Pasca)

Passage Retrieval Architecture

[Diagram: keywords are sent to Document Retrieval; the returned documents go through Passage Extraction; a Passage Quality check either loops back through Keyword Adjustment (if the passage set is unsatisfactory) or hands the passages on to Passage Scoring and Passage Ordering, which produce the ranked passages.]

36 (Adapted from slide by Surdeanu and Pasca)

Passage Scoring

Passages are scored based on keyword windows. For example, if a question has a set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:

[Diagram: the passage fragment “… k1 … k2 … k3 … k2 … k1 …” shown four times, as Window 1 through Window 4, one window for each combination of the two k1 matches and the two k2 matches together with the single k3 match.]

37 (Adapted from slide by Surdeanu and Pasca)

Passage Scoring

Passage ordering is performed using a radix sort that involves three scores:

SameWordSequenceScore (largest)
– Computes the number of words from the question that are recognized in the same sequence in the window

DistanceScore (largest)
– The number of words that separate the most distant keywords in the window

MissingKeywordScore (smallest)
– The number of unmatched keywords in the window
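In Python the same three-key ordering can be expressed with a composite sort key instead of an explicit radix sort (a sketch; the per-window scores are assumed to be computed elsewhere):

```python
from dataclasses import dataclass

@dataclass
class Window:
    same_word_sequence_score: int   # question words matched in the same order
    distance_score: int             # span between the most distant keywords
    missing_keyword_score: int      # unmatched keywords in the window
    text: str = ""

def order_windows(windows):
    """Order as the radix sort does: larger SameWordSequenceScore first,
    then larger DistanceScore, then smaller MissingKeywordScore."""
    return sorted(windows, key=lambda w: (-w.same_word_sequence_score,
                                          -w.distance_score,
                                          w.missing_keyword_score))
```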

38 (Adapted from slide by Surdeanu and Pasca)

Answer Extraction

[Diagram: the QA block architecture from before, with the Answer Extraction block highlighted: it receives the ranked passages and the question semantics, and extracts and ranks answers using NL techniques.]

39 (Adapted from slide by Surdeanu and Pasca)

Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.

Answer type: Person

Text passage:
“Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Best candidate answer: Christa McAuliffe

40 (Adapted from slide by Surdeanu and Pasca)

Features for Answer Ranking

relNMW: number of question terms matched in the answer passage
relSP: number of question terms matched in the same phrase as the candidate answer
relSS: number of question terms matched in the same sentence as the candidate answer
relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
relSWS: number of terms occurring in the same order in the answer passage as in the question
relDTW: average distance from the candidate answer to question term matches

SIGIR ‘01

41 (Adapted from slide by Surdeanu and Pasca)

Answer Ranking based on Machine Learning

A relative relevance score is computed for each pair of candidates (answer windows):

relPAIR = wSWS·relSWS + wFP·relFP + wOCTW·relOCTW + wSP·relSP + wSS·relSS + wNMW·relNMW + wDTW·relDTW + threshold

If relPAIR is positive, the first candidate of the pair is more relevant.

A perceptron model is used to learn the weights.
Scores are in the 50% MRR range for short answers and in the 60% MRR range for long answers.
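A minimal sketch of this pairwise comparison. It reads the formula as operating on the difference between the two candidates' feature values, and the weights and threshold are placeholders rather than the perceptron-trained ones:

```python
# Placeholder weights and threshold; in the real system a perceptron
# learns these values from training pairs.
WEIGHTS = {'SWS': 1.0, 'FP': 1.0, 'OCTW': 1.0, 'SP': 1.0,
           'SS': 1.0, 'NMW': 1.0, 'DTW': 1.0}
THRESHOLD = 0.0

def rel_pair(features_a, features_b):
    """relPAIR > 0 means candidate A is judged more relevant than B.
    features_a / features_b map feature names ('SWS', 'FP', ...) to the
    values computed for each candidate answer window."""
    score = THRESHOLD
    for name, weight in WEIGHTS.items():
        score += weight * (features_a[name] - features_b[name])
    return score
```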

42 (Adapted from slide by Surdeanu and Pasca)

Evaluation on the Web

Precision score:
– Google: 0.29
– Answer extraction from Google: 0.44
– AltaVista: 0.15
– Answer extraction from AltaVista: 0.37

Questions with a correct answer among the top 5 returned answers:
– Google: 0.44
– Answer extraction from Google: 0.57
– AltaVista: 0.27
– Answer extraction from AltaVista: 0.45

Test on 350 questions from TREC (Q250-Q600); extract 250-byte answers.

43 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Can we make this simpler?

One reason systems became so complex is that they have to pick out one sentence within a small collection
– The answer is likely to be stated in a hard-to-recognize manner.

Alternative idea: what happens with a much larger collection? The web is so huge that you’re likely to see the answer stated in a form similar to the question.

Goal: make the simplest possible QA system by exploiting this redundancy in the web
– Use this as a baseline against which to compare more elaborate systems.

The next slides based on:

– Web Question Answering: Is More Always Better? Dumais, Banko, Brill, Lin, Ng, SIGIR’02

– An Analysis of the AskMSR Question-Answering System, Brill, Dumais, and Banko, EMNLP’02.

44 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

AskMSR System Architecture

[Diagram: the AskMSR pipeline; the numbered stages in the figure correspond to the steps described on the following slides.]

45 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Step 1: Rewrite the questions

Intuition: The user’s question is often syntactically quite close to sentences that contain the answer

Where is the Louvre Museum located? → The Louvre Museum is located in Paris

Who created the character of Scrooge?

Charles Dickens created the character of Scrooge.

46 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Query rewriting

Classify the question into seven categories:
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?

a. Hand-crafted category-specific transformation rules
   e.g.: for Where questions, move ‘is’ to all possible locations and look to the right of the query terms for the answer:

   “Where is the Louvre Museum located?”
   → “is the Louvre Museum located”
   → “the is Louvre Museum located”
   → “the Louvre is Museum located”
   → “the Louvre Museum is located”
   → “the Louvre Museum located is”

   (Nonsense, but OK. It’s only a few more queries to the search engine.)

b. Expected answer “Datatype” (e.g., Date, Person, Location, …)
   When was the French Revolution? → DATE
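A minimal sketch of that "move the verb everywhere" rewrite (illustration only; the real system uses seven hand-crafted category-specific rule sets):

```python
def move_verb_rewrites(question, verb="is"):
    """Generate quoted rewrite strings by moving the verb to every position,
    as in the 'Where is the Louvre Museum located?' example above."""
    words = question.rstrip("?").split()
    if verb not in words:
        return []
    rest = [w for w in words if w != verb][1:]   # also drop the question word
    rewrites = []
    for i in range(len(rest) + 1):
        rewrites.append('"' + " ".join(rest[:i] + [verb] + rest[i:]) + '"')
    return rewrites

for rewrite in move_verb_rewrites("Where is the Louvre Museum located?"):
    print(rewrite)
# "is the Louvre Museum located", "the is Louvre Museum located", ...
```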

47 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Query Rewriting - weighting

Some query rewrites are more reliable than others.

Where is the Louvre Museum located?

+“the Louvre Museum is located”
– Weight 5: if a match, probably right

+Louvre +Museum +located
– Weight 1: lots of non-answers could come back too

48 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Step 2: Query search engine

– Send all rewrites to a Web search engine
– Retrieve top N answers (100-200)
– For speed, rely just on the search engine’s “snippets”, not the full text of the actual document

49 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Step 3: Gathering N-Grams

– Enumerate all N-grams (N=1,2,3) in all retrieved snippets
– Weight of an n-gram: occurrence count, each occurrence weighted by the “reliability” (weight) of the rewrite rule that fetched the document

Example: “Who created the character of Scrooge?”
– Dickens 117
– Christmas Carol 78
– Charles Dickens 75
– Disney 72
– Carl Banks 54
– A Christmas 41
– Christmas Carol 45
– Uncle 31
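A minimal sketch of this weighted n-gram counting (the snippets and weights below are made up for illustration):

```python
from collections import Counter

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def score_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: iterable of (snippet_text, rewrite_weight).
    Each n-gram occurrence contributes the weight of the rewrite rule that
    retrieved its snippet."""
    scores = Counter()
    for snippet, weight in snippets_with_weights:
        tokens = snippet.lower().split()
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                scores[gram] += weight
    return scores

# Hypothetical snippets for "Who created the character of Scrooge?"
snippets = [
    ("charles dickens created the character of scrooge", 5),
    ("scrooge the miser in a christmas carol by charles dickens", 1),
]
print(score_ngrams(snippets).most_common(3))
```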

50 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Step 4: Filtering N-Grams

Each question type is associated with one or more “data-type filters” (regular expressions):
– When… → Date
– Where… → Location
– Who… → Person
– What… → (varies)

Boost the score of n-grams that match the regexp
Lower the score of n-grams that don’t match the regexp
Details omitted from paper…
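An illustrative sketch of such filters; the regular expressions are simplified stand-ins, not the ones used by AskMSR:

```python
import re

# Simplified, illustrative data-type filters keyed by question stem.
FILTERS = {
    "When": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),           # a year
    "Who":  re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),     # capitalized name
}

def adjust_score(question_stem, ngram, score, boost=2.0, penalty=0.5):
    """Boost n-grams that match the filter for this question type,
    damp the ones that don't."""
    pattern = FILTERS.get(question_stem)
    if pattern is None:
        return score
    return score * (boost if pattern.search(ngram) else penalty)

print(adjust_score("Who", "Charles Dickens", 75))   # boosted
print(adjust_score("Who", "a christmas", 41))       # damped
```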

51 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Step 5: Tiling the Answers

[Diagram: answer tiling example. The scored n-grams “Dickens” (20), “Charles Dickens” (15), and “Mr Charles” (10) are tiled: starting from the highest-scoring n-gram, overlapping n-grams are merged and the old n-grams discarded, yielding “Mr Charles Dickens” with score 45. Repeat until no more overlap.]
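A minimal sketch of greedy tiling that reproduces the example above (one plausible reading of the procedure, not the exact AskMSR implementation):

```python
def overlap_merge(a, b):
    """If the end of a overlaps the start of b (word-wise), return the tiled
    string, else None."""
    aw, bw = a.split(), b.split()
    for k in range(min(len(aw), len(bw)), 0, -1):
        if aw[-k:] == bw[:k]:
            return " ".join(aw + bw[k:])
    return None

def tile_answers(scored):
    """scored: dict mapping n-gram -> score. Greedily merge n-grams that
    overlap the current highest-scoring one, summing their scores."""
    scored = dict(scored)
    merged = True
    while merged and len(scored) > 1:
        merged = False
        best = max(scored, key=scored.get)
        for other in list(scored):
            if other == best:
                continue
            tiled = overlap_merge(best, other) or overlap_merge(other, best)
            if tiled:
                new_score = scored.pop(best) + scored.pop(other)
                scored[tiled] = scored.get(tiled, 0) + new_score
                merged = True
                break
    return scored

print(tile_answers({"Dickens": 20, "Charles Dickens": 15, "Mr Charles": 10}))
# {'Mr Charles Dickens': 45}
```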

52 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Results

Standard TREC contest test-bed (TREC 2001): ~1M documents; 900 questions

Technique doesn’t do too well (though would have placed in top 9 of ~30 participants)
– MRR (strict): 0.34
– MRR (lenient): 0.43
– 9th place

53

Results

From the EMNLP ’02 paper:
MRR of 0.577; answers 61% of questions correctly
Would be near the top of TREC-9 runs

Breakdown of feature contribution:

54 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Issues

Works best/only for “Trivial Pursuit”-style fact-based questions

Limited/brittle repertoire of:
– question categories
– answer data types/filters
– query rewriting rules

55 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Intermediate Approach: Surface Pattern Discovery

Based on:
– Ravichandran, D. and Hovy, E.H., Learning Surface Text Patterns for a Question Answering System, ACL ’02
– Hovy et al., Question Answering in Webclopedia, TREC-9, 2000

Use of Characteristic Phrases
“When was <person> born?”

Typical answers
– “Mozart was born in 1756.”
– “Gandhi (1869-1948)...”

Suggests regular expressions to help locate the correct answer
– “<NAME> was born in <BIRTHDATE>”
– “<NAME> ( <BIRTHDATE> -”
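A minimal sketch of applying those two suggested patterns with Python regular expressions (the text is a made-up example):

```python
import re

def birthdate_patterns(name):
    """Instantiate the two suggested surface patterns for a given name."""
    n = re.escape(name)
    return [
        re.compile(n + r" was born in (\d{4})"),
        re.compile(n + r" \((\d{4})-"),
    ]

text = "Mozart was born in 1756. Gandhi (1869-1948) led India to independence."
for name in ("Mozart", "Gandhi"):
    for pattern in birthdate_patterns(name):
        match = pattern.search(text)
        if match:
            print(name, match.group(1))   # Mozart 1756, then Gandhi 1869
```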

56 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Use Pattern Learning

Examples:
– “The great composer Mozart (1756-1791) achieved fame at a young age”
– “Mozart (1756-1791) was a genius”
– “The whole world would always be indebted to the great music of Mozart (1756-1791)”

Longest matching substring for all 3 sentences is "Mozart (1756-1791)”

Suffix tree would extract "Mozart (1756-1791)" as an output, with score of 3

Reminiscent of IE pattern learning
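A minimal sketch of extracting that common substring; it uses a simple dynamic-programming longest-common-substring routine applied pairwise, whereas the paper uses a suffix tree over all sentences at once:

```python
def longest_common_substring(a, b):
    """Longest common substring of two strings via simple DP
    (O(len(a) * len(b)) time; a suffix tree does this more efficiently)."""
    best = ""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                if table[i][j] > len(best):
                    best = a[i - table[i][j]:i]
    return best

sentences = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
common = sentences[0]
for sentence in sentences[1:]:
    common = longest_common_substring(common, sentence)
print(repr(common.strip()))   # 'Mozart (1756-1791)'
```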

57 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Pattern Learning (cont.)

Repeat with different examples of same question type

“Gandhi 1869”, “Newton 1642”, etc.

Some patterns learned for BIRTHDATE:
a. born in <ANSWER>, <NAME>
b. <NAME> was born on <ANSWER>,
c. <NAME> ( <ANSWER> -
d. <NAME> ( <ANSWER> - )

58 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

QA Typology from ISI

Typology of typical question forms: 94 nodes (47 leaf nodes)
Analyzed 17,384 questions (from answers.com)

(THING ((AGENT (NAME (FEMALE-FIRST-NAME (EVE MARY ...)) (MALE-FIRST-NAME (LAWRENCE SAM ...)))) (COMPANY-NAME (BOEING AMERICAN-EXPRESS)) JESUS ROMANOFF ...) (ANIMAL-HUMAN (ANIMAL (WOODCHUCK YAK ...)) PERSON) (ORGANIZATION (SQUADRON DICTATORSHIP ...)) (GROUP-OF-PEOPLE (POSSE CHOIR ...)) (STATE-DISTRICT (TIROL MISSISSIPPI ...)) (CITY (ULAN-BATOR VIENNA ...)) (COUNTRY (SULTANATE ZIMBABWE ...)))) (PLACE (STATE-DISTRICT (CITY COUNTRY...)) (GEOLOGICAL-FORMATION (STAR CANYON...)) AIRPORT COLLEGE CAPITOL ...) (ABSTRACT (LANGUAGE (LETTER-CHARACTER (A B ...))) (QUANTITY (NUMERICAL-QUANTITY INFORMATION-QUANTITY MASS-QUANTITY MONETARY-QUANTITY TEMPORAL-QUANTITY ENERGY-QUANTITY TEMPERATURE-QUANTITY ILLUMINATION-QUANTITY

(SPATIAL-QUANTITY (VOLUME-QUANTITY AREA-QUANTITY DISTANCE-QUANTITY))

... PERCENTAGE))) (UNIT ((INFORMATION-UNIT (BIT BYTE ... EXABYTE)) (MASS-UNIT (OUNCE ...)) (ENERGY-UNIT (BTU ...)) (CURRENCY-UNIT (ZLOTY PESO ...)) (TEMPORAL-UNIT (ATTOSECOND ... MILLENIUM)) (TEMPERATURE-UNIT (FAHRENHEIT KELVIN CELCIUS)) (ILLUMINATION-UNIT (LUX CANDELA)) (SPATIAL-UNIT ((VOLUME-UNIT (DECILITER ...)) (DISTANCE-UNIT (NANOMETER ...)))) (AREA-UNIT (ACRE)) ... PERCENT)) (TANGIBLE-OBJECT ((FOOD (HUMAN-FOOD (FISH CHEESE ...))) (SUBSTANCE ((LIQUID (LEMONADE GASOLINE BLOOD ...)) (SOLID-SUBSTANCE (MARBLE PAPER ...)) (GAS-FORM-SUBSTANCE (GAS AIR)) ...)) (INSTRUMENT (DRUM DRILL (WEAPON (ARM GUN)) ...) (BODY-PART (ARM HEART ...)) (MUSICAL-INSTRUMENT (PIANO))) ... *GARMENT *PLANT DISEASE)

59 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Experiments

6 different question types from the Webclopedia QA Typology:
– BIRTHDATE
– LOCATION
– INVENTOR
– DISCOVERER
– DEFINITION
– WHY-FAMOUS

60 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Experiments: pattern precision

BIRTHDATE:
1.0  <NAME> ( <ANSWER> - )
0.85 <NAME> was born on <ANSWER>,
0.6  <NAME> was born in <ANSWER>
0.59 <NAME> was born <ANSWER>
0.53 <ANSWER> <NAME> was born
0.50 - <NAME> ( <ANSWER>
0.36 <NAME> ( <ANSWER> -

INVENTOR:
1.0 <ANSWER> invents <NAME>
1.0 the <NAME> was invented by <ANSWER>
1.0 <ANSWER> invented the <NAME> in

61 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Experiments (cont.)

DISCOVERER:
1.0 when <ANSWER> discovered <NAME>
1.0 <ANSWER>'s discovery of <NAME>
0.9 <NAME> was discovered by <ANSWER> in

DEFINITION:
1.0 <NAME> and related <ANSWER>
1.0 form of <ANSWER>, <NAME>
0.94 as <NAME>, <ANSWER> and

62 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Experiments (cont.)

WHY-FAMOUS:
1.0 <ANSWER> <NAME> called
1.0 laureate <ANSWER> <NAME>
0.71 <NAME> is the <ANSWER> of

LOCATION:
1.0 <ANSWER>'s <NAME>
1.0 regional : <ANSWER> : <NAME>
0.92 near <NAME> in <ANSWER>

Depending on question type, get high MRR (0.6–0.9), with higher results from use of Web than TREC QA collection

63 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Shortcomings & Extensions

Need for POS &/or semantic types
– “Where are the Rocky Mountains?”
– “Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty”

– <NAME> in <ANSWER>

NE tagger &/or ontology could enable system to determine "background" is not a location

64 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Shortcomings... (cont.)

Long distance dependencies
– “Where is London?”
– “London, which has one of the busiest airports in the world, lies on the banks of the river Thames”
– would require a pattern like: <QUESTION>, (<any_word>)*, lies on <ANSWER>

Abundance & variety of Web data helps system to find an instance of patterns w/o losing answers to long distance dependencies

65 (Adapted from slides by Manning, Harabagiu, Kusmerick, ISI)

Shortcomings... (cont.)

System currently has only one anchor word
– Doesn't work for Q types requiring multiple words from question to be in answer
– “In which county does the city of Long Beach lie?”
– “Long Beach is situated in Los Angeles County”
– required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>

Does not use case
– “What is a micron?”
– “...a spokesman for Micron, a maker of semiconductors, said SIMMs are...”

If Micron had been capitalized in question, would be a perfect answer

66

Question Answering

Today:
– Introduction to QA
– A typical full-fledged QA system
– A very simple system, in response to this
– An intermediate approach

Wednesday:
– Using external resources
  – WordNet
  – Encyclopedias, gazetteers
– Incorporating a reasoning system
– Machine learning of mappings
– Alternative question types