Question Answering
© Johan Bos, November 2005


Page 1: Question Answering


Question Answering

• Lecture 1 (two weeks ago): Introduction; History of QA; Architecture of a QA system; Evaluation.

• Lecture 2 (last week): Question Classification; NLP techniques for question analysis; Tokenisation; Lemmatisation; POS-tagging; Parsing; WordNet.

• Lecture 3 (today): Named Entity Recognition; Anaphora Resolution; Matching; Reranking; Answer Validation.

Page 2: Question Answering


The Panda

Page 3: Question Answering


A panda…

A panda walks into a cafe. He orders a sandwich, eats it, then draws a gun and fires two shots in the air.

Page 4: Question Answering


A panda…

“Why?” asks the confused waiter, as the panda makes towards the exit.

The panda produces a dictionary and tosses it over his shoulder.

“I am a panda,” he says. “Look it up.”

Page 5: Question Answering


The panda’s dictionary

Panda. Large black-and-white bear-like mammal, native to China. Eats, shoots and leaves.

Page 6: Question Answering


Ambiguities

Eats,  shoots  and  leaves.
VBZ    VBZ     CC   VBZ

Page 7: Question Answering


Ambiguities

Eats  shoots  and  leaves.
VBZ   NNS     CC   NNS

Page 8: Question Answering


Question Answering

• Lecture 1 (two weeks ago): Introduction; History of QA; Architecture of a QA system; Evaluation.

• Lecture 2 (last week): Question Classification; NLP techniques for question analysis; Tokenisation; Lemmatisation; POS-tagging; Parsing; WordNet.

• Lecture 3 (today): Named Entity Recognition; Anaphora Resolution; Matching; Reranking; Answer Validation.

Page 9: Question Answering


Architecture of a QA system

[Diagram: a question enters Question Analysis, which produces a query for the IR component, plus an answer-type and a question representation. IR searches the corpus and returns documents/passages to Document Analysis, which produces a passage representation. Answer Extraction combines the answer-type, question representation, and passage representation to produce answers.]


Page 11: Question Answering


Recall the Answer-Type Taxonomy

• We divided questions according to their expected answer type

• Simple Answer-Type Typology:
  PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY

Page 12: Question Answering


Named Entity Recognition

• In order to make use of the answer types, we need to be able to recognise named entities of the same types in the corpus

PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY

Page 13: Question Answering


Example Text

Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.

Page 14: Question Answering


Named Entity Recognition

<ENAMEX TYPE="LOCATION">Italy</ENAMEX>'s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.

Page 15: Question Answering


NER difficulties

• Several types of entities are too numerous to include in dictionaries

• New names turn up every day

• Different forms of the same entity occur in the same text: Brian Jones … Mr. Jones

• Capitalisation

Page 16: Question Answering


NER approaches

• Rule-based approach: hand-crafted rules, with help from databases of known named entities (see the sketch below)

• Statistical approach: features and machine learning
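To make the rule-based approach concrete, here is a minimal sketch in Python (mine, not the lecture's; the gazetteer entries and the Mr.-rule are illustrative assumptions). It tags the example text from Page 13 with a small database of known names plus one hand-crafted rule:

    import re

    # Illustrative "database help": a tiny gazetteer of known named entities
    GAZETTEER = {
        "Italy": "LOCATION",
        "Music Masters of Milan, Inc": "ORGANIZATION",
        "Arthur Andersen": "ORGANIZATION",
    }

    # Hand-crafted rule: a capitalised word after "Mr." is a PERSON
    TITLE_RULE = re.compile(r"\bMr\.\s+([A-Z][a-z]+)")

    def tag_entities(text):
        entities = []
        for name, etype in GAZETTEER.items():
            for m in re.finditer(re.escape(name), text):
                entities.append((m.start(), m.end(), etype, name))
        for m in TITLE_RULE.finditer(text):
            entities.append((m.start(1), m.end(1), "PERSON", m.group(1)))
        return sorted(entities)

    text = ("Italy's business world was rocked by the announcement last Thursday "
            "that Mr. Verdi would leave his job as vice-president of "
            "Music Masters of Milan, Inc to become operations director of "
            "Arthur Andersen.")
    print(tag_entities(text))

A real rule-based system would add many more rules (for dates, measures, capitalisation patterns), which is exactly why the hand-crafting becomes expensive and statistical approaches are attractive.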

Page 17: Question Answering


Anaphora

Page 18: Question Answering


What is anaphora?

• A relation between a pronoun and another element in the same or an earlier sentence

• Anaphoric pronouns: he, she, it, they

• Anaphoric noun phrases: the country, that idiot, his hat, her dress

Page 19: Question Answering


Anaphora (pronouns)

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: ?

Page 20: Question Answering


Anaphora (definite descriptions)

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: ?

Page 21: Question Answering


Anaphora Resolution

• Anaphora Resolution is the task of finding the antecedents of anaphoric expressions

• Example system: MARS (Mitkov, Evans & Orasan 2002), http://clg.wlv.ac.uk/MARS/ (a crude baseline is sketched below)
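MARS implements Mitkov's knowledge-poor approach with weighted antecedent indicators. As a point of contrast, the crudest possible baseline just picks the nearest preceding noun phrase; a minimal sketch (entirely illustrative, ignoring gender, number, and syntax):

    def resolve(pronoun_pos, noun_phrases):
        """Recency baseline: the antecedent is the nearest NP before the pronoun.
        noun_phrases is a list of (position, np_string) candidates."""
        preceding = [np for np in noun_phrases if np[0] < pronoun_pos]
        return max(preceding, key=lambda np: np[0])[1] if preceding else None

    # Token positions from "Andorra is a tiny land-locked country ... its ... economy"
    print(resolve(8, [(0, "Andorra"), (2, "a tiny land-locked country")]))
    # -> "a tiny land-locked country": pure recency picks the wrong antecedent here,
    #    which is why systems like MARS add agreement and salience indicators on top.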

Page 22: Question Answering


Anaphora (pronouns)

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: Tourism

Page 23: Question Answering


Architecture of a QA system

[Diagram: the QA architecture from Page 9, repeated: Question Analysis → IR over the corpus → Document Analysis → Answer Extraction → answers.]

Page 24: Question Answering


Matching

• Given a question Q and an expression with a potential answer A, calculate a matching score S = match(Q,A) that indicates how well Q matches A

• Example:
  – Q: When was Franz Kafka born?
  – A1: Franz Kafka died in 1924.
  – A2: Kafka was born in 1883.
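Before working through semantic matching on the next pages, it is worth seeing why a bag-of-words match(Q,A) is too weak; a minimal sketch (the stopword list is an arbitrary assumption):

    STOPWORDS = {"when", "was", "in", "the", "a", "an", "of", "to"}

    def match(question, answer):
        """Fraction of the question's content words that occur in the answer."""
        q = {w.strip(".,?").lower() for w in question.split()} - STOPWORDS
        a = {w.strip(".,?").lower() for w in answer.split()}
        return len(q & a) / len(q) if q else 0.0

    q = "When was Franz Kafka born?"
    print(match(q, "Franz Kafka died in 1924."))  # 0.67: franz, kafka
    print(match(q, "Kafka was born in 1883."))    # 0.67: kafka, born

Both candidates score 0.67 even though only A2 answers the question; the semantic matching worked through next can tell them apart.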

Page 25: Question Answering


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Page 26: Question Answering


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Substitution: X = x2

Page 27: Question Answering


Semantic Matching

Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Substitution: Y = x1

Page 28: Question Answering


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,Y), temp(E,x2)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Substitution: Y = x1

Page 29: Question Answering


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,Y), temp(E,x2)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Match score = 3/6 = 0.50

Page 30: Question Answering


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Page 31: Question Answering


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Substitution: X = x2

Page 32: Question Answering


Semantic Matching

Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Substitution: Y = x1

Page 33: Question Answering


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,x1), temp(E,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Substitution: E = x3

Page 34: Question Answering


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(x3), patient(x3,x1), temp(x3,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Substitution: E = x3

Page 35: Question Answering


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(x3), patient(x3,x1), temp(x3,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Match score = 4/6 = 0.67
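The substitution-and-count procedure of Pages 25-35 can be sketched as greedy unification over predicate lists. This is a toy reconstruction, not the lecture's implementation: upper-case arguments are variables, and the score is the fraction of Q literals matched. Note that on A2 alone it yields 3/6 rather than the slide's 4/6; the missing match, temp against in, needs the inference rule shown on the next page.

    def unify(q_lit, a_lit, subst):
        """Try to match one Q literal against one A literal under subst."""
        (qp, qargs), (ap, aargs) = q_lit, a_lit
        if qp != ap or len(qargs) != len(aargs):
            return None
        new = dict(subst)
        for qa, aa in zip(qargs, aargs):
            qa = new.get(qa, qa)          # apply current bindings
            if qa[0].isupper():           # unbound variable: bind it
                new[qa] = aa
            elif qa != aa:                # constant clash
                return None
        return new

    def match_score(q_lits, a_lits):
        subst, matched = {}, 0
        for q_lit in q_lits:
            for a_lit in a_lits:
                new = unify(q_lit, a_lit, subst)
                if new is not None:
                    subst, matched = new, matched + 1
                    break
        return matched, len(q_lits), subst

    Q  = [("answer", ["X"]), ("franz", ["Y"]), ("kafka", ["Y"]),
          ("born", ["E"]), ("patient", ["E", "Y"]), ("temp", ["E", "X"])]
    A2 = [("kafka", ["x1"]), ("born", ["x3"]), ("patient", ["x3", "x1"]),
          ("in", ["x3", "x2"]), ("1883", ["x2"])]
    print(match_score(Q, A2))   # (3, 6, {'Y': 'x1', 'E': 'x3'})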

Page 36: Question Answering


Matching Techniques

• Weighted matching: higher weight for named entities

• WordNet: hyponyms

• Inference rules, for example:
  BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)
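Continuing the toy sketch above, the example rule can be applied as a rewrite of the answer representation before matching (stubbing out which unary predicates count as dates is my assumption):

    DATE_PREDICATES = {"1883", "1924"}   # stub: unary predicates that denote dates

    def apply_born_in_date(lits):
        """Inference rule: BORN(E) & IN(E,Y) & DATE(Y) -> add TEMP(E,Y)."""
        born  = {args[0] for pred, args in lits if pred == "born"}
        dates = {args[0] for pred, args in lits if pred in DATE_PREDICATES}
        extra = [("temp", args) for pred, args in lits
                 if pred == "in" and args[0] in born and args[1] in dates]
        return lits + extra

    A2_rewritten = apply_born_in_date(A2)   # now also contains ("temp", ["x3", "x2"])
    print(match_score(Q, A2_rewritten))     # (4, 6, ...): the slide's 4/6 = 0.67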

Page 37: Question Answering


Reranking

Page 38: Question Answering


Reranking

• Most QA systems first produce a list of possible answers…

• This is usually followed by a process called reranking

• Reranking promotes correct answers to a higher rank

Page 39: Question Answering


Factors in reranking

• Matching score: the better the match with the question, the more likely the answer is correct

• Frequency: if the same answer occurs many times, it is likely to be correct
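A minimal way to fold both factors into a single reranking score (the weights and the frequency normalisation are arbitrary assumptions, not the lecture's formula):

    from collections import Counter

    def rerank(candidates, w_match=0.7, w_freq=0.3):
        """candidates: list of (answer_string, match_score) pairs."""
        freq = Counter(ans for ans, _ in candidates)
        top = max(freq.values())
        best = {}
        for ans, m in candidates:
            score = w_match * m + w_freq * freq[ans] / top
            best[ans] = max(best.get(ans, 0.0), score)
        return sorted(best.items(), key=lambda kv: -kv[1])

    # "1883" appears twice, so frequency pushes it above the higher single match
    print(rerank([("1924", 0.50), ("1883", 0.45), ("1883", 0.40)]))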

Page 40: Question Answering


Sanity Checking

The answer should be informative:

Q: Who is Tom Cruise married to?
A: Tom Cruise

Q: Where was Florence Nightingale born?
A: Florence

Page 41: Question Answering


Answer Validation

• Given a ranked list of answers, some of them might not make sense at all

• Promote answers that make sense

• How? Use an even larger corpus!
  – The “sloppy” approach
  – The “strict” approach

Page 42: Question Answering


The World Wide Web

Page 43: Question Answering


Answer validation (sloppy)

• Given a question Q and a set of answers A1…An

• For each i, generate the query “Q Ai”

• Count the number of hits for each i

• Choose the Ai with the most hits

• Use existing search engines (Google, AltaVista)
  – Magnini et al. 2002 (CCP)
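In outline, the sloppy approach is a loop over hit counts; search_hits below is a hypothetical wrapper around a search-engine API (the published work used engines such as Google and AltaVista):

    def search_hits(query):
        """Hypothetical: return the engine's reported hit count for the query."""
        raise NotImplementedError   # would call a search-engine API here

    def validate_sloppy(question, answers):
        """Return the answer whose combined query 'Q Ai' has the most hits."""
        counts = [(a, search_hits(question + " " + a)) for a in answers]
        return max(counts, key=lambda pair: pair[1])[0]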

Page 44: Question Answering


Corrected Conditional Probability

• Treat Q and A as bags of words
  – Q = the content words of the question
  – A = the answer

• CCP(Qsp, Asp) = hits(A NEAR Q) / (hits(A) × hits(Q))

• Accept answers above a certain CCP threshold
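With the same hypothetical search_hits wrapper, the CCP score and its threshold test can be sketched as follows (NEAR was an AltaVista proximity operator; the threshold is whatever gets tuned on held-out data):

    def ccp(q_content_words, answer):
        """Corrected Conditional Probability: hits(A NEAR Q) / (hits(A) * hits(Q))."""
        q = " ".join(q_content_words)
        denom = search_hits(answer) * search_hits(q)
        return search_hits(answer + " NEAR " + q) / denom if denom else 0.0

    def validate_ccp(q_content_words, answers, threshold):
        """Keep only the answers whose CCP score clears the threshold."""
        return [a for a in answers if ccp(q_content_words, a) > threshold]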

Page 45: Question Answering


Answer validation (strict)

• Given a question Q and a set of answers A1…An

• Create a declarative sentence with the focus of the question replaced by Ai

• Use the strict (exact phrase) search option in Google
  – High precision
  – Low recall

• Any terms of the target not in the sentence are added to the query
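A sketch of the strict query construction, mirroring the Guthrie example worked through on the following pages (the template mechanism and the hard-coded variant expansion are illustrative assumptions):

    def strict_queries(template, answers, target_words):
        """template contains {A}; expand 'was' to its morphological variants,
        quote the sentence as an exact phrase, and append leftover target words."""
        variants = template.replace(" was ", " is OR was OR are OR were ")
        return ['"%s" %s' % (variants.format(A=a), " ".join(target_words))
                for a in answers]

    for q in strict_queries("Guthrie was born in {A}",
                            ["Britain", "Okemah, Okla.", "Newport",
                             "Oklahoma", "New York"],
                            ["Woody"]):
        print(q)   # e.g. "Guthrie is OR was OR are OR were born in Britain" Woody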

Page 46: Question Answering


Example

• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?

• Top-5 answers:
  1) Britain*
  2) Okemah, Okla.
  3) Newport*
  4) Oklahoma
  5) New York

Page 47: Question Answering


Example: generate queries

• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?

• Generated queries:
  1) “Guthrie was born in Britain”
  2) “Guthrie was born in Okemah, Okla.”
  3) “Guthrie was born in Newport”
  4) “Guthrie was born in Oklahoma”
  5) “Guthrie was born in New York”

Page 48: Question Answering


Example: add target words

• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?

• Generated queries:
  1) “Guthrie was born in Britain” Woody
  2) “Guthrie was born in Okemah, Okla.” Woody
  3) “Guthrie was born in Newport” Woody
  4) “Guthrie was born in Oklahoma” Woody
  5) “Guthrie was born in New York” Woody

Page 49: Question Answering


Example: morphological variants

TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

Generated queries:
“Guthrie is OR was OR are OR were born in Britain” Woody
“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody
“Guthrie is OR was OR are OR were born in Newport” Woody
“Guthrie is OR was OR are OR were born in Oklahoma” Woody
“Guthrie is OR was OR are OR were born in New York” Woody

Page 50: Question Answering


Example: Google hits

TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

Generated queries and Google hit counts:
“Guthrie is OR was OR are OR were born in Britain” Woody → 0
“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody → 10
“Guthrie is OR was OR are OR were born in Newport” Woody → 0
“Guthrie is OR was OR are OR were born in Oklahoma” Woody → 42
“Guthrie is OR was OR are OR were born in New York” Woody → 2

Page 51: Question Answering


Example: reranked answers

TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

Original answers:
1) Britain*
2) Okemah, Okla.
3) Newport*
4) Oklahoma
5) New York

Reranked answers:
* 4) Oklahoma
* 2) Okemah, Okla.
5) New York
1) Britain
3) Newport

Page 52: Question Answering


Summary

• Introduction to QA
  – Typical architecture, evaluation
  – Types of questions and answers

• Use of general NLP techniques
  – Tokenisation, POS tagging, parsing
  – NER, anaphora resolution

• QA techniques
  – Matching
  – Reranking
  – Answer validation

Page 53: Question Answering


Where to go from here

• Producing answers in real time
• Improving accuracy
• Answer explanation
• User modelling
• Speech interfaces
• Dialogue (interactive QA)
• Multi-lingual QA

Page 54: Question Answering


Video (Robot)