AQUAINT
BBN’s AQUA Project
Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu
3 December 2002
AQUAINT
BBN’s Approach to QA
• Theme: Use document retrieval, entity recognition, & proposition recognition
• Analyze the question
– Reduce question to propositions and a bag of words
– Predict the type of the answer
• Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
• Other knowledge sources (e.g., the Web) are optionally used to re-rank answers
• Re-rank candidates based on propositions
• Estimate confidence for answers (a pipeline sketch follows this list)
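The bullets above describe a pipeline: analyze the question, retrieve passages, optionally consult the Web, re-rank, and estimate confidence. Below is a minimal, self-contained sketch of that flow; the word-overlap scoring and the placeholder confidence measure are illustrative stand-ins, not BBN’s actual retrieval, re-ranking, or confidence components.

# Minimal sketch of the QA pipeline described on this slide (illustrative only).
def bag_of_words(text):
    return set(text.lower().replace("?", "").replace(",", "").split())

def rank_passages(question, passages):
    # Stand-in for passage retrieval: score passages by question-word overlap.
    q_words = bag_of_words(question)
    scored = [(len(q_words & bag_of_words(p)), p) for p in passages]
    return [p for _, p in sorted(scored, reverse=True)]

def estimate_confidence(question, passage):
    # Placeholder confidence: fraction of question words matched in the passage.
    q_words = bag_of_words(question)
    return len(q_words & bag_of_words(passage)) / max(len(q_words), 1)

if __name__ == "__main__":
    corpus = [
        "The Taj Mahal is located in Agra, India.",
        "Dell, beating Compaq, sold the most PCs in 2001.",
    ]
    question = "Where is the Taj Mahal?"
    best = rank_passages(question, corpus)[0]
    print(best, estimate_confidence(question, best))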
AQUAINT
System Diagram
[System diagram: input is a Question and output is an Answer & Confidence Score; boxes shown include Question Classification, Name Extraction, Parsing, NP Labeling, Description Classification, Proposition Finding, Document Retrieval, Passage Retrieval, Web Search, Confidence Estimation, Regularization, Treebank, Name Annotation, and Proposition Bank.]
AQUAINT
Question Classification
AQUAINT
Question Classification
• A hybrid approach based on rules, statistical parsing & question templates (a classification sketch follows this list)
– Match question templates against statistical parses
– Back off to statistical bag-of-words classification
• Example features used for classification
– The type of WHNP starting the question (e.g., “Who”, “What”, “When” …)
– The headword of the core NP
– WordNet definition
– Bag of words
– Main verb of the question
• Performance
– TREC 8 & 9 questions for training
– ~85% when testing on TREC 10
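The hybrid idea above can be sketched as a first pass over question templates with a bag-of-words backoff. The regular-expression templates, word lists, and type labels below are illustrative placeholders, not BBN’s actual rules or trained classifier.

import re

# Illustrative templates: map a leading WH-phrase to an answer type.
TEMPLATES = [
    (re.compile(r"^where\b", re.I), "LOCATION_OR_GPE"),
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^when\b", re.I), "DATE_OR_TIME"),
]

# Illustrative backoff evidence for a bag-of-words decision.
TYPE_WORDS = {
    "PERSON": {"pianist", "president", "author"},
    "MONEY": {"cost", "price", "salary"},
}

def classify(question):
    # First pass: question templates (here, simple regexes).
    for pattern, qtype in TEMPLATES:
        if pattern.search(question):
            return qtype
    # Back off: pick the type whose word list overlaps the question most.
    words = set(question.lower().rstrip("?").split())
    scores = {t: len(words & ws) for t, ws in TYPE_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "OTHER"

print(classify("Where is the Taj Mahal?"))                         # LOCATION_OR_GPE
print(classify("Which pianist won the Tchaikovsky Competition?"))  # PERSON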
AQUAINT
Examples of Question Analysis
• Where is the Taj Mahal?
– WHNP=where
– Answer type: Location or GPE
• Which pianist won the last International Tchaikovsky Competition?
– Headword of core NP=pianist
– WordNet definition=person
– Answer type: Person
AQUAINT
Question-Answer Types
Type            Subtype
ORGANIZATION    CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION        CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC             AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GAME
PRODUCT         DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY     NATIONALITY, OTHER, POLITICAL, RELIGION
LANGUAGE
FAC_DESC        AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
MONEY
GPE_DESC        CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC        CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO    ADDRESS, OTHER, PHONE
WORK_OF_ART     BOOK, OTHER, PAINTING, PLAY, SONG
*Thanks to USC/ISI and IBM groups for sharing the conclusions of their analyses.
AQUAINT
Question-Answer Types (cont’d)
PRODUCT_DESC    OTHER, VEHICLE, WEAPON
PERSON
EVENT           HURRICANE, OTHER, WAR
SUBSTANCE       CHEMICAL, DRUG, FOOD, OTHER
PER_DESC
PRODUCT         OTHER
ORDINAL
ANIMAL
QUANTITY        1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE             CITY, COUNTRY, OTHER, STATE_PROVINCE
DISEASE
CARDINAL
AGE
TIME
PLANT
PERCENT
LAW
DATE            AGE, DATE, DURATION, OTHER
AQUAINT
Frequency of Q Types
[Bar chart: number of questions in TREC 8, 9, and 10 (scale 0 to 250) by question type: Person, Quantity, Money, Percent, Organization, Organization-Desc, Product-Name, Product-Desc, Facility, Disease, Reason, GPE, GPE-Desc, Work-of-Art, Date, Event, Time, Language, Nationality, Location-Name, Definition, Use, Other, Cardinal, Ordinal, Game, ContactInfo, Animal, Plant, Bio, Cause-Effect-Influence, Law.]
AQUAINT
Interpretation
AQUAINT
IdentiFinderTM Status
• Current IdentiFinder performance on types
• IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese
[Bar chart: IdentiFinder recall, precision, and F-measure on a 0 to 100 scale, with bars for Category and Subcategory; values shown are 88 / 89 / 88.4 and 87 / 88 / 87.3.]
AQUAINT
Proposition Indexing
• A shallow semantic representation
– Deeper than bags of words
– But broad enough to cover all the text
• Characterizes documents by
– The entities they contain
– Propositions involving those entities
• Resolves all references to entities
– Whether named, described, or pronominal
• Represents all propositions that are directly stated in the text
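One way to picture such an index, as a sketch only: each document reduces to a set of resolved entities plus predicate-argument propositions over them. The class and field names below are illustrative, not the system’s actual schema.

from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str       # e.g. "e1"
    mention: str   # canonical string after reference resolution

@dataclass
class Proposition:
    predicate: str  # e.g. "sold"
    args: dict      # role -> entity id, e.g. {"subj": "e1", "obj": "e3"}

@dataclass
class DocumentIndex:
    entities: dict = field(default_factory=dict)      # eid -> Entity
    propositions: list = field(default_factory=list)  # Proposition objects

doc = DocumentIndex(
    entities={"e1": Entity("e1", "Dell"), "e3": Entity("e3", "the most PCs")},
    propositions=[Proposition("sold", {"subj": "e1", "obj": "e3"})],
)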
AQUAINT
Proposition Finding Example
• Question: Which company sold the most PCs in 2001?
• Text: Dell, beating Compaq, sold the most PCs in 2001.
Propositions
• (e1: “Dell”)   [Answer]
• (e2: “Compaq”)
• (e3: “the most PCs”)
• (e4: “2001”)
• (sold subj:e1, obj:e3, in:e4)
• (beating subj:e1, obj:e2)
• Passage retrieval alone would select the wrong answer (a matching sketch follows)
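A small sketch of why proposition matching gets this right: the question proposition leaves its subject slot open, and matching it against the text propositions binds that slot to “Dell” rather than “Compaq”. The tuple format and matcher below are illustrative, not the system’s internal representation.

# Text propositions from "Dell, beating Compaq, sold the most PCs in 2001."
text_props = [
    ("sold",    {"subj": "Dell", "obj": "the most PCs", "in": "2001"}),
    ("beating", {"subj": "Dell", "obj": "Compaq"}),
]

# Question proposition: the open (None) role is the answer slot.
question_prop = ("sold", {"subj": None, "obj": "the most PCs", "in": "2001"})

def bind_answer(question_prop, text_props):
    q_pred, q_args = question_prop
    for pred, args in text_props:
        if pred != q_pred:
            continue
        # Every specified question argument must match the text proposition.
        if all(v is None or args.get(role) == v for role, v in q_args.items()):
            return next(args[role] for role, v in q_args.items() if v is None)
    return None

print(bind_answer(question_prop, text_props))  # Dell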
AQUAINT
Proposition Recognition Strategy
• Start with a lexicalized, probabilistic (LPCFG) parsing model
• Distinguish names by replacing NP labels with NPP
• Currently, rules normalize the parse tree to produce propositions
• At a later date, extend the statistical model to
– Predict argument labels for clauses
– Resolve references to entities
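A toy illustration of the rule-based normalization step: walk a bracketed parse and emit (verb, subject, object) tuples for a simple S -> NP VP pattern. Real normalization covers many more constructions (passives, relative clauses, and so on); this sketch assumes a tiny hand-built parse.

# A parse node is (label, children); a leaf is (tag, word) with a string word.
parse = ("S", [
    ("NPP", [("NNP", "Dell")]),
    ("VP", [("VBD", "sold"),
            ("NP", [("DT", "the"), ("JJS", "most"), ("NNS", "PCs")])]),
])

def words(node):
    label, children = node
    if isinstance(children, str):
        return [children]
    return [w for child in children for w in words(child)]

def find_propositions(node):
    label, children = node
    props = []
    if label == "S" and not isinstance(children, str):
        subj = next((c for c in children if c[0].startswith("NP")), None)
        vp = next((c for c in children if c[0] == "VP"), None)
        if subj and vp:
            verb = next((c for c in vp[1] if c[0].startswith("VB")), None)
            obj = next((c for c in vp[1] if c[0].startswith("NP")), None)
            props.append((words(verb)[0] if verb else None,
                          " ".join(words(subj)),
                          " ".join(words(obj)) if obj else None))
    if not isinstance(children, str):
        for child in children:
            props.extend(find_propositions(child))
    return props

print(find_propositions(parse))  # [('sold', 'Dell', 'the most PCs')]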
AQUAINT
Confidence Estimation
• Compute the probability P(correct|Q,A) from the following features (sketched below):
  P(correct|Q,A) ≈ P(correct|type(Q), <m,n>, PropSat)
– type(Q): question type
– m: question length
– n: number of matched question words in the answer context
– PropSat: whether the answer satisfies the propositions in the question
• Confidence for answers found on the Web:
  P(correct|Q,A) ≈ P(correct|Freq, InTrec)
– Freq = number of Web hits, using Google
– InTrec = whether the answer was also a top answer from the AQUAINT corpus
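A minimal sketch of how a table-based estimate like this could be built: bucket held-out question/answer pairs by (type, question length, matched words, PropSat) and take the relative frequency of correct answers in each bucket as P(correct|...). The data and the absence of smoothing below are illustrative, not BBN’s actual estimation procedure.

from collections import defaultdict

def train_confidence_table(examples):
    # examples: (qtype, q_len, n_matched, prop_sat, correct) tuples
    counts = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for qtype, q_len, n_matched, prop_sat, correct in examples:
        key = (qtype, q_len, n_matched, prop_sat)
        counts[key][0] += int(correct)
        counts[key][1] += 1
    return {key: n_correct / n_total for key, (n_correct, n_total) in counts.items()}

held_out = [  # invented examples
    ("PERSON", 5, 3, True, True),
    ("PERSON", 5, 3, True, False),
    ("PERSON", 5, 3, True, True),
]
table = train_confidence_table(held_out)
print(table[("PERSON", 5, 3, True)])  # 0.666...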
AQUAINT
Dependence of Answer Correctness on Question Type
[Bar chart: P(correct|Type) by question type, on a scale of 0 to 0.5.]
AQUAINT
Dependence on Proposition Satisfaction
[Bar chart: P(correct|PropSat) for PropSat=True vs. PropSat=False, on a scale of 0 to 0.6.]
AQUAINT
Dependence on Number of Matched Words
[Line chart: p(correct) vs. number of matched question words (0 to 6), with separate curves for question length 3, 4, and 5; scale 0 to 0.5.]
AQUAINT
Dependence of Answer Correctness on Web Frequency
[Line chart: P(correct|Freq, InTrec) vs. frequency of the answer in Google summaries (0 to 150), with separate curves for InTrec=true and InTrec=false; scale 0 to 1.]
AQUAINT
Official Results of TREC 2002 QA
Run Tag     Unranked Average Precision    Ranked Average Precision    Upper-bound
BBN2002A    0.186                         0.257                       0.498
BBN2002B    0.288                         0.468                       0.646
BBN2002C    0.284                         0.499                       0.641
• BBN2002A did not use the Web
• BBN2002B & C used the Web
• Unranked average precision = percentage of questions for which the first answer is correct
• Ranked average precision = confidence-weighted score, the official metric for TREC 2002 (a sketch of the metric follows)
• Upper-bound = confidence-weighted score given perfect confidence estimation
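For reference, the confidence-weighted score used in TREC 2002 sorts the answers to all questions by decreasing confidence and averages, over ranks i, the fraction of correct answers within the top i. A small sketch:

def confidence_weighted_score(results):
    # results: one (confidence, correct) pair per question
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n_correct, total = 0, 0.0
    for i, (_, correct) in enumerate(ranked, start=1):
        n_correct += int(correct)
        total += n_correct / i
    return total / len(ranked)

print(confidence_weighted_score([(0.9, True), (0.6, False), (0.4, True)]))  # ~0.72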
AQUAINT
Recent Progress
• In the last six months, we have:
– Retrained our name tagger (IdentiFinderTM) for roughly 29 question types
– Distributed the re-trained English version of IdentiFinder to other sites
– Participated in the Question Answering track of TREC 2002
– Participated in a pilot evaluation of automatically answering definitional/biographical questions
– Developed a demonstration of our question answering system AQUA against streaming news