AQUAINT
BBN’s AQUA Project
Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu
3 December 2002
AQUAINT
BBN’s Approach to QA
• Theme: Use document retrieval, entity recognition, & proposition recognition
• Analyze the question
– Reduce question to propositions and a bag of words
– Predict the type of the answer
• Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
• Other knowledge sources (e.g., the Web) are optionally used to re-rank answers
• Re-rank candidates based on propositions
• Estimate confidence for answers (a pipeline sketch follows this list)
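The bullets above describe a pipeline: analyze the question, retrieve passages, optionally consult the Web, re-rank, and estimate confidence. Below is a minimal, self-contained sketch of that flow; the word-overlap scoring and the placeholder confidence measure are illustrative stand-ins, not BBN’s actual retrieval, re-ranking, or confidence components.

# Minimal sketch of the QA pipeline described on this slide (illustrative only).
def bag_of_words(text):
    return set(text.lower().replace("?", "").replace(",", "").split())

def rank_passages(question, passages):
    # Stand-in for passage retrieval: score passages by question-word overlap.
    q_words = bag_of_words(question)
    scored = [(len(q_words & bag_of_words(p)), p) for p in passages]
    return [p for _, p in sorted(scored, reverse=True)]

def estimate_confidence(question, passage):
    # Placeholder confidence: fraction of question words matched in the passage.
    q_words = bag_of_words(question)
    return len(q_words & bag_of_words(passage)) / max(len(q_words), 1)

if __name__ == "__main__":
    corpus = [
        "The Taj Mahal is located in Agra, India.",
        "Dell, beating Compaq, sold the most PCs in 2001.",
    ]
    question = "Where is the Taj Mahal?"
    best = rank_passages(question, corpus)[0]
    print(best, estimate_confidence(question, best))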
AQUAINT
System Diagram
[System diagram: input is a Question and output is an Answer & Confidence Score; boxes shown include Question Classification, Name Extraction, Parsing, NP Labeling, Description Classification, Proposition Finding, Document Retrieval, Passage Retrieval, Web Search, Confidence Estimation, Regularization, Treebank, Name Annotation, and Proposition Bank.]
AQUAINT
Question Classification
AQUAINT
Question Classification
• A hybrid approach based on rules, statistical parsing & question templates (a classification sketch follows this list)
– Match question templates against statistical parses
– Back off to statistical bag-of-words classification
• Example features used for classification
– The type of WHNP starting the question (e.g., “Who”, “What”, “When” …)
– The headword of the core NP
– WordNet definition
– Bag of words
– Main verb of the question
• Performance
– TREC 8 & 9 questions for training
– ~85% when testing on TREC 10
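The hybrid idea above can be sketched as a first pass over question templates with a bag-of-words backoff. The regular-expression templates, word lists, and type labels below are illustrative placeholders, not BBN’s actual rules or trained classifier.

import re

# Illustrative templates: map a leading WH-phrase to an answer type.
TEMPLATES = [
    (re.compile(r"^where\b", re.I), "LOCATION_OR_GPE"),
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^when\b", re.I), "DATE_OR_TIME"),
]

# Illustrative backoff evidence for a bag-of-words decision.
TYPE_WORDS = {
    "PERSON": {"pianist", "president", "author"},
    "MONEY": {"cost", "price", "salary"},
}

def classify(question):
    # First pass: question templates (here, simple regexes).
    for pattern, qtype in TEMPLATES:
        if pattern.search(question):
            return qtype
    # Back off: pick the type whose word list overlaps the question most.
    words = set(question.lower().rstrip("?").split())
    scores = {t: len(words & ws) for t, ws in TYPE_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "OTHER"

print(classify("Where is the Taj Mahal?"))                         # LOCATION_OR_GPE
print(classify("Which pianist won the Tchaikovsky Competition?"))  # PERSON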
AQUAINT
Examples of Question Analysis
• Where is the Taj Mahal?
– WHNP=where
– Answer type: Location or GPE
• Which pianist won the last International Tchaikovsky Competition?
– Headword of core NP=pianist
– WordNet definition=person
– Answer type: Person
AQUAINT
Question-Answer Types
Type            Subtype
ORGANIZATION    CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION        CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC             AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GAME
PRODUCT         DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY     NATIONALITY, OTHER, POLITICAL, RELIGION
LANGUAGE
FAC_DESC        AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
MONEY
GPE_DESC        CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC        CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO    ADDRESS, OTHER, PHONE
WORK_OF_ART     BOOK, OTHER, PAINTING, PLAY, SONG
*Thanks to USC/ISI and IBM groups for sharing the conclusions of their analyses.
AQUAINT
Question-Answer Types (cont’d)
PRODUCT_DESC    OTHER, VEHICLE, WEAPON
PERSON
EVENT           HURRICANE, OTHER, WAR
SUBSTANCE       CHEMICAL, DRUG, FOOD, OTHER
PER_DESC
PRODUCT         OTHER
ORDINAL
ANIMAL
QUANTITY        1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE             CITY, COUNTRY, OTHER, STATE_PROVINCE
DISEASE
CARDINAL
AGE
TIME
PLANT
PERCENT
LAW
DATE            AGE, DATE, DURATION, OTHER
AQUAINT
Frequency of Q Types
[Bar chart: number of questions in TREC 8, 9, and 10 (scale 0 to 250) by question type: Person, Quantity, Money, Percent, Organization, Organization-Desc, Product-Name, Product-Desc, Facility, Disease, Reason, GPE, GPE-Desc, Work-of-Art, Date, Event, Time, Language, Nationality, Location-Name, Definition, Use, Other, Cardinal, Ordinal, Game, ContactInfo, Animal, Plant, Bio, Cause-Effect-Influence, Law.]
AQUAINT
Interpretation
AQUAINT
IdentiFinderTM Status
• Current IdentiFinder performance on types
• IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese
[Bar chart: IdentiFinder recall, precision, and F-measure on a 0 to 100 scale, with bars for Category and Subcategory; values shown are 88 / 89 / 88.4 and 87 / 88 / 87.3.]
AQUAINT
Proposition Indexing
• A shallow semantic representation
– Deeper than bags of words
– But broad enough to cover all the text
• Characterizes documents by
– The entities they contain
– Propositions involving those entities
• Resolves all references to entities
– Whether named, described, or pronominal
• Represents all propositions that are directly stated in the text
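One way to picture such an index, as a sketch only: each document reduces to a set of resolved entities plus predicate-argument propositions over them. The class and field names below are illustrative, not the system’s actual schema.

from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str       # e.g. "e1"
    mention: str   # canonical string after reference resolution

@dataclass
class Proposition:
    predicate: str  # e.g. "sold"
    args: dict      # role -> entity id, e.g. {"subj": "e1", "obj": "e3"}

@dataclass
class DocumentIndex:
    entities: dict = field(default_factory=dict)      # eid -> Entity
    propositions: list = field(default_factory=list)  # Proposition objects

doc = DocumentIndex(
    entities={"e1": Entity("e1", "Dell"), "e3": Entity("e3", "the most PCs")},
    propositions=[Proposition("sold", {"subj": "e1", "obj": "e3"})],
)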
AQUAINT
Proposition Finding Example
• Question: Which company sold the most PCs in 2001?
• Text: Dell, beating Compaq, sold the most PCs in 2001.
Propositions
• (e1: “Dell”)   [Answer]
• (e2: “Compaq”)
• (e3: “the most PCs”)
• (e4: “2001”)
• (sold subj:e1, obj:e3, in:e4)
• (beating subj:e1, obj:e2)
• Passage retrieval alone would select the wrong answer (a matching sketch follows)
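A small sketch of why proposition matching gets this right: the question proposition leaves its subject slot open, and matching it against the text propositions binds that slot to “Dell” rather than “Compaq”. The tuple format and matcher below are illustrative, not the system’s internal representation.

# Text propositions from "Dell, beating Compaq, sold the most PCs in 2001."
text_props = [
    ("sold",    {"subj": "Dell", "obj": "the most PCs", "in": "2001"}),
    ("beating", {"subj": "Dell", "obj": "Compaq"}),
]

# Question proposition: the open (None) role is the answer slot.
question_prop = ("sold", {"subj": None, "obj": "the most PCs", "in": "2001"})

def bind_answer(question_prop, text_props):
    q_pred, q_args = question_prop
    for pred, args in text_props:
        if pred != q_pred:
            continue
        # Every specified question argument must match the text proposition.
        if all(v is None or args.get(role) == v for role, v in q_args.items()):
            return next(args[role] for role, v in q_args.items() if v is None)
    return None

print(bind_answer(question_prop, text_props))  # Dell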
AQUAINT
Proposition Recognition Strategy
• Start with a lexicalized, probabilistic (LPCFG) parsing model
• Distinguish names by replacing NP labels with NPP
• Currently, rules normalize the parse tree to produce propositions
• At a later date, extend the statistical model to
– Predict argument labels for clauses
– Resolve references to entities
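A toy illustration of the rule-based normalization step: walk a bracketed parse and emit (verb, subject, object) tuples for a simple S -> NP VP pattern. Real normalization covers many more constructions (passives, relative clauses, and so on); this sketch assumes a tiny hand-built parse.

# A parse node is (label, children); a leaf is (tag, word) with a string word.
parse = ("S", [
    ("NPP", [("NNP", "Dell")]),
    ("VP", [("VBD", "sold"),
            ("NP", [("DT", "the"), ("JJS", "most"), ("NNS", "PCs")])]),
])

def words(node):
    label, children = node
    if isinstance(children, str):
        return [children]
    return [w for child in children for w in words(child)]

def find_propositions(node):
    label, children = node
    props = []
    if label == "S" and not isinstance(children, str):
        subj = next((c for c in children if c[0].startswith("NP")), None)
        vp = next((c for c in children if c[0] == "VP"), None)
        if subj and vp:
            verb = next((c for c in vp[1] if c[0].startswith("VB")), None)
            obj = next((c for c in vp[1] if c[0].startswith("NP")), None)
            props.append((words(verb)[0] if verb else None,
                          " ".join(words(subj)),
                          " ".join(words(obj)) if obj else None))
    if not isinstance(children, str):
        for child in children:
            props.extend(find_propositions(child))
    return props

print(find_propositions(parse))  # [('sold', 'Dell', 'the most PCs')]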
AQUAINT
Confidence Estimation
• Compute the probability P(correct|Q,A) from the following features (sketched below):
  P(correct|Q,A) ≈ P(correct|type(Q), <m,n>, PropSat)
– type(Q): question type
– m: question length
– n: number of matched question words in the answer context
– PropSat: whether the answer satisfies the propositions in the question
• Confidence for answers found on the Web:
  P(correct|Q,A) ≈ P(correct|Freq, InTrec)
– Freq = number of Web hits, using Google
– InTrec = whether the answer was also a top answer from the AQUAINT corpus
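A minimal sketch of how a table-based estimate like this could be built: bucket held-out question/answer pairs by (type, question length, matched words, PropSat) and take the relative frequency of correct answers in each bucket as P(correct|...). The data and the absence of smoothing below are illustrative, not BBN’s actual estimation procedure.

from collections import defaultdict

def train_confidence_table(examples):
    # examples: (qtype, q_len, n_matched, prop_sat, correct) tuples
    counts = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for qtype, q_len, n_matched, prop_sat, correct in examples:
        key = (qtype, q_len, n_matched, prop_sat)
        counts[key][0] += int(correct)
        counts[key][1] += 1
    return {key: n_correct / n_total for key, (n_correct, n_total) in counts.items()}

held_out = [  # invented examples
    ("PERSON", 5, 3, True, True),
    ("PERSON", 5, 3, True, False),
    ("PERSON", 5, 3, True, True),
]
table = train_confidence_table(held_out)
print(table[("PERSON", 5, 3, True)])  # 0.666...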
AQUAINT
Dependence of Answer Correctness on Question Type
[Bar chart: P(correct|Type) by question type, on a scale of 0 to 0.5.]
AQUAINT
Dependence on Proposition Satisfaction
[Bar chart: P(correct|PropSat) for PropSat=True vs. PropSat=False, on a scale of 0 to 0.6.]
AQUAINT
Dependence on Number of Matched Words
[Line chart: p(correct) vs. number of matched question words (0 to 6), with separate curves for question length 3, 4, and 5; scale 0 to 0.5.]
AQUAINT
Dependence of Answer Correctness on Web Frequency
[Line chart: P(correct|Freq, InTrec) vs. frequency of the answer in Google summaries (0 to 150), with separate curves for InTrec=true and InTrec=false; scale 0 to 1.]
AQUAINT
Official Results of TREC 2002 QA
Run Tag     Unranked Average Precision    Ranked Average Precision    Upper-bound
BBN2002A    0.186                         0.257                       0.498
BBN2002B    0.288                         0.468                       0.646
BBN2002C    0.284                         0.499                       0.641
• BBN2002A did not use the Web
• BBN2002B & C used the Web
• Unranked average precision = percentage of questions for which the first answer is correct
• Ranked average precision = confidence-weighted score, the official metric for TREC 2002 (a sketch of the metric follows)
• Upper-bound = confidence-weighted score given perfect confidence estimation
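For reference, the confidence-weighted score used in TREC 2002 sorts the answers to all questions by decreasing confidence and averages, over ranks i, the fraction of correct answers within the top i. A small sketch:

def confidence_weighted_score(results):
    # results: one (confidence, correct) pair per question
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n_correct, total = 0, 0.0
    for i, (_, correct) in enumerate(ranked, start=1):
        n_correct += int(correct)
        total += n_correct / i
    return total / len(ranked)

print(confidence_weighted_score([(0.9, True), (0.6, False), (0.4, True)]))  # ~0.72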
AQUAINT
Recent Progress
• In the last six months, we have:
– Retrained our name tagger (IdentiFinderTM) for roughly 29 question types
– Distributed the re-trained English version of IdentiFinder to other sites
– Participated in the Question Answering track of TREC 2002
– Participated in a pilot evaluation of automatically answering definitional/biographical questions
– Developed a demonstration of our question answering system AQUA against streaming news