Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs, by Chris Welty, Ken...


Uploaded by lora-aroyo on 28-Nov-2014


DESCRIPTION

This paper was presented at ISWC 2012. It has become common to use RDF to store the results of Natural Language Processing (NLP) as a graph of the entities mentioned in the text, with the relationships mentioned in the text as links between them. These NLP graphs can be evaluated with Precision and Recall against a ground-truth graph representing what the documents actually say. When asking conjunctive queries of NLP graphs, the Recall of a query is expected to be roughly the product of the Recall of the relations in each conjunct. Since Recall is typically less than one, conjunctive query Recall on NLP graphs degrades geometrically with the number of conjuncts. We present an approach to address this Recall problem by hypothesizing links in the graph that would improve query Recall, and then attempting to find more evidence to support them. Using this approach, we confirm that, in the context of answering queries over NLP graphs, we can use lower-confidence results from NLP components if they complete a query result.

TRANSCRIPT

Page 1: Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs, by Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora

Answering Conjunctive SPARQL Queries over NLP Graphs


“30 are better than one”

Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora

Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs

Page 2


To decrease the cost of maintaining critical system DBs, can we replace the human without changing the LSW?

Can we build a machine reader for this?

Approach

[Diagram: query, NLP Stack, NLP Graphs, re-interpret]

The NLP process is not a one-shot deal: the query provides context for what the user is seeking, and thus an opportunity to re-interpret the text.

Page 3


NLP Stack

•  Contains NER, CoRef, RelEx, and entity disambiguation
•  RelEx: an SVM learner whose output scores are probabilities (confidences), for each known relation, that the sentence expresses that relation between a given pair of mentions
•  Run over the target corpus, producing the NLP graph
   –  nodes are entities (clusters of mentions produced by coref)
   –  edges are type statements between entities and classes in the ontology, or relations detected between mentions of these entities in the corpus
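A minimal sketch of how mention-level output might be collapsed into the entity-level graph described above, assuming coreference is available as a mapping from mention ids to cluster ids. The function name, mention ids, and cluster names are hypothetical, and confidence scores are dropped for brevity.

```python
def build_nlp_graph(coref, mention_types, mention_relations):
    """Collapse mention-level NLP output into entity-level triples.

    coref: dict mapping each mention id to its entity (coref cluster) id
    mention_types: iterable of (mention id, ontology class) pairs
    mention_relations: iterable of (mention id, relation, mention id) triples
    """
    graph = set()
    # Type statements: entity rdf:type class
    for mention, cls in mention_types:
        graph.add((coref[mention], "rdf:type", cls))
    # Relations detected between mentions become edges between their entities
    for m1, relation, m2 in mention_relations:
        graph.add((coref[m1], relation, coref[m2]))
    return graph


# Hypothetical mentions: "Mr. X of India" (m1, m2) and "places like India, Iraq" (m3, m4)
coref = {"m1": "MrX", "m2": "India", "m3": "India", "m4": "Iraq"}
types = [("m1", "Person"), ("m2", "Country"), ("m3", "GPE"), ("m4", "Country")]
relations = [("m1", "citizenOf", "m2")]

for triple in sorted(build_nlp_graph(coref, types, relations)):
    print(triple)
```

Because m2 and m3 fall in the same coref cluster, the single India node carries both of their type statements.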

Page 4

NLP Graph

[Figure: mention-level graph for the text snippets “… Mr. X of India …” and “… in places like India, Iraq, …”; type labels Person, Country, GPE; edges citizenOf, subPlace, coref]

Page 5

NLP Graph

[Figure: the mentions Mr. X, India, India, and Iraq; type labels Person, Country, GPE; edges citizenOf, subPlace, coref]

Page 6

NLP Graph

[Figure: entity-level graph with entities Mr. X, India, and Iraq; classes Person, GPE, Country; edges citizenOf, subPlaceOf, rdf:type, rdf:subClassOf]

Page 7


Relation Extraction by RelEx

•  RelEx: a set of SVM binary classifiers, one per relation
   –  for each sentence in the corpus,
   –  for each pair of mentions in that sentence,
   –  for each known relation,
   –  produce a probability that the pair is related by that relation
•  NLP graphs are generated by selecting relations from RelEx output in two ways:
   –  Primary: takes only the top-scoring relation for each mention pair, if it is above a confidence threshold
   –  Secondary: takes all relations between all mention pairs above a threshold
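The two selection strategies can be sketched as follows, assuming RelEx output is held in a dict from mention pairs to per-relation probabilities (a hypothetical layout) and that "above a threshold" means a score at or above it, so that a secondary graph at threshold 0 keeps every scored relation. Function names and scores are illustrative.

```python
def primary_graph(scores, threshold):
    """Keep only the top-scoring relation for each mention pair, if it clears the threshold."""
    triples = set()
    for (m1, m2), rel_scores in scores.items():
        if not rel_scores:
            continue
        relation, score = max(rel_scores.items(), key=lambda item: item[1])
        if score >= threshold:
            triples.add((m1, relation, m2))
    return triples


def secondary_graph(scores, threshold=0.0):
    """Keep every relation for every mention pair whose score clears the threshold."""
    return {
        (m1, relation, m2)
        for (m1, m2), rel_scores in scores.items()
        for relation, score in rel_scores.items()
        if score >= threshold
    }


# Hypothetical RelEx output: one probability per (mention pair, relation)
scores = {
    ("m1", "m2"): {"citizenOf": 0.7, "subPlaceOf": 0.2, "causes": 0.05},
    ("m2", "m3"): {"subPlaceOf": 0.15, "locatedIn": 0.12},
}
print(primary_graph(scores, threshold=0.2))
print(len(secondary_graph(scores, threshold=0.0)))
```

The secondary graph trades precision for recall: every scored relation survives at threshold 0, which is what makes it useful as a pool of candidate evidence.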

Page 8

RelEx Secondary Graph

[Figure: the Mr. X / India / Iraq graph with classes Person, GPE, Country (rdf:type, rdf:subClassOf) and relation edges citizenOf, subPlaceOf, locatedIn, causes]

Page 9

Primary vs. Secondary

Configuration     Precision   Recall   F
Primary @ 0.1     0.19        0.39     0.26
Primary @ 0.2     0.29        0.33     0.30
Secondary @ 0     0.01        0.95     0.02

Recall of max-F configuration

Page 10

[Figure annotations: R = .057; R = .65, R = .09, R = .97]

Conjunctive Queries

Find all terrorist organizations that were agents of bombings in Lebanon on October 23, 1983:

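PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# the mric: prefix must also be declared; its namespace IRI is not given on the slide
SELECT ?t WHERE {
    ?t rdf:type mric:TerroristOrganization .
    ?b rdf:type mric:Bombing .
    ?b mric:mediatingAgent ?t .
    ?b mric:eventLocation mric:Lebanon .
    ?b mric:eventDate "1983-10-23" .
}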

Page 11

Problem with Conjunctive Queries

•  query Recall ≈ $\left[\prod_{k=1}^{n} \mathrm{Recall}(R_k)\right] \times \mathrm{Recall}_{\mathrm{coref}}$
•  Recall for an n-term query is $O(\mathrm{Recall}^n)$
•  for complex queries, Recall becomes the dominating factor
•  in our experiments: query Recall < .1 for n > 3
•  To get any particular correct answer, all NLP components had to get it right
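As a rough numeric check of the product formula, reading the values .65, .09 and .97 annotated on the query slide as per-term recalls and assuming the coref factor is 1, their product comes to about .057, matching the R = .057 annotation. The second part uses a hypothetical uniform per-relation Recall of 0.5 to show the geometric decay; the function name is illustrative.

```python
from math import prod


def query_recall(relation_recalls, coref_recall=1.0):
    """Approximate conjunctive-query Recall: the product of the per-conjunct
    relation Recalls, times coreference Recall."""
    return prod(relation_recalls) * coref_recall


# Per-term values annotated on the query slide (coref Recall assumed to be 1)
print(round(query_recall([0.65, 0.09, 0.97]), 3))  # 0.057

# Hypothetical uniform per-relation Recall of 0.5: each added conjunct halves query Recall
for n in range(1, 6):
    print(n, query_recall([0.5] * n))
```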

Page 12

Hypothesis Generation

•  For queries of size N
   –  for each term H, relax the query by removing H
   –  for each solution of the relaxed query, bind the variables in H from the solution, forming a hypothesis
   –  if no solutions for size N-1 are found, then try N-2
•  appropriate for queries that are almost answerable, e.g. missing only one of the terms
•  biased towards generating more answers to queries; e.g. it performs poorly on queries for which the corpus does not contain the answer
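A minimal sketch of the relaxation loop, under stated assumptions: the graph is a Python set of string triples, variables start with "?", the relaxed query is evaluated by a naive backtracking join, and combinations of removed terms are enumerated with `itertools.combinations`. A hypothesis here is the set of triples that would have to be added to the graph to complete an answer. All function names are illustrative, and the toy graph is hypothetical, mirroring the bombing example on the following slides.

```python
from itertools import combinations


def is_var(term):
    return term.startswith("?")


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return extended


def match(patterns, graph, binding=None):
    """Yield every binding satisfying all triple patterns (conjunctive query)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(patterns[1:], graph, extended)


def substitute(pattern, binding):
    """Replace bound variables in a triple pattern by their values."""
    return tuple(binding.get(t, t) for t in pattern)


def generate_hypotheses(query, graph, max_removed=2):
    """Remove k terms (k = 1, then 2, ...); each solution of a relaxed query
    binds the removed terms into a hypothesis. Stops at the first k for which
    some relaxed query has a solution."""
    for k in range(1, max_removed + 1):
        found, hypotheses = False, set()
        for removed in combinations(range(len(query)), k):
            relaxed = [term for i, term in enumerate(query) if i not in removed]
            for binding in match(relaxed, graph):
                found = True
                # Triples not already in the graph; variables occurring only
                # in removed terms stay unbound.
                missing = frozenset(substitute(query[i], binding) for i in removed) - graph
                if missing:
                    hypotheses.add(missing)
        if found:
            return k, hypotheses
    return None, set()


graph = {
    ("mric:org-16", "rdf:type", "mric:TerroristOrganization"),
    ("mric:event-3", "rdf:type", "mric:Bombing"),
    ("mric:event-3", "mric:mediatingAgent", "mric:org-16"),
    ("mric:event-3", "mric:eventLocation", "mric:Lebanon"),
}
query = [
    ("?t", "rdf:type", "mric:TerroristOrganization"),
    ("?b", "rdf:type", "mric:Bombing"),
    ("?b", "mric:mediatingAgent", "?t"),
    ("?b", "mric:eventLocation", "mric:Lebanon"),
    ("?b", "mric:eventDate", "1983-10-23"),
]
k, hyps = generate_hypotheses(query, graph)
print(k, hyps)  # one hypothesis: event-3 eventDate 1983-10-23
```

Removing the mediatingAgent triple from the toy graph makes every 4-term relaxation fail, so the loop falls back to removing two terms and hypothesizes both missing triples together.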

Page 13


[Figure: the query drawn as a graph: ?t is rdf:type mric:TerroristOrganization; ?b is rdf:type mric:Bombing and is linked to ?t by mric:mediatingAgent, to mric:Lebanon by mric:eventLocation, and to 1983-10-23 by mric:eventDate]

Relaxed query: find all bombings by terrorist orgs in Lebanon (and hypothesize that those bombings were on 1983-10-23).

Page 14

Relaxed query: find all bombings by terrorist orgs in Lebanon.

[Figure: a subgraph linking mric:event-3 and mric:org-16, plus a hypothesized mric:eventDate edge from mric:event-3 to 1983-10-23]

This subgraph matches the relaxed query, so we hypothesize that event-3 was on 1983-10-23.

Page 15

Hypothesis Validation

•  Once generated, a hypothesis must be validated
   –  gather evidence that it is true
   –  the probability of the triple being true increases
•  We utilize a stack of hypothesis checkers that provide
   –  a confidence that a hypothesis holds
   –  provenance: a pointer to a span of text that supports it
•  Hypotheses can be used to bound complex computational tasks
   –  e.g. formal reasoning, or choosing between low-confidence extractions
   –  such tasks are made more tractable by using hypotheses as goals; e.g. a reasoner may be used effectively by constraining it to only the part of the graph connected to a hypothesis
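One way to picture the checker stack, assuming each checker is a function that returns either nothing or a (confidence, provenance) pair. The `Evidence` type, the provenance string format, the lookup-table checkers, and the policy of keeping the most confident result are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Evidence:
    confidence: float  # how strongly the checker believes the hypothesis holds
    provenance: str    # pointer to a supporting text span (hypothetical format)


Checker = Callable[[Triple], Optional[Evidence]]


def validate(hypothesis: Triple, checkers: Iterable[Checker]) -> Optional[Evidence]:
    """Run every checker in the stack and keep the most confident evidence
    (taking the maximum is an assumed combination policy)."""
    best = None
    for check in checkers:
        evidence = check(hypothesis)
        if evidence is not None and (best is None or evidence.confidence > best.confidence):
            best = evidence
    return best


# Two hypothetical checkers backed by lookup tables of extractions
date_extractions = {("event-3", "eventDate", "1983-10-23"): Evidence(0.3, "doc12:88-131")}
pattern_matches = {("event-3", "eventDate", "1983-10-23"): Evidence(0.6, "doc40:5-47")}

checkers = [date_extractions.get, pattern_matches.get]
print(validate(("event-3", "eventDate", "1983-10-23"), checkers))
```

Returning provenance alongside the confidence lets an accepted hypothesis be traced back to the text that supports it.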

Page 16

Secondary Graph for Validation

•  Hypotheses can be validated by looking for the tuple in the secondary graph (SG)
   –  a tuple will appear in the SG if the subject and object entities occur in the same sentence somewhere in the corpus
•  With SG precision at .02, it is important to find a productive threshold for accepting hypotheses
   –  we conducted several experiments to find this threshold
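A sketch of secondary-graph validation, assuming each SG extraction is available as an entity-level (subject, relation, object, score) tuple and that a triple takes the best score over its mention pairs (an assumption). A hypothesis is accepted only if all of its triples clear the chosen threshold; function names, data, and the .01 threshold in the example are illustrative.

```python
def sg_scores(extractions):
    """Best RelEx score per entity-level triple in the secondary graph.

    extractions: iterable of (subject, relation, object, score), one entry per
    mention pair in which the relation was scored (hypothetical layout).
    """
    best = {}
    for s, r, o, score in extractions:
        best[(s, r, o)] = max(score, best.get((s, r, o), score))
    return best


def accept(hypotheses, scores, threshold):
    """Accept a hypothesis only if every one of its triples is in the SG
    with a score at or above the threshold."""
    return [h for h in hypotheses if all(t in scores and scores[t] >= threshold for t in h)]


extractions = [
    ("event-3", "eventDate", "1983-10-23", 0.004),
    ("event-3", "eventDate", "1983-10-23", 0.03),
    ("event-7", "eventDate", "1983-10-23", 0.002),
]
hyps = [
    frozenset({("event-3", "eventDate", "1983-10-23")}),
    frozenset({("event-7", "eventDate", "1983-10-23")}),
    frozenset({("event-9", "eventDate", "1983-10-23")}),  # never extracted
]
print(accept(hyps, sg_scores(extractions), threshold=0.01))
```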

Page 17

Experiments

•  3 for dev, 3 for test

•  each experiment compares query results from the primary graph (PG) alone with query results using PG+SG for hypothesis validation

•  the three experiments compare performance at different primary graph thresholds

Page 18

0-threshold primary graph with & without secondary graph

[Figure: for a given PG threshold, the SG threshold for validated hypotheses is varied along the x-axis; secondary graph: all@0]
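The sweep can be sketched for a single PG threshold: every hypothesized answer carries the SG score of the triple that completes it, each candidate SG threshold adds the answers whose score clears it, and each setting is scored by balanced F-measure against the ground truth. The answer ids, scores, thresholds, and function names are hypothetical.

```python
def f_measure(answers, gold):
    """Balanced F-measure of a set of query answers against the ground truth."""
    tp = len(answers & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(answers), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def answers_at(pg_answers, hyp_answers, sg_threshold):
    """PG answers plus answers completed by hypotheses validated at sg_threshold.
    hyp_answers maps each hypothesized answer to the SG score of its completing triple."""
    return pg_answers | {a for a, score in hyp_answers.items() if score >= sg_threshold}


def sweep(pg_answers, hyp_answers, gold, thresholds):
    """F-measure at each SG threshold, plus the best-scoring threshold."""
    results = {t: f_measure(answers_at(pg_answers, hyp_answers, t), gold) for t in thresholds}
    return max(results, key=results.get), results


# Hypothetical answers for one PG threshold
gold = {"a", "b", "c", "d"}
pg_answers = {"a", "x"}
hyp_answers = {"b": 0.8, "c": 0.1, "y": 0.02, "z": 0.0}

baseline = f_measure(pg_answers, gold)
best, results = sweep(pg_answers, hyp_answers, gold, [0.9, 0.3, 0.05, 0.0])
print(f"PG only: F = {baseline:.2f}")
for t, f in results.items():
    print(f"SG threshold {t}: F = {f:.2f}")
print("best SG threshold:", best)
```

A threshold above every hypothesis score reproduces the PG-only baseline; lowering it first adds correct answers and then, past the best point, mostly wrong ones.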

Page 19

.1-threshold primary graph with & without secondary graph

[Figure: the red line marks the PG threshold, below which the PG-only curve flattens, as expected; best performance point at the .01 SG threshold; secondary graph: all@0]

Page 20

.2-threshold primary graph with & without secondary graph

[Figure: best performance point at the .01 SG threshold; secondary graph: all@0]

The best-performing configuration on dev is the .2-threshold PG with SG hypotheses validated at the .01 threshold.

If a triple in the SG completes a query that is mostly answered by the PG, it is very likely to be true.

Page 21

Performance

At the chosen threshold, PG+SG significantly outperforms the PG-only baseline on the test set.

Page 22

Conclusions

•  the secondary graph can be exploited for getting answers
•  the probability that a relation is true between two entities increases significantly when that relation completes a query answer that is partially satisfied in the primary graph
•  we are able to target discarded interpretations when they will meet some user need
•  the NLP process is not a one-shot deal: the query provides context for what the user is seeking, and thus an opportunity to re-interpret the text