Page 1: AQUAINT 18-Month Workshop 1 Light Semantic Processing for QA Language Technologies Institute, Carnegie Mellon B. Van Durme, Y. Huang, A. Kupsc and E. Nyberg

Light Semantic Processing for QA

AQUAINT 18-Month Workshop

Language Technologies Institute, Carnegie Mellon

B. Van Durme, Y. Huang, A. Kupsc and E. Nyberg

Towards Light Semantic Processing for Question Answering

Overview of This Talk

• Motivation

• Components of the Approach
– Logical Form
– Similarity Measure
– Unification Strategy

• Incorporation into JAVELIN

• Future Work / Next Steps

Example of Extraction Error

• Question: “When was Wendy’s founded?”

• Passage candidate:
– “The renowned Murano glassmaking industry, on an island in the Venetian lagoon, has gone through several reincarnations since it was founded in 1291. Three exhibitions of 20th-century Murano glass are coming up in New York. By Wendy Moonan.”

• Statistical extractor: 20th-century

Basic Idea

Q: “xxx xxxx xxxx xxxx xxxxxxxxxx xx xxxxx?” → extract → A(?,C)
P: “xxx xxxx xxxx xxxx xxxxx xx xxxxx.” → extract → A(B,C)

? = B

Unification on simple predicates representing basic argument structure will provide a more accurate way to match questions with appropriate answer(s)

Two Challenges:
* Where do predicates come from?
* Flexibility in interpretation…

partial interpretation
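The matching step in the diagram, A(?,C) against A(B,C) giving ? = B, can be sketched as exact unification over flat predicates. This is only an illustration: the tuple encoding, the `unify` function and the convention that a term starting with `?` is a variable are assumptions made here, and the exact matching shown is the strict baseline that the rest of the talk relaxes.

```python
from typing import Dict, Optional, Tuple

# Assumed encoding: a predicate is (name, (arg1, arg2, ...)).
Predicate = Tuple[str, Tuple[str, ...]]


def is_variable(term: str) -> bool:
    """Assumed convention: terms beginning with '?' are variables."""
    return term.startswith("?")


def unify(question: Predicate, passage: Predicate) -> Optional[Dict[str, str]]:
    """Return variable bindings if the two predicates match exactly, else None."""
    q_name, q_args = question
    p_name, p_args = passage
    if q_name != p_name or len(q_args) != len(p_args):
        return None
    bindings: Dict[str, str] = {}
    for q_term, p_term in zip(q_args, p_args):
        if is_variable(q_term):
            # A variable may be bound once; a conflicting rebinding fails.
            if bindings.setdefault(q_term, p_term) != p_term:
                return None
        elif q_term != p_term:
            return None
    return bindings


print(unify(("A", ("?", "C")), ("A", ("B", "C"))))  # {'?': 'B'}
```

Exact matching fails as soon as the question and passage use different words for the same thing, which is the second challenge listed on the slide.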

Associating Tokens with Concepts

• Imprecise Reference, e.g.:
“John W. was greeted by William Clinton”
“Bill greeted Mr. Wright”

• Definite Description, e.g. “Mr. Bush” vs. “the president”

• Anaphoric Reference

UNIFY( {GREET(“William Clinton”,“John W.”)} , {GREET(“Bill”,“Mr. Wright”)} )

Interpretation of tokens must be:
• Approximate, not exact
• Context-sensitive

Language Processing Tools

• BBN IdentiFinder (BBN, 2000)
• Link Grammar parser (Grinberg et al., 1995)
• KANTOO parser (Nyberg & Mitamura, 2000)
• Brill part-of-speech tagger (Brill, 1995)
• WordNet (Fellbaum, 1998)
• Lexical Conceptual Structure (LCS) Database (Dorr, 2001)

Representation

• Formula: a set of literals
• Literal: a predicate, plus two terms
• Extrinsic literal: a relation mapping a label to a label
– SUBJECT(x1,x2)
• Intrinsic literal: a relation mapping a label to a value
– ROOT(x1,|Benjamin|)
• Value: EVENT, past, +, |Mary Smith|,…
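One possible in-memory form of these definitions is sketched below. The class and field names (`Extrinsic`, `Intrinsic`, `head`, `dependent`) are hypothetical and chosen only for illustration; the slides do not describe the actual data structures used.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Extrinsic:
    """A relation mapping a label to a label, e.g. SUBJECT(x1,x2)."""
    predicate: str
    head: str        # first label (field name is an assumption)
    dependent: str   # second label (field name is an assumption)


@dataclass(frozen=True)
class Intrinsic:
    """A relation mapping a label to a value, e.g. ROOT(x1,|Benjamin|)."""
    predicate: str
    label: str
    value: str


Literal = Union[Extrinsic, Intrinsic]
Formula = FrozenSet[Literal]   # a formula is a set of literals

formula: Formula = frozenset({
    Extrinsic("SUBJECT", "x1", "x2"),
    Intrinsic("ROOT", "x1", "Benjamin"),
})
print(len(formula))  # 2
```

Frozen dataclasses are hashable, so literals can be stored directly in a set, which mirrors the definition of a formula as a set of literals.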

Example

Q = Who killed Jefferson?
ROOT(x1,?a0), ROOT(x2,|kill|), ROOT(x3,|Jefferson|),
TYPE(x2,|event|), TYPE(x1,|person|), TYPE(x3,|person|),
SUBJECT(x2,x1), OBJECT(x2,x3), ANS(?a0)

P = Benjamin murdered Jefferson.
ROOT(y1,|Benjamin|), ROOT(y2,|murder|), ROOT(y3,|Jefferson|),
TYPE(y2,|event|), TYPE(y1,|person|), TYPE(y3,|person|),
SUBJECT(y2,y1), OBJECT(y2,y3)
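For experimenting with formulae written in this notation, a small reader can turn the text into tuples. This is a convenience sketch only: the regular expression and the `parse_formula` helper are assumptions made here, not part of the system described in the talk. Terms are kept verbatim, so values remain recognisable by their surrounding `|` bars and variables by their leading `?`.

```python
import re
from typing import List, Tuple

# PREDICATE(term[,term]) with no nested parentheses (assumed to be sufficient
# for the flat literals shown on the slide).
LITERAL_PATTERN = re.compile(r"(\w+)\(([^()]*)\)")


def parse_formula(text: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Parse 'ROOT(x1,?a0),TYPE(x1,|person|)' into (predicate, terms) tuples."""
    literals = []
    for predicate, arguments in LITERAL_PATTERN.findall(text):
        terms = tuple(term.strip() for term in arguments.split(","))
        literals.append((predicate, terms))
    return literals


QUESTION = (
    "ROOT(x1,?a0),ROOT(x2,|kill|),ROOT(x3,|Jefferson|),"
    "TYPE(x2,|event|),TYPE(x1,|person|),TYPE(x3,|person|),"
    "SUBJECT(x2,x1),OBJECT(x2,x3),ANS(?a0)"
)

parsed = parse_formula(QUESTION)
print(parsed[0])   # ('ROOT', ('x1', '?a0'))
print(parsed[-1])  # ('ANS', ('?a0',))
```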

Graphically

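[Figure: the question and passage formulae drawn as graphs. Question nodes x1, x2, x3 and passage nodes y1, y2, y3 are linked by SUBJECT and OBJECT edges; ROOT edges attach the values ?a0, kill, Jefferson (question) and Benjamin, murder, Jefferson (passage); TYPE edges attach the values person and event.]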

Similarity Functions

• A zero-to-one function that returns a value representing similarity between the formulae for question, passage

• Unification requires similarity measurement between literal values

• sim(“Who killed Jefferson?”, “Benjamin murdered Jefferson.”) = 0.9

sim(formula0,formula1)

Given two formulae, we define the similarity to be the geometric mean of the similarity between the separate extrinsic literals.
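Written out, the definition above might look as follows. This is a reconstruction from the prose alone: the symbols $F_0$, $F_1$, the paired extrinsic literals $e_i$, $e'_i$ and their count $n$ are notation assumed here, and how literals are paired is not stated on the slide.

```latex
\[
\mathrm{sim}(F_0, F_1) \;=\; \Bigl(\,\prod_{i=1}^{n} \mathrm{sim}(e_i, e'_i)\Bigr)^{1/n}
\]
```

Because the mean is geometric, a single pair of literals with similarity 0 drives the whole formula similarity to 0.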

sim(extrinsicLiteral0,extrinsicLiteral1)

To measure the similarity between two extrinsic literals, we take the square root of the product of the similarity between each of the two pairs of labels.
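In symbols, for two extrinsic literals $e = p(a, b)$ and $e' = p(a', b')$, the sentence above reads as below. The letters $p$, $a$, $b$ are assumed notation, and restricting the comparison to literals with the same predicate $p$ is an assumption the slide does not state.

```latex
\[
\mathrm{sim}(e, e') \;=\; \sqrt{\,\mathrm{sim}(a, a') \cdot \mathrm{sim}(b, b')\,}
\]
```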

sim(label0,label1)

To measure the similarity of two labels, we take the maximum possible geometric mean of the similarities over the pairwise combinations of intrinsic literals that the two labels share.
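One reading of this definition is sketched below. It assumes that "shared" means the two labels carry intrinsic literals with the same predicate (for example both have a ROOT), and that when a predicate occurs more than once the best-scoring pair of values is used, which maximises the geometric mean. The function `sim_label`, the `(predicate, value)` tuples and the toy value similarity are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed encoding: the intrinsic literals of one label as (predicate, value).
Intrinsic = Tuple[str, str]


def sim_label(
    literals0: Iterable[Intrinsic],
    literals1: Iterable[Intrinsic],
    sim_value: Callable[[str, str], float],
) -> float:
    """Maximum geometric mean over pairings of shared intrinsic predicates."""
    by_pred0: Dict[str, List[str]] = defaultdict(list)
    by_pred1: Dict[str, List[str]] = defaultdict(list)
    for predicate, value in literals0:
        by_pred0[predicate].append(value)
    for predicate, value in literals1:
        by_pred1[predicate].append(value)

    shared = set(by_pred0) & set(by_pred1)
    if not shared:
        return 0.0  # assumption: labels with nothing in common do not match

    # For each shared predicate keep the best value pairing.
    best = [
        max(sim_value(v0, v1) for v0 in by_pred0[p] for v1 in by_pred1[p])
        for p in shared
    ]
    return math.prod(best) ** (1.0 / len(best))


def toy(a: str, b: str) -> float:
    """Toy value similarity for illustration only."""
    return 1.0 if a == b else 0.8


x2 = [("ROOT", "kill"), ("TYPE", "event")]
y2 = [("ROOT", "murder"), ("TYPE", "event")]
print(round(sim_label(x2, y2, toy), 3))  # 0.894
```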

sim(intrinsicLiteral0,intrinsicLiteral1)

The similarity between two intrinsic literals is measured by similarity of the paired words, times the weight of the first literal.
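As a formula, with $v_0$ and $v_1$ the values (words) of the two intrinsic literals $i_0$ and $i_1$, and $w(i_0)$ the weight of the first literal, this reads as below. The symbols are assumed notation, and the slide does not say how the weights are chosen.

```latex
\[
\mathrm{sim}(i_0, i_1) \;=\; w(i_0) \cdot \mathrm{sim}(v_0, v_1)
\]
```

The measure is asymmetric in general, since only the weight of the first literal enters.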

sim(word0,word1)

• sim(|kill|,|murder|) = 0.8
– via WordNet distance function

• sim(?a0,|Benjamin|) = 1.0
– zero cost for variable binding
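The two cases on this slide can be sketched as a word-level function with a pluggable lexical measure. The WordNet distance function itself is not specified in the slides, so a hypothetical lookup table stands in for it here; `make_sim_word`, `TABLE`, the rule that identical words score 1.0, and the clamping to the zero-to-one range are all assumptions.

```python
from typing import Callable

WordSim = Callable[[str, str], float]


def make_sim_word(lexical_sim: WordSim) -> WordSim:
    """Wrap a lexical measure with the variable-binding rule from the slide."""

    def sim_word(word0: str, word1: str) -> float:
        if word0.startswith("?") or word1.startswith("?"):
            return 1.0  # zero cost for variable binding
        if word0 == word1:
            return 1.0  # assumption: identical words match perfectly
        # Clamp so the result stays a zero-to-one value.
        return max(0.0, min(1.0, lexical_sim(word0, word1)))

    return sim_word


# Hypothetical stand-in for the WordNet distance function.
TABLE = {frozenset({"kill", "murder"}): 0.8}
sim_word = make_sim_word(lambda a, b: TABLE.get(frozenset({a, b}), 0.0))

print(sim_word("kill", "murder"))   # 0.8
print(sim_word("?a0", "Benjamin"))  # 1.0
```

Keeping the lexical measure as a parameter makes it straightforward to swap in another source of word similarity, such as the co-occurrence information or other ontologies listed under future work.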

Example

Answer

• Find the maximum possible similarity score, return the term bound to ?a0

• ?a0/|Benjamin|
• sim(Q,P) = 0.9
• Answer = Benjamin, 0.9
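The whole chain, from word similarity up to the returned answer, can be sketched end to end on the Jefferson example. Several simplifications are assumed: every intrinsic literal has weight 1, each label has at most one value per intrinsic predicate, each question literal is paired greedily with its best-scoring passage literal rather than by a full search over alignments, and the only non-trivial word similarity is the 0.8 for kill/murder quoted earlier. The function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Labels = Dict[str, Dict[str, str]]   # label -> {intrinsic predicate: value}
Extrinsic = Tuple[str, str, str]     # (predicate, label, label)


def sim_word(w0: str, w1: str) -> float:
    if w0.startswith("?") or w1.startswith("?"):
        return 1.0                   # zero cost for variable binding
    if w0 == w1:
        return 1.0                   # assumption: identical words match
    return 0.8 if {w0, w1} == {"kill", "murder"} else 0.0


def sim_label(lits0: Dict[str, str], lits1: Dict[str, str]) -> float:
    shared = set(lits0) & set(lits1)
    if not shared:
        return 0.0
    product = math.prod(sim_word(lits0[p], lits1[p]) for p in shared)
    return product ** (1.0 / len(shared))   # unit weights assumed


def sim_extrinsic(e0: Extrinsic, e1: Extrinsic, l0: Labels, l1: Labels) -> float:
    if e0[0] != e1[0]:
        return 0.0                   # assumption: predicates must agree
    return math.sqrt(sim_label(l0[e0[1]], l1[e1[1]]) * sim_label(l0[e0[2]], l1[e1[2]]))


def answer(q_labels: Labels, q_ext: List[Extrinsic],
           p_labels: Labels, p_ext: List[Extrinsic]) -> Tuple[Optional[str], float]:
    """Return (term bound to the answer variable, formula similarity).

    Assumes both lists of extrinsic literals are non-empty.
    """
    scores, bound = [], None
    for e0 in q_ext:
        best = max(p_ext, key=lambda e1: sim_extrinsic(e0, e1, q_labels, p_labels))
        scores.append(sim_extrinsic(e0, best, q_labels, p_labels))
        for q_label, p_label in zip(e0[1:], best[1:]):
            if q_labels[q_label].get("ROOT", "").startswith("?"):
                bound = p_labels[p_label].get("ROOT")
    return bound, math.prod(scores) ** (1.0 / len(scores))


Q_LABELS = {"x1": {"ROOT": "?a0", "TYPE": "person"},
            "x2": {"ROOT": "kill", "TYPE": "event"},
            "x3": {"ROOT": "Jefferson", "TYPE": "person"}}
Q_EXT = [("SUBJECT", "x2", "x1"), ("OBJECT", "x2", "x3")]
P_LABELS = {"y1": {"ROOT": "Benjamin", "TYPE": "person"},
            "y2": {"ROOT": "murder", "TYPE": "event"},
            "y3": {"ROOT": "Jefferson", "TYPE": "person"}}
P_EXT = [("SUBJECT", "y2", "y1"), ("OBJECT", "y2", "y3")]

term, score = answer(Q_LABELS, Q_EXT, P_LABELS, P_EXT)
print(term, round(score, 3))  # Benjamin 0.946
```

Under these assumptions the sketch binds ?a0 to Benjamin with a score of about 0.95 rather than the 0.9 shown on the slide; the slide's figure presumably reflects literal weights or rounding that the transcript does not preserve.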

Current Status, Future Work

• First version implemented, testing now

• Short Term: Test “NLP IX” against statistical extraction module on factoid questions

• Longer Term:
– Support simple reasoning about questions and passages
– Investigate approach in narrower domains
  • Question answering based on CNS data on terrorism and weapons of mass destruction
– Extend similarity metric at word level
  • Word co-occurrence information
  • Distance metrics on ontologies other than WordNet
– Incorporate LCS Lexicon

Summary

• We believe complex question answering requires more than statistical extraction methods

• Knowledge bottleneck forces compromise in depth of language processing

• Robust unification based on heuristic measure of similarity offers short-term solution

Additional Resources

• Paper available:

B. Van Durme, Y. Huang, A. Kupsc and E. Nyberg (2003). “Towards Light Semantic Processing for Question Answering”, presented at the HLT/NAACL 2003 Workshop on Text Meaning.

• This and other papers at the JAVELIN web site:

http://www.lti.cs.cmu.edu/Research/JAVELIN

Questions?