
+Identifying Implicit Relationships

Christine Boucher, HON 111

J. Chu-Carroll, E. W. Brown, A. Lally, J. W. Murdock

+Outline

Introduction to Implicit Relationships

Spreading Activation

Watson’s 3 Information Resources

Application to COMMON BOND Questions

Application to Missing Link Questions

Evaluation against Watson’s baseline system

Conclusion

+Introduction

Resolving an implicit reference to a hidden concept

Question types:

COMMON BOND questions: feet, eyebrows, and McDonald’s have arches in common; trout, loose change, and compliments are things that you fish for

Missing Link questions: “The 1648 Peace of Westphalia ended a war that began on May 23 of this year.”

+Identifying the missing link

“The 1648 Peace of Westphalia ended a war that began on May 23 of this year.”

Peace of Westphalia → ended the Thirty Years’ War → which began in 1618

+Problem?

Need to identify concepts that are closely related to those given in the question…

…then use that information to solve for the final answer

+Spreading Activation

Theory of spreading activation: originated in cognitive psychology, used to explain semantic processing and lexical retrieval

Activation of a semantic network: concepts in a network are activated through their connections to already-active concepts

+Spreading Activation Algorithm

Measures concept relatedness based on the frequencies with which concepts co-occur

Activation over natural-language text uses Watson’s 3 resources: n-gram corpus, PRISMATIC knowledge base, Wikipedia links

Parameters: fan size f and depth d; the f most-related concepts to the currently activated concept are activated, and the process is recursively invoked d times (see the sketch below)
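The fan/depth expansion can be illustrated with a short sketch. This is a minimal illustration, not the actual Watson implementation; the most_related lookup and the toy table are placeholders for whichever of the three resources supplies the co-occurrence statistics.

from typing import Callable, Dict, List

def spread_activation(seed: str,
                      most_related: Callable[[str, int], List[str]],
                      fan: int = 5,
                      depth: int = 2) -> Dict[str, int]:
    # Expand `seed` by following the `fan` most related concepts, repeating
    # the expansion `depth` times; record the level at which each concept
    # was first activated.
    activated: Dict[str, int] = {seed: 0}
    frontier = [seed]
    for level in range(1, depth + 1):
        next_frontier = []
        for concept in frontier:
            for neighbor in most_related(concept, fan):
                if neighbor not in activated:  # activate each concept only once
                    activated[neighbor] = level
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activated

# Toy usage with a hypothetical relatedness table:
toy = {"bowling": ["pin", "lane", "strike"], "pin": ["bowling", "strike"],
       "lane": ["road"], "strike": ["union"]}
print(spread_activation("bowling", lambda t, f: toy.get(t, [])[:f], fan=2, depth=2))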

+n-gram corpus

A contiguous sequence of n items from a sequence of text or speech

Could be sequences of letters, syllables, words, etc.

5-gram corpus: corpus of 5-word sequences from text (with function words removed), e.g. “Pineapples grow in the tropical climate of Hawaii and taste sweet.”

Lexical collocation: retrieval of frequently collocated terms enables computation of semantic similarity (see the sketch below)

E.g. high collocation frequency between the terms “JFK” and “airport,” and between “JFK” and “assassination”
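A rough sketch of how windowed co-occurrence counts could be collected, assuming a toy sentence list and a tiny illustrative function-word list; the real system queries a large precomputed 5-gram corpus rather than scanning raw text.

from collections import Counter
from itertools import combinations

FUNCTION_WORDS = {"the", "of", "and", "in", "a", "to", "that"}  # illustrative only

def window_cooccurrences(sentences, n=5):
    # Count how often two content words fall inside the same n-word window.
    counts = Counter()
    for sentence in sentences:
        words = [w.strip(".,").lower() for w in sentence.split()
                 if w.lower() not in FUNCTION_WORDS]
        for start in range(max(1, len(words) - n + 1)):
            window = words[start:start + n]
            for a, b in combinations(sorted(set(window)), 2):
                counts[(a, b)] += 1
    return counts

counts = window_cooccurrences(
    ["Pineapples grow in the tropical climate of Hawaii and taste sweet."])
print(counts[("hawaii", "pineapples")])  # co-occurrence count used as a relatedness signal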

+PRISMATIC knowledge base

Extracts frames and slots based on syntactic relationships

Syntactic frame – links arguments and predicates

Example frame: SVO (subject-verb-object)

“Ford pardoned Nixon in 1974” → (Ford, pardon, Nixon)

A query provides the count of SVO tuples with subject Ford, etc.

Other types of frames: SVPO and NPO

Counts for the 3 frame types are combined to compute the total frequency of links between two terms and derive a relatedness score (see the sketch below)
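The exact weighting PRISMATIC uses is not given on this slide, so the sketch below simply sums the three frame counts and log-scales them; the frame-count lookups and the numbers in the example are placeholders for queries against the knowledge base.

import math

def prismatic_relatedness(term_a, term_b, svo_count, svpo_count, npo_count,
                          total_frames):
    # Total frequency of syntactic links between two terms across the three
    # frame types, log-scaled into a rough relatedness score.
    link_count = (svo_count(term_a, term_b)
                  + svpo_count(term_a, term_b)
                  + npo_count(term_a, term_b))
    return math.log1p(link_count) / math.log1p(total_frames)

# Toy lookups: "Ford pardoned Nixon in 1974" contributes the SVO tuple
# (Ford, pardon, Nixon); the counts below are invented for illustration.
score = prismatic_relatedness(
    "Ford", "Nixon",
    svo_count=lambda a, b: 12, svpo_count=lambda a, b: 3,
    npo_count=lambda a, b: 1, total_frames=1_000_000)
print(round(score, 4))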

+Covering the gaps left by n-gram

The n-gram corpus counts related words that appear lexically near each other, while PRISMATIC counts words that are syntactically connected

“Ford did not act hastily but did finally pardon Nixon in September.” Here “Ford” and “Nixon” are too far apart to share a 5-gram window, but the SVO frame (Ford, pardon, Nixon) still links them.

+Wikipedia Links

Uses metadata encoded in Web documents rather than the texts themselves

Documents link to other documents; target documents are typically concepts closely related to the source documents

+Using Wikipedia links, continued

Capture semantic relatedness using article titles

Article titles represent the canonical form of concepts --> higher likelihood of finding a common related concept given 2 or more concepts

Essence: given a term t, identify the Wikipedia document whose title best matches t and return all target document titles from links in that document (see the sketch below)
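A minimal sketch of that lookup, assuming a toy LINKS dictionary as a stand-in for Wikipedia’s link metadata; real title matching is far more robust than the case-insensitive exact match used here.

LINKS = {  # toy stand-in for Wikipedia link metadata
    "George Mallory": ["Mount Everest", "Andrew Irvine",
                       "1924 British Mount Everest expedition"],
    "Edmund Hillary": ["Mount Everest", "Tenzing Norgay", "New Zealand"],
}

def related_by_links(term):
    # Return the link targets of the article whose title matches `term`.
    for title, targets in LINKS.items():
        if title.lower() == term.lower():
            return targets
    return []

print(related_by_links("george mallory"))  # titles of linked articles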

+Application to Common-bond Questions

The answers are all semantically related to the given entities

Calls for use of spreading activation: identify concepts that are closely related to each given entity, then score each concept on the basis of its degree of relatedness to all given entities

+Candidate Generation

Spreading activation invoked on each entity

Example: Bobby, bowling, rolling (pins)

bobby: Robert, British police officer, pin

bowling: lane, strike, 300, pin

rolling: Rolling Stone, ramp, pin

Related concepts found are generated as candidate answers: strike, British police officer, Rolling Stone, pin, ramp

Search the n-gram corpus for the most frequently collocated terms (see the sketch below)
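A small sketch of candidate generation: spreading activation is run on each clue entity and the related concepts are pooled as candidate answers. The per-entity lists below just restate the slide’s example output.

related = {  # related concepts found for each clue entity (from the example above)
    "bobby":   ["Robert", "British police officer", "pin"],
    "bowling": ["lane", "strike", "300", "pin"],
    "rolling": ["Rolling Stone", "ramp", "pin"],
}

# Every concept related to at least one clue entity becomes a candidate answer.
candidates = sorted({concept for concepts in related.values() for concept in concepts})
print(candidates)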

+Common-bond answer scorer

Candidates are scored on the basis of semantic relatedness to each given entity

Relatedness of ‘strike,’ ‘British police officer,’ ‘Rolling Stone,’ ‘pin,’ ‘ramp’ to ‘bobby,’ ‘bowling,’ ‘rolling’

Multiply the 3 NGD (Normalized Google Distance) based scores into an overall goodness score for the candidate as a common-bond answer (as sketched below)

f(bobby, pin) × f(bowling, pin) × f(rolling, pin) = pin’s score

f(bobby, ramp) × f(bowling, ramp) × f(rolling, ramp) = ramp’s score

‘pin’ wins!
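A sketch of the scorer: a candidate’s score is the product of its relatedness to every clue entity, so a term strongly tied to only one entity scores poorly. The REL values are invented for illustration and stand in for the NGD-based relatedness scores.

from math import prod

REL = {  # invented relatedness values standing in for the NGD-based scores
    ("bobby", "pin"): 0.6, ("bowling", "pin"): 0.8, ("rolling", "pin"): 0.7,
    ("bobby", "ramp"): 0.1, ("bowling", "ramp"): 0.3, ("rolling", "ramp"): 0.5,
}

def common_bond_score(candidate, entities):
    # Product of per-entity relatedness: a candidate tied to only one clue scores low.
    return prod(REL.get((entity, candidate), 0.0) for entity in entities)

entities = ["bobby", "bowling", "rolling"]
for cand in ("pin", "ramp"):
    print(cand, round(common_bond_score(cand, entities), 3))
# pin 0.336 vs. ramp 0.015, so 'pin' wins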

+Application to Missing-link Questions

Questions in which a missing entity is either explicitly or implicitly referred to (often Final Jeopardy! questions)

“On hearing of the discovery of George Mallory’s body, this explorer still thinks he was first.” (Answer: “Edmund Hillary”)

George Mallory → Mount Everest → Edmund Hillary

3-step solving: missing-link identification, candidate generation, and scoring

+Missing link identification

2 criteria: the missing link must be highly related to concepts in the question, and it must be ruled out as a possible correct final answer

Search for entities semantically highly associated with key concepts in the question

Many are actually the correct final answer, so can’t be the missing link

Attempt to definitively rule out possible correct final answers: wrong answer type (e.g. “Thirty Years’ War” appears as a high-association answer but is not of the right answer type “year” and thus is a prime candidate as a missing link; see the sketch below)
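A sketch of the filtering step: among the concepts most related to the clue, keep as missing-link candidates only those that cannot be the final answer because they are not of the expected answer type. The answer_type_of helper and the relatedness values are hypothetical; the real system relies on Watson’s type-scoring machinery.

def find_missing_links(related_concepts, expected_type, answer_type_of):
    # Keep highly related concepts that cannot be the final answer because
    # they are not of the expected answer type.
    return [(concept, score) for concept, score in related_concepts
            if answer_type_of(concept) != expected_type]

# "The 1648 Peace of Westphalia ended a war that began on May 23 of this year."
related = [("Thirty Years' War", 0.9), ("1618", 0.7), ("Holy Roman Empire", 0.5)]
types = {"Thirty Years' War": "war", "1618": "year", "Holy Roman Empire": "empire"}
print(find_missing_links(related, "year", types.get))
# "Thirty Years' War" is highly related but not a year, so it is a prime missing-link candidate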

+Candidate generation using missing links

Invoke the system again using the missing links in the search process, hoping that the new search results include correct answers that previously failed to be generated

New search queries are produced by augmenting each existing query with a missing link (see the sketch below)

“The 1648 Peace of Westphalia ended a war that began on May 23 of this year.” → Peace of Westphalia, Thirty Years’ War, began, May 23

Focuses the search on key concepts from the question with an additional bias toward the inferred missing link
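A sketch of the query-augmentation step, assuming a query is represented as a simple list of key terms (that representation is an assumption, not the system’s actual query format).

def augment_queries(queries, missing_links):
    # Re-issue each existing query with each inferred missing link appended.
    return [query + [link] for query in queries for link in missing_links]

base_query = ["Peace of Westphalia", "war", "began", "May 23"]
print(augment_queries([base_query], ["Thirty Years' War"]))
# [['Peace of Westphalia', 'war', 'began', 'May 23', "Thirty Years' War"]]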

+Missing-link answer scorer

The second iteration produces a list of answers ranked by confidence

New scorers were developed to score the semantic relatedness of a candidate answer to concepts in the question via the identified missing link

For each candidate-answer and missing-link pair, a semantic relatedness score is computed using the spreading-activation process (see the sketch below)

E.g. given its strong association with George Mallory, it is fairly straightforward to identify “Mount Everest” as a missing link; then compute relatedness scores of the missing link and the candidate answers: (Mt. Everest, Apa Sherpa), (Mt. Everest, Edmund Hillary), (Mt. Everest, Jordan Romero)

Edmund Hillary wins!
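A sketch of that scoring step: each candidate answer receives a score for its relatedness to the identified missing link. The values below are invented for illustration; the real scores come from the spreading-activation resources.

REL_TO_LINK = {  # invented relatedness values between the missing link and candidates
    ("Mount Everest", "Edmund Hillary"): 0.8,
    ("Mount Everest", "Apa Sherpa"): 0.4,
    ("Mount Everest", "Jordan Romero"): 0.3,
}

def missing_link_score(missing_link, candidate):
    # Relatedness of the candidate answer to the identified missing link.
    return REL_TO_LINK.get((missing_link, candidate), 0.0)

for cand in ("Edmund Hillary", "Apa Sherpa", "Jordan Romero"):
    print(cand, missing_link_score("Mount Everest", cand))
# Edmund Hillary scores highest and is promoted to the top answer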

+Evaluation against the baseline system

The baseline technique stumbles on a high-scoring candidate that is strongly related to just one of the clue phrases, e.g. “COMMON BONDS: Spice, interrupted, Georgy”

The baseline system prefers “Girl, Interrupted”; the Common Bond Answer Generator is able to prefer “girls,” which is associated with all three clue phrases

+Evaluation against the baseline system

a) The old top answer becomes the missing link (1, 2)

b) The initial answer is incorrect but of the correct type; taking the missing link into consideration before final answer generation aids promotion of the correct answer candidate to the top position

+Conclusion

Spreading activation approach for concept expansion and measuring semantic relatedness

Implemented three new knowledge resources:

n-gram corpus – semantic relatedness based on lexical collocation

PRISMATIC knowledge base – relatedness of concepts based on syntactic collocation

Wiki links – metadata from link structures to indicate semantic relatedness

Process identifies missing semantic associations between concepts and improves performance on common-bond and Final Jeopardy! questions