+
Identifying Implicit Relationships
Christine Boucher HON 111
J. Chu-Carroll, E.W. Brown, A. Lally, J.W. Murdock
+Outline
Introduction to Implicit Relationships
Spreading Activation
Watson’s 3 Information Resources
Application to COMMON BOND Questions
Application to Missing Link Questions
Evaluation against Watson’s baseline system
Conclusion
+Introduction
Resolving an implicit reference to a hidden concept
Question Types:
COMMON BOND questions
Feet, eyebrows and McDonald’s have arches in common
Trout, loose change and compliments are things that you fish for
Missing Link questions
“The 1648 Peace of Westphalia ended a war that began on May 23 of this year.”
+Identifying the missing link
“The 1648 Peace of Westphalia ended a war that began on May 23 of this year.”
Peace of Westphalia
Ended the Thirty Years’ War
1618
+Problem?
Need to identify concepts that are closely related to those given in the question…
…then use that information to solve for the final answer
+Spreading Activation
Theory of Spreading Activation
Originated in cognitive psychology; used to explain semantic processing and lexical retrieval
Activation of a semantic network
Concepts in a network are activated through their connections to already active concepts
+Spreading Activation Algorithm
Measure concept relatedness on the basis of how frequently concepts co-occur
Activation over natural-language text: Watson’s 3 resources
n-gram corpus
PRISMATIC knowledge base
Wikipedia links
Fan size f, depth d
Activate the f most-related concepts to the currently activated concept
Recursively invoked d times
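The fan-size and depth parameters can be sketched in Python as follows. This is a minimal illustration, not Watson's implementation: the `RELATED` table, its scores, and the function name `spread_activation` are hypothetical stand-ins for a lookup against one of the three resources, and the recursion is written as a loop over depth levels.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical relatedness table: concept -> [(neighbor, score), ...].
# A real lookup would query the n-gram corpus, PRISMATIC, or Wikipedia
# links; the entries and numbers here are made up for illustration.
RELATED: Dict[str, List[Tuple[str, float]]] = {
    "Peace of Westphalia": [("Thirty Years' War", 0.9), ("1648", 0.8), ("treaty", 0.5)],
    "Thirty Years' War": [("Peace of Westphalia", 0.9), ("1618", 0.7), ("Bohemia", 0.4)],
}


def spread_activation(concept: str, fan: int, depth: int) -> Set[str]:
    """Activate the `fan` most related neighbors of each active concept,
    repeating the expansion `depth` times from the starting concept."""
    activated: Set[str] = set()
    frontier = [concept]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            neighbors = sorted(RELATED.get(current, []), key=lambda p: p[1], reverse=True)
            for name, _score in neighbors[:fan]:
                # Skip the starting concept and anything already active.
                if name != concept and name not in activated:
                    activated.add(name)
                    next_frontier.append(name)
        frontier = next_frontier
    return activated
```

With fan size 2, one level of expansion from "Peace of Westphalia" activates only its two strongest neighbors; a second level also reaches "1618" through "Thirty Years' War".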
+n-gram corpus
Contiguous sequence of n items from sequence of text or speech
Could be sequences of letters, syllables, words, etc.
5-gram corpus: corpus of 5-word sequences from text (with function words removed)
E.g. “Pineapples grow in the tropical climate of Hawaii and taste sweet.”
Lexical collocation: retrieving frequently collocated terms supports computation of semantic similarity
E.g. high collocation frequency between the terms “JFK” and “airport” and between “JFK” and “assassination”
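As a rough illustration of collocation counting, the sketch below drops function words, slides a 5-word window over what remains, and counts how many windows contain each pair of terms. The stop list and the helper names `content_words` and `collocation_counts` are assumptions made for this example; the talk does not specify Watson's function-word list or counting scheme.

```python
from collections import Counter
from itertools import combinations

# Tiny illustrative stop list (assumption; not the list used by Watson).
FUNCTION_WORDS = {"in", "the", "of", "and", "a", "an", "to"}


def content_words(text):
    """Lowercase the text, strip punctuation, and drop function words."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return [w for w in words if w and w not in FUNCTION_WORDS]


def collocation_counts(sentences, n=5):
    """Count, for each pair of terms, the number of n-word windows
    (after function-word removal) in which both terms appear."""
    counts = Counter()
    for sentence in sentences:
        words = content_words(sentence)
        # Sentences shorter than n words yield a single window.
        for start in range(max(1, len(words) - n + 1)):
            window = words[start:start + n]
            for a, b in combinations(sorted(set(window)), 2):
                counts[(a, b)] += 1
    return counts
```

On the pineapple sentence, "climate" and "hawaii" share all three windows, while "pineapples" and "sweet" never fall in the same window, so the first pair would look far more closely related.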
+PRISMATIC knowledge base
Extracts frames and slots based on syntactic relationships
Syntactic frame – links arguments and predicates
Example frame: SVO (subject-verb-object)
“Ford pardoned Nixon in 1974” (Ford, pardon, Nixon)
Query provides count of SVO tuples with subject Ford, etc.
Other types of frames: SVPO and NPO
Counts for 3 frames are combined to compute total frequency of links between two terms and compute a relatedness score
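Combining counts across frame types might look like the sketch below. The `FRAMES` store, its counts, and the slot layouts shown for SVPO and NPO are hypothetical; the talk does not say how the combined frequency is normalized into a relatedness score, so the example stops at the raw total.

```python
from collections import Counter

# Hypothetical frame store: (frame type, slot values) -> observed count.
# All counts are invented for illustration.
FRAMES = Counter({
    ("SVO", ("Ford", "pardon", "Nixon")): 12,
    ("SVPO", ("Ford", "succeed", "after", "Nixon")): 3,
    ("NPO", ("pardon", "of", "Nixon")): 7,
    ("SVO", ("Ford", "build", "car")): 5,
})


def link_frequency(term_a, term_b):
    """Sum counts over every frame, of any type, in which both terms fill a slot."""
    return sum(
        count
        for (_frame_type, slots), count in FRAMES.items()
        if term_a in slots and term_b in slots
    )
```

Here `link_frequency("Ford", "Nixon")` adds the SVO and SVPO counts, which is the kind of total that a relatedness score would then be computed from.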
+Covering the gaps left by n-gram
n-gram counts words that appear lexically near each other, while PRISMATIC counts words that are syntactically connected
“Ford did not act hastily but did finally pardon Nixon in September.”
+Wikipedia Links
Uses metadata encoded in Web documents rather than the texts themselves
Documents link to other documents
Target documents are typically closely related concepts to source documents
+Using Wikipedia links, continued
Capture semantic relatedness using article titles
Article titles represent canonical form of concepts --> higher likelihood of finding a common related concept given 2 or more concepts
Essence: given term t, we identify the Wikipedia document whose title best matches t and return all target document titles from links in that document.
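The lookup just described can be sketched as follows. The `LINKS` table is a hand-made stand-in for Wikipedia's link structure, and using `difflib.get_close_matches` for "best matches" is an assumption of this example; the talk does not say how title matching is done.

```python
import difflib

# Hypothetical link table: article title -> titles of linked articles.
LINKS = {
    "George Mallory": ["Mount Everest", "Andrew Irvine", "Mountaineering"],
    "Edmund Hillary": ["Mount Everest", "Tenzing Norgay", "New Zealand"],
}


def linked_titles(term):
    """Return the link targets of the article whose title best matches `term`."""
    match = difflib.get_close_matches(term, list(LINKS), n=1)
    return LINKS[match[0]] if match else []


def common_targets(terms):
    """Titles linked from the best-matching article of every given term."""
    target_sets = [set(linked_titles(t)) for t in terms]
    return set.intersection(*target_sets) if target_sets else set()
```

Because titles are canonical, the two articles share the exact string "Mount Everest", so intersecting their link targets surfaces it as a common related concept.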
+Application to Common-bond Questions
The answers are all semantically related to the given entities
Calls for use of spreading activation
Identify concepts that are closely related to each given entity
Score each concept on the basis of its degree of relatedness to all given entities
+Candidate Generation
Spreading activation invoked on each entity
Example: Bobby, bowling, rolling (pins)
bobby: Robert, British police officer, pin
bowling: lane, strike, 300, pin
rolling: Rolling Stone, ramp, pin
Related concepts found are generated as candidate answers
strike, British police officer, Rolling Stone, pin, ramp
Search n-gram corpus for most frequently collocated terms
+Common-bond answer scorer
Candidates scored on basis of semantic relatedness to each given entity
Relatedness of ‘strike, British police officer, Rolling Stone, pin, ramp’ to ‘bobby,’ ‘bowling,’ ‘rolling’
Multiply 3 NGD (Normalized Google Distance) scores for overall goodness score of candidate as common bond answer
f(Bobby, pin) × f(bowling, pin) × f(rolling, pin) = pin’s score
f(Bobby, ramp) × f(bowling, ramp) × f(rolling, ramp) = ramp’s score
‘pin’ wins!
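The product-of-scores idea can be tried with a small sketch. The `RELATEDNESS` values below are invented, and `f` is reduced to a table lookup rather than an actual NGD computation; the point is only that a candidate needs some relatedness to every entity to score well.

```python
from math import prod

# Hypothetical relatedness values in [0, 1]; higher means more related.
RELATEDNESS = {
    ("bobby", "pin"): 0.6, ("bowling", "pin"): 0.9, ("rolling", "pin"): 0.5,
    ("bobby", "ramp"): 0.05, ("bowling", "ramp"): 0.2, ("rolling", "ramp"): 0.6,
    ("bobby", "strike"): 0.05, ("bowling", "strike"): 0.9, ("rolling", "strike"): 0.1,
}


def common_bond_score(candidate, entities):
    """Multiply the candidate's relatedness to each given entity.
    Pairs missing from the table count as 0.0 (no observed relation)."""
    return prod(RELATEDNESS.get((entity, candidate), 0.0) for entity in entities)


def best_common_bond(candidates, entities):
    """Return the candidate with the highest product score."""
    return max(candidates, key=lambda c: common_bond_score(c, entities))
```

Multiplying rather than adding means "strike", which is strongly tied to bowling alone, cannot outscore "pin", which is moderately tied to all three entities.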
+Application to Missing-link Questions
Questions in which a missing entity is either explicitly or implicitly referred to (often Final Jeopardy! questions)
“On hearing of the discovery of George Mallory’s body, this explorer still thinks he was first.” (Answer: “Edmund Hillary”)
George Mallory
Mount Everest
Edmund Hillary
3-step solution: missing link identification, candidate generation, and scoring
+Missing link identification
2 criteria: must be highly related to concepts in the question, and must be ruled out as a possible correct final answer
Search for semantically highly associated entities to key concepts in Q
Many are actually the correct final answer, so can’t be the missing link
Attempt to definitively rule out possible correct final answers
Wrong answer type (e.g. “Thirty Years’ War” appears as a high-association answer but is not of the right answer type “year” and thus is a prime candidate as a missing link.)
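The two criteria can be expressed as a simple filter, sketched below. The `Associated` record, the type labels, the scores, and the `min_score` threshold are all hypothetical; in particular, answer-type detection is assumed to have happened elsewhere.

```python
from typing import List, NamedTuple


class Associated(NamedTuple):
    name: str          # entity found by the association search
    answer_type: str   # coarse type label (assumed to be given)
    score: float       # association strength with the question concepts


def find_missing_links(entities: List[Associated], expected_type: str,
                       min_score: float = 0.5) -> List[str]:
    """Keep entities that are highly associated with the question but
    cannot be the final answer because their type is wrong."""
    return [
        e.name
        for e in entities
        if e.score >= min_score and e.answer_type != expected_type
    ]
```

For the Peace of Westphalia clue, which asks for a year, a strongly associated entity of type "war" passes the filter, while strongly associated years are held back as possible final answers.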
+Candidate generation using missing links
Invoke system again using missing links in search process
Hope that new search results include correct answers that previously failed to be generated
New search queries produced by augmenting each existing query with a missing link
“The 1648 Peace of Westphalia ended a war that began on May 23 of this year.”
Peace of Westphalia, Thirty Years’ War, began, May 23
Focuses search on key concepts from Q with additional bias toward the inferred missing link
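Query augmentation amounts to pairing every existing query with every inferred missing link. The sketch below assumes queries are plain lists of terms and uses a hypothetical helper name, `augment_queries`; Watson's actual query representation is not described in the talk.

```python
def augment_queries(queries, missing_links):
    """Return one new query per (existing query, missing link) pair,
    formed by appending the missing link to the query's terms."""
    return [terms + [link] for terms in queries for link in missing_links]
```

Appending "Thirty Years' War" to the terms drawn from the clue gives a search that keeps the original key concepts while leaning toward documents about the war itself.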
+Missing-link answer scorer
Second iteration produces list of answers ranked by confidence
Developed new scorers for scoring semantic relatedness of candidate answer and concepts in the question via the identified missing link
For each candidate-answer and missing-link pair, compute semantic relatedness score using spreading-activation process
E.g. Given its strong association with George Mallory, it is fairly straightforward to identify “Mount Everest” as a missing link.
Compute relatedness scores of missing link and candidate answers: (Mt. Everest, Apa Sherpa), (Mt. Everest, Edmund Hillary), (Mt. Everest, Jordan Romero)
Edmund Hillary wins!
+Evaluation against the baseline system
Baseline technique stumbles on a high-scoring candidate that is strongly related to just one of the clue phrases
E.g. “COMMON BONDS: Spice, interrupted, Georgy”
Baseline system prefers “Girl, Interrupted”
Common Bond Answer Generator is able to prefer “girls,” which is associated with all three clue phrases
+Evaluation against the baseline system
a) old top answer becomes missing link (1,2)
b) initial answer incorrect but of correct type
When the missing link is taken into consideration before final answer generation, it aids promotion of the correct answer candidate to the top position
+Conclusion
Spreading activation approach for concept expansion and measuring semantic relatedness
Implemented three new knowledge resources
n-gram corpus – semantic relatedness based on lexical collocation
PRISMATIC knowledge base – relatedness of concepts based on syntactic collocation
Wiki links – metadata from link structures to indicate semantic relatedness
Process identifies missing semantic associations between concepts and improves performance on common-bond and Final Jeopardy! questions