TRANSCRIPT
Deep Text Understanding with WordNet
Christiane Fellbaum
Princeton University and
Berlin-Brandenburg Academy of Sciences
WordNet

• What is WordNet and why is it interesting/useful?
• A bit of history
• WordNet for natural language processing/word sense disambiguation
What is WordNet?

• A large lexical database, or “electronic dictionary,” developed and maintained at Princeton University: http://wordnet.princeton.edu
• Includes most English nouns, verbs, adjectives, adverbs
• Electronic format makes it amenable to automatic manipulation
• Used in many Natural Language Processing applications (information retrieval, text mining, question answering, machine translation, AI/reasoning,...)
• Wordnets are built for many languages (including Danish!)
What’s special about WordNet?

• Traditional paper dictionaries are organized alphabetically: words that are found together (on the same page) are not related by meaning
• WordNet is organized by meaning: words in close proximity are semantically similar
• Human users and computers can browse WordNet and find words that are meaningfully related to their queries (somewhat like in a hyperdimensional thesaurus)
• Meaning similarity can be measured and quantified to support Natural Language Understanding
A bit of history
Research in Artificial Intelligence (AI): How do humans store and access knowledge about concepts?

Hypothesis: concepts are interconnected via meaningful relations

Knowledge about concepts is huge--it must be stored in an efficient and economic fashion
A bit of history
Knowledge about concepts is computed “on the fly” via access to general concepts
E.g., we know that “canaries fly” because
“birds fly” and “canaries are a kind of bird”
A simple picture
animal (animate, breathes, has heart,...)
|
bird (has feathers, flies,..)
|
canary (yellow, sings nicely,..)
Knowledge is stored at the highest possible node and inherited by lower (more specific) concepts rather than being multiply stored
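This storage scheme can be sketched as a toy semantic network in code (an illustrative, hypothetical example, not from the talk): each node stores only its own properties plus a pointer to its parent, and everything else is inherited.

```python
# Toy Collins & Quillian-style network: shared properties live at the
# highest node and are inherited, not multiply stored. (Sketch only.)
network = {
    "animal": {"isa": None, "props": {"animate", "breathes", "has heart"}},
    "bird":   {"isa": "animal", "props": {"has feathers", "flies"}},
    "canary": {"isa": "bird", "props": {"yellow", "sings nicely"}},
}

def has_property(concept, prop):
    """Walk up the IS-A chain until the property is found (or the root)."""
    while concept is not None:
        if prop in network[concept]["props"]:
            return True
        concept = network[concept]["isa"]
    return False

print(has_property("canary", "flies"))      # True: inherited from "bird"
print(has_property("canary", "has heart"))  # True: inherited from "animal"
```

The number of IS-A links traversed by `has_property` mirrors the reaction-time levels probed in the experiments described next.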
Collins & Quillian (1969) measured reaction times to statements involving knowledge distributed across different “levels”
Do birds fly?
--short RT
Do canaries fly?
--longer RT
Do canaries have a heart?
--even longer RT
Collins’ & Quillian’s results are subject to criticism (reaction times to statements like “do canaries move?” are influenced by prototypicality, word frequency, and uneven semantic distance across levels)
But other evidence from psychological experiments confirms that humans organize knowledge about words and concepts by means of meaningful relations
Access to one concept activates related concepts in an outward spreading (radial) fashion
A bit of history
But the idea inspired WordNet (1986), which asked: Can most/all of the lexicon be represented as a network where words are interlinked by meaning?

If so, the result would be a semantic network (a graph)
WordNet
If the (English) lexicon can be represented as a semantic network, which are the relations that connect the nodes?
Whence the relations?
• Inspection of association norms (stimulus: hand; response: finger, arm)
• Classical ontology (Aristotle): IS-A (maple-tree), HAS-A (maple-leaves)
• Co-occurrence patterns in texts (meaningfully related words are used together)
Relations: Synonymy
One concept is expressed by several different word forms:
{beat, hit, strike}
{car, motorcar, auto, automobile}
{ big, large}
Synonymy = one:many mapping of meaning and form
Synonymy in WordNet
WordNet groups (roughly) synonymous, denotationally equivalent words into unordered sets of synonyms (“synsets”)
{hit, beat, strike}
{big, large}
{queue, line}
Each synset expresses a distinct meaning/concept
Polysemy

One word form expresses multiple meanings

Polysemy = one:many mapping of form and meaning
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
Note: the most frequent word forms are the most polysemous!
Polysemy in WordNet
A word form that appears in n synsets is n-fold polysemous
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
table is fourfold polysemous/has four senses
Some WordNet stats
Part of speech   Word forms   Synsets containing wf
noun             117,798       82,115
verb              11,529       13,767
adjective         21,479       18,156
adverb             4,481        3,621
total            155,287      117,659
The “Net” part of WordNet
Synsets are the building blocks of the network
Synsets are interconnected via relations
Bi-directional arcs express semantic relations
Result: large semantic network (graph)
Hypo-/hypernymy relates noun synsets
Relates more/less general concepts
Creates hierarchies, or “trees”

                 {vehicle}
                /         \
  {car, automobile}     {bicycle, bike}
     /         \                 \
{convertible} {SUV}        {mountain bike}

“A car is a kind of vehicle” <=> “The class of vehicles includes cars, bikes”
Hierarchies can have up to 16 levels
Hyponymy
Transitivity:
A car is a kind of vehicle
An SUV is a kind of car
=> An SUV is a kind of vehicle
Meronymy/holonymy(part-whole relation)
     {car, automobile}
            |
        {engine}
        /       \
{spark plug}  {cylinder}

“An engine has spark plugs”
“Spark plugs and cylinders are parts of an engine”
Meronymy/Holonymy
Inheritance:
A finger is part of a hand
A hand is part of an arm
An arm is part of a body
=> A finger is part of a body
Structure of WordNet (Nouns)
[Diagram: a fragment of the noun network]

Hyperonym chain:
{vehicle}
  ↑ hyperonym
{conveyance; transport}
  ↑ hyperonym
{motor vehicle; automotive vehicle}
  ↑ hyperonym
{car; auto; automobile; machine; motorcar}
  ↑ hyperonym
{cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}

Meronyms of {car}: {bumper}, {car door}, {car window}, {car mirror}
Meronyms of {car door}: {hinge; flexible joint}, {doorlock}, {armrest}
WordNet Data Model
[Diagram: word forms of the vocabulary are mapped to concept records, and records are linked by relations]

Word forms: bank, fiddle, violin, fiddler, violist, string

Concept records:
  rec 12345 - financial institute       (bank, sense 1)
  rec 54321 - side of a river           (bank, sense 2)
  rec 9876  - small string instrument   (fiddle, violin)
  rec 65438 - musician playing violin   (fiddler, violist)
  rec 42654 - musician
  rec 25876 - string instrument
  rec 35576 - string of instrument      (string, sense 1)
  rec 29551 - subatomic particle        (string, sense 2)

Relations: type-of (small string instrument → string instrument; musician playing violin → musician), part-of (string of instrument → string instrument)

Three layers: vocabulary of a language — concepts — relations
WordNet for Natural Language Processing
Challenge:
get a computer to “understand” language
• Information retrieval
• Text mining
• Document sorting
• Machine translation
Natural Language Processing

• Stemming, parsing currently at >90% accuracy level
• Word sense discrimination (lexical disambiguation) still a major hurdle for successful NLP
• Which sense is intended by the writer (relative to a dictionary)?
• Best systems: ~60% precision, ~60% recall (but human inter-annotator agreement isn’t perfect, either!)
Understanding text beyond the word level
(joint work with Peter Clark and Jerry Hobbs)
Knowledge in text
Human language users routinely derive knowledge from text that is NOT expressed on the surface
Perhaps more knowledge is unexpressed than overtly expressed on the surface
Graesser (1981) estimates
explicit:implicit info = 1:8
An example

Text: A soldier was killed in a gun battle

Inferences:
Soldiers were fighting one another
The soldiers had guns with live ammunition
Multiple shots were fired
One soldier shot another soldier
The shot soldier died as a result of the injuries caused by the shot
The time interval between the fatal shot and the death was short
Humans use world knowledge to supplement word knowledge
(How) can such knowledge be encoded and harnessed by automatic systems?
Previous attempts (e.g., Cyc’s microtheories)
--too few theories
--uneven coverage of world knowledge
Recognizing Textual Entailment
Task:
Evaluate truth of hypothesis H given a text T
(T) A soldier was killed in a gun battle
(H) A soldier died
Answer may be yes/no/probably/...
RTE
Many automatic systems attempt RTE via lexical and syntactic matching algorithms (“do the same words occur in T, H?” “do T, H have the same subject/object?”)
Not “deep” language understanding
Our RTE test suite
250 Text-Hypothesis pairs
for 50% of them, H is entailed by T
for the remaining 50%, H is not (necessarily) entailed
Focus on semantic interpretation
RTE test suite
Core of T statements came from newspaper texts
H statements were hand-coded
focus on general world knowledge
RTE test suite
Manually analyzed pairs
Distinguished, classified 19 types of knowledge among the T-H pairs
some partial overlap
Examples: Types of knowledge (in increasing order of difficulty)

Lexical: relation among irregular forms of a single lemma, Named Entities vs. proper nouns
Lexical-semantic (paradigmatic): synonyms, hypernyms, meronyms, antonyms, metonymy, derivations
Syntagmatic: selectional preferences, telic roles
Propositional: cause-effect, preconditions
World knowledge/core theories (e.g., ambush entails concealment)
Overall approach (bag of tricks)
• Initial text interpretation with language processing tools (Peter Clark et al.)
• Compute subsumption among text fragments
• WordNet augmentations
Text interpretation
First step: parsing (assign a structure to a sentence or phrase)
SAPIR parser (Harrison & Maxwell 1986)
SAPIR also produces a Logical Form (LF)
LFs
LF structures are trees generated by rules parallel to grammar rules
contain logic elements
nouns, verbs, adj’s, prepositions represented as variables
LFs are parsed and have part-of-speech tags
LFs generate ground logical assertions
Example
LF for "A soldier was killed in a gun battle."
(DECL
((VAR X1 "a" "soldier")
(VAR X2 "a" "battle" (NN "gun" "battle")))
(S (PAST) NIL "kill" ?X1 (PP "in" ?X2)))
Logical assertions
logic for "A soldier was killed in a gun battle."
object(kill,soldier) & in(kill,battle) & modifier(battle,gun)
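One simple way to hold such assertions for later matching is as predicate-argument triples; a hypothetical rendering in Python (names are illustrative, not SAPIR's internal representation):

```python
# The slide's logical assertions as (predicate, arg1, arg2) triples.
clauses = [
    ("object", "kill", "soldier"),
    ("in", "kill", "battle"),
    ("modifier", "battle", "gun"),
]

# Render back to the slide's notation.
for pred, a, b in clauses:
    print(f"{pred}({a},{b})")
```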
Result: T, H in Logical Form
Matching sentences/fragments with subsumption
A basic reasoning operation:

A person loves a person
subsumes
A man loves a woman
A set S1 of clauses subsumes another set S2 of clauses if each clause in S1 subsumes some member of S2.
Similarly, a clause subsumes another clause if the arguments of the first subsume or match the arguments of the second
Argument (word) subsumption as in WordNet (X is a Y)
Matching = synonyms
Syntactic matching of predicates
--both are the same
--one is predicate “of” or modifier (my friend’s car, the car of my friend)
--predicates “subject” and “by” match (passives)
Lexical (word) matching
Words related by derivational morphology (destroy, destruction) are considered matches in conjunction with syntactic matches
Recognize as equivalent:
the bomb destroyed the shrine
the destruction of the shrine by the bomb
But not
the destruction of the bomb by the shrine
a person attacks with a bomb
there is a bomb attack by a person
Benefits for text understanding/RTE
(T) Moore is a prolific writer
(H) Moore writes many books
Moore is the Agent of write
Exploiting word and world knowledge encoded in WordNet
Use of WordNet glosses
Glosses = definition of “concept” expressed by synset members
{airplane, plane (an aircraft that has fixed wings and is powered by propellers or jets)}
syntagmatic information, world knowledge
Translating glosses into First Order Logic Axioms
{ bridge, span (any structure that allows people or vehicles to cross an obstacle such as a river or canal...)}
bridgeN1(x,y)
<--> structureN1(x) & allowV1(x,e1) & crossV1(e1,z,y)
& obstacleN2(y) & person/vehicle(z)
personN1(z) --> person/vehicle(z)
vehicleN1(z) --> person/vehicle(z)
riverN2(y) --> obstacleN2(y)
canalN3(y) --> obstacleN2(y)
The nouns, verbs, adjectives, adverbs in the LF glosses were manually disambiguated
Thus, each variable in the LFs was identified not just with a word form, but a form-meaning pair (sense) in WordNet
LFs were generated for 110K glosses
Particular emphasis on CoreWordNet
How well do our tricks perform?
An example that works
Exploiting formally related words in WN:
(T) …go through licensing procedures
(H) …go through licensing processes
Exploiting hyponymy (IS-A relation):
(T) Beverley served at WEDCOR
(H) Beverley worked at WEDCOR
More complex example that works
(T) Britain puts curbs on immigrant labor from Bulgaria
(H) Britain restricted workers from Bulgaria
Knowledge from WordNet
Synset with gloss: {restrict, restrain, place_limits_on, (place restrictions on)}
Synonymy: {put, place}, {curb, limit}
Morphosemantic link: {labor} - {laborer}
Hyponymy: {laborer} ISA {worker}
Example that doesn’t work
(T) The Philharmonic orchestra draws large crowds
(H) Large crowds were drawn to listen to the orchestra

WordNet tells us that
orchestra = collection of musicians
musician = someone who plays a musical instrument
music = sound produced by musical instruments
listen = hear = perceive sound

But WN doesn’t tell us that playing results in sound production and that there is a listener
Examples that don’t work
The most fundamental knowledge that humans take for granted trips up automatic systems
Such knowledge is not explicitly taught to children
But it must be “taught” to machines!
Core theories (Jerry Hobbs)
• Attempt to encode fundamental knowledge
• Space, time, causality,...
• Essential for reasoning
• Not encoded in WordNet glosses
Core theories
• Manually encoded
• Axiomatized
Core theories
• Composite entities (things made of other things, stuff)
• Scalar notions (time, space,...)
• Change of state
• Causality
Core theories
Example of predications:
change(e1,e2)
changeFrom(e1)
changeTo(e2)
Core theories and WordNet
map core theories to Core WN synsets
encode meanings of synsets denoting events, event structure in terms of core theory predications
Examples
let(x,e) <--> not(cause(x,not(e)))
{go, become, get (“he went wild”)}
go(x,e) <--> changeTo(e)
free(x,y) <--> cause(x,changeTo(free(y)))
(All words are linked to WN senses)
Example
The captors freed the hostages
The hostages were free

free = let(x, go(y, free(y)))
<--> not(cause(x, not(changeTo(free(y)))))
<--> cause(x, changeTo(free(y)))
<--> free(x,y)
Preliminary evaluation
(What) does each component contribute to RTE?
For the 250 Text-Hypothesis pairs in our test suite:
when H or ¬H is predicted by:                          Incorrect   Correct
  syntactic transformations                                 3         11
  WordNet relations                                         1         14
  WordNet glosses in LF                                     1          4
when H or ¬H is not predicted (assumed not entailed)       72         97
Conclusion
• A long way to go!
• Deliberately exclude statistical similarity measures (this hurts our results)
• Symbolic approach: aim at deep level understanding
WordNet for Deeper Text Understanding
• Axioms in Logical Form are useful for many other NL Understanding applications
• E.g., automated question answering: translate Qs and As into logic representation
• Logic representations enable reasoning (axioms can be fed into a reasoner/logic prover)
Thanks for your attention