TRANSCRIPT

CPSC 503 Computational Linguistics
Computational Lexical Semantics, Lecture 14
Giuseppe Carenini
Today 23/10
Three well-defined semantic tasks:
• Word Sense Disambiguation (corpus and thesaurus methods)
• Word Similarity (thesaurus and corpus methods)
• Semantic Role Labeling (corpus)
WSD example: table + context -> one of senses [1-6]
The noun "table" has 6 senses in WordNet:
1. table, tabular array -- (a set of data …)
2. table -- (a piece of furniture …)
3. table -- (a piece of furniture with tableware …)
4. mesa, table -- (flat tableland …)
5. table -- (a company of people …)
6. board, table -- (food or meals …)
WSD Methods
• Machine Learning
  – Supervised
  – Unsupervised
• Dictionary / Thesaurus (Lesk)
Supervised ML Approaches to WSD
Training data: ((word + context_1) sense_1) … ((word + context_n) sense_n)
A machine learning algorithm is trained on these pairs and produces a classifier that maps (word + context) to a sense.
Training Data Example
"…after the soup she had bass with a big salad…"
Each training instance is ((word + context) sense)_i: the context is the text surrounding the target word, and the sense label is, for example:
• one of the 8 possible senses for "bass" in WordNet, or
• one of the 2 key distinct senses for "bass" in WordNet (music vs. fish)
WordNet Bass: music vs. fish
The noun "bass" has 8 senses in WordNet:
1. bass -- (the lowest part of the musical range)
2. bass, bass part -- (the lowest part in polyphonic music)
3. bass, basso -- (an adult male singer with …)
4. sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass -- (any of various North American lean-fleshed ………)
6. bass, bass voice, basso -- (the lowest adult male singing voice)
7. bass -- (the member with the lowest range of a family of musical instruments)
8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Representations for Context
• GOAL: an informative characterization of the window of text surrounding the target word
• Supervised ML requires a simple representation for the training data: vectors of feature/value pairs
• TASK: select the relevant linguistic information and encode it as a feature vector
Relevant Linguistic Information (1)
• Collocational: information about the words that appear in specific positions to the right and left of the target word; typically the words and their POS tags
• Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, …"
Assuming a window of +/- 2 around the target:
[guitar, NN, and, CJC, player, NN, stand, VVB]
i.e., [word at position -n, POS at position -n, …, word at position +n, POS at position +n]
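As a concrete illustration, here is a minimal sketch (not from the slides) of extracting such a collocational feature vector in Python. It assumes the sentence is already tokenized and POS-tagged; the `<pad>` convention for positions outside the sentence is an illustrative choice.

```python
def collocational_features(tagged, target_index, window=2):
    """Return [w-n, POS-n, ..., w+n, POS+n] around the target word."""
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tagged):
            word, pos = tagged[i]
        else:
            word, pos = "<pad>", "<pad>"  # position falls outside the sentence
        features.extend([word.lower(), pos])
    return features

tagged = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
          ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
          ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```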
Relevant Linguistic Information (2)
• Co-occurrence: information about the words that occur anywhere in the window, regardless of position
• Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, …, guitar, band)
• Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)]
• Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, …" -> [0,0,0,1,0,0,0,0,0,0,1,0]
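A companion sketch (again not from the slides) for the co-occurrence representation. The slide elides the middle vocabulary entries with "…", so the w5…w9 entries below are hypothetical placeholders whose only job is to give the vector the same 12 positions.

```python
# k content words that most frequently co-occur with "bass"; the w5..w9
# entries are hypothetical fillers for the words the slide elides.
VOCAB = ["fishing", "big", "sound", "player", "fly",
         "w5", "w6", "w7", "w8", "w9", "guitar", "band"]

def cooccurrence_vector(window_words, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the context window."""
    counts = [0] * len(vocab)
    for w in window_words:
        w = w.lower()
        if w in vocab:
            counts[vocab.index(w)] += 1
    return counts

window = "an electric guitar and player stand off to one side".split()
print(cooccurrence_vector(window))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```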
Training Data Examples
Assume: bass-music encoded as 0, bass-fish encoded as 1.
Collocational vectors (last element is the sense label):
[guitar, NN, and, CJC, player, NN, stand, VVB, 0]
[a, AT0, sea, CJC, to, PRP, me, PNP, 1]
[play, VVB, the, AT0, with, PRP, others, PNP, 0]
[……… ]
Co-occurrence vectors (last element is the sense label):
[0,0,0,1,0,0,0,0,0,0,1,0,0]
[1,0,0,0,0,0,0,0,0,0,0,0,1]
[1,0,0,0,0,0,0,0,0,0,0,1,1]
[…………………..]
• Inputs to the classifier at test time (no label):
[guitar, NN, and, CJC, could, VM0, be, VVI]
[1,1,0,0,0,1,0,0,0,0,0,0]
ML for Classifiers
Training data (collocational and co-occurrence features) is fed to a machine learning algorithm, which outputs a classifier. Candidate methods:
• Naïve Bayes
• Decision lists
• Decision trees
• Neural nets
• Support vector machines
• Nearest neighbor methods
• …
Naïve Bayes

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s \mid V) = \operatorname*{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)$$

where $V = (v_1, \ldots, v_n)$ is the feature vector; the second step follows from the (naïve) independence assumption among the features.
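A minimal sketch (not from the slides) of this classifier over feature lists like the collocational vectors above, using add-one smoothing so unseen features do not zero out a sense. The toy training data is illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(instances):
    """instances: list of (feature_list, sense). Returns count tables."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in instances:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab

def classify_nb(model, feats):
    """argmax_s of log P(s) + sum_j log P(v_j | s), add-one smoothed."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)                        # log P(s)
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for f in feats:                                    # log P(v_j | s)
            score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

data = [(["guitar", "and", "player", "stand"], "music"),
        (["a", "sea", "to", "me"], "fish"),
        (["play", "the", "with", "others"], "music")]
model = train_nb(data)
print(classify_nb(model, ["guitar", "could", "be"]))  # -> music
```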
Naïve Bayes: Evaluation
Experiment comparing different classifiers [Mooney 96]:
• Naïve Bayes and a neural network achieved the highest performance
• 73% accuracy in assigning one of six senses to "line"
• Is this good?
  – Simplest baseline: "most frequent sense"
  – Ceiling: human inter-annotator agreement, 75%-80% on refined sense distinctions (WordNet), closer to 90% for binary distinctions
Bootstrapping
• What if you don't have enough data to train a system?
• Start from a small training set of seeds, learn a classifier, use it to classify more data, add the newly classified data to the training set, and repeat.
Bootstrapping: how to pick the seeds
• Hand-labeling (Hearst 1991): likely correct and likely to be prototypical
• One sense per collocation (Yarowsky 1995): e.g., for bass, "play" is strongly associated with the music sense, whereas "fish" is strongly associated with the fish sense
• One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
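A minimal sketch (not from the slides) of the bootstrapping loop itself. The train and classify arguments are assumed helpers (any supervised WSD setup like the one above would do), and the confidence threshold is an illustrative choice.

```python
def bootstrap(seed_labeled, unlabeled, train, classify,
              threshold=0.9, max_rounds=10):
    """Grow the labeled set by repeatedly self-labeling confident cases."""
    labeled = list(seed_labeled)   # [(instance, sense), ...] from the seeds
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            sense, prob = classify(model, x)  # assumed: (label, confidence)
            (confident if prob >= threshold else rest).append((x, sense))
        if not confident:
            break                  # nothing new labeled confidently; stop
        labeled.extend(confident)  # newly classified data joins the training set
        pool = [x for x, _ in rest]
    return train(labeled)
```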
Unsupervised Methods [Schütze '98]
• Training data: (word + vector)_1 … (word + vector)_n, with no sense labels
• Machine learning (clustering) groups the training instances into K clusters c_i
• Hand-label each cluster with a sense: (c_1 sense_1), …
• To disambiguate a new (word + vector), assign the sense of the most similar cluster (vector/cluster similarity)
Agglomerative Clustering
• Assign each instance to its own cluster
• Repeat: merge the two clusters that are most similar
• Until the specified number of clusters is reached
• If there are too many training instances -> random sampling
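A minimal sketch (not from the slides) of this procedure over context vectors, using cosine similarity between cluster centroids. The naive pairwise search is cubic in the number of instances, which is exactly why the slide suggests random sampling when there are too many of them.

```python
import math

def cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norms if norms else 0.0

def centroid(cluster):
    n = len(cluster)
    return [sum(vec[i] for vec in cluster) / n for i in range(len(cluster[0]))]

def agglomerate(vectors, k):
    """Merge the two most similar clusters until only k remain."""
    clusters = [[v] for v in vectors]  # each instance starts in its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the most similar pair
        del clusters[j]
    return clusters

print(agglomerate([[1, 0], [0.9, 0.1], [0, 1]], k=2))
# [[[1, 0], [0.9, 0.1]], [[0, 1]]]
```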
Problems
• Given these general ML approaches, how many classifiers do we need to perform WSD robustly? One for each ambiguous word in the language.
• How do you decide what set of tags/labels/senses to use for a given word? It depends on the application.
WSD: Dictionary and Thesaurus Methods
Most common: the Lesk method
• Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
• Exclude stop words
• Def: the set of words in the gloss for a sense is called its signature
Lesk: Example
Two senses for channel:
S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rain water into a series of channels under the street"
S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels" …
Target sentence: "most streets close to the TV station were flooded because the main channel was clogged by heavy rain."
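A minimal sketch (not from the slides) of the simplified Lesk idea applied to this example: score each sense by the overlap between its gloss signature and the context words, excluding stop words. The stop-word list and the gloss strings are illustrative abridgements.

```python
STOP_WORDS = {"a", "an", "the", "to", "of", "for", "or", "and", "its",
              "with", "through", "was", "were", "by", "in", "than"}

def signature(text):
    """Content words of a gloss (or context), stop words excluded."""
    words = {w.strip('".;,()').lower() for w in text.split()}
    return {w for w in words if w and w not in STOP_WORDS}

def simplified_lesk(senses, context):
    """senses: dict sense_id -> gloss. Return the best-overlapping sense."""
    context_sig = signature(context)
    return max(senses, key=lambda s: len(signature(senses[s]) & context_sig))

senses = {
    "S1": "a passage for water or other fluids to flow through; "
          "the fields were crossed with irrigation channels",
    "S2": "a television station and its programs; a satellite TV channel",
}
context = ("most streets close to the TV station were flooded because "
           "the main channel was clogged by heavy rain")
print(simplified_lesk(senses, context))  # S2: shares "TV", "station", "channel"
```

Note that the gloss overlap favors S2 here even though the flooded-street context arguably calls for the water sense S1; enriching each signature with corpus sentences, as Corpus Lesk does on the next slide, is one way to mitigate exactly this kind of error.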
Corpus Lesk (best performer)
• Usable if a corpus annotated with senses is available
• For each sense, add to the signature for that sense the words that frequently appear in the sentences containing that sense
• CORPUS: … "most streets close to the TV station were flooded because the main <S1> channel </S1> was clogged by heavy rain." …
WSD: More Recent Trends
• SemEval workshops; Cross Language Evaluation Forum (CLEF)
• Better ML techniques (e.g., combining classifiers)
• Combining ML and Lesk (Yuret, 2004)
• Other languages
• Building better/larger corpora
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Word Similarity / Semantic Distance
Actually a relation between two senses: sun vs. moon, mouth vs. food, hot vs. cold.
Applications?
• Thesaurus methods: measure the distance in online thesauri (e.g., WordNet)
• Distributional methods: determine whether the two words appear in similar contexts
WS: Thesaurus Methods (path-length)
• Path-length similarity based on is-a hierarchies:

$$\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)$$

• If we do not have word sense disambiguation, take the maximum over all sense pairs:

$$\mathrm{wordsim}(w_1, w_2) = \max_{c_1 \in \mathrm{senses}(w_1),\; c_2 \in \mathrm{senses}(w_2)} \mathrm{sim}(c_1, c_2)$$
WS: Thesaurus Methods (info content)
• Not all edges are equal: add probabilistic information derived from a corpus.
• Probability of a concept c, where words(c) is the set of words subsumed by c and N is the total number of word tokens:

$$P(c) = \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}$$

• Information content: $\mathrm{IC}(c) = -\log P(c)$
• $\mathrm{LCS}(c_1, c_2)$: the lowest common subsumer of $c_1$ and $c_2$
• Resnik similarity:

$$\mathrm{sim}_{\mathrm{resnik}}(c_1, c_2) = -\log P(\mathrm{LCS}(c_1, c_2))$$
WS: Thesaurus Methods (info content, continued)
• One of the best performers: the Jiang-Conrath distance

$$\mathrm{dist}_{\mathrm{JC}}(c_1, c_2) = 2 \log P(\mathrm{LCS}(c_1, c_2)) - (\log P(c_1) + \log P(c_2))$$

• This is a measure of distance: take the reciprocal for similarity.
• See also Extended Lesk.
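A minimal sketch (not from the slides) that puts the path-length, Resnik, and Jiang-Conrath measures side by side on a toy is-a hierarchy. The taxonomy and the concept probabilities are invented for illustration; real systems derive them from WordNet plus corpus counts.

```python
import math

# Toy is-a hierarchy and invented concept probabilities P(c).
PARENT = {"nickel": "coin", "dime": "coin", "coin": "money",
          "money": "entity", "credit_card": "entity"}
P = {"nickel": 0.01, "dime": 0.01, "coin": 0.05,
     "money": 0.2, "credit_card": 0.1, "entity": 1.0}

def ancestors(c):
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: first ancestor of c1 that also subsumes c2."""
    a2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in a2)

def pathlen(c1, c2):
    common = lcs(c1, c2)
    # number of is-a edges between c1 and c2 (assumes c1 != c2)
    return ancestors(c1).index(common) + ancestors(c2).index(common)

def sim_path(c1, c2):
    return -math.log(pathlen(c1, c2))

def sim_resnik(c1, c2):
    return -math.log(P[lcs(c1, c2)])

def dist_jc(c1, c2):
    return 2 * math.log(P[lcs(c1, c2)]) - (math.log(P[c1]) + math.log(P[c2]))

print(lcs("nickel", "dime"))                       # coin
print(round(sim_resnik("nickel", "dime"), 2))      # 3.0  (= -log 0.05)
print(round(dist_jc("nickel", "dime"), 2))         # 3.22
print(round(dist_jc("nickel", "credit_card"), 2))  # 6.91: farther apart
```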
Best Performers
• Jiang-Conrath
• Extended Lesk
• WordNet::Similarity package (Pedersen et al. 2004)
WS: Distributional Methods
• Use when you do not have a thesaurus for the target language
• Even if you have a thesaurus:
  – it may be missing domain-specific (e.g., technical) words
  – hyponym knowledge is poor for verbs, and absent for adjectives and adverbs
  – it is difficult to compare senses from different hierarchies
• Solution: extract similarity from corpora
• Basic idea: two words are similar if they appear in similar contexts
WS: Distributional Methods (1)
• Context as a feature vector: $\vec{w} = (f_1, f_2, \ldots, f_N)$
• Example: $f_i$ = how many times $w_i$ appeared in the neighborhood of $w$ (content words only; apply a stop list)
WS: Distributional Methods (2)
• More informative values (referred to as weights or measures of association in the literature)
• Pointwise Mutual Information:

$$\mathrm{assoc}_{\mathrm{PMI}}(w, w_i) = \log_2 \frac{P(w, w_i)}{P(w)\,P(w_i)}$$

• t-test:

$$\mathrm{assoc}_{t\text{-}\mathrm{test}}(w, w_i) = \frac{P(w, w_i) - P(w)\,P(w_i)}{\sqrt{P(w)\,P(w_i)}}$$
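A minimal sketch (not from the slides) of both association weights computed from probabilities. The counts are invented for illustration, with maximum-likelihood estimates standing in for real corpus statistics.

```python
import math

def assoc_pmi(p_joint, p_w, p_wi):
    return math.log2(p_joint / (p_w * p_wi))

def assoc_ttest(p_joint, p_w, p_wi):
    return (p_joint - p_w * p_wi) / math.sqrt(p_w * p_wi)

# Invented counts: the pair co-occurs 20 times over N = 10,000 windows.
N = 10_000
p_joint, p_w, p_wi = 20 / N, 150 / N, 300 / N
print(round(assoc_pmi(p_joint, p_w, p_wi), 2))    # 2.15: positive association
print(round(assoc_ttest(p_joint, p_w, p_wi), 3))  # 0.073: same direction
```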
WS: Distributional Methods (3)
• Similarity between vectors
• Cosine (not sensitive to extreme values):

$$\mathrm{sim}_{\mathrm{cosine}}(\vec v, \vec w) = \frac{\vec v \cdot \vec w}{|\vec v|\,|\vec w|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$

• Jaccard (normalized, weighted number of overlapping features):

$$\mathrm{sim}_{\mathrm{Jaccard}}(\vec v, \vec w) = \frac{\sum_{i=1}^{N} \min(v_i, w_i)}{\sum_{i=1}^{N} \max(v_i, w_i)}$$
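A minimal sketch (not from the slides) of both vector similarities over small count vectors; the vectors are invented for illustration.

```python
import math

def sim_cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norms if norms else 0.0

def sim_jaccard(v, w):
    num = sum(min(a, b) for a, b in zip(v, w))  # weighted feature overlap
    den = sum(max(a, b) for a, b in zip(v, w))  # normalizer
    return num / den if den else 0.0

v = [2, 0, 1, 3]
w = [1, 1, 0, 3]
print(round(sim_cosine(v, w), 2))   # 0.89
print(round(sim_jaccard(v, w), 2))  # 0.57  (= 4/7)
```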
WS: Distributional Methods (4)
• Best combination overall (Curran 2003):
  – t-test for the weights
  – Jaccard (or Dice) for the vector similarity
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Semantic Role Labeling
Typically framed as a classification problem [Gildea, Jurafsky 2002]:
1. Assign a parse tree to the input
2. Find all predicate-bearing words (PropBank, FrameNet)
3. For each "governing" predicate, determine for each syntactic constituent which role (if any) it plays with respect to the predicate
Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position, … and many others. A sketch of this pipeline follows the example below.
Semantic Role Labeling: Example
[issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, …] -> ARG0
(feature vector: predicate, phrase type, head word and its POS, path, voice, linear position, … and many others)
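A minimal sketch (not from the slides) of the three-step pipeline's control flow. Every helper passed in (parse, find_predicates, constituents, extract_features, role_classifier) is an assumed component rather than a real API; the sketch only shows how the steps fit together.

```python
def label_roles(sentence, parse, find_predicates, constituents,
                extract_features, role_classifier):
    tree = parse(sentence)                    # step 1: assign a parse tree
    labels = []
    for pred in find_predicates(tree):        # step 2: predicate-bearing words
        for node in constituents(tree):       # step 3: classify each constituent
            feats = extract_features(node, pred, tree)  # phrase type, head word,
            role = role_classifier(feats)               # path, voice, position, ...
            if role is not None:   # None: no role w.r.t. this predicate
                labels.append((pred, node, role))
    return labels
```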
Next Time
• Discourse and Dialog: Overview of Chapters 21 and 24
WordSim: Thesaurus Methods (Extended Lesk)
• For each n-word phrase that occurs in both glosses, Extended Lesk adds a score of n²
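A minimal sketch (not from the slides) of that scoring rule. The phrase matching is a simple longest-first greedy scan over one gloss, which approximates rather than reproduces the published algorithm.

```python
def overlap_score(gloss1, gloss2):
    """Sum n**2 over shared phrases, found by a greedy longest-first scan."""
    w1, w2 = gloss1.lower().split(), gloss2.lower().split()
    score, i = 0, 0
    while i < len(w1):
        best = 0
        for j in range(len(w1), i, -1):      # longest phrase starting at i
            phrase = w1[i:j]
            if any(w2[k:k + len(phrase)] == phrase
                   for k in range(len(w2) - len(phrase) + 1)):
                best = len(phrase)
                break
        if best:
            score += best ** 2               # an n-word overlap adds n^2
            i += best
        else:
            i += 1
    return score

print(overlap_score("lowest part of the musical range",
                    "the lowest part in polyphonic music"))
# 5 = 2**2 for "lowest part" + 1**2 for "the"
```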
WS: Thesaurus Methods (recap)
• Path-length based similarity on hyper/hyponym hierarchies:

$$\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)$$

• Information content word similarity (not all edges are equal):

$$P(c) = \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}, \qquad \mathrm{IC}(c) = -\log P(c)$$

$$\mathrm{sim}_{\mathrm{resnik}}(c_1, c_2) = -\log P(\mathrm{LCS}(c_1, c_2))$$

where LCS is the lowest common subsumer.