TRANSCRIPT

CPSC 503 Computational Linguistics
Computational Lexical Semantics, Lecture 14
Giuseppe Carenini
Today 23/10
Three well-defined semantic tasks:
• Word Sense Disambiguation (corpus and thesaurus methods)
• Word Similarity (thesaurus and corpus methods)
• Semantic Role Labeling (corpus)
WSD example: table + context -> one of senses [1-6]
The noun "table" has 6 senses in WordNet:
1. table, tabular array -- (a set of data …)
2. table -- (a piece of furniture …)
3. table -- (a piece of furniture with tableware …)
4. mesa, table -- (flat tableland …)
5. table -- (a company of people …)
6. board, table -- (food or meals …)
WSD Methods
• Machine Learning
  – Supervised
  – Unsupervised
• Dictionary / Thesaurus (Lesk)
Supervised ML Approaches to WSD
Training data: ((word + context_1) sense_1) … ((word + context_n) sense_n)
A machine learning algorithm is trained on these pairs and produces a classifier that maps (word + context) to a sense.
Training Data Example
"…after the soup she had bass with a big salad…"
Each training instance is ((word + context) sense)_i: the context is the text surrounding the target word, and the sense label is, for example:
• one of the 8 possible senses for "bass" in WordNet, or
• one of the 2 key distinct senses for "bass" in WordNet (music vs. fish)
WordNet Bass: music vs. fish
The noun "bass" has 8 senses in WordNet:
1. bass -- (the lowest part of the musical range)
2. bass, bass part -- (the lowest part in polyphonic music)
3. bass, basso -- (an adult male singer with …)
4. sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass -- (any of various North American lean-fleshed ………)
6. bass, bass voice, basso -- (the lowest adult male singing voice)
7. bass -- (the member with the lowest range of a family of musical instruments)
8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Representations for Context
• GOAL: an informative characterization of the window of text surrounding the target word
• Supervised ML requires a simple representation for the training data: vectors of feature/value pairs
• TASK: select the relevant linguistic information and encode it as a feature vector
Relevant Linguistic Information (1)
• Collocational: information about the words that appear in specific positions to the right and left of the target word; typically the words and their POS tags
• Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, …"
Assuming a window of +/- 2 around the target:
[guitar, NN, and, CJC, player, NN, stand, VVB]
i.e., [word at position -n, POS at position -n, …, word at position +n, POS at position +n]
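As a concrete illustration, here is a minimal sketch (not from the slides) of extracting such a collocational feature vector in Python. It assumes the sentence is already tokenized and POS-tagged; the `<pad>` convention for positions outside the sentence is an illustrative choice.

```python
def collocational_features(tagged, target_index, window=2):
    """Return [w-n, POS-n, ..., w+n, POS+n] around the target word."""
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tagged):
            word, pos = tagged[i]
        else:
            word, pos = "<pad>", "<pad>"  # position falls outside the sentence
        features.extend([word.lower(), pos])
    return features

tagged = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
          ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
          ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```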
Relevant Linguistic Information (2)
• Co-occurrence: information about the words that occur anywhere in the window, regardless of position
• Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, …, guitar, band)
• Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)]
• Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, …" -> [0,0,0,1,0,0,0,0,0,0,1,0]
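A companion sketch (again not from the slides) for the co-occurrence representation. The slide elides the middle vocabulary entries with "…", so the w5…w9 entries below are hypothetical placeholders whose only job is to give the vector the same 12 positions.

```python
# k content words that most frequently co-occur with "bass"; the w5..w9
# entries are hypothetical fillers for the words the slide elides.
VOCAB = ["fishing", "big", "sound", "player", "fly",
         "w5", "w6", "w7", "w8", "w9", "guitar", "band"]

def cooccurrence_vector(window_words, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the context window."""
    counts = [0] * len(vocab)
    for w in window_words:
        w = w.lower()
        if w in vocab:
            counts[vocab.index(w)] += 1
    return counts

window = "an electric guitar and player stand off to one side".split()
print(cooccurrence_vector(window))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```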
Training Data Examples
Assume: bass-music encoded as 0, bass-fish encoded as 1.
Collocational vectors (last element is the sense label):
[guitar, NN, and, CJC, player, NN, stand, VVB, 0]
[a, AT0, sea, CJC, to, PRP, me, PNP, 1]
[play, VVB, the, AT0, with, PRP, others, PNP, 0]
[……… ]
Co-occurrence vectors (last element is the sense label):
[0,0,0,1,0,0,0,0,0,0,1,0,0]
[1,0,0,0,0,0,0,0,0,0,0,0,1]
[1,0,0,0,0,0,0,0,0,0,0,1,1]
[…………………..]
• Inputs to the classifier at test time (no label):
[guitar, NN, and, CJC, could, VM0, be, VVI]
[1,1,0,0,0,1,0,0,0,0,0,0]
ML for Classifiers
Training data (collocational and co-occurrence features) is fed to a machine learning algorithm, which outputs a classifier. Candidate methods:
• Naïve Bayes
• Decision lists
• Decision trees
• Neural nets
• Support vector machines
• Nearest neighbor methods
• …
Naïve Bayes

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s \mid V) = \operatorname*{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)$$

where $V = (v_1, \ldots, v_n)$ is the feature vector; the second step follows from the (naïve) independence assumption among the features.
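A minimal sketch (not from the slides) of this classifier over feature lists like the collocational vectors above, using add-one smoothing so unseen features do not zero out a sense. The toy training data is illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(instances):
    """instances: list of (feature_list, sense). Returns count tables."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in instances:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab

def classify_nb(model, feats):
    """argmax_s of log P(s) + sum_j log P(v_j | s), add-one smoothed."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)                        # log P(s)
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for f in feats:                                    # log P(v_j | s)
            score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

data = [(["guitar", "and", "player", "stand"], "music"),
        (["a", "sea", "to", "me"], "fish"),
        (["play", "the", "with", "others"], "music")]
model = train_nb(data)
print(classify_nb(model, ["guitar", "could", "be"]))  # -> music
```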
Naïve Bayes: Evaluation
Experiment comparing different classifiers [Mooney 96]:
• Naïve Bayes and a neural network achieved the highest performance
• 73% accuracy in assigning one of six senses to "line"
• Is this good?
  – Simplest baseline: "most frequent sense"
  – Ceiling: human inter-annotator agreement, 75%-80% on refined sense distinctions (WordNet), closer to 90% for binary distinctions
Bootstrapping
• What if you don't have enough data to train a system?
• Start from a small training set of seeds, learn a classifier, use it to classify more data, add the newly classified data to the training set, and repeat.
Bootstrapping: how to pick the seeds
• Hand-labeling (Hearst 1991): likely correct and likely to be prototypical
• One sense per collocation (Yarowsky 1995): e.g., for bass, "play" is strongly associated with the music sense, whereas "fish" is strongly associated with the fish sense
• One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
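A minimal sketch (not from the slides) of the bootstrapping loop itself. The train and classify arguments are assumed helpers (any supervised WSD setup like the one above would do), and the confidence threshold is an illustrative choice.

```python
def bootstrap(seed_labeled, unlabeled, train, classify,
              threshold=0.9, max_rounds=10):
    """Grow the labeled set by repeatedly self-labeling confident cases."""
    labeled = list(seed_labeled)   # [(instance, sense), ...] from the seeds
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            sense, prob = classify(model, x)  # assumed: (label, confidence)
            (confident if prob >= threshold else rest).append((x, sense))
        if not confident:
            break                  # nothing new labeled confidently; stop
        labeled.extend(confident)  # newly classified data joins the training set
        pool = [x for x, _ in rest]
    return train(labeled)
```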
Unsupervised Methods [Schütze '98]
• Training data: (word + vector)_1 … (word + vector)_n, with no sense labels
• Machine learning (clustering) groups the training instances into K clusters c_i
• Hand-label each cluster with a sense: (c_1 sense_1), …
• To disambiguate a new (word + vector), assign the sense of the most similar cluster (vector/cluster similarity)
Agglomerative Clustering
• Assign each instance to its own cluster
• Repeat: merge the two clusters that are most similar
• Until the specified number of clusters is reached
• If there are too many training instances -> random sampling
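A minimal sketch (not from the slides) of this procedure over context vectors, using cosine similarity between cluster centroids. The naive pairwise search is cubic in the number of instances, which is exactly why the slide suggests random sampling when there are too many of them.

```python
import math

def cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norms if norms else 0.0

def centroid(cluster):
    n = len(cluster)
    return [sum(vec[i] for vec in cluster) / n for i in range(len(cluster[0]))]

def agglomerate(vectors, k):
    """Merge the two most similar clusters until only k remain."""
    clusters = [[v] for v in vectors]  # each instance starts in its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the most similar pair
        del clusters[j]
    return clusters

print(agglomerate([[1, 0], [0.9, 0.1], [0, 1]], k=2))
# [[[1, 0], [0.9, 0.1]], [[0, 1]]]
```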
Problems
• Given these general ML approaches, how many classifiers do we need to perform WSD robustly? One for each ambiguous word in the language.
• How do you decide what set of tags/labels/senses to use for a given word? It depends on the application.
WSD: Dictionary and Thesaurus Methods
Most common: the Lesk method
• Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
• Exclude stop words
• Def: the set of words in the gloss for a sense is called its signature
Lesk: Example
Two senses for channel:
S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rain water into a series of channels under the street"
S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels" …
Target sentence: "most streets close to the TV station were flooded because the main channel was clogged by heavy rain."
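A minimal sketch (not from the slides) of the simplified Lesk idea applied to this example: score each sense by the overlap between its gloss signature and the context words, excluding stop words. The stop-word list and the gloss strings are illustrative abridgements.

```python
STOP_WORDS = {"a", "an", "the", "to", "of", "for", "or", "and", "its",
              "with", "through", "was", "were", "by", "in", "than"}

def signature(text):
    """Content words of a gloss (or context), stop words excluded."""
    words = {w.strip('".;,()').lower() for w in text.split()}
    return {w for w in words if w and w not in STOP_WORDS}

def simplified_lesk(senses, context):
    """senses: dict sense_id -> gloss. Return the best-overlapping sense."""
    context_sig = signature(context)
    return max(senses, key=lambda s: len(signature(senses[s]) & context_sig))

senses = {
    "S1": "a passage for water or other fluids to flow through; "
          "the fields were crossed with irrigation channels",
    "S2": "a television station and its programs; a satellite TV channel",
}
context = ("most streets close to the TV station were flooded because "
           "the main channel was clogged by heavy rain")
print(simplified_lesk(senses, context))  # S2: shares "TV", "station", "channel"
```

Note that the gloss overlap favors S2 here even though the flooded-street context arguably calls for the water sense S1; enriching each signature with corpus sentences, as Corpus Lesk does on the next slide, is one way to mitigate exactly this kind of error.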
Corpus Lesk (best performer)
• Usable if a corpus annotated with senses is available
• For each sense, add to the signature for that sense the words that frequently appear in the sentences containing that sense
• CORPUS: … "most streets close to the TV station were flooded because the main <S1> channel </S1> was clogged by heavy rain." …
WSD: More Recent Trends
• SemEval workshops; Cross Language Evaluation Forum (CLEF)
• Better ML techniques (e.g., combining classifiers)
• Combining ML and Lesk (Yuret, 2004)
• Other languages
• Building better/larger corpora
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Word Similarity / Semantic Distance
Actually a relation between two senses: sun vs. moon, mouth vs. food, hot vs. cold.
Applications?
• Thesaurus methods: measure the distance in online thesauri (e.g., WordNet)
• Distributional methods: determine whether the two words appear in similar contexts
WS: Thesaurus Methods (path-length)
• Path-length similarity based on is-a hierarchies:

$$\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)$$

• If we do not have word sense disambiguation, take the maximum over all sense pairs:

$$\mathrm{wordsim}(w_1, w_2) = \max_{c_1 \in \mathrm{senses}(w_1),\; c_2 \in \mathrm{senses}(w_2)} \mathrm{sim}(c_1, c_2)$$
WS: Thesaurus Methods (info content)
• Not all edges are equal: add probabilistic information derived from a corpus.
• Probability of a concept c, where words(c) is the set of words subsumed by c and N is the total number of word tokens:

$$P(c) = \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}$$

• Information content: $\mathrm{IC}(c) = -\log P(c)$
• $\mathrm{LCS}(c_1, c_2)$: the lowest common subsumer of $c_1$ and $c_2$
• Resnik similarity:

$$\mathrm{sim}_{\mathrm{resnik}}(c_1, c_2) = -\log P(\mathrm{LCS}(c_1, c_2))$$
WS: Thesaurus Methods (info content, continued)
• One of the best performers: the Jiang-Conrath distance

$$\mathrm{dist}_{\mathrm{JC}}(c_1, c_2) = 2 \log P(\mathrm{LCS}(c_1, c_2)) - (\log P(c_1) + \log P(c_2))$$

• This is a measure of distance: take the reciprocal for similarity.
• See also Extended Lesk.
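A minimal sketch (not from the slides) that puts the path-length, Resnik, and Jiang-Conrath measures side by side on a toy is-a hierarchy. The taxonomy and the concept probabilities are invented for illustration; real systems derive them from WordNet plus corpus counts.

```python
import math

# Toy is-a hierarchy and invented concept probabilities P(c).
PARENT = {"nickel": "coin", "dime": "coin", "coin": "money",
          "money": "entity", "credit_card": "entity"}
P = {"nickel": 0.01, "dime": 0.01, "coin": 0.05,
     "money": 0.2, "credit_card": 0.1, "entity": 1.0}

def ancestors(c):
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: first ancestor of c1 that also subsumes c2."""
    a2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in a2)

def pathlen(c1, c2):
    common = lcs(c1, c2)
    # number of is-a edges between c1 and c2 (assumes c1 != c2)
    return ancestors(c1).index(common) + ancestors(c2).index(common)

def sim_path(c1, c2):
    return -math.log(pathlen(c1, c2))

def sim_resnik(c1, c2):
    return -math.log(P[lcs(c1, c2)])

def dist_jc(c1, c2):
    return 2 * math.log(P[lcs(c1, c2)]) - (math.log(P[c1]) + math.log(P[c2]))

print(lcs("nickel", "dime"))                       # coin
print(round(sim_resnik("nickel", "dime"), 2))      # 3.0  (= -log 0.05)
print(round(dist_jc("nickel", "dime"), 2))         # 3.22
print(round(dist_jc("nickel", "credit_card"), 2))  # 6.91: farther apart
```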
Best Performers
• Jiang-Conrath
• Extended Lesk
• WordNet::Similarity package (Pedersen et al. 2004)
WS: Distributional Methods
• Use when you do not have a thesaurus for the target language
• Even if you have a thesaurus:
  – it may be missing domain-specific (e.g., technical) words
  – hyponym knowledge is poor for verbs, and absent for adjectives and adverbs
  – it is difficult to compare senses from different hierarchies
• Solution: extract similarity from corpora
• Basic idea: two words are similar if they appear in similar contexts
WS: Distributional Methods (1)
• Context as a feature vector: $\vec{w} = (f_1, f_2, \ldots, f_N)$
• Example: $f_i$ = how many times $w_i$ appeared in the neighborhood of $w$ (content words only; apply a stop list)
WS: Distributional Methods (2)
• More informative values (referred to as weights or measures of association in the literature)
• Pointwise Mutual Information:

$$\mathrm{assoc}_{\mathrm{PMI}}(w, w_i) = \log_2 \frac{P(w, w_i)}{P(w)\,P(w_i)}$$

• t-test:

$$\mathrm{assoc}_{t\text{-}\mathrm{test}}(w, w_i) = \frac{P(w, w_i) - P(w)\,P(w_i)}{\sqrt{P(w)\,P(w_i)}}$$
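A minimal sketch (not from the slides) of both association weights computed from probabilities. The counts are invented for illustration, with maximum-likelihood estimates standing in for real corpus statistics.

```python
import math

def assoc_pmi(p_joint, p_w, p_wi):
    return math.log2(p_joint / (p_w * p_wi))

def assoc_ttest(p_joint, p_w, p_wi):
    return (p_joint - p_w * p_wi) / math.sqrt(p_w * p_wi)

# Invented counts: the pair co-occurs 20 times over N = 10,000 windows.
N = 10_000
p_joint, p_w, p_wi = 20 / N, 150 / N, 300 / N
print(round(assoc_pmi(p_joint, p_w, p_wi), 2))    # 2.15: positive association
print(round(assoc_ttest(p_joint, p_w, p_wi), 3))  # 0.073: same direction
```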
WS: Distributional Methods (3)
• Similarity between vectors
• Cosine (not sensitive to extreme values):

$$\mathrm{sim}_{\mathrm{cosine}}(\vec v, \vec w) = \frac{\vec v \cdot \vec w}{|\vec v|\,|\vec w|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$

• Jaccard (normalized, weighted number of overlapping features):

$$\mathrm{sim}_{\mathrm{Jaccard}}(\vec v, \vec w) = \frac{\sum_{i=1}^{N} \min(v_i, w_i)}{\sum_{i=1}^{N} \max(v_i, w_i)}$$
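A minimal sketch (not from the slides) of both vector similarities over small count vectors; the vectors are invented for illustration.

```python
import math

def sim_cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norms if norms else 0.0

def sim_jaccard(v, w):
    num = sum(min(a, b) for a, b in zip(v, w))  # weighted feature overlap
    den = sum(max(a, b) for a, b in zip(v, w))  # normalizer
    return num / den if den else 0.0

v = [2, 0, 1, 3]
w = [1, 1, 0, 3]
print(round(sim_cosine(v, w), 2))   # 0.89
print(round(sim_jaccard(v, w), 2))  # 0.57  (= 4/7)
```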
WS: Distributional Methods (4)
• Best combination overall (Curran 2003):
  – t-test for the weights
  – Jaccard (or Dice) for the vector similarity
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Semantic Role Labeling
Typically framed as a classification problem [Gildea, Jurafsky 2002]:
1. Assign a parse tree to the input
2. Find all predicate-bearing words (PropBank, FrameNet)
3. For each "governing" predicate, determine for each syntactic constituent which role (if any) it plays with respect to the predicate
Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position, … and many others. A sketch of this pipeline follows the example below.
Semantic Role Labeling: Example
[issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, …] -> ARG0
(feature vector: predicate, phrase type, head word and its POS, path, voice, linear position, … and many others)
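A minimal sketch (not from the slides) of the three-step pipeline's control flow. Every helper passed in (parse, find_predicates, constituents, extract_features, role_classifier) is an assumed component rather than a real API; the sketch only shows how the steps fit together.

```python
def label_roles(sentence, parse, find_predicates, constituents,
                extract_features, role_classifier):
    tree = parse(sentence)                    # step 1: assign a parse tree
    labels = []
    for pred in find_predicates(tree):        # step 2: predicate-bearing words
        for node in constituents(tree):       # step 3: classify each constituent
            feats = extract_features(node, pred, tree)  # phrase type, head word,
            role = role_classifier(feats)               # path, voice, position, ...
            if role is not None:   # None: no role w.r.t. this predicate
                labels.append((pred, node, role))
    return labels
```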
Next Time
• Discourse and Dialog: Overview of Chapters 21 and 24
WordSim: Thesaurus Methods (Extended Lesk)
• For each n-word phrase that occurs in both glosses, Extended Lesk adds a score of n²
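A minimal sketch (not from the slides) of that scoring rule. The phrase matching is a simple longest-first greedy scan over one gloss, which approximates rather than reproduces the published algorithm.

```python
def overlap_score(gloss1, gloss2):
    """Sum n**2 over shared phrases, found by a greedy longest-first scan."""
    w1, w2 = gloss1.lower().split(), gloss2.lower().split()
    score, i = 0, 0
    while i < len(w1):
        best = 0
        for j in range(len(w1), i, -1):      # longest phrase starting at i
            phrase = w1[i:j]
            if any(w2[k:k + len(phrase)] == phrase
                   for k in range(len(w2) - len(phrase) + 1)):
                best = len(phrase)
                break
        if best:
            score += best ** 2               # an n-word overlap adds n^2
            i += best
        else:
            i += 1
    return score

print(overlap_score("lowest part of the musical range",
                    "the lowest part in polyphonic music"))
# 5 = 2**2 for "lowest part" + 1**2 for "the"
```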
WS: Thesaurus Methods (recap)
• Path-length based similarity on hyper/hyponym hierarchies:

$$\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)$$

• Information content word similarity (not all edges are equal):

$$P(c) = \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}, \qquad \mathrm{IC}(c) = -\log P(c)$$

$$\mathrm{sim}_{\mathrm{resnik}}(c_1, c_2) = -\log P(\mathrm{LCS}(c_1, c_2))$$

where LCS is the lowest common subsumer.