CoNLL TRANSCRIPT
1
Combining Lexical and Syntactic Features for
Supervised Word Sense Disambiguation
Saif Mohammad, Univ. of Toronto
http://www.cs.toronto.edu/~smm
Ted Pedersen, Univ. of Minnesota, Duluth
http://www.d.umn.edu/~tpederse
2
Word Sense Disambiguation
Harry cast a bewitching spell.
We understand the target word "spell" in this context to mean a charm or incantation,
not "reading out letter by letter" or "a period of time".
Automatically identifying the intended sense of a word based on its context is hard!
Best accuracies are often around 65%-75%.
3
WSD as Classification
Learn a model for a given target word from a corpus of manually sense-tagged training examples.
The model assigns the target word a sense based on the context in which it occurs; the context is represented by a feature set.
Evaluate the model on a held-out test set.
4
Motivations
Lexical features do "reasonably well" at supervised WSD (Duluth systems in Senseval-2; Pedersen, NAACL-2001).
POS features do "reasonably well" too.
Complementary or redundant?
Complementary? Find the simplest ways to represent instances and combine results to improve performance.
Redundant? We can reduce the feature space without affecting performance.
5
Decision Trees
Assigns a sense to an instance by asking a series of questions.
Questions correspond to features of the instance and depend on previous answers.
In the tree:
The topmost node is called the root.
Each node corresponds to a feature.
Each value of a feature has a branch.
Each path terminates in a sense (a leaf).
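A minimal sketch of learning such a per-word sense classifier, using scikit-learn's DecisionTreeClassifier as a stand-in for the Weka J48 trees the slides mention; the toy instances, sense labels, and binary context features are invented for illustration:

```python
# Sketch: learning a per-word sense classifier as a decision tree.
# scikit-learn stands in for Weka's J48; the data is a toy example.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each training instance: binary features describing the context of "spell".
train = [
    ({"has_cast": 1, "has_letter": 0}, "incantation"),
    ({"has_cast": 0, "has_letter": 1}, "spell_out"),
    ({"has_cast": 0, "has_letter": 0, "has_dry": 1}, "period_of_time"),
]
X_dicts, y = zip(*train)

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(X_dicts)          # feature dicts -> numeric matrix

tree = DecisionTreeClassifier(random_state=0).fit(X, list(y))

# Classify a new context: "Harry cast a bewitching spell"
test = vec.transform([{"has_cast": 1}])  # absent features default to 0
print(tree.predict(test))                # -> ['incantation']
```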
6
WSD Tree
[Figure: example decision tree for WSD. The root node tests Feature 1; internal nodes test Features 2, 3, and 4; each branch is labeled with a feature value (0 or 1); each leaf assigns one of Senses 1-4.]
7
Why Decision Trees?
Many kinds of features can contribute to WSD performance.
Many learning algorithms result in comparable classifiers when given the same set of features.
A learned decision tree captures interactions among features.
Many implementations are available (e.g., Weka's J48).
8
Lexical Features
Surface form: the observed form of the target word.
Unigrams and bigrams: one- and two-word sequences, identified with the Ngram Statistics Package.
http://www.d.umn.edu/~tpederse/nsp.html
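A minimal sketch of extracting these lexical features in Python; this is a naive stand-in for NSP (a Perl package), and the tokenization and feature-naming scheme are invented for illustration:

```python
# Sketch of lexical feature extraction: surface form, unigrams, bigrams.
def lexical_features(tokens, target_index):
    feats = {"surface=" + tokens[target_index]: 1}  # observed form of target
    for w in tokens:                                # one-word sequences
        feats["uni=" + w] = 1
    for w1, w2 in zip(tokens, tokens[1:]):          # two-word sequences
        feats["bi=%s_%s" % (w1, w2)] = 1
    return feats

tokens = "harry cast a bewitching spell".split()
print(lexical_features(tokens, target_index=4))
```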
9
POS Features
The surrounding POS tags can indicate different senses:
Why did Jack turn/VB against/IN his/PRP$ team/NN
Why did Jack turn/VB left/NN at/IN the/DT crossing
Individual word POS features: P-2, P-1, P0, P1, P2.
Used individually and in combination.
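A minimal sketch of building the P-2 through P2 features from an already-tagged sentence (the slides use the Brill Tagger for tagging; the boundary-padding convention here is an assumption):

```python
# Sketch of the POS window features: tags of the target word and its
# two neighbours on either side.
def pos_window(tags, i, width=2):
    feats = {}
    for offset in range(-width, width + 1):
        j = i + offset
        feats["P%d" % offset] = tags[j] if 0 <= j < len(tags) else "NONE"
    return feats

# "Why did Jack turn/VB left/NN at/IN the/DT crossing"
tags = ["WRB", "VBD", "NNP", "VB", "NN", "IN", "DT", "NN"]
print(pos_window(tags, i=3))   # target = "turn"
# -> {'P-2': 'VBD', 'P-1': 'NNP', 'P0': 'VB', 'P1': 'NN', 'P2': 'IN'}
```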
10
Part of Speech Tagging
Brill Tagger: open source and easy to understand.
Guaranteed Pre-Tagging: manually tag the target words; implemented in BrillPatch.
11
Parse Features
Head word of the target phrase: the hard work, the hard surface.
Head word of the parent phrase: fasten the line, cross the line.
Target and parent phrase POS: noun phrase, verb phrase, ...
Used individually and in combination.
Obtained via the Collins Parser.
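A minimal sketch of reading these features off a parse tree with NLTK; real head finding, as in the Collins Parser, uses per-category head rules, so the toy rules below (a verb heads a VP, anything else is headed by its rightmost word) are a deliberate simplification:

```python
# Sketch of the parse features: head word and POS of the phrase containing
# the target, and of its parent phrase.
from nltk import Tree

def head_word(phrase):
    # Toy head rules; real head-finding tables are much richer.
    leaves = phrase.leaves()
    return leaves[0] if phrase.label() == "VP" else leaves[-1]

def parse_features(tree, target):
    # Locate the target leaf, then take the smallest phrase above its
    # preterminal and that phrase's parent.
    leaf = next(p for p in tree.treepositions("leaves") if tree[p] == target)
    phrase, parent = tree[leaf[:-2]], tree[leaf[:-3]]
    return {"head": head_word(phrase), "phrase_pos": phrase.label(),
            "parent_head": head_word(parent), "parent_pos": parent.label()}

t = Tree.fromstring(
    "(S (NP (PRP he)) (VP (VBD crossed) (NP (DT the) (NN line))))")
print(parse_features(t, "line"))
# -> {'head': 'line', 'phrase_pos': 'NP',
#     'parent_head': 'crossed', 'parent_pos': 'VP'}
```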
12
Experiments
How accurate are simple classifiers based on a single feature type?
How complementary or redundant are lexical and syntactic features?
Is it possible (in theory at least) to combine just a few very simple classifiers and achieve near state-of-the-art results?
13
Experiments
Learn a decision tree based on a single feature type: surface, unigram, bigram, POS, parse, ...
Combine pairs of these trees via a simple ensemble technique: a weighted vote.
14
Sense-Tagged Data
Senseval-2 data: 4328 test instances, 8611 training instances; 73 nouns, verbs, and adjectives.
Senseval-1 data: 8512 test instances, 13276 training instances; 35 nouns, verbs, and adjectives.
line, hard, interest, and serve data: 4149, 4337, 4378, and 2476 instances.
50,000 sense-tagged instances in all!
15
Lexical Features
Feature        Sval-2  Sval-1  line   hard   serve  interest
Majority       47.7%   56.3%   54.3%  81.5%  42.2%  54.9%
Surface Form   49.3%   62.9%   54.3%  81.5%  44.2%  64.0%
Unigram        55.3%   66.9%   74.5%  83.4%  73.3%  75.7%
Bigram         55.1%   66.9%   72.9%  89.5%  72.1%  79.9%
16
POS Features
Feature   Sval-2  Sval-1  line   hard   serve  interest
Majority  47.7%   56.3%   54.3%  81.5%  42.2%  54.9%
P-2       47.1%   57.5%   54.9%  81.6%  60.3%  56.0%
P-1       49.6%   59.2%   56.2%  82.1%  60.2%  62.7%
P0        49.9%   60.3%   54.3%  81.6%  58.0%  64.0%
P1        53.1%   63.9%   54.2%  81.6%  73.0%  65.3%
P2        48.9%   59.9%   54.3%  81.7%  75.7%  62.3%
17
Combining POS Features
Feature set           Sval-2  Sval-1  line   hard   serve  interest
Majority              47.7%   56.3%   54.3%  81.5%  42.2%  54.9%
P0, P1                54.3%   66.7%   54.1%  81.9%  60.2%  70.5%
P-1, P0, P1           54.6%   68.0%   60.4%  84.8%  73.0%  78.8%
P-2, P-1, P0, P1, P2  54.6%   67.8%   62.3%  86.2%  75.7%  80.6%
18
Parse Features
Feature            Sval-2  Sval-1  line   hard   serve  interest
Majority           47.7%   56.3%   54.3%  81.5%  42.2%  54.9%
Head Word          51.7%   64.3%   54.7%  87.8%  47.4%  69.1%
Parent Word        50.0%   60.6%   59.8%  84.5%  57.2%  67.8%
Phrase POS         52.9%   58.5%   54.3%  81.5%  41.4%  54.9%
Parent Phrase POS  52.7%   57.9%   54.3%  81.7%  41.6%  54.9%
19
Discussion
Lexical and syntactic features perform comparably. Do they get the same instances right?
Are there instances disambiguated by one feature set and not by the other?
How complementary are the individual feature sets?
20
Measures
Baseline Ensemble: the accuracy of a hypothetical ensemble that predicts the sense correctly only if both individual feature sets do so.
Optimal Ensemble: the accuracy of a hypothetical ensemble that predicts the sense correctly if either of the individual feature sets does so.
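These two reference points are easy to compute once we know, per test instance, whether each classifier was right. A minimal sketch with invented correctness vectors:

```python
# Sketch of the two hypothetical reference ensembles.
a_correct = [True, True, False, False, True]   # lexical classifier right?
b_correct = [True, False, True, False, True]   # syntactic classifier right?

n = len(a_correct)
baseline = sum(a and b for a, b in zip(a_correct, b_correct)) / n  # both right
optimal = sum(a or b for a, b in zip(a_correct, b_correct)) / n    # either right
print(baseline, optimal)   # -> 0.4 0.8
```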
21
Our Ensemble Approach
We use a weighted-vote ensemble to decide the sense of a target word.
For a given test instance, it takes the output of two classifiers (one lexical and one syntactic) and sums the probabilities associated with each possible sense.
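A minimal sketch of this weighted vote, assuming each classifier outputs a probability distribution over senses; the distributions and sense names below are invented:

```python
# Sketch of the weighted-vote ensemble: sum the two classifiers' per-sense
# probabilities and pick the sense with the highest total.
def weighted_vote(lexical_probs, syntactic_probs):
    senses = set(lexical_probs) | set(syntactic_probs)
    totals = {s: lexical_probs.get(s, 0.0) + syntactic_probs.get(s, 0.0)
              for s in senses}
    return max(totals, key=totals.get)

lexical_probs = {"incantation": 0.6, "spell_out": 0.3, "period_of_time": 0.1}
syntactic_probs = {"incantation": 0.4, "spell_out": 0.5, "period_of_time": 0.1}
print(weighted_vote(lexical_probs, syntactic_probs))   # -> 'incantation'
```

Summing distributions lets each classifier contribute in proportion to its confidence, rather than casting a single hard vote.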
22
Best Combinations
Data (Majority)   Set 1           Set 2              Base   Ours   Optimal  Best
Sval-2 (47.7%)    Unigrams 55.3%  P-1,P0,P1 55.3%    43.6%  57.0%  67.9%    66.7%
Sval-1 (56.3%)    Unigrams 66.9%  P-1,P0,P1 68.0%    57.6%  71.1%  78.0%    81.1%
line (54.3%)      Unigrams 74.5%  P-1,P0,P1 60.4%    55.1%  74.2%  82.0%    88.0%
hard (81.5%)      Bigrams 89.5%   Head,Parent 87.7%  86.1%  88.9%  91.3%    83.0%
serve (42.2%)     Unigrams 73.3%  P-1,P0,P1 73.0%    58.4%  81.6%  89.9%    83.0%
interest (54.9%)  Bigrams 79.9%   P-1,P0,P1 78.8%    67.6%  83.2%  90.1%    89.0%
23
Conclusions
There is a reasonable amount of complementarity across lexical and syntactic features.
Simple lexical and part-of-speech features can be combined to achieve state-of-the-art results.
Future work: how best to capitalize on the complementarity?
24
Senseval-3
Approx. 8000 training and 4000 test instances. English lexical sample task.
Training data collected via Open Mind Word Expert.
Comparative results unveiled at ACL workshop!
25
Software and Data
SyntaLex: WSD using lexical and syntactic features.
posSenseval: POS-tags data in Senseval-2 format using the Brill Tagger.
parseSenseval: parses Brill Tagger output using the Collins Parser.
BrillPatch: supports Guaranteed Pre-Tagging.
Packages to convert the line, hard, serve, and interest data to Senseval-1 and Senseval-2 data formats.
http://www.d.umn.edu/~tpederse/code.html
http://www.d.umn.edu/~tpederse/data.html
26
Individual Word POS: Senseval-1
Feature   All    Nouns  Verbs  Adj.
Majority  56.3%  57.2%  56.9%  64.3%
P-2       57.5%  58.2%  58.6%  64.0%
P-1       59.2%  62.2%  58.2%  64.3%
P0        60.3%  62.5%  58.2%  64.3%
P1        63.9%  65.4%  64.4%  66.2%
P2        59.9%  60.0%  60.8%  65.2%
27
Individual Word POS: Senseval-2
Feature   All    Nouns  Verbs  Adj.
Majority  47.7%  51.0%  39.7%  59.0%
P-2       47.1%  51.9%  38.0%  57.9%
P-1       49.6%  55.2%  40.2%  59.0%
P0        49.9%  55.7%  40.6%  58.2%
P1        53.1%  53.8%  49.1%  61.0%
P2        48.9%  50.2%  43.2%  59.4%
28
Parse Features: Senseval-1
Feature        All    Nouns  Verbs  Adj.
Majority       56.3%  57.2%  56.9%  64.3%
Head Word      64.3%  70.9%  59.8%  66.9%
Parent Word    60.6%  62.6%  60.3%  65.8%
Phrase         58.5%  57.5%  57.2%  66.2%
Parent Phrase  57.9%  58.1%  58.3%  66.2%
29
Parse Features: Senseval-2
Feature        All    Nouns  Verbs  Adj.
Majority       47.7%  51.0%  39.7%  59.0%
Head           51.7%  58.5%  39.8%  64.0%
Parent         50.0%  56.1%  40.1%  59.3%
Phrase         48.3%  51.7%  40.3%  59.5%
Parent Phrase  48.5%  53.0%  39.1%  60.3%