
Page 1: Conll

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation

Saif Mohammad, University of Toronto
http://www.cs.toronto.edu/~smm

Ted Pedersen, University of Minnesota, Duluth
http://www.d.umn.edu/~tpederse

Page 2: Conll

Word Sense Disambiguation

"Harry cast a bewitching spell"

We understand the target word spell in this context to mean "charm" or "incantation", not "reading out letter by letter" or "a period of time".

Automatically identifying the intended sense of a word based on its context is hard! Best accuracies are often around 65%-75%.

Page 3: Conll

WSD as Classification

Learn a model for a given target word from a corpus of manually sense-tagged training examples.

The model assigns the target word a sense based on the context in which it occurs; the context is represented by a feature set.

Evaluate the model on a held-out test set.
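A minimal sketch of this train-and-evaluate setup, using scikit-learn's decision tree as a stand-in for the tools actually used in the paper; the sense-tagged contexts and labels below are hypothetical:

```python
# Sketch of WSD as supervised classification (scikit-learn used for
# illustration; the paper's own system is the decision-tree-based SyntaLex).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical sense-tagged instances for the target word "spell":
# each pair is (context, sense label).
train = [("harry cast a bewitching spell", "incantation"),
         ("please spell your name slowly", "spell_out"),
         ("a dry spell lasted all summer", "period_of_time")]
test = [("the witch muttered a spell", "incantation")]

# Represent each context by a feature set (here: unigram counts).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(c for c, _ in train)
X_test = vectorizer.transform(c for c, _ in test)

# Learn a model from the sense-tagged training examples...
model = DecisionTreeClassifier().fit(X_train, [s for _, s in train])

# ...and evaluate it on the held-out test set.
print(accuracy_score([s for _, s in test], model.predict(X_test)))
```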

Page 4: Conll

Motivations

Lexical features do "reasonably well" at supervised WSD (Duluth systems in Senseval-2; Pedersen, NAACL-2001). POS features do "reasonably well" too. Are they complementary or redundant?

Complementary? Then find the simplest ways to represent instances and combine results to improve performance.

Redundant? Then we can reduce the feature space without affecting performance.

Page 5: Conll

Decision Trees

Assigns a sense to an instance by asking a series of questions. The questions correspond to features of the instance, and each depends on the previous answer.

In the tree:
The topmost node is called the root.
Each node corresponds to a feature.
Each value of a feature has a branch.
Each path terminates in a sense (a leaf).
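To make the root/branch/leaf vocabulary concrete, here is a toy, hand-built tree with hypothetical binary features for the "spell" example:

```python
# Toy hand-built decision tree for a hypothetical target word.
# Each internal node asks about one binary feature; the answer selects
# the branch to follow; each leaf assigns a sense.
tree = {"feature": "cast_in_context",          # root node
        0: {"feature": "letter_in_context",    # next question depends on answer
            0: "period_of_time",               # leaf: a sense
            1: "spell_out"},
        1: "incantation"}

def assign_sense(node, instance):
    """Follow branches by answering one feature question per node."""
    while isinstance(node, dict):
        answer = instance.get(node["feature"], 0)
        node = node[answer]
    return node  # a leaf, i.e. a sense

print(assign_sense(tree, {"cast_in_context": 1}))  # -> incantation
```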

Page 6: Conll

WSD Tree

[Figure: an example WSD decision tree. Internal nodes test features (Feature 1 through Feature 4), branches carry the feature values 0 and 1, and each leaf assigns a sense (Sense 1 through Sense 4).]

Page 7: Conll

Why Decision Trees?

Many kinds of features can contribute to WSD performance.

Many learning algorithms result in comparable classifiers when given the same set of features.

A learned decision tree captures interactions among features.

Many implementations are available, e.g. Weka's J48.
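A quick sketch of how a learned tree encodes feature interactions as nested questions. The paper itself uses Weka's J48 (a C4.5 implementation); scikit-learn's CART tree is a rough stand-in here, and the binary features are hypothetical:

```python
# A learned tree captures feature interactions as nested questions.
# (scikit-learn's CART stands in for Weka's J48, which the paper uses.)
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical binary features: [context has "cast", context has "letter"]
X = [[1, 0], [0, 1], [0, 0], [1, 1]]
y = ["incantation", "spell_out", "period_of_time", "incantation"]

clf = DecisionTreeClassifier().fit(X, y)
print(export_text(clf, feature_names=["has_cast", "has_letter"]))
```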

Page 8: Conll

Lexical Features

Surface form: the observed form of the target word.

Unigrams and bigrams: one- and two-word sequences, identified with the Ngram Statistics Package.
http://www.d.umn.edu/~tpederse/nsp.html
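The Ngram Statistics Package is a Perl toolkit; a minimal, purely illustrative Python equivalent for counting unigrams and bigrams in a context might look like this:

```python
# Minimal sketch of unigram/bigram extraction (the paper uses the
# Perl-based Ngram Statistics Package; this is only an illustration).
from collections import Counter

def lexical_features(context):
    tokens = context.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

uni, bi = lexical_features("Harry cast a bewitching spell")
print(uni["spell"], bi[("bewitching", "spell")])  # 1 1
```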

Page 9: Conll

POS Features

Surrounding POS tags can indicate different senses:
Why did Jack turn/VB against/IN his/PRP$ team/NN
Why did Jack turn/VB left/NN at/IN the/DT crossing

Individual word POS features: P-2, P-1, P0, P1, P2, used individually and in combination.
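A sketch of extracting the P-2 through P2 features, i.e. the POS tags of the two words before the target, the target itself, and the two words after. The helper and its padding convention are hypothetical; the paper obtains the tags with the Brill tagger:

```python
# Extract the individual-word POS features P-2 .. P2 around a target word.
def pos_window(tags, target_index, pad="NONE"):
    """Return {P-2, P-1, P0, P1, P2}; positions off either end are padded."""
    features = {}
    for offset in range(-2, 3):
        i = target_index + offset
        features[f"P{offset}"] = tags[i] if 0 <= i < len(tags) else pad
    return features

# "Why did Jack turn against his team", with turn (index 3) as the target.
tags = ["WRB", "VBD", "NNP", "VB", "IN", "PRP$", "NN"]
print(pos_window(tags, 3))  # P-2=VBD, P-1=NNP, P0=VB, P1=IN, P2=PRP$
```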

Page 10: Conll

Part of Speech Tagging

Brill Tagger: open source and easy to understand.

Guaranteed pre-tagging: manually tag the target words; implemented in BrillPatch.

Page 11: Conll

Parse Features

Head word of the target phrase: the hard work, the hard surface.

Head word of the parent phrase: fasten the line, cross the line.

Target and parent phrase POS: noun phrase, verb phrase, ...

Used individually and in combination; obtained via the Collins Parser.
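A small sketch of what the four parse features look like for one instance, assuming head-annotated phrases have already been extracted (in the paper they come from Collins Parser output); the data structure here is purely illustrative:

```python
# Parse features for "fasten the line", target word "line":
# the target sits in an NP whose parent VP is headed by "fasten".
target_phrase = {"label": "NP", "head": "line"}
parent_phrase = {"label": "VP", "head": "fasten"}

parse_features = {
    "head_word": target_phrase["head"],           # head of the target phrase
    "parent_word": parent_phrase["head"],         # head of the parent phrase
    "phrase_pos": target_phrase["label"],         # target phrase type
    "parent_phrase_pos": parent_phrase["label"],  # parent phrase type
}
print(parse_features)
```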

Page 12: Conll

Experiments

How accurate are simple classifiers based on a single feature type?

How complementary or redundant are lexical and syntactic features?

Is it possible (in theory at least) to combine just a few very simple classifiers and achieve near state-of-the-art results?

Page 13: Conll

Experiments

Learn a decision tree based on a single feature type: surface, unigram, bigram, POS, parse, ...

Combine pairs of these trees via a simple ensemble technique: a weighted vote.

Page 14: Conll

Sense-Tagged Data

Senseval-2 data: 4328 test instances, 8611 training instances; 73 nouns, verbs, and adjectives.

Senseval-1 data: 8512 test instances, 13276 training instances; 35 nouns, verbs, and adjectives.

line, hard, interest, and serve data: 4149, 4337, 4378, and 2476 instances.

About 50,000 sense-tagged instances in all!

Page 15: Conll

Lexical Features

              Sval-2   Sval-1   line    hard    serve   interest
Majority      47.7%    56.3%    54.3%   81.5%   42.2%   54.9%
Surface Form  49.3%    62.9%    54.3%   81.5%   44.2%   64.0%
Unigram       55.3%    66.9%    74.5%   83.4%   73.3%   75.7%
Bigram        55.1%    66.9%    72.9%   89.5%   72.1%   79.9%

Page 16: Conll

POS Features

          Sval-2   Sval-1   line    hard    serve   interest
Majority  47.7%    56.3%    54.3%   81.5%   42.2%   54.9%
P-2       47.1%    57.5%    54.9%   81.6%   60.3%   56.0%
P-1       49.6%    59.2%    56.2%   82.1%   60.2%   62.7%
P0        49.9%    60.3%    54.3%   81.6%   58.0%   64.0%
P1        53.1%    63.9%    54.2%   81.6%   73.0%   65.3%
P2        48.9%    59.9%    54.3%   81.7%   75.7%   62.3%

Page 17: Conll

Combining POS Features

                      Sval-2   Sval-1   line    hard    serve   interest
Majority              47.7%    56.3%    54.3%   81.5%   42.2%   54.9%
P0, P1                54.3%    66.7%    54.1%   81.9%   60.2%   70.5%
P-1, P0, P1           54.6%    68.0%    60.4%   84.8%   73.0%   78.8%
P-2, P-1, P0, P1, P2  54.6%    67.8%    62.3%   86.2%   75.7%   80.6%

Page 18: Conll

Parse Features

                   Sval-2   Sval-1   line    hard    serve   interest
Majority           47.7%    56.3%    54.3%   81.5%   42.2%   54.9%
Head Word          51.7%    64.3%    54.7%   87.8%   47.4%   69.1%
Parent Word        50.0%    60.6%    59.8%   84.5%   57.2%   67.8%
Phrase POS         52.9%    58.5%    54.3%   81.5%   41.4%   54.9%
Parent Phrase POS  52.7%    57.9%    54.3%   81.7%   41.6%   54.9%

Page 19: Conll

Discussion

Lexical and syntactic features perform comparably. Do they get the same instances right? Are there instances disambiguated by one feature set and not by the other? How complementary are the individual feature sets?

Page 20: Conll

Measures

Baseline Ensemble: the accuracy of a hypothetical ensemble that predicts the sense correctly only if both individual feature sets do so.

Optimal Ensemble: the accuracy of a hypothetical ensemble that predicts the sense correctly if either of the individual feature sets does so.
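A sketch of computing both measures from per-instance correctness of the two classifiers (booleans); the example vectors are hypothetical:

```python
# Baseline and Optimal Ensemble accuracies, given per-instance
# correctness of the lexical and syntactic classifiers.
def baseline_ensemble(lex_correct, syn_correct):
    """Correct only when BOTH classifiers are correct (a lower bound)."""
    both = [a and b for a, b in zip(lex_correct, syn_correct)]
    return sum(both) / len(both)

def optimal_ensemble(lex_correct, syn_correct):
    """Correct when EITHER classifier is correct (an upper bound)."""
    either = [a or b for a, b in zip(lex_correct, syn_correct)]
    return sum(either) / len(either)

lex = [True, True, False, False]
syn = [True, False, True, False]
print(baseline_ensemble(lex, syn), optimal_ensemble(lex, syn))  # 0.25 0.75
```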

Page 21: Conll

Our Ensemble Approach

We use a weighted-vote ensemble to decide the sense of a target word. For a given test instance, it takes the output of two classifiers (one lexical and one syntactic) and sums the probabilities associated with each possible sense, choosing the sense with the largest total.
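A minimal sketch of that vote; the per-sense probability dictionaries below are hypothetical stand-ins for the two classifiers' outputs:

```python
# Weighted vote: sum each classifier's per-sense probabilities and
# pick the sense with the largest total.
def weighted_vote(lexical_probs, syntactic_probs):
    senses = lexical_probs.keys() | syntactic_probs.keys()
    totals = {s: lexical_probs.get(s, 0.0) + syntactic_probs.get(s, 0.0)
              for s in senses}
    return max(totals, key=totals.get)

# Hypothetical outputs for one test instance of "spell".
lexical = {"incantation": 0.6, "period_of_time": 0.4}
syntactic = {"incantation": 0.3, "period_of_time": 0.7}
print(weighted_vote(lexical, syntactic))  # period_of_time (0.4+0.7 > 0.6+0.3)
```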

Page 22: Conll

Best Combinations

Data (Majority)   Set 1            Set 2               Base    Ours    Optimal  Best
Sval-2 (47.7%)    Unigrams 55.3%   P-1,P0,P1 55.3%     43.6%   57.0%   67.9%    66.7%
Sval-1 (56.3%)    Unigrams 66.9%   P-1,P0,P1 68.0%     57.6%   71.1%   78.0%    81.1%
line (54.3%)      Unigrams 74.5%   P-1,P0,P1 60.4%     55.1%   74.2%   82.0%    88.0%
hard (81.5%)      Bigrams 89.5%    Head, Parent 87.7%  86.1%   88.9%   91.3%    83.0%
serve (42.2%)     Unigrams 73.3%   P-1,P0,P1 73.0%     58.4%   81.6%   89.9%    83.0%
interest (54.9%)  Bigrams 79.9%    P-1,P0,P1 78.8%     67.6%   83.2%   90.1%    89.0%

Page 23: Conll

Conclusions

There is a reasonable amount of complementarity across lexical and syntactic features.

Simple lexical and part-of-speech features can be combined to achieve state-of-the-art results.

Future work: how best to capitalize on the complementarity?

Page 24: Conll

Senseval-3

Approx. 8000 training and 4000 test instances; English lexical sample task.

Training data collected via Open Mind Word Expert.

Comparative results to be unveiled at the ACL workshop!

Page 25: Conll

Software and Data

SyntaLex: WSD using lexical and syntactic features.
posSenseval: POS tags data in Senseval-2 format using the Brill Tagger.
parseSenseval: parses the output of the Brill Tagger using the Collins Parser.
BrillPatch: supports guaranteed pre-tagging.
Packages to convert the line, hard, serve, and interest data to Senseval-1 and Senseval-2 data formats.

http://www.d.umn.edu/~tpederse/code.html
http://www.d.umn.edu/~tpederse/data.html

Page 26: Conll

Individual Word POS: Senseval-1

          All      Nouns    Verbs    Adj.
Majority  56.3%    57.2%    56.9%    64.3%
P-2       57.5%    58.2%    58.6%    64.0%
P-1       59.2%    62.2%    58.2%    64.3%
P0        60.3%    62.5%    58.2%    64.3%
P1        63.9%    65.4%    64.4%    66.2%
P2        59.9%    60.0%    60.8%    65.2%

Page 27: Conll

Individual Word POS: Senseval-2

          All      Nouns    Verbs    Adj.
Majority  47.7%    51.0%    39.7%    59.0%
P-2       47.1%    51.9%    38.0%    57.9%
P-1       49.6%    55.2%    40.2%    59.0%
P0        49.9%    55.7%    40.6%    58.2%
P1        53.1%    53.8%    49.1%    61.0%
P2        48.9%    50.2%    43.2%    59.4%

Page 28: Conll

Parse Features: Senseval-1

               All      Nouns    Verbs    Adj.
Majority       56.3%    57.2%    56.9%    64.3%
Head Word      64.3%    70.9%    59.8%    66.9%
Parent Word    60.6%    62.6%    60.3%    65.8%
Phrase         58.5%    57.5%    57.2%    66.2%
Parent Phrase  57.9%    58.1%    58.3%    66.2%

Page 29: Conll

Parse Features: Senseval-2

               All      Nouns    Verbs    Adj.
Majority       47.7%    51.0%    39.7%    59.0%
Head           51.7%    58.5%    39.8%    64.0%
Parent         50.0%    56.1%    40.1%    59.3%
Phrase         48.3%    51.7%    40.3%    59.5%
Parent Phrase  48.5%    53.0%    39.1%    60.3%