
Named Entity Classification

Chioma Osondu & Wei Wei

Classifiers

• Decision Tree
• Multinomial Naïve Bayes
• Support Vector Machines

Features

• Unigrams
• Bigrams
• Trigrams
• Quadrigrams
• Specialized features like number of words, presence of numbers, etc.
• Stemmed words
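A minimal sketch of how such a feature set could be extracted from one entity string. The slides do not say whether the n-grams are over characters or words; character n-grams are assumed here. The function names (`char_ngrams`, `extract_features`) and the feature key format are hypothetical, and stemming is left out.

```python
def char_ngrams(text, n):
    """Return all character n-grams of the lowercased text (assumption: character-level)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def extract_features(phrase, max_n=4):
    """Build a sparse feature dict: n-gram counts for n = 1..max_n plus two specialized features."""
    feats = {}
    for n in range(1, max_n + 1):  # unigrams, bigrams, trigrams, quadrigrams
        for gram in char_ngrams(phrase, n):
            key = f"{n}gram={gram}"
            feats[key] = feats.get(key, 0) + 1
    # Specialized features named on the slide
    feats["num_words"] = len(phrase.split())
    feats["has_digit"] = int(any(ch.isdigit() for ch in phrase))
    return feats


if __name__ == "__main__":
    # "Vitamin B12" is an illustrative input, not taken from the data set
    for name, value in sorted(extract_features("Vitamin B12").items()):
        print(name, value)
```

A dict of counts like this can be fed to any of the three classifiers after vectorization; dropping the `1gram=` keys would correspond to a run without unigrams.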

Accuracy with Tree Depth

• Accuracy does not grow with the tree depth

• Accuracy is lower than that of the Maximum Entropy Model with the same feature sets.

[Figure: Tree depth vs. Accuracy. x-axis: tree depth (0 to 100); y-axis: accuracy (0.5 to 0.8); series: without unigram, with unigram.]

Results & Error Analysis (1)

• Features are not abstract enough: "Corp.", "Corporation", and "Inc." are really the same feature.

• Out of the 599 disputed classifications, the Maximum Entropy Model (MEM) had 481 correct (about 80%) and the decision tree had 118 correct (about 20%).

• Not enough features are defined for Place, Movie, and Person.
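One way to make such features more abstract is to collapse the variants into a single placeholder token before feature extraction. The sketch below is illustrative only: the suffix list contains just the three forms named above (plus dotless spellings), and `abstract_tokens` and the `<CORP_SUFFIX>` token are hypothetical names.

```python
# Variants named on the slide, lowercased; dotless spellings added as an assumption
CORPORATE_SUFFIXES = {"corp", "corp.", "corporation", "inc", "inc."}


def abstract_tokens(phrase):
    """Lowercase the tokens and replace any corporate suffix with one shared placeholder."""
    tokens = []
    for tok in phrase.split():
        normalized = tok.lower().rstrip(",")  # tolerate a trailing comma, e.g. "Inc.,"
        if normalized in CORPORATE_SUFFIXES:
            tokens.append("<CORP_SUFFIX>")
        else:
            tokens.append(normalized)
    return tokens


if __name__ == "__main__":
    # "Acme" is a made-up company name used only for illustration
    for name in ["Acme Corp.", "Acme Corporation", "Acme Inc."]:
        print(abstract_tokens(name))
```

With this mapping all three spellings yield the same token sequence, so a classifier sees one feature instead of three sparse ones.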

Results & Error Analysis (2)

[Figure: Error Percentage in Data. x-axis: category (Drug, Person, Place, Movie, Company); y-axis: error % (0 to 0.35); series: error in both classifiers, error in Decision Tree, error in Maximum Entropy Model.]


[Figure: Classification of atomic elements in drug category. Bar chart, y-axis 0 to 350; bars: drug, person, place, movie, company. Data labels as extracted: 347, 7, 347, 7, 4.]


[Figure: Classification of atomic elements in company category. Bar chart, y-axis 0 to 350; bars: drug, person, place, movie, company. Data labels as extracted: 1, 3, 10, 6, 324.]
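As a rough reading of the company chart, the share of company elements that were labeled correctly can be computed from its data labels. This sketch assumes the five labels (1, 3, 10, 6, 324) follow the legend order drug, person, place, movie, company; the slides do not confirm that mapping, and `share_correct` is a hypothetical helper.

```python
LABELS = ["drug", "person", "place", "movie", "company"]

# Assumption: chart data labels listed in legend order
COMPANY_COUNTS = [1, 3, 10, 6, 324]


def share_correct(counts, labels, true_label):
    """Fraction of elements of one true category that received that same label."""
    total = sum(counts)
    return counts[labels.index(true_label)] / total


if __name__ == "__main__":
    share = share_correct(COMPANY_COUNTS, LABELS, "company")
    print(f"{share:.3f}")  # 324 / 344
```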

Conclusion & Future Work

• Stemmed words are too coarse for multi-way classification

• Better accuracies of over 94% can be achieved using a combination of features. See "Automatic Classification of Previously Unseen Proper Noun Phrases into Semantic Categories Using an N-Gram Letter Model" by Stephen Patel & Joseph Smarr (2001 Final Project)

• Combining classifiers
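The slides do not say how the classifiers would be combined. One simple option, shown here purely as an assumption, is majority voting over the three classifiers with a fallback to one designated classifier when there is no strict majority. `majority_vote` and `fallback_index` are hypothetical names.

```python
from collections import Counter


def majority_vote(predictions, fallback_index=0):
    """Return the label chosen by a strict majority of classifiers.

    If no label has more than half the votes, fall back to the prediction
    of the classifier at fallback_index (for example, the most accurate one).
    """
    label, votes = Counter(predictions).most_common(1)[0]
    if votes > len(predictions) / 2:
        return label
    return predictions[fallback_index]


if __name__ == "__main__":
    # Illustrative predictions from three classifiers for one entity
    print(majority_vote(["company", "company", "place"]))
    print(majority_vote(["drug", "person", "place"], fallback_index=2))
```

Voting only helps when the classifiers make different mistakes, so the disputed classifications are the cases where a combination could change the outcome.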
