Named Entity Classification
Chioma Osondu & Wei Wei
Classifiers
• Decision Tree
• Multinomial Naïve Bayes
• Support Vector Machines
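Of the three classifiers, multinomial Naïve Bayes is the simplest to write down. Below is a minimal standard-library sketch. It assumes each phrase has already been turned into a list of feature tokens and uses add-one (Laplace) smoothing. The class name `MultinomialNaiveBayes` and the toy training data are hypothetical. This is neither the authors' code nor scikit-learn's class.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over bags of feature tokens (add-alpha smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; labels: the matching class names
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def log_scores(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior P(c)
            denom = sum(self.token_counts[c].values()) + self.alpha * v
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.token_counts[c][t] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy data, not the project's corpus
train = [
    (["pfizer", "inc."], "company"),
    (["acme", "corp."], "company"),
    (["aspirin"], "drug"),
    (["ibuprofen", "tablets"], "drug"),
]
nb = MultinomialNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(nb.predict(["globex", "inc."]))  # → company
```

Working in log space keeps the products of many small probabilities from underflowing. Skipping unseen tokens is one common choice. Another is to give them the smoothed zero-count probability.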
Features
• Unigrams
• Bigrams
• Trigrams
• Quadrigrams
• Specialized features, such as the number of words and the presence of numbers
• Stemmed words
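The slides do not say whether the n-grams are over words or letters, nor which stemmer was used. The sketch below shows one possible feature extractor under two assumptions: the n-grams are word n-grams, and stemming is done by a hypothetical suffix-stripping `crude_stem` rather than a real stemmer such as Porter's. The feature-string format (`1gram=...`, `has_number`) is also made up for illustration.

```python
import re


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def crude_stem(word):
    """Hypothetical suffix-stripping stemmer (NOT the Porter stemmer)."""
    for suffix in ("ations", "ation", "ing", "ies", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(phrase, max_n=4):
    tokens = phrase.lower().split()
    feats = []
    for n in range(1, max_n + 1):  # unigrams through quadrigrams
        feats += [f"{n}gram={g}" for g in ngrams(tokens, n)]
    feats.append(f"num_words={len(tokens)}")  # specialized: phrase length
    if re.search(r"\d", phrase):  # specialized: presence of numbers
        feats.append("has_number")
    feats += [f"stem={crude_stem(t)}" for t in tokens]
    return feats


print(extract_features("Star Wars 2"))
```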
Accuracy with Tree Depth
• Accuracy does not grow with tree depth.
• Accuracy is lower than that of the Maximum Entropy Model (MEM) with the same feature sets.
[Figure: Tree depth vs. Accuracy. x-axis: tree depth (0 to 100); y-axis: accuracy (0.5 to 0.8); series: without unigrams, with unigrams.]
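One way to produce a curve like this is to retrain a depth-limited tree for each maximum depth and score it on held-out data. The sketch below assumes an ID3-style tree that splits on information gain over binary feature-presence tests. The slides do not say which tree algorithm was used, and the toy data is made up, so this shows the procedure rather than reproducing the curve.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def build_tree(examples, features, max_depth):
    """ID3-style tree. examples: list of (set_of_features, label) pairs.
    Returns a leaf label or a dict {"feature", "yes", "no"}."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1 or not features:
        return majority
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in sorted(features):  # sorted so ties are broken deterministically
        yes = [y for x, y in examples if f in x]
        no = [y for x, y in examples if f not in x]
        if not yes or not no:
            continue
        remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    if best is None:  # no test reduces entropy
        return majority
    rest = features - {best}
    return {
        "feature": best,
        "yes": build_tree([e for e in examples if best in e[0]], rest, max_depth - 1),
        "no": build_tree([e for e in examples if best not in e[0]], rest, max_depth - 1),
    }


def classify(tree, x):
    while isinstance(tree, dict):
        tree = tree["yes"] if tree["feature"] in x else tree["no"]
    return tree


def accuracy_by_depth(train, test, depths):
    features = set().union(*(x for x, _ in train))
    results = {}
    for d in depths:
        tree = build_tree(train, features, d)
        results[d] = sum(classify(tree, x) == y for x, y in test) / len(test)
    return results


# Hypothetical toy data
train = [({"inc.", "x"}, "company"), ({"inc."}, "company"),
         ({"mg"}, "drug"), ({"mg", "x"}, "drug")]
test = [({"inc."}, "company"), ({"mg"}, "drug")]
print(accuracy_by_depth(train, test, [0, 1, 2]))  # → {0: 0.5, 1: 1.0, 2: 1.0}
```

The toy run plateaus once the data is separated. This is the same qualitative shape the slide reports: extra depth stops helping.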
Results & Error Analysis (1)
• Features are not abstract enough: "Corp.", "Corporation", and "Inc." are really the same feature.
• Out of the 599 disputed classifications, the MEM had 481 correct and the decision tree had 118 correct.
• Not enough features were defined for the Place, Movie, and Person categories.
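Two of these points translate directly into code. The sketch below maps equivalent surface forms onto one shared feature; the `ABSTRACT_TOKENS` table is a hypothetical example. It also counts disputed cases, assuming "disputed" means examples where exactly one of the two classifiers is correct. That reading is consistent with 481 + 118 = 599.

```python
# Hypothetical abstraction table: surface forms that should fire one shared feature
ABSTRACT_TOKENS = {
    "corp.": "CORP_SUFFIX",
    "corporation": "CORP_SUFFIX",
    "inc.": "CORP_SUFFIX",
}


def abstract_tokens(tokens):
    """Lower-case tokens and map equivalent forms (Corp./Corporation/Inc.) to one feature."""
    return [ABSTRACT_TOKENS.get(t.lower(), t.lower()) for t in tokens]


def disputed_counts(gold, pred_a, pred_b):
    """Return (disputed, a_correct, b_correct), where an example is disputed
    when exactly one of the two classifiers labels it correctly."""
    a_right = b_right = 0
    for g, a, b in zip(gold, pred_a, pred_b):
        if (a == g) != (b == g):  # exactly one of the two is correct
            if a == g:
                a_right += 1
            else:
                b_right += 1
    return a_right + b_right, a_right, b_right


print(abstract_tokens(["Acme", "Corp."]))  # → ['acme', 'CORP_SUFFIX']

# Hypothetical predictions
gold = ["drug", "person", "place", "movie"]
mem = ["drug", "place", "place", "movie"]
tree = ["drug", "person", "movie", "company"]
print(disputed_counts(gold, mem, tree))  # → (3, 2, 1)
```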
Results & Error Analysis (2)
[Figure: Error Percentage in Data. x-axis: category (Drug, Person, Place, Movie, Company); y-axis: error (0 to 0.35); series: error in both classifiers, error in Decision Tree, error in Maximum Entropy Model.]
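The three series can be computed per gold category from the two classifiers' predictions. The sketch below assumes each series is the fraction of that category's items that the classifier (or both classifiers) got wrong. Under that reading, the "both" rate is included in each single-classifier rate. The chart may instead show disjoint buckets; the slides do not say.

```python
from collections import Counter


def error_rates_by_category(gold, pred_tree, pred_mem):
    """Per gold category: error rate of both classifiers together, of the
    decision tree, and of the MEM (each divided by the category size)."""
    totals = Counter(gold)
    both, tree, mem = Counter(), Counter(), Counter()
    for g, t, m in zip(gold, pred_tree, pred_mem):
        if t != g:
            tree[g] += 1
        if m != g:
            mem[g] += 1
        if t != g and m != g:
            both[g] += 1
    return {
        g: {"both": both[g] / n, "tree": tree[g] / n, "mem": mem[g] / n}
        for g, n in totals.items()
    }


# Hypothetical predictions
gold = ["drug", "drug", "person", "person"]
tree_pred = ["drug", "person", "person", "place"]
mem_pred = ["person", "person", "person", "person"]
print(error_rates_by_category(gold, tree_pred, mem_pred))
```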
Results & Error Analysis (3)
[Figure: Classification of atomic elements in drug category. Bars for the predicted classes drug, person, place, movie, company; y-axis 0 to 350; bar labels shown: 347, 7, 4.]
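Each of these charts is one row of a confusion matrix: the gold category is fixed, and the bars count the predicted labels. The sketch below assumes "atomic elements" are the individual test phrases and uses the five categories above. The toy predictions are made up.

```python
CATEGORIES = ["drug", "person", "place", "movie", "company"]


def confusion_matrix(gold, pred, labels=CATEGORIES):
    """matrix[g][p] = number of items with gold label g that were predicted as p."""
    matrix = {g: {p: 0 for p in labels} for g in labels}
    for g, p in zip(gold, pred):
        matrix[g][p] += 1
    return matrix


# Hypothetical predictions
gold = ["drug", "drug", "drug", "company"]
pred = ["drug", "drug", "person", "company"]
m = confusion_matrix(gold, pred)
print(m["drug"])  # → {'drug': 2, 'person': 1, 'place': 0, 'movie': 0, 'company': 0}
```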
Results & Error Analysis (4)
[Figure: Classification of atomic elements in company category. Bars for the predicted classes drug, person, place, movie, company; y-axis 0 to 350; bar labels, in the order shown: 1, 3, 10, 6, 324.]
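Per-category precision and recall follow directly from the full confusion matrix. A row total is the denominator for recall, and a column total is the denominator for precision. The sketch below runs on a hypothetical two-category matrix, not the project's data.

```python
def per_class_metrics(matrix):
    """Precision, recall and F1 per class from matrix[gold][pred] counts."""
    labels = list(matrix)
    metrics = {}
    for c in labels:
        tp = matrix[c].get(c, 0)
        predicted = sum(matrix[g].get(c, 0) for g in labels)  # column total
        actual = sum(matrix[c].values())  # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical matrix
matrix = {"drug": {"drug": 8, "company": 2}, "company": {"drug": 1, "company": 9}}
metrics = per_class_metrics(matrix)
print(round(metrics["company"]["recall"], 3))  # → 0.9
print(round(metrics["drug"]["precision"], 3))  # → 0.889
```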
Conclusion & Future Work
• Stemmed words are too coarse for multi-way classification.
• Better accuracies, of over 94%, can be achieved using a combination of features; see "Automatic Classification of Previously Unseen Proper Noun Phrases into Semantic Categories Using an N-Gram Letter Model" by Stephen Patel & Joseph Smarr (2001 final project).
• Combining classifiers
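The slides list combining classifiers as future work without saying how. One simple option is a majority vote, with ties broken in favour of the most trusted classifier. The MEM is a natural choice there, since it won 481 of the 599 disputed cases. The function name `combine_votes` and the voting scheme are assumptions.

```python
from collections import Counter


def combine_votes(predictions, priority):
    """Majority vote over several classifiers' labels for one example.

    predictions: dict mapping classifier name -> predicted label
    priority: classifier names from most to least trusted, used to break ties
    """
    votes = Counter(predictions.values())
    top = max(votes.values())
    tied = {label for label, n in votes.items() if n == top}
    for name in priority:
        if name in predictions and predictions[name] in tied:
            return predictions[name]
    raise ValueError("priority must name at least one classifier in predictions")


print(combine_votes({"tree": "place", "mem": "person", "nb": "person"}, ["mem", "tree", "nb"]))  # → person
print(combine_votes({"tree": "place", "mem": "person"}, ["mem", "tree"]))  # → person
```

With only two classifiers, every disagreement is a tie, so this reduces to "trust the MEM". The vote only starts to matter with three or more classifiers, such as the decision tree, Naïve Bayes, and SVM together with the MEM.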