TRANSCRIPT
CLASSIFYING ENTITIES INTO AN INCOMPLETE ONTOLOGY
Bhavana Dalvi, William W. Cohen, Jamie Callan School of Computer Science, Carnegie Mellon University
Motivation

Existing techniques:
- Semi-supervised hierarchical classification: Carlson et al., WSDM'10
- Extending knowledge bases: finding new relations or attributes of existing concepts: Mohamed et al., EMNLP'11
- Unsupervised ontology discovery: Adams et al., NIPS'10; Blei et al., JACM'10; Reisinger et al., ACL'09

Evolving Web-scale datasets:
- Billions of entities and hundreds of thousands of concepts
- Difficult to create a complete ontology
- Hierarchical classification of entities into incomplete ontologies is needed
Contributions

Hierarchical Exploratory EM:
- Adds new instances to the existing classes
- Discovers new classes and adds them at appropriate places in the ontology

Class constraints:
- Inclusion: every entity that is "Mammal" is also an "Animal"
- Mutual Exclusion: if an entity is "Electronic Device" then it's not "Mammal"
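The two constraint types can be sketched as a simple consistency check. This is a hypothetical toy encoding (the names `inclusion`, `mutual_exclusion`, and `is_consistent` are illustrative, not the system's actual data structures):

```python
# Hypothetical encoding of the two class-constraint types (toy example,
# not the authors' code). A label assignment is a set of class names.

# Inclusion: subclass -> required superclass
inclusion = {"Mammal": "Animal"}

# Mutual exclusion: pairs of classes that cannot co-occur
mutual_exclusion = [("Electronic Device", "Mammal")]

def is_consistent(labels):
    """Return True iff a set of labels satisfies both constraint types."""
    for sub, sup in inclusion.items():
        if sub in labels and sup not in labels:
            return False  # inclusion violated: "Mammal" implies "Animal"
    for a, b in mutual_exclusion:
        if a in labels and b in labels:
            return False  # mutual exclusion violated
    return True
```

For example, {"Mammal"} alone is inconsistent (the implied "Animal" label is missing), while {"Mammal", "Animal"} passes both checks.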
Problem Definition

Input:
- Large set of data-points
- Some known classes
- Class constraints between the known classes
- Small number of seeds per known class: n

Output:
- Labels for all data-points
- New classes discovered from the data: k
- Updated class constraints
Review: Exploratory EM [Dalvi et al., ECML 2013]

- Initialize the model with a few seeds per class
- Iterate until convergence (of data likelihood and # classes):
  - E step: predict labels for unlabeled points
    - If P(Cj | Xi) is nearly uniform for a data-point Xi, j = 1 to k, create a new class Ck+1 and assign Xi to it
  - M step: recompute model parameters using seeds + predicted labels for unlabeled points
    - The number of classes might increase in each iteration
  - Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1

Design choices (slide callouts): classification/clustering model (k-means, naive Bayes, von Mises-Fisher, ...); near-uniformity test (max/min ratio, JS divergence); model selection criterion (AIC, BIC, AICc, ...).
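The E step above can be sketched roughly as follows. This is a minimal single-pass sketch assuming distance-based soft scores (k-means-style centers) and a max/min near-uniformity test; the threshold value and scoring model are illustrative placeholders, not the paper's implementation:

```python
import numpy as np

def near_uniform(post, ratio_threshold=1.2):
    """Near-uniform test: max/min posterior ratio close to 1 (threshold is a placeholder)."""
    return post.max() / post.min() < ratio_threshold

def exploratory_estep(X, centers, ratio_threshold=1.2):
    """One exploratory E step: assign each point to a class, creating a
    new class when the posterior over existing classes is nearly uniform."""
    centers = [c for c in centers]  # copy so the caller's list is untouched
    labels = []
    for x in X:
        d = np.array([np.linalg.norm(x - c) for c in centers])
        post = np.exp(-d)
        post /= post.sum()          # soft scores from distances (toy model)
        if near_uniform(post, ratio_threshold):
            centers.append(x.copy())        # seed a new class at this point
            labels.append(len(centers) - 1)
        else:
            labels.append(int(post.argmax()))
    return labels, centers
```

An M step would then recompute each center as the mean of its assigned points (seeds plus predictions), and a model selection criterion would decide whether to keep the enlarged model.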
Hierarchical Exploratory EM

- Initialize the model with a few seeds per class
- Iterate until convergence (of data likelihood and # classes):
  - E step: predict labels for unlabeled points
    - Assign a consistent bit vector of labels to each unlabeled data-point
    - If P(Cj | Xi) is nearly uniform for a data-point Xi, create a new class Ck+1, assign Xi to it, and update the class constraints accordingly
  - M step: recompute model parameters using seeds + predicted labels for unlabeled points
    - The number of classes might increase in each iteration
    - Since the E step follows the class constraints, this step need not be modified
  - Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1
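The "consistent bit vector" for a tree-structured ontology can be illustrated like this (a toy sketch using the running Food/Location example; the `parents` map and class ordering are assumptions for illustration):

```python
# Hypothetical consistent-bit-vector encoding for a tree ontology.
# Inclusion: a label implies all its ancestors; siblings stay mutually
# exclusive because only one root-to-leaf path is switched on.
parents = {"Food": "Root", "Location": "Root",
           "Country": "Location", "State": "Location",
           "Vegetable": "Food", "Condiment": "Food"}
classes = ["Root", "Food", "Location", "Country", "State",
           "Vegetable", "Condiment"]

def bit_vector(leaf):
    """Set 1 for the leaf and all its ancestors, 0 elsewhere."""
    on = set()
    node = leaf
    while node is not None:
        on.add(node)
        node = parents.get(node)  # Root has no parent -> loop ends
    return [1 if c in on else 0 for c in classes]
```

For instance, labeling an entity "State" switches on Root, Location, and State and nothing else, which satisfies both constraint types by construction.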
Divide-And-Conquer Exploratory EM

[Figure: a three-level tree ontology. Level 1: Root. Level 2: Food, Location. Level 3: Vegetable and Condiment under Food; Country and State under Location. Edges denote Inclusion; siblings at each level are Mutually Exclusive. E.g., Spinach, Potato, Pepper, ... are Vegetables.]

Assumptions:
- Classes are arranged in a tree-structured hierarchy.
- Classes at any level of the hierarchy are mutually exclusive.
Divide-And-Conquer Exploratory EM: walkthrough

[Figure sequence: entities are routed top-down through the Root / {Food, Location} / {Country, State, Vegetable, Condiment} tree.]

- "California": probability 1.0 at Root; the level-2 posterior over {Food, Location} is skewed (0.9 / 0.1), so it descends into Location; the level-3 posterior over {Country, State} is skewed (0.8 / 0.2), so it is assigned State and receives a consistent bit vector of labels.
- "Coke": probability 1.0 at Root; the level-2 posterior (0.1 / 0.9) routes it into Food; the level-3 posterior over {Vegetable, Condiment} is nearly uniform (0.55 / 0.45), so a new class C8 is created under Food, Coke is assigned to it, and C8 is added to the class constraints.
- "Cat": probability 1.0 at Root; the level-2 posterior over {Food, Location} is itself nearly uniform (0.45 / 0.55), so a new class C9 is created directly under Root, Cat is assigned to it, and the class constraints are updated again.
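The top-down routing in the walkthrough can be sketched as follows. The `score` function and the near-uniformity threshold are toy stand-ins (the real system uses the E step's posteriors), and the generated class name is a placeholder:

```python
# Hypothetical top-down divide-and-conquer step (toy sketch, not the
# authors' code). `score(entity, cls)` returns a positive posterior-like
# weight for `cls` among its siblings.
children = {"Root": ["Food", "Location"],
            "Food": ["Vegetable", "Condiment"],
            "Location": ["Country", "State"]}

def classify_top_down(entity, score, near_uniform_ratio=1.3):
    """Descend the tree greedily; create a new child class when the
    children's scores are nearly uniform (max/min ratio close to 1)."""
    path, node = ["Root"], "Root"
    while node in children:
        kids = children[node]
        s = [score(entity, k) for k in kids]
        if max(s) / min(s) < near_uniform_ratio:
            new_class = f"NewClassUnder{node}"   # e.g. C8 under Food
            children[node] = kids + [new_class]  # update class constraints
            path.append(new_class)
            return path
        node = kids[s.index(max(s))]             # descend into the winner
        path.append(node)
    return path
```

With toy scores matching the slides (California: Location 0.9, State 0.8; Coke: Food 0.9, then Vegetable 0.55 vs. Condiment 0.45), California is routed Root → Location → State, while Coke triggers a new class under Food.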
What are we trying to optimize?

Objective function: maximize { log data likelihood - model penalty } over the number of clusters m and the parameters of classes {C1 ... Cm}, subject to the class constraints Zm.
Datasets

Dataset             #Classes  #Levels  #NELL entities  #Contexts
DS-1 (Ontology 1)   11        3        2.5K            3.4M
DS-2 (Ontology 2)   39        4        12.9K           6.7M

Entities are subsets of NELL; contexts come from the ClueWeb09 corpus.
Results

Macro-averaged seed-class F1, comparing FLAT vs. Divide-And-Conquer (DAC) and SemisupEM vs. ExploratoryEM:

Dataset (#Train/Test)   Level  #Seed/#Ideal Classes  FLAT Semisup  FLAT Exploratory  DAC Semisup  DAC Exploratory
DS-1 (335 / 2.2K)       2      2 / 3                 43.2          78.7 *            69.5         77.2 *
                        3      4 / 7                 34.4          42.6 *            31.3         44.4 *
DS-2 (1.5K / 11.4K)     2      3.9 / 4               64.3          53.4              65.4         68.9 *
                        3      9.4 / 24              31.3          33.7 *            34.9         41.7 *
                        4      2.4 / 10              27.5          38.9 *            43.2         42.4
Conclusions

- Hierarchical Exploratory EM works with an incomplete class hierarchy and few seed instances to extend the existing knowledge base.
- Encouraging preliminary results: hierarchical classification outperforms flat classification, and exploratory learning outperforms semi-supervised learning.
- Future work: incorporate arbitrary class constraints; evaluate the newly added clusters.

Thank You

Questions?
Extra Slides
Class Creation Criterion

Given the posterior P(Cj | Xi), j = 1 to k, a data-point Xi triggers creation of a new class when the posterior is nearly uniform, measured by either:
- MinMax ratio: max_j P(Cj | Xi) / min_j P(Cj | Xi), close to 1 for a near-uniform posterior
- Jensen-Shannon divergence: JS-Div(P(C | Xi), Uniform(k)), close to 0 for a near-uniform posterior
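Both near-uniformity measures are easy to compute directly. A minimal sketch (threshold choices are left to the caller; the function names are illustrative):

```python
import math

def max_min_ratio(post):
    """Max/min ratio of a posterior; close to 1 means nearly uniform."""
    return max(post) / min(post)

def js_div_from_uniform(post):
    """Jensen-Shannon divergence (natural log) between a posterior and the
    uniform distribution over the same classes; close to 0 means nearly uniform."""
    k = len(post)
    u = [1.0 / k] * k
    m = [(p + q) / 2 for p, q in zip(post, u)]   # midpoint distribution

    def kl(p, q):
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    return 0.5 * kl(post, m) + 0.5 * kl(u, m)
```

For example, the posterior [0.5, 0.5] gives ratio 1.0 and divergence 0, while the skewed [0.9, 0.1] gives ratio 9 and a clearly positive divergence, so it would not trigger class creation.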
Model Selection

Extended Akaike Information Criterion:

AICc(g) = -2*L(g) + 2*v + 2*v*(v+1) / (n - v - 1)

where g is the model being evaluated, L(g) is the log-likelihood of the data given g, v is the number of free parameters of the model, and n is the number of data-points.
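The formula translates directly into code (a direct transcription; the log-likelihood is whatever the model being compared reports):

```python
def aicc(log_likelihood, v, n):
    """Extended (corrected) AIC:
    AICc(g) = -2*L(g) + 2*v + 2*v*(v+1) / (n - v - 1)
    log_likelihood: L(g); v: # free parameters; n: # data-points.
    Lower is better; requires n > v + 1."""
    return -2.0 * log_likelihood + 2.0 * v + 2.0 * v * (v + 1) / (n - v - 1)
```

A model whose extra classes do not buy enough likelihood gets a higher AICc, which is when Exploratory EM reverts to the previous iteration's model.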