LEILA – Learning to Extract Information by Linguistic Analysis
presented at the 2nd Workshop on Ontology Learning and Population (OLP2)
Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum
(Max-Planck Institute for Computer Science, Saarbrücken/Germany)

Overview
- Motivation
- The LEILA System
- Plan of Attack
- System Architecture
- Experiments
- Conclusion

Motivation
Search query: Meat dish
[Google Search] [I'm feeling hungry]
Web page: This page has been created to enlighten the public about the Wiener Schnitzel. [...]
?

Motivation
To know that a Schnitzel is a meat dish, we need an ontology.
- Use hand-crafted ontologies (like WordNet)
  (but: low coverage, high cost, fast aging)
- Or: Gather ontological data from Web documents

Goal
Given
- a binary target relation (e.g. subclassOf)
- a set of Web documents
extract all pairs of entities that are in the target relation

Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
Pattern: X is a Y
A Schnitzel is a meat dish from Austria.
A Schnitzel, also called Wiener Schnitzel, is a meat dish.
[Subject: A Schnitzel, also called Wiener Schnitzel,] is [Obj: a meat dish].
Idea: Learn linguistic patterns!
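The contrast between the two example sentences can be reproduced with a small sketch. The regular expression below is only an illustrative stand-in for a learned surface pattern "X is a Y"; it is not taken from any of the cited systems, and the names `SURFACE_PATTERN` and `extract` are hypothetical.

```python
import re

# Hypothetical surface pattern for "X is a Y" (an assumption for illustration):
# X is a single word, Y is one or two words following "is a".
SURFACE_PATTERN = re.compile(r"\bAn? (\w+) is an? (\w+(?: \w+)?)")

def extract(sentence):
    """Return the (X, Y) pair matched by the surface pattern, or None."""
    match = SURFACE_PATTERN.search(sentence)
    return (match.group(1), match.group(2)) if match else None

plain = "A Schnitzel is a meat dish from Austria."
apposition = "A Schnitzel, also called Wiener Schnitzel, is a meat dish."

print(extract(plain))       # ('Schnitzel', 'meat dish')
print(extract(apposition))  # None: the inserted phrase breaks the word sequence
```

The second sentence states the same fact, but the words between subject and verb defeat the literal pattern. A pattern over the sentence's linguistic structure would not depend on what stands between the two.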

Plan of Attack
[Diagram: Web documents and a target relation (subclassOf) go in; output pairs come out: (Schnitzel, meat dish), (Koala, mammal), …]

Preprocessing
[Diagram, repeated on every build step: Web documents and a target relation (subclassOf) go in; output pairs come out: (Schnitzel, meat dish), (Koala, mammal), …]
Example sentence at successive build steps:
1. The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu.
2. The Schnitzel (200g) is best enjoyed with Ösibräu.
3. The Schnitzel (200g) is best enjoyed with Oesibraeu.
4. The Schnitzel is best enjoyed with Oesibraeu. / The Schnitzel ( 200 g )
5. [Linkage diagram: "The Schnitzel is best enjoyed with Oesibraeu." with links labeled det, subj, participle, advmod, comp; "The Schnitzel ( 200 g )" with five links labeled adj]
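The slides do not name the individual preprocessing steps. Read from the example sentence, they appear to be unit normalization, transliteration of umlauts, and separation of parenthetical remarks. The following is a minimal sketch under that reading; the function names and regular expressions are assumptions, not the LEILA implementation.

```python
import re

GRAMS_PER_STONE = 6350.29318  # 1 stone = 14 lb = 6.35029318 kg

# Transliteration table for German umlauts and sharp s.
UMLAUTS = str.maketrans({"Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
                         "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def normalize_units(text):
    """Rewrite '<number> stones' as a weight in grams, rounded to an integer."""
    def to_grams(match):
        return f"{round(float(match.group(1)) * GRAMS_PER_STONE)}g"
    return re.sub(r"(\d+(?:\.\d+)?) stones?", to_grams, text)

def split_parentheticals(text):
    """Return the sentence without parentheticals, plus the parentheticals."""
    parentheticals = re.findall(r"\(([^)]*)\)", text)
    main = re.sub(r"\s*\([^)]*\)", "", text)
    return main, parentheticals

raw = "The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu."
step1 = normalize_units(raw)              # stones -> grams
step2 = step1.translate(UMLAUTS)          # Ösibräu -> Oesibraeu
main, extras = split_parentheticals(step2)

print(step1)
print(step2)
print(main, extras)
```

The three printed stages correspond to the intermediate sentences on the slides; the subsequent parsing step (the linkage diagram) is not covered by this sketch.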

Algorithm
[Diagram, repeated on every build step: Web documents and seed pairs go in; output pairs come out: (Schnitzel, meat dish), (Koala, mammal), …]
Seed pairs: (dog, mammal), ... as examples (+); (dog, nag), ... as counterexamples (-)
1. "A dog is a mammal." contains the example (dog, mammal). Positive pattern: A X is a Y.
2. "This dog is a nag." contains the counterexample (dog, nag). Negative pattern: This X is a Y.
3. The generalized positive pattern "A X is a Y." matches "A Schnitzel is a meat dish.", which yields the output pair (Schnitzel, meat dish).
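The three phases of the algorithm can be walked through on the slide's own sentences. In this sketch plain strings with placeholders stand in for the linguistic patterns that LEILA actually learns, and the generalization step is reduced to reusing each positive pattern verbatim; all function names are hypothetical.

```python
import re

def to_pattern(sentence, x, y):
    """Replace the two entities of a seed pair by the placeholders X and Y."""
    return sentence.replace(x, "X").replace(y, "Y")

def learn_patterns(sentences, examples, counterexamples):
    """Collect positive patterns from examples, negative ones from counterexamples."""
    positive, negative = set(), set()
    for sentence in sentences:
        for x, y in examples:
            if x in sentence and y in sentence:
                positive.add(to_pattern(sentence, x, y))
        for x, y in counterexamples:
            if x in sentence and y in sentence:
                negative.add(to_pattern(sentence, x, y))
    return positive - negative, negative

def apply_pattern(pattern, sentence):
    """Match a placeholder pattern against a sentence; return (X, Y) or None."""
    regex = re.escape(pattern).replace("X", "(?P<x>.+)").replace("Y", "(?P<y>.+)")
    match = re.fullmatch(regex, sentence)
    return (match.group("x"), match.group("y")) if match else None

sentences = ["A dog is a mammal.", "This dog is a nag.", "A Schnitzel is a meat dish."]
examples = [("dog", "mammal")]
counterexamples = [("dog", "nag")]

positive, negative = learn_patterns(sentences, examples, counterexamples)

found = set()
for pattern in positive:
    for sentence in sentences:
        pair = apply_pattern(pattern, sentence)
        if pair is not None:
            found.add(pair)
new_pairs = found - set(examples)  # keep only pairs that were not seeds

print(positive)   # {'A X is a Y.'}
print(negative)   # {'This X is a Y.'}
print(new_pairs)  # {('Schnitzel', 'meat dish')}
```

String replacement is deliberately naive here (it would also rewrite substrings of longer words), which is one reason a real system works on parsed sentences instead.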

LEILA: System Architecture
[Diagram: Web documents and seed pairs ((dog, mammal), ...; (dog, nag), ...) go in; output pairs come out: (Schnitzel, meat dish), (Koala, mammal), …]
Components:
- Seed pair data sets
- LEILA
- LinkParser (Sleator, CMU)
- Preprocessing, stemming
- kNN Learner
- SVMLight (Joachims, Cornell U)

Gold Standard for Evaluation
[Diagram: Web documents and a target relation go in; output pairs come out: (Schnitzel, meat dish), (Koala, mammal), …]
Document: A Schnitzel is practically vitamin-free and thus the meat dish is extremely popular in Europe.
Ideal pairs: (Schnitzel, meat dish)
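Given the output pairs and the ideal pairs, precision and recall follow from their overlap. This is a sketch of the standard definitions only; the exact evaluation protocol is described in the paper. The incorrect output pair (Koala, meat dish) is invented here purely to make the numbers non-trivial.

```python
def precision_recall(output_pairs, ideal_pairs):
    """Standard precision and recall of extracted pairs against a gold standard."""
    output_pairs, ideal_pairs = set(output_pairs), set(ideal_pairs)
    correct = output_pairs & ideal_pairs
    precision = len(correct) / len(output_pairs) if output_pairs else 0.0
    recall = len(correct) / len(ideal_pairs) if ideal_pairs else 0.0
    return precision, recall

# Illustrative data: one correct and one invented incorrect output pair.
output_pairs = [("Schnitzel", "meat dish"), ("Koala", "meat dish")]
ideal_pairs = [("Schnitzel", "meat dish"), ("Koala", "mammal")]

precision, recall = precision_recall(output_pairs, ideal_pairs)
print(precision, recall)  # 0.5 0.5
```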

Results with different relations
birthDate
Seed pairs are given by a function that decides whether a word pair is
- an example (here: list of birth dates from www.famousbirthdays.com)
- a counterexample (here: can be deduced from examples)
- a candidate (here: all pairs of a name and a date)
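Such a deciding function could look as follows for birthDate. This is a sketch under one reading of "can be deduced from examples": a person has only one birth date, so a known name paired with a different date is a counterexample. The dictionary is a one-entry stand-in for the list of birth dates, and all names in the code are hypothetical.

```python
from enum import Enum

class SeedLabel(Enum):
    EXAMPLE = "example"
    COUNTEREXAMPLE = "counterexample"
    CANDIDATE = "candidate"

# Stand-in for the list of known birth dates (illustrative entry only).
KNOWN_BIRTH_DATES = {"Wolfgang Amadeus Mozart": "1756"}

def classify_pair(name, date, known=KNOWN_BIRTH_DATES):
    """Decide how a (name, date) pair can be used for the birthDate relation."""
    if name not in known:
        return SeedLabel.CANDIDATE       # nothing known: may be a new fact
    if known[name] == date:
        return SeedLabel.EXAMPLE         # agrees with the list
    return SeedLabel.COUNTEREXAMPLE      # assumption: one birth date per person

print(classify_pair("Wolfgang Amadeus Mozart", "1756"))
print(classify_pair("Wolfgang Amadeus Mozart", "1791"))
print(classify_pair("Johann Sebastian Bach", "1685"))
```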

Results with different relations
birthDate
Patterns: X (born in Y), X was born in Y, ...

Target Relation   Corpus            Precision   Recall
birthDate         Wikip composers   79% ± 8%    70% ± 9%

(see paper for details on the experiments)

Results with different relations
synonymy
Examples: all WordNet synsets
Counterexamples: all words that are not in a synset
Candidates: all pairs of proper names
Patterns: X or Y, X (or Y), ...

Target Relation   Corpus            Precision   Recall
birthDate         Wikip composers   79% ± 8%    70% ± 9%
synonymy          Wikip geography   73% ± 7%    64% ± 7%

Results with different relations
instanceOf
Examples: all direct WordNet hyponyms
Counterexamples: all words that are not hyponyms of each other
Candidates: all pairs of a proper name and a WordNet concept
Patterns: an X is a Y, X is unusual among the Y, ...

Target Relation   Corpus            Precision   Recall
birthDate         Wikip composers   79% ± 8%    70% ± 9%
synonymy          Wikip geography   73% ± 7%    64% ± 7%
instanceOf        Wikip composers   58% ± 3%    41% ± 3%

Results with different relations

Target Relation   Corpus             Precision   Recall
birthDate         Wikip composers    79% ± 8%    70% ± 9%
synonymy          Wikip geography    73% ± 7%    64% ± 7%
instanceOf        Wikip composers    58% ± 3%    41% ± 3%
                  Wikip random       33% ± 3%    33% ± 3%
                  Google composers   28% ± 3%    17% ± 2%

(see paper for details on the experiments)

Results with different competitors
[Bar chart: precision and recall (results in %, LEILA in red) of LEILA against Snowball (headquarters, Snowball’s corpus), TextToOnto and Text2Onto (instanceOf, Wikip composers), and the CV-System (instanceOf, on CV’s corpus and on Wikip composers)]
(see paper for explanations, conditions and details!)

Conclusion
Our system LEILA
- can learn arbitrary binary relations from Web documents
- uses a deep linguistic analysis
- compares favorably with other systems
See http://www.mpi-inf.de/~suchanek

Results with different competitors

System       Relation       Corpus            Precision   Recall
Snowball     headquarters   Snowball’s        34% ± 8%    30% ± 7%
LEILA        headquarters   Snowball’s        90% ± 6%    50% ± 7%
TextToOnto   instanceOf     Wikip composers   39% ± 9%    4% ± 1%
Text2Onto    instanceOf     Wikip composers   50%         2% ± 1%
CV-System    instanceOf     CV’s              32% ± 5%    32% ± 5%
LEILA        instanceOf     CV’s              26% ± 7%    15% ± 4%
CV-System    instanceOf     Wikip composers   22%         4% ± 2%
LEILA        instanceOf     Wikip composers   58% ± 3%    41% ± 3%

(see paper for explanations, conditions and details!)

Pattern Generalization – kNN
[Diagram: the patterns "This X is a Y.", "X such as Y", "A X is a Y." and "A X is a big Y" shown as points, marked with + and - labels]
(See our paper at KDD for details)
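The idea of classifying a new pattern by its nearest labeled neighbors can be sketched as follows. Token-set Jaccard similarity is swapped in here as a stand-in for whatever similarity the actual kNN learner uses over linguistic patterns. The labels of "A X is a Y." (+) and "This X is a Y." (-) follow the Algorithm slides; the + label for "X such as Y" is an assumption. All function names are hypothetical.

```python
def tokens(pattern):
    """Lower-cased token set of a pattern, ignoring a final period."""
    return set(pattern.lower().rstrip(".").split())

def jaccard(a, b):
    """Token-set overlap between two patterns (stand-in similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def knn_label(pattern, labeled, k=1):
    """Majority label among the k labeled patterns most similar to `pattern`."""
    neighbors = sorted(labeled, key=lambda item: jaccard(pattern, item[0]),
                       reverse=True)[:k]
    votes = sum(1 if label == "+" else -1 for _, label in neighbors)
    return "+" if votes > 0 else "-"

labeled = [
    ("A X is a Y.", "+"),
    ("X such as Y", "+"),      # label assumed for illustration
    ("This X is a Y.", "-"),
]

print(knn_label("A X is a big Y", labeled, k=1))  # '+', nearest is "A X is a Y."
```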

Pattern Generalization – SVM
[Diagram: the patterns "This X is a Y.", "X such as Y", "A X is a Y." and "A X is a big Y" shown as points, marked with + and - labels]
(See our paper at KDD for details)