modeling missing data in distant supervision for information extraction (ritter+, tacl 2013)
DESCRIPTION
第ïŒåæå 端NLPå匷äŒã®çºè¡šè³æ http://www.cl.ecei.tohoku.ac.jp/~y-matsu/snlp6/TRANSCRIPT
Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter (CMU), Luke Zettlemoyer (University of Washington),
Mausam (University of Washington), Oren Etzioni (Vulcan Inc.)
TACL, 1, 367-378, 2013.
Presented by Naoaki Okazaki (Tohoku University)
2014-09-05 Modeling Missing Data in Distant Supervision 1
Relation instance extraction
Input: Steven Spielberg's film Saving Private Ryan is loosely based on the brothers' story.
Extractor output (film-director relation):
Film                 Director
Saving Private Ryan  Steven Spielberg
⢠Fully-supervised learning (Zhou+ 05, âŠ)⢠Uses ACE corpora to build relation-instance classifiers⢠Suffers from the limited number of training data
⢠Unsupervised information extraction (Banko+ 07, âŠ)⢠Extracts relational patterns between entities, and clusters the
patterns into relations⢠Difficult to map clusters into relations of interest
⢠Bootstrap learning (Brin 98, âŠ)⢠Uses seed instances to extract a new set of relational patterns⢠Often suffers from low precision (semantic drift)
⢠Distant supervision (Mintz+ 09, âŠ)⢠Combines the advantages of the above approaches
Distant supervision (Mintz+, 09)
Person        Birthplace
Edwin Hubble  Marshfield
…             …
Automatic annotation: Astronomer Edwin Hubble was born in Marshfield, Missouri.
Feature extraction*
* Each row presents a single feature. Concatenate features from different sentences containing the same entity pair.
Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003-1011.
Problem: An entity pair cannot have multiple relations
E.g., Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are both true.
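The automatic annotation step can be sketched roughly as follows. This is a minimal illustration, not the pipeline of Mintz et al.: the `annotate` function, the toy knowledge base, and the plain substring matching are assumptions made here, and the second and third sentences are invented to show a noisy match and a non-match.

```python
from collections import defaultdict

# Toy knowledge base of (entity1, relation, entity2) triples (hypothetical data).
KB = {("Edwin Hubble", "born-in", "Marshfield")}

def annotate(sentences, kb):
    """Group sentences by the KB fact whose two entities they mention.

    Assumption: plain substring matching stands in for real entity matching.
    """
    labeled = defaultdict(list)
    for sentence in sentences:
        for e1, rel, e2 in kb:
            if e1 in sentence and e2 in sentence:
                labeled[(e1, rel, e2)].append(sentence)
    return dict(labeled)

sentences = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble left Marshfield as a child.",   # invented: noisy match
    "Marshfield is a city in Missouri.",          # invented: no match
]
training = annotate(sentences, KB)
for fact, matched in training.items():
    print(fact, len(matched))
```

Both matching sentences receive the born-in label here, although only the first one expresses the relation, which is the kind of noise that distant supervision has to tolerate.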
MultiR (Hoffmann+, 11)
Introduces latent variables ($z_i$) to indicate the relation expressed by sentence $x_i$
[Figure: graphical model for the entity pair (Steve Jobs, Apple)]
KB-level variables $Y$: $y^{\text{born-in}} = 0$, $y^{\text{founder}} = 1$, $y^{\text{CEO-of}} = 1$, $y^{\text{capital-of}} = 0$
Sentence-level variables $Z$ and sentences $X$:
$z_1$ = Founder, $x_1$: Steve Jobs was founder of Apple.
$z_2$ = Founder, $x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$z_3$ = CEO-of, $x_3$: Steve Jobs is CEO of Apple.
$$P(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) = \frac{1}{Z_x} \prod_r \Phi^{\text{join}}(y^r, \mathbf{z}) \prod_i \Phi^{\text{extract}}(z_i, x_i)$$
For entity pair (Steve Jobs, Apple):
$x_i$: a sentence containing the entity pair
$y^r \in \{0,1\}$: 1 if the knowledge base includes the pair with relation $r$, 0 otherwise
$z_i \in R$: the relation expressed by sentence $x_i$
$$\Phi^{\text{extract}}(z_i, x_i) = \exp\left(\sum_j \theta_j \phi_j(z_i, x_i)\right)$$
The same as (Mintz+ 09)
$$\Phi^{\text{join}}(y^r, \mathbf{z}) = \mathbf{1}\left(y^r \Leftrightarrow \exists i: r = z_i\right)$$
(Deterministic OR)
$\Phi^{\text{join}}$ ensures that a sentence $x_i$ expressing the relation $r$ exists if $y^r$ is true
Allows multiple relations for the same entity pair
MultiR: Training
Loop for passes over the training data
  Loop for entity pairs in the KB
    Predict sentence-level and KB-level relations (ignoring the facts in the KB)
    Find an optimal assignment of sentence-level relations consistent with the facts in the KB
    Update feature weights similarly to the perceptron algorithm
We need two kinds of inferences
Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541-550.
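The training loop above can be sketched as a perceptron-style procedure. This is a rough sketch under stated assumptions, not the MultiR implementation: the bag-of-words `features` function, the `NONE` label, the greedy `constrained` step, and the toy data are all hypothetical choices made for illustration.

```python
from collections import defaultdict

RELATIONS = ["founder", "CEO-of", "NONE"]  # toy label set (assumption)

def features(sentence, rel):
    # Hypothetical feature map: one (word, relation) feature per token.
    return [(w.lower().strip(".,"), rel) for w in sentence.split()]

def score(theta, sentence, rel):
    return sum(theta[f] for f in features(sentence, rel))

def predict(theta, sentences):
    # First inference: label each sentence independently, then aggregate.
    z = [max(RELATIONS, key=lambda r: score(theta, s, r)) for s in sentences]
    return z, {r for r in z if r != "NONE"}

def constrained(theta, sentences, facts):
    # Second inference (greedy approximation): give every KB fact its best
    # free sentence, then label the rest with the best allowed relation.
    z = [None] * len(sentences)
    for r in sorted(facts):
        free = [i for i, zi in enumerate(z) if zi is None]
        if free:
            z[max(free, key=lambda i: score(theta, sentences[i], r))] = r
    allowed = sorted(facts) + ["NONE"]
    return [zi or max(allowed, key=lambda r: score(theta, s, r))
            for zi, s in zip(z, sentences)]

def train(data, passes=5):
    theta = defaultdict(float)
    for _ in range(passes):                       # passes over the training data
        for sentences, facts in data:             # entity pairs in the KB
            z_pred, y_pred = predict(theta, sentences)
            if y_pred != facts:                   # update only on a KB-level mistake
                z_gold = constrained(theta, sentences, facts)
                for s, zp, zg in zip(sentences, z_pred, z_gold):
                    for f in features(s, zg):
                        theta[f] += 1.0
                    for f in features(s, zp):
                        theta[f] -= 1.0
    return theta

# Toy training data: (sentences for one entity pair, relations in the KB).
data = [
    (["Jobs founded Apple."], {"founder"}),
    (["Cook is CEO of Apple."], {"CEO-of"}),
]
theta = train(data)
print(predict(theta, ["Jobs founded Apple."]))
```

The update adds the features of the KB-consistent assignment and subtracts those of the unconstrained prediction, so sentences on which both inferences agree contribute nothing.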
MultiR: Inference 1: $\arg\max_{\mathbf{y},\mathbf{z}} P(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$
For entity pair (Steve Jobs, Apple)
$x_1$: Steve Jobs was founder of Apple.
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$x_3$: Steve Jobs is CEO of Apple.
Scores for each sentence and relation:
       born-in  founder  CEO-of  capital-of
$z_1$  0.5      16.0     9.0     0.1
$z_2$  8.0      11.0     6.0     0.1
$z_3$  7.0      8.0      7.0     0.2
Predict a relation label for each sentence independently: $z_1$ = founder, $z_2$ = founder, $z_3$ = founder
Aggregate sentence-level predictions into global-level predictions: $y^{\text{born-in}} = 0$, $y^{\text{founder}} = 1$, $y^{\text{CEO-of}} = 0$, $y^{\text{capital-of}} = 0$
Very easy to find!
Computational cost: $O(|R| \cdot |S|)$
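The two steps of this inference can be sketched with the scores shown on the slide. The function name `infer_unconstrained` and the data layout (one row per sentence, one column per relation) are assumptions made for this illustration.

```python
RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

# Scores from the slide: one row per sentence x_1..x_3, one column per relation.
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]

def infer_unconstrained(scores, relations):
    """Label each sentence independently, then OR the labels into y."""
    z = []
    for row in scores:
        best = max(range(len(relations)), key=lambda k: row[k])
        z.append(relations[best])
    y = [int(r in z) for r in relations]
    return y, z

y, z = infer_unconstrained(SCORES, RELATIONS)
print(y)  # [0, 1, 0, 0]
print(z)  # ['founder', 'founder', 'founder']
```

Each sentence needs one pass over the relation scores, which is where the linear cost in the number of relations and sentences comes from.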
MultiR: Inference 2: $\arg\max_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
For entity pair (Steve Jobs, Apple), given $y^{\text{born-in}} = 0$, $y^{\text{founder}} = 1$, $y^{\text{CEO-of}} = 1$, $y^{\text{capital-of}} = 0$
$x_1$: Steve Jobs was founder of Apple.
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$x_3$: Steve Jobs is CEO of Apple.
[Figure: bipartite graph between relation nodes $y^r$ and sentence nodes $z_i$]
Edge weights for the relations with $y^r = 1$:
       founder  CEO-of
$z_1$  16       9
$z_2$  11       6
$z_3$  8        7
Define an edge weight: $w(y^r, z_i) = \Phi^{\text{extract}}(r, x_i)$
A node with $y^r = 1$ must have at least one edge connecting to some $z_i$
Each node $z_i$ must have an edge connecting to some $y^r$
Find a set of edges that maximizes the sum of weights
Result: $z_1$ = founder, $z_2$ = founder, $z_3$ = CEO-of
Exact solution in polynomial time
In practice, approximate solution by greedy search (assigning $z_i$ for each node $y^r = 1$) is sufficient
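One possible reading of the greedy search is sketched below with the edge weights from the slide. The exact greedy order used in MultiR is not given here, so the procedure in `infer_constrained_greedy` (first cover every active relation with its best free sentence, then label the remaining sentences) is an assumption.

```python
# Edge weights from the slide, restricted to relations with y^r = 1.
WEIGHTS = [
    {"founder": 16.0, "CEO-of": 9.0},   # x_1
    {"founder": 11.0, "CEO-of": 6.0},   # x_2
    {"founder": 8.0, "CEO-of": 7.0},    # x_3
]

def infer_constrained_greedy(weights, active):
    """Assign one relation per sentence so that every active relation is used."""
    z = [None] * len(weights)
    # Step 1: each active relation claims its highest-weight unassigned sentence.
    for r in active:
        free = [i for i, zi in enumerate(z) if zi is None]
        if not free:
            break
        best = max(free, key=lambda i: weights[i][r])
        z[best] = r
    # Step 2: remaining sentences take their best active relation.
    for i, zi in enumerate(z):
        if zi is None:
            z[i] = max(active, key=lambda r: weights[i][r])
    return z

z = infer_constrained_greedy(WEIGHTS, ["founder", "CEO-of"])
total = sum(w[r] for w, r in zip(WEIGHTS, z))
print(z, total)  # ['founder', 'founder', 'CEO-of'] 34.0
```

In this sketch the result depends on the order in which relations are processed: visiting CEO-of first yields CEO-of, founder, founder with a total of 28.0, so the greedy search is only an approximation of the exact matching.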
Contribution of this work
• MultiR makes two assumptions (hard constraints):
  • If a fact is not found in the database, it cannot be mentioned in the text
  • If a fact is in the database, it must be mentioned in at least one sentence
• Relax MultiR to handle the situations where:
  • A fact is not mentioned in text (MIT)
  • A fact mentioned in text is missing in database (MID)
• Side effect of this relaxation
  • Incorporates the tendency that the knowledge base is likely to include popular entities and relations
Distant Supervision with Data Not Missing at Random (DNMAR)
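[Figure: graphical model for the entity pair (Steve Jobs, Apple)]
KB-level variables $Y$: $y^{\text{born-in}} = 0$, $y^{\text{founder}} = 1$, $y^{\text{CEO-of}} = 1$, $y^{\text{visit}} = 0$
Latent variables $T$: $t^{\text{born-in}} = 0$, $t^{\text{founder}} = 1$, $t^{\text{CEO-of}} = 0$, $t^{\text{visit}} = 1$
Sentence-level variables $Z$ and sentences $X$:
$z_1$ = Founder, $x_1$: Steve Jobs was founder of Apple.
$z_2$ = Founder, $x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$z_3$ = visit, $x_3$: Steve Jobs visited Apple store…
Introduce a layer of latent variables ($t^r$) to handle missing cases
Introduce a new factor:
$$\psi^{\text{miss}}(y^r, t^r) =
\begin{cases}
-\alpha_{\text{MIT}} & (y^r = 1 \wedge t^r = 0) \quad \text{(missing in text)} \\
-\alpha_{\text{MID}} & (y^r = 0 \wedge t^r = 1) \quad \text{(missing in DB)} \\
0 & \text{(otherwise)}
\end{cases}$$
Relaxing two hard constraints in MultiR into soft ones with penalty factors $-\alpha_{\text{MIT}}$ and $-\alpha_{\text{MID}}$
Training algorithm is the same as the one used in MultiR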
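The new factor can be written as a small function and applied to the example on this slide. The values 2.0 and 3.0 for the two penalty factors are hypothetical, chosen only to make the arithmetic visible.

```python
RELATIONS = ["born-in", "founder", "CEO-of", "visit"]
Y = [0, 1, 1, 0]   # facts in the knowledge base (from the slide)
T = [0, 1, 0, 1]   # facts mentioned in the text (from the slide)

def miss_factor(y, t, alpha_mit, alpha_mid):
    """Soft penalty for a disagreement between knowledge base and text."""
    if y == 1 and t == 0:
        return -alpha_mit   # in the DB, but missing in text
    if y == 0 and t == 1:
        return -alpha_mid   # in the text, but missing in DB
    return 0.0

# Hypothetical penalty factors.
ALPHA_MIT, ALPHA_MID = 2.0, 3.0
penalties = {r: miss_factor(y, t, ALPHA_MIT, ALPHA_MID)
             for r, y, t in zip(RELATIONS, Y, T)}
total = sum(penalties.values())
print(penalties)
print(total)  # -5.0
```

CEO-of pays the missing-in-text penalty and visit pays the missing-in-DB penalty, while the two relations on which text and database agree cost nothing.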
Constrained inference: $\arg\max_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
For entity pair (Steve Jobs, Apple), given $y^{\text{born-in}} = 0$, $y^{\text{founder}} = 1$, $y^{\text{CEO-of}} = 1$, $y^{\text{visit}} = 0$; $t^r$ and $z_i$ are unknown
$x_1$: Steve Jobs was founder of Apple.
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$x_3$: Steve Jobs visited Apple store…
$$\mathbf{z}^* = \arg\max_{\mathbf{z}} \sum_{i=1}^{n} \theta \cdot \Phi^{\text{extract}}(z_i, x_i) + \sum_r \left( \alpha_{\text{MIT}} \cdot \mathbf{1}(y^r \wedge \exists i: r = z_i) - \alpha_{\text{MID}} \cdot \mathbf{1}(\neg y^r \wedge \exists i: r = z_i) \right)$$
Became more challenging
A* search can find an exact solution, but is not scalable with many variables
Present a greedy hill climbing approach for the inference:
1. Initialize $z_i$ at random
2. Obtain neighborhoods of the current solution
3. Move to the neighbor yielding the highest score
4. Repeat this process
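The four steps can be sketched as follows. Several choices here are assumptions: the sentence-level term reuses the score table from the MultiR inference slides, a neighbor is defined as changing the label of a single sentence, both penalty factors are set to the hypothetical value 2.0, and the search stops when no neighbor improves the score.

```python
import random

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]
SCORES = [                      # rows: x_1..x_3, columns: RELATIONS
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}

def objective(z, alpha_mit, alpha_mid):
    """Sentence scores plus a bonus or penalty per relation mentioned in z."""
    total = sum(SCORES[i][RELATIONS.index(r)] for i, r in enumerate(z))
    for r in set(z):            # relations expressed by at least one sentence
        total += alpha_mit if Y[r] else -alpha_mid
    return total

def hill_climb(alpha_mit=2.0, alpha_mid=2.0, seed=0):
    rng = random.Random(seed)
    z = [rng.choice(RELATIONS) for _ in SCORES]          # 1. random start
    while True:
        neighbors = [z[:i] + [r] + z[i + 1:]             # 2. change one z_i
                     for i in range(len(z))
                     for r in RELATIONS if r != z[i]]
        best = max(neighbors,                            # 3. best neighbor
                   key=lambda n: objective(n, alpha_mit, alpha_mid))
        if objective(best, alpha_mit, alpha_mid) <= objective(z, alpha_mit, alpha_mid):
            return z, objective(z, alpha_mit, alpha_mid)
        z = best                                         # 4. repeat

z, value = hill_climb()
print(z, value)  # ['founder', 'founder', 'CEO-of'] 38.0
```

On this small example the only local optimum is also the global one, so every random start ends in the same assignment. That is not guaranteed in general, which is why hill climbing is an approximate method.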
Incorporating popularity in KB
• We tune the penalty factors $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$ on a development set
• We can take into account how likely each fact is to be observed in the text and the knowledge base
  • Facts about Barack Obama are likely to exist
  • Facts about Naoaki Okazaki are unlikely to exist
• Control the penalty factor for each entity pair
  • Popularity of entities: $\alpha_{\text{MID}}^{(e_1,e_2)} = -\gamma \min(c(e_1), c(e_2))$
    • A larger penalty if the model predicts that a fact about a popular entity does not exist in KB
  • Well-aligned relations: assign 3 kinds of values of $\alpha_{\text{MIT}}^{r}$
    • A larger penalty if a popular relation such as contains, place_lived, and nationality does not exist in text
Experiments
• Binary relation extraction
  • The standard setting (Riedel+, 10)
    • Knowledge base: Freebase relations
    • Text corpus: 1.8 million New York Times articles
  • Two kinds of evaluation
    • Sentence-level extractions using the dataset (Hoffmann+, 11)
    • Holdout evaluation on Freebase knowledge
• Unary relation extraction (NE categorization)
  • Twitter NE categorization dataset (Ritter+, 11)
    • Knowledge base: Freebase (instances and their categories)
    • Text corpus: tweets
  • Hold-out evaluation
Results
17% increase in area under the curve.
Incorporating popularity yielded a 27% increase over the baseline.
This evaluation underestimates precision because many facts correctly extracted from text are missing in the database.
DNMAR doubled the recall.
Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction, TACL(1), 367-378.
Conclusion
• Investigated the problem of missing data in distant supervision
• Presented an extension of MultiR to handle missing data
• Could incorporate the popularity of facts to be included in the knowledge base and text
• Presented a scalable inference algorithm based on greedy hill-climbing
• Demonstrated the effectiveness of the modeling
References
• Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541-550.
  • Slides and code
• Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003-1011.