relation recognition - sjtuli-fang/lecture 7 relaion ie...distant supervision --mike mintz, et al....
TRANSCRIPT
![Page 1: Relation Recognition - SJTUli-fang/Lecture 7 Relaion IE...Distant Supervision --Mike Mintz, et al. Distant supervision for relation extraction without labeled data ACL2009 Combing](https://reader035.vdocuments.mx/reader035/viewer/2022070820/5f1db5982ebc7d16f53b0a22/html5/thumbnails/1.jpg)
lecture of Internet-based IE Technology
Relation Recognition
Fang Li
Dept. of Computer Science &
Engineering
Contents
-- Semi-supervised learning
-- Distant supervised learning
-- Deep learning
Semi-supervised method
Relation Bootstrapping
Gather a set of seeds
Iterate:
1. Find sentences that contain these seeds
2. Look at the context between or around the seeds to define a pattern
3. Use the pattern to find more examples
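The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `bootstrap`, the toy sentences, and the simplification that every entity is a single token are all assumptions made for the example.

```python
import re

def bootstrap(sentences, seeds, iterations=2):
    """Toy relation bootstrapping: seed pairs -> patterns -> new pairs."""
    pairs = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # Steps 1-2: find sentences containing a known pair and keep
        # the text between the two entities as a pattern.
        for x, y in pairs:
            for s in sentences:
                m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), s)
                if m:
                    patterns.add(m.group(1))
        # Step 3: apply every pattern to harvest new pairs.
        for p in patterns:
            regex = re.compile(r"(\w+)" + re.escape(p) + r"(\w+)")
            for s in sentences:
                for x, y in regex.findall(s):
                    pairs.add((x, y))
    return pairs, patterns

# Hypothetical toy corpus (single-token entities only).
sentences = [
    "Hubble was born in Marshfield",
    "Einstein was born in Ulm",
    "Curie was born in Warsaw",
]
pairs, patterns = bootstrap(sentences, {("Hubble", "Marshfield")})
```

Starting from one seed pair, the loop learns the context " was born in " and uses it to pick up the other two pairs. Without a confidence filter, a bad pattern would spread its errors in later iterations.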
Bootstrapping from seed entity pairs to learn relations
Confidence Value for Bootstrapping
Given a document collection D, a current set of tuples T, and a proposed pattern p, two factors need to be considered:
Hits: the set of tuples in T that p matches while looking in D.
Finds: the total set of tuples that p finds in D.
Conf(p) = (|Hits| / |Finds|) × log(|Finds|)
Example:
A corpus D: ABCDEF BDCDFE CDEFG HHECDE
Tuple set T: BCDE ECDE AUD
Conf(CD) = 2/4 × log10(4) ≈ 0.30 = 30%
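The toy calculation can be checked with a short script. This is a sketch under one reading of the example: each string in D that contains the pattern counts as a find, and each tuple in T that contains the pattern and occurs somewhere in D counts as a hit. The function name `pattern_confidence` is made up for illustration, and the base-10 logarithm is inferred from the 30% result.

```python
import math

def pattern_confidence(pattern, corpus, tuples):
    # Finds: corpus items that the pattern matches.
    finds = [doc for doc in corpus if pattern in doc]
    # Hits: known tuples that contain the pattern and occur in the corpus.
    hits = [t for t in tuples
            if pattern in t and any(t in doc for doc in corpus)]
    if not finds:
        return 0.0
    return len(hits) / len(finds) * math.log10(len(finds))

corpus = ["ABCDEF", "BDCDFE", "CDEFG", "HHECDE"]
tuples = ["BCDE", "ECDE", "AUD"]
conf = pattern_confidence("CD", corpus, tuples)  # 2/4 * log10(4)
```

The log factor rewards patterns that fire often, while the ratio rewards patterns that mostly rediscover tuples already known to be correct.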
Example: Extract Person name and position title
Search Engine Keywords: Wang Ning + vice Mayor
Patterns:
[person] [was assigned | was selected | was appointed as] [position]
New examples:
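One way to apply the person/position pattern from this slide is a regular expression. The sketch below is an assumption-heavy illustration: it treats a person name as a run of capitalized words and a position as everything up to the next punctuation mark, which is far cruder than real named-entity recognition. The example sentences are invented.

```python
import re

NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
TRIGGER = r"(?:was assigned|was selected|was appointed as)"
PATTERN = re.compile(
    rf"(?P<person>{NAME}) {TRIGGER} (?P<position>[A-Za-z ]+?)(?=[.,;]|$)"
)

def extract_positions(sentences):
    """Return (person, position) pairs for sentences matching the pattern."""
    results = []
    for s in sentences:
        m = PATTERN.search(s)
        if m:
            results.append((m.group("person"), m.group("position")))
    return results

examples = [
    "Wang Ning was appointed as vice Mayor.",
    "Li Hua was selected chairman of the board.",
    "Wang Ning visited the city hall.",
]
found = extract_positions(examples)
```

Only the first two sentences match; the third mentions the person but has no trigger phrase.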
How does the bootstrapping method differ from supervised methods for relation extraction previously described? (Select all that apply)
A. The bootstrap method doesn't train a classifier.
B. The bootstrap method does not require a human-selected set of features.
C. The bootstrap method iteratively expands its feature set.
D. The bootstrap method is nondeterministic (it will give different outputs for the same corpus and seeds).
Distant supervision method
“Distant supervision for relation extraction without labeled data”
What does “distant supervision” mean?
What are the advantages of the method?
What are the disadvantages of the method?
Distant Supervision: Mike Mintz, et al. Distant supervision for relation extraction without labeled data. ACL 2009
Combining bootstrapping with supervised learning
Instead of 5 seeds, use a large database to get a huge number of seeds
Create lots of features from all these examples
Combine in a supervised classifier
Existing Knowledge Base
DBpedia or Freebase: tens of thousands of examples of many relations, such as:
place-of-birth<Edwin Hubble, Marshfield>
place-of-birth<Albert Einstein, Ulm>
…
Wikipedia: Extract all sentences that have two named entities that match the tuple, like the following:
...Hubble was born in Marshfield...
...Einstein, born (1879), Ulm...
...Hubble’s birthplace in Marshfield...
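The labeling step described on this slide can be sketched as follows. This is a minimal illustration, not the procedure from Mintz et al.: the function name `label_sentences` is hypothetical, and matching on the last token of each entity name is a crude stand-in for real named-entity recognition.

```python
def label_sentences(kb, sentences):
    """Attach a KB relation to every sentence mentioning both entities."""
    examples = []
    for sentence in sentences:
        for relation, e1, e2 in kb:
            # Match on the last token of each name, a crude stand-in for
            # real named-entity recognition and entity linking.
            if e1.split()[-1] in sentence and e2.split()[-1] in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples

kb = [
    ("place-of-birth", "Edwin Hubble", "Marshfield"),
    ("place-of-birth", "Albert Einstein", "Ulm"),
]
sentences = [
    "Hubble was born in Marshfield",
    "Einstein, born (1879), Ulm",
    "Hubble's birthplace in Marshfield",
    "Einstein visited Marshfield",  # invented sentence: no KB pair matches
]
training = label_sentences(kb, sentences)
```

Every sentence that mentions both entities of a known tuple becomes a training example for that relation, whether or not the sentence really expresses it. That is where the label noise of distant supervision comes from.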
Collecting Training Data
Freebase: a freely available online database of structured semantic data. Mintz et al. use 1.8 million instances of 102 relations connecting 940,000 entities.
The distant supervision algorithm for relation extraction
Distant supervision: Lexical Features:
The sequence of words between the two entities
The part-of-speech tags of these words
A flag indicating which entity came first in the sentence
A window of k words to the left of Entity 1 and their part-of-speech tags
A window of k words to the right of Entity 2 and their part-of-speech tags
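The lexical features listed above can be read off a tagged sentence with simple slicing. The sketch below assumes the sentence is already tokenized and POS-tagged and that entity spans are given as token offsets. The function name, the dictionary keys, and the hand-assigned tags are illustrative choices, not the paper's feature encoding.

```python
def lexical_features(tagged, e1, e2, k=1):
    """tagged: list of (word, pos); e1, e2: (start, end) token spans, end exclusive."""
    first, second = sorted([e1, e2])
    between = tagged[first[1]:second[0]]
    return {
        "between_words": [w for w, _ in between],
        "between_pos": [p for _, p in between],
        "e1_first": e1[0] < e2[0],  # flag: which entity came first
        "left_window": tagged[max(0, first[0] - k):first[0]],
        "right_window": tagged[second[1]:second[1] + k],
    }

# Hand-tagged example sentence (the tags are assumptions).
tagged = [
    ("Astronomer", "NN"), ("Edwin", "NNP"), ("Hubble", "NNP"),
    ("was", "VBD"), ("born", "VBN"), ("in", "IN"),
    ("Marshfield", "NNP"), (",", ","), ("Missouri", "NNP"),
]
feats = lexical_features(tagged, e1=(1, 3), e2=(6, 7), k=1)
```

With k = 1 the left window is the single token before the first entity and the right window is the single token after the second.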
Distant supervision: Syntactic Features:
Use parser MINIPAR
A dependency path between the two entities.
For each entity, one ‘window’ node that is not part of the dependency path.
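A dependency path between two entities can be found with a breadth-first search over the parse, treated as an undirected graph. The sketch below does not call MINIPAR. The edges and their labels are written by hand for one sentence and use a generic label set, so they are assumptions made only to show the path search.

```python
from collections import deque

def dependency_path(edges, source, target):
    """Shortest path between two words in an undirected view of the parse."""
    graph = {}
    for head, label, dep in edges:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor, label in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                # Paths alternate word, edge label, word, ...
                queue.append(path + [label, neighbor])
    return None

# Hand-written parse of "Hubble was born in Marshfield" (labels assumed).
edges = [
    ("born", "nsubjpass", "Hubble"),
    ("born", "auxpass", "was"),
    ("born", "prep", "in"),
    ("in", "pobj", "Marshfield"),
]
path = dependency_path(edges, "Hubble", "Marshfield")
```

In this example "was" is attached to "born" but does not lie on the path, so it is the kind of node that could serve as a window node.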
Distant supervision: Features
Window size for lexical features: k = 0, 1, 2
P(place_of_birth | f1, f2, f3, …, f7000)
Examples of high-weight features for several relations
Training and Testing
Training: 900,000 Freebase relation instances, 800,000 Wikipedia articles
Testing: 900,000 Freebase relation instances, 400,000 different articles
Classifier: a multi-class logistic regression classifier that returns a relation name and a confidence score.
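The prediction step of a multi-class logistic regression classifier is a softmax over linear scores. The sketch below shows how such a classifier returns a relation name together with a confidence score. The weights are invented for illustration, not learned, and the feature strings are hypothetical names.

```python
import math

def classify(features, weights):
    """Return (relation, confidence) under a softmax over linear scores."""
    scores = {
        relation: sum(w.get(f, 0.0) for f in features)
        for relation, w in weights.items()
    }
    top = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {r: math.exp(s - top) for r, s in scores.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / total

# Made-up weights: one indicative feature per relation.
weights = {
    "place_of_birth": {"between=was born in": 2.0},
    "founder": {"between=founded": 2.0},
    "NONE": {},
}
relation, confidence = classify(["between=was born in"], weights)
```

Here the confidence is the softmax probability of the winning relation, which is about 0.79 for this toy input.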
Freebase Examples
Human Evaluation Result
Examples of their results
Advantages
Does not need human annotation
Large training corpus
Disadvantages of the method
Noise in the training data, for example,
founder(Bill Gates, Microsoft)
S1: Bill Gates is one of the founders of Microsoft Co.
S2: Bill Gates has founded the Microsoft Co.
S3: Bill Gates was CEO of the Microsoft Co. ×
S4: Bill Gates discussed his retirement with the CEO of the Microsoft Co. ×
(× marks sentences that mention both entities but do not express the founder relation.)
Relations are treated as disjoint
Founded(Jobs, Apple) and CEO-of(Jobs, Apple) cannot both be extracted.
How to improve it?
Improved method:
Performance of Relation Classification (SemEval-2010 Task 8 dataset)
From Xianpei Han’s tutorial of 2016-10-21, “Knowledge graph based semantic relation Extraction”
New Trends: Deep Learning
Figure labels: Training data, Word Embedding
Example sentence: Acme Inc. hired Mr Smith as their new CEO, replacing Mr Bloggs.
Zeng et al. Relation classification via convolutional deep neural network. COLING 2014
Extract relations using new method
Relation Representation (Triplets)
(head, label, tail), i.e., (h, l, t): there exists a relationship named label between the entities head and tail.
(Mom, born_in, Austin)
TransE:
If (h,l,t) holds, then the embedding of the tail entity t should be close to the embedding of the head entity h plus some vector that depends on the relationship l
Example
Aim: h + l ≈ t. If (h, l, t) holds, t should be a nearest neighbor of h + l; otherwise, h + l should be far away from t.
(Yaoming, born_in, Shanghai) √
(Yaoming, born_in, Beijing) ×
d(Yaoming + born_in, Beijing) >> d(Yaoming + born_in, Shanghai)
d: a distance used as the dissimilarity function
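The distance comparison in this example can be reproduced with hand-picked vectors. The two-dimensional embeddings below are invented so that the arithmetic is easy to follow. Real TransE embeddings are learned and have many more dimensions, and the Euclidean distance is only one possible choice for d.

```python
import math

def distance(a, b):
    """Euclidean distance, used here as the dissimilarity d."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def translate(h, l):
    """Head embedding shifted by the relation vector: h + l."""
    return [x + y for x, y in zip(h, l)]

# Hand-picked 2-d vectors for illustration only; real embeddings are learned.
entity = {"yaoming": [0.0, 0.0], "Shanghai": [1.0, 0.0], "Beijing": [0.0, 1.0]}
relation = {"born_in": [1.0, 0.1]}

query = translate(entity["yaoming"], relation["born_in"])
d_shanghai = distance(query, entity["Shanghai"])
d_beijing = distance(query, entity["Beijing"])
```

With these vectors, yaoming + born_in lands next to Shanghai and far from Beijing, so ranking candidate tails by distance puts the correct triplet first.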
Learn TransE
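Minimize a margin-based ranking criterion over the training set S:
L = Σ_{(h,l,t)∈S} Σ_{(h',l,t')∈S'} [γ + d(h + l, t) − d(h' + l, t')]₊, where γ > 0 is the margin and [x]₊ = max(0, x)
Corrupted triplets S': training triplets with either the head or the tail (but not both) replaced by a random entity
Positive examples: triplets (h, l, t) in the training set
Negative examples: corrupted triplets (h', l, t')
The optimization is carried out by stochastic gradient descent.
Additional constraint: the L2-norm of the entity embeddings is 1.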
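The margin-based criterion for one positive/negative pair can be sketched as a hinge on the difference of two distances. The code below is an illustration under stated assumptions: the toy embeddings are hand-picked, the helper names are hypothetical, and the corruption step excludes the original entity as a simplification. It computes the loss only and leaves out the gradient update and the norm constraint.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def score(ent, rel, triplet):
    """Dissimilarity d(h + l, t); lower means more plausible."""
    h, l, t = triplet
    return l2([x + y for x, y in zip(ent[h], rel[l])], ent[t])

def corrupt(triplet, entities, rng):
    """Replace the head or the tail (not both) with another entity."""
    h, l, t = triplet
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), l, t)
    return (h, l, rng.choice([e for e in entities if e != t]))

def margin_loss(ent, rel, positive, negative, margin=1.0):
    # Hinge: zero once the negative is at least `margin` farther away.
    return max(0.0, margin + score(ent, rel, positive) - score(ent, rel, negative))

# Toy embeddings, chosen by hand for the example.
ent = {"yaoming": [0.0, 0.0], "Shanghai": [1.0, 0.0], "Beijing": [0.0, 1.0]}
rel = {"born_in": [1.0, 0.1]}
pos = ("yaoming", "born_in", "Shanghai")
neg = ("yaoming", "born_in", "Beijing")

loss_small_margin = margin_loss(ent, rel, pos, neg, margin=1.0)
loss_large_margin = margin_loss(ent, rel, pos, neg, margin=2.0)
sampled = corrupt(pos, list(ent), random.Random(0))
```

With margin 1.0 the pair is already separated and contributes zero loss. With margin 2.0 the same pair is penalized, so gradient descent would keep pushing the negative tail farther away.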
Experimental Results
TransE’s Result
Comparisons with several methods developed from TransE
What are the disadvantages of TransE?
Cannot model 1-to-N, N-to-1, and N-to-N relations well: h + l ≈ t pushes all tails of the same (h, l) toward the same point.
(USA, president, Obama)
(USA, president, Bush)
(USA, president, Trump)
Source Codes
KB2E: https://github.com/thunlp/KB2E
TransE, TransH, TransR,…
References for Relation Extraction using deep learning methods
Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao (2014). Relation Classification via Convolutional Deep Neural Network. COLING.
Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao (2015). Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. EMNLP.
Lin, et al. (2016). Neural Relation Extraction with Selective Attention over Instances. ACL.
Summary
Semi-supervised methods
Distant supervision method
Deep learning method
About the Project (1)
Task: Employment relation extraction
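Training corpus (translated from Chinese): Beijing, December 30. Reported by Xinhua News Agency reporter Hu Xiaomeng and staff reporter Wu Yaming: With the New Year approaching, Guo Dongpo, Director of the Overseas Chinese Affairs Office of the State Council, today delivered a New Year message through the news media to compatriots overseas and to returned overseas Chinese, their relatives, and overseas Chinese affairs workers at home.
(Hu Xiaomeng 胡晓梦, Xinhua News Agency 新华社)
(Wu Yaming 吴亚明, Xinmin Evening News 新民晚报)
(Guo Dongpo 郭东坡, Overseas Chinese Affairs Office of the State Council 国务院侨务办)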
About the Project (2)
Methods:
Pattern-based
Supervised, semi-supervised, or unsupervised methods
The training corpus is available online.
Evaluation:
Use the test corpus with human-annotated results to evaluate your algorithm.
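The slide does not name an evaluation metric. One common choice for comparing extracted tuples against human annotation is precision, recall, and F1 over exact-match (person, organization) pairs. The sketch below assumes that choice, and the predicted and gold tuples are invented for the example.

```python
def evaluate(predicted, gold):
    """Exact-match precision, recall, and F1 over sets of tuples."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if correct else 0.0
    return precision, recall, f1

# Hypothetical system output versus hypothetical human annotation.
gold = [
    ("Hu Xiaomeng", "Xinhua News Agency"),
    ("Guo Dongpo", "Overseas Chinese Affairs Office"),
]
predicted = [
    ("Hu Xiaomeng", "Xinhua News Agency"),
    ("Wu Yaming", "Xinhua News Agency"),
]
scores = evaluate(predicted, gold)
```

One of the two predictions is correct and one of the two gold tuples is found, so precision, recall, and F1 all come out at 0.5.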