Coreference Based Event-Argument Relation Extraction on Biomedical Text
Katsumasa Yoshikawa 1), Sebastian Riedel 2), Tsutomu Hirao 3), Masayuki Asahara 1), Yuji Matsumoto 1)
1) Nara Institute of Science and Technology, Japan
2) University of Massachusetts, Amherst, USA
3) NTT Communication Science Lab., Japan
SMBM 2010, 25th - 26th October, 2010, Hinxton, Cambridge, UK


Page 1

Coreference Based Event-Argument Relation Extraction on Biomedical Text

Katsumasa Yoshikawa 1), Sebastian Riedel 2), Tsutomu Hirao 3), Masayuki Asahara 1), Yuji Matsumoto 1)

1) Nara Institute of Science and Technology, Japan
2) University of Massachusetts, Amherst, USA
3) NTT Communication Science Lab., Japan

SMBM 2010, 25th - 26th October, 2010, Hinxton, Cambridge, UK

Page 2

Outline

Research summary

Related work of event extraction

Proposed coreference based approach

Experimental setup and highlighted data

Conclusion and future work

Page 3

Summary of Our Research

Coreference Based Approach for biomedical event extraction with Markov Logic

Why coreference?
– Extraction of valuable event-argument relations in discourse structure
– Identification of arguments crossing sentence boundaries

Why Markov Logic?
– Implementation of Salience in Discourse and Transitivity in a very direct fashion

Page 4

Event-Argument Relation with Coreference Information

We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

TPA induction increases the binding of AP-1 factors to this element.

[Figure: the three sentences above (labeled S1, S2, S3 on the slide) with event-argument arcs labeled Cause and Theme]

Arguments are often related to other mentions through coreference relations

Page 5

Event-Argument Relation with Coreference Information

"this element" in S2 is coreferent to "a regulatory element" in S1

We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

TPA induction increases the binding of AP-1 factors to this element.

[Figure: the three sentences above (labeled S1, S2, S3 on the slide) with event-argument arcs labeled Cause and Theme, plus a Corefer arc between sentences]

Page 6

Event-Argument Relation with Coreference Information

The true argument (Theme) of binding is "a regulatory element", and "this element" is just an anaphor of it

Transitivity enables us to extract it

We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

TPA induction increases the binding of AP-1 factors to this element.

[Figure: the three sentences above (labeled S1, S2, S3 on the slide) with arcs labeled Cause and Theme; three arcs are marked (A) Theme, (B) Corefer, and (C) Theme]

(A) Theme & (B) Corefer => (C) Theme

Page 7

Event-Argument Relation with Coreference Information

Arguments mentioned over and over again have higher salience in discourse and should be extracted at any cost

Our approach can aggressively extract such arguments, which are valuable in discourse structure

We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.

TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.

TPA induction increases the binding of AP-1 factors to this element.

[Figure: the three sentences above (labeled S1, S2, S3 on the slide) with arcs labeled Cause and Theme, plus two Corefer arcs, each paired with a Theme arc]

Page 8

Outline

Research summary

Related work of event extraction

Proposed coreference based approach

Experimental setup and highlighted data

Conclusion and future work

Page 9

Biomedical Event Extraction (BioNLP'09 Task 1)

Extracting events, arguments, and their relations in a document

TPA induction increases the binding of AP-1 factors to this element.

[Figure: the sentence above with its event and argument spans marked and arcs labeled Cause and Theme]

Main targets : Event-Argument relations (E-As)

Example

event            induction, increases, binding
argument         TPA, AP-1 factors, this element, induction, binding
event-argument   Theme(induction, TPA), Cause(increases, induction), Theme(increases, binding), Theme(binding, AP-1 factors), Theme(binding, this element)
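
As a minimal sketch, the E-A relations listed in the example can be held as plain records and grouped by event. The `Relation` class and the `arguments_of` helper are illustrative names introduced here, not part of the BioNLP'09 task definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    role: str      # "Theme" or "Cause"
    event: str     # event trigger
    argument: str  # argument text (may itself be an event trigger)


# The five event-argument relations from the example sentence.
RELATIONS = [
    Relation("Theme", "induction", "TPA"),
    Relation("Cause", "increases", "induction"),
    Relation("Theme", "increases", "binding"),
    Relation("Theme", "binding", "AP-1 factors"),
    Relation("Theme", "binding", "this element"),
]


def arguments_of(relations):
    """Group (role, argument) pairs by event trigger, keeping input order."""
    grouped = defaultdict(list)
    for rel in relations:
        grouped[rel.event].append((rel.role, rel.argument))
    return dict(grouped)
```

Note that "induction" and "binding" appear both as events and as arguments, which is why an argument is stored as text rather than as a separate entity type.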

Page 10

Previous Work [in BioNLP'09]

Pairwise pipeline by SVM classifiers [Bjorne et al., 2009]

1. Identification of events
2. Coupling with proteins and labeling the roles

[Figure: an event with two candidate arguments (arg1, arg2), shown before and after role labeling]

Collective approach by Markov Logic [Riedel et al., 2009] [Poon et al., 2010]

1. Jointly identify the most probable E-A assignments in a sentence

[Figure: event1 and event2 with arg1, arg2, arg3, linked by arcs labeled Theme and Cause]

Page 11

Outline

Research summary

Related work of event extraction

Proposed coreference based approach

Experimental setup and highlighted data

Conclusion and future work

Page 12

Markov Logic [Richardson and Domingos, 2006]

A Statistical Relational Learning framework

An expressive template language of Markov Networks

Not only hard but also soft constraints

A Markov Logic Network (MLN) is a set of pairs (φ, w) where
– φ is a formula in first-order logic
– w is a real number weight

A higher weight means a stronger constraint
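
For reference, the standard definition from Richardson and Domingos (2006) turns such a set of pairs into a probability distribution over possible worlds y. The symbols M (the set of pairs), n_φ(y) (the number of true groundings of φ in y), and Z (the normalization constant) are notation introduced here and do not appear on the slide.

```latex
P(Y = y) = \frac{1}{Z} \exp\Big( \sum_{(\phi, w) \in M} w \, n_{\phi}(y) \Big),
\qquad
Z = \sum_{y'} \exp\Big( \sum_{(\phi, w) \in M} w \, n_{\phi}(y') \Big)
```

A formula with a large positive weight makes worlds that violate it much less probable than worlds that satisfy it, which is the sense in which a higher weight acts as a stronger constraint.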

Page 13

Coreference Based Event Extraction with Markov Logic

Hidden predicate (Query)

predicate        description
event(i)         token i is an event
eventType(i,t)   token i is an event with type t
role(i,j,r)      token i has an argument j with role r

Observed predicate (Given)

predicate        description
pos(i,p)         token i has part-of-speech p
protein(i)       token i is a protein
dep(i,j,d)       token i depends on token j with dependency label d

Features are described by combinations of these predicates

Page 14

Example of Markov Logic Networks

Feature definition by weighted First-Order Logic

First-order formula, with weight wc(obj,Theme):

dep(i,j,obj) ⇒ role(i,j,Theme)

Grounding with i/3, j/6 gives the ground feature (※ all features are binary):

f(i/3, j/6) = 1 if dep(3,6,obj) ⇒ role(3,6,Theme) holds, 0 otherwise

Weight Function        Weight value   Ground Formula
wa(Verb)               3.1            pos(3,Verb) ⇒ event(3)
wb(regulation,Theme)   -0.9           event(3) ∧ eventType(3,regulation) ∧ protein(6) ⇒ role(3,6,Theme)

[Figure: grounded Markov network over the atoms pos(3,Verb), event(3), eventType(3,regulation), role(3,6,Theme), protein(6), and dep(3,6,obj), with factors labeled wa(Verb), wb(regulation, Theme), and wc(obj,Theme)]
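
The weighted ground formulas can be turned into an unnormalized score for one possible world, as in the following sketch. It assumes the standard Markov Logic reading, in which an implication also counts as satisfied when its left-hand side is false; the slide does not state how the tool used in this work treats that case. Only wa and wb are included because the slide gives no value for wc. The names `implies` and `score` are illustrative.

```python
def implies(antecedent, consequent):
    """Truth value of a ground implication."""
    return (not antecedent) or consequent


def score(world):
    """Sum the weights of the ground formulas that hold in `world`.

    `world` maps ground atoms (as strings) to True/False.
    """
    ground_formulas = [
        # wa(Verb) = 3.1: pos(3,Verb) => event(3)
        (3.1, implies(world["pos(3,Verb)"], world["event(3)"])),
        # wb(regulation,Theme) = -0.9:
        # event(3) ^ eventType(3,regulation) ^ protein(6) => role(3,6,Theme)
        (-0.9, implies(
            world["event(3)"]
            and world["eventType(3,regulation)"]
            and world["protein(6)"],
            world["role(3,6,Theme)"],
        )),
    ]
    return sum(weight for weight, holds in ground_formulas if holds)


ALL_TRUE = {
    "pos(3,Verb)": True,
    "event(3)": True,
    "eventType(3,regulation)": True,
    "protein(6)": True,
    "role(3,6,Theme)": True,
}
```

Comparing `score(ALL_TRUE)` with the score of the same world where `role(3,6,Theme)` is false shows the effect of the negative weight: dropping the role raises the score by 0.9.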

Page 15

Basic Ideas of Proposed Method

Effective employment of coreference information based on discourse structure
– Salience in Discourse : aggressive extraction of valuable E-As

Consider event-argument relations crossing sentence boundaries
– Transitivity involving coreference relations

Page 16

How to Use Coreference with Markov Logic?

1. Salience in Discourse
2. Transitivity
3. Feature Copy

S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .

S2 (tokens 10-17): The region is inducible by both interferons .

[Figure: arcs labeled Theme, Cause, and Corefer over S1 and S2]

predicate      description
corefer(i,j)   token i is coreferent to token j

Page 17

Coreference Based Approach ① (Salience in Discourse)

Tokens coreferent to something have higher salience in discourse and are more likely to be arguments of events

S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .

S2 (tokens 10-17): The region is inducible by both interferons .

[Figure: arcs labeled Theme and Corefer over S1 and S2]

corefer(arg, ant) ⇒ ∃pred. role(pred, arg, r) ∧ event(pred)   ... (SiD)

If "The region" is coreferent to "The IRF-2...", then there is at least one event related to "The region"
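
The Salience in Discourse rule says that every token with an antecedent should be an argument of at least one event. The sketch below checks that condition as a plain yes/no test on sets of ground atoms; in a Markov Logic model the rule would carry a weight instead, and the slide does not say whether it is hard or soft. The function name `sid_violations` and the set-based encoding are assumptions made for illustration.

```python
def sid_violations(corefer, roles, events):
    """Return anaphor tokens that break the SiD rule.

    corefer: set of (arg, ant) token pairs, i.e. corefer(arg, ant)
    roles:   set of (pred, arg, r) triples, i.e. role(pred, arg, r)
    events:  set of tokens pred for which event(pred) holds
    """
    violating = []
    for arg, _ant in sorted(corefer):
        # The rule is satisfied if some event takes `arg` as an argument.
        has_event = any(
            a == arg and pred in events for pred, a, _r in roles
        )
        if not has_event:
            violating.append(arg)
    return violating
```

With the token numbering of the example, `corefer = {(11, 4)}`, `roles = {(13, 11, "Theme")}`, and `events = {13}` satisfy the rule, so the function returns an empty list.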

Page 18

Coreference Based Approach ② (Transitivity)

Transition rules involving coreference relations allow us to extract cross-sentential event-arguments in a "sentence by sentence" manner

S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .

S2 (tokens 10-17): The region is inducible by both interferons .

[Figure: arcs labeled (A) Theme, (B) Corefer, and (C) Theme over S1 and S2]

role(13, 11, Theme) ∧ corefer(11, 4) ⇒ role(13, 4, Theme)
        (A)                 (B)                (C)

role(pred, arg, r) ∧ corefer(arg, ant) ⇒ role(pred, ant, r)   ... (T)
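
The transitivity rule copies a role from an anaphor to its antecedent: if an event has the anaphor as an argument, it also has the antecedent as an argument with the same role. The following sketch applies that rule deterministically until nothing new can be derived; in the actual model the rule is part of joint inference rather than a post-processing step. The function name `apply_transitivity` is illustrative.

```python
def apply_transitivity(roles, corefer):
    """Close a set of role triples under the transitivity rule.

    roles:   set of (pred, arg, r) triples, i.e. role(pred, arg, r)
    corefer: set of (arg, ant) pairs, i.e. corefer(arg, ant)
    """
    derived = set(roles)
    changed = True
    while changed:
        changed = False
        for pred, arg, r in list(derived):
            for anaphor, ant in corefer:
                candidate = (pred, ant, r)
                if anaphor == arg and candidate not in derived:
                    derived.add(candidate)
                    changed = True
    return derived
```

For the example, starting from `{(13, 11, "Theme")}` with `corefer = {(11, 4)}` adds `(13, 4, "Theme")`, which is the cross-sentence relation (C).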

Page 19

Coreference Based Approach ③ (Feature Copy)

If a token is coreferent to something, then we exploit the features of its antecedent to identify intra-sentential E-A relations

S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .

S2 (tokens 10-17): The region is inducible by both interferons .

[Figure: arcs labeled Theme and Corefer over S1 and S2, with an arrow labeled Copy]

corefer(11, 4) ∧ word(4, "IRF-2") ⇒ role(13, 11, Theme)

corefer(arg, ant) ∧ word(ant, w) ⇒ role(pred, arg, r)   ... (FC)
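
Feature Copy lets the classifier see the antecedent's features when it decides about the anaphor. A minimal sketch, assuming only word features are copied (the slide shows the `word` predicate; whether other features are copied as well is not stated): the helper below collects the antecedent words for a candidate argument. The name `copied_word_features` and the dictionary encoding of `word` are illustrative, and `word(4, "IRF-2")` follows the slide as written.

```python
def copied_word_features(arg, corefer, word):
    """Words of arg's antecedents, usable as extra features for role(pred, arg, r).

    corefer: set of (arg, ant) pairs, i.e. corefer(arg, ant)
    word:    dict mapping a token index to its word, i.e. word(i, w)
    """
    return [
        word[ant]
        for anaphor, ant in sorted(corefer)
        if anaphor == arg and ant in word
    ]
```

For token 11 in the example, `copied_word_features(11, {(11, 4)}, {4: "IRF-2"})` yields `["IRF-2"]`, so the decision about `role(13, 11, Theme)` can use the more informative antecedent word.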

Page 20

Outline

Research summary

Related work of event extraction

Proposed coreference based approach

Experimental setup and highlighted data

Conclusion and future work

Page 21

Experimental Setup

Data : GENIA Event Corpus ver. 0.9 [Kim et al., 2008]
– Preprocess : POS tagging, NE tagging, Parsing

Coreference resolver : pairwise model [Soon et al., 2001]
– Learning & Inference : SVM

Event extraction:
– Joint Markov Logic model [Riedel et al., 2009]
  Learning : one-best MIRA
  Inference : ILP solver with CPI [Riedel, 2008]
  Provided by Markov thebeast
– SVM pipeline [Bjorne et al., 2009]
  Learning & Inference : multi-class SVM

Page 22

Experimental Result (Summary)

Results of Event Extraction (F1)

We obtained statistically significant improvements with both models, SVM and MLN

     Model   Coreference     event   eventType   role
1)   SVM     w/o             77.0    67.8        52.3 ( 0.0)
2)   SVM     with resolver   77.0    67.8        53.6 (+1.3)
3)   SVM     with gold       77.0    67.8        55.4 (+3.1)
4)   MLN     w/o             80.5    70.6        51.7 ( 0.0)
5)   MLN     with resolver   80.8    70.6        53.8 (+2.1)
6)   MLN     with gold       81.2    70.8        56.7 (+5.0)

p < 0.01 (McNemar's test, 2-tailed)
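
The figures in parentheses in the role column are differences from the "w/o" row of the same model. The short sketch below recomputes them from the reported F1 values; the names `ROLE_F1` and `gains` are introduced here for illustration.

```python
# Role F1 values copied from the results table.
ROLE_F1 = {
    "SVM": {"w/o": 52.3, "with resolver": 53.6, "with gold": 55.4},
    "MLN": {"w/o": 51.7, "with resolver": 53.8, "with gold": 56.7},
}


def gains(scores, baseline="w/o"):
    """Difference of each setting from the baseline, rounded to one decimal."""
    base = scores[baseline]
    return {setting: round(f1 - base, 1) for setting, f1 in scores.items()}
```

Calling `gains(ROLE_F1["MLN"])` reproduces the +2.1 and +5.0 entries, and `gains(ROLE_F1["SVM"])` reproduces +1.3 and +3.1.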

Page 23

Three Types of E-A Relations

S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .

S2 (tokens 10-17): The region is inducible by both interferons .

[Figure: arcs labeled (1) Cross, (2) W-ANT, (3) Normal, and Corefer over S1 and S2]

Type                  Description
(1) Cross-link        E-A relations crossing sentence boundaries
(2) With-Antecedent   Intra-sentence E-As with antecedents
(3) Normal            Neither Cross-link nor With-Antecedent

Evaluation for the three types of E-A relations
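
The three types can be told apart from sentence membership and coreference alone. The sketch below assumes that "with antecedents" means the argument token itself has an antecedent; the function name `classify`, the `SENTENCE_OF` mapping, and the token pair used for the Normal case are illustrative and not taken from the slide.

```python
# Sentence index for each token of the two-sentence example (tokens 1-17).
SENTENCE_OF = {token: 1 if token <= 9 else 2 for token in range(1, 18)}


def classify(pred, arg, sentence_of, corefer):
    """Return "Cross", "W-ANT", or "Normal" for the relation role(pred, arg, r).

    corefer: set of (arg, ant) pairs, i.e. corefer(arg, ant)
    """
    if sentence_of[pred] != sentence_of[arg]:
        return "Cross"
    if any(anaphor == arg for anaphor, _ant in corefer):
        return "W-ANT"
    return "Normal"
```

With `corefer = {(11, 4)}`, the relation between tokens 13 and 4 is Cross, the relation between tokens 13 and 11 is W-ANT, and a relation inside S1 such as tokens 5 and 8 (a hypothetical pair) is Normal.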

Page 24

Experimental Result (E-A Relation)

Results of E-A Relation Extraction (F1)

Both Transitivity and Salience in Discourse work well

MLN with gold coreference annotations outperforms the SVM pipeline on both Cross and W-ANT

     Model   Coreference     Cross-link   With-Antecedent   Normal
1)   SVM     w/o              0.0         56.0              53.6
2)   SVM     with resolver   27.9         57.0              54.3
3)   SVM     with gold       54.1         57.3              55.4
4)   MLN     w/o              0.0         49.8 ( 0.0)       53.2
5)   MLN     with resolver   39.3         56.5 (+6.7)       54.3
6)   MLN     with gold       69.7         66.7 (+16.9)      55.3

Page 25

Outline

Research summary

Related work of event extraction

Proposed coreference based approach

Experimental setup and highlighted data

Conclusion and future work

Page 26

Summary

We proposed a new method for biomedical event extraction with coreference information

Our systems successfully extract cross-sentential E-As by transitivity including coreference relations

The concept of salience in discourse can also help E-A extraction

We obtained further improvements with gold coreference annotations, especially for MLN

Page 27

Future Work

Put more effort into coreference resolution
– From pairwise model to clustering approach

Full joint approach of event extraction and coreference resolution
– Fighting against computational costs
– Narrative Event Chains [Chambers et al., 2008]