Improved Relation Extraction with Feature-Rich Compositional Embedding Models
September 21, 2015, EMNLP
Mo Yu*, Matt Gormley*, Mark Dredze (*co-first authors)

Page 1:

Improved Relation Extraction with Feature-Rich Compositional

Embedding Models

September 21, 2015, EMNLP

1

Mo Yu* Matt Gormley*

Mark Dredze

*Co-first authors

Page 2:

FCM or: How I Learned to Stop Worrying (about Deep Learning)

and Love Features

September 21, 2015, EMNLP

2

Mo Yu* Matt Gormley*

Mark Dredze

*Co-first authors

Page 3:

Handcrafted Features

3

Example: "Egypt-born Proyas directed", with POS tags NNP : VBN NNP VBD, entity types PER and LOC, parse S over NP and VP (with ADJP, NP, VP), and lemmas egypt-born proyas direct

p(y|x) ∝ exp(Θ_y · f(x))

born-in

Page 4:

Where do features come from?

4

Feature Engineering
Feature Learning

hand-crafted features
(Sun et al., 2011; Zhou et al., 2005)

First word before M1

Second word before M1

Bag-of-words in M1

Head word of M1

Other word in between

First word after M2

Second word after M2

Bag-of-words in M2

Head word of M2

Bigrams in between

Words on dependency path

Country name list

Personal relative triggers

Personal title list

WordNet Tags

Heads of chunks in between

Path of phrase labels

Combination of entity types

Page 5:

Where do features come from?

5

Feature Engineering
Feature Learning

hand-crafted features (Sun et al., 2011; Zhou et al., 2005)
word embeddings (Mikolov et al., 2013)

CBOW model in Mikolov et al. (2013): the input (context words) feeds a look-up table (embedding), then a classifier predicts the missing word

dog: 0.13 .26 … -.52
cat: 0.11 .23 … -.45
similar words, similar embeddings

unsupervised learning
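The CBOW prediction step described above (context words into a look-up table, then a classifier over the missing word) can be sketched in a few lines of NumPy. The toy vocabulary, dimensions, and random weights here are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and sizes (illustrative only).
vocab = ["the", "dog", "cat", "sat", "on", "mat"]
V, d = len(vocab), 4
idx = {w: i for i, w in enumerate(vocab)}

E = rng.normal(size=(V, d))   # look-up table: one embedding per word
W = rng.normal(size=(d, V))   # classifier weights over the vocabulary

def cbow_predict(context):
    """Average the context embeddings, then softmax over the vocabulary."""
    h = np.mean([E[idx[w]] for w in context], axis=0)
    scores = h @ W
    p = np.exp(scores - scores.max())
    return p / p.sum()        # distribution over the missing word

p = cbow_predict(["the", "dog", "on", "mat"])
```

Training with word2vec's negative sampling is omitted; the point is only the shape of the model: similar words end up with similar rows of `E`.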

Page 6:

Where do features come from?

6

Feature Engineering
Feature Learning

hand-crafted features (Sun et al., 2011; Zhou et al., 2005)
word embeddings (Mikolov et al., 2013)
string embeddings (Collobert & Weston, 2008; Socher, 2011)

Convolutional Neural Networks (Collobert and Weston, 2008): The [movie] showed [wars], CNN with pooling

Recursive Auto Encoder (Socher, 2011): The [movie] showed [wars], RAE

Page 7:

Where do features come from?

7

Feature Engineering
Feature Learning

hand-crafted features (Sun et al., 2011; Zhou et al., 2005)
word embeddings (Mikolov et al., 2013)
string embeddings (Collobert & Weston, 2008; Socher, 2011)
tree embeddings (Socher et al., 2013; Hermann & Blunsom, 2013)

The [movie] showed [wars], with parse S over NP and VP, and composition matrices W_{NP,VP}, W_{DT,NN}, W_{V,NN}

Page 8:

Where do features come from?

8

word embeddings (Mikolov et al., 2013)
tree embeddings (Socher et al., 2013; Hermann & Blunsom, 2013; Hermann et al., 2014)
hand-crafted features (Sun et al., 2011; Zhou et al., 2005)
string embeddings (Collobert & Weston, 2008; Socher, 2011)

Feature Engineering
Feature Learning

word embedding features (Turian et al., 2010; Koo et al., 2008)

Page 9:

Where do features come from?

9

word embeddings (Mikolov et al., 2013)
tree embeddings (Socher et al., 2013; Hermann & Blunsom, 2013; Hermann et al., 2014)
word embedding features (Turian et al., 2010; Koo et al., 2008)
hand-crafted features (Sun et al., 2011; Zhou et al., 2005)
string embeddings (Collobert & Weston, 2008; Socher, 2011)

Feature Engineering
Feature Learning

Our model (FCM)

Page 10:

Feature-rich Compositional Embedding Model (FCM)

Goals for our Model:

1. Incorporate semantic/syntactic structural information

2. Incorporate word meaning

3. Bridge the gap between feature engineering and feature learning – but remain as simple as possible

10

Page 11:

Feature-rich Compositional Embedding Model (FCM)

11

The [movie]M1 I watched depicted [hope]M2
(word classes: nil, noun-other, noun-person, noun-other, verb-percep., verb-comm.)

Per-word Features (one binary vector f1 … f6 per word):
on-path(wi)
is-between(wi)
head-of-M1(wi)
head-of-M2(wi)
before-M1(wi)
before-M2(wi)

[Figure: the 6 × 6 binary feature matrix, one column per word]

Page 12:

Feature-rich Compositional Embedding Model (FCM)

12

The [movie]M1 I watched depicted [hope]M2
(word classes: nil, noun-other, noun-person, noun-other, verb-percep., verb-comm.)

Per-word Features for f5 ("depicted"):
on-path(wi) = 1
is-between(wi) = 1
head-of-M1(wi) = 0
head-of-M2(wi) = 0
before-M1(wi) = 0
before-M2(wi) = 1

Page 13:

Feature-rich Compositional Embedding Model (FCM)

13

The [movie]M1 I watched depicted [hope]M2
(word classes: nil, noun-other, noun-person, noun-other, verb-percep., verb-comm.)

Per-word Features (with conjunction) for f5:
on-path(wi) & wi = "depicted" = 1
is-between(wi) & wi = "depicted" = 1
head-of-M1(wi) & wi = "depicted" = 0
head-of-M2(wi) & wi = "depicted" = 0
before-M1(wi) & wi = "depicted" = 0
before-M2(wi) & wi = "depicted" = 1

Page 14:

Feature-rich Compositional Embedding Model (FCM)

14

The [movie]M1 I watched depicted [hope]M2
(word classes: nil, noun-other, noun-person, noun-other, verb-percep., verb-comm.)

Per-word Features (with soft conjunction) for f5:
on-path(wi) = 1
is-between(wi) = 1
head-of-M1(wi) = 0
head-of-M2(wi) = 0
before-M1(wi) = 0
before-M2(wi) = 1

e_depicted = [-.3  .9  .1  -1]

Outer-product: f5 ⊗ e_depicted

Page 15:

Feature-rich Compositional Embedding Model (FCM)

15

The [movie]M1 I watched depicted [hope]M2

Per-word Features (with soft conjunction): f5 = [1, 1, 0, 0, 0, 1]
e_depicted = [-.3  .9  .1  -1]

f5 ⊗ e_depicted (rows where f = 1 copy e_depicted; rows where f = 0 are zero):

-.3  .9  .1  -1
-.3  .9  .1  -1
  0   0   0   0
  0   0   0   0
  0   0   0   0
-.3  .9  .1  -1
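The soft conjunction above is just an outer product. With the f5 and e_depicted values from the slide it can be checked directly in NumPy (a sketch, not the paper's code):

```python
import numpy as np

# Values from the slide: per-word binary features f5 for "depicted"
# (on-path, is-between, head-of-M1, head-of-M2, before-M1, before-M2)
# and its word embedding e_depicted.
f5 = np.array([1, 1, 0, 0, 0, 1])
e_depicted = np.array([-0.3, 0.9, 0.1, -1.0])

# Each row where the feature fires is a copy of e_depicted;
# rows for features that do not fire are all zeros.
G = np.outer(f5, e_depicted)
```

So a hard conjunction "feature AND word identity" becomes a soft one: the same binary feature paired with any word's embedding lands in the same row of the matrix, letting similar words share statistical strength.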

Page 16:

Feature-rich Compositional Embedding Model (FCM)

16

Our full model sums, over each word in the sentence, the outer product of its feature vector f_i and its embedding e_{w_i}; takes the dot-product with a parameter tensor T_y; and finally exponentiates and renormalizes:

p(y|x) ∝ exp( Σ_{i=1…n} (f_i ⊗ e_{w_i}) · T_y )
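The scoring rule just described (sum of per-word outer products, dotted with a per-label tensor slice, then softmaxed) fits in a few lines of NumPy. Sizes and parameters below are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n words, m binary features, d-dim embeddings, L labels.
n, m, d, L = 5, 6, 4, 3
F = rng.integers(0, 2, size=(n, m))    # f_i: per-word binary features
E = rng.normal(size=(n, d))            # e_{w_i}: per-word embeddings
T = rng.normal(size=(L, m, d))         # parameter tensor, one slice per label

# Sum of outer products f_i ⊗ e_{w_i} over the sentence.
S = sum(np.outer(F[i], E[i]) for i in range(n))

# Dot each label's tensor slice with S, then exponentiate and renormalize.
scores = np.array([np.sum(T[y] * S) for y in range(L)])
p = np.exp(scores - scores.max())
p = p / p.sum()                        # p(y|x)
```

Note that S can be computed once per sentence and reused for every label, which is what keeps the model cheap despite the tensor of parameters.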

Page 17:

Features for FCM

• Let M1 and M2 denote the left and right entity mentions

• Our per-word Binary Features: head of M1

head of M2

in-between M1 and M2

-2, -1, +1, or +2 of M1

-2, -1, +1, or +2 of M2

on dependency path between M1 and M2

• Optionally: Add the entity type of M1, M2, or both

17
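As a sketch, the per-word binary features listed above can be computed from the mention spans and the dependency path. The function and argument names here are hypothetical, not from the paper's implementation; spans are (start, end) token offsets with end exclusive:

```python
def word_features(i, head1, head2, span1, span2, dep_path):
    """Per-word binary features for token i (illustrative names)."""
    def near(span):  # positions -2, -1, +1, +2 relative to the mention
        return i in (span[0] - 2, span[0] - 1, span[1], span[1] + 1)
    return {
        "head-of-M1": i == head1,
        "head-of-M2": i == head2,
        "in-between": span1[1] <= i < span2[0],
        "near-M1": near(span1),
        "near-M2": near(span2),
        "on-dep-path": i in dep_path,
    }

# "The [movie] I watched depicted [hope]": M1 = token 1, M2 = token 5.
feats = word_features(4, head1=1, head2=5, span1=(1, 2), span2=(5, 6),
                      dep_path={1, 4, 5})
```

For token 4 ("depicted") this fires in-between, near-M2, and on-dep-path, matching the f5 column shown on the earlier slides.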

Page 18:

FCM as a Neural Network

18

[Figure: FCM as a network; per-word binary features f1 … fn and word embeddings e_{w_1} … e_{w_n} are combined into h1 … hn, summed (Σ), and dotted with the parameter tensor T to give p(y|x)]

• Embeddings are (optionally) treated as model parameters
• A log-bilinear model
• We initialize, then fine-tune the embeddings

Page 19:

Baseline Model

19

Label variable Y_{i,j}

[Figure: the "Egypt-born Proyas directed" example with POS tags NNP : VBN NNP VBD, entity types PER and LOC, parse, lemmas, and label born-in]

p(y|x) ∝ exp(Θ_y · f(x))

• Multinomial logistic regression (standard approach)

• Bring in all the usual binary NLP features (Sun et al., 2011):
– type of the left entity mention

– dependency path between mentions

– bag of words in right mention

– …
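The baseline is ordinary multinomial logistic regression over binary features. A minimal sketch with illustrative sizes and random weights (not the paper's feature set):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: D binary features, L relation labels.
D, L = 8, 3
Theta = rng.normal(size=(L, D))        # one weight vector per label

def baseline_predict(f):
    """p(y|x) ∝ exp(Θ_y · f(x)) for a binary feature vector f."""
    scores = Theta @ f
    p = np.exp(scores - scores.max())
    return p / p.sum()

p = baseline_predict(rng.integers(0, 2, size=D))
```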

Page 20:

Hybrid Model: Baseline + FCM

20

[Figure: the FCM network (features f1 … fn, embeddings, sum, tensor T) alongside the baseline log-linear model on the "Egypt-born Proyas directed" example, each producing p(y|x); label variable Y_{i,j}]

Product of Experts:

p(y|x) = (1/Z(x)) · p_Baseline(y|x) · p_FCM(y|x)
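The product of experts multiplies the two models' distributions label by label and renormalizes; a minimal sketch with illustrative distributions:

```python
import numpy as np

def product_of_experts(p_baseline, p_fcm):
    """p(y|x) = (1/Z(x)) * p_Baseline(y|x) * p_FCM(y|x)."""
    p = np.asarray(p_baseline) * np.asarray(p_fcm)
    return p / p.sum()       # Z(x) is the sum of the label-wise products

# Illustrative distributions over three relation labels.
p = product_of_experts([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])
```

A label only scores well if both experts assign it probability, so each model can veto the other's mistakes.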

Page 21:

Experimental Setup

ACE 2005

• Data: 6 domains
– Newswire (nw)
– Broadcast Conversation (bc)
– Broadcast News (bn)
– Telephone Speech (cts)
– Usenet Newsgroups (un)
– Weblogs (wl)

• Train: bn+nw (~3600 relations)
• Dev: ½ of bc
• Test: ½ of bc, cts, wl

• Metric: Micro F1 (given entity mention)

SemEval-2010 Task 8

• Data: Web text

• Train / Dev / Test: standard split from the shared task

• Metric: Macro F1 (given entity boundaries)

21

Page 22:

ACE 2005 Results

22

[Figure: bar chart of Micro F1 (45% to 65%) on the three test sets (Broadcast Conversation, Conversational Telephone Speech, Weblogs), comparing Baseline, FCM, and Baseline+FCM]

Page 23:

SemEval-2010 Results

Source | Classifier | F1

Socher et al. (2012) RNN 74.8

Socher et al. (2012) MVRNN 79.1

Hashimoto et al. (2015) RelEmb 81.8

Rink and Harabagiu (2010) SVM 82.2

Zeng et al. (2014) CNN 82.7

Santos et al. (2015) CR-CNN (log-loss) 82.7

Liu et al. (2015) DepNN 82.8

Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8

Best in SemEval-2010 Shared Task: Rink and Harabagiu (2010)

Page 24:

SemEval-2010 Results

Source | Classifier | F1

Socher et al. (2012) RNN 74.8

Socher et al. (2012) MVRNN 79.1

Hashimoto et al. (2015) RelEmb 81.8

Rink and Harabagiu (2010) SVM 82.2

Zeng et al. (2014) CNN 82.7

Santos et al. (2015) CR-CNN (log-loss) 82.7

Liu et al. (2015) DepNN 82.8

Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8

FCM (log-linear) 81.4

FCM (log-bilinear) 83.0

Best in SemEval-2010 Shared Task: Rink and Harabagiu (2010)

Page 25:

SemEval-2010 Results

Source | Classifier | F1

Socher et al. (2012) RNN 74.8

Socher et al. (2012) MVRNN 79.1

Hashimoto et al. (2015) RelEmb 81.8

Rink and Harabagiu (2010) SVM 82.2

Zeng et al. (2014) CNN 82.7

Santos et al. (2015) CR-CNN (log-loss) 82.7

Liu et al. (2015) DepNN 82.8

Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8

FCM (log-linear) 81.4

FCM (log-bilinear) 83.0


Page 26:

SemEval-2010 Results

Source | Classifier | F1

Socher et al. (2012) RNN 74.8

Socher et al. (2012) MVRNN 79.1

Hashimoto et al. (2015) RelEmb 81.8

Rink and Harabagiu (2010) SVM 82.2

Xu et al. (2015) SDP-LSTM 82.4

Zeng et al. (2014) CNN 82.7

Santos et al. (2015) CR-CNN (log-loss) 82.7

Liu et al. (2015) DepNN 82.8

Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8

Xu et al. (2015) SDP-LSTM (full) 83.7

FCM (log-linear) 81.4

FCM (log-bilinear) 83.0


Page 27:

SemEval-2010 Results

Source | Classifier | F1

Socher et al. (2012) RNN 74.8

Socher et al. (2012) MVRNN 79.1

Hashimoto et al. (2015) RelEmb 81.8

Rink and Harabagiu (2010) SVM 82.2

Xu et al. (2015) SDP-LSTM 82.4

Zeng et al. (2014) CNN 82.7

Santos et al. (2015) CR-CNN (log-loss) 82.7

Liu et al. (2015) DepNN 82.8

Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8

Xu et al. (2015) SDP-LSTM (full) 83.7

FCM (log-linear) 81.4

FCM (log-bilinear) 83.0

FCM (log-bilinear) (task-spec-emb) 83.7


Page 28:

SemEval-2010 Results

Source | Classifier | F1

Socher et al. (2012) RNN 74.8

Socher et al. (2012) MVRNN 79.1

Hashimoto et al. (2015) RelEmb 81.8

Rink and Harabagiu (2010) SVM 82.2

Xu et al. (2015) SDP-LSTM 82.4

Zeng et al. (2014) CNN 82.7

Santos et al. (2015) CR-CNN (log-loss) 82.7

Liu et al. (2015) DepNN 82.8

Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8

Xu et al. (2015) SDP-LSTM (full) 83.7

Santos et al. (2015) CR-CNN (ranking-loss) 84.1

FCM (log-linear) 81.4

FCM (log-bilinear) 83.0

FCM (log-bilinear) (task-spec-emb) 83.7


Page 29:

Takeaways

FCM bridges the gap between feature engineering and feature learning

If you are allergic to deep learning:
– Try the FCM for your task: it is simple, easy to implement, and was shown to be effective on two relation extraction benchmarks

If you are a deep learning expert:
– Inject the FCM (i.e. the outer product of features and embeddings) into your fancy deep network

29

Page 30:

Questions?

Two open source implementations:

– Java (within the Pacaya framework): https://github.com/mgormley/pacaya

– C++ (from our NAACL 2015 paper on LRFCM): https://github.com/Gorov/ERE_RE