Entity-Centric Coreference Resolution with Model Stacking



Page 1: Entity-Centric Coreference Resolution with Model Stacking

Kevin Clark and Christopher D. Manning (ACL-IJCNLP 2015)
(Tables are taken from the above-mentioned paper)

Presented by Mamoru Komachi <[email protected]>
ACL 2015 Reading Group @ Tokyo Institute of Technology
August 26th, 2015

Page 2

Entity-level information allows early coreference decisions to inform later ones.

Entity-centric coreference systems build up coreference clusters incrementally (Raghunathan et al., 2010; Stoyanov and Eisner, 2012; Ma et al., 2014).

Hillary Clinton files for divorce from Bill Clinton ahead of her campaign for the 2016 presidency.
...
Clinton is confident that her poll numbers will skyrocket once the divorce is final.
?!?

Page 3

Problem: how to build up clusters effectively?

Model stacking
- Two mention-pair models: a classification model and a ranking model
- Generates cluster features for clusters of mentions

Imitation learning
- Assigns exact costs to actions based on coreference evaluation metrics
- Uses the scores of the pairwise models to reduce the search space

Page 4

Mention Pair Models: the previous approach, using local information

Page 5

Two models for predicting whether a given pair of mentions belongs to the same coreference cluster:
- Classification model: are the two mentions coreferent?
- Ranking model: which candidate antecedent best suits the mention?

Bill arrived, but nobody saw him. I talked to him on the phone.

Page 6

Logistic classifiers for the classification model
- M: set of all mentions in the training set
- T(m): set of true antecedents of a mention m
- F(m): set of false antecedents of m
- Considers each pair of mentions independently
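A minimal sketch of this objective under the definitions above (the raw pair scores and the flattening of T(m)/F(m) into lists are assumed inputs, not the paper's code): each mention pair contributes an independent logistic log-loss.

```python
import math

def sigmoid(z):
    # Logistic function mapping a raw pair score to a probability.
    return 1.0 / (1.0 + math.exp(-z))

def classification_loss(scores_true, scores_false):
    """Negative log-likelihood of independent pairwise decisions.

    scores_true:  raw scores s(a, m) for pairs where a is in T(m)
    scores_false: raw scores s(a, m) for pairs where a is in F(m)
    """
    loss = 0.0
    for z in scores_true:      # true antecedents should get probability -> 1
        loss -= math.log(sigmoid(z))
    for z in scores_false:     # false antecedents should get probability -> 0
        loss -= math.log(1.0 - sigmoid(z))
    return loss
```

Because each pair is scored in isolation, nothing in this loss ties together decisions about the same entity; that is the gap the entity-centric model addresses.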

Page 7

Logistic classifiers for the ranking model
- Considers all candidate antecedents simultaneously
- Max-margin training encourages the model to find the single best antecedent for a mention, but the resulting scores are not robust inputs for a downstream clustering model
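A simplified sketch of a max-margin ranking loss for one mention (the hinge form below is illustrative, not the paper's exact slack-rescaled objective; the score lists are assumed inputs):

```python
def ranking_loss(scores_true, scores_false, margin=1.0):
    # The best-scoring true antecedent should beat every false
    # antecedent by at least `margin`; otherwise we pay the violation.
    best_true = max(scores_true)
    best_false = max(scores_false)
    return max(0.0, margin + best_false - best_true)
```

Because only the single best true antecedent matters, the trained scores tend to be peaked rather than calibrated, which is one reason they are less robust as features for a downstream clustering model.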

Page 8

Features for the mention-pair models
- Distance features: distance between the two mentions, measured in sentences or in mentions
- Syntactic features: number of embedded NPs under a mention; POS tags of the first, last, and head words
- Semantic features: named entity type, speaker identification
- Rule-based features: exact and partial string matching
- Lexical features: the first, last, and head words of the current mention

Page 9

Entity-Centric Coreference Model: the proposed approach, using cluster features

Page 10

The entity-centric model can exhibit high coherency.

Best-first clustering (Ng and Cardie, 2002)
- Assigns as antecedent the most probable preceding mention classified as coreferent with the current mention
- Relies only on local information

Entity-centric model (this work)
- Operates on pairs of clusters instead of pairs of mentions
- Builds up coreference chains with agglomerative clustering, merging two clusters when it predicts they represent the same entity

Page 11

Inference
- Reduces the search space by thresholding the mention-pair model scores
- Sorts the candidate pairs P to perform easy-first clustering
- s is a scoring function that makes the binary decision for each merge action
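The inference steps above can be sketched as follows (a reading of the slide, with `pair_prob`, `merge_score`, and the threshold as assumed interfaces rather than the paper's code):

```python
def easy_first_clustering(mentions, pair_prob, threshold, merge_score):
    # Start from singleton clusters, one per mention.
    clusters = [{m} for m in mentions]
    # Keep only pairs whose mention-pair probability clears the threshold
    # (search-space reduction), then sort so the most confident
    # ("easiest") pairs are considered first.
    pairs = [(m1, m2)
             for i, m1 in enumerate(mentions)
             for m2 in mentions[i + 1:]
             if pair_prob(m1, m2) > threshold]
    pairs.sort(key=lambda p: pair_prob(*p), reverse=True)
    for m1, m2 in pairs:
        c1 = next(c for c in clusters if m1 in c)
        c2 = next(c for c in clusters if m2 in c)
        # s makes the binary merge decision between the two clusters.
        if c1 is not c2 and merge_score(c1, c2) > 0:
            clusters.remove(c2)
            c1 |= c2
    return clusters
```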

Page 12

Learning the entity-centric model with imitation learning
- Sequential prediction problem: future observations depend on previous actions
- Imitation learning (here, DAgger (Ross et al., 2011)) is well suited to this setting (Argall et al., 2009)
- Training the agent on the gold labels alone assumes all previous decisions were correct, which is problematic in coreference, where the error rate is quite high
- DAgger exposes the system at train time to states similar to the ones it will face at test time

Page 13

Learning the cluster-merging policy with DAgger (Ross et al., 2011)
- Iterative algorithm that aggregates a dataset D of states and the actions performed by the expert policy in those states
- b controls the probability of following the expert's policy rather than the current policy, and decays exponentially as the iteration number increases
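The DAgger loop on this slide can be sketched schematically (all helpers here, the episode runner, the expert, and the trainer, are assumed interfaces, not the paper's code):

```python
import random

def dagger(initial_policy, expert_policy, run_episode, train, n_iters, decay=0.5):
    D = []                       # aggregated dataset of (state, expert action)
    policy = initial_policy
    for i in range(n_iters):
        b = decay ** i           # prob. of following the expert; decays per iteration
        mixed = lambda s: expert_policy(s) if random.random() < b else policy(s)
        for state in run_episode(mixed):
            # Label every visited state with what the expert would do there.
            D.append((state, expert_policy(state)))
        policy = train(D)        # fit the next policy on everything seen so far
    return policy
```

Rolling out the mixed policy is what exposes training to the (possibly erroneous) states the learned policy will actually reach at test time.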

Page 14

Adding costs to actions: directly tune to optimize coreference metrics
- Merging clusters influences the final score, and the order of merge operations also matters
- How will a particular local decision affect the final score of the coreference system?
- Problem: standard coreference metrics do not decompose over clusters
- Answer: roll out the actions from the current state
- A(s): the set of actions that can be taken from state s
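One way to read "rolling out": score an action by completing the clustering from the resulting state and evaluating the final clustering (a sketch; every helper is an assumed interface, and the reference policy stands in for the expert):

```python
def action_cost(state, action, actions_from, apply_action, reference_policy, metric):
    # Take the candidate action, ...
    s = apply_action(state, action)
    # ... let the reference policy finish the remaining decisions, ...
    while actions_from(s):
        s = apply_action(s, reference_policy(s))
    # ... and charge the action the negated final metric (e.g. B-cubed),
    # so actions leading to worse final clusterings cost more.
    return -metric(s)
```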

Page 15

Cluster features for the classification and ranking models

Between-cluster features
- Minimum and maximum probability of coreference
- Average probability and average log probability of coreference
- Average probability and average log probability of coreference for each pair of mention grammar types (pronoun or not)
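A sketch of how such between-cluster features could be computed from a mention-pair model (`pair_prob` is an assumed probability function, not the paper's code):

```python
import math

def cluster_pair_features(c1, c2, pair_prob):
    # Coreference probabilities for every cross-cluster mention pair.
    probs = [pair_prob(m1, m2) for m1 in c1 for m2 in c2]
    return {
        "min": min(probs),
        "max": max(probs),
        "avg": sum(probs) / len(probs),
        "avg_log": sum(math.log(p) for p in probs) / len(probs),
    }
```

The grammar-type variants restrict the same statistics to cross-cluster pairs of a given type (e.g. pronoun vs. non-pronoun).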

Page 16

Only 56 features for the entity-centric model

State features
- Whether a preceding mention pair in the list of mention pairs has the same candidate anaphor as the current one
- The index of the current mention pair in the list divided by the size of the list (what percentage of the list have we seen so far?)

The entity-centric model does not rely on sparse lexical features. Instead, it employs model stacking to exploit strong features: the scores learned by the pairwise models.

Page 17

Results and discussion: CoNLL 2012 English coreference task

Page 18

Experimental setup: CoNLL 2012 Shared Task
- English portion of OntoNotes
- Training: 2,802 documents; development: 343; test: 345
- Uses the provided preprocessing (parse trees, named entities, etc.)
- Common evaluation metrics: MUC, B3, and CEAFe, plus CoNLL F1 (the average F1 of the three metrics), computed with the CoNLL scorer version 8.01
- Rule-based mention detection (Raghunathan et al., 2010)

Page 19

Results: the entity-centric model outperforms best-first clustering with both the classification and the ranking model.

Page 20

The entity-centric model beats other state-of-the-art coreference models.
- This work primarily optimizes for the B3 metric during training
- State-of-the-art systems use latent antecedents to learn scoring functions over mention pairs, but are trained to maximize global objective functions

Page 21

The entity-centric model directly learns a coreference model that maximizes an evaluation metric.

Post-processing of mention-pair and ranking models
- Closest-first clustering (Soon et al., 2001)
- Best-first clustering (Ng and Cardie, 2002)

Global inference models
- Integer linear programming (Denis and Baldridge, 2007; Finkel and Manning, 2008)
- Graph partitioning (McCallum and Wellner, 2005; Nicolae and Nicolae, 2006)
- Correlation clustering (McCallum and Wellner, 2003; Finley and Joachims, 2005)

Page 22

Previous approaches do not directly tune against coreference metrics.

Non-local entity-level information
- Cluster models (Luo et al., 2004; Yang et al., 2008; Rahman and Ng, 2011)
- Joint inference (McCallum and Wellner, 2003; Culotta et al., 2006; Poon and Domingos, 2008; Haghighi and Klein, 2010)

Learning over trajectories of decisions
- Imitation learning (Daumé et al., 2005; Ma et al., 2014)
- Structured perceptron (Stoyanov and Eisner, 2012; Fernandes et al., 2012; Björkelund and Kuhn, 2014)

Page 23

Summary
- Proposed an entity-centric coreference model that uses the scores produced by mention-pair models as features
- The cluster-merging policy is trained with costs derived from standard coreference metrics
- Imitation learning is used to learn how to build up coreference chains incrementally
- The proposed model outperforms the commonly used best-first method and the current state of the art