A Global Relaxation Labeling Approach to Coreference Resolution


Page 1: A Global Relaxation Labeling Approach to Coreference Resolution


A Global Relaxation Labeling Approach to Coreference Resolution

Coling 2010. Emili Sapena, Lluís Padró and Jordi Turmo

TALP Research Center, Universitat Politècnica de Catalunya

Yi-Ting Huang, 2011/01/25

Page 2: A Global Relaxation Labeling Approach to Coreference Resolution


Outline
  Introduction
  Notation
  Method
    Relaxation Labeling: weighted constraint satisfaction problem
    Initial State
    Model: C4.5
    Pruning
    Reordering
  Experiments
  Conclusion

Page 3: A Global Relaxation Labeling Approach to Coreference Resolution

1. Introduction (1/6)

Coreference resolution is the task of partitioning a set of entity mentions in a text, where each partition corresponds to some entity in an underlying discourse model.


NP1 .... N2 .... PN3 .... NP4 .... N5 .... NP6 ....
(each marked span is a mention; each mention is assigned to an entity/label)

N: named entity, e.g. "Michael Jackson"
NP: noun phrase, e.g. "the youngest of Jackson"
PN: pronoun, e.g. "he"

Page 4: A Global Relaxation Labeling Approach to Coreference Resolution


1. Introduction (2/6)

A typical machine learning-based coreference resolution system usually consists of two steps:
  1. Classification, where the system evaluates the coreferentiality of each pair or group of mentions.
  2. Formation of chains, where, given the confidence values of the previous classifications, the system forms the coreference chains.

NP1 .... N2 .... PN3 .... NP4 .... N5 .... NP6 ....

1. Classification

NP1 .... N2 .... PN3 .... NP4 .... N5 .... NP6 ....

2. Formation of chains
CHAIN1: NP1-N2-PN3
CHAIN2: NP4-N5-NP6
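A minimal sketch of this two-step pipeline (hypothetical mention names and a stand-in pairwise classifier, not the paper's system), where the chains are formed as the transitive closure of the positively classified pairs:

from itertools import combinations

def resolve(mentions, classify_pair):
    # Step 1: classification of every mention pair.
    positive = [(a, b) for a, b in combinations(mentions, 2) if classify_pair(a, b)]

    # Step 2: formation of chains via union-find over the positive pairs
    # (transitive closure of the pairwise decisions).
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in positive:
        parent[find(a)] = find(b)

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())

# Toy run with the mentions from the slide and a stand-in classifier.
gold = {frozenset(p) for p in [("NP1", "N2"), ("N2", "PN3"), ("NP4", "N5"), ("N5", "NP6")]}
print(resolve(["NP1", "N2", "PN3", "NP4", "N5", "NP6"],
              lambda a, b: frozenset((a, b)) in gold))
# -> [['NP1', 'N2', 'PN3'], ['NP4', 'N5', 'NP6']]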

Page 5: A Global Relaxation Labeling Approach to Coreference Resolution


1. Introduction (3/6)

Pairwise classification vs. groupwise classification.
Combine: joining each positively-classified pair, or only the pair with the maximum confidence value.

c1 .... c2 .... c3 .... c4 .... c5 .... c6 .... m

Pairwise classification:
(c1, m)=Y  (c2, m)=N  (c3, m)=N  (c4, m)=Y  (c5, m)=Y  (c6, m)=N
Combine: entity1(c1, c4, c5, m), entity2(c2, c3), entity3(c6)

Groupwise (twin-wise) classification:
(c1, c2, m)=Y  (c1, c3, m)=Y  (c1, c6, m)=Y  (c2, c1, m)=Y  (c2, c3, m)=N  (c2, c4, m)=N  (c2, c5, m)=N  (c2, c6, m)=N  ...
Combine: entity1(c1, c4, c5, m), entity2(c2, c3), entity3(c6)
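A minimal sketch contrasting the two combination strategies mentioned above (hypothetical confidence scores for the pairs (ci, m)):

def combine_all_positive(candidates, scores, threshold=0.5):
    # Link the mention to every candidate classified positively (score above the threshold).
    return [c for c in candidates if scores[c] > threshold]

def combine_best_first(candidates, scores, threshold=0.5):
    # Link the mention only to the candidate with the maximum confidence value.
    best = max(candidates, key=lambda c: scores[c])
    return [best] if scores[best] > threshold else []

# Toy confidence values for the pairs (ci, m) from the slide.
scores = {"c1": 0.8, "c2": 0.2, "c3": 0.1, "c4": 0.7, "c5": 0.6, "c6": 0.3}
print(combine_all_positive(list(scores), scores))  # ['c1', 'c4', 'c5']
print(combine_best_first(list(scores), scores))    # ['c1']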

Page 6: A Global Relaxation Labeling Approach to Coreference Resolution


1. Introduction (4/6)

Chain formation: Integer Linear Programming
The main advantage of this type of post-process is the enforcement of transitivity, sorting out the contradictions that the previous classification step may introduce.
A set of binary variables xij indicates whether a pair of mentions (mi, mj) corefers (xij = 1) or not (xij = 0). An objective function is defined over these variables, where PCij is the confidence value that mentions mi and mj corefer, subject to transitivity constraints over every triple i < j < k.
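The objective function itself is not reproduced in this transcript; a typical formulation from the ILP work cited later (e.g. Denis and Baldridge, 2007; Finkel and Manning, 2008) looks roughly as follows. This is a reconstruction under that assumption, not necessarily the slide's exact equation:

\min \sum_{i<j} \big[ -\log(PC_{ij})\, x_{ij} - \log(1 - PC_{ij})\,(1 - x_{ij}) \big]
\quad \text{s.t.} \quad x_{ij} \in \{0,1\}, \qquad x_{ij} + x_{jk} - x_{ik} \le 1 \;\; \forall\, i < j < k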

Page 7: A Global Relaxation Labeling Approach to Coreference Resolution


1. Introduction (5/6)

Although chain formation processes search for global consistency, the lack of contextual information in the classification step is propagated forward.
Few works try to overcome the limitations of keeping classification and chain formation apart:
  Luo et al. (2004): a Bell tree structure
  McCallum and Wellner (2005): graph partitioning, cutting by distances
  Culotta et al. (2007): a groupwise classifier with a clustering process in a first-order probabilistic model

Luo, X., A. Ittycheriah, H. Jing, N. Kambhatla, and S. Roukos. 2004. A mention-synchronous coreference resolution algorithm based on the Bell tree. In Proceedings of 42nd ACL, page 135.
McCallum, A. and B. Wellner. 2005. Conditional models of identity uncertainty with application to noun coreference. Advances in Neural Information Processing Systems, 17:905–912.
Culotta, A., M. Wick, and A. McCallum. 2007. First-Order Probabilistic Models for Coreference Resolution. Proceedings of NAACL HLT, pages 81–88.

Page 8: A Global Relaxation Labeling Approach to Coreference Resolution


1. Introduction (6/6)

The approach presented in this paper follows the same research line of joining group classification and chain formation in the same step.
We propose a graph representation of the problem, solved by a relaxation labeling process, reducing coreference resolution to a graph partitioning problem given a set of constraints.
In this manner, decisions are taken considering the whole set of mentions, ensuring consistency and avoiding classification decisions that are taken independently.

Page 9: A Global Relaxation Labeling Approach to Coreference Resolution


Notation

The document is represented as a graph G = (V, E), with vertices V and edges E.
The mentions of the document are m = (m1, ..., mn); each mention mi is represented as a vertex vi in V.
An edge eij in E represents the possibility that (vi, vj) corefer.
Cij is a constraint that restricts the compatibility of both mentions vi and vj.
wij in W is the weight of the edge eij.
Li is the number of different values (labels) that are possible for vi.
hi is a vector containing the probability distribution of vi, that is, hi = (hi^1, hi^2, ..., hi^Li).
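A minimal sketch of how this notation could map onto data structures (hypothetical field names; the paper does not prescribe an implementation):

from dataclasses import dataclass, field

@dataclass
class Vertex:
    mention: str                              # the mention mi represented by this vertex vi
    n_labels: int                             # Li: number of possible labels for vi
    h: list = field(default_factory=list)     # probability distribution (hi^1, ..., hi^Li)

@dataclass
class Edge:
    i: int                                    # index of vi
    j: int                                    # index of vj
    weight: float                             # wij: weight of the edge eij

@dataclass
class CoreferenceGraph:
    vertices: list                            # V: one Vertex per mention
    edges: dict = field(default_factory=dict) # E: (i, j) -> Edge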

Page 10: A Global Relaxation Labeling Approach to Coreference Resolution


Methods
  Relaxation Labeling: weighted constraint satisfaction problem
  Initial State
  Model: C4.5
  Pruning
  Reordering

Page 11: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Relaxation Labeling (1/4)

Relaxation labeling (Relax) is a generic name for a family of iterative algorithms which perform function optimization based on local information.
Relaxation labeling solves our weighted constraint satisfaction problem dealing with the edge weights.
Sil expresses how compatible the assignment of label l to variable vi is, considering the labels of adjacent variables and the edge weights.
A(vi) is the list of adjacent vertices vk of vi, but only ...

Page 12: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Relaxation Labeling (2/4)

The aim of the algorithm is to find a weighted labeling such that global consistency is maximized.
Maximizing global consistency is defined as maximizing the average support for each variable.
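A minimal sketch of one iteration that pushes the labeling toward higher average support (assuming a single shared label set of size L and numpy arrays; this is the standard multiplicative relaxation update, not necessarily the paper's exact formulation):

import numpy as np

def relax_step(h, w):
    # h: (n, L) array; h[i] is the current label probability distribution of vertex vi.
    # w: (n, n) array of edge weights wij (zero on the diagonal), assumed to lie in [-1, 1].
    #
    # Support S[i, l]: how much the neighbours of vi, weighted by the edges,
    # currently agree with assigning label l to vi.
    support = w @ h
    # Keep 1 + support positive so the multiplicative update yields a valid distribution.
    support = np.clip(support, -0.99, 0.99)
    updated = h * (1.0 + support)
    return updated / updated.sum(axis=1, keepdims=True)

The algorithm repeats such updates until the distributions converge or a maximum number of iterations is reached (as noted two slides later), and the final partition is read off by taking each vertex's most probable label.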

Page 13: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Relaxation Labeling (3/4)

Page 14: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Relaxation Labeling (4/4)

Many studies have been done towards the demonstration of the consistency, convergence and cost reduction advantages of the relaxation algorithm.
Although some of the conditions required by the formal demonstrations are not fulfilled in our case, the presented algorithm, which forces a stop after a number of iterations, has proven useful for practical purposes.

Page 15: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Initial State

The initial state of the vertices defines the a priori probabilities for each vertex to be in each partition.
  Uniformly distributed state for pronouns
  Random for the other mentions
  The label Li+1 starts a new entity
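A minimal sketch of one initialization consistent with this slide (uniform for pronouns, random for the rest; hypothetical code, the paper's exact scheme may differ):

import numpy as np

def initial_state(n_labels, is_pronoun, rng=np.random.default_rng(0)):
    # n_labels counts the possible labels for this vertex; the last one ("Li+1"
    # on the slide) means "start a new entity".
    # Pronouns get a uniform a priori distribution, other mentions a random one.
    h = np.ones(n_labels) if is_pronoun else rng.random(n_labels)
    return h / h.sum()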

Page 16: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Constraints

Page 17: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Constraints

Page 18: A Global Relaxation Labeling Approach to Coreference Resolution


Method - C4.5

Three specialized models are constructed depending on the type of the anaphor mention (mj) of the pair: pronoun, named entity or nominal.
For each specialized model, a decision tree (DT) is generated and a set of rules is extracted with the C4.5 rule-learning algorithm.
The weight assigned to a constraint (λk) is its precision over the training data (Pk), but shifted to be zero-centered: λk = Pk - 0.5
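As a worked example with hypothetical numbers: a rule with precision Pk = 0.8 over the training data receives the weight λk = 0.8 - 0.5 = 0.3 and acts as evidence for coreference, while a rule with Pk = 0.4 receives λk = -0.1 and votes (weakly) against it.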

Page 19: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Pruning

We have found two main error patterns that can be solved by a pruning process:
1. The size of the document. Each vertex is adjacent to all the other vertices, so the larger the number of adjacencies, the smaller the influence of each constraint. This problem is usually solved by looking for antecedents in a window of a few sentences, which entails an evident limitation of recall.
2. Many weak edge weights produce a bias. For example, a (pronoun, pronoun) pair is very weakly informative.
For each vertex's adjacency list A(vi), only a maximum of N edges remain and the others are pruned (sketched below).
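A minimal sketch of this pruning step (hypothetical data layout; here edges are ranked by absolute weight, although the paper's exact criterion may differ):

def prune_edges(adjacency, weights, n_keep):
    # adjacency: dict vertex index -> list of adjacent vertex indices
    # weights:   dict (i, j) -> edge weight wij, with i < j in the key
    pruned = {}
    for i, neighbours in adjacency.items():
        # Rank neighbours by the strength of the connecting edge and keep the n_keep best.
        ranked = sorted(neighbours,
                        key=lambda j: abs(weights[(min(i, j), max(i, j))]),
                        reverse=True)
        pruned[i] = ranked[:n_keep]
    return pruned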

Page 20: A Global Relaxation Labeling Approach to Coreference Resolution


Method - Reordering

The vertices of the graph would usually be placed in the same order as the mentions are found in the document (chronological).
However, as suggested by Luo (2007), there is no need to generate the model following that order.
In our approach, the first variables have a lower number of possible labels: named entities. An error in the first variables has more influence on the performance than an error in the later ones.
Order: named entities, then nominal mentions, then pronouns (sketched below).

Luo, X. 2007. Coreference or not: A twin model for coreference resolution. In Proceedings of NAACL HLT, pages 73–80.
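A minimal sketch of this reordering (hypothetical Mention type; mentions keep their chronological order within each mention type):

from dataclasses import dataclass

# Processing priority from the slide: named entities first, then nominal mentions, then pronouns.
TYPE_ORDER = {"named_entity": 0, "nominal": 1, "pronoun": 2}

@dataclass
class Mention:
    text: str
    mention_type: str   # "named_entity", "nominal" or "pronoun"
    position: int       # chronological position in the document

def reorder(mentions):
    # Sort by type priority, breaking ties by document position.
    return sorted(mentions, key=lambda m: (TYPE_ORDER[m.mention_type], m.position))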

Page 21: A Global Relaxation Labeling Approach to Coreference Resolution


Experiment settings
  Data set: ACE-phase02
  True mentions are considered
  Metrics: CEAF, B3
  Preprocessing:
    FreeLing for sentence splitting and tokenization
    SVMtool for POS tagging
    BIO for NER and classification
  No lemmatization or syntactic analysis is used.

Page 22: A Global Relaxation Labeling Approach to Coreference Resolution


Baselines
1. DT (decision tree) with automatic feature selection, based on Ng and Cardie (2002).
   The features used in the baseline are the same, but some features are noisy and redundant.
   A hill-climbing process has been performed with a five-fold cross-validation over the training corpus.
2. Integer Linear Programming (Klenner, 2007; Denis and Baldridge, 2007; Finkel and Manning, 2008).

Ng, V. and C. Cardie. 2002. Improving machine learning approaches to coreference resolution. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 104–111.
Klenner, M. 2007. Enforcing consistency on coreference sets. In Recent Advances in Natural Language Processing (RANLP), pages 323–328.
Denis, P. and J. Baldridge. 2007. Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming. Proceedings of NAACL HLT, pages 236–243.
Finkel, J.R. and C.D. Manning. 2008. Enforcing transitivity in coreference resolution. In Proceedings of the 46th Annual Meeting of the ACL HLT: Short Papers, pages 45–48. Association for Computational Linguistics.

Page 23: A Global Relaxation Labeling Approach to Coreference Resolution


Experimental Results (1/3)

The 1st experiment compares the performances of our baselines.
The 2nd experiment is only applied to documents shorter than 200 mentions due to computational cost.

Page 24: A Global Relaxation Labeling Approach to Coreference Resolution


Experimental Results (2/3)

The 3rd experiment shows the improvements achieved by the use of pruning and reordering techniques.
Pruning: B3 precision is decreased, but the global F1 is increased due to a considerable improvement of recall.
Reordering: recovers the precision lost by the pruning without losing recall, which achieves the best performances of 69.7 with CEAF and 74.9 with B3.

Page 25: A Global Relaxation Labeling Approach to Coreference Resolution


Experimental Results (3/3)

The 4th experiment evaluates the influence of the initial state.

Page 26: A Global Relaxation Labeling Approach to Coreference Resolution


Conclusion

The approach for coreference resolution presented in this paper is a constraint-based graph partitioning solved by relaxation labeling.
The capacity to easily incorporate constraints from different sources and using different kinds of knowledge is also remarkable.
Three techniques to improve the results have been presented: reordering, pruning and feature selection by hill climbing.
The approach also outperforms the baselines and other state-of-the-art systems using the same corpora and metrics.

Page 27: A Global Relaxation Labeling Approach to Coreference Resolution


5th International Workshop on Semantic Evaluation, SemEval-2010 Task 1: Coreference Resolution in Multiple Languages