
Page 1: Tutorial on Coreference Resolution


Tutorial on Coreference Resolution

by Anirudh Jayakumar (ajayaku2) and Sili Hui (silihui2)

Prepared as an assignment for CS410: Text Information Systems in Spring 2016

Page 2: Tutorial on Coreference Resolution

Agenda

We will mainly address three questions:

1. What is the Coreference Resolution problem?

2. What are the existing approaches to it?

3. What are the future directions?

Page 3: Tutorial on Coreference Resolution

What is the Coreference Resolution problem?

Suppose you are given a sample text:

I did not vote for Donald Trump because I think he is…

How can a program tell that "he" refers to Donald Trump?

Page 4: Tutorial on Coreference Resolution

Definition

• Coreference resolution is the task of finding all noun phrases (NPs) that refer to the same real-world entity.

• In the previous example, he == Donald Trump

• One of the classical NLP problems
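To make the task concrete, here is a minimal sketch (in Python; the representation is ours, not any particular system's) of what a resolver's input and output might look like: mention spans grouped into clusters, one cluster per real-world entity.

```python
# Illustrative only: a coreference system maps raw text to clusters of
# mention spans, one cluster per real-world entity.
text = "I did not vote for Donald Trump because I think he is..."

# Mentions as (start, end) character offsets with their surface strings.
mentions = {
    (0, 1): "I",
    (19, 31): "Donald Trump",
    (40, 41): "I",
    (48, 50): "he",
}

# The desired output: one cluster per entity.
clusters = [
    [(0, 1), (40, 41)],    # the speaker
    [(19, 31), (48, 50)],  # "Donald Trump" and "he" corefer
]

for cluster in clusters:
    print([mentions[span] for span in cluster])
```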

Page 5: Tutorial on Coreference Resolution

Why do we care?

• Suppose your boss asks you to gauge the general opinion of Donald Trump from a corpus of collected text data. How would you finish the job?

• I did not vote for Donald Trump because I think he is…

• What comes after "he is…" provides information about this person's sentiment towards "he", but what does "he" refer to?

• If we know "he" refers to "Donald Trump", we know more about this person (who either likes or, more likely, dislikes Donald Trump)!

– A small dataset can be labeled by hand (time-consuming but tolerable)

– What if we have gigabytes of text data?

Page 6: Tutorial on Coreference Resolution

Why do we care?

• This is where coreference resolution comes into play

– We learn which entities are associated with which words

• There are many potential real-world use cases:

– information extraction

– information retrieval

– question answering

– machine translation

– text summarization

Page 7: Tutorial on Coreference Resolution

A brief history of the mainstream…

• 1970s - 1990s

– Mostly linguistic approaches: parse trees, semantic analysis, etc.

• 1990s - 2000s

– More machine learning approaches to the problem, mostly supervised

• Late 2000s - now

– More unsupervised machine learning approaches came out

– Other models (ILP, Markov Logic Networks, etc.) were proposed

Page 8: Tutorial on Coreference Resolution

How to evaluate?

• How can I tell my approach is better than yours?

• Many well-established datasets and benchmarks

– ACE

– MUC

• Evaluate performance on these datasets using F1 score, precision, recall, etc.
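As a rough illustration of link-based scoring (a simplified stand-in for the real MUC scorer, which compares partitions rather than raw link sets), precision, recall, and F1 can be computed from predicted versus gold coreference links:

```python
def precision_recall_f1(predicted_links, gold_links):
    """Score predicted coreference links against gold links.

    A 'link' is a pair of mention ids; real MUC/ACE scoring is more
    involved than this simplified link-counting view.
    """
    predicted, gold = set(predicted_links), set(gold_links)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: 2 predicted links, 1 of which is in the gold standard.
print(precision_recall_f1({("he", "Trump"), ("I", "Trump")},
                          {("he", "Trump"), ("I", "me")}))
# -> (0.5, 0.5, 0.5)
```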

Page 9: Tutorial on Coreference Resolution

Taxonomy of ideas

• In this tutorial, we will focus on two approaches:

– Linguistic approaches

– Machine learning approaches

• Supervised approaches

• Unsupervised approaches

• Other approaches will be briefly addressed towards the end

Page 10: Tutorial on Coreference Resolution

Linguistic Approach

• Appeared in the 1980s

• One of the very first approaches to the problem

• Takes advantage of the linguistic structure of the text

– parse trees

– syntactic constraints

– semantic analysis

• Requires domain-specific knowledge

Page 11: Tutorial on Coreference Resolution

Linguistic Approach

• The centering approach to pronouns was proposed by S.E. Brennan, M.W. Friedman, and C.J. Pollard in 1987

• Centering theory was proposed in order to model the relationships among:

– a) focus of attention

– b) choice of referring expression

– c) perceived coherence of utterances

• An entity is an object that can be the target of a referring expression

• An utterance is the basic unit of discourse, which can be a sentence, a clause, or a phrase

• Each utterance U is assigned a set of forward-looking centers, Cf(U), and a single backward-looking center, Cb(U)

Page 12: Tutorial on Coreference Resolution

Linguistic Approach

• The algorithm consists of four main steps:

– Construct all possible <Cb, Cf> pairs by taking the cross-product of the Cb and Cf lists

– Filter these pairs by applying certain constraints

– Classify each pair based on the transition type, and rank the pairs

– Choose the best-ranked pair

• The goal of the algorithm design was conceptual clarity rather than efficiency
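Below is a minimal sketch of these four steps in Python. The helper names (classify_transition, resolve) and the simplified CONTINUE/RETAIN/SHIFT typing are ours; the real BFP algorithm uses the full centering-theory definitions and agreement constraints.

```python
from itertools import product

# Centering transitions, ranked from most to least coherent (simplified).
TRANSITION_RANK = {"CONTINUE": 0, "RETAIN": 1, "SHIFT": 2}

def classify_transition(cb, cf, prev_cb):
    """Rough transition typing; the real definitions compare Cb(U) with
    Cb(U-1) and with the highest-ranked element of Cf(U)."""
    if cb == prev_cb:
        return "CONTINUE" if cf and cb == cf[0] else "RETAIN"
    return "SHIFT"

def resolve(cb_candidates, cf_candidates, prev_cb, constraints):
    # Step 1: construct all <Cb, Cf> pairs via the cross-product.
    anchors = list(product(cb_candidates, cf_candidates))
    # Step 2: filter pairs that violate hard constraints (e.g. agreement).
    anchors = [(cb, cf) for cb, cf in anchors
               if all(ok(cb, cf) for ok in constraints)]
    if not anchors:
        return None
    # Steps 3-4: classify each pair by transition type, rank, pick the best.
    return min(anchors, key=lambda a:
               TRANSITION_RANK[classify_transition(a[0], a[1], prev_cb)])

# Toy usage: two Cb candidates, two candidate Cf orderings, no constraints.
print(resolve(["trump", "speaker"],
              [("trump", "speaker"), ("speaker", "trump")],
              prev_cb="trump", constraints=[]))
```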

Page 13: Tutorial on Coreference Resolution

Machine Learning Approaches

• More ML approaches have appeared since the 1990s

• We consider two classical categorizations of ML approaches:

– Supervised learning

• Train on labeled data and predict on unlabeled data

– Unsupervised learning

• Feed in unlabeled data and the algorithm will (hopefully) do the right thing for you

Page 14: Tutorial on Coreference Resolution

Supervised Learning

• Supervised learning is the machine learning task of inferring a function from labeled training data.

• The training data consist of a set of training examples.

• A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used to map new examples

Page 15: Tutorial on Coreference Resolution

Supervised Paper 1

• Evaluating automated and manual acquisition of anaphora resolution strategies - Chinatsu Aone and Scott William Bennett

• This paper describes an approach to build an automatically trainable anaphora resolution system

• Uses Japanese newspaper articles tagged with discourse information as training examples for a C4.5 decision tree algorithm

• The training features include lexical (e.g. category), syntactic (e.g. grammatical role), semantic (e.g. semantic class) and positional (e.g. distance between anaphor and antecedent)
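As a sketch, the four feature families might be encoded like this for one (anaphor, antecedent) pair; the field names below are hypothetical stand-ins, not the paper's actual feature set.

```python
def pair_features(anaphor, antecedent):
    """One illustrative feature per family from the paper: lexical,
    syntactic, semantic, and positional. Field names are hypothetical."""
    return {
        "lexical_category": anaphor["category"],           # e.g. pronoun
        "syntactic_role": antecedent["grammatical_role"],  # e.g. subject
        "semantic_class": antecedent["semantic_class"],    # e.g. PERSON
        "positional_distance": anaphor["position"] - antecedent["position"],
    }

anaphor = {"category": "pronoun", "position": 12}
antecedent = {"grammatical_role": "subject", "semantic_class": "PERSON",
              "position": 4}
print(pair_features(anaphor, antecedent))
```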

Page 16: Tutorial on Coreference Resolution

Supervised Paper 1 cont.

• The method uses three training techniques with different parameters

– The anaphoric chain parameter is used in selecting positive and negative training examples

– With the anaphoric type identification parameter, the system answers "no" when a pair of an anaphor and a possible antecedent is not co-referential, and answers with the anaphoric type when they are co-referential

– The confidence factor parameter (0-100) is used in pruning decision trees; with a higher confidence factor, less pruning is performed

• Using anaphoric chains without anaphoric type identification helps improve the learning algorithm

• With a 100% confidence factor, the tree overfits the training examples, leading to spurious uses of features

Page 17: Tutorial on Coreference Resolution

Supervised Paper 2

• A Machine Learning Approach to Coreference Resolution of Noun Phrases: Wee Meng Soon, Hwee Tou Ng and Daniel Chung Yong Lim

• A learning approach for coreference in unrestricted text, trained on a small annotated corpus

• All markables in the training set are determined by a pipeline of NLP modules consisting of tokenization, sentence segmentation, morphological processing, part-of-speech tagging, noun phrase identification, named entity recognition, nested noun phrase extraction, and semantic class determination

• The feature vector consists of 12 features derived from two extracted markables, i and j, where i is the potential antecedent and j is the anaphor

Page 18: Tutorial on Coreference Resolution

Supervised Paper 2 cont.

• The learning algorithm used in their coreference engine is C5, an updated version of C4.5

• For each j, the algorithm considers every markable i before j as a potential antecedent. For each pair i and j, a feature vector is generated and given to the decision tree classifier

• The coreference engine achieves a recall of 58.6% and a precision of 67.3%, yielding a balanced F-measure of 62.6% for MUC-6.

• For MUC-7, the recall is 56.1%, the precision is 65.5%, and the balanced F-measure is 60.4%.
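A minimal sketch of this mention-pair setup, using scikit-learn's DecisionTreeClassifier as a stand-in for C5 and three toy features in place of the paper's 12:

```python
from sklearn.tree import DecisionTreeClassifier

def make_pairs(markables):
    """Pair each anaphor j with every markable i that precedes it,
    as in the paper's instance generation (simplified)."""
    for j_idx, j in enumerate(markables):
        for i in markables[:j_idx]:
            yield i, j

def featurize(i, j):
    # Toy features: sentence distance, head match, gender agreement.
    return [j["sent"] - i["sent"],
            int(i["head"] == j["head"]),
            int(i["gender"] == j["gender"])]

markables = [
    {"id": 0, "head": "trump", "sent": 0, "gender": "m"},
    {"id": 1, "head": "he",    "sent": 0, "gender": "m"},
    {"id": 2, "head": "her",   "sent": 1, "gender": "f"},
]
gold = {(0, 1)}  # mentions 0 and 1 corefer

X = [featurize(i, j) for i, j in make_pairs(markables)]
y = [int((i["id"], j["id"]) in gold) for i, j in make_pairs(markables)]

clf = DecisionTreeClassifier().fit(X, y)  # stand-in for C5/C4.5
print(clf.predict(X))
```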

Page 19: Tutorial on Coreference Resolution

Supervised Paper 3

• Conditional models of identity uncertainty with application to proper noun coreference: A. McCallum and B. Wellner

• This paper introduces several discriminative, conditional-probability models for coreference analysis

• No assumption that pairwise coreference decisions should be made independently of each other

• Model 1:

– Very general discriminative model where the dependency structure is unrestricted

– The model considers the coreference decisions and the attributes of entities as random variables, conditioned on the entity mentions

– The feature functions depend on the coreference decisions, y, the set of attributes, a, as well as the mentions of the entities, x

Page 20: Tutorial on Coreference Resolution

Supervised Paper 3 cont.

• Model 2: The authors replace the coreference variable, y, with a binary-valued random variable, Yij, for every pair of mentions

• Model 3: The third model that they introduce does not include attributes as a random variable, and is otherwise similar to the second model

• The model performs a little better than the approach by Ng and Cardie (2002)

• The F1 result on NP coreference on the MUC-6 dataset is only about 73%
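In rough log-linear form, the pairwise variant can be written as below (our paraphrase of the model family, not a formula copied from the paper), where y_ij is the binary coreference decision for mentions x_i and x_j, the f_l are feature functions, the λ_l are learned weights, and Z_x normalizes over assignments of y that are consistent (transitive):

```latex
P(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z_{\mathbf{x}}}
\exp\!\Big( \sum_{i,j} \sum_{l} \lambda_l \, f_l(x_i, x_j, y_{ij}) \Big)
```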

Page 21: Tutorial on Coreference Resolution

Supervised Paper 4

• Kernel-Based Pronoun Resolution with Structured Syntactic Knowledge: Xiaofeng Yang, Jian Su and Chew Lim Tan

• A kernel-based method that automatically mines syntactic information from parse trees for pronoun resolution

• For each pronominal anaphor encountered, a positive instance is created by pairing the anaphor with its closest antecedent

• A set of negative instances is formed by pairing the anaphor with each of the non-coreferential candidates

• The learning algorithm used in this work is an SVM, which allows the use of kernels to incorporate the structured features

Page 22: Tutorial on Coreference Resolution

Supervised Paper 4 cont.

• The study examines three possible structured features

• Min-Expansion records the minimal structure covering both the pronoun and the candidate in the parse tree

• Simple-Expansion captures the syntactic properties of the candidate or the pronoun

• Full-Expansion focuses on the whole tree structure between the candidate and pronoun

• Hobbs’ algorithm obtains 66%-72% success rates on the three domains while the baseline system obtains 74%-77% success rates
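A minimal sketch of the kernel-SVM idea, using scikit-learn's SVC with a precomputed Gram matrix; the toy "fragment overlap" kernel below merely stands in for a real convolution tree kernel over parse trees.

```python
import numpy as np
from sklearn.svm import SVC

# Each instance is a set of parse-tree fragments (strings) standing in
# for the Min-/Simple-/Full-Expansion structures described above.
instances = [
    {"NP->PRP", "S->NP VP"},     # positive: anaphor + true antecedent
    {"NP->PRP", "VP->V NP"},     # negative candidates
    {"NP->DT NN", "PP->IN NP"},
]
labels = [1, 0, 0]

def fragment_kernel(a, b):
    """Toy structural kernel: number of shared tree fragments."""
    return float(len(a & b))

# Precompute the Gram matrix and train an SVM on it.
K = np.array([[fragment_kernel(a, b) for b in instances] for a in instances])
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.predict(K))
```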

Page 23: Tutorial on Coreference Resolution

Unsupervised Learning

• Let it run on top of your data, with no supervision of right or wrong. Most methods are iterative.

• Generally preferred over supervised learning:

– Does not generally need labeled data

– Does not generally need prior knowledge

– Is not subject to dataset limitations

– Often scales better than supervised approaches

• Yet, it has come a long way…

Page 24: Tutorial on Coreference Resolution

Unsupervised Paper 1

• The first notable unsupervised learning algorithm came out in 2007, by Aria Haghighi and Dan Klein

– It presents a generative model

– The objective is to maximize the posterior probability of entities given the collection of variables of the current mention

– It also discusses adding features to the collection, such as gender, plurality, and entity activation (how often the entity is mentioned)

Page 25: Tutorial on Coreference Resolution

Unsupervised Paper 1 cont.

• The result is a rather complicated generative model

• Achieves 72.5 F1 on MUC-6

• Set a good standard for later algorithms

Page 26: Tutorial on Coreference Resolution

Unsupervised Paper 2

• Inspired by the previous papers

• Another unsupervised method was proposed by Vincent Ng in 2008

– Use a new but simpler generative model

– Consider all pairs of mentions in a document

– Models the probability of a pair of mentions, taking into account 7 context features (gender, plurality, etc.)

– Uses the classical EM algorithm to iteratively update the model parameters
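A heavily simplified caricature of the EM idea, assuming a single binary feature (gender agreement) and a two-component Bernoulli mixture; Ng's actual model is a much richer document-level generative story over 7 features.

```python
import numpy as np

# Unlabeled mention pairs, observed only as one binary feature:
# 1 = the two mentions agree in gender, 0 = they do not.
x = np.array([1, 1, 1, 0, 1, 0, 0, 1, 1, 0], dtype=float)

# pi = P(coreferent), p1 = P(agree | coreferent),
# p0 = P(agree | not coreferent). Asymmetric init breaks symmetry.
pi, p1, p0 = 0.5, 0.9, 0.3

for _ in range(50):
    # E-step: posterior responsibility that each pair is coreferent.
    like1 = pi * p1 ** x * (1 - p1) ** (1 - x)
    like0 = (1 - pi) * p0 ** x * (1 - p0) ** (1 - x)
    r = like1 / (like1 + like0)
    # M-step: re-estimate parameters from the soft assignments.
    pi = r.mean()
    p1 = (r * x).sum() / r.sum()
    p0 = ((1 - r) * x).sum() / (1 - r).sum()

print(round(pi, 3), round(p1, 3), round(p0, 3))
```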

Page 27: Tutorial on Coreference Resolution

Unsupervised Paper 2 cont.

• Greatly simplified previous generative model

• Can be applied at the document level instead of the collection level

• Beats the previous model on the ACE dataset (by a small margin)

Page 28: Tutorial on Coreference Resolution

Unsupervised Paper 3

• Previous methods emphasize generative models

• Why not use a Markov network?

• Proposed by Hoifung Poon and Pedro Domingos

– Formulate the problem in a Markov Logic Network (MLN)

– Define rules and clauses, gradually building from a base model by adding rules

– Leverage sampling algorithms in the training and inference steps
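For flavor, an MLN's rules are weighted first-order clauses over mention predicates. The two clauses below are hypothetical illustrations of ours, not rules taken from Poon and Domingos:

```latex
% Hypothetical illustrative clauses; a larger weight w makes a rule harder to violate.
w_1:\ \mathrm{Head}(m_1, h) \wedge \mathrm{Head}(m_2, h)
      \Rightarrow \mathrm{Corefer}(m_1, m_2)
w_2:\ \mathrm{IsPronoun}(m_1) \wedge \neg\,\mathrm{GenderAgree}(m_1, m_2)
      \Rightarrow \neg\,\mathrm{Corefer}(m_1, m_2)
```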

Page 29: Tutorial on Coreference Resolution

Unsupervised Paper 3 cont.

• Pioneering work leveraging Markov Logic for the coreference resolution problem

• Beats the generative model proposed by Haghighi and Klein by a large margin on the MUC-6 dataset

• The authors are pioneers of Markov Logic Networks; this paper may be partly a showcase of their work and what MLNs can do

Page 30: Tutorial on Coreference Resolution

Related Work

• There is much other related work:

– Formulation of equivalent ILP problem

• Pascal Denis and Jason Baldridge

– Enforce transitivity property on ILP

• Finkel, Jenny Rose, and Christopher D. Manning

– Latent structure prediction approach

• Kai-Wei Chang, Rajhans Samdani, Dan Roth (professor @ UIUC)

Page 31: Tutorial on Coreference Resolution

Future Directions

• Based on our study, here are some major future directions

• More standardized, updated benchmarks

– Coreference resolution research should use a more standard set of corpora; that way, results will be comparable

• First-order and cluster-based features will play an important role

– Their use has given the field a much-needed push, and they will likely remain a staple of future state-of-the-art systems

• Combination of linguistic ideas with modern models

– Combine the strengths of the two themes, using richer machine learning models together with linguistic ideas

Page 32: Tutorial on Coreference Resolution

Conclusion

• Thanks for going through our tutorial!

• Major take-aways:

– Coreference resolution remains an active research area

– Modern research tends to diverge from pure linguistic analysis

– Generally, the performance of state-of-the-art algorithms (evaluated on well-established datasets) is still not adequate for industrial uses that require precise labels

– For general-purpose use, modern unsupervised learning approaches can achieve decent accuracy compared to supervised approaches

– Future machine learning approaches will incorporate more linguistic knowledge (features) into their models

Page 33: Tutorial on Coreference Resolution

References

• Brennan, Susan E., Marilyn W. Friedman, and Carl J. Pollard. "A centering approach to pronouns." Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics. 1987.

• Aone, Chinatsu, and Scott William Bennett. "Evaluating automated and manual acquisition of anaphora resolution strategies." Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. 1995.

• Ge, Niyu, John Hale, and Eugene Charniak. "A statistical approach to anaphora resolution." Proceedings of the Sixth Workshop on Very Large Corpora. Vol. 71. 1998.

• Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational Linguistics 27.4 (2001): 521-544.

• McCallum, Andrew, and Ben Wellner. "Toward conditional models of identity uncertainty with application to proper noun coreference." 2003.

• Yang, Xiaofeng, Jian Su, and Chew Lim Tan. "Kernel-based pronoun resolution with structured syntactic knowledge." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. 2006.

• Haghighi, Aria, and Dan Klein. "Unsupervised coreference resolution in a nonparametric Bayesian model." Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. 2007.

• Ng, Vincent. "Unsupervised models for coreference resolution." Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2008.

• Poon, Hoifung, and Pedro Domingos. "Joint unsupervised coreference resolution with Markov logic." Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2008.

• Finkel, Jenny Rose, and Christopher D. Manning. "Enforcing transitivity in coreference resolution." Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. 2008.

• Chang, Kai-Wei, Rajhans Samdani, and Dan Roth. "A constrained latent variable model for coreference resolution." 2013.

• Denis, Pascal, and Jason Baldridge. "Joint determination of anaphoricity and coreference resolution using integer programming." HLT-NAACL. 2007.