the process of anaphora resolution (.ppt)

Post on 11-Jun-2015


The process of anaphora resolution

Ruslan Mitkov

Anaphoric jokes or the importance of correct anaphora resolution

• There is a pile of inflammable trash next to your car. You'll have to get rid of it. 

• Fried eggs should be cooked properly and if there are frail or elderly people in the house, they should be hard-boiled.

• Autumn leaves may cause a problem for the elderly, especially when they fall and become a wet and soggy mess on the ground.

• If these shoes don't fit your feet, you can exchange them. 

Anaphoric jokes or the importance of correct anaphora resolution (2)

• If the baby does not thrive on raw milk, boil it.

• If an incendiary bomb drops near you, don't lose your head. Put it in a bucket and cover it with sand.

• There will be a Moscow Exhibition of Arts by 150,000 Soviet Republic painters and sculptors. These were executed over the past two years.

Outline of the lecture

• Anaphora resolution and the knowledge needed

• Anaphora resolution in practice

– Identification of anaphors

– Defining search scope and identifying candidates

– The resolution algorithm: factors in anaphora resolution

• Discussion limited to nominal anaphora

Morphological and lexical knowledge

• Some anaphors are successfully resolved on the basis of lexical information such as gender and number 

• Anaphors usually match (the head of) their antecedents in gender and number.

Examples: gender and number agreement rules

• Steven, of Worthing, Sussex, said he and Emily had a huge row after he discovered she had been skipping lessons at school.

• John Bradley spoke to Jane McCarthy and to the Browns about a forthcoming project. The businessman said this enterprise would cost millions.

• The gender and number agreement rule is not as discriminative for English as for German or Russian.

Syntax knowledge

• Identification of boundaries: vital for identifying NPs, PPs, sentences

• Application of syntax-based constraints (“filtering” rules)

• Application of syntax-based preferences

Semantic knowledge

• However important morphological, lexical and syntax knowledge are, in many cases they alone cannot help.

– The mouse was under the table. It was eating a piece of cheese.

• Semantic knowledge vital for resolving lexical noun phrase anaphors

– Roy Keane has warned Manchester United he may snub their pay deal. United's skipper is even hinting that unless the future Old Trafford Package meets his demands, he could quit the club in June 2000. Alex Ferguson's No. 1 player confirmed…

Discourse knowledge

• Morphological, lexical, syntactic and semantic criteria are not always sufficient to distinguish between a set of possible candidates.

• Jenny put the cup on the plate and broke it.

• Jenny went window shopping yesterday and spotted a nice cup. She wanted to buy it, but she had no money with her. Nevertheless, she knew she would be shopping the following day, so she would be able to buy the cup then. The following day, she went to the shop and bought the coveted cup. However, once back home and in her kitchen, she put the cup on a plate and broke it...

Real-world knowledge

• The soldiers shot at the women and they fell.

• The soldiers shot at the women and they missed.

• They can be resolved only with the help of real-world knowledge.

• Rule 1: If X shoots at Y and if Z (Z ∈ {X, Y}) falls, then it is more likely for Z to be Y

• Rule 2: If X shoots at Y and if Z (Z ∈ {X, Y}) misses, then it is more likely for Z to be X
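Rules 1 and 2 can be read as entries in a small table of world knowledge. The following is only a minimal sketch under that reading: the table `PREFERRED_ROLE`, the role names `agent` and `patient`, and the function `preferred_antecedent` are hypothetical names introduced for illustration, not part of the lecture.

```python
# Hypothetical hand-written world knowledge:
# (event, follow-up event) -> role of the more likely antecedent.
PREFERRED_ROLE = {
    ("shoot at", "fall"): "patient",  # Rule 1: the one shot at is more likely to fall
    ("shoot at", "miss"): "agent",    # Rule 2: the shooter is more likely to miss
}


def preferred_antecedent(agent, patient, event, follow_up):
    """Return the more likely antecedent, or None if no rule applies."""
    role = PREFERRED_ROLE.get((event, follow_up))
    if role == "agent":
        return agent
    if role == "patient":
        return patient
    return None


# "The soldiers shot at the women and they fell / missed."
fell = preferred_antecedent("the soldiers", "the women", "shoot at", "fall")
missed = preferred_antecedent("the soldiers", "the women", "shoot at", "miss")
print(fell)
print(missed)
```

A table like this only expresses a preference ("more likely"), so in a fuller system it would rank candidates rather than eliminate them.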

Real-world knowledge (2)

• The following pronominal anaphors are no easier to deal with:

• The FBI's role is to ensure our country's freedom and be ever watchful of those who threaten it.

• The KGB's role is to ensure our country's freedom and be ever watchful of those who threaten it.

• If Peter Mandelson had been in Tony Blair’s shoes he would have demanded his resignation the day the Prime Minister forced him to leave the Cabinet.

Anaphora resolution in practice

– Identification of anaphors

– Defining search scope and identifying candidates

– The resolution algorithm: factors in anaphora resolution

Identification of anaphors

• Identification of pronouns

• Identification of pleonastic pronouns:

– It must be stated that Oskar behaved impeccably

– It is cloudy

– It’s three o'clock
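One crude way to flag pleonastic "it" is with surface patterns. The sketch below is illustrative only: the three regular expressions are assumptions written to cover the three example sentences above, and `looks_pleonastic` is a hypothetical helper, not a recogniser described in the lecture.

```python
import re

# Illustrative patterns only; a real recogniser would need far wider coverage.
PLEONASTIC_PATTERNS = [
    # "It must be stated that ...", "It is believed that ..."
    re.compile(r"\bit\s+(?:is|was|must\s+be|should\s+be|can\s+be)\s+\w+\s+that\b", re.I),
    # Weather expressions: "It is cloudy"
    re.compile(r"\bit(?:\s+is|\s+was|['\u2019]s)\s+(?:cloudy|raining|snowing|sunny|windy)\b", re.I),
    # Time expressions: "It's three o'clock"
    re.compile(r"\bit(?:\s+is|\s+was|['\u2019]s)\s+[\w\s]*?o['\u2019]clock\b", re.I),
]


def looks_pleonastic(sentence):
    """Return True if any illustrative pattern matches the sentence."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)


for s in [
    "It must be stated that Oskar behaved impeccably",
    "It is cloudy",
    "It's three o'clock",
    "The mouse was under the table. It was eating a piece of cheese.",
]:
    print(looks_pleonastic(s), "-", s)
```

The last sentence contains an anaphoric "it" and is not matched by any of the patterns.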

Identification of anaphors (2)

• Identification of lexical NPs (definite descriptions, proper names)

• Identification of non-anaphoric lexical NPs

• Queen Elizabeth attended the ceremony. The Queen delivered a speech.

• The Queen attended the ceremony. The Duchess of York was there too.

Tools and resources needed at this stage

• Morphological or lexical information usually provided by a morphological analyser, part-of-speech tagger or dictionary.

• Program for recognising pleonastic pronouns or one for identifying non-anaphoric definite descriptions

• Parser

• Machine learning annotated corpora.

• Partial parser or NP extractor

• Proper name recogniser

• Ontology (WordNet)

Location of the candidates for antecedents

• All NPs preceding an anaphor within a certain search scope are initially regarded as potential candidates for antecedents

• Typical search scope for pronominal anaphora: 2-3 sentences

• Typical search scope for lexical NP anaphors: up to 10 sentences

• Discourse segment
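Collecting candidates within a sentence-based search scope can be sketched as follows. This assumes the noun phrases have already been extracted per sentence, and that the scope counts the anaphor's own sentence; `collect_candidates` and its parameters are hypothetical names used only for illustration.

```python
def collect_candidates(sentences, anaphor_sent, anaphor_pos, scope=3):
    """Return the NPs that precede the anaphor within `scope` sentences.

    sentences    -- list of sentences, each a list of NP strings in textual order
    anaphor_sent -- index of the sentence containing the anaphor
    anaphor_pos  -- index of the anaphor among the NPs of that sentence
    scope        -- number of sentences searched, counting the anaphor's own
    The result is ordered from the most recent NP to the most distant one.
    """
    first = max(0, anaphor_sent - scope + 1)
    candidates = []
    for i in range(first, anaphor_sent):
        candidates.extend(sentences[i])
    # Only NPs to the left of the anaphor count in its own sentence.
    candidates.extend(sentences[anaphor_sent][:anaphor_pos])
    return candidates[::-1]


# "Jack took the newspaper and then got hold of the magazine.
#  He started reading it straight away."
sentences = [
    ["Jack", "the newspaper", "the magazine"],
    ["He", "it"],
]
print(collect_candidates(sentences, anaphor_sent=1, anaphor_pos=1, scope=2))
```

Every preceding NP is returned, including ones such as "He" that later constraints would eliminate.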

Tools/resources needed at this stage

• Identifying noun phrases and the sentence boundaries

• Full parser / sentence splitter (or POS tagger) + NP extractor

• Clause splitter

• Discourse segmentation algorithm

• Proper name recogniser

The resolution algorithm: factors in anaphora resolution

• Once the anaphors have been detected, the program will attempt to resolve them by selecting their antecedents from the identified sets of candidates.

• The resolution rules based on the different sources of knowledge and used in the resolution process are usually referred to as "anaphora resolution factors".

Eliminating factors (constraints)

• Factors that eliminate candidates for antecedents

• gender constraints

• number constraints

• GB theory (c-command) constraints

• selectional restrictions

Definition c-command

A node A c-commands a node B if and only if

I. A does not dominate B

II. B does not dominate A

III. the first branching node dominating A also dominates B

Example c-command

[Tree diagram with nodes A, B, C, D, E, F, G, H, I, J, K, L, M and N]

For example (not exhaustive):

• A c-commands all the other nodes.

• B c-commands C and every node that C dominates.

• C c-commands B and every node that B dominates.

• D c-commands E and J, but not C, or any of the nodes that C dominates.

• H c-commands I and no other node.

Preferential factors (preferences)

• salience (center of attention)

• parallelism

• proximity

Gender and number agreement constraints

• This constraint requires that anaphors and their antecedents agree in number and gender.

– Jane told Philip and his friends that she was in love.
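A gender and number filter for this example might look like the sketch below. The `Mention` class, its feature values and the helper names are assumptions made for illustration; in practice the features would come from a dictionary, tagger or morphological analyser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    gender: str  # "m", "f", "n" or "unknown"
    number: str  # "sg" or "pl"


def agrees(anaphor, candidate):
    """Gender must match unless either side is unknown; number must match."""
    gender_ok = (
        "unknown" in (anaphor.gender, candidate.gender)
        or anaphor.gender == candidate.gender
    )
    return gender_ok and anaphor.number == candidate.number


def filter_candidates(anaphor, candidates):
    """Keep only the candidates that agree with the anaphor."""
    return [c for c in candidates if agrees(anaphor, c)]


she = Mention("she", "f", "sg")
candidates = [
    Mention("Jane", "f", "sg"),
    Mention("Philip", "m", "sg"),
    Mention("Philip and his friends", "unknown", "pl"),
]
survivors = filter_candidates(she, candidates)
print([c.text for c in survivors])  # ['Jane']
```

Treating an unknown gender as compatible is a design choice of this sketch: it keeps ambiguous names in play instead of discarding them too early.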

Traps: are gender and number agreement a must?

• Ask another Macintosh user about the problem you’re having; they may have a solution

• If there is a doctor on board, could they please make themselves known to the crew

• You were called on the 30th of April at 21.38 hours. The caller withheld their number

Traps: other gender and number agreement problems

Problems with proper names

• Proper names can be ambiguous in gender within one language: Chris, Lesley or Tracey in English, Claude in French

• Or they can differ in gender across languages, such as Jean

C-command constraints (1)

A reflexive anaphor must be c-commanded by its antecedent. In "Sylvia admires herself", herself is c-commanded by Sylvia.

[S [NP Sylvia] [VP [V admires] [NP herself]]]
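The definition of c-command given earlier can be checked mechanically on this tree. The sketch below is a minimal illustration: the `Node` class and both functions are hypothetical, and each NP is collapsed with its word into a single node for brevity.

```python
class Node:
    """A tree node that records its parent when attached to one."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def dominates(a, b):
    """True if a is a proper ancestor of b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a, b):
    """a c-commands b: neither dominates the other, and the first
    branching node dominating a also dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


# [S [NP Sylvia] [VP [V admires] [NP herself]]]
subject = Node("NP:Sylvia")
verb = Node("V:admires")
reflexive = Node("NP:herself")
vp = Node("VP", [verb, reflexive])
s = Node("S", [subject, vp])

print(c_commands(subject, reflexive))  # True
print(c_commands(reflexive, subject))  # False
```

The subject c-commands the reflexive but not the other way round, which is why Sylvia is a possible antecedent of herself here.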

C-command constraints (2)

• A non-pronominal NP cannot corefer with an NP that c-commands it.

– He warned Mr. Byers that whether it will be public or private investment….

Selectional restrictions (semantic constraints)

• Semantic (selectional) restrictions which apply to the anaphor should apply to the antecedent as well.

– Vincent removed the diskette from the computer and then disconnected it.

– Vincent removed the diskette from the computer and then copied it.
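The two Vincent sentences can be told apart with a toy lexicon of selectional restrictions. Everything in the sketch below is an assumption for illustration: the semantic classes, both dictionaries and the function `compatible_candidates` are hand-written stand-ins for what a dictionary, WordNet or corpus collocations might supply.

```python
# Hypothetical toy lexicon: semantic class of each noun.
NOUN_CLASS = {
    "diskette": "storage_medium",
    "computer": "device",
}

# Hypothetical selectional restrictions: classes each verb accepts as object.
VERB_OBJECT_CLASSES = {
    "disconnect": {"device"},
    "copy": {"storage_medium"},
}


def compatible_candidates(verb, candidates):
    """Keep candidates whose semantic class the verb accepts as its object."""
    allowed = VERB_OBJECT_CLASSES[verb]
    return [c for c in candidates if NOUN_CLASS[c] in allowed]


candidates = ["diskette", "computer"]
print(compatible_candidates("disconnect", candidates))  # ['computer']
print(compatible_candidates("copy", candidates))        # ['diskette']
```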

Preferences

• There is a general preference for the most recent compatible candidate NP as the antecedent, but this is not always the case.

– Jack took the newspaper and then got hold of the magazine. He started reading it straight away.

– The newspaper contained the latest international news but the magazine did not. Jack started reading it straight away.

Preferences (2)

• Entities in the main clause are favoured over those in the subordinate clause

• Preference is given to entities in non-adjunct phrases over those in adjunct phrases.

– Jack drank the wine on the table. It was brown and round.

Preferences (3)

• Preference is given to NPs with the same syntactic function as the anaphor.

– The programmer successfully combined Prolog with C, but he had combined it with Pascal last time.

– The programmer successfully combined Prolog with C, but he had combined Pascal with it last time.

– The program successfully combined Prolog with C but Jack had to modify it because the combination of Prolog with Perl did not work.

Preferences (4)

• Center / Focus preference

– If an incendiary bomb drops near you, don't lose your head. Put it in a bucket and cover it with sand.

– Tilly tried on the dress over her skirt and ripped it.

– Tilly's mother had agreed to make her a new dress for the party. She worked hard on the dress for weeks and finally it was ready for Tilly to try on. Impatient to see what it would look like, Tilly tried on the dress over her skirt and ripped it.

Tools and resources needed

• The gender and number filters require information on the gender and number of the anaphor and its candidates

• dictionaries or

• part-of-speech taggers

• partial parsers

• morphological analysers

Tools and resources needed (2)

• C-command constraints require tree structures of sentences; a full parser is desirable.

• Semantic knowledge can be provided by WordNet

• Some semantic information (verb selectional restrictions) supplied in dictionary entries

• Selectional restrictions can be cheaply modelled by collocations extracted from corpora

• Word-sense disambiguation

• A center tracking program is needed for approaches relying on center preference.

Example of anaphora resolution based on a simple model

• A simple constraint-based model using the gender and number agreement constraint and the center preference.

• Jim was dating both Jane and Becky but it was Jane he was falling in love with. The young lad found her very attractive and kind.
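The simple model can be sketched in a few lines: apply the agreement constraint first, then prefer the center among the survivors. This is only an illustration under stated assumptions. The `Candidate` class and `resolve` function are hypothetical, and the center (Jane, highlighted by "it was Jane") is marked by hand where a real system would use a center tracking program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    gender: str  # "m" or "f"
    number: str  # "sg" or "pl"
    is_center: bool = False  # set by hand here; normally from a center tracker


def resolve(anaphor_gender, anaphor_number, candidates):
    """Filter by gender/number agreement, then prefer the center.

    Falls back to the first agreeing candidate when none is the center,
    and returns None when nothing agrees.
    """
    agreeing = [
        c for c in candidates
        if c.gender == anaphor_gender and c.number == anaphor_number
    ]
    centered = [c for c in agreeing if c.is_center]
    chosen = centered or agreeing
    return chosen[0].text if chosen else None


candidates = [
    Candidate("Jim", "m", "sg"),
    Candidate("Jane", "f", "sg", is_center=True),
    Candidate("Becky", "f", "sg"),
]

print(resolve("f", "sg", candidates))  # antecedent of "her": Jane
print(resolve("m", "sg", candidates))  # antecedent of "The young lad": Jim
```

Agreement alone leaves both Jane and Becky for "her"; the center preference is what selects Jane.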
