Course 10: Veins Theory. Discourse structure and coherence. Dan Cristea. Selection of slides


Post on 25-Dec-2015



On cohesion

Types of references: evocative references

Evocative resolution processes:
- an anaphor may be resolved to a referent that is not linearly the closest, but only hierarchically the closest;
- are based on associations (pattern matching on morpho-semantic features);
- are fast;
- give fluency to the text.

Types of references: post-evocative references

Post-evocative resolution processes:
- are inferential processes developed in memory;
- are computationally and cognitively slow (they impose a heavier inference load);
- require more powerful referencing means (such as proper nouns);
- are less frequent.

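Domain of evocative accessibility (DEA)

dea(u) = pref(u, vein(u)) (simplified)

Reminder: the vein expression of a terminal node (a discourse unit) is the sequence of units that are required to understand just that unit, in the context of the whole discourse. Here pref(u, vein(u)) is the prefix of vein(u) that ends at u, so dea(u) contains the units on the vein of u up to and including u itself.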
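The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `dea` mirrors the slide, a vein is assumed to be a plain ordered list of unit numbers, and parenthesised units such as "(1 2) 3" are not modelled.

```python
def dea(u, vein):
    """Return the prefix of `vein` that ends at unit `u` (inclusive).

    Assumption: `vein` is an ordered list of unit numbers and contains `u`.
    """
    if u not in vein:
        raise ValueError(f"unit {u} is not on its own vein {vein}")
    return vein[: vein.index(u) + 1]


# Vein 1 3 4 5 for unit 4 gives the domain 1 3 4.
print(dea(4, [1, 3, 4, 5]))
```

Units that follow u on its vein are left out, which is why a unit can only look backwards along its vein.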

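Heads and veins

[Figure: a discourse tree over units 1 to 5 in which every node is annotated with a head expression (H) and a vein expression (V). Head expressions on the slide: H = 1, H = 2, H = 3, H = 4, H = 5, H = 1 2. Vein expressions on the slide: V = 3, V = 1 2 3, V = (1 2) 3, V = 3 4, V = 3 5.]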

From vein expressions...

[Figure: a discourse tree over units 1 to 5 showing only vein expressions: V = 1 2 3, V = 1 2 3, V = (1 2) 3, V = 3 4, V = 3 5.]

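... to Domains of Evocative Accessibility

[Figure: a discourse tree over units 1 to 5 with the DEAs derived from the vein expressions. Labels on the slide: V = 1 2 3, V = 1 2 3, V = 1 2 3, V = 3 4, V = 3 5.]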

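The reason why "she" can refer to Mary but not to John's mother

1. John told Mary that he loves her.
2. He has never been married
3. and lived until his 40s with his mother.
4. She, on the contrary, was married twice.

[Figure: discourse tree over units 1 to 4, with the relations elaboration (twice) and antithesis. The vein expression shown is V = 1 2 4.]

Unit 3, which introduces John's mother, is not on this vein, whereas unit 1, which introduces Mary, is.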
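A small sketch of this example follows. It assumes that the vein expression V = 1 2 4 belongs to unit 4 and uses a hand-made (hypothetical) table of the entities mentioned in each unit; the function name `accessible_entities` is invented for the illustration.

```python
# Hypothetical annotation: entities mentioned in each unit of the example.
MENTIONS = {
    1: {"John", "Mary"},
    2: {"John"},
    3: {"John", "John's mother"},
    4: set(),  # unit 4 only holds the pronoun to be resolved
}

VEIN_OF_4 = [1, 2, 4]  # assumed to be the vein expression of unit 4


def accessible_entities(u, vein, mentions):
    """Collect the entities mentioned in the units that precede u on its vein."""
    found = set()
    for unit in vein[: vein.index(u)]:
        found |= mentions[unit]
    return found


candidates = accessible_entities(4, VEIN_OF_4, MENTIONS)
feminine = {"Mary", "John's mother"}
print(sorted(candidates & feminine))
```

Only Mary survives the filter, because unit 3 is skipped when walking back along the vein.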

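The reason why the antecedent of "it" is hard to recover

1. One year before finishing his mandate as president of the company,
2. Mr. W. Ross has begun to bring about its bankruptcy.
3. There were rumors that he had obtained it by fraud.

[Figure: discourse tree over units 1 to 3, with the relations circumstance and background. The vein expression shown is V = 2 3.]

Unit 1, which contains the antecedent ("his mandate"), is not on this vein.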

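... while here the reference is immediate

1. Mr. W. Ross has begun to bring about the bankruptcy of his company
2. one year before finishing his mandate as president.
3. There were rumors that he had obtained it by fraud.

[Figure: discourse tree over units 1 to 3, with the relations circumstance and background. The vein expression shown is V = 1 2 3.]

Here unit 2, which contains the antecedent, is on the vein.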

Experiment 1: evocative vs post-evocative references

Source     No. of units   Total no. of refs   On the veins   Outside the veins
English    62             97                  91.70%         8.30%
French     48             110                 99.10%         0.90%
Romanian   66             111                 95.50%         4.50%
Total      176            318                 95.60%         4.40%

The 4.4% exceptions

Type of RE (rows in order of decreasing evoking power)   VT
pragmatic                                                56.30%
proper nouns                                             22.70%
common nouns                                             16.00%
pronouns                                                 5.00%

Experiment 2: potential to establish correct co-reference links

• Compare the Linear-k and Discourse-VT-k models:
– For each k, each referring expression re, and each model M (Linear or VT):

\[
p(M\text{-}k, re, DEA_k) =
\begin{cases}
1, & \text{if } re \text{ can be resolved to antecedents in } DEA_k \\
0, & \text{otherwise}
\end{cases}
\]

\[
p(M\text{-}k, Corpus) = \sum_{re \in Corpus} p(M\text{-}k, re, DEA_k)
\]
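The potential can be sketched as follows. The slides do not spell out how DEA_k is built, so the code assumes that Linear-k takes the k units immediately preceding the unit of re, and that VT-k takes the k closest preceding units on its vein. The veins are those of the five-unit bicycle example used in the coherence part of these slides; the list `REFS` is a hypothetical annotation.

```python
# Vein expressions of the five-unit bicycle example (units numbered from 1).
VEINS = {1: [1, 3, 5], 2: [1, 2, 3, 5], 3: [1, 3, 5], 4: [1, 3, 4, 5], 5: [1, 3, 5]}


def linear_dea(u, k):
    """Assumed Linear-k domain: the k units immediately preceding u."""
    return list(range(max(1, u - k), u))


def vt_dea(u, k):
    """Assumed VT-k domain: the k closest units preceding u on its vein."""
    before = [v for v in VEINS[u] if v < u]
    return before[-k:] if k > 0 else []


def potential(refs, dea_k):
    """Sum of p over all referring expressions (1 when an antecedent is in the domain)."""
    return sum(1 for unit, antecedents in refs if antecedents & set(dea_k(unit)))


# Hypothetical annotation: (unit of the referring expression, units holding an antecedent).
REFS = [(2, {1}), (3, {1}), (3, {1, 2}), (4, {3}), (5, {1, 3})]

for k in (1, 2):
    lin = potential(REFS, lambda u: linear_dea(u, k))
    vt = potential(REFS, lambda u: vt_dea(u, k))
    print(f"k={k}: Linear-k={lin}, VT-k={vt}")
```

Dividing each sum by the number of referring expressions would give a percentage of the kind plotted on the Potentials slide.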

Potentials

[Figure: potential (vertical axis, 70% to 95%) against E-DEA size (horizontal axis, 0 to 9), with one curve for VT-k and one for Linear-k.]

Experiment 3: the effort required to find antecedents

• Compare the Linear-k and Discourse-VT-k models:
– For each k, each referring expression re, and each model M (Linear or VT):

\[
e(M\text{-}k, re, DEA_k) =
\begin{cases}
d < k, & \text{the distance between } re \text{ and the closest antecedent in } DEA_k \\
k, & \text{if no such antecedent exists}
\end{cases}
\]

\[
e(M\text{-}k, Corpus) = \sum_{re \in Corpus} e(M\text{-}k, re, DEA_k)
\]
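A minimal sketch of the effort measure, under the assumption that DEA_k is given as an ordered list of at most k units (farthest first, nearest last) and that the distance d counts the units skipped inside that list, so the nearest unit has d = 0. The function name and the sample domains are hypothetical.

```python
def effort(antecedent_units, dea_k, k):
    """Distance d < k to the closest antecedent in dea_k, or k when there is none.

    Assumption: dea_k is ordered from the farthest unit to the nearest one.
    """
    for d, unit in enumerate(reversed(dea_k)):
        if unit in antecedent_units:
            return d
    return k


# A referring expression in unit 5 whose antecedents sit in units 1 and 3, with k = 2.
print(effort({1, 3}, [3, 4], 2))  # linear domain: units 3 and 4
print(effort({1, 3}, [1, 3], 2))  # domain along the vein 1 3 5: units 1 and 3

# No antecedent in the domain: the effort is k.
print(effort({1}, [2], 1))
```

The corpus-level effort is then the sum of these values over all referring expressions, as in the second formula.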

Effort: an example

[Figure: the referring expressions of the text below laid out over units 1 to 9: Michael D. Casey, Mr. Casey, Genetic Therapy Inc., the smaller company, its, its president, Johnson & Johnson, J&J, M. James Barrett, Mr. Barrett, chairman, CEO.]

1. Michael D. Casey, a top Johnson&Johnson manager, moved to Genetic Therapy Inc., a small biotechnology concern here,

2. to become its president and chief operating officer.

3. Mr. Casey, 46 years old, was president of J&J's McNeil Pharmaceutical subsidiary,

4. which was merged with another J&J unit, Ortho Pharmaceutical Corp., this year in a cost-cutting move.

5. Mr. Casey succeeds M. James Barrett, 50, as president of Genetic Therapy.

6. Mr. Barrett remains chief executive officer

7. and becomes chairman.

8. Mr. Casey said

9. he made the move to the smaller company.

Efforts

[Figure: effort (vertical axis, 0 to 8000) against E-DEA size (horizontal axis), with one curve for the VT process and one for the linear process.]

The account of VT on coherence

• Veins give a natural way to generalize Centering from local to global

Centering Rule 2: transitions

                   Cb(u) = Cb(u-1)   Cb(u) ≠ Cb(u-1)
Cb(u) = Cp(u)      CONTINUING        SMOOTH SHIFT
Cb(u) ≠ Cp(u)      RETAINING         ABRUPT SHIFT

CON > RET > SSH > ASH
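The transition table can be read as a two-way test, sketched below. The function name `transition` is an assumption made for the illustration; centers are represented by plain strings, and a missing Cb by `None`.

```python
def transition(cb, cb_prev, cp):
    """Classify the transition into a unit from its Cb, the previous Cb and its Cp."""
    if cb is None:
        return "NO CB"
    if cb == cb_prev:
        return "CONTINUING" if cb == cp else "RETAINING"
    return "SMOOTH SHIFT" if cb == cp else "ABRUPT SHIFT"


print(transition("J", "J", "J"))  # same Cb, and Cb is the preferred center
print(transition("b", "b", "J"))  # same Cb, but another entity is preferred
print(transition("p", "b", "p"))  # new Cb that is also the preferred center
print(transition("b", "J", "B"))  # new Cb that is not the preferred center
```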

Vein expressions give "lines of argumentation"

1. John sold his bicycle
2. although Bill would have wanted it.
3. He obtained a good price for it,
4. which Bill could not have afforded.
5. Therefore he decided to use the money to go on a trip.

[Figure: discourse tree over units 1 to 5, annotated with the vein expressions V(1) = 1 3 5, V(2) = 1 2 3 5, V(3) = 1 3 5, V(4) = 1 3 4 5, V(5) = 1 3 5.]

Unit 1 (V = 1 3 5):

1. John sold his bicycle
3. He obtained a good price for it,
5. Therefore he decided to use the money to go on a trip.

Lines of argumentation

Unit 2 (V = 1 2 3 5):

1. John sold his bicycle
2. although Bill would have wanted it.
3. He obtained a good price for it,
5. Therefore he decided to use the money to go on a trip.

Lines of argumentation

Unit 3 (V = 1 3 5):

1. John sold his bicycle
3. He obtained a good price for it,
5. Therefore he decided to use the money to go on a trip.

Lines of argumentation

Unit 4 (V = 1 3 4 5):

1. John sold his bicycle
3. He obtained a good price for it,
4. which Bill could not have afforded.
5. Therefore he decided to use the money to go on a trip.

Lines of argumentation

Unit 5 (V = 1 3 5):

1. John sold his bicycle
3. He obtained a good price for it,
5. Therefore he decided to use the money to go on a trip.

Computation of longest argumentation lines (al)

u   V(u)      dea(u)   al
1   1 3 5     1        -
2   1 2 3 5   1 2      1 2
3   1 3 5     1 3      -
4   1 3 4 5   1 3 4    1 3 4
5   1 3 5     1 3 5    1 3 5
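The table can be reproduced with a short sketch. The reading assumed here is that an argumentation line is "longest" when its DEA is not a proper prefix of the DEA of another unit; the slides give only the resulting table, so this criterion and the helper names are an interpretation.

```python
# Vein expressions V(u) from the table above.
VEINS = {1: [1, 3, 5], 2: [1, 2, 3, 5], 3: [1, 3, 5], 4: [1, 3, 4, 5], 5: [1, 3, 5]}


def dea(u):
    """Prefix of the vein of u that ends at u."""
    vein = VEINS[u]
    return vein[: vein.index(u) + 1]


def is_proper_prefix(a, b):
    """True when list a is strictly shorter than b and b starts with a."""
    return len(a) < len(b) and b[: len(a)] == a


DEAS = {u: dea(u) for u in VEINS}

# Keep a DEA only if no other DEA extends it (assumed criterion for "longest").
LONGEST = {
    u: d
    for u, d in DEAS.items()
    if not any(is_proper_prefix(d, other) for other in DEAS.values())
}

for u in sorted(VEINS):
    print(u, VEINS[u], DEAS[u], LONGEST.get(u, "-"))
```

Units 1 and 3 drop out because their domains are extended by those of units 2, 4 and 5.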

Evaluating the coherence of a discourse

• A smoothness score:
– CONTINUING = 4
– RETAINING = 3
– SMOOTH SHIFT = 2
– ABRUPT SHIFT = 1
– NO Cb = 0

• A global smoothness score: summing up the scores of all units

The second conjecture (on coherence)

• The global smoothness score of a discourse when computed following VT is at least as high as the score computed following CT.

• But segments, as considered by Centering, are typically developed along veins.

• When segment boundaries are crossed in a linear reading, transitions are usually abrupt.

• Therefore, what we claim here is that long-distance transitions, as computed along veins, are systematically smoother than accidental transitions at segment boundaries.

Transitions and scores on a linear adjacency metric

J = [John], b = [John's bicycle], B = [Bill], p = [price], m = [the money], t = [a trip]

Unit    1      2      3         4      5
Cf      J, b   B, b   J, p, b   p, B   J, m, t
Cb      J      b      b         p      -
Trans          ASH    RET       SSH    No Cb
Score          1      3         2      0

Global: 6/4 = 1.5

Transitions and scores on a hierarchical adjacency metric

Units 1 2:

Unit    1      2
Cf      J, b   B, b
Cb      J      b
Trans          ASH
Score          1

Units 1 3 4:

Unit    1      3         4
Cf      J, b   J, p, b   p, B
Cb      J      J         p
Trans          CON       SSH
Score          4         2

Units 1 3 5:

Unit    1      3         5
Cf      J, b   J, p, b   J, m, t
Cb      J      J         J
Trans                    CON
Score                    4

Global: 11/4 = 2.75
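Both global scores can be recomputed from the Cf lists with the sketch below. It assumes that Cb(u) is the highest-ranked element of the Cf list of the reference unit that also appears in Cf(u), that the first unit takes its preferred center as Cb (as the tables show for unit 1), and that the global score is the sum of the transition scores divided by the number of transitions. The dictionaries `LINEAR_PREV` and `VEIN_PREV` name the reference unit of each unit and are written by hand from the DEAs.

```python
SCORES = {"CON": 4, "RET": 3, "SSH": 2, "ASH": 1, "NO CB": 0}

# Forward-looking centers of the five units, highest-ranked first.
CF = {1: ["J", "b"], 2: ["B", "b"], 3: ["J", "p", "b"], 4: ["p", "B"], 5: ["J", "m", "t"]}

LINEAR_PREV = {2: 1, 3: 2, 4: 3, 5: 4}  # previous unit in the text
VEIN_PREV = {2: 1, 3: 1, 4: 3, 5: 3}    # previous unit on dea(u)


def global_score(cf, prev_of):
    """Average smoothness score over all transitions, given each unit's reference unit."""
    first = min(cf)
    cb = {first: cf[first][0]}  # assumption taken from the tables: Cb(1) = J
    total = 0
    for u in sorted(cf):
        if u == first:
            continue
        prev = prev_of[u]
        # Highest-ranked center of the reference unit that is realised in u.
        cb[u] = next((e for e in cf[prev] if e in cf[u]), None)
        if cb[u] is None:
            label = "NO CB"
        elif cb[u] == cb[prev]:
            label = "CON" if cb[u] == cf[u][0] else "RET"
        else:
            label = "SSH" if cb[u] == cf[u][0] else "ASH"
        total += SCORES[label]
    return total / (len(cf) - 1)


print(global_score(CF, LINEAR_PREV))  # 1.5
print(global_score(CF, VEIN_PREV))    # 2.75
```

The only difference between the two runs is the choice of reference unit, which is what the hierarchical adjacency metric changes.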

Verifying the second conjecture

Source     No. of transitions   CT score   Average CT score per transition   VT score   Average VT score per transition
English    59                   76         1.25                              84         1.38
French     47                   109        2.35                              116        2.47
Romanian   65                   142        2.18                              152        2.34
Total      173                  327        1.89                              352        2.03
