Summarizing documents based on cue-phrases and references
![Page 1: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/1.jpg)
Summarizing documents based on cue-phrases and references
Dan Cristea, Oana Postolache,
Georgiana Puşcaşu, Laurenţiu Ghetu
“Alexandru Ioan Cuza” University of Iasi
Romania
![Page 2: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/2.jpg)
Goal: coherent focused summaries
What is a focused summary?
- it reveals in brief what the document says about the key entity (the focus), within the context of the whole document
Why focused summaries? For example, when searching the web for an entity:
- avoid browsing a huge list of links to documents that mention the entity (as returned by an ordinary search engine)
- read abstracts that mention the searched entity
- if it is of minor importance in a document, the searched entity will not appear in an ordinary abstract
![Page 3: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/3.jpg)
The idea
[Diagram: cue-phrases and anaphoric references → discourse structure → (VT) → summary]
![Page 4: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/4.jpg)
The proposed method (1)
Preparatory phases:
- POS-tagging
- Syntactic tagging done by an FDG parser
- NP-tagging
![Page 5: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/5.jpg)
The proposed method (2)
Step 1: segmentation into elementary discourse units (edu-s)
[Maria went alone to the market]1 [because Simon had to stay at home with the baby.]2 [Simon is a good friend of mine]3 [and he also helped me in a number of situations.]4 [For instance he was very helpful]5 [when I had the problem with the car.]6 [I think she has a lot of trust in him to let him alone with the child.]7 [You know how Maria is:]8 [she is not very hurried to give credit to anybody.]9
![Page 6: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/6.jpg)
The proposed method (2)
Step 2: building-up sentence level discourse trees (sdt-s)
[Figure: sentence-level tree over edu-s 5 and 6, built from the marker patterns "– when –" and "– for instance –"]
![Page 7: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/7.jpg)
The proposed method (2)
Step 3: anaphora resolution
![Page 8: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/8.jpg)
The proposed method (2)
Step 4: integration of sdt-s in a global structure
[Figure: sdt_i is adjoined at a foot node (*) of the partial discourse tree pdt_{i-1}, giving pdt_i]
![Page 9: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/9.jpg)
The proposed method (2)
Step 5: generating the summary
![Page 10: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/10.jpg)
Step 1: Text segmentation method
• Identification of finite verbs
• Extraction of the FDG sub-tree rooted in each finite verb (if the FDG tagging is correct, every sub-tree represents a clause)
• Grouping clauses, if necessary, into discourse units
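The first two bullets can be sketched on a toy dependency parse. This is only an illustration under stated assumptions: the `Token` record, the `is_finite` flag and the hand-written head indices are hypothetical stand-ins for real FDG parser output, and the third bullet (grouping clauses into discourse units) is not modelled. Each token is assigned to its closest finite-verb ancestor, which amounts to taking the sub-tree of each finite verb minus the sub-trees of any embedded finite verbs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    index: int            # position in the sentence
    form: str             # surface word
    head: Optional[int]   # index of the governing token, None for the root
    is_finite: bool       # True for a finite verb (assumed to come from the tagger)


def governing_finite_verb(token: Token, tokens: List[Token]) -> int:
    """Climb the dependency links until a finite verb is reached."""
    current = token
    while not current.is_finite:
        if current.head is None:
            raise ValueError(f"no finite verb governs {token.form!r}")
        current = tokens[current.head]
    return current.index


def segment(tokens: List[Token]) -> List[str]:
    """Return one clause per finite verb, in text order."""
    clauses: Dict[int, List[str]] = {}
    for tok in tokens:  # tokens are visited in text order
        clauses.setdefault(governing_finite_verb(tok, tokens), []).append(tok.form)
    # dicts keep insertion order, so clauses are ordered by their first token
    return [" ".join(words) for words in clauses.values()]


# Hand-made (hypothetical) parse of the first example sentence:
# (form, head index, is_finite)
RAW = [
    ("Maria", 1, False), ("went", None, True), ("alone", 1, False),
    ("to", 1, False), ("the", 5, False), ("market", 3, False),
    ("because", 8, False), ("Simon", 8, False), ("had", 1, True),
    ("to", 10, False), ("stay", 8, False), ("at", 10, False),
    ("home", 11, False), ("with", 10, False), ("the", 15, False),
    ("baby", 13, False),
]
TOKENS = [Token(i, form, head, finite) for i, (form, head, finite) in enumerate(RAW)]

for clause in segment(TOKENS):
    print(clause)
```

With this parse the two clauses correspond to edu-s 1 and 2 of the running example. A wrong head link in the parse would move words into the wrong clause.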
![Page 11: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/11.jpg)
Step 2: Inference of the sdt-s (1)
Cue words or phrases (markers)
Inter-edu-s local dependencies
Sentence level discourse trees
Inner nodes labeled with markers
Terminal nodes labeled with edu-s
![Page 12: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/12.jpg)
Step 2: Inference of the sdt-s (2)
Inferring the sdt = finding the proper arguments and nuclearities
Cue-phrases usually suggest patterns for the placement of the connected arguments, nuclearity included:
– so –
because –, –
– and –
John is determined to pass the NLP exam so, because he has missed many courses and was only vaguely implicated at the working sessions, he will have a hard time until summer.
Edu labels on the slide: 1, 4a, 2, 3, 4b
Marker skeleton: 1 so, because 2 and 3, 4
Ambiguities:
- [1] so [2,3,4]
- because [2,3,4], [2,3,4]
- [1,2] and [3,4]
![Page 13: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/13.jpg)
Step 2: Inference of the sdt-s (3)
Consistency constraints for elementary sdt-s
The “nesting-arguments” rule
If an edu $x \in \text{sub-tree}(marker_i) \cap \text{sub-tree}(marker_j)$ with $i \neq j$, then one marker is in the other one’s sub-tree.
This rule states that a tree cannot have two inner nodes that cover crossing text spans on the terminal frontier.
[Figure: two inner nodes, marker_i and marker_j, whose text spans cross]
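The rule can be checked mechanically once each marker is associated with the span of edu-s its sub-tree covers. The sketch below is only an illustration: the representation of a sub-tree as an inclusive `(first_edu, last_edu)` pair and the function names are assumptions, not part of the presented system.

```python
from itertools import combinations
from typing import Dict, Tuple

# Inclusive (first_edu, last_edu) span covered by a marker's sub-tree (assumed encoding)
Span = Tuple[int, int]


def spans_cross(a: Span, b: Span) -> bool:
    """True when the spans share an edu but neither contains the other."""
    overlap = a[0] <= b[1] and b[0] <= a[1]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not (a_in_b or b_in_a)


def obeys_nesting_rule(spans: Dict[str, Span]) -> bool:
    """Every pair of marker spans must be either disjoint or nested."""
    return not any(spans_cross(a, b) for a, b in combinations(spans.values(), 2))


# One illustrative reading of "1 so, because 2 and 3, 4": the spans nest.
print(obeys_nesting_rule({"so": (1, 4), "because": (2, 4), "and": (2, 3)}))
# A hypothetical assignment with crossing spans, which the rule excludes.
print(obeys_nesting_rule({"because": (2, 4), "and": (1, 3)}))
```

Identical spans are treated as nested here. Deciding which of the two markers then dominates the other is left to the cue-phrase patterns.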
![Page 14: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/14.jpg)
Step 3: Anaphora resolution
The AR-engine
AR-engine is a general framework for anaphora resolution, able to accommodate different AR-models.
[Diagram: text → AR-engine (configured with AR-model1, AR-model2 or AR-model3) → anaphoric links]
![Page 15: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/15.jpg)
The three-layered anaphora resolution process
- text layer: reference expressions (RE), e.g. REa, REb, REc, REd, REx
- projection layer: projected structures (PS), e.g. PSx
- semantic layer: discourse entities (DE), e.g. DE1, DEj, DEm
[Figure: the three layers drawn one above the other, with links between REs, PSs and DEs]
![Page 16: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/16.jpg)
What is an AR-model?
[Figure: the three-layer diagram (text layer with REa to REx, projection layer with PSx, semantic layer with DE1, DEj, DEm), annotated with the components of an AR-model:]
- primary attributes
- knowledge sources
- heuristics/rules
- domain of referential accessibility
![Page 17: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/17.jpg)
Types of anaphora resolved
• Common nouns referring to proper nouns
• Common nouns with different lemmas
• Pronominal references
![Page 18: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/18.jpg)
Step 4: Compiling the final discourse structure (1)
A discourse structure tree must be derived by combining the sdt-s
(a = Maria, b = Simon, c = the child, d = I, empty = any other REs)
[Figure: the five sdt-s, each edu annotated with the entities it mentions: sdt1 = 1 because 2; sdt2 = 3 and 4; sdt3 = for instance, 5 when 6; sdt4 = 7; sdt5 = 8 : 9]
![Page 19: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/19.jpg)
Step 4: Compiling the final discourse structure (2)
pdt1 = sdt1
pdt1 + sdt2 => pdt2
[Figure: sdt2 (3 and 4) is adjoined to pdt1 (1 because 2) at edu 2, under a new "…" node]
![Page 20: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/20.jpg)
Step 4: Compiling the final discourse structure (3)
pdt2 + sdt3 => pdt3
[Figure: sdt3 (for instance, 5 when 6) is adjoined to pdt2 at edu 4, under a new "…" node]
![Page 21: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/21.jpg)
Step 4: Compiling the final discourse structure (4)
pdt3 + sdt4 => pdt4
[Figure: sdt4 (edu 7) is adjoined to pdt3 under a new "…" node placed above the sub-tree that contains edu-s 2 to 6]
![Page 22: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/22.jpg)
Step 4: Compiling the final discourse structure (5)
pdt4 + sdt5 => pdt5
[Figure: sdt5 (8 : 9) is adjoined to pdt4 at edu 7, under a new "…" node]
![Page 23: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/23.jpg)
Step 5: Generating the summary (1)
Veins Theory
• Head expression of a node: the sequence of the most important units within the corresponding span of text:
– the head of a terminal node: its label
– the head of a non-terminal node: the concatenation of the head expressions of the nuclear children
• The important units are projected up to the level where the corresponding span is seen as a satellite
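The two-case definition of the head expression is a short recursion. The sketch below is illustrative only: the `Node` class is a hypothetical encoding, and the nuclearity flags for the two small trees are read off the head values shown elsewhere in the slides (H=1 for the "because" tree, H=3,4 for the "and" tree), not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                  # edu number for a leaf, marker for an inner node
    nuclear: bool = True        # role of this node with respect to its parent
    children: List["Node"] = field(default_factory=list)


def head(node: Node) -> List[int]:
    """Head expression: own label for a leaf, heads of nuclear children otherwise."""
    if not node.children:
        return [int(node.label)]
    units: List[int] = []
    for child in node.children:
        if child.nuclear:
            units.extend(head(child))
    return units


# "1 because 2": edu 1 is the nucleus, edu 2 the satellite (assumed from H=1).
because = Node("because", children=[Node("1"), Node("2", nuclear=False)])
# "3 and 4": both edu-s are nuclear (assumed from H=3,4).
and_ = Node("and", children=[Node("3"), Node("4")])

print(head(because))
print(head(and_))
```

Because satellites contribute nothing, a unit stops being projected upward at the first ancestor that is itself a satellite, which is what the last bullet describes.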
![Page 24: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/24.jpg)
Step 5: Generating the summary (2)
Computing head expressions
[Figure: the final tree annotated with head expressions. Terminal nodes: H=1 to H=9, each edu being its own head. Inner nodes: H=1 (root, "because"), H=2,7, H=2, H=3,4 ("and"), H=4, H=5, H=7, H=8,9 (":")]
![Page 25: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/25.jpg)
Step 5: Generating the summary (3)
Veins
To understand a piece of text in the context of the whole discourse, one needs the significant units within the span together with other surrounding units
Vein expression of a node: the sequence of units that are required to understand the span of text covered by the node, in the context of the whole discourse
![Page 26: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/26.jpg)
Step 5: Generating the summary (4)
Computing vein expressions
Vein expressions are computed top-down, starting with the root (the vein expression of the root is its head expression). For a parent node with vein expression V=v:
- a nuclear node with no satellites to the left gets V=v
- a satellite node on the right, with head expression H=h, gets V=seq(h, v)
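A minimal sketch of this top-down computation follows, restricted to the two configurations named on the slide; anything else raises an error rather than guessing a rule. The tuple encoding of the tree is hypothetical, `seq` is assumed to merge units in text order, and the shape and nuclearity of the example tree are reconstructed from the head and vein values shown in the slides, so they should be read as an assumption.

```python
from typing import Dict, List, Optional

# A tree is an edu number (leaf) or a tuple (marker, [(child, is_nuclear), ...]).


def head(tree) -> List[int]:
    """Head expression: the leaf itself, or the heads of the nuclear children."""
    if isinstance(tree, int):
        return [tree]
    _, children = tree
    return [u for child, nuclear in children if nuclear for u in head(child)]


def seq(h: List[int], v: List[int]) -> List[int]:
    """Merge two unit sequences in text order (assumed meaning of seq)."""
    return sorted(set(h) | set(v))


def veins(tree, vein: Optional[List[int]] = None,
          out: Optional[Dict[int, List[int]]] = None) -> Dict[int, List[int]]:
    """Map every edu to its vein expression, computed top-down."""
    if out is None:
        out = {}
    if vein is None:
        vein = head(tree)            # the vein of the root is its head
    if isinstance(tree, int):
        out[tree] = vein
        return out
    _, children = tree
    roles = [nuclear for _, nuclear in children]
    if roles != sorted(roles, reverse=True):
        # a satellite to the left of a nucleus: rule not given on the slide
        raise NotImplementedError("left satellites are not covered here")
    for child, nuclear in children:
        child_vein = vein if nuclear else seq(head(child), vein)
        veins(child, child_vein, out)
    return out


# Reconstructed final tree (True = nuclear, False = satellite).
E = ("for instance / when", [(5, True), (6, False)])
D = ("…", [(4, True), (E, False)])
AND = ("and", [(3, True), (D, True)])
B = ("…", [(2, True), (AND, False)])
F = (":", [(8, True), (9, True)])
C = ("…", [(7, True), (F, False)])
A = ("…", [(B, True), (C, True)])
ROOT = ("because", [(1, True), (A, False)])

for edu, vein in sorted(veins(ROOT).items()):
    print(edu, vein)
```

Under these assumptions the printed values agree with the per-edu vein expressions listed on the slide "The summaries".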
![Page 27: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/27.jpg)
Step 5: Generating the summary (5)
Vein expressions
[Figure: the final tree annotated with vein expressions, from V=1 at the root down to the terminal nodes; the vein expressions of the edu-s are tabulated on the next slide]
![Page 28: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/28.jpg)
The summaries
edu   Vein expression
1 {1}
2 {1,2,7}
3 {1,2,3,4,7}
4 {1,2,3,4,7}
5 {1,2,3,4,5,7}
6 {1,2,3,4,5,6,7}
7 {1,2,7}
8 {1,2,7,8,9}
9 {1,2,7,8,9}
• Maria is referred to in edu-s 1, 7, 8, 9 => summary focused on Maria: {1,2,7,8,9}
• Simon is referred to in edu-s 2, 3, 4, 5, 7 => summary focused on Simon: {1,2,3,4,5,7}
• The child is referred to in edu-s 2, 7 => summary focused on the child: {1,2,7}
• I is referred to in edu-s 3, 4, 6, 7 => summary focused on I: {1,2,3,4,5,6,7}
Summary focused on Maria: Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child. You know how Maria is: she is not very hurried to give credit to anybody.
Summary focused on the child: Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child.
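The step from the vein table to a focused summary can be sketched as follows. The slide does not spell out how the vein expressions of the mentioning edu-s are combined; taking their union is an assumption, chosen because it reproduces all four sets listed above. The edu boundaries in `EDUS` are likewise inferred from the markers used in the trees, and all names in the code are hypothetical.

```python
from typing import Dict, List, Set

# Vein expression of each edu, copied from the table on this slide.
VEINS: Dict[int, Set[int]] = {
    1: {1}, 2: {1, 2, 7}, 3: {1, 2, 3, 4, 7}, 4: {1, 2, 3, 4, 7},
    5: {1, 2, 3, 4, 5, 7}, 6: {1, 2, 3, 4, 5, 6, 7}, 7: {1, 2, 7},
    8: {1, 2, 7, 8, 9}, 9: {1, 2, 7, 8, 9},
}

# Edu-s in which each entity is referred to, copied from the bullets.
MENTIONS: Dict[str, List[int]] = {
    "Maria": [1, 7, 8, 9],
    "Simon": [2, 3, 4, 5, 7],
    "the child": [2, 7],
    "I": [3, 4, 6, 7],
}

# Text of each edu (boundaries inferred, see the lead-in).
EDUS: Dict[int, str] = {
    1: "Maria went alone to the market",
    2: "because Simon had to stay at home with the baby.",
    3: "Simon is a good friend of mine",
    4: "and he also helped me in a number of situations.",
    5: "For instance he was very helpful",
    6: "when I had the problem with the car.",
    7: "I think she has a lot of trust in him to let him alone with the child.",
    8: "You know how Maria is:",
    9: "she is not very hurried to give credit to anybody.",
}


def focused_summary(entity: str) -> List[int]:
    """Union of the vein expressions of the edu-s that mention the entity."""
    units: Set[int] = set()
    for edu in MENTIONS[entity]:
        units |= VEINS[edu]
    return sorted(units)


def summary_text(entity: str) -> str:
    """Concatenate the selected edu-s in text order."""
    return " ".join(EDUS[edu] for edu in focused_summary(entity))


for name in MENTIONS:
    print(name, focused_summary(name))
print(summary_text("the child"))
```

Since every vein here contains edu 1, each focused summary opens with the same clause, and an entity mentioned only in peripheral edu-s still gets the context needed to read them.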
![Page 29: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/29.jpg)
Results
Segmentation step: The results show that, when the input contains errors made by the FDG parser, the precision and recall of the segmentation method are around 75%. When the input is corrected (that is, when all words are correctly linked to one another), precision and recall reach 100%.
Anaphora resolution step: The best results showed 100% precision, with recall ranging from 70% to 100%. These figures should be treated with caution because the corpus we used was small.
![Page 30: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/30.jpg)
Conclusions
• The proposed method is based on an earlier investigation, which showed a correlation between references and vein structure (antecedents can be found along veins; 99.1% of references obey this conjecture)
• It is a deterministic method in the sense that only one tree is obtained
• Degrees of non-determinism show up at:
- building sdt-s, due to different cue-phrase patterns
- combining sdt-s into a final discourse tree
![Page 31: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/31.jpg)
Further work
• Establish how far the method can be trusted overall
• Improve the method of building the global structure (scores for the types of antecedents)
• Transform it, by using CT, into a beam-search type of processing
• Derive more sophisticated sdt integration rules by learning
• Represent only vein expressions, not the entire tree
![Page 32: Summarizing documents based on cue-phrases and references](https://reader035.vdocuments.mx/reader035/viewer/2022081515/56816807550346895ddd8994/html5/thumbnails/32.jpg)
Thank you!