Get To The Point: Summarization with Pointer-Generator Networks
Abigail See*, Peter J. Liu, Christopher D. Manning (*work done partially during an internship at Google Brain)
Abstractive Summarization with RNNs
• Recurrent Neural Networks (RNNs) provide a potentially powerful solution for abstractive summarization.
• Using the attention mechanism (see below), they can generate new words (Germany beat Argentina 2-0) by attending to relevant words (victorious, win) in the source text.
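To make the attention step concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention of the kind this model uses. The weight names W_h, W_s, v, b are illustrative, not the authors' code, and shapes assume row-vector states:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def attention_step(enc_states, dec_state, W_h, W_s, v, b):
    """One decoder step of additive attention (illustrative sketch).

    enc_states: (src_len, hidden) encoder hidden states h_i
    dec_state:  (hidden,) decoder state s_t
    """
    # Score each source position: e_i = v . tanh(W_h h_i + W_s s_t + b)
    scores = np.tanh(enc_states @ W_h + dec_state @ W_s + b) @ v
    attn = softmax(scores)         # attention distribution over source words
    context = attn @ enc_states    # context vector: attention-weighted sum of encoder states
    return attn, context
```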
Acknowledgements: (i) NVIDIA Corporation (ii) DARPA Deep Exploration and Filtering of Text (DEFT) Program under AFRL contract no. FA8750-13-2-0040
[Figure: baseline sequence-to-sequence model with attention. Encoder hidden states run over the source text "Germany emerge victorious in 2-0 win against Argentina on Saturday ..."; decoder hidden states run over the partial summary "<START> Germany". The attention distribution over the source yields a context vector, which is combined with the decoder state to produce a vocabulary distribution (spanning the fixed vocabulary, "a" ... "zoo"); the next word emitted is "beat".]
Two common problems:
1. Summaries repeat themselves (e.g. Germany beat Germany beat Germany beat ...)
2. Summaries reproduce factual details inaccurately (e.g. Germany beat Argentina 3-2)
Easier Copying with Pointer-Generator Network
Problem: Factual details (especially rare and OOV words) are copied inaccurately.
Solution: A hybrid network that can copy via pointing, or generate from a fixed vocabulary.
[Figure: pointer-generator network on the same example. Encoder hidden states run over the source text "Germany emerge victorious in 2-0 win against Argentina on Saturday ..."; decoder hidden states run over the partial summary "<START> Germany beat". The attention distribution and the vocabulary distribution ("a" ... "zoo") are mixed, weighted by p_gen, into a final distribution; OOV words such as "2-0" receive probability only through the copy (attention) component, and the next word emitted is "Argentina".]
• For each summary word, the network first calculates the generation probability p_gen.
• p_gen interpolates between copying from the attention distribution a and generating from the vocabulary distribution P_vocab (see the sketches below).
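In the paper, p_gen is a learned "soft switch": a sigmoid over a linear function of the context vector, the decoder state, and the decoder input. A minimal NumPy sketch (the parameter names w_c, w_s, w_x, b are mine, not the authors'):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generation_prob(context, dec_state, dec_input, w_c, w_s, w_x, b):
    """Generation probability p_gen (sketch of the soft switch).

    p_gen = sigmoid(w_c . h*_t + w_s . s_t + w_x . x_t + b), a scalar
    gate computed from the context vector h*_t, the decoder state s_t
    and the decoder input x_t.
    """
    return sigmoid(context @ w_c + dec_state @ w_s + dec_input @ w_x + b)
```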
Best of both worlds: abstractive (generating) and extractive (copying)
Advantages:
• Faster to train
• Easy to accurately reproduce phrases
• Can copy OOV words (don't need a large vocabulary)
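The mixing step of equation (1) is simple to implement. Below is a minimal NumPy sketch, not the authors' TensorFlow code; the "extended vocabulary" indexing of in-article OOV words is bookkeeping assumed from the paper's description:

```python
import numpy as np

def final_distribution(p_gen, p_vocab, attention, src_ids, extended_vocab_size):
    """Mix the generation and copy distributions (sketch of Eq. 1).

    p_gen:     scalar generation probability in [0, 1]
    p_vocab:   (vocab_size,) distribution over the fixed vocabulary
    attention: (src_len,) attention distribution over source positions
    src_ids:   (src_len,) source-token ids in the extended vocabulary
               (fixed vocab plus in-article OOV words)
    """
    # The generator contributes p_gen of the probability mass.
    p_final = np.zeros(extended_vocab_size)
    p_final[: len(p_vocab)] = p_gen * p_vocab
    # The copier contributes (1 - p_gen): attention mass at every
    # position where a word occurs accumulates onto that word's id.
    np.add.at(p_final, src_ids, (1.0 - p_gen) * attention)
    return p_final

# Toy example: 5-word vocab, 4 source tokens, one in-article OOV (id 5).
p_vocab = np.array([0.1, 0.4, 0.2, 0.2, 0.1])
attention = np.array([0.7, 0.1, 0.1, 0.1])
src_ids = np.array([5, 1, 2, 1])
p = final_distribution(0.3, p_vocab, attention, src_ids, 6)
print(p, p.sum())  # a valid distribution: sums to 1.0
```

Note how the OOV token (id 5) can only receive probability through the copy term, which is exactly why the pointer handles rare and out-of-vocabulary words.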
Eliminating Repetition with Coverage
Problem: Summaries are repetitive.
Solution: Penalize repeatedly attending to the same parts of the source text.
On each decoder timestep t, the coverage vector c^t tells us what has been attended to (thus summarized) so far.
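A minimal NumPy sketch of the coverage vector and coverage loss defined below (my own illustrative code, following the poster's formulas):

```python
import numpy as np

def coverage_loss(attentions):
    """Coverage loss summed over a decoded sequence (sketch).

    attentions: (T, src_len) attention distribution a^t at each decoder step.
    Returns sum over t of covloss_t = sum_i min(a_i^t, c_i^t), where the
    coverage vector c^t is the sum of attention distributions before step t.
    """
    src_len = attentions.shape[1]
    coverage = np.zeros(src_len)   # c^0 is the zero vector
    total = 0.0
    for a_t in attentions:
        # Penalize attention that lands where coverage is already high.
        total += np.minimum(a_t, coverage).sum()
        coverage += a_t            # c^{t+1} = c^t + a^t
    return total
```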
Experiments
Dataset: CNN/Daily Mail (news article → multi-sentence summary)
• Our pointer-generator + coverage model beats the best abstractive system.
Example Output
Coverage eliminates undesirable repetition
How Abstractive Is Our Output?
• Our network uses the pointer more than the generator (average p_gen = 0.17).
• It produces some novel words and phrases, but fewer than the reference summaries.
Open question: how to make the pointer-generator network more abstractive?
[Figure: bar chart of the percentage of duplicate 1-grams, 2-grams, 3-grams, 4-grams and sentences (y-axis: % that are duplicates, 0-30), comparing pointer-generator without coverage, pointer-generator with coverage, and the reference summaries.]
[Table: ROUGE results grouped into abstractive systems and extractive systems (scores not recoverable).]
Conclusion
• Pointer-generator networks enable more accurate copying, are easier to train, and can deal with OOVs.
• Coverage drastically reduces repetition.
• The ROUGE metric is of limited use for evaluating abstractive systems.
• Future work: make the pointer-generator network more abstractive.
Coverage vector (the sum of attention distributions so far):
c^t = \sum_{t'=0}^{t-1} a^{t'}
Coverage loss function, penalizing overlap between the coverage vector c^t and the new attention distribution a^t:
covloss_t = \sum_i \min(a_i^t, c_i^t)
Result: repetition is reduced to a level similar to the reference summaries.
• Extractive systems and the lead-3 baseline remain difficult to beat.
• The ROUGE metric is not robust to paraphrasing.
Yellow highlighting shows the final value of the coverage vector (i.e. what has been covered by the final summary).
The baseline model produces factual inaccuracies and UNKs.
The pointer-generator model copies accurately and deals with OOVs, but repeats itself.
The coverage model has no repetition. Green highlighting shows the generation probability p_gen.
Final distribution, the probability that the next word is w:
P(w) = p_gen P_vocab(w) + (1 - p_gen) \sum_{i: w_i = w} a_i    (1)
Here P_vocab(w) is the probability of generating w from the fixed vocabulary, and the sum collects the attention distribution everywhere w appears in the source text; the coverage loss above measures the overlap between the coverage vector and the current attention distribution.
The Need for Abstractive Summarization
Automatic text summarization is increasingly vital in the digital age. Two approaches:

Extractive summarization: select and rearrange passages from the original text.
• More restrictive
• Most past work has been extractive
• Easier to get reasonable performance

Abstractive summarization: generate novel sentences.
• More expressive
• Humans are abstractive summarizers
• More difficult and unpredictable

Abstractive summarization is essential for high-quality output.
[Figure: bar chart of the percentage of novel 1-grams, 2-grams, 3-grams, 4-grams and sentences (y-axis: % that are novel, 0-100), comparing pointer-generator with coverage, the baseline model, and the reference summaries.]