a distributional and orthographic aggregation model for ...€¦ · •reranking with frequency...
TRANSCRIPT
A Distributional and Orthographic Aggregation Model for English Derivational Morphology
Daniel Deutsch,* John Hewitt,* and Dan Roth
*equal contribution
2
Co-Authors
John HewittCo-First Author
Dan RothAdvisor
3
Derivational Morphology
employer
employemployment
intensely
intenseintensity
4
Derivational Morphology
employer
employemployment
intensely
intenseintensity
root wordtransformation
derived word
5
Derivational Morphology
employer
employemployment
intensely
intenseintensity
root wordtransformation
derived word
6
Motivation
• Machine translation
• Text simplification
• Language generation
7
Challenges
• Suffix ambiguity
• Orthographic irregularity
8
Suffix Ambiguity
“I have an observament!”
9
Suffix Ambiguity
ground
grounding
*groundation
*groundment
*groundal
“I have an observament!”
Result
10
Suffix Ambiguity
validvalidity
*validnessNominal
ground
grounding
*groundation
*groundment
*groundal
“I have an observament!”
Result
11
Orthographic Irregularity
speak speechResult
12
Orthographic Irregularity
speak
creak
speechResult
Result
13
Orthographic Irregularity
speak
creak
speech
*creech
Result
Result
*creech
14
Orthographic Irregularity
speak
creak
speech
creaking
Result
Result
*creech
15
Orthographic Irregularity
speak
creak
erupt
speech
creaking
eruption
Result
Result
Result
*creech
16
Orthographic Irregularity
speak
creak
erupt
bankrupt
speech
creaking
eruption
Result
Result
Result
Result
*creech
17
Orthographic Irregularity
speak
creak
erupt
bankrupt
speech
creaking
eruption
*bankruption
Result
Result
Result
Result
18
Orthographic Irregularity
speak speech
creak *creech
erupt eruption
bankrupt *bankruption bankruptcy
creaking
Result
Result
Result
Result
19
Model Overview
wise
Adverb+ wisely
aggregation
distributional:
orthographic irregularity
orthographic:
suffix ambiguity
20
Model Overview
wise
Adverb+
21
Model Overview
distributional:
orthographic irregularity
orthographic:
suffix ambiguity
22
Model Overview
wisely
aggregation
23
Model Overview
wise
Adverb+ wisely
aggregation
distributional:
orthographic irregularity
orthographic:
suffix ambiguity
24
Model Overview
orthographic:
suffix ambiguity
25
Orthographic Model
• Seq2Seq baseline
• Dictionary-constrained decoding
• Reranking with frequency information
26
Seq2Seq Baseline
c o m p o s e #
Re
su
lt#
c o m p o s i
27
Seq2Seq Baseline
c o m p o s e ##
Re
su
lt
28
Seq2Seq Baseline
c o m p o s e ##
29
Seq2Seq Baseline
Re
su
lt
30
Seq2Seq Baselinec o m p o s i
• Seq2Seq models generate many unattested words, but are reasonable guesses
31
Dictionary-Constrained Decoding
ground
grounding
*groundation
*groundment
*groundalResult
Suffix Ambiguity
• Seq2Seq models generate many unattested words, but are reasonable guesses
• Intuition: constrain model to only generate known words
32
Dictionary-Constrained Decoding
ground
grounding
*groundation
*groundment
*groundalResult
Suffix Ambiguity
33
Dictionary-Constrained Decoding
34
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
…
…
…
…
…
…
#
35
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
#
36
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
…
…
…
…
…
…
#
37
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
#
38
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
…
#
39
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
…
#
40
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
…
…
…
…
…
…
#
41
Dictionary-Constrained Decoding
#
a
b
ab
aa
bb
aba
abb
baa
bab
ba
…
…
…
…
…
…
#
42
Dictionary-Constrained Decoding
Search over trie
induced from dictionary
#
a
b
ab
aba
ba
#
43
Reranking with Frequency Information
refute Result
44
Reranking with Frequency Information
refution
refutation
refut
refuty
refutat
-1.1
-1.2
-4.8
-5.6
-8.7
refute Result
Model Output Model Score
45
Reranking with Frequency Information
refution
refutation
refut
refuty
refutat
-1.1
-1.2
-4.8
-5.6
-8.7
refute Result
Model Output Model Score
46
Reranking with Frequency Information
refution
refutation
refut
refuty
refutat
-1.1
-1.2
-4.8
-5.6
-8.7
5.0
14.3
7.4
0.1
8.6
refute Result
Model Output Model ScoreLog Corpus
Freq
47
Reranking with Frequency Information
Model Output
refution
refutation
refut
refuty
refutat
Model Score
-1.1
-1.2
-4.8
-5.6
-8.7
Log Corpus
Freq
5.0
14.3
7.4
0.1
8.6
Reranker
Output
refutation
refution
refut
refuty
refutat
Reranker
Score
0.5
-0.9
-0.9
-0.9
-0.9
refute Result
48
Model Overview
orthographic:
suffix ambiguity
49
Model Overview
distributional:
orthographic irregularity
• Orthographic information can be unreliable
• Semantic transformation remains the same
50
Distributional Model
*creech
speak
creak
speech
creaking
Orthographic
Irregularity
Result
Result
51
Distributional Model
Intuition
52
Distributional Model
Intuition
53
Distributional Model
Intuition
Learn non-linear function
per transformation
54
Distributional Model
Intuition
Learn non-linear function
per transformation
Independent of
orthography
55
Distributional Model
non-linear
function
56
Model Overview
distributional:
orthographic irregularity
57
Model Overview
wisely
aggregation
58
Aggregation Model
approval
Orthographic
approvation
-0.2
Distributional
approval
-0.1
non-linear
function
59
Aggregation Model
Ortho Score ScoreDistributional
-0.9
-0.3
-0.5
-0.8
-0.6
-0.8
-1.1
-0.9
approvation
bankruption
expertly
stroller
approval
bankruptcy
expertly
strolls
60
Aggregation Model
Ortho Score ScoreDistributional
-0.9
-0.3
-0.8
-0.6
-0.8
-0.9
approvation
bankruption
stroller
approval
bankruptcy
strolls
61
Aggregation Model
Ortho Score Score Aggregation Selection
approvation
bankruption
expertly
stroller
Distributional
approval
bankruptcy
expertly
strolls
-0.9
-0.3
-0.5
-0.8
-0.6
-0.8
-1.1
-0.9
approval
bankruption
expertly
stroller
Experiments
62
Dataset
Transformation Count Example
Adverb 1715
Result 1251
Agent 801
Nominal 354
recite recital
overstate overstatement
simulate simulation
wise wisely
survive survivor
yodel yodeler
effective effectiveness
pessimistic pessimism
intense intensity
Cotterell et al. 2017
63
64
Experiment Details
• 30 random restarts
• Token information: Google Book NGrams
– 360k unigram types
– Token counts aggregated
• Google News pre-trained word embeddings
• Evaluation: full-token match accuracy
65
Results Legend
Seq2Seq
Distributional
Aggregation
Dictionary-Constrained Decoding
Frequency-Based Reranking
66
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
Constrained
Cotterell et al.
2017
Cotterell et al.
2017
Constrained
67
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
68
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
Constrained
Cotterell et al.
2017
69
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
Constrained
Cotterell et al.
2017
Significant improvement when combining Dist and Seq
70
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
Constrained
Cotterell et al.
2017
Frequency statistics are a valuable signal
71
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
Constrained
Cotterell et al.
2017
Combined model still outperforms separate models
72
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
Constrained
Cotterell et al.
2017
Cotterell et al.
2017 73
Results
40
45
50
55
60
65
70
75
80
85
Dist Seq Aggr Seq+Freq Aggr+Freq
Toke
n A
ccura
cy
Unconstrained
Constrained
22% and 37% relative error reductions over Seq
74
Results by Transformation
0
10
20
30
40
50
60
70
80
90
100
Nominal Result Agent Adverb
Toke
n A
ccura
cy
Baseline
Aggr
Aggr+Freq+Dict
Cotterell et al.
2017
Baseline
Aggr
Aggr+Freq+Dict
75
Results by Transformation
0
10
20
30
40
50
60
70
80
90
100
Nominal Result Agent Adverb
Toke
n A
ccura
cy
Cotterell et al.
2017
Baseline
Aggr
Aggr+Freq+Dict
76
Results by Transformation
0
10
20
30
40
50
60
70
80
90
100
Nominal Result Agent Adverb
Toke
n A
ccura
cy
Cotterell et al.
2017
Baseline
Aggr
77
Results by Transformation
0
10
20
30
40
50
60
70
80
90
100
Nominal Result Agent Adverb
Toke
n A
ccura
cy
Aggr+Freq+Dict
Cotterell et al.
2017
78
Results by Transformation
0
10
20
30
40
50
60
70
80
90
100
Nominal Result Agent Adverb
Toke
n A
ccura
cy
Baseline
Aggr
Aggr+Freq+Dict
Cotterell et al.
2017
79
Results by Transformation
0
10
20
30
40
50
60
70
80
90
100
Nominal Result Agent Adverb
Toke
n A
ccura
cy
Baseline
Aggr
Aggr+Freq+Dict
Cotterell et al.
2017
80
Results by Transformation
0
10
20
30
40
50
60
70
80
90
100
Nominal Result Agent Adverb
Toke
n A
ccura
cy
Baseline
Aggr
Aggr+Freq+Dict
Cotterell et al.
2017
equivalent {equivalence, equivalency}Nominal
81
What does each model do well?
82
What does each model do well?
Orthographic
does best
Distributional
does best
83
What does each model do well?
Orthographic
does best
Distributional
does best
84
What does each model do well?
Orthographic
does best
Distributional
does best
85
What does each model do well?
*sponsorment
sponsorship
Orthographic
does best
Distributional
does best
86
What does each model do well?
Orthographic
does best
Distributional
does best
87
What does each model do well?
Orthographic
does best
Distributional
does best
wise wiselyAdverb
quick quicklyAdverb
88
What does each model do well?
Orthographic
does best
Distributional
does best
wise wiselyAdverb
quick quicklyAdverb
Result
renew renewal
invest investmentResult
inspire inspirationResult
89
Conclusion
• Aggregation model for English derivational morphology
• Dictionary-constrained decoding
• Frequency-based reranking
• Distributional model per-transformation
• Best open- and closed-vocabulary models demonstrate 22% and 37% reduction in error– New state-of-the-art results
90
Code & Data
Code
https://github.com/danieldeutsch/derivational-morphology
Data
https://github.com/ryancotterell/derivational-paradigms
Powered by
91
References
• Cotterell et al. 2017, Paradigm completion for derivational morphology. In EMNLP
Thank you!