Slide 1: Boosting-based parse re-ranking with subtree features
Taku Kudo, Jun Suzuki, Hideki Isozaki
NTT Communication Science Labs.
Slide 2: Discriminative methods for parsing
Discriminative methods have shown remarkable performance compared to traditional generative models, e.g., PCFG.
Two approaches:
- Re-ranking [Collins 00, Collins 02]: discriminative machine learning algorithms are used to re-rank the n-best outputs of generative/conditional parsers.
- Dynamic programming: max-margin parsing [Taskar 04].
Slide 3: Re-ranking
Let x be an input sentence and y a parse tree for x. Let G(x) be a function that returns the set of n-best results for x. A re-ranker gives a score to each candidate parse and selects the result with the highest score.
[Figure: for x = "I buy cars with money", G(x) returns the n-best parses y1, y2, y3, ... with scores 0.2, 0.5, 0.1, ...; the re-ranker selects y2]
Slide 4: Scoring with a linear model
Φ(y) ∈ R^m is a feature function that maps an output y into an m-dimensional space, e.g., Φ(y) = {0, 1, 0, 1, 0, 0, 0, 1, ...}.
w ∈ R^m is a parameter vector (weights) estimated from training data.
score(y) = w · Φ(y)
ŷ = argmax_{y ∈ G(x)} w · Φ(y)
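The linear-model scoring above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the weight values and rule-shaped feature names are invented, and sparse binary feature vectors are represented as Python sets so that w · Φ(y) reduces to summing the weights of the active features.

```python
# Minimal sketch of linear-model re-ranking (weights and feature names
# are illustrative): each candidate parse y carries a set of active
# binary features Phi(y); the re-ranker returns the candidate whose
# score w . Phi(y) is highest.

def score(w, features):
    """w: dict feature -> weight; features: set of active (binary) features."""
    return sum(w.get(f, 0.0) for f in features)

def rerank(w, n_best):
    """n_best: list of (parse, feature_set) pairs, i.e. the output of G(x)."""
    return max(n_best, key=lambda yf: score(w, yf[1]))

w = {"NP->DT NN": 0.5, "VP->V NP": 0.3, "PP-attach-noun": -0.4}
n_best = [
    ("y1", {"NP->DT NN", "PP-attach-noun"}),   # score 0.5 - 0.4 = 0.1
    ("y2", {"NP->DT NN", "VP->V NP"}),         # score 0.5 + 0.3 = 0.8
]
best, _ = rerank(w, n_best)                    # best == "y2"
```

With a sparse representation like this, scoring costs time proportional to the number of active features per candidate, which is what makes re-ranking with a compact feature set fast at test time.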
Slide 5: Two issues in the linear model [1/2]
ŷ = argmax_{y ∈ G(x)} w · Φ(y)
How to estimate the weights w? Try to minimize a loss over the given training data. The methods differ in their definition of the loss: ME (maximum entropy), SVMs, Boosting.
Slide 6: Two issues in the linear model [2/2]
ŷ = argmax_{y ∈ G(x)} w · Φ(y)
How to define the feature set Φ(y)? Use all subtrees.
Pros: a natural extension of CFG rules; can capture long contextual information.
Cons: naive enumeration gives huge complexity.
Slide 7: A question about all subtrees
Do we always need all subtrees? Only a small set of subtrees is informative; most subtrees are redundant.
Goal: automatic feature selection from all subtrees, which allows fast parsing and gives a good interpretation of the selected subtrees. Boosting meets our demand!
Slide 8: Why Boosting?
Different regularization strategies for w:
- L1 (Boosting): better when most given features are irrelevant; can remove redundant features.
- L2 (SVMs): better when most given features are relevant; uses features as much as it can.
Boosting meets our demand, because most subtrees are irrelevant and redundant.
Slide 9: RankBoost [Freund 03]
Given the current weights, update feature k with an increment δ to obtain the next weights. Select the optimal pair ⟨k, δ⟩ that minimizes the loss.
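One round of this update scheme can be sketched as follows. This is a hedged illustration, not the paper's exact derivation: it uses a pairwise exponential loss over (correct parse, competitor) pairs and a brute-force search over a small grid of candidate increments, whereas RankBoost admits closed-form choices of δ; the feature names and grid values are invented.

```python
# Sketch of one RankBoost-style round: for ranking pairs (y+, y-),
# the exponential loss is sum_i exp(-(score(y+_i) - score(y-_i))).
# We try each feature k with a few candidate increments delta and keep
# the pair <k, delta> that most reduces the loss.
import math

def exp_loss(w, pairs):
    """pairs: list of (features_of_correct_parse, features_of_competitor)."""
    def s(feats):
        return sum(w.get(f, 0.0) for f in feats)
    return sum(math.exp(-(s(fp) - s(fm))) for fp, fm in pairs)

def boosting_round(w, pairs, feature_pool, deltas=(-0.5, -0.1, 0.1, 0.5)):
    best = (None, 0.0, exp_loss(w, pairs))        # (k, delta, loss)
    for k in feature_pool:
        for d in deltas:
            trial = dict(w)
            trial[k] = trial.get(k, 0.0) + d
            loss = exp_loss(trial, pairs)
            if loss < best[2]:
                best = (k, d, loss)
    k, d, _ = best
    if k is not None:
        w[k] = w.get(k, 0.0) + d                  # apply the winning update
    return w, best

# Feature "A" appears only in correct parses, so its weight is raised.
w, (k, delta, loss) = boosting_round({}, [({"A"}, {"B"})], ["A", "B"])
```

The key point the slide makes is that each round touches exactly one coordinate of w, which is what yields the L1-style sparsity discussed on the previous slide.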
Slide 10: How to find the optimal subtree?
The set of all subtrees is huge; we need to find the optimal subtree efficiently.
A variant of branch-and-bound:
- Define a search space in which the whole set of subtrees is given.
- Find the optimal subtree by traversing this search space.
- Prune the search space with a proposed criterion.
Slide 11: Ad-hoc techniques
Size constraint: use subtrees whose size is less than s (s = 6~8).
Frequency constraint: use subtrees that occur no fewer than f times in the training data (f = 2~5).
Pseudo-iterations: after every 5 or 10 boosting iterations, we alternately perform 100 or 300 pseudo-iterations, in which the optimal subtree is selected from a cache that maintains the features explored in the previous iterations.
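The size and frequency constraints amount to a simple filter over candidate subtrees. A minimal sketch, in which the encoding of a subtree as a tuple of node labels and the example counts are invented for illustration:

```python
# Illustrative sketch of the size and frequency constraints: a candidate
# subtree is kept only if its size is at most s and it occurs at least
# f times in the training data. Subtrees are stood in for by tuples of
# node labels; real trees would carry structure as well.
from collections import Counter

def admissible(subtree_counts, s=6, f=2):
    """subtree_counts: Counter mapping a subtree to its training frequency."""
    return {t for t, c in subtree_counts.items() if len(t) <= s and c >= f}

counts = Counter({
    ("NP", "DT", "NN"): 10,                          # kept
    ("S", "NP", "VP", "V", "NP", "PP", "IN"): 9,     # size 7 > s: dropped
    ("VP", "V"): 1,                                  # frequency 1 < f: dropped
})
kept = admissible(counts)
```

Both constraints shrink the branch-and-bound search space before any gain is computed, which is why they are cheap to apply up front.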
Slide 12: Relation to previous work
Boosting vs. kernel methods [Collins 00]; Boosting vs. Data-Oriented Parsing (DOP) [Bod 98].
Slide 13: Kernels [Collins 00]
Kernel methods reduce the problem to the dual form, which depends only on dot products of two instances (parse trees).
Pros: no need to provide an explicit feature vector; dynamic programming is used to calculate dot products between trees, which is very efficient!
Cons: a large number of kernel evaluations is required at test time, so parsing is slow; it is difficult to see which features are relevant.
Slide 14: DOP [Bod 98]
DOP is not based on re-ranking; it deals with the all-subtrees representation explicitly, like our method.
Pros: high accuracy.
Cons: exact computation is NP-complete; cannot always provide a sparse feature representation; very slow, since the number of subtrees DOP uses is huge.
Slide 15: Kernels vs. DOP vs. Boosting
How to enumerate all the subtrees? Kernel: implicitly; DOP: explicitly; Boosting: explicitly.
Complexity in training: Kernel: polynomial; DOP: NP-hard; Boosting: NP-hard (worst case), branch-and-bound in practice.
Sparse feature representation: Kernel: no; DOP: no; Boosting: yes.
Parsing speed: Kernel: slow; DOP: slow; Boosting: fast.
Can we see relevant features? Kernel: no; DOP: yes, but difficult because of redundant features; Boosting: yes.
Slide 16: Experiments
WSJ parsing; shallow parsing.
Slide 17: Experiments
WSJ parsing: standard data (training: sections 2-21, test: section 23 of the PTB). Model 2 of [Collins 99] was used to obtain the n-best results, exactly the same setting as [Collins 00] (kernels).
Shallow parsing: CoNLL 2000 shared task (training: sections 15-18, test: section 20 of the PTB). A CRF-based parser [Sha 03] was used to obtain the n-best results.
Slide 18: Tree representations
WSJ parsing: lexicalized trees; each non-terminal has a special node labeled with a head word.
Shallow parsing: right-branching trees, in which adjacent phrases stand in a child/parent relation; special nodes mark the right/left boundaries.
Slide 19: Results: WSJ parsing
LR/LP = labeled recall/precision. CBs is the average number of crossing brackets per sentence; 0 CBs and 2 CBs are the percentages of sentences with 0 or at most 2 crossing brackets, respectively.
Comparable to other methods; better than the kernel method, which uses the same all-subtrees representation with a different parameter estimation.
Slide 20: Results: Shallow parsing
Fβ=1 is the harmonic mean of precision and recall.
Comparable to other methods; our method is also comparable to Zhang's method, even without extra linguistic features.
Slide 21: Advantages
Compact feature set: WSJ parsing ~8,000 features; shallow parsing ~3,000. Kernels implicitly use a huge number of features.
Parsing is very fast: WSJ parsing 0.055 sec./sentence; shallow parsing 0.042 sec./sentence (n-best parsing time is NOT included).
Slide 22: Advantages, cont'd
Sparse feature representations allow us to analyze which kinds of subtrees are relevant.
[Figure: examples of positive and negative subtrees selected for WSJ parsing and shallow parsing]
Slide 23: Conclusions
All subtrees are potentially used as features.
Boosting with L1-norm regularization performs automatic feature selection.
Branch-and-bound enables us to find the optimal subtrees efficiently.
Advantages: accuracy comparable to other parsing methods, fast parsing, and good interpretability.
Slide 24: Efficient computation
Slide 25: Rightmost extension [Asai 02, Zaki 02]
Extend a given tree of size n-1 by adding a new node to obtain trees of size n: the node is added to the rightmost path, as the rightmost sibling.
[Figure: a tree t of size 6 with labels from L = {a, b, c}; a new node 7 is attached at each position along the rightmost path, yielding the candidate trees of size 7]
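The extension step above can be sketched over a toy tree encoding. This is an illustration under assumed conventions, not the paper's code: trees are nested Python lists `[label, child1, child2, ...]` (an encoding chosen here for brevity), and each extension attaches a new node as the new rightmost child of some node on the rightmost path, so that every tree is generated exactly once.

```python
# Sketch of rightmost extension: grow a tree of size n-1 into trees of
# size n by attaching a new labeled node as the new rightmost child of
# each node on the rightmost path (root, last child, last child, ...).
import copy

def rightmost_path(tree):
    """Yield the nodes from the root down through the last children."""
    node = tree
    while True:
        yield node
        if len(node) <= 1:        # leaf: no children
            return
        node = node[-1]

def extensions(tree, labels):
    """All trees obtained by one rightmost extension of `tree`."""
    out = []
    depth = sum(1 for _ in rightmost_path(tree))
    for i in range(depth):
        for lab in labels:
            t = copy.deepcopy(tree)
            node = t
            for _ in range(i):    # walk i steps down the rightmost path
                node = node[-1]
            node.append([lab])    # attach as the new rightmost child
            out.append(t)
    return out

seed = ["a", ["b"]]               # the tree a -> b
grown = extensions(seed, ["a", "b", "c"])   # 2 attachment points x 3 labels
```

For a rightmost path of length d and label set L, one step yields d x |L| candidates, which is what makes the search space on the next slide a tree rather than a graph with duplicates.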
Slide 26: Rightmost extension, cont'd
Recursive application of rightmost extension creates a search space.
Slide 27: Pruning
For all t' ⊇ t, propose an upper bound μ(t) such that gain(t') ≤ μ(t).
We can prune the node t if μ(t) < τ, where τ is the suboptimal (best-so-far) gain.
Pruning strategy: μ(t) = 0.4 implies that the gain of any supertree of t is no greater than 0.4.
[Figure: a search space in which each node is annotated with its gain and upper bound μ; branches whose μ(t) falls below the current best gain τ are pruned]
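The traversal with this pruning rule can be sketched as follows. This is a toy illustration: the gains and bounds are invented numbers attached to an explicit search tree, standing in for the real gain function and its upper bound μ; the invariant used is exactly the slide's, i.e. gain(t') ≤ μ(t) for every supertree t' of t.

```python
# Hedged sketch of branch-and-bound over the rightmost-extension search
# space. A node t is expanded only if its upper bound mu(t) exceeds tau,
# the best gain found so far; otherwise no supertree of t can beat the
# incumbent and the whole branch is skipped.

def search(node, tau=float("-inf")):
    """node: (gain, mu, children). Returns (best_gain, nodes_visited)."""
    gain, mu, children = node
    best, visited = max(tau, gain), 1
    if mu <= best:                    # prune: bound cannot beat incumbent
        return best, visited
    for child in children:
        best, v = search(child, best)
        visited += v
    return best, visited

# Toy search space: the second branch (mu = 0.3) is pruned once tau = 0.5.
root = (0.1, 1.0, [
    (0.5, 0.6, []),                   # raises tau to 0.5
    (0.2, 0.3, [(0.25, 0.3, [])]),    # pruned: mu 0.3 <= tau 0.5
])
best, visited = search(root)          # the pruned grandchild is never visited
```

The effectiveness of the pruning depends entirely on how tight μ is: a loose bound degenerates to exhaustive enumeration of all subtrees, while a tight one is what makes the NP-hard worst case tractable in practice.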
Slide 28: Upper bound of the gain