Efficient kernels for sentence pair classification

Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete
University of Rome “Tor Vergata”, Roma, Italy


F.M.Zanzotto

University of Rome “Tor Vergata”

• Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
  – Textual Entailment Recognition
  – Machine Translation
  – Question Answering

• Classifiers need suitable feature spaces

Motivation



For example, in textual entailment…

Motivation

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”
P1 = (T1, H1)

T2: “They feed dolphins fish”
H2: “Fish eat dolphins”
P2 = (T2, H2)

T3: “Mothers feed babies milk”
H3: “Babies eat milk”
P3 = (T3, H3)

Training examples

Classification

Relevant features: feed X Y → X eat Y

First-order rules
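A minimal Python sketch of how a first-order rule such as feed X Y → X eat Y could be checked against a sentence pair. The flat, pre-chunked token lists, the helper names `match` and `rule_fires`, and the convention that upper-case symbols are variables are all assumptions made for illustration; the talk itself works on syntactic trees, not on token lists.

```python
def match(pattern, tokens, bindings):
    """Match pattern against tokens (same length), extending the variable bindings.

    Returns the extended bindings, or None if the match fails.
    """
    env = dict(bindings)
    for symbol, token in zip(pattern, tokens):
        if symbol.isupper():                      # upper-case symbol = variable (assumption)
            if env.setdefault(symbol, token) != token:
                return None                       # variable already bound to something else
        elif symbol != token:
            return None                           # lexical item does not match
    return env


def rule_fires(lhs, rhs, text, hypothesis):
    """True if lhs matches somewhere in text and rhs matches in hypothesis
    with the same variable bindings."""
    for i in range(len(text) - len(lhs) + 1):
        env = match(lhs, text[i:i + len(lhs)], {})
        if env is None:
            continue
        for j in range(len(hypothesis) - len(rhs) + 1):
            if match(rhs, hypothesis[j:j + len(rhs)], env) is not None:
                return True
    return False


LHS, RHS = ["feed", "X", "Y"], ["X", "eat", "Y"]
p1 = (["farmers", "feed", "cows", "animal extracts"], ["cows", "eat", "animal extracts"])
p2 = (["they", "feed", "dolphins", "fish"], ["fish", "eat", "dolphins"])
p3 = (["mothers", "feed", "babies", "milk"], ["babies", "eat", "milk"])

for name, (text, hypothesis) in [("P1", p1), ("P2", p2), ("P3", p3)]:
    print(name, rule_fires(LHS, RHS, text, hypothesis))
```

Under these assumptions the rule fires on P1 and P3 but not on P2, because in P2 the variables X and Y would have to swap their fillers between text and hypothesis.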



• First-order rule (FOR) feature spaces: a challenge

• Tripartite Directed Acyclic Graphs (tDAGs) as a solution:
  – for modelling FOR feature spaces
  – for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces

• An efficient algorithm for computing kernels in FOR spaces

• Experimental and comparative assessment of the computational efficiency of the proposed algorithm

In this talk…



We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function

K(P1,P2) = |S(P1) ∩ S(P2)|

which counts the first-order rules activated by both P1 and P2.

Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
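To make the definition concrete, here is a small Python sketch that computes K explicitly, by listing the rule sets and intersecting them. The rule strings and the function name `explicit_kernel` are hypothetical; in practice S(P) is far too large to enumerate, which is why the talk looks for an implicit computation.

```python
def explicit_kernel(rules_p1, rules_p2):
    """Count the rules activated by both pairs: |S(P1) ∩ S(P2)|."""
    return len(set(rules_p1) & set(rules_p2))


# Hypothetical, hand-written rule sets standing in for S(Pa) and S(Pb).
s_pa = {"feed X Y -> X eat Y", "feed X Y -> X", "X -> X"}
s_pb = {"feed X Y -> X eat Y", "X -> X", "milk -> milk"}

print(explicit_kernel(s_pa, s_pb))  # 2
```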

First-order rule (FOR) feature spaces: challenges



• … using the Kernel Trick:
  – define the kernel function K(P1, P2)
  – instead of defining the features

Observations

T1 H1

T1 H2

K(T1 H1,T1 H2)

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”

[Figure: the pair Pa = (T1, H1) drawn as two syntactic trees. Numbered placeholders (1, 2, 3) are added to the words linking T1 and H1 (“Adding placeholders”) and then propagated up the trees (“Propagating placeholders”). S(Pa) = { … } is the set of first-order rule fragments activated by Pa.]

T3: “Mothers feed babies milk”
H3: “Babies eat milk”

[Figure: the pair Pb = (T3, H3) drawn as two syntactic trees with numbered placeholders (1, 2). S(Pb) = { … } is the set of first-order rule fragments activated by Pb.]

[Figure: the sets S(Pa) and S(Pb) compared. Fragments of Pa (placeholders 1, 3) and of Pb (placeholders 1, 2) are matched against the rule drawn as a pair of trees with variables X and Y (“feed” on one side, “eat” on the other).]

K(Pa,Pb) = |S(Pa) ∩ S(Pb)|


• FOR feature spaces can be modelled with particular graphs

• We call these graphs tripartite directed acyclic graphs (tDAGs)

• Observations:
  – tDAGs are not trees
  – tDAGs can be used to model both rules and sentence pairs
  – unifying rules in sentences is a graph matching problem

A step back…



As for Feature Structures…

Tripartite Directed Acyclic Graphs (tDAG)

[Figure: two tDAGs. The first is the rule with variables X and Y linking the “feed” tree and the “eat” tree; the second is the sentence pair (T1, H1) with placeholders 1, 2, 3.]


[Figure: a tDAG built from a VP tree (“feed”) and an S tree (“eat”).]

A tripartite directed acyclic graph is a graph G = (N,E) where:
• the set of nodes N is partitioned into three sets Nt, Ng, and A
• the set of edges E is partitioned into four sets Et, Eg, EA(t), and EA(g)

where t = (Nt,Et) and g = (Ng,Eg) are two trees, and
EA(t) = {(x, y) | x ∈ Nt and y ∈ A}
EA(g) = {(x, y) | x ∈ Ng and y ∈ A}
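The definition maps directly onto a small data structure. The following Python sketch is only an illustration: the class name `TDAG`, its field names, and the node identifiers are assumptions, and the check covers the node and edge partition only, not whether t and g are actually trees.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TDAG:
    nt: frozenset       # nodes of tree t (Nt)
    ng: frozenset       # nodes of tree g (Ng)
    anchors: frozenset  # shared anchor nodes (A)
    et: frozenset       # edges inside t (Et)
    eg: frozenset       # edges inside g (Eg)
    ea_t: frozenset     # edges from t nodes to anchors (EA(t))
    ea_g: frozenset     # edges from g nodes to anchors (EA(g))

    def edges_respect_partition(self):
        """Check that Nt, Ng, A are disjoint and that each edge set links
        the node sets required by the definition. Tree-ness is not checked."""
        disjoint = not (self.nt & self.ng or self.nt & self.anchors
                        or self.ng & self.anchors)

        def within(edges, sources, targets):
            return all(x in sources and y in targets for x, y in edges)

        return (disjoint
                and within(self.et, self.nt, self.nt)
                and within(self.eg, self.ng, self.ng)
                and within(self.ea_t, self.nt, self.anchors)
                and within(self.ea_g, self.ng, self.anchors))


# The rule feed X Y -> X eat Y, with hypothetical node identifiers.
rule = TDAG(
    nt=frozenset({"t:VP", "t:VB", "t:NP1", "t:NP2"}),
    ng=frozenset({"g:S", "g:NP1", "g:VP", "g:VB", "g:NP2"}),
    anchors=frozenset({"X", "Y"}),
    et=frozenset({("t:VP", "t:VB"), ("t:VP", "t:NP1"), ("t:VP", "t:NP2")}),
    eg=frozenset({("g:S", "g:NP1"), ("g:S", "g:VP"),
                  ("g:VP", "g:VB"), ("g:VP", "g:NP2")}),
    ea_t=frozenset({("t:NP1", "X"), ("t:NP2", "Y")}),
    ea_g=frozenset({("g:NP1", "X"), ("g:NP2", "Y")}),
)

# An ill-formed variant: an EA(t) edge that starts from a node of g.
broken = replace(rule, ea_t=frozenset({("g:NP1", "X")}))

print(rule.edges_respect_partition(), broken.edges_respect_partition())
```

Keeping the anchors in their own set, separate from both trees, is what lets the same variable be shared by t and g without turning either of them into a non-tree.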

Tripartite Directed Acyclic Graphs (tDAGs)



Alternative definition

A tDAG is a pair of extended trees G = (t,g) where:
t = (Nt ∪ At, Et ∪ EA(t)) and g = (Ng ∪ Ag, Eg ∪ EA(g)).

Tripartite Directed Acyclic Graphs (tDAGs)

[Figure: the same tDAG drawn twice: as a single graph, and as a pair of extended trees with anchors X and Y.]


Computing the implicit kernel function

K(P1,P2) = |S(P1) ∩ S(P2)|

involves general graph matching. This is, in general, an exponential problem.

Yet… tDAGs are particular graphs and we can define an efficient algorithm

We will analyze the isomorphism among tDAGs and we will derive an algorithm for

Again challenges



Isomorphism between graphs

G1=(N1,E1) and G2=(N2,E2) are isomorphic if:
– |N1|=|N2| and |E1|=|E2|
– among all the bijective functions relating N1 and N2, there exists f : N1 → N2 such that:
  • for each n1 in N1, Label(n1)=Label(f(n1))
  • for each (na,nb) in E1, (f(na),f(nb)) is in E2
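The definition above can be turned into a brute-force check, sketched below in Python. It simply tries every bijection between the two node sets, so its cost grows factorially with the number of nodes; it is meant to illustrate the definition and why general graph matching is expensive, not the algorithm proposed in the talk. The function name and the toy graphs are assumptions.

```python
from itertools import permutations


def find_isomorphism(nodes1, edges1, labels1, nodes2, edges2, labels2):
    """Return a label- and edge-preserving bijection f: N1 -> N2 as a dict,
    or None if no such bijection exists."""
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return None
    nodes1 = list(nodes1)
    edge_set2 = set(edges2)
    for image in permutations(nodes2):            # every candidate bijection
        f = dict(zip(nodes1, image))
        labels_ok = all(labels1[n] == labels2[f[n]] for n in nodes1)
        edges_ok = all((f[a], f[b]) in edge_set2 for a, b in edges1)
        if labels_ok and edges_ok:
            return f
    return None


# Two toy labelled graphs: S -> NP, S -> VP, with different node names.
g1 = ([1, 2, 3], [(1, 2), (1, 3)], {1: "S", 2: "NP", 3: "VP"})
g2 = (["a", "b", "c"], [("b", "c"), ("b", "a")], {"a": "VP", "b": "S", "c": "NP"})
# Same nodes and labels as g2, but a chain S -> NP -> VP instead.
g3 = (["a", "b", "c"], [("b", "c"), ("c", "a")], {"a": "VP", "b": "S", "c": "NP"})

print(find_isomorphism(*g1, *g2))  # {1: 'b', 2: 'c', 3: 'a'}
print(find_isomorphism(*g1, *g3))  # None
```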

Isomorphism between tDAGs



Isomorphism adapted to tDAGs

G1 = (t1,g1) and G2 = (t2,g2) are isomorphic if these two properties hold:
– Partial isomorphism
  • g1 and g2 are isomorphic
  • t1 and t2 are isomorphic
  • This property generates two functions fg and ft
– Constraint compatibility
  • fg and ft are compatible on the sets of nodes A1 and A2 if, for each n ∈ A1, fg(n) = ft(n).
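Constraint compatibility is the simpler of the two properties to check. A minimal Python sketch, assuming ft and fg are given as dictionaries from the anchors of G1 to the anchors of G2 (the function name and this representation are illustrative assumptions; the pairs (1,1) and (3,2) are the placeholder pairs of the talk's feed/eat example):

```python
def compatible(ft, fg, anchors):
    """True if ft and fg send every anchor in `anchors` to the same image."""
    return all(ft[n] == fg[n] for n in anchors)


# Mappings induced on the placeholders by the two tree isomorphisms.
ft = {1: 1, 3: 2}
fg = {1: 1, 3: 2}
print(compatible(ft, fg, {1, 3}))          # True

# A hypothetical conflicting mapping, for contrast.
fg_swapped = {1: 2, 3: 1}
print(compatible(ft, fg_swapped, {1, 3}))  # False
```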


[Figure: fragments of Pa = (ta,ga), with placeholders 1 and 3, and of Pb = (tb,gb), with placeholders 1 and 2. Each fragment pairs a VP tree (VB NP NP) with an S tree (NP VP, VB NP).]

Partial isomorphism: Ct = {(1,1), (3,2)}, Cg = {(1,1), (3,2)}

Constraint compatibility: Ct = Cg


We define

K(P1,P2) = |S(P1) ∩ S(P2)|

using the isomorphism between tDAGs.

The idea: reverse the order of isomorphism detection
• First, constraint compatibility
  – Building a set C of all the relevant alternative constraints
  – Finding subsets of S(P1) ∩ S(P2) meeting a constraint c ∈ C
• Second, partial isomorphism detection

Ideas for building the kernel

[Diagram labels: alternative constraints; subsets of S(P1) ∩ S(P2); partial isomorphism]


Ideas for building the kernel

[Figure: abstract example with Pa = (ta,ga) and Pb = (tb,gb). The trees have nodes labelled A, B, C and I, M, N and carry numbered placeholders.]

C = {c1,c2} = { {(1,1),(2,2)}, {(1,1),(2,3)} }

K(Pa,Pb) = |S(Pa) ∩ S(Pb)|
[Figure: with C = {c1,c2} and c1 = {(1,1),(2,2)}, the fragment pairs in (S(Pa) ∩ S(Pb))_c1, i.e. the common fragments compatible with c1.]

Partial Isomorphism

K(Pa,Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))_c1 ∪ (S(Pa) ∩ S(Pb))_c2|

[Figure: with C = {c1,c2} and c2 = {(1,1),(2,3)}, the fragment pairs in (S(Pa) ∩ S(Pb))_c2, i.e. the common fragments compatible with c2.]

Partial Isomorphism
[Figure: the four fragment pairs in (S(Pa) ∩ S(Pb))_c1 factor into two fragments on one side and two fragments on the other side, all compatible with c1.]

(S(Pa) ∩ S(Pb))_c1 = (S(ta) ∩ S(tb))_c1 × (S(ga) ∩ S(gb))_c1

K(Pa,Pb) = |∪_{c∈C} (S(Pa) ∩ S(Pb))_c| = |∪_{c∈C} (S(ta) ∩ S(tb))_c × (S(ga) ∩ S(gb))_c|

Partial Isomorphism


The general equation

K(P1,P2) = |∪_{c∈C} (S(t1) ∩ S(t2))_c × (S(g1) ∩ S(g2))_c|

can be computed using:
1) KS (kernel function for trees), introduced in (Collins&Duffy, 2001) and refined in (Moschitti&Zanzotto, 2007)
2) The inclusion-exclusion principle

Kernel on FOR feature spaces
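The role of the inclusion-exclusion principle can be sketched in Python as follows. The size of a union of Cartesian products is obtained from the sizes of intersections, and the intersection of products A_c × B_c is the product of the intersections, so only set sizes on the t side and on the g side are needed. Here the sets are listed explicitly for illustration; in the talk's setting those sizes would come from the tree kernel KS rather than from enumerated sets. The function name and the toy fragment names are assumptions.

```python
from itertools import combinations


def union_size_of_products(parts):
    """Size of the union of A_c x B_c over all (A_c, B_c) in parts,
    computed with the inclusion-exclusion principle."""
    total = 0
    for k in range(1, len(parts) + 1):
        sign = 1 if k % 2 else -1                 # + for odd-sized subsets, - for even
        for subset in combinations(parts, k):
            a = set.intersection(*(set(p[0]) for p in subset))
            b = set.intersection(*(set(p[1]) for p in subset))
            total += sign * len(a) * len(b)       # |∩ A_c x B_c| = |∩ A_c| * |∩ B_c|
    return total


# Hypothetical fragment sets for two constraints c1 and c2.
parts = [
    ({"t1", "t2"}, {"g1", "g2"}),   # fragments compatible with c1
    ({"t1", "t3"}, {"g1", "g3"}),   # fragments compatible with c2
]

print(union_size_of_products(parts))  # 4 + 4 - 1 = 7
```

The number of terms grows exponentially with |C|, which is consistent with the talk's later measurement of execution time against the number of allowed placeholders.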



• Comparison kernel: (Zanzotto&Moschitti, Coling-ACL 2006), (Moschitti&Zanzotto, ICML 2007)

• Test-bed corpus: Recognizing Textual Entailment challenge data

Computational Efficiency Analysis



Computational Efficiency Analysis

[Figure: execution time in seconds (s) over the whole RTE2 data set, for different numbers of allowed placeholders.]



• Training: RTE 1, 2, 3
• Testing: RTE 4

Accuracy Comparison



• We reduced kernels in first-order feature spaces to graph-matching problems

• We defined a new class of graphs, tDAGs

• We presented an efficient algorithm for computing kernels in FOR feature spaces

Conclusions
