Efficient kernels for sentence pair classification

Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete, University of Rome “Tor Vergata”, Roma, Italy


Page 1: Efficient  kernels  for  sentence pair classification

Efficient kernels for sentence pair classification

Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete
University of Rome “Tor Vergata”
Roma, Italy

Page 2: Efficient  kernels  for  sentence pair classification

Motivation

• Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
  – Textual Entailment Recognition
  – Machine Translation
  – Question-Answering
• Classifiers need suitable feature spaces

F.M.Zanzotto, University of Rome “Tor Vergata”

Page 3: Efficient  kernels  for  sentence pair classification

Motivation

For example, in textual entailment…

P1 = (T1, H1)
  T1: “Farmers feed cows animal extracts”
  H1: “Cows eat animal extracts”

P2 = (T2, H2)
  T2: “They feed dolphins fish”
  H2: “Fish eat dolphins”

P3 = (T3, H3)
  T3: “Mothers feed babies milk”
  H3: “Babies eat milk”

[Figure labels: Training examples; Classification]

Relevant features: first-order rules, e.g. feed X Y → X eat Y

Page 4: Efficient  kernels  for  sentence pair classification

In this talk…

• First-order rule (FOR) feature spaces: a challenge
• Tripartite Directed Acyclic Graphs (tDAGs) as a solution:
  – for modelling FOR feature spaces
  – for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
• An efficient algorithm for computing kernels in FOR spaces
• Experimental and comparative assessment of the computational efficiency of the proposed algorithm

Page 5: Efficient  kernels  for  sentence pair classification

First-order rule (FOR) feature spaces: challenges

We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function

K(P1, P2) = |S(P1) ∩ S(P2)|

which counts the first-order rules activated by both P1 and P2, where S(P) is the set of first-order rules activated by the pair P.

Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
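
As a naive illustration of the kernel K(P1, P2), the sketch below enumerates two rule sets explicitly and counts their overlap. The string encoding of rules, the sample rules, and the name `explicit_kernel` are assumptions made for illustration only; in practice S(P) is far too large to enumerate, which is why an implicit kernel is needed.

```python
def explicit_kernel(rules_p1: set, rules_p2: set) -> int:
    """Count the first-order rules activated by both pairs: |S(P1) ∩ S(P2)|."""
    return len(rules_p1 & rules_p2)


# Hypothetical, hand-written rule sets standing in for S(Pa) and S(Pb).
s_pa = {"feed X Y -> X eat Y", "feed X animal Y -> X eat animal Y"}
s_pb = {"feed X Y -> X eat Y", "feed X milk -> X eat milk"}

print(explicit_kernel(s_pa, s_pb))  # 1
```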

Page 6: Efficient  kernels  for  sentence pair classification

Observations

• … using the kernel trick:
  – define the similarity K(P1, P2) between pairs
  – instead of defining the features

[Figure: K((T1, H1), (T1, H2)) compares two sentence pairs directly.]

Page 7: Efficient  kernels  for  sentence pair classification

First-order rule (FOR) feature spaces: challenges

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”

[Figure: syntactic parse trees of the pair Pa = (T1, H1). Numbered placeholders mark the words shared by T1 and H1 (adding placeholders) and are then propagated up the trees (propagating placeholders). S(Pa) = { … } is the resulting set of tree-pair fragments with placeholders, including the fragment that pairs the “feed” tree with the “eat” tree.]

Page 8: Efficient  kernels  for  sentence pair classification

First-order rule (FOR) feature spaces: challenges

T3: “Mothers feed babies milk”
H3: “Babies eat milk”

[Figure: syntactic parse trees of the pair Pb = (T3, H3) with numbered placeholders. S(Pb) = { … } is the resulting set of tree-pair fragments with placeholders, including the fragment that pairs the “feed” tree with the “eat” tree.]

Page 9: Efficient  kernels  for  sentence pair classification

First-order rule (FOR) feature spaces: challenges

K(Pa, Pb) = |S(Pa) ∩ S(Pb)|

[Figure: S(Pa) and S(Pb) side by side. Fragments of the two sets coincide once their numbered placeholders are replaced by variables, e.g. the tree pair for “feed” and “eat” whose NP nodes are linked by the variables X and Y.]

Page 10: Efficient  kernels  for  sentence pair classification

A step back…

• FOR feature spaces can be modelled with particular graphs
• We call these graphs tripartite directed acyclic graphs (tDAGs)
• Observations:
  – tDAGs are not trees
  – tDAGs can be used to model both rules and sentence pairs
  – unifying rules in sentences is a graph matching problem

Page 11: Efficient  kernels  for  sentence pair classification

Tripartite Directed Acyclic Graphs (tDAG)

As for Feature Structures…

[Figure: the “feed”/“eat” rule, with variables X and Y linking NP nodes of the two trees, shown next to the parse trees of “Farmers feed cows animal extracts” and “Cows eat animal extracts” with numbered placeholders.]


Page 13: Efficient  kernels  for  sentence pair classification

Tripartite Directed Acyclic Graphs (tDAGs)

[Figure: an example tDAG joining the “feed” tree and the “eat” tree through shared nodes.]

A tripartite directed acyclic graph is a graph G = (N, E) where:
• the set of nodes N is partitioned into three sets Nt, Ng, and A
• the set of edges E is partitioned into four sets Et, Eg, EA(t), and EA(g)

where t = (Nt, Et) and g = (Ng, Eg) are two trees, and
  EA(t) = {(x, y) | x ∈ Nt and y ∈ A}
  EA(g) = {(x, y) | x ∈ Ng and y ∈ A}
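
The definition can be mirrored by a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Tree` and `TDag`, the node identifiers, and the `is_well_formed` check are all assumptions introduced here. The check covers only the partition and anchor-edge conditions; it does not verify that t and g are actually trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    nodes: frozenset  # node identifiers
    edges: frozenset  # (parent, child) pairs


@dataclass(frozen=True)
class TDag:
    t: Tree             # t = (Nt, Et)
    g: Tree             # g = (Ng, Eg)
    anchors: frozenset  # A
    ea_t: frozenset     # EA(t): edges from Nt into A
    ea_g: frozenset     # EA(g): edges from Ng into A

    def is_well_formed(self) -> bool:
        nt, ng, a = self.t.nodes, self.g.nodes, self.anchors
        # Nt, Ng and A must be pairwise disjoint to partition N.
        disjoint = nt.isdisjoint(ng) and nt.isdisjoint(a) and ng.isdisjoint(a)
        # Anchor edges must leave the right tree and end in A.
        t_ok = all(x in nt and y in a for x, y in self.ea_t)
        g_ok = all(x in ng and y in a for x, y in self.ea_g)
        return disjoint and t_ok and g_ok


# Toy rule linking a "feed" tree and an "eat" tree through anchors X and Y.
t = Tree(frozenset({"VP", "VB:feed", "NP1", "NP2"}),
         frozenset({("VP", "VB:feed"), ("VP", "NP1"), ("VP", "NP2")}))
g = Tree(frozenset({"S", "NP3", "VP'", "VB:eat", "NP4"}),
         frozenset({("S", "NP3"), ("S", "VP'"),
                    ("VP'", "VB:eat"), ("VP'", "NP4")}))
rule = TDag(t, g, frozenset({"X", "Y"}),
            ea_t=frozenset({("NP1", "X"), ("NP2", "Y")}),
            ea_g=frozenset({("NP3", "X"), ("NP4", "Y")}))
print(rule.is_well_formed())  # True
```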

Page 14: Efficient  kernels  for  sentence pair classification

Tripartite Directed Acyclic Graphs (tDAGs)

Alternative definition: a tDAG is a pair of extended trees G = (t, g) where
t = (Nt ∪ At, Et ∪ EA(t)) and g = (Ng ∪ Ag, Eg ∪ EA(g)).

[Figure: the “feed”/“eat” tDAG with anchors X and Y, shown both as a single graph and as a pair of extended trees.]

Page 15: Efficient  kernels  for  sentence pair classification

Again challenges

Computing the implicit kernel function

K(P1, P2) = |S(P1) ∩ S(P2)|

involves general graph matching, which is an exponential problem.

Yet tDAGs are particular graphs, and we can define an efficient algorithm.

We will analyze the isomorphism among tDAGs and we will derive an algorithm for …

Page 16: Efficient  kernels  for  sentence pair classification

Isomorphism between tDAGs

Isomorphism between graphs

G1 = (N1, E1) and G2 = (N2, E2) are isomorphic if:
– |N1| = |N2| and |E1| = |E2|
– among all the bijective functions relating N1 and N2, there exists f : N1 → N2 such that:
  • for each n1 in N1, Label(n1) = Label(f(n1))
  • for each (na, nb) in E1, (f(na), f(nb)) is in E2
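
The definition of isomorphism between labelled graphs can be checked literally by trying every bijection. This is a brute-force sketch, factorial in the number of nodes and meant only to make the definition concrete; the function name `are_isomorphic` and the dictionary-based label encoding are assumptions, not part of the talk.

```python
from itertools import permutations


def are_isomorphic(nodes1, edges1, labels1, nodes2, edges2, labels2):
    """Try every bijection f: N1 -> N2 and test the two conditions."""
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return False
    nodes1 = list(nodes1)
    edges2 = set(edges2)
    for image in permutations(nodes2):
        f = dict(zip(nodes1, image))
        # Label(n1) = Label(f(n1)) for each n1 in N1
        labels_ok = all(labels1[n] == labels2[f[n]] for n in nodes1)
        # (f(na), f(nb)) is in E2 for each (na, nb) in E1
        edges_ok = all((f[a], f[b]) in edges2 for a, b in edges1)
        if labels_ok and edges_ok:
            return True
    return False


n1, e1 = [1, 2, 3], [(1, 2), (1, 3)]
l1 = {1: "VP", 2: "VB", 3: "NP"}
n2, e2 = ["a", "b", "c"], [("a", "c"), ("a", "b")]
l2 = {"a": "VP", "b": "NP", "c": "VB"}
print(are_isomorphic(n1, e1, l1, n2, e2, l2))  # True
```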

Page 17: Efficient  kernels  for  sentence pair classification

Isomorphism between tDAGs

Isomorphism adapted to tDAGs

G1 = (t1, g1) and G2 = (t2, g2) are isomorphic if these two properties hold:
– Partial isomorphism
  • g1 and g2 are isomorphic
  • t1 and t2 are isomorphic
  • This property generates two functions fg and ft
– Constraint compatibility
  • fg and ft are compatible on the sets of nodes A1 and A2 if, for each n ∈ A1, fg(n) = ft(n)
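
Constraint compatibility amounts to comparing the two mappings restricted to the anchors. A minimal sketch, assuming ft and fg are given as Python dictionaries defined on every anchor in A1; the names `constraint` and `compatible` and the sample placeholder numbers are introduced here for illustration.

```python
def constraint(f: dict, anchors) -> frozenset:
    """Restrict a node mapping to the anchors, as a set of (n, f(n)) pairs."""
    return frozenset((n, f[n]) for n in anchors)


def compatible(f_t: dict, f_g: dict, anchors) -> bool:
    """fg and ft are compatible if they agree on every anchor n in A1."""
    return constraint(f_t, anchors) == constraint(f_g, anchors)


a1 = {1, 3}
f_t = {1: 1, 3: 2}
f_g = {1: 1, 3: 2}
print(compatible(f_t, f_g, a1))               # True
print(compatible(f_t, {1: 2, 3: 1}, a1))      # False
```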

Page 18: Efficient  kernels  for  sentence pair classification

Isomorphism between tDAGs

[Figure: the tDAG fragments Pa = (ta, ga) and Pb = (tb, gb), each pairing a “VP → VB NP NP” tree with an “S → NP VP, VP → VB NP” tree; Pa carries placeholders 1 and 3, Pb carries placeholders 1 and 2.]

Partial isomorphism: Ct = {(1, 1), (3, 2)}, Cg = {(1, 1), (3, 2)}

Constraint compatibility: Ct = Cg

Page 19: Efficient  kernels  for  sentence pair classification

Ideas for building the kernel

We define K(P1, P2) = |S(P1) ∩ S(P2)| using the isomorphism between tDAGs.

The idea: reverse the order of isomorphism detection
• First, constraint compatibility
  – Building a set C of all the relevant alternative constraints
  – Finding subsets of S(P1) ∩ S(P2) meeting a constraint c ∈ C
• Second, partial isomorphism detection

[Diagram labels: subsets of S(P1) ∩ S(P2); alternative constraints; partial isomorphism; constraint compatibility]

Page 20: Efficient  kernels  for  sentence pair classification

Ideas for building the kernel

[Figure: two example pairs Pa = (ta, ga) and Pb = (tb, gb), built from trees over the node labels A, B, C and I, M, N with numbered placeholders.]

K(Pa, Pb) = |S(Pa) ∩ S(Pb)|

C = {c1, c2} = { {(1, 1), (2, 2)}, {(1, 1), (2, 3)} }

Page 21: Efficient  kernels  for  sentence pair classification

Ideas for building the kernel

C = {c1, c2}, with c1 = {(1, 1), (2, 2)}

[Figure: the pairs Pa and Pb, and the set (S(Pa) ∩ S(Pb))c1 of common fragments that meet constraint c1.]

K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))c1 ∪ (S(Pa) ∩ S(Pb))c2|

Page 22: Efficient  kernels  for  sentence pair classification

Ideas for building the kernel

C = {c1, c2}, with c2 = {(1, 1), (2, 3)}

[Figure: the pairs Pa and Pb, and the set (S(Pa) ∩ S(Pb))c2 of common fragments that meet constraint c2.]

K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))c1 ∪ (S(Pa) ∩ S(Pb))c2|

Page 23: Efficient  kernels  for  sentence pair classification

Ideas for building the kernel

[Figure: the fragments in (S(Pa) ∩ S(Pb))c1, factored into a set of fragments from the first trees and a set of fragments from the second trees.]

(S(Pa) ∩ S(Pb))c1 = (S(ta) ∩ S(tb))c1 × (S(ga) ∩ S(gb))c1

K(Pa, Pb) = |∪c∈C (S(Pa) ∩ S(Pb))c| = |∪c∈C (S(ta) ∩ S(tb))c × (S(ga) ∩ S(gb))c|
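
The factorisation matters because of how it is counted. Assuming the operator between the two factors is a Cartesian product (the symbol is lost in this transcript), the number of common pair fragments under one constraint c is the product of the number of common fragments in the two trees, so each factor can be counted separately. The sketch below checks this on placeholder fragment names; `pair_fragments` and the fragment labels are hypothetical.

```python
from itertools import product


def pair_fragments(t_common: set, g_common: set) -> set:
    """All (t-fragment, g-fragment) pairs that share one constraint c."""
    return set(product(t_common, g_common))


# Hypothetical fragment identifiers for (S(ta) ∩ S(tb))c and (S(ga) ∩ S(gb))c.
t_common = {"t-frag-1", "t-frag-2", "t-frag-3"}
g_common = {"g-frag-1", "g-frag-2"}

pairs = pair_fragments(t_common, g_common)
print(len(pairs) == len(t_common) * len(g_common))  # True
```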

Page 24: Efficient  kernels  for  sentence pair classification

Kernel on FOR feature spaces

The general equation

K(P1, P2) = |∪c∈C (S(t1) ∩ S(t2))c × (S(g1) ∩ S(g2))c|

can be computed using:
1) KS (kernel function for trees), introduced in (Duffy & Collins, 2001) and refined in (Moschitti & Zanzotto, 2007)
2) The inclusion-exclusion principle
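
The inclusion-exclusion principle is needed because the subsets selected by different constraints may overlap, so their sizes cannot simply be added. The sketch below applies the generic principle to explicit sets; it is not the authors' algorithm, which works with tree-kernel counts rather than enumerated sets, and the fragment names and `union_size` are assumptions for illustration.

```python
from itertools import combinations


def union_size(sets: list) -> int:
    """|A1 ∪ ... ∪ An| computed with the inclusion-exclusion principle."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = 1 if k % 2 == 1 else -1  # add odd-sized groups, subtract even
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total


# Hypothetical fragments meeting constraints c1 and c2; two are shared.
by_c1 = {"r1", "r2", "r3", "r4"}
by_c2 = {"r1", "r2", "r5", "r6"}

print(union_size([by_c1, by_c2]))  # 4 + 4 - 2 = 6
```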

Page 25: Efficient  kernels  for  sentence pair classification

Computational Efficiency Analysis

• Comparison kernel: (Zanzotto & Moschitti, Coling-ACL 2006), (Moschitti & Zanzotto, ICML 2007)
• Test-bed corpus: Recognizing Textual Entailment challenge data

Page 26: Efficient  kernels  for  sentence pair classification

Computational Efficiency Analysis

[Figure: execution time in seconds (s) over the whole RTE2 data set for different numbers of allowed placeholders.]

Page 27: Efficient  kernels  for  sentence pair classification

Accuracy Comparison

• Training: RTE 1, 2, 3
• Testing: RTE 4

Page 28: Efficient  kernels  for  sentence pair classification

Conclusions

• We reduced kernels in first-order feature spaces to graph-matching problems
• We defined a new class of graphs, tDAGs
• We presented an efficient algorithm for computing kernels in FOR feature spaces

Page 29: Efficient  kernels  for  sentence pair classification

[Backup figure: parse trees of “Farmers feed cows animal extracts” and “Cows eat animal extracts” with numbered placeholders, and the resulting set of tree-pair fragments.]

Page 30: Efficient  kernels  for  sentence pair classification

[Backup figure: parse trees of the “Mothers feed babies milk” pair with numbered placeholders, shown with the sets of tree-pair fragments that use placeholders 1 and 2 and placeholders 1 and 3.]