
On a Semantic Subsumption Test

Jerzy Marcinkowski

Jan Otop

Grzegorz Stelmaszek

Institute of Computer Science

University of Wrocław,

Wrocław, Poland

(LPAR 04)

Outline of the talk

• Preliminaries 5 min.

– subsumption

– architecture of a theorem prover

• Our work – subsumption 25 min.

– first idea

– a semantics of subsumption

– implementation

• Our work – matching 5 min.

Subsumption - quick reminder

We say that clause C subsumes D if an instance of C is a fragment of D.

C

E(r , s , f(z))

E(s , v , z )

D

E(f(x), y, f(c))

E(y, f(k), fff(c))

E(f(k), u, f(c))

E(u, f(x), ff(c))



[Figure, built up over several slides: two attempts to match C against D. One choice for the first literal of C instantiates the second literal to E(f(x), v, f(c)), which matches a literal of D. Another choice instantiates it to E(f(k), v, ff(c)), which matches no literal of D.]

As you see, subsumption is about choice. No wonder it is NP-complete.
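The choice can be made concrete with a small backtracking search. The following is a minimal sketch, not the procedure used in any prover: terms are nested tuples, symbols are strings, and the names `match`, `subsumes` and `C_VARS` are introduced here for illustration only. It assumes that C and D share no variable names and treats every symbol of D as rigid.

```python
def match(pattern, target, subst, pattern_vars):
    """Extend subst so that pattern, instantiated by subst, equals target.

    Returns the extended substitution, or None if no extension exists.
    """
    if isinstance(pattern, str):
        if pattern in pattern_vars:
            if pattern in subst:
                return subst if subst[pattern] == target else None
            return {**subst, pattern: target}
        return subst if pattern == target else None
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match(p_arg, t_arg, subst, pattern_vars)
        if subst is None:
            return None
    return subst


def subsumes(c, d, pattern_vars, subst=None):
    """True if some instance of clause c is a fragment of clause d."""
    subst = {} if subst is None else subst
    if not c:
        return True
    first, rest = c[0], c[1:]
    for literal in d:  # this loop is the "choice"
        extended = match(first, literal, subst, pattern_vars)
        if extended is not None and subsumes(rest, d, pattern_vars, extended):
            return True
    return False  # every choice failed: backtrack


def f(t):
    return ("f", t)


# The clauses C and D from the slide.
C_VARS = {"r", "s", "z", "v"}
C = [("E", "r", "s", f("z")), ("E", "s", "v", "z")]
D = [
    ("E", f("x"), "y", f("c")),
    ("E", "y", f("k"), f(f(f("c")))),
    ("E", f("k"), "u", f("c")),
    ("E", "u", f("x"), f(f("c"))),
]

print(subsumes(C, D, C_VARS))      # True
print(subsumes(C, D[1:], C_VARS))  # False
```

In the worst case the loop in `subsumes` tries every literal of D for every literal of C, which is where the exponential behaviour comes from.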

Architecture of a theorem prover

Set A of clauses.

A new clause C is born from parents in A.

Subsumption test: does there exist D ∈ A such that D subsumes C?

– Yes (90%): C is redundant, its children would be redundant, so better if it dies now.

– No (10%): C is added to A.

This goes on until the empty clause is born...

• Many thousands of subsumption tests need to be performed in a run of a theorem prover.

• Most of them prove negative.

• They take about half of the running time of a prover.

• Complicated indexing techniques have been developed (discrimination trees, code trees). What they index is the syntax of clauses. We want to index clauses with respect to their meaning.


[Figure, built up over several slides: the same loop with an index placed in front of the subsumption test. The index answers "no" or "maybe" for the clauses of A; the full subsumption test then answers "no" or "yes".]


A semantic (no)-subsumption test

• take 64 small random structures M_1, M_2, ..., M_64 over the signature of interest

• for each clause D ∈ A compute profile(D) = 〈i_1, ..., i_64〉, where i_k is the truth value of D in M_k

• index clauses with respect to their profiles

Observation: Let profile(D) = 〈i_1, ..., i_64〉 and profile(C) = 〈j_1, ..., j_64〉. If i_k ≤ j_k for all k, then D maybe subsumes C. Otherwise it does not.

It is a very cheap test! Just check whether 〈i_1 ∧ j_1, ..., i_64 ∧ j_64〉 ≠ 〈i_1, ..., i_64〉; if so, D does not subsume C.
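With two truth values a profile fits into one 64-bit word, so the check is a single AND and a comparison. A minimal sketch follows, assuming a toy signature with one unary function f and one unary predicate P and two-element structures; the names `random_model`, `profile`, `may_subsume` and the three `holds_*` functions are hypothetical and not taken from the implementation.

```python
import random

N_MODELS = 64
DOMAIN = (0, 1)


def random_model(rng):
    """A random two-element structure: a table for f and a table for P."""
    f = {a: rng.choice(DOMAIN) for a in DOMAIN}
    p = {a: rng.choice((False, True)) for a in DOMAIN}
    return f, p


def holds_unit(model):
    """Truth value of the clause {P(f(x))}."""
    f, p = model
    return all(p[f[a]] for a in DOMAIN)


def holds_c1(model):
    """Truth value of the clause {~P(x), P(f(x))}."""
    f, p = model
    return all((not p[a]) or p[f[a]] for a in DOMAIN)


def holds_c2(model):
    """Truth value of the clause {~P(x), P(f(f(x)))}."""
    f, p = model
    return all((not p[a]) or p[f[f[a]]] for a in DOMAIN)


def profile(holds, models):
    """Pack the truth values of one clause in all models into an integer."""
    bits = 0
    for k, model in enumerate(models):
        if holds(model):
            bits |= 1 << k
    return bits


def may_subsume(profile_d, profile_c):
    """False means "D certainly does not subsume C"; True only means "maybe"."""
    return (profile_d & profile_c) == profile_d


rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
models = [random_model(rng) for _ in range(N_MODELS)]

# {P(f(x))} subsumes {~P(x), P(f(x))}, so the test must answer "maybe".
print(may_subsume(profile(holds_unit, models), profile(holds_c1, models)))  # True
```

The profiles of the clauses in A are computed once and stored; only the profile of the new clause C has to be computed when C is born.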

But is it any good? Can we expect decent selectivity?

A semantic (no)-implication test

• take 64 small random structures M_1, M_2, ..., M_64 over the signature of interest

• for each clause D ∈ A compute profile(D) = 〈i_1, ..., i_64〉, where i_k is the truth value of D in M_k

• index clauses with respect to their profiles

Observation: Let profile(D) = 〈i_1, ..., i_64〉 and profile(C) = 〈j_1, ..., j_64〉. If i_k ≤ j_k for all k, then D maybe implies C. Otherwise it does not.

It is a very cheap test! Just check whether 〈i_1 ∧ j_1, ..., i_64 ∧ j_64〉 ≠ 〈i_1, ..., i_64〉; if so, D does not imply C.

But is it any good? Can we expect decent selectivity?


[Figure: the instances of subsumption form a subset of the instances of implication. The difference, implication without subsumption, is the blue fragment; the pair 〈C1 step, C2 steps〉 lies in it.]

C1 step = {¬P(x), P(f(x))}

C2 steps = {¬P(x), P(f(f(x)))}

C1 step implies C2 steps but doesn't subsume it.

Is this blue fragment a practical problem? Can we do anything about it?

We want to discover some semantic meaning of subsumption.

Let us begin with:

Definition (of the truth value of clause C in structure M)....

Yes, I realize that you know what a truth value of a formula is.

Let L be a logic, i.e. a total order with a unary function L → L called negation.

An L-model is a set M with:

a function M^k → M for each k-ary function symbol

a function M^k → L for each k-ary relation symbol.

Take a valuation of variables: v : Var → M

It extends to: v : terms → M

then to: v : atomic formulas → L

and then to: v : literals → L

The truth value TV(C, M, v) of clause C in M under valuation v is...

max{v(ℓ) | ℓ ∈ C}

...the maximal truth value of a literal from C.

This max is what you used to call disjunction.

The truth value TV(C, M) of clause C in M is...

min{TV(C, M, v) | v : Var → M}

...the minimal truth value of C in M under all possible valuations (sometimes called universal quantification).

C implies D in logic L if TV(C, M) ≤ TV(D, M) for each L-model M.
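For a finite model these definitions can be evaluated directly by enumerating all valuations. The sketch below is only an illustration under assumptions of ours: a logic is a pair of tables (`rank` for the order, `neg` for negation), variables are plain strings, a term is a tuple `("f", t1, ..., tn)`, a literal is a triple `(positive, predicate, arguments)`, and clauses are non-empty. All names are hypothetical.

```python
from itertools import product

# The usual two-valued logic 0 < 1 with negation(x) = 1 - x.
BOOL = {"rank": {0: 0, 1: 1}, "neg": {0: 1, 1: 0}}


def variables(term):
    """Set of variables of a term."""
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(arg) for arg in term[1:]))


def eval_term(term, funcs, val):
    """Extension of the valuation val from variables to terms."""
    if isinstance(term, str):
        return val[term]
    return funcs[term[0]][tuple(eval_term(arg, funcs, val) for arg in term[1:])]


def tv_literal(literal, model, logic, val):
    """Truth value of a literal; negation is looked up in the logic."""
    positive, pred, args = literal
    point = tuple(eval_term(arg, model["funcs"], val) for arg in args)
    value = model["rels"][pred][point]
    return value if positive else logic["neg"][value]


def tv_clause(clause, model, logic):
    """TV(C, M): min over all valuations of the max over the literals of C."""
    rank = logic["rank"].get
    names = sorted(set().union(*(variables(a) for _, _, args in clause for a in args)))
    values = []
    for elems in product(model["domain"], repeat=len(names)):
        val = dict(zip(names, elems))
        values.append(max((tv_literal(l, model, logic, val) for l in clause), key=rank))
    return min(values, key=rank)


# {~P(x), P(f(x))} and {~P(x), P(f(f(x)))}
C1 = [(False, "P", ("x",)), (True, "P", (("f", "x"),))]
C2 = [(False, "P", ("x",)), (True, "P", (("f", ("f", "x")),))]

# A two-element model: f swaps the elements, P holds only of 0.
M = {
    "domain": (0, 1),
    "funcs": {"f": {(0,): 1, (1,): 0}},
    "rels": {"P": {(0,): 1, (1,): 0}},
}

print(tv_clause(C1, M, BOOL), tv_clause(C2, M, BOOL))  # 0 1
```

Only the two tables in `BOOL` depend on the logic, so a logic with more truth values can be plugged in without touching the evaluator.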

Remark.

What you used to call implication is implication in logic 0 < 1, where

negation(x)=1-x

Observation.

If C subsumes D then C implies D in every logic.

Proof: The minimum ranges over more valuations in C than in D. The maximum ranges over more literals in D than in C.


So whatever logic we take, our subsumption test will remain sound. Is there any chance to make it complete (at least in principle)?

As some of you may still remember, we want to discover some semantic meaning of subsumption.

Consider the following Strange 4 Valued Logic (S4VL):

XT > T > F > XF

negation(XT) = XT

negation(T) = F

negation(F) = T

negation(XF) = XF

Theorem. Subsumption of clauses is (finite) implication in S4VL.

Remark. In the practical cases subsumption is already implication in S3VL: the value XF is not needed.


Back to C1 step and C2 steps

Among the four-element S3VL models, 10.7% are structures M that witness non-subsumption, i.e.

TV(C1 step, M) > TV(C2 steps, M)

The probability that, for a random sequence M_1, M_2, ..., M_64 of four-element models, TV(C1 step, M_i) ≤ TV(C2 steps, M_i) holds for each i is

0.893^64 < 0.001

(this is the probability of a false positive in our test).

Among the 36 two-element S3VL models there are four such good structures M, which is 11.1%.
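The two-element count is small enough to check by brute force. The sketch below enumerates every S3VL model with n elements for the signature of C1 step and C2 steps (one unary function f, one unary predicate P) and counts the witnesses. Encoding the truth values as the integers 2 > 1 > 0 and the function names are choices made here for illustration.

```python
from itertools import product

XT, T, F = 2, 1, 0          # integer ranks encode the order XT > T > F
NEG = {XT: XT, T: F, F: T}  # S3VL negation


def tv_c1(f, p):
    """TV of {~P(x), P(f(x))}: min over elements of the max over literals."""
    return min(max(NEG[p[a]], p[f[a]]) for a in range(len(f)))


def tv_c2(f, p):
    """TV of {~P(x), P(f(f(x)))}."""
    return min(max(NEG[p[a]], p[f[f[a]]]) for a in range(len(f)))


def count_witnesses(n):
    """Return (witnesses, models) over all n-element S3VL models.

    A model is a table f of length n and a table p of truth values;
    it is a witness if TV(C1 step, M) > TV(C2 steps, M).
    """
    witnesses = total = 0
    for f in product(range(n), repeat=n):
        for p in product((XT, T, F), repeat=n):
            total += 1
            if tv_c1(f, p) > tv_c2(f, p):
                witnesses += 1
    return witnesses, total


print(count_witnesses(2))  # (4, 36)

# Chance that 64 independent random models all miss, given an 89.3% miss rate.
print(0.893 ** 64 < 0.001)  # True
```

Calling `count_witnesses(4)` runs the same enumeration over the 20736 four-element models, for comparison with the 10.7% quoted above.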

Implementation

We implemented our ideas in Otter and compared the performance of:

• Otter

• our 2/2 semantic Otter (64 models consisting of 2 elements each, 2 truth values, negation as identity)

• our 2/3 semantic Otter (32 models of 2 elements each, S3VL)

• our 4/4 semantic Otter (20 models of 4 elements each, 4 truth values, negation as identity)

Implementation issues:

– profile indexing

– computing of the truth values

Benchmarks

Reference Set: those problems from TPTP for which

• either 2/2 semantic Otter or Otter finds a proof in less than 300 seconds;

• at least one of them needs more than 5 seconds to find a proof.

Otter vs. 2/2 semantic Otter

(by TPTP domain, not all domains included, run on the Reference Set)

Number of theorems in each category:

TPTP     Not proved by    Otter was       They perform   Otter was       Not proved
domain   2/2 sem. Otter   > 30% faster    equally        > 30% slower    by Otter

BOO       3               16              0              0                1
GRP      24               65              4              7                5
LCL      45               52              0              0                0
SYN       3               14              0              1                0
CAT       0                0              0              6                1
GEO       0                0              0              8               19
HWV       0                0              0              9                1
MGT       0                0              0              3                1

Theorems not proved by different versions of Otter

(by the maximal number of literals in the input clauses, run on the Reference Set)

Maximal number    Not proved by    Not proved by    Not proved by    Not proved
of literals in    2/2 semantic     2/3 semantic     4/4 semantic     by Otter
input clauses     Otter            Otter            Otter

4 or less         77               101              105               5
5                  3                 8                8               1
6                  0                 1                0               1
7                  0                 1                0               1
8                  0                 6                6              11
9                  1                 1                1               1
10 to 19           0                 7                2               6
FOF                0                 2                2               5

Related idea: matching and unification

For a structure M and a term t ∈ T_V(Σ), define the set of possible values of t in M as

Val(t, M) = {τ̄(t) : τ : V → M}

Observation. Suppose t is an instance of s, and M is any structure. Then Val(t, M) ⊆ Val(s, M).

In other words, if we can guess a structure M in which Val(t, M) ⊆ Val(s, M) does not hold, then t is not an instance of s.
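For a finite structure the sets Val(t, M) can be computed by enumerating all valuations. The following is a minimal sketch for the plain (non-AC) case, with hypothetical names; variables are strings and a term is a tuple `("f", t1, ..., tn)`.

```python
from itertools import product


def variables(term):
    """Set of variables of a term."""
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(arg) for arg in term[1:]))


def eval_term(term, funcs, val):
    """Value of a term under the valuation val."""
    if isinstance(term, str):
        return val[term]
    return funcs[term[0]][tuple(eval_term(arg, funcs, val) for arg in term[1:])]


def val_set(term, domain, funcs):
    """Val(t, M): all values the term takes under all valuations."""
    names = sorted(variables(term))
    return {
        eval_term(term, funcs, dict(zip(names, elems)))
        for elems in product(domain, repeat=len(names))
    }


def refutes_instance(t, s, domain, funcs):
    """True if this structure shows that t is not an instance of s."""
    return not val_set(t, domain, funcs) <= val_set(s, domain, funcs)


# A two-element structure: f is exclusive or, g is complement.
DOMAIN = (0, 1)
FUNCS = {
    "f": {(a, b): a ^ b for a in DOMAIN for b in DOMAIN},
    "g": {(a,): 1 - a for a in DOMAIN},
}

s = ("f", "x", "x")
print(refutes_instance(("f", "x", "y"), s, DOMAIN, FUNCS))                # True
print(refutes_instance(("f", ("g", "x"), ("g", "x")), s, DOMAIN, FUNCS))  # False
```

As with clause profiles, a negative answer is definite, while a positive one only means that the real matching algorithm still has to be run.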

But why should we bother? Matching and unification are easy anyway! Why would we need a semantic test?

The above observation is true also for the AC case. AC matching, like subsumption, is NP-complete, and it turns out that it is AC matching that takes most of the running time of EQP, a cousin of Otter built to prove theorems in first-order equational logic.

We implemented the above idea in EQP. Terms are profiled by 32 random models, each of them of size 4. We ran our semantic EQP on the Robbins Conjecture.

EQP vs. semantic EQP

on the lemmas of Robbins Conjecture

                            EQP             semantic EQP

Lemma 1, total time         72.67 sec       36.74 sec
Lemma 1, AC matching time   56.13 sec       20.49 sec
Lemma 2, total time         25477.20 sec    11405.72 sec
Lemma 2, AC matching time   21030.06 sec    6812.41 sec

The end (for now)
