Solving Non-clausal Formulas with DPLL search
Christian Thiffault, Fahiem Bacchus, University of Toronto
Toby Walsh, UNSW
04/19/23 NCDPLL SAT 2004 2
CNF
• The classical DPLL algorithm (used by most modern SAT solvers) works with Conjunctive Normal Form (CNF).
• However, CNF is not the most natural representation.
• Hence, to use modern SAT solver technology, problems must be converted to CNF.
• In this work we demonstrate that such a conversion is unnecessary.
• For a number of reasons, such a conversion is also undesirable.
– We demonstrate that the original structure lost in the conversion can be used to significantly improve the performance of SAT solvers.
Tseitin Encodings
• The most commonly used CNF encoding is the Tseitin encoding [Tseitin 1970].
• This encoding converts a propositional formula recursively by adding a new variable for every subformula.
Tseitin Encodings
A → (C & D)
– V1 ≡ (C & D)
• (~V1, C), (~V1, D), (~C, ~D, V1)
1. (~V1, C)
2. (~V1, D)
3. (~C, ~D, V1)
Tseitin Encodings
A → (C & D)
– V1 ≡ (C & D)
• (~V1, C), (~V1, D), (~C, ~D, V1)
– V2 ≡ (A → V1)
• (~V2, ~A, V1), (A, V2), (~V1, V2)
1. (~V1, C)
2. (~V1, D)
3. (~C, ~D, V1)
4. (~V2, ~A, V1)
5. (A, V2)
6. (~V1, V2)
Tseitin Encodings
A → (C & D)
– V1 ≡ (C & D)
• (~V1, C), (~V1, D), (~C, ~D, V1)
– V2 ≡ (A → V1)
• (~V2, ~A, V1), (A, V2), (~V1, V2)
– The formula must be true: (V2)
1. (~V1, C)
2. (~V1, D)
3. (~C, ~D, V1)
4. (~V2, ~A, V1)
5. (A, V2)
6. (~V1, V2)
7. (V2)
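The recursive encoding can be sketched in a few lines of Python. This is a minimal illustration, not the encoder used in this work: the tuple representation of formulas, the "&" and "->" operator names, and the helper names `tseitin` and `encode` are assumptions. Run on A → (C & D), it produces the seven clauses listed on this slide, and for V2 ≡ (A → V1) it emits (~V1, V2), which is the clause that equivalence requires.

```python
from typing import List, Tuple, Union

# A formula is a variable name (str) or a tuple (op, left, right) with op in
# {"&", "->"}. This representation is an assumption of the sketch.
Formula = Union[str, Tuple[str, "Formula", "Formula"]]
Clause = Tuple[str, ...]  # literals written as "X" or "~X"


def neg(lit: str) -> str:
    """Negate a literal written in the slides' ~X notation."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def tseitin(formula: Formula) -> List[Clause]:
    """Return a CNF (list of clauses) equisatisfiable with `formula`."""
    clauses: List[Clause] = []
    counter = 0

    def encode(f: Formula) -> str:
        nonlocal counter
        if isinstance(f, str):
            return f  # a variable stands for itself
        op, left, right = f
        a, b = encode(left), encode(right)
        counter += 1
        v = f"V{counter}"  # fresh variable for this subformula
        if op == "&":  # v <-> (a & b)
            clauses.extend([(neg(v), a), (neg(v), b), (neg(a), neg(b), v)])
        elif op == "->":  # v <-> (~a | b)
            clauses.extend([(neg(v), neg(a), b), (a, v), (neg(b), v)])
        else:
            raise ValueError(f"unsupported operator {op!r}")
        return v

    root = encode(formula)
    clauses.append((root,))  # the whole formula must be true
    return clauses


for i, clause in enumerate(tseitin(("->", "A", ("&", "C", "D"))), start=1):
    print(f"{i}. ({', '.join(clause)})")
```

Because a fresh variable is created only after both children are encoded, the inner (C & D) receives V1 and the implication receives V2, as on the slide.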
Converting to CNF is Undesirable
• There are two obvious problems arising from this conversion.
1. Structural information is lost.
– The fact that a particular group of clauses was generated from the same subformula is lost.
– The global interconnections between the various subformulas are lost.
2. Additional variables are added. This potentially increases the size of the DPLL search space.
Structural Information
• Some of this information could be recovered from the CNF encoding, but not all of it [Lang & Marquis, 1989].
– There is good empirical evidence that recovering structural information can yield considerable benefits in solving performance [EqSatZ, LSAT].
– But why lose this information in the first place?
– We will show that conversion to CNF is not necessary.
– We also develop techniques that confirm the benefits of retaining the original structure.
Extra Variables
• Various works have noted the potential difficulty of introducing new variables.
• Some have suggested restricting the DPLL search so that it cannot branch on any of the newly introduced “subformula” variables.
• However, it is not difficult to show that restricting branching in this manner causes an exponential slowdown on some problems [Jarvisalo et al. 2004].
• Solvers that restrict their branching in this manner can perform very poorly on many problems.
Extra Variables
• The alternative is unrestricted branching.
• However, with unrestricted branching a CNF solver can waste a lot of time branching on variables that have dynamically become irrelevant.
Irrelevant Variables
A → (C & D), with A = False.
The formula is satisfied.
Irrelevant Variables
CNF encoding of A → (C & D):
– V1 ≡ (C & D)
– V2 ≡ (A → V1)
1. (~V1, C)
2. (~V1, D)
3. (~C, ~D, V1)
4. (~V2, ~A, V1)
5. (A, V2)
6. (~V1, V2)
7. (V2)
8. (~A)
The solver must still determine that the remaining clauses are satisfiable.
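The situation on this slide can be checked with plain unit propagation. In the sketch below (our own, with an arbitrary DIMACS-style numbering A=1, C=2, D=3, V1=4, V2=5), propagating the unit clauses (V2) and (~A) satisfies clauses 4 to 6, but clauses 1 to 3 are neither satisfied nor unit, so a CNF solver still has to branch on C, D, or V1 even though the original formula is already true.

```python
from typing import Dict, List, Optional, Tuple

# Numbering is an assumption of this sketch: A=1, C=2, D=3, V1=4, V2=5.
CLAUSES: List[List[int]] = [
    [-4, 2], [-4, 3], [-2, -3, 4],   # V1 <-> (C & D)
    [-5, -1, 4], [1, 5], [-4, 5],    # V2 <-> (A -> V1)
    [5],                             # the formula must be true
    [-1],                            # A = False
]


def unit_propagate(clauses: List[List[int]]) -> Tuple[Optional[Dict[int, bool]], List[List[int]]]:
    """Propagate unit clauses to a fixpoint.

    Returns (assignment, open_clauses); open_clauses are those not yet
    satisfied. The assignment is None if some clause is falsified.
    """
    assign: Dict[int, bool] = {}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assign.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            free = [l for l in clause if abs(l) not in assign]
            if not free:
                return None, []  # every literal is false: conflict
            if len(free) == 1:
                assign[abs(free[0])] = free[0] > 0  # unit clause forces its literal
                changed = True
    open_clauses = [c for c in clauses
                    if not any(assign.get(abs(l)) == (l > 0) for l in c)]
    return assign, open_clauses


assignment, remaining = unit_propagate(CLAUSES)
print(assignment)  # {5: True, 1: False}
print(remaining)   # [[-4, 2], [-4, 3], [-2, -3, 4]]
```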
Converting to CNF is Unnecessary
• DPLL search can be performed on the original formula.
– This has been noted in previous work on circuit-based solvers [Ganai et al. 2002].
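DPLL on formulas
• Convert the formula to a DAG.
(A → (C & D)) \/ (B → (C & D))
[DAG: an \/ root with two → children; the left → has children A and (C & D), the right → has children B and the same (C & D) node, whose children are C and D.]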
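One common way to obtain a DAG rather than a tree is hash-consing: each structurally identical subformula is built once and then shared. The slides do not say how the DAG is constructed, so treat this Python sketch as an assumption; `Node`, `DagBuilder`, and the operator names are hypothetical. On the example formula, the (C & D) node ends up with two → parents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(eq=False)  # nodes compare by identity, so a shared node is one object
class Node:
    op: str                                   # "var", "and", "or", or "implies"
    children: Tuple["Node", ...] = ()
    name: str = ""                            # set only for "var" nodes
    parents: List["Node"] = field(default_factory=list, repr=False)


class DagBuilder:
    """Hash-consing builder: structurally identical subformulas become one node."""

    def __init__(self) -> None:
        self._table: Dict[tuple, Node] = {}

    def var(self, name: str) -> Node:
        return self._lookup(("var", name), lambda: Node("var", name=name))

    def gate(self, op: str, *children: Node) -> Node:
        # Children are already canonical, so their identities identify them.
        key = (op,) + tuple(id(c) for c in children)
        return self._lookup(key, lambda: self._new_gate(op, children))

    def _new_gate(self, op: str, children: Tuple[Node, ...]) -> Node:
        node = Node(op, children)
        for child in children:
            child.parents.append(node)  # parent links are needed for propagation
        return node

    def _lookup(self, key: tuple, make: Callable[[], Node]) -> Node:
        if key not in self._table:
            self._table[key] = make()
        return self._table[key]


b = DagBuilder()
left = b.gate("implies", b.var("A"), b.gate("and", b.var("C"), b.var("D")))
right = b.gate("implies", b.var("B"), b.gate("and", b.var("C"), b.var("D")))
root = b.gate("or", left, right)

shared = left.children[1]
print(shared is right.children[1])  # True: (C & D) is built once
print(len(shared.parents))          # 2
```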
DPLL on formulas
• Associate a truth value (True/False/Unassigned) with every node of the DAG.
• Perform DPLL search by splitting on the truth value of an unassigned node.
• Use the logic of the Boolean operators to propagate truth values to neighboring nodes.
• A solution is found when all nodes have been labeled with consistent truth values (i.e., the parent and child values are consistent with the logic of the parent’s operator).
• A contradiction occurs when True and False are propagated to the same node.
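A minimal Python sketch of this procedure follows. It is not the solver described here: the DAG is a dictionary from (hypothetical) node names to an operator and a child list, only `var`, `not`, `and`, `or`, and `implies` nodes are handled, propagation rescans every gate until nothing changes (no lazy data structures), and there is no clause learning. `propagate` returns `None` when True and False meet at a node; `solve` splits on the first unassigned node.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical DAG encoding: node name -> (operator, list of child names).
Dag = Dict[str, Tuple[str, List[str]]]
Values = Dict[str, bool]


def propagate(dag: Dag, values: Values) -> Optional[Values]:
    """Apply the gate rules until nothing changes; None means a contradiction."""
    values = dict(values)
    changed = True

    def put(node: str, val: bool) -> bool:
        nonlocal changed
        if node in values:
            return values[node] == val  # False: True and False met at one node
        values[node] = val
        changed = True
        return True

    while changed:
        changed = False
        for p, (op, kids) in dag.items():
            pv = values.get(p)
            kv = [values.get(k) for k in kids]
            ok = True
            if op == "not":
                if kv[0] is not None:
                    ok = put(p, not kv[0])
                if ok and pv is not None:
                    ok = put(kids[0], not pv)
            elif op in ("and", "or"):
                ctrl = op == "or"  # child value that decides the gate on its own
                if ctrl in kv:
                    ok = put(p, ctrl)
                elif all(v is (not ctrl) for v in kv):
                    ok = put(p, not ctrl)
                if ok and pv is (not ctrl):
                    ok = all(put(k, not ctrl) for k in kids)
                open_kids = [k for k, v in zip(kids, kv) if v is not (not ctrl)]
                if ok and pv is ctrl and len(open_kids) == 1:
                    ok = put(open_kids[0], ctrl)  # only one child can explain p
            elif op == "implies":
                (a, b), (av, bv) = kids, kv
                if av is False or bv is True:
                    ok = put(p, True)
                elif av is True and bv is False:
                    ok = put(p, False)
                if ok and pv is False:
                    ok = put(a, True) and put(b, False)
                if ok and pv is True and av is True:
                    ok = put(b, True)
                if ok and pv is True and bv is False:
                    ok = put(a, False)
            if not ok:
                return None
    return values


def solve(dag: Dag, values: Values) -> Optional[Values]:
    """DPLL on the DAG: propagate, then split on an unassigned node."""
    values = propagate(dag, values)
    if values is None:
        return None  # contradiction: backtrack
    unassigned = [n for n in dag if n not in values]
    if not unassigned:
        return values  # every node carries a consistent truth value
    for val in (True, False):
        result = solve(dag, {**values, unassigned[0]: val})
        if result is not None:
            return result
    return None


# (A -> (C & D)) \/ (B -> (C & D)) with a shared & node; the root must be True.
EXAMPLE: Dag = {
    "root": ("or", ["imp_a", "imp_b"]),
    "imp_a": ("implies", ["A", "and"]),
    "imp_b": ("implies", ["B", "and"]),
    "and": ("and", ["C", "D"]),
    "A": ("var", []), "B": ("var", []), "C": ("var", []), "D": ("var", []),
}
model = solve(EXAMPLE, {"root": True})
print(model)

# x & ~x cannot be made True, so the search returns None.
print(solve({"x": ("var", []), "nx": ("not", ["x"]), "r": ("and", ["x", "nx"])}, {"r": True}))
```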
Example
[Figure: DPLL search on the DAG for (A → (C & D)) \/ (B → (C & D)).]
Comparison
• Assigning a truth value to a node in the DAG is equivalent to assigning a truth value to the corresponding variable in the CNF encoding.
• Propagation in the DAG is equivalent to unit propagation in the CNF encoding.
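For a single AND gate, this equivalence can be checked exhaustively. The sketch below (our own construction, not code from the solver) compares unit propagation on the three Tseitin clauses for V1 ≡ (C & D) with the AND-gate rules applied directly to the DAG node, over all 27 partial assignments to V1, C, and D; `None` stands for a contradiction.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

VARS = ["V1", "C", "D"]
# Tseitin clauses for V1 <-> (C & D), as (variable, polarity) literals.
CLAUSES: List[List[Tuple[str, bool]]] = [
    [("V1", False), ("C", True)],
    [("V1", False), ("D", True)],
    [("C", False), ("D", False), ("V1", True)],
]


def unit_propagate(values: Dict[str, bool]) -> Optional[Dict[str, bool]]:
    values = dict(values)
    changed = True
    while changed:
        changed = False
        for clause in CLAUSES:
            if any(values.get(v) == pol for v, pol in clause):
                continue  # clause satisfied
            free = [(v, pol) for v, pol in clause if v not in values]
            if not free:
                return None  # falsified clause: conflict
            if len(free) == 1:
                values[free[0][0]] = free[0][1]
                changed = True
    return values


def gate_propagate(values: Dict[str, bool]) -> Optional[Dict[str, bool]]:
    """AND-gate rules applied to the DAG node V1 = C & D."""
    values = dict(values)
    changed = True
    while changed:
        changed = False
        p, kids = values.get("V1"), [values.get("C"), values.get("D")]
        derived = []
        if False in kids:
            derived.append(("V1", False))            # a false child makes the gate false
        if kids == [True, True]:
            derived.append(("V1", True))             # all children true: gate true
        if p is True:
            derived += [("C", True), ("D", True)]    # a true gate makes every child true
        if p is False and kids.count(True) == 1:
            derived.append(("D" if kids[0] is True else "C", False))  # last child false
        for v, val in derived:
            if values.get(v) is (not val):
                return None                          # True and False meet at one node
            if v not in values:
                values[v] = val
                changed = True
    return values


for combo in product([None, True, False], repeat=3):
    start = {v: val for v, val in zip(VARS, combo) if val is not None}
    assert unit_propagate(start) == gate_propagate(start), start
print("gate rules and unit propagation agree on all 27 partial assignments")
```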
Comparison
• The rule that propagated each truth value can be recorded. Once a contradiction is detected, a conflict clause (a set of impossible node assignments) can be learned. This clause can be made equivalent to the 1-UIP clauses learned by CNF solvers.
• The learned clauses can be stored and used to unit propagate node truth values in the rest of the search.
Comparison
• Complex gates, e.g., n-ary XOR, can be propagated more efficiently in the DAG representation (without auxiliary variables, these gates require a number of clauses exponential in the number of their inputs).
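Without auxiliary variables, a CNF for x1 ⊕ ... ⊕ xn = True needs one clause per even-parity assignment, i.e. 2^(n-1) clauses, whereas a single XOR node can be propagated by counting: once the node's value and all but one input are known, parity fixes the last input. The helpers `xor_cnf` and `xor_propagate` are hypothetical and only illustrate this contrast.

```python
from itertools import product
from typing import List, Optional, Tuple


def xor_cnf(n: int) -> List[Tuple[int, ...]]:
    """Direct CNF, without auxiliary variables, for x1 ^ ... ^ xn = True.

    Each clause forbids one even-parity assignment; literal i means xi and
    -i means ~xi (DIMACS style).
    """
    clauses = []
    for bits in product([False, True], repeat=n):
        if sum(bits) % 2 == 0:  # this assignment falsifies the XOR
            clauses.append(tuple(-(i + 1) if b else i + 1 for i, b in enumerate(bits)))
    return clauses


def xor_propagate(output: Optional[bool],
                  inputs: List[Optional[bool]]) -> Tuple[Optional[bool], List[Optional[bool]]]:
    """One propagation step on an n-ary XOR node, by counting rather than clauses."""
    unknown = [i for i, v in enumerate(inputs) if v is None]
    parity = sum(1 for v in inputs if v) % 2 == 1  # XOR of the known inputs
    inputs = list(inputs)
    if not unknown:
        if output is not None and output != parity:
            raise ValueError("conflict: the inputs contradict the XOR node's value")
        return parity, inputs  # the node takes the parity of its inputs
    if output is not None and len(unknown) == 1:
        inputs[unknown[0]] = output != parity  # the last input restores the parity
    return output, inputs


print([len(xor_cnf(n)) for n in range(1, 9)])    # [1, 2, 4, 8, 16, 32, 64, 128]
print(xor_propagate(True, [True, False, None]))  # (True, [True, False, False])
```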
Efficient Propagation
• Efficient unit propagation via lazy data structures is a key element of modern SAT solvers.
• One new feature of our circuit-based solver, compared with previous circuit solvers, is that it uses similar lazy data structures to make its propagation as efficient as that of CNF solvers.
Efficient Propagation
• E.g., an AND gate becomes true only when all of its children become true.
– Assign one child as a true watch, and don’t check whether True propagates to the AND gate node unless its true watch becomes true.
– Some benchmarks have AND gates with thousands of children.
– Previous “table-lookup” circuit solvers required some computation each time a child became true.
• Using these techniques there is no intrinsic loss of efficiency in using the DAG over CNF.
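A sketch of the true-watch idea for one AND gate, assuming children are indexed 0..n-1 and values are only ever set (no backtracking). The class name, the circular scan order, and the `visits` counter are our own choices for illustration: children other than the watch cost nothing when they become true, and a scan runs only when the watch itself becomes true.

```python
from typing import List, Optional


class WatchedAnd:
    """AND gate that notices 'all children true' through a single true watch.

    Sketch only: values are only ever set (backtracking is not modelled), and a
    child becoming False, which falsifies the gate at once, is not shown.
    """

    def __init__(self, n_children: int) -> None:
        self.values: List[Optional[bool]] = [None] * n_children
        self.watch = 0                  # a child not yet known to be true
        self.value: Optional[bool] = None
        self.visits = 0                 # children examined while moving the watch

    def child_became_true(self, i: int) -> None:
        self.values[i] = True
        if i != self.watch:
            return                      # not the watch: no work at all
        n = len(self.values)
        for step in range(1, n + 1):    # scan circularly from the old watch
            j = (i + step) % n
            self.visits += 1
            if self.values[j] is not True:
                self.watch = j          # new watch found; gate still undecided
                return
        self.value = True               # every child is true: the gate is true


gate = WatchedAnd(1000)
for i in reversed(range(1000)):         # the watched child (0) becomes true last
    gate.child_became_true(i)
print(gate.value, gate.visits)          # True 1000
```

A child that is not the watch never triggers any work, which is why AND gates with thousands of children need not be re-examined on every assignment.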
Structure Based Optimizations
• Since no penalty is being paid for using the original formula, we can turn to exploiting the extra structural information it provides.
• We implemented two structure-based optimizations.
– Don’t care propagation, to deal with irrelevant variables.
– Conflict clause reduction.
Don’t Care Propagation
• We propagate a third “truth” value through the DAG: don’t care.
• A node C is don’t care wrt a particular parent P
– if its truth value can no longer affect the truth value of P or of any of its siblings under P,
– or if P is don’t care.
• A node C is don’t care if it is don’t care wrt all of its parents.
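The definition can be turned into a small fixpoint computation. The per-operator tests in `dc_wrt` are our reading of "can no longer affect P" for AND, OR, and → nodes (a sibling's value already fixes P), not rules taken from the slides, and the helper names are hypothetical. On the earlier example A → (C & D) with A = False, it marks the & node, C, and D as don't care.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DAG format: node name -> (operator, list of child names).
Dag = Dict[str, Tuple[str, List[str]]]


def dc_wrt(child: str, parent: str, dag: Dag,
           values: Dict[str, bool], dc: Set[str]) -> bool:
    """Can `child` no longer affect `parent` (or its siblings via `parent`)?

    The per-operator tests are assumptions of this sketch: a sibling whose
    value already fixes the parent makes the child irrelevant to that parent.
    """
    if parent in dc:
        return True
    op, kids = dag[parent]
    others = [k for k in kids if k != child]
    if op == "and":
        return any(values.get(k) is False for k in others)
    if op == "or":
        return any(values.get(k) is True for k in others)
    if op == "implies":
        a, b = kids
        if child == a:
            return values.get(b) is True   # (a -> True) holds whatever a is
        return values.get(a) is False      # (False -> b) holds whatever b is
    return False


def dont_cares(dag: Dag, root: str, values: Dict[str, bool]) -> Set[str]:
    """Fixpoint: a node is don't care if it is don't care wrt all its parents."""
    parents: Dict[str, List[str]] = {n: [] for n in dag}
    for p, (_, kids) in dag.items():
        for k in kids:
            parents[k].append(p)
    dc: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for node in dag:
            if node == root or node in dc or not parents[node]:
                continue
            if all(dc_wrt(node, p, dag, values, dc) for p in parents[node]):
                dc.add(node)
                changed = True
    return dc


# A -> (C & D) with A = False: the whole (C & D) subformula becomes irrelevant.
dag: Dag = {
    "root": ("implies", ["A", "and"]),
    "and": ("and", ["C", "D"]),
    "A": ("var", []), "C": ("var", []), "D": ("var", []),
}
print(sorted(dont_cares(dag, "root", {"root": True, "A": False})))  # ['C', 'D', 'and']
```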
Don’t Care Propagation
• Assign a don’t care watch parent to each node.
• C becoming don’t care wrt its watch parent P can be detected when truth values are propagated to P.
• If C becomes don’t care wrt its don’t care watch, we look for another watch.
• If we can’t find one, we know that C has become don’t care.
• Finally, we stop the search from branching on don’t care nodes. This eliminates branching on irrelevant variables.
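The watch-parent bookkeeping might look as follows. This is a sketch under our own assumptions: `DontCareWatcher` is hypothetical, the caller supplies the `dc_wrt(child, parent)` test, and undoing don't care status on backtracking is not modelled. The toy relation in the usage lines simply lists which (child, parent) pairs have become don't care.

```python
from typing import Callable, Dict, List, Set


class DontCareWatcher:
    """One don't care watch parent per node (a sketch; backtracking not modelled).

    `dc_wrt(child, parent)` is supplied by the caller and answers whether
    `child` has become don't care with respect to `parent`.
    """

    def __init__(self, parents: Dict[str, List[str]],
                 dc_wrt: Callable[[str, str], bool]) -> None:
        self.parents = parents
        self.dc_wrt = dc_wrt
        self.watch = {n: 0 for n, ps in parents.items() if ps}  # index into parents[n]
        self.dont_care: Set[str] = set()

    def parent_changed(self, node: str, parent: str) -> None:
        """Call after truth values have been propagated to `parent`."""
        if node in self.dont_care or node not in self.watch:
            return
        ps = self.parents[node]
        if ps[self.watch[node]] != parent or not self.dc_wrt(node, parent):
            return  # only the watched parent is checked, and it still cares
        for i, p in enumerate(ps):
            if not self.dc_wrt(node, p):
                self.watch[node] = i  # move the watch to a parent that cares
                return
        self.dont_care.add(node)  # no parent cares: never branch on this node

    def branch_candidates(self, unassigned: List[str]) -> List[str]:
        return [n for n in unassigned if n not in self.dont_care]


# Toy relation: the (child, parent) pairs in `settled` are don't care.
settled = set()
w = DontCareWatcher({"X": ["P1", "P2", "P3"]}, lambda c, p: (c, p) in settled)
settled.add(("X", "P1"))
w.parent_changed("X", "P1")
print(w.watch["X"], w.dont_care)           # 1 set()
settled.update({("X", "P2"), ("X", "P3")})
w.parent_changed("X", "P3")                # P3 is not the watch: ignored
w.parent_changed("X", "P2")
print(w.dont_care, w.branch_candidates(["X", "Y"]))  # {'X'} ['Y']
```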
Conflict Clause Reductions
• If one learns a conflict clause (L1, L2, ...) and one has L1 → L2, then the conflict clause can be reduced by a resolution step:
– Resolving (~L1, L2) with (L1, L2, ...) yields (L2, ...).
– The resolvent subsumes the original conflict clause.
• In a CNF encoding one would have to search the clauses to detect this situation, which is unlikely to be effective.
• In the DAG one can examine the neighbors of each node assignment in the conflict clause to see if any of the other assignments in the clause are implied by it.
• If so, we can remove the implying node assignment.
• If this is done selectively, it can significantly decrease the size of the learned clauses with little overhead.
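A sketch of the reduction, assuming the implications are supplied as a dictionary from a literal to the literals its DAG neighbors force (for an AND node P = C & D, P implies C and D). `reduce_clause` is a hypothetical helper that applies the check to every literal; in practice the check would be applied selectively.

```python
from typing import Dict, List, Set

# Literals use the slides' notation: "X" or "~X".
# implies[l] holds literals that l implies via the DAG (an assumed input format).
Implications = Dict[str, Set[str]]


def reduce_clause(clause: List[str], implies: Implications) -> List[str]:
    """Remove every literal L1 that implies another literal L2 still in the clause.

    Each removal is the resolution of (~L1, L2) with (L1, L2, ...), giving
    (L2, ...). Literals are dropped one at a time, so two literals that imply
    each other cannot both disappear.
    """
    kept = list(clause)
    for lit in clause:
        if any(other in kept for other in implies.get(lit, set()) if other != lit):
            kept.remove(lit)
    return kept


# For an AND node P = C & D: P implies C and D, and ~C, ~D each imply ~P.
implies = {"P": {"C", "D"}, "~C": {"~P"}, "~D": {"~P"}}
print(reduce_clause(["P", "C", "~E"], implies))  # ['C', '~E']
```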
Empirical Results
• We compared with Zchaff.
• We tried to make the two solvers as close as possible: the same magic numbers (e.g., clause database cleanup criteria, restart intervals, etc.) and the same branching heuristics.
• Hence, we tried to isolate the impact of the non-clausal representation and the structure-based optimizations.
• We believe that the same improvements could be obtained with other CNF solvers via this technique.
Empirical Results: Caveats
• Our experiments were hampered by a lack of non-clausal benchmarks.
• The performance of our solver was also limited by the fact that the benchmarks we did obtain had already been transformed into simpler formulas, e.g., no complex XOR or IFF gates were present in the benchmarks.
FVP-UNSAT-2.0 (Velev)
Problem     #Vars  Time Zchaff  Time NoClause  Imp/Sec Zchaff  Imp/Sec NoClause
4pipe       5,237       188.89           9.87         467,001           509,433
4pipe_1     4,647        26.55          35.52         512,108           327,098
4pipe_2     4,941        49.76           36.5         482,896           327,298
4pipe_3     5,233       144.34          62.03         424,551           316,049
4pipe_4     5,525        93.83          42.26         470,936           326,186
5pipe       9,471        54.68          33.34         526,457           409,154
5pipe_1     8,441       126.11         116.18         425,921           280,758
5pipe_2     8,851       138.62         177.24         437,166           279,298
5pipe_3     9,267        137.7         134.08         441,319           295,976
5pipe_4     9,764       873.81         284.62         370,906           270,234
5pipe_5    10,113       249.11         137.09         456,400           298,903
6pipe      15,800     4,550.92         297.13         322,039           288,855
6pipe_6    17,064     1,406.18       1,056.56         402,301           267,207
7pipe      23,910    12,717.00       1,657.70         306,433           244,343
7pipe_bug  24,065        128.9           0.29         266,901           403,148
FVP-UNSAT-2.0 Decisions
Problem     #Vars      Zchaff   NoClause
4pipe       5,237     541,195     41,637
4pipe_1     4,647     131,223    114,512
4pipe_2     4,941     210,169    112,720
4pipe_3     5,233     392,564    169,117
4pipe_4     5,525     295,841    122,497
5pipe       9,471     334,761    102,077
5pipe_1     8,441     381,921    255,894
5pipe_2     8,851     397,550    362,840
5pipe_3     9,267     385,239    292,802
5pipe_4     9,764   1,393,529    503,128
5pipe_5    10,113     578,432    283,554
6pipe      15,800   5,232,321    435,781
6pipe_6    17,064   2,153,346  1,326,371
7pipe      23,910  12,437,654  1,276,763
7pipe_bug  24,065   1,075,907        481
FVP-UNSAT-2.0 Don’t Cares
Problem     Time DC  Time No DC  Decisions DC  Decisions No DC
4pipe          9.87       57.68        41,637          198,828
4pipe_1       35.52       62.65       114,512          159,049
4pipe_2        36.5       94.46       112,720          212,986
4pipe_3       62.03      213.27       169,117          365,007
4pipe_4       42.26      318.64       122,497          525,623
5pipe         33.34      246.93       102,077          650,312
5pipe_1      116.18      300.59       255,894          489,825
5pipe_2      177.24      360.67       362,840          585,133
5pipe_3      134.08      387.65       292,802          593,815
5pipe_4      284.62     2097.31       503,128        1,842,074
5pipe_5      137.09      379.19       283,554          543,535
6pipe        297.13   10,241.64       435,781        4,726,470
6pipe_6    1,056.56    3,455.35     1,326,371        2,615,479
7pipe      1,657.70   12,685.59     1,276,763        6,687,186
7pipe_bug      0.29           1           481            2,006
FVP-UNSAT-2.0 Clause Reduction
Problem    Time Red.  Time No Red.  Ave. Cls. Size Red.  Ave. Cls. Size No Red.
4pipe           9.87          30.5                   40                     148
4pipe_1        35.52         45.13                   77                     132
4pipe_2         36.5         54.97                   84                     132
4pipe_3        62.03          69.7                  108                     162
4pipe_4        42.26         80.13                  112                     157
5pipe          33.34         16.29                   93                     113
5pipe_1       116.18        195.81                  140                     204
5pipe_2       177.24        159.45                  165                     199
5pipe_3       134.08        154.02                  165                     218
5pipe_4       284.62        504.27                  208                     264
5pipe_5       137.09        216.11                  172                     237
6pipe         297.13        647.78                  232                     540
6pipe_6     1,056.56      1,421.42                  309                     380
7pipe       1,657.70      2,053.92                  336                     761
7pipe_bug       0.29          0.29                   10                      10
Other Series
Zchaff
Series (#probs)       Time         Dec.  Imp/Sec  Cls Size
sss-sat-1.0 (100)      128    2,970,794  728,144        70
vliw-sat-1.1 (100)   3,284  154,742,779  302,302        82
fvp-unsat-1.0 (4)      245    3,620,014  322,587       326
fvp-unsat-2.0 (22)  20,903   26,113,810  327,590       651
NoClause
Series (#probs)       Time         Dec.  Imp/Sec  Cls Size
sss-sat-1.0 (100)      225    1,532,843  616,705        39
vliw-sat-1.1 (100)   1,033    4,455,378  260,779        55
fvp-unsat-1.0 (4)      172      554,100  402,621       100
fvp-unsat-2.0 (22)   4,104    5,537,711  267,858       240
Conclusions
• There is no intrinsic reason to convert to CNF.
• Many other structure-based optimizations remain to be investigated, e.g.:
– better branching
– non-clausal representation of conflicts
– more complex gates.