IBM Watson Research Center

CPAIOR-2013 Workshop: Seeking Feasibility in Combinatorial Problems © 2013 IBM Corporation

Branching Strategies and Restarts in SAT Solvers

Ashish Sabharwal

IBM Research

Talk Outline

The SAT Problem, SAT Solvers

Conflict-Driven Systematic SAT Solvers

– Dramatic Progress

– Contrast with CP/MIP solvers

– “Everything” influenced by Learned Clauses and Conflict Analysis

Traditional Branching Heuristics

CDCL Solvers: Dynamic Heuristics and Associated Techniques

– Clause Learning

– Lazy Data Structures

– VSIDS Variable Selection Heuristic

– Restarts

Summary


SAT: Problem and Solvers


Boolean Satisfiability (SAT) : Basics

Variables with Boolean domain {T,F} or, equivalently, {1,0}

Constraints specified in the Conjunctive Normal Form (CNF)

E.g. (a or b) and (c or d or f) and (a or c or d)

SAT Solver: An algorithm (typically with an implementation) that, given a CNF formula F, finds a satisfying assignment for F, if there is one

– Complete SAT Solver: must terminate and output “unsatisfiable”, if F is unsat.

Dozens of (mostly Open Source) SAT Solvers available on the Internet

– 35+ solvers participated in SAT Competition 2006

– 65+ in 2011

clause = disjunction of literals

a : variable; a, ¬a : literals
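A minimal Python sketch of these definitions, assuming the common convention (not stated on the slide) of writing variable i as the integer i and its negation as -i. The example formula and the helper names `satisfies` and `brute_force_solve` are illustrative only; the enumeration is complete but exponential, and is not how real solvers work.

```python
from itertools import product

def satisfies(cnf, assignment):
    """True if every clause has at least one literal made true by `assignment`."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf)

def brute_force_solve(cnf, num_vars):
    """Try all 2^n assignments; return a satisfying one, or None if unsatisfiable."""
    for values in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))  # variable index -> bool
        if satisfies(cnf, assignment):
            return assignment
    return None

# Hypothetical formula: (a or not b) and (not a or c) and (b or not c), a=1, b=2, c=3
example = [[1, -2], [-1, 3], [2, -3]]
```

Returning `None` only after exhausting every assignment is what makes this toy procedure a complete solver in the sense above.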


SAT Solvers: 3 Dominant Approaches

Local Search based stochastic algorithms

– Incomplete (do not prove unsatisfiability)

– Very effective on satisfiable Random instances, esp. near phase transition

Look-Ahead Based systematic solvers

– Complete search with careful selection of variables/values to branch on

• Spend time exploring “reduction” in complexity with various branching possibilities

• Local Learning: some local inference within a subtree is learned as “implication arrays”

• Theoretical Studies: autarkies, “reduction” measures based on probability distributions

– Very effective on unsatisfiable Random instances

– Also effective on Crafted and some Industrial instances

Conflict Directed Clause Learning (CDCL) solvers

– Complete search producing General Resolution proofs of unsatisfiability

– Very effective on Industrial instances, esp. large and highly interconnected ones


Look-Ahead vs. CDCL SAT Solvers

Complementary Regimes of Strength

– Plot shows dominating solver on Crafted and Industrial instances

– March: typical Look-Ahead solver; Minisat: typical CDCL solver

[Plot annotations: low constraint density; low diameter of resolution graph (two clauses have an edge if they clash in 1 literal)]

Credit: Heule & van Maaren, Handbook of SAT


CDCL SAT Solvers


Systematic SAT Solvers as Search Engines: Dramatic Progress in 20 Years

Started out with ~100 vars, ~200 constraints in early 1990’s

Now often easily handle over 1M vars, ~5M constraints

– Instances with 30M clauses being used in competitions!

Was it all just Moore’s Law? It helped, but not much…

– 2x faster computer does not solve 2x larger SAT instance

– Search difficulty does not scale linearly with problem size!

Key Development Drivers

– Academic: “Open” SAT Competitions, Races, and Challenges: Germany ’89, DIMACS ’93, China ’96, SAT-2002, …, SAT-2013

– Industrial: Verification: Backend of Model Checkers, SMT solvers

– Applications to Test Pattern generation, Optimal Control, Protocol Design, Routers, Cryptography, E-Commerce (E-auctions & electronic trading agents), Bioinformatics (Haplotype Inference), etc.


SAT vs. CP/MIP Search: A Contrast

SAT Solvers, esp. CDCL Solvers, work in a very different setting and with very different design principles/goals:

Blackbox approach

– No notion of designing custom search / decompositions as in CP Opt. or CPLEX

– Expected to work “out of the box” with perhaps a little parameter tuning

Very little structure available to exploit

– Binary domains, CNF form – very “flat” representation

– Advantage: Standardization, Competitions, Simplicity

No objective function to estimate for guidance or use to assess progress

– Number of unsatisfied clauses can be a highly misleading indicator

Reliance on LOTS of branching, backtracking, learning, restarting, …all performed extremely fast. How fast?

But Note: 1M variable CNF formula given to CP or MIP solver will not fly


SAT Solvers as Fast Search Engines

CDCL SAT solvers have become really efficient at searching fast

E.g., on an IBM model checking instance from SAT Race 2006, with ~170k variables, 725k clauses, solvers such as Minisat and Rsat roughly

– Make 2000-5000 decisions/second

– Deduce 600-1000 conflicts/second

– Learn 600-1000 clauses/second (#clauses grows rapidly)

– Restart every 1-2 seconds (aggressive restarts)

Leading solvers such as Glucose have pushed Restarts even further

– Extremely aggressive restarts!

– Rely on techniques such as phase saving, “intelligent” clause deletion (based on LBD level), and dynamic context-based freezing of restarts to achieve success


SAT vs. CP/MIP: Branching “Tree” Structure

CP & MIP solvers traditionally explore a well-defined underlying search tree, albeit in different heuristic orders

– CP: typically binary/multi-way tree with DFS or LDS exploration order

– MIP: typically best-first style tree search with a frontier of Open nodes and “diving” to obtain feasible solutions quickly

Modern CDCL SAT solvers very far from building a traditional search tree!

– Branching is “uneven”

– Restarts are extremely frequent (context is retained using various techniques)

[Figure: branching on X=0 / X=1 (normal) contrasted with branching on X=0 / Y=1]

• Under current context, X=0 implies Y=0 by unit propagation (UP)

• Y is the 1-UIP variable in the last conflict analysis

• Note: Y=1 implies X=1, but not necessarily by unit propagation


The Importance of Learned Clauses

“Everything” is influenced by Conflict Analysis and Learned Clauses!

– No need to “flip” value of the branched upon variable: 1-UIP learned clause automatically implies flipped value of the 1-UIP literal

– Enablement of Aggressive Restarts

• Safe, as context is preserved by learned clauses

– Conflict-directed Backjumping

– Necessity of Lazy Data Structures due to ~1000 clauses learned per second

• Fast but with a drawback: incomplete knowledge of current state of all clauses

• E.g., can no longer determine how many clauses are not yet satisfied!

– Branching heuristic: typical state-based heuristics cannot be computed anymore with lazy data structures: missing information about current state of all clauses

• VSIDS and variations for variable selection (more later)


It Wasn’t Always the Case… Traditional, State-Dependent and History-Independent Heuristics in SAT



SAT Solvers, before Clause Learning became a must-have, had many variations of state-dependent heuristics similar to CSP solvers, e.g.:

1. DLCS: Dynamic Largest Combined Sum

maximize CP(x) + CN(x) (#unresolved clauses with variable x occurring pos and neg, resp.)

2. DLIS: Dynamic Largest Individual Sum

maximize max {CP(x), CN(x)}
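A small Python sketch of DLCS and DLIS, assuming clauses are lists of signed integers and that a clause counts as "unresolved" when no literal in it is currently true. The helper names and the value rule (set x true when CP(x) ≥ CN(x)) are assumptions for illustration; the slide only states which quantity is maximized.

```python
from collections import Counter

def occurrence_counts(clauses, assignment):
    """CP[x], CN[x]: number of unresolved clauses containing x positively / negatively."""
    cp, cn = Counter(), Counter()
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                      # clause already satisfied, so it is resolved
        for l in clause:
            if abs(l) not in assignment:
                (cp if l > 0 else cn)[abs(l)] += 1
    return cp, cn

def pick(clauses, assignment, combine):
    """Return (variable, value) maximizing combine(CP, CN), or None if nothing is left."""
    cp, cn = occurrence_counts(clauses, assignment)
    free = set(cp) | set(cn)
    if not free:
        return None
    x = max(free, key=lambda v: combine(cp[v], cn[v]))
    return x, cp[x] >= cn[x]              # assumed value rule: follow the majority polarity

def dlcs(clauses, assignment):
    return pick(clauses, assignment, lambda p, n: p + n)

def dlis(clauses, assignment):
    return pick(clauses, assignment, max)
```

Note that both heuristics rescan every clause at each decision, which is exactly the state-dependent cost that lazy data structures later make unavailable.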


Traditional Branching Heuristics… contd.

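3. BOHM [Buro & Kleine Büning, 1992]

lexicographically-maximize the vector $(H_1(x), H_2(x), \ldots, H_n(x))$

where $H_i(x) = \alpha \max\{h_i(x), h_i(\neg x)\} + \beta \min\{h_i(x), h_i(\neg x)\}$

and $h_i(x)$ = #unresolved size-i clauses containing literal x ($\alpha$, $\beta$: tunable constants)

Intuitively, satisfy most small clauses or further reduce their size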


Traditional Branching Heuristics… contd.

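4. MOMS: Maximum Occurrences in clauses of Minimum Size

Many variations, e.g.:

maximize $[f^*(x) + f^*(\neg x)] \cdot 2^k + f^*(x) \cdot f^*(\neg x)$

where $f^*(x)$ = #unresolved smallest clauses containing literal x and k is a tunable constant; the product term gives preference also to vars that appear as both pos & neg in smallest clauses

5. Jeroslow-Wang [1990]

maximize $J(l) = \sum_{C \ni l} 2^{-|C|}$

i.e., the number of unresolved clauses literal l appears in, weighted inversely proportional to exp(clause size)

Two-sided version: maximize $J(x) + J(\neg x)$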
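A short Python sketch of the Jeroslow-Wang score, assuming the usual weight of 2^(-size) per unresolved clause, with "size" taken to be the number of still-unassigned literals. The function names and the polarity rule in the two-sided variant are illustrative assumptions.

```python
from collections import defaultdict

def jw_scores(clauses, assignment):
    """J(l): sum of 2^-size over the unresolved clauses that contain literal l."""
    score = defaultdict(float)
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                                  # satisfied clauses do not count
        free = [l for l in clause if abs(l) not in assignment]
        for l in free:
            score[l] += 2.0 ** -len(free)             # short clauses weigh more
    return score

def jw_one_sided(clauses, assignment):
    """Literal with the largest J(l), or None."""
    score = jw_scores(clauses, assignment)
    return max(score, key=score.get) if score else None

def jw_two_sided(clauses, assignment):
    """Variable maximizing J(x) + J(not x), returned as its higher-scoring literal."""
    score = jw_scores(clauses, assignment)
    variables = {abs(l) for l in score}
    if not variables:
        return None
    x = max(variables, key=lambda v: score[v] + score[-v])
    return x if score[x] >= score[-x] else -x
```

For instance, in `[[1, 2], [1, -3], [-1, 3, 4], [-2, 3, 4]]` literal 1 appears in two binary clauses and scores 0.25 + 0.25 = 0.5, while literal -1 appears in one ternary clause and scores 0.125.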


Key Techniques Inside Modern CDCL Solvers


DPLL Search as Implemented in Modern Solvers

Note: No “search tree” style search where we set x=0 and then later “flip” to x=1


Clause Learning: Conflict Graphs, etc.

Search tree behavior:

Branch: p=0, q=0, b=1

Detect conflict; learn, say, 1-UIP clause (¬a or t)

Backtrack to depth=2: assignment stack has p=0, q=0

No explicit flip of the value of b to get b=0

Instead, do nothing (not even a state update) and simply observe that t=1 is implied by the learned clause! Further, t=1 implies b=0 (under the current context)

[Figure: conflict graph for decisions p=0, q=0, b=1; the learned clause (¬a or t) implies t=1]
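The behavior above can be replayed with a naive unit propagator in Python. The four clauses below are hypothetical (the slide's actual formula is not given in the text); they are chosen so that p=0, q=0 force a=1, and so that resolving the conflict under b=1 yields the 1-UIP clause (¬a or t). Conflict analysis itself is not implemented here; the learned clause is simply added by hand.

```python
def unit_propagate(clauses, decisions):
    """Naive unit propagation. Returns (assignment, conflicting clause or None)."""
    assignment = dict(decisions)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                          # clause is satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return assignment, clause         # every literal is false: conflict
            if len(unassigned) == 1:              # unit clause: the last literal is forced
                assignment[abs(unassigned[0])] = unassigned[0] > 0
                changed = True
    return assignment, None

# Hypothetical encoding: p=1, q=2, a=3, b=4, t=5, x=6
formula = [[1, 2, 3],     # p or q or a
           [-4, -5],      # not b or not t
           [-3, 5, 6],    # not a or t or x
           [-3, 5, -6]]   # not a or t or not x
learned = [-3, 5]         # the 1-UIP clause (not a or t), taken as given

_, conflict = unit_propagate(formula, {1: False, 2: False, 4: True})
after_backjump, no_conflict = unit_propagate(formula + [learned], {1: False, 2: False})
```

With only p=0 and q=0 on the stack, propagation over the extended formula derives t=1 and then b=0, so the flipped value of b is never set explicitly.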


Lazy Data Structures

SAT solvers (used to) spend 80% of their time doing unit propagation

Must make unit propagation efficient

– as more and more clauses are added (clause learning)

– as longer clauses are added (initial clauses tend to be mostly short)

Observation: Watching two un-falsified literals is sufficient, no matter how long the clause is!

– With 2 un-falsified literals, the clause is guaranteed not to unit propagate or be falsified

Can ignore processing most clauses unless the literal under consideration is being watched in them

Head and Tail Lists: SATO solver [1997]

Watched Literals: zChaff solver [2001]
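A compact Python sketch of two-watched-literal propagation, written in the general style popularized by later solvers rather than as zChaff's actual code. It assumes every clause has at least two distinct literals; the class and method names are hypothetical. The two watched literals of a clause are kept at positions 0 and 1, and a clause is visited only when one of its watches becomes false.

```python
from collections import defaultdict

class WatchedClauses:
    def __init__(self, clauses):
        self.clauses = [list(c) for c in clauses]  # positions 0 and 1 are watched
        self.watches = defaultdict(list)           # literal -> clause indices watching it
        self.assign = {}                           # variable -> bool
        self.trail = []                            # assigned literals, in order
        self.qhead = 0                             # next trail entry to propagate
        for idx, clause in enumerate(self.clauses):
            self.watches[clause[0]].append(idx)
            self.watches[clause[1]].append(idx)

    def value(self, lit):
        v = self.assign.get(abs(lit))
        return None if v is None else v == (lit > 0)

    def enqueue(self, lit):
        self.assign[abs(lit)] = lit > 0
        self.trail.append(lit)

    def propagate(self):
        """Return the index of a falsified clause, or None if there is no conflict."""
        while self.qhead < len(self.trail):
            false_lit = -self.trail[self.qhead]
            self.qhead += 1
            pending = self.watches[false_lit]
            self.watches[false_lit] = []
            for n, idx in enumerate(pending):
                c = self.clauses[idx]
                if c[0] == false_lit:
                    c[0], c[1] = c[1], c[0]        # keep the falsified watch at position 1
                if self.value(c[0]) is True:       # other watch is true: clause satisfied
                    self.watches[false_lit].append(idx)
                    continue
                for k in range(2, len(c)):         # look for a replacement watch
                    if self.value(c[k]) is not False:
                        c[1], c[k] = c[k], c[1]
                        self.watches[c[1]].append(idx)
                        break
                else:                              # none found: clause is unit or falsified
                    self.watches[false_lit].append(idx)
                    if self.value(c[0]) is False:
                        self.watches[false_lit].extend(pending[n + 1:])
                        return idx
                    self.enqueue(c[0])
        return None

    def decide(self, lit):
        self.enqueue(lit)
        return self.propagate()

    def backtrack(self, trail_len):
        """Un-assign variables only; the watch lists are left untouched."""
        while len(self.trail) > trail_len:
            del self.assign[abs(self.trail.pop())]
        self.qhead = len(self.trail)
```

The search for a replacement watch is the place where the whole clause may have to be scanned, while `backtrack` never touches the watch lists.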


H/T Lists vs. Watched Literals

WL structure needs:

– No pointer “trail” maintenance

– No work when backtracking

But can mean exploring the whole clause to detect a unit literal

Credit: Marques-Silva, Lynce, & Malik; Handbook of SAT


Dynamic Variable Selection Heuristics

VSIDS: Variable State Independent Decaying Sum (zChaff solver)

– Fast heuristic: not extremely accurate but adaptive and informed by conflicts!

– A key ingredient to make SAT solvers work well on industrial instances

– Necessitated by lazy data structures: accurate information about reduced clause size no longer available

Maintain one score for each literal

Increase score of literals appearing in the conflict clause

Periodically divide all scores by 2

Several variations, e.g., Berkmin solver:

– One score for each variable, incremented for all vars appearing in 1-UIP analysis

– More importantly: variable chosen from most recently learned and yet-unsatisfied conflict clause!
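A minimal Python sketch of the VSIDS bookkeeping described above: one score per literal, a bump for literals in each conflict clause, and a periodic halving of all scores. The bump amount of 1, the `decay_interval` parameter, and the class and method names are assumptions for illustration; production solvers use more efficient equivalents such as priority queues.

```python
class VSIDS:
    def __init__(self, num_vars, decay_interval=256):
        # One score per literal: v and -v for every variable v.
        self.score = {lit: 0.0 for v in range(1, num_vars + 1) for lit in (v, -v)}
        self.decay_interval = decay_interval
        self.conflicts = 0

    def on_conflict(self, conflict_clause):
        """Bump literals in the conflict clause; halve all scores periodically."""
        for lit in conflict_clause:
            self.score[lit] += 1.0
        self.conflicts += 1
        if self.conflicts % self.decay_interval == 0:
            for lit in self.score:
                self.score[lit] /= 2.0

    def pick(self, assigned_vars):
        """Highest-scoring literal whose variable is unassigned, or None."""
        candidates = [lit for lit in self.score if abs(lit) not in assigned_vars]
        return max(candidates, key=self.score.get) if candidates else None
```

Because the scores depend only on conflict history and not on the current state of the clauses, the heuristic can be maintained even when the solver no longer knows reduced clause sizes.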


Restarts: Without Clause Learning

Originally motivated by observations about runtime distributions of SAT solvers without conflict learning [Gomes et al, 1998]

Really effective when “heavy-tailed” behavior is present with many short runs (“backdoors”) and many very long runs

The easy to grasp concept: key is the role of the exponential distribution (geometric distribution, really, for the discrete case)

– If the probability of failure after time T

(a) decays faster than exponentially ⇒ hurts to restart

(b) decays exponentially ⇒ doesn’t matter (easy solution strategy: keep restarting!)

(c) decays slower than exponentially ⇒ should restart

[Plot: tail behavior of a standard distribution (finite mean & variance), exponential decay, and power law decay]
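The three cases can be checked numerically with a small Python sketch. It assumes a simplified model that is not from the slides: each run is an independent draw of a positive-integer runtime T, a run is abandoned after `cutoff` steps, and the expected total work is E[min(T, cutoff)] / P(T ≤ cutoff). The three survival functions P(T > t) are made-up examples of each decay regime.

```python
def expected_time_with_restarts(survival, cutoff):
    """Expected total steps when every run is cut off after `cutoff` steps.

    `survival(t)` is P(T > t) for the length T of a single run.
    """
    expected_run = sum(survival(t) for t in range(cutoff))  # E[min(T, cutoff)]
    success = 1.0 - survival(cutoff)                        # P(T <= cutoff)
    return expected_run / success

def geometric(t):      # exponential decay: memoryless
    return 0.9 ** t

def heavy_tail(t):     # power-law decay: slower than exponential
    return (1 + t) ** -0.5

def light_tail(t):     # decays faster than exponentially
    return 0.5 ** (t * t)
```

Under this model the geometric case gives the same expected work (10 steps) for every cutoff, a short cutoff beats a long one for the power-law tail, and the reverse holds for the light tail.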


Restarts: With Clause Learning

No clear empirical runtime distribution study (to my knowledge); however, large runtime variations often observed in practice and rapid restarts help!

– Safe: Context is kept through learned clauses and associated heuristics

Theoretical Justification/Intuition: Do we really need restarts?

– Stems from characterization of Clause Learning Proof System (CL) and its relation to General Resolution (RES)

– Full simulation of RES by CL known only in the presence of restarts!

1. CL (specific learning scheme, no restarts) exponentially more powerful than any “natural and proper” fragment of RES [2003]

2. CL** + lots of restarts = RES [2003]

3. F has a short RES proof ⇒ F’ has a short CL proof w/o restarts [2008]

4. CL + lots of restarts = RES [2009]

5. CL has short proofs of natural candidate formulas for separation [2012]


Summary

Dramatic Progress in CDCL SAT Solvers

– High Contrast with CP/MIP solvers w.r.t. “tree” structure

– “Everything” influenced by Learned Clauses and Conflict Analysis


Traditional Branching Heuristics

– Exist but no longer common (except in Look-Ahead SAT Solvers)

CDCL Solvers: Interesting Search Design, no clear “tree”

– Clause Learning

– Lazy Data Structures

– VSIDS Variable Selection Heuristic

– Aggressive (but careful) Restarts

Reference: Handbook of SAT

– 27 chapters: Everything from historical perspectives, theoretical foundations, practical solvers, applications, …


EXTRA SLIDES


Goal of This Talk

Highlight key advances in the design of DPLL-based SAT solvers that have made this scaling feasible

Note: it is not just the “simplicity” of the constraints per se

E.g., a CNF formula F given as a set of “clause constraints” to IBM/ILOG CP Solver or to a MIP solver would not scale up!

Several fundamental techniques make modern SAT solvers behave very differently from the traditional branch-and-backtrack search; e.g.

– there isn’t anymore a clearly defined “search tree”, or even a search data structure that “tries both branches” / “flips variable value”

– they don’t even look at most of the clauses when branching and propagating

– they literally do nothing upon backtrack besides un-assigning variable values (no “state” to revert back to)


Basic DPLL Search for SAT
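A minimal recursive sketch of basic DPLL in Python, assuming clauses are lists of signed integers and a naive "first literal of the first clause" branching rule. The function names are illustrative. This is the traditional branch-and-backtrack scheme that explicitly tries both values of a variable, in contrast to the CDCL search of modern solvers.

```python
def simplify(clauses, lit):
    """Make `lit` true: drop satisfied clauses, delete the opposite literal elsewhere."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def dpll(clauses, model=()):
    """Return a set of true literals satisfying `clauses`, or None if unsatisfiable."""
    while True:                                   # unit propagation
        if any(len(c) == 0 for c in clauses):
            return None                           # empty clause: conflict
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        clauses = simplify(clauses, unit)
        model = model + (unit,)
    if not clauses:
        return set(model)                         # every clause is satisfied
    lit = clauses[0][0]                           # naive branching choice
    for choice in (lit, -lit):                    # try one value, then "flip"
        result = dpll(simplify(clauses, choice), model + (choice,))
        if result is not None:
            return result
    return None
```

For example, `dpll([[1, 2], [-1, 2], [-2, 3]])` branches on literal 1 and then finds 2 and 3 by unit propagation.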


Key Techniques in Modern SAT Solvers

1. Clause learning (no-goods)

– Requires a “conflict analysis” mechanism: implication graph, graph cuts

– Motivates/necessitates efficient data structures (e.g., watched literals)

– Enables getting rid of traditional search tree

– Takes “restarts” to another level: very rapid and less risky

– Helps guide the solver in many ways

a) Conflict directed backjumping / non-chronological backtracking

b) Conflict directed variable selection: VSIDS

2. Lazy data structures: watched literals

– Motivated by SAT solvers spending ~80% of their time doing unit prop., and new clauses being added at a very rapid rate!

– Enables very efficient propagation: allows ignoring most clauses

– Enables “no work” upon backtracking

3. Very aggressive restarts

4. Assignment stack shrinking

5. Conflict clause minimization

6. Clause deletion (to save memory) [have to be careful about search tree]
