IBM Watson Research Center
CPAIOR-2013 Workshop: Seeking Feasibility in Combinatorial Problems © 2013 IBM Corporation
Branching Strategies and Restarts in SAT Solvers
Ashish Sabharwal
IBM Research
Talk Outline
The SAT Problem, SAT Solvers
Conflict-Driven Systematic SAT Solvers
– Dramatic Progress
– Contrast with CP/MIP solvers
– “Everything” influenced by Learned Clauses and Conflict Analysis
Traditional Branching Heuristics
CDCL Solvers: Dynamic Heuristics and Associated Techniques
– Clause Learning
– Lazy Data Structures
– VSIDS Variable Selection Heuristic
– Restarts
Summary
SAT: Problem and Solvers
Boolean Satisfiability (SAT): Basics
Variables with Boolean domain {T,F} or, equivalently, {1,0}
Constraints specified in the Conjunctive Normal Form (CNF)
E.g. (a or b) and (c or d or f) and (a or c or d)
SAT Solver: An algorithm (typically with an implementation) that, given a CNF formula F, finds a satisfying assignment for F, if there is one
– Complete SAT Solver: must terminate and output “unsatisfiable”, if F is unsat.
Dozens of (mostly Open Source) SAT Solvers available on the Internet
– 35+ solvers participated in SAT Competition 2006
– 65+ in 2011
clause = disjunction of literals
a: variable; a, ¬a: literals
SAT Solvers: 3 Dominant Approaches
Local Search based stochastic algorithms
– Incomplete (do not prove unsatisfiability)
– Very effective on satisfiable Random instances, esp. near phase transition
Look-Ahead Based systematic solvers
– Complete search with careful selection of variables/values to branch on
• Spend time exploring “reduction” in complexity with various branching possibilities
• Local Learning: some local inference within a subtree is learned as “implication arrays”
• Theoretical Studies: autarkies, “reduction” measures based on probability distributions
– Very effective on unsatisfiable Random instances
– Also effective on Crafted and some Industrial instances
Conflict Directed Clause Learning (CDCL) solvers
– Complete search producing General Resolution proofs of unsatisfiability
– Very effective on Industrial instances, esp. large and highly interconnected ones
Look-Ahead vs. CDCL SAT Solvers
Complementary Regimes of Strength
– Plot shows dominating solver on Crafted and Industrial instances
– Legend: + March (typical Look-Ahead solver); Minisat (typical CDCL solver)
[Plot, credit: Heule & van Maaren, Handbook of SAT. Region labels: “Low Constraint Density” and “Low Diameter of Resolution Graph” (two clauses have an edge if they clash in 1 literal)]
CDCL SAT Solvers
Systematic SAT Solvers as Search Engines: Dramatic Progress in 20 Years
Started out with ~100 vars, ~200 constraints in early 1990’s
Now often easily handle over 1M vars, ~5M constraints
– Instances with 30M clauses being used in competitions!
Was it all just Moore’s Law? It helped, but not much…
– 2x faster computer does not solve 2x larger SAT instance
– Search difficulty does not scale linearly with problem size!
Key Development Drivers
– Academic: “Open” SAT Competitions, Races, and Challenges: Germany ’89, DIMACS ’93, China ’96, SAT-2002, …, SAT-2013
– Industrial: Verification: Backend of Model Checkers, SMT solvers
– Applications to Test Pattern generation, Optimal Control, Protocol Design, Routers, Cryptography, E-Commerce (E-auctions & electronic trading agents), Bioinformatics (Haplotype Inference), etc.
SAT vs. CP/MIP Search: A Contrast
SAT Solvers, esp. CDCL Solvers, work in a very different setting and with very different design principles/goals:
Blackbox approach
– No notion of designing custom search / decompositions as in CP Opt. or CPLEX
– Expected to work “out of the box” with perhaps a little parameter tuning
Very little structure available to exploit
– Binary domains, CNF form – very “flat” representation
– Advantage: Standardization, Competitions, Simplicity
No objective function to estimate for guidance or use to assess progress
– Number of unsatisfied clauses can be a highly misleading indicator
Reliance on LOTS of branching, backtracking, learning, restarting, …all performed extremely fast. How fast?
But Note: 1M variable CNF formula given to CP or MIP solver will not fly
SAT Solvers as Fast Search Engines
CDCL SAT solvers have become really efficient at searching fast
E.g., on an IBM model checking instance from SAT Race 2006, with ~170k variables, 725k clauses, solvers such as Minisat and Rsat roughly
– Make 2000-5000 decisions/second
– Deduce 600-1000 conflicts/second
– Learn 600-1000 clauses/second (#clauses grows rapidly)
– Restart every 1-2 seconds (aggressive restarts)
Leading solvers such as Glucose have pushed Restarts even further
– Extremely aggressive restarts!
– Rely on techniques such as phase saving, “intelligent” clause deletion (based on LBD level), and dynamic context-based freezing of restarts to achieve success
SAT vs. CP/MIP: Branching “Tree” Structure
CP & MIP solvers traditionally explore a well-defined underlying search tree, albeit in different heuristic orders
– CP: typically binary/multi-way tree with DFS or LDS exploration order
– MIP: typically best-first style tree search with a frontier of Open nodes and “diving” to obtain feasible solutions quickly
Modern CDCL SAT solvers are very far from building a traditional search tree!
– Branching is “uneven”
– Restarts are extremely frequent (context is retained using various techniques)
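[Figure: branching diagrams with edge labels X=0, X=1 (“Normal”) and X=0, Y=1; residual labels: “smaller”, “?”]
• Under current context, X=0 ⇒ Y=0 by unit propagation (UP)
• Y is the 1-UIP variable in the last conflict analysis
• Note: Y=1 ⇒ X=1, but not necessarily Y=1 ⇒ X=1 by UP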
The Importance of Learned Clauses
“Everything” is influenced by Conflict Analysis and Learned Clauses!
– No need to “flip” value of the branched upon variable: 1-UIP learned clause automatically implies flipped value of the 1-UIP literal
– Enablement of Aggressive Restarts
• Safe, as context is preserved by learned clauses
– Conflict-directed Backjumping
– Necessity of Lazy Data Structures due to ~1000 clauses learned per second
• Fast but with a drawback: incomplete knowledge of current state of all clauses
• E.g., can no longer determine how many clauses are not yet satisfied!
– Branching heuristic: typical state-based heuristics cannot be computed anymore with lazy data structures: missing information about current state of all clauses
• VSIDS and variations for variable selection (more later)
It Wasn’t Always the Case…
Traditional, State-Dependent and History-Independent Heuristics in SAT
Traditional Branching Heuristics
SAT Solvers, before Clause Learning became a must-have, had many variations of state-dependent heuristics similar to CSP solvers, e.g.:
1. DLCS: Dynamic Largest Combined Sum
maximize CP(x) + CN(x), where CP(x) and CN(x) are the numbers of unresolved clauses in which x occurs positively and negatively, resp.
2. DLIS: Dynamic Largest Individual Sum
maximize max {CP(x), CN(x)}
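The two heuristics above can be sketched as follows, assuming clauses are lists of signed integers and "unresolved" means not yet satisfied by the partial assignment. The function names and the tie-breaking rule (smaller variable index wins) are illustrative assumptions.

```python
from collections import Counter

def literal_counts(cnf, assignment):
    """Count occurrences of each unassigned literal in unresolved clauses."""
    counts = Counter()
    for clause in cnf:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied, so it is resolved
        for l in clause:
            if abs(l) not in assignment:
                counts[l] += 1
    return counts

def dlcs(cnf, assignment):
    """Variable maximizing CP(x) + CN(x); None if nothing is left to branch on."""
    c = literal_counts(cnf, assignment)
    variables = {abs(l) for l in c}
    if not variables:
        return None
    return max(variables, key=lambda x: (c[x] + c[-x], -x))

def dlis(cnf, assignment):
    """Variable maximizing max{CP(x), CN(x)}; None if nothing is left."""
    c = literal_counts(cnf, assignment)
    variables = {abs(l) for l in c}
    if not variables:
        return None
    return max(variables, key=lambda x: (max(c[x], c[-x]), -x))

# x1 occurs twice positively and twice negatively; x2 occurs three times positively.
cnf = [[1, 2], [1, 3], [-1, 2], [-1, 2, 3]]
print(dlcs(cnf, {}), dlis(cnf, {}))
```

Both functions rescan every clause on each call, which is exactly the kind of state-dependent work that lazy data structures later make unavailable.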
Traditional Branching Heuristics… contd.
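3. BOHM [Buro & Kleine Büning, 1992]
lexicographically maximize the vector (H_1(x), H_2(x), …, H_n(x))
where H_i(x) = α·max(h_i(x), h_i(¬x)) + β·min(h_i(x), h_i(¬x)), with tunable constants α, β
and h_i(x) = #unresolved size-i clauses containing literal x
Intuitively, satisfy most small clauses or further reduce their size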
Traditional Branching Heuristics… contd.
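4. MOMS: Maximum Occurrences in clauses of Minimum Size
Many variations, e.g.:
maximize [f*(x) + f*(¬x)]·2^k + f*(x)·f*(¬x), with tunable constant k
where f*(l) = #unresolved smallest clauses containing literal l; the product term gives preference also to vars that appear as both pos & neg in smallest clauses
5. Jeroslow-Wang [1990]
maximize J(l) = Σ 2^(−|ω|), summed over unresolved clauses ω containing literal l
i.e., the number of unresolved clauses literal l appears in, weighted inversely proportional to exp(clause size)
Two-sided version: maximize J(x) + J(¬x)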
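A small sketch of the Jeroslow-Wang weighting, assuming the commonly stated form J(l) = sum of 2^(-size) over unresolved clauses containing literal l. Taking "size" to be the number of still-unassigned literals in the clause is an assumption, as are the function names and tie-breaking.

```python
from collections import defaultdict

def jw_scores(cnf, assignment):
    """J(l) for every unassigned literal l occurring in an unresolved clause."""
    scores = defaultdict(float)
    for clause in cnf:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # satisfied clauses contribute nothing
        free = [l for l in clause if abs(l) not in assignment]
        for l in free:
            scores[l] += 2.0 ** -len(free)  # short clauses weigh exponentially more
    return scores

def jw_one_sided(cnf, assignment):
    """Literal with the largest J(l); None if no unresolved clause remains."""
    s = jw_scores(cnf, assignment)
    return max(s, key=lambda l: (s[l], -abs(l), l)) if s else None

def jw_two_sided(cnf, assignment):
    """Variable with the largest J(x) + J(not x); None if nothing remains."""
    s = jw_scores(cnf, assignment)
    variables = {abs(l) for l in s}
    if not variables:
        return None
    return max(variables, key=lambda x: (s.get(x, 0.0) + s.get(-x, 0.0), -x))

cnf = [[1, 2], [-1, 3, 4], [-1, 2, 3], [-1, -2]]
print(jw_one_sided(cnf, {}), jw_two_sided(cnf, {}))
```

In this instance the literal ¬x1 collects weight from two size-3 clauses and one size-2 clause, so the one-sided rule picks a literal (and hence a value), while the two-sided rule picks only a variable.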
Key Techniques Inside Modern CDCL Solvers
DPLL Search as Implemented in Modern Solvers
Note: No “search tree” style search where we set x=0 and then later “flip” to x=1
Clause Learning: Conflict Graphs, etc.
Search tree behavior:
Branch: p=0, q=0, b=1
Detect conflict; learn, say, 1-UIP clause (¬a or t)
Backtrack to depth=2: assignment stack has p=0, q=0
Flip value of b to get b=0
Do nothing (not even state update) and simply observe t=1 is implied! further, t=1 implies b=0 (under the current context)
[Figure: branching diagram with decisions p=0, q=0, b=1, the learned clause (¬a or t), and the implied assignment t=1]
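The "no flip" behavior can be illustrated with a deliberately simplified loop. This sketch is not the 1-UIP scheme from the slide: as a stated assumption it learns the weakest possible clause (the negation of all current decisions), backtracks one level, and lets unit propagation on the learned clause assert the opposite value. Propagation is a naive scan over all clauses, and all names are hypothetical.

```python
def value(lit, assign):
    v = assign.get(abs(lit))
    return None if v is None else v == (lit > 0)

def propagate(clauses, assign, trail):
    """Naive unit propagation to fixpoint; returns a falsified clause or None."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            vals = [value(l, assign) for l in clause]
            if True in vals:
                continue                      # clause satisfied
            free = [l for l, v in zip(clause, vals) if v is None]
            if not free:
                return clause                 # all literals false: conflict
            if len(free) == 1:                # unit clause: imply its last literal
                assign[abs(free[0])] = free[0] > 0
                trail.append(free[0])
                changed = True
    return None

def solve(cnf):
    """Return a satisfying assignment (dict) or None if cnf is unsatisfiable."""
    clauses = [list(c) for c in cnf]
    variables = sorted({abs(l) for c in clauses for l in c})
    assign, trail = {}, []
    decisions = []                            # (decision literal, trail length before it)
    while True:
        if propagate(clauses, assign, trail) is not None:
            if not decisions:
                return None                   # conflict with no decisions: UNSAT
            clauses.append([-d for d, _ in decisions])   # learned clause
            _, mark = decisions.pop()         # undo only the last decision level
            while len(trail) > mark:
                del assign[abs(trail.pop())]
            continue                          # no explicit flip: propagation does it
        free = [v for v in variables if v not in assign]
        if not free:
            return dict(assign)
        decisions.append((-free[0], len(trail)))   # branch: first free variable = False
        assign[free[0]] = False
        trail.append(-free[0])

print(solve([[1, 2], [-1, 2], [-2, 3]]))
print(solve([[1, 2], [1, -2], [-1, 2], [-1, -2]]))
```

With a 1-UIP learned clause the solver would typically jump back further than one level and the implied literal need not be the last decision variable, which is what makes the real search "uneven".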
Lazy Data Structures
SAT solvers (used to) spend 80% of their time doing unit propagation
Must make unit propagation efficient
– as more and more clauses are added (clause learning)
– as longer clauses are added (initial clauses tend to be mostly short)
Observation: Watching two un-falsified literals is sufficient, no matter how long the clause is!
– With 2 un-falsified literals, clause guaranteed to not unit propagate or be falsified
Can ignore processing most clauses unless the literal under consideration is being watched in them
Head and Tail Lists: SATO solver [1997]
Watched Literals: zChaff solver [2001]
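A compact sketch of two-watched-literal propagation, written in the style popularized by later solvers rather than copied from zChaff. The class and attribute names are hypothetical, every clause is assumed to have at least two literals, and the `visits` counter exists only to show how few clauses get inspected.

```python
from collections import defaultdict

class WatchedClauses:
    """Clause positions 0 and 1 hold the two watched literals."""

    def __init__(self, cnf):
        self.clauses = [list(c) for c in cnf]   # assumes len(clause) >= 2
        self.watches = defaultdict(list)        # literal -> clauses watching it
        for i, c in enumerate(self.clauses):
            self.watches[c[0]].append(i)
            self.watches[c[1]].append(i)
        self.assign, self.trail, self.qhead = {}, [], 0
        self.visits = 0                         # clause inspections (illustration only)

    def value(self, lit):
        v = self.assign.get(abs(lit))
        return None if v is None else v == (lit > 0)

    def enqueue(self, lit):
        self.assign[abs(lit)] = lit > 0
        self.trail.append(lit)

    def backtrack(self, mark):
        """Undo assignments beyond trail length mark; watch lists stay untouched."""
        while len(self.trail) > mark:
            del self.assign[abs(self.trail.pop())]
        self.qhead = mark

    def propagate(self):
        """Return the index of a falsified clause, or None if there is no conflict."""
        while self.qhead < len(self.trail):
            false_lit = -self.trail[self.qhead]
            self.qhead += 1
            watching, self.watches[false_lit] = self.watches[false_lit], []
            for pos, i in enumerate(watching):
                self.visits += 1
                c = self.clauses[i]
                if c[0] == false_lit:
                    c[0], c[1] = c[1], c[0]     # keep the falsified watch at position 1
                if self.value(c[0]) is True:
                    self.watches[false_lit].append(i)
                    continue
                for k in range(2, len(c)):      # look for a replacement watch
                    if self.value(c[k]) is not False:
                        c[1], c[k] = c[k], c[1]
                        self.watches[c[1]].append(i)
                        break
                else:
                    self.watches[false_lit].append(i)
                    if self.value(c[0]) is False:
                        self.watches[false_lit].extend(watching[pos + 1:])
                        return i                # conflict: both watches falsified
                    self.enqueue(c[0])          # unit clause: imply the other watch
        return None

wc = WatchedClauses([[1, 2, 3], [-1, 2, 4], [-2, -4, 5]])
wc.enqueue(-3)
wc.propagate()                 # x3 occurs in clause 0 but is not watched there
visits_after_x3 = wc.visits
wc.enqueue(-2)
wc.propagate()
print(visits_after_x3, wc.visits, wc.trail)
```

Note that `backtrack` only unassigns variables, and that detecting a unit clause may require scanning the whole clause for a replacement watch.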
H/T Lists vs. Watched Literals
WL structure needs
No pointer “trail” maintenance
No work when backtracking
But can mean exploring the whole clause to detect unit literal
Credit: Marques-Silva, Lynce, & Malik; Handbook of SAT
Dynamic Variable Selection Heuristics
VSIDS: Variable State Independent Decaying Sum (zChaff solver)
– Fast heuristic: not extremely accurate but adaptive and informed by conflicts!
– A key ingredient to make SAT solvers work well on industrial instances
– Necessitated by lazy data structures: accurate information about reduced clause size no longer available
Maintain one score for each literal
Increase score of literals appearing in the conflict clause
Periodically divide all scores by 2
Several variations, e.g., Berkmin solver:
– One score for each variable, incremented for all vars appearing in 1-UIP analysis
– More importantly: variable chosen from most recently learned and yet-unsatisfied conflict clause!
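The three VSIDS rules above (one score per literal, bump on conflict, periodic halving) fit in a few lines. This is a sketch under assumptions: the class name, the `decay_interval` parameter and its default, and the linear scan in `pick` are illustrative; real solvers use a priority structure instead of scanning.

```python
class VSIDS:
    def __init__(self, variables, decay_interval=256):
        # One score per literal (v and -v), all starting at zero.
        self.score = {l: 0.0 for v in variables for l in (v, -v)}
        self.decay_interval = decay_interval   # assumed period, in conflicts
        self.conflicts = 0

    def on_conflict(self, conflict_clause):
        """Bump literals of the conflict clause; halve all scores periodically."""
        for lit in conflict_clause:
            self.score[lit] += 1.0
        self.conflicts += 1
        if self.conflicts % self.decay_interval == 0:
            for lit in self.score:
                self.score[lit] /= 2.0

    def pick(self, assignment):
        """Unassigned literal with the highest score, or None if all are assigned."""
        free = [l for l in self.score if abs(l) not in assignment]
        return max(free, key=self.score.__getitem__) if free else None

h = VSIDS([1, 2, 3], decay_interval=2)
h.on_conflict([1, -2])
h.on_conflict([-2, 3])        # second conflict triggers the halving
print(h.pick({}), h.score)
```

The scores depend only on the conflict history, not on the current state of the clauses, which is why the heuristic remains computable with lazy data structures.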
Restarts: Without Clause Learning
Originally motivated by observations about runtime distributions of SAT solvers without conflict learning [Gomes et al, 1998]
Really effective when “heavy-tailed” behavior is present with many short runs (“backdoors”) and many very long runs
The easy to grasp concept: key is the role of the exponential distribution (geometric distribution, really, for the discrete case)
– If probability of failure after time T
(a) decays faster than exponentially ⇒ hurts to restart
(b) decays exponentially ⇒ doesn’t matter (easy solution strategy: keep restarting!)
(c) decays slower than exponentially ⇒ should restart
[Plot: tail of the runtime distribution; curves labeled Standard Distribution (finite mean & variance), Power Law Decay, Exponential Decay]
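The three cases can be checked numerically. Assuming independent runs with discrete length T and a fixed restart cutoff c, the expected total time is E[min(T, c)] / P(T ≤ c), where E[min(T, c)] is the sum of P(T ≥ t) for t = 1..c. The three survival functions below are illustrative choices, not taken from the talk.

```python
def expected_time_with_cutoff(survival, cutoff):
    """Expected total steps when every run is restarted after `cutoff` steps.

    survival(t) = P(T >= t) for a single run of length T in {1, 2, ...}.
    """
    expected_run = sum(survival(t) for t in range(1, cutoff + 1))  # E[min(T, cutoff)]
    p_success = 1.0 - survival(cutoff + 1)                         # P(T <= cutoff)
    return expected_run / p_success

def geometric(t):          # exponential decay of the tail
    return 0.9 ** (t - 1)

def power_law(t):          # slower than exponential (heavy tail, infinite mean)
    return t ** -0.5

def fast_decay(t):         # faster than exponential
    return 0.9 ** ((t - 1) ** 2)

for name, s in [("geometric", geometric), ("power law", power_law), ("fast decay", fast_decay)]:
    print(name, [round(expected_time_with_cutoff(s, c), 2) for c in (1, 10, 100)])
```

For the geometric tail the cutoff has no effect, for the power-law tail short cutoffs are better, and for the fast-decaying tail short cutoffs are worse, matching cases (b), (c), and (a) respectively.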
Restarts: With Clause Learning
No clear empirical runtime distribution study (to my knowledge); however, large runtime variations often observed in practice and rapid restarts help!
– Safe: Context is kept through learned clauses and associated heuristics
Theoretical Justification/Intuition: Do we really need restarts?
– Stems from characterization of Clause Learning Proof System (CL) and its relation to General Resolution (RES)
– Full simulation of RES by CL known only in the presence of restarts!
1. CL (specific learning scheme, no restarts) exponentially more powerful than any “natural and proper” fragment of RES [2003]
2. CL** + lots of restarts = RES [2003]
3. F has a short RES proof ⇒ F’ has a short CL proof w/o restarts [2008]
4. CL + lots of restarts = RES [2009]
5. CL has short proofs of natural candidate formulas for separation [2012]
Summary
Dramatic Progress in CDCL SAT Solvers
– High Contrast with CP/MIP solvers w.r.t. “tree” structure
– “Everything” influenced by Learned Clauses and Conflict Analysis
Traditional Branching Heuristics
– Exist but no longer common (except in Look-Ahead SAT Solvers)
CDCL Solvers: Interesting Search Design, no clear “tree”
– Clause Learning
– Lazy Data Structures
– VSIDS Variable Selection Heuristic
– Aggressive (but careful) Restarts
Reference: Handbook of SAT
– 27 chapters: Everything from historical perspectives, theoretical foundations, practical solvers, applications, …
EXTRA SLIDES
Goal of This Talk
Highlight key advances in the design of DPLL-based SAT solvers that have made this scaling feasible
Note: it is not just the “simplicity” of the constraints per se
E.g., a CNF formula F given as a set of “clause constraints” to IBM/ILOG CP Solver or to a MIP solver would not scale up!
Several fundamental techniques make modern SAT solvers behave very differently from the traditional branch-and-backtrack search; e.g.
– there is no longer a clearly defined “search tree”, or even a search data structure that “tries both branches” / “flips variable value”
– they don’t even look at most of the clauses when branching and propagating
– they literally do nothing upon backtrack besides un-assigning variable values (no “state” to revert back to)
Basic DPLL Search for SAT
Key Techniques in Modern SAT Solvers
1. Clause learning (no-goods)
– Requires a “conflict analysis” mechanism: implication graph, graph cuts
– Motivates/necessitates efficient data structures (e.g., watched literals)
– Enables getting rid of traditional search tree
– Takes “restarts” to another level: very rapid and less risky
– Helps guide the solver in many ways
a) Conflict directed backjumping / non-chronological backtracking
b) Conflict directed variable selection: VSIDS
2. Lazy data structures: watched literals
– Motivated by SAT solvers spending ~80% of their time doing unit prop., and new clauses being added at a very rapid rate!
– Enables very efficient propagation: allows ignoring most clauses
– Enables “no work” upon backtracking
3. Very aggressive restarts
4. Assignment stack shrinking
5. Conflict clause minimization
6. Clause deletion (to save memory) [have to be careful about search tree]