Resolution and Parallelizability: Barriers to the Efficient Parallelization of SAT Solvers

George Katsirelos, MIAT, INRA, Toulouse, France
Ashish Sabharwal, IBM Watson, USA
Horst Samulowitz, IBM Watson, USA
Laurent Simon, Univ. Paris-Sud, LRI/CNRS, Orsay, France

BIRS Workshop, Banff, Canada, Jan 22, 2014
© 2014 IBM Corporation
[published at AAAI-2013]


Page 2: [published at AAAI-2013]


Trend Towards Parallelization

Focus Shifting From Single-Thread Performance to Multi-Processor Performance
– 100s and even 1000s of compute cores easily accessible
– Classical Algorithm Parallelization, e.g., parallel sort, shortest path, PRAM model, AC circuits
– Significant Advances in Data Parallelism, e.g., MapReduce, Hadoop, SystemML, R statistics

Challenge: Search and Optimization on 1000s of Processors
– Tremendous advances in the Sequential case of Combinatorial Search, e.g., SAT solvers can tackle instances with ~2M variables, 10M constraints!
– Exponential search appears to be an “obvious” candidate to parallelize!
– In fact, many SAT/CSP/MIP solvers already support multi-core and multi-machine runs

2 Banff Workshop on SAT, 2014 | Katsirelos, Samulowitz, Sabharwal, Simon

Page 3: [published at AAAI-2013]


Parallelization of Combinatorial Search

Fact: State-of-the-Art Search Engines Do NOT Parallelize Well
– Brute-force exponential search is, of course, trivial to parallelize
– But sophisticated search engines that adapt (through, e.g., clause learning, variable impact aggregation, etc.) have inherent sequential aspects
– Modern SAT/MIP/“adapting”-CP solvers do not parallelize well
  • Supporting data: next slide

AAAI 2012 Challenge Paper on the topic [Hamadi & Wintersteiger 2012]
– P-completeness of Unit Propagation is a key barrier (solvers spend ~80% of their time on Unit Propagation, and we do not know how to parallelize P-complete problems well)
– Our result: barriers exist even if Unit Propagation came for free!


Page 4: [published at AAAI-2013]


Parallelization of Combinatorial Search: SAT

Rather Disappointing Performance at SAT Competitions, e.g., in 2011:
– Average speedup on 8 cores only ~1.8x, on 32 cores only ~3x
– Top performing parallel solvers were based on little to no communication (CryptoMinisat-MT [Soos 2012], Plingeling [Biere 2012])
– Winners were “simple” Portfolio solvers (ppfolio [Roussel], pfolioUZK [Wotzlaw et al])

Plingeling-ats-587 [Dec 2013]
– Single machine with 128 cores and 128 GB memory
– Benchmark set used in this work, restricted to the 142 instances solved by 1 core in [10, 5000) seconds


[Figure: Plingeling ats 587. x-axis: Number of Cores; y-axis: Speedup (geometric average). Values labeled in the plot: 1.00, 1.25, 1.90, 2.57, 2.69, 1.63]

Page 5: [published at AAAI-2013]


What makes parallelization of SAT solvers hard?

Can we obtain insights into their behavior beyond eventual wall-clock performance?

Page 6: [published at AAAI-2013]


Contributions of the Work

A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
– Focus on understanding rather than on engineering
– Are there inherent bottlenecks that may hinder parallelization, irrespective of which heuristics are used to share information?

1. A Practical Study: Interesting properties of Actual Proofs
– Proofs generated by state-of-the-art SAT solvers contain narrow bottlenecks

2. Proof-Based Measures that capture Best-Case Parallelizability
– Coarse measure: “Depth” of the proof graph
– Refined measure: Makespan of a resource-constrained scheduling problem

3. Empirical Findings: Correlations and Parallelization Limits
– Typical sequential proofs are not very parallelizable even in the best case!
– “Schedule speedup” / makespan correlates with observed speedup


Page 7: [published at AAAI-2013]


Approach: Proof Complexity (applied here to Typically Generated Proofs)

Proof Complexity [Cook & Reckhow, 1979]: Study of the nature (e.g., size, width, space, depth, “shape”, etc.) of Proofs of Unsatisfiability
– Resolution Graph of Conflict-Directed Clause Learning (CDCL) SAT Solvers

$\mathrm{Runtime}(\text{any SAT solver}, F) \ge \min_{\text{proofs}} \mathrm{Size}(\text{Resolution proof of } F)$

– Note: Insights applicable also to Satisfiable instances!
  • Solvers prove a lot of sub-formulas to be unsatisfiable before hitting the first solution
  • Formal characterization [Achlioptas et al, 2001 & 2004]

Study of Proofs has provided strong insights into CDCL SAT solvers
– What does “clause learning” bring?
– What do “restarts” add?
[Beame et al, 2004; Buss et al, 2008, 2012; Hertel et al, 2008; Pipatsrisawat et al, 2011]


Worst case / Best case results

Page 8: [published at AAAI-2013]


Underlying Inference Principle: Resolution

CDCL SAT solvers produce Resolution Derivations

Proof Graph and Depth:
– Each initial and derived constraint is a node, annotated with its proof depth
– $\mathrm{proofdepth}(\text{initial clause } C) = 0$
– $\mathrm{proofdepth}(\text{derived clause } C) = 1 + \max_{\text{parents}} \mathrm{proofdepth}(\mathrm{parent}(C))$

[Figure: example proof graph; each node is labeled with its constraint ID and proof depth]

Constraint ID                   Depth
C1 to C6 (initial clauses, F)   0
C7                              1
C8                              2
C9                              1
C10                             3
C11                             2
C12                             3
C13                             4
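The proof-depth recurrence on this slide can be sketched in a few lines of Python. The edges of the slide's figure did not survive in this transcript, so the parent lists below are an assumption: they are hypothetical and were chosen only so that the computed depths match the values listed for C7 to C13.

```python
def proof_depths(parents):
    """Return {clause: proof depth} for every derived clause.

    `parents` maps each derived clause to the clauses it was resolved from.
    Any clause that is not a key of `parents` is treated as an initial
    clause, which has depth 0.
    """
    depth = {}

    def visit(clause):
        if clause not in parents:       # initial clause
            return 0
        if clause not in depth:         # memoize derived clauses
            depth[clause] = 1 + max(visit(p) for p in parents[clause])
        return depth[clause]

    for clause in parents:
        visit(clause)
    return depth


# Hypothetical parent lists (NOT taken from the slide's figure).
EXAMPLE_PARENTS = {
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C8": ["C7", "C9"],
    "C11": ["C9", "C5"],
    "C10": ["C8", "C6"],
    "C12": ["C11", "C7"],
    "C13": ["C10", "C12"],
}

depths = proof_depths(EXAMPLE_PARENTS)
print(depths["C13"])  # 4
```

Recursion is adequate for a toy graph; a proof with millions of clauses would more plausibly be processed in a single pass in derivation order.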

Page 9: [published at AAAI-2013]


How Parallelizable are Resolution Refutations?

Refutation(F) = Resolution Proof that derives the empty (“false”) clause

Depth of the proof clearly limits the amount of potential parallelization
– Chain of dependencies
– Theorem: All Resolution Proof Graphs of certain “pebbling” style instances have large depth; also holds for all Conflict Resolution Graphs (XOR substitution trick)

However, the proofdepth bound on parallelization is very crude
– Does not explain poor performance with small k (e.g., 8, 32, … processors)

What does a typical sequential SAT solver proof look like?
– Setup for Experiments:
  • Sequential Glucose 2.1 extended with proof output
  • GluSatX10: using SatX10 to run a k-processor version of Sequential Glucose
– Working Assumption: Proofs produced by GluSatX10 on k cores look “similar” to proofs produced by Sequential Glucose


http://x10-lang.org/satx10 [IBM Teams: X10 and SAT/CSP]

** simplified statements; see paper for more formal notions

Page 10: [published at AAAI-2013]


Proof Graph Example: Very Complex Structure


[Easy sequential case, solved in ~30 seconds]

Page 11: [published at AAAI-2013]


Bottlenecks in Typical SAT Proofs


Proofs Generated by SAT Solvers Exhibit Surprisingly Narrow “Bottlenecks”, i.e., Depths with Very Few (~1) Clauses!
– Nothing deeper can be derived before bottleneck clauses ⇒ Sequentiality

[Figure: x-axis: Depth in the proof; y-axis: Number of Clauses (log-scale) Derived at that Depth]
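The quantity plotted here, the number of clauses derived at each depth, is easy to compute once every derived clause has a proof depth. The following is a minimal sketch, assuming the depths are available as a dictionary; the function names, the `max_width` threshold, and the toy input are all hypothetical and are not data from the talk.

```python
from collections import Counter


def clauses_per_depth(depths):
    """Count how many derived clauses sit at each proof depth."""
    return Counter(depths.values())


def bottleneck_depths(depths, max_width=1):
    """Return, in increasing order, the depths with at most `max_width` clauses."""
    counts = clauses_per_depth(depths)
    return sorted(d for d, n in counts.items() if n <= max_width)


# Hypothetical toy input: clause name -> proof depth.
toy_depths = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3, "f": 3, "g": 4}
print(bottleneck_depths(toy_depths))  # [2, 4]
```

Every clause deeper than a width-1 depth has a chain of ancestors passing through that depth, so nothing beyond it can be derived until the single clause at that depth exists.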

Page 12: [published at AAAI-2013]


Best-Case Parallelization with k Processors

Given Proof P and k Processors, Best-Case Parallelization of P = Resource Constrained Scheduling Problem with Precedences

Let $M_k(P)$ = makespan of the optimal schedule of P on k processors
– Even approximating $M_k(P)$ within 4/3 is NP-hard, but a $(2 - 1/k)$ approximation is easy

Best-Case k-processor speedup on P: $S_k(P) = M_1(P) / M_k(P)$


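[Figure: example proof graph with an additional clause C’9; each node is labeled with its constraint ID and proof depth]

Constraint ID   Depth
C1 to C6        0
C7              1
C8              2
C9              1
C’9             1
C10             3
C11             2
C12             3
C13             4

Example:
$M_1(P) = 8$
$M_2(P) = 5$
$M_3(P) = 4$
$M_4(P) = 4$
…
depth = 4

Numeric labels in the figure: 1, 1, 2; 2, 3; 3, 4; 5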
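A simple way to get a feel for these makespans is greedy list scheduling of unit-time derivations. The sketch below is not the exact solver behind the talk's numbers: it is a heuristic whose result is only an upper bound on the optimal makespan (any list schedule is within a factor of 2 − 1/k of optimal). The parent lists are an assumption, since the figure's edges are not in the transcript; they were chosen so that the depths match the example, and on this hypothetical graph the greedy schedule happens to reproduce the values 8, 5, 4, 4.

```python
from collections import defaultdict


def list_schedule_makespan(parents, k):
    """Greedy list scheduling of unit-time derivations on k processors.

    `parents` maps each derived clause to the clauses it is resolved from;
    clauses that are not keys are initial and available at time 0.
    Priority: longest remaining chain of dependents first.
    """
    derived = set(parents)
    children = defaultdict(list)
    for clause, ps in parents.items():
        for p in ps:
            if p in derived:
                children[p].append(clause)

    height = {}

    def chain(clause):
        # Length of the longest dependency chain starting at `clause`.
        if clause not in height:
            height[clause] = 1 + max((chain(c) for c in children[clause]), default=0)
        return height[clause]

    done, steps = set(), 0
    while len(done) < len(derived):
        ready = [c for c in derived - done
                 if all(p in done or p not in derived for p in parents[c])]
        ready.sort(key=lambda c: (-chain(c), c))
        done.update(ready[:k])          # run at most k derivations this step
        steps += 1
    return steps


# Hypothetical parent lists (NOT taken from the slide's figure).
EXAMPLE = {
    "C7": ["C1", "C2"], "C9": ["C3", "C4"], "C9'": ["C5", "C6"],
    "C8": ["C7", "C9"], "C11": ["C9", "C9'"],
    "C10": ["C8", "C5"], "C12": ["C11", "C7"],
    "C13": ["C10", "C12"],
}

makespans = [list_schedule_makespan(EXAMPLE, k) for k in (1, 2, 3, 4)]
print(makespans)  # [8, 5, 4, 4]
```

With two processors the three depth-1 clauses cannot all be derived in the first step, which is why the schedule needs five steps even though eight derivations would fit into four.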

Page 13: [published at AAAI-2013]


Makespan vs. Proof Depth

Schedule Makespan yields a finer-grained bound, $S_k(P)$, on best-case parallelization than proof depth
– $\mathrm{proofdepth}(P)$: limit of parallelization of P with “infinite” processors
– $M_k(P) \ge \mathrm{proofdepth}(P)$
– $M_k(P) \to \mathrm{proofdepth}(P)$ as $k \to \infty$
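Both relations can be read off a standard two-sided bound for scheduling unit-time tasks under precedence constraints. This bound is textbook scheduling material rather than a statement from the slides, and the symbol $n$ (the number of derived clauses in P, so that $M_1(P) = n$) is introduced here only for illustration.

```latex
\max\!\left(\mathrm{proofdepth}(P),\ \left\lceil \frac{n}{k} \right\rceil\right)
\;\le\; M_k(P) \;\le\;
\frac{n}{k} + \left(1 - \frac{1}{k}\right)\mathrm{proofdepth}(P)
```

As $k \to \infty$ the upper bound tends to $\mathrm{proofdepth}(P)$, so it meets the lower bound. Dividing $M_1(P) = n$ by the lower bound gives $S_k(P) \le \min\left(k,\ n / \mathrm{proofdepth}(P)\right)$.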


Page 14: [published at AAAI-2013]


Empirical Findings

Page 15: [published at AAAI-2013]


Even Best-Case Parallelization Efficiency is Low Beyond 100 Processors


Best-Case Efficiency of parallelizing P with k processors = $100 \cdot S_k(P) / k$

E.g., 100% = full utilization of k processors, i.e., speedup = k
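As a worked instance of the efficiency formula, the sketch below plugs in the makespans from the talk's small scheduling example (M1 = 8, M2 = 5, M3 = 4, M4 = 4). The function names are illustrative, not from the paper.

```python
def best_case_speedup(m1, mk):
    """S_k(P) = M_1(P) / M_k(P)."""
    return m1 / mk


def best_case_efficiency(m1, mk, k):
    """Efficiency in percent: 100 * S_k(P) / k."""
    return 100.0 * best_case_speedup(m1, mk) / k


# Makespans of the small example: k -> M_k(P).
example_makespans = {1: 8, 2: 5, 3: 4, 4: 4}

efficiency = {
    k: best_case_efficiency(example_makespans[1], mk, k)
    for k, mk in example_makespans.items()
}
for k, e in efficiency.items():
    print(f"k={k}: {e:.1f}%")
```

Even on this tiny proof the efficiency falls from 100% on one processor to about 80% on two and 50% on four, because the makespan cannot drop below the proof depth of 4.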

Page 16: [published at AAAI-2013]


Proofs of Some Instances Exhibit Very Low Best-Case Schedule Speedup

A) Even with 1024 processors, best-case speedup ~ 50-100

B) 128 processors insufficient to achieve a speedup of ~ 90

Page 17: [published at AAAI-2013]


Best-Case Schedule Speedup Correlates With Actual Observed Runtime Speedup

Average over a sliding window

(Makes the study of the best-case schedule speedup relevant)

Page 18: [published at AAAI-2013]


Summary

A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
– Focus on understanding rather than on engineering

Main Findings:
A. Typical Sequential Refutations Contain Surprisingly Narrow Bottlenecks
B. Typical Sequential Refutations are Not Parallelizable Beyond a Few Processors, even in the best case of offline ‘schedule speedup’ produced in hindsight
C. Observed Runtime Speedup with k processors weakly correlates with Best-Case Schedule Speedup of a Sequential Proof produced in hindsight

Open Question: Can we design SAT solvers that generate Proofs that are inherently More Parallelizable?

Caveat: assumption that proofs generated by GluSatX10 on k cores look “similar” to proofs generated by Sequential Glucose
