multi-core and/or symbolic model checking€¦ · introduction symbolic multicore symbolic &...
TRANSCRIPT
Multi-core and/or Symbolic Model Checking
UNIVERSITY OF TWENTE. Formal Methods & Tools.
Jaco van de PolSeptember 20, 2012
Joint work with Stefan Blom, Tom van Dijk,Alfons Laarman, Elwin Pater, and Michael WeberAVOCS 2012
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Model Checking
I Investigate properties of all paths in some discrete graph
I Typically, the graph describes complex system behaviour
Applications
I Safety-critical systems: prove absence of errorsI Software validation: find bugs, generate test cases
I Indispensable for distributed and multi-core applications
I Hardware verification: partly replacing testing @ Intel
I Scheduling/planning: find the best possible scheduleI Performance and dependability evaluation:
I scenario exhibiting the worst-case latencyI average latency, long run throughput, mean-time to failure
I Biology: investigate dynamic system behaviour in silico
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 2 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Model Checking
I Investigate properties of all paths in some discrete graph
I Typically, the graph describes complex system behaviour
Applications
I Safety-critical systems: prove absence of errorsI Software validation: find bugs, generate test cases
I Indispensable for distributed and multi-core applications
I Hardware verification: partly replacing testing @ Intel
I Scheduling/planning: find the best possible scheduleI Performance and dependability evaluation:
I scenario exhibiting the worst-case latencyI average latency, long run throughput, mean-time to failure
I Biology: investigate dynamic system behaviour in silico
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 2 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Model Checking
I Investigate properties of all paths in some discrete graph
I Typically, the graph describes complex system behaviour
Applications
I Safety-critical systems: prove absence of errorsI Software validation: find bugs, generate test cases
I Indispensable for distributed and multi-core applications
I Hardware verification: partly replacing testing @ Intel
I Scheduling/planning: find the best possible scheduleI Performance and dependability evaluation:
I scenario exhibiting the worst-case latencyI average latency, long run throughput, mean-time to failure
I Biology: investigate dynamic system behaviour in silico
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 2 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
State Space Explosion
A combinatorial problem
I number of components
I number of data values
I length of buffers
What if the graph > 1010 states?
I Start computing on an initial part . on-the-fly model checking
I Devise a concise representation . . . . . symbolic model checking
I Explore only a subset of transitions . . . .partial order reduction
I Exhibit structural symmetries . . . . . . . . . . . symmetry reduction
I Use a cluster of computers . . . . . . . distributed model checking
I Shared-memory solution . . . . . . . . . . multi-core model checking
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 3 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
State Space Explosion
A combinatorial problem
I number of components
I number of data values
I length of buffers
What if the graph > 1010 states?
I Start computing on an initial part . on-the-fly model checking
I Devise a concise representation . . . . . symbolic model checking
I Explore only a subset of transitions . . . .partial order reduction
I Exhibit structural symmetries . . . . . . . . . . . symmetry reduction
I Use a cluster of computers . . . . . . . distributed model checking
I Shared-memory solution . . . . . . . . . . multi-core model checking
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 3 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Requirements for an interface
Access to high-performance model checking
I Technique X is only available for specification language Y
I A user needs to rewrite his model into different languages
PINS
mCRL2Process algebra (Spinja) (DBM lib)
Promela UPPAAL
ReachabilitySymbolicMulti−core
ReachabilityReachabilityDistributed
Locality of transitions
I Locality is a key for all high-performance algorithms
I Interface should be simple and generic: black-box
I Interface should expose structural locality: white-box
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 4 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Requirements for an interface
Access to high-performance model checking
I Technique X is only available for specification language Y
I A user needs to rewrite his model into different languages
PINS
mCRL2Process algebra (Spinja) (DBM lib)
Promela UPPAAL
ReachabilitySymbolicMulti−core
ReachabilityReachabilityDistributed
Locality of transitions
I Locality is a key for all high-performance algorithms
I Interface should be simple and generic: black-box
I Interface should expose structural locality: white-box
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 4 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
What is PINS?
Partitioned Interface for Next States:
I States are partitioned into vector of N state parts
I Partition the Next-State function into M transition groups
I Provide a priori a (hopefully sparse) N ×M dependency matrix(indicates which state parts each transition group depends on)
On-the-fly access to the state space and locality information
Three basic functions
I get-matrix : returns the dependency matrix DM×NI init-state(): returns the initial state vector
I next-state(i,s): successors of state s in transition group i
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 5 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
What is PINS?
Partitioned Interface for Next States:
I States are partitioned into vector of N state parts
I Partition the Next-State function into M transition groups
I Provide a priori a (hopefully sparse) N ×M dependency matrix(indicates which state parts each transition group depends on)
On-the-fly access to the state space and locality information
Three basic functions
I get-matrix : returns the dependency matrix DM×NI init-state(): returns the initial state vector
I next-state(i,s): successors of state s in transition group i
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 5 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
LTSmin tool Architecture and Functionality
Pins2pins
mCRL2 Promela DVE UPPAAL
Symbolic
Specification
PINS
PINS
Distributed Multi−core
Languages
ToolsReachability
reduction Partial−order Variable reordering
Transition groupingcachingTransition
Wrappers
Analysis algorithms
I Check errors/goals (deadlocks, invariants, actions) on-the-fly
I Distributed generation and minimization modulo bisimulation
I Multi-core LTL model checking
I Symbolic CTL∗ and µ-calculus model checking
I Disclaimer: partial-order reduction needs a more refined API
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 6 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Table of Contents
1 IntroductionModel CheckingLTSmin toolset
2 Symbolic Model CheckingLocal Transition CachingMulti-valued Decision Diagrams
3 Multi-core Model CheckingLockless Shared HashtableState Space CompressionParallel Nested Depth First Search for LTL model checking
4 Symbolic Model Checking with Multi-core BDDsMulti-core Data StructuresParallel BDD operations
5 Conclusion
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 7 / 41
Symbolic Model Checking
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Example Dependency Matrix
int x=7;
process p1() {
do
::{x>0 => x--;y++}
::{x>0 => x--;z++}
od }
int y=3;
process p2() {
do
::{y>0 => y--;x++}
::{y>0 => y--;z++}
od }
int z=9;
process p3() {
do
::{z>0 => z--;x++}
::{z>0 => z--;y++}
od }
Default Matrix
x y z
p1 + + +p2 + + +p3 + + +
Better Matrix
x y z
p1.1 + + −p1.2 + − +p2.1 + + −p2.2 − + +p3.1 + − +p3.2 − + +
Static Regrouping
x y z
p1.1, 2.1 + + −p1.2, 3.1 + − +p2.2, 3.2 − + +
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 9 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Example Dependency Matrix
int x=7;
process p1() {
do
::{x>0 => x--;y++}
::{x>0 => x--;z++}
od }
int y=3;
process p2() {
do
::{y>0 => y--;x++}
::{y>0 => y--;z++}
od }
int z=9;
process p3() {
do
::{z>0 => z--;x++}
::{z>0 => z--;y++}
od }
Default Matrix
x y z
p1 + + +p2 + + +p3 + + +
Better Matrix
x y z
p1.1 + + −p1.2 + − +p2.1 + + −p2.2 − + +p3.1 + − +p3.2 − + +
Static Regrouping
x y z
p1.1, 2.1 + + −p1.2, 3.1 + − +p2.2, 3.2 − + +
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 9 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Example Dependency Matrix
int x=7;
process p1() {
do
::{x>0 => x--;y++}
::{x>0 => x--;z++}
od }
int y=3;
process p2() {
do
::{y>0 => y--;x++}
::{y>0 => y--;z++}
od }
int z=9;
process p3() {
do
::{z>0 => z--;x++}
::{z>0 => z--;y++}
od }
Default Matrix
x y z
p1 + + +p2 + + +p3 + + +
Better Matrix
x y z
p1.1 + + −p1.2 + − +p2.1 + + −p2.2 − + +p3.1 + − +p3.2 − + +
Static Regrouping
x y z
p1.1, 2.1 + + −p1.2, 3.1 + − +p2.2, 3.2 − + +
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 9 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Example Dependency Matrix
int x=7;
process p1() {
do
::{x>0 => x--;y++}
::{x>0 => x--;z++}
od }
int y=3;
process p2() {
do
::{y>0 => y--;x++}
::{y>0 => y--;z++}
od }
int z=9;
process p3() {
do
::{z>0 => z--;x++}
::{z>0 => z--;y++}
od }
Default Matrix
x y z
p1 + + +p2 + + +p3 + + +
Better Matrix
x y z
p1.1 + + −p1.2 + − +p2.1 + + −p2.2 − + +p3.1 + − +p3.2 − + +
Static Regrouping
x y z
p1.1, 2.1 + + −p1.2, 3.1 + − +p2.2, 3.2 − + +
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 9 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Example Dependency Matrixint x=7;
process p1() {
do
::{x>0 => x--;y++}
::{x>0 => x--;z++}
od }
int y=3;
process p2() {
do
::{y>0 => y--;x++}
::{y>0 => y--;z++}
od }
int z=9;
process p3() {
do
::{z>0 => z--;x++}
::{z>0 => z--;y++}
od }
Default Matrix
x y z
p1 + + +p2 + + +p3 + + +
Better Matrix
x y z
p1.1 + + −p1.2 + − +p2.1 + + −p2.2 − + +p3.1 + − +p3.2 − + +
init state = 〈7, 3, 9〉
〈7, 3, 9〉 p1.1−→ 〈6, 4, 9〉〈7, 3, ∗〉 p1.1−→ 〈6, 4, ∗〉
〈7, 3, 9〉 p3.2−→ 〈7, 4, 8〉〈∗, 3, 9〉 p3.2−→ 〈∗, 4, 8〉
Static Regrouping
x y z
p1.1, 2.1 + + −p1.2, 3.1 + − +p2.2, 3.2 − + +
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 9 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Example Dependency Matrix
int x=7;
process p1() {
do
::{x>0 => x--;y++}
::{x>0 => x--;z++}
od }
int y=3;
process p2() {
do
::{y>0 => y--;x++}
::{y>0 => y--;z++}
od }
int z=9;
process p3() {
do
::{z>0 => z--;x++}
::{z>0 => z--;y++}
od }
Default Matrix
x y z
p1 + + +p2 + + +p3 + + +
Better Matrix
x y z
p1.1 + + −p1.2 + − +p2.1 + + −p2.2 − + +p3.1 + − +p3.2 − + +
Static Regrouping
x y z
p1.1, 2.1 + + −p1.2, 3.1 + − +p2.2, 3.2 − + +
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 9 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Caching Local Transitions
I Consider local transition in specification:
p3.2: atomic { z>0 -> z--; y++ }
I Dependency matrix row:[ x y z
p3.2 0 1 1]
I Define projection: πp3.2〈x , y , z〉 = 〈y , z〉
Next, consider two consecutive calls to p3.2
first call: 〈x , y , z〉successor: 〈x , y ′, z ′〉project and 〈y , z〉 →store in cache: 〈y ′, z ′〉
second call: 〈x ′′, y , z〉project: 〈y , z〉cache lookup: → 〈y ′, z ′〉expand: 〈x ′′, y ′, z ′〉
I Transition caching saves calls to the language module
I Memoization table cache[i ] for each transition group i
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 10 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Caching Local Transitions
I Consider local transition in specification:
p3.2: atomic { z>0 -> z--; y++ }
I Dependency matrix row:[ x y z
p3.2 0 1 1]
I Define projection: πp3.2〈x , y , z〉 = 〈y , z〉
Next, consider two consecutive calls to p3.2
first call: 〈x , y , z〉successor: 〈x , y ′, z ′〉project and 〈y , z〉 →store in cache: 〈y ′, z ′〉
second call: 〈x ′′, y , z〉project: 〈y , z〉cache lookup: → 〈y ′, z ′〉expand: 〈x ′′, y ′, z ′〉
I Transition caching saves calls to the language module
I Memoization table cache[i ] for each transition group i
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 10 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Caching Local Transitions
I Consider local transition in specification:
p3.2: atomic { z>0 -> z--; y++ }
I Dependency matrix row:[ x y z
p3.2 0 1 1]
I Define projection: πp3.2〈x , y , z〉 = 〈y , z〉
Next, consider two consecutive calls to p3.2
first call: 〈x , y , z〉successor: 〈x , y ′, z ′〉project and 〈y , z〉 →store in cache: 〈y ′, z ′〉
second call: 〈x ′′, y , z〉project: 〈y , z〉cache lookup: → 〈y ′, z ′〉expand: 〈x ′′, y ′, z ′〉
I Transition caching saves calls to the language module
I Memoization table cache[i ] for each transition group i
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 10 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Caching Local Transitions
I Consider local transition in specification:
p3.2: atomic { z>0 -> z--; y++ }
I Dependency matrix row:[ x y z
p3.2 0 1 1]
I Define projection: πp3.2〈x , y , z〉 = 〈y , z〉
Next, consider two consecutive calls to p3.2
first call: 〈x , y , z〉successor: 〈x , y ′, z ′〉project and 〈y , z〉 →store in cache: 〈y ′, z ′〉
second call: 〈x ′′, y , z〉project: 〈y , z〉cache lookup: → 〈y ′, z ′〉expand: 〈x ′′, y ′, z ′〉
I Transition caching saves calls to the language module
I Memoization table cache[i ] for each transition group i
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 10 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-valued Decision Diagrams
34
4
5
5 5
31
4 61
4 5
86
1
1
1
1
1
1
6
8
5
6
6
8
5
5
8
4
3
3
4
3
3
4
3
3
5
4
5
5
4
5
5
4
5
4 4
4 4
4 4
4
4
4
5
5
5
6
6
6
3
3
3
3
3
3
I Every path in the MDD represents a concrete state vector
I Potential gain in memory saving: exponential (here: 54→ 15)
I Symbolic Reachability: explore sets of states stored as MDDs
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 11 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-valued Decision Diagrams
34
4
5
5 5
31
4 61
4 5
86
1
1
1
1
1
1
6
8
5
6
6
8
5
5
8
4
3
3
4
3
3
4
3
3
5
4
5
5
4
5
5
4
5
4 4
4 4
4 4
4
4
4
5
5
5
6
6
6
3
3
3
3
3
3
I Every path in the MDD represents a concrete state vector
I Potential gain in memory saving: exponential (here: 54→ 15)
I Symbolic Reachability: explore sets of states stored as MDDs
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 11 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Symbolic Reachability Algorithm
I L, V : MDDs for sets of long state vectors (level, visited)
I Ri : MDDs to store transition relation i on short vectors
I Li , Vi : MDDs for sets of short state vectors (level,visited for i)
symbolic-reachability: simplest version
(1) L := {InitState()}; V := L; all Ri := ∅; all Vi := ∅(2) while L 6= ∅ do(3) for i ∈ groups do(4) Li := πi ([D]N×K , L) \ Vi ; Vi := Vi ∪ Li(5) Ri := Ri ∪ {(s, s ′) | s ∈ Li ∧ s ′ ∈ NextState(i , s)}(6) L :=
⋃i (RelProd(Ri , L) \ V ); V := V ∪ L
(7) return V
I RelProd and Or are MDD operations
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 12 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Symbolic Reachability Algorithm
I L, V : MDDs for sets of long state vectors (level, visited)
I Ri : MDDs to store transition relation i on short vectors
I Li , Vi : MDDs for sets of short state vectors (level,visited for i)
symbolic-reachability: simplest version
(1) L := {InitState()}; V := L; all Ri := ∅; all Vi := ∅(2) while L 6= ∅ do(3) for i ∈ groups do(4) Li := πi ([D]N×K , L) \ Vi ; Vi := Vi ∪ Li(5) Ri := Ri ∪ {(s, s ′) | s ∈ Li ∧ s ′ ∈ NextState(i , s)}(6) L :=
⋃i (RelProd(Ri , L) \ V ); V := V ∪ L
(7) return V
I RelProd and Or are MDD operations
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 12 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Symbolic Reachability
Symbolic Reachability with PINS
I Global set of reachable states is computed as fix point
I Stored as a multi-valued decision diagram (MDD)
I Learn symbolic sub-groups Ri on-the-fly (via NextState)
Up to > 250 states now also for PROMELA and mCRL2
Extensions
I Multiple exploration strategies:I Breadth-first: (T1 + T2 + · · ·+ Tn)∗
I Chaining: (T1 ◦ T2 ◦ · · · ◦ Tn)∗
I Saturation-like: (T ∗1 ◦ T ∗
2 ◦ · · · ◦ T ∗n )∗
I Full Saturation: ((((T ∗1 T2)∗T3)∗ · · · )∗Tn)∗
I Static variable reordering boosts performance
I Multiple BDD packages: BuDDy, own MDD, Sylvan (later)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 13 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Symbolic Reachability
Symbolic Reachability with PINS
I Global set of reachable states is computed as fix point
I Stored as a multi-valued decision diagram (MDD)
I Learn symbolic sub-groups Ri on-the-fly (via NextState)
Up to > 250 states now also for PROMELA and mCRL2
Extensions
I Multiple exploration strategies:I Breadth-first: (T1 + T2 + · · ·+ Tn)∗
I Chaining: (T1 ◦ T2 ◦ · · · ◦ Tn)∗
I Saturation-like: (T ∗1 ◦ T ∗
2 ◦ · · · ◦ T ∗n )∗
I Full Saturation: ((((T ∗1 T2)∗T3)∗ · · · )∗Tn)∗
I Static variable reordering boosts performance
I Multiple BDD packages: BuDDy, own MDD, Sylvan (later)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 13 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Symbolic Reachability
Symbolic Reachability with PINS
I Global set of reachable states is computed as fix point
I Stored as a multi-valued decision diagram (MDD)
I Learn symbolic sub-groups Ri on-the-fly (via NextState)
Up to > 250 states now also for PROMELA and mCRL2
Extensions
I Multiple exploration strategies:I Breadth-first: (T1 + T2 + · · ·+ Tn)∗
I Chaining: (T1 ◦ T2 ◦ · · · ◦ Tn)∗
I Saturation-like: (T ∗1 ◦ T ∗
2 ◦ · · · ◦ T ∗n )∗
I Full Saturation: ((((T ∗1 T2)∗T3)∗ · · · )∗Tn)∗
I Static variable reordering boosts performance
I Multiple BDD packages: BuDDy, own MDD, Sylvan (later)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 13 / 41
Multi-core Model Checking
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-Core “Crisis”
Recent “standard hardware”
I Multiple processors, multiple cores
I E.g.: 48 cores, 256GB shared mem
I L2 caches: cache-coherence
I Non-Uniform Memory Architecture
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 15 / 41
(Source: Smoothspan)
(Source: Anandtech)
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-Core “Crisis”
Recent “standard hardware”
I Multiple processors, multiple cores
I E.g.: 48 cores, 256GB shared mem
I L2 caches: cache-coherence
I Non-Uniform Memory Architecture
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 15 / 41
(Source: Smoothspan)
(Source: Anandtech)
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Table of Contents
1 IntroductionModel CheckingLTSmin toolset
2 Symbolic Model CheckingLocal Transition CachingMulti-valued Decision Diagrams
3 Multi-core Model CheckingLockless Shared HashtableState Space CompressionParallel Nested Depth First Search for LTL model checking
4 Symbolic Model Checking with Multi-core BDDsMulti-core Data StructuresParallel BDD operations
5 Conclusion
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 16 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-core Reachability - Architecture I
Worker 1 Worker 2
Worker 3 Worker 4
QueueQueue
QueueQueue
store store
storestore
I DiVinE: Static Partitioning
I BFS, communication
store
Worker 1 Worker 2
Worker 4 Worker 3
Stack
Stack
Stack
Stack
I SPIN: Shared Storage, Stack Slicing
I DFS, integrated load balancing
Main bottleneck for shared storage: Visited Set
I Parallel access requires synchronization . . . . . . lock contention
I Graph traversal: Random memory access . . . . . . . cache misses
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 17 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-core Reachability - Architecture I
Worker 1 Worker 2
Worker 3 Worker 4
QueueQueue
QueueQueue
store store
storestore
I DiVinE: Static Partitioning
I BFS, communication
store
Worker 1 Worker 2
Worker 4 Worker 3
Stack
Stack
Stack
Stack
I SPIN: Shared Storage, Stack Slicing
I DFS, integrated load balancing
Main bottleneck for shared storage: Visited Set
I Parallel access requires synchronization . . . . . . lock contention
I Graph traversal: Random memory access . . . . . . . cache misses
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 17 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-core Reachability - Architecture II
store
Worker 1 Worker 2
Worker 3 Worker 4
Shared state storage as main synchronization point
I Flexible state space traversal . . . . . . . free/strict BFS and DFS
I Flexible load balancing . . . . . . . . .Synchronous Random PollingI Single function needed: Find-or-Put
I Monotonically growing set of visited states
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 18 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Lockless Hash Table: Layout[FMCAD 2010]
I Open addressing
I Separate hash and data
I Hash Memoization
I Walking the Line
I Lockless (CAS + write bit)
|state|
data bucket |c
ache
line
|
See also: Cliff Click (JavaOne 2007 presentation)
Further compression: (Parallel) Compact Hash Table[J.G. Cleary’84, Vegt/Laarman’12]
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 19 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Lockless Hash Table: Layout[FMCAD 2010]
I Open addressing
I Separate hash and data
I Hash Memoization
I Walking the Line
I Lockless (CAS + write bit)
|state|
data bucket |c
ache
line
|
See also: Cliff Click (JavaOne 2007 presentation)Further compression: (Parallel) Compact Hash Table
[J.G. Cleary’84, Vegt/Laarman’12]
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 19 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Experiments from 2010 (BEEM database)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 20 / 41
SPIN 5.2.4 (NASA/JPL) DiVinE 2.2 (Brno,CZ)
LTSmin, no load balancing LTSmin, random synchronous polling
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Tree Compression – addressing memory consumption[PDMC 2007, JLC 2009]
1
8
1
1
1
1
1
6
8
5
6
6
8
5
8
4
3
3
4
3
4
3
3
4
3
5
4
5
5
4
5
4
5
5
4
4 4
4 4
4 4
4
4
4 4
5
5
5
6
6
6
3
3
3
3
3
5 6
0 1
2 3
3 5 5 4 1 3
3 5
6
2
0 0
1
2
0
1
0
2
52
4 1
1 3
0
0
1
1
2
1 2
2
0
1
I Index 5 to folded vector 〈2, 1〉 represents state 〈3, 5, 5, 4, 1, 3〉I Selected tree fringe of grey boxes corresponds to state vector
I Potential gain (here 54→ 42 entries):I main table of size N is only two integers wideI small tables have expected size O(
√N) only
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 21 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Tree Compression – addressing memory consumption[PDMC 2007, JLC 2009]
1
8
1
1
1
1
1
6
8
5
6
6
8
5
8
4
3
3
4
3
4
3
3
4
3
5
4
5
5
4
5
4
5
5
4
4 4
4 4
4 4
4
4
4 4
5
5
5
6
6
6
3
3
3
3
3
5 6
0 1
2 3
3 5 5 4 1 3
3 5
6
2
0 0
1
2
0
1
0
2
52
4 1
1 3
0
0
1
1
2
1 2
2
0
1
I Index 5 to folded vector 〈2, 1〉 represents state 〈3, 5, 5, 4, 1, 3〉I Selected tree fringe of grey boxes corresponds to state vectorI Potential gain (here 54→ 42 entries):
I main table of size N is only two integers wideI small tables have expected size O(
√N) only
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 21 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Exploiting locality once more
Dependency Matrix DM×N predicts changing state parts:I Incremental tree insertions:
I Traverse only the changing paths in the Tree of Tables
I Incremental hashing, based on Zobrist:
g1-f3
Hx Hy (Hx Z ,g,1) Z ,f,3 =
Albert L. Zobrist, A New Hashing Method with Application forGame Playing, Tech. Rep. 88, University of Wisconsin, 1969.
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 22 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Exploiting locality once more
Dependency Matrix DM×N predicts changing state parts:I Incremental tree insertions:
I Traverse only the changing paths in the Tree of Tables
I Incremental hashing, based on Zobrist:
g1-f3
Hx Hy (Hx Z ,g,1) Z ,f,3 =
Albert L. Zobrist, A New Hashing Method with Application forGame Playing, Tech. Rep. 88, University of Wisconsin, 1969.
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 22 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Multi-Core Tree Compression – experiments[NASA FM 2011]
I Tree compression is a recursivevariant of SPIN’s Collapse
I Exploit combinatorial structure:I State vectors highly similarI Impressive compression ratios
I Extreme case: firewire tree
Uncompressed: 14 GBTree Compression: 96 MB
I Compression comes for freeI Arithmetic intensity increasesI Less memory-bus traffic
!"
#"
$!"
$#"
%!"
%#"
!" #!" $!!" $#!" %!!" %#!" &!!"
!"#$%&''(")
*+,!-"%*./
012**
'-,-&*3&)4-5*.67-&2**
'())"*+,-()../+0"12.3"'245)"6-7,25"8())"*+,-()../+0"96::;<=>"+-7,25"96::;<=>"*+,-()../+0"
0
1000
2000
3000
4000
5000
6000
1 2 4 6 8 10 12 14 16
time
(sec
)
#cores
LTSmin-mc TableLTSmin-mc Tree
DiVinE 2.2SPIN
SPIN Collapseoptimal (linear speedup)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 23 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Model Checking by Accepting Cycles
LTL Model Checking
I A buggy run in a system can be viewed as an infinite word
I Absence of bugs: emptiness of some Buchi automaton
I Graph problem: find a reachable accepting state on a cycle
I Basic algorithm: Nested Depth First Search (NDFS)
12
3 4 5
6
12
5
6
New in LTSmin (ATVA’11,’12)
I Scalable parallel NDFS
I So far, thought to beimpossible (in theory)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 24 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Model Checking by Accepting Cycles
LTL Model Checking
I A buggy run in a system can be viewed as an infinite word
I Absence of bugs: emptiness of some Buchi automaton
I Graph problem: find a reachable accepting state on a cycle
I Basic algorithm: Nested Depth First Search (NDFS)
12
3 4 5
6
12
5
6
New in LTSmin (ATVA’11,’12)
I Scalable parallel NDFS
I So far, thought to beimpossible (in theory)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 24 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Model Checking by Accepting Cycles
LTL Model Checking
I A buggy run in a system can be viewed as an infinite word
I Absence of bugs: emptiness of some Buchi automaton
I Graph problem: find a reachable accepting state on a cycle
I Basic algorithm: Nested Depth First Search (NDFS)
12
3 4 5
612
5
6
New in LTSmin (ATVA’11,’12)
I Scalable parallel NDFS
I So far, thought to beimpossible (in theory)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 24 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Model Checking by Accepting Cycles
LTL Model Checking
I A buggy run in a system can be viewed as an infinite word
I Absence of bugs: emptiness of some Buchi automaton
I Graph problem: find a reachable accepting state on a cycle
I Basic algorithm: Nested Depth First Search (NDFS)
12
3 4 5
612
5
6 New in LTSmin (ATVA’11,’12)
I Scalable parallel NDFS
I So far, thought to beimpossible (in theory)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 24 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Nested Depth First Search[Courcoubetis, Vardi, etal.]
procedure DFSblue(s)s.blue := truefor all t∈NextState(s) do
if ¬t.blue then DFSblue(t)if s∈Accepting then
seed := sDFSred(s)
procedure DFSred(s)s.red := truefor all t∈NextState(s) do
if t = seed then ExitCycleif ¬t.red then DFSred(t)
Nested DFS
I Blue searchI Visits all reachable statesI Starts Red search on
accepting states (seed)in post order
I Red SearchI Finds cycle through seedI Visits states at most once
I Linear time, on-the-fly
I Blue is inherently depth-first
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 25 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Nested Depth First Search[Courcoubetis, Vardi, etal.]
procedure DFSblue(s)s.blue := truefor all t∈NextState(s) do
if ¬t.blue then DFSblue(t)if s∈Accepting then
seed := sDFSred(s)
procedure DFSred(s)s.red := truefor all t∈NextState(s) do
if t = seed then ExitCycleif ¬t.red then DFSred(t)
Nested DFS
I Blue searchI Visits all reachable statesI Starts Red search on
accepting states (seed)in post order
I Red SearchI Finds cycle through seedI Visits states at most once
I Linear time, on-the-fly
I Blue is inherently depth-first
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 25 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Nested Depth First Search[Courcoubetis, Vardi, etal.]
procedure DFSblue(s)s.blue := truefor all t∈NextState(s) do
if ¬t.blue then DFSblue(t)if s∈Accepting then
seed := sDFSred(s)
procedure DFSred(s)s.red := truefor all t∈NextState(s) do
if t = seed then ExitCycleif ¬t.red then DFSred(t)
Nested DFS
I Blue searchI Visits all reachable statesI Starts Red search on
accepting states (seed)in post order
I Red SearchI Finds cycle through seedI Visits states at most once
I Linear time, on-the-fly
I Blue is inherently depth-first
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 25 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Swarmed Multi-core Nested Depth First Search
code for worker i, no communication
procedure DFSblue(s,i)s.blue[i] := truefor all t∈NextState(s) do
if ¬t.blue[i] then DFSblue(t,i)if s∈Accepting then
seed[i] := sDFSred(s,i)
procedure DFSred(s,i)s.red[i] := truefor all t∈NextState(s) do
if t = seed[i] then ExitCycleif ¬t.red[i] then DFSred(t,i)
Multi-core Swarmed NDFS
I N workers perform parallelsearch independently
[G. Holzmann etal.]
I Multi-core: store visitedstates in a shared hash table
I Scales well in the presenceof accepting cycles (bugs)
I Otherwise, all workerstraverse the whole graph
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 26 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Swarmed Multi-core Nested Depth First Search
code for worker i, no communication
procedure DFSblue(s,i)s.blue[i] := truefor all t∈NextState(s) do
if ¬t.blue[i] then DFSblue(t,i)if s∈Accepting then
seed[i] := sDFSred(s,i)
procedure DFSred(s,i)s.red[i] := truefor all t∈NextState(s) do
if t = seed[i] then ExitCycleif ¬t.red[i] then DFSred(t,i)
Multi-core Swarmed NDFS
I N workers perform parallelsearch independently
[G. Holzmann etal.]
I Multi-core: store visitedstates in a shared hash table
I Scales well in the presenceof accepting cycles (bugs)
I Otherwise, all workerstraverse the whole graph
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 26 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Swarmed Multi-core Nested Depth First Search
code for worker i, no communication
procedure DFSblue(s,i)s.blue[i] := truefor all t∈NextState(s) do
if ¬t.blue[i] then DFSblue(t,i)if s∈Accepting then
seed[i] := sDFSred(s,i)
procedure DFSred(s,i)s.red[i] := truefor all t∈NextState(s) do
if t = seed[i] then ExitCycleif ¬t.red[i] then DFSred(t,i)
Multi-core Swarmed NDFS
I N workers perform parallelsearch independently
[G. Holzmann etal.]
I Multi-core: store visitedstates in a shared hash table
I Scales well in the presenceof accepting cycles (bugs)
I Otherwise, all workerstraverse the whole graph
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 26 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Approaches to Parallel LTL Model Checking
Speedup of Swarmed NDFS(1 versus 16 cores)
!"#$%&'
!"#$%!'
!"#(%%'
!"#(%!'
!"#(%&'
!"#(%)'
!"#(%*'
!"#$%&' !"#$%!' !"#(%%' !"#(%!' !"#(%&' !"#(%)' !"#(%*'
!"#$%&'(
()*+',-&#.'
!/"#$%&'(()*+',-&#.'
+,+-./'01'+,+-./','2'3','2'!4'5'3'
[BEEM database]
Alternatives for parallel LTL
I Swarm verification with NDFSI Effective, only for bug finding
I Dual-core NDFS [Holzmann]I Red search on 2nd CPUI Speedup of at most factor 2
I Red Search as parallel reachabilityI Speedup still ≤ 2: |G |+ |G |/N
I Can one do better?I Post-order is P-Complete, soI DFS not efficiently parallelizable
I Breadth-first based:I OWCTY, MAP [Brno]I Not linear (|G | · h), not on-the-fly
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 27 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Approaches to Parallel LTL Model Checking
Speedup of Swarmed NDFS(1 versus 16 cores)
!"#$%&'
!"#$%!'
!"#(%%'
!"#(%!'
!"#(%&'
!"#(%)'
!"#(%*'
!"#$%&' !"#$%!' !"#(%%' !"#(%!' !"#(%&' !"#(%)' !"#(%*'
!"#$%&'(
()*+',-&#.'
!/"#$%&'(()*+',-&#.'
+,+-./'01'+,+-./','2'3','2'!4'5'3'
[BEEM database]
Alternatives for parallel LTL
I Swarm verification with NDFSI Effective, only for bug finding
I Dual-core NDFS [Holzmann]I Red search on 2nd CPUI Speedup of at most factor 2
I Red Search as parallel reachabilityI Speedup still ≤ 2: |G |+ |G |/N
I Can one do better?I Post-order is P-Complete, soI DFS not efficiently parallelizable
I Breadth-first based:I OWCTY, MAP [Brno]I Not linear (|G | · h), not on-the-fly
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 27 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Approaches to Parallel LTL Model Checking
Speedup of Swarmed NDFS(1 versus 16 cores)
!"#$%&'
!"#$%!'
!"#(%%'
!"#(%!'
!"#(%&'
!"#(%)'
!"#(%*'
!"#$%&' !"#$%!' !"#(%%' !"#(%!' !"#(%&' !"#(%)' !"#(%*'
!"#$%&'(
()*+',-&#.'
!/"#$%&'(()*+',-&#.'
+,+-./'01'+,+-./','2'3','2'!4'5'3'
[BEEM database]
Alternatives for parallel LTL
I Swarm verification with NDFSI Effective, only for bug finding
I Dual-core NDFS [Holzmann]I Red search on 2nd CPUI Speedup of at most factor 2
I Red Search as parallel reachabilityI Speedup still ≤ 2: |G |+ |G |/N
I Can one do better?I Post-order is P-Complete, soI DFS not efficiently parallelizable
I Breadth-first based:I OWCTY, MAP [Brno]I Not linear (|G | · h), not on-the-fly
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 27 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Approaches to Parallel LTL Model Checking
Speedup of Swarmed NDFS(1 versus 16 cores)
!"#$%&'
!"#$%!'
!"#(%%'
!"#(%!'
!"#(%&'
!"#(%)'
!"#(%*'
!"#$%&' !"#$%!' !"#(%%' !"#(%!' !"#(%&' !"#(%)' !"#(%*'
!"#$%&'(
()*+',-&#.'
!/"#$%&'(()*+',-&#.'
+,+-./'01'+,+-./','2'3','2'!4'5'3'
[BEEM database]
Alternatives for parallel LTL
I Swarm verification with NDFSI Effective, only for bug finding
I Dual-core NDFS [Holzmann]I Red search on 2nd CPUI Speedup of at most factor 2
I Red Search as parallel reachabilityI Speedup still ≤ 2: |G |+ |G |/N
I Can one do better?I Post-order is P-Complete, soI DFS not efficiently parallelizable
I Breadth-first based:I OWCTY, MAP [Brno]I Not linear (|G | · h), not on-the-fly
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 27 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Approaches to Parallel LTL Model Checking
Speedup of Swarmed NDFS(1 versus 16 cores)
!"#$%&'
!"#$%!'
!"#(%%'
!"#(%!'
!"#(%&'
!"#(%)'
!"#(%*'
!"#$%&' !"#$%!' !"#(%%' !"#(%!' !"#(%&' !"#(%)' !"#(%*'
!"#$%&'(
()*+',-&#.'
!/"#$%&'(()*+',-&#.'
+,+-./'01'+,+-./','2'3','2'!4'5'3'
[BEEM database]
Alternatives for parallel LTL
I Swarm verification with NDFSI Effective, only for bug finding
I Dual-core NDFS [Holzmann]I Red search on 2nd CPUI Speedup of at most factor 2
I Red Search as parallel reachabilityI Speedup still ≤ 2: |G |+ |G |/N
I Can one do better?I Post-order is P-Complete, soI DFS not efficiently parallelizable
I Breadth-first based:I OWCTY, MAP [Brno]I Not linear (|G | · h), not on-the-fly
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 27 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Parallel NDFS: share the red color[ATVA’11]
procedure DFSblue(s,i)s.color[i] := cyanfor all t ∈ NextState(s) do
if t.color[i]=cyan and s or t ∈ Acc then ExitCycleif t.color[i]=white and ¬t.red then DFSblue(t,i)
if s ∈ Acc then s.count++; DFSred(s,i)s.color[i] := blue
procedure DFSred(s,i)s.color[i] := pinkfor all t ∈ NextState(s) do
if t.color[i]=cyan then ExitCycleif t.color[i] 6=pink and ¬t.red then DFSred(t,i)
if s ∈ Acc then s.count−−; await s.count=0s.red := true
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 28 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Parallel NDFS: share the red color[ATVA’11]
procedure DFSblue(s,i)s.color[i] := cyanfor all t ∈ NextState(s) do
if t.color[i]=cyan and s or t ∈ Acc then ExitCycleif t.color[i]=white and ¬t.red then DFSblue(t,i)
if s ∈ Acc then s.count++; DFSred(s,i)s.color[i] := blue
procedure DFSred(s,i)s.color[i] := pinkfor all t ∈ NextState(s) do
if t.color[i]=cyan then ExitCycleif t.color[i] 6=pink and ¬t.red then DFSred(t,i)
if s ∈ Acc then s.count−−; await s.count=0s.red := true
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 28 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
OWCTY and Swarmed NDFS versus Parallel NDFS
!"#$%&'
!"#$%!'
!"#(%%'
!"#(%!'
!"#(%&'
!"#(%)'
!"#(%*'
!"#$%&' !"#$%!' !"#(%%' !"#(%!' !"#(%&' !"#(%)' !"#(%*'
!"#$%&'())*+,(-.'$/(
!"#$%&'(01#))*+,(-.'$/(
+,+-./'01'+,+-./','2'3','2'!%'4'3'
Swarmed versus Parallel NDFS(both 16 cores)
!"#$%&'
!"#$%!'
!"#(%%'
!"#(%!'
!"#(%&'
!"#(%)'
!"#$%&' !"#$%!' !"#(%%' !"#(%!' !"#(%&' !"#(%)' !"#(%*'
!"#$%&'()*+,)
-.*
/(01'$2(
!"#$%&'(3.#445+6(01'$2(
+,+-./'
01'+,+-./'
2'3','
2'3'!%'4','
2'3'!5!%'4','
OWCTY versus Parallel NDFS(both 16 cores)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 29 / 41
Multi-core Symbolic Model Checking
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Observations
I Time-wise, multicore model checking yields great speedups
I Memory-wise, BDDs are potentially exponentially concise
I Can we design scalable parallel symbolic data structures?
I Memory-intensive, random memory access, cannot scale
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 31 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Observations
I Time-wise, multicore model checking yields great speedups
I Memory-wise, BDDs are potentially exponentially concise
I Can we design scalable parallel symbolic data structures?
I Memory-intensive, random memory access, cannot scale
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 31 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Earlier work: disappointing speedups
Earlier work in parallel BDDs
I early ’90s: vector machines, massive SIMD (not unlike GPU)
I late ’90s: virtual SMP, distributed BDDs (BDDnow)
Earlier work in parallel symbolic model checking
I ’00vv: Grumberg et al: vertical splitting of BDDs (distributed)
I ’00vv: Ciardo et al: horizontal splitting of BDDs (distributed)
I CAV’07: Luttgen etal - parallelisation of saturation with Cilk
I PDMC’09: Ciardo - Difficult, but what is the alternative?
Recent developments
I 2010: J. Ossowski - Jinc: Multi-threaded decision diagrams
I 2012: Tom van Dijk, Alfons Laarman, JvdP:Sylvan, a library for multi-core BDD operations (PDMC’12)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 32 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Earlier work: disappointing speedups
Earlier work in parallel BDDs
I early ’90s: vector machines, massive SIMD (not unlike GPU)
I late ’90s: virtual SMP, distributed BDDs (BDDnow)
Earlier work in parallel symbolic model checking
I ’00vv: Grumberg et al: vertical splitting of BDDs (distributed)
I ’00vv: Ciardo et al: horizontal splitting of BDDs (distributed)
I CAV’07: Luttgen etal - parallelisation of saturation with Cilk
I PDMC’09: Ciardo - Difficult, but what is the alternative?
Recent developments
I 2010: J. Ossowski - Jinc: Multi-threaded decision diagrams
I 2012: Tom van Dijk, Alfons Laarman, JvdP:Sylvan, a library for multi-core BDD operations (PDMC’12)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 32 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Earlier work: disappointing speedups
Earlier work in parallel BDDs
I early ’90s: vector machines, massive SIMD (not unlike GPU)
I late ’90s: virtual SMP, distributed BDDs (BDDnow)
Earlier work in parallel symbolic model checking
I ’00vv: Grumberg et al: vertical splitting of BDDs (distributed)
I ’00vv: Ciardo et al: horizontal splitting of BDDs (distributed)
I CAV’07: Luttgen etal - parallelisation of saturation with Cilk
I PDMC’09: Ciardo - Difficult, but what is the alternative?
Recent developments
I 2010: J. Ossowski - Jinc: Multi-threaded decision diagrams
I 2012: Tom van Dijk, Alfons Laarman, JvdP:Sylvan, a library for multi-core BDD operations (PDMC’12)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 32 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Table of Contents
1 IntroductionModel CheckingLTSmin toolset
2 Symbolic Model CheckingLocal Transition CachingMulti-valued Decision Diagrams
3 Multi-core Model CheckingLockless Shared HashtableState Space CompressionParallel Nested Depth First Search for LTL model checking
4 Symbolic Model Checking with Multi-core BDDsMulti-core Data StructuresParallel BDD operations
5 Conclusion
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 33 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Reuse lockless Hash Tables?
Data structures for Binary Decision Diagrams
I Unique Table: ensure that BDD nodes exist at most onceI implements the maximal sharing requirementI necessary for constant equality check by pointer comparison
I Computed Table: dynamic programmingI Used to store intermediate results to avoid recomputationsI Needed to manipulate BDDs in polynomial time
I Both data structures are usually implemented as hash tables
I Problem: Both tables require garbage collection in practice
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 34 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Reuse lockless Hash Tables?
Data structures for Binary Decision Diagrams
I Unique Table: ensure that BDD nodes exist at most onceI implements the maximal sharing requirementI necessary for constant equality check by pointer comparison
I Computed Table: dynamic programmingI Used to store intermediate results to avoid recomputationsI Needed to manipulate BDDs in polynomial time
I Both data structures are usually implemented as hash tables
I Problem: Both tables require garbage collection in practice
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 34 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Algorithm to put an entry in Computed Table
I Local bit-lock in hash array controls access to data arrayI Every cache-hit is nice, but don’t loose time
I No waiting for locks: just give upI No hash collision resolution: just overwrite or give up
Input: key, data
1: hash← calculate hash(key)2: index← hash % tablesize3: 〈curhash, curlock〉 ← hasharray[index]4: if curlock = 1 then return NOTADDED
5: if curhash = hash then6: if data matches data in data array then return NOTADDED
7: if not compare and swap(hasharray[index], 〈curhash, 0〉, 〈hash, 1〉) then8: return NOTADDED
9: write data to data array10: hasharray[index]← 〈hash, 0〉 {release lock}11: return ADDED
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 35 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Algorithm to put an entry in Computed Table
I Local bit-lock in hash array controls access to data arrayI Every cache-hit is nice, but don’t loose time
I No waiting for locks: just give upI No hash collision resolution: just overwrite or give up
Input: key, data
1: hash← calculate hash(key)2: index← hash % tablesize3: 〈curhash, curlock〉 ← hasharray[index]4: if curlock = 1 then return NOTADDED
5: if curhash = hash then6: if data matches data in data array then return NOTADDED
7: if not compare and swap(hasharray[index], 〈curhash, 0〉, 〈hash, 1〉) then8: return NOTADDED
9: write data to data array10: hasharray[index]← 〈hash, 0〉 {release lock}11: return ADDED
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 35 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Unique table = lockless hashtable + garbage collection
I Use one bit of hash as lock, and two bytes as reference count
I Buckets in the hash table are in one of these states:
1 EMPTY: Unused bucket, also end-of-search2 TOMBSTONE: Unused bucket, but keep searching3 WAIT(hash): Bucket being written4 DONE(hash, count): Filled bucket
EMPTY WAIT(h) DONE(h,count)
TOMBSTONE
cas write data
+, −deletecas
I Rules to enforce correctness:
1 Inserting and deleting mutually exclusive (separate gc mode)2 Transitions to WAIT use compare and swap
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 36 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Unique table = lockless hashtable + garbage collection
I Use one bit of hash as lock, and two bytes as reference countI Buckets in the hash table are in one of these states:
1 EMPTY: Unused bucket, also end-of-search2 TOMBSTONE: Unused bucket, but keep searching3 WAIT(hash): Bucket being written4 DONE(hash, count): Filled bucket
EMPTY WAIT(h) DONE(h,count)
TOMBSTONE
cas write data
+, −deletecas
I Rules to enforce correctness:
1 Inserting and deleting mutually exclusive (separate gc mode)2 Transitions to WAIT use compare and swap
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 36 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Unique table = lockless hashtable + garbage collection
I Use one bit of hash as lock, and two bytes as reference countI Buckets in the hash table are in one of these states:
1 EMPTY: Unused bucket, also end-of-search2 TOMBSTONE: Unused bucket, but keep searching3 WAIT(hash): Bucket being written4 DONE(hash, count): Filled bucket
EMPTY WAIT(h) DONE(h,count)
TOMBSTONE
cas write data
+, −deletecas
I Rules to enforce correctness:
1 Inserting and deleting mutually exclusive (separate gc mode)2 Transitions to WAIT use compare and swap
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 36 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Parallelizing the Operations
Parallel BDD operations
I RelProd and Or have two recursive calls
I Calls organized in a task dependency graph
I Fine-grained task-parallelism:
I Parent spawns children for subtasksI Parent waits upon completion of childrenI Load balancing by work-stealingI Cilk [Blumofe ’95] or Wool [Faxen 2008]
I Alternative: rely on random traversal order
I Same task might be created many times
I Store the results in a memoization table
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 37 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Parallelizing the Operations
Parallel BDD operations
I RelProd and Or have two recursive calls
I Calls organized in a task dependency graphI Fine-grained task-parallelism:
I Parent spawns children for subtasksI Parent waits upon completion of childrenI Load balancing by work-stealingI Cilk [Blumofe ’95] or Wool [Faxen 2008]
I Alternative: rely on random traversal order
I Same task might be created many times
I Store the results in a memoization table
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 37 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Parallelizing the Operations
Parallel BDD operations
I RelProd and Or have two recursive calls
I Calls organized in a task dependency graphI Fine-grained task-parallelism:
I Parent spawns children for subtasksI Parent waits upon completion of childrenI Load balancing by work-stealingI Cilk [Blumofe ’95] or Wool [Faxen 2008]
I Alternative: rely on random traversal order
I Same task might be created many times
I Store the results in a memoization table
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 37 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Results: speedup of BDD operations for model checking
5
10
15
20
25
30
0 10 20 30 40Workers
Sp
eed
up
Model
bakery.4
bakery.8
collision.5
iprotocol.7
lifts.4
lifts.7
schedule world.2
schedule world.3
Conclusion
I BEEM benchmarks, again
I On 4× 12 = 48 core NUMA
I Speedup of up to 32
I Small models don’t scale(time spent in work stealing)
I We only speedup the BDD-operations in symbolic reachability
I Measurements exclude time spent in language modules
I Even for large models, many small BDDs are involved
I This has potential impact on hardware verification
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 38 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Results: speedup of BDD operations for model checking
5
10
15
20
25
30
0 10 20 30 40Workers
Sp
eed
up
Model
bakery.4
bakery.8
collision.5
iprotocol.7
lifts.4
lifts.7
schedule world.2
schedule world.3
Conclusion
I BEEM benchmarks, again
I On 4× 12 = 48 core NUMA
I Speedup of up to 32
I Small models don’t scale(time spent in work stealing)
I We only speedup the BDD-operations in symbolic reachability
I Measurements exclude time spent in language modules
I Even for large models, many small BDDs are involved
I This has potential impact on hardware verification
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 38 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Table of Contents
1 IntroductionModel CheckingLTSmin toolset
2 Symbolic Model CheckingLocal Transition CachingMulti-valued Decision Diagrams
3 Multi-core Model CheckingLockless Shared HashtableState Space CompressionParallel Nested Depth First Search for LTL model checking
4 Symbolic Model Checking with Multi-core BDDsMulti-core Data StructuresParallel BDD operations
5 Conclusion
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 39 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Lessons learnt
Data structures: engineering
I Lockless, Compare-and-Swap
I Tedious > 1 year: know your hardware
I Correctness? We used our tools:I mCRL2/LTSmin: algorithmsI Spin/LTSmin: C-code levelI Separation logic (ongoing)
Algorithms: ingenuity
I Very simple architecture
I Workers mostly independentI Rely on randomnessI Some global info shared
I Rigorous mathematical proofI (only partly in PVS)
Performance: repeatable experiments, open data
I Measure, measure, measure, hypothesize, visualize
I BEEM: > 250 Benchmarks for Explicit MC; Spin models
I Possible bias to our 16-core and 48-core AMD machines
I Applications: Railway Interlockings, CERN LHC control
I Todo: hardware benchmarks for symbolic model checking
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 40 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Lessons learnt
Data structures: engineering
I Lockless, Compare-and-Swap
I Tedious > 1 year: know your hardware
I Correctness? We used our tools:I mCRL2/LTSmin: algorithmsI Spin/LTSmin: C-code levelI Separation logic (ongoing)
Algorithms: ingenuity
I Very simple architecture
I Workers mostly independentI Rely on randomnessI Some global info shared
I Rigorous mathematical proofI (only partly in PVS)
Performance: repeatable experiments, open data
I Measure, measure, measure, hypothesize, visualize
I BEEM: > 250 Benchmarks for Explicit MC; Spin models
I Possible bias to our 16-core and 48-core AMD machines
I Applications: Railway Interlockings, CERN LHC control
I Todo: hardware benchmarks for symbolic model checking
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 40 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Lessons learnt
Data structures: engineering
I Lockless, Compare-and-Swap
I Tedious > 1 year: know your hardware
I Correctness? We used our tools:I mCRL2/LTSmin: algorithmsI Spin/LTSmin: C-code levelI Separation logic (ongoing)
Algorithms: ingenuity
I Very simple architecture
I Workers mostly independentI Rely on randomnessI Some global info shared
I Rigorous mathematical proofI (only partly in PVS)
Performance: repeatable experiments, open data
I Measure, measure, measure, hypothesize, visualize
I BEEM: > 250 Benchmarks for Explicit MC; Spin models
I Possible bias to our 16-core and 48-core AMD machines
I Applications: Railway Interlockings, CERN LHC control
I Todo: hardware benchmarks for symbolic model checking
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 40 / 41
... Introduction Symbolic Multicore Symbolic & Multi-core Conclusion ...
Literature
Check out: http://fmt.cs.utwente.nl/tools/ltsmin/
I Tom van Dijk, Alfons Laarman and Jaco van de Pol,Multi-core BDD Operations for Symbolic Reachability . . . . (PDMC 2012)
I Alfons Laarman, Rom Langerak, Jaco vd Pol, Michael Weber, A. Wijs,Multi-Core Nested Depth-First Search. . . . . . . . . . . . . . . . . (ATVA 2011,’12)
I Alfons Laarman, Jaco van de Pol, Michael Weber,Parallel Recursive State Compression for Free . . . . . . . . . . . . . . (SPIN 2011)
I Alfons Laarman, Jaco van de Pol, Michael Weber,Multi-Core LTSmin: Marrying Modularity and Scalability . . . . (NFM 2011)
I Stefan Blom, Jaco van de Pol, Michael Weber,LTSmin: Distributed and Symbolic Reachability . . . . . . . . . . . . . (CAV 2010)
I Alfons Laarman, Jaco van de Pol and Michael Weber, . . . (FMCAD 2010)Boosting Multi-Core Reachability Performance with Shared Hash Tables
I Stefan Blom, Bert Lisser, Jaco van de Pol and Michael Weber,A Database Approach to Distributed State-Space Generation (JLC 2009)
UNIVERSITY OF TWENTE. Multi-core and/or Symbolic September 20, 2012 41 / 41