Post on 21-Dec-2015
![Page 1: Analyzing the Intel Itanium Memory Ordering Rules using Logic Programming and SAT Yue Yang Ganesh Gopalakrishnan Gary Lindstrom Konrad Slind School of](https://reader031.vdocuments.mx/reader031/viewer/2022032704/56649d575503460f94a36879/html5/thumbnails/1.jpg)
Analyzing the Intel Itanium Memory Ordering Rules
using Logic Programming and SAT
Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom, Konrad Slind
School of Computing, University of Utah
Work supported in part by NSF Awards CCR-0081406 and 0219805, and SRC Contract 1031.001
What are Memory Ordering Rules?
The effects of aggressive hardware optimizations that are visible to a programmer as out-of-order executions:
• Aggressive load/store reorderings
• ‘Bypassing’ (read back own store before others)
• Strong orderings only at acquires/releases
[Diagram: several CPUs sharing one memory]
Example with plain stores and loads (in "ld x,v" the load of x returns v):
cpu 1: st a,1; st b,2;
cpu 2: ld b,2; ld a,0;
Example with release/acquire:
cpu 1: st c,1; st.rel d,2;
cpu 2: ld.acq d,2; ld c,1;
“Out of order” usually means “with respect to SC” (Sequential Consistency)
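To make "with respect to SC" concrete, here is a minimal Python sketch (not from the talk) that enumerates every interleaving of two short programs that keeps each program's own order, and collects the load results a sequentially consistent memory could produce. The names `interleavings` and `sc_outcomes` and the tuple encoding of instructions are illustrative assumptions. Under these assumptions, the result from the first example (the load of b returns 2, then the load of a returns 0) is not among the SC outcomes, which is what marks it as out of order.

```python
from itertools import combinations

def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each program's own order."""
    n, m = len(p1), len(p2)
    for slots in combinations(range(n + m), n):  # positions taken by p1
        chosen = set(slots)
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if k in chosen else next(it2) for k in range(n + m)]

def sc_outcomes(p1, p2):
    """Return the set of load results reachable under sequential consistency.

    Instructions are ("st", addr, value) or ("ld", addr); memory starts at 0.
    """
    outcomes = set()
    for trace in interleavings(p1, p2):
        mem, loads = {}, []
        for op in trace:
            if op[0] == "st":
                mem[op[1]] = op[2]
            else:
                loads.append((op[1], mem.get(op[1], 0)))
        outcomes.add(tuple(loads))
    return outcomes

# First example from the slide: one CPU stores a then b, the other loads b then a.
cpu1 = [("st", "a", 1), ("st", "b", 2)]
cpu2 = [("ld", "b"), ("ld", "a")]
outcomes = sc_outcomes(cpu1, cpu2)
relaxed_only = (("b", 2), ("a", 0)) not in outcomes
```

The enumeration grows combinatorially with program length, so it is only meant for litmus-test-sized inputs.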
Why Relaxed Ordering Rules?
• All modern high-end processors employ relaxed ordering rules
• Modern multi-threaded languages also follow suit
WHY?
• Aggressive updates are too expensive
– CPU / Memory speed mismatch getting progressively worse
• Enables performance enhancing optimizations at the bus / interconnect level
• Simplifies directory protocols (less waiting, avoid deadlocks by relaxing message traffic rules, ...)
Contrast between `strict’ and `relaxed’ orderings
Strict (e.g., Sequential Consistency)
– Each processor’s instructions come according to program order
– They execute as if connected to a single serial memory thru a non-deterministic switch
Relaxed (e.g., PRAM)
– One memory per processor in effect (details omitted)
– No write-atomicity; only program order obeyed
Contrast between Relaxed Academic and Industrial Models
Academic: Relaxed (e.g., PRAM)
Industrial: Relaxed + Strict + Hybrid + ... (e.g., Itanium)
• See our ICCD’99 paper for a very approximate operational model
• Lamport et al. have one in TLA, too...
Who depends on Memory Orderings?
• Compiler / OS developers
– many of the proposed high-performance kernels exploit weakness to a high degree
• People who port existing code-bases
– code-bases must port between platforms
• Implementers of thread-based systems, JVMs, ...
– it has to mesh with the language-level memory model as well
• It is a central issue even in “uniprocessors” in which multiple threads share memory
A taxonomy of methods to specify industrial Relaxed Memory Models
• Informal
– “A Store Release flushes out earlier pended operations.”
– “All Store Releases appear to commit in a global total order.”
– “They allow Read Bypassing, except for non-Cacheable addresses”
– Full Intel spec available by searching `251429’ under google
– A dozen or so litmus tests also given as a supplement, e.g.
    P1: st.rel A,1;  ld.acq r1,A; [1]  ld r2,B; [0]
    P2: st.rel B,1;  ld.acq r3,B; [1]  ld r4,A; [0]
• Formal
– Operational
– Axiomatic
A taxonomy of Formal methods to specify industrial Relaxed Memory Models
• Operational
– Operational models of industrial memory models are complex
– Running them inside a standard model-checker is too slow!
– Utility for verification is limited
– Provides limited insight
• Axiomatic
– Much more precise
– Orderings must ideally be expressed thru an ORTHOGONAL set of rules
– No such prior axiomatic specs of industrial memory models
How to Organize Axiomatic Memory Ordering Specs?
• Ad-hoc
• Visibility Order Based
Visibility Order Specs
Execution: one processor runs st A,1; st B,2; the other runs ld B [v1]; ld A [v2]
A memory model (spec of Memory Ordering Rules) is a mapping from executions to a set of allowed total orders called visibility orders; it is a 1-to-many mapping:
– st(A,1) st(B,2) ld(B,v1) ld(A,v2)   (Strict Ordering allowed)
– ld(A,v2) ld(B,v1) st(B,2) st(A,1)   (Relaxed Ordering allowed too)
For “complex” instructions, we generate more visibility events:
Execution: one processor runs st.rel A,1; st B,2; the other runs ld.acq B [v1]; ld A [v2]
Visibility events: { st.rel(A,1), st(B,2) as seen in P1; st.rel(A,1), st(B,2) as seen in P2; ld.acq(B,v1), ld(A,v2) }
After specifying all allowed Visibility Orders, the Load-Value Rule specifies how Loads return their values. For example, in the order
ld(A,?) st(A,1) st(A,1) st(B,2) st(B,2) ld(B,?)
ld(A,?) returns 0 (from initial memory) and ld(B,?) returns 2.
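The load-value idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the rule from the talk's spec: it assumes each load returns the value of the most recent store to the same address that precedes it in the visibility order, falling back to initial memory, and it ignores bypassing and the distinction between a store's per-processor visibility events. The function name `load_values` and the event encoding are hypothetical.

```python
def load_values(visibility_order, initial=0):
    """Assign each load the latest earlier store to its address (simplified).

    Events are ("st", addr, value) or ("ld", addr). This is an assumed,
    simplified rule: it ignores bypassing and local/remote store events.
    """
    last_store = {}
    results = []
    for event in visibility_order:
        if event[0] == "st":
            last_store[event[1]] = event[2]
        else:
            results.append((event[1], last_store.get(event[1], initial)))
    return results

# The order from the slide: ld(A,?) st(A,1) st(A,1) st(B,2) st(B,2) ld(B,?)
order = [("ld", "A"), ("st", "A", 1), ("st", "A", 1),
         ("st", "B", 2), ("st", "B", 2), ("ld", "B")]
values = load_values(order)  # ld(A) reads initial memory, ld(B) reads the store
```

With this order the load of A sees initial memory and the load of B sees the stored value, matching the 0 and 2 annotated on the slide.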
Our first contribution
• Developed Axiomatic, Visibility Order based Spec for most of Itanium Orderings (semaphores will be added in next version)
– Orderings implicit in their document made explicit
• 3 pages of HOL as opposed to 24 pages of prose + tables
– Also developed an executable constraint-Prolog version
• Can reason using a theorem prover
– will attempt claim found in Intel’s manual about causality
• Written in a generic style; several other memory models specified in the same framework
– pre-requisite to formally comparing memory models
• Comprised of orthogonal sub-rules
Style of specification
legalItanium(ops) =
  Exists order. ( constraint1 ops order /\ constraint2 ops order /\ ... )
• Can selectively disable constraints and compare results
• Since the constraints are orthogonal, we can localize errors
Visibility Order described by order : visevent -> visevent -> bool
We use the “id” of each visevent which is an int; so order : int -> int -> bool
legalItanium
legalItanium(ops) =
  Exists order. (
    requireLinearOrder ops order /\
    requireWriteOperationOrder ops order /\
    requireProgramOrder ops order /\
    requireMemoryDataDependence ops order /\
    requireDataFlowDependence ops order /\
    requireCoherence ops order /\
    requireReadValue ops order /\
    requireAtomicWBRelease ops order /\
    requireSequentialUC ops order /\
    requireNoUCBypass ops order )
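The "exists an order satisfying a conjunction of sub-rules" shape can be mimicked with a brute-force search. The Python sketch below is an illustration under stated assumptions, not the authors' HOL or Prolog code: it bakes the linear-order requirement into the search by enumerating permutations, and the two toy constraints are hypothetical placeholders for the real sub-rules. Passing a shorter constraint list corresponds to selectively disabling a rule.

```python
from itertools import permutations

def legal(ops, constraints):
    """Return a total order satisfying every constraint, or None.

    `ops` is a list of event ids; each constraint takes (ops, order), where
    order(i, j) is True when i comes before j. Brute force, for tiny inputs.
    """
    for perm in permutations(ops):
        position = {op: k for k, op in enumerate(perm)}
        order = lambda i, j, pos=position: pos[i] < pos[j]
        if all(c(ops, order) for c in constraints):
            return list(perm)
    return None

# Two hypothetical toy constraints standing in for the real sub-rules.
def zero_before_one(ops, order):
    return order(0, 1)

def one_before_zero(ops, order):
    return order(1, 0)

ops = [0, 1, 2]
witness = legal(ops, [zero_before_one])
conflict = legal(ops, [zero_before_one, one_before_zero])
relaxed = legal(ops, [one_before_zero])  # same spec with one rule disabled
```

Because each constraint is an independent predicate over `(ops, order)`, dropping one from the list and re-running shows which rule is responsible for rejecting an execution.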
requireProgramOrder
requireProgramOrder ops order =
  Forall i,j : ops
    ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j )
    ==> order i j
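A rough Python rendering of this rule follows. The three `ordered_by_*` predicates are simplified assumptions written for illustration (an acquire load precedes later operations on its processor, earlier operations precede a release store, and a fence, written here as `mf`, is ordered with its neighbours); the definitions in the actual spec carry extra conditions that the slides only partly show. The dictionary fields `id`, `proc`, `idx`, and `kind` are likewise hypothetical.

```python
def ordered_by_acquire(i, j):
    """Assumed rule: an acquire precedes later ops of the same processor."""
    return i["proc"] == j["proc"] and i["idx"] < j["idx"] and i["kind"] == "ld.acq"

def ordered_by_release(i, j):
    """Assumed rule: earlier ops of the same processor precede a release."""
    return i["proc"] == j["proc"] and i["idx"] < j["idx"] and j["kind"] == "st.rel"

def ordered_by_fence(i, j):
    """Assumed rule: a fence is ordered with everything around it."""
    same = i["proc"] == j["proc"] and i["idx"] < j["idx"]
    return same and "mf" in (i["kind"], j["kind"])

def require_program_order(ops, order):
    """Forall i, j in ops: (acquire or release or fence) implies order i j."""
    return all(
        order(i["id"], j["id"])
        for i in ops for j in ops
        if ordered_by_acquire(i, j) or ordered_by_release(i, j) or ordered_by_fence(i, j)
    )

# One processor running: st c,1 ; st.rel d,2
ops = [
    {"id": 0, "proc": 1, "idx": 0, "kind": "st"},
    {"id": 1, "proc": 1, "idx": 1, "kind": "st.rel"},
]
in_order = require_program_order(ops, lambda i, j: i < j)
reordered = require_program_order(ops, lambda i, j: j < i)
```

Here the plain store must become visible before the release store, so an order that swaps them fails the check.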
Where do we use our Formal Spec of Memory Orderings?
• To help solve one of the nastiest problems encountered during Post-Silicon Validation
– An MP system has just been built (boards, fan, ...)
– How do we certify that it obeys the memory ordering rules?
WHY IS POST-SILICON VERIFICATION HARD?
• Limited observability (forced to observe via “final effects” on programs)
• Unverified inter-module assumptions examined for the first time at GHz speeds!
Typical Post-Si Memory Ordering Verification Approach
• Manual reasoning about executions generated by random tests
– Highly labor intensive
• designers have to think through ALL ordering rules at EACH step
– No systematic methods to write the tests
• Ad-hoc tools employed for behavior matching
– No Formal Guarantees even on small executions
– No insights provided upon failure
– Cannot pinpoint onset of divergence from allowed behaviors
Our Idealized Approach to a solution (currently under development)
Inputs:
– An Arbitrary Specification of Memory Ordering Rules in HOL
– An Arbitrary Litmus Test, e.g.
    P1: st.rel a,1;  ld.acq r1,a; [V2]  ld r2,b; [0]
    P2: st.rel b,1;  ld.acq r3,b; [V3]  ld r4,a; [0]
Tool (BUILD THIS BOX !!) reports one of:
– LEGAL! Explanation script + ALL bindings to V2 and V3
– ILLEGAL! explanation script...
The first approach presented here
Inputs:
– Spec of Memory Ordering Rules Coded-up Nicely as a Constraint Logic Program
– An Arbitrary Ground Litmus Test (only ground values allowed), e.g.
    P1: st.rel a,1;  ld.acq r1,a; [1]  ld r2,b; [0]
    P2: st.rel b,1;  ld.acq r3,b; [1]  ld r4,a; [0]
Result is one of:
– LEGAL! explanation script...
– ILLEGAL!
The second approach presented here
Inputs:
– Spec of Memory Ordering Rules Coded-up Nicely as a Constraint Logic Program
– An Arbitrary Ground Litmus Test
The output is passed to a SAT checker:
– SAT! implies LEGAL!
– UNSAT! implies ILLEGAL!
How does Approach #1 work ?
• Need to know a little bit about Constraint Logic Programs
– e.g., GnuProlog, Sicstus Prolog, Mozart, ... support constraints directly
– Available as “free-standing” packages callable from C, Java, Ocaml, ...
evens_below_Y( X,Y) :- X is in (0..10), X < Y, (X mod 2) = 0
Called with Y = W, X unbound:
– X is in (0..10): allocates constraint-store entry for X with some user-chosen initial range
– X < Y: imposes X=W-1
– (X mod 2) = 0: imposes constraint (W-1) mod 2 = 0 into constraint store; backtracking triggered if W is later found = 6
How to model requireProgramOrder (e.g.) as a Constraint Logic Program?
requireProgramOrder ops order =
  Forall i,j : ops
    ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j )
    ==> order i j
Matrix M (rows i, columns j):
x x x x x x
x x x x x x
x x x x x x
x x x x x x
x x x x x x
x x x x x x
M_ij = 1 means i is ordered before j
• Allocate 2D constraint-var array
• Interpret Litmus test, adding constraints to 2D array
• When interpretation finishes, the remaining “x” entries reveal the latitude in the weak order
• When an “x” changes to a 1, a later attempt to set it to 0 triggers backtracking
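The matrix bookkeeping can be imitated in plain Python. This sketch is not a constraint logic program: it swaps the solver's constraint store for an explicit table and replaces backtracking with an exception, and it does no propagation such as transitivity. The class `OrderMatrix` and its method names are hypothetical.

```python
class OrderMatrix:
    """2D array of ordering variables: None is unknown ("x"), else 0 or 1."""

    def __init__(self, n):
        self.m = [[None] * n for _ in range(n)]

    def set(self, i, j, value):
        """Fix M[i][j]; raise ValueError on a clash (a CLP would backtrack)."""
        current = self.m[i][j]
        if current is not None and current != value:
            raise ValueError(f"M[{i}][{j}] is already {current}")
        self.m[i][j] = value

    def require_before(self, i, j):
        """Record 'i is ordered before j', which also rules out 'j before i'."""
        self.set(i, j, 1)
        self.set(j, i, 0)

    def latitude(self):
        """Off-diagonal entries still unknown: freedom left in the weak order."""
        n = len(self.m)
        return [(i, j) for i in range(n) for j in range(n)
                if i != j and self.m[i][j] is None]

m = OrderMatrix(3)
m.require_before(0, 1)
free_pairs = m.latitude()
try:
    m.require_before(1, 0)   # contradicts the earlier constraint
    clash = False
except ValueError:
    clash = True
```

In a real finite-domain solver the clash would prune the current branch and resume the search elsewhere, rather than stop the program.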
Our Prolog Code is VERY close to the HOL spec!
requireProgramOrder ops order =
  Forall i,j : ops
    ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j )
    ==> order i j

( % Rule (ACQ): ACQ>>I
  .....
  #\/
  % Rule (REL):
  Op_j #= StRel #/\
  ( IsWr_i #==>
      ( WrType_i #= Local #/\ WrType_j #= Local
        #\/
        WrType_i #= Remote #/\ WrType_j #= Remote
        #/\ WrProc_i #= WrProc_j ) )
  ....
#==> Oij.
IMPOSES CONSTRAINT ON MATRIX ENTRY Oij
Idea behind the SAT approach
( % Rule (ACQ): ACQ>>I
  .....
  #\/
  % Rule (REL):
  Op_j #= StRel #/\
  ( IsWr_i #==>
      ( WrType_i #= Local #/\ WrType_j #= Local
        #\/
        WrType_i #= Remote #/\ WrType_j #= Remote
        #/\ WrProc_i #= WrProc_j ) )
  ....
#==>
Emit Boolean Expression here (as opposed to imposing constraint on constraint-store)
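As a rough illustration of emitting a Boolean formula instead of posting constraints, the Python sketch below encodes "a strict total order consistent with the required pairs exists" as CNF and prints it in DIMACS form. The encoding (one variable per ordered pair, with totality, antisymmetry, and transitivity clauses) is a textbook choice assumed here, not the authors' generator, and `brute_force_sat` is only a stand-in for a real SAT checker. All function names are hypothetical.

```python
from itertools import permutations, product

def encode_order(n, required):
    """Build CNF clauses saying 'a strict total order on n events exists'.

    Variable var[(i, j)] is true when event i is ordered before event j.
    `required` lists (i, j) pairs forced by the ordering rules.
    """
    var = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                var[(i, j)] = len(var) + 1
    clauses = []
    for i in range(n):
        for j in range(i + 1, n):
            clauses.append([var[(i, j)], var[(j, i)]])      # totality
            clauses.append([-var[(i, j)], -var[(j, i)]])    # antisymmetry
    for i, j, k in permutations(range(n), 3):
        clauses.append([-var[(i, j)], -var[(j, k)], var[(i, k)]])  # transitivity
    clauses.extend([var[pair]] for pair in required)
    return var, clauses

def to_dimacs(num_vars, clauses):
    """Render clauses in the DIMACS CNF text format read by SAT solvers."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

def brute_force_sat(num_vars, clauses):
    """Tiny stand-in for a SAT solver: try every assignment."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

var, ok = encode_order(3, [(0, 1), (1, 2)])
_, bad = encode_order(3, [(0, 1), (1, 2), (2, 0)])
legal = brute_force_sat(len(var), ok)         # SAT: an order exists
illegal = not brute_force_sat(len(var), bad)  # UNSAT: the rules form a cycle
```

The transitivity clauses grow cubically with the number of events, which is one plausible reason a naive generator spends far longer producing the formula than the solver spends on it.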
What did we learn?
• A really elegant approach to study Memory Ordering
• Many bugs in spec caught through finite executions
– Formal `paper-and-pencil’ memory ordering specs are very unreliable!
• Prolog Code may not scale
– Prolog Quirks (memory resources scattered in stack, trail-stack, constraint-store, ...; execution halts if one is exhausted)
– Prolog’s search may not be “as smart” as SAT’s (?)
• SAT generation time dominates
– Pretty naive coding and CNF generation
– Could scale considerably; for example:
    FD-solving   SAT-gen   SAT-vars   SAT-clauses   SAT-solving
    22 s         200 s     576        15k           0.01 s
• Best long-term approach is the `ideal’ one mentioned earlier
– (explain details if there is time)
Summary of Key Contributions
• We provide a formal specification of the entire Itanium memory ordering specification in Higher Order Logic (barring semaphores, which change the ‘data structures’ we need)
– Our Spec (3 pages of HOL) replaces 24 pages of Intel spec
– Our Spec is EASIER to understand (said the Charme reviewers!)
– We can now prove theorems to increase confidence
• We present TWO ways to use this HOL spec to check executions obtained from the post-silicon environment
– Encode as a Constraint-Logic program that interprets assembly executions and checks conformance with the rules
– Constraint-Logic program that interprets assembly executions, and generates a SAT instance embodying conformance
• Our tool was given to engineers in Intel’s post-Si validation group
– highly encouraging feedback obtained
Some of the Related Work
• Classical approaches
– Mostly paper-and-pencil specs
– Executable specs (Murphi) used to verify critical section codes
• Spec of the Alpha memory ordering rules in FOL/HOL
– Yuan Yu (personal communication), unpublished
– VCs generated for assembly programs and given to ESC prover
– Our work is for a modern system (Itanium) and uses SAT
• TLA+ spec of the Itanium ordering rules
– Details are not published
– Not amenable to execution (very slow execution speeds)
– Impractical for use in checking assembly program executions
Questions?
Work in progress
Inputs:
– An Arbitrary Specification of Memory Ordering Rules in HOL
– An Arbitrary Litmus Test (non-ground values allowed)
Components:
– Generate a QBF formula for the size of the Litmus test
– DNF representation of Litmus test (“ROM”)
– Generate “compact” CNF
– QBF Solver
Result is one of:
– LEGAL! Explanation script + ALL bindings to V2 and V3
– ILLEGAL! explanation script...
QBF is natural for memory ordering rules