Have Your Compiler and Extend It Too
Zachary Tatlock, UC San Diego
Correctness Guaranteed
Howdy! My name is Zach.
I work in the Programming Systems group.
Collaborators
Sudipta Kundu (PhD '09, Synopsys)
Sorin Lerner (UCSD)
Building robust compilers is difficult
  complex interactions resist testing
Compiler bugs are contagious
  invalidate source-level guarantees
Few users extend their compiler
  hand-optimized, unreadable code
Compiler Correctness
Major Resource Allocation
GCC
LLVM
Extensive Testing
Compiler   Total (KLOC)   Testing (KLOC)   %
GCC        5292           839              16
Python     1118           223              20
Java       5340           789              15
LLVM       790            250              32
Rough Source Breakdown
“Testing shows the presence, not the absence of bugs.”
- Dijkstra
Decades of Research
Compiler Verification: A Bibliography

2003
0. Gerwin Klein and Tobias Nipkow; Verified Bytecode Verifiers; Theoretical Computer Science, 298:583-626; 2003.
1. S. Berghofer and M. Strecker; Extracting a formally verified, fully executable compiler from a proof assistant; In Proceedings of Compiler Optimization meet Compiler Verification; 2003.
2. Sabine Glesner and Jan Olaf Blech; Classifying and Formally Verifying Integer Constant Folding; In Proceedings of Compiler Optimization meet Compiler Verification; 2003.
3. Thomas Genet, Thomas Jensen, Vikash Kodati, and David Pichardie; A Java Card CAP Converter in PVS; In Proceedings of Compiler Optimization meet Compiler Verification; 2003.

2002
4. Gerhard Goos; Compiler Verification and Compiler Architecture; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
5. Lenore Zuck, Amir Pnueli, Yi Fang and Benjamin Goldberg; VOC: A Translation Validator for Optimizing Compilers; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April
6. Sabine Glesner, Rubino Geiß and Boris Boesler; Verified Code Generation for Embedded Systems; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
7. Carl Christian Frederiksen; Correctness of Classical Compiler Optimizations using CTL; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
8. Thi Viet Nga Nguyen and Francois Irigoin; Alias verification for Fortran code optimization; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
9. K.C. Shashidhar, Maurice Bruynooghe, Francky Catthoor and Gerda Janssens; Geometric Model Checking: An Automatic Verification Technique for Loop and Data Reuse Transformations;
10. Clara Jaramillo, Rajiv Gupta and Mary Lou Soffa; Debugging and Testing Optimizers through Comparison Checking; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April
11. Wolfgang Goerigk; Towards Acceptability of Optimizations: An Extended View of Compiler Correctness; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
12. Martin Strecker; Formal Verification of a Java Compiler in Isabelle; Conference on Automated Deduction, Copenhagen, Denmark, July 27-30, 2002.
13. A. Pnueli, Y. Rodeh, O. Strichman, and M. Siegel; The small model property: How small can it be? Information and Computation, 178(1):279-293, October 2002.
14. L. Zuck, A. Pnueli, Y. Fang, B. Goldberg, and Y. Hu; Translation and run-time validation of optimized code; In 2nd Workshop on Runtime Verification, volume 70(4) of Electronic Notes in
15. Raya Leviathan and Amir Pnueli; Validating software pipelining optimizations; Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems,

2001
16. Axel Dold and Vincent Vialard; A Mechanically Verified Compiling Specification for a Lisp Compiler; Proc. of the 21st Conference on Foundations of Software Technology and Theoretical
17. A. Dold and V. Vialard; A Mechanically Verified Bootstrap Compiler; Proceedings of Kolloquium Programmiersprachen und Grundlagen der Programmierung Technical report AIB-2001-11,
18. Wolfgang Goerigk and Hans Langmaack; Compiler Implementation Verification and Trojan Horses; Verifix technical report, 2001.
19. L. Zuck, A. Pnueli, and R. Leviathan; Validation of optimizing compilers; Technical Report MCS01-12, Weizmann Institute of Science, August 2001.

2000
20. Wolfgang Goerigk; Compiler verification revisited; In M. Kaufmann, P. Manolios, J Moore (ed.): Computer Aided Reasoning: ACL2 Case Studies, Kluwer, 2000.
21. George C. Necula; Translation Validation for an Optimizing Compiler; In Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation
. . .
Over 100 papers on Compiler Verification from 1967 to 2003 …
… dozens more since 2003.
Decades of Research
67 : McCarthy, Painter
  Correctness of a compiler for arithmetic expressions
72 : Milner, Weyhrauch
  Proving compiler correctness in a mechanized logic
89 : Moore
  Mechanically Verified Language Implementation
99 : Morrisett, Walker, Crary, Glew
  From System F to typed assembly language
06 : Leroy
  Formal certification of a compiler back-end
Bugs Persist
Bugzilla Sampling, August 2010
Compiler   Released   Current Bugs   Class
GCC        1987       3410           Confirmed
Python     1991       2300           >= Normal
Java       1995       2120           Unresolved
LLVM       2003       1480           Confirmed
[Chart: Bugs (1000 to 4000) vs. Compiler Age (5 to 25)]
John Regehr : Bug Hunter
Test compilers on random C programs
Found hundreds of mainstream compiler bugs
Simple: GCC folded (x / -1) != 1 to 0
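Taking the slide's example at face value, a quick check shows why folding that expression to 0 is wrong. This is only a sketch: it models C's `x / -1` as negation (an assumption that ignores INT_MIN overflow), and `original` and `folded` are hypothetical names, not anything from GCC.

```python
def original(x: int) -> int:
    # C-style truth value of (x / -1) != 1; dividing by -1 negates x.
    return int(-x != 1)

def folded(x: int) -> int:
    # What the buggy constant folder produced, according to the slide.
    return 0

# Inputs on which the folded constant disagrees with the expression.
counterexamples = [x for x in range(-5, 6) if original(x) != folded(x)]
print(counterexamples)  # every value in the range except -1
```

The fold is only right for x = -1, so almost any test input exposes it, yet it still shipped.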
Work Smarter Not Harder
Build tools for heavy lifting.
OPT
Focus on Optimizations
Many subtle optimizations
  difficult to anticipate interactions
Correctness well defined
  original and transformed behave identically
Disabling no longer an option
  programs depend on optimizations
Our Two Phase Approach
1. PEC : Automatically check rewrite
2. XCERT : Correctly execute rewrite
Rewrite PEC XCERT
Optimization Correctness
PEC
[PLDI 09] [PLDI 10]
Opt Check: Previous Techniques
Translation Validation
  prove equivalence at compile time, each execution
  TVOC [Zuck et al.], [Necula 00], [Pnueli et al.], Verified TV [Tristan et al.]
A Priori Correctness
  prove correctness once and for all, before compiler runs
  Rhodium [Lerner et al.], CompCert [Leroy et al.]
PEC
Focus on Automated Techniques
[Figure: techniques placed by Scope of Guarantee (Verify Run, Verify Optimization) and Expressive Power (1-to-1 Rewrites, Complex Loop Opts)]
PEC: Complex Loop Opts + Once-and-for-all Correctness
Generalize Translation Validation to Once-and-for-all Setting
Translation Validation
  Optimization Instance: Input Prog, Output Prog
  Equivalence Checker
PEC
  Generalize to Parameterized Progs
  Optimization: Input PProg, Output PProg
  Parameterized Equivalence Checker
Prove Optimizations Automatically
  Before Compiler Ever Runs
Handle Complex Loop Optimizations
Parameterized Rewrite Rules
Optimization
InputPProg
OutputPProg
Loop Peeling:
  move iteration out
  Shift final iteration after loop
Param ranges:
  I variable
  E expression
  S statement

  I = 0
  while I < E:
    S
    I++
=>
  I = 0
  while I < E-1:
    S
    I++
  S
  I++

where:
  E > 0
  S does not modify I, E
Side conditions encode when rewrite is safe
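To see why the side conditions matter, here is a minimal sketch that runs both sides of the peeling rule on concrete inputs. It tests a handful of values and is not PEC's proof. The state layout, `body`, and `body_bad` are assumptions made for illustration, and E is a plain integer here, so "S does not modify E" holds trivially.

```python
from typing import Callable, Dict

State = Dict[str, int]

def run_original(E: int, S: Callable[[State], None]) -> State:
    # I = 0; while I < E: S; I++
    st = {"I": 0, "acc": 0}
    while st["I"] < E:
        S(st)
        st["I"] += 1
    return st

def run_peeled(E: int, S: Callable[[State], None]) -> State:
    # I = 0; while I < E-1: S; I++   followed by the peeled S; I++
    st = {"I": 0, "acc": 0}
    while st["I"] < E - 1:
        S(st)
        st["I"] += 1
    S(st)
    st["I"] += 1
    return st

def body(st: State) -> None:
    st["acc"] += st["I"]      # respects the side condition: I is untouched

def body_bad(st: State) -> None:
    st["I"] += 1              # violates it: S modifies I

agree = [E for E in range(0, 6) if run_original(E, body) == run_peeled(E, body)]
print(agree)  # [1, 2, 3, 4, 5]
print(run_original(2, body_bad), run_peeled(2, body_bad))
```

E = 0 is missing from `agree` because the peeled program runs S once while the original never does, which is what `E > 0` rules out. With `body_bad` the two programs diverge at E = 2.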
Enable 3x Unrolling
Apply Rewrite
1. Match Params
2. Check Side Conds
3. Rewrite
Applying Rewrite Rules
Rule:
  I = 0
  while I < E:
    S
    I++
=>
  I = 0
  while I < E-1:
    S
    I++
  S
  I++
where:
  E > 0
  S does not modify I, E

Instance (I = k, E = 100, S = a[k] += k):
  k = 0
  while k < 100:
    a[k] += k
    k++
  (100 iterations: not divisible by 3, difficult to unroll 3x)
=>
  k = 0
  while k < 99:
    a[k] += k
    k++
  a[k] += k
  k++
  (99 loop iterations: divisible by 3, easy to unroll 3x)
where:
  100 > 0
  a[k] += k does not modify k, 100
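The three steps (match params, check side conds, rewrite) can be sketched on a toy representation. This is a hypothetical illustration and not the PEC or XCert engine: `Stmt`, `Loop`, `Peeled`, and `peel` are invented names, the loop bound is restricted to a constant, and the "does not modify" check is a declared write set rather than a real program analysis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Stmt:
    text: str
    writes: FrozenSet[str]   # variables the statement may modify (assumed given)

@dataclass(frozen=True)
class Loop:
    var: str     # plays the role of parameter I
    bound: int   # plays the role of parameter E (a constant in this sketch)
    body: Stmt   # plays the role of parameter S

@dataclass(frozen=True)
class Peeled:
    loop: Loop
    epilogue: Stmt   # the iteration shifted after the loop

def peel(loop: Loop) -> Optional[Peeled]:
    # 1. Match params: the fields of Loop bind I, E, and S.
    # 2. Check side conds: E > 0, and S does not modify I
    #    (E is a constant here, so S cannot modify it).
    if not loop.bound > 0:
        return None
    if loop.var in loop.body.writes:
        return None
    # 3. Rewrite: shrink the bound by one and append one copy of S.
    return Peeled(Loop(loop.var, loop.bound - 1, loop.body), loop.body)

k_loop = Loop("k", 100, Stmt("a[k] += k", frozenset({"a"})))
peeled = peel(k_loop)
print(peeled.loop.bound, peeled.loop.bound % 3 == 0)  # 99 True
```

When a side condition fails, `peel` returns `None` and the program is left unchanged, which is the conservative choice.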
PEC
Parameterized Equivalence Checking
ParameterizedEquivalenceChecker
Prove Optimizations Automatically
Before Compiler Ever Runs
Handle Complex Loop
Optimizations
Optimization
InputPProg
OutputPProg
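Original:
  I := 0
  while I < E:
    S
    I++
Transformed:
  I := 0
  while I < E-1:
    S
    I++
  S
  I++
where:
  E > 0
  S does not modify I, E

Consider CFGs:
  original:    I:=0; branch I<E (S; I++; back to branch) or I≥E (exit)
  transformed: I:=0; branch I<E-1 (S; I++; back to branch) or I≥E-1 (S; I++; exit)
Programs equivalent:
  Start in equal states (σ1=σ2)
  End in equal states (σ1=σ2)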
Checking Rewrite Rules
PEC
Relate Executions:
  1. Find synch points
  2. Generate invariants
  3. Check invs preserved
Synch points A and B relate the two CFGs between entry (σ1=σ2) and exit (σ1=σ2)
Auto Theorem Prover:
  Each inv implies succs
  Strengthen if too weak
1. Find Synchronization Points
Traverse in lockstep
Stop at stmt params
Prune infeasible paths
  From Path: E ≤ 0 (original takes I:=0; I≥E)
  Side Conds: E > 0
  Path never executes
PEC
2. Generate Invariants
Invariants: preds over σ1, σ2
Gen initial invariant: σ1 = σ2 AND strongest post cond
  A(σ1,σ2): σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I < E-1)
  B(σ1,σ2): σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I ≥ E-1)
PEC
3. Check Invariants
Each inv implies succs
Query Theorem Prover
Synch point pairs to check: Entry→A, Entry→B, A→B, A→A, B→Exit
Example, A→B:
  original path:    S; I++; I<E
  transformed path: S; I++; I≥E-1
  A(σ1,σ2): σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I < E-1)
  B(σ1,σ2): σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I ≥ E-1)
PEC
3. Check Invariants
ATP Query (A→B):
  ∀ σ1 σ2 .
    A(σ1,σ2) ∧
    σ1’ = step(σ1, S;I++;I < E) ∧
    σ2’ = step(σ2, S;I++;I ≥ E-1)
    ⇒ B(σ1’, σ2’)
where:
  A(σ1,σ2): σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I < E-1)
  B(σ1,σ2): σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I ≥ E-1)
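The shape of the A→B obligation can be illustrated with a bounded brute-force check. This is only a sketch under strong assumptions. It enumerates a small finite state space instead of calling a theorem prover, and it fixes one concrete S, whereas PEC reasons about every S that meets the side conditions. The state layout (`I`, `E`, `x`) and the helper names are hypothetical.

```python
from itertools import product

def A(s1, s2):
    return s1 == s2 and s1["I"] < s1["E"] and s2["I"] < s2["E"] - 1

def B(s1, s2):
    return s1 == s2 and s1["I"] < s1["E"] and s2["I"] >= s2["E"] - 1

def S(s):
    # Sample statement: modifies x only, never I or E.
    return {**s, "x": s["x"] + s["I"]}

def step(s, guard):
    # Run S; I++ and then follow the branch `guard`.
    # Returns None when that branch is not taken from the resulting state.
    t = S(s)
    t = {**t, "I": t["I"] + 1}
    return t if guard(t) else None

checked, violations = 0, []
for I, E, x in product(range(-3, 6), range(-3, 6), range(3)):
    s1 = s2 = {"I": I, "E": E, "x": x}   # A requires equal states
    if not A(s1, s2):
        continue
    t1 = step(s1, lambda t: t["I"] < t["E"])        # original: I < E
    t2 = step(s2, lambda t: t["I"] >= t["E"] - 1)   # transformed: I ≥ E-1
    if t1 is None or t2 is None:
        continue                                    # path pair infeasible here
    checked += 1
    if not B(t1, t2):
        violations.append((s1, s2))
print(checked, violations)
```

An empty `violations` list means no counterexample exists in the enumerated range, which is much weaker than the once-and-for-all guarantee the theorem prover gives.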
3. Check Invariants
ATP Query (A→B), with A and B expanded:
  ∀ σ1 σ2 .
    σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I < E-1) ∧
    σ1’ = step(σ1, S;I++;I < E) ∧
    σ2’ = step(σ2, S;I++;I ≥ E-1)
    ⇒ σ1’=σ2’ ∧ eval(σ1’, I < E) ∧ eval(σ2’, I ≥ E-1)
3. Check Invariants
Strengthen A if the theorem prover fails
PEC
Optimizations Checked
Category 1: PEC, Rhodium forms equivalent
  Copy propagation
  Constant propagation
  Common sub-expression elim
  Partial redundancy elim
Category 2: PEC form easier, more general
  Loop invariant code hoisting
  Conditional speculation
  Speculation
Category 3: Expressible in PEC, no Rhodium formulation possible
  Software pipelining
  Loop unswitching
  Loop unrolling
  Loop peeling
  Loop splitting
  Loop interchange
PEC
Loose Ends
Integrate into compilation chain
  build execution engine in real compiler
Correctly pattern match and splice code
  reason about substitutions, IR semantics
Correctly check side conditions
  various program analyses
PEC
PEC
Our Two Phase Approach
1. PEC : Automatically check rewrite
2. XCERT : Correctly execute rewrite
Rewrite PEC XCERT
Optimization Correctness
[PLDI 09] [PLDI 10]
XCERT
Background: Verified Compilers
Formally prove compiler correct
Implement compiler in proof assistant
  enables interactive proving
Hard to overcome formality inertia
  high initial cost, “frozen” designs
Strong Guarantee
Difficult to Extend
XCERT
On the Shoulders of Giants
XCert extends CompCert with extensibility
  verified compiler provides sure foundation
Win-win Partnership
  CompCert benefits: new optimizations without manual proof effort
  XCert benefits: real compilation framework, formal semantics
CompCert
XCert
Extensible & Correct Compiler
PEC Rewrite
ATP Checks
CompCert
C Asm
Correct Compiler
??
Main Theorem Proved in Coq:
  PEC Checked Rewrites in XCert
  XCert Correct
Formal Correctness Proof in Coq
  Bulk of the development effort
Background: Proof Assistants
XCERT
Based on Curry-Howard Isomorphism:
  Program : Type  corresponds to  Proof : Theorem
Coq takes this idea to its logical conclusion
  Programs & proofs in same language
  Dependent types are powerful!
Example Verified Coq Program
XCERT
Inductive sorted : list Z -> Prop :=
  | sorted0 : sorted nil
  | sorted1 : forall z:Z, sorted (z :: nil)
  | sorted2 :
      forall (z1 z2:Z) (l:list Z),
        z1 <= z2 -> sorted (z2 :: l) -> sorted (z1 :: z2 :: l).

...

Definition sort :
  forall l:list Z, {l' : list Z | equiv l l' /\ sorted l'}.
  induction l as [| a l IHl].
  exists (nil (A:=Z)); split; auto with sort.
  case IHl; intros l' [H0 H1].
  exists (aux a l'); split.
  apply equiv_trans with (a :: l'); auto with sort.
  apply aux_equiv.
  apply aux_sorted; auto.
Defined.
XCert Correctness Proof
Small Step
  Execute instruction
  Step state S to S’
Execution Equivalence
  Initial Equiv
  Prove Simulation Diagram
  Final Equiv
CompCert Small Step Library:
  Sim Diagram ⇒ Progs Equiv
Simulation Diagram:
  if L ~ R and L steps to L’,
  then there exists R’ such that R steps to R’ and L’ ~ R’
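A simulation diagram can be checked mechanically on a small finite system, which may help build intuition for the proof obligation. The sketch below is a toy and not the CompCert library: it handles only deterministic, lock-step transitions (CompCert's small-step library also offers more general diagrams), and `simulates`, `step_l`, `step_r`, and `rel` are hypothetical names.

```python
def simulates(step_l, step_r, rel):
    # For every related pair (l, r): if l steps to l2, then r must step
    # to some r2 with (l2, r2) related again.
    for l, r in rel:
        l2 = step_l.get(l)
        if l2 is None:
            continue                      # l is final: nothing to match
        r2 = step_r.get(r)
        if r2 is None or (l2, r2) not in rel:
            return False
    return True

step_l = {n: n + 1 for n in range(4)}            # 0 -> 1 -> 2 -> 3 -> 4
step_r = {2 * n: 2 * n + 2 for n in range(4)}    # 0 -> 2 -> 4 -> 6 -> 8
rel = {(n, 2 * n) for n in range(5)}             # L ~ R  iff  R = 2 * L

print(simulates(step_l, step_r, rel))            # True
bad_rel = rel | {(1, 0)}                         # relate two states that drift apart
print(simulates(step_l, step_r, bad_rel))        # False
```

Once the diagram holds for every step, equivalence of whole executions follows by induction on the number of steps, which is the part the library is described as providing.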
XCert Correctness Proof
[Figure: XCert simulation diagram relating Original and Transformed executions at synch points A and B; PEC Checked Rewrite, ATP Checked]
XCERT
XCert Highlights
Expressive CFG manipulations
  pattern matching, splicing
Proof Complexity Management
  Verified validation [Tristan and Leroy]
  preserving non-terminating behaviors
Verified Analyses for Side Conditions
XCERT
Evaluation
Engine : 1,500 lines of Coq functional code
Proof : 4,500 lines of Coq proof script
Time : 9 hacker months
[Chart: lines of Code and Proof (0 to 5000) for CSE, Const Prop, XCert]
XCERT
Evaluation
Trusted Computing Base (TCB)
Appeals to faith … want to minimize
CompCert : Coq + Coq encoding of semantics
XCert adds : SMT + SMT encoding of semantics
All architected to pass through small checker
TCB is only a few hundred lines
XCERT
Evaluation
Extensibility: Support PEC Opts [PLDI 09]
  No manual proof effort or TCB increase
  Maintain CompCert end-to-end correctness
Sample of Optimizations Run:
  Loop Invariant Code Hoist      Loop Peeling
  Software Pipelining            Conditional Speculation
  Loop Unswitching               Partial Redundancy Elim
XCERT
Thank You!
1 Rewrite Rule: PEC
2 XCert: Extensible & Correct Compiler