
Have Your Compiler and Extend It Too

Zachary Tatlock, UC San Diego

Correctness Guaranteed

Howdy! My name is Zach.

I work in the Programming Systems group.

Collaborators

Sudipta Kundu

PhD 09, Synopsys

Sorin Lerner

UCSD

Building robust compilers is difficult
complex interactions resist testing

Compiler bugs are contagious
invalidate source level guarantees

Few users extend their compiler
hand optimized, unreadable code

Compiler Correctness

Major Resource Allocation

GCC

LLVM

Extensive Testing

Compiler   Total (KLOC)   Testing (KLOC)   %
GCC        5292           839              16
Python     1118           223              20
Java       5340           789              15
LLVM        790           250              32

Rough Source Breakdown

“Testing shows the presence, not the absence of bugs.”

- Dijkstra

Decades of Research

Compiler Verification: A Bibliography

2003
0. Gerwin Klein, and Tobias Nipkow; Verified Bytecode Verifiers; Theoretical Computer Science, 298:583-626; 2003.
1. S. Berghofer, and M. Strecker; Extracting a formally verified, fully executable compiler from a proof assistant; In Proceedings of Compiler Optimization meet Compiler Verification; 2003.
2. Sabine Glesner, and Jan Olaf Blech; Classifying and Formally Verifying Integer Constant Folding; In Proceedings of Compiler Optimization meet Compiler Verification; 2003.
3. Thomas Genet, Thomas Jensen, Vikash Kodati, and David Pichardie; A Java Card CAP Converter in PVS; In Proceedings of Compiler Optimization meet Compiler Verification; 2003.

2002
4. Gerhard Goos; Compiler Verification and Compiler Architecture; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
5. Lenore Zuck, Amir Pnueli, Yi Fang and Benjamin Goldberg; VOC: A Translation Validator for Optimizing Compilers; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April
6. Sabine Glesner, Rubino Geiß and Boris Boesler; Verified Code Generation for Embedded Systems; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
7. Carl Christian Frederiksen; Correctness of Classical Compiler Optimizations using CTL; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
8. Thi Viet Nga Nguyen and Francois Irigoin; Alias verification for Fortran code optimization; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
9. K.C. Shashidhar, Maurice Bruynooghe, Francky Catthoor and Gerda Janssens; Geometric Model Checking: An Automatic Verification Technique for Loop and Data Reuse Transformations;
10. Clara Jaramillo, Rajiv Gupta and Mary Lou Soffa; Debugging and Testing Optimizers through Comparison Checking; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April
11. Wolfgang Goerigk; Towards Acceptability of Optimizations: An Extended View of Compiler Correctness; Electronic Notes in Theoretical Computer Science, Volume 65, Issue 2, April 2002.
12. Martin Strecker; Formal Verification of a Java Compiler in Isabelle; Conference on Automated Deduction, Copenhagen, Denmark, July 27-30, 2002.
13. A. Pnueli, Y. Rodeh, O. Strichman, and M. Siegel; The small model property: How small can it be? Information and Computation, 178(1):279-293, October 2002.
14. L. Zuck, A. Pnueli, Y. Fang, B. Goldberg, and Y. Hu; Translation and run-time validation of optimized code; In 2nd Workshop on Runtime Verification, volume 70(4) of Electronic Notes in
15. Raya Leviathan, and Amir Pnueli; Validating software pipelining optimizations; Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems,

2001
16. Axel Dold, and Vincent Vialard; A Mechanically Verified Compiling Specification for a Lisp Compiler; Proc. of the 21st Conference on Foundations of Software Technology and Theoretical
17. A. Dold, and V. Vialard; A Mechanically Verified Bootstrap Compiler; Proceedings of Kolloquium Programmiersprachen und Grundlagen der Programmierung Technical report AIB-2001-11,
18. Wolfgang Goerigk, and Hans Langmaack; Compiler Implementation Verification and Trojan Horses; Verifix technical report, 2001.
19. L. Zuck, A. Pnueli, and R. Leviathan; Validation of optimizing compilers; Technical Report MCS01-12, Weizmann Institute of Science, August 2001.

2000
20. Wolfgang Goerigk; Compiler verification revisited; In M. Kaufmann, P. Manolios, J Moore (ed.): Computer Aided Reasoning: ACL2 Case Studies, Kluwer, 2000.
21. George C. Necula; Translation Validation for an Optimizing Compiler; In Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation

. . .

Over 100 papers on Compiler Verification from 1967 to 2003 …

… dozens more since 2003.

Decades of Research

1967 : McCarthy, Painter
Correctness of a compiler for arithmetical expressions

1972 : Milner, Weyhrauch
Proving compiler correctness in a mechanized logic

1989 : Moore
Mechanically Verified Language Implementation

1999 : Morrisett, Walker, Crary, Glew
From System F to typed assembly language

2006 : Leroy
Formal certification of a compiler back-end

Bugs Persist

Bugzilla Sampling, August 2010

Compiler   Released   Current Bugs   Class
GCC        1987       3410           Confirmed
Python     1991       2300           >= Normal
Java       1995       2120           Unresolved
LLVM       2003       1480           Confirmed

[Figure: Bugs (1000 to 4000) plotted against Compiler Age (5 to 25)]

John Regehr : Bug Hunter

Test compilers on random C programs

Found hundreds of mainstream compiler bugs

Simple: GCC folded (x / -1) != 1 to 0
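A minimal sketch of why that fold is wrong, and of the differential idea behind random testing. It uses Python to model C's truncating signed division; `c_div`, `original`, `folded`, and `find_mismatches` are illustrative names, not part of GCC or of any real testing tool.

```python
import random

def c_div(a: int, b: int) -> int:
    """Model C integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def original(x: int) -> int:
    """The source expression (x / -1) != 1, as a C-style 0/1 result."""
    return int(c_div(x, -1) != 1)

def folded(x: int) -> int:
    """The constant that the fold produced, regardless of x."""
    return 0

def find_mismatches(trials: int = 1000, seed: int = 0) -> list:
    """Return the random inputs on which the fold changes the result."""
    rng = random.Random(seed)
    inputs = [rng.randint(-1000, 1000) for _ in range(trials)]
    return [x for x in inputs if original(x) != folded(x)]

mismatches = find_mismatches()
```

Under this model the two versions agree only at x = -1, so almost any random input exposes the bug.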

Work Smarter Not Harder

Build tools for heavy lifting.

OPT

Focus on Optimizations

Many subtle optimizations

difficult to anticipate interactions

Correctness well defined

original and transformed behave identically

Disabling no longer an option

programs depend on optimizations

Our Two Phase Approach

1. PEC : Automatically check rewrite

2. XCERT : Correctly execute rewrite

Rewrite PEC XCERT

Optimization Correctness

PEC

[PLDI 09] [PLDI 10]

Opt Check: Previous Techniques

Translation Validation
prove equivalence at compile time, on each execution
TVOC [Zuck et al.]
[Necula 00]
[Pnueli et al.]
Verified TV [Tristan et al.]

A Priori Correctness
prove correctness before the compiler runs, once and for all
Rhodium [Lerner et al.]
CompCert [Leroy et al.]

PEC

Focus on Automated Techniques

Scope of Guarantee: Verify Run, Verify Optimization
Expressive Power: 1-to-1 Rewrites, Complex Loop Opts

PEC: Complex Loop Opts + Once-and-for-all Correctness

Generalize Translation Validation to Once-and-for-all Setting

Translation Validation: Equivalence Checker compares Input Prog and Output Prog of one Optimization Instance
PEC: Parameterized Equivalence Checker compares Input PProg and Output PProg of the Optimization

Generalize to Parameterized Progs

Prove Optimizations Automatically
Before Compiler Ever Runs
Handle Complex Loop Optimizations

Parameterized Rewrite Rules

Optimization: Input PProg, Output PProg

Loop Peeling: move iteration out
Shift final iteration after loop

Param ranges:
I  variable
E  expression
S  statement

Input PProg:

    I = 0
    while I < E:
      S
      I++

Output PProg:

    I = 0
    while I < E-1:
      S
      I++
    S
    I++

where:
  E > 0
  S does not modify I, E

Side conditions encode when rewrite is safe

PEC

Enable 3x Unrolling
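A small sketch of what the loop peeling rule claims, written as two Python interpreters for the input and output programs. The dictionary state, the function names, and the example statement `accumulate` are assumptions made for illustration; PEC proves the rule for all I, E, and S rather than testing instances like this.

```python
def run_original(E: int, body, state: dict) -> dict:
    """I = 0; while I < E: S; I++"""
    s = dict(state)
    s["I"] = 0
    while s["I"] < E:
        body(s)
        s["I"] += 1
    return s

def run_peeled(E: int, body, state: dict) -> dict:
    """I = 0; while I < E-1: S; I++   followed by   S; I++"""
    s = dict(state)
    s["I"] = 0
    while s["I"] < E - 1:
        body(s)
        s["I"] += 1
    body(s)          # the peeled final iteration
    s["I"] += 1
    return s

def accumulate(s: dict) -> None:
    """Example S: reads I but does not modify I or E."""
    s["acc"] = s["acc"] + s["I"]

init = {"acc": 0}
agree_when_positive = all(
    run_original(E, accumulate, init) == run_peeled(E, accumulate, init)
    for E in range(1, 20)
)
differ_at_zero = run_original(0, accumulate, init) != run_peeled(0, accumulate, init)
```

With E = 0 the peeled program still runs S once and increments I, which is why the side condition E > 0 is needed.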

Apply Rewrite

1. Match Params

2. Check Side Conds

3. Rewrite

Applying Rewrite Rules

Rule:

    I = 0
    while I < E:
      S
      I++

rewrites to

    I = 0
    while I < E-1:
      S
      I++
    S
    I++

where:
  E > 0
  S does not modify I, E

PEC

Instance:

    k = 0
    while k < 100:
      a[k] += k
      k++

rewrites to

    k = 0
    while k < 99:
      a[k] += k
      k++
    a[k] += k
    k++

where:
  100 > 0
  a[k] += k does not modify k, 100

Original loop: not divisible by 3, difficult to unroll 3x
Peeled loop: divisible by 3, easy to unroll 3x
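The three steps (match params, check side conds, rewrite) can be sketched on a toy program representation. Everything here is hypothetical: the `Loop` and `Peeled` classes, the constant-only bound, and the precomputed `writes` set stand in for a real IR and real program analyses.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Loop:
    """Models `var = 0; while var < bound: body; var++`."""
    var: str
    bound: int         # constant bounds only, a simplifying assumption
    body: str          # statement text, e.g. "a[k] += k"
    writes: frozenset  # variables the body assigns, assumed given by an analysis

@dataclass(frozen=True)
class Peeled:
    """Models the rewritten program: shorter loop, then body; var++."""
    loop: Loop
    tail_body: str

def match(prog) -> Optional[dict]:
    """Step 1: bind the rule parameters I, E, S, or return None."""
    if isinstance(prog, Loop):
        return {"I": prog.var, "E": prog.bound, "S": prog.body,
                "writes": prog.writes}
    return None

def side_conditions_hold(b: dict) -> bool:
    """Step 2: E > 0 and S does not modify I (E is a constant here)."""
    return b["E"] > 0 and b["I"] not in b["writes"]

def peel(prog):
    """Step 3: rewrite if the rule applies, else return prog unchanged."""
    b = match(prog)
    if b is None or not side_conditions_hold(b):
        return prog
    shorter = Loop(b["I"], b["E"] - 1, b["S"], b["writes"])
    return Peeled(shorter, b["S"])

example = Loop("k", 100, "a[k] += k", frozenset({"a"}))
result = peel(example)
```

On the slide's instance this yields a loop bounded by 99 followed by one extra copy of the body; a loop with bound 0, or a body that assigns `k`, is left untouched.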

PEC

Parameterized Equivalence Checking

Parameterized Equivalence Checker

Prove Optimizations Automatically
Before Compiler Ever Runs
Handle Complex Loop Optimizations

Optimization: Input PProg, Output PProg

Input PProg:

    I := 0
    while I < E:
      S
      I++

Output PProg:

    I := 0
    while I < E-1:
      S
      I++
    S
    I++

where:
  E > 0
  S does not modify I, E

Programs equivalent:
Consider CFGs
Start in equal states (σ1=σ2)
End in equal states (σ1=σ2)

[Figure: CFG of the original loop (I:=0; branch I<E / I≥E; S; I++) beside CFG of the peeled loop (I:=0; branch I<E-1 / I≥E-1; S; I++; then S; I++)]

Checking Rewrite Rules

PEC

Relate Executions:

1. Find synch points

2. Generate invariants

3. Check invs preserved

[Figure: the two CFGs with synch points A and B; σ1=σ2 at entry and at exit]

Auto Theorem Prover
Each inv implies succs
Strengthen if too weak

Checking Rewrite Rules

PEC

Traverse in lockstep
Stop at stmt params
Prune infeasible paths

From Path (I:=0 then I≥E): E ≤ 0
Side Conds: E > 0
Path never executes

1. Find Synchronization Points
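A tiny sketch of the pruning argument: the path condition and the side condition cannot both hold. PEC discharges this with a theorem prover; the bounded enumeration below is only a stand-in, and the function names and the search bound are assumptions.

```python
def path_condition(E: int) -> bool:
    """The original program exits right after I := 0, so 0 >= E."""
    I = 0
    return I >= E

def side_condition(E: int) -> bool:
    """Side condition of the loop peeling rule."""
    return E > 0

def feasible(bound: int = 100) -> bool:
    """True if some E in [-bound, bound] satisfies both conditions."""
    return any(path_condition(E) and side_condition(E)
               for E in range(-bound, bound + 1))

prune_path = not feasible()
```

No value of E satisfies both, so the path that skips the loop entirely can be dropped from the traversal.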

PEC

[Figure: the two CFGs with synch points A and B; σ1=σ2 at entry and at exit]

Invariants: preds over σ1, σ2

Gen initial invariant: σ1 = σ2 AND strongest post cond

A(σ1,σ2) ≡ σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I < E-1)

B(σ1,σ2) ≡ σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I ≥ E-1)

2. Generate Invariants

PEC

[Figure: the two CFGs with synch points A and B, and the path pair from A to B: S; I++; I<E in the original and S; I++; I ≥ E-1 in the transformed program]

Each inv implies succs
Query Theorem Prover

Path pairs: Entry → A, Entry → B, A → B, A → A, B → Exit

3. Check Invariants

PEC

ATP Query for A → B:

∀ σ1 σ2 .
  A(σ1,σ2) ∧
  σ1’ = step(σ1, S;I++;I < E) ∧
  σ2’ = step(σ2, S;I++;I ≥ E-1)
  ⇒ B(σ1’, σ2’)

where

A(σ1,σ2) ≡ σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I < E-1)

B(σ1,σ2) ≡ σ1=σ2 ∧ eval(σ1, I < E) ∧ eval(σ2, I ≥ E-1)

[Figure: path pair from A to B, taking σ1, σ2 to σ1’, σ2’]
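A bounded sanity check of the A → B obligation, as a sketch only. It fixes one concrete S (`acc += I`), enumerates small states instead of calling a theorem prover, and models `step` as returning no state when the trailing branch condition fails; all of these are assumptions for illustration.

```python
from itertools import product
from typing import NamedTuple, Optional

class State(NamedTuple):
    I: int
    E: int
    acc: int

def step(s: State, guard) -> Optional[State]:
    """Run S; I++ and keep the result only if the trailing guard holds."""
    t = State(s.I + 1, s.E, s.acc + s.I)   # S is acc += I
    return t if guard(t) else None

def inv_a(s1: State, s2: State) -> bool:
    return s1 == s2 and s1.I < s1.E and s2.I < s2.E - 1

def inv_b(s1: State, s2: State) -> bool:
    return s1 == s2 and s1.I < s1.E and s2.I >= s2.E - 1

def check_a_to_b(limit: int = 6):
    """Return (cases exercised, counterexamples) for the A -> B obligation."""
    states = [State(i, e, a)
              for i, e, a in product(range(limit), range(limit + 1), range(3))]
    exercised, bad = 0, []
    for s1, s2 in product(states, states):
        if not inv_a(s1, s2):
            continue
        t1 = step(s1, lambda t: t.I < t.E)
        t2 = step(s2, lambda t: t.I >= t.E - 1)
        if t1 is None or t2 is None:
            continue           # this path pair is not taken from (s1, s2)
        exercised += 1
        if not inv_b(t1, t2):
            bad.append((s1, s2))
    return exercised, bad

exercised, counterexamples = check_a_to_b()
```

The check is non-vacuous (some state pairs do take this path pair) and finds no counterexample in the bounded domain, which matches what the ATP query establishes in general.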

ATP goal, with B unfolded on σ1’, σ2’:

σ1’=σ2’ ∧ eval(σ1’, I < E) ∧ eval(σ2’, I ≥ E-1)

Strengthen A if the theorem prover fails:

A ∧ B(σ1’, σ2’)
  where σ1’ = step(σ1, S;I++;I < E)
        σ2’ = step(σ2, S;I++;I ≥ E-1)


Category 1: PEC, Rhodium forms equivalent

Copy propagation

Constant propagation

Common sub-expression elim

Partial redundancy elim

Category 2: PEC form easier, more general

Loop invariant code hoisting

Conditional speculation

Speculation

Category 3: Expressible in PEC, no Rhodium formulation possible

Software pipelining

Loop unswitching

Loop unrolling

Loop peeling

Loop splitting

Loop interchange

Optimizations Checked

PEC

Loose Ends

Integrate into compilation chain

build execution engine in real compiler

Correctly pattern match and splice code
reason about substitutions, IR semantics

Correctly check side conditions
various program analyses

PEC

Our Two Phase Approach

1. PEC : Automatically check rewrite

2. XCERT : Correctly execute rewrite

Rewrite PEC XCERT

Optimization Correctness

[PLDI 09] [PLDI 10]

XCERT

Formally prove compiler correct

Implement compiler in proof assistant
enables interactive proving

Hard to overcome formality inertia
high initial cost, “frozen” designs

Strong Guarantee

Difficult to Extend

Background: Verified Compilers

XCERT

On the Shoulders of Giants

XCert extends CompCert with extensibility
verified compiler provides sure foundation

Win-win Partnership

CompCert benefits: new optimizations without manual proof effort

XCert benefits: real compilation framework, formal semantics

CompCert

XCert

Extensible & Correct Compiler

PEC: Rewrite, ATP Checks

CompCert: C to Asm, Correct Compiler

??

Main Theorem Proved in Coq: PEC Checked Rewrites in XCert imply XCert Correct

Formal Correctness Proof in Coq
Bulk of the development effort

Background: Proof Assistants

XCERT

Based on Curry-Howard Isomorphism:

Program : Type   corresponds to   Proof : Theorem

Coq takes this idea to its logical conclusion
Programs & proofs in same lang
Dependent Types are powerful!

Example Verified Coq Program

XCERT

    Inductive sorted : list Z -> Prop :=
      | sorted0 : sorted nil
      | sorted1 : forall z:Z, sorted (z :: nil)
      | sorted2 :
          forall (z1 z2:Z) (l:list Z),
            z1 <= z2 -> sorted (z2 :: l) -> sorted (z1 :: z2 :: l).

    ...

    Definition sort :
      forall l:list Z, {l' : list Z | equiv l l' /\ sorted l'}.
      induction l as [| a l IHl].
      exists (nil (A:=Z)); split; auto with sort.
      case IHl; intros l' [H0 H1].
      exists (aux a l'); split.
      apply equiv_trans with (a :: l'); auto with sort.
      apply aux_equiv.
      apply aux_sorted; auto.
    Defined.

XCert Correctness Proof

Small Step
Execute instruction
Step state S to S’

XCERT

Execution Equivalence
Initial Equiv
Prove Simulation Diagram
Final Equiv

Simulation Diagram:
L ~ R ∧ L → L’ ⇒ ∃ R’ : R → R’ ∧ L’ ~ R’

CompCert Small Step Library:
Sim Diagram ⇒ Progs Equiv
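The simulation diagram can be illustrated on a finite transition system. This sketch checks the simplest lock-step form (one left step matched by exactly one right step); CompCert's small-step library works with more general diagrams, and the toy systems and names below are hypothetical.

```python
def simulates(left_steps: dict, right_steps: dict, related: set) -> bool:
    """Check: L ~ R and L -> L' imply some R -> R' with L' ~ R'."""
    for (l, r) in related:
        for l_next in left_steps.get(l, ()):
            if not any((l_next, r_next) in related
                       for r_next in right_steps.get(r, ())):
                return False
    return True

# Toy systems: each dict maps a state label to its successor states.
left = {"l0": ["l1"], "l1": ["l2"]}
right = {"r0": ["r1"], "r1": ["r2"]}

# A relation that pairs states step for step satisfies the diagram.
good_relation = {("l0", "r0"), ("l1", "r1"), ("l2", "r2")}

# A relation that skips a right-hand state does not.
bad_relation = {("l0", "r0"), ("l1", "r2")}

good_ok = simulates(left, right, good_relation)
bad_ok = simulates(left, right, bad_relation)
```

Once the diagram holds for every related pair, equivalence of whole executions follows by induction on the number of steps, which is the part the library provides.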

XCert Correctness Proof

XCERT

Original, Transformed

XCert Simulation Diagram

PEC Checked Rewrite
ATP Checked: A → A, A → B

[Figure: simulation diagram relating Original and Transformed executions through invariants A and B]

XCERT

XCert Highlights

Expressive CFG manipulations

pattern matching, splicing

Proof Complexity Management

Verified validation [Tristan and Leroy]

preserving non-terminating behaviors

Verified Analyses for Side Conditions

XCERT

Evaluation

Engine : 1,500 lines of Coq functional code

Proof : 4,500 lines of Coq proof script

Time : 9 hacker months

XCERT

[Figure: bar chart of Code and Proof size in lines (0 to 5000) for CSE, Const Prop, and XCert]

Evaluation

Trusted Computing Base (TCB)

Appeals to faith … want to minimize

CompCert : Coq + Coq encoding of semantics

XCert adds : SMT + SMT encoding of semantics

All architected to pass through small checker

TCB is only a few hundred lines

XCERT

Evaluation

Extensibility: Support PEC Opts [PLDI 09]

No manual proof effort or TCB increase

Maintain CompCert end-to-end correctness

Sample of Optimizations Run:

Loop Invariant Code Hoist
Loop Peeling
Software Pipelining
Conditional Speculation
Loop Unswitching
Partial Redundancy Elim

XCERT

1. Rewrite Rule: PEC

2. XCert: Extensible & Correct Compiler

Thank You!
