Relational Verification to SIMD Loop Synthesis

Mark Marron – IMDEA & Microsoft Research
Sumit Gulwani – Microsoft Research
Gilles Barthe, Juan M. Crespo, Cesar Kunz – IMDEA

General SIMD Compilation

Compilers struggle to utilize SIMD operations in general-purpose code:
◦ Text processing, web browsers, compilers, etc.
◦ The standard library code (C++ STL, .NET BCL) they use

Challenges:
◦ Data structure layouts of composite types
◦ Complex data-driven control flow
◦ Wide-ranging code restructuring is often needed

We know most of the needed “tricks”, but:
◦ The time and implementation effort needed to identify and implement all of them is too large
◦ “An Evaluation of Vectorizing Compilers”, PACT ’11

Example (Exists Function)

typedef struct { int tag; int score; } widget;

int exists(widget* vals, int len, int t, int s) {
  for (int i = 0; i < len; ++i) {
    int tagok = vals[i].tag == t;
    int scoreok = vals[i].score > s;
    int andok = tagok & scoreok;
    if (andok) return 1;
  }
  return 0;
}

SIMD Example (Exists Function)

…
for (; i < (len - 3); i += 4) {
  // blck1 = [ti, si, ti+1, si+1], blck2 = [ti+2, si+2, ti+3, si+3]
  m128i blck1 = load_128(vals, i);
  m128i blck2 = load_128(vals, i + 2);

  // tagvs = [ti, ti+1, ti+2, ti+3], scorevs = [si, si+1, si+2, si+3]
  m128i tagvs = shuffle_i32(blck1, blck2, ORDER(0, 2, 0, 2));
  m128i scorevs = shuffle_i32(blck1, blck2, ORDER(1, 3, 1, 3));

  // cmptag = [ti==t ? 0xF…F : 0x0, …, ti+3==t ? 0xF…F : 0x0]
  // cmpscore = [si>s ? 0xF…F : 0x0, …, si+3>s ? 0xF…F : 0x0]
  m128i cmptag = cmpeq_i32(vectv, tagvs);
  m128i cmpscore = cmpgt_i32(scorevs, vecsv);
  // cmpr = [cmptag0 & cmpscore0, …, cmptag3 & cmpscore3]
  m128i cmpr = and_i128(cmptag, cmpscore);

  // match = (cmpr0!=0 | cmpr1!=0 | cmpr2!=0 | cmpr3!=0)
  int match = !allzeros(cmpr);
  if (match) return 1;
}
…
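The following portable plain-C sketch makes the lane contents above concrete without depending on SSE hardware. Each `int[4]` array stands in for one 128-bit register. The `memcpy` loads, the index-based de-interleave, and the mask arithmetic are illustrative stand-ins for the slide's `load_128`, `shuffle_i32`, `cmpeq_i32`/`cmpgt_i32`, `and_i128`, and `allzeros`. They are not real intrinsics. The sketch assumes `widget` has no padding, which `_Static_assert` checks. It also adds a scalar tail loop for the last `len % 4` elements. `exists_4wide` is a hypothetical name, and `exists` is repeated so the block compiles on its own.

```c
#include <string.h>

typedef struct { int tag; int score; } widget;
_Static_assert(sizeof(widget) == 2 * sizeof(int), "widget must have no padding");

/* Scalar reference version, as in the Exists example. */
static int exists(const widget *vals, int len, int t, int s) {
    for (int i = 0; i < len; ++i) {
        int tagok = vals[i].tag == t;
        int scoreok = vals[i].score > s;
        if (tagok & scoreok) return 1;
    }
    return 0;
}

/* Portable emulation of the 4-wide version; each int[4] models one register. */
static int exists_4wide(const widget *vals, int len, int t, int s) {
    int i = 0;
    for (; i < len - 3; i += 4) {
        int blck1[4], blck2[4];
        memcpy(blck1, &vals[i], 2 * sizeof(widget));      /* [ti, si, ti+1, si+1] */
        memcpy(blck2, &vals[i + 2], 2 * sizeof(widget));  /* [ti+2, si+2, ti+3, si+3] */

        /* De-interleave, as ORDER(0, 2, 0, 2) and ORDER(1, 3, 1, 3) do. */
        int tagvs[4]   = { blck1[0], blck1[2], blck2[0], blck2[2] };
        int scorevs[4] = { blck1[1], blck1[3], blck2[1], blck2[3] };

        /* Lane-wise compares yield all-ones (-1) or 0; AND them, then test any lane. */
        int any = 0;
        for (int k = 0; k < 4; ++k) {
            int cmptag = (tagvs[k] == t) ? -1 : 0;
            int cmpscore = (scorevs[k] > s) ? -1 : 0;
            any |= cmptag & cmpscore;
        }
        if (any != 0) return 1;
    }
    /* Scalar tail for the remaining len % 4 elements. */
    for (; i < len; ++i)
        if ((vals[i].tag == t) & (vals[i].score > s)) return 1;
    return 0;
}
```

Comparing `exists_4wide` against `exists` on every prefix length of a small array exercises both the blocked loop and the tail.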

Performance Impact

[Figure: Exists Speedup; Speedup (0.5 to 2.5) vs. Array Size (4 to 2048)]

Overview of Approach

Deductive Rewriting of program source to:
◦ Identify high-level structures of interest
◦ Rewrite to expose latent parallelism (split, unroll, etc.) and straighten hot paths

Relational Verification techniques used to:
◦ Construct the needed synthesis conditions (for code involving loops!)
◦ Produce a proof of semantic equivalence between the input and resulting code

Inductive Synthesis of SIMD program fragments to:
◦ Identify the best SIMD realizations of the synthesis conditions
◦ Produce proofs of correctness with respect to the synthesis conditions

The methodology is more general than just SIMD loops!

From Verification to Synthesis Condition Generation

Relational Verification:
◦ Prove two programs equivalent under equivalence relations on their states, for example:
  ◦ y = x
  ◦ y = x1 + x2 + x3 + x4
  ◦ y = 5
◦ Only a few standard equivalence relations are needed in practice

Prove that the results of two programs are equivalent by showing:
◦ If the programs are executed synchronously, then at synchronization points their states are always equivalent under the relations
◦ For our purposes, the synchronization points are the start and end of the loop body

Relational Verification

int suml = 0;
for (int i = 0; i < len; i += 4) {
  suml = suml + A[i];
  suml = suml + A[i+1];
  suml = suml + A[i+2];
  suml = suml + A[i+3];
}

int sumr = 0;
int as0 = 0, as1 = 0, as2 = 0, as3 = 0;
for (int i = 0; i < len; i += 4) {
  as0 = as0 + A[i];
  as1 = as1 + A[i+1];
  as2 = as2 + A[i+2];
  as3 = as3 + A[i+3];
}
sumr = as0 + as1 + as2 + as3;

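Full Loop Invariant: suml == A[0] + A[1] + … + A[i-1] and as0 == A[0] + A[4] + … + A[i-4] (similarly for as1, as2, as3)
Relational Invariant: suml == as0 + as1 + as2 + as3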

From Verification to Condition Generation

We use the “Product Programs” approach:
◦ “Relational verification using product programs”, FM ’11
◦ Rename variables in the “left” and “right” programs disjointly
◦ Interleave the programs “appropriately”
◦ Generate verification conditions on the combined program

Key Idea:
◦ Replace code in the “right” program with an uninterpreted function
◦ Perform the product program construction and VC generation
◦ The resulting VCs for the uninterpreted function are the needed synthesis pre/postconditions
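The sketch below applies the product-program idea by hand to the sum example from the Relational Verification slide. It is a testing-oriented illustration, not a proof. The variables of the two programs are already disjoint (`suml` versus `as0` to `as3`), and the two loop bodies are interleaved in one loop. The two loop counters always agree, so the product uses a single `i`. The relational invariant `suml == as0 + as1 + as2 + as3` is checked at the start and at the synchronization point after each combined iteration. `product_sum_check` is a hypothetical name, and a verifier would discharge these checks symbolically rather than by running them.

```c
#include <stdbool.h>

/* Product of the two sum programs: disjoint variables, interleaved bodies,
   relational invariant checked at each synchronization point.
   Assumes len is a multiple of 4, as both loops on the slide do. */
static bool product_sum_check(const int *A, int len) {
    int suml = 0;                             /* "left" program state */
    int as0 = 0, as1 = 0, as2 = 0, as3 = 0;   /* "right" program state */
    if (suml != as0 + as1 + as2 + as3) return false;   /* start of loop */
    for (int i = 0; i < len; i += 4) {
        /* left loop body */
        suml = suml + A[i];
        suml = suml + A[i + 1];
        suml = suml + A[i + 2];
        suml = suml + A[i + 3];
        /* right loop body */
        as0 = as0 + A[i];
        as1 = as1 + A[i + 1];
        as2 = as2 + A[i + 2];
        as3 = as3 + A[i + 3];
        /* synchronization point: end of both loop bodies */
        if (suml != as0 + as1 + as2 + as3) return false;
    }
    int sumr = as0 + as1 + as2 + as3;   /* right program's final step */
    return suml == sumr;
}
```

The regrouping relies on integer addition being associative and commutative. The same rewrite is not automatically semantics-preserving for floating-point sums.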

Relational Synthesis Condition

int suml = 0;
for (int i = 0; i < len; i += 4) {
  suml = suml + A[i];
  suml = suml + A[i+1];
  suml = suml + A[i+2];
  suml = suml + A[i+3];
}

int sumr = 0;
m128i ac = [0, 0, 0, 0];
for (int i = 0; i < len; i += 4) {
  /* loop body replaced by the uninterpreted function to be synthesized */
}
sumr = ac.0 + ac.1 + ac.2 + ac.3;

Relational Invariant: suml == ac.0 + ac.1 + ac.2 + ac.3

Resulting Synthesis Condition

Pre-condition:
◦ ac == [v1, v2, v3, v4]

Post-condition:
◦ ac == [v1 + A[i], v2 + A[i+1], v3 + A[i+2], v4 + A[i+3]]
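During the search, a synthesis condition like this one acts as a test oracle for candidate loop bodies. The sketch below assumes a hypothetical four-lane `vec4` type in place of `m128i`. The two candidate bodies are hand-written illustrations, not synthesizer output. `satisfies` checks one concrete instance of the pre/postcondition rather than proving it for all inputs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a 128-bit register holding four 32-bit lanes. */
typedef struct { int lane[4]; } vec4;

typedef vec4 (*body_fn)(vec4 ac, const int *A, int i);

/* Candidate 1: one lane-wise add, the natural SIMD realization. */
static vec4 body_vadd(vec4 ac, const int *A, int i) {
    for (int k = 0; k < 4; ++k) ac.lane[k] += A[i + k];
    return ac;
}

/* Candidate 2: folds everything into lane 0; keeps the total, not the lanes. */
static vec4 body_lane0(vec4 ac, const int *A, int i) {
    ac.lane[0] += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
    return ac;
}

/* One concrete instance of the synthesis condition:
   pre:  ac == [v1, v2, v3, v4]
   post: ac == [v1 + A[i], v2 + A[i+1], v3 + A[i+2], v4 + A[i+3]] */
static bool satisfies(body_fn body, const int v[4], const int *A, int i) {
    vec4 ac = {{v[0], v[1], v[2], v[3]}};
    vec4 out = body(ac, A, i);
    for (int k = 0; k < 4; ++k)
        if (out.lane[k] != v[k] + A[i + k]) return false;
    return true;
}
```

The second candidate keeps `ac.0 + ac.1 + ac.2 + ac.3` correct, yet it fails the lane-wise postcondition.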

Instruction Sequence Search

The search space for SIMD instruction sequences is large:
◦ Length: frequently need 8 or more instructions
◦ Branching: SSE has 200+ instructions

Concrete state space exploration:
◦ Explore program states instead of instruction sequences
◦ Use concrete execution to quickly exclude many candidate instruction sequences

Query an SMT solver for a counterexample input:
◦ Eventually either no counterexamples remain or the search gives up

Search for alternative sequences:
◦ Can generate multiple solutions to find the best performance on varying data sizes
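The interplay of concrete exploration and counterexample queries can be sketched as a counterexample-guided loop. This toy rests on several assumptions:
◦ A hypothetical 8-bit scalar instruction set (`OP_SHL1`, `OP_ADDX`, …) is used instead of SSE.
◦ The specification `spec(x) = 5 * x` (mod 256) is made up for illustration.
◦ An exhaustive check over all 256 inputs stands in for the SMT counterexample query.

It does keep the key idea of exploring concrete states. The breadth-first search runs over the tuple of accumulator values on the current example inputs, so any instruction sequences that agree on every example merge into one node.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy instruction set over an 8-bit accumulator; x is the input. */
enum { OP_SHL1, OP_ADDX, OP_SUBX, OP_XORX, OP_ANDX, OP_NOT, OP_NEG, NUM_OPS };

static uint8_t apply(int op, uint8_t acc, uint8_t x) {
    switch (op) {
    case OP_SHL1: return (uint8_t)(acc << 1);
    case OP_ADDX: return (uint8_t)(acc + x);
    case OP_SUBX: return (uint8_t)(acc - x);
    case OP_XORX: return (uint8_t)(acc ^ x);
    case OP_ANDX: return (uint8_t)(acc & x);
    case OP_NOT:  return (uint8_t)~acc;
    default:      return (uint8_t)(0u - acc);   /* OP_NEG */
    }
}

/* Made-up specification to synthesize. */
static uint8_t spec(uint8_t x) { return (uint8_t)(5u * x); }

#define MAX_EX 8
#define MAX_STATES 4096
#define MAX_LEN 4

/* Search node: accumulator value on each current example, plus back-pointer. */
typedef struct { uint8_t val[MAX_EX]; int parent, op, depth; } node;
static node nodes[MAX_STATES];

/* BFS over concrete states; returns the index of a node matching spec on all
   examples, or -1 if none is found within MAX_LEN instructions. */
static int search(const uint8_t *ex, int n_ex) {
    int count = 1, head = 0;
    for (int e = 0; e < n_ex; ++e) nodes[0].val[e] = ex[e];   /* acc starts as x */
    nodes[0].parent = -1; nodes[0].op = -1; nodes[0].depth = 0;
    for (; head < count; ++head) {
        node cur = nodes[head];
        int goal = 1;
        for (int e = 0; e < n_ex; ++e) goal &= cur.val[e] == spec(ex[e]);
        if (goal) return head;
        if (cur.depth == MAX_LEN) continue;
        for (int op = 0; op < NUM_OPS && count < MAX_STATES; ++op) {
            node nx = { .parent = head, .op = op, .depth = cur.depth + 1 };
            for (int e = 0; e < n_ex; ++e) nx.val[e] = apply(op, cur.val[e], ex[e]);
            int seen = 0;   /* merge sequences that agree on every example */
            for (int k = 0; k < count && !seen; ++k)
                seen = memcmp(nodes[k].val, nx.val, (size_t)n_ex) == 0;
            if (!seen) nodes[count++] = nx;
        }
    }
    return -1;
}

static uint8_t run(const int *prog, int len, uint8_t x) {
    uint8_t acc = x;
    for (int k = 0; k < len; ++k) acc = apply(prog[k], acc, x);
    return acc;
}

/* Counterexample-guided loop; returns the program length, or -1 (give up). */
static int synthesize(int *prog) {
    uint8_t ex[MAX_EX] = {0};
    int n_ex = 1;                                  /* start from the example x = 0 */
    for (;;) {
        int goal = search(ex, n_ex);
        if (goal < 0) return -1;
        int len = nodes[goal].depth;
        for (int k = goal; nodes[k].parent >= 0; k = nodes[k].parent)
            prog[nodes[k].depth - 1] = nodes[k].op;
        int cex = -1;                              /* stand-in for the SMT query */
        for (int x = 0; x < 256 && cex < 0; ++x)
            if (run(prog, len, (uint8_t)x) != spec((uint8_t)x)) cex = x;
        if (cex < 0) return len;                   /* no counterexample: done */
        if (n_ex == MAX_EX) return -1;
        ex[n_ex++] = (uint8_t)cex;                 /* refine the example set */
    }
}
```

The run proceeds in two rounds:
◦ Starting from x = 0, the empty program already matches the example, and the full check returns the counterexample x = 1.
◦ The second search then finds shl1, shl1, addx (4x + x), which passes the full check.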

Optimize Search

The cost model provides an upper bound on search depth:
◦ Also used to pick the best operation to explore next and to pick the shortest path from the input to the output state

Incrementally expand the available instruction set:
◦ Start with standard operations (and those seen in the input code)
◦ Add more specialized operations if desired

Generate multiple initial input-output pairs:
◦ One per path in the original loop body

Stack machine construction to reduce the branching factor
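The Exists loop body illustrates generating one initial input-output pair per path through the original loop body. That body has two paths: return 1, or continue. The sketch below finds one concrete input per path by brute-force enumeration over a small, arbitrarily chosen range. The enumeration stands in for however the tool actually generates these pairs. `io_pair`, `body_matches`, and `seed_pairs` are hypothetical names.

```c
#include <stdbool.h>

/* One concrete input for the Exists loop body plus the path it takes. */
typedef struct { int tag, score, t, s; bool match; } io_pair;

/* The Exists loop body as a predicate: true means the "return 1" path. */
static bool body_matches(int tag, int score, int t, int s) {
    int tagok = tag == t;
    int scoreok = score > s;
    return (tagok & scoreok) != 0;
}

/* Enumerates values in [0, 3) and keeps the first input reaching each path.
   out[0] gets the "no match" path, out[1] the "match" path.
   Returns the number of paths covered. */
static int seed_pairs(io_pair out[2]) {
    bool have[2] = {false, false};
    int n = 0;
    for (int tag = 0; tag < 3; ++tag)
        for (int score = 0; score < 3; ++score)
            for (int t = 0; t < 3; ++t)
                for (int s = 0; s < 3; ++s) {
                    bool m = body_matches(tag, score, t, s);
                    if (!have[m]) {
                        have[m] = true;
                        out[m] = (io_pair){tag, score, t, s, m};
                        ++n;
                    }
                }
    return n;
}
```

Seeding with one example per path means the first concrete search already distinguishes candidates that only get one branch right.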

Cost Model

We do not want to compute absolute costs:
◦ A very hard problem

Compute relative costs:
◦ Both programs run on the same data, so they see the same cache misses and the same branches taken/not taken
◦ Build a simple machine model to encapsulate instruction costs

The cost function is a polynomial in terms of loop counts and branch rates:
◦ Use conservative static estimates for synthesis
◦ Can use runtime data for selection in a JIT setting
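The following toy sketch shows how a cost polynomial in the loop count `n` and a branch rate `b` can rank two implementations of the same loop. It uses static estimates of `n` and `b`, or measured values in a JIT. All constants, the `cpu_model` fields, and the exact form of each term are assumptions for illustration, not the authors' model.

```c
#include <stdbool.h>

/* Hypothetical per-iteration costs in arbitrary units; a real model would
   derive them from a description of the target CPU. */
typedef struct {
    double scalar_iter;   /* one scalar loop iteration */
    double vector_iter;   /* one 4-wide SIMD loop iteration */
    double vector_setup;  /* one-time cost, e.g. broadcasting loop constants */
    double mispredict;    /* penalty per mispredicted branch */
} cpu_model;

/* Scalar loop over n elements with branch misprediction rate b. */
static double cost_scalar(const cpu_model *m, int n, double b) {
    return n * (m->scalar_iter + b * m->mispredict);
}

/* SIMD loop over n / 4 blocks plus a scalar tail of n % 4 elements. */
static double cost_simd(const cpu_model *m, int n, double b) {
    int blocks = n / 4, tail = n % 4;
    return m->vector_setup
         + blocks * (m->vector_iter + b * m->mispredict)
         + tail * (m->scalar_iter + b * m->mispredict);
}

/* Only the relative ranking matters, not the absolute numbers. */
static bool prefer_simd(const cpu_model *m, int n, double b) {
    return cost_simd(m, n, b) < cost_scalar(m, n, b);
}
```

With the constants {3, 6, 8, 15} and no mispredictions, the scalar loop wins below n = 8 and the SIMD loop wins from n = 8 upward. A nonzero branch rate favors the SIMD version, which executes one branch per four elements in its blocked part.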

Complete Algorithm

[Figure: Complete Algorithm diagram. Programs: Input Program (for(i ∈ I by c){…}), Restructured Program and Final SIMD Program (both for(i ∈ I by 4c){…}). Steps: Restructure Loop, Optimistic Vectorize, Synth. Cond. Generation, Synthesize, Merge & Cleanup. Other labels: Body, Synthesis Cond., Correctness Proof, Simulation Relation (Eq), Cost Score, Cost Ranking Function, CPU Model.]

SIMD Standard Library

Synthesize SIMD implementations of C++ STL and .NET BCL code
Consistent performance improvements:
◦ Between 2x and 4x on large inputs
◦ Avoids performance degradation on small inputs

The cost model accurately predicts performance:
◦ Can pick the best implementation based on hardware and input data

Library Function Performance

[Figure: Cyclic Hash, Actual vs. Predicted; y-axis 0.5 to 4, x-axis 4 to 2048]

[Figure: CountIf, Actual vs. Predicted; y-axis 0.5 to 4, x-axis 4 to 2048]

String Processing

Synthesize standard string functions using PCMPESTRI:
◦ Packed Compare Explicit Length Strings, Return Index

Encoded its semantics and provided them to the synthesizer:
◦ Synthesized a range of common string functions with no other changes
◦ Speedup of 3.4x for String.Equals
◦ Speedup of up to 9.5x for String.IndexOfAny
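PCMPESTRI compares two 16-byte operands whose lengths are passed explicitly and returns an index. In C it is exposed as `_mm_cmpestri` from `<nmmintrin.h>`. The sketch below is a scalar model of one of its modes: unsigned bytes, "equal any" aggregation, and least-significant index, which corresponds to `_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT`. It shows how an IndexOfAny-style search can be built from that mode. The model is simplified from the instruction's documented behavior. It assumes non-negative lengths and ignores the flags the instruction also sets. It is not the semantics encoding the authors gave their synthesizer. `index_of_any` is a hypothetical helper and assumes a search set of at most 16 bytes.

```c
#include <stddef.h>

/* Scalar model of PCMPESTRI, equal-any mode on unsigned bytes, least
   significant index. `set` has la valid bytes and `chunk` has lb valid bytes
   (both clamped to 16); invalid bytes never match. Returns 16 if no valid
   chunk byte equals any valid set byte. */
static int pcmpestri_equal_any(const unsigned char *set, int la,
                               const unsigned char *chunk, int lb) {
    if (la > 16) la = 16;
    if (lb > 16) lb = 16;
    for (int j = 0; j < lb; ++j)
        for (int k = 0; k < la; ++k)
            if (chunk[j] == set[k]) return j;
    return 16;
}

/* IndexOfAny over s[0..n), 16 bytes per step; returns the first index whose
   byte is in set[0..m), or -1. */
static long index_of_any(const unsigned char *s, size_t n,
                         const unsigned char *set, int m) {
    for (size_t i = 0; i < n; i += 16) {
        size_t rest = n - i;
        int lb = rest < 16 ? (int)rest : 16;
        int idx = pcmpestri_equal_any(set, m, s + i, lb);
        if (idx < 16) return (long)(i + (size_t)idx);
    }
    return -1;
}
```

A real SSE4.2 version loads a full 16 bytes per step, so the final partial chunk needs care to avoid reading past the end of the buffer. The model sidesteps this by reading only the valid bytes.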

Impact in Practice: 483.Xalan (SPEC CPU)

XML processing framework written in C++
Replaced STL calls with our SIMD implementations
Performance is sensitive to input data:
◦ Previous work replacing these calls with set structures ranged from +15% to -20% on different data

Synthesized SIMD code produces a consistent 2%-5% speedup:
◦ Indicates a 1.15x to 1.5x speedup in the STL code, which is in line with cost model predictions
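The step from a 2%-5% whole-program gain to a 1.15x-1.5x gain in the STL code depends on how much of the runtime those calls take, which the slide does not state. The calculation below uses Amdahl's law. It assumes that all of the gain comes from the replaced calls and that the low and high ends of the two ranges correspond. `implied_fraction` is a hypothetical helper.

```c
/* Amdahl's law: if a fraction f of runtime is sped up by a factor s, the
   overall speedup is S = 1 / ((1 - f) + f / s). Solving for f gives
   f = (1 - 1/S) / (1 - 1/s). */
static double implied_fraction(double overall, double local) {
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / local);
}
```

The two ends give about 0.150 for (1.02, 1.15) and 0.143 for (1.05, 1.5). The reported numbers are therefore mutually consistent if the replaced calls account for roughly 14%-15% of runtime. This fraction is derived here, not reported on the slide.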

Benefits of Approach

Proof of correctness relating the original loop and the SIMD version
Separation of correctness and optimization:
◦ Transform toward a performant code structure
◦ If a transformation is incorrect, the proof (or synthesis) will fail later

The approach consistently produces fast SIMD code:
◦ Robust to details of the SIMD instruction set and loop patterns
◦ 2x-4x speedups obtained from synthesized SIMD code

Future Work

Pointers and object structures:
◦ Scatter-gather support will help
◦ Compact object graphs into arrays (current work)
◦ Can we do local data structure transformations?

Apply the technique to larger structures and more generally:
◦ What about loops with small inner loops (HashTable lookup)?
◦ Can we use synthesis as part of general code generation?

Big Picture Conclusions

Big challenges and big benefits in using specialized hardware:
◦ Both performance and power!

Synthesis complements compilation:
◦ Small-step vs. big-step code generation
◦ Verification structures synthesis (and eliminates compilation bugs)
◦ Can we apply these ideas to other compiler actions? Target other hardware?

The idea is more general than just compilers or SIMD synthesis:
◦ Expert-provided deductive structure
◦ Inductive synthesis driven by the underlying semantics
◦ A powerful combination for approaching problems

Questions