automatic generation of peephole superoptimizers

33
LOGO Automatic Generation of Peephole Superoptimizers Speaker:Shuai-wei Huang Advisor:Wuu Yang Sorav Bansal and Alex Aiken Computer System Lab Stanford University ASPLOS’06

Upload: keanumit

Post on 02-Jul-2015

494 views

Category:

Technology


3 download

TRANSCRIPT

Page 1: Automatic Generation of Peephole Superoptimizers

LOGO

Automatic Generation of Peephole Superoptimizers

Speaker:Shuai-wei HuangAdvisor:Wuu Yang

Sorav Bansal and Alex AikenComputer System Lab

Stanford UniversityASPLOS’06

Page 2: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo2

Contents

2. Design of the Optimizer

1. Introduction

3.Experimental Results

4. Conclusion

Page 3: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo3

Introduction – Peephole Optimizer

Peephole optimizers are pattern matching systems that replace one sequence of instructions by another equivalent, but faster, sequence of instructions.

The optimizations are usually expressed as parameterized replacement rules, so that, for example, mov r1, r2; mov r2, r1 => mov r1, r2

Peephole optimizers are typically constructed using human-written pattern matching rules. Requires Time Less Systematic

Page 4: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo4

Introduction – Superoptimization

Automatically discover replacement rules that are optimizations.

Optimizations are computed off-line and then presented as an indexed structure for efficient lookup.

The optimizations are then organized into a lookup table mapping original sequences to their optimized

counterparts.

Page 5: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo5

Introduction – Superoptimization

The goals in this paper are considerably more modest, focusing on showing that an automatically constructed peephole optimizer is possible and, even with limited resources (i.e., a single machine) and learning hundreds to thousands of useful optimizations, such an optimizer can find significant speedups that standard optimizers miss.

Page 6: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo6

Introduction – Terminology problem

The classical meaning of superoptimization is to find the optimal code sequence for a single, loop-free assembly sequence of instructions, which we call the target sequence.

The terminology problem:In order to distinguish superoptimization from garden variety optimization as that term is normally used.

Page 7: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo7

Introduction – Related work

Massalin Simply enumerates sequences of instructions of

increasing length, testing each for equality with the target sequence; the lowest cost equivalent sequence found is the optimal one.

Denali Constrains the search space to a set of equality-

preserving transformations expressed by the system designer.

For a given target sequence, a structure representing all possible equivalent sequences under the transformation rules is searched for the lowest cost equivalent sequence.

Page 8: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo8

Design of the Optimizer - Term

Instruction : an opcode together with some valid operands.A potential problem arises with opcodes that take immediate operands, as they generate a huge number of instructions. Restrict immediate operands to a small set of constants

and symbolic constants.

Cost function : different cost functions for different purposes Running time to optimize speed Instruction byte count to optimize the size of a binary

Page 9: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo9

Design of the Optimizer - Term

Equivalence of two instruction sequences : Registers live Stack locations Memory locations

(For implementation simplicity, we currently conservatively assume memory and stack locations are always live.)

Equivalence test ≡L tests two instruction sequences for equivalence under the context (set of live registers) L.

For a target sequence T and a cost function c, we are interested in finding a minimum cost instruction sequence O such that

O ≡L T

Page 10: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo10

Design of the Optimizer - Flowchart

Page 11: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo11

Design of the Optimizer – Structure

Harvester Extracts target instruction sequences from the training

applications. The target instruction sequences are the ones we seek

to optimize.

Enumerator Exhaustively enumerates all possible candidate

instruction sequences up to a certain length. Checking if each candidate sequence is an optimal

replacement for any of the target instruction sequences.

Optimization database An index of all discovered optimizations

Page 12: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo12

Design of the Optimizer – Harvester

Harvesting Target Instruction Sequences

1. Obtain target instruction sequences from a representative set of applications.

2. These harvested instruction sequences form the corpus used to train the optimizer.

3. A harvestable instruction sequence I must have a single entry point.

4. Records the set of register live.

Page 13: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo13

Design of the Optimizer – Canonicalization(1/3)

On a machine with 8 registers, an instruction

mov r1, r0has 8*7 = 56 equivalent versions with different register names.

Canonicalization : eliminate all unnecessary instruction sequences that are mere renamings of others.

Page 14: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo14

Design of the Optimizer – Canonicalization(2/3)

An instruction sequence is canonical if its registers and constants are named in the order of their appearance in the instruction sequence. The first register used is always r0, the second

distinct register used is always r1, and so on.

Similarly, the first constant used in a canonical instruction sequence is c0, the second distinct constant c1, and so on.

Page 15: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo15

Design of the Optimizer – Canonicalization(3/3)

Page 16: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo16

Design of the Optimizer – Fingerprinting(1/3)

We execute I on test machine states and then compute a hash of the result, which we call I's fingerprint.

Fingerprint Hash table. Each bucket holds the target instruction

sequence.

Page 17: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo17

Design of the Optimizer – Fingerprinting(2/3)

Testvectors Each bit in the two testvectors is set randomly,

but the same testvectors are used for fingerprinting every instruction sequence.

The machine is loaded with a testvector and control is transferred to the instruction sequence.

Page 18: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo18

Design of the Optimizer – Fingerprinting(3/3)

Minimal collisions It should be asymmetric with respect to

different memory locations and registers.

It should not be based on a single operator (like xor)

r distinct registers and c distinct constants can generate at most r!*c! fingerprints. Typically r 5 and c 2, so the blow-up is ≦ ≦upper-bounded by 240.

Page 19: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo19

Design of the Optimizer – Enumerate(1/6)

Instruction sequences are enumerated from a subset of all instructions.

To bound the search space, we restrict the maximum number of distinct registers and constants that can appear in an enumerable instruction sequence. Constants :0,1,C0,C1 Register number : 4

Page 20: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo20

Design of the Optimizer – Enumerate(2/6)

Page 21: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo21

Design of the Optimizer – Enumerate(3/6)

Page 22: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo22

Design of the Optimizer – Enumerate(4/6)

Enumerator’s search space is exponential in the length of the instruction sequence.

Two techniques to reduce the size Enumerate only canonical instruction

sequences. Prune the search space by identifying and

eliminating instructions that are functionally equivalent to other cheaper instructions.

Page 23: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo23

Design of the Optimizer – Enumerate(5/6)

In order to avoid fingerprinting{ mov r0, r1; mov r0, r1 }

weed out such special cases automatically through fingerprinting and equivalence checks.

Page 24: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo24

Design of the Optimizer – Enumerate(6/6)

Page 25: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo25

Design of the Optimizer – Equivalence test

The equivalence test proceeds in two steps. a fast but incomplete execution test a slower but exact boolean test.

Execution test We run the two sequences over a set of testvectors and

observe if they yield the same output on each test.

Boolean test The boolean verfication test represents an instruction

sequence by a boolean formula and expresses the equivalence relation as a satisability constraint.

The satisability constraint is tested using a SAT solver.

Page 26: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo26

Design of the Optimizer – Boolean test

All memory writes are stored in a table in order of their occurrence. (addr1 = addr2) => (data1 = data2)

Each read-access R is checked for address-equivalence with each of the preceding write accesses Wi in decreasing order of i, where Wi is the i'th write access by the instruction sequence.

Page 27: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo27

Design of the Optimizer – Optimization database

The optimization database records all optimizations discovered by the superoptimizer.

The database is indexed by the original instruction sequence (in its

canonical form) The set of live registers Corresponding optimal sequence if one

exists.

Page 28: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo28

Experimental Results

GCC 3.2.3 -02zChaff SAT solverIntel Pentium 3.0GHz processor100 Gigabytes local storage.Linux machinepeephole size : 3 instruction sequencesOptimized windows : 6 instruction

sequences

Page 29: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo29

Experimental Results

Improvements of between 1:7 and 10 times over already-optimized code

Page 30: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo30

Experimental Results

Five optimizations were used more than 1,000 times each; in total over 600 distinct optimizations were used at least once each on these benchmarks.

Page 31: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo31

Experimental Results

Codesize cost function Simply considers the size of the executable binary code

of a sequence as its cost. Runtime cost function

Number of memory accesses Branch instructions Approximate cycle costs

Page 32: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo32

Conclusion

The target sequences are extracted, or harvested, from a training set of programs. The idea is that the important sequences to

optimize are the ones emitted by compilers.

Our prototype implementation handles nearly all of the 300+ opcodes of the x86 architecture.

Introduce a new technique, canonicalization we need never consider a sequence that is equal up

to consistent renaming of registers and symbolic constants.

Page 33: Automatic Generation of Peephole Superoptimizers

www.themegallery.com

Company Logo33

Thank Thank You !You !