Source: theory.stanford.edu/~sbansal/talks/msr-talk.pdf

Automatic Generation

of Peephole Superoptimizers

Sorav Bansal and Alex Aiken

Stanford University

{sbansal, aiken}@cs.stanford.edu

Produced with LaTeX seminar style & PSTricks

SUPEROPTIMIZATION

Classical Meaning: To find the optimal code sequence for a single, loop-free assembly sequence of instructions

[Diagram: Target Sequence and Architecture Specification → Exhaustive Search → Optimal Sequence]

Common Point-of-View

➜ Superoptimization is Sloowww..

➜ Superoptimization is useful only for occasional optimization of the critical inner loop

a.out:

    movl %eax, %ecx
    in %al; shll %eax; leal (%ecx), %eax
    push (%ebx); movl %ecx, (%ebx)
    shll (%ebx); addl $9, (%ebx)
    call func; mov %eax, %ecx
    shll %al; out %al; push (%ebx); mov %ecx, (%ebx)
    push %eax; mov %eax, %ecx

Exhaustive Search → Optimization Table:

    #   target                 optimal
    1.  movl %ecx, (%ebx)      leal 9(%ecx), %ebx
        shll (%ebx)            movl %ebx, (%ebx)
        addl $9, (%ebx)
    2.  shll %eax              movl %ecx, %eax
        leal (%ecx), %eax
    3.  movl %ecx, %eax        shll %ecx
        shll %ecx              movl %ecx, %eax
        movl %ecx, %eax

Optimization Database → b.out:

    mov %eax, %ecx
    in %al; shll %eax; leal (%ecx), %eax
    push (%ebx); movl %ecx, %eax
    shll %ecx; movl %ecx, %eax
    call func; mov %eax, %ecx
    shll %al; out %al; push (%ebx); mov %ecx, (%ebx)
    push %eax; mov %eax, %ecx

• To what length of instruction sequences can we scale?
• Can the database be made large enough to capture most “traditional” optimizations?
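As a rough illustration of how a table of this kind could drive a peephole pass, the sketch below slides a window over an instruction list and replaces any window it finds in a lookup table. The table contents, the `peephole` function, and the `max_window` limit are assumptions made for this example; the talk does not show its rewriting loop.

```python
# Hypothetical optimization table: target sequence -> replacement sequence.
# The single entry is the redundant-move rule shown on the canonicalizer slides.
TABLE = {
    ("movl %esp, %ebp", "movl %ebp, %esp"): ("movl %esp, %ebp",),
}

def peephole(instrs, table, max_window=3):
    """Rewrite instrs left to right, preferring the longest matching window."""
    out = []
    i = 0
    while i < len(instrs):
        for size in range(min(max_window, len(instrs) - i), 0, -1):
            window = tuple(instrs[i:i + size])
            if window in table:
                out.extend(table[window])
                i += size
                break
        else:
            # No rule matched at this position: copy the instruction through.
            out.append(instrs[i])
            i += 1
    return out

program = ["push %eax", "movl %esp, %ebp", "movl %ebp, %esp", "call func"]
optimized = peephole(program, TABLE)
```

Because the lookup is a plain dictionary access per window, the cost of the expensive search is paid once when the table is built, not each time a program is optimized.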

Traversing a Linked List

struct node {
    int val;
    struct node *next;
};

void traverse(struct node *head) {
    while (head) {
        head->val *= 2;
        head = head->next;
    }
}

Naive assembly code generated by gcc:

    movl head, %edx
    movl head, %eax
    movl (%eax), %eax
    sall %eax
    movl %eax, (%edx)
    movl head, %eax
    movl 4(%eax), %eax
    movl %eax, head
    cmpl $0, head

Step One: Automatic Removal of Redundant Load

    movl head, %edx
    movl %edx, %eax
    movl (%eax), %eax
    sall %eax
    movl %eax, (%edx)
    movl head, %eax
    movl 4(%eax), %eax
    movl %eax, head
    cmpl $0, head

Step Two: Automatic Elimination of Memory Access

    movl head, %edx
    movl %edx, %eax
    movl (%eax), %eax
    sall %eax
    movl %eax, (%edx)
    movl head, %eax
    movl 4(%eax), %eax
    movl %eax, head
    cmpl $0, %eax

Step Three: Automatic Instruction Selection

    movl head, %edx
    sall (%edx)
    movl head, %eax
    movl 4(%eax), %eax
    movl %eax, head
    cmpl $0, %eax

Step Four: Automatic Removal of Redundant Load

    movl head, %edx
    sall (%edx)
    movl %edx, %eax
    movl 4(%eax), %eax
    movl %eax, head
    cmpl $0, %eax

Step Five: Automatic Copy Propagation

    movl head, %edx
    sall (%edx)
    movl 4(%edx), %eax
    movl %eax, head
    cmpl $0, %eax

Step Six: Automatic Register Usage Optimization

    movl head, %edx
    sall (%edx)
    movl 4(%edx), %edx
    movl %edx, head
    cmpl $0, %edx

Our optimizer:

    movl head, %edx
    sall (%edx)
    movl 4(%edx), %edx
    movl %edx, head
    cmpl $0, %edx

gcc-optimized:

    sall (%edx)
    movl 4(%edx), %edx
    cmpl $0, %edx

Global optimizations involving loop-carried dependencies cannot be handled by peephole optimizations

[Figure: the six listings from Step One to Step Six]

An automatically generated “peephole” optimizer can capture many traditional optimizations
➜ Removal of Redundant Loads
➜ Instruction Selection
➜ Copy Propagation
➜ Reducing Register Usage


SYSTEM ARCHITECTURE

[Diagram: Training Programs → Harvester → Canonicalizer → Fingerprinter → Fingerprint Hashtable. A second Fingerprinter stage is checked against the hashtable ("match?"), followed by the Boolean Equivalence Test.]

HARVESTER

Harvests only loop-free instruction sequences that have a single entry point
- no middle instruction is a jump target

    0: movl %esp, %ebp
    1: movl 8(%ebp), %edx
    2: sall %edx
    3: movl %eax, (%edx)
    :
    : . . .
    :
    i: jne instruction 1
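The harvesting rule can be sketched as a scan over candidate windows. This is a minimal illustration, not the authors' implementation: the `(text, target)` instruction encoding, the `harvest` function, and the `max_len` bound are hypothetical, and the sketch additionally assumes that a jump may only appear as the last instruction of a window so that every window stays straight-line.

```python
def harvest(instrs, max_len=3):
    """instrs: list of (text, target) pairs; target is the index jumped to, or None.

    Returns every window of 1..max_len consecutive instructions in which no
    instruction after the first is a jump target and a jump, if present,
    is the last instruction.
    """
    targets = {t for _, t in instrs if t is not None}
    found = []
    for start in range(len(instrs)):
        for end in range(start + 1, min(start + max_len, len(instrs)) + 1):
            last = end - 1
            if last > start and last in targets:
                break  # extending further would add a second entry point
            found.append(tuple(text for text, _ in instrs[start:end]))
            if instrs[last][1] is not None:
                break  # do not extend a window past a jump
    return found

code = [
    ("movl %esp, %ebp", None),     # 0
    ("movl 8(%ebp), %edx", None),  # 1 (target of the jump below)
    ("sall %edx", None),           # 2
    ("jne 1", 1),                  # 3
]
seqs = harvest(code)
```

In this example no harvested window starting at instruction 0 extends to instruction 1, because instruction 1 is a jump target.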


CANONICALIZER

    movl %esp, %ebp; movl %ebp, %esp   →   movl %esp, %ebp
    movl %eax, %ebp; movl %ebp, %eax   →   movl %eax, %ebp
    movl %esp, %esi; movl %esi, %esp   →   movl %esp, %esi

There are 56 variants of this rule

Reduce it to one rule:

    movl reg1, reg2; movl reg2, reg1   →   movl reg1, reg2

    addl $1234, %ebp; addl $5678, %ebp   →   addl $6912, %ebp

Use symbolic constants to deal with immediates:

    addl cons0, %ebp; addl cons1, %ebp   →   addl cons0+cons1, %ebp

Rename the target sequence to get it in its canonical form:

Registers
➜ First register used is renamed to r0
➜ Second distinct register used is renamed to r1
➜ and so on...

Constants
➜ First constant used is renamed to c0 (a pre-defined constant)
➜ Second distinct constant is renamed to c1

Memory Addresses
➜ Rename a memory access in s-i-b form to a dereference of the canonical name of its base (and index) register
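The renaming rules above can be sketched with two regular-expression passes over AT&T-syntax instruction strings. This is only an illustration under stated assumptions: the `canonicalize` function and its return shape are hypothetical, only the eight 32-bit general-purpose register names are recognised, and displacements inside memory operands are left untouched.

```python
import re

REG = re.compile(r"%(e[a-d]x|e[sd]i|e[sb]p)")  # 32-bit general-purpose registers
IMM = re.compile(r"\$(-?\d+)")                 # immediates such as $47

def canonicalize(instrs):
    """Rename registers to r0, r1, ... and immediates to c0, c1, ...
    in order of first use. Returns (canonical instrs, reg map, const map)."""
    regs, consts = {}, {}

    def reg_name(m):
        # setdefault keeps an existing name and assigns the next one otherwise.
        return regs.setdefault(m.group(1), "r%d" % len(regs))

    def const_name(m):
        return consts.setdefault(int(m.group(1)), "c%d" % len(consts))

    out = []
    for ins in instrs:
        ins = REG.sub(reg_name, ins)   # also renames base/index registers
        ins = IMM.sub(const_name, ins)
        out.append(ins)
    return out, regs, consts
```

Two sequences that differ only in their choice of registers, such as the `%esp`/`%ebp` and `%eax`/`%esi` variants of the redundant-move pattern, map to the same canonical text, so a single table entry can serve all of them.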

Only instruction sequences in canonical form are enumerated

    movl r3, r4; movl r5, r3    (not in canonical form)
    movl r0, r1; movl r2, r0    (canonical form of the same sequence)

Special Constants are enumerated for immediate operands
➜ c0 ; c1
➜ c0 + c1 ; c0 - c1

LHS of an optimization rule is always in canonical form
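One way an enumerator could skip non-canonical candidates is a check like the one below. It is a sketch under the assumption that canonical registers are written `r0`, `r1`, ... inside instruction strings; the `is_canonical` helper is hypothetical and covers registers only, not constants.

```python
import re

def is_canonical(instrs):
    """True if registers first appear in the order r0, r1, r2, ..."""
    next_index = 0
    for ins in instrs:
        for m in re.finditer(r"\br(\d+)\b", ins):
            n = int(m.group(1))
            if n > next_index:
                return False       # a name was skipped: not canonical
            if n == next_index:
                next_index += 1    # first use of the next register name
    return True
```

With this check, `movl r3, r4; movl r5, r3` is rejected while its renamed form `movl r0, r1; movl r2, r0` is accepted.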

    addl $47, %ebp; addl $53, %ebp
        ↓ Canonicalize
    addl c0, %r0; addl c1, %r0
        ↓ Query Optimization Database
    addl c0+c1, %r0
        ↓ Un-Canonicalize
    addl $100, %ebp
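The query and un-canonicalize steps can be sketched as a dictionary lookup followed by substitution. Everything named here is an assumption for illustration: the one-entry `RULES` table, the `rewrite` and `uncanonicalize` helpers, and the convention that the register and constant bindings arrive as two dictionaries produced by an earlier canonicalization step. Only the constant forms `ci`, `ci+cj`, and `ci-cj` are folded.

```python
import re

# Hypothetical database holding the single canonical rule from the slide.
RULES = {
    ("addl c0, r0", "addl c1, r0"): ("addl c0+c1, r0",),
}

def uncanonicalize(template, reg_of, const_of):
    """Substitute concrete registers and fold constant expressions.

    reg_of maps canonical names (r0, ...) to real register names;
    const_of maps c0, c1, ... to integer values.
    """
    def fold(m):
        value = const_of[m.group(1)]
        if m.group(2):
            rhs = const_of[m.group(3)]
            value = value + rhs if m.group(2) == "+" else value - rhs
        return "$%d" % value

    text = re.sub(r"\b(c\d+)(?:([+-])(c\d+))?", fold, template)
    return re.sub(r"\br\d+\b", lambda m: "%" + reg_of[m.group(0)], text)

def rewrite(canonical_seq, reg_of, const_of):
    """Look up a canonical sequence; return the concrete replacement or None."""
    rhs = RULES.get(tuple(canonical_seq))
    if rhs is None:
        return None
    return [uncanonicalize(t, reg_of, const_of) for t in rhs]

result = rewrite(["addl c0, r0", "addl c1, r0"],
                 reg_of={"r0": "ebp"},
                 const_of={"c0": 47, "c1": 53})
```

Folding `c0+c1` at rewrite time is what lets one symbolic rule cover every pair of immediates.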

FINGERPRINTER

Execute the instruction sequences on random-bit states
Use hash of result to compute a fingerprint

Memory is approximated by a 256 byte array
➜ Two equivalent accesses guaranteed to access the same location in array

Fingerprint computed by direct execution on hardware
➜ Fingerprint of a sequence computed in <3µs
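The fingerprinting idea can be illustrated with a toy model. The talk executes sequences directly on hardware; the sketch below swaps in a small Python interpreter instead, models three 32-bit registers only, and omits the 256-byte memory array. The `(op, src, dst)` tuple encoding, the three supported operations, and the choice of SHA-256 are all assumptions of this example.

```python
import hashlib
import random

MASK = 0xFFFFFFFF
REGS = ("r0", "r1", "r2")

# The same pseudo-random start states are reused for every sequence, so two
# equivalent sequences always receive the same fingerprint.
_rng = random.Random(0)
STATES = [{r: _rng.getrandbits(32) for r in REGS} for _ in range(4)]

def run(seq, state):
    """Interpret a sequence of (op, src, dst) tuples on a copy of state."""
    s = dict(state)
    for op, src, dst in seq:
        val = s[src] if src in s else int(src)  # register or immediate
        if op == "movl":
            s[dst] = val & MASK
        elif op == "addl":
            s[dst] = (s[dst] + val) & MASK
        elif op == "shll":
            s[dst] = (s[dst] << val) & MASK
        else:
            raise ValueError("unsupported op: " + op)
    return s

def fingerprint(seq):
    """Hash the final register values over all start states."""
    h = hashlib.sha256()
    for state in STATES:
        final = run(seq, state)
        for r in REGS:
            h.update(final[r].to_bytes(4, "little"))
    return h.hexdigest()[:16]

double_by_add = [("addl", "r0", "r0")]
double_by_shift = [("shll", "1", "r0")]
triple = [("movl", "r0", "r1"), ("addl", "r0", "r0"), ("addl", "r1", "r0")]
```

Equal fingerprints only suggest equivalence, since different sequences can agree on the sampled states; that is why a match is passed on to a stricter test.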

SYSTEM ARCHITECTURE

Training

Programs

Harvester Canonicalizer Fingerprinter

Fingerprint Hashtable

Fingerprinter match?

SYSTEM ARCHITECTURE 26

Page 65: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford

SYSTEM ARCHITECTURE

Training Programs → Harvester → Canonicalizer → Fingerprinter → Fingerprint Hashtable

Fingerprinter → match? → Boolean Equivalence Test


EQUIVALENCE TEST

Execution Test (fast) → Boolean Test (slower)

Execution Test: Fingerprint on many different states

Boolean Test: Use SAT Solver
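The two-stage test can be sketched as follows. This is an illustration under stated assumptions: sequences are modelled as Python functions on two 8-bit registers, and the SAT-based Boolean test is replaced by an exhaustive check over all inputs, which is only feasible because the registers are tiny. The function names are hypothetical.

```python
import random
from itertools import product

WIDTH = 8                       # tiny registers so exhaustive checking is feasible
MASK = (1 << WIDTH) - 1


def execution_test(f, g, trials=64, seed=0):
    """Fast filter: compare results on a handful of random states."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(WIDTH), rng.getrandbits(WIDTH)
        if f(x, y) != g(x, y):
            return False
    return True


def exhaustive_test(f, g):
    """Stand-in for the SAT-based Boolean test: check every input."""
    return all(f(x, y) == g(x, y)
               for x, y in product(range(MASK + 1), repeat=2))


def equivalent(f, g):
    # The slow test runs only when the fast test fails to tell the two apart.
    return execution_test(f, g) and exhaustive_test(f, g)


def double_by_add(x, y):
    return (x + x) & MASK


def double_by_shift(x, y):
    return (x << 1) & MASK


def identity(x, y):
    return x


def almost_identity(x, y):
    return y if x == 0xAB else x   # differs from identity on one value of x


print(equivalent(double_by_add, double_by_shift))
print(equivalent(identity, almost_identity))
```

The `almost_identity` case shows why the slower test is needed: random states can easily miss the single value of x on which the two functions disagree.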


BOOLEAN TEST

Machine state represented by a set of registers and a model of memory

Memory model

➜ Map from address expressions to data expressions

➜ Aliasing handled by comparing addresses of multiple accesses

Instructions encoded as boolean circuits (input state → output state)

Use a SAT Solver to compute equivalence


CONTEXT SENSITIVITY

Equivalence of two instruction sequences is defined with respect to the set of registers that are live after the sequence
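A small illustration of equivalence relative to a live set. It assumes a hypothetical three-register, 4-bit machine with only `mov` and `add`, and checks every start state exhaustively. The two sequences differ only in whether they clobber r1, so they are equivalent exactly when r1 is not live afterwards.

```python
from itertools import product

MASK = 0xF  # 4-bit registers keep the exhaustive check small


def run(seq, state):
    """Execute (op, src, dst) tuples on a copy of the register state."""
    state = dict(state)
    for op, src, dst in seq:
        if op == "mov":
            state[dst] = state[src]
        elif op == "add":
            state[dst] = (state[dst] + state[src]) & MASK
    return state


def equivalent_under(seq_a, seq_b, live):
    """True if both sequences agree on every live register for all inputs."""
    for r0, r1, r2 in product(range(MASK + 1), repeat=3):
        start = {"r0": r0, "r1": r1, "r2": r2}
        a, b = run(seq_a, start), run(seq_b, start)
        if any(a[r] != b[r] for r in live):
            return False
    return True


with_temp = [("mov", "r0", "r1"), ("add", "r1", "r2")]   # clobbers r1
direct = [("add", "r0", "r2")]

print(equivalent_under(with_temp, direct, live={"r0", "r2"}))
print(equivalent_under(with_temp, direct, live={"r0", "r1", "r2"}))
```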


EXPERIMENTAL RESULTS

Length   Original Search Space   Search Space After Canonicalize and Prune

1        5,606                   654

2        31 million              1.09 million

3        176 billion             2.8 billion

Exhaustively enumerate sequences up to length 3
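The "Original" column is consistent with raising the length-1 count to the sequence length (5,606 squared is about 31 million, cubed about 176 billion). The short calculation below reproduces that and the approximate reduction factors; treating the original space as a plain power is an inference from the numbers, and the pruned sizes are copied from the table as rounded there.

```python
SINGLE_INSTRUCTIONS = 5_606   # length-1 row of the table


def search_space(length):
    """Unpruned number of sequences, assuming every position is independent."""
    return SINGLE_INSTRUCTIONS ** length


# Pruned sizes as reported in the table (already rounded there).
PRUNED = {1: 654, 2: 1.09e6, 3: 2.8e9}

for length in (1, 2, 3):
    original = search_space(length)
    factor = original / PRUNED[length]
    print(f"length {length}: {original:,} sequences, reduced about {factor:.0f}x")
```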


EXPERIMENTAL RESULTS

Integer addition

for (i = 0; i < 64; i++) sum += a[i]

psadbw: sum of absolute differences of 8 consecutive unsigned byte integers

psubb %mm0, %mm0

psadbw &a[i], %mm0

movd %mm0, sum

3x faster
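Why the replacement works: the absolute difference between an unsigned byte and zero is the byte itself, so psadbw against a zeroed register sums 8 bytes at once. The emulation below is a sketch of that semantics in Python, not of the MMX hardware; accumulating across 8-byte chunks in a Python loop is an assumption added to cover all 64 elements.

```python
def psadbw(src, dst):
    """Sum of absolute differences of 8 unsigned bytes (MMX-sized operands)."""
    assert len(src) == len(dst) == 8
    return sum(abs(s - d) for s, d in zip(src, dst))


def sum_bytes(a):
    """Sum a byte array whose length is a multiple of 8, one chunk at a time."""
    total = 0
    for i in range(0, len(a), 8):
        zero = bytes(8)                      # effect of psubb %mm0, %mm0
        total += psadbw(a[i:i + 8], zero)    # |a[k] - 0| == a[k]
    return total


a = bytes(range(64))
print(sum_bytes(a) == sum(a))
```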


EXPERIMENTAL RESULTS

Comparison         c[i] = (a[i] < b[i]) ? c0 : c1        7x

Minimum            c[i] = (a[i] < b[i]) ? a[i] : b[i]    8x

Pixel-difference   sum += abs(a[i] − b[i])               10x

XOR                c[i] = a[i] ⊕ b[i]                    2x

Sprite Copy        c[i] = (a[i] == 0) ? b[i] : a[i]      2x

MMX and conditional-move instructions are still under-used


EXPERIMENTAL RESULTS

We evaluate our optimizer on SPEC benchmarks compiled using gcc

Use two cost functions

➜ Codesize

➜ Runtime

• Number of memory accesses

• Number of jump instructions

• Instruction costs


EXPERIMENTAL RESULTS

Program   Codesize Improvement

gzip      3.95%

mcf       5.86%

crafty    1.71%

bzip2     4.58%

gcc       1.12%

twolf     1.47%

parser    3.06%

Codesize improvement on executables already optimized for size (-Os)

Codesize Improvement: 1 – 6%


EXPERIMENTAL RESULTS

Program   Instruction Count Improvement

gzip      4.16%

mcf       3.73%

crafty    2.19%

bzip2     4.11%

gcc       2.44%

twolf     2.17%

parser    3.84%

Instruction count improvement on optimized executables (-O2)

Instruction Count Improvement: 2 – 4%

Speedup: 1 – 5%


EXPERIMENTAL RESULTS

Number of Distinct Optimization Rules Learnt

Codesize   3000

Runtime    2100

Re-use of Optimization Rules

Frequency of Use   >1000    201-1000   51-200   11-50   1-10

Optimizations      18,679   4,098      2,823    1,737   1,256

Total: 28,593 optimizations


Initial Machine State (r0, r1, r2, r3, r4, memory, flags) → Exhaustive Search over Length-3 sequences → Target Machine State (r0, r1, r2, r3, r4, memory, flags)


Analogy: Plain Text → Exhaustive Search over 56-bit keys → Cipher Text (in place of Exhaustive Search over Length-3 sequences)


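$CT = \mathrm{Encrypt}_{56}(PT)$

If $\mathrm{Key}_{56} = \mathrm{Key}_{28} \circ \mathrm{Key}_{28}$

s.t. $CT = \mathrm{Encrypt}^{1}_{28}(\mathrm{Encrypt}^{2}_{28}(PT))$

Then,

$\mathrm{Decrypt}^{1}_{28}(CT) = \mathrm{Encrypt}^{2}_{28}(PT)$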


MEET-IN-MIDDLE

Plain Text → Encrypt using all possible 28-bit keys → Match ← Decrypt using all possible 28-bit keys ← Cipher Text

$O(2^{28})$


MEET-IN-MIDDLE

Initial (r0, r1, r2, r3, r4, memory, flags) → Enumerate all length n sequences → Match ← Enumerate all length m inverse-sequences ← Target (r0, r1, r2, r3, r4, memory, flags)

Effective Enumerated Length: m + n
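A toy version of the search, assuming a hypothetical one-register 8-bit machine whose four instructions are all perfectly invertible (flags are ignored). Forward states after every length-n prefix are stored in a dictionary; each length-m suffix is then undone from the target and looked up. Real instructions are mostly not invertible, which is what the don't-know bits on the following slides address.

```python
from itertools import product

MASK = 0xFF
OPS = {   # name: (forward, inverse) on an 8-bit register
    "inc": (lambda x: (x + 1) & MASK, lambda x: (x - 1) & MASK),
    "not": (lambda x: x ^ MASK, lambda x: x ^ MASK),
    "neg": (lambda x: -x & MASK, lambda x: -x & MASK),
    "rol": (lambda x: ((x << 1) | (x >> 7)) & MASK,
            lambda x: ((x >> 1) | (x << 7)) & MASK),
}


def run(seq, x):
    for name in seq:
        x = OPS[name][0](x)
    return x


def meet_in_middle(inputs, targets, n, m):
    """Find a length n+m sequence mapping every input to its target."""
    forward = {}
    for prefix in product(OPS, repeat=n):
        states = tuple(run(prefix, x) for x in inputs)
        forward.setdefault(states, prefix)
    for suffix in product(OPS, repeat=m):
        states = tuple(targets)
        for name in reversed(suffix):        # undo the last instruction first
            states = tuple(OPS[name][1](s) for s in states)
        if states in forward:
            return forward[states] + suffix
    return None


inputs = list(range(256))                    # all states, so a match is exact
targets = [(x - 1) & MASK for x in inputs]   # goal: decrement without a dec op
print(meet_in_middle(inputs, targets, n=2, m=1))
```

Only 4^2 prefixes and 4^1 suffixes are enumerated here, yet every length-3 sequence is covered.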


MEET-IN-MIDDLE

What is the inverse of an instruction sequence?

How fast can we match?


INVERSE OF AN INSTRUCTION SEQUENCE

Forward: Initial (r0, r1, r2, r3, r4, memory, flags) → Execute Sequence → Final (r0, r1, r2, r3, r4, memory, flags)

Inverse: Final → Execute Sequence⁻¹ → Tightest Approximation to Initial (r0, r1, r2, r3, r4, memory, flags)

Approximate using don’t-know bits


INSTRUCTION INVERSE

Instruction        Inverse

add r0, r1         sub r0, r1; flags ← d

add r0, r0         shr r0; r0[31] ← d; flags ← d

mov r0, r1         r0 ← d

and r0, $010110    r0 ← r0 & d1d11d; flags ← d

or r0, $010110     r0 ← r0 | 0d0dd0; flags ← d

xor r0, r1         xor r0, r1; flags ← d

xchg r0, r1        xchg r0, r1

shr r0, $3         shl r0, $3; r0[0..2] ← d; flags ← d

ror r0, $3         rol r0, $3

inc r0             dec r0; flags ← d

neg r0             neg r0; flags ← d

not r0             not r0
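A sketch of three rows of the table, where d stands for a don't-know bit. It assumes a register is written as a string over '0', '1' and 'd', most significant bit first, and uses the 6-bit mask from the slide; flags are left out, and the function names are hypothetical.

```python
def inverse_and(final, mask):
    """Bits cleared by the mask are unknown before the and; the rest pass through."""
    return "".join(f if m == "1" else "d" for f, m in zip(final, mask))


def inverse_or(final, mask):
    """Bits set by the mask are unknown before the or; the rest pass through."""
    return "".join("d" if m == "1" else f for f, m in zip(final, mask))


def inverse_shr(final, count):
    """Undo a logical right shift: shift left, the low bits become don't-know."""
    return final[count:] + "d" * count


def matches(pattern, concrete):
    """True if a concrete bit string is consistent with a pattern containing d."""
    return all(p in ("d", c) for p, c in zip(pattern, concrete))


mask = "010110"
print(inverse_and("010010", mask))
print(inverse_or("011110", mask))
print(inverse_shr("000101", 3))
print(matches(inverse_and("010010", mask), "110011"))
```

The last line checks the defining property: an initial value of 110011 and-ed with the mask gives 010010, so it must be consistent with the recovered pattern.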


Some instructions need not be enumerated during inverse enumeration

Initial State (r0, r1, r2, r3, r4, memory, flags) → ... → mov r0, r1 → Final State (r0, r1, r2, r3, r4, memory, flags)

{mov r0, r1} can be the last instruction of the optimal sequence only if r0 = r1 in the final state.

If r0 ≠ r1, {mov r0, r1} need not be enumerated


The final state must be constrained w.r.t. the current instruction being enumerated

Instruction        Constraint

mov r0, r1         r0 = r1

add r0, r1         flags must agree with r0, r1

and r0, $010110    bits 0, 3 and 5 must be 0

or r0, $010110     bits 1, 2 and 4 must be 1

shr r0, $3         Three MSBs must be zero

xor r0, r1         flags must agree with r0, r1

inc r0             flags must agree with r0

dec r0             flags must agree with r0

Intuitively: Goal-Directed Search can be pruned much better
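The register constraints in the table can be sketched as a filter over candidate last instructions. This assumes 8-bit registers, a final state given as a dictionary of integers, and instructions written as tuples; the flag-agreement rows are omitted, and `can_be_last` is a hypothetical name.

```python
WIDTH = 8   # assumed register width for this sketch


def can_be_last(instr, final):
    """Could this instruction have produced the given final register state?"""
    op, *args = instr
    if op == "mov":                       # both registers hold the same value
        a, b = args
        return final[a] == final[b]
    if op == "and":                       # bits outside the mask must be 0
        reg, mask = args
        return (final[reg] & ~mask) == 0
    if op == "or":                        # bits inside the mask must be 1
        reg, mask = args
        return (final[reg] & mask) == mask
    if op == "shr":                       # the top `count` bits must be 0
        reg, count = args
        return (final[reg] >> (WIDTH - count)) == 0
    return True                           # no register constraint modelled


candidates = [
    ("mov", "r0", "r1"),
    ("and", "r0", 0b010110),
    ("or", "r0", 0b010110),
    ("shr", "r0", 3),
]
final = {"r0": 0b010010, "r1": 0b010010}
print([c for c in candidates if can_be_last(c, final)])
```

Candidates rejected here never need their inverse enumerated, which is the pruning the slide refers to.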


Prune at every step

Final State (r0, r1, r2, r3, r4, memory, flags) ← mov r0, r1 ← Final - 1 state (r0, r1, r2, r3, r4, memory, flags) ← add r1, r2 ← ...

Prune after each backward step


MATCH

Initial (r0, r1, r2, r3, r4, memory, flags) → Forward Enumeration machine states → TRIE (0; 00, 01; ...) ← Backward Enumeration machine states ← Target (r0, r1, r2, r3, r4, memory, flags)

• A Trie is constructed on the forward-enumeration machine states

• Each backward-enumeration machine state is searched on the constructed trie
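A minimal sketch of the matching step, assuming machine states are flattened to bit strings: forward states are fully concrete, while backward states may contain d (don't-know) bits. The trie is a nested dictionary, and a d in the query follows both branches. The layout and names are assumptions for illustration.

```python
def build_trie(forward_states):
    """Insert each concrete forward state; leaves remember their sequences."""
    root = {}
    for bits, seq in forward_states.items():
        node = root
        for b in bits:
            node = node.setdefault(b, {})
        node.setdefault("seqs", []).append(seq)
    return root


def search(node, pattern):
    """Return sequences whose forward state matches a pattern with d bits."""
    if not pattern:
        return list(node.get("seqs", []))
    head, rest = pattern[0], pattern[1:]
    branches = "01" if head == "d" else head   # d explores both children
    found = []
    for b in branches:
        if b in node:
            found += search(node[b], rest)
    return found


trie = build_trie({"0101": "seqA", "0111": "seqB", "1100": "seqC"})
print(search(trie, "01d1"))
print(search(trie, "1d01"))
```

A query without d bits walks a single path, while each d can double the number of paths followed, which is why the cost depends on how many bits are unknown.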


COMPLEXITY

Assuming size of instruction set = I (approx. 3000)

For a length m + n instruction sequence:

• Original: $O(I^{m+n})$

• New Best Case: $O(I^n + I^m \cdot \log_2 I^n)$

• New Worst Case: $O(I^{m+n})$

• If the average fraction of don’t-care bits is f%: $O(f \cdot I^{m+n})$.

In practice, much less due to pruning.
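To give a feel for the gap between the original and best-case bounds, the calculation below plugs in I = 3000 with a 2 + 1 split. Constant factors hidden by the O-notation are ignored, so the numbers are orders of magnitude rather than running times; the function names are hypothetical.

```python
import math


def original_cost(I, m, n):
    """Plain enumeration of all length m+n sequences."""
    return I ** (m + n)


def best_case_cost(I, m, n):
    """Forward enumeration plus one logarithmic lookup per backward sequence."""
    return I ** n + I ** m * math.log2(I ** n)


I, n, m = 3000, 2, 1
print(f"original:  {original_cost(I, m, n):.3g}")
print(f"best case: {best_case_cost(I, m, n):.3g}")
```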


PRELIMINARY RESULTS

477/5500 instructions are perfectly invertible (i.e., no don’t-knows)

(invertibility depends on both opcode and operands)


PRELIMINARY RESULTS

• 3 = 2 + 1

– Can enumerate length-3 in a few minutes as opposed to 2 days

• 4 = 2 + 2

– For some sequences, completes in a few minutes

– For some, it takes a day

• 4 = 3 + 1

– Currently implementing. Likely to be much faster than 2+2.

• 5 = 3 + 2

– Perhaps possible


CONCLUSIONS

• We have demonstrated the construction of an optimizer using only exhaustive search

• Many (sometimes huge) performance opportunities are still unexploited by compilers

• Scaling to longer sequence lengths is the key

• http://cs.stanford.edu/~sbansal/superoptimizer
