interprocedural optimization — a much older version of the lecture — copyright 2011, keith d....
TRANSCRIPT
Interprocedural Optimization
— a much older version of the lecture —
Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
Comp 512Spring 2011
1COMP 512, Rice University
COMP 512, Rice University
2
Enough Analysis, What Can We Do?
There are interprocedural optimizations
• Choosing custom procedure linkages
• Interprocedural common subexpression elimination
• Interprocedural code motion
• Interprocedural register allocation
• Memo-function implementation
• Cross-jumping
• Procedure recognition & abstraction
COMP 512, Rice University
3
Linkage Tailoring
Should choose the best linkage for circumstances
• Inline, clone, semi-open, semi-closed, closed
• Estimate execution frequencies & improvements
• Assign styles to call sites The choices interact
COMP 512, Rice University
4
Linkage Tailoring
Linkage styles
Traditional closed linkage
p
q
p
q
Open linkage(inlined )• Eliminate the call overhead
• Create tailored private copy of callee
COMP 512, Rice University
5
Linkage Tailoring
Linkage styles
Procedure Cloning
tsp
q0 q1
r
• Partition call sites by environment
• Create a location where desired facts can be true
COMP 512, Rice University
6
Linkage Tailoring
Linkage styles
Traditional closed linkage
p
q
Semi-open linkage
• Move prolog & epilog across call & tailor them
• For call in loop, parts of linkage are loop invariant
p
q
Think of this as an aggressive closed linkage
COMP 512, Rice University
7
Linkage Tailoring
Should choose the best linkage for circumstances
• Inline, clone, semi-open, semi-closed, closed
• Estimate execution frequencies & improvements
• Assign styles to call sites
Practical approach
• Limit choices (standard, cloned, inlined)
• Clone for better information & to specialize (based on idfa)
• Inline for high-payoff optimizations
• Adopt an aggressive standard linkage Move parameter addressing code out of callee (& out of
loop)
The choices interact
COMP 512, Rice University
8
Linkage Tailoring
A low-level, post-compilation idea
• Compute register summaries for each procedure How many registers are really used?
• Rotate assignment to move them into caller-saves registers Leave callee-saves registers unused, if possible Avoid saving them in the callee
Leaf routine with little demand for registers
• Adjust caller-saves/callee-saves boundary
• Remove caller’s stores for registers unused in callee
Some authors call this interprocedural register allocation
COMP 512, Rice University
9
Interprocedural Register Allocation
Some work has been done
• Chow’s compiler for the MIPS machine, “shrink wrapping” Often slowed down the code
• Wall and Goodwin at DEC SRC did link-time allocation Target machine had scads of registers Essentially, adjusted caller-saves & callee-saves
What about full-blown, allocate the whole thing, allocation?
• Arithmetic of costs is pretty complex
• Requires good profile or frequency information
• Need a fair basis for comparing different uses for ri
• Real issue would be spilling (“spill everywhere” would be a disaster)
COMP 512, Rice University
10
Interprocedural Common Subexpression Elimination
Sounds good, but consider the real opportunities
• Procedures only share parameters, globals, & constants
• No local variables in an ICSE, by definition
• Not a very large set of expressions
Possible schemes
• Create a global data area to hold ICSEs Sidestep register pressure issue
• Ellide unnecessary parameters or pass-through parameters “commoning” in Cocke’s terminology Shrink the pre-call sequence
COMP 512, Rice University
11
Invocation Invariant Expressions
Do iie’s exist? (yes)
Is moving them profitable? (it should be)
Can we engineer this into a compiler? (this is tougher)
8.9% to 0.2%
1.66% on average
Static counts, not dynamic counts
COMP 512, Rice University
12
Interprocedural Code Motion
What could we do? What could we find?
• Find and mark loop nests in the call graph
• Compute interprocedural AVAIL ?
• Move code across procedure boundaries
Two ideas
• Invocation invariant expressions Expression whose value is determined at point of call Hoist them to the prolog, or hoist them across the call
• Loop embedding and extraction Move control-flow for entire nest into one procedure Enable optimization using intra-procedural techniques
} All are difficult
COMP 512, Rice University
13
Interprocedural Code Motion
Moving loops across a call
do i = 1 to 100 do j = 1 to 100 call fee(a,b,c,i,j) end end
fee(x,y,z,i,j) do k = 1 to 100 x(i,j) = x(i,j) + y(i,k) * z(k,j) end
call fee(a,b,c) fee(x,y,z) do i = 1 to 100 do j = 1 to 100 do k = 1 to 100 x(i,j) = x(i,j) + y(i,k) * z(k,j) end end end
Loop Embedding
Think of this as dual of partial inlining (partial outlining?)
COMP 512, Rice University
14
Interprocedural Code Motion
Moving loops across a call
do i = 1 to 100 do j = 1 to 100 call fee(a,b,c,i,j) end end
fee(x,y,z,i,j) do k = 1 to 100 x(i,j) = x(i,j) + y(i,k) * z(k,j) end
do i = 1 to 100 do j = 1 to 100 do k = 1 to 100 call fee(a,b,c,i,j,k) end end end
fee(x,y,z,i,j,k) x(i,j) = x(i,j) + y(i,k) * z(k,j)
Loop Extraction
Think of this as partial inlining
COMP 512, Rice University
15
Memo-function implementation
Idea
• Find pure functions & turn them into hashed lookups
Implementation
• Use interprocedural analysis to identify pure functions
• Insert stub with lookup between call & evaluation
Benefits
• Replace evaluations with table lookup
• Potential for substantial run-time savings
• Should share table implementation with other functions
COMP 512, Rice University
16
Cross-jumping
Idea
• Procedure epilogs come in two flavors Returned value & no returned value
• Eliminate duplicates & save space
Implementation
• At start of each block, compare ops before predecessor branch
• If identical, move it across the branch
• Repeat until code stops changing
Presents new challenges to the debugger
COMP 512, Rice University
17
Procedure Abstraction
Idea
• Reocgnize common instruction sequences
• Replace them with (very) cheap calls
Experience
• Need to abstract register names & local labels
• Use suffix trees (bio-informatics)
• Produces smaller code with longer paths ~ 1% slower for each 2% smaller
• Wreaks havoc with the debugger
Cooper & McIntosh, PLDI 99, May 1999
COMP 512, Rice University
18
Putting It All Together
Key questions that a compiler writer must answer?
• What should the optimizer & back end do?
• How should the compiler represent the program?
Broad answers
• Figure out what the real problems are
• Learn everything you can about the target machine
• Attack each problem with a specific plan
• Separate concerns as long as possible
COMP 512, Rice University
19
Lessons
Experience is a tough teacher
• Understand what the code looks like On input to the compiler and on output from the compiler Hard to see problems in large procedures
• A good compiler need not do everything Study the code & figure out what’s needed Do those things well & do them thoroughly
• Hand simulation before implementation pays off Avoid implementing things that solve no problem
• Find the right level of abstraction for each optimization Constant propagation can be done at source level Redundancy elimination should be done at a low-level
COMP 512, Rice University
20
Lessons
Experience is a tough teacher
• Intermediate representations are important Determines level of exposed detail Every pass traverses it, so efficient manipulation is critical
• Code shape and name spaces are important Determine opportunities & size of data structures Has large impact on effectiveness of algorithms
• Compile-time space counts Limits size of procedures that can be compiled Compiler touches all that space More complex analyses take lots of space
COMP 512, Rice University
22
Lessons
Experience is a tough teacher
• Separation of concerns is vital to progress Resource constraints from rearrangement Allocation from optimization Parsing from code generation
• Separation of concerns interferes with optimization Allocation & scheduling Evaluation order & redundancy elimination Code shape decisions affect later passes { Loop shape, case stmts.
Annotations (ILOC tags)