Intraprocedural Optimizations
Jonathan Bachrach
MIT AI Lab
Outline
• Goal: eliminate abstraction overhead using static analysis and program transformation
• Topics:
– Intraprocedural type inference
– Static method selection
– Specialization and Inlining
– Static class prediction
– Splitting
– Box/unboxing
– Common Subexpression Elimination
– Overflow and range checks
– Partial evaluation revisited
• Partially based on: Chambers’ “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial
Running Example
(dg + ((x <num>) (y <num>) => <num>))
(dm + ((x <int>) (y <int>) => <int>) (%ib (%i+ (%iu x) (%iu y))))
(dm + ((x <flo>) (y <flo>) => <flo>) (%fb (%f+ (%fu x) (%fu y))))
(dm x2 ((x <num>) => <num>) (+ x x))
(dm x2 ((x <int>) => <int>) (+ x x))
• Anatomy of Pure Proto Arithmetic
– Dispatch
– Boxing
– Overflow checks
– Actual instruction
• C Arithmetic
– Actual instruction
Biggest Inefficiencies
• Method dispatch
• Method calls
• Boxing
• Type checks
• Overflow and range checks
• Slot access
• Object creation
Intraprocedural Type Inference
• Goal: determine concrete class(es) of each variable and expression
• Standard data flow analysis through control graph
– Propagate bindings b -> { class … }
– Sources are literals, isa expressions, results of some primitives, and type declarations
– Form unions of bindings at merge points
– Narrow sets after typecases
– Assumes closed world (or at least final classes)
Type Inference Example
(set x (isa <tab> …)) ;; x in { <tab> }
(set y (table-growth-factor x)) ;; y in { <int> <flo> }
(set z (if t x y)) ;; z in { <tab> <int> <flo> }
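The propagation and merge steps can be sketched in Python. This is only an illustration under stated assumptions: the `RESULT_CLASSES` table and the `merge` helper are hypothetical names, and the result classes of `table-growth-factor` are assumed to come from a declaration.

```python
from typing import Dict, FrozenSet

TypeEnv = Dict[str, FrozenSet[str]]

def merge(a: TypeEnv, b: TypeEnv) -> TypeEnv:
    """At a control-flow merge point, union the class sets of each variable
    that is bound on both incoming edges."""
    return {v: a[v] | b[v] for v in a.keys() & b.keys()}

# Hypothetical table of declared result classes (an assumption of this sketch).
RESULT_CLASSES = {"table-growth-factor": frozenset({"<int>", "<flo>"})}

env: TypeEnv = {}
env["x"] = frozenset({"<tab>"})                    # (set x (isa <tab> ...))
env["y"] = RESULT_CLASSES["table-growth-factor"]   # (set y (table-growth-factor x))

then_env = dict(env, z=env["x"])   # then branch of (if t x y): z gets x's classes
else_env = dict(env, z=env["y"])   # else branch: z gets y's classes
after_if = merge(then_env, else_env)
```

After the merge, `after_if["z"]` holds the union of the class sets of `x` and `y`, matching the comment on the last line of the example.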
Narrowing Type Precision
(if (isa? x <int>) (+ x 1) (+ x 37.0))
(if (isa? x <int>) (let (([x <int>] x)) (+ x 1)) (let (([x !<int>] x)) (+ x 37.0)))
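Narrowing can be sketched as splitting an environment in two at a type test. The `narrow` helper below is a hypothetical name, and the sketch assumes a closed world in which the tested class has no subclasses.

```python
from typing import Dict, FrozenSet, Tuple

TypeEnv = Dict[str, FrozenSet[str]]

def narrow(env: TypeEnv, var: str, cls: str) -> Tuple[TypeEnv, TypeEnv]:
    """Return (then_env, else_env) for the test (isa? var cls).

    Assumes cls is a leaf class, so the test is exact set membership."""
    possible = env[var]
    then_env = {**env, var: possible & {cls}}   # var is known to be cls
    else_env = {**env, var: possible - {cls}}   # var is known not to be cls
    return then_env, else_env

env = {"x": frozenset({"<int>", "<flo>"})}
then_env, else_env = narrow(env, "x", "<int>")
```

The original environment is left untouched, so each branch can be analyzed independently with its own narrowed copy.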
Static Method Selection
(set x (isa <tab> …)) ;; x in { <tab> }
(set y (table-growth-factor x)) ;; y in { <int> <flo> }
(print out y)
• If only one class is statically possible, then dispatch can be performed statically:
(set y (<tab>:table-growth-factor x))
• If a couple of classes are statically possible, then a typecase can be inserted:
(sel (class-of y) ((<int>) (<int>:print y)) ((<flo>) (<flo>:print y)))
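The decision between a static call, a typecase, and a full dynamic dispatch might be sketched as follows. `select_call` and its `max_cases` threshold are hypothetical, and the sketch emits Proto-like text only for illustration.

```python
def select_call(generic: str, arg: str, classes: set, max_cases: int = 2) -> str:
    """Pick a call strategy from the inferred class set of the dispatched argument."""
    ordered = sorted(classes)
    if len(ordered) == 1:
        # Exactly one possible class: bind the method statically.
        return f"({ordered[0]}:{generic} {arg})"
    if 1 < len(ordered) <= max_cases:
        # A few possible classes: emit a typecase over them.
        arms = " ".join(f"(({c}) ({c}:{generic} {arg}))" for c in ordered)
        return f"(sel (class-of {arg}) {arms})"
    # Unknown or too many classes: keep the dynamic dispatch.
    return f"({generic} {arg})"

static_call = select_call("table-growth-factor", "x", {"<tab>"})
typecase = select_call("print", "y", {"<int>", "<flo>"})
dynamic = select_call("print", "y", {"<int>", "<flo>", "<tab>"})
```

The threshold trades code size against dispatch cost; the value 2 here is arbitrary.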
Type Check Removal
• Type inference can clearly be used to remove type checks and casts
(set x (isa <tab> …)) ;; x in { <tab> }
(if (isa? x <tab>) (go) (stop))
==>
(set x (isa <tab> …)) ;; x in { <tab> }
(go)
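A minimal sketch of the folding step, with expressions as nested tuples. `fold_isa_test` is a hypothetical helper, and the class in the test is assumed to be a leaf class.

```python
def fold_isa_test(env, var, cls, then_branch, else_branch):
    """Fold (if (isa? var cls) then else) when the inferred class set decides it."""
    classes = env[var]
    if classes == {cls}:
        return then_branch          # test always succeeds
    if cls not in classes:
        return else_branch          # test always fails
    return ("if", ("isa?", var, cls), then_branch, else_branch)

env = {"x": frozenset({"<tab>"})}
folded = fold_isa_test(env, "x", "<tab>", ("go",), ("stop",))
```

When the class set is ambiguous, the test is left in place unchanged.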
Intraprocedural Type Inference Critique
• Pros:
– Simple
– Fast
– Fewer dependents
• Cons:
– Limited type precision
• No result types
• Incoming arg types
• No slot types
• Etc.
Specialization
• Q: How can we improve intraprocedural type inference precision?
• A: Specialization, which is the cloning of methods with narrowed argument types
• Improves type precision of callee by contextualizing body:
(dm sqr ((x <num>) (y <num>)) (* x y))
==>
(dm sqr ((x <int>) (y <int>)) (* x y))
(dm sqr ((x <flo>) (y <flo>)) (* x y))
• Must make sure super calls still mean same thing
Specialization of Constructors
• Crucial to get object creation to be fast
• Specialization can be used to build custom constructors
(def <thingy> (isa <any>))
(slot <thingy> thingy-x 0)
(slot (t <thingy>) thingy-tracker (+ (thingy-x t) 1))
(slot <thingy> thingy-cache (fab <tab>))
(df thingy-isa (x tracker cache)
  (let ((thingy (clone <thingy>)))
    (unless (== x nul) (set (%slot-value thingy thingy-x) x))
    (set (%slot-value thingy thingy-tracker) (if (== tracker nul) (+ (thingy-x thingy) 1) tracker))
    (set (%slot-value thingy thingy-cache) (if (== cache nul) (fab <tab>) cache))
    thingy))
Inlining
• Q: Can we do better?
• A: Inlining can improve specialization by inserting specialized body
• Improves type precision at call-site by contextualizing body (includes result types):
(dm f ((x <int>) (y <int>)) (+ (g x y) 1))
(dm g (x y) (+ x y))
==>
(dm f ((x <int>) (y <int>)) (+ (+ x y) 1))
Synergy: Method Selection + Inlining
(df f ((x <int>) (y <int>)) (+ x y))
;; method selection
(df f ((x <int>) (y <int>)) (<int>:+ x y))
;; inlining
(df f ((x <int>) (y <int>)) (%ib (%i+ (%iu x) (%iu y))))
Pitfalls of Inlining and Specialization
• Must control inlining and specialization carefully to avoid code bloat
• Inlining can be driven purely by syntactic size, never letting the inlined code grow larger than the original call
• Class-centric specialization usually works by copying down inherited methods and tightening up self references (harder for multimethods)
• Can run inlining/specialization trials based on
– Final static size
– Performance feedback
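The syntactic-size rule might look like the sketch below. Counting atoms in a nested-tuple expression is one possible size metric, chosen here as an assumption; `size` and `should_inline` are hypothetical names.

```python
def size(expr) -> int:
    """Syntactic size: the number of atoms in a nested-tuple expression."""
    if isinstance(expr, tuple):
        return sum(size(e) for e in expr)
    return 1

def should_inline(call, inlined_body) -> bool:
    """Inline only if the substituted body is no larger than the call it replaces."""
    return size(inlined_body) <= size(call)

call = ("g", "x", "y")                                # (g x y)
small_body = ("+", "x", "y")                          # (+ x y)
large_body = ("+", ("*", "x", "x"), ("*", "y", "y"))  # (+ (* x x) (* y y))
```

Here `small_body` would be inlined and `large_body` would not. A trial-based scheme would instead inline first, optimize, and then compare the final size.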
Class Centric Specialization
(def <point> (isa <any>))
(slot <point> (point-x <int>) 0)
(dm point-move ((p <point>) (offset <num>)) (set (point-x p) (+ (point-x p) offset)))
(def <color-point> (isa <point>))
==>
(dm point-move ((p <color-point>) (offset <num>)) (set (point-x p) (+ (point-x p) offset)))
Static Class Prediction
• Can improve type precision in cases where, for a given generic, a particular method is called much more frequently than the others
• Insert type check testing prediction
– Can narrow type precision along then and else branches
• Especially useful in combination with inlining
Static Class Prediction Example
(df f (x) (let ((y (+ x 1))) (+ y 2)))
(df f (x) (let ((y (if (isa? x <int>) (+ x 1) (+ x 1)))) (if (isa? y <int>) (+ y 2) (+ y 2))))
(df f (x) (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1)))) (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))
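The rewrite applied to each call above can be sketched as a small tree transformation. `predict` is a hypothetical helper, and predicting `<int>` by default is an assumption of the sketch.

```python
def predict(call, predicted="<int>"):
    """Wrap a generic call in a test on its first argument.

    The then branch calls the predicted class's method directly;
    the else branch keeps the original dynamic dispatch."""
    op, receiver, *rest = call
    fast = (f"{predicted}:{op}", receiver, *rest)
    return ("if", ("isa?", receiver, predicted), fast, call)

guarded = predict(("+", "x", 1))
```

For `(+ x 1)` this yields the `if` form with `<int>:+` on the fast path, the shape shown in the last line of the example.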
Synergy: Class Prediction + Method Selection + Inlining
(df f (x) (let ((y (if (isa? x <int>) (+ x 1) (+ x 1)))) (if (isa? y <int>) (+ y 2) (+ y 2))))
;; method selection
(df f (x) (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1)))) (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))
;; inlining
(df f (x) (let ((y (if (isa? x <int>) (%ib (%i+ (%iu x) (%iu 1))) (+ x 1)))) (if (isa? y <int>) (%ib (%i+ (%iu y) (%iu 2))) (+ y 2))))
Splitting
• Problem: Class prediction often leads to a bunch of redundant type tests
• Solution: Split off whole sections of graph specialized to particular class on variable
– Can split off entire loops
– Can specialize on other dataflow information
Splitting Example
(df f (x) (let ((y (+ x 1))) (+ y 2)))
(df f (x) (if (isa? x <int>) (let ((y (+ x 1))) (+ y 2)) (let ((y (+ x 1))) (+ y 2))))
(df f (x) (if (isa? x <int>) (let ((y (<int>:+ x 1))) (<int>:+ y 2)) (let ((y (+ x 1))) (+ y 2))))
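A rough Python analogue of the split version of `f`: one test at the top guards a whole region specialized to integers. This only mirrors the shape of the transformation; Python itself gains nothing from it, and `f_split` is a hypothetical name.

```python
def f_split(x):
    if type(x) is int:
        # Integer copy: both additions are bound to int's method directly,
        # with no further type tests inside the region.
        y = int.__add__(x, 1)
        return int.__add__(y, 2)
    # General copy: both additions still dispatch dynamically.
    y = x + 1
    return y + 2
```

Compared with class prediction, the test on `y` is gone: within the integer region the class of `y` follows from the class of `x`.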
Splitting Downside
• Splitting can also lead to code bloat
• Must be intelligent about what to split
– A priori knowledge (e.g., integers most frequent)
– Actual performance
Box / Unboxing
(df + ((x <int>) (y <int>) => <int>) (%ib (%i+ (%iu x) (%iu y))))
(df f ((a <int>) (b <int>) => <int>) (+ (+ a b) a))
;; inlining +
(df f ((a <int>) (b <int>) => <int>) (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b)))) (%iu a))))
;; remove box/unbox pair
(df f ((a <int>) (b <int>) => <int>) (%ib (%i+ (%i+ (%iu a) (%iu b)) (%iu a))))
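The removal of box/unbox pairs can be sketched as a bottom-up peephole rewrite over nested tuples. `remove_box_unbox` is a hypothetical name, and the sketch assumes that `%iu` applied to the result of `%ib` always returns the original raw value.

```python
def remove_box_unbox(expr):
    """Rewrite (%iu (%ib e)) to e, bottom-up, on nested-tuple expressions."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(remove_box_unbox(e) for e in expr)
    if (len(expr) == 2 and expr[0] == "%iu"
            and isinstance(expr[1], tuple)
            and len(expr[1]) == 2 and expr[1][0] == "%ib"):
        return expr[1][1]   # drop the redundant pair
    return expr

# Body of f after inlining +, as in the example above.
inlined = ("%ib", ("%i+",
                   ("%iu", ("%ib", ("%i+", ("%iu", "a"), ("%iu", "b")))),
                   ("%iu", "a")))
optimized = remove_box_unbox(inlined)
```

The outermost `%ib` survives because the function must still return a boxed `<int>`.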
Synergy: Splitting + Method Selection + Inlining + Box/Unboxing
(df f (x) (if (isa? x <int>) (let ((y (+ x 1))) (+ y 2)) (let ((y (+ x 1))) (+ y 2))))
;; method selection
(df f (x) (if (isa? x <int>) (let ((y (<int>:+ x 1))) (<int>:+ y 2)) (let ((y (+ x 1))) (+ y 2))))
(df f (x) (if (isa? x <int>) (<int>:+ (<int>:+ x 1) 2) (let ((y (+ x 1))) (+ y 2))))
;; inlining
(df f (x) (if (isa? x <int>) (%ib (%i+ (%iu (%ib (%i+ (%iu x) (%iu 1)))) (%iu 2))) (let ((y (+ x 1))) (+ y 2))))
;; box/unbox
(df f (x) (if (isa? x <int>) (%ib (%i+ (%i+ (%iu x) (%iu 1)) (%iu 2))) (let ((y (+ x 1))) (+ y 2))))
Common Subexpression Elimination (CSE)
• Removes redundant computations
– Constant slot or binding access
– Stateless/side-effect-free function calls
• Examples
(or (elt (cache x) 'a) (elt (cache x) 'b)) ==> (let ((t (cache x))) (or (elt t 'a) (elt t 'b)))
(if (< i 0) (if (< i 0) (go) (putz)) (dance)) ==> (if (< i 0) (go) (dance))
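The first example might be implemented along these lines. The sketch is deliberately small: it hoists one repeated call to a function listed in a hypothetical `PURE` set, assumes the temporary name is fresh, and ignores evaluation-order issues that a real pass would need to check.

```python
from collections import Counter

PURE = {"cache"}  # hypothetical: functions assumed to be side-effect free

def subexprs(e):
    """Yield every tuple-valued subexpression of e, including e itself."""
    if isinstance(e, tuple):
        yield e
        for child in e:
            yield from subexprs(child)

def replace(e, target, name):
    """Replace every occurrence of target inside e with the variable name."""
    if e == target:
        return name
    if isinstance(e, tuple):
        return tuple(replace(child, target, name) for child in e)
    return e

def cse(e, temp="t"):
    """Bind the first repeated pure call to temp and reuse it."""
    counts = Counter(s for s in subexprs(e) if s[0] in PURE)
    for target, n in counts.items():
        if n > 1:
            return ("let", ((temp, target),), replace(e, target, temp))
    return e

before = ("or", ("elt", ("cache", "x"), "'a"), ("elt", ("cache", "x"), "'b"))
after = cse(before)
```

Expressions with no repeated pure call are returned unchanged.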
Overflow and Bounds Checks aka “Moon Challenge”
• Goal:
– Support mathematical integers and bounds checked collection access
– Eliminate bounds and overflow checks
• Strategy:
– Assume most integer arithmetic and collection accesses occur in restricted loop context where range can be readily inferred
– Perform range analysis to remove checks
• Bound from above variables by size of collection
• Bound from below variables by zero
• Induction step is 1+
Range Check Example
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (elt v i))) (rep (+ sum e) (+ i 1))) sum))
;; inlining bounds checks
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (if (or (< i 0) (>= i (len v))) (sig ...) (vref v i)))) (rep (+ sum e) (+ i 1))) sum))
;; CSE
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (if (< i 0) (sig ...) (vref v i)))) (rep (+ sum e) (+ i 1))) sum))
;; range analysis
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (vref v i))) (rep (+ sum e) (+ i 1))) sum))
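The reasoning behind the last step can be sketched with two facts about the index: a lower bound proven by induction, and the set of collections whose length is known to exceed it from an enclosing guard. Both helper names are hypothetical, and the sketch assumes mathematical integers, so the induction variable cannot wrap around.

```python
from typing import Optional, Set

def induction_lower_bound(start: int, step: int) -> Optional[int]:
    """Lower bound of a loop variable that begins at start and adds step on
    every iteration. Returns None when no lower bound can be proven."""
    return start if step >= 0 else None

def can_drop_bounds_check(lower: Optional[int], guarded_by: Set[str],
                          collection: str) -> bool:
    """The check 0 <= i < (len collection) is redundant when i >= 0 is proven
    and an enclosing test already established i < (len collection)."""
    return lower is not None and lower >= 0 and collection in guarded_by

lower = induction_lower_bound(start=0, step=1)     # i starts at 0, step is 1+
guards = {"v"}                                     # from (if (< i (len v)) ...)
drop = can_drop_bounds_check(lower, guards, "v")
```

In the example, CSE has already removed the upper-bound test, so only the proof that `i` is never negative is needed to delete the remaining check.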
Overflow Check Removal aka “Moon Challenge” Critique
• Pros:
– Simple analysis
• Cons:
– Could miss a number of cases
• But then previous approaches (e.g., box/unbox) could be applied
Advanced Topic: Representation Selection
• Embed objects in others to remove indirections
• Change object representation over time
• Use minimum number of bits to represent enums
• Pack fields in objects
Advanced Topic: Algorithm Selection
• Goal: compiler determines that one algorithm is more appropriate for given data
– Sorted data
– Biased data
• Solution:
– Embed statistics gathering in runtime
– Add guards to code and split
Rule-based Compilation
• First millennium compilers were based on special rules for
– Method selection
– Pattern matching
– Oft-used system functions like format
• Problems
– Error prone
– Don’t generalize to user code
• Challenge
– Minimize number of rules
– Competitive compiler speed
– Produce competitive code
Partial Evaluation to the Rescue
• Holy grail idea:
– Optimizations are manifest in code
– Do previous optimizations with only p.e.
• Simplify compiler based on limited moves
– Static eval and folding
– Inlining
• Eliminate
– Custom method selection
– Custom constructor optimization
– Etc.
Partial Eval Example
(dm format (port msg (args …))
  (rep nxt ((I 0) (ai 0))
    (when (< I (len msg))
      (let ((c (elt msg I)))
        (if (= c #\%)
            (seq (print port (elt args ai)) (nxt (+ I 1) (+ ai 1)))
            (seq (write port c) (nxt (+ I 1) ai)))))))
(format out "%> " n)
• First millennium solution is to have a custom optimizer for format
(seq (print port n) (write port "> "))
• Second millennium solution with partial evaluation
(nxt 0 0)
(seq (print port n) (nxt 1 1))
(seq (print port n) (seq (write port #\>) (nxt 2 1)))
(seq (print port n) (seq (write port #\>) (seq (write port #\space))))
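The unrolling above can be imitated with a hand-written specializer. This is not a general partial evaluator: it only runs the loop of `format` over a message that is known statically, here assumed to be `"%> "`, and leaves the port operations as residual code. `specialize_format` and `coalesce_writes` are hypothetical names.

```python
def specialize_format(msg, arg_names):
    """Run format's loop at compile time over a static msg.

    The index arithmetic and character tests are evaluated away; only the
    print/write operations on the port remain as residual code."""
    residual, ai = [], 0
    for c in msg:
        if c == "%":
            residual.append(("print", "port", arg_names[ai]))
            ai += 1
        else:
            residual.append(("write", "port", c))
    return residual

def coalesce_writes(ops):
    """Merge adjacent single-character writes into one string write."""
    out = []
    for op in ops:
        if op[0] == "write" and out and out[-1][0] == "write":
            out[-1] = ("write", op[1], out[-1][2] + op[2])
        else:
            out.append(op)
    return out

unrolled = specialize_format("%> ", ["n"])
merged = coalesce_writes(unrolled)
```

`unrolled` corresponds to the final partially evaluated form with one write per character. The custom optimizer's output has a single string write, which takes the extra coalescing pass to reach.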
Partial Eval Challenge
• Inlining and static eval are slow
– “Running” code through inlining
– Need to compile oft-used optimizations
• Residual code is not necessarily efficient
– Sometimes algorithmic change is necessary for optimal efficiency
• Example: method selection uses class numbering and decision tree whereas straightforward code does naïve method sorting
• Perhaps there is a middle ground
Open Problems
• Automatic inlining, splitting, and specialization
• Efficient mathematical integers
• Constant determination
• Representation selection
• Algorithmic selection
• Efficient partial evaluation
• Super compiler that runs for days
Reading List
• Chambers: “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial
• Chambers and Ungar: SELF papers
• Chambers et al.: Vortex papers