Intraprocedural Optimizations

Jonathan Bachrach

MIT AI Lab

Outline

• Goal: eliminate abstraction overhead using static analysis and program transformation

• Topics:
– Intraprocedural type inference
– Static method selection
– Specialization and Inlining
– Static class prediction
– Splitting
– Box/unboxing
– Common Subexpression Elimination
– Overflow and range checks
– Partial evaluation revisited

• Partially based on: Chambers’ “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial

Running Example

(dg + ((x <num>) (y <num>) => <num>))

(dm + ((x <int>) (y <int>) => <int>) (%ib (%i+ (%iu x) (%iu y))))

(dm + ((x <flo>) (y <flo>) => <flo>) (%fb (%f+ (%fu x) (%fu y))))

(dm x2 ((x <num>) => <num>) (+ x x))

(dm x2 ((x <int>) => <int>) (+ x x))

• Anatomy of Pure Proto Arithmetic
– Dispatch
– Boxing
– Overflow checks
– Actual instruction

• C Arithmetic
– Actual instruction

Biggest Inefficiencies

• Method dispatch

• Method calls

• Boxing

• Type checks

• Overflow and range checks

• Slot access

• Object creation

Intraprocedural Type Inference

• Goal: determine concrete class(es) of each variable and expression

• Standard data flow analysis through the control flow graph

– Propagate bindings b -> { class … }

– Sources are literals, isa expressions, results of some primitives, and type declarations

– Form unions of bindings at merge points

– Narrow sets after typecases

– Assumes closed world (or at least final classes)

Type Inference Example

(set x (isa <tab> …)) ;; x in { <tab> }
(set y (table-growth-factor x)) ;; y in { <int> <flo> }
(set z (if t x y)) ;; z in { <tab> <int> <flo> }
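
The propagation and merge rules in this example can be sketched in Python. This is only an illustration under stated assumptions, not the Proto compiler's inference engine: the nested-tuple encoding of expressions, the function name `infer`, and the `RESULT_TYPES` table are all invented for the sketch.

```python
# Hypothetical table of primitive result types; the single entry mirrors the slide.
RESULT_TYPES = {"table-growth-factor": {"<int>", "<flo>"}}

def infer(expr, env):
    """Return the set of concrete classes an expression may evaluate to."""
    if isinstance(expr, str):                 # variable reference: look up its binding
        return env[expr]
    op = expr[0]
    if op == "isa":                           # (isa <class> ...) creates an instance
        return {expr[1]}
    if op == "if":                            # merge point: union of both branches
        _, _test, then, other = expr
        return infer(then, env) | infer(other, env)
    return set(RESULT_TYPES[op])              # primitive with a declared result type

env = {}
env["x"] = infer(("isa", "<tab>"), env)
env["y"] = infer(("table-growth-factor", "x"), env)
env["z"] = infer(("if", "t", "x", "y"), env)
```

After running, `env["z"]` holds the union of the classes of `x` and `y`, matching the comment on the third line of the example.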

Narrowing Type Precision

(if (isa? x <int>) (+ x 1) (+ x 37.0))

(if (isa? x <int>) (let (([x <int>] x)) (+ x 1)) (let (([x !<int>] x)) (+ x 37.0)))
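
Narrowing can be viewed as splitting a type set on the tested class. The Python sketch below assumes exact class tests over a closed set of concrete classes (no subclass reasoning); `narrow` is a hypothetical helper name.

```python
def narrow(type_set, cls):
    """Split a type set on an (isa? x cls) test.

    Returns (classes possible in the then-branch, classes possible in the else-branch).
    """
    then_set = type_set & {cls}    # x is known to be cls
    else_set = type_set - {cls}    # x is known to be anything but cls
    return then_set, else_set

then_x, else_x = narrow({"<int>", "<flo>"}, "<int>")
```

Here the else-branch set plays the role of the `!<int>` annotation in the rewritten code.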

Static Method Selection

(set x (isa <tab> …)) ;; x in { <tab> }
(set y (table-growth-factor x)) ;; y in { <int> <flo> }
(print out y)

• If only one class is statically possible, dispatch can be performed statically:
(set y (<tab>:table-growth-factor x))

• If a few classes are statically possible, a typecase can be inserted:
(sel (class-of y) ((<int>) (<int>:print out y)) ((<flo>) (<flo>:print out y)))
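
The two cases can be sketched as a rewrite driven by the inferred type set. This Python sketch is an assumption-laden illustration: it handles only one-argument calls, the name `select` and the `max_cases` threshold are invented, and the tuple output merely mimics the Proto forms shown on the slide.

```python
def select(generic, arg, type_set, max_cases=2):
    """Rewrite a one-argument generic call given the inferred classes of its argument."""
    classes = sorted(type_set)
    if len(classes) == 1:                       # one class: bind the method statically
        return (f"{classes[0]}:{generic}", arg)
    if len(classes) <= max_cases:               # a few classes: emit a typecase
        arms = [((c,), (f"{c}:{generic}", arg)) for c in classes]
        return ("sel", ("class-of", arg), *arms)
    return (generic, arg)                       # too many classes: keep dynamic dispatch

single = select("table-growth-factor", "x", {"<tab>"})
several = select("print", "y", {"<int>", "<flo>"})
```

The threshold is a tuning knob: a typecase with many arms can cost more than the dispatch it replaces.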

Type Check Removal

• Type inference can clearly be used to remove type checks and casts

(set x (isa <tab> …)) ;; x in { <tab> }
(if (isa? x <tab>) (go) (stop))
==>
(set x (isa <tab> …)) ;; x in { <tab> }
(go)

Intraprocedural Type Inference Critique

• Pros:
– Simple
– Fast
– Fewer dependents

• Cons:
– Limited type precision

• No result types

• Incoming arg types

• No slot types

• Etc.

Specialization

• Q: How can we improve intraprocedural type inference precision?

• A: Specialization, the cloning of methods with narrowed argument types

• Improves type precision of callee by contextualizing body:
(dm sqr ((x <num>) (y <num>)) (* x y))
==>
(dm sqr ((x <int>) (y <int>)) (* x y))
(dm sqr ((x <flo>) (y <flo>)) (* x y))

• Must make sure super calls still mean same thing
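
A minimal Python sketch of method cloning follows. It assumes a closed world in which each abstract class has a known list of concrete subclasses; the `CONCRETE` table and the `specialize` helper are hypothetical. Unlike the slide, which shows only the two matching-type clones, this sketch enumerates every combination, so a real compiler would need a policy for keeping only the clones that pay off.

```python
from itertools import product

# Hypothetical closed-world hierarchy: abstract class -> concrete subclasses.
CONCRETE = {"<num>": ["<int>", "<flo>"]}

def specialize(method):
    """Clone a method once per combination of concrete argument classes."""
    name, params, body = method
    choices = [CONCRETE.get(cls, [cls]) for _, cls in params]
    clones = []
    for combo in product(*choices):
        new_params = tuple((var, cls) for (var, _), cls in zip(params, combo))
        clones.append((name, new_params, body))   # body is shared; only types narrow
    return clones

sqr = ("sqr", (("x", "<num>"), ("y", "<num>")), ("*", "x", "y"))
clones = specialize(sqr)
```

Each clone can then be run through intraprocedural type inference with the narrowed parameter types as sources.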

Specialization of Constructors

• Crucial to get object creation to be fast

• Specialization can be used to build custom constructors

(def <thingy> (isa <any>))
(slot <thingy> thingy-x 0)
(slot (t <thingy>) thingy-tracker (+ (thingy-x t) 1))
(slot <thingy> thingy-cache (fab <tab>))

(df thingy-isa (x tracker cache)
  (let ((thingy (clone <thingy>)))
    (unless (== x nul)
      (set (%slot-value thingy thingy-x) x))
    (set (%slot-value thingy thingy-tracker)
         (if (== tracker nul) (+ (thingy-x thingy) 1) tracker))
    (set (%slot-value thingy thingy-cache)
         (if (== cache nul) (fab <tab>) cache))
    thingy))

Inlining

• Q: Can we do better?

• A: Inlining can improve specialization by inserting specialized body

• Improves type precision at call-site by contextualizing body (includes result types):
(dm f ((x <int>) (y <int>)) (+ (g x y) 1))
(dm g (x y) (+ x y))
==>
(dm f ((x <int>) (y <int>)) (+ (+ x y) 1))
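
The transformation above amounts to substituting argument expressions for parameters in the callee body. The Python sketch below works on nested tuples and makes simplifying assumptions that a real inliner cannot: arguments are side-effect free (so duplicating them is safe), there is no variable capture, and callees are not recursive. The names `substitute` and `inline` are invented for the sketch.

```python
def substitute(expr, bindings):
    """Replace parameter names in expr with the corresponding argument expressions."""
    if isinstance(expr, str):
        return bindings.get(expr, expr)
    if isinstance(expr, tuple):
        return tuple(substitute(e, bindings) for e in expr)
    return expr                                  # literals such as 1

def inline(expr, functions):
    """Inline every call to a known function, innermost calls first."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(inline(e, functions) for e in expr)
    head = expr[0]
    if head in functions:
        params, body = functions[head]
        return substitute(body, dict(zip(params, expr[1:])))
    return expr

functions = {"g": (("x", "y"), ("+", "x", "y"))}
result = inline(("+", ("g", "x", "y"), 1), functions)
```

Once the body is in place, type inference sees `x` and `y` with the caller's `<int>` declarations, which is where the precision gain comes from.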

Synergy: Method Selection + Inlining

(df f ((x <int>) (y <int>)) (+ x y))

;; method selection
(df f ((x <int>) (y <int>)) (<int>:+ x y))

;; inlining
(df f ((x <int>) (y <int>)) (%ib (%i+ (%iu x) (%iu y))))

Pitfalls of Inlining and Specialization

• Must control inlining and specialization carefully to avoid code bloat

• Inlining can be driven by syntactic size alone, never letting the inlined code grow larger than the original call

• Class-centric specialization usually works by copying down inherited methods and tightening up self references (harder for multimethods)

• Can run inlining/specialization trials based on
– Final static size
– Performance feedback
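
One way to read the syntactic-size rule is as a node count over the expression tree. The Python sketch below is a guess at such a heuristic, not a documented policy: `size` counts nodes in a nested-tuple expression, and `should_inline` is a hypothetical predicate comparing the callee body against the call it would replace.

```python
def size(expr):
    """Syntactic size: number of nodes in a nested-tuple expression."""
    if isinstance(expr, tuple):
        return 1 + sum(size(e) for e in expr)
    return 1

def should_inline(call, body):
    """Inline only when the body is no larger than the call it replaces."""
    return size(body) <= size(call)

call = ("g", "x", "y")
small_body = ("+", "x", "y")
large_body = ("+", ("*", "x", "x"), ("*", "y", "y"))
```

With these definitions the small body is accepted and the large one rejected, so static code size never grows.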

Class Centric Specialization

(def <point> (isa <any>))
(slot <point> (point-x <int>) 0)
(dm point-move ((p <point>) (offset <num>)) (set (point-x p) (+ (point-x p) offset)))
(def <color-point> (isa <point>))

==>

(dm point-move ((p <color-point>) (offset <num>)) (set (point-x p) (+ (point-x p) offset)))

Static Class Prediction

• Can improve type precision when, for a given generic, a particular method is called much more frequently than the others

• Insert a type check testing the prediction
– Can narrow type precision along then and else branches

• Especially useful in combination with inlining

Static Class Prediction Example

(df f (x) (let ((y (+ x 1))) (+ y 2)))

(df f (x) (let ((y (if (isa? x <int>) (+ x 1) (+ x 1)))) (if (isa? y <int>) (+ y 2) (+ y 2))))

(df f (x) (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1)))) (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))
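
The guard insertion shown in this example can be sketched as a small rewrite. In the Python below, `predict` is a hypothetical helper and the default prediction of `<int>` is an assumption standing in for whatever frequency data the compiler has; the sketch combines the guard and the static method selection in one step.

```python
def predict(call, predicted="<int>"):
    """Wrap a call in a test of its first argument against the predicted class."""
    op, receiver, *rest = call
    fast = (f"{predicted}:{op}", receiver, *rest)    # statically selected method
    return ("if", ("isa?", receiver, predicted), fast, call)

guarded = predict(("+", "x", 1))
```

The else branch keeps the original dynamically dispatched call, so behavior is unchanged when the prediction fails.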

Synergy: Class Prediction + Method Selection + Inlining

(df f (x) (let ((y (if (isa? x <int>) (+ x 1) (+ x 1)))) (if (isa? y <int>) (+ y 2) (+ y 2))))

;; method selection
(df f (x) (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1)))) (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))

;; inlining
(df f (x) (let ((y (if (isa? x <int>) (%ib (%i+ (%iu x) %1)) (+ x 1)))) (if (isa? y <int>) (%ib (%i+ (%iu y) %2)) (+ y 2))))

Splitting

• Problem: Class prediction often leads to a bunch of redundant type tests

• Solution: Split off whole sections of graph specialized to particular class on variable
– Can split off entire loops
– Can specialize on other dataflow information

Splitting Example

(df f (x) (let ((y (+ x 1))) (+ y 2)))

(df f (x) (if (isa? x <int>) (let ((y (+ x 1))) (+ y 2)) (let ((y (+ x 1))) (+ y 2))))

(df f (x) (if (isa? x <int>) (let ((y (<int>:+ x 1))) (<int>:+ y 2)) (let ((y (+ x 1))) (+ y 2))))

Splitting Downside

• Splitting can also lead to code bloat

• Must be intelligent about what to split
– A priori knowledge (e.g., integers most frequent)
– Actual performance

Box / Unboxing

(dm + ((x <int>) (y <int>) => <int>) (%ib (%i+ (%iu x) (%iu y))))

(df f ((a <int>) (b <int>) => <int>) (+ (+ a b) a))

;; inlining +

(df f ((a <int>) (b <int>) => <int>) (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b)))) (%iu a))))

;; remove box/unbox pair

(df f ((a <int>) (b <int>) => <int>) (%ib (%i+ (%i+ (%iu a) (%iu b)) (%iu a))))
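
The last step is a peephole rewrite: an unbox applied directly to a box cancels out. A minimal Python sketch over nested tuples follows; the function name `unbox_box` is invented, and the sketch handles only the integer pair `%iu`/`%ib` from the slide.

```python
def unbox_box(expr):
    """Rewrite (%iu (%ib e)) to e, bottom-up, over nested tuples."""
    if not isinstance(expr, tuple) or not expr:
        return expr
    expr = tuple(unbox_box(e) for e in expr)
    if (len(expr) == 2 and expr[0] == "%iu"
            and isinstance(expr[1], tuple) and len(expr[1]) == 2
            and expr[1][0] == "%ib"):
        return expr[1][1]                       # the boxed value, left unboxed
    return expr

before = ("%ib", ("%i+", ("%iu", ("%ib", ("%i+", ("%iu", "a"), ("%iu", "b")))),
                  ("%iu", "a")))
after = unbox_box(before)
```

The outermost `%ib` survives because the function's declared result is a boxed `<int>`.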

Synergy: Splitting + Method Selection + Inlining + Box/Unboxing

(df f (x) (if (isa? x <int>) (let ((y (+ x 1))) (+ y 2)) (let ((y (+ x 1))) (+ y 2))))

;; method selection
(df f (x) (if (isa? x <int>) (let ((y (<int>:+ x 1))) (<int>:+ y 2)) (let ((y (+ x 1))) (+ y 2))))

(df f (x) (if (isa? x <int>) (<int>:+ (<int>:+ x 1) 2) (let ((y (+ x 1))) (+ y 2))))

;; inlining
(df f (x) (if (isa? x <int>) (%ib (%i+ (%iu (%ib (%i+ (%iu x) %1))) %2)) (let ((y (+ x 1))) (+ y 2))))

;; box/unbox
(df f (x) (if (isa? x <int>) (%ib (%i+ (%i+ (%iu x) %1) %2)) (let ((y (+ x 1))) (+ y 2))))

Common Subexpression Elimination (CSE)

• Removes redundant computations
– Constant slot or binding access
– Stateless/side-effect-free function calls

• Examples

(or (elt (cache x) 'a) (elt (cache x) 'b))
==>
(let ((t (cache x))) (or (elt t 'a) (elt t 'b)))

(if ( (if (< i 0) (go) (dance))
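
The first example above can be reproduced with a small Python sketch. It is a simplification under stated assumptions: the caller supplies the set of functions known to be side-effect free (`pure`), only the first repeated call is hoisted, and the temporary name `t` is assumed not to clash with existing variables. All helper names are invented.

```python
from collections import Counter

def subexprs(expr):
    """Yield every compound subexpression of a nested-tuple expression."""
    if isinstance(expr, tuple):
        yield expr
        for e in expr[1:]:
            yield from subexprs(e)

def replace(expr, target, name):
    """Replace every occurrence of target in expr with the variable name."""
    if expr == target:
        return name
    if isinstance(expr, tuple):
        return tuple(replace(e, target, name) for e in expr)
    return expr

def cse(expr, pure, temp="t"):
    """Bind the first repeated call to a pure function to a temporary."""
    counts = Counter(subexprs(expr))
    for sub in subexprs(expr):
        if counts[sub] > 1 and sub[0] in pure:
            return ("let", ((temp, sub),), replace(expr, sub, temp))
    return expr

expr = ("or", ("elt", ("cache", "x"), "'a"), ("elt", ("cache", "x"), "'b"))
result = cse(expr, pure={"cache"})
```

Passing an empty `pure` set leaves the expression untouched, which reflects the requirement that only stateless calls may be shared.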

Overflow and Bounds Checks aka “Moon Challenge”

• Goal:
– Support mathematical integers and bounds-checked collection access
– Eliminate bounds and overflow checks

• Strategy:
– Assume most integer arithmetic and collection accesses occur in restricted loop context where range can be readily inferred
– Perform range analysis to remove checks
• Bound from above variables by size of collection
• Bound from below variables by zero
• Induction step is 1+

Range Check Example

(rep (((sum <int>) 0) ((i <int>) 0))
  (if (< i (len v))
      (let ((e (if (or (< i 0) (>= i (len v))) (sig ...) (vref v i))))
        (rep (+ sum e) (+ i 1)))
      sum))

;; CSE
(rep (((sum <int>) 0) ((i <int>) 0))
  (if (< i (len v))
      (let ((e (vref v i)))
        (rep (+ sum e) (+ i 1)))
      sum))
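
The reasoning that makes the check removable can be sketched as a tiny range analysis over a counting loop. The Python below is a deliberately narrow illustration: `CountingLoop` and `bounds_check_needed` are invented names, and the sketch assumes the induction variable is modified only by the step and that the collection's length does not change inside the loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CountingLoop:
    var: str      # induction variable, e.g. "i"
    init: int     # initial value
    step: int     # amount added on every iteration
    limit: str    # body runs only while var < (len limit)

def bounds_check_needed(loop, index, collection):
    """Return False when the loop already proves 0 <= index < (len collection)."""
    if index != loop.var:
        return True                               # not the induction variable: keep check
    bounded_below = loop.init >= 0 and loop.step >= 0
    bounded_above = loop.limit == collection      # guard is var < (len collection)
    return not (bounded_below and bounded_above)

loop = CountingLoop(var="i", init=0, step=1, limit="v")
needed = bounds_check_needed(loop, "i", "v")
```

For the loop above the answer is that no check is needed; starting from a negative index or indexing a different collection keeps the check.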

Overflow Check Removal aka “Moon Challenge” Critique

• Pros:
– Simple analysis

• Cons:
– Could miss a number of cases
• But then previous approaches (e.g., box/unbox) could be applied

Advanced Topic: Representation Selection

• Embed objects in others to remove indirections

• Change object representation over time

• Use minimum number of bits to represent enums

• Pack fields in objects

Advanced Topic: Algorithm Selection

• Goal: compiler determines that one algorithm is more appropriate for given data
– Sorted data
– Biased data

• Solution:
– Embed statistics gathering in runtime
– Add guards to code and split

Rule-based Compilation

• First millennium compilers were based on special rules for
– Method selection
– Pattern matching
– Oft-used system functions like format

• Problems
– Error prone
– Don’t generalize to user code

• Challenge
– Minimize number of rules
– Competitive compiler speed
– Produce competitive code

Partial Evaluation to the Rescue

• Holy grail idea:
– Optimizations are manifest in code
– Do previous optimizations with only p.e.

• Simplify compiler based on limited moves
– Static eval and folding
– Inlining

• Eliminate
– Custom method selection
– Custom constructor optimization
– Etc.

Partial Eval Example

(dm format (port msg (args …)) (rep nxt ((I 0) (ai 0)) (when (? ” n)

• First millennium solution is to have a custom optimizer for format

(seq (print port n) (write port "> "))

• Second millennium solution with partial evaluation

(nxt 0 0)

(seq (print port n) (nxt 1 1))

(seq (print port n) (seq (write port #\>) (nxt 2 1)))

(seq (print port n) (seq (write port #\>) (seq (write port #\space))))
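
The unfolding above can be imitated in Python by running the format loop at specialization time over a known message string and emitting residual operations for the parts that depend on dynamic arguments. This is a sketch, not the lecture's partial evaluator: the directive syntax (a bare `%` consuming the next argument) is inferred from the trace above, and `specialize_format` and `run` are invented names.

```python
def specialize_format(msg):
    """Partially evaluate a format-like loop on a statically known message.

    Returns residual operations; only the argument values remain dynamic.
    """
    residual, arg_index = [], 0
    for ch in msg:                                  # the loop runs at specialization time
        if ch == "%":
            residual.append(("print", arg_index))   # print the next dynamic argument
            arg_index += 1
        else:
            residual.append(("write", ch))          # emit a literal character
    return residual

def run(residual, args):
    """Execute the residual code against the dynamic arguments."""
    return "".join(str(args[op[1]]) if op[0] == "print" else op[1] for op in residual)

residual = specialize_format("%> ")
output = run(residual, [42])
```

The residual list corresponds to the final fully unfolded `seq` form: one print followed by two character writes, with no string scanning left at run time.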

Partial Eval Challenge

• Inlining and static eval are slow
– “Running” code through inlining
– Need to compile oft-used optimizations

• Residual code is not necessarily efficient
– Sometimes algorithmic change is necessary for optimal efficiency
• Example: method selection uses class numbering and decision tree whereas straightforward code does naïve method sorting

• Perhaps there is a middle ground

Open Problems

• Automatic inlining, splitting, and specialization

• Efficient mathematical integers

• Constant determination

• Representation selection

• Algorithmic selection

• Efficient partial evaluation

• Super compiler that runs for days

Reading List

• Chambers: “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial

• Chambers and Ungar: SELF papers

• Chambers et al.: Vortex papers
