Page 1: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

OODL Runtime Optimizations

Jonathan Bachrach

MIT AI Lab

Feb 2001

Page 2: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Runtime Techniques

• Assume we can only write system code "turbochargers"
  – No sophisticated compiler available
  – Can only minimally perturb user code

Page 3: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Q: What are the Biggest Inefficiencies?

• Imagine trying to get Proto to run faster

Page 4: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Hint: Most Popular Operations

Page 5: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Running Example

(dg + ((x <num>) (y <num>) => <num>))

(dm + ((x <int>) (y <int>) => <int>) (%ib (%i+ (%iu x) (%iu y))))

(dm + ((x <flo>) (y <flo>) => <flo>) (%fb (%f+ (%fu x) (%fu y))))

(dm x2 ((x <num>) => <num>) (+ x x))

(dm x2 ((x <int>) => <int>) (+ x x))

Page 6: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

A: What are the Biggest Inefficiencies?

• Boxing
• Method dispatch *
• Type checks
• Slot access
• Object creation

* Today

Page 7: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Outline

• Overview

• Inline call caches

• Table

• Decision tree

• Variations

• Open Problems

Page 8: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Method Distributions

• Distribution can be measured
  – At generic
  – At call site
• Distribution can be
  – Monomorphic
  – Polymorphic
  – Megamorphic
• Distribution can be
  – Peaked
  – Uniform

Page 9: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Expense of Dispatch

• Problem: expensive if computed naively
  – Find applicable methods
  – Sort applicable methods
  – Call most applicable method
  – Three outcomes
    • One most applicable method => ok
    • No applicable methods => not understood error
    • Many applicable methods => ambiguous error
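The naive procedure above can be sketched in Python. This is only an illustration: Python classes stand in for Proto types, a method is modeled as a (signature, function) pair, and every name (`naive_dispatch`, `NotUnderstood`, `Ambiguous`, `PLUS`) is an assumption made for the sketch. Instead of a full sort it keeps the methods that no other applicable method beats, which is enough to tell the three outcomes apart.

```python
# Illustrative sketch only: none of these names come from Proto.

class NotUnderstood(Exception):
    """Raised when no method is applicable."""

class Ambiguous(Exception):
    """Raised when more than one method is most applicable."""

def is_applicable(sig, args):
    return len(sig) == len(args) and all(isinstance(a, t) for a, t in zip(args, sig))

def more_specific(s1, s2):
    # s1 beats s2 if it is a subclass in every position and not identical.
    return s1 != s2 and all(issubclass(a, b) for a, b in zip(s1, s2))

def naive_dispatch(methods, args):
    # 1. Find applicable methods.
    applicable = [(sig, fn) for sig, fn in methods if is_applicable(sig, args)]
    if not applicable:
        raise NotUnderstood(args)
    # 2. "Sort": keep only methods that no other applicable method beats.
    best = [(sig, fn) for sig, fn in applicable
            if not any(more_specific(other, sig) for other, _ in applicable)]
    if len(best) > 1:
        raise Ambiguous(args)
    # 3. Call the single most applicable method.
    return best[0][1](*args)

def i_plus(x, y):
    return ("i+", x + y)

def f_plus(x, y):
    return ("f+", x + y)

# Stand-in for the running example: + on <int>,<int> and on <flo>,<flo>.
PLUS = [((int, int), i_plus), ((float, float), f_plus)]
```

Every call repeats the applicability scan and the specificity comparison, which is the cost the rest of the lecture tries to avoid.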

Page 10: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Mapping View of Dispatch

• Dispatch can be thought of as a mapping from argument types to a method
  – (t1, t2, …, tn) => m

Page 11: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Solutions

• Caching

• Fast mapping
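A minimal Python sketch of the caching idea, assuming dispatch depends only on the concrete classes of the arguments (no singleton or predicate types). The names `CachedGeneric` and `slow_plus_lookup` are hypothetical; the slow lookup is a stand-in for the full find-and-sort procedure.

```python
# Illustrative sketch: memoize the mapping (t1, ..., tn) => m.

def i_plus(x, y):
    return x + y

def f_plus(x, y):
    return x + y

def slow_plus_lookup(classes):
    # Hypothetical stand-in for the expensive naive lookup.
    if classes == (int, int):
        return i_plus
    if classes == (float, float):
        return f_plus
    raise TypeError("not understood: %r" % (classes,))

class CachedGeneric:
    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup
        self.cache = {}      # tuple of argument classes -> method
        self.misses = 0

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        method = self.cache.get(key)
        if method is None:
            # Miss: pay for the slow lookup once, then remember the answer.
            self.misses += 1
            method = self.slow_lookup(key)
            self.cache[key] = method
        return method(*args)

plus = CachedGeneric(slow_plus_lookup)
```

After the first call with a given combination of classes, later calls with the same combination cost one tuple construction and one hash lookup.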

Page 12: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Table-based Approach

• N-dimensional tables
  – Keys are concrete classes of actual arguments
  – Values are methods to call
  – Must address size explosion
  – Talk a bit about this later
• Nested tables
  – Keys are concrete classes of actual arguments
  – Values are either other tables or methods to call
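The nested-table variant can be sketched with Python dictionaries, one level per argument. The helper names `nested_lookup` and `nested_insert` are assumptions for illustration, and the sketch assumes a fixed number of arguments per generic.

```python
# Illustrative sketch: nested tables keyed by the concrete class of each argument.

def nested_lookup(table, args):
    node = table
    for a in args:
        node = node.get(type(a))   # one indirection per argument
        if node is None:
            return None            # miss: the caller falls back to slow lookup
    return node                    # the method to call

def nested_insert(table, classes, method):
    node = table
    for c in classes[:-1]:
        node = node.setdefault(c, {})   # build subtables on demand
    node[classes[-1]] = method

def i_plus(x, y):
    return x + y

def f_plus(x, y):
    return x + y

plus_table = {}   # the cache of generic +, initially empty
```

Filling the table with `nested_insert(plus_table, (int, int), i_plus)` and then `nested_insert(plus_table, (float, float), f_plus)` follows the same progression as the three table examples below.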

Page 13: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Table Example One

[Figure: the program's call (+ x y) refers to the cache of generic +, which is still empty (#f).]

Page 14: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Table Example Two

[Figure: after a call with x = 0 and y = 0, the cache of generic + holds nested tables keyed by <int> and then <int>, whose value is the code for i+.]

Page 15: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Table Example Three

[Figure: after a further call with x = 0.0 and y = 0.0, the cache of + holds the <int>, <int> path to the code for i+ and a second path, starting at <flo>, to the code for f+.]

Page 16: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Table-based Critique

• Pros
  – Simple
  – Amenable to profile-guided reordering
• Cons
  – Too many indirections
  – Very big
    • Build it on demand
    • Share subtables
  – Only works for class types
    • Can use multiple tables

Page 17: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Engine Node Dispatch

• Glenn Burke and myself at Harlequin, Inc. circa 1996-
  – Partial Dispatch: Optimizing Dynamically-Dispatched Multimethod Calls with Compile-Time Types and Runtime Feedback, 1998

• Shared decision tree built out of executable engine nodes

• Incrementally grows trees on demand upon miss

• Engine nodes are executed to perform some action, typically tail calling another engine node and eventually tail calling the chosen method

• Appropriate engine nodes can be utilized to handle monomorphic, polymorphic, and megamorphic discrimination cases corresponding to single, linear, and table lookup
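A rough Python model of engine nodes, assuming each node is an object with a `run` entry point that passes control to the next node. Python has no tail calls, so ordinary calls stand in for them, and a miss simply raises instead of growing the tree. The class names are hypothetical and do not reflect the Harlequin implementation.

```python
# Illustrative sketch: a decision tree built from executable nodes.

def not_understood(args):
    # Stand-in for the miss path, which would extend the tree on demand.
    raise LookupError("no engine node for %r" % (args,))

class MethodEngine:
    """Leaf: calls the chosen method."""
    def __init__(self, fn):
        self.fn = fn
    def run(self, args):
        return self.fn(*args)

class MonoEngine:
    """Monomorphic case: checks one argument against a single class."""
    def __init__(self, argnum, cls, next_node):
        self.argnum, self.cls, self.next_node = argnum, cls, next_node
    def run(self, args):
        if type(args[self.argnum]) is self.cls:
            return self.next_node.run(args)
        return not_understood(args)

class LinearEngine:
    """Polymorphic case: linear search over a few (class, node) pairs."""
    def __init__(self, argnum, entries):
        self.argnum, self.entries = argnum, entries
    def run(self, args):
        cls = type(args[self.argnum])
        for c, node in self.entries:
            if c is cls:
                return node.run(args)
        return not_understood(args)

# Tree for + after seeing (int, int) and (float, float) as inputs.
i_method = MethodEngine(lambda x, y: ("i+", x + y))
f_method = MethodEngine(lambda x, y: ("f+", x + y))
discriminator = LinearEngine(0, [(int, MonoEngine(1, int, i_method)),
                                 (float, MonoEngine(1, float, f_method))])
```

A megamorphic node would follow the same shape with a dictionary in place of the linear list.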

Page 18: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Engine Node Dispatch Picture

[Figure: a call to \+ reaches the <generic>, whose discriminator is a <linear-engine> (linear ep) with entries for <i> and <f>. Each entry leads to a <mono-engine> (mono ep) checking <i> or <f>, which leads to a <method> (MEP). The method nodes shown are the \+ <i>,<i> method, the \+ <f>,<f> method, and one labeled NAM.]

define method \+ (x :: <i>, y :: <i>) … end;
define method \+ (x :: <f>, y :: <f>) … end;

Seen (<i>, <i>) and (<f>, <f>) as inputs.

Page 19: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Engine Dispatch Critique

• Pros:
  – Portable
  – Introspectable
  – Code shareable
• Cons:
  – Data and code indirections
  – Sharing overhead
  – Hard to inline
  – Fewer partial evaluation opportunities

Page 20: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Lookup DAG

• Input is argument values

• Output is method or error

• Lookup DAG is a decision tree with identical subtrees shared to save space

• Each interior node has a set of outgoing class-labeled edges and is labeled with an expression

• Each leaf node is labeled with a method which is either user specified, not-understood, or ambiguous.

Page 21: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Lookup DAG Picture

• From Chambers and Chen OOPSLA-99

Page 22: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Lookup DAG Evaluation

• Formals start bound to actuals
• Evaluation starts from root
• To evaluate an interior node
  – Evaluate its expression, yielding v
  – Search its edges for the unique edge e whose label is the class of the result v; the edge's target node is then evaluated recursively
• To evaluate a leaf node
  – Return its method
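The evaluation rules can be sketched in Python as follows. The `Leaf` and `Interior` classes and the example DAG for + are assumptions made for illustration; a loop replaces the recursion, and the not-understood leaf is shared between two paths to show why the structure is a DAG rather than a tree.

```python
# Illustrative sketch: evaluating a lookup DAG.

class Leaf:
    def __init__(self, method):
        self.method = method

class Interior:
    def __init__(self, expr, edges):
        self.expr = expr     # function from the formals' bindings to a value
        self.edges = edges   # dict: class label -> target node

def evaluate(root, env):
    node = root
    while isinstance(node, Interior):
        v = node.expr(env)            # evaluate the node's expression
        node = node.edges[type(v)]    # follow the edge labeled with v's class
    return node.method

def get_x(env):
    return env["x"]

def get_y(env):
    return env["y"]

i_leaf = Leaf("i+")
f_leaf = Leaf("f+")
nu_leaf = Leaf("not-understood")   # shared by both mismatch paths

root = Interior(get_x, {
    int: Interior(get_y, {int: i_leaf, float: nu_leaf}),
    float: Interior(get_y, {int: nu_leaf, float: f_leaf}),
})
```

This toy DAG only has edges for two classes; a real one would have an edge for every class the expression can statically produce.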

Page 23: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Lookup DAG Evaluation Picture

• From Chambers and Chen OOPSLA-99

Page 24: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Lookup DAG Construction

function BuildLookupDag (DF: canonical dispatch function): lookup DAG =
  create empty lookup DAG G
  create empty table Memo
  cs: set of Case := Cases(DF)
  G.root := buildSubDag(cs, Exprs(cs))
  return G

function buildSubDag (cs: set of Case, es: set of Expr): node =
  n: node
  if (cs, es)->n in Memo then return n
  if empty?(es) then
    n := create leaf node in G
    n.method := computeTarget(cs)
  else
    n := create interior node in G
    expr: Expr := pickExpr(es, cs)
    n.expr := expr
    for each class in StaticClasses(expr) do
      cs': set of Case := targetCases(cs, expr, class)
      es': set of Expr := (es - {expr}) ∩ Exprs(cs')
      n': node := buildSubDag(cs', es')
      e: edge := create edge from n to n' in G
      e.class := class
    end for
  add (cs, es)->n to Memo
  return n

function computeTarget (cs: set of Case): Method =
  methods: set of Method := min<=(Methods(cs))
  if |methods| = 0 then return m-not-understood
  if |methods| > 1 then return m-ambiguous
  return single element m of methods

Page 25: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Single Dispatch Binary Search Tree

• Label classes with integers using inorder walk with goal to get subclasses to form a contiguous range

• Implement Class => Target Map as a binary search tree, balanced using execution frequency information
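A small Python sketch of the numbering and range lookup. It assumes a single-inheritance hierarchy and uses a depth-first preorder walk, which gives each class and its subclasses a contiguous range of integers; the hierarchy, the targets `m-a` and `m-d`, and all function names are hypothetical. The standard `bisect` module stands in for a generated binary search tree, and no frequency-based balancing is attempted.

```python
from bisect import bisect_right

# Illustrative sketch: number classes so subclasses form contiguous ranges.

def number_classes(children, root):
    ids, last = {}, {}
    counter = 0
    def walk(c):
        nonlocal counter
        ids[c] = counter
        counter += 1
        for k in children.get(c, ()):
            walk(k)
        last[c] = counter - 1   # highest id among c and its subclasses
    walk(root)
    return ids, last

# Assumed hierarchy: <any> above <a> and <d>; <b>, <c> under <a>; <e> under <d>.
children = {"<any>": ["<a>", "<d>"], "<a>": ["<b>", "<c>"], "<d>": ["<e>"]}
ids, last = number_classes(children, "<any>")

# Class => Target map as sorted range starts: one method on <a>, one on <d>.
starts = [ids["<any>"], ids["<a>"], ids["<d>"]]
targets = ["not-understood", "m-a", "m-d"]

def lookup(class_id):
    # Binary search for the range containing class_id.
    return targets[bisect_right(starts, class_id) - 1]
```

Because the ranges are contiguous, testing whether an object is an instance of `<a>` or any subclass reduces to two integer comparisons against `ids["<a>"]` and `last["<a>"]`.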

Page 26: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

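Class Numbering

[Figure: a class hierarchy with <any> at the top, <a> and <d> below it, and <b>, <c>, and <e> at the bottom, with the classes labeled by the integers 0 through 5.]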

Page 27: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Binary Search Tree Picture

• From Chambers and Chen OOPSLA-99

Page 28: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Critique of Decision Tree

• Pros
  – Efficient to construct and execute
  – Can incorporate profile information to bias execution
  – Amenable to on-demand construction
  – Amenable to partial evaluation and method inlining
  – Can easily incorporate static class information
  – Amenable to inlining into call sites
  – Permits arbitrary predicates
  – Mixes linear, binary, and array lookups
  – Fast on modern CPUs
• Cons
  – Requires code gen / compiler to produce best ones

Page 29: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Inline Call Caches

• Assumption:
  – Method distribution is usually peaked and call-site specific
• Each call site has its own cache
• Use call instruction as cache
  – Calls last taken method
  – Method prologue checks for correct arguments
  – Calls slow lookup on miss, which also patches the call instruction

• Deutsch and Schiffman, 1984
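Python cannot patch a call instruction, so the following sketch models the call site as an object with a mutable `target` field. The prologue check lives in the method entry, as on the slide, and a miss runs the slow lookup and re-points the site. All names (`CallSite`, `make_method`, `METHODS`) are assumptions for illustration.

```python
# Illustrative sketch: an inline cache modeled with a mutable call-site object.

def make_method(classes, body):
    def entry(site, *args):
        # Prologue: check that the arguments match this method's classes.
        if tuple(type(a) for a in args) != classes:
            return site.miss(*args)
        return body(*args)
    return entry

class CallSite:
    def __init__(self, lookup):
        self.lookup = lookup   # slow path: argument classes -> method entry
        self.target = None     # stands in for the patched call instruction
        self.misses = 0

    def miss(self, *args):
        self.misses += 1
        # Slow lookup, then "patch" the call site with the method found.
        self.target = self.lookup(tuple(type(a) for a in args))
        return self.target(self, *args)

    def call(self, *args):
        if self.target is None:        # unpatched site calls lookup
            return self.miss(*args)
        return self.target(self, *args)

METHODS = {
    (int, int): make_method((int, int), lambda x, y: x + y),
    (float, float): make_method((float, float), lambda x, y: x + y),
}

site = CallSite(METHODS.__getitem__)
```

A site that keeps seeing the same classes pays only for the prologue check; a site that alternates between classes misses on every call, which motivates the polymorphic variant later in the deck.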

Page 30: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Inline Caching Example One

[Figure: the call site (+ x y) in the program initially calls lookup.]

Page 31: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

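Inline Caching Example Two

[Figure: after a call with x = 0 and y = 0, the call site (+ x y) calls i+ directly. The prologue of i+ checks the argument classes and tail calls lookup on a mismatch.]

i+:
  (let ((tx (class-of x)) (ty (class-of y)))
    (unless (and (== tx <int>) (== ty <int>))
      (tail-call lookup)))
  ... i+ code ...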

Page 32: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Inline Caching Example Three

[Figure: after a call with x = 0.0 and y = 0.0, the call site (+ x y) calls f+. Both method prologues are shown.]

i+:
  (let ((tx (class-of x)) (ty (class-of y)))
    (unless (and (== tx <int>) (== ty <int>))
      (tail-call lookup)))
  ... i+ code ...

f+:
  (let ((tx (class-of x)) (ty (class-of y)))
    (unless (and (== tx <flo>) (== ty <flo>))
      (tail-call lookup)))
  ... f+ code ...

Page 33: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Inline Caching Critique

• Pros
  – Fast dispatch sequence for hit
  – Usually high hit rate (90-95% for Smalltalk)
• Cons
  – Uses self-modifying code
  – Slow for misses
  – Depends on method distribution spike
  – Might be less beneficial for multimethods

Page 34: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Polymorphic Inline Caching

• Handles polymorphically peaked distribution

• Generate call-site specific dispatch stub

• Hölzle et al., 1991
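A real polymorphic inline cache generates a machine-code stub per call site. The Python sketch below only models the stub's behavior with a per-site list of (classes, method) entries that is searched linearly and extended on a miss. `PolymorphicCallSite` and `METHODS` are hypothetical names.

```python
# Illustrative sketch: a call-site specific stub modeled as a growing list.

class PolymorphicCallSite:
    def __init__(self, lookup):
        self.lookup = lookup   # slow path: argument classes -> method
        self.entries = []      # the "stub": (classes, method) pairs in test order
        self.misses = 0

    def call(self, *args):
        key = tuple(type(a) for a in args)
        for classes, method in self.entries:
            if classes == key:          # one test per cached case
                return method(*args)
        # Miss: do the slow lookup and extend the stub with a new case.
        self.misses += 1
        method = self.lookup(key)
        self.entries.append((key, method))
        return method(*args)

METHODS = {
    (int, int): lambda x, y: ("i+", x + y),
    (float, float): lambda x, y: ("f+", x + y),
}

site = PolymorphicCallSite(METHODS.__getitem__)
```

Unlike the single-entry inline cache, a site that alternates between two class combinations misses only twice in total.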

Page 35: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

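Polymorphic Inline Caching Example One

[Figure: the call site (+ x y) in the program calls a stub whose only content is (lookup x y). The methods i+ and f+ are shown alongside, with bodies (i+ x y) and (f+ x y).]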

Page 36: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Polymorphic Inline Caching Example Two

[Figure: after a call with x = 0 and y = 0, the stub for the call site (+ x y) tests for <int> arguments before falling back to lookup.]

(let ((tx (class-of x)) (ty (class-of y)))
  (if (and (== tx <int>) (== ty <int>))
      (tail-call i+)
      (tail-call lookup)))

Page 37: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Polymorphic Inline Caching Example Three

[Figure: after a call with x = 0.0 and y = 0.0, the stub for the call site (+ x y) tests for <int> arguments, then for <flo> arguments, before falling back to lookup.]

(let ((tx (class-of x)) (ty (class-of y)))
  (if (and (== tx <int>) (== ty <int>))
      (tail-call i+)
      (if (and (== tx <flo>) (== ty <flo>))
          (tail-call f+)
          (tail-call lookup))))

Page 38: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Polymorphic Inline Cache Critique

• Pros
  – Faster for multiple peaked distributions
• Cons
  – Slow for uniform distribution
  – Requires runtime code generation
  – Doesn’t scale quite as well for multimethods and predicate types

Page 39: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Other Multimethod Approaches

• Hash table indexed by N keys
  – Kiczales and Rodriguez 1989
• Compressed N+1 dimensional dispatch table
  – Amiel et al. 1994
  – Pang et al. 1999

Page 40: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Variations

• Inline method bodies into leaves of decision tree

• Reorder decision tree based on method distributions

• Fold slot access into dispatch

Page 41: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Open Problems

• Feed static information into dynamic dispatch

• Smaller

• Faster

• More adaptive

Page 42: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Readings

• Deutsch and Schiffman 1984
• Kiczales and Rodriguez 1989
• Dussud 1989
• Moon and Cypher 19??
• Amiel et al. 1994
• Pang et al. 1999
• Hölzle and Ungar 1994
• Chen and Turau 1994

• Peter Lee Advanced Language Implementation 1991

Page 43: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Acknowledgements

• This lecture includes some material from Craig Chambers’ OOPSLA course on OO language implementation.

Page 44: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Assignment 3 Hint

Create methods with the following construction:

(dm make-method ((n <int>) (types <lst>) (body <lst>) => <met>)
  (select n
    ((0) (fun () ...))
    ((1) (fun ((a0 (elt types 0))) ...))
    ((2) (fun ((a0 (elt types 0)) (a1 (elt types 1))) ...))
    ...))

Page 45: OODL Runtime Optimizations Jonathan Bachrach MIT AI Lab Feb 2001

Assignment 4

• Write an associative dispatch cache

• Use linear lookup

• Include profile-guided reordering

• Don’t need to handle singleton dispatch
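One possible shape for such a cache, sketched in Python rather than Proto and not intended as the reference solution. It assumes the cache is an association list searched linearly, that the profile is a per-entry hit count, and that an entry moves forward past neighbors with fewer hits. `AssocDispatchCache` and its fields are hypothetical names.

```python
# Illustrative sketch: associative dispatch cache with linear lookup
# and hit-count based reordering.

class AssocDispatchCache:
    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup
        self.entries = []   # each entry is [classes, method, hits]

    def lookup(self, classes):
        for i, entry in enumerate(self.entries):
            if entry[0] == classes:
                entry[2] += 1
                # Profile-guided reordering: bubble the entry toward the
                # front while its predecessor has fewer hits.
                while i > 0 and self.entries[i - 1][2] < entry[2]:
                    self.entries[i - 1], self.entries[i] = (
                        self.entries[i], self.entries[i - 1])
                    i -= 1
                return entry[1]
        # Miss: fall back to the slow lookup and append a new entry.
        method = self.slow_lookup(classes)
        self.entries.append([classes, method, 1])
        return method

METHODS = {(int, int): "i+", (float, float): "f+"}
cache = AssocDispatchCache(METHODS.__getitem__)
```

Frequently used class combinations drift toward the front of the list, so the common case needs fewer comparisons.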