UNIVERSITY OF DELAWARE • COMPUTER & INFORMATION SCIENCES DEPARTMENT

Optimizing Compilers CISC 673
Spring 2009: Dynamic Compilation II
John Cavazos, University of Delaware



What is in a Dynamic Compiler?

• Interpretation
  – Popular approach for high-level languages, e.g., Python, APL, SNOBOL, BCPL, Perl, MATLAB
  – Useful for memory-challenged environments
  – Low startup time & space overhead, but much slower than native code execution
• MMI (Mixed Mode Interpreter) [Suganuma’01]
  – Fast interpreter implemented in assembler
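The following is a minimal sketch of a stack-based bytecode interpreter. It shows where the interpretive overhead comes from: every instruction pays for a fetch and a dispatch branch before doing any useful work. The instruction format and the opcodes PUSH, ADD and MUL are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, operand); ADD and MUL ignore the operand.
Instr = Tuple[str, int]

def interpret(program: List[Instr]) -> int:
    """Run a tiny stack-machine program and return the value left on top of the stack."""
    stack: List[int] = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]      # fetch
        pc += 1
        if op == "PUSH":           # decode and dispatch: paid on every instruction
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]

# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0)]
```

A compiler would turn the same program into straight-line native code with no per-instruction dispatch. That difference is the "much slower than native code" gap noted above.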


What is in a Dynamic Compiler?

• Quick compilation
  – Reduced set of optimizations for fast compilation, little inlining
• Full compilation
  – Full optimizations only for selected hot methods
• Classic just-in-time compilation
  – Compile methods to native code on first invocation, e.g., ParcPlace Smalltalk-80, Self-91
  – Initial high (time & space) overhead for each compilation
  – Precludes use of sophisticated optimizations (e.g., SSA)
  – Responsible for many of today’s myths


Interpretation vs JIT

[Figure: two bar charts comparing Interpreter and Compiler, each bar split into initial overhead and execution. Left: a run with 20 time units of execution. Right: a run with 2000 time units of execution.]
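The chart's point reduces to simple arithmetic. Compiling pays a one-time overhead but makes each later execution cheaper, so it wins only once a method runs often enough. The sketch below uses hypothetical costs, not the values plotted in the chart: compiling wins when overhead + n·c < n·i, that is, when n > overhead / (i − c).

```python
import math

def total_time(runs: int, per_run: float, overhead: float = 0.0) -> float:
    """Total cost of executing a method `runs` times: one-time overhead plus per-run cost."""
    return overhead + runs * per_run

def break_even_runs(interp_per_run: float, compiled_per_run: float,
                    compile_overhead: float) -> int:
    """Smallest number of runs for which compiling is strictly cheaper than interpreting."""
    saving_per_run = interp_per_run - compiled_per_run
    if saving_per_run <= 0:
        raise ValueError("compiled code must be faster per run to ever break even")
    # Compiling wins when overhead + n*c < n*i, i.e. n > overhead / (i - c).
    return math.floor(compile_overhead / saving_per_run) + 1
```

Take hypothetical costs of 10 units per interpreted run, 1 unit per compiled run and 90 units to compile. At 10 runs both strategies cost 100 units, so compiling first becomes cheaper at run 11.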


Selective Optimization

Hypothesis: most execution time is spent in a small percentage of methods

Idea: use two execution strategies
1. Interpreter or non-optimizing compiler
2. Full-fledged optimizing compiler

Strategy:
• Use option 1 for the initial execution of all methods
• Profile to find the “hot” subset of methods
• Use option 2 on this subset
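The profiling step has to turn a per-method profile into a "hot" subset. This sketch assumes one simple rule for illustration: take the hottest methods first until they cover a target fraction of the observed time. The method names and times are hypothetical.

```python
from typing import Dict, List

def hot_subset(time_by_method: Dict[str, float], coverage: float) -> List[str]:
    """Return the hottest methods whose cumulative time reaches `coverage` of the total."""
    total = sum(time_by_method.values())
    chosen: List[str] = []
    covered = 0.0
    # Greedy, hottest first, so the chosen set is as small as possible for the target.
    for name, t in sorted(time_by_method.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(name)
        covered += t
    return chosen

# Hypothetical profile: time units observed per method.
profile = {"parse": 5.0, "loop": 80.0, "init": 1.0, "emit": 14.0}
```

With a 90% target, only `loop` and `emit` are handed to the optimizing compiler. The other two methods keep running under option 1.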


Selective Optimization

[Figure: two bar charts comparing Interpreter, Compiler, and Selective, each bar split into initial overhead and execution. Left: execution of 20 time units. Right: execution of 2000 time units.]

Selective opt: compiles 20% of methods, representing 99% of execution time
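The 20%/99% observation explains why selective optimization can approach the compiler's execution time at a fraction of its compile cost. Below is a back-of-the-envelope model built on several assumptions. Interpreted code is taken to be 10x slower than optimized code. Compile cost is proportional to the number of methods compiled. Profiling overhead and the time hot methods spend interpreted before they are detected are ignored. All parameter values are hypothetical.

```python
def strategy_times(n_methods: int, compile_cost: float, opt_exec_time: float,
                   interp_slowdown: float, hot_fraction: float,
                   hot_time_share: float) -> dict:
    """Total time (overhead + execution) for the three strategies in the chart."""
    # Interpreter: no compile overhead, every method runs slowly.
    interpreter = interp_slowdown * opt_exec_time
    # Compiler: compile every method up front, then run at optimized speed.
    compiler = n_methods * compile_cost + opt_exec_time
    # Selective: compile only the hot fraction; the rest stays interpreted.
    selective = (hot_fraction * n_methods * compile_cost
                 + hot_time_share * opt_exec_time
                 + (1 - hot_time_share) * interp_slowdown * opt_exec_time)
    return {"interpreter": interpreter, "compiler": compiler, "selective": selective}

times = strategy_times(n_methods=1000, compile_cost=0.5, opt_exec_time=100.0,
                       interp_slowdown=10.0, hot_fraction=0.2, hot_time_share=0.99)
```

With these made-up numbers the totals are roughly 1000 (interpreter), 600 (compile everything) and 209 (selective). Selective optimization pays a fifth of the compile overhead, and only 1% of the time runs at interpreted speed.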


Designing an Adaptive Optimization System
• What is the system architecture?
• What are the profiling mechanisms and policies for driving recompilation?
• How effective are these systems?


Basic Structure of a Dynamic Compiler

[Figure: Program → Structural (inlining, unrolling, loop permutation) → Scalar (CSE, constants, expressions) → Memory (scalar replacement, pointers) → Register allocation → Scheduling, peephole → Machine code]

Still needs a good core compiler, but more


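Basic Structure of a Dynamic Compiler

[Figure: the Program runs through an Interpreter or Simple Translation as the Executing Program. Instrumented code produces Raw Profile Data, which the Profile Processor turns into a Processed Profile. The Controller combines the Processed Profile with a History (prior decisions, compile time) to make Compilation decisions for the Compiler subsystem (Optimizations).]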
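The data flow in the figure can be outlined as a small control loop. This is a hypothetical Python sketch, not the design of any particular VM. Raw samples (names of sampled methods) are aggregated by a profile processor. A controller turns the processed profile into compilation decisions. Its history records only prior decisions (not compile time), so it escalates a hot method one level at a time and never requests the same level twice.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

class ProfileProcessor:
    """Aggregates raw profile data (here, names of sampled methods) into per-method counts."""
    def process(self, raw_samples: Iterable[str]) -> Counter:
        return Counter(raw_samples)

class Controller:
    """Turns a processed profile into compilation decisions, consulting its own history."""
    def __init__(self, hot_threshold: int, max_level: int = 2) -> None:
        self.hot_threshold = hot_threshold
        self.max_level = max_level
        self.history: Dict[str, int] = {}  # method -> level already requested (0 = baseline)

    def decide(self, profile: Counter) -> List[Tuple[str, int]]:
        decisions: List[Tuple[str, int]] = []
        for method, count in profile.most_common():
            level = self.history.get(method, 0)
            if count >= self.hot_threshold and level < self.max_level:
                decisions.append((method, level + 1))  # request the next optimization level
                self.history[method] = level + 1
        return decisions

processor = ProfileProcessor()
controller = Controller(hot_threshold=2)
profile = processor.process(["a", "b", "a", "a", "c"])
```

In a real system each decision would go to the compiler subsystem, and the next profile would come from the newly compiled code. Here the same profile is reused to show the history at work.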


Method Profiling

• Counters
• Call stack sampling
• Combinations


Method Profiling: Counters
• Insert a method-specific counter on method entry and loop back edges
• Counts how often a method is called and approximates how much time is spent in a method
• Very popular approach: Self, HotSpot
• Issues: the overhead of incrementing the counter can be significant, and counters are not present in optimized code


Method Profiling: Counters

foo ( … ) {
    fooCounter++;
    if (fooCounter > Threshold) {
        recompile( … );
    }
    . . .
}
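Here is a runnable Python sketch of the counter idea. A decorator plays the role of the instrumentation inserted on method entry. `recompile`, `THRESHOLD` and `recompiled` are hypothetical stand-ins for the VM's request to the optimizing compiler, and counting loop back edges is left out. Unlike the pseudocode, which tests `fooCounter > Threshold` on every call, the sketch fires only on the first call past the threshold. This mimics a VM that installs the optimized code after one request.

```python
import functools
from typing import Callable, List

THRESHOLD = 3                  # hypothetical value; real VMs tune this
recompiled: List[str] = []     # records which methods crossed the threshold

def recompile(name: str) -> None:
    """Hypothetical hook: a real VM would queue the method for optimizing compilation."""
    recompiled.append(name)

def counted(fn: Callable) -> Callable:
    """Wrap `fn` with a per-method entry counter that requests recompilation once."""
    count = 0

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        nonlocal count
        count += 1                   # fooCounter++ on method entry
        if count == THRESHOLD + 1:   # first call with count > THRESHOLD
            recompile(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper

@counted
def foo(x: int) -> int:
    return x * 2

for i in range(5):
    foo(i)
```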


Method Profiling: Call Stack Sampling

• Periodically record which method(s) are on the call stack
• Approximates the amount of time spent in each method
• Can be compiled into the code (Jikes RVM, JRockit) or use hardware sampling
• Issues: timer-based sampling is not deterministic


Method Profiling: Call Stack Sampling

[Figure: call stacks recorded at successive samples: ABC, AB, A, AB, ABC, ABC, ...]
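Turning such samples into per-method estimates is a counting exercise. The sketch assumes each sample is a call stack written outermost-first, so "ABC" means A called B, which called C. It counts two things: the method executing at the top of the stack (self time) and every method present on the stack (inclusive time). The sample sequence is read off the figure.

```python
from collections import Counter
from typing import Iterable, Tuple

def summarize(samples: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (self, inclusive) sample counts per method from outermost-first stacks."""
    self_counts: Counter = Counter()
    inclusive: Counter = Counter()
    for stack in samples:
        self_counts[stack[-1]] += 1   # method that was executing when the sample was taken
        inclusive.update(set(stack))  # each method on the stack, once per sample
    return self_counts, inclusive

self_c, incl_c = summarize(["ABC", "AB", "A", "AB", "ABC", "ABC"])
```

Here C has the most self samples (3 of 6) and A appears in every sample. A and C are therefore different kinds of candidates: C is hot code in its own right, while A's time is spent mostly in its callees.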


Method Profiling: Mixed Combinations

Use counters initially and sampling later on (IBM DK for Java)

foo ( … ) {
    fooCounter++;
    if (fooCounter > Threshold) {
        recompile( … );
    }
    . . .
}

[Figure: sampled call stack ABC]


Method Profiling: Mixed Software/Hardware Combination

Use interrupts & sampling

foo ( … ) {
    if (flag is set) {
        sample( … );
    }
    . . .
}

[Figure: sampled call stack ABC]


Recompilation Policies: Which Candidates to Optimize?

Problem: given optimization candidates, which should be optimized?

• Counters:
  1. Optimize the method that surpasses the threshold
     – Simple, but hard to tune; doesn’t consider context
  2. Optimize a method on the call stack based on inlining policies
     – Addresses the context issue
• Call stack sampling:
  1. Optimize all methods that are sampled
     – Simple, but doesn’t consider the frequency of sampled methods
  2. Use a cost/benefit model
     – Seemingly complicated, but easy to engineer
     – Maintenance free
     – Naturally supports multiple optimization levels


Jikes RVM: Recompilation Policy – Cost/Benefit Model

Define:
• cur: current optimization level for method m
• Exe(j): expected future execution time at level j
• Comp(j): compilation cost at optimization level j

Choose j > cur that minimizes Exe(j) + Comp(j).
If Exe(j) + Comp(j) < Exe(cur), recompile at level j.

Assumptions:
• Sample data determines how long a method has executed
• The method will execute as much in the future as it has in the past
• Compilation cost and speedup are offline averages
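The cost/benefit rule translates almost line for line into code. This sketch follows the definitions and assumptions above. Exe(cur) is taken to be the time observed so far. Exe(j) scales that time by the ratio of offline speedups. Comp(j) is an offline compile-rate estimate multiplied by the method's size. The speedup and compile-rate tables are made-up placeholders, not Jikes RVM's measured values.

```python
from typing import Dict, Optional

# Hypothetical offline averages per optimization level (level 0 = baseline).
SPEEDUP: Dict[int, float] = {0: 1.0, 1: 4.0, 2: 5.0}        # relative to baseline code
COMPILE_COST_PER_BYTE: Dict[int, float] = {1: 0.02, 2: 0.1}  # time units per bytecode byte

def choose_level(cur: int, time_so_far: float, size_bytes: int) -> Optional[int]:
    """Return the level j > cur minimizing Exe(j) + Comp(j), or None if none beats Exe(cur)."""
    exe_cur = time_so_far                    # future time assumed equal to past time
    best_level, best_cost = None, exe_cur    # recompile only if strictly cheaper than Exe(cur)
    for j in sorted(SPEEDUP):
        if j <= cur:
            continue
        exe_j = exe_cur * SPEEDUP[cur] / SPEEDUP[j]
        comp_j = COMPILE_COST_PER_BYTE[j] * size_bytes
        if exe_j + comp_j < best_cost:
            best_level, best_cost = j, exe_j + comp_j
    return best_level
```

A briefly sampled method is left alone. A moderately hot one goes to the cheaper level 1, and a very hot one justifies the more expensive level 2. This is how the model "naturally supports multiple optimization levels."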


Startup Programs: Jikes RVM [Hind et al.’04]

[Figure: speedup over baseline (y-axis, 0 to 5) for startup runs of db/10, jack/10, ipsixql/short, jess/10, jbb/12000, mtrt/10, javac/10, xerces/short, mpeg/10, compress/10, daikon/short, soot/short, jack/100, xerces/long, javac/100, jess/100, mtrt/100, db/100, ipsixql/long, soot/long, jbb/200000, compress/100, mpeg/100, daikon/long, and the geometric mean. Series: JIT 0, JIT 1, JIT 2. No FDO, Mar’04, AIX/PPC.]


Startup Programs: Jikes RVM

[Figure: speedup over baseline (y-axis, 0 to 5) for the same startup runs (db/10 through daikon/long) and the geometric mean. Series: JIT 0, JIT 1, JIT 2, Model. No FDO, Mar’04, AIX/PPC.]


Steady State: Jikes RVM

[Figure: steady-state speedup over baseline (y-axis, 0 to 7) for jbb-300, ipsixql, compress, jess, db, javac, mpegaudio, mtrt, jack, and the geometric mean. Series: JIT 0, JIT 1, JIT 2. No FDO, Mar’04, AIX/PPC.]


Steady State: Jikes RVM

[Figure: steady-state speedup over baseline (y-axis, 0 to 7) for jbb-300, ipsixql, compress, jess, db, javac, mpegaudio, mtrt, jack, and the geometric mean. Series: JIT 0, JIT 1, JIT 2, Model.]


Feedback-Directed Optimization (FDO)

Exploit information gathered at run time to optimize execution
• “Selective optimization”: what to optimize
• “FDO”: how to optimize


Advantages of FDO
• Can exploit dynamic information that cannot be inferred statically
• System can change and revert decisions when conditions change
• Runtime binding allows more flexible systems


Challenges for automatic online FDO

• Compensate for profiling overhead
• Compensate for runtime transformation overhead
• Account for the partial profile available and for changing conditions


Profiling for What to Do

• Clients
  – Inlining, unrolling, method dispatch
  – Dispatch tables, synchronization services, GC
  – Prefetching: misses, hardware performance monitors [Adl-Tabatabai et al.’04]
  – Code layout
• Profiled information: values, loop counts, edges & paths
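Code layout is one client that edge profiles serve directly. The sketch below is a simple greedy scheme chosen for illustration; production layout algorithms are more elaborate. Starting at the entry block, it places next the most frequently taken successor that has not been placed yet, so the hot path becomes fall-through code. Blocks never reached that way are appended in name order. The block names and edge counts are hypothetical.

```python
from typing import Dict, List, Optional

def greedy_layout(entry: str, edges: Dict[str, Dict[str, int]]) -> List[str]:
    """Order basic blocks so each block is followed by its hottest unplaced successor."""
    blocks = set(edges) | {s for succs in edges.values() for s in succs}
    order: List[str] = []
    placed = set()
    current: Optional[str] = entry
    while current is not None:
        order.append(current)
        placed.add(current)
        candidates = [(count, succ) for succ, count in edges.get(current, {}).items()
                      if succ not in placed]
        if candidates:
            current = max(candidates)[1]          # hottest unplaced successor falls through
        else:
            remaining = sorted(blocks - placed)   # start a new chain with a leftover block
            current = remaining[0] if remaining else None
    return order

# Hypothetical edge profile: the "else" branch is taken 90% of the time.
edges = {
    "entry": {"cond": 100},
    "cond": {"then": 10, "else": 90},
    "then": {"join": 10},
    "else": {"join": 90},
    "join": {"exit": 100},
}
```

With this profile the layout is entry, cond, else, join, exit, then. The rarely taken `then` block is moved out of the hot path, which a static layout following source order would not do.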


Profiling for What to Do

Myth: Sophisticated profiling is too expensive to perform online

Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead


Method Profiling: Timer Based

class Thread
    scheduler (...) { ... flag = 1; }
    void handler (...) {
        // sample stack, perform GC, swap threads, etc.
        ....
        flag = 0;
    }

foo ( … ) {
    // on method entry, exit, & all loop backedges
    if (flag) {
        handler( … );
    }
    . . .
}

[Figure: call stack ABC, with an “if (flag) handler();” check at each method entry, exit, and loop backedge]

Useful for more than profiling (Jikes RVM):
• Schedule garbage collection
• Thread scheduling policies, etc.
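The flag-and-handler scheme can be simulated deterministically. In this sketch, `tick()` is a hypothetical stand-in for the timer interrupt that sets the flag. `yieldpoint()` is the check the compiler would insert on method entry, exit and loop backedges, and the handler records which method was running, then clears the flag. A real VM sets the flag from an asynchronous timer; the explicit ticks here only keep the example reproducible.

```python
from collections import Counter

class Sampler:
    def __init__(self) -> None:
        self.flag = False
        self.samples: Counter = Counter()

    def tick(self) -> None:
        """Simulated timer interrupt: only set the flag; running code notices it later."""
        self.flag = True

    def handler(self, method: str) -> None:
        """Take a sample (a VM could also run GC or switch threads here), then clear the flag."""
        self.samples[method] += 1
        self.flag = False

    def yieldpoint(self, method: str) -> None:
        """Inserted check: a cheap flag test on the common path."""
        if self.flag:
            self.handler(method)

def loop_body(s: Sampler, iterations: int, tick_every: int) -> None:
    for i in range(iterations):
        if i % tick_every == 0:
            s.tick()                   # simulated timer firing
        s.yieldpoint("loop_body")      # loop backedge yieldpoint

sampler = Sampler()
loop_body(sampler, iterations=10, tick_every=3)
```

The common path costs one flag test per yieldpoint, which keeps the overhead low. The expensive work happens only when the timer has fired.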


Arnold-Ryder [PLDI 01]: Full Duplication Profiling

[Figure: Full-Duplication Framework. Checking code and duplicated code; checks placed at method entry and loop backedges.]

• Generate two copies of a method
• Execute the “fast path” most of the time
• Execute the “slow path” with detailed profiling occasionally
• Adopted by J9 due to proven accuracy and low overhead
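The check-and-duplicate structure can be mimicked in a few lines. This sketch assumes a counter-based check: a global countdown that is reset to a hypothetical sample interval each time it fires. Each check at a loop backedge usually falls through to the uninstrumented copy. Only occasionally does it transfer to the instrumented copy, which records detailed profile data. The function names and the interval are illustrative, not taken from the paper or from J9.

```python
from collections import Counter
from typing import List

SAMPLE_INTERVAL = 4                 # hypothetical: take the slow path on every 4th check
countdown = SAMPLE_INTERVAL
value_profile: Counter = Counter()  # detailed data gathered only on the slow path

def check() -> bool:
    """Checking code: decrement a counter and fire when it reaches zero."""
    global countdown
    countdown -= 1
    if countdown == 0:
        countdown = SAMPLE_INTERVAL
        return True
    return False

def step_fast(x: int) -> int:
    return x * x                    # duplicated code without instrumentation

def step_instrumented(x: int) -> int:
    value_profile[x % 3] += 1       # example instrumentation: a simple value profile
    return x * x                    # same semantics as the fast copy

def sum_squares(xs: List[int]) -> int:
    total = 0
    for x in xs:
        # The backedge check selects which copy of the loop body runs this iteration.
        total += step_instrumented(x) if check() else step_fast(x)
    return total

result = sum_squares(list(range(12)))
```

Both copies compute the same result, so sampling changes only how much profile data is collected, not program behavior. Here 3 of the 12 iterations run instrumented.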


Suggested Reading: Dynamic Compilation

Adaptive optimization in the Jalapeno JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47--65, Oct. 2000.