Page 1

© 2009 IBM Corporation

Feedback Directed Dynamic Recompilation for Statically Compiled Languages

Dorit Nuzman, Sergei Dyshel, Revital Eres

IBM Research, Haifa

Thematic Session on Dynamic Compilation

HiPEAC Computing Systems Week

Paris, May 3rd 2013

Page 2

Motivating Scenario

(IBM’s) customer · Independent Software Vendor · Computer System Vendor (e.g., IBM)

Third party software owned by some ISV runs on the customer's Power780 server, and there is a performance problem.

• Increase target platform level? No.
• Increase optimization level? Nope.
• Apply feedback directed optimization? Can’t do.

Page 3

Motivating Scenario

(IBM’s) customer · Independent Software Vendor · Computer System Vendor (e.g., IBM) · Power780 server · Performance problem

[Diagram: Program Source Code → Static Compiler (opt = -O2, arch = common, no-profile) → Native machine code; Fat Binary: native machine code plus Intermediate Representation; Dynamic execution stage: Runtime Engine with Profiler and JIT compiler]

Page 5

Our approach: Fat Binary based, feedback-directed, dynamic recompilation

[Diagram: Program Source Code → Static Compiler → Fat Binary (Native machine code + Intermediate Representation); Dynamic execution stage: Runtime Engine with Profiler and JIT compiler, performing selective profile-driven recompilation]

Used for years in dynamic languages and Java; needed also for static languages.

As opposed to dynamic binary optimization, the approach includes high-level semantic information, which allows aggressive, speculative transformations.
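To make the fat-binary idea concrete, here is a toy Python model. It assumes that callers reach each method through a per-method dispatch table and that the runtime can regenerate a method from the IR stored next to its native code. `FatBinary`, `recompile`, `sum_to_loop` and `toy_jit` are hypothetical names for illustration only; a real runtime engine would patch machine code or call stubs rather than swap Python callables, and its JIT would run an optimizing compiler over the real IR.

```python
from typing import Callable, Dict

Code = Callable[[int], int]


class FatBinary:
    """Toy fat binary: native code plus an IR for every method (hypothetical model)."""

    def __init__(self, native: Dict[str, Code], ir: Dict[str, str]) -> None:
        self.native = dict(native)     # statically compiled versions
        self.ir = dict(ir)             # IR shipped alongside (here just a text tag)
        self.dispatch = dict(native)   # at load time every call runs native code

    def call(self, name: str, arg: int) -> int:
        # All calls go through the dispatch table, so a method can be replaced.
        return self.dispatch[name](arg)

    def recompile(self, name: str, jit: Callable[[str], Code]) -> None:
        # Selective recompilation: regenerate one method from its IR and swap
        # only that method's dispatch entry; other methods are untouched.
        self.dispatch[name] = jit(self.ir[name])


def sum_to_loop(n: int) -> int:
    """Stand-in for the statically compiled (-O2) version."""
    total = 0
    for i in range(n + 1):
        total += i
    return total


def toy_jit(ir: str) -> Code:
    """Hypothetical JIT: recognises one IR pattern and emits a closed form."""
    if ir == "sum 0..n":
        return lambda n: n * (n + 1) // 2
    raise ValueError(f"no optimized code for IR {ir!r}")


fb = FatBinary(
    native={"sum_to": sum_to_loop, "square": lambda x: x * x},
    ir={"sum_to": "sum 0..n", "square": "x * x"},
)
before = fb.call("sum_to", 100)   # runs the native loop
fb.recompile("sum_to", toy_jit)
after = fb.call("sum_to", 100)    # runs the recompiled version
print(before, after)              # 5050 5050
```

Only the `sum_to` entry changes; `square` keeps running its original native version, which is what "selective" recompilation means in this toy.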

Page 6

Background

Modern compilers provide sophisticated optimizations:

•O3 (O4, O5)

•Inter-procedural

•Auto-vectorization/parallelization

•Feedback-directed

•Hardware-specific

These optimizations are usually not used, except in benchmarking and HPC:

•Complicated build process

•Prolonged development & testing cycle

•Per-customer tuning required, which is too costly

•No representative input

We can gain back the lost performance benefit by applying the optimizations dynamically, at runtime.

Page 7

Dynamic Recompilation

Solves the static-compiler usability issue:

–Transparent feedback-directed optimization for the current workload

–Tuning for the current hardware

–Separation of optimization from software production

Allows adaptive optimization.

Allows iterative optimization.

Virtualization & Cloud: physical resources are known only at runtime, and change continuously.

Page 8

Dynamic Recompilation for Static Languages

Other approaches:

•Focus only on very long-running programs with heavy workloads, to compensate for the time spent profiling

•Focus on optimization across consecutive runs of repetitive programs

•Domain specific (focus on a specific optimization, applied to a small pre-selected part of the code)

•Trace-based binary optimization

Our Goal:

Demonstrate an execution environment whose overheads are low enough to allow the dynamic optimizer to speed up execution of the current invocation, for regular programs/workloads.

Page 9

Our approach: Fat Binary based, feedback-directed, dynamic recompilation

[Diagram: Program Source Code → Static Compiler → Fat Binary (Native machine code + Split-IR); Dynamic execution stage: Runtime Engine with Profiler and JIT compiler]

Page 10

Runtime Monitoring and Recompilation

[Timeline diagram, t0 to t9, with two threads]

• Execution and sampling thread: runs the original method version under sampling-based profiling for method hotness, then the instrumented method version (instrumentation-based profiling), then the optimized method version.

• Recompilation thread: instrumentation, then optimization.

• Costs along the timeline: startup cost (loading & mapping), monitoring overhead, recompilation cost, slow instrumented execution, synchronization cost.
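The per-method life cycle on this timeline (original version, then instrumented version, then optimized version) can be modeled as a small state machine. The sketch below is a single-threaded toy in Python: `MethodMonitor`, `hot_samples` and `profile_calls` are hypothetical names and thresholds, the `on_sample` callback stands in for the timer-driven sampling, and the state changes stand in for work the recompilation thread would do. Synchronization between the two threads is not modeled.

```python
from collections import Counter
from enum import Enum, auto
from typing import Dict


class Version(Enum):
    ORIGINAL = auto()
    INSTRUMENTED = auto()
    OPTIMIZED = auto()


class MethodMonitor:
    """Toy model: original -> instrumented -> optimized, per method."""

    def __init__(self, hot_samples: int = 3, profile_calls: int = 5) -> None:
        self.hot_samples = hot_samples        # samples needed to call a method hot
        self.profile_calls = profile_calls    # instrumented calls before optimizing
        self.version: Dict[str, Version] = {}
        self.samples: Counter = Counter()
        self.branch_counts: Dict[str, Counter] = {}

    def on_sample(self, method: str) -> None:
        # Sampling-based profiling: a timer tick landed in `method`.
        self.version.setdefault(method, Version.ORIGINAL)
        self.samples[method] += 1
        if (self.version[method] is Version.ORIGINAL
                and self.samples[method] >= self.hot_samples):
            # The recompilation thread would now emit an instrumented version.
            self.version[method] = Version.INSTRUMENTED
            self.branch_counts[method] = Counter()

    def on_instrumented_call(self, method: str, branch: str) -> None:
        # Instrumentation-based profiling: the instrumented code reports a branch.
        if self.version.get(method) is not Version.INSTRUMENTED:
            return
        counts = self.branch_counts[method]
        counts[branch] += 1
        if sum(counts.values()) >= self.profile_calls:
            # Enough profile data: recompile with optimization using `counts`.
            self.version[method] = Version.OPTIMIZED


mon = MethodMonitor()
for m in ["f", "g", "f", "f", "g"]:      # f gets 3 samples, g gets 2
    mon.on_sample(m)
for _ in range(5):
    mon.on_instrumented_call("f", "then")
```

After this run `f` has reached the optimized version, while `g` never crossed the hotness threshold and keeps its original code, so it pays no instrumentation cost.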

Page 11

SPECint2006: Dynamic Optimization Overheads – “ref” dataset

[Figure: stacked bars per benchmark (gcc, gobmk, perlbench, xalancbmk, h264ref, hmmer, bzip2, omnetpp, astar, sjeng, mcf, libquantum, avg.); y-axis: time normalized to static execution time (0 to 1.3); components: optimized executable, loading & mapping, sampling and synchronization, recompilation & instrumentation, other runtime effect]

Stress test 1: using a highly statically-optimized executable (-O3 -qhot).

Overall, performance is not degraded.
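Assuming the five legend components simply add up to the total run time (the slide does not state this explicitly), each bar can be read as the ratio below; a value under 1 means the dynamic execution beat the static executable.

```latex
\frac{T_{\text{dyn}}}{T_{\text{static}}}
  = \frac{T_{\text{opt}} + T_{\text{load}} + T_{\text{sample}} + T_{\text{recomp}} + T_{\text{other}}}{T_{\text{static}}}
```

Here $T_{\text{opt}}$ is time spent in the optimized executable, $T_{\text{load}}$ loading & mapping, $T_{\text{sample}}$ sampling and synchronization, $T_{\text{recomp}}$ recompilation & instrumentation, and $T_{\text{other}}$ other runtime effects; these symbol names are introduced here for illustration.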

Page 12

SPECint2006: Dynamic Optimization Overheads – “train” dataset

Also works for very short-running programs.

Stress test 2: using a highly statically-optimized executable (-O3 -qhot).

Currently limited gain from FDO alone.

Page 13

Optimization effect (isolated from overheads)

(1) A sampled profile yields a similar impact to a “perfect” profile: the problem is not the profile quality.

(2) The offline optimizer applies link-time FDO (across methods and modules); our optimizer is currently limited to a single module.

Page 14

[Diagram: Program Source Code → Static Compiler (opt = -O2, arch = common, no-profile) → Native machine code; Fat Binary (Intermediate Representation); Dynamic execution stage: Runtime Engine with Profiler and JIT compiler; (IBM’s) customer, Independent Software Vendor, Computer System Vendor (e.g., IBM), Power780 server]

Programs are statically under-optimized / moderately optimized.

Page 15

SPECint2006: Overall Effect of Dynamic Execution (ref)

Moderately-optimized scenario (program statically compiled with -O2): selected methods from the program are dynamically recompiled using a higher optimization level.

Overall 7% improvement on average.

Page 16

Recompilation Statistics

Moderately-optimized scenario (program statically compiled with -O2): selected methods from the program are dynamically recompiled using a higher optimization level.

• Default recompilation mode (default method hotness threshold): overall 7% improvement on average

• Aggressive recompilation mode (lower method hotness threshold): overall 8% improvement on average
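The two modes differ only in the method hotness threshold. A minimal Python sketch of threshold-based method selection, assuming hotness is measured as a method's share of all samples; `select_hot_methods`, the threshold values and the sample counts are hypothetical and are not the values used in these experiments.

```python
from typing import Dict, List


def select_hot_methods(samples: Dict[str, int], threshold: float) -> List[str]:
    """Return methods whose share of all samples is at least `threshold`,
    hottest first. `threshold` is a fraction, e.g. 0.10 for 10%."""
    total = sum(samples.values())
    if total == 0:
        return []
    hot = [m for m, n in samples.items() if n / total >= threshold]
    return sorted(hot, key=lambda m: samples[m], reverse=True)


# Hypothetical sample counts for one run (illustrative, not measured data).
counts = {"parse": 50, "hash_lookup": 30, "emit": 12, "init": 8}

default_mode = select_hot_methods(counts, threshold=0.20)
aggressive_mode = select_hot_methods(counts, threshold=0.10)
print(default_mode)     # ['parse', 'hash_lookup']
print(aggressive_mode)  # ['parse', 'hash_lookup', 'emit']
```

Lowering the threshold can only add methods to the recompiled set, trading more recompilation work for more optimized code; that is the trade-off between the default and aggressive modes.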

Page 17

More Benchmarks: SQLite

SQLite with 1G TPC-H tables

[Figure: time normalized to static execution time (0.00 to 1.00) for a stream of query 1; components: optimized executable, loading and mapping, sampling and synchronization, recompilation & instrumentation]

SQLite:

– Static version compiled with the default compiler options: -O2 warm.

– Using 1G of TPC-H tables (smallest dataset).

– Using TPC-H queries: a stream of 13 instances of query #1.

• 13% improvement from dynamic FDO.

• Most of the improvement comes from the higher optimization level.

Page 18

Summary and Conclusions

The overall cost of the runtime optimization environment, including

– environment startup cost

– recompilation

– profiling overheads

is less than 2% on average (SPECint2006).

For highly optimized native binaries, there is, on average, no overall degradation.

These low overheads imply that the fat-binary based approach is practical for real-world use cases and workloads.

– Feedback-directed optimization can easily surpass these costs.

Applying an aggressive optimization level to selected methods at runtime brings up to 20% speedup, and an 8% average speedup.

Much more potential is available:

– more aggressive optimizations: loop-nest, memory-hierarchy, parallelization

– more profiling (event based?)

– more synergy with the static compiler

– more synergy with the underlying (virtual) environment, to adapt to changes
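A simple break-even model makes the cost argument concrete. Let $T_s$ be the static execution time, $C$ the total cost of the runtime environment (startup, recompilation, profiling), and $g$ the fraction of $T_s$ saved over the whole run by executing recompiled code; these symbols are introduced here for illustration.

```latex
T_{\text{dyn}} = (1 - g)\,T_s + C
\qquad\Longrightarrow\qquad
T_{\text{dyn}} < T_s \iff g > \frac{C}{T_s}
```

If the reported average cost of less than 2% is read as $C/T_s < 0.02$, then any runtime optimization that saves more than about 2% of the static execution time yields a net speedup.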

Page 19

Thematic Session on Dynamic Compilation

1) What is the dynamic optimization stage? During program execution.

2) What triggers the dynamic compilation cycle? A method becomes warm.

3) How are these triggers detected? Sampling execution/PCs (via timer interrupts & code instrumentation) to monitor application behavior.

4) How/when are the above triggers inserted? At run time.

5) What is the recompilation scope/granularity? Method.

6) What is the target application domain? General-purpose/commercial applications.

7) What is the input code for the dynamic optimization? Fat binary (binary + IR).

8) What is the programming language of the target applications? Statically compiled languages (C/C++, …).

9) What specific adaptation / optimization / code transformation is applied? General feedback-directed optimizations (basic-block (BB) ordering, …).
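Basic-block (BB) ordering is a typical feedback-directed optimization: the edge profile decides which successor becomes the fall-through path. The Python sketch below uses a simple greedy "follow the hottest successor" heuristic, one classic way to do this and not necessarily what the authors' optimizer does; `order_blocks` and the example edge counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def order_blocks(entry: str, edges: Dict[Edge, int]) -> List[str]:
    """Greedy block layout: make the hottest successor the fall-through."""
    succs: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    incoming: Dict[str, int] = defaultdict(int)
    blocks = {entry}
    for (src, dst), count in edges.items():
        succs[src].append((count, dst))
        incoming[dst] += count
        blocks.update((src, dst))

    layout: List[str] = []
    placed = set()
    current = entry
    while True:
        layout.append(current)
        placed.add(current)
        # Prefer the most frequently taken edge out of the current block
        # (ties broken by block name so the result is deterministic).
        candidates = [(c, d) for c, d in succs[current] if d not in placed]
        if candidates:
            current = min(candidates, key=lambda cd: (-cd[0], cd[1]))[1]
            continue
        # Chain ended: start a new chain at the hottest remaining block.
        remaining = [b for b in blocks if b not in placed]
        if not remaining:
            return layout
        current = min(remaining, key=lambda b: (-incoming[b], b))


# Hypothetical edge profile of a loop containing an if/else that is 90% "then".
profile = {
    ("entry", "cond"): 1,
    ("cond", "then"): 90, ("cond", "else"): 10,
    ("then", "join"): 90, ("else", "join"): 10,
    ("join", "cond"): 99, ("join", "exit"): 1,
}
layout = order_blocks("entry", profile)
print(layout)  # ['entry', 'cond', 'then', 'join', 'exit', 'else']
```

The rarely taken `else` block is moved out of the hot path, so the loop body `cond`, `then`, `join` is laid out contiguously and the common branch falls through.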