© 2009 IBM Corporation
Feedback Directed Dynamic Recompilation for
Statically Compiled Languages
Dorit Nuzman, Sergei Dyshel, Revital Eres
IBM Research, Haifa
Thematic Session on Dynamic Compilation
HiPEAC Computing Systems Week
Paris, May 3rd 2013
Motivating Scenario
Parties:
– (IBM’s) customer
– Independent Software Vendor (ISV)
– Computer System Vendor (e.g., IBM)
Setting: third party software owned by some ISV, on a Power780 server
Performance problem. Candidate remedies:
– Increase target platform level?
– Increase optimization level?
– Apply feedback directed optimization?
Answers: No / Nope / Can’t do
Fat Binary Runtime Engine
Profiler
Intermediate Representation
Dynamic execution stage
Program Source Code
Static Compiler
Motivating Scenario
(IBM’s) customer
Independent Software Vendor
Computer System Vendor
(e.g., IBM)
Power780 server
Performance problem
Native machine code
JIT compiler
opt = -O2
arch = common
no-profile
Fat Binary Runtime Engine
Profiler
Intermediate Representation
Dynamic execution stage
Program Source Code
Static Compiler
selective profile-driven recompilation
Native machine code
JIT compiler
Our approach: Fat Binary based, feedback-directed, dynamic recompilation
– Used for years in dynamic languages & Java
– Needed also for static languages
– Opposed to dynamic binary optimization:
• includes high-level semantic information
• allows aggressive, speculative transformations
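A minimal sketch of the fat-binary idea, assuming a toy model in which each method carries both a "native" entry point and its IR, and a runtime engine can swap in a version rebuilt from the IR. The names `FatMethod`, `FatBinary`, `RuntimeEngine` and `toy_jit` are hypothetical, and the "IR" here is just a Python expression string standing in for a real compiler IR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FatMethod:
    name: str
    native: Callable[[int], int]  # stands in for statically compiled machine code
    ir: str                       # stands in for the IR shipped alongside it


@dataclass
class FatBinary:
    methods: Dict[str, FatMethod] = field(default_factory=dict)


def toy_jit(ir: str) -> Callable[[int], int]:
    """Hypothetical JIT: the toy IR is a Python expression over x."""
    code = compile(ir, "<ir>", "eval")
    return lambda x: eval(code, {"x": x})


class RuntimeEngine:
    def __init__(self, binary: FatBinary) -> None:
        self.binary = binary
        # Execution starts from the native code; the IR is only used on demand.
        self.dispatch = {n: m.native for n, m in binary.methods.items()}

    def call(self, name: str, x: int) -> int:
        return self.dispatch[name](x)

    def recompile(self, name: str) -> None:
        # Rebuild the method from its IR and redirect future calls to it.
        self.dispatch[name] = toy_jit(self.binary.methods[name].ir)


binary = FatBinary({"square": FatMethod("square", lambda x: x * x, "x * x")})
engine = RuntimeEngine(binary)
before = engine.call("square", 7)
engine.recompile("square")
after = engine.call("square", 7)
```

The point of the sketch is only the shape: the program runs from native code immediately, and the embedded IR lets the runtime regenerate selected methods without access to the source.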
Background
Modern compilers provide sophisticated optimizations.
• O3 (O4, O5)
• Inter-procedural
• Auto-vect/par
• Feedback-directed
• Hardware-specific
These optimizations are usually not used.
– Only in benchmarking and HPC
• Complicates build process
• Prolongs development & testing cycle
• Requires per-customer tuning – too costly
• No representative input
We can gain back the lost performance benefit by applying the optimizations dynamically, at runtime.
Dynamic Recompilation
Solves the static-compiler usability issue
– Transparent feedback-directed optimization for current workload
– Tuning for current hardware
– Separation of optimization from software production
Allows adaptive optimization.
Allows iterative optimization.
Virtualization & Cloud: physical resources known only at runtime, and continuously change
Dynamic Recompilation for Static Languages
Other Approaches:
– Focus only on very long running programs with heavy workloads, to compensate for time spent profiling
– Focus on optimization across consecutive runs of repetitive programs
– Domain specific (focus on a specific optimization, applied to a small pre-selected part of the code)
– Trace-based binary optimization
– …
Our Goal:
Demonstrate an execution environment with overheads that are low enough to allow the dynamic optimizer to speed up execution of the current invocation, for regular programs/workloads.
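The goal amounts to a break-even condition: the time saved in recompiled code must exceed the added runtime costs within the same invocation. A rough sketch, assuming a simplified model in which a fraction of the static run time is spent in hot methods that run faster once recompiled; the function name and all numbers below are hypothetical, not measurements from the talk.

```python
def normalized_dynamic_time(static_time, overheads, hot_fraction, hot_speedup):
    """Return dynamic run time divided by static run time (below 1.0 is a win).

    Simplifying assumption: hot code runs at the optimized speed for the
    whole run, and every runtime cost is a flat additive term.
    """
    cold = static_time * (1.0 - hot_fraction)        # code left untouched
    hot = static_time * hot_fraction / hot_speedup   # recompiled code
    return (cold + hot + sum(overheads.values())) / static_time


# Hypothetical numbers, in seconds.
overheads = {"startup": 0.5, "monitoring": 1.0, "recompilation": 0.5}
ratio = normalized_dynamic_time(100.0, overheads, hot_fraction=0.5, hot_speedup=1.25)
print(round(ratio, 2))  # 0.92
```

With these made-up inputs the overheads cost 2% and the faster hot code saves 10%, so the dynamic run finishes in 92% of the static time. For a short run the same fixed overheads weigh more, which is why low overhead matters for "regular" workloads.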
Fat Binary Runtime Engine
Profiler
Split-IR
Dynamic execution stage
Program Source Code
Static Compiler
Native machine code
JIT compiler
Our approach: Fat Binary based, feedback-directed, dynamic recompilation
Runtime Monitoring and Recompilation
[Figure: timeline t0 to t9 with two threads and three method versions]
Threads:
– Execution and sampling thread
– Recompilation thread
Method versions:
– Original method version → (instrumentation) → Instrumented method version → (optimization) → Optimized method version
Profiling:
– Sampling-based profiling for method hotness
– Instrumentation-based profiling
Costs along the timeline:
– Startup cost (loading & mapping)
– Monitoring overhead
– Recompilation cost
– Slow instrumented execution
– Synchronization cost
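The method-version lifecycle on this slide (original, then instrumented, then optimized) can be sketched as a small state machine. This is an illustration under stated assumptions: the class `Monitor`, its thresholds and its callbacks are hypothetical, and everything runs in one thread, whereas the slide shows a separate recompilation thread.

```python
from collections import Counter
from enum import Enum


class Version(Enum):
    ORIGINAL = "original"
    INSTRUMENTED = "instrumented"
    OPTIMIZED = "optimized"


class Monitor:
    def __init__(self, hot_threshold=3, profile_budget=2):
        self.hot_threshold = hot_threshold    # samples before a method counts as hot
        self.profile_budget = profile_budget  # instrumented calls before optimizing
        self.samples = Counter()
        self.profiled_calls = Counter()
        self.version = {}

    def on_sample(self, method):
        """Sampling tick: a hot original method is switched to its instrumented version."""
        self.samples[method] += 1
        state = self.version.setdefault(method, Version.ORIGINAL)
        if state is Version.ORIGINAL and self.samples[method] >= self.hot_threshold:
            self.version[method] = Version.INSTRUMENTED

    def on_instrumented_call(self, method):
        """Instrumented execution: once enough profile is collected, switch to optimized."""
        if self.version.get(method) is not Version.INSTRUMENTED:
            return
        self.profiled_calls[method] += 1
        if self.profiled_calls[method] >= self.profile_budget:
            self.version[method] = Version.OPTIMIZED


monitor = Monitor()
for _ in range(3):
    monitor.on_sample("f")
state_after_sampling = monitor.version["f"]
for _ in range(2):
    monitor.on_instrumented_call("f")
monitor.on_sample("g")
```

Keeping the instrumented phase short matters because instrumented execution is slow; the budget parameter in the sketch stands in for whatever policy a real runtime uses to bound it.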
[Figure: stacked bar chart. Y-axis: time normalized to static execution time (0 to 1.3). X-axis: gcc, gobmk, perlbench, xalancbmk, h264ref, hmmer, bzip2, omnetpp, astar, sjeng, mcf, libquantum, avg. Legend: other runtime effect; recompilation & instrumentation; sampling and synchronization; loading & mapping; optimized executable]
SPECint2006: Dynamic Optimization Overheads – “ref” dataset
Overall not degrading performance.
Stress test 1: using a highly statically-optimized executable (-O3 -qhot)
SPECint2006: Dynamic Optimization Overheads – “train” dataset
Works also for very short running programs.
Stress test 2: using a highly statically-optimized executable (-O3 -qhot)
Currently limited gain from FDO alone.
Optimization effect (isolated from overheads)
(1) Similar impact is gained using a sampled profile as when using a “perfect” profile: the problem is not in the profile quality.
(2) The offline optimizer applies link-time FDO (across methods and modules). Our optimizer is currently limited to a single module.
Fat Binary Runtime Engine
Profiler
Intermediate Representation
Dynamic execution stage
Program Source Code
Static Compiler
(IBM’s) customer
Independent Software Vendor
Computer System Vendor
(e.g., IBM)
Power780 server
Native machine code
JIT compiler
opt = -O2
arch = common
no-profile
programs are statically under-optimized / moderately-optimized
SPECint2006: Overall Effect of Dynamic Execution (ref)
Overall 7% improvement on average
Moderately-optimized scenario (program statically compiled with -O2)
Selected methods from the program dynamically recompiled using a higher optimization level.
Recompilation Statistics
Moderately-optimized scenario (program statically compiled with -O2)
Selected methods from the program dynamically recompiled using a higher optimization level.
Default recompilation mode (default method hotness threshold)
Aggressive recompilation mode (lower method hotness threshold)
Overall 7% improvement on average
Overall 8% improvement on average
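The difference between the two recompilation modes is the method hotness threshold. A minimal sketch, assuming hotness is measured as a method's share of all profile samples; the function name, the sample counts and both threshold values are hypothetical and not taken from the talk.

```python
def select_hot_methods(samples, threshold):
    """Return the methods whose share of all samples is at least `threshold`."""
    total = sum(samples.values())
    if total == 0:
        return []
    return sorted(m for m, count in samples.items() if count / total >= threshold)


# Hypothetical sample counts per method.
samples = {"parse": 60, "hash": 25, "log": 10, "init": 5}

default_mode = select_hot_methods(samples, threshold=0.20)
aggressive_mode = select_hot_methods(samples, threshold=0.08)
print(default_mode)     # ['hash', 'parse']
print(aggressive_mode)  # ['hash', 'log', 'parse']
```

Lowering the threshold recompiles more methods, which raises recompilation cost in exchange for covering more of the execution time.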
More Benchmarks: SQLite
SQLite with 1G TPC-H tables
[Figure: stacked bar chart for a stream of query 1. Y-axis: time normalized to static execution time (0.00 to 1.00). Legend: recompilation & instrumentation; sampling and synchronization; loading and mapping; optimized executable]
SQLite:
– Static version compiled with default compiler options: -O2 warm.
– Using 1G of TPC-H tables.
• (smallest dataset)
– Using TPC-H queries:
• Stream of 13 instances of query #1
13% improvement from dynamic FDO
• Most improvement comes from higher optimization level.
Summary and Conclusions
Overall cost of the runtime optimization environment, including
– environment startup cost
– recompilation
– profiling overheads
is less than 2% on average (SPECint2006)
For highly optimized native binaries, on average, there is no overall degradation
These low overheads imply that the fat-binary based approach is practical for real-world use-cases and workloads
– Feedback directed optimization can easily surpass these costs
Aggressive optimization level for selected methods at runtime brings up to 20% speedup, and an 8% average speedup
Much more potential available:
– more aggressive optimizations: loop-nest, memory-hierarchy, parallelization
– more profiling (event based?)
– more synergy with static compiler
– more synergy with underlying (virtual) environment, to adapt to changes
Thematic Session on Dynamic Compilation
1) What is the dynamic optimization stage? During program execution
2) What triggers the dynamic compilation cycle? A method gets warm
3) How are these triggers being detected? sampling execution/PCs (via time interrupts & code instrumentation) to monitor application behavior
4) How/when are the above triggers being inserted? at run-time
5) What is the recompilation scope/granularity? method
6) What is the target application domain? general purpose/commercial applications
7) What is the input code for the dynamic optimization? fat-binary (binary + IR)
8) What is the programming language of the target applications? statically compiled languages (C/C++...)
9) What specific adaptation / optimization / code-transformation is applied? general feedback-directed optimizations (BB ordering, …)
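As an illustration of the "BB ordering" mentioned in answer 9, here is a simplified greedy chaining heuristic that lays out basic blocks so the most frequently taken edges become fall-throughs. It is a sketch of the general technique only, not necessarily what the authors' JIT does; the function name, the block names and the edge counts are hypothetical.

```python
def order_blocks(entry, blocks, edge_counts):
    """Greedy layout: from each start block, keep following the hottest
    edge to a block that has not been placed yet."""
    successors = {}
    for (src, dst), count in edge_counts.items():
        successors.setdefault(src, []).append((count, dst))

    placed, layout = set(), []
    # Start with the entry block, then pick up leftovers in original order.
    for start in [entry] + [b for b in blocks if b != entry]:
        current = start
        while current is not None and current not in placed:
            placed.add(current)
            layout.append(current)
            candidates = [(c, d) for c, d in successors.get(current, []) if d not in placed]
            current = max(candidates)[1] if candidates else None
    return layout


# Hypothetical profile: the error path is taken 1 time in 100.
blocks = ["entry", "error", "work", "exit"]
edge_counts = {
    ("entry", "work"): 99,
    ("entry", "error"): 1,
    ("work", "exit"): 99,
    ("error", "exit"): 1,
}
layout = order_blocks("entry", blocks, edge_counts)
print(layout)  # ['entry', 'work', 'exit', 'error']
```

The cold `error` block ends up after the hot path, so the common case executes straight-line code. Such a layout depends on edge counts, which is why it needs the runtime profile.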