using dynamic binary instrumentation to generate multi...

22
Using Dynamic Binary Instrumentation to Generate Multi-Platform SimPoints Vincent M. Weaver and Sally A. McKee Cornell University 29 January 2008

Upload: others

Post on 23-Nov-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Using Dynamic BinaryInstrumentation to GenerateMulti-Platform SimPoints

Vincent M. Weaver and Sally A. McKee

Cornell University

29 January 2008

Page 2: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Breakdown of Benchmark MethodologyUsed in Major Conferences

HPCA, ISCA, MICRO ISCA(1995-2004) (2006-2007)

Yi et al HPCA 2005

Run Complete (simulator) 18% 19%Run Multiple SimPoints 0% 4%Run SMARTS 0% 11%Run One SimPoint 0% 30%Fast Forward X, Run Z 27% 30%Run Reduced (train, MinneSPEC) 19% 22%Run Z 23% 4%

1Fusion

Page 3: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Phase Behavior — twolf

0 1000 2000 3000Instruction Interval (100M)

300.twolf default

0

1

2

CP

I

012345

L1 D

-Cac

heM

iss

Rat

e (%

)

2Fusion

Page 4: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Phase Behavior — mcf

0 200 400 600Instruction Interval (100M)

181.mcf default

02468

CP

I

0

10

20

30

L1 D

-Cac

heM

iss

Rat

e (%

)

3Fusion

Page 5: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Phase Behavior — gcc.200

0 200 400 600Instruction Interval (100M)

176.gcc 200

0

5

10

CP

I

0

10

20

30

L1 D

-Cac

heM

iss

Rat

e (%

)

4Fusion

Page 6: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

SimPoint Basic Block Vectors

• Instrument every Basic Block

• Increment unique counter on entry to each BB

• Save list of BBs and frequencies periodically

These BBVs are used by SimPoint utility to determine

phase behavior

5Fusion

Page 7: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Dynamic Binary Instrumentation

• Instruments each BB at first execution

• Caches instrumented code

• Runs faster than simulation, slower than native

6Fusion

Page 8: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Previous Tools for Generating BBVs

• Atom — only Alpha with Tru64 UNIX

• Pin — only Intel architectures (IA32, Intel 64, IA64,

XScale)

• Simulator mods — only simulator specific (SLOW)

7Fusion

Page 9: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Goal – Expand Applicability of SimPoints

• Support more architectures

• Use existing DBI tools

• Validate generated BBVs

• Generate cross-platform BBV files

8Fusion

Page 10: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Solution: Valgrind and Qemu

Pin Valgrind

Qemu

itanium

arm

x86x86_64

ppc

m68kmips

sh4sparchppa

alpha

cris

9Fusion

Page 11: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Pin

• Intel proprietary DBI tool: IA32, IA64, Intel 64, Xscale

(Windows, Linux, OSX)

• Plugins are written in C++

• Average slowdown of 15x on SPEC 2000

10Fusion

Page 12: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Qemu

• Open Source

• Full-system simulation or syscall-by-proxy

• Translates from arbitrary ISAs via DBI

• Runs ARM, IA32, Intel 64, MIPS, PPC, SPARC, HPPA,

Etrax CRIS, Alpha, sh4, m68k

• Average slowdown of 28x on SPEC 2000

11Fusion

Page 13: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Valgrind

• Open Source, external plugin interface

• Originally designed to find memory access violations

• Translates to own IR, instruments, retranslates back to

native ISA

• Runs on IA32, Intel 64 and PPC platforms (Linux and

AIX)

• Average slowdown of 39x on SPEC 2000

12Fusion

Page 14: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Validation Systems

machine processor memory L1 I/D L2/L3 Cachenestle 400MHz Pentium II 256MB 16KB/16KB 512KB

spruengli 550MHz Pentium III 512MB 16KB/16KB 512KBitanium 800MHz Itanium (x86 mode) 1GB 16KB/16KB 96KB/3MBchocovic 1.66GHz Core Duo 1GB 32KB/32KB 1MB

milka 1.733MHz Athlon MP 512MB 64KB/64KB 256KBgallais 1.8GHz Pentium 4 256MB 12Ku/16KB 256KB

jennifer 2GHz Athlon64 X2 1GB 64KB/64KB 512KBsampaka12 2.8GHz Pentium 4 2GB 12Ku/16KB 512KBdomori25 3.46GHz Pentium D 4GB 12Ku/16KB 2MB

13Fusion

Page 15: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

SPEC CPU 2000

nestle Pentium II

spruengli Pentium III

itanium Itanium

chocovic Core Duo

milka Athlon

gallais Pentium 4

jennifer Athlon 64

sampaka12 Pentium 4

domori25 Pentium D

0

10

20

30

40

CP

I Err

or (

%)

First 100MFfwd 1B, 100M

Pin, one SimPointQemu, one SimPointValgrind, one SimPoint

Pin, up to 10 SimPointsQemu, up to 10 SimPointsValgrind, up to 10 SimPoints

Pin, up to 20 SimPointsQemu, up to 20 SimPointsValgrind, up to 20 SimPoints

46.7%40.3%

50.1% 40.6% 44.9% 82.6% 50.8% 69.1% 55.3% 54.4% 54.7% 43.1% 46.2%

14Fusion

Page 16: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

SPEC CPU 2000 Breakdown – Pentium D

bzip2.graphic

bzip2.program

bzip2.source

crafty.default

eon.cook

eon.kajiya

eon.rushmeier

gap.default

gcc.expr

gcc.integrate

gcc.scilab

gcc.166

gcc.200

gzip.graphic

gzip.log

gzip.program

gzip.random

gzip.source

mcf.default

parser.default

perlbmk.diffmail

perlbmk.makerand

perlbmk.perfect

perlbmk.535

perlbmk.704

perlbmk.957

perlbmk.850

twolf.default

vortex.1

vortex.2

vortex.3

vpr.route

vpr.place

-10

-5

0

5

10

CP

I Err

or (

%)

-10

-5

0

5

10

CP

I Err

or (

%)

Pin Qemu ValgrindInteger Results, up to 20 SimPoints, Pentium D

19.8

22.8

19.3

10.7

-109

11.8

12.9

19.9

ammp.default

applu.default

apsi.defaultart.1

10art.4

70

equake.default

facerec.default

fma3d.default

galgel.default

lucas.default

mesa.default

mgrid.default

sixtrack.default

swim.default

wupwise.default-10

-5

0

5

10

CP

I Err

or (

%)

-10

-5

0

5

10

CP

I Err

or (

%)

Pin Qemu ValgrindFP Results, up to 20 SimPoints, Pentium D

-15.5

10.817.5

15Fusion

Page 17: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

SPEC CPU 2006

chocovic - Core Duo sampaka12 - Pentium 4 domori25 - Pentium D jennifer - Athlon 640

10

20

30

40

CP

I Err

or (

%)

First 100MFfwd 1 B, 100M

Pin, one SimPointQemu, one SimPointValgrind, one SimPoint

Pin, up to 10 SimPointsQemu, up to 10 SimPointsValgrind, up to 10 SimPoints

Pin, up to 20 SimPointsQemu, up to 20 SimPointsValgrind, up to 20 SimPoints

63.7% 110.1% 124.7% 110.0%42.6%

machine processor memory L1 I/D L2 Cache

chocovic 1.66GHz Core Duo 1GB 32KB/32KB 1MBjennifer 2GHz Athlon64 X2 1GB 64KB/64KB 512KB

sampaka12 2.8GHz Pentium 4 2GB 12Ku/16KB 512KBdomori25 3.46GHz Pentium D 4GB 12Ku/16KB 2MB

16Fusion

Page 18: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

SPEC CPU 2006 Breakdown – Pentium D

astar.BigLakes

astar.rivers

bzip2.source

bzip2.chicken

bzip2.liberty

bzip2.program

bzip2.html

bzip2.combined

gcc.166

gcc.200

gcc.c-typeck

gcc.cp-decl

gcc.expr

gcc.expr2

gcc.g23

gcc.s04

gcc.scilab

gobmk.13x13

gobmk.nngs

gobmk.score2

gobmk.trevorc

gobmk.trevord

h264ref.fore_base

h264ref.fore_main

h264ref.sss_main

hmmer.nph3

hmmer.retro

libquantum mcf

omnetpp

perlbench.checkspam

perlbench.diffmail

perlbench.splitmailsjeng

xalancbmk-10

-5

0

5

10C

PI E

rror

(%

)

-10

-5

0

5

10C

PI E

rror

(%

)Pin Qemu ValgrindInteger Results, up to 20 SimPoints, Pentium D

12.1 10.7

-11.4

11.3

-36.3 -12.8

13.0

-12.6

11.5 13.1 15.6

bwaves

cactusADMcalculix

dealII

gamess.cytosine

gamess.h2ocu2

gamess.triazolium

GemsFDTD

gromacs lbm

leslie3dmilc

namdpovray

soplex.pds-50

soplex.ref

sphinx3tonto wrf

zeusmp-10

-5

0

5

10

CP

I Err

or (

%)

-10

-5

0

5

10

CP

I Err

or (

%)

Pin Qemu ValgrindFP Results, up to 20 SimPoints, Pentium D

-33.25-20.5

n/a

17Fusion

Page 19: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Results – Average CPI error

• SPEC2000: 10 SimPoints (<0.4% of reference inputs)

◦ Pin 5.32%

◦ Qemu 5.04%

◦ Valgrind 5.38%

• SPEC2006: 10 SimPoints (<0.06% of reference inputs)

◦ Pin 5.58%

◦ Qemu 5.30%

◦ Valgrind 5.28%

18Fusion

Page 20: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Future Work

• Generating multi-platform results – MIPS BBV file from

IA32 Qemu

• Running non-Linux binaries (Solaris, IRIX)

• Generating OS-aware SimPoints – Qemu runs full OS

19Fusion

Page 21: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Tools Available for Download

All code is available from:

http://fusion.csl.cornell.edu/tools/

20Fusion

Page 22: Using Dynamic Binary Instrumentation to Generate Multi ...web.eece.maine.edu/~vweaver/papers/hipeac08/hipeac08...arm x86 x86_64 ppc m68k mips sh4 sparc hppa alpha cris 9 Fusion Pin

Questions? Feedback?

All code is available from:

http://fusion.csl.cornell.edu/tools/

[email protected] — http://www.csl.cornell.edu/˜vince

[email protected] — http://www.csl.cornell.edu/˜sam

21Fusion