- 1 -

Copyright © 2006 Intel Corporation. All Rights Reserved.

Techniques for Speeding up Pin-based Simulation

Harish Patil

- 2 -

Objective

IS: High-level techniques for speeding up Pin-based simulation

IS NOT: Low-level optimizations (inlining, etc.) of Pintools

Two usage models

[Diagram: Pin-tool and Simulator, shown in two arrangements]

- 3 -

Outline

Two techniques:

1. Selective simulation

2. Conditional instrumentation

PinPoints: Selecting simulation regions with Pin and SimPoint

Case Study: Pin + SimpleScalar.x86

- 4 -

Instruction Counts: Some IPF Applications

Real Applications Are Long-running

[Bar chart: # Instructions (billions), y-axis 0 to 6,000, for six IPF applications: 142; 373; 463; 3,979; 3,994; 4,932]

- 5 -

Problem: Whole-Program Simulation is Slow

[Bar chart: Simulation time in years @ 10,000 instructions/second, y-axis 0 to 18, for six IPF applications: 0.4; 1.2; 1.5; 12.6; 12.7; 15.6]
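The arithmetic behind the "years" figures can be sketched as below. This is a minimal illustration, not part of the original deck: the function name and the 365-day year are assumptions, since the slide does not say which year length was used.

```cpp
// Assumption: a 365-day year; the slide does not state the year length it used.
constexpr double kSecondsPerYear = 365.0 * 24.0 * 60.0 * 60.0;

// Years needed to simulate `instructions` at `instructions_per_second`.
// (Hypothetical helper name, for illustration only.)
inline double SimulationYears(double instructions, double instructions_per_second) {
    return instructions / instructions_per_second / kSecondsPerYear;
}
```

For example, `SimulationYears(3979e9, 1e4)` evaluates to roughly 12.6, in line with the chart value for the 3,979-billion-instruction application.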

- 6 -

Solution: Select Simulation Points

Select One Point

– At the beginning (no skip)

– After 1 billion instructions

– After skipping a random number of instructions

Select Multiple Points

– Manually by looking at performance data

– Randomly anywhere

– Randomly from uniform regions

– By program phase analysis (SimPoint: UCSD)

– Fine-grain sampling (SMARTS: CMU)

[Diagram: execution timeline: Fast-forward, Simulation, Fast-forward, Simulation]

- 7 -

How Does Pin Support Selective Simulation?

Class CONTROL: in InstLib/control.H (via instlib.H)

The Pintool includes the class and provides a “Handler” for “start and end of region”

Provides a number of switches:

– For specifying “start of region”:
    -skip <instruction count>
    -start_address <Address>
    …

– For specifying “end of region”:
    -length <instruction count>
    -stop_address <Address>
    …

- 8 -

InstLibExamples/control

    $ pin -t control -skip 100 -length 500 -- hello
    ip: 0x40000e00 104 Start
    ip: 0x4000105e 598 Stop
    Hello world

Other example switches:

One region:

1. -start_address foo:10 -length 500

Multiple regions:

2. -uniform_period 1000 -uniform_length 200

3. -ppfile foo.pp
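The idea behind a skip/length region can be sketched with a small Pin-independent counter. Everything here (`RegionEvent`, `SkipLengthController`, `OnInstruction`) is a hypothetical name for illustration and not part of INSTLIB. The toy fires at exact instruction counts, whereas the sample run above reports 104 and 598, so the real CONTROL presumably detects boundaries at a coarser granularity.

```cpp
#include <cstdint>

// Hypothetical event type; the real library uses CONTROL_START / CONTROL_STOP.
enum class RegionEvent { None, Start, Stop };

// Toy model of "-skip S -length L": Start fires once S instructions have been
// skipped, Stop fires once L further instructions have executed.
class SkipLengthController {
 public:
  SkipLengthController(std::uint64_t skip, std::uint64_t length)
      : skip_(skip), length_(length) {}

  // Call once before each instruction executes.
  RegionEvent OnInstruction() {
    RegionEvent event = RegionEvent::None;
    if (icount_ == skip_) {
      event = RegionEvent::Start;
    } else if (icount_ == skip_ + length_) {
      event = RegionEvent::Stop;
    }
    ++icount_;
    return event;
  }

 private:
  std::uint64_t skip_;
  std::uint64_t length_;
  std::uint64_t icount_ = 0;  // instructions seen so far
};
```

With `SkipLengthController c(100, 500)`, the call for instruction index 100 returns `Start` and the call for index 600 returns `Stop`; a zero-length region is not handled by this sketch.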

- 9 -

InstLibExamples/control.C

    #include "instlib.H"

    using namespace INSTLIB;

    // Contains knobs and instrumentation to recognize start/stop points
    CONTROL control;

    VOID Handler(CONTROL_EVENT ev, VOID *v, CONTEXT *ct, VOID *ip, VOID *tid)
    {
        std::cout << "ip: " << ip << " " << icount.Count();

        switch(ev) {
          case CONTROL_START:
            std::cout << "Start" << std::endl;
            break;

          case CONTROL_STOP:
            std::cout << "Stop" << std::endl;
            break;

          default:
            ASSERTX(false);
            break;
        }
    }

    main() {
        ...
        control.CheckKnobs(Handler, 0);
    }

Callouts: “analysis routine” (on Handler); “Instrumentation (hidden)” (inside CONTROL)

- 10 -

Recap: Instrumentation vs. Analysis

Instrumentation routines define where instrumentation is inserted

– e.g. before instruction

– Occurs the first time an instruction is executed

Analysis routines define what to do when instrumentation is activated

– e.g. increment counter

– Occurs every time an instruction is executed
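The "first time" versus "every time" distinction can be illustrated with a toy model that uses no Pin APIs. `ToyEngine` and `RunLoop` are hypothetical names, and the set of already-instrumented addresses is only a stand-in for Pin's code cache.

```cpp
#include <cstdint>
#include <unordered_set>

// Toy model only; none of these names are Pin APIs.
struct ToyEngine {
  std::unordered_set<std::uint64_t> instrumented;  // addresses already instrumented
  long instrumentation_calls = 0;
  long analysis_calls = 0;

  // Runs once per address: decides where analysis calls go.
  void Instrument(std::uint64_t address) {
    ++instrumentation_calls;
    instrumented.insert(address);
  }

  // Runs on every execution of an instrumented instruction.
  void Analyze(std::uint64_t /*address*/) { ++analysis_calls; }

  void Execute(std::uint64_t address) {
    if (instrumented.count(address) == 0) Instrument(address);
    Analyze(address);
  }
};

// Executes a three-instruction loop body `iterations` times.
inline ToyEngine RunLoop(int iterations) {
  ToyEngine engine;
  const std::uint64_t body[] = {0x1000, 0x1004, 0x1008};
  for (int i = 0; i < iterations; ++i) {
    for (std::uint64_t address : body) engine.Execute(address);
  }
  return engine;
}
```

Running the loop 100 times in this model gives 3 instrumentation calls and 300 analysis calls, which is why analysis cost dominates for long-running programs.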

- 11 -

Selective Simulation: Naive Approach: Conditional Analysis

    LOCALVAR INT32 enabled = 0;

    VOID Simulation()
    {
        if (!enabled) return;
        // Analysis code for detailed simulation
    }

    VOID Handler(...) {
        switch(ev) {
          case CONTROL_START:
            enabled = 1;
            break;

          case CONTROL_STOP:
            enabled = 0;
            break;
        }
    }

Callouts: “Conditional Analysis routine” (on Simulation); “Instrumentation always present!”

- 12 -

Changing Instrumentation on-the-fly

PIN_RemoveInstrumentation()
All instrumentation is removed. When application code is executed the instrumentation routines will be called to re-instrument all code.

– Removes old instrumentation, forces instrumentation to be done again (after a delay)

PIN_ExecuteAt(const CONTEXT *ctxt)
Starts execution at an arbitrary point given the architectural state.

– CONTEXT passed in to Handler()

– Currently only on IA32 and IA32E

- 13 -

Selective Simulation: Faster Approach: Conditional Instrumentation

    LOCALVAR INT32 enabled = 0;

    VOID Trace() {
        if (!enabled) return;
        // Add instrumentation for detailed simulation
    }

    VOID Handler(... CONTEXT *ctxt ...) {
        switch(ev) {
          case CONTROL_START:
            enabled = 1;
            PIN_RemoveInstrumentation();
            if (ctxt) PIN_ExecuteAt(ctxt);  // Only on IA32/IA32E
            break;

          case CONTROL_STOP:
            enabled = 0;
            PIN_RemoveInstrumentation();
            if (ctxt) PIN_ExecuteAt(ctxt);  // Only on IA32/IA32E
            break;
        }
    }

Callouts: “Conditional instrumentation routine” (on Trace); “Instrumentation only in simulation regions”

Source: DebugTrace/debugtrace.C
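Why conditional instrumentation makes fast-forwarding cheaper can be sketched with a toy model that counts how often the analysis routine is entered under each approach. This is not Pin code: `ToySimulator` and `RunRegion` are hypothetical names, and clearing the map is only a stand-in for what PIN_RemoveInstrumentation() does.

```cpp
#include <cstdint>
#include <unordered_map>

// Toy model only; none of these names are Pin APIs.
class ToySimulator {
 public:
  explicit ToySimulator(bool conditional_instrumentation)
      : conditional_instrumentation_(conditional_instrumentation) {}

  // Region boundary reached (the role Handler plays on the slide).
  void SetEnabled(bool on) {
    enabled_ = on;
    // Stand-in for PIN_RemoveInstrumentation(): force re-instrumentation.
    if (conditional_instrumentation_) code_cache_.clear();
  }

  void Execute(std::uint64_t address) {
    auto it = code_cache_.find(address);
    if (it == code_cache_.end()) {
      // Instrumentation time: the naive approach always inserts the call.
      const bool insert_call = conditional_instrumentation_ ? enabled_ : true;
      it = code_cache_.emplace(address, insert_call).first;
    }
    if (it->second) {
      ++analysis_calls_;           // analysis routine entered
      if (enabled_) ++simulated_;  // naive routine returns early when disabled
    }
  }

  long analysis_calls() const { return analysis_calls_; }
  long simulated() const { return simulated_; }

 private:
  bool conditional_instrumentation_;
  bool enabled_ = false;
  std::unordered_map<std::uint64_t, bool> code_cache_;  // address -> call inserted?
  long analysis_calls_ = 0;
  long simulated_ = 0;
};

// Skips `skip` instructions, then simulates `length` instructions.
inline ToySimulator RunRegion(bool conditional_instrumentation, long skip, long length) {
  ToySimulator sim(conditional_instrumentation);
  for (long i = 0; i < skip + length; ++i) {
    if (i == skip) sim.SetEnabled(true);
    sim.Execute(0x1000 + 4 * static_cast<std::uint64_t>(i % 8));
  }
  sim.SetEnabled(false);
  return sim;
}
```

In this model both approaches simulate the same `length` instructions, but the naive one enters the analysis routine `skip + length` times while the conditional one enters it only `length` times.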

- 14 -

Comparing Naïve vs. Fast Approach

naive_debugtrace vs. debugtrace

Switches: -skip 100000000 -length 1000 -instruction -memory -early_out

Naïve approach: Conditional Analysis

Fast approach (default): Conditional Instrumentation

- 15 -

debugtrace: Conditional Analysis vs. Conditional Instrumentation

Fast-forwarding is 5X faster with conditional instrumentation!

[Bar chart: Time to skip 100 million instructions, in seconds (y-axis 0 to 250), for SPECINT and SPECFP; series: naive_debugtrace, debugtrace (default)]

[Diagram: execution timeline: Fast-forward, Simulation, Fast-forward, Simulation]

- 16 -

Simulation Point Selection: Re-visited

Select One Point

– At the beginning (no skip)

– After 1 billion instructions

– After skipping a random number of instructions

Select Multiple Points

– Manually by looking at performance data

– Randomly anywhere

– Randomly from uniform regions

– By program phase analysis (SimPoint: UCSD)

– Fine-grain sampling (SMARTS: CMU)

Question: Are the simulation points representative?

- 17 -

CPI: Average Error, SPEC2000 (IA32): Whole Program vs. Selected Points

[Bar chart: Average CPI error (y-axis 0% to 50%) by selection method; data labels in extraction order: 27.1%, 13.6%, 10.8%, 8.9%, 4.6%, 48.0%]

Selection methods (legend):

– No Skip: 1 point (N*100 million insts.)

– Skip 1 billion: 1 point (N*100 million insts.)

– Skip Random: 1 point (N*100 million insts.)

– Random: N points (100 million insts. each)

– Uniform Random: N points (100 million insts. each)

– Phase-based: N points (100 million insts. each)

- 18 -

PinPoints

http://rogue.colorado.edu/Pin/PinPoints/

Pin (Intel) + SimPoint (UCSD)

What are PinPoints? Representative regions of programs

– Automatically chosen

– Validated (represent whole-program behavior)

– For trace-driven or execution-driven simulation

Found/validated PinPoints for long-running (trillions of instructions) programs [IA-32, EM64T, Itanium]

- 19 -

Phase Detection + PinPoint Selection

[Diagram: Profile with isimpoint (intervals of 100 million instructions each) to produce BB-vectors; analyze with SimPoint to find phases; choose one simulation point per phase; write the PinPoints file. Interval numbers shown include 1, 2, 350, 1022, 3518, 4232.]

Two Phases => Two PinPoints

PinPoint 1: Weight 30%

PinPoint 2: Weight 70%

- 20 -

Inside a PinPoints file

    Region-number
    Slice-number Weight
    Start-address Count1
    End-address Count2

Start-of-region: When Start-address is reached Count1 times

End-of-region: When End-address is reached Count2 times

Example usage:

    pin -t simulator -ppfile foo.pp -- foo

[Diagram: execution timeline: Fast-forward, Simulation, Fast-forward, Simulation]
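The "address reached Count times" rule for region boundaries can be sketched as a small counter. This is a Pin-independent illustration: `AddressCountTrigger` and `FirstFiringIndex` are hypothetical names, not part of the PinPoints tools.

```cpp
#include <cstdint>
#include <vector>

// Fires once, on the `count`-th time `address` is observed.
// (Hypothetical helper; not part of Pin or PinPoints.)
class AddressCountTrigger {
 public:
  AddressCountTrigger(std::uint64_t address, std::uint64_t count)
      : address_(address), count_(count) {}

  // Returns true exactly once: when `ip` matches for the count-th time.
  bool Observe(std::uint64_t ip) {
    if (fired_ || ip != address_) return false;
    if (++seen_ == count_) {
      fired_ = true;
      return true;
    }
    return false;
  }

 private:
  std::uint64_t address_;
  std::uint64_t count_;
  std::uint64_t seen_ = 0;
  bool fired_ = false;
};

// Index in `trace` at which the trigger fires, or -1 if it never does.
inline long FirstFiringIndex(const std::vector<std::uint64_t>& trace,
                             std::uint64_t address, std::uint64_t count) {
  AddressCountTrigger trigger(address, count);
  for (std::size_t i = 0; i < trace.size(); ++i) {
    if (trigger.Observe(trace[i])) return static_cast<long>(i);
  }
  return -1;
}
```

A region would use one such trigger for (Start-address, Count1) and another for (End-address, Count2); using an address plus a count, rather than a raw instruction count, identifies the same program point in every run of the same binary and input.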

- 21 -

PinPoints: Estimating Total Execution Time

Total Execution Time = Total Cycles / Frequency

– We know the simulated Frequency; need to know Total Cycles for a *full* run of the binary on the Simulator

Total Cycles Simulated = (Weighted CPI) * (Total Instructions)

– PinPoints provides the total number of instructions in the PinPoints file.

Weighted CPI can be determined through simulation of PinPoints regions and weighting of results:

$\text{Weighted CPI} = \sum_i \text{Weight}_i \times \text{CPI}_i$

CAUTION: Use the formula only for statistics normalized by instructions: CPI computation OK; IPC computation is NOT OK
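The estimate above can be sketched in a few lines. The struct and function names are hypothetical, and the weights are assumed to be fractions that sum to 1. `NaiveWeightedIpc` is included only to show why the caution holds: the weights are fractions of instructions, so they average cycles-per-instruction correctly but not instructions-per-cycle.

```cpp
#include <vector>

// One simulated PinPoints region (hypothetical layout for illustration).
struct RegionResult {
  double weight;  // fraction of whole-program instructions this region represents
  double cpi;     // cycles per instruction measured for the region
};

// Weighted CPI = sum over regions of weight_i * cpi_i.
inline double WeightedCpi(const std::vector<RegionResult>& regions) {
  double sum = 0.0;
  for (const RegionResult& r : regions) sum += r.weight * r.cpi;
  return sum;
}

// Total Execution Time = (Weighted CPI * Total Instructions) / Frequency.
inline double EstimatedSeconds(double weighted_cpi, double total_instructions,
                               double frequency_hz) {
  return weighted_cpi * total_instructions / frequency_hz;
}

// For contrast only: instruction-weighted IPC is NOT the whole-program IPC.
inline double NaiveWeightedIpc(const std::vector<RegionResult>& regions) {
  double sum = 0.0;
  for (const RegionResult& r : regions) sum += r.weight * (1.0 / r.cpi);
  return sum;
}
```

With two regions weighted 30% at CPI 2.0 and 70% at CPI 1.0, the weighted CPI is 1.3, so the implied IPC is about 0.77, while the naive weighted IPC comes out at 0.85.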

- 22 -

PinPoints: Usage Model

[Diagram: Pin-based profiler produces a BB profile; simulation point selection turns it into PinPoints; the PinPoints are consumed, each via CONTROL, by a Pin-based trace generator, a Pin-based branch predictor, or "Your Simulator Here"]

A Case Study: Pin + SimpleScalar.x86

- 24 -

User-level Simulation with SimpleScalar (Alpha): Old Approach

Ad-hoc system call side-effect emulation:

    switch (syscall_id)
      case SC1: // Action for SC1
      case SC2: // Action for SC2

SimpleScalar (Alpha) emulates 80+ syscalls (enough to run SPEC2000 only)

[Diagram: the user-level simulator, running on the host operating system and host machine, contains an architecture simulation engine and a system call emulation engine. The simulation engine passes syscall(id, arg1, …, argn) to the emulation engine, which executes the syscall natively and returns register and memory updates.]

- 25 -

pinSEL: A Tool for Automatic System-call Side-effect Logging

– No ad-hoc processing of system calls needed

– Ease of porting to newer OSes (MacOS/Windows)

– Simulation of many more applications (non-SPEC) feasible

[Diagram: pinSEL produces a log of syscall side-effects, which the simulator consumes]

    // At a system call
    // set memory locations as
    // specified in the log
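The replay step ("at a system call, set memory locations as specified in the log") might look like the sketch below. The record layout and every name in it are hypothetical: the slides do not describe the actual pinSEL log format, and register updates are left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical record layout; the real pinSEL format is not shown in the slides.
struct MemoryWrite {
  std::uint64_t address;
  std::uint8_t value;
};

// Side effects of one system call, in the order they were logged.
struct SyscallEffects {
  std::vector<MemoryWrite> writes;
};

// Sparse byte-addressed memory of the simulated program.
using SimulatedMemory = std::map<std::uint64_t, std::uint8_t>;

class SideEffectLog {
 public:
  explicit SideEffectLog(std::vector<SyscallEffects> records)
      : records_(std::move(records)) {}

  // Called when the simulator reaches a system call: applies the next
  // record's writes. Returns false if the log is exhausted.
  bool ReplayNext(SimulatedMemory& memory) {
    if (next_ >= records_.size()) return false;
    for (const MemoryWrite& w : records_[next_].writes) {
      memory[w.address] = w.value;
    }
    ++next_;
    return true;
  }

 private:
  std::vector<SyscallEffects> records_;
  std::size_t next_ = 0;  // index of the next unreplayed system call
};
```

Because the simulator only applies recorded effects, it needs no per-syscall emulation code, which is the property the slide credits for easier porting.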

- 26 -

Coming Soon: pinSEL + SimpleScalar-x86

pinSEL: Pin-based “System Effects Log” generator (alternative to EIO traces)

[Diagram: PinPoints drive pinSEL via CONTROL; pinSEL produces SELs, which SimpleScalar-x86 consumes]

pinSEL Key Advantages

• Automated system-call effect analysis

• Easy port to MacOS and Windows

- 27 -

Example: pinSEL for SimpleScalar.x86

    $ pin -t pinSEL -ppfile perlbmk.makerand.pp -tracefile perlbmk.makerand -- perlbmk.exe -I lib makerand.pl

    START: icount:13 do_trace: 1
    PinPoint #: 1 phase id: 2 weight: 25.64 slice_size: 30000000
    SEL file names: perlbmk.makerand_1_0.sel perlbmk.makerand_1_0.ssi
    END: icount:30000786 do_trace: 0

Callouts: “Selective Simulation”; “Conditional Instrumentation”

- 28 -

Summary

Techniques for speeding up Pin-based simulation

1. Be selective: choose simulation regions

2. Instrument conditionally: only in “regions of interest”

Coming Soon [from UCSD]: pinSEL + SimpleScalar-x86

- 29 -

Resources

Pin Manual: Instrumentation Library: Library for common instrumentation tasks

– Controller: Identify start and stop points for instrumentation

PinPoints: Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, and Anand Karunanidhi. “Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation.” MICRO-37 (2004)

pinSEL: Satish Narayanasamy, Cristiano Pereira, Harish Patil, Robert Cohn, and Brad Calder. “Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation.” SIGMETRICS’06