Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction

Xianfeng Li, Tulika Mitra, Abhik Roychoudhury
National University of Singapore

Upload: tamra

Post on 19-Jan-2016



TRANSCRIPT

Page 1: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction

Xianfeng Li, Tulika Mitra, Abhik Roychoudhury

National University of Singapore

Page 2: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Why Timing Analysis?

Timing guarantees for real-time embedded systems

Real-time scheduling:
– Worst-case bound on execution time
– Tasks are guaranteed to be schedulable irrespective of inputs

Tight bound to avoid idle processor cycles

Extremely important for safety-critical systems

Page 3: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Worst Case Execution Time (WCET)

Maximum execution time of a program on a micro-architecture for all possible inputs

Measurement
– Execute program for all inputs: impractical
– Execute program for selected inputs to get a lower bound on WCET (Observed WCET)

Analysis
– Employ static analysis to compute an upper bound on WCET (Estimated WCET)

[Figure: Observed WCET ≤ Actual WCET ≤ Estimated WCET]

Page 4: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

WCET Analysis

Program path analysis [Shaw’89, Healy’98, ...]
– Not all possible paths in the program are feasible

Micro-architectural modeling
– Dynamically variable instruction execution time
– Cache, Pipeline [Li’99, Theiling’00, Schneider’99, ...]
– Speculative execution (branch prediction) [Mitra’02]
– Combined modeling of cache + speculative execution

Page 5: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Speculative Execution

[Figure: execution timelines for three scenarios: no speculative execution, correct prediction, and misprediction (with the misprediction penalty marked). Diagram labels: B, N, T, S.]

Page 6: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Cache + Speculation: Destructive Effect

[Figure: cache contents and execution timeline for a branch. Diagram labels: B, N, T, S. N and T map to the same cache block.]

Cache Miss 1: Loading into cache from speculated path

Cache Miss 2: Loading into cache from correct path

Page 7: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Destructive Effect: Extra Cache Misses

Cache miss penalty (CMP) along speculative path
– Fully masked by branch misprediction penalty (BMP)
– Partially masked by BMP: wait for cache miss to be serviced before executing correct path

Cache miss penalty along correct path due to fetch along speculative path

[Figure: timelines comparing BMP and CMP for the fully masked and partially masked cases]

Page 8: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Cache + Speculation: Constructive Effect

[Figure: cache contents and execution timeline for a branch. Diagram labels: B, N, S. B and S map to the same cache block.]

Cache Miss 1: Loading into cache from speculated path

Cache Hit: Correct block already loaded into cache

Page 9: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

How serious is the effect?

[Figure: bar chart titled "Cache Miss Overhead". Y-axis: modeling with interaction / modeling w/o interaction, from -10% to 20%. X-axis: matsum, matmult, bsearch, fdct, fft, dhry, whet.]

Page 10: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Technique: Integer Linear Programming

Integrate program analysis and micro-architectural modeling in an ILP framework [Li and Malik 1995]

Input:
– Control Flow Graph (CFG) of the program
– User-provided loop bounds, recursion depth, etc.
– Specification of micro-architecture

Objective function: Execution time (maximized)

Constraints
– Flow constraints from Control Flow Graph
– Constraints from micro-architectural modeling

ILP formulation of instruction cache + speculative execution

Page 11: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Objective Function

$$\mathrm{WCET} = \sum_{B} \left( cost_B \times count_B + BMP \times misprediction_B + CMP \times miss_B + mp\_delay_B \right)$$

$cost_B \times count_B$: Execution time of basic block B without cache misses and branch mispredictions

$BMP \times misprediction_B$: Penalty due to mispredictions

$CMP \times miss_B$: Penalty due to cache misses
– Includes constructive and destructive effects of speculation along correct path

$mp\_delay_B$: Penalty due to partially masked cache misses along speculative path (variable CMP)
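As a rough illustration of how the four terms of the objective combine, the following Python sketch evaluates the sum for one fixed assignment of counts. It is not the ILP itself (the solver chooses the counts that maximize this sum); the `BlockCounts` type, the per-block numbers, and the block names are hypothetical. The penalty values BMP = 5 and CMP = 10 cycles are the ones quoted in the experimental setup.

```python
from dataclasses import dataclass

BMP = 5   # branch misprediction penalty in cycles (value from the experimental setup)
CMP = 10  # cache miss penalty in cycles (value from the experimental setup)


@dataclass
class BlockCounts:
    """Hypothetical container for the per-block ILP variables of one solution."""
    cost: int           # cycles per execution, without misses or mispredictions
    count: int          # number of executions of the block
    misprediction: int  # number of mispredictions of the block's branch
    miss: int           # instruction cache misses along the correct path
    mp_delay: int       # cycles lost to partially masked misses on speculative paths


def execution_time(blocks):
    """Evaluate the objective function for a fixed assignment of counts."""
    return sum(
        b.cost * b.count + BMP * b.misprediction + CMP * b.miss + b.mp_delay
        for b in blocks.values()
    )


# Illustrative numbers only; they do not come from the slides.
blocks = {
    "B1": BlockCounts(cost=4, count=10, misprediction=2, miss=1, mp_delay=0),
    "B2": BlockCounts(cost=6, count=100, misprediction=3, miss=2, mp_delay=7),
}
total = execution_time(blocks)
print(total)
```

In the actual analysis every field except `cost` is an ILP variable, and the constraints on the following slides restrict which combinations of values are feasible.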

Page 12: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Flow Constraints: Easy!!

Inflow = Basic Block Execution Count = Outflow (bounds $count_B$)

$$e_{S,1} + e_{3,1} = count_1 = e_{1,2} + e_{1,4}$$

$$e_{1,2} + e_{2,2} = count_2 = e_{2,3} + e_{2,2}$$

$$e_{2,3} + e_{4,3} = count_3 = e_{3,1} + e_{3,E}$$

$$e_{1,4} = count_4 = e_{4,3}$$

Loop bounds (bound on maximum loop iterations):

$$e_{2,2} \le 100, \qquad e_{3,1} \le 10$$

[Figure: control flow graph with basic blocks B1, B2, B3, B4]
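The flow constraints can be checked mechanically for any candidate assignment of edge counts. The sketch below is only a feasibility checker written for illustration, not the authors' analyzer: the function name `check_flow` and the concrete edge counts are hypothetical, the loop bounds are read as upper bounds (≤), and the entry edge from S is assumed to be taken once.

```python
from collections import defaultdict


def check_flow(edges, loop_bounds):
    """Return block execution counts if `edges` satisfies the flow
    constraints (inflow = outflow for every block) and the loop bounds,
    otherwise return None.

    edges:       dict mapping (src, dst) -> edge count
    loop_bounds: dict mapping (src, dst) -> maximum allowed edge count
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (src, dst), n in edges.items():
        outflow[src] += n
        inflow[dst] += n

    # S and E are the virtual start and end nodes, not basic blocks.
    blocks = (set(inflow) | set(outflow)) - {"S", "E"}
    for b in blocks:
        if inflow[b] != outflow[b]:
            return None
    for edge, bound in loop_bounds.items():
        if edges.get(edge, 0) > bound:
            return None
    return {b: inflow[b] for b in blocks}


# Edges of the four-block CFG on this slide, with illustrative counts.
edges = {
    ("S", "B1"): 1, ("B1", "B2"): 6, ("B1", "B4"): 5,
    ("B2", "B2"): 100, ("B2", "B3"): 6, ("B4", "B3"): 5,
    ("B3", "B1"): 10, ("B3", "E"): 1,
}
bounds = {("B2", "B2"): 100, ("B3", "B1"): 10}
counts = check_flow(edges, bounds)
print(counts)
```

An ILP solver searches over all assignments that pass this kind of check and picks the one with the largest objective value.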

Page 13: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Other Constraints

Branch misprediction constraints
– Bounds $misprediction_B$
– Details appeared in an earlier paper: Timing Analysis of Embedded Software for Speculative Processors. T. Mitra, A. Roychoudhury and X. Li. In ACM Intl. Symposium on System Synthesis (ISSS) 2002

Instruction cache miss constraints
– Bounds $miss_B$ [Li, Malik and Wolfe 1999]

Page 14: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Modeling Cache-Speculation Interaction

Modify instruction cache miss constraints to model constructive/destructive effect of speculation along correct path

Add additional constraints on $mp\_delay_B$: Penalty due to partially masked cache misses along speculative path

Page 15: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

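Modeling Instruction Cache

[Figure: control flow graph with blocks B1, B2, B3, B4, where B1 and B3 map to the same cache line, next to the Cache Conflict Graph with nodes S, B1, B3, E and edges $p_{S,1}$, $p_{1,3}$, $p_{3,1}$, $p_{3,E}$.]

Cache Conflict Graph: flow among blocks mapping to the same cache line

$$p_{S,1} + p_{3,1} = count_1 = p_{1,3}$$

$$miss_1 = p_{S,1} + p_{3,1}$$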
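To make the cache conflict graph flows concrete, the sketch below counts them for one execution trace. It is an illustration under stated assumptions, not part of the analysis: the ILP reasons about all paths at once, whereas this code looks at a single hypothetical trace, and it assumes a direct-mapped cache in which each conflicting block occupies exactly one line. The function names and the trace are made up for the example.

```python
from collections import Counter


def conflict_flows(trace, conflicting):
    """Count flows p[(u, v)] between consecutive executions of the blocks
    in `conflicting` (blocks that map to one cache line) along `trace`.
    "S" and "E" are the virtual start and end nodes."""
    flows = Counter()
    prev = "S"
    for block in trace:
        if block in conflicting:
            flows[(prev, block)] += 1
            prev = block
    flows[(prev, "E")] += 1
    return flows


def misses(flows, block):
    """A block misses whenever the previous occupant of its cache line was
    a different node, i.e. on every incoming flow except a self-flow."""
    return sum(n for (u, v), n in flows.items() if v == block and u != block)


# Hypothetical trace through the CFG; B1 and B3 share a cache line.
trace = ["B1", "B2", "B2", "B3", "B1", "B4", "B3"]
flows = conflict_flows(trace, {"B1", "B3"})
print(dict(flows), misses(flows, "B1"))
```

For this trace the miss count of B1 equals the sum of the flows into B1 from S and from B3, which matches the constraint for miss1 on the slide.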

Page 16: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Constructive Effect of Speculation

[Figure: control flow graph of B1, B2, B3, B4 with N/T branch edges, where B1 and B3 map to the same cache line. An extra node B3 (2,T) is shown. Annotations: Speculative Path, Correct Path, Miss, Miss, Partially Masked CMP.]

Page 17: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Constructive Effect of Speculation

[Figure: same control flow graph and cache line (B1, B3) with the extra node B3 (2,T). Annotations: Speculative Path, Correct Path, Partially Masked CMP, Hit, Miss, Miss.]

$miss_3$ will decrease by the amount of flow between B3 (2,T) and B3

Page 18: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Destructive Effect of Speculation

[Figure: control flow graph of B1, B2, B3, B4 with N/T branch edges, where B2 and B4 map to the same cache line. An extra node B4 (1,N) is shown. Annotations: Speculative Path, Correct Path, Miss, Miss, Partially Masked CMP, Hit.]

$miss_2$ will increase by the amount of flow between B4 (1,N) and B2
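The constructive and destructive cases can be reproduced with a toy model of a single cache line. This is a minimal sketch under simplifying assumptions, not the authors' formulation: every block in the event list is assumed to map to the same line, the `count_misses` function and the event encoding are hypothetical, and only misses on the correct path are counted (misses on the speculative path are charged separately through mp_delay).

```python
def count_misses(events):
    """Count correct-path misses on one cache line.

    events: list of (kind, block) pairs in fetch order, where kind is
            "exec" for a fetch on the correct path and "spec" for a
            fetch on a mispredicted (speculative) path.
    """
    line = None   # block currently held in the cache line
    misses = 0
    for kind, block in events:
        if line != block:
            if kind == "exec":
                misses += 1
            line = block  # speculative fetches also replace the line's content
    return misses


# Constructive: B3 is prefetched speculatively, so its real execution hits.
constructive = count_misses([("exec", "B1"), ("spec", "B3"), ("exec", "B3")])
no_spec_1 = count_misses([("exec", "B1"), ("exec", "B3")])

# Destructive: a speculative fetch of B4 evicts B2, so B2 misses again.
destructive = count_misses([("exec", "B2"), ("spec", "B4"), ("exec", "B2")])
no_spec_2 = count_misses([("exec", "B2"), ("exec", "B2")])

print(constructive, no_spec_1, destructive, no_spec_2)
```

In the first pair speculation removes one miss, and in the second it adds one, which is the change that the modified miss constraints capture as flow through the extra nodes.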

Page 19: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

General Flow Involving Extra Nodes

[Figure: four cases (Case 1 to Case 4) of flow in the cache conflict graph involving extra nodes. Diagram labels: normal nodes n, n1, b, b1; extra nodes m (b,X), m1 (b,X), m2 (b1,Y); branch directions X, Y.]

Page 20: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Additional Constraints

CMP > BMP

$$count(m_i(b,X)) = misprediction(b,X) - \sum_{k=1}^{i-1} miss(m_k(b,X))$$

$$delay(m_i(b,X)) = CMP - \left( BMP - \sum_{k=1}^{i-1} cost(m_k(b,X)) \right)$$

$$mp\_delay(b,X) = \sum_{k=1}^{n} miss(m_k(b,X)) \times delay(m_k(b,X))$$

And some others ...

[Figure: branch b with direction X followed by blocks B1, B2, ..., Bn on the speculative path within the BMP window]
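A small numeric sketch may help with the delay and mp_delay formulas. It assumes CMP = 10 and BMP = 5 cycles (the values from the experimental setup, which satisfy CMP > BMP) and uses made-up block costs and miss counts; the function names are hypothetical. It also assumes every listed block starts inside the BMP window, so the cumulative cost before it is below BMP.

```python
def speculative_delays(costs, cmp_cycles=10, bmp_cycles=5):
    """delay(m_i) = CMP - (BMP - sum of cost(m_k) for k < i).

    costs: execution costs of the blocks m_1..m_n on the speculative
           path, in fetch order.
    """
    delays = []
    elapsed = 0  # cycles of the BMP window already used by earlier blocks
    for cost in costs:
        delays.append(cmp_cycles - (bmp_cycles - elapsed))
        elapsed += cost
    return delays


def mp_delay(miss_counts, delays):
    """mp_delay(b, X) = sum over k of miss(m_k) * delay(m_k)."""
    return sum(m * d for m, d in zip(miss_counts, delays))


# Three speculative blocks with illustrative costs of 2, 2 and 1 cycles.
delays = speculative_delays([2, 2, 1])
penalty = mp_delay([1, 0, 2], delays)
print(delays, penalty)
```

The later a miss occurs on the speculative path, the less of the misprediction window is left to hide it, so the unmasked delay grows with the position of the block.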

Page 21: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Benchmarks

| Program | Description | Paths / Loops |
|---|---|---|
| matsum | Summation of two 100 × 100 matrices | S |
| matmult | Multiplication of two 10 × 10 matrices | S |
| isort | Insertion sort of 100-element array | |
| bsearch | Binary search of 100-element array | |
| fft | 1024-point Fast Fourier Transform | S |
| fdct | Fast Discrete Cosine Transform | S |
| dhry | Dhrystone benchmark | S |
| des | Data Encryption Standard | |
| whet | Whetstone benchmark | S |
| djpg | Decompress 128 × 96 color JPG image | |

Page 22: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Experimental Methodology

Observed WCET: simulation
– SimpleScalar cycle-accurate architectural simulator
– In-order execution, no pipeline, no data cache misses
– Branch misprediction penalty = 5 cycles
– Cache miss penalty = 10 cycles

Estimated WCET: prototype analyzer
– Input: benchmark in assembly code, micro-architecture parameters, loop bounds
– Output: ILP constraints
– Feed the constraints to CPLEX, a commercial ILP solver

Page 23: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Accuracy (Smaller Benchmarks)

| Program | WCET (Obs) | WCET (Est) | Ratio | Misprediction Est/Obs | Cache miss Est/Obs |
|---|---|---|---|---|---|
| matsum | 105K | 106K | 1.00 | 1.00 | 1.33 |
| matmult | 25.1K | 25.6K | 1.02 | 1.05 | 1.03 |
| isort | 48.6K | 48.8K | 1.00 | 1.02 | 1.02 |
| bsearch | 506 | 546 | 1.07 | 1.25 | 1.06 |
| fft | 8798 | 8803 | 1.00 | 1.00 | 1.00 |
| fdct | 219K | 229K | 1.04 | 1.66 | 1.19 |

Page 24: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Accuracy (Larger Benchmarks)

| Program | WCET (Obs) | WCET (Est) | Ratio | Misprediction Est/Obs | Cache miss Est/Obs |
|---|---|---|---|---|---|
| dhry | 218.6K | 232.5K | 1.06 | 0.96 | 1.18 |
| des | 87.4K | 96.4K | 1.10 | 2.54 | 1.07 |
| whet | 545.5K | 581.5K | 1.06 | 2.81 | 1.29 |
| djpg | 44.9M | 65.2M | 1.44 | 3.25 | 1.37 |

Page 25: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Scalability

[Figure: two plots of time (y-axis from 0.0 s to 3.0 s) for fft, dhry, des, and whet. Left: versus predictor table size (16, 32, 64, 128, 256, 512, 1K). Right: versus cache size (32, 64, 128, 256, 512, 1K, 2K).]

Page 26: Accurate Timing Analysis by Modeling Caches,  Speculation and their Interaction

Summary

Micro-architectural modeling is crucial for tight estimation of Worst Case Execution Time (WCET)

Existing methods typically focus on a single micro-architectural feature
– Cache
– Pipeline
– Speculation

A step towards combining micro-architectural features which affect each other
– Cache misses/hits due to speculation