Page 1: Tutorial EW08 AbsInt

Tutorial on Timing Analysis and Optimization

Is your program always fast enough?

Dr. Christian Ferdinand, AbsInt Angewandte Informatik GmbH

Dr. Kai Richter, Symtavision GmbH

Page 2: Tutorial EW08 AbsInt

AbsInt Angewandte Informatik GmbH

Provides advanced development tools for embedded systems, and tools for validation, verification, and certification of safety-critical software

Founded in February 1998 by six researchers of Saarland University, Germany

Privately held by the founders

[Figure: staff growth graph, 1998 to 2008]

Page 3: Tutorial EW08 AbsInt

Key Products

Page 4: Tutorial EW08 AbsInt

Hard Real-Time Systems

Controllers in planes, cars, plants, … are expected to finish their tasks within reliable time bounds.

Schedulability analysis must be performed.

Hence, it is essential that an upper bound on the execution times of all tasks is known, commonly called the Worst-Case Execution Time (WCET).

Page 5: Tutorial EW08 AbsInt

The Timing Problem

[Figure: probability distribution of execution times. Marked along the execution-time axis: best-case execution time; unsafe: execution time measurement; exact worst-case execution time; safe worst-case execution time estimate.]

Page 6: Tutorial EW08 AbsInt

The Ever-Growing Gap

x = a + b;

LOAD r2, _a
LOAD r1, _b
ADD r3,r2,r1

[Figure: best-case vs. worst-case execution time (clock cycles) of this code on the 68K (1990), MPC 5xx (2000), and PPC 755 (2001), illustrating the growing gap between best and worst case. One panel shows the execution time depending on flash memory: 0 wait cycles, 1 wait cycle, external (6,1,1,1,..).]

Page 7: Tutorial EW08 AbsInt

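(Concrete) Instruction Execution

[Figure: timeline of a mul instruction passing through the stages Fetch (I-Cache miss?), Issue (Unit occupied?), Execute (Multicycle?), and Retire (Pending instructions?); each question marks a condition that can add cycles to that stage.]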

Page 8: Tutorial EW08 AbsInt

Murphy’s Law in Timing Analysis

A naïve but safe guarantee accepts Murphy’s Law: any accident that may happen will happen.

Consequence: hardware overkill is necessary to guarantee timeliness.

Example (EADS study): measured performance of a PPC 603e with all caches switched off.

This corresponds to the assumption “all memory accesses miss the cache”.

Result: a slowdown by a factor of 30!

Page 9: Tutorial EW08 AbsInt

Fighting Murphy’s Law

Static program analysis allows the derivation of invariants about all execution states at a program point.

Derive safety properties from these invariants:

Certain timing accidents will never happen.

Example: At program point p, instruction fetch will never cause a cache miss.

The more accidents are excluded, the lower the upper bound.

Page 10: Tutorial EW08 AbsInt

aiT WCET Analyzer

The solution to the timing problem: global program analysis, using abstract interpretation for cache, pipeline, and value analysis, and integer linear programming for path analysis.

Everything combined in a single intuitive GUI.

Page 11: Tutorial EW08 AbsInt

Structure of the aiT WCET Analyzer

Page 12: Tutorial EW08 AbsInt

Example: Direct Mapped I-Cache

[Figure: the CPU fetches instructions at the program counter through a direct-mapped I-cache backed by main memory. Program in memory: 1024: mul …, 1028: add …, 1032: ble 1024.]

Cache Hit: ~ 1 Cycle

Cache Miss: ~ +1 to +100 Cycles

Page 13: Tutorial EW08 AbsInt

Cache Analysis

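Must analysis: for each program point and calling context, find out which blocks are guaranteed to be in the cache. Accesses to these blocks are always hits.

May analysis: for each program point and calling context, find out which blocks may be in the cache. Accesses to blocks outside this set are always misses.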

Example: Fully Associative Cache (2 Elements)

Page 14: Tutorial EW08 AbsInt

Set Associative Cache

[Figure: the address issued by the CPU is split into address prefix, set number, and byte in line. The set number selects a set: a fully associative subcache of A elements (1, 2, …, A), each entry holding a tag (address prefix), replacement information, and a data block, with an LRU, FIFO, or random replacement strategy. The address prefix is compared with the stored tags; if not equal, the block is fetched from main memory. Byte select & align then delivers Data Out.]

Page 15: Tutorial EW08 AbsInt

Pipelines

Ideal case: 1 instruction per cycle

[Figure: instructions Inst 1 to Inst 4 overlapping in a four-stage pipeline (Fetch, Decode, Execute, Write back), each instruction entering the pipeline one cycle after its predecessor.]

Page 16: Tutorial EW08 AbsInt

Pipeline Analysis

Goal: calculate all possible pipeline states at a program point

Method: perform a cycle-wise evolution of the pipeline, determining all possible successor pipeline states

Implementation: from a formal model of the pipeline, its stages, and the communication between them

Generation: from a PAG specification

Result: WCET for basic blocks

Page 17: Tutorial EW08 AbsInt

Pipeline Model

[Figure: the MPC555 block diagram and the RCPU block diagram, next to the aiT visualization of aiT's internal pipeline model.]

Page 18: Tutorial EW08 AbsInt

Visualization of Pipeline Analysis Results

Page 19: Tutorial EW08 AbsInt

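Path Analysis: Example (simplified constraints)

if a then b
elseif c then d
else e
endif
f

[Figure: control-flow graph with basic blocks a (4t), b (10t), c (3t), d (2t), e (6t), f (5t).]

$$\max:\; 4x_a + 10x_b + 3x_c + 2x_d + 6x_e + 5x_f$$

where

$$x_a = x_b + x_c,\qquad x_c = x_d + x_e,\qquad x_f = x_b + x_d + x_e,\qquad x_a = 1$$

Value of objective function: 19, with x_a = 1, x_b = 1, x_c = 0, x_d = 0, x_e = 0, x_f = 1.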

Page 20: Tutorial EW08 AbsInt

A Hybrid Approach: Combining Block Measurements with Static Analysis

Measurements of execution times of blocks (emulator, logic analyzer, Nexus, ETM, …)

Avoids the high cost of micro-architecture modeling, but requires “measuring” all local worst-case behaviors

Regrettably, this is nearly impossible, so the approach is generally not safe! Nevertheless, it can be quite useful for optimizations by hand.

Page 21: Tutorial EW08 AbsInt

Some Architectural Features that make Measurement-Based WCET Analysis a Challenge

Fine-grain timing measurement is not always possible: instrumentation changes timing behavior, and debug interfaces are rarely available in “real” embedded applications.

The empty cache is not necessarily the “worst case cache”

“Domino” effects

Page 22: Tutorial EW08 AbsInt

Domino Effect

A timing anomaly: the execution-time increase is not bounded by hardware-determined constants. Certain instruction sequences, e.g. in loop bodies, can trigger this effect and increase latencies in further iterations.

Page 23: Tutorial EW08 AbsInt

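Pseudo-LRU Replacement (e.g., PPC G3)

Each setting of B[0..2] points to a specific line:

[Figure: binary tree of PLRU bits: root B0, with B1 above lines L0 and L1 and B2 above lines L2 and L3; the edges below each bit are labeled 1 (left) and 0 (right).]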

Page 24: Tutorial EW08 AbsInt

4-way PLRU Domino Effect

Sequence: c, d, f, c, d, h. This sequence is then repeated ad infinitum.

Cache contents (lines L0 L1 L2 L3) and PLRU bits (B0B1 B2) after each access:

Access   Empty cache       Non-empty cache
start    . . . .   00 0    f e a b   00 0
c        c . . .   11 0    c e a b   11 0
d        c d . .   10 0    c e d b   01 1
f        c d f .   00 1    c f d b   10 1
c        c d f .   11 1    c f d b   11 1
d        c d f .   10 1    c f d b   01 1
h        c d f h   00 0    c h d b   10 1
c        c d f h   11 0    c h d b   11 1
d        c d f h   10 0    c h d b   01 1
f        c d f h   00 1    c f d b   10 1
c        c d f h   11 1    c f d b   11 1
d        c d f h   10 1    c f d b   01 1
h        c d f h   00 0    c h d b   10 1

Empty cache: only cache hits once c, d, f, and h have been loaded.

Non-empty cache: two misses each time the sequence is repeated.

Page 25: Tutorial EW08 AbsInt

aiT WCET Analysis Input/Output

[Figure: the application code is translated by the compiler and linker into an executable (*.elf / *.out). aiT takes the executable, the specifications (*.ais), and the entry point as input, and produces the worst-case execution time together with visualization and documentation.]

Application code:

void Task (void)
{
    variable++;
    function();
    next++;
    if (next)
        do_this();   /* "do this" on the slide */
    terminate();
}

Specifications (*.ais):

clock 10200 kHz ;
loop "_codebook" + 1 loop exactly 16 end ;
recursion "_fac" max 6;
SNIPPET "printf" IS NOT ANALYZED AND TAKES MAX 333 CYCLES;
flow "U_MOD" + 0xAC bytes / "U_MOD" + 0xC4 bytes is max 4;
area from 0x20 to 0x497 is read-only;

Page 26: Tutorial EW08 AbsInt

Hardware Settings

Hardware settings have to be specified in aiT according to the target processor configuration in the start-up code.

Page 27: Tutorial EW08 AbsInt

Challenge: Reconstruction of the CFG

Indirect jumps: Case/switch statements as compiled by the C compiler are automatically recognized. For hand-written assembly code, annotations might be necessary: INSTRUCTION ProgramPoint BRANCHES TO Target1, …, Targetn

Indirect calls: These can often be recognized automatically if a static array of function pointers is used. For other cases: INSTRUCTION ProgramPoint CALLS Target1, …, Targetn

Page 28: Tutorial EW08 AbsInt

Loops

aiT includes a loop bound analysis based on interval analysis and pattern matching that is able to recognize the iteration count of many “simple” FOR loops automatically.

Other loops need to be annotated. Example: loop "_prime" + 1 loop end max 10;

Page 29: Tutorial EW08 AbsInt

Source Level Annotations

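#include <stdbool.h>

typedef unsigned int uint;

/* Helper used below but not shown on the slide. */
static bool even (uint n) { return n % 2 == 0; }

bool divides (uint n, uint m) {
    /* ai: SNIPPET HERE NOT ANALYZED, TAKES MAX 173 CYCLES; */
    return (m % n == 0);
}

bool prime (uint n) {
    uint i;
    if (even (n))
        /* ai: SNIPPET HERE INFEASIBLE; */
        return (n == 2);
    for (i = 3; i * i <= n; i += 2) {
        /* ai: LOOP HERE MAX 20; */
        if (divides (i, n))
            return false;
    }
    return (n > 1);
}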

Page 30: Tutorial EW08 AbsInt

aiT: Timing Details

Page 31: Tutorial EW08 AbsInt

Recent Advances

Source: studies by Lim et al. (1995), Thesing et al. (2002), and Souyris et al. (2005)

[Figure: cache-miss penalties vs. WCET overestimation]

Page 32: Tutorial EW08 AbsInt

Master’s Thesis of Daniel Sehlberg, Mälardalen University, Sweden, ASTEC Project, August 2005

Real-time tasks under Rubus OS on C16x, taken from a Volvo CE application

Page 33: Tutorial EW08 AbsInt

WCET Challenge 2006, organized by the University of Mälardalen: http://www.idt.mdh.se/personal/jgn/challenge/

Aim: Compare different approaches in analyzing the Worst-Case Execution Time

Excerpts from the final report: “aiT is able to handle every kind of benchmark and every test program that was tested in the Challenge. aiT is able to support WCET analysis even for complex processors.” “aiT demonstrates its leading position through all its features […]”

Full report: http://dc.informatik.uni-essen.de/Tan/all/

Page 34: Tutorial EW08 AbsInt

SCADE / aiT automated Flow

Page 35: Tutorial EW08 AbsInt

Analysis Reports

Customizable HTML reports

Global and detailed reports

Diff feature

Page 36: Tutorial EW08 AbsInt

Integration with ETAS/ASCET

aiT/StackAnalyzer is started from the ASCET main menu. ASCET generates the annotation files, and the analyses are performed in the background.

Page 37: Tutorial EW08 AbsInt

Practical Experiments, Execution Time

Engine throttle control module specified in ASCET; Tasking compiler v7.5; STM ST10F269 microcontroller board. Run-times were extracted from bus traces (ISYSTEMS ILA 128 logic analyzer). The worst-case path information provided by aiT was used to manually construct a corresponding input.

Page 38: Tutorial EW08 AbsInt

Practical Experiments: Stack Usage

ST10/C16x uses two stacks. Most generated functions neither use local variables nor call subroutines, i.e., their stack usage is zero.

Page 39: Tutorial EW08 AbsInt

Integration with Scheduling Analysis

[Figure: System level: SymTA/S, with the system model (tasks, activation, scheduling), scheduling analysis (WCRT), and system stack analysis. Code level: aiT/StackAnalyzer, performing WCET/stack analysis for a single task. The two levels are connected by arrows labeled WCET/stack request, WCET/stack response, additional info, and refinement.]

Page 40: Tutorial EW08 AbsInt

Future Work

Extraction of timing (pipeline) models from HW description (VHDL)

Use of source-level program analyses

Tighter integration with measurement based approaches

Early phase worst-case execution time estimation

Page 41: Tutorial EW08 AbsInt

aiT WCET Analyzer Advantages

Inspect the worst-case timing behavior of (critical parts of) your code

Tight WCET bounds reflect the actual worst-case performance of your system

Determined automatically

Valid for all inputs and all execution scenarios

No modification of your code or tool chain required

Page 42: Tutorial EW08 AbsInt

aiT Visualization Features

Precise insight into the program and processor behavior

Valuable feedback for optimizing your program

Page 43: Tutorial EW08 AbsInt

Conclusion

aiT enables the development of complex hard real-time systems on state-of-the-art hardware

Increases safety

Saves development time and costs

Usability proven in industrial practice

Page 44: Tutorial EW08 AbsInt

Contact

Visit us!

Hall 10, booth 403

Coffee break

We start again at 11h