
Hier wird Wissen Wirklichkeit Computer Architecture – Part 6 – page 1 of 22 – Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt

Part 6

Fundamentals in Performance Evaluation

Computer Architecture

Slide Sets

WS 2011/2012

Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt


Why performance evaluation?

Comparison of computers

Selection of a computer

Changes in the configuration of an existing computer (tuning)

Design of computers

Verification or validation of design decisions

Methods for performance evaluation:

(1) analytical methods

(2) measurements


Aspects for evaluation

modularity Is the system composed of mostly independent parts, so called modules?

orthogonality Does every module offer its own set of functions to the system? Is no particular function offered by more than one module?

adequacy Do the performance and cost of a module match its importance for the whole system?

virtuality Are the physical limits of the hardware modules hidden from the user? (Example: virtual memory)

symmetry Is it possible to derive the function of unknown parts from the properties of some known parts of the architecture, e.g. parts of the ISA?

transparency Are irrelevant parts of the architecture hidden from the user? (Example: transparent coprocessor)


Analytical methods

Performance measures: (hypothetical maximum performance!)

MIPS (Millions of Instructions per Second)

MFLOPS (Millions of Floating Point Operations per Second)

Mix: (also calculated, not measured)

In a mix, the average execution time of each instruction is calculated and scaled by a characteristic weight.

Core-Programs:

Typical application programs, written for the evaluated computer

No measurements: the overall execution time is calculated from the execution times of the individual machine instructions
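The two analytical methods above can be sketched in a few lines. This is only an illustration: the instruction classes, execution times and weights in `MIX` are hypothetical values chosen for the example, not data from a real machine.

```python
# Hypothetical instruction classes: (execution time in ns, weight in the mix).
# All numbers are illustrative assumptions, not measured values.
MIX = {
    "load/store": (2.0, 0.30),
    "alu": (1.0, 0.50),
    "branch": (3.0, 0.15),
    "multiply": (5.0, 0.05),
}


def mix_average_time(mix):
    """Average execution time per instruction, scaled by the mix weights."""
    total_weight = sum(weight for _, weight in mix.values())
    return sum(time * weight for time, weight in mix.values()) / total_weight


def core_program_time(instruction_counts, mix):
    """Calculated (not measured) runtime of a core program:
    sum of instruction counts times per-instruction execution times."""
    return sum(count * mix[name][0] for name, count in instruction_counts.items())


average_ns = mix_average_time(MIX)
program_ns = core_program_time(
    {"load/store": 300, "alu": 500, "branch": 150, "multiply": 50}, MIX
)
print(f"mix average: {average_ns:.2f} ns per instruction")
print(f"core program: {program_ns:.0f} ns")
```

Both results are purely calculated, so they describe a hypothetical best case rather than observed behavior.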


Performance measures

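• runtime

$$\text{runtime} = \#\,\text{clock cycles} \cdot \text{clock period}$$

• MIPS (million instructions per second)

$$\text{MIPS} = \frac{\text{instruction count}}{\text{runtime} \cdot 10^{6}}$$

$$\text{MIPS} = \frac{\text{instruction count}}{\#\,\text{clock cycles} \cdot \text{clock period} \cdot 10^{6}} = \frac{\text{instruction count} \cdot \text{clock frequency}}{\#\,\text{clock cycles} \cdot 10^{6}}$$

$$\text{MIPS} = \frac{\text{clock frequency}}{\text{CPI} \cdot 10^{6}} = \frac{\text{clock frequency} \cdot \text{IPC}}{10^{6}}$$

• MFLOPS (million floating point operations per second)

$$\text{MFLOPS} = \frac{\#\,\text{executed floating point instructions}}{\text{runtime} \cdot 10^{6}}$$

• CPI (cycles per instruction)

$$\text{CPI} = \frac{\#\,\text{clock cycles}}{\text{instruction count}}$$

• IPC (instructions per cycle)

$$\text{IPC} = \frac{1}{\text{CPI}}$$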
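As a small worked example of these definitions, the following sketch computes runtime, CPI, IPC and MIPS for one program run. The instruction count, cycle count and clock frequency are made-up values for illustration.

```python
def runtime_s(clock_cycles, clock_frequency_hz):
    """runtime = # clock cycles * clock period (clock period = 1 / frequency)."""
    return clock_cycles / clock_frequency_hz


def cpi(clock_cycles, instruction_count):
    """Cycles per instruction."""
    return clock_cycles / instruction_count


def ipc(clock_cycles, instruction_count):
    """Instructions per cycle, the reciprocal of CPI."""
    return instruction_count / clock_cycles


def mips(instruction_count, clock_cycles, clock_frequency_hz):
    """MIPS = instruction count / (runtime * 10^6)."""
    return instruction_count / (runtime_s(clock_cycles, clock_frequency_hz) * 1e6)


# Hypothetical run: 2e9 instructions in 3e9 cycles on a 1.5 GHz clock.
instructions, cycles, frequency = 2_000_000_000, 3_000_000_000, 1.5e9

run_time = runtime_s(cycles, frequency)
run_cpi = cpi(cycles, instructions)
run_ipc = ipc(cycles, instructions)
run_mips = mips(instructions, cycles, frequency)

# The alternative form MIPS = clock frequency / (CPI * 10^6) gives the same value.
run_mips_alt = frequency / (run_cpi * 1e6)
print(run_time, run_cpi, run_ipc, run_mips, run_mips_alt)
```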


Drawbacks of performance measures

• CPI, IPC, MIPS and MFLOPS depend on the instruction set.

• CPI, IPC, MIPS and MFLOPS depend on the program.

• CPI, IPC, MIPS and MFLOPS depend on the microarchitecture.

Conclusions:

• A higher MIPS or MFLOPS rating does not necessarily mean more performance!

• It is of vital importance to choose well-suited test applications (benchmarks)!
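A short sketch of why a higher MIPS rating can mislead: two hypothetical machines run the same program, but their instruction sets need different instruction counts. All numbers are assumptions chosen to make the effect visible.

```python
def runtime_s(instruction_count, cpi, clock_frequency_hz):
    """runtime = instruction count * CPI * clock period."""
    return instruction_count * cpi / clock_frequency_hz


def mips(instruction_count, cpi, clock_frequency_hz):
    """MIPS = instruction count / (runtime * 10^6)."""
    return instruction_count / (runtime_s(instruction_count, cpi, clock_frequency_hz) * 1e6)


# Hypothetical machine A: simple instructions, many of them, CPI = 1.
machine_a = {"instruction_count": 3_000_000_000, "cpi": 1.0, "clock_frequency_hz": 1e9}
# Hypothetical machine B: more powerful instructions, fewer of them, CPI = 2.
machine_b = {"instruction_count": 1_000_000_000, "cpi": 2.0, "clock_frequency_hz": 1e9}

mips_a, time_a = mips(**machine_a), runtime_s(**machine_a)
mips_b, time_b = mips(**machine_b), runtime_s(**machine_b)

print(f"A: {mips_a:.0f} MIPS, {time_a:.1f} s")
print(f"B: {mips_b:.0f} MIPS, {time_b:.1f} s")
```

In this constructed case machine A has twice the MIPS rating of machine B, yet machine B finishes the same program sooner.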


Measurements

Benchmarks

Use of existing or synthetic programs to measure the performance

These programs are translated and executed on the evaluated computer

Therefore, not only the computer hardware but also the compiler influences the outcome of a benchmark

Monitoring:

Monitors are used to observe parts of the computer at run-time

Therefore, interesting quantities inside the computer can be measured in addition to the overall outcome of a benchmark (e.g. cache utilization, network traffic, …)

Monitoring can be done by hardware or software


Benchmark terminology

benchmark A test program.

benchmark suite A set of benchmarks.

synthetic benchmark A test program only useful as a benchmark.

kernel benchmark A very small synthetic benchmark. Usually a time-intensive part of a real program is chosen. Kernel benchmarks are well suited for design and simulation but normally unsuited to comparing complete systems.

benchmark application A complete program additionally used as a benchmark. The opposite of a synthetic benchmark.


SPEC-Benchmarks

SPEC Standard Performance Evaluation Corporation

since 1989, consortium of different manufacturers,

general purpose computer applications,

mainly to measure speed and throughput

Several benchmark suites, e.g.

SPEC95,

SPECweb96,

SPEC JVM98

SPEC JBB2000

SPEC CINT 2006

SPEC CFP 2006


SPECmarks

• Goal: comparable values for different systems

• But: single values don't always reflect real relations, therefore only a first indication to select or judge a computer

• CPU performance plus cache, memory and compiler is measured, the operating system and I/O are less relevant

– Integer test programs (ANSI C)

– Floating-point test programs (Fortran 77)

– "SPECmark": this characteristic is the geometric mean of the individual program characteristics contained in the suite
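The geometric mean behind a SPECmark can be sketched as follows. The per-program values are hypothetical; here they are assumed to be speed ratios relative to a reference machine.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-program ratios (assumed: reference time / measured time).
ratios = [2.0, 8.0, 4.0]

specmark = geometric_mean(ratios)
arithmetic = sum(ratios) / len(ratios)
print(f"geometric mean: {specmark:.3f}, arithmetic mean: {arithmetic:.3f}")
```

Compared with the arithmetic mean, the geometric mean is pulled up less by a single very large ratio.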


SPEC-CINT2006: 12 integer test programs (C, C++)

name        description
perlbench   PERL interpreter
bzip2       bzip compression program
gcc         GNU C compiler version 3.2
mcf         Simplex algorithm for traffic planning
gobmk       AI implementation of the game Go
hmmer       Protein sequence analysis based on a hidden Markov model
sjeng       Chess program
libquantum  Quantum computer simulator
h264ref     H.264 codec
omnetpp     OMNeT++ discrete event simulator
astar       Route planning
xalancbmk   XML translator


SPEC-CFP2006: 17 floating-point test programs (C, C++, FORTRAN)

name        description
bwaves      Fluid dynamics algorithm
gamess      Quantum chemistry algorithm
milc        Physics algorithm
zeusmp      Fluid dynamics algorithm
gromacs     Newton's equations of motion
cactusADM   Equation solver for Einstein's evolutionary equation
leslie3d    Fluid dynamics algorithm
namd        Biomolecular simulation
dealII      Finite elements
soplex      Simplex algorithm
povray      Image rendering
calculix    Finite elements
GemsFDTD    Maxwell equation solver
tonto       Quantum chemistry
lbm         Lattice-Boltzmann simulator
wrf         Weather modeling
sphinx3     Speech recognition


More popular benchmark suites

Basic Linear Algebra Subprograms (BLAS):

For numerical applications

Core of the LINPACK software package to solve linear equation systems

LINPACK is used for the TOP500 list of the fastest parallel computers

Whetstone-Benchmark:

Developed in the seventies, a single program with a lot of floating-point calculations

Dhrystone-Benchmark:

Improvement of Whetstone, developed in the eighties

Powerstone-Benchmark-Suite:

To compare the energy consumption of microprocessors and microcontrollers


Powerstone benchmark suite

name      description
auto      Vehicle control
bilv      Logical and shift operations
blit      Graphical application
compress  UNIX compression program
crc       CRC error detection
des       Data encryption
dhry      Dhrystone
engine    Engine control
fir_int   Integer FIR filter
g3fax     FAX group 3
g721      Audio compression
jpeg      JPEG 24-bit compression
pocsag    Communication protocol for pagers
servo     Hard disk control
summin    Handwriting recognition
ucbqsort  Quick sort
v42bis    Modem operation
whet      Whetstone


Monitoring

Monitors are components recording the states of a system during its normal operation.

Contents of registers, flags, buffers and traffic in data paths are recorded.

Monitors are used to observe and debug systems.


Monitoring

Generally, monitors can be classified into:

a) Hardware monitors

A hardware monitor is a separate component which is physically connected to the locations of the target system where measurements take place.

Hardware monitors typically consist of comparators and counters to create data, memories to store it and buses for data transport.

Thus, hardware monitors use their own resources.


Monitoring

b) Software monitors

A software monitor is a program that collects measurement data through interfaces provided by the operating system, the programming language or the application program.

A software monitor uses the resources of the observed system to collect, transport and store data.

c) Hybrid monitors

A hybrid monitor is a mixed hardware and software monitor.

Often simple elements like counters and memories are implemented in hardware while more complex observation functions are implemented in software.
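A software monitor that uses an interface provided by the programming language might look like the sketch below. It relies on Python's `sys.setprofile` hook to count function calls; the observed function `fib` and the names `monitor` and `call_counts` are hypothetical and only serve the illustration.

```python
import sys
from collections import Counter

# Measurement data is stored in the memory of the observed system itself.
call_counts = Counter()


def monitor(frame, event, arg):
    """Profile hook: invoked by the interpreter on call and return events."""
    if event == "call":
        call_counts[frame.f_code.co_name] += 1


def fib(n):
    """Observed application code (hypothetical workload)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


sys.setprofile(monitor)   # switch the software monitor on
value = fib(10)
sys.setprofile(None)      # switch it off again

print(f"fib(10) = {value}, calls to fib: {call_counts['fib']}")
```

Because the hook runs inside the observed interpreter, it consumes processor time and memory of the observed system, which is exactly the interference described for software monitors.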


Monitoring constraints

1. Accessing information

Ideally, monitoring is integrated into the hardware and software components of a system during design.

Software monitors are cheaper than hardware monitors, but they may influence the system's run-time behavior.

2. Non-intrusive monitoring

Hardware monitors and most hybrid monitors store the recorded data in their own memories.

Software monitors have to use the memory of the observed system.

Thus, hardware monitors are less intrusive than software monitors.


Monitoring constraints:

3. Amount of recorded data and its further processing

Most purposes, especially debugging, require observations with high resolution.

For an accurate analysis of program errors, the machine instruction that caused the error has to be identified.

For other purposes, e.g. a global performance analysis, a coarser resolution is sufficient.

Although it often seems necessary to record observable data at the level of machine instruction execution, this would generate traces much larger than the memory used by the observed application.

Thus, the cost of storing this large amount of data and the general difficulty of processing the trace data prohibit a complete recording of traces at machine instruction level.
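A rough back-of-the-envelope calculation illustrates the trace size problem. Every figure below (instruction rate, record size, run length, application memory) is an assumption picked for the estimate, not a value from a real system.

```python
# Assumed figures for the estimate (all hypothetical).
instructions_per_second = 1_000_000_000   # processor executes 10^9 instructions/s
bytes_per_trace_record = 8                # e.g. one address or program counter value
run_seconds = 60                          # length of the observed run
application_memory_bytes = 100 * 1024**2  # observed application uses 100 MiB

# One trace record per executed machine instruction.
trace_bytes_per_second = instructions_per_second * bytes_per_trace_record
trace_bytes = trace_bytes_per_second * run_seconds
ratio = trace_bytes / application_memory_bytes

print(f"trace rate: {trace_bytes_per_second / 1e9:.0f} GB/s")
print(f"trace size: {trace_bytes / 1e9:.0f} GB")
print(f"trace is about {ratio:.0f} times the application memory")
```

Under these assumptions a one-minute run already produces hundreds of gigabytes, which is why coarser resolutions or selective recording are used in practice.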


Instrumentation

One way of software monitoring is to insert measuring commands, e.g. loop or time counters, into the program code.

This is called instrumentation.

Instrumentation can be performed by the user, the compiler, the class library or the operating system.

[Figure: the instrumented program runs on the computer and produces its results; the measurement system collects the measurement results.]
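Instrumentation by the user can be as simple as the following sketch, in which a loop counter and a time counter are inserted by hand into an otherwise ordinary function. The function `sum_of_squares` and the dictionaries `counters` and `timers` are hypothetical names used only for this example.

```python
import time

# Storage for the measurement results (lives inside the observed program).
counters = {"loop_iterations": 0}
timers = {}


def sum_of_squares(n):
    """Application code with manually inserted measuring commands."""
    start = time.perf_counter()            # inserted: start of time counter
    total = 0
    for i in range(n):
        counters["loop_iterations"] += 1   # inserted: loop counter
        total += i * i
    timers["sum_of_squares"] = time.perf_counter() - start  # inserted: stop timer
    return total


result = sum_of_squares(1000)
print(f"result: {result}")
print(f"loop iterations: {counters['loop_iterations']}")
print(f"elapsed: {timers['sum_of_squares']:.6f} s")
```

The program still returns its normal result, while the inserted commands deliver the measurement results as a side product. The inserted commands themselves cost run time, so the measured time includes the overhead of the instrumentation.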


Monitoring overview

method                   tools                                  accuracy      system state
direct                   hardware monitor                       very high     hardware
instrumentation          instrumented program                   high          hardware
trace-driven simulation  simulation program + hardware "trace"  satisfactory  hardware and software
simulation               simulation program                     sufficient    software


Typical load-dependent parameters

• throughput

Defines the average number of jobs completed per time unit. A job may be: execution of an instruction or a program, saving a data block or sending a message.

• utilization

Defines the throughput (average number of jobs completed) divided by the maximum possible throughput.

• response time

Defines the average time needed to complete a job.

• utilization ratio

Defines the time spent working on the jobs divided by the whole operating time.
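The four parameters can be computed from a simple job log, as in the sketch below. The `Job` record, the example jobs, the observation time and the maximum throughput are all hypothetical; response time is taken here as completion time minus arrival time, which is an assumption about how a job's duration is counted.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float  # time the job entered the system (s)
    start: float    # time the system started working on it (s)
    end: float      # time the job was completed (s)


def load_parameters(jobs, observation_time, max_throughput):
    """Compute throughput, utilization, response time and utilization ratio."""
    completed = len(jobs)
    throughput = completed / observation_time
    busy_time = sum(job.end - job.start for job in jobs)
    return {
        "throughput": throughput,                       # jobs per second
        "utilization": throughput / max_throughput,     # relative to the maximum
        "response_time": sum(job.end - job.arrival for job in jobs) / completed,
        "utilization_ratio": busy_time / observation_time,
    }


# Hypothetical log: four jobs observed over 10 s on a system that could
# complete at most 1 job per second (assumed maximum throughput).
jobs = [Job(0, 0, 2), Job(1, 2, 3), Job(4, 4, 6), Job(5, 6, 7)]
params = load_parameters(jobs, observation_time=10.0, max_throughput=1.0)
for name, value in params.items():
    print(f"{name}: {value:.2f}")
```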
