Hier wird Wissen Wirklichkeit Computer Architecture – Part 6 – page 1 of 22 – Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt
Part 6
Fundamentals in Performance Evaluation
Computer Architecture
Slide Sets
WS 2011/2012
Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt
Why performance evaluation?
Comparison of computers
Selection of a computer
Changes in the configuration of an existing computer (tuning)
Design of computers
Verification or validation of design decisions
Methods for performance evaluation:
(1) analytical methods
(2) measurements
Aspects for evaluation
modularity: Is the system composed of mostly independent parts, so-called modules?
orthogonality: Does every module offer its own set of functions to the system? Is no particular function offered by more than one module?
adequacy: Do the performance and cost of a module match its importance for the whole system?
virtuality: Are the physical limits of the hardware modules hidden from the user? (Example: virtual memory)
symmetry: Is it possible to derive the function of unknown parts from the properties of some known parts of the architecture, e.g. parts of the ISA?
transparency: Are irrelevant parts of the architecture hidden from the user? (Example: transparent coprocessor)
Analytical methods
Performance measures: (hypothetical maximum performance!)
MIPS (Millions of Instructions per Second)
MFLOPS (Millions of Floating Point Operations per Sec.)
Mix: (also calculated, not measured)
In a mix, the average execution time of each instruction is calculated and scaled by a characteristic weight.
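The mix calculation can be sketched as a weighted sum. This is only an illustration: the instruction classes, weights and execution times below are made-up values, not figures from the lecture.

```python
# Hypothetical instruction classes: (weight in the mix, execution time in ns).
# All numbers are illustration values chosen for this sketch.
MIX = {
    "load/store": (0.30, 4.0),
    "integer ALU": (0.50, 2.0),
    "branch": (0.15, 3.0),
    "floating point": (0.05, 10.0),
}


def mix_average_time_ns(mix):
    """Return the weighted average execution time per instruction."""
    total_weight = sum(weight for weight, _ in mix.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * time_ns for weight, time_ns in mix.values())


average_ns = mix_average_time_ns(MIX)
# 1e9 ns per second divided by 1e6 gives the factor 1e3.
mips_rating = 1e3 / average_ns
print(f"average time per instruction: {average_ns:.2f} ns")
print(f"calculated rating: {mips_rating:.1f} MIPS")
```

The result is a calculated rating, so it inherits every assumption made about the weights.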
Core programs:
Typical application programs, written for the evaluated computer
No measurements; the overall execution time is calculated from the execution times of the individual machine instructions
Performance measures
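• runtime = # clock cycles * clock period
• MIPS (millions of instructions per second)
$$\mathrm{MIPS} = \frac{\text{instruction count}}{\text{runtime} \cdot 10^{6}} = \frac{\text{instruction count}}{\#\,\text{clock cycles} \cdot \text{clock period} \cdot 10^{6}} = \frac{\text{instruction count} \cdot \text{clock frequency}}{\#\,\text{clock cycles} \cdot 10^{6}}$$
$$\mathrm{MIPS} = \frac{\text{clock frequency}}{\mathrm{CPI} \cdot 10^{6}} = \frac{\text{clock frequency} \cdot \mathrm{IPC}}{10^{6}}$$
• MFLOPS (millions of floating point operations per second)
$$\mathrm{MFLOPS} = \frac{\#\,\text{executed floating point instructions}}{\text{runtime} \cdot 10^{6}}$$
• CPI (cycles per instruction)
$$\mathrm{CPI} = \frac{\#\,\text{clock cycles}}{\text{instruction count}}$$
• IPC (instructions per cycle)
$$\mathrm{IPC} = \frac{1}{\mathrm{CPI}}$$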
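As a quick check of these definitions, the following sketch computes all measures from four raw quantities. The function name and the sample numbers are assumptions made for illustration only.

```python
def performance_measures(instruction_count, clock_cycles,
                         clock_frequency_hz, fp_operations):
    """Compute runtime, CPI, IPC, MIPS and MFLOPS from raw counts."""
    clock_period = 1.0 / clock_frequency_hz
    runtime = clock_cycles * clock_period          # seconds
    cpi = clock_cycles / instruction_count         # cycles per instruction
    ipc = 1.0 / cpi                                # instructions per cycle
    mips = instruction_count / (runtime * 1e6)
    mflops = fp_operations / (runtime * 1e6)
    return {"runtime": runtime, "cpi": cpi, "ipc": ipc,
            "mips": mips, "mflops": mflops}


# Hypothetical program run: 200 million instructions in 400 million cycles
# on a 2 GHz clock, 20 million of them floating point operations.
m = performance_measures(instruction_count=200_000_000,
                         clock_cycles=400_000_000,
                         clock_frequency_hz=2e9,
                         fp_operations=20_000_000)
for name, value in m.items():
    print(f"{name}: {value:g}")

# The alternative form MIPS = clock frequency / (CPI * 10^6) gives the same value.
mips_from_cpi = 2e9 / (m["cpi"] * 1e6)
```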
Drawbacks of performance measures
• CPI, IPC, MIPS and MFLOPS depend on the instruction set.
• CPI, IPC, MIPS and MFLOPS depend on the program.
• CPI, IPC, MIPS and MFLOPS depend on the microarchitecture.
Conclusions:
• Greater MIPS or MFLOPS ratings do not necessarily mean more performance!
• It is of vital importance to choose well-suited test applications (benchmarks)!
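A small numeric sketch shows why a higher MIPS rating need not mean a shorter runtime. Both machines and all their numbers are hypothetical: machine A is assumed to need many simple instructions for the task, machine B fewer but slower ones.

```python
def runtime_s(instruction_count, cpi, clock_frequency_hz):
    """Runtime = instruction count * CPI * clock period."""
    return instruction_count * cpi / clock_frequency_hz


def mips(cpi, clock_frequency_hz):
    """MIPS = clock frequency / (CPI * 10^6)."""
    return clock_frequency_hz / (cpi * 1e6)


CLOCK_HZ = 1e9  # both hypothetical machines run at 1 GHz

# Machine A: simple instructions, CPI 1, but 3 billion of them for the task.
runtime_a = runtime_s(3_000_000_000, cpi=1.0, clock_frequency_hz=CLOCK_HZ)
mips_a = mips(cpi=1.0, clock_frequency_hz=CLOCK_HZ)

# Machine B: complex instructions, CPI 2, only 1 billion for the same task.
runtime_b = runtime_s(1_000_000_000, cpi=2.0, clock_frequency_hz=CLOCK_HZ)
mips_b = mips(cpi=2.0, clock_frequency_hz=CLOCK_HZ)

print(f"A: {mips_a:.0f} MIPS, runtime {runtime_a:.1f} s")
print(f"B: {mips_b:.0f} MIPS, runtime {runtime_b:.1f} s")
```

In this example A has twice the MIPS rating of B, yet B finishes the same task sooner, because the instruction count differs between the two instruction sets.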
Measurements
Benchmarks:
Use of existing or synthetic programs to measure the performance
These programs are compiled and executed on the evaluated computer
Therefore, not only the computer hardware but also the compiler influences the outcome of a benchmark
Monitoring:
Monitors are used to observe parts of the computer at run-time
Therefore, interesting quantities inside the computer can be measured besides the overall outcome of a benchmark (e.g. cache utilization, network traffic, …)
Monitoring can be done by hardware or software
Benchmark terminology
benchmark: A test program.
benchmark suite: A set of benchmarks.
synthetic benchmark: A test program that is only useful as a benchmark.
kernel benchmark: A very small synthetic benchmark. Usually a time-intensive part of a real program is chosen. Kernel benchmarks are well suited for design and simulation but normally unsuitable for comparing complete systems.
benchmark application: A complete program that is additionally used as a benchmark. The opposite of a synthetic benchmark.
SPEC-Benchmarks
SPEC: Standard Performance Evaluation Corporation
since 1989, a consortium of different manufacturers,
general-purpose computer applications,
mainly to measure speed and throughput
Several benchmark suites, e.g.
SPEC95,
SPECweb96,
SPEC JVM98,
SPEC JBB2000,
SPEC CINT2006,
SPEC CFP2006
SPECmarks
• Goal: comparable values for different systems
• But: single values don't always reflect the real relations, so they are only a first indication for selecting or judging a computer
• CPU performance plus cache, memory and compiler are measured; the operating system and I/O are less relevant
– Integer test programs (ANSI C)
– Floating-point test programs (Fortran 77)
– "SPECmark": this figure is the geometric mean of the individual program results contained in the suite
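The geometric mean can be sketched as follows. The per-program values are hypothetical, and treating them as speed ratios relative to a reference machine is an assumption made for this example.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for stability."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be positive and non-empty")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-program results (assumed: speed ratio vs. a reference machine).
ratios = [2.0, 8.0, 4.0]

suite_value = geometric_mean(ratios)
arithmetic = sum(ratios) / len(ratios)
print(f"geometric mean:  {suite_value:.3f}")
print(f"arithmetic mean: {arithmetic:.3f}")
```

Unlike the arithmetic mean, the geometric mean of ratios ranks two systems the same way regardless of which machine is chosen as the reference.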
SPEC-CINT2006:
12 integer test programs (C, C++)
name        description
perlbench   Perl interpreter
bzip2       bzip2 compression program
gcc         GNU C compiler, version 3.2
mcf         Simplex algorithm for traffic planning
gobmk       AI implementation of the game Go
hmmer       Protein sequence analysis based on a hidden Markov model
sjeng       Chess program
libquantum  Quantum computer simulator
h264ref     H.264 codec
omnetpp     OMNeT++ discrete event simulator
astar       Route planning
xalancbmk   XML translator
SPEC-CFP2006:
17 floating-point test programs (C, C++, Fortran)
name       description
bwaves     Fluid dynamics algorithm
gamess     Quantum chemistry algorithm
milc       Physics algorithm
zeusmp     Fluid dynamics algorithm
gromacs    Newton's equations of motion
cactusADM  Equation solver for Einstein's evolution equations
leslie3d   Fluid dynamics algorithm
namd       Biomolecular simulation
dealII     Finite elements
soplex     Simplex algorithm
povray     Image rendering
calculix   Finite elements
GemsFDTD   Maxwell equation solver
tonto      Quantum chemistry
lbm        Lattice-Boltzmann simulator
wrf        Weather modeling
sphinx3    Speech recognition
More popular benchmark suites
Basic Linear Algebra Subprograms (BLAS):
For numerical applications
Core of the LINPACK software package for solving linear equation systems
LINPACK is the basis of the TOP500 list of the fastest parallel computers
Whetstone benchmark:
Developed in the seventies, a single program with a lot of floating-point calculations
Dhrystone benchmark:
Improvement of Whetstone, developed in the eighties
Powerstone benchmark suite:
To compare the energy consumption of microprocessors and microcontrollers
Powerstone benchmark suite
name      description
auto      Vehicle control
bilv      Logical and shift operations
blit      Graphical application
compress  UNIX compression program
crc       CRC error detection
des       Data encryption
dhry      Dhrystone
engine    Engine control
fir_int   Integer FIR filter
g3fax     FAX group 3
g721      Audio compression
jpeg      JPEG 24-bit compression
pocsag    Communication protocol for pagers
servo     Hard disk control
summin    Handwriting recognition
ucbqsort  Quick sort
v42bits   Modem operation
whet      Whetstone
Monitoring
Monitors are components recording the states of a system during its normal operation.
Contents of registers, flags, buffers and traffic in data paths are recorded.
Monitors are used to observe and debug systems.
Monitoring
Generally, monitors can be classified into:
a) Hardware monitors
A hardware monitor is a separate component which is physically connected to the locations of the target system where measurements take place.
Hardware monitors typically consist of comparators and counters to create data, memories to store it and buses for data transport.
Thus, hardware monitors use their own resources.
Monitoring
b) Software monitors
A software monitor is a program implemented to collect measurement data through interfaces provided by the operating system, the programming language or the application program.
A software monitor uses the resources of the observed system to collect, transport and store data.
c) Hybrid monitors
A hybrid monitor is a mixed hardware and software monitor.
Often simple elements like counters and memories are implemented in hardware while more complex observation functions are implemented in software.
Monitoring constraints
1. Accessing information
Ideally, monitoring is integrated into the hardware and software components of a system during design.
Software monitors are cheaper than hardware monitors, but they may influence the system's run-time behavior.
2. Interference-free monitoring
Hardware monitors and most hybrid monitors store the recorded data in their own memories.
Software monitors have to use the memories of the observed system.
Thus, hardware monitors interfere less with the observed system than software monitors.
Monitoring constraints:
3. Amount of recorded data and its further processing
Many purposes, especially debugging, require observations with high resolution.
For the accurate analysis of program errors, the machine instruction that caused the error has to be identified.
For other purposes, e.g. a global performance analysis, a coarser resolution is sufficient.
Although it often seems necessary to record observable data at the level of machine instruction execution, this would generate traces much larger than the memory usage of the observed application.
Thus, the cost of storing this large amount of data and the general difficulties of processing the trace data prohibit a complete recording of traces at machine instruction level.
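A rough back-of-the-envelope estimate illustrates the size of such a trace. The instruction rate, record size and observation time below are assumptions chosen only to show the order of magnitude.

```python
def trace_bytes(instructions_per_second, bytes_per_record, seconds):
    """Size of a complete trace with one record per executed instruction."""
    return instructions_per_second * bytes_per_record * seconds


# Assumed values: 100 million instructions per second, 8 bytes per record
# (for example one 64-bit program counter value), 10 seconds of observation.
size = trace_bytes(instructions_per_second=100_000_000,
                   bytes_per_record=8,
                   seconds=10)
print(f"trace size: {size / 1e9:.1f} GB")
```

Even with these modest assumptions the trace reaches several gigabytes within seconds, which is why coarser resolutions or filtering are usually chosen.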
Instrumentation
One way of software monitoring is to insert measuring commands, e.g. loop or time counters, into the program code.
This is called instrumentation.
Instrumentation can be performed by the user, the compiler, the class library or the operating system.
[Figure: the instrumented program runs on the computer and produces its results; the measurement system collects the measurement results.]
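A minimal sketch of instrumentation by the user, assuming a simple Python function as the program under observation. The function names and the `measurements` dictionary are hypothetical; a loop counter and a time counter are inserted as measuring commands.

```python
import time


def sum_of_squares(n):
    """Original, uninstrumented program."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_instrumented(n, measurements):
    """Same program with a loop counter and a time counter inserted."""
    start = time.perf_counter()      # time counter: start
    iterations = 0                   # loop counter
    total = 0
    for i in range(n):
        total += i * i
        iterations += 1              # inserted measuring command
    measurements["iterations"] = iterations
    measurements["elapsed_s"] = time.perf_counter() - start
    return total


measurements = {}
result = sum_of_squares_instrumented(1000, measurements)
print("program result:", result)
print("measurement results:", measurements)
```

The inserted commands run on the observed system itself, so they add a small overhead to the run time being measured.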
Monitoring overview
method                   system state           accuracy      tools
direct                   hardware               very high     hardware monitor
instrumentation          hardware               high          instrumented program
trace-driven simulation  hardware and software  satisfactory  simulation program + hardware "trace"
simulation               software               sufficient    simulation program
Typical load-dependent parameters
• throughput
Defines the average number of jobs completed per time unit. A job may be: execution of an instruction or a program, saving a data block or sending a message.
• utilization
Defines the throughput (average number of jobs completed) divided by the maximum possible throughput.
• response time
Defines the average time needed to complete a job.
• utilization ratio
Defines the time spent working on the jobs divided by the whole operating time.
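The four parameters can be computed from a simple job log, as sketched below. The log entries, the observation time and the maximum throughput are hypothetical values, and response time is taken here as finish time minus arrival time, which is an assumption of this example.

```python
# Hypothetical job log: (arrival, start, finish) in seconds.
JOBS = [
    (0.0, 0.0, 2.0),
    (1.0, 2.0, 3.0),
    (4.0, 4.0, 7.0),
    (8.0, 8.0, 9.0),
]
OBSERVATION_TIME = 10.0   # whole operating time in seconds (assumed)
MAX_THROUGHPUT = 1.0      # maximum possible jobs per second (assumed)

# Throughput: jobs completed per time unit.
throughput = len(JOBS) / OBSERVATION_TIME

# Utilization: throughput relative to the maximum possible throughput.
utilization = throughput / MAX_THROUGHPUT

# Response time: average time from arrival to completion of a job.
response_time = sum(finish - arrival for arrival, _, finish in JOBS) / len(JOBS)

# Utilization ratio: busy time divided by the whole operating time.
busy_time = sum(finish - start for _, start, finish in JOBS)
utilization_ratio = busy_time / OBSERVATION_TIME

print(f"throughput:        {throughput:.2f} jobs/s")
print(f"utilization:       {utilization:.2f}")
print(f"response time:     {response_time:.2f} s")
print(f"utilization ratio: {utilization_ratio:.2f}")
```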