lez_02
TRANSCRIPT
8/13/2019 Lez_02
COMPUTER ARCHITECTURE
Prof. Basel Mahafzah
Lesson 2 - The role of performance - Part 2
Copyright Università Telematica Internazionale UNINETTUNO
The Role of Performance, Part 2
Outline
Clock Cycles per Instruction (CPI)
Definition of Performance
Instruction Count
Cycles and Instructions
Since the compiler generated instructions to execute, and the computer had to execute those instructions to run the program, the execution time must depend on the number of instructions in the program.
How many cycles are required for a program?
Could we assume that # of cycles = # of instructions?
[Figure: the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions of a program, drawn one after another.]
If the two computers have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
Answer
We know that each computer executes the same number of instructions for the same program; let's call this number I.
First, find the number of processor clock cycles for each computer:
CPU clock cycles A = I × 2.0
CPU clock cycles B = I × 1.2
Now we can compute the CPU time for each computer:
CPU time A = CPU clock cycles A × Clock cycle time A = I × 2.0 × 1 ns = 2.0 × I ns
CPU time B = CPU clock cycles B × Clock cycle time B = I × 1.2 × 2 ns = 2.4 × I ns
Clearly, computer A is faster. The amount faster is given by the ratio of the execution times:
CPU Performance A / CPU Performance B = Execution Time B / Execution Time A = (2.4 × I ns) / (2.0 × I ns) = 1.2
Final answer: Computer A is 1.2 times faster than Computer B for this program.
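The comparison above can be checked numerically. This is a minimal sketch that assumes only the values used in the answer (CPI 2.0 with a 1 ns cycle for A, CPI 1.2 with a 2 ns cycle for B); the helper name `cpu_time_ns` is hypothetical. Because both machines implement the same ISA, the instruction count I cancels out, so the ratio comes out as 1.2 whatever value of I is chosen.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (result in ns)."""
    return instruction_count * cpi * cycle_time_ns


# Values taken from the worked answer; same ISA, so the same instruction count I.
CPI_A, CYCLE_A_NS = 2.0, 1.0
CPI_B, CYCLE_B_NS = 1.2, 2.0

for i in (1, 1_000, 1_000_000):  # arbitrary instruction counts
    time_a = cpu_time_ns(i, CPI_A, CYCLE_A_NS)
    time_b = cpu_time_ns(i, CPI_B, CYCLE_B_NS)
    # Performance A / Performance B = Execution time B / Execution time A
    speedup = time_b / time_a
    print(f"I = {i:>9}: A = {time_a:g} ns, B = {time_b:g} ns, A is {speedup:.2f}x faster")
```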
Definition of Performance (continued): Instruction Count
The basic performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time is:
$$\text{CPU time} = \text{Inst. Count} \times \text{CPI} \times \text{CC Time}$$
or, equivalently,
$$\text{CPU time} = \frac{\text{Inst. Count} \times \text{CPI}}{\text{CR}}$$
where CC = clock cycles (so CC Time is the clock cycle time) and CR = clock rate.
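Since the clock rate is the reciprocal of the clock cycle time, the two forms of the equation must give the same CPU time. A small sketch, assuming a hypothetical program of 10^9 instructions with a CPI of 2.0 on a hypothetical 1 GHz clock (1 ns cycle); the function names are illustrative and not taken from the lesson.

```python
def cpu_time_from_cycle_time(inst_count: int, cpi: float, cycle_time_s: float) -> float:
    # CPU time = Inst. Count x CPI x CC Time
    return inst_count * cpi * cycle_time_s


def cpu_time_from_clock_rate(inst_count: int, cpi: float, clock_rate_hz: float) -> float:
    # CPU time = (Inst. Count x CPI) / CR
    return inst_count * cpi / clock_rate_hz


inst_count = 1_000_000_000        # hypothetical instruction count
cpi = 2.0                         # hypothetical average CPI
clock_rate_hz = 1e9               # hypothetical 1 GHz clock rate
cycle_time_s = 1 / clock_rate_hz  # the matching 1 ns clock cycle time

t1 = cpu_time_from_cycle_time(inst_count, cpi, cycle_time_s)
t2 = cpu_time_from_clock_rate(inst_count, cpi, clock_rate_hz)
print(f"{t1:.3f} s (cycle-time form) vs {t2:.3f} s (clock-rate form)")
```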
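$$\text{Execution Time} = \frac{\text{I}}{\text{P}} \times \frac{\text{CC}}{\text{I}} \times \frac{\text{S}}{\text{CC}} = \frac{\text{S}}{\text{P}}$$
where I = instructions, P = program, CC = clock cycles, and S = seconds. The three factors are the instruction count, the CPI, and the clock cycle time; instructions and clock cycles cancel, leaving seconds per program.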
How do we find the performance parameters (such as CC)?
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
where $C_i$ is the count of the number of instructions of class $i$ executed, $\text{CPI}_i$ is the average number of cycles per instruction for that instruction class, and $n$ is the number of instruction classes.
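The summation for CPU clock cycles translates directly into code. This is a minimal sketch, assuming the instruction mix is given as a mapping from a class name to the pair (CPI_i, C_i); the function name and the class counts in the usage lines are hypothetical.

```python
from typing import Mapping, Tuple


def cpu_clock_cycles(mix: Mapping[str, Tuple[float, int]]) -> float:
    """Sum of CPI_i x C_i over all n instruction classes.

    `mix` maps a class name to (CPI_i, C_i): the average cycles per
    instruction for that class and the number of its instructions executed.
    """
    return sum(cpi_i * count_i for cpi_i, count_i in mix.values())


# Hypothetical mix with three instruction classes and made-up counts.
example_mix = {"class 1": (2, 300), "class 2": (1, 500), "class 3": (3, 200)}
print(cpu_clock_cycles(example_mix))
```

Here the total is 2 × 300 + 1 × 500 + 3 × 200 = 1700 cycles.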
Example
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles, respectively.
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
Which sequence will be faster? What is the CPI for each sequence?
Answer
Sequence 1 executes 2 + 1 + 2 = 5 instructions.
Sequence 2 executes 4 + 1 + 1 = 6 instructions.
So sequence 1 executes fewer instructions.
Answer (continued)
To find the total number of clock cycles for each sequence, use
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
CPU clock cycles 1 = (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles
CPU clock cycles 2 = (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles
So code sequence 2 is faster, even though it actually executes one extra instruction.
Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI.
The CPI values can be computed by:
$$\text{CPI} = \frac{\text{CPU CC}}{\text{Inst. Count}}$$
$$\text{CPI}_1 = \frac{\text{CPU clock cycles}_1}{\text{Instruction count}_1} = \frac{10}{5} = 2$$
$$\text{CPI}_2 = \frac{\text{CPU clock cycles}_2}{\text{Instruction count}_2} = \frac{9}{6} = 1.5$$
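The whole compiler example can be reproduced in a few lines. This is a minimal sketch using only the class CPIs and per-class instruction counts stated in the example; the names `CPI_BY_CLASS`, `SEQUENCES`, and `analyze` are illustrative.

```python
from typing import Dict, Tuple

# Cycles per instruction for each class, as given in the example.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two candidate code sequences.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def analyze(counts: Dict[str, int]) -> Tuple[int, int, float]:
    """Return (instruction count, CPU clock cycles, CPI) for one sequence."""
    inst_count = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return inst_count, cycles, cycles / inst_count


for seq, counts in SEQUENCES.items():
    ic, cc, cpi = analyze(counts)
    print(f"Sequence {seq}: {ic} instructions, {cc} cycles, CPI = {cpi}")
```

Sequence 2 has the higher instruction count but fewer total cycles, so on the same clock it finishes first.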
In Summary
Note: When comparing two machines, you must look at all three components (instruction count, CPI, and clock cycle time), which combine to form execution time.
Now that we understand cycles: a given program will require
some number of instructions (machine instructions)
some number of cycles
some number of seconds
We have a vocabulary that relates these quantities:
Cycle time (seconds per cycle)
Clock rate (cycles per second)
CPI (cycles per instruction)
Inst. count (instructions per program)
A floating-point-intensive application might have a higher CPI.
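One way to see why the instruction mix matters is to write CPI as the average of the per-class CPIs, weighted by how often each class occurs: CPI = Σ CPI_i × (C_i / Inst. Count). The sketch below assumes two hypothetical instruction mixes and hypothetical per-class CPIs (floating-point operations taking more cycles than integer ones); none of these numbers come from the lesson, and `average_cpi` is an illustrative name.

```python
from typing import Dict


def average_cpi(cpi_by_class: Dict[str, float], counts: Dict[str, int]) -> float:
    """CPI = sum over classes of CPI_i * (C_i / total instruction count)."""
    total = sum(counts.values())
    return sum(cpi_by_class[c] * n / total for c, n in counts.items())


# Hypothetical per-class CPIs for an imaginary machine.
cpi_by_class = {"int": 1.0, "mem": 2.0, "fp": 4.0}

# Hypothetical instruction mixes (counts per 100 instructions).
integer_app = {"int": 70, "mem": 25, "fp": 5}
fp_app = {"int": 30, "mem": 30, "fp": 40}

print(f"integer-heavy CPI: {average_cpi(cpi_by_class, integer_app):.2f}")
print(f"FP-intensive CPI:  {average_cpi(cpi_by_class, fp_app):.2f}")
```

With these assumed numbers the integer-heavy mix averages 1.4 cycles per instruction and the floating-point-intensive mix 2.5.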
Performance
Performance is determined by execution time.
Do any of the other variables equal performance?
number of cycles to execute the program?
number of instructions in the program?
number of cycles per second?
average number of cycles per instruction?
average number of instructions per second?
Common pitfall: thinking one of the variables is indicative of performance when it really isn't.
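A classic case is MIPS (millions of instructions per second), one of the quantities listed in the earlier question. The sketch below reuses the instruction classes from the compiler example (1, 2, and 3 cycles for A, B, and C) but assumes a hypothetical 1 GHz clock and hypothetical instruction counts from two compilers: the second version earns the higher MIPS rating yet takes longer to run. The name `time_and_mips` is illustrative.

```python
from typing import Dict, Tuple

CLOCK_RATE_HZ = 1e9                      # hypothetical 1 GHz clock
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # class CPIs from the compiler example


def time_and_mips(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (execution time in seconds, MIPS) for one instruction mix."""
    inst_count = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[c] * n for c, n in counts.items())
    seconds = cycles / CLOCK_RATE_HZ
    mips = inst_count / (seconds * 1e6)
    return seconds, mips


# Hypothetical instruction counts produced by two different compilers.
code_1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
code_2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

for name, counts in (("code 1", code_1), ("code 2", code_2)):
    seconds, mips = time_and_mips(counts)
    print(f"{name}: {seconds * 1e3:.1f} ms, {mips:.0f} MIPS")
```

Code 2 executes many extra one-cycle instructions, which raises its instruction rate while still adding cycles, so its higher MIPS value says nothing about which program finishes first.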
Benchmarks
Performance is best determined by running a real application:
use programs typical of the expected workload,
or programs typical of the expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.
Small benchmarks:
nice for architects and designers
easy to standardize
can be abused
SPEC (System Performance Evaluation Cooperative):
companies have agreed on a set of real programs and inputs
can still be abused
Benchmarks are a valuable indicator of performance (and of compiler technology).
[Figure: SPEC 89, compiler enhancements and performance. Bar chart (vertical scale 0 to 800) comparing a compiler with an enhanced compiler on the benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv.]
SPEC 95 (8 integer benchmarks)
Benchmark   Description
go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The GNU C compiler generating SPARC code
compress    Compresses and decompresses file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program
SPEC 95 (10 floating-point benchmarks)
Benchmark   Description
tomcatv     A mesh generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation
SPECint 95
[Figure: SPEC ratio (0 to 10) versus clock rate (50 to 250 MHz).]
Pentium Pro is 1.4 to 1.5 times faster than Pentium.
SPECfp 95
[Figure: SPEC ratio (0 to 10) versus clock rate (50 to 250 MHz).]
Pentium Pro is 1.7 to 1.8 times faster than Pentium.
SPEC ratio: bigger numeric results indicate faster performance.
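A SPEC ratio is the execution time on a reference machine divided by the execution time on the machine being measured, which is why a bigger number means faster performance; SPEC95 then summarizes the per-benchmark ratios with a geometric mean. The sketch below assumes made-up reference and measured times for three of the integer benchmarks named above; the numbers are purely illustrative and are not published SPEC results, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable

# Hypothetical execution times in seconds (NOT real SPEC data).
reference_times = {"gcc": 1000.0, "compress": 1200.0, "li": 1500.0}
measured_times = {"gcc": 200.0, "compress": 300.0, "li": 300.0}


def spec_ratios(ref: Dict[str, float], measured: Dict[str, float]) -> Dict[str, float]:
    """SPEC ratio = reference time / measured time (bigger means faster)."""
    return {name: ref[name] / measured[name] for name in ref}


def geometric_mean(values: Iterable[float]) -> float:
    """n-th root of the product, computed via logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = spec_ratios(reference_times, measured_times)
for name, r in ratios.items():
    print(f"{name:>8}: {r:.2f}")
print(f"summary (geometric mean): {geometric_mean(ratios.values()):.2f}")
```

With these assumed times the ratios are 5, 4, and 5, and their geometric mean is the cube root of 100, about 4.64.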