lez_02

Upload: sayed1234

Post on 04-Jun-2018



    COMPUTER ARCHITECTURE

    Prof. Basel Mahafzah
    Lesson 2 - The role of performance - Part 2

    Copyright Università Telematica Internazionale UNINETTUNO

    The Role of Performance - Part 2

    Outline

    Clock Cycles per Instruction (CPI)

    Definition of Performance

    Instruction Count

    Cycles and Instructions

    Since the compiler clearly generated instructions to execute, and the computer had to execute the instructions to run the program, the execution time must depend on the number of instructions in a program.

    How many cycles are required for a program?

    Could we assume that # of cycles = # of instructions?

    [Figure: a sequence of instructions labeled 1st instruction, 2nd instruction, 3rd instruction, 4th, 5th, 6th, ...]


    If the two computers have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

    Answer

    We know that each computer executes the same number of instructions for the same program; let's call this number I.

    First find the number of processor clock cycles for each computer:

    CPU clock cycles A = I x 2.0
    CPU clock cycles B = I x 1.2

    Now we can compute the CPU time for each computer:

    CPU time A = CPU clock cycles A x Clock cycle time A = I x 2.0 x 1 ns = 2 x I ns
    CPU time B = CPU clock cycles B x Clock cycle time B = I x 1.2 x 2 ns = 2.4 x I ns

    Clearly, computer A is faster. The amount faster is given by the ratio of the execution times:


    CPU Performance A / CPU Performance B = Execution Time B / Execution Time A = (2.4 x I ns) / (2 x I ns) = 1.2

    Final Answer

    Computer A is 1.2 times faster than Computer B for this program.
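    The comparison above can be checked with a short sketch. The CPI values (2.0 and 1.2) and clock cycle times (1 ns and 2 ns) come from the worked example; the function name and the instruction count are hypothetical, chosen only for illustration, since I cancels in the ratio.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time = instruction count x CPI x clock cycle time (in ns)."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical instruction count: the slides only call it I.
# Any positive value gives the same ratio because I cancels out.
I = 1_000_000

time_a = cpu_time_ns(I, cpi=2.0, cycle_time_ns=1.0)  # 2 x I ns
time_b = cpu_time_ns(I, cpi=1.2, cycle_time_ns=2.0)  # 2.4 x I ns

# Performance ratio A/B equals execution time B / execution time A.
speedup_a_over_b = time_b / time_a

print(f"CPU time A = {time_a:.0f} ns, CPU time B = {time_b:.0f} ns")
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```

    Changing I to any other positive value leaves the ratio unchanged, which is why the answer can be stated without knowing the actual instruction count.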

    Definition of Performance (Continued)

    Instruction Count

    Basic performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time:

    $$\text{CPU time} = \text{Inst. Count} \times \text{CPI} \times \text{CC Time}$$

    Or

    $$\text{CPU time} = \frac{\text{Inst. Count} \times \text{CPI}}{\text{CR}}$$

    CC = Clock Cycles (CC Time is the clock cycle time), CR = Clock Rate
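    As a rough illustration, the two forms of the basic performance equation can be written as two small functions that should agree whenever the clock cycle time is the reciprocal of the clock rate. The function names and the sample numbers are assumptions made for this sketch; they do not come from the lecture.

```python
def cpu_time_from_cycle_time(inst_count, cpi, cycle_time_s):
    """CPU time = Inst. Count x CPI x clock cycle time (seconds)."""
    return inst_count * cpi * cycle_time_s


def cpu_time_from_clock_rate(inst_count, cpi, clock_rate_hz):
    """CPU time = (Inst. Count x CPI) / clock rate (cycles per second)."""
    return inst_count * cpi / clock_rate_hz


# Hypothetical program and machine, for illustration only.
inst_count = 10_000_000      # instructions executed by the program
cpi = 2.0                    # average clock cycles per instruction
clock_rate_hz = 1e9          # 1 GHz, i.e. a 1 ns clock cycle time

t_cycle_form = cpu_time_from_cycle_time(inst_count, cpi, 1.0 / clock_rate_hz)
t_rate_form = cpu_time_from_clock_rate(inst_count, cpi, clock_rate_hz)

print(f"cycle-time form: {t_cycle_form:.6f} s")
print(f"clock-rate form: {t_rate_form:.6f} s")
```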


    $$\text{Execution Time} = \frac{I}{P} \times \frac{CC}{I} \times \frac{S}{CC}$$

    Where I = Instructions, P = Program, CC = Clock cycles, S = Seconds

    How to find the performance parameters (such as CC)?

    $$\text{CPU clock cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times C_i \right)$$

    Where $C_i$ is the count of the number of instructions of class i executed, $\text{CPI}_i$ is the average number of cycles per instruction for that instruction class, and n is the number of instruction classes.
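    The summation over instruction classes can be sketched as a single function. The class names and counts below are hypothetical and serve only to show the shape of the calculation; the function name is likewise an assumption of this sketch.

```python
def total_clock_cycles(cpi_by_class, count_by_class):
    """Sum CPI_i x C_i over every instruction class i that was executed."""
    return sum(cpi_by_class[cls] * count for cls, count in count_by_class.items())


# Hypothetical instruction mix, not taken from the lecture.
cpi_by_class = {"alu": 1, "load": 2, "branch": 3}       # CPI_i per class
count_by_class = {"alu": 50, "load": 30, "branch": 20}  # C_i per class

cycles = total_clock_cycles(cpi_by_class, count_by_class)
print(f"CPU clock cycles = {cycles}")
```

    Keeping the per-class CPI separate from the per-class counts mirrors the formula: the CPI values describe the hardware, while the counts describe the program.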

    Example

    A compiler designer is trying to decide between two code sequences for a particular machine.

    Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).


    The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.

    The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

    Which sequence will be faster? What is the CPI for each sequence?

    Answer

    Sequence 1 executes 2 + 1 + 2 = 5 instructions.

    Sequence 2 executes 4 + 1 + 1 = 6 instructions.

    So, sequence 1 executes fewer instructions.

    Answer (Continued)

    To find the total number of clock cycles for each sequence:

    $$\text{CPU clock cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times C_i \right)$$

    CPU clock cycles 1 = (2x1) + (1x2) + (2x3) = 10 cycles

    CPU clock cycles 2 = (4x1) + (1x2) + (1x3) = 9 cycles


    So code sequence 2 is faster, even though it actually executes one extra instruction.

    Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI.

    The CPI values can be computed by:

    CPI = CPU clock cycles / Instruction count

    CPI 1 = CPU clock cycles 1 / Instruction count 1 = 10 / 5 = 2

    CPI 2 = CPU clock cycles 2 / Instruction count 2 = 9 / 6 = 1.5
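    The whole example can be reproduced in a few lines. The cycle costs per class and the instruction mixes of the two sequences are the ones given in the example; the variable and function names are made up for this sketch.

```python
# Cycles required by each instruction class, as stated in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction mix of each code sequence, as stated in the example.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts):
    """Return (instruction count, CPU clock cycles, CPI) for one sequence."""
    instructions = sum(counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


results = {seq: summarize(counts) for seq, counts in SEQUENCES.items()}

for seq, (instructions, cycles, cpi) in results.items():
    print(f"Sequence {seq}: {instructions} instructions, {cycles} cycles, CPI = {cpi}")

# On the same machine the clock cycle time is the same for both sequences,
# so the sequence with fewer total cycles is the faster one.
faster = min(results, key=lambda seq: results[seq][1])
print(f"Sequence {faster} is faster")
```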

    In Summary

    Note: When comparing two machines, you must look at all three components (instruction count, CPI, and clock cycle time), which combine to form execution time.

    Now that we understand cycles


    A given program will require:

        some number of instructions (machine instructions)

        some number of cycles

        some number of seconds

    We have a vocabulary that relates these quantities:

        Cycle Time (seconds per cycle)

        Clock Rate (cycles per second)

        Inst. Count (inst. per program)

        CPI (cycles per instruction)

    A floating-point-intensive application might have a higher CPI.

    Performance

    Performance is determined by execution time.

    Do any of the other variables equal performance?

        number of cycles to execute program?

        number of instructions in program?

        number of cycles per second?

        average number of cycles per instruction?


        average number of instructions per second?

    Common pitfall: thinking one of the variables is indicative of performance when it really isn't.
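    The pitfall can be made concrete with the two computers from the earlier example (CPI 2.0 with a 1 ns cycle for A, CPI 1.2 with a 2 ns cycle for B). In this sketch, whose names are purely illustrative, ranking by CPI alone picks the machine that is actually slower. Because both machines run the same program on the same ISA, the instruction count is identical and time per instruction is enough to rank them.

```python
# CPI and clock cycle time for the two computers of the earlier example.
machines = {
    "A": {"cpi": 2.0, "cycle_time_ns": 1.0},
    "B": {"cpi": 1.2, "cycle_time_ns": 2.0},
}


def time_per_instruction_ns(machine):
    """Average time per instruction = CPI x clock cycle time."""
    return machine["cpi"] * machine["cycle_time_ns"]


# Ranking by a single variable: lowest CPI looks "best".
best_by_cpi = min(machines, key=lambda name: machines[name]["cpi"])

# Ranking by what actually matters here: execution time.
best_by_time = min(machines, key=lambda name: time_per_instruction_ns(machines[name]))

print(f"Lowest CPI: {best_by_cpi}")
print(f"Lowest execution time: {best_by_time}")
```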

    Benchmarks

    Performance is best determined by running a real application

        Use programs typical of expected workload

        Or, typical of expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.

    Small benchmarks

        nice for architects and designers

        easy to standardize

        can be abused

    SPEC (System Performance Evaluation Cooperative)

        companies have agreed on a set of real programs and inputs

        can still be abused


    Benchmarks are a valuable indicator of performance (and compiler technology)

    SPEC 89

    [Figure: Compiler enhancements and performance. Bar chart with a vertical axis from 0 to 800 and a horizontal axis labeled Benchmark, showing two bars (compiler, enhanced compiler) for each of gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv.]

    SPEC 95 (8 integer benchmarks)

    Benchmark   Description
    go          Artificial intelligence; plays the game of Go
    m88ksim     Motorola 88k chip simulator; runs test program
    gcc         The Gnu C compiler generating SPARC code
    compress    Compresses and decompresses file in memory
    li          Lisp interpreter
    ijpeg       Graphic compression and decompression
    perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
    vortex      A database program

    SPEC 95 (10 floating-point benchmarks)

    Benchmark   Description
    tomcatv     A mesh generation program
    swim        Shallow water model with 513 x 513 grid
    su2cor      Quantum physics; Monte Carlo simulation
    hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
    mgrid       Multigrid solver in 3-D potential field
    applu       Parabolic/elliptic partial differential equations
    turb3d      Simulates isotropic, homogeneous turbulence in a cube
    apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
    fpppp       Quantum chemistry
    wave5       Plasma physics; electromagnetic particle simulation

    SPECint 95

    [Figure: SPECint 95 results plotted against clock rate (MHz), with the horizontal axis from 50 to 250 and the vertical axis from 0 to 10.]

    SPEC ratio: bigger numeric results indicate faster performance

    Pentium Pro is 1.4 to 1.5 times faster than Pentium

    SPECfp 95

    [Figure: SPECfp 95 results plotted against clock rate (MHz), with the horizontal axis from 50 to 250 and the vertical axis from 0 to 10.]

    SPEC ratio: bigger numeric results indicate faster performance

    Pentium Pro is 1.7 to 1.8 times faster than Pentium
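    The slides do not define the SPEC ratio, so the following is only a sketch under a stated assumption: that a benchmark's SPEC ratio is the reference machine's execution time divided by the measured execution time, and that a suite score is the geometric mean of the per-benchmark ratios. The benchmark names and timings are invented for illustration and are not SPEC results.

```python
import math


def spec_ratio(reference_time_s, measured_time_s):
    """Assumed definition: reference time / measured time (bigger is faster)."""
    return reference_time_s / measured_time_s


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings in seconds; not real benchmark data.
reference_times = {"bench1": 100.0, "bench2": 400.0}
measured_times = {"bench1": 50.0, "bench2": 50.0}

ratios = {
    name: spec_ratio(reference_times[name], measured_times[name])
    for name in reference_times
}
overall = geometric_mean(ratios.values())

print(ratios)
print(f"Geometric mean of ratios: {overall:.2f}")
```

    Under this assumption a shorter measured time gives a larger ratio, which matches the note that bigger numeric results indicate faster performance.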