Fall 2015, Nov 18: ELEC 5200-001/6200-001 Lecture 10, Computer Architecture and Design, Performance of a Computer (Chapter 4)


Page 1

Fall 2015, Nov 18 . . . ELEC 5200-001/6200-001 Lecture 10

ELEC 5200-001/6200-001
Computer Architecture and Design
Fall 2015
Performance of a Computer (Chapter 4)

Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal
[email protected]

Page 2

What is Performance?

Response time: the time between the start and completion of a task.

Throughput: the total amount of work done in a given time.

Some performance measures:
– MIPS (million instructions per second).
– MFLOPS (million floating point operations per second), also GFLOPS, TFLOPS (10^12), etc.
– SPEC (Standard Performance Evaluation Corporation) benchmarks.
– LINPACK benchmarks, floating point computing, used for supercomputers.
– Synthetic benchmarks.

Page 3

Units for Measuring Performance

Time in seconds (s), microseconds (μs), nanoseconds (ns), or picoseconds (ps).

Clock cycle
– Period of the hardware clock
– Example: one clock cycle means 1 nanosecond for a 1 GHz clock frequency (or 1 GHz clock rate)

CPU time = (CPU clock cycles)/(clock rate)

Cycles per instruction (CPI): average number of clock cycles used to execute a computer instruction.

Page 4

Components of Performance

Components of Performance    Units
CPU time for a program       Time (seconds, etc.)
Instruction count            Instructions executed by the program
CPI                          Average number of clock cycles per instruction
Clock cycle time             Time period of clock (seconds, etc.)

Page 5

Time, While You Wait, or Pay For

CPU time is the time taken by the CPU to execute the program. It has two components:
– User CPU time is the time to execute the instructions of the program.
– System CPU time is the time used by the operating system to run the program.

Elapsed time (wall clock time) is the time between the start and end of a program.

Page 6

Example: Unix “time” Command

90.7u 12.9s 2:39 65%

– 90.7u: user CPU time in seconds
– 12.9s: system CPU time in seconds
– 2:39: elapsed time in min:sec
– 65%: CPU time as percent of elapsed time

(90.7 + 12.9)/159 × 100 = 65%
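The percentage in the example can be reproduced with a few lines of Python. This is only a sketch; the function name `cpu_percent` is made up for illustration and is not part of the Unix command.

```python
def cpu_percent(user_s, system_s, elapsed_s):
    """CPU time (user + system) as a percentage of elapsed time."""
    return (user_s + system_s) / elapsed_s * 100.0

# Values from the example line: 90.7u 12.9s 2:39
elapsed_s = 2 * 60 + 39                      # 2:39 (min:sec) is 159 seconds
share = cpu_percent(90.7, 12.9, elapsed_s)
print(f"{share:.0f}%")                       # prints 65%
```

The remaining 35% of the elapsed time is time the program spent waiting rather than running on the CPU.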

Page 7

Computing CPU Time

CPU time = Instruction count × CPI × Clock cycle time

         = (Instruction count × CPI)/(Clock rate)

         = (Instructions/Program) × (Clock cycles/Instruction) × (1 second/Clock rate)
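As a quick check that the first two forms of the CPU time equation agree, here is a minimal Python sketch. The program size, CPI, and clock rate are hypothetical values chosen for illustration.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 1 million instructions, CPI of 2.0, 1 GHz clock.
t_rate = cpu_time_seconds(1_000_000, 2.0, 1e9)

# Same result from the first form, using clock cycle time = 1 / clock rate.
clock_cycle_time_s = 1 / 1e9
t_cycle = 1_000_000 * 2.0 * clock_cycle_time_s

print(t_rate, t_cycle)   # both are about 0.002 s
```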

Page 8

Comparing Computers C1 and C2

Run the same program on C1 and C2. Suppose both computers execute the same number (N) of instructions:

C1: CPI = 2.0, clock cycle time = 1 ns
CPU time(C1) = N × 2.0 × 1 = 2.0N ns

C2: CPI = 1.2, clock cycle time = 2 ns
CPU time(C2) = N × 1.2 × 2 = 2.4N ns

CPU time(C2)/CPU time(C1) = 2.4N/2.0N = 1.2, therefore, C1 is 1.2 times faster than C2.

Result can vary with the choice of program.
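The comparison can be sketched in Python as follows. The instruction count `N` is a hypothetical value; it cancels in the ratio, so any positive number gives the same answer.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time in nanoseconds for a given CPI and clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

N = 1_000_000                          # hypothetical instruction count
time_c1 = cpu_time_ns(N, 2.0, 1.0)     # C1: CPI 2.0, 1 ns cycle
time_c2 = cpu_time_ns(N, 1.2, 2.0)     # C2: CPI 1.2, 2 ns cycle

ratio = time_c2 / time_c1
print(f"C1 is {ratio:.1f} times faster than C2")
```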

Page 9

Comparing Program Codes I & II

Code size for a program:
– Code I has 5 million instructions
– Code II has 6 million instructions
– Code I is more efficient. Is it?

Suppose a computer has three types of instructions: A, B and C.

Instr. type   CPI
A             1
B             2
C             3

Code   Instruction count in million
       Type A   Type B   Type C   Total
I      2        1        2        5
II     4        1        1        6

CPU cycles (code I) = 10 million
CPU cycles (code II) = 9 million
Code II is more efficient.

CPI( I ) = 10/5 = 2
CPI( II ) = 9/6 = 1.5
Code II is more efficient.

Caution: Code size is a misleading indicator of performance.
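The cycle counts and CPI values for the two codes follow directly from the two tables. A minimal sketch, with helper names (`total_cycles`, `average_cpi`) invented for this illustration:

```python
# CPI per instruction type and instruction counts (in millions), from the tables.
CPI_BY_TYPE = {"A": 1, "B": 2, "C": 3}
CODES = {
    "I":  {"A": 2, "B": 1, "C": 2},
    "II": {"A": 4, "B": 1, "C": 1},
}

def total_cycles(mix):
    """Total CPU cycles (millions) for an instruction mix."""
    return sum(count * CPI_BY_TYPE[kind] for kind, count in mix.items())

def average_cpi(mix):
    """Average CPI = total cycles / total instructions."""
    return total_cycles(mix) / sum(mix.values())

for name, mix in CODES.items():
    print(name, total_cycles(mix), average_cpi(mix))
```

Code II executes more instructions but fewer cycles, which is why instruction count alone does not rank the two codes correctly.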

Page 10

Rating of a Computer

MIPS: million instructions per second

MIPS = (Instruction count of a program)/(Execution time × 10^6)

MIPS rating of a computer is relative to a program.

Standard programs for performance rating:
– Synthetic benchmarks
– SPEC benchmarks (Standard Performance Evaluation Corporation)

Page 11

Synthetic Benchmark Programs

Artificial programs that emulate a large set of typical “real” programs.

Whetstone benchmark – Algol and Fortran.

Dhrystone benchmark – Ada and C.

Disadvantages:
– No clear agreement on what a typical instruction mix should be.
– Benchmarks do not produce meaningful results.
– Purpose of rating is defeated when compilers are written to optimize the performance rating.

Page 12

Misleading Compilers

Consider a computer with a clock rate of 1 GHz.

Two compilers produce the following instruction mixes for a program:

Code from    Instruction count (billions)      CPU clock   CPI    CPU time*   MIPS**
             Type A  Type B  Type C  Total     cycles             (seconds)
Compiler 1   5       1       1       7         10×10^9     1.43   10          700
Compiler 2   10      1       1       12        15×10^9     1.25   15          800

Instruction types – A: 1-cycle, B: 2-cycle, C: 3-cycle

* CPU time = CPU clock cycles/clock rate

** MIPS = (Total instruction count/CPU time) × 10^–6

Compiler 2 earns the higher MIPS rating even though its code takes 5 seconds longer to run.
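The table entries can be recomputed from the instruction mixes and the two footnote formulas. The sketch below assumes the 1 GHz clock stated on the slide; the function name `summarize` is illustrative only.

```python
CYCLES_PER_TYPE = {"A": 1, "B": 2, "C": 3}   # A: 1-cycle, B: 2-cycle, C: 3-cycle
CLOCK_RATE_HZ = 1e9                          # 1 GHz

def summarize(mix_billions):
    """Return (CPI, CPU time in seconds, MIPS) for a mix given in billions."""
    instructions = sum(mix_billions.values()) * 1e9
    cycles = sum(n * CYCLES_PER_TYPE[kind] for kind, n in mix_billions.items()) * 1e9
    cpu_time_s = cycles / CLOCK_RATE_HZ              # * footnote
    mips = instructions / cpu_time_s * 1e-6          # ** footnote
    return cycles / instructions, cpu_time_s, mips

compiler1 = summarize({"A": 5, "B": 1, "C": 1})
compiler2 = summarize({"A": 10, "B": 1, "C": 1})

for name, (cpi, t, mips) in (("Compiler 1", compiler1), ("Compiler 2", compiler2)):
    print(f"{name}: CPI={cpi:.2f}  time={t:.0f} s  MIPS={mips:.0f}")
```

Ranking by MIPS favors Compiler 2, while ranking by CPU time favors Compiler 1, which is the point of the slide.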

Page 13

Peak and Relative MIPS Ratings

Peak MIPS
– Choose an instruction mix to minimize CPI
– The rating can be too high and unrealistic for general programs

Relative MIPS: Use a reference computer system

Relative MIPS = (Time(ref)/Time) × MIPS(ref)

Historically, VAX-11/780, believed to have a 1 MIPS performance, was used as reference.

Page 14

A 1994 MIPS Rating Chart

Computer                   MIPS    Price   $/MIPS
1975 IBM mainframe         10      $10M    1M
1976 Cray-1                160     $20M    125K
1979 DEC VAX               1       $200K   200K
1981 IBM PC                0.25    $3K     12K
1984 Sun 2                 1       $10K    10K
1994 Pentium PC            66      $3K     46
1995 Sony PCX video game   500     $500    1
1995 Microunity set-top    1,000   $500    0.5

Source: New York Times, April 20, 1994

Page 15

MFLOPS (megaFLOPS)

MFLOPS = (Number of floating-point operations in a program)/(Execution time × 10^6)

Only floating point operations are counted:
– Float, real, double; add, subtract, multiply, divide

MFLOPS rating is relevant in scientific computing. For example, programs like a compiler will measure almost 0 MFLOPS.

Sometimes misleading due to different implementations. For example, a computer that does not have a floating-point divide will register many FLOPS for a division.

Page 16

Supercomputer Performance

[Figure: supercomputer performance, with levels marked Megaflops, Gigaflops, Teraflops, Petaflops, and Exaflops. Source: http://en.wikipedia.org/wiki/Supercomputer]

Page 17

Top Supercomputers, Nov 2014
www.top500.org

Rank  Name        Location         Cores      Clock GHz  Max. Pflops  Power MW  η Pflops/MJoule
1     Tianhe-2    Guangzhou China  3,120,000  2.2        33.86        17.80     1.90
2     Titan       Cray USA         560,460    2.2        17.59        8.21      2.14
3     Sequoia     IBM USA          1,572,864  1.6        17.17        7.89      2.18
4     K computer  Fujitsu Japan    705,024    2.0        10.50        12.66     0.83
5     Mira        IBM USA          786,432    1.6        8.59         3.95      2.18

N. Leavitt, “Big Iron Moves Toward Exascale Computing,” Computer, vol. 45, no. 11, pp. 14-17, Nov. 2012.

Page 18

Performance

Performance is measured for a given program or a set of programs:

Av. execution time = (1/n) Σ (i = 1 to n) Execution time(program i)

or

Av. execution time = [ ∏ (i = 1 to n) Execution time(program i) ]^(1/n)

Performance is inverse of execution time:

Performance = 1/(Execution time)
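The two averages above (arithmetic and geometric) can be sketched in Python as follows. The execution times are hypothetical and the function names are chosen for this illustration.

```python
import math

def arithmetic_mean_time(times):
    """(1/n) * sum of execution times."""
    return sum(times) / len(times)

def geometric_mean_time(times):
    """n-th root of the product of execution times."""
    return math.prod(times) ** (1 / len(times))

times_s = [2.0, 8.0]                     # hypothetical execution times in seconds
am = arithmetic_mean_time(times_s)       # (2 + 8) / 2
gm = geometric_mean_time(times_s)        # sqrt(2 * 8)
performance = 1 / am                     # performance is the inverse of time

print(am, gm, performance)
```

For any set of positive times the geometric mean is no larger than the arithmetic mean, so the two averages generally give different performance figures.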

Page 19

Geometric vs. Arithmetic Mean

Reference computer times of n programs: r1, . . . , rn

Times of n programs on the computer under evaluation: T1, . . . , Tn

Normalized times: T1/r1, . . . , Tn/rn

Geometric mean = {(T1/r1) . . . (Tn/rn)}^(1/n) = {T1 . . . Tn}^(1/n) / {r1 . . . rn}^(1/n)    (Used)

Arithmetic mean = {(T1/r1) + . . . + (Tn/rn)}/n ≠ ({T1 + . . . + Tn}/n) / ({r1 + . . . + rn}/n)    (Not used)

J. E. Smith, “Characterizing Computer Performance with a Single Number,” Comm. ACM, vol. 31, no. 10, pp. 1202-1206, Oct. 1988.
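A small numeric check illustrates why the geometric mean is the one used with normalized times: the mean of the ratios equals the ratio of the means, which does not hold for the arithmetic mean. The times below are hypothetical.

```python
import math

def gmean(values):
    return math.prod(values) ** (1 / len(values))

def amean(values):
    return sum(values) / len(values)

T = [10.0, 200.0]    # hypothetical times on the computer under evaluation
r = [20.0, 100.0]    # hypothetical times on the reference computer
normalized = [t / ref for t, ref in zip(T, r)]   # T1/r1, ..., Tn/rn

gm_of_ratios = gmean(normalized)
ratio_of_gms = gmean(T) / gmean(r)
am_of_ratios = amean(normalized)
ratio_of_ams = amean(T) / amean(r)

print(gm_of_ratios, ratio_of_gms)   # equal, up to rounding
print(am_of_ratios, ratio_of_ams)   # 1.25 versus 1.75
```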

Page 20

SPEC Benchmarks

Standard Performance Evaluation Corporation (SPEC)

SPEC89
– 10 programs
– SPEC performance ratio relative to VAX-11/780
– One program, matrix300, dropped because compilers could be engineered to improve its performance.
– www.spec.org

Page 21

SPEC89 Performance Ratio for IBM Powerstation 550

[Figure: bar chart of SPEC performance ratio (axis 0 to 800) for gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv, comparing “compiler” with “enhanced compiler”.]

Page 22

SPEC95 Benchmarks

Eight integer and ten floating point programs, SPECint95 and SPECfp95.

Each program run time is normalized with respect to the run time of Sun SPARCstation 10/40 – the ratio is called SPEC ratio.

SPECint95 and SPECfp95 summary measurements are the geometric means of SPEC ratios.

Page 23

SPEC CPU2000 Benchmarks

Twelve integer and 14 floating point programs, CINT2000 and CFP2000.

Each program run time is normalized to obtain a SPEC ratio with respect to the run time on Sun Ultra 5_10 with a 300 MHz processor.

CINT2000 and CFP2000 summary measurements are the geometric means of SPEC ratios.

Retired in 2007, replaced with SPEC CPU™ 2006 https://www.spec.org/cpu2006/

Page 24

CINT2000: Twelve Programs
(11 C, 1 C++)

Name          Ref time (s)  Remarks
164.gzip      1400          Data compression utility (C)
175.vpr       1400          FPGA circuit placement and routing (C)
176.gcc       1100          C compiler (C)
181.mcf       1800          Minimum cost network flow solver (C)
186.crafty    1000          Chess program (C)
197.parser    1800          Natural language processing (C)
252.eon       1300          Ray tracing (C++)
253.perlbmk   1800          Perl (C)
254.gap       1100          Computational group theory (C)
255.vortex    1900          Object Oriented Database (C)
256.bzip2     1500          Data compression utility (C)
300.twolf     3000          Place and route simulator (C)

https://www.spec.org/cpu2000/docs/readme1st.html#Q8
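To show how reference times like these feed into a summary number, here is a hedged Python sketch. It assumes the SPEC ratio is the reference time divided by the measured run time, so that a faster machine scores higher, and it omits any constant scale factor. The measured run times are hypothetical; only the reference times come from the table.

```python
import math

# Reference times (seconds) copied from the CINT2000 table.
REF_TIME_S = {"164.gzip": 1400, "176.gcc": 1100, "181.mcf": 1800}

# Hypothetical measured run times (seconds) on the machine under test.
run_time_s = {"164.gzip": 280.0, "176.gcc": 220.0, "181.mcf": 450.0}

# Assumed definition: SPEC ratio = reference time / measured time.
spec_ratios = {name: REF_TIME_S[name] / run_time_s[name] for name in REF_TIME_S}

# Summary measurement: geometric mean of the SPEC ratios.
summary = math.prod(spec_ratios.values()) ** (1 / len(spec_ratios))

print(spec_ratios)
print(round(summary, 3))
```

A real result would use all twelve programs; three are enough to show the calculation.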

Page 25

CFP2000: Fourteen Programs
(6 Fortran 77, 4 Fortran 90, 4 C)

Name          Ref time (s)  Remarks
168.wupwise   1600          Quantum chromodynamics
171.swim      3100          Shallow water modeling
172.mgrid     1800          Multi-grid solver in 3D potential field
173.applu     2100          Parabolic/elliptic partial differential equations
177.mesa      1400          3D Graphics library
178.galgel    2900          Fluid dynamics: analysis of oscillatory instability
179.art       2600          Neural network simulation; adaptive resonance theory
183.equake    1300          Finite element simulation; earthquake modeling
187.facerec   1900          Computer vision: recognizes faces
188.ammp      2200          Computational chemistry
189.lucas     2000          Number theory: primality testing
191.fma3d     2100          Finite element crash simulation
200.sixtrack  1100          Particle accelerator model
301.apsi      2600          Solves problems regarding temperature, wind, velocity and distribution of pollutants

https://www.spec.org/cpu2000/docs/readme1st.html#Q8

Page 26

Reference CPU: Sun Ultra 5_10 300MHz Processor

[Figure: bar chart of run time in CPU seconds (axis 0 to 3500) on the reference machine for each CINT2000 and CFP2000 program.]

Page 27

Two Benchmark Results

Baseline: A uniform configuration not optimized for a specific program:
– Same compiler with same settings and flags used for all benchmarks
– Other restrictions

Peak: Run is optimized for obtaining the peak performance for each benchmark program.

Page 28

CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)

[Figure: bar chart of base ratio and opt. ratio (axis 0 to 1000) for gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, and twolf.]

SPECint2000_base = 579
SPECint2000 = 588

Source: www.spec.org

Page 29

CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)

[Figure: bar chart of base ratio and opt. ratio (axis 0 to 1400) for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, and apsi.]

SPECfp2000_base = 648
SPECfp2000 = 659

Source: www.spec.org

Page 30

Additional SPEC Benchmarks

SPECweb99: measures the performance of a computer in a networked environment.

Energy efficiency mode: Besides the execution time, energy efficiency of SPEC benchmark programs is also measured. Energy efficiency of a benchmark program is given by:

Energy efficiency = (1/(Execution time))/(Power in watts) = Program units/joule

Page 31

Energy Efficiency

Efficiency averaged on n benchmark programs:

Efficiency = ( Π (i = 1 to n) Efficiency_i )^(1/n)

where Efficiency_i is the efficiency for program i.

Relative efficiency:

Relative efficiency = (Efficiency of a computer)/(Eff. of reference computer)
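The per-program efficiency, its geometric-mean average, and the relative efficiency can be chained together as in the sketch below. All execution times and power figures are hypothetical, and the function names are chosen for this illustration.

```python
import math

def energy_efficiency(execution_time_s, power_w):
    """(1 / execution time) / power, i.e. program units per joule."""
    return (1 / execution_time_s) / power_w

def mean_efficiency(efficiencies):
    """Geometric mean over n benchmark programs."""
    return math.prod(efficiencies) ** (1 / len(efficiencies))

# Hypothetical (execution time in s, power in W) for two benchmark programs.
machine_runs = [(10.0, 50.0), (40.0, 50.0)]
reference_runs = [(20.0, 100.0), (80.0, 100.0)]

machine_eff = mean_efficiency([energy_efficiency(t, p) for t, p in machine_runs])
reference_eff = mean_efficiency([energy_efficiency(t, p) for t, p in reference_runs])
relative_efficiency = machine_eff / reference_eff

print(relative_efficiency)   # about 4: half the time at half the power
```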

Page 32

SPEC2000 Relative Energy Efficiency

[Figure: bar chart of relative energy efficiency (axis 0 to 6) for SPECINT2000 and SPECFP2000 under three settings: always max. clock, laptop adaptive clk., and min. power min. clock. Legend: Pentium [email protected]/0.6GHz (energy-efficient processor), Pentium [email protected] (reference), Pentium [email protected].]

Page 33

Ways of Improving Performance

Increase clock rate.

Improve processor organization for lower CPI
– Pipelining
– Instruction-level parallelism (ILP): MIMD (Scalar)
– Data-parallelism: SIMD (Vector)
– Multiprocessing

Compiler enhancements that lower the instruction count or generate instructions with lower average CPI (e.g., by using simpler instructions).

Page 34

Limits of Performance

Execution time of a program on a computer is 100 s:
– 80 s for multiply operations
– 20 s for other operations

Improve multiply n times:

Execution time = (80/n + 20) seconds

Limit: Even if n = ∞, execution time cannot be reduced below 20 s.
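Tabulating the execution time for a few values of n makes the 20 s floor visible. This is a minimal sketch of the slide's formula; the chosen values of n are arbitrary.

```python
def execution_time_s(n):
    """80 s of multiply work sped up n times, plus 20 s of other work."""
    return 80.0 / n + 20.0

for n in (1, 2, 4, 10, 100, 1_000_000):
    print(f"n = {n:>9}: {execution_time_s(n):.4f} s")
```

However large n becomes, the result stays above 20 s because the non-multiply part is untouched.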

Page 35: Fall 2015, Nov 18... ELEC 5200-001/6200-001 Lecture 10 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Performance of a Computer (Chapter

fall 2015, Nov 18 . . .fall 2015, Nov 18 . . . ELEC 5200-001/6200-001 Lecture 10ELEC 5200-001/6200-001 Lecture 10 3535

Amdahl’s Law

The execution time of a system, in general, has two fractions: a fraction $f_{\text{enh}}$ that can be speeded up by factor $n$, and the remaining fraction $1 - f_{\text{enh}}$ that cannot be improved. Thus, the possible speedup is:

$\text{Speedup} = \dfrac{\text{Old time}}{\text{New time}} = \dfrac{1}{1 - f_{\text{enh}} + f_{\text{enh}}/n}$

G. M. Amdahl, “Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities,” Proc. AFIPS Spring Joint Computer Conf., Atlantic City, NJ, April 1967, pp. 483-485.

Gene Myron Amdahl born 1922

http://en.wikipedia.org/wiki/Gene_Amdahl
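The speedup expression follows in one step from the two fractions. As a sketch, write $T_{\text{old}}$ and $T_{\text{new}}$ for the old and new execution times (symbols introduced here only for the derivation):

```latex
\begin{align*}
T_{\text{new}} &= T_{\text{old}}\,(1 - f_{\text{enh}}) + T_{\text{old}}\,\frac{f_{\text{enh}}}{n} \\[4pt]
\text{Speedup} &= \frac{T_{\text{old}}}{T_{\text{new}}}
                = \frac{1}{1 - f_{\text{enh}} + f_{\text{enh}}/n} \\[4pt]
\lim_{n \to \infty} \text{Speedup} &= \frac{1}{1 - f_{\text{enh}}}
\end{align*}
```

For the multiply example, $f_{\text{enh}} = 0.8$ gives a limiting speedup of $1/0.2 = 5$, which matches 100 s reduced to 20 s.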


Wisconsin Integrally Synchronized Computer (WISC), 1950-51


Parallel Processors: Shared Memory

[Figure: several processors (P) connected to a single shared memory (M).]


Parallel Processors: Shared Memory, Infinite Bandwidth

N processors

Single processor: non-memory execution time = $\alpha$

Memory access time = $1 - \alpha$

N processor run time, $T(N) = 1 - \alpha + \alpha/N$

$\text{Speedup} = \dfrac{T(1)}{T(N)} = \dfrac{1}{1 - \alpha + \alpha/N} = \dfrac{N}{(1 - \alpha)N + \alpha}$

Maximum speedup $= 1/(1 - \alpha)$, when $N = \infty$
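A short sketch of how the speedup saturates as processors are added. The function names and the value $\alpha = 0.75$ are assumptions chosen for illustration, not values from the lecture.

```python
import math


def speedup_infinite_bandwidth(alpha, n):
    """Speedup T(1)/T(N) for T(N) = 1 - alpha + alpha/N.

    alpha is the non-memory fraction of the single-processor run time,
    n is the number of processors.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / ((1.0 - alpha) * n + alpha)


def max_speedup(alpha):
    """Limiting speedup 1/(1 - alpha) as N grows without bound."""
    return math.inf if alpha >= 1.0 else 1.0 / (1.0 - alpha)


if __name__ == "__main__":
    alpha = 0.75  # hypothetical value, picked only for the demonstration
    for n in (1, 2, 3, 8, 64, 1024):
        print(f"N = {n:>4}: speedup = {speedup_infinite_bandwidth(alpha, n):.3f}")
    print(f"limit as N grows: {max_speedup(alpha):.3f}")
```

With $\alpha = 1$ the function returns $N$, the ideal linear speedup; any memory fraction below that caps the curve at $1/(1 - \alpha)$.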


Run Time

[Figure: normalized run time $T(N) = 1 - \alpha + \alpha/N$ versus number of processors $N$ (1 to 7), with the components $\alpha$, $1 - \alpha$ and $\alpha/N$ marked.]


Speedup

[Figure: speedup $T(1)/T(N)$ versus number of processors $N$ (1 to 6), comparing the ideal speedup $N$ ($\alpha = 1$) with $N/((1 - \alpha)N + \alpha)$.]


Example

10% memory accesses, i.e., $\alpha = 0.9$

Maximum speedup $= 1/(1 - \alpha) = 1.0/0.1 = 10$, when $N = \infty$

What is the speedup with 10 processors?
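The question can be answered directly from the infinite-bandwidth speedup formula $N/((1 - \alpha)N + \alpha)$ given earlier in the lecture. A worked sketch:

```latex
\text{Speedup}(N = 10) = \frac{10}{(1 - 0.9)\cdot 10 + 0.9} = \frac{10}{1.9} \approx 5.26
```

Ten processors therefore reach only about half of the limiting speedup of 10.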


Parallel Processors: Shared Memory, Finite Bandwidth

N processors

Single processor: non-memory execution time = $\alpha$

Memory access time = $(1 - \alpha)N$

N processor run time, $T(N) = (1 - \alpha)N + \alpha/N$

$\text{Speedup} = \dfrac{1}{(1 - \alpha)N + \alpha/N} = \dfrac{N}{(1 - \alpha)N^2 + \alpha}$
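Unlike the infinite-bandwidth case, the run time here grows again once memory contention dominates. The sketch below tabulates this; the function names are illustrative assumptions, and $\alpha = 0.9$ is borrowed from the lecture's running example.

```python
def run_time_finite_bandwidth(alpha, n):
    """Normalized run time T(N) = (1 - alpha) * N + alpha / N.

    The memory term grows with N because all processors share one
    memory of finite bandwidth; the compute term shrinks with N.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - alpha) * n + alpha / n


def speedup_finite_bandwidth(alpha, n):
    """Speedup T(1)/T(N); T(1) = 1 by normalization."""
    return 1.0 / run_time_finite_bandwidth(alpha, n)


if __name__ == "__main__":
    alpha = 0.9
    print(" N   T(N)   speedup")
    for n in range(1, 7):
        t = run_time_finite_bandwidth(alpha, n)
        print(f"{n:>2}  {t:5.3f}  {speedup_finite_bandwidth(alpha, n):7.3f}")
```

The table shows the speedup rising for small $N$ and then falling, so there is a best processor count rather than a limit approached from below.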


Run Time

[Figure: normalized run time $T(N) = (1 - \alpha)N + \alpha/N$ versus number of processors $N$ (1 to 7), with the components $\alpha$, $1 - \alpha$, $\alpha/N$ and $(1 - \alpha)N$ marked.]


Minimum Run Time

Minimize N processor run time,

$T(N) = (1 - \alpha)N + \alpha/N$

$\partial T(N)/\partial N = 0$

$(1 - \alpha) - \alpha/N^2 = 0$, $N = [\alpha/(1 - \alpha)]^{1/2}$

Min. $T(N) = 2[\alpha(1 - \alpha)]^{1/2}$, because $\partial^2 T(N)/\partial N^2 > 0$.

Maximum speedup $= 1/T(N) = 0.5[\alpha(1 - \alpha)]^{-1/2}$

Example: $\alpha = 0.9$, Maximum speedup = 1.67, when N = 3
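The minimum value is obtained by substituting the optimal processor count back into $T(N)$. A sketch of that step, writing $N^{*}$ for the optimum (a symbol introduced here for convenience):

```latex
\begin{align*}
N^{*} &= \sqrt{\frac{\alpha}{1 - \alpha}} \\[4pt]
T(N^{*}) &= (1 - \alpha)\sqrt{\frac{\alpha}{1 - \alpha}} + \alpha\sqrt{\frac{1 - \alpha}{\alpha}}
          = \sqrt{\alpha(1 - \alpha)} + \sqrt{\alpha(1 - \alpha)}
          = 2\sqrt{\alpha(1 - \alpha)} \\[4pt]
\frac{\partial^{2} T}{\partial N^{2}} &= \frac{2\alpha}{N^{3}} > 0
\end{align*}
```

For $\alpha = 0.9$ this gives $N^{*} = \sqrt{9} = 3$, $T(3) = 2\sqrt{0.09} = 0.6$ and a speedup of $1/0.6 \approx 1.67$.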


Speedup

[Figure: speedup $T(1)/T(N)$ versus number of processors $N$ (1 to 6), comparing the ideal speedup $N$ with $N/((1 - \alpha)N^2 + \alpha)$.]


Parallel Processors: Distributed Memory

[Figure: processors (P), each with its own memory (M), connected through an interconnection network.]


Parallel Processors: Distributed Memory

N processors

Single processor: non-memory execution time = $\alpha$

Memory access time = $1 - \alpha$, same as single processor

Communication overhead = $\beta(N - 1)$

N processor run time, $T(N) = \beta(N - 1) + 1/N$

$\text{Speedup} = \dfrac{1}{\beta(N - 1) + 1/N} = \dfrac{N}{\beta N(N - 1) + 1}$
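Because the communication term grows with $N$ while the work term shrinks, the best processor count can be located by a simple search over integers. The sketch below does this; the function names, the search bound `n_max`, and the value $\beta = 0.04$ are assumptions made for illustration.

```python
def run_time_distributed(beta, n):
    """Normalized run time T(N) = beta * (N - 1) + 1 / N.

    beta is the communication overhead added per extra processor.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return beta * (n - 1) + 1.0 / n


def speedup_distributed(beta, n):
    """Speedup T(1)/T(N); T(1) = 1 because the overhead term vanishes."""
    return 1.0 / run_time_distributed(beta, n)


def best_processor_count(beta, n_max=1000):
    """Integer N in [1, n_max] with the smallest run time (brute force)."""
    return min(range(1, n_max + 1), key=lambda n: run_time_distributed(beta, n))


if __name__ == "__main__":
    beta = 0.04  # hypothetical overhead, picked only for the demonstration
    n_best = best_processor_count(beta)
    print(f"best N = {n_best}")
    print(f"T(N)    = {run_time_distributed(beta, n_best):.4f}")
    print(f"speedup = {speedup_distributed(beta, n_best):.4f}")
```

A brute-force search is used here only because it needs no calculus; it restricts $N$ to whole processors, which a closed-form optimum does not.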


Minimum Run Time

Minimize N processor run time,

$T(N) = \beta(N - 1) + 1/N$

$\partial T(N)/\partial N = 0$

$\beta - 1/N^2 = 0$, $N = \beta^{-1/2}$

Min. $T(N) = 2\beta^{1/2} - \beta$, because $\partial^2 T(N)/\partial N^2 > 0$.

Maximum speedup $= 1/T(N) = 1/(2\beta^{1/2} - \beta)$

Example: $\beta = 0.01$, Maximum speedup:

N = 10

T(N) = 0.19

Speedup = 5.26
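The closed form for the minimum comes from substituting the optimal processor count into $T(N)$. A sketch of that step, with $N^{*}$ denoting the optimum (a symbol introduced here for convenience):

```latex
\begin{align*}
N^{*} &= \beta^{-1/2} \\[4pt]
T(N^{*}) &= \beta\left(\beta^{-1/2} - 1\right) + \beta^{1/2}
          = \beta^{1/2} - \beta + \beta^{1/2}
          = 2\beta^{1/2} - \beta \\[4pt]
\frac{\partial^{2} T}{\partial N^{2}} &= \frac{2}{N^{3}} > 0
\end{align*}
```

With $\beta = 0.01$, $T(N^{*}) = 2(0.1) - 0.01 = 0.19$, in agreement with the example.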


Run Time

[Figure: normalized run time $T(N) = \beta(N - 1) + 1/N$ versus number of processors $N$ (1 to 30), with the components $1/N$ and $\beta(N - 1)$ marked.]


Speedup

[Figure: speedup $T(1)/T(N)$ versus number of processors $N$ (2 to 12), comparing the ideal speedup $N$ with $N/(\beta N(N - 1) + 1)$.]


Further Reading

G. M. Amdahl, “Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities,” Proc. AFIPS Spring Joint Computer Conf., Atlantic City, NJ, Apr. 1967, pp. 483-485.

J. L. Gustafson, “Reevaluating Amdahl’s Law,” Comm. ACM, vol. 31, no. 5, pp. 532-533, May 1988.

M. D. Hill and M. R. Marty, “Amdahl’s Law in the Multicore Era,” Computer, vol. 41, no. 7, pp. 33-38, July 2008.

D. H. Woo and H.-H. S. Lee, “Extending Amdahl’s Law for Energy-Efficient Computing in the Many-Core Era,” Computer, vol. 41, no. 12, pp. 24-31, Dec. 2008.

S. M. Pieper, J. M. Paul and M. J. Schulte, “A New Era of Performance Evaluation,” Computer, vol. 40, no. 9, pp. 23-30, Sep. 2007.

S. Gal-On and M. Levy, “Measuring Multicore Performance,” Computer, vol. 41, no. 11, pp. 99-102, Nov. 2008.

S. Williams, A. Waterman and D. Patterson, “Roofline: An Insightful Visual Performance Model for Multicore Architectures,” Comm. ACM, vol. 52, no. 4, pp. 65-76, Apr. 2009.

U. Vishkin, “Is Multicore Hardware for General-Purpose Parallel Processing Broken?” Comm. ACM, vol. 57, no. 4, pp. 35-39, Apr. 2014.