Chapter 1
Performance & Technology Trends
Read Sections 1.5, 1.6, and 1.8
CPE 432 Chapter 1.2
Chapter 1 — Computer Abstractions and Technology — 2
Section 1.5: The Power Wall
Power Trends
In CMOS IC technology
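$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
Historically, clock rate grew ×1000 while power grew only ×30, as supply voltage dropped from 5 V → 1 V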
Clock rates hit a “power wall”
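A quick numeric check of the power relation, as a sketch: the helper `relative_power` (a hypothetical name) takes the new/old ratio of each factor and returns the new/old power ratio, assuming the proportional form Power = Capacitive load × Voltage² × Frequency. The 15% reductions in the second case are illustrative values, not figures from the slide.

```python
def relative_power(load_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """Return P_new / P_old for Power = Capacitive load x Voltage^2 x Frequency.

    Each argument is the ratio new/old of one factor (hypothetical helper).
    """
    return load_ratio * voltage_ratio ** 2 * freq_ratio


# Supply voltage 5 V -> 1 V with load and frequency unchanged: power drops 25x.
print(round(relative_power(1.0, 1 / 5, 1.0), 4))  # 0.04

# Illustrative only: load, voltage and frequency each cut by 15%.
print(round(relative_power(0.85, 0.85, 0.85), 2))  # 0.52
```

Because voltage enters squared, a given relative cut in voltage reduces power far more than the same relative cut in load or frequency.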
The power wall
Up to 2004, performance was always improved by increasing the clock frequency
By 2006, however, companies could no longer reduce the power generated or remove more heat
Hence performance could no longer be improved by raising the frequency, because of the extra power this generates: THE POWER WALL
How else can we improve performance?
Read Section 1.6: The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
Uniprocessor performance is constrained by power, instruction-level parallelism, and memory latency
CPE 432 Chapter 1.7 Dr. W. Abu-Sufah
A Sea Change Is at Hand
The power challenge has forced a change in the design of microprocessors.
Since 2002, the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.5 per year to less than a factor of 1.2 per year.
As of 2006, all desktop and server companies are shipping microprocessors with multiple processors (cores) per chip:

Product          AMD Barcelona   Intel Nehalem   IBM Power 6   Sun Niagara 2
Cores per chip   4               4               2             8
Clock rate       2.5 GHz         ~2.5 GHz?       4.7 GHz       1.4 GHz
Power            120 W           ~100 W?         ~100 W?       94 W
The plan was to double the number of cores per chip with each generation (about every two years). The plan was not followed!!
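To see how much the slowdown in yearly improvement compounds, here is a small sketch that raises each yearly factor to the number of years. It treats 1.2 as the post-2002 rate, which the slide gives only as an upper bound ("less than"); the function name is illustrative.

```python
def cumulative_improvement(factor_per_year: float, years: int) -> float:
    """Total improvement after `years` years at a constant yearly factor."""
    return factor_per_year ** years


for years in (1, 2, 4):
    before = cumulative_improvement(1.5, years)  # rate before 2002
    after = cumulative_improvement(1.2, years)   # upper bound on the rate after 2002
    print(f"{years} yr: {before:.2f}x vs at most {after:.2f}x")
```

After four years this gives about 5.06x at the old rate versus at most 2.07x at the new one.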
Multicore microprocessors require explicitly parallel programming
In single-core microprocessors, the hardware implemented instruction-level parallelism to execute multiple instructions IN PARALLEL
Instruction-level parallelism is hidden from the programmer
Parallel programming is harder to do. It involves:
- Programming for performance
- Load balancing
- Optimizing communication and synchronization
With the introduction of multicore microprocessors, the Free Lunch Era ended!!!
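Unlike instruction-level parallelism, work on a multicore chip has to be divided explicitly. The sketch below (hypothetical helper names) splits a summation into nearly equal contiguous chunks, one per core, as a simple form of load balancing, then combines the partial sums once every worker has finished, which is the synchronization step. It uses Python's `concurrent.futures.ThreadPoolExecutor` only to show the fork/join structure; in CPython the global interpreter lock keeps these threads from executing Python bytecode truly in parallel, so a real speedup would need processes or another language.

```python
from concurrent.futures import ThreadPoolExecutor


def balanced_chunks(n_items: int, n_cores: int) -> list[range]:
    """Split range(n_items) into n_cores contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_cores)
    chunks, start = [], 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)  # the first `extra` cores take one more item
        chunks.append(range(start, start + size))
        start += size
    return chunks


def parallel_sum(data: list[int], n_cores: int) -> int:
    """Sum `data` by giving each worker one chunk, then combining the partial sums."""
    chunks = balanced_chunks(len(data), n_cores)
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        # Each worker computes a partial sum over its own chunk of indices.
        # list() blocks until every worker has returned: this join is the synchronization point.
        partials = list(pool.map(lambda idx: sum(data[i] for i in idx), chunks))
    return sum(partials)


print([len(c) for c in balanced_chunks(10, 4)])  # [3, 3, 2, 2]
print(parallel_sum(list(range(100)), 4))         # 4950
```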
Read Section 1.8: Pitfalls and Fallacies
Pitfalls and Fallacies
Pitfalls: Easily made mistakes
Fallacies: Errors, myths…
Pitfall: Amdahl’s Law
Pitfall: Improving an aspect of a computer and expecting a proportional improvement in overall performance
$$\text{Time}_{\text{overall improved}} = \frac{\text{Time}_{\text{aspect improved}}}{\text{improvement factor}} + \text{Time}_{\text{aspect unimproved}}$$
Example: multiply operations account for 80 seconds of a program's 100-second run time
How much must multiply performance improve for the program to run 5 times faster (i.e., in 100/5 = 20 s)?
$$20 = \frac{80}{n} + 20$$
Can't be done! This would require 80/n = 0, so no finite improvement factor n is enough.
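The multiply example can be checked numerically. This sketch (hypothetical function name) applies the improved-time formula with 80 s affected and 20 s unaffected for increasing improvement factors n.

```python
def time_after_improvement(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Execution time after improving only the affected part by `factor`."""
    return t_affected / factor + t_unaffected


for n in (2, 10, 100, 1000):
    t = time_after_improvement(80.0, 20.0, n)
    print(f"n = {n:>4}: {t:6.2f} s, overall speedup {100.0 / t:.3f}x")
```

The time falls toward 20 s but never reaches it, so the overall speedup stays below 5 no matter how fast multiplication becomes.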
Amdahl’s Law
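$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}}$$
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
Best Speedup_overall you could ever hope to do:
$$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$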
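Amdahl's Law translates directly into code. A minimal sketch with hypothetical function names: `speedup_overall` implements the overall-speedup formula, and `speedup_maximum` is its limit as the enhanced part becomes infinitely fast. The inputs reuse the multiply example, where Fraction_enhanced = 80/100 = 0.8.

```python
def speedup_overall(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def speedup_maximum(fraction_enhanced: float) -> float:
    """Upper bound on overall speedup: 1 / (1 - F), approached only as S grows without bound."""
    return 1.0 / (1.0 - fraction_enhanced)


# Multiply example: F = 0.8 (80 s out of 100 s).
for s in (10, 1000):
    print(f"S = {s}: {speedup_overall(0.8, s):.3f}x")
print(f"limit: {speedup_maximum(0.8):.3f}x")
```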
Amdahl’s Law example
New CPU is 10X faster; the server is I/O bound, so 60% of the time is spent waiting for I/O
$$\text{Fraction}_{\text{enhanced}} = 0.4, \qquad \text{Speedup}_{\text{enhanced}} = 10$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$
• Apparently, it's human nature to be attracted by "10X faster" rather than keeping in perspective that the result is just 1.56X faster
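The same arithmetic, swept over ever-faster CPUs, makes the point sharper. This sketch treats the 60% I/O wait as the unenhanced fraction (variable names are illustrative); even a 1000X faster CPU stays just under the ceiling of 1/0.6.

```python
io_fraction = 0.6                # fraction of time waiting for I/O (not enhanced)
cpu_fraction = 1 - io_fraction   # fraction that benefits from the faster CPU

for cpu_speedup in (10, 100, 1000):
    # Amdahl's Law with Fraction_enhanced = cpu_fraction
    overall = 1 / (io_fraction + cpu_fraction / cpu_speedup)
    print(f"CPU {cpu_speedup:>4}X faster -> {overall:.2f}X overall")
print(f"ceiling -> {1 / io_fraction:.2f}X")
```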
Amdahl’s Law example: Make the common case fast
Fraction = 0.1, Speedup = 10:
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.1) + \dfrac{0.1}{10}} = \frac{1}{0.91} \approx 1.1$$
Fraction = 0.9, Speedup = 10:
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.9) + \dfrac{0.9}{10}} = \frac{1}{0.19} \approx 5.3$$
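Comparing the two cases side by side with the Amdahl formula (restated so the snippet runs on its own; the function name is illustrative): the identical 10X enhancement pays off only when it applies to a large fraction of execution time.

```python
def speedup_overall(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


for fraction in (0.1, 0.9):
    print(f"Fraction = {fraction}, Speedup = 10 -> {speedup_overall(fraction, 10):.2f}x overall")
```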
Pitfall: MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second. It doesn’t account for:
- Differences in ISAs between computers
- Differences in complexity between instructions
Pitfall: MIPS as a Performance Metric (cont.)
How should MIPS be computed? It is not the maximum theoretical MIPS quoted
by the manufacturer.
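MIPS of processor A executing program P:
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$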
CPI varies between programs on a given CPU
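A small sketch of why MIPS can mislead, using hypothetical numbers (a 4 GHz clock and two made-up compiled versions of the same program; none of these values come from the slides): the version with the higher MIPS rating is not the faster one, because it executes more instructions.

```python
def execution_time(instr_count: float, cpi: float, clock_hz: float) -> float:
    """Seconds = instruction count x CPI / clock rate."""
    return instr_count * cpi / clock_hz


def mips(instr_count: float, cpi: float, clock_hz: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instr_count / (execution_time(instr_count, cpi, clock_hz) * 1e6)


CLOCK = 4e9  # hypothetical 4 GHz processor
# Hypothetical versions of the same program: (instruction count, CPI)
versions = {"A": (10e9, 1.0),   # more, simpler instructions
            "B": (5e9, 1.5)}    # fewer, more complex instructions

for name, (ic, cpi) in versions.items():
    print(f"{name}: {mips(ic, cpi, CLOCK):.0f} MIPS, {execution_time(ic, cpi, CLOCK):.3f} s")
```

Version A rates 4000 MIPS and takes 2.5 s; version B rates only about 2667 MIPS yet finishes in 1.875 s.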