Advanced Computer Architecture
• Fundamentals of Computer Design
• Instruction Set Principles and Examples
• Pipelining: Basic and Intermediate Concepts
• Memory Hierarchy Design
• Storage System
• Instruction-Level Parallelism: Concepts and Challenges
• Exploiting Instruction-Level Parallelism with Software Approaches
• Multiprocessors and Thread-Level Parallelism
Forces on Computer Architecture
[Figure: Computer Architecture at the center, shaped by Technology, Programming Languages, Operating Systems, History, and Applications. (A = F / M)]
Fundamentals of Computer Design
• Introduction
• The Task of the Computer Designer
• Technology Trends
• Cost, Price, and Their Trends
• Performance
• Quantitative Principles of Computer Design
• Putting It All Together: Performance and Price-Performance
• Power Consumption and Efficiency
• Fallacies and Pitfalls
Microprocessor Performance
Cost of Downtime
System Characteristics of the Three Computing Classes
Technology Trends
• Clock Rate: ~30% per year
• Transistor Density: ~35%
• Chip Area: ~15%
• Transistors per chip: ~55%
• Total Performance Capability: ~100%
• by the time you graduate...
– 3x clock rate (3-4 GHz)
– 10x transistor count (1 Billion transistors)
– 30x raw capability
• plus 16x DRAM density, 32x disk density
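The annual rates above compound from year to year. The following is a minimal sketch of that arithmetic, assuming a five-year horizon (the slide does not say how long "by the time you graduate" is) and using a helper name, `compound_growth`, that is invented here for illustration:

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Return the total growth factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


# Annual improvement rates quoted on the slide.
RATES = {
    "clock rate": 0.30,
    "transistors per chip": 0.55,
    "total performance capability": 1.00,
}

YEARS = 5  # assumed horizon; not stated in the source

for name, rate in RATES.items():
    factor = compound_growth(rate, YEARS)
    print(f"{name}: about {factor:.1f}x after {YEARS} years")
```

Under this assumed horizon the compounded factors come out in the same range as the 3x, 10x, and 30x figures listed above; a different horizon would shift them.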
The Most Important Functional Requirements an Architect Faces
1.4 Cost, Price, and Their Trends
Prices of six generations of DRAMs
The Price of an Intel Pentium III over Time
What is “Computer Architecture”?
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
[Figure: levels of abstraction: Application, Operating System, Compiler, Firmware, Instruction Set Architecture, Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout]
Computer Architecture Topics
[Figure: processor (P) and memory (M) nodes connected through switches (S) by an interconnection network. Labels: Processor-Memory-Switch; Multiprocessors; Networks and Interconnections; Network Interfaces; Shared Memory, Message Passing, Data Parallelism; Topologies, Routing, Bandwidth, Latency, Reliability]
Photograph of an Intel Pentium 4
This 8-inch Wafer Contains 564 MIPS64 20k Processors
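$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$

$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$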
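The dies-per-wafer and die-yield formulas can be tried out numerically. The sketch below is illustrative only: the function names are invented here, the default `alpha = 4.0` is an assumed process-complexity parameter, and the wafer size, die area, and defect density in the example are hypothetical inputs rather than values from the slides.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area divided by die area, minus the dies lost along the round edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, alpha: float = 4.0) -> float:
    """Fraction of good dies; alpha is an assumed process-complexity parameter."""
    return wafer_yield * (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# Hypothetical inputs: 30 cm wafer, 0.7 cm x 0.7 cm die, 0.6 defects per cm^2.
die_area = 0.7 * 0.7
total = dies_per_wafer(30.0, die_area)
good_fraction = die_yield(1.0, 0.6, die_area)
print(f"dies per wafer: {int(total)}")
print(f"die yield: {good_fraction:.2f}")
print(f"good dies per wafer: {int(total * good_fraction)}")
```

Because die area appears in both formulas, growing the die reduces both the number of dies and the fraction that work, which is why cost rises faster than area.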
Estimated distribution of PC Costs
RAM Cost Drop
The components of price for a $1000 PC
1.5 Measuring and Reporting Performance: Execution Time
$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$
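As a small worked example of "X is n times faster than Y", the sketch below computes n from two execution times, treating performance as the reciprocal of execution time. The function names and the timings are hypothetical.

```python
def performance(exec_time_s: float) -> float:
    """Performance is taken to be the reciprocal of execution time."""
    return 1.0 / exec_time_s


def times_faster(exec_time_x_s: float, exec_time_y_s: float) -> float:
    """Return n such that machine X is n times faster than machine Y."""
    return exec_time_y_s / exec_time_x_s


# Hypothetical timings: X runs the program in 10 s, Y in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n} times faster than Y")
```

The same n results from dividing the performance of X by the performance of Y, which is the equality the formula expresses.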
The programs in the SPEC CPU 2000 benchmark suites
The Embedded Benchmark
EEMBC: The EDN Embedded Microprocessor Benchmarks Consortium
The machine, software, and baseline tuning parameters for the CINT2000
Comparing and Summarizing Performance
Weighted arithmetic mean execution for three machines
Execution times from Figure 1.15 normalized to each machine
1.6 Quantitative Principles of Computer Design
• Amdahl’s Law
$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$$

$$\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$
Amdahl’s Law
• The larger the fraction of time the enhancement can be used, the larger the overall improvement
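$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left((1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}\right)$$

$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$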
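Amdahl's Law is easy to evaluate directly. The following is a minimal sketch; the function name `amdahl_speedup` and the example numbers (an enhancement that is 10 times faster but usable only 40% of the time) are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the original time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced
    enhanced = fraction_enhanced / speedup_enhanced
    return 1.0 / (unenhanced + enhanced)


# Hypothetical case: a 10x enhancement usable 40% of the time.
print(f"overall speedup: {amdahl_speedup(0.4, 10.0):.4f}")

# Even an enormous enhancement is bounded by 1 / (1 - fraction).
print(f"upper bound for fraction 0.4: {1.0 / (1.0 - 0.4):.4f}")
```

The second print shows why the unenhanced fraction dominates: no matter how large the enhanced speedup is, the overall speedup cannot exceed the reciprocal of the unenhanced fraction.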
Amdahl’s Law (Page 41)
Performance Comparison: Speedup, Amdahl’s Law
The CPU Performance Equation (Page 42)
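$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

$$\text{CPU time} = \text{Instruction count} \times \text{Clock cycles per instruction} \times \text{Clock cycle time}$$

$$\text{CPU time} = IC \times CPI \times \text{Clock cycle time}$$

$$\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time}$$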
CPU time
• Clock cycle time: hardware technology and organization
• CPI: organization and instruction set architecture
• Instruction count: instruction set architecture and compiler technology
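The three factors of CPU time multiply together, so improving any one of them by some percentage improves CPU time by the same percentage. A minimal sketch, with a hypothetical function name and made-up workload numbers:

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    clock_cycle_time_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program: 2 billion instructions, CPI of 1.5, 1 GHz clock.
base = cpu_time(2e9, 1.5, 1e9)
print(f"baseline CPU time: {base:.2f} s")

# Cutting CPI by 20% cuts CPU time by 20% if the other factors are unchanged.
improved = cpu_time(2e9, 1.5 * 0.8, 1e9)
print(f"with 20% lower CPI: {improved:.2f} s")
```

In practice the factors are not independent (for example, a change to the instruction set can lower instruction count while raising CPI), which is why the slide attributes each factor to different design levels.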
Overall CPI
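$$\text{CPU time} = \left(\sum_{i=1}^{n} IC_i \times CPI_i\right) \times \text{Clock cycle time}$$

$$CPI_{overall} = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i$$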
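Overall CPI is a weighted average of the per-class CPIs, weighted by how often each instruction class executes. The sketch below assumes a hypothetical three-class instruction mix; the class names, counts, and CPIs are invented for illustration.

```python
def overall_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """mix maps an instruction class to (instruction count IC_i, CPI_i)."""
    total_cycles = sum(ic * cpi for ic, cpi in mix.values())
    total_instructions = sum(ic for ic, _ in mix.values())
    return total_cycles / total_instructions


# Hypothetical instruction mix: (count, cycles per instruction).
MIX = {
    "alu": (50e6, 1.0),
    "load_store": (30e6, 2.0),
    "branch": (20e6, 3.0),
}

print(f"overall CPI: {overall_cpi(MIX):.2f}")
```

Multiplying this overall CPI by the total instruction count and the clock cycle time gives the same CPU time as summing the cycles of each class separately.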
Overall CPI Comparison (Page 44)
CPI Comparison
Speedup
• Pipelining (operation manual, regular design, …)
• Principle of locality: temporal and spatial
• Parallelism: multiple units, processors and cluster servers, distributed computing, …
• Clock rate (circuits, devices, …)
• Optics, …
1.7 Performance and Price-Performance
Seven different desktop systems
Performance and price-performance
Cluster Systems
The performance and the price-performance of cluster systems
Price-performance of cluster systems
Five different embedded processors
Relative performance of five different embedded processors for three of the five EEMBC benchmark suites
EEMBC: The EDN Embedded Microprocessor Benchmarks Consortium
Relative price-performance of five different embedded processors for three of the five EEMBC benchmark suites
1.8 Power Consumption and Efficiency as the metric
1.9 Fallacies and Pitfalls
• Fallacies: commonly held misbeliefs (F)
• Pitfalls: easily made mistakes (P)
– The relative performance of two processors with the same instruction set architecture (ISA) can be judged by clock rate or by the performance of a single benchmark suite. (F) (Fig. 1.28)
– Benchmarks remain valid indefinitely. (F) (Fig. 1.29)
– Comparing hand-coded assembly and compiler-generated high-level language performance. (P)
– Peak performance tracks observed performance. (F)
• The best design for a computer is the one that optimizes the primary objective without considering implementation. (F)
• Neglecting the cost of software in either evaluating a system or examining cost-performance. (P)
• Falling prey to Amdahl’s Law. (P)
• Synthetic benchmarks predict performance for real programs. (F)
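• MIPS is an accurate measure for comparing performance among computers. (F)

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{CPI \times 10^6}$$

$$\text{Execution time} = \frac{\text{Instruction count}}{\text{MIPS} \times 10^6}$$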
• The problem with using MIPS as a measure for comparison
– MIPS is dependent on the instruction set, making it difficult to compare MIPS of computers with different instruction sets.
– MIPS varies between programs on the same computer.
– Most importantly, MIPS can vary inversely to performance.
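The last point can be checked with a small calculation. The sketch below compares two hypothetical compilations of the same program on the same 500 MHz machine; all numbers and function names are made up for illustration. Version B executes fewer but slower instructions, so it finishes sooner while reporting a lower MIPS rating.

```python
def execution_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


CLOCK_HZ = 500e6  # hypothetical 500 MHz machine

# Hypothetical version A: many simple instructions.
time_a = execution_time(10e6, 1.0, CLOCK_HZ)
mips_a = mips(10e6, time_a)

# Hypothetical version B: fewer, more complex instructions.
time_b = execution_time(4e6, 2.0, CLOCK_HZ)
mips_b = mips(4e6, time_b)

print(f"A: {time_a * 1e3:.1f} ms at {mips_a:.0f} MIPS")
print(f"B: {time_b * 1e3:.1f} ms at {mips_b:.0f} MIPS")
print("B is faster" if time_b < time_a else "A is faster")
```

Execution time, not MIPS, is the quantity that tracks what the user experiences, which is why the two can disagree.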
P4 and P3 performance comparison-Relative performance
The tuning parameters for the SPEC CFP2000 report
The evolution of the SPEC benchmarks over time
The performance of three embedded processors
Measurements of peak performance and actual performance
1.10 Concluding Remarks
• Make the common case fast
• Chap. 2: The interaction between compiler and instruction set design.
• Part 3: Pipeline (Appendix A)
• Part 4: Memory Design (Chap. 5)
• Part 5: Storage System (Chap. 7)
• (Page 1-86), (page 87-168), (page A-1~A-87)
…..
1.11 Historical Perspective and References
• The First General-purpose Electronic Computers
• Important special-purpose machines
• Commercial Developments
• Development of Quantitative Performance Measures: Successes and Failures
$$\text{MIPS}_M = \frac{\text{Performance}_M}{\text{Performance}_{reference}} \times \text{MIPS}_{reference}$$