comarc/slides/lect1-3.pdf (transcript)
Computer Structure (מבנה המחשב)
“Architecture”??
“The art or science of building... the art or practice of designing and building structures...”
» Webster 9th New College Dictionary
“including plan, design, construction and decorative treatment...”
» American College Dictionary
“Computer Architecture”
- the term coined by Fred Brooks
“Computer architecture is the computer as seen by the user” - Amdahl et al. (1964)
“...by architecture, we mean the structure of the modules as they are organized in a computer system...”
- Stone, H. (1987)
“The architecture of a computer is the interface be tween the machine and the software”
- Andris Padegs, IBM 360/370 architect
“Computer Architecture”
° Structure: static arrangement of the parts (plan)
° Organization: dynamic interaction of these parts and their management (design)
° Implementation: the design of specific building blocks (construction)
“Computer Architecture” – cont’d
Architecture (from architect’s point of view)
° Instruction set architecture
° Implementation
• Organization: high-level aspects
- memory system
- bus structure
- internal CPU design
• Hardware:
- logic design
- packaging tech.
Levels in Computer Organization
° Concepts of multi-level machine
° Concepts of virtual machine
Architecture Disciplines
° Hardware/software structure
° Algorithms and their implementation
° Language Issues
Both hardware and software consist of hierarchical layers, with each lower layer hiding details from the level above. This principle of abstraction is the way both hardware designers and software designers cope with the complexity of computer systems. One key interface between the levels of abstraction is the instruction set architecture: the interface between the hardware and low-level software. This abstract interface enables many implementations of varying cost and performance to run identical software.
John L. Hennessy
David A. Patterson
The Big Picture
Early Calculating Machines
° 1623: Wilhelm Schickard’s mechanical counter.
° 1642: Blaise Pascal’s mechanical adder with carry.
Early Computing Machines
1823-42: Charles Babbage built the Difference Engine – to tabulate polynomial functions for math tables, with plans for a more general “Analytical Engine”
(assisted by Augusta Ada King, Countess of Lovelace)
First Electronic Computers
° Konrad Zuse (1938) – Z1 mechanical computer with binary arithmetic (program-controlled Z3 in 1941)
° John Atanasoff (1942) – “ABC” electronic digital computer to solve linear equations
° John Mauchly / J. Presper Eckert – ENIAC (1943-46) – first operational large-scale computing machine
° Maurice Wilkes – EDSAC (1949) – 1st operational stored-program computer
° Howard Aiken – Harvard Mark I (1939-44) – built by IBM
° John von Neumann / Eckert / Mauchly – EDVAC (1945) – 1st “published” stored-program computer
° Von Neumann (1945-51) – IAS Computer
° Mauchly / Eckert (1946-51) – UNIVAC
First General-Purpose Computer
° Electronic Numerical Integrator and Calculator (ENIAC), built in World War II, was the first general-purpose computer
• For computing artillery firing tables
• 80 feet long by 8.5 feet high and several feet wide
• Twenty 10-digit accumulators, each 2 feet long
• 18,000 vacuum tubes + 1,500 relays
• 5,000 additions/second
• 2,800 µs multiply
• Weight: 30 tons
• Power consumption: 140 kW
• Data from card reader (800 cards/min)
© 2004 Morgan Kaufman Publishers
The Atanasoff Story
° John Vincent Atanasoff, a professor of physics at Iowa State College (now University), and his technical assistant, Clifford Berry, built a working electronic computer in 1942.
° The First Electronic Computer: The Atanasoff Story, by Alice R. Burks and Arthur W. Burks. Ann Arbor, Michigan: The University of Michigan Press, 1991.
History Continues
° 1946-52: Von Neumann built the IAS computer at the Institute for Advanced Study, Princeton – a prototype for most future computers.
° 1947-50: Eckert-Mauchly Computer Corp. built UNIVAC I, used in the 1950 census.
° 1949: Maurice Wilkes built EDSAC, the first stored-program computer
EDVAC – Electronic Discrete Variable Computer (1945)
° John von Neumann
° First published stored-program computer (program & data in same memory - “von Neumann architecture”)
° 1024 words mercury delay-line memory, 20K words magnetic wire secondary memory
° 44-bit binary numbers & serial arithmetic
° Instruction format: A1 A2 A3 A4 OP
• A1 OP A2 -> A3, next instruction at A4
• Cond. Jump: if A1 <= A2 goto A3 else goto A4
° I/O between main & secondary memory
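The four-address execution model above (A1 OP A2 -> A3, next instruction at A4) can be sketched as a toy interpreter. This is a hypothetical illustration of the described format, not EDVAC's actual encoding; the function name `step` and the `alu_ops` table are invented:

```python
def step(mem, pc, alu_ops):
    """Execute one four-address instruction: A1 OP A2 -> A3, next at A4.

    Program and data share the same memory (the von Neumann idea):
    mem[pc] holds an instruction tuple, other cells hold numbers.
    """
    a1, a2, a3, a4, op = mem[pc]
    mem[a3] = alu_ops[op](mem[a1], mem[a2])
    return a4  # address of the next instruction


# Toy program: mem[12] = mem[10] + mem[11], then continue at address 1.
mem = {0: (10, 11, 12, 1, "add"), 10: 3, 11: 4, 12: 0}
alu_ops = {"add": lambda x, y: x + y}
next_pc = step(mem, 0, alu_ops)  # mem[12] becomes 7, next_pc is 1
```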
1. “Big Iron” Computers:
Used vacuum tubes, electric relays and bulk magnetic storage devices. No microprocessors; no semiconductor memory.
Example: ENIAC (1945), IBM Mark 1 (1944)
First-Generation Computers
° Late 1940s and 1950s
° Stored-program computers
° Programmed in assembly language
° Used magnetic devices and earlier forms of memories
° Examples: IAS, ENIAC, EDVAC, UNIVAC, Mark I, IBM 701
A Puzzle
What does the following mean?
• 00000000001000100100000000100000
• 00000000011001000100100000100000
• 00000001000010010010100000100010
OK, then, this?
• 000000 00001 00010 01000 00000 100000
• 000000 00011 00100 01001 00000 100000
• 000000 01000 01001 00101 00000 100010
Translation
And this?
• 0 1 2 8 0 32
• 0 3 4 9 0 32
• 0 8 9 5 0 34
How about this?
• add $8,$1,$2
• add $9,$3,$4
• sub $5,$8,$9
More Translation
Becoming clear?
• $8 = $1 + $2
• $9 = $3 + $4
• $5 = $8 - $9
Surely OK now?
• u = a + b
• v = c + d
• x = u - v
Or, obviously: x = (a+b) - (c+d)
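The puzzle resolves because MIPS R-type instructions pack six fixed-width fields (opcode, rs, rt, rd, shamt, funct). A minimal Python sketch of the field extraction (the helper name `decode_rtype` is ours):

```python
def decode_rtype(word):
    """Split a 32-bit MIPS R-type instruction into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs":     (word >> 21) & 0x1F,  # first source register
        "rt":     (word >> 16) & 0x1F,  # second source register
        "rd":     (word >> 11) & 0x1F,  # destination register
        "shamt":  (word >> 6)  & 0x1F,  # shift amount
        "funct":  word & 0x3F,          # 32 = add, 34 = sub
    }

# The first instruction from the puzzle: add $8, $1, $2
fields = decode_rtype(0b00000000001000100100000000100000)
```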
Levels of Representation

High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        ↓ Compiler
Assembly Language Program (e.g., MIPS)
    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)
        ↓ Assembler
Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        ↓ Machine Interpretation
Hardware Architecture Description (e.g., Verilog Language)
    wire [31:0] dataBus;
    regFile registers (dataBus);
    ALU ALUBlock (inA, inB, dataBus);
        ↓ Architecture Implementation
Logic Circuit Description (Verilog Language)
    wire w0;
    XOR (w0, a, b);
    AND (s, w0, a);
Computer Architecture

What are “Machine Structures”?
* Coordination of many levels (layers) of abstraction:

    Application (ex: browser)
    Operating System (Mac OS X)
    Compiler / Assembler                (Software)
    Instruction Set Architecture
    Processor / Memory / I/O system     (Hardware)
    Datapath & Control
    Digital Design / Circuit Design
    Transistors
Anatomy: 5 components of any Computer (e.g., a Personal Computer)
° Processor
• Control (“brain”)
• Datapath (“brawn”)
° Memory (where programs, data live when running)
° Devices
• Input (keyboard, mouse)
• Output (display, printer)
• Disk (where programs, data live when not running)
Overview of Physical implementations
° Integrated Circuits (ICs)
• Combinational logic circuits, memory elements, analog interfaces.
° Printed Circuit (PC) boards
• Substrate for ICs and interconnection; distribution of CLK, Vdd, and GND signals; heat dissipation.
° Power Supplies
• Convert line AC voltage to regulated DC low-voltage levels.
° Chassis (rack, card case, ...)
• Holds boards and power supply; provides physical interface to user or other systems.
° Connectors and Cables.
The hardware out of which we make systems.
Integrated Circuits
° Primarily Crystalline Silicon
° 1mm - 25mm on a side
° 2003 – feature size ~0.13 µm = 0.13 × 10^-6 m
° 100 - 400M transistors
° (25 - 100M “logic gates")
° 3 - 10 conductive layers
° “CMOS” (complementary metal oxide semiconductor) - most common.
° Package provides:
• spreading of chip-level signal paths to board-level
• heat dissipation.
° Ceramic or plastic with gold wires.
Chip in Package
Bare Die
Printed Circuit Boards
° fiberglass or ceramic
° 1-20 conductive layers
° 1-20in on a side
° IC packages are soldered down.
Technology Trends: Memory Capacity (Single-Chip DRAM)

year   size (Mbit)
1980   0.0625
1983   0.25
1986   1
1989   4
1992   16
1996   64
1998   128
2000   256
2002   512

• Now 1.4X/yr, or 2X every 2 years.
• 8000X since 1980!
Technology Trends: Microprocessor Complexity

(chart, 1970–2000: i4004 → i8080 → i8086 → i80286 → i80386 → i80486 → Pentium)

2X transistors/chip every 1.5 years, called “Moore’s Law”
• Sparc Ultra: 5.2 million
• Pentium Pro: 5.5 million
• PowerPC 620: 6.9 million
• Alpha 21164: 9.3 million
• Alpha 21264: 15 million
• Athlon (K7): 22 million
• Itanium 2: 410 million
Trends: Processor Performance
(chart: performance with respect to the VAX-11/780, 1987–97, growing 1.54x/year; machines plotted include Sun-4/260, MIPS M/120, MIPS M2000, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, and Intel Pentium IV at 3.0 GHz)
Processor Performance (SPEC)
(chart: performance 1982–1994, with Intel x86 on a 35%/yr trend line and RISC machines pulling ahead after the RISC introduction)
° Did RISC win the technology battle and lose the market war?
° Performance now improves ~60% per year (2x every 1.5 years)
OLD PICTURE – BUT THE STORY IS THE SAME
Processor Performance - Capacities
Technology --> Dramatic Changes
° Processor
• logic capacity: 2× in performance every 1.5 years
• clock rate: about 30% per year
• overall performance: 1000× in last decade
° Main Memory
• DRAM capacity: 2× / 2 years; 1000× size in last decade
• memory speed: about 10% per year
• cost / bit: improves about 25% per year
° Disk
• capacity: > 2× every 1.5 years
• cost / bit: improves about 60% per year
• 120× capacity in last decade
° Network Bandwidth
• increasing more than 100% per year!
Your PC in 2006
° State-of-the-art PC (on your desk)
• Processor clock speed: 8000 MegaHertz (8.0 GigaHertz)
• Memory capacity: 2048 MegaBytes (2.0 GigaBytes)
• Disk capacity: 800 GigaBytes (0.8 TeraBytes)
• Will need new units! Mega ⇒ Giga ⇒ Tera
Technology in the News
° BIG
• LaCie the first to offer a consumer-level 1.6 Terabyte disk!
• ~$2,000
• Weighs 11 pounds!
• 5 1/4” form-factor
° SMALL
• Pretec is soon offering a 12GB CompactFlash card
• Size of a silver dollar
• Cost? > New Honda!
www.lacie.com/products/product.htm?id=10129
www.engadget.com/entry/4463693158281236/
° Learn some of the big ideas in CS & engineering:
• 5 Classic components of a Computer
• Data can be anything (integers, floating point, characters): a program determines what it is
• Stored program concept: instructions just data
• Principle of Locality, exploited via a memory hierarchy (cache)
• Greater performance by exploiting parallelism
• Principle of abstraction, used to build systems as layers
• Compilation v. interpretation thru system layers
• Principles/Pitfalls of Performance Measurement
Text
° Computer Organization and Design: The Hardware/Software Interface, Third Edition, Patterson and Hennessy (COD). The second edition is far inferior, and is not suggested.
Your final grade
° Grading (could change)
• 25% Homework
• 75% Test
Course Problems…Cheating
°What is cheating?• Studying together in groups is encouraged.
• Turned-in work must be completely your own.
• Both “giver” and “receiver” are equally culpable
° Every offense will be referred to the Office of Student Judicial Affairs.
° Continued rapid improvement in computing
• 2X every 2.0 years in memory size; every 1.5 years in processor speed; every 1.0 year in disk capacity
• Moore’s Law enables processor improvement (2X transistors/chip ~1.5 yrs)
° 5 classic components of all computers: Control, Datapath, Memory, Input, Output (Control + Datapath = Processor)
What is "Computer Architecture"?
Computer Architecture =
Instruction Set Architecture (ISA) +
Machine Organization (MO)
• ISA: definition of what the machine does (logical view)
• MO: how the machine implements the ISA (physical implementation)
The Instruction Set: a (the?) Critical Interface
instruction set
software
hardware
Example ISAs(Instruction Set Architectures)
°Digital Alpha (v1, v3) 1992-97
°HP PA-RISC (v1.1, v2.0) 1986-01
°Sun Sparc (v8, v9, v10, v11) 1987-01
°SGI MIPS (MIPS I, II, III, IV, V) 1986-01
°Intel x86 (8086, 80286, 80486, Pentium, Pentium ...) 1978-01
°Intel + HP EPIC 1998-01
Impact of changing an ISA
° Early 1990s: Apple switched the instruction set architecture of the Macintosh
• From Motorola 68000-based machines
• To PowerPC architecture
• Upside? Downside?
° Intel 80x86 Family: many implementations of same architecture
• Upside: a program written in 1978 for the 8086 can run on the latest Pentium chip
• Downside?
The Big Picture
° Since 1946 all computers have had 5 components:
• Processor (“CPU”) = Datapath + Control Unit
• Memory
• Input
• Output
° Interconnection Structures (buses)
What is ``Computer Architecture''?

(layers, top to bottom: Application (Netscape); Operating System (Unix; Windows 2000); Compiler / Assembler (software); Instruction Set Architecture; Processor / Memory / I/O system (hardware); Datapath & Control; Digital Design; Circuit Design; transistors, IC layout)

° Co-ordination of many levels of abstraction
• hide unnecessary implementation details
• helps us cope with enormous complexity of real systems
° Under a rapidly changing set of forces
° Design, Measurement, and Evaluation
Forces Acting on Computer Architecture
° R-a-p-i-d Improvement in Implementation Technology:
• IC: integrated circuit; invented 1959
• SSI → MSI → LSI → VLSI: dramatic growth in number of transistors/chip ⇒ ability to create more (and bigger) Functional Units per processor
• bigger memory ⇒ more sophisticated applications, larger databases
• Ubiquitous computing
Execution Cycle
° Instruction Fetch – obtain instruction from program storage
° Instruction Decode – determine required actions and instruction size
° Operand Fetch – locate and obtain operand data
° Execute – compute result value or status
° Result Store – deposit results in storage for later use
° Next Instruction – determine successor instruction
Overview: Processor
Front Side Bus
These pieces implement the instruction cycles.
Overview: PCI Bus and Devices
Bus Controller
° All computers consist of five components
• Processor :
• (1) datapath and (2) control
• (3) Memory
• (4) Input devices and (5) Output devices
° Not all "memories" are created equally
• Cache: fast (expensive) memory placed closer to the processor
• Main memory: less expensive memory, so we can have more of it
° Interfaces are where the problems are - between functional units and between the computer and the outside world
° Need to design against constraints of performance, power, area and cost
Integrated Circuits Costs

IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield

Die cost = Wafer cost / (Dies per wafer × Die yield)

Dies per wafer = [π × (Wafer_diam / 2)²] / Die_Area − [π × Wafer_diam] / √(2 × Die_Area) − Test dies

Die yield = Wafer yield × (1 + Defects_per_unit_area × Die_Area / α)^(−α)

Die cost goes roughly with Die_Area^4.
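The cost model can be exercised numerically. A minimal Python sketch (the default α = 3.0 is a typical illustrative value, not one given on the slide, and the function names are ours):

```python
import math

def dies_per_wafer(wafer_diam, die_area, test_dies=0):
    """Wafer area over die area, minus an edge-loss term and test dies."""
    return (math.pi * (wafer_diam / 2) ** 2 / die_area
            - math.pi * wafer_diam / math.sqrt(2 * die_area)
            - test_dies)

def die_yield(wafer_yield, defects_per_unit_area, die_area, alpha=3.0):
    """Yield model: wafer_yield * (1 + D * A / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_unit_area * die_area / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diam, die_area, defects_per_unit_area,
             wafer_yield=1.0, test_dies=0):
    """Wafer cost spread over the good dies."""
    good = dies_per_wafer(wafer_diam, die_area, test_dies)
    return wafer_cost / (good * die_yield(wafer_yield, defects_per_unit_area,
                                          die_area))
```

Because both the dies-per-wafer and the yield terms worsen as die area grows, die cost rises much faster than linearly in area, which is the point of the "roughly with die area^4" rule of thumb.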
How to Quantify Performance?
• Time to run the task (ExTime)– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)– Throughput, bandwidth
Plane              Speed      DC to Paris  Passengers  Throughput (pmph)
Boeing 747         610 mph    6.5 hours    470         286,700
BAC/Sud Concorde   1350 mph   3 hours      132         178,200
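The throughput column is just speed × passengers (passenger-miles per hour); a quick Python check:

```python
planes = {
    "Boeing 747":       {"speed_mph": 610,  "passengers": 470},
    "BAC/Sud Concorde": {"speed_mph": 1350, "passengers": 132},
}

# pmph: passenger-miles per hour = speed x passengers
throughput = {name: p["speed_mph"] * p["passengers"]
              for name, p in planes.items()}
# The 747 wins on throughput even though the Concorde wins on latency.
```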
The Bottom Line: Performance and Cost or Cost and Performance?
"X is n times faster than Y" means
ExTime(Y)   Performance(X)
--------- = --------------- = n
ExTime(X)   Performance(Y)
• Speed of Concorde vs. Boeing 747
• Throughput of Boeing 747 vs. Concorde
• Cost is also an important parameter in the equation, which is why Concordes are being put to pasture!
Measurement Tools
° Benchmarks, Traces, Mixes
° Hardware: Cost, delay, area, power estimation
° Simulation (many levels)• ISA, RT, Gate, Circuit
° Queuing Theory
° Rules of Thumb
° Fundamental “Laws”/Principles
° Understanding the limitations of any measurement tool is crucial.
Metrics of Performance
(diagram: levels Application, Programming Language, Compiler, ISA, Datapath & Control, Function Units, Transistors / Wires / Pins, with metrics attached at each level)
• Answers per month
• Operations per second
• (millions) of Instructions per second: MIPS
• (millions) of (FP) operations per second: MFLOP/s
• Megabytes per second
• Cycles per second (clock rate)
Cases of Benchmark Engineering
° The motivation is to tune the system to the benchmark to achieve peak performance.
° At the architecture level
• Specialized instructions
° At the compiler level (compiler flags )
• Blocking in Spec89 ⇒ factor of 9 speedup
• Incorrect compiler optimizations/reordering
• Would work fine on the benchmark but not on other programs
° I/O level
• Spec92 spreadsheet program (sp)
• Companies noticed that the produced output was always output to a file (so they stored the results in a memory buffer) and then expunged it at the end (which was not measured).
• One company eliminated the I/O altogether.
After putting in a blazing performance on the benchmark test, Sun issued a glowing press release claiming that it had outperformed Windows NT systems on the test. Pendragon president Ivan Phillips cried foul, saying the results weren't representative of real-world Java performance and that Sun had gone so far as to duplicate the test's code within Sun's Just-In-Time compiler. That's cheating, says Phillips, who claims that benchmark tests and real-world applications aren't the same thing.
Did Sun issue a denial or a mea culpa? Initially, Sun neither denied optimizing for the benchmark test nor apologized for it. "If the test results are not representative of real-world Java applications, then that's a problem with the benchmark," Sun's Brian Croll said.
After taking a beating in the press, though, Sun retreated and issued an apology for the optimization. [Excerpted from PC Online 1997]
Issues with Benchmark Engineering
° Motivated by the bottom dollar: good performance on classic suites ⇒ more customers, better sales.
° Benchmark engineering ⇒ limits the longevity of benchmark suites.
° Technology and applications ⇒ limit the longevity of benchmark suites.
°http://www.spec.org/
SPEC: System Performance Evaluation Cooperative
° First Round 1989
• 10 programs yielding a single number (“SPECmarks”)
° Second Round 1992
• SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
- Compiler flags unlimited
• March 93 – new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
• “Benchmarks useful for 3 years”
• Single flag setting for all programs: SPECint_base95, SPECfp_base95
° SPEC CPU2000 (11 integer benchmarks – CINT2000, and 14 floating-point benchmarks – CFP2000)
SPEC 2000 (CINT 2000) Results
SPEC 2000 (CFP 2000) Results
Reporting Performance Results
° Reproducibility ⇒ apply them on publicly available benchmarks
° Pecking/picking order:
• Real programs
• Real kernels
• Toy benchmarks
• Synthetic benchmarks
How to Summarize Performance
° Arithmetic mean (weighted arithmetic mean) tracks execution time: sum(Ti)/n or sum(Wi*Ti)
° Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/sum(1/Ri) or 1/sum(Wi/Ri)
° Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)
° But do not take the arithmetic mean of normalized execution times; use the geometric mean = (Product(Ri))^(1/n)
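A minimal Python sketch of these summary statistics (function names are ours):

```python
import math

def arithmetic_mean(times, weights=None):
    """Tracks total execution time: sum(Ti)/n, or sum(Wi*Ti) if weighted."""
    if weights is None:
        return sum(times) / len(times)
    return sum(w * t for w, t in zip(weights, times))

def harmonic_mean(rates):
    """For rates such as MFLOPS: n / sum(1/Ri)."""
    return len(rates) / sum(1 / r for r in rates)

def geometric_mean(ratios):
    """For normalized execution times: (prod Ri)^(1/n)."""
    return math.prod(ratios) ** (1 / len(ratios))
```

The geometric mean is the right choice for normalized numbers because it gives the same answer no matter which machine is used as the reference.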
Performance Evaluation
° “For better or worse, benchmarks shape a field”
° Good products created when have:
• Good benchmarks
• Good ways to summarize performance
° Given that sales is a function in part of performance relative to competition, there is investment in improving the product as reported by the performance summary
° If benchmarks/summary are inadequate, then choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
° Execution time is the measure of computer performance!
Simulations
° When are simulations useful?
° What are its limitations, i.e., what real-world phenomena does it not account for?
° The larger the simulation trace, the less tractable the post-processing analysis.
Queueing Theory
° What are the distributions of arrival rates and values for other parameters?
° Are they realistic?
° What happens when the parameters or distributions are changed?
Quantitative Principles of Computer Design
° Make the Common Case Fast• Amdahl’s Law
° CPU Performance Equation• Clock cycle time
• CPI
• Instruction Count
° Principles of Locality
° Take advantage of Parallelism
CPU Performance Equation
CPU time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)
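With clock rate as the reciprocal of cycle time, the equation becomes a one-line helper; the 1 GHz / 10^9-instruction numbers below are a hypothetical example, not from the slide:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Seconds = instructions x (cycles/instruction) x (seconds/cycle)."""
    return instruction_count * cpi / clock_rate_hz

# 10^9 instructions at CPI 1.5 on a 1 GHz clock -> 1.5 seconds
t = cpu_time(1e9, 1.5, 1e9)
```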
Amdahl's Law
Speedup due to enhancement E:

Speedup(E) = (ExTime w/o E) / (ExTime w/ E) = (Performance w/ E) / (Performance w/o E)
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
Amdahl’s Law
ExTime_new = ExTime_old × [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
Amdahl’s Law
° Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
ExTime_new = ? Speedup_overall = ?
Amdahl’s Law (answer)
° Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

ExTime_new = ExTime_old × (0.9 + 0.1/2) = 0.95 × ExTime_old

Speedup_overall = 1 / 0.95 = 1.053
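The same arithmetic as a reusable Python helper (the function name is ours):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the work is sped up by a factor."""
    return 1 / ((1 - fraction_enhanced)
                + fraction_enhanced / speedup_enhanced)

# The FP example above: 10% of instructions run 2x faster.
speedup = amdahl_speedup(0.1, 2)  # 1 / 0.95, about 1.053
```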
Example: Calculating CPI
Base Machine (Reg / Reg), Typical Mix

Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        0.5      (33%)
Load     20%    2        0.4      (27%)
Store    10%    2        0.2      (13%)
Branch   20%    2        0.4      (27%)
Total CPI                1.5
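The bottom line of the table is a frequency-weighted sum; in Python:

```python
mix = [  # (op, frequency, cycles) from the base-machine table
    ("ALU", 0.50, 1),
    ("Load", 0.20, 2),
    ("Store", 0.10, 2),
    ("Branch", 0.20, 2),
]
cpi = sum(freq * cycles for _, freq, cycles in mix)  # 1.5
```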
Chapter Summary, #1
• Designing to Last through Trends

        Capacity        Speed
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years

• 6 yrs to graduate => 16X CPU speed, DRAM/Disk size
• Time to run the task– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …– Throughput, bandwidth
• “X is n times faster than Y” means

ExTime(Y)   Performance(X)
--------- = --------------
ExTime(X)   Performance(Y)
° Amdahl’s Law:
° CPI Law:
° Execution time is the REAL measure of computer performance!
° Good products created when have:
• Good benchmarks, good ways to summarize performance
° Die cost goes roughly with die area^4
Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
CPU time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)
Food for thought
° Two companies report results on two benchmarks: one on a Fortran benchmark suite and the other on a C++ benchmark suite.
° Company A’s product outperforms Company B’s on the Fortran suite; the reverse holds true for the C++ suite. Assume the performance differences are similar in both cases.
° Do you have enough information to compare the two products? What information will you need?
Food for Thought II
° In the CISC vs. RISC debate a key argument of the RISC movement was that because of its simplicity, RISC would always remain ahead.
° If there were enough transistors to implement a CISC on chip, then those same transistors could implement a pipelined RISC
° If there was enough to allow for a pipelined CISC there would be enough to have an on-chip cache for RISC. And so on.
° After 20 years of this debate what do you think?
° Hint: Think of commercial PCs, Moore’s law, and some of the data in the first chapter of the book (and on these slides)