SWE3005: Introduction to Computer Architectures, Fall 2019, Jinkyu Jeong ([email protected])
Computer Abstractions and Technology
Jinkyu Jeong ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu
Outline
Textbook: P&H 1.1, 1.3-1.8 (5th Ed.) / 1.1-1.6 (4th Ed.)
• Overview
• Below Your Program
• Under the Covers
• Performance
• The Power Wall and Transition to Multiprocessors
The Computer Revolution
• Progress in computer technology
– Underpinned by Moore’s Law
• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search Engines
– Artificial intelligence
• Computers are pervasive
Moore’s Law (1965)
• The single most important guideline in microprocessor fabrication and architecture
– "the number of transistors per chip will double every 18 months" (the commonly quoted form; Moore's 1965 paper projected a doubling every year, revised in 1975 to every two years)
– "as the sophistication of chips goes up, the cost of [fabrication plants] goes up exponentially"
• Both have held true for 40+ years.
(http://download.intel.com/museum/research/arc_collect/history_docs/pix/hoff1.jpg)
Figure: Gordon Moore; original graph from 1965 (source: www.intel.com). Axes: Relative Manufacturing Cost/Component vs. Number of Components Per Integrated Circuit
Moore’s Law (1965)
Over 10,000,000x increase of transistor count (and performance)!
(Source: Wikipedia)
Classes of Computers
• Personal computers
– General purpose, variety of software
– Subject to cost/performance tradeoff
• Server computers
– Network based
– High capacity, performance, reliability
– Range from small servers to building sized
• Embedded computers
– Hidden as components of systems
– Stringent power/performance/cost constraints
The PostPC Era
Figure: sales volume (in million units) of cell phones (not including smartphones), PCs (not including tablets), smartphones, and tablets
The PostPC Era
• Personal Mobile Device (PMD)
– Battery operated
– Connects to the Internet
– Hundreds of dollars
– Smart phones, tablets, electronic glasses
• Cloud computing
– Warehouse Scale Computers (WSC)
– Software as a Service (SaaS)
– Portion of software run on a PMD and a portion run in the Cloud
– Amazon and Google
What You Will Learn from This Course
• How programs are translated into the machine language
– And how the hardware executes them
• The hardware/software interface
• What determines program performance
– And how it can be improved
• How hardware designers improve performance
• What is parallel processing
Understanding Performance
• Algorithm
– Determines number of operations executed
• Programming language, compiler, architecture
– Determine number of machine instructions executed per operation
• Processor and memory system
– Determine how fast instructions are executed
• I/O system (including OS)
– Determines how fast I/O operations are executed
Below Your Program
Below Your Program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
  • Handling input/output
  • Managing memory and storage
  • Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers
Levels of Program Code
• High-level language
– Level of abstraction closer to problem domain
– Provides for productivity and portability
• Assembly language
– Textual representation of instructions
• Hardware representation
– Binary digits (bits)
– Encoded instructions and data
Under the Covers
Components of a Computer
• Same components for all kinds of computer
– Desktop, server, embedded
• Processor (Datapath/Control), Memory, and I/O
• Input/output (I/O) includes
– User-interface devices
  • Display, keyboard, mouse
– Storage devices
  • Hard disk, CD/DVD, flash
– Network adapters
  • For communicating with other computers
The BIG Picture
Opening the Box
Capacitive multitouch LCD screen
3.8 V, 25 Watt-hour battery
Computer board
Inside the Processor (CPU) (5th Ed.)
• Apple A5
You will understand what each block does by the end of this course!
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
– Small fast SRAM memory for immediate access to data
Abstractions
• Abstraction helps us deal with complexity
– Hide lower-level detail
• Instruction set architecture (ISA)
– Hardware abstraction visible to software (compiler or programmer)
– The hardware/software interface
Figure: the instruction set as the layer between software and hardware (source: D. Culler @ UC Berkeley)
A Safe Place for Data
• Volatile main memory
– Loses instructions and data when power off
– E.g., DRAM
• Non-volatile secondary memory
– Magnetic disk
– Flash memory
– Optical disk (CDROM, DVD)
Networks
• Communication and resource sharing
• Local area network (LAN): Ethernet
– Within a building
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth
Manufacturing ICs
Source: Intel
Processing ICs
Process
Building circuits to form a computer chip is extremely precise and complex. It requires dozens of layers of various materials in specific patterns to simultaneously produce hundreds or thousands of die on each 300mm wafer. The following illustration takes a closer look at the process of adding one layer: a single patterned oxide film.
1. Start with a partially processed die on a silicon wafer. A chip is often referred to as die until final packaging has been completed.
2. Deposit oxide layer. A thin film of oxide is an electrical insulator. Like the insulator surrounding household wires, it is a key component of electronic circuits. Intel “grows” this layer of oxide on top of the wafer in a furnace at very high temperatures in the presence of oxygen.
3. Coat with photoresist. A light-sensitive substance called photoresist prepares the wafer for the removal of sections of the oxide to create a specific oxide pattern. Photoresist is sensitive to ultraviolet light, yet it is also resistant to certain etching chemicals that will be applied later.
4. Position mask and flash ultraviolet light. Masks (pieces of glass with transparent and opaque regions) are a result of the design phase and define the circuit pattern on each layer of a chip. A sophisticated machine called a stepper aligns the mask to the wafer. The stepper “steps” across the wafer, stopping briefly at incremental locations to flash ultraviolet light through the transparent regions of the mask. This process is called photolithography. The portions of the photoresist that are exposed to light become soluble.
5. Rinse with solvent. A solvent removes the exposed portions of photoresist, revealing part of the oxide layer underneath.
6. Etch with acid. Using an acid in a process called etching, the exposed oxide is removed. Oxide protected by the mask remains in place.
7. Remove remaining photoresist. Finally, the remaining photoresist is removed, leaving the desired pattern of oxide on the silicon wafer. A new oxide layer is complete.
Figure: cross-sections of steps 1 to 7 (labels: Oxide Layer, Photoresist, Mask, Ultraviolet Light, Metal Interconnect) and a magnified cross-section of metal interconnects with oxide layers
Performing More Fabrication Steps
Laying down an oxide layer is just one part of the fabrication process. Other steps include the following.
Adding more layers
Additional materials such as polysilicon, which conducts electricity, are deposited on the wafer through further film deposition, masking, and etching steps. Each layer of material has a unique pattern.
Doping
The doping operation bombards the exposed areas of the silicon wafer with various chemical impurities, altering the way the silicon in these areas conducts electricity. Doping is what turns silicon into silicon transistors, enabling the switching between the two states, on and off, that represent binary 1s and 0s, which provide the basis for representing information in a computer.
Metallization
Multiple layers of metal are applied to form the electrical connections between the transistors. Intel uses eight or more patterned layers of copper because of its low resistance and because it can be cost-effectively integrated into the manufacturing process. Interconnects between layers, called contacts, are made of tungsten. The specific patterns of these metals are also formed using photolithography, as described previously.
Completing the wafer
A completed wafer contains millions or even billions of transistors connected by a multi-layer maze of metal “wires.” Finally, the wafer is coated with a passivation layer to help protect it from contamination and increase its electrical stability.
Figure caption: A completed die contains millions of circuits that appear as an intricate pattern.
Source: Intel
Realizing a Logic Gate
• NAND logic gate with CMOS technology
Figure source: ICE3003: Computer Architecture, Spring 2012, Jin-Soo Kim ([email protected])
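The circuit figure for this slide did not survive in the transcript. As a rough illustration only, the sketch below models the textbook 2-input CMOS NAND structure at the switch level: two PMOS transistors in parallel pull the output up, and two NMOS transistors in series pull it down. The function name `cmos_nand` and the boolean switch model are assumptions made for this example, not something taken from the slide.

```python
def cmos_nand(a: int, b: int) -> int:
    """Switch-level model of a 2-input CMOS NAND gate (illustrative only)."""
    # Pull-up network: two PMOS transistors in parallel.
    # A PMOS transistor conducts when its gate input is 0.
    pull_up = (a == 0) or (b == 0)
    # Pull-down network: two NMOS transistors in series.
    # An NMOS transistor conducts when its gate input is 1.
    pull_down = (a == 1) and (b == 1)
    # In a complementary design exactly one network conducts at a time.
    assert pull_up != pull_down
    return 1 if pull_up else 0


# Enumerate all input combinations to obtain the truth table.
truth_table = {(a, b): cmos_nand(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

The output is 0 only when both inputs are 1, which is the NAND function.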
Intel Core i7 Wafer
• 300mm wafer, 280 chips, 32nm technology
• Each chip is 20.7 x 10.5 mm
Technology Trends: Processor and Disk
• Processor
– Logic capacity: about 30%/year
– Clock rate: about 20%/year
• Disk
– Capacity: about 60%/year

Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2013   Ultra large scale IC         250,000,000,000
Technology Trends: Memory
• DRAM (Memory)
– Capacity: about 60%/year (4x every 3 years)
– Speed: about 10%/year
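The "60%/year" and "4x every 3 years" figures are the same claim stated two ways, which a quick compound-growth check makes visible. This is a minimal sketch; the helper name `growth_over` is made up for the example, and the rates are the approximate ones quoted on the slide.

```python
def growth_over(years: float, annual_rate: float) -> float:
    """Compound growth factor after `years` at a fixed annual rate."""
    return (1.0 + annual_rate) ** years


# DRAM capacity: about 60% per year, compounded over 3 years.
dram_capacity_3y = growth_over(3, 0.60)
# DRAM speed: about 10% per year over the same 3 years.
dram_speed_3y = growth_over(3, 0.10)

print(f"capacity after 3 years: {dram_capacity_3y:.2f}x")
print(f"speed after 3 years:    {dram_speed_3y:.2f}x")
```

Capacity compounds to roughly 4.1x in three years while speed compounds to only about 1.3x.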
Performance
Defining Performance
• Which airplane has the best performance?
Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph
Response Time and Throughput
• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
  • e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…
Relative Performance
• Define Performance = 1/Execution Time
• “X is n times faster than Y”
\[
\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
• Example: time taken to run a program
– 10s on A, 15s on B
– Execution Time_B / Execution Time_A = 15s / 10s = 1.5
– So A is 1.5 times faster than B
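The definition above reduces to a single division, sketched here with the slide's numbers. The function name `relative_performance` is chosen for this example only.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    return time_y / time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")
```

Note that the slower machine's time goes in the numerator, which is easy to get backwards.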
Measuring Execution Time
• Elapsed time
– Total response time, including all aspects
  • Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Time spent processing a given job
  • Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
CPU Clocking
• Operation of digital hardware governed by a constant-rate clock
• Clock cycle (period): duration of a clock cycle
– e.g., 250ps = 0.25ns = 250×10^-12 s
• Clock rate (frequency): cycles per second
– e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
Figure: timing diagram of clock cycles, labeled "Clock period", "Data transfer and computation", and "Update state"
Some Math: CPU Clock Rate vs. Clock Cycle
Figure: timing diagram of clock cycles, labeled "Clock period", "Data transfer and computation", and "Update state"
10 nsec clock cycle => 100 MHz clock rate
5 nsec clock cycle => 200 MHz clock rate
2 nsec clock cycle => 500 MHz clock rate
1 nsec (10^-9 sec) clock cycle => 1 GHz (10^9 Hz) clock rate
500 psec clock cycle => 2 GHz clock rate
250 psec clock cycle => 4 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
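Every row in the list above follows from clock rate = 1 / clock period. A small sketch that regenerates the list; the function names are illustrative and periods are expressed in seconds.

```python
def clock_rate_hz(period_seconds: float) -> float:
    """Clock rate (Hz) is the reciprocal of the clock period (s)."""
    return 1.0 / period_seconds


def clock_period_s(rate_hz: float) -> float:
    """Clock period (s) is the reciprocal of the clock rate (Hz)."""
    return 1.0 / rate_hz


# Clock periods from the slide, in seconds.
periods = {
    "10 nsec": 10e-9,
    "5 nsec": 5e-9,
    "2 nsec": 2e-9,
    "1 nsec": 1e-9,
    "500 psec": 500e-12,
    "250 psec": 250e-12,
    "200 psec": 200e-12,
}

for label, period in periods.items():
    print(f"{label} clock cycle => {clock_rate_hz(period) / 1e6:g} MHz")
```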
CPU Time
• CPU execution time (CPU time)
– Time the CPU spends working on a task
– Does not include time waiting for I/O or running other programs
• CPU time can be improved (decreased) by
– Reducing number of clock cycles
– Increasing clock rate
– Hardware designer often trades off clock rate against cycle count
\[
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\]
CPU Time Example
• A program runs in 10 seconds on computer A, which has a 2 GHz clock. Computer B should run the same program in 6 seconds, but the design that makes this possible requires 1.2 times as many clock cycles as computer A.
• How fast must computer B's clock be?
\[
\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9
\]
\[
\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}} = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}
\]
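The same calculation can be traced step by step in code. This is a minimal sketch using the example's numbers; the helper names are invented for illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, cpu_time_s: float) -> float:
    """Clock rate = clock cycles / CPU time."""
    return cycles / cpu_time_s


# Computer A: 10 s at 2 GHz.
cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)
# Computer B needs 1.2 times as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, cpu_time_s=6.0)

print(f"cycles on A: {cycles_a:.3g}")
print(f"cycles on B: {cycles_b:.3g}")
print(f"required clock rate for B: {rate_b / 1e9:.1f} GHz")
```

B has to double the clock rate to cut run time by only 40%, because part of the gain is spent on the extra cycles.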
Instruction Count and CPI
• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
– Different instructions have different CPI
  • Average CPI affected by instruction mix
\[
\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}
\]
\[
\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}
\]
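CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
\[
\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}
\]
\[
\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}
\]
A is faster…
\[
\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2
\]
…by this much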
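Because both machines run the same ISA, the instruction count I cancels and only the time per instruction (CPI x cycle time) matters. A short sketch of that comparison, with an illustrative helper name:

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


# Values from the example; the instruction count I is identical on both
# machines, so it drops out of the ratio.
a_ps = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
b_ps = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)
speedup_a_over_b = b_ps / a_ps

print(f"A: {a_ps:.0f} ps per instruction")
print(f"B: {b_ps:.0f} ps per instruction")
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```

A wins despite its higher CPI because its cycle time is half as long.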
CPI in More Detail
• If different instruction classes take different numbers of cycles
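\[
\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)
\]
• Weighted average CPI
\[
\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)
\]
where Instruction Count_i / Instruction Count is the relative frequency of instruction class i.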
CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C (IC = Instruction Count)

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

• Sequence 1: IC = 5
– Clock Cycles = 2×1 + 1×2 + 2×3 = 10
– Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
– Clock Cycles = 4×1 + 1×2 + 1×3 = 9
– Avg. CPI = 9/6 = 1.5
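The two sequences can be checked with the weighted-sum formula directly. The sketch below uses the table's values; the dictionary layout and function names are choices made for this example.

```python
# CPI of each instruction class, from the table.
cpi_by_class = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two compiled sequences.
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}


def total_cycles(counts: dict) -> int:
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(cpi_by_class[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "IC =", sum(seq.values()),
          "cycles =", total_cycles(seq),
          "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions but fewer cycles, so instruction count alone does not predict which sequence is faster.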
Performance Summary
• Performance depends on
– Algorithm
– Programming language
– Compiler
– Instruction set architecture (ISA)
– Core organization (also called microarchitecture)
– Fabrication technology
\[
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}
\]
Determinants of CPU Performance
CPU time = Instruction_count x CPI x clock_cycle

                       Instruction_count   CPI   clock_cycle
Algorithm                      X            X
Programming language           X            X
Compiler                       X            X
ISA                            X            X         X
Core organization                           X         X
Technology                                            X
The Power Wall and Transition from Uniprocessor to Multiprocessors
Power Trends
• In CMOS IC technology
\[
\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}
\]
Figure annotations: Power ×30, Voltage 5V→1V, Frequency ×1000
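To see why the voltage drop matters so much, the power relation can be evaluated with relative scale factors. This is a sketch under a stated assumption: the figure's annotations are read as clock frequency growing about 1000x, power growing about 30x, and supply voltage falling from 5V to 1V. The function name and the "implied capacitive load" calculation are illustrative and are not taken from the slide.

```python
def relative_power(cap_scale: float, voltage_scale: float, freq_scale: float) -> float:
    """Relative dynamic power: capacitive load x voltage^2 x frequency."""
    return cap_scale * voltage_scale ** 2 * freq_scale


# Assumed reading of the figure annotations (see the note above).
voltage_scale = 1.0 / 5.0   # 5V -> 1V
freq_scale = 1000.0         # x1000 clock frequency
observed_power_scale = 30.0 # x30 power

# Power growth if the capacitive load had stayed constant.
power_fixed_cap = relative_power(1.0, voltage_scale, freq_scale)
# Power growth if the voltage had stayed at 5V instead.
power_fixed_voltage = relative_power(1.0, 1.0, freq_scale)
# Capacitive-load change that would reconcile the formula with x30.
implied_cap_scale = observed_power_scale / power_fixed_cap

print(f"power with constant load:    x{power_fixed_cap:.0f}")
print(f"power with constant voltage: x{power_fixed_voltage:.0f}")
print(f"implied capacitive load:     x{implied_cap_scale:.2f}")
```

Because voltage enters squared, the 5x voltage reduction absorbs a 25x share of the frequency increase, which is why further clock scaling stalls once voltage can no longer be lowered.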
Uniprocessor Performance
Constrained by the power wall (as well as by limited instruction-level parallelism and by memory latency)
Single-core vs. Multi-core
Figure (source: Intel): performance and power relative to a single core

Configuration        Performance   Power
Single-core          1.00x         1.00x
Raise clock (20%)    1.13x         1.73x
Lower clock (20%)    0.87x         0.51x
Dual-core            1.73x         1.02x

More computation power/watt
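The figure's numbers can be turned into performance per watt to make the comparison concrete. The (performance, power) pairs below are read from the figure labels. The last step rests on an assumption that the transcript does not state: that the dual-core configuration is two cores each running at the lowered clock, with ideal scaling.

```python
# Relative (performance, power) pairs as read from the figure labels.
configs = {
    "Single-core": (1.00, 1.00),
    "Raise clock (20%)": (1.13, 1.73),
    "Lower clock (20%)": (0.87, 0.51),
    "Dual-core": (1.73, 1.02),
}

# Performance per watt relative to the single-core baseline.
perf_per_watt = {name: perf / power for name, (perf, power) in configs.items()}

# Assumption (not stated in the transcript): the dual-core part is two
# lower-clocked cores with ideal scaling, so both numbers simply double.
lower_perf, lower_power = configs["Lower clock (20%)"]
dual_estimate = (2 * lower_perf, 2 * lower_power)

for name, ratio in perf_per_watt.items():
    print(f"{name:18s} {ratio:.2f}x performance per watt")
print(f"two lower-clocked cores (estimate): "
      f"{dual_estimate[0]:.2f}x performance, {dual_estimate[1]:.2f}x power")
```

Under that assumption the estimate lands close to the figure's dual-core values: about the same power as the baseline single core for roughly 1.7x the performance, whereas raising the clock buys 1.13x performance for 1.73x power.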
Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
  • Hardware executes multiple instructions at once
  • Hidden from the programmer
– Hard to do
  • Programming for performance
  • Load balancing
  • Optimizing communication and synchronization
Summary
• Abstraction is fundamental to understanding today's computer systems
– In both hardware and software
– ISA is an interface between SW and HW
• Performance measure: execution time
\[
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}
\]
• Moore's law has underpinned microprocessor technology development and IT industry
• Power limit forces a dramatic change in microprocessor design
– Moving on to multicore systems