SWE3005: Introduction to Computer Architectures, Fall 2019, Jinkyu Jeong ([email protected])

Computer Abstractions and Technology

Jinkyu Jeong ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu



Outline

Textbook: P&H 1.1, 1.3-1.8 (5th Ed.) / 1.1-1.6 (4th Ed.)

• Overview

• Below Your Program

• Under the Covers

• Performance

• The Power Wall and Transition to Multiprocessors


The Computer Revolution

• Progress in computer technology
  – Underpinned by Moore’s Law

• Makes novel applications feasible
  – Computers in automobiles
  – Cell phones
  – Human genome project
  – World Wide Web
  – Search Engines
  – Artificial intelligence

• Computers are pervasive


Moore’s Law (1965)

• The single most important guideline in microprocessor fabrication and architecture

– "the number of transistors per chip will double every 18 months" (the popular paraphrase; Moore’s 1965 paper projected a doubling every year, which he revised in 1975 to every two years)

– "as the sophistication of chips goes up, the cost of [fabrication plants] goes up exponentially"

• Both have held true for 40+ years.

[Figure: Gordon Moore (http://download.intel.com/museum/research/arc_collect/history_docs/pix/hoff1.jpg)]

[Figure: original graph from 1965, plotting Relative Manufacturing Cost/Component against Number of Components Per Integrated Circuit (source: www.intel.com)]


Moore’s Law (1965)

Over 10,000,000x increase of transistor count (and performance)!

(Source: Wikipedia)
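As a rough check on the 10,000,000x figure, the number of doublings it implies can be computed directly. This is a minimal sketch: the function name `years_for_growth` is hypothetical, and the 1.5-year and 2-year periods are the commonly quoted doubling intervals, not measured values.

```python
import math


def years_for_growth(factor: float, doubling_period_years: float) -> float:
    """Years needed to grow by `factor` if the quantity doubles once per period."""
    doublings = math.log2(factor)
    return doublings * doubling_period_years


if __name__ == "__main__":
    factor = 10_000_000  # growth quoted on the slide
    print(f"doublings needed: {math.log2(factor):.2f}")
    for period in (1.5, 2.0):  # assumed doubling periods, in years
        years = years_for_growth(factor, period)
        print(f"doubling every {period} years -> about {years:.1f} years")
```

A 10,000,000x increase is a little over 23 doublings, so it takes several decades at either doubling period.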


Classes of Computers

• Personal computers
  – General purpose, variety of software
  – Subject to cost/performance tradeoff

• Server computers
  – Network based
  – High capacity, performance, reliability
  – Range from small servers to building sized

• Embedded computers
  – Hidden as components of systems
  – Stringent power/performance/cost constraints


The PostPC Era

[Figure: sales volume (in million units) of cell phones (not including smartphones), PCs (not including tablets), smartphones, and tablets]


The PostPC Era

• Personal Mobile Device (PMD)
  – Battery operated
  – Connects to the Internet
  – Hundreds of dollars
  – Smart phones, tablets, electronic glasses

• Cloud computing
  – Warehouse Scale Computers (WSC)
  – Software as a Service (SaaS)
  – Portion of software run on a PMD and a portion run in the Cloud
  – Amazon and Google


What You Will Learn from This Course

• How programs are translated into the machine language
  – And how the hardware executes them

• The hardware/software interface

• What determines program performance
  – And how it can be improved

• How hardware designers improve performance

• What is parallel processing


Understanding Performance

• Algorithm
  – Determines number of operations executed

• Programming language, compiler, architecture
  – Determine number of machine instructions executed per operation

• Processor and memory system
  – Determine how fast instructions are executed

• I/O system (including OS)
  – Determines how fast I/O operations are executed


Below Your Program


• Application software
  – Written in high-level language

• System software
  – Compiler: translates HLL code to machine code
  – Operating System: service code
    • Handling input/output
    • Managing memory and storage
    • Scheduling tasks & sharing resources

• Hardware
  – Processor, memory, I/O controllers


Levels of Program Code

• High-level language
  – Level of abstraction closer to problem domain
  – Provides for productivity and portability

• Assembly language
  – Textual representation of instructions

• Hardware representation
  – Binary digits (bits)
  – Encoded instructions and data
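To make the step from assembly language to the hardware representation concrete, here is a minimal sketch that packs one instruction into bits. It assumes the MIPS R-type field layout (6, 5, 5, 5, 5, and 6 bits) and the MIPS register numbering used in the P&H textbook; the slide names no ISA, so the instruction and the helper `encode_r_type` are illustrative choices only.

```python
def encode_r_type(opcode: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-type fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# Assembly language (textual form): add $t0, $s1, $s2
# Assumed MIPS register numbers: $t0 = 8, $s1 = 17, $s2 = 18; funct 0x20 selects "add".
word = encode_r_type(opcode=0, rs=17, rt=18, rd=8, shamt=0, funct=0x20)

if __name__ == "__main__":
    print(f"{word:032b}")  # hardware representation: 32 binary digits
    print(f"0x{word:08x}")
```

The same instruction therefore exists in two forms: text for the programmer and a 32-bit pattern for the hardware.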


Under the Covers


Components of a Computer

• Same components for all kinds of computer
  – Desktop, server, embedded

• Processor (Datapath/Control), Memory, and I/O

• Input/output (I/O) includes
  – User-interface devices
    • Display, keyboard, mouse
  – Storage devices
    • Hard disk, CD/DVD, flash
  – Network adapters
    • For communicating with other computers

The BIG Picture


Opening the Box

[Figure: opened device, with labels for the capacitive multitouch LCD screen, the 3.8 V, 25 watt-hour battery, and the computer board]


Inside the Processor (CPU) (5th Ed.)

• Apple A5

You will understand what each block does by the end of this course!

• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
  – Small fast SRAM memory for immediate access to data


Abstractions

• Abstraction helps us deal with complexity
  – Hide lower-level detail

• Instruction set architecture (ISA)
  – Hardware abstraction visible to software (compiler or programmer)
  – The hardware/software interface

[Figure: the instruction set as the layer between software and hardware. Source: D. Culler @ UC Berkeley]


A Safe Place for Data

• Volatile main memory
  – Loses instructions and data when power off
  – E.g., DRAM

• Non-volatile secondary memory
  – Magnetic disk
  – Flash memory
  – Optical disk (CDROM, DVD)


Networks

• Communication and resource sharing

• Local area network (LAN): Ethernet
  – Within a building

• Wide area network (WAN): the Internet

• Wireless network: WiFi, Bluetooth


Manufacturing ICs

Source: Intel


Processing ICs

Building circuits to form a computer chip is extremely precise and complex. It requires dozens of layers of various materials in specific patterns to simultaneously produce hundreds or thousands of die on each 300mm wafer. The following illustration takes a closer look at the process of adding one layer: a single patterned oxide film.

Process

1. Start with a partially processed die on a silicon wafer. A chip is often referred to as die until final packaging has been completed.

2. Deposit oxide layer. A thin film of oxide is an electrical insulator. Like the insulator surrounding household wires, it is a key component of electronic circuits. Intel “grows” this layer of oxide on top of the wafer in a furnace at very high temperatures in the presence of oxygen.

3. Coat with photoresist. A light-sensitive substance called photoresist prepares the wafer for the removal of sections of the oxide to create a specific oxide pattern. Photoresist is sensitive to ultraviolet light, yet it is also resistant to certain etching chemicals that will be applied later.

4. Position mask and flash ultraviolet light. Masks (pieces of glass with transparent and opaque regions) are a result of the design phase and define the circuit pattern on each layer of a chip. A sophisticated machine called a stepper aligns the mask to the wafer. The stepper “steps” across the wafer, stopping briefly at incremental locations to flash ultraviolet light through the transparent regions of the mask. This process is called photolithography. The portions of the photoresist that are exposed to light become soluble.

5. Rinse with solvent. A solvent removes the exposed portions of photoresist, revealing part of the oxide layer underneath.

6. Etch with acid. Using an acid in a process called etching, the exposed oxide is removed. Oxide protected by the mask remains in place.

7. Remove remaining photoresist. Finally, the remaining photoresist is removed, leaving the desired pattern of oxide on the silicon wafer. A new oxide layer is complete.

[Figure: illustration of steps 1 to 7, with labels for the oxide layer, photoresist, mask, and ultraviolet light, plus a magnified cross-section of metal interconnects with oxide layers]

Performing More Fabrication Steps

Laying down an oxide layer is just one part of the fabrication process. Other steps include the following.

Adding more layers

Additional materials such as polysilicon, which conducts electricity, are deposited on the wafer through further film deposition, masking, and etching steps. Each layer of material has a unique pattern.

Doping

The doping operation bombards the exposed areas of the silicon wafer with various chemical impurities, altering the way the silicon in these areas conducts electricity. Doping is what turns silicon into silicon transistors, enabling the switching between the two states, on and off, that represent binary 1s and 0s, which provide the basis for representing information in a computer.

Metallization

Multiple layers of metal are applied to form the electrical connections between the transistors. Intel uses eight or more patterned layers of copper because of its low resistance and because it can be cost-effectively integrated into the manufacturing process. Interconnects between layers, called contacts, are made of tungsten. The specific patterns of these metals are also formed using photolithography, as described previously.

Completing the wafer

A completed wafer contains millions or even billions of transistors connected by a multi-layer maze of metal “wires.” Finally, the wafer is coated with a passivation layer to help protect it from contamination and increase its electrical stability.

[Figure caption: A completed die contains millions of circuits that appear as an intricate pattern.]


Source: Intel


Realizing a Logic Gate

• NAND logic gate with CMOS technology

[Figure: NAND logic built with CMOS technology. Source: ICE3003: Computer Architecture, Spring 2012, Jin-Soo Kim ([email protected])]
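The behavior of a CMOS NAND gate can be sketched by treating each transistor as a switch. The model below assumes the standard static CMOS arrangement (two PMOS transistors in parallel pulling the output up, two NMOS transistors in series pulling it down). It is a logical illustration only, not an electrical simulation, and `cmos_nand` is a hypothetical name.

```python
def cmos_nand(a: int, b: int) -> int:
    """Model a 2-input CMOS NAND gate as two complementary switch networks."""
    # Pull-up network: two PMOS transistors in parallel; each conducts when its input is 0.
    pull_up = (a == 0) or (b == 0)
    # Pull-down network: two NMOS transistors in series; conducts only when both inputs are 1.
    pull_down = (a == 1) and (b == 1)
    # In a static CMOS gate exactly one of the two networks conducts for any input.
    assert pull_up != pull_down
    return 1 if pull_up else 0


if __name__ == "__main__":
    print("a b | out")
    for a in (0, 1):
        for b in (0, 1):
            print(f"{a} {b} |  {cmos_nand(a, b)}")
```

The output is 0 only when both inputs are 1, which is the NAND truth table.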


Intel Core i7 Wafer

• 300mm wafer, 280 chips, 32nm technology

• Each chip is 20.7 x 10.5 mm


Technology Trends: Processor and Disk

• Processor
  – Logic capacity: about 30%/year
  – Clock rate: about 20%/year

• Disk
  – Capacity: about 60%/year

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000


Technology Trends: Memory

• DRAM (Memory)
  – Capacity: about 60%/year (4x every 3 years)
  – Speed: about 10%/year
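The "60%/year (4x every 3 years)" equivalence is compound growth, and the same arithmetic gives a doubling time for each quoted rate. A minimal sketch, with hypothetical helper names and the slides' approximate rates as inputs:

```python
import math


def growth_over(rate_per_year: float, years: float) -> float:
    """Total growth factor after compounding `rate_per_year` for `years`."""
    return (1.0 + rate_per_year) ** years


def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


if __name__ == "__main__":
    print(f"60%/year for 3 years -> {growth_over(0.60, 3):.2f}x")
    for label, rate in [("DRAM capacity", 0.60), ("logic capacity", 0.30),
                        ("clock rate", 0.20), ("DRAM speed", 0.10)]:
        print(f"{label}: doubles about every {doubling_time(rate):.1f} years")
```

At these rates capacity doubles in under two years while DRAM speed needs more than seven, so the two diverge quickly.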


Performance


Defining Performance

• Which airplane has the best performance?

[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph]


Response Time and Throughput

• Response time
  – How long it takes to do a task

• Throughput
  – Total work done per unit time
    • e.g., tasks/transactions/… per hour

• How are response time and throughput affected by
  – Replacing the processor with a faster version?
  – Adding more processors?

• We’ll focus on response time for now…


Relative Performance

• Define Performance = 1/Execution Time
• “X is n times faster than Y”

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Example: time taken to run a program
  – 10s on A, 15s on B
  – Execution Time_B / Execution Time_A = 15s / 10s = 1.5
  – So A is 1.5 times faster than B
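The definition above translates directly into a small calculation. This is a minimal sketch with hypothetical function names; the 10 s and 15 s inputs are the slide's example values.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n such that X is n times faster than Y."""
    return performance(time_x_s) / performance(time_y_s)


if __name__ == "__main__":
    n = times_faster(time_x_s=10.0, time_y_s=15.0)  # A takes 10 s, B takes 15 s
    print(f"A is {n:.1f} times faster than B")
```

Because performance is a reciprocal, the ratio of performances equals the inverse ratio of execution times.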


Measuring Execution Time

• Elapsed time
  – Total response time, including all aspects
    • Processing, I/O, OS overhead, idle time
  – Determines system performance

• CPU time
  – Time spent processing a given job
    • Discounts I/O time, other jobs’ shares
  – Comprises user CPU time and system CPU time
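The difference between elapsed time and CPU time can be observed with Python's standard `time` module, where `time.perf_counter()` reads a wall clock and `time.process_time()` reads the user plus system CPU time of the current process. The task below is a made-up workload chosen only to contain both idle time and computation.

```python
import time


def measure(task):
    """Return (elapsed_seconds, cpu_seconds) for one call of task()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_seconds = time.process_time() - cpu_start
    elapsed_seconds = time.perf_counter() - wall_start
    return elapsed_seconds, cpu_seconds


def sleepy_task():
    """Hypothetical workload: idle for a while, then compute."""
    time.sleep(0.2)                      # idle time: counts toward elapsed time only
    sum(i * i for i in range(200_000))   # computation: counts toward both


if __name__ == "__main__":
    elapsed, cpu = measure(sleepy_task)
    print(f"elapsed time: {elapsed:.3f} s")
    print(f"CPU time:     {cpu:.3f} s")
```

The elapsed time includes the sleep, while the CPU time reflects only the work the processor did for this process.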


CPU Clocking

• Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram showing clock cycles, with data transfer and computation followed by a state update within each clock period]

• Clock cycle (period): duration of a clock cycle
  – e.g., 250ps = 0.25ns = 250×10^-12 s

• Clock rate (frequency): cycles per second
  – e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz


Some Math: CPU Clock Rate vs. Clock Cycle

[Figure: clock timing diagram (clock cycles, data transfer and computation, update state, clock period)]

10 nsec clock cycle => 100 MHz clock rate

5 nsec clock cycle => 200 MHz clock rate

2 nsec clock cycle => 500 MHz clock rate

1 nsec (10^-9 s) clock cycle => 1 GHz (10^9 Hz) clock rate

500 psec clock cycle => 2 GHz clock rate

250 psec clock cycle => 4 GHz clock rate

200 psec clock cycle => 5 GHz clock rate
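Each row above is an instance of clock rate = 1 / clock period. A minimal sketch that reproduces the table; the function name is hypothetical.

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate (frequency) is the reciprocal of the clock period."""
    return 1.0 / period_s


if __name__ == "__main__":
    # Clock periods from the slide, expressed in seconds.
    periods_s = [10e-9, 5e-9, 2e-9, 1e-9, 500e-12, 250e-12, 200e-12]
    for period in periods_s:
        rate = clock_rate_hz(period)
        print(f"{period * 1e12:7.0f} ps clock cycle => {rate / 1e6:6.0f} MHz clock rate")
```

Halving the clock period doubles the clock rate, which is why the two are used interchangeably when describing a processor's speed.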


CPU Time

• CPU execution time (CPU time)
  – Time the CPU spends working on a task
  – Does not include time waiting for I/O or running other programs

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• CPU time can be improved (decreased) by
  – Reducing number of clock cycles
  – Increasing clock rate
  – Hardware designer often trades off clock rate against cycle count


CPU Time Example

• A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B run at to run this program in 6 seconds? Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

• How fast must computer B's clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$
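The worked example can be checked numerically. This is a minimal sketch using the slide's numbers; the helper names are hypothetical.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate = clock cycles / CPU time."""
    return cycles / target_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                                     # B needs 1.2x as many cycles
rate_b = required_clock_rate(cycles_b, target_time_s=6.0)     # B must finish in 6 s

if __name__ == "__main__":
    print(f"Clock cycles on A: {cycles_a:.3g}")
    print(f"Clock cycles on B: {cycles_b:.3g}")
    print(f"Required clock rate for B: {rate_b / 1e9:.1f} GHz")
```

Computer B has to double the clock rate to gain a 1.67x reduction in time, because part of the gain is spent on the extra cycles.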


Instruction Count and CPI

• Instruction Count for a program
  – Determined by program, ISA and compiler

• Average cycles per instruction
  – Determined by CPU hardware
  – Different instructions have different CPI
    • Average CPI affected by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$


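CPI Example

• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

A is faster…

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

…by this much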
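The comparison can be reproduced in a few lines. Since both computers run the same ISA, the instruction count I is the same and cancels in the ratio; the value used below is arbitrary, and the function name is hypothetical.

```python
def cpu_time_ps(instruction_count: float, cpi: float, cycle_time_ps: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


INSTRUCTION_COUNT = 1_000_000  # arbitrary; it cancels out of the ratio

time_a = cpu_time_ps(INSTRUCTION_COUNT, cpi=2.0, cycle_time_ps=250.0)
time_b = cpu_time_ps(INSTRUCTION_COUNT, cpi=1.2, cycle_time_ps=500.0)
ratio = time_b / time_a

if __name__ == "__main__":
    print(f"Time per instruction on A: {time_a / INSTRUCTION_COUNT:.0f} ps")
    print(f"Time per instruction on B: {time_b / INSTRUCTION_COUNT:.0f} ps")
    print(f"A is {ratio:.1f} times faster than B")
```

A has the higher CPI but still wins, because its shorter cycle time more than compensates.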


CPI in More Detail

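• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.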


CPI Example

• Alternative compiled code sequences using instructions in classes A, B, C (IC = Instruction Count)

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

• Sequence 1: IC = 5
  – Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  – Avg. CPI = 10/5 = 2.0

• Sequence 2: IC = 6
  – Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  – Avg. CPI = 9/6 = 1.5
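The two sequences can be evaluated with the weighted-CPI sum. A minimal sketch using the table's values; the dictionary layout and function names are illustrative choices.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles per instruction for each class


def total_cycles(ic_by_class: dict) -> int:
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in ic_by_class.items())


def average_cpi(ic_by_class: dict) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(ic_by_class) / sum(ic_by_class.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

if __name__ == "__main__":
    for name, seq in [("Sequence 1", sequence_1), ("Sequence 2", sequence_2)]:
        print(f"{name}: IC = {sum(seq.values())}, "
              f"cycles = {total_cycles(seq)}, avg CPI = {average_cpi(seq):.1f}")
```

Sequence 2 executes more instructions but fewer cycles, so instruction count alone does not determine which sequence is faster.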


Performance Summary

• Performance depends on
  – Algorithm
  – Programming language
  – Compiler
  – Instruction set architecture (ISA)
  – Core organization (also called microarchitecture)
  – Fabrication technology

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$


Determinants of CPU Performance

CPU time = Instruction_count x CPI x clock_cycle

                       Instruction_count   CPI   clock_cycle
Algorithm              X                   X
Programming language   X                   X
Compiler               X                   X
ISA                    X                   X     X
Core organization                          X     X
Technology                                       X


The Power Wall and Transition from Uniprocessor to Multiprocessors


Power Trends

• In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

[Figure annotations: Power ×30, Voltage 5V→1V, Frequency ×1000]
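The power formula shows why the voltage drop mattered. The sketch below applies it to the quoted changes (frequency ×1000, voltage 5V→1V). It assumes an unchanged capacitive load, which the slide does not state, so the result is an order-of-magnitude illustration rather than a reproduction of the ×30 figure. The function name is hypothetical.

```python
def relative_power(cap_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """Power ratio from Power = Capacitive load x Voltage^2 x Frequency."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Assumption: capacitive load unchanged (ratio 1.0).
    with_voltage_drop = relative_power(1.0, voltage_ratio=1.0 / 5.0, freq_ratio=1000.0)
    without_voltage_drop = relative_power(1.0, voltage_ratio=1.0, freq_ratio=1000.0)
    print(f"x1000 frequency at constant 5 V: power x{without_voltage_drop:.0f}")
    print(f"x1000 frequency with 5 V -> 1 V: power x{with_voltage_drop:.0f}")
```

Because voltage enters squared, the 5x reduction offsets a 25x share of the frequency increase. Once voltage can no longer be lowered, further frequency increases translate almost directly into power.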


Uniprocessor Performance

Constrained by power wall (as well as instruction-level parallelism, memory latency)


Single-core vs. Multi-core

[Figure: relative performance and power of single-core and dual-core configurations. Source: Intel]

Configuration        Performance   Power
Single-core          1.00x         1.00x
Raise clock (20%)    1.13x         1.73x
Lower clock (20%)    0.87x         0.51x
Dual-core            1.73x         1.02x

More computation power/watt
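The power figures are consistent with a simple model in which supply voltage scales with clock frequency, so that power (proportional to Voltage² × Frequency) grows with the cube of the clock rate. That scaling is an assumption made here for illustration, not something the figure states. The performance values are taken from the figure rather than derived, and the dual-core row is treated as two cores at the lowered clock.

```python
def relative_power_cubic(freq_ratio: float) -> float:
    """Relative power if voltage scales with frequency: P ~ V^2 * f ~ f^3 (assumed model)."""
    return freq_ratio ** 3


if __name__ == "__main__":
    print(f"Raise clock 20%: power x{relative_power_cubic(1.2):.2f}")
    print(f"Lower clock 20%: power x{relative_power_cubic(0.8):.2f}")

    # Two cores, each at the lowered clock; 0.87x per-core performance is read from the figure.
    perf_per_core = 0.87
    dual_core_perf = 2 * perf_per_core
    dual_core_power = 2 * relative_power_cubic(0.8)
    print(f"Dual-core at lowered clock: performance x{dual_core_perf:.2f}, "
          f"power x{dual_core_power:.2f}")
```

Under this model, two slower cores deliver roughly 1.7x the performance for about the same power as one core, which is the "more computation power/watt" argument.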


Multiprocessors

• Multicore microprocessors
  – More than one processor per chip

• Requires explicitly parallel programming
  – Compare with instruction level parallelism
    • Hardware executes multiple instructions at once
    • Hidden from the programmer
  – Hard to do
    • Programming for performance
    • Load balancing
    • Optimizing communication and synchronization


Summary

• Abstraction is fundamental to understanding today's computer systems
  – In both hardware and software
  – ISA is an interface between SW and HW

• Performance measure: execution time

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Moore's law has underpinned microprocessor technology development and the IT industry

• Power limit forces a dramatic change in microprocessor design
  – Moving on to multicore systems