
SWE3005: Introduction to Computer Architectures, Fall 2019, Jinkyu Jeong ([email protected])

Computer Abstractions and Technology

Jinkyu Jeong ([email protected])

Computer Systems Laboratory

Sungkyunkwan University

http://csl.skku.edu


Outline

Textbook: P&H 1.1, 1.3-1.8 (5th Ed.) / 1.1-1.6 (4th Ed.)

• Overview

• Below Your Program

• Under the Covers

• Performance

• The Power Wall and Transition to Multiprocessors


The Computer Revolution

• Progress in computer technology
– Underpinned by Moore’s Law

• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search engines
– Artificial intelligence

• Computers are pervasive


Moore’s Law (1965)

• The single most important guideline in microprocessor fabrication and architecture

– "the number of transistors per chip will double every 18 months"

– "as the sophistication of chips goes up, the cost of [fabrication plants] goes up exponentially"

• Both have held true for more than 40 years.

[Figure: photo of Gordon Moore (http://download.intel.com/museum/research/arc_collect/history_docs/pix/hoff1.jpg) next to his original graph from 1965 (source: www.intel.com). Axes: Relative Manufacturing Cost/Component versus Number of Components Per Integrated Circuit]


Moore’s Law (1965)

Over 10,000,000x increase of transistor count (and performance)!

(Source: Wikipedia)


Classes of Computers

• Personal computers
– General purpose, variety of software
– Subject to cost/performance tradeoff

• Server computers
– Network based
– High capacity, performance, reliability
– Range from small servers to building sized

• Embedded computers
– Hidden as components of systems
– Stringent power/performance/cost constraints


The PostPC Era

[Figure: sales volume (in million units) over time for cell phones (not including smartphones), PCs (not including tablets), smartphones, and tablets]


The PostPC Era

• Personal Mobile Device (PMD)
– Battery operated
– Connects to the Internet
– Hundreds of dollars
– Smart phones, tablets, electronic glasses

• Cloud computing
– Warehouse Scale Computers (WSC)
– Software as a Service (SaaS)
– A portion of the software runs on a PMD and a portion runs in the cloud
– Amazon and Google


What You Will Learn from This Course

• How programs are translated into the machine language

– And how the hardware executes them

• The hardware/software interface

• What determines program performance
– And how it can be improved

• How hardware designers improve performance

• What is parallel processing


Understanding Performance

• Algorithm
– Determines number of operations executed

• Programming language, compiler, architecture
– Determine number of machine instructions executed per operation

• Processor and memory system
– Determine how fast instructions are executed

• I/O system (including OS)
– Determines how fast I/O operations are executed


Below Your Program


Below Your Program

• Application software
– Written in high-level language (HLL)

• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
  • Handling input/output
  • Managing memory and storage
  • Scheduling tasks & sharing resources

• Hardware
– Processor, memory, I/O controllers


Levels of Program Code

• High-level language
– Level of abstraction closer to problem domain
– Provides for productivity and portability

• Assembly language
– Textual representation of instructions

• Hardware representation
– Binary digits (bits)
– Encoded instructions and data
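The step from assembly language to the hardware representation can be sketched in a few lines. The example below assumes the MIPS ISA, which the slides do not name (it is the ISA used in the cited P&H editions), and the helper `encode_r_type` and the `REGISTERS` table are hypothetical names introduced only for illustration.

```python
# Illustrative sketch (assumes MIPS32 R-type format, not stated in the slides):
# opcode(6 bits) rs(5) rt(5) rd(5) shamt(5) funct(6)

# Register numbers for the three MIPS registers used in this example.
REGISTERS = {"$t0": 8, "$s1": 17, "$s2": 18}


def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack R-type fields into one 32-bit word (opcode is 0 for R-type)."""
    opcode = 0
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# Assembly language: add $t0, $s1, $s2   (funct 0x20 selects "add")
word = encode_r_type(
    rs=REGISTERS["$s1"],
    rt=REGISTERS["$s2"],
    rd=REGISTERS["$t0"],
    shamt=0,
    funct=0x20,
)

# Hardware representation: the same instruction as 32 binary digits.
print(f"{word:032b}")
print(f"0x{word:08x}")
```

The printed bit string is what the processor actually fetches; the assembly text is only a human-readable spelling of it.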


Under the Covers


Components of a Computer

The BIG Picture

• Same components for all kinds of computer
– Desktop, server, embedded

• Processor (Datapath/Control), Memory, and I/O

• Input/output (I/O) includes
– User-interface devices
  • Display, keyboard, mouse
– Storage devices
  • Hard disk, CD/DVD, flash
– Network adapters
  • For communicating with other computers


Opening the Box

Capacitive multitouch LCD screen

3.8 V, 25 Watt-hour battery

Computer board


Inside the Processor (CPU) (5th Ed.)

• Apple A5

You will understand what each block does by the end of this course!

• Datapath: performs operations on data

• Control: sequences datapath, memory, ...

• Cache memory
– Small fast SRAM memory for immediate access to data


Abstractions

• Abstraction helps us deal with complexity
– Hide lower-level detail

• Instruction set architecture (ISA)
– Hardware abstraction visible to software (compiler or programmer)
– The hardware/software interface

[Figure: the instruction set as the layer between software and hardware. Source: D. Culler @ UC Berkeley]


A Safe Place for Data

• Volatile main memory
– Loses instructions and data when powered off
– E.g., DRAM

• Non-volatile secondary memory
– Magnetic disk
– Flash memory
– Optical disk (CD-ROM, DVD)


Networks

• Communication and resource sharing

• Local area network (LAN): Ethernet
– Within a building

• Wide area network (WAN): the Internet

• Wireless network: WiFi, Bluetooth


Manufacturing ICs

Source: Intel


Processing ICs

Process

Building circuits to form a computer chip is extremely precise and complex. It requires dozens of layers of various materials in specific patterns to simultaneously produce hundreds or thousands of die on each 300mm wafer. The following steps take a closer look at the process of adding one layer: a single patterned oxide film.

1. Start with a partially processed die on a silicon wafer. A chip is often referred to as a die until final packaging has been completed.

2. Deposit oxide layer. A thin film of oxide is an electrical insulator. Like the insulator surrounding household wires, it is a key component of electronic circuits. Intel "grows" this layer of oxide on top of the wafer in a furnace at very high temperatures in the presence of oxygen.

3. Coat with photoresist. A light-sensitive substance called photoresist prepares the wafer for the removal of sections of the oxide to create a specific oxide pattern. Photoresist is sensitive to ultraviolet light, yet it is also resistant to certain etching chemicals that will be applied later.

4. Position mask and flash ultraviolet light. Masks (pieces of glass with transparent and opaque regions) are a result of the design phase and define the circuit pattern on each layer of a chip. A sophisticated machine called a stepper aligns the mask to the wafer. The stepper "steps" across the wafer, stopping briefly at incremental locations to flash ultraviolet light through the transparent regions of the mask. This process is called photolithography. The portions of the photoresist that are exposed to light become soluble.

5. Rinse with solvent. A solvent removes the exposed portions of photoresist, revealing part of the oxide layer underneath.

6. Etch with acid. Using an acid in a process called etching, the exposed oxide is removed. Oxide protected by the mask remains in place.

7. Remove remaining photoresist. Finally, the remaining photoresist is removed, leaving the desired pattern of oxide on the silicon wafer. A new oxide layer is complete.

[Figure: cross-sections for steps 1 to 7 showing the oxide layer, photoresist, mask, and ultraviolet light, plus a magnified cross-section of metal interconnects with oxide layers]


Performing More Fabrication Steps

Laying down an oxide layer is just one part of the fabrication process. Other steps include the following.

Adding more layers

Additional materials such as polysilicon, which conducts electricity, are deposited on the wafer through further film deposition, masking, and etching steps. Each layer of material has a unique pattern.

Doping

The doping operation bombards the exposed areas of the silicon wafer with various chemical impurities, altering the way the silicon in these areas conducts electricity. Doping is what turns silicon into silicon transistors, enabling the switching between the two states, on and off, that represent binary 1s and 0s, which provide the basis for representing information in a computer.

Metallization

Multiple layers of metal are applied to form the electrical connections between the transistors. Intel uses eight or more patterned layers of copper because of its low resistance and because it can be cost-effectively integrated into the manufacturing process. Interconnects between layers, called contacts, are made of tungsten. The specific patterns of these metals are also formed using photolithography, as described previously.

Completing the wafer

A completed wafer contains millions or even billions of transistors connected by a multi-layer maze of metal "wires." Finally, the wafer is coated with a passivation layer to help protect it from contamination and increase its electrical stability.

[Figure: a completed die contains millions of circuits that appear as an intricate pattern]



Source: Intel


Realizing a Logic Gate

• NAND logic gate with CMOS technology

[Figure: NAND logic built with CMOS technology. Source: ICE3003: Computer Architecture, Spring 2012, Jin-Soo Kim ([email protected])]
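As a rough illustration of how CMOS realizes NAND, the sketch below models each transistor as an ideal switch. It assumes the standard two-input CMOS NAND topology (two PMOS transistors in parallel for pull-up, two NMOS transistors in series for pull-down), which the figure residue here does not spell out; `cmos_nand` is a hypothetical name.

```python
def cmos_nand(a: int, b: int) -> int:
    """Switch-level model of a 2-input CMOS NAND gate (inputs are 0 or 1)."""
    # Pull-up network: two PMOS transistors in parallel.
    # A PMOS switch conducts when its gate input is 0.
    pull_up_conducts = (a == 0) or (b == 0)

    # Pull-down network: two NMOS transistors in series.
    # An NMOS switch conducts when its gate input is 1.
    pull_down_conducts = (a == 1) and (b == 1)

    # Exactly one network conducts, so the output is never floating
    # and never shorted between supply and ground.
    assert pull_up_conducts != pull_down_conducts

    return 1 if pull_up_conducts else 0


# Enumerate all input combinations to obtain the truth table.
truth_table = {(a, b): cmos_nand(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

The internal assertion captures the complementary property of the two networks: for every input combination one of them connects the output to a rail and the other is open.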


Intel Core i7 Wafer

• 300mm wafer, 280 chips, 32nm technology

• Each chip is 20.7 x 10.5 mm
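The wafer figures above can be sanity-checked with a little geometry. This is a back-of-the-envelope sketch: dividing wafer area by die area is a simplifying assumption that ignores the unusable rectangular fragments at the circular edge, so it gives only an upper bound on the die count.

```python
import math

# Values taken from the slide.
WAFER_DIAMETER_MM = 300.0
DIE_WIDTH_MM = 20.7
DIE_HEIGHT_MM = 10.5
DIES_ON_WAFER = 280

die_area_mm2 = DIE_WIDTH_MM * DIE_HEIGHT_MM
wafer_area_mm2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2

# Upper bound: pretend the dies could tile the circle perfectly.
upper_bound_dies = math.floor(wafer_area_mm2 / die_area_mm2)

# Fraction of the wafer surface covered by the 280 dies actually placed.
used_fraction = DIES_ON_WAFER * die_area_mm2 / wafer_area_mm2

print(f"die area: {die_area_mm2:.2f} mm^2")
print(f"upper bound on dies per wafer: {upper_bound_dies}")
print(f"wafer area used by {DIES_ON_WAFER} dies: {used_fraction:.1%}")
```

The gap between the upper bound and the 280 chips on the slide is the area lost around the wafer edge.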


Technology Trends: Processor and Disk

• Processor
– Logic capacity: about 30%/year
– Clock rate: about 20%/year

• Disk
– Capacity: about 60%/year

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000


Technology Trends: Memory

• DRAM (Memory)
– Capacity: about 60%/year (4x every 3 years)
– Speed: about 10%/year
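The "60%/year" and "4x every 3 years" figures are the same claim expressed two ways, which a quick compound-growth calculation confirms. The helper name `growth_over` is introduced here only for illustration.

```python
def growth_over(years: int, annual_rate: float) -> float:
    """Total growth factor after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


# DRAM capacity: about 60% per year, compounded over 3 years.
dram_capacity_3y = growth_over(3, 0.60)

# DRAM speed: about 10% per year over the same 3 years.
dram_speed_3y = growth_over(3, 0.10)

print(f"capacity after 3 years: {dram_capacity_3y:.2f}x")
print(f"speed after 3 years:    {dram_speed_3y:.2f}x")
```

Capacity grows by roughly a factor of four over three years while speed grows by only about a third, which is the widening gap the slide points at.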


Performance


Defining Performance

• Which airplane has the best performance?

[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph]


Response Time and Throughput

• Response time
– How long it takes to do a task

• Throughput
– Total work done per unit time
  • e.g., tasks/transactions/… per hour

• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?

• We’ll focus on response time for now…


Relative Performance

• Define Performance = 1/Execution Time

• "X is n times faster than Y"

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Example: time taken to run a program
– 10s on A, 15s on B
– Execution Time_B / Execution Time_A = 15s / 10s = 1.5
– So A is 1.5 times faster than B
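The definition above translates directly into code. This is a minimal sketch; the function names `performance` and `times_faster` are hypothetical and chosen only to mirror the slide's wording.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n such that 'X is n times faster than Y'."""
    return performance(time_x_s) / performance(time_y_s)


# Slide example: the program takes 10 s on A and 15 s on B.
n = times_faster(time_x_s=10.0, time_y_s=15.0)
print(f"A is {n:.1f} times faster than B")
```

Note that the ratio of performances equals the inverse ratio of execution times, so the slower machine's time ends up in the numerator.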


Measuring Execution Time

• Elapsed time
– Total response time, including all aspects
  • Processing, I/O, OS overhead, idle time
– Determines system performance

• CPU time
– Time spent processing a given job
  • Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time


CPU Clocking

• Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram marking the clock period, with data transfer and computation during each cycle followed by a state update]

• Clock cycle (period): duration of a clock cycle
– e.g., 250ps = 0.25ns = 250×10^-12 s

• Clock rate (frequency): cycles per second
– e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz


Some Math: CPU Clock Rate vs. Clock Cycle

[Figure: clock timing diagram marking the clock period, data transfer and computation, and state update]

10 nsec clock cycle => 100 MHz clock rate

1 nsec (10^-9 s) clock cycle => 1 GHz (10^9 Hz) clock rate

500 psec clock cycle => 2 GHz clock rate
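The three conversions above all follow from clock rate = 1 / clock period. A small sketch, with the hypothetical helper `clock_rate_hz`, reproduces them:

```python
def clock_rate_hz(clock_period_s: float) -> float:
    """Clock rate (Hz) is the reciprocal of the clock period (seconds)."""
    return 1.0 / clock_period_s


# Clock periods from the slide, expressed in seconds.
periods_s = {
    "10 ns": 10e-9,
    "1 ns": 1e-9,
    "500 ps": 500e-12,
}

rates_hz = {label: clock_rate_hz(period) for label, period in periods_s.items()}

for label, rate in rates_hz.items():
    print(f"{label} clock cycle => {rate / 1e6:.0f} MHz clock rate")
```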




CPU Time

• CPU execution time (CPU time)
– Time the CPU spends working on a task
– Does not include time waiting for I/O or running other programs

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• CPU time can be improved (decreased) by
– Reducing number of clock cycles
– Increasing clock rate
– Hardware designer often trades off clock rate against cycle count


CPU Time Example

• A program runs in 10 seconds on computer A, which has a 2 GHz clock. We want computer B to run this program in 6 seconds. Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

• How fast must computer B's clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$
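The same derivation can be checked numerically. The sketch below follows the three steps of the example (cycles on A, cycles on B, required rate for B); all variable names are illustrative.

```python
# Given values from the example.
clock_rate_a_hz = 2e9      # computer A: 2 GHz
cpu_time_a_s = 10.0        # program takes 10 s on A
cpu_time_b_s = 6.0         # target time on B
cycle_ratio_b_to_a = 1.2   # B needs 1.2x as many clock cycles as A

# Step 1: Clock Cycles_A = CPU Time_A x Clock Rate_A
clock_cycles_a = cpu_time_a_s * clock_rate_a_hz

# Step 2: B executes 1.2 times as many cycles.
clock_cycles_b = cycle_ratio_b_to_a * clock_cycles_a

# Step 3: Clock Rate_B = Clock Cycles_B / CPU Time_B
clock_rate_b_hz = clock_cycles_b / cpu_time_b_s

print(f"Clock cycles on A: {clock_cycles_a:.3g}")
print(f"Required clock rate for B: {clock_rate_b_hz / 1e9:.1f} GHz")
```

Doubling the clock rate buys less than a 2x speedup here because 20% of the gain is spent on the extra cycles.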


Instruction Count and CPI

• Instruction Count for a program
– Determined by program, ISA and compiler

• Average cycles per instruction (CPI)
– Determined by CPU hardware
– Different instructions have different CPI
  • Average CPI affected by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$


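CPI Example

• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

A is faster, by this much:

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$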
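A short numerical check of this comparison follows. Because both machines implement the same ISA, the instruction count is identical and cancels in the ratio; the value used below is an arbitrary assumption chosen only to make the code concrete.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


# Arbitrary instruction count (assumption); the ratio does not depend on it.
INSTRUCTION_COUNT = 1_000_000

cpu_time_a = cpu_time_ps(INSTRUCTION_COUNT, cpi=2.0, cycle_time_ps=250.0)
cpu_time_b = cpu_time_ps(INSTRUCTION_COUNT, cpi=1.2, cycle_time_ps=500.0)

# How many times faster A is than B.
speedup_a_over_b = cpu_time_b / cpu_time_a

print(f"time per instruction on A: {cpu_time_a / INSTRUCTION_COUNT:.0f} ps")
print(f"time per instruction on B: {cpu_time_b / INSTRUCTION_COUNT:.0f} ps")
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```

B has the lower CPI, but A's shorter cycle time more than compensates, which is why neither CPI nor clock rate alone determines performance.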


CPI in More Detail

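• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

Here the fraction Instruction Count_i / Instruction Count is the relative frequency of class i.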


CPI Example

• Alternative compiled code sequences using instructions in classes A, B, C (IC = Instruction Count)

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

• Sequence 1: IC = 5
– Clock Cycles = 2×1 + 1×2 + 2×3 = 10
– Avg. CPI = 10/5 = 2.0

• Sequence 2: IC = 6
– Clock Cycles = 4×1 + 1×2 + 1×3 = 9
– Avg. CPI = 9/6 = 1.5
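The weighted-average computation for the two sequences can be written as a small sketch. The dictionary layout and function names are illustrative choices, not part of the original slides.

```python
# CPI for each instruction class, from the table in the example.
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(instruction_counts: dict) -> int:
    """Sum of CPI_i x Instruction Count_i over all classes."""
    return sum(CPI_PER_CLASS[cls] * count for cls, count in instruction_counts.items())


def average_cpi(instruction_counts: dict) -> float:
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(f"{name}: IC = {sum(seq.values())}, "
          f"cycles = {clock_cycles(seq)}, avg CPI = {average_cpi(seq):.1f}")
```

Sequence 2 executes more instructions but fewer clock cycles, so instruction count alone is a poor predictor of which sequence is faster.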


Performance Summary

• Performance depends on
– Algorithm
– Programming language
– Compiler
– Instruction set architecture (ISA)
– Core organization (also called microarchitecture)
– Fabrication technology

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$


Determinants of CPU Performance

CPU time = Instruction_count x CPI x clock_cycle

                        Instruction_count   CPI   clock_cycle
Algorithm                       X            X
Programming language            X            X
Compiler                        X            X
ISA                             X            X         X
Core organization                            X         X
Technology                                             X


The Power Wall and Transition from Uniprocessor to Multiprocessors


Power Trends

• In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

[Figure annotations: power ×30, voltage 5V → 1V, frequency ×1000]


Uniprocessor Performance

Constrained by power wall (as well as instruction-level parallelism, memory latency)


Single-core vs. Multi-core

Configuration                    Performance   Power
Single-core (baseline)           1.00x         1.00x
Single-core, raise clock 20%     1.13x         1.73x
Single-core, lower clock 20%     0.87x         0.51x
Dual-core                        1.73x         1.02x

More computation power/watt

Source: Intel
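The power figures in this comparison are consistent with the CMOS power equation from the Power Trends slide, under two assumptions that the slides do not state: supply voltage is scaled in proportion to clock frequency, and the dual-core configuration runs both cores at the lowered clock. The sketch below is only a plausibility check under those assumptions.

```python
def relative_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power = capacitive load x voltage^2 x frequency, relative to a baseline of 1.0."""
    return capacitive_load * voltage ** 2 * frequency


# Assumption: voltage scales linearly with frequency, so power scales as f^3.
raised_clock_power = relative_power(1.0, voltage=1.2, frequency=1.2)
lowered_clock_power = relative_power(1.0, voltage=0.8, frequency=0.8)

# Assumption: the dual-core chip is two cores, each at the lowered clock.
dual_core_power = 2 * lowered_clock_power

print(f"raise clock 20%: {raised_clock_power:.2f}x power")
print(f"lower clock 20%: {lowered_clock_power:.2f}x power")
print(f"dual-core:       {dual_core_power:.2f}x power")
```

Under these assumptions two slower cores draw about the same power as one baseline core, which is the argument for moving to multicore designs.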


Multiprocessors

• Multicore microprocessors
– More than one processor per chip

• Requires explicitly parallel programming
– Compare with instruction level parallelism
  • Hardware executes multiple instructions at once
  • Hidden from the programmer
– Hard to do
  • Programming for performance
  • Load balancing
  • Optimizing communication and synchronization


Summary

• Abstraction is fundamental to understanding today's computer systems
– In both hardware and software
– ISA is an interface between SW and HW

• Performance measure: execution time

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Moore's law has underpinned microprocessor technology development and the IT industry

• Power limit forces a dramatic change in microprocessor design
– Moving on to multicore systems
