
ECE/CS 752: Advanced Computer Architecture I

Fall 2017

© Prof. Mikko Lipasti

Lecture notes based in part on slides created by John Shen, Ilhyun Kim, Mark Hill, David Wood, Guri Sohi, Jim Smith, and others

Computer Architecture

• Rely on abstraction layers to manage complexity
  – Von Neumann Machine

Mikko Lipasti -- University of Wisconsin 2

[Figure: abstraction layers spanning Technology, Computer Architecture, and Applications: Quantum Physics; Transistors & Devices; Logic Gates & Memory; Von Neumann Machine; x86 Machine Primitives; Visual C++; Windows 7; Firefox, MS Excel]

Technology

• Technology advances at an astounding rate
  – 19th century: attempts to build mechanical computers
  – Early 20th century: mechanical counting systems (cash registers, etc.)
  – Mid 20th century: vacuum tubes as switches
  – Since then: transistors, integrated circuits

• 1965: Moore’s law [Gordon Moore]
  – Predicted regular doubling of IC capacity (every year in the 1965 paper, revised to every two years in 1975; often quoted as every 18 months)
  – Has held for five decades, but appears to be slowing down

• Drives functionality, performance, cost
  – Exponential improvement for 50+ years
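To make the exponential concrete, here is a minimal Python sketch that computes the growth factor implied by a fixed doubling period. The function name `growth_factor` and the sample periods are illustrative assumptions, and the model treats doubling as perfectly regular, which real process scaling never is.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` if capacity doubles every
    `doubling_months` months (an idealized, perfectly regular exponential)."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


if __name__ == "__main__":
    # Compare the doubling periods commonly quoted for Moore's law over one decade.
    for months in (12, 18, 24):
        print(f"doubling every {months} months: {growth_factor(10, months):,.0f}x in 10 years")
```

The choice of period matters enormously: over one decade, a 12-month doubling gives 1,024x while a 24-month doubling gives only 32x.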

[Figures: IC Capacity 1965-1995; IC Capacity 1965-2010 (linear y-axis)]

Semiconductor History

Date   Event                              Comments
1947   1st transistor                     Bell Labs
1958   1st IC                             Jack Kilby (MSEE ’50) @TI; winner of 2000 Nobel prize
1971   1st microprocessor (Intel 4004)    Intel (calculator market); 2300 transistors
1978   Intel 8086                         29K transistors
1989   Intel 80486                        1M transistors
1995   Intel Pentium Pro                  5.5M transistors
2006   Intel Montecito                    1.7B transistors
2015   Oracle SPARC M7                    10B+ transistors

Computer Architecture

• Instruction Set Architecture (IBM 360)
  – "… the attributes of a [computing] system as seen by the programmer. I.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." -- Amdahl, Blaauw, & Brooks, 1964

• Machine Organization (microarchitecture)
  – ALUs, Buses, Caches, Memories, etc.

• Machine Implementation (realization)
  – Gates, cells, transistors, wires

752 In Context

• Prior courses
  – 352: gates up to multiplexors and adders
  – 354: high-level language down to the machine language interface or instruction set architecture (ISA)
  – 552: implement logic that provides the ISA interface
  – CS 537: provides OS background (co-req. OK)

• This course (752) covers advanced techniques
  – Modern processors that exploit instruction-level parallelism (ILP)
  – Modern memory systems that exploit memory-level parallelism (MLP)

• Additional courses
  – ECE 757 covers parallel processing and multiprocessors
  – ECE 755 covers VLSI design

Why Take 752?

• To become a computer designer
  – Alumni of this class helped design your computer

• To learn what is under the hood of a computer
  – Innate curiosity
  – To better understand when things break
  – To write better code/applications
  – To write better system software (O/S, compiler, etc.)

• Because it is intellectually fascinating!
  – What is the most complex man-made single device?

Computer Architecture

• Exercise in engineering tradeoff analysis
  – Find the fastest/cheapest/most power-efficient/etc. solution
  – Optimization problem with 100s of variables

• All the variables are changing
  – At non-uniform rates
  – With inflection points
  – Only one guarantee: today’s right answer will be wrong tomorrow

• Two high-level effects:
  – Technology push
  – Application pull

Technology Push

• What do these two intervals have in common?
  – 1776-1999 (224 years)
  – 2000-2001 (2 years)

• Answer: Equal progress in processor speed!

• The power of exponential growth!
• Driven by Moore’s Law
  – Devices per chip double every 18-24 months
• Computer architects turn the additional resources into
  – Speed
  – Power savings
  – Functionality
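Why can two years match two centuries? Under steady doubling, the gain made in the latest doubling period exceeds all earlier gains combined. The minimal Python sketch below shows this in an idealized model where speed starts at 1 (hypothetical units) and doubles once per period; the real 1776-1999 history is of course not a literal sequence of doublings, and `progress_split` is a hypothetical helper.

```python
def progress_split(periods: int) -> tuple[int, int]:
    """Idealized model: speed starts at 1 and doubles once per period.
    Returns (gain accumulated before the final period,
             gain achieved during the final period)."""
    if periods < 1:
        raise ValueError("need at least one doubling period")
    speed_before_last = 2 ** (periods - 1)
    speed_after_last = 2 ** periods
    gain_before = speed_before_last - 1                 # everything up to the last period
    gain_during = speed_after_last - speed_before_last  # the last period alone
    return gain_before, gain_during


if __name__ == "__main__":
    before, during = progress_split(10)
    print(f"gain before last period: {before}, during last period: {during}")
```

For any number of periods, the final period's gain is exactly one unit larger than everything that came before it (511 versus 512 after ten doublings).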

Performance Growth

Unmatched by any other industry! [John Crawford, Intel]

• Doubling every 18 months (1982-1996): 800x
  – Cars travel at 44,000 mph and get 16,000 mpg
  – Air travel: LA to NY in 22 seconds (MACH 800)
  – Wheat yield: 80,000 bushels per acre

• Doubling every 24 months (1971-1996): 9,000x
  – Cars travel at 600,000 mph, get 150,000 mpg
  – Air travel: LA to NY in 2 seconds (MACH 9,000)
  – Wheat yield: 900,000 bushels per acre
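The two headline factors can be sanity-checked by working backward from total growth to the doubling period it implies. This minimal Python sketch assumes each interval's length is simply end year minus start year; the helper name `implied_doubling_months` is hypothetical.

```python
import math


def implied_doubling_months(factor: float, years: float) -> float:
    """Months per doubling needed to reach `factor` total growth in `years`."""
    return years * 12 / math.log2(factor)


if __name__ == "__main__":
    # Interval lengths taken as end year minus start year (an assumption).
    print(f"800x over 1982-1996:   {implied_doubling_months(800, 1996 - 1982):.1f} months")
    print(f"9,000x over 1971-1996: {implied_doubling_months(9_000, 1996 - 1971):.1f} months")
```

This gives roughly 17 and 23 months respectively, in line with the quoted 18- and 24-month doubling rates.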

Technology Push

• Technology advances at varying rates
  – E.g. DRAM capacity increases at 60%/year
  – But DRAM speed improves only 10%/year
  – Creates a growing gap with processor frequency!

• Inflection points
  – Crossover causes rapid change
  – E.g. enough devices for a multicore processor (2001)

• Current issues causing an “inflection point”
  – Power consumption
  – Reliability, variability
  – Packaging innovations
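Different compound rates diverge quickly. The minimal Python sketch below compounds the slide's DRAM rates (60%/year capacity, 10%/year speed). The slide gives no processor frequency rate, so `ASSUMED_CPU_FREQ_RATE` is a hypothetical value chosen only to illustrate how a processor-memory speed gap opens up.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Growth factor after `years` at a fixed annual rate (0.60 means 60%/year)."""
    return (1.0 + annual_rate) ** years


DRAM_CAPACITY_RATE = 0.60     # from the slide
DRAM_SPEED_RATE = 0.10        # from the slide
ASSUMED_CPU_FREQ_RATE = 0.40  # hypothetical, not from the slide


def cpu_dram_speed_gap(years: int, cpu_rate: float = ASSUMED_CPU_FREQ_RATE) -> float:
    """How many times more processor frequency has grown than DRAM speed."""
    return compound_growth(cpu_rate, years) / compound_growth(DRAM_SPEED_RATE, years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        cap = compound_growth(DRAM_CAPACITY_RATE, years)
        spd = compound_growth(DRAM_SPEED_RATE, years)
        print(f"{years:2d} yr: DRAM capacity x{cap:7.1f}, DRAM speed x{spd:5.2f}, "
              f"CPU/DRAM speed gap x{cpu_dram_speed_gap(years):5.2f}")
```

After ten years, capacity has grown by roughly 110x while speed has grown by only about 2.6x.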

Application Pull

• Corollary to Moore’s Law: cost halves every two years
  "In a decade you can buy a computer for less than its sales tax today." – Jim Gray

• Computers have become cost-effective for
  – National security: weapons design
  – Enterprise computing: banking
  – Departmental computing: computer-aided design
  – Personal computers: spreadsheets, email, web
  – Mobile computing: GPS, location-aware, ubiquitous
  – Wearable computing: activity/health monitoring, etc.
  – Voice web search
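Jim Gray's remark follows directly from the halving rate: five halvings in a decade leave about 3% of today's price. The minimal Python sketch below checks this. The 5% `ASSUMED_SALES_TAX` is a hypothetical rate used only for illustration.

```python
def remaining_cost_fraction(years: float, halving_years: float = 2.0) -> float:
    """Fraction of today's price left after `years` if cost halves every `halving_years`."""
    return 0.5 ** (years / halving_years)


ASSUMED_SALES_TAX = 0.05  # hypothetical 5% rate, for illustration only

if __name__ == "__main__":
    fraction = remaining_cost_fraction(10)
    print(f"price after 10 years: {fraction:.1%} of today's")
    print("below today's sales tax?", fraction < ASSUMED_SALES_TAX)
```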

Application Pull

• What about the future?
  – For many modeling applications (e.g. weather), increasing the resolution sharply increases computational demand
  – Machine learning: larger models seem to keep delivering better accuracy [Canziani et al., 2016]

• Must dream up applications that are not cost-effective today
  – Realism in games and virtual worlds (graphics, physics, AI)
  – Virtual reality (HoloLens), telepresence
  – Big data analytics, large-scale optimization
  – Personal assistants (AI/ML)
  – Image & video processing, analysis, contextual semantics
  – ???

• This is your job!

Trends

• Moore’s Law for device integration [source: Intel]
• Chip power consumption
• Single-thread performance trend

Dynamic Power

• Static CMOS: current flows when active
  – Combinational logic evaluates new inputs
  – Flip-flop or latch captures new value (clock edge)

• Terms
  – C: capacitance of circuit
    • wire length, number and size of transistors
  – V: supply voltage
  – A: activity factor
  – f: frequency

• Future: fundamentally power-constrained

$$P_{dyn} \approx k \sum_{i \in \text{units}} C_i V_i^2 A_i f_i$$

where k is a proportionality constant.
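The sum can be evaluated directly. The minimal Python sketch below takes k = 1 and uses made-up unit parameters, both assumptions. It also shows why lowering voltage is so effective: if frequency can be scaled down in proportion to voltage (a common first-order approximation), power falls with the cube of the scaling factor.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Unit:
    """One circuit unit i in the dynamic power sum (illustrative values only)."""
    capacitance: float  # C_i: switched capacitance, farads
    voltage: float      # V_i: supply voltage, volts
    activity: float     # A_i: activity factor, fraction of cycles switching
    frequency: float    # f_i: clock frequency, hertz


def dynamic_power(units: list[Unit], k: float = 1.0) -> float:
    """P_dyn ~= k * sum_i C_i * V_i^2 * A_i * f_i (watts when k = 1)."""
    return k * sum(u.capacitance * u.voltage ** 2 * u.activity * u.frequency for u in units)


if __name__ == "__main__":
    # Hypothetical core: two units with made-up parameters.
    core = [Unit(1e-9, 1.0, 0.1, 2e9), Unit(2e-9, 1.0, 0.05, 2e9)]
    base = dynamic_power(core)
    # Scale V and f together by 0.8 (assumes frequency can track voltage).
    scaled = [replace(u, voltage=u.voltage * 0.8, frequency=u.frequency * 0.8) for u in core]
    print(f"base {base:.2f} W, scaled {dynamic_power(scaled):.3f} W, "
          f"ratio {dynamic_power(scaled) / base:.3f}")
```

Scaling both V and f by 0.8 leaves 0.8³ ≈ 0.51 of the original dynamic power, while frequency drops by only 20%.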

Multicore Mania

• First, servers
  – IBM Power4, 2001

• Then desktops
  – AMD Athlon 64 X2, 2005

• Then laptops
  – Intel Core Duo, 2006

• Cellphones
  – Dual/quad/octa-core, ARM big.LITTLE

Why Multicore

                    Single Core   Dual Core   Quad Core
Core area           A             ~A/2        ~A/4
Core power          W             ~W/2        ~W/4
Chip power          W + O         W + O’      W + O’’
Core performance    P             0.9P        0.8P
Chip performance    P             1.8P        3.2P

[Figure: single-, dual-, and quad-core chip layouts]

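Amdahl’s Law

f – fraction that can run in parallel
1-f – fraction that must run serially

[Figure: execution time vs. # CPUs; the serial fraction 1-f runs on 1 CPU, the parallel fraction f is spread across n CPUs]

$$\text{Speedup} = \frac{1}{(1-f) + \frac{f}{n}}$$

$$\lim_{n \to \infty} \frac{1}{(1-f) + \frac{f}{n}} = \frac{1}{1-f}$$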
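Both formulas translate directly into code. This is a minimal Python sketch; the function names are hypothetical, and n is allowed to be `math.inf` so that the limit can be checked numerically.

```python
import math


def amdahl_speedup(f: float, n: float) -> float:
    """Speedup = 1 / ((1 - f) + f / n) for parallel fraction f on n CPUs."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Limit of the speedup as n -> infinity: 1 / (1 - f)."""
    return math.inf if f == 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        print(f"f={f}: n=16 -> {amdahl_speedup(f, 16):.2f}x, limit {amdahl_limit(f):.0f}x")
```

Even with 90% of the work parallel, no number of CPUs can deliver more than a 10x speedup.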

Fixed Chip Power Budget

• Amdahl’s Law
  – Ignores (power) cost of n cores

• Revised Amdahl’s Law
  – More cores means each core is slower
  – Parallel speedup < n
  – Serial portion (1-f) takes longer
  – Also, interconnect and scaling overhead

[Figure: execution time vs. # CPUs (1 to n), showing the serial portion 1-f and the parallel portion f]

Fixed Power Scaling

• Fixed power budget forces slow cores
• Serial code quickly dominates

[Figure: chip performance vs. # of cores/chip (1 to 128, log-log axes) for 99.9%, 99%, 90%, and 80% parallel code]
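One way to make the revised Amdahl's Law concrete is to tie per-core speed to a fixed chip power budget. The minimal Python sketch below uses a hypothetical model, not necessarily the one behind the plot. It splits the budget evenly across n cores and assumes per-core performance scales as the cube root of per-core power, which follows from P ∝ V²f when V is scaled in proportion to f. Interconnect and other scaling overheads are ignored.

```python
def fixed_power_speedup(f: float, n: int, alpha: float = 1.0 / 3.0) -> float:
    """Chip speedup over a single full-power core under a HYPOTHETICAL model:
    the fixed chip power budget is split evenly across n cores, and each core's
    performance scales as (its share of power) ** alpha. alpha = 1/3 assumes
    P ~ V^2 * f with V scaled in proportion to f, so performance ~ P ** (1/3).
    Interconnect and other scaling overheads are ignored."""
    core_perf = (1.0 / n) ** alpha         # each core is slower than one big core
    serial_time = (1.0 - f) / core_perf    # serial portion runs on one slow core
    parallel_time = f / (n * core_perf)    # parallel portion spread over n slow cores
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    cores = [2 ** i for i in range(8)]  # 1 .. 128, matching the plot's x-axis range
    for f in (0.999, 0.99, 0.9, 0.8):
        row = ", ".join(f"{fixed_power_speedup(f, n):.1f}" for n in cores)
        print(f"{f:.1%} parallel: {row}")
```

Under this model, the 80%-parallel case stops improving after a handful of cores and then declines, because the slowed serial portion soon dominates.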

Focus of this Course

• How to make the serial portion fast
  – A fast serial portion also helps the parallel portion!

• State-of-the-art processor design
  – Pipelining review (online lectures)
  – Superscalar, out-of-order processors
  – Branch prediction

• Advanced memory systems
  – Cache review (online lecture)

• Multicore and multithreaded processors

Instruction Set Processing

The ART and Science of Instruction-Set Processor Design [Gerrit Blaauw & Fred Brooks, 1981]

ARCHITECTURE (ISA): programmer/compiler view
  – Functional appearance to user/system programmer
  – Opcodes, addressing modes, architected registers, IEEE floating point

IMPLEMENTATION (μarchitecture): processor designer view
  – Logical structure or organization that performs the architecture
  – Pipelining, functional units, caches, physical registers

REALIZATION (Chip): chip/system designer view
  – Physical structure that embodies the implementation
  – Gates, cells, transistors, wires

Iron Law

$$\frac{1}{\text{Processor Performance}} = \frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{instruction count}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}$$

Architecture (compiler designer) --> Implementation (processor designer) --> Realization (chip designer)

Iron Law

• Instructions/Program
  – Instructions executed, not static code size
  – Determined by algorithm, compiler, ISA

• Cycles/Instruction
  – Determined by ISA and CPU organization
  – Overlap among instructions reduces this term
  – Constrained by energy per instruction (EPI)

• Time/Cycle
  – Determined by technology, organization, clever circuit design
  – Constrained by power limitations
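The three terms multiply to give execution time. This is a minimal Python sketch; the workload numbers (1 billion dynamic instructions, CPI of 1.5, 2 GHz clock) are hypothetical.

```python
def execution_time_s(instructions: float, cpi: float, clock_hz: float) -> float:
    """Iron law: Time/Program = Instructions/Program x Cycles/Instruction x Time/Cycle."""
    cycle_time_s = 1.0 / clock_hz  # Time/Cycle
    return instructions * cpi * cycle_time_s


if __name__ == "__main__":
    # Hypothetical workload: 1 billion dynamic instructions, CPI of 1.5, 2 GHz clock.
    print(f"{execution_time_s(1e9, 1.5, 2e9):.3f} s")
```

Note that `instructions` is the dynamic count (instructions actually executed), not the static size of the binary.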

Our Goal

• Minimize time, which is the product, NOT the isolated terms

• A common error is to miss terms while devising optimizations
  – E.g. an ISA change to decrease instruction count
  – BUT it leads to a CPU organization that makes the clock slower
  – Or: a reduced CPI comes at the cost of a large increase in EPI

• Bottom line: the terms are interrelated
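The pitfall can be shown by comparing two designs on the full product. In this minimal Python sketch, all numbers are hypothetical. An ISA change removes 20% of the instructions, but the resulting organization stretches the cycle time by 30%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One point in the design space; all numbers below are hypothetical."""
    name: str
    instructions: float   # dynamic instruction count (Instructions/Program)
    cpi: float            # Cycles/Instruction
    cycle_time_ns: float  # Time/Cycle

    def time_ns(self) -> float:
        return self.instructions * self.cpi * self.cycle_time_ns


baseline = Design("baseline", instructions=1.0e9, cpi=1.0, cycle_time_ns=0.50)
# ISA change: 20% fewer instructions, but the organization stretches the cycle by 30%.
isa_change = Design("ISA change", instructions=0.8e9, cpi=1.0, cycle_time_ns=0.65)

if __name__ == "__main__":
    for d in (baseline, isa_change):
        print(f"{d.name:>10}: {d.time_ns() / 1e9:.2f} s")
    print("ISA change is a net win?", isa_change.time_ns() < baseline.time_ns())
```

Judged on instruction count alone, the change looks like a 20% improvement. Judged on the product, it is about 4% slower.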

Textbooks

• Recommended course textbook:
  – John Paul Shen and Mikko H. Lipasti, Modern Processor Design: Fundamentals of Superscalar Processors, First edition, McGraw-Hill.

• Recommended textbook:
  – Mark Hill, Norm Jouppi, and Guri Sohi, Readings in Computer Architecture, Morgan Kaufmann, 1999.

Expected Background

• ECE/CS 552 or equivalent
  – Design of a simple uniprocessor
  – Simple instruction sets
  – Organization
  – Datapath design
  – Hardwired/microprogrammed control
  – Simple pipelining
  – Basic caches

• High-level programming experience
  – C/UNIX skills to modify simulators

Course Context

• Assume canonical RISC ISA
  – Register-register ALU ops
  – Load from memory (cache)
  – Store to memory
  – Branches, jumps, calls, returns

• Modern CISC (x86) processors
  – Translate to equivalent primitives

• Later: how the translation is done

About This Course

• Readings and Paper Reviews
  – Will be posted on the website (one list for each midterm)
  – Make sure you keep up with these! Not necessarily discussed in lecture.

• Lecture
  – Attendance required
  – Some lectures will be delivered online
  – Overscheduled in first half; many lectures will be canceled in the second half

• Homework
  – Homework assigned but not graded
  – Learning tool to help prepare for the midterms

About This Course

• Pop Quizzes
  – Not announced ahead of time
  – One quiz will be dropped from the final grade to accommodate occasional absence
  – Make sure you are ahead on readings!

• Exams
  – Midterm 1: Wed 10/25 in class
  – Midterm 2: Wed 12/20 10:05am-12:05pm (final exam time slot)
  – Keep up with the reading list!

About This Course

• Course Project
  – Research project
    • Replicate results from a paper
    • Or attempt something novel

• Final project includes a written report and an oral presentation
  – Proposal due 10/30
  – Progress report due 11/22
  – Presentations during class time 12/11, 12/13
  – Final reports due 12/13

About This Course

• Grading
  – Quizzes & paper reviews   20%
  – Midterm 1                 25%
  – Midterm 2                 25%
  – Project                   30%

• Web Page (check regularly)
  – http://ece752.ece.wisc.edu

About This Course

• Office Hours
  – Prof. Lipasti: EH 3621, TBD
  – Or, catch me after class

• Communication channels
  – E-mail to instructor, class e-mail list
    [email protected]
  – Web page
  – Office hours

About This Course

• Other Resources
  – Computer Architecture Colloquium: Tuesday 4-5PM, 1221 CSS
  – Computer Engineering Seminar: Friday 12-1PM, EH4610
  – Architecture mailing list: http://lists.cs.wisc.edu/mailman/listinfo/architecture
  – WWW Computer Architecture Page: http://pages.cs.wisc.edu/~arch/www/

About This Course

• Lecture schedule:
  – MWF 11:00-12:15
  – Approx. 1 of 3 lectures canceled, mostly in the second half of the semester
  – This lets us get ahead on topics early, enabling a broader range of project work

Tentative Schedule

Week 0        Introduction, Technology challenges
Week 1        Superscalar Organization
Week 2        Instruction Flow
Week 3        Register Data Flow
Week 4        Memory Data Flow
Week 5        Advanced Register Data Flow
Week 6        Case Studies
Week 7        Midterm 1 in-class on 10/25, Case Studies
Week 8        Advanced Memory Hierarchy
Week 9        Multiple threads, Case studies
Week 10       Advanced topics
Week 11       Lecture canceled, project work
Week 12       Lecture canceled, project work
Week 13       Lecture canceled, project work
Week 14       Project talks, Course Evaluation, Final reports
Finals Week   Midterm 2, Wednesday 12/20, 10:05am-12:05pm

Wrapping Up

• Next lecture on technology challenges
  – Sets the stage for the whole course

• View review lecture online
  – Pipelining Review, 2 lectures with audio narration
  – http://ece752.ece.wisc.edu

• Reading list and review schedule on web page
• Be prepared for discussion/pop quiz

Final thought: Talking about music is like dancing about architecture.
(Thelonious Monk)
