Lecture 1: Course Introduction, Technology Trends, Performance Professor Alvin R. Lebeck Computer Science 220 Fall 2001



CPS 220 © Alvin R. Lebeck 2001

Administrative

• Office Hours

Office: D304 LSRC

Hours: Mon 10:00-11:00 Thurs 2:00-3:00 or by appointment (email)

email: [email protected]

Phone: 660-6551

• Teaching Assistant: Fareed Zaffar

Office: D125 LSRC

Hours: Tuesday 10:00-11:00, Wednesday 1:00-2:00

email: [email protected]

Phone: 660-6576


Administrative (Grading)

• 30% Homeworks

– 6 Homeworks

– 5 points per day late, for first 10 days

– Always do the homework (better late than never)

• 30% Examinations (Midterm + Final)

• 30% Research Project (work in pairs)

• 10% Class Participation

• This course requires hard work.


Administrative (Continued)

• Midterm Exam: In class (75 min) Closed book

• Final Exam: (3 hours) closed book

• This is a “Quals” Course.

– Quals pass based on Midterm and Final exams only


Administrative (Continued)

• Course Web Page – http://www.cs.duke.edu/courses/fall01/cps220

– Lectures posted there after class (pdf)

– Homework posted there

• Course News Group – duke.cs.cps220

– Use it to 1) read announcements/comments on class or homework, 2) ask questions (help), 3) communicate with each other

• Need Duke CS account

– Duke ID, ACPUB account name (see HW #0)


SPIDER: Systems Seminar

• Systems & Architecture Seminar

– Wednesdays 3:45-5:00 in D344

– duke.cs.os-research (spider newsgroup)

• Presentations on current work

– Practice talks for conferences

– Discussion on recent papers

– Your own research

• Why should you go?

– If you want to work in Systems/Architecture…

– Good time to practice public speaking in front of a friendly crowd

– Learn about current topics


Assignment

• Homework #0 (Background, due Thursday)

• Read Chapters 1 & 2


CPS 220 Course Focus

Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century

[Figure: Computer Architecture (Instruction Set Design, Organization, Hardware) at the center, surrounded by related areas: Technology, Programming Languages, Operating Systems, History, Applications, Interface Design (ISA), Measurement & Evaluation, Parallelism, Power]


Related Courses

Prerequisites

• CPS 104: Basic Machine Organization

• CPS 110: Basic Operating System Functions

• This course: focus on why, analysis, evaluation

– Cost/performance

– Power budget

Follow-on Courses

• CPS 221: Advanced Computer Architecture II

– Parallel computer architecture


Computer Architecture Is …

the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.

Amdahl, Blaauw, and Brooks, 1964



Topic Coverage

Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., 1995.

• Fundamentals of Computer Architecture (Chapter 1)

• Instruction Set Architecture (Chapter 2, Appendix C & D)

• Pipelining (Chapter 3)

• Advanced Pipelining and ILP (Chapter 4)

• Memory Hierarchy (Chapter 5)

• Input/Output and Storage (Chapter 6)

• Networks and Interconnection Technology (Chapter 7)

• Multiprocessors (Chapter 8)

• Vectors (Appendix)

• New Architectures/trends (papers)

• Power (papers)


Computer Architecture Topics

[Figure: overview of course topics, grouped by area]

• Instruction Set Architecture

• Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation

• Memory Hierarchy: L1 Cache, L2 Cache, DRAM

• Input/Output and Storage: Disks, WORM, Tape, RAID

• Other labels: Addressing, Protection, Exception Handling; Coherence, Bandwidth, Latency; Interleaving, Bus protocols; Emerging Technologies; VLSI


Computer Architecture Topics (CPS 221)

[Figure: processor (P) and memory (M) nodes attached through switches (S) to an interconnection network]

• Multiprocessors: Shared Memory, Message Passing, Data Parallel

• Networks and Interconnections: Topologies, Routing, Bandwidth, Latency, Reliability

• Other labels: Processor-Memory-Switch, Network Interfaces


Computer Engineering Methodology

[Figure: design cycle, built up over four slides]

• Evaluate Existing Systems for Bottlenecks (inputs: Technology Trends, Benchmarks)

• Simulate New Designs and Organizations (input: Workloads)

• Implement Next Generation System (input: Implementation Complexity)


Context for Designing New Architectures

• Application Area

– Special Purpose (e.g., DSP) / General Purpose

– Scientific (FP intensive) / Commercial (Mainframe)

– Portable (Power matters)

• Level of Software Compatibility

– Object Code/Binary Compatible (cost HW vs. SW; IBM S/360)

– Assembly Language (dream to be different from binary)

– Programming Language; Why not?


Context for Designing New Architectures

• OS Requirements for General Purpose Apps

– Size of Address Space

– Memory Management/Protection

– Context Switch

– Interrupts and Traps

– Communication

• Standards: Innovation vs. Competition

– IEEE 754 Floating Point

– I/O Bus

– Networks

– Operating Systems / Programming Languages ...


Technology Trends: Microprocessor Capacity

[Figure: microprocessor transistor counts on a log scale from 100 to 1,000,000,000; series: Intel, Digital; annotation: “Graduation Window”]

CMOS improvements:

• Die size: 2X every 3 yrs

• Line width: halve / 7 yrs

Transistor counts:

• Pentium Pro: 5.5 million

• Sparc Ultra: 5.2 million

• PowerPC 620: 6.9 million

• Alpha 21164: 9.3 million

• Alpha 21264: 15 million

• Pentium III: 28 million

• Pentium 4: 42 million

• Alpha 21364: 100 million

• Alpha 21464: 250 million


DRAM Capacity (single chip)

[Figure: DRAM size per chip by year, 1980 to 2002, log scale]

Year   Size     Cycle time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1996   64 Mb    104 ns
1998   256 Mb
2002   1 Gb


Technology Trends (Summary)

        Capacity         Speed
Logic   2x in 3 years    2x in 3 years
DRAM    4x in 3 years    1.4x in 10 years
Disk    2x in 3 years    1.4x in 10 years
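The rates in this summary use different time spans, so they are easier to compare once converted to a per-year multiplier. The sketch below does that conversion; the helper name `annual_rate` and the `TRENDS` dictionary are illustrative and not part of the lecture.

```python
def annual_rate(factor: float, years: float) -> float:
    """Convert 'factor x in `years` years' into an equivalent per-year multiplier."""
    return factor ** (1.0 / years)


# Rows of the summary table as ((capacity factor, years), (speed factor, years)).
TRENDS = {
    "Logic": ((2.0, 3.0), (2.0, 3.0)),
    "DRAM": ((4.0, 3.0), (1.4, 10.0)),
    "Disk": ((2.0, 3.0), (1.4, 10.0)),
}

if __name__ == "__main__":
    for name, (capacity, speed) in TRENDS.items():
        print(
            f"{name}: capacity {annual_rate(*capacity):.2f}x/yr, "
            f"speed {annual_rate(*speed):.3f}x/yr"
        )
```

Under this conversion, DRAM capacity grows by roughly 59% per year while DRAM speed improves by only about 3% per year, which is the gap the table is pointing at.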


Processor Performance

[Figure: processor performance (0 to 300) by year, 1987 to 1995; data points: Sun-4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC AXP 3000, IBM Power 2/590, DEC 21064a, Sun UltraSparc; trend lines: 1.35X/yr and 1.54X/yr]


Alpha SPECint and SPECfp

[Figure: Alpha performance (Specmark, 0 to 700) by year, 1995 to 2004; series: Integer, Floating Point, 1.54x/yr trend]


Chip Area Reachable in One Clock Cycle

[Figure: fraction of chip reached in one clock cycle (0 to 1.2) versus feature size in nanometers (250, 180, 130, 100, 70, 50, 35); series: f16, f8, fSIA]


Power Density

[Figure: power density in W/cm^2 (log scale, 1 to 1000) versus feature size in microns (1.5, 1, 0.8, 0.6, 0.35, 0.25, 0.18, 0.13, 0.1); labels: Processor, Hot Plate, Laser diode]


Processor Perspective

• Putting performance growth in perspective:

            Pentium-III       Cray YMP
Type        Personal Comp.    Supercomputer
Year        1998              1988
MIPS        > 400 MIPS        < 50 MIPS
Linpack     140 MFLOPS        160 MFLOPS
Cost        $3,000            $1M ($1.6M in 1994$)
Clock       400 MHz           167 MHz
Cache       512 KB            0.25 KB
Memory      128 MB            256 MB

• 1988 supercomputer in 1998 personal computer!


Measurement and Evaluation

Architecture is an iterative process:

• Searching the space of possible designs

• At all levels of computer systems

[Figure: iterative loop between Design and Analysis; Creativity generates ideas, and Cost/Performance Analysis sorts them into Good Ideas, Mediocre Ideas, and Bad Ideas]


Measurement Tools

• How do I evaluate an idea?

• Performance, Cost, Die Area, Power Estimation

• Benchmarks, Traces, Mixes

• Simulation (many levels)

– ISA, RT, Gate, Circuit

• Queuing Theory

• Rules of Thumb

• Fundamental Laws

• Question: Which is “better”, the Boeing 747 or the Concorde?


The Bottom Line: Performance (and Cost)

• Time to run the task (ExTime)

– Execution time, response time, latency

• Tasks per day, hour, week, sec, ns … (Performance)

– Throughput, bandwidth

Plane               Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747          610 mph    6.5 hours     470          286,700
BAC/Sud Concorde    1350 mph   3 hours       132          178,200


The Bottom Line: Performance (and Cost)

"X is n times faster than Y" means

$$n = \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)}$$

• Speed of Concorde vs. Boeing 747

• Throughput of Boeing 747 vs. Concorde
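The two comparisons can be worked out directly from the plane table on the previous slide. This is a minimal sketch; the dictionary layout and the function names `times_faster` and `throughput_pmph` are illustrative, and throughput is taken as passengers times speed (passenger-miles per hour), which reproduces the table's figures.

```python
# Figures from the plane table: speed, DC-to-Paris flight time, passengers.
PLANES = {
    "Boeing 747": {"speed_mph": 610, "hours": 6.5, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "hours": 3.0, "passengers": 132},
}


def times_faster(ex_time_x: float, ex_time_y: float) -> float:
    """Return n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return ex_time_y / ex_time_x


def throughput_pmph(plane: dict) -> int:
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["speed_mph"]


# Latency view: Concorde vs. 747 on the DC-to-Paris trip.
concorde_vs_747 = times_faster(PLANES["Concorde"]["hours"], PLANES["Boeing 747"]["hours"])

# Throughput view: 747 vs. Concorde.
b747_vs_concorde = throughput_pmph(PLANES["Boeing 747"]) / throughput_pmph(PLANES["Concorde"])

if __name__ == "__main__":
    print(f"Concorde is {concorde_vs_747:.2f} times faster (trip time)")
    print(f"747 has {b747_vs_concorde:.2f} times the throughput")
```

The Concorde wins on latency (about 2.2 times faster on trip time) while the 747 wins on throughput (about 1.6 times), so which plane is "better" depends on the metric.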


Performance Terminology

“X is n% faster than Y” means:

$$\frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} = 1 + \frac{n}{100}$$

$$n = \frac{100\,(\mathrm{Performance}(X) - \mathrm{Performance}(Y))}{\mathrm{Performance}(Y)}$$

Example: Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X?


Example

$$\frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{15}{10} = \frac{1.5}{1.0} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)}$$

$$n = \frac{100\,(1.5 - 1.0)}{1.0} = 50\%$$
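The same calculation can be written as a small function. This is a sketch only; the name `percent_faster` is illustrative, and it uses the fact that performance is the reciprocal of execution time, so the ratio of execution times stands in for the ratio of performances.

```python
def percent_faster(ex_time_x: float, ex_time_y: float) -> float:
    """Return n such that X is n% faster than Y.

    From ExTime(Y) / ExTime(X) = 1 + n / 100.
    """
    if ex_time_x <= 0 or ex_time_y <= 0:
        raise ValueError("execution times must be positive")
    return 100.0 * (ex_time_y / ex_time_x - 1.0)


if __name__ == "__main__":
    # The slide's example: Y takes 15 seconds, X takes 10 seconds.
    print(percent_faster(ex_time_x=10.0, ex_time_y=15.0))
```

Note that the result is not symmetric: with these numbers X is 50% faster than Y, but Y's execution time is only about 33% shorter than... rather, X's execution time is about 33% shorter than Y's, which is a different statement from the one the formula defines.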


Amdahl's Law

Speedup due to enhancement E:

$$\mathrm{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then:

ExTime(E) =

Speedup(E) =


Amdahl’s Law

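$$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]$$

$$\mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}$$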


Amdahl’s Law

• Floating point instructions improved to run 2X faster, but only 10% of actual instruction execution time is FP

$$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \mathrm{ExTime}_{old}$$

$$\mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$
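Amdahl's Law is easy to experiment with in code. The sketch below implements the overall-speedup formula and applies it to the floating point example; the function name `amdahl_speedup` and its parameter names are illustrative choices.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution time
    is accelerated by a factor of `speedup_enhanced` and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time relative to the old one (ExTime_new / ExTime_old).
    relative_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / relative_time


if __name__ == "__main__":
    # FP is 10% of execution time and becomes 2x faster.
    print(f"{amdahl_speedup(0.10, 2.0):.3f}")
    # Even an enormous FP speedup is bounded by 1 / (1 - 0.10).
    print(f"{amdahl_speedup(0.10, 1e9):.3f}")
```

The second call illustrates the limit implied by the formula: with only 10% of the time affected, the overall speedup can never exceed 1 / 0.9, or about 1.11, however fast the floating point unit becomes.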


Corollary: Make The Common Case Fast

• All instructions require an instruction fetch, only a fraction require a data fetch/store.

– Optimize instruction access over data access

• Programs exhibit locality

– Spatial Locality

– Temporal Locality

• Access to small memories is faster

– Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories.

Reg's → Cache → Memory → Disk / Tape


Occam's Toothbrush

• The simple case is usually the most frequent and the easiest to optimize!

• Do simple, fast things in hardware and be sure the rest can be handled correctly in software


Metrics of Performance

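[Figure: levels of a computer system, each paired with a typical performance metric]

Levels (top to bottom): Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins

Metrics:

• Answers per month

• Operations per second

• (millions) of Instructions per second: MIPS

• (millions) of (FP) operations per second: MFLOP/s

• Megabytes per second

• Cycles per second (clock rate)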


Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

[Table: which of Instr. Cnt, CPI, and Clock Rate (columns) is affected by each of Program, Compiler, Instr. Set, Organization, and Technology (rows)]
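The CPU time equation translates directly into code. The sketch below is illustrative only: the function name `cpu_time_seconds` is made up for this example, and the workload numbers (one billion instructions, CPI of 1.5, 400 MHz clock) are hypothetical values, not measurements from the lecture.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions/program x cycles/instruction x seconds/cycle.

    Seconds per cycle is the reciprocal of the clock rate.
    """
    if clock_rate_hz <= 0:
        raise ValueError("clock_rate_hz must be positive")
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, CPI 1.5, 400 MHz clock.
    base = cpu_time_seconds(1e9, 1.5, 400e6)
    # Halving CPI or doubling the clock rate has the same effect on CPU time.
    lower_cpi = cpu_time_seconds(1e9, 0.75, 400e6)
    faster_clock = cpu_time_seconds(1e9, 1.5, 800e6)
    print(base, lower_cpi, faster_clock)
```

Because the three factors multiply, an improvement in one only helps if it does not make another worse by the same amount, which is why instruction count, CPI, and clock rate have to be considered together.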


Marketing Metrics

$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Time} \times 10^6} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^6}$$

• Machines with different instruction sets ?

• Programs with different instruction mixes ?

– Dynamic frequency of instructions

• Uncorrelated with performance

$$\text{MFLOPS} = \frac{\text{FP Operations}}{\text{Time} \times 10^6}$$

• Machine dependent

• Often not where time is spent

Normalized:

add, sub, compare, mult    1
divide, sqrt               4
exp, sin, . . .            8
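The MIPS and MFLOPS definitions, and the normalized operation weights, can be sketched as follows. Function and constant names are illustrative, and the example workload (one billion instructions, CPI 1.5, 400 MHz) is hypothetical.

```python
def mips_from_time(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = Instruction Count / (Time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = Clock Rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def mflops(fp_operations: float, exec_time_s: float) -> float:
    """MFLOPS = FP Operations / (Time x 10^6)."""
    return fp_operations / (exec_time_s * 1e6)


# Normalized weights from the slide: cheap ops count 1, divide/sqrt 4, exp/sin 8.
NORMALIZED_WEIGHTS = {
    "add": 1, "sub": 1, "compare": 1, "mult": 1,
    "divide": 4, "sqrt": 4,
    "exp": 8, "sin": 8,
}


def normalized_fp_ops(op_counts: dict) -> int:
    """Weighted FP operation count using the normalized weights above."""
    return sum(NORMALIZED_WEIGHTS[op] * count for op, count in op_counts.items())


if __name__ == "__main__":
    # Hypothetical workload: 1e9 instructions, CPI 1.5, 400 MHz clock.
    exec_time = 1e9 * 1.5 / 400e6
    print(mips_from_time(1e9, exec_time), mips_from_clock(400e6, 1.5))
    print(normalized_fp_ops({"add": 100, "divide": 10, "sin": 5}))
```

The two MIPS forms agree because Time = Instruction Count x CPI / Clock Rate. Neither says anything about how much work each instruction does, which is the slide's objection to comparing machines with different instruction sets on MIPS.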


Cycles Per Instruction

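“Average Cycles Per Instruction”

$$\text{CPI} = \frac{\text{CPU time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$

$$\text{CPU time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$

“Instruction Frequency”

$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}}$$

Invest Resources where time is Spent!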


Organizational Trade-offs

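[Figure: levels of a computer system (Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins) annotated with the quantities being traded off: Instruction Mix, CPI, Cycle Time]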


Example: Calculating CPI

Typical Mix, Base Machine (Reg / Reg)

Op       Freq   Cycles   CPI_i   (% Time)
ALU      50%    1        .5      (33%)
Load     20%    2        .4      (27%)
Store    10%    2        .2      (13%)
Branch   20%    2        .4      (27%)
Total                    1.5
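The table can be reproduced by applying the weighted-sum definition of CPI to the instruction mix. This is a small sketch; `BASE_MIX`, `average_cpi`, and `time_shares` are illustrative names.

```python
# Base machine mix from the table: op -> (frequency, cycles per instruction).
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """CPI = sum over instruction classes of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_shares(mix: dict) -> dict:
    """Fraction of execution time spent in each class: (F_i x CPI_i) / CPI."""
    cpi = average_cpi(mix)
    return {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}


if __name__ == "__main__":
    print(f"CPI = {average_cpi(BASE_MIX):.2f}")
    for op, share in time_shares(BASE_MIX).items():
        print(f"{op}: {share:.0%} of time")
```

The time shares differ from the raw frequencies: ALU operations are half of all instructions but only a third of the execution time, which is the point of "invest resources where time is spent".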


Example

Base Machine (Reg / Reg)

Op       Freq   Cycles
ALU      50%    1
Load     20%    2
Store    10%    2
Branch   20%    2

Add register / memory operations to traditional RISC:

– One source operand in memory

– One source operand in register

– Cycle count of 2

Branch cycle count to increase to 3.

What fraction of the loads must be eliminated for this to pay off?
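One way to explore this question is to model it numerically. The sketch below rests on assumptions that the slide does not state: each eliminated load is absorbed into one ALU operation, which becomes a 2-cycle register/memory operation; the clock cycle time is unchanged; and the program is otherwise identical. The function name and constants are illustrative.

```python
BASE_CYCLES_PER_INSTR = 1.5  # CPI of the base machine from the previous slide


def cycles_per_base_instruction(fraction_of_loads_removed: float) -> float:
    """Total cycles on the new machine, normalized to the base instruction count.

    Assumption: every eliminated load merges with one ALU op into a
    2-cycle register/memory ALU op.
    """
    if not 0.0 <= fraction_of_loads_removed <= 1.0:
        raise ValueError("fraction_of_loads_removed must be between 0 and 1")
    x = 0.20 * fraction_of_loads_removed  # removed loads as a share of all base instructions
    alu = (0.50 - x) * 1      # remaining register/register ALU ops
    reg_mem_alu = x * 2       # new register/memory ALU ops
    load = (0.20 - x) * 2     # remaining loads
    store = 0.10 * 2
    branch = 0.20 * 3         # branches now take 3 cycles
    return alu + reg_mem_alu + load + store + branch


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        cycles = cycles_per_base_instruction(fraction)
        print(f"{fraction:.0%} of loads removed: {cycles:.2f} cycles vs. base {BASE_CYCLES_PER_INSTR}")
```

With an unchanged clock, execution time is proportional to total cycles, so total cycles rather than CPI is the quantity to compare (the new machine also executes fewer instructions). Under these assumptions the total works out to 1.7 minus the removed-load share, so the new machine only breaks even when every load is eliminated.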


Next Time

• Benchmarks

• Performance Metrics

• Cost

• Instruction Set Architectures

TODO

• Read Chapters 1 & 2

• Do Homework #0