cs152 / kubiatowicz lec7.1 2/8/01©ucb spring 2001 cs152 computer architecture and engineering...

47
2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz Lec7.1 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: http://www- inst.eecs.berkeley.edu/~cs152/

Post on 19-Dec-2015

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.1

CS152Computer Architecture and Engineering

Lecture 7

Designing a Single Cycle Datapath

February 8, 2001

John Kubiatowicz (http.cs.berkeley.edu/~kubitron)

lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

Page 2: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.2

Outline of Today’s Lecture

° Recap (5 minutes)

° Finish on Floating Point

° Design a processor: step-by-step

° Requirements of the Instruction Set

° Questions and Administrative Matters (5 minutes)

° Components and Clocking

° Assembling an Adequate Datapath

° Break (5 minutes)

° Controling the datapath

Page 3: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.3

Review: DIVIDE HARDWARE Version 3

° 32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)

Remainder (Quotient)

Divisor

32-bit ALU

WriteControl

32 bits

64 bits

Shift Left“HI” “LO”

° Multiplication and Division can use same hardware!

Page 4: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.4

Divide Algorithm Version 3 example (7 / 2)

Remainder Divisor0000 0111 0010

0: 0000 1110 0010 ; Initial Shift1: 1110 1110 0010 ; Try to subtract2: 0000 1110 0010 ; Can’t: Add back3: 0001 1100 0010 ; Shift in 01:1111 1100 0010 ; Try to subtract2:0001 1100 0010 ; Can’t: Add back3: 0011 1000 0010 ; Shift in 01:0001 1000 0010 ; Try to subtract2: 0001 1000 0010 ; Success!3:0011 0001 0010 ; Shift in 11:0001 0001 0010 ; Try to subtract2:0001 0001 0010 ; Success!3:0010 0011 0010 ; Shift in 1 E:0001 0011 0010 ; Correct remainder

Page 5: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.5

Non restoring version

Remainder Divisor0000 0111 0010

0: 0000 1110 0010 ; Initial Shift1: 1110 1110 0010 ; Try to subtract3: 1101 1100 0010 ; Negative: Shift in 01:1111 1100 0010 ; Try to add (neg)3: 1111 1000 0010 ; Negative: Shift in 01:0001 1000 0010 ; Try to Add3:0011 0001 0010 ; Positive: Shift in 11:0001 0001 0010 ; Try to subtract3:0010 0011 0010 ; Positive: Shift in 1 E:0001 0011 0010 ; Correct remainder

Insight: (-Divisor * 2) + Divisor = Divisor

Page 6: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.6

Review: What is in a number?° What can be represented in N bits?

° Unsigned 0 to 2N - 1

° 2s Complement - 2N-1 to 2N-1 - 1

° 1s Complement -2N-1+1 to 2N-1-1

° Excess M 2 -M to 2 N - M - 1

• (E = e + M)

° BCD 0 to 10N/4 - 1

° But, what about?

• very large numbers? 9,349,398,989,787,762,244,859,087,678

• very small number?0.0000000000000000000000045691

• rationals 2/3

• irrationals

• transcendentals e,

2

Page 7: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.7

Review: Recall Scientific Notation

6.02 x 10 1.673 x 1023 -24

exponent

radix (base)Mantissa

decimal point

Sign, magnitude

Sign, magnitude

IEEE F.P. ± 1.M x 2e - 127

° Issues:

• Arithmetic (+, -, *, / )

• Representation, Normal form

• Range and Precision

• Rounding

• Exceptions (e.g., divide by zero, overflow, underflow)

• Errors

• Properties ( negation, inversion, if A B then A - B 0 )

Page 8: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.8

Review from Prerequisties: Floating-Point ArithmeticRepresentation of floating point numbers in IEEE 754 standard:

single precision1 8 23

sign

exponent:excess 127binary integer

mantissa:sign + magnitude, normalizedbinary significand w/ hiddeninteger bit: 1.M

actual exponent ise = E - 127

S E M

N = (-1) 2 (1.M)S E-127

0 < E < 255

0 = 0 00000000 0 . . . 0 -1.5 = 1 01111111 10 . . . 0

Magnitude of numbers that can be represented is in the range:

2-126

(1.0) to 2127

(2 - 2-23 )

which is approximately:

1.8 x 10-38

to 3.40 x 10 38

(integer comparison valid on IEEE Fl.Pt. numbers of same sign!)

Page 9: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.9

Basic Addition Algorithm/Multiply issues

For addition (or subtraction) this translates into the following steps:

(1) compute Ye - Xe (getting ready to align binary point)

(2) right shift Xm that many positions to form Xm 2

(3) compute Xm 2 + Ym

if representation demands normalization, then normalization step follows:

(4) left shift result, decrement result exponent (e.g., 0.001xx…) right shift result, increment result exponent (e.g., 101.1xx…) continue until MSB of data is 1 (NOTE: Hidden bit in IEEE Standard)

(5) for multiply, doubly biased exponent must be corrected:

Xe = 7 Ye = -3 Excess 8 extra subtraction step of the bias amount

(6) if result is 0 mantissa, may need to zero exponent by special step

Xe-Ye

Xe-Ye

Xe = 1111Ye = 0101 10100

= 15= 5 20

= 7 + 8= -3 + 8 4 + 8 + 8

Page 10: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.10

Extra Bits for rounding"Floating Point numbers are like piles of sand; every time you move one

you lose a little sand, but you pick up a little dirt."

How many extra bits?

IEEE: As if computed the result exactly and rounded.

Addition:

1.xxxxx 1.xxxxx 1.xxxxx

+ 1.xxxxx 0.001xxxxx 0.01xxxxx

1x.xxxxy 1.xxxxxyyy 1x.xxxxyyypost-normalization pre-normalization pre and post

° Guard Digits: digits to the right of the first p digits of significand to guard against loss of digits – can later be shifted left into first P places during normalization.

° Addition: carry-out shifted in

° Subtraction: borrow digit and guard

° Multiplication: carry and guard, Division requires guard

Page 11: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.11

Rounding Digits

normalized result, but some non-zero digits to the right of the significand --> the number should be rounded

E.g., B = 10, p = 3: 0 2 1.69

0 0 7.45

0 2 1.62

= 1.6900 * 10

= - .0745 * 10

= 1.6155 * 10

2-bias

2-bias

2-bias-

one round digit must be carried to the right of the guard digit so that after a normalizing left shift, the result can be rounded, according to the value of the round digit

IEEE Standard: four rounding modes: round to nearest even (default)

round towards plus infinityround towards minus infinityround towards 0

round to nearest: round digit < B/2 then truncate > B/2 then round up (add 1 to ULP: unit in last place) = B/2 then round to nearest even digit

it can be shown that this strategy minimizes the mean error introduced by rounding

Page 12: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.12

Sticky BitAdditional bit to the right of the round digit to better fine tune rounding

d0 . d1 d2 d3 . . . dp-1 0 0 0 0 . 0 0 X . . . X X X S X X S

+Sticky bit: set to 1 if any 1 bits fall off the end of the round digit

d0 . d1 d2 d3 . . . dp-1 0 0 0 0 . 0 0 X . . . X X X 0

X X 0-

d0 . d1 d2 d3 . . . dp-1 0 0 0 0 . 0 0 X . . . X X X 1-

generates a borrow

Rounding Summary:

Radix 2 minimizes wobble in precision

Normal operations in +,-,*,/ require one carry/borrow bit + one guard digit

One round digit needed for correct rounding

Sticky bit needed when round digit is B/2 for max accuracy

Rounding to nearest has mean error = 0 if uniform distribution of digits are assumed

Page 13: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.13

Denormalized Numbers

0 2 2 2-bias 1-bias 2-bias

B = 2, p = 4normal numbers with hidden bit -->

denormgap

The gap between 0 and the next representable number is much larger than the gaps between nearby representable numbers.

IEEE standard uses denormalized numbers to fill in the gap, making the distances between numbers near 0 more alike.

0 2 2 2-bias 1-bias 2-bias

p bits ofprecision

p-1bits of

precision

same spacing, half as many values!

NOTE: PDP-11, VAX cannot represent subnormal numbers. These machines underflow to zero instead.

Page 14: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.14

Infinity and NaNs

result of operation overflows, i.e., is larger than the largest number that can be represented

overflow is not the same as divide by zero (raises a different exception)

+/- infinity S 1 . . . 1 0 . . . 0

It may make sense to do further computations with infinity e.g., X/0 > Y may be a valid comparison

Not a number, but not infinity (e.q. sqrt(-4)) invalid operation exception (unless operation is = or =)

NaN S 1 . . . 1 non-zero

NaNs propagate: f(NaN) = NaN

HW decides what goes here

Page 15: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.15

Radix-4 Modified Booth’s Multiple representations

Current Bit to the Explanation Example RecodeBits Right

0 0 0 Middle of zeros 00 00 00 00 00 00 (0)

0 1 0 Single one 00 00 00 01 00 01 (1)

1 0 0 Begins run of 1s 00 01 11 10 00 10 (-2)

1 1 0 Begins run of 1s 00 01 11 11 00 01 (-1)

0 0 1 Ends run of 1s 00 00 11 11 00 01 (1)

0 1 1 Ends run of 1s 00 01 11 11 00 10 (2)

1 0 1 Isolated 0 00 11 10 11 00 01 (-1)

1 1 1 Middle of run 00 11 11 11 00 00 (0)

Once admit new symbols (i.e. 1), can have multiple representations of a number:

Page 16: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.16

Pentium Bug

° Pentium FP Divider uses algorithm to generate multiple bits per steps

• FPU uses most significant bits of divisor & dividend/remainder to guess next 2 bits of quotient

• Guess is taken from lookup table: -2, -1,0,+1,+2 (if previous guess too large a reminder, quotient is adjusted in subsequent pass of -2)

• Guess is multiplied by divisor and subtracted from remainder to generate a new remainder

• Called SRT division after 3 people who came up with idea

° Pentium table uses 7 bits of remainder + 4 bits of divisor = 211 entries

° 5 entries of divisors omitted: 1.0001, 1.0100, 1.0111, 1.1010, 1.1101 from PLA (fix is just add 5 entries back into PLA: cost $200,000)

° Self correcting nature of SRT => string of 1s must follow error

• e.g., 1011 1111 1111 1111 1111 1011 1000 0010 0011 0111 1011 0100 (2.99999892918)

° Since indexed also by divisor/remainder bits, sometimes bug doesn’t show even with dangerous divisor value

Page 17: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.17

Pentium bug appearance

° First 11 bits to right of decimal point always correct: bits 12 to 52 where bug can occur (4th to 15th decimal digits)

° FP divisors near integers 3, 9, 15, 21, 27 are dangerous ones:

• 3.0 > d 3.0 - 36 x 2–22 , 9.0 > d 9.0 - 36 x 2–20

• 15.0 > d 15.0 - 36 x 2–20 , 21.0 > d 21.0 - 36 x 2–19

° 0.333333 x 9 could be problem

° In Microsoft Excel, try (4,195,835 / 3,145,727) * 3,145,727

• = 4,195,835 => not a Pentium with bug

• = 4,195,579 => Pentium with bug(assuming Excel doesn’t already have SW bug patch)

• Rarely noticed since error in 5th significant digit

• Success of IEEE standard made discovery possible: all computers should get same answer

Page 18: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.18

Pentium Bug Time line

° June 1994: Intel discovers bug in Pentium: takes months to make change, reverify, put into production: plans good chips in January 1995 4 to 5 million Pentiums produced with bug

° Scientist suspects errors and posts on Internet in September 1994

° Nov. 22 Intel Press release: “Can make errors in 9th digit ... Most engineers and financial analysts need only 4 of 5 digits. Theoretical mathematician should be concerned. ... So far only heard from one.”

° Intel claims happens once in 27,000 years for typical spread sheet user:

• 1000 divides/day x error rate assuming numbers random

° Dec 12: IBM claims happens once per 24 days: Bans Pentium sales

• 5000 divides/second x 15 minutes = 4,200,000 divides/day

• IBM statement: http://www.ibm.com/Features/pentium.html

• Intel said it regards IBM's decision to halt shipments of its Pentium processor-based systems as unwarranted.

Page 19: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.19

Pentium conclusion: Dec. 21, 1994 $500M write-off“To owners of Pentium processor-based computers and the PC community:

We at Intel wish to sincerely apologize for our handling of the recentlypublicized Pentium processor flaw.

The Intel Inside symbol means that your computer has a microprocessor second to none in quality and performance. Thousands of Intel employees work very hard to ensure that this is true. But no microprocessor is ever perfect.

What Intel continues to believe is technically an extremely minor problem has taken on a life of its own. Although Intel firmly stands behind the quality of the current version of the Pentium processor, we recognize that many users have concerns.

We want to resolve these concerns.

Intel will exchange the current version of the Pentium processor for anupdated version, in which this floating-point divide flaw is corrected, forany owner who requests it, free of charge anytime during the life of theircomputer. Just call 1-800-628-8686.”

Sincerely, Andrew S. Grove Craig R. Barrett Gordon E. Moore President /CEO Executive Vice President Chairman of the Board

&COO

Page 20: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.20

Questions and Administrative Matters (5 Minutes)

° Reading Assignment 5.1-5.4

° Project teams -- choose next Wednesday:

• Form four or five people project team.

• We want you to learn to work in a big team.

• Other project members must be in same section

° Make sure to look for assignments on Handouts page

° Midterm Thursday 3/1 in 277 Cory 5:30PM-8:30PM

• you may bring one double-sided page of notes

• we’ll give you the opcode table from the book

• review session Sunday before(?)

• previous midterms and solutions on-line for review

Page 21: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.21

The Big Picture: Where are We Now?

° The Five Classic Components of a Computer

° Today’s Topic: Design a Single Cycle Processor

Control

Datapath

Memory

ProcessorInput

Output

inst. set design (L1-2) technology (L3)

machinedesign Arithmetic (L4-6)

Page 22: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.22

The Big Picture: The Performance Perspective° Performance of a machine is determined by:

• Instruction count

• Clock cycle time

• Clock cycles per instruction

° Processor design (datapath and control) will determine:

• Clock cycle time

• Clock cycles per instruction

° Today:

• Single cycle processor:

- Advantage: One clock cycle per instruction

- Disadvantage: long cycle time

CPI

Inst. Count Cycle Time

Page 23: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.23

How to Design a Processor: step-by-step

° 1. Analyze instruction set => datapath requirements

• the meaning of each instruction is given by the register transfers

• datapath must include storage element for ISA registers

- possibly more

• datapath must support each register transfer

° 2. Select set of datapath components and establish clocking methodology

° 3. Assemble datapath meeting the requirements

° 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.

° 5. Assemble the control logic

Page 24: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.24

The MIPS Instruction Formats

° All MIPS instructions are 32 bits long. The three instruction formats:

• R-type

• I-type

• J-type

° The different fields are:

• op: operation of the instruction

• rs, rt, rd: the source and destination register specifiers

• shamt: shift amount

• funct: selects the variant of the operation in the “op” field

• address / immediate: address offset or immediate value

• target address: target address of the jump instruction

op target address

02631

6 bits 26 bits

op rs rt rd shamt funct

061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

Page 25: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.25

Step 1a: The MIPS-lite Subset for today

° ADD and SUB

• addU rd, rs, rt

• subU rd, rs, rt

° OR Immediate:

• ori rt, rs, imm16

° LOAD and STORE Word

• lw rt, rs, imm16

• sw rt, rs, imm16

° BRANCH:

• beq rs, rt, imm16

op rs rt rd shamt funct

061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

Page 26: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.26

Logical Register Transfers

° RTL gives the meaning of the instructions

° All start by fetching the instructionop | rs | rt | rd | shamt | funct = MEM[ PC ]

op | rs | rt | Imm16 = MEM[ PC ]

inst Register Transfers

ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4

SUBU R[rd] <– R[rs] – R[rt]; PC <– PC + 4

ORi R[rt] <– R[rs] | zero_ext(Imm16); PC <– PC + 4

LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4

STORE MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4

BEQ if ( R[rs] == R[rt] ) then PC <– PC + 4 +sign_ext(Imm16)] || 00

else PC <– PC + 4

Page 27: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.27

Step 1: Requirements of the Instruction Set

° Memory

• instruction & data

° Registers (32 x 32)

• read RS

• read RT

• Write RT or RD

° PC

° Extender

° Add and Sub register or extended immediate

° Add 4 or extended immediate to PC

Page 28: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.28

Step 2: Components of the Datapath

° Combinational Elements

° Storage Elements• Clocking methodology

Page 29: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.29

Combinational Logic Elements (Basic Building Blocks)

° Adder

° MUX

° ALU

32

32

A

B32

Sum

Carry

32

32

A

B32

Result

OP

32A

B32

Y32

Select

Ad

der

MU

XA

LU

CarryIn

Page 30: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.30

Storage Element: Register (Basic Building Block)

° Register

• Similar to the D Flip Flop except

- N-bit input and output

- Write Enable input

• Write Enable:

- negated (0): Data Out will not change

- asserted (1): Data Out will become Data In

Clk

Data In

Write Enable

N N

Data Out

Page 31: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.31

Storage Element: Register File

° Register File consists of 32 registers:

• Two 32-bit output busses:

busA and busB

• One 32-bit input bus: busW

° Register is selected by:

• RA (number) selects the register to put on busA (data)

• RB (number) selects the register to put on busB (data)

• RW (number) selects the register to be writtenvia busW (data) when Write Enable is 1

° Clock input (CLK)

• The CLK input is a factor ONLY during write operation

• During read operation, behaves as a combinational logic block:

- RA or RB valid => busA or busB valid after “access time.”

Clk

busW

Write Enable

3232

busA

32busB

5 5 5RWRA RB

32 32-bitRegisters

Page 32: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.32

Storage Element: Idealized Memory

° Memory (idealized)

• One input bus: Data In

• One output bus: Data Out

° Memory word is selected by:

• Address selects the word to put on Data Out

• Write Enable = 1: address selects the memoryword to be written via the Data In bus

° Clock input (CLK)

• The CLK input is a factor ONLY during write operation

• During read operation, behaves as a combinational logic block:

- Address valid => Data Out valid after “access time.”

Clk

Data In

Write Enable

32 32DataOut

Address

Page 33: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.33

Clocking Methodology

° All storage elements are clocked by the same clock edge

° Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew

° (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time

Clk

Don’t Care

Setup Hold

.

.

.

.

.

.

.

.

.

.

.

.

Setup Hold

Page 34: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.34

Step 3: Assemble DataPath meeting our requirements

° Register Transfer Requirements Datapath Assembly

° Instruction Fetch

° Read Operands and Execute Operation

Page 35: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.35

3a: Overview of the Instruction Fetch Unit

° The common RTL operations

• Fetch the Instruction: mem[PC]

• Update the program counter:

- Sequential Code: PC <- PC + 4

- Branch and Jump: PC <- “something else”

32

Instruction WordAddress

InstructionMemory

PCClk

Next AddressLogic

Page 36: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.36

3b: Add & Subtract

° R[rd] <- R[rs] op R[rt] Example: addU rd, rs, rt

• Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields

• ALUctr and RegWr: control logic after decoding the instruction

32

Result

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs RtRd

AL

U

op rs rt rd shamt funct

061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

Page 37: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.37

Register-Register Timing: One complete cycle

32Result

ALUctr

Clk

busW

RegWr

3232

busA

32busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs RtRd

AL

U

Clk

PC

Rs, Rt, Rd,Op, Func

Clk-to-Q

ALUctr

Instruction Memory Access Time

Old Value New Value

RegWr Old Value New Value

Delay through Control Logic

busA, BRegister File Access Time

Old Value New Value

busW

ALU Delay

Old Value New Value

Old Value New Value

New ValueOld Value

Register WriteOccurs Here

Page 38: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.38

3c: Logical Operations with Immediate° R[rt] <- R[rs] op ZeroExt[imm16] ]

32

Result

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs

ZeroE

xt

Mu

x

RtRdRegDst

Mux

3216imm16

ALUSrc

AL

U

11

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits rd?

immediate

016 1531

16 bits16 bits

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Rt?

Page 39: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.39

3d: Load Operations

° R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16

11

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits rd

32

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs

RtRd

RegDst

Exten

der

Mu

x

Mux

3216

imm16

ALUSrc

ExtOp

Clk

Data InWrEn

32

Adr

DataMemory

32

AL

U

MemWr Mu

x

W_Src

??

Rt?

Page 40: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.40

3e: Store Operations

° Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

32

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

55 5

Rw Ra Rb

32 32-bitRegisters

Rs

Rt

Rt

Rd

RegDst

Exten

der

Mu

x

Mux

3216imm16

ALUSrcExtOp

Clk

Data InWrEn

32

Adr

DataMemory

MemWr

AL

U

32

Mu

x

W_Src

Page 41: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.41

3f: The Branch Instruction

° beq rs, rt, imm16

• mem[PC] Fetch the instruction from memory

• Equal <- R[rs] == R[rt] Calculate the branch condition

• if (Equal) Calculate the next instruction’s address

- PC <- PC + 4 + ( SignExt(imm16) x 4 )

• else

- PC <- PC + 4

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

Page 42: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.42

Datapath for Branch Operations

° beq rs, rt, imm16 Datapath generates condition (equal)

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

32

imm16

PC

Clk

00

Ad

der

Mu

x

Ad

der

4nPC_sel

Clk

busW

RegWr

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs Rt

Eq

ual

?

Cond

PC

Ext

Inst Address

Page 43: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.43

Putting it All Together: A Single Cycle Datapath

imm

16

32

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

55 5

Rw Ra Rb

32 32-bitRegisters

Rs

Rt

Rt

RdRegDst

Exten

der

Mu

x

3216imm16

ALUSrcExtOp

Mu

x

MemtoReg

Clk

Data InWrEn32 Adr

DataMemory

MemWrA

LU

Equal

Instruction<31:0>

0

1

0

1

01

<21:25>

<16:20>

<11:15>

<0:15>

Imm16RdRtRs

=

Ad

der

Ad

der

PC

Clk

00

Mu

x

4

nPC_sel

PC

Ext

Adr

InstMemory

Page 44: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.44

An Abstract View of the Critical Path° Register file and ideal memory:

• The CLK input is a factor ONLY during write operation

• During read operation, behave as combinational logic:

- Address valid => Output valid after “access time.”

Critical Path (Load Operation) = PC’s Clk-to-Q + Instruction Memory’s Access Time + Register File’s Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew

Clk

5

Rw Ra Rb

32 32-bitRegisters

RdA

LU

Clk

Data In

DataAddress

IdealData

Memory

Instruction

InstructionAddress

IdealInstruction

Memory

Clk

PC

5Rs

5Rt

16Imm

32

323232

A

B

Nex

t A

dd

ress

Page 45: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.45

An Abstract View of the Implementation

DataOut

Clk

5

Rw Ra Rb

32 32-bitRegisters

Rd

AL

U

Clk

Data In

DataAddress

IdealData

Memory

Instruction

InstructionAddress

IdealInstruction

Memory

Clk

PC

5Rs

5Rt

32

323232

A

B

Nex

t A

dd

ress

Control

Datapath

Control Signals Conditions

Page 46: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.46

Steps 4 & 5: Implement the control

Next Time!

Page 47: CS152 / Kubiatowicz Lec7.1 2/8/01©UCB Spring 2001 CS152 Computer Architecture and Engineering Lecture 7 Designing a Single Cycle Datapath February 8, 2001

2/8/01 ©UCB Spring 2001 CS152 / Kubiatowicz

Lec7.47

Summary

° 5 steps to design a processor• 1. Analyze instruction set => datapath requirements

• 2. Select set of datapath components & establish clock methodology

• 3. Assemble datapath meeting the requirements

• 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.

• 5. Assemble the control logic

° MIPS makes it easier• Instructions same size

• Source registers always in same place

• Immediates same size, location

• Operations always on registers/immediates

° Single cycle datapath => CPI=1, CCT => long

° Next time: implementing control