CS 35101 Computer Architecture Spring 2006 Week 6/7 Paul Durand (www.cs.kent.edu/~durand) Course url: www.cs.kent.edu/~durand/cs35101.htm

Upload: natalie-norman

Post on 04-Jan-2016



CS 35101 Computer Architecture

Spring 2006

Week 6/7

Paul Durand (www.cs.kent.edu/~durand)

Course url: www.cs.kent.edu/~durand/cs35101.htm


Heads Up

Week 6 & 7 material
- Digital Logic Design
- Processor organization / description
- MIPS arithmetic operations
- Reading assignment – PH 3.1, 3.2, 3.3

Reminders
- Midterm #1 – Thursday, February 23rd

Next week’s material
- MIPS arithmetic operations
- Reading assignment – PH 3.4 through 3.5


To make the architect’s crucial task even conceivable, it is necessary to separate the architecture, the definition of the product as perceivable by the user, from its implementation. Architecture versus implementation defines a clean boundary between parts of the design task, and there is plenty of work on each side of it.

The Mythical Man-Month, Brooks, pg. 256


Review: MIPS ISA

Category                      Instr                        Op Code    Example               Meaning
Arithmetic (R & I format)     add                          0 and 32   add  $s1, $s2, $s3    $s1 = $s2 + $s3
                              subtract                     0 and 34   sub  $s1, $s2, $s3    $s1 = $s2 - $s3
                              add immediate                8          addi $s1, $s2, 6      $s1 = $s2 + 6
                              or immediate                 13         ori  $s1, $s2, 6      $s1 = $s2 OR 6
Data Transfer (I format)      load word                    35         lw   $s1, 24($s2)     $s1 = Memory($s2+24)
                              store word                   43         sw   $s1, 24($s2)     Memory($s2+24) = $s1
                              load byte                    32         lb   $s1, 25($s2)     $s1 = Memory($s2+25)
                              store byte                   40         sb   $s1, 25($s2)     Memory($s2+25) = $s1
                              load upper imm               15         lui  $s1, 6           $s1 = 6 * 2^16
Cond. Branch (I & R format)   br on equal                  4          beq  $s1, $s2, L      if ($s1==$s2) go to L
                              br on not equal              5          bne  $s1, $s2, L      if ($s1 !=$s2) go to L
                              set on less than             0 and 42   slt  $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
                              set on less than immediate   10         slti $s1, $s2, 6      if ($s2<6) $s1=1 else $s1=0
Uncond. Jump (J & R format)   jump                         2          j    2500             go to 10000
                              jump register                0 and 8    jr   $t1              go to $t1
                              jump and link                3          jal  2500             go to 10000; $ra=PC+4


Review: MIPS Organization, so far

[Figure: block diagram of the MIPS processor and memory. Processor: PC, Fetch (PC = PC+4), Decode, Exec, a Register File of 32 registers ($zero - $ra), each 32 bits wide, with 5-bit src1, src2, and dst addresses and 32-bit src1 data, src2 data, and write data ports, a 32-bit ALU, and two 32-bit adders (one adding 4, one adding the br offset). Memory: 2^30 words of 32 bits with read/write addr, read data, and write data ports; word addresses (binary) 0…0000, 0…0100, 0…1000, 0…1100, …, 1…1100; byte addresses 0 1 2 3 / 7 6 5 4 (big Endian).]


Processor Organization

Processor control needs to have the
- Ability to input instructions from memory
- Logic to control instruction sequencing and to issue signals that control the way information flows between the datapath components and the operations performed by them

Processor datapath needs to have the
- Ability to load data from and store data to memory
- Interconnected components - functional units (e.g., ALU) and storage units (e.g., Register File) - for executing the ISA

Need a way to describe the organization
- High level (block diagram) description
- Schematic (gate level) description
- Textual (simulation/synthesis level) description


Levels of Description of a Digital System

- Architectural: models programmer's view at a high level; written in your favorite programming language
- Functional/Behavioral: more detailed model, like the block diagram view
- Register Transfer: model is in terms of datapath FUs, registers, busses; register xfer operations are clock phase accurate
- Logic: model is in terms of logic gates; delay information can be specified for gates; digital waveforms
- Circuit: model is in terms of circuits (electrical behavior); accurate analog waveforms

Moving down the list: Less Abstract, More Accurate, Slower Simulation

Tools
- Special languages + simulation systems for describing the inherent parallel activity in hardware (VHDL and Verilog)
- Schematic capture + logic simulation package like LogicWorks


Why Simulate First?

Physical breadboarding
- discrete components/lower scale integration precedes actual construction of the prototype
- verification of the initial design

No longer possible as designs reach higher levels of integration!

Simulation before construction - aka functional verification
- high level constructs means faster to design and test
- can play “what if” more easily
- limited performance (can’t usually simulate all possible input transitions) and accuracy (can’t usually model wiring delays accurately), however


Because ease of use is the purpose, this ratio of function to conceptual complexity is the ultimate test of system design. Neither function alone nor simplicity alone defines a good design.

The Mythical Man-Month, Brooks, pg. 43



Arithmetic

Where we've been: Abstractions:

- Instruction Set Architecture (ISA)
- Assembly and machine language

What's up ahead: Implementing the architecture

[Figure: ALU block with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result output, and 1-bit zero and ovf outputs.]


Number Representation

Bits are just bits (have no inherent meaning)
- conventions define the relationships between bits and numbers

Binary numbers (base 2) - integers
    0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 . . .
- in decimal from 0 to 2^n - 1 for n bits

Of course, it gets more complicated
- storage locations (e.g., register file words) are finite, so have to worry about overflow (i.e., when the number is too big to fit into 32 bits)
- have to be able to represent negative numbers, e.g., how do we specify -8 in
    addi $sp, $sp, -8    #$sp = $sp - 8
- in real systems have to provide for more than just integers, e.g., fractions and real numbers (and floating point)


Possible Representations

Sign Mag.     Two’s Comp.   One’s Comp.
              1000 = -8
1111 = -7     1001 = -7     1000 = -7
1110 = -6     1010 = -6     1001 = -6
1101 = -5     1011 = -5     1010 = -5
1100 = -4     1100 = -4     1011 = -4
1011 = -3     1101 = -3     1100 = -3
1010 = -2     1110 = -2     1101 = -2
1001 = -1     1111 = -1     1110 = -1
1000 = -0                   1111 = -0
0000 = +0     0000 = 0      0000 = +0
0001 = +1     0001 = +1     0001 = +1
0010 = +2     0010 = +2     0010 = +2
0011 = +3     0011 = +3     0011 = +3
0100 = +4     0100 = +4     0100 = +4
0101 = +5     0101 = +5     0101 = +5
0110 = +6     0110 = +6     0110 = +6
0111 = +7     0111 = +7     0111 = +7

Issues:
- balance
- number of zeros
- ease of operations

Which one is best? Why?
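To make the comparison concrete, here is a minimal Python sketch that decodes one 4-bit pattern under all three conventions. The function name `decode4` and the dictionary keys are illustrative choices, not part of the course material.

```python
def decode4(bits: str) -> dict:
    """Decode a 4-bit string under sign-magnitude, one's and two's complement."""
    assert len(bits) == 4 and set(bits) <= {"0", "1"}
    raw = int(bits, 2)            # value as an unsigned integer, 0..15
    sign = raw >> 3               # most significant bit
    magnitude = raw & 0b0111      # low three bits

    sign_mag = -magnitude if sign else magnitude
    ones = raw - 15 if sign else raw   # 1111 decodes to (negative) zero
    twos = raw - 16 if sign else raw   # 1000 decodes to -8
    return {"sign_mag": sign_mag, "ones": ones, "twos": twos}


print(decode4("1010"))
```

Python integers have no negative zero, so the two zero patterns of sign magnitude (1000) and one's complement (1111) both decode to plain 0 here.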


MIPS Representations

32-bit signed numbers (2’s complement):

0000 0000 0000 0000 0000 0000 0000 0000 two = 0 ten
0000 0000 0000 0000 0000 0000 0000 0001 two = + 1 ten
0000 0000 0000 0000 0000 0000 0000 0010 two = + 2 ten
...
0111 1111 1111 1111 1111 1111 1111 1110 two = + 2,147,483,646 ten
0111 1111 1111 1111 1111 1111 1111 1111 two = + 2,147,483,647 ten    <- maxint
1000 0000 0000 0000 0000 0000 0000 0000 two = – 2,147,483,648 ten    <- minint
1000 0000 0000 0000 0000 0000 0000 0001 two = – 2,147,483,647 ten
1000 0000 0000 0000 0000 0000 0000 0010 two = – 2,147,483,646 ten
...
1111 1111 1111 1111 1111 1111 1111 1101 two = – 3 ten
1111 1111 1111 1111 1111 1111 1111 1110 two = – 2 ten
1111 1111 1111 1111 1111 1111 1111 1111 two = – 1 ten

What if the bit string represented addresses?
- need operations that also deal with only positive (unsigned) integers
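The same 32-bit pattern can be read as a signed or an unsigned value. The following Python sketch shows both readings; the helper name `as_signed32` and the constants are assumptions made for illustration.

```python
MAXINT_PATTERN = 0x7FFFFFFF   # 0111 1111 ... 1111
MININT_PATTERN = 0x80000000   # 1000 0000 ... 0000


def as_signed32(word: int) -> int:
    """Interpret the low 32 bits of word as a two's complement number."""
    word &= 0xFFFFFFFF
    # If the sign bit is set, the pattern stands for word - 2^32.
    return word - (1 << 32) if word & 0x80000000 else word


for pattern in (MAXINT_PATTERN, MININT_PATTERN, 0xFFFFFFFF):
    print(f"{pattern:032b}  unsigned={pattern}  signed={as_signed32(pattern)}")
```

Read as an address (unsigned), 0xFFFFFFFF is the largest value; read as a signed integer it is -1, which is why separate unsigned operations are needed.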


Review: Signed Binary Representation

2’s comp   decimal
1000       -8        -2^3
1001       -7        -(2^3 - 1)
1010       -6
1011       -5
1100       -4
1101       -3
1110       -2
1111       -1
0000        0
0001        1
0010        2
0011        3
0100        4
0101        5
0110        6
0111        7        2^3 - 1

Negating 0101 (5): complement all the bits to get 1010, then add a 1 to get 1011 (-5)


Two's Complement Operations

Negating a two's complement number: complement all the bits and add a 1
- remember: “negate” and “invert” are quite different!

Converting n-bit numbers into numbers with more than n bits:
- MIPS 16-bit immediate gets converted to 32 bits for arithmetic
- copy the most significant bit (the sign bit) into the other bits
    0010 -> 0000 0010
    1010 -> 1111 1010
- sign extension versus zero extend (lb vs. lbu)
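The negate and extend operations can be sketched in a few lines of Python working on bit patterns held in ordinary integers. The function names and the explicit width parameters are hypothetical and chosen only for this illustration.

```python
def negate(value: int, bits: int) -> int:
    """Two's complement negation: complement all the bits, then add a 1."""
    mask = (1 << bits) - 1
    inverted = ~value & mask          # "invert" step
    return (inverted + 1) & mask      # "add a 1" step, wrapped to the width


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits-wide value into the new upper bits."""
    value &= (1 << from_bits) - 1
    if (value >> (from_bits - 1)) & 1:
        upper = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= upper
    return value


def zero_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Fill the new upper bits with zeros (to_bits kept only for symmetry)."""
    return value & ((1 << from_bits) - 1)


print(f"{negate(0b0101, 4):04b}")          # 1011
print(f"{sign_extend(0b1010, 4, 8):08b}")  # 11111010
print(f"{zero_extend(0b1010, 4, 8):08b}")  # 00001010
```

Note that negating 1000 in 4 bits returns 1000 again, since +8 has no 4-bit two's complement representation.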


Goal: Design an ALU for the MIPS ISA

Must support the Arithmetic/Logic operations of the ISA

Tradeoffs of cost and speed based on frequency of occurrence, hardware budget


MIPS Arithmetic and Logic Instructions

Signed arithmetic generates overflow, but no carry out

Instruction formats (bit positions marked on the slide: 31 25 20 15 5 0)

R-type:   op | Rs | Rt | Rd | funct
I-Type:   op | Rs | Rt | Immed 16

Type     op       funct
ADDI     001000   xx
ADDIU    001001   xx
SLTI     001010   xx
SLTIU    001011   xx
ANDI     001100   xx
ORI      001101   xx
XORI     001110   xx
LUI      001111   xx

Type     op       funct
ADD      000000   100000
ADDU     000000   100001
SUB      000000   100010
SUBU     000000   100011
AND      000000   100100
OR       000000   100101
XOR      000000   100110
NOR      000000   100111

Type     op       funct
         000000   101000
         000000   101001
SLT      000000   101010
SLTU     000000   101011
         000000   101100
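As an illustration of how the op and funct fields fit into a 32-bit word, the sketch below packs the R-type and I-type fields in Python. The helper names are hypothetical. The 5-bit shamt field (bits 10 to 6) is not shown in the format above and is taken from the standard MIPS encoding, as are the register numbers $s1 = 17, $s2 = 18, $s3 = 19, and $sp = 29.

```python
def encode_r(rs: int, rt: int, rd: int, funct: int, shamt: int = 0, op: int = 0) -> int:
    """Pack an R-type instruction: op | rs | rt | rd | shamt | funct."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type instruction: op | rs | rt | 16-bit immediate."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


S1, S2, S3, SP = 17, 18, 19, 29

# add $s1, $s2, $s3  (op 000000, funct 100000)
print(hex(encode_r(rs=S2, rt=S3, rd=S1, funct=0b100000)))
# addi $s1, $s2, 6   (op 001000)
print(hex(encode_i(op=0b001000, rs=S2, rt=S1, imm=6)))
# addi $sp, $sp, -8  (the immediate is stored as 16-bit two's complement)
print(hex(encode_i(op=0b001000, rs=SP, rt=SP, imm=-8)))
```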


Design Trick: Divide & Conquer

Break the problem into simpler problems, solve them and glue together the solution

Example: assume the immediates have been taken care of before the ALU
- now down to 10 operations
- can encode in 4 bits

00 add
01 addu
02 sub
03 subu
04 and
05 or
06 xor
07 nor
12 slt
13 sltu
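One plausible reading of the two-digit codes above, offered as an inference rather than something the slides state, is that each is the low four bits of the instruction's funct field written in octal. The Python sketch below checks that reading against the funct values from the MIPS Arithmetic and Logic Instructions slide; the names `FUNCT` and `alu_code` are hypothetical.

```python
# funct fields from the R-type table
FUNCT = {
    "add": 0b100000, "addu": 0b100001, "sub": 0b100010, "subu": 0b100011,
    "and": 0b100100, "or": 0b100101, "xor": 0b100110, "nor": 0b100111,
    "slt": 0b101010, "sltu": 0b101011,
}


def alu_code(name: str) -> int:
    """Keep only the low four bits of funct as the ALU operation code."""
    return FUNCT[name] & 0b1111


for name in FUNCT:
    # Two-digit octal, which is how the codes appear to be written above.
    print(f"{alu_code(name):02o} {name}")
```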


Addition & Subtraction

Just like in grade school (carry/borrow 1s)
      0111      0111      0110
    + 0110    - 0110    - 0101
      1101      0001      0001

Two's complement operations easy
- subtraction using addition of negative numbers
      0111        0111
    - 0110      + 1010
      0001      1 0001

Overflow (result too large for finite computer word):
- e.g., adding two n-bit numbers does not yield an n-bit number
      0111
    + 0001
      1000
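The worked examples on this slide can be replayed with a small Python sketch of 4-bit arithmetic. The helpers `add4` and `sub4` are illustrative names; each returns the 4-bit result together with the carry out of the top bit.

```python
def add4(a: int, b: int) -> tuple:
    """Add two 4-bit patterns; return (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF)
    return total & 0xF, total >> 4


def sub4(a: int, b: int) -> tuple:
    """Subtract by adding the two's complement of b."""
    neg_b = (~b + 1) & 0xF        # complement all the bits, add a 1
    return add4(a, neg_b)


s, c = add4(0b0111, 0b0110)
print(f"0111 + 0110 = {s:04b}, carry out {c}")
s, c = sub4(0b0111, 0b0110)
print(f"0111 - 0110 = {s:04b}, carry out {c}")
s, c = add4(0b0111, 0b0001)
print(f"0111 + 0001 = {s:04b}, carry out {c}")
```

In the last case the 4-bit result 1000 reads as -8 in two's complement even though 7 + 1 = 8, which is the overflow the slide describes.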


Building a 1-bit Binary Adder

[Figure: 1 bit Full Adder block with inputs A, B, and carry_in and outputs S and carry_out.]

S = A xor B xor carry_in

carry_out = (A and B) or (A and carry_in) or (B and carry_in) (majority function)

How can we use it to build a 32-bit adder?

How can we modify it easily to build an adder/subtractor?

A   B   carry_in   carry_out   S
0   0   0          0           0
0   0   1          0           1
0   1   0          0           1
0   1   1          1           0
1   0   0          0           1
1   0   1          1           0
1   1   0          1           0
1   1   1          1           1
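The two equations can be checked against the truth table with a short Python sketch. The function name `full_adder` is an illustrative choice; the bodies follow the S and carry_out equations directly.

```python
from itertools import product


def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """1-bit full adder: returns (S, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)   # majority function
    return s, carry_out


print("A B carry_in carry_out S")
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(a, b, cin, cout, s)
```

For every row, carry_out and S together form the 2-bit binary count of how many inputs are 1.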


Building 32-bit Adder

[Figure: chain of 32 1-bit FAs. The FA for bit i takes inputs Ai and Bi and carry-in ci and produces Si and the carry into the next bit, for bits 0 through 31; c0 = carry_in and c32 = carry_out.]

Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . .

Ripple Carry Adder (RCA)

advantage: simple logic, so small (low cost)

disadvantage: slow and lots of glitching (so lots of energy consumption)
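A behavioral Python sketch of the ripple carry adder follows: the carry-out of each 1-bit full adder feeds the carry-in of the next. The names `full_adder` and `ripple_add` and the `width` parameter are assumptions for illustration. The loop mirrors the serial carry chain, which is also why the hardware version is slow.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """1-bit full adder: returns (S, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def ripple_add(a: int, b: int, carry_in: int = 0, width: int = 32) -> tuple:
    """Chain width full adders; returns (sum bits, final carry_out)."""
    result = 0
    carry = carry_in                      # c0
    for i in range(width):                # bit 0 is the least significant
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry                  # carry is c32 when width is 32


print(ripple_add(7, 6))
print(ripple_add(0xFFFFFFFF, 1))
```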


Building 32-bit Adder/Subtractor

Remember 2’s complement is just
- complement all the bits
- add a 1 in the least significant bit

    A     0111        0111
    B   - 0110      + 1010

[Figure: the 32-bit ripple carry adder (1-bit FAs producing S0 through S31, c0 = carry_in, c32 = carry_out) with an add/subt control input (0=add, 1=subt). Each Bi passes through a selector before its FA: the FA receives Bi if control = 0 and !Bi if control = 1.]
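A minimal Python sketch of the adder/subtractor is given below. It assumes that the control line both selects Bi or !Bi (modeled as an XOR with control) and drives carry_in, which supplies the "add a 1" step; the slide text states the selector behavior, while the carry_in connection is an assumption here. The name `add_sub` is hypothetical.

```python
def add_sub(a: int, b: int, control: int, width: int = 32) -> tuple:
    """control = 0 computes a + b; control = 1 computes a - b.

    Returns (result bits, carry_out).
    """
    carry = control                            # assumed: control also feeds c0
    result = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ control         # Bi if control = 0, !Bi if control = 1
        s = a_i ^ b_i ^ carry
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
        result |= s << i
    return result, carry


print(add_sub(0b0111, 0b0110, control=1, width=4))   # 0111 - 0110
print(add_sub(0b0111, 0b0110, control=0, width=4))   # 0111 + 0110
```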


Overflow Detection and Effects

Overflow: the result is too large to represent in the number of bits allocated

When adding operands with different signs, overflow cannot occur!

Overflow occurs when
- adding two positives yields a negative
- or, adding two negatives gives a positive
- or, subtract a negative from a positive gives a negative
- or, subtract a positive from a negative gives a positive

On overflow, an exception (interrupt) occurs
- Control jumps to predefined address for exception
- Interrupted address (address of instruction causing the overflow) is saved for possible resumption

Don't always want to detect (interrupt on) overflow
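The four overflow conditions translate directly into sign-bit checks. The Python sketch below works on bit patterns of a given width; the function names and the `width` parameter are illustrative assumptions, and the exception mechanism itself is not modeled.

```python
def add_overflows(a: int, b: int, width: int = 32) -> bool:
    """True if a + b overflows as a signed width-bit addition."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a, b = a & mask, b & mask
    result = (a + b) & mask
    # Operands share a sign, but the result's sign differs.
    return (a & sign) == (b & sign) and (result & sign) != (a & sign)


def sub_overflows(a: int, b: int, width: int = 32) -> bool:
    """True if a - b overflows as a signed width-bit subtraction."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a, b = a & mask, b & mask
    result = (a - b) & mask
    # Operands differ in sign, and the result's sign differs from a's.
    return (a & sign) != (b & sign) and (result & sign) != (a & sign)


print(add_overflows(0b0111, 0b0001, width=4))   # 7 + 1 in 4 bits
print(add_overflows(0b0111, 0b1010, width=4))   # different signs
```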


New MIPS Instructions

Category                      Instr                            Op Code    Example                Meaning
Arithmetic (R & I format)     add unsigned                     0 and 33   addu  $s1, $s2, $s3    $s1 = $s2 + $s3
                              subt unsigned                    0 and 35   subu  $s1, $s2, $s3    $s1 = $s2 - $s3
                              add imm. unsigned                9          addiu $s1, $s2, 6      $s1 = $s2 + 6
Data Transfer                 load byte unsigned               36         lbu   $s1, 25($s2)     $s1 = Memory($s2+25)
Cond. Branch (I & R format)   set on less than unsigned        0 and 43   sltu  $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
                              set on less than imm. unsigned   11         sltiu $s1, $s2, 6      if ($s2<6) $s1=1 else $s1=0

- Sign extend - addiu, sltiu
- Zero extend - lbu
- No overflow detected - addu, subu, addiu, sltu, sltiu
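To show how the unsigned variants differ from their signed counterparts on the same bit patterns, here is a small Python model. The function names mirror the instruction mnemonics, but the functions are illustrative stand-ins that operate on plain integers, not an emulator.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word: int) -> int:
    """Read a 32-bit pattern as two's complement."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def slt(a: int, b: int) -> int:
    """Signed compare of two 32-bit patterns."""
    return 1 if to_signed32(a) < to_signed32(b) else 0


def sltu(a: int, b: int) -> int:
    """Unsigned compare of the same patterns."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


def lb(byte: int) -> int:
    """Load byte: sign extend 8 bits to 32."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte


def lbu(byte: int) -> int:
    """Load byte unsigned: zero extend 8 bits to 32."""
    return byte & 0xFF


print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))
print(hex(lb(0x80)), hex(lbu(0x80)))
```

The pattern 0xFFFFFFFF is -1 for slt but the largest possible value for sltu, so the two instructions give opposite answers when comparing it with 1.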


Conclusion

We can build an ALU to support the MIPS ISA
- we can efficiently perform subtraction using two’s complement
- we can replicate a 1-bit ALU to produce a 32-bit ALU

Important points about hardware
- all of the gates are always working (concurrent)
- the speed of a gate is affected by the number of inputs to the gate (fan-in) and the number of gates that the output is connected to (fan-out)
- the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “number of levels of logic”)

Our primary focus: comprehension; however,
- clever changes to organization can improve performance (similar to using better algorithms in software)