CSE 141 – Computer Architecture Summer Session 1 2004 Lecture 3 ALU Part 2 Single Cycle CPU Part 1 Pramod V. Argade


  • CSE 141 – Computer Architecture, Summer Session 1 2004

    Lecture 3: ALU Part 2

    Single Cycle CPU Part 1

    Pramod V. Argade

  • Slide 5-2 (Pramod Argade, UCSD CSE 141, Fall 2003)

    Announcements
    – Reading Assignment: Chapter 5, The Processor: Datapath and Control, Sec. 5.3 - 5.4
    – Homework 3: Due Mon., July 12 in class: 4.14c, 4.27, 4.28, 4.31; multiply (-6 * 7) using Booth's algorithm with a 4-bit 2's complement representation for the operands; 5.5, 5.6, 5.8, 5.9, 5.10
    – Quiz 3: Mon., July 12, first 10 minutes of class. Topic: ALU, Chapter 4. Need: paper, pen, calculator

  • Slide 5-3

    CSE141 Course Schedule

    Lecture # | Date | Time | Room | Topic | Quiz topic | Homework due
    1 | Mon. 6/28 | 6 - 8:50 PM | Center 109 | Introduction, Ch. 1; ISA, Ch. 3 | - | -
    2 | Wed. 6/30 | 6 - 8:50 PM | Center 109 | Performance, Ch. 2; Arithmetic, Ch. 4 | ISA, Ch. 3 | #1
    - | Mon. 7/5 | No Class | - | July 4th Holiday | - | -
    3 | Wed. 7/7 | 6 - 8:50 PM | Center 109 | Arithmetic, Ch. 4 cont.; Single-cycle CPU, Ch. 5 | Performance, Ch. 2 | #2
    4 | Mon. 7/12 | 6 - 8:50 PM | Center 109 | Single-cycle CPU, Ch. 5 cont.; Multi-cycle CPU, Ch. 5 | Arithmetic, Ch. 4 | #3
    5 | Tue. 7/13 | 7:30 - 8:50 PM | Center 109 | Multi-cycle CPU, Ch. 5 cont. (July 5th make-up class) | - | -
    6 | Wed. 7/14 | 6 - 8:50 PM | Center 109 | Single and Multicycle CPU Examples and Review for Midterm | Single-cycle CPU, Ch. 5 | -
    7 | Mon. 7/19 | 6 - 8:50 PM | Center 109 | Mid-term Exam; Exceptions | - | #4
    8 | Tue. 7/20 | 7:30 - 8:50 PM | Center 109 | Pipelining, Ch. 6 (July 5th make-up class) | - | -
    9 | Wed. 7/21 | 6 - 8:50 PM | Center 109 | Hazards, Ch. 6 | - | -
    10 | Mon. 7/26 | 6 - 8:50 PM | Center 109 | Memory Hierarchy & Caches, Ch. 7 | Hazards, Ch. 6 | #5
    11 | Wed. 7/28 | 6 - 8:50 PM | Center 109 | Virtual Memory, Ch. 7; Course Review | Cache, Ch. 7 | #6
    12 | Sat. 7/31 | 7 - 10 PM | Center 109 | Final Exam | - | -

  • Slide 5-4

    SLT: Set-on-less-than Logic

    SLT $1, $2, $3
    – if ($2 < $3) $1 = 1; else $1 = 0;

    To test A < B, do a subtraction (A - B)
    – A < B if (A - B) < 0, i.e., the difference is negative

    Use the sign bit
    – Route the sign bit to bit 0 of the result
    – Set bits 1 - 31 to zero

    There is a complication due to overflow
    – Work out the solution in Homework problem 4.23
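The subtraction-and-sign-bit recipe can be tried out with a short Python sketch that models 32-bit registers by masking (an illustration only; `slt_naive` and `slt` are hypothetical names). `slt_naive` routes the raw sign bit of A - B to bit 0 exactly as described; `slt` shows one common way to handle the overflow complication, XORing that sign bit with the subtraction's overflow flag. This is offered as an example of the issue, not necessarily as the intended answer to problem 4.23.

```python
MASK32 = 0xFFFF_FFFF


def sign_bit(x: int) -> int:
    """Bit 31 of the 32-bit two's complement pattern of x."""
    return (x & MASK32) >> 31


def slt_naive(a: int, b: int) -> int:
    """Route the sign bit of the 32-bit difference A - B to bit 0 (bits 1-31 are 0)."""
    diff = (a - b) & MASK32              # what a 32-bit adder produces for A - B
    return sign_bit(diff)


def slt(a: int, b: int) -> int:
    """Same idea, corrected for overflow: the true sign is (sign bit) XOR (overflow)."""
    diff = (a - b) & MASK32
    # A - B overflows when A and B have different signs and the
    # 32-bit result's sign differs from the sign of A.
    overflow = sign_bit(a) != sign_bit(b) and sign_bit(diff) != sign_bit(a)
    return sign_bit(diff) ^ int(overflow)


print(slt(3, 5), slt(5, 3))                   # 1 0
print(slt_naive(-2**31, 1), slt(-2**31, 1))   # 0 1: the naive version is fooled by overflow
```

With A = -2^31 and B = 1 the true difference needs 33 bits, so the 32-bit result wraps to a positive number and its sign bit alone gives the wrong answer.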

  • Slide 5-5

    Set if Less Than

    [Figure: (a) 1-bit ALU with inputs a, b, CarryIn, Binvert, Operation and Less, a 4-input result mux (0-3), and CarryOut; (b) 1-bit ALU for the most significant bit, which adds a Set output and Overflow detection. The 32-bit ALU is built from ALU0 through ALU31: the Set output of ALU31 feeds the Less input of ALU0, and Less = 0 for ALU1 through ALU31.]

    SLT $m, $n, $p
      if ($n < $p) { $m = 1; } else { $m = 0; }

    $n < $p  <=>  ($n - $p) < 0

  • Slide 5-6

    Complete 32-bit ALU from last lecture

    [Figure: 32-bit ALU built from ALU0 through ALU31, with inputs a0-a31, b0-b31, Bnegate and Operation and outputs Result0-Result31, Overflow and Zero; the Set output of ALU31 feeds the Less input of ALU0, and Less = 0 for the other bits.]

    Functionality provided
    – Arithmetic operations: ADD, SUB
    – Logical operations: AND, OR
    – Compare: SLT
    – Support for branch: BEQ, BNE
    – Exception detection: Overflow

    What is missing?
    – Signed multiply
    – Unsigned multiply
    – Signed division
    – Unsigned division
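To make the control lines concrete, here is a behavioral Python sketch of the complete 32-bit ALU. The encoding of Bnegate and the 2-bit Operation select (AND = 0, OR = 1, add = 2, less = 3, with Bnegate = 1 for SUB and SLT) follows the textbook's convention and is an assumption here, as are the names `alu32` and `run`. The SLT path uses the overflow-corrected sign, which is one way to resolve the complication noted for SLT; Zero is what BEQ/BNE would test after a subtraction.

```python
MASK32 = 0xFFFF_FFFF

# (Bnegate, Operation) per instruction -- assumed encoding
ALU_CONTROL = {"and": (0, 0), "or": (0, 1), "add": (0, 2), "sub": (1, 2), "slt": (1, 3)}


def alu32(a: int, b: int, bnegate: int, operation: int) -> tuple[int, int, int]:
    """Return (Result, Zero, Overflow) for 32-bit operands a and b."""
    a &= MASK32
    b &= MASK32
    b_in = (~b & MASK32) if bnegate else b     # Bnegate inverts B ...
    s = (a + b_in + bnegate) & MASK32          # ... and CarryIn of ALU0 = 1, so A + ~B + 1 = A - B
    a31, b31, s31 = a >> 31, b_in >> 31, s >> 31
    overflow = int(a31 == b31 and s31 != a31)  # adder inputs agree in sign, the sum does not
    if operation == 0:
        result = a & b
    elif operation == 1:
        result = a | b
    elif operation == 2:
        result = s
    else:
        # Less path: Set (sign of A - B, corrected for overflow) drives bit 0
        result = s31 ^ overflow
    return result, int(result == 0), overflow


def run(op: str, a: int, b: int) -> tuple[int, int, int]:
    return alu32(a, b, *ALU_CONTROL[op])


print(run("sub", 5, 5))             # (0, 1, 0): Zero = 1, so BEQ would branch
print(run("add", 0x7FFF_FFFF, 1))   # (2147483648, 0, 1): signed overflow
```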

  • Slide 5-7

    Grade School Multiplication Algorithm
    – In general (ignoring sign bits): m bits x n bits = (m+n)-bit product
    – Binary makes it easy:
      – 0 => place 0 (0 x multiplicand)
      – 1 => place the multiplicand (1 x multiplicand)
    – Paper-and-pencil example of binary multiplication (8 * 10 = 80, 0x8 * 0xa = 0x50):

        1000     (multiplicand)
      x 1010     (multiplier)
        0000
       1000x
      0000xx
     1000xxx
     1010000     (result)

  • Slide 5-8

    Observations about Multiplication
    – More complicated than addition
    – Simple algorithm: accomplished via shift and add
    – More time delay and more gates (=> silicon area)
    – Let's look at 3 versions based on the grade school algorithm

  • Slide 5-9

    Multiplication: First Version

    Algorithm (flowchart), starting from Start:
    1. Test Multiplier0 (the least significant bit of the Multiplier register)
       1a. If Multiplier0 = 1: add the multiplicand to the product and place the result in the Product register
    2. Shift the Multiplicand register left 1 bit
    3. Shift the Multiplier register right 1 bit
    32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): done.

    [Figure: datapath with a 64-bit Multiplicand register (shift left), a 32-bit Multiplier register (shift right), a 64-bit ALU, a 64-bit Product register (write) and a control test block.]

    Initialization:
    – Load the 32-bit multiplicand and zero-extend it to 64 bits
    – Load the 64-bit Product register with zero
    – A state machine is needed to control the operation
    – 32 iterations are required; each iteration takes 3 clocks, for a total of 96 + 3 = 99 clocks

    Observations:
    – 32 bits of the multiplicand are always zero
    – The 64-bit ALU is unnecessary
    – The left-shifted multiplicand does not affect the lower bits of the product
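A Python sketch of the first version, treating the operands as unsigned 32-bit values and modeling the 64-bit registers with masks (`multiply_v1` is a hypothetical name). Each loop iteration performs exactly the three flowchart steps; clock counting is left out.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def multiply_v1(multiplicand: int, multiplier: int) -> int:
    """Unsigned 32 x 32 -> 64-bit multiply, first version of the algorithm."""
    mcand = multiplicand & MASK32      # 64-bit Multiplicand register, zero-extended
    mplier = multiplier & MASK32       # 32-bit Multiplier register
    product = 0                        # 64-bit Product register starts at zero
    for _ in range(32):                # 32 repetitions
        if mplier & 1:                 # 1. test Multiplier0
            product = (product + mcand) & MASK64   # 1a. 64-bit ALU add
        mcand = (mcand << 1) & MASK64  # 2. shift Multiplicand left 1 bit
        mplier >>= 1                   # 3. shift Multiplier right 1 bit
    return product


print(multiply_v1(8, 10))              # 80, the paper-and-pencil example
```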

  • Slide 5-10

    Multiplication: Second Version

    Algorithm (flowchart), starting from Start:
    1. Test Multiplier0
       1a. If Multiplier0 = 1: add the multiplicand to the left half of the product and place the result in the left half of the Product register
    2. Shift the Product register right 1 bit
    3. Shift the Multiplier register right 1 bit
    32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): done.

    [Figure: datapath with a 32-bit Multiplicand register, a 32-bit Multiplier register (shift right), a 32-bit ALU, a 64-bit Product register (shift right, write) and a control test block.]

    Initialization:
    – Load the 32-bit multiplicand into a 32-bit register
    – Load the 64-bit Product register with zero
    – A state machine is needed to control the operation

    Observations:
    – 32 iterations are required; each iteration takes 3 clocks, for a total of 96 + 3 = 99 clocks
    – A 32-bit ALU is used

  • Slide 5-11

    Multiplication: Third Version

    Algorithm (flowchart), starting from Start:
    1. Test Product0
       1a. If Product0 = 1: add the multiplicand to the left half of the product and place the result in the left half of the Product register
    2. Shift the Product register right 1 bit
    32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): done.

    [Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write) and a control test block.]

    Initialization:
    – Load the 32-bit multiplicand into a 32-bit register
    – Load the upper 32 bits of the Product register with zero
    – Load the lower 32 bits of the Product register with the multiplier
    – A state machine is needed to control the operation

    Observations:
    – 32 iterations are required; each iteration takes 2 clocks, for a total of 64 + 3 = 67 clocks
    – A 32-bit ALU is used
    – The 64-bit Product register holds both the product and the multiplier
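A Python sketch of the third version for unsigned 32-bit operands (`multiply_v3` is a hypothetical name). One detail the flowchart leaves implicit is that the 32-bit ALU can carry out of the left half; the sketch keeps that carry so that the right shift in step 2 moves it into the top bit of the product, which is what makes the result exact.

```python
MASK32 = 0xFFFF_FFFF


def multiply_v3(multiplicand: int, multiplier: int) -> int:
    """Unsigned 32 x 32 -> 64-bit multiply, third version of the algorithm."""
    mcand = multiplicand & MASK32          # 32-bit Multiplicand register
    product = multiplier & MASK32          # upper half = 0, lower half = multiplier
    for _ in range(32):                    # 32 repetitions
        if product & 1:                    # 1. test Product0
            # 1a. 32-bit add into the left half; the carry-out (bit 32 of the
            # sum) is kept so the shift below moves it into bit 63.
            upper = (product >> 32) + mcand
            product = (upper << 32) | (product & MASK32)
        product >>= 1                      # 2. shift Product register right 1 bit
    return product


print(hex(multiply_v3(0xFFFF_FFFF, 0xFFFF_FFFF)))   # 0xfffffffe00000001
```

Because the multiplier is consumed from the right as the product grows from the left, a single 64-bit register holds both, which is the point of this version.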

  • Slide 5-12

    Multiplying Signed Numbers
    – Convert all operands to positive
    – Determine the sign of the product: sign(product) = sign(op1) ^ sign(op2)
    – Multiply the positive operands (only 31 bits)
    – If the sign of the result is negative, negate the result
    – This adds extra logic and delay to multiply

    Is there a better way?

  • Slide 5-13

    Booth's Algorithm
    – An elegant approach to multiplying signed numbers
    – Needs only the ability to add, subtract and shift
    – There are multiple ways to do multiply

    Consider signed (2's complement) operands A and B:

    $$A = -A_{31}\,2^{31} + A_{30}\,2^{30} + A_{29}\,2^{29} + \cdots + A_1\,2^{1} + A_0\,2^{0}$$
    $$= -A_{31}\,2^{31} + (2A_{30} - A_{30})\,2^{30} + (2A_{29} - A_{29})\,2^{29} + \cdots + (2A_0 - A_0)\,2^{0}$$

    Since $(2A_i - A_i)\,2^i = A_i\,2^{i+1} - A_i\,2^i$, collecting the coefficient of each power of two (with $A_{-1} = 0$) gives

    $$A = (A_{30} - A_{31})\,2^{31} + (A_{29} - A_{30})\,2^{30} + \cdots + (A_0 - A_1)\,2^{1} + (A_{-1} - A_0)\,2^{0}$$

    $$A \cdot B = (A_{30} - A_{31})\,2^{31} B + (A_{29} - A_{30})\,2^{30} B + \cdots + (A_0 - A_1)\,2^{1} B + (A_{-1} - A_0)\,2^{0} B$$

    Recipe: for each bit i, evaluate $(A_{i-1} - A_i)$
    – 0: do nothing
    – +1: add B (shifted left by i)
    – -1: subtract B (shifted left by i)

  • Slide 5-14

    Booth's Algorithm: Signed Multiplication

    Current bit | Bit to the right | Explanation | Example | Op
    1 | 0 | Begins run of 1s | 0001111000 | sub
    1 | 1 | Middle of run of 1s | 0001111000 | none
    0 | 1 | End of run of 1s | 0001111000 | add
    0 | 0 | Middle of run of 0s | 0001111000 | none

    Originally for speed (when shift was faster than add)
    – Replace a string of 1s in the multiplier with an initial subtract when we first see a one, and then later an add for the bit after the last one
    – Potential speedup from recognizing that a string of 0s or 1s requires no operation!

    [Figure: the bit pattern 0 1 1 1 1 0 with its beginning of run, middle of run and end of run labeled.]

    $$A \cdot B = (A_{30} - A_{31})\,2^{31} B + (A_{29} - A_{30})\,2^{30} B + \cdots + (A_0 - A_1)\,2^{1} B + (A_{-1} - A_0)\,2^{0} B$$

  • Slide 5-15

    Booth's Algorithm
    – Example: use Booth's algorithm for the following multiplication: 2 * (-6) = 0010 * 1010 = -12 = 1111 0100
    – Recipe for A*B: append $A_{-1} = 0$, then evaluate $(A_{i-1} - A_i)$ for each bit i:
      – 0: do nothing
      – +1: add B (shifted left by i)
      – -1: subtract B (shifted left by i)
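The recipe translates almost line for line into Python. This is a sketch of the recipe as derived above (scan A's bits, add or subtract B·2^i), not a model of the shift-register hardware; `booth_multiply` and `to_signed` are hypothetical names, and Python's unbounded integers stand in for the product register.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_multiply(a: int, b: int, bits: int) -> int:
    """Multiply two `bits`-bit two's complement numbers with Booth's recipe.

    a is the multiplier whose bits A_i are scanned; b is the multiplicand B.
    """
    a_pattern = a & ((1 << bits) - 1)   # two's complement bit pattern of A
    product = 0
    prev = 0                            # A_{-1} = 0
    for i in range(bits):
        cur = (a_pattern >> i) & 1      # A_i
        step = prev - cur               # (A_{i-1} - A_i) is -1, 0 or +1
        if step == 1:
            product += b << i           # end of a run of 1s: add B * 2^i
        elif step == -1:
            product -= b << i           # start of a run of 1s: subtract B * 2^i
        prev = cur
    return to_signed(product, 2 * bits)  # the product fits in 2 * bits bits


p = booth_multiply(2, -6, 4)
print(p, format(p & 0xFF, "08b"))       # -12 11110100
```

For A = 0010 only two steps do any work: bit 1 starts a run (subtract 2B) and bit 2 ends it (add 4B), giving 2B = -12.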

  • Slide 5-16

    Division

                      1001   Quotient
    Divisor 1000 ) 1001010   Dividend
                  -1000
                      10
                      101
                      1010
                     -1000
                        10   Remainder (or Modulo result)

    See how big a number can be subtracted, creating a quotient bit on each step
    – Binary => 1 * divisor or 0 * divisor

    Dividend = Quotient x Divisor + Remainder
    => sizeof(Dividend) = sizeof(Quotient) + sizeof(Divisor)

    3 versions of divide, successive refinement

  • Slide 5-17

    Division 1.0
    – Initialization:
      – 32-bit Quotient register = 0; 64-bit Remainder register = dividend
      – 64-bit Divisor = (32-bit divisor

  • Slide 5-18

    Division 1.0

    Algorithm (flowchart), starting from Start:
    1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
    Test Remainder:
       2a. Remainder >= 0: shift the Quotient register to the left, setting the new rightmost bit to 1.
       2b. Remainder < 0: restore the original value by adding the Divisor register to the Remainder register, and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
    3. Shift the Divisor register right 1 bit.
    33rd repetition? No (< 33 repetitions): go back to step 1. Yes (33 repetitions): done.
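A Python sketch of Division 1.0 for unsigned 32-bit operands (`divide_v1` is a hypothetical name). The slide's initialization line for the Divisor register is cut off; the sketch assumes the textbook's choice of placing the 32-bit divisor in the left half of the 64-bit Divisor register. Python's unbounded integers stand in for the 64-bit registers, so the "Remainder < 0" test is a plain sign check.

```python
MASK32 = 0xFFFF_FFFF


def divide_v1(dividend: int, divisor: int) -> tuple[int, int]:
    """Restoring division, first version: returns (quotient, remainder)."""
    if (divisor & MASK32) == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend & MASK32           # Remainder register starts as the dividend
    div_reg = (divisor & MASK32) << 32      # assumed: divisor in the left half
    quotient = 0
    for _ in range(33):                     # 33 repetitions
        remainder -= div_reg                # 1. Remainder = Remainder - Divisor
        if remainder >= 0:                  # 2a. it fits: shift in a 1
            quotient = ((quotient << 1) | 1) & MASK32
        else:                               # 2b. restore, then shift in a 0
            remainder += div_reg
            quotient = (quotient << 1) & MASK32
        div_reg >>= 1                       # 3. shift Divisor right 1 bit
    return quotient, remainder


print(divide_v1(0b1001010, 0b1000))         # (9, 2): 1001010 / 1000 = 1001 remainder 10
```

The restore in step 2b is the extra work that the non-restoring algorithm mentioned on the next slide avoids.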

  • Slide 5-19

    Divide Algorithm
    – Optimizations similar to those for the multiply algorithm can be done:
      – 32-bit Divisor register
      – 32-bit ALU
      – Quotient bits are left-shifted into the Remainder register
    – In case the result of the subtraction is negative, the Remainder register has to be restored
      – Takes one extra clock cycle
    – The non-restoring divide algorithm removes this step
    – Divide overflow case: 0x80000000 / -1

  • Slide 5-20

    Floating Point: Introduction
    – We need a way to represent real numbers:
      – Numbers with fractions, e.g., 3.14159265… (recognize me?)
      – Very small numbers, e.g., 0.0000000000000000000000013621
      – Very large numbers, e.g., 9,349,398,989,787,762,244,859,087,678
    – Binary fractions:
      $1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
      so $101.011_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}$
      e.g., $0.75 = 0.5 + 0.25 = 1/2 + 1/4 = 0.11_2$
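The positional expansion above can be checked with a few lines of Python that evaluate a binary numeral term by term (`binary_to_value` is a hypothetical helper); `Fraction` keeps the result exact instead of rounding it.

```python
from fractions import Fraction


def binary_to_value(text: str) -> Fraction:
    """Evaluate a binary numeral with an optional binary point, e.g. '101.011'."""
    whole, _, frac = text.partition(".")
    value = Fraction(0)
    for bit in whole:                            # ... + b x 2^1 + b x 2^0
        value = 2 * value + int(bit)
    for i, bit in enumerate(frac, start=1):      # b x 2^-1 + b x 2^-2 + ...
        value += int(bit) * Fraction(1, 2**i)
    return value


print(binary_to_value("1011"), binary_to_value("101.011"), binary_to_value(".11"))
# 11 43/8 3/4
```

Here 43/8 = 5.375, i.e., 4 + 1 + 1/4 + 1/8.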

  • Slide 5-21

    Recall Scientific Notation

    $6.02 \times 10^{23}$: 6.02 is the mantissa (with the decimal point after the first digit), 10 is the radix (base), and 23 is the exponent

    IEEE Single Precision F.P.: $\pm 1.M \times 2^{e - 127}$

  • Slide 5-22

    IEEE 754 Single-precision Floating-Point

    $N = (-1)^S \times 1.M \times 2^{E - 127}$

    Field layout (total 32 bits):
    S (1 bit) | E (8 bits) | M (23 bits)
    – S: sign
    – E: exponent, excess-127 binary integer
    – M: mantissa, normalized binary significand with a hidden integer bit: 1.M

    – Example: convert -325.75 to IEEE single-precision floating-point representation
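One way to check an answer to the example is to let Python's standard `struct` module produce the actual single-precision bit pattern and split it into the S, E and M fields of the layout above (`float32_fields` is a hypothetical helper). By hand: 325.75 = 101000101.11 in binary = 1.0100010111 x 2^8, so S = 1, E = 8 + 127 = 135 and M = 0100010111 followed by 13 zeros.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split the IEEE 754 single-precision encoding of x into (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7F_FFFF


s, e, m = float32_fields(-325.75)
print(s, e, e - 127, format(m, "023b"))    # 1 135 8 01000101110000000000000
print(struct.pack(">f", -325.75).hex())    # c3a2e000
# Check against N = (-1)^S x 1.M x 2^(E - 127)
print((-1) ** s * (1 + m / 2**23) * 2.0 ** (e - 127))   # -325.75
```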

  • Slide 5-23

    IEEE 754 Double-precision Floating-Point

    $N = (-1)^S \times 1.M \times 2^{E - 1023}$

    Field layout (total 64 bits, in two 32-bit words):
    first word: S (1 bit) | E (11 bits) | M (upper 20 bits); second word: M (lower 32 bits)
    – S: sign
    – E: exponent, excess-1023 binary integer
    – M: mantissa, normalized binary significand with a hidden integer bit: 1.M

    – Example: convert -325.75 to IEEE double-precision floating-point representation
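The same check for double precision, again using the standard `struct` module (`float64_fields` is a hypothetical helper). Only the exponent bias and field widths change: E = 8 + 1023 = 1031, and the same mantissa bits 0100010111 are now followed by 42 zeros.

```python
import struct


def float64_fields(x: float) -> tuple[int, int, int]:
    """Split the IEEE 754 double-precision encoding of x into (S, E, M)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


s, e, m = float64_fields(-325.75)
print(s, e, e - 1023)                      # 1 1031 8
word = struct.pack(">d", -325.75).hex()
print(word[:8], word[8:])                  # c0745c00 00000000
# first word: S, E and the upper 20 bits of M; second word: the lower 32 bits of M
```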

  • Slide 5-24

    IEEE 754 Single Precision FP
    – If E = 255 and F is nonzero, then V = NaN ("Not a number")
    – If E = 255 and F is zero and S is 1, then V = -Infinity
    – If E = 255 and F is zero and S is 0, then V = Infinity
    – If 0 …

  • Slide 5-25

    Floating Point Addition

    Algorithm (flowchart), starting from Start:
    1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent.
    2. Add the significands.
    3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent.
       Overflow or underflow? Yes: exception. No: continue.
    4. Round the significand to the appropriate number of bits.
       Still normalized? No: go back to step 3. Yes: done.

  • Slide 5-26

    Floating Point Addition

    [Figure: floating-point adder hardware. A small ALU computes the exponent difference (compare exponents); control selects which significand is shifted right (shift smaller number right); a big ALU adds the significands (add); the sum is shifted left or right while the exponent is incremented or decremented (normalize); rounding hardware produces the final sign, exponent and significand (round).]

    Example: 0.5 + (-0.4375)
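The four steps can be traced on this example with a toy Python model. Assumptions that are not from the slide: significands are 4-bit integers that include the hidden 1 (so `0b1000` means 1.000 in binary), exponents are unbounded, and both the alignment shift and the final rounding step simply truncate. Real IEEE hardware keeps guard and round bits and checks for overflow and underflow. `fp_add` and `to_fraction` are hypothetical names.

```python
from fractions import Fraction

P = 4  # significand width in bits, including the hidden 1 (assumed)


def to_fraction(sign: int, exp: int, sig: int) -> Fraction:
    """Value of (-1)^sign x (sig / 2^(P-1)) x 2^exp."""
    return (-1) ** sign * Fraction(sig, 2 ** (P - 1)) * Fraction(2) ** exp


def fp_add(a: tuple[int, int, int], b: tuple[int, int, int]) -> tuple[int, int, int]:
    (sa, ea, ma), (sb, eb, mb) = a, b
    # 1. Compare exponents; shift the smaller number's significand right.
    if ea < eb:
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb                        # shifted-out bits are simply dropped
    # 2. Add the significands as signed integers.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    if total == 0:
        return (0, 0, 0)                  # exact zero has no normalized form
    sign, sig, exp = int(total < 0), abs(total), ea
    # 3. Normalize: shift right on a carry-out, shift left while the leading bit is 0.
    while sig >= 2 ** P:
        sig >>= 1                         # 4. "rounding" is truncation in this toy
        exp += 1
    while sig < 2 ** (P - 1):
        sig <<= 1
        exp -= 1
    return sign, exp, sig


half = (0, -1, 0b1000)    # 0.5     =  1.000 x 2^-1
minus = (1, -2, 0b1110)   # -0.4375 = -1.110 x 2^-2
result = fp_add(half, minus)
print(result, to_fraction(*result))   # (0, -4, 8) 1/16
```

Step 1 turns -1.110 x 2^-2 into -0.111 x 2^-1, step 2 gives 0.001 x 2^-1, and step 3 normalizes that to 1.000 x 2^-4 = 0.0625.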

  • Slide 5-27

    IEEE 754 Floating Point
    – Increasing the size of the significand enhances accuracy
    – Increasing the size of the exponent increases the range of numbers that can be represented
    – Overflow or underflow can happen
    – Can do an integer compare for greater-than and sign
    – Single precision: range of about 2 x 10^-38 to 2 x 10^38
    – Double precision: range of about 2 x 10^-308 to 2 x 10^308
    – An infinite variety of real numbers exists between, say, 0 and 1
      – Only finitely many of them (about 2^62) can be represented exactly in double precision

  • Slide 5-28

    Floating Point Complexities
    – Operations are somewhat more complicated
    – In addition to overflow we can have "underflow"
    – Accuracy can be a big problem
      – IEEE 754 keeps two extra bits, guard and round
      – Four rounding modes
      – Positive divided by zero yields "infinity"
      – Zero divided by zero yields "not a number"
    – Implementing the standard can be tricky
    – Not using the standard can be even worse
      – See the text for a description of the 80x86 and the Pentium bug!

  • Slide 5-29

    Summary

    • Multiplication and division take much longer than addition, requiring multiple addition steps.

    • Floating point extends the range of numbers that can be represented, at the expense of precision (accuracy).

    • FP operations are very similar to integer operations, but with pre- and post-processing.


  • CSE 141 – Computer Architecture, Fall 2003

    Lecture 3: The Processor: Datapath and Control

    Pramod V. Argade

  • Slide 5-32

    Datapath and Control Design

    The Five Classic Components of a Computer

    [Figure: Processor (Control and Datapath), Memory, Input and Output.]

  • Slide 5-33

    Single Cycle Implementation: Datapath and Control

    Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction

    [Figure: one clock cycle divided into I. Fetch, Decode, Op. Fetch, Execute, Store and Next PC: complete execution of a single instruction.]

  • Slide 5-34

    Abstract / Simplified View: Datapath

    [Figure: abstract datapath with the PC, Instruction memory (Address in, Instruction out), Registers (three Register # inputs, Data), the ALU, and Data memory (Address, Data).]

  • Slide 5-35

    Two Types of Logic Components
    – Combinational: elements that operate on data values
      – Produce the same output if given the same inputs
    – State elements: contain internal storage
      – State elements can be read at any time
      – A clock is used to determine when a state element should be written

    [Figure: combinational logic with inputs A and B and output C = f(A,B); state element with inputs A, B and clk and output C = f(A,B,state).]

  • Slide 5-36

    Clock
    – The clock is a free-running signal
      – Fixed cycle time (period)
      – Frequency = 1/(cycle time)
      – Duty cycle: (% high)/(% low), e.g., the 50/50 duty cycle drawn below
      – Jitter: uncertainty in the rising or falling edge

    [Figure: clock waveform marking one clock cycle (period), a rising edge and a falling edge.]

  • Slide 5-37

    Edge-triggered Clocking
    – Values stored in the machine are updated on a clock edge
      – The clock edge can be either rising or falling
    – By default a state element is written every clock edge
      – Otherwise, an explicit write control signal is required
    – The edge-triggered methodology allows us, in the same clock cycle, to:
      – read the contents of a register,
      – send the value through some combinational logic, and
      – write the contents of the same or another register
    – It is possible to have the same state element as both input and output

    [Figure: in one clock cycle, state element 1 feeds combinational logic that writes state element 2; in a second drawing, state element 1 feeds combinational logic that writes back into state element 1.]
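A small Python sketch of why edge-triggered clocking lets a cycle read and write the same state element: every next-state value is computed from the pre-edge state, and then all state elements are updated together at the edge. The register names and the `clock_edge` helper are hypothetical, and timing (setup, hold, propagation delay) is ignored.

```python
from typing import Callable

State = dict[str, int]


def clock_edge(state: State, logic: Callable[[State], State]) -> State:
    """One clock cycle: combinational logic sees only the current (pre-edge)
    values; every state element then captures its new value at the same edge."""
    return {**state, **logic(state)}


# Same state element as input and output: PC <- PC + 4
state = {"pc": 0, "r1": 10, "r2": 20}
for _ in range(3):
    state = clock_edge(state, lambda s: {"pc": s["pc"] + 4})
print(state["pc"])                   # 12

# Two registers exchanging values in one cycle also works, because both
# new values are computed before either register is written.
state = clock_edge(state, lambda s: {"r1": s["r2"], "r2": s["r1"]})
print(state["r1"], state["r2"])      # 20 10
```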

  • Slide 5-38

    Storage Elements

    D latch
    – Two inputs:
      – the data value to be stored (D)
      – the clock signal (C) indicating when to read and store D
    – Two outputs: the value of the internal state (Q) and its complement (_Q)

    Falling-edge-triggered D flip-flop
    – Output changes only on the clock edge

    [Figure: D latch with inputs D and C and outputs Q and _Q; falling-edge-triggered D flip-flop built from two D latches, with its D, C and Q waveforms.]

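  • Slide 5-39

    CPU: Clocking

    All storage elements are clocked by the same clock edge

    [Figure: storage elements connected through combinational logic, all driven by CLK; Clk timing diagram marking the setup and hold times around the clock edge, with the input a don't-care outside that window.]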

  • Slide 5-40

    Register: A Storage Element
    – Similar to the D flip-flop except:
      – N-bit input and output
      – Write Enable input
    – Write Enable:
      – 0: Data Out will not change
      – 1: Data Out will become Data In (on the clock edge)

    [Figure: register with N-bit Data In, Write Enable and Clk inputs and an N-bit Data Out output.]

  • Slide 5-41

    Register File
    – The register file consists of 32 registers:
      – Two 32-bit output busses: busA and busB
      – One 32-bit input bus: busW
    – A register is selected by:
      – RA, which selects the register to put on busA
      – RB, which selects the register to put on busB
      – RW, which selects the register to be written via busW when Write Enable is 1
    – Clock input (CLK)

    [Figure: 32 32-bit registers with 5-bit RW, RA and RB inputs, a 32-bit busW input, Write Enable and Clk inputs, and 32-bit busA and busB outputs.]
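An illustrative Python model of the register file's interface, assuming reads behave combinationally and writes happen only on a clock edge with Write Enable = 1 (the same rule as for the single register with Write Enable). The class and method names are hypothetical, and the model ignores timing.

```python
class RegisterFile:
    """32 x 32-bit registers: read ports RA -> busA and RB -> busB,
    write port RW <- busW, gated by Write Enable on the clock edge."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple[int, int]:
        # Reads act like combinational logic: busA/busB follow RA/RB at any time.
        assert 0 <= ra < 32 and 0 <= rb < 32, "RA and RB are 5-bit fields"
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, write_enable: int) -> None:
        # Writes happen only at the clock edge, and only when Write Enable is 1.
        assert 0 <= rw < 32, "RW is a 5-bit field"
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFF_FFFF


rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, write_enable=1)
rf.clock_edge(rw=5, bus_w=7, write_enable=0)    # ignored: Write Enable = 0
print(rf.read(5, 0))                             # (42, 0)
```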

  • Slide 5-42

    Memory
    – One input bus: Data In
    – One output bus: Data Out
    – A memory word is selected by:
      – Address, which selects the word to put on Data Out
      – Write Enable = 1: Address selects the memory word to be written via the Data In bus
    – Clock input (CLK):
      – The CLK input is a factor ONLY during a write operation
      – During a read operation, memory behaves as a combinational logic block: Address valid => Data Out valid after the "access time"

    [Figure: memory with 32-bit Data In, Address, Write Enable and Clk inputs and a 32-bit Data Out output.]

  • Slide 5-43

    Basic 4 x 2 Static RAM

    [Figure: eight D latches arranged as four rows of two (each with D, C, Enable and Q); a 2-to-4 decoder driven by the Address selects row 0-3, together with Write enable; data inputs Din[1] and Din[0]; data outputs Dout[1] and Dout[0].]
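A behavioral Python sketch of the 4 x 2 SRAM. The gating is read off the figure labels and is an assumption: each row's latch clock input is taken to be the decoder output ANDed with Write enable, and the Enable inputs are taken to let only the addressed row drive Dout. `decoder_2to4` and `SRAM4x2` are hypothetical names.

```python
def decoder_2to4(address: int) -> list[int]:
    """2-to-4 decoder: exactly one of the four outputs is 1."""
    return [int(address == row) for row in range(4)]


class SRAM4x2:
    """Behavioral sketch of a 4-word x 2-bit static RAM built from D latches."""

    def __init__(self) -> None:
        self.latch = [[0, 0] for _ in range(4)]     # latch[row][bit]

    def write(self, address: int, din: list[int], write_enable: int) -> None:
        for row, select in enumerate(decoder_2to4(address)):
            if select and write_enable:              # latch C = select AND write enable
                self.latch[row] = [din[0], din[1]]

    def read(self, address: int) -> list[int]:
        # Only the selected row drives the output lines.
        selects = decoder_2to4(address)
        return [int(any(sel and row[bit] for sel, row in zip(selects, self.latch)))
                for bit in range(2)]


ram = SRAM4x2()
ram.write(2, [1, 0], write_enable=1)    # Din[0] = 1, Din[1] = 0 into word 2
ram.write(2, [0, 1], write_enable=0)    # ignored: Write enable = 0
print(ram.read(2), ram.read(1))         # [1, 0] [0, 0]
```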

  • Slide 5-44

    A Simple Implementation of the MIPS CPU
    – Simplified to contain only:
      – Memory-reference instructions: lw, sw
      – Arithmetic-logical instructions: add, sub, and, or, slt
      – Control flow instructions: beq, j
    – Execution Time = Instructions * CPI * Cycle Time
    – Processor design (datapath and control) will determine:
      – Clock cycle time
      – Clock cycles per instruction
    – We will design a single cycle processor:
      – Advantage: one clock cycle per instruction
      – Disadvantage: long cycle time

  • Slide 5-45

    Arithmetic Instructions (R-Type)
    – ADD, SUB, AND, OR, SLT
    – Example: add rd, rs, rt
      e.g., add $t3, $s0, $s5: REG[$t3] = REG[$s0] + REG[$s5]

    Field: op | rs | rt | rd | shamt | funct
    Bits:  31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0
    Width: 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
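A Python sketch that packs and unpacks the R-type fields using the bit positions above (`encode_r` and `decode_r` are hypothetical names). The example assumes the standard MIPS register numbers ($s0 = 16, $s5 = 21, $t3 = 11) and the standard function code for `add` (0x20); neither appears on the slide.

```python
def encode_r(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack R-type fields: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r(word: int) -> dict[str, int]:
    """Unpack the same fields from a 32-bit instruction word."""
    return {
        "op": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t3, $s0, $s5 with $s0 = 16, $s5 = 21, $t3 = 11 and funct(add) = 0x20 (assumed)
word = encode_r(rs=16, rt=21, rd=11, shamt=0, funct=0x20)
print(hex(word))        # 0x2155820
print(decode_r(word))
```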

  • Slide 5-46

    Load/Store Instructions (I-Type)
    – LW, SW
    – Examples: lw rt, rs, imm16 and sw rt, rs, imm16
      e.g., lw $s3, -4($s2): REG[$s3] = D-MEM[REG[$s2] - 4]

    Field: op | rs | rt | immediate
    Bits:  31-26 | 25-21 | 20-16 | 15-0
    Width: 6 bits | 5 bits | 5 bits | 16 bits

  • Slide 5-47

    Branch (I-Type)
    – BEQ
    – Example: beq rs, rt, imm16
      e.g., at address 0x4c: beq $s1, $t3, -12
        if (REG[$s1] == REG[$t3]) {
            new_PC = old_PC + 4 - 12   # new_PC = 0x44
        } else {
            new_PC = old_PC + 4        # new_PC = 0x50
        }

    Field: op | rs | rt | displacement
    Bits:  31-26 | 25-21 | 20-16 | 15-0
    Width: 6 bits | 5 bits | 5 bits | 16 bits

  • Slide 5-48

    Jump (J-Type)
    – J
    – Example: j Label
      e.g., at address 0x8000 0000: j 0x111 1111 gives new_PC = 0x8444 4444

    Field: op | target address
    Bits:  31-26 | 25-0
    Width: 6 bits | 26 bits
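The slide's jump example can be reproduced in Python with the standard MIPS rule for forming the jump address, which the slide does not spell out: the 26-bit target is shifted left by 2 and combined with the upper 4 bits of PC + 4. `jump_target` is a hypothetical name.

```python
def jump_target(pc: int, target26: int) -> int:
    """New PC for j: upper 4 bits of PC + 4, then the 26-bit target shifted left 2."""
    return ((pc + 4) & 0xF000_0000) | ((target26 & 0x3FF_FFFF) << 2)


print(hex(jump_target(0x8000_0000, 0x111_1111)))   # 0x84444444
```

The shift by 2 works for the same reason as the word-aligned PC discussed later: instructions are 4 bytes long, so the two low address bits are always zero and need not be stored in the instruction.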

  • Slide 5-49

    Components Required to Implement the ISA
    – Next PC generation
      – Add 4 or the extended 16-bit immediate to the PC
    – Memory
      – Instruction read
      – Data read/write
    – Registers (32 x 32-bit)
      – Read register rs
      – Read register rt
      – Write register rt or rd
    – Sign extend the immediate operand
    – ALU to operate on the operands

  • Slide 5-50

    CPU: Instruction Fetch
    • RTL version of the instruction fetch step:
      – Fetch the instruction: mem[PC]
      – Update the program counter:
        • Sequential Code: PC

  • Slide 5-51

    CPU: Register-Register Operations (Add, Subtract etc.)

    R[rd]

  • Slide 5-52

    CPU: Load Operations

    R[rt]

  • Slide 5-53

    CPU: Store Operations

    Mem[ R[rs] + SignExt[imm16]

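  • Slide 5-54

    CPU: Datapath for Branching
    – beq rs, rt, imm16: the datapath generates the condition (equal)

    Field: op | rs | rt | immediate
    Bits:  31-26 | 25-21 | 20-16 | 15-0
    Width: 6 bits | 5 bits | 5 bits | 16 bits

    [Figure: next-PC logic with the PC (extended by "00" to form the instruction address), an adder that adds 4, and a second adder that adds imm16 after it is sign-extended to 32 bits and left-shifted by 2 (PC Ext); a mux controlled by nPC_sel picks the next PC, clocked by Clk. The register file (32 32-bit registers with Rw, Ra, Rb, busW, RegWr and Clk) reads Rs and Rt onto busA and busB, which are compared ("Equal?") to produce Cond.]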

  • Slide 5-55

    CPU: Binary Arithmetic for the PC
    • In theory, the PC is a 32-bit byte address into the instruction memory:
      – Sequential operation: PC = PC + 4
      – Branch operation: PC = PC + 4 + SignExt[Imm16] * 4
    • The magic number "4" always comes up because:
      – The 32-bit PC is a byte address
      – And all our instructions are 4 bytes (32 bits) long
    • In other words:
      – The 2 LSBs of the 32-bit PC are always zeros
      – There is no reason to have hardware to keep the 2 LSBs
    • In practice, we can simplify the hardware by using a 30-bit PC:
      – Sequential operation: PC = PC + 1
      – Branch operation: PC = PC + 1 + SignExt[Imm16]
      – In either case: Instruction Memory Address = PC concat "00"
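A Python sketch comparing the 32-bit byte-address PC with the 30-bit word-address PC; the helper names are hypothetical. It assumes that the beq example at 0x4c (displacement -12 bytes) is encoded with the word offset -3 (0xFFFD) in the imm16 field, which is consistent with the `SignExt[Imm16] * 4` formula above.

```python
MASK32 = 0xFFFF_FFFF
MASK30 = 0x3FFF_FFFF


def sign_extend16(imm16: int) -> int:
    """SignExt[imm16]: interpret a 16-bit field as two's complement."""
    imm16 &= 0xFFFF
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def next_pc32(pc: int, imm16: int, taken: bool) -> int:
    """32-bit byte-address PC: PC + 4, or PC + 4 + SignExt[imm16] * 4."""
    offset = sign_extend16(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & MASK32


def next_pc30(pc30: int, imm16: int, taken: bool) -> int:
    """30-bit word-address PC: PC + 1, or PC + 1 + SignExt[imm16]."""
    offset = sign_extend16(imm16) if taken else 0
    return (pc30 + 1 + offset) & MASK30


def imem_address(pc30: int) -> int:
    """Instruction Memory Address = PC concat "00"."""
    return pc30 << 2


# beq at 0x4c whose displacement field holds -3 (0xFFFD), i.e. -12 bytes
print(hex(next_pc32(0x4C, 0xFFFD, taken=True)))                     # 0x44
print(hex(imem_address(next_pc30(0x4C >> 2, 0xFFFD, taken=True))))  # 0x44
print(hex(next_pc32(0x4C, 0xFFFD, taken=False)))                    # 0x50
```

Both versions always produce the same instruction address; the 30-bit one simply never stores the two zero bits, so its adders are 2 bits narrower.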

  • Slide 5-56

    Single Cycle Implementation

    Putting it all together

    [Figure: complete single-cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, ALUOp and PCSrc. The PC supplies the instruction memory read address; Instruction [25-21] and [20-16] select read registers 1 and 2; a mux (RegDst) chooses Instruction [20-16] or [15-11] as the write register; Instruction [15-0] is sign-extended from 16 to 32 bits; a mux (ALUSrc) chooses Read data 2 or the extended immediate as the second ALU input; ALU control uses Instruction [5-0] and ALUOp; the ALU result (and Zero) drives the data memory address; a mux (MemtoReg) chooses the ALU result or the memory Read data as the register write data; an adder forms PC + 4, another adds the extended immediate shifted left 2, and a mux (PCSrc) selects the next PC.]
