lec05 design in parallel

Upload: aletharee

Post on 14-Apr-2018


  • 7/27/2019 Lec05 Design in parallel


    Computer Architecture

    ECE 361 Lecture 5: The Design Process & ALU Design


    Quick Review of Last Lecture


    MIPS ISA Design Objectives and Implications

    Support general OS and C-style language needs

    Support general and embedded applications

    Use dynamic workload characteristics from general-purpose program traces and SPECint to guide design decisions

    Implement processor core with a relatively small number of gates

    Emphasize performance via fast clock

    RISC-style: Register-Register / Load-Store

    Traditional data types, common operations, typical addressing modes


    MIPS jump, branch, compare instructions

    Instruction          Example          Meaning                          Comment
    branch on equal      beq $1,$2,100    if ($1 == $2) go to PC+4+100     Equal test; PC-relative branch
    branch on not eq.    bne $1,$2,100    if ($1 != $2) go to PC+4+100     Not equal test; PC-relative
    set on less than     slt $1,$2,$3     if ($2 < $3) $1=1; else $1=0     Compare less than; 2's comp.
    set less than imm.   slti $1,$2,100   if ($2 < 100) $1=1; else $1=0    Compare < constant; 2's comp.
    set less than uns.   sltu $1,$2,$3    if ($2 < $3) $1=1; else $1=0     Compare less than; natural numbers
    set l. t. imm. uns.  sltiu $1,$2,100  if ($2 < 100) $1=1; else $1=0    Compare < constant; natural numbers
    jump                 j 10000          go to 10000                      Jump to target address
    jump register        jr $31           go to $31                        For switch, procedure return
    jump and link        jal 10000        $31 = PC + 4; go to 10000        For procedure call
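The difference between the signed (slt) and natural-number (sltu) compares can be sketched in a few lines of Python. This is only an illustration: the helper names `to_signed32`, `slt`, and `sltu` are hypothetical, and registers are modeled as plain 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def slt(rs, rt):
    """Signed compare: 1 if rs < rt as two's complement values, else 0."""
    return 1 if to_signed32(rs) < to_signed32(rt) else 0

def sltu(rs, rt):
    """Unsigned compare: 1 if rs < rt as natural numbers, else 0."""
    return 1 if (rs & MASK32) < (rt & MASK32) else 0

# 0xFFFFFFFF reads as -1 when signed but as the largest value when unsigned.
print(slt(0xFFFFFFFF, 1))   # 1
print(sltu(0xFFFFFFFF, 1))  # 0
```

The same bit pattern gives opposite answers, which is why the ISA carries both instructions.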


    Example: MIPS Instruction Formats and Addressing Modes

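    All instructions 32 bits wide

    Field widths in bits (op rs rt rd, remainder): 6 5 5 5 11

    Register (direct): op rs rt rd; the operand is in a register

    Immediate: op rs rt immed; the operand is the immed field

    Base+index: op rs rt immed; register + immed addresses Memory

    PC-relative: op rs rt immed; PC + immed addresses Memory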


    MIPS Instruction Formats


    MIPS Operation Overview

    Arithmetic logical

    Add, AddU, AddI, ADDIU, Sub, SubU

    And, AndI, Or, OrI

    SLT, SLTI, SLTU, SLTIU

    SLL, SRL

    Memory Access

    LW, LB, LBU

    SW, SB


    Branch & Pipelines

    By the end of the Branch instruction, the CPU knows whether or not the branch will take place.

    However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken.

    Why not execute it?

    LL: slt  r1, r3, r5
        li   r3, #7
        sub  r4, r4, 1
        bz   r4, LL        <- Branch
        addi r5, r3, 1     <- Delay Slot
    LL: slt  r1, r3, r5    <- Branch Target

    [Figure: timing diagram along a Time axis; each instruction is drawn as an ifetch stage followed by an execute stage]


    The next Destination

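    [Figure: multiplier datapath. Labels recoverable from the figure: 34-bit ALU with Sub/Add control; Multiplicand Register (LoadMp); Input Multiplicand and Input Multiplier (32 bits each); 32=>34 sign extension; 34x2 MUX; HI and LO registers (LO as 16x2 bits, plus 2 extra bits) with LoadHI, ClearHI, LoadLO, and ShiftAll; Booth Encoder fed by LO[1:0] and the previous LO[1], with outputs ENC[0] and ENC[2]; Control Logic; Result[HI] and Result[LO], 32 bits each]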


    Outline of Today's Lecture

    An Overview of the Design Process

    Illustration using ALU design

    Refinements


    Design Process

    Design Finishes As Assembly

    -- Design understood in terms of components and how they have been assembled

    -- Top-down decomposition of complex functions (behaviors) into more primitive functions

    -- Bottom-up composition of primitive building blocks into more complex assemblies

    [Figure: hierarchy. CPU -> Datapath, Control; Datapath -> ALU, Regs, Shifter; lowest level -> Nand Gate]

    Design is a "creative process," not a simple method


    Design as Search

    Design involves educated guesses and verification

    -- Given the goals, how should these be prioritized?

    -- Given alternative design pieces, which should be selected?

    -- Given design space of components & assemblies, which part will yield the best solution?

    Feasible (good) choices vs. Optimal choices

    Problem A

    Strategy 1 Strategy 2

    SubProb 1 SubProb2 SubProb3

    BB1 BB2 BB3 BBn


    Problem: Design a fast ALU for the MIPS ISA

    Requirements?

    Must support the Arithmetic / Logic operations

    Tradeoffs of cost and speed based on frequency of occurrence, hardware budget


    MIPS ALU requirements

    Add, AddU, Sub, SubU, AddI, AddIU

    => 2's complement adder/subtractor with overflow detection

    And, Or, AndI, OrI, Xor, Xori, Nor

    => Logical AND, logical OR, XOR, NOR

    SLTI, SLTIU (set less than)

    => 2's complement adder with inverter, check sign bit of result


    MIPS arithmetic instruction format

    Signed arithmetic generates overflow, no carry

    Bit positions: 31 25 20 15 5 0

    R-type: op Rs Rt Rd funct

    I-type: op Rs Rt Immed 16

    Type    op   funct
    ADDI    10   xx
    ADDIU   11   xx
    SLTI    12   xx
    SLTIU   13   xx
    ANDI    14   xx
    ORI     15   xx
    XORI    16   xx
    LUI     17   xx

    Type    op   funct
    ADD     00   40
    ADDU    00   41
    SUB     00   42
    SUBU    00   43
    AND     00   44
    OR      00   45
    XOR     00   46
    NOR     00   47

    Type    op   funct
            00   50
            00   51
    SLT     00   52
    SLTU    00   53


    Refined Requirements

    (1) Functional Specification

    inputs: 2 x 32-bit operands A, B, 4-bit mode (sort of control)

    outputs: 32-bit result S, 1-bit carry, 1-bit overflow

    operations: add, addu, sub, subu, and, or, xor, nor, slt, sltU

    (2) Block Diagram (CAD-TOOL symbol, VHDL entity)

    [Figure: ALU block with 32-bit inputs A and B, 4-bit mode input m, 32-bit output S, and 1-bit outputs c and ovf]
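A behavioral model of this functional specification might look like the following Python sketch. It is an assumption-laden illustration, not the lecture's design: the function name `alu` is hypothetical, the 4-bit mode is replaced by operation-name strings because the slides do not give a mode encoding, and the carry and overflow outputs are reported as 0 for operations where the specification does not say what they should be.

```python
MASK32 = 0xFFFFFFFF

def alu(a, b, op):
    """Return (S, carry, overflow) for 32-bit operands a and b."""
    a &= MASK32
    b &= MASK32
    if op == "and":
        return a & b, 0, 0
    if op == "or":
        return a | b, 0, 0
    if op == "xor":
        return a ^ b, 0, 0
    if op == "nor":
        return ~(a | b) & MASK32, 0, 0
    if op not in ("add", "addu", "sub", "subu", "slt", "sltu"):
        raise ValueError("unknown operation: " + op)
    # Everything else shares one adder: A - B is computed as A + (~B) + 1.
    subtract = op not in ("add", "addu")
    bb = (~b & MASK32) if subtract else b
    total = a + bb + (1 if subtract else 0)
    s = total & MASK32
    carry = total >> 32
    # Signed overflow: both adder inputs share a sign that the result lacks.
    signed_ovf = (((a ^ s) & (bb ^ s)) >> 31) & 1
    if op == "slt":
        # The true sign of A - B is the result sign corrected by overflow.
        return ((s >> 31) & 1) ^ signed_ovf, 0, 0
    if op == "sltu":
        # Unsigned A < B exactly when the subtraction has no carry out.
        return 1 - carry, 0, 0
    ovf = signed_ovf if op in ("add", "sub") else 0
    return s, carry, ovf

print(alu(7, 3, "add"))                    # (10, 0, 0)
print(alu(0x7FFFFFFF, 1, "add")[2])        # 1
```

A model like this is useful as a reference when checking a gate-level design against the specification.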


    Behavioral Representation: VHDL

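    -- Delays are physical quantities, so the generics are declared as time.
    entity ALU is
      generic (c_delay: time := 20 ns;
               S_delay: time := 20 ns);
      port (signal A, B: in  vlbit_vector (0 to 31);  -- operands
            signal m:    in  vlbit_vector (0 to 3);   -- mode
            signal S:    out vlbit_vector (0 to 31);  -- result
            signal c:    out vlbit;                   -- carry
            signal ovf:  out vlbit);                  -- overflow
    end ALU;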

    . . .

    S


    Design Decisions

    Simple bit-slice

    big combinational problem

    many little combinational problems

    partition into 2-step problem

    Bit slice with carry look-ahead

    . . .

    [Figure: design tree. ALU -> bit slice -> either one 7-to-2 C/L block (PLD or Gates) or seven 3-to-2 C/L blocks, CL0 ... CL6 (mux)]


    Refined Diagram: bit-slice ALU

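    [Figure: 32-bit ALU built from one-bit slices ALU0 ... ALU31. Slice i takes a_i, b_i, the mode m, and cin, and produces s_i and co. Inputs A and B and output S are 32 bits wide, the mode M is 4 bits wide, and the ALU also produces Ovflw]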


    7-to-2 Combinational Logic

    start turning the crank . . .

    Function   Inputs                   Outputs    K-Map
               M0 M1 M2 M3 A  B  Cin    S  Cout
    add        0  0  0  0  0  0  0      0  0       (row 0)
    . . .                                          (row 127)


    A One Bit ALU

    This 1-bit ALU will perform AND, OR, and ADD

    [Figure: 1-bit ALU. Inputs A and B feed a 1-bit Full Adder with CarryIn and CarryOut; a Mux selects the Result]


    A One-bit Full Adder

    This is also called a (3, 2) adder

    Half Adder: no CarryIn (it still produces a CarryOut)

    [Figure: 1-bit Full Adder block; labels A, B, C, CarryIn, CarryOut]

    Truth Table:

    Inputs           Outputs
    A  B  CarryIn    CarryOut  Sum    Comments
    0  0  0          0         0      0 + 0 + 0 = 00
    0  0  1          0         1      0 + 0 + 1 = 01
    0  1  0          0         1      0 + 1 + 0 = 01
    0  1  1          1         0      0 + 1 + 1 = 10
    1  0  0          0         1      1 + 0 + 0 = 01
    1  0  1          1         0      1 + 0 + 1 = 10
    1  1  0          1         0      1 + 1 + 0 = 10
    1  1  1          1         1      1 + 1 + 1 = 11


    Logic Equation for CarryOut

    CarryOut = (!A & B & CarryIn) | (A & !B & CarryIn) | (A & B & !CarryIn) | (A & B & CarryIn)

    CarryOut = B & CarryIn | A & CarryIn | A & B
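That the three-term form equals the four-minterm form can be checked by brute force over all eight input combinations. The Python sketch below does this; the function names are hypothetical and bits are modeled as the integers 0 and 1.

```python
from itertools import product

def carry_minterms(a, b, cin):
    """CarryOut written as the OR of the four truth-table rows where it is 1."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    return (na & b & cin) | (a & nb & cin) | (a & b & nc) | (a & b & cin)

def carry_simplified(a, b, cin):
    """CarryOut after simplification: any two inputs being 1 is enough."""
    return (b & cin) | (a & cin) | (a & b)

# Compare both forms against the arithmetic definition of a carry.
all_match = all(
    carry_minterms(a, b, c) == carry_simplified(a, b, c) == (a + b + c) >> 1
    for a, b, c in product((0, 1), repeat=3)
)
print(all_match)  # True
```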


    Logic Equation for Sum

    Sum = (!A & !B & CarryIn) | (!A & B & !CarryIn) | (A & !B & !CarryIn) | (A & B & CarryIn)


    Logic Equation for Sum (continued)

    Sum = (!A & !B & CarryIn) | (!A & B & !CarryIn) | (A & !B & !CarryIn) | (A & B & CarryIn)

    Sum = A XOR B XOR CarryIn

    Truth Table for XOR:

    X Y X XOR Y

    0 0 0

    0 1 1

    1 0 1

    1 1 0
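The XOR form of Sum can be checked against the minterm form in the same brute-force way, and the two output equations together give a complete full adder. A minimal Python sketch, with hypothetical function names and bits modeled as 0 and 1:

```python
from itertools import product

def sum_minterms(a, b, cin):
    """Sum written as the OR of the four truth-table rows where it is 1."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    return (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)

def full_adder(a, b, cin):
    """Return (sum, carry_out) using the simplified equations."""
    return a ^ b ^ cin, (b & cin) | (a & cin) | (a & b)

# The XOR form must agree with the minterm form, and (carry_out, sum)
# must be the 2-bit binary count of how many inputs are 1.
xor_matches = all(
    full_adder(a, b, c)[0] == sum_minterms(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
counts_ones = all(
    2 * full_adder(a, b, c)[1] + full_adder(a, b, c)[0] == a + b + c
    for a, b, c in product((0, 1), repeat=3)
)
print(xor_matches, counts_ones)  # True True
```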


    Logic Diagrams for CarryOut and Sum

    CarryOut = B & CarryIn | A & CarryIn | A & B

    Sum = A XOR B XOR CarryIn

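    [Figure: gate-level diagrams. One circuit takes A, B, and CarryIn and produces CarryOut; the other takes A, B, and CarryIn and produces Sum]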


    Seven plus a MUX ?

    [Figure: 1-bit ALU. A and B feed a 1-bit Full Adder with CarryIn and CarryOut; a Mux driven by S-select chooses among add, and, or to produce Result]

    Design trick 2: take pieces you know (or can imagine) and try to put them together

    Design trick 3: solve part of the problem and extend
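The "full adder plus a mux" slice, and the way 32 copies chain through their carries, can be sketched in Python as follows. The names `alu_slice` and `ripple_alu` and the select encoding (0 = and, 1 = or, 2 = add) are assumptions made for this illustration; the slides do not fix an encoding.

```python
def alu_slice(a, b, carry_in, select):
    """One bit of the ALU: compute all candidates, then let the mux pick."""
    and_out = a & b
    or_out = a | b
    sum_out = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (and_out, or_out, sum_out)[select]  # the mux
    return result, carry_out

def ripple_alu(a, b, select, width=32):
    """Chain `width` slices; each slice's carry_out feeds the next carry_in."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = alu_slice((a >> i) & 1, (b >> i) & 1, carry, select)
        result |= bit << i
    return result, carry

print(ripple_alu(0b1100, 0b1010, 0))  # (8, 0)
print(ripple_alu(0b1100, 0b1010, 1))  # (14, 0)
print(ripple_alu(0b1100, 0b1010, 2))  # (22, 0)
```

Every slice always computes all three candidate results; the select input only decides which one reaches the output, which mirrors how the hardware mux works.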


    Additional operations

    A - B = A + (-B)

    Form the two's complement by inverting and adding one

    [Figure: the 1-bit ALU (1-bit Full Adder with CarryIn and CarryOut, Mux choosing among add, and, or via S-select) extended with an invert control on B]

    Set-less-than? Left as an exercise
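Subtraction by "invert and add one" can be sketched in Python as a ripple adder whose B bits pass through an XOR with the subtract control, with the same control supplying the "+1" as the carry into the least significant bit. The function name `add_sub` and its parameters are hypothetical.

```python
def add_sub(a, b, subtract, width=32):
    """Return (result, carry_out) of a + b, or of a - b when subtract is 1."""
    carry = subtract  # the "+1" of the two's complement enters as the LSB carry-in
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract  # invert each B bit when subtracting
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry

print(add_sub(7, 3, 0))       # (10, 0)
print(add_sub(7, 3, 1))       # (4, 1)
print(hex(add_sub(3, 7, 1)[0]))  # 0xfffffffc, the 32-bit pattern for -4
```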


    Revised Diagram

    LSB and MSB need to do a little extra

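    [Figure: 32-bit ALU built from slices ALU0 ... ALU31. Slice i takes a_i, b_i, and cin and produces s_i and co. A C/L block turns the 4-bit mode M into select, comp, and c-in. Inputs A and B and output S are 32 bits wide. The logic that produces Ovflw is still marked "?"]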


    Overflow

    Examples: 7 + 3 = 10 but ...

    -4 - 5 = -9 but ...

    Decimal  Binary    Decimal  2's Complement
    0        0000       0       0000
    1        0001      -1       1111
    2        0010      -2       1110
    3        0011      -3       1101
    4        0100      -4       1100
    5        0101      -5       1011
    6        0110      -6       1010
    7        0111      -7       1001
                       -8       1000

      0 1 1 1     7
    + 0 0 1 1     3
      1 0 1 0    -6     (carry into the MSB is 1, carry out is 0)

      1 1 0 0    -4
    + 1 0 1 1    -5
      0 1 1 1     7     (carry into the MSB is 0, carry out is 1)
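The two overflow examples can be reproduced with a small Python sketch that adds in 4 bits and then reads the result back as a two's complement number. The helper names `add_wrapped` and `as_signed` are hypothetical.

```python
def as_signed(bits, width=4):
    """Read a width-bit pattern as a two's complement integer."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

def add_wrapped(x, y, width=4):
    """Add two integers and keep only the low `width` bits of the sum."""
    mask = (1 << width) - 1
    return ((x & mask) + (y & mask)) & mask

r1 = add_wrapped(7, 3)
r2 = add_wrapped(-4, -5)
print(format(r1, "04b"), as_signed(r1))  # 1010 -6
print(format(r2, "04b"), as_signed(r2))  # 0111 7
```

In both cases the true answer (10 or -9) lies outside the 4-bit signed range of -8 to 7, so the stored pattern decodes to a value with the wrong sign.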


    More Revised Diagram

    LSB and MSB need to do a little extra

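    [Figure: 32-bit ALU built from slices ALU0 ... ALU31. Slice i takes a_i, b_i, and cin and produces s_i and co. A C/L block turns the 4-bit mode M into select, comp, and c-in. Inputs A and B and output S are 32 bits wide. Ovflw = signed-arith and (cin xor co), taken at the MSB slice]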
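The rule that signed overflow is the XOR of the carry into and the carry out of the most significant bit can be checked exhaustively for a small width. The Python sketch below does so for 4 bits; the function name `add_with_overflow` is hypothetical.

```python
def add_with_overflow(a, b, width=4):
    """Ripple-add two width-bit patterns; return (result, overflow)."""
    carry = 0
    carry_in = 0
    result = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry_in = carry  # carry entering bit i
        result |= (ai ^ bi ^ carry_in) << i
        carry = (ai & bi) | (ai & carry_in) | (bi & carry_in)
    # After the loop, carry_in and carry belong to the MSB slice.
    return result, carry_in ^ carry

# Overflow should be flagged exactly when the true sum leaves -8..7.
rule_holds = all(
    add_with_overflow(x & 0xF, y & 0xF)[1] == int(not -8 <= x + y <= 7)
    for x in range(-8, 8)
    for y in range(-8, 8)
)
print(rule_holds)  # True
```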


    Carry Look Ahead (Design trick: peek)

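    A  B  C-out
    0  0  0      kill
    0  1  C-in   propagate
    1  0  C-in   propagate
    1  1  1      generate

    P = A xor B

    G = A and B

    [Figure: chain of 1-bit adder cells starting from Cin, each with inputs A and B and outputs S, G, and P; the block as a whole also produces G and P outputs]

    C1 = G0 + C0 P0

    C2 = G1 + G0 P1 + C0 P0 P1

    C3 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2

    C4 = . . .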
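The carry equations follow one pattern: each carry is an OR of product terms built only from the G and P signals and C0, never from another carry. The Python sketch below builds those flat terms for any width and checks them against ordinary addition; the function name `lookahead_carries` is hypothetical.

```python
def lookahead_carries(a, b, c0, width=4):
    """Return [C0, C1, ..., C_width], each computed directly from G, P, and C0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    carries = [c0]
    for i in range(width):
        # Term for C0: it must be propagated through bits 0..i.
        c = c0
        for k in range(i + 1):
            c &= p[k]
        # Term for each G_j: generated at bit j, propagated through j+1..i.
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        carries.append(c)
    return carries

print(lookahead_carries(0b0111, 0b0011, 0))  # [0, 1, 1, 1, 0]

# The final carry must match the carry out of ordinary 4-bit addition.
matches_adder = all(
    lookahead_carries(a, b, c)[4] == (a + b + c) >> 4
    for a in range(16) for b in range(16) for c in (0, 1)
)
print(matches_adder)  # True
```

The nested loops stand in for gates that all operate at the same time in hardware, which is why the term count, and so the cost, grows quickly with the width.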


    A Partial Carry Lookahead Adder

    It is very expensive to build a full carry lookahead adder

    Just imagine the length of the equation for Cin31

    Common practice:

    Connect several N-bit lookahead adders to form a big adder

    Example: connect four 8-bit carry lookahead adders to form a 32-bit partial carry lookahead adder

    [Figure: four 8-bit Carry Lookahead Adders connected in series. Carry inputs C0, C8, C16, C24; operand pairs A[7:0]/B[7:0], A[15:8]/B[15:8], A[23:16]/B[23:16], A[31:24]/B[31:24]; outputs Result[7:0], Result[15:8], Result[23:16], Result[31:24]]
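The four-block arrangement can be sketched in Python as below. This is a functional model only, with hypothetical names `cla_block` and `partial_cla_add`: inside a block the carries are written with the recurrence C(i+1) = G_i + P_i C_i, which computes the same Boolean function that lookahead hardware would flatten into parallel terms, while the carry between blocks ripples from one block to the next.

```python
def cla_block(a, b, cin, width=8):
    """One lookahead block: return (sum, carry_out) for width-bit inputs."""
    total = 0
    carry = cin
    for i in range(width):
        g = (a >> i) & (b >> i) & 1    # generate
        p = ((a >> i) ^ (b >> i)) & 1  # propagate
        total |= (p ^ carry) << i      # S_i = P_i xor C_i
        carry = g | (p & carry)        # C(i+1) = G_i + P_i C_i
    return total, carry

def partial_cla_add(a, b, c0=0, block=8, blocks=4):
    """Chain `blocks` lookahead blocks; the carry ripples between blocks."""
    mask = (1 << block) - 1
    result, carry = 0, c0
    for k in range(blocks):
        shift = k * block
        s, carry = cla_block((a >> shift) & mask, (b >> shift) & mask, carry, block)
        result |= s << shift
    return result, carry

print(partial_cla_add(0x00FF00FF, 0x00000001))  # (16711936, 0), i.e. 0x00FF0100
print(partial_cla_add(0xFFFFFFFF, 1))           # (0, 1)
```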


    Design Trick: Guess

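    [Figure: two n-bit adders in series] CP(2n) = 2*CP(n)

    [Figure: three n-bit adders; one adds the low half, and two add the high half with carry-in 1 and carry-in 0, with the low half's Cout selecting between them] CP(2n) = CP(n) + CP(mux)

    Carry-select adder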


    Carry Select

    Consider building an 8-bit ALU

    Simple: connect two 4-bit ALUs in series

    [Figure: the low 4-bit ALU takes A[3:0], B[3:0], and CarryIn and produces Result[3:0]; its carry feeds the high 4-bit ALU, which takes A[7:4] and B[7:4] and produces Result[7:4] and CarryOut]


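    Carry Select (Continued)

    Consider building an 8-bit ALU

    Expensive but faster: use three 4-bit ALUs

    [Figure: one 4-bit ALU takes A[3:0], B[3:0], and CarryIn and produces Result[3:0] and C4. Two more 4-bit ALUs both take A[7:4] and B[7:4]: one with carry-in 0 produces X[7:4] and C0, the other with carry-in 1 produces Y[7:4] and C1. A 2-to-1 MUX with Sel = C4 picks Result[7:4] from X and Y, and a second 2-to-1 MUX with Sel = C4 picks CarryOut from C0 and C1]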
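The three-ALU carry-select arrangement can be sketched in Python as follows, restricted to addition. The names X, Y, C0, C1, and C4 follow the figure labels; the function names `add4` and `carry_select_add8` are hypothetical, and each 4-bit ALU is modeled with ordinary integer addition.

```python
def add4(a, b, cin):
    """A 4-bit adder: return (4-bit sum, carry_out)."""
    total = (a & 0xF) + (b & 0xF) + cin
    return total & 0xF, total >> 4

def carry_select_add8(a, b, carry_in=0):
    """8-bit add using one low adder and two speculative high adders."""
    low, c4 = add4(a, b, carry_in)
    x, c0 = add4(a >> 4, b >> 4, 0)  # high half assuming the low carry is 0
    y, c1 = add4(a >> 4, b >> 4, 1)  # high half assuming the low carry is 1
    # The two 2-to-1 muxes, both selected by C4.
    high, carry_out = (y, c1) if c4 else (x, c0)
    return (high << 4) | low, carry_out

print(carry_select_add8(0x0F, 0x01))  # (16, 0)
print(carry_select_add8(0xFF, 0x01))  # (0, 1)
```

All three adders can work at the same time, so the delay is roughly one 4-bit adder plus a mux instead of two adders in series, at the price of a third adder.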


    Additional MIPS ALU requirements

    Mult, MultU, Div, DivU (next lecture)

    => Need 32-bit multiply and divide, signed and unsigned

    Sll, Srl, Sra (next lecture)

    => Need left shift, right shift, right shift arithmetic by 0 to 31 bits

    Nor (left as an exercise to the reader)

    => Logical NOR, or use 2 steps: (A OR B) XOR 1111....1111
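The two-step NOR can be sketched in Python on 32-bit values. The function name `nor_two_step` is hypothetical; XOR with an all-ones word flips every bit, so it acts as the inverter.

```python
MASK32 = 0xFFFFFFFF

def nor_two_step(a, b):
    """NOR built from an OR followed by an XOR with all ones."""
    step1 = (a | b) & MASK32   # step 1: A OR B
    return step1 ^ MASK32      # step 2: XOR 1111....1111 inverts each bit

print(hex(nor_two_step(0xF0F0F0F0, 0x0F0F0000)))  # 0xf0f
print(hex(nor_two_step(0, 0)))                    # 0xffffffff
```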
