pipe2


Upload: abdullah-ashraf-elkordy

Post on 19-Sep-2015


  • CSE 240A Dean Tullsen

    Pipeline Hazards

    or Danger! Danger! Danger!

    Data Hazards

        ADD R1, R2, R3
        SUB R4, R1, R5
        AND R6, R1, R7
        OR  R8, R1, R9
        XOR R10, R1, R11

    [Figure: pipeline diagram. Horizontal axis: time (in clock cycles), CC 1 through CC 6. Vertical axis: program execution order (in instructions). Each instruction passes through IM, Reg, ALU, DM, Reg.]

    Data Hazard

        add  R6, R2, R1
        addi R3, R1, #35
        lw   R8, 10000(R3)

    [Figure: pipelined datapath with PC, instruction memory, the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, the register file (IR6..10, IR11..15, MEM/WB.IR), sign extend (16 to 32 bits), ALU, Zero? test, branch-taken signal, data memory, and muxes.]

    Data Dependence

    Data hazards are caused by data dependences.
    Data dependences, and thus data hazards, come in 3 flavors (not all of which apply to this pipeline):
        RAW (read-after-write)
        WAW (write-after-write)
        WAR (write-after-read)

  • RAW Hazard

    A later instruction tries to read an operand before an earlier instruction writes it.

    The dependence:
        add R1, R2, R3
        sub R5, R1, R4

    The hazard:
        add R1, R2, R3    IF  ID  EX  MEM WB
        sub R5, R1, R4        IF  ID  EX  MEM WB

    RAW hazard is extremely common.

    WAW Hazard

    A later instruction tries to write an operand before an earlier instruction writes it.

    The dependence:
        add R1, R2, R3
        sub R1, R2, R4

    The hazard:
        lw  R1, R2, R3    IF  ID  EX  MEM MEM2 MEM3 WB
        sub R1, R2, R4        IF  ID  EX  MEM WB

    WAW hazard is possible in a reasonable pipeline, but not in the very simple pipeline we're assuming.

    WAR Hazard

    A later instruction tries to write an operand before an earlier instruction reads it.

    The dependence:
        add R1, R2, R3
        sub R2, R5, R4

    The hazard?
        add R1, R2, R3    IF  ID  EX  MEM WB
        sub R2, R5, R4        IF  ID  EX  MEM WB

    WAR hazard is uncommon/impossible in a reasonable (in-order) pipeline.

    Dealing with Data Hazards through Forwarding

        ADD R1, R2, R3
        SUB R4, R1, R5
        AND R6, R1, R7
        OR  R8, R1, R9
        XOR R10, R1, R11

    [Figure: pipeline diagram, CC 1 through CC 6. Each instruction passes through IM, Reg, ALU, DM, Reg.]

  • Dealing with Data Hazards through Forwarding

        SUB R4, R1, R5
        ADD R1, R2, R3
        AND R6, R4, R7

    [Figure: pipelined datapath with PC, instruction memory, the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, the register file (IR6..10, IR11..15, MEM/WB.IR), sign extend (16 to 32 bits), ALU, Zero? test, branch-taken signal, data memory, and muxes.]

    Forwarding Options

        ADD -> ADD
        ADD -> LW
        ADD -> SW (2 operands)
        LW -> ADD
        LW -> LW
        LW -> SW (2 operands)

    (I'm letting ADD stand in for all ALU operations)

    More Forwarding

        ADD R1, R2, R3
        LW  R4, 0(R1)
        SW  12(R1), R4

    [Figure: pipeline diagram, CC 1 through CC 6. Each instruction passes through IM, Reg, ALU, DM, Reg.]

    Forwarding and Stalling

        LW  R1, 0(R2)
        SUB R4, R1, R5
        AND R6, R1, R7
        OR  R8, R1, R9

    [Figure: pipeline diagram, CC 1 through CC 6, with three bubbles marked in the instructions that follow the LW.]

  • Example

    ADD R1, R2, R3

    SW R1, 1000(R2)

    LW R7, 2000(R2)

    ADD R5, R7, R1

    LW R8, 2004(R2)

    SW R7, 2008(R8)

    ADD R8, R8, R2

    LW R9, 1012(R8)

    SW R9, 1016(R8)

    Avoiding Pipeline Stalls

        lw  R1, 1000(R2)
        lw  R3, 2000(R2)
        add R4, R1, R3
        lw  R1, 3000(R2)
        add R6, R4, R1
        sw  R6, 1000(R2)

    Reordering instructions to avoid these stalls is a compiler technique called instruction scheduling.
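
To make the effect of scheduling concrete, here is a rough sketch that counts load-use stalls in the sequence above and in one possible reordering. It assumes a 5-stage pipeline with full forwarding, so the only stall is one cycle when an instruction reads a register loaded by the immediately preceding `lw`. The function name, the tuple encoding, and the reordered schedule (which renames the third load's destination to R5, a register the slide's code does not use) are illustrative assumptions, not taken from the slides.

```python
# Each instruction is (opcode, destination register or None, source registers).
# Assumption: with full forwarding, an instruction stalls one cycle only when
# it reads a register loaded by the lw directly in front of it.

def load_use_stalls(program):
    """Count one-cycle load-use stalls in a straight-line instruction list."""
    stalls = 0
    for producer, consumer in zip(program, program[1:]):
        opcode, dest, _ = producer
        _, _, sources = consumer
        if opcode == "lw" and dest in sources:
            stalls += 1
    return stalls

original = [
    ("lw",  "R1", ("R2",)),        # lw  R1, 1000(R2)
    ("lw",  "R3", ("R2",)),        # lw  R3, 2000(R2)
    ("add", "R4", ("R1", "R3")),   # add R4, R1, R3  (R3 loaded just before)
    ("lw",  "R1", ("R2",)),        # lw  R1, 3000(R2)
    ("add", "R6", ("R4", "R1")),   # add R6, R4, R1  (R1 loaded just before)
    ("sw",  None, ("R6", "R2")),   # sw  R6, 1000(R2)
]

# Hypothetical schedule: hoist the third load above the first add and
# rename its destination to R5 so it no longer overwrites R1 too early.
scheduled = [
    ("lw",  "R1", ("R2",)),        # lw  R1, 1000(R2)
    ("lw",  "R3", ("R2",)),        # lw  R3, 2000(R2)
    ("lw",  "R5", ("R2",)),        # lw  R5, 3000(R2)
    ("add", "R4", ("R1", "R3")),   # add R4, R1, R3
    ("add", "R6", ("R4", "R5")),   # add R6, R4, R5
    ("sw",  None, ("R6", "R2")),   # sw  R6, 1000(R2)
]

print(load_use_stalls(original), load_use_stalls(scheduled))  # 2 0
```

The sketch treats every source register as needed in EX, which is conservative for store data; no store directly follows a load here, so the count is unaffected.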

    How big a problem are these pipeline stalls?

        13% of the loads in FP programs
        25% of the loads in integer programs

    Detecting ALU Input Hazards

    [Figure: the ID/EX, EX/MEM, and MEM/WB pipeline registers with their opcode, rd, rs1, and rs2 fields, compared to detect hazards on the ALU inputs.]
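
The comparison this slide describes can be sketched as a small function that decides, for one ALU source register, where the operand should come from. This is an illustrative sketch only: the `Latch` record, the opcode classes, and the returned labels are assumptions, and R0 is treated as the hardwired zero register as in MIPS. A real design would typically detect the load-use case a stage earlier, in ID.

```python
from dataclasses import dataclass

@dataclass
class Latch:
    opcode: str    # "alu", "lw", "sw", or "nop" (simplified opcode classes)
    rd: str = ""   # destination register, "" if the instruction writes none

def writes_register(latch):
    """True if the instruction in this latch will write a real register."""
    return latch.opcode in ("alu", "lw") and latch.rd not in ("", "R0")

def operand_source(src, ex_mem, mem_wb):
    """Pick the source of ALU operand `src` for the instruction in ID/EX."""
    if writes_register(ex_mem) and ex_mem.rd == src:
        # A load's data is not available yet one stage ahead: must stall.
        return "STALL" if ex_mem.opcode == "lw" else "EX/MEM"
    if writes_register(mem_wb) and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"

# ADD R1, R2, R3 followed directly by SUB R4, R1, R5
print(operand_source("R1", Latch("alu", "R1"), Latch("nop")))       # EX/MEM
# Same ADD, two instructions ahead of AND R6, R1, R7
print(operand_source("R1", Latch("alu", "R4"), Latch("alu", "R1")))  # MEM/WB
# LW R1, 0(R2) followed directly by SUB R4, R1, R5
print(operand_source("R1", Latch("lw", "R1"), Latch("nop")))        # STALL
```

Checking EX/MEM before MEM/WB matters: if both stages write the same register, the younger result in EX/MEM is the one the consumer should see.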

  • Inserting Bubbles

    Set all control values in the EX/MEM register to safe values (equivalent to a nop).
    Keep same values in the ID/EX register and IF/ID register.
    Keep PC from incrementing.
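
The three rules above can be sketched as one state-update function. This is a toy model under stated assumptions: the dictionary layout, the control-signal names, and the example PC value are hypothetical, and the sketch also assumes the instruction previously in EX/MEM advances to MEM/WB as usual during the stall cycle.

```python
# Control values that make an instruction harmless (equivalent to a nop).
# The signal names are illustrative, not taken from the slides.
SAFE_CONTROL = {"reg_write": False, "mem_read": False, "mem_write": False}

def stall_cycle(state):
    """Return the pipeline state after one stall (bubble-insertion) cycle."""
    return {
        "PC": state["PC"],          # keep PC from incrementing
        "IF/ID": state["IF/ID"],    # keep same value
        "ID/EX": state["ID/EX"],    # keep same value
        "EX/MEM": {"instr": "bubble", "control": dict(SAFE_CONTROL)},
        "MEM/WB": state["EX/MEM"],  # assumption: older instruction advances
    }

before = {
    "PC": 12,
    "IF/ID": "and R6, R1, R7",
    "ID/EX": "sub R4, R1, R5",
    "EX/MEM": {
        "instr": "lw R1, 0(R2)",
        "control": {"reg_write": True, "mem_read": True, "mem_write": False},
    },
    "MEM/WB": None,
}

after = stall_cycle(before)
print(after["PC"], after["ID/EX"], after["EX/MEM"]["instr"])
# 12 sub R4, R1, R5 bubble
```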

    Adding Datapaths

    [Figure: datapath detail showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, the ALU with its Zero? test, data memory, and the muxes added at the ALU inputs.]

    Control Hazards

    Instructions are not only dependent on instructions that produce their operands, but also on all previous control flow (branch, jump) instructions that lead to that instruction.

    [Figure: control-flow example. Instruction sequence: add, add, bne, sub, and, beq; a second sequence: and, sub; one edge is labeled "branch taken".]

    Branch Hazards

        Branch                IF  ID  EX  MEM WB
        I2                        IF  ID  EX  MEM WB
        I3                            IF  ID  EX  MEM WB
        I4                                IF  ID  EX  MEM WB
        correct instruction                   IF  ID  EX  MEM WB

  • Branch Stall Impact

    If CPI = 1, 30% branch, stall 3 cycles => new CPI = 1.9!
    Two-part solution:
        Determine branch taken or not sooner, AND
        Compute taken branch address earlier
    (limited MIPS) branch tests if register = 0 or ≠ 0
    MIPS solution:
        Move Zero test to ID/RF stage
        Adder to calculate new PC in ID/RF stage
        1 clock cycle penalty for branch versus 3
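
As a worked check of the numbers on this slide, the usual stall formula gives the following. The symbol names are chosen here for illustration and do not appear in the slides.

```latex
\begin{align*}
\mathrm{CPI}_{\text{new}} &= \mathrm{CPI}_{\text{base}} + f_{\text{branch}} \times \text{penalty} \\
\text{3-cycle branch stall:}\quad \mathrm{CPI}_{\text{new}} &= 1 + 0.30 \times 3 = 1.9 \\
\text{1-cycle branch stall:}\quad \mathrm{CPI}_{\text{new}} &= 1 + 0.30 \times 1 = 1.3
\end{align*}
```

Moving the zero test and the target adder into ID/RF therefore removes two thirds of the branch stall cycles in this example.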

    New Datapath

    [Figure: pipelined datapath with PC, instruction memory, two adders, the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, the register file (IR6..10, IR11..15, MEM/WB.IR), Zero? test, sign extend (16 to 32 bits), ALU, data memory, and muxes.]

    Branch Hazards

        Branch                IF  ID  EX  MEM WB
        I2                        IF  ID  EX  MEM WB
        correct instruction           IF  ID  EX  MEM WB
        I4                                IF  ID  EX  MEM WB
        I5                                    IF  ID  EX  MEM WB

    What We Know About Branches

        more conditional branches than unconditional
        more forward than backward
        67% of branches taken
        backward branches taken 80%

  • Four Branch Hazard Alternatives

    #1: Stall until branch direction is clear
    #2: Predict Branch Not Taken
        Execute successor instructions in sequence
        Squash instructions in pipeline if branch actually taken
        Advantage of late pipeline state update
        33% MIPS branches not taken on average
        PC+4 already calculated, so use it to get next instruction
    #3: Predict Branch Taken
        67% MIPS branches taken on average
        But haven't calculated branch target address in this MIPS architecture
            MIPS still incurs 1 cycle branch penalty
            Other machines: branch target known before outcome
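
To compare alternatives #1 and #2 numerically, the sketch below reuses figures quoted in these slides (base CPI of 1, 30% branches, 67% taken, 1-cycle penalty once the branch is resolved in ID). The function and parameter names are hypothetical, and the model assumes a mispredicted branch costs exactly the full penalty.

```python
def branch_cpi(base_cpi, branch_freq, penalty_cycles, penalized_fraction=1.0):
    """CPI when `penalized_fraction` of branches pay `penalty_cycles` each."""
    return base_cpi + branch_freq * penalized_fraction * penalty_cycles

# #1: always stall, so every branch pays the 1-cycle penalty.
stall_cpi = branch_cpi(1.0, 0.30, 1)

# #2: predict not taken, so only taken branches (67%) pay the penalty.
not_taken_cpi = branch_cpi(1.0, 0.30, 1, penalized_fraction=0.67)

print(f"stall: {stall_cpi:.3f}  predict-not-taken: {not_taken_cpi:.3f}")
# stall: 1.300  predict-not-taken: 1.201
```

Under these assumptions, predicting not taken recovers only the 33% of branches that fall through, which is why the gain over stalling is modest.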

    Four Branch Hazard Alternatives

    #4: Delayed Branch
        Define branch to take place AFTER a following instruction

            branch instruction
            sequential successor 1
            sequential successor 2
            ........
            sequential successor n      (branch delay of length n)
            branch target if taken

        1 slot delay allows proper decision and branch target address in 5 stage pipeline
        MIPS uses this

    Delayed Branch

    Where to get instructions to fill branch delay slot?
        Before branch instruction
        From the target address: only valuable when branch taken
        From fall through: only valuable when branch not taken
        Cancelling branches allow more slots to be filled

    Compiler effectiveness for single branch delay slot:
        Fills about 60% of branch delay slots
        About 80% of instructions executed in branch delay slots useful in computation
        About 50% (60% x 80%) of slots usefully filled
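
The "about 50%" figure is simply the product of the two rates. The second line below is an extrapolation, not from the slides: it assumes the 30% branch frequency quoted earlier and treats every delay slot that is not usefully filled as one wasted cycle.

```latex
\begin{align*}
\text{usefully filled slots} &= 0.60 \times 0.80 = 0.48 \approx 50\% \\
\mathrm{CPI} &\approx 1 + 0.30 \times (1 - 0.48) \times 1 \approx 1.16
\end{align*}
```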

    Key Points

    Hard to keep the pipeline completely full
    Data hazards require dependent instructions to wait for the producer instruction
        Most of the problem handled with forwarding (bypassing)
        Sometimes stall still required (especially in modern processors)
    Control hazards require control-dependent (post-branch) instructions to wait for the branch to be resolved