Lecture 15: Pipelining and Hazards
CS 2011
Fall 2014, Dr. Rozier
MIDTERM II
Midterm II
November 13th
Plan for remaining time
Midterm II Topics
• Any topic from Midterm I is fair game
  – Mostly important in terms of how those topics relate to what we have done since
• New topics
  – Code analysis
  – Load/Store
  – Decision and Control
  – Branching
  – The Stack
  – Integer Arithmetic
  – Floating Point
  – Pipelining and Hazards
PIPELINING AND HAZARDS
What needs to be done to “Process” an Instruction?
• Check the PC
• Fetch the instruction from memory
• Decode the instruction and set control lines appropriately
• Execute the instruction
  – Use ALU
  – Access Memory
  – Branch
• Store Results
• PC = PC + 4, or PC = branch target
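The steps above can be sketched as a tiny interpreter loop. This is an illustrative Python model, not real ARM: the tuple encoding of instructions, the `run` function, and the three-opcode subset (add, sub, beq) are all assumptions made for the example.

```python
def run(program, regs, max_steps=100):
    """Process instructions one at a time: fetch, decode, execute, update PC.

    `program` maps byte addresses (multiples of 4) to tuples such as
    ("add", "r0", "r1", "r2") or ("beq", "r0", "r1", target_address).
    This encoding is hypothetical and only covers add, sub, and beq.
    """
    pc = 0
    for _ in range(max_steps):
        if pc not in program:          # check the PC: stop past the end of the program
            break
        instr = program[pc]            # fetch the instruction from memory
        op, a, b, c = instr            # decode
        next_pc = pc + 4               # default: PC = PC + 4
        if op == "add":                # execute on the ALU and store the result
            regs[a] = regs[b] + regs[c]
        elif op == "sub":
            regs[a] = regs[b] - regs[c]
        elif op == "beq":              # branch: PC = branch target when equal
            if regs[a] == regs[b]:
                next_pc = c
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc = next_pc
    return regs


program = {
    0: ("add", "r0", "r1", "r2"),    # r0 = r1 + r2
    4: ("beq", "r0", "r3", 12),      # skip the sub when r0 == r3
    8: ("sub", "r0", "r0", "r1"),
    12: ("add", "r4", "r0", "r0"),   # r4 = 2 * r0
}
result = run(program, {"r0": 0, "r1": 2, "r2": 3, "r3": 5, "r4": 0})
```

Each pass through the loop finishes one instruction before the next one starts, which is exactly the behavior pipelining replaces with overlapped execution.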
Pipeline
[Figure: pipeline stages. Labels: Fetch, Decode, Issue, Integer, Multiply, Floating Point, Load/Store, Write Back]
Pipelining Analogy
§4.5 An Overview of Pipelining
• Pipelined laundry: overlapping execution
  – Parallelism improves performance
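A quick worked example of the laundry analogy, as a sketch. The numbers (four loads, four steps per load, half an hour per step) are assumptions chosen for illustration, as are the function names.

```python
def sequential_time(num_loads, num_stages, stage_time):
    """Each load finishes every stage before the next load starts."""
    return num_loads * num_stages * stage_time


def pipelined_time(num_loads, num_stages, stage_time):
    """The first load needs all stages; each later load finishes one stage time after."""
    return (num_stages + num_loads - 1) * stage_time


# Assumed example: 4 loads, 4 stages (wash, dry, fold, put away), 0.5 hours each.
seq = sequential_time(4, 4, 0.5)    # 8.0 hours
pipe = pipelined_time(4, 4, 0.5)    # 3.5 hours
speedup = seq / pipe
```

With only four loads the speedup is a little over 2x; as the number of loads grows, it approaches the number of stages.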
Hazards
• What could cause pipelining to fail?
Hazards
• Situations that prevent starting the next instruction in the next cycle
• Structure hazard
  – A required resource is busy
• Data hazard
  – Need to wait for previous instruction to complete its data read/write
• Control hazard
  – Deciding on control action depends on previous instruction
Structure Hazards
• Conflict for use of a resource
• In pipeline with a single memory
  – Load/store requires data access
  – Instruction fetch would have to stall for that cycle
    • Would cause a pipeline “bubble”
• Hence, pipelined datapaths require separate instruction/data memories
  – Or separate instruction/data caches
Data Hazards
• An instruction depends on completion of data access by a previous instruction
  – add r0, r4, r1
    sub r2, r0, r3
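The dependence in the add/sub pair above is a read-after-write on r0. A minimal sketch of checking for it in software follows; the `parse` and `raw_dependency` helpers are hypothetical and handle only the three-register ALU form used here.

```python
def parse(instr):
    """Split 'add r0, r4, r1' into (op, dest, [sources]). Three-register ALU form only."""
    op, rest = instr.split(None, 1)
    regs = [r.strip() for r in rest.split(",")]
    return op, regs[0], regs[1:]


def raw_dependency(first, second):
    """True when `second` reads the register that `first` writes."""
    _, dest, _ = parse(first)
    _, _, sources = parse(second)
    return dest in sources


hazard = raw_dependency("add r0, r4, r1", "sub r2, r0, r3")      # True: sub reads r0
no_hazard = raw_dependency("add r0, r4, r1", "sub r2, r5, r3")   # False
```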
Forwarding (aka Bypassing)
• Use result when it is computed
  – Don’t wait for it to be stored in a register
  – Requires extra connections in the datapath
Load-Use Data Hazard
• Can’t always avoid stalls by forwarding
  – If value not computed when needed
  – Can’t forward backward in time!
ldr r0, [r2, #0]
sub r1, r1, r0
Code Scheduling to Avoid Stalls
• Reorder code to avoid use of load result in the next instruction
• C code for A = B + E; C = B + F;
Original order (two load-use stalls):
ldr r1, [r0, #0]
ldr r2, [r0, #4]
(stall)
add r3, r1, r2
str r3, [r0, #12]
ldr r4, [r0, #8]
(stall)
add r5, r1, r4
str r5, [r0, #16]
Reordered (no stalls):
ldr r1, [r0, #0]
ldr r2, [r0, #4]
ldr r4, [r0, #8]
add r3, r1, r2
str r3, [r0, #12]
add r5, r1, r4
str r5, [r0, #16]
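To check a schedule like this one, the load-use stalls can be counted mechanically. The sketch below assumes a pipeline with forwarding in which a load followed immediately by a use of its result costs exactly one stall; the helper names and the simple register-matching parser are assumptions for illustration.

```python
import re


def reads_and_writes(instr):
    """Return (set of registers read, destination register or None)."""
    op = instr.split()[0]
    regs = re.findall(r"\br\d+\b", instr)
    if op == "str":
        return set(regs), None          # a store reads every register it names
    return set(regs[1:]), regs[0]       # otherwise the first register is the destination


def count_load_use_stalls(code):
    """Count instructions that use a load result in the very next slot."""
    stalls = 0
    for prev, curr in zip(code, code[1:]):
        _, prev_dest = reads_and_writes(prev)
        curr_reads, _ = reads_and_writes(curr)
        if prev.split()[0] == "ldr" and prev_dest in curr_reads:
            stalls += 1
    return stalls


original = [
    "ldr r1, [r0, #0]", "ldr r2, [r0, #4]", "add r3, r1, r2", "str r3, [r0, #12]",
    "ldr r4, [r0, #8]", "add r5, r1, r4", "str r5, [r0, #16]",
]
reordered = [
    "ldr r1, [r0, #0]", "ldr r2, [r0, #4]", "ldr r4, [r0, #8]", "add r3, r1, r2",
    "str r3, [r0, #12]", "add r5, r1, r4", "str r5, [r0, #16]",
]
original_stalls = count_load_use_stalls(original)      # 2
reordered_stalls = count_load_use_stalls(reordered)    # 0
```

Moving the third load up puts an independent instruction between each load and its first use, which is all the reordering needs to do.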
Control Hazards
• Branch determines flow of control
  – Fetching next instruction depends on branch outcome
  – Pipeline can’t always fetch correct instruction
    • Still working on ID stage of branch
• In pipeline
  – Need to compare registers and compute target early in the pipeline
  – Add hardware to do it in ID stage
Stall on Branch
• Wait until branch outcome determined before fetching next instruction
adds r4, r5, r6
beq label
add r7, r8, r9
Branch Prediction
• Longer pipelines can’t readily determine branch outcome early
  – Stall penalty becomes unacceptable
• Predict outcome of branch
  – Only stall if prediction is wrong
• In pipeline
  – Can predict branches not taken
  – Fetch instruction after branch, with no delay
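A rough way to see why prediction helps is to estimate the average cycles per instruction (CPI) with and without it. This is a back-of-the-envelope sketch: the formula is the usual base-plus-penalty estimate, and every number below (20% branches, 40% of them taken, a one-cycle penalty) is an assumption made up for the example.

```python
def effective_cpi(base_cpi, branch_fraction, penalized_fraction, penalty_cycles):
    """Average CPI when a fraction of branches each cost extra penalty cycles."""
    return base_cpi + branch_fraction * penalized_fraction * penalty_cycles


# Always stall on a branch: every branch pays the penalty.
cpi_stall = effective_cpi(1.0, 0.20, 1.0, 1)

# Predict not taken: only the branches that are actually taken pay it.
cpi_predict_not_taken = effective_cpi(1.0, 0.20, 0.40, 1)
```

Under these assumed numbers the CPI drops from about 1.2 to about 1.08; a longer pipeline with a larger penalty widens the gap.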
MIPS with Predict Not Taken
Prediction correct
adds r4, r5, r6
beq label
add r7, r8, r9
Prediction incorrect
adds r4, r5, r6
beq label
sub r7, r8, r9
More-Realistic Branch Prediction
• Static branch prediction
  – Based on typical branch behavior
  – Example: loop and if-statement branches
    • Predict backward branches taken
    • Predict forward branches not taken
• Dynamic branch prediction
  – Hardware measures actual branch behavior
    • e.g., record recent history of each branch
  – Assume future behavior will continue the trend
    • When wrong, stall while re-fetching, and update history
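One common way to "record recent history of each branch" is a 2-bit saturating counter per branch. The slides do not name a specific scheme, so treat the sketch below as one possible illustration; the class name, the dictionary keyed by branch address, and the initial counter value are all assumptions.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}   # branch address -> counter value in 0..3

    def predict(self, branch_addr):
        # Unseen branches start at 1 (weakly not taken); this default is an assumption.
        return self.counters.get(branch_addr, 1) >= 2

    def update(self, branch_addr, taken):
        c = self.counters.get(branch_addr, 1)
        c = min(c + 1, 3) if taken else max(c - 1, 0)
        self.counters[branch_addr] = c


def count_mispredictions(predictor, branch_addr, outcomes):
    """Predict each outcome, count the misses, and update the history afterwards."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            wrong += 1
        predictor.update(branch_addr, taken)
    return wrong


# A loop branch taken 9 times and then falling through, executed twice.
outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(TwoBitPredictor(), 0x40, outcomes)   # 3 of 20
```

The two bits mean a single loop exit does not flip the prediction, so the second run of the loop mispredicts only at its exit.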
Pipeline Summary
• Pipelining improves performance by increasing instruction throughput
  – Executes multiple instructions in parallel
  – Each instruction has the same latency
• Subject to hazards
  – Structure, data, control
• Instruction set design affects complexity of pipeline implementation
The BIG Picture
MIPS Pipelined Datapath
§4.6 Pipelined Datapath and Control
[Figure: pipelined datapath, with the MEM and WB stages labeled]
Right-to-left flow leads to hazards
Pipeline registers
• Need registers between stages
  – To hold information produced in previous cycle
Pipeline Operation
• Cycle-by-cycle flow of instructions through the pipelined datapath
  – “Single-clock-cycle” pipeline diagram
    • Shows pipeline usage in a single cycle
    • Highlight resources used
  – cf. “multi-clock-cycle” diagram
    • Graph of operation over time
• We’ll look at “single-clock-cycle” diagrams for load & store
IF for Load, Store, …
ID for Load, Store, …
EX for Load
MEM for Load
WB for Load
Wrong register number
Corrected Datapath for Load
EX for Store
MEM for Store
WB for Store
Multi-Cycle Pipeline Diagram
• Form showing resource usage
Multi-Cycle Pipeline Diagram
• Traditional form
Single-Cycle Pipeline Diagram
• State of pipeline in a given cycle
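Since the figures are not reproduced here, the two diagram styles can be sketched in text. The code below assumes the classic five stages (IF, ID, EX, MEM, WB), one instruction entering per cycle, and no stalls; the function names and the example instruction list are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def multi_cycle_diagram(instructions):
    """One row per instruction, one column per cycle; instruction i enters IF in cycle i + 1."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i in range(len(instructions)):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        rows.append(row)
    return rows


def single_cycle_state(instructions, cycle):
    """Which instruction occupies each stage in the given (1-based) cycle."""
    state = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s
        state[stage] = instructions[i] if 0 <= i < len(instructions) else None
    return state


instrs = ["ldr r1, [r0, #0]", "add r4, r5, r6", "sub r7, r8, r9"]
diagram = multi_cycle_diagram(instrs)
for instr, row in zip(instrs, diagram):
    print(f"{instr:<20}" + " ".join(f"{cell:<3}" for cell in row))
snapshot = single_cycle_state(instrs, 3)
```

The multi-cycle form is the whole table; the single-cycle form is one column of it, read as "what is in each stage right now".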
Data Hazards in ALU Instructions
• Consider this sequence:
sub r2, r1, r3
and r7, r2, r5
or r8, r6, r2
add r9, r2, r2
str r10, [r2, #100]
• We can resolve hazards with forwarding
  – How do we detect when to forward?
Dependencies & Forwarding
sub r2, r1, r3
and r7, r2, r5
or r8, r6, r2
add r9, r2, r2
str r10, [r2, #100]
Detecting the Need to Forward
• Pass register numbers along pipeline
  – e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
• ALU operand register numbers in EX stage are given by
  – ID/EX.RegisterRs, ID/EX.RegisterRt
• Data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
      (1a, 1b: forward from EX/MEM pipeline register)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
      (2a, 2b: forward from MEM/WB pipeline register)
Detecting the Need to Forward
• But only if forwarding instruction will write to a register!
  – EX/MEM.RegWrite, MEM/WB.RegWrite
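The four conditions, gated by the RegWrite signals, can be written out directly. This is a software sketch of the comparison logic only; the `PipelineRegs` dataclass and its field names are hypothetical stand-ins for the pipeline register fields named in the slides.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    id_ex_rs: int            # ID/EX.RegisterRs
    id_ex_rt: int            # ID/EX.RegisterRt
    ex_mem_rd: int           # EX/MEM.RegisterRd
    ex_mem_reg_write: bool   # EX/MEM.RegWrite
    mem_wb_rd: int           # MEM/WB.RegisterRd
    mem_wb_reg_write: bool   # MEM/WB.RegWrite


def detect_hazards(p):
    """Return the set of hazard labels (1a, 1b, 2a, 2b) that currently hold."""
    hazards = set()
    if p.ex_mem_reg_write:                 # only if the earlier instruction writes a register
        if p.ex_mem_rd == p.id_ex_rs:
            hazards.add("1a")
        if p.ex_mem_rd == p.id_ex_rt:
            hazards.add("1b")
    if p.mem_wb_reg_write:
        if p.mem_wb_rd == p.id_ex_rs:
            hazards.add("2a")
        if p.mem_wb_rd == p.id_ex_rt:
            hazards.add("2b")
    return hazards


# "sub r2, r1, r3" is one stage ahead of "and r7, r2, r5" (Rs = 2, Rt = 5).
with_write = detect_hazards(PipelineRegs(2, 5, 2, True, 0, False))      # {"1a"}
# Same register numbers, but the earlier instruction does not write a register.
without_write = detect_hazards(PipelineRegs(2, 5, 2, False, 0, False))  # set()
```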
Forwarding Paths
Double Data Hazard
• Consider the sequence:
add r1, r1, r2
add r1, r1, r3
add r1, r1, r4
• Both hazards occur
  – Want to use the most recent
• Revise MEM hazard condition
  – Only fwd if EX hazard condition isn’t true
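The revised rule amounts to a priority order when choosing where an ALU operand comes from. A minimal sketch, with a hypothetical `forward_select` function and string labels standing in for the forwarding mux control values:

```python
def forward_select(src, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Pick the source of one ALU operand, preferring the most recent result."""
    if ex_mem_reg_write and ex_mem_rd == src:
        return "EX/MEM"      # EX hazard: the newest value wins
    if mem_wb_reg_write and mem_wb_rd == src:
        return "MEM/WB"      # MEM hazard: used only when the EX hazard is not true
    return "REGFILE"         # no hazard: use the value read from the register file


# Third add of the sequence is in EX; the first two adds both wrote r1.
operand_r1 = forward_select(1, ex_mem_rd=1, ex_mem_reg_write=True,
                            mem_wb_rd=1, mem_wb_reg_write=True)    # "EX/MEM"
operand_r4 = forward_select(4, ex_mem_rd=1, ex_mem_reg_write=True,
                            mem_wb_rd=1, mem_wb_reg_write=True)    # "REGFILE"
```

Checking the EX/MEM condition first is what makes the third add see the second add's result rather than the stale one from the first.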
Datapath with Forwarding
Load-Use Data Hazard
Need to stall for one cycle
ldr r2, [r1, #20]
and r4, r2, r5
or r8, r2, r6
add r9, r4, r2
add r1, r6, r7
Stall/Bubble in the Pipeline
Stall inserted here
ldr r2, [r1, #20]
and becomes nop
and r4, r2, r5
or r8, r2, r6
add r9, r4, r2
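The effect of the inserted bubble can be mimicked in software by placing a nop after any load whose result is used by the very next instruction. This is only a sketch of the outcome, not of the hardware mechanism; the function name and the register-matching parser are assumptions, and it presumes the one-cycle load-use stall described above.

```python
import re


def insert_load_use_bubbles(code):
    """Return the instruction stream with a nop after each load-use pair."""
    out = []
    for instr in code:
        if out and out[-1].split()[0] == "ldr":
            load_dest = re.findall(r"\br\d+\b", out[-1])[0]   # register written by the load
            regs = re.findall(r"\br\d+\b", instr)
            # A store reads every register it names; other forms write the first one.
            reads = regs if instr.split()[0] == "str" else regs[1:]
            if load_dest in reads:
                out.append("nop")                             # the bubble
        out.append(instr)
    return out


sequence = [
    "ldr r2, [r1, #20]",
    "and r4, r2, r5",
    "or r8, r2, r6",
    "add r9, r4, r2",
]
stalled = insert_load_use_bubbles(sequence)
```

Only the `and` needs the bubble: by the time `or` and `add` reach EX, the loaded value can be forwarded or read from the register file.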
Datapath with Hazard Detection
Stalls and Performance
• Stalls reduce performance
  – But are required to get correct results
• Compiler can arrange code to avoid hazards and stalls
  – Requires knowledge of the pipeline structure
Pitfalls
• Poor ISA design can make pipelining harder
  – e.g., complex instruction sets (VAX, IA-32)
    • Significant overhead to make pipelining work
    • IA-32 micro-op approach
  – e.g., complex addressing modes
    • Register update side effects, memory indirection
  – e.g., delayed branches
    • Advanced pipelines have long delay slots
Concluding Remarks
• ISA influences design of datapath and control
• Datapath and control influence design of ISA
• Pipelining improves instruction throughput using parallelism
  – More instructions completed per second
  – Latency for each instruction not reduced
• Hazards: structural, data, control
WRAP UP
For next time
Exam Review