![Page 1: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/1.jpg)
EE 457 Questions and Answers for Special Topics
Out-of-Order Execution, Exception,
Branch Prediction, CMP
Gandhi Puvvada, Weirong Jiang & Tony Toghia, USC 2008
![Page 2: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/2.jpg)
Out-of-Order (OoO) Execution: Dynamic Scheduling of Instructions (The Tomasulo Algorithm)
![Page 3: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/3.jpg)
[Block diagram (simplified for EE457; block diagram provided by Prof. Dubois): Issue Unit, Integer Multiplier, Integer Divider, Mult, TAG FIFO holding tags 0 to 63]
![Page 4: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/4.jpg)
OoO Execution and In-Order Committing with ROB (Re-Order Buffer)
[Block diagram. Front-end: I-Cache, I-Fetch Queue, BPB, Dispatch. Back-end: Integer Queue, Load/Store Queue, Mult Queue, Div Queue, Issue Unit, execution units, Cache, Add Buff, CDB, Re-order Buffer, Reg File.]
![Page 5: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/5.jpg)
Q#1 What is the important difference between the two block diagrams?
Which one supports precise exceptions?
![Page 6: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/6.jpg)
A#1 The ROB is the important difference between the two block diagrams.
The right-side block diagram (the one with the ROB) supports precise exceptions.
![Page 7: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/7.jpg)
Q#2 Choose the right attributes to describe the block diagrams.
1. Left block diagram: __________ (Out-of-Order / In-Order) Issue, __________ (Out-of-Order / In-Order) Execute, __________ (Out-of-Order / In-Order) Complete.
2. Right block diagram: __________ (Out-of-Order / In-Order) Issue, __________ (Out-of-Order / In-Order) Execute, __________ (Out-of-Order / In-Order) Complete.
![Page 8: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/8.jpg)
A#2 Choose the right attributes to describe the block diagrams.
1. Left block diagram (without ROB): In-Order Issue, Out-of-Order Execute, Out-of-Order Complete.
2. Right block diagram (with ROB): In-Order Issue, Out-of-Order Execute, In-Order Complete.
![Page 9: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/9.jpg)
Out-of-Order Execution (with ROB)
Q#3 When we refer to an out-of-order processor with ROB, do we mean:
a. instructions are issued out of order?
b. instructions start execution out of order?
c. instructions finish execution out of order?
d. instructions retire out of order?
![Page 10: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/10.jpg)
• A#3: b and c. Instructions are issued and retired in order, to maintain the functionality of in-order execution. What happens in between, namely the start and completion of execution (in the integer and floating-point units), can be done out of order.
![Page 11: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/11.jpg)
TAG FIFO (Token FIFO) in the left diagram
Q#4
Q#4.1 Is it necessary to hold the 64 tokens in the 0 to 63 order initially on reset?
Q#4.2 Is the FIFO used for convenience, or is it necessary that we follow the “First-In-First-Out” order?
Q#4.3 Can the FIFO overflow?
Q#4.4 Can the FIFO become empty?
![Page 12: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/12.jpg)
TAG FIFO (Token FIFO)
A#4
A#4.1 It is not necessary to hold the 64 tokens in the 0 to 63 order initially on reset.
A#4.2 The FIFO is used for convenience. It is not necessary that we follow the “First-In-First-Out” order.
A#4.3 The FIFO cannot overflow, as we cannot receive more tokens than we issued.
A#4.4 The FIFO can become empty if the back-end can hold more in-flight instructions than the total number of tokens.
![Page 13: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/13.jpg)
TAGs for destinations or sources or for both? (in ROB-less design)
• A new tag is assigned to the destination register of the instruction being dispatched.
• For each of the source registers (source operands) of the instruction being dispatched, either the value of the source register (if it has not been previously tagged) or the existing tag associated with the source register (if it has been tagged already in RAS) is conveyed to the instruction.
• If a tag is conveyed for a source, then the instruction needs to wait for the original instruction with that destination tag to go on to the CDB and announce the value.
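As a rough illustration of the dispatch rule above, here is a minimal Python sketch for the ROB-less design. It assumes the register status table (RAS) can be modeled as a dict from register name to its pending tag, and that free tags sit in a deque; `dispatch`, `reg_status`, and `free_tags` are hypothetical names, not part of the original design.

```python
from collections import deque


def dispatch(instr, reg_file, reg_status, free_tags):
    """Dispatch one instruction in a ROB-less Tomasulo design (toy model).

    instr      -- (dest, [src, ...]) with register names such as "$8"
    reg_file   -- dict: register -> current value
    reg_status -- dict: register -> tag of its latest pending producer (hypothetical RAS model)
    free_tags  -- deque of unused tags
    Returns (dest_tag, operands); each operand is ("value", v) or ("tag", t).
    """
    dest, srcs = instr
    operands = []
    for src in srcs:
        if src in reg_status:
            # Already tagged: wait for that tag to be announced on the CDB.
            operands.append(("tag", reg_status[src]))
        else:
            # Not tagged: the register file holds the latest value.
            operands.append(("value", reg_file[src]))
    dest_tag = free_tags.popleft()   # assign a new tag to the destination
    reg_status[dest] = dest_tag      # later readers of dest will see this tag
    return dest_tag, operands
```

Reading the sources before tagging the destination matters: in `add $8, $8, $8` the sources must pick up the tag of the previous producer of $8, not the new tag.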
![Page 14: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/14.jpg)
Unique TAG (in ROB-less design)
• Like an SSN, a TAG must be unique.
• SSNs are reused.
• Similarly, TAGs can be reused.
• TAGs are similar to numbered tokens.
![Page 15: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/15.jpg)
TAGs (= Tokens)
• How many Tokens should the bank cashier have to start with?
• What happens if the tokens run out?
• Does he need to have any order in holding tokens and issuing tokens?
• Does he have to collect tokens back?
(in ROB-less design)
![Page 16: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/16.jpg)
TAG FIFO (FIFOs are taught in EE560)
• To issue and collect Tokens (TAGs), use a circular FIFO (First-in-First-Out) unit.
• Filled with (say) 64 tokens (in any order) initially on reset.
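• Tokens return out of order anyway.
• Put returned tokens back into the FIFO and issue them again.
[Figure: circular TAG FIFO with write pointer (wp) and read pointer (rp), shown full after reset (wp = rp), after 2 tokens are issued, and after 1 token is returned]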
(in ROB-less design)
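The circular TAG FIFO can be sketched in Python as follows, assuming 64 tags and the wp/rp pointer names from the figure; the class and method names are hypothetical. Tags may come back in any order; the FIFO only stores them until they are reissued.

```python
class TagFifo:
    """Circular FIFO that issues and collects tags (tokens)."""

    def __init__(self, n_tags=64):
        self.slots = list(range(n_tags))  # any initial order works; 0..63 is just convenient
        self.size = n_tags
        self.rp = 0            # read pointer: next tag to issue
        self.wp = 0            # write pointer: where the next returned tag goes
        self.count = n_tags    # full on reset (wp == rp)

    def issue(self):
        """Return a free tag, or None if the FIFO is empty (dispatch must stall)."""
        if self.count == 0:
            return None
        tag = self.slots[self.rp]
        self.rp = (self.rp + 1) % self.size
        self.count -= 1
        return tag

    def give_back(self, tag):
        """Collect a tag announced on the CDB; tags may return in any order."""
        if self.count == self.size:
            # Cannot happen if only issued tags are ever returned.
            raise OverflowError("more tags returned than issued")
        self.slots[self.wp] = tag
        self.wp = (self.wp + 1) % self.size
        self.count += 1
```

For example, after issuing tags 0 and 1 and getting tag 1 back first, the FIFO simply stores 1 at `wp` and will hand it out again after tags 2 to 63.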
![Page 17: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/17.jpg)
• Q#5 What is meant by retirement in an out-of-order processor?
• Q#6 What two conditions are required for retirement?
![Page 18: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/18.jpg)
• A#5: Retirement is the point at which an instruction’s results can be committed (written into the register file or memory) or, if it is a conditional branch or an exception, at which it can be taken. In short, its execution is ensured and it is no longer speculative. Note: In speculative execution, conditional branches are executed based on prediction, and if the prediction turns out to be wrong, the wrong-path instructions are flushed.
• A#6: Execution must be completed, and the instruction must be the oldest instruction not yet retired (that is, the oldest instruction in the re-order buffer).
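A minimal Python sketch of the two retirement conditions, assuming a toy ROB modeled as a deque whose head is the oldest instruction. The class and field names are hypothetical; flushing and the actual register or memory write are not modeled.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: entries enter in program order, complete in any order, retire in order."""

    def __init__(self):
        self.entries = deque()  # head = oldest, tail = youngest

    def dispatch(self, name):
        entry = {"name": name, "done": False}
        self.entries.append(entry)
        return entry

    def retire(self):
        """Retire every instruction that is completed AND at the head (oldest)."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["name"])
        return retired
```

A younger instruction that finishes first simply waits in the buffer until everything older than it has completed and retired.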
![Page 19: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/19.jpg)
• Q#7 __________________ (Architectural / Physical) registers are visible to software (i.e. can be used in instructions)
• Q#8 __________________ (Architectural / Physical) registers allow multiple copies of a register to support out-of-order execution (including speculative execution) via register renaming.
![Page 20: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/20.jpg)
• A#7 Architectural registers are visible to software (i.e., can be used in instructions).
• A#8 Physical registers allow multiple copies of a register to support out-of-order execution (including speculative execution) via register renaming.
![Page 21: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/21.jpg)
Limited Architectural Registers, More Physical Registers
Register Renaming

    lw  $8, 40($2)
    add $8, $8, $8
    sw  $8, 40($2)

    lw  $8, 60($3)
    add $8, $8, $8
    sw  $8, 60($3)

It is clear that the compiler is using $8 as a temporary register.
If there is a delay in obtaining $2, the first part of the code cannot proceed.
Unfortunately, the second part of the code cannot proceed either, because of the name dependence on $8.
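A toy Python sketch of how renaming removes the name dependence on $8, assuming a simple map from architectural to physical registers plus a free list. The `pN` naming, the 32/64 register counts, and the tuple encoding are assumptions; freeing physical registers at retirement is omitted.

```python
def rename(program, n_arch=32, n_phys=64):
    """Rename architectural registers to physical ones (toy sketch).

    program -- list of (op, dest or None, [srcs]); registers are ints (8 means $8).
    Returns the renamed program using physical names "pN".
    """
    rename_map = {r: f"p{r}" for r in range(n_arch)}     # initially $r -> p{r}
    free_list = [f"p{i}" for i in range(n_arch, n_phys)]  # spare physical registers
    renamed = []
    for op, dest, srcs in program:
        new_srcs = [rename_map[s] for s in srcs]  # read current mappings first
        new_dest = None
        if dest is not None:
            new_dest = free_list.pop(0)           # a fresh copy of the destination
            rename_map[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


code = [
    ("lw", 8, [2]), ("add", 8, [8, 8]), ("sw", None, [8, 2]),  # first part, uses $2
    ("lw", 8, [3]), ("add", 8, [8, 8]), ("sw", None, [8, 3]),  # second part, uses $3
]
```

In the renamed code the first part uses p32 and p33 while the second `lw` writes p34, so the second part no longer has to wait for $2.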
![Page 22: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/22.jpg)
Q#9 Register renaming can NOT solve
a. RAW hazards
b. WAR hazards
c. WAW hazards
Note: In a design with ROB, WAW and WAR hazards never occur, as all writes are performed strictly in order. So answer the above question for the ROB-less design.
![Page 23: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/23.jpg)
• A#9: a. The RAW (Read After Write) hazard is the only hazard that cannot be solved by register renaming.
• For the WAW (Write After Write) hazard:
– If, in program order, $1 is written twice, and the later write (W2) can execute before the first write (W1), then in a ROB-less design the register renaming mechanism allows the earlier write to be discarded.
• For the WAR (Write After Read) hazard:
– Register renaming allows the older version of the register to be read and held in the Issue Queues, so that the later write can proceed.
• For the RAW (Read After Write) hazard:
– A dependent read MUST wait and cannot execute before a write to the same location. (The to-be-written value must be determined before a later instruction can read it.) The dependent instruction waits in the Issue Queues for the operand to be broadcast on the CDB.
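To make the three cases concrete, here is a small Python helper that classifies the dependences from an older to a younger instruction. It is a sketch under an assumed encoding in which each instruction is reduced to a destination register (or None) and a list of source registers.

```python
def hazards(older, younger):
    """Dependences from an older to a younger instruction.

    Each instruction is (dest, srcs); dest may be None (e.g. a store).
    """
    o_dest, o_srcs = older
    y_dest, y_srcs = younger
    found = set()
    if o_dest is not None and o_dest in y_srcs:
        found.add("RAW")  # true dependence: renaming cannot remove it
    if y_dest is not None and y_dest in o_srcs:
        found.add("WAR")  # anti-dependence: removed by renaming
    if y_dest is not None and y_dest == o_dest:
        found.add("WAW")  # output dependence: removed by renaming
    return found
```

For example, `lw $2, 0($4)` followed by `sub $4, $4, $6` is only a WAR hazard on $4, while `lw $2, 0($4)` followed by `add $1, $2, $3` is a RAW hazard on $2.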
![Page 24: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/24.jpg)
Q#10 What resource is the major bottleneck of the Tomasulo algorithm?
IFQ / Dispatcher / Issue Queues / Execution Units / CDB
![Page 25: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/25.jpg)
A#10 What resource is the major bottleneck of the Tomasulo algorithm?
CDB
The issue unit has to throttle the issuing of instructions to the execution units based on the CDB’s availability. It does not let multiple execution units finish execution in the same clock cycle, because only one result can be broadcast on the CDB at a time.
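One way to picture this throttling is a CDB reservation table. Before issuing, the issue unit checks whether the CDB is already booked in the cycle in which the execution unit would finish. The sketch below assumes fixed, known latencies and a single CDB. The class name, `horizon`, and the latencies used are hypothetical, and it is not claimed to be the exact EE457 mechanism.

```python
class CdbReservation:
    """Tracks which future clock cycles already have a CDB broadcast booked."""

    def __init__(self, horizon=16):
        # reserved[k] is True if the CDB is taken k cycles from now.
        # Latencies passed to try_issue must be smaller than horizon.
        self.reserved = [False] * horizon

    def try_issue(self, latency):
        """Issue only if the CDB is free in the cycle this unit would finish."""
        if self.reserved[latency]:
            return False  # conflict: the instruction stays in its issue queue
        self.reserved[latency] = True
        return True

    def tick(self):
        """Advance one clock: every booking moves one cycle closer."""
        self.reserved = self.reserved[1:] + [False]
```

With a 4-cycle multiplier issued this cycle, a second 4-cycle operation must wait, and in the next cycle a 3-cycle operation would collide with the multiplier's broadcast.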
![Page 26: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/26.jpg)
• Q#11a Suppose the following lw instruction is in progress and is currently waiting for the cache to respond:
lw $2, 0($4)
Which of the following instructions in the integer issue queue will begin execution the earliest?
#4 subi $6, $7, $8
#3 addi $5, $3, $4
#2 sub $4, $4, $6
#1 (oldest) add $1, $2, $3
![Page 27: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/27.jpg)
• A#11a #2. #1 cannot begin execution, because it reads $2, which is still being written by the LW instruction (RAW hazard). Instruction #2 can begin execution. (Note: Register renaming solves the WAR hazard on $4.)
![Page 28: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/28.jpg)
• Q#11b Given the same situation (lw $2, 0($4)) as the previous problem, which of the following instructions in the integer issue queue will now begin execution the earliest?
#4 subi $6, $7, $8
#3 addi $5, $3, $4
#2 sub $4, $4, $1 (the last source operand was $6 in Q#11a)
#1 (oldest) add $1, $2, $3
![Page 29: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/29.jpg)
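• A#11b #4. Instruction #4 is the earliest (in fact the only) instruction that does not read a value still being produced by an earlier instruction: #1 waits for $2 (from the lw), #2 waits for $1 (from #1), and #3 waits for $4 (from #2).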
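The reasoning of A#11a and A#11b can be checked with a short Python sketch. It assumes renaming is in place, so only RAW dependences can delay an instruction. The function name and the (label, dest, srcs) encoding are hypothetical.

```python
def first_ready(in_flight_dests, queue):
    """Oldest instruction in an issue queue whose source operands are all available.

    in_flight_dests -- registers still being written by instructions already
                       executing (e.g. the lw waiting for the cache)
    queue           -- list of (label, dest, srcs), oldest first
    Assumes register renaming, so WAR and WAW are ignored.
    """
    pending = set(in_flight_dests)
    for label, dest, srcs in queue:
        if not (pending & set(srcs)):
            return label
        pending.add(dest)  # younger readers of dest must wait for this one
    return None


queue_a = [("#1", "$1", ["$2", "$3"]), ("#2", "$4", ["$4", "$6"]),
           ("#3", "$5", ["$3", "$4"]), ("#4", "$6", ["$7", "$8"])]
queue_b = [("#1", "$1", ["$2", "$3"]), ("#2", "$4", ["$4", "$1"]),
           ("#3", "$5", ["$3", "$4"]), ("#4", "$6", ["$7", "$8"])]
```

With `$2` pending from the lw, `first_ready(["$2"], queue_a)` picks #2 and `first_ready(["$2"], queue_b)` picks #4, matching the two answers.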
![Page 30: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/30.jpg)
Without or with ROB?
• Q#11c Are your answers to Q#11a and Q#11b for the first design without ROB or the second design with ROB?
![Page 31: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/31.jpg)
• A#11c For both! RAW dependency is the true dependency and every implementation has to honor that dependency.
![Page 32: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/32.jpg)
Q#12 ROB is the important difference between the two block diagrams. Compare and contrast.
![Page 33: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/33.jpg)
A#12 Compare and contrast

| | Without ROB | With ROB |
|---|---|---|
| 1 | TAG FIFO provides unique TAGs. | ROB location IDs are the TAGs. |
| 2 | Register Status Table specifies if a register is obsolete. | ROB needs to be searched associatively to find the latest register content. |
| 3 | Allows out-of-order completion. | Enforces in-order-only completion. |
![Page 34: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/34.jpg)
A#12 Compare and contrast

| | Without ROB | With ROB |
|---|---|---|
| 4 | Cannot support exceptions. | Can support exceptions. |
| 5 | Cannot support speculative execution. | Can support speculative execution. |
| 6 | No speculation, no BPB. | Has a BPB to aid in branch prediction. |
| 7 | Not good for a real implementation. | Good for a real implementation. |
![Page 35: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/35.jpg)
A#12 Compare and contrast

| | Without ROB | With ROB |
|---|---|---|
| 8 | Writes are out of order. Hence, dispatch is suspended after dispatching a conditional branch until the branch is resolved. | Writes are in order. Dispatch continues based on prediction, and the design provides for flushing wrong-path execution. |
| 9 | Stores write to the cache when they come out of the LSQ (load/store queue). | Stores write to the cache when they reach the top of the ROB. |
![Page 36: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/36.jpg)
A#12 Compare and contrast

| | Without ROB | With ROB |
|---|---|---|
| 10 | Memory disambiguation rules are stricter. | Since WAW and WAR are not present, the rules are simpler. |
| 11 | Only RAR is irrelevant, so two loads from the same address can execute in any order. The rest of the loads and stores with matching addresses have to go in order. | Only RAW needs to be looked at. Loads read the cache before going into the ROB; hence, loads have to wait until senior stores with matching addresses finish. |
![Page 37: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/37.jpg)
A#12 Compare and contrast

| | Without ROB | With ROB |
|---|---|---|
| 12 | Suppose a senior load is yet to calculate its memory address. A junior load (but not a store) can leave the LSQ: load after load is RAR (no hazard), but a junior store would create a WAR hazard. Suppose a senior store is yet to calculate its memory address. A junior load or store cannot leave (RAW, WAW). | Stores leave a copy of their address in the Address Buffer near the LSQ, so that junior loads can figure out (without looking up the ROB) if they can read the cache. This means junior stores cannot leave the LSQ while a senior load is yet to calculate its address, and junior stores whose address matches a senior load's address should not leave the LSQ. Alternatively, they can leave if the senior loads with matching addresses make a note of this. |
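A minimal Python sketch of the with-ROB load check in row 11 of the table. It assumes that the addresses of senior stores which have not yet written the cache are available, for example from the Address Buffer. It also assumes that an address not yet computed is treated conservatively as a possible match. The function name is hypothetical.

```python
def load_can_read_cache(load_addr, senior_store_addrs):
    """With-ROB design: may a load read the cache now?

    senior_store_addrs -- addresses of older stores that have not yet written
    the cache (e.g. as recorded in the Address Buffer); None means the store
    has not computed its address yet (assumed to be a possible match).
    """
    for addr in senior_store_addrs:
        if addr is None or addr == load_addr:
            return False  # possible RAW through memory: the load must wait
    return True           # no senior store can supply this address
```

Only stores are checked. Senior loads never block a junior load, because RAR is not a hazard.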
![Page 38: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/38.jpg)
Exceptions
• Q#1 What is the definition of an exception?
• Q#2 What is the difference between asynchronous and synchronous exceptions? Give two examples of each.
• Q#3 Precise exceptions are _______________ (synchronous, asynchronous) and the excepting instruction _________ (must be/does not need to be) re-executed.
![Page 39: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/39.jpg)
• A#1: Exceptions are very rare events forcing a transfer of program control to a software handler.
• A#2: Synchronous exceptions are triggered by specific instructions (e.g. Divide by zero, illegal instruction, page fault, etc.). Asynchronous exceptions include the hardware interrupts and are not tied to a specific executing instruction (e.g. keyboard interrupt, real-time clock, power failure)
• A#3: Precise exceptions are synchronous, and the excepting instruction must be re-executed (e.g., in the case of a page fault, ....).
![Page 40: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/40.jpg)
Q#4
• Interrupts are ___________ (Asynchronous/Synchronous) to program execution.
• Traps are ___________ (Asynchronous/Synchronous) to program execution.
![Page 41: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/41.jpg)
A#4
• Interrupts are asynchronous to program execution. Example: keyboard interrupt.
• Traps are synchronous to program execution. Example: addition overflow trap.
![Page 42: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/42.jpg)
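Q#5 • Match the exceptions with the 5 pipeline stages.

| Exception | IF | ID | EX | MEM | WB |
|---|---|---|---|---|---|
| Page Fault | | | | | |
| Integer Overflow | | | | | |
| Undefined Opcode | | | | | |
| Memory Protection Violation | | | | | |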
![Page 43: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/43.jpg)
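A#5 • Match the exceptions with the 5 pipeline stages.

| Exception | IF | ID | EX | MEM | WB |
|---|---|---|---|---|---|
| Page Fault | X | | | X | |
| Integer Overflow | | | X | | |
| Undefined Opcode | | X | | | |
| Memory Protection Violation | X | | | X | |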
![Page 44: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/44.jpg)
Q#6 For precise exceptions, the exceptions should be taken in
a. process order
b. temporal order
![Page 45: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/45.jpg)
• A#6: Process order. Exceptions on earlier instructions must be handled before exceptions due to later instructions, regardless of when they are detected.
![Page 46: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/46.jpg)
Q#7 • For precise exceptions in the 5-stage pipeline, in which stage should an exception be taken? Why?
![Page 47: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/47.jpg)
• A#7: WB stage. By the time an instruction reaches WB, every earlier instruction in program order has already left the pipeline, which ensures that no earlier instruction can still trigger an exception.
Well, as discussed in our class, an exception can be taken in the MEM stage (instead of the WB stage), as the instruction in the WB stage cannot cause a new exception.
![Page 48: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/48.jpg)
Q#8 • What are the functions of the Cause Register and the Exception PC (EPC)?
![Page 49: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/49.jpg)
• A#8: Cause register records what type of exception occurred, and the EPC tells the exception handler on which instruction the exception occurred.
![Page 50: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/50.jpg)
Q#9 What are the requirements of precise exception handling in a pipelined processor?
![Page 51: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/51.jpg)
A#9:
• All preceding instructions in process order must complete.
• The faulting instruction itself and all instructions following it must be squashed.
• The execution of the handler must be started.
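The requirements above, together with process-order selection (A#6) and the EPC/Cause registers (A#8), can be sketched in Python as a toy model. It assumes in-flight instructions are listed oldest first, each with an optional exception code. The function name is hypothetical, and the PCs and cause strings in the examples are made up.

```python
def take_precise_exception(in_flight):
    """Apply the precise-exception rules to in-flight instructions.

    in_flight -- list of (pc, cause) in program (process) order, oldest first;
                 cause is None or an exception code such as "PageFault".
    Returns (completed, squashed, epc, cause).
    """
    for i, (pc, cause) in enumerate(in_flight):
        if cause is not None:
            # The oldest excepting instruction wins, regardless of detection time.
            completed = [p for p, _ in in_flight[:i]]  # earlier instructions complete
            squashed = [p for p, _ in in_flight[i:]]   # it and all later ones are squashed
            # EPC records where the exception occurred; Cause records its type.
            # Fetching the handler is not modeled here.
            return completed, squashed, pc, cause
    return [p for p, _ in in_flight], [], None, None
```

If the oldest instruction page-faults while a younger one is illegal, the page fault is reported first. This matches the first run of Q#10 below.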
![Page 52: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/52.jpg)
• Q#10
![Page 53: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/53.jpg)
First run (before first exception handled)
![Page 54: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/54.jpg)
Second run (after page fault handled)
![Page 55: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/55.jpg)
A#10: First run (before first exception handled)

| Cycle | IF | ID | EX | MEM | WB |
|---|---|---|---|---|---|
| 1 | SW | Illegal (exception detected) | ADD | LW (exception detected) | |
| 2 | Start of exception handler | NOP | NOP | NOP | NOP (Exception) |
![Page 56: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/56.jpg)
A#10: Second run (after page fault handled)

| Cycle | IF | ID | EX | MEM | WB |
|---|---|---|---|---|---|
| 1 | SW | Illegal (exception detected) | ADD | LW | |
| 2 | NOP | NOP | NOP (Exception) | ADD | LW |
| 3 | NOP | NOP | NOP | NOP (Exception) | ADD |
| 4 | Start of exception handler | NOP | NOP | NOP | NOP (Exception) |
![Page 57: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/57.jpg)
Branch Prediction
Q#1 Which types of branches need prediction?
a. Indirect branch due to return from function call
b. Conditional branch
c. Unconditional branch
![Page 58: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/58.jpg)
Branch Prediction
Q#1 Which types of branches need prediction (direction prediction)?
a. Indirect branch due to return from function call
b. Conditional branch
c. Unconditional branch
A#1: Conditional branch
![Page 59: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/59.jpg)
Q#2 Given a simple 1-bit (2-state) pattern history predictor, and assuming the initial branch is predicted not taken, what is the misprediction rate for the following loop? (Assume there are no other branches in the loop.)
for (i=0; i<4; i++)
The misprediction rate (increases/decreases/stays the same) if the loop is re-executed.
[Figure: the branch PC indexes a Branch Prediction Buffer; each entry is a 1-bit N/T state]
![Page 60: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/60.jpg)
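A#2 The predictor will predict the 1st branch not taken, and it will predict the 2nd, 3rd, 4th, and 5th branches taken. The 1st and last predictions will be incorrect, so the misprediction rate is 2/5 = 40%.
The misprediction rate stays the same for all subsequent runs of the loop, because the final not-taken outcome leaves the predictor in state N, so every run starts exactly like the first one.
for (i=0; i<4; i++)

| Iteration | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Actual | T | T | T | T | N |
| Prediction | N | T | T | T | T |

[Figure: the branch PC indexes a Branch Prediction Buffer; each entry is a 1-bit N/T state]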
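The 1-bit behavior can be simulated directly. This is a sketch in which `True` means taken and, as in A#2, the loop branch is assumed to produce T T T T N on every run; the function name is hypothetical.

```python
def run_1bit(outcomes, state=False):
    """1-bit predictor: predict the most recent outcome (True = taken).

    Returns (predictions, mispredictions, final_state).
    """
    predictions, misses = [], 0
    for taken in outcomes:
        predictions.append(state)
        if state != taken:
            misses += 1
        state = taken  # only the last outcome is remembered
    return predictions, misses, state


# Loop branch of for (i=0; i<4; i++): taken 4 times, then not taken.
loop = [True, True, True, True, False]
```

Feeding the final state of one run into the next shows the same 2 mispredictions out of 5 each time.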
![Page 61: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/61.jpg)
Examples: how often is the branch outcome different from the previous outcome?

| Branch | Outcome pattern | Outcome != previous outcome | Prediction rate |
|---|---|---|---|
| DC08 | TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT … (100,000 iterations) | 2 / 100,000 | 99.998% |
| DC44 | TTTTT ... TNTTTTT … TNTTTTT … | 2 / 100 | 98.0% |
| DC50 | TNTNTNTNTNTNTNTNTNTNTNTNTNTNT … | 2 / 2 | 0.0% |
© Murali Annavaram, Gabe Loh & Gary Tyson, All rights reserved
![Page 62: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/62.jpg)
Brandon Franzke, USC 2006
Use two-bit history
• 2-bit history
– Start as strongly not taken
– Update BPB after every branch execution
[Figure: the branch PC indexes a Branch Prediction Buffer whose entries hold one of four states: SN, N, T, ST]
© Murali Annavaram, Gabe Loh & Gary Tyson, All rights reserved
![Page 63: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/63.jpg)
TWO-BIT PREDICTOR
2-BIT UP-DOWN SATURATING COUNTER IN EACH ENTRY OF THE BPB
TAKEN ==> ADD 1; UNTAKEN ==> SUBTRACT 1
NOW IT TAKES 2 MISPREDICTIONS IN A ROW TO CHANGE THE PREDICTION
FOR THE NESTED LOOP, THE MISPREDICTION AT ENTRY IS AVOIDED
COULD HAVE MORE THAN 2 BITS, BUT TWO BITS COVER MOST PATTERNS (LOOPS)
[State diagram (U: Untaken, T: Taken): 00 = Strongly Not Taken (SN), predict U; 01 = Not Taken (N), predict U; 10 = Taken (T), predict T; 11 = Strongly Taken (ST), predict T. A taken outcome moves one state toward ST; an untaken outcome moves one state toward SN.]
EE557 Michel Dubois USC 2007
![Page 64: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/64.jpg)
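• Q#3 Show the states and predictions for 2 runs of the loop shown in Q#2 using the 2-bit pattern history predictor.

First run:

| Iteration | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Actual | T | T | T | T | N |
| State | SN | | | | |
| Prediction | N | | | | |

Second run:

| Iteration | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Actual | T | T | T | T | N |
| State | | | | | |
| Prediction | | | | | |

States: SN, N, T, ST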
![Page 65: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/65.jpg)
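• A#3 The 2-bit predictor works better than the 1-bit predictor after the initial training period: it mispredicts 3 of 5 branches (60%) in the first run but only 1 of 5 (20%) in the second run, versus 40% on every run for the 1-bit predictor. We can improve the initial training period by starting in the T state.

First run:

| Iteration | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Actual | T | T | T | T | N |
| State | SN | N | T | ST | ST |
| Prediction | N | N | T | T | T |

Second run:

| Iteration | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Actual | T | T | T | T | N |
| State | T | ST | ST | ST | ST |
| Prediction | T | T | T | T | T |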
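A sketch of the 2-bit saturating counter that reproduces the states and predictions for both runs. It uses the encoding 0 = SN, 1 = N, 2 = T, 3 = ST, matching the 00/01/10/11 states of the two-bit predictor slide; the function name is hypothetical.

```python
STATE_NAMES = ["SN", "N", "T", "ST"]  # counter values 0..3 (00, 01, 10, 11)


def run_2bit(outcomes, counter=0):
    """2-bit up/down saturating counter; predicts taken when counter >= 2.

    Returns (states, predictions, final_counter); states are recorded
    before each branch is predicted.
    """
    states, predictions = [], []
    for taken in outcomes:
        states.append(STATE_NAMES[counter])
        predictions.append("T" if counter >= 2 else "N")
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return states, predictions, counter


loop = [True, True, True, True, False]  # actual outcomes: T T T T N
```

Passing the final counter of the first run into the second run gives the second-run row.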
![Page 66: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/66.jpg)
Q#4 (Global / Local) predictors make use of the PC, while (global / local) predictors do not.
![Page 67: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/67.jpg)
A#4 Local (also known as per-address) predictors make use of the PC to distinguish between different branch instructions. Global predictors do not.
![Page 68: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/68.jpg)
Correlating Branches
(2,2) predictor
– Behavior of recent branches selects between four predictions of the next branch, updating just that prediction
[Figure: the branch address and a 2-bit global branch history select one of four 2-bit-per-branch predictors, which supplies the prediction]
CS252 UC Berkeley David A. Patterson
![Page 69: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/69.jpg)
• Q#5 Two-Level Prediction:
• Given the following branch history / pattern history predictor:
– 2-bit global branch history register (shift-left)
– 3 bits of the PC used to access the pattern history table
– All predictors are 2-bit predictors
– Instruction width = 32 bits
– Assume the next branch instruction is at PC = 8004, and it will be taken eventually.
• On the following page:
– Provide the bits of the PC used by the predictor.
– Indicate if the prediction is taken/not taken.
– Show any changes to the branch history register and pattern history table after the branch taken outcome info is provided.
![Page 70: Out-of-Order Execution, Exception, Branch Prediction, CMP](https://reader031.vdocuments.mx/reader031/viewer/2022012103/616a0d9411a7b741a34e4a92/html5/thumbnails/70.jpg)
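PC bits used: A__ - A__
Branch History Register (BHR): 01
Pattern History Table (row selected by the PC bits, column selected by the BHR):

| PC bits \ BHR | 00 | 01 | 10 | 11 |
|---|---|---|---|---|
| 000 | 00 | 10 | 11 | 10 |
| 001 | 11 | 10 | 01 | 01 |
| 010 | 01 | 01 | 01 | 11 |
| 011 | 00 | 01 | 00 | 10 |
| 100 | 00 | 10 | 11 | 10 |
| 101 | 11 | 10 | 01 | 01 |
| 110 | 01 | 01 | 01 | 11 |
| 111 | 00 | 01 | 00 | 10 |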
BHR = 01
PC bits used: A4 - A2 = 001 (A1-A0 are not used: 32-bit instructions are word-aligned, so these bits are always 00)
Pattern History Table: same as on the question slide; PC bits 001 select row 001.
A#5: 8004H => PC bits A4-A2 = 001; with BHR = 01, the selected pattern history entry is in state 10 => Predict T (Taken)
This branch is eventually taken, as predicted. Hence:
• The branch history register shifts left and a 1 is shifted in, so it changes from 01 to 11.
• The selected pattern changes from state 10 to state 11 (refer to the 2-bit predictor state diagram).
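The steps of A#5 can be replayed in a short script. This is a sketch under assumptions: the table columns are taken to be ordered by BHR value 00, 01, 10, 11 and the rows by PC bits 000 to 111, and the 2-bit predictor is modeled as a saturating counter (only the 10 to 11 transition on "taken" matters here). The names `PHT`, `pht_row`, `predict` and `update` are hypothetical.

```python
# Sketch replaying Q#5 (column/row ordering and counter model are assumptions).

PHT = [
    [0b00, 0b10, 0b11, 0b10],  # row 000
    [0b11, 0b10, 0b01, 0b01],  # row 001
    [0b01, 0b01, 0b01, 0b11],  # row 010
    [0b00, 0b01, 0b00, 0b10],  # row 011
    [0b00, 0b10, 0b11, 0b10],  # row 100
    [0b11, 0b10, 0b01, 0b01],  # row 101
    [0b01, 0b01, 0b01, 0b11],  # row 110
    [0b00, 0b01, 0b00, 0b10],  # row 111
]
bhr = 0b01  # 2-bit global branch history register


def pht_row(pc: int) -> int:
    # 32-bit instructions are word-aligned, so skip A1-A0 and use A4-A2.
    return (pc >> 2) & 0b111


def predict(pc: int) -> bool:
    return PHT[pht_row(pc)][bhr] >= 0b10  # states 10 and 11 predict taken


def update(pc: int, taken: bool) -> None:
    global bhr
    row, col = pht_row(pc), bhr
    if taken:
        PHT[row][col] = min(PHT[row][col] + 1, 0b11)
    else:
        PHT[row][col] = max(PHT[row][col] - 1, 0b00)
    bhr = ((bhr << 1) | int(taken)) & 0b11  # shift left, shift in outcome


pc = 0x8004
print(f"{pht_row(pc):03b}", predict(pc))    # 001 True
update(pc, taken=True)
print(f"{bhr:02b}", f"{PHT[1][0b01]:02b}")  # 11 11
```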
Q#6 Is the following statement true or false? Explain.
“A predictor with more bits can always achieve a better performance”
A#6: False. More bits often just increase training (warm-up) time, which reduces accuracy for short loops. More bits also mean more hysteresis, so the predictor is slower to adapt when a branch changes its behavior.
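The hysteresis effect can be illustrated with a single n-bit saturating counter. This is a minimal sketch: the starting state (weakly not taken) and the test pattern (6 taken, 6 not taken, 6 taken) are assumptions chosen for illustration, and other patterns can favor more bits. The function name `mispredictions` is hypothetical.

```python
# Sketch: count mispredictions of one n-bit saturating counter.

def mispredictions(outcomes, n_bits: int) -> int:
    """Assumption: the counter starts in the weakly-not-taken state
    2**(n_bits - 1) - 1 and predicts taken when counter >= 2**(n_bits - 1)."""
    top = (1 << n_bits) - 1        # strongly taken
    threshold = 1 << (n_bits - 1)  # lowest state that predicts taken
    counter = threshold - 1
    misses = 0
    for taken in outcomes:
        if (counter >= threshold) != taken:
            misses += 1
        counter = min(counter + 1, top) if taken else max(counter - 1, 0)
    return misses


# A branch whose behavior changes: 6 taken, 6 not taken, 6 taken.
pattern = [True] * 6 + [False] * 6 + [True] * 6
print(mispredictions(pattern, 2))  # 5
print(mispredictions(pattern, 3))  # 8
```

The 3-bit counter needs more consecutive outcomes to cross its threshold after each change, so it mispredicts more often on this pattern.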
Q#7 With a branch target buffer, the address of the next instruction can be predicted while the branch is in _____ (IF/ID/EX/MEM/WB) stage.
A#7: IF stage. The branch target buffer compares the PC of the instruction being fetched against the PCs of known predicted-taken branches and, on a hit, supplies the predicted target as the next fetch address. Since only PCs are compared, the instruction does not have to be decoded. For correctly predicted branches, this results in a zero-clock-cycle penalty.
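The IF-stage lookup can be sketched as a table keyed by PC. This is a minimal model under assumptions: the BTB holds only predicted-taken branches, is indexed by the full PC, and is represented as an unbounded dictionary (a real BTB is a finite hardware table). The function `next_fetch_pc` and the 8100H target are hypothetical.

```python
# Minimal sketch of a branch target buffer lookup during instruction fetch.

def next_fetch_pc(pc: int, btb: dict) -> int:
    """Choose the next fetch address from the PC alone (no decoding).
    Hit: predicted-taken branch, fetch its stored target.
    Miss: fall through to the next sequential 32-bit instruction."""
    return btb.get(pc, pc + 4)


btb = {0x8004: 0x8100}  # hypothetical entry: branch at 8004H -> target 8100H
print(hex(next_fetch_pc(0x8004, btb)))  # 0x8100
print(hex(next_fetch_pc(0x8008, btb)))  # 0x800c
```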
CMP (Chip Multiprocessors)
Q#1 Uniprocessor pipelines (with no multithreading) are constrained by ___________ level parallelism.
Q#2 Dynamic power considerations favor ____ (Uniprocessor / Parallel Processor).
A#1 Uniprocessor pipelines (with no multithreading) are constrained by instruction-level parallelism (ILP).
A#2 Dynamic power considerations favor parallel processors.
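A worked example using the standard dynamic power relation $P = \alpha C V^2 f$ shows why. The numbers are assumptions for illustration: two cores each run at half the clock rate with the supply voltage lowered to $0.8V$, and the workload is assumed to split perfectly across both cores so throughput is unchanged.

```latex
\begin{aligned}
P_{\text{uni}}  &= \alpha C V^2 f \\
P_{\text{dual}} &= 2 \cdot \alpha C (0.8V)^2 \left(\tfrac{f}{2}\right)
                 = 0.64\,\alpha C V^2 f = 0.64\,P_{\text{uni}}
\end{aligned}
```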
Q#3a Which type of processor multithreading needs a context switch through the Process Control Block?
a. Software multithreading
b. Hardware multithreading
Q#3b Which has a high context-switching overhead?
a. Software multithreading
b. Hardware multithreading
A#3a a. Software multithreading. The operating system saves and restores each thread's context through its Process Control Block, whereas hardware multithreading keeps multiple thread contexts (PC, registers) in hardware.
A#3b a. Software multithreading.
Q#4 Does Niagara have a cache coherence issue? If yes, at which level of cache?
A#4: Yes, at the L1 level, because each core's L1 cache is private (not shared with the other cores).
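Why private L1 caches create a coherence issue can be shown with a toy model. This is not Niagara's actual protocol: the class `ToyCMP`, the write-through policy, and the simple invalidate-on-write step are illustrative assumptions. Two cores share an L2 but keep private L1 copies.

```python
# Toy model: two cores with private L1 caches over a shared L2.

class ToyCMP:
    def __init__(self, invalidate: bool):
        self.l2 = {"x": 0}          # shared L2: one copy per address
        self.l1 = [dict(), dict()]  # one private L1 per core
        self.invalidate = invalidate

    def read(self, core: int, addr: str) -> int:
        if addr not in self.l1[core]:  # L1 miss: fill from the shared L2
            self.l1[core][addr] = self.l2[addr]
        return self.l1[core][addr]

    def write(self, core: int, addr: str, value: int) -> None:
        self.l1[core][addr] = value
        self.l2[addr] = value  # write-through to L2 (assumption)
        if self.invalidate:    # drop the other cores' L1 copies
            for other, cache in enumerate(self.l1):
                if other != core:
                    cache.pop(addr, None)


def core1_sees(invalidate: bool) -> int:
    chip = ToyCMP(invalidate)
    chip.read(1, "x")       # core 1 caches x = 0 in its private L1
    chip.write(0, "x", 42)  # core 0 writes x = 42
    return chip.read(1, "x")


print(core1_sees(False))  # 0  (stale copy in core 1's L1)
print(core1_sees(True))   # 42 (stale copy was invalidated)
```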
Q#5a Is L1 cache shared across cores?
Q#5b Is L1 cache shared (used) by the different threads running on a single core?
A#5a No. Each core has its own private L1 cache.
A#5b Yes. The threads running on a core share that core's L1 cache.
• Q#6 Uniprocessors place a greater burden on (hardware / software) designers, while parallel processors place a greater burden on (hardware / software) designers.
• A#6 Uniprocessors place a greater burden on hardware designers, while parallel processors place a greater burden on software designers.