processing control transfer instructions chapter no. 8 by najma ismat
![Page 1: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/1.jpg)
PROCESSING CONTROL TRANSFER INSTRUCTIONS
Chapter No. 8
By
Najma Ismat
![Page 2: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/2.jpg)
Control Transfer Instructions
Data hazards are a big enough problem that a lot of resources have been devoted to overcoming them. Unfortunately, the real obstacle and limiting factor in maintaining a good rate of execution in a pipeline is control dependencies.
Branches are about 1 out of every 5 or 6 instructions. In an n-issue processor, they arrive n times faster.
A “control dependence” determines the ordering of an instruction with respect to a branch instruction, so that the non-branch instruction is executed only when it should be.
![Page 3: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/3.jpg)
Control Transfer Instructions
If an instruction is control dependent on a branch, it cannot be moved before the branch
Control dependencies make sure instructions execute in order
Control dependencies preserve dataflow: instructions that produce results and instructions that consume them get the right data at the right time
![Page 4: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/4.jpg)
How Can Control Transfer Instructions Be Defined?
Instructions are normally fetched and executed from sequential memory locations
PC is the address of the current instruction, and nPC is the address of the next instruction (nPC = PC + 4)
Branches and other control transfer instructions change nPC to something else
Branches modify, conditionally or unconditionally, the value of the PC.
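The PC/nPC rule above can be sketched in a few lines of Python. This is only an illustration, assuming 4-byte instructions as on the slide; the function name `next_pc` and its parameters are hypothetical and not part of the original material.

```python
INSTRUCTION_SIZE = 4  # bytes per instruction, matching nPC = PC + 4


def next_pc(pc, is_branch=False, taken=False, target=None):
    """Return nPC for the instruction at address pc.

    Sequential instructions and not-taken branches fall through to pc + 4.
    An unconditional branch is modelled as a branch that is always taken.
    """
    if is_branch and taken:
        return target
    return pc + INSTRUCTION_SIZE


sequential = next_pc(0x10)
taken_branch = next_pc(0x14, is_branch=True, taken=True, target=0x24)
untaken_branch = next_pc(0x14, is_branch=True, taken=False, target=0x24)
```

A conditional branch differs from an unconditional one only in that `taken` depends on a run-time condition.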
![Page 5: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/5.jpg)
Types of Branches
![Page 6: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/6.jpg)
Unconditional Branches
Form: jmp address
Address  Instruction
10  i1
14  jmp 24
18  i3
1c  i4
20  i5
24  i6
28  i7
2c  i8
30  jmp 20
34  i10
![Page 7: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/7.jpg)
Conditional jumps
Address  Instruction  Basic block
10  i1  block 1
14  jle 24  block 1
18  i3  block 2
1c  i4  block 2
20  jmp 2c  block 2
24  i6  block 3
28  i7  block 3
2c  i8  block 4
30  i9  block 4
34  i10  block 4
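The basic-block split on this slide can be reproduced with the usual "leader" rule: the first instruction, every branch target, and every instruction following a branch start a new block. The sketch below assumes the slide's listing reads i1, jle 24, i3, i4, jmp 2c, i6 ... i10 at hexadecimal addresses 10 through 34. The function name `basic_blocks` and the tuple layout are hypothetical.

```python
def basic_blocks(program):
    """Split a program into basic blocks.

    program: list of (address, mnemonic, target) tuples in address order;
    target is None for non-branch instructions.
    """
    addresses = [addr for addr, _, _ in program]
    leaders = {addresses[0]}  # the first instruction starts a block
    for i, (_, _, target) in enumerate(program):
        if target is not None:
            leaders.add(target)  # a branch target starts a block
            if i + 1 < len(program):
                leaders.add(addresses[i + 1])  # so does the instruction after a branch

    blocks, current = [], []
    for addr, mnemonic, _ in program:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(mnemonic)
    if current:
        blocks.append(current)
    return blocks


program = [
    (0x10, "i1", None), (0x14, "jle 24", 0x24), (0x18, "i3", None),
    (0x1C, "i4", None), (0x20, "jmp 2c", 0x2C), (0x24, "i6", None),
    (0x28, "i7", None), (0x2C, "i8", None), (0x30, "i9", None),
    (0x34, "i10", None),
]
blocks = basic_blocks(program)
```

Under these assumptions the program falls into four blocks, starting at addresses 10, 18, 24 and 2c.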
![Page 8: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/8.jpg)
How Do Architectures Check the Results of Operations?
![Page 9: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/9.jpg)
Result State Concept
Architectures that support the result state approach include IBM/360 and 370, PDP-11, VAX, x86, Pentium, MC 68000, SPARC and PowerPC
Generating the result state requires additional chip area
Implementations for VLIW and superscalar architectures require appropriate mechanisms to avoid multiple or out-of-order updating of the result state
Multiple sets of flags or condition codes can be used
![Page 10: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/10.jpg)
Example (Result State Concept)
add r1, r2, r3 // r1 <- r2 + r3
beq zero // test for result equal to zero and, if 'yes', branch to location zero
div r5, r4, r1 // r5 <- r4 / r1
.
.
.
zero: // processing the case if divisor equals zero
![Page 11: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/11.jpg)
Example (Result State Concept)
teq r1 // test for (r1)=0 and update result state accordingly
beq zero // test for result equal to zero and, if yes, branch to the location zero
div r5, r4, r1 // r5 <- r4 / r1
.
.
.
zero: // processing the case if divisor equals zero
![Page 12: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/12.jpg)
The Direct Check Concept
Direct checking of a condition and branching on it can be implemented in architectures in two ways:
Use two separate instructions
First the result value is checked by a compare instruction, and the result of the compare is stored in the appropriate register
Then a conditional branch instruction tests the deposited outcome and branches to the given location if the specified condition is met
Use a single instruction
A single instruction fulfils both testing and conditional branching
![Page 13: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/13.jpg)
Example (Use Two Separate Instructions)
add r1, r2, r3; // r1 <- r2 + r3
cmpeq r7, r1; // r7 <- true, if (r1)=0, else NOP
bt r7, zero // branch to 'zero' if (r7)=true, else NOP
div r5, r4, r1 // r5 <- r4 / r1
.
.
.
zero:
![Page 14: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/14.jpg)
Example (Use Single Instruction)
add r1, r2, r3 // r1<- r2 + r3
beq r1, zero // test for (r1)=0 and branch if true
div r5, r4, r1 // r5 <- r4 / r1
.
.
.
zero:
![Page 15: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/15.jpg)
Branch Statistics
Branch frequency severely affects how much parallelism can be achieved or extracted from a program
About 20% of general-purpose code consists of branches: on average, every fifth instruction is a branch
5-10% of scientific code consists of branches
The majority of branches are conditional (80%)
75-80% of all branches are taken
![Page 16: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/16.jpg)
Branch Statistics (taken/not taken)
![Page 17: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/17.jpg)
Branch Problem
![Page 18: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/18.jpg)
Branch Problem in Case of Pipelining (Unconditional Branch)
![Page 19: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/19.jpg)
Performance Measures of Branch Processing
![Page 20: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/20.jpg)
Performance Measures of Branch Processing
To evaluate and compare branch processing schemes, a performance measure called the branch penalty is used
Branch penalty: the number of additional delay cycles occurring until the target instruction is fetched, over the natural 1-cycle delay
The effective branch penalty P, considering both taken and not-taken branches, is:
P = ft * Pt + fnt * Pnt
![Page 21: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/21.jpg)
Performance Measures of Branch Processing
Where:
Pt: branch penalty for taken branches
Pnt: branch penalty for not-taken branches
ft: frequency of taken branches
fnt: frequency of not-taken branches
e.g. 80386: Pt = 8 cycles, Pnt = 2 cycles, therefore
P = 0.75 * 8 + 0.25 * 2 = 6.5 cycles
e.g. i486: Pt = 2 cycles, Pnt = 0 cycles, therefore
P = 0.75 * 2 + 0.25 * 0 = 1.5 cycles
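The two worked examples can be checked with a short Python sketch of P = ft * Pt + fnt * Pnt. The function name `effective_branch_penalty` is hypothetical, and the sketch assumes fnt = 1 - ft, which matches the 0.75 / 0.25 split used on the slide.

```python
def effective_branch_penalty(f_taken, p_taken, p_not_taken):
    """Effective branch penalty P = ft * Pt + fnt * Pnt, assuming fnt = 1 - ft."""
    f_not_taken = 1.0 - f_taken
    return f_taken * p_taken + f_not_taken * p_not_taken


# Penalty figures taken from the slide (cycles).
p_80386 = effective_branch_penalty(0.75, p_taken=8, p_not_taken=2)
p_i486 = effective_branch_penalty(0.75, p_taken=2, p_not_taken=0)
print(p_80386, p_i486)
```

Because most branches are taken, the taken-branch penalty Pt dominates the result in both cases.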
![Page 22: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/22.jpg)
Performance Measures of Branch Processing
The effective branch penalty with branch prediction, for correctly predicted and mispredicted branches, is:
P = fc * Pc + fm * Pm
e.g. In the Pentium, the penalty for correctly predicted branches is 0 cycles and the penalty for mispredicted branches is 3 cycles in the U pipe or 4 cycles in the V pipe (3.5 cycles on average). With a prediction accuracy of 0.9:
P = 0.9 * 0 + 0.1 * 3.5 = 0.35 cycles
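To see how strongly the effective penalty depends on prediction accuracy, the formula P = fc * Pc + fm * Pm can be evaluated for a few accuracies. This is a sketch only: the function name is hypothetical, fm is assumed to be 1 - fc, and the 3.5-cycle misprediction penalty is the figure used in the calculation above.

```python
def predicted_branch_penalty(accuracy, p_correct, p_mispredicted):
    """Effective penalty P = fc * Pc + fm * Pm, assuming fm = 1 - fc."""
    return accuracy * p_correct + (1.0 - accuracy) * p_mispredicted


# Pentium-style figures from the slide: Pc = 0 cycles, Pm = 3.5 cycles.
penalties = {
    accuracy: predicted_branch_penalty(accuracy, 0, 3.5)
    for accuracy in (0.80, 0.90, 0.95)
}
for accuracy, penalty in penalties.items():
    print(f"accuracy {accuracy:.2f}: {penalty:.3f} cycles")
```

With Pc = 0, the effective penalty is directly proportional to the misprediction rate, so halving the misprediction rate halves the penalty.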
![Page 23: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/23.jpg)
Zero-cycle Branching (Branch Folding)
Refers to branch implementations which allow branches to be executed with a one-cycle gain compared to sequential execution
The instruction logically following the branch is executed immediately after the instruction which precedes the branch
This scheme is implemented using a BTAC (branch target address cache)
![Page 24: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/24.jpg)
Zero-cycle Branching
![Page 25: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/25.jpg)
Basic Approaches to Branch Handling
![Page 26: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/26.jpg)
Delayed Branch
A branch delay slot is a single-cycle delay that comes after a conditional branch instruction has begun execution, but before the branch condition has been resolved and the branch target address has been computed. It is a feature of several RISC designs, such as the SPARC
![Page 27: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/27.jpg)
Delayed Branch
Assume the branch target address (BTA) is available at the end of the decode stage and the branch target instruction (BTI) can be fetched from the cache in a single cycle (during the execution stage)
In delayed branching, the instruction following the branch is executed in the delay slot
Delayed branching can be considered a scheme applicable to branches in general, irrespective of whether they are unconditional or conditional
![Page 28: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/28.jpg)
Delayed Branch
![Page 29: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/29.jpg)
Delayed Branch
![Page 30: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/30.jpg)
Example (Delayed Branch)
![Page 31: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/31.jpg)
Performance Gain (Delayed Branch)
60-70% of the delay slots can be filled with useful instructions
Fill only with an instruction that can be put in the delay slot without violating data dependencies
Fill only with an instruction that can be executed in a single pipeline cycle
The ratio of the delay slots that can be filled with useful instructions is ff
The frequency of branches is fb
20-30% for general-purpose programs
5-10% for scientific programs
![Page 32: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/32.jpg)
Performance Gain (Delayed Branch)
Delay slot utilization is nm
nm = no. of instructions * fb * ff
n instructions have n * fb delay slots; therefore 100 instructions have 100 * fb delay slots, of which
nm = 100 * fb * ff can be utilized
Performance gain is Gd
Gd = nm / no. of instructions = (100 * fb * ff) / 100 = fb * ff
![Page 33: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/33.jpg)
Example (Performance Gain in Delayed Branch)
Suppose there are 100 instructions, on average 20% of all executed instructions are branches and 60% of the delay slots can be filled with instructions other than NOPs. What is performance gain in this case?
nm = no. of instructions * fb * ff
nm = 100 * 0.2 * 0.6 = 12 delay slots
Gd = nm / no. of instructions = fb * ff
Gd = nm / 100 = 12/100
Gd = 12%
Gdmax = fb * ff (with ff = 1, meaning each slot can be filled with a useful instruction)
Gdmax = fb (where fb is the ratio of branches)
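The worked example can be reproduced with a small Python sketch. The function name `delay_slot_gain` is hypothetical, and it assumes one delay slot per branch, as in the calculation above.

```python
def delay_slot_gain(n_instructions, f_branch, f_fill):
    """Return (nm, Gd): the number of usefully filled delay slots and the gain.

    Assumes one delay slot per branch, so n instructions have n * fb slots.
    """
    n_filled = n_instructions * f_branch * f_fill  # nm
    gain = n_filled / n_instructions               # Gd = fb * ff
    return n_filled, gain


n_m, g_d = delay_slot_gain(100, f_branch=0.2, f_fill=0.6)
_, g_d_max = delay_slot_gain(100, f_branch=0.2, f_fill=1.0)
print(f"nm = {n_m:.0f} slots, Gd = {g_d:.0%}, Gdmax = {g_d_max:.0%}")
```

The gain does not depend on the instruction count, only on fb and ff, which is why Gdmax is capped by the branch ratio fb.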
![Page 34: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/34.jpg)
Delayed Branch Pros and Cons
Pros: Low hardware cost
Cons: Depends on the compiler to fill delay slots
Ability to fill delay slots drops as the number of slots increases
Exposes implementation details to the compiler
Can't change the pipeline without breaking software compatibility
Interrupt processing becomes more difficult
Can't be added to an existing architecture while retaining compatibility, so the architecture needs to be redefined
![Page 35: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/35.jpg)
Design Space of Delayed Branching
Delayed Branching
Multiplicity of delay slots
Most architectures
MIPS-X (1996)
Annulment of an instruction in the delay slot
![Page 36: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/36.jpg)
![Page 37: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/37.jpg)
Kinds of Annulment
annul delay slot if branch is not taken
annul delay slot if branch is taken
![Page 38: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/38.jpg)
Design Space of Branch Processing
![Page 39: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/39.jpg)
Branch Detection Schemes
Master pipeline approach
Branches are detected and processed in a unified instruction processing scheme
Early branch detection
In parallel branch detection (Figure 8-16)
Branches are detected in parallel with the decode of other instructions using a dedicated branch decoder
Look-ahead branch detection
Branches are detected from the instruction buffer but ahead of general instruction decoding
Integrated fetch and branch detection
Branches are detected during instruction fetch
![Page 40: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/40.jpg)
![Page 41: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/41.jpg)
![Page 42: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/42.jpg)
Blocking Branch Processing
Execution of a conditional branch is simply stalled until the specified condition can be resolved
![Page 43: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/43.jpg)
Speculative Branch Processing
Predict branches and speculatively execute instructions
Correct prediction: no performance loss
Incorrect prediction: squash speculative instructions
It involves three key aspects:
Branch prediction scheme
Extent of speculativeness
Recovery from misprediction
![Page 44: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/44.jpg)
Speculative Branch Processing
Basic Idea: Predict which way branch will go, start executing down that path
![Page 45: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/45.jpg)
Branch Prediction
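Example:
if (x > 0) {
a=0; b=1; c=2;
}
d=3;

When x>0 (the speculatively fetched instructions complete):

| Cycle | Fetch | Decode | Execute | Save |
|---|---|---|---|---|
| 1 | if (x>0) | | | |
| 2 | a=0 | if (x>0) | | |
| 3 | b=1 | a=0 | if (x>0) | |
| 4 | c=2 | b=1 | a=0 | if (x>0) |
| 5 | | c=2 | b=1 | a=0 |
| 6 | | | c=2 | b=1 |
| 7 | | | | c=2 |

When x<0 (a=0 and b=1 were fetched speculatively and are squashed):

| Cycle | Fetch | Decode | Execute | Save |
|---|---|---|---|---|
| 1 | if (x>0) | | | |
| 2 | a=0 | if (x>0) | | |
| 3 | b=1 | a=0 | if (x>0) | |
| 4 | d=3 | b=1 (squash) | a=0 (squash) | if (x>0) |
| 5 | | d=3 | b=1 (squash) | a=0 (squash) |
| 6 | | | d=3 | b=1 (squash) |
| 7 | | | | d=3 |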
Predicting x<0
![Page 46: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/46.jpg)
Branch Prediction Schemes
![Page 47: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/47.jpg)
![Page 48: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/48.jpg)
Comparison Between Taken /Not Taken Approach
![Page 49: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/49.jpg)
Static Branch Prediction
![Page 50: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/50.jpg)
Dynamic Branch Prediction
![Page 51: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/51.jpg)
Dynamic Branch Prediction
Explicit dynamic technique (based on history bits)
1-bit history
2-bit history
3-bit history
Implicit dynamic technique (presence of an entry for a predicted branch target access path)
BTAC
BTIC
![Page 52: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/52.jpg)
1-bit Branch History
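[State diagram: two states, Taken (1) and Not Taken (0). A taken branch (T) leads to the Taken state, a not-taken branch (NT) leads to the Not Taken state.]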
![Page 53: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/53.jpg)
1-bit Branch History
A single bit per branch is used to express whether the last occurrence of the branch was taken (T) or not taken (NT)
The Alpha 21064 and R8000 processors use a single-bit prediction scheme
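The single-bit scheme can be modelled in a few lines of Python. This is a behavioural sketch, not a description of any particular processor: the class name, the initial state and the sample branch pattern (a loop branch taken four times and then not taken, run twice) are all assumptions made for illustration.

```python
class OneBitPredictor:
    """Predicts that a branch behaves as it did on its last occurrence."""

    def __init__(self, initial_taken=False):
        self.last_taken = initial_taken  # the single history bit

    def predict(self):
        return self.last_taken

    def update(self, taken):
        self.last_taken = taken


def count_mispredictions(predictor, outcomes):
    """Run the predictor over a sequence of actual outcomes (True = taken)."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical loop branch: taken 4 times, not taken on exit, executed twice.
outcomes = [True, True, True, True, False] * 2
misses = count_mispredictions(OneBitPredictor(), outcomes)
print(misses)
```

In this model a loop branch is mispredicted twice per loop execution: once on loop exit and once again on the first iteration of the next run.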
![Page 54: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/54.jpg)
2-bit Branch History
[State diagram: four states, two "Predict Taken" and two "Predict Not Taken", connected by T (taken) and NT (not taken) transitions.]
BP state: (predict T/NT) x (last prediction right/wrong)
![Page 55: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/55.jpg)
2-bit Branch History
Operates like a four-state finite state machine
Uses run-time information to make the prediction
Changes the prediction only after two consecutive mistakes
Increment for taken, decrement for not-taken: 00, 01, 10, 11
A 2-bit predictor is almost as good as any general n-bit predictor
Used in the Alpha 21164A, Pentium, PowerPC 604 and 620, etc.
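The increment/decrement behaviour described above corresponds to a 2-bit saturating counter, sketched below in Python. The class name, the choice of states 10 and 11 as "predict taken", the initial state and the sample branch pattern are assumptions made for illustration, not details taken from any listed processor.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 00 and 01 predict not taken, 10 and 11 predict taken."""

    def __init__(self, state=0b11):
        self.state = state

    def predict(self):
        return self.state >= 0b10

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 0b11)  # increment, saturate at 11
        else:
            self.state = max(self.state - 1, 0b00)  # decrement, saturate at 00


# Hypothetical loop branch: taken 4 times, not taken on exit, executed twice.
outcomes = [True, True, True, True, False] * 2
predictor = TwoBitPredictor()
misses = 0
for taken in outcomes:
    if predictor.predict() != taken:
        misses += 1
    predictor.update(taken)
print(misses)
```

Starting from the strongly-taken state, a single not-taken outcome at loop exit does not flip the prediction, so only the exit itself is mispredicted in each loop execution.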
![Page 56: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/56.jpg)
3-bit Branch History
![Page 57: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/57.jpg)
3-bit Branch History
The outcomes of the last three occurrences of the branch are stored
The decision is made on a majority basis
Simpler than the 2-bit scheme, and results in similar accuracy
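The majority decision over the last three outcomes can be sketched as follows. The class name and the all-not-taken initial history are assumptions made for illustration.

```python
from collections import deque


class ThreeBitMajorityPredictor:
    """Stores the last three outcomes and predicts the majority."""

    def __init__(self, history=(False, False, False)):
        # maxlen=3 makes the deque drop the oldest outcome automatically.
        self.history = deque(history, maxlen=3)

    def predict(self):
        return sum(self.history) >= 2  # taken if at least 2 of the last 3 were taken

    def update(self, taken):
        self.history.append(taken)


predictor = ThreeBitMajorityPredictor()
predictions = []
for taken in (True, True, False, False):
    predictor.update(taken)
    predictions.append(predictor.predict())
print(predictions)
```

As with the 2-bit scheme, a single deviating outcome does not change the prediction once two of the stored outcomes agree.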
![Page 58: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/58.jpg)
Implicit Dynamic Techniques
BTIC (Branch Target Instruction Cache)
BTAC (Branch Target Address Cache)
Both schemes are used to access the branch target path and also for branch prediction
An extra cache is used which holds the most recently used branches and either the corresponding branch target addresses (in the BTAC) or the corresponding branch target instructions (in the BTIC)
For branch prediction, the BTAC and BTIC simply hold entries only for taken branches
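The implicit prediction idea, where the mere presence of an entry means "predict taken", can be sketched with a small Python model of a BTAC. The class name, the capacity, the least-recently-used replacement and the removal of an entry when its branch turns out not taken are all assumptions made for illustration. Real designs differ in organisation and replacement policy.

```python
from collections import OrderedDict


class BranchTargetAddressCache:
    """Toy BTAC: holds entries only for recently taken branches."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # branch address -> branch target address

    def lookup(self, branch_address):
        """Return the predicted target, or None to predict not taken."""
        target = self.entries.get(branch_address)
        if target is not None:
            self.entries.move_to_end(branch_address)  # mark as recently used
        return target

    def update(self, branch_address, taken, target=None):
        if taken:
            self.entries[branch_address] = target
            self.entries.move_to_end(branch_address)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        else:
            self.entries.pop(branch_address, None)  # keep only taken branches


btac = BranchTargetAddressCache()
first_lookup = btac.lookup(0x14)           # no entry yet: predict not taken
btac.update(0x14, taken=True, target=0x24)
second_lookup = btac.lookup(0x14)          # entry present: predict taken
```

A BTIC would follow the same pattern but store the target instruction itself instead of its address.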
![Page 59: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/59.jpg)
Implementation of History Bits
![Page 60: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/60.jpg)
Extent of Speculativeness
![Page 61: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/61.jpg)
Recovery from Misprediction
![Page 62: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/62.jpg)
Recovery from Misprediction
![Page 63: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/63.jpg)
![Page 64: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/64.jpg)
Multiway Branching
![Page 65: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/65.jpg)
Multiway Branching
Both the taken and the sequential paths of the unresolved conditional branch are pursued
Good for VLIW architectures
Higher demand for hardware resources
Maintaining sequential consistency and discarding superfluously executed computation is a complex and time-consuming job
Only experimental implementations are available, such as TRACE 500 and URPR-2
![Page 66: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/66.jpg)
Guarded Execution
A means to eliminate branches by conditional operate instructions
IF the condition associated with the instruction is met, THEN perform the specified operation, ELSE do not perform the operation (NOP)
Converts control dependencies into data dependencies
The conditional part is known as the guard part and the operational part is the instruction part
![Page 67: PROCESSING CONTROL TRANSFER INSTRUCTIONS Chapter No. 8 By Najma Ismat](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649c785503460f9492d0de/html5/thumbnails/67.jpg)
Guarded Execution
e.g. original
beq r1, label // if (r1) = 0 branch to label
move r2, r3 // move (r2) into r3
label: …
e.g. guarded
cmovne r1, r2, r3 // if (r1) != 0, move (r2) into r3
…
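The equivalence of the two versions can be illustrated in Python. This is only a model of the semantics: registers are passed as plain integers, the function names are hypothetical, and the arithmetic select in `guarded_version` stands in for the conditional move to show that the result depends on the guard as data rather than on control flow.

```python
def branch_version(r1, r2, r3):
    """Models: beq r1, label / move r2, r3 / label: ... Returns the final r3."""
    if r1 == 0:       # branch over the move
        return r3
    r3 = r2           # move (r2) into r3
    return r3


def guarded_version(r1, r2, r3):
    """Models: cmovne r1, r2, r3. Returns the final r3 without branching."""
    guard = int(r1 != 0)                   # guard part
    return guard * r2 + (1 - guard) * r3   # instruction part, selected by data


results = [
    (branch_version(r1, 7, 9), guarded_version(r1, 7, 9))
    for r1 in (0, 1, -3)
]
print(results)
```

Both versions leave r3 unchanged when r1 is zero and copy r2 into r3 otherwise.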