execution control comp
TRANSCRIPT
-
7/29/2019 Execution Control Comp
Execution Control
-
Execution Control
Execution control refers to the rules or
mechanisms used in a processor for
determining the next instruction to execute.
Several features of execution control:
hardware looping,
interrupt handling,
stacks, and
relative branch support.
-
Hardware Looping
DSP algorithms frequently involve the repetitive execution of a small number of instructions, so-called inner loops or kernels.
Problems associated with traditional approaches to repeated instruction execution:
A natural approach to looping uses a branch instruction to jump back to the start of the loop.
Because most loops execute a fixed number of times, the processor must usually use a register to maintain the loop index, that is, the count of the number of times the processor has been through the loop.
-
DSP processors have evolved to avoid these problems via hardware looping, also known as zero-overhead looping.
Hardware loops are special hardware control constructs that repeat either a single instruction or a group of instructions some number of times.
The key difference between hardware loops and software loops is that hardware loops lose no time incrementing or decrementing counters, checking to see if the loop is finished, or branching back to the top of the loop. This can result in considerable savings.
-
The software loop takes roughly three times
as long to execute, assuming that all
instructions execute in one instruction cycle.
In fact, branch instructions usually take several
cycles to execute, so the hardware looping
advantage is usually even larger.
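The factor of three can be reproduced with a small cycle-count model. This is only a rough sketch under stated assumptions: the software loop body is one useful instruction plus a decrement and a conditional branch, the hardware loop needs a single set-up instruction, and the function names and parameters are hypothetical rather than taken from any processor manual.

```python
def software_loop_cycles(iterations, body=1, decrement=1, branch=1, setup=1):
    """Cycles for a loop that maintains its own counter and branches back."""
    return setup + iterations * (body + decrement + branch)


def hardware_loop_cycles(iterations, body=1, setup=1):
    """Cycles for a zero-overhead loop: only the body repeats."""
    return setup + iterations * body


n = 16  # assumed repetition count for the comparison
sw = software_loop_cycles(n)
hw = hardware_loop_cycles(n)
print(sw, hw, round(sw / hw, 2))

# A branch that costs several cycles (3 is an assumed value) widens the gap.
sw_slow_branch = software_loop_cycles(n, branch=3)
print(sw_slow_branch, round(sw_slow_branch / hw, 2))
```

Changing the `branch` parameter shows how quickly the software loop's overhead grows once branches stop being single-cycle instructions.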
-
Types of Hardware Looping
Single-instruction hardware loops
Multi-instruction hardware loops
-
Single-Instruction Hardware Loops
A single-instruction hardware loop repeats a
single instruction some number of times.
Example: Texas Instruments TMS320C2x.
A multi-instruction hardware loop repeats a
group of instructions some number of times.
Example: Analog Devices ADSP-21xx.
-
Single-Instruction Hardware Loops
Because a single-instruction hardware loop executes one instruction repeatedly, the instruction needs to be fetched from program memory only once.
While the loop runs, the program bus can be used for accessing memory for purposes other than fetching an instruction, e.g., for fetching data or coefficient values that are stored in program memory.
-
Multi-Instruction Hardware Loops
A multi-instruction hardware loop must refetch the instructions in the block of code being repeated each time the processor proceeds through the loop.
As a result, the processor's program bus is not available to access other data.
-
Loop Repetition Count
A feature that differentiates processors' hardware looping capabilities is the minimum and maximum number of times a loop can be repeated.
Almost all processors support a minimum repetition count of one, and 65,536 is a common upper limit.
Frequently, the maximum number of repetitions is lower if the repetition count is specified using immediate data, simply because the processor may place restrictions on the size of immediate data words.
-
Loop Repetition Count
A hardware looping pitfall found on some processors (e.g., Motorola DSP5600x and Zoran ZR3800x) is the following: a loop count of zero causes the processor to repeat the loop the maximum number of times.
While this is not a problem for loops whose size is fixed, it can be a problem if the program dynamically determines the repetition count at run time.
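One way to picture the zero-count pitfall, and the guard a program needs when the count is computed at run time, is the sketch below. It is a behavioral model only: `MAX_REPEAT`, `hardware_repeat`, and `guarded_repeat` are hypothetical names, and the maximum of 65,536 is assumed for illustration rather than taken from the data sheet of either processor named above.

```python
MAX_REPEAT = 65536  # assumed upper limit, used here only for illustration


def hardware_repeat(count, body):
    """Model of a repeat unit where a count of zero means MAX_REPEAT passes."""
    passes = MAX_REPEAT if count == 0 else count
    for _ in range(passes):
        body()
    return passes


def guarded_repeat(count, body):
    """Skip the hardware loop entirely when the run-time count is zero."""
    if count == 0:
        return 0
    return hardware_repeat(count, body)


calls = []
unguarded = hardware_repeat(0, lambda: calls.append(1))  # runs MAX_REPEAT times
guarded = guarded_repeat(0, lambda: calls.append(1))     # runs zero times
print(unguarded, guarded, len(calls))
```

On a real processor the guard would be a test-and-branch around the loop, which costs a few cycles but only once per loop entry.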
-
Loop Effects on Interrupt Latency
Single-instruction hardware loops disable interrupts for the duration of their execution.
System designers making use of both single-instruction hardware loops and interrupts must carefully consider the maximum interrupt lockout time they can accept and code their single-instruction loops accordingly.
Alternatives: wrap the single instruction in a multi-instruction hardware loop, or break the single-instruction loop up into a number of smaller loops.
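Breaking a long single-instruction loop into smaller loops amounts to splitting the repetition count so that no chunk exceeds the acceptable lockout. The sketch below assumes one instruction cycle per repetition, so the lockout budget can be expressed directly as a repetition count; `split_repeat_count` and its parameters are hypothetical names.

```python
def split_repeat_count(total, max_lockout):
    """Split one long repeat into chunks of at most max_lockout repetitions."""
    if max_lockout < 1:
        raise ValueError("max_lockout must be at least 1")
    full, rest = divmod(total, max_lockout)
    chunks = [max_lockout] * full
    if rest:
        chunks.append(rest)
    return chunks


# 1000 repetitions with interrupts locked out for at most 256 cycles at a time.
print(split_repeat_count(1000, 256))
```

Interrupts can then be serviced between chunks, at the price of one extra loop set-up per chunk.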
-
Nesting Depth
A nested loop is one loop placed within
another.
The most common approaches to hardware
loop nesting are:
Directly nestable
Partially nestable
Software nestable
Nonnestable
-
Hardware-Assisted Software Loops
An alternative to nesting hardware loops is to nest a single hardware loop within a software loop that uses specialized instructions for software looping.
The AT&T DSP16xx and the Texas Instruments TMS320C3x and TMS320C4x support a decrement-and-branch-if-not-zero instruction.
While this instruction costs one instruction cycle on the TMS320C3x and TMS320C4x and three on the DSP16xx, the number of cycles is less than would be required to execute individual decrement, test, and branch instructions.
This is a reasonable compromise between the hardware cost required to support nested hardware loops and the execution time cost of implementing loops entirely in software.
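The trade-off can be estimated with a simple count of cycles for a hardware inner loop wrapped in a software outer loop. This is a sketch under assumptions: the inner loop body is a single one-cycle instruction, the inner loop needs one set-up instruction per outer pass, and `nested_loop_cycles` with its parameters is a hypothetical helper. The outer-loop control costs of one and three cycles are the figures quoted above; any larger value stands in for separate decrement, test, and branch instructions.

```python
def nested_loop_cycles(outer, inner, inner_setup=1, outer_control=1):
    """Cycles for a hardware inner loop inside a software outer loop.

    outer_control is the per-pass cost of the outer loop's bookkeeping.
    """
    per_pass = inner_setup + inner + outer_control
    return outer * per_pass


# One-cycle and three-cycle decrement-and-branch, as quoted above.
print(nested_loop_cycles(8, 16, outer_control=1))
print(nested_loop_cycles(8, 16, outer_control=3))
```

Because the outer-loop cost is paid once per outer pass rather than once per inner repetition, it matters less as the inner loop gets longer.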
-
Interrupts
An interrupt is an external event that causes the processor to
stop executing its current program and branch
to a special block of code called an interrupt
service routine.
All DSP processors support interrupts and
most use interrupts as their primary means of
communicating with peripherals.
-
Interrupt Sources
On-chip peripherals: generate interrupts when certain conditions are met.
External interrupt lines: asserted by external circuitry to interrupt the processor.
Software interrupts: generated either under software control or due to a software-initiated operation.
-
Interrupt Vectors
All processors associate a different memory address with each interrupt. These locations are called interrupt vectors.
Processors that provide different locations for each interrupt are said to support vectored interrupts.
Typical interrupt vectors are one or two words long and are located in low memory.
The interrupt vector usually contains a branch or subroutine call instruction that causes the processor to begin execution of the interrupt service routine located elsewhere in memory.
On some processors, interrupt vector locations are spaced apart by several words.
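With evenly spaced vectors, the location for a given interrupt follows from its number. The sketch below assumes a vector table starting at address 0 with a spacing of two words; both values and the function name `vector_address` are hypothetical and vary from processor to processor.

```python
def vector_address(index, base=0x0000, spacing=2):
    """Address of the vector for interrupt number index (assumed layout)."""
    return base + index * spacing


# Vectors 0..3 with two-word spacing, then the same vectors spaced four words apart.
print([hex(vector_address(i)) for i in range(4)])
print([hex(vector_address(i, spacing=4)) for i in range(4)])
```

A wider spacing leaves room for a very short service routine directly in the vector table instead of only a branch.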
-
Interrupt Enables
All processors provide mechanisms to globally disable interrupts.
On some processors, this may be via special interrupt enable and interrupt disable instructions, while on others it may involve writing to a special control register.
Most processors also provide individual interrupt enables for each interrupt source. This allows the processor to selectively decide which interrupts are allowed at any given time.
-
Interrupt Priorities and Automatically Nestable Interrupts
Prioritized interrupts:
Some interrupts have a higher priority than others.
When two interrupts occur at the same time, the one with the higher priority is serviced first, and
the one with the lower priority must wait for the processor to finish servicing the higher-priority interrupt.
When a processor allows a higher-priority interrupt to interrupt a lower-priority interrupt that is already executing, we call this automatically nestable interrupts.
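The two rules above, picking the highest-priority pending interrupt and letting only a higher priority preempt a running service routine, can be sketched as follows. The representation is an assumption made for illustration: interrupts are `(name, priority)` pairs, a larger number means a higher priority, and the names `next_to_service` and `preempts` are hypothetical.

```python
def next_to_service(pending):
    """Pick the pending interrupt with the highest priority (larger = higher)."""
    if not pending:
        return None
    return max(pending, key=lambda irq: irq[1])


def preempts(new_priority, running_priority):
    """With automatically nestable interrupts, only a higher priority preempts."""
    return new_priority > running_priority


pending = [("serial", 1), ("timer", 3), ("host", 2)]
print(next_to_service(pending))
print(preempts(3, 1), preempts(1, 3))
```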
-
Interrupt Latency
Interrupt latency is the amount of time between an interrupt occurring and the processor doing something in response to it.
Formal definition: the minimum time from the assertion of an external interrupt line to the execution of the first word of the interrupt vector that can be guaranteed under certain assumptions.
-
Interrupt Latency
The details of and assumptions used in this definition are as follows:
Most processors sample the status of external interrupt lines every instruction cycle. For an interrupt to be recognized as occurring in a given instruction cycle, the interrupt line must be asserted some amount of time prior to the start of the instruction cycle; this time is referred to as the set-up time.
We assume that these set-up time requirements are missed. This lengthens interrupt latency by one instruction cycle.
-
Interrupt Latency
Depending on the processor, synchronization can add from one to three instruction cycles to the processor's interrupt latency.
We assume the processor is in an interruptible state, which typically means that it is executing the shortest interruptible instruction possible.
-
Mechanisms for Reducing Interrupts
Some processors provide "autobuffering" on their serial ports.
This feature allows the serial port to save its received data directly to the processor's memory without interrupting the processor.
After a certain number of samples have been transferred, the serial port interrupts the processor.
This is a specialized form of DMA.
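The reduction in interrupt load is easy to quantify. The sketch below counts processor interrupts for a stream of received samples with and without autobuffering; `interrupts_raised` and the buffer length of 64 samples are hypothetical choices made for illustration.

```python
def interrupts_raised(samples, buffer_len=None):
    """Count processor interrupts for a stream of received samples.

    buffer_len=None models one interrupt per sample (no autobuffering);
    otherwise the serial port interrupts once per full buffer.
    """
    if buffer_len is None:
        return samples
    return samples // buffer_len


print(interrupts_raised(1024))      # one interrupt per sample
print(interrupts_raised(1024, 64))  # one interrupt per 64-sample buffer
```

The processor then handles each buffer as a block, which also amortizes the interrupt entry and exit overhead over many samples.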
-
Stacks
Processor stack support is closely tied to execution control. For example, subroutine calls typically place their return address on the stack, while interrupts typically use the stack to save both return address and status information.
DSP processors typically provide one of three kinds of stack support:
Shadow registers: Shadow registers are dedicated backup registers that hold the contents of key processor registers during interrupt processing.
Hardware stack: A hardware stack holds selected registers during interrupt processing or subroutine calls.
Software stack: A software stack is a conventional stack using the processor's main memory to store values during interrupt processing or subroutine calls.
-
Relative Branch Support
All DSP processors support branch or jump instructions as one of their most basic forms of execution control.
In PC-relative branching, the address to which the processor is to branch is specified as an offset from the current address.
PC-relative addressing is useful for creating position-independent programs. A position-independent program can be relocated in memory when it is loaded into the processor's memory.
In addition to supporting position-independent code, PC-relative branching can also save program memory in certain situations.
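The relocation property can be checked with a little address arithmetic. In the sketch below the branch and its target sit at fixed positions inside a program, and the program is loaded at two different addresses. The positions, the load addresses, and the function names are hypothetical, and the offset is taken relative to the branch's own address, which is a simplifying assumption (many processors measure it from the following instruction).

```python
BRANCH_AT = 10  # position of the branch within the program (hypothetical)
LABEL_AT = 4    # position of the branch target within the program (hypothetical)
OFFSET = LABEL_AT - BRANCH_AT  # value stored in a PC-relative branch


def relative_target(load_address):
    """Destination of the PC-relative branch for a given load address."""
    pc = load_address + BRANCH_AT
    return pc + OFFSET


def absolute_target(assembled_for):
    """Destination stored in an absolute branch; fixed at assembly time."""
    return assembled_for + LABEL_AT


for load in (0x100, 0x400):
    print(hex(load), hex(relative_target(load)), hex(absolute_target(0x100)))
```

The relative branch always lands on the label wherever the program is loaded, while the absolute address assembled for one load address is wrong at the other. The small offset also suggests where the memory saving comes from: it can fit in fewer bits than a full address.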
-
Pipelining
-
Pipelining
Pipelining is a technique for increasing the performance of a processor by breaking a sequence of operations into smaller pieces and executing these pieces in parallel when possible, thereby decreasing the overall time required to complete the set of operations.
Unfortunately, in the process of improving performance, pipelining frequently complicates programming.
-
Pipelining and Performance
A hypothetical processor uses separate execution units to accomplish the following actions sequentially for a single instruction:
Fetch an instruction word from memory
Decode the instruction
Read a data operand from or write a data operand to memory
Execute the ALU or MAC portion of the instruction
Assuming that each of the four stages above takes 20 ns to execute, and that they must be done sequentially, each instruction takes 80 ns to complete.
-
A pipelined implementation of this processor starts a new instruction fetch immediately after the previous instruction has been fetched.
Similarly, it begins decoding each instruction as soon as the previous instruction is finished decoding.
In essence, it overlaps the various stages of execution.
As a result, the execution stages now work in parallel.
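The benefit of the overlap can be worked out for the hypothetical four-stage processor with 20 ns stages. The sketch below compares strictly sequential execution with an ideal pipeline that never stalls; the function names are hypothetical and the no-stall condition is an assumption.

```python
STAGE_NS = 20  # time per stage in the hypothetical processor
STAGES = 4     # fetch, decode, operand read/write, execute


def sequential_time_ns(instructions):
    """Every instruction passes through all stages before the next one starts."""
    return instructions * STAGES * STAGE_NS


def pipelined_time_ns(instructions):
    """Ideal overlap: after the pipeline fills, one instruction finishes per stage time."""
    if instructions == 0:
        return 0
    return (STAGES + instructions - 1) * STAGE_NS


for n in (1, 5, 100):
    print(n, sequential_time_ns(n), pipelined_time_ns(n))
```

A single instruction still takes 80 ns from fetch to completion, but for long instruction streams the throughput approaches one instruction every 20 ns.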
-
Pipeline Depth
Although most DSP processors are pipelined, the depth (number of stages) of the pipeline may vary from one processor to another.
In general, a deeper pipeline allows the processor to execute faster but makes the processor harder to program.
Most processors use three stages (instruction fetch, decode, and execute) or four stages (instruction fetch, decode, operand fetch, and execute).
In three-stage pipelines the operand fetch is typically done in the latter part of the decode stage.
-
Interlocking
The execution sequence shown in Figure 9-2 is referred to as a perfect overlap, because the pipeline phases mesh together perfectly and provide 100 percent utilization of the processor's execution stages.
In reality, processors may not perform as well as we have shown in our hypothetical example.
The most common reason for this is resource contention.
-
Interlocking
One solution to this problem is interlocking.
An interlocking pipeline delays the progression of the latter of
the conflicting instructions through the pipeline.
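The cost of interlocking can be estimated by counting the cycles each instruction is held back. The sketch below models a simple in-order pipeline in which a delayed instruction also delays everything behind it. The four-stage depth, the function name, and the per-instruction stall list are assumptions made for illustration, not a description of any particular processor.

```python
def interlocked_cycles(stalls_per_instruction, stages=4):
    """Total cycles when each instruction may be held back by an interlock.

    stalls_per_instruction[i] is the number of extra cycles instruction i
    waits because it conflicts with an earlier instruction.
    """
    n = len(stalls_per_instruction)
    if n == 0:
        return 0
    return stages + (n - 1) + sum(stalls_per_instruction)


print(interlocked_cycles([0, 0, 0, 0]))  # perfect overlap
print(interlocked_cycles([0, 0, 1, 0]))  # third instruction held for one cycle
```

Each stall cycle adds directly to the total, which is why the order of instructions in an inner loop can matter even though the program's result is unchanged.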
-
Branching Effects
Branching effects occur whenever there is a change in program flow, and not just for branch instructions.
For example, subroutine call instructions, subroutine return instructions, and return from interrupt instructions are all candidates for the pipeline effects described above.
Processors offering delayed branches frequently also offer delayed returns.
-
Interrupt Effects
Interrupts typically involve a change in a program's flow of control to branch to the interrupt service routine. The pipeline often increases the processor's interrupt response time, much as it slows down branch execution.
When an interrupt occurs, almost all processors allow instructions at the decode stage or further in the pipeline to finish executing, because these instructions may be partially executed. What occurs past this point varies from processor to processor.
We discuss several examples below:
-
Pipeline Programming Models
The examples in the sections above have
concentrated on the instruction pipeline and
its behavior and interaction with other parts
of the processor under various circumstances.
Two major assembly code formats for
pipelined processors:
time-stationary
data-stationary
-
Pipeline Programming Models
In the time-stationary programming model, the processor's instructions specify the actions to be performed by the execution units (multiplier, accumulator, and so on) during a single instruction cycle.
A good example is the AT&T DSP16xx family, where a multiply-accumulate instruction looks like:
a0=a0+p p=x*y y=*r0++ x=*pt++
-
Pipeline Programming Models
Data-stationary programming specifies the
operations that are to be performed, but not
the exact times during which the actions are
to be executed.
As an example, consider the following AT&T
DSP32xx instruction:
a1 = a1 + (*r5++ = *r4++) * *r3++