Convolutional Coding on Xtensa Processors
Application Note
Tensilica, Inc.
3255-6 Scott Blvd.
Santa Clara, CA 95054
(408) 986-8000
Fax (408) 986-8919
www.tensilica.com
January 2009
Doc Number: AN01-123-04
© 2005-2008 Tensilica, Inc.
All Rights Reserved
Printed in the United States of America
This publication is provided AS IS. Tensilica, Inc. (hereafter Tensilica) does not make any warranty of any kind, either expressed or implied, including, but not
limited to, the implied warranties of merchantability and fitness for a particular purpose. Information in this document is provided solely to enable system and software developers to use Tensilica processors. Unless specifically set forth herein, there are no express or implied patent, copyright or any other intellectual
property rights or licenses granted hereunder to design or fabricate Tensilica integrated circuits or integrated circuits based on the information in this document.
Tensilica does not warrant that the contents of this publication, whether individually or as one or more groups, meets your requirements or that the publication
is error-free. This publication could include technical inaccuracies or typographical errors. Changes may be made to the information herein, and these changes
may be incorporated in new editions of this publication.
The following terms are trademarks of Tensilica, Inc.: OSKit, Tensilica, Vectra, and Xtensa. All other trademarks and registered trademarks are the property of
their respective companies.
Document Change History:
September 1998 (Revised January, 2001; February, 2005)
January 2009
Digitally signed by Tensilica Technical Publications
Reason: Certified original Tensilica document 1/2009
Contents
1 Communication System Challenges
2 A Simple Encoder
3 The Encoding Process
4 Viterbi Decoding
5 Details of the Viterbi Algorithm
6 Distance Metric Calculation
7 The Trellis Decode Butterfly
8 Implementation on Base Xtensa
9 Full Optimization with TIE
10 Demonstration Instructions
11 Summary
Appendix A VTB2.TIE Code
Figures
Figure 1: Communication System Block Diagram
Figure 2: Simple Convolutional Encoder
Figure 3: Convolutional Encoder State Diagram
Figure 4: Trellis Diagram Showing Most-Likely Path Through States
Figure 5: Distance Metric Graph
Figure 6: Four Butterflies in a Trellis Time Step (K=4)
Figure 7: Butterfly with Distance Metric
Figure 8: Adding State and Branch Distances Metrics
Figure 9: Selecting Smallest Accumulated Distance Metric
Figure 10: Butterfly Operation Diagram
Tables
Table 1: Distance Metric Values
Abstract
This application note looks briefly at popular techniques for convolutional encoding and
decoding, especially Viterbi decoding, and illustrates the power of a configurable processor in
handling the performance-intensive signal processing demands of coding and decoding.
Application-specific processors can be quickly designed, simulated, and built in silicon, and they offer
significantly better programmability, performance, and power-efficiency than most popular digital signal processors (DSPs). In particular, this paper describes user-defined TIE (Tensilica
Instruction Extension) instructions which accelerate distance metric calculations, the most
performance-critical task in Viterbi decoding, by 32x over most popular DSPs and 155x over
most popular 32-bit RISC cores.
This application note assumes that the reader is familiar with Viterbi decoding,
the Xtensa Instruction Set Architecture, and the Tensilica Instruction Extension description
language. Please refer to the Xtensa ISA Reference Manual and the Tensilica Instruction
Extension (TIE) Language User's Guide for additional information.
1 Communication System Challenges
One of the greatest challenges in communication system design is efficient transmission and
reception of information in the presence of errors introduced by the communication channel. The presence of errors is especially pronounced in radio communication, due to the variety of
noise sources in the channel. Designers have adopted block coding methods that add
redundancy in the encoding of information before transmission. Although the addition of
redundant data reduces the overall throughput of the channel, forward error correction
improves performance by using the redundant data to correct errors during decoding at the
receiver, as shown in Figure 1.
[Figure 1: block diagram. original data stream -> encoder -> encoded data -> noisy channel -> encoded data + noise -> decoder -> recovered data stream]
FIGURE 1: COMMUNICATION SYSTEM BLOCK DIAGRAM
Convolutional coding, that is, coding based on time-invariant finite state machines, is widely
used in wireless communications. This application note looks briefly at popular techniques for
convolutional encoding and decoding, especially Viterbi decoding. It illustrates the power of a
configurable processor in handling the performance-intensive signal processing demands of
coding and decoding. Specifically, it describes user-defined instructions, written in the Tensilica
Instruction Extension (TIE) language, that accelerate distance metric calculations, the
most performance-critical task in Viterbi decoding, by more than 32 times over most popular
digital signal processors and 155 times over most popular 32-bit RISC cores.
2 A Simple Encoder
In convolutional encoding, each new coded bit for transmission is generated by a convolution of
the current input bit with some number of earlier input bits and a masking polynomial. The
ability of the decoder to detect and correct errors in transmission depends on the number of
input bits used in the convolution. That number of bits is called the constraint length. Redundancy is added to the bit stream by the generation of more than one bit of encoded
output for each input bit. This ratio of input bits to output bits is called the coding rate. For example, a coding rate of 1/2 will generate 2 output bits from 1 input bit. Popular wireless
communication standards (GSM, IS-95, IS-136) use constraint lengths from 5 to 9 and coding
rates from 1/2 to 1/4.
A simple convolutional encoder, with a constraint length of 4 and coding rate of 1/2, is shown in
Figure 2. For each new input x(i), two new outputs, G0 and G1, are generated for transmission.
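[Figure 2: encoder circuit. The input Xi feeds a chain of one-sample delay elements (D); exclusive-OR gates combine the current and delayed bits to form the outputs G0,i and G1,i]
FIGURE 2: SIMPLE CONVOLUTIONAL ENCODER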
This example implements the convolutional code represented by the polynomials:
$G_0 = 1 + x + x^3$ and $G_1 = 1 + x + x^2 + x^3$
The polynomials listed above are a convenient way to describe which inputs, the current bit
(X0, the constant term 1) and the delayed bits (X1, X2, X3), feed the XOR logic that forms each output. For example, output G0
($1 + x + x^3$) is the XOR of the current bit (X0), the previous bit
(X1), and the third previous bit (X3). Output G1 ($1 + x + x^2 + x^3$) is the XOR
of the current bit (X0), the previous bit (X1), the second previous bit (X2), and the
third previous bit (X3).
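The tap selection described above can be sketched in C by treating each polynomial as a bit mask over a four-bit window of input bits. This is a minimal illustration, not code from the demonstration project: the names `G0_MASK`, `G1_MASK`, `parity4`, and `encode_bit` are assumptions introduced here, and the window layout (bit 0 holds X0, bit 3 holds X3) is chosen only for this sketch.

```c
/* Hypothetical tap masks for this sketch: bit n set means delay tap Xn is used. */
#define G0_MASK 0xBu  /* binary 1011: X0, X1, X3      (1 + x + x^3)       */
#define G1_MASK 0xFu  /* binary 1111: X0, X1, X2, X3  (1 + x + x^2 + x^3) */

/* XOR of the four low bits of v. */
static unsigned parity4(unsigned v)
{
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Shift one input bit into the window and produce the output pair.
   Bit 0 of *window is the current bit X0; bit 3 is the third previous bit X3. */
void encode_bit(unsigned *window, unsigned x, unsigned *g0, unsigned *g1)
{
    *window = ((*window << 1) | (x & 1u)) & 0xFu;
    *g0 = parity4(*window & G0_MASK);
    *g1 = parity4(*window & G1_MASK);
}
```

Starting from a zeroed window, feeding a single 1 followed by zeros reproduces the tap pattern of each polynomial on successive calls.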
This encoder can also be expressed as a state diagram, as shown in Figure 3. Each of the
states is labeled with a state number corresponding to the state of the three delay elements of
the circuit above. Note that the most recent bit is assigned to the LSB, while the third previous
bit is assigned to the MSB. Each of the arcs is labeled x, G0, G1 (the input bit x for that arc, and
the G0, G1 outputs for that input).
[Figure 3: state diagram of the eight encoder states (000 through 111). Two arcs leave each state, one per input bit, and each arc is labeled x, G0, G1]
FIGURE 3: CONVOLUTIONAL ENCODER STATE DIAGRAM
It is convenient to view the encoder as a state diagram showing arcs from one encoder state to
another. Each arc is labeled with the corresponding input bit and encoder output bits. Later,
this state diagram is converted to a trellis diagram to represent state arcs with respect to time.
Note that except for the encoder outputs, the state representation remains unchanged for any basic convolutional encoder with the same constraint length, because the shifting
pattern of bits through the encoder will remain the same. Different polynomials will generate
different outputs for each arc going from one state to another.
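The arcs of the state diagram can be generated rather than drawn. The sketch below computes, for one state and one input bit, the destination state and the arc's output pair for the example polynomials. The `Arc` type and the `encoder_arc` name are assumptions made for illustration; the state numbering follows the text (most recent bit in the LSB).

```c
#define NUM_STATES 8   /* 2^(K-1) states for constraint length K = 4 */

typedef struct {
    unsigned next_state;  /* state reached after shifting in the input bit */
    unsigned g0, g1;      /* encoder outputs that label the arc */
} Arc;

/* Arc leaving `state` for input bit x (0 or 1). The state holds the three
   delay elements, with the most recent bit in the LSB. */
Arc encoder_arc(unsigned state, unsigned x)
{
    unsigned x1 = state & 1u;          /* previous bit        */
    unsigned x2 = (state >> 1) & 1u;   /* second previous bit */
    unsigned x3 = (state >> 2) & 1u;   /* third previous bit  */
    Arc arc;

    arc.next_state = ((state << 1) | x) & (NUM_STATES - 1u);
    arc.g0 = x ^ x1 ^ x3;              /* G0 = 1 + x + x^3       */
    arc.g1 = x ^ x1 ^ x2 ^ x3;         /* G1 = 1 + x + x^2 + x^3 */
    return arc;
}
```

Only the two output lines depend on the polynomials; the `next_state` computation is the shifting pattern shared by every encoder with the same constraint length.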
3 The Encoding Process
The convolutional encoder described in the previous sections can be implemented either as a
hardware state machine or as a software routine running on a processor. Although the
hardware implementation for a given encoding polynomial is typically quite simple, a software
implementation offers valuable flexibility. The increasing need for adaptive and multi-protocol communication equipment makes a processor-based solution appropriate in many
circumstances.
Below is a C implementation of the encoder that was shown earlier.
    // Sample Convolutional Encoder
    // Constraint length 4 and coding rate 1/2
    // G0 = 1 + x + x^3 and G1 = 1 + x + x^2 + x^3
    // FrameSize (the number of bits per frame) must be defined before this point.

    /* input data for Convolutional Encoder, one bit (0 or 1) per element */
    char IN[FrameSize];
    /* output data from Convolutional Encoder */
    char G0[FrameSize], G1[FrameSize];

    void convolve(void)
    {
        int f;
        for (f = 0; f < FrameSize; f++)
        {
            if (f >= 3)
            {
                // Note that ANSI C XOR operations are + in polynomial representation
                G0[f] = IN[f] ^ IN[f-1] ^ IN[f-3];
                G1[f] = IN[f] ^ IN[f-1] ^ IN[f-2] ^ IN[f-3];
            }
            else if (f == 2) // Assume Delay element 3 flushed to zero
            {
                G0[f] = IN[f] ^ IN[f-1];
                G1[f] = IN[f] ^ IN[f-1] ^ IN[f-2];
            }
            else if (f == 1) // Assume Delay elements 2-3 flushed to zero
            {
                G0[f] = IN[f] ^ IN[f-1];
                G1[f] = IN[f] ^ IN[f-1];
            }
            else if (f == 0) // Initial Condition:
                             // All Delay elements are flushed to zero
            {
                G0[f] = IN[f];
                G1[f] = IN[f];
            }
        }
    }
Encoding can be rewritten, as in the pseudo code below, to take advantage of the Xtensa
processor's funnel shift and XOR instructions.
    // Pseudo Code for encoder
    // G0=1+X+X^3 & G1=1+X+X^2+X^3
    // N = number of input bits in frame

    // Assign Encoder Input & Output Stream
    int *Input_Ptr = &Input;
    int *Output_Ptr_G0 = &Output_G0;
    int *Output_Ptr_G1 = &Output_G1;

    // Initialize Input32_old to zero
    Input32_old = 0;

    // Encode 32 input bits per iteration
    for (i=0; i
    // inner loop of k = 4, r = 1/2 encoding for the
    // G0 = 1 + x + x^3 and G1 = 1 + x + x^2 + x^3
    // input data for Convolutional Encoder
    //
    // computes 64 output pairs per iteration
    // a2 points to the word containing the next 64 input bits
    //    organized with oldest bit in the msb of the word
    // a14 points to the output buffer for G0
    // a15 points to the output buffer for G1
    // a8 contains the oldest 32 input bits from the previous iteration

        movi.n  a1, N/64
        loopnez a1, loopend     // use zero overhead loop, N is number of bits to encode
        l32i    a3, a2, 0       // a3 contains low 32b of input stream (1)
        l32i    a9, a2, 4       // a9 contains high 32b of input stream (1)
                                // note that a8 contains high 32b of previous iteration
        ssai    1               // funnel shift 64b by one sample time
        src     a4, a8, a3      // a4 contains low delayed by one (x)
        src     a10, a3, a9     // a10 contains high delayed by one (x)
        ssai    2               // funnel shift 64b by two sample times
        src     a5, a8, a3      // a5 contains low delayed by two (x^2)
        src     a11, a3, a9     // a11 contains high delayed by two (x^2)
        ssai    3               // funnel shift 64b by three sample times
        src     a6, a8, a3      // a6 contains low delayed by three (x^3)
        src     a12, a3, a9     // a12 contains high delayed by three (x^3)
                                // compute G0 & G1 for all low 32b
        xor     a4, a4, a3      // G0 = 1 + x
        xor     a4, a4, a6      // + x^3
        xor     a5, a5, a4      // G1 = G0 + x^2
                                // compute G0 & G1 for all high 32b
        xor     a10, a10, a9    // G0 = 1 + x
        xor     a10, a10, a12   // + x^3
        xor     a11, a11, a10   // G1 = G0 + x^2
        s32i    a4, a14, 0      // store G0 low 32b
        s32i    a5, a15, 0      // store G1 low 32b
        s32i    a10, a14, 4     // store G0 high 32b
        s32i    a11, a15, 4     // store G1 high 32b
        addi    a2, a2, 8       // advance input pointer by 64b
        addi    a14, a14, 8     // and output pointers by 64b
        addi    a15, a15, 8
        mov     a8, a9          // save high 32b for use in next iteration
    loopend:
The assembly routine listed above is capable of encoding 2.5 bits per cycle. The performance
of this convolutional coding technique can be generalized to 11+((k-1)*5) cycles for each 64
input bits, where k is the constraint length. The actual performance is dependent on the
polynomials used. The convolutional coding performance of a base Xtensa processor is
comparable to a 16-bit DSP, such as members of the Texas Instruments TMS320C54x family.
This class of DSPs is capable of coding 1.5 bits per cycle for a set of polynomials with k=5 (see:
Viterbi Decoding Techniques in the TMS320C54x Family, Henry Hendrix, Texas Instruments Application Note SPRA071, June 1996). For the same polynomials, performance on an Xtensa
processor is about 1.8 bits per cycle.
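The word-at-a-time technique used by the assembly routine can be modelled in portable C, which is convenient for checking results on a host machine. The sketch below is an illustration under stated assumptions rather than a translation of the routine: it processes one 32-bit word per iteration instead of two, and `funnel_shift_right` and `encode_words` are names introduced here. As in the assembly, the oldest bit of each word is assumed to sit in the MSB.

```c
#include <stddef.h>
#include <stdint.h>

/* Low 32 bits of the 64-bit value {hi, lo} shifted right by n (1 <= n <= 31).
   Models the effect of the funnel shift: each bit position receives the
   input bit that is n sample times older. */
static uint32_t funnel_shift_right(uint32_t hi, uint32_t lo, unsigned n)
{
    return (hi << (32u - n)) | (lo >> n);
}

/* Encode nwords * 32 input bits with G0 = 1 + x + x^3, G1 = 1 + x + x^2 + x^3.
   in[i] holds 32 input bits, oldest bit in the MSB. */
void encode_words(const uint32_t *in, uint32_t *g0, uint32_t *g1, size_t nwords)
{
    uint32_t prev = 0;  /* delay elements assumed flushed to zero */
    size_t i;

    for (i = 0; i < nwords; i++) {
        uint32_t cur = in[i];
        uint32_t x1 = funnel_shift_right(prev, cur, 1);  /* delayed by one   */
        uint32_t x2 = funnel_shift_right(prev, cur, 2);  /* delayed by two   */
        uint32_t x3 = funnel_shift_right(prev, cur, 3);  /* delayed by three */

        g0[i] = cur ^ x1 ^ x3;   /* G0 = 1 + x + x^3 */
        g1[i] = g0[i] ^ x2;      /* G1 = G0 + x^2    */
        prev = cur;              /* keep this word for the next iteration */
    }
}
```

Reusing G0 to form G1 mirrors the XOR sequence in the assembly listing and saves two XOR operations per word for this particular pair of polynomials.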
4 Viterbi Decoding
The goal of decoding a received bit stream is to find the maximum-likelihood output sequence
given the received sequence - a combination of the transmitted sequence plus noise. Viterbi
decoding offers an efficient algorithm to find this output sequence. It is based on a decoder
that attempts to estimate, using the received data sequence, the likelihood that the encoder is
in each of its possible states. The graphical modeling of all possible state transitions has come to be called a trellis diagram. A simple trellis diagram is shown below. The trellis diagram is a
different way of modeling the state diagram that was shown earlier, but with the added
dimension of time. This diagram is used to determine the correct path through the states,
based on a particular transmitted sequence, assuming the encoder started in the idle state
(000). The challenge for the decoder is to predict this path even when some of the incoming
bits (G0, G1) may have been corrupted by noise.
[Figure 4: trellis of the eight encoder states (000 through 111) drawn at Time 0 through Time 4, with the most-likely path highlighted. Received G0,G1 pairs as labeled in the figure: 1,0  1,0  0,1  0,1]
FIGURE 4: TRELLIS DIAGRAM SHOWING MOST-LIKELY PATH THROUGH STATES
5 Details of the Viterbi Algorithm
The Viterbi decode algorithm works in two phases. In the first phase, the update phase, the
incoming data is analyzed in sequence order. The maximum-likelihood decoder works by
maintaining a running estimate of the appropriateness of each possible path through the trellis
for the received data sequence. Starting from a known initial state and for each successive received input pair (G0,G1), the decoder calculates a distance metric between the received
input pair and the input pair corresponding to each state arc in the diagram. The distance
metric calculation method will be discussed later. The shortest path, the series of arcs with the
smallest total distance metric, is taken to be the most-likely path through the trellis diagram.
Each path implies a unique state sequence in the encoder, and thus a unique input sequence.
This phase is considered the most CPU-intensive task within the Viterbi Algorithm, so the
remainder of this application note focuses on this area.
In the second phase, the trace back phase, the sequence of arc decisions must be traced back
to reconstruct the inferred inputs to the encoder. Recalling that the most recent data shifted
into the delay line is the LSB of the state, the inputs based upon the trellis diagram above are
inferred to be (1,0,0,0). This phase can be easily accomplished by examining the LSB of each
of the states, tracing backward through the most-likely path.
Several popular techniques are used to calculate distance metrics. In general, these methods
are categorized as either hard decision decoding or soft decision decoding. In a soft decision
decoder, the input to the decoder is an integer in the range between +B and -B. Therefore, the
strength of the signal can be used as information by the decoder. In a hard decision decoder,
threshold detection is used to quantize input signals into either of two states: +1 or -1. Soft
decision decoding with infinite range provides approximately 2.2 dB better coding gain than hard
decision decoding at the expense of slightly more complexity in the decoder.
6 Distance Metric Calculation
In the trellis diagram shown in Figure 4, there are arcs leading from states in one trellis column
to states in the next trellis column. Each of these arcs has an associated local distance (branch
metric). Recall that the state diagram shown earlier labels each arc with the encoder outputs for each transition. The local distance is determined by comparing the actual received data to
expected encoder outputs for a given arc.
The Hamming Distance technique is one of the more popular techniques used for calculating
distance metrics. For a coding rate of 1/2, we can imagine the actual data, G0 and G1, to
indicate position in two different dimensions. Each arc in the trellis diagram has a
corresponding input pair, R0 and R1, which is the expected output for each arc. The diagram
below shows both actual and expected data represented as points in a Cartesian plane. The
Hamming distance is determined by adding the absolute differences in each dimension:
$|G_0 - R_0| + |G_1 - R_1|$
[Figure 5: the expected point (R0, R1) and the actual point (G0, G1) plotted in a plane, with the straight-line distance and the Hamming distance between them marked]
FIGURE 5: DISTANCE METRIC GRAPH
Another popular distance metric technique is the Euclidean (Square) Distance technique. The
Euclidean (Square) Distance is the square of the straight-line
distance between two symbols. Using the Pythagorean Theorem, the straight-line distance
between the actual and expected input pairs of the previous diagram is calculated as follows:
$\sqrt{(G_0 - R_0)^2 + (G_1 - R_1)^2}$
Remove the square root from the straight-line distance calculation to get the Euclidean (Square)
Distance. There is a slight bit error rate (BER) performance penalty for using the Euclidean
(Square) Distance when compared to the straight-line distance, yet this penalty is negligible
when compared with the reduction in complexity. Expanding the Euclidean (Square) distance
metric results in the following equation:
$G_0^2 - 2 R_0 G_0 + R_0^2 + G_1^2 - 2 R_1 G_1 + R_1^2$
Note that the distance metric for a given arc will be compared against distance metrics of other
arcs within the same trellis column. Addition of constants or multiplication by a positive constant will
not affect the comparison. Therefore distance metric calculation can be simplified by removing
constants and constant multipliers. G0 and G1 are actual inputs, which have a range between +B and -B, yet are constant throughout the trellis column. Therefore the squares of G0 and G1 can be eliminated. Since the expected inputs, R0 and R1, have possible values of +B or -B, the
squares of R0 and R1 become $B^2$, which is a constant and can be eliminated. Thus, the distance
metric can be further simplified as follows by removing these constants.
$-2 R_0 G_0 - 2 R_1 G_1$
Removing the constant multiplier 2 in the equation above leaves
$-R_0 G_0 - R_1 G_1$
Recalling that R0 and R1 have possible values of +B or -B, the distance metric is simplified as
shown in the following table:
TABLE 1: DISTANCE METRIC VALUES
Expected Data (R0, R1)   Distance Metric   Removing Constant B   Replace with Sum, Diff
+B, +B                   -BG0-BG1          -G0-G1                -Sum
+B, -B                   -BG0+BG1          -(G0-G1)              -Diff
-B, -B                   +BG0+BG1          G0+G1                 Sum
-B, +B                   +BG0-BG1          G0-G1                 Diff
Note: Sum=G0+G1; Diff=G0-G1
The distance metric calculation has been greatly simplified to +/- the sum or difference of the
received data. To determine the local distance of a particular arc, determine the expected data
for that arc and replace it with the corresponding equation using the table above.
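The simplification can be checked numerically. The sketch below implements both the full squared distance and the Table 1 lookup, so that the two can be compared: for a fixed received pair they differ only by the additive constant G0^2 + G1^2 + 2B^2 and the multiplier 2B, which is why arcs rank identically under either metric. The function names and the convention of passing the signs of R0 and R1 as +1 or -1 are assumptions made for this illustration.

```c
/* Full squared Euclidean distance between the received pair (g0, g1)
   and the expected pair (r0, r1). */
long squared_distance(int g0, int g1, int r0, int r1)
{
    long d0 = (long)g0 - r0;
    long d1 = (long)g1 - r1;
    return d0 * d0 + d1 * d1;
}

/* Simplified metric from Table 1. s0 and s1 are the signs (+1 or -1)
   of the expected pair, i.e. R0 = s0 * B and R1 = s1 * B. */
int simplified_metric(int g0, int g1, int s0, int s1)
{
    int sum = g0 + g1;
    int diff = g0 - g1;

    if (s0 > 0)
        return (s1 > 0) ? -sum : -diff;   /* rows +B,+B and +B,-B */
    return (s1 > 0) ? diff : sum;         /* rows -B,+B and -B,-B */
}
```

For example, with B = 7 and a received pair (3, -5), the constant term is 9 + 25 + 98 = 132, and each of the four expected pairs satisfies squared distance = 132 + 14 * (simplified metric).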
7 The Trellis Decode Butterfly
To aid in implementation, it is often helpful to arrange calculations in functional groups. The
procedure for doing the calculations on a single group can become a template to be used on
other like groups. A butterfly can be visualized as a grouping of 2 source states, 2 destination
states, and 4 arcs between them. For the trellis diagram shown earlier, with 8 states per
column, a time step from one trellis column to another can be visualized as 4 butterflies as
shown below.
[Figure 6: the eight states of one trellis column connected to the eight states of the next column, grouped into four butterflies]
FIGURE 6: FOUR BUTTERFLIES IN A TRELLIS TIME STEP (K=4)
Let's take a closer look at a single butterfly calculation. The diagram below shows a butterfly
diagram with corresponding encoder output values for each arc. The encoder outputs are
translated into local distances as per the previous table.
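[Figure 7: the butterfly from source states 000 and 100 to destination states 000 and 001, drawn twice: once with the expected encoder outputs on each arc (+B+B or -B-B) and once with the corresponding local distances (-Sum or +Sum)]
FIGURE 7: BUTTERFLY WITH DISTANCE METRIC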
The heart of the butterfly calculation is sometimes called the ADD-COMPARE-SELECT operation.
In the ADD stage, the accumulated distance metric is calculated by taking the local distance of
each arc in the butterfly, and adding it to the accumulated distance metric from the originating
state. Considering that the accumulated distance metric of the originating state is named
StateN (N = number of state), the diagram below shows each arc's accumulated distance
metric after the ADD stage.
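[Figure 8: the butterfly after the ADD stage. Arcs State0+Sum (from 000) and State4-Sum (from 100) enter state 000; arcs State0-Sum (from 000) and State4+Sum (from 100) enter state 001]
FIGURE 8: ADDING STATE AND BRANCH DISTANCES METRICS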
In the COMPARE stage, the distance metric for each arc into a destination state is compared. In
the butterfly diagram, there are two arcs and two corresponding distance metrics leading into
each destination state. Of the two arcs, the arc with the smallest distance metric is considered
as the most-likely arc and the other arc is discarded.
In the SELECT stage, the most-likely arc's accumulated distance metric is stored as the new
accumulated distance metric for the state. The diagram below shows the selected arcs and
updated accumulated distance metrics, State0 and State1, assuming State0+Sum < State4-Sum
and State0-Sum < State4+Sum.
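[Figure 9: the butterfly after selection. The surviving arcs are 000 -> 000 with metric State0+Sum and 000 -> 001 with metric State0-Sum, which become the new accumulated distance metrics of State 0 and State 1]
FIGURE 9: SELECTING SMALLEST ACCUMULATED DISTANCE METRIC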
The selected arcs are recorded so this information can be used during the trace back phase to
reconstruct the most-likely path through the trellis. One way to code the selected arc is to use
the MSB of the originating state. Hence, the most-likely arc into State 0 is coded as 0, and the
most-likely arc into State 1 is also coded as 0.
The regularity of the butterfly computation suggests a set of special instructions intended to
accelerate the calculation of distance metrics. Variations of the add-compare-select instructions
have been implemented on advanced digital signal processors. In our C-based implementation,
a macro called ACS is used to implement a variation of the ADD-COMPARE-SELECT calculation.
The macro and sample usage are shown for a single butterfly operation.
    /******************************************************************
    ACS is a macro which performs a variation of the ADD-Compare-Select
    operation for each state in the Trellis. It compares 2 accumulated
    distance metrics (X, Y) of the 2 arcs leading into the state. The shortest
    arc is selected as the most-likely arc. The shortest accumulated distance
    metric is stored in S(I) and binary code which designates the most-likely
    arc to the state is stored in Select[j][I], where (I) represents the state
    and (j) represents the trellis column.
    *******************************************************************/
    #define ACS(S, I, X, Y) \
        if ((s1 = (X)) < (s2 = (Y))) { S[(I)] = s1; Select[j][(I)] = 0; } \
        else { S[(I)] = s2; Select[j][(I)] = 1; }

    Diff = G0[j] - G1[j];
    Sum = G0[j] + G1[j];

    // Using ACS macro for single butterfly
    ACS(NewState, 0, State0+Sum, State4-Sum);
    ACS(NewState, 1, State0-Sum, State4+Sum);
A butterfly operation consists of two add-compare-select calculations. The code above is used
to perform the butterfly operation shown below.
[Figure 10: butterfly operation. Arcs State0+Sum and State4-Sum enter state 000, giving NewState[0] = Min(State0+Sum, State4-Sum); arcs State0-Sum and State4+Sum enter state 001, giving NewState[1] = Min(State0-Sum, State4+Sum)]
FIGURE 10: BUTTERFLY OPERATION DIAGRAM
A single butterfly operation is performed for every pair of destination states within a trellis
column. The same trellis column operation is iteratively performed on every subsequent trellis
column until the end of the frame. Once the end of frame is reached, each state's accumulated
distance metric is compared, with the smallest being considered the ending state. The trace
back phase begins with the end state. The decoder will then extract the LSB of each state as
the deduced input bit and use the coded path to trace through all prior trellis columns until the
inferred input at the beginning of the frame is deduced.
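The update and trace back phases can be put together in a compact reference decoder for the K=4 example code. This is a minimal sketch and not the demonstration decoder: it loops over destination states instead of unrolling butterflies, and it assumes that an encoded 1 is received as +B and an encoded 0 as -B. The names `branch_metric`, `viterbi_decode`, and `UNREACHABLE` are introduced here for illustration.

```c
#include <stddef.h>

#define NSTATES 8               /* 2^(K-1) trellis states for K = 4 */
#define UNREACHABLE (1 << 20)   /* large metric for states other than 000 at start */

/* Local distance -(R0*G0) - (R1*G1), with B factored out, for the arc that
   leaves state s with input bit x. Assumes bit 1 -> +B and bit 0 -> -B. */
static int branch_metric(int s, int x, int g0, int g1)
{
    int d1 = s & 1, d2 = (s >> 1) & 1, d3 = (s >> 2) & 1;
    int r0 = (x ^ d1 ^ d3) ? 1 : -1;        /* G0 = 1 + x + x^3       */
    int r1 = (x ^ d1 ^ d2 ^ d3) ? 1 : -1;   /* G1 = 1 + x + x^2 + x^3 */
    return -(r0 * g0) - (r1 * g1);
}

/* Decode n soft pairs (g0[j], g1[j]) into n bits.
   select must provide n * NSTATES entries of scratch storage. */
void viterbi_decode(const int *g0, const int *g1, size_t n,
                    unsigned char *select, unsigned char *bits)
{
    int metric[NSTATES], next[NSTATES];
    size_t j;
    int s, best;

    for (s = 0; s < NSTATES; s++)
        metric[s] = (s == 0) ? 0 : UNREACHABLE;   /* encoder starts in state 000 */

    /* Update phase: add-compare-select for every destination state. */
    for (j = 0; j < n; j++) {
        for (s = 0; s < NSTATES; s++) {
            int x = s & 1;                        /* input bit = LSB of destination */
            int from0 = s >> 1;                   /* originating state with MSB 0   */
            int from1 = from0 | (NSTATES >> 1);   /* originating state with MSB 1   */
            int m0 = metric[from0] + branch_metric(from0, x, g0[j], g1[j]);
            int m1 = metric[from1] + branch_metric(from1, x, g0[j], g1[j]);
            if (m0 < m1) { next[s] = m0; select[j * NSTATES + s] = 0; }
            else         { next[s] = m1; select[j * NSTATES + s] = 1; }
        }
        for (s = 0; s < NSTATES; s++)
            metric[s] = next[s];
    }

    /* The ending state is the one with the smallest accumulated metric. */
    best = 0;
    for (s = 1; s < NSTATES; s++)
        if (metric[s] < metric[best])
            best = s;

    /* Trace back: the LSB of each state on the path is the decoded bit, and
       the stored select bit restores the MSB of the originating state. */
    for (j = n; j-- > 0; ) {
        bits[j] = (unsigned char)(best & 1);
        best = (best >> 1) | (select[j * NSTATES + best] << 2);
    }
}
```

As a usage example, the input bits 1,0,1,1,0,0,0 encode to G0 = 1,1,1,1,1,1,1 and G1 = 1,1,0,1,0,0,1. With B = 4 and two weakened symbols, passing g0 = {4,4,4,4,-1,4,4} and g1 = {4,4,1,4,-4,-4,4} recovers the original seven bits.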
8 Implementation on Base Xtensa
A demonstration GSM Viterbi decoder and test bench were developed in C and are provided as an
Xplorer Workspace file, Viterbi_v2.xws. The decoder is a soft decision decoder using the
Euclidean (Square) Distance metric and ACS macro described earlier in this Application Note. Since GSM uses a constraint length of
five, there will be 16 states in every trellis column (instead of the eight states described in previous sections). Hence, GSM requires eight butterfly
operations to decode a single bit (as compared to our previous example, which required only
four butterfly operations).
The Viterbi_v2 project is a test bench that prepares a random frame of 1000 bits and then encodes them into GSM coded symbols. The symbols are corrupted to simulate white noise.
Finally, the test bench decodes these bits and compares the output with the original input bits.
The Viterbi decoder is benchmarked for performance.
In this original form, a single bit requires 337 cycles to decode on a base Xtensa processor when using aggressive compiler optimizations (-O3 switch used in xt-xcc). Given that the Xtensa processor is as efficient as, if not more efficient than, ARM9 and MIPS32 cores in handling
ANSI C code, the performance of other 32-bit RISC cores is estimated to be similar.
9 Full Optimization with TIE
The Tensilica Instruction Extension (TIE) language provides a powerful mechanism to add
instructions to the base Xtensa instruction set and to generate complete support in hardware
and software tools for special purpose operations. The decode butterfly involves the addition of
the local distance to the accumulated distance metrics of a pair of adjacent states, then a
comparison and selection of the most-likely arc into each of the pair of states. The regularity of
this computation suggests a set of special instructions intended to accelerate the butterfly
calculation. Variations of add-compare-select instructions have been implemented on
advanced digital signal processors to accelerate the Viterbi decoder. Likewise, variations of the
add-compare-select instruction can be developed for Xtensa using TIE. Such instructions are
invaluable in accelerating Viterbi decoders that support data encoded using arbitrary constraint
length and polynomials. On the other hand, TIE could be used to develop instructions that
accelerate the decoding of data generated from a specific encoder. TIE instructions that are
specific to an encoder can be developed with computational performance comparable to a pure hardware implementation. The optimal choice of TIE instructions depends on the balance
between flexibility and computational performance required in a given system.
Significant improvement using TIE can be achieved by creating a variation of the add-compare-
select butterfly computation and defining this logic as a TIE function as shown below:
    // Viterbi ADD-COMPARE-SELECT Butterfly
    function [33:0] VBFLY ([15:0] StateA, [15:0] StateB, [15:0] Metric)
    {
        wire [15:0] neg_Metric = ~Metric + 1'b1;

        // Add state and path metric
        wire [15:0] stateA_pathA = StateA + Metric;
        wire [15:0] stateB_pathB = StateB + neg_Metric;

        // Compare accumulated metric
        wire [4:0] compA = TIEcmp(stateA_pathA, stateB_pathB, 1'b1);

        // Selected (least value) path is output
        wire [15:0] new_stateA = (compA[4]) ? stateA_pathA : stateB_pathB;
        wire SelectA = (compA[4]) ? 0 : 1;

        wire [15:0] stateA_pathB = StateA + neg_Metric;
        wire [15:0] stateB_pathA = StateB + Metric;
        wire [4:0] compB = TIEcmp(stateA_pathB, stateB_pathA, 1'b1);
        wire [15:0] new_stateB = (compB[4]) ? stateA_pathB : stateB_pathA;
        wire SelectB = (compB[4]) ? 0 : 1;

        assign VBFLY = {SelectA, SelectB, new_stateA, new_stateB};
    }
This TIE function performs the same computation as a pair of ACS macros shown in section 7.
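To check the butterfly arithmetic off-target, the VBFLY function can be mirrored in plain C. The sketch below is an illustrative reference model and is not part of the Tensilica example code. The names vbfly_model, vbfly_result, wrap16 and vbfly_self_check are invented here, and the model assumes that bit 4 of the TIEcmp result is the signed less-than flag, which is consistent with the "least value" comment in the TIE source.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one butterfly: two updated metrics and two select bits. */
typedef struct {
    int16_t  new_a;  /* metric of the first updated state               */
    int16_t  new_b;  /* metric of the second updated state              */
    unsigned sel_a;  /* 0: arc from StateA survived, 1: arc from StateB */
    unsigned sel_b;  /* same encoding, for the second updated state     */
} vbfly_result;

/* Reduce to 16 bits with two's-complement wraparound, like a 16-bit wire. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Hypothetical C model of VBFLY (assumes compX[4] means signed less-than). */
static vbfly_result vbfly_model(int16_t state_a, int16_t state_b, int16_t metric)
{
    vbfly_result r;

    /* First output state: StateA + Metric against StateB - Metric. */
    int16_t a_path_a = wrap16((int32_t)state_a + metric);
    int16_t b_path_b = wrap16((int32_t)state_b - metric);
    int lt_a = a_path_a < b_path_b;
    r.new_a = lt_a ? a_path_a : b_path_b;
    r.sel_a = lt_a ? 0u : 1u;

    /* Second output state: StateA - Metric against StateB + Metric. */
    int16_t a_path_b = wrap16((int32_t)state_a - metric);
    int16_t b_path_a = wrap16((int32_t)state_b + metric);
    int lt_b = a_path_b < b_path_a;
    r.new_b = lt_b ? a_path_b : b_path_a;
    r.sel_b = lt_b ? 0u : 1u;

    return r;
}

/* Small usage example: StateA is the better predecessor on both outputs. */
static void vbfly_self_check(void)
{
    vbfly_result r = vbfly_model(10, 20, 3);
    assert(r.new_a == 13 && r.sel_a == 0);
    assert(r.new_b == 7 && r.sel_b == 0);
}
```

Note that on a tie the model, like the TIE source, keeps the StateB path and reports a select bit of 1, because only a strict less-than picks the StateA path.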
Several additional techniques can be used to accelerate the Viterbi decoder:
    The VBFLY TIE function can be instantiated several times in an operation so that multiple
    Viterbi butterfly computations are performed in parallel.
    Internal TIE state (not to be confused with the states in the trellis diagram, referred to as
    trellis states) can hold intermediate data, such as accumulated state metrics, which
    eliminates many memory accesses.
    Memory accesses and butterfly computations can be fused into high-performance TIE
    operations.
    FLIX with a dual load/store interface allows two operations (both performing a
    load/store) to be issued in the same instruction word.
Appendix A lists vtb2.tie, the TIE file that describes the TIE operations that accelerate Viterbi
decoding. The TIE instructions for the trellis update phase of Viterbi decoding are summarized
below.
VBIN: Viterbi Input
C Intrinsic Syntax: void VBIN(VREG PG0, VREG *p_PG0)
This operation loads 2 GSM coded symbol pairs (4 bytes) at one time by using a 32-bit load into
the 32-bit register file VREG. The load pointer (p_PG0) is also auto-incremented by 4 bytes in
preparation for the next VBIN instruction.
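The data that one VBIN load makes available to the following VBOUT operations can be sketched in C. This is a rough model under stated assumptions, not the shipped code: the helper names (vbin_model, pair_terms_model, sext8, pair_terms, vbin_self_check) are invented for illustration, and the byte positions follow the big-endian field selection and the 8-bit sum/difference arithmetic written in the VBOUT listing of Appendix A.

```c
#include <assert.h>
#include <stdint.h>

/* Sum and difference of one symbol pair, sign-extended to 16 bits. */
typedef struct {
    int16_t sum;
    int16_t diff;
} pair_terms;

/* Model of the VBIN load: assemble four bytes in big-endian order and
 * advance the caller's pointer by 4, as the auto-increment does. */
static uint32_t vbin_model(const uint8_t **p)
{
    const uint8_t *b = *p;
    uint32_t word = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                    ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    *p = b + 4;
    return word;
}

/* Sign-extend an 8-bit two's-complement value. */
static int16_t sext8(uint8_t v)
{
    return (v & 0x80u) ? (int16_t)((int)v - 256) : (int16_t)v;
}

/* Pick symbol pair t (0 or 1) from the loaded word and form sum and diff.
 * The 8-bit results wrap before sign extension, as in Appendix A. */
static pair_terms pair_terms_model(uint32_t ginput, unsigned t)
{
    uint8_t g0 = (t == 1) ? (uint8_t)(ginput >> 8) : (uint8_t)(ginput >> 24);
    uint8_t g1 = (t == 1) ? (uint8_t)ginput        : (uint8_t)(ginput >> 16);
    pair_terms r;
    r.sum  = sext8((uint8_t)(g0 + g1));
    r.diff = sext8((uint8_t)(g0 - g1));
    return r;
}

/* Small usage example with made-up input bytes. */
static void vbin_self_check(void)
{
    const uint8_t frame[4] = { 0x01, 0x02, 0xFF, 0x03 };
    const uint8_t *p = frame;
    uint32_t w = vbin_model(&p);
    assert(w == 0x0102FF03u && p == frame + 4);
    assert(pair_terms_model(w, 0).sum == 3);
}
```

Because the sum and difference are formed in 8 bits, large soft-decision values can wrap; the model reproduces that behaviour rather than saturating.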
VBOUT: Parallel Viterbi Butterfly Operation and Output
C Intrinsic Syntax: void VBOUT(unsigned short *PSelect, VREG PG0, imm i)
This operation updates all state metrics of a trellis column for a single pair of GSM coded data
(PG0). The add-compare-select operation is performed on all 16 states of the trellis column
using 8 VBFLY TIE functions, which together cover the Viterbi butterfly computations for the
entire trellis column.
This operation updates each state's accumulated distance metric, held in 16-bit TIE states (one
for each of the 16 trellis states), and writes out 16 select bits for the most likely arcs going into
each of the 16 trellis states. The write pointer (PSelect) is auto-incremented in preparation for
the next VBOUT instruction. An immediate operand (i) is used to choose a symbol pair of GSM
coded data from the 32-bit VREG TIE register file. Since VBIN provides 2 GSM coded symbol pairs,
there will be two VBOUT instructions for each VBIN instruction.
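The whole-column update performed by VBOUT can be approximated in C as eight butterflies over an array of 16 metrics. The sketch below is a hedged model: column_update_model, wrap16 and column_self_check are hypothetical names, the branch metric for each butterfly is passed in directly as dist[k] instead of being decoded from BMSel, and the state pairing (states k and k+8 feeding states 2k and 2k+1) and the select-bit order (state 0 in the most significant bit) are taken from the VBOUT listing in Appendix A.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reduce to 16 bits with two's-complement wraparound, like a 16-bit wire. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Update all 16 accumulated metrics in place for one symbol pair.
 * dist[k] is the top-most path metric of butterfly k (an input here).
 * Returns the 16 select bits, with the bit for state 0 in bit 15. */
static uint16_t column_update_model(int16_t accum[16], const int16_t dist[8])
{
    int16_t next[16];
    uint16_t select = 0;

    for (int k = 0; k < 8; k++) {
        int32_t a = accum[k];      /* predecessor with MSB = 0 */
        int32_t b = accum[k + 8];  /* predecessor with MSB = 1 */
        int32_t m = dist[k];

        int16_t a_pa = wrap16(a + m), b_pb = wrap16(b - m);
        int16_t a_pb = wrap16(a - m), b_pa = wrap16(b + m);

        /* Select bit is 0 only when the path from 'a' is strictly smaller. */
        unsigned s0 = (a_pa < b_pb) ? 0u : 1u;
        unsigned s1 = (a_pb < b_pa) ? 0u : 1u;

        next[2 * k]     = s0 ? b_pb : a_pa;
        next[2 * k + 1] = s1 ? b_pa : a_pb;

        select = (uint16_t)(select | (s0 << (15 - 2 * k)) | (s1 << (14 - 2 * k)));
    }

    memcpy(accum, next, sizeof next);
    return select;
}

/* Small usage example: with all-zero inputs every comparison ties,
 * so every select bit is 1 and the metrics stay at zero. */
static void column_self_check(void)
{
    int16_t accum[16] = { 0 };
    const int16_t dist[8] = { 0 };
    assert(column_update_model(accum, dist) == 0xFFFFu);
    assert(accum[0] == 0 && accum[15] == 0);
}
```

Writing the new metrics to a temporary array first matters: every butterfly must read the metrics of the previous column, which the TIE operation gets for free because all state reads happen before the state writes.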
WUR_BMSel: Write User Register - Branch Metric Select
The BMSel register is a 32-bit register that sets the distance metric for each path of the Viterbi
butterfly computations as used by the VBOUT instruction. Since VBOUT performs 8 butterfly
computations, there are 32 path metrics. However, due to path symmetry in the butterfly
structure, we need only define the top-most path of each butterfly; the remaining paths are
inferred from this path. For example, the top-most path in figure 8 is +sum. The bottom-most
path is the same as the top-most path (+sum) and the diagonal paths are the negative of the
top-most path (-sum).
The BMSel register is split into eight 4-bit fields, where each bit is a one-hot select for
+sum, -sum, +diff, or -diff. The most significant 4-bit field corresponds to the top-most path of
the butterfly computation that updates states 0 and 1. The following 4-bit field corresponds to
the top-most path of the butterfly computation that updates states 2 and 3, and so on.
Prior to executing VBOUT instructions, the BMSel register should be initialized with the
appropriate branch metric selection for the butterfly computations. Because the branch metric
selections are programmable, the VBOUT instruction supports different polynomials for
Viterbi coding (provided that the constraint length is k=5 and the coding rate is 1/2).
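The field layout described above can be illustrated with a small C helper that packs eight one-hot selections into a 32-bit word and decodes one field back into a metric. This is a sketch only: bmsel_pack, bmsel_metric and bmsel_self_check are invented names, the selections used in the example are arbitrary and are not the GSM settings, and returning 0 for a field that is not one-hot is a choice made for this model rather than documented hardware behaviour. The one-hot values 8, 4, 2 and 1 follow the bit order used by the TIEsel calls in Appendix A.

```c
#include <assert.h>
#include <stdint.h>

/* One-hot field values: bit 3 = +sum, bit 2 = -sum, bit 1 = +diff, bit 0 = -diff. */
enum { DIST_SUM = 8, DIST_NEG_SUM = 4, DIST_DIFF = 2, DIST_NEG_DIFF = 1 };

/* Pack eight 4-bit selections; field[0] lands in bits 31:28 and
 * belongs to the butterfly that updates states 0 and 1. */
static uint32_t bmsel_pack(const unsigned field[8])
{
    uint32_t word = 0;
    for (int k = 0; k < 8; k++) {
        unsigned f = field[k];
        assert(f == DIST_SUM || f == DIST_NEG_SUM ||
               f == DIST_DIFF || f == DIST_NEG_DIFF);
        word |= (uint32_t)f << (28 - 4 * k);
    }
    return word;
}

/* Decode the top-most path metric of butterfly k (0..7).
 * sum and diff are sign-extended 8-bit values, so negation cannot overflow. */
static int16_t bmsel_metric(uint32_t bmsel, int k, int16_t sum, int16_t diff)
{
    switch ((bmsel >> (28 - 4 * k)) & 0xFu) {
    case DIST_SUM:      return sum;
    case DIST_NEG_SUM:  return (int16_t)-sum;
    case DIST_DIFF:     return diff;
    case DIST_NEG_DIFF: return (int16_t)-diff;
    default:            return 0;  /* not one-hot: model choice, see lead-in */
    }
}

/* Small usage example with arbitrary (non-GSM) selections. */
static void bmsel_self_check(void)
{
    const unsigned field[8] = { DIST_SUM, DIST_NEG_SUM, DIST_DIFF, DIST_NEG_DIFF,
                                DIST_SUM, DIST_NEG_SUM, DIST_DIFF, DIST_NEG_DIFF };
    assert(bmsel_pack(field) == 0x84218421u);
}
```

On the target, a word built this way would be handed to the WUR_BMSel intrinsic before the first VBOUT.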
In this example, the path metrics for each butterfly computation are taken directly from the
GSM decoder C source code. The initialization for standard GSM coded polynomials is shown in
the sample code below:
#define dist_sum 8
#define dist_neg_sum 4
#define dist_diff 2
#define dist_neg_diff 1
WUR_BMSel((dist_sum
work-per-cycle basis. Note that the Xtensa-based implementation is written in C, whereas
hand-coded assembly is required to obtain performance numbers for many DSP machines.
The TIE operations for the trace-back phase of Viterbi decoding are summarized below.
BACKTRACE: Viterbi Backtrace
C Intrinsic Syntax: void BACKTRACE(unsigned short *PSelect)
This operation loads the 16 select bits (from address PSelect) that were stored during
execution of VBOUT instructions. From the current minimum state, the select value
(representing the most likely path) is used to trace backward to the previous trellis stage. The
LSB value of the minimum state is considered to be the most likely output bit and is saved in a
holding register to be later written to memory using the STORE_OUT operation. The select
pointer (PSelect) is post-decremented by 2 in preparation for the next BACKTRACE operation.
BACKTRACE0: Viterbi Backtrace initialization
C Intrinsic Syntax: void BACKTRACE0(char MinState)
This instruction is a subset of the BACKTRACE operation that is executed only once, prior to
the first execution of the BACKTRACE instruction. It initializes the minimum
state after the update phase. The state number with the minimum value is passed as the argument
MinState.
STORE_OUT: Store eight output values
C Intrinsic Syntax: void STORE_OUT(unsigned char *POutput)
This instruction performs a byte store of the single-bit output value calculated in prior
executions of the BACKTRACE instruction to pointer POutput. The POutput pointer is post-
decremented by one in preparation for the next STORE_OUT operation.
The main loop for the Viterbi decoder's trace-back phase is shown below:
for (i=FS-1; i>=1; i--) {
    BACKTRACE(PSelect);
    STORE_OUT(ptr_output);
}
The disassembly of the Viterbi decoder's backtrace loop is as follows:
loopgtz a10, 60000fe0
{ store_out a9; backtrace a8 }
The loop consists of a single FLIX instruction that contains both BACKTRACE and STORE_OUT
operations. These operations are effectively pipelined such that the backtrace is done in one
iteration and the resulting output bit is written to memory in the next iteration. As a result, an
output bit is written every clock cycle. This means that the trace-back phase of Viterbi decoding
occurs at a rate of one cycle per bit.
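The trace-back step itself is small enough to model in a few lines of C. The following is an illustrative sketch, not the example's source: backtrace_state, backtrace0_model, backtrace_step_model, traceback_model and backtrace_self_check are hypothetical names, and arrays with indices stand in for the post-decremented PSelect and POutput pointers. The bit selection and state update follow the BACKTRACE listing in Appendix A. The sketch stores only the bits produced inside the loop, since the code that surrounds the loop is not shown in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the MinState (4-bit) and Output (1-bit) TIE states. */
typedef struct {
    unsigned min_state;
    unsigned output;
} backtrace_state;

/* BACKTRACE0: start from the most likely end state. */
static void backtrace0_model(backtrace_state *bt, unsigned end_state)
{
    bt->min_state = end_state & 0xFu;
    bt->output    = end_state & 1u;
}

/* BACKTRACE: use the stored select bit of the current state to step back
 * one trellis column. The select word holds state 0 in bit 15. */
static void backtrace_step_model(backtrace_state *bt, uint16_t sel)
{
    unsigned s   = bt->min_state;
    unsigned bit = (sel >> (15 - s)) & 1u;   /* surviving arc into state s   */
    bt->output    = (s >> 1) & 1u;           /* LSB of the predecessor state */
    bt->min_state = (bit << 3) | (s >> 1);   /* predecessor state number     */
}

/* Walk 'steps' select words from last to first, storing one bit per byte
 * (the STORE_OUT part). Returns the state reached at the start of the frame. */
static unsigned traceback_model(const uint16_t *sel, uint8_t *out,
                                int steps, unsigned end_state)
{
    backtrace_state bt;
    backtrace0_model(&bt, end_state);
    for (int i = steps - 1; i >= 0; i--) {
        backtrace_step_model(&bt, sel[i]);
        out[i] = (uint8_t)bt.output;
    }
    return bt.min_state;
}

/* Small usage example: from state 15 with every select bit set,
 * the predecessor is again state 15 and the output bit is 1. */
static void backtrace_self_check(void)
{
    backtrace_state bt;
    backtrace0_model(&bt, 15);
    backtrace_step_model(&bt, 0xFFFFu);
    assert(bt.min_state == 15 && bt.output == 1);
}
```

In this sequential model each store follows its own backtrace step; the FLIX bundle achieves the same result by overlapping the store of one iteration with the backtrace of the next.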
The highly optimized assembly code described in this section was directly compiled from C
source code with the TIE variable set (#define TIE). Upon building this example and simulating
it, the console shows the following:
Processing New Frame
Errors detected = 0, Benchmark = 2.167000 cycles per bit
Viterbi decode performance of 2.17 cycles per bit is more than a 155x improvement over the
standard implementation without TIE acceleration (337 cycles per bit). The TIE area for this
approach is 28.7K gates, in addition to 47K gates for the base Xtensa LX2 core. This core is
capable of being synthesized up to 264 MHz (worst case) in .13 LV. Therefore, this solution is
capable of decoding a GSM coded bitstream at a peak rate of 130 Mbits per second.
10 Demonstration Instructions
The demonstration requires that you have installed Xplorer CE 2.1.1 with RB-2008.3 software
tools. The workspace, Viterbi_V2.xws, can be obtained from the Tensilica support website.
Follow these steps to build and simulate the demonstration code.
1. Start Xplorer and import the Viterbi_V2.xws workspace. Select all components
provided in the workspace for installation into your workspace.
2. In the workspace toolbar, select project (P: Viterbi_v2), configuration (C: Viterbi_v2) and
release target (T: Release).
3. Click Build Active to compile and then click on Run to simulate. The console will display the
decode error and benchmark results.
To compare performance with the ANSI C implementation (without TIE), you can comment out
the line (#define TIE) in the main.c file of the Viterbi_V2 project.
11 Summary
Xtensa processors offer significant advantages for complex telephony applications. The Xtensa
architecture combines a powerful general-purpose 32-bit instruction set design with a unique
configuration and extension process. These are used together to solve some of the toughest
problems in communication system design, including efficient convolutional coding and Viterbi
decoding. Application-specific processors are quickly designed, simulated, built in silicon, and
offer significantly better programmability, performance and power-efficiency than most popular
DSPs. With the benefit of TIE, Xtensa solutions can offer almost 155x improvement in
communication processing efficiency compared to conventional 32-bit RISC cores and over 32x
improvement when compared to specialized DSPs.
Appendix A VTB2.TIE Code
// VTB2.TIE
// TIE Extensions for Viterbi Acceleration

// FLIX
format vtb_flix 32 {slot_a, slot_b}
slot_opcodes slot_a {VBIN, STORE_OUT}
slot_opcodes slot_b {VBOUT, BACKTRACE, BACKTRACE0}

// States used by Viterbi Instructions
state AccumDist0 16 add_read_write
state AccumDist1 16 add_read_write
state AccumDist2 16 add_read_write
state AccumDist3 16 add_read_write
state AccumDist4 16 add_read_write
state AccumDist5 16 add_read_write
state AccumDist6 16 add_read_write
state AccumDist7 16 add_read_write
state AccumDist8 16 add_read_write
state AccumDist9 16 add_read_write
state AccumDistA 16 add_read_write
state AccumDistB 16 add_read_write
state AccumDistC 16 add_read_write
state AccumDistD 16 add_read_write
state AccumDistE 16 add_read_write
state AccumDistF 16 add_read_write
state MinState 4 add_read_write
state BMSel 32 add_read_write
state Output 1 add_read_write

// Immediates
immediate_range imm8 0 7 1

regfile VREG 32 2 vr

// Viterbi ADD-COMPARE-SELECT Butterfly
function [33:0] VBFLY ([15:0] StateA, [15:0] StateB, [15:0] Metric)
{
    wire [15:0] neg_Metric = ~Metric + 1'b1;
    wire [15:0] stateA_pathA = StateA+Metric;
    wire [15:0] stateB_pathB = StateB+neg_Metric;
    wire [4:0] compA = TIEcmp(stateA_pathA, stateB_pathB, 1'b1);
    wire [15:0] new_stateA = (compA[4])?stateA_pathA:stateB_pathB;
    wire SelectA = (compA[4])?0:1;
    wire [15:0] stateA_pathB = StateA+neg_Metric;
    wire [15:0] stateB_pathA = StateB+Metric;
    wire [4:0] compB = TIEcmp(stateA_pathB, stateB_pathA, 1'b1);
    wire [15:0] new_stateB = (compB[4])?stateA_pathB:stateB_pathA;
    wire SelectB = (compB[4])?0:1;
    assign VBFLY = {SelectA, SelectB, new_stateA, new_stateB};
}

operation VBIN {out VREG GInput, inout AR *ars} {out VAddr, in MemDataIn32}
{
    assign VAddr=ars;
    assign GInput=MemDataIn32;
    assign ars=ars+4;
}

operation VBOUT
{inout AR *ars, in VREG GInput, in imm8 t}
{
    in BMSel,
    inout AccumDist0, inout AccumDist1, inout AccumDist2, inout AccumDist3,
    inout AccumDist4, inout AccumDist5, inout AccumDist6, inout AccumDist7,
    inout AccumDist8, inout AccumDist9, inout AccumDistA, inout AccumDistB,
    inout AccumDistC, inout AccumDistD, inout AccumDistE, inout AccumDistF,
    out VAddr, out MemDataOut16
}
{
    // Choose G0 from GInput based upon immediate argument t
    // Written for Big Endian Ordering
    wire [7:0] G0=((t==1)?GInput[15:8]:GInput[31:24]);
    // Choose G1 from GInput based upon immediate argument t
    // Written for Big Endian Ordering
    wire [7:0] G1=((t==1)?GInput[7:0]:GInput[23:16]);
    // Declare temporary variables for AccumDist
    wire [15:0] State0=AccumDist0;
    wire [15:0] State1=AccumDist1;
    wire [15:0] State2=AccumDist2;
    wire [15:0] State3=AccumDist3;
    wire [15:0] State4=AccumDist4;
    wire [15:0] State5=AccumDist5;
    wire [15:0] State6=AccumDist6;
    wire [15:0] State7=AccumDist7;
    wire [15:0] State8=AccumDist8;
    wire [15:0] State9=AccumDist9;
    wire [15:0] StateA=AccumDistA;
    wire [15:0] StateB=AccumDistB;
    wire [15:0] StateC=AccumDistC;
    wire [15:0] StateD=AccumDistD;
    wire [15:0] StateE=AccumDistE;
    wire [15:0] StateF=AccumDistF;
    // Calculate Sum/Diff for input
    wire [7:0] Sum_8=G0+G1;
    wire [7:0] Diff_8=G0-G1;
    wire [15:0] Sum={8{Sum_8[7]}, Sum_8};
    wire [15:0] Diff={8{Diff_8[7]}, Diff_8};
    wire [15:0] neg_Sum=~Sum + 1;
    wire [15:0] neg_Diff=~Diff + 1;
    // Calculate Accumulated Path Metrics
    // Compare/Select Shortest Path into each State
    // using 8 parallel VBFLY functions
    wire [15:0] new_AccumDist0, new_AccumDist1, new_AccumDist2, new_AccumDist3,
        new_AccumDist4, new_AccumDist5, new_AccumDist6, new_AccumDist7,
        new_AccumDist8, new_AccumDist9, new_AccumDistA, new_AccumDistB,
        new_AccumDistC, new_AccumDistD, new_AccumDistE, new_AccumDistF;
    wire Select0, Select1, Select2, Select3, Select4, Select5, Select6, Select7,
        Select8, Select9, SelectA, SelectB, SelectC, SelectD, SelectE, SelectF;
    wire [15:0] DistA = TIEsel(BMSel[31], Sum, BMSel[30], neg_Sum, BMSel[29], Diff,
        BMSel[28], neg_Diff);
    assign {Select0, Select1, new_AccumDist0, new_AccumDist1} = VBFLY(State0, State8, DistA);
    wire [15:0] DistB = TIEsel(BMSel[27], Sum, BMSel[26], neg_Sum, BMSel[25], Diff,
        BMSel[24], neg_Diff);
    assign {Select2, Select3, new_AccumDist2, new_AccumDist3} = VBFLY(State1, State9, DistB);
    wire [15:0] DistC = TIEsel(BMSel[23], Sum, BMSel[22], neg_Sum, BMSel[21], Diff,
        BMSel[20], neg_Diff);
    assign {Select4, Select5, new_AccumDist4, new_AccumDist5} = VBFLY(State2, StateA, DistC);
    wire [15:0] DistD = TIEsel(BMSel[19], Sum, BMSel[18], neg_Sum, BMSel[17], Diff,
        BMSel[16], neg_Diff);
    assign {Select6, Select7, new_AccumDist6, new_AccumDist7} = VBFLY(State3, StateB, DistD);
    wire [15:0] DistE = TIEsel(BMSel[15], Sum, BMSel[14], neg_Sum, BMSel[13], Diff,
        BMSel[12], neg_Diff);
    assign {Select8, Select9, new_AccumDist8, new_AccumDist9} = VBFLY(State4, StateC, DistE);
    wire [15:0] DistF = TIEsel(BMSel[11], Sum, BMSel[10], neg_Sum, BMSel[9], Diff,
        BMSel[8], neg_Diff);
    assign {SelectA, SelectB, new_AccumDistA, new_AccumDistB} = VBFLY(State5, StateD, DistF);
    wire [15:0] DistG = TIEsel(BMSel[7], Sum, BMSel[6], neg_Sum, BMSel[5], Diff,
        BMSel[4], neg_Diff);
    assign {SelectC, SelectD, new_AccumDistC, new_AccumDistD} = VBFLY(State6, StateE, DistG);
    wire [15:0] DistH = TIEsel(BMSel[3], Sum, BMSel[2], neg_Sum, BMSel[1], Diff,
        BMSel[0], neg_Diff);
    assign {SelectE, SelectF, new_AccumDistE, new_AccumDistF} = VBFLY(State7, StateF, DistH);
    // Store new state metrics
    assign AccumDist0=new_AccumDist0;
    assign AccumDist1=new_AccumDist1;
    assign AccumDist2=new_AccumDist2;
    assign AccumDist3=new_AccumDist3;
    assign AccumDist4=new_AccumDist4;
    assign AccumDist5=new_AccumDist5;
    assign AccumDist6=new_AccumDist6;
    assign AccumDist7=new_AccumDist7;
    assign AccumDist8=new_AccumDist8;
    assign AccumDist9=new_AccumDist9;
    assign AccumDistA=new_AccumDistA;
    assign AccumDistB=new_AccumDistB;
    assign AccumDistC=new_AccumDistC;
    assign AccumDistD=new_AccumDistD;
    assign AccumDistE=new_AccumDistE;
    assign AccumDistF=new_AccumDistF;
    // Write out the Binary Encoded Paths
    wire [15:0] SelectPaths={Select0, Select1, Select2, Select3, Select4, Select5, Select6, Select7,
        Select8, Select9, SelectA, SelectB, SelectC, SelectD, SelectE, SelectF};
    assign VAddr=ars;
    assign MemDataOut16=SelectPaths;
    // Update the output pointer
    assign ars=ars+2;
}

// Initialize Backtrace instruction
operation BACKTRACE0 {in AR ars} {out MinState, out Output}
{
    // initialize Minstate w/ most likely endstate
    assign MinState = ars;
    // the LSB is the output bit
    assign Output = ars[0];
}

operation BACKTRACE {inout AR *art} {inout MinState, out Output, out VAddr, in MemDataIn16}
{
    // Read in Paths for trellis column and postdecrement pointer
    assign VAddr = art;
    wire [15:0] Sel = MemDataIn16;
    assign art = art - 2;
    // Select path for trellis state
    wire DataIn8 = TIEmux(MinState[3:0], Sel[15], Sel[14], Sel[13], Sel[12], Sel[11],
        Sel[10], Sel[9], Sel[8], Sel[7], Sel[6], Sel[5], Sel[4], Sel[3], Sel[2], Sel[1], Sel[0]);
    // Trace backward one bit to previous state
    assign MinState = {DataIn8, MinState[3:1]};
    // Save output bit
    assign Output = MinState[1];
}

schedule backtrace_sched {BACKTRACE} {use MinState 2; def MinState 2; def Output 2;}

operation STORE_OUT {inout AR *Addr} {in Output, out VAddr, out MemDataOut8}
{
    assign VAddr = Addr;
    assign MemDataOut8 = {7'b0, Output};
    assign Addr = Addr - 1;
}