convolution coding

Upload: amandeep-grover

Post on 16-Oct-2015


DESCRIPTION

convolutional encoding description

TRANSCRIPT

  • 5/26/2018 Convolution Coding

    1/26

    __________________________________________________________

    Convolutional Coding on Xtensa Processors

    Application Note

    Tensilica, Inc.
    3255-6 Scott Blvd.
    Santa Clara, CA 95054
    (408) 986-8000
    Fax (408) 986-8919
    www.tensilica.com

    January 2009
    Doc Number: AN01-123-04

    TENSILICA,INC.


    2005-2008 Tensilica, Inc.

    All Rights Reserved

    Printed in the United States of America

    This publication is provided AS IS. Tensilica, Inc. (hereafter Tensilica) does not make any warranty of any kind, either expressed or implied, including, but not

    limited to, the implied warranties of merchantability and fitness for a particular purpose. Information in this document is provided solely to enable system and software developers to use Tensilica processors. Unless specifically set forth herein, there are no express or implied patent, copyright or any other intellectual

    property rights or licenses granted hereunder to design or fabricate Tensilica integrated circuits or integrated circuits based on the information in this document.

    Tensilica does not warrant that the contents of this publication, whether individually or as one or more groups, meets your requirements or that the publication

    is error-free. This publication could include technical inaccuracies or typographical errors. Changes may be made to the information herein, and these changes

    may be incorporated in new editions of this publication.

    The following terms are trademarks of Tensilica, Inc.: OSKit, Tensilica, Vectra, and Xtensa. All other trademarks and registered trademarks are the property of

    their respective companies.

    Document Change History:

    September 1998 (Revised January, 2001; February, 2005)

    January 2009


    Contents

    1 Communication System Challenges
    2 A Simple Encoder
    3 The Encoding Process
    4 Viterbi Decoding
    5 Details of the Viterbi Algorithm
    6 Distance Metric Calculation
    7 The Trellis Decode Butterfly
    8 Implementation on Base Xtensa
    9 Full Optimization with TIE
    10 Demonstration Instructions
    11 Summary
    Appendix A VTB2.TIE Code


    Figures

    Figure 1: Communication System Block Diagram
    Figure 2: Simple Convolutional Encoder
    Figure 3: Convolutional Encoder State Diagram
    Figure 4: Trellis Diagram Showing Most-Likely Path Through States
    Figure 5: Distance Metric Graph
    Figure 6: Four Butterflies in a Trellis Time Step (K=4)
    Figure 7: Butterfly with Distance Metric
    Figure 8: Adding State and Branch Distances Metrics
    Figure 9: Selecting Smallest Accumulated Distance Metric
    Figure 10: Butterfly Operation Diagram

    Tables

    Table 1: Distance Metric Values


    Abstract

    This application note looks briefly at popular techniques for convolutional encoding and
    decoding, especially Viterbi decoding, and illustrates the power of a configurable processor in
    handling the performance-intensive signal processing demands of coding and decoding.
    Application-specific processors can be quickly designed, simulated, and built in silicon, and they
    offer significantly better programmability, performance, and power efficiency than most popular
    digital signal processors (DSPs). In particular, this paper describes user-defined TIE (Tensilica
    Instruction Extension) instructions that accelerate distance metric calculations, the most
    performance-critical task in Viterbi decoding, by 32x over most popular DSPs and 155x over
    most popular 32-bit RISC cores.

    This application note assumes that the reader is familiar with Viterbi decoding,
    the Xtensa Instruction Set Architecture, and the Tensilica Instruction Extension description
    language. Please refer to the Xtensa ISA Reference Manual and the Tensilica Instruction
    Extension (TIE) Language User's Guide for additional information.


    1 Communication System Challenges

    One of the greatest challenges in communication system design is efficient transmission and

    reception of information in the presence of errors introduced by the communication channel. The presence of errors is especially pronounced in radio communication, due to the variety of

    noise sources in the channel. Designers have adopted block coding methods that add

    redundancy in the encoding of information before transmission. Although the addition of

    redundant data reduces the overall throughput of the channel, forward error correction

    improves performance by using the redundant data to correct errors during decoding at the

    receiver, as shown in Figure 1.

    [Figure 1: block diagram: original data stream -> encoder -> encoded data -> noisy channel -> encoded data + noise -> decoder -> recovered data stream]

    FIGURE 1: COMMUNICATION SYSTEM BLOCK DIAGRAM

    Convolutional coding, that is, coding based on time-invariant finite state machines, is widely

    used in wireless communications. This application note looks briefly at popular techniques for

    convolutional encoding and decoding, especially Viterbi decoding. It illustrates the power of a

    configurable processor in handling the performance-intensive signal processing demands of

    coding and decoding. Specifically, user-defined instructions in the Tensilica Instruction

    Extension Language (TIE) will be described which accelerate distance metric calculations, the

    most performance-critical task in Viterbi decoding, by more than 32 times over most popular

    digital signal processors and 155 times over most popular 32-bit RISC cores.

    2 A Simple Encoder

    In convolutional encoding, each new coded bit for transmission is generated by a convolution of

    the current input bit with some number of earlier input bits and a masking polynomial. The

    ability of the decoder to detect and correct errors in transmission depends on the number of

    input bits used in the convolution. That number of bits is called the constraint length. Redundancy is added to the bit stream by the generation of more than one bit of encoded

    output for each input bit. This ratio of input bits to output bits is called the coding rate. For example, a coding rate of 1/2 will generate 2 output bits from 1 input bit. Popular wireless

    communication standards (GSM, IS-95, IS-136) use constraint lengths from 5 to 9 and coding

    rates from 1/2 to 1/4.

    A simple convolutional encoder, with a constraint length of 4 and coding rate of 1/2 is shown in

    Figure 2. For each new input x(i), two new outputs, G0 and G1, are generated for transmission.


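    [Figure 2: encoder block diagram: the input Xi feeds a chain of one-sample delay elements (D), and exclusive-OR gates combine the taps to form the outputs G0,i and G1,i]

    FIGURE 2: SIMPLE CONVOLUTIONAL ENCODER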

    This example implements the convolution code represented by the polynomials:

    $$G_0 = 1 + x + x^3, \qquad G_1 = 1 + x + x^2 + x^3$$

    The polynomial formulas listed above are a convenient way to represent inputs from current bit

    (X0=1) and delayed bits (X1,X2,X3) into XOR logic to form the output. For example, output G0

    (1+x+x^3) is calculated by performing XOR calculation on the current bit (X0=1), the previous bit

    (X1), and the third previous bit (X3). Output G1 (1+x+x^2+x^3) is calculated by performing XOR

    calculation on the current bit (X0=1), the previous bit (X1), the second previous bit (X2), and the

    third previous bit (X3).

    This encoder can also be expressed as a state diagram, as shown in Figure 3. Each of the

    states is labeled with a state number corresponding to the state of the three delay elements of

    the circuit above. Note that the most recent bit is assigned to the LSB, while the third previous

    bit is assigned to the MSB. Each of the arcs is labeled x, G0, G1 (the input bit x for that arc, and

    the G0, G1 outputs for that input).

    [Figure 3: state diagram with the eight states 000 through 111; each arc between states is labeled x, G0, G1]

    FIGURE 3: CONVOLUTIONAL ENCODER STATE DIAGRAM

    It is convenient to view the encoder as a state diagram showing arcs from one encoder state to

    another. Each arc is labeled with the corresponding input bit and encoder output bits. Later,

    this state diagram is converted to a trellis diagram to represent state arcs with respect to time.

    Note that except for the encoder outputs, the state representation remains unchanged for any basic convolution encoder with the same constraint length due to the fact that the shifting

    pattern of bits through the encoder will remain the same. Different polynomials will generate

    different outputs for each arc going from one state to another.
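
    The arcs of the state diagram can also be tabulated programmatically. The following is a minimal sketch in C, assuming the state convention described above (most recent bit in the LSB, third previous bit in the MSB). The names Arc, encoder_step, and print_state_table are hypothetical and do not come from the original application note.

```c
#include <stdio.h>

/* One arc of the K=4, rate-1/2 encoder (G0 = 1+x+x^3, G1 = 1+x+x^2+x^3).
 * Arc and encoder_step are hypothetical names, for illustration only. */
typedef struct {
    unsigned next_state; /* state after shifting in the input bit */
    unsigned g0, g1;     /* encoder outputs for this arc */
} Arc;

static Arc encoder_step(unsigned state, unsigned x)
{
    unsigned x1 = state & 1u;        /* previous bit (LSB of state)       */
    unsigned x2 = (state >> 1) & 1u; /* second previous bit               */
    unsigned x3 = (state >> 2) & 1u; /* third previous bit (MSB of state) */
    Arc a;
    a.g0 = x ^ x1 ^ x3;
    a.g1 = x ^ x1 ^ x2 ^ x3;
    a.next_state = ((state << 1) | x) & 7u; /* shift the new bit into the LSB */
    return a;
}

/* Print all 16 arcs: 8 states, 2 possible input bits each. */
static void print_state_table(void)
{
    for (unsigned s = 0; s < 8; s++) {
        for (unsigned x = 0; x < 2; x++) {
            Arc a = encoder_step(s, x);
            printf("state %u, input %u -> G0=%u G1=%u, next state %u\n",
                   s, x, a.g0, a.g1, a.next_state);
        }
    }
}
```

    Changing the two XOR expressions is enough to model a different pair of polynomials; the next_state computation stays the same, which is the point made in the paragraph above.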


    3 The Encoding Process

    The convolution encoder described in the previous sections can be implemented either as a

    hardware state machine or as a software routine running on a processor. Although the

    hardware implementation for a given encoding polynomial is typically quite simple, a software

    implementation offers valuable flexibility. The increasing need for adaptive and multi-protocol communication equipment makes a processor-based solution appropriate in many

    circumstances.

    Below is a C implementation of the encoder that was shown earlier.

    // Sample Convolutional Encoder
    // Constraint length 4 and coding rate 1/2
    // G0 = 1 + x + x^3 and G1 = 1 + x + x^2 + x^3

    /* input data for Convolutional Encoder */
    char IN[FrameSize];
    /* output data from Convolutional Encoder */
    char G0[FrameSize], G1[FrameSize];

    void convolve()
    {
        int f;
        for (f = 0; f < FrameSize; f++)
        {
            if (f >= 3)
            {
                // Note that ANSI C XOR operations (^) are + in polynomial representation
                G0[f] = IN[f] ^ IN[f-1] ^ IN[f-3];
                G1[f] = IN[f] ^ IN[f-1] ^ IN[f-2] ^ IN[f-3];
            }
            else if (f == 2) // Assume Delay element 3 flushed to zero
            {
                G0[f] = IN[f] ^ IN[f-1];
                G1[f] = IN[f] ^ IN[f-1] ^ IN[f-2];
            }
            else if (f == 1) // Assume Delay elements 2-3 flushed to zero
            {
                G0[f] = IN[f] ^ IN[f-1];
                G1[f] = IN[f] ^ IN[f-1];
            }
            else if (f == 0) // Initial Condition:
                             // All Delay elements are flushed to zero
            {
                G0[f] = IN[f];
                G1[f] = IN[f];
            }
        }
    }


    Encoding can be rewritten, as in the pseudo code below, to take advantage of the Xtensa

    processor's funnel shift and XOR instructions.

    // Pseudo Code for encoder
    // G0=1+X+X3 & G1=1+X+X2+X3
    // N = number of input bits in frame

    // Assign Encoder Input & Output Stream
    int *Input_Ptr=&Input;
    int *Output_Ptr_G0=&Output_G0;
    int *Output_Ptr_G1=&Output_G1;

    // Initialize Input32_old to zero
    Input32_old=0;

    // Encode 32 input bits per iteration
    for (i=0; i


    // inner loop of k = 4, r = 1/2 encoding for the
    // G0 = 1 + x + x^3 and G1 = 1 + x + x^2 + x^3
    // input data for Convolutional Encoder
    //
    // computes 64 output pairs per iteration
    //
    // a2 points to the word containing the next 64 input bits
    //    organized with oldest bit in the msb of the word
    // a14 points to the output buffer for G0
    // a15 points to the output buffer for G1
    // a8 contains the oldest 32 input bits from the previous iteration

        movi.n  a1, N/64
        loopnez a1, loopend     // use zero overhead loop, N is number of bits to encode

        l32i    a3, a2, 0       // a3 contains low 32b of input stream (1)
        l32i    a9, a2, 4       // a9 contains high 32b of input stream (1)

        // note that a8 contains high 32b of previous iteration
        ssai    1               // funnel shift 64b by one sample time
        src     a4, a8, a3      // a4 contains low delayed by one (x)
        src     a10, a3, a9     // a10 contains high delayed by one (x)
        ssai    2               // funnel shift 64b by two sample times
        src     a5, a8, a3      // a5 contains low delayed by two (x^2)
        src     a11, a3, a9     // a11 contains high delayed by two (x^2)
        ssai    3               // funnel shift 64b by three sample times
        src     a6, a8, a3      // a6 contains low delayed by three (x^3)
        src     a12, a3, a9     // a12 contains high delayed by three (x^3)

        // compute G0 & G1 for all low 32b
        xor     a4, a4, a3      // G0 = 1 + x
        xor     a4, a4, a6      //      + x^3
        xor     a5, a5, a4      // G1 = G0 + x^2

        // compute G0 & G1 for all high 32b
        xor     a10, a10, a9    // G0 = 1 + x
        xor     a10, a10, a12   //      + x^3
        xor     a11, a11, a10   // G1 = G0 + x^2

        s32i    a4, a14, 0      // store G0 low 32b
        s32i    a5, a15, 0      // store G1 low 32b
        s32i    a10, a14, 4     // store G0 high 32b
        s32i    a11, a15, 4     // store G1 high 32b
        addi    a2, a2, 8       // advance input pointer by 64b
        addi    a14, a14, 8     // and output pointers by 64b
        addi    a15, a15, 8
        mov     a8, a9          // save high 32b for use in next iteration
    loopend:
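
    The same word-parallel idea can be sketched in portable C. This is only an illustration under stated assumptions: input bits are packed 32 per word with the oldest bit in the MSB, the delay line starts flushed to zero, and funnel_shift is a hypothetical helper that mimics the funnel shift for the shift amounts used here. The function and parameter names are not from the original note.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: low 32 bits of the 64-bit value (hi:lo) >> n.
 * Only valid for 1 <= n <= 31. */
static uint32_t funnel_shift(uint32_t hi, uint32_t lo, unsigned n)
{
    return (hi << (32u - n)) | (lo >> n);
}

/* Encode nwords*32 input bits, 32 bits per loop iteration.
 * in[]        packed input bits, oldest bit in the MSB of each word
 * g0[], g1[]  packed encoder outputs, same bit ordering as in[]
 * Assumption: the delay elements start flushed to zero. */
static void convolve_words(const uint32_t *in, uint32_t *g0, uint32_t *g1,
                           size_t nwords)
{
    uint32_t prev = 0; /* previous input word, supplies the delayed bits */
    for (size_t i = 0; i < nwords; i++) {
        uint32_t cur = in[i];
        uint32_t d1 = funnel_shift(prev, cur, 1); /* input delayed by one   (x)   */
        uint32_t d2 = funnel_shift(prev, cur, 2); /* input delayed by two   (x^2) */
        uint32_t d3 = funnel_shift(prev, cur, 3); /* input delayed by three (x^3) */
        g0[i] = cur ^ d1 ^ d3;  /* G0 = 1 + x + x^3 */
        g1[i] = g0[i] ^ d2;     /* G1 = G0 + x^2    */
        prev = cur;
    }
}
```

    Each XOR here produces 32 output bits at once, which is where the per-cycle throughput of the word-parallel approach comes from.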

    The assembly routine listed above is capable of encoding 2.5 bits per cycle. The performance

    of this convolutional coding technique can be generalized to 11+((k-1)*5) cycles for each 64

    input bits, where k is the constraint length. The actual performance is dependent on the

    polynomials used. The convolutional coding performance of a base Xtensa processor is

    comparable to a 16-bit DSP, such as members of the Texas Instruments TMS320C54x family.

    This class of DSPs is capable of coding 1.5 bits per cycle for a set of polynomials with k=5 (see:

    Viterbi Decoding Techniques in the TMS320C54x Family, Henry Hendrix, Texas Instruments Application Note SPRA071, June 1996). For the same polynomials, performance on an Xtensa

    processor is about 1.8 bits per cycle.
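
    As a worked check of the cycle formula, using only the numbers already quoted for the k = 4 example:

    $$11 + (k-1)\cdot 5 = 11 + 3\cdot 5 = 26 \text{ cycles per 64 input bits}, \qquad \frac{64}{26} \approx 2.46 \text{ bits per cycle}$$

    This agrees with the figure of roughly 2.5 bits per cycle stated for the assembly routine.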


    4 Viterbi Decoding

    The goal of decoding a received bit stream is to find the maximum-likelihood output sequence

    given the received sequence - a combination of the transmitted sequence plus noise. Viterbi

    decoding offers an efficient algorithm to find this output sequence. It is based on a decoder

    that attempts to estimate, using the received data sequence, the likelihood that the encoder is

    in each of its possible states. The graphical modeling of all possible state transitions has come to be called a trellis diagram. A simple trellis diagram is shown below. The trellis diagram is a

    different way of modeling the state diagram that was shown earlier, but with the added

    dimension of time. This diagram is used to determine the correct path through the states,

    based on a particular transmitted sequence, assuming the encoder started in the idle state

    (000). The challenge for the decoder is to predict this path even when some of the incoming

    bits (G0, G1) may have been corrupted by noise.

    [Figure 4: trellis diagram; the eight states 000 through 111 are repeated in columns for Time 0 through Time 4, annotated with the received G0,G1 pairs 1,0 1,0 0,1 0,1]

    FIGURE 4: TRELLIS DIAGRAM SHOWING MOST-LIKELY PATH THROUGH STATES


    5 Details of the Viterbi Algorithm

    The Viterbi decode algorithm works in two phases. In the first phase, the update phase, the

    incoming data is analyzed in sequence order. The maximum-likelihood decoder works by

    maintaining a running estimate of the appropriateness of each possible path through the trellis

    for the received data sequence. Starting from a known initial state and for each successive received input pair (G0,G1), the decoder calculates a distance metric between the received

    input pair and the input pair corresponding to each state arc in the diagram. The distance

    metric calculation method will be discussed later. The shortest path, the series of arcs with the

    smallest total distance metric, is taken to be the most-likely path through the trellis diagram.

    Each path implies a unique state sequence in the encoder, and thus a unique input sequence.

    This phase is considered the most CPU-intensive task within the Viterbi Algorithm, so the

    remainder of this application note focuses on this area.

    In the second phase, the trace back phase, the sequence of arc decisions must be traced back

    to reconstruct the inferred inputs to the encoder. Recalling that the most recent data shifted

    into the delay line is the LSB of the state, the inputs based upon the trellis diagram above are

    inferred to be (1,0,0,0). This phase can be easily accomplished by examining the LSB of each

    of the states, tracing backward through the most-likely path.

    Several popular techniques are used to calculate distance metrics. In general, these methods

    are categorized as either hard decision decoding or soft decision decoding. In a soft decision

    decoder, the input to the decoder is an integer in the range between +B and -B. Therefore, the

    strength of the signal can be used as information by the decoder. In a hard decision decoder,

    threshold detection is used to quantize input signals into either of two states: +1 or -1. Soft

    decision decoding with infinite range provides approximately 2.2 dB better coding gain than hard

    decision decoding at the expense of slightly more complexity in the decoder.
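
    The difference between the two input styles can be sketched as follows. This is an illustrative assumption rather than code from the application note: the threshold is taken to be zero, and the names hard_decision, soft_decision, and full_scale are hypothetical.

```c
/* Hard decision: threshold detection to one of two states, +1 or -1.
 * Assumption: the decision threshold is zero. */
static int hard_decision(double sample)
{
    return sample >= 0.0 ? 1 : -1;
}

/* Soft decision: scale the sample so that full_scale maps to B,
 * clip to the range -B..+B, then round to the nearest integer.
 * full_scale is a hypothetical parameter for the nominal signal amplitude. */
static int soft_decision(double sample, double full_scale, int B)
{
    double scaled = sample / full_scale * B;
    if (scaled > B)  scaled = B;
    if (scaled < -B) scaled = -B;
    return (int)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

    A weak sample such as 0.2 and a strong sample such as 0.9 both become +1 under hard decision, while the soft decision keeps their relative strength for the decoder to use.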

    6 Distance Metric Calculation

    In the trellis diagram shown in Figure 4, there are arcs leading from states in one trellis column

    to states in the next trellis column. Each of these arcs has an associated local distance (branch

    metric). Recall that the state diagram shown earlier labels each arc with the encoder outputs for each transition. The local distance is determined by comparing the actual received data to

    expected encoder outputs for a given arc.

    The Hamming Distance technique is one of the more popular techniques used for calculating

    distance metrics. For a coding rate of 1/2, we can imagine the actual data, G0 and G1, to

    indicate position in two different dimensions. Each arc in the trellis diagram has a

    corresponding input pair, R0 and R1, which is the expected output for each arc. The diagram

    below shows both actual and expected data represented as points in a Cartesian plane. The

    Hamming distance is determined by adding the absolute differences in each dimension

    (|G0-R0| + |G1-R1|).


    [Figure 5: the Expected point (R0, R1) and the Actual point (G0, G1) plotted in a plane, showing the Hamming Distance and the Straight-line Distance between them]

    FIGURE 5: DISTANCE METRIC GRAPH

    Another popular distance metric technique is the Euclidian (Square) Distance technique. The
    Euclidian (Square) Distance is determined by calculating the square of the straight-line
    distance between two symbols. Using the Pythagorean Theorem, the straight-line distance
    between the actual and expected input pairs of the previous diagram is calculated as follows:

    $$\sqrt{(G_0 - R_0)^2 + (G_1 - R_1)^2}$$

    Remove the square root from the straight-line distance calculation to get the Euclidian (Square)

    Distance. There is a slight bit error rate (BER) performance penalty for using the Euclidian

    (Square) Distance when compared to the straight-line distance, yet this penalty is negligible

    when compared with the reduction in complexity. Expanding the Euclidian (Square) distance

    metric results in the following equation:

    $$G_0^2 - 2 R_0 G_0 + R_0^2 + G_1^2 - 2 R_1 G_1 + R_1^2$$

    Note that the distance metric for a given arc will be compared against distance metrics of other

    arcs within the same trellis column. Addition of constants or multiplication by a constant will

    not affect the comparison. Therefore distance metric calculation can be simplified by removing

    constants and constant multipliers. G0 and G1 are actual inputs, which have a range between
    +B and -B, yet are constant throughout the trellis column. Therefore the squares of G0 and G1
    can be eliminated. Since the expected inputs, R0 and R1, have possible values of +B or -B, the
    squares of R0 and R1 become B^2, which is a constant and can be eliminated. Thus, the distance
    metric can be further simplified as follows by removing these constants.

    $$-2 R_0 G_0 - 2 R_1 G_1$$

    Removing the constant multiplier 2 in the equation above leaves

    $$-R_0 G_0 - R_1 G_1$$

    Recalling that R0 and R1 have possible values of +B or -B, the distance metric is simplified as

    shown in the following table:


    TABLE 1: DISTANCE METRIC VALUES

    Expected Data (R0, R1)   Distance Metric   Removing Constant B   Replace with Sum, Diff
    +B, +B                   -B*G0 - B*G1      -G0 - G1              -Sum
    +B, -B                   -B*G0 + B*G1      -(G0 - G1)            -Diff
    -B, -B                   +B*G0 + B*G1      G0 + G1               Sum
    -B, +B                   +B*G0 - B*G1      G0 - G1               Diff

    Note: Sum=G0+G1; Diff=G0-G1

    The distance metric calculation has been greatly simplified to +/- the sum or difference of the

    received data. To determine the local distance of a particular arc, determine the expected data

    for that arc and replace it with the corresponding equation using the table above.
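
    The table lookup can be sketched in C as below. This is a minimal illustration, not code from the application note: branch_metric and squared_distance are hypothetical names, and the expected data is passed as two sign flags rather than as +B or -B values.

```c
/* Simplified local distance (branch metric) following Table 1.
 * g0, g1          received soft-decision values in the range -B..+B
 * r0_pos, r1_pos  nonzero if the expected value R0 (R1) is +B, zero if -B
 * Hypothetical helper name, for illustration only. */
static int branch_metric(int g0, int g1, int r0_pos, int r1_pos)
{
    int sum  = g0 + g1;
    int diff = g0 - g1;
    if (r0_pos)
        return r1_pos ? -sum : -diff; /* (+B,+B) -> -Sum, (+B,-B) -> -Diff */
    return r1_pos ? diff : sum;       /* (-B,+B) -> Diff, (-B,-B) -> Sum   */
}

/* Full Euclidian (Square) Distance, included only for comparison. */
static int squared_distance(int g0, int g1, int r0, int r1)
{
    return (g0 - r0) * (g0 - r0) + (g1 - r1) * (g1 - r1);
}
```

    For a fixed received pair, the squared distance equals G0^2 + G1^2 + 2B^2 + 2B times the simplified metric, so ranking arcs by branch_metric gives the same order as ranking them by squared_distance.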

    7 The Trellis Decode Butterfly

    To aid in implementation, it is often helpful to arrange calculations in functional groups. The

    procedure for doing the calculations on a single group can become a template to be used on

    other like groups. A butterfly can be visualized as a grouping of 2 source states, 2 destination

    states, and 4 arcs between them. For the trellis diagram shown earlier, with 8 states per

    column, a time step from one trellis column to another can be visualized as 4 butterflies as

    shown below.

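    [Figure 6: four butterflies in one trellis time step; source states 000 and 100 lead to destination states 000 and 001, sources 001 and 101 lead to 010 and 011, sources 010 and 110 lead to 100 and 101, and sources 011 and 111 lead to 110 and 111]

    FIGURE 6: FOUR BUTTERFLIES IN A TRELLIS TIME STEP (K=4)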


    Let's take a closer look at a single butterfly calculation. The diagram below shows a butterfly

    diagram with corresponding encoder output values for each arc. The encoder outputs are

    translated into local distances as per the previous table.

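    [Figure 7: butterfly from source states 000 and 100 to destination states 000 and 001; each arc is shown with its expected encoder output (+B,+B or -B,-B) and the corresponding local distance (-Sum or +Sum)]

    FIGURE 7: BUTTERFLY WITH DISTANCE METRIC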

    The heart of the butterfly calculation is sometimes called the ADD-COMPARE-SELECT operation.

    In the ADD stage, the accumulated distance metric is calculated by taking the local distance of

    each arc in the butterfly, and adding it to the accumulated distance metric from the originating

    state. Considering that the accumulated distance metric of the originating state is named

    StateN (N = state number), the diagram below shows each arc's accumulated distance

    metric after the ADD stage.

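    [Figure 8: the same butterfly after the ADD stage; the arcs into state 000 carry State0+Sum and State4-Sum, and the arcs into state 001 carry State0-Sum and State4+Sum]

    FIGURE 8: ADDING STATE AND BRANCH DISTANCES METRICS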

    In the COMPARE stage, the distance metric for each arc into a destination state is compared. In

    the butterfly diagram, there are two arcs and two corresponding distance metrics leading into

    each destination state. Of the two arcs, the arc with the smallest distance metric is considered

    as the most-likely arc and the other arc is discarded.

    In the SELECT stage, the most-likely arc's accumulated distance metric is stored as the new

    accumulated distance metric for the state. The diagram below shows the selected arcs and

    updated accumulated distance metrics, State0 and State1, assuming State0+Sum < State4-Sum

    and State0-Sum < State4+Sum.

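    [Figure 9: the butterfly after the SELECT stage; the surviving arcs are 000 to 000 (State0+Sum) and 000 to 001 (State0-Sum), giving new State0 = State0+Sum and new State1 = State0-Sum]

    FIGURE 9: SELECTING SMALLEST ACCUMULATED DISTANCE METRIC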

    The selected arcs are recorded so this information can be used during the trace back phase to

    reconstruct the most-likely path through the trellis. One way to code the selected arc is to use

    the MSB of the originating state. Hence, the most-likely arc into State 0 is coded as 0, and the

    most-likely arc into State 1 is also coded as 0.

    The regularity of the butterfly computation suggests a set of special instructions intended to

    accelerate the calculation of distance metrics. Variations of the add-compare-select instructions

    have been implemented on advanced digital signal processors. In our C-based implementation,

    a macro called ACS is used to implement a variation of the ADD-COMPARE-SELECT calculation.

    The macro and sample usage is shown for a single butterfly operation.


    /******************************************************************
    ACS is a macro which performs a variation of the ADD-Compare-Select
    operation for each state in the Trellis. It compares 2 accumulated
    distance metrics (X,Y) of the 2 arcs leading into the state. The shortest
    arc is selected as the most-likely arc. The shortest accumulated distance
    metric is stored in S(I) and binary code which designates the most-likely
    arc to the state is stored in Select[j][I], where (I) represents the state
    and (j) represents the trellis column.
    *******************************************************************/

    #define ACS(S,I,X,Y) \
        if ((s1 = (X)) < (s2 = (Y))) { S[(I)] = s1; Select[j][(I)] = 0; } \
        else                         { S[(I)] = s2; Select[j][(I)] = 1; }

    Diff = G0[j] - G1[j];
    Sum  = G0[j] + G1[j];

    // Using ACS macro for single butterfly
    ACS(NewState, 0, State0+Sum, State4-Sum);
    ACS(NewState, 1, State0-Sum, State4+Sum);

    A butterfly operation consists of two add-compare-select calculations. The code above is used

    to perform the butterfly operation shown below.

    [Figure 10: butterfly operation from source states 000 and 100 to destination states 000 and 001; NewState[0] = Min(State0+Sum, State4-Sum) and NewState[1] = Min(State0-Sum, State4+Sum)]

    FIGURE 10: BUTTERFLY OPERATION DIAGRAM

    A single butterfly operation is performed for every pair of destination states within a trellis

    column. The same trellis column operation is iteratively performed on every subsequent trellis

    column until the end of the frame. Once the end of frame is reached, each state's accumulated

    distance metric is compared, with the smallest being considered the ending state. The trace

    back phase begins with the end state. The decoder will then extract the LSB of each state as

    the deduced input bit and use the coded path to trace through all prior trellis columns until the

    inferred input at the beginning of the frame is deduced.
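
    The update and trace back phases can be put together in a short sketch for the K=4 example code. This is a minimal illustration under stated assumptions, not the decoder shipped with the application note: encoded bit 1 is assumed to map to +B and bit 0 to -B, the encoder is assumed to start in state 000, and the names viterbi_decode, arc_outputs, decisions, NSTATES, and BIG are hypothetical.

```c
#define NSTATES 8          /* K = 4 gives 2^(K-1) = 8 states            */
#define BIG     (1 << 20)  /* stands in for "unreachable" at time zero  */

/* Expected encoder output bits for the arc leaving `state` on input x. */
static void arc_outputs(unsigned state, unsigned x, unsigned *g0, unsigned *g1)
{
    unsigned x1 = state & 1u, x2 = (state >> 1) & 1u, x3 = (state >> 2) & 1u;
    *g0 = x ^ x1 ^ x3;
    *g1 = x ^ x1 ^ x2 ^ x3;
}

/* Decode n soft-decision pairs (bit 1 -> +B, bit 0 -> -B) into n bits.
 * decisions[] is caller-provided scratch space of n*NSTATES bytes. */
static void viterbi_decode(const int *rx0, const int *rx1, int n,
                           unsigned char *decisions, unsigned char *out)
{
    int metric[NSTATES], next[NSTATES];
    for (int s = 0; s < NSTATES; s++)
        metric[s] = (s == 0) ? 0 : BIG;   /* assume encoder starts in 000 */

    for (int j = 0; j < n; j++) {                  /* update phase */
        for (unsigned d = 0; d < NSTATES; d++) {   /* destination state */
            unsigned x = d & 1u;                   /* input bit = LSB of d */
            int best = 0;
            unsigned char sel = 0;
            for (unsigned m = 0; m < 2; m++) {     /* m = MSB of source state */
                unsigned src = (d >> 1) | (m << 2);
                unsigned g0, g1;
                arc_outputs(src, x, &g0, &g1);
                /* simplified metric -(R0*G0) - (R1*G1) with constant B removed */
                int local = (g0 ? -rx0[j] : rx0[j]) + (g1 ? -rx1[j] : rx1[j]);
                int cand = metric[src] + local;                   /* ADD     */
                if (m == 0 || cand < best) {                      /* COMPARE */
                    best = cand;                                  /* SELECT  */
                    sel = (unsigned char)m;
                }
            }
            next[d] = best;
            decisions[j * NSTATES + d] = sel;
        }
        for (int s = 0; s < NSTATES; s++)
            metric[s] = next[s];
    }

    unsigned state = 0;                  /* end state: smallest final metric */
    for (unsigned s = 1; s < NSTATES; s++)
        if (metric[s] < metric[state])
            state = s;

    for (int j = n - 1; j >= 0; j--) {             /* trace back phase */
        out[j] = (unsigned char)(state & 1u);      /* LSB = inferred input bit */
        state = (state >> 1) | ((unsigned)decisions[j * NSTATES + state] << 2);
    }
}
```

    The recorded decision is the MSB of the originating state, so stepping back one column only needs a shift and an OR. A production decoder would also normalize the accumulated metrics on long frames, which this sketch omits.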

    8 Implementation on Base Xtensa

    A demonstration GSM Viterbi Decoder and test bench was developed in C and is provided as an
    Xplorer Workspace file, Viterbi_v2.xws. The decoder is a soft decision decoder using the
    Euclidian (Square) Distance metric and ACS macro described earlier in this Application Note.
    Since GSM uses a constraint length of five, there are 16 states in every trellis column
    (instead of the eight states described in previous sections). Hence, GSM requires eight butterfly
    operations to decode a single bit (as compared to our previous example, which only required
    four butterfly operations).

    The Viterbi_v2 project is a test bench that prepares a random frame of 1000 bits and then encodes them into GSM coded symbols. The symbols are corrupted to simulate white noise.

    Finally, the test bench decodes these bits and compares the output with the original input bits.

    The Viterbi decoder is benchmarked for performance.


    In this original form, a single bit requires 337 cycles to decode on a base Xtensa processor when using aggressive compiler optimizations (-O3 switch used in xt-xcc). Given that the Xtensa processor is as efficient as, if not more efficient than, ARM9 and MIPS32 cores in handling

    ANSI C code, the performance of other 32-bit RISC cores is estimated to be similar.

    9 Full Optimization with TIE

    The Tensilica Instruction Extension (TIE) language provides a powerful mechanism to add
    instructions to the base Xtensa instruction set and to generate complete support in hardware
    and software tools for special-purpose operations. The decode butterfly adds the local distance
    to the accumulated distance metrics of a pair of adjacent states, then compares the results and
    selects the most likely arc into each of the two states. The regularity of this computation
    suggests a set of special instructions intended to accelerate the butterfly calculation.
    Variations of add-compare-select instructions have been implemented on advanced digital signal
    processors to accelerate the Viterbi decoder. Likewise, variations of the add-compare-select
    instruction can be developed for Xtensa using TIE. Such instructions are invaluable in
    accelerating Viterbi decoders that support data encoded using arbitrary constraint lengths and
    polynomials. On the other hand, TIE could be used to develop instructions that accelerate the
    decoding of data generated by a specific encoder. TIE instructions that are specific to an
    encoder can be developed with computational performance comparable to a pure hardware
    implementation. The optimal choice of TIE instructions depends on the balance between
    flexibility and computational performance required in a given system.

    Significant improvement using TIE can be achieved by creating a variation of the
    add-compare-select butterfly computation and defining this logic as a TIE function, as shown below:

    // Viterbi ADD-COMPARE-SELECT Butterfly
    function [33:0] VBFLY ([15:0] StateA, [15:0] StateB, [15:0] Metric)
    {
        wire [15:0] neg_Metric = ~Metric + 1'b1;

        // Add state and path metric
        wire [15:0] stateA_pathA = StateA+Metric;
        wire [15:0] stateB_pathB = StateB+neg_Metric;

        // Compare accumulated metric
        wire [4:0] compA = TIEcmp(stateA_pathA, stateB_pathB, 1'b1);

        // Selected (least value) path is output
        wire [15:0] new_stateA = (compA[4])?stateA_pathA:stateB_pathB;
        wire SelectA = (compA[4])?0:1;

        wire [15:0] stateA_pathB = StateA+neg_Metric;
        wire [15:0] stateB_pathA = StateB+Metric;
        wire [4:0] compB = TIEcmp(stateA_pathB, stateB_pathA, 1'b1);
        wire [15:0] new_stateB = (compB[4])?stateA_pathB:stateB_pathA;
        wire SelectB = (compB[4])?0:1;

        assign VBFLY = {SelectA, SelectB, new_stateA, new_stateB};
    }

    This TIE function performs the same computation as a pair of ACS macros shown in section 7.
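For readers more comfortable with C than TIE, the following is a rough software model of the VBFLY function. It is a sketch only: the struct and function names are hypothetical, and it assumes that the TIE compare is a signed 16-bit comparison whose result bit 4 means "less than", which is how the listing uses it.

```c
#include <stdint.h>

typedef struct {
    uint16_t new_a; /* updated metric of the first destination state  */
    uint16_t new_b; /* updated metric of the second destination state */
    uint8_t  sel_a; /* 0: path from StateA survived, 1: path from StateB */
    uint8_t  sel_b;
} vbfly_result;

/* Software model of one add-compare-select butterfly.
 * Arithmetic wraps at 16 bits, mirroring the 16-bit wires in the TIE code. */
static vbfly_result vbfly_model(uint16_t state_a, uint16_t state_b, uint16_t metric)
{
    vbfly_result r;
    uint16_t neg_metric = (uint16_t)(0u - metric);

    /* First destination state: StateA + metric versus StateB - metric. */
    uint16_t a_path_a = (uint16_t)(state_a + metric);
    uint16_t b_path_b = (uint16_t)(state_b + neg_metric);
    /* Assumption: signed comparison, smaller accumulated metric wins. */
    int pick_a = (int16_t)a_path_a < (int16_t)b_path_b;
    r.new_a = pick_a ? a_path_a : b_path_b;
    r.sel_a = pick_a ? 0 : 1;

    /* Second destination state: StateA - metric versus StateB + metric. */
    uint16_t a_path_b = (uint16_t)(state_a + neg_metric);
    uint16_t b_path_a = (uint16_t)(state_b + metric);
    int pick_b = (int16_t)a_path_b < (int16_t)b_path_a;
    r.new_b = pick_b ? a_path_b : b_path_a;
    r.sel_b = pick_b ? 0 : 1;

    return r;
}
```

For example, with state metrics 10 and 20 and a branch metric of 3, both destination states keep the path from StateA (new metrics 13 and 7, select bits 0 and 0).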


    Several additional techniques used to accelerate the Viterbi decoder are:

    - The VBFLY TIE function can be instanced several times in an operation so that multiple
      Viterbi butterfly computations are performed in parallel.
    - Internal TIE state (not to be confused with the states in the trellis diagram, referred to
      here as trellis states) can hold intermediate data, such as accumulated state metrics,
      which eliminates many memory accesses.
    - Memory accesses and butterfly computations can be fused into high-performance TIE operations.
    - FLIX with a dual load/store interface allows two operations (both performing a load/store)
      to be issued in the same instruction word.

    Appendix A lists vtb2.tie, the TIE file that describes TIE operations that accelerate Viterbi
    decode. The TIE instructions for the trellis update phase of Viterbi decoding are summarized
    below.

    VBIN: Viterbi Input

    C Intrinsic Syntax: void VBIN(VREG PG0, VREG* p_PG0)

    This operation loads 2 GSM coded symbol pairs (4 bytes) at one time by using a 32-bit load into
    a 32-bit register file, VREG. The load pointer (p_PG0) is also auto-incremented by 4 bytes in
    preparation for the next VBIN instruction.

    VBOUT: Parallel Viterbi Butterfly Operation and Output

    C Intrinsic Syntax: void VBOUT(unsigned short* PSelect, VREG PG0, imm i)

    This operation updates all state metrics of a trellis column for a single pair of GSM coded data
    (PG0). The add-compare-select operation is performed on all 16 states of the trellis column
    using 8 VBFLY TIE functions, which together cover the Viterbi butterfly computations for the
    entire trellis column.

    This operation updates each state's accumulated distance metric, held in 16-bit TIE states (one
    for each of the 16 trellis states), and writes out 16 select bits for the most likely arcs going
    into each of the 16 trellis states. The write pointer (PSelect) is auto-incremented in
    preparation for the next VBOUT instruction. An immediate operand (i) is used to choose a symbol
    pair of GSM coded data from the 32-bit VREG TIE register file. Since VBIN provides 2 GSM coded
    symbol pairs, there will be two VBOUT instructions for each VBIN instruction.
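The way VBOUT picks one symbol pair out of the 32-bit VREG value and forms the sum and difference inputs can be modelled in C. This sketch follows the VBOUT listing in Appendix A (big-endian ordering, 8-bit wrap-around followed by sign extension), but the names `branch_inputs` and `select_pair` are hypothetical.

```c
#include <stdint.h>

typedef struct {
    int16_t sum;  /* sign-extended (G0 + G1) */
    int16_t diff; /* sign-extended (G0 - G1) */
} branch_inputs;

/* word: 32-bit value as loaded by VBIN (two symbol pairs).
 * t   : 1 selects the pair in bits [15:0]; any other value selects
 *       the pair in bits [31:16], as in the Appendix A listing. */
static branch_inputs select_pair(uint32_t word, unsigned t)
{
    uint8_t g0 = (uint8_t)((t == 1) ? (word >> 8) : (word >> 24));
    uint8_t g1 = (uint8_t)((t == 1) ? (word >> 0) : (word >> 16));

    /* The sum and difference wrap at 8 bits before sign extension. */
    int8_t sum8  = (int8_t)(uint8_t)(g0 + g1);
    int8_t diff8 = (int8_t)(uint8_t)(g0 - g1);

    branch_inputs r = { sum8, diff8 };
    return r;
}
```

Calling `select_pair(word, 0)` and then `select_pair(word, 1)` corresponds to the two VBOUT instructions issued for each VBIN.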

    WUR_BMSel: Write User Register - Branch Metric Select

    The BMSel register is a 32-bit register that sets the distance metric for each path of the
    Viterbi butterfly computations used by the VBOUT instruction. Since VBOUT performs 8 butterfly
    computations, there are 32 path metrics. However, due to path symmetry in the butterfly
    structure, only the top-most path of each butterfly needs to be defined; the remaining paths
    are inferred from it. For example, the top-most path in figure 8 is +sum. The bottom-most
    path is the same as the top-most path (+sum), and the diagonal paths are the negative of the
    top-most path (-sum).

    The BMSel register is split into eight 4-bit fields, where each bit of a field is a one-hot
    selector for +sum, -sum, +diff, or -diff. The most significant 4-bit field corresponds to the
    top-most path of the butterfly computation that updates states 0 and 1. The following 4-bit
    field corresponds to the top-most path of the butterfly computation that updates states 2 and
    3, and so on.

    Prior to executing VBOUT instructions, the BMSel register should be initialized with the
    appropriate branch metric selection for the butterfly computations. Because the branch metrics
    are programmable, the VBOUT instruction supports different polynomials used for Viterbi coding
    (given that the constraint length is k=5, coding rate = 1/2).
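The field layout described above can be illustrated with a small C helper that packs eight one-hot selectors into a 32-bit word. The helper name `pack_bmsel` and the enum names are hypothetical, and any selector pattern passed to it in an example is a placeholder rather than the GSM setting; the one-hot values follow the bit positions used by the TIEsel calls in Appendix A.

```c
#include <stdint.h>

/* One-hot selector values within a 4-bit BMSel field. */
enum {
    SEL_SUM      = 8, /* +sum  */
    SEL_NEG_SUM  = 4, /* -sum  */
    SEL_DIFF     = 2, /* +diff */
    SEL_NEG_DIFF = 1  /* -diff */
};

/* Pack eight 4-bit selectors into one 32-bit word.
 * sel[0] lands in the most significant field (the butterfly that updates
 * states 0 and 1), sel[7] in the least significant field. */
static uint32_t pack_bmsel(const uint8_t sel[8])
{
    uint32_t word = 0;
    for (int i = 0; i < 8; i++) {
        word = (word << 4) | (uint32_t)(sel[i] & 0xFu);
    }
    return word;
}
```

The packed word is what would be passed to WUR_BMSel before the first VBOUT instruction.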

    In this example, the path metrics for each butterfly computation are taken directly from the

    GSM decoder C source code. The initialization for standard GSM coded polynomials is shown in

    the sample code below:


    #define dist_sum      8
    #define dist_neg_sum  4
    #define dist_diff     2
    #define dist_neg_diff 1

    WUR_BMSel((dist_sum


    work-per-cycle basis. Note that the Xtensa-based implementation is written in C, whereas

    hand-coded assembly is required to obtain performance numbers for many DSP machines.

    The TIE operations for the trace-back phase of Viterbi decoding are summarized below.

    BACKTRACE: Viterbi Backtrace

    C Intrinsic Syntax: void BACKTRACE(unsigned short* PSelect)

    This operation loads the 16 select bits (from address PSelect) that were stored during
    execution of VBOUT instructions. From the current minimum state, the select value
    (representing the most likely path) is used to trace backward to the previous trellis stage.
    The LSB value of the minimum state is considered to be the most likely output bit and is saved
    in a holding register to be written to memory later using the STORE_OUT operation. The select
    pointer (PSelect) is post-decremented by 2 in preparation for the next BACKTRACE operation.
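A single trace-back step can be modelled in C as shown below. The sketch mirrors the BACKTRACE operation listed in Appendix A (state 0's select bit is the MSB of the 16-bit word, and the new state is formed by shifting the select bit in from the top); the struct and function names are hypothetical, and the pointer update is left to the caller.

```c
#include <stdint.h>

typedef struct {
    uint8_t min_state; /* current 4-bit trellis state on the survivor path */
    uint8_t output;    /* most recently decoded bit (holding register)     */
} backtrace_ctx;

/* Model of one BACKTRACE step.
 * sel: the 16 select bits written by VBOUT for this trellis column. */
static void backtrace_step(backtrace_ctx *ctx, uint16_t sel)
{
    /* Select bit for state s sits at bit (15 - s). */
    uint8_t bit = (uint8_t)((sel >> (15u - ctx->min_state)) & 1u);

    /* Bit 1 of the old state becomes the LSB of the new state and is
     * saved as the decoded output bit. */
    ctx->output = (uint8_t)((ctx->min_state >> 1) & 1u);

    /* Step back one column: shift right, select bit enters at the top. */
    ctx->min_state = (uint8_t)((bit << 3) | (ctx->min_state >> 1));
}
```

A caller would walk the stored select words from the last trellis column to the first, calling this step once per column.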

    BACKTRACE0: Viterbi Backtrace Initialization

    C Intrinsic Syntax: void BACKTRACE0(char MinState)

    This instruction is a subset of the BACKTRACE operation and is executed only once, prior to
    subsequent executions of the BACKTRACE instruction. It initializes the minimum state after the
    update phase. The state number with the minimum value is passed as the argument MinState.

    STORE_OUT: Store output value

    C Intrinsic Syntax: void STORE_OUT(unsigned char* POutput)

    This instruction performs a byte store of the single-bit output value calculated in prior
    executions of the BACKTRACE instruction to pointer POutput. The POutput pointer is
    post-decremented by one in preparation for the next STORE_OUT operation.

    The main loop for the Viterbi decoder's trace-back phase is shown below:

        for (i=FS-1; i>=1; i--) {
            BACKTRACE(PSelect);
            STORE_OUT(ptr_output);
        }

    The disassembly of the Viterbi decoder's backtrace loop is as follows:

        loopgtz a10, 60000fe0
        { store_out a9; backtrace a8 }

    The loop consists of a single FLIX instruction that contains both BACKTRACE and STORE_OUT
    operations. These operations are effectively pipelined such that the backtrace is done in the
    first iteration and then the output bit is written to memory in the next iteration. As a result,
    an output bit is written every clock cycle. This means that the trace-back phase of Viterbi
    decoding occurs at a rate of one cycle per bit.

    The highly optimized assembly code described in this section was directly compiled from C

    source code with the TIE variable set (#define TIE). Upon building this example and simulating

    it, the console shows the following:


        Processing New Frame
        Errors detected = 0, Benchmark = 2.167000 cycles per bit

    Viterbi decode performance of 2.17 cycles per bit is more than a 155x improvement over the
    standard implementation without TIE acceleration (337 cycles per bit). The TIE area for this
    approach is 28.7K gates, in addition to 47K gates for the base Xtensa LX2 core. This core is
    capable of being synthesized up to 264 MHz (worst case) in .13 LV. Therefore, this solution is
    capable of decoding a GSM coded bitstream at a peak rate of 130 Mbits per second.
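The arithmetic behind these figures can be checked with a few lines of C. The helper names are hypothetical. Note that dividing 264 MHz by the measured 2.167 cycles per bit gives roughly 122 Mbits per second, whereas the quoted peak rate of 130 Mbits per second is close to what exactly two cycles per bit would give (132 Mbits per second); reading the peak figure that way is an interpretation, not something the text states.

```c
/* Speedup of the TIE-accelerated decoder over the base implementation,
 * both expressed in cycles per decoded bit. */
static double speedup(double base_cycles_per_bit, double tie_cycles_per_bit)
{
    return base_cycles_per_bit / tie_cycles_per_bit;
}

/* Decode throughput in Mbits per second for a given clock frequency
 * (in MHz) and cost per decoded bit (in cycles). */
static double mbits_per_second(double clock_mhz, double cycles_per_bit)
{
    return clock_mhz / cycles_per_bit;
}
```

For the numbers in this section, `speedup(337.0, 2.167)` is about 155.5 and `mbits_per_second(264.0, 2.167)` is about 121.8.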

    10 Demonstration Instructions

    The demonstration requires that you have installed Xplorer CE 2.1.1 with RB-2008.3 software
    tools. The workspace, Viterbi_V2.xws, can be obtained from the Tensilica support website.
    Follow these steps to build and simulate the demonstration code.

    1. Start Xplorer and import the Viterbi_V2.xws workspace. Select all components
       provided in the workspace for installation into your workspace.

    2. In the workspace toolbar, select project (P: Viterbi_v2), configuration (C: Viterbi_v2) and
       release target (T: Release).

    3. Click Build Active to compile and then click on Run to simulate. The console will display the
       decode error and benchmark results.

    To compare performance with the ANSI C implementation (without TIE), you can comment out
    the line (#define TIE) in the main.c file of the Viterbi_V2 project.

    11 Summary

    Xtensa processors offer significant advantages for complex telephony applications. The Xtensa
    architecture combines a powerful general-purpose 32-bit instruction set design with a unique
    configuration and extension process. These are used together to solve some of the toughest
    problems in communication system design, including efficient convolutional coding and Viterbi
    decoding. Application-specific processors can be quickly designed, simulated, and built in
    silicon, and offer significantly better programmability, performance and power-efficiency than
    most popular DSPs. With the benefit of TIE, Xtensa solutions can offer more than 155x
    improvement in communication processing efficiency compared to conventional 32-bit RISC cores
    and over 32x improvement when compared to specialized DSPs.


    Appendix A VTB2.TIE Code

    // VTB2.TIE
    // TIE Extensions for Viterbi Acceleration

    // FLIX
    format vtb_flix 32 {slot_a, slot_b}
    slot_opcodes slot_a {VBIN, STORE_OUT}
    slot_opcodes slot_b {VBOUT, BACKTRACE, BACKTRACE0}

    // States used by Viterbi Instructions
    state AccumDist0 16 add_read_write
    state AccumDist1 16 add_read_write
    state AccumDist2 16 add_read_write
    state AccumDist3 16 add_read_write
    state AccumDist4 16 add_read_write
    state AccumDist5 16 add_read_write
    state AccumDist6 16 add_read_write
    state AccumDist7 16 add_read_write
    state AccumDist8 16 add_read_write
    state AccumDist9 16 add_read_write
    state AccumDistA 16 add_read_write
    state AccumDistB 16 add_read_write
    state AccumDistC 16 add_read_write
    state AccumDistD 16 add_read_write
    state AccumDistE 16 add_read_write
    state AccumDistF 16 add_read_write
    state MinState 4 add_read_write
    state BMSel 32 add_read_write
    state Output 1 add_read_write

    // Immediates
    immediate_range imm8 0 7 1

    regfile VREG 32 2 vr

    // Viterbi ADD-COMPARE-SELECT Butterfly
    function [33:0] VBFLY ([15:0] StateA, [15:0] StateB, [15:0] Metric)
    {
        wire [15:0] neg_Metric = ~Metric + 1'b1;

        wire [15:0] stateA_pathA = StateA+Metric;
        wire [15:0] stateB_pathB = StateB+neg_Metric;
        wire [4:0] compA = TIEcmp(stateA_pathA, stateB_pathB, 1'b1);
        wire [15:0] new_stateA = (compA[4])?stateA_pathA:stateB_pathB;
        wire SelectA = (compA[4])?0:1;

        wire [15:0] stateA_pathB = StateA+neg_Metric;
        wire [15:0] stateB_pathA = StateB+Metric;
        wire [4:0] compB = TIEcmp(stateA_pathB, stateB_pathA, 1'b1);
        wire [15:0] new_stateB = (compB[4])?stateA_pathB:stateB_pathA;
        wire SelectB = (compB[4])?0:1;

        assign VBFLY = {SelectA, SelectB, new_stateA, new_stateB};
    }

    operation VBIN {out VREG GInput, inout AR *ars} {out VAddr, in MemDataIn32}
    {
        assign VAddr = ars;
        assign GInput = MemDataIn32;
        assign ars = ars+4;
    }

    operation VBOUT


    {inout AR *ars, in VREG GInput, in imm8 t}
    {
        in BMSel,
        inout AccumDist0, inout AccumDist1, inout AccumDist2, inout AccumDist3,
        inout AccumDist4, inout AccumDist5, inout AccumDist6, inout AccumDist7,
        inout AccumDist8, inout AccumDist9, inout AccumDistA, inout AccumDistB,
        inout AccumDistC, inout AccumDistD, inout AccumDistE, inout AccumDistF,
        out VAddr, out MemDataOut16
    }
    {
        // Choose G0 from GInput based upon immediate argument t
        // Written for Big Endian Ordering
        wire [7:0] G0=((t==1)?GInput[15:8]:GInput[31:24]);

        // Choose G1 from GInput based upon immediate argument t
        // Written for Big Endian Ordering
        wire [7:0] G1=((t==1)?GInput[7:0]:GInput[23:16]);

        // Declare temporary variables for AccumDist
        wire [15:0] State0=AccumDist0;
        wire [15:0] State1=AccumDist1;
        wire [15:0] State2=AccumDist2;
        wire [15:0] State3=AccumDist3;
        wire [15:0] State4=AccumDist4;
        wire [15:0] State5=AccumDist5;
        wire [15:0] State6=AccumDist6;
        wire [15:0] State7=AccumDist7;
        wire [15:0] State8=AccumDist8;
        wire [15:0] State9=AccumDist9;
        wire [15:0] StateA=AccumDistA;
        wire [15:0] StateB=AccumDistB;
        wire [15:0] StateC=AccumDistC;
        wire [15:0] StateD=AccumDistD;
        wire [15:0] StateE=AccumDistE;
        wire [15:0] StateF=AccumDistF;

        // Calculate Sum/Diff for input
        wire [7:0] Sum_8=G0+G1;
        wire [7:0] Diff_8=G0-G1;
        wire [15:0] Sum={8{Sum_8[7]}, Sum_8};
        wire [15:0] Diff={8{Diff_8[7]}, Diff_8};
        wire [15:0] neg_Sum=~Sum + 1;
        wire [15:0] neg_Diff=~Diff + 1;

        // Calculate Accumulated Path Metrics
        // Compare/Select Shortest Path into each State
        // using 8 parallel VBFLY functions
        wire [15:0] new_AccumDist0, new_AccumDist1, new_AccumDist2, new_AccumDist3,
            new_AccumDist4, new_AccumDist5, new_AccumDist6, new_AccumDist7,
            new_AccumDist8, new_AccumDist9, new_AccumDistA, new_AccumDistB,
            new_AccumDistC, new_AccumDistD, new_AccumDistE, new_AccumDistF;
        wire Select0, Select1, Select2, Select3, Select4, Select5, Select6, Select7,
            Select8, Select9, SelectA, SelectB, SelectC, SelectD, SelectE, SelectF;


        wire [15:0] DistA = TIEsel(BMSel[31], Sum, BMSel[30], neg_Sum, BMSel[29], Diff,
            BMSel[28], neg_Diff);
        assign {Select0, Select1, new_AccumDist0, new_AccumDist1} = VBFLY(State0, State8, DistA);

        wire [15:0] DistB = TIEsel(BMSel[27], Sum, BMSel[26], neg_Sum, BMSel[25], Diff,
            BMSel[24], neg_Diff);
        assign {Select2, Select3, new_AccumDist2, new_AccumDist3} = VBFLY(State1, State9, DistB);

        wire [15:0] DistC = TIEsel(BMSel[23], Sum, BMSel[22], neg_Sum, BMSel[21], Diff,
            BMSel[20], neg_Diff);
        assign {Select4, Select5, new_AccumDist4, new_AccumDist5} = VBFLY(State2, StateA, DistC);

        wire [15:0] DistD = TIEsel(BMSel[19], Sum, BMSel[18], neg_Sum, BMSel[17], Diff,
            BMSel[16], neg_Diff);
        assign {Select6, Select7, new_AccumDist6, new_AccumDist7} = VBFLY(State3, StateB, DistD);

        wire [15:0] DistE = TIEsel(BMSel[15], Sum, BMSel[14], neg_Sum, BMSel[13], Diff,
            BMSel[12], neg_Diff);
        assign {Select8, Select9, new_AccumDist8, new_AccumDist9} = VBFLY(State4, StateC, DistE);

        wire [15:0] DistF = TIEsel(BMSel[11], Sum, BMSel[10], neg_Sum, BMSel[9], Diff,
            BMSel[8], neg_Diff);
        assign {SelectA, SelectB, new_AccumDistA, new_AccumDistB} = VBFLY(State5, StateD, DistF);

        wire [15:0] DistG = TIEsel(BMSel[7], Sum, BMSel[6], neg_Sum, BMSel[5], Diff,
            BMSel[4], neg_Diff);
        assign {SelectC, SelectD, new_AccumDistC, new_AccumDistD} = VBFLY(State6, StateE, DistG);

        wire [15:0] DistH = TIEsel(BMSel[3], Sum, BMSel[2], neg_Sum, BMSel[1], Diff,
            BMSel[0], neg_Diff);
        assign {SelectE, SelectF, new_AccumDistE, new_AccumDistF} = VBFLY(State7, StateF, DistH);

        // Store new state metrics
        assign AccumDist0=new_AccumDist0;
        assign AccumDist1=new_AccumDist1;
        assign AccumDist2=new_AccumDist2;
        assign AccumDist3=new_AccumDist3;
        assign AccumDist4=new_AccumDist4;
        assign AccumDist5=new_AccumDist5;
        assign AccumDist6=new_AccumDist6;
        assign AccumDist7=new_AccumDist7;
        assign AccumDist8=new_AccumDist8;
        assign AccumDist9=new_AccumDist9;
        assign AccumDistA=new_AccumDistA;
        assign AccumDistB=new_AccumDistB;
        assign AccumDistC=new_AccumDistC;
        assign AccumDistD=new_AccumDistD;
        assign AccumDistE=new_AccumDistE;
        assign AccumDistF=new_AccumDistF;

        // Write out the Binary Encoded Paths
        wire [15:0] SelectPaths={Select0, Select1, Select2, Select3, Select4, Select5,
            Select6, Select7, Select8, Select9, SelectA, SelectB, SelectC, SelectD,
            SelectE, SelectF};
        assign VAddr=ars;
        assign MemDataOut16=SelectPaths;

        // Update the output pointer
        assign ars=ars+2;


    }

    // Initialize Backtrace instruction
    operation BACKTRACE0 {in AR ars} {out MinState, out Output}
    {
        // initialize Minstate w/ most likely endstate
        assign MinState = ars;

        // the LSB is the output bit
        assign Output = ars[0];
    }

    operation BACKTRACE {inout AR *art} {inout MinState, out Output, out VAddr, in MemDataIn16}
    {
        // Read in Paths for trellis column and postdecrement pointer
        assign VAddr = art;
        wire [15:0] Sel = MemDataIn16;
        assign art = art - 2;

        // Select path for trellis state
        wire DataIn8 = TIEmux(MinState[3:0], Sel[15], Sel[14], Sel[13], Sel[12], Sel[11],
            Sel[10], Sel[9], Sel[8], Sel[7], Sel[6], Sel[5], Sel[4], Sel[3], Sel[2], Sel[1],
            Sel[0]);

        // Trace backward one bit to previous state
        assign MinState = {DataIn8, MinState[3:1]};

        // Save output bit
        assign Output = MinState[1];
    }

    schedule backtrace_sched {BACKTRACE} {use MinState 2; def MinState 2; def Output 2;}

    operation STORE_OUT {inout AR *Addr} {in Output, out VAddr, out MemDataOut8}
    {
        assign VAddr = Addr;
        assign MemDataOut8 = {7'b0, Output};
        assign Addr = Addr - 1;
    }
