vlsidsp_chap6

Upload: anupam-dubey

Post on 04-Apr-2018


  • 7/30/2019 VLSIDSP_CHAP6

    1/35

    VLSI Digital Signal Processing Systems

    Folding

    Lan-Da Van, Ph.D.

    Department of Computer Science

    National Chiao Tung University

    Taiwan, R.O.C.

    Fall, 2010

    [email protected]

    http://www.cs.nctu.tw/~ldvan/


    Outline

    Introduction

    Folding Transformation

    Register Minimization Techniques

    Register Minimization in Folded Architecture

    Conclusions


    Introduction (1/2)

    The folding transformation systematically determines the control circuits in DSP architectures in which multiple algorithm operations are time-multiplexed onto a single functional unit.

    It is used to synthesize DSP architectures that operate with a single clock or with multiple clocks.

    It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.

    Folding can lead to an architecture that uses a large number of registers, which is why register minimization techniques are also presented.


    Introduction (2/2)


    Folding Transformation (1/3)

    A systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.

    Notation

    $U$, $V$: nodes (operations) of the original DFG

    $H_U$, $H_V$: nodes (functional units) of the folded DFG

    $W(x)$: the $x$-th iteration of node $W$

    $U \xrightarrow{e} V$: an edge $e$ from node $U$ to node $V$

    $w(e)$: number of delays on edge $e$

    Folding factor $N$: number of operations that share one FU

    Folding set: an ordered set of operations that are executed by the same FU

    The position of an operation $U$ in its folding set is the folding order of $U$.

    Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B).

    The folding sets represent the underlying folding transformation.


    Folding Transformation (2/3)

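    $P_U$: number of pipeline stages of $H_U$. $P_U = 0$ indicates that $H_U$ is not pipelined.

    $D_F(U \xrightarrow{e} V)$ (folding equation): number of cycles for which the result of $H_U$ must be stored before $H_V$ uses it.

    $D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$

    Here $u$ and $v$ are the folding orders of $U$ and $V$.

    A folding equation $D_F$ can be negative before the DFG is retimed for folding.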
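    The folding equation can be checked numerically. The sketch below is illustrative only: the function names and the sample values are assumptions, not part of the slides. It computes $D_F$ both from the closed form and from the time units at which the folded functional units run, and shows that the iteration index $l$ cancels.

```python
def folding_delay(N, w, p_u, u, v):
    """Closed form of the folding equation: D_F = N*w - P_U + v - u."""
    return N * w - p_u + v - u


def folding_delay_from_schedule(N, w, p_u, u, v, l):
    """Same quantity, derived from the schedule of the folded hardware."""
    t_u_done = N * l + u + p_u      # U(l) starts at N*l + u; result ready P_U cycles later
    t_v_start = N * (l + w) + v     # V(l + w) starts at N*(l + w) + v
    return t_v_start - t_u_done


# Hypothetical values, chosen only for illustration.
N, w, p_u, u, v = 4, 1, 2, 1, 3
delays = [folding_delay_from_schedule(N, w, p_u, u, v, l) for l in range(5)]
print(folding_delay(N, w, p_u, u, v), delays)
```

    Because the two functions agree for every $l$, one set of $D_F$ registers per edge serves all iterations.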


    Folding Transformation (3/3)

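    [Figure: an edge of the original DFG and the corresponding folded edge after N-folding]

    Original DFG: $U(l)$ drives $V(l + w(e))$ through an edge with $w(e)$ delays.

    Folded DFG: $H_U$ executes $U(l)$ at time unit $Nl + u$ and has $P_U$ pipeline stages; its result passes through $D_F$ delays and is used by $H_V$ at time unit $N(l + w(e)) + v$.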


    Folding Retimed Biquad Filter (1/2)

    Folding factor: N = 4

    Folding sets: S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, where S1 contains all addition operations and S2 contains all multiplication operations.

    Assume that addition and multiplication require 1 and 2 u.t. (units of time), respectively.

    1-stage pipelined adders and 2-stage pipelined multipliers are available.
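    As a small illustration of what the folding sets encode, the following sketch derives the folding order of each node and the operation each functional unit starts at every time unit. The helper names are hypothetical; only N, S1 and S2 come from the slide.

```python
N = 4
S1 = [4, 2, 3, 1]   # additions, all executed by the single adder
S2 = [5, 8, 6, 7]   # multiplications, all executed by the single multiplier


def folding_order(folding_set):
    """Map each operation to its position (folding order) in its folding set."""
    return {op: pos for pos, op in enumerate(folding_set)}


def operation_at(folding_set, t):
    """Operation started by the FU at time unit t; the schedule repeats every N units."""
    return folding_set[t % len(folding_set)]


order = {**folding_order(S1), **folding_order(S2)}
adder_schedule = [operation_at(S1, t) for t in range(2 * N)]
multiplier_schedule = [operation_at(S2, t) for t in range(2 * N)]
print(order)
print(adder_schedule)
print(multiplier_schedule)
```

    Node 1, for example, has folding order 3, so the adder starts it at time units 3, 7, 11, and so on.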


    Folding Retimed Biquad Filter (2/2)

    folding equations


    Retiming (1/3)

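    What happens if a folding equation $D_F$ is negative? The result would be needed before it is produced, so the folded architecture cannot be implemented.

    Remedy: retime (move delay elements in) the original DFG prior to folding.

    Constraint on the retimed folding equation: $D_F'(U \to V) = N w_r(e) - P_U + v - u \ge 0$    (1)

    Substitute $w_r(e) = w(e) + r(V) - r(U)$ into (1):

    $r(U) - r(V) \le \lfloor D_F(U \to V) / N \rfloor$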
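    The intermediate steps of this substitution are sketched below, using only the symbols defined on the slides. The last step relies on the retiming values $r(\cdot)$ being integers.

```latex
\begin{align*}
N\,w_r(e) - P_U + v - u &\ge 0 \\
N\bigl(w(e) + r(V) - r(U)\bigr) - P_U + v - u &\ge 0 \\
\underbrace{N\,w(e) - P_U + v - u}_{D_F(U \to V)} + N\bigl(r(V) - r(U)\bigr) &\ge 0 \\
r(U) - r(V) &\le \frac{D_F(U \to V)}{N} \\
r(U) - r(V) &\le \left\lfloor \frac{D_F(U \to V)}{N} \right\rfloor
\end{align*}
```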


    Retiming (2/3)

    Example: $D_F(1 \to 2) = N w(e) - P_U + v - u = 4(0) - 1 + 1 - 3 = -3$

    Retiming constraint: $r(1) - r(2) \le \lfloor -3/4 \rfloor = -1$


    Retiming (3/3)

    r(1) = -1, r(2) = 0, r(3) = -1, r(4) = 0,

    r(5) = -1, r(6) = -1, r(7) = -2, r(8) = -1
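    A quick consistency check of these retiming values, as a sketch. It uses the four negative folding equations that the deck quotes for the unretimed biquad (edges 1→2, 6→4, 8→4 and 7→3); the function names are assumptions made for illustration.

```python
N = 4
# Negative folding equations of the original (unretimed) biquad, as quoted in the slides.
negative_edges = {(1, 2): -3, (6, 4): -4, (8, 4): -3, (7, 3): -3}
# Retiming values listed above.
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}


def retiming_bound(d_f, N):
    """Largest allowed r(U) - r(V), i.e. floor(D_F / N); Python's // floors toward -infinity."""
    return d_f // N


def retimed_delay(d_f, u, v):
    """Folding equation after retiming: D_F + N * (r(V) - r(U))."""
    return d_f + N * (r[v] - r[u])


retimed = {}
for (u, v), d_f in negative_edges.items():
    assert r[u] - r[v] <= retiming_bound(d_f, N)
    retimed[(u, v)] = retimed_delay(d_f, u, v)
    print(f"{u}->{v}: bound {retiming_bound(d_f, N)}, retimed D_F = {retimed[(u, v)]}")
```

    Note that integer division must round toward negative infinity here; a language whose integer division truncates toward zero would give 0 instead of -1 for -3/4.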


    Lifetime Analysis

    Lifetime analysis is a procedure for computing the minimum number of registers required to implement a DSP algorithm in hardware.

    Two forms are used: linear lifetime analysis and circular lifetime analysis.

    In lifetime analysis, the number of live variables at each time unit is computed; the maximum number of live variables over all time units is the minimum number of registers required.

    Registers are then assigned with the forward-backward register allocation technique.


    Linear Lifetime Analysis

    Variables: {a, b, c}

    Live variables per time unit: max{0, 1, 2, 2, 2, 2, 2, 2} = 2

    [Figure: linear lifetime chart, shown for three iterations with N = 6; the periodicity is implicit]


    Matrix Transpose Example (1/3)

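    The matrix-transpose block reads a 3x3 matrix as a serial stream and outputs the transposed matrix as a serial stream.

    Input matrix:        Transposed matrix:

      a b c                a d g
      d e f                b e h
      g h i                c f i

    Input order: a, b, c, d, e, f, g, h, i

    Output order: a, d, g, b, e, h, c, f, i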


    Matrix Transpose Example (2/3)

    $T_{zlout}$ = zero-latency output time

    $T_{diff} = T_{zlout} - T_{input}$

    $T_{output} = T_{zlout} + \max\{-T_{diff}\}$
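    The three definitions above are enough to build the lifetime table for the 3x3 transpose. The sketch below assumes that one sample arrives per time unit starting at time 0, in the order a, b, ..., i, and that the zero-latency output order is a, d, g, b, e, h, c, f, i; the variable names are illustrative.

```python
input_order = list("abcdefghi")    # row-major stream of the 3x3 input matrix
output_order = list("adgbehcfi")   # row-major stream of the transposed matrix

t_input = {s: t for t, s in enumerate(input_order)}
t_zlout = {s: t for t, s in enumerate(output_order)}
t_diff = {s: t_zlout[s] - t_input[s] for s in input_order}

# Smallest latency that makes every lifetime non-negative.
latency = max(-d for d in t_diff.values())
t_output = {s: t_zlout[s] + latency for s in input_order}

for s in input_order:
    print(f"{s}: T_input={t_input[s]} T_zlout={t_zlout[s]} T_diff={t_diff[s]:+d} "
          f"T_output={t_output[s]} lifetime {t_input[s]}->{t_output[s]}")
```

    Sample g has the most negative T_diff (it must leave four time units before it arrives in the zero-latency schedule), so it sets the latency.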


    Matrix Transpose Example (3/3)

    The minimum number of registers is 4.

    [Figures: linear lifetime chart; circular lifetime chart]
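    The register count can be reproduced by counting live variables, first along a linear time axis and then folded modulo the period N = 9. This is a sketch under one stated assumption: a variable is taken to be live from the time unit after it is produced through the time unit in which it is consumed. The lifetimes are those that follow from $T_{output} = T_{zlout} + \max\{-T_{diff}\}$ for the transpose example.

```python
from collections import Counter

# (T_input, T_output) per sample of the 3x3 transpose example.
lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
N = 9  # one 3x3 matrix is processed every 9 time units


def live_counts(lifetimes, period=None):
    """Count live variables per time unit; fold times modulo `period` if given."""
    counts = Counter()
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):   # assumed convention: live in t_in+1 .. t_out
            counts[t % period if period else t] += 1
    return counts


linear = live_counts(lifetimes)
circular = live_counts(lifetimes, period=N)
print("linear:  ", [linear[t] for t in range(13)])
print("circular:", [circular[t] for t in range(N)])
print("minimum registers:", max(circular.values()))
```

    The circular count adds the tail of one iteration to the head of the next, which is what a periodic implementation actually has to store.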


    Procedure for Forward-Backward Register Allocation

    Steps:

    Step 1: Determine the minimum number of registers using lifetime analysis.

    Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.

    Step 3: Allocate each variable in a forward manner until it is dead or it reaches the last register.

    Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations. Therefore, if a register is occupied at time step l, hash out the same register at time step l + N, where N is the period.

    Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.

    Step 6: Repeat Steps 4 and 5 as required until the allocation is complete.
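    The steps above can be sketched in code. This is a simplified greedy variant, not the exact procedure from the slides: the forward pass follows Steps 2 to 4, but when a variable cannot move forward it is placed in the lowest-numbered free register, a tie-break chosen here for illustration. It also assumes that at most one variable enters the register chain per time unit modulo N, and that a variable is live from the time unit after it is produced through the time unit in which it is consumed. The function name and data layout are hypothetical.

```python
def forward_backward_allocate(lifetimes, period, num_regs):
    """lifetimes: {name: (t_in, t_out)}. Returns {name: [(cycle, register), ...]}
    with registers numbered 1..num_regs."""
    occupied = {}                       # (cycle mod period, register) -> variable
    alloc = {name: [] for name in lifetimes}

    def place(name, cycle, reg):
        occupied[(cycle % period, reg)] = name   # Step 4: periodic hashing
        alloc[name].append((cycle, reg))

    # Steps 2-3: every variable enters R1 and moves forward one register per cycle.
    for name, (t_in, t_out) in lifetimes.items():
        for reg in range(1, num_regs + 1):
            cycle = t_in + reg
            if cycle > t_out:
                break
            if (cycle % period, reg) in occupied:
                raise ValueError("two variables enter the chain in the same slot")
            place(name, cycle, reg)

    # Steps 5-6: variables still alive are re-placed, then keep moving forward.
    for name, (t_in, t_out) in lifetimes.items():
        while alloc[name] and alloc[name][-1][0] < t_out:
            cycle, reg = alloc[name][-1]
            nxt = cycle + 1
            if reg < num_regs and (nxt % period, reg + 1) not in occupied:
                place(name, nxt, reg + 1)
                continue
            free = [q for q in range(1, num_regs + 1)
                    if (nxt % period, q) not in occupied]
            if not free:
                raise ValueError("not enough registers")
            place(name, nxt, free[0])
    return alloc


# Lifetimes of the 3x3 matrix-transpose example, period N = 9, 4 registers.
lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
alloc = forward_backward_allocate(lifetimes, period=9, num_regs=4)
for name, slots in alloc.items():
    print(name, " ".join(f"t{c}:R{q}" for c, q in slots))
```

    Keying the occupancy table on the cycle modulo the period is what makes an allocation for one iteration valid for all later iterations.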


    Register Allocation for Matrix Transpose Example


    Procedure for Register Minimization in Folded Architectures

    Steps:

    Step 1: Perform retiming for folding.

    Step 2: Write the folding equations.

    Step 3: Use the folding equations to construct a lifetime table.

    Step 4: Draw the lifetime chart and determine the required number of registers.

    Step 5: Perform forward-backward register allocation.

    Step 6: Draw the folded architecture that uses the minimum number of registers.


    Folding Architecture Example


    Folded Architecture for Matrix Transpose Example


    Biquad Filter Example (1/4)

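    Step 1: Retiming

    Folding the original DFG is invalid because four folding equations are negative:

    $D_F(1 \to 2) = -3$, $D_F(6 \to 4) = -4$, $D_F(8 \to 4) = -3$, $D_F(7 \to 3) = -3$

    Retiming the DFG before folding removes these negative values.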


    Biquad Filter Example (2/4)

    Step 2: Folding Equations

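    $D_F(U \to V) = N w(e) - P_U + v - u$

    $D_F(1 \to 2) = 4(1) - 1 + 1 - 3 = 1$
    $D_F(1 \to 5) = 4(1) - 1 + 0 - 3 = 0$
    $D_F(1 \to 6) = 4(1) - 1 + 2 - 3 = 2$
    $D_F(1 \to 7) = 4(1) - 1 + 3 - 3 = 3$
    $D_F(1 \to 8) = 4(2) - 1 + 1 - 3 = 5$
    $D_F(3 \to 1) = 4(0) - 1 + 3 - 2 = 0$
    $D_F(4 \to 2) = 4(0) - 1 + 1 - 0 = 0$

    $D_F(5 \to 3) = 4(0) - 2 + 2 - 0 = 0$
    $D_F(6 \to 4) = 4(1) - 2 + 0 - 2 = 0$
    $D_F(7 \to 3) = 4(1) - 2 + 2 - 3 = 1$
    $D_F(8 \to 4) = 4(1) - 2 + 0 - 1 = 1$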
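    The eleven folding equations can be recomputed mechanically from the folding sets, the pipeline depths and the edge delays of the retimed DFG. In the sketch below the edge delays are the w(e) factors that appear in the equations above; the dictionary layout and function name are assumptions made for illustration.

```python
N = 4
S1, S2 = [4, 2, 3, 1], [5, 8, 6, 7]     # adder and multiplier folding sets
order = {op: pos for fs in (S1, S2) for pos, op in enumerate(fs)}
# P_U: 1-stage pipelined adder, 2-stage pipelined multiplier.
stages = {**{op: 1 for op in S1}, **{op: 2 for op in S2}}

# (U, V): w(e) in the retimed DFG.
retimed_delays = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2,
                  (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}


def folding_delay(u_node, v_node, w):
    """D_F(U -> V) = N*w(e) - P_U + v - u."""
    return N * w - stages[u_node] + order[v_node] - order[u_node]


D_F = {edge: folding_delay(*edge, w) for edge, w in retimed_delays.items()}
for (u_node, v_node), d in D_F.items():
    print(f"D_F({u_node}->{v_node}) = {d}")
```

    All values are non-negative, which is the condition for the retimed DFG to be foldable with these folding sets.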

    Step 3: Construct the lifetime table

    $T_{input} = u + P_U$

    $T_{output} = u + P_U + \max_V\{D_F(U \to V)\}$


    Biquad Filter Example (3/4)

    Step 4: Draw the Lifetime Chart

    The minimum number of registers is 2.

    Step 5: Register Allocation

    Folding Factor = 4
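    Steps 3 and 4 for the biquad can be cross-checked with a short script. The sketch below builds the lifetime table from $T_{input} = u + P_U$ and $T_{output} = u + P_U + \max_V\{D_F(U \to V)\}$, then counts live variables modulo N = 4. It assumes a variable is live from the time unit after it is produced through the time unit in which it is consumed, and it uses $D_F(6 \to 4) = 4(1) - 2 + 0 - 2 = 0$. The data layout is illustrative.

```python
from collections import Counter

N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}   # folding orders from S1 and S2
stages = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}   # P_U per node
D_F = {(1, 2): 1, (1, 5): 0, (1, 6): 2, (1, 7): 3, (1, 8): 5, (3, 1): 0,
       (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 1, (8, 4): 1}

# Step 3: lifetime table.
lifetimes = {}
for node in sorted(order):
    out_delays = [d for (u, _), d in D_F.items() if u == node]
    if out_delays:                       # node 2 drives no edge in the list: no entry
        t_in = order[node] + stages[node]
        lifetimes[node] = (t_in, t_in + max(out_delays))

# Step 4: circular lifetime count with period N.
live = Counter()
for t_in, t_out in lifetimes.values():
    for t in range(t_in + 1, t_out + 1):  # assumed convention: live in t_in+1 .. t_out
        live[t % N] += 1

print(lifetimes)
print("minimum registers:", max(live.values()))
```

    Only nodes 1, 7 and 8 have non-zero lifetimes; node 1 stays live for five time units, longer than the period, so two of its iterations overlap.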


    Biquad Filter Example (4/4)

    Step 6: Folded Architecture


    IIR Filter Example (1/4)

    Step 1: Retiming

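    Folding the original DFG is invalid because two folding equations are negative:

    $D_F(3 \to 1) = -3$, $D_F(4 \to 1) = -2$

    Retiming the DFG before folding removes these negative values.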


    IIR Filter Example (2/4)

    Step 2: Folding Equations

    $D_F(U \to V) = N w(e) - P_U + v - u$

    Evaluating the folding equations of the retimed DFG gives:

    $D_F(1 \to 2) = 0$
    $D_F(2 \to 3) = 5$
    $D_F(2 \to 4) = 2$
    $D_F(3 \to 1) = 1$
    $D_F(4 \to 1) = 0$

    Step 3: Construct the lifetime table

    $T_{input} = u + P_U$

    $T_{output} = u + P_U + \max_V\{D_F(U \to V)\}$


    IIR Filter Example (3/4)

    Step 4: Draw the Lifetime Chart

    The minimum number of registers is 3.

    Step 5: Register Allocation

    Folding Factor = 2


    IIR Filter Example (4/4)

    Step 6: Folded Architecture


    Conclusions

    Presented a systematic transformation for deriving time-multiplexed architectures.

    Explored folding techniques to reduce the number of functional units.

    Explored register minimization techniques to reduce the number of registers.


    References

    K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, 1999.

    S. Y. Huang, Handout of text book, 2004.