7/30/2019 VLSIDSP_CHAP6
VLSI Digital Signal Processing Systems
Folding
Lan-Da Van, Ph.D.
Department of Computer Science
National Chiao Tung University
Taiwan, R.O.C.
Fall 2010
http://www.cs.nctu.tw/~ldvan/
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
Introduction (1/2)
The folding transformation systematically determines the control circuits in DSP architectures in which multiple algorithm operations are time-multiplexed onto a single functional unit.
It is used to synthesize DSP architectures that can be operated with a single clock or with multiple clocks.
It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.
Folding can lead to an architecture that uses a large number of registers, which motivates the register minimization techniques presented afterwards.
Folding Transformation (1/3)
A systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.
Notation
$U$, $V$: nodes (operations) of the original DFG
$H_U$, $H_V$: nodes (functional units) of the folded DFG
$W(x)$: the x-th iteration of node $W$
$U \xrightarrow{e} V$: an edge $e$ from node $U$ to node $V$
$w(e)$: number of delays on edge $e$
Folding factor $N$: number of operations that share one FU
Folding set: an ordered set of operations that are executed by the same FU
The position of an operation $U$ in its folding set is the folding order of $U$.
The folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B).
The folding sets represent the underlying folding transformation.
Folding Transformation (2/3)
$P_U$: number of pipeline stages of $H_U$. $P_U = 0$ indicates that $H_U$ is not pipelined.
$D_F(U \xrightarrow{e} V)$ (folding equation): number of cycles for which the result of $H_U$ must be stored, where $u$ and $v$ are the folding orders of $U$ and $V$:
$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$$
A negative value of $D_F$ is possible before the DFG is retimed for folding.
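The folding equation can be evaluated mechanically. The following is a minimal sketch; the function name `folding_delay` and its argument names are illustrative and do not come from the slides. The sample values are those of edge 1 → 2 of the biquad filter before retiming, which these slides use as an example.

```python
def folding_delay(N, w, P_U, u, v):
    """Return D_F = N*w(e) - P_U + v - u for one edge U -> V.

    N   : folding factor
    w   : number of delays w(e) on the edge in the original DFG
    P_U : number of pipeline stages of the functional unit executing U
    u, v: folding orders of U and V
    """
    return N * w - P_U + v - u


# Edge 1 -> 2 of the biquad filter before retiming:
# N = 4, w(e) = 0, P_U = 1 (adder), u = 3, v = 1.
print(folding_delay(N=4, w=0, P_U=1, u=3, v=1))  # -3
```

A negative result, as here, signals that the DFG has to be retimed before this folding set can be used.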
Folding Transformation (3/3)
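Figure: an edge $U \xrightarrow{e} V$ with $w(e)$ delays in the original DFG, where $U(l)$ feeds $V(l + w(e))$, and the corresponding N-folded edge, where $H_U$ executes at time unit $Nl + u$, $H_V$ executes at time unit $N(l + w(e)) + v$, and the path from $H_U$ to $H_V$ holds $P_U$ pipeline stages followed by $D_F$ delays.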
Folding Retimed Biquad Filter (1/2)
Folding factor N = 4
Folding sets $S_1 = \{4, 2, 3, 1\}$ and $S_2 = \{5, 8, 6, 7\}$, where $S_1$ contains all addition operations and $S_2$ contains all multiplication operations.
Assume that addition and multiplication require 1 and 2 u.t., respectively.
1-stage pipelined adders and 2-stage pipelined multipliers are available.
Folding Retimed Biquad Filter (2/2)
folding equations
Retiming (1/3)
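What goes wrong if a folding equation $D_F$ is negative? A negative number of delays cannot be implemented.
Solution: retime the original DFG (move its delay elements) prior to folding.
Constraint: $D_F'(U \to V) = N w_r(e) - P_U + v - u \ge 0$ (1), where $D_F'$ is the folding equation of the retimed DFG.
Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives $D_F(U \to V) + N(r(V) - r(U)) \ge 0$, and since the retiming values are integers:
$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \to V)}{N} \right\rfloor$$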
Retiming (2/3)
Example: $D_F(1 \to 2) = N w(e) - P_U + v - u = 0 - 1 + 1 - 3 = -3$
$r(1) - r(2) \le \lfloor -3/4 \rfloor = -1$
Retiming (3/3)
Retiming solution:
$r(1) = -1$, $r(2) = 0$, $r(3) = -1$, $r(4) = 0$
$r(5) = -1$, $r(6) = -1$, $r(7) = -2$, $r(8) = -1$
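As a sanity check, the retiming values above can be tested against the constraint $r(U) - r(V) \le \lfloor D_F(U \to V)/N \rfloor$ for the four biquad edges that these slides list as invalid before retiming. This is only a sketch: the dictionary layout and the names `negative_df` and `retiming_bound` are illustrative choices, not part of the original material.

```python
N = 4  # folding factor of the biquad example

# Folding equations that are negative before retiming (edge (U, V) -> D_F).
negative_df = {(1, 2): -3, (6, 4): -4, (8, 4): -3, (7, 3): -3}

# Retiming values r(1) .. r(8) from the slide.
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}


def retiming_bound(df, n):
    """floor(D_F / N); Python's // already rounds toward minus infinity."""
    return df // n


satisfied = {}
retimed_df = {}
for (u, v), df in negative_df.items():
    satisfied[(u, v)] = r[u] - r[v] <= retiming_bound(df, N)
    # Retiming changes w(e) by r(V) - r(U), so D_F changes by N * (r(V) - r(U)).
    retimed_df[(u, v)] = df + N * (r[v] - r[u])

print(satisfied)
print(retimed_df)
```

Every constraint holds, and each retimed folding equation is non-negative.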
Lifetime Analysis
Lifetime analysis is a procedure used to compute the minimum number of registers required to implement a DSP algorithm in hardware.
Linear lifetime analysis
Circular lifetime analysis
In lifetime analysis, the number of live variables at each time unit is computed, and the maximum number of live variables at any time unit is determined. This maximum is the minimum number of registers.
Forward-backward register allocation technique
Linear Lifetime Analysis
Variables: {a, b, c}
Maximum number of live variables: max{0, 1, 2, 2, 2, 2, 2, 2} = 2
Figure: linear lifetime chart showing three iterations with period N = 6; the periodicity is implicit in the chart.
Matrix Transpose Example (1/3)
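Input matrix (rows): (a b c), (d e f), (g h i)
Transposed matrix (rows): (a d g), (b e h), (c f i)
Input sequence to the matrix transposer (a arrives first): a, b, c, d, e, f, g, h, i
Output sequence of the matrix transposer (a leaves first): a, d, g, b, e, h, c, f, i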
Matrix Transpose Example (2/3)
$T_{zlout}$ = zero-latency output time
$T_{diff} = T_{zlout} - T_{input}$
$T_{output} = T_{zlout} + \max\{-T_{diff}\}$
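The three definitions above can be turned into a lifetime table for the 3×3 transposer. The sketch below assumes that sample a arrives at time 0 and that one sample arrives (and, in the zero-latency schedule, leaves) per time unit; the slides do not state these timings in the surviving text, so treat them as assumptions. All variable names are illustrative.

```python
inputs = list("abcdefghi")    # arrival order (row by row)
outputs = list("adgbehcfi")   # departure order of the transposed matrix

# Assumption: sample k of each sequence is handled at time unit k.
t_input = {x: t for t, x in enumerate(inputs)}
t_zlout = {x: t for t, x in enumerate(outputs)}

t_diff = {x: t_zlout[x] - t_input[x] for x in inputs}
latency = max(-d for d in t_diff.values())   # max{-T_diff}
t_output = {x: t_zlout[x] + latency for x in inputs}

print("latency:", latency)
for x in inputs:
    print(x, t_input[x], t_zlout[x], t_diff[x], t_output[x],
          f"{t_input[x]}->{t_output[x]}")
```

Under these assumptions the added latency is 4 time units, and sample g has a zero-length lifetime (6->6), so it never needs a register.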
Matrix Transpose Example (3/3)
The minimum number of registers is 4.
Figures: linear lifetime chart and circular lifetime chart.
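The count of 4 can be reproduced with a small circular lifetime analysis. This sketch assumes that a variable with lifetime $T_{input} \to T_{output}$ occupies a register during cycles $T_{input}+1$ through $T_{output}$, and that the schedule repeats with period N = 9. The lifetimes are those obtained from the $T_{output}$ formula with inputs arriving at times 0 to 8; the function name `min_registers` is illustrative.

```python
def min_registers(lifetimes, period):
    """Maximum number of simultaneously live variables in a periodic schedule.

    lifetimes: iterable of (t_in, t_out) pairs
    period   : iteration period N; cycles are folded modulo N so that
               lifetimes overlapping the next iteration are counted too
    """
    live = [0] * period
    for t_in, t_out in lifetimes:
        # Assumption: the variable is live in cycles t_in + 1 .. t_out.
        for cycle in range(t_in + 1, t_out + 1):
            live[cycle % period] += 1
    return max(live)


# Lifetimes of a .. i for the 3x3 matrix transposer.
transposer = [(0, 4), (1, 7), (2, 10), (3, 5), (4, 8),
              (5, 11), (6, 6), (7, 9), (8, 12)]
print(min_registers(transposer, period=9))  # 4
```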
Procedure of Forward-Backward Register Allocation
Steps:
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.
Step 3: Allocate each variable in a forward manner until it is dead or it reaches the last register.
Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations. Thus, the register positions occupied in the current iteration are marked (hashed) as occupied at a period of N.
Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the allocation is complete.
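The steps above can be sketched in code. This is a simplified illustration, not the exact procedure from the textbook. It assumes that at most one variable starts its lifetime per cycle, that a variable occupies a register in cycles $T_{input}+1$ through $T_{output}$, and that a backward allocation simply takes the lowest-numbered free register. Registers are numbered from 0, and all names are illustrative.

```python
def forward_backward_allocate(lifetimes, num_regs, period):
    """Return {name: [(cycle, register), ...]} for a chain R0 -> R1 -> ...

    lifetimes: {name: (t_in, t_out)}
    num_regs : number of registers from lifetime analysis (Step 1)
    period   : iteration period N
    """
    occupied = {}  # (cycle mod period, register) -> name (Step 4 bookkeeping)
    schedule = {name: [] for name in lifetimes}

    def place(name, cycle, reg):
        key = (cycle % period, reg)
        if key in occupied:
            raise ValueError(f"register {reg} is busy in cycle {cycle}")
        occupied[key] = name
        schedule[name].append((cycle, reg))

    # Steps 2-3: enter at R0, then shift forward until dead or past the end.
    pending = []
    for name, (t_in, t_out) in sorted(lifetimes.items(),
                                      key=lambda kv: kv[1][0]):
        cycle, reg = t_in + 1, 0
        while cycle <= t_out and reg < num_regs:
            place(name, cycle, reg)
            cycle, reg = cycle + 1, reg + 1
        if cycle <= t_out:
            pending.append((cycle, name))  # still alive after the last register

    # Steps 5-6: move such variables backward, then continue forward.
    for cycle, name in sorted(pending):
        t_out = lifetimes[name][1]
        reg = num_regs - 1
        while cycle <= t_out:
            nxt = reg + 1
            if nxt >= num_regs or (cycle % period, nxt) in occupied:
                free = [r for r in range(num_regs)
                        if (cycle % period, r) not in occupied]
                if not free:
                    raise ValueError(f"no free register for {name}")
                nxt = free[0]  # assumption: lowest free register
            place(name, cycle, nxt)
            cycle, reg = cycle + 1, nxt
    return schedule


# 3x3 matrix transposer: lifetimes of a .. i, 4 registers, period 9.
transposer = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5),
              "e": (4, 8), "f": (5, 11), "g": (6, 6), "h": (7, 9),
              "i": (8, 12)}
for name, slots in forward_backward_allocate(transposer, 4, 9).items():
    print(name, slots)
```

With this data, variables b, c and f are the ones that reach the last register while still alive and are therefore moved backward. The greedy choice of backward register is adequate here but is not guaranteed to succeed for every lifetime chart.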
Register Allocation for Matrix Transpose Example
Procedure of Register Minimization in Folded Architectures
Steps:
Step 1: Perform retiming for folding.
Step 2: Write the folding equations.
Step 3: Use the folding equations to construct a lifetime table.
Step 4: Draw the lifetime chart and determine the required number of registers.
Step 5: Perform forward-backward register allocation.
Step 6: Draw the folded architecture that uses the minimum number of registers.
Folding Architecture Example
Folded Architecture for Matrix Transpose Example
Biquad Filter Example (1/4)
Step 1: Retiming
Invalid folding before retiming:
$D_F(1 \to 2) = -3$
$D_F(6 \to 4) = -4$
$D_F(8 \to 4) = -3$
$D_F(7 \to 3) = -3$
Biquad Filter Example (2/4)
Step 2: Folding Equations
$D_F(U \to V) = N w(e) - P_U + v - u$
$D_F(1 \to 2) = 4(1) - 1 + 1 - 3 = 1$
$D_F(1 \to 5) = 4(1) - 1 + 0 - 3 = 0$
$D_F(1 \to 6) = 4(1) - 1 + 2 - 3 = 2$
$D_F(1 \to 7) = 4(1) - 1 + 3 - 3 = 3$
$D_F(1 \to 8) = 4(2) - 1 + 1 - 3 = 5$
$D_F(3 \to 1) = 4(0) - 1 + 3 - 2 = 0$
$D_F(4 \to 2) = 4(0) - 1 + 1 - 0 = 0$
$D_F(5 \to 3) = 4(0) - 2 + 2 - 0 = 0$
$D_F(6 \to 4) = 4(1) - 2 + 0 - 2 = 0$
$D_F(7 \to 3) = 4(1) - 2 + 2 - 3 = 1$
$D_F(8 \to 4) = 4(1) - 2 + 0 - 1 = 1$
Step 3: Construct the lifetime table
$T_{input} = u + P_U$
$T_{output} = u + P_U + \max_V \{D_F(U \to V)\}$
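Steps 2 and 3 for the biquad filter can be reproduced programmatically. In this sketch the edge list (with the delays of the retimed DFG) is read off the $N w(e)$ terms of the folding equations above. The folding orders come from the folding sets {4, 2, 3, 1} and {5, 8, 6, 7}, with the first position taken as order 0. The data layout and names are illustrative assumptions.

```python
N = 4  # folding factor

# Folding sets: position in the list = folding order (0 .. N-1).
folding_sets = {"adder": [4, 2, 3, 1], "multiplier": [5, 8, 6, 7]}
pipeline_stages = {"adder": 1, "multiplier": 2}

# (U, V, w(e)) for the retimed biquad DFG.
edges = [(1, 2, 1), (1, 5, 1), (1, 6, 1), (1, 7, 1), (1, 8, 2),
         (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 1), (7, 3, 1), (8, 4, 1)]

order, stages = {}, {}
for unit, ops in folding_sets.items():
    for position, node in enumerate(ops):
        order[node] = position
        stages[node] = pipeline_stages[unit]

# Step 2: D_F(U -> V) = N*w(e) - P_U + v - u
df = {(u, v): N * w - stages[u] + order[v] - order[u] for u, v, w in edges}

# Step 3: T_input = u + P_U, T_output = T_input + max_V D_F(U -> V)
lifetime = {}
for node in sorted(order):
    outgoing = [d for (u, _), d in df.items() if u == node]
    if outgoing:  # nodes without outgoing edges in the list get no entry
        t_in = order[node] + stages[node]
        lifetime[node] = (t_in, t_in + max(outgoing))

for edge, d in df.items():
    print(edge, d)
for node, (t_in, t_out) in lifetime.items():
    print(f"node {node}: {t_in}->{t_out}")
```

Node 1 receives the longest lifetime (4->9) because its result is needed by five different operations, the last of them 5 cycles after it is produced.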
Biquad Filter Example (3/4)
Step 4: Draw the Lifetime Chart
The minimum number of registers is 2.
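The register count for the biquad filter can be checked by counting live variables per time slot. The lifetimes below follow from $T_{input} = u + P_U$ and $T_{output} = T_{input} + \max_V D_F(U \to V)$ applied to the folding equations of this example. The convention that a variable is live in cycles $T_{input}+1$ through $T_{output}$ is an assumption of this sketch, and the function name `live_counts` is illustrative.

```python
def live_counts(lifetimes, period):
    """Number of live variables in each time slot (cycle mod period)."""
    counts = [0] * period
    for t_in, t_out in lifetimes:
        # Assumption: live in cycles t_in + 1 .. t_out, folded modulo the period.
        for cycle in range(t_in + 1, t_out + 1):
            counts[cycle % period] += 1
    return counts


# node -> (T_input, T_output) for the folded biquad filter, N = 4.
biquad = {1: (4, 9), 3: (3, 3), 4: (1, 1), 5: (2, 2),
          6: (4, 4), 7: (5, 6), 8: (3, 4)}

counts = live_counts(biquad.values(), period=4)
print(counts)       # live variables in slots 0, 1, 2, 3
print(max(counts))  # minimum number of registers
```

Only nodes 1, 7 and 8 have non-zero lifetimes, so they are the only variables that need registers.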
Step 5: Register Allocation
Folding Factor = 4
Biquad Filter Example (4/4)
Step 6: Folded Architecture
IIR Filter Example (1/4)
Step 1: Retiming
Invalid folding before retiming:
$D_F(3 \to 1) = -3$
$D_F(4 \to 1) = -2$
IIR Filter Example (2/4)
Step 2: Folding Equations
$D_F(U \to V) = N w(e) - P_U + v - u$
With folding factor $N = 2$, the folding equations of the retimed DFG evaluate to:
$D_F(1 \to 2) = 0$
$D_F(2 \to 3) = 5$
$D_F(2 \to 4) = 2$
$D_F(3 \to 1) = 1$
$D_F(4 \to 1) = 0$
Step 3: Construct the lifetime table
$T_{input} = u + P_U$
$T_{output} = u + P_U + \max_V \{D_F(U \to V)\}$
IIR Filter Example (3/4)
Step 4: Draw the Lifetime Chart
The minimum number of registers is 3.
Step 5: Register Allocation
Folding Factor = 2
IIR Filter Example (4/4)
Step 6: Folded Architecture
Conclusions
Presented a systematic transformation for designing time-multiplexed architectures.
Explored folding techniques to reduce the number of functional units.
Explored register minimization techniques to reduce the number of registers.
References
K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, 1999.
S. Y. Huang, Handout of text book, 2004.