![Page 1: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/1.jpg)
L33: Low Power Reconfigurable System
Jun-Dong Cho, SungKyunKwan Univ.
Dept. of ECE, Vada Lab. http://vada.skku.ac.kr
![Page 2: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/2.jpg)
Answer IV: Reconfigurable Processor
• Configurable datapaths (e.g., splittable ALUs, complex operations)
• Configurable interconnect (e.g., nearest neighbor, k buses)
• SIMD processor, many functional units, preferably VLIW, possibly superscalar
![Page 3: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/3.jpg)
ULTRA-LOW-POWER DOMAIN-SPECIFIC MULTIMEDIA PROCESSORS
• Arthur Abnous and Jan Rabaey
• Programmability requires generalized computation, storage, and communication resources that can be used to implement different kinds of algorithms
• Domain-specific processors give up some of the flexibility of a general-purpose programmable device to achieve higher levels of energy efficiency, while remaining flexible enough to handle a variety of algorithms within their domain
![Page 4: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/4.jpg)
Flexibility vs. Energy-Efficiency
• There is a trade-off between efficiency and flexibility: programmable designs incur significant performance and power penalties compared to ASICs.
• Parallel signal processing algorithms can achieve significant power savings by executing the dominant computational kernels of a given class of applications with common features on dedicated, optimized processing elements with minimum energy overhead.
![Page 5: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/5.jpg)
Application Domains
CELP-Based Speech Coding
• LPC Analysis and Synthesis
• Codebook Search
• Lag Computation
DCT-Based Video Compression and Decompression
• DCT and Inverse DCT
• Motion Estimation and Compensation
• Huffman Coding and Decoding
Baseband Processing for Digital Radios
• Demodulation, Channel Equalization
• Timing Recovery, Error Correction
![Page 6: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/6.jpg)
The Re-configurable Terminal
![Page 7: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/7.jpg)
Low-Power Multimedia Processing
• Hybrid, Re-configurable Architecture
– application-specific, parallelism, pipelining
– locality, minimum control overhead, zero power when idle
• Task Scheduling and Miscellaneous Functions on Embedded Core Processor (low speed, minimum functionality)
• Standardized Communication Protocols reduce Design Cycle and enable High Level Support
• Use extensive set of low-power circuit techniques
– Reduced swing, variable voltages and frequency, self-timing, locally generated clocks
![Page 8: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/8.jpg)
Arithmetic Energy Profile: VSELP Speech Coder
Lag Computation + Basic Vector Filtering + Codebook Search = 76% of total time
![Page 9: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/9.jpg)
Hybrid Architecture Template
![Page 10: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/10.jpg)
The dominant, energy-intensive computational kernels of a given domain of algorithms are implemented as a set of independent, concurrent threads of computation on the satellite processors.
The Proposed Architecture, Arthur Abnous and Jan Rabaey, UC Berkeley
Energy-Efficiency + Domain-Specific Programmability
![Page 11: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/11.jpg)
Control Processor
• The main task of the control processor is to configure the satellite processors and the communication network, and to manage the overall control flow of a given signal processing algorithm
• It uses the available satellite processors and the re-configurable interconnect to compose, in hardware, the data flow graph corresponding to a given kernel of computation
![Page 12: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/12.jpg)
Overlay operation
• Control processor configures network and co-processors
• Co-processors operate in distributed “data-driven” mode
• At completion, control returns to the core processor for the next reconfiguration
![Page 13: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/13.jpg)
Satellite Processors
![Page 14: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/14.jpg)
Elements of Energy-Efficiency
![Page 15: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/15.jpg)
Multi-Processor Implementation
![Page 16: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/16.jpg)
Communication Network
![Page 17: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/17.jpg)
Distributed Data-Driven Control
Execution of a hardware module is triggered by the arrival of tokens. When there are no tokens to be processed at a given module, no switching activity occurs in that module.
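The token-driven behaviour described on this slide can be illustrated with a small software model. This is only a sketch: the `Module` class, the `run` loop, and the `firings` counter (used here as a stand-in for switching activity) are hypothetical names, not part of the Pleiades design. A module fires only when every one of its inputs holds a token, so a module that never receives tokens never does any work.

```python
from collections import deque


class Module:
    """A module that fires only when every input queue holds a token."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.inputs = [deque() for _ in range(n_inputs)]
        self.consumers = []  # (module, input port) pairs fed by this module
        self.results = []    # tokens produced when no consumer is attached
        self.firings = 0     # stand-in for switching activity

    def connect(self, consumer, port):
        self.consumers.append((consumer, port))

    def ready(self):
        # An empty deque is falsy, so this is True only if all inputs have tokens.
        return all(self.inputs)

    def fire(self):
        args = [queue.popleft() for queue in self.inputs]
        token = self.func(*args)
        self.firings += 1
        if not self.consumers:
            self.results.append(token)
        for consumer, port in self.consumers:
            consumer.inputs[port].append(token)


def run(modules):
    """Fire ready modules until no module has tokens left to process."""
    progress = True
    while progress:
        progress = False
        for module in modules:
            if module.ready():
                module.fire()
                progress = True


# A multiply stage feeding an accumulate stage, plus a module that gets no tokens.
running_sum = {"value": 0}


def accumulate(x):
    running_sum["value"] += x
    return running_sum["value"]


mul = Module("mul", lambda a, b: a * b, 2)
acc = Module("acc", accumulate, 1)
idle = Module("idle", lambda a: a, 1)
mul.connect(acc, 0)

for a, b in [(1, 2), (3, 4), (5, 6)]:
    mul.inputs[0].append(a)
    mul.inputs[1].append(b)

run([mul, acc, idle])
print(acc.results, mul.firings, acc.firings, idle.firings)
```

In this model the `idle` module is never fired because no token ever arrives at its input, which mirrors the claim that a module without tokens sees no switching activity.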
![Page 18: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/18.jpg)
Implementation of Handshaking
![Page 19: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/19.jpg)
Single-Wire, Two-Phase Asynchronous Handshaking Protocol
![Page 20: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/20.jpg)
Low Power Circuit Techniques
• Reduced swing interconnect (communication network, memories, programmable logic modules)
• On-chip dc-dc conversion + multiple supply voltages
• Locally synchronous, globally asynchronous
• Automatic power-down
• Optimized libraries (0.6 µm CMOS + Cadence/Synopsys design flow)
![Page 21: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/21.jpg)
Power-Variable Performance
![Page 22: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/22.jpg)
![Page 23: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/23.jpg)
Design Methodology
![Page 24: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/24.jpg)
Switching Activity Reduction
(a) Average activity in a multiplier as a function of the constant value
(b) Parallel and serial implementations of an adder tree
![Page 25: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/25.jpg)
VSELP Synthesis Filter Mapped onto Satellite Processors
![Page 26: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/26.jpg)
Mappings of VSELP Kernel
• The most energy-efficient implementation of a CELP-based speech algorithm
- dissipates 36 mW (Vdd = 1.8 V, 0.5 µm CMOS)
- requires 23.4 MOPS
• Proposed VSELP speech coder
- 0.6 µm CMOS
- dissipates under 5 mW
![Page 27: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/27.jpg)
Case Studies
• Voice coder for cellular
• Video decoder
• Baseband radio modem
• Security - encryption processor
![Page 28: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/28.jpg)
Architecture for vector dot product
[Figure: block diagram with a configuration bus (strobe, address, data), two memories, two address generators (AddGen), a MAC unit, a network of 6 buses, input ports IP1 and IP2, output port OP1, and the control signals Network Reset, Satellite Reset, Slow Mode, and AutoAck Mode]
• 0.6 µm CMOS process
• Supply voltage: 1.5 V
• Power estimation tool
– PowerMill
• 1 MAC, 2 SRAMs, 2 address generators, 2 external input ports, 1 external output port
• All data and address values are 16 bits wide.
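The datapath listed above (two memories, two address generators, one MAC) can be mimicked in software to show how a vector dot product streams through it. This is a behavioural sketch only: the function names, the unit-stride addressing, and the unbounded accumulator are assumptions made for illustration, and the real configuration interface is not modelled.

```python
def address_generator(start, stride, count):
    """Yield `count` addresses; a stand-in for one hardware address generator."""
    for i in range(count):
        yield start + i * stride


def to_signed16(value):
    """Wrap an integer to a 16-bit two's-complement value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def dot_product(mem_a, mem_b, length):
    """Stream two memories through a single MAC, one product per step."""
    accumulator = 0  # accumulator width is not modelled (assumption)
    gen_a = address_generator(0, 1, length)
    gen_b = address_generator(0, 1, length)
    for addr_a, addr_b in zip(gen_a, gen_b):
        # One multiply-accumulate per pair of 16-bit operands.
        accumulator += to_signed16(mem_a[addr_a]) * to_signed16(mem_b[addr_b])
    return accumulator


print(dot_product([1, 2, 3], [4, 5, 6], 3))
```

Each loop iteration corresponds to one operand fetched from each memory and one multiply-accumulate, which is why a single MAC with two address generators is enough for this kernel.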
![Page 29: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/29.jpg)
Result
• The most energy-efficient implementation of a CELP-based speech algorithm
- dissipates 36 mW (Vdd = 1.8 V, 0.5 µm CMOS)
- requires 23.4 MOPS
• Proposed VSELP speech coder
- 0.6 µm CMOS
- dissipates under 5 mW
![Page 30: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/30.jpg)
IIR Mapping
![Page 31: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/31.jpg)
IIR Comparison
![Page 32: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/32.jpg)
FFT Mapping
![Page 33: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/33.jpg)
FFT Comparison
![Page 34: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/34.jpg)
Result

FIR Results

| Processor | StrongARM | TMS320C2xx | TMS320LC54x | XC4003A | Pleiades |
|---|---|---|---|---|---|
| Frequency (MHz) | 169 | 20 | 40 | 6 | 14 |
| # of Multipliers | 0.5 | 1 | 1 | 5 | 1 |
| Throughput (cycle/tap) | 17 | 1 | 1 | 0.2 | 1 |
| Energy/tap (J) | 37.4n | 1.3n | 600p | 2.2n | 205p |

IIR Results

| Processor | StrongARM | TMS320C2xx | TMS320LC54x | XC4003A | Pleiades |
|---|---|---|---|---|---|
| Frequency (MHz) | 169 | 20 | 40 | 2.1 | 14 |
| # of Multipliers | 0.5 | 1 | 1 | 9 | 2 |
| Throughput (cycle/IIR) | 114 | 20 | 13 | 1 | 8 |
| Energy/IIR (J) | 277n | 19.1n | 9.5n | 103n | 1.9n |

FFT Results

| Processor | StrongARM | TMS320C2xx | TMS320LC54x | XC4003A | Pleiades |
|---|---|---|---|---|---|
| Frequency (MHz) | 169 | 20 | 40 | - | 14 |
| # of Multipliers | 0.5 | 1 | 1 | - | 4 |
| Throughput (cycle/stage) | 766 | 152 | 76 | - | 8 |
| Energy/stage (J) | 1870n | 131n | 49.3n | - | 13.3n |

StrongARM: microprocessor [2]
TMS320C2xx: DSP chip [3,4,5,6]
TMS320LC54x: DSP chip [7,8,12]
XC4003A: FPGA chip [9,10]
![Page 35: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/35.jpg)
Conclusions
• The StrongARM has the worst performance of all because it takes many instructions and cycles to execute a kernel in a highly sequential manner.
– The lack of a single-cycle multiplier exacerbates this problem.
– The other architectures have more internal parallelism, which allows them to achieve superior performance.
• Pleiades (architecture for vector dot product) does much better on the energy scale than the TI DSPs.
– DSPs are general-purpose, and their instruction execution involves a great deal of overhead.
– Pleiades can create dedicated hardware structures tuned to the task at hand and executes operations with a small energy overhead.
• Pleiades outperforms the other processors by a large margin owing to its ability to exploit higher levels of parallelism by creating a dedicated parallel structure from its computational resources and flexible interconnect.
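To put numbers on the energy claims above, the energy figures from the FIR, IIR, and FFT result tables can be divided by the Pleiades figure for the same kernel. The sketch below only does that arithmetic; the dictionary layout and function name are illustrative choices, and the values are copied from the result slide.

```python
# Energy per operation in joules, copied from the FIR, IIR, and FFT result tables.
# None marks the entry the deck leaves blank (no FFT result for the XC4003A).
ENERGY_J = {
    "FIR tap": {
        "StrongARM": 37.4e-9, "TMS320C2xx": 1.3e-9, "TMS320LC54x": 600e-12,
        "XC4003A": 2.2e-9, "Pleiades": 205e-12,
    },
    "IIR": {
        "StrongARM": 277e-9, "TMS320C2xx": 19.1e-9, "TMS320LC54x": 9.5e-9,
        "XC4003A": 103e-9, "Pleiades": 1.9e-9,
    },
    "FFT stage": {
        "StrongARM": 1870e-9, "TMS320C2xx": 131e-9, "TMS320LC54x": 49.3e-9,
        "XC4003A": None, "Pleiades": 13.3e-9,
    },
}


def energy_ratio_to_pleiades(row):
    """Return each processor's energy divided by the Pleiades energy."""
    base = row["Pleiades"]
    return {
        name: energy / base
        for name, energy in row.items()
        if name != "Pleiades" and energy is not None
    }


for kernel, row in ENERGY_J.items():
    ratios = energy_ratio_to_pleiades(row)
    summary = ", ".join(f"{name}: {ratio:.1f}x" for name, ratio in ratios.items())
    print(f"{kernel}: {summary}")
```

With these figures the two TI DSPs come out at roughly 3 to 10 times the Pleiades energy per operation, depending on the kernel.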
![Page 36: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/36.jpg)
Reconfiguration for Power Saving in Real-Time Motion Estimation, S. R. Park, UMASS
![Page 37: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/37.jpg)
Motion Estimation
![Page 38: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/38.jpg)
Block Matching Algorithm
![Page 39: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/39.jpg)
Configurable H/W Paradigms
![Page 40: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/40.jpg)
Programmable Logic Modules
![Page 41: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/41.jpg)
Why Hardware for Motion Estimation?
• Most computationally demanding part of video encoding
• Example: CCIR 601 format
– 720 by 576 pixels
– 16 by 16 macro block (n = 16)
– 32 by 32 search area (p = 8)
– 25 Hz frame rate (f_frame = 25)
• About 9 giga-operations per second are needed for the Full Search Block Matching Algorithm.
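The 9 giga-operations figure can be reproduced with a short calculation. The sketch assumes (2p + 1)² candidate positions per macro block and three operations per pixel comparison (subtract, absolute value, accumulate); both are assumptions chosen because they match the stated figure, not values given on the slide.

```python
def full_search_ops_per_second(width, height, p, frame_rate, ops_per_pixel=3):
    """Operations per second for full-search block matching.

    Every pixel of the frame belongs to exactly one macro block, and each
    macro block is compared against (2p + 1)^2 candidate positions, so the
    macro block size n cancels out of the total.
    """
    candidates = (2 * p + 1) ** 2
    return width * height * candidates * ops_per_pixel * frame_rate


ops = full_search_ops_per_second(width=720, height=576, p=8, frame_rate=25)
print(f"{ops / 1e9:.2f} GOPS")
```

Because the candidate count grows as (2p + 1)², halving p to 4 cuts the operation count by a factor of 289 / 81, which is the lever the following slides use.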
![Page 42: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/42.jpg)
Why Reconfiguration in Motion Estimation?
• Adjusting the search area at frame-rate according to the changing characteristics of video sequences
• Reducing Power Consumption by avoiding unnecessary computation
Motion Vector Distributions
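A minimal full-search block matching routine makes the effect of the search range concrete. This is a software sketch with hypothetical names (`sad`, `full_search`) and a synthetic test frame; it is not the hardware architecture from the slides. It uses the sum of absolute differences as the matching cost and reports how many candidates were evaluated for a given p.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block and a displaced reference block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Return (motion vector, SAD, candidates evaluated) for search range p."""
    height, width = len(ref), len(ref[0])
    best = None
    evaluated = 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= height
                      and 0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue  # skip candidates that fall outside the frame
            evaluated += 1
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best[0], best[1], evaluated


# Synthetic 16x16 frames: the current frame is the reference shifted by (1, -1).
SIZE = 16
ref = [[(7 * x + 13 * y) % 31 for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[(y - 1) % SIZE][(x + 1) % SIZE] for x in range(SIZE)] for y in range(SIZE)]

wide = full_search(cur, ref, bx=6, by=6, n=4, p=2)
narrow = full_search(cur, ref, bx=6, by=6, n=4, p=1)
print(wide, narrow)
```

When the true motion is small, the narrower search finds the same vector while evaluating far fewer candidates, which is the computation the reconfigurable design aims to avoid.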
![Page 43: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/43.jpg)
Architecture for Motion Estimation
From P. Pirsch et al., VLSI Architectures for Video Compression, Proc. of IEEE, 1995
![Page 44: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/44.jpg)
Re-configurable Architecture for ME
![Page 45: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/45.jpg)
Power Estimation in Reconfigurable Architecture
![Page 46: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/46.jpg)
Power vs Search area
![Page 47: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/47.jpg)
Resource Reuse in FPGAs
![Page 48: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/48.jpg)
Conclusion
• By adjusting the search area according to the changing characteristics of a picture, power can be saved. Further power saving can be achieved by utilizing freed-up resources for local memory
• Extension of Adaptive Search Space Method to Software Implementation
– Varying p still reduces computation and hence power
– Resource reuse may also be applicable in S/W implementation by freeing up cache space and compute power for more power-efficient use of memory
![Page 49: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/49.jpg)
Future Works
• Reconfiguration to support more sophisticated motion estimation algorithms (intelligent search, object-based, ...)
• More detailed performance studies over a wider range of video sequences
• Generalization of this concept to other algorithms and architectures (not just video)
• Modification to FPGA architectures to support the use of logic and configuration cells as local memory
![Page 50: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/50.jpg)
Motion Estimation - Conventional
![Page 51: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/51.jpg)
Motion Estimation - Data Reuse
[Equation: P_a and P_b expressed in terms of P_add and P_abs]
Therefore, the power reduction factor is 11%.
![Page 52: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/52.jpg)
Kernel Scheduling in Reconfigurable Computing
• R. Maestre, F. J. Kurdahi, N. Bagherzadeh, H. Singh, R. Hermida, M. Fernandez, Design, Automation and Test in Europe, DATE99, Munich, Germany, Mar 99
The Partition
Partitioning finds subsets of kernels that may be scheduled (executed) independently of other kernels.
Partitioning of the application DFG
The Scheduling
After partitioning, scheduling is performed in detail within a given partition.
Scheduling within a given partition
![Page 53: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/53.jpg)
The Major Criteria
[Figure: a) MPEG sequence and granularity, b) a possible schedule of an image frame, c) an alternative schedule. The kernel sequence is ME, MC, DCT, Q, IQ, IDCT, IMC; the figure annotates each kernel with its granularity of computation in blocks (396 for a frame) and its number of contexts.]
• Context reloading
– Minimize
• Data reuse
– Maximize
• Computation and data movement overlapping
– Maximize
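The tension between the first two criteria can be seen in a toy model. The sketch below is hypothetical and is not the scheduling algorithm of the cited paper: it assumes only one kernel's contexts are resident at a time, interprets the 396 in the figure as macro blocks per frame, and measures data reuse pressure simply as the largest batch handed from one kernel to the next.

```python
KERNELS = ["ME", "MC", "DCT", "Q", "IQ", "IDCT", "IMC"]
MACROBLOCKS_PER_FRAME = 396  # assumption: the 396 in the figure counts macro blocks


def build_schedule(batch):
    """Run every kernel on `batch` macro blocks before moving to the next batch."""
    schedule = []
    for start in range(0, MACROBLOCKS_PER_FRAME, batch):
        size = min(batch, MACROBLOCKS_PER_FRAME - start)
        for kernel in KERNELS:
            schedule.append((kernel, size))
    return schedule


def context_loads(schedule):
    """Count kernel switches, assuming one kernel's contexts are resident at a time."""
    loads = 0
    resident = None
    for kernel, _ in schedule:
        if kernel != resident:
            loads += 1
            resident = kernel
    return loads


def peak_buffered_macroblocks(schedule):
    """Largest batch of intermediate results passed between consecutive kernels."""
    return max(size for _, size in schedule)


for batch in (1, 6, MACROBLOCKS_PER_FRAME):
    schedule = build_schedule(batch)
    print(batch, context_loads(schedule), peak_buffered_macroblocks(schedule))
```

Small batches keep intermediate data small but reload contexts constantly, while frame-sized batches load each context once at the cost of buffering a whole frame between kernels; a scheduler has to balance the two.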
![Page 54: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/54.jpg)
Scheduling
[Figure: execution model representation. A possible schedule for the partition { k1, k2, k3 }, drawn over time on three rows: CM, FB set 1, and FB set 2]
• α^i = event α in iteration i
• k_i = computation time
• kc_i = possible overlap of computation and context loading
• C_i = context loading time
• D_i = data loading time
• R_i = result reading time
• Idle time
![Page 55: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/55.jpg)
Algorithm
[Figure: some steps of an exploration sequence. Each panel shows the kernels K_i, K_j, K_m, K_p and the remaining numbered items (1 to 4), with BC = TRUE. Successive panels: LEE = ∅, { 1 }, { 1, 2 }, { 1, 3 }, { 1, 3, 4 }, { 1, 4 }]
![Page 56: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/56.jpg)
References
[1] A. Abnous and J. Rabaey, “Ultra-Low-Power Domain-Specific Multimedia Processors”, Proceedings of the IEEE VLSI Signal Processing Workshop, San Francisco, Oct. 1996.
[2] Digital Semiconductor, Digital Semiconductor SA-110 Microprocessor Technical Reference Manual, Digital Equipment Corporation, 1996.
[3] TMS320C5x General-Purpose Application User’s Guide, Literature Number SPRU164, TI, 1997.
[4] T. Anderson, The TMS320C2xx Sum-of-Products Methodology, Technical Application Report SPRA068, TI, 1996.
[5] M. Tsai, IIR Filter Design on the TMS320C54x DSP, Technical Application Report SPRA079, TI, 1996.
[6] Ftp://ftp.ti.com/pub/tms320bbs/c5xxfiles/54xffts.exe, ‘C54x Software Support Files, TI.
[7] C. Turner, Calculation of TMS320LS54x Power Dissipation, Technical Application Report SPRA164, TI, 1997.
[8] C. Turner, Calculation of TMS320LS54x Power Dissipation, Technical Application Report SPRA088, TI, 1996.
[9] E. Kusse, Personal communication, 1996.
![Page 57: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/57.jpg)
References
[10] J. Rabaey et al., “Fast Prototyping of Data Path Intensive Architecture”, IEEE Design & Test Magazine, Vol. 8, No. 2, pp. 40-51, 1991.
[11] J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor”, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1703-1714, Nov. 1996.
[12] A. Fischman and P. Rowland, Designing Low-Power Applications with TMS320LC54x, Technical Application Report SPRA281, TI, 1997.
[13] Daniel D. Gajski, Nikil D. Dutt, Allen C-H Wu, Steve Y-L Lin, “High-Level Synthesis: Introduction to Chip and System Design”, Kluwer Academic Publishers, 1992.
[14] Duncan A. Buell, Jeffrey M. Arnold, Walter J. Kleinfelder, “Splash 2: FPGAs in a Custom Computing Machine”, IEEE Computer Society Press, Los Alamitos, California.
[15] Jonathan Babb, Russell Tessier, Mathew Dahl, Silvina Zimi Hanono, David M. Hoki, and Anant Agarwal, “Logic Emulation with Virtual Wires”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 16, No. 6, June 1997.
[16] M. Vasilko, Djamel Ait-Boudaoud, “Architectural Synthesis Techniques for Dynamically Reconfigurable Logic”, Field Programmable Logic: Smart Applications, New Paradigms and Compilers, Proceedings of 6th Int. Workshop on Field Programmable Logic and Applications, FPL 96, Darmstadt, Germany, Sept. 23-25, 1996.
![Page 58: L33:Low Power Reconfigurable system Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab](https://reader036.vdocuments.mx/reader036/viewer/2022062718/56649e845503460f94b85793/html5/thumbnails/58.jpg)
References
[17] Patrick Lysaght, Gordon McGregor and Jonathan Stockwood, “Configuration Controller Synthesis for Dynamically Reconfigurable Systems”, IEE Colloquium on Hardware Software Cosynthesis for Reconfigurable Systems, 1996.
[18] M. Vasilko, Djamel Ait-Boudaoud, “Scheduling for Dynamically Reconfigurable FPGAs”, Proceedings of International Workshop on Logic and Architecture Synthesis, pp. 328-336, IFIP TC10 WG10.5, Dec. 18-19, 1995.
[19] Doug Smith, Dinesh Bhatia, “RACE: Reconfigurable and Adaptive Computing Environment”, Field Programmable Logic: Smart Applications, New Paradigms and Compilers, Proceedings of 6th Int. Workshop on Field Programmable Logic and Applications, FPL 96, Darmstadt, Germany, Sept. 23-25, 1996. See http://www.ececs.uc.edu/~dal.
[20] Xilinx Netlist Format (XNF) Specification, Version 6.1, June 1, 1995.
[21] Xilinx XABEL reference manual.