Algorithm Analysis and Mapping Environment for Adaptive Computing Systems
Eric Pauer, Cory Myers, Ken Smith, and Paul Fiore
{pauer,cory,jmsmith,pfiore}@sanders.com
Sanders, a Lockheed Martin Company
Nashua, NH 03061
Adaptive Computing System Design and Implementation using the ACS Domain
Third Bi-Annual Ptolemy Miniconference - 1999
PUBS-98-M21_001-P2/12/99 2
Reconfigurable computing technology offers leap-ahead performance (e.g., 10X ops per watt and/or ops per cubic inch) over general-purpose programmable solutions without the need to develop custom hardware. However, today, generating a working implementation requires hardware design expertise, and generating a good implementation requires many slow iterations between an algorithm designer and a hardware developer.
Statement of the Problem
[Figure: block diagram of an example design, annotated with bit widths, latency (L), and cells used (C).
Blocks: FilterBank, MeanCalc, ThreshCalc, ExceedCalc, Filter Select, Y Calc, Sigma Calc, FBINDelayRAM, MNINDelayRam, ThreshDelayRam.
Signals (bit width): IN(12), Y(13), FBINO(12), MNINO(12), FBOUT(20), MEAN(19), SIGMA(20), THRES(19), THDL(19), FSDLY(2), FLSEL(2), EXCEED(1).
Block annotations as extracted: L = 8s, 6f / C = 355; L = 545fb, 30fs / C = 3880; L = 1037 / C = 993; L = 25 / C = 0; L = 7m, 7fs, 2fo, 1t / C = 288; L = 286 / C = 0; L = 1037 / C = 1002; L = 783 / C = 0; L = 259 / C = 287.]
Greater than 10X performance
Design time measured in weeks
Analysis: Bit Widths, Latency (L), Cells Used (C)
Adaptive Computing Performance Gain

                      CHAMP            TMS 320C80
Image Size            256 x 256        256 x 256
Implementation Time   44 Days          28 Days
Frame Rate            305 frames/sec   12 frames/sec
Latency               68 µsec          82,000 µsec
Processing Load       4.7 Bops         0.2 Bops
Utilization           73%              Unknown
Gates                 510k             N/A
(Operation count increases by 70% if memory loads and stores are counted)
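The "greater than 10X" claim can be checked directly against the table figures. The short calculation below is only a worked-arithmetic sketch; the dictionary keys are names chosen here for illustration, and the values are copied from the CHAMP and TMS 320C80 columns.

```python
# Figures copied from the comparison table (CHAMP vs. TMS 320C80).
# The key names are illustrative and do not come from the slides.
champ = {"frame_rate_fps": 305.0, "latency_us": 68.0, "load_bops": 4.7}
c80 = {"frame_rate_fps": 12.0, "latency_us": 82_000.0, "load_bops": 0.2}

# Higher is better for frame rate and processing load.
frame_rate_gain = champ["frame_rate_fps"] / c80["frame_rate_fps"]
load_gain = champ["load_bops"] / c80["load_bops"]

# Lower is better for latency, so the ratio is inverted.
latency_gain = c80["latency_us"] / champ["latency_us"]

print(f"frame rate gain:      {frame_rate_gain:.1f}x")
print(f"processing load gain: {load_gain:.1f}x")
print(f"latency gain:         {latency_gain:.0f}x")
```

All three ratios come out well above 10, which is consistent with the headline claim on this slide.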
Reconfigurable Architectures
CHAMP
CHAMP 2 MCM
RCP
Floating Point Simulation
Fixed Point Simulation
Bit Width Analysis
Noise Distribution Analysis
Algorithm Rearrangement
Dataflow Graph
Algorithm Analysis
Performance Modeling
Automatic Scheduling
Algorithm Mapping
Smart Generators
Partitioning and Mapping
Adaptive Computing Resource
Precision Analysis
Alternative Implementations
Dataflow Graph
Performance Metrics
Device Programming
Allocated Functions
• SNR analysis
• Alternative implementations
• Functional approximations
• Timing and sizing estimation
• Scheduling: FSM and contexts
• Partitioning within a resource node
• Device program
• Interface program
Common Database in Ptolemy
Generator Selection
VHDL Interface Libraries
Analysis and Mapping in ACS Environment
Optimize Wordlengths
Floating Point Simulation
Fixed Point Realization
Yields:
• Bit widths for each flow
• Cost estimates (area/complexity)
• Quantization noise (SNR)
[Figure: dataflow graph labeled with signals a, b, x, x2, and y and wordlengths b1 to b6; blocks F1 to F7 are shown alongside their counterparts F’1 to F’7.]
Automated Float to Fixed Point Translation
[Plot: SNR versus Cost]
Maximize $\mathrm{SNR}(b)$, subject to $\mathrm{Cost}(b) \le C_0$ and $b \ge b_{\min} \ge 0$
or
Minimize $\mathrm{Cost}(b)$, subject to $\mathrm{SNR}(b) \ge \mathrm{SNR}_0$ and $b \ge b_{\min} \ge 0$
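To make the second formulation (minimize cost subject to an SNR floor) concrete, here is a minimal sketch of one way a wordlength search could proceed. Everything in it is an assumption for illustration: the uniform-quantization noise model, the linear cost model, the greedy heuristic, and all function and parameter names. The slides do not say which optimizer or models the tool actually uses, and a greedy search is not guaranteed to find the minimum-cost solution.

```python
import math


def snr_db(bits, signal_power, noise_gains):
    """SNR in dB under an assumed uniform-quantization noise model.

    Flow i, quantized to bits[i] fractional bits, is assumed to add noise
    power noise_gains[i] * 2**(-2 * bits[i]) / 12 at the output.
    """
    noise = sum(g * 2.0 ** (-2 * b) / 12.0 for g, b in zip(noise_gains, bits))
    return 10.0 * math.log10(signal_power / noise)


def cost(bits, unit_costs):
    """Assumed linear cost model: each bit on flow i costs unit_costs[i]."""
    return sum(c * b for c, b in zip(unit_costs, bits))


def minimize_cost(snr_target_db, b_min, signal_power, noise_gains,
                  unit_costs, b_max=32):
    """Greedy heuristic: start at b_min and repeatedly add one bit to the
    flow that buys the most SNR per unit cost until the target is met."""
    bits = list(b_min)
    while snr_db(bits, signal_power, noise_gains) < snr_target_db:
        current = snr_db(bits, signal_power, noise_gains)
        best_index, best_ratio = None, -1.0
        for i in range(len(bits)):
            if bits[i] >= b_max:
                continue  # this flow is already at the maximum wordlength
            trial = bits[:]
            trial[i] += 1
            gain = snr_db(trial, signal_power, noise_gains) - current
            ratio = gain / unit_costs[i]
            if ratio > best_ratio:
                best_index, best_ratio = i, ratio
        if best_index is None:
            raise ValueError("target SNR not reachable within b_max")
        bits[best_index] += 1
    return bits


# Hypothetical three-flow example; all numbers are made up for illustration.
noise_gains = [1.0, 4.0, 0.25]
unit_costs = [1.0, 2.0, 1.0]
b_min = [4, 4, 4]
bits = minimize_cost(60.0, b_min, 1.0, noise_gains, unit_costs)
print("bits:", bits)
print("cost:", cost(bits, unit_costs))
print("SNR (dB):", round(snr_db(bits, 1.0, noise_gains), 2))
```

The first formulation (maximize SNR under a cost ceiling) could be sketched the same way by stopping when the next bit would exceed the budget.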
• Ptolemy - simulation/design environment from the University of California, Berkeley (http://ptolemy.eecs.berkeley.edu)
• New ACS domain developed to facilitate movement among simulation and code/design generation (released in 0.7.1, 6/98)
• ACS Stars (basic building block) are composed of a Corona (interface) and multiple cores (implementations)
• Core (implementation) selection is via targeting mechanism
[Figure: an ACS star. The Corona is the common interface; the cores are retargetable implementations.]
• Floating_Point Simulation Core
• Fixed_Point Simulation Core
• C Code Generation Core
• FPGA Design Generation Core
• FPGA Java Generation Core
UCB BRASS Project
Ptolemy and the ACS Domain
Available cores/targets
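The Corona/core split can be pictured as one shared interface that dispatches to an implementation chosen by target. The sketch below is a Python illustration of that idea only. The class, method, and target names are hypothetical and are not the Ptolemy ACS API, which is written in C++.

```python
class Corona:
    """Common interface of a star; holds one core per target name.

    Hypothetical illustration, not the Ptolemy ACS class of the same name.
    """

    def __init__(self, name):
        self.name = name
        self._cores = {}

    def register_core(self, target, core):
        """Attach an implementation (any callable) for the given target."""
        self._cores[target] = core

    def fire(self, target, *inputs):
        """Run the core selected by the target on the given inputs."""
        try:
            core = self._cores[target]
        except KeyError:
            raise ValueError(
                f"{self.name}: no core for target {target!r}") from None
        return core(*inputs)


def add_float(a, b):
    """Floating-point simulation core: plain addition."""
    return a + b


def make_add_fixed(frac_bits):
    """Build a fixed-point simulation core with the given fractional bits."""
    scale = 1 << frac_bits

    def add_fixed(a, b):
        # Quantize both operands to the fixed-point grid, then add exactly.
        return (round(a * scale) + round(b * scale)) / scale

    return add_fixed


def add_c_code(a, b):
    """Code-generation core: emits C expression text instead of a value."""
    return f"({a} + {b})"


adder = Corona("Add")
adder.register_core("float", add_float)
adder.register_core("fixed", make_add_fixed(frac_bits=4))
adder.register_core("c", add_c_code)

print(adder.fire("float", 0.1, 0.2))
print(adder.fire("fixed", 0.1, 0.2))
print(adder.fire("c", "x", "y"))
```

The same graph of stars can then be simulated or turned into code by changing only the target, which is the retargeting idea the slide describes.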
Top Level Example
• FIR filter to be implemented in both floating point and fixed point simulation
• Alternative implementations are represented as “targets”
• Targets can have parameters
• Floating point simulation, fixed point simulation, and C code generation are integrated today. FPGA generation almost ready.
Selecting Among Alternative Implementations
ACS-CGFPGA soon!
• Comparison of floating point and fixed point implementations
Comparing Implementations
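A comparison of this kind can be sketched by running the same FIR filter in floating point and in a simulated fixed-point format, then measuring the output SNR. The taps, the test signal, the wordlengths, and the simple round-to-nearest quantizer below are all assumptions chosen for illustration; they are not taken from the slides.

```python
import math


def fir(samples, taps):
    """Direct-form FIR filter in floating point (zero initial state)."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (no saturation)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fir_fixed(samples, taps, frac_bits):
    """Same filter with taps, inputs, and outputs quantized."""
    q_taps = [quantize(h, frac_bits) for h in taps]
    q_samples = [quantize(s, frac_bits) for s in samples]
    return [quantize(y, frac_bits) for y in fir(q_samples, q_taps)]


def snr_db(reference, test):
    """Ratio of reference power to error power, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)


# Hypothetical 5-tap low-pass filter and sine-wave test input.
taps = [0.1, 0.2, 0.4, 0.2, 0.1]
signal = [0.9 * math.sin(2.0 * math.pi * n / 16.0) for n in range(64)]

reference = fir(signal, taps)
snr_6 = snr_db(reference, fir_fixed(signal, taps, frac_bits=6))
snr_12 = snr_db(reference, fir_fixed(signal, taps, frac_bits=12))
print(f"SNR with 6 fractional bits:  {snr_6:.1f} dB")
print(f"SNR with 12 fractional bits: {snr_12:.1f} dB")
```

Sweeping `frac_bits` in this way gives the kind of SNR-versus-wordlength picture that a designer would weigh against hardware cost.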
Winograd-based FSK Receiver
ACS Domain - CGFPGA Target
Winograd dataflow (ACS domain)
Dataflow/Hardware schedule
VHDL design (generated)
The results are sent to synthesis and place/route, yielding a complete FPGA implementation!
CGFPGA target yields: VHDL design and schedule
Winograd schedule
FPGA-based Winograd DFT
Address generator
Word counter
Data multiplexer
Sequencer/State Machine
Winograd DFT computations
Xilinx 4062XL
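For readers unfamiliar with Winograd DFT modules, the sketch below shows the standard 3-point Winograd small-DFT module and checks it against a direct DFT. The slides do not state the transform length used in the FSK receiver, so the 3-point size is an assumption chosen because it is the smallest non-trivial case; the function names are illustrative.

```python
import cmath
import math


def winograd_dft3(x0, x1, x2):
    """3-point DFT using the Winograd small-DFT module.

    Only two multiplications by constants are needed: one by a real
    constant and one by a purely imaginary constant.
    """
    u = 2.0 * math.pi / 3.0
    t1 = x1 + x2                        # pre-addition
    m0 = x0 + t1                        # DC term, no multiplication
    m1 = (math.cos(u) - 1.0) * t1       # real-constant multiplication
    m2 = 1j * math.sin(u) * (x2 - x1)   # imaginary-constant multiplication
    s1 = m0 + m1                        # post-addition
    return [m0, s1 + m2, s1 - m2]


def direct_dft(x):
    """Reference DFT computed straight from the definition."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


# Arbitrary complex test vector, chosen only for illustration.
x = [1.0 + 2.0j, -0.5 + 0.25j, 3.0 - 1.0j]
fast = winograd_dft3(*x)
slow = direct_dft(x)
for a, b in zip(fast, slow):
    print(a, b)
```

The structure of pre-additions, a small set of constant multiplications, and post-additions is what makes modules like this attractive for hardware, where multipliers are far more expensive than adders.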
Hardware-in-the-loop
SDF Wildforce star executes complete FPGA design in hardware on Annapolis Wildforce FPGA board
SDF Galaxy
Sim vs. Hardware
Context Switching Reconfigurable Computing
Adaptive Computing Smart Modules
Algorithm Analysis and Mapping Environment for Adaptive Computing Systems
Reconfigurable Algorithms for Adaptive Computing
Efficient Mathematical Algorithms for Image Processing Applications
Advanced Sensors Federated Laboratory
Technology Applications
JSF
ACP
HHSAR
MAV
Reconfigurable and Adaptive Computing IRADs
ASFL
OSA
REE
ISAC
Adaptive Computing System Insertions
Related ACS Work at Sanders