Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware
John Zaino, Eric Pauer, Ken Smith, Paul Fiore, Jairam Ramanathan, Cory Myers
{john.c.zaino, ken.smith, paul.d.fiore, cory.s.myers}@lmco.com, [email protected], [email protected]
Fourth Biennial Ptolemy Miniconference, 22-23 March 2001
03/07/01 Page - 2 BAE SYSTEMS • University of California, Berkeley
Fourth Biennial Ptolemy Miniconference – 22-23 Mar 2001
ACS Domain DSP Algorithms in Reconfig.HW
Objective/Approach/Process
• Reconfigurable computing technology offers significant performance gains, e.g. 10X ops per watt and/or ops per cubic inch, over general purpose programmable solutions without the need to develop custom hardware. Today, however, development of a working implementation requires hardware design expertise, and generation of a good implementation requires many slow iterations between an algorithm designer and a hardware developer.
• Objective - reduce the design time for an initial implementation to hours and for an optimized implementation to days, for a range of signal processing applications
• Approach - provide the algorithm developer with tools to help analyze algorithms, understand their implications for hardware, and rapidly implement their chosen solutions
• In the process, isolate the algorithm developer from the hardware designer through a set of library elements that provide well-defined interfaces to both communities
[Figure: Automatic Implementation. Direct mapping of algorithm to adaptive computing system implementation.]
Technical Attributes
• Development of Adaptive Computing Systems domain under Ptolemy Classic
  – Allows alternative implementations from same dataflow graph
  – Provides floating point simulation, fixed point simulation, C code generation and VHDL code generation
  – Released first three versions of ACS domain in Ptolemy Classic
• End-to-end capability to map signal processing dataflow graph to working reconfigurable computing implementation
• Design space exploration automated
  – Bit width optimization theory (Markovian modeling) developed for algorithm analysis
  – Bit width optimization tool implemented to trade signal to noise ratio versus hardware complexity
• Pipeline alignment and scheduling algorithms implemented
  – Automatically generate “algorithm-specific” sequencer and memory control logic
  – Uni-rate and multi-rate signal processing
  – Single and multi-FPGA implementations
• Smart Generators - parameterizable algorithmic blocks
Analysis and Mapping in ACS Environment
[Figure: three-stage tool flow built around a common database in Ptolemy. Arrow labels: Dataflow Graph, Precision Analysis, Alternative Implementations, Performance Metrics, Allocated Functions, Device Programming.]
• Algorithm Analysis: floating point simulation, fixed point simulation, bit width analysis, noise distribution analysis, algorithm rearrangement
  – SNR analysis
  – Alternative implementations
  – Functional approximations
• Algorithm Mapping: performance modeling, partitioning and mapping, automatic scheduling
  – Timing and sizing estimation
  – Scheduling
  – Partitioning across multiple FPGAs
• Smart Generators: generator selection, VHDL interface libraries, adaptive computing resource
  – Device program
  – Interface program
Design Approach
[Figure: design flow from signal processing algorithm to reconfigurable hardware. Time axis labels: Design Time, Run Time. Legend: Enhanced or New Capability; Existing Tool or Hardware.]
• Ptolemy Environment (common database in Ptolemy): Signal Processing Algorithm, Represent in Dataflow, Analysis and Simulation (floating point simulation, fixed point simulation)
• Algorithm Analysis: bit width analysis, noise distribution analysis, algorithm rearrangement
• Algorithm Mapping: performance modeling, partitioning & mapping, automatic scheduling
• Smart Generators: generator selection, VHDL interface libraries
• Other blocks: Logic Generation, Floorplanner, Routing, Application Interface Generation, Hardware Configuration Library, Device Program
• Host-side blocks: Host Processor, Application Software, Compute Libraries, Run-Time Manager, Operating System, Device Driver, Reconfigurable Hardware
• Arrow labels: Dataflow Graph, Signal Flow Graph, Precision Analysis, Alternative Implementations, Performance Metrics, Allocated Functions
Program Progress
• Algorithm analysis
  – Representation for alternative implementations was incorporated as part of Ptolemy integration
  – Side-by-side simulation capability incorporated as part of Ptolemy integration
  – Developed bit width optimization theory for algorithm analysis and extended to include multiple devices and constraints
  – Implemented wordlength optimization tool
• Algorithm mapping
  – Cost analysis included as part of wordlength analysis
  – Implemented uni-rate and multi-rate pipeline alignment and scheduling algorithm for signal processing dataflow graphs
  – One-to-one and one-to-many mapping of functions to blocks supported
Program Progress
• Smart generators
  – Implemented portable logic synthesis methodology with VHDL as first target
  – Integrated Xilinx Core Generators (4000 series) capability within VHDL code generation
  – Implemented smart generators for state machine sequencer and memory control (address generator)
• Ptolemy integration
  – Released initial version of “Adaptive Computing Systems” domain in Ptolemy in June 1998. Second release April 1999. Third release in August 2000.
  – ACS domain supports alternative implementations from a common interface: floating point simulation, fixed point simulation, C code generation, and VHDL code generation.
• Demonstration
  – Selected Annapolis Micro Systems Wildforce™ board for demonstrations
  – Established ACS demonstration environment for Solaris
  – Integrated Wildforce™ board under Ptolemy
  – Demonstrated Winograd-based FSK receiver and FFT-based signal detector
  – Procured and installed Annapolis Micro Systems Wildstar™ board under Solaris
  – SHARP/HRR (High Range Resolution Radar ATR) algorithm modeled; hardware development and testing nearing completion
ACS Domain
• Determined that extending old domains could not be justified
• New paradigm for Ptolemy, e.g. multiple implementations of a single star
ACS Domain
• New ACS domain to facilitate movement among simulation and code/design generation
• Corona contains interface specification
• Core contains an implementation
• ACS Stars are composed of one corona and multiple cores
• Core selection via targeting defines implementation
[Figure: one Corona with four Cores (Floating_Point Simulation, Fixed_Point Simulation, C Code Generation, FPGA Design Generation). Additional labels: Targets, Corona, Core.]
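The corona/core split can be illustrated with a small sketch. This is not the ACS domain's actual C++ class structure; the names `Corona`, `Star`, `fire`, the target strings, and the `frac_bits` parameter are all hypothetical, chosen only to show one interface with several interchangeable implementations selected by target.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Corona:
    """Interface specification shared by every implementation of a star."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass
class Star:
    """One corona plus one core per target (hypothetical layout)."""
    corona: Corona
    cores: Dict[str, Callable] = field(default_factory=dict)

    def fire(self, target, *args, **target_params):
        # Choosing the target chooses the core, and so the implementation.
        return self.cores[target](*args, **target_params)


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return math.floor(value * scale + 0.5) / scale


def gain_float(x, k):
    return x * k


def gain_fixed(x, k, frac_bits=8):
    return quantize(quantize(x, frac_bits) * quantize(k, frac_bits), frac_bits)


def gain_c(in_name, k_name):
    # A code-generation core emits text instead of computing a value.
    return f"output = {in_name} * {k_name};"


gain = Star(
    Corona("Gain", ("input",), ("output",)),
    {"float_sim": gain_float, "fixed_sim": gain_fixed, "c_codegen": gain_c},
)

print(gain.fire("float_sim", 0.3, 0.5))
print(gain.fire("fixed_sim", 0.3, 0.5, frac_bits=4))
print(gain.fire("c_codegen", "input", "k"))
```

The same `gain` star appears once in the graph; only the target name (and any target parameters, here `frac_bits`) changes between simulation and code generation.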
Selecting Among Alternative Implementations
• Alternative implementations are represented as “targets” with cores for each star/functional block
• Targets can have parameters
• Floating point simulation, fixed point simulation, C code generation, and FPGA design generation are available.
Algorithm Analysis
[Figure: implementation trades for one algorithm, shown as multiple representations.]
• Example recurrence (three coefficients; labels Coeffs, Data, Acc1, Acc2, Acc3):
  $Y_n = a_0 X_n + a_1 X_{n-1} + a_2 X_{n-2}$
  $Y_{n+1} = a_0 X_{n+1} + a_1 X_n + a_2 X_{n-1}$
  $Y_{n+2} = a_0 X_{n+2} + a_1 X_{n+1} + a_2 X_n$
• Multiple representations: Basic FIR, Reduce Wordlength, Reduce Taplength, Systolic, Low Power, Bit Serial, Multirate
• Perform trade-offs
  – Precision (float vs. fixed, wordlengths)
  – Speed
  – Size/Area
  – Latency
• Other panels: a loop with P(z), Quantize, and Loop Filter blocks; arrays of full adders (FA)
• Frequency estimator panels (I/Q input, unit-delay elements, Angle, Freq. Est.): one structure with N delays and one with N/3 and 2N/3 delays, annotated “N Mults, N Adds” and “1 Scaling, 7 Adds”
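The precision trade-off (float vs. fixed, wordlengths) can be tried on the three-coefficient FIR recurrence with a side-by-side simulation. The following is a minimal sketch, not the ACS tool itself: the coefficient values, the sine test input, and the round-to-nearest quantizer are assumptions made for illustration.

```python
import math


def fir(x, coeffs):
    """Direct-form FIR: y[n] = sum_k a[k] * x[n-k], taking x[n] = 0 for n < 0."""
    return [
        sum(a * x[n - k] for k, a in enumerate(coeffs) if n - k >= 0)
        for n in range(len(x))
    ]


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (assumed quantizer)."""
    scale = 1 << frac_bits
    return math.floor(value * scale + 0.5) / scale


def snr_db(reference, approx):
    """Ratio of reference power to error power, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


coeffs = [0.25, 0.5, 0.25]                      # assumed a0, a1, a2
x = [math.sin(0.2 * n) for n in range(64)]      # assumed test input
y_float = fir(x, coeffs)                        # floating point reference

snr_by_bits = {}
for bits in (4, 8, 12):
    xq = [quantize(v, bits) for v in x]
    cq = [quantize(a, bits) for a in coeffs]
    y_fixed = [quantize(v, bits) for v in fir(xq, cq)]
    snr_by_bits[bits] = snr_db(y_float, y_fixed)
    print(f"{bits:2d} fractional bits: SNR = {snr_by_bits[bits]:.1f} dB")
```

Running the float and fixed versions on the same input makes the cost of each removed bit visible as lost SNR, which is the quantity the wordlength tools trade against hardware size.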
Wordlength Optimization Analysis
[Figure labels: Quantization Noise (SNR), Hardware Cost, Dynamic Range, Optimal Design Choices.]
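To make the trade between quantization noise and hardware cost concrete, here is a small exhaustive search over bit widths. It is only a sketch and does not reproduce the Markovian-modeling theory mentioned earlier: it substitutes the textbook uniform quantization noise model (variance Δ²/12 with Δ = 2^-b), and the node list, noise gains, per-bit costs, and signal power are all made-up values.

```python
import itertools
import math

# Hypothetical nodes: (name, noise power gain to the output, cost per bit).
NODES = [("mult", 1.0, 8.0), ("acc", 4.0, 1.0), ("scale", 0.25, 2.0)]
SIGNAL_POWER = 0.5  # assumed output signal power


def output_noise(widths):
    """Sum of per-node quantization noise, each scaled by its gain to the output."""
    return sum(
        gain * (2.0 ** (-2 * bits)) / 12.0
        for (_, gain, _), bits in zip(NODES, widths)
    )


def cost(widths):
    """Assumed hardware cost model: linear in the number of bits at each node."""
    return sum(per_bit * bits for (_, _, per_bit), bits in zip(NODES, widths))


def snr_db(widths):
    return 10 * math.log10(SIGNAL_POWER / output_noise(widths))


def optimize(target_snr_db, candidates=range(4, 17)):
    """Cheapest width assignment that meets the SNR target, or None."""
    feasible = (
        w for w in itertools.product(candidates, repeat=len(NODES))
        if snr_db(w) >= target_snr_db
    )
    return min(feasible, key=cost, default=None)


best = optimize(60.0)
print(best, cost(best), round(snr_db(best), 1))
```

With 13 candidate widths at 3 nodes the search visits 2,197 combinations, the same order of magnitude as the exploration rate quoted on the facts-and-figures slide.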
Algorithm Mapping
• Objectives
  – Performance Modeling – provide feedback on utilization, throughput, efficiency, etc. Feedback should be used by algorithm analysis capabilities.
  – Partitioning and Mapping – break large dataflow graphs into groups and map those groups across multiple devices and across time
  – Automatic Scheduling – automatically determine firing sequence, optimal mappings and sequence of configurations
• Progress
  – Cost analysis included as part of wordlength analysis
  – Implemented uni-rate and multi-rate pipeline alignment and scheduling algorithms
  – Memory allocation support
[Figure: Performance Modeling, Partitioning and Mapping, and Automatic Scheduling blocks around the common database in Ptolemy. Arrow labels: Signal Flow Graph, Performance Metrics, Allocated Functions.]
Automatic Scheduling
Pipeline alignment and schedule determination required for logic synthesis
[Figure: four panels.]
• The algorithm dataflow graph: instances I2 (P=2) and I3 to I6 (P=1), nodes N1 to N8, PORT1 and PORT2, variables A to E. Key: I = Instance, N = Node, P = Pipeline Delays.
• Modified algorithm dataflow graph, with elements added to netlist by sequencer generator. Labels include I7 (P=1), N9, LDEN1, LDEN2, MEM1, MEM2, 2-1 MUX, DELAY, SEL.
• Datapath and variable locations: FPGA with RAM bank 1 (A, B, C) and RAM bank 2 (D, E); Input and Output.
• Final algorithm schedule: activation sequence table with columns Node, N1 to N9, SEL, LD1, LD2, LD3, PORT1, PORT2.
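Pipeline alignment can be sketched as a longest-path computation over the dataflow graph: each instance starts when its slowest input arrives, and faster inputs receive extra delay registers. The code below is an illustration under assumptions, not the ACS scheduler; the graph topology is invented (only the instance names and P values echo the figure), and it needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def align_pipelines(latency, edges):
    """Return start times per instance and the delay to insert on each edge.

    latency: instance name -> pipeline delay P in clock cycles
    edges:   list of (producer, consumer) pairs
    """
    preds = {name: [] for name in latency}
    for producer, consumer in edges:
        preds[consumer].append(producer)

    start, done = {}, {}
    # static_order() yields every instance after all of its predecessors.
    for name in TopologicalSorter(preds).static_order():
        start[name] = max((done[p] for p in preds[name]), default=0)
        done[name] = start[name] + latency[name]

    # Inputs that arrive early are delayed until the consumer's start time.
    delays = {(p, c): start[c] - done[p] for p, c in edges}
    return start, delays


# Hypothetical topology; only the names and P values come from the figure.
latency = {"I2": 2, "I3": 1, "I4": 1, "I5": 1, "I6": 1}
edges = [("I2", "I4"), ("I3", "I4"), ("I4", "I6"), ("I5", "I6")]

start, delays = align_pipelines(latency, edges)
print(start)
print(delays)
```

In this made-up graph the I3 output needs one extra register and the I5 output needs two, so that both operands of I4 and of I6 arrive in the same clock cycle.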
Processing Model
• Well-matched to Ptolemy Synchronous Dataflow (SDF) Domain
  – Unit or block token produce and consume amounts
  – Netlist structure determines execution order constraints
  – Pipeline delay information required to determine absolute timing
  – Delays are set to align pipelines for maximum throughput
  – Delay can be automatically determined from block parameters
• Combination of fully synchronous model and tagged synchronous models
  – No handshaking or tags but data is not always valid
  – Data validity is implicit in timing of latch signals
• Memory access fits same model
  – Data from common memory demuxed into separate streams running at lower rate
  – Data to common memory multiplexed to a single port
• Multiple FPGAs introduce additional pipeline delays
• Multi-rate parameterized execution
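The fixed produce and consume amounts are what let a synchronous dataflow graph be scheduled statically: the standard SDF balance equations give how many times each block fires per period. Below is a minimal sketch of that textbook calculation, assuming a connected graph; the actor names and rates (a memory read demultiplexed into two half-rate streams) are invented for illustration, and `math.lcm` needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def repetitions(actors, edges):
    """Smallest integer firing counts r with r[src]*produced == r[dst]*consumed.

    edges: list of (src, produced, dst, consumed) token rates per firing.
    """
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        remaining = []
        for src, produced, dst, consumed in pending:
            if src in rate and dst in rate:
                if rate[src] * produced != rate[dst] * consumed:
                    raise ValueError("inconsistent rates: no periodic schedule")
            elif src in rate:
                rate[dst] = rate[src] * produced / consumed
            elif dst in rate:
                rate[src] = rate[dst] * consumed / produced
            else:
                remaining.append((src, produced, dst, consumed))
        if len(remaining) == len(pending):
            raise ValueError("graph is not connected")
        pending = remaining
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# Hypothetical multi-rate graph: memory read, 1-to-2 demux, two filters.
actors = ["mem", "demux", "filtA", "filtB"]
edges = [
    ("mem", 1, "demux", 2),    # demux consumes two samples per firing
    ("demux", 1, "filtA", 1),  # and sends one to each lower-rate stream
    ("demux", 1, "filtB", 1),
]
reps = repetitions(actors, edges)
print(reps)
```

Here the memory block fires twice for every firing of the demux and of each filter, which is the sense in which the demultiplexed streams run at a lower rate.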
Smart Generators
• Objectives
  – Parameterized libraries – generate node implementations for specified bit widths and parameter values
  – Hierarchical representations – provide generators that can recursively call other generators
  – Interface generation – automatically generate software to move data between general-purpose processor and reconfigurable platform and to manage sequences of configurations
  – General synthesis – provide device independent representation of implementation
• Progress
  – Implemented portable logic synthesis methodology with VHDL as first target
  – Integrated Xilinx Core Generators (4000 Series) capability within VHDL code generation
  – Implemented smart generators for state machine and memory control
  – Hierarchical generation
[Figure: Generator Selection, VHDL Interface Libraries, and Adaptive Computing Resource blocks attached to the common database in Ptolemy. Arrow labels: Allocated Functions, Device Programming.]
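The idea of a parameterized, hierarchical generator can be shown with a toy example that emits VHDL text for a given bit width and is then called by a second generator. This is only a sketch of the concept: the ACS smart generators are part of the C++ tool set, and the function names, entity names, and adder-tree structure here are hypothetical.

```python
def adder_vhdl(name, width):
    """Emit a signed adder whose output is one bit wider than its inputs."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return f"""library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity {name} is
  port (a, b : in  signed({width - 1} downto 0);
        y    : out signed({width} downto 0));
end entity {name};

architecture rtl of {name} is
begin
  y <= resize(a, {width + 1}) + resize(b, {width + 1});
end architecture rtl;
"""


def adder_tree_units(n_inputs, width):
    """Hierarchical generator: one adder entity per level of a binary adder tree.

    Each level halves the number of operands and grows the word by one bit,
    so it calls the leaf generator with a different width per level.
    """
    units = {}
    level = 0
    while n_inputs > 1:
        name = f"add_w{width + level}"
        units[name] = adder_vhdl(name, width + level)
        n_inputs = (n_inputs + 1) // 2
        level += 1
    return units


units = adder_tree_units(n_inputs=4, width=8)
print(list(units))
print(units["add_w8"])
```

Because the bit width is a generator argument rather than a fixed library property, the result of a wordlength analysis can be fed straight into code generation.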
Multi-FPGA Capability
Design Generation for Single or Multiple FPGAs
[Figure: Single FPGA Implementation and Multi-FPGA Implementation panels. Labels: FPGA 1 Logic, FPGA 3 Logic, FPGA 1 Routing, FPGA 2 Routing, FPGA 3 Routing, FPGA 4 Routing.]
Winograd DFT-Based FSK Communications Receiver
[Figure: FPGA Implementation.]
Results from FPGA-target / Back-end Tools
[Figure panels: Generated VHDL, Generated Schedule, FPGA Design.]
Hardware-in-the-Loop
SDF Wildforce™ Star executes complete FPGA design in hardware on Annapolis Wildforce FPGA board
[Figure: SDF Galaxy.]
Processing Results
SHARP*/HRR Algorithm
Algorithm
• Given test vector
  – For each template
    • For each shift
      – Compute least squares error
• Select template with minimum error
Complexity
• 70 data points per vector
• Number of shifts = 11 (in range)
• Number templates = 3,600/class
• 186 sec/class for 11 shifts on a Sun Ultra 5 (360 MHz) workstation
• Expect 30x improvement
[Figure labels: Test Vector, Shift, Template Vector (one per target, per azimuth, per elevation), Non-Linearity, Least Squares Fit, Modeling Error.]
Can be done with correlation if templates are suitably pre-processed
* System-oriented High Range Resolution (HRR) Automatic Recognition Program
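The template search above can be written out directly. This is a minimal sketch under assumptions: the error is a plain sum of squared differences (any gain fit or non-linearity in the real SHARP processing is omitted), shifts are symmetric around zero with zero fill, and the templates are tiny made-up vectors rather than 70-point HRR profiles.

```python
def shifted(vec, shift):
    """Shift vec right by `shift` samples (left if negative), zero-filling."""
    n = len(vec)
    return [vec[i - shift] if 0 <= i - shift < n else 0.0 for i in range(n)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(test, templates, shifts=range(-5, 6)):
    """Return (error, label, shift) of the best template over all shifts.

    The default range(-5, 6) gives 11 shifts; centering it on zero is an
    assumption.
    """
    best = None
    for label, template in templates.items():
        for s in shifts:
            err = squared_error(test, shifted(template, s))
            if best is None or err < best[0]:
                best = (err, label, s)
    return best


# Made-up templates; real ones would hold 70 range bins each.
templates = {
    "A": [0, 1, 3, 1, 0, 0, 0, 0],
    "B": [0, 0, 0, 2, 2, 2, 0, 0],
}
test_vector = shifted(templates["A"], 2)

err, label, shift = classify(test_vector, templates)
print(label, shift, err)
```

With the slide's numbers, one class costs 70 × 11 × 3,600 = 2,772,000 point comparisons per test vector, which is why the inner loop is a natural candidate for hardware.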
SHARP/HRR Algorithm
[Figure panels: FPGA design and schedule for NORMALIZATION; FPGA design and schedule for CORRELATION; Normalization Results (HW vs. SW); Correlation Results Across Range Shifts (typical expected).]
ACS Tools - Facts and Figures
• 23 Functional Blocks (ACS stars) developed
• ~100 lines of code needed for new block/star
• Two ACS Architectures supported
  – Wildforce™ (4062XL)
  – Wildstar™ (XCV1000) (in progress)
• ~16,000 lines of C++ code developed
• ~15 min to generate VHDL for five-FPGA design
• Explore ~1000 bit-width combinations in 1 minute
• Ptolemy Classic runs under Solaris and Linux