solving the challenges of increasingly complex fpga ... · title: the xilinx all programmable...
TRANSCRIPT
© Copyright 2015 Xilinx .
Sergei Storojev Tools and Design Methodology Applications Xilinx
Solving the Challenges of Increasingly Complex
FPGA-Designs Using a High-Level Synthesis
Approach with Vivado HLS
© Copyright 2015 Xilinx .
Page 2
Vivado HLS - key technology in Xilinx solution
Vivado HLS Fundamental Concepts
Case Study: Multichannel FIR Filter Architectural Exploration
Demo
Agenda
© Copyright 2015 Xilinx .
Vivado HLS - Key Technology in Xilinx Solution
© Copyright 2015 Xilinx .
Page 4
Xilinx Technology Evolution
Programmable Logic Devices Enables Programmable Logic
All Programmable Devices Enables All Programmable & Smarter Systems
28nm
© Copyright 2015 Xilinx .
Page 5
Xilinx Multi-Node Product Portfolio Offering
Increasing Performance & Integration
2M LC 4.4M LC
© Copyright 2015 Xilinx .
Page 6
Vivado Design Suite Technology Advantages
Integrated Design Environment
De
bu
g a
nd
An
aly
sis
Sh
are
d S
ca
lab
le D
ata
Mo
de
l Scalable to 100M Gates
IP and System-centric
Integration with
Fast Verification
Fast, Hierarchical and
Deterministic Closure
Automation with ECO
Accelerating
System Integration
Accelerating
Implementation
© Copyright 2015 Xilinx .
Page 7
Accelerates Algorithmic C to RTL IP integration
Comprehensive Integration with
the Xilinx Design Environment
VHDL or Verilog
System IP Integration
C, C++ or SystemC
RTL Implementation
Micro Architecture Exploration
Algorithmic Specification
Rapid RTL architecture exploration via Directives
Co-optimization with RTL synthesis for optimal QoR
Generates AXI4-based IP for Vivado IP Integrator
Libraries
Libraries:
Arbitrary Precision
Video, OpenCV
Math
Linear algebra
DSP: FFT and FIR
Case Study Traditional C-based Acceleration
Design Time Radar Design
60 days 5 days 12x
Verification Time Video Design (10 frames)
~2 days 10 sec 12,000x
© Copyright 2015 Xilinx .
Page 8
Design Integration: IP Centric Design Flow
IP Catalog
C-based IP Creation
Libraries
Arbitrary Precision
Video, OpenCV
Math
Linear algebra
DSP: FFT and FIR
System Integration
Vivado IP Integrator
C/C++, SystemC, OpenCL
VHDL or Verilog
System Generator for DSP
Vivado RTL
© Copyright 2015 Xilinx .
Page 9
All Programmable SoCs & Vivado HLS
SW Spec HW Spec
Requirements
Verify Iterate Verify Iterate
Accelerates Algorithmic C to Co-Processing Accelerator Integration
© Copyright 2015 Xilinx .
Page 10
Vivado High-Level Synthesis Serves a Wide Range of Applications across Markets
Communications
LTE MIMO receiver
Advanced wireless antenna
positioning
Audio, Video, Broadcast 3D cameras
Video transport
Consumer 3D television
eReaders
Aerospace and Defense Radar, Sonar
Signals Intelligence
Industrial, Scientific, Medical Ultrasound systems
Motor controllers
Automotive
Infotainment
Driver assistance
Computing & Storage High performance computing
Database acceleration
Test & Measurement Communications instruments
Semiconductor ATE
© Copyright 2015 Xilinx .
Page 11
Vivado High-Level Synthesis Serves a Wide Range of Applications across Markets
Communications
LTE MIMO receiver
Advanced wireless antenna
positioning
Audio, Video, Broadcast 3D cameras
Video transport
Consumer 3D television
eReaders
Aerospace and Defense Radar, Sonar
Signals Intelligence
Industrial, Scientific, Medical Ultrasound systems
Motor controllers
Automotive
Infotainment
Driver assistance
Computing & Storage High performance computing
Database acceleration
Test & Measurement Communications instruments
Semiconductor ATE
© Copyright 2015 Xilinx .
Page 12
Accelerated Development for Hardware Engineers
“The combination of Vivado IPI and HLS has been invaluable to our
development. The combination of these abstractions allowed us to
develop our algorithms in C++ and rapidly integrate the resulting IP,
saving greater than 15X in development costs versus an RTL approach.“
~Ties Bos, director of Software and FPGA at Gainspeed, Inc.
Vivado IPI and HLS
© Copyright 2015 Xilinx .
Page 13
All Programmable Abstractions Programming FPGA/SoC at Software Abstraction Level
Platform Integrator
System Engineer
HW Designer
DSP SW Engineer
System Engineer
Faster time to
differentiation
& revenue
Libraries
State of the Art
Implementation Co-optimized with Architecture
HLS System
Generator IP Integr.
Platform &
IP Integration
HW/SW Interface (drivers…)
Vivado Design Suite
SDxTM
SDAccelTM, SDNetTM,
MathWorks Zynq Design
C models / TLM
IP HLS IP
RTL IP
System Engineer
SW Engineer
© Copyright 2015 Xilinx .
Vivado HLS Fundamental Concepts
© Copyright 2015 Xilinx .
Page 15
Vivado HLS
C based code Abstract, no clocks!
Specific, timed
IP
?
Accelerates Algorithmic C to RTL Creation
© Copyright 2015 Xilinx .
Page 16
Vivado HLS – Synthesis
Accelerates Algorithmic C to RTL Creation
FSM Datapath
MUL
ADD
DIV
FPU
C, C++, SystemC OpenCL C Abstract, untimed
Target optimized, timed,
Connectivity ready Connectable IP / Verified RTL
• Directives / Pragmas
• Constraints
• Libraries
• Arbitrary Precision
• Video
• Math
• Linear algebra
• IP: FFT and FIR
Coding style impacts hardware realization
© Copyright 2015 Xilinx .
Page 17
Loops Optimization: Latency & Throughput Examples
void foo_top (...) { ... add: for (i=0;i<=3;i++) { b = a[i] + b; ...
Default: 4 cycles
Unroll: 1 cycle
0 1 2 3
clk
0 1 2 3
clk
Un
roll
ing
P
ipe
lin
ing
void foo_top (...) { ... add: for (i=1;i=<2;i++) { op_READ; op_COMPUTE; op_WRITE; } ...
w/o pipelining
pipelining
READ
clk COMPUTE WRITE
READ COMPUTE WRITE
READ
clk COMPUTE WRITE
READ COMPUTE WRITE
loop latency = 6
loop latency = 4
throughput = 3
throughput = 1
© Copyright 2015 Xilinx .
Page 18
Arrays – Partitioning and Reshaping
Arrays are the fundamental construct to describe memories…
– Array accesses can often be performance bottlenecks
Partitioning splits an array into independent arrays
– Arrays can be partitioned on any of their dimensions
Example: ARRAY_PARTITION set to “cyclic”, factor of 2
Initial array … becomes two arrays
Reshaping combines array elements into wider containers
– Different arrays into a single physical memory
– New RTL memories are automatically generated without changes to C code
0 1 2 … N-3 N-2 N-1 1 … N-3 N-1
0 2 … N-2
© Copyright 2015 Xilinx .
Page 19
Vivado HLS – Interfaces
Accelerates Algorithmic C to RTL Creation
Connectable IP Ready to use
IP block
C based code
• Directives / Pragmas
• AXI
• FIFO Interface
• BRAM Interface
• Default HLS protocols
• Handshake Interface
Abstract, untimed
© Copyright 2015 Xilinx .
Page 20
Vivado HLS – Verification
Accelerates Algorithmic C to RTL Creation
C, C++ Testbench and C, C++, SystemC
C simulation
C / RTL Co-simulation
• GCC
• G++
• SystemC
Connectable IP / Verified RTL
•Xsim •ISim •Questa SIM •VCS •NCSim •Riviera •OSCI
Abstract, untimed testbench
Automatic generation of
RTL testbench
Single testbench for
C-Sim and RTL-Sim
Creating a self-checking test bench is
highly recommended!
© Copyright 2015 Xilinx .
Page 21
5 steps process to improve your design
Design Methodology (UG902)
● ● ●
© Copyright 2015 Xilinx .
Case Study:
Multichannel FIR Filter Architectural Exploration
© Copyright 2015 Xilinx .
Page 23
Goal: explore different architecture solutions
– Targeting ZYNQ 7045
– At 50 MSPS fixed data rate, 81 coefficient filter
– 32-bit integer or 32-bit Floating-Point (FP) I/O samples and coefficients
13 different architectures considered
– Fixed the 50 MSPS data rate, 3 different clock frequencies:
• 50, 150, 300 MHz with respectively II=1, II=3, II=6
FIR Filter
Odd-symmetric
25x18 bits integer math accuracy (II=3)
32x32 bits integer math accuracy (II=3, II=6)
32-bit floating point accuracy (II=6)
Asymmetric
25x18 bits integer math accuracy (II=1, II=3, II=6)
32x32 bits integer math accuracy (II=1, II=3, II=6)
32-bit floating point accuracy (II=1, II=3, II=6)
Systolic MAC FIR Filter
© Copyright 2015 Xilinx .
Page 24
Overall Performance Summary
II MHz BRAM DSP48 FF LUT
25x18 asym MAC FIR 1 50 0 243 6467 612425x18 asym MAC FIR 3 150 0 81 16778 928525x18 asym MAC FIR 6 300 0 41 19528 1256025x18 odd-sym MAC FIR 3 150 0 41 12258 7890
32 float asym MAC FIR 1 50 0 1701 156066 29721732 float asym MAC FIR 3 150 0 567 87177 13062932 float asym MAC FIR 6 300 0 123 85451 9195932 float odd-sym MAC FIR 6 300 0 63 62171 54835
32x32 asym MAC FIR 1 50 0 972 16037 896832x32 asym MAC FIR 3 150 0 324 27688 1350632x32 asym MAC FIR 6 300 0 164 27529 1824232x32 odd-sym MAC FIR 3 150 0 164 18304 1093232x32 odd-sym MAC FIR 6 300 0 84 18642 13183
ZYNQ 7045 max resources 1090 900 437200 218600
expectedArchitecture
Syntesis Estimation
To accomplish the work
• RTL Approach will take days
• Vivado HLS: 6 hours only
4 hours: C coding
2 hours: Tcl scripting, HLS Execution
© Copyright 2015 Xilinx .
Demo
© Copyright 2015 Xilinx .
Page 26
Vivado HLS
Key technology in Xilinx solution
Reduces >10x design time and >100x verification time
Enables rapid RTL architecture exploration for optimal QoR
Conclusion