automatic generation of customized discrete fourier transform ips grace nordin, peter a. milder,...

Automatic Generation of Customized Discrete Fourier Transform IPs

Grace Nordin, Peter A. Milder, James C. Hoe, Markus Püschel

Carnegie Mellon University

This project is supported in part by NSF awards ITR/NGS-0325687 and SYS-0310941 and a DARPA DESA program

www.spiral.net

The Paradox of Reusable IPs

Boon to productivity: zero effort required; zero knowledge required; zero chance to introduce new bugs

Why repeat what has already been done?

Bane to optimality: finding the right functionality with the right interface; design tradeoffs (performance, area, power, accuracy, ...)

Are you getting what you really wanted?

Solution: parameterized automatic IP generators

zero effort, knowledge or bugs; allows application-specific customization; facilitates design exploration

Our Work: Discrete Fourier Transform IPs

Discrete Fourier Transform (DFT): important building block in DSP applications; numerous design “cores” available

Current IP libraries support: various sizes, number formats, data orderings; only a small number of microarchitecture choices (Xilinx LogiCore DFT gives 3 choices)

We generate IPs with custom design tradeoffs: degree of parallelism in microarchitecture (min to max); resource preference (e.g. BRAM vs. slices in FPGAs)

Extensible to other common linear DSP transforms

Outline

Introduction; Formula-Driven Design Generation; Microarchitecture Parameterization; Generator User Interface; Experimental Results; Conclusions

Transforms as Formulas [www.spiral.net]

Transform computation is represented as matrix-vector multiplication

Matrix-vector multiplication is O(n^2) operations

“Fast” algorithms factor the transform into a sequence of structured sparse matrices

O(n log n) operations

[Formulas: the DFT as a dense matrix; the FFT as a product of structured sparse factors]

Datapath easily formed from factorized formulas
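The factorization idea can be checked numerically. The sketch below is an illustration, not the formula shown on the slide: it assumes the standard Cooley-Tukey factorization DFT_mn = (DFT_m ⊗ I_n) T (I_m ⊗ DFT_n) L, and all helper names (`dft_matrix`, `kron`, `stride_perm`, `twiddle`) are hypothetical. It builds the four factors as explicit matrices and confirms that their product equals the dense DFT matrix.

```python
import cmath

def dft_matrix(n):
    # Dense DFT matrix: entry (r, c) is w^(r*c) with w = exp(-2*pi*i/n).
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (r * c) for c in range(n)] for r in range(n)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def kron(a, b):
    # Kronecker (tensor) product of two matrices given as lists of rows.
    rb, cb = len(b), len(b[0])
    return [[a[r // rb][c // cb] * b[r % rb][c % cb]
             for c in range(len(a[0]) * cb)]
            for r in range(len(a) * rb)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]

def stride_perm(m, n):
    # Stride permutation: output position i*n + j reads input position j*m + i.
    size = m * n
    p = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            p[i * n + j][j * m + i] = 1
    return p

def twiddle(m, n):
    # Diagonal matrix of twiddle factors w_mn^(i*j) at position i*n + j.
    size = m * n
    w = cmath.exp(-2j * cmath.pi / size)
    d = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            d[i * n + j][i * n + j] = w ** (i * j)
    return d

def cooley_tukey_factors(m, n):
    # Factors listed left to right, as they would appear in the formula.
    return [kron(dft_matrix(m), identity(n)),
            twiddle(m, n),
            kron(identity(m), dft_matrix(n)),
            stride_perm(m, n)]

factors = cooley_tukey_factors(2, 4)
fast = factors[0]
for factor in factors[1:]:
    fast = matmul(fast, factor)

dense = dft_matrix(8)
max_err = max(abs(dense[r][c] - fast[r][c]) for r in range(8) for c in range(8))
print("factorization matches dense DFT_8:", max_err < 1e-9)
```

Applying the same split recursively to the smaller DFT blocks is what brings the operation count down from O(n^2) to O(n log n).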

Formula to Datapath

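Given a formula, each construct maps directly to a piece of the datapath:

a product of factors: apply the rightmost factor, then the next one

a permutation: permute the data

a block repeated in parallel (tensor product with an identity): apply the block that many times in parallel

a diagonal: scale each element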
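A minimal sketch of this mapping, assuming each formula construct becomes a small function on a list of samples. The names `permute`, `parallel`, `scale`, `butterfly`, and `datapath` are hypothetical and are not part of the generator. The example composes them into a 4-point DFT using the standard Cooley-Tukey split, with the blocks listed in formula order and applied from right to left.

```python
def permute(perm):
    # Permutation block: output i is wired to input perm[i].
    return lambda x: [x[src] for src in perm]

def parallel(block, copies, width):
    # "Identity tensor block": apply `block` to `copies` consecutive chunks of `width` values.
    def run(x):
        out = []
        for c in range(copies):
            out.extend(block(x[c * width:(c + 1) * width]))
        return out
    return run

def scale(diag):
    # Diagonal block: multiply each value by its own constant.
    return lambda x: [d * v for d, v in zip(diag, x)]

def butterfly(pair):
    # 2-point DFT.
    a, b = pair
    return [a + b, a - b]

def datapath(*blocks_in_formula_order):
    # The formula is read right to left, so the rightmost block runs first.
    def run(x):
        for block in reversed(blocks_in_formula_order):
            x = block(x)
        return x
    return run

shuffle = [0, 2, 1, 3]          # stride-2 permutation on 4 points
twiddles = [1, 1, 1, -1j]       # diagonal of the 4-point Cooley-Tukey split

dft4 = datapath(
    permute(shuffle), parallel(butterfly, 2, 2), permute(shuffle),  # DFT_2 tensor I_2
    scale(twiddles),
    parallel(butterfly, 2, 2),                                      # I_2 tensor DFT_2
    permute(shuffle),
)

print(dft4([1, 2, 3, 4]))
```

Each function corresponds to one hardware element: wiring, replicated compute blocks, or constant multipliers.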

Outline

Introduction; Formula-Driven Design Generation; Microarchitecture Parameterization; Generator User Interface; Experimental Results; Conclusions

Simple regular structure embodied in formula

Example: Pease DFT

[Formula annotations: diagonal, permutation, butterflies in parallel, k stages. Figure labels: stage 1, stage 2, stage 3]

Pease DFT Example: DFT8

stage 1 stage 2 stage 3

(formula is applied from right to left)

(datapath is built left to right)
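The repeating stage structure can be sketched in software. The code below is a constant-geometry radix-2 FFT in the spirit of the Pease algorithm, written as an assumption-laden illustration: the order of butterfly, diagonal, and permutation inside a stage may differ from the formula on the slide, and the function names are hypothetical. Every stage reads pairs (j, j + n/2) and writes to (2j, 2j + 1), so the wiring is identical in all k stages; only the twiddle constants change. The result comes out in bit-reversed order.

```python
import cmath

def constant_geometry_fft(x):
    # Radix-2 FFT whose k = log2(n) stages all use the same wiring.
    # Returns the DFT of x in bit-reversed order.
    n = len(x)
    k = n.bit_length() - 1
    if n < 1 or n != 1 << k:
        raise ValueError("length must be a power of two")
    w = cmath.exp(-2j * cmath.pi / n)
    half = n // 2
    for stage in range(k):
        y = [0j] * n
        for j in range(half):
            a, b = x[j], x[j + half]                 # same read pattern every stage
            twiddle = w ** ((j >> stage) << stage)   # stage-dependent diagonal
            y[2 * j] = a + b                         # butterfly, then shuffle on write
            y[2 * j + 1] = (a - b) * twiddle
        x = y
    return x

def bit_reverse(i, bits):
    return int(format(i, "0{}b".format(bits))[::-1], 2)

def naive_dft(x):
    # O(n^2) reference used only to check the result.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * f / n) for t in range(n))
            for f in range(n)]

samples = [1, 2, 3, 4, 0, -1, -2, -3]
scrambled = constant_geometry_fft(samples)
reference = naive_dft(samples)
max_error = max(abs(scrambled[bit_reverse(f, 3)] - reference[f]) for f in range(8))
print("matches naive DFT_8 after bit reversal:", max_error < 1e-9)
```

Because the loop body is the same for every stage, a hardware implementation can build one stage and reuse it, which is the property the folding slides rely on.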

Repeating column structure: hardware reuse without performance penalty

Horizontal folding

our baseline design; degree of freedom: vertical parallelism, parameter p

[Figure labels: input, bypass, register]

Vertical (V-)folding according to p

latency

Fine-grained control over cost/latency tradeoff
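The tradeoff can be illustrated with an idealized cycle count. This is a simplified model, not the generator's actual timing: it assumes one horizontally folded stage column with p radix-2 butterflies, reused for all log2(n) stages, one butterfly result per unit per cycle, and no pipeline or memory overhead. The function name `folded_cycles` is hypothetical.

```python
def folded_cycles(n, p):
    # Idealized cycles for one n-point transform on a single reused stage column
    # containing p radix-2 butterflies (assumption: no pipeline or memory stalls).
    stages = n.bit_length() - 1           # log2(n) passes through the column
    butterflies_per_stage = n // 2
    cycles_per_stage = -(-butterflies_per_stage // p)   # ceiling division
    return stages * cycles_per_stage

for p in (1, 2, 4, 8, 16, 32):
    print("p =", p, "butterfly units ->", folded_cycles(1024, p), "cycles")
```

In this model, doubling p doubles the number of butterfly units and halves the cycle count, which is the kind of cost/latency knob the parameter p is meant to expose.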

Outline

Introduction; Formula-Driven Design Generation; Microarchitecture Parameterization; Generator User Interface; Experimental Results; Conclusions

User Interface

http://www.spiral.net/hardware/dftgen.html

[Screenshot labels: common DFT options; customization options]

Outline

Introduction; Formula-Driven Design Generation; Microarchitecture Parameterization; Generator User Interface; Experimental Results; Conclusions

We compare Xilinx’s fixed design against our variable generated designs

Evaluation

We compare against Xilinx LogiCore DFT Ver. 3.1: radix-4, burst I/O interface

                             Xilinx                            SPIRAL
datapath                     fixed, one radix-4 basic block    variable, p radix-2 basic blocks
cost-performance tradeoff    fixed                             user-controlled, varies with p

Comparison

DFT n = {64, 1024, 2048}; width = 16; bit-reversed output

Xilinx ISE ver. 6.1, Xilinx Virtex2-Pro XC2VP100-6

DFT1024 relative to Xilinx

[Figure: logic, storage, and performance panels plotted against p = 1, 2, 4, 8, 16, 32, normalized to the Xilinx design. 1.0 = 1955 slices; 1.0 = 7 BRAMs; 1.0 = 1 transform / 5.6 µsec. Legend: Min Slice, Min BRAM, Balanced, Xilinx]

Performance and resources scale with p

Resource usage preferences

[Figure: results for p = 1, 2, 4, 8, 16, 32. Legend: Min Slice, Min BRAM, Balanced, Xilinx]

Can control tradeoff between slices and BRAMs: exchange BRAM for slices; very little change in performance

DFT64 and DFT2048

[Figure: results for p = 1, 2, 4, 8, 16, 32, normalized to the Xilinx design. Legend: Min Slice, Min BRAM, Balanced, Xilinx. Normalizations shown: 1.0 = 2140 slices, 1.0 = 7 BRAMs, 1.0 = 1 transform / 24.578 µsec; and 1.0 = 1743 slices, 1.0 = 8 BRAMs, 1.0 = 1 transform / 0.648 µsec]

Trends hold for sizes 64, 2048

Related Work

Kumhom, Johnson, Nagvajara, ASIC/SOC 2000: universal FFT processor; microarchitecture based on processing elements interconnected by on-chip reconfigurable network; microarchitecture is scalable in the number of elements; supports both Cooley-Tukey and Pease

Choi, Scrofano, Prasanna, Jang, FPGA 2003: mapped radix-4 Cooley-Tukey algorithm onto log2(n)/2 DFT4 primitives; scalable datapath between 1 element and 4 elements at a time; show energy and performance improvements from scaling

Conclusions

Parameterized DFT IP generator: matrix formula-driven synthesis

performance/cost tradeoff: fine-grained control over resources vs. latency

resource usage preference: can balance tradeoff between slices and BRAM

Key results: efficient (the Xilinx design point can be matched); customizable (design tradeoffs directly controllable); easy to use (simple yet powerful web interface)

Web Generator

This work is part of the SPIRAL project, which aims to push the limits of automation in software and hardware development for DSP algorithms. For more information visit: www.spiral.net

http://www.spiral.net/hardware/dftgen.html
