
Vector FPGA Acceleration of 1-D DWT Computations using Sparse Matrix Skeletons

Sidharth Maheshwari, Gourav Modi, Siddhartha, Nachiket Kapre
School of Computer Science and Engineering, Nanyang Technological University

Upload: ngokien

Post on 11-Apr-2018



Matrix-Form 1-D DWT

• Formulation: $C = TM \times X$, where $TM = \prod_{i} T_i$
• TM matrix is highly sparse
  Ø Large number of multiply-by-zero operations
  Ø Large memory footprint consisting of zeroes
• Goals:
  Ø SIMD-friendly operations on non-zero values only
  Ø Customized DMA routines for efficient bandwidth utilization
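As a minimal sketch of the formulation above, the following Python builds TM for a small signal as a product of per-level matrices and measures how many of its entries are zero. It assumes Haar filters, N = 16 and three levels purely for illustration; the slides do not state which wavelet is used, and the helper names (haar_level_matrix, transform_matrix, zero_fraction) are hypothetical.

```python
import math

def haar_level_matrix(n, m):
    """n x n matrix applying one Haar analysis step to the first m entries
    and passing the remaining n - m entries through unchanged (assumed filters)."""
    s = 1.0 / math.sqrt(2.0)
    t = [[0.0] * n for _ in range(n)]
    half = m // 2
    for i in range(half):
        # Low-pass (averaging) row
        t[i][2 * i] = s
        t[i][2 * i + 1] = s
        # High-pass (differencing) row
        t[half + i][2 * i] = s
        t[half + i][2 * i + 1] = -s
    for i in range(m, n):
        t[i][i] = 1.0  # keep detail coefficients from earlier levels
    return t

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transform_matrix(n, levels):
    """TM as the product of per-level matrices, level 1 applied first."""
    tm = haar_level_matrix(n, n)
    for level in range(1, levels):
        tm = matmul(haar_level_matrix(n, n >> level), tm)
    return tm

def zero_fraction(a):
    total = len(a) * len(a[0])
    zeros = sum(1 for row in a for v in row if v == 0.0)
    return zeros / total

N, LEVELS = 16, 3                     # illustrative sizes only
TM = transform_matrix(N, LEVELS)
X = [float(i) for i in range(N)]
C = matvec(TM, X)                     # C = TM x X
print(f"zero entries in TM: {zero_fraction(TM):.1%}")
```

Under these assumptions three quarters of TM is zero, and for a fixed number of levels that share grows with N, which is the waste the goals above target.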


Sparse Matrix Skeleton

• Remove multiply-by-zero operations
• Reduction in memory footprint of TM
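One possible way to picture a skeleton is a per-row list of the non-zero coefficients and their column positions, so the multiply loop never touches a zero. The sketch below assumes that layout and a 4 x 4 one-level averaging/differencing matrix; the actual skeleton format used on the MXP is not given in the slides, and build_skeleton and skeleton_matvec are hypothetical names.

```python
def build_skeleton(dense):
    """Keep only (column, value) pairs of the non-zero entries, row by row."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]

def skeleton_matvec(skeleton, x):
    """Multiply using the skeleton; returns the result and the multiply count."""
    out, multiplies = [], 0
    for row in skeleton:
        acc = 0.0
        for col, val in row:
            acc += val * x[col]
            multiplies += 1
        out.append(acc)
    return out, multiplies

# Illustrative one-level transform: two averaging rows, two differencing rows.
DENSE = [
    [0.5,  0.5, 0.0,  0.0],
    [0.0,  0.0, 0.5,  0.5],
    [0.5, -0.5, 0.0,  0.0],
    [0.0,  0.0, 0.5, -0.5],
]
SKELETON = build_skeleton(DENSE)
X = [4.0, 2.0, 6.0, 8.0]
C, MULTIPLIES = skeleton_matvec(SKELETON, X)
STORED = sum(len(row) for row in SKELETON)
print(C, MULTIPLIES, STORED)
```

Here the skeleton stores 8 values instead of 16 and performs 8 multiplies instead of 16, while producing the same result as the dense product.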


Modified Matrix-Form 1-D DWT

N = 65536


VectorBlox MXP

• Lanes: 16-32
• Scratchpad: 64-128 KB
• DMA bandwidth: 4-32 B/cycle
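To get a feel for these figures, here is a rough sizing sketch for the N = 65536 input. It assumes 32-bit samples and that half of the scratchpad is available for input data (for example to allow double buffering); neither assumption comes from the slides, and the cycle counts are ideal transfer times, not measurements.

```python
N = 65536              # samples, from the "Modified Matrix-Form 1-D DWT" slide
BYTES_PER_SAMPLE = 4   # assumption: 32-bit samples (not stated in the slides)
USABLE_FRACTION = 0.5  # assumption: half the scratchpad holds input data

def ceil_div(a, b):
    return -(-a // b)

def dma_cycles(num_bytes, bytes_per_cycle):
    """Ideal cycles to stream num_bytes at a sustained DMA rate."""
    return ceil_div(num_bytes, bytes_per_cycle)

def tiles_needed(num_bytes, scratchpad_kb):
    """Number of chunks the input must be split into to fit the scratchpad."""
    usable = int(scratchpad_kb * 1024 * USABLE_FRACTION)
    return ceil_div(num_bytes, usable)

INPUT_BYTES = N * BYTES_PER_SAMPLE
for scratchpad_kb in (64, 128):
    print(f"{scratchpad_kb} KB scratchpad: "
          f"{tiles_needed(INPUT_BYTES, scratchpad_kb)} tiles")
for bytes_per_cycle in (4, 32):
    print(f"{bytes_per_cycle} B/cycle: "
          f"{dma_cycles(INPUT_BYTES, bytes_per_cycle)} cycles")
```

Under these assumptions the input alone (256 KB) exceeds either scratchpad size, so the data has to be streamed in pieces, and the DMA rate sets a floor on the transfer time.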


Results - Speedup

$N = 2^{16}$, $L = 6$ and $k = 3$

[Figure: speedup over baseline CPU (Raspberry Pi, ZedBoard, BeagleBone Black) for MXP-DE2, MXP-DE4 and MXP-ZedBoard; axis ticks from 0 to 60]


Summary

• We propose a Modified Matrix-Form scheme to unlock inherent parallelism in 1-D DWT
• We exploit the sparsity pattern in TM to reduce complexity from $O(n^2)$ to $O(n)$ using:
  Ø Skeletons to avoid wasteful multiply-by-zero operations
  Ø Rearrangement of input samples
• Speedups of 12-103x over the state-of-the-art in-built signal library in Octave (dwt function)
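The complexity claim can be made concrete with a rough count, sketched below under the assumption of 2-tap Haar filters (an illustrative choice, not confirmed by the slides; the filters actually used may be longer). A dense product with an n x n matrix needs n^2 multiplies, while counting only the non-zero entries of an L-level Haar TM gives (L + 1) n. The function names are hypothetical.

```python
def dense_multiplies(n):
    """Multiplies for a dense n x n matrix-vector product."""
    return n * n

def haar_skeleton_multiplies(n, levels):
    """Non-zero entries of an L-level Haar TM (n must be divisible by 2**levels)."""
    total = 0
    for j in range(1, levels + 1):
        rows = n >> j                 # detail rows produced at level j
        total += rows * (1 << j)      # each row spans 2**j input samples
    total += (n >> levels) * (1 << levels)  # final approximation rows
    return total

for n in (1 << 15, 1 << 16):
    d = dense_multiplies(n)
    s = haar_skeleton_multiplies(n, 6)
    print(f"n={n}: dense={d}, skeleton={s}, ratio={d // s}")
```

Doubling n doubles the skeleton count but quadruples the dense count, which is the $O(n^2)$ versus $O(n)$ behaviour for a fixed number of levels.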


Experimental Setup

Matrix-form 1-D DWT

Sparse Matrix Skeletons

CPU
- Optimized OpenBLAS routines in Octave and C (compiled with -O3)
- Performance measured using PAPI v5.4.3
- 32b ARMv7 on BeagleBone Black, ZedBoard, and ARMv6 on Raspberry Pi

CPU+MXP
- Customized DMA routines for data transfer between host and MXP
- 16-32 vector lanes
- 64-128 KB scratchpad memory
- Performance measured using MXP Timing API
- Altera DE2/DE4 and ZedBoard


Results - Throughput

$N = 2^{16}$, $L = 6$ and $k = 3$

[Figure: energy (mJ, ticks 20 to 80) versus throughput (GOps/s, ticks 0.1 and 1.0) for ARM (Beagl.), ARM (Rasp.), ARM (Zedb.), MXP-DE2, MXP-DE4 and MXP-Zed]


CHALLENGES:
• Large volume of data
• Strict real-time processing constraints
• High accuracy demands
• Energy constraints, especially in embedded systems


Modified Matrix-Form 1-D DWT

Rearrangement
