Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data
Tom VanCourt, Yongfeng Gu, Martin Herbordt
Boston University, ECE Department, CAAD lab
www.bu.edu/caadlab
CAMP '05: 3D Template Matching
3D Template Matching
- Increasing use of volumetric data sets
  - MRI / CAT, confocal microscopy, molecule structure
- Increased complexity of correlation
  - 2D: $O(n^2)$ (x,y) × $O(n^1)$ rotations = $O(n^3)$
  - 3D: $O(n^3)$ (x,y,z) × $O(n^3)$ rotations = $O(n^6)$
- Transform techniques help a little
  - $O(n^3) \to O(n^2 \log n)$
  - $O(n^6) \to O(n^4 \log n)$
- Solution: application-specific accelerators
  - Programmable off-the-shelf hardware
  - Custom logic design, unique to each application
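To give a feel for the growth rates above, the sketch below counts candidate poses (positions times rotations) for a given n. The function name `pose_count` and the choice n = 100 are illustrative assumptions, not taken from the slides, and constant factors are ignored.

```python
def pose_count(n: int, dims: int) -> int:
    """Number of (position, rotation) candidates, ignoring constant factors.

    Assumes n samples per positional axis and per rotational degree of
    freedom: one rotation angle in 2D, three in 3D.
    """
    if dims == 2:
        positions, rotations = n ** 2, n ** 1
    elif dims == 3:
        positions, rotations = n ** 3, n ** 3
    else:
        raise ValueError("dims must be 2 or 3")
    return positions * rotations


if __name__ == "__main__":
    n = 100  # illustrative problem size
    print(f"2D: {pose_count(n, 2):.0e} poses")
    print(f"3D: {pose_count(n, 3):.0e} poses")
```

Under these assumptions, moving from 2D to 3D at the same n multiplies the search space by a factor of n cubed.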
Volumetric Data Sets
- Complex data types
  - Multiple fluorescence channels
  - Oriented data: flow vectors
  - Nonlinear scoring models
- True 3D data acquisition
  - Medical imaging (MRI, PET, CAT, …)
  - Confocal microscopy
  - Emerging techniques:
    - Diffusion tensor tomography
COTS AND Custom? How?
- Field Programmable Gate Arrays
  - 1000s of uncommitted elements
  - Custom processor built on demand
  - On-chip RAM bandwidth: >1 Tbit/sec
  - Massive parallelism: 100s-1000s of PEs
- Accelerator is tailored to each application
  - ~100% payload computation cycles
    - No load/store cycles
    - No loop overhead cycles
    - No address arithmetic cycles
  - ~0% logic dedicated to unused features
Acceleration Strategy
- Standard approach: Rotated Image, Molecule Grid → Transform Per Channel (FFT) → Products of Transforms → FFT⁻¹ → Correlation Result
- Accelerated approach: Molecule Grid → Rotated Addressing → Direct Correlation by Systolic Array → Correlation Result
Correlation Pipeline
- Pipeline stages: Rotated Image Access → Voxel Value Rotation → Systolic 3D Correlation → Data Reduction Filtering
  - Customizable functions
  - High data reuse
- Direct correlation
  - Beats FFT for modest problems
  - Generalizes correlation sum: $\sum_{ijk} F(A_{xyz}, T_{ijk})$
- Natural for FPGA implementation
  - Regular structure
  - Simple data elements
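The generalized correlation sum above can be sketched in software as a set of nested loops with a pluggable scoring function F. This is a minimal reference sketch, not the accelerator's design: the names `direct_correlation` and `score`, the `[z][y][x]` list layout, and the restriction to offsets where the template lies fully inside the image (no padding) are all assumptions made for illustration.

```python
from typing import Callable, List

Volume = List[List[List[float]]]  # indexed [z][y][x] (assumed layout)


def direct_correlation(image: Volume, template: Volume,
                       score: Callable[[float, float], float]) -> Volume:
    """Sum score(A, T) over the whole template at every valid offset."""
    iz, iy, ix = len(image), len(image[0]), len(image[0][0])
    tz, ty, tx = len(template), len(template[0]), len(template[0][0])
    result: Volume = []
    for z in range(iz - tz + 1):
        plane = []
        for y in range(iy - ty + 1):
            row = []
            for x in range(ix - tx + 1):
                total = 0.0
                # Inner loops run over template indices (i, j, k).
                for k in range(tz):
                    for j in range(ty):
                        for i in range(tx):
                            total += score(image[z + k][y + j][x + i],
                                           template[k][j][i])
                row.append(total)
            plane.append(row)
        result.append(plane)
    return result
```

Passing `lambda a, t: a * t` gives the conventional correlation sum; a nonlinear choice such as `lambda a, t: -abs(a - t)` shows how the same loop structure accommodates other scoring models.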
Rotated Memory Access
- Load image once & reuse
- Access image in rotated order via index transformation

$$
\begin{pmatrix} x_i & x_j & x_k \\ y_i & y_j & y_k \\ z_i & z_j & z_k \end{pmatrix}
\begin{pmatrix} i \\ j \\ k \end{pmatrix}
=
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
$$

- Allows axis scaling, mirror reversal
  - Anisotropic: e.g. X,Y resolution ≠ Z
  - No need for resampling
- ~0 delay & buffer overhead
  - Strength reduction eliminates multiplication
  - Arithmetic cost hidden by pipelining
[Figure: image axes x, y and rotated index axes i, j]
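One way to read "strength reduction eliminates multiplication" is that each step of i, j, or k adds the matching column of the index-transformation matrix to a running address, so no matrix multiply is needed per voxel. The sketch below illustrates that idea in software. The name `rotated_addresses`, the loop order (k outermost, i innermost), and the use of floating-point coordinates are assumptions; rounding to voxel addresses and fixed-point arithmetic, which hardware would need, are left out.

```python
from typing import Iterator, List, Tuple

Vec = Tuple[float, float, float]


def rotated_addresses(m: List[List[float]], size: int) -> Iterator[Vec]:
    """Yield (x, y, z) = M * (i, j, k) for i, j, k in range(size).

    k is the outermost loop and i the innermost. Only additions are used:
    stepping i, j, or k by one adds the matching column of M.
    """
    col_i = (m[0][0], m[1][0], m[2][0])
    col_j = (m[0][1], m[1][1], m[2][1])
    col_k = (m[0][2], m[1][2], m[2][2])

    def add(p: Vec, q: Vec) -> Vec:
        return (p[0] + q[0], p[1] + q[1], p[2] + q[2])

    start_k: Vec = (0.0, 0.0, 0.0)
    for _ in range(size):
        start_j = start_k
        for _ in range(size):
            point = start_j
            for _ in range(size):
                yield point
                point = add(point, col_i)   # step i
            start_j = add(start_j, col_j)   # step j
        start_k = add(start_k, col_k)       # step k
```

Because the matrix columns need not be unit length or mutually orthogonal, the same loop also covers axis scaling and mirror reversal, for example a matrix whose third column is (0, 0, 2) for data with coarser Z resolution.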
Voxel Value Rotation
- Not needed for scalar data (RGB, gray scale, etc.)
  - Step exists architecturally, as identity transform
- For spatially oriented data (e.g. fluid flow in brain tissue)
  - Perform rigid rotation of image …
  - Then rotate oriented voxel values
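For oriented data, the second step amounts to applying the rotation to each voxel's value as well as to its position. A minimal sketch, assuming the voxel value is a 3-component vector and the rotation is given as a 3x3 matrix (the names `rotate_value` and `ROT_Z_90` are illustrative, not from the slides):

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]

# Illustrative rotation: 90 degrees about the z axis.
ROT_Z_90 = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]


def rotate_value(m: List[List[float]], v: Vec) -> Vec:
    """Apply the 3x3 rotation matrix m to an oriented voxel value v."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))
```

A flow vector pointing along +x, `(1.0, 0.0, 0.0)`, points along +y after `rotate_value(ROT_Z_90, ...)`. For scalar data the step reduces to the identity.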
Correlation Array
- 3D extension of conventional array
- Custom unit cell
  - Holds constant value for template
  - Custom F(a, b)
- … 1D array + line buffer
  - Extend line to result width
- … 2D array + plane buffer
  - Extend plane to result size
- … 3D array
  - One input voxel per cycle, padded
  - One output correlation point per cycle
[Figure: unit cell with image input A, stored template value T, function F, an adder, partial-sum input S_in and output S_out; RAM FIFOs connect rows of cells]
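The 1D case of the array can be mimicked in software to show how one input per cycle yields one output per cycle. This is a behavioral sketch under stated assumptions: each cell holds one template value and computes S_out = S_in + F(A, T), the input value is broadcast to all cells in the same cycle, and partial sums shift one cell per cycle. The name `systolic_correlate_1d` is hypothetical, and the actual cell timing in the slides' design may differ.

```python
from typing import Callable, List, Sequence


def systolic_correlate_1d(stream: Sequence[float],
                          template: Sequence[float],
                          score: Callable[[float, float], float]) -> List[float]:
    """Simulate a 1D chain of cells, one input value per cycle.

    Returns c[p] = sum_i score(stream[p + i], template[i]) for every p
    where the template fits entirely inside the stream.
    """
    m = len(template)
    sums = [0.0] * m               # S_out register of each cell
    outputs: List[float] = []
    for t, a in enumerate(stream):
        # All cells update in lockstep from the previous cycle's registers:
        # cell j takes S_in from cell j-1; cell 0 takes S_in = 0.
        sums = [(sums[j - 1] if j > 0 else 0.0) + score(a, template[j])
                for j in range(m)]
        if t >= m - 1:             # pipeline is full: last cell holds a result
            outputs.append(sums[-1])
    return outputs
```

After the first `len(template) - 1` cycles of fill, every further input produces one finished correlation point from the last cell, which is the per-cycle throughput the slide describes.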
3D Correlation Result
- Template is stored in computation array
- FIFOs hold partial correlation sums
[Figure labels: template data and computation array; FIFO line buffers, pad to result width; FIFO plane buffers, pad to result depth; 3D correlation result, whole volume shown; correlation complete, result passed to data reduction filter]
Peak Capture / Data Reduction
- 3D result ≥ image size
  - Full result would slow host
- Template may occur > 1x
  - Find multiple maxima
- Reporting N highest points is not effective
  - [Figure labels: broad maximum reported redundantly; local maxima missed]
- Instead: local max by region
  - 8x8x8 region: 512:1 reduction
  - More maxima, less redundancy
  - Record exact (x,y,z) in region
  - BUT may miss close maxima
    - Region template size may be OK
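The region-based reduction above can be sketched as follows: the result volume is split into fixed-size blocks, and only the largest value in each block is kept, together with its exact coordinates. The function name `regional_maxima`, the `[z][y][x]` list layout, and the dictionary output format are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Volume = List[List[List[float]]]  # indexed [z][y][x] (assumed layout)
Peak = Tuple[float, Tuple[int, int, int]]  # (value, (x, y, z))


def regional_maxima(result: Volume, region: int = 8) -> Dict[Tuple[int, int, int], Peak]:
    """Keep one peak per region x region x region block.

    Keys are block indices (bz, by, bx); values hold the block's largest
    correlation value and its exact (x, y, z) position in the volume.
    """
    peaks: Dict[Tuple[int, int, int], Peak] = {}
    for z, plane in enumerate(result):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                key = (z // region, y // region, x // region)
                if key not in peaks or value > peaks[key][0]:
                    peaks[key] = (value, (x, y, z))
    return peaks
```

With `region=8`, each block of 512 result points is reduced to a single record. Two true maxima that fall inside the same block yield only one record, which is the "may miss close maxima" caveat noted on the slide.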
Why Reconfigurable?
- Massive parallelism, modest cost
  - COTS hardware, tracks technology
- Application-optimized processing
  - Tracks application changes
    - Ex: 1, 2, 3-channel fluorescence
  - Flexible performance tradeoffs
  - Allows non-linear scoring
- Available now
  - PC add-ins
  - SGI Altix
  - Cray XD1
[Figure labels: 24-bit RGB; 8-bit mono; 4-bit]
Performance Results

Voxel value          Voxel bits  Logic per PE (slices)  Number of PEs  Clock (MHz)  Speed (10^9 SAC/sec)
2-tuple              2           11                     2744 = 14^3    51.5         141.9
3-tuple              7           21                     1331 = 11^3    46.1         61.3
2-tuple (nonlinear)  5           44                     729 = 9^3      30.6         22.2
2-tuple              6           35                     729 = 9^3      38.3         27.9
4-tuple (oriented)   7           16                     1331 = 11^3    46.3         61.7

- Xilinx Virtex-II Pro VP70
- Measured: score-accumulate per sec (SAC/sec)
- Complex models not limited in number of bits
- Simple models not limited by worst-case speed
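The reported speeds are close to the product of PE count and clock rate. The sketch below checks that arithmetic, assuming each PE completes one score-accumulate per clock cycle; that assumption is a reading of the table, not a statement from the slides. The names `ROWS` and `throughput_gsac` are illustrative.

```python
# (voxel value, number of PEs, clock in MHz, reported speed in 10^9 SAC/sec)
ROWS = [
    ("2-tuple", 2744, 51.5, 141.9),
    ("3-tuple", 1331, 46.1, 61.3),
    ("2-tuple (nonlinear)", 729, 30.6, 22.2),
    ("2-tuple", 729, 38.3, 27.9),
    ("4-tuple (oriented)", 1331, 46.3, 61.7),
]


def throughput_gsac(pes: int, clock_mhz: float) -> float:
    """Estimated speed in 10^9 SAC/sec, assuming one SAC per PE per cycle."""
    return pes * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    for name, pes, mhz, reported in ROWS:
        estimate = throughput_gsac(pes, mhz)
        print(f"{name:22s} estimated {estimate:6.1f}  reported {reported:6.1f}")
```

Under this assumption, every estimate lands within about one percent of the reported figure; the small differences are consistent with rounding of the clock rates.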
Conclusions
- Accelerators enable 3D template matching
  - >100x speedup over 3D FFT (n~100)
  - Complex data types, including vector values
  - Nonlinear comparisons supported
- Programmability avoids common limitations
  - No penalty due to over-generalization
  - No limit due to data/function restrictions
- 3D data and FPGA coprocessors match well
  - Both are emerging and expanding
  - FPGAs three years ago couldn't do it!