DSP Design
Case Study
Image Processing
Viktor Öwall, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se

Image processing from a hardware perspective
• Often massively parallel
– Can be used to increase throughput
• Memory intensive
– Storage size
– Memory bandwidth

2-dimensional Image Convolution
Image: N x N pixels. Kernel: M x M.
For each pixel position in the image, each kernel value is multiplied with the underlying pixel value, and the products are added to produce the output value:
$y(k_1,k_2) = \sum_{m_1}\sum_{m_2} x(k_1-m_1,\,k_2-m_2)\,h(m_1,m_2)$
A frame of width M/2 is added around the image to avoid border effects.
Image processing is memory intensive!
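
The convolution sum above can be sketched in a few lines. This is a minimal illustration, assuming a square kernel with odd M whose indices are centred on the output pixel, and a frame filled with zeros; the function name `convolve2d_framed` is hypothetical.

```python
import numpy as np

def convolve2d_framed(image, kernel):
    """Direct 2-D convolution; a zero frame of width M//2 keeps the output N x N."""
    n_rows, n_cols = image.shape
    m = kernel.shape[0]                      # square M x M kernel, M assumed odd
    pad = m // 2                             # frame width
    framed = np.pad(image, pad)              # zero-filled frame (assumption)
    out = np.zeros((n_rows, n_cols), dtype=np.result_type(image, kernel))
    for k1 in range(n_rows):
        for k2 in range(n_cols):
            acc = 0
            for m1 in range(m):
                for m2 in range(m):
                    # x(k1 - m1', k2 - m2') * h(m1', m2') with centred m' = m - pad
                    acc += framed[k1 + 2 * pad - m1, k2 + 2 * pad - m2] * kernel[m1, m2]
            out[k1, k2] = acc
    return out

# An impulse in the image reproduces the kernel around the impulse position.
impulse = np.zeros((5, 5), dtype=int)
impulse[2, 2] = 1
h = np.arange(9).reshape(3, 3)
print(convolve2d_framed(impulse, h))
```

The four nested loops make the cost visible: M x M multiplications and the same number of pixel reads for every output pixel.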

Edge detection and zero crossings with different kernel size

Grain recognition
Edge detection
Increasing filter size means more calculations
Filter size 15 x 15: 225 multiplications per pixel

What datapath architecture?
• Hardware mapped, i.e. 225 multipliers + adds
• Single MAC (Multiply Accumulate) unit
• Hardware for one column each clock cycle
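
The trade-off between the three options can be tabulated. This is a rough sketch that assumes ideal pipelining (no stalls, operands always available); the dictionary name `options` is only for illustration.

```python
M = 15  # kernel side of the 15 x 15 filter

# (number of multipliers, clock cycles per output pixel), assuming ideal pipelining
options = {
    "hardware mapped": (M * M, 1),
    "single MAC unit": (1, M * M),
    "one column per clock cycle": (M, M),
}

for name, (multipliers, cycles) in options.items():
    print(f"{name}: {multipliers} multipliers, {cycles} cycles per output pixel")
```

In every case the product of multipliers and cycles is 225; the choice only moves the cost between area and time.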

Adder tree structure of processor core
• 15 pixels read on each clock cycle
• Pipelined adder tree
• Accumulator
• Increased wordlength to keep precision and avoid overflow
• Guard bits in accumulator and truncated output
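
How much the wordlength must grow can be estimated from the number of terms that are added. The sketch below assumes unsigned 8-bit pixels and 8-bit coefficients; neither width is stated on the slide, and the function name `word_lengths` is hypothetical.

```python
import math

def word_lengths(pixel_bits, coeff_bits, kernel_side):
    """Worst-case wordlengths after the multiplier, the adder tree and the accumulator."""
    product = pixel_bits + coeff_bits
    # Adding `kernel_side` products (one column) needs ceil(log2(kernel_side)) extra bits.
    after_tree = product + math.ceil(math.log2(kernel_side))
    # Accumulating all kernel_side**2 products needs ceil(log2(kernel_side**2)) extra bits.
    after_accumulator = product + math.ceil(math.log2(kernel_side ** 2))
    return product, after_tree, after_accumulator

print(word_lengths(8, 8, 15))
```

The difference between the accumulator width and the product width is the number of guard bits; the final output can then be truncated to the wordlength the next stage needs.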

Datapath Chip, 1993
1 µm standard CMOS technology
approx. 50 000 transistors
die area 8 x 6.5 mm²

2-dimensional Image Convolution
Off-chip image memory
• Large
• High power
Every pixel is used in several calculations; without reuse the number of image memory accesses is
$M^2 (N-M+1)^2$
Multi-level memory hierarchy can be used.

How to use line memories
• Initial filling
• New line
• Each pixel operation: one new pixel is read + shift one pixel between memories
Only one external memory read per pixel!
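
The effect of line memories can be checked with a small software model. This is a simplified sketch, not the chip's design: it keeps the M most recent image lines in a `deque` instead of shifting pixel by pixel, computes a plain sum of products without kernel flip, and only produces the positions where the kernel fits completely. The function name is hypothetical.

```python
from collections import deque

import numpy as np

def filter_with_line_memories(image, kernel):
    """Filter an image while reading every pixel from external memory exactly once."""
    n_rows, n_cols = image.shape
    m = kernel.shape[0]
    lines = deque(maxlen=m)                  # on-chip line memories (M most recent lines)
    external_reads = 0
    out = np.zeros((n_rows - m + 1, n_cols - m + 1),
                   dtype=np.result_type(image, kernel))
    for r in range(n_rows):
        lines.append(image[r].copy())        # the only external accesses: one per pixel
        external_reads += n_cols
        if len(lines) < m:
            continue                         # initial filling
        block = np.stack(lines)              # M lines available on chip
        for c in range(n_cols - m + 1):
            out[r - m + 1, c] = np.sum(block[:, c:c + m] * kernel)
    return out, external_reads

img = np.arange(36).reshape(6, 6)
result, reads = filter_with_line_memories(img, np.ones((3, 3), dtype=int))
print(reads)   # 36 external reads for 36 pixels
```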

Memory Hierarchy, accesses
[Figure: image memory (off-chip), line memories and kernel memories, annotated with the number of accesses between the levels. Line memories: M-1 with N words, 1 with M words.]
Number of accesses per memory level:
Scheme               Image             Line              Kernel
Image                $M^2(N-M+1)^2$    -                 -
Image line           $N^2$             $M^2(N-M+1)^2$    -
Image kernel         $MN(N-M+1)$       -                 $M^2(N-M+1)^2$
Image line kernel    $N^2$             $MN(N-M+1)$       $M^2(N-M+1)^2$

Memory Hierarchy, energy
• Image memory, off-chip, 0.18 µm: 60 nJ/access
• Line memories: 4 nJ/access
• Kernel memories: 1 nJ/access
• On-chip memories in 0.35 µm CMOS
N = 1024, M = 15, wordlength = 16
Scheme               Energy
Image                13.8 J
Image line           1.0 J
Image kernel         1.2 J
Image line kernel    0.4 J
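
The energy figures follow from the access counts and the energy per access. The short calculation below uses only numbers from the slides (N = 1024, M = 15, 60/4/1 nJ per access); the variable and function names are illustrative.

```python
N, M = 1024, 15
ENERGY_PER_ACCESS = {"image": 60e-9, "line": 4e-9, "kernel": 1e-9}   # joule

window_accesses = M * M * (N - M + 1) ** 2   # M^2 (N-M+1)^2
line_fill = M * N * (N - M + 1)              # M N (N-M+1)

# Number of accesses per memory level for each scheme.
schemes = {
    "Image": {"image": window_accesses},
    "Image line": {"image": N * N, "line": window_accesses},
    "Image kernel": {"image": line_fill, "kernel": window_accesses},
    "Image line kernel": {"image": N * N, "line": line_fill, "kernel": window_accesses},
}

def energy(accesses):
    """Total energy in joule for a dict of {memory level: number of accesses}."""
    return sum(count * ENERGY_PER_ACCESS[level] for level, count in accesses.items())

for name, accesses in schemes.items():
    print(f"{name}: {energy(accesses):.1f} J")
```

Most of the saving comes from moving the $M^2(N-M+1)^2$ operand fetches away from the off-chip memory.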

Tailored architecture in the datapath design
[Figure: image processor without controller. Large off-chip memories (level 1, 256x256) feed an input buffer; a new value is fed during each new pixel operation. Line memories with pipelined registers (level 2, 15x256), with unfilled memory elements marked. Cache with cyclic column storage (level 3, 15x16), 4-bit address line; a new column is written to the cache for each new pixel operation as the kernel moves one pixel to the right. Four processor cores (1 to 4) and APUs; system bus 15x8 bits; control signals from controller; data out 4x24 bits.]

Compared to a TMS320C80 Multimedia Video Processor (MVP)
• Published 1995
• Designed: 20 MHz
• MVP: 50 MHz
• MVP: 4 parallel DSPs + 1 master processor [3].
• Each DSP unit contains one 16x16 bit multiplier, which can be split into two 8x8 bit multipliers

Memory Considerations

We have registers, why memories?
• D flip-flop: 252 µm²
• Memory element: 30 µm²

Flip-flops vs. SRAM
Alcatel Microelectronics 0.35 µm CMOS technology process
Process and library dependent but same trends
[Figure: area in square mm (0 to 1.8) versus number of memory elements (0 to 4500) for flip-flops, dual port memory, single port memory and double width memory.]
Crossover at approx. 200 bits for this technology

Hardware Aspects of a Real-time Surveillance System

An Intelligent Surveillance System
[Figure: processing chain with the blocks Segmentation, Morphology, Labeling, Feature extraction, Tracking and Object classification.]
Input: Video from stationary camera
Output: Tracked objects
Spec: Xilinx Virtex II-Pro Development Platform, resolution 320x240, 25 frames per second
• Architectures for local decisions
• Embedded system requires real time and low power

The PhD Student Team
Three PhD students:
• Hongtu Jiang: sensor interface and segmentation
– PhD February 2007
• Fredrik Kristensen: system overview, feature extraction and tracking
– PhD September 2007
• Hugo Hedberg: morphology and labeling
– PhD April 2008
References, see: www.es.lth.se/asicdsp

The end result

System
[Figure: CAM, segmentation algorithm, morph filter and labeling, feature extraction, tracking. Example output: Object 1, size = 1037, position = (56, 180), color_1 = 137, …]

Segmentation
• Detects motion
• Generates a noisy binary mask due to errors caused by camera, fast light changes etc.

Background Modeling
Consecutive video frames: P1(x0, y0), P2(x0, y0), P3(x0, y0), ..., Pn(x0, y0)
Sample background environment in the digital lab
Pixel values taken from the same location in consecutive video frames look like a Gaussian distribution in RGB color space, i.e. even when nothing is happening it is not a single value.
[Figure: scatter plot of the pixel values in RGB space, axes R, G and B.]

Multi-modal Background
More complicated background pixels, such as a lake surface and swaying trees, show two distributions and require two Gaussians to model.
[Figure: scatter plot of the pixel values in RGB space, axes R, G and B, with two clusters.]

Video segmentation based on Gaussian Mixture background Model (Stauffer and Grimson)
• Detects moving objects in image sequences (motion detection)
• Each pixel over time is a “pixel process”, modeled by Gaussian distributions
• Each background object corresponds to one Gaussian
• GMM is robust for handling multi-modal background situations
– swaying trees
– lake surface
– etc.
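
The idea of a per-pixel mixture can be sketched as follows. This is a strongly simplified illustration in the spirit of Stauffer and Grimson, not the implementation described in the lecture: each Gaussian has an RGB mean, one shared variance and a weight, and the learning rate, initial variance, variance floor and background weight limit are all assumed values.

```python
import numpy as np

K = 3              # Gaussians per pixel
ALPHA = 0.05       # learning rate (assumed)
MATCH = 2.5        # a pixel matches within 2.5 standard deviations
INIT_VAR = 100.0   # variance of a newly created Gaussian (assumed)
MIN_VAR = 4.0      # variance floor (assumed)
BG_WEIGHT = 0.5    # a matched Gaussian above this weight counts as background (assumed)

def new_pixel_model():
    return {"mean": np.zeros((K, 3)), "var": np.full(K, INIT_VAR), "weight": np.zeros(K)}

def update_pixel(model, rgb):
    """Update the mixture of one pixel in place; return True if rgb is foreground."""
    rgb = np.asarray(rgb, dtype=float)
    dist2 = np.sum((model["mean"] - rgb) ** 2, axis=1)
    matches = (dist2 < MATCH ** 2 * model["var"]) & (model["weight"] > 0)
    if matches.any():
        # Sort by weight / standard deviation and take the best matching Gaussian.
        order = np.argsort(-model["weight"] / np.sqrt(model["var"]))
        k = next(i for i in order if matches[i])
        model["weight"] *= 1 - ALPHA
        model["weight"][k] += ALPHA
        model["mean"][k] += ALPHA * (rgb - model["mean"][k])
        model["var"][k] = max(MIN_VAR, model["var"][k] + ALPHA * (dist2[k] - model["var"][k]))
        foreground = model["weight"][k] < BG_WEIGHT
    else:
        # No match: replace the least probable Gaussian with one centred on the pixel.
        k = int(np.argmin(model["weight"]))
        model["mean"][k] = rgb
        model["var"][k] = INIT_VAR
        model["weight"][k] = ALPHA
        foreground = True
    model["weight"] /= model["weight"].sum()
    return bool(foreground)

model = new_pixel_model()
for _ in range(50):
    update_pixel(model, (100, 110, 120))      # learn a static background colour
print(update_pixel(model, (100, 110, 120)))   # background pixel
print(update_pixel(model, (250, 10, 10)))     # very different colour: foreground
```

Note that every pixel carries K sets of parameters that are read and written each frame, which is why the parameter memory dominates the hardware design.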

Hardware Implementation Considerations
[Figure: the RGB pixel stream passes through sorting of Gaussians, matching network and decision network, producing a bitmask; labeling and post-processing produce a labeled bitstream. The Gaussian parameter memory is marked as the memory bottleneck.]
• Fully parallel and pipelined design aiming for one pixel per clock cycle
• Most important design parameter: high memory bandwidth
– 15 variables/pixel + RGB, i.e. 5 parameters for each Gaussian distribution x 3

Hardware Implementation Considerations
[Figure: the same pipeline, now with encoding, decoding and a buffer between the Gaussian parameter memory and the processing blocks.]
• Idea: Neighbouring pixels have similar parameters
• Use some form of Run Length Encoding
• Simulations show reduction of memory access by >50%
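
A run-length scheme over parameter sets could look like the sketch below. It is only an illustration of the idea: the slide does not specify the encoding, and the tolerance-based similarity test and the function names are assumptions. Each pixel's parameters are represented as a tuple of numbers.

```python
def run_length_encode(params, tolerance=0):
    """Collapse runs of neighbouring parameter sets that stay within `tolerance`
    of the first set in the run. Lossy when tolerance > 0."""
    runs = []
    for p in params:
        if runs and all(abs(a - b) <= tolerance for a, b in zip(runs[-1][0], p)):
            runs[-1][1] += 1                  # extend the current run
        else:
            runs.append([tuple(p), 1])        # start a new run
    return [(value, count) for value, count in runs]

def run_length_decode(runs):
    """Expand (value, count) pairs back to one parameter set per pixel."""
    return [value for value, count in runs for _ in range(count)]

row = [(100, 5), (101, 5), (100, 6), (150, 9), (150, 9)]
encoded = run_length_encode(row, tolerance=1)
print(encoded)                       # two runs instead of five parameter sets
print(len(run_length_decode(encoded)))
```

Comparing against the first element of the run, rather than the previous pixel, keeps slow drifts from being merged into one run.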

Memory bandwidth reduction
[Figure: a Gaussian distribution represented as a cube centred on the mean, with side given by variance x 2.5; two overlapping Gaussian distributions (red cube). Block diagram: Kodak CMOS sensor, matching & sorting, bitmask; parameter saving and parameter reforming towards the DDR SDRAM.]
• Pros: If Gaussians with 80% overlap are regarded as the “same” Gaussian, more than 60% memory saving can be expected
• Cons: more noise is generated in the binary mask
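
One way to quantify the overlap of two such cubes is the fraction of one cube's volume that lies inside the other. The sketch below is an assumption about how the overlap could be measured, not the method used in the design; it takes the half side of each cube as 2.5 standard deviations, and the function name is hypothetical.

```python
def cube_overlap(mean_a, sigma_a, mean_b, sigma_b, k=2.5):
    """Fraction of cube A (centre mean_a, half side k*sigma_a) that lies inside cube B."""
    volume = 1.0
    for centre_a, centre_b in zip(mean_a, mean_b):
        low = max(centre_a - k * sigma_a, centre_b - k * sigma_b)
        high = min(centre_a + k * sigma_a, centre_b + k * sigma_b)
        if high <= low:
            return 0.0                        # no overlap along this colour axis
        volume *= high - low
    return volume / (2 * k * sigma_a) ** 3

# Two Gaussians with the same spread whose means differ slightly in R.
print(cube_overlap((100, 100, 100), 4.0, (104, 100, 100), 4.0))
```

With a threshold of 0.8, these two Gaussians would sit right at the limit of being treated as the same one.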

Memory Reduction Results
[Figure: two plots of memory bandwidth reduction (about 0.45 to 0.95): versus frame number (0 to 1000) for thresholds 0.5, 0.6, 0.7, 0.8 and 0.9, and versus threshold (0.5 to 0.9).]
• Different memory bandwidth savings with different thresholds
• Too low a threshold results in clustered noise that cannot be removed by morphology

Memory Reduction Results
[Figure: segmentation with different thresholds, and the results after morphology; clustered noise remains.]

Segmentation results
Shadow reduction is important!

[Figure: original image and output image after segmentation.]

System
[Figure: the system block diagram repeated, now focusing on the morph filter and labeling block.]

[Figure: segmented input image and output image after clustering.]

Morphology
Greek morphe ”shape”, -ology ”the study of”: the study of shapes
• Applies to many number representations
– In our application, only binary input is considered
• Structuring element (SE)
– Sliding window
[Figure: an arbitrary binary image (every pixel 1 or 0) and a 3x3 SE of ones with its origin marked.]

Morphology
Similar to convolution but more on the logic level
Important operations
• Erosion: shrinks (minimum)
• Dilation: expands (maximum)
• Opening (erosion followed by dilation):
– Noise reduction
• Closing (dilation followed by erosion):
– Reconnect split objects
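
The four operations can be sketched for binary images and rectangular SEs of ones. The border handling is an assumption of this sketch: pixels outside the image count as 1 for erosion and 0 for dilation, and the SE origin is taken at its centre (odd sizes).

```python
import numpy as np

def erode(image, se_height, se_width):
    """Binary erosion: AND over the SE window (minimum)."""
    ph, pw = se_height // 2, se_width // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=1)
    out = np.ones_like(image)
    for dr in range(se_height):
        for dc in range(se_width):
            out &= padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
    return out

def dilate(image, se_height, se_width):
    """Binary dilation: OR over the SE window (maximum)."""
    ph, pw = se_height // 2, se_width // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=0)
    out = np.zeros_like(image)
    for dr in range(se_height):
        for dc in range(se_width):
            out |= padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
    return out

def opening(image, se_height, se_width):
    return dilate(erode(image, se_height, se_width), se_height, se_width)

def closing(image, se_height, se_width):
    return erode(dilate(image, se_height, se_width), se_height, se_width)

# Opening removes an isolated noise pixel but keeps the 3x3 object.
noisy = np.zeros((7, 7), dtype=np.uint8)
noisy[2:5, 2:5] = 1
noisy[0, 6] = 1
print(opening(noisy, 3, 3))

# Closing reconnects an object that is split by a one-pixel gap.
split = np.zeros((7, 9), dtype=np.uint8)
split[2:5, 2:4] = 1
split[2:5, 5:7] = 1
print(closing(split, 3, 3))
```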

Morphology
[Figure: an object and an SE, with the results of erosion and dilation.]
Opening:
• Erosion followed by dilation
– Noise reduction

Morphology, erosion (”and”)
[Figure: worked example. A binary image is eroded with a 3x3 SE of ones; an output pixel is 1 only where all pixels under the SE are 1.]

Morphology, dilation (”or”)
[Figure: worked example. The same binary image is dilated with a 3x3 SE of ones; an output pixel is 1 where at least one pixel under the SE is 1.]

Morphology, sliding window
Direct-mapped implementation
[Figure: a 6x6 index image (pixels 1 to 36) is streamed through a 3x3 array of flip-flops (ff) and two row FIFOs; the flip-flop outputs feed the erosion / dilation logic that produces the output. Three snapshots show the window contents as the input advances, from pixels 1, 2, 3 / 7, 8, 9 / 13, 14, 15 to pixels 22, 23, 24 / 28, 29, 30 / 34, 35, 36.]
Pros:
• Supports arbitrary SEs
Cons:
• Unsuitable for large SEs
Compare to the 2D convolution architecture!

Decomposition
$B = B_1 \oplus B_2$
[Figure: a rectangular SE of size SE_width x SE_height is decomposed into a row SE of length SE_width and a column SE of length SE_height.]

Decomposition, 1st step
[Figure: the binary image is eroded with the row SE (1 1 1).]
Decomposition, 2nd step
[Figure: the result of the first step is eroded with the column SE of three ones; the output equals the erosion with the full 3x3 SE.]
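
That the two steps give the same result as the full SE can be checked numerically. The sketch below uses a hypothetical helper `erode_rect` with pixels outside the image treated as 1 (an assumption), and a random test image.

```python
import numpy as np

def erode_rect(image, height, width):
    """Erosion with a height x width SE of ones (AND over all shifted copies)."""
    padded = np.pad(image, ((height // 2, height // 2), (width // 2, width // 2)),
                    constant_values=1)
    rows, cols = image.shape
    out = np.ones_like(image)
    for dr in range(height):
        for dc in range(width):
            out &= padded[dr:dr + rows, dc:dc + cols]
    return out

rng = np.random.default_rng(0)
A = (rng.random((12, 16)) < 0.8).astype(np.uint8)   # mostly ones, some holes

full = erode_rect(A, 3, 3)                          # 9 terms per pixel
two_step = erode_rect(erode_rect(A, 1, 3), 3, 1)    # 3 + 3 terms per pixel
print(np.array_equal(full, two_step))
```

For an SE of size W x H the decomposed form needs W + H terms per pixel instead of W x H, which is what makes large SEs affordable.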

Morphology architecture
[Figure: two counter stages, each built from a flip-flop (ff), an adder (+), muxes that reset the count to ’0’, and a comparator.
Stage 1: number of ones in the same row, compared with the SE width. Example with SE width = 3, newest sample leftmost: the input ..., 0, 1, 1, 1, 1, 0, ... gives the output ..., 0, 1, 1, 0, 0, 0, ...
Stage 2: number of consecutive lines with SE width ones, held in a row memory and compared with the SE height.
Together the stages implement erosion with an SE_width x SE_height SE.]
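
A software model of the two counter stages is sketched below. It assumes saturating counters and produces each output when the last pixel under the SE arrives, so the result is positioned at the bottom-right corner of the SE rather than at its centre; the function name is hypothetical.

```python
def erode_streaming(rows, se_width, se_height):
    """Erode a binary image, delivered row by row, with an se_width x se_height SE."""
    row_mem = None                        # stage 2: one line counter per column
    out = []
    for row in rows:
        if row_mem is None:
            row_mem = [0] * len(row)
        run = 0                           # stage 1: consecutive ones in this row
        out_row = []
        for col, pixel in enumerate(row):
            run = min(run + 1, se_width) if pixel else 0
            stage1 = run == se_width      # SE width ones end at this pixel
            row_mem[col] = min(row_mem[col] + 1, se_height) if stage1 else 0
            out_row.append(int(row_mem[col] == se_height))
        out.append(out_row)
    return out

# Stage 1 alone (SE height 1) on the stream from the slide, in time order.
print(erode_streaming([[0, 1, 1, 1, 1, 0]], se_width=3, se_height=1))

# A 3x3 block of ones eroded with a 3x3 SE leaves a single pixel.
image = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
print(erode_streaming(image, 3, 3))
```

The only storage is one small counter per image column, independent of the SE size apart from the counter width.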

Duality
$A \oplus B = (A' \ominus B)'$, where $'$ denotes the complement (B symmetric about its origin)
[Figure: example with an image A and an SE B, showing A ⊕ B on one side and A', A' ⊖ B and (A' ⊖ B)' on the other; both give the same result.]
Both operations can be performed on the same hardware by inverting the input and output streams.
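
The duality can be verified numerically for a symmetric SE. In this sketch the border values are chosen so that the complement relation also holds at the image edge (outside pixels count as 0 for dilation and 1 for erosion), which is an assumption of the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(image, height, width, pad_value):
    """All height x width neighbourhoods of the padded image."""
    padded = np.pad(image, ((height // 2, height // 2), (width // 2, width // 2)),
                    constant_values=pad_value)
    return sliding_window_view(padded, (height, width))

def erode(image, height, width):
    return _windows(image, height, width, 1).min(axis=(2, 3))

def dilate(image, height, width):
    return _windows(image, height, width, 0).max(axis=(2, 3))

rng = np.random.default_rng(1)
A = rng.integers(0, 2, size=(8, 10), dtype=np.uint8)

direct = dilate(A, 3, 3)
via_erosion = 1 - erode(1 - A, 3, 3)      # invert input, erode, invert output
print(np.array_equal(direct, via_erosion))
```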

Morphology architecture
[Figure: the complete datapath.
Stage 0: inverts if dilation is performed.
Stage 1: number of ones in the same row, compared with the SE width.
Stage 2: number of consecutive lines with SE width ones, held in a row memory and compared with the SE height.
Stage 3: inverts if dilation is performed.
Stages 0 and 3 implement the duality. Signals shown: In, Out, Operation, North, West, South & East.]

Morphology
In our application
• Noise reduction
• Reconnect split objects
Low complexity architecture with low memory requirements

Prototype

Embedded Hardware Platform
[Figure: FPGA chip with sensor FIFO, Segm., Morph, Label and Feat. blocks, label memories 0 and 1, feature memories 0 and 1, a PPC with SW memory, result memory, a read-and-draw-boxes block, VGA controller and VGA memory, connected by a bus; external DDR memory, sensor and display.]

Digital Holography
Transposition

Digital Holography
Application: microscope based on digital holography
• A digital image sensor replaces the photographic film
• Interference pattern, reference and object light is captured separately
• Computer algorithm generates the image
[Figure: laser, reference light, object, object light and digital image sensor.]

Advantage 1 - Phase information
Makes transparent objects visible
[Figure panels: amplitude, unwrapped phase, refraction index.]

Advantage 2 – Focus
All focus information in one single recording
[Figure: head of a greenfly, scale bar 1 mm.]

Phase Holographic Imaging
A cell analyzer to envision and monitor transparent living cells in vitro, in their growth environment, without the need for artificial staining. It makes quantification of a large number of parameters possible in real time.
[Figure: pseudo 3D-image of cells generated from the phase information. www.phiab.se]

Time-lapse study of cell division:
Wilms' tumor is a rare type of kidney cancer that affects children.

Time-lapse study: a sequence of consecutive images

Important issues
• Processing and efficiency
– Processor vs. FPGA/ASIC
• Memory access and throughput
• FFT selection

XSTREAM - 2D FFT
• A two-dimensional FFT can be evaluated by
– Applying a one-dimensional FFT over the rows
– Applying a one-dimensional FFT over the columns of the result
• Burst read: column access is slow
– Transpose the memory between operations and only operate on rows
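
The row-only scheme can be checked against a library 2D FFT. This is a minimal NumPy sketch of the principle; the function name `fft2_rows_only` is hypothetical and nothing here models the XSTREAM hardware itself.

```python
import numpy as np

def fft2_rows_only(x):
    """2-D FFT using only row-wise 1-D FFTs and transposes."""
    step1 = np.fft.fft(x, axis=1)          # 1-D FFT over the rows
    step2 = np.fft.fft(step1.T, axis=1)    # transpose, then rows again (the old columns)
    return step2.T                         # transpose back to the original orientation

rng = np.random.default_rng(0)
x = rng.random((8, 16))
print(np.allclose(fft2_rows_only(x), np.fft.fft2(x)))
```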

Memory and throughput
Overhead = (Setup+N) / N
• N=1: overhead 800%
• N=32: overhead 21%
[Figure: burst access of words 0 to N-1 after the setup time.]
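
The formula can be evaluated for different burst lengths. The setup time is not given on the slide; the value of 7 cycles used below is an assumption, inferred from the N=1 figure, and the function name is hypothetical.

```python
def relative_access_time(burst_length, setup_cycles=7):
    """(Setup + N) / N: cycles spent per transferred word."""
    return (setup_cycles + burst_length) / burst_length

for n in (1, 4, 32, 256):
    print(n, round(relative_access_time(n), 3))
```

With this assumed setup time a single-word access costs 8 cycles per word, while a burst of 32 words costs about 1.22 cycles per word, roughly in line with the figures on the slide.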

XSTREAM - Transpose
• Divide the matrix into macro-blocks (32x32)
– Transpose macro-blocks individually
– Relocate transposed macro-blocks

Divided Transpose
[Figure: animation in four steps. An 8x8 matrix holding 1 to 64 in row order is transposed in 4x4 macro-blocks: each block is transposed individually and written to its mirrored block position, until the result holds 1, 9, 17, ..., 57 in its first row and 8, 16, 24, ..., 64 in its last row.]
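
The divided transpose can be sketched as follows. This is a software illustration of the principle only; the function name is hypothetical, and a block size of 4 is used in the example so that it matches the 8x8 figure (the slides use 32x32 macro-blocks).

```python
import numpy as np

def transpose_blocked(matrix, block=32):
    """Transpose by macro-blocks: transpose each block, then relocate it."""
    n_rows, n_cols = matrix.shape
    out = np.empty((n_cols, n_rows), dtype=matrix.dtype)
    for r in range(0, n_rows, block):
        for c in range(0, n_cols, block):
            tile = matrix[r:r + block, c:c + block]    # rows of a tile are read as bursts
            out[c:c + block, r:r + block] = tile.T     # transposed tile at mirrored position
    return out

matrix = np.arange(1, 65).reshape(8, 8)
print(transpose_blocked(matrix, block=4))
```

Each tile row is a contiguous run of `block` words, so both reads and writes can use bursts of that length instead of single-word column accesses.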

XSTREAM - 2D FFT
A ”rather” small burst size gives a large gain!