
DSP Design

Case Study

Image Processing

Viktor Öwall, Dept. of Electrical and Information Technology, Lund University, Sweden-www.eit.lth.se

Image processing from a hardware perspective

• Often massively parallel
– Can be used to increase throughput
• Memory intensive
– Storage size
– Memory bandwidth

2-dimensional Image Convolution

For each pixel position in the image, the kernel values are multiplied with the underlying pixel values and the products are added to produce the output value:

$$y(k_1,k_2)=\sum_{m_1}\sum_{m_2} x(k_1-m_1,\,k_2-m_2)\,h(m_1,m_2)$$

A frame is added to avoid border effects.

Image processing is memory intensive!

[Figure: an N x N image scanned by an M x M kernel.]
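The sum of products on this slide can be sketched directly in software. This is a minimal illustration, not the hardware implementation: the function name `convolve2d_valid`, the test image and the 2 x 2 kernel are assumptions made for the example, and the output is computed only where the kernel fits entirely inside the image (no frame is added).

```python
def convolve2d_valid(image, kernel):
    """Direct 2-D convolution, computed only where the kernel fits in the image."""
    n_rows, n_cols = len(image), len(image[0])
    m_rows, m_cols = len(kernel), len(kernel[0])
    out = []
    for k1 in range(m_rows - 1, n_rows):
        row = []
        for k2 in range(m_cols - 1, n_cols):
            acc = 0
            # One multiply-accumulate per kernel element: M*M per output pixel.
            for m1 in range(m_rows):
                for m2 in range(m_cols):
                    acc += image[k1 - m1][k2 - m2] * kernel[m1][m2]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4 x 4 image and 2 x 2 kernel, chosen only for illustration.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]
result = convolve2d_valid(image, kernel)
print(result)
```

Every output pixel re-reads M x M input pixels, which is why the memory traffic, rather than the arithmetic, tends to dominate the design.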

Edge detection and zero crossings with different kernel size

Grain recognition

Edge detection: increasing the filter size means more calculations.

Filter size 15 x 15: 225 multiplications per pixel

What datapath architecture?

• Hardware mapped, i.e. 225 multipliers + adds
• Single MAC (Multiply Accumulate) unit
• Hardware for one column each clock cycle
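The three datapath options trade hardware for clock cycles. A rough sketch of that trade-off for the 15 x 15 kernel follows; it assumes every multiplier completes one multiply-accumulate per clock cycle and ignores pipeline fill, which the slides do not specify.

```python
KERNEL_SIDE = 15
MACS_PER_PIXEL = KERNEL_SIDE * KERNEL_SIDE  # 225 multiplications per pixel


def cycles_per_pixel(multipliers):
    """Clock cycles per output pixel if each multiplier does one MAC per cycle."""
    return -(-MACS_PER_PIXEL // multipliers)  # ceiling division


options = {
    "hardware mapped (225 multipliers)": 225,
    "single MAC unit": 1,
    "one column per clock cycle (15 multipliers)": 15,
}
for name, multipliers in options.items():
    print(f"{name}: {cycles_per_pixel(multipliers)} cycles/pixel")
```

The column-wise option sits between the two extremes: 15 multipliers and 15 cycles per output pixel.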

Adder tree structure of processor core

• 15 pixels read on each clock cycle
• Pipelined adder tree followed by an accumulator
• Increased wordlength to keep precision and avoid overflow
• Guard bits in accumulator and truncated output
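The wordlength growth can be estimated with a small calculation. The sketch below assumes unsigned 8-bit pixels and 8-bit kernel coefficients; neither wordlength is stated on this slide, so both are assumptions, and `accumulator_bits` is a name made up for the example.

```python
def accumulator_bits(pixel_bits, coeff_bits, terms):
    """Bits needed to sum `terms` unsigned products without overflow."""
    product_bits = pixel_bits + coeff_bits   # full-precision product
    guard_bits = (terms - 1).bit_length()    # ceil(log2(terms)) guard bits
    return product_bits + guard_bits


# 15 x 15 kernel: 225 products are accumulated per output pixel.
bits = accumulator_bits(pixel_bits=8, coeff_bits=8, terms=225)
print(bits)
```

Under these assumptions the accumulator needs 16 product bits plus 8 guard bits; the output is then truncated back to the word size the next stage expects.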

Datapath Chip, 1993

1 µm standard CMOS technology, approx. 50 000 transistors, die area 8 x 6.5 mm²

2-dimensional Image Convolution

Off-chip image memory
• Large
• High power

Every pixel is used in several calculations: $M^2(N-M+1)^2$ image memory accesses for an N x N image and an M x M kernel.

Multi-level memory hierarchy can be used.

How to use line memories

• Initial filling
• New line
• Each pixel operation: only one external memory read per pixel, plus a shift of one pixel between memories
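The line-memory idea can be modelled in software. The following is a behavioural sketch only: it sums each m x m window (a kernel of all ones) instead of applying real coefficients, and the names `stream_window_sums`, `line_mems` and `window` are assumptions made for the example.

```python
from collections import deque


def stream_window_sums(image, m):
    """Raster-scan the image once and return the sum of every m x m window.

    Only the newest pixel is read from 'external' memory; the m-1 older rows
    come from line memories, and the window shifts one pixel per step.
    """
    n = len(image[0])
    line_mems = [[0] * n for _ in range(m - 1)]             # m-1 line memories
    window = [deque([0] * m, maxlen=m) for _ in range(m)]   # m x m registers
    external_reads = 0
    results = []
    for r, row in enumerate(image):
        for c in range(n):
            pixel = row[c]                  # the single external read
            external_reads += 1
            # Column entering the window: older rows first, newest pixel last.
            column = [line_mems[i][c] for i in range(m - 1)] + [pixel]
            # Shift one pixel between the line memories.
            for i in range(m - 2):
                line_mems[i][c] = line_mems[i + 1][c]
            if m > 1:
                line_mems[m - 2][c] = pixel
            for i in range(m):
                window[i].append(column[i])
            if r >= m - 1 and c >= m - 1:   # window lies fully inside the image
                results.append(sum(sum(w) for w in window))
    return results, external_reads


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
sums, reads = stream_window_sums(image, 2)
print(sums, reads)
```

The read counter confirms the point of the slide: each image pixel is fetched from the external memory exactly once.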

Memory Hierarchy, accesses

Memory sizes:
• Image memory (off-chip): $N^2$ words
• Line memories: $(M-1)N + M$ words ($M-1$ memories with $N$ words, 1 with $M$ words)
• Kernel: $M^2$ words

Number of accesses to each memory level:

Scheme            | Image            | Line             | Kernel
Image             | $M^2(N-M+1)^2$   | –                | –
Image line        | $N^2$            | $M^2(N-M+1)^2$   | –
Image kernel      | $MN(N-M+1)$      | –                | $M^2(N-M+1)^2$
Image line kernel | $N^2$            | $MN(N-M+1)$      | $M^2(N-M+1)^2$

Memory Hierarchy, energy

Energy per access:
• Image memory (off-chip, 0.18 µm): 60 nJ/access
• Line memories: 4 nJ/access
• Kernel: 1 nJ/access
On-chip memories: 0.35 µm CMOS

N = 1024, M = 15, wordlength = 16

Scheme            | Energy
Image             | 13.8 J
Image line        | 1.0 J
Image kernel      | 1.2 J
Image line kernel | 0.4 J
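The energy figures can be cross-checked from the access counts and the per-access energies. The sketch below assumes that 60 nJ applies to the off-chip image memory, 4 nJ to the line memories and 1 nJ to the kernel memory, and that a scheme's energy is simply accesses times energy per access at each level; variable names are made up for the example.

```python
N, M = 1024, 15
E_IMAGE, E_LINE, E_KERNEL = 60e-9, 4e-9, 1e-9   # joules per access

full = M**2 * (N - M + 1)**2    # M^2 (N-M+1)^2: every kernel tap, every position
partial = M * N * (N - M + 1)   # MN(N-M+1)
once = N**2                     # each image pixel read exactly once

# (image accesses, line accesses, kernel accesses) per scheme
schemes = {
    "image": (full, 0, 0),
    "image line": (once, full, 0),
    "image kernel": (partial, 0, full),
    "image line kernel": (once, partial, full),
}


def energy(accesses):
    image, line, kernel = accesses
    return image * E_IMAGE + line * E_LINE + kernel * E_KERNEL


energies = {name: energy(acc) for name, acc in schemes.items()}
for name, joules in energies.items():
    print(f"{name}: {joules:.2f} J")
```

Rounded to one decimal, the results agree with the table, which supports this reading of the flattened access table.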

Tailored architecture in the datapath design

[Figure: image processor without controller.
• Level 1: large off-chip memories (256x256) with input buffer; a new value is fed during each new pixel operation.
• Level 2: line memories with pipelined registers (15x256); some memory elements are unfilled.
• Level 3: cache (15x16) with cyclic column storage and a 4-bit address line; a new column is written to the cache for each new pixel operation, moving the kernel one pixel to the right.
• Four processor cores and APUs, a 15x8-bit system bus, 4x24-bit data out, and control signals from the controller.]

Compared to a TMS320C80 Multimedia Video Processor (MVP)

• Published 1995
• Designed: 20 MHz
• MVP: 50 MHz
• MVP: 4 parallel DSPs + 1 master processor [3]. Each DSP unit contains one 16x16 bit multiplier, which can be split into two 8x8 bit multipliers

Memory Considerations

We have registers, why memories?

• D flip-flop: 252 µm²
• Memory element: 30 µm²

Flip-flops vs. SRAM

Alcatel Microelectronics 0.35 µm CMOS technology process

Process and library dependent but same trends

[Figure: area in square mm (0 to 1.8) versus number of memory elements (0 to 4500) for flip-flops, dual port memory, single port memory and double width memory.]

Crossover at approx. 200 bits for this technology

Hardware Aspects of a Real-time Surveillance System

An Intelligent Surveillance System

[Figure: block diagram with Segmentation, Morphology, Labeling, Feature extraction, Tracking and Object classification.]

Input: Video from stationary camera
Output: Tracked objects

Spec:
• Xilinx Virtex II-Pro Development Platform
• Resolution 320x240
• 25 frames per second
• Architectures for local decisions
• Embedded system requires real time and low power

The PhD Student Team

Three PhD students:
• Hongtu Jiang: sensor interface and segmentation
– PhD February 2007
• Fredrik Kristensen: system overview, feature extraction and tracking
– PhD September 2007
• Hugo Hedberg: morphology and labeling
– PhD April 2008

References see: www.es.lth.se/asicdsp

The end result

System

[Figure: system pipeline from camera (CAM) through segmentation algorithm, morph filter and labeling, feature extraction and tracking. Example output: Object 1, size = 1037, position = (56, 180), color_1 = 137, …]

Segmentation

• Detects motion
• Generates a noisy binary mask due to errors caused by camera, fast light changes etc.

Background Modeling

[Figure: pixel values P1(x0, y0), P2(x0, y0), P3(x0, y0), …, Pn(x0, y0) taken at the same location in consecutive video frames 1, 2, 3, …, n of a sample background environment in the digital lab, plotted in RGB color space.]

Pixel values taken from the same location in consecutive video frames look like a Gaussian distribution in RGB color space, i.e. even when nothing is happening it is not a single value.

Multi-modal Background

[Figure: scatter plot of background pixel values in RGB color space.]

More complicated background pixels, such as lake surfaces and swaying trees, have the property of two distributions, requiring two Gaussians to model.

Video segmentation based on Gaussian Mixture background Model (Stauffer and Grimson)

Motion detection:
• Detect moving objects in image sequences
• Each pixel over time is a “pixel process”, modeled by Gaussian distributions
• Each background object corresponds to one Gaussian
• GMM is robust for handling multi-modal background situations
– swaying trees
– lake surface
– etc.

Hardware Implementation Considerations

[Figure: an RGB pixel stream passes through sorting of Gaussians, a matching network and a decision network to produce a bitmask, followed by labeling (labeled bitstream) and post-processing. The Gaussian parameter memory is marked as the memory bottleneck.]

Fully parallel and pipelined design aiming for one pixel per clock cycle

Most important design parameter: high memory bandwidth: 15 variables/pixel + RGB, i.e. 5 parameters for each Gaussian distribution x 3

Hardware Implementation Considerations

[Figure: the same pipeline with encoding/decoding + buffer added at the Gaussian parameter memory.]

• Idea: Neighbouring pixels have similar parameters
• Use some form of Run Length Encoding
• Simulations show reduction of memory access by >50%
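The slides only say "some form of Run Length Encoding", so the exact scheme is not known. As a generic illustration of why similar neighbouring parameters reduce memory traffic, here is a plain run-length coder; the parameter tuples are made-up values, and real parameters would first have to be treated as equal within some tolerance.

```python
def rle_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


# Hypothetical per-pixel parameter sets along one image row.
row_params = [(120, 9)] * 5 + [(64, 4)] * 2 + [(120, 9)] * 3
encoded = rle_encode(row_params)
print(encoded)
print(len(encoded), "stored entries instead of", len(row_params))
```

Ten parameter sets shrink to three stored runs here; the saving in a real system depends on how often neighbouring pixels actually share parameters.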

Memory bandwidth reduction

[Figure: Kodak CMOS sensor, matching & sorting, parameter saving and parameter reforming blocks connected to DDR SDRAM, with bitmask output. A Gaussian distribution is represented as a cube, labelled with its mean and variance x 2.5; two overlapping Gaussian distributions are shown, one as a red cube.]

• Pros: If Gaussians with 80% overlap are regarded as the “same” Gaussian, more than 60% memory saving can be expected
• Cons: more noise is generated in the binary mask

Memory Reduction Results

[Figure: two plots of memory bandwidth reduction, one versus frame number (0 to 1000) for different thresholds, and one versus threshold (0.5 to 0.9).]

• Different memory bandwidth savings with different threshold
• Too low threshold results in clustered noise that cannot be removed by morphology

Memory Reduction Results

[Figure: segmentation with different thresholds and the results after morphology, showing clustered noise.]

Segmentation results

Shadow reduction is important!

[Figure: original image and output image after segmentation.]

System

[Figure: the system pipeline repeated, now at the morph filter and labeling stage.]

[Figure: segmented input image and output image after clustering.]

Morphology

Greek morphe ”shape”, –ology ”the study of”: the study of shapes

• Applies to many number representations
– In our application, only binary input is considered
• Structuring element (SE)
– Sliding window

[Figure: an arbitrary binary image (each pixel 1/0) and a 3x3 SE of ones with its origin marked.]

Morphology

Similar to convolution but more on the logic level

Important operations
• Erosion: Shrinks (minimum)
• Dilation: Expands (maximum)
• Opening (erosion followed by dilation):
– Noise reduction
• Closing (dilation followed by erosion):
– Reconnect split objects
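The four operations can be sketched for binary images as follows. This is a minimal software model with a rectangular SE of ones centred on its origin and zeros assumed outside the image; the function names and the 6x6 test image are illustrative assumptions, not taken from the slides.

```python
def _window(img, r, c, h, w):
    """Pixels under an h x w SE centred at (r, c); outside the image counts as 0."""
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr in range(-(h // 2), h - h // 2)
            for dc in range(-(w // 2), w - w // 2)]


def erode(img, h, w):
    """Minimum (logical AND) over the SE: shrinks objects."""
    return [[int(all(_window(img, r, c, h, w))) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img, h, w):
    """Maximum (logical OR) over the SE: expands objects."""
    return [[int(any(_window(img, r, c, h, w))) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img, h, w):
    return dilate(erode(img, h, w), h, w)   # noise reduction


def closing(img, h, w):
    return erode(dilate(img, h, w), h, w)   # reconnect split objects


# Hypothetical image: a 3x3 object plus one isolated noise pixel at (3, 5).
img = [[0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]]
eroded = erode(img, 3, 3)
opened = opening(img, 3, 3)
for row in opened:
    print(row)
```

Erosion leaves only the centre of the 3x3 object, and the following dilation restores the object while the isolated noise pixel stays removed.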

Morphology

[Figure: erosion and dilation of an object with an SE.]

Opening:
• Erosion followed by dilation
– Noise reduction

Morphology, erosion (”and”)

[Figure: a 6x6 binary image eroded with a 3x3 SE of ones; an output pixel is 1 only where all nine pixels under the SE are 1.]

Morphology, dilation (”or”)

[Figure: the same 6x6 binary image dilated with a 3x3 SE of ones; an output pixel is 1 where at least one pixel under the SE is 1.]

Morphology, sliding window: direct-mapped implementation

[Figure: three snapshots of a 6x6 index image (pixels 1 to 36) streaming through a 3x3 window of flip-flops, with two FIFOs acting as row delays; the window feeds the erosion/dilation logic that produces the output.]

Pros:
• Supports arbitrary SEs

Cons:
• Unsuitable for large SEs

Compare to 2D-Convolution Architecture!

Decomposition

$$B = B_1 \oplus B_2$$

[Figure: a rectangular SE of size SE width x SE height is decomposed into a row SE of length SE width and a column SE of length SE height.]


Decomposition 1st step

[Figure: the 6x6 binary image is eroded with the 1x3 row SE (1 1 1).]

Decomposition 2nd step

[Figure: the result of the first step is eroded with the 3x1 column SE, which completes the erosion with the full 3x3 SE.]
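The two decomposition steps can be checked in software: eroding with a row SE and then with a column SE gives the same result as eroding once with the full rectangle. The sketch below assumes SEs of ones centred on their origin and zeros outside the image; the random test image and names are illustrative only.

```python
import random


def erode(img, h, w):
    """Binary erosion with an h x w SE of ones; outside the image counts as 0."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [[int(all(px(r + dr, c + dc)
                     for dr in range(-(h // 2), h - h // 2)
                     for dc in range(-(w // 2), w - w // 2)))
             for c in range(cols)] for r in range(rows)]


random.seed(0)  # fixed seed so the illustration is reproducible
img = [[int(random.random() < 0.7) for _ in range(8)] for _ in range(8)]

full = erode(img, 3, 3)                     # 9 pixels examined per output
two_step = erode(erode(img, 1, 3), 3, 1)    # 3 + 3 pixels examined per output
print(full == two_step)
```

For an SE of width W and height H the per-pixel work drops from W*H to W+H comparisons, which is what makes large SEs affordable.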

Morphology: architecture

Stage 1: Number of ones in the same row

[Figure: a flip-flop counter with muxes that either increment the count or reset it to ’0’, and a comparator against SE width. Example with SE width = 3: the input stream ..., 0, 1, 1, 1, 1, 0, ... gives the output stream ..., 0, 1, 1, 0, 0, 0, ...]

Morphology: architecture

Stage 1: Number of ones in the same row (compared with SE width)
Stage 2: Number of consecutive lines with SE width ones (compared with SE height)

[Figure: the stage 1 counter is followed by a second counter that keeps one count per column in a row memory; together the two stages implement an SE of size SE width x SE height.]
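The two counting stages can be modelled behaviourally as below. This is a sketch under assumptions: the counters are unbounded Python integers compared with ">=", whereas the hardware presumably saturates and compares for equality, and the output is aligned to the bottom-right corner of the SE (the pipeline latency) rather than to its centre.

```python
def stream_erode(img, se_width, se_height):
    """Erosion of a raster-scanned binary image using two counting stages."""
    cols = len(img[0])
    row_mem = [0] * cols      # stage 2: one count per column, kept across rows
    out = []
    for row in img:
        run = 0               # stage 1: consecutive ones in the current row
        out_row = []
        for c, bit in enumerate(row):
            run = run + 1 if bit else 0               # count up or reset to 0
            stage1 = run >= se_width                  # SE width ones in this row
            row_mem[c] = row_mem[c] + 1 if stage1 else 0
            out_row.append(int(row_mem[c] >= se_height))
        out.append(out_row)
    return out


# Stage 1 alone (SE height = 1) on a single row with four consecutive ones.
print(stream_erode([[0, 1, 1, 1, 1, 0]], 3, 1))

# Hypothetical 6x6 image with a 3x3 object and one noise pixel.
img = [[0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]]
result = stream_erode(img, 3, 3)
```

The only storage beyond a few registers is one row of counters, independent of the SE height.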

Duality

$$A \oplus B = (A' \ominus B)'$$
$$A \ominus B = (A' \oplus B)'$$

[Figure: example on a binary image A with a 3x3 SE B, showing that an operation on A gives the same result as the dual operation on the inverted image A' followed by inversion of the output.]

Both operations can run on the same hardware by inverting the input and output streams.
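Duality can be verified numerically: dilation equals erosion of the inverted image followed by inversion of the result. The sketch assumes a symmetric 3x3 SE of ones (a non-symmetric SE would additionally have to be reflected) and pads the inverted image with ones, since the original is assumed to be zero outside its borders; all names are illustrative.

```python
def _window(img, r, c, h, w, pad):
    """Pixels under an h x w SE centred at (r, c); `pad` is used outside the image."""
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else pad
            for dr in range(-(h // 2), h - h // 2)
            for dc in range(-(w // 2), w - w // 2)]


def erode(img, h, w, pad=0):
    return [[int(all(_window(img, r, c, h, w, pad))) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img, h, w, pad=0):
    return [[int(any(_window(img, r, c, h, w, pad))) for c in range(len(img[0]))]
            for r in range(len(img))]


def invert(img):
    return [[1 - p for p in row] for row in img]


def dilate_via_erosion(img, h, w):
    """Invert the input, erode, invert the output."""
    return invert(erode(invert(img), h, w, pad=1))


a = [[0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 0],
     [0, 1, 1, 1, 0, 0],
     [0, 1, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
print(dilate(a, 3, 3) == dilate_via_erosion(a, 3, 3))
```

This is why a single erosion datapath with optional inverters at its input and output is enough for both operations.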

Morphology: architecture

Stage 0: Inverts if dilation is performed
Stage 1: Number of ones in the same row
Stage 2: Number of consecutive lines with SE width ones
Stage 3: Inverts if dilation is performed

[Figure: the two-stage counter datapath with row memory, extended with input and output inverters selected by the operation (duality) and with muxes labelled North, West, South & East.]

Morphology

In our application
• Noise reduction
• Reconnect split objects

Low complexity architecture with low memory requirements

Prototype

Embedded Hardware Platform

[Figure: FPGA chip containing sensor interface and sensor FIFO, segmentation, morphology, labeling and feature blocks, a PPC with software memory, label memories 0 and 1, feature memories 0 and 1, result memory, a read-and-draw-boxes unit, and a VGA controller with VGA memory driving the display; a bus connects to external DDR memory.]

Digital Holography

Transposition

Digital Holography

Application: microscope based on digital holography
• A digital image sensor replaces the photographic film
• Interference pattern, reference and object light are captured separately
• Computer algorithm generates the image

[Figure: laser, reference light, object, object light and digital image sensor.]

Advantage 1 – Phase information

Makes transparent objects visible

[Figure: amplitude, unwrapped phase and refraction index images.]

Advantage 2 – Focus

All focus information in one single recording

[Figure: head of a greenfly, scale bar 1 mm.]

Phase Holographic Imaging

Cell analyzer to envision and monitor transparent living cells in vitro, in their growth environment, without the need for artificial staining; it makes quantification of a large number of parameters possible in real time.

[Figure: pseudo 3D-image of cells generated from the phase information. www.phiab.se]

Time-lapse study of cell division

Wilms' tumor is a rare type of kidney cancer that affects children.

Time-lapse study: a sequence of consecutive images

Important issues

• Processing and efficiency
– Processor vs. FPGA/ASIC
• Memory access and throughput
• FFT selection

XSTREAM - 2D FFT

• A two-dimensional FFT can be evaluated by
– Applying a one-dimensional FFT over the rows
– Applying a one-dimensional FFT over the columns of the result
• Burst read: column access is slow
– Transpose the memory between operations and only operate on rows
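The row/transpose/row scheme can be checked against the definition of the two-dimensional transform. In this sketch a naive O(n²) DFT stands in for the one-dimensional FFT, which is enough to show the data flow; the function names and the small test matrix are assumptions made for the example.

```python
import cmath


def dft(x):
    """Naive 1-D DFT, used here as a stand-in for a 1-D FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fft2_rows_only(m):
    """2-D transform that only ever operates on rows."""
    step1 = [dft(row) for row in m]                  # 1-D transform over the rows
    step2 = [dft(row) for row in transpose(step1)]   # former columns, now rows
    return transpose(step2)                          # restore the orientation


def dft2_direct(m):
    """2-D DFT straight from the definition, for comparison."""
    rows, cols = len(m), len(m[0])
    return [[sum(m[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)] for u in range(rows)]


m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
a, b = fft2_rows_only(m), dft2_direct(m)
max_err = max(abs(a[u][v] - b[u][v]) for u in range(3) for v in range(4))
print(max_err)
```

Because every pass reads whole rows, the memory can always be accessed in bursts; the cost is the transpose in between.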

Memory and throughput

Overhead = (Setup + N) / N

• N = 1: overhead 800%
• N = 32: overhead 21%

[Figure: burst access covering addresses 0 to N-1.]
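The overhead formula is easy to tabulate. The setup time is not given in the slides; the value of 7 cycles below is an assumption, chosen because it yields a ratio of 8.0 (800%) for N = 1. With the same value the ratio for N = 32 is about 1.22, that is roughly 22% extra cycles, close to the quoted 21%.

```python
def burst_overhead(setup_cycles, burst_length):
    """Total cycles per transferred word: (Setup + N) / N."""
    return (setup_cycles + burst_length) / burst_length


SETUP = 7  # assumed setup cycles per access; not stated in the slides

for n in (1, 2, 4, 8, 16, 32, 64):
    ratio = burst_overhead(SETUP, n)
    print(f"N={n:2d}: {ratio:.2f} cycles/word ({(ratio - 1) * 100:.0f}% extra)")
```

Most of the gain is already reached at moderate burst lengths; beyond that the ratio approaches 1 only slowly.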

XSTREAM - Transpose

• Divide the matrix into macro-blocks (32x32)
– Transpose macro-blocks individually
– Relocate transposed macro-blocks

Divided Transpose

[Figure: an 8x8 matrix numbered 1 to 64 row by row is transposed in four steps, one macro-block at a time, until the output is complete.]

Input:
 1  2  3  4  5  6  7  8
 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64

Output:
 1  9 17 25 33 41 49 57
 2 10 18 26 34 42 50 58
 3 11 19 27 35 43 51 59
 4 12 20 28 36 44 52 60
 5 13 21 29 37 45 53 61
 6 14 22 30 38 46 54 62
 7 15 23 31 39 47 55 63
 8 16 24 32 40 48 56 64
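The divided transpose can be written as a short routine. This sketch uses the 8x8 example matrix and assumes 4x4 macro-blocks, as the example frames suggest; the function name `transpose_blocked` is made up for the illustration, and the matrix side is assumed to be a multiple of the block size.

```python
def transpose_blocked(matrix, block):
    """Transpose a square matrix one macro-block at a time."""
    n = len(matrix)
    out = [[None] * n for _ in range(n)]
    for br in range(0, n, block):          # macro-block row origin
        for bc in range(0, n, block):      # macro-block column origin
            # Transpose the macro-block and relocate it to the mirrored position.
            for r in range(block):
                for c in range(block):
                    out[bc + c][br + r] = matrix[br + r][bc + c]
    return out


matrix = [[8 * r + c + 1 for c in range(8)] for r in range(8)]  # 1..64 row by row
result = transpose_blocked(matrix, 4)
for row in result:
    print(row)
```

Within one macro-block all reads and writes stay inside a few rows, so both sides of the transfer can use burst accesses.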

XSTREAM - 2D FFT

A ”rather” small burst size gives a large gain!