lecture 8 - detection and matched filters

University of Illinois at Urbana-Champaign 07/16/13

Detection and Template Matching

CS 598 PS Fall 2013

Friday, September 20, 13

UN I V E R S I T Y O F I L L I N O I S AT UR B A N A -CH A M P A I G N - CS598PS - FA L L 2013

2

A change in thinking

So far we did feature extraction Unsupervised learning

Now well do detection and classificationSupervised learning



3

Todays lecture

Detection of elements of interest

Template matching / matched filtersObtaining invariance/robustnessEfficient matchingUsage cases

Probabilistic interpretation



4

Detection

Find the presence of a predefined signalFace/pedestrian/car detectionSpeech, heartbeat, gunshot detectionEarthquakes, quasars, aliens, ...

Many ways to go about itWell start from the easy part



5

A simple example

Can we detect a known face in a collection?

Query

Search set



6

Vectorize the data(and scale to unit norm)

x = vec( ) =

Y = vec( ) =



7

And get their dot product

x Y =0 2 4 6 8 10 12 14 16 18

0.8

0.85

0.9

0.95

1

1.05



8

Dot product for matching

For two unit length vectors x,yThen x,y are increasingly similar as tends to zeroIf the dot product is maximized we have a closer match

x y = 1



9

Interpreting the dot product

Query

Search set

0 2 4 6 8 10 12 14 16 180.8

0.85

0.9

0.95

1

1.05



And it works ok for noisy inputs too10

Query

Search set

0 2 4 6 8 10 12 14 16 18

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

1.05

NoisyClean



11

Template matching

Our query is a templateUse dot products to compare it with inputs

Simple and very fast detector/classifierBut it wont get you too far!

Lets signal-ize it



12

A tougher case

Heartbeat data

How do we detect the beats?It can be all over the placeHow do we do the dot product?



13

Brute force approach

Do all possible dot products

d(t) = x(1) x(N)

y(t +1)y(t +N)

x

yFriday, September 20, 13


14

Sliding dot product

d

d(t) = x(1) x(N)

y(t +1)y(t +N)

x

y



15

A closer look at this

Writing this as a summation it comes up to be a convolutionBut with the template time-reversed

d(t) = x(1) x(N) y(t +1)

y(t +N)

=

= x(i)y(t + i)i = x yx(t) = x(t)



16

Matched filter

This is known as a matched filter

Why filter? Because we do a convolution

Also known as cross-correlationVery common tool in communications, sonar, radar, etc

Template presenceat time t Template Input

d(t) = x(k)y(tk)k



17

Matched filters in sonar/radar

Radar/sonar Transmitted signal Object

Reflection

Received signal

Template

Input

Matched filter output d x y



Some post-processing

The matched filter output is a little messy We mostly care about rough peaks, and its energy

So we rectify and lowpass filter

18

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

0.5

0

0.5

Input echo recording

Time (sec)

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

0.5

0

0.5

1Input echo recording (rectified/smoothed filter)

Time (sec)

InputFilter

InputFilter



And thankfully it works well with noise19

0.5 1 1.5 2 2.5 3 3.5 4 4.5 50.5

0

0.5

1

Clean echo recording

Time (sec)

0.5 1 1.5 2 2.5 3 3.5 4 4.5 50.5

0

0.5

1

Noisy echo recording

Time (sec)

InputFilter

InputFilter



20

Simple communications example

You send 0/1s over the airBut you pick up some noise

E.g.:

How do you recover the original data?

100 200 300 400 500 600 700 800 900 10000

0.5

1Sent data

100 200 300 400 500 600 700 800 900 10002

0

2

4Received data

100 200 300 400 500 600 700 800 900 10000

0.5

1

Recovered data

ConvolutionThresholding



21

Detecting the 1s

Make a template for them:x = [1 1 1 1 1 1 1 1 1]This will detect a series of 1 symbols

Convolve template with the inputAnd measure when the result is above threshold

E.g. greater than 0.5

Where it is we have a series of 1s



22

Result

100 200 300 400 500 600 700 800 900 10000

0.5

1Sent data

100 200 300 400 500 600 700 800 900 10002

0

2

4Received data

100 200 300 400 500 600 700 800 900 10000

0.5

1

Recovered data

ConvolutionThresholding



An interesting connection

What is this filter? x = [1 1 1 1 1 1 1 1 1] Lets look at its Fourier transform:

23

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

1

2

3

4

5

6

7

8

9

Resp

onse

Normalized frequency



An interesting connection

What is this filter? x = [1 1 1 1 1 1 1 1 1]

Its a lowpass filter We effectively remove the high frequencies (i.e. the noise)

Two ways to achieve the same result Template to match our signal Or a denoising filter

24



25

Computational considerations

The sliding dot product is slowN multiplications/additions per input sample No fun with large templates

Fortunately we can use the FFT!Remember that convolution can be performed

efficiently with the FFT



FFT-based matched filtering

Convolution in the time domain maps to element-wise multiplication in the frequency domain

But we have three sizes here: template, input, output template is length N, input in M, output is N+M-1 What do we use?

26

d = x y F d = F x( ) F y( )



Approach 1

Zero pad everything to the largest size (M+N-1) Ensures that we we have enough space to store the output Doesnt change the result since we convolve with more zeros

Slightly redundant though

Example: N = 1,000, M = 10,000,000 Time domain takes about 10 GFLOPs FFT convolution takes about 200 MFLOPs

27



Approach 2

Take small snippets of the input, convolve them with the template and add them together (convolution is linear!)

How small? Same size as template (N) to avoid excessive zero padding But, the output will be 2N-1, so we need to zero pad as much

Example speedup, N = 1,000, M = 10,000,000: Time: 10 GFLOPS FFT: 200 MFLOPs Fast convolution: 60 MFLOPs

28



Matched filtering for event detection29

Input video

Extract audio only

Now we can process long inputs efficiently

2 4 6 8 10 12 14 16 18 20 22

0.50

0.5

Input scene audio

Time (sec)

0.2 1

0.5

0

0.5

Gunshot template

Time (sec)



Matched filter result

Strong peaks where gunshots took place Processing: Time domain: 1.8 sec, Fast Conv: 0.06 sec

30

2 4 6 8 10 12 14 16 18 20 22

0.5

0

0.5

1

1.5

2Input scene audio / matches

Time (sec)

Input audioFilter output energy



Going even faster31

t s = F F t( ) F s( ) = F ft fs( )t s 2 = F ft fs( )( ) F ft fs( ) = ft fs( ) ft fs( ) = ft fs 2

Freq

uenc

y (kH

z)

Gunshot template in Fourier space

0

2

4

6

Input scene in Fourier space

2 4 6 8 10 12 14 16 18 20

2 4 6 8 10 12 14 16 18 20 22

Filter responses

Time (sec)

Time filterSpectral norms



32

Generalizing to two dimensions

In 1D the template was time-reversedIn 2D we flip both dimensions

We now slide across both dimensionsE.g. left-to-right for all heights

FFT speedup still holds But we predictably use 2D FFTs Same caveats as before apply here as well

d(i, j) = x(k,l)y(i +k, j +l)lk



33

Digit recognition example

Given the following input

Find the digits by analyzing the image



34

Collect templates

Create a set of templates for all digits

Filter flipped versions with input imageEach template will peak at digits position

MATLAB demo

x1 = x2 =



35

A new problem

What if we have a change in size?



36

Broadening the search

Resize the the templates to more sizes and run template matching againHow much resizing?

This is N times more computationWhen considering N scales

MATLAB demo



37

Image pyramids

Scaling the templates is not too efficientInstead resample the input at multiple scales



38

Rotation?

Same as beforeBrute force search over all potential

rotations

One more multiplicative factor in computation

MATLAB demo



39

Some relief

Rotation and scale and , thats a lot of searching that needs to be done!

There are some tricks we can pulle.g. radial filters for rotation

Better featuresChoose, e.g., a rotation invariant feature

Lots and lots of papers But some searching will be there



For example

Treat significant pixels as points Get their cross-distances

And normalize them

The distances histogram is rotation- and scale- invariant Do matching on that instead

40

0 10 20

5

10

15

20

25

Input data Distance matrix

20 40 60

20

40

60

0 0.5 10

100

200

300

400

Distance histogram

10 20 30 40

15

20

25

30

35

50 100 150

50

100

150

0 0.5 10

1000

2000

0 10 20

5

10

15

20

25

50 100 150

50

100

150

0 0.5 10

500

1000

1500

2000

10 20 30 40

20

30

40

100 200 300

100

200

300

0 0.5 10

5000

10000

0

0.5

1

0

0.5

1

0

0.5

1

0

0.5

1



41

One more problem

What happened to the normalization?!Remember that vectors were unit length?That doesnt hold with matched filtering

Heres a challenging case



42

Problems with scaling

Regular cross-correlation is:

Output is highly dependent on the local magnitude of the input

d(t) = x(k)y(tk)kThis is high if this is high



43

In fact, Ive been cheating all this time

I used white-on-black imagesI made sure that high values were parts of interest

Do you see a new problem?

510

15

510

1520

25300

0.05

0.1

"White on black"

510

15

510

1520

25300

0.05

0.1

"Black on white"



44

Solution 1: Normalize

We really want:

Where the denominator ensures a unit-length dot product with the template

Likewise we can generalize to 2-D, etc.But FFT speedup is a little trickier now

d(t) =x(k)y(tk)

xy(t)k



45

Solution 2: Gradient filtering

Detect what mattersRemember what your retina likes? (hint: edges)

Perform detection on the gradient imageThis abstracts the color and local intensity

Much more robust detector (for images at least)



Example46

10 20 30

10

20

30

Template

0

0.05

0.1

10 20 30

10

20

30

Template gradient

0.1

0

0.1

10 20 30 40 50

10203040

Input

0.10.20.3

10 20 30 40 50

10203040

Input gradient

0.200.2

10 20 30 40 50

10203040

Input

0.10.20.3

10 20 30 40 50

10203040

Input gradient

0.200.2

10 20 30 40 50

510152025303540

| Matched filter output |, peak = 1.2886

0

0.2

0.4

0.6

0.8

1

1.2

10 20 30 40 50

510152025303540

| Matched filter output |, peak = 1.2886

0

0.2

0.4

0.6

0.8

1

1.2



Movie example

Track the ball movement Convolve in 4D space

47

Input movie Ball template

=Color channel average power

120 160 3 138

28 29 3

120 160 138



Each channel separately48



49

Template matching applications

Popular tool in computer visionface/car/pedestrian/footballer detection

Templates are example images of the above

Good for event detection in audiogunshot detection, room measurements

Bio/geological/space/underwater signals etc ...



50

Matching without a template

Sometime we dont know the templateBut we know it appears a lot

We can use auto-correlation to find structureUse the full input as a template

Peaks will denote repeating elements



Back to the heartbeat example

We dont know the exact template, but there is a pattern that is repeating in time We can still find structure regardless

51

1 2 3 4 5 6 7x 104

0.2

0

0.2

2000 4000 6000 8000 10000 12000 14000

0.10

0.10.2

Time (samples)



Autocorrelation

Use the entire input as a template Match the input on itself

52

Point where the two sequences alignwill yield the maximum output

Correlation between successive beats Correlation between

beats spaced by two

0.5 1 1.5 2 2.5x 104

5

0

5

10

15

20

Time (samples)

Correlation between oscillations within beat



53

An popular case

Pitch trackingMusical instruments tend to repeat a similar waveform,

the rate of the repeat is the pitch

But it can change over time!!

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.0450.30.20.1

00.10.20.3

Pitched note

Time (sec)



So we cant really autocorrelate

Many peaks corresponding to various elements No temporal information

54

1 2 3 4 5 6 7 8 9 10

0.5

0

0.5

Input music

Time (sec)

0.08 0.06 0.04 0.02 0 0.02 0.04 0.06 0.081

0

1

2

3x 104

Time (sec)

Zoomed in autocorrelation



Use localized autocorrelations instead

For each analysis window perform an autocorrelation Which you can also efficiently compute with the FFT

55

1 2 3 4 5 6 7 8 9 10

0.5

0

0.5

Time (sec)

Input sound

Lag

(mse

c)

Time (sec)

Sliding autocorrelation

1 2 3 4 5 6 7 8 9

12345



56

Probing matching criteria

Revisiting the starting pointSuppose we instead tried to find the minimal

distance between the template and the input

This is a simple Euclidean distance D(m,n) = x(im, j n)y(i, j)

2ji



57

From Euclidean to dot product

Assume a locally constant input yMinimizing D, maximizes the last term

Which happens to be the dot product

With the dot product we are ultimately minimizing a Euclidean distance

D(m,n) = y(i, j) 2 +ji x(i, j) 2ji 2 y(i, j)x(im, j n)ji



58

Getting to a likelihood

The Euclidean distance implies a Gaussian

We can now get a likelihood of match Is Gaussian the right model though?

We can use various other metrics to define distance, implying a different distributionMore on that later

P(x; y) e12(xy)(xy)



59

And dont forget features!

We focused on detection on raw dataWe can instead convolve on the feature weights

This can buy us noise robustness, invariance, and other desirable propertiesYou will rarely operate on raw data

All of the previous applies here as well



60

Recap

Detection of interesting elementsMatched filtering1D to 2DScale/rotation invarianceNormalized cross-correlationDetection on gradientsAutocorrelation

From filters to likelihoodsFriday, September 20, 13


61

Next lecture

Linear classifiers

Discriminant models

Multi-class recognition



But ...

The next lecture is next Friday I will be at the IEEE MLSP workshop next Wednesday

Does the jet-setting lifestyle appeal to you? Do a good final project, we had students from this class in MLSP before

62



63

Reading

Fast normalized correlationhttp://scribblethink.org/Work/nvisionInterface/nip.pdf

Transform invariant templateshttp://www.springerlink.com/content/h345462721557808/http://www.lps.usp.br/~hae/software/forapro/index.html



Reminder

There is a problem set handed out It is due on Friday Stuck? Tough luck, Ill be out of town :)

Talk to your TA

We also have a piazza page: http://piazza.com/illinois/fall2013/cs598psGood way to find project buddies

64


lecture 8 - detection and matched filters

Documents