ENGN8530: Computer Vision and Image Understanding:
Theories and Research
Topic 3: Image Matching and Registration
Dr Chunhua Shen and Dr Roland Goecke
VISTA / NICTA & RSISE, ANU
Acknowledgement: Some slides from Dr Antonio Robles-Kelly, Dr Tiberio Caetano, Dr Rob Mahony, Dr Cordelia Schmid, and Dr David Lowe
ENGN8530: CVIU 2
Some Terms
Image Matching:
• Align images of the same modality so as to provide a continuous ‘larger’ image
• Images often taken from different viewpoints
Image Registration:
• Align images of the same or different modalities so that the object of interest shows up in the same way in all the images
• Usually more or less taken from the same viewpoint
• Very important in medical imaging!
However, these terms are often used interchangeably!
ENGN8530: CVIU 3
How to Build a Panorama?
Reference: M. Brown and D.G. Lowe, “Recognising Panoramas”, ICCV 2003.
ENGN8530: CVIU 4
Template Matching
Useful for locating objects with known shape and appearance in an image.
An n×m template is compared with n×m regions of the image with the aim of finding the point in the image to which it is most similar.
Disadvantages:
• Cannot handle pose changes well
• Cannot handle scale changes well
ENGN8530: CVIU 5
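Template Matching (2)
Given 2 images I1 and I2 (of same size), measure the correlation between them.
E.g. how similar are these? Need similarity measure!
$$\begin{pmatrix} -1 & -1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \qquad \begin{pmatrix} -10 & -10 & 0 \\ -10 & 0 & 10 \\ 0 & 10 & 10 \end{pmatrix}$$
[Figure: a third 3×3 example image with entries -10, 0 and 10]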
ENGN8530: CVIU 6
Similarity Measures
Sum of Absolute Differences (SAD):
Let I1 and I2 be images of the same size, with pixel values I1(pi) = ai and I2(pi) = bi.
$$\mathrm{SAD}(I_1, I_2) = \sum_i |a_i - b_i|$$
The Sum of Squared Differences (SSD) is very similar, except that the differences are squared.
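The two measures can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture; the function names `sad` and `ssd` and the array `example` are chosen here for the example.

```python
import numpy as np

def sad(i1, i2):
    """Sum of absolute differences between two images of the same size."""
    a = np.asarray(i1, dtype=float)
    b = np.asarray(i2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same size")
    return float(np.abs(a - b).sum())

def ssd(i1, i2):
    """Sum of squared differences between two images of the same size."""
    a = np.asarray(i1, dtype=float)
    b = np.asarray(i2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same size")
    return float(((a - b) ** 2).sum())

# The 3x3 example image from the Template Matching (2) slide.
example = np.array([[-1, -1, 0],
                    [-1,  0, 1],
                    [ 0,  1, 1]])

identical = sad(example, example)       # 0: identical images
scaled = sad(example, 10 * example)     # large, although the pattern is the same
```

Both measures are zero only for identical images, and both grow when one image is merely a rescaled copy of the other, which is the weakness discussed on the following slides.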
ENGN8530: CVIU 7
SAD
Example: Find the eyes in a face
[Figure: the template; the original image showing the template and the location of the optimum; the sum of absolute differences map, note the peak over the left eye]
ENGN8530: CVIU 8
SAD (2)
Advantages:
• Intuitive
• Degrades gracefully
Disadvantage:
• Not robust to changes in illumination
ENGN8530: CVIU 9
SAD (3)
[Figure: original image I and darkened image Idark = I/2 + 0.2]
FAILS: not robust to uniform change in illumination
ENGN8530: CVIU 10
Correlation
Correlation measures how changes in one variable correlate with changes in another variable.
We will use the normalised cross-correlation (NCC) to determine correlation between I1 and I2. Without normalisation, correlation is also sensitive to illumination changes.
This measures the similarity in the way in which I1 and I2 deviate from their mean values. Measures the similarity of the ‘patterns’ of two images.
⇒ Robust to changes in illumination.
ENGN8530: CVIU 11
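Normalised Cross Correlation
Let I1 and I2 be images of the same size, with pixel values I1(pi) = ai and I2(pi) = bi, and let $\bar{a}$ and $\bar{b}$ be their mean values.
$$\mathrm{NCC}(I_1, I_2) = \frac{\sum_i (a_i - \bar{a})(b_i - \bar{b})}{\left( \sum_i (a_i - \bar{a})^2 \, \sum_i (b_i - \bar{b})^2 \right)^{1/2}}$$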
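A minimal NumPy sketch of this formula follows. The function name `ncc` and the test arrays are assumptions made for illustration; returning NaN for a flat image is a choice made here, since the formula itself is undefined when a denominator term is zero.

```python
import numpy as np

def ncc(i1, i2):
    """Normalised cross-correlation of two images of the same size."""
    a = np.asarray(i1, dtype=float)
    b = np.asarray(i2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same size")
    da = a - a.mean()                    # deviations from the mean of I1
    db = b - b.mean()                    # deviations from the mean of I2
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    if denom == 0:                       # flat, featureless image: undefined
        return float("nan")
    return float((da * db).sum() / denom)

pattern = np.array([[-1, -1, 0],
                    [-1,  0, 1],
                    [ 0,  1, 1]], dtype=float)

same_pattern = ncc(pattern, 0.5 * pattern + 0.2)   # darkened copy, as in I/2 + 0.2
inverted = ncc(pattern, -3.0 * pattern + 1.0)      # contrast-inverted copy
```

Because the mean is subtracted and the result is divided by the norms of the deviations, a darkened copy of the form k·I + l with k > 0 still scores 1, and a contrast-inverted copy scores -1.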
ENGN8530: CVIU 12
NCC (2)
Let
$$M = \begin{pmatrix} -1 & -1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$
I1 = M, I2 = k·M + l, k > 0: NCC(I1, I2) = 1
I1 = M, I2 = −k·M + l, k > 0: NCC(I1, I2) = −1
Here k and l are constants; ‘+ l’ means to add l to all matrix elements.
ENGN8530: CVIU 13
NCC (3)
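NCC(I1, I2) ∈ [-1, 1]. This means “the interval on the real line bounded by –1 and 1, containing both its boundary points.”
Measures the similarity of the ‘patterns’ of two images.
Is undefined for a flat featureless image, i.e. one whose pixels all have the same value k (the denominator of the NCC is then zero):
$$\begin{pmatrix} k & k & k \\ k & k & k \\ k & k & k \end{pmatrix}$$
This virtually never occurs in practice.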
ENGN8530: CVIU 14
NCC (4) to (8)
Apply in a similar manner to convolution, but only calculate in the ‘valid’ region.
[Figure sequence: the 3×3 template with rows (-1, -1, 0), (-1, 0, 1), (0, 1, 1) is slid over a 4×4 image and the NCC is filled in one position at a time, giving a 2×2 result with the values 1, 0.846, 0.833 and 0.258]
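Sliding the template over the ‘valid’ region can be sketched as below. This is an illustrative brute-force version, not the lecture's code; `ncc_map` is a name chosen here, and the 5×6 test image is a hypothetical example built by embedding a brightened copy of the template, not the 4×4 image from the slides.

```python
import numpy as np

def ncc_map(image, template):
    """NCC of the template against every fully overlapping ('valid') image patch."""
    img = np.asarray(image, dtype=float)
    tpl = np.asarray(template, dtype=float)
    th, tw = tpl.shape
    ih, iw = img.shape
    if th > ih or tw > iw:
        raise ValueError("template must not be larger than the image")
    t0 = tpl - tpl.mean()
    t_norm = np.sqrt((t0 ** 2).sum())
    out = np.full((ih - th + 1, iw - tw + 1), np.nan)   # NaN where NCC is undefined
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = img[r:r + th, c:c + tw]
            p0 = patch - patch.mean()
            denom = t_norm * np.sqrt((p0 ** 2).sum())
            if denom > 0:
                out[r, c] = (t0 * p0).sum() / denom
    return out

template = np.array([[-1, -1, 0],
                     [-1,  0, 1],
                     [ 0,  1, 1]], dtype=float)

# Hypothetical test image: zeros with a brightened, rescaled copy of the template.
image = np.zeros((5, 6))
image[1:4, 2:5] = 2.0 * template + 3.0

scores = ncc_map(image, template)
best = np.unravel_index(np.nanargmax(scores), scores.shape)
```

An n×m image and a p×q template give an (n−p+1)×(m−q+1) score map, which is why the 4×4 image and 3×3 template on the slides produce a 2×2 result.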
ENGN8530: CVIU 19
Template Matching using NCC
Template of right eye is flipped and used to locate left eye
[Figure: original image showing template, and location of maximum in normalised cross-correlation; normalised cross-correlation map, note the peak over the left eye]
ENGN8530: CVIU 20
NCC v. SAD
[Figure: unaltered image I and darkened image Idark = I/2 + 0.2, each matched with NCC and with SAD]
NCC: robust to uniform change in illumination
SAD: not robust to uniform change in illumination (FAIL!)
ENGN8530: CVIU 21
Issues
These similarity measures are not very good for handling
• Scale changes
• Pose changes
• Arbitrary rotations of the object or camera
• Illumination changes, in particular non-global
How can we improve image matching / registration?
Solution:
• Use local invariant image features
• Then use these features to do the matching / registration
ENGN8530: CVIU 22
How to Build a Panorama? (2)
Need to align (match) images
ENGN8530: CVIU 23
How to Build a Panorama? (3)
Detect feature points in both images
ENGN8530: CVIU 24
How to Build a Panorama? (4)
Detect feature points in both images
Find corresponding pairs
ENGN8530: CVIU 25
How to Build a Panorama? (5)
Detect feature points in both images
Find corresponding pairs
Use these pairs to align images
ENGN8530: CVIU 26
Local Image Features
Local invariant photometric descriptors
[Figure: an image region mapped to a local descriptor vector ( )]
• Local: robust to occlusion/clutter + no segmentation
• Photometric: distinctive
• Invariant: to image transformations + illumination changes
ENGN8530: CVIU 27
Invariant Features
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
[Figure: SIFT Features]
ENGN8530: CVIU 28
Invariant Features (2)
Advantages:
• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Efficiency: close to real-time performance
• Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness
ENGN8530: CVIU 29
Matching with Features
Problem 1:
Detect the same point independently in both images
[Figure: if different points are detected in the two images, there is no chance to match!]
We need a repeatable detector
ENGN8530: CVIU 30
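Matching with Features (2)
Problem 2:
For each point, correctly recognize the corresponding one
[Figure: a point in one image and several candidate points in the other, marked with a question mark]
We need a reliable and distinctive descriptor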
ENGN8530: CVIU 31
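Matching with Features (3)
Determining correspondences
[Figure: are the two descriptor vectors ( ) and ( ) equal?]
Vector comparison using the Mahalanobis distance
$$\mathrm{dist}_M(\mathbf{p}, \mathbf{q}) = \sqrt{(\mathbf{p} - \mathbf{q})^T \Lambda^{-1} (\mathbf{p} - \mathbf{q})}$$
where Λ is the covariance matrix of the descriptor vectors.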
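As a minimal sketch, the Mahalanobis distance between two descriptor vectors can be computed as follows, using the usual square-root form. The function name and the example covariance matrices are assumptions for illustration; in practice Λ would be estimated from a set of descriptors.

```python
import numpy as np

def mahalanobis(p, q, cov):
    """Mahalanobis distance between vectors p and q for covariance matrix cov."""
    d = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Solve cov @ x = d rather than forming the inverse explicitly.
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# With the identity covariance the distance reduces to the Euclidean distance.
euclidean_case = mahalanobis([0.0, 0.0], [3.0, 4.0], np.eye(2))

# A component with large variance contributes less to the distance.
scaled_case = mahalanobis([2.0, 0.0], [0.0, 0.0], np.diag([4.0, 1.0]))
```

Weighting by the inverse covariance means that descriptor components that naturally vary a lot count for less than components that are normally stable.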
ENGN8530: CVIU 32
A Little Bit of History…
Zhang, Deriche, Faugeras, Luong (Artificial Intelligence, 1995):
• Apply Harris corner detector
• Match points by correlating only at corner points
• Derive epipolar alignment using robust least-squares
ENGN8530: CVIU 33
A Little Bit of History… (2)
Schmid & Mohr (1997):
• Apply Harris corner detector
• Use rotational invariants at corner points
However, not scale invariant. Sensitive to viewpoint and illumination change.
ENGN8530: CVIU 34
Interest Point Detectors
Contour based methods
• Junctions, ends, etc.
Intensity based methods
• Auto-correlation matrix
Parametric-model based methods
• L-corner
…
ENGN8530: CVIU 35
Harris Corner Detector
Basic idea:
• We should easily recognize the point by looking through a small window
• Shifting a window in any direction should give a large change in intensity
Reference: C. Harris and M. Stephens, “A combined corner and edge detector”, Proceedings of the 4th Alvey Vision Conference, 1988, pp. 147--151.
ENGN8530: CVIU 36
Harris Corner Detector (2)
Based on the idea of auto-correlation
“flat” region:no change in all directions
“edge”:no change along the edge direction
“corner”:significant change in all directions
ENGN8530: CVIU 37
Harris Corner Detector (3)
Change of intensity for the shift [u,v]:
$$E(u,v) = \sum_{x,y} w(x,y) \left[ I(x+u, y+v) - I(x,y) \right]^2$$
Here w(x,y) is the window function, I(x+u, y+v) is the shifted intensity and I(x,y) is the intensity.
Window function w(x,y) = 1 in window, 0 outside, or a Gaussian.
ENGN8530: CVIU 38
Harris Corner Detector (4)
For small shifts [u,v] we have a bilinear approximation:
$$E(u,v) \cong \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
where M is a 2×2 matrix computed from image derivatives (the auto-correlation matrix):
$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
ENGN8530: CVIU 39
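Harris Corner Detector (5)
Intensity change in shifting window: eigenvalue analysis
$$E(u,v) \cong \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
λ1, λ2 – eigenvalues of M
The level set E(u,v) = const is an ellipse:
• semi-axis of length (λmax)^(-1/2) along the direction of the fastest change
• semi-axis of length (λmin)^(-1/2) along the direction of the slowest change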
ENGN8530: CVIU 40
Harris Corner Detector (6)
Auto-correlation matrix
• captures the structure of the local neighborhood
• measure based on eigenvalues of this matrix
  2 strong eigenvalues => interest point
  1 strong eigenvalue => contour
  0 strong eigenvalues => uniform region
Interest point detection
• threshold on the eigenvalues
• local maximum for localization
ENGN8530: CVIU 41
Harris Corner Detector (7)
Classification of image points using eigenvalues of M:
• “Corner”: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
• “Edge”: λ1 >> λ2 or λ2 >> λ1
• “Flat” region: λ1 and λ2 are small; E is almost constant in all directions
[Figure: the (λ1, λ2) plane divided into these regions]
ENGN8530: CVIU 42
Harris Corner Detector (8)
Measure of corner response:
$$R = \det M - k \, (\operatorname{trace} M)^2$$
$$\det M = \lambda_1 \lambda_2, \qquad \operatorname{trace} M = \lambda_1 + \lambda_2$$
(k is an empirical constant, k = 0.04 to 0.06)
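The corner response can be sketched with NumPy as below. This is an illustrative version under stated assumptions, not the lecture's implementation: derivatives come from `np.gradient`, the window function is a simple box (1 in window, 0 outside), and `box_sum`, `harris_response` and the synthetic square image are names and data chosen here.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 window, zero-padded at the borders."""
    h, w = a.shape
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(image, k=0.05, radius=1):
    """R = det(M) - k * trace(M)^2 at every pixel, with a box window."""
    img = np.asarray(image, dtype=float)
    iy, ix = np.gradient(img)            # derivatives along rows (y) and columns (x)
    sxx = box_sum(ix * ix, radius)       # windowed sum of Ix^2
    syy = box_sum(iy * iy, radius)       # windowed sum of Iy^2
    sxy = box_sum(ix * iy, radius)       # windowed sum of Ix*Iy
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Synthetic test image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)

corner_value = R[5, 5]     # at a corner of the square
edge_value = R[10, 5]      # on the left edge of the square
flat_value = R[10, 10]     # inside the square
```

On this image the response is positive at the corner, negative on the edge and zero in the flat interior, matching the classification by eigenvalues.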
ENGN8530: CVIU 43
Harris Corner Detector (9)
[Figure: the (λ1, λ2) plane, with “Corner” where R > 0, “Edge” where R < 0, and “Flat” where |R| is small]
• R depends only on eigenvalues of M
• R is large for a corner
• R is negative with large magnitude for an edge
• |R| is small for a flat region
ENGN8530: CVIU 44
Harris Corner Detector (10)
The Algorithm:
• Find points with large corner response function R (R > threshold)
• Take the points of local maxima of R
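The two steps of the algorithm can be sketched as a threshold followed by a 3×3 local-maximum test. The function name `detect_corners`, the neighbourhood size and the small response map below are assumptions made for illustration; a real response map would come from the corner response R.

```python
import numpy as np

def detect_corners(response, threshold):
    """Return (row, col) of pixels with R > threshold that are 3x3 local maxima."""
    r = np.asarray(response, dtype=float)
    h, w = r.shape
    padded = np.pad(r, 1, mode="constant", constant_values=-np.inf)
    # Maximum over the 3x3 neighbourhood of every pixel (the pixel included).
    neigh_max = np.full((h, w), -np.inf)
    for dy in range(3):
        for dx in range(3):
            neigh_max = np.maximum(neigh_max, padded[dy:dy + h, dx:dx + w])
    keep = (r > threshold) & (r >= neigh_max)
    return [(int(y), int(x)) for y, x in zip(*np.nonzero(keep))]

# Hypothetical response map with two strong peaks, one weaker neighbour
# and one value below the threshold.
response = np.zeros((7, 7))
response[2, 2] = 5.0
response[2, 3] = 4.0     # above threshold, but not a local maximum
response[5, 5] = 3.0
response[0, 6] = 0.5     # local maximum, but below threshold

corners = detect_corners(response, threshold=1.0)
```

Note that with the `>=` test every pixel of a plateau of equal values is kept; a stricter tie-breaking rule would be needed if that matters.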
ENGN8530: CVIU 45
Harris: Workflow
ENGN8530: CVIU 46
Harris: Workflow (2)
Compute corner response R
ENGN8530: CVIU 47
Harris: Workflow (3)
Find points with large corner response: R > threshold
ENGN8530: CVIU 48
Harris: Workflow (4)
Take only the points of local maxima of R
ENGN8530: CVIU 49
Harris: Workflow (5)
ENGN8530: CVIU 50
Harris: Properties
Rotation invariance
Ellipse rotates but its shape (i.e. eigenvalues) remains the same
Corner response R is invariant to image rotation
ENGN8530: CVIU 51
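Harris: Properties (2)
Partial invariance to affine intensity change
• Only derivatives are used => invariance to intensity shift I → I + b
• Intensity scale I → a I: the corner response R is rescaled, so a fixed threshold may select a different set of points
[Figure: R plotted against x (image coordinate) before and after intensity scaling, with the same threshold]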
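The effect of an affine intensity change on R can be worked out directly from the definitions of M and R. The short derivation below is a sketch added for illustration, assuming the window function w is unchanged:

```latex
I \mapsto aI + b
\;\Rightarrow\; I_x \mapsto a I_x,\quad I_y \mapsto a I_y
\;\Rightarrow\; M \mapsto a^2 M
\;\Rightarrow\; \det M \mapsto a^4 \det M,\quad (\operatorname{trace} M)^2 \mapsto a^4 (\operatorname{trace} M)^2
\;\Rightarrow\; R \mapsto a^4 R
```

The shift b drops out completely, while the scale a multiplies R by a^4. The positions of the local maxima of R stay the same, but which of them exceed a fixed threshold can change, hence the invariance is only partial.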
ENGN8530: CVIU 52
Harris: Properties (3)
But: non-invariant to image scale!
[Figure: the same structure at two scales; at one scale all points will be classified as edges, at the other it is detected as a corner]
ENGN8530: CVIU 53
Scale Invariant Features
• Consider regions (e.g. circles) of different sizes around a point
• Regions of corresponding sizes will look the same in both images
ENGN8530: CVIU 54
Scale Invariant Features (2)
The problem: how do we choose corresponding circles independently in each image?
ENGN8530: CVIU 55
Scale Invariant Features (3)
Solution:
Design a function on the region (circle) which is “scale invariant” (the same for corresponding regions, even if they are at different scales).
Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
For a point in one image, we can consider it as a function of region size (circle radius).
[Figure: f plotted against region size for Image 1 and for Image 2, where Image 2 is Image 1 at scale = 1/2]
ENGN8530: CVIU 56
Scale Invariant Features (4)
Common approach:
Take a local maximum of this function.
Observation: the region size for which the maximum is achieved should be invariant to image scale.
[Figure: f plotted against region size for Image 1, with its maximum at s1, and for Image 2 (scale = 1/2), with its maximum at s2]
Important: This scale invariant region size is found in each image independently!
ENGN8530: CVIU 57
Scale Invariant Features (5)
A “good” function for scale detection has one stable sharp peak.
[Figure: three plots of f against region size, two marked bad and one marked Good!]
• For usual images, a good function would be one which responds to contrast (sharp local intensity change)
ENGN8530: CVIU 58
Scale Invariant Features (6)
Functions for determining scale: f = Kernel ∗ Image
Kernels:
$$L = \sigma^2 \left( G_{xx}(x,y,\sigma) + G_{yy}(x,y,\sigma) \right)$$ (Laplacian)
$$DoG = G(x,y,k\sigma) - G(x,y,\sigma)$$ (Difference of Gaussians)
where the Gaussian is
$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2+y^2}{2\sigma^2}}$$
Note: both kernels are invariant to scale and rotation
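The idea of taking a local maximum over region size can be tried out with the scale-normalised Laplacian kernel. The sketch below is an illustration under stated assumptions, not the lecture's code: it evaluates the response only at the centre of a synthetic bright disk, and `log_kernel`, `characteristic_scale` and `disk` are names chosen here.

```python
import numpy as np

def log_kernel(sigma, half_size):
    """Scale-normalised Laplacian of Gaussian, sigma^2 * (G_xx + G_yy), on a grid."""
    ax = np.arange(-half_size, half_size + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x ** 2 + y ** 2
    g = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    # G_xx + G_yy = (r^2 - 2 sigma^2) / sigma^4 * G
    return sigma ** 2 * (r2 - 2 * sigma ** 2) / sigma ** 4 * g

def characteristic_scale(patch, sigmas):
    """Sigma at which the magnitude of the response at the patch centre peaks."""
    half = patch.shape[0] // 2
    responses = [abs((log_kernel(s, half) * patch).sum()) for s in sigmas]
    return float(sigmas[int(np.argmax(responses))])

def disk(radius, half_size=40):
    """Synthetic image: a bright disk of the given radius on a dark background."""
    ax = np.arange(-half_size, half_size + 1)
    x, y = np.meshgrid(ax, ax)
    return (x ** 2 + y ** 2 <= radius ** 2).astype(float)

sigmas = np.arange(2.0, 20.0, 0.25)
s_small = characteristic_scale(disk(8), sigmas)    # expected near 8 / sqrt(2)
s_large = characteristic_scale(disk(16), sigmas)   # expected near 16 / sqrt(2)
```

For an ideal disk of radius r the response magnitude peaks at σ = r/√2, so doubling the size of the structure doubles the selected scale, and each scale is found from its own image alone.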
ENGN8530: CVIU 59
Scale Invariant Features (7)
Harris-Laplacian
Find local maximum of:
• Harris corner detector in space (image coordinates)
• Laplacian in scale
[Figure: scale-space volume with axes x, y and scale; Harris is applied across x and y, the Laplacian across scale]
Reference: K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
ENGN8530: CVIU 60
Scale invariant Harris points
• Multi-scale extraction of Harris interest points
• Selection of points at characteristic scale in scale space
Characteristic scale:
- Maximum in scale space
- Scale invariant
[Figure: Laplacian response over scale]
ENGN8530: CVIU 61
Scale invariant Harris points (2)
Multi-scale Harris points
Selection of points at the characteristic scale with Laplacian
invariant points + associated regions
ENGN8530: CVIU 62
Viewpoint Changes
Locally approximated by an affine transformation
[Figure: a detected scale invariant region and its projected region under the affine map A]
Affine transformation: Linear transformation followed by a translation
ENGN8530: CVIU 63
Affine Invariant Features
• So far we considered:
Similarity transform (rotation + uniform scale)
• Now we go on to:
Affine transform (rotation + non-uniform scale)
ENGN8530: CVIU 64
Affine Invariant Features (2)
• Take a local intensity extremum as initial point
• Go along every ray starting from this point and stop when extremum of function f is reached
$$f(t) = \frac{|I(t) - I_0|}{\frac{1}{t} \int_0^t |I(t) - I_0| \, dt}$$
[Figure: f plotted against points along the ray]
Reference: T. Tuytelaars, L. Van Gool. “Wide Baseline Stereo Matching Based on Local, Affinely Invariant Regions”. BMVC 2000.
ENGN8530: CVIU 65
Affine Invariant Features (3)
We will obtain approximately corresponding regions
• The regions found may not exactly correspond, so we approximate them with ellipses
• Geometric moments of order up to 2 allow us to approximate the region by an ellipse
Remark: Search for scale in every direction
ENGN8530: CVIU 66
Affine Invariant Features (4)
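• Covariance matrix of region points defines an ellipse:
Region 1: $$p^T \Sigma_1^{-1} p = 1, \qquad \Sigma_1 = \langle p p^T \rangle_{\text{region 1}}$$
Region 2: $$q^T \Sigma_2^{-1} q = 1, \qquad \Sigma_2 = \langle q q^T \rangle_{\text{region 2}}$$
( p = [x, y]^T is relative to the center of mass)
If the regions are related by $q = Ap$, then
$$\Sigma_2 = A \Sigma_1 A^T$$
Ellipses, computed for corresponding regions, also correspond!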
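The relation between the two covariance matrices can be checked numerically. In this sketch the point set `p`, the matrix `A` and the helper `second_moment` are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical region 1: 2D points, made relative to their centre of mass.
p = rng.normal(size=(500, 2)) * [3.0, 1.0]
p -= p.mean(axis=0)

# Hypothetical linear part of an affine map between the two regions.
A = np.array([[1.2, 0.5],
              [-0.3, 0.8]])
q = p @ A.T                      # q_i = A p_i for every point

def second_moment(points):
    """Sigma = <x x^T>, averaged over the points of the region."""
    return points.T @ points / len(points)

sigma1 = second_moment(p)
sigma2 = second_moment(q)

# A point on ellipse 1 (p0^T Sigma1^-1 p0 = 1) should map onto ellipse 2.
p0 = np.linalg.cholesky(sigma1)[:, 0]
q0 = A @ p0
on_ellipse_2 = float(q0 @ np.linalg.solve(sigma2, q0))
```

`sigma2` agrees with `A @ sigma1 @ A.T` up to rounding, and the mapped point satisfies the equation of the second ellipse, which is what "the ellipses also correspond" means.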
ENGN8530: CVIU 67
Affine Invariant Harris
Initialisation with multi-scale interest points
Iterative modification of location, scale and neighbourhood
ENGN8530: CVIU 68
MSER
Maximally Stable Extremal Regions
• Threshold image intensities: I > I0
• Extract connected components (“Extremal Regions”)
• Find a threshold when an extremal region is “Maximally Stable”, i.e. there is a local minimum of the relative growth of its area
• Approximate a region with an ellipse
Reference: J. Matas, O. Chum, M. Urban, T. Pajdla, “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions”, BMVC 2002, pp. 384-393
ENGN8530: CVIU 69
SIFT
Scale-Invariant Feature Transform
Basically SIFT is a 4-step process:
1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor
Reference: D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
ENGN8530: CVIU 70
SIFT (2)
Build Scale-Space Pyramid
• All scales must be examined to identify scale-invariant features
• An efficient way to do this is to compute the Difference of Gaussian (DoG) pyramid (Burt & Adelson, 1983)
[Figure: pyramid construction, repeating Blur, Resample and Subtract at each level]
ENGN8530: CVIU 71
SIFT (3)
Scale space processed one octave at a time
ENGN8530: CVIU 72
SIFT (4)
Key point localisation
• Detect maxima and minima of Difference-of-Gaussian (DoG) in scale space
• Could also use Laplacian of Gaussian (LoG)
ENGN8530: CVIU 73
SIFT (5)
Select canonical orientation:
• Create histogram of local gradient directions computed at selected scale
• Assign canonical orientation at peak of smoothed histogram
• Each key specifies stable 2D coordinates (x, y, scale, orientation)
[Figure: orientation histogram over 0 to 2π]
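The orientation step can be sketched as a magnitude-weighted histogram of gradient directions. This is a simplified illustration, not Lowe's full procedure: the 36 bins follow Lowe (2004) but are an assumption here, the Gaussian weighting and peak interpolation are omitted, and `dominant_orientation` is a name chosen for the example.

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Angle (radians, in [0, 2*pi)) at the peak of the smoothed gradient histogram."""
    img = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(img)                    # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)     # gradient direction
    hist, edges = np.histogram(angle, bins=n_bins,
                               range=(0.0, 2 * np.pi), weights=magnitude)
    # Circular smoothing with a [1, 2, 1] / 4 kernel.
    smooth = (np.roll(hist, 1) + 2 * hist + np.roll(hist, -1)) / 4
    peak = int(np.argmax(smooth))
    return float(0.5 * (edges[peak] + edges[peak + 1]))   # centre of the peak bin

cols = np.arange(16, dtype=float)
ramp_x = np.tile(cols, (16, 1))                  # intensity increases along x
ramp_diag = ramp_x + ramp_x.T                    # intensity increases along x and y

angle_x = dominant_orientation(ramp_x)
angle_diag = dominant_orientation(ramp_diag)
```

The returned angle is the centre of the winning bin, so its resolution is limited to the bin width (10 degrees for 36 bins).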
ENGN8530: CVIU 74
SIFT (6)
SIFT vector formation (Keypoint Descriptor):
• Thresholded image gradients are sampled over 16x16 array of locations in scale space
• Create array of orientation histograms
• 8 orientations x 4x4 histogram array = 128 dimensions
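The layout of the 128-dimensional vector can be sketched as follows. This is a deliberately simplified illustration, not the full SIFT descriptor: it omits the rotation to the canonical orientation, the Gaussian weighting and the interpolation between bins, and the clipping value 0.2 is taken from Lowe (2004) as an assumption. The name `sift_like_descriptor` is chosen here.

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_orient=8, clip=0.2):
    """4x4 cells x 8 orientation bins = 128-dimensional vector from a 16x16 patch."""
    img = np.asarray(patch, dtype=float)
    if img.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    bins = np.minimum((angle / (2 * np.pi) * n_orient).astype(int), n_orient - 1)
    cell = 16 // n_cells                         # each cell covers 4x4 pixels
    desc = np.zeros((n_cells, n_cells, n_orient))
    for r in range(16):
        for c in range(16):
            desc[r // cell, c // cell, bins[r, c]] += magnitude[r, c]
    vec = desc.ravel()
    norm = np.linalg.norm(vec)
    if norm > 0:
        vec = np.minimum(vec / norm, clip)       # threshold large gradient entries
        vec = vec / np.linalg.norm(vec)          # renormalise to unit length
    return vec

rng = np.random.default_rng(1)
random_patch = rng.integers(0, 256, size=(16, 16)).astype(float)
descriptor = sift_like_descriptor(random_patch)

ramp = np.tile(np.arange(16, dtype=float), (16, 1))   # gradient along x only
ramp_descriptor = sift_like_descriptor(ramp).reshape(4, 4, 8)
```

For the ramp image all gradient mass falls into orientation bin 0 of every cell, so each of the 16 cells contributes one entry of 0.25 and the other 112 entries are zero.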
ENGN8530: CVIU 75
SIFT Example
Laplacian of Gaussian
ENGN8530: CVIU 76
SIFT Example (2)
SIFT keypoints
ENGN8530: CVIU 77
SIFT Example (3)
Query
Result
Task:
Find query image parts in the image
ENGN8530: CVIU 78
Summary
• SIFT is arguably the best affine invariant local image feature, but…
• SIFT is relatively expensive (computationally)
• MSER doesn’t work well with images with any motion blur, e.g. from a moving camera
Interesting alternatives:
• GLOH (Gradient Location and Orientation Histogram)
• SURF (Speeded Up Robust Features)
• Histogram of Oriented Gradients
• Kadir-Brady Saliency Detector