
Object Recognition using Local Descriptors

Javier Ruiz-del-Solar and Patricio Loncomilla
Center for Web Research, Universidad de Chile

Outline

• Motivation & Recognition Examples
• Dimensionality problems
• Object Recognition using Local Descriptors
• Matching & Storage of Local Descriptors
• Conclusions

Motivation

• Object recognition approaches based on local invariant descriptors (features) have become increasingly popular and have experienced an impressive development in recent years.

• Invariance against: scale, in-plane rotation, partial occlusion, partial distortion, partial change of point of view.

• The recognition process consists of two stages:
1. Scale-invariant local descriptors (features) of the observed scene are computed.
2. These descriptors are matched against descriptors of object prototypes already stored in a model database. These prototypes correspond to images of objects under different view angles.

Recognition Examples (1/2)

Recognition Examples (2/2)

Image Matching Examples (1/2)

Image Matching Examples (2/2)

Some applications
• Object retrieval in multimedia databases (e.g. the Web)
• Image retrieval by similarity in multimedia databases
• Robot self-localization
• Binocular vision
• Image alignment and matching
• Movement compensation
• …

However … there are some problems

Dimensionality problems
• A given image can produce ~100-1,000 descriptors of 128 components (real values).
• The model database can contain up to 1,000-10,000 objects in some special applications.
• => large number of comparisons => large processing time
• => large database size
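The scale of the problem can be made concrete with a quick back-of-envelope computation. The figures below are taken from the slide; the 4 bytes per component is an illustrative assumption (the slide only says "real values"):

```python
descriptors_per_image = 1_000   # upper end of the slide's ~100-1,000
objects_in_db = 10_000          # upper end of the slide's 1,000-10,000
dims = 128                      # components per descriptor

db_descriptors = objects_in_db * descriptors_per_image
comparisons = descriptors_per_image * db_descriptors  # brute-force matching
db_bytes = db_descriptors * dims * 4                  # assumed 4-byte floats

print(f"{comparisons:,} distance computations per query image")
print(f"{db_bytes / 2**30:.1f} GiB of raw descriptor data")
```

Ten billion distance computations per query image makes brute-force matching clearly impractical, which is what motivates the indexed search in the rest of the talk.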

Main motivation of this talk:
• To get some ideas about how to perform efficient comparisons between local descriptors, as well as efficient storage of them.

Recognition Process

The recognition process consists of two stages:
1. Scale-invariant local descriptors (features) of the observed scene are computed.
2. These descriptors are matched against descriptors of object prototypes already stored in a model database. These prototypes correspond to images of objects under different view angles.

[Diagram: recognition pipeline. Online: Input Image → Interest Points Detection → Scale Invariant Descriptors (SIFT) Calculation → SIFT Matching → Affine Transform Calculation → Affine Transform Parameters. Offline Database Creation: Reference Image → Interest Points Detection → SIFT Calculation → SIFT Database, which feeds the SIFT Matching stage.]

Interest Points Detection (1/2)

Interest points correspond to maxima of the SDoG (Subsampled Difference of Gaussians) scale-space (x, y, σ).

[Figure: construction of the scale space and of the SDoG]

Ref: Lowe 1999
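A minimal sketch of this kind of detector, using plain Gaussian filtering rather than the subsampled pyramid of the slides (`numpy`/`scipy` are assumed dependencies, and the sigma values are illustrative): blur the image at a few scales, take differences of adjacent levels, and keep the points that are extrema of their 3×3×3 scale-space neighbourhood.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.01):
    """Extrema of a (non-subsampled) DoG scale-space: a point (s, y, x) is
    kept when it is the maximum or minimum of its 3x3x3 neighbourhood and
    its |DoG| response exceeds a contrast threshold."""
    blurred = np.stack([gaussian_filter(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]               # difference of Gaussians
    is_max = dog == maximum_filter(dog, size=3, mode="nearest")
    is_min = dog == minimum_filter(dog, size=3, mode="nearest")
    return np.argwhere((is_max | is_min) & (np.abs(dog) > thresh))

# sanity check: a single Gaussian blob yields an extremum at its centre
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((xx - 32.0) ** 2 + (yy - 32.0) ** 2) / (2 * 3.0 ** 2))
keypoints = dog_extrema(img)   # rows of (scale index, y, x)
```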

Interest Points Detection (2/2)

Examples of detected interest points.

Our improvement: Subpixel location of interest points by a 3D quadratic approximation around the detected interest point in the scale-space.
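The slide does not give the formulas for this refinement; a plausible sketch of the standard quadratic-fit version (an illustration, not the authors' exact implementation) estimates the gradient g and Hessian H of the scale-space D by finite differences and solves H · offset = −g:

```python
import numpy as np

def refine_offset(D, s, y, x):
    """Sub-pixel/sub-scale offset of an extremum of the scale-space D
    (indexed [scale, y, x]) from a 3-D quadratic fit around (s, y, x)."""
    g = 0.5 * np.array([D[s+1, y, x] - D[s-1, y, x],
                        D[s, y+1, x] - D[s, y-1, x],
                        D[s, y, x+1] - D[s, y, x-1]])
    H = np.empty((3, 3))
    H[0, 0] = D[s+1, y, x] - 2*D[s, y, x] + D[s-1, y, x]
    H[1, 1] = D[s, y+1, x] - 2*D[s, y, x] + D[s, y-1, x]
    H[2, 2] = D[s, y, x+1] - 2*D[s, y, x] + D[s, y, x-1]
    H[0, 1] = H[1, 0] = 0.25 * (D[s+1, y+1, x] - D[s+1, y-1, x]
                                - D[s-1, y+1, x] + D[s-1, y-1, x])
    H[0, 2] = H[2, 0] = 0.25 * (D[s+1, y, x+1] - D[s+1, y, x-1]
                                - D[s-1, y, x+1] + D[s-1, y, x-1])
    H[1, 2] = H[2, 1] = 0.25 * (D[s, y+1, x+1] - D[s, y+1, x-1]
                                - D[s, y-1, x+1] + D[s, y-1, x-1])
    return np.linalg.solve(H, -g)   # (ds, dy, dx), ideally each in [-0.5, 0.5]

# exact on a synthetic quadratic whose extremum sits at offset (0.25, -0.3, 0.1)
s_, y_, x_ = np.mgrid[0:3, 0:3, 0:3].astype(float)
D = -((s_ - 1.25) ** 2 + (y_ - 0.7) ** 2 + (x_ - 1.1) ** 2)
offset = refine_offset(D, 1, 1, 1)
```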


SIFT Calculation

For each obtained keypoint, a descriptor or feature vector that considers the gradient values around the keypoint is computed. These descriptors are called SIFTs (Scale-Invariant Feature Transform).

SIFTs provide invariance to scale and orientation.

Ref: Lowe 2004


SIFT Matching

The Euclidean distance between the SIFTs (vectors) is employed.
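As a concrete toy illustration of this matching step, with 4-component vectors standing in for the 128-component SIFTs:

```python
import math

def euclidean(d1, d2):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def nearest(query, database):
    """Index of the database descriptor nearest to `query`."""
    return min(range(len(database)), key=lambda i: euclidean(query, database[i]))

# toy 4-d "descriptors" (real SIFT vectors have 128 components)
db = [(0, 0, 0, 0), (1, 1, 1, 1), (5, 5, 5, 5)]
best = nearest((0.9, 1.1, 1.0, 1.0), db)   # the second entry is nearest
```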


Affine Transform Calculation (1/2)
Several stages are employed:
1. Object Pose Prediction
• In the pose space, a Hough transform is employed to obtain a coarse prediction of the object pose: each matched keypoint votes for all object poses that are consistent with it.
• A candidate object pose is obtained if at least 3 entries fall in a Hough bin.
2. Affine Transformation Calculation
• A least-squares procedure is employed to find an affine transformation that correctly accounts for each obtained pose.

[u]   [m1 m2] [x]   [tx]
[v] = [m3 m4] [y] + [ty]
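The least-squares step of stage 2 can be sketched as follows (a minimal illustration, not the authors' implementation; `numpy` assumed): each correspondence (x, y) → (u, v) contributes two linear equations in the six unknowns m1..m4, tx, ty.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src (x, y) points to dst (u, v):
    [u, v]^T = [[m1, m2], [m3, m4]] @ [x, y]^T + [tx, ty]^T."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    params, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                                 rcond=None)
    m1, m2, m3, m4, tx, ty = params
    return np.array([[m1, m2], [m3, m4]]), np.array([tx, ty])

# sanity check: recover a known transform from 3 exact correspondences
M_true = np.array([[2.0, 0.5], [-0.5, 2.0]])
t_true = np.array([1.0, -3.0])
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [tuple(M_true @ np.array(p) + t_true) for p in src]
M, t = fit_affine(src, dst)
```

In practice the fit is computed over the keypoints that voted for a given Hough bin, and outliers are then rejected by the verification stages that follow.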

Affine Transform Calculation (2/2)
3. Affine Transformation Verification Stages:
• Verification using a probabilistic model (Bayes classifier)
• Verification based on Geometrical Distortion
• Verification based on Spatial Correlation
• Verification based on Graphical Correlation
• Verification based on the Object Rotation
4. Transformations Merging based on Geometrical Overlapping

In blue: the verification stages proposed by us for improving the detection of robot heads.

[Figure: AIBO head pose detection example, showing the input image and the reference images]

Matching & Storage of Local Descriptors

• Each reference image gives a set of keypoints.
• Each keypoint has a graphical descriptor, which is a 128-component vector.
• All the (keypoint, vector) pairs corresponding to a set of reference images are stored in a set T.

[Figure: T is the set of all (keypoint, descriptor) pairs extracted from the reference images; each entry (1), (2), (3), (4), … stores the keypoint data (x, y, n) together with the descriptor components v1, v2, …, v128]

In a more compact notation:

[Figure: T = {(p1, d1), (p2, d2), (p3, d3), (p4, d4), …}, where each pi is a keypoint and each di its descriptor]

Matching & Storage of Local Descriptors


• In the matching-generation stage, an input image gives another set of keypoints and vectors.

• For each input descriptor, the first and second nearest descriptors in T must be found.

• A pair of nearest descriptors (d, dFIRST) then gives a pair of matched keypoints (p, pFIRST).

Matching & Storage of Local Descriptors

[Figure: each keypoint/descriptor pair (p, d) of the input image is searched in T = {(p1, d1), (p2, d2), …}, returning the first and second nearest pairs (pFIRST, dFIRST) and (pSEC, dSEC)]

• The match is accepted if the ratio between the distance to the first nearest descriptor and the distance to the second nearest descriptor is lower than a given threshold.

• This indicates that there is no possible confusion in the search result.

Matching & Storage of Local Descriptors

Accepted if:

distance(d, dFIRST) < threshold · distance(d, dSEC)
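In code, the acceptance rule reads as follows (the 0.8 default is the threshold suggested by Lowe (2004); the slides leave the value unspecified):

```python
def accept_match(d_first, d_second, ratio=0.8):
    """Ratio test: accept a match only if the nearest descriptor is clearly
    closer than the second nearest (0.8 is Lowe's suggested threshold)."""
    return d_first < ratio * d_second

unambiguous = accept_match(0.3, 0.9)   # clear winner: accepted
ambiguous = accept_match(0.3, 0.35)    # first and second too close: rejected
```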

• A way to store the set T in an ordered manner is a kd-tree; in this case, we use a 128-d tree.
• As is well known, in a kd-tree the elements are stored in the leaves; the other nodes represent divisions of the space along some dimension.

Storage: Kd-trees

[Figure: a 2-d-tree. The root division node is 1>2; its children are division nodes 2>3 and 2>5; the leaves are the stored vectors (1,3), (2,7), (6,5), (8,9). All vectors with a first component greater than 2 are stored on the right side. Internal nodes are division nodes; the leaves are storage nodes.]

• Generation of balanced kd-trees:
• We have a set of vectors.
• We calculate the mean and variance of each dimension i.

Storage: Kd-trees

[Figure: a set of N vectors a = (a1, a2), b = (b1, b2), c = (c1, c2), d = (d1, d2), …]

M_i = (1/N)(a_i + b_i + c_i + …),    V_i = (1/N)((a_i − M_i)² + (b_i − M_i)² + …)

Tree construction:
• Select the dimension iMAX with the largest variance.
• Order the vectors with respect to the iMAX dimension.
• Select the median M in this dimension.
• Create a division node.
• Repeat the process recursively on each side.
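The construction steps above can be sketched as follows, using the four 2-D vectors of the earlier kd-tree figure (a minimal illustration: split at the median of the largest-variance dimension, with the descriptors stored only in the leaves):

```python
import statistics

def build_kdtree(vectors):
    """Balanced kd-tree: division nodes are dicts, storage nodes (leaves)
    are the vectors themselves."""
    if len(vectors) == 1:
        return vectors[0]                       # storage node (leaf)
    dims = len(vectors[0])
    # dimension with the largest variance
    i_max = max(range(dims),
                key=lambda i: statistics.pvariance(v[i] for v in vectors))
    ordered = sorted(vectors, key=lambda v: v[i_max])
    mid = len(ordered) // 2
    return {"dim": i_max,                       # division node
            "median": ordered[mid][i_max],
            "left": build_kdtree(ordered[:mid]),
            "right": build_kdtree(ordered[mid:])}

tree = build_kdtree([(1, 3), (2, 7), (6, 5), (8, 9)])
```

Splitting at the median guarantees both subtrees hold half the vectors, so the tree depth stays logarithmic in the number of stored descriptors.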

Storage: Kd-trees

[Figure: division node "iMAX > M"; nodes with iMAX component less than M are stored in the left subtree, nodes with iMAX component greater than M in the right subtree]

Search process for the nearest neighbors — two alternatives:
• Compare all the descriptors in T against the given descriptor and return the nearest one, or
• Compare at most Q nodes and return the nearest of them (each comparison = one Euclidean distance calculation)
  – Requires a good search strategy
  – It can fail, but the failure probability is controlled by Q

We choose the second option, using the BBF (Best-Bin-First) algorithm.

Search Process

• Set:
  • v: query vector
  • Q: priority queue ordered by distance to v (initially empty)
  • r: initially the root of T
  • vFIRST: initially undefined, with an infinite distance to v
  • ncomp: number of comparisons, initially zero

• While (!finish):
  • Search for v in T starting from r, arriving at a leaf c. Add every direction not taken during the descent to Q, in order (each division node on the path contributes one not-taken direction).
  • If c is nearer to v than vFIRST, then vFIRST = c.
  • Set r = the first node in Q (the nearest to v); ncomp++.
  • If distance(r, v) > distance(vFIRST, v), finish = 1.
  • If ncomp > ncompMAX, finish = 1.
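The loop above can be sketched compactly as follows (an illustration: division nodes as dicts, 2-D tuples as leaves; the tree below is a small hypothetical example, not the exact one used in the slides; `math.dist` requires Python ≥ 3.8):

```python
import heapq
import math

def bbf_search(tree, v, max_comp=2):
    """Best-Bin-First sketch: descend to a leaf, queue every not-taken branch
    keyed by its distance to v along the split dimension, and resume from the
    most promising branch until it cannot beat the best leaf found so far
    (or until max_comp leaf comparisons, the slides' ncompMAX)."""
    best, best_d = None, math.inf
    queue, push_count, comparisons = [], 0, 0
    node = tree
    while True:
        while isinstance(node, dict):                  # descend to a leaf
            go_right = v[node["dim"]] > node["median"]
            taken = node["right"] if go_right else node["left"]
            other = node["left"] if go_right else node["right"]
            bound = abs(v[node["dim"]] - node["median"])
            heapq.heappush(queue, (bound, push_count, other))
            push_count += 1                            # tie-break for the heap
            node = taken
        d = math.dist(v, node)                         # leaf comparison
        comparisons += 1
        if d < best_d:
            best, best_d = node, d
        if not queue or comparisons >= max_comp:
            return best, best_d, comparisons
        bound, _, node = heapq.heappop(queue)          # most promising branch
        if bound > best_d:                             # it cannot contain a
            return best, best_d, comparisons           # closer leaf: finish

# a small hypothetical 2-d-tree: dicts are division nodes, tuples are leaves
tree = {"dim": 0, "median": 6,
        "left":  {"dim": 1, "median": 7, "left": (1, 3), "right": (2, 7)},
        "right": {"dim": 1, "median": 9, "left": (6, 5), "right": (8, 9)}}
best, best_dist, comps = bbf_search(tree, (7, 4))
```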

Search Process: BBF Algorithm

Search example, for the query vector v = (20, 8) (stored leaves include (1,3), (2,7), (20,7), (5,1500) and (9,1000)):

• Descend from the root 1>2: 20 > 2, go right; the not-taken branch enters the queue with boundary distance 18.
• At node 2>7: 8 > 7, go right; the not-taken branch is queued with distance 1.
• At node 1>6: 20 > 6, go right; the not-taken branch is queued with distance 14.
• We arrive at the leaf (9, 1000): store it as cMIN, at distance ≈ 992 from v (comparisons: 1).
• The distance from v to the best branch in the queue (the one from node 2>7, distance 1) is lesser than the distance to cMIN, so we delete that branch from the queue and restart the search from it.
• At node 1>8: 20 > 8, go right; the not-taken branch is queued with distance 12. We arrive at the leaf (20, 7): it becomes the new cMIN, at distance 1 (comparisons: 2).
• The distance from v to the best remaining branch in the queue is NOT lesser than the distance to cMIN, so we finish: the nearest neighbor (20, 7) is found after only 2 leaf comparisons.

Conclusions
• BBF + kd-trees: a good trade-off between short search time and high success probability.
• But perhaps BBF + kd-trees is not the optimal solution.
• Finding a better methodology is very important for massive applications (for example, Web image retrieval).
