Learning Local Image Descriptors


TRANSCRIPT

Local Image Descriptors for Scalable Recognition

Learning Local Image Descriptors
Matthew Brown
University of British Columbia, (prev.) Microsoft Research
[ Collaborators: †Simon Winder, *Gang Hua, †Rick Szeliski; † = MS Research, * = MS Live Labs ]

Applications @MSFT
- Panoramic Stitching: Digital Image Pro, Windows Live Photo Gallery, Expression, HDView
- 3D Modelling: Photosynth, Virtual Earth
- Location Recognition: Image Search, Lincoln

[ yellow = product, white = technology preview, grey = research ]

Photosynth

[ http://labs.live.com/photosynth ]

Photo Tourism
[ Slide credit: Noah Snavely ]
[ http://photour.cs.washington.edu ]

Input photographs → scene reconstruction (relative camera positions and orientations, point cloud, sparse correspondence) → Photo Explorer

Photosynth is based on Photo Tourism [ Snavely, Seitz, Szeliski, SIGGRAPH 2006 ], which uses SIFT for correspondence. Our system takes as input an unordered set of photos, either from an Internet search or from a large personal collection, which we assume are largely of the same static scene. The first step of our system is to apply computer vision techniques to reconstruct the geometry of the scene. The output of this procedure is the relative positions and orientations of the cameras used to take a connected set of the photographs, a point cloud representing the scene geometry, and a sparse set of correspondences between the photos. This information is then loaded into our interactive photo explorer tool.

Multi-view stereo = training data

[ Seitz et al. CVPR 2006, Goesele et al. ICCV 2007 ]

Learning Image Features
[ Photo Tourism: Snavely, Seitz, Szeliski, SIGGRAPH 2006 ]
[ figure: 3D point cloud ]

Problem Statement

Q: What form should the descriptor function f(.) take?
Find a function of a local image patch, descriptor = f(patch), such that a nearest-neighbour classifier (chosen for simplicity + efficiency) is optimal* [ * = measured by ROC curve ].
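A minimal sketch of this evaluation, assuming labelled match/non-match patch pairs: compute descriptor distances, sweep a threshold to trace the ROC curve, and report the incorrect-match rate at a fixed correct-match rate (the 95% operating point that appears on the testing slide below). The toy descriptor f and the random stand-in patches are illustrative assumptions, not the pipeline from the talk.

```python
import numpy as np

def f(patch):
    """Toy descriptor: bias/gain-normalised pixels, flattened to a vector."""
    v = patch.astype(np.float64).ravel()
    v -= v.mean()
    return v / (np.linalg.norm(v) + 1e-12)

def roc_curve(match_d, nonmatch_d):
    """Sweep a distance threshold; return correct-match and incorrect-match rates."""
    thresholds = np.sort(np.concatenate([match_d, nonmatch_d]))
    correct = np.array([(match_d <= t).mean() for t in thresholds])
    incorrect = np.array([(nonmatch_d <= t).mean() for t in thresholds])
    return correct, incorrect

def error_at_95(match_d, nonmatch_d):
    """Incorrect-match rate at the threshold accepting 95% of correct matches."""
    t = np.quantile(match_d, 0.95)
    return (nonmatch_d <= t).mean()

# Random stand-in patch pairs (matching pairs = perturbed copies).
rng = np.random.default_rng(0)
a = rng.random((500, 32, 32))
b = a + 0.05 * rng.random((500, 32, 32))   # matching partners
c = rng.random((500, 32, 32))              # non-matching partners

match_d = np.array([np.linalg.norm(f(x) - f(y)) for x, y in zip(a, b)])
nonmatch_d = np.array([np.linalg.norm(f(x) - f(y)) for x, y in zip(a, c)])
print(error_at_95(match_d, nonmatch_d))
```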

Descriptor Algorithms

Existing descriptor algorithms map a normalized image patch to a descriptor vector via the same T → S → N structure (transform, spatial summation, normalization):

- SIFT [ Lowe ICCV 1999 ]: gradients quantized to k orientations → summation → normalize
- GLOH [ Mikolajczyk, Schmid PAMI 2005 ]: gradients quantized to k orientations → summation → normalize (plus PCA)
- Shape Context [ Belongie, Malik, Puzicha NIPS 2000 ]: create edge map → summation → normalize
- Geometric Blur [ Berg, Malik CVPR 2001 ]: feature detector → summation → normalize

Our Contribution

Normalized Image Patch → T → S → N (with learnable parameters) → Descriptor Vector
- Propose a framework for descriptor algorithms
- Learn parameters to find the best performance
- Train on a ground-truth data set based on accurate 3D matches
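A minimal sketch of the framework as composable stages, assuming each block is simply a function; the specific block choices and their parameters are the subject of the following slides, and the block names in the usage comment refer to the sketches given there.

```python
def descriptor(patch, t_block, s_block, n_block):
    """Normalized Image Patch (w x h) -> T (w x h x k) -> S (m x k) -> N -> vector."""
    feat = t_block(patch)             # per-pixel length-k feature vectors
    pooled = s_block(feat)            # m spatial summation regions, shape (m, k)
    return n_block(pooled.ravel())    # normalized (m*k,) descriptor vector

# e.g. d = descriptor(patch, t_block_gradients, s_block_grid, n_block_sift)
# using the T/S/N block sketches below.
```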

T-Blocks

Normalized Image Patch (w x h) → T → S → N → Descriptor Vector
Transformation block options:
- Local gradients
- Steerable filters
- Isotropic filters
- Haar wavelets
- Local classifier
- Quantized intensities
Output: (w x h x k), i.e. one length-k vector per source pixel.
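A hedged sketch of a gradient-based T block: each pixel's gradient magnitude is split linearly between the two nearest of k orientation bins, producing the (w x h x k) output described above. The exact binning and smoothing used in the talk's gradient T blocks are not specified here, so treat the details as assumptions.

```python
import numpy as np

def t_block_gradients(patch, k=4):
    """Gradient T block: split each pixel's gradient magnitude linearly
    between the two nearest of k orientation bins -> (w, h, k) output."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)   # gradient orientation in [0, 2*pi)
    pos = ang * k / (2 * np.pi)              # continuous bin coordinate
    lo = np.floor(pos).astype(int) % k
    hi = (lo + 1) % k
    frac = pos - np.floor(pos)               # share given to the upper bin
    out = np.zeros(patch.shape + (k,))
    rows, cols = np.indices(patch.shape)
    np.add.at(out, (rows, cols, lo), mag * (1 - frac))
    np.add.at(out, (rows, cols, hi), mag * frac)
    return out
```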

S-Blocks

Normalized Image Patch (w x h) → T (w x h x k) → S (m x k) → N → Descriptor Vector
Spatial summation block with m regions (layouts S1, S2, S3, S4).
Output: m length-k vectors.
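A hedged sketch of a rectangular summation block in the spirit of S1: sum the per-pixel vectors over an n x n grid of square cells, giving m = n*n pooled length-k vectors. The polar layouts S2-S4 replace the grid with rings and wedges (and, in the best results, Gaussian weighting), which is not shown.

```python
import numpy as np

def s_block_grid(feat, n=4):
    """Rectangular summation block: feat (w, h, k) -> (n*n, k) region sums."""
    w, h, k = feat.shape
    ys = np.linspace(0, w, n + 1).astype(int)   # cell boundaries along rows
    xs = np.linspace(0, h, n + 1).astype(int)   # cell boundaries along columns
    regions = [feat[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].sum(axis=(0, 1))
               for i in range(n) for j in range(n)]
    return np.stack(regions)                    # shape (m, k) with m = n*n
```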

N-Blocks

Normalized Image Patch (w x h) → T (w x h x k) → S (m x k) → N (m x k) → Descriptor Vector
Normalization block options:
- Unit normalization
- SIFT normalization with clipping
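A minimal sketch of the two normalization options listed above; the 0.2 clipping threshold follows Lowe's SIFT and is the kind of quantity the framework can learn.

```python
import numpy as np

def n_block_unit(v, eps=1e-12):
    """Unit (L2) normalization."""
    return v / (np.linalg.norm(v) + eps)

def n_block_sift(v, clip=0.2, eps=1e-12):
    """SIFT-style normalization: normalize, clip large components, renormalize."""
    v = v / (np.linalg.norm(v) + eps)
    v = np.minimum(v, clip)   # clipping limits the influence of dominant components
    return v / (np.linalg.norm(v) + eps)
```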

Learning Descriptors

Example configuration: blocks T1a, S2, N2, with parameters to be learned.
Training pairs → descriptor distances → correct match % / incorrect match % → update parameters (Powell).

Powell minimisation is a variation on line search: perform a line search along each vector in a set of directions, then add the latest overall step as a new direction, throwing away the oldest direction vector.
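A hedged sketch of the parameter-update loop, using SciPy's Powell method as the direction-set optimiser described above. The toy parameterization (a single clipping threshold), the random stand-in data, and the error-measure implementation are illustrative assumptions; in the talk the parameters span the full T/S/N configuration.

```python
import numpy as np
from scipy.optimize import minimize

def toy_descriptor(patch, clip):
    """Stand-in descriptor with one learnable parameter (SIFT-style clip threshold)."""
    v = patch.astype(np.float64).ravel()
    v /= np.linalg.norm(v) + 1e-12
    v = np.minimum(v, clip)
    return v / (np.linalg.norm(v) + 1e-12)

def error_at_95(match_d, nonmatch_d):
    """Incorrect-match rate at the threshold accepting 95% of correct matches."""
    t = np.quantile(match_d, 0.95)
    return (nonmatch_d <= t).mean()

def objective(params, left, right, other):
    clip = params[0]
    dist = lambda x, y: np.linalg.norm(toy_descriptor(x, clip) - toy_descriptor(y, clip))
    match_d = np.array([dist(a, b) for a, b in zip(left, right)])
    nonmatch_d = np.array([dist(a, b) for a, b in zip(left, other)])
    return error_at_95(match_d, nonmatch_d)

# Stand-in training pairs: matching = perturbed copies, non-matching = unrelated patches.
rng = np.random.default_rng(0)
left = rng.random((200, 16, 16))
right = left + 0.05 * rng.random((200, 16, 16))
other = rng.random((200, 16, 16))

result = minimize(objective, x0=[0.2], args=(left, right, other), method='Powell')
print(result.x, result.fun)
```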

Testing Descriptors

Test pairs → descriptor distances → correct match % / incorrect match % → final error rate.

Error rates are reported as the incorrect-match rate at 95% correct matches.

Example of Parameter Learning

Results: Changing T-Blocks (k=4)

- The polar lattice S2 always has a lower error rate than the rectangular S1.
- Gradient and DOG T-blocks with S2 beat our SIFT reference (4% vs 6% error).

Results: Changing T-Blocks (k=8)

Results: Changing T-Blocks (k=16)

Steerable filters produce great results if phase information is kept.

Results: Changing S-Blocks

Results

- SIFT normalization is important.
- Best result: 4th-order steerable filters with phase information, combined with the polar S4-25 Gaussian summation block (2% error vs SIFT at 6%).
- This gives a very large number of dimensions, motivating dimension reduction.

Dimension Reduction: PCA
[ figure: PCA projection direction w_PCA ]
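A minimal PCA sketch, assuming a matrix of training descriptors: the projection w_PCA is built from the top principal directions of the mean-centred descriptor matrix. The target dimensionality here is an arbitrary choice, not a value from the talk.

```python
import numpy as np

def pca_projection(descriptors, n_dims=36):
    """descriptors: (n, d) -> projection matrix w_pca of shape (d, n_dims) and the mean."""
    mean = descriptors.mean(axis=0)
    centred = descriptors - mean
    _, _, vt = np.linalg.svd(centred, full_matrices=False)   # rows of vt = principal directions
    return vt[:n_dims].T, mean

# usage: w_pca, mean = pca_projection(D); reduced = (D - mean) @ w_pca
```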

Dimension Reduction: LDA
[ figure: LDA projection direction w_LDA ]
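A hedged sketch of an LDA-style projection w_LDA for this two-class (match vs non-match) setting: find directions along which non-matching descriptor differences are large relative to matching ones, via a generalized eigenproblem. This is a generic Fisher-style construction under assumed inputs, not necessarily the exact formulation used in the talk; the output dimensionality loosely follows the ~20-30 quoted in the results below.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(match_diffs, nonmatch_diffs, n_dims=24, reg=1e-6):
    """diffs: (n, d) descriptor differences for matching / non-matching pairs.
    Returns a (d, n_dims) projection maximizing non-match vs match scatter."""
    s_match = match_diffs.T @ match_diffs / len(match_diffs)
    s_nonmatch = nonmatch_diffs.T @ nonmatch_diffs / len(nonmatch_diffs)
    s_match += reg * np.eye(s_match.shape[0])   # regularize for invertibility
    vals, vecs = eigh(s_nonmatch, s_match)      # generalized symmetric eigenproblem
    order = np.argsort(vals)[::-1]              # largest scatter ratio first
    return vecs[:, order[:n_dims]]

# usage: w_lda = lda_projection(D_match, D_nonmatch); reduced = diffs @ w_lda
```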

Results: LDA on patches

- LDA on pixels ≈ SIFT (6% error)
- PCA gave a small improvement
- Need ~100,000 training examples

[ figures: normalised patches, gradient patches, effect of # of training pairs ]

Results: LDA on T blocks

- LDA on T1-T3 gives < 4.5% error
- Optimal #dimensions ~20-30
- Post-normalisation is important

[ figures: T1, T3 responses ]
T1 = gradients binned in 4 orientation bins; T3 = steerable filter magnitude response. LDA was applied using T blocks T1-T4.

Results: LDA on descriptors (LDA using the CVPR 07 descriptors)

- Overall best results
- #dimensions reduced from 100s to 10s
- Need a more challenging dataset!

Discussion: Image Descriptors

Normalized Image Patch → T (e.g. feature detector) → S (summation) → N (normalize) → Descriptor Vector, with stages ranging from complex to simple.

Conclusions
- Used learning to obtain good descriptors
- Achieved error rates 1/3 of SIFT's
- Produced a useful ground-truth data set

Future Work
- Use multi-view stereo ground truth
- Multi-level simple-complex architecture + non-parametric T blocks
- Learn interest point detectors

[ refs: 1) Winder, Brown CVPR 2007; 2) Hua, Brown, Winder ICCV 2007 ]
[email protected]
[ http://research.microsoft.com/ivm/hdview.htm ]