Learning Local Image Descriptors
Matthew Brown, University of British Columbia (prev. Microsoft Research)
[ Collaborators: † Simon Winder, * Gang Hua, † Rick Szeliski; † = MS Research, * = MS Live Labs ]
Applications @MSFT
Panoramic Stitching: Digital Image Pro, Windows Live Photo Gallery, Expression, HDView
3D Modelling: Photosynth, Virtual Earth
Location Recognition
Image Search: Lincoln
[ yellow = product, white = technology preview, grey = research ]
Photosynth
[ http://labs.live.com/photosynth ]

Photo Tourism [ Slide credit: Noah Snavely ]
Input photographs -> Scene reconstruction -> Photo Explorer
Scene reconstruction output: relative camera positions and orientations, point cloud, sparse correspondence
[ http://photour.cs.washington.edu ]
Photosynth is based on Photo Tourism [ Snavely, Seitz, Szeliski, SIGGRAPH 2006 ]
Photo Tourism uses SIFT for correspondence

Our system takes as input an unordered set of photos, either from an Internet search or from a large personal collection. We assume the photos are largely from the same static scene. The first step of our system is to apply computer vision techniques to reconstruct the geometry of the scene. The output of this procedure is the relative positions and orientations of the cameras used to take a connected set of the photographs, as well as a point cloud representing the geometry of the scene and a sparse set of correspondences between the photos. This information is then loaded into our interactive photo explorer tool.
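Correspondence from SIFT comes down to comparing descriptor vectors between photos. As a minimal sketch (not the Photo Tourism implementation), each descriptor in one image can be matched to its nearest neighbour in the other image under Euclidean distance. The function names are hypothetical, and brute-force search is used only for clarity.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour_matches(
    descs_a: List[Sequence[float]], descs_b: List[Sequence[float]]
) -> List[Tuple[int, int, float]]:
    """For each descriptor in image A, return (index in A, index of its
    nearest neighbour in image B, distance). Brute force: O(|A| * |B|)."""
    matches = []
    for i, da in enumerate(descs_a):
        j, dist = min(
            ((j, euclidean(da, db)) for j, db in enumerate(descs_b)),
            key=lambda pair: pair[1],
        )
        matches.append((i, j, dist))
    return matches


if __name__ == "__main__":
    image_a = [[0.0, 1.0], [1.0, 0.0]]
    image_b = [[0.9, 0.1], [0.1, 0.9]]
    for i, j, dist in nearest_neighbour_matches(image_a, image_b):
        print(f"A[{i}] -> B[{j}]  distance {dist:.3f}")
```

With these toy 2-D "descriptors", A[0] pairs with B[1] and A[1] with B[0], each at distance about 0.141.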
Multi-view stereo = training data [ Seitz et al., CVPR 2006; Goesele et al., ICCV 2007 ]

Learning Image Features
[ Photo Tourism: Snavely, Seitz, Szeliski, SIGGRAPH 2006 ]
3D Point Cloud
Problem Statement
Find a function of a local image patch, descriptor = f( patch ), such that a nearest neighbour classifier is optimal*
Nearest neighbour classifier: for simplicity + efficiency
* = measured by ROC curve
Q: What is the form of the descriptor function f(.)?
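The ROC curve plots correct match % against incorrect match % as the descriptor-distance threshold varies over matching and non-matching patch pairs; the testing stage later reduces this to a final error rate alongside a 95% figure. A minimal sketch, assuming that figure means the incorrect-match rate at the threshold that accepts 95% of the true matches, and assuming a pair is declared a match when its distance is at or below the threshold. The function name is hypothetical.

```python
import math
from typing import Sequence


def error_at_correct_rate(
    match_dists: Sequence[float],
    nonmatch_dists: Sequence[float],
    correct_rate: float = 0.95,
) -> float:
    """One point on the ROC curve: the fraction of non-matching pairs that are
    (wrongly) accepted when the distance threshold is set just high enough to
    accept `correct_rate` of the matching pairs.

    A pair is declared a match when its descriptor distance <= threshold.
    """
    ranked = sorted(match_dists)
    # How many true matches must be accepted; the small epsilon guards against
    # floating-point products such as 0.95 * n landing just above an integer.
    needed = max(1, math.ceil(correct_rate * len(ranked) - 1e-9))
    threshold = ranked[needed - 1]
    false_accepts = sum(1 for d in nonmatch_dists if d <= threshold)
    return false_accepts / len(nonmatch_dists)


if __name__ == "__main__":
    matching = [0.1 * i for i in range(1, 21)]   # 20 made-up matching-pair distances
    non_matching = [0.5, 1.2, 1.95, 2.5, 3.0]    # 5 made-up non-matching-pair distances
    print(error_at_correct_rate(matching, non_matching))
```

Here 19 of the 20 matching pairs must be accepted, so the threshold lands at 1.9 and two of the five non-matching pairs fall below it, giving an error rate of 0.4.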
Descriptor Algorithms
Normalized Image Patch -> Gradients quantized to k orientations -> Summation -> Normalize -> Descriptor Vector
[ SIFT: Lowe, ICCV 1999 ]

Descriptor Algorithms
Normalized Image Patch -> Gradients quantized to k orientations -> Summation -> Normalize (plus PCA) -> Descriptor Vector
[ GLOH: Mikolajczyk, Schmid, PAMI 2005 ]

Descriptor Algorithms
Normalized Image Patch -> Create edge map -> Summation -> Normalize -> Descriptor Vector
[ Shape Context: Belongie, Malik, Puzicha, NIPS 2000 ]

Descriptor Algorithms
Normalized Image Patch -> Feature detector (T) -> Summation (S) -> Normalize (N) -> Descriptor Vector
[ Geometric Blur: Berg, Malik, CVPR 2001 ]

Our Contribution
Normalized Image Patch -> T -> S -> N -> Descriptor Vector (with Parameters)
Propose a framework for descriptor algorithms
Learn parameters to find best performance
Train on a ground truth data set based on accurate 3D matches

T-Blocks
Normalized Image Patch (w x h) -> T -> (w x h x k) -> S -> N -> Descriptor Vector
Transformation block:
Local gradients
Steerable filters
Isotropic filters
Haar wavelets
Local classifier
Quantized intensities
Output: one length k vector per source pixel (w x h x k)

S-Blocks
Normalized Image Patch (w x h) -> T -> (w x h x k) -> S -> (m x k) -> N -> Descriptor Vector
Spatial summation block with m regions
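The T, S and N stages compose into a single descriptor function f(patch). A minimal sketch of one combination, under these assumptions: the T-block bins gradient magnitude into k orientation bins (the slides describe T1 as gradients binned in 4 orientation bins), using hard assignment to the nearest bin, since the exact binning of T1a is not given here; the S-block sums over a rectangular g x g grid in the spirit of the rectangular S1 layout, whose exact geometry is also not given; the N-block applies SIFT-style normalization with clipping, with the threshold 0.2 (Lowe's SIFT value) standing in for a learned parameter. All function names are hypothetical.

```python
import math
from typing import List

Patch = List[List[float]]          # h rows x w columns of intensities
TOutput = List[List[List[float]]]  # h x w x k


def t_block_gradients(patch: Patch, k: int = 4) -> TOutput:
    """T-block: one length-k vector per pixel. The gradient magnitude is put in
    the orientation bin nearest the gradient direction (hard assignment)."""
    h, w = len(patch), len(patch[0])
    bin_width = 2 * math.pi / k
    out = [[[0.0] * k for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(round(angle / bin_width)) % k
            out[y][x][b] += math.hypot(gx, gy)
    return out


def s_block_grid(t_out: TOutput, g: int = 2) -> List[List[float]]:
    """S-block: sum the T-block output over a g x g rectangular grid,
    giving m = g * g regions, each a length-k vector (m x k)."""
    h, w, k = len(t_out), len(t_out[0]), len(t_out[0][0])
    regions = [[0.0] * k for _ in range(g * g)]
    for y in range(h):
        for x in range(w):
            r = (y * g // h) * g + (x * g // w)
            for i in range(k):
                regions[r][i] += t_out[y][x][i]
    return regions


def _unit(v: List[float]) -> List[float]:
    """Scale v to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def n_block_sift(regions: List[List[float]], clip: float = 0.2) -> List[float]:
    """N-block: flatten to length m*k, unit-normalise, clip large entries at
    `clip`, then renormalise (SIFT-style normalization with clipping)."""
    v = _unit([value for region in regions for value in region])
    return _unit([min(x, clip) for x in v])


def descriptor(patch: Patch, k: int = 4, g: int = 2, clip: float = 0.2) -> List[float]:
    """Compose T -> S -> N into one descriptor function f(patch)."""
    return n_block_sift(s_block_grid(t_block_gradients(patch, k), g), clip)


if __name__ == "__main__":
    ramp = [[float(x) for x in range(8)] for _ in range(8)]  # brightens left to right
    print([round(v, 3) for v in descriptor(ramp)])
```

For the horizontal ramp every gradient points along +x, so only orientation bin 0 of each of the four regions is non-zero; the output has length m x k = 16. Swapping in another T, S or N function changes one stage without touching the others, which is what makes the parameters of each block learnable separately.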
Output: m length k vectors [ S-block variants shown: S1, S2, S3, S4 ]

N-Blocks
Normalized Image Patch (w x h) -> T -> (w x h x k) -> S -> (m x k) -> N -> (m x k) -> Descriptor Vector
Normalization block:
Unit normalization
SIFT normalization with clipping

Learning Descriptors
Training Pairs -> T1a -> S2 -> N2 (Parameters)
Descriptor Distances -> ROC (Correct Match % vs Incorrect Match %) -> Update Parameters (Powell)
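The parameter update uses Powell minimisation, which needs only function values and no gradients. A minimal sketch of Powell's direction-set method in pure Python: line-minimise along every direction in the set, then add the net step as a new direction and drop the oldest one. The bracketing plus golden-section line search is an assumption for illustration, not necessarily the line search used in this work, and the toy quadratic in the usage example stands in for the real training objective.

```python
import math
from typing import Callable, List

Vector = List[float]


def line_min(f: Callable[[Vector], float], x: Vector, d: Vector,
             tol: float = 1e-10) -> Vector:
    """Minimise f(x + t*d) over the scalar t and return the new point
    (assumes f is unimodal along the line)."""
    def g(t: float) -> float:
        return f([xi + t * di for xi, di in zip(x, d)])

    # Bracket: walk downhill from t = 0 with doubling steps until f rises.
    a, b = (1.0, 0.0) if g(1.0) > g(0.0) else (0.0, 1.0)
    fb = g(b)
    c = b + 2.0 * (b - a)
    fc = g(c)
    for _ in range(100):  # cap the expansion in case f is unbounded below
        if fc >= fb:
            break
        a, b, fb = b, c, fc
        c = b + 2.0 * (b - a)
        fc = g(c)
    lo, hi = min(a, c), max(a, c)

    # Golden-section search inside the bracket [lo, hi].
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        if hi - lo <= tol:
            break
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    t = 0.5 * (lo + hi)
    return [xi + t * di for xi, di in zip(x, d)]


def powell(f: Callable[[Vector], float], x0: Vector, iters: int = 50) -> Vector:
    """Powell's direction-set method: gradient-free minimisation of f."""
    n = len(x0)
    # Start with the coordinate axes as the direction set.
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(iters):
        start = list(x)
        for d in dirs:                    # line search along every direction
            x = line_min(f, x, d)
        step = [xi - si for xi, si in zip(x, start)]
        norm = math.sqrt(sum(s * s for s in step))
        if norm < 1e-8:                   # no real progress this sweep: stop
            break
        step = [s / norm for s in step]
        dirs = dirs[1:] + [step]          # drop the oldest, add the latest step
        x = line_min(f, x, step)
    return x


if __name__ == "__main__":
    # Toy objective standing in for the training error; minimum at (1, -1).
    def objective(p: Vector) -> float:
        return (p[0] - 1.0) ** 2 + 5.0 * (p[0] + p[1]) ** 2

    print([round(v, 4) for v in powell(objective, [0.0, 0.0])])
```

The coupled term (p[0] + p[1])^2 is what makes the added direction useful: after one sweep along the axes, the net step points roughly along the valley, and a single line search along it reaches the minimum near (1, -1).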
Powell minimisation: a variation on line search in which the latest step is added to a set of direction vectors. Do a line search along each of the direction vectors, then add the latest step to the direction set, throwing away the oldest direction vector.

Testing Descriptors
Test Pairs -> T1a -> S2 -> N2 (Parameters)
Descriptor Distances -> ROC (Correct Match % vs Incorrect Match %) -> Final Error Rate (incorrect match % at 95% correct matches)

Example of Parameter Learning
Results: Changing T-Blocks (k=4)
Polar lattice S2 always has a lower error rate than rectangular S1
Gradient and DoG T-blocks with S2 beat our SIFT reference (4% vs 6% error)
Results: Changing T-Blocks (k=8)
Results: Changing T-Blocks (k=16)
Steerable filters produce great results if phase information is kept

Results: Changing S-Blocks
Results
SIFT normalization is important
Best result: 4th order steerable filters with phase information combined with the polar S4-25 Gaussian summation block (2% error vs SIFT at 6%)
Very large numbers of dimensions

Dimension Reduction: PCA
[ figure: PCA projection direction w ]
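PCA picks the projection direction w along which the descriptors vary most, ignoring whether pairs match. A minimal sketch, assuming descriptors are plain lists of floats: build the covariance matrix and extract its leading eigenvector by power iteration. Function names are hypothetical; a practical version would use a linear-algebra library and keep several leading directions rather than one.

```python
import math
from typing import List

Matrix = List[List[float]]


def covariance(data: Matrix) -> Matrix:
    """Covariance matrix (d x d) of n descriptors given as rows of length d."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in data:
        c = [row[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def leading_pca_direction(data: Matrix, iters: int = 500) -> List[float]:
    """Unit vector w of largest variance, by power iteration on the covariance.
    Converges when the top eigenvalue is strictly largest and the all-ones
    start vector is not orthogonal to its eigenvector."""
    cov = covariance(data)
    d = len(cov)
    w = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along the current vector
            break
        w = [x / norm for x in w]
    return w


def project(data: Matrix, w: List[float]) -> List[float]:
    """Reduce each centred descriptor to a single coordinate along w."""
    n, d = len(data), len(w)
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [sum((row[j] - mean[j]) * w[j] for j in range(d)) for row in data]


if __name__ == "__main__":
    pts = [[-2.0, -4.1], [-1.0, -1.9], [0.0, 0.1], [1.0, 2.0], [2.0, 3.9]]
    w = leading_pca_direction(pts)
    print(w, project(pts, w))
```

The toy points lie close to the line y = 2x, so w comes out close to (1, 2) / sqrt(5).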
Dimension Reduction: LDA
[ figure: LDA projection direction w ]
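Unlike PCA, LDA uses labels, and here the labels come from matching and non-matching training pairs. One plausible form, assumed here rather than taken from the slides, is to choose w to maximise the projected spread of non-matching pair differences relative to matching pair differences, i.e. maximise (w^T A w) / (w^T B w), with A and B the scatter matrices of non-matching and matching differences. The best single w is then the dominant eigenvector of B^-1 A, found below by power iteration with a small ridge added to B. All names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def scatter(diffs: Sequence[Sequence[float]], ridge: float = 0.0) -> Matrix:
    """Sum of outer products v v^T over pair-difference vectors, plus ridge * I."""
    d = len(diffs[0])
    s = [[0.0] * d for _ in range(d)]
    for v in diffs:
        for i in range(d):
            for j in range(d):
                s[i][j] += v[i] * v[j]
    for i in range(d):
        s[i][i] += ridge
    return s


def solve(m: Matrix, b: List[float]) -> List[float]:
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def discriminant_direction(match_diffs, nonmatch_diffs, iters=200, ridge=1e-6):
    """Unit w maximising (w^T A w) / (w^T B w), A = non-match scatter,
    B = match scatter: the dominant eigenvector of B^-1 A via power iteration."""
    a = scatter(nonmatch_diffs)
    b = scatter(match_diffs, ridge)
    d = len(a)
    w = [1.0] * d
    for _ in range(iters):
        aw = [sum(a[i][j] * w[j] for j in range(d)) for i in range(d)]
        w = solve(b, aw)  # one step of w <- B^-1 A w
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
    return w


def fisher_ratio(w, match_diffs, nonmatch_diffs):
    """Projected non-match spread divided by projected match spread."""
    def spread(diffs):
        return sum(sum(wi * vi for wi, vi in zip(w, v)) ** 2 for v in diffs)
    return spread(nonmatch_diffs) / spread(match_diffs)


if __name__ == "__main__":
    match = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.1], [-0.5, -0.1]]
    nonmatch = [[1.0, 3.0], [-1.0, -3.0], [2.0, 1.0], [-2.0, 2.0]]
    w = discriminant_direction(match, nonmatch)
    print(w, fisher_ratio(w, match, nonmatch))
```

In the toy data, matching pairs differ almost only along the first axis, so w points almost entirely along the second axis, where matches stay close and non-matches spread out.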
Results: LDA on patches
LDA on pixels SIFT (6%)
PCA gave small improvement
Need ~100,000 training examples
[ figures: Normalised patches; Gradient patches; Effect of # of Training Pairs ]

Results: LDA on T blocks
LDA on T1-T3 < 4.5%
Optimal #dimensions ~20-30
Post-normalisation important
[ figures: T1, T3 ]
T1 = gradients binned in 4 orientation bins. T3 = steerable filter magnitude response

Results: LDA on T blocks
LDA using T blocks T1-T4

Results: LDA on descriptors
LDA using CVPR 07 descriptors
Overall best results
#dimensions reduced from 100s to 10s
Need more challenging dataset!

Discussion: Image Descriptors
Normalized Image Patch -> Feature Detector (T) -> Summation (S) -> Normalize (N) -> Descriptor Vector
[ diagram labels: complex, simple ]

Conclusions
Used learning to obtain good descriptors
Achieved error rates 1/3 of SIFT
Produced useful ground truth data set
Future Work
Use multi-view stereo ground truth
Multi-level simple-complex architecture + non-parametric T blocks
Learn interest point detectors
[ refs: 1) Winder, Brown, CVPR 2007; 2) Hua, Brown, Winder, ICCV 2007 ]
[ http://research.microsoft.com/ivm/hdview.htm ]