cse 252c: advanced computer visioncseweb.ucsd.edu/~mkchandraker/classes/cse252c/... · •...

Lecture2:Correspondence,MetricLearning

CSE252C:AdvancedComputerVision

ManmohanChandraker

CSE252C,SP20:ManmohanChandraker

Virtualclassrooms• VirtuallecturesonZoom

– Onlyhostsharesthescreen

– Keepvideooffandmicrophonemuted

– Butpleasedospeakup(remembertounmute!)

• VirtualinteractionsonZoom

– Askandanswerplentyofquestions

– “Raisehand”featureonZoomwhenyouwishtospeak

– Postquestionsonchatwindow

– TAwillhelpkeeptrackofraisedhandsandchatwindow

• LecturesrecordedanduploadonKaltura

– Availableunder“MyMedia”onCanvas

Enrollmentlogistics• Ifonwaitlist,needtosend“Requesttoenroll” email

– Statewhetheryousatisfythelistedpre-requisitecourserequirements

– Explainwhetheryouhaverequiredbackground(courses,projects)

• Ifnotclearedtoenroll,orwhileonthewaitlist

– Youarewelcometoattendlectures

– TolimitTAworkload,wecangradeonlyenrolledstudents

• AllannouncementswillbepostedonPiazza

– SendemailtoTA(CCinstructor)ifdidnotgetPiazzainvite

• MSComps:Hostedasthefinalexamofthecourse

• 2unitsorS-U

– SendinstructoranemailandCCTAs

– Assignmentsoptional,gradedonfinalexamsandpresentationCSE252C,SP20:ManmohanChandraker

Coursedetails• Classwebpage:

– http://cseweb.ucsd.edu/~mkchandraker/classes/CSE252C/Spring2020/

• Instructoremail:

– mkchandraker@eng.ucsd.edu

• TAs:Zhengqin LiandYou-YiJau

– Emails:zhl378@eng.ucsd.edu andyjau@eng.ucsd.edu

• Grading

– 10%presentation

– 60%assignments

– 30%finalexam

• Aimistolearntogether,discussandhavefun!

Coursedetails

• “Lightning”presentations

– Fourstudentstopresentinoneclass

– Timelimit:5minutes

– Paperstobeassignedbyinstructor

– Orderofpresentation:alphabetic

– https://docs.google.com/spreadsheets/d/1VcdhEKmvDqLFfYIYh7x6oO

AqiVNfdbc6JLD5omBpXIA/edit#gid=0

• Sendpresentation1daybeforeclass

– Well-practicedandfluentpresentation

– Includenarrationifasynchronous

– Askandanswerquestionsafterpresentation

Coursedetails

• Presentationformat(1slideforeach):

1.Motivationandproblemdescription

2.Priorwork

3.Methodoverview

4.Methodanalysis

5.Experiments

6.Futureworkanddiscussion

• SuperPoint:Self-Supervised InterestPointDetection and Description

– https://arxiv.org/abs/1712.07629

• WarpNet:Weakly SupervisedMatching for Single-viewReconstruction

• AnchorNet:AWeakly Supervised Networkto Learn Geometry-Sensitive

Featuresfor SemanticMatching

• SIFTFlow:DenseCorrespondenceacrossDifferentScenes

– http://people.csail.mit.edu/celiu/ECCV2008/

PapersforWed,Apr15

Correspondence

WhyDoWeHaveTwoEyes?

Depthinformationlost

inimageformation

Binocular(stereo)vision

enablesdepthestimation

Howdoweperceivedepthinimages?

Relateprojectionsofthesamepoint intwoormoreimagesofthescene.

Relateintensitiesofimagepointsintwoormoreimages.

Usepriorinformation intheformofsemantics,function,affordance.

Visualcorrespondenceaids3Dreconstruction

Akeyproblem in3Dreconstruction: relatesimilarelementsacrossasetofimages

Geometric Photometric Semantic

Simple matching methodsInterestpoint:• Localizedposition

• Informativeaboutimagecontent

• Repeatableundervariations

Simple matching methods

W1(x,y):k xk pixelpatchinimage1

Descriptor:• Function appliedonW1 andW2,to

enablecomparing them

Interestpoint:• Localizedposition

• Informativeaboutimagecontent

• Repeatableundervariations

Simple matching methods• SSD (Sum of Squared Differences)

• NCC (Normalized Cross Correlation)

• What advantages might NCC have over SSD?

(Mean) (Standarddeviation)

Featuredistance:threshold

• Thedistancethresholdaffectsperformance

• Onlymatcheswithdistancelessthanthresholdareallowed

• Truepositives=numberofdetectedmatchesthatarecorrect

– Supposewewanttomaximizethese— howtochoosethreshold?

• Falsepositives=numberofdetectedmatchesthatareincorrect

– Supposewewanttominimizethese— howtochoosethreshold?

feature distance

false match

true match

Choosingamatch:ratiotest

• Firstapproach:useSSD=

• Betterapproach:ratiodistance=

– f2 isbestSSDmatchtof1 inI2

– f2’issecondbestSSDmatchtof1 inI2

– Giveslargevalues(closeto1)forambiguousmatches.

f1 f2f2'

Desirableproperty:invariance

Findfeaturesthatareinvarianttotransformations

– geometricinvariance:translation, rotation,scale

– photometric invariance:brightness, exposure,…

Idea of SIFT n For better image matching, need to develop an interest

operator invariant to scale and rotation.

n Also, need a descriptor robust to typical variations. The descriptor is the most-used part of SIFT.

CSE 252C, SP20: Manmohan Chandraker [Adaptedfrom:D.Lowe]

Overall Procedure at a High Level1. Scale-space extrema detection

2. Keypoint localization

3. Orientation assignment

4. Keypoint description

Search over multiple scales and image locations.

Fit a model to determine location and scale.Select keypoints based on a measure of stability.

Compute best orientation(s) for each keypoint region.

Use local image gradients at selected scale and rotationto describe each keypoint region.

1. Scale-space extrema detectionn Goal: Identify locations and scales that can be

reliably assigned under different views of the same scene or object.

n Method: search for stable features across multiple scales using a continuous function of scale.

n The scale space of an image is a function L(x,y,s)that is produced from the convolution of a Gaussian kernel (at different scales) with the input image.

Aside: Gaussian Smoothing

A1DGaussianwithmean0,variance1

A2DGaussianwithmean(0,0),variance1

CSE 252C, SP20: Manmohan Chandraker

Example: Gaussian Smoothed Scale Space

• Scalespaceaxioms:linearity,shiftinvariant,rotationinvariant,nospuriousextrema• Gaussianfilteruniquely satisfiesaxioms

Aside: Image Pyramids

Bottom level is the original image.

2nd level is derived from theoriginal image according tosome function

3rd level derived from 2nd level according to same function

And so on.

Aside: Gaussian PyramidAt each level, image is smoothed and reduced in size.

Bottom level is the original image.

At 2nd level, each pixel is the resultof applying a Gaussian mask tothe first level and then subsamplingto reduce the size.

And so on.

Apply Gaussian filter

Example: Subsampling with Gaussian Pre-filtering

Scale Space Pyramid

s+2 filtersss+1=2(s+1)/ss0

.si=2i/ss0

.s2=22/ss0

s1=21/ss0

s+3imagesincludingoriginal

s+2differenceimages

The parameter s determines the number of images per octave.

2. Key point localization

n Detect maxima and minima of difference-of-Gaussian in scale space

n Each point is compared to its 8 neighbors in the current image and 9 neighbors each in the scales above and below For each max or min found,

output is the location andthe scale.

s+2 difference images.top and bottom ignored.s planes searched.

Difference of Gaussiansn Scale-space detection

q Find local maxima across scale/spaceq A good “blob” detector

[ T. Lindeberg IJCV 1998 ]CSE 252C, SP20: Manmohan Chandraker

3. Orientation assignment

n Create histogram of local gradient directions at selected scale

n Assign canonical orientation at peak of smoothed histogram

If 2 major orientations, use both.

Keypoint localization with orientation

729536

233x189keypoints

keypoints aftergradient threshold

keypoints afterratio threshold

4. Keypoint Descriptors

n At this point, each keypoint hasq locationq scaleq orientation

n Next is to compute a descriptor for the local image region about each keypoint that isq highly distinctiveq as invariant as possible to variations such as changes in

viewpoint and illumination

Normalizationn Rotate the window to standard orientation

n Scale the window size based on the scale at which the point was found.

SIFT Keypoint Descriptor(shown with 2 X 2 descriptors)

In implementation, 4x4 arrays of 8 bin histogram are used, a total of 128 features for one keypoint.

SIFT for Correspondence

[Codeandtutorial:https://www.vlfeat.org/overview/sift.html]CSE 252C, SP20: Manmohan Chandraker

CSE 252C, SP20: Manmohan Chandraker[Codeandtutorial:https://www.vlfeat.org/overview/sift.html]

Uses for SIFT

n Feature points are used also for:q Panorama stitchingq 3D reconstruction q Motion trackingq Object recognitionq Indexing and database retrievalq Robot navigationq … many others

Cases where SIFT does not work

q Strong illumination changesq Large out-of-plane rotationsq Non-rigid deformations or articulationsq Semantic correspondence

LearningCorrespondence

Similar?

Image courtesy of [Vedaldi et al., VLFeat]

Measuringpatchsimilarity

• HandDesignedFeatures

- SIFT,SURF,ORB,FREAK,DAISY

• NeuralNetworks

- ComparePatches [Zagoruyko.etal.]

- MatchNet [Hanetal.]

- SeebyMoving [Agrawaletal.]

- StereoMatchingCost [Zbontaretal.]

Agrawal et al., ICCV 2015, Han et al., CVPR 2015, Zagoruyko et al., CVPR 2015, Zbontar et al., CVPR 2015

Similarity

CNN CNN

FC Layers

• GlobalOptimization

- Flow,DSP,etc.

Revaud et al., ICCV 2013, Liu et al. PAMI 2011Lowe et al., ICCV 1999, Bay et al., ECCV 2006, Rublee et al., ICCV 2011, Alahi et al., ICCV 2012, Tola et al., CVPR 2008

Measuringpatchsimilarity

SpatiallocalizationinclassificationCNNs

[Long et al., “Do Convnets Learn Correspondence?”, NeurIPS 2014]

SpatiallocalizationinclassificationCNNs

Idea:• Siamesenetworktodecidepatchsimilarity

• Useintermediateactivationsasfeatures.

Advantages:• Easytotrain

Issues:• Inefficient:extractfeaturesforoverlapping

regionswithinpatches

• O(n2)feed-forwardpassestocomparenpatchesineachimage.

[Zagoruyko andKomodakis, CVPR2015]

CNNsforlearningcorrespondence

- Learn a feature space directly optimized for correspondence

- Intermediate activations of patch similarity are surrogate features

- Mapping an image to a metric space- Metric Space: Distance relationship = Class membership

Needforametricincorrespondencelearning

MetricLearning

Metric

MetricLearning

• Ametricquantifiesdistancebetweenanytwomembersofaset

• Inducesasimilaritymeasure

Metric

• Usefulfor,say,k-nearestneighborsclassification• Foreverytrainingsample,wantk closestsamplestobefromthesameclass

Nearestneighbors

• Nearestneighborclassification:

• Assignlabelofnearesttrainingdatapointtoeachtestdatapoint

Voronoi partitioning offeaturespacefor2-category2Ddata

[FigurefromDuda etal.]

Noveltestexample

Closesttoapositiveexamplefromthe

trainingset,soclassify

itaspositive.

Black=negativeRed=positive

Nearestneighbors

• Extendtok-nearestneighborclassification:• Foranewpoint,findthekclosestpointsfromtrainingdata

• Labelsofthekpoints“vote”toclassify

Ifquery landshere,5NNconsistof3negatives

and2positives, sowe

classifyasnegative.

Black=negativeRed=positive

[FigurefromD.Lowe]CSE252C,SP20:ManmohanChandraker

Nearestneighbors

• Advantages:• Simpletoimplement

• Flexibletofeatureordistancechoices

• Candowellinpracticewithenoughrepresentativedata

Nearestneighbors

• Advantages:• Simpletoimplement

• Flexibletofeatureordistancechoices

• Candowellinpracticewithenoughrepresentativedata

• Limitations:• Largesearchproblemtofindnearestneighbors

• Storageofdata

• MustknowwehaveameaningfuldistancefunctionCSE252C,SP20:ManmohanChandraker

Metric

• Propertiesofametric

• Non-negativity:! ", $ ≥ 0• Identity:! ", $ = 0 ifandonlyif" = $• Symmetry: ! ", $ = !($, ")• Triangleinequality:! ", $ + ! $, + ≥ !(", +)

• Exampleofmetric:! ", $ =∥ " − $ ∥• Exampleofnon-metric:! ", $ = "2$2

Metric

Learning

Metric

• Sometimes,thechoiceofmetricistrivial

• Depthestimation:distancesin3D,soEuclideandistanceagoodmetric

Metric

• Sometimes,thechoiceofmetricistrivial

• Othertimes,itisnotimmediatelyclearhowtodefineametric

• Findthetwofacesthatareclosesttoeachother

Metric:distancesinfeaturespace

• Trainingdatacomposedof(image,label)pairs

• (Image1,dog),(Image2,cat),(Image3,dog),....

• Classifytestimageasdogorcat

• Representimagesasfeatures,canfinddistancebetweenfeatures

Feature1

Feature2

• Mapnewimagetofeaturespace,finddistances

??Feature1

Feature2

Feature1

Feature2

• Notalldimensionsareequallyuseful!

Feature1

Feature2

Useful

Notuseful

• Notalldimensionsareequallyuseful!

• Idea: changehowwemeasuredistance

Feature1

Feature2

Useful

Notuseful

Metriclearning

• Givendata,learnametricM thathelpspredictionusingadistancefunction

• Goal: FindametricM suchthat

• Samplesfromsameclass(positives)havelow01• Samplesfromdifferentclasses(negatives,impostors)havehigh01

• Approach: Poseasanoptimizationproblem

Metriclearning

• Createtwosets,similarpairsS anddissimilarpairsD

Metriclearning

• Createtwosets,similarpairsS anddissimilarpairsD• FindM suchthat

Metriclearning

• MinimizeanobjectivefunctionwithrespecttoM:

Metriclearning

• MinimizeanobjectivefunctionwithrespecttoM:

• Anoptimizationproblem:findstationarypoints!

Metriclearning

• Advancedvariant:

• Convexoptimization(semidefiniteprogram),efficientlysolvable!

Largemarginnearestneighbors

• Targetneighbors ofpointxi:pointsxjwithsimilarlabels

• Impostors arepointsxl closetoxi,butwithdifferentlabels• Goal: pull targetscloser,push impostorsfartheraway

• Targetneighbors ofpointxi:pointsxjwithsimilarlabels

• Impostors arepointsxl closetoxi,butwithdifferentlabels• Goal: pull targetscloser,push impostorsfartheraway

• Givenpointi,similarpointsj,impostorpointsl :

Localconstraints:directlyimproveneighbor quality

Testimage

Largemarginnearestneighbor

Euclideandistancenearestneighbor

Testimage

Largemarginnearestneighbor

Euclideandistancenearestneighbor

[Weinbergeretal.,2009]

Generalrecipeformetriclearning

LDAasmetriclearning

SiameseCNNsfordistancemetriclearning

• AdvantageofLDA:linearproblem!

• AdvantageofLMNN:convexproblem!

• Disadvantages:featurerepresentationislearnedindependentofthemetric

• AdvantageofCNNs:canoptimizefeaturesconditionedonsimilaritymeasure

SiameseCNNsfordistancemetriclearning

• AdvantageofLDA:linearproblem!

• AdvantageofLMNN:convexproblem!

• Disadvantages:featurerepresentationislearnedindependentofthemetric

• AdvantageofCNNs:canoptimizefeaturesconditionedonsimilaritymeasure

SiameseCNNforpatchsimilarity

||D(x1)– D(x2)||2

Simo-Serra,E.,Trulls,E.,Ferraz,L.,Kokkinos, I.,Fua,P.andMoreno-Noguer,F.,2015.Discriminativelearningofdeepconvolutional

featurepointdescriptors.In ProceedingsoftheIEEEInternationalConferenceonComputerVision (pp.118-126).CSE252C,SP20:ManmohanChandraker

Issuewithusingthislossfortraining?

LossfunctionforSiameseCNNs

Thefinallossisdefinedas:

L=∑lossofpositivepairs+∑lossofnegativepairs

Bell,S.andBala,K.,2015.Learningvisual similarityforproductdesignwithconvolutional neuralnetworks.ACMTransactions onGraphics(TOG),34(4), p.98.

Wecanusedifferentlossfunctionsforthetwotypesofinputpairs.

• Typicalpositivepair (xp,xq)loss:L(xp,xq)=||xp – xq||2

(EuclideanLoss)

Bell,S.andBala,K.,2015.Learningvisual similarityforproductdesignwithconvolutional neuralnetworks.ACMTransactions onGraphics(TOG),34(4), p.98.

• Typicalnegativepair(xn,xq)loss:

L(xn,xq)=max(0,m2 - ||xn – xq||2) (HingeLoss)

• Combinedintoacontrastiveloss

• Forpairoftrainingexamplesx1 andx2 (withlabelsy1,y2):

L(x1,x2)=s12||x1 – x2||2 +(1– s12)max(0,m2 - ||x1 – x2||

s12 =1wheny1 =y2,

otherwises12 =0.

TripletLossforMetricLearning

• Ateachtrainingiteration,sampleamini-batchoftriplets

• Goal:pushnegativeamarginfurtherfromanchorthanpositive

• Tripletloss:

Anchor Positive Negative

• Tripletloss:

• Gradients:

• Tripletloss:

• Gradients:

• Issues:• Fixedparameterm,whileintra-classdistancescanvary

• Inefficienttoconsideralltriplets

AlternativestoSiamesenetworks

cse 252c: advanced computer visioncseweb.ucsd.edu/~mkchandraker/classes/cse252c/... · •...

Documents

model-based...

texture synthesis and image inpainting for video...

castro%252c correa y santiago%252c 1999. (2).pdf

video paragraph captioning using hierarchical recurrent...

antón r. escobedo cse 252c behavior recognition via sparse...

excavator auger - land pride · hydraulic motor (850-252c)...

ii se manana de la gastronomia%252c turismo (2)octubre 2017

cse 152, spring 2015 introduction to computer...

cse 152, spring 2017 introduction to computer...

demystifying medicine -- ebola%252c mers and the...

generative adversarial networks, and...

plato and the psycology of love. phaedrus 252c-253c.pdf

public health - 2020-21 berkeley academic...

bramble: the bayesian multiple-blob tracker by michael isard...

第一章调查介绍 · web view单位名称...

a administraÇÃo espanhola no espÍrito santo...

brockville campus site layout - learning at the centre ·...

edge image description using angular radial...

cse 252c: advanced computer...

active learning for visual object...