10/21/10
Object Recognition and Augmented Reality. 10/21/10. Dali, Swans Reflecting Elephants. Computational Photography, Derek Hoiem, University of Illinois. Last class: Image Stitching. Detect keypoints. Match keypoints. Use RANSAC to estimate homography. Project onto a surface and blend.
![Page 1: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/1.jpg)
10/21/10
Object Recognition and Augmented Reality
Computational Photography
Derek Hoiem, University of Illinois
Dali, Swans Reflecting Elephants
![Page 2: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/2.jpg)
Last class: Image Stitching
1. Detect keypoints
2. Match keypoints
3. Use RANSAC to estimate homography
4. Project onto a surface and blend
![Page 3: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/3.jpg)
Augmented reality
• Adding fake objects/textures to real images
– Project by Karen Liu
• Interact with object in scene
– Responsive characters in AR
• Overlay information on a display
– Tagging reality
– Layar
– Google goggles
– T2 video (13:23)
![Page 4: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/4.jpg)
Adding fake objects to real video
Approach
1. Recognize and/or track points that give you a coordinate frame
2. Apply homography (flat texture) or perspective projection (3D model) to put object into scene
Main challenge: dealing with lighting, shadows, occlusion
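As a rough illustration of the flat-texture case in step 2, the sketch below maps the corners of a texture through a 3x3 homography to find where they land in the scene image. The function names and the example matrix are assumptions made for illustration; in practice the homography would be estimated from the tracked points.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (a list of three rows)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the homogeneous coordinate to return to image coordinates.
    return (u / w, v / w)


def project_texture_corners(H, width, height):
    """Return the scene positions of the four corners of a width x height texture."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return [apply_homography(H, c) for c in corners]


# Hypothetical homography: scale by 2 and shift by (10, 20).
H_example = [[2, 0, 10], [0, 2, 20], [0, 0, 1]]
print(project_texture_corners(H_example, 4, 3))
```

Filling the quadrilateral between the projected corners with texture pixels, and handling lighting and occlusion, is outside this sketch.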
![Page 5: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/5.jpg)
Information overlay
Approach
1. Recognize object that you’ve seen before
2. Retrieve info and overlay
Main challenge: how to match reliably and efficiently?
![Page 6: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/6.jpg)
Today
How to quickly find images in a large database that match a given image region?
![Page 7: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/7.jpg)
Let’s start with interest points
Query Database
Compute interest points (or keypoints) for every image in the database and the query
![Page 8: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/8.jpg)
Simple idea
See how many keypoints are close to keypoints in each other image
Lots of Matches
Few or No Matches
But this will be really, really slow!
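The simple idea can be sketched as a brute-force scan, which also shows why it is slow: every query descriptor is compared against every descriptor of every database image. The distance threshold `max_dist` and the function names are assumptions for illustration, not part of the lecture.

```python
import math


def count_close_keypoints(query_descs, image_descs, max_dist):
    """Count query descriptors whose nearest descriptor in the image is within max_dist."""
    if not image_descs:
        return 0
    matches = 0
    for q in query_descs:
        nearest = min(math.dist(q, d) for d in image_descs)
        if nearest <= max_dist:
            matches += 1
    return matches


def rank_database(query_descs, database, max_dist):
    """Rank database images (dict: name -> descriptor list) by number of close keypoints."""
    scores = {
        name: count_close_keypoints(query_descs, descs, max_dist)
        for name, descs in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The work grows with (query keypoints) x (database keypoints), which is what the rest of the lecture sets out to avoid.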
![Page 9: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/9.jpg)
Key idea 1: “Visual Words”
• Cluster the keypoint descriptors
![Page 10: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/10.jpg)
Key idea 1: “Visual Words”
K-means algorithm
Illustration: http://en.wikipedia.org/wiki/K-means_clustering
1. Randomly select K centers
2. Assign each point to nearest center
3. Compute new center (mean) for each cluster
![Page 11: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/11.jpg)
Key idea 1: “Visual Words”
K-means algorithm
Illustration: http://en.wikipedia.org/wiki/K-means_clustering
1. Randomly select K centers
2. Assign each point to nearest center
3. Compute new center (mean) for each cluster
4. Go back to step 2 and repeat until the assignments stop changing
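The K-means steps listed on this slide can be sketched in plain Python as follows. The `seed` and `max_iters` parameters and the stopping test (assignments unchanged) are assumptions added to make the sketch deterministic and guaranteed to terminate.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster a list of equal-length tuples into k groups; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # 1. randomly select K centers
    assignment = None
    for _ in range(max_iters):
        # 2. assign each point to its nearest center
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the centers would not move either
        assignment = new_assignment
        # 3. compute the new center (mean) of each cluster
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, assignment
```

For visual words, `points` would be the keypoint descriptors collected from the database images, and the resulting centers form the vocabulary.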
![Page 12: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/12.jpg)
Key idea 1: “Visual Words”
• Cluster the keypoint descriptors
• Assign each descriptor to a cluster number
– What does this buy us?
– Each descriptor was 128-dimensional floating point, now is 1 integer (easy to match!)
– Is there a catch?
• Need a lot of clusters (e.g., 1 million) if we want points in the same cluster to be very similar
• Points that really are similar might end up in different clusters
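A minimal sketch of the assignment step, assuming the cluster centers are already available as a list of tuples (the name `quantize` is illustrative). The second call shows the catch mentioned above: two nearly identical descriptors on opposite sides of a cluster boundary receive different word ids.

```python
import math


def quantize(descriptor, centers):
    """Return the index of the nearest cluster center, i.e. the visual word id."""
    return min(range(len(centers)), key=lambda i: math.dist(descriptor, centers[i]))


centers = [(0.0, 0.0), (10.0, 0.0)]   # toy 2-D vocabulary with two words
print(quantize((4.9, 0.0), centers))  # falls on the side of word 0
print(quantize((5.1, 0.0), centers))  # very similar descriptor, but word 1
```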
![Page 13: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/13.jpg)
Key idea 1: “Visual Words”
• Cluster the keypoint descriptors
• Assign each descriptor to a cluster number
• Represent an image region with a count of these “visual words”
![Page 14: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/14.jpg)
Key idea 1: “Visual Words”
• Cluster the keypoint descriptors
• Assign each descriptor to a cluster number
• Represent an image region with a count of these “visual words”
• An image is a good match if it has a lot of the same visual words as the query region
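One way to make "a lot of the same visual words" concrete is to count word occurrences per image and measure the overlap between two counts. The overlap measure used here (histogram intersection) is an assumption chosen for simplicity; the slides do not specify a particular score at this point.

```python
from collections import Counter


def bag_of_words(word_ids):
    """Represent an image region as a count of its visual word ids."""
    return Counter(word_ids)


def shared_word_count(query_bag, image_bag):
    """Histogram intersection: how many word occurrences the two bags have in common."""
    return sum(min(count, image_bag[word]) for word, count in query_bag.items())


query = bag_of_words([1, 1, 2, 3])
image = bag_of_words([1, 2, 2, 4])
print(shared_word_count(query, image))
```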
![Page 15: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/15.jpg)
Naïve matching is still too slow
• Imagine matching 1,000,000 images, each with 1,000 keypoints
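To put rough numbers on this, the sketch below counts the work for one query under two assumptions that are not stated on the slide: the query also has 1,000 keypoints, and a naive visual-word scan touches every word of every database image once.

```python
def naive_matching_cost(num_images, keypoints_per_image, query_keypoints):
    """Return (raw descriptor comparisons, visual-word lookups) for one full database scan."""
    # Raw descriptors: every query keypoint is compared to every database keypoint.
    descriptor_comparisons = num_images * keypoints_per_image * query_keypoints
    # Visual words: one pass over each image's word list.
    word_lookups = num_images * keypoints_per_image
    return descriptor_comparisons, word_lookups


print(naive_matching_cost(1_000_000, 1_000, 1_000))
# (1000000000000, 1000000000)
```

Even the cheaper visual-word scan visits a billion entries per query, which motivates an index that only touches images sharing a word with the query.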
![Page 16: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/16.jpg)
Key Idea 2: Inverted file index
• Like a book index: keep a list of all the words (keypoints) and all the pages (images) that contain them.
• Rank database images based on tf-idf measure.
tf-idf: Term Frequency – Inverse Document Frequency
$$\text{tf-idf} = \frac{\#\,\text{times word appears in document}}{\#\,\text{words in document}} \cdot \log\frac{\#\,\text{documents}}{\#\,\text{documents that contain the word}}$$
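The book-index idea and the tf-idf ranking can be sketched together. Each visual word maps to the images that contain it, so a query only visits images sharing at least one word. The scoring below, a sum of tf × idf over the distinct query words found in an image, is a simplifying assumption; published systems typically compare normalized tf-idf vectors instead.

```python
import math
from collections import Counter, defaultdict


def build_inverted_file(database):
    """database: dict image_id -> list of visual word ids.
    Returns word -> {image_id: term frequency of that word in the image}."""
    index = defaultdict(dict)
    for image_id, words in database.items():
        for word, n in Counter(words).items():
            index[word][image_id] = n / len(words)
    return index


def rank_by_tfidf(query_words, index, num_images):
    """Score only the images that share at least one word with the query."""
    scores = defaultdict(float)
    for word in set(query_words):
        postings = index.get(word, {})
        if not postings:
            continue  # word never seen in the database
        idf = math.log(num_images / len(postings))
        for image_id, tf in postings.items():
            scores[image_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


database = {"a": [1, 1, 2], "b": [2, 3], "c": [3, 3, 3]}
index = build_inverted_file(database)
print(rank_by_tfidf([2], index, len(database)))
```

Note that a word appearing in every image gets idf = log(1) = 0 and so contributes nothing to the ranking.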
![Page 17: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/17.jpg)
Fast visual search
“Scalable Recognition with a Vocabulary Tree”, Nister and Stewenius, CVPR 2006.
“Video Google”, Sivic and Zisserman, ICCV 2003
![Page 18: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/18.jpg)
Slide credit: Nister
110,000,000 Images in 5.8 Seconds
![Page 19: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/19.jpg)
Slide credit: Nister
![Page 20: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/20.jpg)
Slide credit: Nister
![Page 21: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/21.jpg)
Slide credit: Nister
![Page 22: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/22.jpg)
Recognition with K-tree
Following slides by David Nister (CVPR 2006)
![Page 23: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/23.jpg)
![Page 24: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/24.jpg)
![Page 25: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/25.jpg)
![Page 26: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/26.jpg)
![Page 27: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/27.jpg)
![Page 28: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/28.jpg)
![Page 29: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/29.jpg)
![Page 30: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/30.jpg)
![Page 31: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/31.jpg)
![Page 32: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/32.jpg)
![Page 33: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/33.jpg)
![Page 34: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/34.jpg)
![Page 35: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/35.jpg)
![Page 36: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/36.jpg)
![Page 37: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/37.jpg)
![Page 38: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/38.jpg)
![Page 39: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/39.jpg)
![Page 40: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/40.jpg)
![Page 41: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/41.jpg)
![Page 42: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/42.jpg)
![Page 43: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/43.jpg)
![Page 44: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/44.jpg)
![Page 45: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/45.jpg)
![Page 46: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/46.jpg)
Performance
![Page 47: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/47.jpg)
More words is better
• Improves retrieval
• Improves speed
[Plot axis: branch factor]
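The branch factor refers to the vocabulary tree of Nister and Stewenius cited earlier, which is built by hierarchical k-means. A descriptor is quantized by descending the tree and picking the nearest child at each level, so a tree with branch factor k and depth L has k^L leaf words but needs only about k × L comparisons per descriptor. The `Node` class and the toy 1-D tree below are assumptions for illustration, not the paper's implementation.

```python
import math


class Node:
    """A vocabulary tree node; leaves carry a visual word id."""

    def __init__(self, center, children=None, word_id=None):
        self.center = center
        self.children = children or []
        self.word_id = word_id


def lookup_word(root, descriptor):
    """Descend to a leaf; return (word id, number of distance computations)."""
    node = root
    comparisons = 0
    while node.children:
        comparisons += len(node.children)
        node = min(node.children, key=lambda child: math.dist(descriptor, child.center))
    return node.word_id, comparisons


# Toy tree: branch factor 2, depth 2, so 4 words and 4 comparisons per lookup.
left = Node((0.0,), [Node((-1.0,), word_id=0), Node((1.0,), word_id=1)])
right = Node((10.0,), [Node((9.0,), word_id=2), Node((11.0,), word_id=3)])
root = Node((5.0,), [left, right])
print(lookup_word(root, (10.6,)))
```

With k = 10 and L = 6, for example, this gives 1,000,000 words at a cost of roughly 60 comparisons per descriptor.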
![Page 48: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/48.jpg)
Can we be more accurate?
So far, we treat each image as containing a “bag of words”, with no spatial information
[Figure: a query region and two candidate images, each drawn as visual words (a, e, f, h, z) at different positions]
Which matches better?
![Page 49: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/49.jpg)
Can we be more accurate?
So far, we treat each image as containing a “bag of words”, with no spatial information
Real objects have consistent geometry
![Page 50: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/50.jpg)
Final key idea: geometric verification
• Goal: Given a set of possible keypoint matches, figure out which ones are geometrically consistent
How can we do this?
![Page 51: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/51.jpg)
Final key idea: geometric verification
RANSAC for affine transform
[Figure: candidate matches between visual words (a, e, f, z) in two images, related by an affine transform]
Repeat N times:
1. Randomly choose 3 matching pairs
2. Estimate transformation
3. Predict remaining points and count “inliers”
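The RANSAC loop on this slide can be sketched as follows. An affine transform has six unknowns, so three matching pairs determine it exactly; here the two 3x3 linear systems are solved with Cramer's rule. The iteration count `n_iters`, the pixel tolerance `tol`, and all function names are assumptions for illustration.

```python
import math
import random


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule; None if (nearly) singular."""
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(pairs):
    """Estimate x' = a*x + b*y + c, y' = d*x + e*y + f from 3 (source, target) pairs."""
    m = [[src[0], src[1], 1.0] for src, _ in pairs]
    row_x = solve3(m, [dst[0] for _, dst in pairs])
    row_y = solve3(m, [dst[1] for _, dst in pairs])
    if row_x is None or row_y is None:
        return None  # the three source points are collinear
    return row_x, row_y


def apply_affine(A, p):
    (a, b, c), (d, e, f) = A
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def ransac_affine(matches, n_iters=200, tol=2.0, seed=0):
    """Return the largest set of matches consistent with a single affine transform."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        A = fit_affine(rng.sample(matches, 3))   # choose 3 pairs, estimate transform
        if A is None:
            continue
        # Predict every point with the estimate and keep those that land close.
        inliers = [m for m in matches if math.dist(apply_affine(A, m[0]), m[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

A database image whose candidate matches yield many inliers is geometrically consistent with the query; one with only a handful is likely a bag-of-words false positive.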
![Page 52: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/52.jpg)
Application: Large-Scale Retrieval
K. Grauman, B. Leibe
[Philbin CVPR’07]
Query Results on 5K (demo available for 100K)
![Page 53: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/53.jpg)
Application: Image Auto-Annotation
K. Grauman, B. Leibe
Left: Wikipedia image
Right: closest match from Flickr
[Quack CIVR’08]
Moulin Rouge
Tour Montparnasse
Colosseum
Viktualienmarkt Maypole
Old Town Square (Prague)
![Page 54: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/54.jpg)
Example Applications
B. Leibe
Mobile tourist guide
• Self-localization
• Object/building recognition
• Photo/video augmentation
Aachen Cathedral
[Quack, Leibe, Van Gool, CIVR’08]
![Page 55: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/55.jpg)
Video Google System
1. Collect all words within query region
2. Inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
Sivic & Zisserman, ICCV 2003
• Demo online at : http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
K. Grauman, B. Leibe
Query region
Retrieved frames
![Page 56: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/56.jpg)
Summary: Uses of Interest Points
• Interest points can be detected reliably in different images at the same 3D location
– DoG interest points are localized in x, y, scale
• SIFT is robust to rotation and small deformation
• Interest points provide correspondence
– For image stitching
– For defining coordinate frames for object insertion
– For object recognition and retrieval
![Page 57: 10/21/10](https://reader035.vdocuments.mx/reader035/viewer/2022062222/56815290550346895dc0b345/html5/thumbnails/57.jpg)
Announcements
• Project 4 is due Monday
• I’ll be out of town next Tues
– Kevin Karsch will talk about his cool work
• I’ll still be here for office hours Mon (but leaving soon after for DC)