![Page 1: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/1.jpg)
Visual Grouping and Recognition
Jitendra Malik
University of California at Berkeley
![Page 2: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/2.jpg)
Collaborators
• Grouping: Jianbo Shi (CMU), Serge Belongie, Thomas Leung (Compaq CRL)
• Ecological Statistics: Charless Fowlkes, David Martin, Xiaofeng Ren
• Recognition: Serge Belongie, Jan Puzicha
![Page 3: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/3.jpg)
From images to objects
Labeled sets: tiger, grass, etc.
![Page 4: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/4.jpg)
What enables us to parse a scene?
– Low-level cues
  • Color/texture
  • Contours
  • Motion
– Mid-level cues
  • T-junctions
  • Convexity
– High-level cues
  • Familiar object
  • Familiar motion
![Page 5: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/5.jpg)
Grouping factors
![Page 6: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/6.jpg)
But is segmentation a meaningful problem?
• Difficult to define formally, but humans are remarkably consistent…
![Page 7: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/7.jpg)
Human Segmentations (1)
![Page 8: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/8.jpg)
Human Segmentations (2)
![Page 9: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/9.jpg)
Consistency
[Figure: three human segmentations of the same image, labeled A, B, C]
• A, C are refinements of B
• A, C are mutual refinements
• A, B, C represent the same percept
• Attention accounts for differences
Perceptual organization forms a tree:
[Figure: segmentation tree rooted at Image, with children BG, L-bird, R-bird; lower-level labels include grass, bush, far, head, eye, beak, body]
Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
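The refinement and consistency relations above can be illustrated with a small check. This is only a sketch under stated assumptions: a segmentation is encoded as a flat list of integer labels, one per pixel, and two segmentations are treated as consistent when every pair of segments is either disjoint or nested (so both fit into one tree). The encoding and the function names are hypothetical, not taken from the talk.

```python
from collections import defaultdict


def segments(labels):
    """Group pixel indices by label and return a list of pixel-index sets."""
    groups = defaultdict(set)
    for pixel, label in enumerate(labels):
        groups[label].add(pixel)
    return list(groups.values())


def is_refinement(fine, coarse):
    """True if every segment of `fine` lies inside a single segment of `coarse`."""
    coarse_segs = segments(coarse)
    return all(any(s <= c for c in coarse_segs) for s in segments(fine))


def tree_consistent(a, b):
    """True if every pair of segments (one from each) is disjoint or nested."""
    for sa in segments(a):
        for sb in segments(b):
            if sa & sb and not (sa <= sb or sb <= sa):
                return False
    return True


# Toy one-dimensional "image" with 6 pixels (made-up labels).
B = [0, 0, 0, 1, 1, 1]   # coarse: two segments
A = [0, 0, 2, 1, 1, 1]   # splits the left segment of B
C = [0, 0, 0, 1, 3, 3]   # splits the right segment of B
X = [0, 0, 4, 4, 1, 1]   # segment 4 straddles the boundary in B

print(is_refinement(A, B), is_refinement(C, B))    # both refine B
print(is_refinement(A, C), tree_consistent(A, C))  # not a refinement, yet consistent
print(tree_consistent(X, B))                       # inconsistent with B
```

Here A and C each refine B in a different place, so neither refines the other, but they still share a common tree.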
![Page 10: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/10.jpg)
Ecological Statistics of image segmentation
• Measure the conditional probability distribution of various grouping cues in human segmented images (Brunswik 1950)
• Design algorithm for incorporating multiple cues for image segmentation
![Page 11: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/11.jpg)
Proximity
![Page 12: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/12.jpg)
Similarity of brightness (cf. Coughlan & Yuille, Geman & Jedynak)
![Page 13: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/13.jpg)
Convexity
![Page 14: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/14.jpg)
Region Area
• Compare to Alvarez, Gousseau, Morel
$y = K x^{-\alpha}$, $\alpha = 1.008$
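A power law of this form is a straight line in log-log coordinates, so the exponent can be estimated with an ordinary least-squares line fit. The sketch below assumes that approach (the talk does not say how the fit was done) and uses synthetic data generated from the slide's exponent; the constant K = 1000, the sample x values, and the function name are made up for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = K * x**(-alpha) by least squares on (log x, log y); return (K, alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
             / sum((a - mean_x) ** 2 for a in lx))
    intercept = mean_y - slope * mean_x
    # In log space: log y = log K - alpha * log x
    return math.exp(intercept), -slope


# Synthetic, noise-free data (hypothetical region areas and counts).
xs = [10, 20, 50, 100, 200, 500, 1000]
ys = [1000.0 * x ** -1.008 for x in xs]

K, alpha = fit_power_law(xs, ys)
print(K, alpha)
```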
![Page 15: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/15.jpg)
Lengths of curves
[Figure: log(count) plotted against log(contour length)]
![Page 16: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/16.jpg)
Image Segmentation as Graph Partitioning
Build a weighted graph G=(V,E) from image
V: image pixels
E: connections between pairs of nearby pixels
$W_{ij}$: probability that i & j belong to the same region
Partition graph so that similarity within group is large and similarity between groups is small -- Normalized Cuts [Shi & Malik 97]
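As a rough illustration of the graph construction, the sketch below builds a weight matrix for a tiny grayscale image. The slide defines the weight as a probability of belonging to the same region; the Gaussian brightness-and-distance affinity used here is an assumed stand-in, and the parameter names and values (`radius`, `sigma_i`, `sigma_x`) are hypothetical.

```python
import numpy as np


def affinity_matrix(image, radius=1.5, sigma_i=0.1, sigma_x=2.0):
    """Dense pixel-by-pixel weights; zero for pixel pairs farther apart than `radius`."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    values = image.ravel().astype(float)
    # Squared spatial distance and squared brightness difference for every pair.
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    diff2 = (values[:, None] - values[None, :]) ** 2
    W = np.exp(-diff2 / sigma_i ** 2) * np.exp(-dist2 / sigma_x ** 2)
    W[dist2 > radius ** 2] = 0.0   # keep only edges between nearby pixels
    return W


# Toy 2x4 image: dark left half, bright right half (made-up values).
image = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])
W = affinity_matrix(image)
print(W.shape)
print(W[0, 1], W[1, 2])   # same-brightness neighbours vs. neighbours across the edge
```

Pixels are indexed in row-major order, so index 1 and index 2 are horizontal neighbours on opposite sides of the brightness edge.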
![Page 17: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/17.jpg)
Normalized Cut, A measure of dissimilarity
• Minimum cut is not appropriate since it favors cutting small pieces.
• Normalized Cut, Ncut:

$$
Ncut(A,B) = \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)}
$$
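The contrast between minimum cut and normalized cut can be seen on a toy graph. The sketch below assumes a symmetric weight matrix, takes cut(A,B) as the total weight between the two sides and assoc(A,V) as the total weight from A to all nodes, and uses a made-up 5-node chain in which one node hangs off the end by a weak edge.

```python
import numpy as np


def cut(W, a, b):
    """Total weight of edges running between node lists a and b."""
    return W[np.ix_(a, b)].sum()


def assoc(W, a):
    """Total weight from the nodes in a to every node in the graph."""
    return W[a, :].sum()


def ncut(W, a, b):
    c = cut(W, a, b)
    return c / assoc(W, a) + c / assoc(W, b)


# Hypothetical chain 0-1-2-3-4: two tight pairs, a weak link between them,
# and node 4 attached by an even weaker edge.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (1, 2, 0.2), (2, 3, 1.0), (3, 4, 0.1)]:
    W[i, j] = W[j, i] = w

small_piece = ([4], [0, 1, 2, 3])
balanced = ([0, 1], [2, 3, 4])

print(cut(W, *small_piece), cut(W, *balanced))    # plain cut prefers isolating node 4
print(ncut(W, *small_piece), ncut(W, *balanced))  # Ncut prefers the balanced split
```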
![Page 18: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/18.jpg)
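Normalized Cut As Generalized Eigenvalue problem

$$
\begin{aligned}
Ncut(A,B) &= \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)} \\
&= \frac{(\mathbf{1}+x)^T (D-W)(\mathbf{1}+x)}{4k\,\mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1}-x)^T (D-W)(\mathbf{1}-x)}{4(1-k)\,\mathbf{1}^T D \mathbf{1}}; \quad k = \frac{\sum_{x_i>0} D(i,i)}{\sum_i D(i,i)} \\
&= \dots
\end{aligned}
$$

Here $x_i = 1$ if node $i$ is in A and $-1$ otherwise, and $D$ is the diagonal matrix with $D(i,i) = \sum_j W(i,j)$.

• after simplification, we get

$$
Ncut(A,B) = \frac{y^T (D-W)\, y}{y^T D\, y}, \quad \text{with } y_i \in \{1, -b\},\ y^T D \mathbf{1} = 0,
$$

where $b = k/(1-k)$.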
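If y is allowed to take real values, minimizing this ratio leads to the generalized eigenproblem (D - W) y = λ D y, and the eigenvector with the second-smallest eigenvalue gives an approximate partition. The sketch below assumes that relaxation and splits on the sign of the eigenvector, which is only one possible choice of threshold. The 5-node chain graph is made up for illustration.

```python
import numpy as np


def ncut_bipartition(W):
    """Solve (D - W) y = lambda * D y and split nodes on the sign of the second eigenvector."""
    d = W.sum(axis=1)                        # node degrees, the diagonal of D
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Substituting z = D^(1/2) y gives an ordinary symmetric problem:
    # (I - D^(-1/2) W D^(-1/2)) z = lambda * z
    sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]           # map back: y = D^(-1/2) z
    return y > 0, eigvals[1]


# Hypothetical chain 0-1-2-3-4 with a weak link between nodes 1 and 2.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (1, 2, 0.2), (2, 3, 1.0), (3, 4, 0.1)]:
    W[i, j] = W[j, i] = w

in_a, lam = ncut_bipartition(W)
print(in_a, lam)
```

The overall sign of an eigenvector is arbitrary, so which side is labeled True can differ between runs or library versions; only the grouping is meaningful.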
![Page 19: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/19.jpg)
Cue-Integration for Image Segmentation
[Malik, Belongie, Shi, Leung 1999]
![Page 20: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/20.jpg)
On image segmentation..
• Humans are quite consistent, so model the goal as emulating their behavior.
• Ecological statistics of grouping cues can be learned from image data.
• We now have a generic image segmentation algorithm (code available) which can be applied for MPEG-4/7 compression and object recognition.
![Page 21: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/21.jpg)
Framework for Recognition
(1) Segmentation: Pixels → Segments
Over-segmentation necessary; under-segmentation fatal
(2) Association: Segments → Regions
Enumerate: # of size-k regions in image with n segments is ~(4**k)*n/k
(3) Matching: Regions → Prototypes
~10 views/object. Matching tolerant to pose/illumination changes, intra-category variation, error in previous steps
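To get a feel for the enumeration estimate, the short sketch below simply evaluates the slide's expression (4**k)*n/k. The value n = 100 segments and the range of k are hypothetical, and the function name is made up.

```python
def region_count_estimate(n_segments, k):
    """Slide's rough estimate of the number of size-k regions: (4**k) * n / k."""
    return (4 ** k) * n_segments / k


n = 100  # hypothetical number of segments in one image
for k in range(2, 7):
    print(k, region_count_estimate(n, k))
```

The count grows roughly fourfold with each extra segment per region, which suggests why candidate regions need to stay small.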
![Page 22: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/22.jpg)
Matching regions to views
• GOAL: obtain small misclassification error using few views
• Matching allowing deformations of prototype views makes this possible
![Page 23: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/23.jpg)
Matching with original and deformed prototypes
[Figure columns: Prototype, Test, Error]
![Page 24: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/24.jpg)
Deforming Biological Shapes
• D’Arcy Thompson: On Growth and Form, 1917
  – studied transformations between shapes of organisms
![Page 25: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/25.jpg)
• Find correspondences between points on shape
• Estimate transformation
• Measure similarity
model target
...
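The middle step, estimating a transformation from matched points, can be sketched with a simple model. The example below fits an affine map by least squares, which is a deliberately simpler stand-in chosen for illustration and not necessarily the transformation family used in the talk. The point sets and the "true" transform are made up.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine map with dst approximately equal to src @ A.T + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])       # homogeneous coordinates, (n, 3)
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)   # shape (3, 2)
    A, t = params[:2].T, params[2]
    return A, t


# Hypothetical matched points: model[i] corresponds to target[i].
model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_true = np.array([[2.0, 0.5], [0.0, 1.0]])
t_true = np.array([1.0, -1.0])
target = model @ A_true.T + t_true

A_est, t_est = fit_affine(model, target)
residual = np.abs(model @ A_est.T + t_est - target).max()
print(A_est, t_est, residual)
```

The residual after warping the model onto the target is one crude ingredient for the final similarity step.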
![Page 26: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/26.jpg)
Finding correspondences between shapes
• Each shape is represented by a set of sample points
• Each sample point has a descriptor – the shape context
• Define cost Wij for matching point i on first shape with point j on second shape.
• Solve for correspondence as optimum assignment.
![Page 27: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/27.jpg)
Shape Context
Count the number of points inside each bin, e.g.:
Count = 4
Count = 10
...
Compact representation of distribution of points relative to each point
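A minimal sketch of such a descriptor follows. It assumes log-polar bins centered on the reference point, with 5 radial and 12 angular bins and distances normalized by the mean pairwise distance; the slide does not state these settings, so the bin counts, radial limits, and function name are all assumptions.

```python
import numpy as np


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Histogram of the other points' positions relative to points[index]."""
    pts = np.asarray(points, dtype=float)
    diff = np.delete(pts, index, axis=0) - pts[index]
    # Normalize distances by the mean pairwise distance for scale invariance.
    pair_d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
    mean_d = pair_d[np.triu_indices(len(pts), k=1)].mean()
    r = np.hypot(diff[:, 0], diff[:, 1]) / mean_d
    theta = np.arctan2(diff[:, 1], diff[:, 0]) % (2 * np.pi)
    # Log-spaced radial edges; uniform angular bins.
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    r_bin = np.digitize(r, r_edges) - 1
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    hist = np.zeros((n_r, n_theta), dtype=int)
    inside = (r_bin >= 0) & (r_bin < n_r)   # drop points outside the radial range
    np.add.at(hist, (r_bin[inside], t_bin[inside]), 1)
    return hist


# Toy shape: 8 points sampled on a unit circle.
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)

hist = shape_context(points, index=0)
print(hist.shape, hist.sum())
```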
![Page 28: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/28.jpg)
![Page 29: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/29.jpg)
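Comparing Shape Contexts
Compute matching costs using the chi-squared test statistic, where $h_i(k)$ and $h_j(k)$ are the shape context histograms at points i and j:

$$
C_{ij} = \frac{1}{2} \sum_{k} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}
$$

Recover correspondences by solving linear assignment problem with costs Cij
[Jonker & Volgenant 1987]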
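The two steps, cost computation and assignment, can be sketched on toy data. The cost below is the usual chi-squared histogram distance (half the sum of squared bin differences divided by bin sums, skipping empty bins). The assignment is found by brute force over permutations, which is a stand-in for a real linear assignment solver and only workable for a handful of points. The histograms are made up.

```python
from itertools import permutations


def chi2_cost(h_i, h_j):
    """0.5 * sum_k (h_i[k] - h_j[k])**2 / (h_i[k] + h_j[k]), skipping empty bins."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def best_assignment(cost):
    """Brute-force minimum-cost one-to-one assignment (tiny inputs only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)


# Hypothetical flattened histograms for 3 points on each shape.
model_hists = [[4, 0, 1], [0, 3, 2], [1, 1, 5]]
target_hists = [[0, 3, 3], [1, 2, 5], [4, 0, 2]]

cost = [[chi2_cost(h_m, h_t) for h_t in target_hists] for h_m in model_hists]
match = best_assignment(cost)   # match[i] is the target index assigned to model point i
print(match)
```

For realistic point counts a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would replace the brute-force search.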
![Page 30: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/30.jpg)
Matching Example
model target
![Page 31: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/31.jpg)
Synthetic Test Results
[Figure panels: Fish - deformation + noise; Fish - deformation + outliers. Methods compared: ICP, Shape Context, RPM]
![Page 32: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/32.jpg)
Measuring Shape Similarity
• Image appearance around matched points
  – color or gray-level window
  – orientation
• Shape context differences at matched points
• Bending Energy
![Page 33: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/33.jpg)
COIL Object Database
![Page 34: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/34.jpg)
Editing: Prototypes
• Human Shape Perception
• Computational Needs for K-NN
![Page 35: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/35.jpg)
Prototype Selection: COIL-20
![Page 36: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/36.jpg)
MNIST Handwritten Digits
![Page 37: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/37.jpg)
Handwritten Digit Recognition
• MNIST 60 000:
  – linear: 12.0%
  – 40 PCA + quad: 3.3%
  – 1000 RBF + linear: 3.6%
  – K-NN: 5%
  – K-NN (deskewed): 2.4%
  – K-NN (tangent dist.): 1.1%
  – SVM: 1.1%
  – LeNet 5: 0.95%
• MNIST 600 000 (distortions):
  – LeNet 5: 0.8%
  – SVM: 0.8%
  – Boosted LeNet 4: 0.7%
• MNIST 20 000:
  – K-NN, Shape Context matching: 0.63%
![Page 38: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/38.jpg)
Hand-written Digit Recognition
• MNIST 60 000:
  – linear: 12.0%
  – 40 PCA + quad: 3.3%
  – 1000 RBF + linear: 3.6%
  – K-NN: 5%
  – K-NN (deskewed): 2.4%
  – K-NN (tangent dist.): 1.1%
  – SVM: 1.1%
  – LeNet 5: 0.95%
• MNIST 600 000 (distortions):
  – LeNet 5: 0.8%
  – SVM: 0.8%
  – Boosted LeNet 4: 0.7%
• MNIST 20 000:
  – K-NN, Shape Context matching: 0.63%
![Page 39: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/39.jpg)
Results: Digit Recognition
1-NN classifier using: shape context + 0.3 * bending + 1.6 * image appearance
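The weighted sum above can be sketched as a nearest-prototype rule. The weights 0.3 and 1.6 come from the slide; everything else here is hypothetical, including the assumption that the three cost terms have already been computed for each prototype and the made-up numbers in the example.

```python
def combined_distance(shape_context_cost, bending_energy, appearance_cost,
                      w_bending=0.3, w_appearance=1.6):
    """Shape context + 0.3 * bending + 1.6 * image appearance."""
    return shape_context_cost + w_bending * bending_energy + w_appearance * appearance_cost


def nearest_prototype(candidates):
    """candidates: (label, shape_context_cost, bending_energy, appearance_cost) tuples.
    Returns the (label, distance) pair with the smallest combined distance."""
    scored = [(label, combined_distance(sc, be, ap)) for label, sc, be, ap in candidates]
    return min(scored, key=lambda item: item[1])


# Made-up costs between one query digit and three stored prototypes.
candidates = [
    ("3", 0.20, 0.50, 0.10),
    ("8", 0.15, 0.90, 0.30),
    ("5", 0.45, 0.10, 0.05),
]

label, distance = nearest_prototype(candidates)
print(label, distance)
```

In this toy case the prototype with the lowest shape context cost alone ("8") is not the nearest once bending and appearance are included.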
![Page 40: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/40.jpg)
Results: Digit Recognition (Detail)
![Page 41: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/41.jpg)
![Page 42: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/42.jpg)
Trademark Similarity
![Page 43: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/43.jpg)
Future work..
• Indexing based on color/texture/shape features before correspondence matching
• Integrate segmentation and recognition
![Page 44: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/44.jpg)
Computing cost on a Pentium PC
• Segmentation: 2 minutes / image (200x100)
• Matching: 0.2 sec / match (100 points)
![Page 45: Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6dff/html5/thumbnails/45.jpg)
Given a 10^4 speedup..
• 5K object categories/sec
• Humans can recognize 10K-100K objects, so we could be in the ballpark of human level vision by 2020.
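The 5K figure can be checked with back-of-the-envelope arithmetic. The sketch assumes the numbers quoted earlier in the talk: 0.2 seconds per match and about 10 views per object, with a speedup factor of 10^4. Reading "categories per second" as matches per second divided by views per object is an interpretation, not something the slides spell out.

```python
seconds_per_match = 0.2      # matching cost quoted for a Pentium PC
speedup = 10 ** 4            # assumed hardware/algorithm speedup
views_per_object = 10        # approximate number of stored views per object

matches_per_second = speedup / seconds_per_match
categories_per_second = matches_per_second / views_per_object

print(matches_per_second)       # about 50,000 matches per second
print(categories_per_second)    # about 5,000 object categories per second
```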