Stanford CS223B Computer Vision, Winter 2007
Lecture 5 Advanced Image Filters
Professors Sebastian Thrun and Jana Košecká
CAs: Vaibhav Vaish and David Stavens
Sebastian Thrun and Jana Košecká CS223B Computer Vision, Winter 2007
Advanced Features: Topics
– Advanced Edge Detection
– Global Image Features (Hough Transform)
– Templates, Image Pyramid
– SIFT Features
– Learning with Many Simple Features
Features in Matlab
im = imread('bridge.jpg');
bw = rgb2gray(im);
edge(bw,'sobel')   % (almost) linear
edge(bw,'canny')   % not local, no closed form
Sobel Operator
S1 = [-1 -2 -1
       0  0  0
       1  2  1]

S2 = [-1  0  1
      -2  0  2
      -1  0  1]

Edge Magnitude = sqrt(S1² + S2²)
Edge Direction = tan⁻¹(S1 / S2)
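As a rough sketch of the formulas above (in Python rather than the course's Matlab; the tiny test image is invented for illustration, and the kernels are applied as correlation, so sign conventions may differ from a library implementation):

```python
import math

# Sobel kernels from the slide: S1 responds to change along y, S2 along x.
S1 = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
S2 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

def sobel_at(img, x, y):
    """Edge magnitude and direction at an interior pixel (x, y)."""
    g1 = sum(S1[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
    g2 = sum(S2[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
    magnitude = math.sqrt(g1 ** 2 + g2 ** 2)
    direction = math.atan2(g1, g2)  # tan^-1(S1 / S2), as on the slide
    return magnitude, direction

# A vertical step edge: left half dark, right half bright.
img = [[0, 0, 10, 10] for _ in range(4)]
mag, ang = sobel_at(img, 1, 1)  # strong response, direction 0
```

On the step edge only the horizontal difference S2 fires, so the direction comes out as 0.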
Sobel in Matlab
edge(bw,'sobel')
Canny Edge Detector
edge(bw,'canny')
Canny Edge Detection
Steps:
1. Apply derivative of Gaussian
2. Non-maximum suppression
• Thin multi-pixel wide "ridges" down to single pixel width
3. Linking and thresholding
• Low, high edge-strength thresholds
• Accept all edges over the low threshold that are connected to an edge over the high threshold
Non-Maximum Suppression
Non-maximum suppression: select the single maximum point across the width of an edge.
Linking to the Next Edge Point
Assume the marked point q is an edge point.
Take the normal to the gradient at that point and use this to predict continuation points (either r or p).
Edge Hysteresis
Hysteresis: a lag or momentum factor
Idea: Maintain two thresholds k_high and k_low
– Use k_high to find strong edges to start an edge chain
– Use k_low to find weak edges which continue an edge chain
Typical ratio of thresholds is roughly k_high / k_low = 2
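A minimal sketch of the linking step (Python rather than Matlab; the grid of edge strengths and the two threshold values are invented for illustration): keep every pixel above k_low that is connected, through other above-k_low pixels, to some pixel above k_high.

```python
from collections import deque

def hysteresis(strength, k_low, k_high):
    """Keep weak edges (> k_low) only if 8-connected to a strong edge (> k_high)."""
    h, w = len(strength), len(strength[0])
    keep = [[False] * w for _ in range(h)]
    # Seed a BFS from every strong-edge pixel.
    queue = deque((y, x) for y in range(h) for x in range(w) if strength[y][x] > k_high)
    for y, x in queue:
        keep[y][x] = True
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not keep[ny][nx] and strength[ny][nx] > k_low:
                    keep[ny][nx] = True
                    queue.append((ny, nx))
    return keep

strength = [[0, 0, 0, 0, 0],
            [9, 5, 5, 0, 0],   # strong pixel followed by connected weak pixels: kept
            [0, 0, 0, 0, 5]]   # isolated weak pixel: dropped
keep = hysteresis(strength, k_low=4, k_high=8)
```

Note the 2:1 ratio of the thresholds, matching the slide's rule of thumb.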
Canny Edge Detection (Example)
courtesy of G. Loy
[Figure: original image; weak edges; strong edges only; strong + connected weak edges — the gap is gone]
Canny Edge Detection (Example)
Using Matlab with default thresholds
Bridge Example Again
edge(bw,'canny')
Summary: Canny Edge Detection
– Most commonly used method
– Traces edges, accommodates variations in contrast
– Not a linear filter!
– Problems with corners
Advanced Features: Topics
– Advanced Edge Detection
– Global Image Features (Hough Transform)
– Templates, Image Pyramid
– SIFT Features
– Learning with Many Simple Features
Sebastian Thrun and Jana Košecká CS223B Computer Vision, Winter 2007
Towards Global Features
Local versus global
Vanishing Points
A. Canaletto [1740], Arrival of the French Ambassador in Venice
Vanishing Points…?
A. Canaletto [1740], Arrival of the French Ambassador in Venice
Hough Transform: Quantization
Detecting lines by finding maxima / clustering in parameter space: each edge point (x, y) on a line y = mx + b votes for the parameter pair (m, b).
Hough Transform: Algorithm
For each image point, determine:
– most likely line parameters (b, m) (direction of gradient)
– strength (magnitude of gradient)
Increment the parameter counter by the strength value
Cluster in parameter space, pick local maxima
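The voting loop can be sketched in a few lines (Python rather than Matlab; the point set and quantization ranges are invented, votes are unweighted, and the slide's (m, b) parameterization is used even though practical implementations prefer (ρ, θ) to avoid unbounded slopes):

```python
def hough_lines(points, m_range, b_range):
    """Accumulate votes in a quantized (m, b) space; return the strongest cell."""
    acc = {}
    for x, y in points:
        for m in m_range:        # every candidate slope through this point...
            b = y - m * x        # ...fixes one intercept
            if b in b_range:
                acc[(m, b)] = acc.get((m, b), 0) + 1
    return max(acc, key=acc.get)

# Four points on y = 2x + 1 plus one outlier.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (5, 2)]
best = hough_lines(points, m_range=range(-3, 4), b_range=range(-10, 11))
```

The collinear points all vote for the same cell, so the maximum recovers (m, b) = (2, 1) despite the outlier.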
Hough Transform: Results
[Figure: image → edge detection → Hough transform]
Summary: Hough Transform
Smart counting
– Local evidence for global features
– Organized in a table
– Careful with parameterization!
Problem: curse of dimensionality
– Works great for simple features with 3 unknowns
– Will fail for complex objects
Problem: not a local algorithm
Advanced Features: Topics
– Advanced Edge Detection
– Global Image Features (Hough Transform)
– Templates, Image Pyramid
– SIFT Features
– Learning with Many Simple Features
Features for Object Detection/Recognition
Want to find… in here
Templates
Find an object in an image!
We want invariance!
– Scaling
– Rotation
– Illumination
– Perspective Projection
Convolution with Templates
% read image
im = imread('bridge.jpg');
bw = double(im(:,:,1)) ./ 256;
imshow(bw)

% apply FFT
FFTim = fft2(bw);
bw2 = real(ifft2(FFTim));
imshow(bw2)

% define a kernel
kernel = zeros(size(bw));
kernel(1, 1) = 1;
kernel(1, 2) = -1;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = real(ifft2(FFTresult));
imshow(result)

% select an image patch
patch = bw(221:240, 351:370);
imshow(patch)
patch = patch - (sum(sum(patch)) / size(patch,1) / size(patch,2));   % zero-mean
kernel = zeros(size(bw));
kernel(1:size(patch,1), 1:size(patch,2)) = patch;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = max(0, real(ifft2(FFTresult)));
result = result ./ max(max(result));
result = (result > 0.5);   % threshold the normalized response
imshow(result)

% alternative convolution
imshow(conv2(bw, patch, 'same'))
Aside: Convolution Theorem
F(I ∗ g) = F(I) · F(g)

Fourier Transform of g:
F(g)(u, v) = ∫∫ g(x, y) exp{−2πi(ux + vy)} dx dy

F is invertible
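The theorem can be checked numerically. A tiny sketch (Python rather than the fft2-based Matlab above; a 1-D signal with circular convolution, which is exactly what FFT-based filtering computes, and the signal values are invented):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a 1-D sequence."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_conv(a, b):
    """Circular (periodic) convolution, matching the DFT convolution theorem."""
    N = len(a)
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]

a = [1.0, 2.0, 3.0, 4.0]
g = [1.0, -1.0, 0.0, 0.0]   # the [1, -1] derivative kernel, zero-padded

lhs = dft(circular_conv(a, g))                      # F(I * g)
rhs = [fa * fg for fa, fg in zip(dft(a), dft(g))]   # F(I) . F(g)
ok = all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))
```

Both sides agree to numerical precision, which is why the Matlab code can multiply in the frequency domain instead of convolving in the spatial domain.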
Convolution with Templates
Invariances:
– Scaling: No
– Rotation: No
– Illumination: No
– Perspective Projection: No
Provides:
– Good localization: No
Scale Invariance: Image Pyramid
Pyramid Convolution with Templates
Invariances:
– Scaling: Yes
– Rotation: No
– Illumination: No
– Perspective Projection: No
Provides:
– Good localization: No
Aliasing Effects
Constructing a pyramid by taking every second pixel leads to layers that badly misrepresent the top layer
Slide credit: Gary Bradski
Solution to Aliasing
Convolve with a Gaussian before downsampling
Templates with Image Pyramid
Invariance:
– Scaling: Yes
– Rotation: No (maybe rotate template?)
– Illumination: No
– Perspective Projection: Not really
Provides:
– Good localization: No
Template Matching, Commercial
http://www.seeingmachines.com/facelab.htm
Advanced Features: Topics
– Advanced Edge Detection
– Global Image Features (Hough Transform)
– Templates, Image Pyramid
– SIFT Features
– Learning with Many Simple Features
Improved Invariance Handling
Want to find… in here
SIFT Features
Invariances:
– Scaling: Yes
– Rotation: Yes
– Illumination: Yes
– Deformation: Not really
Provides:
– Good localization: Yes
SIFT Reference
Distinctive image features from scale-invariant keypoints. David G. Lowe, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
SIFT = Scale Invariant Feature Transform
Invariant Local Features
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.
SIFT Features
Advantages of invariant local features
Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
Distinctiveness: individual features can be matched to a large database of objects
Quantity: many features can be generated for even small objects
Efficiency: close to real-time performance
Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness
SIFT On-A-Slide
1. Enforce invariance to scale: Compute difference-of-Gaussian maxima for many different scales; non-maximum suppression, find local maxima: keypoint candidates.
2. Localizable corner: For each maximum, fit a quadratic function. Compute the center with sub-pixel accuracy by setting the first derivative to zero.
3. Eliminate edges: Compute the ratio of eigenvalues; drop keypoints for which this ratio is larger than a threshold.
4. Enforce invariance to orientation: Compute orientation, to achieve rotation invariance, by finding the strongest second-derivative direction in the smoothed image (possibly multiple orientations). Rotate the patch so that the orientation points up.
5. Compute feature signature: Compute a "gradient histogram" of the local image region over 4x4-pixel cells, for a 4x4 grid of such cells. Orient so that the largest gradient points up (possibly multiple solutions). Result: feature vector with 128 values (16 histograms × 8 gradient orientations).
6. Enforce invariance to illumination change and camera saturation: Normalize to unit length to increase invariance to illumination. Then threshold all gradients, to become invariant to camera saturation.
Finding “Keypoints” (Corners)
Idea: Find corners, but with scale invariance
Approach: Run a linear filter (difference of Gaussians) at different resolutions of an image pyramid
Difference of Gaussians
[Figure: wide Gaussian minus narrow Gaussian equals difference-of-Gaussians kernel]
Difference of Gaussians
surf(fspecial('gaussian',40,4))
surf(fspecial('gaussian',40,8))
surf(fspecial('gaussian',40,8) - fspecial('gaussian',40,4))
Find Corners with Difference of Gaussians
im = imread('bridge.jpg');
bw = double(im(:,:,1)) / 256;
for i = 1 : 10
    gaussD = fspecial('gaussian',40,2*i) - fspecial('gaussian',40,i);
    res = abs(conv2(bw, gaussD, 'same'));
    res = res / max(max(res));
    imshow(res); title(['\bf i = ' num2str(i)]); drawnow
end
Key Point Localization
Detect maxima and minima of difference-of-Gaussian in scale space
[Figure: pyramid construction — blur, resample, subtract]
Example of keypoint detection
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 above threshold
Example of keypoint detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach)
(c) 729 left after peak value threshold (from 832)
(d) 536 left after testing ratio of principal curvatures
Select Canonical Orientation
Create a histogram of local gradient directions computed at the selected scale (orientations binned over [0, 2π])
Assign canonical orientation at peak of smoothed histogram
Each key specifies stable 2D coordinates (x, y, scale, orientation)
SIFT vector formation
Thresholded image gradients are sampled over 16x16 array of locations in scale space
Create array of orientation histograms 8 orientations x 4x4 histogram array = 128 dimensions
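The bookkeeping behind the 128 dimensions can be sketched (Python, not the slides' Matlab; the gradient field is invented, and the Gaussian weighting and soft binning of the real descriptor are omitted): each of the 16x16 samples contributes its gradient magnitude to one of 8 orientation bins in its 4x4 cell.

```python
import math

def descriptor_bins(gradients):
    """gradients: 16x16 grid of (dx, dy). Returns a 4x4x8 histogram flattened to 128-D."""
    hist = [0.0] * (4 * 4 * 8)
    for y in range(16):
        for x in range(16):
            dx, dy = gradients[y][x]
            mag = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * 8) % 8   # one of 8 orientation bins
            cell = (y // 4) * 4 + (x // 4)           # one of the 4x4 spatial cells
            hist[cell * 8 + o] += mag
    return hist

# Toy case: all gradients point right, so all mass lands in bin 0 of each cell.
grads = [[(1.0, 0.0)] * 16 for _ in range(16)]
desc = descriptor_bins(grads)
```

Each of the 16 cells collects 16 unit-magnitude samples, which makes the 8 x 4x4 = 128 layout concrete.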
Nearest-Neighbor Matching to Feature Database
Hypotheses are generated by approximate nearest-neighbor matching of each feature to vectors in the database
– SIFT uses the best-bin-first (Beis & Lowe, 97) modification to the k-d tree algorithm
– Uses a heap data structure to identify bins in order by their distance from the query point
Result: can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time
3D Object Recognition
Extract outlines with background subtraction
3D Object Recognition
Only 3 keys are needed for recognition, so extra keys provide robustness
The affine model is no longer as accurate
Test of illumination invariance Same image under differing illumination
273 keys verified in final match
SIFT
Invariances:
– Scaling: Yes
– Rotation: Yes
– Illumination: Yes
– Perspective Projection: Maybe
Provides:
– Good localization: Yes
SIFT demos
Run
sift_compile
sift_demo2
Summary SIFT
1. Scale invariance: difference-of-Gaussian extrema across many scales (keypoint candidates)
2. Sub-pixel localization by fitting a quadratic to each maximum
3. Edge elimination via the eigenvalue ratio
4. Orientation assignment for rotation invariance
5. 128-value gradient-histogram signature (16 histograms × 8 orientations)
6. Illumination and camera-saturation normalization
Defines state-of-the-art in invariant feature matching!
Advanced Features: Topics
– Advanced Edge Detection
– Global Image Features (Hough Transform)
– Templates, Image Pyramid
– SIFT Features
– Learning with Many Simple Features
A Totally Different Idea
Use many very simple features
Learn a cascade of tests for the target object
Efficient if:
– features are easy to compute
– the cascade is short
Using Many Simple Features: Viola-Jones / Haar Features
(Generalized) Haar features:
• rectangular blocks, white or black
• 3 types of features:
  • two rectangles: horizontal/vertical
  • three rectangles
  • four rectangles
• in a 24x24 window: 180,000 possible features
Integral Image
Def: The integral image at location (x, y) is the sum of the pixel values above and to the left of (x, y), inclusive.
We can calculate the integral image representation of the image in a single pass, using the recurrences:
s(x, y) = s(x, y−1) + i(x, y)
ii(x, y) = ii(x−1, y) + s(x, y)
Slide credit: Gyozo Gidofalvi
Efficient Computation of Rectangle Value
Using the integral image representation one can compute the value of any rectangular sum in constant time.
Example: Rectangle D
ii(4) + ii(1) – ii(2) – ii(3)
As a result two-, three-, and four-rectangular features can be computed with 6, 8 and 9 array references respectively.
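The recurrences and the four-corner lookup can be sketched directly (Python rather than Matlab; the 3x3 test image is invented):

```python
def integral_image(img):
    """Single pass, per the slide: s(x,y) = s(x,y-1) + i(x,y); ii(x,y) = ii(x-1,y) + s(x,y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for x in range(w):
        s = 0                       # cumulative column sum s(x, y)
        for y in range(h):
            s += img[y][x]
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum over the rectangle (x0, y0)..(x1, y1) in at most four array references."""
    total = ii[y1][x1]
    if x0 > 0: total -= ii[y1][x0 - 1]
    if y0 > 0: total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0: total += ii[y0 - 1][x0 - 1]
    return total

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
```

For example, the bottom-right 2x2 block sums to 5 + 6 + 8 + 9 = 28, recovered from four lookups regardless of rectangle size.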
Slide credit: Gyozo Gidofalvi
Idea 1: Linear Separator
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Linear Separator for Image Features
(highly related to Vapnik's Support Vector Machines)
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Problem
How to find the hyperplane? How to avoid evaluating 180,000 features?
Answer: Boosting [AdaBoost, Freund/Schapire]
– Finds a small set of features that are "sufficient"
– Generalizes very well (a lot of max-margin theory)
– Requires positive and negative examples
AdaBoost Idea (in Viola/Jones)
Given a set of "weak" classifiers:
– Pick the best one
– Reweight training examples, so that misclassified images have larger weight
– Reiterate; then linearly combine the resulting classifiers
Weak classifiers: Haar features
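The pick/reweight/reiterate loop can be sketched with decision stumps on 1-D data (Python; the toy data set and threshold grid are invented, and stumps stand in for the Haar features used by Viola-Jones):

```python
import math

def adaboost(xs, ys, thresholds, rounds):
    """ys in {-1, +1}; weak classifiers are stumps h(x) = sign(p * (x - t))."""
    n = len(xs)
    d = [1.0 / n] * n                    # example weights D(i)
    ensemble = []                        # list of (alpha, t, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        err, t, p = min(
            ((sum(di for xi, yi, di in zip(xs, ys, d)
                  if (1 if p * (xi - t) > 0 else -1) != yi), t, p)
             for t in thresholds for p in (1, -1)),
            key=lambda e: e[0])
        err = max(err, 1e-12)            # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, p))
        # Reweight: misclassified examples get larger weight, then renormalize.
        d = [di * math.exp(-alpha * yi * (1 if p * (xi - t) > 0 else -1))
             for xi, yi, di in zip(xs, ys, d)]
        z = sum(d)
        d = [di / z for di in d]
    def classify(x):
        score = sum(a * (1 if p * (x - t) > 0 else -1) for a, t, p in ensemble)
        return 1 if score > 0 else -1
    return classify

xs = [0, 1, 2, 3, 4, 5]
ys = [-1, -1, -1, 1, 1, 1]               # separable at x > 2.5
f = adaboost(xs, ys, thresholds=[0.5, 1.5, 2.5, 3.5, 4.5], rounds=3)
```

The final classifier is the weighted vote of the selected stumps, mirroring the linear combination on the next slide.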
AdaBoost
Weak Classifier 1 → weights increased → Weak Classifier 2 → weights increased → Weak Classifier 3 → …
The final classifier is a linear combination of the weak classifiers.
Weight update: D_{t+1}(i) = D_t(i) · exp(−α_t · y_i · h_t(x_i)) / Z_t
Weak learner selection: h_t = argmin_h Σ_i D(i) · exp(−y_i · h(x_i))
Freund & Schapire
AdaBoost Algorithm
Freund & Schapire
AdaBoost gives an efficient classifier:
Features = weak classifiers
Each round selects the optimal feature given:
– previously selected features
– exponential loss
AdaBoost surprise:
– generalization error decreases even after all training examples are 100% correctly classified (margin-maximization phenomenon)
Boosted Face Detection: Image Features
"Rectangle filters"
60,000 × 100 = 6,000,000 unique binary features
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Example Classifier for Face Detection
A classifier with 200 rectangle features was learned using AdaBoost.
95% correct detection on the test set with 1 in 14084 false positives.
[Figure: ROC curve for the 200-feature classifier]
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Classifiers are Efficient
Given a nested set of classifier hypothesis classes
[Figure: ROC curve, % Detection vs. % False Pos — detection vs. false-negative trade-off]
Cascade: IMAGE SUB-WINDOW → Classifier 1 —T→ Classifier 2 —T→ Classifier 3 —T→ FACE, with F at any stage → NON-FACE
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Cascaded Classifier
IMAGE SUB-WINDOW → [1 Feature] —50%→ [5 Features] —20%→ [20 Features] —2%→ FACE, with F at any stage → NON-FACE
A 1-feature classifier achieves a 100% detection rate and about a 50% false positive rate.
A 5-feature classifier achieves a 100% detection rate and a 40% false positive rate (20% cumulative), using data from the previous stage.
A 20-feature classifier achieves a 100% detection rate with a 10% false positive rate (2% cumulative).
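The cumulative percentages follow from multiplying the per-stage false-positive rates, a quick check using the slide's 50%, 40%, and 10% figures:

```python
# Each stage keeps (ideally) 100% of faces but only a fraction of non-face windows.
stage_fp = [0.50, 0.40, 0.10]   # per-stage false-positive rates from the slide

cumulative = 1.0
for fp in stage_fp:
    cumulative *= fp            # surviving fraction of non-face windows

# 0.5 * 0.4 = 0.20 after two stages; * 0.1 = 0.02 after three.
```

This multiplicative shrinking of non-face windows is what makes the cascade fast: most windows are rejected by the cheap early stages.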
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Output of Face Detector on Test Images
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Solving Other "Face" Tasks
Facial Feature Localization
Demographic Analysis
Profile Detection
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Face Localization Features
Learned features reflect the task
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Face Profile Detection
Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
Finding Cars (DARPA Urban Challenge)
Hand-labeled images of generic car rear-ends (1100 images)
Training time: ~5 hours, offline
Credit: Hendrik Dahlkamp
Generating Even More Examples
The generic classifier finds all cars in recorded video; computed offline and stored in a database (28700 images)
Credit: Hendrik Dahlkamp
Summary Viola-Jones
Many simple features
– Generalized Haar features (multi-rectangles)
– Easy and efficient to compute
Discriminative learning:
– finds a small subset for object recognition
– uses AdaBoost
Result: feature cascade
– 15 fps on a 700 MHz laptop (= fast!)
Applications
– Face detection
– Car detection
– Many others