poster_logo detection and recognition
Logo Detection and Recognition
Kaicheng Wang, School of Engineering, Stanford University
1. Abstract
2. Image Features
4. Results and Brief Discussion
3. Learning Algorithms
• Invariant local features
• Local maximum in space and scale
• 128-dimensional vector per point
1. Scale-invariant feature transform (SIFT)
2. K-nearest neighbors (used with Fisherfaces)
• Sliding detection windows: The test image is scanned with windows of different sizes, and the best-matching window (outlined in red) is reported.
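A minimal sketch of the sliding-window scan, assuming a grayscale image stored as a list of rows. The scoring function `mean_intensity` is a hypothetical stand-in for the poster's actual matching step, and the window sizes and stride are illustrative values, not the author's settings.

```python
def sliding_windows(height, width, sizes, stride):
    """Yield (top, left, size) for every square window that fits in the image."""
    for size in sizes:
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                yield top, left, size


def best_window(image, sizes, stride, score_fn):
    """Return (score, (top, left, size)) of the highest-scoring window."""
    height, width = len(image), len(image[0])
    best = None
    for top, left, size in sliding_windows(height, width, sizes, stride):
        patch = [row[left:left + size] for row in image[top:top + size]]
        score = score_fn(patch)
        if best is None or score > best[0]:
            best = (score, (top, left, size))
    return best


def mean_intensity(patch):
    """Hypothetical stand-in score: average pixel value of the patch."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))


# Toy 6x6 image with a bright 2x2 block at rows 2-3, columns 3-4.
image = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        image[r][c] = 9

score, window = best_window(image, sizes=[2, 3], stride=1, score_fn=mean_intensity)
```

In this toy case the 2x2 window that exactly covers the bright block scores highest; a real detector would replace `mean_intensity` with a similarity to the training features.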
2. Fisherfaces
• Several basis vectors
• Maximizing difference between clusters
• Minimizing variance within clusters
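The two goals listed for Fisherfaces can be combined into one ratio: separation between cluster means divided by the variance within clusters. The sketch below illustrates this for two classes of 2-D points only; the poster's method works on image vectors with many classes, so the data and the two-class shortcut w ∝ Sw⁻¹(m1 − m0) are simplifying assumptions.

```python
def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, m):
    """2x2 scatter matrix: sum of (x - m)(x - m)^T over the points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Direction w proportional to Sw^{-1} (m1 - m0) for two 2-D classes."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def fisher_ratio(class0, class1, w):
    """Squared distance between projected means over within-class scatter."""
    proj0 = [p[0] * w[0] + p[1] * w[1] for p in class0]
    proj1 = [p[0] * w[0] + p[1] * w[1] for p in class1]
    mu0, mu1 = sum(proj0) / len(proj0), sum(proj1) / len(proj1)
    within = sum((x - mu0) ** 2 for x in proj0) + sum((x - mu1) ** 2 for x in proj1)
    return (mu1 - mu0) ** 2 / within


# Toy clusters (illustrative data, not from the poster).
class0 = [(0, 0), (1, 2), (2, 1)]
class1 = [(4, 0), (5, 2), (6, 1)]
w = fisher_direction(class0, class1)
ratio_fisher = fisher_ratio(class0, class1, w)
ratio_x_axis = fisher_ratio(class0, class1, [1.0, 0.0])
```

On this toy data the Fisher direction separates the clusters better than simply projecting onto the x-axis, which is the property the bullets describe.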
• SIFT and Fisherfaces are used to extract features from training and testing images.
• Naïve Bayes and K-nearest neighbors are used to build models for logo detection and recognition.
5. Acknowledgement
The author would like to thank Prof. Andrew Ng for his CS 229 lectures and TA Albert Haque for his guidance.
1. Naïve Bayes (used with SIFT)
Training data: different versions of the same brand (Gatorade1, Gatorade2, Gatorade3)
Every image is expressed by a linear combination of basis vectors. The number of basis vectors to keep is the dimension of features, which is optimized by k-fold cross validation. In this way, both training and testing images have very low-dimensional features.
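A minimal sketch of how k-fold cross validation could pick the number of basis vectors. The value of k, the candidate dimensions, and the evaluator are all assumptions for illustration; `make_evaluator` is a hypothetical hook that would, in practice, project the images onto d basis vectors, train on the training folds, and return validation accuracy.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_samples, k, evaluate):
    """Average the score of evaluate(train_idx, val_idx) over the k folds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / k


def select_dimension(candidates, n_samples, k, make_evaluator):
    """Pick the candidate number of basis vectors with the best mean CV score."""
    return max(candidates,
               key=lambda d: cross_validate(n_samples, k, make_evaluator(d)))


# Toy usage: 120 samples, k = 5 (assumed), and a hypothetical evaluator
# whose score peaks at 20 basis vectors.
folds = k_fold_indices(120, 5)
best_dim = select_dimension(
    [5, 10, 20, 40], 120, 5,
    lambda d: (lambda train_idx, val_idx: -abs(d - 20)),
)
```

The folds here are contiguous for simplicity; with data sorted by class, the indices would normally be shuffled first so that every fold contains every class.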
Testing data: a Gatorade advertisement featuring Michael Jordan
• Point A is matched to all three training images: P(A|y = Gatorade) = (3+1)/(3+2) = 0.8
• Point B is matched to only one image (Gatorade1): P(B|y = Gatorade) = (1+1)/(3+2) = 0.4
• Laplace smoothing is used here.
Toy model of likelihood estimates in NB
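The toy estimates above can be reproduced in a few lines. This is a sketch of the smoothing arithmetic only; the function names are illustrative, and summing log-likelihoods across points is the standard Naïve Bayes independence assumption rather than a detail stated on the poster.

```python
import math


def smoothed_likelihood(n_matched, n_images):
    """Laplace-smoothed estimate: (matches + 1) / (images + 2)."""
    return (n_matched + 1) / (n_images + 2)


def log_likelihood(match_counts, n_images):
    """Sum of log-likelihoods over feature points (naive independence)."""
    return sum(math.log(smoothed_likelihood(c, n_images)) for c in match_counts)


p_a = smoothed_likelihood(3, 3)  # point A matches all three training images
p_b = smoothed_likelihood(1, 3)  # point B matches only Gatorade1
score = log_likelihood([3, 1], 3)  # log(p_a) + log(p_b)
```

Smoothing keeps a point that matches no training image from driving the whole product to zero: `smoothed_likelihood(0, 3)` is 1/5 rather than 0.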
Two parameters of this learning model are to be decided:
• Value of K in KNN
• Number of basis vectors (“reduced dimensions” in the plots)
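A minimal sketch of a KNN classifier on low-dimensional features, to show where the parameter K enters. The 2-D feature vectors, the labels, and the squared Euclidean distance are illustrative assumptions; the poster does not state which distance was used.

```python
from collections import Counter


def knn_predict(train_features, train_labels, query, k):
    """Majority vote among the k training points closest to the query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy reduced-dimension features for two hypothetical sign classes.
train_features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["stop", "stop", "stop", "yield", "yield", "yield"]

pred_near_origin = knn_predict(train_features, train_labels, (0.5, 0.5), k=3)
pred_far = knn_predict(train_features, train_labels, (5.5, 5.5), k=3)
```

A small K follows individual neighbors closely, while a larger K smooths the vote, which is why K is tuned together with the number of basis vectors.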
• Spatial pyramid: Similarity between the detection window and the training data is evaluated at different resolutions. Matches at higher resolutions receive larger weights.
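A simplified sketch of a spatial-pyramid similarity, assuming square images whose side is divisible by the finest grid size. Cell sums stand in for feature histograms, and the weighting (doubling at each finer level) is an assumed choice; the poster only states that higher resolutions weigh more.

```python
def cell_sums(image, cells_per_side):
    """Sum pixel values inside each cell of a cells_per_side x cells_per_side grid."""
    step = len(image) // cells_per_side
    return [
        sum(image[r][c]
            for r in range(i * step, (i + 1) * step)
            for c in range(j * step, (j + 1) * step))
        for i in range(cells_per_side)
        for j in range(cells_per_side)
    ]


def intersection(hist_a, hist_b):
    """Histogram intersection: total mass shared by the two histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def pyramid_similarity(img_a, img_b, levels):
    """Weighted sum of per-level intersections; finer levels get larger weights."""
    total = 0.0
    for level in range(levels + 1):
        cells = 2 ** level
        weight = 1.0 / 2 ** (levels - level)  # assumed weighting scheme
        total += weight * intersection(cell_sums(img_a, cells),
                                       cell_sums(img_b, cells))
    return total


def single_pixel(row, col, side=4):
    """Toy image with a single active pixel."""
    img = [[0] * side for _ in range(side)]
    img[row][col] = 1
    return img


a = single_pixel(0, 0)
same = pyramid_similarity(a, single_pixel(0, 0), levels=2)
nearby = pyramid_similarity(a, single_pixel(1, 1), levels=2)
far = pyramid_similarity(a, single_pixel(3, 3), levels=2)
```

In this toy case an exact match scores highest, a feature in the same coarse cell scores lower, and a feature in the opposite corner matches only at the coarsest level.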
1. Techniques
3. Accuracy
Method | Target | Training set size | Testing set size | Precision
SIFT + NB | Commercial trademark | 10 images/brand × 150 brands = 1500 images | 5 images/brand × 150 brands = 750 images | 88.27%
Fisherfaces + KNN | Traffic sign | 3 images/sign × 40 signs = 120 images | 9 images/sign × 40 signs = 360 images | 95.57%
2. Reasons for each choice
Commercial advertisements
• Weaker assumption
SIFT • Curse of dimensionality
NB • High accuracy
Traffic signs
• Stronger assumption
Fisherfaces • Lower dimensions
KNN • Computationally efficient