CSSE463: Image Recognition, Day 27

This week
Last night: k-means lab due.
Today: Classification by “boosting”
Yoav Freund and Robert Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Proceedings of the 2nd European Conference on Computational Learning Theory, March 1995.
http://www.cs.princeton.edu/~schapire/publist.html
Sunday night: project plans and prelim work due
Questions?
Motivation for Adaboost

SVMs are fairly expensive in terms of memory. Need to store a list of support vectors, which, for RBF kernels, can be long.

[Diagram: a monolithic SVM maps input to output]
By the way, how does svmfwd compute y1? y1 is just the weighted sum of contributions of individual support vectors, plus a bias:

    y1 = sum_{i=1}^{numSupVecs} svcoeff_i * exp(-||x - sv_i||^2 / (2d)) + bias

d = data dimension, e.g., 294. numSupVecs, svcoeff (alpha) and bias are learned during training. BTW: looking at which of your training examples are support vectors can be revealing! (Keep in mind for term project)
Q1
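As a concrete sketch, the y1 computation above might look like this in Python. The names (svm_forward, sup_vecs, svcoeff) mirror the slide's variables rather than any real svmfwd API, and the RBF width 2d follows the formula above; treat both as illustrative assumptions.

```python
import math

def svm_forward(x, sup_vecs, svcoeff, bias, d):
    """Compute y1 as the weighted sum of RBF contributions of each
    support vector, plus a bias (sketch of an svmfwd-style pass)."""
    y1 = bias
    for sv, alpha in zip(sup_vecs, svcoeff):
        # squared Euclidean distance between input x and this support vector
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, sv))
        y1 += alpha * math.exp(-sq_dist / (2.0 * d))  # assumed RBF width 2d
    return y1

# Toy usage: two support vectors in 2-D (d = 2)
y = svm_forward([0.0, 1.0], [[0.0, 1.0], [1.0, 0.0]], [0.5, -0.5], 0.1, 2)
```

Note that the loop is one pass over the support vector list, which is exactly why a long list (as with RBF kernels) costs memory and time at prediction.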
Motivation for Adaboost

SVMs are fairly expensive in terms of memory. Need to store a list of support vectors, which, for RBF kernels, can be long.

Can we do better? Yes, consider simpler classifiers like thresholding or decision trees.

The idea of boosting is to combine the results of a number of “weak” learners (classifiers) in a smart way.

[Diagram: a monolithic SVM (input to output) vs. a “team of experts” L1-L4 whose outputs are combined with weights w1-w4]
Q2
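One way to picture a “thresholding” weak learner is a decision stump: a single feature compared against a threshold. This is an illustrative sketch, not course-provided code; the name train_stump and the weighted-error search are assumptions.

```python
def train_stump(xs, labels, weights):
    """Pick the single feature/threshold pair with the lowest weighted
    error: a 'thresholding' weak learner (a decision stump)."""
    best = None
    n_features = len(xs[0])
    for f in range(n_features):
        for thresh in sorted({x[f] for x in xs}):
            for polarity in (1, 0):  # which side of the threshold predicts class 1
                err = sum(w for x, c, w in zip(xs, labels, weights)
                          if (polarity if x[f] >= thresh else 1 - polarity) != c)
                if best is None or err < best[0]:
                    best = (err, f, thresh, polarity)
    err, f, thresh, polarity = best
    return lambda x: polarity if x[f] >= thresh else 1 - polarity

# 1-D toy data, separable at 0.5, with uniform example weights
xs = [[0.1], [0.3], [0.7], [0.9]]
labels = [0, 0, 1, 1]
h = train_stump(xs, labels, [0.25] * 4)
```

Each stump is weak on its own; boosting combines many of them, trained on different example weightings, into a strong classifier.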
Idea of Boosting (continued)

Consider each weak learner as an expert that produces “advice” (a classification result, possibly with confidence). We want to combine their predictions intelligently.

Call each weak learner L multiple times with a different distribution of the data (perhaps a subset of the whole training set).

Use a weighting scheme in which each expert’s advice is weighted by its accuracy on the training set.

Not memorizing the training set, since each classifier is weak.

[Diagram: a “team of experts” L1-L4 whose outputs are combined with weights w1-w4]
Notation

Training set contains labeled samples: {(x1, C(x1)), (x2, C(x2)), …, (xN, C(xN))}

C(xi) is the class label for sample xi, either 1 or 0.

w is a weight vector, 1 entry per example, of how important each example is. Can initialize to uniform (1/N), or weight guessed “important” examples more heavily.

h is a hypothesis (weak classifier output) in the range [0,1]. When called with a vector of inputs, h is a vector of outputs.
Adaboost Algorithm

Initialize weights: w_i^1 = 1/N
for t = 1:T
1. Set p_i^t = w_i^t / (sum_{j=1}^N w_j^t)
   (normalizes so the p weights sum to 1)
2. Call weak learner WL_t with each x, and get back result vector h_t
3. Find error ε_t of h_t:  ε_t = sum_{i=1}^N p_i^t |h_t(x_i) - c(x_i)|
   (average error on the training set, weighted by p)
4. Set β_t = ε_t / (1 - ε_t)
   (low overall error: β_t close to 0, big impact on the next step; error almost 0.5: β_t almost 1, small impact)
5. Re-weight the training data:
   w_i^{t+1} = w_i^t * β_t if h_t classified x_i correctly; w_i^t if incorrect
   (weights the ones it got right less, so next time it will focus more on the incorrect ones)

Output answer:
   h_f(x) = 1 if sum_{t=1}^T log(1/β_t) h_t(x) >= (1/2) sum_{t=1}^T log(1/β_t); 0 otherwise
Q3-5
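The loop above can be sketched directly in Python. This is an illustration of the slide's algorithm (labels in {0,1}, hypotheses in [0,1]); the weak_learn interface and the toy pool of threshold classifiers below are assumptions, not part of the course materials.

```python
import math

def adaboost(xs, labels, weak_learn, T):
    """AdaBoost as on the slide. weak_learn(xs, labels, p) is assumed to
    return a hypothesis h(x) trained under distribution p. Returns h_f."""
    N = len(xs)
    w = [1.0 / N] * N                               # initialize weights w_i^1 = 1/N
    hyps, betas = [], []
    for t in range(T):
        total = sum(w)
        p = [wi / total for wi in w]                # 1. normalize so p sums to 1
        h = weak_learn(xs, labels, p)               # 2. call the weak learner
        eps = sum(pi * abs(h(x) - c)                # 3. error weighted by p
                  for pi, x, c in zip(p, xs, labels))
        eps = max(eps, 1e-10)                       # avoid log(0); assumes eps < 0.5
        beta = eps / (1.0 - eps)                    # 4. beta_t
        w = [wi * beta ** (1.0 - abs(h(x) - c))     # 5. shrink weights of correct ones
             for wi, x, c in zip(w, xs, labels)]
        hyps.append(h)
        betas.append(beta)

    def h_f(x):
        s = sum(math.log(1.0 / b) * h(x) for h, b in zip(hyps, betas))
        half = 0.5 * sum(math.log(1.0 / b) for b in betas)
        return 1 if s >= half else 0
    return h_f

# Toy usage with a fixed pool of threshold classifiers as the weak learner
def weak_learn(xs, labels, p):
    cands = [lambda x, t=t: 1 if x[0] >= t else 0 for t in (0.2, 0.5, 0.8)]
    return min(cands, key=lambda h: sum(
        pi * abs(h(x) - c) for pi, x, c in zip(p, xs, labels)))

xs = [[0.1], [0.4], [0.6], [0.9]]
labels = [0, 0, 1, 1]
clf = adaboost(xs, labels, weak_learn, T=5)
```

Note how step 5 implements the re-weighting rule: the exponent 1 - |h(x) - c| is 1 for a correct prediction (weight multiplied by β_t < 1) and 0 for an incorrect one (weight unchanged).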
Adaboost Algorithm

Output answer:
   h_f(x) = 1 if sum_{t=1}^T log(1/β_t) h_t(x) >= (1/2) sum_{t=1}^T log(1/β_t); 0 otherwise

Combines predictions made at all T iterations. Label as class 1 if the weighted average of all T predictions is over ½. Weighted by the error of the overall classifier at that iteration (better classifiers weighted higher).
Q6
Practical issues

For binary problems, weak learners have to have at least 50% accuracy (better than chance).

Running time for training is O(nT), for n samples and T iterations.

How do we choose T? Trade-off between training time and accuracy. Best determined experimentally (see optT demo).
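Determining T experimentally, in the spirit of the optT demo, amounts to sweeping T and keeping the value with the lowest held-out error. The helper names here (choose_T, fit_boosted) are hypothetical:

```python
def choose_T(train, val, fit_boosted, max_T):
    """Train boosted classifiers for T = 1..max_T and keep the T with the
    lowest validation error. fit_boosted(train, T) is assumed to return a
    classifier; val is a list of (x, label) pairs."""
    best_T, best_err = 1, float("inf")
    for T in range(1, max_T + 1):
        clf = fit_boosted(train, T)
        err = sum(clf(x) != c for x, c in val) / len(val)
        if err < best_err:
            best_T, best_err = T, err
    return best_T

# Toy check: a dummy stand-in booster that is only accurate when T == 3
fit_boosted = lambda train, T: (lambda x: 1 if x >= (0.5 if T == 3 else 0.9) else 0)
val = [(0.6, 1), (0.4, 0), (0.95, 1), (0.2, 0)]
best = choose_T(None, val, fit_boosted, max_T=5)
```

Since each candidate requires training a full boosted classifier, this sweep multiplies the O(nT) training cost by the number of T values tried, which is the time/accuracy trade-off the slide mentions.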
Multiclass problems

Can be generalized to multiple classes using the variants Adaboost.M1 and Adaboost.M2.

Challenge: finding a weak learner with 50% accuracy if there are many classes. Random chance gives only 17% if there are 6 classes. Version M2 handles this.
Available software

GML AdaBoost Matlab Toolbox
Weak learner: classification tree
Uses less memory if the weak learner is simple
Better accuracy than SVM in tests (toy and real problems)
Demo
Sample use in literature
Q7