
CSC 411: Introduction to Machine Learning
Lecture 5: Ensembles II

Mengye Ren and Matthew MacKay

University of Toronto


Boosting

Recall that an ensemble is a set of predictors whose individual decisions are combined in some way to classify new examples.

(Previous lecture) Bagging: Train classifiers independently on random subsets of the training data.

(This lecture) Boosting: Train classifiers sequentially, each time focusing on training data points that were previously misclassified.

Let's start with the concepts of a weighted training set and a weak learner/classifier (or base classifier).


Weighted Training set

The misclassification error $\sum_{n=1}^{N} \mathbb{I}[h(x^{(n)}) \neq t^{(n)}]$ weights each training example equally.

Key idea: we can learn a classifier using different costs (aka weights) for examples.

- The classifier "tries harder" on examples with higher cost.

Change the cost function: $\sum_{n=1}^{N} \mathbb{I}[h(x^{(n)}) \neq t^{(n)}]$ becomes $\sum_{n=1}^{N} w^{(n)} \mathbb{I}[h(x^{(n)}) \neq t^{(n)}]$

Usually we require each $w^{(n)} > 0$ and $\sum_{n=1}^{N} w^{(n)} = 1$.
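To make the weighted cost concrete, here is a small NumPy sketch (ours, not from the slides; the function names are illustrative) computing the unweighted and weighted misclassification errors of a fixed set of predictions:

import numpy as np

def misclassification_error(pred, t):
    # Unweighted error: every example counts equally.
    return np.mean(pred != t)

def weighted_misclassification_error(pred, t, w):
    # Weighted error with w^(n) > 0; normalize so that sum_n w^(n) = 1.
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return np.sum(w * (pred != t))

# Example: pred = np.array([1, -1, 1]), t = np.array([1, 1, 1]), w = [0.1, 0.8, 0.1]
# gives unweighted error 1/3 but weighted error 0.8.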


Weak Learner/Classifier

(Informal) A weak learner is a learning algorithm that outputs a hypothesis (e.g., a classifier) that performs slightly better than chance, e.g., it predicts the correct label with probability $0.5 + \epsilon$ in the binary-label case.

- For a weighted training set, it gets slightly less than 0.5 weighted error.

We are interested in weak learners that are computationally efficient.

- Decision trees
- Even simpler, a decision stump: a decision tree with only a single split

[The formal definition of weak learnability has quantifiers such as "for any distribution over data" and the requirement that its guarantee holds only probabilistically.]
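As a concrete illustration of such a weak learner (our own sketch, not from the slides), a decision stump for a weighted training set can be fit by brute force, trying every feature, threshold, and sign and keeping the split with the lowest weighted error:

import numpy as np

def fit_stump(X, t, w):
    # X: (N, D) features; t: labels in {-1, +1}; w: positive example weights.
    N, D = X.shape
    w = w / np.sum(w)
    best_err, best_stump = np.inf, None
    for d in range(D):                      # feature to split on
        for thr in np.unique(X[:, d]):      # candidate threshold
            for sign in (+1, -1):           # which side is predicted +1
                pred = np.where(X[:, d] > thr, sign, -sign)
                err = np.sum(w * (pred != t))
                if err < best_err:
                    best_err, best_stump = err, (d, thr, sign)
    return best_stump

def predict_stump(stump, X):
    d, thr, sign = stump
    return np.where(X[:, d] > thr, sign, -sign)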


Weak Classifiers

These weak classifiers, which are decision stumps, consist of the set of horizontal and vertical half spaces.

[Figure: examples of vertical half spaces and horizontal half spaces]



A single weak classifier is not capable of making the training error very small. It only performs slightly better than chance, i.e., the error of classifier $h$ according to the given weights $\{w^{(1)}, \ldots, w^{(N)}\}$ (with $\sum_{n=1}^{N} w^{(n)} = 1$ and $w^{(n)} \geq 0$),

$\mathrm{err} = \sum_{n=1}^{N} w^{(n)} \mathbb{I}[h(x^{(n)}) \neq t^{(n)}],$

is at most $\frac{1}{2} - \gamma$ for some $\gamma > 0$.

Can we combine a set of weak classifiers in order to make a better ensemble of classifiers?


AdaBoost (Adaptive Boosting)

Boosting: Train classifiers sequentially, each time assigning higher weight to training data points that were previously misclassified.

Key steps of AdaBoost:

1. At each iteration we re-weight the training samples by assigning larger weights to samples (i.e., data points) that were classified incorrectly.

2. We train a new weak classifier based on the re-weighted samples.

3. We add this weak classifier to the ensemble of classifiers. This is our new classifier.

4. We repeat the process many times.

The weak learner needs to minimize weighted error.

AdaBoost reduces bias by making each classifier focus on previous mistakes.


AdaBoost Algorithm

Input: Data $\mathcal{D}_N = \{x^{(n)}, t^{(n)}\}_{n=1}^{N}$, weak classifier WeakLearn (a classification procedure that returns a classifier from a base hypothesis space $\mathcal{H}$, with $h : x \to \{-1, +1\}$ for $h \in \mathcal{H}$), number of iterations $T$

Output: Classifier $H(x)$

Initialize sample weights: $w^{(n)} = \frac{1}{N}$ for $n = 1, \ldots, N$

For $t = 1, \ldots, T$:

- Fit a classifier to the data using the weighted samples ($h_t \leftarrow \text{WeakLearn}(\mathcal{D}_N, w)$), e.g., $h_t \leftarrow \arg\min_{h \in \mathcal{H}} \sum_{n=1}^{N} w^{(n)} \mathbb{I}\{h(x^{(n)}) \neq t^{(n)}\}$
- Compute the weighted error $\mathrm{err}_t = \frac{\sum_{n=1}^{N} w^{(n)} \mathbb{I}\{h_t(x^{(n)}) \neq t^{(n)}\}}{\sum_{n=1}^{N} w^{(n)}}$
- Compute the classifier coefficient $\alpha_t = \frac{1}{2} \log \frac{1 - \mathrm{err}_t}{\mathrm{err}_t}$
- Update the data weights: $w^{(n)} \leftarrow w^{(n)} \exp\left(-\alpha_t\, t^{(n)} h_t(x^{(n)})\right)$ $\left[\equiv w^{(n)} \exp\left(2\alpha_t\, \mathbb{I}\{h_t(x^{(n)}) \neq t^{(n)}\}\right)\right]$ (the two forms differ only by a factor $e^{-\alpha_t}$ common to all examples, which does not affect the algorithm since $\mathrm{err}_t$ is normalized)

Return $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
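Below is a minimal Python sketch of this loop (ours, not from the slides), using scikit-learn depth-1 decision trees as the weak learner; the names adaboost_train and adaboost_predict are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, t, T=50):
    # t must contain labels in {-1, +1}.
    N = X.shape[0]
    w = np.full(N, 1.0 / N)                        # w^(n) = 1/N
    hypotheses, alphas = [], []
    for _ in range(T):
        # WeakLearn: fit a decision stump to the weighted samples.
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, t, sample_weight=w)
        pred = h.predict(X)
        err = np.sum(w * (pred != t)) / np.sum(w)  # weighted error err_t
        err = np.clip(err, 1e-10, 1 - 1e-10)       # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # classifier coefficient alpha_t
        # Multiply the weights of misclassified examples by exp(2 * alpha_t),
        # matching the bracketed form of the update above.
        w = w * np.exp(2 * alpha * (pred != t))
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def adaboost_predict(X, hypotheses, alphas):
    # H(x) = sign(sum_t alpha_t h_t(x))
    scores = sum(a * h.predict(X) for h, a in zip(hypotheses, alphas))
    return np.sign(scores)

Usage would look like: hypotheses, alphas = adaboost_train(X_train, t_train, T=100) followed by adaboost_predict(X_test, hypotheses, alphas).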


Weighting Intuition

Recall: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$, where $\alpha_t = \frac{1}{2}\log\frac{1-\mathrm{err}_t}{\mathrm{err}_t}$

Weak classifiers which get lower weighted error get more weight in the final classifier.

Also: $w^{(n)} \leftarrow w^{(n)} \exp\left(2\alpha_t\, \mathbb{I}\{h_t(x^{(n)}) \neq t^{(n)}\}\right)$

- If $\mathrm{err}_t \approx 0$, $\alpha_t$ is high, so misclassified examples get much more attention.
- If $\mathrm{err}_t \approx 0.5$, $\alpha_t$ is low, so misclassified examples are not emphasized.
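For a sense of scale (our own arithmetic with the formula above): $\mathrm{err}_t = 0.1$ gives $\alpha_t = \frac{1}{2}\log 9 \approx 1.10$, $\mathrm{err}_t = 0.3$ gives $\alpha_t \approx 0.42$, $\mathrm{err}_t = 0.45$ gives $\alpha_t \approx 0.10$, and $\mathrm{err}_t = 0.5$ gives $\alpha_t = 0$: a nearly random weak learner contributes almost nothing to $H$ and barely changes the weights.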


AdaBoost Example

Training data

[Slide credit: Verma & Thrun]


AdaBoost Example

Round 1

$w = \left(\frac{1}{10}, \ldots, \frac{1}{10}\right)$
$\Rightarrow$ Train a classifier $h_1$ (using $w$)
$\Rightarrow \mathrm{err}_1 = \frac{\sum_{n=1}^{10} w^{(n)} \mathbb{I}[h_1(x^{(n)}) \neq t^{(n)}]}{\sum_{n=1}^{10} w^{(n)}} = \frac{3}{10}$
$\Rightarrow \alpha_1 = \frac{1}{2}\log\frac{1-\mathrm{err}_1}{\mathrm{err}_1} = \frac{1}{2}\log\left(\frac{1}{0.3}-1\right) \approx 0.42$
$\Rightarrow H(x) = \mathrm{sign}(\alpha_1 h_1(x))$

[Slide credit: Verma & Thrun]
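Continuing Round 1 (a worked step of our own, not shown on the slide): since $e^{2\alpha_1} = \frac{1-\mathrm{err}_1}{\mathrm{err}_1} = \frac{0.7}{0.3} = \frac{7}{3}$, the update $w^{(n)} \leftarrow w^{(n)} \exp\left(2\alpha_1 \mathbb{I}\{h_1(x^{(n)}) \neq t^{(n)}\}\right)$ multiplies the weight of each of the three misclassified points by $\frac{7}{3}$ and leaves the other seven at $\frac{1}{10}$. The misclassified points now carry total weight $3 \times \frac{1}{10} \times \frac{7}{3} = 0.7$, exactly equal to the total weight of the correctly classified points, so $h_2$ must pay as much attention to $h_1$'s mistakes as to everything it got right.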


AdaBoost Example

Round 2

$w \leftarrow$ new weights
$\Rightarrow$ Train a classifier $h_2$ (using $w$)
$\Rightarrow \mathrm{err}_2 = \frac{\sum_{n=1}^{10} w^{(n)} \mathbb{I}\{h_2(x^{(n)}) \neq t^{(n)}\}}{\sum_{n=1}^{10} w^{(n)}} = 0.21$
$\Rightarrow \alpha_2 = \frac{1}{2}\log\frac{1-\mathrm{err}_2}{\mathrm{err}_2} = \frac{1}{2}\log\left(\frac{1}{0.21}-1\right) \approx 0.66$
$\Rightarrow H(x) = \mathrm{sign}(\alpha_1 h_1(x) + \alpha_2 h_2(x))$

[Slide credit: Verma & Thrun]

AdaBoost Example

Round 3

$w \leftarrow$ new weights
$\Rightarrow$ Train a classifier $h_3$ (using $w$)
$\Rightarrow \mathrm{err}_3 = \frac{\sum_{n=1}^{10} w^{(n)} \mathbb{I}\{h_3(x^{(n)}) \neq t^{(n)}\}}{\sum_{n=1}^{10} w^{(n)}} = 0.14$
$\Rightarrow \alpha_3 = \frac{1}{2}\log\frac{1-\mathrm{err}_3}{\mathrm{err}_3} = \frac{1}{2}\log\left(\frac{1}{0.14}-1\right) \approx 0.91$
$\Rightarrow H(x) = \mathrm{sign}(\alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x))$

[Slide credit: Verma & Thrun]


AdaBoost Example

Final classifier

[Slide credit: Verma & Thrun]


AdaBoost Algorithm

[Schematic: the training samples are used to fit $h_1$; re-weighted samples are used to fit $h_2, h_3, \ldots, h_T$; the ensemble outputs $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$, with $\mathrm{err}_t$, $\alpha_t$, and the weight update $w_i \leftarrow w_i \exp\left(2\alpha_t\, \mathbb{I}\{h_t(x^{(i)}) \neq t^{(i)}\}\right)$ as defined on the previous slide.]


AdaBoost Example

Each figure shows the number m of base learners trained so far, the decision of the most recent learner (dashed black), and the boundary of the ensemble (green).


AdaBoost Minimizes the Training Error

Theorem. Assume that at each iteration of AdaBoost the WeakLearn returns a hypothesis with error $\mathrm{err}_t \leq \frac{1}{2} - \gamma$ for all $t = 1, \ldots, T$ with $\gamma > 0$. The training error of the output hypothesis $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$ is at most

$L_N(H) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}\{H(x^{(i)}) \neq t^{(i)}\} \leq \exp\left(-2\gamma^2 T\right).$

This is under the simplifying assumption that each weak learner is $\gamma$-better than a random predictor.

Analyzing the convergence of AdaBoost is generally difficult.
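A direct consequence (a one-line derivation of ours, not on the slide): $L_N(H)$ only takes values in $\{0, \frac{1}{N}, \frac{2}{N}, \ldots\}$, so as soon as $\exp(-2\gamma^2 T) < \frac{1}{N}$, i.e.

$T > \frac{\log N}{2\gamma^2},$

the training error is exactly zero. For example, with $N = 1000$ and $\gamma = 0.1$, about $T > \frac{\log 1000}{0.02} \approx 346$ rounds suffice.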


Generalization Error of AdaBoost

AdaBoost's training error (loss) converges to zero. What about the test error of $H$?

As we add more weak classifiers, the overall classifier $H$ becomes more "complex".

We expect more complex classifiers to overfit.

If one runs AdaBoost long enough, it can in fact overfit.

[Figure: "Overfitting Can Happen": training and test error vs. the number of rounds (1 to 1000), boosting "stumps" on the heart-disease dataset]

• but often doesn't...


Generalization Error of AdaBoost

But often it does not!

Sometimes the test error decreases even after the training error is zero!

[Figure: "Actual Typical Run": training and test error vs. the number of rounds T, boosting C4.5 on the "letter" dataset]

• test error does not increase, even after 1000 rounds (total size > 2,000,000 nodes)

• test error continues to drop even after training error is zero!

# rounds        5      100     1000
train error     0.0    0.0     0.0
test error      8.4    3.3     3.1

• Occam’s razor wrongly predicts “simpler” rule is better

How does that happen?

We will provide an alternative viewpoint on AdaBoost later in the course.

[Slide credit: Robert Schapire's slides, http://www.cs.princeton.edu/courses/archive/spring12/cos598A/schedule.html]


AdaBoost for Face Detection

Famous application of boosting: detecting faces in images

Viola and Jones created a very fast face detector that can be scanned across a large image to find the faces.

A few twists on the standard algorithm:

- Change the loss function for weak learners: false positives are less costly than misses
- Smart way to do inference in real time (on 2001 hardware)


AdaBoost for Face Detection

The base classifier/weak learner just compares the total intensity in two rectangular pieces of the image and classifies based on comparing this difference to some threshold.

- There is a neat trick (the integral image) for computing the total intensity in a rectangle in a few operations; a sketch follows this list.
- So it is easy to evaluate a huge number of base classifiers, and they are very fast at runtime.
- The algorithm adds classifiers greedily based on their quality on the weighted training cases.
- Each classifier uses just one feature.
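A minimal NumPy sketch of that trick, the integral image (ours, not from the slides): precompute 2-D cumulative sums once, after which the total intensity of any rectangle takes four array lookups.

import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[:r, :c]; the extra row/column of zeros simplifies lookups.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    # Total intensity of img[r0:r1, c0:c1] in four lookups.
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

# A two-rectangle (Haar-like) feature is then the difference of two such sums,
# which the decision stump compares to a threshold.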


AdaBoost Face Detection Results


Summary

Boosting reduces bias by generating an ensemble of weak classifiers.

Each classifier is trained to reduce errors of previous ensemble.

It is quite resilient to overfitting, though it can overfit.

We will later provide a loss-minimization viewpoint on AdaBoost. It allows us to derive other boosting algorithms for regression, ranking, etc.


Ensembles Recap

Ensembles combine classifiers to improve performance

Boosting

- Reduces bias
- Increases variance (a large ensemble can cause overfitting)
- Sequential
- High dependency between ensemble elements

Bagging

- Reduces variance (a large ensemble can't cause overfitting)
- Bias is not changed (much)
- Parallel
- Want to minimize correlation between ensemble elements.

Next Lecture: Linear Regression
