  • Lecture 9: Classification, LDA

    Reading: Chapter 4

    STATS 202: Data mining and analysis

    Jonathan Taylor, 10/12
    Slide credits: Sergio Bacallado

  • Review: Main strategy in Chapter 4

    Find an estimate P̂(Y | X). Then, given an input x0, we predict the
    response as in a Bayes classifier:

    ŷ0 = argmax_y P̂(Y = y | X = x0).
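
    A minimal sketch of this prediction rule in Python: given a matrix of
    estimated posteriors P̂(Y = k | X = xi) (the values below are made up
    for illustration), predict the class with the largest posterior.

    import numpy as np

    P_hat = np.array([[0.2, 0.5, 0.3],   # posteriors for input x1
                      [0.7, 0.1, 0.2]])  # posteriors for input x2

    y_hat = P_hat.argmax(axis=1)         # argmax over classes k
    print(y_hat)                         # [1 0]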

  • Linear Discriminant Analysis (LDA)

    Instead of estimating P(Y | X), we will estimate:

    1. P̂(X | Y): Given the response, what is the distribution of the inputs?

    2. P̂(Y): How likely is each of the categories?

    Then, we use Bayes rule to obtain the estimate:

    P̂(Y = k | X = x) = P̂(X = x | Y = k) P̂(Y = k) / ∑_j P̂(X = x | Y = j) P̂(Y = j)
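
    A sketch of this Bayes-rule step: combine class-conditional densities
    P̂(X = x | Y = k) with priors P̂(Y = k) to get posterior class
    probabilities (the numbers are made up for illustration).

    import numpy as np

    f_hat = np.array([0.05, 0.20, 0.01])   # P̂(X = x | Y = k) for k = 0, 1, 2
    pi_hat = np.array([0.5, 0.3, 0.2])     # P̂(Y = k)

    numer = f_hat * pi_hat
    posterior = numer / numer.sum()        # divide by ∑_j P̂(X = x | Y = j) P̂(Y = j)
    print(posterior, posterior.argmax())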

  • Linear Discriminant Analysis (LDA)

    Instead of estimating P(Y | X), we will estimate:

    1. We model P̂(X = x | Y = k) = f̂k(x) as a Multivariate
       Normal distribution:

    [Figure: samples from the Gaussian model in the (X1, X2) plane, with density contours for each class.]

    2. P̂(Y = k) = π̂k is estimated by the fraction of training
       samples of class k.

  • LDA has linear decision boundaries

    Suppose that:

    - We know P(Y = k) = πk exactly.

    - P(X = x | Y = k) is Multivariate Normal with density:

      fk(x) = 1 / ((2π)^(p/2) |Σ|^(1/2)) · exp( −(1/2) (x − µk)ᵀ Σ⁻¹ (x − µk) )

      µk : Mean of the inputs for category k.
      Σ : Covariance matrix (common to all categories).

    Then, what is the Bayes classifier?
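
    A sketch of the multivariate normal density fk(x) above in numpy (the
    mean and covariance values are illustrative, not estimates from data):

    import numpy as np

    def mvn_density(x, mu, Sigma):
        """Density of N(mu, Sigma) evaluated at x."""
        p = len(mu)
        diff = x - mu
        quad = diff @ np.linalg.solve(Sigma, diff)   # (x − µ)ᵀ Σ⁻¹ (x − µ)
        norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
        return np.exp(-0.5 * quad) / norm_const

    mu_k = np.array([1.0, -1.0])
    Sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])
    print(mvn_density(np.array([0.5, 0.0]), mu_k, Sigma))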

  • LDA has linear decision boundaries

    By Bayes rule, the probability of category k, given the input x, is:

    P(Y = k | X = x) = fk(x) πk / P(X = x)

    The denominator does not depend on the response k, so we can
    write it as a constant:

    P(Y = k | X = x) = C × fk(x) πk

    Now, expanding fk(x):

    P(Y = k | X = x) = ( C πk / ((2π)^(p/2) |Σ|^(1/2)) ) · exp( −(1/2) (x − µk)ᵀ Σ⁻¹ (x − µk) )

  • LDA has linear decision boundaries

    P(Y = k | X = x) = ( C πk / ((2π)^(p/2) |Σ|^(1/2)) ) · exp( −(1/2) (x − µk)ᵀ Σ⁻¹ (x − µk) )

    Now, let us absorb everything that does not depend on k into a
    constant C′:

    P(Y = k | X = x) = C′ πk exp( −(1/2) (x − µk)ᵀ Σ⁻¹ (x − µk) )

    and take the logarithm of both sides:

    log P(Y = k | X = x) = log C′ + log πk − (1/2) (x − µk)ᵀ Σ⁻¹ (x − µk).

    The constant log C′ is the same for every category k, so we want to
    find the maximum of the remaining terms over k.

  • LDA has linear decision boundaries

    Goal: maximize the following over k:

    log πk − (1/2) (x − µk)ᵀ Σ⁻¹ (x − µk)

      = log πk − (1/2) [ xᵀ Σ⁻¹ x + µkᵀ Σ⁻¹ µk ] + xᵀ Σ⁻¹ µk

      = C″ + log πk − (1/2) µkᵀ Σ⁻¹ µk + xᵀ Σ⁻¹ µk

    We define the objective:

    δk(x) = log πk − (1/2) µkᵀ Σ⁻¹ µk + xᵀ Σ⁻¹ µk

    At an input x, we predict the response with the highest δk(x).
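
    A sketch of the discriminant score δk(x) defined above, with made-up
    class means, shared covariance, and priors:

    import numpy as np

    def lda_scores(x, mus, Sigma, pis):
        """Return δk(x) for every class k."""
        Sigma_inv = np.linalg.inv(Sigma)
        scores = [np.log(pi_k)
                  - 0.5 * mu_k @ Sigma_inv @ mu_k
                  + x @ Sigma_inv @ mu_k
                  for mu_k, pi_k in zip(mus, pis)]
        return np.array(scores)

    mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
    Sigma = np.eye(2)
    pis = [0.6, 0.4]

    x = np.array([1.0, 0.5])
    deltas = lda_scores(x, mus, Sigma, pis)
    print(deltas, deltas.argmax())   # predict the class with the largest δk(x)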

  • LDA has linear decision boundaries

    What is the decision boundary? It is the set of points at which 2
    classes do just as well:

    δk(x) = δℓ(x)

    log πk − (1/2) µkᵀ Σ⁻¹ µk + xᵀ Σ⁻¹ µk = log πℓ − (1/2) µℓᵀ Σ⁻¹ µℓ + xᵀ Σ⁻¹ µℓ

    This is a linear equation in x.

    [Figure: the (X1, X2) data with the linear decision boundaries drawn between the classes.]
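
    A worked sketch of that boundary between two classes k and ℓ: collecting
    terms in δk(x) = δℓ(x) gives xᵀ Σ⁻¹ (µk − µℓ) = c, a linear equation in x
    (the parameter values below are illustrative):

    import numpy as np

    mu_k, mu_l = np.array([0.0, 0.0]), np.array([2.0, 1.0])
    Sigma = np.eye(2)
    pi_k, pi_l = 0.6, 0.4

    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu_k - mu_l)            # normal vector of the boundary
    c = (np.log(pi_l) - np.log(pi_k)
         + 0.5 * mu_k @ Sigma_inv @ mu_k
         - 0.5 * mu_l @ Sigma_inv @ mu_l)    # constant term

    # Points x with w @ x > c are classified as k; points with w @ x < c as ℓ.
    print(w, c)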

  • Estimating πk

    π̂k = #{i ; yi = k} / n

    In English, the fraction of training samples of class k.
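
    A one-line sketch of this estimate on a toy label vector:

    import numpy as np

    y = np.array([0, 1, 1, 2, 0, 1])      # hypothetical training labels
    pi_hat = np.bincount(y) / len(y)      # fraction of samples in each class
    print(pi_hat)                         # [0.333... 0.5 0.166...]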

  • Estimating the parameters of fk(x)

    Estimate the center of each class µk:

    µ̂k = ( 1 / #{i ; yi = k} ) ∑_{i ; yi = k} xi

    Estimate the common covariance matrix Σ:

    - One predictor (p = 1):

      σ̂² = ( 1 / (n − K) ) ∑_{k=1}^{K} ∑_{i ; yi = k} (xi − µ̂k)².

    - Many predictors (p > 1): Compute the vectors of deviations
      (x1 − µ̂y1), (x2 − µ̂y2), ..., (xn − µ̂yn) and use an unbiased
      estimate of its covariance matrix, Σ.
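
    A sketch of these estimates: per-class means µ̂k and a pooled covariance
    Σ̂ from the deviations xi − µ̂yi, divided by n − K for an unbiased
    estimate (X and y are small made-up training arrays):

    import numpy as np

    X = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.2],    # class 0
                  [3.0, 0.5], [3.2, 0.9], [2.8, 0.4]])   # class 1
    y = np.array([0, 0, 0, 1, 1, 1])

    classes = np.unique(y)
    n, K = len(y), len(classes)

    mu_hat = {k: X[y == k].mean(axis=0) for k in classes}       # µ̂k
    deviations = np.vstack([X[y == k] - mu_hat[k] for k in classes])
    Sigma_hat = deviations.T @ deviations / (n - K)             # pooled Σ̂
    print(mu_hat)
    print(Sigma_hat)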

  • LDA prediction

    For an input x, predict the class with the largest:

    δ̂k(x) = log π̂k − (1/2) µ̂kᵀ Σ̂⁻¹ µ̂k + xᵀ Σ̂⁻¹ µ̂k

    The decision boundaries are defined by:

    log π̂k − (1/2) µ̂kᵀ Σ̂⁻¹ µ̂k + xᵀ Σ̂⁻¹ µ̂k = log π̂ℓ − (1/2) µ̂ℓᵀ Σ̂⁻¹ µ̂ℓ + xᵀ Σ̂⁻¹ µ̂ℓ

    Solid lines in:

    [Figure: the (X1, X2) training data with the estimated LDA decision boundaries drawn as solid lines.]
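
    A minimal sketch of LDA prediction with scikit-learn; it fits the class
    means, shared covariance, and priors, then predicts by the largest
    discriminant score (X and y are the toy arrays from above):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.2],
                  [3.0, 0.5], [3.2, 0.9], [2.8, 0.4]])
    y = np.array([0, 0, 0, 1, 1, 1])

    lda = LinearDiscriminantAnalysis()
    lda.fit(X, y)

    x_new = np.array([[2.0, 1.5]])
    print(lda.predict(x_new))          # predicted class
    print(lda.predict_proba(x_new))    # estimated P̂(Y = k | X = x)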

  • Quadratic discriminant analysis (QDA)

    The assumption that the inputs of every class have the same
    covariance Σ can be quite restrictive:

    [Figure: two classes in the (X1, X2) plane whose covariances clearly differ.]

  • Quadratic discriminant analysis (QDA)

    In quadratic discriminant analysis we estimate a mean µ̂k and a
    covariance matrix Σ̂k for each class separately.

    Given an input, it is easy to derive an objective function:

    δk(x) = log πk − (1/2) µkᵀ Σk⁻¹ µk + xᵀ Σk⁻¹ µk − (1/2) xᵀ Σk⁻¹ x − (1/2) log |Σk|

    This objective is now quadratic in x, and so are the decision
    boundaries.
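
    A sketch of this QDA objective, where each class has its own mean and
    covariance (all parameter values are illustrative):

    import numpy as np

    def qda_score(x, mu_k, Sigma_k, pi_k):
        """δk(x) for QDA with class-specific covariance Σk."""
        Sigma_inv = np.linalg.inv(Sigma_k)
        _, logdet = np.linalg.slogdet(Sigma_k)
        return (np.log(pi_k)
                - 0.5 * mu_k @ Sigma_inv @ mu_k
                + x @ Sigma_inv @ mu_k
                - 0.5 * x @ Sigma_inv @ x
                - 0.5 * logdet)

    x = np.array([1.0, 0.5])
    score_0 = qda_score(x, np.array([0.0, 0.0]), np.eye(2), 0.6)
    score_1 = qda_score(x, np.array([2.0, 1.0]),
                        np.array([[2.0, 0.5], [0.5, 1.0]]), 0.4)
    print(score_0, score_1)   # predict the class with the larger score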

  • Quadratic discriminant analysis (QDA)

    - Bayes boundary (– – –)
    - LDA (· · · · · ·)
    - QDA (——)

    [Figure: two panels of (X1, X2) data comparing the Bayes (dashed), LDA (dotted), and QDA (solid) decision boundaries.]

  • Evaluating a classification method

    We have talked about the 0-1 loss:

    (1/m) ∑_{i=1}^{m} 1(yi ≠ ŷi).

    It is possible to make the wrong prediction for some classes more
    often than others. The 0-1 loss doesn't tell you anything about this.

    A much more informative summary of the error is a confusion
    matrix:
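
    A sketch of the 0-1 loss and a confusion matrix for hypothetical true
    and predicted labels (scikit-learn's confusion_matrix tabulates counts
    per true/predicted class pair):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

    zero_one_loss = np.mean(y_true != y_pred)   # (1/m) ∑ 1(yi ≠ ŷi)
    print(zero_one_loss)                        # 0.25
    print(confusion_matrix(y_true, y_pred))     # rows: true class, columns: predicted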

  • Example. Predicting default

    Used LDA to predict credit card default in a dataset of 10K people.

    Predicted "yes" if P(default = yes | X) > 0.5.

    - The error rate among people who do not default (false
      positive rate) is very low.

    - However, the rate of false negatives is 76%.

    - It is possible that false negatives are a bigger source of
      concern!

    - One possible solution: Change the threshold.

  • Example. Predicting default

    Changing the threshold to 0.2 makes it easier to classify as "yes".

    Predicted "yes" if P(default = yes | X) > 0.2.

    Note that the rate of false positives became higher! That is the
    price to pay for fewer false negatives.
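
    A sketch of changing the threshold: with posterior probabilities
    P̂(default = yes | X) (hypothetical values below), lowering the
    threshold from 0.5 to 0.2 turns more borderline cases into "yes"
    predictions, trading false negatives for false positives.

    import numpy as np

    p_default = np.array([0.03, 0.25, 0.45, 0.60, 0.10])   # P̂(default = yes | X)

    pred_at_05 = p_default > 0.5   # default threshold
    pred_at_02 = p_default > 0.2   # lowered threshold
    print(pred_at_05)              # [False False False  True False]
    print(pred_at_02)              # [False  True  True  True False]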

  • Example. Predicting default

    Let's visualize the dependence of the error on the threshold:

    [Figure: error rates (y axis) as a function of the threshold (x axis, from 0.0 to 0.5).]

    - – – – False negative rate (error for defaulting customers)
    - · · · · · False positive rate (error for non-defaulting customers)
    - —— 0-1 loss or total error rate.

  • Example. The ROC curve

    [Figure: ROC curve, true positive rate (y axis) versus false positive rate (x axis), both from 0.0 to 1.0.]

    - Displays the performance of the method for any choice of
      threshold.

    - The area under the curve (AUC) measures the quality of the
      classifier:

      - 0.5 is the AUC for a random classifier.
      - The closer AUC is to 1, the better.
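
    A sketch of computing an ROC curve and its AUC with scikit-learn, from
    hypothetical true labels and predicted probabilities of the positive
    class:

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    p_pos = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

    fpr, tpr, thresholds = roc_curve(y_true, p_pos)   # one point per threshold
    auc = roc_auc_score(y_true, p_pos)                # area under the ROC curve
    print(auc)                                        # 0.5 ≈ random; closer to 1 is better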

  • Next time

    - Comparison of logistic regression, LDA, QDA, and KNN
      classification.

    - Start Chapter 5: Resampling.