

DISCRIMINANT ANALYSIS
Statistics 407, ISU

WHAT IS IT?

Supervised classification, alternatively called discriminant analysis, includes multivariate techniques for finding a rule that separates the known classes, and using this rule to classify new observations. The process starts with a training sample, that is, the full data set with known classes. Typically the variables that will be used to generate the classification rule are easy/cheap to measure, but the class is more difficult to measure. It is important to be able to classify new observations using variables that are easy to measure.


VISUAL METHODS FOR DISCRIMINATION

Use color and plotting symbols (glyphs) to code the class/group information in the plots. Then use the full range of plotting methods described in the section on graphics. Look for separations of the points into the color/glyph groupings. Determine which variables are potentially good separators.

EXAMPLE: AUSTRALIAN CRABS

This data is from a study of Australian crabs. There are 5 physical measurements recorded on 2 species (blue and orange) and both sexes of each species, giving 4 groups. This is a scatterplot of the blue species with the two sexes identified.

[Figure: scatterplot of rear width against frontal lobe (both roughly 5-20), with males and females marked.]

Where would you draw the boundary for this data?


LINEAR DISCRIMINANT ANALYSIS

LDA is based on the assumption that the data come from a multivariate normal distribution with equal variance-covariance matrices. Comparing the density functions reduces the rule to:

Allocate a new observation $X_0$ to group 1 if

$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}X_0 - \frac{1}{2}(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}(\bar{X}_1 + \bar{X}_2) \geq 0$$

else allocate to group 2.

LDA RULE FOR P=1, G=2

The LDA rule results from assuming that the data for each class come from a MVN with different means but the same variance-covariance matrix. The boundary between the two groups is where the two density functions intersect.

[Figure: the densities of group 1 and group 2 plotted over X, with the samples from each group marked along the axis; the LDA boundary is where the two densities intersect.]


EXAMPLE

$\bar{X}_{Male} = (14.8,\ 11.7)'$, $\bar{X}_{Fem} = (13.3,\ 12.1)'$, $n_{Male} = 50$, $n_{Fem} = 50$

$$S_{Male} = \begin{pmatrix} 10.3 & 6.5 \\ 6.5 & 4.5 \end{pmatrix}, \qquad S_{Fem} = \begin{pmatrix} 6.9 & 6.3 \\ 6.3 & 5.9 \end{pmatrix}$$

$$S_{pooled} = \frac{(n_1 - 1)S_1}{(n_1 - 1) + (n_2 - 1)} + \frac{(n_2 - 1)S_2}{(n_1 - 1) + (n_2 - 1)} = \begin{pmatrix} 8.6 & 6.4 \\ 6.4 & 5.2 \end{pmatrix}, \qquad S_{pooled}^{-1} = \begin{pmatrix} 1.47 & -1.81 \\ -1.81 & 2.42 \end{pmatrix}$$

$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1} = (1.5,\ -0.4)\begin{pmatrix} 1.47 & -1.81 \\ -1.81 & 2.42 \end{pmatrix} = (3.01,\ -3.86)$$

This forms the coordinates of a vector giving the direction of maximum separation between the group means.

EXAMPLE

Direction of maximum separation:

[Figure: the crab data with the direction of maximum separation drawn through it.]


EXAMPLE

Data projected into the direction of maximum separation, LD1. The boundary between the groups is at 0.

[Figure: histograms of LD1 by sex, and side-by-side boxplots of LD1 for males and females.]

EXAMPLE

The resulting rule is: classify the new observation $X_0$ as Male if

$$(3.01,\ -3.86)\,X_0 + 2.93 \geq 0$$

else allocate as Female.
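As a sanity check, here is a minimal R sketch of the computation above, using the rounded summary statistics from the slides; small differences from the slide's coefficients (3.01, -3.86, 2.93) are expected because of the rounding. The new observation x0 is hypothetical.

    # Summary statistics from the crab example (rounded, as printed on the slides).
    xbar.m <- c(14.8, 11.7)              # male group means
    xbar.f <- c(13.3, 12.1)              # female group means
    Sp <- matrix(c(8.6, 6.4,
                   6.4, 5.2), 2, 2)      # pooled variance-covariance matrix

    a <- solve(Sp) %*% (xbar.m - xbar.f) # direction of maximum separation
    b <- -0.5 * t(xbar.m - xbar.f) %*% solve(Sp) %*% (xbar.m + xbar.f)

    # LDA rule: classify a new observation x0 as Male if t(a) %*% x0 + b >= 0.
    x0 <- c(15, 11)                      # a hypothetical new crab
    if (t(a) %*% x0 + b >= 0) "Male" else "Female"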


INCORPORATING PRIORS

$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}X_0 - \frac{1}{2}(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}(\bar{X}_1 + \bar{X}_2) \geq \ln\frac{p_2}{p_1}$$

where $p_1, p_2$ are the prior probabilities for group 1 and group 2. This shifts the boundary away from the group with the higher prior.
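In software the priors are usually supplied directly. A sketch with MASS::lda on the crabs data that ships with MASS (our choice of illustration; the slides do not show this code; FL is frontal lobe, RW is rear width):

    library(MASS)
    blue <- subset(crabs, sp == "B")     # the blue species
    # prior is given in the order of the factor levels of `sex`;
    # the boundary shifts away from the group with the higher prior.
    fit <- lda(sex ~ FL + RW, data = blue, prior = c(0.7, 0.3))
    predict(fit, data.frame(FL = 15, RW = 12))$class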

MISCLASSIFICATION TABLE

Predict the class of the training sample, and tabulate against the true class.

                         Predicted membership
    Actual membership    Group 1                Group 2
    Group 1              n_1C                   n_1M = n_1 - n_1C
    Group 2              n_2M = n_2 - n_2C      n_2C

The apparent error rate (APR) is

$$APR = \frac{n_{1M} + n_{2M}}{n_1 + n_2}$$


EXAMPLE

                     Predicted
    Actual       Male    Female
    Male          45        5
    Female         1       49

APR = (5 + 1)/100 = 0.06
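A quick check of this APR in R, entering the confusion table from the slide:

    # Confusion table: rows are actual, columns are predicted.
    tab <- matrix(c(45,  5,
                     1, 49), 2, 2, byrow = TRUE,
                  dimnames = list(actual    = c("Male", "Female"),
                                  predicted = c("Male", "Female")))
    apr <- (tab["Male", "Female"] + tab["Female", "Male"]) / sum(tab)
    apr   # (5 + 1) / 100 = 0.06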

DISCRIMINANT FUNCTIONS

The LDA rule can be divided into parts:

$$c_j = \bar{X}_j'S_{pooled}^{-1}X_0 - \frac{1}{2}\bar{X}_j'S_{pooled}^{-1}\bar{X}_j + \ln(p_j), \qquad j = 1, 2$$

and the rule is to allocate the new observation to the group with the largest value of the discriminant function.


CLOSEST MEAN?

The LDA rule corresponds to allocating a new observation to the group that has the smallest squared Mahalanobis distance between the new observation and the group mean:

$$d_j = \frac{1}{2}(X_0 - \bar{X}_j)'S_{pooled}^{-1}(X_0 - \bar{X}_j) - \ln(p_j), \qquad j = 1, 2$$

MORE THAN 2 GROUPS

There are now $g$ groups, and the rule is the same: allocate to the group with the largest value of the discriminant function

$$c_j = \bar{X}_j'S_{pooled}^{-1}X_0 - \frac{1}{2}\bar{X}_j'S_{pooled}^{-1}\bar{X}_j + \ln(p_j), \qquad j = 1, \dots, g$$


CANONICAL COORDINATES

The low-dimensional space which best separates the groups is given by the eigenvectors of $W^{-1}B$, where

$$B = \sum_{i=1}^{g} n_i(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})', \qquad W = \sum_{i=1}^{g} (n_i - 1)S_i,$$

$g$ is the number of groups, and $\bar{X}$ is the overall mean. At most $g - 1$ dimensions are needed.

[Figure: two scatterplots of X2 against X1 showing three groups, in the original variables and in canonical coordinates. E.g., for g = 3, 1 or 2 dimensions are needed.]
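A minimal R sketch of this computation, assuming a hypothetical data frame df with numeric columns X1, X2 and a factor column group:

    # Between-group (B) and within-group (W) sums of squares, as defined above.
    X    <- as.matrix(df[, c("X1", "X2")])
    g    <- df$group
    xbar <- colMeans(X)

    B <- Reduce(`+`, lapply(levels(g), function(k) {
      nk <- sum(g == k)
      dk <- colMeans(X[g == k, , drop = FALSE]) - xbar
      nk * tcrossprod(dk)           # n_k (xbar_k - xbar)(xbar_k - xbar)'
    }))
    W <- Reduce(`+`, lapply(levels(g), function(k) {
      (sum(g == k) - 1) * cov(X[g == k, , drop = FALSE])
    }))

    ev <- eigen(solve(W) %*% B)     # eigenvectors of W^{-1} B
    scores <- X %*% Re(ev$vectors[, 1, drop = FALSE])   # first canonical coordinate

In practice MASS::lda returns equivalent coordinates as its LD scores, up to scaling.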

QUADRATIC DISCRIMINANT ANALYSIS

Suppose that the variance-covariance matrices are different for each group; then the rule becomes:

Allocate a new observation $X_0$ to group 1 if

$$-\frac{1}{2}X_0'(S_1^{-1} - S_2^{-1})X_0 + (\bar{X}_1'S_1^{-1} - \bar{X}_2'S_2^{-1})X_0 - \frac{1}{2}\left(\ln\frac{|S_1|}{|S_2|} + \bar{X}_1'S_1^{-1}\bar{X}_1 - \bar{X}_2'S_2^{-1}\bar{X}_2\right) \geq \ln\frac{p_2}{p_1}$$

else allocate to group 2.
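QDA is available in R as MASS::qda, which estimates a separate variance-covariance matrix per group; a sketch on the blue crabs (the same assumed data as the earlier lda sketch):

    library(MASS)
    blue <- subset(crabs, sp == "B")
    fit <- qda(sex ~ FL + RW, data = blue)   # one S_j per group
    predict(fit, data.frame(FL = 15, RW = 12))$class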


DISCRIMINANT FUNCTIONS

Allocate the new observation to the group with the largest value of the discriminant function:

$$c_j = -\frac{1}{2}X_0'S_j^{-1}X_0 + \bar{X}_j'S_j^{-1}X_0 - \frac{1}{2}\ln(|S_j|) - \frac{1}{2}\bar{X}_j'S_j^{-1}\bar{X}_j + \ln(p_j), \qquad j = 1, 2$$

RELATIONSHIP BETWEEN LDA AND REGRESSION

A matrix of variables is used to predict a categorical response variable:

$$X_{n \times p} = \begin{pmatrix} X_{11} & X_{12} & \dots & X_{1p} \\ X_{21} & X_{22} & \dots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \dots & X_{np} \end{pmatrix}, \qquad Y = \begin{pmatrix} 1 \\ \vdots \\ 1 \\ 2 \\ \vdots \\ 2 \end{pmatrix}$$


LINEAR REGRESSION

$$Y = b_0 + b_1X_1 + \dots + b_pX_p$$

[Figure: the fitted regression line of Y on X1, with the two classes plotted at Y = 1 and Y = 2.]

Problems: the fitted values are not constrained to the two class labels, so they do not directly give class probabilities.

LOGISTIC REGRESSION

The logistic regression model is

$$p_k(X_0) = \begin{cases} \dfrac{\exp(b_{k0} + \sum_{j=1}^{p} b_{kj}X_{0j})}{1 + \sum_{l=1}^{g-1}\exp(b_{l0} + \sum_{j=1}^{p} b_{lj}X_{0j})} & k = 1, \dots, g-1 \\[2ex] \dfrac{1}{1 + \sum_{l=1}^{g-1}\exp(b_{l0} + \sum_{j=1}^{p} b_{lj}X_{0j})} & k = g \end{cases}$$

and the classification rule would be to allocate to the group with the largest value.
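A sketch of fitting this model in R with nnet::multinom, assuming a data frame olive with a factor region and the fatty-acid measurements as predictors (the same data used in the tree example below):

    library(nnet)
    fit <- multinom(region ~ ., data = olive)
    probs <- predict(fit, newdata = olive, type = "probs")  # fitted p_k(x)
    pred  <- predict(fit, newdata = olive, type = "class")  # allocate to the largest p_k
    table(pred, olive$region)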


CLASSIFICATION TREES

The tree algorithm generates classification rules by sequentially making binary splits on the data. Splits are made on individual variables. On each variable the values are sorted, and splits between each pair of values are examined for quality using a criterion function. For the cases to the left of the split, the criterion measures the purity, the proportion which are in each class, and similarly for the cases to the right of the split. A common criterion is entropy, which for two classes is computed as:

$$-p_0 \log p_0 - p_1 \log p_1$$

where $p_0 = N_0/N$ and $p_1 = N_1/N = 1 - p_0$ are the relative proportions of cases in classes 0 and 1.

This is lowest, 0, if either $N_0$ or $N_1$ is 0. A perfect split has pure groups on each side (bucket): all class 0 on the left and all class 1 on the right. To measure the quality of a split we need to measure the impurity in each bucket:

$$p^L(-p_0^L \log p_0^L - p_1^L \log p_1^L) + p^R(-p_0^R \log p_0^R - p_1^R \log p_1^R)$$

where $p^L, p^R$ are the proportions of cases in the left and right buckets, respectively. This is a weighted average of the impurity, as measured by entropy, in each bucket.
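A minimal R sketch of this impurity calculation (the function names entropy and split_impurity are ours, for illustration):

    # Two-class entropy: -p log p - (1 - p) log(1 - p), taken as 0 at p = 0 or 1.
    entropy <- function(p)
      ifelse(p <= 0 | p >= 1, 0, -p * log(p) - (1 - p) * log(1 - p))

    # Weighted impurity of splitting numeric x at `cut`, for a two-class vector cls.
    split_impurity <- function(x, cls, cut) {
      ref   <- unique(cls)[1]               # reference class
      left  <- cls[x <  cut]
      right <- cls[x >= cut]
      pL    <- length(left) / length(cls)   # proportion in the left bucket
      pL * entropy(mean(left == ref)) + (1 - pL) * entropy(mean(right == ref))
    }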


ALGORITHM

1. For each variable, and for each possible split, calculate the impurity measure.

2. Pick the split with the smallest impurity, and divide the data into two using this split. Each split is called a node on the resulting tree.

3. On each subset, repeat steps 1-2.

4. Splitting a node is controlled by the number of cases in the subset at that node, and also the amount of impurity at the node. Stop splitting when either of these gets below a tolerance.

DEVIANCE: MEASURING FIT

The deviance at a node $i$ is defined to be:

$$D_i = -\sum_{k=1}^{g} p_{ik}\log p_{ik}$$

and thus the deviance for the classifier is $\sum_{i=1}^{T} D_i$, where $T$ is the number of terminal nodes.


EXAMPLE: OLIVE OILS, 3 REGIONS, ALL VARIABLES

> library(rpart)
> olive.rp <- rpart(region ~ ., data = olive)
> olive.rp
n= 572

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 572 249 1 (0.5646853 0.1713287 0.2639860)
  2) eicosenoic>=6.5 323   0 1 (1.0000000 0.0000000 0.0000000) *
  3) eicosenoic< 6.5 249  98 3 (0.0000000 0.3935743 0.6064257)
    6) linoleic>=1053.5  98   0 2 (0.0000000 1.0000000 0.0000000) *
    7) linoleic< 1053.5 151   0 3 (0.0000000 0.0000000 1.0000000) *

node) is the arbitrary numbering of nodes from top to bottom of the tree
split is the rule for the split from that node
n is the number of cases at this node
loss is the number of cases misclassified at this node
yval is the predicted value for all cases at this node
(yprob) are the proportions in each class

EXAMPLE: OLIVE OILS, 3 REGIONS, ALL VARIABLES

The first split is on eicosenoic acid and the next split is on linoleic acid. It only uses these two variables! And there is no misclassification!


A CLOSER LOOK...

Consider the data x = (1, 2, 3, 4, 5, 6, 7, 8) and class = (1, 1, 1, 1, 2, 2, 2, 2); then all possible splits, written as (class 1, class 2) counts on each side, would be:

    Left     Right
    (1,0)    (3,4)
    (2,0)    (2,4)
    (3,0)    (1,4)
    (4,0)    (0,4)
    (4,1)    (0,3)
    (4,2)    (0,2)
    (4,3)    (0,1)

Calculate the impurity (defined on the earlier slide) for each possible split. The lowest value is between points 4 and 5; that's the split to use.

[Figure: entropy of each candidate split plotted against x; the minimum is between points 4 and 5.]
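Applying the split_impurity sketch from the impurity slide to this toy example reproduces the choice:

    x   <- 1:8
    cls <- c(1, 1, 1, 1, 2, 2, 2, 2)
    cuts <- (head(x, -1) + tail(x, -1)) / 2   # midpoints between adjacent values
    round(sapply(cuts, function(cc) split_impurity(x, cls, cc)), 3)
    # The impurity is 0 at the cut between points 4 and 5: that's the split to use.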

How does it work for a nonsensical class structure? Consider the data:

x = (1, 2, 3, 4, 5, 6, 7, 8), class = (1, 2, 1, 2, 1, 2, 1, 2)

[Figure: entropy of each candidate split plotted against x.]

The split chosen will most likely be the first one, between points 1 and 2.


HOW DOES IT WORK ON THE OLIVE OILS DATA?

In practice the impurity functions can be quite noisy. The next two sets of plots show the impurity measure calculated to separate (1) the southern oils from the other two regions, and (2) the northern from the Sardinian oils.

Eicosenoic acid is the variable with the lowest impurity overall, 0. It would be chosen as the most important variable at the top of the tree.

Linoleic acid is the variable with the lowest impurity, 0, when region 1 is removed. It would be chosen as the second split variable.

[Figure: entropy impurity curves for separating region 1 (southern) from the other two regions, one panel per fatty acid, with the minimum impurity marked: palmitic 0.3, palmitoleic 0.4, stearic 0.67, oleic 0.49, linoleic 0.63, linolenic 0.51, arachidic 0.56, eicosenoic 0.]


[Figure: entropy impurity curves for separating region 2 (Sardinian) from region 3 (northern), one panel per fatty acid, with the minimum impurity marked: palmitic 0.62, palmitoleic 0.56, stearic 0.59, oleic 0.04, linoleic 0, linolenic 0.51, arachidic 0.4, eicosenoic 0.54.]

STRENGTHS AND WEAKNESSES

The solutions are usually simple and easy to implement. There are few probabilistic assumptions underlying trees of the kind that complicate other solutions. For example, because LDA assumes that the variance-covariance matrices of the groups are equal, it doesn't see the "perfect" split of the northern and Sardinian oils on linoleic acid.

The fitting algorithm is greedy, in the sense that the first best fit will be used at each split, but a better final result might be obtained from a less optimal earlier step.


STRENGTHS AND WEAKNESSES

The additive model approach, splitting on one variable at a time, is too limited for problems where the separation between groups is due to combinations of variables. But because it works variable-by-variable it can handle missing values, using the complete data on each variable. Trees can also accommodate complex data, where some variables are continuous and some are categorical.

Because it is an algorithmic method it can be easy to overfit (overtrain) the data. The tree will then not have inferential power: it will have worse error on new data. Split the current data into training and test sets, use the training subset to build the tree, and the test set to estimate the error.

TREES DON'T DO SO WELL IN THE PRESENCE OF COVARIANCE BETWEEN VARIABLES


OTHER COMMON CLASSIFICATION METHODS

Random forests - fit many trees to samples of the data, and subsets of the variables, and combine the predictions.

Neural networks - a mixture of logistic regression models.

Support vector machines - find gaps between the groups and fit a hyperplane to the points bordering the gaps.

NEURAL NETWORK

Feed-forward neural networks (FFNN) were developed from the concept that combining small components is a way to build a model from predictors to response. They actually generalize logistic regression. A simple network model is represented by:

$$y = f(x) = \phi\left(\alpha + \sum_{h=1}^{s} w_h\,\phi\left(\alpha_h + \sum_{i=1}^{p} w_{ih}x_i\right)\right)$$

where $x$ is the vector of explanatory variable values, $y$ is the target value, $p$ is the number of variables, $s$ is the number of nodes in the single hidden layer and $\phi$ is a fixed function, usually a linear or logistic function. This model has a single hidden layer, and univariate output values.


$$y = f(x) = \phi\left(\alpha + \sum_{h=1}^{s} w_h\,\phi\left(\alpha_h + \sum_{i=1}^{p} w_{ih}x_i\right)\right)$$

The network is fit by minimizing the squared error $\sum_{i=1}^{n}(y_i - f(x_i))^2$.
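A sketch of fitting a single-hidden-layer network in R with the nnet package, again assuming the olive data frame; size corresponds to s. Note that for a factor response nnet fits a classification network (softmax output for more than two classes) rather than the plain squared-error criterion shown above:

    library(nnet)
    set.seed(407)                     # the weights are initialized randomly
    fit <- nnet(region ~ ., data = olive, size = 4, decay = 0.01, maxit = 500)
    table(predict(fit, olive, type = "class"), olive$region)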

SUPPORT VECTOR MACHINES

The algorithm finds a hyperplane that maximizes the margin (gap) between the two classes. The points on the edge of the margin are called support vectors, and are used to define the hyperplane.

[Figure: two classes separated by the boundary hyperplane $w \cdot x + b = 0$, with the support vectors marked on the edges of the margin.]

$$w = \sum_{i=1}^{N_S} \alpha_i y_i x_i$$

where $N_S$ is the number of support vectors.
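A sketch with the e1071 package (one common SVM implementation in R; the slides do not name one), again assuming the olive data frame:

    library(e1071)
    fit <- svm(region ~ ., data = olive, kernel = "linear")
    length(fit$index)                 # number of support vectors, N_S
    table(predict(fit, olive), olive$region)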


This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.