DISCRIMINANT ANALYSIS
Statistics 407, ISU
WHAT IS?
Supervised classification, alternatively called discriminant analysis, includes multivariate techniques for finding ______________________________________, and using this rule to classify new observations. The process starts with a training sample, that is, the full data set with known classes. Typically the variables that will be used to generate the classification rule are easy/cheap to measure, but the class is more difficult to measure. It is important to be able to classify new observations using variables that are easy to measure.
VISUAL METHODS FOR DISCRIMINATION
Use ____________ to code the class/group information in the plots. Then use the full range of plotting methods described in the section on graphics. Look for separations of the points into the color/glyph groupings. Determine what variables are potentially good separators.
EXAMPLE: AUSTRALIAN CRABS
This data is from a study of Australian crabs. There are 5 physical measurements recorded on 2 species (blue and orange) and both sexes of each species, giving 4 groups. This is a scatterplot of the blue species with the two sexes identified.

[Figure: scatterplot of Rear Width vs Frontal Lobe for the blue species, with males and females marked]

Where would you draw the boundary for this data?
LINEAR DISCRIMINANT ANALYSIS
LDA is based on the assumption that the data comes from a ________________________ with equal variance-covariance matrices. Comparing the density functions reduces the rule to:

Allocate a new observation, $X_0$, to group 1 if
$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}X_0 - \frac{1}{2}(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}(\bar{X}_1 + \bar{X}_2) \geq 0$$
else allocate to group 2.
LDA RULE FOR P=1, G=2
The LDA rule results from assuming that data for each class comes from a MVN with different means but the same variance-covariance matrix. The boundary between the two groups is _________ ___________________.

[Figure: densities of group 1 and group 2 overlaid on the samples; the LDA boundary is where the two densities intersect]
EXAMPLE
$\bar{X}_{Male} = (14.8,\ 11.7)'$, $\bar{X}_{Fem} = (13.3,\ 12.1)'$, $n_{Male} = 50$, $n_{Fem} = 50$

$$S_{Male} = \begin{pmatrix} 10.3 & 6.5 \\ 6.5 & 4.5 \end{pmatrix}, \qquad S_{Fem} = \begin{pmatrix} 6.9 & 6.3 \\ 6.3 & 5.9 \end{pmatrix}$$

$$S_{pooled} = \frac{(n_1 - 1)S_1}{(n_1 - 1) + (n_2 - 1)} + \frac{(n_2 - 1)S_2}{(n_1 - 1) + (n_2 - 1)} = \begin{pmatrix} 8.6 & 6.4 \\ 6.4 & 5.2 \end{pmatrix}, \qquad S_{pooled}^{-1} = \begin{pmatrix} 1.47 & -1.81 \\ -1.81 & 2.42 \end{pmatrix}$$

$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1} = (1.5\ \ {-0.4})\begin{pmatrix} 1.47 & -1.81 \\ -1.81 & 2.42 \end{pmatrix} = (3.01\ \ {-3.86})$$

This forms the coordinates of a vector giving the __________________________.
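A minimal R sketch of this calculation, assuming the crabs data from the MASS package (FL = frontal lobe, RW = rear width); the slide's numbers appear to come from the blue species, so small rounding differences are possible:

library(MASS)                      # crabs data: sp (species), sex, FL, RW, ...
blue <- subset(crabs, sp == "B")[, c("sex", "FL", "RW")]
xbar <- lapply(split(blue[, -1], blue$sex), colMeans)   # group means
S    <- lapply(split(blue[, -1], blue$sex), cov)        # group covariances
n    <- table(blue$sex)
Sp   <- ((n["M"] - 1) * S$M + (n["F"] - 1) * S$F) / (sum(n) - 2)  # pooled
drop((xbar$M - xbar$F) %*% solve(Sp))   # direction vector, cf. (3.01, -3.86)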
EXAMPLE
Direction of maximum separation:
EXAMPLE
Data projected into the ___________________. Boundary between groups is at _____.

[Figure: histograms and dotplots of LD1, ranging from -10 to 10, separately for Females and Males]
EXAMPLE
The resulting rule is:

Classify the new observation, $X_0$, as Male if
$$[3.01\ \ {-3.86}]\,X_0 + 2.93 \geq 0$$
else allocate as Female.
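The same rule can be obtained with lda() from MASS; a hedged sketch, again assuming the crabs data (the coefficients match the slide's direction only up to scaling, since lda() normalizes LD1):

library(MASS)
blue <- subset(crabs, sp == "B")
fit  <- lda(sex ~ FL + RW, data = blue)           # linear discriminant analysis
fit$scaling                                       # LD1 coefficients
predict(fit, data.frame(FL = 15, RW = 12))$class  # classify a new observation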
INCORPORATING PRIORS
$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}X_0 - \frac{1}{2}(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}(\bar{X}_1 + \bar{X}_2) \geq \ln\left(\frac{p_2}{p_1}\right)$$
where $p_1$, $p_2$ are the prior probabilities for group 1 and group 2. It shifts the boundary _______ from the group with the highest prior.
MISCLASSIFICATION TABLE
Predict the class of the training sample. Tabulate against the true class.

                       Predicted membership
Actual membership      Group 1                  Group 2
Group 1                n1C                      n1M = n1 - n1C
Group 2                n2M = n2 - n2C           n2C

The ________________ is
$$\frac{n_{1M} + n_{2M}}{n_1 + n_2}$$
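A sketch of this table in R, continuing the hypothetical crabs fit from above (the counts on the next slide come from the lecture's own fit, so they may differ slightly):

library(MASS)
blue <- subset(crabs, sp == "B")
fit  <- lda(sex ~ FL + RW, data = blue)
tab  <- table(Actual = blue$sex, Predicted = predict(fit)$class)
tab
1 - sum(diag(tab)) / sum(tab)   # (n1M + n2M)/(n1 + n2)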
EXAMPLE

                Predicted
Actual          Male    Female
Male            45      5
Female          1       49

APR = ________
DISCRIMINANT FUNCTIONS
The LDA rule can be divided into parts:
$$c_j = \bar{X}_j'S_{pooled}^{-1}X_0 - \frac{1}{2}\bar{X}_j'S_{pooled}^{-1}\bar{X}_j + \ln(p_j) \qquad j = 1, 2$$
And the rule is to allocate the new observation to the group with the _________ value of the discriminant function.
CLOSEST MEAN?
The LDA rule corresponds to allocating a new observation to the group that has the ___________ squared Mahalanobis distance between the new observation and the group mean:
$$d_j = \frac{1}{2}(X_0 - \bar{X}_j)'S_{pooled}^{-1}(X_0 - \bar{X}_j) - \ln(p_j) \qquad j = 1, 2$$
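A small sketch of this rule using the built-in mahalanobis() function (which returns the squared distance), plugging in the slide's crab means and pooled covariance; the test point x0 is made up:

d_j <- function(x0, m, Sp, p) 0.5 * mahalanobis(x0, m, Sp) - log(p)
m1 <- c(14.8, 11.7); m2 <- c(13.3, 12.1)           # group means (Male, Female)
Sp <- matrix(c(8.6, 6.4, 6.4, 5.2), 2, 2)          # pooled covariance
x0 <- c(15, 12)                                    # hypothetical new crab
which.min(c(d_j(x0, m1, Sp, 0.5), d_j(x0, m2, Sp, 0.5)))   # smallest d_j wins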
MORE THAN 2 GROUPS
There are now $g$ groups, and the rule is the ____________: allocate to the group with the largest value of the discriminant function
$$c_j = \bar{X}_j'S_{pooled}^{-1}X_0 - \frac{1}{2}\bar{X}_j'S_{pooled}^{-1}\bar{X}_j + \ln(p_j) \qquad j = 1, \ldots, g$$
CANONICAL COORDINATES
The low-dimensional space which best separates the groups is given by the _________ of $W^{-1}B$, where
$$B = \sum_{i=1}^{g} n_i(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})', \qquad W = \sum_{i=1}^{g} (n_i - 1)S_i$$
$g$ is the number of groups, and $\bar{X}$ is the overall mean. At most _______ dimensions are needed.

[Figure: two scatterplots of X2 vs X1 showing three groups. E.g., for g = 3, 1 or 2 dimensions are needed]
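A sketch of this computation, assuming a numeric data matrix X and a grouping factor group (the function name is hypothetical):

canonical <- function(X, group) {
  Xbar   <- colMeans(X)
  groups <- split(as.data.frame(X), group)
  B <- Reduce(`+`, lapply(groups, function(Xi) {
    d <- colMeans(Xi) - Xbar
    nrow(Xi) * tcrossprod(d)          # n_i (xbar_i - xbar)(xbar_i - xbar)'
  }))
  W <- Reduce(`+`, lapply(groups, function(Xi) (nrow(Xi) - 1) * cov(Xi)))
  eigen(solve(W) %*% B)               # eigenvectors of W^{-1} B
}
# e.g. canonical(iris[, 1:4], iris$Species): at most g - 1 = 2 useful dimensions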
QUADRATIC DISCRIMINANT ANALYSIS
Suppose that the variance-covariances are ___________ for each group; then the rule becomes:

Allocate a new observation, $X_0$, to group 1 if
$$-\frac{1}{2}X_0'(S_1^{-1} - S_2^{-1})X_0 + (\bar{X}_1'S_1^{-1} - \bar{X}_2'S_2^{-1})X_0 - \frac{1}{2}\left(\ln\frac{|S_1|}{|S_2|} + \bar{X}_1'S_1^{-1}\bar{X}_1 - \bar{X}_2'S_2^{-1}\bar{X}_2\right) \geq \ln\frac{p_2}{p_1}$$
else allocate to group 2.
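For comparison with the LDA example, a hedged sketch of QDA via qda() from MASS, again assuming the crabs data:

library(MASS)
blue <- subset(crabs, sp == "B")
qfit <- qda(sex ~ FL + RW, data = blue)   # separate covariance for each group
predict(qfit, data.frame(FL = 15, RW = 12))$class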
DISCRIMINANT FUNCTIONS
Allocate the new observation to the group with the _______ value of the discriminant function:
$$c_j = -\frac{1}{2}X_0'S_j^{-1}X_0 + \bar{X}_j'S_j^{-1}X_0 - \frac{1}{2}\ln(|S_j|) - \frac{1}{2}\bar{X}_j'S_j^{-1}\bar{X}_j + \ln(p_j) \qquad j = 1, 2$$
RELATIONSHIP BETWEEN LDA AND REGRESSION
A matrix of variables is used to predict a ___________ __________:
$$X_{n \times p} = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1p} \\ X_{21} & X_{22} & \ldots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \ldots & X_{np} \end{pmatrix}, \qquad Y = \begin{pmatrix} 1 \\ \vdots \\ 1 \\ 2 \\ \vdots \\ 2 \end{pmatrix}$$
LINEAR REGRESSION
$$Y = b_0 + b_1X_1 + \ldots + b_pX_p$$

[Figure: Y vs X1 with the two classes coded 1 and 2]

Problems: ________ ________________
LOGISTIC REGRESSION
The logistic regression model is
$$p_k(X_0) = \begin{cases} \dfrac{\exp(b_{k0} + \sum_{j=1}^{p} b_{kj}X_{0j})}{1 + \sum_{l=1}^{g-1}\exp(b_{l0} + \sum_{j=1}^{p} b_{lj}X_{0j})} & k = 1, \ldots, g-1 \\[2ex] \dfrac{1}{1 + \sum_{l=1}^{g-1}\exp(b_{l0} + \sum_{j=1}^{p} b_{lj}X_{0j})} & k = g \end{cases}$$
And the classification rule would be to allocate to the group with the ____________ value.
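This g-class model is what multinom() in the nnet package fits; a minimal sketch using the built-in iris data (g = 3) as a stand-in:

library(nnet)
fit <- multinom(Species ~ ., data = iris)
predict(fit, iris[1, ], type = "probs")   # the p_k(X0) for each class
predict(fit, iris[1, ])                   # allocate to the largest p_k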
CLASSIFICATION TREES
The tree algorithm generates classification rules by sequentially doing ___________ on the data. Splits are made on individual variables. On each variable the values are sorted, and splits between each pair of values are examined for quality of the split using a criterion function. Of the cases to the left of the split, the criterion compares the purity, the proportion which are in each class, and similarly for cases to the right of the split. A common criterion is entropy, which for two classes would be computed as:
$$-p_0 \log p_0 - p_1 \log p_1$$
where $p_0 = \frac{N_0}{N}$, $p_1 = \frac{N_1}{N} = 1 - p_0$ are the relative proportions of cases in classes 0, 1.
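A direct translation of this formula into R (with the usual convention that 0 log 0 = 0):

entropy <- function(p0) {
  p <- c(p0, 1 - p0)
  p <- p[p > 0]           # drop zero proportions: 0 * log(0) is taken as 0
  -sum(p * log(p))
}
entropy(0.5)   # most impure: 0.693
entropy(0)     # pure bucket: 0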
This is lowest if either $N_0$ or $N_1$ is ___. A ___ split has ____ groups to each side (bucket), all class 0 on the left and all class 1 to the right. To measure the quality of a split we need to measure the impurity in each bucket:
$$p_L(-p_0^L \log p_0^L - p_1^L \log p_1^L) + p_R(-p_0^R \log p_0^R - p_1^R \log p_1^R)$$
where $p_L$, $p_R$ are the proportions of cases in the left, right buckets, respectively. This is a weighted average of the impurity, as measured by entropy, in each bucket.
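The weighted average can be coded directly; a sketch where left and right are the vectors of class labels falling in each bucket:

split_impurity <- function(left, right) {
  ent <- function(y) {
    p <- table(y) / length(y)
    p <- p[p > 0]
    -sum(p * log(p))
  }
  n <- length(left) + length(right)
  (length(left) / n) * ent(left) + (length(right) / n) * ent(right)
}
split_impurity(c(1, 1, 1, 1), c(2, 2, 2, 2))   # perfect split: 0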
-
7/30/2019 Discrim Class
13/21
ALGORITHM
1. For each ________, and for each possible ______, calculate the impurity measure.
2. Pick the split with the smallest impurity; ______ the data into two using this split. Each split is called a ____ on the resulting tree.
3. On each subset, repeat steps 1-2.
4. Splitting a node is controlled by the number of cases in the subset at that node, and also the amount of impurity at the node. Stop splitting when either of these gets below a tolerance, as in the sketch below.
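In rpart these stopping rules are set through rpart.control(); a sketch on the built-in iris data (the parameter values shown are just the defaults):

library(rpart)
ctrl <- rpart.control(minsplit = 20,   # don't split nodes with < 20 cases
                      cp = 0.01)       # minimum improvement a split must give
fit  <- rpart(Species ~ ., data = iris, method = "class", control = ctrl)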
DEVIANCE: MEASURING FIT
The ________ at node $i$ is defined to be:
$$D_i = -\sum_{k=1}^{g} p_{ik} \log p_{ik}$$
and thus the deviance for the classifier is $\sum_{i=1}^{T} D_i$, where $T$ = number of terminal nodes.
EXAMPLE: OLIVE OILS 3 REGIONS, ALL VARIABLES
> library(rpart)
> olive.rp <- rpart(region ~ ., data = olive, method = "class")  # call reconstructed; object and variable names assumed
> olive.rp
n= 572
node), split, n, loss, yval, (yprob)
      * denotes terminal node
1) root 572 249 1 (0.5646853 0.1713287 0.2639860)
  2) eicosenoic>=6.5 323 0 1 (1.0000000 0.0000000 0.0000000) *
  3) eicosenoic< 6.5 249 98 3 (0.0000000 0.3935743 0.6064257)
    6) linoleic>=1053.5 98 0 2 (0.0000000 1.0000000 0.0000000) *
    7) linoleic< 1053.5 151 0 3 (0.0000000 0.0000000 1.0000000) *

node) is the arbitrary numbering of nodes from top to bottom of the tree
split is the ____ for the split from that node
n is the _______ of cases at this node
loss is the number of cases __________ at this node
yval is the _________ value for all cases at this node
(yprob) are the __________ in each class
EXAMPLE: OLIVE OILS 3 REGIONS, ALL VARIABLES
The first split is on __________ acid and the next split is on __________ acid. It only uses these ____ variables! And there is __________!
A CLOSER LOOK.....
Consider the data
x = (1, 2, 3, 4, 5, 6, 7, 8)
class = (1, 1, 1, 1, 2, 2, 2, 2)
then all possible splits would be

Left      Right
(1,0)     (3,4)
(2,0)     (2,4)
(3,0)     (1,4)
(4,0)     (0,4)
(4,1)     (0,3)
(4,2)     (0,2)
(4,3)     (0,1)

[Figure: entropy of each possible split plotted over x]

Calculate the impurity (defined above) for each possible split. The lowest value is between points 4 and 5. That's the split to use, as the sketch below confirms.
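Checking this by brute force in R over the seven candidate cut points:

x   <- 1:8
cls <- c(1, 1, 1, 1, 2, 2, 2, 2)
ent <- function(y) { p <- table(y) / length(y); p <- p[p > 0]; -sum(p * log(p)) }
split_imp <- function(cut) {
  L <- cls[x < cut]; R <- cls[x >= cut]
  (length(L) * ent(L) + length(R) * ent(R)) / length(x)   # weighted impurity
}
round(sapply(1:7 + 0.5, split_imp), 3)   # minimum (0) at the cut between 4 and 5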
How does it work for a nonsensical class structure? Consider data:
x = (1, 2, 3, 4, 5, 6, 7, 8)
class = (1, 2, 1, 2, 1, 2, 1, 2)

[Figure: entropy of each possible split plotted over x]

The split chosen will most likely be the first one, between points 1 and 2.
HOW DOES IT WORK ON THE OLIVE OILS DATA?
In practice the impurity functions can be quite _____. The next two sets of plots show the impurity measure calculated to separate (1) the southern oils from the other two regions, and (2) northern from Sardinian oils.
________ acid is the variable with the lowest impurity overall, 0. It would be chosen as the most important variable at the top of the tree.
________ acid is the variable with the lowest impurity, 0, when region 1 is removed. It would be chosen as the second split variable.
[Figure: entropy curves for separating the southern oils (region 1) from the other two regions; minimum impurity by variable: palmitic 0.3, palmitoleic 0.4, stearic 0.67, oleic 0.49, linoleic 0.63, linolenic 0.51, arachidic 0.56, eicosenoic 0]
[Figure: entropy curves for separating northern from Sardinian oils (region 1 removed); minimum impurity by variable: palmitic 0.62, palmitoleic 0.56, stearic 0.59, oleic 0.04, linoleic 0, linolenic 0.51, arachidic 0.4, eicosenoic 0.54]
STRENGTHS AND WEAKNESSES
The solutions are usually ________, and easy to implement. There are few probabilistic assumptions underlying trees, of the kind that complicate the solution in other methods. For example, because LDA assumes that the variance-covariances of the groups are equal, it doesn't see the "perfect" split of northern and Sardinian oils in linoleic acid.
The fitting ___________________ in the sense that the first best fit will be used at each split, but a better final result might be obtained by a less optimal previous step.
STRENGTHS AND WEAKNESSES
The additive model approach, _______________, is too limited for problems where separation between groups is due to combinations of variables. But because it works variable-by-variable it can ______________________________, using complete data on each variable. Trees can also accommodate complex data, where some variables are continuous and some are categorical.
Because it is an algorithmic method it can be easy to ___________ (_______) the data. The tree will then not have inferential power: it will have worse error on new data. Split the current data into training and test sets, use the training subset to build the tree, and the test set to estimate the error.
TREES DON'T DO SO WELL IN THE PRESENCE OF COVARIANCE BETWEEN VARIABLES
OTHER COMMON CLASSIFICATION METHODS
_______________ - fit many trees to samples of the data, and subsets of the variables, and combine the predictions.
_______________ - a mixture of logistic regression models.
____________________ - find gaps between groups and fit a hyperplane to the points bordering the gaps.
NEURAL NETWORK
Feed-forward neural networks (FFNN) were developed from this concept: that combining small components is a way to build a model from predictors to response. They actually generalize ___________________. A simple network model is represented by:
$$y = f(x) = \phi\left(\alpha + \sum_{h=1}^{s} w_h\,\phi\left(\alpha_h + \sum_{i=1}^{p} w_{ih}x_i\right)\right)$$
where $x$ is the vector of explanatory variable values, $y$ is the target value, $p$ is the number of variables, $s$ is the number of nodes in the single hidden layer, and $\phi$ is a fixed function, usually a linear or logistic function. This model has a single hidden layer and univariate output values.
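A single-hidden-layer network of exactly this form can be fit with nnet(); a minimal sketch on the built-in iris data, with s = 3 hidden nodes (size, decay, and maxit values are illustrative):

library(nnet)
fit <- nnet(Species ~ ., data = iris, size = 3,   # s = 3 hidden nodes
            decay = 0.01, maxit = 200)            # weight decay, iterations
predict(fit, iris[1, ], type = "class")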
$$y = f(x) = \phi\left(\alpha + \sum_{h=1}^{s} w_h\,\phi\left(\alpha_h + \sum_{i=1}^{p} w_{ih}x_i\right)\right)$$
The network is fit by minimizing a squared error
$$\sum_{i=1}^{n} (y_i - f(x_i))^2$$
SUPPORT VECTOR MACHINES
The algorithm finds a hyperplane that maximizes the ______________ (gap) between the two classes. The points on the edge of the margin are called _____________, and are used to define the hyperplane:
$$w = \sum_{i=1}^{N_S} \alpha_i y_i x_i$$
where $N_S$ is the number of support vectors.

[Figure: two classes separated by the boundary $w \cdot x + b = 0$, with the support vectors marked]
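A hedged sketch using svm() from the e1071 package on the built-in iris data; the fitted object exposes the support vectors that define the hyperplane:

library(e1071)
fit <- svm(Species ~ ., data = iris, kernel = "linear")
fit$SV                     # the support vectors
predict(fit, iris[1, ])    # classify using the fitted hyperplane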
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.