DISCRIMINANT ANALYSIS
Statistics 407, ISU
WHAT IS?
Supervised classification, alternatively called discriminant analysis, includes multivariate techniques for finding ______________________________________, and using this rule to classify new observations. The process starts with a training sample, that is, the full data set with known classes. Typically the variables that will be used to generate the classification rule are easy/cheap to measure, but the class is more difficult to measure. It is important to be able to classify new observations using variables that are easy to measure.
VISUAL METHODS FOR DISCRIMINATION
Use ____________ to code the class/group information in the plots. Then use the full range of plotting methods described in the section on graphics. Look for separations of the points into the color/glyph groupings. Determine what variables are potentially good separators.
EXAMPLE: AUSTRALIAN CRABS
This data is from a study of Australian crabs. There are 5 physical measurements recorded on 2 species (blue and orange) and both sexes of each species, giving 4 groups. This is a scatterplot of the blue species with the two sexes identified.

[Figure: scatterplot of Rear Width vs Frontal Lobe for the blue species, with males and females marked]

Where would you draw the boundary for this data?
LINEAR DISCRIMINANT ANALYSIS
LDA is based on the assumption that the data comes from a ________________________ with equal variance-covariance matrices. Comparing the density functions reduces the rule to:

Allocate a new observation, $X_0$, to group 1 if
$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}X_0 - \frac{1}{2}(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}(\bar{X}_1 + \bar{X}_2) \geq 0$$
else allocate to group 2.
LDA RULE FOR P=1, G=2
The LDA rule results from assuming that data for each class comes from a MVN with different means but the same variance-covariance matrix. The boundary between the two groups is _________ ___________________.

[Figure: densities of group 1 and group 2 overlaid on the samples; the LDA boundary is where the two densities intersect]
EXAMPLE
$\bar{X}_{Male} = (14.8,\ 11.7)'$, $\bar{X}_{Fem} = (13.3,\ 12.1)'$, $n_{Male} = 50$, $n_{Fem} = 50$

$$S_{Male} = \begin{pmatrix} 10.3 & 6.5 \\ 6.5 & 4.5 \end{pmatrix}, \qquad S_{Fem} = \begin{pmatrix} 6.9 & 6.3 \\ 6.3 & 5.9 \end{pmatrix}$$

$$S_{pooled} = \frac{(n_1 - 1)S_1}{(n_1 - 1) + (n_2 - 1)} + \frac{(n_2 - 1)S_2}{(n_1 - 1) + (n_2 - 1)} = \begin{pmatrix} 8.6 & 6.4 \\ 6.4 & 5.2 \end{pmatrix}, \qquad S_{pooled}^{-1} = \begin{pmatrix} 1.47 & -1.81 \\ -1.81 & 2.42 \end{pmatrix}$$

$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1} = (1.5\ \ {-0.4})\begin{pmatrix} 1.47 & -1.81 \\ -1.81 & 2.42 \end{pmatrix} = (3.01\ \ {-3.86})$$

This forms the coordinates of a vector giving the __________________________.
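A minimal R sketch of this calculation, assuming the crabs data from the MASS package (FL = frontal lobe, RW = rear width); the slide's numbers appear to come from the blue species, so small rounding differences are possible:

library(MASS)                      # crabs data: sp (species), sex, FL, RW, ...
blue <- subset(crabs, sp == "B")[, c("sex", "FL", "RW")]
xbar <- lapply(split(blue[, -1], blue$sex), colMeans)   # group means
S    <- lapply(split(blue[, -1], blue$sex), cov)        # group covariances
n    <- table(blue$sex)
Sp   <- ((n["M"] - 1) * S$M + (n["F"] - 1) * S$F) / (sum(n) - 2)  # pooled
drop((xbar$M - xbar$F) %*% solve(Sp))   # direction vector, cf. (3.01, -3.86)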
EXAMPLE
Direction of maximum separation:
EXAMPLE
Data projected into the ___________________. Boundary between groups is at _____.

[Figure: histograms and dotplots of LD1, ranging from -10 to 10, separately for Females and Males]
EXAMPLE
The resulting rule is:

Classify the new observation, $X_0$, as Male if
$$[3.01\ \ {-3.86}]\,X_0 + 2.93 \geq 0$$
else allocate as Female.
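The same rule can be obtained with lda() from MASS; a hedged sketch, again assuming the crabs data (the coefficients match the slide's direction only up to scaling, since lda() normalizes LD1):

library(MASS)
blue <- subset(crabs, sp == "B")
fit  <- lda(sex ~ FL + RW, data = blue)           # linear discriminant analysis
fit$scaling                                       # LD1 coefficients
predict(fit, data.frame(FL = 15, RW = 12))$class  # classify a new observation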
INCORPORATING PRIORS
$$(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}X_0 - \frac{1}{2}(\bar{X}_1 - \bar{X}_2)'S_{pooled}^{-1}(\bar{X}_1 + \bar{X}_2) \geq \ln\left(\frac{p_2}{p_1}\right)$$
where $p_1$, $p_2$ are the prior probabilities for group 1 and group 2. It shifts the boundary _______ from the group with the highest prior.
MISCLASSIFICATION TABLE
Predict the class of the training sample. Tabulate against the true class.

                       Predicted membership
Actual membership      Group 1                  Group 2
Group 1                n1C                      n1M = n1 - n1C
Group 2                n2M = n2 - n2C           n2C

The ________________ is
$$\frac{n_{1M} + n_{2M}}{n_1 + n_2}$$
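A sketch of this table in R, continuing the hypothetical crabs fit from above (the counts on the next slide come from the lecture's own fit, so they may differ slightly):

library(MASS)
blue <- subset(crabs, sp == "B")
fit  <- lda(sex ~ FL + RW, data = blue)
tab  <- table(Actual = blue$sex, Predicted = predict(fit)$class)
tab
1 - sum(diag(tab)) / sum(tab)   # (n1M + n2M)/(n1 + n2)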
EXAMPLE

                Predicted
Actual          Male    Female
Male            45      5
Female          1       49

APR = ________
DISCRIMINANT FUNCTIONS
The LDA rule can be divided into parts:
$$c_j = \bar{X}_j'S_{pooled}^{-1}X_0 - \frac{1}{2}\bar{X}_j'S_{pooled}^{-1}\bar{X}_j + \ln(p_j) \qquad j = 1, 2$$
And the rule is to allocate the new observation to the group with the _________ value of the discriminant function.
CLOSEST MEAN?
The LDA rule corresponds to allocating a new observation to the group that has the ___________ squared Mahalanobis distance between the new observation and the group mean:
$$d_j = \frac{1}{2}(X_0 - \bar{X}_j)'S_{pooled}^{-1}(X_0 - \bar{X}_j) - \ln(p_j) \qquad j = 1, 2$$
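A small sketch of this rule using the built-in mahalanobis() function (which returns the squared distance), plugging in the slide's crab means and pooled covariance; the test point x0 is made up:

d_j <- function(x0, m, Sp, p) 0.5 * mahalanobis(x0, m, Sp) - log(p)
m1 <- c(14.8, 11.7); m2 <- c(13.3, 12.1)           # group means (Male, Female)
Sp <- matrix(c(8.6, 6.4, 6.4, 5.2), 2, 2)          # pooled covariance
x0 <- c(15, 12)                                    # hypothetical new crab
which.min(c(d_j(x0, m1, Sp, 0.5), d_j(x0, m2, Sp, 0.5)))   # smallest d_j wins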
MORE THAN 2 GROUPS
There are now $g$ groups, and the rule is the ____________: allocate to the group with the largest value of the discriminant function
$$c_j = \bar{X}_j'S_{pooled}^{-1}X_0 - \frac{1}{2}\bar{X}_j'S_{pooled}^{-1}\bar{X}_j + \ln(p_j) \qquad j = 1, \ldots, g$$
CANONICAL COORDINATES
The low-dimensional space which best separates the groups is given by the _________ of $W^{-1}B$, where
$$B = \sum_{i=1}^{g} n_i(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})', \qquad W = \sum_{i=1}^{g} (n_i - 1)S_i$$
$g$ is the number of groups, and $\bar{X}$ is the overall mean. At most _______ dimensions are needed.

[Figure: two scatterplots of X2 vs X1 showing three groups. E.g., for g = 3, 1 or 2 dimensions are needed]
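A sketch of this computation, assuming a numeric data matrix X and a grouping factor group (the function name is hypothetical):

canonical <- function(X, group) {
  Xbar   <- colMeans(X)
  groups <- split(as.data.frame(X), group)
  B <- Reduce(`+`, lapply(groups, function(Xi) {
    d <- colMeans(Xi) - Xbar
    nrow(Xi) * tcrossprod(d)          # n_i (xbar_i - xbar)(xbar_i - xbar)'
  }))
  W <- Reduce(`+`, lapply(groups, function(Xi) (nrow(Xi) - 1) * cov(Xi)))
  eigen(solve(W) %*% B)               # eigenvectors of W^{-1} B
}
# e.g. canonical(iris[, 1:4], iris$Species): at most g - 1 = 2 useful dimensions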
QUADRATIC DISCRIMINANT ANALYSIS
Suppose that the variance-covariances are ___________ for each group; then the rule becomes:

Allocate a new observation, $X_0$, to group 1 if
$$-\frac{1}{2}X_0'(S_1^{-1} - S_2^{-1})X_0 + (\bar{X}_1'S_1^{-1} - \bar{X}_2'S_2^{-1})X_0 - \frac{1}{2}\left(\ln\frac{|S_1|}{|S_2|} + \bar{X}_1'S_1^{-1}\bar{X}_1 - \bar{X}_2'S_2^{-1}\bar{X}_2\right) \geq \ln\frac{p_2}{p_1}$$
else allocate to group 2.
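For comparison with the LDA example, a hedged sketch of QDA via qda() from MASS, again assuming the crabs data:

library(MASS)
blue <- subset(crabs, sp == "B")
qfit <- qda(sex ~ FL + RW, data = blue)   # separate covariance for each group
predict(qfit, data.frame(FL = 15, RW = 12))$class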
DISCRIMINANT FUNCTIONS
Allocate the new observation to the group with the _______ value of the discriminant function:
$$c_j = -\frac{1}{2}X_0'S_j^{-1}X_0 + \bar{X}_j'S_j^{-1}X_0 - \frac{1}{2}\ln(|S_j|) - \frac{1}{2}\bar{X}_j'S_j^{-1}\bar{X}_j + \ln(p_j) \qquad j = 1, 2$$
RELATIONSHIP BETWEEN LDA AND REGRESSION
A matrix of variables is used to predict a ___________ __________:
$$X_{n \times p} = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1p} \\ X_{21} & X_{22} & \ldots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \ldots & X_{np} \end{pmatrix}, \qquad Y = \begin{pmatrix} 1 \\ \vdots \\ 1 \\ 2 \\ \vdots \\ 2 \end{pmatrix}$$
LINEAR REGRESSION
$$Y = b_0 + b_1X_1 + \ldots + b_pX_p$$

[Figure: Y vs X1 with the two classes coded 1 and 2]

Problems: ________ ________________
LOGISTIC REGRESSION
The logistic regression model is
$$p_k(X_0) = \begin{cases} \dfrac{\exp(b_{k0} + \sum_{j=1}^{p} b_{kj}X_{0j})}{1 + \sum_{l=1}^{g-1}\exp(b_{l0} + \sum_{j=1}^{p} b_{lj}X_{0j})} & k = 1, \ldots, g-1 \\[2ex] \dfrac{1}{1 + \sum_{l=1}^{g-1}\exp(b_{l0} + \sum_{j=1}^{p} b_{lj}X_{0j})} & k = g \end{cases}$$
And the classification rule would be to allocate to the group with the ____________ value.
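This g-class model is what multinom() in the nnet package fits; a minimal sketch using the built-in iris data (g = 3) as a stand-in:

library(nnet)
fit <- multinom(Species ~ ., data = iris)
predict(fit, iris[1, ], type = "probs")   # the p_k(X0) for each class
predict(fit, iris[1, ])                   # allocate to the largest p_k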
CLASSIFICATION TREES
The tree algorithm generates classification rules by sequentially doing ___________ on the data. Splits are made on individual variables. On each variable the values are sorted, and splits between each pair of values are examined for quality of the split using a criterion function. Of the cases to the left of the split, the criterion compares the purity, the proportion which are in each class, and similarly for cases to the right of the split. A common criterion is entropy, which for two classes would be computed as:
$$-p_0 \log p_0 - p_1 \log p_1$$
where $p_0 = \frac{N_0}{N}$, $p_1 = \frac{N_1}{N} = 1 - p_0$ are the relative proportions of cases in classes 0, 1.
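A direct translation of this formula into R (with the usual convention that 0 log 0 = 0):

entropy <- function(p0) {
  p <- c(p0, 1 - p0)
  p <- p[p > 0]           # drop zero proportions: 0 * log(0) is taken as 0
  -sum(p * log(p))
}
entropy(0.5)   # most impure: 0.693
entropy(0)     # pure bucket: 0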
This is lowest if either $N_0$ or $N_1$ is ___. A ___ split has ____ groups to each side (bucket), all class 0 on the left and all class 1 to the right. To measure the quality of a split we need to measure the impurity in each bucket:
$$p_L(-p_0^L \log p_0^L - p_1^L \log p_1^L) + p_R(-p_0^R \log p_0^R - p_1^R \log p_1^R)$$
where $p_L$, $p_R$ are the proportions of cases in the left, right buckets, respectively. This is a weighted average of the impurity, as measured by entropy, in each bucket.
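The weighted average can be coded directly; a sketch where left and right are the vectors of class labels falling in each bucket:

split_impurity <- function(left, right) {
  ent <- function(y) {
    p <- table(y) / length(y)
    p <- p[p > 0]
    -sum(p * log(p))
  }
  n <- length(left) + length(right)
  (length(left) / n) * ent(left) + (length(right) / n) * ent(right)
}
split_impurity(c(1, 1, 1, 1), c(2, 2, 2, 2))   # perfect split: 0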
-
7/30/2019 Discrim Class
13/21
ALGORITHM
1. For each ________, and for each possible ______, calculate the impurity measure.
2. Pick the split with the smallest impurity; ______ the data into two using this split. Each split is called a ____ on the resulting tree.
3. On each subset, repeat steps 1-2.
4. Splitting a node is controlled by the number of cases in the subset at that node, and also the amount of impurity at the node. Stop splitting when either of these gets below a tolerance, as in the sketch below.
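In rpart these stopping rules are set through rpart.control(); a sketch on the built-in iris data (the parameter values shown are just the defaults):

library(rpart)
ctrl <- rpart.control(minsplit = 20,   # don't split nodes with < 20 cases
                      cp = 0.01)       # minimum improvement a split must give
fit  <- rpart(Species ~ ., data = iris, method = "class", control = ctrl)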
DEVIANCE: MEASURING FIT
The ________ at node $i$ is defined to be:
$$D_i = -\sum_{k=1}^{g} p_{ik} \log p_{ik}$$
and thus the deviance for the classifier is $\sum_{i=1}^{T} D_i$, where $T$ = number of terminal nodes.
EXAMPLE: OLIVE OILS 3 REGIONS, ALL VARIABLES
> library(rpart)
> olive.rp <- rpart(region ~ ., data = olive, method = "class")  # call reconstructed; object and variable names assumed
> olive.rp
n= 572
node), split, n, loss, yval, (yprob)
      * denotes terminal node
1) root 572 249 1 (0.5646853 0.1713287 0.2639860)
  2) eicosenoic>=6.5 323 0 1 (1.0000000 0.0000000 0.0000000) *
  3) eicosenoic< 6.5 249 98 3 (0.0000000 0.3935743 0.6064257)
    6) linoleic>=1053.5 98 0 2 (0.0000000 1.0000000 0.0000000) *
    7) linoleic< 1053.5 151 0 3 (0.0000000 0.0000000 1.0000000) *

node) is the arbitrary numbering of nodes from top to bottom of the tree
split is the ____ for the split from that node
n is the _______ of cases at this node
loss is the number of cases __________ at this node
yval is the _________ value for all cases at this node
(yprob) are the __________ in each class
EXAMPLE: OLIVE OILS 3 REGIONS, ALL VARIABLES
The first split is on __________ acid and the next split is on __________ acid. It only uses these ____ variables! And there is __________!
A CLOSER LOOK.....
Consider the data
x = (1, 2, 3, 4, 5, 6, 7, 8)
class = (1, 1, 1, 1, 2, 2, 2, 2)
then all possible splits would be

Left      Right
(1,0)     (3,4)
(2,0)     (2,4)
(3,0)     (1,4)
(4,0)     (0,4)
(4,1)     (0,3)
(4,2)     (0,2)
(4,3)     (0,1)

[Figure: entropy of each possible split plotted over x]

Calculate the impurity (defined above) for each possible split. The lowest value is between points 4 and 5. That's the split to use, as the sketch below confirms.
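Checking this by brute force in R over the seven candidate cut points:

x   <- 1:8
cls <- c(1, 1, 1, 1, 2, 2, 2, 2)
ent <- function(y) { p <- table(y) / length(y); p <- p[p > 0]; -sum(p * log(p)) }
split_imp <- function(cut) {
  L <- cls[x < cut]; R <- cls[x >= cut]
  (length(L) * ent(L) + length(R) * ent(R)) / length(x)   # weighted impurity
}
round(sapply(1:7 + 0.5, split_imp), 3)   # minimum (0) at the cut between 4 and 5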
How does it work for a nonsensical class structure? Consider data:
x = (1, 2, 3, 4, 5, 6, 7, 8)
class = (1, 2, 1, 2, 1, 2, 1, 2)

[Figure: entropy of each possible split plotted over x]

The split chosen will most likely be the first one, between points 1 and 2.
HOW DOES IT WORK ON THE OLIVE OILS DATA?
In practice the impurity functions can be quite _____. The next two sets of plots show the impurity measure calculated to separate (1) the southern oils from the other two regions, and (2) northern from Sardinian oils.
________ acid is the variable with the lowest impurity overall, 0. It would be chosen as the most important variable at the top of the tree.
________ acid is the variable with the lowest impurity, 0, when region 1 is removed. It would be chosen as the second split variable.
[Figure: entropy curves for separating the southern oils (region 1) from the other two regions; minimum impurity by variable: palmitic 0.3, palmitoleic 0.4, stearic 0.67, oleic 0.49, linoleic 0.63, linolenic 0.51, arachidic 0.56, eicosenoic 0]
[Figure: entropy curves for separating northern from Sardinian oils (region 1 removed); minimum impurity by variable: palmitic 0.62, palmitoleic 0.56, stearic 0.59, oleic 0.04, linoleic 0, linolenic 0.51, arachidic 0.4, eicosenoic 0.54]
STRENGTHS AND WEAKNESSES
The solutions are usually ________, and easy to implement. There are few probabilistic assumptions underlying trees, of the kind that complicate the solution in other methods. For example, because LDA assumes that the variance-covariances of the groups are equal, it doesn't see the "perfect" split of northern and Sardinian oils in linoleic acid.
The fitting ___________________ in the sense that the first best fit will be used at each split, but a better final result might be obtained by a less optimal previous step.
STRENGTHS AND WEAKNESSES
The additive model approach, _______________, is too limited for problems where separation between groups is due to combinations of variables. But because it works variable-by-variable it can ______________________________, using complete data on each variable. Trees can also accommodate complex data, where some variables are continuous and some are categorical.
Because it is an algorithmic method it can be easy to ___________ (_______) the data. The tree will then not have inferential power: it will have worse error on new data. Split the current data into training and test sets, use the training subset to build the tree, and the test set to estimate the error.
TREES DON'T DO SO WELL IN THE PRESENCE OF COVARIANCE BETWEEN VARIABLES
OTHER COMMON CLASSIFICATION METHODS
_______________ - fit many trees to samples of the data, and subsets of the variables, and combine the predictions.
_______________ - a mixture of logistic regression models.
____________________ - find gaps between groups and fit a hyperplane to the points bordering the gaps.
NEURAL NETWORK
Feed-forward neural networks (FFNN) were developed from this concept: that combining small components is a way to build a model from predictors to response. They actually generalize ___________________. A simple network model is represented by:
$$y = f(x) = \phi\left(\alpha + \sum_{h=1}^{s} w_h\,\phi\left(\alpha_h + \sum_{i=1}^{p} w_{ih}x_i\right)\right)$$
where $x$ is the vector of explanatory variable values, $y$ is the target value, $p$ is the number of variables, $s$ is the number of nodes in the single hidden layer, and $\phi$ is a fixed function, usually a linear or logistic function. This model has a single hidden layer and univariate output values.
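A single-hidden-layer network of exactly this form can be fit with nnet(); a minimal sketch on the built-in iris data, with s = 3 hidden nodes (size, decay, and maxit values are illustrative):

library(nnet)
fit <- nnet(Species ~ ., data = iris, size = 3,   # s = 3 hidden nodes
            decay = 0.01, maxit = 200)            # weight decay, iterations
predict(fit, iris[1, ], type = "class")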
$$y = f(x) = \phi\left(\alpha + \sum_{h=1}^{s} w_h\,\phi\left(\alpha_h + \sum_{i=1}^{p} w_{ih}x_i\right)\right)$$
The network is fit by minimizing a squared error
$$\sum_{i=1}^{n} (y_i - f(x_i))^2$$
SUPPORT VECTOR MACHINES
The algorithm finds a hyperplane that maximizes the ______________ (gap) between the two classes. The points on the edge of the margin are called _____________, and are used to define the hyperplane:
$$w = \sum_{i=1}^{N_S} \alpha_i y_i x_i$$
where $N_S$ is the number of support vectors.

[Figure: two classes separated by the boundary $w \cdot x + b = 0$, with the support vectors marked]
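A hedged sketch using svm() from the e1071 package on the built-in iris data; the fitted object exposes the support vectors that define the hyperplane:

library(e1071)
fit <- svm(Species ~ ., data = iris, kernel = "linear")
fit$SV                     # the support vectors
predict(fit, iris[1, ])    # classify using the fitted hyperplane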
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.