discriminant analysis
Post on 02-Jan-2016
Discriminant Analysis
Chapter 18

Content
  Fisher discriminant analysis
  Maximum likelihood method
  Bayes formula discriminant analysis
  Bayes discriminant analysis
  Stepwise discriminant analysis
Objective: obtain a discriminant function or a probability formula that uses several indicators to classify individuals (IVs).
Data: the individuals are classified into two or more groups; the discriminant indicators are all numerical variables or all categorical variables.
Purpose: interpretation and prediction.
Types: Fisher discriminant analysis and Bayes discriminant analysis.

Types, by data
  1. Analysis for numerical variables: obtain a discriminant function from numerical indicators.
  2. Analysis for categorical variables: obtain a probability formula from categorical indicators.

Types, by name
  Fisher discriminant analysis
  Maximum likelihood method
  Bayes formula discriminant analysis
  Bayes discriminant analysis
  Stepwise discriminant analysis
1 Fisher Discriminant Analysis
Indicator: numerical indicators
Discriminated into: two or more categories

I. Discrimination into two categories

1. Principle
There are $n_A$ observed units from category A and $n_B$ observed units from category B. Each unit has $m$ indicators, denoted $X_1, X_2, \dots, X_m$.
The aim of Fisher discriminant analysis is to obtain a linear combination

$$Z = C_1 X_1 + C_2 X_2 + \cdots + C_m X_m$$
Fisher rule: obtain a synthesized indicator $Z$ for which the difference between the means of category A and category B ($\bar{Z}_A - \bar{Z}_B$) is as large as possible while the variation of $Z$ within the categories ($S_Z^2$, the pooled within-category variance of $Z$) is as small as possible, so that the ratio

$$\lambda = \frac{(\bar{Z}_A - \bar{Z}_B)^2}{S_Z^2}$$

is maximal.
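The discriminant coefficients $C_1, \dots, C_m$ are obtained from the following equations, which result from differentiation:

$$
\begin{aligned}
S_{11} C_1 + S_{12} C_2 + \cdots + S_{1m} C_m &= D_1 \\
S_{21} C_1 + S_{22} C_2 + \cdots + S_{2m} C_m &= D_2 \\
&\;\;\vdots \\
S_{m1} C_1 + S_{m2} C_2 + \cdots + S_{mm} C_m &= D_m
\end{aligned}
$$

Here $D_j = \bar{X}_j^{(A)} - \bar{X}_j^{(B)}$, where $\bar{X}_j^{(A)}$ and $\bar{X}_j^{(B)}$ are the means of the $j$th indicator $X_j$ in category A and category B respectively.
$S_{ij}$ is an element of the pooled covariance matrix of $X_1, X_2, \dots, X_m$:

$$
S_{ij} = \frac{\sum_{A}\left(X_i^{(A)} - \bar{X}_i^{(A)}\right)\left(X_j^{(A)} - \bar{X}_j^{(A)}\right) + \sum_{B}\left(X_i^{(B)} - \bar{X}_i^{(B)}\right)\left(X_j^{(B)} - \bar{X}_j^{(B)}\right)}{n_A + n_B - 2}
$$

$X_i^{(A)}$ and $X_i^{(B)}$ are the observed values of $X_i$ in category A and category B, and $n_A$, $n_B$ are the numbers of observed units in the two categories.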
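The equations for the discriminant coefficients can also be written compactly in matrix form. The block below is only a restatement, assuming that $\mathbf{S}$ is the $m \times m$ pooled covariance matrix, $\mathbf{D}$ the vector of mean differences between category A and category B, and $\mathbf{C}$ the vector of coefficients; the bold symbols are notation introduced here, not taken from the slides.

```latex
\mathbf{S}\,\mathbf{C} = \mathbf{D}
\quad\Longrightarrow\quad
\mathbf{C} = \mathbf{S}^{-1}\mathbf{D},
\qquad
\bar{Z}_A - \bar{Z}_B = \mathbf{C}^{\top}\mathbf{D}
                      = \mathbf{D}^{\top}\mathbf{S}^{-1}\mathbf{D} \;\ge\; 0 .
```

Since $\mathbf{S}$ is positive definite whenever it can be inverted, the mean of $Z$ in category A is never below the mean in category B under this convention, so larger values of $Z$ point towards category A.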
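2. Discrimination method

(1) Set up the discriminant function and calculate $Z_i$ for each unit, one by one.
(2) Calculate the category means $\bar{Z}_A$ and $\bar{Z}_B$.
(3) Calculate the critical value $Z_c$:

$$Z_c = \frac{\bar{Z}_A + \bar{Z}_B}{2}$$

(4) Rule:

$$
\begin{cases}
Z_i > Z_c, & \text{classify as category A} \\
Z_i < Z_c, & \text{classify as category B}
\end{cases}
$$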
Example 18.1 Data on three indicators (X1, X2, X3) for 22 patients are displayed in Table 18-1. Twelve of the patients were diagnosed as being in the early stage of the disease (category A); the other 10 are terminal patients (category B). Carry out a discriminant analysis.
Table 18-1 Observed values and discriminant results of 22 patients (Zc = -0.147)

Category  Number    X1    X2    X3       Z   Fisher discriminant result
A              1    23     8     0    0.19   A
A              2    -1     9    -2    2.73   A
A              3   -10     5     0    1.83   A
A              4    -7    -2     1   -0.28   B
A              5   -11     3    -4    2.72   A
A              6   -10     3    -1    1.69   A
A              7    25     9    -2    0.91   A
A              8   -19    12    -3    4.98   A
A              9     9     8    -2    1.81   A
A             10   -25    -3    -1    1.39   A
A             11     0    -2     2   -1.09   B
A             12   -10    -2     0    0.25   A
B             13     9    -5     1   -2.07   B
B             14     2    -1    -1   -0.05   A
B             15    17    -6    -1   -2.22   B
B             16     8    -2     1   -1.33   B
B             17    17    -9     1   -3.53   B
B             18     0   -11     3   -3.43   B
B             19    -9   -20     3   -4.82   B
B             20    -7    -2     3   -0.91   B
B             21    -9     6     0    1.98   A
B             22    12     0     0   -0.84   B
(1) Calculate the means of every category and the differences between the means, $D_j$, as shown in Table 18-2.

Table 18-2 Means of every category and differences between means

Category          Number of cases   Mean of X1   Mean of X2   Mean of X3
A                              12           -3            4           -1
B                              10            4           -5            1
Difference D_j                              -7            9           -2
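(2) Calculate the pooled covariance matrix. For example:

$$S_{11} = \frac{\sum_{A}\left(X_1 - \bar{X}_1^{(A)}\right)^2 + \sum_{B}\left(X_1 - \bar{X}_1^{(B)}\right)^2}{12 + 10 - 2} = \frac{2584 + 922}{20} = 175.3$$

The pooled covariance matrix:

$$
S = \begin{pmatrix}
175.3 & 20.3 & -2.3 \\
20.3 & 38.2 & -5.8 \\
-2.3 & -5.8 & 2.7
\end{pmatrix}
$$

Equations:

$$
\begin{aligned}
175.3\,C_1 + 20.3\,C_2 - 2.3\,C_3 &= -7 \\
20.3\,C_1 + 38.2\,C_2 - 5.8\,C_3 &= 9 \\
-2.3\,C_1 - 5.8\,C_2 + 2.7\,C_3 &= -2
\end{aligned}
$$

Thus we get $C_1 = -0.070$, $C_2 = 0.225$, $C_3 = -0.318$.

Discriminant function:

$$Z = -0.070\,X_1 + 0.225\,X_2 - 0.318\,X_3$$

Calculate $Z_i$ for each patient (shown in column Z of Table 18-1).
Calculate $\bar{Z}_A = 1.428$, $\bar{Z}_B = -1.722$, and $Z_c = (1.428 - 1.722)/2 = -0.147$.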
(3) Define the critical value, $Z_c = -0.147$, and discriminate into two categories:

$$
\begin{cases}
Z_i > -0.147, & \text{category A} \\
Z_i < -0.147, & \text{category B}
\end{cases}
$$

Four cases (patients 4, 11, 14 and 21) are wrongly classified.
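The critical value and the count of wrongly classified cases can be checked directly from the Z column of Table 18-1. The sketch below assumes the critical value is the midpoint of the two category means of Z and that larger Z points to category A; the names `Z_TABLE`, `mean` and `classify` are illustrative only.

```python
# (patient number, true category, Z) copied from Table 18-1.
Z_TABLE = [
    (1, "A", 0.19), (2, "A", 2.73), (3, "A", 1.83), (4, "A", -0.28),
    (5, "A", 2.72), (6, "A", 1.69), (7, "A", 0.91), (8, "A", 4.98),
    (9, "A", 1.81), (10, "A", 1.39), (11, "A", -1.09), (12, "A", 0.25),
    (13, "B", -2.07), (14, "B", -0.05), (15, "B", -2.22), (16, "B", -1.33),
    (17, "B", -3.53), (18, "B", -3.43), (19, "B", -4.82), (20, "B", -0.91),
    (21, "B", 1.98), (22, "B", -0.84),
]


def mean(values):
    return sum(values) / len(values)


def classify(z, critical):
    # Assumed convention: Z above the critical value means category A.
    return "A" if z > critical else "B"


z_mean_a = mean([z for _, cat, z in Z_TABLE if cat == "A"])
z_mean_b = mean([z for _, cat, z in Z_TABLE if cat == "B"])
zc = (z_mean_a + z_mean_b) / 2  # midpoint of the two category means

misclassified = [num for num, cat, z in Z_TABLE if classify(z, zc) != cat]

print(round(zc, 3))    # -0.147
print(misclassified)   # [4, 11, 14, 21]
```

The recomputed critical value agrees with the Zc = -0.147 quoted in the caption of Table 18-1, and the four disagreements between the category and result columns are exactly the listed patients.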
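II. Evaluation, using the misclassification probability P

Methods
1. Retrospective: resubstitution of the sample used to build the function; this often exaggerates the discriminant effect.
2. Prospective: use a separate validation sample.
3. Jackknife (cross-validation)
   Steps: leave one unit out, build the discriminant function from the remaining units, classify the unit that was left out; repeat for every unit and take the proportion of wrongly classified units as the estimate of P.
   Advantage: makes full use of the information in the sample.

Requirement: the misclassification probability of the discriminant function should be less than 0.1 or 0.2.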
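Resubstitution and the jackknife can be compared on the data of Table 18-1. The sketch below is one possible implementation in plain Python, assuming the two-category Fisher function is fitted by solving the pooled-covariance equations and that the critical value is the midpoint of the category means of Z. All function names are hypothetical, and the jackknife result is printed rather than quoted because the slides do not report it.

```python
# Table 18-1: (category, X1, X2, X3) for patients 1..22.
DATA = [
    ("A", 23, 8, 0), ("A", -1, 9, -2), ("A", -10, 5, 0), ("A", -7, -2, 1),
    ("A", -11, 3, -4), ("A", -10, 3, -1), ("A", 25, 9, -2), ("A", -19, 12, -3),
    ("A", 9, 8, -2), ("A", -25, -3, -1), ("A", 0, -2, 2), ("A", -10, -2, 0),
    ("B", 9, -5, 1), ("B", 2, -1, -1), ("B", 17, -6, -1), ("B", 8, -2, 1),
    ("B", 17, -9, 1), ("B", 0, -11, 3), ("B", -9, -20, 3), ("B", -7, -2, 3),
    ("B", -9, 6, 0), ("B", 12, 0, 0),
]


def column_means(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def cross_products(rows, means):
    m = len(means)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             for j in range(m)] for i in range(m)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fisher_fit(cases):
    """Return (coefficients, critical value) for categories A and B."""
    rows_a = [c[1:] for c in cases if c[0] == "A"]
    rows_b = [c[1:] for c in cases if c[0] == "B"]
    mean_a, mean_b = column_means(rows_a), column_means(rows_b)
    ss_a, ss_b = cross_products(rows_a, mean_a), cross_products(rows_b, mean_b)
    df = len(rows_a) + len(rows_b) - 2
    m = len(mean_a)
    pooled = [[(ss_a[i][j] + ss_b[i][j]) / df for j in range(m)] for i in range(m)]
    diff = [mean_a[j] - mean_b[j] for j in range(m)]
    coef = solve(pooled, diff)
    z_a = sum(c * x for c, x in zip(coef, mean_a))  # mean of Z in category A
    z_b = sum(c * x for c, x in zip(coef, mean_b))  # mean of Z in category B
    return coef, (z_a + z_b) / 2


def predict(coef, zc, x):
    z = sum(c * xi for c, xi in zip(coef, x))
    return "A" if z > zc else "B"


# Retrospective check: fit and classify on the same 22 patients.
coef, zc = fisher_fit(DATA)
resub_errors = sum(predict(coef, zc, c[1:]) != c[0] for c in DATA)

# Jackknife: refit without patient i, then classify patient i.
loo_errors = 0
for i, case in enumerate(DATA):
    coef_i, zc_i = fisher_fit(DATA[:i] + DATA[i + 1:])
    loo_errors += predict(coef_i, zc_i, case[1:]) != case[0]

print([round(c, 3) for c in coef], round(zc, 3))
print("resubstitution error rate:", resub_errors / len(DATA))
print("jackknife error rate:", loo_errors / len(DATA))
```

Fitted on all 22 patients, the sketch reproduces the Z column of Table 18-1 to rounding and gives 4 resubstitution errors. The jackknife rate is the figure to compare with the 0.1 or 0.2 requirement, since it does not reuse a patient both to build and to test the function.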
2 Maximum Likelihood Method
Indicator: qualitative indicators
Discriminated into: two or more categories

1. Data: the individuals (IVs) are classified into two or more groups; the discriminant indicators are all categorical variables.
Principle: obtain the discriminant probability from the multiplication theorem for independent events.

2. Rule
Calculate, for each category $Y_k$,

$$P_k = \prod_{j=1}^{m} P\left(X_j(S_l) \mid Y_k\right), \qquad k = 1, 2, \dots, g$$

where $X_j(S_l)$ denotes indicator $X_j$ observed in its state $S_l$. If

$$P_{k_0} = \max_k P_k$$

then the case is classified as category $Y_{k_0}$.
3. Application
Example 18.2 Someone wants to use 7 indicators to diagnose 4 types of appendicitis. A total of 5668 medical records are summarized in Table 18-3.
Based on Table 18-3, $P_1, \dots, P_4$ are calculated for the case to be discriminated.
$P_3$ is maximal, so the case is classified as gangrenous appendicitis. The diagnosis after the operation is consistent with the discriminant result.
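The rule of the maximum likelihood method can be sketched in a few lines. Table 18-3 is not reproduced in this transcript, so the conditional probabilities below are invented purely for illustration (two indicators with two states each, four categories named Y1 to Y4); only the procedure follows the text.

```python
# HYPOTHETICAL conditional probabilities P(X_j in state s | Y_k); not Table 18-3.
COND = {
    "Y1": {"X1": {1: 0.6, 2: 0.4}, "X2": {1: 0.7, 2: 0.3}},
    "Y2": {"X1": {1: 0.5, 2: 0.5}, "X2": {1: 0.4, 2: 0.6}},
    "Y3": {"X1": {1: 0.2, 2: 0.8}, "X2": {1: 0.1, 2: 0.9}},
    "Y4": {"X1": {1: 0.3, 2: 0.7}, "X2": {1: 0.5, 2: 0.5}},
}


def likelihoods(case, cond):
    """Multiply the conditional probabilities of the observed states,
    treating the indicators as independent within each category."""
    result = {}
    for category, table in cond.items():
        p = 1.0
        for indicator, state in case.items():
            p *= table[indicator][state]
        result[category] = p
    return result


case = {"X1": 2, "X2": 2}          # observed states of the case to classify
scores = likelihoods(case, COND)
best = max(scores, key=scores.get)  # category with the largest product

print(scores)
print(best)   # Y3
```

With these made-up numbers the products are 0.12, 0.30, 0.72 and 0.35, so the case goes to Y3. Note that the products are not probabilities of the categories and need not sum to 1.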
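3 Bayes Formula Discriminant Analysis
Indicator: qualitative indicators
Discriminated into: two or more categories

1. Data: the individuals (IVs) are classified into two or more groups; the discriminant indicators are all qualitative or ranked data.
Principle: conditional probability combined with prior probability.
Rule (for an example, see Example 18-3): calculate

$$P\left(Y_k \mid X\right) = \frac{P(Y_k) \prod_{j} P\left(X_j(S_l) \mid Y_k\right)}{\sum_{i=1}^{g} P(Y_i) \prod_{j} P\left(X_j(S_l) \mid Y_i\right)}$$

If

$$P\left(Y_{k_0} \mid X\right) = \max_k P\left(Y_k \mid X\right)$$

then discriminate the case as category $Y_{k_0}$.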
Example 18-3 The data are shown in Table 18-3. Take the proportion of each type of appendicitis as the estimate of the prior probability $P(Y_k)$: 20%, 50%, 25% and 5% for categories 1 to 4.
For the case to be discriminated:
For example, $X_j(S_3)$ means that variable $X_j$ is in its third state; the same notation applies to the other variables.
Calculate
Notice
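A minimal sketch of the Bayes formula step follows. The prior probabilities are the 20%, 50%, 25% and 5% given for Example 18-3, assumed here to be in category order; the likelihood products for the case are hypothetical stand-ins, because Table 18-3 is not available in this transcript.

```python
# Prior probabilities from Example 18-3 (assumed order: categories 1 to 4).
PRIORS = {"Y1": 0.20, "Y2": 0.50, "Y3": 0.25, "Y4": 0.05}

# HYPOTHETICAL products of conditional probabilities for one case,
# i.e. the quantity the maximum likelihood method would compare directly.
LIKELIHOODS = {"Y1": 0.12, "Y2": 0.30, "Y3": 0.72, "Y4": 0.35}


def posteriors(priors, likelihoods):
    """Bayes formula: prior times likelihood, normalised over all categories."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


post = posteriors(PRIORS, LIKELIHOODS)
best = max(post, key=post.get)

for category, p in post.items():
    print(category, round(p, 3))
print(best)   # Y3
```

Unlike the raw products, the posterior probabilities sum to 1, and a rare category such as the one with a 5% prior is penalised even when its likelihood product is fairly large.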
4 Bayes Discriminant Analysis
Indicator: numerical indicators
Discriminated into: several categories (including the case of two categories)

Data: the individuals (IVs) are classified into G categories; the discriminant indicators are quantitative data.
Principle: Bayes rule.
Results: G discriminant functions.

Choosing the prior probability:
1. equal probabilities (with selection bias)
2. frequency estimation

Rule: if $Y_g$ is the maximum, then the case belongs to category $g$.
Advantages: quick, accurate.
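The rule "largest $Y_g$ wins" can be tied to posterior probabilities. The sketch below assumes linear Bayes discriminant functions that already include the log of the prior probability, in which case the posterior probability of category g is exp(Y_g) divided by the sum of exp(Y_h) over all categories. The scores are hypothetical values for a single case, not taken from Example 18-4.

```python
import math


def posterior_from_scores(scores):
    """Turn discriminant scores Y_g into posterior probabilities.
    Subtracting the maximum first avoids overflow in exp()."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


scores = [2.0, 1.0, -1.0]   # HYPOTHETICAL Y_1, Y_2, Y_3 for one case
post = posterior_from_scores(scores)

# Category with the largest score (1-based, as in the slides).
category = max(range(len(scores)), key=lambda g: scores[g]) + 1

print([round(p, 3) for p in post])
print(category)   # 1
```

Because exp() is increasing, the category with the largest score is always the one with the largest posterior probability, so the two forms of the rule agree.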
Example 18-4 Seventeen medical records, sorted in Table 18-4, are used to discriminate 3 diseases. Four indicators are listed. Set up the Bayes discriminant functions.
Table 18-4 Observed values of 4 indicators, original category and discriminant classification

                                          Original   Yg                                   Discriminant
Number      X1      X2     X3     X4      category   Category 1   Category 2   Category 3   result
1          6.0   -11.5     19     90             1        0.982        0.018        0.000        1
2        -11.0   -18.5     25    -36             3        0.000        0.140        0.860        3
3         90.2   -17.0     17      3             2        0.002        0.548        0.450        2
4         -4.0   -15.0     13     54             1        0.970        0.030        0.001        1
5          0.0   -14.0     20     35             2        0.099        0.667        0.235        2
6          0.5   -11.5     19     37             3        0.004        0.413        0.584        3
7        -10.0   -19.0     21    -42             3        0.000        0.151        0.848        3
8          0.0   -23.0      5    -35             1        0.427        0.520        0.053        2
9
20.0
-22.0
8
-20