discriminant analysis

Post on 02-Jan-2016


• Discriminant Analysis (Chapter 18)

• Content: Fisher discriminant analysis; Maximum likelihood method; Bayes formula discriminant analysis; Bayes discriminant analysis; Stepwise discriminant analysis

• Objective: obtain a discriminant function or probability formula (using several indicators to classify individuals, IVs). Data: the individuals are already classified into two or more groups; the discriminant indicators are all numerical variables or categorical variables. Purpose: interpretation and prediction. Types: Fisher discriminant analysis and Bayes discriminant analysis.

• Types by data: 1. Analysis for numerical variables: obtain a discriminant function using numerical indicators. 2. Analysis for categorical variables: obtain a probability formula using categorical indicators.

• Types by name: Fisher discriminant analysis; Maximum likelihood method; Bayes formula discriminant analysis; Bayes discriminant analysis; Stepwise discriminant analysis

• 1 Fisher Discriminant Analysis. Indicator: numerical indicator. Discriminated into: two or more categories

• I Discrimination into two categories

• 1 Principle

There are $n_A$ observed units from category A and $n_B$ observed units from category B. Each unit has $m$ indicators, $X_1, X_2, \ldots, X_m$.

The aim of Fisher discriminant analysis is to find a linear combination

$$Z = C_1 X_1 + C_2 X_2 + \cdots + C_m X_m$$

• Fisher rule: obtain a composite indicator $Z$ for which the difference between the means of category A and category B, $|\bar{Z}_A - \bar{Z}_B|$, is as large as possible while the variation within the categories, $S_A^2 + S_B^2$, is as small as possible, that is, make

$$\lambda = \frac{|\bar{Z}_A - \bar{Z}_B|}{S_A^2 + S_B^2}$$

a maximum.

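• The discriminant coefficients $C_1, \ldots, C_m$ (one per indicator) can be obtained from the following equations, which result from differentiation:

$$\begin{cases} S_{11}C_1 + S_{12}C_2 + \cdots + S_{1m}C_m = D_1 \\ S_{21}C_1 + S_{22}C_2 + \cdots + S_{2m}C_m = D_2 \\ \quad\vdots \\ S_{m1}C_1 + S_{m2}C_2 + \cdots + S_{mm}C_m = D_m \end{cases}$$

$D_j = \bar{X}_j^{(A)} - \bar{X}_j^{(B)}$, where $\bar{X}_j^{(A)}$ and $\bar{X}_j^{(B)}$ are respectively the means of the jth indicator $X_j$ in category A and category B.

$S_{ij}$ is the element of the pooled covariance matrix of $X_1, X_2, \ldots, X_m$:

$$S_{ij} = \frac{\sum \left(X_i^{(A)} - \bar{X}_i^{(A)}\right)\left(X_j^{(A)} - \bar{X}_j^{(A)}\right) + \sum \left(X_i^{(B)} - \bar{X}_i^{(B)}\right)\left(X_j^{(B)} - \bar{X}_j^{(B)}\right)}{n_A + n_B - 2}$$

$X_i^{(A)}, X_j^{(A)}$ and $X_i^{(B)}, X_j^{(B)}$ are the observed values of $X_i$ and $X_j$ in category A and category B; $n_A$ and $n_B$ are the numbers of observed units in the two categories.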

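• 2 Discriminant method:

(1) Establish the discriminant function and calculate $Z_i$ for each observed unit, one by one.

(2) Calculate the category means $\bar{Z}_A$ and $\bar{Z}_B$, then calculate the critical value:

$$Z_c = \frac{\bar{Z}_A + \bar{Z}_B}{2}$$

(3) Rule (with $\bar{Z}_A > \bar{Z}_B$): if $Z_i > Z_c$, classify the unit as category A; if $Z_i < Z_c$, classify it as category B.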

• Example 18.1: Data on three indicators (X1, X2, X3) for 22 patients are displayed in Table 18-1. Of these, 12 patients are in the early stage of the disease (category A) and the other 10 are terminal patients (category B). Carry out a discriminant analysis.

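• Table 18-1 Observed values and discriminant results of 22 patients (Zc = -0.147)

| Category | Number | X1 | X2 | X3 | Z | Fisher discriminant result |
|---|---|---|---|---|---|---|
| A | 1 | 23 | 8 | 0 | 0.19 | A |
| A | 2 | -1 | 9 | -2 | 2.73 | A |
| A | 3 | -10 | 5 | 0 | 1.83 | A |
| A | 4 | -7 | -2 | 1 | -0.28 | B |
| A | 5 | -11 | 3 | -4 | 2.72 | A |
| A | 6 | -10 | 3 | -1 | 1.69 | A |
| A | 7 | 25 | 9 | -2 | 0.91 | A |
| A | 8 | -19 | 12 | -3 | 4.98 | A |
| A | 9 | 9 | 8 | -2 | 1.81 | A |
| A | 10 | -25 | -3 | -1 | 1.39 | A |
| A | 11 | 0 | -2 | 2 | -1.09 | B |
| A | 12 | -10 | -2 | 0 | 0.25 | A |
| B | 13 | 9 | -5 | 1 | -2.07 | B |
| B | 14 | 2 | -1 | -1 | -0.05 | A |
| B | 15 | 17 | -6 | -1 | -2.22 | B |
| B | 16 | 8 | -2 | 1 | -1.33 | B |
| B | 17 | 17 | -9 | 1 | -3.53 | B |
| B | 18 | 0 | -11 | 3 | -3.43 | B |
| B | 19 | -9 | -20 | 3 | -4.82 | B |
| B | 20 | -7 | -2 | 3 | -0.91 | B |
| B | 21 | -9 | 6 | 0 | 1.98 | A |
| B | 22 | 12 | 0 | 0 | -0.84 | B |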

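• (1) Calculate the mean of each indicator in every category and the differences between the means, $D_j$, as shown in Table 18-2.

Table 18-2 Means of every category and differences between means

| Category | Number of cases | Mean of X1 | Mean of X2 | Mean of X3 |
|---|---|---|---|---|
| A | 12 | -3 | 4 | -1 |
| B | 10 | 4 | -5 | 1 |
| Difference between means, $D_j$ | | -7 | 9 | -2 |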

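• (2) Calculate the pooled covariance matrix. For example:

$$S_{11} = \frac{\sum \left(X_1^{(A)} - \bar{X}_1^{(A)}\right)^2 + \sum \left(X_1^{(B)} - \bar{X}_1^{(B)}\right)^2}{n_A + n_B - 2} = \frac{2584 + 922}{12 + 10 - 2} = 175.3$$

The pooled covariance matrix:

$$S = \begin{pmatrix} 175.3 & 20.3 & -2.3 \\ 20.3 & 38.2 & -5.8 \\ -2.3 & -5.8 & 2.7 \end{pmatrix}$$

Equations:

$$\begin{cases} 175.3\,C_1 + 20.3\,C_2 - 2.3\,C_3 = -7 \\ 20.3\,C_1 + 38.2\,C_2 - 5.8\,C_3 = 9 \\ -2.3\,C_1 - 5.8\,C_2 + 2.7\,C_3 = -2 \end{cases}$$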

• Solving the equations gives $C_1 = -0.070$, $C_2 = 0.225$, $C_3 = -0.318$.

Discriminant function:

$$Z = -0.070X_1 + 0.225X_2 - 0.318X_3$$

Calculate $Z_i$ for each case, one by one (shown in column Z of Table 18-1).

Calculate the category means: $\bar{Z}_A = 1.428$, $\bar{Z}_B = -1.722$.

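• (3) Define the critical value and discriminate into two categories:

$$Z_c = \frac{\bar{Z}_A + \bar{Z}_B}{2} = \frac{1.428 + (-1.722)}{2} = -0.147$$

If $Z_i > -0.147$, classify the case as category A; if $Z_i < -0.147$, classify it as category B.

4 cases are wrongly classified (cases 4 and 11 of category A, cases 14 and 21 of category B).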
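The steps of Example 18.1 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the original author's program: the function names (`col_means`, `pooled_cov`, `solve`, `fisher_fit`, `score`) are hypothetical, and the data are the 22 cases of Table 18-1. It computes the pooled covariance matrix with divisor $n_A + n_B - 2$, solves $SC = D$, and classifies each case against the midpoint critical value.

```python
# Fisher two-category discriminant analysis on the Table 18-1 data (pure Python sketch).
A = [(23, 8, 0), (-1, 9, -2), (-10, 5, 0), (-7, -2, 1), (-11, 3, -4), (-10, 3, -1),
     (25, 9, -2), (-19, 12, -3), (9, 8, -2), (-25, -3, -1), (0, -2, 2), (-10, -2, 0)]
B = [(9, -5, 1), (2, -1, -1), (17, -6, -1), (8, -2, 1), (17, -9, 1),
     (0, -11, 3), (-9, -20, 3), (-7, -2, 3), (-9, 6, 0), (12, 0, 0)]

def col_means(rows):
    """Mean of every indicator (column) in one category."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_cov(a, b):
    """Pooled covariance matrix S with divisor n_A + n_B - 2."""
    ma, mb = col_means(a), col_means(b)
    m = len(ma)
    s = [[0.0] * m for _ in range(m)]
    for rows, mean in ((a, ma), (b, mb)):
        for r in rows:
            for i in range(m):
                for j in range(m):
                    s[i][j] += (r[i] - mean[i]) * (r[j] - mean[j])
    dof = len(a) + len(b) - 2
    return [[v / dof for v in row] for row in s]

def solve(mat, rhs):
    """Solve mat * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def score(c, x):
    """Discriminant score Z = C1*X1 + ... + Cm*Xm."""
    return sum(ci * xi for ci, xi in zip(c, x))

def fisher_fit(a, b):
    """Return the coefficients C (from S C = D) and the critical value Zc."""
    ma, mb = col_means(a), col_means(b)
    d = [x - y for x, y in zip(ma, mb)]          # D_j = mean_A - mean_B
    c = solve(pooled_cov(a, b), d)
    return c, (score(c, ma) + score(c, mb)) / 2  # Zc is the midpoint of the two Z means

C, Zc = fisher_fit(A, B)
# Rule: Z > Zc -> category A, otherwise category B; count the wrong classifications.
errors = sum(score(C, x) <= Zc for x in A) + sum(score(C, x) > Zc for x in B)
print([round(v, 3) for v in C], round(Zc, 3), errors)
```

Run on the Table 18-1 data, the coefficients should come out close to -0.070, 0.225 and -0.318, the critical value close to -0.147, and the number of wrongly classified cases should be 4, in line with the table.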

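• II Evaluation: using the misclassification probability P

Methods:

1. Retrospective: resubstitute the samples used to build the function; this often overstates the discriminant effect.

2. Prospective: use separate validation samples.

3. Jackknife (cross validation)

Steps: leave out one observed unit; build the discriminant function from the remaining units; classify the unit that was left out; repeat for every unit and take the proportion of wrongly classified units.

Virtue: makes full use of the information in the sample.

Requirement: the misclassification probability of the discriminant function should be less than 0.1 or 0.2.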
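The difference between the retrospective (resubstitution) estimate and the jackknife estimate can be sketched as follows. This is a minimal sketch under assumptions: it reuses the 22 cases of Table 18-1 and a two-category Fisher rule with a midpoint critical value, and every function name (`fit`, `classify`, `resubstitution_errors`, `jackknife_errors`) is hypothetical.

```python
# Resubstitution vs. jackknife (leave-one-out) error counts for a two-category Fisher rule.
A = [(23, 8, 0), (-1, 9, -2), (-10, 5, 0), (-7, -2, 1), (-11, 3, -4), (-10, 3, -1),
     (25, 9, -2), (-19, 12, -3), (9, 8, -2), (-25, -3, -1), (0, -2, 2), (-10, -2, 0)]
B = [(9, -5, 1), (2, -1, -1), (17, -6, -1), (8, -2, 1), (17, -9, 1),
     (0, -11, 3), (-9, -20, 3), (-7, -2, 3), (-9, 6, 0), (12, 0, 0)]

def means(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]

def pooled_cov(a, b):
    ma, mb, m = means(a), means(b), len(a[0])
    s = [[0.0] * m for _ in range(m)]
    for rows, mu in ((a, ma), (b, mb)):
        for r in rows:
            for i in range(m):
                for j in range(m):
                    s[i][j] += (r[i] - mu[i]) * (r[j] - mu[j])
    return [[v / (len(a) + len(b) - 2) for v in row] for row in s]

def solve(mat, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def fit(a, b):
    """Fisher coefficients C and critical value Zc estimated from samples a and b."""
    ma, mb = means(a), means(b)
    c = solve(pooled_cov(a, b), [x - y for x, y in zip(ma, mb)])
    z = lambda x: sum(ci * xi for ci, xi in zip(c, x))
    return c, (z(ma) + z(mb)) / 2

def classify(model, x):
    c, zc = model
    return "A" if sum(ci * xi for ci, xi in zip(c, x)) > zc else "B"

def resubstitution_errors(a, b):
    """Retrospective estimate: classify the same cases that built the function."""
    model = fit(a, b)
    return sum(classify(model, x) != "A" for x in a) + sum(classify(model, x) != "B" for x in b)

def jackknife_errors(a, b):
    """Leave each case out in turn, refit on the rest, classify the left-out case."""
    wrong = 0
    for i, x in enumerate(a):
        wrong += classify(fit(a[:i] + a[i + 1:], b), x) != "A"
    for i, x in enumerate(b):
        wrong += classify(fit(a, b[:i] + b[i + 1:]), x) != "B"
    return wrong

n = len(A) + len(B)
resub, jack = resubstitution_errors(A, B), jackknife_errors(A, B)
print(f"resubstitution P = {resub / n:.3f}, jackknife P = {jack / n:.3f}")
```

The jackknife estimate is usually the more honest of the two, because no case is classified by a function that it helped to build.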

• 2 Maximum Likelihood Method

Indicator: qualitative indicator. Discriminated into: two or more categories

• 1 Data: individuals are classified into two or more groups; the discriminant indicators are all categorical variables.

Principle: obtain the discriminant probability using the multiplication theorem of probability for independent events.

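• 2. Rule

Calculate

$$P_k = P[X_1(S_1) \mid Y_k]\; P[X_2(S_2) \mid Y_k] \cdots P[X_m(S_m) \mid Y_k], \quad k = 1, 2, \ldots, G$$

where $X_j(S_j)$ denotes the observed state of indicator $X_j$ and $Y_k$ denotes category $k$. If $P_{k_0} = \max_k P_k$, then the case is classified as category $Y_{k_0}$.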
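The rule can be sketched as a product of conditional probabilities followed by taking the maximum. The probabilities below are invented purely for illustration (Table 18-3 is not reproduced here), and the names `cond`, `likelihood` and `case` are hypothetical.

```python
from math import prod

# Hypothetical conditional probabilities P[X_j(state) | Y_k] for two indicators
# and three categories; these numbers are NOT taken from Table 18-3.
cond = {
    "Y1": [{"s1": 0.7, "s2": 0.3}, {"s1": 0.2, "s2": 0.8}],
    "Y2": [{"s1": 0.4, "s2": 0.6}, {"s1": 0.5, "s2": 0.5}],
    "Y3": [{"s1": 0.1, "s2": 0.9}, {"s1": 0.9, "s2": 0.1}],
}

def likelihood(tables, case):
    """P_k: product over indicators of P[X_j(S_j) | Y_k], assuming independence."""
    return prod(table[state] for table, state in zip(tables, case))

case = ("s2", "s1")  # observed states of X1 and X2 for the case to be discriminated
P = {k: likelihood(tables, case) for k, tables in cond.items()}
best = max(P, key=P.get)  # category with the largest P_k
print(P, best)
```

Because the product assumes the indicators are independent within each category, strongly correlated indicators can distort the result.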

• 3 Application. Example 18.2: Someone wants to use 7 indicators to diagnose 4 types of appendicitis. 5668 medical records are summarized in Table 18-3.

• Based on Table 18-3, the probabilities $P_1, \ldots, P_4$ are calculated for the case to be discriminated.

$P_3$ is maximal, so the case is classified as gangrenous appendicitis. The diagnosis after operation is consistent with the discriminant result.

• 3 Bayes Formula Discriminant Analysis

Indicator: qualitative indicator. Discriminated into: two or more categories

• 1 Data: individuals are classified into two or more groups; the discriminant indicators are all qualitative or ranked data.

Principle: conditional probability + prior probability

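• Rule (see Example 18-3)

Calculate

$$P(Y_k \mid S_1, \ldots, S_m) = \frac{P(Y_k)\, P[X_1(S_1) \mid Y_k] \cdots P[X_m(S_m) \mid Y_k]}{\sum_{h=1}^{G} P(Y_h)\, P[X_1(S_1) \mid Y_h] \cdots P[X_m(S_m) \mid Y_h]}, \quad k = 1, 2, \ldots, G$$

where $P(Y_k)$ is the prior probability of category $Y_k$. If $P(Y_{k_0} \mid S_1, \ldots, S_m)$ is the maximum, then the case is discriminated as category $Y_{k_0}$.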

• Example 18-3: The data are shown in Table 18-3. The composition proportion of each type of appendicitis is taken as the estimate of the prior probability $P(Y_k)$:

$P(Y_1) = 20\%$, $P(Y_2) = 50\%$, $P(Y_3) = 25\%$, $P(Y_4) = 5\%$

• For the case to be discriminated:

For example, $X_j(S_3)$ means that variable $X_j$ is in its third state. The other indicators are read in the same way.

• Calculate

• Notice
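The effect of the prior probabilities can be sketched as below. The priors (20%, 50%, 25%, 5%) are the ones quoted in Example 18-3; the conditional probabilities are invented for illustration because Table 18-3 is not reproduced here, and the names `prior`, `cond`, `case` and `posterior` are hypothetical.

```python
from math import prod

# Prior probabilities from Example 18-3 (composition proportions of the four types).
prior = {"Y1": 0.20, "Y2": 0.50, "Y3": 0.25, "Y4": 0.05}

# Hypothetical conditional probabilities P[X_j(state) | Y_k]; NOT taken from Table 18-3.
cond = {
    "Y1": [{1: 0.5, 2: 0.3, 3: 0.2}, {1: 0.8, 2: 0.2}],
    "Y2": [{1: 0.3, 2: 0.4, 3: 0.3}, {1: 0.6, 2: 0.4}],
    "Y3": [{1: 0.1, 2: 0.3, 3: 0.6}, {1: 0.3, 2: 0.7}],
    "Y4": [{1: 0.1, 2: 0.2, 3: 0.7}, {1: 0.2, 2: 0.8}],
}

case = (3, 2)  # X1 is in its third state, X2 in its second state

# Likelihood of the case under each category (independent indicators assumed).
lik = {k: prod(t[s] for t, s in zip(tables, case)) for k, tables in cond.items()}

# Bayes formula: posterior = prior * likelihood / sum over all categories.
joint = {k: prior[k] * lik[k] for k in prior}
total = sum(joint.values())
posterior = {k: v / total for k, v in joint.items()}

ml_best = max(lik, key=lik.get)                 # maximum likelihood choice (ignores priors)
bayes_best = max(posterior, key=posterior.get)  # Bayes formula choice
print(ml_best, bayes_best, {k: round(v, 3) for k, v in posterior.items()})
```

With these made-up numbers the maximum likelihood rule picks the rare fourth type, while the Bayes formula picks the third type, which shows how a small prior probability pulls the decision away from an uncommon category.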

• 4 Bayes Discriminant Analysis

Indicator: numerical indicator. Discriminated into: several categories (two categories included)

• Data: individuals are classified into G categories; the discriminant indicators are quantitative data. Principle: Bayes rule. Results: G discriminant functions

• Deciding the prior probability: 1. equal probabilities (with selection bias); 2. frequency estimation

Rule: if $Y_g$ is the maximum, then the case belongs to category g
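The rule can be sketched as follows, assuming the usual linear form $Y_g = \ln q_g + C_{0g} + \sum_j C_{jg} X_j$ with prior probability $q_g$, and a posterior probability obtained by normalizing $\exp(Y_g)$ over the G categories. The slides do not show the functions themselves, so this form, the coefficient values and the names `coef`, `prior`, `scores` and `posteriors` are all assumptions made for illustration.

```python
from math import exp, log

# Hypothetical coefficients (C0g, C1g, C2g) of G = 3 linear discriminant functions.
coef = {1: (-2.0, 0.5, 0.1), 2: (-1.0, 0.2, 0.3), 3: (-3.0, 0.1, 0.6)}
prior = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # equal prior probabilities

def scores(x):
    """Y_g = ln(q_g) + C0g + sum_j Cjg * X_j for every category g."""
    return {g: log(prior[g]) + c[0] + sum(cj * xj for cj, xj in zip(c[1:], x))
            for g, c in coef.items()}

def posteriors(y):
    """Normalize exp(Y_g); the largest score is subtracted first for numerical stability."""
    top = max(y.values())
    w = {g: exp(v - top) for g, v in y.items()}
    total = sum(w.values())
    return {g: v / total for g, v in w.items()}

x = (2.0, 4.0)              # a case with two indicators
y = scores(x)
post = posteriors(y)
category = max(y, key=y.get)  # rule: the category with the largest Y_g
print(category, {g: round(p, 3) for g, p in post.items()})
```

The category with the largest $Y_g$ is also the one with the largest posterior probability, so either quantity can be used to apply the rule.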

• Example 18-4: Use the 17 medical records listed in Table 18-4 to discriminate 3 diseases. Four indicators are listed. Establish the Bayes discriminant functions.

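• Table 18-4 Observed values of 4 indicators, original category and discriminant classification

| Number | X1 | X2 | X3 | X4 | Original category | Yg, category 1 | Yg, category 2 | Yg, category 3 | Discriminant result |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 6.0 | -11.5 | 19 | 90 | 1 | 0.982 | 0.018 | 0.000 | 1 |
| 2 | -11.0 | -18.5 | 25 | -36 | 3 | 0.000 | 0.140 | 0.860 | 3 |
| 3 | 90.2 | -17.0 | 17 | 3 | 2 | 0.002 | 0.548 | 0.450 | 2 |
| 4 | -4.0 | -15.0 | 13 | 54 | 1 | 0.970 | 0.030 | 0.001 | 1 |
| 5 | 0.0 | -14.0 | 20 | 35 | 2 | 0.099 | 0.667 | 0.235 | 2 |
| 6 | 0.5 | -11.5 | 19 | 37 | 3 | 0.004 | 0.413 | 0.584 | 3 |
| 7 | -10.0 | -19.0 | 21 | -42 | 3 | 0.000 | 0.151 | 0.848 | 3 |
| 8 | 0.0 | -23.0 | 5 | -35 | 1 | 0.427 | 0.520 | 0.053 | 2 |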

9

20.0

-22.0

8

-20