discriminant analysis

Post on 02-Jan-2016

DESCRIPTION

Chapter 18. Discriminant Analysis. Content: Fisher discriminant analysis; maximum likelihood method; Bayes formula discriminant analysis; Bayes discriminant analysis; stepwise discriminant analysis.

TRANSCRIPT

  • Discriminant Analysis (Chapter 18)

  • Content: Fisher discriminant analysis; maximum likelihood method; Bayes formula discriminant analysis; Bayes discriminant analysis; stepwise discriminant analysis

  • Objective: obtain a discriminant function or a probability formula (using several indicators to classify individuals, IVs).
    Data: IVs are classified into two or more groups; the discriminant indicators are all numerical variables or all categorical variables.
    Purpose: interpret and predict.
    Types: Fisher discriminant analysis and Bayes discriminant analysis.

  • Types by data:
    1. Analysis for numerical variables: obtain a discriminant function using numerical indicators.
    2. Analysis for categorical variables: obtain a probability formula using categorical indicators.

  • Types by name: Fisher discriminant analysis; maximum likelihood method; Bayes formula discriminant analysis; Bayes discriminant analysis; stepwise discriminant analysis

  • 1 Fisher Discriminant Analysis
    Indicator: numerical indicator. Discriminated into: two or more categories

  • I. Discrimination into two categories

  • (1) Principle

    There are $n_A$ observed units from category A and $n_B$ observed units from category B. Each unit has $m$ indicators, written $X_1, X_2, \dots, X_m$.

    The aim of Fisher discriminant analysis is to obtain a linear combination

    $Z = C_1 X_1 + C_2 X_2 + \cdots + C_m X_m$

  • Fisher rule: obtain a composite indicator $Z$ for which the difference between the means of category A and category B ($\bar{Z}_A - \bar{Z}_B$) is as large as possible while the variation within the categories ($S_A^2 + S_B^2$) is as small as possible, thus making

    $\lambda = \dfrac{(\bar{Z}_A - \bar{Z}_B)^2}{S_A^2 + S_B^2}$

    a maximum.

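  • The discriminant coefficients $C_j$ are obtained from the equations that result from differentiation:

    $S_{11} C_1 + S_{12} C_2 + \cdots + S_{1m} C_m = D_1$
    $S_{21} C_1 + S_{22} C_2 + \cdots + S_{2m} C_m = D_2$
    $\cdots$
    $S_{m1} C_1 + S_{m2} C_2 + \cdots + S_{mm} C_m = D_m$

    Here $D_j = \bar{X}_j^{(A)} - \bar{X}_j^{(B)}$, where $\bar{X}_j^{(A)}$ and $\bar{X}_j^{(B)}$ are the means of the $j$th indicator in category A and category B, respectively.

    $S_{ij}$ is an element of the pooled covariance matrix of $X_1, X_2, \dots, X_m$:

    $S_{ij} = \dfrac{\sum (X_i^{(A)} - \bar{X}_i^{(A)})(X_j^{(A)} - \bar{X}_j^{(A)}) + \sum (X_i^{(B)} - \bar{X}_i^{(B)})(X_j^{(B)} - \bar{X}_j^{(B)})}{n_A + n_B - 2}$

    $X_i^{(A)}$ and $X_i^{(B)}$ are the observed values of $X_i$ in category A and category B.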
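
A short sketch of where such a system of equations comes from. The matrix notation is an assumption of this note, not taken from the slides: $C$ is the vector of coefficients, $D$ the vector of mean differences between categories A and B, $S$ the pooled covariance matrix, and the criterion is taken in its squared-difference form.

```latex
\begin{aligned}
\bar{Z}_A - \bar{Z}_B &= C^{\top} D,
  \qquad \text{within-category variance of } Z = C^{\top} S C, \\
\lambda(C) &= \frac{(C^{\top} D)^{2}}{C^{\top} S C}, \\
\frac{\partial \lambda}{\partial C} = 0
  &\;\Longrightarrow\; S C = \frac{C^{\top} S C}{C^{\top} D}\, D .
\end{aligned}
```

The factor in front of $D$ is a scalar, and $\lambda$ does not change when $C$ is multiplied by a constant, so the scale can be fixed so that $S C = D$, that is $C = S^{-1} D$.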

  • (2) Discriminant method:

    Set up the discriminant function and calculate $Z_i$ case by case; then calculate $\bar{Z}_A$ and $\bar{Z}_B$.

    Calculate the critical value $Z_c$:

    $Z_c = \dfrac{\bar{Z}_A + \bar{Z}_B}{2}$

    Rule: classify the case as category A if $Z_i > Z_c$ and as category B if $Z_i < Z_c$.

  • Example 18.1 Data on three indicators ($X_1$, $X_2$, $X_3$) for 22 patients are displayed in Table 18-1. Of these, 12 patients are in the early stage of the disease (category A) and the other 10 are terminal patients (category B). Carry out a discriminant analysis.

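  • Table 18-1 Observed values and discriminant results of 22 patients ($Z_c = -0.147$)

    | Category | Number | X1 | X2 | X3 | Z | Fisher discriminant result |
    |---|---|---|---|---|---|---|
    | A | 1 | 23 | 8 | 0 | 0.19 | A |
    | A | 2 | -1 | 9 | -2 | 2.73 | A |
    | A | 3 | -10 | 5 | 0 | 1.83 | A |
    | A | 4 | -7 | -2 | 1 | -0.28 | B |
    | A | 5 | -11 | 3 | -4 | 2.72 | A |
    | A | 6 | -10 | 3 | -1 | 1.69 | A |
    | A | 7 | 25 | 9 | -2 | 0.91 | A |
    | A | 8 | -19 | 12 | -3 | 4.98 | A |
    | A | 9 | 9 | 8 | -2 | 1.81 | A |
    | A | 10 | -25 | -3 | -1 | 1.39 | A |
    | A | 11 | 0 | -2 | 2 | -1.09 | B |
    | A | 12 | -10 | -2 | 0 | 0.25 | A |
    | B | 13 | 9 | -5 | 1 | -2.07 | B |
    | B | 14 | 2 | -1 | -1 | -0.05 | A |
    | B | 15 | 17 | -6 | -1 | -2.22 | B |
    | B | 16 | 8 | -2 | 1 | -1.33 | B |
    | B | 17 | 17 | -9 | 1 | -3.53 | B |
    | B | 18 | 0 | -11 | 3 | -3.43 | B |
    | B | 19 | -9 | -20 | 3 | -4.82 | B |
    | B | 20 | -7 | -2 | 3 | -0.91 | B |
    | B | 21 | -9 | 6 | 0 | 1.98 | A |
    | B | 22 | 12 | 0 | 0 | -0.84 | B |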

  • (1) Calculate the means of every category and the differences between the means, $D_j$, as shown in Table 18-2.

    Table 18-2 Means of every category and differences between means

    | Category | Number of cases | Mean of X1 | Mean of X2 | Mean of X3 |
    |---|---|---|---|---|
    | A | 12 | -3 | 4 | -1 |
    | B | 10 | 4 | -5 | 1 |
    | Difference $D_j$ | | -7 | 9 | -2 |

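  • (2) Calculate the pooled covariance matrix. For example:

    $S_{11} = \dfrac{\sum (X_1^{(A)} - \bar{X}_1^{(A)})^2 + \sum (X_1^{(B)} - \bar{X}_1^{(B)})^2}{n_A + n_B - 2} = \dfrac{2584 + 922}{12 + 10 - 2} = 175.3$

    The pooled covariance matrix:

    $S = \begin{pmatrix} 175.3 & 20.3 & -2.3 \\ 20.3 & 38.2 & -5.8 \\ -2.3 & -5.8 & 2.7 \end{pmatrix}$

    Equations:

    $175.3\,C_1 + 20.3\,C_2 - 2.3\,C_3 = -7$
    $20.3\,C_1 + 38.2\,C_2 - 5.8\,C_3 = 9$
    $-2.3\,C_1 - 5.8\,C_2 + 2.7\,C_3 = -2$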

  • Thus we get $C_1 = -0.070$, $C_2 = 0.225$, $C_3 = -0.318$.

    Discriminant function: $Z = -0.070 X_1 + 0.225 X_2 - 0.318 X_3$

    Calculate $Z_i$ case by case (shown in column Z of Table 18-1).

    Calculate $\bar{Z}_A = 1.428$ and $\bar{Z}_B = -1.722$.
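
The calculation in steps (1) and (2) can be reproduced from the observed values in Table 18-1. The following is a minimal sketch in plain Python; the helper names (`column_means`, `cross_products`, `solve`) are chosen for this illustration and are not part of the original slides.

```python
# Observed values (X1, X2, X3) copied from Table 18-1.
group_a = [(23, 8, 0), (-1, 9, -2), (-10, 5, 0), (-7, -2, 1), (-11, 3, -4),
           (-10, 3, -1), (25, 9, -2), (-19, 12, -3), (9, 8, -2), (-25, -3, -1),
           (0, -2, 2), (-10, -2, 0)]
group_b = [(9, -5, 1), (2, -1, -1), (17, -6, -1), (8, -2, 1), (17, -9, 1),
           (0, -11, 3), (-9, -20, 3), (-7, -2, 3), (-9, 6, 0), (12, 0, 0)]


def column_means(rows):
    """Mean of each indicator over the cases of one category."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cross_products(rows, means):
    """Sums of squares and cross-products of deviations from the category means."""
    m = len(means)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             for j in range(m)] for i in range(m)]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


mean_a, mean_b = column_means(group_a), column_means(group_b)
D = [a - b for a, b in zip(mean_a, mean_b)]  # differences between category means

# Pooled covariance: summed cross-products divided by n_A + n_B - 2.
ss_a, ss_b = cross_products(group_a, mean_a), cross_products(group_b, mean_b)
dof = len(group_a) + len(group_b) - 2
S = [[(ss_a[i][j] + ss_b[i][j]) / dof for j in range(3)] for i in range(3)]

C = solve(S, D)  # discriminant coefficients C1, C2, C3
print("means A:", mean_a, "means B:", mean_b, "D:", D)
print("S:", [[round(v, 1) for v in row] for row in S])
print("C:", [round(c, 3) for c in C])
```

Rounded to three decimals, the coefficients obtained this way reproduce the Z column of Table 18-1; for case 1, for instance, $-0.070 \times 23 + 0.225 \times 8 - 0.318 \times 0 = 0.19$.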

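  • (3) Define the critical value and discriminate into two categories:

    $Z_c = \dfrac{\bar{Z}_A + \bar{Z}_B}{2} = \dfrac{1.428 + (-1.722)}{2} = -0.147$

    $Z_i > -0.147$: category A
    $Z_i < -0.147$: category B

    4 cases are wrongly classified (cases 4, 11, 14 and 21 in Table 18-1).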
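
The decision rule can be checked against the Z column of Table 18-1. This is a small sketch, not the authors' program; sending a value exactly equal to the critical value to category B is an assumption made here, since the slides do not cover that case.

```python
# Z values and true categories copied from Table 18-1 (cases 1 to 22).
z_values = [0.19, 2.73, 1.83, -0.28, 2.72, 1.69, 0.91, 4.98, 1.81, 1.39,
            -1.09, 0.25, -2.07, -0.05, -2.22, -1.33, -3.53, -3.43, -4.82,
            -0.91, 1.98, -0.84]
true_category = ["A"] * 12 + ["B"] * 10
Z_C = -0.147  # critical value reported with Table 18-1


def classify(z, critical=Z_C):
    """Category A above the critical value, category B otherwise (ties go to B by assumption)."""
    return "A" if z > critical else "B"


predicted = [classify(z) for z in z_values]
misclassified = [number for number, (p, t)
                 in enumerate(zip(predicted, true_category), start=1) if p != t]
error_rate = len(misclassified) / len(z_values)
print(misclassified, round(error_rate, 3))
```

The list contains cases 4, 11, 14 and 21, which matches the four wrongly classified cases, a resubstitution error rate of 4/22.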

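  • II. Evaluation: using the misclassification probability P

    Methods

    1. Retrospective: the samples are resubstituted into the discriminant function; this often exaggerates the discriminant effect.

    2. Prospective: use validation samples.

    Jackknife (cross-validation)

    Steps: leave out one case, build the discriminant function from the remaining cases, and classify the left-out case; repeat for every case and count the misclassifications.

    Advantage: makes full use of the information in the sample.

    Requirement: the misclassification probability of the discriminant function should be less than 0.1 or 0.2.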
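
To illustrate the difference between the retrospective check and the jackknife, here is a hedged sketch. It uses a deliberately simple nearest-mean classifier on one indicator as a stand-in for the discriminant function, and the data are invented for illustration; only the leave-one-out loop itself is the point.

```python
def fit_nearest_mean(xs, ys):
    """Stand-in classifier: store the mean of the single indicator for each category."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    """Assign x to the category whose mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


def resubstitution_error(xs, ys):
    """Retrospective check: fit on all cases, then re-classify the same cases."""
    model = fit_nearest_mean(xs, ys)
    wrong = sum(predict_nearest_mean(model, x) != y for x, y in zip(xs, ys))
    return wrong / len(xs)


def jackknife_error(xs, ys):
    """Leave each case out in turn, refit on the rest, classify the held-out case."""
    wrong = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit_nearest_mean(train_x, train_y)
        wrong += predict_nearest_mean(model, xs[i]) != ys[i]
    return wrong / len(xs)


# Hypothetical one-indicator data, invented for illustration only.
xs = [1.0, 1.2, 0.8, 1.95, 3.0, 3.2, 2.8, 2.2]
ys = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(resubstitution_error(xs, ys), jackknife_error(xs, ys))
```

On these made-up data the resubstitution error is 0 while the jackknife error is 1/8, because the borderline case 1.95 is only classified correctly when it helps to define its own category mean.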

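  • 2 Maximum Likelihood Method

    Indicator: qualitative indicator. Discriminated into: two or more categories

  • (1) Data: individuals (IVs) are classified into two or more groups; the discriminant indicators are all categorical variables.

    Principle: obtain the discriminant probability from the multiplication theorem for the probabilities of independent events.

  • (2) Rule

    Calculate

    $P_k = P(X_1(S_l) \mid Y_k)\, P(X_2(S_l) \mid Y_k) \cdots P(X_m(S_l) \mid Y_k), \quad k = 1, 2, \dots, g$

    where $X_j(S_l)$ denotes indicator $X_j$ observed in state $S_l$ and $Y_k$ is the $k$th category.

    If $P_{k_0} = \max_k P_k$, then the case is classified as category $Y_{k_0}$.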
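
A minimal sketch of this rule follows. The conditional probabilities, indicator states and category labels are hypothetical, invented for illustration, since Table 18-3 is not reproduced in this transcript.

```python
from math import prod

# Hypothetical conditional probabilities P(X_j = state | Y_k), invented for illustration.
# For each category: one dictionary per indicator, mapping state -> probability.
cond_prob = {
    "Y1": [{"low": 0.7, "high": 0.3}, {"no": 0.8, "yes": 0.2}],
    "Y2": [{"low": 0.4, "high": 0.6}, {"no": 0.5, "yes": 0.5}],
    "Y3": [{"low": 0.1, "high": 0.9}, {"no": 0.3, "yes": 0.7}],
}


def likelihoods(case, table):
    """Multiply the conditional probabilities of the observed states, per category."""
    return {k: prod(p[state] for p, state in zip(indicators, case))
            for k, indicators in table.items()}


def classify(case, table):
    """Return the category with the largest product P_k, together with all products."""
    scores = likelihoods(case, table)
    return max(scores, key=scores.get), scores


category, scores = classify(("high", "yes"), cond_prob)
print(category, scores)
```

Multiplying the probabilities relies on treating the indicators as independent within each category, which is the assumption behind the multiplication theorem named in the principle.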

  • (3) Application. Example 18.2 Seven indicators are to be used to diagnose four types of appendicitis. 5668 medical records are summarized in Table 18-3.

  • Based on Table 18-3, calculate $P_1$, $P_2$, $P_3$ and $P_4$ for the case to be discriminated.

    $P_3$ is maximal, so the case is classified as gangrenous appendicitis. The diagnosis after the operation is consistent with the discriminant result.

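  • 3 Bayes Formula Discriminant Analysis

    Indicator: qualitative indicator. Discriminated into: two or more categories

  • (1) Data: individuals (IVs) are classified into two or more groups; the discriminant indicators are all qualitative or ranked data.

    Principle: conditional probability combined with prior probability.

  • Rule (for example, Example 18-3)

    Calculate

    $P(Y_k \mid X_1(S_l), \dots, X_m(S_l)) = \dfrac{P(Y_k) \prod_{j=1}^{m} P(X_j(S_l) \mid Y_k)}{\sum_{i=1}^{g} P(Y_i) \prod_{j=1}^{m} P(X_j(S_l) \mid Y_i)}$

    where $P(Y_k)$ is the prior probability of category $Y_k$ and $X_j(S_l)$ denotes indicator $X_j$ observed in state $S_l$.

    If this probability is largest for $k = k_0$, then the case is classified as category $Y_{k_0}$.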

  • Example 18-3 The data are shown in Table 18-3. Take the proportion of each type of appendicitis as the estimate of its prior probability $P(Y_k)$.

    The four proportions are 20%, 50%, 25% and 5%.
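
The Bayes formula calculation can be sketched as follows. The prior probabilities are the four proportions quoted in Example 18-3; the conditional probabilities are hypothetical placeholders, because Table 18-3 is not reproduced in this transcript, and only two indicators are used to keep the example short.

```python
from math import prod

# Prior probabilities from Example 18-3 (proportions of the four types, in the order listed).
priors = [0.20, 0.50, 0.25, 0.05]

# Hypothetical P(observed state of indicator j | category k), invented for illustration.
# One row per category, one entry per indicator.
cond_probs = [
    [0.10, 0.20],
    [0.10, 0.10],
    [0.40, 0.20],
    [0.20, 0.20],
]


def posterior(priors, cond_probs):
    """Bayes formula: prior times product of conditionals, normalised over the categories."""
    joint = [q * prod(row) for q, row in zip(priors, cond_probs)]
    total = sum(joint)
    return [j / total for j in joint]


post = posterior(priors, cond_probs)
best = max(range(len(post)), key=post.__getitem__) + 1  # 1-based category number
print([round(p, 3) for p in post], "-> category", best)
```

Compared with the maximum likelihood method, the only change is the weighting by the prior probabilities before normalisation, so a common category can win even when its conditional probabilities are not the largest.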

  • For the case to be discriminated:

    For example, $X_j(S_3)$ means that variable $X_j$ is in its third state; the same applies to the other variables.

  • Calculate

  • Notice

  • 4 Bayes Discriminant Analysis

    Indicator: numerical indicator. Discriminated into: several categories (including two categories)

  • Data: individuals (IVs) are classified into G categories; the discriminant indicators are quantitative data.
    Principle: Bayes rule.
    Results: G discriminant functions.

  • Determine the prior probability: 1. equal probabilities (with selection bias); 2. frequency estimation.

    Rule: if $Y_g$ is the maximum, then the case belongs to category g.

    Advantages: quick, accurate
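
A hedged sketch of how G discriminant functions can be evaluated for one case. It assumes the common linear form $Y_g = \ln q_g + C_{0g} + \sum_j C_{jg} X_j$ with prior probability $q_g$, which the slides do not spell out; all intercepts, coefficients and the observed values are hypothetical. Under that assumption, posterior probabilities follow by normalising $\exp(Y_g)$.

```python
from math import exp, log


def discriminant_scores(x, intercepts, coefs, priors):
    """Y_g = ln(prior_g) + intercept_g + sum_j coef_gj * x_j, for every category g."""
    return [log(q) + c0 + sum(c * xi for c, xi in zip(cs, x))
            for q, c0, cs in zip(priors, intercepts, coefs)]


def posterior_probabilities(scores):
    """Normalise exp(Y_g); subtracting the maximum first avoids overflow."""
    top = max(scores)
    weights = [exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical functions for 3 categories and 2 indicators, invented for illustration.
intercepts = [-1.0, -2.0, -0.5]
coefs = [(0.5, 0.2), (1.0, 0.8), (0.1, 0.1)]
priors = [1 / 3, 1 / 3, 1 / 3]  # equal prior probabilities

x = (1.0, 2.0)  # hypothetical observed values of the two indicators
scores = discriminant_scores(x, intercepts, coefs, priors)
post = posterior_probabilities(scores)
predicted = max(range(len(scores)), key=scores.__getitem__) + 1  # 1-based category
print(predicted, [round(p, 3) for p in post])
```

Choosing the largest $Y_g$ and choosing the largest posterior probability give the same category, since the normalisation preserves the ordering of the scores.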

  • Example 18-4 Use the 17 medical records sorted in Table 18-4 to discriminate among 3 diseases. Four indicators are listed. Set up the Bayes discriminant functions.

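  • Table 18-4 Observed categories and discriminant classification with 4 indicators

    | Number | X1 | X2 | X3 | X4 | Original category | Yg: category 1 | Yg: category 2 | Yg: category 3 | Discriminant result |
    |---|---|---|---|---|---|---|---|---|---|
    | 1 | 6.0 | -11.5 | 19 | 90 | 1 | 0.982 | 0.018 | 0.000 | 1 |
    | 2 | -11.0 | -18.5 | 25 | -36 | 3 | 0.000 | 0.140 | 0.860 | 3 |
    | 3 | 90.2 | -17.0 | 17 | 3 | 2 | 0.002 | 0.548 | 0.450 | 2 |
    | 4 | -4.0 | -15.0 | 13 | 54 | 1 | 0.970 | 0.030 | 0.001 | 1 |
    | 5 | 0.0 | -14.0 | 20 | 35 | 2 | 0.099 | 0.667 | 0.235 | 2 |
    | 6 | 0.5 | -11.5 | 19 | 37 | 3 | 0.004 | 0.413 | 0.584 | 3 |
    | 7 | -10.0 | -19.0 | 21 | -42 | 3 | 0.000 | 0.151 | 0.848 | 3 |
    | 8 | 0.0 | -23.0 | 5 | -35 | 1 | 0.427 | 0.520 | 0.053 | 2 |
    | 9 | 20.0 | -22.0 | 8 | -20 | | | | | |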
