7.4 Linear Discriminant Analysis


Advanced Statistical Methods in Insurance
7. Multivariate Data
7.4 Linear Discriminant Analysis
©Hudec & Schlögl

Problem Definition

- Given are samples from g different populations.
- The main question of discriminant analysis is to find a decision rule, based on a so-called training sample, which allows the correct classification of future observations into the unknown population they belong to.
- d: S → {1, ..., g}, with S ⊂ R^p the sampling space; d is a decision rule which can be applied to an observation x_ω: d(x_ω) = k.
- If ω ∈ Ω_k and d(x_ω) = k: correct decision.
- If ω ∈ Ω_k and d(x_ω) ≠ k: wrong decision.


Data Structure

Training sample with known group membership.

[Figure: structure of the training sample]

Bayes Theorem

- Prior probability of group membership: p(k) = P{ω ∈ Ω_k} > 0
- Class-specific (conditional) distribution of x: f(x|k)
- Unconditional distribution of x: f(x) = Σ_{k=1}^{g} p(k) f(x|k)
- Posterior probability (illustrated in the sketch below): P(k|x) = p(k) f(x|k) / f(x)
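As a minimal numerical sketch of these four quantities, the following snippet computes the posterior for a single observation; the priors and the univariate normal class densities are made-up values for illustration, not taken from the slides:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example with g = 2 univariate normal classes.
priors = np.array([0.8, 0.2])               # p(k), assumed values
means, sds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

x = 1.2
f_xk = norm.pdf(x, means, sds)              # class-specific densities f(x|k)
f_x = np.sum(priors * f_xk)                 # unconditional density f(x)
posterior = priors * f_xk / f_x             # posterior P(k|x), sums to 1
print(posterior)
```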


Decision Principles

- Bayes Decision Rule: assign an object to that class k_est for which the posterior probability is maximal:
  k_est = e(x) with p(k_est|x) ≥ p(l|x) for l = 1, ..., g,
  equivalently p(k_est) f(x|k_est) ≥ p(l) f(x|l) for l = 1, ..., g.
- Maximum-Likelihood Rule: assign an object to that class k_est for which the class-specific density is maximal:
  k_est = e(x) with f(x|k_est) ≥ f(x|l) for l = 1, ..., g.
- In case of equality of all prior probabilities both rules are equivalent (see the sketch below).
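A minimal sketch of both rules, reusing the hypothetical two-class normal setup from above; their disagreement near the boundary under unequal priors is the point of the example:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup: two univariate normal classes, unequal priors.
priors = np.array([0.8, 0.2])
means, sds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

def bayes_rule(x):
    # argmax_k p(k) * f(x|k) -- proportional to the posterior p(k|x)
    return np.argmax(priors * norm.pdf(x, means, sds))

def ml_rule(x):
    # argmax_k f(x|k) -- ignores the prior probabilities
    return np.argmax(norm.pdf(x, means, sds))

# With unequal priors the two rules can disagree near the boundary:
print(bayes_rule(1.3), ml_rule(1.3))   # prints 0 1
```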


Optimality of Bayes Decision

Conditional error rate:

ε(d|x) = P(d(x) ≠ k | x) = 1 − P(d(x) = k | x)

As the Bayes rule maximizes the second term on the right side, it minimizes the conditional error rate. Integration over the sampling space S leads to the minimization of the unconditional error rate ε(d) = P(d(x) ≠ k) for an object from population k.

[Figure: visualization of the optimality of the Bayes decision rule for p = 1]
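The optimality statement can also be checked by simulation. The following sketch, with the same hypothetical parameters as above, estimates the unconditional error rates of the Bayes and Maximum-Likelihood rules by Monte Carlo:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo comparison of the two rules under unequal priors.
rng = np.random.default_rng(0)
priors = np.array([0.8, 0.2])
means = np.array([0.0, 2.0])

k = rng.choice(2, size=100_000, p=priors)      # true class labels
x = rng.normal(means[k], 1.0)                  # observations

dens = norm.pdf(x[:, None], means, 1.0)        # f(x|k) for both classes
err_bayes = np.mean(np.argmax(priors * dens, axis=1) != k)
err_ml = np.mean(np.argmax(dens, axis=1) != k)
print(err_bayes, err_ml)                       # err_bayes <= err_ml (approx.)
```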


Bayes Rule for g = 2

[Figure: Bayes rule for two groups]

Linear Decision Rules

[Figure: examples of linear decision rules]


Discrimination with p = 2 and g = 2

Assumption: In each population the data are multivariate normal, with different centers but a constant covariance matrix.

Discrimination with p = 2 and g = 2

[Figure: contour plot of the two class densities (levels 0.01 to 0.06) on a grid from 0 to 12]

Obviously the Bayes and the Maximum-Likelihood decision rule both lead to a linear separation between the groups.


Discrimination with p = 2 and g = 2

[Figure: homoscedastic training data on a grid from 0 to 12, with the estimated separating line]

From the training set we can estimate the unknown parameters of the multivariate normal distributions and can calculate an estimate for the optimum separating line, as sketched below.
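A minimal sketch of that estimation step, assuming simulated homoscedastic Gaussian training data and equal priors (the data and parameter values are illustrative, not the slide's):

```python
import numpy as np

# Simulated training sample: two groups with common covariance matrix.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X1 = rng.multivariate_normal([3, 4], cov, size=50)
X2 = rng.multivariate_normal([7, 8], cov, size=50)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Pooled (within-group) covariance estimate, assuming equal covariances
S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (len(X1) + len(X2) - 2)

# LDA direction and boundary w'x = c: the estimated separating line
w = np.linalg.solve(S, m1 - m2)
c = w @ (m1 + m2) / 2          # equal priors -> boundary through the midpoint

def classify(x):
    return 1 if w @ x > c else 2
print(classify([4, 5]), classify([7, 7]))   # prints 1 2
```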

Bayes versus Maximum-Likelihood Rule

[Figure: two panels showing the Bayes rule ("Bayes-Regel") decision boundary; one panel with prior probabilities 0.8 / 0.2, the other with prior probabilities 0.5 / 0.5]


Case of Non-Homogeneous Variances

[Figure: density contours (levels 0.01 to 0.06) for two groups with unequal covariance matrices, on a grid from 0 to 12]

This results in a quadratic separation of the populations, as the sketch below illustrates.
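A short sketch of the linear-versus-quadratic contrast using scikit-learn's LDA and QDA estimators on simulated heteroscedastic data (an illustrative setup, not the slide's data):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

# Simulated groups with clearly different covariance matrices.
rng = np.random.default_rng(2)
X1 = rng.multivariate_normal([3, 4], [[1.0, 0.0], [0.0, 1.0]], size=100)
X2 = rng.multivariate_normal([7, 8], [[3.0, 1.0], [1.0, 2.0]], size=100)
X = np.vstack([X1, X2])
y = np.repeat([1, 2], 100)

lda = LinearDiscriminantAnalysis().fit(X, y)    # linear boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # quadratic boundary
print(lda.score(X, y), qda.score(X, y))  # QDA typically fits better here
```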

Empirical Data

[Figure: heteroscedastic training data on a grid from 0 to 12]


LDA & QDA

[Figure: LDA versus QDA decision boundaries on the example data]

Nonparametric Discriminant Analysis

[Figure: nonparametric density estimates; the dashed lines show the true class-specific densities]

In this situation, where the true distributions are normal, the nonparametric density estimation will give sub-optimal classification results. A sketch of such a kernel-based classifier follows below.
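A minimal sketch of the idea: the parametric f(x|k) in the Bayes rule is replaced by kernel density estimates (simulated univariate data and the helper name `classify` are our own illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Training data from two univariate normal classes.
rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, size=100)    # class 1
x2 = rng.normal(2.0, 1.0, size=100)    # class 2

# Nonparametric (kernel) estimates of the class densities f(x|k)
kde1, kde2 = gaussian_kde(x1), gaussian_kde(x2)

def classify(x, p1=0.5, p2=0.5):
    # Bayes rule with estimated densities; equal priors by default
    return 1 if p1 * kde1(x)[0] > p2 * kde2(x)[0] else 2

print(classify(0.5), classify(1.8))
```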


Nonparametric Discriminant Analysis

[Figure: nonparametric density estimates; the dashed lines show the true class-specific densities]

In this situation nonparametric density estimation will give better classification results than the parametric estimate shown on the next slide.

Inadequacy of Parametric Estimate

[Figure: parametric (normal) density estimates fitted to the same data]

Compare this example with the considerations on robustness from chapter 3.2!


Naïve Bayes

- In case of large p, nonparametric methods suffer from the so-called curse of dimensionality (Bellman, 1961): assumptions tend to break down, as the number of data points needed to derive reliable estimates increases very fast.
- In these situations the "Naïve Bayes" principle has its merits. It assumes that the class densities are products of marginal densities, which corresponds to assuming conditional independence of the variables within each class (see the sketch below):

  f̂(x|k) = f̂((x_1, …, x_p)′ | k) = ∏_{j=1}^{p} f̂_j(x_j|k)
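A sketch of this principle with univariate kernel estimates for the marginals; the simulated three-dimensional data and the helper names are our own illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Simulated training data: two classes in p = 3 dimensions.
rng = np.random.default_rng(4)
X1 = rng.normal([0, 0, 0], 1.0, size=(100, 3))   # class 1
X2 = rng.normal([2, 2, 2], 1.0, size=(100, 3))   # class 2

# One univariate KDE per variable and class: estimates of f_j(x_j|k)
kdes = {k: [gaussian_kde(X[:, j]) for j in range(3)]
        for k, X in {1: X1, 2: X2}.items()}

def naive_density(x, k):
    # Naive Bayes class density: product of the estimated marginals
    return np.prod([kdes[k][j](x[j])[0] for j in range(len(x))])

def classify(x, priors={1: 0.5, 2: 0.5}):
    return max(priors, key=lambda k: priors[k] * naive_density(x, k))

print(classify([0.2, -0.1, 0.4]), classify([1.8, 2.2, 1.9]))
```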

LDA versus Logistic Regression

[Figure: decision boundaries; green = logistic regression, magenta = LDA]
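Both methods produce a linear boundary but estimate it differently: LDA via the Gaussian model for f(x|k), logistic regression by modelling P(k|x) directly. A small sketch with scikit-learn on simulated data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Simulated two-class Gaussian data for illustration.
rng = np.random.default_rng(5)
X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), 100),
               rng.multivariate_normal([2, 2], np.eye(2), 100)])
y = np.repeat([0, 1], 100)

lda = LinearDiscriminantAnalysis().fit(X, y)
logreg = LogisticRegression().fit(X, y)
print(lda.coef_, logreg.coef_)   # similar, but not identical, directions
```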



Fisher - LDA

- The popularity of LDA is due to Fisher, who developed the method without the assumption of multivariate Gaussian distributions within each class.
- Fisher showed that the same result can be achieved by searching for the most informative (with regard to the class structure) low-dimensional projections of the data.

Not the Most Informative Projection

[Figure: a projection of the data that does not separate the groups well]


Discriminant Analysis due to Fisher

[Figure: projection of the data onto the discriminant direction]

This projection gives the best discrimination between the groups.

Fisher's Approach with g = 2 Groups

- Looking for a linear combination of the observed variables y_k = a′x_k, k = 1, 2,
- which maximizes the variance criterion

  Q(a) = (ȳ_1 − ȳ_2)² / (s_1² + s_2²) → max

- leads to the solution

  a = W⁻¹(x̄_1 − x̄_2)

- W ~ within sum of squares and cross products matrix (a numerical sketch follows below):

  W = N_1 S_1* + N_2 S_2* = Σ_{n=1}^{N_1} (x_{1n} − x̄_1)(x_{1n} − x̄_1)′ + Σ_{n=1}^{N_2} (x_{2n} − x̄_2)(x_{2n} − x̄_2)′
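A numerical sketch of Fisher's two-group solution on simulated data, computing W, the direction a = W⁻¹(x̄_1 − x̄_2), and the criterion Q(a):

```python
import numpy as np

# Simulated homoscedastic two-group data, as in the earlier sketch.
rng = np.random.default_rng(6)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X1 = rng.multivariate_normal([3, 4], cov, size=50)
X2 = rng.multivariate_normal([7, 8], cov, size=50)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within sum of squares and cross products matrix W = N1*S1* + N2*S2*
W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

a = np.linalg.solve(W, m1 - m2)        # Fisher direction a = W^{-1}(m1 - m2)
y1, y2 = X1 @ a, X2 @ a                # projected samples y_k = a'x_k
# The criterion Q(a) from the slide, evaluated at the solution:
Q = (y1.mean() - y2.mean())**2 / (y1.var(ddof=1) + y2.var(ddof=1))
print(a, Q)
```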


Fisher's Approach with g > 2 Groups (1)

[Figure/formula content of this slide not preserved in the extraction]

Fisher's Approach with g > 2 Groups (2)

It can be shown that the approach of Fisher leads to the same results as the LDA based on multivariate Gaussians with constant covariance matrices. A sketch of the multi-group computation follows below.
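Since the g > 2 formulas are not preserved above, the following sketch shows the standard generalization, stated here as an assumption: maximize a′Ba / a′Wa, which leads to the generalized eigenproblem Ba = λWa, with B the between-group and W the within-group SSCP matrix:

```python
import numpy as np
from scipy.linalg import eigh

# Simulated data: g = 3 groups in p = 3 dimensions.
rng = np.random.default_rng(7)
groups = [rng.multivariate_normal(m, np.eye(3), size=40)
          for m in ([0, 0, 0], [3, 0, 1], [0, 3, 2])]

grand_mean = np.vstack(groups).mean(axis=0)
# Within-group SSCP matrix W and between-group SSCP matrix B
W = sum((X - X.mean(axis=0)).T @ (X - X.mean(axis=0)) for X in groups)
B = sum(len(X) * np.outer(X.mean(axis=0) - grand_mean,
                          X.mean(axis=0) - grand_mean) for X in groups)

# Generalized symmetric eigenproblem B a = lambda W a (ascending order);
# at most g - 1 = 2 directions carry discriminant information.
eigvals, eigvecs = eigh(B, W)
directions = eigvecs[:, ::-1][:, :2]   # leading discriminant directions
print(eigvals[::-1][:2])
print(directions)
```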