Linear Discriminant Analysis, Part II

Patrick Breheny
BST 764: Applied Statistical Modeling
September 20, 2011

Outline:
Linear discriminant analysis in R/SAS
Comparison with multinomial/logistic regression



Iris Data / SAS/R

Anderson’s Iris Data

To illustrate the application of LDA to a real data set, we will use a famous data set collected by Anderson and published in “The irises of the Gaspé Peninsula”, which originally inspired Fisher to develop LDA

Anderson collected and measured hundreds of irises in an effort to study variation between and among the different species

There are 260 species of iris; this data set focuses on three of them (Iris setosa, Iris virginica, and Iris versicolor)

Four features were measured on 50 samples from each species: sepal width, sepal length, petal width, and petal length


Iris species

[Photographs of the three species: (a) setosa, (b) virginica, (c) versicolor]


Scatterplot matrix

[Scatterplot matrix of SepalLength, SepalWidth, PetalLength, and PetalWidth, with points distinguished by species: setosa, versicolor, virginica]


LDA in SAS/R

Fitting LDA models in SAS/R is straightforward

SAS code:

PROC DISCRIM DATA=iris;

CLASS Species;

RUN;

R code (requires the MASS package):

library(MASS)
fit <- lda(Species ~ ., Data)


Confusion matrix

The cross-classification table of predicted and actual species assignments (sometimes called the confusion matrix):

                        Actual
             setosa  versicolor  virginica
Predicted
  setosa         50           0          0
  versicolor      0          48          1
  virginica       0           2         49


Mahalanobis distance

The “distance” between classes k and l can be quantified using the Mahalanobis distance:

∆ = √[(µk − µl)^T Σ^{-1} (µk − µl)]

Essentially, this is a scale-invariant measure of how far apart the means are, one that also adjusts for the correlation between the variables

The result is a multivariate extension of the notion of “how many standard deviations apart are X and Y?”
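As a concrete illustration, ∆ can be computed directly from the formula; the sketch below uses Python with made-up means and a made-up pooled covariance matrix (not the iris estimates):

```python
import math

# Hypothetical class means and pooled 2x2 covariance matrix
# (illustrative values only, not the iris estimates)
mu_k = [5.0, 3.4]
mu_l = [5.9, 2.8]
Sigma = [[0.27, 0.09],
         [0.09, 0.11]]

# Invert the 2x2 covariance matrix by hand
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Sinv = [[ Sigma[1][1] / det, -Sigma[0][1] / det],
        [-Sigma[1][0] / det,  Sigma[0][0] / det]]

# Delta = sqrt( (mu_k - mu_l)^T Sigma^{-1} (mu_k - mu_l) )
d = [mu_k[i] - mu_l[i] for i in range(2)]
delta = math.sqrt(sum(d[i] * Sinv[i][j] * d[j]
                      for i in range(2) for j in range(2)))
```

Because of the Σ^{-1} in the middle, rescaling a variable (say, measuring in millimeters rather than centimeters) leaves ∆ unchanged.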


Mahalanobis distance

             setosa  versicolor  virginica
setosa         0.00        9.48      13.39
versicolor     9.48        0.00       4.15
virginica     13.39        4.15       0.00

These distances are rather large; hence the ease with which LDA was able to classify the species


Prediction

An important feature of LDA is the ability to estimate the conditional probability of the class given the identifying features

This is valuable in two distinct situations:

To predict future classes
To illustrate the model and the relationship of the explanatory variables to the outcome

For example, suppose we only had five observations per species; would that be enough to build an accurate classifier?


Making predictions in SAS/R

To explore this, let’s split our sample randomly into a training set used to fit the model, and a test set we can use to see how well our model predicts new observations

Once this is done, it is straightforward in both SAS and R to make predictions on a new set of data:

PROC DISCRIM DATA=Train TESTDATA=Test TESTOUT=Pred;

CLASS Species;

RUN;

Or in R:

fit <- lda(Species ~ ., Train)

pred <- predict(fit, Test)


Prediction results

Results from one such test/train split:

                        Actual
             setosa  versicolor  virginica
Predicted
  setosa         45           0          0
  versicolor      0          42          4
  virginica       0           3         41

The misclassification error goes up slightly, but the differences between the species are big enough that we have a rather good classifier even with only 5 observations per class
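As a quick check, the overall misclassification rate implied by this table can be computed directly; a short Python sketch, with the counts taken from the table above:

```python
# Test-set confusion matrix from above: rows are predicted species,
# columns are actual species (order: setosa, versicolor, virginica)
confusion = [[45,  0,  0],
             [ 0, 42,  4],
             [ 0,  3, 41]]

total = sum(sum(row) for row in confusion)        # 135 test observations
correct = sum(confusion[i][i] for i in range(3))  # diagonal entries
error_rate = 1 - correct / total
print(round(error_rate, 3))  # -> 0.052
```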


Multinomial logistic regression

If you are familiar with multinomial logistic regression, you may be thinking to yourself: what’s the big deal? I already have a perfectly good tool for dealing with this problem

To refresh your memory, the multinomial logistic regression model consists of defining one class to be the reference and fitting separate logistic regression models for k = 2, . . . , K, comparing each outcome to the baseline:

log(πik/πi1) = βk0 + xi^T βk,

where πik denotes the probability that the ith individual’s outcome belongs to the kth class
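Given fitted coefficients, these K − 1 equations determine all K class probabilities, since the probabilities must sum to one. A minimal Python sketch of that calculation (the coefficient values are hypothetical, and a single predictor is used for simplicity):

```python
import math

# Hypothetical coefficients (beta_k0, beta_k) for classes k = 2, 3,
# with class 1 as the reference; one predictor x for simplicity
coefs = {2: (-2.0, 0.8), 3: (-4.0, 1.1)}

def class_probs(x):
    # Linear predictors: eta_k = beta_k0 + x * beta_k,
    # with eta_1 = 0 for the reference class
    eta = {1: 0.0}
    for k, (b0, b) in coefs.items():
        eta[k] = b0 + b * x
    # pi_ik = exp(eta_k) / sum_j exp(eta_j)
    denom = sum(math.exp(e) for e in eta.values())
    return {k: math.exp(e) / denom for k, e in eta.items()}

p = class_probs(2.5)
```

Note that log(p[k]/p[1]) recovers βk0 + x βk exactly, matching the model above.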


LDA = logistic regression?

Recall, however, that LDA satisfies:

log(πik/πi1) = log(πk/π1) − (1/2)(µk + µ1)^T Σ^{-1} (µk − µ1) + xi^T Σ^{-1} (µk − µ1)
             = αk0 + xi^T αk

At first glance, then, it seems the models are the same
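In other words, once the priors, means, and common covariance are estimated, the LDA coefficients follow by plugging in. A one-predictor Python sketch of that plug-in step (all inputs are hypothetical values, and Σ reduces to a scalar variance in one dimension):

```python
import math

# Hypothetical one-predictor inputs (not the iris estimates)
pi_1, pi_k = 0.5, 0.5    # class priors
mu_1, mu_k = 1.0, 3.0    # class means
sigma2 = 2.0             # pooled within-class variance (Sigma in 1-D)

# Coefficients of log(pi_ik / pi_i1) = a_k0 + a_k * x
a_k = (mu_k - mu_1) / sigma2
a_k0 = math.log(pi_k / pi_1) - 0.5 * (mu_k + mu_1) * (mu_k - mu_1) / sigma2

def log_odds(x):
    return a_k0 + a_k * x
```

With equal priors, the log odds are zero exactly at the midpoint (µ1 + µk)/2 of the two means, which is where the decision boundary falls.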


Difference between LDA and logistic regression

However, although the two approaches have the same form, they do not estimate their coefficients in the same manner

LDA operates by maximizing the log-likelihood based on an assumption of normality and homogeneity (a common covariance matrix across classes)

Logistic regression, on the other hand, makes no assumption about Pr(X), and estimates the parameters of Pr(G|x) by maximizing the conditional likelihood


Difference between LDA and logistic regression (cont’d)

Intuitively, it would seem that if the distribution of x is indeed multivariate normal, then we will be able to estimate our coefficients more efficiently by making use of that information

On the other hand, logistic regression would presumably be more robust if LDA’s distributional assumptions are violated

Indeed, this intuition is borne out, both by theoretical work and simulation studies, although in practice the two approaches do usually give similar results


Iris data comparison

For the iris data, multinomial logistic regression classifies the data even better (slightly) than LDA:

                        Actual
             setosa  versicolor  virginica
Predicted
  setosa         50           0          0
  versicolor      0          49          1
  virginica       0           1         49

However, this is not convincing; what matters is the ability to predict observations that the model doesn’t already know the answers for


Iris cross-validation

Consider a cross-validation study with the iris data, randomly splitting it up into a training set containing 5 observations per species, with the remainder used as a test set

The results: LDA has a misclassification rate of 5.2%, while logistic regression has a misclassification rate of 7.7%


Asymptotic results

Efron (1975) derived the asymptotic relative efficiency of logistic regression compared to LDA in the two-class case when the true distribution of x is normal and homogeneous, and found the logistic regression estimates to be considerably more variable:

[Figure: asymptotic relative efficiency (y-axis, 0.4 to 1.0) plotted against separation (x-axis, 0 to 4), with one curve for LDA and one for logistic regression]


Final remarks

Recall the problem of complete separation in logistic regression: when there is no overlap between the classes, the logistic regression MLEs go to ±∞. This does not happen with LDA, however: its estimates are always well-defined and finite
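The complete-separation point can be seen directly: LDA’s coefficients depend only on the sample means and the pooled variance, so they stay finite even when the classes do not overlap at all. A small Python sketch with hypothetical, perfectly separated one-dimensional data:

```python
import statistics

# Perfectly separated 1-D data: every value in class A is below every
# value in class B, so logistic regression's MLE would diverge here
class_a = [1.0, 1.2, 1.5, 1.9]
class_b = [4.1, 4.4, 4.8, 5.0]

mu_a, mu_b = statistics.mean(class_a), statistics.mean(class_b)

# Pooled within-class variance
ss_a = sum((x - mu_a) ** 2 for x in class_a)
ss_b = sum((x - mu_b) ** 2 for x in class_b)
sigma2 = (ss_a + ss_b) / (len(class_a) + len(class_b) - 2)

# LDA slope on the log-odds scale: finite whenever sigma2 > 0
slope = (mu_b - mu_a) / sigma2
```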

In principle, LDA should perform poorly when outliers are present, as these usually cause problems when assuming normality

In practice, however, the two approaches usually give similar results, even in cases where x is obviously not normal (such as for categorical explanatory variables)
