discriminant analysis

Discriminant Analysis

Discriminant Analysis With 2-Groups

Comparison of 2-Group Discriminant Analysis With Logistical Regression

Discriminant Analysis With More Than 2-Groups

Notes on Discriminant Analysis: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University

KEY CONCEPTS*****


Discriminant functionA priori categories or groupsHomogeneity of variance/covariance matricesDifferences between discriminant analysis and logistical regressionPartitioning of sums of squares in discriminant analysis

TSS = BSS = WSSDiscriminant scoreDiscriminant weight or coefficientDiscriminant constantDiscriminant analysis assumptionsSteps in the discriminant analysis processBox's M test and its null hypothesisWilks' lambdaStepwise method in discriminant analysisPin and Pout criteriaF-test to determine the effect of adding or deleting a variable from the modelUnstandardized and standardized discriminant weightsMeasures of goodness-of-fit

EigenvalueCanonical correlationModel Wilks' lambdaClassification table and hit ratiot-test for a hit ratioMaximum chance criteriaProportional chance criteriaPress's Q statisticHistogram of discriminant scoresCasewise plot of the predictions

Calculation of the cutting score: equal and unequal groupsPrior probabilityConditional probabilityBayes' theorem and posterior probabilityStructure coefficient or discriminant loadingGroup centroidTesting the collinearity of the predictor variablesAssumptions about multiple discriminant functions

Number: g-1 or k whichever is lessFunctions may be collinearDiscriminant scores must be independent

KEY CONCEPTS (cont.)

Interpretation of multiple discriminant functionsTerritorial mapScatterplot of the discriminant scores across the discriminant functions

Notes on Discriminat Analysis: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University

2

Lecture Outline

What is discriminant analysis

The concept of partitioning sums of squares

Discriminant assumptions

Stepwise discriminant analysis with Wilks' lambda

Testing the goodness-of-fit of the model

Determining the significance of the predictor variables

A 2-group discriminant problem

A multi-group discriminant problem


3


Z = a + W1X1 + W2X2 + ... + WkXk

Dependency Technique

Dependent variable is nonmetric

Independent variables can be metric and/or nonmetric

Used to predict or explain a nonmetric dependent variable with two or more a priori categories

Assumptions

Xk are multivariate normally distributed

Homogeneity of variance-covariance matrices of Xk across groups

Xk are independent, non-collinear

The relationship is linear

Absence of outliers


4

Predicting a Nonmetric Variable

Two approaches

Logistical Regression … with a dummy coded DV

Limited to a binary nonmetric dependent variable

Makes relatively few restrictive assumptions

Discriminant Analysis … with a nonmetric dependent variable with 2 or more groups

Not limited to a binary nonmetric dependent variable

Makes several restrictive assumptions


5

Partitioning Sums of Squares (SS) in Discriminant Analysis

In Linear Regression

The Total SS ( Y- Y) 2 is partitioned into

Regression SS ( Y'- Y) 2

Residual SS + ( Y'- Y) 2

( Y- Y) 2 = ( Y'- Y) 2 + ( Y'- Y) 2

Goal Estimate parameters that minimize the Residual SS


6

Partitioning Sums of Squares (SS) in Discriminant Analysis (cont.)

In Discriminant Analysis

The Total SS ( Zi- Z) 2 is partitioned into:

Between Group SS ( Zj- Z) 2

Within Groups SS ( Zij- Zj) 2

( Zi- Z) 2 = ( Zj- Z) 2 + ( Zij- Zj) 2

i = an individual case, j = group j

Zi = individual discriminant score

Z = grand mean of the discriminant scores

Zj = mean discriminant score for group j

Goal Estimate parameters that minimize the Within Group SS


7

Elements of the Discriminant Model

Z = a + W1X1 + W2X2 + ... + WkXk

Z = discriminant score, a number used to predict group membership of a case

a = discriminant constant

Wk = discriminant weight or coefficient, a measure of the extent to which variable Xk discriminates among the groups of the DV

Xk = an IV or predictor variable. Can be metric or nonmetric.

Discriminant analysis uses OLS to estimate the values of the parameters (a) and Wk that minimize the Within Group SS


8

An Example of Discriminant Analysis with a Binary Dependent Variable

Predicting whether a felony offender will receive a probated or prison sentence as a function of various background factors.

Dependent Variable

Type of sentence (type_sent)(0 = probation, 1 = prison)

Independent Variables

Degree of drug dependency (dr_score)

Age at first arrest (age_firs)

Level of work skill (skl_index)

The seriousness of the crime (ser_indx)


9

Discriminant Analysis with Two Groups

Z0 Z Z1

f of Z

Probation (0) Prison (1)

Between SS = (Z0 - Z)2 + (Z1 - Z)2 = (Zj - Z)2 = BSS

Within SS = (Zi0 - Z0)2 + (Zi1 - Z1)2 = (Zij - Zj)2 =WSS

Total SS = (Zi - Z)2 = TSS

Z0 and Z1 are called centroids, the mean discriminant score for each group


10

Discriminant Analysis Assumptions

The predictor variables are multivariate normal, ipso facto univariate normal

The variance-covariance matrices of the predictor variables across the various groups are the same in the population, i.e. homogeneous

The groups defined by the DV exist a priori

The predictor variables are noncollinear

The relationship is linear in its parameters

Absence of leverage point outliers

The sample is large enough, say 30 cases for each predictor variable


11

Steps in Discriminant Analysis Process

Specify the dependent & the predictor variables

Test the model’s assumptions a priori

Determine the method for selection and criteria for entering the predictor variables into the model

Estimate the parameters of the model

Determine the goodness-of-fit of the model and examine the residuals

Determine the significance of the predictors

Test the assumptions ex post facto

Validate the results


12

The Sentence-Type Discriminant Model

Specification of the model (N = 70)

Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx)... + W4(ser_indx)

Are the predictor variables multivariate normally distributed?


13

DR_SCORE

10.59.58.57.56.55.54.53.52.51.5

12

10

8

6

4

2

0

The Sentence-Type Discriminant Model (cont.)


14

22.021.020.019.018.017.016.015.014.0

16

14

12

10

8

6

4

2

0

AGE_FIRS

10.08.06.04.02.00.0

20

10

0

SKL_INDX

The Sentence-Type Discriminant Model (cont.)

Variable Skew Kurtosis

Dr-score -0.5049 -0.6946Age_firs +0.7728 -0.4250Skl_indx -0.0266 -1.2321Ser_indx +0.2197 -1.1727


15

7.06.05.04.03.02.01.0

14

12

10

8

6

4

2

0

SER_INDX

Are the Variance/Covariance Matrices of the Two Groups Homogeneous?

Covariance Matrices

7.590 -1.945 1.932 2.110

-1.945 5.553 -.329 -1.225

1.932 -.329 8.632 -.857

2.110 -1.225 -.857 3.441

6.466 -2.688 -.648 1.474

-2.688 4.500 .469 -2.219

-.648 .469 7.922 -.507

1.474 -2.219 -.507 3.405

DR_SCORE

AGE_FIRS

SKL_INDX

SER_INDX

DR_SCORE

AGE_FIRS

SKL_INDX

SER_INDX

TYPE_SEN.00

1.00

DR_SCORE AGE_FIRS SKL_INDX SER_INDX

The variances are on the diagonals, and the covariances are on the off-diagonals.

Q Are the variance/covariance matrices of the two groups the same in the population, i.e. homogeneous?


16

Are the Variance/Covariance Matrices of the Two Groups Homogeneous? (cont.)

Box's M test

H0: the variance/covariance matrices of the two groups are the same in the population.

Log Determinants

2 3.076

2 2.988

2 3.040

TYPE_SEN.00

1.00

Pooled within-groups

RankLog

Determinant

The ranks and natural logarithms of determinantsprinted are those of the group covariance matrices.

Test Results

.361

.116

3

1476249

.951

Box's M

Approx.

df1

df2

Sig.

F

Tests null hypothesis of equal population covariance matrices.

Box's M = 0.361,

Approximate F = 0.116, p = 0.951

Conclusion: The null hypothesis with respect to the homogeneity of variance/covariance matrices in the population is accepted.


17

How Will the Predictor Variables Be Entered into the Discriminant Model?

SPSS offers two methods for building a discriminant model …

Entering all the variables simultaneously

Stepwise method

In this example, the variables will be entered in a stepwise fashion using Wilks' lambda criterion

Q What is Wilks' lambda ()?

For a given predictor variable, is the ratio of the WSS to the TSS ( = WSS / TSS)

It is derived from a one-way ANOVA with type_sent as the IV and the predictor variable as the DV


18

How Will the Predictor Variables be Entered into the Discriminant Model? (cont.)

= (WSS / TSS)

= (Zij - Zj)2 / (Zi - Z)2

Step 1: Compute four one-way ANOVAs with type_sent as the IV and each of the four predictor variables as the DVs

Variables in the Analysis

1.000 .000

.864 .000 .983

.864 .019 .832

SER_INDX

SER_INDX

DR_SCORE

Step1

2

ToleranceSig. of F toRemove

Wilks'Lambda

Step 2: Identify the predictor variable that has the lowest significant Wilks' lambda () and enter it into the discriminant model, i.e. ser_indx. (Pin default = 0.05)

Step 3: Estimate the parameters of the resulting discriminant model


19

How Will the Predictor Variables be Entered into the Discriminant Model? (cont.)

Step 4: Of the variables not in the model, select the predictor that has the lowest significant and enter it into the model. Determine if the addition of the variable was significant. Now check if the predictor(s) previously entered are still significant. (Pout default = 0.10)

Step 5: Repeat Step 4 until all the predictor variables are entered into the model or until none the variables outside the model have significant 's.


20

How Is the Significance of Change Determined When a Variable is Entered

Into the Discriminant Function?

Use an F-ratio comparing the Wilks' lambda of the model with the greater number of predictors (k) with the one with the lesser number of predictors (k-1)

F = 1 - ( k-1) / (k) (N - g - 1) ( k-1) / (k) (g - 1)

= WSS / TSS of the function

N = total sample size

g = number of DV groups

df = (N - g - 1) and (g - 1)


21

Estimation of the Parameters of the Model

Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx)... + W4(ser_indx)

What values of the constant (a) and the discriminant coefficients Wk best predict whether a case will receive a probated or a prison sentence?

After variable selection by a stepwise process using Wilks' , the best equation was found to be …

Canonical Discriminant Function Coefficients

-.235

.564

-.706

DR_SCORE

SER_INDX

(Constant)

1

Function

Unstandardized coefficients

Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)


22

Estimation of the Parameters of the Model (cont.)

Prediction of a case

Take a case with dr_score = 9, ser_indx = 1, and an actual sentence = 0, (i.e. a probated case)

Z = -0.706 - 0.235 (9) + 0.564 (1) = -2.25

Since -2.25 is closer to the code 0 than the code 1, the case would be predicted a probated case, i.e. code 0.


23

How Can the Goodness-of-Fit of the Model Be Measured?

Eigenvalues ()

The Canonical Correlation eta ()

Wilks' Lambda ()

Classification Table

Hit Ratio

t-test of the Hit Ratio

Maximum Chance Criteria

Proportional Chance Criteria

Press’s Q Statistic

Casewise Plot of the Predictions


24

What is an Eigenvalue?

In matrix algebra, an eigenvalue is a constant, which if subtracted from the diagonal elements of a matrix, results in a new matrix whose determinant equals zero.

An example

Given the matrix:

4 1A =

2 5

(4 - x) 1A = = 0.0

2 (5 - x)

Calculating the determinant of the matrix A:

( 4 - x) (5 - x) - (2) (1) = 0.0

(20 - 4x - 5x + x2 - 2) = 0.0

(18 - 9x + x2) = 0.0

(x2 - 9x + 18) = 0.0

This quadratic equation has two solutions or eigenvalues:+ 6 and + 3


25

What Is an Eigenvalue in Discriminant Analysis?

In the present example, the DV is composed of two groups; i.e. cases either sentenced to probation or prison.

When there are two groups, one discriminant function can be extracted from the data and its associated eigenvalue is as follows …

= BSS / WSS = [ ( Zj- Z)2 / ( Zij- Zj)2 ]Interpretation

If = 0.00, the model has no discriminatory power, BSS = 0.0

The larger the value of , the greater the discriminatory power of the model


26

What is the Eigenvalue for the Sentence-Type Example?

Eigenvalues

.305a 100.0 100.0 .483Function1

Eigenvalue % of Variance Cumulative %CanonicalCorrelation

First 1 canonical discriminant functions were used in theanalysis.

a.

The eigenvalue of the discriminant function = 0.305

The % of the variance explained that is explained by this discriminant function = 100%*

The cumulative percentage of the variance explained by the 1st discriminant function = 100%*

* With two DV groups, only one discriminant function can be extracted, which will therefore explain all the variance explained by the model. But with three groups, two functions can be extracted, with g groups, (g - 1) functions can be extracted, or k functions if k is less than g. Therefore, a different % of the total variance explained will be explained by each of the successive functions extracted.


27

How Can You Tell if the Eigenvalue Is Significant?

Two useful statistical indicators can be derived from the eigenvalue …

The canonical correlation eta ()

Wilks' lambda () for the model

The canonical correlation ( )

= / (1 + ) = BSS / TSS

= the correlation of the predictor(s) with the discriminant scores produced by the model

2 = coefficient of determination

1 - 2 = coefficient of non-determination

For the sentence-type example

= 0.3050 / (1 + 0.3050) = 0.483


28

How Can You Tell if the Eigenvalue is Significant? (cont.)

The Wilks' for the discriminant model

= (1 - 2) = [ 1 / (1 + ) ] = WSS / TSS

is chi-square distributed for df = (k - 1), k equal to the number of parameters estimated ***


= 0.3050

= [ 1 / (1 + 0.3050) ] = 0.766, which can be converted to a chi-square statistic ***

2 = 17.837, df = 2, p = 0.0001

H0: In the population Z0 = Z1 = Z

Since the chi-square results are significant …

H0 is rejected and it is concluded the differences in the mean discriminant scores of the two groups are greater than could be attributed to sampling error.

*** 2 = - [(n - 1) - 0.5 (m + k + 1)] ln , df= (k - 1), m=number of discriminant function extracted, k=number of predictor variables


29

How Can You Tell if the Eigenvalue is Significant? (cont.)

Eigenvalues

.305a 100.0 100.0 .483Function1



a.

eigenvalue

canonical correlation

Wilks' Lambda

.766 17.837 2 .000Test of Function(s)1

Wilks'Lambda Chi-square df Sig.

Wilks'

Chi-Square


30

How Well Does the Model Predict?

Classification Resultsa

27 10 37

14 19 33

73.0 27.0 100.0

42.4 57.6 100.0

TYPE_SEN.00

1.00

.00

1.00

Count

%

Original.00 1.00

Predicted GroupMembership

Total

65.7% of original grouped cases correctly classified.a.

Overall results

Overall hit ratio = 65.7%

Correctly classified probationers = 73.0%

Correctly classified prisoners = 57.6%


31

Does the Model Predict Any Better Than Chance?

Maximum chance criterion (MCC) Predict that all 70 cases are in the group with the largest number of cases.

MCC = (nL / NL) (100)

nL = number of subjects in the larger of the two groups

NL = total number of subjects in to combined groups


Probation group, n = 37Prison group, n = 33

If all the cases were predicted to receive probation …

MCC = (37 / 70) (100) = 52.86% correct by chance


32

Does the Model Predict Any Better Than Chance? (cont.)

Proportional chance criterion (Cpro) Randomly classify the cases proportionate to the number of cases in either group.

Cpro =p2 + (1 - p)2

p = proportion of subjects in one group(1 - p) = proportion of cases in the other group

Proportion of probationers = (37 / 70) = 0.5286

Proportion of prisoners = (33 / 70) = 0.4714

Cpro = 0.5286 2 + (1 - 0.4714)2 = 0.5588 or a hit ratio of 55.88%

Comparison of hit ratios

The model 65.71%

MCC 52.86%

Cpro 55.88%


33

Is the Hit Ratio of the Model Significantly Better Than Chance?

By the maximum chance criterion (MCC), one could guess 52.86% of the cases correctly. The model hit 65.71% correctly. Is this significantly better than chance?

Two ways to test whether the model hit ratio is significantly better than chance

t-test for groups of equal size

Press's Q statistic, groups can be of unequal size

t-test for a model with equal size groups

H0: the model hit ratio is no better than chance

t = (P - 0.5) / (0.5) (1 - 0.5) / N

P = the proportion the model predicted correctly

df = (N - 2)


34

Is the Hit Ratio of the Model Significantly Better Than Chance? (cont.)

Press's Q statistic

Q = [ N - (n) (g) ] 2 / [ N * (g - 1)]

N = total number of subjectsn = number of cases correctly classifiedg = number of groups

Q is chi-square distributed for df = 1


Q = [ 70 - (46) (2) ] 2 / [ 70 - (2 - 1)] = 7.0145

p 0.01

Decision The null hypothesis that the model hit ratio is no better than chance is rejected


35

How Can a Cutting Score Be Established to Sort the Cases Into Either Group Based on

Their Discriminant Scores?

When n0 = n1

Zcutting = (Z0 + Z1) / 2

(Zj = mean discriminant score for group j)

When n0 n1

Zcutting = (n0 Z0 + n1 Z1) / 2

For the sentencing-type study

Z0 = -0.5141 and Z1 = +0.5764

Zcutting = [ (37) (-0.5141 )+ (33) (+0.5764) ] / 2

Zcutting = -0.00025, or slightly less than 0.0


36

Can the Predictions of the Model Be Graphed? (cont.)

Box-Whisker plot of the distributions of discriminant scores for probation and prison cases with the cutting score set at -0.00025

Cutting score (-0.00025)


37

3337N =

1.0.0

3

2

1

0

-1

-2

-3

Type of Sentence (probation = 0, prison = 1)

What is the Best Way to See the Predictions Made on Individual Cases

Casewise Statistics

0 0 .081 1 .932 3.040 1 .068 8.031 -2.258

0 0 .157 1 .905 2.000 1 .095 6.273 -1.928

0 0 .048 1 .946 3.914 1 .054 9.418 -2.493

0 0 .203 1 .891 1.621 1 .109 5.588 -1.787

0 0 .238 1 .880 1.390 1 .120 5.151 -1.693

0 0 .704 1 .755 .144 1 .245 2.162 -.894

0 0 .081 1 .932 3.040 1 .068 8.031 -2.258

0 0 .395 1 .837 .722 1 .163 3.765 -1.364

0 0 .635 1 .773 .225 1 .227 2.448 -.988

1 0** .238 1 .880 1.390 1 .120 5.151 -1.693

Case Number1

2

3

4

5

6

7

8

9

10

OriginalActual Group

PredictedGroup p df

P(D>d | G=g)

P(G=g | D=d)

SquaredMahalanobisDistance to

Centroid

Highest Group

Group P(G=g | D=d)


Centroid

Second Highest Group

Function 1

Discriminant

Scores

Misclassified case**.

Discriminant scores … the column on the extreme right hand side of the table

For case 1, discriminant score = -2.2575


38

How Does SPSS Classify Cases?

In SPSS, case classification is accomplished by calculating the probability of a case being in one group or the other (i.e. probation or prison), rather than by simply using a cutting score.

This is accomplished by calculating the posterior probability of group membership using Bayes' Theorem …

P (Gi D) = [ P (D Gi) P (Gi) ] / [ P(D Gi) P (Gi) ]

D = the discriminant score (i.e. Z)

P (Gi D) = posterior probability that a case is in group i, given that it has a specific discriminant score D

P (D Gi) = conditional probability that a case has a discriminant score of D, given that it is in group i

P (Gi) = prior probability that a case is in group i, which would be equal to (ni / N)


39

How Does SPSS Classify Cases? (cont.)

The Bayesian probabilities associated with being in either group are calculated, and the greater of the two probabilities is used to classify the case.

Example: Case 1

Posterior probability of being in the probation group P (Gprobation D) = 0.932

Posterior probability of being in the prison group P (Gprison D) = 0.068

Since P (Gprobation D) P (Gprison D), the case is classified as a probation case. (0.932 0.068)

The column labeled "Actual Group" shows the group the case actually belongs to. If the Bayesian probability misclassifies the case, the case is marked with two asterisks (**).

These are the errors produced by the model, which can also be seen in the classification table in a previous exhibit.


40

Is Each Predictor Variable in the Model Significant?

The significance of the individual predictors variables is accomplished by conducting a one-way MANOVA …

With the grouping variable as the IV and

The discriminant predictors as the DVs.

The MANOVA sums of squares are then used to calculate Wilks' lambda () for each predictor

= WSS / TSS

The results of the final stepwise discriminant model

Variables in the Analysis

1.000 .000

.864 .000 .983

.864 .019 .832

SER_INDX

SER_INDX

DR_SCORE

Step1

2

ToleranceSig. of F to

RemoveWilks'

Lambda


41

Is Each Predictor Variable in the Model Significant? (cont.)

For dr_score (F to remove: p = 0.0195)

= WSS / TSS = (232.862) / (232.862 + 47.081) = 0.8318

For ser_indx (F to remove: p 0.0001)

= WSS / TSS = (480.152) / (480.152 + 8.433) = 0.9827

The Null Hypothesis

H0: the discriminant coefficients in the population are equal to zero

= WSS / TSS = 1.0, i.e. the WSS = TSS, & BSS = 0

Decision

The null hypothesis is rejected since both Wilks' lambdas () associated with the two predictors are significant

This finding should not be surprising since

The stepwise processes guarantees that only significant variables will be entered into the model, and

That all variables in the model are checked to assure that they remain significant as new variables are added.


42

How are the Discriminant Coefficients Interpreted

The Final Stepwise Model


For dr-score

As dr_score increases by one unit, the discriminant score Z decreases by 0.235

Holding the seriousness of the offence (ser_indx) constant, the more drug dependent the defendant, the more likely he/she will be granted probation (code = 0)

For ser_indx

As ser_indx increases by one unit, the discriminant score Z increases by 0.564

Holding the drug dependency (dr_score) constant, the more serious the offence, the more likely the defendant will be sent to prison (code = 1)


43

How Can the Relative Impact on the DV of the Different Predictor Variables be

Compared?

Two ways

Compare the standardized discriminant weights, i.e. coefficients

Compare the structure coefficients, also called the discriminant loadings

Standardized discriminant coefficient (Ck)

The relative difference among the discriminant coefficients can not be compared …

If the predictors variables are in different units of measurement.

The discriminant coefficients must first be converted to standardized coefficients (Ck)


44

How Can the Relative Impact on the DV of the Different Predictor Variables be Compared? (cont.)

Zz = C1ZX1 + C2ZX2 + … + CkZXk

Ck = Wk (Xk - Xk)2 / (N - g)

Wk = the unstandardized discriminant coefficient of variable k

(Xk - Xk)2 = SS of the predictor variable

N = total sample size

g = number of DV groups

Examples: (dr_score) and (ser_indx)

Cdr_score = - 0.235 495.67/ (70 - 2) = - 0.6345

C ser_indx = + 0.5643 232.857/ (70 - 2) = +1.044

Since +1.044 is greater in absolute value than -0.6345, ser_indx has greater discriminatory impact than dr_score.


45

How Can the Relative Impact on the DV of the Different Predictor Variables be Compared? (cont.)

Unstandardaized discriminant coefficients (Wk)


-.235

.564

-.706

DR_SCORE

SER_INDX

(Constant)

1

Function


Standardized discriminant coefficients (Ck)

Standardized Canonical Discriminant Function Coefficients

-.625

1.044

DR_SCORE

SER_INDX

1

Function

Notice that there is no constant (a) in a standardized discriminant function equation since the mean of a standardized variable equals zero.


46

What is a Structure Coefficient?

A structure coefficient, or discriminant loading, is the correlation between a predictor variable and the discriminant scores produced by the discriminant function.

The higher the absolute value of the coefficient, the greater the discriminatory impact of the predictor variable on the DV.

Structure coefficients of the predictorsStructure Matrix

.814

-.240

-.194

-.185

SER_INDX

DR_SCORE

SKL_INDXa

AGE_FIRSa

1

Function

Pooled within-groups correlations between discriminatingvariables and standardized canonical discriminant functions Variables ordered by absolute size of correlation within function.

This variable not used in the analysis.a.

Ser_index has the highest correlation with the discriminant scores, followed by dr_score, skl_indx and age_firs. The algebraic sign () indicates the direction of the relationship.


47

On Average, How Well Did the Discriminant Function Divide the Two Groups?

Group Centroids

One way to determine the degree of separation between the two groups is to compute the mean discriminant score for either group.

These means are called the group centroids

Probation (0) and prison (1) group centroids

Functions at Group Centroids

-.514

.576

TYPE_SEN.00

1.00

1

Function

Unstandardized canonical discriminantfunctions evaluated at group means


48

Are the Predictor Variables Independent, Noncollinear?

Discriminant analysis assumes that the predictor variables are independent or noncollinear.

This problem is partially addressed by using a stepwise procedure to enter the variables into the equation, since …

The collinearity among the variables is considered in the process and the resulting discriminant coefficients are partial coefficients.


Q Are dr_score and ser_indx significantly correlated?

Not withstanding the stepwise process, the final two predictors are significantly correlated. (r = 0.2791, p = 0.019).


49

Would the Same Results be Achieved Using Logistical Regression?

Yes …virtually the same results.

The functions

Discriminant function


Logistical regression equation

Prob event = 1 / (1 + e - ( -0.8011 - 0.272dr_score + 0.611 ser_indx ) )

Hit ratios

Discriminant analysis 65.71%

Logistical Regression 65.71%

Significance of predictor variables

dr_score ser_indx

Discriminant function p = .0001 p = .0004

Logistical regression p .0001 p .0001


50

Results of the Logistical Analysis of Type of Sentence

---------------------- Variables in the Equation -----------------------

Variable B S.E. Wald df Sig R Exp(B)

DR_SCORE -.2720 .1183 5.2810 1 .0216 -.1841 .7619SER_INDX .6111 .1716 12.6776 1 .0004 .3321 1.8424Constant -.8011 .7311 1.2006 1 .2732

--------------- Variables not in the Equation -----------------Residual Chi Square 1.360 with 2 df Sig = .5067

Variable Score df Sig R

AGE_FIRS 1.0003 1 .3172 .0000SKL_INDX .3970 1 .5287 .0000

-2 Log Likelihood 78.474 Goodness of Fit 67.926

Chi-Square df Significance

Model Chi-Square 18.338 2 .0001 Improvement 5.931 1 .0149

Classification Table for TYPE_SEN Predicted .0 1.0 Percent Correct 0 | 1Observed +-------+-------+ .0 0 | 27 | 10 | 72.97% +-------+-------+ 1.0 1 | 14 | 19 | 57.58% +-------+-------+ Overall 65.71%


51

What if the Dependent Variable Has More Than Two Groups?

Example

Dependant variable

Pre-disposition status (jail, bail, or ROR)

Independent variables

Age of first arrest (age_firs)

Age at time of arrest (age)

Degree of drug dependency (dr_score)

Number of prior arrests (pr_arrst)

Type of counsel (counsel 0 = court appointed, 1 = retained)


52

One Versus Multiple Discriminant Functions

When the dependant variable has two groups …

One discriminant function can be extracted from the data

When there are three groups …

Two functions can be extracted from the data

When there are g-number of groups …

(g - 1) functions can be extracted from the data,

Or k-number of functions if the number of predictor variables (k) is less than the number of groups (g)


53

Geometry of Two Discriminant Functions

Imagine a problem with two predictor variables and a DV with three groups. Now draw a scatterplot of the cases in each group across the two predictor variables.

X2

xx xx x

x xx x x

o oo o o

o o oo o o

o* * *

* * * * ** * *

X1

Group 1 = Group 2 = oGroup 3 = x

Q How best to describe these three groups?


54

Geometry of Two Discriminant Functions (cont.)

X2

Z1

x xx x xx x x x x

x x xo o

o o oo o o

o o oo

* * * Z2

* * * * ** * *

X1

Two vectors are fit to the data

Z1 reasonably good fit for groups 1 and 3, but a bad fit to group 2 (1st discriminant function)

Z2 reasonably good fit for group 2, but a bad fit for groups 1 & 3 (2nd discriminant function)

The two vectors taken together better explain the three groups than either one by itself.


55

Statistics with Multiple Discriminant Functions

The statistical output with multiple discriminant functions is comparable to that with one function …

Except that multiple sets of statistics are derived for each discriminant function, including:

Discriminant coefficients, or weights

Standardized coefficients, or weights

Centroids

Structured coefficients, or loadings

Eigenvalues

Canonical correlations

Wilks' lambdas


56

Assumptions About Multiple Discriminant Functions

Q Must the various discriminant functions be independent of each other, i.e. noncollinear?

No, they may be collinear or noncollinear, whatever best fits the data. Geometrically, the functions can be other than 90 apart.

Q Must the discriminant scores (Z) produced by the various discriminant functions be independent of each other, i.e. noncollinear?

Yes, the correlations among the discriminant scores produced by the various functions must all be equal to zero (0.0)

r Z1 Z2 = 0.0


57

Discriminant Analysis of Pre-Disposition Status

The analysis was conducted using a stepwise selection process using the Wilks' lambda criterion

Q Are the variance/covariance matrices of the three groups the same in the population?

Covariance Matrices

3.415 -2.579 -.391 -1.639 .232

-2.579 19.080 4.389 -1.729 -.229

-.391 4.389 4.770 -1.510 .103

-1.639 -1.729 -1.510 6.814 -.138

.232 -.229 .103 -.138 .136

5.190 1.557 -2.326 -.152 .210

1.557 5.957 .457 .814 -.257

-2.326 .457 8.490 .631 -7.381E-02

-.152 .814 .631 .662 -.148

.210 -.257 -7.381E-02 -.148 .190

5.882 -1.305 -3.890 -.938 .298

-1.305 6.382 -.515 2.313 -.327

-3.890 -.515 10.154 1.125 -9.559E-02

-.938 2.313 1.125 3.000 -.500

.298 -.327 -9.559E-02 -.500 .154

AGE_FIRS

AGE

DR_SCORE

PR_ARRST

COUNSEL

AGE_FIRS

AGE

DR_SCORE

PR_ARRST

COUNSEL

AGE_FIRS

AGE

DR_SCORE

PR_ARRST

COUNSEL

PRE_STAT1.00

2.00

3.00

AGE_FIRS AGE DR_SCORE PR_ARRST COUNSEL


58

Discriminant Analysis of Per-Disposition Status (cont.)

Box's M test for the homogeneity of variance/ covariance matrices

Analysis 1Box's Test of Equality of Covariance Matrices

Log Determinants

2 .934

2 .066

2 -.130

2 .606

PRE_STAT1.00

2.00

3.00

Pooled within-groups

RankLog

Determinant

The ranks and natural logarithms of determinantsprinted are those of the group covariance matrices.

Test Results

12.391

1.968

6

38111.579

.066

Box's M

Approx.

df1

df2

Sig.

F

Tests null hypothesis of equal population covariance matrices.

Decision (Box's M = 12.39, p = 0.0663)

The null hypothesis that the variance/covariance matrices are equal in the population is accepted.


59

What Is the Final Model Estimated by the Discriminant Analysis?

Two discriminant functions were extracted …


-.146 .253

1.946 1.682

2.375 -6.655

AGE

COUNSEL

(Constant)

1 2

Function


1 st Function

Z1 = 2.375 - 0.146 (age) + 1.946 (counsel)

2 nd Function

Z2 = -6.655 + 0.253 (age) + 1.682 (counsel)

Given a 22-year-old offender with retained counsel …

Z1 = 2.375 - 0.146 (22) + 1.946 (1) = 1.109

Z2 = -6.655 + 0.253 (22) + 1.682 (1) = 0.593

These two discriminant scores will be used to classify the offender into one of the three pre-disposition groups.


60

Are the Two Discriminant Functions Significant?

Eigenvalues

.882a 98.0 98.0 .685

.018a 2.0 100.0 .134

Function1

2



a.

Wilks' Lambda

.522 43.254 4 .000

.982 1.206 1 .272

Test of Function(s)1 through 2

2

Wilks'Lambda Chi-square df Sig.

1 st Function

Eigenvalue = 0.8819

Of the variance explained by the two functions, the 1st explains 97.97%

The canonical correlation () between the two predictor variables and the discriminant scores produced by the 1st function = 0.6846

The chi-square test of the Wilks' is significant (2 = 43.254, p 0.0001). The null hypothesis that in the population the BSS = 0, = 0, is rejected.


61

Are the Two Discriminant Functions Significant? (cont.)

2nd Function

Eigenvalue = 0.0183

Of the variance explained by the two functions, the 2nd explains 2.03%

The canonical correlation () between the two predictor variables and the discriminant scores produced by the 2nd function = 0.1341

The chi-square test of the Wilks' is not significant (2 = 1.206, p = 0.272). The null hypothesis that in the population the BSS = 0, = 0, is accepted.

Decision

Since the second function is not significant, its associated statistics will not be used in the interpretation of the affect of age and counsel on pre-disposition status.


62

What Are the Standardized Canonical Discriminant Functions?

Standardized Canonical Discriminant Function Coefficients

-.508 .883

.770 .666

AGE

COUNSEL

1 2

Function

Zz = C1ZX1 + C2ZX2 + … + CkZXk

1 st Function

Zz1 = -0.508 (age) + 0.770 (counsel)

2 nd Function

Zz2 = 0.883 (age) + 0.666 (counsel)

Nota Bene

Recall that the 2nd function was found not to be significant. Of the two variables in the 1st

function, counsel has the greater impact.


63

What is the Correlation Between Each of the Predictor Variables and the

Discriminant Scores Produced By theTwo Functions?

Structure coefficients, or loadings …

Structure Matrix

.867* .499

.291* .067

-.219* -.190

-.654 .757*

-.109 .195*

COUNSEL

AGE_FIRSa

PR_ARRSTa

AGE

DR_SCOREa

1 2

Function

Pooled within-groups correlations between discriminatingvariables and standardized canonical discriminant functions Variables ordered by absolute size of correlation within function.

Largest absolute correlation between each variable andany discriminant function

*.

This variable not used in the analysis.a.

The predictors counsel, age_firs, and pr_arrst load highest on the 1st function, while age and dr_score load highest on the 2nd function.


64

What is the Mean Discriminant Score for Each Pre-Disposition Group on Each

Discriminant Function?

Recall that these mean discriminant scores are called centroids and that the 2nd discriminant function is not significant.

Functions at Group Centroids

-1.001 -1.64E-03

.856 -.160

.827 .201

PRE_STAT1.00

2.00

3.00

1 2

Function

Unstandardized canonical discriminantfunctions evaluated at group means

Notice how numerically similar the centroids of the 1st

function are for groups 2 and 3, i.e. bail and ROR.

This means that the 1st function, while significant, will do a poor job discriminating between the bail and ROR groups, and most of its discriminatory power will be discriminating between the jail group versus the other two groups.


65

What Would a Scatterplot of the Discrimanant Scores of the Three Pre-

Disposition Groups Reveal?

Reading across horizontally, notice how the 1st

discriminant function separates the centroid-pair of the jail group (1) from that of the bail (2) and ROR (3) groups.

Reading vertically, however, notice that the 2nd

discriminant function fails to separate the three centroid-pairs of the three groups.

This is why the 2nd function was not found to be significant.


66

PRE_STAT

Group Centroids

Group 3

Group 2

Group 1

Group 3

Group 2Group 1

Canonical Discriminant Functions

Function 1

210-1-2-3

3

2

1

0

-1

-2

-3

How Were the Individual Cases Classified?

Casewise plot of the cases

Casewise Statistics

2 2 .682 2 .578 .766 3 .391 1.127 1.694 -.411

2 2 .787 2 .550 .480 3 .409 .649 1.548 -.158

3 2** .682 2 .578 .766 3 .391 1.127 1.694 -.411

2 2 .833 2 .521 .364 3 .426 .342 1.403 .096

2 2 .787 2 .550 .480 3 .409 .649 1.548 -.158

3 2** .811 2 .490 .420 3 .442 .206 1.257 .349

2 2 .833 2 .521 .364 3 .426 .342 1.403 .096

3 2** .811 2 .490 .420 3 .442 .206 1.257 .349

3 3 .800 2 .465 .447 2 .426 1.044 .965 .856

3 2** .682 2 .578 .766 3 .391 1.127 1.694 -.411

2 1** .154 2 .596 3.745 2 .283 4.394 -.398 -1.840

2 2 .682 2 .578 .766 3 .391 1.127 1.694 -.411

1 3** .662 2 .470 .825 2 .392 1.613 .819 1.109

3 1** .551 2 .773 1.190 2 .144 3.707 -.836 -1.080

2 1** .392 2 .721 1.871 2 .184 3.765 -.690 -1.333

1 1 .154 2 .596 3.745 2 .283 4.394 -.398 -1.840

1 1 .914 2 .910 .179 2 .049 5.185 -1.419 -.067

3 1** .711 2 .818 .681 2 .112 3.820 -.981 -.827

1 1 .842 2 .855 .343 2 .086 4.104 -1.127 -.573

2 1** .085 2 .526 4.938 2 .341 4.965 -.252 -2.094

Case Number1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

OriginalActual Group

PredictedGroup p df

P(D>d | G=g)

P(G=g | D=d)


Centroid

Highest Group

Group P(G=g | D=d)


Centroid

Second Highest Group

Function 1 Function 2

Discriminant Scores

Misclassified case**.


67

What Was the Hit Ratio of the Discriminant Model?

Classification Resultsa

27 1 4 32

5 14 2 21

3 11 3 17

84.4 3.1 12.5 100.0

23.8 66.7 9.5 100.0

17.6 64.7 17.6 100.0

PRE_STAT1.00

2.00

3.00

1.00

2.00

3.00

Count

%

Original1.00 2.00 3.00

Predicted Group Membership

Total

62.9% of original grouped cases correctly classified.a.

Hit Ratio = (44 / 70) (100) = 62.9%

Errors = (26 / 70) (100) = 37.14%


68

discriminant analysis

Documents

group discriminant analysis

discriminant analysis

discriminant analysis

collinear discriminant

regression ss y y

residual ss y y

total ss y y

groups xk