
Discriminant Function Analysis

Overview

Discriminant function analysis, a.k.a. discriminant analysis or DA, is used to classify cases into the values of a categorical dependent, usually a dichotomy. If discriminant function analysis is effective for a set of data, the classification table of correct and incorrect estimates will yield a high percentage correct. Discriminant function analysis is found in SPSS under Analyze, Classify, Discriminant. One gets DA or MDA from this same menu selection, depending on whether the specified grouping variable has two or more categories.

Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a cousin of multiple analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent which has more than two categories, using as predictors a number of interval or dummy independent variables. MDA is sometimes also called discriminant factor analysis or canonical discriminant analysis.

There are several purposes for DA and/or MDA:

To classify cases into groups using a discriminant prediction equation.
To test theory by observing whether cases are classified as predicted.
To investigate differences between or among groups.
To determine the most parsimonious way to distinguish among groups.
To determine the percent of variance in the dependent variable explained by the independents.
To determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis.
To assess the relative importance of the independent variables in classifying the dependent variable.
To discard variables which are little related to group distinctions.
To infer the meaning of MDA dimensions which distinguish groups, based on discriminant loadings.

Discriminant analysis has two steps: (1) an F test (Wilks' lambda) is used to test if the discriminant model as a whole is significant, and (2) if the F test shows significance, then the individual independent variables are assessed to see which differ significantly in mean by group and these are used to classify the dependent variable.

Discriminant analysis shares all the usual assumptions of correlation, requiring linear and homoscedastic relationships, and untruncated interval or near interval data. Like multiple regression, it also assumes proper model specification (inclusion of all important independents and exclusion of extraneous variables). DA also assumes the dependent variable is a true dichotomy since data which are forced into dichotomous coding are truncated, attenuating correlation.

DA is an earlier alternative to logistic regression, which is now frequently used in place of DA because it usually involves fewer violations of assumptions (independent variables need not be normally distributed, linearly related, or have equal within-group variances), is robust, handles categorical as well as continuous variables, and has coefficients which many find easier to interpret. Logistic regression is preferred when data are not normal in distribution or group sizes are very unequal. However, discriminant analysis is preferred when the assumptions of linear regression are met, since then DA has more statistical power than logistic regression (less chance of type II errors: accepting a false null hypothesis). See also the separate topic on multiple discriminant function analysis (MDA) for dependents with more than two categories.

Key Terms and Concepts

Discriminating variables: These are the independent variables, also called predictors.

The criterion variable: This is the dependent variable, also called the grouping variable in SPSS. It is the object of classification efforts.

Discriminant function: A discriminant function, also called a canonical root, is a latent variable which is created as a linear combination of discriminating (independent) variables, such that L = b1x1 + b2x2 + ... + bnxn + c, where the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. This is analogous to multiple regression, but the b's are discriminant coefficients which maximize the distance between the means of the criterion (dependent) variable. Note that the foregoing assumes the discriminant function is estimated using ordinary least squares, the traditional method; there is also a version involving maximum likelihood estimation.
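The formula above can be sketched in a few lines of plain Python. The coefficients, constant, and case values below are purely illustrative, not taken from any fitted model:

```python
# Compute a discriminant score L = b1*x1 + ... + bn*xn + c for one case.
# Coefficients and data are hypothetical, for illustration only.
def discriminant_score(coeffs, constant, case):
    """Apply the discriminant function to one case's predictor values."""
    return sum(b * x for b, x in zip(coeffs, case)) + constant

b = [0.25, -0.40, 0.10]   # unstandardized discriminant coefficients (illustrative)
c = 1.5                   # constant
case = [4.0, 2.0, 10.0]   # one case's values on the discriminating variables
L = discriminant_score(b, c, case)
print(round(L, 2))        # 0.25*4 - 0.40*2 + 0.10*10 + 1.5 = 2.7
```

The same loop applied to every case yields the column of discriminant scores that SPSS saves when "Discriminant scores" is checked.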

o Pairwise group comparisons display the distances between group means (of the dependent variable) in the multidimensional space formed by the discriminant functions. (Not applicable to two-group DA, where there is only one function). The pairwise group comparisons table gives an F test of significance (based on Mahalanobis distances) of the distance of the group means, enabling the researcher to determine if every group mean is significantly distant from every other group mean. Also, the magnitude of the F values can be used to compare distances between groups in multivariate space. In SPSS, Analyze, Classify, Discriminant; check "Use stepwise method"; click Method, check "F for pairwise distances."

o Number of discriminant functions. There is one discriminant function for 2-group discriminant analysis, but for higher-order DA, the number of functions (each with its own cut-off value) is the lesser of (g - 1), where g is the number of categories in the grouping variable, or p, the number of discriminating (independent) variables. Each discriminant function is orthogonal to the others. A dimension is simply one of the discriminant functions when there is more than one, as in multiple discriminant analysis.
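The min(g - 1, p) rule is easy to state as a one-line helper (a sketch of the counting rule only, not SPSS output):

```python
# Number of discriminant functions = min(g - 1, p), where g is the number
# of categories in the grouping variable and p the number of predictors.
def n_discriminant_functions(g, p):
    return min(g - 1, p)

print(n_discriminant_functions(2, 5))  # two-group DA: always 1 function
print(n_discriminant_functions(4, 5))  # four groups, five predictors: 3 functions
```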

The first function maximizes the differences between the values of the dependent variable. The second function is orthogonal to it (uncorrelated with it) and maximizes the differences between values of the dependent variable, controlling for the first function. And so on. Though mathematically different, each discriminant function is a dimension which differentiates a case into categories of the dependent based on its values on the independents. The first function will be the most powerful differentiating dimension, but later functions may also represent additional significant dimensions of differentiation.

o The eigenvalue, also called the characteristic root of each discriminant function, reflects the ratio of importance of the dimensions which classify cases of the dependent variable. There is one eigenvalue for each discriminant function. For two-group DA, there is one discriminant function and one eigenvalue, which accounts for 100% of the explained variance. If there is more than one discriminant function, the first will be the largest and most important, the second next most important in explanatory power, and so on. The eigenvalues assess relative importance because they reflect the percents of variance explained in the dependent variable, cumulating to 100% for all functions. That is, the ratio of the eigenvalues indicates the relative discriminating power of the discriminant functions. If the ratio of two eigenvalues is 1.4, for instance, then the first discriminant function accounts for 40% more between-group variance in the dependent categories than does the second discriminant function. Eigenvalues are part of the default output in SPSS (Analyze, Classify, Discriminant).

The relative percentage of a discriminant function equals a function's eigenvalue divided by the sum of all eigenvalues of all discriminant functions in the model. Thus it is the percent of discriminating power for the model associated with a given discriminant function. Relative % is used to tell how many functions are important. One may find that only the first two or so eigenvalues are of importance.
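The relative percentages can be computed directly from the eigenvalues; the three eigenvalues below are illustrative, not from any real output:

```python
# Relative % of each discriminant function: its eigenvalue divided by the
# sum of all eigenvalues, times 100. Eigenvalues here are illustrative.
eigenvalues = [1.4, 0.5, 0.1]                # one eigenvalue per function
total = sum(eigenvalues)
relative_pct = [100 * e / total for e in eigenvalues]
print([round(p, 1) for p in relative_pct])   # [70.0, 25.0, 5.0]
```

Here the first function carries 70% of the model's discriminating power, so one might judge only the first (or first two) functions worth interpreting.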

The canonical correlation, R*, is a measure of the association between the groups formed by the dependent and the given discriminant function. When R* is zero, there is no relation between the groups and the function. When the canonical correlation is large, there is a high correlation between the discriminant functions and the groups. Note that relative % and R* do not have to be correlated. R* is used to tell how much each function is useful in determining group differences. An R* of 1.0 indicates that all of the variability in the discriminant scores can be accounted for by that dimension. Note that for two-group DA, the canonical correlation is equivalent to the Pearsonian correlation of the discriminant scores with the grouping variable.

The discriminant score, also called the DA score, is the value resulting from applying a discriminant function formula to the data for a given case. The Z score is the discriminant score for standardized data. To get discriminant scores in SPSS, select Analyze, Classify, Discriminant; click the Save button; check "Discriminant scores". One can also view the discriminant scores by clicking the Classify button and checking "Casewise results."

Cutoff: If the discriminant score of the function is less than or equal to the cutoff, the case is classed as 0, or if above it is classed as 1. When group sizes are equal, the cutoff is the mean of the two centroids (for two-group DA). If the groups are unequal, the cutoff is the weighted mean.
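A minimal sketch of the two-group cutoff rule, using illustrative centroids and group sizes; the weighted-mean formula is the one stated above, though implementations can differ in detail:

```python
# Two-group cutoff: mean of the two group centroids when group sizes are
# equal, otherwise the weighted mean. All numbers are illustrative.
def cutoff(centroid0, centroid1, n0, n1):
    if n0 == n1:
        return (centroid0 + centroid1) / 2
    return (n0 * centroid0 + n1 * centroid1) / (n0 + n1)

def classify(score, cut):
    """Class 0 if the discriminant score is at or below the cutoff, else 1."""
    return 0 if score <= cut else 1

cut = cutoff(-0.6, 1.2, 374, 971)     # unequal groups -> weighted mean
print(classify(0.9, cut), classify(-0.2, cut))
```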

Unstandardized discriminant coefficients are used in the formula for making the classifications in DA, much as b coefficients are used in regression in making predictions. The constant plus the sum of products of the unstandardized coefficients with the observations yields the discriminant scores. That is, discriminant coefficients are the regression-like b coefficients in the discriminant function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable formed by the discriminant function, the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. The discriminant function coefficients are partial coefficients, reflecting the unique contribution of each variable to the classification of the criterion variable. The standardized discriminant coefficients, like beta weights in regression, are used to assess the relative classifying importance of the independent variables.

If one clicks the Statistics button in SPSS after running discriminant analysis and then checks "Unstandardized coefficients," then SPSS output will include the unstandardized discriminant coefficients.


Standardized discriminant coefficients, also termed the standardized canonical discriminant function coefficients, are used to compare the relative importance of the independent variables, much as beta weights are used in regression. Note that importance is assessed relative to the model being analyzed. Addition or deletion of variables in the model can change discriminant coefficients markedly.

As with regression, since these are partial coefficients, only the unique explanation of each independent is being compared, not considering any shared explanation. Also, if there are more than two groups of the dependent, the standardized discriminant coefficients do not tell the researcher between which groups the variable is most or least discriminating. For this purpose, group centroids and factor structure are examined. The standardized discriminant coefficients appear by default in SPSS (Analyze, Classify, Discriminant) in a table of "Standardized Canonical Discriminant Function Coefficients". In MDA, there will be as many sets of coefficients as there are discriminant functions (dimensions).

Functions at group centroids are the mean discriminant scores for each of the dependent variable categories for each of the discriminant functions in MDA. Two-group discriminant analysis has two centroids, one for each group. We want the means to be well apart to show the discriminant function is clearly discriminating. The closer the means, the more errors of classification there likely will be. SPSS generates a table of "Functions at group centroids" by default when Analyze, Classify, Discriminant is invoked.

o Discriminant function plots, also called canonical plots, can be created in which the two axes are two of the discriminant functions (the dimensional meaning of which is determined by looking at the structure coefficients, discussed below), and circles within the plot locate the centroids of each category being analyzed. The farther apart one point is from another on the plot, the more the dimension represented by that axis differentiates those two groups. Thus these plots depict discriminant function space. For instance, occupational groups might be located in a space representing educational and motivational dimensions. In the Plots area of the Classify button, one can select Separate-group plots, a Combined-group plot, or a territorial map. Separate and combined group plots show where cases are located in the property space formed by two functions (dimensions). By default, SPSS uses the first two functions. The territorial map shows inter-group distances on the discriminant functions. Each group has a numeric symbol: 1, 2, 3, etc. Cases falling within the boundaries formed by the 2's, for instance, are classified as 2. The individual cases are not shown in territorial maps under SPSS, however.

Tests of significance

(Model) Wilks' lambda is used to test the significance of the discriminant function as a whole. In SPSS, the "Wilks' Lambda" table will have a column labeled "Test of Function(s)" and a row labeled "1 through n" (where n is the number of discriminant functions). The "Sig." level for this row is the significance level of the discriminant function as a whole. The researcher wants a finding of significance, and the smaller the lambda, the more likely it is to be significant. A significant lambda means one can reject the null hypothesis that the two groups have the same mean discriminant function scores and conclude the model is discriminating. Wilks's lambda is part of the default output in SPSS (Analyze, Classify, Discriminant). In SPSS, this use of Wilks' lambda appears in the "Wilks' Lambda" table of the output section on "Summary of Canonical Discriminant Functions."

o Stepwise Wilks' lambda appears in the "Variables in the Analysis" table of stepwise DA output, after the "Sig. of F to Remove" column. The Step 1 model will have no entry, as removing the first variable would remove the only variable. The Step 2 model will have two predictors, each with a Wilks' lambda coefficient, which represents what the model Wilks' lambda would be if that variable were dropped, leaving only the other one. If V1 is entered at Step 1 and V2 is entered at Step 2, then the Wilks' lambda in the "Variables in the Analysis" table for V2 will be identical to the model Wilks' lambda in the "Wilks' Lambda" table for Step 1, since dropping it would reduce the model to the Step 1 model. The more important the variable in classifying the grouping variable, the higher its stepwise Wilks' lambda.

Stepwise Wilks' lambda also appears in the "Variables Not in the Analysis" table of stepwise DA output, after the "Sig. of F to Enter" column. Here the criterion is reversed: the variable with the lowest stepwise Wilks' lambda is the best candidate to add to the model in the next step.

(Model) Wilks' lambda difference tests are also used in a second context to assess the improvement in classification when using sequential discriminant analysis. There is an F test of significance of the ratio of two Wilks' lambdas, such as between a first one for a set of control variables as predictors and a second one for a model including both control variables and independent variables of interest. The second lambda is divided by the first (where the first is the model with fewer predictors) and an approximate F value for this ratio is found using calculations reproduced in Tabachnick and Fidell (2001: 491).

o ANOVA table for discriminant scores is another overall test of the DA model. It is an F test, where a "Sig." p value < .05 means the model differentiates discriminant scores between the groups significantly better than chance (than a model with just the constant). It is obtained in SPSS by asking for Analyze, Compare Means, One-Way ANOVA, using discriminant scores from DA (which SPSS will label Dis1_1 or similar) as dependent.

(Variable) Wilks' lambda can also be used to test which independents contribute significantly to the discriminant function. The smaller the variable's Wilks' lambda, the more that variable contributes to the discriminant function. Lambda varies from 0 to 1, with 0 meaning group means differ (the more the variable differentiates the groups) and 1 meaning all group means are the same. The F test of Wilks's lambda shows which variables' contributions are significant. Wilks's lambda is sometimes called the U statistic. In SPSS, this use of Wilks' lambda is in the "Tests of equality of group means" table in DA output.
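For a single predictor, the variable Wilks' lambda is simply the within-groups sum of squares divided by the total sum of squares. A sketch with made-up data:

```python
# Variable Wilks' lambda for one predictor: within-groups SS / total SS.
# A small lambda means group means differ, i.e. the variable discriminates.
def wilks_lambda(groups):
    """groups: list of lists, each holding the predictor's values in one group."""
    allv = [x for g in groups for x in g]
    grand = sum(allv) / len(allv)
    ss_total = sum((x - grand) ** 2 for x in allv)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return ss_within / ss_total

lam = wilks_lambda([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])
# Group means 2 and 6, grand mean 4; SS_within = 4, SS_total = 28.
print(round(lam, 4))  # 0.1429 -- well-separated groups, small lambda
```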

o Dichotomous independents are more accurately tested with a chi-square test than with Wilks' lambda for this purpose.

Measuring strength of relationships

Classification functions: There are multiple methods of actually classifying cases in MDA. Simple classification, also known as Fisher's classification function, simply uses the unstandardized discriminant coefficients. Generalized distance functions are based on the Mahalanobis distance, D-square, of each case to each of the group centroids. K-nearest-neighbor discriminant analysis (KNN) is a nonparametric method which assigns a new case to the group to which its k nearest neighbors belong. The KNN method is popular when there are inadequate data to define the sample means and covariance matrices. There are other methods of classification as well.

The classification table, also called a classification matrix, or a confusion, assignment, or prediction matrix or table, is used to assess the performance of DA. This is simply a table in which the rows are the observed categories of the dependent and the columns are the predicted categories of the dependents. When prediction is perfect, all cases will lie on the diagonal. The percentage of cases on the diagonal is the percentage of correct classifications. This percentage is called the hit ratio.
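Building the classification table and hit ratio from observed and predicted categories can be sketched as follows (the observed/predicted values are illustrative):

```python
# Classification (confusion) table and hit ratio: percent of cases on the
# diagonal, where observed category == predicted category. Data illustrative.
observed  = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
predicted = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]

table = {}  # (observed, predicted) -> count
for o, p in zip(observed, predicted):
    table[(o, p)] = table.get((o, p), 0) + 1

hits = sum(n for (o, p), n in table.items() if o == p)
hit_ratio = 100 * hits / len(observed)
print(table, hit_ratio)  # 8 of 10 cases on the diagonal -> hit ratio 80.0
```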


o Expected hit ratio. Note that the hit ratio must be compared not to zero but to the percent that would have been correctly classified by chance alone. For two-group discriminant analysis with a 50-50 split in the dependent variable, the expected percent is 50%. For unequally split 2-way groups of different sizes, the expected percent is computed in the "Prior Probabilities for Groups" table in SPSS, by multiplying the prior probabilities times the group size, summing for all groups, and dividing the sum by N. If group sizes are known a priori, the best strategy by chance is to pick the largest group for all cases, so the expected percent is then the largest group size divided by N.
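Both chance baselines can be computed directly; the group sizes below mirror the voted/did-not-vote split in the SPSS example later in this document:

```python
# Chance-expected hit ratios for a two-group DA. With proportional priors,
# multiply each prior by its group size, sum, and divide by N; with known
# sizes, the best naive strategy is to always pick the largest group.
n = [971, 374]                       # group sizes (voted, did not vote)
N = sum(n)
priors = [g / N for g in n]          # proportional prior probabilities
proportional = 100 * sum(p * g for p, g in zip(priors, n)) / N
max_chance = 100 * max(n) / N
print(round(proportional, 1), round(max_chance, 1))  # 59.9 72.2
```

A model's observed hit ratio should be judged against these baselines, not against zero.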

o Cross-validation. Leave-one-out classification is available as a form of cross-validation of the classification table. Under this option, each case is classified using a discriminant function based on all cases except the given case. This is thought to give a better estimate of what classification results would be in the population. In SPSS, select Analyze, Classify, Discriminant; select variables; click Classify; select Leave-one-out classification; Continue; OK.
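The leave-one-out idea can be illustrated with a toy one-variable nearest-centroid classifier; this is a sketch of the cross-validation logic only, not SPSS's exact classification algorithm, and the data are made up:

```python
# Leave-one-out cross-validation: classify each case using group centroids
# computed from all OTHER cases. Classifier and data are illustrative.
def nearest_centroid(x, cases):
    """cases: list of (value, group). Classify x by the nearer group mean."""
    means = {}
    for g in {grp for _, grp in cases}:
        vals = [v for v, grp in cases if grp == g]
        means[g] = sum(vals) / len(vals)
    return min(means, key=lambda g: abs(x - means[g]))

data = [(1.0, 0), (1.5, 0), (2.0, 0), (5.0, 1), (5.5, 1), (9.0, 1)]
loo_hits = sum(
    nearest_centroid(v, data[:i] + data[i + 1:]) == g
    for i, (v, g) in enumerate(data)
)
print(loo_hits, "of", len(data))  # all 6 cases correctly cross-classified
```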

o Measures of association can be computed by the crosstabs procedure in SPSS if the researcher saves the predicted group membership for all cases. In SPSS, select Analyze, Classify, Discriminant; select variables; click Save; select Predicted group membership; Continue; OK.

Mahalanobis D-Square, Rao's V, Hotelling's trace, Pillai's trace, and Roy's gcr are indexes other than Wilks' lambda of the extent to which the discriminant functions discriminate between criterion groups. Each has an associated significance test. A measure from this group is sometimes used in stepwise discriminant analysis to determine if adding an independent variable to the model will significantly improve classification of the dependent variable. SPSS uses Wilks' lambda by default but also offers Mahalanobis distance, Rao's V, unexplained variance, and smallest F ratio.

Canonical correlation, Rc: The squared canonical correlation, Rc2, is the percent of variation in the dependent discriminated by the set of independents in DA or MDA. The canonical correlation of each discriminant function is also the correlation of that function with the discriminant scores. A canonical correlation close to 1 means that nearly all the variance in the discriminant scores can be attributed to group differences. The canonical correlation of any discriminant function is displayed in SPSS by default as a column in the "Eigenvalues" output table. Note that the canonical correlations are not the same as the correlations in the structure matrix, discussed below.

Interpreting the discriminant functions

Structure coefficients and structure matrix: Structure coefficients, also called structure correlations or discriminant loadings, are the correlations between a given independent variable and the discriminant scores associated with a given discriminant function. They are used to tell how closely a variable is related to each function in MDA. Looking at all the structure coefficients for a function allows the researcher to assign a label to the dimension it measures, much like factor loadings in factor analysis. A table of the structure coefficients of each variable with each discriminant function is called a canonical structure matrix or factor structure matrix. The structure coefficients are whole (not partial) coefficients, similar to correlation coefficients, and reflect the uncontrolled association of the discriminating variables with the criterion variable, whereas the discriminant coefficients are partial coefficients reflecting the unique, controlled association of the discriminating variables with the criterion variable, controlling for other variables in the equation.

Technically, structure coefficients are pooled within-groups correlations between the independent variables and the standardized canonical discriminant functions. When the dependent has more than two categories there will be more than one discriminant function. In that case, there will be multiple columns in the table, one for each function. The correlations then serve like factor loadings in factor analysis -- by considering the set of variables that load most heavily on a given dimension, the researcher may infer a suitable label for that dimension. The structure matrix correlations appear in SPSS output in the "Structure Matrix" table, produced by default under Analyze, Classify, Discriminant.

Thus for two-group DA, the structure coefficients show the order of importance of the discriminating variables by total correlation, whereas the standardized discriminant coefficients show the order of importance by unique contribution. The sign of the structure coefficient also shows the direction of the relationship. For multiple discriminant analysis, the structure coefficients additionally allow the researcher to see the relative importance of each independent variable on each dimension.

Structure coefficients vs. standardized discriminant function coefficients. The standardized discriminant function coefficients indicate the semi-partial contribution (the unique, controlled association) of each variable to the discriminant function(s), controlling the independent but not the dependent for other independents entered in the equation (just as regression coefficients are semi-partial coefficients). In contrast, structure coefficients are whole (not partial) coefficients, similar to correlation coefficients, and reflect the uncontrolled association of the discriminant scores with the criterion variable. That is, the structure coefficients indicate the simple correlations between the variables and the discriminant function or functions. The structure coefficients should be used to assign meaningful labels to the discriminant functions. The standardized discriminant function coefficients should be used to assess the importance of each independent variable's unique contribution to the discriminant function.
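As a simplified sketch, a structure coefficient is a correlation between one variable and the discriminant scores. The version below uses the simple total-sample Pearson correlation, whereas SPSS reports the pooled within-groups version; the data are illustrative:

```python
# Structure coefficient (simplified): Pearson correlation between one
# discriminating variable and the discriminant scores. Values illustrative.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

variable = [2.0, 4.0, 6.0, 8.0]     # one independent variable, four cases
scores   = [-1.1, -0.2, 0.4, 1.3]   # discriminant scores for the same cases
print(round(pearson_r(variable, scores), 3))  # 0.997: loads heavily on this function
```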

Mahalanobis distances are used in analyzing cases in discriminant analysis. For instance, one might wish to analyze a new, unknown set of cases in comparison to an existing set of known cases. Mahalanobis distance is the distance between a case and the centroid for each group (of the dependent) in attribute space (n-dimensional space defined by n variables). A case will have one Mahalanobis distance for each group, and it will be classified as belonging to the group for which its Mahalanobis distance is smallest. Thus, the smaller the Mahalanobis distance, the closer the case is to the group centroid and the more likely it is to be classed as belonging to that group. Since Mahalanobis distance is measured in terms of standard deviations from the centroid, a case which is more than 1.96 Mahalanobis distance units from the centroid has less than a .05 chance of belonging to the group represented by the centroid; 3 units would likewise correspond to less than a .01 chance. SPSS reports squared Mahalanobis distance: click the Classify button and then check "Casewise results."
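The squared distance can be sketched for two predictors with a hand-inverted 2x2 covariance matrix; all numbers are illustrative:

```python
# Squared Mahalanobis distance of a case from a group centroid in two
# dimensions: D^2 = d' * inv(S) * d. All numbers are illustrative.
def mahalanobis_sq(case, centroid, cov):
    d = [case[0] - centroid[0], case[1] - centroid[1]]
    (a, b), (c, e) = cov
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]   # 2x2 matrix inverse
    t = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]
    return d[0] * t[0] + d[1] * t[1]

cov = [[4.0, 0.0], [0.0, 1.0]]      # within-group covariance matrix
d2 = mahalanobis_sq([6.0, 3.0], [2.0, 1.0], cov)
print(d2)   # (4^2 / 4) + (2^2 / 1) = 8.0
```

Computing D-square against each group's centroid and taking the smallest is exactly the generalized-distance classification rule mentioned earlier.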

Wilks's lambda tests the significance of each discriminant function in MDA -- specifically the significance of the eigenvalue for a given function. It is a measure of the difference between groups in the centroid (vector) of means on the independent variables. The smaller the lambda, the greater the differences. Lambda varies from 0 to 1, with 0 meaning group means differ (the more the variable differentiates the groups) and 1 meaning all group means are the same. The Bartlett's V transformation of lambda is then used to compute the significance of lambda. Wilks's lambda is used, in conjunction with Bartlett's V, as a multivariate significance test of mean differences in MDA, for the case of multiple interval independents and multiple (>2) groups formed by the dependent. Wilks's lambda is sometimes called the U statistic.

Validation

A hold-out sample is often used for validation of the discriminant function. This is a split-halves test, where a portion of the cases are assigned to the analysis sample for purposes of training the discriminant function; it is then validated by assessing its performance on the remaining cases in the hold-out sample.

Discriminant Function Analysis (Two Groups): SPSS Output

Notes

This example is from the SPSS 7.5 "Applications Guide" example for file "gss 93 subset.sav". The dependent is vote92. The independents are age, educ, income91, sex, and polviews (a 7-point Likert scale from "Extremely liberal" to "Extremely conservative").

To obtain this output:

1. File, Open, point to gss 93 subset.sav.
2. Restrict vote92 to 1's and 2's by choosing Data, Select Cases, "If condition is satisfied". Click the If button and enter vote92 < 3. Click Continue, OK.
3. Statistics, Classify, Discriminant.
4. Select vote92 as the "grouping variable" (the dependent). As independents, select age, educ, income91, sex, and polviews. Check "Enter independents together" (i.e., not stepwise).
5. Click on Statistics and check all Descriptives and all Function Coefficients.
6. Click on Classify and check Results (limit to first 10), Summary Table, and all plots.
7. To run, click OK.

Comments in blue are by the instructor and are not part of SPSS output.

Discriminant

First come several blocks of general processing and descriptive statistics information.

Notes

Output Created: 02 Mar 98 14:11:35
Comments:
Input Data: Y:\PC\spss95\GSS93 subset.sav
Filter: vote92 < 3 (FILTER)
Weight: <none>
Split File: <none>


N of Rows in Working Data File: 1452

Missing Value Handling

Definition of Missing: User-defined missing values are treated as missing in the analysis phase.

Cases Used: In the analysis phase, cases with no user- or system-missing values for any predictor variable are used. Cases with user-, system-missing, or out-of-range values for the grouping variable are always excluded.

Syntax

DISCRIMINANT
  /GROUPS=vote92(1 2)
  /VARIABLES=sex age educ income91 polviews
  /ANALYSIS ALL
  /PRIORS EQUAL
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW TABLE
  /PLOT=COMBINED SEPARATE MAP
  /PLOT=CASES(10)
  /CLASSIFY=NONMISSING POOLED .

Resources: Elapsed Time 0:00:01.21

Analysis Case Processing Summary

Unweighted Cases                                                 N    Percent
Valid                                                         1345       92.6
Excluded: Missing or out-of-range group codes                    0         .0
Excluded: At least one missing discriminating variable         107        7.4
Excluded: Both missing or out-of-range group codes and
  at least one missing discriminating variable                   0         .0
Excluded: Total                                                107        7.4
Total                                                         1452      100.0


Group Statistics

MeanStd.

Deviation

Valid N (listwise)

Voting in 1992 Election

Unweighted Weighted

voted

Respondent's Sex 1.55 .50 971 971.000

Age of Respondent 47.56 16.73 971 971.000

Highest Year of School Completed

13.64 2.97 971 971.000

Total family Income 15.51 5.00 971 971.000

Think of Self as Liberal or Conservative

4.19 1.41 971 971.000

did not vote

Respondent's Sex 1.57 .50 374 374.000

Age of Respondent 41.64 17.34 374 374.000

Highest Year of School Completed

11.84 2.84 374 374.000

Total Family Income 12.60 5.77 374 374.000

Think of Self as Liberal or Conservative

4.10 1.21 374 374.000

Total

Respondent's Sex 1.56 .50 1345 1345.000

Age of Respondent 45.91 17.10 1345 1345.000

Highest Year of School Completed

13.14 3.04 1345 1345.000

Total Family Income 14.70 5.39 1345 1345.000

Think of Self as Liberal or Conservative

4.17 1.36 1345 1345.000

In the ANOVA table below, the smaller the Wilks's lambda, the more important the independent variable is to the discriminant function. Wilks's lambda is significant by the F test for age, educ, and income91. We might consider dropping sex and polviews from the model.
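For a single predictor with two groups, the univariate F statistic and Wilks's lambda are two views of the same quantity, related by F = ((1 − Λ)/Λ)·(df2/df1). A minimal sketch, using the figures printed in the table below, shows one can be recovered from the other:

```python
# Relation between the univariate F and Wilks's lambda for one predictor:
# lambda = df2 / (df2 + df1 * F).  Values are taken from the SPSS table.
def wilks_from_f(F, df1, df2):
    return df2 / (df2 + df1 * F)

# Age of Respondent: F = 33.197, df1 = 1, df2 = 1343
print(round(wilks_from_f(33.197, 1, 1343), 3))   # 0.976, matching the table
# Highest Year of School Completed: F = 101.620
print(round(wilks_from_f(101.620, 1, 1343), 3))  # 0.930, matching the table
```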

Tests of Equality of Group Means

Wilks' Lambda F df1 df2 Sig.

Respondent's Sex 1.000 .418 1 1343 .518

Age of Respondent .976 33.197 1 1343 .000

Highest Year of School Completed .930 101.620 1 1343 .000

Total Family Income .941 83.840 1 1343 .000

Think of Self as Liberal or Conservative .999 1.423 1 1343 .233

Analysis 1

Box's Test of Equality of Covariance Matrices

The larger the log determinant in the table below, the more that group's covariance matrix differs. The "Rank" column indicates the number of independent variables -- 5 in this case. Since discriminant analysis assumes homogeneity of covariance matrices between groups, we would like to see the determinants be relatively equal. Box's M, next, tests the homogeneity of covariances assumption.

Log Determinants

Voting in 1992 Election   Rank   Log Determinant

voted 5 10.006

did not vote 5 9.945

Pooled within-groups 5 10.019

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M tests the assumption of homogeneity of covariance matrices. The test is very sensitive to violations of the companion assumption of multivariate normality. Discriminant function analysis is robust even when the homogeneity-of-variances assumption is not met, provided the data do not contain important outliers. For the data below, the test is significant, so we conclude the groups differ in their covariance matrices, violating an assumption of DA. Note that when n is large, as it is here, small deviations from homogeneity will be found significant, which is why Box's M must be interpreted in conjunction with inspection of the log determinants above.
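The "Log Determinant" column above is simply the natural log of the determinant of each group's covariance matrix of the predictors. A minimal sketch with made-up numbers (not the GSS data) illustrates the computation for a 2 × 2 case:

```python
import math

# For a 2 x 2 covariance matrix [[a, b], [b, c]] the determinant is
# a*c - b*b; SPSS reports its natural log.  Toy values, for illustration:
a, b, c = 4.0, 1.5, 9.0      # two variances and their covariance
det = a * c - b * b          # 33.75
log_det = math.log(det)
print(round(log_det, 3))     # 3.519
```

In practice one would use a numerically stable routine (e.g. a log-determinant function from a linear algebra library) on the full 5 × 5 matrix.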

Box's M   40.399

F

Approx. 2.679

df1 15

df2 2102169.732

Sig. .000

Tests null hypothesis of equal population covariance matrices.

Summary of Canonical Discriminant Functions

The table below shows the eigenvalues. The larger the eigenvalue, the more of the variance in the dependent variable is explained by that function. Since the dependent in this example has only two categories, there is only one discriminant function. However, if there were more categories, we would have multiple discriminant functions and this table would list them in descending order of importance. The second column lists the percent of variance explained by each function. The third column is the cumulative percent of variance explained. The last column is the canonical correlation, where the squared canonical correlation is the percent of variation in the dependent discriminated by the independents in DA. Sometimes this table is used to decide how many functions are important (e.g., eigenvalues over 1, percent of variance more than 5%, cumulative percentage over 75%, canonical correlation of .6 or more). This issue does not arise here since there is only one discriminant function, though we may note its canonical correlation is not high.

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation

1 .164(a) 100.0 100.0 .376

a First 1 canonical discriminant functions were used in the analysis.
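The eigenvalue and the canonical correlation in the table above are linked: the squared canonical correlation equals eigenvalue / (1 + eigenvalue). A quick check against the printed values:

```python
from math import sqrt

# Squared canonical correlation = eigenvalue / (1 + eigenvalue).
eigenvalue = 0.164                      # from the Eigenvalues table
canonical_r = sqrt(eigenvalue / (1 + eigenvalue))
print(round(canonical_r, 3))            # 0.375 -- the table's .376 reflects
                                        # the unrounded eigenvalue
```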

This second appearance of Wilks's lambda serves a different purpose than its use in the ANOVA table above. In the table below it tests the significance of the eigenvalue for each discriminant function. In this example there is only one, and it is significant.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.

1 .859 203.909 5 .000

The standardized discriminant function coefficients in the table below serve the same purpose as beta weights in multiple regression: they indicate the relative importance of the independent variables in predicting the dependent.

Standardized Canonical Discriminant Function Coefficients

Function

1

Respondent's Sex .011

Age of Respondent .657

Highest Year of School Completed .712

Total Family Income .423

Think of Self as Liberal or Conservative .018

The structure matrix table below shows the correlations of each variable with each discriminant function. In this case, there is only one discriminant function. However, when the dependent has more categories there will be more discriminant functions, and the table will have additional columns, one for each function. The correlations then serve like factor loadings in factor analysis -- that is, by identifying the largest absolute correlations associated with each discriminant function the researcher gains insight into how to name each function.

Structure Matrix

Function

1

Highest Year of School Completed .679

Total family Income .616

Age of Respondent .388

Think of Self as Liberal or Conservative .080

Respondent's Sex -.044

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

The table below contains the unstandardized discriminant function coefficients. These are used like unstandardized b (regression) coefficients in multiple regression -- that is, they are used to construct the actual prediction equation, which can be used to classify new cases.

Canonical Discriminant Function Coefficients

Function

1

Respondent's Sex .021

Age of Respondent .039

Highest Year of School Completed .243

Total Family Income .081

Think of Self as Liberal or Conservative .013

(Constant) -6.253

Unstandardized coefficients
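The coefficients above can be assembled into the prediction equation directly. A minimal sketch (the argument names and the example case are illustrative, not from the data file):

```python
# Discriminant score = sum of coefficient * value for each predictor,
# plus the constant.  Coefficients are taken from the table above.
def discriminant_score(sex, age, educ, income, polviews):
    return (0.021 * sex + 0.039 * age + 0.243 * educ
            + 0.081 * income + 0.013 * polviews - 6.253)

# Hypothetical case: sex code 2, age 50, 14 years of school,
# income category 15, polviews 4.
print(round(discriminant_score(2, 50, 14, 15, 4), 3))   # 0.408
```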

The table below is used to establish the cutting point for classifying cases. If the two groups are of equal size, the best cutting point is halfway between the values of the functions at the group centroids (that is, their average). If the groups are unequal, the optimal cutting point is the weighted average of the two values. Cases whose discriminant score falls above the cutting point are classified as "voted," while those below the cutting point are classified as "did not vote." Of course, the computer does the classification automatically, so these values are for informational purposes.

Functions at Group Centroids

Function

Voting in 1992 Election 1

voted .251

did not vote -.653

Unstandardized canonical discriminant functions evaluated at group means
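As a sketch of the rule described above, using the centroid values and group sizes from this output:

```python
# Cutting point between the two group centroids.
voted, did_not = 0.251, -0.653      # centroids from the table above
n_voted, n_did_not = 971, 374       # group sizes from the output

midpoint = (voted + did_not) / 2
weighted = (n_voted * voted + n_did_not * did_not) / (n_voted + n_did_not)
print(round(midpoint, 3))           # -0.201
# The size-weighted average is approximately 0, since the discriminant
# function is centered at the grand mean of the sample.
```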

Classification Statistics

The table below just tells the researcher about the status of cases in terms of processing.

Classification Processing Summary

Processed   1452

Excluded

Missing or out-of-range group codes 0

At least one missing discriminating variable 107

Used in Output 1345

The prior probabilities below are used in classification. Priors may be based on observed group sizes (marginals) in the sample, which is usually preferable when group sizes differ, or equal priors may be specified for all groups. In this run equal priors were specified (/PRIORS EQUAL in the syntax above), as the table shows, even though the group sizes are unequal.

Prior Probabilities for Groups

Voting in 1992 Election   Prior   Cases Used in Analysis: Unweighted   Weighted

voted .500 971 971.000

did not vote .500 374 374.000

Total 1.000 1345 1345.000

The table below is the result of checking "Fisher's" under "Function Coefficients" in the "Statistics" option of discriminant analysis. Two sets of unstandardized linear discriminant coefficients (one for each dependent group) are calculated, which can be used to classify cases. This is the classical method of classification, though it is now little used.

Classification Function Coefficients

Voting in 1992 Election

voted did not vote

Respondent's Sex 7.048 7.029

Age of Respondent .250 .215

Highest Year of School Completed 1.884 1.664

Total Family Income .319 .246

Think of Self as Liberal or Conservative 2.187 2.175

(Constant) -32.013 -26.541

Fisher's linear discriminant functions
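Classification with Fisher's functions means computing one score per group and assigning the case to the group with the larger score. A minimal sketch using the coefficients above (the example case is hypothetical):

```python
# One set of Fisher coefficients per group, from the table above.
coef = {
    "voted":        dict(sex=7.048, age=0.250, educ=1.884,
                         income=0.319, polviews=2.187),
    "did not vote": dict(sex=7.029, age=0.215, educ=1.664,
                         income=0.246, polviews=2.175),
}
const = {"voted": -32.013, "did not vote": -26.541}

def classify(case):
    # Score each group, then pick the group with the largest score.
    scores = {g: sum(c[k] * case[k] for k in case) + const[g]
              for g, c in coef.items()}
    return max(scores, key=scores.get)

print(classify(dict(sex=2, age=50, educ=14, income=15, polviews=4)))  # voted
```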

The table below results from checking "Casewise results" in the "Classify" options of discriminant function analysis. The table lists the actual group; the predicted group, based on the largest posterior probability; the conditional probability P(D>d | G=g) (the probability of a discriminant score this far from the group centroid, given membership in the predicted group); the posterior probability P(G=g | D=d) (the chance the case belongs to the predicted group, given its scores on the independents); the squared Mahalanobis distance of the case to the group centroid (large values indicate outliers); and the discriminant score for the case. The case is classified based on its discriminant score in relation to the cutoff (not shown). Misclassified cases are marked with asterisks. The "Second Highest Group" columns show the posterior probability and Mahalanobis distance the case would have for the group with the second-highest posterior probability. Since there are only two groups in this example, the "second highest" group is simply the other group.
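With equal priors, the posterior probability for each group is proportional to exp(−d²/2), where d² is the squared Mahalanobis distance to that group's centroid. A sketch reproducing case 1 from the table below (d² = .082 to the predicted group's centroid, .381 to the other group's):

```python
from math import exp

# Posterior P(G=g | D=d) proportional to exp(-d2 / 2) under equal priors.
d2 = {"predicted": 0.082, "other": 0.381}       # case 1's distances
w = {g: exp(-v / 2) for g, v in d2.items()}
total = sum(w.values())
post = {g: v / total for g, v in w.items()}
print({g: round(p, 3) for g, p in post.items()})
# {'predicted': 0.537, 'other': 0.463} -- matching the table's .537/.463
```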

Casewise Statistics

Case Number   Actual Group   Predicted Group   P(D>d | G=g) p   df   P(G=g | D=d)   Squared Mahalanobis Distance to Centroid   Second Highest Group   P(G=g | D=d)   Squared Mahalanobis Distance to Centroid   Discriminant Score (Function 1)

Original 1 1 2(**) .774 1 .537 .082 1 .463 .381 -.366

2 1 1 .561 1 .718 .339 2 .282 2.208 .833

3 1 1 .528 1 .727 .399 2 .273 2.358 .883

4 1 1 .445 1 .750 .583 2 .250 2.780 1.015

5 1 1 .015 1 .932 5.941 2 .068 11.165 2.689

6 1 1 .622 1 .701 .243 2 .299 1.951 .744

7 1 1 .878 1 .634 .024 2 .366 1.119 .405

8 2 1(**) .835 1 .645 .044 2 .355 1.238 .460

9 1 1 .390 1 .766 .738 2 .234 3.107 1.110

10 1 1 .430 1 .755 .624 2 .245 2.870 1.041

** Misclassified case

Separate-Groups Graphs

The charts below result from checking "Combined-groups" and "Separate-groups" under "Plots" in the "Classify" options of discriminant analysis. If there were two or more discriminant functions, the charts would be scatterplots showing the relation of the first two discriminant functions. As the dependent in this example yields only one discriminant function, bar charts are displayed instead. For a good discriminant function, each bar chart will have most cases near the group mean, with small tails.

The table below is used to assess how well the discriminant function works, and whether it works equally well for each group of the dependent variable. Here it correctly classifies about two-thirds of the cases, making about the same proportion of mistakes for both categories. This would normally not be considered a satisfactory level of discrimination, and the researcher would seek to test other models.

Classification Results(a)

Voting in 1992 Election   Predicted: voted   Predicted: did not vote   Total

Original

Count

voted 649 322 971

did not vote 121 253 374

%

voted 66.8 33.2 100.0

did not vote 32.4 67.6 100.0

a 67.1% of original grouped cases correctly classified.
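The footnoted hit rate is just the diagonal of the count table over the total:

```python
# Overall percent correct = (correctly classified cases) / (all cases).
correct = 649 + 253          # diagonal counts from the table above
total = 971 + 374            # all cases used
print(round(100 * correct / total, 1))   # 67.1, as footnote (a) reports
```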

Discriminant Function Analysis (Three Groups): SPSS Output

Notes

This example is from the SPSS 7.5 "Applications Guide" example for file "gss 93 subset.sav". The dependent is "race." The independents are agewed, educ, rincom91, sibs, rap, polviews (a 7-point Likert scale from "Extremely liberal" to "Extremely conservative"), and marital.

To obtain this output:

1. File, Open, point to gss 93 subset.sav.
2. Statistics, Classify, Discriminant.
3. Select race as the "grouping variable" (the dependent). As independents, select agewed, educ, rincom91, sibs, rap, polviews, and marital. Check "Enter independents together" (i.e., not stepwise).
4. Click on Statistics and check Univariate ANOVAs and Box's M.
5. Click on Classify and check Compute from group sizes, Summary table, and all plots.
6. To run, click OK.

Comments in blue are by the instructor and are not part of SPSS output.

Discriminant

First come several blocks of general processing and descriptive statistics information.

Notes

Output Created 03 Mar 98 13:12:51

Comments

Input

Data Y:\PC\spss95\GSS93 subset.sav

Filter <none>

Weight <none>

Split File <none>

N of Rows in Working Data File

1500

Missing Value Handling

Definition of Missing

User-defined missing values are treated as missing in the analysis phase.

Cases Used

In the analysis phase, cases with no user- or system-missing values for any predictor variable are used. Cases with user-, system-missing, or out-of-range values for the grouping variable are always excluded.

Syntax

DISCRIMINANT/GROUPS=race(1 3)/VARIABLES=agewed educ rincom91 sibs rap polviews marital/ANALYSIS ALL/PRIORS SIZE/STATISTICS=UNIVF BOXM TABLE/PLOT=COMBINED SEPARATE MAP/CLASSIFY=NONMISSING POOLED .

Resources Elapsed Time 0:00:02.20

Analysis Case Processing Summary

Unweighted Cases N Percent

Valid 732 48.8

Excluded

Missing or out-of-range group codes 0 .0

At least one missing discriminating variable 768 51.2

Both missing or out-of-range group codes and at least one missing discriminating variable

0 .0

Total 768 51.2

Total 1500 100.0

Group Statistics

Valid N (listwise)

Race of Respondent   Unweighted   Weighted

white

Age When First Married 623 623.000

Highest Year of School Completed 623 623.000

Respondent's Income 623 623.000

Number of Brothers and Sisters 623 623.000

Rap Music 623 623.000

Think of Self as Liberal or Conservative 623 623.000

Marital Status 623 623.000

black

Age When First Married 73 73.000

Highest Year of School Completed 73 73.000

Respondent's Income 73 73.000

Number of Brothers and Sisters 73 73.000

Rap Music 73 73.000

Think of Self as Liberal or Conservative 73 73.000

Marital Status 73 73.000

other

Age When First Married 36 36.000

Highest Year of School Completed 36 36.000

Respondent's Income 36 36.000

Number of Brothers and Sisters 36 36.000

Rap Music 36 36.000

Think of Self as Liberal or Conservative 36 36.000

Marital Status 36 36.000

Total

Age When First Married 732 732.000

Highest Year of School Completed 732 732.000

Respondent's Income 732 732.000

Number of Brothers and Sisters 732 732.000

Rap Music 732 732.000

Think of Self as Liberal or Conservative 732 732.000

Marital Status 732 732.000

In the ANOVA table below, the smaller the Wilks's lambda, the more important the independent variable is to the discriminant function. Wilks's lambda is significant by the F test for all variables except rincom91 and polviews, which we might consider dropping from the model.

Tests of Equality of Group Means

Wilks' Lambda F df1 df2 Sig.

Age When First Married .992 3.118 2 729 .045

Highest Year of School Completed .990 3.648 2 729 .027

Respondent's Income .996 1.459 2 729 .233

Number of Brothers and Sisters .937 24.567 2 729 .000

Rap Music .946 20.793 2 729 .000

Think of Self as Liberal or Conservative .994 2.221 2 729 .109

Marital Status .981 6.992 2 729 .001

Analysis 1

Box's Test of Equality of Covariance Matrices

Log Determinants

Race of Respondent   Rank   Log Determinant

white 7 10.002

black 7 12.168

other 7 11.980

Pooled within-groups 7 10.537

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Box's M tests the assumption of homogeneity of covariance matrices. The test is very sensitive to violations of the companion assumption of multivariate normality. For the data below, the test is significant, so we conclude the groups differ in their covariance matrices, violating an assumption of DA. However, discriminant function analysis is robust even when the homogeneity-of-variances assumption is not met, provided the data do not contain important outliers. Also, when n is large, as it is here, small deviations from homogeneity will be found significant.

Test Results

Box's M 165.317

F

Approx. 2.792

df1 56

df2 32394.124

Sig. .000

Tests null hypothesis of equal population covariance matrices.

Summary of Canonical Discriminant Functions

The number of discriminant functions computed is the lesser of g - 1 (the number of dependent groups minus one) and k (the number of independent variables). Since the dependent, race, has three groups, two discriminant functions are computed. The eigenvalues show how much of the variance in the dependent, race, is accounted for by each of the functions. To attach meaning to the functions (as with factors in factor analysis) we will use the structure matrix later in the output. Wilks's lambda shows each function is significant.
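The rule just stated is a one-liner:

```python
# Number of discriminant functions = min(groups - 1, predictors).
g, k = 3, 7               # race has 3 groups; 7 independents in this run
n_functions = min(g - 1, k)
print(n_functions)        # 2
```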

Eigenvalues

Function Eigenvalue % of Variance Cumulative % Canonical Correlation

1 .145(a) 87.6 87.6 .356

2 .021(a) 12.4 100.0 .142

a First 2 canonical discriminant functions were used in the analysis.
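The "% of Variance" column above is each eigenvalue over the sum of all eigenvalues. A check from the printed (rounded) eigenvalues, which therefore only approximates SPSS's figure:

```python
# Percent of variance for function 1 = eigenvalue_1 / sum of eigenvalues.
eigenvalues = [0.145, 0.021]          # rounded values from the table
pct_function1 = 100 * eigenvalues[0] / sum(eigenvalues)
print(round(pct_function1, 1))        # ~87.3; SPSS prints 87.6 because it
                                      # uses the unrounded eigenvalues
```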

Wilks' Lambda

Test of Function(s) Wilks' Lambda Chi-square df Sig.

1 through 2 .856 112.950 14 .000

2 .980 14.783 6 .022

The standardized discriminant function coefficients in the table below serve the same purpose as beta weights in multiple regression: they indicate the relative importance of the independent variables in predicting the dependent.

Standardized Canonical Discriminant Function Coefficients

Function

1 2

Age When First Married .147 .579

Highest Year of School Completed -.211 -.003

Respondent's Income .071 -.255

Number of Brothers and Sisters .674 .246

Rap Music -.644 .135

Think of Self as Liberal or Conservative -.193 .086

Marital Status .190 -.716

The structure matrix table below shows the correlations of each variable with each discriminant function. The correlations serve like factor loadings in factor analysis -- that is, by identifying the largest absolute correlations associated with each discriminant function the researcher gains insight into how to name each function.

Structure coefficients vs. standardized discriminant function coefficients. The standardized discriminant function coefficients (above) indicate the partial contribution of each variable to the discriminant function(s), controlling for the other independents in the equation. The structure coefficients (below) indicate the simple correlations between the variables and the discriminant function or functions. The structure coefficients should be used to assign meaningful labels to the discriminant functions; the standardized coefficients should be used to assess each independent variable's unique contribution to the discriminant function.

You can see from the example below that it is not easy to assign a meaningful label to each function. The first and most important function has to do with siblings, rap music, education, and political views. The second dimension (function) has to do with age when married, marital status, and income. Could these functions be labeled "culture" and "marriage"?

Structure Matrix

Function

1 2

Number of Brothers and Sisters .675(*) .270

Rap Music -.626(*) .112

Highest Year of School Completed -.260(*) .099

Think of Self as Liberal or Conservative -.200(*) .117

Marital Status .232 -.744(*)

Age When First Married .105 .581(*)

Respondent's Income -.156 -.156(*)

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

* Largest absolute correlation between each variable and any discriminant function

The table below is used to establish the cutting points for classifying cases. The optimal cutting point between each pair of groups is the weighted average of their centroid values. The cutting points set ranges of the discriminant scores used to classify cases as white, black, or other. Of course, the computer does the classification automatically, so these values are for informational purposes.

Functions at Group Centroids

Function

Race of Respondent   1   2

white -.158 -6.990E-03

black .982 -.219

other .738 .565

Unstandardized canonical discriminant functions evaluated at group means

Classification Statistics

The table below just tells the researcher about the status of cases in terms of processing.

Classification Processing Summary

Processed 1500

Excluded Missing or out-of-range group codes 0

At least one missing discriminating variable 768

Used in Output 732

Prior Probabilities for Groups

Race of Respondent   Prior   Cases Used in Analysis: Unweighted   Weighted

white .851 623 623.000

black .100 73 73.000

other .049 36 36.000

Total 1.000 732 732.000

The territorial map below is a plot of the boundaries used for classifying cases into groups based on discriminant function scores. It is obtained by checking "Territorial map" in the "Classify" options of discriminant analysis. For the meaning of the symbols, note the legend below the map. For instance, where one sees "13" near the top of the map, this is a point in discriminant space where group 1 (white) is differentiated from group 3 (other) on the two functions.

[Territorial map: ASCII plot of Canonical Discriminant Function 1 (horizontal axis, -3.0 to 3.0) against Canonical Discriminant Function 2 (vertical axis, -3.0 to 3.0), showing the "13" and "12" boundary lines between group regions, with group centroids marked by asterisks.]

Symbols used in territorial map

Symbol   Group   Label
------   -----   -----
1        1       white
2        2       black
3        3       other
*                Indicates a group centroid

Separate-Groups Graphs

The charts below result from checking "Combined-groups" and "Separate-groups" under "Plots" in the "Classify" options of discriminant analysis. Since there are two discriminant functions, the charts are scatterplots showing the discriminant scores of the cases on the two functions. The first three charts show this separately for each of the three race groups, and the fourth shows the same information for the combined groups.

The table below is used to assess how well the discriminant function works, and whether it works equally well for each group of the dependent variable. Here it correctly classifies about 85% of the cases, but this is not as good as it seems. DA gets almost all whites correctly classified; however, it misclassifies most of the "black" and "other" cases. The seemingly high 85% rate is obtained by classifying nearly everyone as white in a sample which is preponderantly white. This is not a satisfactory discriminant analysis. It would be better to train DA on an analysis set which was balanced in terms of the number of people in each race group.
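The point can be made numerically: a model that always predicts the majority class would do nearly as well as this one.

```python
# Majority-class baseline: always predict "white".
counts = {"white": 623, "black": 73, "other": 36}   # group Ns from output
baseline = 100 * max(counts.values()) / sum(counts.values())
print(round(baseline, 1))   # 85.1 -- barely below the 85.4% hit rate
```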

Classification Results(a)

Race of Respondent   Predicted: white   Predicted: black   Predicted: other   Total

Original

Count

white 616 6 1 623

black 64 8 1 73

other 33 2 1 36

%

white 98.9 1.0 .2 100.0

black 87.7 11.0 1.4 100.0

other 91.7 5.6 2.8 100.0

a 85.4% of original grouped cases correctly classified.
