Discriminant Analysis: Basic Relationships


Upload: divyakalsi89

Post on 22-Dec-2014


1. Discriminant Analysis: Basic Relationships (SW388R7 Data Analysis & Computers II, Slide 1)

   Discriminant Functions and Scores
   Describing Relationships
   Classification Accuracy
   Sample Problems

2. Discriminant analysis

Discriminant analysis is used to analyze relationships between a non-metric dependent variable and metric or dichotomous independent variables. Discriminant analysis attempts to use the independent variables to distinguish among the groups or categories of the dependent variable. The usefulness of a discriminant model is based upon its accuracy rate, or ability to predict the known group memberships in the categories of the dependent variable.

3. Discriminant scores

Discriminant analysis works by creating a new variable called the discriminant function score, which is used to predict to which group a case belongs. Discriminant function scores are computed similarly to factor scores, i.e. using eigenvalues. The computations find the coefficients for the independent variables that maximize the separation between the groups defined by the dependent variable, that is, the ratio of between-groups variability to within-groups variability in the scores. The discriminant function is similar to a regression equation in which the independent variables are multiplied by coefficients and summed to produce a score.

4. Discriminant functions

Conceptually, we can think of the discriminant function or equation as defining the boundary between groups. Discriminant scores are standardized, so that if the score falls on one side of the boundary (a standard score less than zero), the case is predicted to be a member of one group, and if the score falls on the other side of the boundary (a positive standard score), it is predicted to be a member of the other group.

5. Number of functions

If the dependent variable defines two groups, one statistically significant discriminant function is required to distinguish the groups; if the dependent variable defines three groups, two statistically significant discriminant functions are required to distinguish among the three groups; etc.
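The score-and-boundary idea in slides 3 and 4 can be sketched for the simplest case of two groups and two predictors. This is a minimal pure-Python illustration of Fisher's two-group linear discriminant, not SPSS's procedure: it assumes equal prior probabilities and hypothetical data, and it centers scores on the midpoint between the group means instead of reproducing SPSS's standardized canonical functions. The function names (`fisher_two_group`, `score`, `classify`) are hypothetical.

```python
# Minimal sketch: Fisher's linear discriminant for two groups and two predictors.
# Assumptions (not from the slides): equal prior probabilities, hypothetical data,
# plain 2x2 algebra. SPSS's standardized canonical functions are not reproduced.


def column_means(rows):
    """Mean of each of the two predictors across the rows of one group."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def score(w, row):
    """Discriminant score: each predictor times its coefficient, summed."""
    return w[0] * row[0] + w[1] * row[1]


def fisher_two_group(group_a, group_b):
    """Return (coefficients, cutoff) for a two-group linear discriminant function."""
    mean_a, mean_b = column_means(group_a), column_means(group_b)

    # Pooled within-groups covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((group_a, mean_a), (group_b, mean_b)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    s = [[value / dof for value in row] for row in s]

    # Coefficients w = S_pooled^-1 (mean_a - mean_b) maximize between/within separation.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    # The boundary passes through the score of the midpoint between the group means.
    midpoint = [(mean_a[0] + mean_b[0]) / 2, (mean_a[1] + mean_b[1]) / 2]
    return w, score(w, midpoint)


def classify(w, cutoff, row):
    """A centered score above zero falls on group A's side of the boundary."""
    return "A" if score(w, row) - cutoff > 0 else "B"


group_a = [(5, 7), (6, 8), (7, 9), (6, 6)]  # hypothetical cases in group A
group_b = [(1, 2), (2, 3), (3, 2), (2, 1)]  # hypothetical cases in group B
w, cutoff = fisher_two_group(group_a, group_b)
print(classify(w, cutoff, (6, 7)), classify(w, cutoff, (2, 2)))  # prints: A B
```

Subtracting the cutoff plays the role of the standardization described in slide 4: the sign of the centered score decides the predicted group.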
If a discriminant function is able to distinguish among groups, it must have a strong relationship to at least one of the independent variables. The number of possible discriminant functions in an analysis is limited to the smaller of the number of independent variables or one less than the number of groups defined by the dependent variable.

6. Overall test of relationship

The overall test of relationship among the independent variables and groups defined by the dependent variable is a series of tests of whether each of the functions needed to distinguish among the groups is statistically significant.

In some analyses, we might discover that two or more of the groups defined by the dependent variable cannot be distinguished using the available independent variables. While it is reasonable to interpret a solution in which there are fewer significant discriminant functions than the maximum number possible, our problems will require that all of the possible discriminant functions be significant.

7. Interpreting the relationship between independent and dependent variables

The interpretative statement about the relationship between the independent variable and the dependent variable is a statement like: cases in group A tended to have higher scores on variable X than cases in group B or group C. This interpretation is complicated by the fact that the relationship is not direct, but operates through the discriminant function. Dependent variable groups are distinguished by scores on discriminant functions, not on values of independent variables. The scores on functions are based on the values of the independent variables multiplied by the function coefficients.
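The counting rule from slide 5, together with this course's requirement that every possible function be significant, fits in a few lines. A minimal sketch; both function names are hypothetical helpers, not SPSS features.

```python
def max_discriminant_functions(n_predictors: int, n_groups: int) -> int:
    """Smaller of the number of predictors and one less than the number of groups."""
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_predictors, n_groups - 1)


def meets_function_requirement(n_significant: int, n_predictors: int, n_groups: int) -> bool:
    """Course rule (slide 6): every possible discriminant function must be significant."""
    return n_significant == max_discriminant_functions(n_predictors, n_groups)


# Welfare example: 3 groups and 4 predictors allow 2 functions.
print(max_discriminant_functions(4, 3))  # 2
# X-rated movie problem: 2 groups and 4 predictors allow 1 function.
print(max_discriminant_functions(4, 2))  # 1
```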
8. Groups, functions, and variables

To interpret the relationship between an independent variable and the dependent variable, we must first identify how the discriminant functions separate the groups, and then identify the role the independent variable plays in each function.

SPSS provides a table called "Functions at Group Centroids" (multivariate means) that indicates which groups are separated by which functions. SPSS provides another table called the "Structure Matrix" which, like its counterpart in factor analysis, identifies the loading, or correlation, between each independent variable and each function. This tells us which variables to interpret for each function. Each variable is interpreted on the function that it loads most highly on.

9. Functions at Group Centroids

In order to specify the role that each independent variable plays in predicting group membership on the dependent variable, we must link together the relationship between the discriminant functions and the groups defined by the dependent variable, the role of the significant independent variables in the discriminant functions, and the differences in group means for each of the variables.

Functions at Group Centroids

                        Function
  WELFARE             1         2
  1 TOO LITTLE      -.220      .235
  2 ABOUT RIGHT      .446     -.031
  3 TOO MUCH        -.311     -.362

  Unstandardized canonical discriminant functions evaluated at group means

Function 1 separates survey respondents who thought we spend about the right amount of money on welfare (the positive value of 0.446) from survey respondents who thought we spend too much (negative value of -0.311) or too little money (negative value of -0.220) on welfare.

Function 2 separates survey respondents who thought we spend too little money on welfare (positive value of 0.235) from survey respondents who thought we spend too much money (negative value of -0.362) on welfare. We ignore the second group (-0.031) in this comparison because it was distinguished from the other two groups by function 1.
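Reading the centroid table can be partly mechanized. The sketch below is only a first-pass heuristic, an assumption rather than the course's rule: for each function it reports the groups with the highest and lowest centroid, using the values copied from the table in slide 9. The analyst still decides, as the slide does, whether a function contrasts one group with all the others or just two groups.

```python
# Group centroids copied from the "Functions at Group Centroids" table (slide 9).
CENTROIDS = {
    "1 TOO LITTLE": (-0.220, 0.235),
    "2 ABOUT RIGHT": (0.446, -0.031),
    "3 TOO MUCH": (-0.311, -0.362),
}


def extreme_groups(centroids, function_number):
    """Return (highest, lowest) group centroid on a 1-based function number."""
    k = function_number - 1
    ordered = sorted(centroids, key=lambda group: centroids[group][k])
    return ordered[-1], ordered[0]


for f in (1, 2):
    high, low = extreme_groups(CENTROIDS, f)
    print(f"Function {f}: {high} (highest) vs {low} (lowest)")
```

For function 1 the heuristic flags ABOUT RIGHT versus TOO MUCH, while the slide reads the function as ABOUT RIGHT versus both negative groups; for function 2 it agrees with the slide's TOO LITTLE versus TOO MUCH contrast.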
10. Structure Matrix

Structure Matrix

                                        Function
                                        1          2
  HIGHEST YEAR OF SCHOOL COMPLETED      .687*      .136
  NUMBER OF HOURS WORKED LAST WEEK     -.582*      .345
  R SELF-EMP OR WORKS FOR SOMEBODY      .223       .889*
  RESPONDENTS INCOME (a)                .101       .292*

  Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
  *. Largest absolute correlation between each variable and any discriminant function.
  a. This variable not used in the analysis.

We do not interpret loadings in the structure matrix unless they are 0.30 or higher.

Based on the structure matrix, the predictor variables strongly associated with discriminant function 1, which distinguished between survey respondents who thought we spend about the right amount of money on welfare and survey respondents who thought we spend too much or too little money on welfare, were number of hours worked in the past week (r=-0.582) and highest year of school completed (r=0.687).

Based on the structure matrix, the predictor variable strongly associated with discriminant function 2, which distinguished between survey respondents who thought we spend too little money on welfare and survey respondents who thought we spend too much money on welfare, was self-employment (r=0.889).

11. Group Statistics

Group Statistics

                                                                  Valid N (listwise)
  WELFARE / Variable                        Mean    Std. Deviation   Unweighted   Weighted
  1 TOO LITTLE
    NUMBER OF HOURS WORKED LAST WEEK        43.96   13.240            56           56.000
    HIGHEST YEAR OF SCHOOL COMPLETED        13.73    2.401            56           56.000
    R SELF-EMP OR WORKS FOR SOMEBODY         1.93     .260            56           56.000
    RESPONDENTS INCOME                      13.70    5.034            56           56.000
  2 ABOUT RIGHT
    NUMBER OF HOURS WORKED LAST WEEK        37.90   13.235            50           50.000
    HIGHEST YEAR OF SCHOOL COMPLETED        14.78    2.558            50           50.000
    R SELF-EMP OR WORKS FOR SOMEBODY         1.90     .303            50           50.000
    RESPONDENTS INCOME                      14.00    5.503            50           50.000
  3 TOO MUCH
    NUMBER OF HOURS WORKED LAST WEEK        42.03   10.456            32           32.000
    HIGHEST YEAR OF SCHOOL COMPLETED        13.38    2.524            32           32.000
    R SELF-EMP OR WORKS FOR SOMEBODY         1.75     .440            32           32.000
    RESPONDENTS INCOME                      14.75    5.304            32           32.000
  Total
    NUMBER OF HOURS WORKED LAST WEEK        41.32   12.846           138          138.000
    HIGHEST YEAR OF SCHOOL COMPLETED        14.03    2.537           138          138.000
    R SELF-EMP OR WORKS ...

The average number of hours worked in the past week for survey respondents who thought we spend about the right amount of money on welfare (mean=37.90) was lower than the average number of hours worked in the past week for survey respondents who thought we spend too little money on welfare (mean=43.96) and survey respondents who thought we spend too much money on welfare (mean=42.03).

This enables us to make the statement: "survey respondents who thought we spend about the right amount of money on welfare worked fewer hours in the past week than survey respondents who thought we spend too much or too little money on welfare."

12. Which independent variables to interpret

In a simultaneous discriminant analysis, in which all independent variables are entered together, we only interpret the relationships for independent variables that have a loading of 0.30 or higher on one or more discriminant functions. A variable can have a high loading on more than one function, which complicates the interpretation. We will interpret the variable for the function on which it has the highest loading.
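The interpretation rule in slides 10 and 12 (assign each variable to the function with its largest absolute loading, and interpret it only if that loading is at least 0.30) can be sketched with the structure matrix values. `variables_to_interpret` is a hypothetical helper; SPSS marks the largest loading with an asterisk but does not apply the 0.30 cutoff for you.

```python
# Loadings copied from the "Structure Matrix" table (slide 10): (function 1, function 2).
STRUCTURE = {
    "HIGHEST YEAR OF SCHOOL COMPLETED": (0.687, 0.136),
    "NUMBER OF HOURS WORKED LAST WEEK": (-0.582, 0.345),
    "R SELF-EMP OR WORKS FOR SOMEBODY": (0.223, 0.889),
    "RESPONDENTS INCOME": (0.101, 0.292),
}


def variables_to_interpret(structure, cutoff=0.30):
    """Map each variable to the 1-based function it loads on most, if |loading| >= cutoff."""
    result = {}
    for name, loadings in structure.items():
        best = max(range(len(loadings)), key=lambda k: abs(loadings[k]))
        if abs(loadings[best]) >= cutoff:
            result[name] = best + 1
    return result


for name, function in variables_to_interpret(STRUCTURE).items():
    print(f"{name}: interpret on function {function}")
```

Hours worked also loads 0.345 on function 2, but the rule interprets it only on function 1, where its loading is larger; income (0.292) falls below the cutoff and is not interpreted.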
In a stepwise discriminant analysis, we limit the interpretation of relationships between independent variables and groups defined by the dependent variable to those independent variables that met the statistical test for inclusion in the analysis.

13. Discriminant analysis and classification

Discriminant analysis consists of two stages: in the first stage, the discriminant functions are derived; in the second stage, the discriminant functions are used to classify the cases. While discriminant analysis does compute correlation measures to estimate the strength of the relationship, these correlations measure the relationship between the independent variables and the discriminant scores. A more useful measure to assess the utility of a discriminant model is classification accuracy, which compares predicted group membership based on the discriminant model to the actual, known group membership, which is the value of the dependent variable.

14. Evaluating usefulness for discriminant models

The benchmark that we will use to characterize a discriminant model as useful is a 25% improvement over the rate of accuracy achievable by chance alone. Even if the independent variables had no relationship to the groups defined by the dependent variable, we would still expect to be correct in our predictions of group membership some percentage of the time. This is referred to as by chance accuracy. The estimate of by chance accuracy that we will use is the proportional by chance accuracy rate, computed by summing the squared proportion of cases in each group.

15. Comparing accuracy rates

To characterize our model as useful, we compare the cross-validated accuracy rate produced by SPSS to 25% more than the proportional by chance accuracy.
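The usefulness benchmark from slides 14 and 15 is simple arithmetic. This minimal sketch squares and sums the group proportions, raises the result by 25%, and compares a cross-validated accuracy against it; the example uses the welfare priors (.406, .362, .232) from slide 16 and the 50.0% cross-validated rate from slide 17. The helper names are hypothetical.

```python
def proportional_chance(proportions):
    """Proportional by chance accuracy: sum of squared group proportions."""
    return sum(p * p for p in proportions)


def usefulness_threshold(proportions, improvement=0.25):
    """Accuracy a model must reach: chance accuracy raised by `improvement` (25%)."""
    return (1 + improvement) * proportional_chance(proportions)


priors = [0.406, 0.362, 0.232]  # welfare groups, "Prior Probabilities for Groups"
chance = proportional_chance(priors)
threshold = usefulness_threshold(priors)
cv_accuracy = 0.500  # cross-validated accuracy reported by SPSS
print(round(chance, 3), round(threshold, 3), cv_accuracy >= threshold)  # 0.35 0.437 True
```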
The cross-validated accuracy rate is a one-at-a-time hold-out method that classifies each case based on a discriminant solution for all of the other cases in the analysis. It is a more realistic estimate of the accuracy rate we should expect in the population because discriminant analysis inflates accuracy rates when the cases classified are the same cases used to derive the discriminant functions. Cross-validated accuracy rates are not produced by SPSS when separate covariance matrices are used in the classification, which we address more next week.

16. Computing by chance accuracy

The percentage of cases in each group defined by the dependent variable is reported in the table "Prior Probabilities for Groups".

Prior Probabilities for Groups

                                   Cases Used in Analysis
  WELFARE           Prior       Unweighted     Weighted
  1 TOO LITTLE       .406        56             56.000
  2 ABOUT RIGHT      .362        50             50.000
  3 TOO MUCH         .232        32             32.000
  Total             1.000       138            138.000

The proportional by chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0.406² + 0.362² + 0.232² = 0.350). A 25% increase over this would require that our cross-validated accuracy be 43.7% (1.25 x 0.3497 = 0.437).
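Why the leave-one-out rate is less optimistic can be seen on a toy problem. The sketch below is a deliberate simplification, not SPSS's computation: it uses a single hypothetical predictor, two groups with a common variance, and equal priors, in which case the linear discriminant boundary is the midpoint between the group means, so each case is assigned to the nearest group mean. Function names and data are hypothetical.

```python
def nearest_mean_label(x, training):
    """1-D linear discriminant with a common variance and equal priors:
    the boundary is the midpoint between group means, so pick the nearest mean."""
    means = {}
    for label in sorted({lab for _, lab in training}):
        values = [v for v, lab in training if lab == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda lab: abs(x - means[lab]))


def accuracy(cases, leave_one_out):
    """Share of cases classified correctly, optionally holding each case out."""
    correct = 0
    for i, (x, label) in enumerate(cases):
        training = cases[:i] + cases[i + 1:] if leave_one_out else cases
        if nearest_mean_label(x, training) == label:
            correct += 1
    return correct / len(cases)


# Hypothetical data: (predictor value, known group).
cases = [(1.0, "A"), (2.0, "A"), (3.9, "A"), (4.5, "B"), (6.0, "B"), (7.0, "B")]
print(accuracy(cases, leave_one_out=False))  # 1.0 (cases reused to derive the rule)
print(round(accuracy(cases, leave_one_out=True), 3))  # 0.833 (cross-validated)
```

The case at 3.9 is classified correctly only while it helps set its own group mean; once held out, it falls on the other side of the boundary.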
17. Comparing the cross-validated accuracy rate

Classification Results (b, c)

                                         Predicted Group Membership
                       WELFARE           1 TOO LITTLE   2 ABOUT RIGHT   3 TOO MUCH   Total
  Original             Count
                       1 TOO LITTLE       43             15               6            64
                       2 ABOUT RIGHT      26             30               6            62
                       3 TOO MUCH         17             10               9            36
                       Ungrouped cases     3              3               2             8
                       %
                       1 TOO LITTLE       67.2           23.4             9.4         100.0
                       2 ABOUT RIGHT      41.9           48.4             9.7         100.0
                       3 TOO MUCH         47.2           27.8            25.0         100.0
                       Ungrouped cases    37.5           37.5            25.0         100.0
  Cross-validated (a)  Count
                       1 TOO LITTLE       43             15               6            64
                       2 ABOUT RIGHT      26             30               6            62
                       3 TOO MUCH         17             11               8            36
                       %
                       1 TOO LITTLE       67.2           23.4             9.4         100.0
                       2 ABOUT RIGHT      41.9           48.4             9.7         100.0
                       3 TOO MUCH         47.2           30.6            22.2         100.0

  a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
  b. 50.6% of original grouped cases correctly classified.
  c. 50.0% of cross-validated grouped cases correctly classified.

SPSS reports the cross-validated accuracy rate in the footnotes to the table "Classification Results." The cross-validated accuracy rate computed by SPSS was 50.0%, which was greater than or equal to the proportional by chance accuracy criterion of 43.7%.

18. Problem 1

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.

The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income" [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated movie in last year" [xmovie].
These predictors differentiate survey respondents who had seen an x-rated movie in the last year from survey respondents who had not seen an x-rated movie in the last year.

Survey respondents who had seen an x-rated movie in the last year were younger than survey respondents who had not seen an x-rated movie in the last year. Survey respondents who had seen an x-rated movie in the last year were more likely to be male than survey respondents who had not seen an x-rated movie in the last year.

  1. True
  2. True with caution
  3. False
  4. Inappropriate application of a statistic

19. Dissecting problem 1 - 1

For these problems, we will assume that there is no problem with missing data, violation of assumptions, or outliers. In this problem, we are told to use 0.05 as alpha for the discriminant analysis.
20. Dissecting problem 1 - 2

The variables listed first in the problem statement are the independent variables (IVs): "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income" [rincom98]. The variable used to define groups is the dependent variable (DV): "seen x-rated movie in last year" [xmovie].

When a problem states that a list of independent variables can distinguish among groups, we do a discriminant analysis entering all of the variables simultaneously.

21. Dissecting problem 1 - 3
The problem identifies two groups for the dependent variable: survey respondents who had seen an x-rated movie in the last year and survey respondents who had not seen an x-rated movie in the last year. To distinguish between two groups, the analysis will be required to find one statistically significant discriminant function.

22. Dissecting problem 1 - 4

The specific relationships listed in the problem indicate how the independent variable relates to the groups of the dependent variable, i.e., the mean for age will be lower for respondents who had seen an x-rated movie in the last year.

In order for the statement to be true, we must have enough statistically significant functions to distinguish among the groups, the classification accuracy rate must be substantially better than could be obtained by chance alone, and each significant relationship must be interpreted correctly.

23. LEVEL OF MEASUREMENT - 1
Discriminant analysis requires that the dependent variable be non-metric and the independent variables be metric or dichotomous. "Seen x-rated movie in last year" [xmovie] is a dichotomous variable, which satisfies the level of measurement requirement. It contains two categories: survey respondents who had seen an x-rated movie in the last year and survey respondents who had not seen an x-rated movie in the last year.

24. LEVEL OF MEASUREMENT - 2

"Age" [age] and "highest year of school completed" [educ] are interval level variables, which satisfy the level of measurement requirements for discriminant analysis.

"Income" [rincom98] is an ordinal level variable. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for discriminant analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in discriminant analysis.

25. Request simultaneous discriminant analysis

Select the Classify | Discriminant command from the Analyze menu.

26. Selecting the dependent variable

First, highlight the dependent variable xmovie in the list of variables. Second, click on the right arrow button to move the dependent variable to the Grouping Variable text box.

27. Defining the group values

When SPSS moves the dependent variable to the Grouping Variable text box, it puts two question marks in parentheses after the variable name. This is a reminder that we have to enter the numbers that represent the groups we want to include in the analysis. To specify the group numbers, click on the Define Range button.

28. Completing the range of group values

The value labels for xmovie show two categories: 1 = YES and 2 = NO. The range of values that we need to enter goes from 1 as the minimum to 2 as the maximum. First, type 1 in the Minimum text box. Second, type 2 in the Maximum text box. Third, click on the Continue button to close the dialog box.

29. Selecting the independent variables

Move the independent variables listed in the problem to the Independents list box.
30. Specifying the method for including variables

SPSS provides us with two methods for including variables: one enters all of the independent variables at one time, and a stepwise method selects variables using a statistical test to determine the order in which variables are included. Since the problem states that there is a relationship without requesting the best predictors, we accept the default to Enter independents together.

31. Requesting statistics for the output

Click on the Statistics button to select the statistics we will need for the analysis.

32. Specifying statistical output

First, mark the Means checkbox on the Descriptives panel. We will use the group means in our interpretation. Second, mark the Univariate ANOVAs checkbox on the Descriptives panel. Perusing these tests suggests which variables might be useful discriminators. Third, mark the Box's M checkbox. Box's M statistic evaluates conformity to the assumption of homogeneity of group variances (equality of the group covariance matrices). Fourth, click on the Continue button to close the dialog box.

33. Specifying details for classification

Click on the Classify button to specify details for the classification phase of the analysis.

34. Details for classification - 1

First, mark the option button to Compute from group sizes on the Prior Probabilities panel. This incorporates the size of the groups defined by the dependent variable into the classification of cases using the discriminant functions. Second, mark the Casewise results checkbox on the Display panel to include classification details for each case in the output. Third, mark the Summary table checkbox to include summary tables comparing actual and predicted classification.
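"Compute from group sizes" sets each group's prior probability to its share of the cases used in the analysis. A minimal sketch of that proportion, using the xmovie group sizes (37 and 82) that SPSS reports in the "Prior Probabilities for Groups" table in slide 39; how SPSS then weights the priors during classification is not reproduced here. `priors_from_group_sizes` is a hypothetical helper.

```python
def priors_from_group_sizes(counts):
    """Prior probability of each group = its share of the cases used in the analysis."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


priors = priors_from_group_sizes({"1": 37, "2": 82})  # xmovie: 1 = YES, 2 = NO
print({g: round(p, 3) for g, p in priors.items()})  # {'1': 0.311, '2': 0.689}
```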
35. Details for classification - 2

Fourth, mark the Leave-one-out classification checkbox to request SPSS to include a cross-validated classification in the output. This option produces a less biased estimate of classification accuracy by sequentially holding each case out of the calculations for the discriminant functions, and using the derived functions to classify the case held out.

36. Details for classification - 3

Fifth, accept the default Within-groups option button on the Use Covariance Matrix panel. The covariance matrices are the measure of the dispersion in the groups defined by the dependent variable. If we fail the homogeneity of group variances test (Box's M), our option is to use Separate-groups covariance in classification. Sixth, mark the Combined-groups checkbox on the Plots panel to obtain a visual plot of the relationship between functions and groups defined by the dependent variable. Seventh, click on the Continue button to close the dialog box.

37. Completing the discriminant analysis request

Click on the OK button to request the output for the discriminant analysis.

38. Sample size: ratio of cases to variables

Analysis Case Processing Summary

  Unweighted Cases                                                     N      Percent
  Valid                                                               119       44.1
  Excluded   Missing or out-of-range group codes                       49       18.1
             At least one missing discriminating variable              66       24.4
             Both missing or out-of-range group codes
               and at least one missing discriminating variable        36       13.3
             Total                                                    151       55.9
  Total                                                               270      100.0

The minimum ratio of valid cases to independent variables for discriminant analysis is 5 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 119 valid cases and 4 independent variables. The ratio of cases to independent variables is 29.75 to 1, which satisfies the minimum requirement. In addition, the ratio of 29.75 to 1 satisfies the preferred ratio of 20 to 1.
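The two sample-size rules of thumb in slides 38 and 39 (cases per independent variable, and the size of the smallest group) can be checked together. A minimal sketch with the figures from this problem: 119 valid cases, 4 independent variables, and a smallest group of 37 cases. `sample_size_checks` is a hypothetical helper, not SPSS output.

```python
def sample_size_checks(n_valid, n_predictors, smallest_group):
    """Course rules of thumb for discriminant analysis sample size."""
    ratio = n_valid / n_predictors
    return {
        "ratio": ratio,
        "ratio_minimum_met": ratio >= 5,       # at least 5 valid cases per predictor
        "ratio_preferred_met": ratio >= 20,    # preferably 20 per predictor
        "group_minimum_met": smallest_group > n_predictors,
        "group_preferred_met": smallest_group >= 20,
    }


checks = sample_size_checks(n_valid=119, n_predictors=4, smallest_group=37)
print(checks["ratio"])  # 29.75
print(all(v for k, v in checks.items() if k != "ratio"))  # True
```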
39. Sample size: minimum group size

Prior Probabilities for Groups

                           Cases Used in Analysis
  XMOVIE      Prior      Unweighted     Weighted
  1            .311       37             37.000
  2            .689       82             82.000
  Total       1.000      119            119.000

In addition to the requirement for the ratio of cases to independent variables, discriminant analysis requires that there be a minimum number of cases in the smallest group defined by the dependent variable. The number of cases in the smallest group must be larger than the number of independent variables, and the smallest group should preferably contain 20 or more cases. The number of cases in the smallest group in this problem is 37, which is larger than the number of independent variables (4), satisfying the minimum requirement. In addition, the number of cases in the smallest group satisfies the preferred minimum of 20 cases. If the sample size did not initially satisfy the minimum requirements, discriminant analysis is not appropriate.

40. NUMBER OF DISCRIMINANT FUNCTIONS - 1

The maximum possible number of discriminant functions is the smaller of one less than the number of groups defined by the dependent variable and the number of independent variables. In this analysis there were 2 groups defined by seen x-rated movie in last year and 4 independent variables, so the maximum possible number of discriminant functions was 1.

41. NUMBER OF DISCRIMINANT FUNCTIONS - 2

In the table of Wilks' Lambda, which tested functions for statistical significance, the direct analysis identified 1 discriminant function that was statistically significant. The Wilks' lambda statistic for the test of function 1 (chi-square=24.159) had a probability of