A Bayesian Approach to Configural Frequency Analysis

Eduardo Gutiérrez-Peña (a) & Alexander von Eye (b)
(a) IIMAS, UNAM, Mexico
(b) Department of Psychology, Michigan State University, 119 Snyder Hall, East Lansing, MI 48824, USA
The Journal of Mathematical Sociology. Publisher: Routledge. Published online: 26 Aug 2010.
To cite this article: Eduardo Gutiérrez-Peña & Alexander von Eye (2000) A Bayesian approach to configural frequency analysis, The Journal of Mathematical Sociology, 24:2, 151-174, DOI: 10.1080/0022250X.2000.9990233
To link to this article: http://dx.doi.org/10.1080/0022250X.2000.9990233


Journal of Mathematical Sociology, 2000, Vol. 24(2), pp. 151-174. © 2000 OPA (Overseas Publishers Association) N.V. Published by license under the Harwood Academic Publishers imprint, part of The Gordon and Breach Publishing Group. Reprints available directly from the publisher. Photocopying permitted by license only.

Printed in Malaysia.

A BAYESIAN APPROACH TO CONFIGURAL FREQUENCY ANALYSIS

EDUARDO GUTIÉRREZ-PEÑA (a) and ALEXANDER VON EYE (b),*

(a) IIMAS, UNAM, Mexico; (b) Department of Psychology, 119 Snyder Hall, Michigan State University, East Lansing, MI 48824, USA

Configural Frequency Analysis (CFA) is a method for cell-wise inspection of cross-classifications. CFA searches for types, that is, patterns of variable categories that occur more often than expected from some chance model, and for antitypes, that is, patterns observed less often than expected. Thus far, CFA has been plagued by the difficulties involved when looking for patterns of types and antitypes. This article introduces Bayesian CFA. Using Bayesian CFA one can (1) search for types and antitypes as before with the advantage that adjustment of the experiment-wise significance level α is not necessary; and (2) test whether groups of types and antitypes form composite types or composite antitypes. This option is crucial when patterns of types or antitypes must exist for a concept to be retained. Empirical examples use data from alcohol research and from sleep research to illustrate both new options. Characteristics of Bayesian CFA and extensions are discussed.

KEY WORDS: Configural Frequency Analysis, Bayesian statistics, Types, Antitypes.

Configural Frequency Analysis (CFA; Lienert, 1969; von Eye, 1990; von Eye et al., 1996) is a multivariate method for the analysis of cross-classifications from a person-oriented rather than variable-oriented perspective (Bergman et al., 1991; Magnusson, 1985). CFA screens a cross-classification for cells that contain more or fewer cases than expected from some base model. Thus far, CFA has been discussed almost exclusively from a frequentist perspective (the only exception is the article by Wood et al., 1994). While coherent, this perspective is limited in a number of ways. As far as CFA is concerned, the frequentist perspective constrains hypothesis testing in two ways. The first is that prior information is not easily taken into account when estimating expected cell frequencies (cf. Spiel and von Eye, 1993). The second constraint is that hypotheses concerning groups of cells are thus far virtually untestable. The present article introduces a Bayesian approach to CFA. This approach allows us to provide solutions for both limitations of CFA.

* Corresponding author.

1. CONFIGURAL FREQUENCY ANALYSIS FROM A FREQUENTIST PERSPECTIVE: AN OVERVIEW

Consider the cross-classification of $d \ge 2$ categorical variables. Let $M$ be the vector of the model frequencies in this cross-classification, and $m_i$ the observed frequency in cell $i$, where $i$ goes over all cells in the cross-classification. Let $\hat{m}_i$ be the estimated expected cell frequency for cell $i$, and $\pi_i$ the population probability for cell $i$. A log-frequency model for this cross-classification can be described as

$$\log M = X\lambda,$$

where $M$ is the vector of model cell frequencies, $X$ is the design matrix, and $\lambda$ is the parameter vector.

In a fashion analogous to residual analysis, CFA asks whether the model prediction, $\hat{m}_i$, deviates significantly from the observed value, $m_i$. A large number of tests have been proposed for this purpose (see von Eye, 1990; von Eye and Rovine, 1988). Most popular are the z-test,

$$z_i = \frac{m_i - \hat{m}_i}{\sqrt{n\,\hat{\pi}_i\,(1 - \hat{\pi}_i)}},$$

where $n$ is the sample size and the $\pi_i$ are estimated by $\hat{\pi}_i = \hat{m}_i/n$, and the Pearson $X^2$-component test,

$$X_i^2 = \frac{(m_i - \hat{m}_i)^2}{\hat{m}_i}$$

(df = 1). To come to a decision as to whether the hypothesis $H_0: E(m_i - \hat{m}_i) = 0$ holds, one applies either of these or one of the other tests discussed in the context of CFA under the appropriate measure for protection of the experiment-wise $\alpha$. Most of the current CFA tests are approximations of the binomial. As such they are readily calculated using existing statistical software. However, a number of tests are based on different statistical models. Examples include Lehmacher's (1981) test, which is based on the hypergeometric distribution. Employing this test implies (1) that all univariate margins are fixed, and (2) that the test can be used only for a small group of CFA base models. In general, the sampling scheme and CFA base model determine the test that can be used.
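The two cell-wise statistics can be illustrated with a minimal sketch. The function name `cfa_cell_tests` and the 2 × 2 counts are hypothetical and serve only as an illustration; the expected frequencies are those of the independence model for the chosen margins.

```python
import math

def cfa_cell_tests(observed, expected):
    """Return (z, X2 component) for every cell.

    z uses the binomial approximation with pi_hat = m_hat / n;
    the X2 component is the cell's contribution to the Pearson statistic.
    """
    n = sum(observed)
    out = []
    for m, m_hat in zip(observed, expected):
        pi_hat = m_hat / n
        z = (m - m_hat) / math.sqrt(n * pi_hat * (1.0 - pi_hat))
        x2 = (m - m_hat) ** 2 / m_hat
        out.append((z, x2))
    return out

# Hypothetical 2 x 2 table in row-major order (row totals 30, 70; column totals 40, 60).
observed = [20, 10, 20, 50]
expected = [12.0, 18.0, 28.0, 42.0]   # row total * column total / n

results = cfa_cell_tests(observed, expected)
for cell, (z, x2) in enumerate(results, start=1):
    print(f"cell {cell}: z = {z:.3f}, X2 component = {x2:.3f}")
```

A positive z points toward a type and a negative z toward an antitype; whether either is declared depends on the protected significance threshold.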

The most frequently used methods for $\alpha$-protection are the Bonferroni method, where an adjusted $\alpha$ is calculated as

$$\alpha^* = \frac{\alpha}{t},$$

where $\alpha$ is the significance threshold, typically $\alpha = 0.05$, and $t$ is the number of tests, typically the number of cells in the cross-classification; and the method proposed by Holm (1979), which leads to the adjusted $\alpha_i^*$,

$$\alpha_i^* = \frac{\alpha}{t - i + 1},$$

where $i$ goes over all cells in the cross-tabulation. More efficient methods have been discussed in the context of CFA and elsewhere (Hommel, 1988; Olejnik et al., 1997).
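Both adjustments are easy to sketch in code. The function names and the four p values below are hypothetical. The Holm procedure is written in its usual step-down form, in which $i$ is the rank of a test after sorting the p values in ascending order and testing stops at the first non-rejection.

```python
def bonferroni_threshold(alpha, t):
    """Adjusted threshold alpha* = alpha / t."""
    return alpha / t

def holm_thresholds(alpha, t):
    """Thresholds alpha / (t - i + 1) for ranks i = 1, ..., t (i = 1 is the smallest p)."""
    return [alpha / (t - i + 1) for i in range(1, t + 1)]

def holm_reject(p_values, alpha=0.05):
    """Step-down Holm decisions, returned in the original order of p_values."""
    t = len(p_values)
    order = sorted(range(t), key=lambda k: p_values[k])
    reject = [False] * t
    for rank, k in enumerate(order, start=1):
        if p_values[k] <= alpha / (t - rank + 1):
            reject[k] = True
        else:
            break  # all larger p values are retained as well
    return reject

print(bonferroni_threshold(0.05, 20))          # threshold for a table with 20 cells
print(holm_thresholds(0.05, 4))
print(holm_reject([0.001, 0.04, 0.012, 0.2]))  # hypothetical p values
```

Holm's thresholds are never smaller than the Bonferroni threshold, which is why the procedure is at least as powerful.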

It is the goal of CFA to identify types and antitypes. Cases in cell $i$ are said to belong to a CFA type if $E(m_i - \hat{m}_i) > 0$, and to a CFA antitype if $E(m_i - \hat{m}_i) < 0$. In CFA applications, researchers search for types and antitypes and interpret them using substantive background information.

The CFA base model specified using the indicator matrix, $X$, depends on the assumptions made for the search for types and antitypes, and on the sampling scheme employed for data collection. If researchers assume that (1) all variables in a study have the same status and (2) types and antitypes are the results of local associations (Havránek and Lienert, 1984), the base model can be, for instance, a log-linear main effect model or a hierarchical log-linear model that takes into account all first order interactions. If, however, researchers group variables into, say, predictors and criteria, base models can be more complex.


The sampling schemes that have been discussed for CFA (von Eye and Schuster, 1998) are the well-known multinomial and product-multinomial approaches. In particular, the product-multinomial sampling scheme poses constraints on the selection of CFA base models. Consider, for example, a cross-tabulation with four variables, where for two of the variables the margins were fixed by the sampling scheme. Therefore, the uni- and bivariate marginal sums must be reproduced by statistical analysis. This poses constraints for the selection of CFA base models. In the present example, the base model of first order CFA is not applicable, because it takes into account only the main effects but not the first order interactions of variables, which are required for the two variables observed under the product-multinomial sampling scheme.

In the following sections we first provide two data examples of CFA application. Then we introduce the Bayesian approach to CFA and re-analyze the data examples. We illustrate that decisions concerning the existence of types and antitypes can be based on the new method. We also illustrate that tests can be performed that are not readily available using the standard means currently discussed for CFA. Technical and computational details will be provided.

2. CFA – TWO DATA EXAMPLES

2.1. Alcohol Abuse

The first data example describes a sample of N = 108 adult men who were diagnosed as alcohol abusers (Zucker, 1994). The diagnostic scale had four levels: 1 = alcohol user; 2 = mild abuser; 3 = severe abuser; and 4 = alcohol-dependent. Three years later the same individuals were diagnosed again. Diagnostic categories were the same as at Time 1. However, in addition, the diagnosis 0 = no user of alcohol was included. Table 1 displays the 4 × 5 cross-classification of the diagnoses at the two occasions.

We analyze this data set using Lehmacher's (1981) asymptotic hypergeometric test. The significance threshold is adjusted using the Bonferroni method. For α = 0.05 we obtain α* = 0.05/20 = 0.0025. The CFA base model is that of a simple log-linear main effect model (first order CFA; see von Eye, 1990). It takes the main effects for both observation points into account and can therefore be violated only when an association between diagnoses at Time 1 and Time 2 exists.* The design matrix X for this model appears in Table 2 (the constant vector for the intercept is assumed). It indicates that we used standard effect coding. The results of CFA appear in Table 3.

TABLE 1
Cross-Classification of Alcohol Diagnoses at Two Occasions (N = 108)

Alcohol Abuse Diagnostic      Alcohol Abuse Diagnostic Categories at Time 2
Categories at Time 1            0      1      2      3      4
1                              10      8      1      0      0
2                               8      2     11      3      3
3                               4      2      7     10      3
4                               9      3      4      6     14

TABLE 2
Design Matrix for Log-linear Base Model for First Order CFA of Alcohol Data

Cell     Time 1 main effect     Time 2 main effect
10        1    0    0            1    0    0    0
11        1    0    0            0    1    0    0
12        1    0    0            0    0    1    0
13        1    0    0            0    0    0    1
14        1    0    0           -1   -1   -1   -1
20        0    1    0            1    0    0    0
21        0    1    0            0    1    0    0
22        0    1    0            0    0    1    0
23        0    1    0            0    0    0    1
24        0    1    0           -1   -1   -1   -1
30        0    0    1            1    0    0    0
31        0    0    1            0    1    0    0
32        0    0    1            0    0    1    0
33        0    0    1            0    0    0    1
34        0    0    1           -1   -1   -1   -1
40       -1   -1   -1            1    0    0    0
41       -1   -1   -1            0    1    0    0
42       -1   -1   -1            0    0    1    0
43       -1   -1   -1            0    0    0    1
44       -1   -1   -1           -1   -1   -1   -1
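The effect coding behind Table 2 can be generated mechanically. The sketch below is an illustration under stated assumptions: the helper names `effect_codes` and `main_effect_design` are hypothetical, cells are enumerated with the Time 1 diagnosis varying slowest, and the intercept column is omitted, as in the text.

```python
def effect_codes(level, n_levels):
    """Effect coding of one category: the last level is coded -1 on every column."""
    if level == n_levels - 1:
        return [-1] * (n_levels - 1)
    return [1 if j == level else 0 for j in range(n_levels - 1)]

def main_effect_design(n_rows, n_cols):
    """Design matrix (no intercept) of the main effect model for an
    n_rows x n_cols cross-classification, cells in row-major order."""
    return [effect_codes(r, n_rows) + effect_codes(c, n_cols)
            for r in range(n_rows)
            for c in range(n_cols)]

X = main_effect_design(4, 5)   # 4 diagnoses at Time 1, 5 diagnoses at Time 2
for row in X:
    print(" ".join(f"{v:2d}" for v in row))
```

Each column sums to zero over the 20 cells, which is the defining property of effect coding in a complete cross-classification.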

* It may be discussed whether models other than the main effect model are also reasonable. Specifically, conditional probability models may be interesting, in particular, when more than two observation points are studied. However, selection of CFA base models must be based on specific substantive assumptions. Such assumptions are discussed in another context (von Eye et al., 1999).


TABLE 3
First Order CFA Results for Alcohol Data

Configuration    FO      FE        z         P(z)          Type/Antitype?
10               10     5.454     2.528     0.00573590
11                8     2.639     3.900     0.00004821    Type
12                1     4.046    -1.872     0.03062211
13                0     3.343    -2.208     0.01361209
14                0     3.519    -2.278     0.01134864
20                8     7.750     0.122     0.45135540
21                2     3.750    -1.119     0.13150851
22               11     5.750     2.836     0.00228088    Type
23                3     4.750    -1.017     0.15466779
24                3     5.000    -1.139     0.12738383
30                4     7.463    -1.715     0.04317995
31                2     3.611    -1.044     0.14832063
32                7     5.537     0.801     0.21169949
33               10     4.574     3.192     0.00070542    Type
34                3     4.815    -1.047     0.14763249
40                9    10.333    -0.599     0.27463969
41                3     5.000    -1.175     0.11999571
42                4     7.667    -1.820     0.03440347
43                6     6.333    -0.178     0.42941218
44               14     6.667     3.836     0.00006263    Type

FO = observed frequency; FE = expected frequency under the base model.

The Pearson $X^2 = 51.54$ for the base model is large, indicating significant (df = 12; p < 0.01) discrepancies between the data and the expectancies that had been estimated under the assumption of independence of diagnoses at the two points in time. CFA identifies four instances where these discrepancies are particularly large. These discrepancies appear in the form of types. There are no antitypes in this data set. The types emerge for the cells with indices 1-1, 2-2, 3-3, and 4-4. These are the cells that contain individuals who are diagnosed at Time 2 as displaying the same level of alcohol use/abuse as at Time 1. The number of individuals with stable diagnoses is significantly greater than was expected from the CFA base model.
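The figures of this analysis can be retraced with a short sketch. It assumes that, for a two-way table with fixed margins, the asymptotic hypergeometric test reduces to dividing the residual by the hypergeometric standard deviation of the cell count; this assumption reproduces the tabled z values for the cells checked by hand (for example 1-1 and 4-4), but it is an illustration and not necessarily the exact routine used for Table 3.

```python
import math

# Table 1: rows are Time 1 diagnoses 1-4, columns are Time 2 diagnoses 0-4.
table = [[10, 8, 1, 0, 0],
         [8, 2, 11, 3, 3],
         [4, 2, 7, 10, 3],
         [9, 3, 4, 6, 14]]

N = sum(map(sum, table))
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
alpha_star = 0.05 / 20            # Bonferroni-adjusted threshold

x2 = 0.0
types, antitypes = [], []
for i, r in enumerate(row_tot):
    for j, c in enumerate(col_tot):
        e = r * c / N                                          # expected under independence
        x2 += (table[i][j] - e) ** 2 / e
        var = r * c * (N - r) * (N - c) / (N ** 2 * (N - 1))   # hypergeometric variance
        z = (table[i][j] - e) / math.sqrt(var)
        p = 0.5 * math.erfc(abs(z) / math.sqrt(2))             # one-sided normal tail
        if p < alpha_star:
            (types if z > 0 else antitypes).append((i + 1, j))

print(f"N = {N}, Pearson X2 = {x2:.2f}")
print("types:", types)
print("antitypes:", antitypes)
```

Cells are labelled (Time 1 diagnosis, Time 2 diagnosis), so the four diagonal cells of stable diagnoses appear as (1, 1), (2, 2), (3, 3), and (4, 4).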

The first type, found for Cell 1-1, describes those individuals who were users of alcohol at both points in time. The second type, found for Cell 2-2, describes individuals who were mildly abusing alcohol at both points in time. The third type, found for Cell 3-3, describes individuals who were severe abusers at both points in time. The fourth type, found for Cell 4-4, describes individuals who were consistently alcohol-dependent.

Standard CFA is able to identify types and antitypes in cross-classifications. However, groups of types and antitypes cannot be easily described (von Eye et al., 1991). For instance, one may ask for the present data whether there is a general pattern such that respondents are more consistent than can be expected from some chance model or base model. Suppose, for example, that the types and antitypes have been identified using the Pearson $X^2$-component test. Then the $X^2$-components for two or more types or antitypes cannot be summed and tested under the sum of the degrees of freedom because this sum of $X^2$-components is not necessarily distributed as $\chi^2$ (see Lancaster, 1969).

This problem with the $X^2$-component test is, in principle, no hurdle to testing for the existence of a group of types or antitypes. One can always determine the maximum likelihood estimates for groups of cells. However, to the best of our knowledge, there exists no general form that can be applied to every hypothesized group of types or antitypes. Therefore, the estimation of probabilities for groups of configurations is, from an applied perspective, a problem.

The Bayesian approach, described below, allows one to solve this problem. The following, second data example makes this point even more important. It describes an attempt at using CFA to identify patterns of types and antitypes.

2.2. Sleep Behavior

The second data example re-analyzes data presented by von Eye and Brandtstädter (1997). The data stem from a study on sleep behavior (Görtelmeyer, 1988). One result of this study was the description of six types of sleep behavior, created using first order CFA. The six types describe respondents who sleep (1) short periods of time early in the morning; (2) symptom-free during 'normal' night hours; (3) symptom-free but wake up too early; (4) short periods early in the morning and show all symptoms of sleep problems; (5) at normal night hours but show all symptoms of sleep problems; and (6) long hours from early in the evening on but show all symptoms of sleep problems.

As is typical of CFA results, not all of the sample was assigned to one of these types. Specifically, a subsample of 107 respondents belonged to one of the types, and a subsample of 166 did not belong to any type. For the following analyses, the remaining 166 respondents are treated as if they belonged to a seventh type.


After creating these seven types of sleep problem behavior, Görtelmeyer (1988) examined their external validity. He asked whether psychosomatic symptoms displayed during sleep allow one to discriminate among the types. For the following analysis, we cross-tabulate the six (+1) types of sleep problem behavior (S) and the two levels of psychosomatic symptomatology (P), above median (= 2) and below median (= 1). The resulting 7 × 2 cross-tabulation (see Görtelmeyer (1988, p. 216)) is analyzed using first order prediction CFA with P as the predictor and S as the criterion. Lehmacher's asymptotic test (1981) was used with Küchenhoff's (1986) continuity correction. The p-level was set to 0.05 and Bonferroni-adjusted, which led to p* = 0.05/14 = 0.00357. The base model for this approach makes the following assumptions:

1. The main effects of the variables, Sleep Types [S] and Psychosomatic Symptomatology [P], are both considered; and

2. S and P are independent.

This model involves only one predictor, P, and one criterion, S. Thus, there are no interactions that need to be considered. This model can be contradicted only if there are predictor-criterion relationships, that is, relationships between P and S. Table 4 displays CFA results, which suggest that there are four prediction types and four prediction antitypes. Reading from the top of the table, the first type, 1-1, suggests that below median occurrence of psychosomatic symptoms predicts Sleep Pattern Type 1, that is, sleeping for short periods in the early morning. The corresponding antitype, 1-2, suggests that above median occurrence of psychosomatic symptoms makes this sleep pattern unlikely. Type 2-1 suggests that below median occurrence of psychosomatic symptoms leads also to frequent observation of symptom-free sleep during normal night hours. This type also goes hand-in-hand with an antitype, 2-2, suggesting that this pattern is unlikely for respondents with above median psychosomatic symptomatology. The third type-antitype pair is for Sleep Pattern Type 3, that is, for symptom-free sleep that is shortened by early awakening. As for the first two pairs, below median occurrence suggests high probability for this sleep pattern, and above median occurrence suggests low probability.

TABLE 4
CFA of Types of Sleep Problems as Predicted by Psychosomatic Symptoms

Configuration    Frequencies              Significance Tests
SP               Observed    Expected     z         p           Type/Antitype?
11                19          11.04        3.31     0.0005      Type
12                 3          10.96       -3.31     0.0005      Antitype
21                20          12.04        3.18     0.0007      Type
22                 4          11.96       -3.18     0.0007      Antitype
31                16           9.54        2.83     0.0023      Type
32                 3           9.47       -2.83     0.0023      Antitype
41                 5           4.52       -0.01     0.4956
42                 4           4.48        0.01     0.4956
51                 4           7.03       -1.38     0.0833
52                10           6.97        1.38     0.0833
61                 8           9.54       -0.49     0.3116
62                11           9.47        0.49     0.3116
71                65          83.30       -4.41     0.00001     Antitype
72               101          82.70        4.41     0.00001     Type

The last type-antitype pair was observed for those respondents that did not belong to any of the Sleep Pattern Types. Antitype 7-1 suggests that respondents with below median occurrence of psychosomatic symptoms are unlikely to belong to this group. In contrast (Type 7-2), respondents with above median occurrence of psychosomatic symptoms are highly likely to belong to this group.
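The z values of the prediction CFA can be approximated with the following sketch. Two assumptions are made that the text does not spell out: the variance of a cell count is the hypergeometric variance for fixed margins, and the continuity correction shrinks the absolute residual by 0.5 while keeping the sign of the residual. Under these assumptions the sketch agrees with the tabled values for the cells checked by hand (for example 1-1, 4-1, and 7-1).

```python
import math

# Observed frequencies: sleep type S -> (P = 1 below median, P = 2 above median).
observed = {1: (19, 3), 2: (20, 4), 3: (16, 3), 4: (5, 4),
            5: (4, 10), 6: (8, 11), 7: (65, 101)}

N = sum(a + b for a, b in observed.values())
col_tot = [sum(v[k] for v in observed.values()) for k in (0, 1)]
p_star = 0.05 / 14                          # Bonferroni-adjusted level

z_table, types, antitypes = {}, [], []
for s, cell_counts in observed.items():
    r = sum(cell_counts)
    for k, m in enumerate(cell_counts):
        c = col_tot[k]
        e = r * c / N
        var = r * c * (N - r) * (N - c) / (N ** 2 * (N - 1))
        d = m - e
        sign = 1.0 if d >= 0 else -1.0
        z = sign * (abs(d) - 0.5) / math.sqrt(var)   # assumed continuity correction
        z_table[(s, k + 1)] = z
        p = 0.5 * math.erfc(abs(z) / math.sqrt(2))
        if p < p_star:
            (types if z > 0 else antitypes).append((s, k + 1))

for cell in sorted(z_table):
    print(cell, f"{z_table[cell]:6.2f}")
print("types:", sorted(types))
print("antitypes:", sorted(antitypes))
```

Because the table has only two columns, the two cells of each row carry residuals of equal size and opposite sign, which is why every type is paired with an antitype.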

It was one of the goals of von Eye and Brandtstädter (1997) to interpret these results from the perspective of a causal model. Specifically, the authors asked whether these results could be used to support the notion of a causal fork, which would imply that one causal event can have more than one consequence. In the present context, an interpretation in terms of a causal fork would be plausible if one predictor category were associated with two or more criterion categories. This association manifests itself in the form of types or antitypes.

Indeed, the resulting pattern seems to support the hypothesis that psychosomatic problems can be causes of sleep problems. This relationship can be described using the concept of a fork. More specifically, the results suggest that below median psychosomatic symptoms result in the more healthy sleep patterns 1, 2, and 3, shown by individuals plagued by fewer sleep problems. This interpretation is based on the three types 1-1, 2-1, and 3-1 of Table 4. In addition, it seems that the data also show an antitypical relationship. Above median psychosomatic symptoms seem to prevent the occurrence of the more healthy sleep patterns 1, 2, and 3. Figure 1 displays the pattern of types and Figure 2 displays the pattern of antitypes that were regarded as in support of an interpretation in terms of a fork (from von Eye and Brandtstädter, 1997, p. 18 (Figure 1) and p. 19 (Figure 2)).

As in the first data example, the interpretation of these patterns of types and antitypes as supporting the notion of a causal fork is only based on the existence of single types and antitypes. There was no statistical test against the null hypothesis of a joint pattern of types and a joint pattern of antitypes. The following sections will show such a test. In addition, Section 3 will introduce Bayesian CFA and re-analyze the two data examples.

FIGURE 1 Fork model of types of sleep patterns. [Diagram relating the predictor categories "Above Median Psychosomatic Symptoms" and "Below Median Psychosomatic Symptoms" to Sleep Patterns 1 to 6 and "Poubelle".]

FIGURE 2 Fork model of antitypes of sleep patterns. [Diagram relating the predictor categories "Above Median Psychosomatic Symptoms" and "Below Median Psychosomatic Symptoms" to Sleep Patterns 1 to 6 and "Poubelle".]

3. BAYESIAN CFA

This section first provides a brief discussion of two key terms of Bayesian statistics, the prior and the posterior distributions. Subsequently, these concepts are applied to CFA.

3.1. Prior and Posterior Distributions

Consider, as in Section 1, the cross-classification of $d \ge 2$ categorical variables. Recall that $\pi_i$ denotes the population probability for cell $i$, and let $\pi$ be the vector of such probabilities.

Here we shall only be concerned with the usual multinomial sampling scheme (product-multinomial sampling schemes can be dealt with in a similar fashion). In this case, $M$, the vector of observed frequencies, can be regarded as an observation from a $(K-1)$-dimensional multinomial distribution with index $N = \sum_i m_i$ and unknown parameter vector $\pi$, where $K$ denotes the total number of cells in the cross-classification.

From a Bayesian point of view, all prior beliefs concerning the value of $\pi$ must be described in terms of a prior distribution. The usual conjugate prior for the multinomial parameter is the Dirichlet distribution. This distribution is characterized by a parameter vector $\beta = (\beta_1, \ldots, \beta_K)$ such that $E(\pi_i) = \beta_i/\beta_\cdot$, where $\beta_\cdot = \sum_i \beta_i$. For other properties of both the multinomial and Dirichlet distributions, see Appendix A of Gelman et al. (1995).

In the absence of prior information, an 'ignorance' prior (also called a 'noninformative' prior) will typically be used. One of the most widely used methods to derive noninformative priors is the so-called Jeffreys' rule, which in this case yields the Dirichlet distribution with parameter $\beta = (1/2, \ldots, 1/2)$. One of the nice features of this prior is that it is conjugate (or closed under sampling). Specifically, the posterior distribution of $\pi$ is also Dirichlet, with parameter $\beta = (m_1 + 1/2, \ldots, m_K + 1/2)$. This distribution contains all the available information about the population proportions $\pi$, conditional on the observed contingency table.
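A minimal sketch of simulating from this posterior follows. It uses the standard construction of a Dirichlet draw from independent Gamma variates; the helper name `sample_dirichlet` and the seed are arbitrary choices, and the counts are those of Table 1 in row-major order.

```python
import random

def sample_dirichlet(params, rng):
    """One draw from Dirichlet(params): normalize independent Gamma(a, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [x / total for x in g]

# Table 1, row-major (Time 1 diagnosis varies slowest).
counts = [10, 8, 1, 0, 0, 8, 2, 11, 3, 3, 4, 2, 7, 10, 3, 9, 3, 4, 6, 14]
posterior = [m + 0.5 for m in counts]      # Jeffreys prior Dirichlet(1/2, ..., 1/2)

rng = random.Random(2000)                  # arbitrary seed for reproducibility
draws = [sample_dirichlet(posterior, rng) for _ in range(10_000)]

K = len(counts)
mc_mean = [sum(d[i] for d in draws) / len(draws) for i in range(K)]
exact_mean = [b / sum(posterior) for b in posterior]   # E(pi_i) = beta_i / beta_.
print(max(abs(a - b) for a, b in zip(mc_mean, exact_mean)))
```

The Monte Carlo means should agree with the closed-form posterior means up to simulation error, which is a convenient check before more complicated functions of $\pi$ are evaluated on the same draws.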

Note that any base model imposes constraints on the space of possible values of $\pi$. In other words, under the base model the population probability of cell $i$ is given by $\pi_i^* = f_i(\pi)$ for some functions $f_i$. As a simple example, consider a 2 × 2 cross-classification and a base model which states that the two variables are independent. Then

$$f_1(\pi) = (\pi_1 + \pi_3)(\pi_1 + \pi_2), \qquad f_2(\pi) = (\pi_2 + \pi_4)(\pi_1 + \pi_2),$$
$$f_3(\pi) = (\pi_1 + \pi_3)(\pi_3 + \pi_4), \qquad f_4(\pi) = (\pi_2 + \pi_4)(\pi_3 + \pi_4).$$
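For the independence base model, the functions $f_i$ simply multiply the margins of $\pi$. The sketch below generalizes the 2 × 2 example to an r × c table; the function name `independence_base` and the numerical values of `pi` are hypothetical.

```python
def independence_base(pi, n_rows, n_cols):
    """Map cell probabilities (row-major) to pi* = row margin * column margin."""
    rows = [sum(pi[r * n_cols + c] for c in range(n_cols)) for r in range(n_rows)]
    cols = [sum(pi[r * n_cols + c] for r in range(n_rows)) for c in range(n_cols)]
    return [rows[r] * cols[c] for r in range(n_rows) for c in range(n_cols)]

# Hypothetical 2 x 2 probabilities (pi_1, pi_2, pi_3, pi_4).
pi = [0.4, 0.1, 0.2, 0.3]
pi_star = independence_base(pi, 2, 2)
print(pi_star)

# A table that already satisfies independence is mapped onto itself.
print(independence_base(pi_star, 2, 2))
```

With cells numbered row by row, the first entry equals $(\pi_1 + \pi_3)(\pi_1 + \pi_2)$, in agreement with $f_1$ above.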

The base model can be tested on the basis of the posterior distribution of

$\delta$ = [expression not recoverable from the source]

This quantity is essentially a deviance. It is always nonnegative and is zero if and only if the base model is correct, i.e. if and only if $\pi_i = \pi_i^*$ for all $i$. The posterior distribution of $\delta$ is not available in closed form, but it can easily be obtained from that of $\pi$ using Monte Carlo techniques. Posterior distributions concentrated near zero support the base model, whereas posterior distributions located away from zero lead to rejection of the base model. One can decide whether most of the mass of the posterior distribution is really away from zero by means of a Bayesian 'significance test': reject $H_0: \delta = 0$ if zero is not contained in the 95% (say) highest posterior density region for $\delta$. Such a test plays very much the same role as the $\chi^2$ test for the base model used in the conventional approach to CFA.
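The Monte Carlo step can be sketched as follows. Since the defining expression is not reproduced here, the sketch ASSUMES a Kullback-Leibler type discrepancy, $\sum_i \pi_i \log(\pi_i/\pi_i^*)$, which shares the two stated properties (nonnegative, and zero only if $\pi_i = \pi_i^*$ for all $i$); the quantity actually used may differ, for example by a scale factor. A central 95% interval is reported as a simple stand-in for the highest posterior density region.

```python
import math
import random

counts = [[10, 8, 1, 0, 0], [8, 2, 11, 3, 3],
          [4, 2, 7, 10, 3], [9, 3, 4, 6, 14]]       # Table 1
R, C = len(counts), len(counts[0])
beta = [m + 0.5 for row in counts for m in row]     # Jeffreys posterior parameters
rng = random.Random(1)                              # arbitrary seed

def draw_pi():
    """One draw from the Dirichlet posterior via normalized Gamma variates."""
    g = [rng.gammavariate(b, 1.0) for b in beta]
    s = sum(g)
    return [x / s for x in g]

def discrepancy(pi):
    """ASSUMED form of delta: sum_i pi_i * log(pi_i / pi_i*), independence base model."""
    rows = [sum(pi[r * C:(r + 1) * C]) for r in range(R)]
    cols = [sum(pi[r * C + c] for r in range(R)) for c in range(C)]
    return sum(p * math.log(p / (rows[i // C] * cols[i % C]))
               for i, p in enumerate(pi) if p > 0.0)

deltas = sorted(discrepancy(draw_pi()) for _ in range(10_000))
lower, upper = deltas[249], deltas[9749]            # central 95% interval
print(f"posterior median {deltas[5000]:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

An interval that stays well away from zero is read as evidence against the base model, in the same spirit as a large $X^2$ in the conventional analysis.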

3.2. Types and Antitypes from a Bayesian Perspective

Cases in cell $i$ belong to a type if $\pi_i > \pi_i^*$, and to an antitype if $\pi_i < \pi_i^*$. Since we have the posterior distribution of $\pi$, we can, in principle, calculate the posterior probability of any event involving the population proportions, $\pi$. In particular, we can, for instance, compute the posterior probability of cell $i$ being a type, namely, $\Pr(\pi_i > \pi_i^*)$. If this probability is close to 1, then we can classify cell $i$ as a type. Similarly, if the probability is close to zero, then the cell can be classified as an antitype. However, such an approach is rather simplistic and does not necessarily make good use of all the available information. On the other hand, the posterior probability that $\pi_i = \pi_i^*$ is always zero under any conjugate prior (or, for that matter, under any absolutely continuous prior distribution on $\pi$), suggesting that every cell in the cross-classification should be classified as either a type or an antitype.

Certainly, even if $\pi_i \ne \pi_i^*$, we would be unwilling to classify cell $i$ as a type (respectively, antitype) unless $\pi_i - \pi_i^*$ was significantly greater than zero (respectively, less than zero). This suggests the following slightly modified definition of types and antitypes: cases in cell $i$ are said to belong to a type if and only if $\pi_i > \pi_i^* + \varepsilon_i$, and to an antitype if and only if $\pi_i < \pi_i^* - \varepsilon_i$, where $\varepsilon_i$ is a suitable threshold value. In the examples below we have set $\varepsilon_i$ equal to two times the posterior standard deviation of $\pi_i - \pi_i^*$.
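One possible reading of this rule can be sketched on the data of Table 1. The sketch is an assumption, not the procedure behind the reported results: it labels a cell a type when the posterior mean of $\pi_i - \pi_i^*$ exceeds $\varepsilon_i$, an antitype when it falls below $-\varepsilon_i$, and 'neither' otherwise, with $\varepsilon_i$ equal to twice the simulated posterior standard deviation.

```python
import math
import random

counts = [[10, 8, 1, 0, 0], [8, 2, 11, 3, 3],
          [4, 2, 7, 10, 3], [9, 3, 4, 6, 14]]       # Table 1
R, C = len(counts), len(counts[0])
beta = [m + 0.5 for row in counts for m in row]     # Jeffreys posterior parameters
rng = random.Random(7)                              # arbitrary seed

def draw_differences():
    """One posterior draw of d_i = pi_i - pi_i* under the independence base model."""
    g = [rng.gammavariate(b, 1.0) for b in beta]
    s = sum(g)
    pi = [x / s for x in g]
    rows = [sum(pi[r * C:(r + 1) * C]) for r in range(R)]
    cols = [sum(pi[r * C + c] for r in range(R)) for c in range(C)]
    return [pi[i] - rows[i // C] * cols[i % C] for i in range(R * C)]

draws = [draw_differences() for _ in range(10_000)]

labels = {}
for i in range(R * C):
    d = [row[i] for row in draws]
    mean = sum(d) / len(d)
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (len(d) - 1))
    eps = 2.0 * sd                                  # threshold epsilon_i
    cell = (i // C + 1, i % C)                      # (Time 1 level 1-4, Time 2 level 0-4)
    labels[cell] = "type" if mean > eps else "antitype" if mean < -eps else "neither"

for cell in sorted(labels):
    print(cell, labels[cell])
```

Cells close to the threshold can change their label between simulation runs, so borderline classifications are best reported together with the underlying posterior summaries.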

3.3. Patterns of Types and Antitypes

An interesting feature of the Bayesian approach is that it allows us to calculate, for example, the joint posterior probability of several cells being all types (simultaneously). More generally, we can calculate the posterior probability of any specific pattern of types and antitypes in the cross-classification. Indeed, given a particular base model, the posterior distribution of $\pi$ induces a probability distribution on the set of all possible patterns. Consider, for example, a 2 × 2 cross-classification. Then such possible patterns include

$$\begin{pmatrix} T & T \\ T & T \end{pmatrix}, \ldots, \begin{pmatrix} N & A \\ T & N \end{pmatrix}, \ldots, \begin{pmatrix} T & A \\ A & T \end{pmatrix}, \ldots, \begin{pmatrix} A & T \\ T & A \end{pmatrix}, \ldots, \begin{pmatrix} A & A \\ A & A \end{pmatrix},$$

where T stands for a 'type', A for an 'antitype', and N for 'neither'.

Of course, many of these patterns will have low, or even zero, probability (for instance, the first and the last above). A Bayesian solution to the CFA problem would then be to report the most probable pattern. Notice, however, that, even for moderately-sized problems, the cardinality of the set of all possible patterns may be too large for a direct implementation of this approach to be feasible in practice. For example, in the case of an r × c cross-classification, the total number of different patterns equals $3^{r \times c}$. Hence, even for a 2 × 2 cross-classification there are as many as 81 distinct patterns.
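The growth of the pattern space is easy to verify by enumeration. The sketch below lists all patterns of a 2 × 2 table and counts those of the 4 × 5 alcohol table of Section 2.1; variable names are arbitrary.

```python
from itertools import product

# Every cell is a type (T), an antitype (A), or neither (N).
patterns_2x2 = list(product("TAN", repeat=2 * 2))
print(len(patterns_2x2))          # number of patterns for a 2 x 2 table
print(patterns_2x2[0])            # the all-type pattern

# Number of patterns for the 4 x 5 cross-classification of the alcohol data.
n_alcohol = 3 ** (4 * 5)
print(n_alcohol)
```

Enumeration is already impractical for the 20-cell table, which motivates restricting attention to a small set of candidate patterns.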


Nevertheless, there seems to be scope for a more structured approach here, based perhaps on a Bayesian multilevel (or hierarchical) model. For a recent discussion of the motivation and analysis of these models see Gelman et al. (1995).

In practice, we can dramatically reduce the numerical burden of the approach described above if we only look at patterns in a 'neighborhood' of the particular pattern suggested by an exploratory analysis which looks at each cell individually. This procedure is illustrated in the following sections.

As pointed out at the end of Section 2.1, from a frequentist perspective issuing inferential statements concerning groups of cells has so far been, to say the least, problematic. Currently available methods require the researcher to specify the likelihood functions separately for each statement concerning a group of cells. There is no routine available that can be invoked. In addition, α-adjustment can only rely on Bonferroni-type arguments and is therefore only approximate. In contrast, from a Bayesian viewpoint the posterior distribution of $\pi$ produces joint probability statements concerning possible patterns of types and antitypes in a given group of cells. There is no need for Bonferroni adjustment within the Bayesian framework.

The calculations of all the posterior probabilities reported in the two examples below were based on the simulation of 10,000 samples from the corresponding (Dirichlet) posterior distributions of $\pi$. Details of these calculations are given in Appendix A.

3.4. Bayesian Analysis of the 'Alcohol Abuse' Data

We now re-analyze the data of Table 1 concerning alcohol diagnosis. For the sake of comparison with the analysis of Section 2.1, we use a noninformative prior. Figure 3 shows a histogram of the posterior distribution of $\delta$. This distribution puts most of its mass away from zero, indicating significant discrepancies between the data and the expectancies under the assumption of independence.
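To make the role of $\delta$ concrete, here is a minimal Python sketch. It assumes that the deviance equivalent is $\delta(\pi) = \sum_i \pi_i \log(\pi_i / \pi_i^*)$, with $\pi_i^*$ the product of the row and column margins, which is what the Appendix B listing computes. The function names and the two toy tables are hypothetical. The quantity is zero when the table satisfies independence exactly and positive otherwise.

```python
from math import log


def independence_expectation(table):
    """Cell probabilities implied by independence: row margin times column margin."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[ri * cj for cj in cols] for ri in rows]


def deviance_equivalent(table):
    """Sum over cells of pi * log(pi / pi_star); cells with pi = 0 contribute 0."""
    expected = independence_expectation(table)
    return sum(p * log(p / q)
               for row_p, row_q in zip(table, expected)
               for p, q in zip(row_p, row_q) if p > 0)


independent = [[0.12, 0.28], [0.18, 0.42]]  # margins (0.4, 0.6) and (0.3, 0.7)
dependent = [[0.40, 0.10], [0.10, 0.40]]    # mass concentrated on the diagonal

d_indep = deviance_equivalent(independent)
d_dep = deviance_equivalent(dependent)
print(d_indep, d_dep)
```

Applying `deviance_equivalent` to each posterior draw of $\pi$ gives the kind of sample that a histogram such as Figure 3 summarizes.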

Table 5 shows the results of the Bayesian CFA. Notice that the Bayesian CFA identifies all the types found by the classical CFA (see Section 2.1). However, there is some indication that Cell 1-0 is also a type, whereas Cells 1-2, 1-3, and 1-4 could well be considered antitypes. The posterior probability of the pattern given by the last column of Table 5 is about 0.003. In contrast, the pattern provided by the last column of Table 3 has a probability of less than 0.0001. This suggests that the Bayesian procedure has identified one further type and three antitypes which were overlooked by the classical method.

FIGURE 3 Posterior distribution of the deviance equivalent $\delta$ for the data in Table 3.

TABLE 5 First Order Bayesian CFA Results for Alcohol Diagnosis Data

Configuration   FO   Pr(Type)   Pr(Neither)   Pr(Antitype)   Type/Antitype?
10              10   0.5427     0.4573        0.0000         Type
11               8   0.7351     0.2649        0.0000         Type
12               1   0.0000     0.3585        0.6409         Antitype
13               0   0.0000     0.1017        0.8983         Antitype
14               0   0.0001     0.0799        0.9200         Antitype
20               8   0.0419     0.9439        0.0142
21               2   0.0026     0.7968        0.2006
22              11   0.6472     0.3528        0.0000         Type
23               3   0.0041     0.8062        0.1897
24               3   0.0027     0.7648        0.2325
30               4   0.0004     0.5099        0.4897
31               2   0.0053     0.8021        0.1926
32               7   0.1133     0.8854        0.0013
33              10   0.7047     0.2953        0.0000         Type
34               3   0.0041     0.8038        0.1921
40               9   0.0076     0.9065        0.0859
41               3   0.0020     0.7637        0.2343
42               4   0.0003     0.4925        0.5072
43               6   0.0202     0.9480        0.0318
44              14   0.9238     0.0762        0.0000         Type

The posterior probabilities of the two type/antitype patterns are very small. This suggests that the posterior distribution over the space of all possible patterns is quite 'flat'. Note that, for this example, the total number of possible patterns is $3^{20}$, so the probability has to be distributed among a huge number of patterns. In any event, the pattern produced by the Bayesian analysis is more than 30 times as likely as that provided by the corresponding classical analysis of Section 2.1.
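A short worked calculation puts these magnitudes in perspective. It uses only the figures quoted in the text; the comparison with a uniform distribution over patterns is added here purely as a reference point.

```latex
\begin{align*}
  3^{20} &= 3\,486\,784\,401 \approx 3.5 \times 10^{9}
     && \text{(number of possible patterns)} \\
  \frac{1}{3^{20}} &\approx 2.9 \times 10^{-10}
     && \text{(probability of each pattern if all were equally likely)} \\
  \frac{0.003}{0.0001} &= 30
     && \text{(lower bound on the Bayesian-to-classical ratio)} \\
  0.003 \times 3^{20} &\approx 1.0 \times 10^{7}
     && \text{(Bayesian pattern relative to the uniform reference)}
\end{align*}
```

So although 0.003 is small in absolute terms, it is roughly ten million times the probability a single pattern would receive under a uniform spread.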

3.5. Bayesian Analysis of the 'Sleep Behavior' Data

We now turn to the sleep behavior data. For the sake of comparison with the analysis of Section 2.2, we use a noninformative prior. In Figure 4 we show a histogram of the posterior distribution of $\delta$. This distribution puts most of its mass away from zero, indicating significant discrepancies between the data and the expectancies under the assumption of independence.

Table 6 shows the results of the Bayesian CFA for this example. Notice that both the classical and the Bayesian CFA lead to the same results in this case (compare the last columns of Tables 4 and 6). The posterior probability of the pattern provided by the two analyses is 0.409. In order to illustrate the importance of joint probabilistic statements concerning groups of cells, suppose we decide not to classify Cell 7-1 as an antitype and Cell 7-2 as a type. Then the probability of the resulting pattern decreases dramatically to less than 0.0001.

Unlike the previous example, in this case the posterior probability of the pattern produced by both the Bayesian and the classical preliminary analyses is not small. The posterior distribution over the space of all possible patterns is much more concentrated around the pattern obtained from the preliminary analysis. This suggests the data are more informative in this case.

TABLE 6 First Order Bayesian CFA Results for Sleep Behavior Data

Configuration   FO    Pr(Type)   Pr(Neither)   Pr(Antitype)   Type/Antitype?
11               19   0.9558     0.0442        0.0000         Type
12                3   0.0000     0.0442        0.9558         Antitype
21               20   0.9321     0.0679        0.0000         Type
22                4   0.0000     0.0679        0.9321         Antitype
31               16   0.8618     0.1382        0.0000         Type
32                3   0.0000     0.1382        0.8618         Antitype
41                5   0.0466     0.9415        0.0119
42                4   0.0119     0.9415        0.0466
51                4   0.0000     0.6496        0.3504
52               10   0.3504     0.6496        0.0000
61                8   0.0038     0.8948        0.1014
62               11   0.1014     0.8948        0.0038
71               65   0.0000     0.0046        0.9954         Antitype
72              101   0.9954     0.0046        0.0000         Type

FIGURE 4 Posterior distribution of the deviance equivalent $\delta$ for the data in Table 4.

As pointed out in Section 2.2, it is of interest to see whether the data support the notion of a causal fork. Specifically, we would like to test the following two relevant hypotheses:

1. the first three types reported in Table 6 form a fork (see Figure 1); and

2. the first three antitypes reported in Table 6 also form a fork (see Figure 2).

This is easily accomplished within the Bayesian framework by looking at the posterior probability of each of the hypotheses. These probabilities are given by

Pr(Cells 1-1, 2-1 and 3-1 are all types) = 0.762

and

Pr(Cells 1-2, 2-2 and 3-2 are all antitypes) = 0.762,

respectively. Therefore, the data support an interpretation in terms of a fork for the two patterns shown in Figures 1 and 2.


4. DISCUSSION

Viewed from a research strategy perspective, the Bayesian approach to CFA presented in this article embraces both the exploratory and the confirmatory research traditions. Bayesian CFA allows for exploratory search for types and antitypes. In this respect, researchers who use Bayesian CFA pursue the same goals as researchers who use the frequentist version of CFA. In addition, Bayesian CFA allows researchers to test hypotheses that concern groups of cells. Membership in a group of cells can be defined from theory or based on empirical results, or both. The examples given in this article used empirical results. In either case, testing whether a group of cells constitutes a composite type or a composite antitype is a confirmatory step rather than an exploratory one.

Providing the new option to test for the existence of composite types and composite antitypes leads to new research opportunities, both for substantive researchers who analyze cross-classifications and for methodologists. Substantive researchers can now test such hypotheses as proposed by von Eye and Brandtstädter (1997) using a one-step procedure rather than a procedure where each involved cell is examined separately. Thus, there now is the option to test for the existence of composite types/antitypes even if the individual type may fall short of significance, which is very likely when a table is large but the number of individuals in the table is small.

For methodologists, there are new options that need to be developed and tested. For instance, the above embracing of both exploratory and confirmatory research strategies can be redefined such that the research strategy becomes fully exploratory again. More specifically, it is one goal for the further development of Bayesian CFA to devise a search engine that identifies groups of types and antitypes. Such an engine could base the search on such concepts as the fork and the wedge (von Eye and Brandtstädter, 1997). In addition, other concepts that are rooted in substantive theories or logic can be considered. Even a completely free search is conceivable, for instance, when syndromes are of interest that are defined as agglomerations of symptoms.

Two more lines of further development of Bayesian CFA seem natural. The first of these lines concerns the definition of types and antitypes as such. In the present article (and in Wood et al., 1994) we used the standard definition that is related to the Pearson $X^2$. However, as was shown by von Eye et al. (1995), there are more ways to deviate from independence that can be exploited for the definition of types and antitypes. Bayesian CFA may be able to make additional contributions.

The second line concerns the use of a priori existing information. In the present article we used noninformative priors. In many research contexts, however, a priori information is available. This type of information comes in many forms. Examples include probabilities of variable categories that can be derived from distributional assumptions as, for instance, in genetics. Most important seem to be the so-called prior predictive distributions. These are distributions of dependent variables that are known beforehand and are not conditional on the previous observation of a process that establishes a causal relationship between independent and dependent variables. Consideration of prior predictive distributions may lead to a new model for prediction CFA, a model that helps avoid making wrong predictive conclusions because it takes characteristics of the dependent measure into account.

From the characteristics of the Bayesian and the frequentist approaches to CFA, recommendations can be derived as to which approach to select for data analysis. When (1) single types and antitypes are of interest, and (2) the observed cell frequencies are deemed valid for the estimation of expected cell frequencies, the noninformative Bayesian and the frequentist approaches are largely equivalent. Both allow 'the data to speak' and typically yield very similar results. If, however, prior information is of importance and if composite types and antitypes are searched for, the Bayesian approach is preferable.

The Bayesian approach presented in this article allows for joint inferential statements about groups of cells and does not require Bonferroni-type arguments. On the other hand, the simulation-based approach for the calculation of posterior probabilities can be appropriate even for moderately-sized cross-classifications, but can become unfeasible for larger problems.

Acknowledgments

The authors would like to thank Christof Schuster, Manuel Mendoza and two reviewers for their valuable comments. Eduardo Gutiérrez-Peña's work was partially supported by the Sistema Nacional de Investigadores, Mexico. Alexander von Eye's work on this article was supported in part by NIAA grant # ZRO1 AA7065.

REFERENCES

Bergman, L. R., Eklund, G. and Magnusson, D. (1991). Studying individual development: problems and methods. In D. Magnusson, L. R. Bergman, G. Rudinger and B. Törestadt (Eds.), Problems and Methods in Longitudinal Research (pp. 1-27). Cambridge, UK: Cambridge University Press.

Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. London: Chapman & Hall.

Görtelmeyer, R. (1988). Typologie des Schlafverhaltens [Typology of sleep behavior]. Regensburg: Roderer.

Havránek, T. and Lienert, G. A. (1984). Local and regional vs. global contingency testing. Biometrical Journal, 26, 483-494.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.

Hommel, G. (1988). A comparison of two modified Bonferroni procedures. Biometrika, 76, 624-625.

Küchenhoff, H. (1986). A note on a continuity correction for testing in three-dimensional configural frequency analysis. Biometrical Journal, 28, 465-468.

Lancaster, H. D. (1969). The Chi-Squared Distribution. New York: John Wiley.

Lehmacher, W. (1981). A more powerful simultaneous test procedure in configural frequency analysis. Biometrical Journal, 23, 429-436.

Lienert, G. A. (1969). Die "Konfigurationsfrequenzanalyse" als Klassifikationsmethode in der klinischen Psychologie [Configural Frequency Analysis as a classification method in clinical psychology]. In M. Irle (Ed.), Bericht über den 26. Kongreß der Deutschen Gesellschaft für Psychologie in Tübingen 1968 [Proceedings of the 26th Congress of the German Psychological Association] (pp. 244-253). Göttingen: Hogrefe.

Magnusson, D. (1985). Implications of an interactional paradigm for research on human development. International Journal of Behavioral Development, 8, 115-137.

Olejnik, S., Li, J., Supattathum, S. and Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational and Behavioral Statistics, 22, 389-406.

Spiel, C. and von Eye, A. (1993). Configural frequency analysis as a parametric method for the search for types and antitypes. Biometrical Journal, 35, 151-164.

von Eye, A. (1990). Introduction to Configural Frequency Analysis: The Search for Types and Antitypes in Cross-Classifications. Cambridge: Cambridge University Press.

von Eye, A. and Brandtstädter, J. (1997). Configural Frequency Analysis as a searching device for possible causal relationships. Methods of Psychological Research – Online, 2, 1-23.

von Eye, A., Lienert, G. A. and Wertheimer, M. (1991). Syndromkombinationen als Metasyndrome der KFA. Zeitschrift für Klinische Psychologie, Psychopathologie, and Psychotherapie, 39, 254-260.

von Eye, A. and Rovine, M. J. (1988). A comparison of significance tests for Configural Frequency Analysis. EDP in Medicine and Biology, 19, 6-13.

von Eye, A. and Schuster, C. (1998). On the Specification of Models for Configural Frequency Analysis – Sampling Schemes in Prediction CFA.

von Eye, A., Schuster, C. and Gutiérrez-Peña, E. (1999). Configural Frequency Analysis under retrospective and prospective sampling schemes – frequentist and Bayesian approaches. Psychologische Beitraege (in press).

von Eye, A., Spiel, C. and Rovine, M. J. (1995). Concepts of nonindependence in Configural Frequency Analysis. Journal of Mathematical Sociology, 20, 41-54.

von Eye, A., Spiel, C. and Wood, P. K. (1996). Configural Frequency Analysis in applied psychological research. Applied Psychology: An International Review, 45, 301-327.

Wood, P. K., Sher, K. and von Eye, A. (1994). Conjugate methods in Configural Frequency Analysis. Biometrical Journal, 36, 387-410.

Zucker, R. A. (1994). Pathways to alcohol problems and alcoholism: A developmental account of the evidence for multiple alcoholisms and contextual contributions to risk. In R. A. Zucker, J. Howard and G. Boyd (Eds.), The Development of Alcohol Problems: Exploring the Biopsychosocial Matrix of Risk. Rockville, MD: National Institute on Alcohol Abuse and Alcoholism.

APPENDIX A

Here we briefly describe the calculations of the posterior probabilities reported in Section 3.5. The corresponding probabilities for the example of Section 3.4 were computed analogously.

The example of Section 3.5 concerns a 7 × 2 cross-classification, and so there is a total of $K = 14$ cells. Recall that $\pi_i$ denotes the population probability for cell $i$, and let $\pi$ be the vector of such probabilities. Also, recall that the posterior distribution of $\pi$ is Dirichlet with parameter $\beta = (m_1 + 1/2, \ldots, m_K + 1/2)$. We shall find it convenient to introduce the following notation: let $\tilde{\pi}_{s,q}$ denote the population probability for Cell $s$-$q$ in the associated contingency table, so that (working row-wise) we have $\pi_1 = \tilde{\pi}_{1,1}$, $\pi_2 = \tilde{\pi}_{1,2}$, $\ldots$, $\pi_{14} = \tilde{\pi}_{7,2}$, and let $\tilde{\pi}$ be the vector of such probabilities. Now, under the first order model we have $\tilde{\pi}^*_{s,q} = f_{s,q}(\tilde{\pi}) = \tilde{\pi}_{s\cdot} \times \tilde{\pi}_{\cdot q}$, where $\tilde{\pi}_{s\cdot} = \sum_q \tilde{\pi}_{s,q}$ and $\tilde{\pi}_{\cdot q} = \sum_s \tilde{\pi}_{s,q}$.

In order to calculate the posterior probability that, say, Cell 1-1 is a type and Cell 1-2 is an antitype, we draw a sample $\{\pi^{(1)}, \ldots, \pi^{(T)}\}$ of size $T = 10{,}000$ from the posterior distribution of $\pi$. We then count the number $C$ of such draws simultaneously satisfying the following two conditions: $\tilde{\pi}_{1,1} > (\tilde{\pi}_{1\cdot} \times \tilde{\pi}_{\cdot 1}) + \varepsilon_{1,1}$ and $\tilde{\pi}_{1,2} < (\tilde{\pi}_{1\cdot} \times \tilde{\pi}_{\cdot 2}) - \varepsilon_{1,2}$, where $\varepsilon_{1,1}$ and $\varepsilon_{1,2}$ are suitable threshold values (here given by two times the posterior standard deviation of $\tilde{\pi}_{1,1} - (\tilde{\pi}_{1\cdot} \times \tilde{\pi}_{\cdot 1})$ and $\tilde{\pi}_{1,2} - (\tilde{\pi}_{1\cdot} \times \tilde{\pi}_{\cdot 2})$, respectively). Then the required probability is approximately $C/T$. The posterior probability of any other event concerning the population proportions $\pi$ (in particular, the posterior probability of any pattern of types and antitypes) can be calculated in a similar fashion.
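The counting procedure just described can be sketched in standard-library Python as follows. This is an illustrative re-expression under stated assumptions, not the authors' implementation: the function names, the fixed seed, and the use of `random.gammavariate` to build Dirichlet draws are choices made here. The frequencies are those listed in the Appendix B code, and the posterior parameter adds 1/2 to each count.

```python
import random
from statistics import stdev

# Observed frequencies of the 7 x 2 sleep behavior table, row-wise.
COUNTS = [19, 3, 20, 4, 16, 3, 5, 4, 4, 10, 8, 11, 65, 101]
N_ROWS, N_COLS = 7, 2


def dirichlet_draw(alpha, rng):
    """One Dirichlet draw: normalize independent Gamma(alpha_i, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def residuals(pi):
    """Cell probability minus (row margin x column margin), row-wise."""
    rows = [sum(pi[s * N_COLS:(s + 1) * N_COLS]) for s in range(N_ROWS)]
    cols = [sum(pi[q::N_COLS]) for q in range(N_COLS)]
    return [pi[s * N_COLS + q] - rows[s] * cols[q]
            for s in range(N_ROWS) for q in range(N_COLS)]


def joint_probability(n_draws=10_000, seed=0):
    """Estimate Pr(Cell 1-1 is a type and Cell 1-2 is an antitype) as C / T."""
    rng = random.Random(seed)  # seed fixed only for reproducibility
    alpha = [m + 0.5 for m in COUNTS]
    draws = [residuals(dirichlet_draw(alpha, rng)) for _ in range(n_draws)]
    # Threshold per cell: two posterior standard deviations of the residual.
    eps = [2 * stdev(col) for col in zip(*draws)]
    hits = sum(1 for d in draws if d[0] > eps[0] and d[1] < -eps[1])
    return hits / n_draws


prob = joint_probability()
print(prob)
```

Because the estimate is a Monte Carlo average, it varies slightly with the seed and the number of draws.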

In the following appendix we provide S-Plus code for the calculation of the posterior probabilities reported in Section 3.5.

APPENDIX B

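# Calculation of the posterior probabilities reported in Section 3.5 #

# External function: draws n samples from a Dirichlet distribution
# with parameter vector alpha of length k + 1

rdirich <- function(n, alpha, k){
    if(length(alpha) != k + 1)
        stop("alpha vector is the wrong length")
    km1 <- k + 1
    M <- matrix(0, n, km1)
    ve <- vector("numeric", n)
    for(i in 1:km1) {
        M[,i] <- rgamma(n, alpha[i])
        ve <- ve + M[,i]
    }
    M <- M/ve
    M
}

m <- c(19, 3, 20, 4, 16, 3, 5, 4, 4, 10, 8, 11, 65, 101)   # observed frequencies, row-wise
total <- sum(m)
tabla <- matrix(m, 7, 2, byrow = T)

p.i <- 1:7
p.j <- 1:2
for(i in 1:7){ p.i[i] <- sum(tabla[i,])/total }
for(j in 1:2){ p.j[j] <- sum(tabla[,j])/total }

m <- m + 0.5   # parameter of the Dirichlet posterior

N <- 1000      # number of posterior draws (Appendix A states T = 10,000)
sample <- rdirich(N, m, 13)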


p.i.aux <- p.i
p.j.aux <- p.j
sam.hat <- sample

# Expected cell probabilities under independence, for each draw
for(n in 1:N)
{
    tabla.aux <- matrix(sample[n,], 7, 2, byrow = T)
    for(i in 1:7){ p.i.aux[i] <- sum(tabla.aux[i,]) }
    for(j in 1:2){ p.j.aux[j] <- sum(tabla.aux[,j]) }
    sam.hat[n,] <- as.vector(t(p.i.aux %o% p.j.aux))
    print(n)
}

aux <- 0*(1:N)
for(k in 1:14)   # Computes the Deviance Equivalent #
{
    aux <- log(sample[,k]/sam.hat[,k])*sample[,k] + aux
}

win.graph()   # Draw histogram #
hist(aux)

P <- 1:14     # posterior probability that each cell exceeds its expectation
for(k in 1:14){
    P[k] <- sum(ifelse(sample[,k] > sam.hat[,k], 1, 0))/N
}

## Uses the modified definition of Types and Antitypes ##

e <- 1:14
P.mat <- matrix(0, 14, 3)
patt.bay <- 1:14
patt <- matrix(0, N, 14)
for(k in 1:14){
    e[k] <- 2*sqrt(var(sample[,k] - sam.hat[,k]))
    patt[,k] <- ifelse(sample[,k] > (sam.hat[,k] + e[k]), 1, 0)
    patt[,k] <- ifelse(sample[,k] < (sam.hat[,k] - e[k]), -1, patt[,k])
    patt[,k] <- ifelse(abs(sample[,k] - sam.hat[,k]) <= e[k], 0, patt[,k])


    P.mat[k,1] <- sum(ifelse(patt[,k] == 1, 1, 0))/N    # Posterior probability that the cell is a Type #
    P.mat[k,3] <- sum(ifelse(patt[,k] == -1, 1, 0))/N   # Posterior probability that the cell is an Antitype #
    P.mat[k,2] <- 1 - P.mat[k,1] - P.mat[k,3]           # Posterior probability that the cell is neither #

    if(P.mat[k,1] > max(P.mat[k,2], P.mat[k,3])){ patt.bay[k] <- 1 }
    if(P.mat[k,3] > max(P.mat[k,1], P.mat[k,2])){ patt.bay[k] <- -1 }
    if(P.mat[k,2] > max(P.mat[k,1], P.mat[k,3])){ patt.bay[k] <- 0 }
}

patt.other <- c(1,-1,1,-1,1,-1,0,0,0,0,0,0,0,0)

P.other <- 0
P.bay <- 0
for(n in 1:N){
    if(sum(ifelse(patt[n,] == patt.other, 1, 0)) == 14){ P.other <- P.other + 1 }
    if(sum(ifelse(patt[n,] == patt.bay, 1, 0)) == 14){ P.bay <- P.bay + 1 }
    print(n)
}

P.bay <- P.bay/N       ## Posterior probability of the pattern from the Bayesian analysis ##
P.other <- P.other/N   ## Posterior probability of the other pattern ##

## Test the hypotheses of a fork of Types and of a fork of Antitypes ##

fork.type <- (sample[,1] > (sam.hat[,1] + e[1])) &
             (sample[,3] > (sam.hat[,3] + e[3])) &
             (sample[,5] > (sam.hat[,5] + e[5]))

fork.antitype <- (sample[,2] < (sam.hat[,2] - e[2])) &
                 (sample[,4] < (sam.hat[,4] - e[4])) &
                 (sample[,6] < (sam.hat[,6] - e[6]))

P.fork.type <- sum(ifelse(fork.type, 1, 0))/N
P.fork.antitype <- sum(ifelse(fork.antitype, 1, 0))/N
