
Methods of Psychological Research Online 2001, Vol.6, No.3 Institute for Science Education

Internet: http://www.mpr-online.de © 2001 IPN Kiel

Using Mosaic Displays in Configural Frequency Analysis

Eun Young Mun, Alexander von Eye, Hiram E. Fitzgerald and Robert A. Zucker1

Abstract

The present study proposes using Mosaic displays to depict results of Configural Frequency

Analysis (CFA). The Mosaic display is a graphical method to examine multi-way

cross-tabulated data. Its unique strength as a graphical tool for CFA lies in its

capability to show more than one dimension of the data in such a way that proportions

can be compared at a glance. Mosaic displays allow one to illustrate

not only cell frequencies but also patterns of types/antitypes in CFA. In the

present study, we use real data examples to compare Mosaic displays with existing graphical

methods that examine cell frequencies or test statistics of types and antitypes.

Keywords: Configural frequency analysis, mosaic display, graphics, categorical data

1 Authors´ note: We would like to thank Christof Schuster for helpful comments on earlier versions of

this article. This research was supported, in part, by NIAAA grant #2 R01 AA07065 to Robert A. Zucker

and Hiram E. Fitzgerald. Correspondence concerning this article should be addressed to Eun

Young Mun, MSU-UM Longitudinal Study, 4660 South Hagadorn Road, Suite 620, East Lansing, MI

48823; Phone: (517) 353-5926, Fax: (517) 432-3764, Electronic mail: [email protected]



1. Introduction

Visual display serves a unique function in communicating findings that tables

or words cannot fulfill: Visual display adds impact to the intended message (Tukey, 1989,

1990). Although graphics can never replace the need for tables, computations, or

words, a good graph can provide an immediate and inescapable shot at a phenomenon

of interest (Tukey, 1989, 1990). The complementary relationship between a graphic

display and analysis is fittingly described in Tukey’s words: “A picture may be worth a

thousand words, but it may take a hundred words to do it” (Tukey, 1986; cf. Wainer,

1990).

Unfortunately, traditional graphs based on the two-dimensional space do not handle

multivariate relationships well (Wainer & Velleman, 2001). It is awkward to display

more than two variables, not to mention four. Therefore, there is a great demand for

graphical methods that help understand multivariate relationships. The present study

proposes using Mosaic displays (Friendly, 1992, 1994; Hartigan & Kleiner, 1981, 1984)

to depict results of Configural Frequency Analysis (CFA; Lienert, 1969; von Eye, 1990,

2001, in prep; von Eye, Spiel, & Wood, 1996). The Mosaic display can accommodate

the graphic needs of CFA. It simultaneously illustrates not only cell frequencies but

also patterns of types and/or antitypes in CFA. The present article describes CFA and

its needs for graphic displays, Mosaic displays, and it provides step-by-step demonstra-

tions of the Mosaic displays of CFA results using data examples.

2. Configural Frequency Analysis (CFA) and Its Needs for Visual Display

CFA is a multivariate method for typological research that involves categorical vari-

ables. CFA can be applied in both exploratory and confirmatory research. Using CFA,

researchers ask whether cells contain fewer or more cases than expected from some

chance model. Most of these models can be specified using log-linear models. A candi-

date for a chance model is virtually any log-linear model, including non-hierarchical and

non-standard models. Most CFA models can be expressed in terms of the log-frequency

model,

(1) $\log F = X\lambda,$


where F is the array of frequencies in the cross-tabulation, X is the indicator or de-

sign matrix that contains all vectors needed for the specified model including the inter-

cept, main effects, interaction effects, and/or covariate effects, and λ is the parameter

vector. In the “Classical” CFA (Lienert, 1969), expected frequencies are computed un-

der the assumption of total independence of all variables under study (von Eye, 1990).

However, virtually any set of assumptions can be incorporated into CFA, allowing more

complex hypotheses to be tested. Accordingly, there exists a variety of ways to test

whether a configuration constitutes a type and/or an antitype (von Eye, 2001).
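For concreteness, the classical base model of total independence can be written out for a four-way table. The block below is a sketch in standard log-linear notation; the superscripts A, B, C, and D are illustrative variable labels that do not appear in the text above, $f$ denotes observed counts with dots marking summation over an index, and $N$ is the sample size.

```latex
% Main-effect (total independence) model for a four-way table
\log F_{ijkl} = \lambda_0 + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_l^{D}

% Corresponding estimated expected cell frequencies
\hat{F}_{ijkl} = \frac{f_{i...}\, f_{.j..}\, f_{..k.}\, f_{...l}}{N^{3}}
```

Because only one-way marginals enter this model, any association among the variables appears as a discrepancy between observed and expected cell frequencies.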

When a cell contains more cases than expected it is said to constitute a CFA type.

When there are fewer cases than expected, a cell is said to constitute a CFA antitype.

For all the statistical tests for types and antitypes of a given configuration in CFA, a

general null hypothesis is expressed as

(2) $H_0: F_{mij} = F_{ij},$

where $F_{mij}$ is the model-based expected frequency and $F_{ij}$ is the true expected frequency of cell ij

of a given configuration (cf. DuMouchel, 1999). If $F_{ij} > F_{mij}$, a cell is said to constitute

a CFA type. If, in contrast, $F_{ij} < F_{mij}$, a cell is said to constitute a CFA antitype. If,

statistically, $F_{ij} = F_{mij}$, a cell constitutes neither a type nor an antitype2.

Identification of types and antitypes serves two key roles in CFA. First, the presence

of types and antitypes in a CFA model functions as a red flag to suggest that the hy-

pothesized model is not a good representation of the data; second, it shows where vari-

ables in a cross-classification are associated. A graphic display of types and antitypes

would facilitate these two key functions in CFA. However, few graphical techniques are

available to display types and antitypes of CFA. One way to illustrate types and anti-

types is to show test statistics of types and antitypes in a bar graph (e.g., von Eye &

Niedermeier, 1999, p. 195; see also Figure 6 in this article for an example). The height of a

bar represents the magnitude of test statistics of types and antitypes, and types and

antitypes can easily be identified by drawing two lines across bars representing critical

values. However, an ideal graphic display of CFA should feature not only types and

antitypes but also proportionate or raw cell frequencies of a given configuration. Test

statistics of types and antitypes result from discrepancies between the observed and expected

frequencies. The latter depend on the hypotheses of the model under

study. Therefore, the magnitude of test statistics of types and antitypes alone does not disclose

anything about the observed frequencies of a cross-tabulation, which facilitate understanding

of where local associations are reflected in a cross-classification.

2Throughout the present article, we identified types and antitypes in CFA base models assuming that

types and antitypes do not exist in the population. This assumption, however, may lead to incorrect

identification of types and antitypes when types/antitypes exist in a population. An alternative approach

to identify types and antitypes suggested by Kieser and Victor (1999) utilizes an additional model for

types/antitypes superimposed on the base model. Using this alternative approach may result in different

numbers and/or configurations of types and antitypes. Since the focus of the present article was to demonstrate

the usefulness of mosaic displays for CFA, however, we identified types and antitypes using the

classical CFA approach.

Observed cell frequencies of a configuration or multi-way contingency table have

typically been plotted in a bar graph by arraying multiple categorical dimensions into

one dimension listing all possible cell indices (Hartigan & Kleiner, 1981, 1984; Wang,

1985; see Figure 5 in this article for an example). Height of a bar represents the magni-

tude of a cell frequency proportional to others. For example, Mahoney (2000) converted

a three-way contingency table (4 × 2 × 2) to a two-way contingency table (4 × 4) by

arraying the last two categorical variables into one dimension with cell frequencies

shown in bars clustered by the first variable in a clustered bar chart.

There are ways to improve this type of bar chart by adding a third dimension or adjusting

the width of bars (see Clogg, Rudas, & Matthews, 1997). However, the Mosaic

display can also be an alternative for traditional two-dimensional bar charts in graphics

of multivariate relationships. Furthermore, to our knowledge, there is no graphical

technique yet developed to achieve the two critical features needed for the graphical

presentation of CFA simultaneously: display of types and antitypes and display of cell

frequencies. To accommodate these two features for graphic presentation of CFA results,

this article proposes using mosaic displays for CFA, and illustrates this technique

in comparison to the two previously mentioned techniques (e.g., Mahoney,

2000; von Eye & Niedermeier, 1999).

3. Mosaic Displays

The mosaic display, proposed by Hartigan and Kleiner (1981, 1984), is a graphical

method for examining cross-tabulated data. A mosaic, defined as the collection of tiles

or rectangles for an n-way contingency table, is formed by dividing a square n times,

alternating between vertical and horizontal divisions (or vice versa), until all cell configurations

are displayed. Each cell count is represented in a mosaic display by

a rectangle whose area is proportional to the cell frequency relative to the other cells, so that

the relative size of a tile or rectangle becomes an indicator for whether the observed data

deviate from the CFA base model. Relatively larger rectangles indicate larger observed

frequencies. Likewise, relatively smaller rectangles denote smaller observed frequencies.

If main effects or associations are hypothesized in the CFA base model or when infor-

mation other than cell frequencies is to be displayed, adding shade and color (Wang,

1985) or incorporating residuals or signs into the tiles (Friendly, 1994) helps determine

whether the observed data deviate from the hypothesized model. Thus, in CFA appli-

cations, relative sizes of tiles still indicate cell frequency but additional components such

as shading, color, numbers, or signs can address the deviation of the observed data from

the specified model. The usefulness of a mosaic display therefore goes beyond displaying

just frequencies. In addition, mosaic displays can be especially helpful for illustrating

multi-way contingency tables by examining successive mosaic displays sequentially as

successive variables are brought into the cross-tabulation (Friendly, 1994).

To summarize, there are two characteristics of a mosaic that suit the needs of CFA:

display of cell frequencies and patterns of types and antitypes. First, relative sizes of rectangles

in a mosaic display do not change as a function of the hypotheses or models tested

under study since rectangles or tiles reflect the observed frequencies of a cross-

tabulation. Therefore, regardless of the hypotheses or models tested, the size of a tile

always corresponds to the magnitude of an observed frequency of a given cross-

tabulation. Second, incorporating color, shading, sign, or numbers to the mosaic display

allows researchers to discriminate types and/or antitypes and determine whether the

tested model is a good representation of the data or not. In the following section, sev-

eral data examples are used to illustrate Mosaic displays in CFA with step-by-step de-

scriptions.

4. Data Examples

4.1. CFA Base Model

Consider the following data example. In a study on child behavior problems (Mun,

Fitzgerald, von Eye, Puttler, & Zucker, 2001; Zucker et al., 2000), a sample of 215 boys

was rated twice by parents using the Child Behavior Checklist for Ages 4-18 (CBCL;

Achenbach, 1991). The first rating occurred when the boys were between three and five


years old and the second rating occurred when they were six to eight years old. Follow-

ing Achenbach (1991), a T-score of 60 was used as the clinical cut-off for externalizing

and internalizing behavior problems in the clinical range. Based on the averaged paren-

tal ratings, boys were assigned to clinical levels of externalizing behavior problems at

wave 1 (E1), internalizing behavior problems at wave 1 (I1), externalizing behavior prob-

lems at wave 2 (E2), and internalizing behavior problems at wave 2 (I2). For all four

variables, E1, I1, E2, and I2, a category of one indicated behavior problems in the norma-

tive range and a category of two indicated behavior problems in the clinical range.

Table 1: Developmental Patterns of Behavior Problems among Boys

E1I1E2I2  Obs. Freq.   Exp. Freq.        L   Type/Antitype
1111             142       120.63     6.56   Type
1112              17        24.26    -2.79   Antitype
1121               5        15.16    -4.51   Antitype
1122               3         3.05     -.03
1211               8        11.02    -1.51
1212               2         2.22     -.16
1221               0         1.39    -1.26
1222               1          .28     1.39
2111              16        25.08    -3.46   Antitype
2112               1         5.04    -2.09
2121               6         3.15     1.80
2122               7          .63     8.22   Type
2211               2         2.29     -.21
2212               3          .46     3.83   Type
2221               0          .29     -.55
2222               2          .06     8.10   Type

Notes. E1 = externalizing behavior problems at wave 1 (Ages 3-5); I1 = internalizing behavior problems

at wave 1; E2 = externalizing behavior problems at wave 2 (Ages 6-8); I2 = internalizing behavior problems

at wave 2. Numerals in E1I1E2I2 column represent ordered quadruples of variable categories: 1 =

sub-clinical level behavior problems; 2 = clinical level behavior problems. L stands for Lehmacher’s test.

Bonferroni-adjusted alpha (.003125) was used as a critical alpha level.


Figure 1: Developmental Patterns of Behavior Problems Among Boys

This categorization scheme yielded the 2 × 2 × 2 × 2 cross-classification (E1 × I1 × E2 ×

I2). We analyzed this table under the assumption of total independence (i.e., the main-effect

model), which posits that the four variables are mutually independent. Table 1 shows the

observed and expected frequencies and types and antitypes of the data3. Figure 1 gives

the mosaic display of the cross-classification.
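The expected frequencies and the overall fit statistic of this base model can be recomputed from the observed counts alone. The following is a minimal sketch, assuming NumPy is available; the function name `independence_expected` is a hypothetical helper introduced here for illustration and is not part of any CFA program cited in this article.

```python
import numpy as np

# Observed frequencies from Table 1, cells ordered 1111, 1112, ..., 2222
# (last variable varies fastest), reshaped to E1 x I1 x E2 x I2.
observed = np.array([142, 17, 5, 3, 8, 2, 0, 1,
                     16, 1, 6, 7, 2, 3, 0, 2], dtype=float).reshape(2, 2, 2, 2)


def independence_expected(table):
    """Expected frequencies under total independence (main-effect model)."""
    total = table.sum()
    expected = np.full(table.shape, total)
    for axis in range(table.ndim):
        other_axes = tuple(a for a in range(table.ndim) if a != axis)
        proportions = table.sum(axis=other_axes) / total  # one-way marginal proportions
        shape = [1] * table.ndim
        shape[axis] = table.shape[axis]
        expected = expected * proportions.reshape(shape)  # broadcast along this variable
    return expected


expected = independence_expected(observed)
pearson_x2 = ((observed - expected) ** 2 / expected).sum()
# Degrees of freedom: cells - 1 - number of main-effect parameters
df = observed.size - 1 - sum(k - 1 for k in observed.shape)

print(round(expected[0, 0, 0, 0], 2), round(pearson_x2, 2), df)
```

The sketch covers only the expected frequencies and the Pearson statistic; Lehmacher's test statistics reported in Table 1 are not recomputed here.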

4.2. MOSAICS

All mosaic displays in the current study (Figures 1-4 and 7-13) were generated using

MOSAICS developed for the SAS/IML software (SAS Institute, 1989) by Friendly

(1992, 1994), which is available at http://www.math.yorku.ca/SCS/mosaics.html (see Footnote 4). For

ease of understanding, cell indices and legends were later edited into the figures in the

present study. Numbers inside or by the tiles in all mosaic figures are cell indices. In

addition, green and blue colors were consistently used to represent types and antitypes,

respectively. However, color, shading, arrangement of tiles, and size of the graph are

arbitrary and may be changed.

3Four expected frequencies (cell indices 1222, 2212, 2221, and 2222) were smaller than .5. Although we

acknowledge that these values were rather small, for the purpose of illustration, we decided to ignore this.

Likewise, we avoided invoking the delta option to compensate for cells with zero observations.

4Detailed description of the algorithm and a FORTRAN program as an alternative to the MOSAICS

program can be found in Wang (1985).

4.3. Sequential introduction of marginal totals

Figure 1 can be developed in a series of steps. The first step was to compute the

marginal totals of the table. Let $f_{ijkl}$ denote the count in cell ijkl for the present data,

let $f_{1...}$ through $f_{...2}$ denote the one-way marginals, $f_{11..}$ through $f_{..22}$ the two-way

marginals, and $f_{111.}$ through $f_{.222}$ the three-way marginals. The first block, representing

a proportion of one, was vertically divided into two blocks using the one-way marginal

totals for externalizing behavior problems at wave 1 (i.e., $f_{1...}$ and $f_{2...}$; see Figure 2).

The left oblong representing cell index 1... displayed 82.8% (178 cases) of the total sam-

ple and the right oblong representing cell index 2... displayed 17.2% (37 cases) of the

total sample. In the next step, the two oblongs representing cell indices 1... and 2...

were horizontally divided into four rectangles using two-way marginal totals for exter-

nalizing and internalizing behavior problems at wave 1 (see Figure 3). The rectangle for

cell 11.. was bigger than any other rectangles displaying 77.7% (167 cases) of the total

sample and 93.8% of the one-way marginal totals for cell 1... in Figure 2. The rectangle

for cell 22.. was the smallest of the four tiles showing only seven observations (3.3% of a

total sample). Thus, the size of a tile serves as a good approximate measure for an ob-

served frequency proportional to others in a given configuration. From CFA results,

cells 11.. and 22.. were identified as types, shown in green, whereas cells 12.. and 21.. were identified as

antitypes, shown in blue. In sum, behavior problems of three-to-five-year-old boys

appeared either across all observed variables or not at all5.

5Expected frequencies for 11.., 12.., 21.., and 22.. were 163.10, 14.90, 33.90, and 3.10, respectively. Lehmacher’s

test statistics (1981) were 2.54, -2.54, -2.54, and 2.54 in the same order.
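The marginal totals and percentages quoted in the first two steps can be checked directly from the cell counts in Table 1. The snippet below is a small sketch assuming NumPy; the variable names are illustrative only.

```python
import numpy as np

# Observed frequencies from Table 1 as an E1 x I1 x E2 x I2 array
# (cells ordered 1111, 1112, ..., 2222; last variable varies fastest).
observed = np.array([142, 17, 5, 3, 8, 2, 0, 1,
                     16, 1, 6, 7, 2, 3, 0, 2], dtype=float).reshape(2, 2, 2, 2)
n = observed.sum()

# Step 1: one-way marginals for E1 (cells 1... and 2...)
one_way_e1 = observed.sum(axis=(1, 2, 3))

# Step 2: two-way marginals for E1 x I1 (cells 11.., 12.., 21.., 22..)
two_way_e1_i1 = observed.sum(axis=(2, 3))

percent_of_total = 100 * two_way_e1_i1 / n
percent_within_e1 = 100 * two_way_e1_i1 / one_way_e1[:, None]

# Expected two-way frequencies under independence of E1 and I1
one_way_i1 = observed.sum(axis=(0, 2, 3))
expected_two_way = np.outer(one_way_e1, one_way_i1) / n

print(one_way_e1, two_way_e1_i1.ravel())
print(np.round(percent_of_total, 1), np.round(expected_two_way, 2))
```

The first split uses `one_way_e1 / n` as tile widths, and the second split uses `percent_within_e1` as the share of each oblong that goes to each level of I1.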


Figure 2: Developmental Patterns of Behavior Problems Among Boys: One-Way Marginal Totals

Figure 3: Developmental Patterns of Behavior Problems Among Boys: Two-Way Marginal Totals

Figure 4: Developmental Patterns of Behavior Problems Among Boys: Three-Way Marginal Totals

The third step was to vertically divide the four tiles in Figure 3 into eight tiles using

three-way marginal totals for both types of behavior problems at wave 1 and externaliz-

ing behavior problems at wave 2 (see Figure 4). The rectangles for cell indices 122. and

222. are very small indicating only one observed case and two observed cases, respec-

tively. The vertical split was asymmetric in that it favored a more even division for the

two-way marginal totals (cell indices 21.. and 22..) in comparison to a disproportionate

division for the other two-way marginal totals representing cell indices 11.. and 12...

The disproportionate and asymmetric split suggested that there may be associations

among these three categorical variables. It turned out that cells 111. , 212. , and 222.

emerged as types whereas cells 112. and 211. were identified as antitypes from CFA re-

sults6. Types and antitypes indicate that more boys than expected showed all-or-none

behavior problems (cells 111. and 222.), that boys with externalizing behavior problems

only at wave 1 also had externalizing behavior problems at wave 2 (cell 212.), and that

boys without any behavior problems at wave 1 were unlikely to have externalizing behavior

problems at wave 2 (cell 112.). Moreover, it was found less often than expected

that boys with externalizing behavior problems at wave 1 did not display those problems

at wave 2 (cell 211.). Finally, the eight tiles in Figure 4 were horizontally divided

yielding sixteen tiles based on each cell count, $f_{ijkl}$ (see Figure 1). The horizontal division

was even more asymmetric than the vertical split at the third step, pointing to possible

associations among the four categorical variables. In Figure 1, cell configurations

1221 and 2221 are illustrated with lines instead of tiles to show that the observed frequencies

are zero. More details on Figure 1 follow in the next section.

6Expected frequencies for 111., 112., 121., 122., 211., 212., 221., and 222. were 144.89, 18.21, 13.24, 1.66,

30.12, 3.78, 2.75, and .35, respectively. Lehmacher’s test statistics (1981) were 5.87, -5.14, -1.80, -.56,

-6.05, 5.44, 1.53, and 2.87 in the same order.
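The four steps described above amount to a simple recursive subdivision of the unit square. The sketch below illustrates that idea under stated assumptions: it is not the MOSAICS algorithm itself (which, among other things, adds spacing between tiles and shading), the function name `mosaic_tiles` is hypothetical, and odd-numbered steps are assumed to split widths while even-numbered steps split heights.

```python
import numpy as np


def mosaic_tiles(counts):
    """Return {cell index: (x, y, width, height)} for a unit-square mosaic.

    Variables are introduced in array-axis order; the first split is
    vertical (divides the width), the second horizontal, and so on.
    """
    counts = np.asarray(counts, dtype=float)
    tiles = {(): (0.0, 0.0, 1.0, 1.0)}  # start from the whole square
    for axis in range(counts.ndim):
        later_axes = tuple(range(axis + 1, counts.ndim))
        # Marginal table over the variables introduced so far
        marginal = counts.sum(axis=later_axes) if later_axes else counts
        split_tiles = {}
        for index, (x, y, width, height) in tiles.items():
            parent_total = marginal[index].sum()
            offset = 0.0
            for category in range(counts.shape[axis]):
                child_total = marginal[index + (category,)]
                share = child_total / parent_total if parent_total > 0 else 0.0
                if axis % 2 == 0:  # vertical division: split the width
                    tile = (x + offset * width, y, share * width, height)
                else:              # horizontal division: split the height
                    tile = (x, y + offset * height, width, share * height)
                split_tiles[index + (category,)] = tile
                offset += share
        tiles = split_tiles
    return tiles


# Observed frequencies from Table 1 (E1 x I1 x E2 x I2), zero-based indices
observed = np.array([142, 17, 5, 3, 8, 2, 0, 1,
                     16, 1, 6, 7, 2, 3, 0, 2]).reshape(2, 2, 2, 2)
tiles = mosaic_tiles(observed)
x, y, width, height = tiles[(0, 0, 0, 0)]  # cell 1111
print(width * height, 142 / 215)
```

Because each split multiplies either the width or the height by a conditional proportion, the area of every final tile equals its observed cell proportion, and cells with zero observations (1221 and 2221) collapse to lines.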

4.4. Results for the CFA base model

As expected, the CFA base model showed a poor fit. The Pearson X² = 169.39, with

df = 11, p < .01, suggests that the independence model is not a good representation of

the data. In addition, the CFA results show four types and three antitypes using

Lehmacher’s test (Lehmacher, 1981) with a Bonferroni-adjusted alpha level (α* =

0.003125). The Bonferroni adjustment of alpha was adopted to control for alpha inflation

due to, first, the simultaneous testing of multiple types and antitypes and, second, the

mutual dependency of these tests (see von Eye, 1990, in prep). Types were found in configurations

1111, 2122, 2212, and 2222. The first type (1111) indicates that there were more

cases than expected of neither externalizing nor internalizing behavior problems at

both waves. Type 2122 shows that there were more boys than expected with externaliz-

ing behavior problems at both waves, and internalizing behavior problems at wave 2

but not at wave 1. Type 2212 shows that there were more boys than expected with

internalizing behavior problems at both waves, and externalizing behavior problems at

wave 1 only. Type 2222 shows that there were more boys with externalizing and inter-

nalizing behavior problems at both waves than expected.

Antitypes were found in cell configurations 1112, 1121, and 2111. Antitype 1112 in-

dicates that fewer cases than expected were found of internalizing behavior problems at

wave 2 only. Antitype 1121 indicates that fewer observations than expected were found

of boys with externalizing behavior problems only at wave 2. Antitype 2111 indicates

that fewer boys than expected showed externalizing behavior problems only at wave 1.
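The decision rule behind these labels can be sketched as follows, taking the Lehmacher statistics from Table 1 as given rather than recomputing them. Two points are assumptions of this sketch rather than statements from the text: the nominal alpha of .05 (implied by .003125 × 16 cells) and the use of a one-sided standard normal critical value in each tail, which reproduces the decisions reported in Table 1.

```python
from statistics import NormalDist

# Cell indices and Lehmacher test statistics as reported in Table 1
cells = ["1111", "1112", "1121", "1122", "1211", "1212", "1221", "1222",
         "2111", "2112", "2121", "2122", "2211", "2212", "2221", "2222"]
lehmacher = [6.56, -2.79, -4.51, -0.03, -1.51, -0.16, -1.26, 1.39,
             -3.46, -2.09, 1.80, 8.22, -0.21, 3.83, -0.55, 8.10]

alpha = 0.05                      # assumed nominal alpha
alpha_star = alpha / len(cells)   # Bonferroni-adjusted alpha
# Assumed one-sided critical value of the standard normal distribution
z_critical = NormalDist().inv_cdf(1 - alpha_star)


def classify(statistic, critical):
    """Label a cell as 'Type', 'Antitype', or '' (neither)."""
    if statistic > critical:
        return "Type"
    if statistic < -critical:
        return "Antitype"
    return ""


labels = {cell: classify(z, z_critical) for cell, z in zip(cells, lehmacher)}
types = [cell for cell, label in labels.items() if label == "Type"]
antitypes = [cell for cell, label in labels.items() if label == "Antitype"]
print(alpha_star, types, antitypes)
```

The same rule with 8 or 12 cells yields the adjusted alpha levels of .00625 and .0041667 used for the later data examples.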


4.5. Alternate graphic methods

Table 1 can alternatively be plotted showing only proportional differences in a bar

graph using, for instance, SPSS 9.0 (SPSS, 1998). Figure 5 is a result of converting the

four-way contingency table (2 × 2 × 2 × 2) to a two-way contingency table (4 × 4). Figure

5 shows that some cell configurations had higher cell counts whereas other cell configurations

had lower cell counts. However, this technique is limited in that, first, it does not

display types and antitypes of CFA and, second, different arrangements of a multi-way contingency

table for a statistical analysis and a graphic illustration can create semantic

difficulties. Alternatively, Table 1 can be displayed with a focus on statistics of types

and antitypes as in von Eye and Niedermeier (1999). Figure 6 was drawn using S-Plus

4.5 (MathSoft, 1997). The height of the bars in this graph represents the magnitude of Lehmacher’s

test statistics. Bars below the zero line indicate that observed frequencies

were smaller than expected frequencies whereas bars above the zero line indicate that

observed frequencies were larger than expected frequencies. The two horizontal lines,

parallel above and below zero, indicate critical values of Lehmacher’s test statistics.

Bars extending beyond the upper and lower critical values indicate types and antitypes, respectively.

This graph clearly shows that cells 1111, 2122, 2212, and 2222 were types represented

by green bars and cells 1112, 1121, and 2111 were antitypes represented by blue bars.

This technique, however, is limited in that it does not provide information on cell fre-

quencies of a cross-tabulation. Therefore, a mosaic display seems to be a better fit for

CFA than the other two techniques.


Figure 5: An Alternative Approach to Mosaic Displays: A Bar Graph of Cell Frequencies

Figure 6: A Bar Graph of Test Statistics of Types and Antitypes


4.6. When all cells are types or antitypes

The following data example presents a situation when all cells are either types or

antitypes. This data example has been used by many researchers including Lienert

(1964), von Eye (1990), and Kieser and Victor (1999). Sixty-five students were treated with

LSD 50 and observed for the following three symptoms: Narrowed consciousness (C),

thought disturbance (T), and affective disturbance (A). Each of the symptoms had the

categories of presence or absence. Table 2 gives the resulting cross-tabulation. The

expected frequencies and tests of types and antitypes were computed under the assumption

of total independence of all symptoms. The CFA base model did not fit,

Pearson X² = 37.92, df = 4, p < .01. In addition, the CFA results showed four

types and four antitypes, all based on Lehmacher’s test with a Bonferroni-adjusted alpha

level (α* = 0.00625). Types were found in cells 111, 122, 212, and 221 while antitypes

were found in cells 112, 121, 211, and 222. Results can be briefly summarized as

follows. More cases with either all three symptoms or just a single symptom were found

than expected under the independence assumption. On the other hand, antitypes indicate

that fewer cases with either no symptom or exactly two symptoms were found than expected.

Detailed interpretations can be found in von Eye (1990, p. 34)7. Figure 7 presents

the mosaic display for the data. As before, types are shaded in green and antitypes

are shaded in blue.

7This data example generates one type (111) and one antitype (222) when analyzed using the approach

suggested by Kieser and Victor (1999).


Table 2: Leuner’s Syndrome Data

CTA       Obs. Freq.   Exp. Freq.        L   Type/Antitype
111               20        12.51     3.41   Type
112                1         6.85    -3.06   Antitype
121                4        11.40    -3.43   Antitype
122               12         6.24     3.09   Type
211                3         9.46    -3.12   Antitype
212               10         5.18     2.73   Type
221               15         8.63     3.13   Type
222                0         4.73    -2.75   Antitype

Notes. C = narrowed consciousness; T = thought disturbance; A = affective disturbance. Numerals in

CTA column represent ordered triples of variable categories: 1 = presence of symptom; 2 = absence of

symptom. L stands for Lehmacher’s test; Bonferroni-adjusted alpha (.00625) was used.

Figure 7: Leuner’s Syndrome Data


4.7. Entry order of variables

The following data examples show that a different entry order of categorical

variables into MOSAICS results in a mosaic with tiles of the same proportional size but

with a different planimetric arrangement. The first data set is from a study on the pre-

diction of performance in school, which has been used in von Eye and Brandtstädter

(1998). In this study, fluid intelligence (I) and performances in German (G) and

mathematics (M) were assessed (see Table 3). The expected frequencies and test statis-

tics of types and antitypes were computed under the assumption of total independence

of all variables. The CFA base model did not fit, Pearson X² = 67.58, df = 4, p <

.01. There were two types and three antitypes using Lehmacher’s test with a Bonferroni-adjusted

alpha level (α* = 0.00625). Figure 8 represents the data set with the

entry order that corresponds to the order of CFA shown in Table 3. The entry order

for the CFA base model, [I][G][M], was 111, 211, 121, 221, 112, 212, 122, and 222 in

MOSAICS. In MOSAICS, the first variable varies most rapidly across the columns of

cell indices whereas in most other programs the first variable varies most slowly.
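The difference between the two conventions can be made explicit by generating the cell indices both ways. The helper below is a hypothetical illustration written for this purpose; it is not part of MOSAICS, and it assumes single-digit category codes so that indices can be written as digit strings.

```python
from itertools import product


def cell_indices(levels, first_fastest):
    """List cell indices for variables with the given numbers of categories.

    first_fastest=True  : the first variable varies most rapidly
                          (the entry order described for MOSAICS).
    first_fastest=False : the first variable varies most slowly
                          (the order used in the CFA tables).
    """
    ranges = [range(1, k + 1) for k in levels]
    if first_fastest:
        # Iterate with the last variable slowest, then restore variable order.
        combos = [tuple(reversed(c)) for c in product(*reversed(ranges))]
    else:
        combos = list(product(*ranges))
    return ["".join(str(level) for level in combo) for combo in combos]


# Three dichotomous variables, as in the [I][G][M] example
print(cell_indices((2, 2, 2), first_fastest=True))
# ['111', '211', '121', '221', '112', '212', '122', '222']
print(cell_indices((2, 2, 2), first_fastest=False))
# ['111', '112', '121', '122', '211', '212', '221', '222']
```

When frequencies are copied from a CFA table into a program that expects the first variable to vary fastest, they therefore need to be reordered accordingly.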

Table 3: Fluid Intelligence and Performances in German and Mathematics

IGM       Obs. Freq.   Exp. Freq.        L   Type/Antitype
111               19         4.96     7.71   Type
112                1         7.17    -3.04   Antitype
121                9        14.25    -2.13
122               18        20.62    -1.00
211                3         4.85    -1.02
212                1         7.02    -2.98   Antitype
221                7        13.95    -2.83   Antitype
222               35        20.18     5.67   Type

Notes. I = fluid intelligence; G = performance in German; M = performance in mathematics. Numerals in

IGM column represent ordered triples of variable categories: 1 = below average; 2 = above average. L

stands for Lehmacher’s test; Bonferroni-adjusted alpha (.00625) was used.


Figure 8: Prediction of Performance in School, [I][G][M]

We then changed the order of categorical variables from [I][G][M] to [G][M][I]. Cell

indices entered into MOSAICS were in the following order: 111, 121, 112, 122, 211, 221,

212, and 222, which corresponded to the order for the CFA base model, [G][M][I]. Fig-

ure 9 represents the data. In this figure, the shape and the location of the tiles changed

but the relative sizes remained the same. For example, the tall oblong for cell index

221 in Figure 8 changed to a rectangle in Figure 9. However, the relative size of cell

221 stayed the same in Figures 8 and 9 in proportion to the total number of cases as well

as marginal totals.


Figure 9: Prediction of Performance in School: Different Entry Order, [G][M][I]

The second data example is from a recently reported study (Mahoney, 2000) in which

four groups of adolescent boys (G), and their records of school dropout (D) and criminal

arrest (C) were obtained to see whether there were associations among group informa-

tion and records of dropout and criminal arrest (see Table 4). The expected frequencies

and test statistics of types and antitypes were calculated under the assumption of total

independence. The CFA base model did not fit, Pearson X² = 137.66, df = 7, p <

.01. Three types and three antitypes were identified using Lehmacher’s test with a

Bonferroni-adjusted alpha level (α* = 0.0041667). Patterns of types and antitypes are

interpreted in detail in Mahoney (2000). Figure 10 shows a mosaic display for the CFA

base model, [G][D][C]. The order of cell indices entered into MOSAICS that corresponded

to CFA is as follows: 111, 211, 311, 121, 221, 321, 112, 212, 312, 122, 222, and

322. When the order of categorical variables was reversed to [C][D][G] (i.e., 111, 112,

121, 122, 211, 212, 221, 222, 311, 312, 321, and 322), the general look of the mosaic

changed due to differences in the order of introduction of marginal totals (see Figure

11). However, the sizes of tiles remained the same in proportion to the total number of

cases and marginal totals.


Table 4: Records of School Dropout and Criminal Arrest among Adolescent Boys

GDC       Obs. Freq.   Exp. Freq.        L   Type/Antitype
111              155       121.62     7.73   Type
112                9        22.64    -4.13   Antitype
121                6        24.23    -5.39   Antitype
122                3         4.51     -.78
211               63        64.68     -.44
212               10        12.04     -.72
221               11        12.89     -.65
222                8         2.40     3.82   Type
311               26        42.18    -5.01   Antitype
312                8         7.85      .06
321               13         8.41     1.86
322               13         1.56     9.51   Type

Notes. G = configuration group; D = school dropout; C = criminal arrest. Numerals in GDC column

represent ordered triples of variable categories: For G, 1 = configurations 1 and 2, characterized by competence

in all domains; 2 = configuration 3, characterized by low academic competence and high aggression;

3 = configuration 4, characterized by a multiple risk profile. For D and C, 1 = no; 2 = yes. L

stands for Lehmacher’s test; Bonferroni-adjusted alpha (.0041667) was used.


Figure 10: Records of School Dropout and Criminal Arrest Among Adolescent Boys, [G][D][C]

Figure 11: Records of School Dropout and Criminal Arrest Among Adolescent Boys: Reversed Order [C][D][G]


4.8. Non-Standard CFA Models

So far, the present study illustrated standard CFA base models using mosaic displays

with different data examples. From these results, usefulness of mosaic displays was ex-

amined in terms of display of cell frequencies and residuals. In addition, we demon-

strated that a different entry order of categorical variables generates a different look

overall but the relative sizes of tiles remain intact. In this section, we demonstrate that

mosaic displays can be applied to non-standard and non-hierarchical CFA models as

well.

Consider the following data example, a re-analysis of data published by Glück and

von Eye (2000). A sample of 181 high school students was administered the 24-item

cube comparison task. After completing each item, the students responded to questions

concerning the perceived difficulty of the item, the strategies they had employed to

process the item, and the perceived quality of their strategy (Glück, 1999). The three

strategies the students used to solve the cube comparison task were mental rotation

(R), pattern comparison (P), and change of viewpoint (V). Each strategy was scored as

not used = 1 and used = 2. For Gender (G), females were coded as 1 and males as 2.

Table 5 and Figure 12 display the results of first order CFA (i.e., the model of

total independence) with the normal approximation of the binomial test and the Bon-

ferroni-adjusted α* = 0.003125.

The results showed a rich pattern of types and antitypes with noticeable gender differences. Types indicate that there were more observations than expected for the following configurations: males who used only the change of viewpoint strategy (1122), males who used only the pattern comparison strategy (1212), males who used both the pattern comparison and the change of viewpoint strategies (1222), and females who used only the rotation strategy (2111). Antitypes indicate that there were fewer observations than expected for the following configurations: females who used no strategy (1111), males who used no strategy (1112), males who used both the rotation and the pattern comparison strategies (2212), and females who used all three strategies (2221). The CFA base model for the frequency distribution in Table 5 was rejected because of the large Pearson X² = 321.68 with df = 11, p < 0.01 (Likelihood Ratio (LR) = 380.84, df = 11, p < 0.01).


Table 5: First Order CFA of the Cross-Classification of Rotational Strategy (R), Pattern Comparison Strategy (P), Viewpoint Strategy (V), and Gender (G)

RPVG   Obs. Freq.   Exp. Freq.        L   Type/Antitype
1111           25        61.30    -4.68   Antitype
1112            5       103.19    -9.81   Antitype
1121           17        10.48     2.02
1122           42        17.65     5.81   Type
1211           98        88.27     1.05
1212          206       148.60     4.81   Type
1221           13        15.10     -.54
1222           64        25.42     7.68   Type
2111          486       398.58     4.65   Type
2112          729       670.92     2.49
2121           46        68.17    -2.71
2122           95       114.75    -1.88
2211          590       573.96      .73
2212          872       966.22    -3.58   Antitype
2221           39        98.17    -6.08   Antitype
2222          199       165.25     2.69

Notes. Numerals in RPVG column represent ordered quadruples of variable categories. Each strategy was scored as 1 = not used; 2 = used. For Gender, 1 = females; 2 = males. L stands for Lehmacher’s test; Bonferroni-adjusted alpha (.003125) was used.
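The computations behind Table 5 can be sketched in a few lines of Python. The sketch derives the expected frequencies under total independence from the one-way margins, applies the normal approximation of the binomial test named in the text (which reproduces the statistic column of the table), and sums the Pearson components. The function names are illustrative, and the one-sided critical value is an assumption, since the text does not state whether the tests were one- or two-sided; either choice flags the same eight configurations here.

```python
from math import sqrt
from statistics import NormalDist

# Observed frequencies from Table 5, keyed by configuration RPVG.
observed = {
    "1111": 25, "1112": 5, "1121": 17, "1122": 42,
    "1211": 98, "1212": 206, "1221": 13, "1222": 64,
    "2111": 486, "2112": 729, "2121": 46, "2122": 95,
    "2211": 590, "2212": 872, "2221": 39, "2222": 199,
}
n = sum(observed.values())

# One-way marginal totals for each of the four variables.
marginals = [
    {level: sum(f for cfg, f in observed.items() if cfg[pos] == level)
     for level in "12"}
    for pos in range(4)
]


def expected(cfg):
    """Expected frequency under total independence."""
    e = float(n)
    for pos, level in enumerate(cfg):
        e *= marginals[pos][level] / n
    return e


def z_binomial(f, e):
    """Normal approximation of the binomial test with p = e / n."""
    return (f - e) / sqrt(e * (1 - e / n))


alpha_star = 0.05 / len(observed)  # assumes a nominal alpha of 0.05
z_crit = NormalDist().inv_cdf(1 - alpha_star)  # assumption: one-sided

results = {}
for cfg, f in observed.items():
    e = expected(cfg)
    z = z_binomial(f, e)
    label = "Type" if z > z_crit else "Antitype" if z < -z_crit else ""
    results[cfg] = (e, z, label)
    print(f"{cfg} {f:5d} {e:8.2f} {z:6.2f} {label}")

pearson_x2 = sum((f - results[cfg][0]) ** 2 / results[cfg][0]
                 for cfg, f in observed.items())
print(f"Pearson X2 = {pearson_x2:.2f}")
```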


Figure 12: Patterns of Strategies: First-Order CFA Base Model (Glück & von Eye, 2000)

In addition to the four categorical variables used in Table 5, one could ask whether handedness is associated with the strategies adopted by males and females (Glück, 1999). If so, residuals would diminish and some or all of the types and antitypes would disappear. To test this hypothesis, we next added handedness as a covariate to the first-order CFA base model. Results showed a significant improvement over the previous CFA base model without the covariate (∆LR = 164.21; ∆df = 1; p < 0.01), although the model was not tenable by itself (X² = 168.14, LR = 216.63; df = 10; p < 0.01). Out of the eight types and antitypes in Table 5, only one antitype (1112) and three types (1122, 1212, and 2111) remained significant; one type (1222) and three antitypes (1111, 2212, and 2221) were eliminated (see Table 6 and Figure 13). Thus, the covariate handedness contributed significantly to the explanation of the observed frequency distribution. The changes in types and antitypes between these two nested analyses are clearly shown in Figures 12 and 13. In Figure 12, eight tiles are shaded in either green or blue; in Figure 13, only four tiles are still shaded in either color. The tiles in Figures 12 and 13 are identical in size because the observed frequencies are the same, but the shading pattern differs because the two CFA models yield different expected frequencies.
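The nested-model comparison reported above is a likelihood ratio difference test. A minimal Python sketch using only the LR values and degrees of freedom quoted in the text is given below; the helper name `chi2_sf_1df` is illustrative, and it relies on the identity that a chi-square variate with one degree of freedom is a squared standard normal.

```python
from math import erfc, sqrt

lr_base, df_base = 380.84, 11  # first-order CFA base model (Table 5)
lr_cov, df_cov = 216.63, 10    # base model plus the handedness covariate

delta_lr = lr_base - lr_cov
delta_df = df_base - df_cov


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return erfc(sqrt(x / 2.0))


p_value = chi2_sf_1df(delta_lr)
print(f"delta LR = {delta_lr:.2f}, delta df = {delta_df}, p = {p_value:.3g}")
```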


Table 6: First Order CFA of the Cross-Classification of Rotational Strategy (R), Pattern Comparison Strategy (P), Viewpoint Strategy (V), and Gender (G) with Handedness as Covariate

RPVG   Obs. Freq.   Exp. Freq.   Handedness   Test Statistic   Type/Antitype
1111           25        33.67          .99            -1.50
1112            5        87.34          .91            -8.92   Antitype
1121           17        16.11          .88              .22
1122           42        21.84          .89             4.33   Type
1211           98       106.90          .81             -.87
1212          206       134.85          .83             6.25   Type
1221           13        17.34          .85            -1.05
1222           64        51.96          .75             1.68
2111          486       419.00          .83             3.49   Type
2112          729       705.24          .81             1.00
2121           46        47.40          .92             -.21
2122           95       114.41          .85            -1.84
2211          590       646.91          .75            -2.48
2212          872       877.10          .76             -.20
2221           39        26.68          .98             2.40
2222          199       219.28          .74            -1.41

Notes. Numerals in RPVG column represent ordered quadruples of variable categories. Each strategy was scored as 1 = not used; 2 = used. For Gender, 1 = females; 2 = males.
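To see which types and antitypes survive the covariate, one can recompute the test statistics from the observed and expected frequencies printed in Table 6 and compare the flagged configurations with those of Table 5. The Python sketch below takes the expected frequencies as given, because the covariate model itself cannot be refitted from the table. It assumes the same normal approximation of the binomial test, a nominal alpha of 0.05, and a one-sided critical value; none of these is stated in the table notes.

```python
from math import sqrt
from statistics import NormalDist

# (observed, expected) pairs copied from Table 6, keyed by RPVG.
table6 = {
    "1111": (25, 33.67), "1112": (5, 87.34),
    "1121": (17, 16.11), "1122": (42, 21.84),
    "1211": (98, 106.90), "1212": (206, 134.85),
    "1221": (13, 17.34), "1222": (64, 51.96),
    "2111": (486, 419.00), "2112": (729, 705.24),
    "2121": (46, 47.40), "2122": (95, 114.41),
    "2211": (590, 646.91), "2212": (872, 877.10),
    "2221": (39, 26.68), "2222": (199, 219.28),
}
n = sum(f for f, _ in table6.values())

# Types and antitypes flagged in Table 5 (model without the covariate).
flagged_table5 = {"1111", "1112", "1122", "1212",
                  "1222", "2111", "2212", "2221"}

alpha_star = 0.05 / len(table6)  # assumes a nominal alpha of 0.05
z_crit = NormalDist().inv_cdf(1 - alpha_star)  # assumption: one-sided


def z_binomial(f, e):
    """Normal approximation of the binomial test with p = e / n."""
    return (f - e) / sqrt(e * (1 - e / n))


z_values = {cfg: z_binomial(f, e) for cfg, (f, e) in table6.items()}
flagged_table6 = {cfg for cfg, z in z_values.items() if abs(z) > z_crit}
eliminated = flagged_table5 - flagged_table6

pearson_x2 = sum((f - e) ** 2 / e for f, e in table6.values())

print("remaining: ", sorted(flagged_table6))
print("eliminated:", sorted(eliminated))
print(f"Pearson X2 = {pearson_x2:.2f}")
```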


Figure 13: Patterns of Strategies: Handedness as Covariate (Glück & von Eye, 2000)

As shown in the present study, mosaic displays using MOSAICS can illustrate standard/non-standard as well as hierarchical/non-hierarchical CFA with or without covariates. Most non-standard and/or non-hierarchical CFA models can be accommodated by providing configuration types to MOSAICS. Using residuals as deviations in MOSAICS is an alternative for more complex models with covariates, as shown in the last example. Appendix A provides SAS input as an example of a hierarchical CFA or standard log-linear model, and Appendix B provides SAS input as an example using Pearson residuals, $(f - F)/\sqrt{F}$, where f and F denote the observed and expected frequency, respectively.
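As an illustration of the residual-based input, the Python sketch below recomputes Pearson residuals for the observed frequencies listed in Appendix B. It assumes that the expected frequencies come from the model of total independence and that the first variable (E1) varies fastest in the frequency vector, as in the data step of Appendix A; under these assumptions the results agree with the `dev` values in Appendix B to three decimals.

```python
from itertools import product
from math import sqrt

# Observed frequencies from Appendix B; E1 varies fastest, then I1, E2, I2.
f = [142, 16, 8, 2, 5, 6, 0, 0, 17, 1, 2, 3, 3, 7, 1, 2]
n = sum(f)

# Cell index tuples (e1, i1, e2, i2) in the same order as f.
# product() varies its last factor fastest, so E1 is unpacked last.
cells = [(e1, i1, e2, i2) for i2, e2, i1, e1 in product((0, 1), repeat=4)]

# One-way marginal totals for each of the four variables.
marginals = [
    [sum(freq for cell, freq in zip(cells, f) if cell[v] == level)
     for level in (0, 1)]
    for v in range(4)
]


def expected(cell):
    """Expected frequency under total independence."""
    e = float(n)
    for v, level in enumerate(cell):
        e *= marginals[v][level] / n
    return e


# Pearson residual (f - F) / sqrt(F) for every cell.
residuals = [(freq - expected(cell)) / sqrt(expected(cell))
             for cell, freq in zip(cells, f)]

for cell, freq, r in zip(cells, f, residuals):
    label = "".join(str(level + 1) for level in cell)
    print(f"{label} {freq:4d} {expected(cell):8.3f} {r:7.3f}")
```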

5. Discussion

The present article evaluated three graphic methods of displaying CFA results. The first method, used by Mahoney (2000), focuses on the observed cell frequencies. The second method, used by von Eye and Niedermeier (1999), focuses on the magnitude of the test statistics used for the type/antitype tests. The third method, the Mosaic display (Friendly, 1992, 1994; Hartigan & Kleiner, 1981, 1984; Wang, 1985), incorporates both the observed cell frequencies and the type/antitype information from CFA. The present article illustrated the advantage of the Mosaic display over the other two methods for CFA: more than one dimension of the data can be illustrated simultaneously in the Mosaic display. Cell frequencies and the pattern of types and antitypes are critical features of CFA, and they can easily be illustrated using the Mosaic display.

Patterns of types and antitypes in CFA, in particular, allow one to understand whether there is a heterogeneous subset of the sample, and on what categories and levels this subset differs. A good graphical method can be instrumental in conveying these features of CFA and in improving our understanding of the data. The lack of graphical techniques for CFA, together with Tukey’s tenets (1989, 1990) on data-based graphics, was the major incentive for the present article. Six of Tukey’s points on visual display (1989, 1990), germane to Mosaic displays of CFA, are briefly discussed below.

5.1. Impact is important

Visual display of the data should be done in a powerful and intuitive way (Tukey, 1989, 1990). The Mosaic display is capable of doing this for CFA results. It can be as compelling a means of visual display for multivariate relationships as traditional bar charts are for univariate information or bivariate relationships. In particular, the mosaic simultaneously displays the cell-wise frequency and the type/antitype decision in one tile. Thus, the interesting point that the correlation between the size of a cell frequency and the presence of types/antitypes is weak at best can also be visualized.

5.2. Understanding graphics is not always automatic

Because of their novelty, Mosaic displays of CFA may not be understood easily at first sight. However, to be thoroughly understood, even familiar types of graphs may need explanations that come in the form of descriptions or legends (Tukey, 1989, 1990). This certainly applies to Mosaic displays in CFA. Once the reader knows how to look at it, all of the information carried by a Mosaic can easily be understood.

5.3. A graph can show us things easily that might not have been seen otherwise

The purpose of visual display is not to present numbers, but to compare (Tukey, 1989, 1990). Presenting an array of numbers in the form of a table may make it hard to see a relationship or the lack of one. Mosaics of CFA results can show patterns of types and antitypes and the relationship between frequency and type/antitype decision. In addition, with a proper selection of the order of variables, Mosaics can make the comparison of groups clear.

5.4. An understanding of purpose is needed

Graphs, like analytical methods, are selected based on what researchers are trying to get across to the audience. Different graphs serve different purposes. The Mosaic display of CFA allows one to depict (i) type and antitype patterns, (ii) the size of cells, and (iii) the relationship between frequencies and type/antitype patterns. If these are the purposes of analysis, as in CFA, the Mosaic display is the method of choice. If, however, researchers focus on the size of frequencies at the expense of type/antitype patterns, mosaics carry too much information and can be replaced by simpler graphical methods, e.g., bar graphs.

5.5. The absence of phenomena is itself a phenomenon

In the present context, the “absence of phenomena” can be viewed as, first, the absence of types or antitypes and, second, the presence of antitypes. The Mosaic display of CFA unequivocally depicts the absence of phenomena as well as the presence of phenomena. In first-order CFA, the absence of types or antitypes implies total independence among variables, which is rarely observed in CFA applications. So when it happens, the absence of types or antitypes may require explanation. In addition, and more interestingly, there are situations where the main focus of research is whether certain configurations exist, a question that can be addressed by the visual display of the presence of antitypes using Mosaic displays.

5.6. Color is a disappointment

Although color is not yet an effective means of representing quantitative values, it is a useful labeling means in general (Tukey, 1990; Wainer, 1990). Color can be used effectively for qualitative phenomena at two or three levels. In the Mosaic displays of CFA, where color is used to tell whether a type or antitype exists and which of the two it is, color is a powerful means of visual display. Moreover, the increased use of color in print and the growing number of publications on CD-ROM or on the web will make color a more viable means to illustrate quantitative as well as qualitative information.


References

[1] Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991 profile. Burlington, VT: University of Vermont, Department of Psychiatry.

[2] Clogg, C. C., Rudas, T., & Matthews, S. (1997). Analysis of contingency tables using graphical displays based on the mixture index of fit. In J. Blasius, & M. Greenacre (Eds.), Visualization of categorical data. New York: Academic Press.

[3] DuMouchel, W. (1999). Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician, 53, 177-190.

[4] Friendly, M. (1992). User's guide for MOSAICS (Tech. Rep. No. 206). York University, Department of Psychology.

[5] Friendly, M. (1994). Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89, 190-200.

[6] Glück, J. (1999). Spatial strategies - cognitive strategies on spatial tasks. Unpublished dissertation. University of Vienna, Department of Psychology.

[7] Glück, J., & von Eye, A. (2000). Including covariates in Configural Frequency Analysis. Psychologische Beiträge, 42, 405-417.

[8] Hartigan, J. A., & Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), Proceedings of the 13th symposium on the interface between computer science and statistics (pp. 268-273). New York: Springer-Verlag.

[9] Hartigan, J. A., & Kleiner, B. (1984). A mosaic of television ratings. The American Statistician, 38(1), 32-35.

[10] Kieser, M., & Victor, N. (1999). Configural frequency analysis (CFA) revisited - A new look at an old approach. Biometrical Journal, 41, 967-983.

[11] Lehmacher, W. (1981). A more powerful simultaneous test procedure in configural frequency analysis. Biometrical Journal, 23(5), 429-436.

[12] Leuner, H. C. (1962). Die experimentelle Psychose. Berlin: Springer.

[13] Lienert, G. A. (1969). Die “Konfigurationsfrequenzanalyse” als Klassifikationsmethode in der klinischen Psychologie. In M. Irle (Ed.), Bericht über den 26. Kongreß der Deutschen Gesellschaft für Psychologie in Tübingen 1968 (pp. 244-253). Göttingen: Hogrefe.

[14] Mahoney, J. L. (2000). School extracurricular activity participation as a moderator in the development of antisocial patterns. Child Development, 71(2), 502-516.

[15] MathSoft (1997). S-Plus user’s guide. Seattle, WA: MathSoft, Inc.

[16] Mun, E. Y., Fitzgerald, H. E., von Eye, A., Puttler, L. I., & Zucker, R. A. (2001). Temperamental characteristics as predictors of externalizing and internalizing child behavior problems in the contexts of high and low parental psychopathology. Infant Mental Health Journal, 22(3), 393-415.

[17] SAS Institute (1989). SAS/IML Software: Usage and reference, version 6, first edition. Cary, NC: SAS Institute.

[18] SPSS Inc. (1998). SPSS 9.0. Chicago, IL: SPSS, Inc.

[19] Tukey, J. W. (1986). Sunset salvo. American Statistician, 40, 72-76.

[20] Tukey, J. W. (1989). Data-based graphics: Visual display in the decades to come. In Gail, M.H., & Johnson, N.L. (Coordinators): Sesquicentennial invited paper sessions. Proceedings of the American Statistical Association (pp. 366-381). Alexandria, VA: American Statistical Association.

[21] Tukey, J. W. (1990). Data-based graphics: Visual display in the decades to come. Statistical Science, 5(3), 327-339.

[22] von Eye, A. (1990). Introduction to Configural Frequency Analysis: The search for types and antitypes in cross-classifications. Cambridge: Cambridge University Press.

[23] von Eye, A. (2001). Configural Frequency Analysis - Version 2000: A program for 32 bit Windows operating systems. Methods of Psychological Research-Online, 6(2), 129-139.

[24] von Eye, A. (in prep). Configural Frequency Analysis. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

[25] von Eye, A., & Niedermeier, K. E. (1999). Statistical analysis of longitudinal categorical data in the social and behavioral sciences. Mahwah, NJ: Erlbaum.


[26] von Eye, A., Spiel, C., & Wood, P. K. (1996). Configural Frequency Analysis in applied psychological research. Applied Psychology: An International Review, 45, 301-327.

[27] Wainer, H. (1989). Discussion: Graphical visions from William Playfair to John Tukey. In Gail, M.H., & Johnson, N.L. (Coordinators): Sesquicentennial invited paper sessions. Proceedings of the American Statistical Association (pp. 382-390). Alexandria, VA: American Statistical Association.

[28] Wainer, H. (1990). Graphical visions from William Playfair to John Tukey. Statistical Science, 5(3), 340-346.

[29] Wainer, H., & Velleman, P. F. (2001). Statistical graphics: Mapping the pathways of science. Annual Review of Psychology, 52, 305-335.

[30] Wang, C. M. (1985). Applications and computing of mosaics. Computational Statistics & Data Analysis, 3, 89-97.

[31] Zucker, R. A., Fitzgerald, H. E., Refior, S. K., Puttler, L. I., Pallas, D., & Ellis, D. A. (2000). The clinical and social ecology of childhood for children of alcoholics: Description of a study of implications for a differentiated social policy. In H. E. Fitzgerald, B. M. Lester, & B. Zuckerman (Eds.), Children of addiction: Research, health, and public policy issues (pp. 109-142). New York: Routledge/Falmer.

Appendix A

SAS MOSAIC input for the Mun et al. (2001) data: Using observed frequencies

filename mosaics 'c:\sas\sasuser\mosaics\';
libname mosaic 'c:\sas\sasuser\mosaics\';

/* Observed frequencies; E1 varies fastest, then I1, E2, I2 */
data infant;
input E1 I1 E2 I2 freq;
cards;
1 1 1 1 142
2 1 1 1 16
1 2 1 1 8
2 2 1 1 2
1 1 2 1 5
2 1 2 1 6
1 2 2 1 0
2 2 2 1 0
1 1 1 2 17
2 1 1 2 1
1 2 1 2 2
2 2 1 2 3
1 1 2 2 3
2 1 2 2 7
1 2 2 2 1
2 2 2 2 2
;

proc iml;
use infant;
read all var {freq} into table;
levels={2 2 2 2};
vnames={'E1' 'I1' 'E2' 'I2'};
lnames={'E1:no' 'E1:yes',
'I1:no' 'I1:yes',
'E2:no' 'E2:yes',
'I2:no' 'I2:yes'};
goptions hsize=7 in vsize=7 in;
/* Load the stored MOSAICS modules */
reset storage=mosaic.mosaic;
load module=_all_;
split={v h};
htext={1};
colors={green blue};
shade={1.4};
plots={4};
/* User-specified model: each column of config names one fitted margin, */
/* here the four one-way margins (first-order CFA base model) */
fittype='user';
config=t({1 0, 2 0, 3 0, 4 0});
title='infant';
run mosaic (levels, table, vnames, lnames, plots, title);
quit;

Appendix B

SAS MOSAIC input for the Mun et al. (2001) data: Using residuals

filename mosaics 'c:\sas\sasuser\mosaics\';
libname mosaic 'c:\sas\sasuser\mosaics\';

proc iml;
/* Number of levels of each variable */
infant={2 2 2 2};
/* Observed frequencies; E1 varies fastest, then I1, E2, I2 */
f={142 16 8 2 5 6 0 0 17 1 2 3 3 7 1 2};
title={'infant'};
vnames={'E1' 'I1' 'E2' 'I2' };
lnames={'E1:no' 'E1:yes',
'I1:no' 'I1:yes',
'E2:no' 'E2:yes',
'I2:no' 'I2:yes'};
goptions hsize=7 in vsize=7 in;
/* Load the stored MOSAICS modules and the mosaicd module */
reset storage=mosaic.mosaic;
load module=_all_;
%include 'c:\sas\sasuser\mosaics\mosaicd.sas';
/* Pearson residuals, one per cell, in the same order as f */
dev={1.946 -1.812 -.910 -.192 -2.609 1.605 -1.177 -.537
-1.474 -1.800 -.146 3.741 -.028 7.998 1.367 8.071};
split={v h};
htext=1;
colors={green blue};
shade={1.4};
run mosaicd (infant, f, vnames, lnames, dev, title);
quit;
