Methods of Psychological Research Online 2001, Vol.6, No.3 Institute for Science Education
Internet: http://www.mpr-online.de © 2001 IPN Kiel
Using Mosaic Displays in Configural Frequency Analysis
Eun Young Mun, Alexander von Eye, Hiram E. Fitzgerald and Robert A. Zucker1
Abstract
The present study proposes using Mosaic displays to depict results of Configural Fre-
quency Analysis (CFA). The Mosaic display is a graphical method to examine multi-
way cross-tabulated data. Its unique strength as a graphical tool for CFA lies in its
capability to show more than one dimension of the data in a way that instant compari-
sons of proportions can be made without much effort. Mosaic displays allow one to il-
lustrate not only cell frequencies but also patterns of types/antitypes in CFA. In the
present study, we compare Mosaic displays using real data examples to existing graphi-
cal methods that examine cell frequencies or test statistics of types and antitypes.
Keywords: Configural frequency analysis, mosaic display, graphics, categorical data
1 Authors´ note: We would like to thank Christof Schuster for helpful comments on earlier versions of
this article. This research was supported, in part, by NIAAA grant #2 R01 AA07065 to Robert A. Zucker
and Hiram E. Fitzgerald. Correspondence concerning this article should be addressed to Eun
Young Mun, MSU-UM Longitudinal Study, 4660 South Hagadorn Road, Suite 620, East Lansing, MI
48823; Phone: (517) 353-5926, Fax: (517) 432-3764, Electronic mail: [email protected]
E.Y. Mun, A. von Eye, H.E. Fitzgerald, & R.A.Zucker: Using Mosaic Displays in CFA 165
1. Introduction
Visual display serves a unique function in the communication of findings that tables
or words cannot fulfill: It adds impact to the intended message (Tukey, 1989,
1990). Although graphics can never replace the need for tables, computations, or
words, a good graph can provide an immediate and inescapable shot at a phenomenon
of interest (Tukey, 1989, 1990). The complementary relationship between a graphic
display and analysis is fittingly described in Tukey’s words. “A picture may be worth a
thousand words, but it may take a hundred words to do it” (Tukey, 1986; cf. Wainer,
1990).
Unfortunately, traditional graphs based on the two-dimensional space do not handle
multivariate relationships well (Wainer & Velleman, 2001). It is awkward to display
more than two variables, not to mention four. Therefore, there is a great demand for
graphical methods that help understand multivariate relationships. The present study
proposes using Mosaic displays (Friendly, 1992, 1994; Hartigan & Kleiner, 1981, 1984)
to depict results of Configural Frequency Analysis (CFA; Lienert, 1969; von Eye, 1990,
2001, in prep; von Eye, Spiel, & Wood, 1996). The Mosaic display can accommodate
the graphic needs of CFA. It simultaneously illustrates not only cell frequencies but
also patterns of types and/or antitypes in CFA. The present article describes CFA and
its need for graphic displays, introduces Mosaic displays, and provides step-by-step
demonstrations of Mosaic displays of CFA results using data examples.
2. Configural Frequency Analysis (CFA) and Its Needs for
Visual Display
CFA is a multivariate method for typological research that involves categorical vari-
ables. CFA can be applied in both exploratory and confirmatory research. Using CFA,
researchers ask whether cells contain fewer or more cases than expected from some
chance model. Most of these models can be specified using log-linear models. A candi-
date for a chance model is virtually any log-linear model, including non-hierarchical and
non-standard models. Most CFA models can be expressed in terms of the log-frequency
model,
(1) $\log F = X\lambda,$
166 MPR-Online 2001, No. 3
where F is the array of frequencies in the cross-tabulation, X is the indicator or de-
sign matrix that contains all vectors needed for the specified model including the inter-
cept, main effects, interaction effects, and/or covariate effects, and λ is the parameter
vector. In the “Classical” CFA (Lienert, 1969), expected frequencies are computed un-
der the assumption of total independence of all variables under study (von Eye, 1990).
However, virtually any set of assumptions can be incorporated into CFA, allowing more
complex hypotheses to be tested. Accordingly, there exists a variety of ways to test
whether a configuration constitutes a type and/or an antitype (von Eye, 2001).
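
As a hedged illustration of Equation 1, the following Python sketch builds a design matrix X for a main-effect (total independence) base model of a 2 × 2 × 2 × 2 table and recovers the expected frequencies as exp(Xλ). The one-way marginal counts are those implied by Table 1 in Section 4. The effect coding (+1/-1) and all variable names are illustrative assumptions, not part of any CFA program.

```python
import math
from itertools import product

# One-way marginal counts (category 1, category 2) for E1, I1, E2, I2,
# summed from the observed frequencies in Table 1 (N = 215).
marginals = [(178, 37), (197, 18), (191, 24), (179, 36)]
N = 215

# Design matrix X: intercept column plus one effect-coded column per variable
# (category 1 -> +1, category 2 -> -1). Effect coding is an assumption.
cells = list(product((1, 2), repeat=4))            # 1111, 1112, ..., 2222
X = [[1] + [1 if c == 1 else -1 for c in cell] for cell in cells]

# Parameter vector lambda of the main-effect model under this coding.
log_p = [(math.log(a / N), math.log(b / N)) for a, b in marginals]
lam = [math.log(N) + sum((l1 + l2) / 2 for l1, l2 in log_p)]   # intercept
lam += [(l1 - l2) / 2 for l1, l2 in log_p]                     # main effects

# log F = X lambda, hence F = exp(X lambda).
F = [math.exp(sum(x * l for x, l in zip(row, lam))) for row in X]
expected = dict(zip(cells, F))

# Degrees of freedom: number of cells minus number of model parameters.
df = len(cells) - len(lam)
```

Under these assumptions, `expected[(1, 1, 1, 1)]` agrees with the expected frequency reported for cell 1111 in Table 1, and `df` equals 11.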
When a cell contains more cases than expected it is said to constitute a CFA type.
When there are fewer cases than expected, a cell is said to constitute a CFA antitype.
For all the statistical tests for types and antitypes of a given configuration in CFA, a
general null hypothesis is expressed as
(2) $H_0: F_{mij} = F_{ij},$
where Fmij is the model-based expected and Fij is the true expected frequency of cell ij
of a given configuration (cf. DuMouchel, 1999). If Fij > Fmij , a cell is said to constitute
a CFA type. If, in contrast, Fij < Fmij, a cell is said to constitute a CFA antitype. If,
statistically, Fij = Fmij, a cell constitutes neither a type nor an antitype2.
Identification of types and antitypes serves two key roles in CFA. First, the presence
of types and antitypes in a CFA model functions as a red flag to suggest that the hy-
pothesized model is not a good representation of the data; second, it shows where vari-
ables in a cross-classification are associated. A graphic display of types and antitypes
would facilitate these two key functions in CFA. However, few graphical techniques are
available to display types and antitypes of CFA. One way to illustrate types and anti-
types is to show test statistics of types and antitypes in a bar graph (e.g., von Eye &
Niedermeier, 1999, p. 195; see also Figure 6 in this article for an example). Height of a
2Throughout the present article, we identified types and antitypes in CFA base models assuming that
types and antitypes do not exist in the population. This assumption, however, may lead to incorrect
identification of types and antitypes when types/antitypes exist in a population. An alternative approach
to identify types and antitypes suggested by Kieser and Victor (1999) utilizes an additional model for
types/antitypes superimposed on the base model. Using this alternative approach may result in different
numbers and/or configurations of types and antitypes. Since the focus of the present article was to de-
monstrate usefulness of mosaic displays for CFA, however, we identified types and antitypes using the
classical CFA approach.
bar represents the magnitude of test statistics of types and antitypes, and types and
antitypes can easily be identified by drawing two lines across bars representing critical
values. However, an ideal graphic display of CFA should feature not only types and
antitypes but also proportionate or raw cell frequencies of a given configuration. Test
statistics of types and antitypes result from discrepancies between the observed and expected
frequencies. The latter depend on the hypotheses of the model under study.
Therefore, the magnitude of the test statistics alone discloses nothing about the
observed frequencies of a cross-tabulation, which help show where local associations
appear in a cross-classification.
Observed cell frequencies of a configuration or multi-way contingency table have
typically been plotted in a bar graph by arraying multiple categorical dimensions into
one dimension listing all possible cell indices (Hartigan & Kleiner, 1981, 1984; Wang,
1985; see Figure 5 in this article for an example). Height of a bar represents the magni-
tude of a cell frequency proportional to others. For example, Mahoney (2000) converted
a three-way contingency table (4 × 2 × 2) to a two-way contingency table (4 × 4) by
arraying the last two categorical variables into one dimension with cell frequencies
shown in bars clustered by the first variable in a clustered bar chart.
There are ways to improve this type of bar chart by adding a third dimension or adjusting
the width of bars (see Clogg, Rudas, & Matthews, 1997). However, the Mosaic
display can also be an alternative for traditional two-dimensional bar charts in graphics
of multivariate relationships. Furthermore, to our knowledge, there is no graphical
technique yet developed to achieve the two critical features needed for the graphical
presentation of CFA simultaneously: display of types and antitypes and display of cell
frequencies. To accommodate these two features for graphic presentation of CFA re-
sults, this article proposes using mosaic displays for CFA, and illustrates this technique
in comparison to the two previously mentioned techniques (e.g., Mahoney,
2000; von Eye & Niedermeier, 1999).
3. Mosaic Displays
The mosaic display, proposed by Hartigan and Kleiner (1981, 1984), is a graphical
method for examining cross-tabulated data. A mosaic, defined as the collection of tiles
or rectangles for the n-way contingency table, is formed by dividing a square n times
vertically and then horizontally (or vice versa) in a successive manner until all cell con-
figurations are displayed. Each of the cell counts is represented in a mosaic display by
a rectangular area proportional to the cell frequencies of other cell configurations so that
relative size of a tile or rectangle becomes an indicator for whether the observed data
deviate from the CFA base model. Relatively larger rectangles suggest large observed
frequencies. Likewise, relatively smaller rectangles denote smaller observed frequencies.
If main effects or associations are hypothesized in the CFA base model or when infor-
mation other than cell frequencies is to be displayed, adding shade and color (Wang,
1985) or incorporating residuals or signs into the tiles (Friendly, 1994) helps determine
whether the observed data deviate from the hypothesized model. Thus, in CFA appli-
cations, relative sizes of tiles still indicate cell frequency but additional components such
as shading, color, numbers, or signs can address the deviation of the observed data from
the specified model. The usefulness of a mosaic display therefore goes beyond displaying
just frequencies. In addition, mosaic displays can be especially helpful for illustrating
multi-way contingency tables by examining successive mosaic displays sequentially as
successive variables are brought into the cross-tabulation (Friendly, 1994).
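
The successive division described above can be sketched in Python as follows. This is a minimal illustration, not the MOSAICS algorithm: it assumes that variables are entered in index order, that splits alternate between vertical and horizontal starting with a vertical cut, and that the gaps between tiles are omitted. The function name `mosaic_tiles` is hypothetical. The data are the observed frequencies of Table 1 in Section 4.

```python
from itertools import product

def mosaic_tiles(freq, n_vars):
    """Return {cell: (x, y, width, height)} for a mosaic in the unit square.

    Steps 1, 3, ... cut vertically (divide the width); steps 2, 4, ... cut
    horizontally (divide the height). Gaps between tiles are not modeled.
    """
    def total(prefix):
        # Marginal total over all cells that start with the given prefix.
        return sum(f for cell, f in freq.items() if cell[:len(prefix)] == prefix)

    tiles = {(): (0.0, 0.0, 1.0, 1.0)}
    for step in range(n_vars):
        levels = sorted({cell[step] for cell in freq})
        new_tiles = {}
        for prefix, (x, y, w, h) in tiles.items():
            parent = total(prefix)
            offset = 0.0
            for level in levels:
                child = prefix + (level,)
                share = total(child) / parent if parent else 0.0
                if step % 2 == 0:      # vertical cut
                    new_tiles[child] = (x + offset * w, y, share * w, h)
                else:                  # horizontal cut
                    new_tiles[child] = (x, y + offset * h, w, share * h)
                offset += share
        tiles = new_tiles
    return tiles

# Observed frequencies of Table 1, cells ordered 1111, 1112, ..., 2222.
counts = [142, 17, 5, 3, 8, 2, 0, 1, 16, 1, 6, 7, 2, 3, 0, 2]
observed = dict(zip(product((1, 2), repeat=4), counts))
tiles = mosaic_tiles(observed, 4)
areas = {cell: w * h for cell, (x, y, w, h) in tiles.items()}
```

In this sketch the area of each tile equals the cell's share of the total sample, and a cell with zero observations collapses to a tile of zero height or width, that is, to a line.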
To summarize, there are two characteristics of a mosaic that suit the needs of CFA:
display of cell frequencies and patterns of type/antitype. First, relative sizes of rectan-
gles in a mosaic display do not change as a function of the hypotheses or models tested
under study since rectangles or tiles reflect the observed frequencies of a cross-
tabulation. Therefore, regardless of the hypotheses or models tested, the size of a tile
always corresponds to the magnitude of an observed frequency of a given cross-tabulation.
Second, incorporating color, shading, signs, or numbers into the mosaic display
allows researchers to discriminate types and/or antitypes and determine whether the
tested model is a good representation of the data or not. In the following section, sev-
eral data examples are used to illustrate Mosaic displays in CFA with step-by-step de-
scriptions.
4. Data Examples
4.1. CFA Base Model
Consider the following data example. In a study on child behavior problems (Mun,
Fitzgerald, von Eye, Puttler, & Zucker, 2001; Zucker et al., 2000), a sample of 215 boys
was rated twice by parents using the Child Behavior Checklist for Ages 4-18 (CBCL;
Achenbach, 1991). The first rating occurred when the boys were between three and five
years old and the second rating occurred when they were six to eight years old. Follow-
ing Achenbach (1991), a T-score of 60 was used as the clinical cut-off for externalizing
and internalizing behavior problems in the clinical range. Based on the averaged paren-
tal ratings, boys were assigned to clinical levels of externalizing behavior problems at
wave 1 (E1), internalizing behavior problems at wave 1 (I1), externalizing behavior prob-
lems at wave 2 (E2), and internalizing behavior problems at wave 2 (I2). For all four
variables, E1, I1, E2, and I2, a category of one indicated behavior problems in the norma-
tive range and a category of two indicated behavior problems in the clinical range.
Table 1: Developmental Patterns of Behavior Problems among Boys
E1I1E2I2   Obs. Freq.   Exp. Freq.       L   Type/Antitype
1111              142       120.63    6.56   Type
1112               17        24.26   -2.79   Antitype
1121                5        15.16   -4.51   Antitype
1122                3         3.05    -.03
1211                8        11.02   -1.51
1212                2         2.22    -.16
1221                0         1.39   -1.26
1222                1          .28    1.39
2111               16        25.08   -3.46   Antitype
2112                1         5.04   -2.09
2121                6         3.15    1.80
2122                7          .63    8.22   Type
2211                2         2.29    -.21
2212                3          .46    3.83   Type
2221                0          .29    -.55
2222                2          .06    8.10   Type
Notes. E1 = externalizing behavior problems at wave 1 (Ages 3-5); I1 = internalizing behavior problems
at wave 1; E2 = externalizing behavior problems at wave 2 (Ages 6-8); I2 = internalizing behavior problems
at wave 2. Numerals in E1I1E2I2 column represent ordered quadruples of variable categories: 1 =
sub-clinical level behavior problems; 2 = clinical level behavior problems. L stands for Lehmacher’s test.
Bonferroni-adjusted alpha (.003125) was used as a critical alpha level.
Figure 1: Developmental Patterns of Behavior Problems Among Boys
This categorization scheme yielded the 2 × 2 × 2 × 2 cross-classification (E1 × I1 × E2 ×
I2). We analyzed this table under the total independence assumption (i.e., main-effect
model), which posits that the four variables are mutually unrelated. Table 1 shows the
observed and expected frequencies and types and antitypes of the data3. Figure 1 gives
the mosaic display of the cross-classification.
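
The expected frequencies and test statistics in Table 1 can be approximated with the following Python sketch. It assumes the commonly cited form of Lehmacher's (1981) asymptotic test, z = (o - e) / σ with σ² = N p [1 - p - (N - 1)(p - p̃)], where p is the product of the marginal proportions of a cell and p̃ is the product of the terms (marginal - 1) / (N - 1). This formula is not spelled out in the present article and is therefore an assumption; the function names are hypothetical.

```python
import math
from itertools import product

# Observed frequencies of Table 1, cells ordered 1111, 1112, ..., 2222.
cells = list(product((1, 2), repeat=4))
counts = [142, 17, 5, 3, 8, 2, 0, 1, 16, 1, 6, 7, 2, 3, 0, 2]
observed = dict(zip(cells, counts))
N = sum(counts)

def one_way_margins(table, n_vars):
    """One-way marginal totals, one dict per variable."""
    margins = [dict() for _ in range(n_vars)]
    for cell, f in table.items():
        for v, level in enumerate(cell):
            margins[v][level] = margins[v].get(level, 0) + f
    return margins

margins = one_way_margins(observed, 4)

def lehmacher(cell):
    """Expected frequency under total independence and Lehmacher's z (assumed form)."""
    p = math.prod(margins[v][lev] / N for v, lev in enumerate(cell))
    p_tilde = math.prod((margins[v][lev] - 1) / (N - 1) for v, lev in enumerate(cell))
    expected = N * p
    variance = N * p * (1 - p - (N - 1) * (p - p_tilde))
    return expected, (observed[cell] - expected) / math.sqrt(variance)

results = {cell: lehmacher(cell) for cell in cells}
```

For cell 1111, for instance, `results[(1, 1, 1, 1)]` yields an expected frequency and a z value that agree, up to rounding, with the corresponding row of Table 1.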
4.2. MOSAICS
All mosaic displays in the current study (Figures 1-4 and 7-13) were generated using
MOSAICS developed for the SAS/IML software (SAS Institute, 1989) by Friendly
(1992, 1994), which is available at http://www.math.yorku.ca/SCS/mosaics.html (see footnote 4). For
ease of understanding, cell indices and legends were later edited into the figures in the
present study. Numbers inside or by the tiles in all mosaic figures are cell indices. In
3Three expected frequencies (cell indices 1122, 1212, and 2211) were smaller than .5. Although we
acknowledge that these values were rather small, for the purpose of illustration, we decided to ignore this.
Likewise, we avoided invoking the delta option to compensate for cells with zero observations.
4Detailed description of the algorithm and a FORTRAN program as an alternative to the MOSAICS
program can be found in Wang (1985).
addition, green and blue colors were consistently used to represent types and antitypes,
respectively. However, color, shading, arrangement of tiles, and size of the graph are
arbitrary and may be changed.
4.3. Sequential introduction of marginal totals
Figure 1 can be developed in a series of steps. The first step was to compute the
marginal totals of the table. Let $f_{ijkl}$ denote the ijkl-th cell count for the present data,
let $f_{1...}$ through $f_{...2}$ denote the one-way marginals, $f_{11..}$ through $f_{..22}$ the two-way
marginals, and $f_{111.}$ through $f_{.222}$ the three-way marginals. The first block, representing
a proportion of one, was vertically divided into two blocks using the one-way marginal
totals for externalizing behavior problems at wave 1 (i.e., $f_{1...}$ and $f_{2...}$; see Figure 2).
The left oblong representing cell index 1... displayed 82.8% (178 cases) of the total sam-
ple and the right oblong representing cell index 2... displayed 17.2% (37 cases) of the
total sample. In the next step, the two oblongs representing cell indices 1... and 2...
were horizontally divided into four rectangles using two-way marginal totals for exter-
nalizing and internalizing behavior problems at wave 1 (see Figure 3). The rectangle for
cell 11.. was bigger than any other rectangles displaying 77.7% (167 cases) of the total
sample and 93.8% of the one-way marginal totals for cell 1... in Figure 2. The rectangle
for cell 22.. was the smallest of the four tiles, showing only seven observations (3.3% of the
total sample). Thus, the size of a tile serves as a good approximate measure of an observed
frequency relative to the others in a given configuration. In the CFA results,
cells 11.. and 22.. were identified as types, shown in green, whereas cells 12.. and 21.. were
identified as antitypes, shown in blue. In summary, behavior problems of three-to-five-year-old
boys appeared across all observed variables or not at all5.
5Expected frequencies for 11.., 12.., 21.., and 22.. were 163.10, 14.90, 33.90, and 3.10, respectively. Leh-
macher’s test statistics (1981) were 2.54, -2.54, -2.54, and 2.54 in the same order.
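
The marginal totals used in these first two steps can be recomputed from Table 1 with a short Python sketch. The helper name `marginal` is hypothetical; it simply collapses the table over all but the first variables.

```python
from itertools import product

# Observed frequencies of Table 1, cells ordered 1111, 1112, ..., 2222.
cells = list(product((1, 2), repeat=4))
counts = [142, 17, 5, 3, 8, 2, 0, 1, 16, 1, 6, 7, 2, 3, 0, 2]
observed = dict(zip(cells, counts))
N = sum(counts)

def marginal(table, n_keep):
    """Collapse the table over all but the first n_keep variables."""
    out = {}
    for cell, f in table.items():
        key = cell[:n_keep]
        out[key] = out.get(key, 0) + f
    return out

one_way = marginal(observed, 1)      # cells 1... and 2...
two_way = marginal(observed, 2)      # cells 11.., 12.., 21.., 22..
three_way = marginal(observed, 3)    # cells 111. through 222.

# Shares of the total sample, in percent, rounded to one decimal place.
pct_one_way = {k: round(100 * f / N, 1) for k, f in one_way.items()}
pct_two_way = {k: round(100 * f / N, 1) for k, f in two_way.items()}

# Share of cell 11.. within its parent oblong 1..., in percent.
pct_11_within_1 = round(100 * two_way[(1, 1)] / one_way[(1,)], 1)
```

The resulting counts and percentages match those quoted in the text for cells 1..., 2..., 11.., and 22...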
Figure 2: Developmental Patterns of Behavior Problems Among Boys: One-Way Mar-
ginal Totals
Figure 3: Developmental Patterns of Behavior Problems Among Boys: Two-Way Mar-
ginal Totals
Figure 4: Developmental Patterns of Behavior Problems Among Boys: Three-Way Mar-
ginal Totals
The third step was to vertically divide the four tiles in Figure 3 into eight tiles using
three-way marginal totals for both types of behavior problems at wave 1 and externaliz-
ing behavior problems at wave 2 (see Figure 4). The rectangles for cell indices 122. and
222. are very small indicating only one observed case and two observed cases, respec-
tively. The vertical split was asymmetric in that it favored a more even division for the
two-way marginal totals (cell indices 21.. and 22..) in comparison to a disproportionate
division for the other two-way marginal totals representing cell indices 11.. and 12...
The disproportionate and asymmetric split suggested that there may be associations
among these three categorical variables. It turned out that cells 111. , 212. , and 222.
emerged as types whereas cells 112. and 211. were identified as antitypes from CFA re-
sults6. Types and antitypes indicate that more boys than expected showed all-or-none
behavior problems (cells 111. and 222.), that boys with externalizing behavior problems
only at wave 1 also had externalizing behavior problems at wave 2 (cell 212.), and that
6Expected frequencies for 111., 112., 121., 122., 211., 212., 221., and 222. were 144.89, 18.21, 13.24, 1.66,
30.12, 3.78, 2.75, and .35, respectively. Lehmacher’s test statistics (1981) were 5.87, -5.14, -1.80, -.56, -
6.05, 5.44, 1.53, and 2.87 in the same order.
boys without any behavior problems at wave 1 were unlikely to have externalizing be-
havior problems at wave 2 (cell 112.). However, it was less often found than expected
that boys with externalizing behavior problems at wave 1 did not display those prob-
lems at wave 2 (cell 211.). Finally, the eight tiles in Figure 4 were horizontally divided
yielding sixteen tiles based on each cell count, fijkl (see Figure 1). The horizontal divi-
sion was even more asymmetric than the vertical split at the third step, pointing to pos-
sible associations among the four categorical variables. In Figure 1, cell configurations
1221 and 2221 are illustrated with lines instead of tiles to show that the observed frequencies
are zero. More details on Figure 1 follow in the next section.
4.4. Results for the CFA base model
As expected, the CFA base model showed a poor fit. The Pearson X² = 169.39, for
df = 11, p = .00, suggests that the independence model is not a good representation of
the data. In addition, the CFA results show four types and three antitypes using
Lehmacher’s test (Lehmacher, 1981) with a Bonferroni-adjusted alpha level (α* =
0.003125). The Bonferroni adjustment of alpha was adopted to control for alpha inflation
due, first, to the simultaneous testing of multiple types and antitypes and, second, to the
mutual dependency of the tests (see von Eye, 1990, in prep). Types were found in
configurations 1111, 2122, 2212, and 2222. The first type (1111) indicates that there were
more cases than expected of neither externalizing nor internalizing behavior problems at
both waves. Type 2122 shows that there were more boys than expected with externaliz-
ing behavior problems at both waves, and internalizing behavior problems at wave 2
but not at wave 1. Type 2212 shows that there were more boys than expected with
internalizing behavior problems at both waves, and externalizing behavior problems at
wave 1 only. Type 2222 shows that there were more boys with externalizing and inter-
nalizing behavior problems at both waves than expected.
Antitypes were found in cell configurations 1112, 1121, and 2111. Antitype 1112 in-
dicates that fewer cases than expected were found of internalizing behavior problems at
wave 2 only. Antitype 1121 indicates that fewer observations than expected were found
of boys with externalizing behavior problems only at wave 2. Antitype 2111 indicates
that fewer boys than expected showed externalizing behavior problems only at wave 1.
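
The overall fit statistic reported at the beginning of this section can be checked with the following Python sketch, which recomputes the expected frequencies under total independence from Table 1 and sums the Pearson components. Variable names are illustrative.

```python
import math
from itertools import product

# Observed frequencies of Table 1, cells ordered 1111, 1112, ..., 2222.
cells = list(product((1, 2), repeat=4))
counts = [142, 17, 5, 3, 8, 2, 0, 1, 16, 1, 6, 7, 2, 3, 0, 2]
observed = dict(zip(cells, counts))
N = sum(counts)

# One-way marginal totals for each of the four variables.
margins = [dict() for _ in range(4)]
for cell, f in observed.items():
    for v, level in enumerate(cell):
        margins[v][level] = margins[v].get(level, 0) + f

# Expected frequencies under total independence.
expected = {
    cell: N * math.prod(margins[v][lev] / N for v, lev in enumerate(cell))
    for cell in cells
}

# Pearson X^2: sum over cells of (observed - expected)^2 / expected.
chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in cells)

# Degrees of freedom: cells minus (intercept + one parameter per binary variable).
n_params = 1 + sum(len(m) - 1 for m in margins)
df = len(cells) - n_params
```

Under these assumptions, `chi2` is close to the reported value of 169.39 and `df` equals 11.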
4.5. Alternate graphic methods
Table 1 can alternatively be plotted showing only proportional differences in a bar
graph using, for instance, SPSS 9.0 (SPSS, 1998). Figure 5 is a result of converting the
4-way contingency table (2 × 2 × 2 × 2) to a two-way contingency table (4 × 4). Figure
5 shows that some cell configurations had higher cell counts whereas other cell configu-
rations had lower cell counts. Thus, this technique is limited in that first, it does not
handle types and antitypes of CFA; second, different arrangements of a multi-way con-
tingency table for a statistical analysis and a graphic illustration can create semantic
difficulties. Alternatively, Table 1 can be displayed with a focus on statistics of types
and antitypes as in von Eye and Niedermeier (1999). Figure 6 was drawn using S-Plus
4.5 (MathSoft, 1997). Height of bars in this graph represents the magnitude of Leh-
macher’s test statistics. Bars below the zero line indicate that observed frequencies
were smaller than expected frequencies whereas bars above the zero line indicate that
observed frequencies were larger than expected frequencies. The two horizontal lines,
drawn parallel above and below zero, indicate the critical values of Lehmacher’s test statistic.
Bars above and below the critical values indicate types and antitypes, respectively.
This graph clearly shows that cells 1111, 2122, 2212, and 2222 were types represented
by green bars and cells 1112, 1121, and 2111 were antitypes represented by blue bars.
This technique, however, is limited in that it does not provide information on cell fre-
quencies of a cross-tabulation. Therefore, a mosaic display seems to be a better fit for
CFA than the other two techniques.
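
The critical values drawn in a graph such as Figure 6 can be sketched in Python as follows. The sketch assumes a nominal alpha of .05 divided by the 16 cells (which yields the reported α* = .003125) and a one-sided normal critical value for each test; the one-sided reading is an assumption that happens to reproduce the classification in Table 1. The test statistics are copied from Table 1.

```python
from statistics import NormalDist

alpha = 0.05                     # nominal level (assumed; implied by alpha* = .003125)
n_cells = 16
alpha_star = alpha / n_cells     # Bonferroni-adjusted alpha

# One-sided critical value of the standard normal distribution (assumption).
z_crit = NormalDist().inv_cdf(1 - alpha_star)

# Lehmacher's test statistics as reported in Table 1.
L = {
    "1111": 6.56, "1112": -2.79, "1121": -4.51, "1122": -0.03,
    "1211": -1.51, "1212": -0.16, "1221": -1.26, "1222": 1.39,
    "2111": -3.46, "2112": -2.09, "2121": 1.80, "2122": 8.22,
    "2211": -0.21, "2212": 3.83, "2221": -0.55, "2222": 8.10,
}

def classify(z, critical):
    """Label a cell as Type, Antitype, or neither (empty string)."""
    if z > critical:
        return "Type"
    if z < -critical:
        return "Antitype"
    return ""

labels = {cell: classify(z, z_crit) for cell, z in L.items()}
types = [cell for cell, lab in labels.items() if lab == "Type"]
antitypes = [cell for cell, lab in labels.items() if lab == "Antitype"]
```

With these inputs, `types` and `antitypes` list the same four types and three antitypes that are identified in Table 1 and Figure 6.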
Figure 5: An Alternative Approach to Mosaic Displays: A Bar Graph of Cell Fre-
quencies
Figure 6: A Bar Graph of Test Statistics of Types and Antitypes
4.6. When all cells are types or antitypes
The following data example presents a situation when all cells are either types or
antitypes. This data example has been used by many researchers including Lienert
(1964), von Eye (1990), and Kieser and Victor (1999). 65 students were treated with
LSD 50 and observed for the following three symptoms: Narrowed consciousness (C),
thought disturbance (T), and affective disturbance (A). Each of the symptoms had the
categories of presence or absence. Table 2 gives the resulting cross-tabulation. The
expected frequencies and tests of types and antitypes were computed under the assumption
of total independence of all symptoms. The CFA base model did not fit,
Pearson X² = 37.92, for df = 4, p = .00. In addition, the CFA results showed four
types and four antitypes, all based on Lehmacher’s test with a Bonferroni-adjusted al-
pha level (α* = 0.00625). Types were found in cells 111, 122, 212, and 221 while anti-
types were found in cells 112, 121, 211, and 222. Results can be briefly summarized as
follows. More cases with either all three symptoms or just a single symptom were found
than expected under the independence assumption. On the other hand, antitypes indicate
that fewer cases with either no symptom or any two symptoms were found than expected.
Detailed interpretations can be found in von Eye (1990, p. 34)7. Figure 7 presents
the mosaic display for the data. As before, types are shaded in green and antitypes
are shaded in blue.
7This data example generates one type (111) and one antitype (222) when analyzed using the approach
suggested by Kieser and Victor (1999).
Table 2: Leuner’s Syndrome Data
CTA   Obs. Freq.   Exp. Freq.       L   Type/Antitype
111           20        12.51    3.41   Type
112            1         6.85   -3.06   Antitype
121            4        11.40   -3.43   Antitype
122           12         6.24    3.09   Type
211            3         9.46   -3.12   Antitype
212           10         5.18    2.73   Type
221           15         8.63    3.13   Type
222            0         4.73   -2.75   Antitype
Notes. C = narrowed consciousness; T = thought disturbance; A = affective disturbance. Numerals in
CTA column represent ordered triples of variable categories: 1 = presence of symptom; 2 = absence of
symptom. L stands for Lehmacher’s test; Bonferroni-adjusted alpha (.00625) was used.
Figure 7: Leuner’s Syndrome Data
4.7. Entry order of variables
The following data examples show that a different entry order of categorical
variables into MOSAICS results in a mosaic with tiles of the same proportional size but
with a different planimetric arrangement. The first data set is from a study on the pre-
diction of performance in school, which has been used in von Eye and Brandtstädter
(1998). In this study, fluid intelligence (I) and performances in German (G) and
mathematics (M) were assessed (see Table 3). The expected frequencies and test statis-
tics of types and antitypes were computed under the assumption of total independence
of all variables. The CFA base model did not fit, Pearson X² = 67.58, for df = 4, p =
.00. There were two types and three antitypes using Lehmacher’s test with a Bon-
ferroni-adjusted alpha level (α* = 0.00625). Figure 8 represents the data set with the
entry order that corresponds to the order of CFA shown in Table 3. The entry order
for the CFA base model, [I][G][M] was 111, 211, 121, 221, 112, 212, 122, and 222 in
MOSAICS. In MOSAICS, the first variable varies most rapidly across the columns of
cell indices whereas in most other programs the first variable varies most slowly.
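
The two ordering conventions can be made concrete with a small Python sketch. The function name `cell_order` is hypothetical; it lists cell indices either with the first variable varying most rapidly (the convention attributed to MOSAICS above) or with the first variable varying most slowly.

```python
from itertools import product

def cell_order(levels, first_fastest=True):
    """List cell indices for variables with the given numbers of categories."""
    ranges = [range(1, k + 1) for k in levels]
    if first_fastest:
        # product() varies its last argument fastest, so feed the variables
        # in reverse and reverse each index again on the way out.
        cells = [tuple(reversed(c)) for c in product(*reversed(ranges))]
    else:
        cells = list(product(*ranges))
    return ["".join(map(str, c)) for c in cells]

mosaics_order = cell_order([2, 2, 2], first_fastest=True)
table_order = cell_order([2, 2, 2], first_fastest=False)
```

Here `mosaics_order` reproduces the entry order 111, 211, 121, 221, 112, 212, 122, 222 given above, whereas `table_order` follows the row order of the CFA tables (111, 112, 121, ...).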
Table 3: Fluid Intelligence and Performances in German and Mathematics
IGM   Obs. Freq.   Exp. Freq.       L   Type/Antitype
111           19         4.96    7.71   Type
112            1         7.17   -3.04   Antitype
121            9        14.25   -2.13
122           18        20.62   -1.00
211            3         4.85   -1.02
212            1         7.02   -2.98   Antitype
221            7        13.95   -2.83   Antitype
222           35        20.18    5.67   Type
Notes. I = fluid intelligence; G = performance in German; M = performance in mathematics. Numerals in
IGM column represent ordered triples of variable categories: 1 = below average; 2 = above average. L
stands for Lehmacher’s test; Bonferroni-adjusted alpha (.00625) was used.
Figure 8: Prediction of Performance in School, [I][G][M]
We then changed the order of categorical variables from [I][G][M] to [G][M][I]. Cell
indices entered into MOSAICS were in the following order: 111, 121, 112, 122, 211, 221,
212, and 222, which corresponded to the order for the CFA base model, [G][M][I]. Fig-
ure 9 represents the data. In this figure, the shape and the location of the tiles changed
but the relative sizes remained the same. For example, the tall oblong for cell index
221 in Figure 8 changed to a rectangle in Figure 9. However, the relative size of cell
221 stayed the same in Figures 8 and 9 in proportion to the total number of cases as well
as the marginal totals.
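
The effect of the entry order on a single tile can be sketched in Python with the data of Table 3. The sketch assumes, as in Section 4.3, that the first, third, ... variables entered split widths and the second, fourth, ... split heights, and it ignores the gaps drawn between tiles; the function name `tile_shape` is hypothetical.

```python
def tile_shape(freq, order, cell):
    """Width and height of one tile in the unit square (gaps omitted).

    `order` lists variable positions in their order of entry.
    """
    width = height = 1.0
    fixed = {}
    for step, var in enumerate(order):
        # Marginal total of the parent tile before this variable is entered.
        parent = sum(f for c, f in freq.items()
                     if all(c[v] == lev for v, lev in fixed.items()))
        fixed[var] = cell[var]
        # Marginal total after conditioning on this variable's category.
        child = sum(f for c, f in freq.items()
                    if all(c[v] == lev for v, lev in fixed.items()))
        share = child / parent
        if step % 2 == 0:
            width *= share     # vertical cut
        else:
            height *= share    # horizontal cut
    return width, height

# Observed frequencies of Table 3, indexed as (I, G, M).
freq = {(1, 1, 1): 19, (1, 1, 2): 1, (1, 2, 1): 9, (1, 2, 2): 18,
        (2, 1, 1): 3, (2, 1, 2): 1, (2, 2, 1): 7, (2, 2, 2): 35}
N = sum(freq.values())

w_igm, h_igm = tile_shape(freq, order=(0, 1, 2), cell=(2, 2, 1))   # [I][G][M]
w_gmi, h_gmi = tile_shape(freq, order=(1, 2, 0), cell=(2, 2, 1))   # [G][M][I]
```

In both entry orders the product of width and height equals 7/93, the share of cell 221 in the total sample, while the tile is taller than wide under [I][G][M] and wider than tall under [G][M][I].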
Figure 9: Prediction of Performance in School: Different Entry Order, [G][M][I]
The second data example is from a recently reported study (Mahoney, 2000) in which
four groups of adolescent boys (G), and their records of school dropout (D) and criminal
arrest (C) were obtained to see whether there were associations among group informa-
tion and records of dropout and criminal arrest (see Table 4). The expected frequencies
and test statistics of types and antitypes were calculated under the assumption of total
independence. The CFA base model did not fit, Pearson X² = 137.66, for df = 7, p =
.00. Three types and three antitypes were identified using Lehmacher’s test with a
Bonferroni-adjusted alpha level (α*= 0.0041667). Patterns of types and antitypes are
interpreted in detail in Mahoney (2000). Figure 10 shows a mosaic display for the CFA
base model, [G][D][C]. The order of cell indices entered into MOSAICS that corresponded
to CFA is as follows: 111, 211, 311, 121, 221, 321, 112, 212, 312, 122, 222, and
322. When the order of categorical variables was reversed to [C][D][G] (i.e., 111, 112,
121, 122, 211, 212, 221, 222, 311, 312, 321, and 322), the general look of the mosaic
changed due to differences in the order of introduction of marginal totals (see Figure
11). However, the sizes of tiles remained the same in proportion to the total number of
cases and marginal totals.
Table 4: Records of School Dropout and Criminal Arrest among Adolescent Boys
GDC   Obs. Freq.   Exp. Freq.       L   Type/Antitype
111          155       121.62    7.73   Type
112            9        22.64   -4.13   Antitype
121            6        24.23   -5.39   Antitype
122            3         4.51    -.78
211           63        64.68    -.44
212           10        12.04    -.72
221           11        12.89    -.65
222            8         2.40    3.82   Type
311           26        42.18   -5.01   Antitype
312            8         7.85     .06
321           13         8.41    1.86
322           13         1.56    9.51   Type
Notes. G = configuration group; D = school dropout; C = criminal arrest. Numerals in GDC column
represent ordered triples of variable categories: For G, 1 = configurations 1 and 2, characterized by
competence in all domains; 2 = configuration 3, characterized by low academic competence and high
aggression; 3 = configuration 4, characterized by a multiple risk profile. For D and C, 1 = no; 2 = yes. L
stands for Lehmacher’s test; Bonferroni-adjusted alpha (.0041667) was used.
Figure 10: Records of School Dropout and Criminal Arrest Among Adolescent Boys,
[G][D][C]
Figure 11: Records of School Dropout and Criminal Arrest Among Adolescent Boys:
Reversed Order [C][D][G]
4.8. Non-Standard CFA Models
So far, the present study illustrated standard CFA base models using mosaic displays
with different data examples. From these results, usefulness of mosaic displays was ex-
amined in terms of display of cell frequencies and residuals. In addition, we demon-
strated that a different entry order of categorical variables generates a different look
overall but the relative sizes of tiles remain intact. In this section, we demonstrate that
mosaic displays can be applied to non-standard and non-hierarchical CFA models as
well.
Consider the following data example, a re-analysis of data published by Glück and
von Eye (2000). A sample of 181 high school students was administered the 24-item
cube comparison task. After completing each item, the students responded to questions
concerning the perceived difficulty of the item, the strategies they had employed to
process the item, and the perceived quality of their strategy (Glück, 1999). The three
strategies the students used to solve the cube comparison task were mental rotation
(R), pattern comparison (P), and change of viewpoint (V). Each strategy was scored as
1 = not used and 2 = used. Gender (G) was coded as 1 = females and 2 = males.
Table 5 and Figure 12 display the results of first-order CFA (i.e., the model of total
independence) with the normal approximation of the binomial test and the
Bonferroni-adjusted α* = 0.003125.
The results showed a rich pattern of types and antitypes with noticeable gender
differences. Types indicate that there were more observations than expected for the
following configurations: males who used only the change of viewpoint strategy (1122),
males who used only the pattern comparison strategy (1212), males who used both the
pattern comparison and the change of viewpoint strategies (1222), and females who used
only the rotation strategy (2111). Antitypes suggest that there were fewer observations
than expected for the following configurations: females who used no strategy (1111),
males who used no strategy (1112), males who used both the rotation and the pattern
comparison strategies (2212), and females who used all three strategies (2221). This
CFA base model for the frequency distribution in Table 5 was rejected because of the
large Pearson X² = 321.68 with df = 11, p < 0.01 (Likelihood Ratio (LR) = 380.84,
df = 11, p < 0.01).
Table 5: First Order CFA of the Cross-Classification of Rotational Strategy (R), Pattern
Comparison Strategy (P), Viewpoint Strategy (V), and Gender (G)

RPVG   Obs. Freq.   Exp. Freq.       L   Type/Antitype
1111           25        61.30   -4.68   Antitype
1112            5       103.19   -9.81   Antitype
1121           17        10.48    2.02
1122           42        17.65    5.81   Type
1211           98        88.27    1.05
1212          206       148.60    4.81   Type
1221           13        15.10    -.54
1222           64        25.42    7.68   Type
2111          486       398.58    4.65   Type
2112          729       670.92    2.49
2121           46        68.17   -2.71
2122           95       114.75   -1.88
2211          590       573.96     .73
2212          872       966.22   -3.58   Antitype
2221           39        98.17   -6.08   Antitype
2222          199       165.25    2.69

Notes. Numerals in RPVG column represent ordered quadruples of variable categories. Each strategy
was scored as 1 = not used; 2 = used. For Gender, 1 = females; 2 = males. L stands for Lehmacher’s
test; Bonferroni-adjusted alpha (.003125) was used.
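The first-order CFA described in the text can be retraced from the observed frequencies of Table 5. The following Python sketch is an illustration under stated assumptions, not the software used for the article: expected frequencies come from the model of total independence, the test statistic is the normal approximation of the binomial test named in the text, z = (f - F) / sqrt(F(1 - F/N)), and the decision is assumed to be two-sided at α* = .05/16. Variable names are chosen here for illustration.

```python
from math import prod, sqrt
from statistics import NormalDist

# Observed frequencies from Table 5, keyed by (R, P, V, G)
observed = {
    (1, 1, 1, 1): 25, (1, 1, 1, 2): 5, (1, 1, 2, 1): 17, (1, 1, 2, 2): 42,
    (1, 2, 1, 1): 98, (1, 2, 1, 2): 206, (1, 2, 2, 1): 13, (1, 2, 2, 2): 64,
    (2, 1, 1, 1): 486, (2, 1, 1, 2): 729, (2, 1, 2, 1): 46, (2, 1, 2, 2): 95,
    (2, 2, 1, 1): 590, (2, 2, 1, 2): 872, (2, 2, 2, 1): 39, (2, 2, 2, 2): 199,
}
n = sum(observed.values())

# One-way marginal totals for R, P, V, and G
margins = [{} for _ in range(4)]
for cell, f in observed.items():
    for pos, level in enumerate(cell):
        margins[pos][level] = margins[pos].get(level, 0) + f

# Assumption: two-sided decision at the Bonferroni-adjusted alpha
alpha_star = 0.05 / len(observed)
z_crit = NormalDist().inv_cdf(1 - alpha_star / 2)

results = {}
for cell, f in observed.items():
    p = prod(margins[pos][lev] / n for pos, lev in enumerate(cell))
    expected = n * p                              # total independence
    z = (f - expected) / sqrt(expected * (1 - p)) # binomial z approximation
    label = "Type" if z > z_crit else "Antitype" if z < -z_crit else ""
    results[cell] = (expected, z, label)

for cell in sorted(results):
    expected, z, label = results[cell]
    print("".join(map(str, cell)), observed[cell], f"{expected:8.2f} {z:6.2f}", label)
```

Under these assumptions the sketch should return the four types and four antitypes listed in the text, with statistics close to those in Table 5 (small differences of a few hundredths can arise from rounding).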
Figure 12: Patterns of Strategies: First-Order CFA Base Model (Glück & von Eye,
2000)
In addition to the four categorical variables used in Table 5, one could ask whether
handedness is associated with the strategies adopted by males and females (Glück, 1999).
If so, residuals would diminish and some or all of the types and antitypes would disappear.
To test this hypothesis, in the next step, we added handedness as a covariate to the
first-order CFA base model. Results showed a significant improvement over the previous
CFA base model without the covariate (∆LR = 164.21; ∆df = 1; p < 0.01), although the
model was not tenable by itself (X² = 168.14, LR = 216.63; df = 10; p < 0.01). Of the
eight types and antitypes in Table 5, only one antitype (1112) and three types (1122,
1212, and 2111) remained significant; one type (1222) and three antitypes (1111, 2212,
and 2221) were eliminated (see Table 6 and Figure 13). Thus, the covariate handedness
contributed significantly to the explanation of the observed frequency distribution. The
changes in types and antitypes across these two nested analyses are clearly shown in
Figures 12 and 13. In Figure 12, eight tiles are shaded in either green or blue; only four
tiles are still shaded in either color in Figure 13. The tiles in Figures 12 and 13 are
identical in size because the observed frequencies are the same, but the shading pattern
differs because the expected frequencies differ between the two CFA models.
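The comparison of the two nested base models can be retraced from the reported fit statistics. The sketch below is a minimal illustration that uses only the values given in the text; the chi-square tail probability is computed with the closed form that holds for one degree of freedom, and the dictionaries simply transcribe the type/antitype decisions reported for the two models.

```python
from math import erfc, sqrt

# Likelihood ratio statistics and degrees of freedom reported in the text
lr_base, df_base = 380.84, 11   # first-order CFA base model
lr_cov, df_cov = 216.63, 10     # same model with handedness as covariate

delta_lr = lr_base - lr_cov
delta_df = df_base - df_cov

# Upper-tail probability of a chi-square variate with 1 df:
# P(X > x) = erfc(sqrt(x / 2)); valid only because delta_df == 1
assert delta_df == 1
p_value = erfc(sqrt(delta_lr / 2))
print(f"delta LR = {delta_lr:.2f}, delta df = {delta_df}, p = {p_value:.3g}")

# Type/antitype decisions as reported for the two models
without_covariate = {
    "1111": "Antitype", "1112": "Antitype", "1122": "Type", "1212": "Type",
    "1222": "Type", "2111": "Type", "2212": "Antitype", "2221": "Antitype",
}
with_covariate = {"1112": "Antitype", "1122": "Type", "1212": "Type", "2111": "Type"}

eliminated = sorted(set(without_covariate) - set(with_covariate))
retained = sorted(set(without_covariate) & set(with_covariate))
print("eliminated:", eliminated)
print("retained:  ", retained)
```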
Table 6: First Order CFA of the Cross-Classification of Rotational Strategy (R), Pattern
Comparison Strategy (P), Viewpoint Strategy (V), and Gender (G) with Handedness as Covariate

RPVG   Obs. Freq.   Exp. Freq.   Handedness   Test Statistics   Type/Antitype
1111           25        33.67          .99             -1.50
1112            5        87.34          .91             -8.92   Antitype
1121           17        16.11          .88               .22
1122           42        21.84          .89              4.33   Type
1211           98       106.90          .81              -.87
1212          206       134.85          .83              6.25   Type
1221           13        17.34          .85             -1.05
1222           64        51.96          .75              1.68
2111          486       419.00          .83              3.49   Type
2112          729       705.24          .81              1.00
2121           46        47.40          .92              -.21
2122           95       114.41          .85             -1.84
2211          590       646.91          .75             -2.48
2212          872       877.10          .76              -.20
2221           39        26.68          .98              2.40
2222          199       219.28          .74             -1.41

Notes. Numerals in RPVG column represent ordered quadruples of variable categories. Each strategy
was scored as 1 = not used; 2 = used. For Gender, 1 = females; 2 = males.
Figure 13: Patterns of Strategies: Handedness as Covariate (Glück & von Eye, 2000)
As shown in the present study, mosaic displays using MOSAICS can illustrate
standard/non-standard as well as hierarchical/non-hierarchical CFA with or without
covariates. Most non-standard and/or non-hierarchical CFA models can be accommodated by
supplying the configuration types to MOSAICS. Using residuals as deviations in
MOSAICS is an alternative for more complex models with covariates, as shown in the
last example. Appendix A provides SAS input as an example of a hierarchical CFA
or standard log-linear model, and Appendix B provides SAS input as an example using
Pearson residuals, $(f - F)/\sqrt{F}$, where f and F denote the observed and expected
frequency, respectively.
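As a cross-check on the residual-based input, the deviations listed in Appendix B can be recomputed from the frequencies given there. The following Python sketch assumes that the expected frequencies come from the model of total independence of E1, I1, E2, and I2, and that the first variable varies fastest in the frequency vector, as in the data step of Appendix A. Both are assumptions made here; under them the Pearson residuals should agree with the `dev` values of Appendix B to within rounding.

```python
from math import prod, sqrt

# Observed frequencies from the appendices; E1 varies fastest, then I1, E2, I2
f = [142, 16, 8, 2, 5, 6, 0, 0, 17, 1, 2, 3, 3, 7, 1, 2]
cells = [(e1, i1, e2, i2)
         for i2 in (1, 2) for e2 in (1, 2) for i1 in (1, 2) for e1 in (1, 2)]
n = sum(f)

# One-way marginal totals for each of the four variables
margins = [{1: 0, 2: 0} for _ in range(4)]
for cell, freq in zip(cells, f):
    for pos, level in enumerate(cell):
        margins[pos][level] += freq

# Expected frequencies under total independence (assumed base model)
expected = [n * prod(margins[pos][lev] / n for pos, lev in enumerate(cell))
            for cell in cells]

# Pearson residuals (f - F) / sqrt(F)
residuals = [(obs - exp) / sqrt(exp) for obs, exp in zip(f, expected)]

# Deviations as listed in Appendix B, for comparison
dev = [1.946, -1.812, -.910, -.192, -2.609, 1.605, -1.177, -.537,
       -1.474, -1.800, -.146, 3.741, -.028, 7.998, 1.367, 8.071]

for cell, r, d in zip(cells, residuals, dev):
    print(cell, f"{r:7.3f}", f"{d:7.3f}")
```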
5. Discussion
The present article evaluated three graphic methods of displaying CFA results. The
first method, used by Mahoney (2000), focuses on the observed cell frequencies. The
second method, used by von Eye and Niedermeier (1999), focuses on the magnitude of the
test statistics used for the type/antitype tests. The third method, the Mosaic display
(Friendly, 1992, 1994; Hartigan & Kleiner, 1981, 1984; Wang, 1985), incorporates both
the observed cell frequencies and the type/antitype information from CFA. The present
article illustrated advantages of the Mosaic display over the other two methods for CFA,
since more than one dimension of the data can be illustrated simultaneously in the
Mosaic display. Cell frequencies and the pattern of types and antitypes are critical
features of CFA, and they can easily be illustrated using the Mosaic display.
Patterns of types and antitypes in CFA, in particular, allow one to understand whether
there is a heterogeneous subset of the sample, and on what categories and levels this
subset differs. A good graphical method can be instrumental both in conveying the
features of CFA and in understanding the data. The lack of graphical techniques for CFA,
fueled by Tukey’s (1989, 1990) tenets on data-based graphics, was the major incentive for
the present article. Six of Tukey’s (1989, 1990) points on visual display that are
germane to Mosaic displays of CFA are briefly discussed below.
5.1. Impact is important
Visual display of the data should be done in a powerful and intuitive way (Tukey,
1989, 1990). The Mosaic display is capable of doing this for CFA results. The Mosaic
display can be as compelling a means of visual display for multivariate relationships as
traditional bar charts are for univariate information or bivariate relationships. In
particular, the mosaic simultaneously displays the cell-wise frequencies and the
type/antitype decision in one tile. Thus, the display also makes visible an interesting
point: the correlation between the size of a cell frequency and the presence of
types/antitypes is weak at best.
5.2. Understanding graphics is not always automatic
Because of their novelty, Mosaic displays of CFA may not be understood easily at first
sight. However, to be thoroughly understood, even familiar types of graphs may need
explanations that come in the form of descriptions or legends (Tukey, 1989, 1990). This
certainly applies to Mosaic displays in CFA. Once the reader knows how to look at it,
all of the information carried by a Mosaic can easily be understood.
5.3. A graph can show us things easily that might not have been seen otherwise
The purpose of visual display is not to present numbers, but to compare (Tukey,
1989, 1990). Presenting an array of numbers in the form of a table may make it hard to
see a relationship or lack of it. Mosaics of CFA results can show patterns of types and
antitypes and the relationship between frequency and type/antitype decision. In
addition, by proper selection of the order of variables, Mosaics can make the comparison
of groups clear.
5.4. An understanding of purpose is needed
Graphs, like analytical methods, are selected based on what researchers are trying to
get across to the audience. Different graphs serve different purposes. The Mosaic
display of CFA allows one to depict (i) type and antitype patterns, (ii) the size of
cells, and (iii) the relationship between frequencies and type/antitype patterns. If
these are the purposes of the analysis, as in CFA, the Mosaic display is the method of
choice. If, however, researchers focus on the size of frequencies at the expense of
type/antitype patterns, mosaics carry too much information and can be replaced by other,
simpler graphical methods, e.g., bar graphs.
5.5. The absence of phenomena is itself a phenomenon
In the present context, the “absence of phenomena” can be viewed as, first, the absence
of types or antitypes and, second, the presence of antitypes. The Mosaic display of CFA
unequivocally depicts the absence of phenomena as well as the presence of phenomena. In
first-order CFA, the absence of types or antitypes implies total independence among
variables, which is rarely observed in CFA applications. So when it happens, the absence
of types or antitypes may require explanation. In addition, and more interestingly, there
are situations where the main focus of research is whether certain configurations exist,
a question that can be addressed by the visual display of the presence of antitypes
using Mosaic displays.
5.6. Color is a disappointment
Although color is not yet an effective means of representing quantitative values, it is
a useful labeling means in general (Tukey, 1990; Wainer, 1990). Color can be used
effectively for qualitative phenomena at two or three levels. In the Mosaic displays of
CFA, where color is used to tell whether types or antitypes exist and whether a cell is a
type or an antitype, color is a powerful means of visual display. Moreover, the increased
use of color in print and the growing number of publications on CD-ROM or on the web will
make color a more viable means to illustrate quantitative as well as qualitative
information.
References
[1] Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991
profile. Burlington, VT: University of Vermont, Department of Psychiatry.
[2] Clogg, C. C., Rudas, T., & Matthews, S. (1997). Analysis of contingency tables
using graphical displays based on the mixture index of fit. In J. Blasius, & M.
Greenacre (Eds.), Visualization of categorical data. New York: Academic Press.
[3] DuMouchel, W. (1999). Bayesian data mining in large frequency tables, with an
application to the FDA spontaneous reporting system. The American Statistician,
53, 177-190.
[4] Friendly, M. (1992). User's guide for MOSAICS (Tech. Rep. No. 206). York
University, Department of Psychology.
[5] Friendly, M. (1994). Mosaic displays for multi-way contingency tables. Journal of
the American Statistical Association, 89, 190-200.
[6] Glück, J. (1999). Spatial strategies - cognitive strategies on spatial tasks.
Unpublished dissertation. University of Vienna, Department of Psychology.
[7] Glück, J., & von Eye, A. (2000). Including covariates in Configural Frequency
Analysis. Psychologische Beiträge, 42, 405-417.
[8] Hartigan, J. A., & Kleiner, B. (1981). Mosaics for contingency tables. In W. F.
Eddy (Ed.), Proceedings of the 13th symposium on the interface between computer
science and statistics (pp. 268-273). New York: Springer-Verlag.
[9] Hartigan, J. A., & Kleiner, B. (1984). A mosaic of television ratings. The American
Statistician, 38(1), 32-35.
[10] Kieser, M., & Victor, N. (1999). Configural frequency analysis (CFA) revisited - A
new look at an old approach. Biometrical Journal, 41, 967-983.
[11] Lehmacher, W. (1981). A more powerful simultaneous test procedure in configural
frequency analysis. Biometrical Journal, 23(5), 429-436.
[12] Leuner, H. C. (1962). Die experimentelle Psychose. Berlin: Springer.
[13] Lienert, G. A. (1969). Die “Konfigurationsfrequenzanalyse” als Klassifikationsmethode
in der klinischen Psychologie. In M. Irle (Ed.), Bericht über den 26. Kongreß der
Deutschen Gesellschaft für Psychologie in Tübingen 1968 (pp. 244-253). Göttingen: Hogrefe.
[14] Mahoney, J. L. (2000). School extracurricular activity participation as a moderator
in the development of antisocial patterns. Child Development, 71(2), 502-516.
[15] MathSoft (1997). S-Plus user’s guide. Seattle, WA: MathSoft, Inc.
[16] Mun, E. Y., Fitzgerald, H. E., von Eye, A., Puttler, L. I., & Zucker, R. A. (2001).
Temperamental characteristics as predictors of externalizing and internalizing child
behavior problems in the contexts of high and low parental psychopathology. Infant
Mental Health Journal, 22(3), 393-415.
[17] SAS Institute (1989). SAS/IML Software: Usage and reference, version 6, first
edition. Cary, NC: SAS Institute.
[18] SPSS Inc. (1998). SPSS 9.0. Chicago, IL: SPSS, Inc.
[19] Tukey, J. W. (1986). Sunset salvo. American Statistician, 40, 72-76.
[20] Tukey, J. W. (1989). Data-based graphics: Visual display in the decades to come.
In Gail, M.H., & Johnson, N.L. (Coordinators): Sesquicentennial invited paper sessions.
Proceedings of the American Statistical Association (pp. 366-381). Alexandria, VA:
American Statistical Association.
[21] Tukey, J. W. (1990). Data-based graphics: Visual display in the decades to come.
Statistical Science, 5(3), 327-339.
[22] von Eye, A. (1990). Introduction to Configural Frequency Analysis: The search
for types and antitypes in cross-classifications. Cambridge: Cambridge University
Press.
[23] von Eye, A. (2001). Configural Frequency Analysis - Version 2000: A program for
32 bit Windows operating systems. Methods of Psychological Research-Online,
6(2), 129-139.
[24] von Eye, A. (in prep). Configural Frequency Analysis. Mahwah, NJ: Lawrence
Erlbaum Associates, Inc.
[25] von Eye, A., & Niedermeier, K. E. (1999). Statistical analysis of longitudinal
categorical data in the social and behavioral sciences. Mahwah, NJ: Erlbaum.
[26] von Eye, A., Spiel, C., & Wood, P. K. (1996). Configural Frequency Analysis in
applied psychological research. Applied Psychology: An International Review, 45,
301-327.
[27] Wainer, H. (1989). Discussion: Graphical visions from William Playfair to John
Tukey. In Gail, M.H., & Johnson, N.L. (Coordinators): Sesquicentennial invited paper
sessions. Proceedings of the American Statistical Association (pp. 382-390).
Alexandria, VA: American Statistical Association.
[28] Wainer, H. (1990). Graphical visions from William Playfair to John Tukey.
Statistical Science, 5(3), 340-346.
[29] Wainer, H., & Velleman, P. F. (2001). Statistical graphics: Mapping the pathways
of science. Annual Review of Psychology, 52, 305-335.
[30] Wang, C. M. (1985). Applications and computing of mosaics. Computational
Statistics & Data Analysis, 3, 89-97.
[31] Zucker, R. A., Fitzgerald, H. E., Refior, S. K., Puttler, L. I., Pallas, D., & Ellis, D.
A. (2000). The clinical and social ecology of childhood for children of alcoholics:
Description of a study of implications for a differentiated social policy. In H. E.
Fitzgerald, B. M. Lester, & B. Zuckerman (Eds.), Children of addiction: Research,
health, and public policy issues (pp. 109-142). New York: Routledge/Falmer.
Appendix A
SAS MOSAIC input for the Mun et al. (2001) data: Using observed frequencies
filename mosaics 'c:\sas\sasuser\mosaics\';
libname mosaic 'c:\sas\sasuser\mosaics\';
data infant;
input E1 I1 E2 I2 freq;
cards;
1 1 1 1 142
2 1 1 1 16
1 2 1 1 8
2 2 1 1 2
1 1 2 1 5
2 1 2 1 6
1 2 2 1 0
2 2 2 1 0
1 1 1 2 17
2 1 1 2 1
1 2 1 2 2
2 2 1 2 3
1 1 2 2 3
2 1 2 2 7
1 2 2 2 1
2 2 2 2 2
;
proc iml;
/* read the cell frequencies into the vector TABLE */
use infant;
read all var {freq} into table;
/* number of levels, names, and level labels of the four variables */
levels={2 2 2 2};
vnames={'E1' 'I1' 'E2' 'I2'};
lnames={'E1:no' 'E1:yes',
'I1:no' 'I1:yes',
'E2:no' 'E2:yes',
'I2:no' 'I2:yes'};
goptions hsize=7 in vsize=7 in;
/* load the stored MOSAICS modules */
reset storage=mosaic.mosaic;
load module=_all_;
/* display options: split directions, text height, shading colors and threshold */
split={v h};
htext={1};
colors={green blue};
shade={1.4};
/* plot only the full four-way table */
plots={4};
/* user-specified base model; config lists the margins to be fitted */
fittype='user';
config=t({1 0, 2 0, 3 0, 4 0});
title='infant';
run mosaic (levels, table, vnames, lnames, plots, title);
quit;
Appendix B
SAS MOSAIC input for the Mun et al. (2001) data: Using residuals
filename mosaics 'c:\sas\sasuser\mosaics\';
libname mosaic 'c:\sas\sasuser\mosaics\';
proc iml;
/* number of levels per variable and observed cell frequencies */
infant={2 2 2 2};
f={142 16 8 2 5 6 0 0 17 1 2 3 3 7 1 2};
title={'infant'};
vnames={'E1' 'I1' 'E2' 'I2' };
lnames={'E1:no' 'E1:yes',
'I1:no' 'I1:yes',
'E2:no' 'E2:yes',
'I2:no' 'I2:yes'};
goptions hsize=7 in vsize=7 in;
/* load the stored MOSAICS modules and the MOSAICD module */
reset storage=mosaic.mosaic;
load module=_all_;
%include 'c:\sas\sasuser\mosaics\mosaicd.sas';
/* externally computed residuals, one per cell, in the same order as f */
dev={1.946 -1.812 -.910 -.192 -2.609 1.605 -1.177 -.537 -1.474 -1.800 -.146 3.741 -.028
7.998 1.367 8.071};
split={v h};
htext=1;
colors={green blue};
shade={1.4};
run mosaicd (infant, f, vnames, lnames, dev, title);
quit;