Running head: DEPENDENT CATEGORICAL VARIABLES

Multiple Unordered Categorical Dependent Variables in Organizational Research

Peter Westfall*

Rawls College of Business Administration Texas Tech University Lubbock, TX 79409 Tel: (806) 742-2174 Fax: (806) 742-3191 [email protected]

James J. Hoffman

Rawls College of Business Administration Texas Tech University Lubbock, TX 79409 Tel: (806) 742-4004 Fax: (806) 742-3191 [email protected]

Jun Xia

Rawls College of Business Administration Texas Tech University Lubbock, TX 79409 Tel: (806) 742-1534 Fax: (806) 742-2308

[email protected] Topic areas: 4a, 6.m.v. (Multivariate Categorical Response Variables), and 6.r. * Corresponding Author


Abstract

A model for analyzing multiple categorical dependent variables is presented and

developed for use in organizational research. A primary example occurs in the foreign market

entry literature, where choice of ownership (majority, equal, or minority) and “function”

(acquisition or joint venture) are simultaneously endogenous; only separate univariate

ownership-based and function-based choice models are considered in the literature. Another

example is in the comparison of gender and race across organizational units, controlling for

confounders such as experience and qualification. Subsuming univariate categorical dependent

variables as a special case, the model unifies existing organizational research methods, mitigates

bias associated with univariate methods, provides more powerful testing methods, and provides a

flexible modeling framework that allows hypotheses to be modeled and tested that are not

possible with univariate models. Standard software may be used for estimation and testing;

examples are given.

Key words: Conditional Logit, Entry Mode, Gender Discrimination, Multinomial Logit, Odds

Ratio.



Joint Analysis of Multiple Categorical Dependent Variables in Organizational Research

Categorical measures abound in organizational research. In cases where such measures

are predictor variables, they are often used as control variables or moderator variables. For

example, ethnicity and gender might moderate the effect of test scores on performance.

Regression and structural equations methods are reasonably well developed for such applications;

see Aguinis, Boik, and Pierce (2001), and Williams, Edwards, and Vandenberg (2003), for

examples.

Applications where categorical measures are dependent variables also are common. For

example, in the market entry literature, the categorical measure “function” (acquisition or joint

venture) has been predicted in terms of cultural variables and control variables (e.g., Kogut &

Singh, 1988; Anand & Delios, 1997; Folta & Ferrier, 2000); other researchers have considered

the categorical measure “ownership” (equal, minority, majority) using similar models (e.g.,

Hennart & Larimo, 1998; Pan, 1996; Erramilli, 1996). For another example, when comparing

ethnic hire percentages across organizations and/or divisions, controlling for exogenous effects,

the variable “ethnicity” is a categorical dependent variable (Holzer, 1998; Shaw, 2004; Giuliano,

Levine, & Leonard, 2005). Yet another example is firm survival, which has been treated as a

binary dependent variable that can be predicted by the level of education of business owners (e.g.,

Chen and Astebro, 2003).

Models for predicting univariate categorical outcomes are commonplace; methods

include logistic and probit regression for binary and ordinal categorical response, and

multinomial logistic regression and the conditional logit model for unordered categorical

response. However, organizational research often requires multivariate dependent measures.

When such measurements are in the metric (numeric) scale, a plethora of methodologies are



available, including structural equations models, path analysis, partial least squares, multivariate

analysis of variance or covariance, or moderated multiple regression.

On the other hand, few methods are readily available to the organizational researcher for

the case where there are multiple non-metric (categorical) dependent measures. For multiple

ordered categorical responses, one may use multivariate probit models (Zajac & Westphal, 1996;

Keister, 2004). However, in addition to the assumption of ordered responses, these models

require additional restrictive assumptions in the case of higher dimensional responses to allow

computational tractability (Bock & Gibbons, 1996). Thus, while the available methodologies for

ordered categorical response variables are limited, there seems to be no method whatsoever that

is in general use for multivariate unordered categorical variables in organizational research, a gap

we aim to fill.

Thus, the goal of this paper is to popularize a class of models for the analysis of multiple

unordered categorical response variables. These models are estimated via special application of

the conditional logit model, and therefore may be analyzed easily using existing software,

although the researcher must create various dummy variables and interaction terms. We do not

claim that the model is new, only that it is under-utilized. Original references include Nerlove

and Press (1973), Amemiya (1981), Lehrer and Stokes (1985), and Stokes (1997).

In order to introduce the reader to the fundamentals of the model and illustrate how the

model may be used for the analysis of multiple unordered categorical response variables we

develop theory, extensions, and software implementations for this model using two different

examples that are of interest to organizational researchers. In the first example we consider a

hypothetical case of assessing race and gender bias of hiring practices of managers. In the

second example we conduct an empirical analysis dealing with predicting function and



ownership in international entries. The paper then concludes with a discussion of the benefits of

using the model for multiple categorical responses (as opposed to separate univariate analyses).

It should be noted that since it is always possible to transform a metric measure (e.g.,

income) to a categorical measure via segmentation, the model we consider applies equally to

metric measures where categorization is used; this is true for both univariate and multivariate

models. Such segmentation is often done when the response measure is badly skewed (like the

distribution of income), or simply when the categories make sense. Thus, we consider only

models for unordered categorical responses, which are appropriate even when the categories are

ordered, if the effects of predictors on the response are non-linear.

Example One: A Hypothetical Case of Assessing Race and Gender Bias of Hiring Practices of Managers

To illustrate the need for handling multiple categorical responses jointly rather than

separately, via univariate models, consider a hypothetical case of assessing race and gender bias

of hiring practices of managers. This may be done by comparing the (race/gender) distribution

of hires for managers of class A versus the same distribution for managers of class B. Class A

and B might be minority and nonminority, or Male and Female, or of certain personality

characteristics (such as a “Type A” personality); it doesn’t matter for the purposes of this

illustration. In this application, both race and gender are acknowledged by the manager at the

time of the hiring choice, and if there is bias, there may well be an interaction between the two

variables (e.g., “double counting” for female minorities). Meanwhile, manager type (A or B) is a

pre-existing condition. Thus, gender and race are jointly endogenous, and manager type is

exogenous.



Table 1 below shows an example where gender-based and race-based univariate models

each would be misleading. Entries within the table are hypothetical counts of hires, cross-

classified by race, gender and manager group.

Table 1. Hypothetical case where manager type affects interaction between race and gender

                 Managers of type A          Managers of type B
Race\Gender      Female   Male   Total       Female   Male   Total
Minority           30      70     100          70      30     100
Non-Minority       70      30     100          30      70     100
Total             100     100     200         100     100     200

In this hypothetical example, manager type has strong effects on gender hire and on race

hire within categories of the other variable. However, if the data are pooled across one or the

other variables, as is done using the (univariate) binary logit models, then only data in the “total”

entries specific to that variable are used. Table 2 compares univariate analyses with those of the

multivariate model (bivariate in this example) that we promote.

Table 2. Results of univariate categorical response model and bivariate categorical response model analysis of data in Table 1.

                                                                  Model 1               Model 2               Model 3
                                                                  Prediction of         Prediction of         Prediction of
                                                                  Gender                Race                  Gender/Race
Dependent Measure         Contrast                                β̂        s.e.        β̂        s.e.        β̂        s.e.
Gender Main Effect        Female/Male                             0.000    0.200       --       --          1.695    0.309
Race Main Effect          Minority/NonMinority                    --       --          0.000    0.200       1.695    0.309
Gender*Race Interaction   (Minority/Nonminority)/(Female/Male)    --       --          --       --          -3.389   0.436
Estrella (1998) R2                                                0.000                0.000                0.156

As seen, manager type appears to have absolutely no effect whatsoever on race or gender,

no matter which of the two univariate response variables is selected for the analysis. On the

other hand, the bivariate model assesses effects of predictors upon each response variable



individually, as well as upon their interaction, showing strong, significant effects. Thus the

bivariate model is more appropriate.
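The entries of Table 2 can be checked by hand. With a single binary predictor and all terms included, each fitted model reproduces the observed counts it uses, so the estimates reduce to sample log odds ratios and the standard errors to the square root of summed reciprocal counts. The following Python sketch is an illustration under that assumption, not the SAS code used for the paper; the variable and function names are ours.

```python
import math

# Hypothetical hire counts from Table 1, keyed by (manager type, race, gender).
counts = {
    ("A", "Min", "F"): 30, ("A", "Min", "M"): 70,
    ("A", "Non", "F"): 70, ("A", "Non", "M"): 30,
    ("B", "Min", "F"): 70, ("B", "Min", "M"): 30,
    ("B", "Non", "F"): 30, ("B", "Non", "M"): 70,
}

def log_odds(mgr, cell, base=("Non", "M")):
    """Log odds of a (race, gender) cell versus the baseline cell, for one manager type."""
    return math.log(counts[(mgr, *cell)] / counts[(mgr, *base)])

def effect(cell):
    """Type A versus Type B difference in log odds (a log odds ratio)."""
    return log_odds("A", cell) - log_odds("B", cell)

def std_err(cell_counts):
    """Large-sample standard error of a log odds ratio contrast."""
    return math.sqrt(sum(1.0 / n for n in cell_counts))

# Bivariate model (Model 3): effects within levels of the other response.
b_race = effect(("Min", "M"))                        # minority effect among males
b_gender = effect(("Non", "F"))                      # female effect among nonminorities
b_inter = effect(("Min", "F")) - b_race - b_gender   # race-by-gender interaction
se_main = std_err([70, 30, 30, 70])                  # four cells enter each main effect
se_inter = std_err(counts.values())                  # all eight cells enter the interaction

# Univariate gender model (Model 1): counts pooled over race.
def pooled(mgr, gender):
    return sum(n for (m, _, g), n in counts.items() if m == mgr and g == gender)

u_gender = (math.log(pooled("A", "F") / pooled("A", "M"))
            - math.log(pooled("B", "F") / pooled("B", "M")))
se_u_gender = std_err([pooled(m, g) for m in "AB" for g in "FM"])

print(round(b_race, 3), round(b_gender, 3), round(b_inter, 3))
print(round(se_main, 3), round(se_inter, 3), u_gender, round(se_u_gender, 3))
```

Pooling over race leaves 100 female and 100 male hires for both manager types, which is why the univariate estimate is exactly zero even though the within-race effects are large.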

One might argue in the case shown in Table 1 that separate analyses should be performed

within categories of the other variable; in other words, that effects of manager type upon gender

should be studied separately for minorities and nonminorities, and vice versa. However, there

are two problems with this approach: first, it leads to four models, with no formal mechanism to

combine information across them. Second, conditioning upon either race or gender in this

fashion implies that race or gender is exogenous, contrary to the theory that they are jointly

endogenous. In contrast, the bivariate model contains the same information that is contained in

the separate models, and treats the responses as jointly endogenous.

Another approach would be to treat the choice of race/gender mode as a four-level

multinomial response. This solves the problems noted above, and is similar to the bivariate

model. However, the bivariate model subsumes the univariate models as special cases:

parameter estimates and standard errors from univariate models are obtained exactly by

specifying only the “main effects” associated with the univariate choice variable in the bivariate

model. The bivariate model also allows simple determinations of effects of exogenous predictors

on the interaction between endogenous choices, which, though possible with the multinomial

logit model, is much more cumbersome. Finally, the bivariate model provides a more flexible

modeling procedure in which higher order interactions can be tested and included only as needed

based on standard measures of model fit, such as AIC and SBC; the bivariate model is thus the

more natural and useful formulation.



Development of the Model

Utility Theory Formulation

The model is motivated by utility theory that gives rise to the conditional logit model

(CLM). The CLM is ordinarily expressed as a univariate model, but is easily extended to the

multivariate case. The standard CLM is motivated by the utility model

Uj = Xj’β + εj, (1)

where Uj is the utility of selecting choice j, Xj is a vector of exogenous predictors that may

depend on choice j, and where εj, j=1,…,J, are independent and identically distributed random

variables having the distribution exp{-exp(-εj)}, also known as the Type I extreme value, or

Gumbel, distribution. When the utility Uj of choice j is highest among (U1,…,UJ), choice j is

selected. Under these assumptions, the choice selection probabilities are given by

\[
P(\text{Choice } j \text{ is selected}) = \frac{\exp(X_j'\beta)}{\sum_{k=1}^{J} \exp(X_k'\beta)}, \tag{2}
\]

called the conditional logit model (CLM) by McFadden (1974). While the utility formulation (1)

mathematically justifies the CLM, it is not necessary to posit existence of utilities to use (2).

However, when using (2) without assumption (1), one still must assume independence from

irrelevant alternatives (see e.g., Talluri & van Ryzin, 2004).
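As a small numerical illustration of equation (2) and of independence from irrelevant alternatives, the Python sketch below computes choice probabilities from systematic utilities Xj'β. The utility values are hypothetical and the function name is ours.

```python
import math

def choice_probabilities(utilities):
    """Conditional logit probabilities, as in equation (2), from systematic utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical systematic utilities for three alternatives (illustrative values only).
v = [0.5, 1.0, -0.2]
p_three = choice_probabilities(v)
p_two = choice_probabilities(v[:2])  # same model with the third alternative removed

# Independence from irrelevant alternatives: the odds of choice 1 versus choice 2
# equal exp(v1 - v2) whether or not the third alternative is available.
odds_three = p_three[0] / p_three[1]
odds_two = p_two[0] / p_two[1]
print(odds_three, odds_two, math.exp(v[0] - v[1]))
```

The three printed values agree up to floating-point error, which is the property that must be assumed when (2) is used without the utility formulation (1).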

Software is readily available to estimate (2); for example, the SAS/ETS procedure PROC

MDC estimates them directly, and the SAS/STAT procedure PROC PHREG can estimate them

as well (Allison, 1999). In STATA, one can use the MCL module.

Extending models (1) and (2) to multivariate response variables is conceptually simple: if

there are two responses with I and J possible choices, respectively, then the choice subscript j in

models (1) and (2) refers to a particular combination of the two response choices, and J in



models (1) and (2) is replaced with IJ, the total number of combined choices. For an exogenous

predictor variable X, such a model is generically defined as

Utility =

(baseline main effect on choice variable 1) + (baseline main effect on choice variable 2) +

(baseline effect on interaction between choice variables 1 and 2) +

(effect of X on choice variable 1) + (effect of X on choice variable 2) +

(effect of X on interaction between choice variables 1 and 2) + ε. (3)

To implement model (3) using statistical software, let d(1)i, i=1,…,I-1, denote dummy

variables for choice variable 1, and let d(2)j, j=1,…,J-1, denote dummy variables for choice

variable 2. The model is then estimated as

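\[
\begin{aligned}
\text{Utility} ={}& \left(\alpha^{(1)}_{1} d^{(1)}_{1} + \dots + \alpha^{(1)}_{I-1} d^{(1)}_{I-1}\right) + \left(\alpha^{(2)}_{1} d^{(2)}_{1} + \dots + \alpha^{(2)}_{J-1} d^{(2)}_{J-1}\right) \\
&+ \left(\alpha^{(12)}_{1,1} d^{(1)}_{1} d^{(2)}_{1} + \dots + \alpha^{(12)}_{I-1,J-1} d^{(1)}_{I-1} d^{(2)}_{J-1}\right) \\
&+ \left(\beta^{(1)}_{1} d^{(1)}_{1} X + \dots + \beta^{(1)}_{I-1} d^{(1)}_{I-1} X\right) + \left(\beta^{(2)}_{1} d^{(2)}_{1} X + \dots + \beta^{(2)}_{J-1} d^{(2)}_{J-1} X\right) \\
&+ \left(\beta^{(12)}_{1,1} d^{(1)}_{1} d^{(2)}_{1} X + \dots + \beta^{(12)}_{I-1,J-1} d^{(1)}_{I-1} d^{(2)}_{J-1} X\right) + \varepsilon.
\end{aligned}
\tag{4}
\]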

Model (4) resolves the over-parameterization of model (2) by omitting one category from

each set of possible dummy variables.

Extension of (4) to three or more choice variables is simple: in the case of three choice

variables there will be three sets of interaction terms involving two of the variables, and one set

of terms involving all three; these correspond to two-way and three-way interactions in an

analysis of variance (ANOVA). Extensions to four or more choice variables follow the same pattern.

Extension to multiple exogenous predictors is also simple; one simply includes all product terms

involving the additional predictors, using the same form as shown in (4).



With more dependent and predictor variables, the complexity of the model grows quickly,

and sparse data in combination cells can render models with higher-level interaction terms non-

identifiable. Thus it is usually important not to include all terms. Parsimonious model selection

is readily accomplished by evaluating interaction terms in (4) via likelihood ratio tests, AIC, or

BIC as desired, and by removing unneeded interactions (backwards deletion), or including

needed terms (forward selection).

Software Implementation

Consider model (4) in the context of the analysis in Table 2. Choice variable 1 is

“Race” and Choice variable 2 is “Gender”, and the variables in (4) are defined as

d(1)1 = 1 if Race is “Minority”; d(1)1 = 0 otherwise

d(2)1 = 1 if Gender is “Female”; d(2)1 = 0 otherwise

X = 1 if manager is Type A; X = 0 otherwise.

To estimate the bivariate model shown in Table 2 using PROC MDC of SAS/ETS, the syntax is

model choice = dMin dFem dMin_dFem

dMin_TypeA dFem_TypeA dMin_dFem_TypeA/type = clogit nchoice = 4;

where the “d” prefix denotes the appropriate dummy variables, the underscores “_” indicate

product, and the remaining terms are products as indicated in model (4). Here, the “TypeA”

term is a binary indicator, but in general it can be any numerical measure. The data structure for

both PROCs PHREG and MDC of SAS requires that the data contain as many rows as choices

per observational unit; i.e., if there are 1000 observations, and each observation corresponds to a

single choice from a set of four choices, then the input data set will have 4000 rows. All SAS

code is freely available from the first author.
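The one-row-per-alternative layout described above can be sketched in Python as follows. This is only an illustration of the data structure: the column names mirror the terms in the model statement, while the "id" and "choice" column layout and the function name are our assumptions rather than a documented requirement of any particular SAS procedure.

```python
def alternative_rows(obs_id, type_a, chosen):
    """Return four rows (one per race/gender alternative) for a single hire.

    type_a : 1 if the hiring manager is Type A, 0 otherwise
    chosen : the (race, gender) pair actually hired
    """
    rows = []
    for race in ("Minority", "NonMinority"):
        for gender in ("Female", "Male"):
            d_min = int(race == "Minority")
            d_fem = int(gender == "Female")
            rows.append({
                "id": obs_id,                       # hypothetical observation identifier
                "choice": int((race, gender) == chosen),
                "dMin": d_min,
                "dFem": d_fem,
                "dMin_dFem": d_min * d_fem,
                "dMin_TypeA": d_min * type_a,
                "dFem_TypeA": d_fem * type_a,
                "dMin_dFem_TypeA": d_min * d_fem * type_a,
            })
    return rows

# One hire of a minority male by a Type A manager expands to four rows.
rows = alternative_rows(obs_id=1, type_a=1, chosen=("Minority", "Male"))
for r in rows:
    print(r)
```

Exactly one of the four rows has choice equal to 1, and the (NonMinority, Male) row has all dummy variables equal to zero, which makes it the reference combination.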



To estimate the univariate models in Table 2, exclude all of the variables associated with

one or the other choice variable, e.g., for minority only,

model choice = dMin dMin_TypeA/type = clogit nchoice = 4;

In general models there will be more terms to include: I-1 baseline dummy variables for

variable 1, J-1 for variable 2, and (I-1)(J-1) for interactions, giving IJ-1 baseline dummies. There

are product terms involving X as well for all of these terms, giving an additional IJ-1 terms with

one X, and in general k(IJ-1) possible additional terms for k predictor variables. This increase in

complexity underscores the need for parsimony. Generally, not all possible terms should be

included, especially when data are sparse for some combinations of response variable choices.

Simpler models should be specified a priori in confirmatory analyses, and variable selection

should be performed in exploratory analyses.
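The term counts given above are easy to tabulate. The short Python sketch below (function name ours) returns the number of baseline terms, the number of predictor product terms, and their total for I and J response categories and k predictors.

```python
def term_counts(I, J, k):
    """Count terms in the full model for I x J joint choices and k predictors."""
    baseline = (I - 1) + (J - 1) + (I - 1) * (J - 1)  # equals I*J - 1
    predictor_terms = k * baseline                    # one product term per baseline term per predictor
    return baseline, predictor_terms, baseline + predictor_terms

print(term_counts(2, 2, 1))  # race/gender example with one predictor
print(term_counts(2, 3, 4))  # two functions, three ownership levels, four predictors
```

For the race and gender example this gives the six terms listed in the model statement; with two functions, three ownership levels, and four predictors the full model would already contain 25 terms.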

Parameter Interpretation

Again considering the example of Table 2, let

Oij(X) = P((i,j) selected | X)/ P((2,2) selected | X)

denote the odds favoring choice (race i, gender j) to the (NonMinority, Male) choice (the

(NonMinority, Male) choice is the left-out combination category in the dummy variable

definitions given above). The conditional logit model (2) implies that the log odds ratios

ln{Oij(X=1)/Oij(X=0)} are as given in Table 3.

Table 3. Log odds ratios showing the effect of manager type

               Female                       Male
Minority       β(1)1 + β(2)1 + β(12)1,1     β(1)1
Nonminority    β(2)1                        0



Interpretations are simplest in the model when there is no interaction; i.e., when β(12)1,1 =

0. In this case β(1)1 is the effect of manager type on minority choice, in either gender category.

Specifically, the log odds ratio of Minority selection is β(1)1 higher than the log odds ratio of

Nonminority, in either level of gender. Similarly, when there is no interaction, β(2)1 is the effect

of manager type on gender choice, for either race category. Specifically, the log odds ratio of

Female selection is β(2)1 higher than the log odds ratio of male selection, in either race category.

In the interaction model, it is not as easy to interpret the parameters, and often graphical

methods are used (e.g., Cannella & Shen, 2001) to display interaction effects clearly.

Nevertheless, all parameters remain interpretable as log odds ratios in the case where β(12)1,1 ≠ 0,

via inspection of Table 3. In this case β(1)1 is the effect of manager type on minority hires in the

male category only. Similarly, β(2)1 is the effect of manager type on

female hires, in the Nonminority category only. Finally, the interaction term β(12)1,1 is the effect

of manager type on the (2x2) interaction between race and gender selection. Specifically, the

difference between log odds ratios of Female and Male selections among Minority hires is β(12)1,1

higher than the difference between log odds ratios of Female and Male selections among

Nonminority hires.

Thus, in Table 2, we have the following interpretations:

• β̂(1)1 = 1.695 is the estimated effect of manager type on minority selection in the male category.

• β̂(2)1 = 1.695 is the estimated effect of manager type on female selection in the Nonminority category.

• β̂(12)1,1 = -3.389 is the estimated effect of manager type on the (Minority/Nonminority)/(Female/Male) interaction between choices.
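To connect these estimates back to the layout of Table 3, the following Python sketch plugs the rounded Table 2 estimates into each cell and also reports the corresponding odds ratios. The dictionary layout and names are ours.

```python
import math

# Rounded estimates as reported in Table 2 (Model 3).
b_race, b_gender, b_inter = 1.695, 1.695, -3.389

# Log odds ratios (Type A versus Type B) for each cell, following Table 3.
log_or = {
    ("Minority", "Female"): b_race + b_gender + b_inter,
    ("Minority", "Male"): b_race,
    ("Nonminority", "Female"): b_gender,
    ("Nonminority", "Male"): 0.0,  # reference combination
}
odds_ratio = {cell: math.exp(value) for cell, value in log_or.items()}

for cell in log_or:
    print(cell, round(log_or[cell], 3), round(odds_ratio[cell], 2))
```

The (Minority, Female) cell comes out at essentially zero, so manager type does not shift the odds of that combination relative to (Nonminority, Male), while the two off-diagonal cells have odds ratios of roughly 5.4, close to the value (70/30)/(30/70) implied by the counts in Table 1.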

Note that when there are more than two categories, the same interpretations are used, but

one must keep in mind that effects are on differences from the excluded categories used in the

dummy variable formulations. Note also that the interpretations are identical when there are

additional predictor variables, with the usual “ceteris paribus” caveat concerning the additional

variables. Finally, note that the predictors are more often continuous rather than binary, in which

case the “effect” of a predictor is interpreted as the effect of a one unit increase in the predictor.

Estimation, Inference and Model Fit

All parameters may be estimated via standard maximum likelihood procedures;

inferences are available using standard large-sample likelihood-based procedures (Greve &

Taylor, 2000; Cannella & Shen, 2001; Gibson & Zellmer-Bruhn, 2001). Alternatively, Bayesian

methods may be used (O’Brien & Dunson, 2004); Bayesian methods solve difficulties with

maximum likelihood that are caused by sparse data, non-identifiability, and non-convergence,

but obtaining software may be more difficult for Bayesian methods than for likelihood methods.

When comparing different models, such as the interaction models versus no-interaction models,

we suggest using likelihood ratio tests; when reporting results of estimated models, we suggest

using Wald standard errors of the parameters (e.g., Agresti, 2002: 11). Predictive ability can be

assessed using maximized log-likelihood values, generalized R2 statistics (Estrella, 1998,

provides an overview), and AIC statistics to guard against over-parameterization (e.g., Bozdogan,

1987).



Predicting Function and Ownership in International Entries

The multivariate dependent categorical variable model we consider was motivated by the

literature on predicting foreign market entry modes using cultural characteristics. A problem in

this stream of organizational research is that the models that have been used are univariate; some

predicting function (acquisition or joint venture; see Kogut & Singh, 1988; Anand & Delios,

1997; Folta & Ferrier, 2000), and some predicting ownership (minority, equal, or majority; see

Hennart & Larimo, 1998; Pan, 1996; Erramilli, 1996). However, theory suggests that the

responses are jointly endogenous (multivariate), rather than univariate: firms entering

partnerships will determine the function and choice parameters of the agreement simultaneously,

and interactively. For example, a less desirable ownership level may be acceptable given a

desirable function. Hence, the variables function and ownership are jointly endogenous. Clearly

exogenous variables are the economic, political and social conditions, cultural and firm-specific

characteristics. The bivariate model avoids potential biases due to pooling effects as illustrated

in Tables 1 and 2, and also allows researchers to address questions such as, “do cultural variables

affect function or ownership most strongly?” and “do cultural variables affect the interaction

between function and ownership?”

As noted by Shenkar (2001), the foreign entry literature lacks a comprehensive

framework to fully understand the theoretical and empirical issues surrounding the influences of

national culture on foreign entry mode selections. The bivariate choice model predicts function

and ownership simultaneously, and subsumes the univariate models as special cases. Thus, the

bivariate model provides a comprehensive framework as advocated by Shenkar, and can resolve

at least some of the theoretical and empirical issues in this stream of organizational research.



To illustrate the method, we obtained data on 2085 completed international entries where

one of the partners was a Chinese firm, and another was a foreign manufacturing firm. All data

were selected from the Securities Data Company (SDC) Platinum database, and only entries with

complete information on the variables we considered were selected (see Hennart, 1991; Hennart

& Larimo, 1998; Makino & Neupert, 2000 for similar studies).

The response variables are MODE, the function-based entry mode selection (joint venture

or acquisition), and OWNSHIP, the foreign ownership level selection (majority, >50%

ownership; equal, 50% ownership; or minority, <50% ownership). Most entries are firms from

the United States, Japan, Hong Kong, Germany, France, the United Kingdom, and Singapore;

the raw counts are given in Table 4.

Table 4. Combinations of Control: Ownership and Function Based Entry Modes

Ownership / Function    Acquisitions             Joint Ventures            Total
Minority Ownership      Minority Acquisition     Minority Joint Venture
                        118 (5.66%)              173 (8.30%)               291 (13.96%)
Majority Ownership      Majority Acquisition     Majority Joint Venture
                        161 (7.72%)              647 (31.03%)              808 (38.75%)
Equal Ownership         Equal Acquisition        Equal Joint Venture
                        17 (0.82%)               969 (46.47%)              986 (47.29%)
Total                   296 (14.20%)             1789 (85.80%)             2085 (100%)

Cultural variables are summarized using composite cultural distance (CCD), calculated

according to Kogut and Singh’s (1988) aggregated equation. Control variables include Legal

restriction (LEGALRE), a measure of institutional influence in the host country, coded 1 (ownership

restriction) for an international joint venture before 1990 or a foreign acquisition before 1995,

and 0 otherwise (cf. Gomes-Casseres, 1999; Barkema &

Vermeulen, 1998). Timing of entry (TIMING) is used as a measure of institutional change over

time, measured as the number of years from January, 1985 to the venture's founding. Firm size



(FMSIZE) is coded as 1 if the foreign firm is listed in 2002 Global 1000 of Business Week or

2002 Fortune Global 500, and 0 otherwise (cf. Pan, 2001).

The model for utility U of each of the six possible joint choices of function (acquisition

or joint venture) and ownership (minority, equal, or majority) is given generically as

U = (baseline constant utilities) + (effects of CCD on function, ownership and interaction)

+ (effects of LEGALRE on function, ownership and interaction)

+ (effects of TIMING on function, ownership and interaction)

+ (effects of FMSIZE on function, ownership and interaction) + ε.

Dummy variable parameterizations are described above. A stepwise variable selection

procedure using the α=.05 threshold was employed, considering interaction terms and main

effects tests as groups, rather than individual variables, where appropriate (cf. Agresti, 2002:

214-216). For example, the two interaction terms that measure the effect of CCD on interaction

are tested as a group using the likelihood ratio chi-square test with two degrees of freedom,

rather than tested separately using single-degree-of-freedom tests. “Effect heredity” is enforced,

wherein lower-order terms are retained whenever higher-order terms are retained (Hamada &

Wu, 1992). The resulting estimated model shows that CCD and LEGALRE affect function and

ownership but not their interaction, that TIMING affects function but not ownership, and that

FMSIZE affects the interaction of function and ownership. The parameter estimates and

standard errors of the resulting model for composite cultural measure are presented in the first

column of Table 5.



Table 5. Comparison of bivariate model with univariate models to predict market entry

                                       Bivariate Model      Univariate Model     Univariate Model
                                                            (Function)           (Ownership)
Variable    Effect on                  β̂         s.e.       β̂         s.e.       β̂         s.e.
CCD         Acq./JV                    -0.286    0.061      -0.362    0.057
CCD         Low/Equal                  -0.213    0.061                           -0.343    0.055
CCD         High/Equal                 -0.105    0.045                           -0.171    0.042
LEGALRE     Acq./JV                     4.277    0.455       3.710    0.388
LEGALRE     Low/Equal                  -0.742    0.439                            0.898    0.392
LEGALRE     High/Equal                 -1.377    0.381                           -0.500    0.357
TIMING      Acq./JV                     0.409    0.026       0.410    0.026
TIMING      Low/Equal                                                             0.139    0.025
TIMING      High/Equal                                                            0.067    0.019
FMSIZE      Acq./JV                    -1.892    1.037      -0.557    0.186
FMSIZE      Low/Equal                  -0.510    0.204
FMSIZE      High/Equal                 -0.004    0.112
FMSIZE      (Acq,JV)/(Low,Equal)        2.176    1.075
FMSIZE      (Acq,JV)/(High,Equal)       1.100    1.060

Model Fit Statistics
Log Likelihood                         -2551.84             -2953.22             -3477.03
AIC (lower is better)                   5137.68              5916.44              6970.06
Estrella (1998) R2                      0.745                0.569                0.227
Comparisons with bivariate model                            χ2=802.76, df=12     χ2=1850.38, df=9
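The fit statistics in Table 5 are mutually consistent, as the following Python sketch shows. It assumes the usual definition AIC = -2 log L + 2k to back out the number of parameters k in each model (these counts are inferred here, not reported in the table), and then recomputes the likelihood ratio chi-square statistics and their degrees of freedom. The function names are ours.

```python
# Log likelihoods and AIC values as reported in Table 5.
fit = {
    "bivariate": {"loglik": -2551.84, "aic": 5137.68},
    "function":  {"loglik": -2953.22, "aic": 5916.44},
    "ownership": {"loglik": -3477.03, "aic": 6970.06},
}

def n_params(model):
    """Parameter count implied by AIC = -2 logL + 2k (assumed definition)."""
    m = fit[model]
    return round((m["aic"] + 2 * m["loglik"]) / 2)

def lr_test(reduced, full="bivariate"):
    """Likelihood ratio chi-square and degrees of freedom for a nested comparison."""
    chi2 = 2 * (fit[full]["loglik"] - fit[reduced]["loglik"])
    df = n_params(full) - n_params(reduced)
    return round(chi2, 2), df

print({name: n_params(name) for name in fit})
print(lr_test("function"), lr_test("ownership"))
```

Under that assumption the implied parameter counts are 17, 5, and 8, and the differences of 12 and 9 match the degrees of freedom reported for the two comparisons.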

It is clear from the fit statistics that the bivariate model has much greater explanatory

ability. In addition, the estimated bivariate model shows that FMSIZE affects interaction

between function and ownership; this conclusion is not formally possible (i.e., in a way that

allows significance testing) with the univariate models. The effect of FMSIZE on interaction is

shown in Figure 1.



Figure 1. Log odds ratios for FMSIZE effect (figure not reproduced; the plotted values, by function and ownership, are tabulated below)

         Low       Equal     High
Acq      -0.226    -1.892    -0.796
JV       -0.51     0         -0.004

Among contracts that ultimately end as joint ventures, firm size has little effect.

However, among contracts that ultimately end as acquisitions, the effect of larger firms is to

greatly reduce the odds of equal ownership.
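The values plotted in Figure 1 follow from the FMSIZE coefficients of the bivariate model in the same way that Table 3 follows from model (4). The Python sketch below, with names of our choosing and with the FMSIZE rows of Table 5 read as bivariate-model coefficients, adds the main-effect and interaction terms for each (function, ownership) cell relative to the (JV, Equal) reference cell.

```python
# FMSIZE coefficients from the bivariate model in Table 5.
b_acq = -1.892                                          # Acq./JV main effect
b_own = {"Low": -0.510, "Equal": 0.0, "High": -0.004}   # ownership versus Equal
b_int = {"Low": 2.176, "Equal": 0.0, "High": 1.100}     # function-by-ownership interaction

def fmsize_log_odds_ratio(function, ownership):
    """Log odds ratio for FMSIZE in one cell, relative to the (JV, Equal) cell."""
    value = b_own[ownership]
    if function == "Acq":
        value += b_acq + b_int[ownership]
    return round(value, 3)

for function in ("Acq", "JV"):
    print(function, [fmsize_log_odds_ratio(function, o) for o in ("Low", "Equal", "High")])
```

The acquisition row reproduces the values -0.226, -1.892, and -0.796, so the large negative entry for equal-ownership acquisitions is the Acq./JV coefficient itself, partly offset by the interaction terms in the Low and High columns.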

An additional striking contrast between the univariate and bivariate models is that the

estimated effects of LEGALRE are quite different. In the bivariate model, the presence of legal

restrictions is estimated to decrease the odds of low versus equal ownership, while in the

univariate model, the reverse is true. It is important to note that the parameters

in the joint model are interpreted as effects within levels of the function variable, while in the

univariate model the estimates refer to data pooled over the levels of function. As seen in Tables

1 and 2, pooling can have disastrous effects, as is well known from Simpson’s paradox.

While we have argued that the bivariate model is an improvement over the univariate

models, the main results concerning effects of cultural distance are essentially unchanged no

matter which model is used. However, an interesting finding not reported in the literature is this:

because the effects of CCD on Low/Equal and High/Equal are both significantly negative, we



conclude that the larger the cultural distance between the home and host countries, the more

likely it is that an equal ownership entry mode is selected, rather than a majority or minority

ownership mode. To make this effect clearer, in Table 6 below we show ownership structure

among firms with cultural distance less than the median (2.9534), and among firms with cultural

distance greater than the median. The greater likelihood of equal ownership with High CCD is

apparent. Further, this nonlinear “upside-down U shaped” effect of CCD on ownership

illustrates the need to consider ownership as an unordered categorical variable, despite its ordinal

nature.

Table 6. Ownership Distribution by Composite Cultural Distance

Ownership    Low CCD*        High CCD
Minority     130 (16.3%)     161 (12.5%)
Majority     346 (43.4%)     462 (35.9%)
Equal        322 (40.4%)     664 (51.6%)
Total        798 (100%)      1287 (100%)

*Because the number of countries is limited, there were a number of firms having CCD=2.9534 exactly, leading to only 798/2085 = 38% of the firms in the “Low CCD” category rather than ~50%.
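The percentages in Table 6 are column shares of the raw counts, which the following Python sketch recomputes (variable names ours).

```python
# Counts from Table 6 as (Low CCD, High CCD).
counts = {
    "Minority": (130, 161),
    "Majority": (346, 462),
    "Equal":    (322, 664),
}

# Column totals, then each ownership level's share of its column in percent.
totals = [sum(c[i] for c in counts.values()) for i in (0, 1)]
shares = {
    level: tuple(round(100 * n / t, 1) for n, t in zip(c, totals))
    for level, c in counts.items()
}

print(totals)
for level, pct in shares.items():
    print(level, pct)
```

The share of equal ownership rises from about 40% to about 52% between the two columns, while both the minority and majority shares fall, which is the pattern described in the text.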

Conclusion

We have developed a model for the case where there are multiple unordered categorical

dependent variables. The model is simple to specify and estimate using existing software, and is

flexibly and parsimoniously fit by selecting among interaction effects, similar to an ANOVA.

The model is recommended in cases where responses are jointly endogenous; failure to use it can

result in biases in univariate models caused by pooling. Because the model offers a

comprehensive framework that encompasses univariate models, it provides a unifying

framework for research streams within organizational research including market entry and

gender/race discrimination.



Specifically, the model we have developed has the following benefits for analyzing

categorical responses jointly (as opposed to performing separate univariate analyses):

• Parameter estimates and standard errors from the corresponding univariate models are

obtained exactly from the multivariate model by dropping appropriate terms. Thus, the

univariate model results are all obtainable within the more general multivariate framework,

allowing formal model comparisons.

• When the predictor variables affect the interaction between the dependent measures, the

univariate analyses can be grossly misleading, but the multivariate analyses are accurate.

• The multivariate model allows a predictor to affect one response but not the

other, within a single model, allowing joint efficient estimation of “seemingly unrelated

regressions.”

• The multivariate model includes main effect-type terms, two-way interactions, three-

way interactions, etc., allowing for exploratory parsimonious model selection by eliminating sets

of interaction parameters en masse, as in higher-way ANOVA modeling. For confirmatory

analysis, all model terms can be pre-specified; the ANOVA framework of the multivariate model

facilitates confirmatory model development.

• Composite tests of hypotheses across the set of response variables are easily available,

e.g., one can test the composite hypothesis that organizational unit has no effect on (gender, race)

combination, controlling for appropriate exogenous terms. These tests are often more powerful

than the univariate component-wise tests because (a) they combine information, in a meta-

analytic sense, and (b) when reduced form models are used, there are fewer degrees of freedom,

allowing focused tests, rather than diffuse tests that occur with over-parameterized models.
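
The degrees-of-freedom point in the last two bullets can be made concrete with a small counting sketch. It assumes the two entry-mode responses, ownership with three levels and function with two, and a baseline-category logit on their cross-classification, so that each predictor carries one coefficient per non-reference joint category. The function name `effect_df` and the dictionary layout are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import prod


def effect_df(levels):
    """Degrees of freedom per predictor for every ANOVA-type term.

    `levels` maps each response variable to its number of categories.
    A term involving a subset of responses contributes the product of
    (levels - 1) over the responses in that subset.
    """
    names = list(levels)
    df = {}
    for order in range(1, len(names) + 1):
        for term in combinations(names, order):
            df[term] = prod(levels[name] - 1 for name in term)
    return df


# Assumed response structure: ownership (3 levels) and function (2 levels).
levels = {"ownership": 3, "function": 2}
df = effect_df(levels)

# A saturated multinomial logit on the 3 x 2 = 6 joint categories has
# 6 - 1 = 5 coefficients per predictor; the terms partition that total.
saturated_df = prod(levels.values()) - 1
main_effects_df = sum(v for term, v in df.items() if len(term) == 1)

for term, value in df.items():
    print(" x ".join(term), value)
print("saturated:", saturated_df, "main effects only:", main_effects_df)
```

Under these assumptions, dropping the ownership-by-function interaction for a predictor leaves 3 of its 5 parameters, which is the sense in which a test based on a reduced-form model is more focused than one based on the saturated model.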



This paper has illustrated and highlighted a model that can be used to analyze multiple

unordered categorical response variables. Given the ease of implementation of these methods,

and the potential for improved analysis, it is hoped that this paper will encourage researchers to

analyze organizational data using models for multiple unordered categorical response variables

whenever their categorical measurements are simultaneously endogenous.



