
Information Systems Research, Articles in Advance, pp. 1–27
ISSN 1047-7047, EISSN 1526-5536
DOI 10.1287/isre.1080.0224
© 2009 INFORMS

Research Note

Toward a Causal Interpretation from Observational Data: A New Bayesian Networks Method for Structural Models with Latent Variables

Zhiqiang (Eric) Zheng
School of Management, University of Texas at Dallas, Richardson, Texas 75083, [email protected]

Paul A. Pavlou
Fox School of Business and Management, Temple University, Philadelphia, Pennsylvania 19122, [email protected]

Because a fundamental attribute of a good theory is causality, the Information Systems (IS) literature has strived to infer causality from empirical data, typically seeking causal interpretations from longitudinal, experimental, and panel data that include time precedence. However, such data are not always obtainable, and observational (cross-sectional, nonexperimental) data are often the only data available. To infer causality from observational data that are common in empirical IS research, this study develops a new data analysis method that integrates the Bayesian networks (BN) and structural equation modeling (SEM) literatures.

Similar to SEM techniques (e.g., LISREL and PLS), the proposed Bayesian Networks for Latent Variables (BN-LV) method tests both the measurement model and the structural model. The method operates in two stages: First, it inductively identifies the most likely LVs from measurement items without prespecifying a measurement model. Second, it compares all the possible structural models among the identified LVs in an exploratory (automated) fashion and discovers the most likely causal structure. By exploring a causal structural model that is not restricted to linear relationships, BN-LV contributes to the empirical IS literature by overcoming three SEM limitations (Lee et al. 1997): lack of causality inference, restrictive model structure, and lack of nonlinearities. Moreover, BN-LV extends the BN literature by (1) overcoming the problem of latent variable identification using observed (raw) measurement items as the only inputs, and (2) enabling the use of ordinal and discrete (Likert-type) data, which are commonly used in empirical IS studies.

The BN-LV method is first illustrated and tested with actual empirical data to demonstrate how it can help reconcile competing hypotheses in terms of the direction of causality in a structural model. Second, we conduct a comprehensive simulation study to demonstrate the effectiveness of BN-LV compared to existing techniques in the SEM and BN literatures. The advantages of BN-LV in terms of measurement model construction and structural model discovery are discussed.

Key words: causality; Bayesian networks; structural equation modeling; observational data; Bayesian graphs
History: Vallabh Sambamurthy, Senior Editor and Associate Editor. This paper was received on October 25, 2006, and was with the authors 20 months for 2 revisions. Published online in Articles in Advance, May 12, 2009.

1. Introduction

Because a fundamental attribute of a good theory is causality (Bagozzi 1980), causality inference (X causes Y) is deemed invaluable in the social and behavioral sciences in general and information systems (IS) research in particular. However, despite the enhanced sophistication of IS studies in terms of theory and empirical testing, causality has not received the requisite attention. Similar to most other disciplines (e.g., Mitchell and James 2001, Shugan 2007), the IS discipline tends to avoid issues of causality because of the difficulty in inferring causal relationships from data, and because causality is only inferred from pure theory. This is partly because causality inference requires strict conditions. Though there is no consensus on the necessary and sufficient conditions



for inferring causality, Popper's (1959) three conditions are generally accepted: (1) X precedes Y; (2) X and Y are related; and (3) no confounding factors explain the X → Y relationship. To satisfy these strict conditions, researchers need to use longitudinal, experimental, or panel data with time precedence between variables X and Y to account for confounds and reverse causality (Allison 2005).¹

However, it is often impossible to obtain such data in IS research (Mithas et al. 2006, p. 223), and observational (cross-sectional, nonexperimental) data are often the only data available. Therefore, our objective is to develop a method to help infer causality using observational data that are commonly used in empirical IS research.

Following the literature that maintains that "near" (versus "absolute") causality inference is possible from observational data (e.g., Granger 1986, Holland 1986), we develop a new data analysis method built on the Bayesian networks (BN) and structural equation modeling (SEM) literatures that offers a causal interpretation of the relationships among latent variables (LVs) in structural equation models.² Our proposed method (termed BN-LV, for Bayesian Networks for Latent Variables) encodes the relationships among LVs in a graphical model as conditional probabilities, accounts for potential confounds, and discovers the most likely causal structure from observational data. The proposed BN-LV method seeks to (1) sensitize IS researchers to the importance of causality and present the possibility of inferring causal relationships from data, (2) offer a method to IS researchers to help infer causality among constructs from observational data while overcoming key SEM limitations, and (3) help spawn future research in refining SEM-based methods that render causal interpretations.

¹ Even with longitudinal data that have time precedence, it is not readily known which variable precedes which. It is often impossible to know when a person formed certain perceptions (e.g., perceived usefulness and perceived ease of use), even if these variables are measured in different periods. Thus, even data with time precedence may not correspond to the actual timing of a person's perceptions.

² From a theoretical point of view, because the proposed BN-LV method uses observational, cross-sectional data, it only addresses two of Popper's (1959) three conditions for inferring causality, excluding the condition that X must precede Y. Therefore, it is not a necessary and sufficient condition for inferring absolute causality, but it is a method for inferring "near" causality (Granger 1986, Holland 1986).

According to Lee et al. (1997), SEM methods have three key limitations: lack of causality inference, restrictive model structure, and lack of nonlinearities. First, though SEM was originally designed to model causal relationships, causality has gradually faded away from SEM studies (Pearl 2000). In fact, a review of the literature suggests that SEM studies do not attempt to infer causality, and most SEM (Cartwright 1995) and IS researchers (Gefen et al. 2000) believe that SEM models cannot infer causality.³ The inability to draw causal inferences has forced IS researchers to refrain from even discussing issues of causality in IS studies. Second, most SEM studies specify one model structure and use data to confirm or disconfirm this specific structure, operating in a confirmatory mode.⁴ This prevents the automated exploration of alternative or equivalent models. Chin (1998) warns that overlooking equivalent models is common in SEM studies, and Breckler (1990) showed that only 1 of 72 published SEM studies even suggests the possibility of alternative models. Third, SEM only encodes linear relationships among constructs, essentially ignoring the possibility of nonlinear relationships.⁵

³ SEM techniques are primarily based on linear equations (e.g., PLS) or covariance structures (e.g., LISREL). Additional conditions such as isolation of competing hypotheses (Cook and Campbell 1979) or temporal ordering (Bollen 1989) are deemed necessary.

⁴ To the best of our knowledge, no SEM techniques allow researchers to automate the process of examining alternative models. Manual examination of alternative models becomes virtually impossible for complex models with multiple constructs.

⁵ There have been attempts to incorporate interaction and nonlinear effects in SEM (e.g., Kenny and Judd 1984). However, as acknowledged by Kenny and Judd (1984, p. 209), their method is preliminary because it only deals with a single nonlinearity (a quadratic function). Existing approaches need to prespecify the exact form of the nonlinear relationship. A general approach with unknown nonlinearities remains an open problem in the SEM literature. Similarly, there have been attempts in PLS to model interaction effects (e.g., Chin et al. 2003), but a general method for dealing with nonlinearities among LVs is still not incorporated into the PLS method.

To address these three SEM limitations, we developed the BN-LV method, which has three key properties: First, it encodes the relationships among constructs as conditional probabilities that, according to Druzdzel and Simon (1993), can offer a causal interpretation (as opposed to SEM, which uses correlation that


does not imply causality). Second, BN-LV can automatically discover the most likely structural model from observational data without imposing a prespecified structure, thus exploring alternative SEM models. Third, BN-LV does not rely on any functional form (e.g., linear) to capture the relationships among constructs, thus allowing potentially nonlinear relationships to freely emerge in the structural model.

Similar to existing SEM techniques (e.g., LISREL and PLS), the proposed BN-LV method operates in two stages: measurement model construction and structural model discovery.⁶ First, BN-LV inductively identifies the LVs given the measurement items in an exploratory mode. This is achieved by our proposed LVI (Latent Variable Identification) algorithm, which is based on testing the conditional independence axiom (Kline 1998, Heinen 1996). This axiom asserts that the measurement items of the same LV are supposed to be caused by the LV and thus should be independent of each other (conditional on the LV). Second, after the LVs are identified, BN-LV discovers the most likely causal structure among the LVs in an exploratory fashion. In particular, we develop the OL (ordered logit) scoring function to select among competing structures specifically for ordinal and discrete (Likert-type) data, which are common in IS research. In addition, BN-LV can also be used in a confirmatory mode by examining the fitness of a potential causal structure. Overall, the inputs to the BN-LV method are the raw measurement items, and the final output is the most likely causal BN graph that links the identified LVs.

We describe how BN-LV works with actual empirical data to demonstrate how it can help reconcile competing hypotheses in terms of the directionality of causality when integrating trust with the technology acceptance model (TAM) (Gefen et al. 2003, Pavlou 2003), specifically the relationship between two constructs: trust and ease of use. Carte and Russell (2003) argue that it is a common error in IS research not to examine the reverse causality between two variables X and Y (Carte and Russell 2003, p. 487): "Investigators need to be aware of theoretical rationale justifying the X → Y or Y → X causal orders." However, solely relying on theories to reconcile the directionality of causality may not be sufficient because there can be equally plausible theories, as with the direction of causality between trust and ease of use. BN-LV is particularly useful in these circumstances because it provides a data-driven method to reconcile competing hypotheses. This also has implications for new theory development (where there is no theoretical basis), which is common in IS research because of the rapid change of IT.

⁶ The proposed BN-LV method is essentially a data analysis technique that only examines the measurement model (validation of the measurement properties of LVs) and the structural model (testing the causal relationships among LVs). It does not address issues of theory and measurement development, data collection, and theory implications that a complete research project includes.

To evaluate BN-LV relative to existing data analysis techniques for testing the measurement and the structural model, we conducted a large-scale simulation study varying four data dimensions: sample size, noise, linearity, and normality. First, we compared LVI with exploratory factor analysis (EFA) (SAS Proc Factor) and confirmatory factor analysis (PLS) for measurement model testing. Second, we compared BN-LV with LISREL and PLS in terms of structural model testing. Third, we compared our proposed OL scoring function with two existing BN scoring functions: a Bayesian-Dirichlet-based function (Heckerman 1996) and a Gaussian-based function (Glymour et al. 1987). The results show that BN-LV overall outperforms all the other techniques under three of the four simulated conditions (size, linearity, normality), except when the data are noisy, because SEM methods (LISREL and PLS) tend to work well with noisy data (Fornell and Larcker 1981).

This study contributes to the IS literature by proposing a new data analysis method for inferring causal relationships from observational, cross-sectional, Likert-type data that are prevalent in IS research. Our BN-LV method has several advantages over alternative SEM methods: First, it tests the measurement model by identifying the appropriate LVs from raw measurement items, operating in an exploratory mode without imposing a predetermined measurement model structure (as opposed to SEM). Our novel use of the conditional independence axiom enables a causal interpretation between the LV and its associated measurement items, thereby being the only method that is consistent with the theory of measurement. Second, BN-LV infers causal (as opposed to correlational) links between the identified LVs by testing all plausible structural models in an automated fashion


and discovering the most likely one. This exploratory nature offers a major advantage over SEM techniques, which require manual specification of plausible models, especially for complex models where such manual work becomes virtually impossible. This property also becomes valuable where there is little or no prior theory to guide the structure specification, or when researchers want to let the data "speak out." Also, BN-LV can still differentiate among prespecified candidate structures, allowing IS researchers to test competing theories or question existing ones in a confirmatory mode. Finally, BN-LV offers a causal interpretation among the LVs in the structural model by representing conditional probabilities. BN-LV relaxes the assumption of linear structures imposed by SEM methods, and it clearly outperforms SEM techniques (LISREL and PLS) when the structural model is tested with nonlinear simulated data.

The paper is organized as follows. Section 2 reviews the philosophical origins of causality, discusses the challenges of inferring causality from data, and reviews existing approaches for inferring causality (propensity scores, SEM, and BN). Section 3 presents the method development for the two components of BN-LV: the LVI algorithm that identifies LVs from raw measurement items (measurement model) and the OL scoring function that helps build a Bayesian network to identify causal relationships among LVs (structural model). Section 4 describes the steps of the proposed BN-LV method and evaluates the proposed method through an extensive empirical study with actual data and a large-scale experiment with simulated data. Section 5 discusses the study's contributions and the advantages and limitations of BN-LV.

2. Literature Review

2.1. Philosophical Origins of Causality

The notion of causality entails a relationship of a cause to its effect. As early as 350 B.C., Aristotle proposed four distinct causes: the material, the formal, the efficient, and the final.⁷ Aristotle's "four causes" is the basis of the modern scientific concept that specific stimuli produce standard results under certain conditions. Descartes (1637) argued that causality can be understood and that cause is identical to substance. Kant (1781) posited cause as a basic category of understanding, arguing that causality is a world of "things in themselves."

⁷ The material cause is what something is made of. The formal cause is the form, type, or pattern according to which something is made. The efficient cause is the immediate power acting to produce the work. The final cause is the end or motive for the sake of which the work is produced (e.g., the owner's pleasure).

Aristotle (350 B.C.), Descartes (1637), and Kant (1781) posit that causality can be comprehended, but other philosophers disagree. In response to Aristotle's "four causes," Spinoza (1662) believed that all final causes are nothing but human fictions. Plato, in his famous "Allegory of the Cave,"⁸ questioned whether humans can understand causality. Hume (1738) holds the same opinion, concluding that causality is not real but a fiction of the mind. To account for the origin of this fiction, Hume (1738) used the doctrine of association. He argues that we only learn by experience the frequent conjunction of objects, without ever being able to comprehend anything like the true causal connection between them (Hume 1738, p. 46). Similarly, Pearson (1897), a founder of modern statistics, denied that causality was anything beyond frequency of association.

Summarizing the philosophical origins of causality, there is no consensus on whether causality is real, simple association among phenomena, an artifact of the mind, or even fiction (Shugan 2007). Despite these doubts about whether causality is real, throughout history there have been many attempts to operationalize and infer causality from data, as discussed below.

2.2. Operationalizing Causality from Data

Hume (1738) laid the foundations for the modern view of causality. Hume's (1738) definition of X causes Y stresses three conditions that can be verified through observation: (1) precedence: X precedes Y in time; (2) contiguity: X and Y are contiguous in space and time; and (3) constant conjunction: X and Y always cooccur (or do not cooccur).

⁸ Prisoners are in an underground cave with a fire behind them, bound so they can see only the shadows on the wall cast by puppets manipulated behind them. They think that this is all there is to see; if released from their bonds and forced to turn around to the fire and the puppets, they become bewildered and are happier left in their original state. They would think the things they see on the wall (the shadows) were real; they would know nothing of the real causes of the shadows (Plato 360 B.C.).


Contemporary research has attempted to operationalize causality through data-driven probabilities. Suppes' (1970) well-known operational definition of causality states that event X causes event Y if the probability of Y is higher given X than without X, i.e., $P(Y \mid X) > P(Y \mid {\sim}X)$. This definition is consistent with Hume's (1738) constant conjunction criterion, yet it makes Hume's criterion probabilistic. A problem arises because there is often a third (confounding) factor. A common example is that atmospheric current Z causes both lightning X and thunder Y. This satisfies $P(Y \mid X) > P(Y \mid {\sim}X)$; however, lightning does not cause thunder. Suppes (1970) solves this problem by necessitating that X and Y have no common cause, thus avoiding a statistical confound. This condition is also stressed by Popper (1959), who adds a condition that no third variable Z accounts for the X–Y association. Cartwright (1995) argues that avoiding confounds requires that the relevant probabilities be assessed relative to background contexts where all other causal factors are held fixed. Probability may thus infer causality if the data from which the probabilities are computed are obtained with appropriate care to avoid confounds.

The view that conditional probability can allow for causality inference has long been proposed in the causality literature (Glymour et al. 1987, Pearl and Verma 1991, Spirtes et al. 2000). Druzdzel and Simon (1993) explain that conditional probabilities make it possible to represent asymmetries among variables and, thereby, causality. There is strong evidence that human beings are not indifferent to causal relationships and often give a causal interpretation to conditional probabilities (Shugan 2007). In particular, many studies seek to operationalize causality discovery from data with conditional probabilities (e.g., Heckerman 1996, Pearl 2000).
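To make the lightning-thunder example concrete, the following minimal simulation (ours, not from the paper; all probabilities are invented for illustration) shows how Suppes' criterion can hold spuriously under a common cause Z, and how conditioning on Z removes the apparent dependence:

```python
# Simulate a confounded pair: Z causes X and Z causes Y, but X does not
# cause Y. All parameter values below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.random(n) < 0.3                                      # atmospheric current
x = np.where(z, rng.random(n) < 0.9, rng.random(n) < 0.05)   # lightning
y = np.where(z, rng.random(n) < 0.8, rng.random(n) < 0.05)   # thunder

def p(event, given):
    """Empirical conditional probability P(event | given)."""
    return event[given].mean()

print(p(y, x), ">", p(y, ~x))          # Suppes' criterion holds spuriously
print(p(y, x & z), "~", p(y, ~x & z))  # given Z, X adds no information about Y
```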

2.3. Methods for Inferring Causality from Observational Data

Inferring causality can take place with temporal or cross-sectional data, with each approach having a different focus. According to Granger (1986, p. 967):

"In cross-sectional causation one is asking why a particular unit is placed in a certain part of the distribution for the variable of interest. In temporal causality one is asking why parameters of that distribution have changed through time. The two types are very different in nature and probably require different definitions and methods of analysis."

The well-known Granger causality, for example, addresses causality for time-series data. Temporal causality often uses experimental methods that permit randomization (Mithas et al. 2006). However, because such data are often difficult or even impossible to obtain, we focus on methods for inferring causality from observational, cross-sectional data that are common in empirical information systems research. Causality inference from such data is well accepted in the statistics (Holland 1986, Rubin and Waterman 2006), econometrics (Granger 1986), computer science (Druzdzel and Simon 1993), and IS literatures (Lee et al. 1997). We review three main methods: SEM, propensity scores, and BN.

2.3.1. Structural Equation Modeling (SEM). SEM is one of the most common data analysis methods in IS research. Gefen et al. (2000) report that 45% of empirical papers in Information Systems Research and 25% of papers in Management Information Systems Quarterly use SEM techniques. Its popularity stems from its advantages over regression analysis, path analysis, factor analysis, panel models, and simultaneous equation models. Though SEM was originally developed to model causal relationships, SEM methods are no longer believed to infer causality (see Footnote 3). To overcome this limitation, IS researchers have attempted to extend SEM methods to allow for causality inference. Lee et al. (1997) proposed an eight-step framework that attempts to represent and discover causal relationships from SEM data. Their idea is to integrate confirmatory analysis in SEM with exploratory analysis using TETRAD.⁹ However, Lee et al. (1997) did not elaborate on how causal relationships can be discovered from data, nor did they demonstrate how TETRAD can be integrated with SEM methods.

2.3.2. Propensity Scores. The propensity score method was originally proposed by Rosenbaum and Rubin (1983) to help assess causal effects of interventions. For example, Rubin and Waterman's (2006)

⁹ TETRAD (Glymour et al. 1987) is a data-driven, graph-based method developed in the machine-learning field that aims to discover causal relationships from data.


intervention (a pharmaceutical salesman's visit to a doctor) was shown to have a causal effect on a doctor's drug prescriptions. Their approach first summarizes all covariates into a single propensity score by regressing (often through a logistic regression) the treatment (salesman's visit) on a set of covariates. The propensity score is thus the probability of a doctor being visited as a function of all covariate variables. Mithas et al. (2006) used this approach to show the causal effect of customer relationship management applications on one-to-one marketing effectiveness. They prescribed a set of assumptions that researchers must make to infer causality with the aid of propensity scores at the individual, firm, and economy levels. However, existing propensity score methods only deal with one cause (treatment) and one effect. They are not applicable to the causal graph discovery problem we aim to address in this paper, where the graph (or structural model) is composed of a network of multiple causes and effects.
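As an illustration of the estimation step described above (our sketch, not from the paper; the data and column names are hypothetical), a propensity score is simply the fitted probability from a logistic regression of the treatment on the covariates:

```python
# Sketch: estimate propensity scores P(treated | covariates) with a
# logistic regression. All data below are synthetic and illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "practice_size": rng.normal(50, 10, 500),   # hypothetical covariates
    "urban":         rng.integers(0, 2, 500),
})
# Hypothetical treatment assignment that depends on the covariates.
df["visited"] = (rng.random(500) <
                 1 / (1 + np.exp(-(df["practice_size"] - 50) / 10))).astype(int)

X = df[["practice_size", "urban"]]
model = LogisticRegression().fit(X, df["visited"])
df["propensity"] = model.predict_proba(X)[:, 1]  # one score per doctor
```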

2.3.3. Bayesian Networks (BN). BNs are graphical structural models that encode probabilistic relationships among variables (Heckerman 1996). The BN literature has made major advances in inferring causal relationships from observational data (Binder et al. 1997, Pearl 1998, Spirtes et al. 2002). We follow Heckerman's (1996) and Friedman et al.'s (2000) notation to represent a generic graph (Figure 1). A graph G(V, E) is referred to as a DAG (directed acyclic graph) when the edges E linking the nodes V are directed and acyclic. Directed means E has an asymmetric edge over V, and acyclic means that the directed edges do not form circles.

Figure 1. A Generic Bayesian Network with Five Nodes (edges: A → C, B → C, C → D, C → E)

Associated with each edge is a conditional probability. A BN is a DAG that encodes a set of conditional probabilities and conditional independence assertions about the variables V (Heckerman 1996). The lack of possible arcs in G encodes conditional independencies. Let $V = \{X_1, \ldots, X_m\}$, where m is the number of variables and $X_i$ is both the variable and its matching node in G. Denote $\Pi_i$ the parents of node $X_i$ in G. As in Figure 1, node C's parents are A and B. Given the structure in G, the joint probability distribution for V is given by

$$p(x) = \prod_{i=1}^{m} p(x_i \mid \Pi_i). \qquad (1)$$

From the chain rule of probability, we have

$$p(x) = \prod_{i=1}^{m} p(x_i \mid x_1, \ldots, x_{i-1}). \qquad (2)$$
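To make Equation (1) concrete, a small sketch (ours; the conditional probability tables are invented for illustration) computes the joint distribution of the Figure 1 network as a product of local conditionals:

```python
# Joint probability of a BN via Equation (1): p(x) = prod_i p(x_i | parents_i).
# Structure from Figure 1: A -> C, B -> C, C -> D, C -> E. Binary nodes;
# the conditional probability tables are illustrative only.
from itertools import product

parents = {"A": (), "B": (), "C": ("A", "B"), "D": ("C",), "E": ("C",)}
# cpt[node][parent values] = P(node = 1 | parents)
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "D": {(0,): 0.2, (1,): 0.7},
    "E": {(0,): 0.3, (1,): 0.8},
}

def joint(assign):
    """p(assignment) as the product of local conditionals."""
    prob = 1.0
    for node, pa in parents.items():
        p1 = cpt[node][tuple(assign[q] for q in pa)]
        prob *= p1 if assign[node] == 1 else 1.0 - p1
    return prob

# Sanity check: the joint distribution sums to 1 over all 2^5 assignments.
total = sum(joint(dict(zip(parents, vals)))
            for vals in product([0, 1], repeat=5))
print(round(total, 10))  # 1.0
```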

A graph G represents causal relationships when there is an edge from A to B if and only if A is a direct cause of B in G (Spirtes et al. 2002). For instance, when Figure 1 is a causal graph, an edge A → C is interpreted as A directly causing C, or C being causally dependent on A. Druzdzel and Simon (1993) introduced the basic conditions by which a BN could reflect a causal structure: A BN is causal if and only if (a) each node of G and all of its predecessors describe variables involved in a separate mechanism in the graph, and (b) each node without predecessors represents an exogenous variable. More formally, a causal Bayesian network is defined in terms of d-separation and the causal Markov assumption.

d-Separation. A set of variables Z is said to d-separate variable X from variable Y if and only if Z blocks every path from X to Y (Pearl 2000). Graphically, d-separation typically exhibits itself in two cases: (1) X → Z → Y and (2) X ← Z → Y. The intuition is that X and Y become independent of each other once they are conditioned on variable Z: X causes Y through Z in case (1), and X and Y have a common cause Z in case (2). There is also a third case, X → Z′ ← Y, denoting that X and Y have a common effect Z′. This case is the opposite of d-separation: If two variables are independent, they will become dependent once conditioned on Z′. A set Z that d-separates X and Y should therefore not contain Z′. The notion of d-separation is especially useful in constructing a BN because it controls possible confounds in the form of Z.

Causal Markov Assumption. This is the central assumption that defines a causal BN. According to this assumption, each node is independent of its nondescendants in the graph, conditional on its parents in the graph. Simply put, given a node's immediate


cause, we can disregard the causes of its ancestors. The parents of a node form the smallest set of variables for which the relevant conditional independence holds. This assumption greatly reduces the complexity of Equation (2), and the joint probability of Figure 1 simplifies to $P(A,B,C,D,E) = P(A)\,P(B)\,P(C \mid A,B)\,P(E \mid C)\,P(D \mid C)$. By accepting the causal Markov assumption, we can then infer some causal relationships from observational data (Friedman et al. 2000).

3. Method Development

3.1. Rationale and Overview of the Proposed Method

The main interest of SEM studies is the structural model, i.e., the relationships among LVs (or theoretical constructs). LVs are assumed to be unobservable phenomena that are not directly measurable. What is observable, however, are the measurement items of each LV, the raw inputs to an SEM model. SEM studies thus also address the measurement model (the relationships among the measurement items and their LVs), which tests how well the LVs were actually measured.

First, given a set of measurement items, how can we identify the overarching LVs? This question does not often arise in the SEM literature because the common SEM methods (e.g., confirmatory factor analysis (CFA) in LISREL and PLS) mostly work in a confirmatory mode by prespecifying which measurement items load on which LVs (Gefen et al. 2000). Lee et al. (1997) criticize this confirmatory mode, pointing to BN as a potential alternative for exploratory analysis. However, building a BN in the presence of hidden LVs is a nontrivial problem that has long been recognized as one of the crucial, yet unsolved, problems in the BN literature (Cooper 1995, Friedman 1997). There are two fundamental issues to be addressed: (1) detecting the structure (or location) of LVs, and (2) calculating the values of the identified LVs.

The BN literature only addresses the second issue without dealing with the structure problem.¹⁰ Elidan and Friedman (2001) show that even learning the dimensionality (the number of possible values) of LVs is hard. Cooper (1995) and Chickering and Heckerman (1997) consider a simple case where there exists only a single hidden LV with a known structure. What remains unknown and needs to be determined is only the value of this LV. This simplifies the hidden LV problem to a special type of missing data problem where all values of the LV are missing. Imputation methods, such as the expectation maximization algorithm, can be used to impute the missing values. Similarly, Binder et al. (1997) assume that the complete network structure, including the location of the LVs, is known, and the goal is to learn the BN parameters in the presence of LVs. However, when no certain structure is given, the difficulty arises from having an unlimited number of hidden LVs and an unlimited number of network structures to contain them (Cooper 1995). Determining network structures is thus NP-hard,¹¹ and heuristic methods may be necessary. For instance, Elidan et al. (2000) propose a "natural" approach by identifying the "structure signature" of hidden LVs, which uses a heuristic to identify "semicliques" where each of the variables connects to at least half of the others. If such a semiclique is detected, a hidden variable is introduced as the parent node to replace the semiclique. Silva et al. (2006) questioned this ad hoc approach and proposed a method for determining the location of LVs based on a TETRAD difference that, loosely speaking, captures the intercorrelations among four variables (Spirtes et al. 2000, p. 264). However, the approach of Silva et al. (2006) only focuses on linear continuous LVs, for which the correlation-based TETRAD difference is applicable.

In sum, though the BN literature has made some progress in developing methods for detecting "hidden" LVs from data, to the best of our knowledge constructing a BN with LVs from measurement items has still not been achieved. The proposed LVI algorithm is thus developed (§3.2) to fill this gap. It uses the axiom of conditional independence as the building block that provides a causal interpretation to the measurement model (§3.2.1); the value of an LV is determined by nonlinear programming that maximizes conditional independence (§3.2.2). The actual algorithm is presented in §3.2.3; it takes the raw measurement items as inputs and outputs the identified LVs.

¹⁰ This is referred to as the structure problem of LVs because determining the location of LVs in a BN also specifies the BN structure.

¹¹ For n nodes, the number of possible BN structures is $f(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} 2^{i(n-i)} f(n-i)$. For n = 2, the number of possible structures is 3; for n = 3, it is 25; and for n = 5, it is about 29,000 (Cooper and Herskovits 1992).
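For illustration (our sketch, not from the paper), the recursion in Footnote 11 is easy to check numerically:

```python
# Count the DAGs on n labeled nodes via the recursion in Footnote 11.
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def f(n):
    if n == 0:
        return 1  # base case: the empty graph
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * f(n - i)
               for i in range(1, n + 1))

print(f(2), f(3), f(5))  # 3, 25, 29281 (the "about 29,000" cited in the text)
```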


Second, after the LVs are identified, how can we discover the most probable causal structure among the LVs? Two generic issues need to be addressed: (1) How can we determine that one causal structure is better than another? (2) How can we search for the best structure among all possible graphs, a problem known to be NP-hard? We first adopt the popular PC algorithm (named after Peter and Clark, in Spirtes et al. 2000) to generate a good initial Bayesian network and reduce the number of searches needed (§3.3.1). We then refine this initial graph using a scoring approach (§3.3.2) to compare potential candidate structures. The two state-of-the-art scoring functions in the BN literature are the Bayesian-Dirichlet metric (Cooper and Herskovits 1992, Heckerman et al. 1996) and the Gaussian metric (Glymour et al. 1987). However, neither scoring function is applicable to the Likert-type data commonly used in IS, which render discrete and ordinal data. Specifically, the Bayesian-Dirichlet metric assumes a multinomial distribution, with parameters distributed as Dirichlet. Its multinomial assumption, which is applicable to general discrete data, ignores the ordinal nature of Likert-type data. The Gaussian metric treats data as continuous with a Gaussian distribution, ignoring the discrete nature of Likert-type data.

To fill this gap, we develop a new scoring function, the proposed OL metric (§3.3.2.1). However, the OL metric only computes the overall fitness of a candidate structure; it does not infer whether a given structure is significantly better than another. In view of this, we develop a Chi-square test (§3.3.2.2). Integrating the OL metric and the Chi-square test, we can then determine whether a certain structure is significantly better than another. Finally, to intelligently search for the best causal structure, we adopt the greedy equivalence search (GES) strategy (which is proven to be optimal by Chickering 2002) for BN construction (§3.3.2.3).

Overall, the proposed BN-LV method can both identify the measurement model and test the structural model. The BN-LV method operates in two major stages: (1) it first identifies the "hidden" LVs given a set of measurement items (described in §3.2), and (2) it generates an equivalent class of graphs among the LVs, scores each candidate graph using the OL scoring function, and searches for the structure with the highest fitness score (described in §3.3).
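To fix ideas about the score-and-search loop that stage 2 follows, here is a generic greedy skeleton (our illustration; it is not the paper's PC2 initialization or the GES operators, and `score` is a placeholder for a structure scoring function such as the OL metric):

```python
# Generic score-based greedy structure search over DAGs (a sketch only).
# `score(dag, data)` stands in for a network scoring function.
import itertools

def neighbors(dag, nodes):
    """All graphs reachable by adding, deleting, or reversing one edge."""
    for a, b in itertools.permutations(nodes, 2):
        edges = set(dag)
        if (a, b) in edges:
            yield frozenset(edges - {(a, b)})             # delete edge
            yield frozenset(edges - {(a, b)} | {(b, a)})  # reverse edge
        else:
            yield frozenset(edges | {(a, b)})             # add edge
        # (an acyclicity check on each candidate is omitted for brevity)

def greedy_search(nodes, data, score, init=frozenset()):
    """Hill-climb from `init`, keeping the best-scoring single-edge change."""
    best, best_score = init, score(init, data)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(best, nodes):
            s = score(cand, data)
            if s > best_score:
                best, best_score, improved = cand, s, True
    return best
```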

3.2. Stage 1. Identifying Latent Variables from Measurement Items

The theory of measurement assumes that the LVs "cause" the direct measurement items (or indicators).¹² In theory, given an LV, the measurement items are independent from each other (Kline 1998). This is formally referred to as the axiom or assumption of conditional (or local) independence. Conditional independence is the basis of the theory of measurement and "the defining attribute of any latent structure analysis" (Heinen 1996, p. 4). However, existing SEM or BN models do not directly test this axiom. SEM methods mainly use correlation- or covariance-based factor analysis methods to categorize measurement items under LVs to test the measurement model.¹³

Herein, we propose a new algorithm that takes the raw measurement items as the only inputs and outputs the most likely LVs. This is accomplished by identifying the most likely measurement model for these measurement items by directly testing the axiom of conditional independence.

3.2.1. Testing the Axiom of Conditional Independence. The conditional independence axiom (Heinen 1996, p. 4; Spirtes et al. 2000, p. 253; Bollen 1989) asserts that $R_{x_i x_j \mid y}$, the conditional correlation between any two measurement items $x_i$ and $x_j$ given the latent variable y, should approach zero for any pair of $x_i$ and $x_j$, where $i, j \in \{1, \ldots, m\}$ and $i \neq j$. $R_{x_i x_j \mid y}$ is computed as follows:

$$R_{x_i x_j \mid y} = \frac{R_{x_i x_j} - R_{x_i y} R_{x_j y}}{\sqrt{(1 - R_{x_i y}^2)(1 - R_{x_j y}^2)}}. \qquad (3)$$

¹² This property of "reflective" LVs is shown in SEM models as an arrow starting from the LV and pointing to the measurement items.

¹³ Only one approach (Sarkar and Sriram 2001) uses conditional independence to discover composite attributes, which can be viewed as groups of attributes caused by LVs. However, it assumes that the dependent variable (in the case of Sarkar and Sriram 2001, bank failure) of the BN is known, and that given the dependent variable, the composite attributes are independent. This is a strict condition that SEM studies do not meet.


We choose the common t-test

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$$

for a correlation coefficient r from a sample of size n to determine whether $R_{x_i x_j \mid y}$ is significantly different from zero. If t > 1.96 when n is sufficiently large, we assume that the correlation is nonzero (p < 0.05).¹⁴ This test is called into question for LVs with only two measurement items (say, $x_1$ and $x_2$). For example, $R_{x_1 x_2 \mid y}$ is always one when y is a linear combination of $x_1$ and $x_2$, similar to computing the factor scores in a principal components factor analysis. This implies that this test cannot empirically verify the axiom with only two items per LV. Torgerson (1958) observed the same phenomenon, which he referred to as "measurement by fiat," because it often leads to rejection of the measurement model. It is no accident that many researchers (e.g., Kline 1998) recommend using more than two measurement items per LV in SEM studies, whereas LISREL recommends at least four measurement items per LV for the measurement model to converge.

¹⁴ Empirically, this test may be too restrictive for a large n. For instance, a small r of 0.08 turns out to be significant when n = 500, leading to rejection of the measurement model. Future studies may find a more appropriate rule-of-thumb test for $R_{x_i x_j \mid y}$.
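For illustration, Equation (3) and the t-test above translate directly into code (our sketch; the function names are ours):

```python
# Test the conditional independence axiom: compute R_{xi xj | y} from
# Equation (3) and its t statistic for a sample of size n.
import numpy as np

def partial_corr(xi, xj, y):
    """Conditional correlation of items xi and xj given the latent score y."""
    r_ij = np.corrcoef(xi, xj)[0, 1]
    r_iy = np.corrcoef(xi, y)[0, 1]
    r_jy = np.corrcoef(xj, y)[0, 1]
    return (r_ij - r_iy * r_jy) / np.sqrt((1 - r_iy**2) * (1 - r_jy**2))

def is_conditionally_independent(xi, xj, y, t_crit=1.96):
    """True if R_{xi xj | y} is not significantly different from zero."""
    n = len(xi)
    r = partial_corr(xi, xj, y)
    t = r / np.sqrt((1 - r**2) / (n - 2))
    return abs(t) < t_crit
```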

3.2.2. Determining the Latent Scores. A remaining caveat from above is that the axiom of conditional independence cannot be empirically tested as-is, because the LV (i.e., y in Equation (3)), at least in principle, is not directly measurable and thus cannot be empirically fixed. To test the conditional independence of a measurement model, an estimate of the value of the LV, referred to as latent scoring, must first be assessed. A common latent scoring method is the raw sumscore, which uses the simple sum of the measurement items. A variant of the raw sumscore method is the weighted average. For example, a usual method for estimating the values of LVs is principal components factor analysis, where the factor loadings are used as weights for computing the latent scores. However, the use of the raw sumscore or a weighted average lacks theoretical justification (Skrondal and Rabe-Hesketh 2004).

We propose an optimal weighting method that computes the latent scores so as to directly maximize conditional independence. Without loss of generality, suppose an LV y has m indicators $x_1, x_2, \ldots, x_m$. Let $w_1, w_2, \ldots, w_m$ (with $\sum_{i=1}^{m} w_i = 1$) be the corresponding weights. We then have $y = \sum_{i=1}^{m} w_i x_i$, using the weighted average approach.¹⁵

We formulate the problem of assigning latent scores as an optimization problem of finding the optimal weight vector $w^* = (w_1, w_2, \ldots, w_m)$ such that the maximum, in absolute value, of all the $m(m-1)/2$ pairs of conditional correlations $R_{x_i x_j \mid y}$ is minimized.¹⁶ Formally, the optimization problem is formulated below:

$$\begin{aligned} w^* &= \underset{w_1, \ldots, w_m}{\arg\min} \Big( \max_{1 \le i, j \le m,\; i \ne j} \big| R_{x_i x_j \mid y} \big| \Big) \\ \text{s.t. } & 0 \le w_i \le 1, \quad i \in \{1, \ldots, m\}, \\ & \sum_{i=1}^{m} w_i = 1, \\ & y = \sum_{i=1}^{m} w_i x_i. \end{aligned} \qquad (4)$$

Once the optimal weight $w^*$ is determined (given $0 \le w_i \le 1$),¹⁷ the latent score of y is fixed. The proposed minmax approach ensures that the conditional independence axiom is met even in the worst-case scenario (i.e., the max of $R_{x_i x_j \mid y}$). However, if no such $w^*$ is found to satisfy the axiom of conditional independence, then the research design or the measurement items may be highly problematic.

¹⁵ If the true latent score is required to be an integer (e.g., Likert-type scales), one can add an integer constraint to the optimization model (4) or round the latent score to an integer value.

¹⁶ One less strict alternative is to minimize the average of those $m(m-1)/2$ pairs of conditional correlations.

¹⁷ Our proposed formulation sets $0 \le w_i \le 1$, thereby preventing negative correlations among the measurement items. This follows the reflective view of measurement, where all items are expected to be positively correlated with each other. For example, LISREL automatically converts a negative correlation into a positive one when testing the measurement model.
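A minimal numerical sketch of formulation (4) (ours, not the authors' procedure) using scipy's SLSQP solver; because the minmax objective is non-smooth, this should be read as a heuristic illustration rather than the exact optimization used in the paper:

```python
# Solve the minmax latent-scoring problem of Equation (4) heuristically.
import numpy as np
from scipy.optimize import minimize

def partial_corr(a, b, y):
    """Conditional correlation R_{ab|y} from Equation (3)."""
    r_ab = np.corrcoef(a, b)[0, 1]
    r_ay = np.corrcoef(a, y)[0, 1]
    r_by = np.corrcoef(b, y)[0, 1]
    return (r_ab - r_ay * r_by) / np.sqrt((1 - r_ay**2) * (1 - r_by**2))

def worst_conditional_corr(w, items):
    """max_{i<j} |R_{xi xj | y}| with latent score y = items @ w."""
    y = items @ w
    m = items.shape[1]
    return max(abs(partial_corr(items[:, i], items[:, j], y))
               for i in range(m) for j in range(i + 1, m))

def latent_scores(items):
    """items: (n_samples, m) array. Returns the latent scores and weights."""
    m = items.shape[1]
    w0 = np.full(m, 1.0 / m)  # start from the simple (raw sumscore) average
    res = minimize(worst_conditional_corr, w0, args=(items,),
                   method="SLSQP", bounds=[(0, 1)] * m,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
    return items @ res.x, res.x
```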

3.2.3. The Proposed Latent Variable Identification (LVI) Algorithm. The LVI algorithm seeks to discover the smallest possible set of LVs (to ensure a parsimonious model with as few LVs as possible), partitioning the measurement items into disjoint sets while assuring that the axiom of conditional independence is satisfied within each disjoint set. The notation used in the LVI algorithm is described in Table 1, and LVI has two steps (Table 2).


Table 1. Notation for the LVI Algorithm

X: A set of m measurement items, X = {X1, X2, ..., Xm}
xi: A measurement item of X
k-item set: An item set having k measurement items
Lk: Set of valid k-item sets that satisfy the axiom of local independence
Ck: Set of candidate k-item sets (with k items that may not satisfy the axiom of conditional independence)
C-1: The necessary condition that two measurement items of the same LV should be moderately correlated
C-2: The axiom of conditional independence, as detailed in §4.1

The flowchart of the LVI algorithm is shown in Figure 2, and its algorithmic steps are outlined in Appendix A. The inputs to the LVI algorithm are the measurement items, and the outputs are the disjoint item sets, each of which represents an LV. Each L1 contains a single measurement item. L2 is generated using only the necessary condition C-1, because the axiom of conditional independence is not directly testable for two-item sets (§3.2.2). Therefore, the LVI algorithm works best for LVs that are measured with more than two measurement items, as is strongly recommended by SEM researchers (e.g., Kline 1998, Bollen 1989). Then, LVI generates Lk+1 from Lk by examining candidate item sets based on C-1 and C-2 (see also Table 1).

Step 2 prunes all valid item sets by eliminating all subsets and overlapping measurement items. To ensure the smallest number of LVs, the LVI algorithm begins from the largest item set (the one with the most measurement items) among all Lk. It then eliminates each overlapping item from the item set that is affected the least by its removal. Finally, the LVI algorithm outputs the disjoint item sets, each of which represents an underlying LV, and the value of each LV is computed according to formulation (4).¹⁸

¹⁸ Additional information is contained in an online appendix to this paper, available on the Information Systems Research website (http://isr.pubs.informs.org/ecompanion.html).
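For concreteness, here is a compressed sketch (ours) of the spanning step described above and detailed in Table 2; `satisfies_axiom` is a placeholder for the minmax conditional independence test of Equation (4):

```python
# Sketch of LVI's maximum-spanning step: grow item sets while the
# correlation condition (C-1) and the conditional independence axiom (C-2)
# hold. Step 2's subset/overlap pruning is omitted for brevity.
import numpy as np

def corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

def span_item_sets(items, satisfies_axiom, r_min=0.5):
    """items: dict name -> np.array of responses. Returns maximal item sets."""
    valid = [{name} for name in items]            # L1: singleton item sets
    maximal = []
    while valid:
        next_level = []
        for s in valid:
            grown = False
            for name in items:
                if name in s:
                    continue
                cand = s | {name}
                # C-1: every pair in cand at least moderately correlated;
                # C-2: the axiom, testable only for sets of 3+ items.
                if all(corr(items[a], items[b]) >= r_min
                       for a in cand for b in cand if a < b) and \
                   (len(cand) <= 2 or satisfies_axiom(cand, items)):
                    next_level.append(cand)
                    grown = True
            if not grown:
                maximal.append(s)                 # no item can be added
        valid = [set(t) for t in {frozenset(s) for s in next_level}]
    return maximal
```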

3.3. Stage 2. Constructing a Causal Bayesian Network for Structural Models

After the LVs are identified and their values are computed, the next step is to build a BN to test the causal relationships among the LVs. This corresponds to the structural model testing part of SEM. The common approach to learning a BN from data is to specify a scoring function (typically a variation of the likelihood function) for each candidate network structure and then select the BN with the highest score (Friedman et al. 2000). Because examining the possible network structures is NP-hard, the search algorithms (for the optimal structure) in the BN literature are almost exclusively variations of greedy algorithms. To reduce the number of searches, Spirtes et al. (2002) proposed the generic PC algorithm to generate an initial starting point and then used a greedy search algorithm based on the scoring function to reduce search complexity. We follow this common practice and discover the most likely BN in two steps: (1) generate an initial class of equivalent BNs using PC2 (our proposed variation of the PC algorithm), and (2) select the most likely causal BN using a new scoring function designed specifically for the ordinal and discrete (Likert-type) data that are commonly found in IS research.

3.3.1. Generating Equivalent Classes of Bayesian Networks from Data. Given a set of data, is it possible to create a unique causal Bayesian network? The consensus is that one cannot distinguish from data alone between BNs that specify the same conditional independencies. It is possible that two or more BN structures represent the exact same constraints of conditional independence (every joint probability distribution generated by one BN structure can also be generated by the other). In this case, the BN structures are said to be likelihood equivalent.

When learning an equivalent class of structures from data, we can conclude that the true BN is possibly any one of the networks in this class (Friedman et al. 2000). An equivalence class of network structures can be uniquely represented by a partially directed graph, where a directed edge X → Y suggests that all members of the equivalence class contain the arc X → Y. Otherwise, an undirected X–Y edge denotes that some members of the class contain the arc X → Y while others contain the arc Y → X. Learning the causal relationships among LVs can thus be regarded as the process of "directing" a graph.

The BN literature (e.g., Glymour et al. 1987, Heckerman et al. 1995) has developed methods to generate equivalent structures that have the same underlying undirected graph.


Table 2 Detailed Steps of the LVI Algorithm

Step 1: Identify all sets of measurement items (item sets) that satisfy the axiom of conditional independence

LVI uses a maximum spanning approach. It starts with a randomly selected measurement item and it incrementally adds items to the item set. It stops when no item can be added to the item set without violating the axiom. Denote Lk the item set with k measurement items that meet the conditional independence axiom. The core step of the algorithm is to span from Lk to Lk+1, the item set containing k + 1 measurement items that still meet the axiom. This is done by adding an item not already in Lk into Lk, and then testing the axiom for the new item set with these k + 1 items using the method in §3.2.1. This step can incur high computational cost because it involves the optimization procedure to determine the latent score. We impose one condition to limit the possible combinations of Lk+1 to reduce the computation cost: The correlation between any two items in Lk+1, say, xi and xj, should be at least moderate (Kline 1998, p. 190). The user may specify a threshold to determine “moderate” correlation. We use a low correlation of r = 0.5 as the generic threshold. Then, we eliminate the candidate item sets with k + 1 items that do not meet the conditions in the first place.

Step 2: Prune the generated item sets from Step 1 into disjoint (discriminant) sets

LVI first identifies the supersets of item sets and then deletes all subsets. For example, suppose two item sets A = {x1, x2, x3} and B = {x1, x2, x3, x4} are generated by Step 1. Clearly B is a superset of A. In this case, we need to delete subset A to ensure convergent validity. We then deal with item sets that have overlapping items. For example, suppose two item sets A = {x1, x2, x3} and B = {x1, x4, x5, x6} have an overlapping item x1 (i.e., x1 loads on both A and B). SEM methods would consider x1 a problematic item because it violates discriminant validity. Skrondal and Rabe-Hesketh (2004, p. 8) suggest either accepting that an item may belong to two or more LVs or discarding the problematic item (Goldstein 1994). Our algorithm detects such problematic items and, by default, we assume the user decides to keep the items. The LVI algorithm then determines the LV that the item is more likely to belong to. This is done by testing the impact of deleting the measurement item from the two LVs on the value $\max(R_{x_i x_j \mid y})$ among the residual measurement items. The measurement item is then assigned to the LV that is affected the most.
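As a companion to Table 2, here is a minimal sketch of Step 1's spanning loop, under assumptions of ours: passes_axiom is a hypothetical stand-in for the conditional-independence test of §3.2.1 (the expensive optimization step), and the pairwise-correlation filter uses the r = 0.5 threshold described above.

```python
import numpy as np
from itertools import combinations

def span_item_sets(X, items, passes_axiom, r_min=0.5):
    # Sketch of LVI Step 1 (Table 2): grow item sets L2 -> L3 -> ...
    # X: (n, m) array of Likert responses; items: column names;
    # passes_axiom(item_set): hypothetical conditional-independence test.
    corr = np.corrcoef(X, rowvar=False)
    col = {name: k for k, name in enumerate(items)}
    moderate = lambda s: all(corr[col[a], col[b]] >= r_min
                             for a, b in combinations(s, 2))

    # L2: all moderately correlated pairs that pass the test (C-1).
    level = [frozenset(p) for p in combinations(items, 2)
             if moderate(p) and passes_axiom(frozenset(p))]
    all_sets = list(level)
    while level:                       # span Lk -> Lk+1
        nxt = set()
        for s in level:
            for x in items:
                cand = s | {x}
                # Cheap correlation filter first, axiom test (C-1, C-2) last.
                if x not in s and moderate(cand) and passes_axiom(cand):
                    nxt.add(frozenset(cand))
        level = list(nxt)
        all_sets.extend(level)
    return all_sets                    # all surviving Lk, k >= 2
```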

Figure 2 Flowchart of the Steps of the LVI Algorithm

[Figure: flowchart. Input X = (X1, X2, ..., Xm); initiate L1 = {xi}, i ∈ (1, ..., m); generate L2 = {xi, xj} from L1 that satisfy C-1, i ≠ j; for k = 2, ..., m − 1, generate candidate sets Ck+1 by adding xi ∉ Lk to Lk and add to Lk+1 those Ck+1 that meet C-1 and C-2, else try the next xi; when k = m − 1, output all Lk; finally, prune the Lk (delete subsets and handle overlapping items).]

As reviewed earlier, theories of causal BN are based on the d-separation and the causal Markov assumptions. Pearl and Verma (1991) established theorems to operationalize the construction of BN using d-separation. Let $R_{x,y \mid z}$ be the partial correlation of variables X and Y given Z, for a triplet of variables (X, Y, Z), where Z is any set of variables besides X and Y. If X and Y are not d-separated given any Z (i.e., $R_{x,y \mid z} = 0$), then there is an edge between X and Y. We thus need to test the condition $R_{x,y \mid z} = 0$ for all possible combinations of Z. This is again NP-hard (Spirtes et al. 1998). Spirtes et al. (2002) proposed the PC algorithm that tests the d-separation condition for any possible combination of X, Y, and Z to determine whether there is a link between X and Y. Though the PC algorithm is found to not be as accurate as the scoring approach in general (Silva et al. 2006, p. 211), it is more efficient and it can thus generate BN structures that serve as good starting points for other scoring-based algorithms.

Our approach also uses the PC algorithm to discover an initial causal structure as the input to the scoring-based algorithm in §3.3.2. We slightly modified the PC algorithm to make it consistent with SEM techniques and we termed our version of the algorithm PC2 (Appendix B). Algorithm PC2 refines the PC algorithm in two dimensions: (1) the PC algorithm uses Fisher's Z test, which requires all variables to be normally distributed, but the PC2 algorithm relaxes this assumption by using the aforementioned t-test for correlation coefficients to determine the significance of $R_{x,y \mid z}$ (Equation (2)); (2) the PC2 algorithm incorporates Verma and Pearl's (1992) five rules for directing graphs.

The output of the proposed PC2 algorithm is a partially directed causal BN $B_s$, because some edges may remain undirected.


This is a direct result of the limited capacity of Verma and Pearl's (1992) five rules for directing links. Our empirical studies suggest that these five rules are too specific. For example, Rule 3 states that if X → Y, Y → Z, and X–Z, then direct the link between X and Z as X → Z. This rule only covers a few of the cases we might encounter in BN construction. The PC2 algorithm is thus ineffective in orienting those links that are not covered by these five rules. However, we found PC2 to be adequate in identifying nonedges (i.e., nodes that should not be connected), and it is a fine starting point for causal BN discovery.
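The sketch below illustrates the kind of constraint-based skeleton search that PC2 starts from, using the t-test for (partial) correlation coefficients rather than Fisher's Z. It is our simplified reconstruction: the real PC algorithm orders conditioning sets by size and restricts Z to neighbors, and the published PC2 additionally applies Verma and Pearl's orientation rules (Appendix B of the paper).

```python
import numpy as np
from itertools import combinations
from scipy import stats

def partial_corr(data, x, y, z):
    # Partial correlation R_{x,y|z} via residuals of linear regressions.
    def resid(v):
        if not z:
            return data[:, v] - data[:, v].mean()
        Z = np.column_stack([data[:, j] for j in z] + [np.ones(len(data))])
        beta, *_ = np.linalg.lstsq(Z, data[:, v], rcond=None)
        return data[:, v] - Z @ beta
    return np.corrcoef(resid(x), resid(y))[0, 1]

def pc_skeleton(data, alpha=0.05):
    # Constraint-based skeleton search: drop the X-Y edge whenever some
    # conditioning set Z makes R_{x,y|z} insignificant under the t-test.
    n, m = data.shape
    edges = {frozenset((x, y)) for x, y in combinations(range(m), 2)}
    for x, y in combinations(range(m), 2):
        others = [v for v in range(m) if v not in (x, y)]
        removed = False
        for size in range(len(others) + 1):
            for z in combinations(others, size):
                r = partial_corr(data, x, y, list(z))
                df = max(n - 2 - len(z), 1)
                t = r * np.sqrt(df / max(1.0 - r * r, 1e-12))
                p = 2.0 * stats.t.sf(abs(t), df)
                if p > alpha:              # X and Y d-separated given Z
                    edges.discard(frozenset((x, y)))
                    removed = True
                    break
            if removed:
                break
    return edges
```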

3.3.2. Refining the Bayesian Network Using the Ordered Logit Scoring Function. The problem of learning a Bayesian network can also be perceived as a process of finding the class of $B_s$ that best matches data D. To compare two BNs, $B_{s1}$ and $B_{s2}$, we need to test the ratio $P(B_{s1} \mid D)/P(B_{s2} \mid D)$ given data D. By computing such ratios for pairs of BN structures, we can rank order a set of competing structures. Following Bayes' rule, we have

$$\frac{P(B_{s1} \mid D)}{P(B_{s2} \mid D)} = \frac{P(B_{s1}, D)/P(D)}{P(B_{s2}, D)/P(D)} = \frac{P(B_{s1}, D)}{P(B_{s2}, D)}.$$

Therefore, it is possible to score the likelihood of a Bayesian network $B_s$ given D by computing the joint probability $P(B_s, D)$.

3.3.2.1. The Ordered Logit (OL) Scoring Function. As pointed out in §3.1, none of the existing scoring functions is intended specifically for SEM data, especially for Likert-type data that are commonly used in IS research. We develop a new scoring function termed ordered logit (OL) specifically for ordinal and discrete data (Likert-type data). For a particular node x given a set of q parents $\pi$, its conditional probability can be estimated by the following OL function:

$$P(x \mid \pi) = OL\Big(\alpha + \sum_{i=1}^{q} \beta_i \pi_i\Big), \qquad (5)$$

where the parameters, the intercept $\alpha$ and the coefficients $\beta_i$, are estimated by running an ordered logistic regression (Borooah 2002) with x as the dependent variable and the q parents as the independent variables. The proposed OL function is derived in Appendix C. The joint probability of a $B_s$ and data D is computed as follows.

Suppose there are n records. Let $X(x_1, \ldots, x_m)$ be a set of m discrete variables. Each variable $x_i$ has $r_i$ possible values and a set of parents $\pi_i$. Let $\pi_{ijk}$ denote the kth parent for record j of variable $x_i$. Suppose there are $q_i$ such parents. Then,

$$P(B_s, D) = P(B_s)\prod_{j=1}^{n}\prod_{i=1}^{m} OL\Big(\alpha_{ij} + \sum_{k=1}^{q_i} \beta_{ijk}\,\pi_{ijk}\Big). \qquad (6)$$

Once $P(B_s)$ and the data D are known, we have full knowledge about the domain for the purpose of learning network structures. Cooper and Herskovits (1992) assume that $P(B_s)$ is constant; that is, all network structures are equally likely without further knowledge. This is a necessary assumption if we do not allow the users to specify their own priors about the network structure. Therefore, we can omit the component $P(B_s)$ when computing Equation (6).
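To make Equations (5) and (6) concrete, the sketch below computes ordered-logit category probabilities from a fitted cut-point vector and coefficients, and sums the log of the Equation (6) product for one node. The cut-points alpha and coefficients beta would come from an ordered logistic regression of x on its parents (for example, statsmodels' OrderedModel); the helper names are ours.

```python
import numpy as np

def ol_probs(parents, alpha, beta):
    # Ordered-logit probabilities for one record (Equation (5)).
    # parents: (q,) parent values; alpha: (r-1,) increasing cut-points;
    # beta: (q,) coefficients. Returns P(x = 1), ..., P(x = r).
    eta = parents @ beta
    cum = 1.0 / (1.0 + np.exp(-(alpha - eta)))   # P(x <= k), k = 1..r-1
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)                          # per-category probabilities

def ol_node_log_score(x, parents, alpha, beta):
    # Log of one node's factor in Equation (6): the sum over the n
    # records of log P(x_j | parents_j). x holds Likert values 1..r.
    ll = 0.0
    for xj, pj in zip(x, parents):
        ll += np.log(ol_probs(pj, alpha, beta)[int(xj) - 1])
    return ll
```

Summing such node scores over all nodes of a candidate structure gives $\ln P(B_s, D)$ up to the constant $\ln P(B_s)$, which the text above treats as uniform.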

3.3.2.2. Goodness-of-Fit Test Based on Ordered Logit. A goodness-of-fit test is needed to compare competing structures. SEM techniques have various tests for the overall fitness of a structural model (Gefen et al. 2000).19 Here, we develop a $\chi^2$ test based on the OL scoring function. Similar to SEM, we assume that the null model is the measurement model that has no paths among its LVs. Let $X(x_1, \ldots, x_m)$ be a set of m latent variables. In the null model, $P(B_s, D) = p(x_1) \times p(x_2) \times \cdots \times p(x_m)$. Assume each variable $x_i$ has $r_i$ possible values $x_i^1, \ldots, x_i^{r_i}$. Suppose there are n records. Define $n_{ij}$ to be the number of cases in the data D in which variable $x_i$ has the value $x_i^j$. Then, $p(x_i) = \prod_{j=1}^{r_i} (n_{ij}/n)$ and the overall likelihood for the null model is:

$$L_0 = p(B_s, D) = \prod_{i=1}^{m}\prod_{j=1}^{r_i} \frac{n_{ij}}{n}. \qquad (7)$$

For a particular BN, we know from Equation (6) that $L_1 = \prod_{i=1}^{m}\prod_{j=1}^{n} OL\big(\alpha_{ij} + \sum_{k=1}^{q_i} \beta_{ijk}\,\pi_{ijk}\big)$.

19 SEM techniques use different statistical tests for overall goodness-of-fit. For example, LISREL has 15 and AMOS has 25 different tests, the choice of which is debatable. However, we do not aim to develop a comprehensive list of tests here. We simply propose one overall goodness-of-fit test, assuming the null model is nested within the model of interest. Extensions to general criteria such as AIC or BIC are straightforward, following the derived likelihood function.


Given the two likelihoods, an appropriate goodness-of-fit test is the $\chi^2$ test $-2\ln(L_0/L_1)$, with degrees of freedom equal to n minus the total number of local parameters estimated. We know there are $\sum_{i=1}^{m} (r_i - 1)$ intercepts ($\alpha$) and $\sum_{i=1}^{m} q_i$ coefficients ($\beta$) to be estimated, where $r_i$ is the possible number of values of $x_i$ and $q_i$ is the total number of parents of $x_i$. Therefore, the degrees of freedom are given by:

$$df = n - \sum_{i=1}^{m} (r_i - 1) - \sum_{i=1}^{m} q_i. \qquad (8)$$

Using the $\chi^2$ test, we can determine if a particular graph structure is significantly better than any competing graphs (see Footnote 18).
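As a worked illustration of Equations (7) and (8), the sketch below computes the null log likelihood from category frequencies and the degrees of freedom for the likelihood-ratio comparison. The function names are ours, and the null likelihood follows Equation (7) as printed above.

```python
import numpy as np
from scipy.stats import chi2

def null_log_likelihood(data, r):
    # ln L0 for the no-path null model (Equation (7)).
    # data: (n, m) integer array of Likert values 1..r.
    n, m = data.shape
    ll = 0.0
    for i in range(m):
        counts = np.bincount(data[:, i], minlength=r + 1)[1:r + 1]
        ll += np.log(counts[counts > 0] / n).sum()
    return ll

def lr_test(ll1, ll0, n, r_i, q_i, alpha=0.05):
    # Chi-square comparison of a candidate BN against the null model,
    # with df from Equation (8): n - sum(r_i - 1) - sum(q_i).
    stat = -2.0 * (ll0 - ll1)
    df = n - sum(ri - 1 for ri in r_i) - sum(q_i)
    return stat, df, stat > chi2.ppf(1.0 - alpha, df)
```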

3.3.2.3. Searching for the Best Structure. After running the PC2 algorithm, we already have an initial graph $B_s$. Our only task is then to refine the initial BN graph and orient the undirected edges (e.g., X–Y) in $B_s$. According to Pearl and Verma (1991), two graphs G and G′ are structurally different and distinguishable if and only if they have a different underlying undirected graph and at least one different V-structure (i.e., converging directed edges into the same node, such as X → Z ← Y). From this theorem, if G and G′ have the same undirected structure, the only edges that must be directed are those that participate in V-structures (also referred to as colliders by Spirtes et al. 2000). Suppose we need to select between two competing structures $B_{s1}$ (X → Y) and $B_{s2}$ (Y → X) to orient the direction for X–Y. We first need to investigate if the direction reversal yields different V-structures. If it does, we must check if the likelihoods of the two structures are significantly different according to the Chi-square test, and we choose the one with the highest fitness score.

However, a local change in one part of the network can affect the evaluation of a change in another part of the network, making the search for the optimal structure NP-hard (Chickering 2002). Chickering (2002) developed a greedy equivalence search (GES) algorithm that is asymptotically optimal, and it is now considered the best causal model search algorithm to date (Silva et al. 2006). This searching strategy is herein adopted. The main objective of GES is to reduce the search space. GES has two phases: First, it greedily (according to a scoring criterion such as the OL function) adds dependencies by considering all possible single-edge additions. Once the greedy algorithm stops at a local maximum, a second-phase greedy algorithm considers all possible single-edge deletions. The algorithm terminates after no significant improvement can be further achieved in the second phase, and the final graph is outputted. This represents the method's final output.
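The sketch below shows a two-phase greedy skeleton in the spirit of GES, scored by any criterion such as the OL function. It is a simplification under our assumptions: true GES moves through equivalence classes, whereas this version greedily adds and then deletes single directed edges subject to acyclicity.

```python
from collections import defaultdict, deque
from itertools import permutations

def is_acyclic(nodes, edges):
    # Kahn's algorithm: the directed edge set must contain no cycle.
    out, indeg = defaultdict(set), {v: 0 for v in nodes}
    for a, b in edges:
        out[a].add(b); indeg[b] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    seen = 0
    while queue:
        v = queue.popleft(); seen += 1
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return seen == len(nodes)

def two_phase_search(nodes, score):
    # Phase 1: greedy single-edge additions; Phase 2: greedy deletions.
    edges, best = set(), score(frozenset())
    for phase in ("add", "delete"):
        improved = True
        while improved:
            improved = False
            if phase == "add":
                moves = [edges | {e} for e in permutations(nodes, 2)
                         if e not in edges and is_acyclic(nodes, edges | {e})]
            else:
                moves = [edges - {e} for e in set(edges)]
            for cand in moves:
                s = score(frozenset(cand))
                if s > best:
                    edges, best, improved = set(cand), s, True
    return edges, best
```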

4. Evaluating the Bayesian Networks for Latent Variables (BN-LV) Method

The BN-LV method (Figure 3) integrates both the measurement model (via the LVI algorithm) and the structural model (via the OL scoring function). BN-LV takes the raw measurement items as inputs (Step 1, Figure 3) and it identifies the LVs that govern these items using the LVI algorithm (Step 2, Figure 3). The value of the LVs is simultaneously computed by LVI through formulation (4) (Step 3, Figure 3). Then, an initial graph is generated using the PC2 algorithm on the identified LVs (Steps 4 and 5, Figure 3). The graph is refined based on the OL scoring function (Step 6, Figure 3), and it then outputs the most likely causal BN graph (Step 7, Figure 3).

The computational complexity of BN-LV is tested in Appendix D. Section 4.1 offers an illustrative example to describe the step-by-step process of the BN-LV method using actual empirical data. Because of data complexity20 and space constraints, we can only provide details at a summary level for Stage 2 of the algorithm (Steps 4–7, Figure 3). Still, for a smaller artificial data set, we show the detailed calculations behind Stage 2 of the BN-LV method as a tutorial (see Footnote 18).

Figure 3 Flowchart of the Steps of the BN-LV Algorithm

[Figure: flowchart. 1. Input data X = (X1, X2, ..., Xm); 2. Identify LVs using algorithm LVI; 3. Compute latent scores for each LV; 4. Initiate a graph G among LVs (all connected and undirected); 5. Apply algorithm PC2 on G to get Bs; 6. Refine Bs using the OL function with the GES searching strategy; 7. Output the most likely causal graph.]


Table 3 The Correlation Matrix for the Raw Measurement Items

        SAT1   SAT2   Trust1 Trust2 Trust3 EOU1   EOU2   EOU3   USEF1  USEF2  USEF3  INT1   INT2
SAT1    1.0
SAT2    0.867  1.0
Trust1  0.725  0.694  1.0
Trust2  0.695  0.615  0.819  1.0
Trust3  0.722  0.676  0.912  0.869  1.0
EOU1    0.583  0.532  0.603  0.658  0.622  1.0
EOU2    0.455  0.377  0.476  0.551  0.510  0.720  1.0
EOU3    0.558  0.477  0.569  0.616  0.561  0.865  0.864  1.0
USEF1   0.613  0.550  0.604  0.636  0.675  0.667  0.718  0.730  1.0
USEF2   0.527  0.425  0.464  0.476  0.512  0.541  0.577  0.620  0.695  1.0
USEF3   0.545  0.520  0.665  0.646  0.649  0.585  0.472  0.564  0.602  0.582  1.0
INT1    0.310  0.319  0.431  0.373  0.413  0.393  0.312  0.296  0.392  0.387  0.497  1.0
INT2    0.336  0.355  0.447  0.397  0.385  0.381  0.285  0.306  0.393  0.361  0.530  0.773  1.0


4.1. Illustrating the BN-LV Method Using Empirical Data to Test Competing Causal Models

We describe BN-LV by illustrating how it can help reconcile competing structural models using actual empirical data. The step-by-step illustration demonstrates how BN-LV can test competing hypotheses in terms of the direction of causality between trust and ease of use when integrating the classic TAM model with the construct of trust (Figure 5).21 Specifically, Pavlou (2003) argued and showed trust to influence ease of use while Gefen et al. (2003) argued and showed the opposite direction of causality. To make the comparison meaningful, we selected five constructs that are common across the two structural models—intentions (INT), usefulness (USEF), ease of use (EOU), trust (Trust), and satisfaction (SAT).22

20 Theoretically, there are in total 29,000 possible structures that need to be evaluated for 5 constructs.

21 According to TAM (Davis 1989), perceived usefulness (USEF) is the extent to which a user thinks that using a system will enhance her job performance (Davis 1989, p. 320). Perceived ease of use (EOU) is the extent to which a user thinks that using the system will be effortless (Davis 1989, p. 321). These two constructs predict a user's intention (INT) to use the system in the workplace.

4.1.1. Identifying Latent Variables from Measurement Items. In Stage 1 (the first three steps in Figure 3), BN-LV identifies the LVs from the measurement items. The raw data are the 13 measurement items associated with the 5 LVs (Pavlou 2003). Each item is measured on a seven-point Likert-type scale. The correlation matrix of the 13 measurement items is shown in Table 3, with a significance level at 0.162 (p < 0.05 level, n = 151).

Following the flowchart of LVI (Figure 2), L1 is first initiated and consists of 13 item sets {S1}, {S2}, {T1}, and so on (abbreviations of the items are used). L2 is then generated based on the weak constraint that the pairwise correlation should be moderate (i.e., >0.50), following condition C-1. This is done by checking with the correlation matrix (Table 3). Fifty-three such item sets meet the condition, including {S1, S2}, {S1, T1}, and so on. L3 needs to satisfy both conditions (C-1 and C-2) for any three-item combination. This yields the five item sets in Table 4, together with the weight vector and the MinMax(R)—the objective function in formulation (4). LVI then determines that there is no L4 that meets C-2 (the best candidate is {T1, T2, T3, U1} with MinMax(R) = 0.194). In sum, LVI completes generating item sets, resulting in 5 item sets in L3 and 53 item sets in L2.

22 Because the two studies (Gefen et al. 2003, Pavlou 2003) included different control variables (Pavlou (2003), for example, also included perceived risk in the structural model), the comparison was made for only these five constructs.


Table 4 The Weight Vector and MinMax(R) Values

L3              Weights              MinMax(R)
{T1, T2, T3}    (0.15, 0.12, 0.73)   0.133
{E1, E2, E3}    (0.09, 0.16, 0.75)   0.106
{U1, U2, U3}    (0.84, 0.07, 0.09)   0.112
{T2, T3, U1}    (0.04, 0.84, 0.12)   0.118
{E1, E3, U1}    (0.04, 0.87, 0.09)   0.103


Next, LVI prunes all those item sets as follows: First, it identifies the supersets of these 53 item sets in L2 and drops those subsets correspondingly (51 of them). Only two item sets in L2 remain: {S1, S2} and {I1, I2}. We then identify the overlapping items in L3. Notice that item U1 loads on three item sets (the last three rows of Table 4), which suggests that it is a potentially problematic item. LVI recommends keeping U1 with the set {U1, U2, U3} because it affects this item set the most (dropping it would lead to a weak L2 of {U2, U3} with a correlation of 0.582). The other two sets {T2, T3, U1} and {E1, E3, U1} are then dropped. Therefore, LVI eventually outputs the final disjoint item sets {S1, S2}, {I1, I2}, {T1, T2, T3}, {E1, E2, E3}, and {U1, U2, U3}. The corresponding latent scores are computed using the optimal weights in Table 4 for L3 and equal weights of 0.5 for L2.

4.1.2. Constructing the Most Likely Graph Among the Identified LVs. Stage 2 in BN-LV (corresponding to Steps 4–7 in Figure 3) discovers the most likely graph among the five LVs. It first initiates a fully connected and undirected graph that connects any two LVs (Step 4, Figure 3). Step 5 in Figure 3 applies algorithm PC2 and generates the graph (left panel of Figure 4). Note that PC2 fails to identify any causal directions for these links. Step 6 of Figure 3 applies the proposed OL scoring function with the GES search strategy to identify the directionality of these links. The result shown (right panel of Figure 4) closely corresponds to Pavlou's (2003) structural model for these five LVs.

4.1.3. Using BN-LV to Reconcile Competing Hypotheses on Causal Links. Gefen et al. (2003) proposed a different structural model in which the direction between EOU and Trust is EOU → Trust, while Pavlou (2003) proposed that Trust → EOU. Both studies provide compelling theoretical justifications. In such cases, Carte and Russell's (2003, p. 487) solution that seeks theoretical justification may be of little help. In contrast, BN-LV offers a “let data speak” approach to reconcile competing structural models. We examine the interrelationships among the five LVs—Trust, EOU, USEF, INT, SAT—for the two competing models (Figure 5).23 The data support the case of Trust → EOU. The log likelihoods are −908.8 for Pavlou (2003) and −952.3 for Gefen et al. (2003). The Trust → EOU direction of causality thus improves the log likelihood by 43.5. The degrees of freedom for the Chi-square test (Equation (8)) is 6 between the 2 graphs (Figure 5) and the critical value is 14.5 (p < 0.05).24 The improvement of Pavlou's (2003) model over the model of Gefen et al. (2003) is significant and distinguishable. Though we obviously cannot draw definite conclusions from one data set, this example illustrates how BN-LV can empirically reconcile between competing structural models in terms of the direction of causality in certain relationships.25

4.2. Simulation Experiments to Evaluate the BN-LV Method Relative to Competing Techniques

We further systematically evaluate the BN-LV method using a simulation experiment. The data generating process (DGP) is first described, followed by the evaluation of the BN-LV method's two core components—the LVI algorithm for identifying the LVs, and the OL scoring function for discovering the optimal graph structure (most likely causal BN).

23 SAT (satisfaction) is a control variable antecedent of Trust in Pavlou (2003). Our simulation study excluded this control variable for simplicity, but it is necessary to have SAT here in the empirical data to ensure that the two graphs are distinguishable. In graph (b), Figure 5, the node Trust is a collider case because SAT → Trust ← EOU, but it is not a collider case in graph (a), Figure 5.

24 The only difference between the two graphs in Figure 5 is “Trust” and “EOU.” In the left graph, both “Trust” and “EOU” have one parent. Each node needs (7 − 1) + 1 = 7 parameters (see Equation (8)) and, in total, 14 parameters are needed for them. In the right graph, “Trust” has two parents and “EOU” has none. Therefore, it needs (7 − 1) + 2 = 8 parameters. The left graph needs six more parameters.

25 We acknowledge that the two studies proposed different structural models with multiple additional constructs whereas our analysis only includes these five constructs. Therefore, we do not make any claims about the validity of the original findings because the results may have been influenced by the other constructs that are not included in this example.


Figure 4 Graphs Generated by the Proposed BN-LV Method

[Figure: two graphs over the nodes SAT, Trust, EOU, USEF, and INT. Left panel: the undirected graph after applying the PC2 algorithm (Step 5, Figure 3). Right panel: the directed graph after applying the OL scoring function (Step 7, Figure 3).]


4.2.1. The Data Generating Process (DGP) for the Simulation Experiments. The blueprint graph structure for our simulated data comes from Pavlou (2003, p. 90). The structural model (Figure 6) depicts the constructs that are hypothesized to affect consumer intentions to transact online. We only selected the study's principal constructs—Trust, Risk, EOU, USEF, and INT—while the control variables were omitted for simplicity. In particular, Trust is deemed exogenous while EOU, USEF, Risk, and INT are deemed endogenous (Figure 6).

We simulated the data according to the above theoretical structure, following the DGP specified in Silva et al. (2006) and Spirtes et al. (2000, p. 114). The DGP was composed of three steps:

Step 1. The exogenous variables were first independently generated following a normal distribution.

Step 2. Values of the endogenous variables were then generated as a linear function of their parents with a normally distributed error term e.

Step 3. Values of the indicators were generated directly from each of their corresponding latent variables, adjusted by a normally distributed noise term ε.

Figure 5 Competing SEM Models for Integrating TAM with Trust

[Figure: two structural models over SAT, Trust, EOU, USEF, and INT. (a) Pavlou (2003), log l = −908.8; (b) Gefen et al. (2003), log l = −952.3.]

First, we generated the exogenous construct Trust with a normal distribution N(4, 3). We attempted to be consistent with Pavlou (2003), who used a 7-point Likert-type scale with mean = 4. The variance was chosen to be 3, such that 95% of the time the simulated values fall within the range (0.6, 7.4). The endogenous LVs were simulated using the path coefficients (Figure 6). For instance, the only parent of EOU is Trust, with a path coefficient of 0.64. We thus generated the value of EOU from the linear equation EOU = 1.44 + 0.64 × Trust + e, where the intercept 1.44 was chosen for the mean of EOU to be 4. The noise term e follows N(0, σe²). Noise σe was varied by three levels: low, medium, and high, which were instantiated as 0.3, 0.6, and 0.9, respectively. The same procedure was used for the other principal constructs in Figure 6.

We simulated four indicators per LV (the LVI algorithm requires three indicators per LV while LISREL recommends four). Each indicator was simulated as the LV plus an error term ε. Likewise, ε followed a normal distribution N(0, σi²), while σi has 3 levels: 0.3, 0.6, and 0.9. We then converted the indicator values into 7-point Likert scales—1 for a simulated value below 1.49, 2 for a value within the range 1.5–2.49, and so on. We argue that this is consistent with the subjects' actual responses to survey questionnaire items, where each Likert anchor reflects a range of values.
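For concreteness, the following sketch reproduces the Trust → EOU branch of this DGP with numpy; the remaining endogenous constructs follow the same recipe with their own path coefficients, and the helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def likert(raw):
    # Bin continuous values to a 7-point scale (<1.49 -> 1, 1.5-2.49 -> 2, ...),
    # approximated here by rounding to the nearest anchor and clipping.
    return np.clip(np.round(raw), 1, 7).astype(int)

def simulate_trust_eou(n=250, sigma_e=0.6, sigma_i=0.6, k=4):
    trust = rng.normal(4.0, np.sqrt(3.0), n)              # Trust ~ N(4, 3)
    eou = 1.44 + 0.64 * trust + rng.normal(0.0, sigma_e, n)
    # Four indicators per LV: the latent value plus indicator noise.
    trust_items = likert(trust[:, None] + rng.normal(0.0, sigma_i, (n, k)))
    eou_items = likert(eou[:, None] + rng.normal(0.0, sigma_i, (n, k)))
    return trust_items, eou_items
```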


Figure 6 The Blueprint Graph Structure for the Simulated Data (Adapted from Pavlou 2003)

[Figure: path diagram among Trust, Perceived risk, Ease of use, Usefulness, and Intentions to transact, with path coefficients −0.63, −0.11, 0.35, 0.33, 0.12, 0.64, 0.41, and 0.49; for example, Trust → Ease of use = 0.64.]

4.2.2. Manipulation of Experimental Dimensions. In this experiment, we simulated the data across four dimensions: sample size, noise, normality, and linearity.

Sample Size. We simulated three sample sizes—50, 250, and 1,000—to represent small, medium, and large sample sizes, respectively. In SEM, as a rule of thumb, sample sizes below 100 are considered small, between 200–300 are considered moderate, and >500 are considered large (Gefen et al. 2000). LISREL is sensitive to small sample sizes while PLS can handle small sample sizes with bootstrapping (Chin 1998).

Noise. Because noise affects data quality and the structural models built with noisy data, we varied the noise levels of both σe and σi from low (0.3), medium (0.6), to high (0.9).

Normality. SEM assumes data normality (Gefen et al. 2000). To examine how violation of this assumption affects model performance, we simulated the exogenous construct Trust from a uniform distribution between 0.5 and 7.5. Note that a nonnormal Trust also renders the other constructs nonnormal because the other constructs are endogenous.

Linearity. SEM assumes linear relationships among LVs while the BN literature does not make this restrictive assumption. We simulated the endogenous variables to follow an exponential function of their parents plus a normal error term. We used a cumulative exponential distribution with parameter λ set to be equal to the path coefficients. For example, EOU was EOU = 0.5 + 7 × (1 − exp(−0.64 × Trust)) + e, with the scale parameters (0.5 and 7) chosen to ensure a mean of 4.

These 4 dimensions yielded a total of 15 combinations. There were five scenarios: three noise levels for the normal and linear data, one level of nonnormality, and one level of nonlinearity. Each scenario had 3 sizes: 50, 250, and 1,000. Because it is customary to use multiple runs to average out the randomness that arises from the normally distributed error terms, we used 5 runs for each of the 15 combinations, resulting in a total of 75 data sets. The total number of data sets was limited by the manual data analysis procedures in LISREL and PLS. Therefore, we only examined one level of noise (medium noise) for the nonnormality and nonlinearity cases and we used only five runs for each combination.
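The nonlinearity scenario replaces the linear structural equation with the cumulative-exponential form above; a minimal sketch, continuing the earlier simulation example:

```python
import numpy as np

rng = np.random.default_rng(0)

def eou_nonlinear(trust, sigma_e=0.6):
    # Nonlinearity scenario: EOU = 0.5 + 7 * (1 - exp(-0.64 * Trust)) + e,
    # with the scale parameters chosen so that the mean of EOU is about 4.
    return 0.5 + 7.0 * (1.0 - np.exp(-0.64 * trust)) + rng.normal(0.0, sigma_e, trust.shape)
```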

4.2.3. Measurement Model Comparison. We compared the LVI algorithm with the two commonly used methods for measurement model testing: EFA and CFA. For EFA, we used principal components factor analysis (Proc Factor in SAS 9.1) using the Eigenvalue >1 criterion. For CFA, we used the CFA procedure in PLS Graph 3.0. Note that this comparison is more generous to the CFA method because it takes more inputs (the number of factors) than the exploratory LVI and EFA.

We ran each method on the 75 simulated data sets.

Each data set consisted of 20 indicators (4 measurement items for each of the 5 constructs). Dayton and Macready (1988) proposed using omission and intrusion error rates to evaluate the results of measurement model testing (factor analysis). The omission error rate is the percentage of manifest items that are not included in any LV. The intrusion error rate is the percentage of manifest items that are misassociated with certain LVs. Spirtes et al. (2000) and Silva et al. (2006) use similar metrics, termed omission rate and commission rate, respectively (the percentage of LVs not specified in the true measurement model). These metrics result in four evaluation criteria (a computational sketch follows the list):

1. Latent Omission (LO). The error rate associated with omitted LVs. It is computed as the number of true LVs that are not identified by the method under investigation, divided by the total number of true LVs (five in our simulated study).

2. Latent Commission (LC). The error rate associated with misidentified LVs. It is computed as the number of LVs that are identified by the method (however, not the true LVs), divided by the total number of true LVs.

3. Indicator Omission (IO). The error rate associated with missing indicators (items). It is computed as the number of items that are in the true measurement model but do not appear in the measurement model generated by the method under investigation, divided by the total number of items in the true measurement model (20 in our simulated study).

4. Indicator Commission (IC). The error rate associated with misclustered indicators (items). It is computed as the total number of items generated by the method under investigation that are misclustered under their nonhypothesized LVs.
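The sketch below computes the four criteria from a true and a discovered measurement model, each given as a list of frozensets of item names. The majority-overlap rule used to match a discovered LV to a true LV is our own assumption; the paper does not spell out a matching rule.

```python
def measurement_error_rates(true_lvs, found_lvs):
    # Match each true LV to the discovered set sharing most of its items;
    # the majority-overlap matching rule here is our assumption.
    matches = {}
    for t in true_lvs:
        best = max(found_lvs, key=lambda f: len(f & t), default=frozenset())
        if len(best & t) > len(t) / 2:
            matches[t] = best
    n_lv = len(true_lvs)
    n_items = len(set().union(*true_lvs))
    lo = (n_lv - len(matches)) / n_lv                           # omitted LVs
    lc = sum(1 for f in found_lvs
             if f not in matches.values()) / n_lv               # spurious LVs
    placed = set().union(*found_lvs) if found_lvs else set()
    io = len(set().union(*true_lvs) - placed) / n_items         # omitted items
    ic = sum(len(f - t) for t, f in matches.items()) / n_items  # misclustered
    return lo, lc, io, ic
```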


Table 5 Measurement Model Comparison Results

Simulated dimensions                 LVI                      EFA (SAS Proc Factor)    CFA (PLS)
Size   Noise   Normality Linearity  LO    LC    IO    IC     LO    LC    IO    IC     LC    IC
1,000  Low     Yes       Yes        0     0     0     0      0.6   0.4   0.6   0.4    0.68  0.59
1,000  Medium  Yes       Yes        0.36  0     0.36  0      0.6   0.4   0.6   0.4    0.36  0.4
1,000  High    Yes       Yes        0.64  0     0.64  0      0.36  0.16  0.36  0.16   0.2   0.2
1,000  Medium  No        Yes        0.48  0     0.48  0      0.6   0.28  0.6   0.28   0.68  0.54
1,000  Medium  Yes       No         0.4   0     0.4   0      0.28  0.04  0.28  0.05   0.28  0.14
250    Low     Yes       Yes        0     0.12  0     0.06   0.6   0.4   0.6   0.4    0.8   0.82
250    Medium  Yes       Yes        0.26  0.08  0.26  0.04   0.52  0.32  0.52  0.32   0.36  0.12
250    High    Yes       Yes        0.44  0     0.44  0      0.32  0.22  0.32  0.2    0.08  0.04
250    Medium  No        Yes        0.16  0.24  0.16  0.11   0.52  0.56  0.52  0.56   0.6   0.31
250    Medium  Yes       No         0.24  0.16  0.24  0.06   0.28  0.08  0.28  0.08   0.08  0.03
50     Low     Yes       Yes        0.12  0.28  0.12  0.22   0.56  0.24  0.56  0.28   0.76  0.85
50     Medium  Yes       Yes        0.12  0.12  0.12  0.06   0.36  0.24  0.36  0.2    0.44  0.21
50     High    Yes       Yes        0.16  0.28  0.16  0.18   0.28  0.2   0.28  0.17   0.32  0.18
50     Medium  No        Yes        0.08  0.34  0.08  0.13   0.48  0.2   0.48  0.17   0.88  0.38
50     Medium  Yes       No         0.24  0.04  0.24  0.02   0.32  0.2   0.32  0.14   0.36  0.13

Average                             0.25  0.11  0.25  0.06   0.45  0.263 0.45  0.25   0.46  0.33

These four criteria can be readily computed by the LVI algorithm, which directly outputs the LVs and the associated items. However, the LO and IO criteria are not applicable to the PLS CFA because the true number of LVs is already prespecified. Also, because the SAS Proc Factor and the PLS CFA output LVs with the loadings of each item associated with the LVs, there is no consensus as to what determines a good LV given its loadings. A common guideline for ensuring discriminant validity is that the loading of an item on its hypothesized LV be reasonably high (e.g., >0.70) while the item loadings on the other LVs should be substantially smaller (e.g., <0.40) (Gefen et al. 2000). A conservative rule of thumb is for the difference between the hypothesized and nonhypothesized loadings to be at least 0.2. If this rule is violated and an item loads on more than one LV, we detect the indicator as IC. If any indicators from different LVs load into a single LV, we detect LC because the method does not discriminate among these items. If an indicator does not load onto any LV, IO is detected. Finally, if an LV in the true measurement model is not identified by the method, we detect an LO.

Table 5 presents the summary results for LVI, EFA (SAS Proc Factor), and CFA (PLS), averaged over five runs. Column 1 indicates the sample size. Column 2 indicates the noise level (low, medium, or high). Column 3 indicates whether the data are generated from a normal distribution and Column 4 indicates whether there is a nonlinearity, as described above. Table 5 shows that on average the LVI algorithm outperforms the EFA and CFA methods. As the sample size increases, the LVI error shifts from commission to omission. PLS CFA is the least sensitive to sample size. Noise has a negative effect on the performance of the LVI algorithm. Nonnormality affects the CFA the most (e.g., the LC rate goes from 0.36 to 0.68 for n = 1,000). However, nonlinearity does not appear to have a clear impact on the three methods.

We ran two repeated measures ANOVA analyses: one between the LVI and the EFA (Table 6) and one between the LVI and the CFA (Table 7), to examine the role of the four simulated dimensions (sample size, noise, normality, and linearity).

Table 6 shows that the LVI is significantly superior to the EFA in terms of LO, LC, and IC (p-value < 0.05) and marginally significant in terms of IO (p-value = 0.051) (within subjects).


Table 6 ANOVA Results Between LVI and EFA (SAS Proc Factor)

Comparison  Source        Measure  F-value  Significance (p-value)
Within      LVI vs. EFA   LO        4.188   0.045
                          LC        5.415   0.023
                          IO        3.947   0.051
                          IC        6.075   0.016
Between     Sample size   LO       27.067   0.000
                          LC        5.178   0.008
                          IO       28.189   0.000
                          IC        2.243   0.114
            Noise         LO        3.066   0.053
                          LC       10.858   0.000
                          IO        3.122   0.050
                          IC        5.417   0.007
            Nonnormality  LO        0.282   0.597
                          LC        2.516   0.117
                          IO        0.287   0.594
                          IC        1.356   0.248
            Nonlinearity  LO        3.785   0.056
                          LC       11.362   0.001
                          IO        4.588   0.036
                          IC       11.503   0.001

The between-subjects comparison shows the LVI to be generally superior to the EFA in terms of sample size, noise, and nonlinearity, but not in terms of nonnormality.

Table 7 shows the comparison between LVI and CFA. The within-subjects comparison shows that the LVI algorithm significantly outperforms CFA in terms of both LC and IC. The between-subjects comparison also shows that the LVI outperforms the CFA on virtually all accounts except under sample size for LC.

4.2.4. Structural Model Comparison. This section compares BN-LV with four methods: PLS, LISREL, and two BN methods—the BD-metric approach based on the Dirichlet assumption of the data (Heckerman 1996) and the Gaussian approach that assumes data normality (Glymour et al. 1987).

Table 7 ANOVA Results Between LVI and CFA (PLS)

Comparison  Source        Measure  F-value  Significance
Within      LVI vs. CFA   LC       41.789   0.000
                          IC       57.274   0.000
Between     Sample size   LC        0.611   0.546
                          IC        3.885   0.025
            Noise         LC       27.521   0.000
                          IC       48.402   0.000
            Nonnormality  LC       14.505   0.000
                          IC        9.009   0.004
            Nonlinearity  LC       14.505   0.000
                          IC        4.294   0.042

The graph (structural model) generated by the five methods is compared against the prespecified graph (Pavlou 2003) on which the data were simulated. Following Spirtes et al. (2000), we use three comparison criteria (sketched in code after this list):

1. Path Omission (PO). The error rate associated with omitted paths (links). It is computed as the number of paths that are in the true structural model (graph) but were not identified by the method under investigation, divided by the total number of paths (eight in our simulated study) in the underlying structural model (Pavlou 2003).

2. Path Commission (PC). The error rate associated with misidentified paths. It is computed as the number of paths that are identified by the method but do not appear in the true model, divided by the total number of paths in the true model.

3. Path Misdirection (PM). The error rate associated with misdirected paths. It is computed as the number of misoriented directions of causality as opposed to the true structural model, divided by the number of true directions (eight in our study).

We further separated the comparison into confirmatory (PLS and LISREL) (Table 8) and exploratory (BN) results (Table 9).
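Under our own conventions (directed edges as ordered pairs), the three criteria can be computed as follows; treating an edge as misdirected only when its reverse is recovered is our reading of PM.

```python
def path_error_rates(true_edges, found_edges):
    # true_edges, found_edges: sets of (tail, head) tuples, e.g.
    # {("Trust", "EOU"), ("Trust", "Risk"), ...}.
    t_und = {frozenset(e) for e in true_edges}   # ignore direction
    f_und = {frozenset(e) for e in found_edges}
    po = len(t_und - f_und) / len(true_edges)    # true paths not found
    pc = len(f_und - t_und) / len(true_edges)    # found paths not true
    pm = sum(1 for a, b in true_edges            # recovered but reversed
             if (b, a) in found_edges and (a, b) not in found_edges)
    return po, pc, pm / len(true_edges)
```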

Table 8 Structural Model Comparison Between BN-LV with PLS and LISREL (Confirmatory Mode)

Simulated dimensions                 BN-LV         PLS           LISREL
Size   Noise   Normality  Linearity  PO    PC      PO    PC      PO    PC
1,000  Low     Yes        Yes        0.05  0.00    0.00  0.15    0.08  0.00
1,000  Medium  Yes        Yes        0.05  0.00    0.00  0.10    0.05  0.00
1,000  High    Yes        Yes        0.00  0.00    0.08  0.05    0.13  0.00
1,000  Medium  No         Yes        0.05  0.00    0.00  0.05    0.08  0.02
1,000  Medium  Yes        No         0.00  0.15    0.03  0.20    0.08  0.13
250    Low     Yes        Yes        0.08  0.03    0.10  0.08    0.15  0.00
250    Medium  Yes        Yes        0.03  0.00    0.08  0.00    0.20  0.00
250    High    Yes        Yes        0.00  0.03    0.23  0.05    0.25  0.08
250    Medium  No         Yes        0.08  0.00    0.10  0.03    0.18  0.00
250    Medium  Yes        No         0.03  0.00    0.20  0.13    0.18  0.08
50     Low     Yes        Yes        0.15  0.00    0.33  0.00    0.38  0.00
50     Medium  Yes        Yes        0.13  0.00    0.30  0.03    0.35  0.00
50     High    Yes        Yes        0.15  0.00    0.33  0.05    0.43  0.00
50     Medium  No         Yes        0.10  0.00    0.30  0.00    0.43  0.00
50     Medium  Yes        No         0.18  0.00    0.30  0.05    0.53  0.00

Averages                             0.07  0.01    0.16  0.06    0.23  0.02


Table 9 Structural Model Comparison Between BN-LV with Dirichlet and Gaussian Methods (Exploratory Mode)

Simulated dimensions                 Common error rates   BN-LV (OL)   BN (Dirichlet)   BN (Gaussian)
Size   Noise   Normality  Linearity  PO     PC            PM           PM               PM
1,000  Low     Yes        Yes        0.05   0             0.28         0.55             0.38
1,000  Medium  Yes        Yes        0.25   0             0.23         0.43             0.28
1,000  High    Yes        Yes        0.18   0             0.20         0.50             0.50
1,000  Medium  No         Yes        0.2    0             0.13         0.38             0.45
1,000  Medium  Yes        No         0.03   0.15          0.18         0.33             0.53
250    Low     Yes        Yes        0.2    0.03          0.23         0.45             0.48
250    Medium  Yes        Yes        0.3    0             0.20         0.40             0.35
250    High    Yes        Yes        0.25   0.03          0.20         0.33             0.30
250    Medium  No         Yes        0.28   0             0.20         0.38             0.35
250    Medium  Yes        No         0.3    0             0.18         0.13             0.30
50     Low     Yes        Yes        0.53   0             0.15         0.18             0.18
50     Medium  Yes        Yes        0.4    0             0.15         0.33             0.20
50     High    Yes        Yes        0.41   0             0.08         0.13             0.15
50     Medium  No         Yes        0.3    0             0.20         0.20             0.23
50     Medium  Yes        No         0.3    0             0.08         0.10             0.07

Averages                                                  0.18         0.32             0.31

BN-LV operates in both a confirmatory and an exploratory mode26 while PLS and LISREL operate solely in a confirmatory mode. Therefore, alternative BN approaches (BD-metric and Gaussian) that operate in an exploratory mode must be used for a complete comparison with BN-LV. For the confirmatory mode, a link is considered missing if a hypothesized path is not significant (PO error). A PC error occurs if a nonhypothesized link is significant. The PM error is irrelevant because the true direction of the causal links is already prespecified. Table 8 shows that on average BN-LV outperforms both PLS and LISREL with respect to the PO and PC error rates.

Table 9 compares the three BN methods. To make the results consistent, we only apply the three scoring functions to orient directions after an initial graph is generated by algorithm PC2. Therefore, the only relevant criterion is the PM error rate. Table 9 shows that on average our OL function outperforms both the Dirichlet and the Gaussian functions.

Tables 10 and 11 report the ANOVA results with repeated measures. The results demonstrate that BN-LV statistically outperforms the two competing SEM methods in both the confirmatory and the exploratory mode.

26 BN-LV can also be used for confirmatory analysis, but the error rates can be different from the exploratory case. The PO error is detected by using d-separation conditions (PC2 algorithm) on the prespecified graph, if a link is d-separated by other prespecified links.

Table 10 ANOVA Results Between BN-LV with Dirichlet and Gaussian Methods (Exploratory Mode)

            BN (Dirichlet)                       BN (Gaussian)
            Source     F-value  Significance    Source    F-value  Significance
Within      Dirichlet  110.454  0.000           Gaussian   53.391  0.000
Between     Size        30.853  0.000           Size       20.847  0.000
            Noise        5.577  0.006           Noise       1.149  0.323
            Normality    0.799  0.374           Normal      1.077  0.303
            Linearity   21.586  0.000           Linear      0.129  0.720

4.2.5. Discussion of Simulation Results. Based on the simulation results, BN-LV has certain advantages over PLS and LISREL under the following conditions.

Sample Size. The BN-LV method performs consistently better than PLS and LISREL as sample sizes increase (PO errors decrease and PC errors stay low). LISREL substantially improves with larger sample sizes while PLS shows little improvement from medium to large sample sizes. BN-LV is clearly preferred for small sample sizes. Taken together, BN-LV is generally superior to PLS and LISREL across the spectrum of sample sizes (50, 250, 1,000).

Noise. In terms of noise, PLS and LISREL are shown to generally perform well for high noise levels, consistent with Fornell and Larcker (1981), who find that SEM model fitness based on structure consistency may improve as both the model and the theory decline. BN-LV turns out to be more sensitive to high data noise, especially for the measurement model.

Linearity. When the true relationship among LVs is not linear, BN-LV is shown to be superior to PLS and LISREL.

Table 11 ANOVA Results Between BN-LV with PLS and LISREL (Confirmatory Mode)

                                     PLS                     LISREL
Comparison  Source      Error rate   F-value  Significance   F-value  Significance
Within      PLS/LISREL  PO           13.595   0.000          53.852   0.000
                        PC            8.686   0.004           0.881   0.351
Between     Size        PO           78.860   0.000          81.377   0.000
                        PC            8.151   0.001           2.730   0.072
            Noise       PO            1.289   0.282           0.569   0.569
                        PC            0.631   0.535           0.610   0.546
            Normality   PO            0.158   0.692           0.506   0.479
                        PC            0.199   0.657           0.145   0.865
            Linearity   PO            1.421   0.237           2.560   0.114
                        PC           12.755   0.001          13.791   0.000


This is expected because BN-LV explicitly allows nonlinearities to emerge in the relationships among principal constructs. The main impact of nonlinearity on PLS and LISREL is the PC error in the structural model. For example, for sample size n = 1,000, the average PC error for PLS doubles from 0.10 to 0.20 (other dimensions held constant).

Normality. In terms of nonnormality, BN-LV is not affected when the data violate the normality assumption in the measurement model or in the structural model. In contrast, the CFA turns out to be very sensitive to data normality. The effect of nonnormality on LISREL appears to interact with sample size. For large sample sizes, nonnormality has a negative impact on LISREL (PO error increases from 0.05 to 0.08); for medium sample sizes, PO error decreases from 0.2 to 0.18. However, nonnormality does not considerably affect PLS for the structural model (Chin 1998). Therefore, BN-LV is clearly superior to LISREL but performs comparably to PLS.

The results for the structural model show that both PLS and LISREL make high omission errors in general while PLS commits the highest PC error rate. Therefore, BN-LV outperforms PLS and LISREL in both the PC and the PO error rates.

The simulation study also highlighted the difficulty in manually testing the measurement and the structural model for 75 data sets in PLS and LISREL (each data set required about 30 minutes to calculate the measurement and structural model). In contrast, the automated nature of BN-LV greatly facilitated model estimation (about a few seconds per data set). Thus, when there is a need for automating the process of exploring multiple causal structures, particularly for complex models that prohibit a manual specification of all possible structural models, BN-LV is clearly superior to PLS and LISREL.

5. Discussion

This study contributes to and has implications for the following literatures: First, it contributes to the empirical IS literature and the social and behavioral sciences in general by proposing a new data analysis method for inductively identifying LVs from raw measurement items and inferring the most likely causal structural model among the identified LVs. Second, it contributes to the BN literature by (a) allowing the identification of multi-item LVs with the proposed LVI algorithm and by (b) allowing discrete and ordinal data in BN with the proposed OL scoring function. Third, it contributes to the SEM literature by addressing three key limitations of existing SEM methods (Lee et al. 1997)—identifying causal links in the structural model, identifying the measurement and structural models in an exploratory manner, and allowing nonlinearities.

5.1. Implications for Empirical IS Research

The first contribution is the development of a comprehensive (measurement model construction and structural model discovery) data analysis method for inferring causal relationships among constructs, using observational, cross-sectional data that are discrete and ordinal. In fact, the majority of empirical studies in the IS literature use this type of data in Likert-type scales.

In terms of the measurement model, the proposed LVI algorithm has several advantages over competing methods. First, in contrast to common factor analysis techniques that rely on “rule-of-thumb” heuristics and approximate solutions, LVI offers an exact solution to the measurement model by categorizing all measurement items into LVs. Second, in contrast to CFA methods that impose a certain structure on the data, LVI operates in an exploratory mode, thus allowing the data to “speak out” and be categorized under the most likely LVs. Because BN-LV does not require IS researchers to prespecify which measurement items should belong to each LV, it allows them to explore how new measurement items could be classified into new LVs. By identifying problematic items (those that cannot be categorized under LVs), it allows IS researchers to reevaluate such potentially problematic items. Most important, the LVI algorithm directly tests the fundamental axiom of conditional independence, thus allowing a causal interpretation of the relationship between the LVs and their identified measurement items, consistent with the principles of the psychometric theory of measurement.

In terms of the structural model, BN-LV tests the d-separation conditions and uses the proposed OL scoring function to generate the most likely causal Bayesian network. This allows the inference of causality in structural models without imposing a prespecified structure.


By operating in an exploratory mode, BN-LV automatically examines all plausible structural models and selects the most likely one. This provides a major advantage over competing SEM methods that require manual specification of plausible models, especially for complex models where such manual work becomes virtually impossible. This advantage becomes valuable where there is little or no prior theory or when IS researchers want to rely purely on data. This has implications for new theory development (where there is no existing theory basis), which is particularly common in IS research because of the rapid evolution of IT and the introduction of new IT systems.

Finally, as a “conditional probability” method, BN-LV fundamentally differs from existing data analysis tools that rely on the correlation or covariance matrix. Because conditional probabilities can help infer causality (e.g., Druzdzel and Simon 1993, Shugan 1997), the BN-LV method provides IS researchers with another tool to infer causality from observational data.

5.2. Implications for the Bayesian Networks Literature

Despite the touted potential of Bayesian networks to facilitate research in the IS literature (Lee et al. 1997), existing BN methods have two key limitations that preclude their application to empirical SEM studies: (1) they cannot readily handle LVs measured with multiple measurement items,27 and (2) they are not suitable for discrete and ordinal data such as those obtained from Likert-type scales (both of which are prevalent in IS research). The proposed BN-LV method overcomes these limitations.

First, our proposed LVI algorithm provides a general method that allows the identification of “hidden” LVs from raw measurement items through an optimal weighting method that maximizes conditional independence. Also, the LVI algorithm does not impose a certain prespecified structure on the measurement model, nor does it make any distributional assumptions.

27 An exception is the work of Spirtes et al. (2000, p. 264) in their multiple indicator model building (MIMBuild) algorithm, which starts with a certain mix of LVs and measurement items in a linear system. The MIMBuild algorithm identifies impure measurement items for certain LVs. In contrast, the BN-LV method starts with only raw measurement items and it does not assume any of the relationships among the LVs to be linear.

More important, the LVI algorithm uses the axiom of conditional independence as its building block. To our knowledge, this renders the proposed LVI algorithm the only approach consistent with the theory of measurement: it offers a causal interpretation of the measurement model by specifying directional links from the measurement items to the LVs.

Second, BN-LV extends the BN literature to allow the use of ordinal and discrete data. The proposed OL scoring function overcomes this long-held limitation. Moreover, our simulation results show that OL outperforms the two state-of-the-art BN approaches—the Bayesian Dirichlet and the Gaussian metrics—for ordinal and discrete data.

5.3. Implications for Structural Equation Modeling (SEM) Research

As reviewed earlier, SEM methods no longer claim to infer causality, though they were originally designed to model causal relationships. In fact, Pearl (2000) notably observed:

I believe that the causal content of SEM has been allowed to gradually escape the consciousness of SEM practitioners mainly for the following two reasons: (1) SEM practitioners have sought to gain respectability for SEM by keeping causal assumptions implicit, since statisticians, the arbiters of respectability, abhor such assumptions because they are not directly testable; and (2) the algebraic, graph-less language that has dominated SEM research lacks the notational facility needed for making causal assumptions, as distinct from statistical assumptions, explicit. By failing to equip causal relations with distinct mathematical notation, the founding fathers in fact committed the causal foundation of SEM to oblivion. Their disciples today are seeking foundational answers elsewhere (p. 209).

In contrast, BN-LV takes advantage of conditional probabilities to encode directional relationships among LVs, thus permitting a causal interpretation in SEM models. This helps overcome the inability of SEM methods to infer causality.

Most SEM studies specify one model structure and use data to confirm this specific structure, thus operating in a confirmatory mode.28

28 To the best of our knowledge, no SEM technique allows researchers to automate the process of examining alternative models. Manual examination of alternative models becomes virtually impossible for complex models with multiple LVs and measurement items.



Diligent researchers are supposed to explore plausible alternative models, and the lack of an automated method for exploring alternative models risks overlooking equivalent models (Chin 1998). BN-LV helps overcome this limitation by generating an equivalence class of graphs among the LVs based on d-separation tests, scoring each candidate graph using the proposed OL scoring function, and selecting the one with the highest score. In doing so, BN-LV allows IS researchers to operate in an exploratory mode and lets the data inductively identify the most likely structural model.
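To make this exploratory mode concrete, the following sketch (our own illustration, not the authors' implementation) enumerates every DAG over a small set of LVs and keeps the highest-scoring one; score is a stand-in for the OL scoring function, and the d-separation pruning that BN-LV uses to restrict the candidate set is omitted for brevity:

    from itertools import combinations, product

    def is_acyclic(nodes, edges):
        """Kahn's algorithm: True if the directed edges form a DAG."""
        indeg = {n: 0 for n in nodes}
        for _, b in edges:
            indeg[b] += 1
        queue = [n for n in nodes if indeg[n] == 0]
        seen = 0
        while queue:
            n = queue.pop()
            seen += 1
            for a, b in edges:
                if a == n:
                    indeg[b] -= 1
                    if indeg[b] == 0:
                        queue.append(b)
        return seen == len(nodes)

    def all_dags(nodes):
        """Yield every DAG over `nodes` as a set of directed edges."""
        pairs = list(combinations(nodes, 2))
        # Each unordered pair is absent, oriented one way, or the other.
        for choice in product((None, 0, 1), repeat=len(pairs)):
            edges = {(a, b) if c == 0 else (b, a)
                     for (a, b), c in zip(pairs, choice) if c is not None}
            if is_acyclic(nodes, edges):
                yield edges

    def best_structure(nodes, score):
        """Return the candidate structure with the highest score."""
        return max(all_dags(nodes), key=score)

Even for three LVs there are 25 DAGs, and the count grows super-exponentially with the number of LVs, which is why pruning the space with d-separation tests before scoring is essential.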

SEM encodes relationships among LVs as linear equations, thereby ignoring potential nonlinear relationships. In contrast, BN-LV uses conditional probabilities that do not assume any functional (linear) form, thus allowing nonlinear relationships to emerge among the LVs in the structural model. Accounting for nonlinearities is an important strength of the BN-LV method, and our simulation study corroborates this by showing the superiority of BN-LV for data with nonlinearities.

In sum, BN-LV has certain advantages over existing SEM data analysis methods under the following conditions:

1. In the early stages of research when a hypothesis has not yet been developed, particularly when a researcher prefers to let the data "speak for themselves" rather than testing a prespecified measurement or structural model.

2. When theory and the literature provide little guidance on the causal structure of a structural model, particularly when the researcher needs to inductively explore several potential structural models to identify the most appropriate one.

3. When there is a need to automate the process of exploring potential causal structures, in particular for complex models whose numerous permutations among LVs prohibit the researcher from manually specifying all potential models.

4. When the researcher needs a stronger causal interpretation. The conditional probability-based BN-LV method is theoretically closer to the notion of causality than existing correlation- or covariance-based methods.

5. When the data violate normality assumptions and when the true relationships among the LVs are nonlinear. BN-LV is also found to be more robust to small sample sizes; however, SEM approaches are less sensitive to high data noise.

Therefore, BN-LV can be used under these conditions to complement existing SEM methods.

5.4. Limitations and Suggestions for Future Research

The paper also has a number of limitations, which create interesting opportunities for future research.

First, because this paper focuses on observational (cross-sectional, nonexperimental) data, we address only two of Popper's (1959) conditions for inferring causality (correlation between X and Y, and accounting for potential confounds), excluding the condition that X must temporally precede Y. Following Granger (1986), who distinguishes between temporal and cross-sectional causality, our method does not claim to establish the necessary and sufficient conditions for inferring "absolute" causality; rather, it is posited as a method for inferring "near" causality from observational, nonexperimental data. However, if the temporal ordering among the variables is already known from the data, the BN-LV method can sort all variables in temporal order and add a constraint that allows only preceding variables to cause subsequent variables when constructing the BN. This approach permits longitudinal or experimental data to be used in the BN-LV method, and the temporal constraint also greatly reduces the complexity of BN construction. The challenge for future research, however, is to identify the temporal ordering among variables from longitudinal data.
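As an illustration of this temporal constraint, the following sketch (ours, with hypothetical variable names) discards any candidate edge whose cause does not precede its effect; order maps each variable to its known time index:

    def temporally_admissible(edges, order):
        """Keep only edges whose cause strictly precedes its effect."""
        return {(a, b) for (a, b) in edges if order[a] < order[b]}

    candidates = {("trust", "intention"), ("intention", "trust")}
    order = {"trust": 0, "intention": 1}   # trust measured before intention
    print(temporally_admissible(candidates, order))
    # {('trust', 'intention')} -- the reverse edge is ruled out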

Second, the BN-LV method aims to discover the most likely causal structure in a probabilistic (not deterministic) fashion. By no means does the most likely causal structure discovered by BN-LV necessarily capture the definitive causal model. As noted in §2, there is still disagreement among philosophers and researchers about whether causality can be inferred from data, deterministically or probabilistically. In response to philosophers who argue that causality can only be inferred from controlled experiments that account for all potential confounds, BN-LV examines the relationship between X and Y while capturing possible confounds Z by evaluating d-separation conditions. Though rigorous researchers are supposed to account for all potential confounds,


future research could develop a formal method to test whether the existing confounds Z in the data are "adequate" to ensure that the X → Y relationship is truly and significantly causal.

Third, BN-LV deals only with LV identification given observed measurement items. It does not address the more general missing variable problem: how to build a model when potentially relevant LVs are unobserved and thus may not have been captured by the data (raw measurement items). When a relevant variable is missing from the set of Z variables, it may cause inconsistency in the d-separation condition of BN-LV. An intuitive solution is to search for the missing (unobserved) variables. Hutchinson et al. (2000, p. 325) call this a "needle in a haystack" problem because seeking all missing variables is unending, and it is likely that the key sources of unobserved effects may never be found. Indeed, a major challenge for causality inference is to account for all possible confounding variables (Mitchell and James 2001, Allison 2005). Recent advances in econometrics and marketing have examined this problem, primarily via latent class modeling and mixture models, which assume that responses do not come from a single population (group) while what causes the group membership is unobserved and cannot be determined a priori. These studies represent a major step toward identifying missing variables, but the literature has still not addressed the general structure problem of LVs discussed in §3.1. Most studies make specific assumptions about the structure of the LVs; once the structure is known, the LV identification problem is simplified to finding the parameters that best fit the LV structure. This is best reflected in the finite mixture model, where each observation may arise from two or more unobserved groups that have a common distribution but different parameters. Still, the "missing variables" problem remains a caveat in the literature. Solutions to this problem by future research can be readily integrated into the proposed BN-LV method to provide a more accurate set of Z variables for evaluating the d-separation condition.

Fourth, besides BN, the propensity scores approach (Mithas and Krishnan 2008) is a promising causal method. However, existing methods are not readily applicable to the complex nexus of causal relationships in the structural model addressed here. There are two key challenges in extending the propensity scores approach to our problem: (1) determining the propensity scores in the presence of multiple causes, and (2) identifying the right cloning for a given individual in the presence of multiple values of a given variable. Both issues need to be investigated by future research.

Finally, BN techniques cannot distinguish between structures that entail the same likelihood, especially when the two structures have the same V-structures (see our discussion in §3.3 and Spirtes et al. 2000, p. 60). In these cases, theoretical arguments may be necessary to specify the best structure. Moreover, the BN-LV method is a data analysis method that only examines the measurement and structural models; it does not address issues of theory development, measurement development, data collection, or theory implications. Similar to Lee et al. (1997), future research could explore how the proposed BN-LV method can be integrated into a comprehensive method for theory building, empirical validation, and theory implications.

Concluding Remarks
Causality is a fundamental characteristic of a good theory, but the difficulty of inferring causality has forced researchers either to infer causality from pure theory (Carte and Russell 2003) or from longitudinal (Granger 1986), experimental (Cook and Campbell 1979), or panel (Allison 2005) data. This paper is an attempt to revive the pursuit of causality in structural models from observational data, in the IS literature in particular and the social sciences in general, and to encourage IS researchers to bring causality considerations back into IS studies. The proposed BN-LV method aims to provide a tool for IS researchers to better understand how causal relationships can be inferred in structural models from observational data. We hope the proposed data analysis method serves as a modest starting point for enhancing methods for inferring causality and building causal theories in the IS literature. Given the enhanced sophistication of IS research in terms of theory and methods, causality can become an important consideration in the IS literature.


Appendix A. The LVI Algorithm
The algorithmic steps of the LVI algorithm are outlined in Table A1.

Table A1  Steps of the LVI Algorithm

Input: X = {x_1, x_2, ..., x_m}
Output: disjoint item sets, each of which represents an LV
1. L_1 ← all {x_i}, i ∈ {1, ..., m}
2. L_2 ← all {x_i, x_j} that meet C-1, i, j ∈ {1, ..., m}, i ≠ j  /* see §3.2.3 */
3. for (k = 2; L_k ≠ ∅ and k ≤ m − 1; k++) {
4.   generate L_{k+1}:
     (a) C_{k+1} ← adding x_i (x_i ∉ L_k) to L_k  /* adding x_i one at a time */
     (b) eliminate C_{k+1} that do not meet C-1
     (c) eliminate C_{k+1} that do not meet C-2
5. }
6. Prune L_k for all k ∈ {1, ..., m}  /* see §3.2.3 */
   (a) identify supersets and delete all subsets
   (b) detect overlapping measurement items and determine which item set keeps them  /* start from the largest item sets and go down the list to ensure the minimum number of LVs */
7. Output all L_k
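The level-wise search of Table A1 can be sketched as follows. This is our own illustrative rendering, not the authors' code: meets_c1 and meets_c2 are stand-ins for the tests of conditions C-1 and C-2 (§3.2.3), and the pruning step is simplified to keeping maximal, mutually disjoint item sets.

    from itertools import combinations

    def lvi(items, meets_c1, meets_c2):
        """Group measurement items into disjoint sets, one per latent variable."""
        levels = {1: [frozenset([x]) for x in items]}
        levels[2] = [frozenset(p) for p in combinations(items, 2)
                     if meets_c1(frozenset(p))]
        k = 2
        while levels.get(k) and k <= len(items) - 1:
            nxt = set()
            for s in levels[k]:
                for x in items:
                    if x not in s:                     # add one item at a time
                        cand = s | {x}
                        if meets_c1(cand) and meets_c2(cand):
                            nxt.add(frozenset(cand))
            levels[k + 1] = list(nxt)
            k += 1
        # Prune: largest sets first; drop subsets and overlapping sets.
        surviving = []
        for s in sorted((s for lvl in levels.values() for s in lvl),
                        key=len, reverse=True):
            if not any(s & t for t in surviving):
                surviving.append(s)
        return surviving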

Appendix B. The Proposed PC2 Algorithm
Table B1 summarizes the steps of Algorithm PC2. For a more detailed discussion of Algorithm PC, please refer to Spirtes et al. (2000). The proposed PC2 algorithm has three main steps:

• Step 1 initiates a fully connected, undirected graph.
• Step 2 computes R_{X,Y|Z} for all possible X, Y, and Z. If R_{X,Y|Z} = 0, then delete the edge between X and Y.
• Step 3 orients the graph using five rules for directing an undirected graph (Verma and Pearl 1992).

Table B1  The Proposed PC2 Algorithm

Step 1. Start with the complete (all nodes connected), undirected graph G
Step 2. Generate the reduced undirected graph G′:
  (1) Test whether R_{X,Y|Z} = 0 for each edge
  (2) Delete the edges that are d-separated by Z (where R_{X,Y|Z} = 0)
Step 3. Direct G′ using the following five rules:
  Rule 1. For each triple of vertices X, Y, Z such that the pair X, Y and the pair Y, Z are each adjacent in C but the pair X, Z are not adjacent in C, orient X–Y–Z as X → Y ← Z if and only if Y does not d-separate X and Z and the orientation does not introduce a directed cycle. If this edge reverses the previous orientation, then make it bidirected
  Rule 2. If X → Y, Y–Z, and X and Z are not adjacent, then direct Y → Z
  Rule 3. If X → Y, Y → Z, and X–Z, then direct X → Z
  Rule 4. If X–Y, Y–Z, Y–W, X → W, Z → W, then direct Y → W
  Rule 5. If X–Y, Y–Z, X–Z, Z–W, W → X, then direct X → Y and Z → Y
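For concreteness, Steps 1 and 2 and the collider orientation of Rule 1 can be sketched as follows. This is our own rendering: cond_dep stands in for the R_{X,Y|Z} test and should return 0 when Z d-separates X and Y; Rules 2 through 5 are omitted.

    from itertools import combinations

    def pc2_skeleton(nodes, cond_dep):
        """Steps 1-2: return (undirected edges, separating sets)."""
        edges = {frozenset(p) for p in combinations(nodes, 2)}  # Step 1
        sepset = {}
        for x, y in [tuple(e) for e in edges.copy()]:
            others = [n for n in nodes if n not in (x, y)]
            # Test every conditioning set Z drawn from the remaining nodes.
            for r in range(len(others) + 1):
                for z in combinations(others, r):
                    if cond_dep(x, y, z) == 0:      # d-separated: drop edge
                        edges.discard(frozenset((x, y)))
                        sepset[frozenset((x, y))] = set(z)
                        break
                else:
                    continue
                break
        return edges, sepset

    def orient_colliders(nodes, edges, sepset):
        """Rule 1: orient X - Y - Z as X -> Y <- Z when Y is not a separator."""
        directed = set()
        for x, z in combinations(nodes, 2):
            if frozenset((x, z)) in edges:
                continue                             # X, Z must be nonadjacent
            for y in nodes:
                if (frozenset((x, y)) in edges and frozenset((y, z)) in edges
                        and y not in sepset.get(frozenset((x, z)), set())):
                    directed |= {(x, y), (z, y)}
        return directed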

Appendix C. Deriving the OL Metric
We assume that a subject's response (e.g., to a measurement item) is a choice among r ordered values, rendering r possible ordered values \{1, \ldots, r\} of a variable (node) x. For this type of choice behavior, the OL model is appropriate (Borooah 2002). Let p_i = P(x = i \mid \pi), i \in \{1, \ldots, r\}, be the conditional probability of x = i given its parents \pi. We have the following OL functions:

\operatorname{logit}(p_1) = \log \frac{p_1}{1 - p_1} = \alpha_1 + \sum_{i=1}^{q} \beta_i \pi_i

\operatorname{logit}(p_1 + p_2) = \log \frac{p_1 + p_2}{1 - p_1 - p_2} = \alpha_2 + \sum_{i=1}^{q} \beta_i \pi_i

\vdots

\operatorname{logit}(p_1 + \cdots + p_{r-1}) = \log \frac{p_1 + \cdots + p_{r-1}}{1 - p_1 - \cdots - p_{r-1}} = \alpha_{r-1} + \sum_{i=1}^{q} \beta_i \pi_i

p_1 + p_2 + \cdots + p_r = 1.

An ordered logistic regression estimates the (r − 1) intercepts \alpha_1, \ldots, \alpha_{r-1} and the q coefficients (the \beta's). Note that each possible value i has a different regression equation with a different \alpha_i but with the same \beta's. Once the parameters are estimated, the individual conditional probabilities p_i, i \in \{1, \ldots, r\}, can be derived from the set of equations above. The OL function is defined as the solution for the p_i in the above equations.
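Once the intercepts and coefficients are estimated, the p_i follow mechanically by inverting the cumulative logits. A minimal sketch (the parameter values below are illustrative, not estimates from the paper):

    import math

    def ol_probs(alphas, betas, parents):
        """Invert the OL equations: cumulative logits -> category probabilities."""
        eta = sum(b * p for b, p in zip(betas, parents))
        # P(x <= i) from each logit equation; alphas must be nondecreasing.
        cum = [1 / (1 + math.exp(-(a + eta))) for a in alphas] + [1.0]
        # Successive differences recover the individual p_i.
        return [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]

    # r = 3 ordered categories, q = 2 parent values
    probs = ol_probs(alphas=[-1.0, 1.0], betas=[0.5, -0.3], parents=[2.0, 1.0])
    print(probs, sum(probs))   # three probabilities that sum to 1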

Appendix D. The Time Complexity of the BN-LV Method
The proposed BN-LV method can be decomposed into three parts: the LVI algorithm, the PC2 algorithm, and the OL scoring function. We discuss the computational complexity of each component below.

1. The LVI Algorithm. Assume there are n measurement items. Item set L_k contains at most n/k item sets, and at most (n − k) candidate items need to be evaluated for each. At each step, LVI checks two conditions, C-1 and C-2. The more expensive one is condition C-2, whose complexity depends on the optimization procedure (Equation (4)); denote this complexity by O(C_2). The overall complexity is therefore

\sum_{k=1}^{n} (n/k)(n - k) \, O(C_2) = \Bigl[\, n^2 \sum_{k=1}^{n} \frac{1}{k} - \frac{n(n+1)}{2} \Bigr] O(C_2).

\sum_{k=1}^{n} 1/k is the harmonic series, which diverges very slowly: when n = 1,000 its value is approximately 7.48, so we can treat it as a constant (a numerical check follows Table D1). Hence, the complexity is n^2 O(C_2), that is, on the order of n^2 tests, each with cost O(C_2).

2. The PC2 Algorithm. Spirtes et al. (2000, p. 86) demonstrate that the complexity of PC2 is bounded by n^2 (n − 1)^{k-1}/(k − 1)!, where k is the maximum number of edges a node can have.


3. The OL Scoring Function. The number of edges that need to be directed in the worst case is nk, suggesting that the scoring function needs to run an ordered logistic regression nk times.

To investigate how the time complexity of the BN-LV method varies as a function of the number of measurement items (denoted by k), the number of constructs (denoted by m), and the number of data points (denoted by n), Table D1 presents the results of six scenarios. The three rows represent three data sets used in prior research with different levels of complexity. The second column represents the original data set, and the third column represents the original data set bootstrapped to artificially generate a data set ten times larger. The results (on a computer with 2 GB RAM and a 2 GHz CPU) show that BN-LV is more sensitive to the number of measurement items than to the number of data points. In sum, BN-LV can easily run on typical data sets encountered in most empirical IS studies.

Table D1  Illustration of the Computational Cost of the BN-LV Method

Model                                                                    Original sample          10× original data points
TAM model (Pavlou 2003), k = 8, m = 3, n = 151                           <1 second (151 points)   <1 second (1,510 points)
TAM-trust model (Pavlou 2003), k = 13, m = 5, n = 151                    <1 second (151 points)   3.5 seconds (1,510 points)
Extended TPB model (Pavlou and Fygenson 2006), k = 24, m = 14, n = 266   ≈3 minutes (266 points)  ≈6 hours (2,660 points)
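As the numerical check promised in item 1 above, the harmonic term in the LVI bound grows slowly enough that treating it as a constant is harmless (plain Python, no assumptions beyond the formula itself):

    # Harmonic sum H(n) = sum_{k=1}^{n} 1/k grows only logarithmically.
    H = lambda n: sum(1 / k for k in range(1, n + 1))
    print(round(H(1_000), 2), round(H(1_000_000), 2))   # 7.49 and 14.39; H(1000) ~= 7.485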

References
Allison, P. D. 2005. Causal inference with panel data. Amer. Sociol. Association Annual Meeting, Philadelphia. http://www.allacademic.com/meta/p23194_index.html.
Aristotle. 350 B.C. Physics, Book II. Translated by R. P. Hardie, R. K. Gaye in 1994. Massachusetts Institute of Technology, Cambridge. http://classics.mit.edu/Aristotle/physics.2.ii.html.
Bagozzi, R. P. 1980. Causal Models in Marketing. Wiley, New York.
Binder, J., D. Koller, S. Russell, K. Kanazawa. 1997. Adaptive probabilistic networks with hidden variables. Machine Learn. 29 213–244.
Bollen, K. A. 1989. Structural Equations with Latent Variables. John Wiley and Sons, New York.
Borooah, V. 2002. Logit and Probit: Ordered and Multinomial Models. Sage Publications, Thousand Oaks, CA.
Breckler, S. J. 1990. Application of covariance structure modeling in psychology: Cause for concern? Psych. Bull. 107(2) 260–273.
Carte, T., C. Russell. 2003. In pursuit of moderation: Nine common errors and their solutions. MIS Quart. 27(3) 479–501.
Cartwright, N. 1995. Probabilities and experiments. J. Econometrics 67(1) 47–59.
Chickering, D. 2002. Optimal structure identification with greedy search. J. Machine Learn. Res. 3(3) 507–554.
Chickering, D., D. Heckerman. 1997. Efficient approximation for the marginal likelihood of Bayesian networks with hidden variables. Machine Learn. 29 181–212.
Chin, W. W. 1998. Issues and opinion on structural equation modeling. MIS Quart. 22(1) 7–16.
Cook, T. D., D. T. Campbell. 1979. Quasi-Experimentation: Design and Analysis for Field Settings. Rand McNally, Chicago.
Cooper, G. 1995. A Bayesian method for learning belief networks that contain hidden variables. J. Intelligent Inform. Systems 4 71–88.
Cooper, G., E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learn. 9 309–347.
Davis, F. D. 1989. Perceived usefulness, perceived ease of use and user acceptance of information technology. MIS Quart. 13(3) 319–340.
Dayton, M., G. Macready. 1988. Concomitant-variable latent-class models. J. Amer. Statist. Assoc. 83(401) 173–178.
Descartes, R. 1637. Discourse on method. Translated by J. Cottingham, R. Stoothoff, D. Murdoch, and A. Kenny (1991). The Philosophical Writings of Descartes. Cambridge University Press, Cambridge, UK.
Druzdzel, M., H. Simon. 1993. Causality in Bayesian belief networks. Proc. 9th Annual Conf. Uncertainty in Artificial Intelligence (UAI), Washington, DC, 3–11.
Elidan, G., N. Friedman. 2001. Learning the dimensionality of hidden variables. Proc. 17th Annual Conf. Uncertainty in Artificial Intelligence (UAI), Seattle, WA, 144–151.
Elidan, G., N. Lotner, N. Friedman, D. Koller. 2000. Discovering hidden variables: A structure-based approach. Adv. Neural Inform. Processing Systems 13(1) 30–37.
Fornell, C., D. Larcker. 1981. Evaluating structural equation models with unobserved variables and measurement error. J. Marketing Res. 18(1) 39–50.
Friedman, N. 1997. Learning belief networks in the presence of missing values and hidden variables. Proc. Internat. Conf. Machine Learn. (ICML 97), Nashville, TN, 125–133.
Friedman, N., M. Linial, I. Nachman, D. Pe'er. 2000. Using Bayesian networks to analyze expression data. J. Computational Biol. 7(3/4) 601–620.
Gefen, D., E. Karahanna, D. W. Straub. 2003. Trust and TAM in online shopping: An integrated model. MIS Quart. 27(1) 51–90.
Gefen, D., D. W. Straub, M.-C. Boudreau. 2000. Structural equation modeling and regression: Guidelines for research practice. Comm. Association Inform. Systems 4(7) 1–70.
Glymour, C., R. Scheines, P. Spirtes, K. Kelly. 1987. Discovering Causal Structure: Artificial Intelligence, Philosophy and Statistical Modeling. Academic Press, San Diego.
Goldstein, E. 1994. Psychology. Brooks/Cole Publishing Company, Belmont, MA.
Granger, C. 1986. Statistics and causal inferences: Comment. J. Amer. Statist. Association 81(396) 967–968.
Heckerman, D. 1996. A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, Redmond, WA. http://citeseer.ist.psu.edu/heckerman95tutorial.html.
Heckerman, D., D. Geiger, D. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learn. 20 197–243.
Heinen, T. 1996. Latent Class and Discrete Latent Trait Models. Sage Publications, New York.
Holland, P. 1986. Statistics and causal inference. J. Amer. Statist. Association 81 945–960.
Hume, D. 1738. A treatise of human nature. Clarendon Press, Oxford, UK (1996).
Hutchinson, W., W. Kamakura, J. Lynch. 2000. Unobserved heterogeneity as an alternative explanation for reversal effects in behavioral research. J. Consumer Res. 27 323–344.
Kant, I. 1781. The Critique of Pure Reason. Translated by J. M. D. Meiklejohn. 1999. Cambridge University Press, London. http://eserver.org/philosophy/kant/critique-of-pure-reason.txt.
Kenny, D., C. Judd. 1984. Estimating the nonlinear and interactive effects of latent variables. Psych. Bull. 96(1) 201–210.
Kline, R. B. 1998. Principles and Practice of Structural Equation Modeling. Guilford Press, New York.
Lee, B., A. Barua, A. B. Whinston. 1997. Discovery and representation of causal relationships in MIS research: A methodological framework. MIS Quart. 21(1) 109–136.
Mitchell, T. R., L. R. James. 2001. Building better theory: Time and the specification of when things happen. Acad. Management Rev. 26(4) 530–547.
Mithas, S., M. Krishnan. 2008. From association to causation via a potential outcomes approach. Inform. Systems Res., ePub ahead of print December 18, http://isr.journal.informs.org/cgi/content/abstract/isre.1080.0184v1.
Mithas, S., D. Almirall, M. Krishnan. 2006. Do CRM systems cause one-to-one marketing effectiveness? Statist. Sci. 21(2) 223–233.
Pavlou, P. A. 2003. Consumer acceptance of electronic commerce: Integrating trust and risk with the technology acceptance model. Internat. J. Electronic Commerce 7(3) 69–103.
Pearl, J. 1998. Graphs, causality, and structural equation models. Sociol. Methods Res. 27(2) 226–284.
Pearl, J. 2000. Causality: Models, Reasoning and Inference. Cambridge University Press, Cambridge, UK.
Pearl, J., T. Verma. 1991. A theory of inferred causation. Proc. Principles of Knowledge Representation and Reasoning 2(1) 441–452.
Pearson, K. 1897. Mathematical contributions to the theory of evolution. Proc. Roy. Soc. London 60 489–503.
Plato. 360 B.C. The Republic. Translated by Benjamin Jowett in 1893. Wikisource, New York. http://en.wikisource.org/wiki/The_Republic.
Popper, K. 1959. The Logic of Scientific Discovery. Basic Books, New York.
Rubin, D., R. Waterman. 2006. Estimating the causal effects of marketing interventions using propensity score methodology. Statist. Sci. 21(2) 206–222.
Sarkar, S., R. Sriram. 2001. Bayesian models for early warnings of bank failures. Management Sci. 47(10) 1457–1475.
Shugan, S. M. 2007. Causality, unintended consequences and deducing shared causes. Marketing Sci. 26(6) 731–741.
Silva, R., R. Scheines, C. Glymour, P. Spirtes. 2006. Learning the structure of linear latent variable models. J. Machine Learn. Res. 7 191–246.
Skrondal, A., S. Rabe-Hesketh. 2004. Generalized Latent Variable Modeling. Chapman & Hall, London.
Spinoza, B. 1662. On the improvement of the understanding. Translated by R. H. M. Elwes. In The Chief Works of Benedict de Spinoza. 1883. G. Bell & Sons, London.
Spirtes, P., C. Glymour, R. Scheines. 2000. Causation, Prediction, and Search. MIT Press, Cambridge, MA.
Spirtes, P., C. Glymour, R. Scheines. 2002. Data mining tasks and methods: Probabilistic and causal networks: Mining for probabilistic networks. Handbook of Data Mining and Knowledge Discovery. Oxford University Press, New York.
Spirtes, P., T. Richardson, C. Meek, R. Scheines, C. Glymour. 1998. Using path diagrams as a structural equation modeling tool. Sociol. Methods Res. 27(2) 182–225.
Suppes, P. 1970. A Probabilistic Theory of Causality. North Holland Publishing Company, Amsterdam.
Torgerson, W. S. 1958. Theory and Methods of Scaling. Wiley, New York.
Verma, T., J. Pearl. 1992. An algorithm for deciding if a set of observed independencies has a causal explanation. Proc. 8th Annual Conf. Uncertainty in Artificial Intelligence (UAI), Stanford University, Palo Alto, CA, 323–330.
