how validity travelled to economic experimenting

This article was downloaded by: [UZH Hauptbibliothek / Zentralbibliothek Zürich] On: 09 July 2014, At: 01:50 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of Economic Methodology Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rjec20 How validity travelled to economic experimenting Floris Heukelom a a Department of Economics, Institute for Management Research , Radboud University , Nijmegen, The Netherlands Published online: 13 May 2011. To cite this article: Floris Heukelom (2011) How validity travelled to economic experimenting, Journal of Economic Methodology, 18:01, 13-28, DOI: 10.1080/1350178X.2011.556435 To link to this article: http://dx.doi.org/10.1080/1350178X.2011.556435 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

How validity travelled to economic experimenting

Floris Heukelom*

Department of Economics, Institute for Management Research, Radboud University, Nijmegen, The Netherlands

Validity was first given a more specifically scientific meaning by psychologists in the early twentieth century in the contexts of psychological tests. Following the classification of different validity-types in the American Psychological Association’s Technical Recommendations (1954), validity travelled from psychological tests to psychological experiments through the work of Donald Campbell. Thus the idea was introduced that also experiments could be more or less valid. In addition, a distinction was made between the internal and the external validity of an experiment. Of the two domains in which validity was employed in psychology, only the internal and external validity of experimental methodology travelled to economics. The initial implementation of validity in economic experimentation was reluctant, and focused upon showing how external validity in particular was not problematic in economic experiments. However, the gradual adoption of validity by economists working with experiments eventually led to the clear, analytical definition of internal validity by Francesco Guala, a definition that was subsequently taken over by other economists. In its travels from psychological tests to psychological experiments to economic experiments the concept of validity generally retained its meaning as the accuracy of a scientific procedure. At the same time, however, it was put to use in dissimilar ways and elicited different discussions in the scientific realms in which it was applied.

Keywords: validity; tests; experiments

1 Introduction

In psychology and economics, validity denotes the accuracy of a scientific operation such

as measurement or experimentation. In other words, validity asks how well the scientific

operation achieves what it aims to achieve. The concept of validity traveled at least twice

across scientific boundaries. First, it traveled from the area of psychological tests to

psychological experiments, and then, second, it traveled from the area of psychological

experiments to experiments in the other social and behavioral sciences. As such, validity

provides an illustrative example of how concepts and other scientific tools travel from one

scientific domain to another (e.g. Galison and Stump 1996; Galison 1999; Gieryn 1999;

Morgan and Howlett 2010).

The reason that validity traveled from one area to another was that scientists in the area

of destination deemed the meaning and usage of the concept useful. However, as an

inevitable by-product of scientific traveling, the meaning and usage of validity also

changed with each voyage. The purpose of the present article is to document the two

voyages of validity. Thus, it adds to the literature of traveling scientific tools. But, in

ISSN 1350-178X print/ISSN 1469-9427 online

© 2011 Taylor & Francis

DOI: 10.1080/1350178X.2011.556435

http://www.informaworld.com

*Email: [email protected]

Journal of Economic Methodology,

Vol. 18, No. 1, March 2011, 13–28

addition, by showing where validity originates and how it was applied in different ways

each time it crossed a scientific border, this article aims to contribute to current discussions

of validity in economic and related experimenting (e.g. Calder, Phillips, and Tybout 1982;

Hammersley 1991; Guala 1999, 2003, 2005; Loewenstein 1999; Vissers, Heyne, Peters,

and Guerts 2001; Lucas 2003; Harrison and List 2004; Bardsley 2005; Guala and Mittone

2005; Samuelson 2005; Schram 2005; Thye 2007; Webster and Sell 2007; Willer and

Walker 2007; Zelditch 2007; Bardsley, Cubitt, Loomes, Moffatt, Starmer, and Sugden

2009; Jimenez-Buedo and Miller forthcoming).

The upcoming section reveals the origin of validity in the methodology of

psychological tests, in which validity functioned as a counterpart to reliability. When

validity subsequently traveled from the area of tests to that of psychological experiments,

it roughly retained its meaning and provided experimental methodology with a new

category with which to think about experiments. At the same time, however, it

gave rise to a new series of questions regarding what exactly experiments are about. The

third section addresses how validity traveled from psychological experimentation to the

area of economic experimentation. In economics, validity was initially reluctantly adopted

and received little substantial discussion as compared with its discussion in psychological

experimentation. But when it did become a subject of concern, it received an analytical

treatment and definition that was stricter than it had received in psychology.

2 Validity in psychology

The concept of validity had been commonly used by American psychologists from at least

the 1870s onwards (e.g. McCrea and Pritchard 1897; Colvin 1900), but it was in the

psychology of tests that it received a more specific meaning from the 1920s onwards as the

accuracy of a test (Swanborn 1981). By contrast, validity entered the experimental

psychological vocabulary only in the late 1950s. In Edwin Boring’s classic A History of

Experimental Psychology (1929), the issue of validity was completely absent. Also the

equally voluminous and complete overview Experimental Psychology (1938) by Robert

Woodworth and Harold Schlosberg contained no reference to validity. Moreover, validity

remained absent in the 1950 second edition of Boring’s history, despite the fact that the

‘treatment of the later period [was] greatly expanded’ (Boring 1950, p. vii). The same is

true for Woodworth and Schlosberg’s revised edition that appeared in 1954. Even

A Manual of Psychological Experiments (1937), edited by Boring and with contributions

from some 20 high-standing experimental psychologists of the time, in its many examples

of the practicalities of conducting experiments, did not spend a single word on validity.

That may seem stranger than it is, as a test and an experiment really are two different

things. A test is an assessment by means of a questionnaire, interview, or other method to

assess the functioning of a human being compared with a standard or average – the IQ test is

the historically first and paradigmatic example (Danziger 1997). A test, in other words, is an

instrument, just as the thermometer is an instrument. For any instrument or test at least two

characteristics are crucial: its precision and its accuracy (e.g. Chang 2004). The accuracy of

the thermometer refers to how closely the thermometer follows the actual, true value of what

it is supposed to measure: temperature. Precision refers to the degree of closeness of repeated

measurements of the same true value. The same distinction came to be used for psychological

tests, in which accuracy was referred to as validity, and precision became known as reliability.
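
The difference between the two characteristics can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the readings, the true value, and the function names `accuracy_error` and `precision_spread` are hypothetical and do not come from the sources cited here. Accuracy (validity) is taken as the distance between the average reading and the true value, and precision (reliability) as the spread of repeated readings.

```python
from statistics import mean, stdev


def accuracy_error(readings, true_value):
    """Distance between the mean reading and the true value.

    A smaller value means a more accurate (more valid) instrument.
    """
    return abs(mean(readings) - true_value)


def precision_spread(readings):
    """Sample standard deviation of repeated readings of the same value.

    A smaller value means a more precise (more reliable) instrument.
    """
    return stdev(readings)


# Hypothetical true value that both instruments are supposed to measure.
TRUE_TEMPERATURE = 20.0

# Hypothetical instrument A: readings cluster tightly around the wrong value.
precise_but_inaccurate = [22.9, 23.0, 23.1, 23.0, 23.0]

# Hypothetical instrument B: readings scatter widely around the true value.
accurate_but_imprecise = [17.0, 23.0, 19.0, 21.0, 20.0]

for label, readings in [
    ("A (precise, inaccurate)", precise_but_inaccurate),
    ("B (accurate, imprecise)", accurate_but_imprecise),
]:
    print(
        label,
        "accuracy error:", round(accuracy_error(readings, TRUE_TEMPERATURE), 3),
        "spread:", round(precision_spread(readings), 3),
    )
```

On these made-up numbers, instrument A is the more reliable of the two and instrument B the more valid, which is why the two characteristics have to be assessed separately.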

An example can be found in Chicago-based psychometrician Louis Leon Thurstone’s

The Reliability and Validity of Tests: derivation and interpretation of fundamental

formulae concerned with reliability and validity of tests and illustrative problems (1931).

As the title indicates, reliability and validity were the two guiding concepts in this 130-

page student manual. Thurstone reminded the reader that ‘validity refers to the correlation

between a test and its criterion’ (Thurstone 1931, p. 97). In other words, validity to

Thurstone referred to how accurately the test measured the actual, true value (the criterion) it

wanted to measure. The example that Thurstone used was a test of students’ previous

examination grades as a predictor for future scholarship grades. If this test performed well,

in the sense that previous examination grades were a good predictor of future scholarship

grades, the test had a high validity; if it did not, the test had a low validity.1
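
Thurstone’s definition of validity as the correlation between a test and its criterion can be sketched numerically. The sketch below assumes that the correlation is the ordinary Pearson coefficient, and all grades in it are invented for illustration; neither the data nor the function name `pearson_correlation` is taken from Thurstone (1931).

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical test scores: previous examination grades of six students.
previous_exam_grades = [55, 62, 70, 78, 85, 91]

# Hypothetical criterion in which later scholarship grades track the test closely.
scholarship_grades_tracking = [58, 60, 73, 75, 88, 90]

# Hypothetical criterion in which later scholarship grades are nearly unrelated.
scholarship_grades_unrelated = [80, 58, 90, 60, 75, 70]

high_validity = pearson_correlation(previous_exam_grades, scholarship_grades_tracking)
low_validity = pearson_correlation(previous_exam_grades, scholarship_grades_unrelated)

print("validity when the test predicts the criterion:", round(high_validity, 2))
print("validity when it does not:", round(low_validity, 2))
```

In the first hypothetical case the coefficient is close to one, so the test would count as highly valid in this sense; in the second it is close to zero, so it would not.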

The ever-recurring issue in empirical science is how to disentangle reliability from

validity, and in particular how to determine and demonstrate conclusively that your

instrument is valid (Porter 1994; Power 1996). The problem is that you cannot. But what

you can do is try to move in the right direction. To that end, psychologists in the 1930s and

1940s started to break down validity into different sub-categories, so as to be able to more

clearly assess the different aspects of validity separately. Sub-categories proposed

included content validity, construct validity, and ecological validity (Swanborn 1981). The

subsequent rapid expansion of validity concepts convinced the members of the American

Psychological Association (APA) in the early 1950s that some definitions and standards

for conducting tests were badly needed (Vissers et al. 2001, p. 131).

The APA was created in 1892 and aimed at representing all American professional

psychologists (www.apa.org, accessed 7 May 2009). In this capacity, it actively sought to

establish standards regarding various methodological aspects of psychological research.

Well known are its standards for scientific publication, first published in 1952, with

precursors going back to 1929 (Kadzin 2001, p. xix). Less successful examples include the

Committee on Measurement (1909–1919). In 1950, the then president of the APA, Joy

Paul Guilford, a psychometrician who worked in the tradition of Thurstone, appointed a

Committee on Test Standards to prepare a ‘statement on technical standards for evaluating

tests and the contents of test manuals’ (Street 1994, p. 152), which would be ‘an official

statement of the profession’ (APA Committee on Test Standards 1954, p. 461). The

committee, composed of Edward S. Bordin, R.C. Challman, H.S. Conrad, Lloyd

G. Humphreys, Paul E. Meehl, Donald E. Super, and its chairman Lee J. Cronbach, was

created to set the standard for performing tests for years to come. It conducted its

discussion in close cooperation with representatives of the American Educational

Research Association (AERA) and the National Council on Measurements Used in

Education (NCMUE). A preliminary version of the Technical Recommendations for

Psychological Tests and Diagnostic Techniques, based in part on Cronbach and Meehl

(1955), was published for examination by the psychological community in American

Psychologist (1952, pp. 461–475). The final version was published both as a separate

manual and as a special issue of American Psychologist (1954, pp. 201–238). About a

third of this 40-page manual was devoted to validity.

The Technical Recommendations distinguished four types of validity: content validity,

predictive validity, concurrent validity, and construct validity. Content validity was

described as a claim regarding whether observations from a draw of the population justify

claims about the population as a whole. Predictive validity was understood as an

assessment of how well predictions based on current observations accurately reflect future

observations. Concurrent validity demanded that a new test behaves similarly to existing

tests that measure the same phenomenon. In other words, concurrent validity demanded

that for instance a new IQ test had a high correlation with an already existing and accepted

IQ test. Construct validity, finally, was an assessment of the overlap between the

scientific operationalization of a higher-order term (intelligence, academic performance,

unemployment, and so on) and the meaning of this higher-order term in everyday

language.

In the same period as the publication of Technical Recommendations, but without any

apparent direct connection, the two traditions of experimental psychology and

correlational psychology integrated, partly as a result of the efforts of Thurstone

(Gigerenzer 1987; Danziger 1997). As a result, a test became an instrument that could be

employed in an experiment. One could for instance employ a test of subjects’ intelligence

in an experiment investigating the relation between intelligence and learning (Festinger

and Katz 1953). However, one could also still conduct a test without an experiment (say

measuring a child’s IQ), just as one could do an experiment without the instrument of a test

(say judging relative brightness of lamp bulbs).

Perhaps because of the increasing integration of correlational and experimental

psychology, the Technical Recommendations’ presentation of validity was a mixed

success. Almost immediately following its publication, the Technical Recommendations’

classification of validity became contested – in particular by a number of researchers who

applied it to experiments instead of tests. However, the Technical Recommendations did

succeed in deeply ingraining validity as the banner under which to discuss the

methodology of tests and experiments in psychology. Moreover, through the export of

psychological experimentation to the other social sciences in the post-war period, it

unintentionally succeeded in ingraining the concept of validity in the social sciences

generally.

Principal among the Technical Recommendations’ contesters was Donald Campbell.

In ‘Factors Relevant to the Validity of Experiments in Social Settings’ (1957), Campbell

investigated the issue of validity for the case of experiments and proposed a distinction

between internal and external validity. Campbell explicitly built on previous

methodological discussions of psychological experimenting, and illustrated his approach

by means of well-known experimental examples. Yet, the distinction between internal and

external validity is Campbell’s, without any directly related previous expositions other

than those mentioned above.

Validity will be evaluated in terms of two major criteria. First, and as a basic minimum, is what can be called internal validity: did in fact the experimental stimulus make some significant difference in this specific instance? The second criterion is that of external validity, representativeness, or generalizability: to what populations, settings, and variables can this effect be generalized. (Campbell 1957, p. 297, emphasis in the original)

In other words, internal validity referred to whether the experiment itself had been

conducted properly and whether it had yielded significant results, according to the received

rules of experimenting and statistics. External validity referred to the generalizability of the

experimental results. Although Campbell noted that internal and external validity might

sometimes be incompatible when controls for internal validity would jeopardize external

validity, they were not presented as opposites. A bad experiment could be inferior in both

the internal and external validity domain, and a good test or experiment would score high

on both internal and external validity.

With his introduction of internal and external validity to experimental methodology,

Campbell transplanted the notion of validity from the realm of tests to that of experiments.

In other words, Campbell applied a key notion in measurement theory – validity, or

accuracy – to the theory of experimentation in the behavioral and social sciences. In still

other words, Campbell introduced the idea that also experiments could be more or less

accurate. It is a matter of personal opinion whether it is meaningful to talk about the

accuracy of experiments in this way, which has been the subject of continuous debate (e.g.

Zelditch 2007; Jimenez-Buedo and Miller forthcoming). One peculiarity to be noted here

is that although Campbell applied the notion of validity to experiments, he did not

transplant validity’s counterpart reliability, or precision. Thus, to Campbell and to all

those who picked up his internal–external validity distinction it was meaningful to talk

about experiments as being more or less accurate, but it was not meaningful to talk about

experiments as being more or less precise.

It is to be emphasized that Campbell’s application of validity to experiments does not

mean that he failed to distinguish between tests and experiments. On the contrary,

Campbell and Donald Fiske’s ‘Convergent and Discriminant Validation by the Multitrait-

Multimethod Matrix’ (1959), for instance, widely cited as the start of the double-blind

research method (Brewer 2001), argued that underneath the ‘major types of validity’ of

tests as presented in the Technical Recommendations lay concerns for convergence and

discrimination. It completely abstained from the internal–external distinction. Validation

is convergent, the authors argued, when a particular test confirms observations obtained

from other tests. For the validity of a new test, however, the observations must in addition

be shown to discriminate more, or to discriminate differently, as compared with

previous tests.

The distinction between internal and external validity returned in Campbell and

Stanley (1963), ‘Experimental and Quasi-Experimental Designs for Research on

Teaching’. In this lengthy book chapter, external validity retained the meaning it was

given in Campbell (1957). The meaning of internal validity, however, was slightly

sharpened: ‘Internal validity is the basic minimum without which any experiment is

uninterpretable: Did in fact the experimental treatments make a difference in this specific

context?’ (Campbell and Stanley 1963, p. 175, emphasis in the original). Subsequently,

internal validity was divided into ‘eight different classes of extraneous variable’, which

needed to be controlled in the experimental design. These eight classes had no direct origin

in other methodological discussions in psychology and like internal and external validity

were derived directly from Campbell’s own observations of experimental practice in

psychology. The eight classes included History: the influence of events between

experimental treatments; Maturation: the influence of the passage of time on subjects’

behavior; Testing: the effects of taking a test upon scores of a second testing; Selection:

biases resulting from the method of selecting the subjects; and Experimental mortality: the

loss of responses during the experiment. Note again that all potential effects here are about

the experiment, including the Testing effect, which is not a worry about the validity of the

measurement instrument itself but an effect of the instrument on the experimental subject.

Subsequently, external validity was divided into four classes, including the Reactive or

Interaction effect: the influence of previous tests or other experiments on subjects’

behavior; and Reactive effects of experimental arrangements: effects of the experimental

arrangements on the behavior of subjects, later also known as the problem of artificiality.

The distinction between reliability and validity, the classification and definition of

validity in the Technical Recommendations, and Campbell’s distinction between internal

and external validity defined the experimental vocabulary in psychology for subsequent

decades, up into the twenty-first century. In general, it seems, the different domains have

been kept separate: validity versus reliability and the different subcategories of validity

under the rubric of tests and measurement, and internal and external validity and their

different sub-categories under the rubric of experimentation (e.g. Kruglanski 1975;

Henshel 1980; Berkovitz and Donnerstein 1982; Swanborn 1994; Smith and Davis 1997).

An example is the 2000 eight-volume edition of the APA’s Encyclopedia of Psychology

(Kadzin 2000). The Encyclopedia contained two entries for validity, ‘Validity’, and

‘Construct Validity’, signaling a predominant position of construct validity among the

different categories. In both the entries, validity was understood as a description of ‘the

extent to which a psychological instrument measures what it is supposed to measure’

(Krueger and Kling 2000, p. 149). In addition to construct validity, validity was then

divided into a number of categories and sub-categories. Only at the end did the validity

entry briefly mention the applicability of the validity concept to experiments, citing

internal and external validity as its main categories.

Sometimes, however, the different validities have been related to one another. For

instance, in the authoritative handbook Quasi-Experimentation: Design & Analysis Issues

for Field Settings (1979), Campbell together with Thomas Cook started by broadly

distinguishing internal and external validity.2 Under the heading of internal validity, the

authors discussed what they considered to be a main threat to internal validity in the case

of quasi-experiments: statistical conclusion validity, which asks whether or not statistical

inference of covariance between variables is justified. They considered the main factor of

external validity, on the other hand, to be construct validity. Thus, the internal and external

validity of experiments functioned as the general terms under which the measurement

types of validity were also discussed.

To sum up, validity originates in psychology as a synonym of the accuracy of a

measurement instrument – the measurement instrument in psychology often being a test.

Validity or accuracy of a test was distinguished from the reliability, or precision of the test.

In 1954, the APA’s Technical Recommendations standardized the different sub-categories

of validity. A few years later, Campbell applied validity, but not reliability, to experiments,

and chose his own sub-categories in the form of internal and external validity. Subsequently,

test, experiment, and their associated categories of validity were kept largely separate in

psychology, despite occasional attempts to specify a relationship between them.

3 The dissemination of validity into economics

Between the publication of Thurstone’s textbook on reliability and validity

(1931) and the APA’s Technical Recommendations (1954), the distinction between

reliability/precision and validity/accuracy as assessments of measurement instruments

was only occasionally, and then only briefly, mentioned by economists. Samuelson’s

Foundations of Economic Analysis (1947), for instance, mentioned a difference between

the validity and precision of a hypothesis (p. 12), without, however, elaborating on the

matter. By contrast, methodological concerns of measurement formed an integral part of

econometrics (Stewart and Wallis 1981; Morgan 1990; Peracchi 2001; Boumans 2005). But

these discussions stemmed from measurement theory in natural sciences, and not from

psychology. Moreover, there is no indication that measurement discussions in econometrics

influenced the methodological perspective of economic experimenting.

However, prior to the rise of experimental economics in the 1960s and 1970s, there

was some discussion surrounding the methodology of experiments in economics.3 Wallis

and Friedman (1942), for instance, criticized Thurstone’s one-time attempt to infer

indifference curves from an experiment with a subject who had to choose between shoes,

coats, and hats (Thurstone 1931; Moscati 2007). Wallis and Friedman argued that the

degree of control in Thurstone’s experiment had been too low to allow for any serious

inferences regarding indifference curves. For instance, they remarked that ‘[i]t is

questionable whether a subject in so artificial an experimental situation could know what

choices he would make in an economic situation’ (Wallis and Friedman 1942, p. 179).

Anachronistically, this could be understood as a problem of external validity, in the sense

that the results obtained in the laboratory environment could not be translated to behavior

in the economy outside the laboratory. It should be added, however, that Wallis and

Friedman did not relate their critique of Thurstone (1931) to methodological discussions

on measurement and experiments generally. Instead, they interpreted their objections in

terms of what they considered to be a fundamental difference between economics and

psychology. The methods of the latter, they argued, could simply not be applied to the

former. Moreover, Wallis and Friedman put forth similar practical objections against the

construction of indifference curves on the basis of statistical data of actual economic

behavior, arguing for instance that individuals’ preferences could change between two

observed preference expressions (pp. 183–186). But despite a few loose experiments and

related methodological discussions of Wallis and Friedman (1942) and some others

(Moscati 2007), expositions about the methodology of economic experimenting only

seriously commenced with the rise of experimental and later behavioral economics.4

3.1 Validity in experimental economics

It is well known that Vernon Smith, the founding father of experimental economics, in the

1950s developed his experimental method for economics in an atmosphere of

interdisciplinary research (Weintraub 1992; Lee 2004; Dimand 2005). In particular, the

influence of social psychologist Sidney Siegel has been recognized as vital for the

development of Smith’s application of the experimental method of the psychologists to

economic research questions (Smith 1992; Innocenti 2008; Lee 2008). However, partly

because of Siegel’s death in 1961, Smith was alone in conducting his economic

experiments until the mid-1970s, when he was joined by a younger generation of

economists, including Charles Plott, John Kagel, and others.

Discussions of the proper method of conducting experiments in economics are a

recurring theme in Smith’s writings of the 1960s, and in the experimental economics

literature more generally from the mid-1970s onwards. Yet, these discussions were not put

in terms of the internal–external validity framework that was developing in psychology

around the same time. On the contrary, in the single methodological article Smith wrote in

the 1960s, the issue of inferences based on experiments is put a few times in terms of

validity, without however any reference to the psychologists, and despite the fact that in

the same article Smith explicitly cites some social psychologists as important for

developing the experimental method for economics. Instead, Smith took a position which

is best described as a combination of set-theory and Bayesian statistics, and which derived

its inspiration mainly from Savage (1954). In this approach, the overlap of the sets Nature,

Model, and Experimental Results ‘constitutes the “validity” of the experiment’ (Rice and

Smith 1964, p. 240), the statistics of which are an input for a Bayesian updating process

calculating the scientist’s post-experiment beliefs.

Smith’s well-known methodological article of the 1970s, ‘Experimental

Economics: Induced Value Theory’ (Smith 1976), likewise made no reference to the validity

framework of the psychologists. Smith talked about experiments as testing the ‘validity’ of

economic theories (Smith 1976, p. 274), invoked the concept of parallelism [‘As far as we

can tell, the same physical laws prevail everywhere’ (Shapley 1964; quoted in Smith 1976,

p. 274)], and advanced control as ‘the essence of experimental methodology’ (Smith 1976,

p. 275). But validity as understood by the psychologists did not appear even in a footnote.

The reason that Smith and other experimental economists in the 1970s such as Plott

were reluctant to adopt validity as understood by the psychologists was that the

psychologists’ way of dealing with validity risked creating a division between an inside

world of the laboratory and an outside ‘real’ world of the economy and its actors. Such an

interpretation, they argued, would be directly against experimental economists’

conception of experiments in economics because,

The relevance of experimental methods rests on the proposition that laboratory markets are ‘real’ markets in the sense that principles of economics apply there as well as elsewhere. Real people pursue real profits within the context of real rules. The simplicity of laboratory markets in comparison with naturally occurring markets must not be confused with questions about their reality as markets. (Plott 1982, p. 1520)

For this reason, Plott (1982) was equally dismayed by the term ‘artificial’ as a

description of the laboratory environment. Plott argued that a concern with the artificiality

of experiments is a straw man, and that in any case it would be an argument directed

at experimentation in general, not just at experiments in economics.5 Finally, even the

leading 560-page textbook in experimental economics, Davis and Holt’s Experimental

Economics (1993), was entirely devoid of references to the psychologists’ validity.

Yet, in the late 1980s and early 1990s the psychological categories nevertheless slowly

started to creep into the methodological discussion of the experimental economists. The

first article to invoke one of the psychologists’ notions of validity was Brookshire,

Coursey, and Schulze (1987), ‘The External Validity of Experimental Economics

Techniques: Analysis of Demand Behavior’. The authors focused on the now well-known

twin allegations that experimental economic results do not translate to ‘real world

settings’, and on how to compare the behavior of student-subjects with the behavior of ‘“actual”

buyers, sellers, or traders’ (p. 289). In contrast to earlier replies to these allegations, as

indicated above, the authors summarized them as focusing upon ‘the external validity, or

lack of validity, associated with experimental economic techniques’ (p. 289, emphasis in

the original), which was then equated with the issue of the parallelism between the two

settings, as defined by Smith and others. Subsequently, the authors set out to invalidate

these allegations on empirical grounds. That is, they compared subjects’ behavior in

experimental markets with agents in ‘real’ markets and concluded that the observed

behavior in the two settings was similar enough to infer that the experimental method in

economics did not pose a problem of parallelism or external validity.

It is first of all important to note that the authors adopted external validity without its

counterpart internal validity, thus defying the idea that the two always come together.

Second, the issue of external validity thus was first introduced in experimental economics

not as a fundamental philosophical problem of experimentation, but as a purely empirical

question. This continues to be the understanding of (external) validity of many

experimental economists up to this day. For instance, in an email to the author, Smith

emphasized that he considered validity, or parallelism, ‘100% an empirical issue’, an issue

that ‘needs to be addressed as such if wheel spinning is to be avoided’ (email Smith to

author, 1 July 2009).

Following this first experimental economic article employing external validity,

internal validity was also introduced.6 For instance, in Friedman and Sunder’s Experimental

Methods: A Primer for Economists (1994), economists interested in conducting experiments

were occasionally warned that such-and-such a method may undermine or threaten the

internal validity of the experiment, or that some other procedure risked weakening the

external validity of the experiment, in which external validity continued to be equated with

parallelism. They referred to the ‘older terminology (from psychology, biology and other

disciplines)’ of internal and external validity ‘to make explicit the connection to the

literature on philosophy of science’ (email Friedman to author, 2 July 2009). As a result of

these and other references, internal validity and external validity gradually became

household concepts of experimental methodology in economics.

Nevertheless, many experimental economists continued to have some reservations

with regard to the new methodological concepts. Vernon Smith tried to play down the

importance of the two validities, emphasizing, as said, that it was merely an empirical

question that could be tested, and continued to equate validity with parallelism. In an email

to the author Dan Friedman noted that he has never been ‘especially enthusiastic about

[ . . . ] ‘internal’ and ‘external’ validity’ (email Friedman to author, 2 July 2009). Similar

remarks have been made by other well-known experimentalists (email Glenn Harrison to

author, 2 July 2009, email John List to author, 3 July 2009).

The question thus arises why experimental economists sometimes adopted the internal

and external validity concepts of the psychologists if they did not consider them to contribute

to the assessment of experiments in economics. The most plausible answer, as illustrated

by Brookshire, Coursey, and Schulze’s article discussed above, is that the experimental

economists of the late 1980s and early 1990s felt compelled to use internal and external

validity in order to establish experimental economics as a viable economic sub-discipline.

As experimental economics was growing, but its position was anything but secure, external

validity and later internal validity were introduced to convince a skeptical audience of

economists who did not believe in the ‘reality’ of experiments (Starmer 1999, email Chris

Starmer to author, 1 July 2009, email Friedman to author, 2 July 2009). It is hence

somewhat ironic that the use of internal and external validity subsequently contributed to

ingraining in economics the conception of an inside world of the laboratory versus an

outside world of reality.

3.2 Behavioral economics

The introduction of external and internal validity by experimental economists in the late

1980s and early 1990s coincided with the emergence of behavioral economics, the other

economic sub-discipline that employs experiments. Despite the fact that during the 1990s

and early 2000s behavioral economics and experimental economics came to oppose one

another on a number of issues (Heukelom 2011), behavioral economists adopted many of

the experimental techniques and language developed by the experimental economists,

including an aversion toward the use of deception, the use of monetary rewards, and the

use of internal and external validity.7 However, behavioral economists dropped the link

between external validity and parallelism and preferred to refer to internal and external

validity as a ‘psychological distinction’, in the sense of ‘from the psychological literature’

(e.g. Camerer 1996; Loewenstein 1999; Camerer and Loewenstein 2004; Samuelson

2005). As in experimental economics, validity was not extensively discussed in behavioral

economics, as compared with the extensive discussions in psychology. When it was

discussed, however, it was framed in terms of internal validity and external validity, in

which internal validity, roughly and without much discussion, referred to the validity of the

inferences drawn on the basis of the experimental observations, and external validity

referred to the generalizability of the observations and inferences.

In a wonderful historical twist, Loewenstein (1999), the most extensive and explicit

behavioral economics discussion of validity in the 1990s, used the ‘psychological

distinction’ of internal and external validity to criticize the experimental practice of

experimental economists. In ‘Experimental Economics from the Vantage-Point of

Behavioural Economics’, Loewenstein (1999) positioned behavioral economics explicitly

in opposition to experimental economics. Under the heading of external validity,

Loewenstein saw four problems with experimental economics. First, experimental

economics put great emphasis on the use of auctions in its experiments. As people in

reality hardly ever find themselves in an auction situation, it is doubtful that these

experiments can tell us very much about economic behavior in the real world. Second,

Loewenstein disagreed with experimental economists’ use of repetition in what he called

the Ground Hog Day argument, following Camerer (1996). In reality, Loewenstein

argued, people never make the exact same decision 40 times in a row. Real-world behavior

is much more like the first few rounds of an experiment than like the last two or three

rounds.

Third, Loewenstein criticized experimental economists for their tendency to reduce

real-world content to the absolute minimum possible. Apart from the fact that a context-

free experiment is an illusion, Loewenstein argued that it also greatly reduces the

external validity of the experiments. Instead, economists should, as Loewenstein

himself did, make the experimental situation as congruent with reality as possible, that is,

make the experiment ‘context-rich’. Fourth, according to Loewenstein, experimental

economists wrongly assumed that monetary rewards result in strict control over

incentives. Even with monetary incentives, subjects are likely to be driven by motives

other than profit maximization, he argued. Finally, one problem concerning internal

validity that Loewenstein observed was that experimental economists had been far too

careless in not using randomization and in comparing the experimental results that had

been obtained under different circumstances. Loewenstein’s discussion illustrates that

by the late 1990s internal and external validity had emerged as household concepts for

experimental methodology in both experimental and behavioral economics. They were

not given extensive treatment or definition, but had emerged as two contrasting

concepts in terms of which experimentalists in economics could think about their

experiments.

3.3 The analytical definition of internal and external validity in economics

Thus, in the 1980s and 1990s external validity and internal validity were loosely introduced

in the methodology of experiments in economics. Although the economists who conducted

and discussed experiments seemed to be comfortable with the meaning of both terms and with the

relationship between the two, internal and external validity were never given extensive

discussion or definition. This changed in the late 1990s and early 2000s when Francesco

Guala made validity the key methodological concern of economic experiments (Guala

1999, 2003, 2005; Guala and Mittone 2005).

Guala first of all drew a sharp line between an inside world of the experiment and an

outside, real world, thus implicitly arguing against the position of the experimental

economists of the 1960s–1980s. Experimentalists draw ‘inferences within the

experiment’, Guala argued, in which experiments are to be understood as ‘very special

settings, which are rarely if ever instantiated in the “real world” outside the laboratory’

(Guala 2005, p. 141, emphasis in the original). Internal validity, according to Guala, was

about the validity of the inferences in the inside world of the laboratory, whereas external

validity was about the relation of these inferences to the real outside world. Internal

validity ‘is achieved when some particular aspect of a laboratory system [ . . . ] has been

properly understood by the experimenter’ (Guala 2005, p. 142). Experiment E is internally

valid when variation in Y is known to be caused by X. It is in addition externally valid if

‘X causes Y not only in E, but also in F, G, H, etc’ (Guala 2005, p. 142). Second, Guala

posited a trade-off between internal and external validity:

The stronger an experimental design is with respect to one validity issue, the weaker it is likely to be with respect to the other. The more artificial the environment, the better for internal validity; the less artificial, the better for external purposes. (Guala 2005, p. 144)

Guala defined a trade-off as ‘likely’ in the first sentence, although the second sentence

is less ambiguous. What is important, however, is that the trade-off has been understood as

strict and well defined by other experimentalists who have adopted the internal–external

validity distinction (e.g. Samuelson 2005; Schram 2005; Bardsley et al. 2009). For

instance, Schram’s (2005) ‘Artificiality: The Tension Between Internal and External

Validity in Economic Experiments’ constructed a framework similar to Guala’s, and furthermore

relied for its terminology on Loewenstein (1999). Schram interpreted Guala to be

positing a strict trade-off between internal and external validity, and retraced it directly to

psychology.

There is an obvious tension between [internal and external validity]. Where internal validity often requires abstraction and simplification to make the research more tractable, these concessions are made at the cost of decreasing external validity. Loewenstein (1999) points out that while this tension is a starting point in learning research methods in psychology, the discussion is often completely neglected by economists. (Schram 2005, p. 226)

In addition, Schram, like Guala, invoked a distinction between the laboratory world of

the experiment and the real world outside the experiment. A key issue in economic

methodology of experimentation, according to Schram, was the issue of the ‘artificiality of

the laboratory situation’, in which ‘the [artificiality] question is whether the stylized form

of experimental institutions allows for conclusions pertaining to the “real world”’ (Schram

2005, p. 226). As said, in the experimental economic literature the outside versus inside

dichotomy is often linked to this notion of ‘artificiality’ (e.g. Starmer 1999; Bardsley 2005;

Schram 2005), in which artificiality refers to the alleged artificial world within the

laboratory that is different from, and therefore perhaps not comparable to, the ‘real’, or ‘natural’

world outside the laboratory. In other words, the problem of external validity is understood

as a consequence of the problem of artificiality.8

With the analytical treatment by Guala and others, internal and external validity came

to the center of a logic of experimentation in economics. Where experimental economists

of the 1980s and early 1990s such as Vernon Smith conceived of validity as a purely

empirical question of the comparability between experimental outcomes in different

settings, in the 2000s validity became the way to logically connect the experimental setup

to the experimental results, and to logically connect the experimental results derived in the

‘inside’ world of the experiment to the ‘outside’ world of reality. At the end of the first

decade of the twenty-first century, with the distinction between experimental and

behavioral economics gradually dissolving and with other experimental and empirical

procedures emerging, the analytical take on validity provides a prominent way of thinking

about experimentation in economics.

Finally, the treatment of internal and external validity in economics in recent years has

provided an argument in the distinction between laboratory experiments and field

experiments (e.g. Harrison and List 2004; Carpenter, Harrison, and List 2005; others). The

reasoning is that laboratory experiments allow for a relatively large amount of control, and

therefore provide a high degree of internal validity. The cost of this high internal validity,

however, is that laboratory experiments yield relatively low external validity.

Experimental practice in economics therefore leaves room for a method that offsets

the low external validity of laboratory experiments. This is where field experiments come

in, which lack the high internal validity of laboratory experiments, but in return offer a

high degree of external validity. That said, it should immediately be added that the recent

literature also seeks ways to go beyond internal and external validity for the case of field

experiments (e.g. Harrison and List 2004; Harrison 2005; Levitt and List 2007).

4 Conclusion

The history of validity in psychology and economics adds a new case-study to the

literature that investigates how scientific tools travel across scientific boundaries (e.g.

Galison and Stump 1996; Galison 1999; Gieryn 1999; Morgan and Howlett 2010). In

addition, it illuminates present usage and meaning of validity in economic experimenting

(Loewenstein 1999; Harrison and List 2004; Samuelson 2005; Jimenez-Buedo and Miller

forthcoming). The common English concept of validity was first given a more specifically

scientific meaning by psychologists in the early twentieth century in the context of

psychological tests, in which validity denoted the accuracy of a test and was contrasted

with the reliability or precision of a test. Following the classification of different

validity-types in the APA’s Technical Recommendations (1954), validity traveled from

psychological tests to psychological experiments through the work of Campbell

(Campbell 1957; Campbell and Fiske 1959; Campbell and Stanley 1963). Thus, the idea

that experiments too could be more or less valid or accurate was introduced. In addition, a

distinction was made between the internal and the external validity of an experiment.

Many subsequent discussions in experimental methodology in psychology and economics

have essentially been about the question of what it means to say that an experiment is more or

less internally or externally valid, i.e. accurate.

Of the two domains in which validity was employed in psychology, only the internal

and external validity of experimental methodology traveled to economics. The initial

implementation of validity in economic experimentation was reluctant, and focused upon

showing how external validity in particular was not problematic in economic experiments

(e.g. Brookshire, Coursey, and Schulze 1987). However, the gradual adoption of validity

by economists working with experiments eventually led to the clear, analytical definition

of internal and external validity by Guala (Guala 1999, 2003, 2005; Guala and Mittone

2005), a definition that was subsequently taken over by other economists (e.g. Harrison

and List 2004; Bardsley 2005; Samuelson 2005; Schram 2005; Jimenez-Buedo and Miller

forthcoming). Thus, the application of validity, or accuracy, to the domain of

psychological experimenting and its loose classification into internal and external

validity, eventually became in economics the strict twin concepts by which the

meaningfulness of experiments was assessed. In other words, in its travels from

psychological tests to psychological experiments to economic experiments the concept of

validity generally retained its meaning as the accuracy of a scientific procedure. At the

same time, however, it was put to use in dissimilar ways and elicited different discussions

in the scientific realms in which it was applied.

Notes

1. Another issue is that one may discuss validity/accuracy and reliability/precision without using those or similar terms. An example is Boring’s well-known ‘Intelligence as the Tests Test it’ (1923), which assesses the meaning of the then recently emerged intelligence tests. It was published a few years before psychologists developed the validity–reliability vocabulary to discuss the interpretation of tests.

2. A quasi-experiment is an experiment in which the experimenter has little control over the causal factors. A quasi-experiment is to be situated in between pure laboratory experiments and correlational research, and is similar to what in the 2000s in economics has been labeled a field experiment.

3. A JSTOR search of economic journals reveals a handful of occasional references to either internal or external validity in the 1950s–1970s, in journals such as The Journal of Economic Education and The Journal of Human Resources.

4. In this paper, I focus on laboratory experiments in experimental and behavioral economics. Recent developments, such as experimenting in financial accounting (e.g. Libby, Bloomfield, and Nelson 2002), are left for another occasion.

5. Arguments similar to those in Plott (1982) can be found in Smith (1976, 1982), and in Wilde (1980). The most extensive presentation and refutation of artificiality and the associated real-world argument can be found in Starmer (1999).

6. A JSTOR search of economic journals yields a combination of internal validity and external validity in 3 articles in the 1980s, in 14 articles in the 1990s, and in 41 articles in the 2000s.

7. Arguably, the distinction between experimental and behavioral economics started to dissolve around the mid-2000s.

8. The intuition behind such a distinction is that in an experimental setting, i.e. in a laboratory, the scientist controls some factors of the world and thereby creates an artificial world inside the laboratory. As such, it risks confusing the physical operation of conducting an experiment with its ontological status. Experiments often are physically conducted ‘inside’ a laboratory: the scientist puts on her gloves and glasses, passes through an airlock and enters the laboratory. In economics and psychology similarly the scientist often physically enters the laboratory where the experiment is conducted. However, that does not mean that by doing so she enters a different, inside, or non-real world, as opposed to the real world outside. It simply means that she enters a place in the world specifically manipulated and controlled to conduct experiments. In a laboratory, the scientist is completely in our real, material world.

References

APA Committee on Test Standards (1952), ‘Technical Recommendations for Psychological Tests and Diagnostic Techniques: Preliminary Proposal’, The American Psychologist, 461.

——— (1954), ‘Technical Recommendations for Psychological Tests and Diagnostic Techniques’, Psychological Bulletin, 51(2), Supplement.

Bardsley, N. (2005), ‘Experimental Economics and the Artificiality of Alteration’, Journal of Economic Methodology, 12(2), 239–252.

Bardsley, N., Cubitt, R., Loomes, G., Moffatt, P., Starmer, C., and Sugden, R. (2009), Experimental Economics: Rethinking the Rules, Princeton, NJ: Princeton University Press.

Berkovitz, L., and Donnerstein, E. (1982), ‘External Validity is more than Skin Deep’, American Psychologist, 37, 245–257.

Boring, E.C. (1923), ‘Intelligence as the Tests Test It’, New Republic, 36, 35–37.

——— (1929), A History of Experimental Psychology, New York: The Century Co.

——— (ed.) (1937), A Manual of Psychological Experiments, New York: Wiley.

Boring, E.C. (1950), A History of Experimental Psychology (2nd ed.), New York: Appleton-Century-Crofts.

Boumans, M.J. (2005), How Economists Model the World into Numbers, London: Routledge.

Brewer, M.B. (2001), ‘Donald Campbell’, in Encyclopedia of Psychology, ed. A.E. Kadzin, Washington, DC: American Psychological Association: III, pp. 3–5.

Brookshire, D.S., Coursey, D.L., and Schulze, W.D. (1987), ‘The External Validity of Experimental Economics Techniques: Analysis of Demand Behavior’, Economic Inquiry, 25(2), 239–250.

Calder, B.J., Phillips, L.W., and Tybout, A.M. (1982), ‘The Concept of External Validity’, Journal of Consumer Research, 9, 240–244.

Camerer, C. (1996), ‘Rules for Experimenting in Psychology and Economics, and Why They Differ’, in Experimental Studies of Strategic Interaction: Essays in Honor of Reinhard Selten, eds. W. Albers, W. Guth, and E. Van Damme, Berlin: Springer-Verlag, pp. 313–327.

Camerer, C., and Loewenstein, G. (2004), ‘Behavioral Economics: Past, Present, Future’, in Advances in Behavioral Economics, eds. C.F. Camerer, G. Loewenstein, and M. Rabin, Princeton, NJ: Princeton University Press, pp. 3–52.

Campbell, D.T. (1957), ‘Factors Relevant to the Validity of Experiments in Social Settings’, Psychological Bulletin, 54, 297–312.

Campbell, D.T., and Fiske, D. (1959), ‘Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix’, Psychological Bulletin, 56, 81.

Campbell, D.T., and Stanley, J.C. (1963), ‘Experimental and Quasi-Experimental Designs for Research on Teaching’, in Handbook of Research on Teaching, ed. N.L. Gage, Boston, MA: Houghton Mifflin, pp. 171–246.

Carpenter, J.P., Harrison, G.W., and List, J.A. (2005), ‘Field Experiments in Economics: An Introduction’, in Research in Experimental Economics, Vol. 10, eds. J.P. Carpenter, G.W. Harrison, and J.A. List, Amsterdam: Elsevier, pp. 1–15.

Chang, H. (2004), Inventing Temperature: Measurement and Scientific Progress, Oxford: Oxford University Press.

Colvin, S.S. (1900), ‘The Fallacy of Extreme Idealism’, The American Journal of Psychology, 11(4), 511–526.

Cook, T.D., and Campbell, D.T. (1979), Quasi-Experimentation: Design & Analysis Issues for Field Settings, Boston, MA: Houghton Mifflin Co.

Cronbach, L.J., and Meehl, P.E. (1955), ‘Construct Validity in Psychological Tests’, Psychological Bulletin, 52, 281.

Danziger, K. (1997), Naming the Mind, How Psychology Found its Language, London: Sage Publications.

Davis, D.D., and Holt, C.A. (1993), Experimental Economics, Princeton, NJ: Princeton University Press.

Dimand, R.W. (2005), ‘Experimental Economic Games: The Early Years’, in The Experiment in the History of Economics, eds. P. Fontaine and R. Leonard, New York: Routledge, pp. 5–24.

Festinger, L., and Katz, D. (eds.) (1953), Research Methods in the Behavioral Sciences, Fort Worth, TX: Dryden Press.

Friedman, D., and Sunder, S. (1994), Experimental Methods: A Primer for Economists, Cambridge: Cambridge University Press.

Galison, P. (1999), ‘Trading Zone, Coordinating Action and Belief’, in The Science Studies Reader, ed. M. Biagioli, London: Routledge, pp. 137–160.

Galison, P., and Stump, D.J. (1996), The Disunity of Science: Boundaries, Contexts, and Power, Stanford, CA: Stanford University Press.

Gieryn, T. (1999), Cultural Boundaries of Science: Credibility on the Line, Chicago, IL: University of Chicago Press.

Gigerenzer, G. (1987), ‘Survival of the Fittest Probabilist: Brunswik, Thurstone, and the Two Disciplines of Psychology’, in The Probabilistic Revolution 2, eds. L. Kruger, G. Gigerenzer and M.S. Morgan, Cambridge: MIT Press, pp. 49–72.

Guala, F. (1999), ‘The Problem of External Validity (or “Parallelism”) in Experimental Economics’, Social Science Information, 38, 555–573.

——— (2003), ‘Experimental Localism and External Validity’, Philosophy of Science, 70, 1195–1205.

——— (2005), The Methodology of Experimental Economics, Cambridge: Cambridge University Press.

Guala, F., and Mittone, L. (2005), ‘Experiments in Economics: External Validity and the Robustness of Phenomena’, Journal of Economic Methodology, 12(4), 495–515.

Hammersley, M. (1991), ‘A Note on Campbell’s Distinction between Internal and External Validity’, Quality and Quantity, 25, 381–387.

Harrison, G.W. (2005), ‘Field Experiments and Control’, Research in Experimental Economics, 10, 17–50.

Harrison, G.W., and List, J.A. (2004), ‘Field Experiments’, Journal of Economic Literature, 27, 1013–1059.

Henshel, R.L. (1980), ‘The Purposes of Laboratory Experimentation and the Virtues of Deliberate Artificiality’, Journal of Experimental Social Psychology, 16, 466–478.

Heukelom, F. (2011), ‘What to Conclude from Psychological Experiments: The Contrasting Cases of Experimental and Behavioral Economics’, History of Political Economy, forthcoming.

Innocenti, A. (2008), ‘How Can a Psychologist Inform Economics? The Strange Case of Sidney Siegel’, DEPEID Working papers, 8/2008.

Jimenez-Buedo, M., and Miller, L.M. (forthcoming), ‘Why a Trade-off? The Relationship between the External and Internal Validity of Experiments’, Theoria. An International Journal for Theory, History and Foundations of Science.

Kadzin, A.E. (ed.) (2000), Encyclopedia of Psychology, Washington, DC: American Psychological Association.

Krueger, R.F., and Kling, K.C. (2000), ‘Validity’, in Encyclopedia of Psychology, ed. A.E. Kadzin,Washington, DC: American Psychological Association, pp. 149–153.

Kruglanski, A.W. (1975), ‘The Human Subject in the Psychological Experiment: Fact and Artifact’,in Advances in Experimental Social Psychology (Vol. 8), ed. L. Berkovitz, Chicago, IL:University of Chicago Press, pp. 101–147.

Lee, K.S. (2004), ‘Rationality, Minds, and Machines in the Laboratory: A Thematic History ofVernon Smith’s Experimental Economics’, Ph.D. dissertation, Notre Dame, University of NotreDame, p. 300.

Lee, K.S., and Mirowski, P. (2008), ‘The Energy Behind Vernon Smith’s Experimental Economics’,Cambridge Journal of Economics, 32, 257–271.

Levitt, S.D., and List, J.A. (2007), ‘What Do Laboratory Experiments Measuring Social PreferencesReveal about the Real World?’ Journal of Economic Perspectives, 21(2), 153–174.

Libby, R., Bloomsfield, R., and Nelson, M.W. (2002), ‘Experimental Research in FinancialAccounting’, Accounting, Organization and Society, 27, 775–810.

Loewenstein, G. (1999), ‘Experimental Economics from the Vantage-Point of BehaviouralEconomics’, The Economic Journal, 109, F25–F34.

Lucas, J.W. (2003), ‘Theory-Testing, Generalizations, and the Problem of External Validity’,Sociological Theory, 21(3), 236–253.

McCrea, J., and Pritchard, H.J. (1897), ‘The Validity of the Psychophysical Law for the Estimationof Surface Magnitudes’, The American Journal of Psychology, 8(4), 494–505.

Morgan, M.S. (1990), The History of Econometric Ideas, Cambridge: Cambridge University Press.

Morgan, M.S., and Howlett, W.P. (eds.) (2010), How Well Do Facts Travel? Cambridge: Cambridge University Press.

Moscati, I. (2007), ‘Early Experiments in Consumer Demand Theory: 1930–1970’, History of Political Economy, 39(3), 359–401.

Peracchi, F. (2001), Econometrics, New York: Wiley.

Plott, C.R. (1982), ‘The Application of Laboratory Experimental Methods to Public Choice’, in Collective Decision Making: Applications from Public Choice Theory, ed. C.S. Russell, Baltimore, MD: Johns Hopkins University Press.

Porter, T. (1994), ‘Making Things Quantitative’, Science in Context, 7(3), 389–407.

Power, M. (1996), ‘Making Things Auditable’, Accounting, Organization and Society, 21(2/3), 289–315.

Rice, D.B., and Smith, V.L. (1964), ‘Nature, the Experimental Laboratory, and the Credibility of Hypotheses’, Behavioral Science, 9(3), 239–246.

Samuelson, L. (2005), ‘Economic Theory and Experimental Economics’, Journal of Economic Literature, 43, 65–107.

Samuelson, P.A. (1947), Foundations of Economic Analysis, Cambridge: Harvard University Press.

Savage, L.J. (1954), The Foundations of Statistics, New York: Wiley.

Schram, A. (2005), ‘Artificiality: The Tension between Internal and External Validity in Economic Experiments’, Journal of Economic Methodology, 12(2), 225–237.

Shapley, H. (1964), Of Stars and Men, Boston, MA: Beacon Press.

Smith, R.A., and Davis, S.F. (1997), The Psychologist as Detective: An Introduction to Conducting Research in Psychology, Upper Saddle River, NJ: Prentice Hall.

Smith, V.L. (1976), ‘Experimental Economics: Induced Value Theory’, American Economic Review, 66, 274–279.

——— (1982), ‘Microeconomic Systems as an Experimental Science’, American Economic Review, 72, 923–955.

——— (1992), ‘Game Theory and Experimental Economics: Beginnings and Early Influences’, in Towards a History of Game Theory, ed. E.R. Weintraub, London: Duke University Press, pp. 241–282.

Starmer, C. (1999), ‘Experiments in Economics: Should We Trust the Dismal Scientists in White Coats?’ Journal of Economic Methodology, 6(1), 1–30.

Stewart, M.B., and Wallis, K.F. (1981), Introductory Econometrics, Oxford: Blackwell.

Street, W.R. (1994), A Chronology of Noteworthy Events in American Psychology, Washington, DC: American Psychological Association.

Swanborn, P.G. (1981), Methoden Van Sociaal-Wetenschappelijk Onderzoek, Meppel: Boom.

——— (1994), ‘External Validity Abandoned?’, Quality and Quantity, 27, 211–215.

Thurstone, L.L. (1931), The Reliability and Validity of Tests: Derivation and Interpretation of Fundamental Formulae Concerned with Reliability and Validity of Tests and Illustrative Problems, Ann Arbor, MI: Edwards Bros.

Thye, S.R. (2007), ‘Logical and Philosophical Foundations of Experimental Research in the Social Sciences’, in Laboratory Experiments in the Social Sciences, eds. M. Webster and J. Sell, Amsterdam: Elsevier, pp. 57–86.

Vissers, G., Heyne, G., Peters, V., and Guerts, J. (2001), ‘The Validity of Laboratory Research in Social and Behavioral Science’, Quality and Quantity, 35, 129–145.

Wallis, W.A., and Friedman, M. (1942), ‘The Empirical Derivation of Indifference Functions’, in Studies in Mathematical Economics and Econometrics, In Memory of Henry Schultz, eds. O. Lange, F. McIntyre and T.O. Yntema, New York: Books for Libraries Press, pp. 175–189.

Webster, M., and Sell, J. (2007), ‘Why Do Experiments’, in Laboratory Experiments in the Social Sciences, eds. M. Webster and J. Sell, Amsterdam: Elsevier, pp. 6–24.

Weintraub, E.R. (ed.) (1992), Toward a History of Game Theory, London: Duke University Press.

Wilde, L. (1980), ‘On the Use of Laboratory Experiments in Economics’, in The Philosophy of Economics, ed. J. Pitt, Dordrecht: Reidel, pp. 137–148.

Willer, D., and Walker, H.A. (2007), Building Experiments, Testing Social Theory, Stanford, CA: Stanford University Press.

Woodworth, R.C., and Schlosberg, H. (1938), Experimental Psychology, New York: Holt, Rinehart and Winston.

Zelditch, M. (2007), ‘The External Validity of Experiments That Test Theories’, in Laboratory Experiments in the Social Sciences, eds. M. Webster and J. Sell, Amsterdam: Elsevier, pp. 87–112.
