Publisher: Routledge
Informa Ltd, registered in England and Wales, registered number: 1072954; registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Journal of Economic Methodology
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rjec20
How validity travelled to economic experimenting
Floris Heukelom, Department of Economics, Institute for Management Research, Radboud University, Nijmegen, The Netherlands
Published online: 13 May 2011.
To cite this article: Floris Heukelom (2011) How validity travelled to economic experimenting, Journal of Economic Methodology, 18:01, 13-28, DOI: 10.1080/1350178X.2011.556435
To link to this article: http://dx.doi.org/10.1080/1350178X.2011.556435
How validity travelled to economic experimenting
Floris Heukelom*
Department of Economics, Institute for Management Research, Radboud University, Nijmegen, The Netherlands
Validity was first given a more specifically scientific meaning by psychologists in the early twentieth century in the contexts of psychological tests. Following the classification of different validity-types in the American Psychological Association’s Technical Recommendations (1954), validity travelled from psychological tests to psychological experiments through the work of Donald Campbell. Thus the idea was introduced that experiments, too, could be more or less valid. In addition, a distinction was made between the internal and the external validity of an experiment. Of the two domains in which validity was employed in psychology, only the internal and external validity of experimental methodology travelled to economics. The initial implementation of validity in economic experimentation was reluctant, and focused upon showing how external validity in particular was not problematic in economic experiments. However, the gradual adoption of validity by economists working with experiments eventually led to the clear, analytical definition of internal validity by Francesco Guala, a definition that was subsequently taken over by other economists. In its travels from psychological tests to psychological experiments to economic experiments the concept of validity generally retained its meaning as the accuracy of a scientific procedure. At the same time, however, it was put to use in dissimilar ways and elicited different discussions in the scientific realms in which it was applied.
Keywords: validity; tests; experiments
1 Introduction
In psychology and economics, validity denotes the accuracy of a scientific operation such
as measurement or experimentation. In other words, validity asks how well the scientific
operation achieves what it aims to achieve. The concept of validity traveled at least twice
across scientific boundaries. First, it traveled from the area of psychological tests to
psychological experiments, and then, second, it traveled from the area of psychological
experiments to experiments in the other social and behavioral sciences. As such, validity
provides an illustrative example of how concepts and other scientific tools travel from one
scientific domain to another (e.g. Galison and Stump 1996; Galison 1999; Gieryn 1999;
Morgan and Howlett 2010).
The reason that validity traveled from one area to another was that scientists in the area
of destination deemed the meaning and usage of the concept useful. However, as an
inevitable by-product of scientific traveling, the meaning and usage of validity also
changed with each voyage. The purpose of the present article is to document the two
voyages of validity. Thus, it adds to the literature of traveling scientific tools. But, in
ISSN 1350-178X print/ISSN 1469-9427 online
© 2011 Taylor & Francis
DOI: 10.1080/1350178X.2011.556435
http://www.informaworld.com
*Email: [email protected]
Journal of Economic Methodology,
Vol. 18, No. 1, March 2011, 13–28
addition, by showing where validity originates and how it was applied in different ways
each time it crossed a scientific border, this article aims to contribute to current discussions
of validity in economic and related experimenting (e.g. Calder, Phillips, and Tybout 1982;
Hammersley 1991; Guala 1999, 2003, 2005; Loewenstein 1999; Vissers, Heyne, Peters,
and Guerts 2001; Lucas 2003; Harrison and List 2004; Bardsley 2005; Guala and Mittone
2005; Samuelson 2005; Schram 2005; Thye 2007; Webster and Sell 2007; Willer and
Walker 2007; Zelditch 2007; Bardsley, Cubitt, Loomes, Moffatt, Starmer, and Sugden
2009; Jimenez-Buedo and Miller forthcoming).
The upcoming section reveals the origin of validity in the methodology of
psychological tests, in which validity functioned as a counterpart to reliability. When
validity subsequently traveled from the area of tests to that of psychological experiments,
it roughly retained its meaning and provided experimental methodology with a new
category in which it enabled one to think about experiments. At the same time, however, it
gave rise to a new series of questions regarding what exactly experiments are about. The
third section addresses how validity traveled from psychological experimentation to the
area of economic experimentation. In economics, validity was initially reluctantly adopted
and received little substantial discussion as compared with its discussion in psychological
experimentation. But when it did become a subject of concern, it received an analytical
treatment and definition that was stricter than it had received in psychology.
2 Validity in psychology
The concept of validity had been commonly used by American psychologists from at least
the 1870s onwards (e.g. McCrea and Pritchard 1897; Colvin 1900), but it was in the
psychology of tests that it received a more specific meaning from the 1920s onwards as the
accuracy of a test (Swanborn 1981). By contrast, validity entered the experimental
psychological vocabulary only in the late 1950s. In Edwin Boring’s classic A History of
Experimental Psychology (1929), the issue of validity was completely absent. The
equally voluminous and complete overview Experimental Psychology (1938) by Robert
Woodworth and Harold Schlosberg likewise contained no reference to validity. Moreover, validity
remained absent in the 1950 second edition of Boring’s history, despite the fact that the
‘treatment of the later period [was] greatly expanded’ (Boring 1950, p. vii). The same is
true for Woodworth and Schlosberg’s revised edition that appeared in 1954. Even
A Manual of Psychological Experiments (1937), edited by Boring and with contributions
from some 20 high-standing experimental psychologists of the time, in its many examples
of the practicalities of conducting experiments, did not spend a single word on validity.
That may seem stranger than it is, as a test and an experiment really are two different
things. A test is an assessment by means of a questionnaire, interview, or other method to
assess the functioning of a human being compared with a standard or average – the IQ test is
the historically first and paradigmatic example (Danziger 1997). A test, in other words, is an
instrument, just as the thermometer is an instrument. For any instrument or test at least two
characteristics are crucial: its precision and its accuracy (e.g. Chang 2004). The accuracy of
the thermometer refers to how closely the thermometer follows the actual, true value of what
it is supposed to measure: temperature. Precision refers to the degree of closeness of repeated
measurements of the same true value. The same distinction came to be used for psychological
tests, in which accuracy was referred to as validity, and precision became known as reliability.
An example can be found in Chicago-based psychometrician Louis Leon Thurstone’s
The Reliability and Validity of Tests: derivation and interpretation of fundamental
formulae concerned with reliability and validity of tests and illustrative problems (1931).
As the title indicates, reliability and validity were the two guiding concepts in this 130-page
student manual. Thurstone reminded the reader that ‘validity refers to the correlation
between a test and its criterion’ (Thurstone 1931, p. 97). In other words, validity to
Thurstone referred to how accurately the test measured the actual, true value (the criterion) it
was meant to measure. The example that Thurstone used was a test of students’ previous
examination grades as a predictor for future scholarship grades. If this test performed well,
in the sense that previous examination grades were a good predictor of future scholarship
grades, the test had a high validity; if it did not, the test had a low validity.1
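Thurstone’s reading of validity as a correlation between a test and its criterion can be made concrete with a small numerical sketch. The following Python fragment is an illustration only and is not taken from Thurstone (1931): the grade lists are invented, the function name pearson is our own, and the use of a test–retest correlation as a stand-in for reliability is an assumption that simply follows the description of precision as the closeness of repeated measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six students (invented for illustration).
previous_grades = [6.0, 7.5, 5.5, 8.0, 9.0, 6.5]     # the test
scholarship_grades = [6.5, 7.0, 5.0, 8.5, 9.0, 7.0]  # the criterion
retest_grades = [6.5, 7.0, 5.5, 8.5, 8.5, 6.5]       # the same test, repeated

# Validity: how closely the test tracks the criterion it is meant to predict.
validity = pearson(previous_grades, scholarship_grades)

# Reliability (assumed here to be test-retest agreement): how closely
# repeated applications of the same test agree with one another.
reliability = pearson(previous_grades, retest_grades)

print(f"validity coefficient:    {validity:.2f}")
print(f"reliability coefficient: {reliability:.2f}")
```

On this reading a test can be reliable without being valid: repeated scores may agree closely with one another while correlating only weakly with the criterion.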
The ever-recurring issue in empirical science is how to disentangle reliability from
validity, and in particular how to determine and demonstrate conclusively that your
instrument is valid (Porter 1994; Power 1996). The problem is that you cannot. But what
you can do is try to move in the right direction. To that end, psychologists in the 1930s and
1940s started to break down validity into different sub-categories, so as to be able to more
clearly assess the different aspects of validity separately. Sub-categories proposed
included content validity, construct validity, and ecological validity (Swanborn 1981). The
subsequent rapid expansion of validity concepts convinced the members of the American
Psychological Association (APA) in the early 1950s that some definitions and standards
for conducting tests were badly needed (Vissers et al. 2001, p. 131).
The APA was created in 1892 and aimed at representing all American professional
psychologists (www.apa.org, accessed 7 May 2009). In this capacity, it actively sought to
establish standards regarding various methodological aspects of psychological research.
Well known are its standards for scientific publication, first published in 1952, with
precursors going back to 1929 (Kadzin 2001, p. xix). Less successful examples include the
Committee on Measurement (1909–1919). In 1950, the then president of the APA, Joy
Paul Guilford, a psychometrician who worked in the tradition of Thurstone, appointed a
Committee on Test Standards to prepare a ‘statement on technical standards for evaluating
tests and the contents of test manuals’ (Street 1994, p. 152), which would be ‘an official
statement of the profession’ (APA Committee on Test Standards 1954, p. 461). The
committee, composed of Edward S. Bordin, R.C. Challman, H.S. Conrad, Lloyd
G. Humphreys, Paul E. Meehl, Donald E. Super, and its chairman Lee J. Cronbach, was
created to set the standard for performing tests for years to come. It conducted its
discussion in close cooperation with representatives of the American Educational
Research Association (AERA) and the National Council on Measurements Used in
Education (NCMUE). A preliminary version of the Technical Recommendations for
Psychological Tests and Diagnostic Techniques, based in part on Cronbach and Meehl
(1955), was published for examination by the psychological community in American
Psychologist (1952, pp. 461–475). The final version was published both as a separate
manual and as a special issue of American Psychologist (1954, pp. 201–238). About a
third of this 40-page manual was devoted to validity.
The Technical Recommendations distinguished four types of validity: content validity,
predictive validity, concurrent validity, and construct validity. Content validity was
described as a claim regarding whether observations from a draw of the population justify
claims about the population as a whole. Predictive validity was understood as an
assessment of how well predictions based on current observations accurately reflect future
observations. Concurrent validity demanded that a new test behaves similarly to existing
tests that measure the same phenomenon. In other words, concurrent validity demanded
that for instance a new IQ test had a high correlation with an already existing and accepted
IQ test. Construct validity, finally, was an assessment of the overlap between the
scientific operationalization of a higher-order term (intelligence, academic performance,
unemployment, and so on) and the meaning of this higher-order term in everyday
language.
In the same period as the publication of Technical Recommendations, but without any
apparent direct connection, the two traditions of experimental psychology and
correlational psychology integrated, partly as a result of the efforts of Thurstone
(Gigerenzer 1987; Danziger 1997). As a result, a test became an instrument that could be
employed in an experiment. One could for instance employ a test of subjects’ intelligence
in an experiment investigating the relation between intelligence and learning (Festinger
and Katz 1953). However, one could also still conduct a test without an experiment (say
measuring a child’s IQ), just as one could do an experiment without the instrument of a test
(say judging relative brightness of lamp bulbs).
Perhaps because of the increasing integration of correlational and experimental
psychology, the Technical Recommendations’ presentation of validity was a mixed
success. Almost immediately following its publication, the Technical Recommendations’
classification of validity became contested – in particular by a number of researchers who
applied it to experiments instead of tests. However, the Technical Recommendations did
succeed in deeply ingraining validity as the banner under which to discuss the
methodology of tests and experiments in psychology. Moreover, through the export of
psychological experimentation to the other social sciences in the post-war period, it
unintentionally succeeded in ingraining the concept of validity in the social sciences
generally.
Principal among the Technical Recommendations’ contesters was Donald Campbell.
In ‘Factors Relevant to the Validity of Experiments in Social Settings’ (1957), Campbell
investigated the issue of validity for the case of experiments and proposed a distinction
between internal and external validity. Campbell explicitly built on previous
methodological discussions of psychological experimenting, and illustrated his approach
by means of well-known experimental examples. Yet, the distinction between internal and
external validity is Campbell’s, without any directly related previous expositions other
than those mentioned above.
Validity will be evaluated in terms of two major criteria. First, and as a basic minimum, is what can be called internal validity: did in fact the experimental stimulus make some significant difference in this specific instance? The second criterion is that of external validity, representativeness, or generalizability: to what populations, settings, and variables can this effect be generalized. (Campbell 1957, p. 297, emphasis in the original)
In other words, internal validity referred to whether the experiment itself had been
conducted properly and whether it had yielded significant results, according to the received
rules of experimenting and statistics. External validity referred to the generalizability of the
experimental results. Although Campbell noted that internal and external validity might
sometimes be incompatible when controls for internal validity would jeopardize external
validity, they were not presented as opposites. A bad experiment could be inferior in both
the internal and external validity domain, and a good test or experiment would score high
on both internal and external validity.
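The statistical half of the internal validity question, whether the stimulus made a significant difference in this specific instance, can be illustrated with a short sketch. The Python fragment below is a hedged illustration and not a procedure found in Campbell (1957): the choice of an exact permutation test, the function name, and the scores are all assumptions made for the example. It addresses only statistical significance; proper conduct of the experiment and the generalizability of the result (external validity) lie outside what any such calculation can show.

```python
from itertools import combinations


def exact_permutation_p_value(treated, control):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every way of assigning the pooled scores to a 'treated'
    group of the observed size and counts how often the absolute mean
    difference is at least as large as the one actually observed.
    """
    if not treated or not control:
        raise ValueError("both groups must be non-empty")
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    n_control = len(control)
    total = sum(pooled)

    def mean_diff(treated_sum):
        return treated_sum / n_treated - (total - treated_sum) / n_control

    observed = abs(mean_diff(sum(treated)))
    at_least_as_extreme = 0
    n_assignments = 0
    for indices in combinations(range(len(pooled)), n_treated):
        n_assignments += 1
        diff = abs(mean_diff(sum(pooled[i] for i in indices)))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_assignments


# Hypothetical scores (invented for illustration).
treated_scores = [12, 14, 15, 13, 16]  # subjects who received the stimulus
control_scores = [8, 9, 10, 7, 11]     # subjects who did not

p_value = exact_permutation_p_value(treated_scores, control_scores)
print(f"two-sided p-value: {p_value:.4f}")
```

The exact enumeration is feasible only for small groups; it is used here because it needs no distributional assumptions and gives a deterministic answer.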
With his introduction of internal and external validity to experimental methodology,
Campbell transplanted the notion of validity from the realm of tests to that of experiments.
In other words, Campbell applied a key notion in measurement theory – validity, or
accuracy – to the theory of experimentation in the behavioral and social sciences. In still
other words, Campbell introduced the idea that experiments, too, could be more or less
accurate. It is a matter of personal opinion whether it is meaningful to talk about the
accuracy of experiments in this way, which has been the subject of continuous debate (e.g.
Zelditch 2007; Jimenez-Buedo and Miller forthcoming). One peculiarity to be noted here
is that although Campbell applied the notion of validity to experiments, he did not
transplant validity’s counterpart reliability, or precision. Thus, to Campbell and to all
those who picked up his internal–external validity distinction it was meaningful to talk
about experiments as being more or less accurate, but it was not meaningful to talk about
experiments as being more or less precise.
It is to be emphasized that Campbell’s application of validity to experiments does not
mean that he failed to distinguish between tests and experiments. On the contrary,
Campbell and Donald Fiske’s ‘Convergent and Discriminant Validation by the Multitrait-
Multimethod Matrix’ (1959), for instance, widely cited as the start of the double-blind
research method (Brewer 2001), argued that underneath the ‘major types of validity’ of
tests as presented in the Technical Recommendations lay concerns for convergence and
discrimination. It completely abstained from the internal–external distinction. Validation
is convergent, the authors argued, when a particular test confirms observations obtained
from other tests. For the validity of a new test, however, the observations must in addition
be shown to discriminate more, or to discriminate differently, as compared with
previous tests.
The distinction between internal and external validity returned in Campbell and
Stanley (1963), ‘Experimental and Quasi-Experimental Designs for Research on
Teaching’. In this lengthy book chapter, external validity retained the meaning it was
given in Campbell (1957). The meaning of internal validity, however, was slightly
sharpened: ‘Internal validity is the basic minimum without which any experiment is
uninterpretable: Did in fact the experimental treatments make a difference in this specific
context?’ (Campbell and Stanley 1963, p. 175, emphasis in the original). Subsequently,
internal validity was divided into ‘eight different classes of extraneous variable’, which
needed to be controlled in the experimental design. These eight classes had no direct origin
in other methodological discussions in psychology and like internal and external validity
were derived directly from Campbell’s own observations of experimental practice in
psychology. The eight classes included History: the influence of events between
experimental treatments; Maturation: the influence of the passage of time on subjects’
behavior; Testing: the effects of taking a test upon scores of a second testing; Selection:
biases resulting from the method of selecting the subjects; and Experimental mortality: the
loss of responses during the experiment. Note again that all potential effects here are about
the experiment, including the Testing effect, which is not a worry about the validity of the
measurement instrument itself but an effect of the instrument on the experimental subject.
Subsequently, external validity was divided into four classes, including the Reactive or
Interaction effect: the influence of previous tests or other experiments on subjects’
behavior; and Reactive effects of experimental arrangements: effects of the experimental
arrangements on the behavior of subjects, later also known as the problem of artificiality.
The distinction between reliability and validity, the classification and definition of
validity in the Technical Recommendations, and Campbell’s distinction between internal
and external validity defined the experimental vocabulary in psychology for subsequent
decades, up into the twenty-first century. In general, it seems, the different domains have
been kept separate: validity versus reliability and the different subcategories of validity
under the rubric of tests and measurement, and internal and external validity and their
different sub-categories under the rubric of experimentation (e.g. Kruglanski 1975;
Henshel 1980; Berkovitz and Donnerstein 1982; Swanborn 1994; Smith and Davis 1997).
An example is the 2000 eight-volume edition of the APA’s Encyclopedia of Psychology
(Kadzin 2000). The Encyclopedia contained two entries for validity, ‘Validity’, and
‘Construct Validity’, signaling a predominant position of construct validity among the
different categories. In both entries, validity was understood as a description of ‘the
extent to which a psychological instrument measures what it is supposed to measure’
(Krueger and Kling 2000, p. 149). In addition to construct validity, validity was then
divided into a number of categories and sub-categories. Only at the end did the ‘Validity’
entry briefly mention the applicability of the validity concept to experiments, citing
internal and external validity as its main categories.
Sometimes, however, the different validities have been related to one another. For
instance, in the authoritative handbook Quasi-Experimentation, Design & Analysis Issues
for Field Settings (1979), Campbell together with Thomas Cook started by broadly
distinguishing internal and external validity.2 Under the heading of internal validity, the
authors discussed what they considered to be a main threat to internal validity in the case
of quasi-experiments: statistical conclusion validity, which asks whether or not statistical
inference of covariance between variables is justified. They considered the main factor of
external validity, on the other hand, to be construct validity. Thus, the internal and external
validity of experiments functioned as the general terms under which the measurement
types of validity were also discussed.
To sum up, validity originates in psychology as a synonym of the accuracy of a
measurement instrument – the measurement instrument in psychology often being a test.
Validity or accuracy of a test was distinguished from the reliability, or precision of the test.
In 1954, the APA’s Technical Recommendations standardized the different sub-categories
of validity. A few years later, Campbell applied validity, but not reliability, to experiments,
and chose his own sub-categories in the form of internal and external validity. Subsequently,
test, experiment, and their associated categories of validity were kept largely separate in
psychology, despite occasional attempts to specify a relationship between them.
3 The dissemination of validity into economics
During the period between the publication of Thurstone’s textbook on reliability and validity
(1931) and the APA’s Technical Recommendations (1954), the distinction between
reliability/precision and validity/accuracy as assessments of measurement instruments
was mentioned by economists only occasionally, and then only briefly. Samuelson’s
Foundations of Economic Analysis (1947), for instance, mentioned a difference between
the validity and precision of a hypothesis (p. 12), without, however, elaborating on the
matter. By contrast, methodological concerns of measurement formed an integral part of
econometrics (Stewart and Wallis 1981; Morgan 1990; Peracchi 2001; Boumans 2005). But
these discussions stemmed from measurement theory in natural sciences, and not from
psychology. Moreover, there is no indication that measurement discussions in econometrics
influenced the methodological perspective of economic experimenting.
However, prior to the rise of experimental economics in the 1960s and 1970s, there
was some discussion surrounding the methodology of experiments in economics.3 Wallis
and Friedman (1942), for instance, criticized Thurstone’s one-time attempt to infer
indifference curves from an experiment with a subject who had to choose between shoes,
coats, and hats (Thurstone 1931; Moscati 2007). Wallis and Friedman argued that the
degree of control in Thurstone’s experiment had been too low to allow for any serious
inferences regarding indifference curves. For instance, they remarked that ‘[i]t is
questionable whether a subject in so artificial an experimental situation could know what
choices he would make in an economic situation’ (Wallis and Friedman 1942, p. 179).
Anachronistically, this could be understood as a problem of external validity, in the sense
that the results obtained in the laboratory environment could not be translated to behavior
in the economy outside the laboratory. It should be added, however, that Wallis and
Friedman did not relate their critique of Thurstone (1931) to methodological discussions
on measurement and experiments generally. Instead, they interpreted their objections in
terms of what they considered to be a fundamental difference between economics and
psychology. The methods of the latter, they argued, could simply not be applied to the
former. Moreover, Wallis and Friedman put forth similar practical objections against the
construction of indifference curves on the basis of statistical data of actual economic
behavior, arguing for instance that individuals’ preferences could change between two
observed preference expressions (pp. 183–186). But despite a few loose experiments and
related methodological discussions of Wallis and Friedman (1942) and some others
(Moscati 2007), expositions about the methodology of economic experimenting only
seriously commenced with the rise of experimental and later behavioral economics.4
3.1 Validity in experimental economics
It is well known that Vernon Smith, the founding father of experimental economics, in the
1950s developed his experimental method for economics in an atmosphere of
interdisciplinary research (Weintraub 1992; Lee 2004; Dimand 2005). In particular, the
influence of social psychologist Sidney Siegel has been recognized as vital for the
development of Smith’s application of the experimental method of the psychologists to
economic research questions (Smith 1992; Innocenti 2008; Lee 2008). However, partly
because of Siegel’s death in 1961, Smith was alone in conducting his economic
experiments until the mid-1970s, when he was joined by a younger generation of
economists, including Charles Plott, John Kagel, and others.
Discussions of the proper method of conducting experiments in economics are a
recurring theme in Smith’s writings of the 1960s, and in the experimental economics
literature more generally from the mid-1970s onwards. Yet, these discussions were not put
in terms of the internal–external validity framework that was developing in psychology
around the same time. Indeed, in the single methodological article Smith wrote in
the 1960s, the issue of inferences based on experiments is put in terms of validity a few
times, but without any reference to the psychologists, and despite the fact that in
the same article Smith explicitly cites some social psychologists as important for
developing the experimental method for economics. Instead, Smith took a position which
is best described as a combination of set-theory and Bayesian statistics, and which derived
its inspiration mainly from Savage (1954). In this approach, the overlap of the sets Nature,
Model, and Experimental Results ‘constitutes the “validity” of the experiment’ (Rice and
Smith 1964, p. 240), the statistics of which are an input for a Bayesian updating process
calculating the scientist’s post-experiment beliefs.
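The Bayesian updating step mentioned here can be sketched in a few lines. The Python fragment below is a generic illustration of Bayes’ rule for a single hypothesis and should not be read as Rice and Smith’s (1964) own formalism: the set-theoretic overlap of Nature, Model, and Experimental Results is not modelled, and the function name, the prior, and the likelihood values are all invented for the example.

```python
def posterior_probability(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for one hypothesis and one observed experimental result.

    prior               -- belief that the model is correct before the result
    likelihood_if_true  -- probability of the result if the model is correct
    likelihood_if_false -- probability of the result if the model is wrong
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie between 0 and 1")
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    if evidence == 0.0:
        raise ValueError("the observed result has zero probability")
    return prior * likelihood_if_true / evidence


# Hypothetical numbers: a scientist starts out undecided about the model
# and observes four experimental sessions in sequence.
belief = 0.5
session_supports_model = [True, True, False, True]

for supports in session_supports_model:
    if supports:
        # Assumed: a supporting result is likelier if the model is correct.
        belief = posterior_probability(belief, 0.8, 0.3)
    else:
        # Assumed: a contrary result is likelier if the model is wrong.
        belief = posterior_probability(belief, 0.2, 0.7)

print(f"post-experiment belief in the model: {belief:.3f}")
```

Each session’s posterior serves as the prior for the next, which is the sense in which experimental statistics feed the scientist’s post-experiment beliefs.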
Also in Smith’s well-known methodological article of the 1970s, ‘Experimental
Economics: Induced Value Theory’ (Smith 1976), no reference was made to the validity
framework of the psychologists. Smith talked about experiments as testing the ‘validity’ of
economic theories (Smith 1976, p. 274), invoked the concept of parallelism [‘As far as we
can tell, the same physical laws prevail everywhere’ (Shapley 1964; quoted in Smith 1976,
p. 274)], and advanced control as ‘the essence of experimental methodology’ (Smith 1976,
p. 275). But validity as understood by the psychologists did not appear even in a footnote.
The reason that Smith and other experimental economists in the 1970s such as Plott
were reluctant to adopt validity as understood by the psychologists was that the
psychologists’ way of dealing with validity risked creating a division between an inside
world of the laboratory and an outside ‘real’ world of the economy and its actors. Such an
interpretation, they argued, would be directly against experimental economists’
conception of experiments in economics because,
The relevance of experimental methods rests on the proposition that laboratory markets are ‘real’ markets in the sense that principles of economics apply there as well as elsewhere. Real people pursue real profits within the context of real rules. The simplicity of laboratory markets in comparison with naturally occurring markets must not be confused with questions about their reality as markets. (Plott 1982, p. 1520)
For this reason, Plott (1982) was equally dismayed about the term ‘artificial’ as a
description of the laboratory environment. Plott argued that a concern with the artificiality
of experiments is a straw man, and that in any case it would be an argument directed
at experimentation in general, not just at experiments in economics.5 Finally, the
560-page leading textbook in experimental economics, Davis and Holt’s Experimental
Economics (1993), was also entirely devoid of references to psychologists’ validity.
Yet, in the late 1980s and early 1990s the psychological categories nevertheless slowly
started to creep into the methodological discussion of the experimental economists. The
first article to invoke one of the psychologists’ notions of validity was Brookshire,
Coursey, and Schulze (1987), ‘The External Validity of Experimental Economics
Techniques: Analysis of Demand Behavior’. The focus of the authors was the now well-known
twin allegations of experimental economic results not translating to ‘real world
settings’, and how to compare behavior of student-subjects with the behavior of ‘“actual”
buyers, sellers, or traders’ (p. 289). In contrast to earlier replies to these allegations, as
indicated above, the authors summarized them as focusing upon ‘the external validity, or
lack of validity, associated with experimental economic techniques’ (p. 289, emphasis in
the original), which was then equated with the issue of the parallelism between the two
settings, as defined by Smith and others. Subsequently, the authors set out to invalidate
these allegations on empirical grounds. That is, they compared subjects’ behavior in
experimental markets with agents in ‘real’ markets and concluded that the observed
behavior in the two settings was similar enough to infer that the experimental method in
economics did not pose a problem of parallelism or external validity.
It is first of all important to note that the authors adopted external validity without its
counterpart internal validity, thus defying the idea that the two always come together.
Second, the issue of external validity thus was first introduced in experimental economics
not as a fundamental philosophical problem of experimentation, but as a purely empirical
question. This continues to be the understanding of (external) validity of many
experimental economists up to this day. For instance, in an email to the author, Smith
emphasized that he considered validity, or parallelism, ‘100% an empirical issue’, an issue
that ‘needs to be addressed as such if wheel spinning is to be avoided’ (email Smith to
author, 1 July 2009).
Following this first experimental economic article employing external validity,
internal validity was also introduced.6 For instance, in Friedman and Sunder’s Experimental
Methods: A Primer for Economists (1994), economists interested in conducting experiments
were occasionally warned that such-and-such a method may undermine or threaten the
internal validity of the experiment, or that some other procedure risked weakening the
external validity of the experiment, in which external validity continued to be equated with
parallelism. They referred to the ‘older terminology (from psychology, biology and other
disciplines)’ of internal and external validity ‘to make explicit the connection to the
literature on philosophy of science’ (email Friedman to author, 2 July 2009). As a result of
these and other references, internal validity and external validity gradually became
household concepts of experimental methodology in economics.
Nevertheless, many experimental economists continued to have some reservations
with regard to the new methodological concepts. Vernon Smith tried to play down the
importance of the two validities, emphasizing, as said, that it was merely an empirical
question that could be tested, and continued to equate validity with parallelism. In an email
to the author Dan Friedman noted that he has never been ‘especially enthusiastic about
[ . . . ] ‘internal’ and ‘external’ validity’ (email Friedman to author, 2 July 2009). Similar
remarks have been made by other well-known experimentalists (email Glenn Harrison to
author, 2 July 2009, email John List to author, 3 July 2009).
The question thus arises why experimental economists sometimes adopted the internal
and external validity concepts of the psychologists if they did not consider them to contribute
to the assessment of experiments in economics. The most plausible answer, as illustrated
by Brookshire, Coursey, and Schulze’s article discussed above, is that the experimental
economists of the late 1980s and early 1990s felt compelled to use internal and external
validity in order to establish experimental economics as a viable economic sub-discipline.
As experimental economics was growing, but its position was anything but secure, external
validity and later internal validity were introduced to convince a skeptical audience of
economists who did not believe in the ‘reality’ of experiments (Starmer 1999, email Chris
Starmer to author, 1 July 2009, email Friedman to author, 2 July 2009). It is hence
somewhat ironic that the use of internal and external validity subsequently contributed to
ingraining in economics the conception of an inside world of the laboratory versus an
outside world of reality.
3.2 Behavioral economics
The introduction of external and internal validity by experimental economists in the late
1980s and early 1990s coincided with the emergence of behavioral economics, the other
economic sub-discipline that employs experiments. Despite the fact that during the 1990s
and early 2000s behavioral economics and experimental economics came to oppose one
another on a number of issues (Heukelom 2011), behavioral economists adopted many of
the experimental techniques and language developed by the experimental economists,
including an aversion toward the use of deception, the use of monetary rewards, and the
use of internal and external validity.7 However, behavioral economists dropped the link
between external validity and parallelism and preferred to refer to internal and external
validity as a ‘psychological distinction’, in the sense of ‘from the psychological literature’
(e.g. Camerer 1996; Loewenstein 1999; Camerer and Loewenstein 2004; Samuelson
2005). As in experimental economics, validity was not extensively discussed in behavioral
economics, as compared with the extensive discussions in psychology. When it was
discussed, however, it was framed in terms of internal validity and external validity, in
which internal validity, roughly and without much discussion, referred to the validity of the
inferences drawn on the basis of the experimental observations, and external validity
referred to the generalizability of the observations and inferences.
In a wonderful historical twist, Loewenstein (1999), the most extensive and explicit
behavioral economics discussion of validity in the 1990s, used the ‘psychological
distinction’ of internal and external validity to criticize experimental practice of
experimental economists. In ‘Experimental Economics from the Vantage-Point of
Behavioural Economics’, Loewenstein (1999) positioned behavioral economics explicitly
in opposition to experimental economics. Under the heading of external validity,
Loewenstein saw four problems with experimental economics. First, experimental
economics put great emphasis on the use of auctions in its experiments. As people in
reality hardly ever find themselves in an auction situation, it is doubtful that these
experiments can tell us very much about economic behavior in the real world. Second,
Loewenstein disagreed with experimental economists’ use of repetition in what he called
the Ground Hog Day argument, following Camerer (1996). In reality, Loewenstein
argued, people never make the exact same decision 40 times in a row. Real-world behavior
is much more like the first few rounds of an experiment compared with the last two or three
rounds.
Third, Loewenstein criticized experimental economists for their tendency to reduce
real-world content to the absolute minimum possible. Apart from the fact that a context-
free experiment is an illusion, Loewenstein argued that it also greatly reduces the
external validity of the experiments. Instead, economists should, as Loewenstein
himself did, make the experimental situation as congruent with reality as possible, and hence
make the experiment ‘context-rich’. Fourth, according to Loewenstein experimental
economists wrongly assumed that monetary rewards result in strict control over
incentives. With monetary incentives, subjects are also likely to be driven by other
motives than profit maximization, he argued. Finally, one problem concerning internal
validity that Loewenstein observed was that experimental economists had been far too
careless in not using randomization and in comparing the experimental results that had
been obtained under different circumstances. Loewenstein’s discussion illustrates that
by the late 1990s internal and external validity had emerged as household concepts for
experimental methodology in both experimental and behavioral economics. They were
not given extensive treatment or definition, but had emerged as two contrasting
concepts in which terms experimentalists in economics could think about their
experiments.
3.3 The analytical definition of internal and external validity in economics
Thus, in the 1980s and 1990s external validity and internal validity were loosely introduced
in the methodology of experiments in economics. Although the economists who conducted
and discussed experiments seemed to be comfortable with the meaning of both terms and with the
relationship between the two, internal and external validity were never given extensive
discussion or definition. This changed in the late 1990s and early 2000s when Francesco
Guala made validity the key methodological concern of economic experiments (Guala
1999, 2003, 2005; Guala and Mittone 2005).
Guala first of all drew a sharp line between an inside world of the experiment and an
outside, real world, thus implicitly arguing against the position of the experimental
economists of the 1960s–1980s. Experimentalists draw ‘inferences within the
experiment’, Guala argued, in which experiments are to be understood as ‘very special
settings, which are rarely if ever instantiated in the “real world” outside the laboratory’
(Guala 2005, p. 141, emphasis in the original). Internal validity, according to Guala, was
about the validity of the inferences in the inside world of the laboratory, whereas external
validity was about the relation of these inferences to the real outside world. Internal
validity ‘is achieved when some particular aspect of a laboratory system [ . . . ] has been
properly understood by the experimenter’ (Guala 2005, p. 142). Experiment E is internally
valid when variation in Y is known to be caused by X. It is in addition externally valid if
‘X causes Y not only in E, but also in F, G, H, etc’ (Guala 2005, p. 142). Second, Guala
posited a trade-off between internal and external validity:
The stronger an experimental design is with respect to one validity issue, the weaker it is likely to be with respect to the other. The more artificial the environment, the better for internal validity; the less artificial, the better for external purposes. (Guala 2005, p. 144)
Guala defined a trade-off as ‘likely’ in the first sentence, although the second sentence
is less ambiguous. What is important, however, is that the trade-off has been understood as
strict and well defined by other experimentalists who have adopted the internal–external
validity distinction (e.g. Samuelson 2005; Schram 2005; Bardsley et al. 2009). For
instance, Schram’s (2005) ‘Artificiality: The Tension Between Internal and External
Validity in Economic Experiments’ constructed a framework similar to Guala’s, and furthermore
relied for its terminology on Loewenstein (1999). Schram interpreted Guala to be
positing a strict trade-off between internal and external validity, and retraced it directly to
psychology.
There is an obvious tension between [internal and external validity]. Where internal validity often requires abstraction and simplification to make the research more tractable, these concessions are made at the cost of decreasing external validity. Loewenstein (1999) points out that while this tension is a starting point in learning research methods in psychology, the discussion is often completely neglected by economists. (Schram 2005, p. 226)
In addition, Schram like Guala invoked a distinction between the laboratory world of
the experiment and the real world outside the experiment. A key issue in economic
methodology of experimentation, according to Schram, was the issue of the ‘artificiality of
the laboratory situation’, in which ‘the [artificiality] question is whether the stylized form
of experimental institutions allows for conclusions pertaining to the “real world”’ (Schram
2005, p. 226). As said, in the experimental economic literature the outside versus inside
dichotomy is often linked to this notion of ‘artificiality’ (e.g. Starmer 1999; Bardsley 2005;
Schram 2005), in which artificiality refers to the alleged artificial world within the
laboratory that is different, and therefore perhaps not comparable, to the ‘real’, or ‘natural’
world outside the laboratory. In other words, the problem of external validity is understood
as a consequence of the problem of artificiality.8
With the analytical treatment by Guala and others, internal and external validity came
to the center of a logic of experimentation in economics. Where experimental economists
of the 1980s and early 1990s such as Vernon Smith conceived of validity as a purely
empirical question of the comparability between experimental outcomes in different
settings, in the 2000s validity became the way to logically connect the experimental setup
to the experimental results, and to logically connect the experimental results derived in the
‘inside’ world of the experiment to the ‘outside’ world of reality. At the end of the first
decade of the twenty-first century, with the distinction between experimental and
behavioral economics gradually dissolving and with other experimental and empirical
procedures emerging, the analytical take on validity provides a prominent way of thinking
about experimentation in economics.
Finally, the treatment of internal and external validity in economics in recent years has
provided an argument in the distinction between laboratory experiments and field
experiments (e.g. Harrison and List 2004; Carpenter, Harrison, and List 2005; others). The
reasoning is that laboratory experiments allow for a relatively large amount of control, and
therefore provide a high degree of internal validity. The cost of this high internal validity,
however, is that laboratory experiments yield a relatively low external validity.
Experimental practice in economics therefore leaves room for a method that offsets
the low external validity of laboratory experiments. This is where field experiments come
in, which lack the high internal validity of laboratory experiments, but in return offer a
high degree of external validity. That said, it should immediately be added that the recent
literature also seeks ways to go beyond internal and external validity for the case of field
experiments (e.g. Harrison and List 2004; Harrison 2005; Levitt and List 2007).
4 Conclusion
The history of validity in psychology and economics adds a new case-study to the
literature that investigates how scientific tools travel across scientific boundaries (e.g.
Galison and Stump 1996; Galison 1999; Gieryn 1999; Morgan and Howlett 2010). In
addition, it illuminates present usage and meaning of validity in economic experimenting
(Loewenstein 1999; Harrison and List 2004; Samuelson 2005; Jimenez-Buedo and Miller
forthcoming). The common English concept of validity was first given a more specifically
scientific meaning by psychologists in the early twentieth century in the contexts of
psychological tests, in which validity denoted the accuracy of a test and was contrasted
with the reliability or precision of a test. Following the classification of different
validity-types in the APA’s Technical Recommendations (1954), validity traveled from
psychological tests to psychological experiments through the work of Campbell
(Campbell 1957; Campbell and Fiske 1959; Campbell and Stanley 1963). Thus, the idea
that experiments, too, could be more or less valid or accurate was introduced. In addition, a
distinction was made between the internal and the external validity of an experiment.
Many subsequent discussions in experimental methodology in psychology and economics
essentially have been about the question of what it means to say that an experiment is more or
less internally or externally valid, i.e. accurate.
Of the two domains in which validity was employed in psychology, only the internal
and external validity of experimental methodology traveled to economics. The initial
implementation of validity in economic experimentation was reluctant, and focused upon
showing how external validity in particular was not problematic in economic experiments
(e.g. Brookshire, Coursey, and Schulze 1987). However, the gradual adoption of validity
by economists working with experiments eventually led to the clear, analytical definition
of internal and external validity by Guala (Guala 1999, 2003, 2005; Guala and Mittone
2005), a definition that was subsequently taken over by other economists (e.g. Harrison
and List 2004; Bardsley 2005; Samuelson 2005; Schram 2005; Jimenez-Buedo and Miller
forthcoming). Thus, the application of validity, or accuracy, to the domain of
psychological experimenting and its loose classification into internal and external
validity, eventually became in economics the strict twin concepts by which the
meaningfulness of experiments was assessed. In other words, in its travels from
psychological tests to psychological experiments to economic experiments the concept of
validity generally retained its meaning as the accuracy of a scientific procedure. At the
same time, however, it was put to use in dissimilar ways and elicited different discussions
in the scientific realms in which it was applied.
Notes
1. Another issue is that one may discuss validity/accuracy and reliability/precision without using those or similar terms. An example is Boring’s well-known ‘Intelligence as the Tests Test it’ (1923), which assesses the meaning of the then recently emerged intelligence tests. It was published a few years before psychologists developed the validity–reliability vocabulary to discuss the interpretation of tests.
2. A quasi-experiment is an experiment in which the experimenter has little control over the causal factors. A quasi-experiment is to be situated in between pure laboratory experiments and correlational research, and is similar to what in the 2000s in economics has been labeled a field experiment.
3. A JSTOR search of economic journals reveals a handful of occasional references to either internal or external validity in the 1950s–1970s, in journals such as The Journal of Economic Education and The Journal of Human Resources.
4. In this paper, I focus on laboratory experiments in experimental and behavioral economics. Recent developments, such as experimenting in financial accounting (e.g. Libby, Bloomfield, and Nelson 2002), are left for another occasion.
5. Arguments similar to those in Plott (1982) can be found in Smith (1976, 1982), and in Wilde (1980). The most extensive presentation and refutation of artificiality and the associated real-world argument can be found in Starmer (1999).
6. A JSTOR search of economic journals yields a combination of internal validity and external validity in 3 articles in the 1980s, in 14 articles in the 1990s, and in 41 articles in the 2000s.
7. Arguably, the distinction between experimental and behavioral economics started to dissolve around the mid-2000s.
8. The intuition behind such a distinction is that in an experimental setting, i.e. in a laboratory, the scientist controls some factors of the world and thereby creates an artificial world inside the laboratory. As such, it risks confusing the physical operation of conducting an experiment with its ontological status. Experiments often are physically conducted ‘inside’ a laboratory: the scientist puts on her gloves and glasses, passes through an airlock and enters the laboratory. In economics and psychology similarly the scientist often physically enters the laboratory where the experiment is conducted. However, that does not mean that by doing so she enters a different, inside, or non-real world, as opposed to the real world outside. It simply means that she enters a place in the world specifically manipulated and controlled to conduct experiments. In a laboratory, the scientist is completely in our real, material world.
References
APA Committee on Test Standards (1952), ‘Technical Recommendations for Psychological Tests and Diagnostic Techniques: Preliminary Proposal’, The American Psychologist, 461.
——— (1954), ‘Technical Recommendations for Psychological Tests and Diagnostic Techniques’, Psychological Bulletin, 51(2), Supplement.
Bardsley, N. (2005), ‘Experimental Economics and the Artificiality of Alteration’, Journal of Economic Methodology, 12(2), 239–252.
Bardsley, N., Cubitt, R., Loomes, G., Moffatt, P., Starmer, C., and Sugden, R. (2009), Experimental Economics: Rethinking the Rules, Princeton, NJ: Princeton University Press.
Berkovitz, L., and Donnerstein, E. (1982), ‘External Validity is more than Skin Deep’, American Psychologist, 37, 245–257.
Boring, E.C. (1923), ‘Intelligence as the Tests Test It’, New Republic, 36, 35–37.
——— (1929), A History of Experimental Psychology, New York: The Century Co.
——— (ed.) (1937), A Manual of Psychological Experiments, New York: Wiley.
Boring, E.C. (1950), A History of Experimental Psychology (2nd ed.), New York: Appleton-Century-Crofts.
Boumans, M.J. (2005), How Economists Model the World into Numbers, London: Routledge.
Brewer, M.B. (2001), ‘Donald Campbell’, in Encyclopedia of Psychology, ed. A.E. Kadzin, Washington, DC: American Psychological Association: III, pp. 3–5.
Brookshire, D.S., Coursey, D.L., and Schulze, W.D. (1987), ‘The External Validity of Experimental Economics Techniques: Analysis of Demand Behavior’, Economic Inquiry, 25(2), 239–250.
Calder, B.J., Phillips, L.W., and Tybout, A.M. (1982), ‘The Concept of External Validity’, Journal of Consumer Research, 9, 240–244.
Camerer, C. (1996), ‘Rules for Experimenting in Psychology and Economics, and Why They Differ’, in Experimental Studies of Strategic Interaction: Essays in Honor of Reinhard Selten, eds. W. Albers, W. Guth, and E. Van Damme, Berlin: Springer-Verlag, pp. 313–327.
Camerer, C., and Loewenstein, G. (2004), ‘Behavioral Economics: Past, Present, Future’, in Advances in Behavioral Economics, eds. C.F. Camerer, G. Loewenstein, and M. Rabin, Princeton, NJ: Princeton University Press, pp. 3–52.
Campbell, D.T. (1957), ‘Factors Relevant to the Validity of Experiments in Social Settings’, Psychological Bulletin, 54, 297–312.
Campbell, D.T., and Fiske, D. (1959), ‘Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix’, Psychological Bulletin, 56, 81.
Campbell, D.T., and Stanley, J.C. (1963), ‘Experimental and Quasi-Experimental Designs for Research on Teaching’, in Handbook of Research on Teaching, ed. N.L. Gage, Boston, MA: Houghton Mifflin, pp. 171–246.
Carpenter, J.P., Harrison, G.W., and List, J.A. (2005), ‘Field Experiments in Economics: An Introduction’, in Research in Experimental Economics, Vol. 10, eds. J.P. Carpenter, G.W. Harrison, and J.A. List, Amsterdam: Elsevier, pp. 1–15.
Chang, H. (2004), Inventing Temperature: Measurement and Scientific Progress, Oxford: Oxford University Press.
Colvin, S.S. (1900), ‘The Fallacy of Extreme Idealism’, The American Journal of Psychology, 11(4), 511–526.
Cook, T.D., and Campbell, D.T. (1979), Quasi-Experimentation: Design & Analysis Issues for Field Settings, Boston, MA: Houghton Mifflin Co.
Cronbach, L.J., and Meehl, P.E. (1955), ‘Construct Validity in Psychological Tests’, Psychological Bulletin, 52, 281.
Danziger, K. (1997), Naming the Mind, How Psychology Found its Language, London: Sage Publications.
Davis, D.D., and Holt, C.A. (1993), Experimental Economics, Princeton, NJ: Princeton University Press.
Dimand, R.W. (2005), ‘Experimental Economic Games: The Early Years’, in The Experiment in the History of Economics, eds. P. Fontaine and R. Leonard, New York: Routledge, pp. 5–24.
Festinger, L., and Katz, D. (eds.) (1953), Research Methods in the Behavioral Sciences, Fort Worth, TX: Dryden Press.
Friedman, D., and Sunder, S. (1994), Experimental Methods: A Primer for Economists, Cambridge: Cambridge University Press.
Galison, P. (1999), ‘Trading Zone, Coordinating Action and Belief’, in The Science Studies Reader, ed. M. Biagioli, London: Routledge, pp. 137–160.
Galison, P., and Stump, D.J. (1996), The Disunity of Science: Boundaries, Contexts, and Power, Stanford, CA: Stanford University Press.
Gieryn, T. (1999), Cultural Boundaries of Science: Credibility on the Line, Chicago, IL: University of Chicago Press.
Gigerenzer, G. (1987), ‘Survival of the Fittest Probabilist: Brunswik, Thurstone, and the Two Disciplines of Psychology’, in The Probabilistic Revolution 2, eds. L. Kruger, G. Gigerenzer and M.S. Morgan, Cambridge: MIT Press, pp. 49–72.
Guala, F. (1999), ‘The Problem of External Validity (or “Parallelism”) in Experimental Economics’, Social Science Information, 38, 555–573.
——— (2003), ‘Experimental Localism and External Validity’, Philosophy of Science, 70, 1195–1205.
——— (2005), The Methodology of Experimental Economics, Cambridge: Cambridge University Press.
Guala, F., and Mittone, L. (2005), ‘Experiments in Economics: External Validity and the Robustness of Phenomena’, Journal of Economic Methodology, 12(4), 495–515.
Hammersley, M. (1991), ‘A Note on Campbell’s Distinction between Internal and External Validity’, Quality and Quantity, 25, 381–387.
Harrison, G.W. (2005), ‘Field Experiments and Control’, Research in Experimental Economics, 10, 17–50.
Harrison, G.W., and List, J.A. (2004), ‘Field Experiments’, Journal of Economic Literature, 27, 1013–1059.
Henshel, R.L. (1980), ‘The Purposes of Laboratory Experimentation and the Virtues of Deliberate Artificiality’, Journal of Experimental Social Psychology, 16, 466–478.
Heukelom, F. (2011), ‘What to Conclude from Psychological Experiments: The Contrasting Cases of Experimental and Behavioral Economics’, History of Political Economy, forthcoming.
Innocenti, A. (2008), ‘How Can a Psychologist Inform Economics? The Strange Case of Sidney Siegel’, DEPEID Working papers, 8/2008.
Jimenez-Buedo, M., and Miller, L.M. (forthcoming), ‘Why a Trade-off? The Relationship between the External and Internal Validity of Experiments’, Theoria. An International Journal for Theory, History and Foundations of Science.
Kadzin, A.E. (ed.) (2000), Encyclopedia of Psychology, Washington, DC: American Psychological Association.
Krueger, R.F., and Kling, K.C. (2000), ‘Validity’, in Encyclopedia of Psychology, ed. A.E. Kadzin, Washington, DC: American Psychological Association, pp. 149–153.
Kruglanski, A.W. (1975), ‘The Human Subject in the Psychological Experiment: Fact and Artifact’, in Advances in Experimental Social Psychology (Vol. 8), ed. L. Berkovitz, Chicago, IL: University of Chicago Press, pp. 101–147.
Lee, K.S. (2004), ‘Rationality, Minds, and Machines in the Laboratory: A Thematic History of Vernon Smith’s Experimental Economics’, Ph.D. dissertation, Notre Dame, University of Notre Dame, p. 300.
Lee, K.S., and Mirowski, P. (2008), ‘The Energy Behind Vernon Smith’s Experimental Economics’, Cambridge Journal of Economics, 32, 257–271.
Levitt, S.D., and List, J.A. (2007), ‘What Do Laboratory Experiments Measuring Social Preferences Reveal about the Real World?’ Journal of Economic Perspectives, 21(2), 153–174.
Libby, R., Bloomfield, R., and Nelson, M.W. (2002), ‘Experimental Research in Financial Accounting’, Accounting, Organization and Society, 27, 775–810.
Loewenstein, G. (1999), ‘Experimental Economics from the Vantage-Point of Behavioural Economics’, The Economic Journal, 109, F25–F34.
Lucas, J.W. (2003), ‘Theory-Testing, Generalizations, and the Problem of External Validity’, Sociological Theory, 21(3), 236–253.
McCrea, J., and Pritchard, H.J. (1897), ‘The Validity of the Psychophysical Law for the Estimation of Surface Magnitudes’, The American Journal of Psychology, 8(4), 494–505.
Morgan, M.S. (1990), The History of Econometric Ideas, Cambridge: Cambridge University Press.
Morgan, M.S., and Howlett, W.P. (eds.) (2010), How Well Do Facts Travel? Cambridge: Cambridge University Press.
Moscati, I. (2007), ‘Early Experiments in Consumer Demand Theory: 1930–1970’, History of Political Economy, 39(3), 359–401.
Peracchi, F. (2001), Econometrics, New York: Wiley.
Plott, C.R. (1982), ‘The Application of Laboratory Experimental Methods to Public Choice’, in Collective Decision Making: Applications from Public Choice Theory, ed. C.S. Russell, Baltimore, MD: Johns Hopkins University Press.
Porter, T. (1994), ‘Making Things Quantitative’, Science in Context, 7(3), 389–407.
Power, M. (1996), ‘Making Things Auditable’, Accounting, Organization and Society, 21(2/3), 289–315.
Rice, D.B., and Smith, V.L. (1964), ‘Nature, the Experimental Laboratory, and the Credibility of Hypotheses’, Behavioral Science, 9(3), 239–246.
Samuelson, L. (2005), ‘Economic Theory and Experimental Economics’, Journal of Economic Literature, 43, 65–107.
Samuelson, P.A. (1947), Foundations of Economic Analysis, Cambridge: Harvard University Press.
Savage, L.J. (1954), The Foundations of Statistics, New York: Wiley.
Schram, A. (2005), ‘Artificiality: The Tension between Internal and External Validity in Economic Experiments’, Journal of Economic Methodology, 12(2), 225–237.
Shapley, H. (1964), Of Stars and Men, Boston, MA: Beacon Press.
Smith, R.A., and Davis, S.F. (1997), The Psychologist as Detective: An Introduction to Conducting Research in Psychology, Upper Saddle River, NJ: Prentice Hall.
Smith, V.L. (1976), ‘Experimental Economics: Induced Value Theory’, American Economic Review, 66, 274–279.
——— (1982), ‘Microeconomic Systems as an Experimental Science’, American Economic Review, 72, 923–955.
——— (1992), ‘Game Theory and Experimental Economics: Beginnings and Early Influences’, in Toward a History of Game Theory, ed. E.R. Weintraub, London: Duke University Press, pp. 241–282.
Starmer, C. (1999), ‘Experiments in Economics: Should We Trust the Dismal Scientists in White Coats?’ Journal of Economic Methodology, 6(1), 1–30.
Stewart, M.B., and Wallis, K.F. (1981), Introductory Econometrics, Oxford: Blackwell.
Street, W.R. (1994), A Chronology of Noteworthy Events in American Psychology, Washington, DC: American Psychological Association.
Swanborn, P.G. (1981), Methoden Van Sociaal-Wetenschappelijk Onderzoek, Meppel: Boom.
——— (1994), ‘External Validity Abandoned?’, Quality and Quantity, 27, 211–215.
Thurstone, L.L. (1931), The Reliability and Validity of Tests: Derivation and Interpretation of Fundamental Formulae Concerned with Reliability and Validity of Tests and Illustrative Problems, Ann Arbor, MI: Edwards Bros.
Thye, S.R. (2007), ‘Logical and Philosophical Foundations of Experimental Research in the Social Sciences’, in Laboratory Experiments in the Social Sciences, eds. M. Webster and J. Sell, Amsterdam: Elsevier, pp. 57–86.
Vissers, G., Heyne, G., Peters, V., and Guerts, J. (2001), ‘The Validity of Laboratory Research in Social and Behavioral Science’, Quality and Quantity, 35, 129–145.
Wallis, W.A., and Friedman, M. (1942), ‘The Empirical Derivation of Indifference Functions’, in Studies in Mathematical Economics and Econometrics, In Memory of Henry Schultz, eds. O. Lange, F. McIntyre and T.O. Yntema, New York: Books for Libraries Press, pp. 175–189.
Webster, M., and Sell, J. (2007), ‘Why Do Experiments’, in Laboratory Experiments in the Social Sciences, eds. M. Webster and J. Sell, Amsterdam: Elsevier, pp. 6–24.
Weintraub, E.R. (ed.) (1992), Toward a History of Game Theory, London: Duke University Press.
Wilde, L. (1980), ‘On the Use of Laboratory Experiments in Economics’, in The Philosophy of Economics, ed. J. Pitt, Dordrecht: Reidel, pp. 137–148.
Willer, D., and Walker, H.A. (2007), Building Experiments, Testing Social Theory, Stanford, CA: Stanford University Press.
Woodworth, R.C., and Schlosberg, H. (1938), Experimental Psychology, New York: Holt, Rinehart and Winston.
Zelditch, M. (2007), ‘The External Validity of Experiments That Test Theories’, in Laboratory Experiments in the Social Sciences, eds. M. Webster and J. Sell, Amsterdam: Elsevier, pp. 87–112.