selection and mode effects in risk preference elicitation …ftp.iza.org/dp3321.pdf · 2008. 1....

IZA DP No. 3321

Selection and Mode Effects in Risk PreferenceElicitation Experiments

Hans-Martin von GaudeckerArthur van SoestErik Wengström

DI

SC

US

SI

ON

PA

PE

R S

ER

IE

S

Forschungsinstitutzur Zukunft der ArbeitInstitute for the Studyof Labor

January 2008

Selection and Mode Effects in

Risk Preference Elicitation Experiments

Hans-Martin von Gaudecker Free University Amsterdam and Netspar

Arthur van Soest

Tilburg University, Netspar and IZA

Erik Wengström University of Copenhagen

Discussion Paper No. 3321 January 2008

IZA

P.O. Box 7240 53072 Bonn

Germany

Phone: +49-228-3894-0 Fax: +49-228-3894-180

E-mail: [email protected]

Any opinions expressed here are those of the author(s) and not those of IZA. Research published in this series may include views on policy, but the institute itself takes no institutional policy positions. The Institute for the Study of Labor (IZA) in Bonn is a local and virtual international research center and a place of communication between science, politics and business. IZA is an independent nonprofit organization supported by Deutsche Post World Net. The center is associated with the University of Bonn and offers a stimulating research environment through its international network, workshops and conferences, data service, project support, research visits and doctoral program. IZA engages in (i) original and internationally competitive research in all fields of labor economics, (ii) development of policy concepts, and (iii) dissemination of research results and concepts to the interested public. IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character. A revised version may be available directly from the author.

mailto:[email protected]

IZA Discussion Paper No. 3321 January 2008

ABSTRACT

Selection and Mode Effects in Risk Preference Elicitation Experiments*

We combine data from a risk preference elicitation experiment conducted on a representative sample via the Internet with laboratory data on students for the same experiment to investigate effects of implementation mode and of subject pool selection. We find that the frequency of errors in the lab experiment is drastically below that of the representative sample in the Internet experiment, and average risk aversion is also lower. Considering the student-like subsample of the Internet subjects and comparing a traditional lab design with an Internet-like design in the lab gives two ways to decompose these differences into differences due to subject pool selection and differences due to implementation mode. Both lead to the conclusion that the differences are due to selection and not implementation mode. An analysis of the various steps leading to participation or non-participation in the Internet survey leads shows that these processes are selective in selecting subjects who make fewer errors, but do not lead to biased conclusions on risk preferences. These findings point at the usefulness of the Internet survey as an alternative to a student pool in the laboratory if the ambition is to use the experiments to draw inference on a broad population. JEL Classification: C90, D81 Keywords: risk aversion, internet surveys, laboratory experiments Corresponding author: Erik Wengström Department of Economics University of Copenhagen Studiestræde 6 DK-1455 Copenhagen Denmark E-mail: [email protected]

* Financial support from the Dutch National Science Foundation (NWO), the Swedish Institute for Banking Research (Bankforskningsinstitutet), and from the European Union under grant HPRN-CT-2002-00235 (RTNAGE) is gratefully acknowledged. We thank the team of CentERdata, especially Marika Puumala, for their support with the experiments, as well as Morten Lau and Joachim Winter for very helpful comments on the experimental design. The analysis benefitted from comments received at presentations in Mannheim, Copenhagen, Gothenburg, the XIIth FUR conference at LUISS in Rome and the ESA meetings at Nottingham and Tucson. Daniel Kemptner provided able research assistance.

mailto:[email protected]

1 Introduction

Recently, there has been an increased interest in eliciting economically important preference

parameters by means of experimental methods (Harrison, Lau, and Williams (2002), Ble-

ichrodt, Pinto, and Wakker (2001), among many others). In this context, researchers are

generally interested in parameters that are valid for the general population and carry over to

situations outside the laboratory setting. There are several reasons why it may be difficult

or impossible to recover such parameters from standard experiments. First, the experimental

design could differ too widely from real-world situations in terms of context, stakes, or sim-

ilar features. The literature investigating this type of effect is reviewed in Harrison and List

(2004) and Levitt and List (2007). Second, the subjects taking part in the experiment may

not resemble the population of interest. There has been a growing concern that the standard

recruitment procedure – an experimenter inviting college students via emails or posters – may

restrict socio-demographic variation too severely as to allow meaningful inference on the broad

population of interest. Spurred by Harrison, Lau, and Williams (2002), this issue has been

addressed in several recent field studies. However, there is a second type of selection effect

that also applies if recruitment is broader than among students and is largely out of control

of the experimenter: Participation in experiments is voluntary and may be selective, so that

the participating subjects may differ from the sampling population in relevant dimensions.

In this paper, we address both types of selection effects.

Recent years have witnessed different approaches to enhance demographic variation in

experimental situations. One rather laborious possibility is to take the laboratory from the

university to the population of interest; for one of many examples along these lines see Har-

rison, List, and Towe (2007). Sample sizes usually do not exceed those typically encountered

in the laboratory which may pose a problem in accounting for demographic heterogeneity.

A similar strategy that has become available recently is to integrate experiments into ex-

isting household surveys; see for example the pioneering work by Fehr et al. (2003) and

Dohmen et al. (2005). Major advantages of this approach are that careful sampling frames

are employed and that a lot of background information on participants is available. Until

now, capacity constraints in the survey instruments and the relatively high costs of personal

interviews have hindered a more widespread use of this method. The two cited studies were

able to use moderately-sized subsamples from the much larger German Socio-Economic Panel

(N=429 and N=450, respectively). Third, experimenters have used convenience samples of

Internet respondents recruited by means of newspaper advertising or email-invitations. See,

e.g., Lucking-Reiley (1999) and Guth, Schmidt, and Sutter (2007). While this approach fa-

cilitates conducting experiments with very large numbers of participants, there is essentially

no control over the recruitment process in most cases. Experimenter-induced selection may

2

arise from reading a particular newspaper, the necessity of having access to the Internet, or

subscribing to a specific electronic mailing list.

We combine the advantages of the last two approaches in conducting an experiment

with a large sample (2,299 subjects) of respondents from a Dutch household survey, the

CentERpanel. This is carried out over the Internet and avoids non-coverage of those without

Internet access by providing them with a set-top box for their TV. In order to investigate

the traditional experimenter-induced subject pool bias, we compare the Internet outcomes to

those of parallel laboratory experiments with 178 students. However, replacing the laboratory

by the Internet also changes the environment, unlike the case of comparisons based on a “mo-

bile laboratory” approach (Andersen et al. 2005). Potential differences due to demographic

variation can therefore be confounded with implementation mode effects. We address this

issue from two angles. First, we introduce a treatment in the laboratory that replicates the

Internet setting as closely as possible. In particular, no experimenter is present while subjects

complete the experiment and there is no restriction to wait for the last person to finish before

leaving the room. Second, our Internet sample is sufficiently large to analyse a subsample

that resembles the student population in terms of age and education. If environmental factors

play a role, they should be revealed when comparing results from this subsample to results

for the laboratory experiment.

Section 2 contains an extensive description of our data and the experimental setup. The

issue of experimenter-induced selection effects is taken up in section 3. We find that when

moving from student samples to the general population, the most dramatic difference is a

drastic rise in the number of violations of the most basic economic principles, namely choosing

dominated options and non-monotonic behaviour. Risk aversion also turns out to be higher

in the overall population. We cannot detect any differences arising from the environmental

treatment for the young and educated, which leads us to conclude that the differences are

driven by subject pool effects.

Selection effects stemming from voluntary participation have not received much attention

until recently, see Harrison, Lau, and Rutstrom (2007) for a primer. The main reason for this

is probably that there is little control over the recruitment process in most cases. Experi-

menters typically collect some demographic information about participating subjects, but the

corresponding values of nonparticipants are unobserved. A crucial feature of our setup is that

we have access to rich background information for participants as well as nonparticipants.

This allows us to estimate a model of selection into the experiment (Section 4). We find

higher participation rates if incentives are provided. Participation is also larger for the more

educated, the non-elderly, males, and for those household members with most interest and ex-

pertise in financial matters. This induces some association of participation with inconsistent

3

behaviour – participants typically have observed characteristics that make them less prone

to making mistakes. But estimates of average risk preferences do not seem to be seriously

affected by selective participation.

2 Data and Experimental Setup

This section provides a detailed description of our experimental design and subject pools. The

starting point of the experiments is the multiple price list format, a well-established method-

ology for preference elicitation, which we modify in two ways. First, to help respondents

understand their tasks, we include pie-chart representations of the probabilities in addition

to the representations in numbers. Second, the experiment is designed to elicit not only risk

aversion, but also two additional preference parameters - reflecting loss aversion (Kahneman

and Tversky 1979) and uncertainty resolution preferences (Kreps and Porteus 1978). In this

paper, however, we focus on risk aversion.

We first describe the multiple price list format and how we implement it. We then point

out the aspects of the experiment that are specific to the Internet and laboratory settings,

respectively. In particular, we highlight the features of our design aimed at disentangling

subject pool (”selection”) effects and implementation method (”mode”) effects. The most

important of these is the introduction of two environmental treatments in the laboratory. One

replicates traditional experiments, the other mimics the Internet setting as much as possible.

We term them “Lab-Lab” and “Lab-Internet” to avoid confusion with the CentERpanel

experiment (also denoted as “Internet experiment”). The full experimental instructions,

samples of choice problems, help screens, final questions, and the debriefing part are available

at http://www.econ.ku.dk/wengstrom/.

2.1 The Multiple Price List Format

The experiments were conducted using an adapted version of the multiple price list format,

introduced in economics by Binswanger (1980) and recently employed in the context of risk

preferences by Holt and Laury (2002). An extensive description can be found in Andersen

et al. (2006). In principle, multiple price lists work as follows: Each subject is presented a

series of lotteries with identical payoffs but varying probabilities such as the one presented

in Figure 1. In each of the four cases, the participant can choose between Option ‘A’ and

Option ‘B’. The table is designed such that the expected payoff of lottery ‘A’ starts out higher

but moves up slower than the expected payoff of lottery ‘B’. A participant with monotone

preferences switches at some point from the safer Option ‘A’ to the riskier Option ‘B’, or

4

chooses ‘B’ throughout. This is because the last row amounts to a choice between two certain

payoffs, with that of Option ‘B’ higher than that of Option ‘A’.

A modification compared to previous studies is that we include pie-charts describing the

probabilities of the outcomes, in addition to the verbal descriptions of the decision tasks.

Pilot experiments showed that this was appreciated by subjects who were not familiar with

probability judgements. Using a design with low cognitive demands seems important when

moving outside the traditional laboratory environment. Because of the extra screen space

needed for the graphical probability representations, we reduced the number of decision tasks

per screen from the usual ten to four, avoiding that respondents need to scroll. To obtain

precise responses despite this, subjects who are consistent in the sense that they do not switch

back and forth between ‘A’ and ‘B’ and choose the higher certain payoff in the final question

are routed to a second screen containing lotteries with the same payoffs but a refined probabil-

ity grid. The probability grid of the second screen involves 10%-steps located approximately

between their highest choice of ‘A’ and their lowest choice of ‘B’ on the previous screen. This

is a version of the iterative multiple price list format described in Andersen et al. (2006).

Each subject faced seven payoff configurations, described in Table 1. For each configu-

ration subjects make either four or eight decisions, depending on their answers on the first

screen. Some of the riskier option ‘B’ lotteries involved negative payoffs, while payoffs from

the safer option ‘A’ were all strictly positive. The actual payments were always made three

months after the experiment and subjects were informed about this in the introduction. At

the top of each screen, we indicated whether the outcome of the lottery was revealed imme-

diately or in three months’ time (see Figure 1).

Subjects were randomly assigned to one of three groups with different payoff treatments:

groups with hypothetical and real lotteries with the amounts shown in Table 1, and one group

with real payoffs but amounts divided by three. We refer to these as hypothetical, high, and

low incentive treatments. All subjects in the high and low incentive groups received an upfront

payment of 15 or 5 Euros, respectively. No payment at all was made to the hypothetical

group of the CentERpanel experiment. The laboratory subjects in the hypothetical group

received a participation fee of 5 Euros for recruitment reasons. In the incentives treatments,

everyone received the participation fee, but only one in ten subjects got paid in addition for

one of the lotteries. The lottery to be paid out was selected at random to ensure incentive

compatibility. In order to avoid negative payoffs, the highest possible loss did not exceed the

fixed participation fee. We randomised the order in which the seven payoff configurations

were presented. In an effort to remain close to earlier work, the first payoff configuration in

the low incentive treatment is a scaled version of the payoff configuration in Table 1 of Holt

and Laury (2002), multiplied by six and rounded to the next lowest integer. The other payoff

5

configurations are derived in a similar way from those used by Holt and Laury.

2.2 The CentERpanel Experiment

The subjects in the Internet experiment are respondents in the CentERpanel,1 aged 16 and

older. The CentERpanel is managed by CentERdata, a survey research institute affiliated

with Tilburg University. The panel contains more than 2,000 households and covers the

complete Dutch population, excluding the institutionalised. Questionnaires and experiments

are fielded over the Internet. To avoid selection bias, households without Internet access

are provided with a set-top box for their TV (and with a TV if they do not have that

either). Panel members get questions every weekend. They are reimbursed for their costs of

participation (fees of dial-up Internet connections etc.) on a regular basis. We conducted our

experiments in November and December of 2005. Our payments were included in one of the

regular transactions three months after the experiments.

The welcome screen contained a brief introduction to the experiment followed by a non-

participation option. See Figure 2 for the introductory screens of all treatments. For the

treatments with real incentives, subjects were told the amount of the participation fee and

that they had the chance to win substantially more or lose (part of) this money again. It

was made clear that no payment would be made upon nonparticipation. In the hypothetical

treatment, subjects were informed that the questionnaire consisted of choices under uncer-

tainty in a hypothetical setting. In all treatments, subjects then had to indicate whether they

wanted to participate or not. Respondents who opted for participation first went through two

pages of online instructions before facing the seven price list configurations. The instructions

and specially designed help screens could be accessed throughout the experiment. They were

included to improve comparability with similar laboratory experiments, compensating for the

absence of an experimenter.

In total, 2,299 persons logged into the system. About 12.7% opted for nonparticipation,

leaving 2,008 respondents who started the experiment. 80 subjects dropped out before com-

pleting the questionnaire. Moreover, 138 respondents went through the experiment extremely

rapidly. Those who took less than 5:20 minutes are treated as dropouts in the analysis be-

low (see Section 4 for more details about the choice of cut-off point). Our final sample thus

consists of 1,790 subjects who made 91,808 choices.

The first three columns of Table 2 list descriptive statistics for the participants who1For related papers using data collected through the CentERpanel see Donkers, Melenberg, and van Soest

(2001) who analysed risk preferences using hypothetical questions, and Bellemare and Kroger (2007) for ev-

idence from a trust game with real payoffs. More information about the CentERpanel can be found at

http://www.uvt.nl/centerdata/.

6

completed the experiment (”final sample”), those who opted for nonparticipation, and those

who dropped out in the course of the experiment or sped through it. As expected, the

three groups differ in many respects. The variables in Table 2 can be broadly classified into

six groups: Incentive treatment; education; sex and age; employment status and residential

area; financial literacy and experience; income. Some of the questions, particularly those

on assets and financial literacy and experience, are drawn from the DNB household survey

(DHS), a survey focusing on financial issues held among CentERpanel respondents once a

year. Not everybody in our sample took part in the DHS and sample sizes fall if we include

the corresponding variables in the analysis.

2.3 The Laboratory Experiment

In order to compare the answers in the Internet survey to those in the environment of a

controlled laboratory experiment, we performed the same experiment in the economics labo-

ratory at Tilburg University. In total, 178 students participated (8 sessions in September 2005

and 8 sessions in May 2006). The same treatments were carried out as in the Internet survey.

The only difference was the above-mentioned payment of a show-up fee in the hypothetical

treatment. The payment procedure for the incentives treatments was as in the CentERpanel

experiment: The participation fee was transferred to participants’ bank accounts three months

after the experiment. One in ten subjects received the sum of the participation fee and the

(possibly negative) payment from one (randomly drawn) lottery.

To distinguish effects due to different subject pools from effects stemming from replac-

ing the controlled laboratory setting by the Internet environment, we also replicated this

latter change as good as possible in the lab. The first environmental treatment, labelled

the “Lab-Lab” treatment, was designed to replicate the traditional setup used in laboratory

experiments. In particular, an experimenter was present in the room to help the subjects

and answer questions. In contrast to the CentERpanel experiment, no links to the instruc-

tions and to the help screens were shown in the core part of the experiment. Otherwise, the

screens resembled the one shown in Figure 1. Participants also had to wait until everyone

else in the session had finished before they could leave the room. In the second environmental

treatment – termed the “Lab-Internet” treatment – the experimenter was not present. In-

stead subjects had access to the same help screens (including the introductory screens) as in

the CentERpanel experiment. Moreover, subjects could leave directly after completing the

experiment without having to wait for everyone else.

The last column of Table 2 contains the available demographic characteristics of the

laboratory subjects. Much less information is available than for the CentERpanel experiment,

and there is less variation among the basic demographic characteristics. Specifically, in terms

7

of age and education, the laboratory population represents less than five percent of the sample

in the first three columns.

3 Traditional Subject Pool Bias in the Laboratory

This section addresses “selection” or “subject pool bias”: the concern that the results of

standard laboratory experiments are not representative for a broader, heterogeneous, popu-

lation, since samples of students do not cover the population at large. Our design enables

analysing subject pool bias in laboratory experiments controlling for implementation mode

effects. With regard to elicitation and modelling of preferences, two issues are of special inter-

est. The first is the structure and frequency of errors and violations of fundamental principles

of choice. Several recent studies have highlighted the importance of accounting for errors in

the decision-making process when modelling risky choice (cf. Hey (1995), Loomes, Moffatt,

and Sugden (2002), and Schmidt and Neugebauer (2007)). The second concerns the extent

to which the distribution of preferences depends on the composition of the subject pool.

3.1 Errors and Inconsistencies

We classify three choice patterns as inconsistent. First, a dominance violation occurs if

somebody chooses option ‘B’ when the probability for the high outcome is zero or option ‘A’

when this probability is one. The second type of inconsistency emerges when subjects switch

back and forth from choosing ‘A’ and ‘B’ on the same screen. The third category of violations

consists of inconsistent choices on the initial and follow-up screens with the same payoffs and

probabilities. As explained above, we use an iterative version of the multiple price list format,

where after four choices on the first screen of each payoff configuration, subjects get a second

screen with the same payoffs but with a finer grid of probabilities. There was some overlap

of probabilities on the two screens, so subjects could make a choice on the second that was

inconsistent with their choice on the first screen.

We consider the average number of violations as a summary statistic for error frequencies.

Only one violation per subject is counted for each payoff configuration, limiting the maximum

amount of mistakes to seven. In Figure 3, the average number of inconsistent choices is

presented by sample. The error frequency is much higher in the Internet experiment than

in the Lab experiment: 2.43 (first bar) versus 1.21 (second bar). The null hypothesis that

underlying population frequencies of the number of inconsistencies are the same is rejected

using a Mann-Whitney (MW) or Kolmogorov-Smirnov (KS) nonparametric tests (two-sided

p-values< 0.01).

8

On the other hand, it is evident from Figure 3 that average error frequencies are very

similar in the two Laboratory treatments: 1.29 in the “Lab-Internet” treatment (fourth bar)

and 1.14 in the “Lab-Lab” treatment (fifth bar). We cannot reject the null hypothesis of

identical underlying distributions using the MW or KS test (two-sided p-values> 0.3). This

suggests that the disparity between the lab and the Internet does not stem from the different

environments under which the experiments were conducted. The higher frequency of viola-

tions observed in the Internet treatment is not driven by the presence of an experimenter or

the ability to leave immediately after finishing the experiment.

While this already suggests that differences between Internet and Lab inconsistencies are

driven by subject pool bias rather than implementation mode effects, one may still doubt

whether the CentERpanel setting has been perfectly replicated in the “Lab-Internet” treat-

ment. Hence, we also compare the laboratory sample with a sub-sample of the Internet sample

that has similar characteristics as the laboratory sample, thus at least partially controlling for

subject pool selection. We label this sub-sample “Internet-Uni”. Its 96 observations comprise

respondents between 18 and 34 years of age who hold a university degree or study at a univer-

sity. Behaviour of the Internet participants in this sub-sample resembles that of the student

sample: The average number of violations in the “Internet-Uni” sub-sample is 1.28 (third

bar in Figure 3), close to laboratory means of 1.29 and 1.14 (the differences are insignificant

according to KS and MW tests). This confirms that the higher frequency of errors in the

Internet treatment is driven by the different composition of the subject pool.

Table 3 displays the frequencies of the different types of errors as a percentage of the

number of possible violations. The pattern found in the laboratory is similar to results

reported by Loomes, Moffatt, and Sugden (2002) on a different risky choice design: very

few dominance violations but many more inconsistencies when faced twice with the same

decision problem. Inconsistencies between screens constitute more than 70% of all consistency

violations in both the Lab-Lab and the Lab-Internet treatment. Our results indicate that this

changes dramatically when the general population is considered, where dominance violations

play a much larger role (more than 38% of all consistency violations), though the numbers

of within and between screens inconsistencies are also larger than in the lab. As above,

the figures suggest that the difference between the laboratory and the internet is mainly

driven by subject pool effects. The error frequencies of the young and well educated in

the “Internet-Uni” group resemble those of the laboratory samples. The only discrepancy

concerns dominance violations, which appear to be slightly more common in the “Internet-

Uni” sample than in the laboratory samples. However, using individual-level frequencies of

dominance violations, we cannot reject the null hypothesis of identical underlying distributions

(MW two-sided p-values> 0.05; KS two-sided p-values> 0.6). Taken together, the frequencies

in Table 3 confirm that findings for student samples in the lab cannot always be generalised

9

to the general population.

Finally, we checked whether providing monetary incentives makes subjects take more care

in answering the questions and make fewer errors. We find no evidence of this – differences

between incentive treatments are not significant according to the MW and KS tests.

3.2 Preferences

We first consider the fraction of subjects choosing the safe option for a given probability of

the high outcome. In order to get comparable data across probabilities we restrict attention

to choices on the first screen. Subjects in the low incentive treatment are excluded, since they

faced a different payoff scale and cannot be directly compared with the other treatments. We

aggregate the data over the seven decision problems (looking at subgroups does not lead to

additional insights). Comparing the answers between the laboratory and the Internet, it is

evident that there are considerable differences – see Figure 4. Except for the case of a 0.25

probability of the high outcome, the fractions of risky choices are higher in the laboratory than

in the Internet experiment. The figure also suggests that the decisions in the lab are more

sensitive to the probability of the high outcome than in the Internet experiment. There are

two explanations for the smaller slope in the CentERpanel experiment: More heterogeneity

in preferences or higher error rates. When deleting payoff configurations with dominance

violations the pattern persists, suggesting that more dominance violations are not the only

explanation. Still, disentangling the two explanations calls for a structural model of choice

at the level of the individual, which is beyond the scope of this paper.

We can check whether the differences are due to subject pool or implementation mode

effects. First, Figure 5 shows that there are hardly any differences between the choice fre-

quencies in the “Lab-Lab” and “Lab-Internet” treatments. Second, Figure 6 shows that the

pattern in the “Internet-Uni” sub-sample of CentERpanel is similar to the pattern in the

laboratory experiment. Both figures indicate that the main driving force for the difference

between lab and Internet is the subject pool composition rather than the environment in

which the experiments were conducted.

To obtain a simple measure of individual preferences we consider at which probabilities

subjects switched from (the safer) option ‘A’ to (the riskier) option ‘B’ in each payoff config-

uration. Similar measures have been used in earlier studies, cf., e.g., Holt and Laury (2002).

We can only compute bounds that will at best be a 5%-interval (e.g. between 75% as the

highest ‘A’-choice on the first screen and 80% as the lowest choice of ‘B’ on the second screen).

In many cases, the bounds are substantially wider because of the inconsistencies discussed in

section 3.1. We computed the bounds as follows: the lowest possible switch point is defined

10

as the highest probability corresponding to an ‘A’ choice that is still lower than the minimum

probability with a ‘B’ choice; the upper bound is the minimum probability with a ‘B’ choice

that is still higher than the maximum probability where option ‘A’ was chosen. If only choice

‘A’ (‘B’) was observed, both upper and lower bound were set to one (zero).

For each individual, we averaged the upper bounds and the lower bounds across the seven

payoff configurations. This leaves us with two preference measures per individual – the higher

the measure, the more averse the subject is to more risky choices. To save space, we just

report results using the midpoint of the two bounds. All results remain qualitatively the same

if we use the upper or lower bounds or if we discard all payoff configurations with inconsistent

choices.

The average switch point of 70.4 in the Internet experiment is considerably higher than the

corresponding figure of 61.5 for the laboratory experiment. Moreover, the difference between

the two samples is found for all seven decision problems. Using the MW and KS tests we find

that the differences between the laboratory and Internet samples are highly significant, both

comparing averages across all questions or looking at each question separately (two-sided

p-values < 0.01). This is in contrast with Andersen et al. (2005), who find no significant

difference between average risk aversion in a laboratory and a field experiment.

To disentangle subject pool bias and implementation mode effects as explanations for the

observed differences in risk preferences, we also compared the average switch points in the

“Lab-Lab” treatment and the “Lab-Internet” treatment. The MW and KS tests (using mean

switch points across all payoff configurations or each payoff configuration separately), do not

reject the null hypothesis that there is no difference (two-sided p-values > 0.10). The switch

points are actually slightly higher in the “Lab-Lab” treatment, suggesting that the observed

difference between the laboratory and the Internet experiments is not due to characteristics

of the laboratory setting – in that case we would expect the difference to go in the other

direction.

A similar picture emerges when we compare the average switch points in the “Internet-

Uni” subsample to those of the student sample in the lab. The average switch point of 65.1 of

the “Internet-Uni” sample is much closer to the laboratory mean of 61.5 than the overall In-

ternet mean, and we cannot reject the null hypothesis of equality (two-sided p-values > 0.15).

For each payoff configuration separately, MW and KS tests show no significant differences

except for payoff configurations 3 and 5. Controlling for the composition of subject pools

hence eliminates most of the differences between the Internet and laboratory findings. The

disparity found between both errors and preferences in the Internet sample and the laboratory

experiments is mainly driven by the fact that the behaviour of the student population differs

from the behaviour of the general population.

11

Providing monetary incentives or not does not seem to affect the behaviour in any sys-

tematic way: the choices in the hypothetical and high incentive treatments are very similar.

This is confirmed by the MW and KS tests on differences in average switch points.

4 Self-Selection Bias in the CentERpanel Experiment

Conducting the experiment via an existing survey allows us to observe more features of the

recruitment process than usual, since we know many characteristics of all persons eligible

for participation, regardless of whether they actually take part in the experiment or not. In

order to get reliable population-wide estimates, sampling from a representative sub-population

suffices only if non-response is perfectly random. Since people self-select into the experiment

this condition may not hold. Indeed, the descriptive statistics in Table 2 suggest that there

are some important differences between the three groups of Internet-participants: those who

completed the experiment without rushing through it, those who chose not to participate,

and those who started but rushed through or did not complete. In addition to this, Harrison,

Lau, and Rutstrom (2007) have shown that self-selection effects may be important for the

estimation of risk preferences. We look at the same issue, but we can exploit much more

information about nonparticipants.

We first analyse the determinants of self-selection and then investigate their impact on

observed choices. In order to structure the analysis, it is useful to divide the sampling process

in the Internet experiment into three stages:

1. Dutch individuals are contacted at random and participate in the CentERpanel or not.

2. A random subsample of CentERpanel respondents is asked to take part in our experi-

ment. After learning about the nature of the experiment, they decide to participate or

decline participation.

3. Some of the subjects drop out during the experiment or click through it extremely

rapidly;

Steps 2 and 3 are especially interesting for experimental economics because they replicate the

recruitment process for laboratory experiments to some extent.

To see how Step 2 relates to typical recruitment procedures, note that some information

on payoffs and the type of experiment is usually conveyed before subjects come to the lab.

This is the typical form of communication in recruitment emails or on posters announcing the

experiments. Such information is provided on our welcome screen (Figure 2). Subjects learn

12

about the nature of the experiment and the possible payoffs, and then choose to participate

or not.

Step 3 seems typical for the Internet environment – it usually plays no role in the labo-

ratory. One may argue, however, that the participation decisions for laboratory experiments

combine features of steps 2 and 3. Part of the nonparticipation in laboratory experiments

may be similar to dropping out of the CentERpanel experiment, because of the negligible

fixed costs of participation in the latter. Showing up at the laboratory at a specific time

and date entails a significant cost – and subjects can be expected to have made the trade-off

between the costs and benefits of participation beforehand. This is probably not the case

in the Internet setting, where the experiment can be accessed within seconds of notification.

Hence the cost-benefit analysis may well be postponed and carried out during the experiment.

This may explain why subjects hardly ever leave the economics laboratory prematurely (and

nobody left our laboratory sessions), whereas 4% of subjects did not finish the CentERpanel

experiment. Similarly, rushing through the experiment can be considered as a form of nonpar-

ticipation, since there is a lower bound on the time needed to digest the instructions and to

give serious answers. This minimum time certainly seems higher than the 1:43 minutes which

is the minimum time observed in the Internet experiment. We checked several cut-off points

between 3 and 7 minutes and finally chose the minimum duration observed in the laboratory

(5:20 minutes). Results were robust to the precise value chosen. With this threshold, about

seven percent of the Internet subjects fall into the category of “speeders.”2

An alternative explanation why Step 3 features prominently in the Internet experiment and

not in laboratory experiments is the interaction with the experimenter and the typical rules

in the laboratory. One difference is the possibility to ask questions. Internet participant who

do not understand a task and cannot ask questions may more easily opt for randomly ticking

options or drop out entirely. Another difference is that in typical laboratory experiments

everybody is expected to stay until the last subject has finished, so that there is no point

in rapid completion. We can analyse the consequences of these differences by comparing

the “Lab-Lab” and “Lab-Internet” treatments (see Section 2.3). The distributions of the

completion times look rather similar, with mean durations of about 12.5 minutes in both

cases. Surprisingly, in the traditional “Lab-Lab” treatment the dispersion is higher and the

left tail of the distribution has more mass. If rapid completion were due to the two factors

mentioned above, we would expect the opposite. Completion times in the “Internet-Uni”-2The combined response rate for steps 2 and 3 in our Internet experiment is 78%. This seems to compare

favourably to Harrison, Lau, and Rutstrom (2007), who employed more standard recruitment procedures in

mailing out a letter to a random subsample of the Danish population and achieved a response rate of 38% (253

of 664 subjects), but it should be noted that our response rate is within a preselected sample that has shown

a general inclination to fill out survey questionnaires.

13

subgroup of the Internet subjects are lower than those of the laboratory subjects. This

is consistent with our preferred interpretation of step 3 of the selection process, since the

“Internet-Uni” group will contain more respondents who rush through the experiment.

4.1 The Determinants of Self-Selection

To analyse the factors that drive participation in the Internet experiment, we estimated a

multinomial logit model with three possible outcomes: non-participation, rushing through or

dropping out, and regular (“full”) participation. Results are presented in Table 5, with full

participation as the baseline category. Columns 1 and 2 contain the coefficients and standard

errors of our basic specification, for nonparticipation and dropouts/speeders, respectively.

Only basic covariates that are available for almost everybody are included – dummies for the

incentive treatments, education, gender, age, occupational status, and residential area. In

the extended specification (columns 3 and 4) the number of observations is lower since we

use covariates from the DNB Household Survey questionnaires which were not filled out by

all subjects. This specification adds household income (in four categories) and variables mea-

suring financial expertise and preferences: Whether the respondent manages the household’s

finances, whether the employer offers a save-as-you-earn deduction arrangement,3 whether

the respondent holds such a plan, whether the household’s financial portfolio includes savings

accounts, and whether it includes risky assets like mutual funds or stocks.4

For the variables included in both specifications, results for the two specifications are

very similar. Nonparticipation is significantly less likely in the incentive treatments than in

the hypothetical treatment (the benchmark). Translated to marginal effects, the coefficients

indicate response rates that are almost ten percentage points higher than in the hypothetical

treatment (all marginal effects are evaluated at a baseline with all dummy variables set to

zero). The point estimates on not finishing the experiment are much smaller in magnitude

and not significant. While incentives increase participation, they do not seem to attract a

different subject pool – we estimated models that included a wide variety of interaction effects

and none of them was significant. The coefficients on the high and low incentive treatments

are not significantly different from each other, so with the relatively small stakes we consider

here, the size of the incentives does not seem to have an impact.3This is an employer provided savings plan that is heavily subsidised by the state through tax deductions,

see Alessie, Hochgurtel, and van Soest (2006).These subsidies make net returns much higher than on any other

safe asset. While it is easy to sign up for these plans and the employer does most of the paperwork, the default

is not to participate. This may explain why employees with little financial knowledge or interest often do not

sign up, cf., e.g., the work of Madrian and Shea (2001) on non-take up of 401(k) plans.4Other specifications of the selection model did not yield additional insights; they also included subjective

income measures, wealth, and interactions between the covariates.

14

Persons in the top two education categories are both significantly more likely to participate

in the experiment and to finish it. Women’s nonparticipation rates are four to five percentage

points higher than those of men. Women also are slightly more prone to quit during the

experiment or to finish it very rapidly. Age effects start to matter at age 45, beyond which

participation rates decline. Those beyond 65 years of age are only half as likely to start

the experiment as those younger than 35. At the same time, however, non-completion rates

decrease significantly with age. This is mainly due to the fact that older participants are

less likely to rush through the experiment.5 Part of this result may be due to the fact that

the elderly are slower in working with computers, but the effects are already visible for ages

around forty. The combined effects of age on full participation are small and insignificant in

almost all cases.

Working respondents have higher participation rates than non-workers according to the

parsimonious specification, but the effect becomes insignificant in the richer specification.

Labour market status does not affect quitting or speeding. Living in an urban area has no

significant impact at all.

Point estimates of the effects of income and wealth variables on participation are generally

small and insignificant, and a joint test confirms that they play no role. The other financial

variables are proxies for preferences and financial knowledge. Being the financial administrator

of the household for example may reflect a preference for spending one’s time with risky

choice problems. It significantly increases the propensity to participate and also to finish

the experiment in more than 5:20 minutes. Whether the employer offers a savings plan is

just a control variable necessary to avoid confounding the effects of holding such a plan

with employment type. The variable of interest, taking part in a save-as-you-earn savings

arrangement, is associated with a (significant) six percentage points higher propensity to

begin the experiment. This supports the interpretation that the financial variables reflect an

interest in contemplating financial questions. This interpretation is strengthened by the other

two portfolio variables – on the one hand, having an ordinary savings account does not have

any predictive power for taking part in the experiment. Saving accounts are commonly known

and do not require much expertise or effort. On the other hand, the ownership of mutual

funds, stocks, etc. is significantly associated with higher participation in the experiment.

These are much more sophisticated products and investing in them requires more financial

knowledge. The results thus point towards a type of preference-based selection process in

which interested and knowledgeable individuals have a higher probability of participating.5We estimated models that treated the two components of step 3 (speeding through and dropping out) as

separate outcomes. This is the only case where the distinction mattered, so we report only the results from

the more parsimonious specification.

15

4.2 The Impact of Self-Selection on Outcomes

To test whether selection effects matter for the outcomes considered in Section 3, we compare

the observed sample distribution of full participants with a weighted distribution that corrects

for the various steps of the selection process.

For Step 1, CentERdata provides standard survey weights based upon comparing with a

much larger household survey drawn by Statistics Netherlands. We will assume that selection

into the CentERpanel is independent of the variables of interest, conditional on the basic

background variables used to construct these weights (age, sex, education, home ownership,

region). This is a missing at random (MAR) assumption, see e.g. Little and Rubin (2002)).

It implies that the weights can be used to correct for the selection in Step 1.

We make similar MAR assumptions for the other two steps, but then conditioning on the

much larger set of background variables used in the previous subsection. We construct weights

from a probit model that jointly explains the selection in Steps 2 and 3; each weight is the

inverse of the predicted probability of being in the final sample. We multiply these weights

with the weights for Step 1 to get weights that correct for all steps of the selection process.

Due to sample size considerations we opt for the parsimonious specification in the probit

regression. We then test whether weighted sample statistics on the outcomes are significantly

different from unweighted statistics. This can be seen as a test whether the selection process

is selective, under the maintained assumption that selection in each step is MAR given the

covariates used to construct the weights.

In Table 4 the average number of inconsistencies and average mean switch points for

the weighted and unweighted data are presented together with p-values of t-tests comparing

mean values.6 Taking the full selection process (steps 1, 2 and 3) into account the estimated

population average number of inconsistencies is 2.60, compared to 2.43 for the unweighted

sample. The difference is statistically significant (two-sided p-value=0.0001). This result

is consistent with the findings in Section 4.1. The only variable for which we had clear

priors was the education variable. We expect better educated people to make fewer errors.

Indeed, there are fewer inconsistencies in the raw estimates, where the better educated are

overrepresented. Controlling only for the selection at steps 2 and 3, the average number

of inconsistencies is 2.55, which is significantly different from the unweighted number 2.43

(two-sided p-value:=0.001) but not from the 2.60, which also corrects for Step 1 (two-sided6The test works as follows: Let y denote our variable of interest (i.e. the average number of violations

or mean switch point) and w(x) the weight. Our null hypothesis of no difference between the weighted and

unweighted observations can then be stated as E{w(x)y} = E{y} or E{z} = 0, with z = (w(x) − 1)y. Since

we have a large sample size and few explanatory variables, we neglect the estimation error in w. The null

hypothesis can then be tested with a standard t-test on whether the mean of z is zero or not.

16

p-value=0.1). Hence, the selection bias reported for the full process originates mostly from

steps 2 and 3, while step 1 has less of an impact.

For the estimates of risk preferences, we find a small underestimation of the mean switch

points when using the unweighted sample compared to both weighted samples, but the dif-

ference is insignificant (irrespective of which switch points we use). Thus, although the

participation decision is correlated with demographics, selection based on observables does

not fundamentally alter the results on this simple summary measure of risk preferences.

Harrison, Lau, and Rutstrom (2007) find that not controlling for selection effects leads to

an underestimation of average risk aversion in the population. They attribute this finding to

the use of fixed show-up fees attracting less risk averse participants. To investigate this in

our sample, we tested for selection effects for each incentive treatment group separately. In

contrast to Harrison, Lau, and Rutstrom (2007) we do not observe any selection effects on

the aggregated measures of preferences when only considering the treatments with fixed show

up fees. Combining this with the similarities we found in behaviour of the high incentive and

hypothetical groups (Section 3.2), we conclude that our data do not support the explanation

of selection effects suggested by Harrison, Lau, and Rutstrom (2007).

5 Conclusions

We have analysed different aspects of the representativeness of preference elicitation experi-

ments. First, we disentangled environmental effects (laboratory vs. Internet) from traditional

subject pool bias. On the one hand, we showed that the implementation mode does not mat-

ter for either error frequencies or preference estimates among the young and educated. On the

other hand, we found dramatic differences in the number of dominance and monotonicity vi-

olations when moving from a student sample to a sample drawn from the general population.

Students also exhibited a higher degree of risk tolerance than the general population.

Since this implies that selection of subjects is important, we then looked at selection effects

that may arise from voluntary participation in the Internet experiment. We first considered

the relation between self-selection and observed characteristics and showed that higher ed-

ucation, being male, as well as interest and expertise in financial matters are predictors of

participating in the experiment. Consistent with the education and financial literacy vari-

ables, we showed that this type of selection leads to a sample with less prevalence of violations

of monotonicity and dominance. On the other hand, we found no evidence of selection bias

on risk preferences for the Internet experiment.

Our main messages are the following. First, in line with existing evidence, we find large

17

differences between experimental choices of students and the general population, reflecting

not only different tendencies to make errors but also differences in risk preferences. This

confirms that findings based upon laboratory experiments with student samples cannot be

simply extrapolated to the general population. Second, we conclude that using a represen-

tative Internet survey offers a feasible solution to this problem. We find hardly any effects

of implementation mode (help screens replacing an experimenter, etc.), with similar choice

distributions of students in the traditional lab setting, a lab setting mimicking the Internet

experiment, and the Internet experiment. We also find that the selection processes leading to

participation in the Internet experiment leads to a subject pool making fewer errors than the

general population, but with the same distribution of risk preferences, irrespective of whether

or not we provide real monetary incentives. Thus the Internet experiment seems an appro-

priate way to estimate risk preferences for the general population, which is substantially less

inclined to make risky choices than the subpopulation of students. The conclusion that im-

plementation mode does not matter may of course be specific to the nature of the experiment

– whether it remains the same in more complicated experiments or experiments measuring

different sorts of preferences remains to be seen.

A Tables

Table 1: Characteristics of the Seven Payoff Configurations

Payoff Uncertainty Payoff Payoff Uncertainty Payoff Payoff

Configuration Resolution Low, A High, A Resolution Low, B High, B

1 early 27 33 early 0 69

2 early 39 48 early 9 87

3 early 12 15 early -15 48

4 early 33 36 late 6 69

5 early 18 21 late -9 54

6 early 24 27 early -3 60

7 late 15 18 late -12 51

Note: These values were shown in the high incentive and hypothetical treatments. For the low incentive

treatment they were divided by three.

18

Table 2: Selected Characteristics of Participants

CentERpanel Laboratory

Final Non- Dropouts Final

Variable Sample Participants Speeders Sample

Hypothetical treatment 0.31 0.50 0.37 0.37

Low incentive treatment 0.37 0.23 0.35 0.27

High incentive treatment 0.32 0.27 0.28 0.37

Primary / lower sec. education 0.31 0.44 0.34 .

Higher sec. / interm. voc. train. 0.33 0.30 0.41 .

Higher vocational training 0.25 0.20 0.16 .

University degree / univ. student 0.12 0.06 0.09 1.00

Female 0.45 0.54 0.56 0.46

Age 16-34 years 0.24 0.14 0.46 1.00

Age 35-44 years 0.19 0.13 0.21 .

Age 45-54 years 0.23 0.22 0.14 .

Age 55-64 years 0.18 0.18 0.09 .

Age 65+ years 0.16 0.33 0.10 .

Working 0.56 0.36 0.55 .

Unemployed, Looking for Job 0.02 0.03 0.03 .

Student, Pensioner, Housework 0.42 0.62 0.42 1.00

Lives in Urban Area 0.60 0.63 0.58 .

HH financial administrator 0.66 0.56 0.48 .

Employer offers Savings Plan 0.44 0.25 0.32 .

Has Sav. Plan via Employer 0.36 0.17 0.26 .

Has Sav. Acc. or similar 0.87 0.85 0.90 .

Holds Stocks, or similar 0.31 0.25 0.29 .

HH income below 22k Euros 0.34 0.35 0.33 .

HH income ∈ [22k, 40k Euros) 0.49 0.51 0.49 .

HH income at least 40k Euros 0.17 0.14 0.18 .

Max. Number of Observations 1790 291 218 178

Note: The numbers shown indicate fractions in the final sample. Some households did not

complete the questionnaires of the DHS from which some of the variables are drawn. Hence

the number of observations is lower for some of the variables in question. This is particularly

true for the last two sections of the table.

19

Table 3: Frequency of inconsistencies by type of error and subsample

Dominance Within Between

Internet 11.0% 3.9% 21.1%

Laboratory 1.5% 1.8% 14.4%

Internet - Uni 3.9% 1.5% 13.1%

Lab - Internet 1.4% 1.8% 15.1%

Lab - Lab 1.5% 1.7% 13.7%

Note: The figures represent frequencies of the different

types of errors as a percentage of the number of possible

violations. The fractions of violations for the dominance

category were obtained by dividing the total number of

dominance violations in each category by the total num-

ber of screens shown to subjects on which dominance

violations could be made. The numbers for the within

category are calculated as the number of within viola-

tions, divided by the total number of screens shown to

subjects in each group. The figures of the last column

were obtained by dividing the number of between errors

by the number of times the second screen was displayed

to subjects.

Table 4: Weighted data

Weight Average # p-value p-value Meanswitch p-value p-value

inconsistencies steps 2,3 no weight steps 2,3 no weight

Steps 1,2,3 2.6 0.1 0.0001 71.55 0.2 0.1

(0.05) (0.50)

Steps 2,3 2.55 . 0.001 71.14 . 0.1

(0.05) (0.49)

None 2.43 . . 70.36 . .

(0.05) (0.53)

Note: Variables in category Steps 1,2,3 use weights for step 1,2 and 3; variables in category Steps

2,3 use weights for step 2 and 3. Average mean switch points are calculated using data from

the hypothetical and high incentive treatments only. Standard errors are given in parenthesis.

P-values comes from t-tests, described in footnote 6 in Section 4.2, with the null hypotheses of

equal means.

20

Table 5: Self-Selection into the CentERpanel Experiment

NP DO/SP NP DO/SPSpecification (1) (2) (3) (4)

Low incentive treatment -1.047∗∗∗ -.220 -1.053∗∗∗ -.247(.163) (.174) (.183) (.207)

High incentive treatment -.699∗∗∗ -.277 -.881∗∗∗ -.342(.156) (.183) (.188) (.226)

Higher sec. / interm. voc. train. -.285∗ -.120 -.180 -.290(.160) (.177) (.185) (.215)

Higher vocational training -.413∗∗ -.677∗∗∗ -.329 -.801∗∗∗

(.181) (.228) (.217) (.282)

University degree / univ. student -.881∗∗∗ -.447 -.840∗∗ -.622∗

(.281) (.279) (.356) (.356)

Female .383∗∗∗ .275∗ .357∗∗ .253(.137) (.152) (.158) (.184)

Age 35-44 years .336 -.450∗∗ .385 -.471∗

(.242) (.196) (.301) (.253)

Age 45-54 years .579∗∗∗ -1.076∗∗∗ .761∗∗∗ -1.044∗∗∗

(.221) (.222) (.268) (.275)

Age 55-64 years .489∗∗ -1.326∗∗∗ .781∗∗∗ -1.214∗∗∗

(.231) (.262) (.274) (.301)

Age 65+ years 1.120∗∗∗ -1.171∗∗∗ 1.349∗∗∗ -1.297∗∗∗

(.233) (.278) (.284) (.341)

Working -.402∗∗ -.104 -.185 -.054(.175) (.179) (.214) (.229)

Unemployed, Looking for Job .059 .198 -.011 .263(.416) (.439) (.515) (.519)

Lives in Urban Area .165 -.059 .235 -.051(.137) (.151) (.159) (.183)

HH financial administrator -.379∗∗ -.328∗

(.164) (.193)

Employer offers Savings Plan .180 -.369(.281) (.365)

Has Sav. Plan via Employer -.845∗∗∗ -.045(.310) (.379)

Has Sav. Acc. or similar .095 .228(.225) (.292)

Holds Stocks, or similar -.368∗∗ .058(.177) (.204)

HH income ∈ [22k, 40k Euros) .153 .191(.176) (.208)

HH income at least 40k Euros .040 .539∗

(.259) (.280)

Constant -1.693∗∗∗ -1.147∗∗∗ -1.735∗∗∗ -1.244∗∗∗

(.253) (.240) (.371) (.401)

No. of Observations 2296 2296 1802 1802

Note: Coefficient estimates and corresponding standard errors of multinomial logit regression.

Columns indicate categories of the dependent variable by regression type. The reference category

contains those respondents who completed the experiment in more than 5:20 minutes. Columns

(1) and (3) list estimates for opting for nonparticipation on the first screen (NP); columns (2)

and (4) those for dropping out before completion (DO) or finishing the experiment in less than

5:20 minutes (SP). Left-out categories of relevant variables are hypothetical treatment; primary

and lower secondary education; ages 16-34; other type of occupation; and household income less

than 22,000 Euro. Asterisks indicate significance at the 10%, 5%, and 1%-level.

21

B Figures

Figure 1: Screenshot of Payoff Configuration 5, First Screen

Progress: 70% Instructions Help

Please, make a choice between A and B for each of the decision problems below.

Option A -outcome IMMEDIATELY revealed

Option B -outcome revealed in THREE MONTHS

Choice

A B

€ 21 with probability 25%€ 18 with probability 75%

€ 54 with probability 25%€ -9 with probability 75%





€ 21 with probability100%€ 18 with probability 0%

€ 54 with probability100%€ -9 with probability 0%

Continue

22

Figure 2: Translations of the Welcome Screens in the CentERpanel Experiment

High (Low) Incentive Treatment

Welcome to this economic experiment carriedout by researchers of Tilburg University. Theexperiment is about making choices under un-certainty. Please read the instructions care-fully in order to understand how the experimentworks.

If you have questions after the beginning of theexperiment, you can return to the instructionsby clicking on a link at the top of the screen.If you have questions on the specific screen, youcan click on ‘Help’ at the top right corner of thescreen.

You will receive 15 (5) Euros for participating.Then you can, depending on the choices youmake and on chance, earn more or lose part ofthe 15 (5) Euros. If completing the total exper-iment, you receive the reward for participating,possibly increased by your gain (or reduced byyour loss) in one of the choices you have made.Whether the latter occurs and which choice thendetermines your payoff, will be determined bychance. Your total reward will be addedto your CentERpoints.

The questions are not designed to test you.Answers are therefore not correct or incorrect;please give the answers that reflect your ownpreferences. Assume in each choice problemthat this choice determines your actual payoff.

This questionnaire is about making choices, andyour payoff depends on your choices and onchance. If you do not want to participate out ofprinciple, you can indicate this below. In thatcase you will not continue with the question-naire.

O Yes, I proceed with the questionnaire

O No, I do not want to complete this question-naire

Hypothetical Treatment

Welcome to this economic experiment carriedout by researchers of Tilburg University. Theexperiment is about making choices under un-certainty. Please read the instructions care-fully in order to understand how the experimentworks.

If you have questions after the beginning of theexperiment, you can return to the instructionsby clicking on a link at the top of the screen.If you have questions on the specific screen, youcan click on ‘Help’ at the top right corner of thescreen.

The questions are not designed to test you.Answers are therefore not correct or incorrect;please give the answers that reflect your ownpreferences.

This questionnaire is about making choices be-tween several situations in which you can (hy-pothetically) gain or lose money. Your revenuedepends on the choices you make and on chance.What matters is what you would do in hy-pothetical situations, in reality, there isnothing at stake for you. If you neverthelessdo not want to participate out of principle, youcan indicate this below. In that case you willnot continue with the questionnaire.

O Yes, I proceed with the questionnaire

O No, I do not want to complete this question-naire

23

Figure 3: Average Number of Answers that Violate Monotonicity or Dominance by Sample.

0.5

11.

52

2.5

3M

ean

num

ber

of in

cons

iste

ncie

s

Internet Laboratory Internet−Uni Lab−Internet Lab−Lab

Sample

Note: Values shown are means over all screens per subject (mini-mum 0, maximum 7). “Internet” consists of unweighted numbers fromCentERpanel respondents. “Lab” are averages for all laboratory subjectsand “Internet-Uni” mean values for those respondents of the CentERpanelthat are less than 35 years of age and hold a university degree or studyto obtain one. “Lab-Lab” and “Lab-Internet” are averages for laboratorysubjects in the“Lab-Lab” and “Lab-Internet” treatments respectively.

Figure 4: Risky Choices, Internet and Lab Subsamples

0.1

.2.3

.4.5

.6.7

.8.9

1

Fra

ctio

n C

hoos

ing

Ris

ky O

ptio

n B

25 50 75 100

Probability of the High Outcome

Internet Lab

Note: Solid lines depict means over all first screen choices. Dashed linesdisplay corresponding confidence intervals. Data from the low incen-tive treatment is excluded. “Internet” consists of the raw numbers fromCentERpanel respondents. “Lab” are averages for laboratory subjects.

24

Figure 5: Risky Choices, Lab-Internet and Lab-Lab Subsamples

0.1

.2.3

.4.5

.6.7

.8.9

1

Fra

ctio

n C

hoos

ing

Ris

ky O

ptio

n B

25 50 75 100


Lab−Lab Lab−Internet

Note: Solid lines depict means over all first screen choices. Dashed linesdisplay corresponding confidence intervals. Data from the low incentivetreatment is excluded. “Lab-Lab” are averages for laboratory subjects inthe “Lab-Lab” treatment and “Lab-Internet” mean values for subjects ofthe “Lab-Internet” treatment.

Figure 6: Risky Choices, Internet-Uni and Lab Subsamples

0.1

.2.3

.4.5

.6.7

.8.9

1

Fra

ctio

n C

hoos

ing

Ris

ky O

ptio

n B

25 50 75 100


Internet−Uni Lab

Note: Solid lines depict means over all first screen choices. Dashed linesdisplay corresponding confidence intervals. Data from the low incentivetreatment is excluded. “Lab” are averages for laboratory subjects and“Internet-Uni” mean values for those respondents of the CentERpanelthat are less than 35 years of age and hold a university degree or study toobtain one.

25

References

Alessie, R., S. Hochgurtel, and A. van Soest (2006): “Non-take-up of tax-favoredsavings plans: evidence from Dutch employees,” Journal of Economic Psychology, 27, 483–501.

Andersen, S., G. W. Harrison, M. I. Lau, and E. E. Rutstrom (2005): “Preferenceheterogeneity in experiments: comparing the field and lab,” Centre for Economic andBusiness Research Discussion Paper No. 2005-03, Copenhagen.

(2006): “Elicitation using multiple price list formats,” Experimental Economics,9(4), 383–405.

Bellemare, C., and S. Kroger (2007): “On representative social capital,” EuropeanEconomic Review, 21, 183–202.

Binswanger, H. (1980): “Attitudes toward risk: experimental measurement in rural India,”American Journal of Agricultural Economics, 62, 395–407.

Bleichrodt, H., J. L. Pinto, and P. P. Wakker (2001): “Making descriptive use ofprospect theory to improve the prescriptive use of expected utility,” Management Science,47(11), 1498–1514.

Dohmen, T., A. Falk, D. Huffman, U. Sunde, J. Schupp, and G. G. Wagner (2005):“Individual risk attitudes: new evidence from a large, representative, experimentally-validated survey,” IZA Discussion Paper No 1730.

Donkers, B., B. Melenberg, and A. van Soest (2001): “Estimating risk attitudes usinglotteries; a large sample approach,” Journal of Risk and Uncertainty, 22(2), 165–195.

Fehr, E., U. Fischbacher, J. Schupp, B. von Rosenbladt, and G. G. Wagner (2003):“A nationwide laboratory examining trust and trustworthiness by integrating behaviouralexperiments into representative surveys,” CEPR Discussion Papers 3858, C.E.P.R. Discus-sion Papers.

Guth, W., C. Schmidt, and M. Sutter (2007): “Bargaining outside the lab - a newspaperexperiment of a three-person ultimatum game,” Economic Journal, 117(518), 449–469.

Harrison, G., M. Lau, and E. Rutstrom (2007): “Risk attitudes, randomization totreatment, and self-selection into experiments,” UCF Economics Working Paper.

Harrison, G. W., M. I. Lau, and M. B. Williams (2002): “Estimating individual dis-count rates in Denmark: a field experiment,” American Economic Review, 92(5), 1606–1617.

Harrison, G. W., and J. A. List (2004): “Field experiments,” Journal of EconomicLiterature, 42, 1013–1059.

Harrison, G. W., J. A. List, and C. Towe (2007): “Naturally occurring preferences andexogenous laboratory experiments: a case study of risk aversion,” Econometrica, 75(2),433–458.

26

Hey, J. (1995): “Experimental investigations of errors in decision making under risk,” Eu-ropean Economic Review, 39, 633–641.

Holt, C. A., and S. K. Laury (2002): “Risk aversion and incentive effects,” AmericanEconomic Review, 92, 1644–1655.

Kahneman, D., and A. Tversky (1979): “Prospect theory: an analysis of decision underrisk,” Econometrica, 47, 263–291.

Kreps, D. M., and E. L. Porteus (1978): “Temporal resolution of uncertainty and dy-namic choice theory,” Econometrica, 46, 185–200.

Levitt, S., and J. List (2007): “What do laboratory experiments measuring social prefer-ences reveal about the real world?,” The Journal of Economic Perspectives, 21(2), 153–174.

Little, R. J., and D. B. Rubin (2002): Statistical Analysis with Missing Data. John Wiley& Sons Inc., New York, 2nd edn.

Loomes, G., P. Moffatt, and R. Sugden (2002): “A microeconometric test of alternativestochastic theories of risky Choice,” Journal of Risk and Uncertainty, 24, 103–130.

Lucking-Reiley, D. (1999): “Using field experiments to test equivalence between auctionformats: magic on the internet.,” American Economic Review, 89, 1063–1080.

Madrian, B. C., and D. F. Shea (2001): “The power of suggestion: inertia in 401(K)participation and savings behavior,” Quarterly Journal of Economics, 116(4), 1149–1187.

Schmidt, U., and T. Neugebauer (2007): “Testing expected utility in the presence oferrors,” Economic Journal, 117, 470–485.

27

selection and mode effects in risk preference elicitation …ftp.iza.org/dp3321.pdf · 2008. 1....

Documents