value-based decision-making: a new developmental paradigm€¦ · value-based decision-making: a...
TRANSCRIPT
Value-Based Decision-Making:
A New Developmental Paradigm ∗
Isabelle BrocasUniversity of Southern California
and CEPR
Juan D. CarrilloUniversity of Southern California
and CEPR
T. Dalton CombsDopamine Labs
Niree KodaverdianUniversity of Southern California
May 2017
Abstract
How does value-based reasoning develop and how different this development is fromone domain to another? Children from Kindergarten to 5th grade made pairwisechoices in the Goods domain (toys), Social domain (sharing between self and other),and Risk domain (lotteries) and we evaluated how consistent their choices were. Wereport evidence that the development of consistency across domains cannot be ac-counted for by existing developmental paradigms: it is not related to the developmentof transitive reasoning and it is only partially linked to developmental aspects of atten-tional control and centration. Rather, choice consistency is related to self knowledgeof preferences which develops gradually and differentially across domains. The Goodsdomain offers a developmental template: children become more consistent over timebecause they learn what they like most and least. Systematic differences exist how-ever in both the Social and Risk domains. In the Social domain, children graduallylearn what they like most but not what they like least, while in the Risk domain, theygradually learn what they like least but not what they like most. These asymmetricdevelopments give rise to asymmetric patterns of consistency.
Keywords: laboratory experiments, developmental economics, revealed preferences, risk,
social preferences.
JEL Classification: C91, D11, D12.
∗We are grateful to members of the Los Angeles Behavioral Economics Laboratory (LABEL) for theirinsights and comments in the various phases of the project. We also thank participants at the 2014Social Neuroscience retreat (Catalina Island, USC) and at the 2015 Morality, Incentives and UnethicalBehavior Conference (UCSD) for useful comments and the staff at the Lycee International de Los Angelesfor their support. All remaining errors are ours. The study was conducted with the University of SouthernCalifornia IRB approval UP-12-00528. We acknowledge the financial support of the National ScienceFoundation grant SES-1425062.
1 Introduction
Adults have many abilities that children lack. Multiple paradigms exist to explain why
these differences exist and change during development. Each of these paradigms describes
an important aspect of development, although none of them alone can explain all of the
diverse abilities that change as we grow. Here, we present evidence that an important
difference between children and adults cannot be explained by existing paradigms; namely,
that adults consistently know what they want but children do not. We can measure the
consistency of preferences by testing for transitivity of choices: if a subject chooses option
A over option B and option B over option C, consistency of preferences requires that
she must choose option A over option C. A number of studies have shown that children,
especially young ones, are less consistent than adults in the Goods domain, that is, when
choosing between foods or toys (Smedslund, 1960; Harbaugh, Krause and Berry, 2001; List
and Millimet, 2008) while there is some partial evidence that age is less of a predictor of
consistency in the Social (Harbaugh and Krause, 2000) and Risk (Harbaugh, Krause and
Vesterlund, 2002) domains. This suggests that children’s self-knowledge of preferences is
imperfect and varies across domains.
It is intuitive, however, that differences in transitivity across ages and domains may
reflect known aspects of development. Even though children know that they prefer A
to B and B to C, their choices may not conform to this ranking for reasons related to
established developmental paradigms. For example, it could occur because attentional
control, a deficit which has previously been associated with intransitive behavior in older
adults (Brocas, Carrillo, Combs and Kodaverdian, 2015), is still underdeveloped in children
(Davidson, Amso, Anderson and Diamond, 2006; Astle and Scerif, 2008). Alternatively,
it may be because the ability to reason logically (Sher, Koenig and Rustichini, 2014) and
transitively (Piaget, 1948; Bouwmeester and Sijtsma, 2006) is still developing. Finally, it
may result from children’s inability to focus on more than one attribute of an item at a
time, a phenomenon referred to as centration (Piaget, Elkind and Tenzer, 1967; Donaldson,
1982; Crain, 2015).
The objective of this research is to assess the common and domain-specific develop-
mental trajectories of transitive decision-making in the Goods, Social, and Risk domains
and to determine if the dominant developmental paradigms (attentional control, logical
reasoning, and centration) are enough to explain age- and domain-related differences in
transitive decision-making. Our study is most closely related to the literature on choice
consistency. There are however two main differences between the existing studies and
ours. A first difference is conceptual. We want to compare consistency across domains
and investigate how existing developmental paradigms account for changes in preference
2
consistency over childhood. Earlier studies offered instead analyses of consistency in one
domain at a time. A second difference is methodological. Earlier studies have relied on
the Generalized Axiom of Revealed Preferences (GARP), an indirect test of transitivity
which focuses on choices between bundles of options given a budget constraint, a system
of prices, and a non-satiation assumption (Samuelson, 1938; Varian, 1982). By contrast,
our design is a direct test of transitivity. Furthermore, the extreme simplicity of the design
(pairwise comparisons instead of a system of prices and a budget constraint) and the re-
moval of assumptions such as non-satiation (sometimes violated in experimental studies)
makes it especially suitable to test it in young children. It also delivers results that are
directly comparable across domains.
2 Methods
We recruited 134 children from Kindergarten to 5th grade and 51 Undergraduate students
to participate in three Choice tasks and three Ranking tasks (Fig.1). In each Ranking
task, we asked participants to provide explicit rankings of seven items. In each Choice
task, we asked them to choose between all 21 pairwise combinations of those seven items.
The Goods-Choice and Goods-Ranking tasks involved age-appropriate toys, the Social-
Choice and Social-Ranking tasks involved sharing rules between self and other, and the
Risk-Choice and Risk-Ranking tasks involved lotteries. For each domain and age group,
we determined whether the pairwise choices in the Choice tasks were transitive and how in-
transitive choices were distributed over rankings elicited in the Ranking tasks. We included
attention trials to assess attentional control, and a reasoning task to measure transitive
reasoning. To assess centration, we determined whether actual choices were consistent
with attending to single attributes. For analysis, we grouped children into the age groups
K-1st (Kindergarten and 1st grade, 47 participants), 2nd-3rd (2nd and 3rd grades, 55
participants), 4th-5th (4th and 5th grades, 32 participants), and U (Undergraduates, 51
participants). The U age group was a control for adult level value-based decision-making.
Appendix SI-1 contains a detailed description of the design and procedures.
3 Experimental Results
We measured choice consistency by counting the number of transitivity violations (TV).
Formally, for each triplet of items A, B and C, a TV occurs when the participant chooses A
over B, B over C, and C over A. In the analysis, we considered all combinations of triplets
of items and counted the number of times choices between those triplets were intransitive
for each subject in each Choice task. Then, we computed the average number of TV by
3
A) Choice Task
B) Ranking Task
C) Goods
D) Social
E) Risk
Opt.1
Opt.2
Figure 1: Decision-making tasks. (A) In each of the 21 trials of the Choice tasks,participants were shown one combination of two options, left (Opt.1) and right (Opt. 2).They touched one of three buttons displayed at the top of their screen to select an optionor to express indifference (middle button). (B) In Ranking tasks, participants ranked allseven options from most preferred (green happy face) to least preferred (yellow neutralface). Both types of tasks were conducted in the (C) Goods domain involving toys, in the(D) Social domain involving sharing rules for self (hand pointing at self) and other (handpointing right), and in the (E) Risk domain involving lotteries that consisted of quantities(number of toys) and probabilities (green share of the pie).
age group and Choice task. Appendix SI-2 explains the procedure and provides details of
the statistical methods and results in this section.
Choice transitivity is domain-dependent (Fig.2). TV decreased with age in both the
Goods and Social domains, while no two age groups differed significantly in the Risk do-
main. Also, young children were significantly more consistent in the Social than in the
Goods domain and U were less consistent in the Risk than in the other two domains.
However, participants in the U age group were not making significantly more transitive
choices compared to participants in the 4th-5th age group in any domain, initially sug-
gesting that the development of transitive decision-making stops around 4th grade in all
domains. The result was robust to alternative ways of measuring choice inconsistencies
(see Appendix SI-3.1.)
Transitivity in the Goods domain improves gradually with age. Although consistency
in the Goods-Choice task improved with age, that improvement was not uniform across
all paired choices. In particular, trials featuring options ranked very differently in the
4
K & 1st 2nd & 3rd 4th & 5th U
Age Group
Tran
sitiv
ity V
iola
tions
per
Sub
ject Goods
SocialRisk
Choice-Task
Figure 2: Performance improves with age in the Goods-Choice and Social-Choice tasks but not in the Risk-Choice task. We report the average number of TVin the Choice tasks (y-axis) for each age group (x-axis) broken down by domain (Goods,Social, and Risk). Shadings correspond to the 95% confidence intervals.
Goods-Ranking task were unlikely to be involved in a TV. By contrast, trials featuring
options ranked similarly were significantly more likely to be involved in TV (Fig.3). This
general pattern was independent of age and suggested that it was more difficult to choose
consistently when the options involved were close in value. We observed convergence to
a state where participants almost never committed TV when choices involved their best
(left column) or their worst (bottom row) options and most TV got concentrated in items
with intermediate ranks (3rd, 4th and 5th). The result implies that children gradually
“learn to know what they like most and least” with age.
“Looking consistent” in the Social domain and the role of simple policies. We did
not anticipate to find that small children would be significantly more consistent in the
Social-Choice task compared to the Goods-Choice task. One possible explanation is that
goods are atomic and need to be evaluated as a whole. By contrast, social options can be
decomposed into several simple attributes such as “objects for self,” “objects for other,”
and “total number of objects,” where each attribute is easy to evaluate consistently. As
such, a participant might focus on a single attribute at a time (centration) and use simple
rules to choose. To test that hypothesis, we listed all such simple rules (for example,
“pick the option that gives self more objects, and if both give the same, pick the option
that gives other more objects”) and determined the fraction of participants who employed
them. From now on, we call them ‘simple policies,’ although they may also correspond to
5
0.7
0.6
0.5
0.8
0.4
0.5
0.3
0.20.1
0.2
0.3
0.4
0.0 0.0
0.1
0.2
0.3
1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
76
54
32
76
54
32
76
54
32
76
54
32
Rank of Higher Ranked OptionRank of Higher Ranked Option Rank of Higher Ranked Option Rank of Higher Ranked Option
Rank
of L
ower
Ran
ked
Opt
ion
Transitivity Violations in Goods-Choice-Task Plotted by Option Ranks in Goods-Ranking-Task for Each Age Group
K and 1st 2nd and 3rd 4th and 5th Undergraduate
Figure 3: Transitivity violations decrease with age differently across choices.Each cell represents the color-coded average number of TV involving a higher ranked option(x-axis) and a lower ranked option (y-axis), as revealed by explicit rankings obtained inthe Goods-Ranking task. Lighter colors reflect more violations. All age groups are morelikely to make TV when options have similar ranks. There is convergence to a state whereparticipants almost never commit TV when choices involve their best (left column) or theirworst (bottom row) options. The vectors in the top right corner show the average gradientof the heatmap. Subjects become more consistent if the rank of the higher-ranked optionincreases (left-right gradient of the vector) and if the rank of the lower-ranked optiondecreases (up-down gradient of the vector).
simple heuristics or lexicographic preferences. Either way, these rules “buy” consistency,
as they are easy to follow and unlikely to produce TV.
The most popular simple policy was that of maximizing objects for self, then mini-
mizing objects for other. The choices of 35% of K-1st, 20% of 2nd-3rd, 22% of 4th-5th
and 4% of U were in-line with this policy. When we removed from our sample all the sub-
jects who used simple policies, the developmental signature matched that of the Goods
domain (Fig.4, left panel). However, a systematic difference with the Goods domain per-
sisted when we looked at how violations evolved between similarly- and differently-ranked
options (Fig.5, top row (A)). We found that children learned to become consistent in
choices involving their best options (“they learned to know what they liked most”) but
they still committed a substantial number of violations in choices involving their worst
options (“they did not learn to know what they liked least”). This was true even in the U
age group. Last, a noticeable trend was the gradual evolution of behavior towards more
integrative decision rules, reflecting trade-offs between the two attributes. This result sug-
gests that as children age, they become better able to think in terms of prosociality and
social efficiency, confirming the results from studies on other-regarding preferences (Fehr,
Bernhard and Rockenbach, 2008).
Limited development of consistency in the Risk domain. Given lotteries are also multi-
attribute options, in this case probabilities and outcomes, we hypothesized that centration
6
K & 1st 2nd & 3rd 4th & 5th U
Age Group
Tran
sitiv
ity V
iola
tions
per
Sub
ject Goods
SocialRisk
Choice-TaskGoodsSocial
Social
Choice-Task
All Subjects
w/o HeuristicSubjects
K & 1st 2nd & 3rd 4th & 5th U
Age Group
GoodsRisk
Risk
Choice-Task
All Subjects
w/o HeuristicSubjects
Figure 4: Left panel. Social vs. Goods after removing simple policies: thedevelopmental signature of consistency is comparable across domains. Right panel.Risk vs. Goods after removing simple policies: the developmental signature ofconsistency is different across domains. All children are similarly inconsistent in Risk andimprovements occur later in life.
could also play a role in the Risk domain. We listed all simple policies characterized
by the evaluation of one attribute at a time (e.g., “pick the option with more goods,
and if both have the same, pick the most likely option”) and determined the fraction of
participants with a behavior consistent with one of these policies. Again one of them was
used dominantly: 44% of K-1st, 35% of 2nd-3rd, 19% of 4th-5th and 8% of U chose the
lottery offering the larger reward. When we removed all subjects who used simple policies,
we found that the developmental signature did not match that of the Goods and Social
domains (Fig.4, right panel). In particular, the number of violations was the same across
all school-age groups and this was significantly different from that of the U group. The
result indicates that a potential milestone for Risk was outside our window of observation,
somewhere in middle or high school. Also, children learned to become more consistent in
choices involving their worst options (“they learned to know what they liked least”), but
less so in choices involving their best options (“they did not learn to know what they liked
most”), a trend opposite to the trend observed in the Social domain (Fig.5, bottom row
(B)). Similar to the Social domain, however, older participants were better able to make
integrative decisions, in this case, trade-offs between reward amounts and probabilities.
Who makes transitive choices? As noted earlier, centration took the form of using
simple policies and it was strongly associated with consistency. In addition, among par-
7
0.8
0.7
0.6
0.5 0.4
0.5
0.6
0.7
0.8
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0.3
0.4
0.5
0.7
0.6
0.5
0.4
0.1
0.2
0.3
0.0
0.1
0.2
0.3
0.4
0.0
0.1
0.2
1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
Rank of Higher Ranked OptionRank of Higher Ranked Option Rank of Higher Ranked Option Rank of Higher Ranked Option
Rank
of L
ower
Ran
ked
Opt
ion
Rank
of L
ower
Ran
ked
Opt
ion
76
54
32
76
54
32
76
54
32
76
54
32
76
54
32
76
54
32
76
54
32
76
54
32
Transitivity Violations in Social-Choice-Task and Risk-Choice-Task Plotted by Option Ranks in the Respective Ranking-Task for Each Age-GroupK and 1st 2nd and 3rd 4th and 5th Undergraduates
K and 1st 2nd and 3rd 4th and 5th Undergraduates
A) S
ocia
lB)
Ris
k
Figure 5: Transitivity violations across choices among participants who do notuse simple policies. (A) In the Social domain, participants converge to a state whereTV rarely involve their highest-ranked options (vector with a horizontal gradient: “partic-ipants learn to know what they like”). (B) In the Risk domain, participants converge to astate where TV rarely involve their lowest-ranked options (vector with a vertical gradient:“participants learn to know what they do not like”).
ticipants who did not use simple policies, intransitivity was strongly correlated across
domains. Intransitivity was also strongly associated with mistakes in attention trials,
indicating that attentional control also played a role. By contrast, performance in the
transitive reasoning task was not a predictor, suggesting that logical reasoning per se was
not a key determinant of choice transitivity. Finally and notably, participants whose de-
cisions in the Choice tasks were more in line with the ordering displayed in the Ranking
tasks had also fewer TV (see Appendix SI-3.2). This effect is not reminiscent of any
known paradigm. It implies that the gradual and differential development of choice con-
sistency across domains is not entirely explained by changes in attention, centration and
logical reasoning. As children learn to know what they like best and least, their pairwise
choices and explicit rankings become more consistent with each other. In the Appendix,
we present some further investigation of the determinants of choice consistency (SI-3.2,
SI-3.3, SI-3.4), the relationship between TV and attention control (SI-3.5), the relation-
ship between TV and transitive reasoning (SI-3.6) as well as an integrative analysis of TV
in each domain (SI-3.7)
8
4 Discussion
Decision-making in the Goods domain: the developmental template. For each age group,
the behavior observed in the Goods domain was consistent with the hypothesis that par-
ticipants make choices by estimating and comparing noisy values. Under that hypothesis,
decisions involving options close in value are confusing and prone to error, while decisions
involving options valued differently are easy to make. The developmental trajectory we
observed also suggests that the evaluation process becomes less and less noisy over time,
reducing the number of confusing decisions, and hence the number of violations, espe-
cially among options ranked very differently (Fig.3). This pattern of improvement implies
that the ability to make the simplest decisions is what solidifies in this developmental
window. Over time, children learn to know with more accuracy what they like most and
least. In addition, the fact that inconsistency was not associated with the ability to rea-
son transitively suggests that value-based reasoning requires the involvement of different
brain regions compared to logical reasoning (ventromedial prefrontal cortex in the first
case (Levy and Glimcher, 2012) and parietal regions in the second (Hinton, Dymond,
Von Hecker and Evans, 2010)). Furthermore, the association of inconsistency with age
and attentiveness suggests that improvements in decision-making in the Goods domain
are partly driven by known age-related changes in attentional control.
Decision-making in the Social domain: the effect of centration. Reasoning over multiple
attributes is known to be a difficult exercise for children 7 years of age and younger, and
this ability develops during the concrete operational stage, somewhere between 2nd and
5th grade (Piaget, 1952; Crain, 2015). The high utilization of simple policies by children
in the youngest age group is therefore consistent with centration. Participants who have
not yet overcome centration are more likely to pick a simple policy that makes them look
consistent to the outside observer. Among children who do not use simple policies, we
observe the same trajectory as in the Goods domain, implying that centration conceals
their underdeveloped decision-making system.
Decision-making in the Risk domain: too complex. As in the Social domain, making
choices in the Risk domain requires the evaluation of multiple attributes. It is therefore
natural to observe a similar tendency, especially among young children, to act according
to simple policies that makes them look consistent. However, integrative reasoning is
substantially more complex in the Risk domain and participants who do not use simple
policies are not nearly as consistent as in the other two domains. All children were at the
same level of performance suggesting that the ability to trade-off probabilities and rewards
was not yet developed, perhaps as a consequence of a still underdeveloped working memory
system (Gathercole, Pickering, Ambridge and Wearing, 2004).
9
A different learning trajectory across domains. The results obtained for participants
in the U age group are consistent with the typical findings in choice consistency literature:
that by adulthood, people have learned to know their preferences and are largely consistent
in the Goods (Battalio, Kagel, Winkler, Fisher, Basmann and Krasner, 1973; Cox, 1997)
and Social domains (Andreoni and Miller, 2002; Fisman, Kariv and Markovits, 2007).
Our findings in the Risk domain are in accordance with the original transitivity stud-
ies (Loomes, Starmer and Sugden, 1991), and less optimistic than recent GARP studies
(Mattei, 2000; Choi, Fisman, Gale and Kariv, 2007) perhaps due to differences in design.
More importantly, our study identifies different learning trajectories across domains. In
the Social domain, children learn to pick their most-preferred option. In the Risk domain,
children learn to avoid their least-preferred option. In the Goods domain, children learn
both. These asymmetries cannot be explained by the existence of an obvious best option
in the Social domain and an obvious worst option in the Risk domain, since the most and
least favorite items change significantly across age groups (see Appendix SI-3.4 for details).
It cannot be explained by both an obvious best and worst option in the Goods domain,
since the set of goods are different for different gender and age groups (see Appendix SI-1
for details). Moreover the goods ranked highest and lowest within each age-group vary
substantially across individuals. Finally, the development of attentional control cannot be
the main explanation either, since these should act similarly across domains. Overall, the
results suggest that self-knowledge of preferences follows its own developmental trajectory.
To conclude, we have investigated the developmental trajectories of transitive decision-
making in the Goods, Social, and Risk domains in children from Kindergarten to 5th
grade and compared consistency levels with adult-level performance. We have found that
transitivity in choices develops gradually, but differentially across domains. It is not
related to the development of transitive reasoning and it is only partially explained by
the development of attentional control. Instead, choice transitivity follows a common
trajectory across domains supported by the gradual improvement of self-knowledge of
preferences.
10
References
Andreoni, James and John Miller, “Giving According to GARP : An Experimental
Test of the Consistency of Preferences for Altruism,” Econometrica, 2002, 70 (2), 737–
753.
Astle, Duncan E. and Gaia Scerif, “Using developmental cognitive neuroscience to
study behavioral and attentional control,” Developmental Psychobiology, 2008, 51 (2),
107–118.
Battalio, Raymond C, John H Kagel, Robin C Winkler, Edwin B Fisher,
Robert L Basmann, and Leonard Krasner, “A test of consumer demand theory
using observations of individual consumer purchases,” Economic Inquiry, 1973, 11 (4),
411.
Bouwmeester, Samantha and Klaas Sijtsma, “Constructing a transitive reasoning
test for 6- to 13-year-old children,” European Journal of Psychological Assessment, 2006,
22 (4), 225–232.
Brocas, Isabelle, Juan D. Carrillo, T Dalton Combs, and Niree Kodaverdian,
“Consistency in Simple vs. Complex Choices over the Life Cycle,” CEPR discussion
paper 10457, 2015.
Choi, Syngjoo, Raymond Fisman, Douglas Gale, and Shachar Kariv, “Con-
sistency and Heterogeneity of Individual Behavior under Uncertainty,” The American
Economic Review, 2007, 97 (5), 1921–1938.
Cox, J.C., “On Testing the Utility Hypothesis.,” The Economic Journal, 1997, 107 (443),
1054–1078.
Crain, W., Theories of Development: Concepts and Applications, 6 ed., Psychology Press,
2015.
Davidson, Matthew C., Dima Amso, Loren Cruess Anderson, and Adele Dia-
mond, “Development of cognitive control and executive functions from 4 to 13 years:
Evidence from manipulations of memory, inhibition, and task switching,” Neuropsy-
chologia, 2006, 44 (11), 2037–2078.
Donaldson, Margaret, “Conservation - What is the question?,” British Journal of Psy-
chology, 1982, 23 (2), 199.
11
Fehr, Ernst, Helen Bernhard, and Bettina Rockenbach, “Egalatarianism in young
children,” Nature, 2008, 454 (28), 1079–1084.
Fisman, Raymond, Shachar Kariv, and Daniel Markovits, “Individual Preferences
for Giving,” The American Economic Review, 2007, 97 (5), 1858–1876.
Gathercole, Susan E., Susan J. Pickering, Benjamin Ambridge, and Hannah
Wearing, “The Structure of Working Memory From 4 to 15 Years of Age.,” Develop-
mental Psychology, 2004, 40 (2), 177–190.
Harbaugh, William T and Kate Krause, “Children’s Contributions in Public Good
Experiments: The Development of Altruistic and Free-riding Behaviors,” Economic
Inquiry, 2000, 38 (1), 95–109.
, , and Lise Vesterlund, “Risk Attitudes of Children and Adults : Choices Over
Small and Large Probability Gains and Losses,” Experimental Economics, 2002, 5 (1),
53–84.
, , and Timothy R Berry, “GARP for Kids : On the Development of Rational
Choice Behavior,” The American Economic Review, 2001, 91 (5), 1539–1545.
Hinton, EC, S Dymond, Ulrich Von Hecker, and Christopher John Evans,
“Neural correlates of relational reasoning and the symbolic distance effect: Involvement
of parietal cortex,” Neuroscience, 2010, 168 (1), 138–148.
Levy, Dino J. and Paul W. Glimcher, “The root of all value: A neural common
currency for choice,” Current Opinion in Neurobiology, 2012, 22 (6), 1027–1038.
List, John A and Daniel L Millimet, “The Market: Catalyst for Rationality and
Filter of Irrationality,” The BE Journal of Economic Analysis & Policy, 2008, 8 (1),
Article 47.
Loomes, Graham, Chris Starmer, and Robert Sugden, “Observing violations of
transitivity by experimental methods,” Econometrica, 1991, pp. 425–439.
Mattei, Aurelio, “Full-scale real tests of consumer behavior using experimental data,”
Journal of Economic Behavior & Organization, 2000, 43 (4), 487–497.
Piaget, J, “La psychologie de l’intelligence,” Revue Philosophique de Louvain, 1948, 46
(10), 225–227.
Piaget, Jean, “The child’s conception of numbers,” Trans. eds C. Gattegno and FM
Hodgson, NY: Routledge, 1952.
12
, David Elkind, and Anita Tenzer, Six psychological studies, New York, NY: Vintage
Books., 1967.
Reyna, Valerie F and Susan C Ellis, “Fuzzy-trace theory and framing effects in
children’s risky decision making,” Psychological Science, 1994, 5 (5), 275–279.
Samuelson, P A, “A Note on the Pure Theory of Consumer’s Behaviour,” Economica,
1938, 5 (17), 61–71.
Sher, Itai, Melissa Koenig, and Aldo Rustichini, “Children’s strategic theory of
mind,” Proceedings of the National Academy of Sciences, 2014, 111 (37), 13307–13312.
Smedslund, Jan, “Transitivity of preference patterns as seen by preschool children,”
Scandinavian journal of psychology, 1960, 1 (1), 49–54.
Varian, Hal R, “The Nonparametric Approach to Demand Analysis,” Econometrica,
1982, 50 (4), 945–973.
13
Supporting Information
SI-1. Design and procedures
The experiment was reviewed and approved by the IRB of the University of Southern
California. It was conducted through tablet computers and the tasks were programmed
with the Psychtoolbox software, an extension of Matlab.
Participants. We recruited 134 children from kindergarden (K) to 5th grade from
Lycee International of Los Angeles, a bilingual private school. We ran 18 sessions, each
with 5 to 10 subjects and lasting between 1 and 1.5 hours. Sessions were conducted in a
classroom at the school. In all of our tasks, children had to make choices between options
involving goods. To make sure that options were desirable by all participants, we organized
sessions by gender and age group: K to 2nd grade boys, K to 2nd grade girls, 3rd to 5th
grade boys, and 3rd to 5th grade girls. Goods included toys and stationary. As a control,
we ran 7 sessions with 51 undergraduate students (U). These were conducted in the Los
Angeles Behavioral Economics Laboratory (LABEL) in the department of Economics at
the University of Southern California. For the undergraduate population, participants
were recruited from the LABEL subject pool. Instead of toys or stationary, we used snack
foods. A description of the distribution of our participants is reported in Table 1.
K 1st 2nd 3rd 4th 5th U
Male 12 12 15 20 11 6 22Female 7 16 11 9 8 7 29
Total 19 28 26 29 19 13 51
Table 1: Description of participants by age and gender.
Each participant completed several tasks. In Ranking tasks, participants received 7
cards, each with a picture of one of the options. Participants were instructed to rank
these cards on a ranking board attached to their desk from most to least preferred. Once
a participant was finished ranking the options, an experimenter transferred her rankings
onto her tablet. There were three Ranking tasks. In the Goods-Ranking task, options
were toys. In the Social-Ranking task, options were sharing rules for oneself and another
child of the same age and gender in another school. The sharing rules contained different
numbers of a single toy. To ensure the toy was highly desirable, we selected the one
that was ranked as first favorite in the Goods-Ranking task. In the Risk-Ranking task,
options were lotteries. In each lottery, the participant could earn a given number of toys
with a given probability. Probability was represented by a circle with outcomes shown
14
in white and green, with the green area corresponding to the probability of winning the
good(s). Each circle corresponded to a spinner wheel (previous experiments on risk with
children have successfully employed a similar design with a spinner wheel (Reyna and Ellis,
1994; Harbaugh et al., 2002)). Again, to ensure that options were desirable, we chose the
toy that was ranked second favorite in the Goods-Ranking task.
In Choice tasks, participants were presented with all 21 pairwise combinations of the
7 options in the corresponding Ranking task. We will refer to these tasks as Goods-
Choice task, Social-Choice task and Risk-Choice task. In each trial, one option was
displayed on the left side and another on the right of the tablet’s screen. A participant
could select their preferred option (by touching a button above the left or right options
on their screen), or report to be indifferent between them (by touching a button between
the two options). The 21 pairs were presented randomly, both in terms of trial order and
in terms of left-right presentation. These tasks are represented in Fig.1 and the specific
options are displayed in Figs. 6 (Goods), 7 (Social), and 8 (Risk). In each Choice task, the
22nd trial was an attention trial, where subjects were presented with their most frequently
and least frequently chosen options from the preceding 21 trials.1
Figure 6: Toys used in the Goods-Choice and Goods-Ranking tasks. Childrenof different age and gender were presented different toys.
For analysis, participants also completed a Transitive Reasoning task. This task was
designed to measure levels of transitive reasoning. Several tasks have been proposed in
the literature (Bouwmeester and Sijtsma, 2006). In order to test the relation between
1As the 21 pairwise choices were being made, the computer awarded 1 point to the option selected bythe participant and 0.5 points to each option in case of indifference. At the end of the 21st trial, the tallyfor each option was summed and the options with the most and least points were determined. In case ofa tie, one of the options was chosen randomly.
15
Figure 7: Sharing rules between self and other used in the Social-Choice andSocial-Ranking tasks. For children, each circle represented one toy, personalized toensure desirability. For undergraduate students, each token represented $2.
Figure 8: Lotteries used in the Risk-Choice and Risk-Ranking tasks. For chil-dren, each circle represented one toy, personalized to ensure desirability. For undergradu-ate students, each token represented $2.
transitive choice and transitive reasoning devoid of memory or operational reasoning re-
quirements, we opted for a new design; we constructed a test that does not require mem-
ory and is visually represented. The task consisted of seven questions of varying difficulty.
Each of the seven questions consisted of two premises represented in two vignettes, and a
third vignette with a response prompt. For each premise, participants were told that the
animals shown in the vignette were at a party and the oldest wore a hat. They had to
determine which animal in the third vignette should wear the hat. This is illustrated in
Fig.9. Three of the seven questions did not require transitive reasoning and were included
to test whether participants were paying attention. These are referred to as “pseudotransi-
tivity” trials (Bouwmeester and Sijtsma, 2006). The remaining four did require transitive
reasoning. Of the four transitivity questions, two of them were less difficult and two were
more difficult.
All subjects completed the tasks in the same order: (1) Goods-Choice task, (2) Goods-
Ranking task, (3) Transitive Reasoning task, (4) Social-Choice task, (5) Social-Ranking
task, (6) Risk-Choice task and (7) Risk-Ranking task. All tasks were untimed. Upon
the completion of a given task by all subjects, instructions for the next task were given.
To incentivize participants, we told them that it was important to choose options they
liked, because their choices would determine what they would receive at the end of the
16
Figure 9: Transitive Reasoning Task. The animal wearing the hat is the oldest ineach vignette on the left. The participant had to answer in the vignette on the bottomright by choosing the animal he thought was the oldest given the information on the left,or by reporting it could not be known (?).
experiment. This was explained accessibly through a simple analogy. From each of the
Choice tasks, one choice was randomly selected by the computer and subjects received their
selection in that trial. For the Social-Choice task, we explained that each participant had
been paired with another student, of their same gender and grade level, from another school
in Los Angeles (Undergraduate were paired with a subject of another session, matched
for gender). We explained that the selected sharing rule would actually be implemented:
we would go to the other school and deliver to the other student the goods that were
represented in the selected option. The shared items were delivered to Foshay elementary
school, a public school in LAUSD. For the Risk-Choice task, we explained that their choice
in the selected trial would be implemented and that the corresponding spinner wheel would
be spun by a blindfolded assistant at the end of the session that day. If the spinner arrow
landed in the green part of the wheel, the subject would win the number of toys (or
money, for the case of undergraduates) associated to that choice. Otherwise, they would
not win anything from this task. This was implemented at the end of each session. Before
leaving, we collected demographic information consisting of “gender,” “grade,” “number
of younger siblings,” and “number of older siblings.” All participants also received a fixed
show-up fee. Children received their highest ranked item in the Goods-Ranking task while
Undergraduate students received $5.
17
SI-2. Analysis of transitivity violations
We collected data from all participants in the Goods-Choice and Goods-Ranking tasks.
The tablets did not record the choices of two subjects in the Risk-Choice and Risk-Ranking
tasks and one subject in the Social-Choice and Social-Ranking tasks.
Transitivity violations. As briefly explained in the main text, we measured consistency
by counting the number of transitivity violations between all triplets of options: A vs.
B, B vs. C and A vs. C. With 7 options, there are 35 triplets to consider. Importantly,
considering triplets is enough to account for all TV. Indeed, it is trivial to show that a
violation involving four or more items (such as, for example, A chosen over B, B over C,
C over D and D over A) necessarily involves at least one violation between three items of
that set. Also, given we allowed participants to express indifference between options, we
allocated 0.5 violation to triplets involving weak violations, that is, when we observed A
chosen over B, B chosen over C, and A indifferent to C. We first counted the number of
times choices between triplets of items were intransitive for each subject in each Choice
task. Then, we computed the average number of violations by age group and Choice task.
We reported the results in Fig.2.
In the Goods domain, we found a significant improvement between consecutive age
groups, with 4th-5th being at the same level of performance as U subjects. More specifi-
cally, participants in K-1st had significantly more violations than participants in 2nd-3rd
(p-value = 0.0050), participants in 2nd-3rd had significantly more violations than par-
ticipants in 4th-5th (p-value = 0.0355), but violations of 4th-5th and U groups were
not significantly different (p-value = 0.0835). In the Social domain, a similar pattern was
observed. TV were significantly different between the K-1st age group and the 4th-5th
age group (p-value = 0.0162), between the K-1st age group and U age group (p-value
= 0.0002) and between the 2nd-3rd age group and the U age group (p-value = 0.0048),
while all other differences in TV were not statistically significantly. In the Risk domain
however, there was no apparent improvement. No age group of children had significantly
more violations than U.
When we compared TV across domains, we found that participants in the K-1st
age group had significantly more violations in the Goods than in the Social domain (p
= 0.0009). A similar result held for the 2nd-3rd age group (p = 0.0117). Participants
in the U group also had significantly more violations in the Risk than in the Goods (p
< 0.0001) or Social domains (p < 0.0001).
Analysis of transitivity in the Goods domain. We used the explicit ranking elicited
in the Goods-Ranking task to assess the frequency with which items were involved in
transitive violations as a function of their ranking. For each transitivity violating triplet,
we assigned a score of 1 to its 3 corresponding items. Each item was then allocated
18
the percentage of times it was involved in a violation (the number of times it obtained
a score of 1 over all possible times). By repeating this exercise for all participants and
then averaging over all of them, we obtained the percentage of times violations occurred
in choices between options ranked x and y in each age group. Naturally, ranks x and y
corresponded to different specific items since each participant had her own rankings of
options. Fig.3 represents the color-coded result of this exercise. Darker colors are used to
represent fewer violations.
Next, we studied the marginal sensitivity to violations to learn if some choices were
more consistent than others, and if so, why. Intuitively when choices are easy to make,
inconsistencies are more unlikely. Easiness might just be a matter of picking what we like.
Or it might be a matter of avoiding what we do not like. Assuming that ranks derived from
the Ranking tasks give a proxy for ordinal values, the more (less) valuable options should
be those with higher (lower) ranks. If consistency is only driven by choosing what is liked,
it should not matter what the lower-ranked option is, and if consistency is only driven by
avoiding what is liked least, then it should not matter what the higher-ranked option is.
We tested this conjecture by analyzing how consistency changes if we marginally change
the rank of the higher- or lower-ranked option. Intuitively, in Fig.3, if participants are
making choices in a given cell and exhibit a certain level of consistency, how much more
(or less) consistent they become by moving one cell to the right (changing the rank of the
higher-ranked item but not the lower-ranked item) or by moving one cell up (changing the
rank of the lower-ranked item but not the higher-ranked item).
More specifically, we considered every pair of adjacent boxes in the same row and we
determined the difference in violations between a box and the box to the right of it. We
then computed the average difference over all pairs of boxes and reported this number as
the left-right gradient of the vector in the corner of Fig.3 for each age group. A vector
with a larger x-coordinate means a greater increase in violations when the higher-ranked
option is decreased by one rank. We did the same with every pair of adjacent boxes in the
same column to determine the up-down gradient. In this case, a vector with a larger y-
coordinate means a greater increase in violations when the lower-ranked option is increased
by one rank. As expected, the vector of all four age groups have upper right gradients,
indicating more violations when options are ranked more closely in either dimension (lower
rank for the higher option or higher rank for the lower option). It also implies that the
highest number of violations occurs between items of intermediate ranks (3rd, 4th, 5th).
The left graph of Fig.10 presents a heatmap of all vectors in the Goods domain and
confirms the increase in violations when the higher- and lower-ranked options are closer
to each other. Finally, a t-test confirms that both the x- and y-coordinates are positive
and significantly different from 0 (p-value < 0.01) overall and for all age groups, with the
19
exception of the x-coordinate in the K-1st age group. Overall, in the Goods domain, all
groups were significantly driven by the rank of the lower-ranked option, and all groups
except K-1st were driven by the rank of the higher-ranked option.
Goods Social Risk
Figure 10: Sensitivity analysis
Analysis of transitivity in the Social domain. An option in the Social domain can
be decomposed into three attributes that can be visually assessed through counting: the
reward to self, the reward to other, and the total reward. When comparing attributes,
a participant can use three simple rules: pick the maximum, pick the minimum, or be
indifferent. A participant could apply a rule to a single attribute (a policy such as “pick
option with maximum reward for self”); a participant could alternatively apply a rule to
one attribute then move to a second attribute and apply another rule (a policy such as
“pick option with maximum reward for self and if they are the same, pick option with
maximum total reward”). We looked at all such policies, and only 3 were used by subjects:
“maximize amount for self, then minimize amount for other,” “maximize amount for self,
then maximize amount for other” and “maximize amount for self, irrespective of amount
for other.” We counted all participants who complied with any such policy and those who
made exactly one mistake with respect to it. We will call these participants simple policy
users. Among them, 61% chose the policy “maximize amount for self, then minimize
amount for other.” We removed all simple policy users from our sample, leaving us 125
participants. We computed TV by age group after the exclusion and found that the results
were now similar to those obtained in the Goods domain. The development of consistency
followed the same pattern: participants in the K-1st age group were significantly more
inconsistent than all other participants (all p-values < 0.01). Participants in the 2nd-3rd
age group had significantly more violations than participants in the U age group (p-value
= 0.0144) and participants in the 4th-5th age group were not significantly different from
participants in the U age group (p-value = 0.1205). This is represented in Fig.4 (left).
20
We repeated the heatmap analysis we performed on the Goods-Choice task. The result is
represented in Fig.5(A).
We also performed the same marginal sensitivity analysis as in the Goods domain after
removing simple policy users. The x-coordinates were positive and significantly different
from 0 (p-value < 0.01), overall and for all age groups except for participants in the K-
1st age group. However, and contrary to the Goods domain, the y-coordinates were not
significantly different from 0 for any age group. This suggests that participants had a very
clear idea of what they liked best but a much less clear idea of what they liked least in the
Social domain. The result can also be seen in the heatmap representation of the vectors
(Fig.10, center).
Analysis of transitivity in the Risk domain. The procedure to elicit simple policies
was the same. In the Risk domain, there are two attributes per option, reward amount
and probability, and three simple rules, pick the maximum, pick the minimum, or be
indifferent. We defined 6 policies, but only 4 were ever used by subjects: “maximize
the amount, irrespective of the probability,” “maximize the probability, then maximize
the amount,” “minimize the amount, irrespective of the probability,” and “minimize the
probability, irrespective of the amount.” We counted all participants who complied exactly
with any such policy as well as those who made exactly one mistake with respect to a policy.
Among those, 89% chose the policy “maximize the amount irrespective of the probability.”
We removed from our sample all participants whose behavior was consistent with a simple
policy, leaving us 128 subjects. We analyzed TV again and found that violations by the U
group were different than those by any younger group (p-value = 0.0019 for the comparison
with K-1st, p-value = 0.0050 for the comparison with 2nd-3rd, and p-value = 0.0153
for the comparison with 4th-5th), but that all school-age groups performed at similar
levels from each other. This is represented in Fig.4 (right). We also repeated the heatmap
analysis. The result is represented in Figure 5(B).
Again, we performed a marginal sensitivity analysis after removing simple policy users
and found the opposite result than in the Social domain: y-coordinates were significantly
different from 0 overall and for two age groups (4th-5th and U, t-test p-values < 0.05),
whereas x-coordinates were not different from 0 in any age group (Fig.10, right). This
means that in the Risk domain, participants had a clear idea of what they liked least but
a much less clear idea of what they liked best.
Transitivity violations across domains. The aggregate analysis of TV indicated that
consistency improves with age. We addressed the question of how individual scores vary
across domains to assess whether participants who commited relatively more violations in
one domain were also those who committed relatively more violations in a different domain.
Said differently, we wanted to know whether consistency was driven (at least partially) by
21
a common factor or whether it resulted from the development of domain-specific skills.
When considering the full sample, we found that TV in the Goods and Risk domains were
not correlated with one another, while TV in the Goods and Social or TV in the Risk
and Social domains were (Pearson coefficient = 0.38 and 0.30, respectively; p-value 0.0001
for both). When we removed simple policy users, we found that TV were significantly
correlated across all domains (Pearson coefficient = 0.36, p-value = 0.0006 between the
Goods and Risk domains, Pearson coefficient = 0.43, p-value < 0.0001 between the Goods
and Social domains, and Pearson coefficient = 0.49, p-value < 0.0001 between the Risk
and Social domains) suggesting that participants’ consistency was partially driven by the
development of a skill useful in all domains.
Comparison with random play. To assess whether participants committing many vio-
lations might have been acting randomly, we simulated random players. Depending on the
probability we assigned to their expressing indifference, the number of TV was between 8
and 10, substantially above the actual numbers obtained even among the most inconsistent
participants. This was in line with earlier literature on consistency in children (Harbaugh
et al., 2001).
22
SI-3. Additional Statistical Analysis
SI-3.1. Other measures of choice inconsistencies
Choice reversals and choice removals. Number of violations is only one possible way
to address the issue of consistency. An alternative way is to look at severity of violations.
We computed two measures of violation severity for each subject. The first one counts
the number of choices that need to be reversed to restore transitivity. To compute that
number, an algorithm sequentially changes the choices made in each trial, and computes
a TV score after each change. If the score is 0 at any point, the algorithm stops. After
all single trials have been exhausted, the algorithm repeats the procedure over pairs of
trials, then triplets and so on. We found that choice reversals followed a very similar
pattern across age groups as TV (Fig.11, left), and were highly correlated to it in all three
Choice tasks: Goods (Pearson = 0.95, Spearman = 0.98, p < 0.0001), Social (Pearson
= 0.93, Spearman = 0.98, p < 0.0001), and Risk (Pearson = 0.92, Spearman = 0.95, p
< 0.0001). The second of these severity measures counts the number of choices that need
to be removed to restore transitivity. We used an algorithm similar to that for counting
choice reversals. We found that choice removals and choice reversal were almost identical
in capturing severity of violations (Fig.11 right and left), so this measure also closely
reflected the age patterns observed with TV in the three Choice tasks.
Cho
ice
Reve
rsal
s pe
r Sub
ject
K & 1st 2nd & 3rd 4th & 5th U
Age Group
GoodsSocialRisk
Choice-Task
K & 1st 2nd & 3rd 4th & 5th U
Age Group
Cho
ice
Rem
oval
s pe
r Sub
ject
GoodsSocialRisk
Choice-Task
Figure 11: Analysis of severity of violations. Left: choice reversals across domainsand age. Right: Choice removals across domains and age.
Implicit ranking and choice noisiness. From a subject’s selections in a Choice task, it
is possible to extract their implicit (or revealed) ranking for the options in that task. We
23
computed this revealed ranking in a very simple way, by tallying their selections “for” and
“against” each of the options as briefly explained in footnote 1: each time the participant
selected an option, 1 point was added to the running tally of that option and when the
participant expressed indifference, 0.5 points were added. The tallied points for each option
were summed, and the options were ordered according to this sum, giving the subject’s
implicit ranking. The subject’s choices were then checked for inconsistencies against their
implicit ranking using the following simple rule. Suppose that, according to the tally of
the implicit ranking, option B was ranked lower than A. The participant would receive a
score of 0, if A was chosen over B (showing consistency with the implicit ranking), a score
of 0.5 if he expressed indifference between A and B, and a score of 1 if B was chosen over
A (showing inconsistency with the implicit ranking). We called this classification error
score ICR (Inconsistency in Choices given the Revealed ranking). Notwithstanding the
endogeneity issues with this measure (we use choices to extract a revealed ranking then
check that ranking against the very choices that were used to create it), it still provides
a measure of choice “noisiness.” As can be seen by comparing Fig.2 with Fig.12 (left),
the age pattern of choice inconsistencies with implicit rankings (ICR) closely resembles
that of TV: participants who make more TV are those who are more “noisy” (make more
mistakes) around their implicit ranking.
Explicit rankings and choices. As an additional measure, we evaluated inconsistencies
between choices in Choice tasks and the explicit rankings elicited in Ranking tasks. This
measure was computed in a similar way as for inconsistencies with respect to implicit
rankings, that is, we checked the subject’s choices for inconsistencies against their explicit
ranking. We called ICE (Inconsistency in Choices given the Explicit ranking) the new
classification error score. One advantage of this measure over the previous one is the
absence of endogeneity (errors are computed from rankings in two different tasks). The
results are illustrated in Fig.12 (right).
In the Goods domain, we found that participants in the K-1st and 2nd-3rd age
groups had relatively more difficulty making choices consistent with their explicit rankings
compared to older participants (p-values < 0.0167 for all comparisons). A similar story
held qualitatively in the Social domain, except that the participants in the K-1st and
2nd-3rd age groups were not significantly different from each other, nor were the older
participants in groups 4th-5th and U. It suggests a less gradual development of the ability
to choose from an explicit ranking. We found that inconsistencies in the Social domain were
following the same trend as inconsistencies in the Goods domain after removing subjects
who used simple policies. In the Risk domain, we found that the level of inconsistencies
was high and similar across all subjects: they were not able to make choices consistent with
their explicit rankings. This result changed when we removed subjects who used simple
24
policies: they were becoming more capable over time to express their explicit rankings in
their choices. Nevertheless, the level of inconsistencies remained higher than in the Goods
domain for all ages.
K & 1st 2nd & 3rd 4th & 5th U
Age Group
ICR
per S
ubje
ct
GoodsSocialRisk
Choice-Task
K & 1st 2nd & 3rd 4th & 5th U
Age GroupIC
E pe
r Sub
ject
GoodsSocialRisk
Choice-Task
Figure 12: Ranking and choices. Left: Inconsistencies with respect to implicit ranking(ICR). Right: Inconsistencies with respect to explicit ranking (ICE).
Explicit vs. implicit ranking. The implicit ranking is revealed by choices and does not
need to coincide with the explicit ranking elicited in a Ranking task. For each participant
and task, we computed a measure of distance between those rankings. We used a very
simple procedure where we assigned to each option a score representing the absolute dif-
ference between its rank in the implicit and explicit rankings (e.g., if A was ranked 3rd
according to the explicit ranking and 2nd according to the implicit ranking, its score was
|3− 2| = 1). We then computed the average score of all options as each subject’s discrep-
ancy score. Fig.13 depicts those scores by domain and age group. As can be seen from the
graph, the patterns are similar to TV. In the Goods domain, the ability to choose accord-
ing to one’s explicitly disclosed preferences was found to gradually develop. The same is
true in the Social domain for age groups 2nd-3rd and above. By contrast, discrepancies
remained constant with age in the Risk domain, suggesting a persistent inability to draw
choices from explicit preferences. Overall, the result reinforces the idea that preferences
are not well established in young children, with the corresponding consequences in terms
of transitivity violations as well as discrepancies between pairwise choices and explicit
rankings.
Relationship between measures of consistency. Not surprisingly given our previous
25
K & 1st 2nd & 3rd 4th & 5th U
Age Group
Disc
repa
ncy
Scor
e pe
r Sub
ject
GoodsSocialRisk
Choice-Task
Figure 13: Discrepancies between explicit and implicit rankings (all subjects).
findings, our measures of consistency were all highly correlated. Indeed, a high number of
transitivity violations (TV) was associated with a high number of inconsistencies between
explicit rankings and choices (ICE) in the Goods domain (Pearson = 0.79), in the Social
domain (Pearson = 0.69) and in the Risk domain (Pearson = 0.72) with all p-values
< 0.0001. A high number of TV was also associated with a high discrepancy score between
implicit and explicit ranking in the Goods domain (Pearson = 0.61), in the Social domain
(Pearson = 0.54) and in the Risk domain (Pearson = 0.60), again with all p-values <
0.0001. Last, TV was also highly correlated with classification error score (ICR) in the
Goods domain (Pearson = 0.95), in the Social domain (Pearson = 0.93) and in the Risk
domain (Pearson = 0.93), all p-values < 0.0001. The results were very similar when we
removed subjects who used simple policies.
Overall, although we feel that TV is the best measure of choice consistency, the main
conclusion of SI3.1 is that the results presented in the main text are robust both when
we consider severity instead of number of violations and also when we use different in-
consistency measures, such as ICE or ICR: intransitivity is invariably associated with the
inability to make choices consistent with rankings, both implicit and explicit, and these
have different developmental signatures across domains.
SI-3.2. Other analysis of choices
Indifference. We checked whether age and choice domain had an influence on the
tendency to be indifferent. The results are reported in Table 2. In the Goods domain, the
26
number of indifferent choices decreased over time (unpaired t-test, p-values < 0.05 between
K-1st and 4th-5th, between K-1st and U, and between 2nd-3rd and U). Participants
were less often indifferent in the Social domain than the Goods domain (paired t-test, p-
values < 0.05 for age groups K-1st and U), and school-age children in the Social domain
were more often indifferent than participants in the U age group (unpaired t-test, p-values
< 0.05). Finally, in the Risk domain we observed no trend in reducing indifferences with
age.
Goods Social Risk
K-1st 3.81 (0.54) 2.04 (0.47) 1.53 (0.59)2nd-3rd 3.09 (0.47) 2.47 (0.54) 1.31 (0.45)4th-5th 2.19 (0.42) 2.50 (0.60) 1.94 (0.53)U 1.86 (0.23) 1.12 (0.28) 1.49 (0.35)
(standard errors in parenthesis)
Table 2: Average number of indifferences.
At the same time, we also found that triplets involving indifferent choices were less
likely to result in TV in all domains for the whole sample (t-tests, p-values < 0.0001), as
well as for each age group (p-values < 0.0001).
Reaction times and choices. Reaction times in trials involving violations were longer
compared to reaction times in trials involving no violations. This was true in all age
groups and in all domains (KS test, p-value = 0.015 for K-1st in the Risk domain, p-
value = 0.010 for 2nd-3rd in the Risk domain and p-value < 0.0001 for all other age
groups and domains), suggesting that participants who were more confused, took longer
to choose an ended up contradicting their choices. We also found that reaction times were
longer when participants pressed the indifference button compared to when they made a
left or right selection (KS test, p-value < 0.0001 for age groups above 2nd-3rd), consistent
with the idea that choices were more difficult to make when options were close in value.
Last, simple policy users were faster than the other subjects (KS test, p-value < 0.0001),
indicating that their choice process was simpler.
SI-3.3. Simple policy users
From the 59 subjects who used simple policies in the Social domain and the 55 who
used simple policies in the Risk domain, we found that only 19 used simple policies in
both domains. We also compared TV in the Goods domain (where no simple policies
were available) by subjects who did and did not use simple policies in the other domains
and found no significant differences. Therefore, centration and the development of the
27
value-based decision-making system appeared to be uncorrelated. However, we found
that simple policy users were very distinguishable from the other participants in terms
of discrepancies between implicit and explicit rankings (t-test, p-value < 0.0001 in both
Risk and Social). This means that they were explicitly revealing preferences that were not
supported by their choices, that is, by their implicit rankings.
SI-3.4. Evolution of preferences
Evolution of preferences in the Social domain. The most common of the simple policies
employed by our participants was to maximize the reward for self, but its predominance
changed over time. Indeed, the preference for giving seemed to be developing: small
children tended to maximize their own reward systematically and, other things being equal,
they also preferred smaller rewards for others. Older participants however selected larger
rewards for others. We did not find any evidence that participants were maximizing the
payoffs of both participants. However, we found that they came closer to that policy with
age, with participants in the 4th-5th age group being similarly close to it as participants
in the U group.
The most and least favorite options, as revealed by implicit rankings, also changed
over time. Indeed, for all ages, (4,0) and (3,3) were the most popular, but the frequency of
(4,0) decreased while the frequency of (3,3) increased with age. Similarly, (0,4) and (0,5)
were the least popular but the frequency of (0,5) decreased while the frequency of (0,4)
increased with age. The results are summarized in Table 3.
Most favorite Least favorite(4,0) (3,3) (0,4) (0,5)
K-1st 0.65 0.33 0.43 0.772nd-3rd 0.40 0.58 0.31 0.784th-5th 0.50 0.50 0.50 0.72U 0.33 0.73 0.80 0.30
Table 3: Most and least favorite options in Social domain derived from implicit rankings
Our Social-Choice task was also rich enough to permit a study of the evolution of
other regarding preferences. In particular, it is possible to study whether participants
were prosocial (chose (3,3) over (3,1)), willing to share (chose (3,1) over (4,0)) or envious
(chose (0,4) over (0,5)). In other words, we can conduct a similar analysis as in Fehr
et al. (2008), and determine a type for each subject as a function of the decisions in
these three pairs of choices. We defined the same five types as in Fehr et al. (2008):
“strongly egalitarian” (choices (3,3); (3,1); (0.4)), “weakly egalitarian” (choices (3,3);
28
(4,0); (0,4)), “strongly generous” (choices (3,3); (3,1); (0,5)), “weakly generous” (choices
(3,3); (4,0); (0,5)), and “spiteful” (choices (3,1); (4,0); (0,4)). To make it comparable with
the literature, we excluded subjects who expressed one or more indifferences. In Fig.14, we
report the proportion of subjects in each of these five categories (as in Fehr et al. (2008),
the remaining subjects are those who do not belong to any of them).
0
10
20
30
40
50
60
70
80
90
100
3y-4y 5y-6y 7y-8y
Fehretal.(2008)
0
10
20
30
40
50
60
70
80
90
100
K-1st 2nd-3rd 4th-5th U
Thisstudy
Spiteful
Weaklygenerous
Stronglygenerous
Weaklyegalitarian
Stronglyegalitarian
Figure 14: Evolution of other-regarding preferences.
Notice that the results with the aforementioned paper are not directly comparable: ages
overlap, reward structures are not identical and options are also different (e.g., choosing
between (1,1) and (1,2) captures a slightly different aspect of envy than choosing between
(0,4) and (0,5)). However, if we compare 5y-6y with K-1st and 7y-8y with 2nd-3rd we
notice that outcomes are similar across studies. Consistent with the centration hypothesis,
many young children are spiteful. As they grow, they develop some integrative reasoning
and become more egalitarian. Our oldest participants are predominantly generous.
Evolution of preferences in the Risk domain. The most common of the simple policies
employed by our participants was to maximize reward but, as in the Social domain, its
predominance changed over time. More than 20% of participants in the K-1st and 2nd-
3rd age groups used it against less than 10% in the 4th-5th and U age groups. We
chose maximization of expected value, E(V), as a template of integrative reasoning, but
strictly speaking no participant maximized E(V). Only two subjects were one step away
from that integrative policy and they both had some TV. For each participant, we counted
the number of choices that maximized E(V) and averaged this count across participants
in each Choice task and each age group. We found that participants in the K-1st, 2nd-
3rd and 4th-5th age groups were making significantly fewer choices consistent with the
maximization of E(V) compared to participants in the U age group. In particular, after
removing simple policy users, each group of children used policies that were farther away
29
from E(V) compared to participants in the U age group (t-test, p-values < 0.005).
When looking at the favorite option revealed by implicit rankings, we found that chil-
dren were transitioning gradually from the option involving the largest quantity (12, 12.5%)
to the option exhibiting the largest expected value (5, 50%), as depicted in Table 4. In-
terestingly, option (12, 12.5%) was ranked first by many and, at the same time, last by
others.
(12,12.5%) (5,50%)
K-1st 0.60 0.132nd-3rd 0.44 0.354th-5th 0.28 0.53U 0.16 0.82
Table 4: Favorite option in the Risk domain derived from implicit rankings
The results taken together showed that behavior was changing from choices consistent
with very simple policies to choices resulting from trade-offs and integrative thinking. The
centration effect observed in young participants made them appear selfish and consistent
in the Social domain and risk-loving and consistent in the Risk domain. These attitudes
gradually changed with age.
Finally, one could argue that a main result of the paper, namely the idea that partici-
pants learn to know what they like most in the Social domain whereas they learn to know
what they liked least in the Risk domain, was simply due to the fact that there was a
clear best option in Social and a clear worst option in Risk (despite our attempts to have
reasonably balanced options). This simple explanation would be inconsistent with the
fact that the most and least preferred options in Social and Risk change significantly with
age. Instead, we claim the existence of different developmental signatures across domains
regarding the subjects’ learning about their own preferences.
SI-3.5. Attention trials
Remember that after all 21 paired comparisons in each Choice task we included an
“attention trial”, that is, a last choice between the subject’s most- and least-favorite
options, as revealed by their implicit ranking. In each attention trial, subjects received an
inattention score of 1, 0.5, or 0 if they selected their least-favorite option, the indifference
button, or their most-favorite option, respectively. We found that most children were
attentive: 70% in K-1st, 76% in 2nd-3rd, 84% in 4th-5th and 100% U answered all
attention trials correctly. Most of those who failed got a total inattention score of 0.5, and
no children failed all 3 trials. We also found that performance on attention trials and TV
30
was correlated in all three tasks: Goods-Choice task (Pearson = 0.54, Spearman = 0.43),
Social-Choice task (Pearson = 0.40, Spearman = 0.20), and Risk-Choice task (Pearson =
0.36, Spearman = 0.32), all p-values < 0.01. It suggests a relationship between ability to
choose consistently and attention mechanisms. In the same lines, discrepancies between
explicit and implicit rankings were correlated in all domains with attention trials: Goods
domain (Pearson = 0.38, Spearman = 0.34) Social domain (Pearson = 0.32, Spearman =
0.21) and Risk domain (Pearson = 0.27, Spearman = 0.30), all p-values < 0.01.
These results imply that attentiveness as measured by attention trials was a predictor
of intransitivity and it was also associated with the ability to choose according to explicit
rankings.
SI-3.6. Transitive reasoning
We counted for each participant the number of mistakes accumulated in each level of
difficulty of the transitive reasoning task. In all three levels, the K-1st group, 2nd-3rd
group, and 4th-5th group accrued significantly more errors than the U group (p-value
< 0.001, p-value < 0.05, and p-value < 0.05, respectively). Participants in the K-1st
group made more mistakes on the most difficult reasoning trials than they did on the easy
or medium trials (p-value = 0.02 and p-value = 0.0001, respectively). Within the other
age groups the average error counts were similar across trial difficulty (with the exception
of the 2nd-3rd group which had more mistakes on easy than on difficult trials (p-value
= 0.02)).
Performance in the transitive reasoning task was correlated with attentiveness. This
was true for the most difficult trials (Pearson = 0.20, p-value < 0.01) as well as for all
levels of difficulty together (Pearson = 0.22, p-value < 0.01). We also found that it was
correlated with the level of discrepancies between implicit and explicit rankings in all
domains: Goods domain (Pearson = 0.33, p-value < 0.0001), Social domain (Pearson =
0.20, p-value < 0.01) and Risk domain (Pearson = 0.17, p-value < 0.05).
Overall, transitive reasoning was associated with the same explanatory variables as
transitive decision-making.
SI-3.7. The determinants of transitive choices
Relationship between TV and demographic variables. Given TV covaries across do-
mains, we looked for possible common explanatory variables of intransitivity. To this end,
we ran OLS regressions treating TV in each domain as the variable to be explained by a
set of characteristics. Those included age group, gender, number of younger siblings, and
number of older siblings. We found that the only significant explanatory variable for TV
was age group. Moving from one age group to the next was associated with a decreased
31
number of TV in the Goods-Choice task (p-value < 0.001) and in the Social-Choice task
(p-value < 0.001) but not in the Risk-Choice task (p-value = 0.921). The results were
unchanged when we removed subjects who used simple policies.
Relationship between TV and developmental variables. We ran OLS regressions on
the full sample to assess the explanatory power of mistakes in transitive reasoning on
transitivity violations in each domain. We found that mistakes in transitive reasoning (‘TR
mistakes’) were not associated with TV in the Risk domain. They were correlated with
TV in the Goods and Social domains, but significance levels dropped as we controlled for
other explanatory variables such as age group (‘Dummies K-1, 2-3, 4-5’) and attentiveness
(‘Attention trials’). These however were highly significant as well as the tendency to use
simple policies (‘Simple policy’). The results are reported in Tables 5, 6 and 7.
Model 1 Model 2 Model 3 Model 4
TR mistakes 0.878∗∗∗ 0.393∗∗ 0.297∗ 0.188Dummy K-1 − 3.049∗∗∗ 2.497∗∗∗ 1.376∗∗
Dummy 2-3 − 1.383∗∗ 0.824 0.221Dummy 4-5 − 0.146 -0.139 -0.275Attention trials − − 3.460∗∗∗ 2.565∗∗∗
Discrepancies − − − 2.397∗∗∗
Constant 1.359∗∗∗ 0.693∗ 0.708∗∗ -0.403
# obs 185 185 185 185R2 0.127 0.242 0.359 0.485
Significance levels: ∗ = 0.05, ∗∗ = 0.01, ∗∗∗ = 0.001.
Table 5: OLS regression of transitivity violations in the Goods domain
Overall, TV was best predicted in all domains by the performance in attention trials
and the ability to make choices consistent with explicit rankings (specifically, to have
similar implicit and explicit rankings, as captured by ‘Discrepancies’). It was further
predicted by centration (’Simple policy’) in the Social and Risk domains. Transitive
reasoning, though correlated with TV, failed to predict any TV result after controlling for
these other explanatory variables.
32
Model 1 Model 2 Model 3 Model 4 Model 5
TR mistakes 0.359∗∗∗ 0.115 0.069 0.016 -0.018Dummy K-1 − 1.492∗∗∗ 1.176∗∗ 0.775∗ 1.010∗∗
Dummy 2-3 − 0.738∗ 0.418 -0.034 0.023Dummy 4-5 − 0.215 0.048 0.031 0.122Attention trials − − 1.937∗∗∗ 1.330∗∗∗ 1.311∗∗∗
Discrepancies − − − 1.654∗∗∗ 1.250∗∗∗
Simple policy − − − − -1.003∗∗∗
Constant 0.894∗∗∗ 0.531∗ 0.538∗∗ -0.213 0.331
# obs 184 184 184 184 184R2 0.045 0.103 0.187 0.364 0.379
Significance levels: ∗ = 0.05, ∗∗ = 0.01, ∗∗∗ = 0.001.
Table 6: OLS regression of transitivity violations in the Social domain
Model 1 Model 2 Model 3 Model 4 Model 5
TR mistakes 0.216 0.265 0.179 0.027 0.006Dummy K-1 − -0.358 -0.957 -0.947 0.234Dummy 2-3 − 0.069 -0.523 -0.149 0.657Dummy 4-5 − 0.765 0.457 0.598 1.007∗
Attention trials − − 3.579∗∗∗ 1.796∗∗∗ 1.690∗∗∗
Discrepancies − − − 2.237∗∗∗ 1.559∗∗∗
Simple policy − − − − -2.255∗∗∗
Constant 2.510∗∗∗ 2.390∗∗∗ 2.403∗∗∗ 0.592 1.328∗∗∗
# obs 183 183 183 183 183R2 0.008 0.022 0.157 0.416 0.471
Significance levels: ∗ = 0.05, ∗∗ = 0.01, ∗∗∗ = 0.001.
Table 7: OLS regression of transitivity violations in the Risk domain
33