value-based decision-making: a new developmental paradigm€¦ · value-based decision-making: a...

Value-Based Decision-Making:

A New Developmental Paradigm ∗

Isabelle BrocasUniversity of Southern California

and CEPR

Juan D. CarrilloUniversity of Southern California

and CEPR

T. Dalton CombsDopamine Labs

Niree KodaverdianUniversity of Southern California

May 2017

Abstract

How does value-based reasoning develop and how different this development is fromone domain to another? Children from Kindergarten to 5th grade made pairwisechoices in the Goods domain (toys), Social domain (sharing between self and other),and Risk domain (lotteries) and we evaluated how consistent their choices were. Wereport evidence that the development of consistency across domains cannot be ac-counted for by existing developmental paradigms: it is not related to the developmentof transitive reasoning and it is only partially linked to developmental aspects of atten-tional control and centration. Rather, choice consistency is related to self knowledgeof preferences which develops gradually and differentially across domains. The Goodsdomain offers a developmental template: children become more consistent over timebecause they learn what they like most and least. Systematic differences exist how-ever in both the Social and Risk domains. In the Social domain, children graduallylearn what they like most but not what they like least, while in the Risk domain, theygradually learn what they like least but not what they like most. These asymmetricdevelopments give rise to asymmetric patterns of consistency.

Keywords: laboratory experiments, developmental economics, revealed preferences, risk,

social preferences.

JEL Classification: C91, D11, D12.

∗We are grateful to members of the Los Angeles Behavioral Economics Laboratory (LABEL) for theirinsights and comments in the various phases of the project. We also thank participants at the 2014Social Neuroscience retreat (Catalina Island, USC) and at the 2015 Morality, Incentives and UnethicalBehavior Conference (UCSD) for useful comments and the staff at the Lycee International de Los Angelesfor their support. All remaining errors are ours. The study was conducted with the University of SouthernCalifornia IRB approval UP-12-00528. We acknowledge the financial support of the National ScienceFoundation grant SES-1425062.

1 Introduction

Adults have many abilities that children lack. Multiple paradigms exist to explain why

these differences exist and change during development. Each of these paradigms describes

an important aspect of development, although none of them alone can explain all of the

diverse abilities that change as we grow. Here, we present evidence that an important

difference between children and adults cannot be explained by existing paradigms; namely,

that adults consistently know what they want but children do not. We can measure the

consistency of preferences by testing for transitivity of choices: if a subject chooses option

A over option B and option B over option C, consistency of preferences requires that

she must choose option A over option C. A number of studies have shown that children,

especially young ones, are less consistent than adults in the Goods domain, that is, when

choosing between foods or toys (Smedslund, 1960; Harbaugh, Krause and Berry, 2001; List

and Millimet, 2008) while there is some partial evidence that age is less of a predictor of

consistency in the Social (Harbaugh and Krause, 2000) and Risk (Harbaugh, Krause and

Vesterlund, 2002) domains. This suggests that children’s self-knowledge of preferences is

imperfect and varies across domains.

It is intuitive, however, that differences in transitivity across ages and domains may

reflect known aspects of development. Even though children know that they prefer A

to B and B to C, their choices may not conform to this ranking for reasons related to

established developmental paradigms. For example, it could occur because attentional

control, a deficit which has previously been associated with intransitive behavior in older

adults (Brocas, Carrillo, Combs and Kodaverdian, 2015), is still underdeveloped in children

(Davidson, Amso, Anderson and Diamond, 2006; Astle and Scerif, 2008). Alternatively,

it may be because the ability to reason logically (Sher, Koenig and Rustichini, 2014) and

transitively (Piaget, 1948; Bouwmeester and Sijtsma, 2006) is still developing. Finally, it

may result from children’s inability to focus on more than one attribute of an item at a

time, a phenomenon referred to as centration (Piaget, Elkind and Tenzer, 1967; Donaldson,

1982; Crain, 2015).

The objective of this research is to assess the common and domain-specific develop-

mental trajectories of transitive decision-making in the Goods, Social, and Risk domains

and to determine if the dominant developmental paradigms (attentional control, logical

reasoning, and centration) are enough to explain age- and domain-related differences in

transitive decision-making. Our study is most closely related to the literature on choice

consistency. There are however two main differences between the existing studies and

ours. A first difference is conceptual. We want to compare consistency across domains

and investigate how existing developmental paradigms account for changes in preference

2

consistency over childhood. Earlier studies offered instead analyses of consistency in one

domain at a time. A second difference is methodological. Earlier studies have relied on

the Generalized Axiom of Revealed Preferences (GARP), an indirect test of transitivity

which focuses on choices between bundles of options given a budget constraint, a system

of prices, and a non-satiation assumption (Samuelson, 1938; Varian, 1982). By contrast,

our design is a direct test of transitivity. Furthermore, the extreme simplicity of the design

(pairwise comparisons instead of a system of prices and a budget constraint) and the re-

moval of assumptions such as non-satiation (sometimes violated in experimental studies)

makes it especially suitable to test it in young children. It also delivers results that are

directly comparable across domains.

2 Methods

We recruited 134 children from Kindergarten to 5th grade and 51 Undergraduate students

to participate in three Choice tasks and three Ranking tasks (Fig.1). In each Ranking

task, we asked participants to provide explicit rankings of seven items. In each Choice

task, we asked them to choose between all 21 pairwise combinations of those seven items.

The Goods-Choice and Goods-Ranking tasks involved age-appropriate toys, the Social-

Choice and Social-Ranking tasks involved sharing rules between self and other, and the

Risk-Choice and Risk-Ranking tasks involved lotteries. For each domain and age group,

we determined whether the pairwise choices in the Choice tasks were transitive and how in-

transitive choices were distributed over rankings elicited in the Ranking tasks. We included

attention trials to assess attentional control, and a reasoning task to measure transitive

reasoning. To assess centration, we determined whether actual choices were consistent

with attending to single attributes. For analysis, we grouped children into the age groups

K-1st (Kindergarten and 1st grade, 47 participants), 2nd-3rd (2nd and 3rd grades, 55

participants), 4th-5th (4th and 5th grades, 32 participants), and U (Undergraduates, 51

participants). The U age group was a control for adult level value-based decision-making.

Appendix SI-1 contains a detailed description of the design and procedures.

3 Experimental Results

We measured choice consistency by counting the number of transitivity violations (TV).

Formally, for each triplet of items A, B and C, a TV occurs when the participant chooses A

over B, B over C, and C over A. In the analysis, we considered all combinations of triplets

of items and counted the number of times choices between those triplets were intransitive

for each subject in each Choice task. Then, we computed the average number of TV by

3

A) Choice Task

B) Ranking Task

C) Goods

D) Social

E) Risk

Opt.1

Opt.2

Figure 1: Decision-making tasks. (A) In each of the 21 trials of the Choice tasks,participants were shown one combination of two options, left (Opt.1) and right (Opt. 2).They touched one of three buttons displayed at the top of their screen to select an optionor to express indifference (middle button). (B) In Ranking tasks, participants ranked allseven options from most preferred (green happy face) to least preferred (yellow neutralface). Both types of tasks were conducted in the (C) Goods domain involving toys, in the(D) Social domain involving sharing rules for self (hand pointing at self) and other (handpointing right), and in the (E) Risk domain involving lotteries that consisted of quantities(number of toys) and probabilities (green share of the pie).

age group and Choice task. Appendix SI-2 explains the procedure and provides details of

the statistical methods and results in this section.

Choice transitivity is domain-dependent (Fig.2). TV decreased with age in both the

Goods and Social domains, while no two age groups differed significantly in the Risk do-

main. Also, young children were significantly more consistent in the Social than in the

Goods domain and U were less consistent in the Risk than in the other two domains.

However, participants in the U age group were not making significantly more transitive

choices compared to participants in the 4th-5th age group in any domain, initially sug-

gesting that the development of transitive decision-making stops around 4th grade in all

domains. The result was robust to alternative ways of measuring choice inconsistencies

(see Appendix SI-3.1.)

Transitivity in the Goods domain improves gradually with age. Although consistency

in the Goods-Choice task improved with age, that improvement was not uniform across

all paired choices. In particular, trials featuring options ranked very differently in the

4

K & 1st 2nd & 3rd 4th & 5th U

Age Group

Tran

sitiv

ity V

iola

tions

per

Sub

ject Goods

SocialRisk

Choice-Task

Figure 2: Performance improves with age in the Goods-Choice and Social-Choice tasks but not in the Risk-Choice task. We report the average number of TVin the Choice tasks (y-axis) for each age group (x-axis) broken down by domain (Goods,Social, and Risk). Shadings correspond to the 95% confidence intervals.

Goods-Ranking task were unlikely to be involved in a TV. By contrast, trials featuring

options ranked similarly were significantly more likely to be involved in TV (Fig.3). This

general pattern was independent of age and suggested that it was more difficult to choose

consistently when the options involved were close in value. We observed convergence to

a state where participants almost never committed TV when choices involved their best

(left column) or their worst (bottom row) options and most TV got concentrated in items

with intermediate ranks (3rd, 4th and 5th). The result implies that children gradually

“learn to know what they like most and least” with age.

“Looking consistent” in the Social domain and the role of simple policies. We did

not anticipate to find that small children would be significantly more consistent in the

Social-Choice task compared to the Goods-Choice task. One possible explanation is that

goods are atomic and need to be evaluated as a whole. By contrast, social options can be

decomposed into several simple attributes such as “objects for self,” “objects for other,”

and “total number of objects,” where each attribute is easy to evaluate consistently. As

such, a participant might focus on a single attribute at a time (centration) and use simple

rules to choose. To test that hypothesis, we listed all such simple rules (for example,

“pick the option that gives self more objects, and if both give the same, pick the option

that gives other more objects”) and determined the fraction of participants who employed

them. From now on, we call them ‘simple policies,’ although they may also correspond to

5

0.7

0.6

0.5

0.8

0.4

0.5

0.3

0.20.1

0.2

0.3

0.4

0.0 0.0

0.1

0.2

0.3

1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6

76

54

32

76

54

32

76

54

32

76

54

32

Rank of Higher Ranked OptionRank of Higher Ranked Option Rank of Higher Ranked Option Rank of Higher Ranked Option

Rank

of L

ower

Ran

ked

Opt

ion

Transitivity Violations in Goods-Choice-Task Plotted by Option Ranks in Goods-Ranking-Task for Each Age Group

K and 1st 2nd and 3rd 4th and 5th Undergraduate

Figure 3: Transitivity violations decrease with age differently across choices.Each cell represents the color-coded average number of TV involving a higher ranked option(x-axis) and a lower ranked option (y-axis), as revealed by explicit rankings obtained inthe Goods-Ranking task. Lighter colors reflect more violations. All age groups are morelikely to make TV when options have similar ranks. There is convergence to a state whereparticipants almost never commit TV when choices involve their best (left column) or theirworst (bottom row) options. The vectors in the top right corner show the average gradientof the heatmap. Subjects become more consistent if the rank of the higher-ranked optionincreases (left-right gradient of the vector) and if the rank of the lower-ranked optiondecreases (up-down gradient of the vector).

simple heuristics or lexicographic preferences. Either way, these rules “buy” consistency,

as they are easy to follow and unlikely to produce TV.

The most popular simple policy was that of maximizing objects for self, then mini-

mizing objects for other. The choices of 35% of K-1st, 20% of 2nd-3rd, 22% of 4th-5th

and 4% of U were in-line with this policy. When we removed from our sample all the sub-

jects who used simple policies, the developmental signature matched that of the Goods

domain (Fig.4, left panel). However, a systematic difference with the Goods domain per-

sisted when we looked at how violations evolved between similarly- and differently-ranked

options (Fig.5, top row (A)). We found that children learned to become consistent in

choices involving their best options (“they learned to know what they liked most”) but

they still committed a substantial number of violations in choices involving their worst

options (“they did not learn to know what they liked least”). This was true even in the U

age group. Last, a noticeable trend was the gradual evolution of behavior towards more

integrative decision rules, reflecting trade-offs between the two attributes. This result sug-

gests that as children age, they become better able to think in terms of prosociality and

social efficiency, confirming the results from studies on other-regarding preferences (Fehr,

Bernhard and Rockenbach, 2008).

Limited development of consistency in the Risk domain. Given lotteries are also multi-

attribute options, in this case probabilities and outcomes, we hypothesized that centration

6


Age Group

Tran

sitiv

ity V

iola

tions

per

Sub

ject Goods

SocialRisk

Choice-TaskGoodsSocial

Social

Choice-Task

All Subjects

w/o HeuristicSubjects


Age Group

GoodsRisk

Risk

Choice-Task

All Subjects

w/o HeuristicSubjects

Figure 4: Left panel. Social vs. Goods after removing simple policies: thedevelopmental signature of consistency is comparable across domains. Right panel.Risk vs. Goods after removing simple policies: the developmental signature ofconsistency is different across domains. All children are similarly inconsistent in Risk andimprovements occur later in life.

could also play a role in the Risk domain. We listed all simple policies characterized

by the evaluation of one attribute at a time (e.g., “pick the option with more goods,

and if both have the same, pick the most likely option”) and determined the fraction of

participants with a behavior consistent with one of these policies. Again one of them was

used dominantly: 44% of K-1st, 35% of 2nd-3rd, 19% of 4th-5th and 8% of U chose the

lottery offering the larger reward. When we removed all subjects who used simple policies,

we found that the developmental signature did not match that of the Goods and Social

domains (Fig.4, right panel). In particular, the number of violations was the same across

all school-age groups and this was significantly different from that of the U group. The

result indicates that a potential milestone for Risk was outside our window of observation,

somewhere in middle or high school. Also, children learned to become more consistent in

choices involving their worst options (“they learned to know what they liked least”), but

less so in choices involving their best options (“they did not learn to know what they liked

most”), a trend opposite to the trend observed in the Social domain (Fig.5, bottom row

(B)). Similar to the Social domain, however, older participants were better able to make

integrative decisions, in this case, trade-offs between reward amounts and probabilities.

Who makes transitive choices? As noted earlier, centration took the form of using

simple policies and it was strongly associated with consistency. In addition, among par-

7

0.8

0.7

0.6

0.5 0.4

0.5

0.6

0.7

0.8

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0.3

0.4

0.5

0.7

0.6

0.5

0.4

0.1

0.2

0.3

0.0

0.1

0.2

0.3

0.4

0.0

0.1

0.2

1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6

1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6

Rank of Higher Ranked OptionRank of Higher Ranked Option Rank of Higher Ranked Option Rank of Higher Ranked Option

Rank

of L

ower

Ran

ked

Opt

ion

Rank

of L

ower

Ran

ked

Opt

ion

76

54

32

76

54

32

76

54

32

76

54

32

76

54

32

76

54

32

76

54

32

76

54

32

Transitivity Violations in Social-Choice-Task and Risk-Choice-Task Plotted by Option Ranks in the Respective Ranking-Task for Each Age-GroupK and 1st 2nd and 3rd 4th and 5th Undergraduates

K and 1st 2nd and 3rd 4th and 5th Undergraduates

A) S

ocia

lB)

Ris

k

Figure 5: Transitivity violations across choices among participants who do notuse simple policies. (A) In the Social domain, participants converge to a state whereTV rarely involve their highest-ranked options (vector with a horizontal gradient: “partic-ipants learn to know what they like”). (B) In the Risk domain, participants converge to astate where TV rarely involve their lowest-ranked options (vector with a vertical gradient:“participants learn to know what they do not like”).

ticipants who did not use simple policies, intransitivity was strongly correlated across

domains. Intransitivity was also strongly associated with mistakes in attention trials,

indicating that attentional control also played a role. By contrast, performance in the

transitive reasoning task was not a predictor, suggesting that logical reasoning per se was

not a key determinant of choice transitivity. Finally and notably, participants whose de-

cisions in the Choice tasks were more in line with the ordering displayed in the Ranking

tasks had also fewer TV (see Appendix SI-3.2). This effect is not reminiscent of any

known paradigm. It implies that the gradual and differential development of choice con-

sistency across domains is not entirely explained by changes in attention, centration and

logical reasoning. As children learn to know what they like best and least, their pairwise

choices and explicit rankings become more consistent with each other. In the Appendix,

we present some further investigation of the determinants of choice consistency (SI-3.2,

SI-3.3, SI-3.4), the relationship between TV and attention control (SI-3.5), the relation-

ship between TV and transitive reasoning (SI-3.6) as well as an integrative analysis of TV

in each domain (SI-3.7)

8

4 Discussion

Decision-making in the Goods domain: the developmental template. For each age group,

the behavior observed in the Goods domain was consistent with the hypothesis that par-

ticipants make choices by estimating and comparing noisy values. Under that hypothesis,

decisions involving options close in value are confusing and prone to error, while decisions

involving options valued differently are easy to make. The developmental trajectory we

observed also suggests that the evaluation process becomes less and less noisy over time,

reducing the number of confusing decisions, and hence the number of violations, espe-

cially among options ranked very differently (Fig.3). This pattern of improvement implies

that the ability to make the simplest decisions is what solidifies in this developmental

window. Over time, children learn to know with more accuracy what they like most and

least. In addition, the fact that inconsistency was not associated with the ability to rea-

son transitively suggests that value-based reasoning requires the involvement of different

brain regions compared to logical reasoning (ventromedial prefrontal cortex in the first

case (Levy and Glimcher, 2012) and parietal regions in the second (Hinton, Dymond,

Von Hecker and Evans, 2010)). Furthermore, the association of inconsistency with age

and attentiveness suggests that improvements in decision-making in the Goods domain

are partly driven by known age-related changes in attentional control.

Decision-making in the Social domain: the effect of centration. Reasoning over multiple

attributes is known to be a difficult exercise for children 7 years of age and younger, and

this ability develops during the concrete operational stage, somewhere between 2nd and

5th grade (Piaget, 1952; Crain, 2015). The high utilization of simple policies by children

in the youngest age group is therefore consistent with centration. Participants who have

not yet overcome centration are more likely to pick a simple policy that makes them look

consistent to the outside observer. Among children who do not use simple policies, we

observe the same trajectory as in the Goods domain, implying that centration conceals

their underdeveloped decision-making system.

Decision-making in the Risk domain: too complex. As in the Social domain, making

choices in the Risk domain requires the evaluation of multiple attributes. It is therefore

natural to observe a similar tendency, especially among young children, to act according

to simple policies that makes them look consistent. However, integrative reasoning is

substantially more complex in the Risk domain and participants who do not use simple

policies are not nearly as consistent as in the other two domains. All children were at the

same level of performance suggesting that the ability to trade-off probabilities and rewards

was not yet developed, perhaps as a consequence of a still underdeveloped working memory

system (Gathercole, Pickering, Ambridge and Wearing, 2004).

9

A different learning trajectory across domains. The results obtained for participants

in the U age group are consistent with the typical findings in choice consistency literature:

that by adulthood, people have learned to know their preferences and are largely consistent

in the Goods (Battalio, Kagel, Winkler, Fisher, Basmann and Krasner, 1973; Cox, 1997)

and Social domains (Andreoni and Miller, 2002; Fisman, Kariv and Markovits, 2007).

Our findings in the Risk domain are in accordance with the original transitivity stud-

ies (Loomes, Starmer and Sugden, 1991), and less optimistic than recent GARP studies

(Mattei, 2000; Choi, Fisman, Gale and Kariv, 2007) perhaps due to differences in design.

More importantly, our study identifies different learning trajectories across domains. In

the Social domain, children learn to pick their most-preferred option. In the Risk domain,

children learn to avoid their least-preferred option. In the Goods domain, children learn

both. These asymmetries cannot be explained by the existence of an obvious best option

in the Social domain and an obvious worst option in the Risk domain, since the most and

least favorite items change significantly across age groups (see Appendix SI-3.4 for details).

It cannot be explained by both an obvious best and worst option in the Goods domain,

since the set of goods are different for different gender and age groups (see Appendix SI-1

for details). Moreover the goods ranked highest and lowest within each age-group vary

substantially across individuals. Finally, the development of attentional control cannot be

the main explanation either, since these should act similarly across domains. Overall, the

results suggest that self-knowledge of preferences follows its own developmental trajectory.

To conclude, we have investigated the developmental trajectories of transitive decision-

making in the Goods, Social, and Risk domains in children from Kindergarten to 5th

grade and compared consistency levels with adult-level performance. We have found that

transitivity in choices develops gradually, but differentially across domains. It is not

related to the development of transitive reasoning and it is only partially explained by

the development of attentional control. Instead, choice transitivity follows a common

trajectory across domains supported by the gradual improvement of self-knowledge of

preferences.

10

References

Andreoni, James and John Miller, “Giving According to GARP : An Experimental

Test of the Consistency of Preferences for Altruism,” Econometrica, 2002, 70 (2), 737–

753.

Astle, Duncan E. and Gaia Scerif, “Using developmental cognitive neuroscience to

study behavioral and attentional control,” Developmental Psychobiology, 2008, 51 (2),

107–118.

Battalio, Raymond C, John H Kagel, Robin C Winkler, Edwin B Fisher,

Robert L Basmann, and Leonard Krasner, “A test of consumer demand theory

using observations of individual consumer purchases,” Economic Inquiry, 1973, 11 (4),

411.

Bouwmeester, Samantha and Klaas Sijtsma, “Constructing a transitive reasoning

test for 6- to 13-year-old children,” European Journal of Psychological Assessment, 2006,

22 (4), 225–232.

Brocas, Isabelle, Juan D. Carrillo, T Dalton Combs, and Niree Kodaverdian,

“Consistency in Simple vs. Complex Choices over the Life Cycle,” CEPR discussion

paper 10457, 2015.

Choi, Syngjoo, Raymond Fisman, Douglas Gale, and Shachar Kariv, “Con-

sistency and Heterogeneity of Individual Behavior under Uncertainty,” The American

Economic Review, 2007, 97 (5), 1921–1938.

Cox, J.C., “On Testing the Utility Hypothesis.,” The Economic Journal, 1997, 107 (443),

1054–1078.

Crain, W., Theories of Development: Concepts and Applications, 6 ed., Psychology Press,

2015.

Davidson, Matthew C., Dima Amso, Loren Cruess Anderson, and Adele Dia-

mond, “Development of cognitive control and executive functions from 4 to 13 years:

Evidence from manipulations of memory, inhibition, and task switching,” Neuropsy-

chologia, 2006, 44 (11), 2037–2078.

Donaldson, Margaret, “Conservation - What is the question?,” British Journal of Psy-

chology, 1982, 23 (2), 199.

11

Fehr, Ernst, Helen Bernhard, and Bettina Rockenbach, “Egalatarianism in young

children,” Nature, 2008, 454 (28), 1079–1084.

Fisman, Raymond, Shachar Kariv, and Daniel Markovits, “Individual Preferences

for Giving,” The American Economic Review, 2007, 97 (5), 1858–1876.

Gathercole, Susan E., Susan J. Pickering, Benjamin Ambridge, and Hannah

Wearing, “The Structure of Working Memory From 4 to 15 Years of Age.,” Develop-

mental Psychology, 2004, 40 (2), 177–190.

Harbaugh, William T and Kate Krause, “Children’s Contributions in Public Good

Experiments: The Development of Altruistic and Free-riding Behaviors,” Economic

Inquiry, 2000, 38 (1), 95–109.

, , and Lise Vesterlund, “Risk Attitudes of Children and Adults : Choices Over

Small and Large Probability Gains and Losses,” Experimental Economics, 2002, 5 (1),

53–84.

, , and Timothy R Berry, “GARP for Kids : On the Development of Rational

Choice Behavior,” The American Economic Review, 2001, 91 (5), 1539–1545.

Hinton, EC, S Dymond, Ulrich Von Hecker, and Christopher John Evans,

“Neural correlates of relational reasoning and the symbolic distance effect: Involvement

of parietal cortex,” Neuroscience, 2010, 168 (1), 138–148.

Levy, Dino J. and Paul W. Glimcher, “The root of all value: A neural common

currency for choice,” Current Opinion in Neurobiology, 2012, 22 (6), 1027–1038.

List, John A and Daniel L Millimet, “The Market: Catalyst for Rationality and

Filter of Irrationality,” The BE Journal of Economic Analysis & Policy, 2008, 8 (1),

Article 47.

Loomes, Graham, Chris Starmer, and Robert Sugden, “Observing violations of

transitivity by experimental methods,” Econometrica, 1991, pp. 425–439.

Mattei, Aurelio, “Full-scale real tests of consumer behavior using experimental data,”

Journal of Economic Behavior & Organization, 2000, 43 (4), 487–497.

Piaget, J, “La psychologie de l’intelligence,” Revue Philosophique de Louvain, 1948, 46

(10), 225–227.

Piaget, Jean, “The child’s conception of numbers,” Trans. eds C. Gattegno and FM

Hodgson, NY: Routledge, 1952.

12

, David Elkind, and Anita Tenzer, Six psychological studies, New York, NY: Vintage

Books., 1967.

Reyna, Valerie F and Susan C Ellis, “Fuzzy-trace theory and framing effects in

children’s risky decision making,” Psychological Science, 1994, 5 (5), 275–279.

Samuelson, P A, “A Note on the Pure Theory of Consumer’s Behaviour,” Economica,

1938, 5 (17), 61–71.

Sher, Itai, Melissa Koenig, and Aldo Rustichini, “Children’s strategic theory of

mind,” Proceedings of the National Academy of Sciences, 2014, 111 (37), 13307–13312.

Smedslund, Jan, “Transitivity of preference patterns as seen by preschool children,”

Scandinavian journal of psychology, 1960, 1 (1), 49–54.

Varian, Hal R, “The Nonparametric Approach to Demand Analysis,” Econometrica,

1982, 50 (4), 945–973.

13

Supporting Information

SI-1. Design and procedures

The experiment was reviewed and approved by the IRB of the University of Southern

California. It was conducted through tablet computers and the tasks were programmed

with the Psychtoolbox software, an extension of Matlab.

Participants. We recruited 134 children from kindergarden (K) to 5th grade from

Lycee International of Los Angeles, a bilingual private school. We ran 18 sessions, each

with 5 to 10 subjects and lasting between 1 and 1.5 hours. Sessions were conducted in a

classroom at the school. In all of our tasks, children had to make choices between options

involving goods. To make sure that options were desirable by all participants, we organized

sessions by gender and age group: K to 2nd grade boys, K to 2nd grade girls, 3rd to 5th

grade boys, and 3rd to 5th grade girls. Goods included toys and stationary. As a control,

we ran 7 sessions with 51 undergraduate students (U). These were conducted in the Los

Angeles Behavioral Economics Laboratory (LABEL) in the department of Economics at

the University of Southern California. For the undergraduate population, participants

were recruited from the LABEL subject pool. Instead of toys or stationary, we used snack

foods. A description of the distribution of our participants is reported in Table 1.

K 1st 2nd 3rd 4th 5th U

Male 12 12 15 20 11 6 22Female 7 16 11 9 8 7 29

Total 19 28 26 29 19 13 51

Table 1: Description of participants by age and gender.

Each participant completed several tasks. In Ranking tasks, participants received 7

cards, each with a picture of one of the options. Participants were instructed to rank

these cards on a ranking board attached to their desk from most to least preferred. Once

a participant was finished ranking the options, an experimenter transferred her rankings

onto her tablet. There were three Ranking tasks. In the Goods-Ranking task, options

were toys. In the Social-Ranking task, options were sharing rules for oneself and another

child of the same age and gender in another school. The sharing rules contained different

numbers of a single toy. To ensure the toy was highly desirable, we selected the one

that was ranked as first favorite in the Goods-Ranking task. In the Risk-Ranking task,

options were lotteries. In each lottery, the participant could earn a given number of toys

with a given probability. Probability was represented by a circle with outcomes shown

14

in white and green, with the green area corresponding to the probability of winning the

good(s). Each circle corresponded to a spinner wheel (previous experiments on risk with

children have successfully employed a similar design with a spinner wheel (Reyna and Ellis,

1994; Harbaugh et al., 2002)). Again, to ensure that options were desirable, we chose the

toy that was ranked second favorite in the Goods-Ranking task.

In Choice tasks, participants were presented with all 21 pairwise combinations of the

7 options in the corresponding Ranking task. We will refer to these tasks as Goods-

Choice task, Social-Choice task and Risk-Choice task. In each trial, one option was

displayed on the left side and another on the right of the tablet’s screen. A participant

could select their preferred option (by touching a button above the left or right options

on their screen), or report to be indifferent between them (by touching a button between

the two options). The 21 pairs were presented randomly, both in terms of trial order and

in terms of left-right presentation. These tasks are represented in Fig.1 and the specific

options are displayed in Figs. 6 (Goods), 7 (Social), and 8 (Risk). In each Choice task, the

22nd trial was an attention trial, where subjects were presented with their most frequently

and least frequently chosen options from the preceding 21 trials.1

Figure 6: Toys used in the Goods-Choice and Goods-Ranking tasks. Childrenof different age and gender were presented different toys.

For analysis, participants also completed a Transitive Reasoning task. This task was

designed to measure levels of transitive reasoning. Several tasks have been proposed in

the literature (Bouwmeester and Sijtsma, 2006). In order to test the relation between

1As the 21 pairwise choices were being made, the computer awarded 1 point to the option selected bythe participant and 0.5 points to each option in case of indifference. At the end of the 21st trial, the tallyfor each option was summed and the options with the most and least points were determined. In case ofa tie, one of the options was chosen randomly.

15

Figure 7: Sharing rules between self and other used in the Social-Choice andSocial-Ranking tasks. For children, each circle represented one toy, personalized toensure desirability. For undergraduate students, each token represented $2.

Figure 8: Lotteries used in the Risk-Choice and Risk-Ranking tasks. For chil-dren, each circle represented one toy, personalized to ensure desirability. For undergradu-ate students, each token represented $2.

transitive choice and transitive reasoning devoid of memory or operational reasoning re-

quirements, we opted for a new design; we constructed a test that does not require mem-

ory and is visually represented. The task consisted of seven questions of varying difficulty.

Each of the seven questions consisted of two premises represented in two vignettes, and a

third vignette with a response prompt. For each premise, participants were told that the

animals shown in the vignette were at a party and the oldest wore a hat. They had to

determine which animal in the third vignette should wear the hat. This is illustrated in

Fig.9. Three of the seven questions did not require transitive reasoning and were included

to test whether participants were paying attention. These are referred to as “pseudotransi-

tivity” trials (Bouwmeester and Sijtsma, 2006). The remaining four did require transitive

reasoning. Of the four transitivity questions, two of them were less difficult and two were

more difficult.

All subjects completed the tasks in the same order: (1) Goods-Choice task, (2) Goods-

Ranking task, (3) Transitive Reasoning task, (4) Social-Choice task, (5) Social-Ranking

task, (6) Risk-Choice task and (7) Risk-Ranking task. All tasks were untimed. Upon

the completion of a given task by all subjects, instructions for the next task were given.

To incentivize participants, we told them that it was important to choose options they

liked, because their choices would determine what they would receive at the end of the

16

Figure 9: Transitive Reasoning Task. The animal wearing the hat is the oldest ineach vignette on the left. The participant had to answer in the vignette on the bottomright by choosing the animal he thought was the oldest given the information on the left,or by reporting it could not be known (?).

experiment. This was explained accessibly through a simple analogy. From each of the

Choice tasks, one choice was randomly selected by the computer and subjects received their

selection in that trial. For the Social-Choice task, we explained that each participant had

been paired with another student, of their same gender and grade level, from another school

in Los Angeles (Undergraduate were paired with a subject of another session, matched

for gender). We explained that the selected sharing rule would actually be implemented:

we would go to the other school and deliver to the other student the goods that were

represented in the selected option. The shared items were delivered to Foshay elementary

school, a public school in LAUSD. For the Risk-Choice task, we explained that their choice

in the selected trial would be implemented and that the corresponding spinner wheel would

be spun by a blindfolded assistant at the end of the session that day. If the spinner arrow

landed in the green part of the wheel, the subject would win the number of toys (or

money, for the case of undergraduates) associated to that choice. Otherwise, they would

not win anything from this task. This was implemented at the end of each session. Before

leaving, we collected demographic information consisting of “gender,” “grade,” “number

of younger siblings,” and “number of older siblings.” All participants also received a fixed

show-up fee. Children received their highest ranked item in the Goods-Ranking task while

Undergraduate students received $5.

17

SI-2. Analysis of transitivity violations

We collected data from all participants in the Goods-Choice and Goods-Ranking tasks.

The tablets did not record the choices of two subjects in the Risk-Choice and Risk-Ranking

tasks and one subject in the Social-Choice and Social-Ranking tasks.

Transitivity violations. As briefly explained in the main text, we measured consistency

by counting the number of transitivity violations between all triplets of options: A vs.

B, B vs. C and A vs. C. With 7 options, there are 35 triplets to consider. Importantly,

considering triplets is enough to account for all TV. Indeed, it is trivial to show that a

violation involving four or more items (such as, for example, A chosen over B, B over C,

C over D and D over A) necessarily involves at least one violation between three items of

that set. Also, given we allowed participants to express indifference between options, we

allocated 0.5 violation to triplets involving weak violations, that is, when we observed A

chosen over B, B chosen over C, and A indifferent to C. We first counted the number of

times choices between triplets of items were intransitive for each subject in each Choice

task. Then, we computed the average number of violations by age group and Choice task.

We reported the results in Fig.2.

In the Goods domain, we found a significant improvement between consecutive age

groups, with 4th-5th being at the same level of performance as U subjects. More specifi-

cally, participants in K-1st had significantly more violations than participants in 2nd-3rd

(p-value = 0.0050), participants in 2nd-3rd had significantly more violations than par-

ticipants in 4th-5th (p-value = 0.0355), but violations of 4th-5th and U groups were

not significantly different (p-value = 0.0835). In the Social domain, a similar pattern was

observed. TV were significantly different between the K-1st age group and the 4th-5th

age group (p-value = 0.0162), between the K-1st age group and U age group (p-value

= 0.0002) and between the 2nd-3rd age group and the U age group (p-value = 0.0048),

while all other differences in TV were not statistically significantly. In the Risk domain

however, there was no apparent improvement. No age group of children had significantly

more violations than U.

When we compared TV across domains, we found that participants in the K-1st

age group had significantly more violations in the Goods than in the Social domain (p

= 0.0009). A similar result held for the 2nd-3rd age group (p = 0.0117). Participants

in the U group also had significantly more violations in the Risk than in the Goods (p

< 0.0001) or Social domains (p < 0.0001).

Analysis of transitivity in the Goods domain. We used the explicit ranking elicited

in the Goods-Ranking task to assess the frequency with which items were involved in

transitive violations as a function of their ranking. For each transitivity violating triplet,

we assigned a score of 1 to its 3 corresponding items. Each item was then allocated

18

the percentage of times it was involved in a violation (the number of times it obtained

a score of 1 over all possible times). By repeating this exercise for all participants and

then averaging over all of them, we obtained the percentage of times violations occurred

in choices between options ranked x and y in each age group. Naturally, ranks x and y

corresponded to different specific items since each participant had her own rankings of

options. Fig.3 represents the color-coded result of this exercise. Darker colors are used to

represent fewer violations.

Next, we studied the marginal sensitivity to violations to learn if some choices were

more consistent than others, and if so, why. Intuitively when choices are easy to make,

inconsistencies are more unlikely. Easiness might just be a matter of picking what we like.

Or it might be a matter of avoiding what we do not like. Assuming that ranks derived from

the Ranking tasks give a proxy for ordinal values, the more (less) valuable options should

be those with higher (lower) ranks. If consistency is only driven by choosing what is liked,

it should not matter what the lower-ranked option is, and if consistency is only driven by

avoiding what is liked least, then it should not matter what the higher-ranked option is.

We tested this conjecture by analyzing how consistency changes if we marginally change

the rank of the higher- or lower-ranked option. Intuitively, in Fig.3, if participants are

making choices in a given cell and exhibit a certain level of consistency, how much more

(or less) consistent they become by moving one cell to the right (changing the rank of the

higher-ranked item but not the lower-ranked item) or by moving one cell up (changing the

rank of the lower-ranked item but not the higher-ranked item).

More specifically, we considered every pair of adjacent boxes in the same row and we

determined the difference in violations between a box and the box to the right of it. We

then computed the average difference over all pairs of boxes and reported this number as

the left-right gradient of the vector in the corner of Fig.3 for each age group. A vector

with a larger x-coordinate means a greater increase in violations when the higher-ranked

option is decreased by one rank. We did the same with every pair of adjacent boxes in the

same column to determine the up-down gradient. In this case, a vector with a larger y-

coordinate means a greater increase in violations when the lower-ranked option is increased

by one rank. As expected, the vector of all four age groups have upper right gradients,

indicating more violations when options are ranked more closely in either dimension (lower

rank for the higher option or higher rank for the lower option). It also implies that the

highest number of violations occurs between items of intermediate ranks (3rd, 4th, 5th).

The left graph of Fig.10 presents a heatmap of all vectors in the Goods domain and

confirms the increase in violations when the higher- and lower-ranked options are closer

to each other. Finally, a t-test confirms that both the x- and y-coordinates are positive

and significantly different from 0 (p-value < 0.01) overall and for all age groups, with the

19

exception of the x-coordinate in the K-1st age group. Overall, in the Goods domain, all

groups were significantly driven by the rank of the lower-ranked option, and all groups

except K-1st were driven by the rank of the higher-ranked option.

Goods Social Risk

Figure 10: Sensitivity analysis

Analysis of transitivity in the Social domain. An option in the Social domain can

be decomposed into three attributes that can be visually assessed through counting: the

reward to self, the reward to other, and the total reward. When comparing attributes,

a participant can use three simple rules: pick the maximum, pick the minimum, or be

indifferent. A participant could apply a rule to a single attribute (a policy such as “pick

option with maximum reward for self”); a participant could alternatively apply a rule to

one attribute then move to a second attribute and apply another rule (a policy such as

“pick option with maximum reward for self and if they are the same, pick option with

maximum total reward”). We looked at all such policies, and only 3 were used by subjects:

“maximize amount for self, then minimize amount for other,” “maximize amount for self,

then maximize amount for other” and “maximize amount for self, irrespective of amount

for other.” We counted all participants who complied with any such policy and those who

made exactly one mistake with respect to it. We will call these participants simple policy

users. Among them, 61% chose the policy “maximize amount for self, then minimize

amount for other.” We removed all simple policy users from our sample, leaving us 125

participants. We computed TV by age group after the exclusion and found that the results

were now similar to those obtained in the Goods domain. The development of consistency

followed the same pattern: participants in the K-1st age group were significantly more

inconsistent than all other participants (all p-values < 0.01). Participants in the 2nd-3rd

age group had significantly more violations than participants in the U age group (p-value

= 0.0144) and participants in the 4th-5th age group were not significantly different from

participants in the U age group (p-value = 0.1205). This is represented in Fig.4 (left).

20

We repeated the heatmap analysis we performed on the Goods-Choice task. The result is

represented in Fig.5(A).

We also performed the same marginal sensitivity analysis as in the Goods domain after

removing simple policy users. The x-coordinates were positive and significantly different

from 0 (p-value < 0.01), overall and for all age groups except for participants in the K-

1st age group. However, and contrary to the Goods domain, the y-coordinates were not

significantly different from 0 for any age group. This suggests that participants had a very

clear idea of what they liked best but a much less clear idea of what they liked least in the

Social domain. The result can also be seen in the heatmap representation of the vectors

(Fig.10, center).

Analysis of transitivity in the Risk domain. The procedure to elicit simple policies

was the same. In the Risk domain, there are two attributes per option, reward amount

and probability, and three simple rules, pick the maximum, pick the minimum, or be

indifferent. We defined 6 policies, but only 4 were ever used by subjects: “maximize

the amount, irrespective of the probability,” “maximize the probability, then maximize

the amount,” “minimize the amount, irrespective of the probability,” and “minimize the

probability, irrespective of the amount.” We counted all participants who complied exactly

with any such policy as well as those who made exactly one mistake with respect to a policy.

Among those, 89% chose the policy “maximize the amount irrespective of the probability.”

We removed from our sample all participants whose behavior was consistent with a simple

policy, leaving us 128 subjects. We analyzed TV again and found that violations by the U

group were different than those by any younger group (p-value = 0.0019 for the comparison

with K-1st, p-value = 0.0050 for the comparison with 2nd-3rd, and p-value = 0.0153

for the comparison with 4th-5th), but that all school-age groups performed at similar

levels from each other. This is represented in Fig.4 (right). We also repeated the heatmap

analysis. The result is represented in Figure 5(B).

Again, we performed a marginal sensitivity analysis after removing simple policy users

and found the opposite result than in the Social domain: y-coordinates were significantly

different from 0 overall and for two age groups (4th-5th and U, t-test p-values < 0.05),

whereas x-coordinates were not different from 0 in any age group (Fig.10, right). This

means that in the Risk domain, participants had a clear idea of what they liked least but

a much less clear idea of what they liked best.

Transitivity violations across domains. The aggregate analysis of TV indicated that

consistency improves with age. We addressed the question of how individual scores vary

across domains to assess whether participants who commited relatively more violations in

one domain were also those who committed relatively more violations in a different domain.

Said differently, we wanted to know whether consistency was driven (at least partially) by

21

a common factor or whether it resulted from the development of domain-specific skills.

When considering the full sample, we found that TV in the Goods and Risk domains were

not correlated with one another, while TV in the Goods and Social or TV in the Risk

and Social domains were (Pearson coefficient = 0.38 and 0.30, respectively; p-value 0.0001

for both). When we removed simple policy users, we found that TV were significantly

correlated across all domains (Pearson coefficient = 0.36, p-value = 0.0006 between the

Goods and Risk domains, Pearson coefficient = 0.43, p-value < 0.0001 between the Goods

and Social domains, and Pearson coefficient = 0.49, p-value < 0.0001 between the Risk

and Social domains) suggesting that participants’ consistency was partially driven by the

development of a skill useful in all domains.

Comparison with random play. To assess whether participants committing many vio-

lations might have been acting randomly, we simulated random players. Depending on the

probability we assigned to their expressing indifference, the number of TV was between 8

and 10, substantially above the actual numbers obtained even among the most inconsistent

participants. This was in line with earlier literature on consistency in children (Harbaugh

et al., 2001).

22

SI-3. Additional Statistical Analysis

SI-3.1. Other measures of choice inconsistencies

Choice reversals and choice removals. Number of violations is only one possible way

to address the issue of consistency. An alternative way is to look at severity of violations.

We computed two measures of violation severity for each subject. The first one counts

the number of choices that need to be reversed to restore transitivity. To compute that

number, an algorithm sequentially changes the choices made in each trial, and computes

a TV score after each change. If the score is 0 at any point, the algorithm stops. After

all single trials have been exhausted, the algorithm repeats the procedure over pairs of

trials, then triplets and so on. We found that choice reversals followed a very similar

pattern across age groups as TV (Fig.11, left), and were highly correlated to it in all three

Choice tasks: Goods (Pearson = 0.95, Spearman = 0.98, p < 0.0001), Social (Pearson

= 0.93, Spearman = 0.98, p < 0.0001), and Risk (Pearson = 0.92, Spearman = 0.95, p

< 0.0001). The second of these severity measures counts the number of choices that need

to be removed to restore transitivity. We used an algorithm similar to that for counting

choice reversals. We found that choice removals and choice reversal were almost identical

in capturing severity of violations (Fig.11 right and left), so this measure also closely

reflected the age patterns observed with TV in the three Choice tasks.

Cho

ice

Reve

rsal

s pe

r Sub

ject


Age Group

GoodsSocialRisk

Choice-Task


Age Group

Cho

ice

Rem

oval

s pe

r Sub

ject

GoodsSocialRisk

Choice-Task

Figure 11: Analysis of severity of violations. Left: choice reversals across domainsand age. Right: Choice removals across domains and age.

Implicit ranking and choice noisiness. From a subject’s selections in a Choice task, it

is possible to extract their implicit (or revealed) ranking for the options in that task. We

23

computed this revealed ranking in a very simple way, by tallying their selections “for” and

“against” each of the options as briefly explained in footnote 1: each time the participant

selected an option, 1 point was added to the running tally of that option and when the

participant expressed indifference, 0.5 points were added. The tallied points for each option

were summed, and the options were ordered according to this sum, giving the subject’s

implicit ranking. The subject’s choices were then checked for inconsistencies against their

implicit ranking using the following simple rule. Suppose that, according to the tally of

the implicit ranking, option B was ranked lower than A. The participant would receive a

score of 0, if A was chosen over B (showing consistency with the implicit ranking), a score

of 0.5 if he expressed indifference between A and B, and a score of 1 if B was chosen over

A (showing inconsistency with the implicit ranking). We called this classification error

score ICR (Inconsistency in Choices given the Revealed ranking). Notwithstanding the

endogeneity issues with this measure (we use choices to extract a revealed ranking then

check that ranking against the very choices that were used to create it), it still provides

a measure of choice “noisiness.” As can be seen by comparing Fig.2 with Fig.12 (left),

the age pattern of choice inconsistencies with implicit rankings (ICR) closely resembles

that of TV: participants who make more TV are those who are more “noisy” (make more

mistakes) around their implicit ranking.

Explicit rankings and choices. As an additional measure, we evaluated inconsistencies

between choices in Choice tasks and the explicit rankings elicited in Ranking tasks. This

measure was computed in a similar way as for inconsistencies with respect to implicit

rankings, that is, we checked the subject’s choices for inconsistencies against their explicit

ranking. We called ICE (Inconsistency in Choices given the Explicit ranking) the new

classification error score. One advantage of this measure over the previous one is the

absence of endogeneity (errors are computed from rankings in two different tasks). The

results are illustrated in Fig.12 (right).

In the Goods domain, we found that participants in the K-1st and 2nd-3rd age

groups had relatively more difficulty making choices consistent with their explicit rankings

compared to older participants (p-values < 0.0167 for all comparisons). A similar story

held qualitatively in the Social domain, except that the participants in the K-1st and

2nd-3rd age groups were not significantly different from each other, nor were the older

participants in groups 4th-5th and U. It suggests a less gradual development of the ability

to choose from an explicit ranking. We found that inconsistencies in the Social domain were

following the same trend as inconsistencies in the Goods domain after removing subjects

who used simple policies. In the Risk domain, we found that the level of inconsistencies

was high and similar across all subjects: they were not able to make choices consistent with

their explicit rankings. This result changed when we removed subjects who used simple

24

policies: they were becoming more capable over time to express their explicit rankings in

their choices. Nevertheless, the level of inconsistencies remained higher than in the Goods

domain for all ages.


Age Group

ICR

per S

ubje

ct

GoodsSocialRisk

Choice-Task


Age GroupIC

E pe

r Sub

ject

GoodsSocialRisk

Choice-Task

Figure 12: Ranking and choices. Left: Inconsistencies with respect to implicit ranking(ICR). Right: Inconsistencies with respect to explicit ranking (ICE).

Explicit vs. implicit ranking. The implicit ranking is revealed by choices and does not

need to coincide with the explicit ranking elicited in a Ranking task. For each participant

and task, we computed a measure of distance between those rankings. We used a very

simple procedure where we assigned to each option a score representing the absolute dif-

ference between its rank in the implicit and explicit rankings (e.g., if A was ranked 3rd

according to the explicit ranking and 2nd according to the implicit ranking, its score was

|3− 2| = 1). We then computed the average score of all options as each subject’s discrep-

ancy score. Fig.13 depicts those scores by domain and age group. As can be seen from the

graph, the patterns are similar to TV. In the Goods domain, the ability to choose accord-

ing to one’s explicitly disclosed preferences was found to gradually develop. The same is

true in the Social domain for age groups 2nd-3rd and above. By contrast, discrepancies

remained constant with age in the Risk domain, suggesting a persistent inability to draw

choices from explicit preferences. Overall, the result reinforces the idea that preferences

are not well established in young children, with the corresponding consequences in terms

of transitivity violations as well as discrepancies between pairwise choices and explicit

rankings.

Relationship between measures of consistency. Not surprisingly given our previous

25


Age Group

Disc

repa

ncy

Scor

e pe

r Sub

ject

GoodsSocialRisk

Choice-Task

Figure 13: Discrepancies between explicit and implicit rankings (all subjects).

findings, our measures of consistency were all highly correlated. Indeed, a high number of

transitivity violations (TV) was associated with a high number of inconsistencies between

explicit rankings and choices (ICE) in the Goods domain (Pearson = 0.79), in the Social

domain (Pearson = 0.69) and in the Risk domain (Pearson = 0.72) with all p-values

< 0.0001. A high number of TV was also associated with a high discrepancy score between

implicit and explicit ranking in the Goods domain (Pearson = 0.61), in the Social domain

(Pearson = 0.54) and in the Risk domain (Pearson = 0.60), again with all p-values <

0.0001. Last, TV was also highly correlated with classification error score (ICR) in the

Goods domain (Pearson = 0.95), in the Social domain (Pearson = 0.93) and in the Risk

domain (Pearson = 0.93), all p-values < 0.0001. The results were very similar when we

removed subjects who used simple policies.

Overall, although we feel that TV is the best measure of choice consistency, the main

conclusion of SI3.1 is that the results presented in the main text are robust both when

we consider severity instead of number of violations and also when we use different in-

consistency measures, such as ICE or ICR: intransitivity is invariably associated with the

inability to make choices consistent with rankings, both implicit and explicit, and these

have different developmental signatures across domains.

SI-3.2. Other analysis of choices

Indifference. We checked whether age and choice domain had an influence on the

tendency to be indifferent. The results are reported in Table 2. In the Goods domain, the

26

number of indifferent choices decreased over time (unpaired t-test, p-values < 0.05 between

K-1st and 4th-5th, between K-1st and U, and between 2nd-3rd and U). Participants

were less often indifferent in the Social domain than the Goods domain (paired t-test, p-

values < 0.05 for age groups K-1st and U), and school-age children in the Social domain

were more often indifferent than participants in the U age group (unpaired t-test, p-values

< 0.05). Finally, in the Risk domain we observed no trend in reducing indifferences with

age.

Goods Social Risk

K-1st 3.81 (0.54) 2.04 (0.47) 1.53 (0.59)2nd-3rd 3.09 (0.47) 2.47 (0.54) 1.31 (0.45)4th-5th 2.19 (0.42) 2.50 (0.60) 1.94 (0.53)U 1.86 (0.23) 1.12 (0.28) 1.49 (0.35)

(standard errors in parenthesis)

Table 2: Average number of indifferences.

At the same time, we also found that triplets involving indifferent choices were less

likely to result in TV in all domains for the whole sample (t-tests, p-values < 0.0001), as

well as for each age group (p-values < 0.0001).

Reaction times and choices. Reaction times in trials involving violations were longer

compared to reaction times in trials involving no violations. This was true in all age

groups and in all domains (KS test, p-value = 0.015 for K-1st in the Risk domain, p-

value = 0.010 for 2nd-3rd in the Risk domain and p-value < 0.0001 for all other age

groups and domains), suggesting that participants who were more confused, took longer

to choose an ended up contradicting their choices. We also found that reaction times were

longer when participants pressed the indifference button compared to when they made a

left or right selection (KS test, p-value < 0.0001 for age groups above 2nd-3rd), consistent

with the idea that choices were more difficult to make when options were close in value.

Last, simple policy users were faster than the other subjects (KS test, p-value < 0.0001),

indicating that their choice process was simpler.

SI-3.3. Simple policy users

From the 59 subjects who used simple policies in the Social domain and the 55 who

used simple policies in the Risk domain, we found that only 19 used simple policies in

both domains. We also compared TV in the Goods domain (where no simple policies

were available) by subjects who did and did not use simple policies in the other domains

and found no significant differences. Therefore, centration and the development of the

27

value-based decision-making system appeared to be uncorrelated. However, we found

that simple policy users were very distinguishable from the other participants in terms

of discrepancies between implicit and explicit rankings (t-test, p-value < 0.0001 in both

Risk and Social). This means that they were explicitly revealing preferences that were not

supported by their choices, that is, by their implicit rankings.

SI-3.4. Evolution of preferences

Evolution of preferences in the Social domain. The most common of the simple policies

employed by our participants was to maximize the reward for self, but its predominance

changed over time. Indeed, the preference for giving seemed to be developing: small

children tended to maximize their own reward systematically and, other things being equal,

they also preferred smaller rewards for others. Older participants however selected larger

rewards for others. We did not find any evidence that participants were maximizing the

payoffs of both participants. However, we found that they came closer to that policy with

age, with participants in the 4th-5th age group being similarly close to it as participants

in the U group.

The most and least favorite options, as revealed by implicit rankings, also changed

over time. Indeed, for all ages, (4,0) and (3,3) were the most popular, but the frequency of

(4,0) decreased while the frequency of (3,3) increased with age. Similarly, (0,4) and (0,5)

were the least popular but the frequency of (0,5) decreased while the frequency of (0,4)

increased with age. The results are summarized in Table 3.

Most favorite Least favorite(4,0) (3,3) (0,4) (0,5)

K-1st 0.65 0.33 0.43 0.772nd-3rd 0.40 0.58 0.31 0.784th-5th 0.50 0.50 0.50 0.72U 0.33 0.73 0.80 0.30

Table 3: Most and least favorite options in Social domain derived from implicit rankings

Our Social-Choice task was also rich enough to permit a study of the evolution of

other regarding preferences. In particular, it is possible to study whether participants

were prosocial (chose (3,3) over (3,1)), willing to share (chose (3,1) over (4,0)) or envious

(chose (0,4) over (0,5)). In other words, we can conduct a similar analysis as in Fehr

et al. (2008), and determine a type for each subject as a function of the decisions in

these three pairs of choices. We defined the same five types as in Fehr et al. (2008):

“strongly egalitarian” (choices (3,3); (3,1); (0.4)), “weakly egalitarian” (choices (3,3);

28

(4,0); (0,4)), “strongly generous” (choices (3,3); (3,1); (0,5)), “weakly generous” (choices

(3,3); (4,0); (0,5)), and “spiteful” (choices (3,1); (4,0); (0,4)). To make it comparable with

the literature, we excluded subjects who expressed one or more indifferences. In Fig.14, we

report the proportion of subjects in each of these five categories (as in Fehr et al. (2008),

the remaining subjects are those who do not belong to any of them).

0

10

20

30

40

50

60

70

80

90

100

3y-4y 5y-6y 7y-8y

Fehretal.(2008)

0

10

20

30

40

50

60

70

80

90

100

K-1st 2nd-3rd 4th-5th U

Thisstudy

Spiteful

Weaklygenerous

Stronglygenerous

Weaklyegalitarian

Stronglyegalitarian

Figure 14: Evolution of other-regarding preferences.

Notice that the results with the aforementioned paper are not directly comparable: ages

overlap, reward structures are not identical and options are also different (e.g., choosing

between (1,1) and (1,2) captures a slightly different aspect of envy than choosing between

(0,4) and (0,5)). However, if we compare 5y-6y with K-1st and 7y-8y with 2nd-3rd we

notice that outcomes are similar across studies. Consistent with the centration hypothesis,

many young children are spiteful. As they grow, they develop some integrative reasoning

and become more egalitarian. Our oldest participants are predominantly generous.

Evolution of preferences in the Risk domain. The most common of the simple policies

employed by our participants was to maximize reward but, as in the Social domain, its

predominance changed over time. More than 20% of participants in the K-1st and 2nd-

3rd age groups used it against less than 10% in the 4th-5th and U age groups. We

chose maximization of expected value, E(V), as a template of integrative reasoning, but

strictly speaking no participant maximized E(V). Only two subjects were one step away

from that integrative policy and they both had some TV. For each participant, we counted

the number of choices that maximized E(V) and averaged this count across participants

in each Choice task and each age group. We found that participants in the K-1st, 2nd-

3rd and 4th-5th age groups were making significantly fewer choices consistent with the

maximization of E(V) compared to participants in the U age group. In particular, after

removing simple policy users, each group of children used policies that were farther away

29

from E(V) compared to participants in the U age group (t-test, p-values < 0.005).

When looking at the favorite option revealed by implicit rankings, we found that chil-

dren were transitioning gradually from the option involving the largest quantity (12, 12.5%)

to the option exhibiting the largest expected value (5, 50%), as depicted in Table 4. In-

terestingly, option (12, 12.5%) was ranked first by many and, at the same time, last by

others.

(12,12.5%) (5,50%)

K-1st 0.60 0.132nd-3rd 0.44 0.354th-5th 0.28 0.53U 0.16 0.82

Table 4: Favorite option in the Risk domain derived from implicit rankings

The results taken together showed that behavior was changing from choices consistent

with very simple policies to choices resulting from trade-offs and integrative thinking. The

centration effect observed in young participants made them appear selfish and consistent

in the Social domain and risk-loving and consistent in the Risk domain. These attitudes

gradually changed with age.

Finally, one could argue that a main result of the paper, namely the idea that partici-

pants learn to know what they like most in the Social domain whereas they learn to know

what they liked least in the Risk domain, was simply due to the fact that there was a

clear best option in Social and a clear worst option in Risk (despite our attempts to have

reasonably balanced options). This simple explanation would be inconsistent with the

fact that the most and least preferred options in Social and Risk change significantly with

age. Instead, we claim the existence of different developmental signatures across domains

regarding the subjects’ learning about their own preferences.

SI-3.5. Attention trials

Remember that after all 21 paired comparisons in each Choice task we included an

“attention trial”, that is, a last choice between the subject’s most- and least-favorite

options, as revealed by their implicit ranking. In each attention trial, subjects received an

inattention score of 1, 0.5, or 0 if they selected their least-favorite option, the indifference

button, or their most-favorite option, respectively. We found that most children were

attentive: 70% in K-1st, 76% in 2nd-3rd, 84% in 4th-5th and 100% U answered all

attention trials correctly. Most of those who failed got a total inattention score of 0.5, and

no children failed all 3 trials. We also found that performance on attention trials and TV

30

was correlated in all three tasks: Goods-Choice task (Pearson = 0.54, Spearman = 0.43),

Social-Choice task (Pearson = 0.40, Spearman = 0.20), and Risk-Choice task (Pearson =

0.36, Spearman = 0.32), all p-values < 0.01. It suggests a relationship between ability to

choose consistently and attention mechanisms. In the same lines, discrepancies between

explicit and implicit rankings were correlated in all domains with attention trials: Goods

domain (Pearson = 0.38, Spearman = 0.34) Social domain (Pearson = 0.32, Spearman =

0.21) and Risk domain (Pearson = 0.27, Spearman = 0.30), all p-values < 0.01.

These results imply that attentiveness as measured by attention trials was a predictor

of intransitivity and it was also associated with the ability to choose according to explicit

rankings.

SI-3.6. Transitive reasoning

We counted for each participant the number of mistakes accumulated in each level of

difficulty of the transitive reasoning task. In all three levels, the K-1st group, 2nd-3rd

group, and 4th-5th group accrued significantly more errors than the U group (p-value

< 0.001, p-value < 0.05, and p-value < 0.05, respectively). Participants in the K-1st

group made more mistakes on the most difficult reasoning trials than they did on the easy

or medium trials (p-value = 0.02 and p-value = 0.0001, respectively). Within the other

age groups the average error counts were similar across trial difficulty (with the exception

of the 2nd-3rd group which had more mistakes on easy than on difficult trials (p-value

= 0.02)).

Performance in the transitive reasoning task was correlated with attentiveness. This

was true for the most difficult trials (Pearson = 0.20, p-value < 0.01) as well as for all

levels of difficulty together (Pearson = 0.22, p-value < 0.01). We also found that it was

correlated with the level of discrepancies between implicit and explicit rankings in all

domains: Goods domain (Pearson = 0.33, p-value < 0.0001), Social domain (Pearson =

0.20, p-value < 0.01) and Risk domain (Pearson = 0.17, p-value < 0.05).

Overall, transitive reasoning was associated with the same explanatory variables as

transitive decision-making.

SI-3.7. The determinants of transitive choices

Relationship between TV and demographic variables. Given TV covaries across do-

mains, we looked for possible common explanatory variables of intransitivity. To this end,

we ran OLS regressions treating TV in each domain as the variable to be explained by a

set of characteristics. Those included age group, gender, number of younger siblings, and

number of older siblings. We found that the only significant explanatory variable for TV

was age group. Moving from one age group to the next was associated with a decreased

31

number of TV in the Goods-Choice task (p-value < 0.001) and in the Social-Choice task

(p-value < 0.001) but not in the Risk-Choice task (p-value = 0.921). The results were

unchanged when we removed subjects who used simple policies.

Relationship between TV and developmental variables. We ran OLS regressions on

the full sample to assess the explanatory power of mistakes in transitive reasoning on

transitivity violations in each domain. We found that mistakes in transitive reasoning (‘TR

mistakes’) were not associated with TV in the Risk domain. They were correlated with

TV in the Goods and Social domains, but significance levels dropped as we controlled for

other explanatory variables such as age group (‘Dummies K-1, 2-3, 4-5’) and attentiveness

(‘Attention trials’). These however were highly significant as well as the tendency to use

simple policies (‘Simple policy’). The results are reported in Tables 5, 6 and 7.

Model 1 Model 2 Model 3 Model 4

TR mistakes 0.878∗∗∗ 0.393∗∗ 0.297∗ 0.188Dummy K-1 − 3.049∗∗∗ 2.497∗∗∗ 1.376∗∗

Dummy 2-3 − 1.383∗∗ 0.824 0.221Dummy 4-5 − 0.146 -0.139 -0.275Attention trials − − 3.460∗∗∗ 2.565∗∗∗

Discrepancies − − − 2.397∗∗∗

Constant 1.359∗∗∗ 0.693∗ 0.708∗∗ -0.403

# obs 185 185 185 185R2 0.127 0.242 0.359 0.485

Significance levels: ∗ = 0.05, ∗∗ = 0.01, ∗∗∗ = 0.001.

Table 5: OLS regression of transitivity violations in the Goods domain

Overall, TV was best predicted in all domains by the performance in attention trials

and the ability to make choices consistent with explicit rankings (specifically, to have

similar implicit and explicit rankings, as captured by ‘Discrepancies’). It was further

predicted by centration (’Simple policy’) in the Social and Risk domains. Transitive

reasoning, though correlated with TV, failed to predict any TV result after controlling for

these other explanatory variables.

32

Model 1 Model 2 Model 3 Model 4 Model 5

TR mistakes 0.359∗∗∗ 0.115 0.069 0.016 -0.018Dummy K-1 − 1.492∗∗∗ 1.176∗∗ 0.775∗ 1.010∗∗

Dummy 2-3 − 0.738∗ 0.418 -0.034 0.023Dummy 4-5 − 0.215 0.048 0.031 0.122Attention trials − − 1.937∗∗∗ 1.330∗∗∗ 1.311∗∗∗

Discrepancies − − − 1.654∗∗∗ 1.250∗∗∗

Simple policy − − − − -1.003∗∗∗

Constant 0.894∗∗∗ 0.531∗ 0.538∗∗ -0.213 0.331

# obs 184 184 184 184 184R2 0.045 0.103 0.187 0.364 0.379


Table 6: OLS regression of transitivity violations in the Social domain

Model 1 Model 2 Model 3 Model 4 Model 5

TR mistakes 0.216 0.265 0.179 0.027 0.006Dummy K-1 − -0.358 -0.957 -0.947 0.234Dummy 2-3 − 0.069 -0.523 -0.149 0.657Dummy 4-5 − 0.765 0.457 0.598 1.007∗

Attention trials − − 3.579∗∗∗ 1.796∗∗∗ 1.690∗∗∗

Discrepancies − − − 2.237∗∗∗ 1.559∗∗∗

Simple policy − − − − -2.255∗∗∗

Constant 2.510∗∗∗ 2.390∗∗∗ 2.403∗∗∗ 0.592 1.328∗∗∗

# obs 183 183 183 183 183R2 0.008 0.022 0.157 0.416 0.471


Table 7: OLS regression of transitivity violations in the Risk domain

33

value-based decision-making: a new developmental paradigm€¦ · value-based decision-making: a...

Documents