Probability, algorithmic complexity, and subjective randomness

Thomas L. Griffiths
[email protected]

Department of Psychology
Stanford University

Joshua B. Tenenbaum
[email protected]

Brain and Cognitive Sciences Department
Massachusetts Institute of Technology

Abstract

We present a statistical account of human randomness judgments that uses the idea of algorithmic complexity. We show that an existing measure of the randomness of a sequence corresponds to the assumption that non-random sequences are generated by a particular probabilistic finite state automaton, and use this as the basis for an account that evaluates randomness in terms of the length of programs for machines at different levels of the Chomsky hierarchy. This approach results in a model that predicts human judgments better than the responses of other participants in the same experiment.

The development of information theory prompted cognitive scientists to formally examine how humans encode experience, with a variety of schemes being used to predict subjective complexity (Leeuwenberg, 1969), memorization difficulty (Restle, 1970), and sequence completion (Simon & Kotovsky, 1963). This proliferation of similar, seemingly arbitrary theories was curtailed by Simon's (1972) observation that the inevitable high correlation between measures of information content renders them essentially equivalent. The development of algorithmic information theory (see Li & Vitanyi, 1997, for a detailed account) has revived some of these ideas, with code lengths playing a central role in recent accounts of human concept learning (Feldman, 2000), subjective randomness (Falk & Konold, 1997), and the role of simplicity in cognition (Chater, 1999). Algorithmic information theory avoids the arbitrariness of earlier approaches by using a single universal code: the complexity of an object (called the Kolmogorov complexity after Kolmogorov, 1965) is the length of the shortest computer program that can reproduce it.

Chater and Vitanyi (2003) argue that a preference for simplicity can be seen throughout cognition, from perception to language learning. Their argument is based upon the important constraints that simplicity provides for solving problems of induction, which are central to cognition. Kolmogorov complexity gives a formal means of addressing "asymptotic" questions about induction, such as why anything is learnable at all, but the constraints it imposes are too weak to support the rapid inferences that characterize human cognition. In order to explain how human beings learn so much from so little, we need to consider accounts that can express the strong prior knowledge that contributes to our inferences. The structures that people find simple form a strict (and flexible) subset of those easily expressed in a computer program. For example, the sequence of heads and tails TTHTTTHHTH appears quite complex to us, even though, as the parity of the first 10 digits of π, it is easily generated by a computer. Identifying the kinds of regularities that contribute to our sense of simplicity will be an important part of any cognitive theory, and is in fact necessary since Kolmogorov complexity is not computable (Kolmogorov, 1965).

There is a crucial middle ground between Kolmogorov complexity and the arbitrary encoding schemes to which Simon (1972) objected. We will explore this middle ground using an approach that combines rational statistical inference with algorithmic information theory. This approach gives an intuitive transparency to measures of complexity by expressing them in terms of probabilities, and uses computability to establish meaningful differences between them. We will test this approach on judgments of the randomness of binary sequences, since randomness is one of the key applications of Kolmogorov complexity: Kolmogorov (1965) suggested that random sequences are irreducibly complex, a notion that has inspired several psychological theories (e.g. Falk & Konold, 1997). We will analyze subjective randomness as an inference about the source of a sequence X, comparing its probability of being generated by a random source, P(X|random), with its probability of generation by a more regular process, P(X|regular). Since probabilities map directly to code lengths, P(X|regular) uniquely identifies a measure of complexity. This formulation allows us to identify the properties of an existing complexity measure (Falk & Konold, 1997), and extend it to capture more of the statistical structure detected by people. While Kolmogorov complexity is expressed in terms of programs for a universal Turing machine, many of the regularities people detect are computable by simpler devices. We will use Chomsky's (1956) hierarchy of formal languages to organize our analysis, testing a set of nested models that can be interpreted in terms of the length of programs for automata at different levels of the hierarchy.

Complexity and randomness

The idea of using a code based upon the length of computer programs was independently proposed by Solomonoff (1964), Kolmogorov (1965), and Chaitin (1969), although it has come to be associated with Kolmogorov. A sequence X has Kolmogorov complexity K(X) equal to the length of the shortest program p for a (prefix) universal Turing machine U that produces X and then halts,

K(X) = min_{p : U(p) = X} l(p),    (1)

where l(p) is the length of p in bits. Kolmogorov complexity can be used to define algorithmic probability, with the probability of X being

R(X) = 2^{-K(X)} = max_{p : U(p) = X} 2^{-l(p)}.    (2)

There is no requirement that R(X) sum to one over all sequences; many probability distributions that correspond to codes are unnormalized, assigning the missing probability to an undefined sequence.

Kolmogorov complexity can be used to mathematically define the randomness of sequences, identifying a sequence X as random if l(X) − K(X) is small (Kolmogorov, 1965). While not necessarily following the form of this definition, psychologists have preserved its spirit in proposing that the perceived randomness of a sequence increases with its complexity. Falk and Konold (1997) consider a particular measure of complexity they call the "difficulty predictor" (DP), calculated by counting the number of runs (sub-sequences containing only heads or tails), and adding twice the number of alternating sub-sequences. For example, the sequence TTTHHHTHTH is a run of tails, a run of heads, and an alternating sub-sequence, DP = 4. If there are several partitions into runs and alternations, DP is calculated on the partition that results in the lowest score.¹

Falk and Konold (1997) showed that DP correlates remarkably well with subjective randomness judgments. Figure 1 shows the results of Falk and Konold (1997, Experiment 1), in which 97 participants each rated the apparent randomness of ten binary sequences of length 21, with each sequence containing between 2 and 20 alternations (transitions from heads to tails or vice versa). The mean ratings show the classic preference for overalternating sequences: the sequences perceived as most random are those with 14 alternations, while a truly random process would be most likely to produce sequences with 10 alternations. The mean DP has a similar profile, achieving a maximum at 12 alternations and giving a correlation of r = 0.93.

¹ We modify DP slightly from the definition of Falk and Konold (1997), who seem to require alternating sub-sequences to be of even length. The equivalence results shown below also hold for their original version, but it makes the counter-intuitive interpretation of HTHTH as a run of a single head, followed by an alternating sub-sequence, DP = 3. Under our formulation it would be parsed as an alternating sequence, DP = 2.
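For concreteness, the modified DP just described can be computed as a minimum-cost partition. The Python sketch below is our illustration rather than code from the original work: it scores each run as 1 and each alternating sub-sequence as 2, and minimizes over partitions by dynamic programming (the function name difficulty_predictor is ours).

    def difficulty_predictor(x):
        """DP of a binary 'H'/'T' string: the minimum over partitions of
        (number of runs) + 2 * (number of alternating sub-sequences)."""
        is_run = lambda s: len(set(s)) == 1
        is_alt = lambda s: all(a != b for a, b in zip(s, s[1:]))
        n = len(x)
        best = [0] + [float('inf')] * n      # best[i] = minimal score for x[:i]
        for i in range(1, n + 1):
            for j in range(i):
                seg = x[j:i]
                if is_run(seg):                    # a run costs 1
                    best[i] = min(best[i], best[j] + 1)
                if len(seg) > 1 and is_alt(seg):   # an alternating sub-sequence costs 2
                    best[i] = min(best[i], best[j] + 2)
        return best[n]

    # difficulty_predictor("TTTHHHTHTH") == 4; difficulty_predictor("HTHTH") == 2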

Figure 1: Mean randomness ratings from Falk and Konold (1997, Experiment 1), shown with the predictions of DP and the finite state model.

Subjective randomness as a statistical inference

Psychologists have claimed that the way we think about chance is inconsistent with probability theory (e.g. Kahneman & Tversky, 1972). For example, people are willing to say that X_1 = HHTHT is more random than X_2 = HHHHH, while they are equally likely to arise by chance: P(X_1|random) = P(X_2|random) = (1/2)^5. However, many of the apparently irrational aspects of human judgments can be understood by considering the possibility that people are assessing a different kind of probability: instead of P(X|random), we evaluate P(random|X) (Griffiths & Tenenbaum, 2001).

The statistical basis of subjective randomness becomes clear if we view randomness judgments in terms of a signal detection task (cf. Lopes, 1982; Lopes & Oden, 1987). On seeing a stimulus X, we consider two hypotheses: X was produced by a random process, or X was produced by a regular process. Finding regularities is an important part of identifying predictable processes, a fundamental component of induction (Lopes, 1982). The decision about the source of X can be formalized as a Bayesian inference,

P(random|X) / P(regular|X) = [P(X|random) / P(X|regular)] · [P(random) / P(regular)],    (3)

in which the posterior odds in favor of a random generating process are obtained from the likelihood ratio and the prior odds. The only part of the right hand side of the equation affected by X is the likelihood ratio, which led Griffiths and Tenenbaum (2001) to define the subjective randomness of X as

random(X) = log [P(X|random) / P(X|regular)],    (4)

being the evidence that X provides towards the conclusion that it was produced by a random process.

When evaluating binary sequences, it is natural to set P(X|random) = (1/2)^{l(X)}. Taking the logarithm in base 2, random(X) is −l(X) − log2 P(X|regular), depending entirely on P(X|regular). We obtain random(X) = K(X) − l(X), the difference between the complexity of a sequence and its length, if we choose P(X|regular) = R(X), the algorithmic probability defined in Equation 2. This is identical to the mathematical definitions of randomness discussed above. However, the key point of this statistical approach is that we are not restricted to using R(X): we have a measure of the randomness of X for any choice of P(X|regular), reducing the problem of specifying a measure of complexity to the more intuitive task of determining the probability with which sequences are produced by a regular process.

A statistical model of randomness perception

DP does a good job of predicting the results of Falk and Konold (1997), but it has some counter-intuitive properties, such as independence of length: HHTHT and HHHHHHHHHHHHHHHHHTHT both have DP = 3, but the former seems more random. In this section we will show that using DP is equivalent to specifying P(X|regular) with a hidden Markov model (HMM), providing the basis for a statistical model of randomness perception.

An HMM is a probabilistic finite state automaton, associating each symbol x_i in the sequence X = x_1 x_2 ... x_n with a "hidden" state z_i. The probability of X under this model is obtained by summing over all possible hidden states, P(X) = Σ_Z P(X, Z), where Z = z_1 z_2 ... z_n. The model assumes that each x_i is chosen based only on z_i and each z_i is chosen based only on z_{i-1}, allowing us to write P(X, Z) = P(z_1) Π_{i=2}^{n} P(z_i|z_{i-1}) Π_{i=1}^{n} P(x_i|z_i). An HMM can thus be specified by a transition matrix P(z_i|z_{i-1}), a prior P(z_1), and an emission matrix P(x_i|z_i). Hidden Markov models are widely used in statistical approaches to speech recognition and bioinformatics.

Using DP as a measure of randomness is equivalent to specifying P(X|regular) with an HMM corresponding to the finite state automaton shown in Figure 2. This HMM has six states, and we can define a transition matrix

P(z_i | z_{i-1}) =

    [  δ    Cα   Cα²   0    0    Cα²  ]
    [  Cα   δ    Cα²   0    0    Cα²  ]
    [  Cα   Cα   0     δ    0    Cα²  ]
    [  Cα   Cα   δ     0    0    Cα²  ]
    [  Cα   Cα   Cα²   0    0    δ    ]
    [  Cα   Cα   Cα²   0    δ    0    ]        (5)

where each row is a vector of (unnormalized) transition probabilities (i.e. the first row is P(z_i | z_{i-1} = 1)), and a prior P(z_1) = (Cα, Cα, Cα², 0, 0, Cα²), with C = (1−δ)/(2α+2α²). If we then take P(x_i = H | z_i) to be 1 for z_i = 1, 3, 5 and 0 for z_i = 2, 4, 6, we have a regular generating process based on repeating four "motifs": state 1 repeats H, state 2 repeats T, states 3 and 4 repeat HT, and states 5 and 6 repeat TH. δ is the probability of continuing with a motif, while α defines a prior over motifs, with the probability of producing a motif of length k proportional to α^k.

Having defined this HMM, the equivalence to DP is straightforward. For a choice of Z indicating n_1 runs and n_2 alternating sub-sequences, P(X, Z) = δ^{n−n_1−n_2} ((1−δ)/(2α+2α²))^{n_1+n_2} α^{n_1+2n_2}. Taking P(X|regular) to be max_Z P(X, Z), it is easy to show that random(X) = −DP log α when δ = 0.5 and α = (√3 − 1)/2. By varying δ and α, we obtain a more general model: as shown in Figure 1, taking δ = 0.525, α = 0.107 gives a better fit to the data of Falk and Konold (1997), r = 0.99. This also addresses some of the counter-intuitive predictions of DP: if δ > 0.5, increasing the length of a sequence but not changing the number of runs or alternating sub-sequences reduces its randomness, since P(X|regular) grows faster than P(X|random). With the choices of δ and α given above, random(HHTHT) = 3.33, while random(HHHHHHHHHHHHHHHHHTHT) = 2.61. The effect is greater with larger values of δ.

Just as the algorithmic probability R(X) is a probability distribution defined by the length of programs for a universal Turing machine, this choice of P(X|regular) can be seen as specifying the length of "programs" for a particular finite state automaton. The output of an automaton is determined by its state sequence, just as the output of a universal Turing machine is determined by its program. However, since the state sequence is the same length as the sequence itself, this alone does not provide a meaningful measure of complexity. In our model, probability imposes a metric on state sequences, dictating a greater cost for moves between certain states. Since we find the state sequence Z most likely to have produced X, we have an analogue of Kolmogorov complexity defined on a finite state automaton.
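The finite state model can be made concrete with a short sketch. The Python code below is our illustration under the assumptions spelled out in the text: it builds the six-state HMM of Equation 5 for given δ and α, runs the Viterbi recursion to obtain max_Z log2 P(X, Z), and returns random(X) = −l(X) − log2 P(X|regular). With the default parameters δ = 0.5 and α = (√3 − 1)/2 it reproduces the DP equivalence.

    import numpy as np

    def hmm_log2_score(x, delta, alpha):
        """max_Z log2 P(x, Z) for the six-state motif HMM (Equation 5), via Viterbi."""
        C = (1 - delta) / (2 * alpha + 2 * alpha ** 2)
        a, a2 = C * alpha, C * alpha ** 2
        # Unnormalized transition matrix; rows index z_{i-1}, columns z_i.
        T = np.array([[delta, a,     a2,    0,     0,     a2],
                      [a,     delta, a2,    0,     0,     a2],
                      [a,     a,     0,     delta, 0,     a2],
                      [a,     a,     delta, 0,     0,     a2],
                      [a,     a,     a2,    0,     0,     delta],
                      [a,     a,     a2,    0,     delta, 0]])
        prior = np.array([a, a, a2, 0, 0, a2])
        emits_H = np.array([True, False, True, False, True, False])  # states 1, 3, 5 emit H
        with np.errstate(divide='ignore'):
            logT, logprior = np.log2(T), np.log2(prior)
        logemit = lambda s: np.where(emits_H == (s == 'H'), 0.0, -np.inf)
        v = logprior + logemit(x[0])
        for sym in x[1:]:
            v = np.max(v[:, None] + logT, axis=0) + logemit(sym)
        return np.max(v)

    def finite_state_random(x, delta=0.5, alpha=(3 ** 0.5 - 1) / 2):
        """random(X) = log2 P(X|random) - log2 P(X|regular) under the finite state model."""
        return -len(x) - hmm_log2_score(x, delta, alpha)

    # With the default parameters, finite_state_random("TTTHHHTHTH") equals
    # -4 * log2((sqrt(3) - 1) / 2), i.e. DP = 4 scaled by -log2(alpha).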

Figure 2: The finite state automaton corresponding to the HMM described in the text. Solid lines indicate motif continuation, dotted lines are permitted state changes, and numbers label the states.

[Figure 3: panels plot P(correct) against number of alternations for the Repetition and Alternation conditions, with Theoretical, Symmetric, and Asymmetric curves, and corresponding finite state and context-free model predictions.]
Figure 3: Results of Lopes and Oden (1987, Experiment 1) together with predictions of finite state and context-free models, illustrating effects of symmetry.

Ascending the Chomsky hierarchy

Solomonoff's (1964) contemplation of codes based upon computer programs was initiated by Noam Chomsky's talk at the 1956 Dartmouth Summer Study Group on Artificial Intelligence (Li & Vitanyi, 1997, p. 308). Chomsky was presenting a formal hierarchy of languages based upon the kinds of computing machines required to recognize them (Chomsky, 1956). Chomsky identified four types of languages, falling into a strict hierarchy: Type 3, the simplest, are regular languages, recognizable by a finite state automaton; Type 2 are context-free languages, recognizable by a push-down automaton; Type 1 are context-sensitive languages, recognizable by a Turing machine with a tape of length linearly bounded by the length of the input; Type 0 are recursively enumerable languages, recognizable by a standard Turing machine. Kolmogorov complexity is defined with respect to a universal Turing machine, capable of recognizing Type 0 languages.

There are features of regular sequences that cannot be recognized by a finite state automaton, belonging to languages on higher levels of the Chomsky hierarchy. One such feature is symmetry: the set of symmetric sequences is a classic example of a context-free language, and symmetry is known to affect subjective randomness judgments (Lopes & Oden, 1987). Here we will develop "context-free" (Type 2) and "context-sensitive" (Type 1) models that incorporate these regularities.

Lopes and Oden (1987, Experiment 1) illustrated the effects of symmetry on subjective randomness using a signal detection task, in which participants classified sequences of length 8 as being either random or non-random. Half of the sequences were generated at random, but the other half were generated by a process biased towards either repetition or alternation, depending on condition. The proportion of sequences correctly classified was examined as a function of the number of alternations and whether or not a sequence was symmetric, including both mirror symmetry (all sequences for which x_1 x_2 x_3 x_4 = x_8 x_7 x_6 x_5) and "cyclic" symmetry (HTHTHTHT, HHTTHHTT, HHHHTTTT and their complements). Their results are shown in Figure 3, together with the theoretical optimal performance that could be obtained with perfect knowledge of the processes generating the sequences. Deviations from optimal performance reflect a difference between the P(X|regular) implicitly used by participants and the distribution used to generate the sequences.

Our Bayesian model naturally addresses this decision problem. By the relationship between log odds and probabilities, we have

P(random|X) = 1 / (1 + exp{−λ random(X) − ψ}),

where λ scales the effect of random(X), with λ = 1 a correctly weighted Bayesian inference, and ψ = log [P(random)/P(regular)] is the log prior odds. Fitting the finite state model outlined in the previous section to this data gives δ = 0.638, α = 0.659, λ = 0.128, ψ = −2.75, and a correlation of r = 0.90. However, as shown in Figure 3, the model does not predict the effect of symmetry.
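Converting random(X) into a classification probability is then a one-line computation. The sketch below is ours (finite_state_random refers to the earlier sketch); it implements the logistic expression above, and the usage comment plugs in the finite state parameters reported for the Lopes and Oden (1987) data.

    import math

    def p_random_given_x(randomness, lam, psi):
        """P(random | X) from random(X): a logistic in the scaled evidence plus prior."""
        return 1.0 / (1.0 + math.exp(-lam * randomness - psi))

    # With the finite state fit reported in the text:
    # p_random_given_x(finite_state_random(x, 0.638, 0.659), lam=0.128, psi=-2.75)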

Accommodating symmetry requires a "context-free" model for P(X|regular). This model allows sequences to be generated by three methods: repetition, producing sequences with probabilities determined by the HMM introduced above; symmetry, where half of the sequence is produced by the HMM and the second half is produced by reflection; and complement symmetry, where the second half is produced by reflection and exchanging H and T. We then take P(X|regular) = max_{Z,M} P(X, Z|M) P(M), where M is the method of production. Since the two new methods of producing a sequence go beyond the capacities of a finite state automaton, this can be viewed as imposing a metric on the programs for a push-down automaton. Applying this model to the data from Lopes and Oden (1987), we obtain δ = 0.688, α = 0.756, λ = 0.125, ψ = −2.99, and probabilities of 0.437, 0.491, 0.072, for repetition, symmetry, and complement symmetry respectively, with a fit of r = 0.98. As shown in Figure 3, the model captures the effect of symmetry.
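One way to implement this context-free model is sketched below, under our reading that the reflected half contributes no additional probability and that sequences have even length; it reuses hmm_log2_score from the earlier sketch and weights each production method by its prior probability.

    import math

    def context_free_random(x, delta, alpha, p_repeat, p_sym, p_comp):
        """random(X) under the context-free model: the best of three production
        methods, each weighted by its prior probability (assumes even length)."""
        flip = {'H': 'T', 'T': 'H'}
        half = x[:len(x) // 2]
        scores = [math.log2(p_repeat) + hmm_log2_score(x, delta, alpha)]
        if x == half + half[::-1]:                               # mirror symmetry
            scores.append(math.log2(p_sym) + hmm_log2_score(half, delta, alpha))
        if x == half + ''.join(flip[c] for c in half[::-1]):     # complement symmetry
            scores.append(math.log2(p_comp) + hmm_log2_score(half, delta, alpha))
        return -len(x) - max(scores)

    # e.g. with the fit reported above:
    # context_free_random(x, 0.688, 0.756, 0.437, 0.491, 0.072)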

Duplication is a context-sensitive regularity: the set of all sequences generated by repeating a sub-sequence exactly twice forms a context-sensitive language. This kind of regularity can be incorporated into a "context-sensitive" model in the same way as symmetry, but the results of Lopes and Oden (1987) are at too coarse a grain to evaluate such a model. Likewise, these results do not allow us to identify whether our simple finite state model captures enough regularities: since only motifs of up to length 2 are included, random(THHTHHTH) is quite large.
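Under the same reading, duplication can be added as a fourth production method. The fragment below extends the two sketches above (again our illustration), treating a sequence as a duplication when it is a sub-sequence repeated exactly twice.

    import math

    def context_sensitive_random(x, delta, alpha, p_repeat, p_sym, p_comp, p_dup):
        """Adds duplication to the context-free sketch (assumes even length)."""
        half = x[:len(x) // 2]
        # Best score among the three context-free methods, recovered from that model.
        best_cf = -len(x) - context_free_random(x, delta, alpha, p_repeat, p_sym, p_comp)
        scores = [best_cf]
        if x == half + half:                                     # duplication
            scores.append(math.log2(p_dup) + hmm_log2_score(half, delta, alpha))
        return -len(x) - max(scores)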

Table 1: Log likelihood (correlation) for models

Model                4 motifs           22 motifs
Finite state         -1617.95 (0.65)    -1597.47 (0.69)
Context-free         -1577.41 (0.74)    -1553.59 (0.79)
Context-sensitive    -1555.47 (0.79)    -1531.05 (0.83)

To evaluate the contribution of these factors, we conducted an experiment testing two sets of nested models. The experiment was based on Lopes and Oden (1987, Experiment 1), asking participants to classify sequences as regular or random. One set of models used the HMM equivalent to DP, with 4 motifs and 6 states. The second used an HMM extended to allow 22 motifs (all motifs up to length 4 that were not repetitions of a smaller motif), with a total of 72 states. In each set we evaluated three models, at different levels in the Chomsky hierarchy. The finite state (Type 3) model was simplest, with four free parameters: δ, α, λ and ψ. The context-free (Type 2) model adds two parameters for the probabilities of symmetry and complement symmetry, and the context-sensitive (Type 1) model adds one more parameter for the probability of duplication. Because the three models are nested, the simpler being a special case of the more complex, we can use likelihood ratio tests to determine whether the additional parameters significantly improve the fit of the model.
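Because twice the difference in log likelihood between nested models is asymptotically χ²-distributed with degrees of freedom equal to the number of additional parameters, the test itself is only a few lines; the sketch below (ours, using scipy) reproduces, for example, χ²(2) = 2(1597.47 − 1553.59) = 87.76 from Table 1.

    from scipy.stats import chi2

    def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params):
        """Chi-square test for nested models: returns the statistic and its p-value."""
        stat = 2.0 * (loglik_complex - loglik_simple)
        return stat, chi2.sf(stat, df=extra_params)

    # likelihood_ratio_test(-1597.47, -1553.59, 2) gives (87.76, p < 0.0001)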

Method

Participants
Participants were 20 MIT undergraduates.

Stimuli
Stimuli were sequences of heads (H) and tails (T) presented in 130 point fixed width sans-serif font on a 19" monitor at 1280 × 1024 pixel resolution.

Procedure
Participants were instructed that they were about to see sequences which had either been produced by a random process (flipping a fair coin) or by other processes in which the choice of heads and tails was not random, and had to classify these sequences according to their source. After a practice session, each participant classified all 128 sequences of length 8, in random order, with each sequence randomly starting with either a head or a tail. Participants took breaks at intervals of 32 sequences.

Results
We optimized the log-likelihood for all six models, with the results shown in Table 1. The model with 4 motifs consistently performed worse than the model with 22 motifs, so we will focus on the results of the latter. The context-free model gave a significant improvement over the finite state model, χ²(2) = 87.76, p < 0.0001, and the context-sensitive model gave a further significant improvement, χ²(1) = 45.08, p < 0.0001.

Figure 4: Scatterplots show the relationship between model predictions and data, with markers according to the process the context-sensitive model identified as generating the sequence. The arrays on the right show the sequences used in the experiment, ordered from most to least random by both human responses and the context-sensitive model.

Scatterplots for these three models are shown in Figure 4, together with sequences ordered by randomness based on the data and the context-sensitive model. The parameters of the context-sensitive model were δ = 0.706, α = 0.102, λ = 0.532, ψ = −1.99, with probabilities of 0.429, 0.497, 0.007, 0.067 for repetition, symmetry, complement, and duplication. This model also accounted well for the data sets discussed earlier in the paper, obtaining correlations of r = 0.94 on Falk and Konold (1997) and r = 0.91 on Lopes and Oden (1987) using exactly the same parameter values, showing that it can give a good account of randomness judgments in general and is not just overfitting this particular data set.

The likelihood ratio tests suggest that the context-sensitive model gives the best account of human randomness judgments. Since this model has several free parameters, we evaluated its generalization performance using a split-half procedure. We randomly split the participants in half ten times, computed the correlation between halves, and fit the context-sensitive model to one half, computing its correlation with both that half and the unseen data. The mean correlation (obtained via a Fisher z transformation) between halves was r = 0.73, while the model gave a mean correlation of r = 0.77 on the fit half and r = 0.76 on the unseen half. The fact that the correlation with the unseen data is higher than the correlation between halves suggests that the model is accurately extracting information about the statistical structure underlying subjective randomness.
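As a reminder, the Fisher z transformation used to average these correlations is arctanh, with the mean mapped back through tanh; a minimal sketch:

    import numpy as np

    def mean_correlation(rs):
        """Average a list of correlations via the Fisher z transformation."""
        return np.tanh(np.mean(np.arctanh(rs)))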

Discussion

The results of our experiment and model-fitting suggest that the perceived randomness of binary sequences is sensitive to motif repetition, symmetry and symmetry in the complement, and duplication of a sub-sequence. In fact, these regularities can be incorporated into a statistical model that predicts human judgments better than the responses of other participants in the same experiment. These regularities can be recognized by a Turing machine with a tape of length linearly bounded by the length of the sequence, corresponding to Type 1 in the Chomsky hierarchy. Our statistical model provides a computable measure of the randomness of a sequence in the spirit of Kolmogorov complexity, but defined on a simpler computing machine.

The probabilistic approach presented in this paper provides an intuitive method for developing measures of complexity. However, we differ from existing accounts of randomness that make claims about complexity (Chater, 1999; Falk & Konold, 1997) in viewing probability as primary, and the relationship between randomness and complexity as a secondary consequence of a statistical inference comparing random generation with generation by a more regular process. This approach emphasizes the interpretation of subjective randomness in terms of a rational statistical inference, and explains why complex sequences should seem more random in terms of P(X|regular) being biased towards simple outcomes: random sequences are those that seem too complex to have been produced by a simple process.

Chater and Vitanyi (2003) argue that simplicity may provide a unifying principle for cognitive science. While simplicity undoubtedly plays an important role in guiding induction, being able to use these ideas in cognitive science requires developing a means of quantifying simplicity that can accommodate the kind of strong prior knowledge that human beings bring to bear on inductive problems. Kolmogorov complexity provides a universal, objective measure, and a firm foundation for this endeavour, but is very permissive in the kinds of structures it identifies as simple. We have described an approach that uses the framework of rational statistical inference to explore measures of complexity that are more restrictive than Kolmogorov complexity, while retaining the principles of algorithmic information theory by organizing these measures in terms of computability. This approach provides us with a good account of subjective randomness, and suggests that it may be possible to develop restricted measures of complexity applicable elsewhere in cognitive science.

Acknowledgments We thank Liz Baraff for her devotion to data collection, Tania Lombrozo, Charles Kemp, and Tevya Krynski for significantly reducing P(random|this paper), and Ruma Falk, Cliff Konold, Lola Lopes, and Gregg Oden for answering questions, providing data and sending challenging postcards. TLG was supported by a Stanford Graduate Fellowship.

References

Chaitin, G. J. (1969). On the length of programs for computing finite binary sequences: statistical considerations. Journal of the ACM, 16:145–159.

Chater, N. (1999). The search for simplicity: A fundamental cognitive principle? Quarterly Journal of Experimental Psychology, 52A:273–302.

Chater, N. and Vitanyi, P. (2003). Simplicity: a unifying principle in cognitive science. Trends in Cognitive Science, 7:19–22.

Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2:113–124.

Falk, R. and Konold, C. (1997). Making sense of randomness: Implicit encoding as a bias for judgment. Psychological Review, 104:301–318.

Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407:630–633.

Kahneman, D. and Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3:430–454.

Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1:1–7.

Leeuwenberg, E. L. L. (1969). Quantitative specification of information in sequential patterns. Psychological Review, 76:216–220.

Li, M. and Vitanyi, P. (1997). An introduction to Kolmogorov complexity and its applications. Springer Verlag, London.

Lopes, L. L. (1982). Doing the impossible: A note on induction and the experience of randomness. Journal of Experimental Psychology, 8:626–636.

Lopes, L. L. and Oden, G. C. (1987). Distinguishing between random and nonrandom events. Journal of Experimental Psychology: Learning, Memory and Cognition, 13:392–400.

Restle, F. (1970). Theory of serial pattern learning. Psychological Review, 77:481–495.

Simon, H. A. (1972). Complexity and the representation of patterned sequences of symbols. Psychological Review, 79:369–382.

Simon, H. A. and Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70:534–546.

Solomonoff, R. J. (1964). A formal theory of inductive inference. Part I. Information and Control, 7:1–22.