


Journal of Experimental Psychology: Human Perception and Performance, 1977, Vol. 3, No. 4, 552-564

Knowing with Certainty: The Appropriateness of Extreme Confidence

Baruch Fischhoff, Paul Slovic, and Sarah Lichtenstein
Decision Research, A Branch of Perceptronics

Eugene, Oregon

How often are people wrong when they are certain that they know the answer to a question? The studies reported here suggest that the answer is "too often." For a variety of general-knowledge questions (e.g., absinthe is [a] a liqueur or [b] a precious stone), subjects first chose the most likely answer and then indicated their degree of certainty that the answer they had selected was, in fact, correct. Across several different question and response formats, subjects were consistently overconfident. They had sufficient faith in their confidence judgments to be willing to stake money on their validity. The psychological bases for unwarranted certainty are discussed in terms of the inferential processes whereby knowledge is constructed from perceptions and memories.

Two aspects of knowledge are what one believes to be true and how confident one is in that belief. Both are represented in a statement like, "I am 70% certain that Quito is the capital of Ecuador." While it is often not difficult to assess the veridicality of a belief (e.g., by looking in a definitive atlas), evaluating the validity of a degree of confidence is more difficult. For example, the 70% certainty in the above statement would seem more appropriate if Quito is the capital than if Quito isn't the capital, but that is a rather crude assessment. In a sense, only statements of certainty (0% or 100%) can be evaluated individually, according to whether the beliefs to which they are attached are true or false.

This research was supported by the Advanced Research Projects Agency (ARPA) of the Department of Defense and was monitored by the Office of Naval Research under Contract N00014-76-0074 (ARPA Order No. 3052) under a subcontract from Decisions and Designs, Inc. to Oregon Research Institute.

We would like to thank Bernard Corrigan, Robyn Dawes, Ward Edwards, and Amos Tversky for their contributions to this project.

Requests for reprints should be sent to Baruch Fischhoff, Decision Research, A Branch of Perceptronics, 1201 Oak Street, Eugene, Oregon 97401.

One way to validate degrees of confidence is to look at the calibration of a set of such confidence statements. An individual is well calibrated if, over the long run, for all propositions assigned a given probability, the proportion that is true is equal to the probability assigned. For example, half of those statements assigned a probability of .50 of being true should be true, as should 60% of those assigned .60, and all of those about which the individual is 100% certain. A burgeoning literature on calibration has been surveyed by Lichtenstein, Fischhoff, and Phillips (in press). The primary conclusion of this review is that people tend to be overconfident; that is, they exaggerate the extent to which what they know is correct. A fairly typical set of calibration curves, drawn from several studies, appears in Figure 1. We see that when people should be right 70% of the time, their "hit rate" is only 60%; when they are 90% certain, they are only 75% right; and so on.
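The calibration criterion described above reduces to grouping judgments by stated probability and comparing each group's hit rate with that probability. A minimal sketch (the data here are invented to mimic the overconfidence pattern just described, not taken from the studies):

```python
from collections import defaultdict

def calibration_table(judgments):
    """judgments: iterable of (stated probability, was_correct) pairs.
    Returns {probability: proportion correct}. A well-calibrated judge
    has proportion correct approximately equal to the stated probability
    in every bucket."""
    buckets = defaultdict(list)
    for p, correct in judgments:
        buckets[p].append(correct)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}

# Invented judgments showing the typical pattern: 60% right when 70% sure,
# 75% right when 90% sure.
data = [(0.7, True)] * 6 + [(0.7, False)] * 4 + \
       [(0.9, True)] * 3 + [(0.9, False)] * 1
print(calibration_table(data))  # {0.7: 0.6, 0.9: 0.75}
```

Plotting proportion correct against stated probability for each bucket yields a calibration curve of the kind shown in Figure 1.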

People's poor calibration may be, in part, just a question of scaling. Probabilities (or odds) are a set of numbers that people use with some internal consistency (e.g., the curves in Figure 1 are more or less monotonically increasing) but not in accordance




[Figure 1 appeared here: calibration curves plotting proportion correct (y-axis, .5 to 1.0) against subjects' probability responses (x-axis, .5 to 1.0).]

Figure 1. Some representative calibration curves. (Taken from Lichtenstein, Fischhoff, & Phillips, in press. Copyright 1977 by D. Reidel Publishing Co. Reprinted by permission.)

with the absolute criterion of calibration. Miscalibration can have serious consequences (see Lichtenstein et al., in press), yet people's inability to assess appropriately a probability of .80 may be no more surprising than the difficulty they might have in estimating brightness in candles or temperature in degrees Fahrenheit. Degrees of certainty are often used in everyday speech (as are references to temperature), but they are seldom expressed numerically, nor is the opportunity to validate them often available (Tversky & Kahneman, 1974).

The extremes of the probability scale are, however, not such foreign concepts. Being 100% certain that a statement is true is readily understood by most people, and its appropriateness is readily evaluated. The following studies examine the calibration of people's expressions of extreme certainty. The studies ask, How often are people wrong when they are certain that they are right? In Experiment 1, the answer is sought in probability judgments elicited by questions posed in four different ways.

Experiment 1

Method

Stimuli. The questions covered a wide variety of topics, including history, music, geography, nature, and literature. The four formats used were the following:

1. Open-ended format. Subjects were presented with a question stem, which they were asked to complete; for example, "Absinthe is a ______." After writing down an answer, they estimated the probability that their answer was correct, using a number from .00 to 1.00.

2. One-alternative format. Subjects were asked to assess the probability (from .00 to 1.00) that simple statements were correct; for example, "What is the probability that absinthe is a precious stone?" The statement of fact being judged was sometimes true and sometimes false.

3. Two-alternative format (half range of responses). For each question, subjects were asked



to choose the correct answer from two that were offered. After making each choice, they judged the probability that the choice was correct; for example, "Absinthe is (a) a precious stone or (b) a liqueur." Since they chose the more likely answer, their probabilities were limited to the range from .50 to 1.00.

4. Two-alternative format (full range of responses). Instead of having subjects pick the answer most likely to be correct as in Format 3, the experimenters randomly selected one of the two alternatives (e.g., [b] a liqueur) and had subjects judge the probability that the selected alternative was correct. Here the full range [.00, 1.00] was used. As in Format 3, one answer was correct.

Subjects and procedure. The subjects were 361 paid volunteers who responded to an ad in the University of Oregon student newspaper. They were assigned to the four groups according to preference for experiment time and date. Each group received the questions in only one of the four formats. Besides the differences in question format, the specific questions used differed somewhat from group to group. Instructions were brief and straightforward, asking subjects to choose or produce an answer and assign a probability of being correct in accordance with the format used.

Results

Lichtenstein and Fischhoff (in press) and Fischhoff and Lichtenstein (Note 1) have reported on the calibration of the entire range of probability responses of subjects in Experiment 1. Here we examine only their extreme responses. Table 1 shows (a) the frequency with which subjects indicated 1.00 or .00 as the probability an alternative was correct and (b) the percentage of answers associated with these extreme probabilities that were, in fact, correct. Answers assigned a probability of 1.00 of being correct were wrong between 20% and 30% of the time. Answers assigned a probability of .00 were right between 20% and 30% of the time. In Formats 2 and 4, where responses of 1.00 and .00 were possible, both responses occurred with about equal frequency. Furthermore, alternatives judged certain to be correct were wrong about as often as alternatives judged certain to be wrong were correct. The percentage of false certainties ranged from about 17% (Format 1) to about 30% (Format 2), but comparisons across formats should be made with caution because the items differed. Clearly, our subjects were wrong all too often when they were certain of the correctness of their choice of answer.

Experiment 2

Experiment 1 might be faulted because of the insensitivity of the response mode. With probabilities, subjects using the stereotypic responses of .50, .55, .60, and so on, have few possible responses for indicating different degrees of high certainty. At the extreme, most subjects restricted themselves to the responses .90, .95, and 1.00, corresponding to odds of 9:1, 19:1, and ∞:1. Perhaps with a more graduated response mode, subjects would be better able to express different levels of certainty. In Experiment 2, subjects were presented with general-knowledge questions concerned with a single topic—the incidence of different causes of death in the United States—and
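The correspondence noted above between probability responses and odds (.90 is 9:1, .95 is 19:1, and 1.00 is infinite odds) is just p/(1 − p); a one-line helper makes the mapping explicit:

```python
def prob_to_odds(p):
    """Odds of being right vs. wrong implied by probability p of being right."""
    if p >= 1.0:
        return float("inf")  # certainty corresponds to infinite odds
    return p / (1.0 - p)

# .90 -> 9:1, .95 -> 19:1, 1.00 -> infinite odds
for p in (0.90, 0.95, 1.00):
    print(p, prob_to_odds(p))
```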

Table 1
Analysis of Certainty Responses in Experiment 1

Question format                    No.    No.       Total no.  Certainty     % certainty  % correct certainty
                                   items  subjects  responses  response (p)  responses    responses
1. Open ended                      43     30        1,290      1.00          19.7         83.1
2. One alternative                 75     86        6,450      1.00          14.2         71.7
                                                               .00           13.8         29.5
3. Two alternative (half range)    75     120       9,000      1.00          21.8         81.8
4. Two alternative (full range)    50     131       6,500      1.00          17.3         80.7
                                                               .00           19.1         20.5



asked to express their confidence in their answers in odds. The odds scale is open ended at the extremes, easily allowing the expression of many different levels of great certainty (e.g., 20:1, 50:1, 100:1, 500:1, etc.).

Method

Stimuli. All items involved the relative frequencies of the 41 lethal events shown in Table 2. They were chosen because they were easily understood and had fairly stable death rates over the last 5 years for which statistics were available. The event frequencies appearing in Table 2 were estimated from vital statistics reports prepared by the National Center for Health Statistics and the "Statistical Bulletin" of the Metropolitan Life Insurance Company. These frequencies provided the correct answers for the questions posed to our subjects.

From among these 41 causes of death, 106 pairs were constructed according to the following criteria: (a) Each cause appeared in approximately six pairs and (b) the ratios of the statistical rates of the more-frequent event to the less-frequent event varied systematically from 1.25:1 (e.g., accidental falls vs. emphysema) to about 100,000:1 (e.g., stroke vs. botulism).
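The two example ratios can be checked directly against the death rates in Table 2 (a sketch using only the four rates quoted; rates are per 100 million United States residents per year):

```python
# Death rates per 100 million U.S. residents per year, from Table 2.
rates = {
    "accidental falls": 8_500,
    "emphysema": 10_600,
    "stroke": 102_000,
    "botulism": 1,
}

def pair_ratio(cause_a, cause_b):
    """Ratio of the more-frequent to the less-frequent cause in a pair."""
    a, b = rates[cause_a], rates[cause_b]
    return max(a, b) / min(a, b)

print(round(pair_ratio("accidental falls", "emphysema"), 2))  # 1.25
print(pair_ratio("stroke", "botulism"))  # 102000.0, i.e., about 100,000:1
```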

Procedure. Subjects' instructions read as follows:

Each item consists of two possible causes of death. The question you are to answer is: Which cause of death is more frequent, in general, in the United States?

For each pair of possible causes of death, (a) and (b), we want you to mark on your answer sheet which cause you think is more frequent.

Next, we want you to decide how confident you are that you have, in fact, chosen the more frequent cause of death. Indicate your confidence by the odds that your answer is correct. Odds of 2:1 mean that you are twice as likely to be right as wrong. Odds of 1,000:1 mean that you are a thousand times more likely to be right than wrong. Odds of 1:1 mean that you are equally likely to be right or wrong. That is, your answer is completely a guess.

At the top of the answer sheet we have drawn a scale that looks like this:

1:1    10:1    100:1    1,000:1    10,000:1    100,000:1    1,000,000:1    etc.

This scale is used to give you an idea of the kinds of numbers you might want to use. You don't have to use exactly these numbers. You could write 75:1 if you think that it is 75 times more likely that you are right than you are wrong, or 1.2:1 if you think that it is only 20% more likely that you are right than wrong.

Do not use odds less than 1:1. That would mean that it is less likely that you are right than that you are wrong, in which case you should indicate the other cause of death as more frequent.

Table 2
Lethal Events Whose Relative Frequencies Were Judged by Subjects in Experiments 2 and 3

Lethal event                                   Actual deaths per 100 million*
Smallpox                                             0
Poisoning by vitamins                                0.5
Botulism                                             1
Measles                                              2.4
Fireworks                                            3
Smallpox vaccination                                 4
Whooping cough                                       7.2
Polio                                                8.3
Venomous bite or sting                              23.5
Tornado                                             44
Lightning                                           52
Nonvenomous animal                                  63
Flood                                              100
Excess cold                                        163
Syphilis                                           200
Pregnancy, childbirth, and abortion                220
Infectious hepatitis                               330
Appendicitis                                       440
Electrocution                                      500
Motor vehicle-train collision                      740
Asthma                                             920
Firearm accident                                 1,100
Poisoning by solid or liquid                     1,250
Tuberculosis                                     1,800
Fire and flames                                  3,600
Drowning                                         3,600
Leukemia                                         7,100
Accidental falls                                 8,500
Homicide                                         9,200
Emphysema                                       10,600
Suicide                                         12,000
Breast cancer                                   15,200
Diabetes                                        19,000
Motor vehicle (car, truck, or bus) accident     27,000
Lung cancer                                     37,000
Cancer of the digestive system                  46,400
All accidents                                   55,000
Stroke                                         102,000
All cancers                                    160,000
Heart disease                                  360,000
All diseases                                   849,000

* Per-year death rates are based on 100 million United States residents.



Table 3
Percentage of Correct Answers for Major Odds Categories

                          Lethal events                                   General-knowledge questions
             Appropriate  Experiment 2           Experiment 3            Experiment 4
Odds         % correct*   % correct    N    % N  % correct    N    % N   % correct    N    % N
1:1              50          53       644    9      54       339    8       53       861    19
1.5:1            60          57        68    1      59       108    2.5     56       210     5
2:1              67          64       575    8      65       434   10       63       455    10
3:1              75          71       189    2      65       252    6       76       157     3.5
5:1              83          70       250    4      71       322    8       76       194     4
10:1             91          66     1,167   17      76       390    9       74       376     8
20:1             95          72       126    2      81       163    4       85        66     1.5
50:1             98          68       258    4      74       227    5       83        69     1.5
100:1            99          73     1,180   17      87       319    8       80       376     8
1,000:1          99.9        81       862   13      84       219    5       88       334     7
10,000:1        100          87       459    7      92       138    3       89       263     6
100,000:1       100          85       163    2      96        23     .5     92       134     3
1,000,000:1     100          90       157    2      96        47    1       94       360     8
Total                               6,098   88             2,981   70              3,855    75
Overall % correct            71.0                   72.5                    73.1

Note. % N refers to the percentage of odds judgments that fell in each of the major categories. There were 66 subjects in Experiment 2, 40 in Experiment 3, and 42 in Experiment 4.
* For well-calibrated subjects.

In case some of the causes of death are ambiguous or not well defined by the brief phrase that describes them, we have included a glossary for several of these items. Read this glossary before starting.

Subjects. The subjects were 66 paid volunteers who answered an ad in the University of Oregon student newspaper.

Results¹

Table 3 shows the percentages of correct answers, grouped across subjects, for each of the most frequently used (major) odds categories. At odds of 1:1, 1.5:1, 2:1, and 3:1, subjects were reasonably well calibrated. However, as odds increased from 3:1 to 100:1, there was little or no increase in accuracy. Only 73% of the answers assigned odds of 100:1 were correct. Accuracy jumped to 81% at 1,000:1 and to 87% at 10,000:1. For the answers assigned odds of 1,000,000:1 or greater, accuracy was 90%. For the latter responses, the appropriate degree of confidence would have been odds of 9:1. The 12% of responses that are not listed in Table 3 because they fell between the major odds categories showed similar calibration.

As in Experiment 1, subjects in Experiment 2 exhibited great overconfidence. They were frequently wrong at even the highest odds levels. Moreover, they gave many extreme odds responses. Of 6,996 odds judgments, 3,560 (51%) were greater than 50:1. Almost one fourth of the responses were greater than 1,000:1.

Experiment 3

Although the tasks and instructions for Experiments 1 and 2 seemed reasonably straightforward, we were concerned that subjects' extreme overconfidence might be

¹ A more detailed description of subjects' performances on this task and several related ones can be found in Lichtenstein, Slovic, Fischhoff, Combs, and Layman (Note 2).



due to lack of motivation or misunderstanding of the response scale. Experiment 3 replicated Experiment 2, giving more care and attention to instructing and motivating the subjects.

Method

Experiment 3 used the 106 causes-of-death questions and odds response format of Experiment 2. The experimenter started the session with a 20-minute lecture to the subjects. In this lecture, the concepts of probability and odds were carefully explained. The subtleties of expressing one's feelings of uncertainty as numerical odds judgments were discussed, with special emphasis on how to use small odds (between 1:1 and 2:1) when one is quite uncertain about the correct answer. A chart was provided showing the relationship between various odds estimates and the corresponding probabilities. Finally, subjects were taught the concept of calibration and were urged to make odds judgments in a way that would lead them to be well calibrated. (The complete text of the instructions is available from the authors.)

The subjects for Experiment 3 were 40 persons who responded to an ad in the University of Oregon student newspaper. As in previous experiments, they were paid for participating. Group size was held to about 20 to increase the likelihood that subjects would ask questions about any facet of the task that was unclear.

Results

The proportion of correct answers for each of the most frequent odds categories is shown in the center portion of Table 3. The detailed instructions had several effects. First, subjects were much more prone to use atypical odds such as 1.4:1, 2.5:1, and so on. Only 70% of their judgments fell within the major odds categories of Table 3, as compared to 88% for Experiment 2. Second, their odds estimates tended to be smaller. About 43% of their estimates were 5:1 or less, compared to 27% for this category in Experiment 2. Third, subjects in this experiment were more often correct at odds above 10:1 and thus were better calibrated.

Nevertheless, subjects again exhibited unwarranted certainty. They assigned odds greater than or equal to 50:1 to approximately one third of the items. Only 83% of the answers associated with these odds were correct. When subjects estimated odds of 50:1, they were correct 74% of the time and thus should have been giving odds of about 3:1. At 1,000:1, they should have been saying about 5:1.

Although only 70% of the responses fell in the major odds categories of Table 3, inclusion of the remaining 30% would not have changed the picture. Odds estimates falling between major categories were calibrated similarly to estimates within those categories. Elaborate instruction tempered subjects' extreme overconfidence, but only to a limited extent.

Experiment 4

Is there something peculiar to the causes-of-death items that induces such overconfidence? Experiment 4 replicated Experiment 3 using general-knowledge questions (of the type used in Experiment 1) matched in difficulty with the 106 causes-of-death items. In addition, subjects' faith in their odds judgments was tested by their willingness to participate in a gambling game based on those judgments.

Method

The questionnaire consisted of 106 two-alternative items covering a wide variety of topics; for example, "Which magazine had the largest circulation in 1970? (a) Playboy or (b) Time"; "Aden was occupied in 1839 by the (a) British or (b) French"; "Bile pigments accumulate as a result of a condition known as (a) gangrene or (b) jaundice." These items were taken from a large item pool with known characteristics. Availability of this pool allowed us to select items matched in difficulty, question by question, with the 106 items about lethal events studied in Experiments 2 and 3.

The subjects were 42 paid volunteers, recruited by an ad in the University of Oregon student newspaper. The instructions paralleled those of Experiment 3. Subjects first received the detailed lecture describing the concepts of probability, odds, and calibration. They then responded to the 106 general-knowledge items, marking the answer they thought to be correct and expressing their certainty about that answer with an odds judgment.

After responding to the 106 items, they were asked whether they would be willing to accept gambles contingent on the correctness of their answers and the appropriateness of their odds estimates. If subjects really believe in their extreme (extremely overconfident) odds responses, it



should be possible to construct gambles that they are eager to accept but which, in fact, are quite disadvantageous to them. The game was described by the following instructions:

The experiment is over. You have just earned $2.50, which you will be able to collect soon. But before you take the money and leave, I'd like you to consider whether you would be willing to play a certain game in order to possibly increase your earnings. The rules of the game are as follows.

1. Look at your answer sheet. Find the questions where you estimated the odds of your being correct as 50:1 or greater than 50:1. How many such questions were there? ______ (write number)

2. I'll give you the correct answers to these "50:1 or greater" questions. We'll count how many times your answers to these questions were wrong. Since a wrong answer in the face of such high certainty would be surprising, we'll call these wrong answers "your surprises."

3. I have a bag of poker chips in front of me. There are 100 white chips and 2 red chips in the bag. If I reach in and randomly select a chip, the odds that I will select a white chip are 100:2 or 50:1, just like the odds that your "50:1" answers are correct.

4. For every "50:1 or greater" answer you gave, I'll draw a chip out of the bag. (If you wish, you can draw the chips for me.) I'll put the chip back in the bag before I draw again, so the odds won't change. The probability of my drawing a red chip is 1/51. Since drawing a red chip is unlikely, every red chip I draw can be considered "my surprise."

5. Every time you are surprised by a wrong answer to a "50:1 or greater" question, you pay me $1. Every time I am surprised by drawing a red chip, I'll pay you $1.

6. If you are well calibrated, this game is advantageous to you. This is because I expect to lose $1 about once out of every 51 times I draw a chip, on the average. But since your odds are sometimes higher than 50:1, you expect to lose less often than that.

7. Would you play this game? Circle one: Yes   No

Subjects who declined were then asked if they would play if the experimenter raised the amount he would pay them to $1.50 whenever he drew a red chip, while they still had to pay only $1 in the event of a wrong answer. Those who still refused were offered $2 and then a final offer of $2.50 for every red chip. Since the experimenters expected the game to be unfair to subjects (by capitalizing on a "known" judgmental bias), it was not actually played for money.

Results

The proportion of correct answers associated with each of the most common odds responses is shown in the right-hand column of Table 3. Compared with the previous studies, subjects in Experiment 4 gave a higher proportion of 1:1 odds (19% of the total responses). A few difficult items led almost all of the subjects to give answers close to 1:1, indicating that they were trying to use small odds when they felt it was appropriate to do so. However, this bit of restraint was coupled with as high a percentage of large odds estimates as was given by the untutored subjects in Experiment 2. About one quarter of all answers were assigned odds equal to or greater than 1,000:1.

Once again, answers to which extremely high odds had been assigned were frequently wrong. At odds of 10:1, subjects were correct on about three out of every four questions, appropriate to odds of 3:1. At 100:1, they should have been saying 4:1. At 1,000:1 and at 100,000:1, estimates of about 7:1 and 9:1 would have been more in keeping with subjects' actual abilities. Over the large number of questions for which people gave odds of 1,000,000:1 or higher, they were wrong an average of about 1 time out of every 16.

The gambling game. Of the 42 subjects, 27 agreed to play the gambling game described above for $1. Six more agreed when the stakes were raised to $1.50 every time the experimenter drew a red chip. Of the holdouts, 3 subjects agreed to play at $2 for every red chip and 2 more agreed when the final offer of $2.50 was made. Only 3 subjects refused to participate at any level of payment per red chip.

After subjects had made their decisions about playing the game, they were asked whether they would change their minds if the game were to be played, on the spot, for real money. No subject indicated a desire to change his or her decision. Two subjects approached the experimenter after the experiment requesting that they be given a chance to play the game for cash. Their request was refused.



Of course, this game is strongly biased in favor of the experimenter. Since subjects were wrong about once for every eight answers assigned odds of 50:1 or greater, the game would have been approximately fair had the experimenter removed 86 of the white chips from the bag, leaving its contents at 14 white and 2 red chips.

The expected outcome of playing the game with each subject was simulated. Every wrong answer on a "50:1 or greater" question was assumed to cost the subject $1. The experimenter was assumed to have drawn 1/51 of a red chip for every answer given at odds greater than or equal to 50:1; his expected loss was then calculated in accordance with the bet the subject had accepted. For example, if a subject accepted the experimenter's first offer ($1 per red chip) and gave 17 "50:1 or greater" answers, the experimenter's simulated loss was 17/51 dollars (about 33¢).
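The simulation just described is simple expected-value arithmetic; a sketch (the one-in-eight miss rate and the 17-answer example come from the text):

```python
def expected_outcome(n_extreme, n_wrong, payment_per_red=1.0):
    """Subject's expected net result in dollars: the experimenter 'draws'
    1/51 of a red chip per extreme answer and pays payment_per_red for
    each; the subject pays $1 per wrong extreme answer."""
    return n_extreme * payment_per_red / 51.0 - n_wrong * 1.0

# The worked example: 17 extreme answers, none wrong, so the
# experimenter's simulated loss is 17/51 dollars (about 33 cents).
print(round(expected_outcome(17, 0), 2))  # 0.33

# Fair-bag check: with subjects wrong on about 1 in 8 extreme answers,
# a fair game needs P(red) = 1/8, i.e., 2 red chips against 14 white.
print(2 / (2 + 14))  # 0.125
```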

The subjects who agreed to play averaged 38.3 questions with odds greater than or equal to 50:1. Thirty-six persons had expected monetary losses, and three had expected wins. Individual expected outcomes ranged between a loss of $25.63 and a gain of $1.84. The mean expected outcome was a loss of $3.64 per person and the median outcome was a loss of $2.35. Ten persons would have lost more than $5. The 39 subjects would have lost a total of $142.13 across 1,495 answers at odds greater than or equal to 50:1, an average loss of 9.5¢ for every such answer. The two persons who earnestly requested special permission to play the game had expected losses totaling $33.38 between them.

Experiment 5: Playing for Keeps

Subjects in Experiment 4 viewed their overconfident odds judgments as faithful enough reflections of their state of knowledge that they were willing to accept hypothetical bets more disadvantageous than many that can be found in a Las Vegas casino. Before concluding that there is money to be made in "trivia hustling," we decided to replicate Experiment 4 with real gambling at the end.

Method

Nineteen subjects participated in Experiment 5. It differed from Experiment 4 only in that the gambling game was presented as a real game. After responding to the 106 items, subjects heard the gambling game instructions and decided whether or not they would play. They were told that they could lose all the money they had earned in the experiment and possibly even more than that. After they made their decisions about playing the game, subjects were told that any earnings from the game would be added to their pay for the experiment, but that if they lost money, none of the money initially promised them for participating would be confiscated. The game was then played on those terms.

Six of the 19 subjects agreed to play the game as first specified (with a $1 payment for each "experimenter's surprise"). Three more agreed to play when the experimenter offered to increase the payment to $1.50 per red chip. Increasing the payment to $2 brought in one additional player, and three more agreed to play at $2.50. Six subjects consistently refused to participate; some because they felt they were not well calibrated, others because they did not like to gamble.

Results

When the game was actually played, the 13 participating subjects missed 46 of the 387 answers (11.9%) to which they had assigned odds greater than or equal to 50:1. All 13 subjects would have lost money, ranging from $1 to $11 (in part because, by chance, no red chips were drawn). When the experimenter's part of the game was simulated as in Experiment 4, four subjects would have lost more than $6, and the average participating subject would have lost $2.64. Thus, the hypothetical nature of the gamble in Experiment 4 apparently had minimal influence on subjects' willingness to bet.
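The size of these losses follows directly from the arithmetic of odds: a bet that is fair when the stated odds are honest becomes sharply unfavorable when true accuracy falls short of them. The sketch below (Python; the payoff scheme is a simplified stand-in, not the experiment's actual chip game) contrasts the break-even case with the 11.9% error rate observed here:

```python
def expected_value(p_correct, odds, stake=1.0):
    """EV of a bet that is fair at the stated odds:
    win `stake` if the answer is right, lose `odds * stake` if wrong."""
    return p_correct * stake - (1 - p_correct) * odds * stake

# If odds of 50:1 were honest, P(correct) = 50/51 and the bet breaks even.
print(expected_value(50 / 51, 50))   # ~0.0
# At the observed 11.9% error rate, each unit staked loses about 5 units.
print(expected_value(1 - 0.119, 50))
```

With a true accuracy of 88.1% rather than the asserted 98%, every unit staked at 50:1 carries an expected loss of roughly five units, which is why nearly every player came out behind.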

General Subject and Item Analyses

Is undue confidence found only in a few subjects or only for a few special items? If cases of extreme overconfidence are concentrated in only a few subjects, then the generality of our conclusions would be limited. Pathological overconfidence on the part of a small sector of the public would be worth exploring further but would not tell us much about cognitive functioning in general. The results of the gambling games reported above show that this was not the case. Most subjects were willing to play and most would have lost money because they were too often wrong when using extreme odds. The left columns of Table 4 show the distribution of cases of extreme overconfidence (defined as giving odds of 50:1 or greater and being wrong) over subjects for Experiments 3 and 4. The great majority of subjects had one or more cases of extreme overconfidence. The median number was 4 in Experiment 4 and between 3 and 4 in Experiment 3, well over what would be expected with well-calibrated subjects. In each experiment, one subject appeared to be an outlier (those subjects having 32 and 27 cases). Reanalyzing the data after removing those two subjects had no effect on our conclusions.

Table 4
Frequency of Extreme Overconfidence (Odds Greater Than or Equal to 50:1 That Were Assigned to Wrong Answers)

                               Number of subjects
No. cases of extreme      Experiment 3   Experiment 4
overconfidence
 0                             5a             3
 1                             3              7
 2                             6              5
 3                             6              5
 4                             4              9
 5                             1              1
 6                             4              1
 7                             2              2
 8                             1              0
 9                             0              3
10                             2              2
11                             0              1
12                             1              0
13                             1              0
14                             0              0
15                             0              2
16                             2              0
17                             1              0
More than 17c                  1 (32)         1 (27)

                               Number of items
No. extremely             Experiment 3   Experiment 4
overconfident subjects
 0                            33b            41
 1                            25             20
 2                            19             16
 3                            13             10
 4                             3              8
 5                             2              2
 6                             4              1
 7                             1              3
 8                             1              0
 9                             0              0
10                             0              3
11                             2              0
12                             1              0
13                             0              1
14                             1              0
15                             1              1

a There were five subjects in Experiment 3 who never showed extreme overconfidence.
b There were 33 items in Experiment 3 for which no subject showed extreme overconfidence.
c Actual number of cases is in parentheses.

The right columns of Table 4 show the distribution of cases of extreme overconfidence over items. If most cases were concentrated in only a few items, the situation would be rather different than if a broad section of items fooled some of the people some of the time. It would not necessarily be less interesting, for it would remain to be explained why people went astray on those few items. As the results in Table 4 indicate, both situations seem to have been true. There are some items on which many people gave high odds to the wrong answer, but most items did show a few such cases.

Table 5
Percentage Wrong with Deceptive and Nondeceptive Items

                               Percentage wrong associated with odds of
Experiment and item            ≥50:1      ≥100:1     ≥1,000:1

Experiment 3
All items (106)                 16.6       14.1        12.9
Deceptive items (18)            73.9       75.5        72.3
Nondeceptive items (88)          8.9        6.7         6.9

Experiment 4
All items (106)                 13.8       13.1        10.8
Deceptive items (17)            73.2       76.7        70.6
Nondeceptive items (89)          7.6        6.8         5.2

Expected with perfect
calibration                    <1.96       <.99        <.10
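The "expected with perfect calibration" row in Table 5 is just the definition of odds: a respondent who gives odds of a:1 claims probability a/(a + 1) of being correct, so among answers assigned odds of at least a:1, a perfectly calibrated respondent is wrong at most 100/(a + 1) percent of the time. A minimal check (Python):

```python
def max_percent_wrong(odds):
    """Largest percentage wrong consistent with perfect calibration
    among answers assigned odds of at least `odds`:1."""
    return 100.0 / (odds + 1)

for odds in (50, 100, 1000):
    print(f"odds >= {odds}:1 -> at most {max_percent_wrong(odds):.2f}% wrong")
# prints 1.96, 0.99, and 0.10 -- the bounds in the last row of Table 5
```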

The items on which six or more subjects showed extreme overconfidence were all items that might be described as "deceptive," ones which less than 50% of the subjects answered correctly. Some correlation between deceptiveness and extreme overconfidence is inevitable; many subjects must get an answer wrong before many can get it wrong and be certain that they are right. There were 18 items in Experiment 3 and 17 items in Experiment 4 answered correctly by less than 50% of our subjects.

Table 5 shows the incidence of cases of extreme overconfidence with deceptive and nondeceptive items. Although extreme overconfidence is disproportionately prevalent with the deceptive items, it is still abundant with the nondeceptive ones. If the deceptive items are removed from the sample, then the remaining distribution of cases of extreme overconfidence over items closely resembles a Poisson distribution, which is what one would expect if such cases were distributed at random over items. One third of the easiest items, those answered correctly by 90% or more of our subjects, had at least one case of a subject answering wrongly and giving odds of being correct of 1,000:1 or greater. Deleting the one extreme subject from each of Experiments 3 and 4 had little effect on this result. Clearly, a few subjects or items are not responsible for the extreme overconfidence effect.
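The Poisson comparison mentioned above can be reproduced mechanically: if cases of extreme overconfidence strike items at random, the number of items with exactly j cases should follow a Poisson distribution whose mean is the observed cases-per-item rate. A sketch of that computation (Python; the frequency vector in the usage example is hypothetical, not the paper's nondeceptive-item data):

```python
import math

def poisson_expected_counts(freq):
    """freq[j] = observed number of items having exactly j cases.
    Returns the counts expected under a Poisson distribution
    with the same mean number of cases per item."""
    n_items = sum(freq)
    lam = sum(j * f for j, f in enumerate(freq)) / n_items  # mean cases per item
    return [n_items * math.exp(-lam) * lam**j / math.factorial(j)
            for j in range(len(freq))]

# Hypothetical example: 88 items, of which 60 had 0 cases, 22 had 1, 6 had 2.
observed = [60, 22, 6]
expected = poisson_expected_counts(observed)
```

Comparing `observed` with `expected` (e.g., by a chi-square statistic) indicates whether cases are spread at random over items, which is the pattern the nondeceptive items showed.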

General Discussion

These five experiments have shown people to be wrong too often when they are certain that they are right. This result was obtained with both probability and odds responses, with minimal and extensive instructions, and with two rather different types of questions. Subjects were sufficiently comfortable with their expressions of certainty that they were willing to risk money on them in both hypothetical and real gambles. Finally, cases of extreme overconfidence were widely distributed over subjects and items.

Although these studies have shown the effect to be a robust one, they have certainly not closed the topic. Further research with different subjects, different items, and different instructions would be most useful. Some moderately informed guesses at the results of such additional studies are possible. Lichtenstein and Fischhoff (in press) have found that the calibration of probability responses associated with general-knowledge questions is relatively invariant with regard to several factors not considered here, including subjects' intelligence, subjects' expertise in the subject-matter area of the questions, and subjects' reliance on the stereotypic responses of .50 and 1.00. They did, however, find that calibration varies with item difficulty.

A crucial question for generality is how well the level of item difficulty found in these experiments represents the level found in the world. Although no simple answer to this question is possible, it is worth noting that the items in Experiments 2 and 3 were not constructed with the intention of eliciting extreme overconfidence. Rather, they were constructed to vary in difficulty from very hard to very easy, as defined by the ratio of the statistical frequencies of death from each of the two causes. Items in Experiments 4 and 5 were matched to these items in difficulty.

To explain these results, we must understand both how people answer questions and how they assess the validity of their answering process. Collins (Collins, Warnock, Aiello, & Miller, 1975; Collins, Note 3) has shown that people use many different strategies in answering questions. We suspect, therefore, that extreme overconfidence can come from a variety of sources. Every answering procedure may have its own ways of leading people astray and its own ways of hiding that misguidance when people try to assess answer validity. Some possible pathways to overconfidence are described below.

Many of the items we presented to our subjects are on topics for which they do not have a ready answer stored in memory. They must infer the answer from other information known to them. But people may be insufficiently critical of their inference processes. They may fail to ask "What were my assumptions in deriving that inference?" or "How good am I at making such inferences?" For example, when people draw a few instances of a category from memory to get an idea of the properties of the category, they may not realize that readily available examples need not be representative of the category (Tversky & Kahneman, 1973). Wason and Johnson-Laird (1972) have shown that people have considerable confidence in their own erroneous syllogistic reasoning. Collins et al. (1975) have described a variety of inferential strategies that people use in producing answers without realizing their limitations. Summarizing her studies on the inference process in perception, Johnson-Abercrombie (1960) concluded, "The[se erroneous] inferences were not arrived at as a series of logical steps but swiftly and almost unconsciously. The validity of the inferences was usually not inquired into; indeed, the process was usually accompanied by a feeling of certainty of being right" (p. 89). Pitz (1974), who also observed overconfidence in probability estimates, elaborated a similar hypothesis. He proposed that people tend to treat the results of inferential processes as though there was no uncertainty associated with the early stages of the inference. Such a strategy is similar to the "best-guess" heuristic that has been found to describe the behavior of subjects in cascaded inference tasks (e.g., Gettys, Kelly, & Peterson, 1973).

For other questions, people believe that they are answering directly from memory without making any inferences. People commonly view their memories as exact (although perhaps faded) copies of their original experiences. However, considerable evidence has demonstrated that memory is more than just a copying process (e.g., Neisser, 1967). According to this view, people reach conclusions about what they have seen or what they remember by reconstructing their knowledge from fragments of information, much as a paleontologist infers the appearance of a dinosaur from fragments of bone. During reconstruction, a variety of cognitive, social, and motivational factors can introduce error and distortion into the output of the process. Examples of this are the foibles of eyewitness testimony documented by Buckhout (1974), Loftus (1974), Münsterberg (1908), and others.

Table 6
Deceptive Items in Experiment 3

Causes of death compareda                              Percent    No. cases of extreme
                                                       correct    overconfidenceb
Pregnancy, abortion, childbirth versus appendicitis      15            15
All accidents versus stroke                              17.5          14
Homicide versus suicide                                  25            12
Measles versus fireworks                                 25             5
Suicide versus diabetes                                  27.5           8
Breast cancer versus diabetes                            30             1

a Subjects judged the first cause of death listed to be less frequent than the second.
b Data are the number of subjects (out of 40) who gave odds greater than or equal to 50:1 to the wrong alternative.

Table 7
Deceptive Items in Experiment 4

General-knowledge questiona                       Answersb                       Percent    No. cases of extreme
                                                                                 correct    overconfidencec
1. Three fourths of the world's cacao             Africa* or South America         4.8          15
   comes from
2. Which causes more deaths in the U.S.?          Appendicitis* or pregnancy,     19.0          13
                                                  abortion, and childbirth
3. When was the first air raid?                   1849* or 1937                   26.2          10
4. Adonis was the god of                          Love or vegetation*             31.0          10
5. Kahlil Gibran was most inspired by             Buddhist or Christian*          33.3           2
   which religion?
6. Dido and Aeneas is an opera written by         Berlioz or Purcell*             33.3           1
7. Potatoes are native to                         Ireland or Peru*                35.7           0

a Some questions have been abbreviated slightly.
b Correct answer carries an asterisk.
c Data are the number of subjects (out of 42) who gave odds greater than or equal to 50:1 to the wrong alternative.

If people are unaware of the reconstructive nature of memory and perception and cannot distinguish between assertions and inferences (Harris & Monaco, in press), they will not critically evaluate their inferred knowledge. In general, any process that changes the contents of memory unbeknownst to people will keep them from asking relevant validity questions and may lead to overconfidence. In his classic studies of reconstructive processes in memory, Bartlett (1932) found that subjects not only created new material but were often highly certain about that which they had invented.2

We present these ideas more as a framework for future research and conceptualization than as an explanation for our results. Nonetheless, if these speculations have some validity, it should be possible to find apparent examples in our data. Tables 6 and 7 present the most deceptive items from Experiments 3 and 4, respectively. Although cases of extreme overconfidence were distributed over most items, these "deceptive" items produced a disproportionate share. In the absence of detailed protocols from subjects, these cases where many people went astray may provide better clues to our intuitions than situations where just one or two subjects had trouble with an item.

Looking at the deceptive items in Experiment 3 (see Table 6), we find that in many cases the cause of death incorrectly judged to be more frequent (the first one listed in each pair) is a dramatic, well-publicized event, whereas the underestimated cause is a more "quiet" killer. Considering the first three examples, (a) pregnancy, abortion, and childbirth, (b) accidents, and (c) homicide seem disproportionately more newsworthy and better reported than their comparison cause of death.3 In these cases, people may be relying on the greater availability in memory of examples of the "flashier" causes of death without realizing that availability is an imperfect inferential rule (Tversky & Kahneman, 1973).4 Other items suggest other answering processes. Subjects' confident—but erroneous—beliefs (not shown in Table 6) that there were fewer deaths from smallpox vaccine than from the disease itself may have been based on the generally valid assumption that vaccines are safer than the diseases they are meant to prevent. With smallpox, however, the vaccine has been so successful that no one has died of the disease in the U.S. since 1949, while from 6 to 10 people have died annually from complications arising from vaccination.

2 An example of the subtle role of assumptions in the reconstruction of knowledge comes from the experience of one of the authors, who became embroiled in a friendly debate with a colleague about the dates of a forthcoming conference. Both parties agreed that the conference was to last about 4 to 5 days. But the dispute centered about whether these dates were March 30 to April 3 or April 30 to May 3. The author was certain of the former dates because he specifically recalled the date March 30 in the organizer's letter. His colleague was certain of the latter period because he specifically recalled the date May 3 in the letter. Bets were placed, and the letter was consulted to resolve the dispute. To the surprise of both parties, the letter stated the dates as March 30 to May 3, an obvious mistake. Thus, both parties were correct regarding the fragment of information they recalled, but one fragment led to the wrong conclusion.

3 This speculation has been empirically affirmed by Lichtenstein, Slovic, Fischhoff, Combs, and Layman (Note 2).

4 Subjects in Experiment 3 were asked to select one answer about which they were certain and to write a short statement indicating why they were so confident. One subject explained odds of 2,000:1 that death from pregnancy was more frequent than deaths by appendicitis by writing "I've never heard of a person dying of appendicitis, but I have many times heard of persons dying during childbirth and abortion."

For the general-knowledge questions of Experiment 4 in Table 7, we will give a few interpretations of the varied ways that unrecognized or inadequately questioned assumptions can obscure the tenuousness of erroneous beliefs; the reader can surely provide others. Regarding Item 1, cacao is native to South America. Subjects who knew this fact (or guessed it from the Spanish-sounding name) may have been misled by assuming that the continent of origin is also the continent of greatest production. Similar reasoning may have been involved with Item 7. The potato's prominence in Irish history does not mean that it originated there. Regarding Item 3, it may not have occurred to subjects that an air raid could be conducted by balloons, which were used by Austria to bomb Venice in 1849. The fact that Adonis was a handsome youth who had an affair with Venus, the Goddess of Love, may have suggested that he, too, was a deity of love (Item 4). And so on.

Finally, let us add a warning that extreme overconfidence cuts both ways. Our sources for the answers to general-knowledge questions were a variety of encyclopedias and dictionaries. We viewed the answers they provided with great confidence. Much to our chagrin, we discovered on several occasions that these authoritative sources disagreed, a possibility we had never considered. Fortunately, our own overconfidence was discovered before conducting these experiments; the offending items were deleted and the remaining ones double- and triple-checked until we were certain of their accuracy.

Reference Notes

1. Fischhoff, B., & Lichtenstein, S. The effect of response mode and question format on calibration (Rep. No. 77-1). Eugene, Oregon: Decision Research, 1977.

2. Lichtenstein, S., Slovic, P., Fischhoff, B., Combs, B., & Layman, M. Perceived frequency of low probability, lethal events (Rep. No. 76-2). Eugene, Oregon: Decision Research, 1976.

3. Collins, A. Processes in acquiring knowledge (Rep. No. 3231). Cambridge, Mass.: Bolt Beranek and Newman, Inc., 1976.

References

Bartlett, F. C. Remembering. Cambridge, England: Cambridge University Press, 1932.

Buckhout, R. Eyewitness testimony. Scientific American, 1974, 231, 23-31.

Collins, A., Warnock, E. H., Aiello, N., & Miller, M. Reasoning from incomplete knowledge. In D. Bobrow & A. Collins (Eds.), Representation and understanding. New York: Academic Press, 1975.

Gettys, C. F., Kelly, C., III, & Peterson, C. R. The best guess hypothesis in multistage inference. Organizational Behavior and Human Performance, 1973, 10, 364-373.

Harris, R. J., & Monaco, G. E. Psychology of pragmatic implication: Information processing between the lines. Journal of Experimental Psychology: General, in press.

Johnson-Abercrombie, M. L. The anatomy of judgment. New York: Basic Books, 1960.

Lichtenstein, S., & Fischhoff, B. Do those who know more also know more about how much they know? The calibration of probability judgments. Organizational Behavior and Human Performance, in press.

Lichtenstein, S., Fischhoff, B., & Phillips, L. D. Calibration of probabilities: The state of the art. In H. Jungermann & G. de Zeeuw (Eds.), Decision making and change in human affairs. Amsterdam, The Netherlands: Reidel, in press.

Loftus, E. The incredible eyewitness. Psychology Today, December 1974, pp. 116-119.

Münsterberg, H. On the witness stand. New York: Doubleday, Page, 1908.

Neisser, U. Cognitive psychology. New York: Appleton-Century-Crofts, 1967.

Pitz, G. Subjective probability distributions for imperfectly known quantities. In G. W. Gregg (Ed.), Knowledge and cognition. New York: Wiley, 1974.

Tversky, A., & Kahneman, D. Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 1973, 5, 207-232.

Tversky, A., & Kahneman, D. Judgment under uncertainty: Heuristics and biases. Science, 1974, 185, 1124-1131.

Wason, P. C., & Johnson-Laird, P. N. Psychology of reasoning: Structure and content. Cambridge, Mass.: Harvard University Press, 1972.

Received December 14, 1976