Judgments and Decisions Psych 253 Individual, Group, and Computer Strengths and Weaknesses

Upload: bathsheba-wiggins

Post on 27-Dec-2015



Individual Weaknesses

• Limited perception: Accept the frame and context that are given

• Limited attention: Insensitivity to relevant information and sensitivity to irrelevant information (order, phrasing, surrounding situation)

• Limited memory: Short-term memory holds only about 7 ± 2 items

• Limited reasoning: People are inconsistent and invalid information processors.

Individual Strengths

• Computer builders: We build the machines, not vice versa.

• Theorists: We develop normative theories for decision-making.

• Pattern recognizers: We can see and extract patterns (faces, chess experts, nurses, pilots, art experts).

“It is a simple house fire in a one-story house in a residential neighborhood. The fire is in the back, in the kitchen area. The lieutenant leads his hose crew into the building, to the back, to spray water on the fire, but the fire just roars back at them. ‘Odd,’ he thinks. The water should have more of an impact. They try dousing it again and get the same results. They retreat a few steps to regroup. Then the lieutenant starts to feel as if something is not right. He doesn’t have any clues; he just doesn’t feel right about being in that house, so he orders his men out of the building—a perfectly standard building with nothing out of the ordinary. As soon as his men leave the building, the floor where they had been standing collapses. Had they still been inside, they would have plunged into the fire below.”

Source: Klein, Gary (1998), Sources of Power: How People Make Decisions, Cambridge, MA: MIT Press.

Consider the scene above, from Klein’s Sources of Power. Why were Klein’s experts so good?

Many reasons, but one important point to notice: There were no “gold standards” in his real-world scenarios. Perhaps, if statistical models had been developed, the models would have outperformed the experts.

Many people claim to be experts. How do we know whether an “expert” is really an expert?

Experts should be identified by comparing their predictions to a gold standard, such as survival (with respect to surgeons) and safety (with respect to air traffic controllers). But often there is no gold standard (e.g., wine connoisseurs, professors grading essays, eyewitnesses giving testimony, jurors determining guilt or innocence).

When gold standards are not available, experts should AT LEAST show:

1. Discrimination in judgments between similar, though not identical, stimuli

2. Consistency in judgments of the same stimuli on repeated occasions

In a study on good judgment, Swedish general practitioners judged the probability of heart failure for 45 cases based on real patients. Five of the cases were presented twice (though the physicians were not told that); these repeated cases are labeled A, B, C, D, and E. Assessments were made on a scale from “Totally Unlikely” to “Certain.”

Samples from three practitioners’ judgments are shown in the next slides.

Source: Skånér, Y., Strender, L. E., & Bring, J. (1998), “How do GPs use clinical information in their judgements of heart failure? A Clinical Judgment Analysis Study,” Scandinavian Journal of Primary Health Care, 16, 95-100.

[Figure: GP who is consistent but can’t discriminate. Judgments on a 0 to 100 scale for repeated patient cases A through E.]

[Figure: GP who discriminates but is inconsistent. Judgments on a 0 to 100 scale for repeated patient cases A through E.]

[Figure: GP who discriminates and is consistent. Judgments on a 0 to 100 scale for repeated patient cases A through E.]

Of course, we don’t know if any of these doctors are correct ... but discriminability and consistency are necessary components of expertise.

What we really want is validity. But even without that information, this type of exercise can be used for training, evaluating, and enhancing performance in fields as diverse as medical diagnosis, auditing, personnel selection, figure skating, and air traffic control.
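A minimal sketch of how the two criteria could be scored, assuming each repeated case is rated twice on a 0 to 100 scale. The ratings are hypothetical, not the data from Skånér et al., and the two measures used here (mean absolute test-retest difference for consistency, standard deviation of per-case means for discrimination) are one reasonable choice among several.

```python
from statistics import mean, pstdev

def inconsistency(first, second):
    """Mean absolute difference between two ratings of the same cases (lower is better)."""
    return mean(abs(a - b) for a, b in zip(first, second))

def discrimination(first, second):
    """Spread of the per-case average ratings (higher means cases are told apart)."""
    case_means = [(a + b) / 2 for a, b in zip(first, second)]
    return pstdev(case_means)

# Hypothetical ratings for repeated cases A-E (first and second presentation).
gps = {
    "consistent, no discrimination": ([50, 52, 49, 51, 50], [51, 50, 50, 52, 49]),
    "discriminates, inconsistent":   ([10, 80, 30, 90, 50], [40, 50, 70, 60, 20]),
    "discriminates and consistent":  ([10, 80, 30, 90, 50], [12, 78, 33, 88, 51]),
}

for label, (first, second) in gps.items():
    print(f"{label}: inconsistency={inconsistency(first, second):.1f}, "
          f"discrimination={discrimination(first, second):.1f}")
```

Only the third hypothetical GP scores well on both measures, which mirrors the three profiles in the figures. Neither measure says anything about whether the judgments are correct.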

Group Weaknesses

Suggestibility

Conformity

Obedience

Compliance

[Figure: Effects of Suggestibility in Ambiguous Settings. Estimated movement in inches (0 to 8) for Person 1, Person 2, and Person 3 across four sessions: Day 1 (alone), Day 2 (in group), Day 3 (in group), Day 4 (in group).]

Source: Sherif, Muzafer (1936), The Psychology of Social Norms, New York: Harper & Brothers.

Suggestibility and conformity to group pressure occurred even when people could easily judge the truth by themselves.

[Figure: A test line shown next to three comparison lines labeled A, B, and C.]

Source: Asch, S. (1956), “Studies of independence and conformity: A minority of one against a unanimous majority,” Psychological Monographs, 70(9), Whole No. 416.

Milgram experiment: What are the conditions under which ordinary people would follow instructions and hurt others?

[Figure: Proportion of participants who comply (0.0 to 1.0) by voltage. Shock levels: 15-60 slight, 75-120 moderate, 135-180 strong, 195-240 very strong, 255-300 intense, 315-360 extreme intensity, 375-420 danger, 435-450 XXX.]

Source: Milgram, Stanley (1974), Obedience to authority: An experimental view, New York: Harper and Row.

Irving Janis came up with the term groupthink when he read Arthur Schlesinger’s account of how Kennedy and his advisers blundered into the Bay of Pigs. Janis studied the process and found that the advisers fostered a sense that the plan had to succeed. To preserve the good group feeling, dissenting views were censored, especially after Kennedy voiced enthusiasm for the idea. The recipe for groupthink is:

• Members self-censor

• Pressure is placed on those who dissent

• Members feel invulnerable

• Members stereotype others

• Group is extremely cohesive

• Group is insulated from others’ opinions

• Group has a strong, directive leader

Source: Janis, Irving (1982), Groupthink: Psychological Studies of Policy Decisions and Fiascoes, Boston: Houghton Mifflin.

Group Strengths

Groups tend to work better when members’ opinions are:

• Independent (people’s opinions are not dependent on those around them)

• Diverse (each person has some private information)

• Decentralized (people can specialize and draw on local knowledge)

• Aggregated via a reliable mechanism (turning private judgments into a collective decision)

Source: Surowiecki, J. (2004), The Wisdom of Crowds, New York: Doubleday.

Examples

• Judging the weight of an ox

• Locating the USS Scorpion

• Google’s method for locating web pages

• Playing the Iowa Electronic Market

• Getting advice on Who Wants to Be a Millionaire?
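A minimal sketch of the simplest aggregation mechanism, averaging independent guesses, in the spirit of the ox-weighing example. The true weight and the guesses are hypothetical numbers chosen for illustration, not historical data.

```python
from statistics import mean

def crowd_error(guesses, truth):
    """Absolute error of the averaged (aggregated) guess."""
    return abs(mean(guesses) - truth)

def mean_individual_error(guesses, truth):
    """Average absolute error of the individual guessers."""
    return mean(abs(g - truth) for g in guesses)

# Hypothetical independent guesses of an ox's weight, in pounds.
true_weight = 1200
guesses = [1050, 1320, 1180, 1260, 1100, 1290, 1210, 1150]

print("crowd error:", crowd_error(guesses, true_weight))
print("mean individual error:", mean_individual_error(guesses, true_weight))
```

The averaged guess can never be further from the truth than the average individual is, and it is much closer when errors fall on both sides of the truth. That cancellation depends on the independence and diversity conditions listed above.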

Companies can take advantage of group strengths.

Companies now using prediction markets: Yahoo!, Eli Lilly, Google, Microsoft, HP, GE

These markets predict whether customers will like new products and services, whether new drugs will gain FDA approval (Eli Lilly), when product launches will occur (Google), how often products will be used (Google), what sales growth will be, when a particular feature will work (Microsoft), when a project is ready for testing, or the number of bugs that will be reported in a piece of software in a given period (Microsoft).

Prediction markets open to the public yield data on factors affecting business plans. These include presidential elections, gas prices, real estate values, a film’s performance at the box office, and even the probability of a flu pandemic.

Computer Weaknesses

• Doing complex tasks in 3D

• Putting information in context and taking unusual events into consideration (broken leg cue)

Computer Strengths

Remember things, keep track of things, combine the same information the same way on repeated occasions, make predictions (help us find predictable cues and predictable relationships)

Numerous studies have compared “experts” against prediction models. Simple models do better.

• Selecting applicants to universities, colleges, or professional schools

• Making medical diagnoses (e.g., cancer) based on tests, interviews, and other available information

• Identifying students who will later act violently in high schools and middle schools

• Identifying who will default on a loan

• Predicting which criminals will violate parole

• Estimating survival times for patients with terminal illness

• Forecasting the weather

• Determining who is guilty and who is innocent

Meehl (1954)

Reviewed 16 to 20 studies that compared clinical and statistical methods of decision making. We’ll refer to these as intuitive versus statistical methods of information aggregation. In all but one, statistical methods did better at predicting actual behavior.

Sawyer (1966)

Recognized that the issue of measurement was also important.

Prediction refers to the way the data were combined (intuitively or statistically).

Measurement refers to the way the data were collected (intuitively or statistically).

Intuitive data are, for example, impressions from unstructured interviews, whereas statistical data are test scores.

Prediction Method | Data: Intuitive | Data: Statistical | Data: Both | Data: Both sequential

Statistical | Regression on ratings | Pure statistical | Statistical composite | Statistical synthesis

Intuitive | Pure intuition | Intuitive combo of test scores | Intuitive composite | Intuitive synthesis

Pure intuition. Intuitively collected data, intuitively combined. Predict behavior from an interview without tests or other objective information.

Regression on ratings. Intuitively collected data, statistically combined. Rate the candidate on impressions, then combine the ratings with regression.

Intuitive combo of test scores. Statistically collected data, intuitively combined.

Pure statistical. Statistically collected data, mechanically combined. Test scores used in a multiple regression to predict performance.

Intuitive composite. Both types of data, intuitively combined. Impressions from interviews and test scores combined intuitively.

Statistical composite. Both types of data, combined with regression.

Intuitive synthesis. Take a prediction produced by mechanical combination and treat it as a datum to be combined intuitively with other data.

Statistical synthesis. Take a prediction produced by intuition and treat it as a datum to be combined statistically with other data.
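The eight methods above are a two-way classification, so they can be written as a lookup keyed by how the data were collected and how they were combined. This is only an illustrative sketch; the dictionary and function names are hypothetical.

```python
# Keys are (data collection, combination method); values are the method names above.
SAWYER_METHODS = {
    ("intuitive", "intuitive"): "Pure intuition",
    ("intuitive", "statistical"): "Regression on ratings",
    ("statistical", "intuitive"): "Intuitive combo of test scores",
    ("statistical", "statistical"): "Pure statistical",
    ("both", "intuitive"): "Intuitive composite",
    ("both", "statistical"): "Statistical composite",
    ("both sequential", "intuitive"): "Intuitive synthesis",
    ("both sequential", "statistical"): "Statistical synthesis",
}

def classify(data, combination):
    """Return the name of the method for a data type and a combination method."""
    return SAWYER_METHODS[(data.lower(), combination.lower())]

# Interview impressions plus test scores, combined with regression.
print(classify("both", "statistical"))
```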

Prediction Method | Data: Intuitive | Data: Statistical | Data: Both | Data: Both as prediction

Statistical | 43% | 63% | 75% | 75%

Intuitive | 20% | 38% | 26% | 50%

Why do linear models do better?

1. There are not too many crossover interactions in the world.

2. Monotonic relationships between predictors and the criterion are captured fairly well with linear models.

3. The weights assigned to predictor variables are not as important as their signs. Even nonoptimal regression methods outperform expert judgments.

4. People are unreliable, invalid, and distracted by “exceptions.” They are better at providing information that is then combined statistically.
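Point 3 can be illustrated with a unit-weight model: standardize each predictor, give it a weight of +1 or -1 according to the direction of its relationship with the criterion, and add. The sketch below assumes two hypothetical predictors (a test score that helps, absences that hurt) and made-up values for three applicants.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def unit_weight_scores(predictors, signs):
    """Sum the z-scored predictors, each multiplied by its sign (+1 or -1)."""
    z_columns = [standardize(column) for column in predictors]
    n_cases = len(predictors[0])
    return [
        sum(sign * z[i] for sign, z in zip(signs, z_columns))
        for i in range(n_cases)
    ]

# Hypothetical data for three applicants.
test_score = [50, 60, 70]  # higher is better, so sign +1
absences = [9, 6, 3]       # higher is worse, so sign -1

scores = unit_weight_scores([test_score, absences], [+1, -1])
print([round(s, 2) for s in scores])
```

No weights are estimated here. Only the signs are supplied, which is the part the slide says matters most.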

Simple models help people separate facts from values.

Police officers and minority communities in Denver, Colorado, had opposing views about which bullets police should use. The police wanted to switch from lightweight bullets to a new, hollow-tipped bullet that would more reliably disable suspects. Minorities argued the new bullet would kill innocent bystanders. The issue was brought to the city council, where each side brought in experts to testify in their favor.

Source: Hammond, K., & Adelman, L. (1976), “Science, values, and human judgment,” Science, 194, 389-96.

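[Figure: Path diagram with variables labeled X1 to X5. Bullet characteristics (Weight, Muzzle Velocity, Kinetic Energy) determine Injury, Stopping Effectiveness, and Threat to Bystanders, which in turn determine Acceptability.]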

Multiattribute Utility Approach

[Table: Bullets B1, B2, B3, and B4 (rows) rated on the attributes Injury, Stopping Effectiveness, and Threat (columns); cell values are not shown.]

Acceptability = w1(Injury) + w2(Stopping Effectiveness) + w3(Threat)

One can separate facts from values and put the two types of information in the appropriate place in a linear model.

Ballistic experts determine how the muzzle velocity, mass, and kinetic energy influence injury potential, threat to bystanders, and stopping effectiveness.

Council members determine the importance of the attributes.

The bullet with the greatest multiattribute utility (MAU) was not the one the police had been using or the one they wanted. Nonetheless, the bullet with the greatest MAU posed no greater threat to bystanders and was nearly equal in stopping effectiveness. The process led to acceptance by all concerned.
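A minimal sketch of the calculation, keeping facts and values in separate places. The attribute scores stand in for the ballistics experts’ judgments and the weights for the council’s values; all numbers are hypothetical, and the negative weights on injury and threat assume that higher scores on those attributes are worse.

```python
# Values: importance weights, as council members might set them (hypothetical).
WEIGHTS = {"injury": -0.3, "stopping": 0.4, "threat": -0.3}

# Facts: expert ratings of each bullet on a 0-to-1 scale (hypothetical).
BULLETS = {
    "B1": {"injury": 0.8, "stopping": 0.90, "threat": 0.7},
    "B2": {"injury": 0.4, "stopping": 0.50, "threat": 0.3},
    "B3": {"injury": 0.5, "stopping": 0.85, "threat": 0.3},
    "B4": {"injury": 0.6, "stopping": 0.60, "threat": 0.5},
}

def acceptability(scores, weights):
    """Weighted sum: w1*Injury + w2*Stopping Effectiveness + w3*Threat."""
    return sum(weights[attr] * scores[attr] for attr in weights)

utilities = {name: acceptability(s, WEIGHTS) for name, s in BULLETS.items()}
best = max(utilities, key=utilities.get)

for name, utility in utilities.items():
    print(f"{name}: {utility:+.2f}")
print("Highest multiattribute utility:", best)
```

With these made-up numbers the winner is neither the bullet with the greatest stopping effectiveness nor the one with the least injury. Changing the weights changes the answer without touching the expert ratings.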

[Figure: Percentage of favorable decisions by hour of day.]

Source: Danziger, S., Levav, J., & Avnaim-Pesso, L. (2011), “Extraneous factors in judicial decisions,” Proceedings of the National Academy of Sciences, 108(17), 6889-6892.