50 shades of gray: A research story

Brian Nosek, Jeffrey Spies, and Matt Motyl:

Participants from the political left, right, and center (N = 1,979) completed a perceptual judgment task in which words were presented in different shades of gray . . . The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (p = .01).

They continue:

Our design and follow-up analyses ruled out obvious alternative explanations such as time spent on task and a tendency to select extreme responses. Enthused about the result, we identified Psychological Science as our fall-back journal after we toured the Science, Nature, and PNAS rejection mills . . .

1/45

The preregistered replication

Nosek, Spies, and Motyl:

We conducted a direct replication while we prepared the manuscript. We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at alpha = .05.

The result: The effect vanished (p = .59).
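As a rough illustration of the kind of power calculation quoted above, here is a minimal sketch using a normal approximation for a two-group comparison of a standardized effect. The effect size d and the two-group setup are placeholders for illustration; the original study compared left, right, and center, and its estimated effect size is not given on this slide.

```python
# Minimal sketch of a replication power calculation (normal approximation).
# Assumptions for illustration only: a two-group comparison with 650
# participants per group (1,300 total) and a placeholder standardized
# effect size d; these are not the numbers from the original study.
from scipy.stats import norm

def power_two_sided(d, n_per_group, alpha=0.05):
    se = (2.0 / n_per_group) ** 0.5        # SE of a standardized mean difference
    z_crit = norm.ppf(1 - alpha / 2)
    shift = d / se                          # expected z-score of the estimate
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

print(power_two_sided(d=0.25, n_per_group=650))  # ~0.99 with this placeholder d
```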

2/45

The famous study of social priming

7/45

Daniel Kahneman (2011):

“When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.”

9/45

The attempted replication

11/45

Daniel Kahneman (2011):

“When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.”

Wagenmakers et al. (2014):

“[After] a long series of failed replications . . . disbelief does in fact remain an option.”

12/45

Alan Turing (1950):

“I assume that the reader is familiar with the idea of extra-sensory perception, and the meaning of the four items of it, viz. telepathy, clairvoyance, precognition and psycho-kinesis. These disturbing phenomena seem to deny all our usual scientific ideas. How we should like to discredit them! Unfortunately the statistical evidence, at least for telepathy, is overwhelming.”

13/45

This week in Psychological Science

- “Turning Body and Self Inside Out: Visualized Heartbeats Alter Bodily Self-Consciousness and Tactile Perception”
- “Aging 5 Years in 5 Minutes: The Effect of Taking a Memory Test on Older Adults’ Subjective Age”
- “The Double-Edged Sword of Grandiose Narcissism: Implications for Successful and Unsuccessful Leadership Among U.S. Presidents”
- “On the Nature and Nurture of Intelligence and Specific Cognitive Abilities: The More Heritable, the More Culture Dependent”
- “Beauty at the Ballot Box: Disease Threats Predict Preferences for Physically Attractive Leaders”
- “Shaping Attention With Reward: Effects of Reward on Space- and Object-Based Selection”
- “It Pays to Be Herr Kaiser: Germans With Noble-Sounding Surnames More Often Work as Managers Than as Employees”

15/45

This week in Psychological Science

- N = 17
- N = 57
- N = 42
- N = 7,582
- N = 123 + 156 + 66
- N = 47
- N = 222,924

16/45

The “That which does not destroy my statistical significance makes it stronger” fallacy

Charles Murray: “To me, the experience of early childhood intervention programs follows the familiar, discouraging pattern . . . small-scale experimental efforts [N = 123 and N = 111] staffed by highly motivated people show effects. When they are subject to well-designed large-scale replications, those promising signs attenuate and often evaporate altogether.”

James Heckman: “The effects reported for the programs I discuss survive batteries of rigorous testing procedures. They are conducted by independent analysts who did not perform or design the original experiments. The fact that samples are small works against finding any effects for the programs, much less the statistically significant and substantial effects that have been found.”

18/45

What’s going on?

- The paradigm of routine discovery
- The garden of forking paths
- The “law of small numbers” fallacy
- The “That which does not destroy my statistical significance makes it stronger” fallacy
- Correlation does not even imply correlation

19/45

Why is psychology particularly difficult?

- Indirect and noisy measurement
- Human variation
- Noncompliance and missing data
- Experimental subjects trying to figure out what you’re doing

20/45

What to do?

- Look at everything
- Interactions
- Multilevel modeling
- Within-person studies
- Design analysis
- Bayesian inference

21/45

Living in the multiverse

23/45

Choices!

1. Exclusion criteria based on cycle length (3 options)
2. Exclusion criteria based on “How sure are you?” response (2)
3. Cycle day assessment (3)
4. Fertility assessment (4)
5. Relationship status assessment (3)

168 possibilities (after excluding some contradictory combinations)
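To see where a count like this comes from, here is a minimal sketch (not from the paper) that enumerates the combinations of the five choices. The dictionary keys are placeholder names; only the option counts are taken from the list above.

```python
# Enumerate the analysis "multiverse" implied by the choices above.
# Placeholder labels; only the option counts come from the slide.
from itertools import product

options = {
    "cycle_length_exclusion": 3,
    "sureness_exclusion": 2,
    "cycle_day_assessment": 3,
    "fertility_assessment": 4,
    "relationship_assessment": 3,
}

universes = list(product(*(range(k) for k in options.values())))
print(len(universes))  # 3 * 2 * 3 * 4 * 3 = 216 raw combinations
# Dropping the mutually contradictory combinations leaves the 168
# analysis paths mentioned above, each with its own data set and p-value.
```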

24/45

Living in the multiverse

25/45

Living in the multiverse

26/45

Interactions and the freshman fallacy

From an email I received:

30/45

Why it’s hard to study comparisons and interactions

- Standard error for a proportion: 0.5/√n
- Standard error for a comparison: √(0.5²/(n/2) + 0.5²/(n/2)) = 1/√n
- Twice the standard error . . . and the effect is probably smaller!
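The factor of two can be checked by simulation; here is a minimal sketch (not from the talk), assuming a survey of n respondents with a true proportion of 0.5, split into two equal halves for the comparison.

```python
# Compare the standard error of one proportion (n respondents) with the
# standard error of the difference between two half-sample proportions.
import numpy as np

rng = np.random.default_rng(0)
n, p, n_sims = 1000, 0.5, 200_000

one = rng.binomial(n, p, size=n_sims) / n                 # single proportion
h1 = rng.binomial(n // 2, p, size=n_sims) / (n // 2)      # first half
h2 = rng.binomial(n // 2, p, size=n_sims) / (n // 2)      # second half

print(one.std(), 0.5 / np.sqrt(n))        # both ~0.016: SE of a proportion
print((h1 - h2).std(), 1 / np.sqrt(n))    # both ~0.032: twice as large
```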

31/45

Within-person studies

33/45

Power / design analysis

- I’ve never made a type 1 error in my life
- I’ve never made a type 2 error in my life
- I make Type S (sign) errors
- I make Type M (magnitude) errors

36/45

What can we learn from statistical significance?

37/45

This is what "power = 0.06" looks like.Get used to it.

Estimated effect size

−30 −20 −10 0 10 20 30

Trueeffectsize(assumed)Type S error probability:

If the estimate isstatistically significant,it has a 24% chance ofhaving the wrong sign.

Exaggeration ratio:If the estimate isstatistically significant,it must be at least 9times higher than the true effect size.
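These two quantities can be computed once you assume a true effect size and a standard error for the estimate; here is a minimal simulation sketch in the spirit of Gelman and Carlin's design analysis. The true effect and standard error below are placeholders chosen only so that the power comes out near 0.06; they are not the numbers behind the figure.

```python
# Simulate what statistically significant estimates look like when power
# is low: Type S (wrong sign) rate and exaggeration ratio (Type M error).
# true_effect and se are placeholder values, chosen so power is about 0.06.
import numpy as np
from scipy.stats import norm

def retrodesign(true_effect, se, alpha=0.05, n_sims=1_000_000, seed=1):
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1 - alpha / 2)
    est = rng.normal(true_effect, se, size=n_sims)          # sampling distribution
    sig = np.abs(est) > z_crit * se                         # "statistically significant"

    power = sig.mean()
    type_s = np.mean(est[sig] * np.sign(true_effect) < 0)   # significant, wrong sign
    exaggeration = np.mean(np.abs(est[sig])) / abs(true_effect)
    return power, type_s, exaggeration

power, type_s, exaggeration = retrodesign(true_effect=2.0, se=8.0)
print(f"power = {power:.2f}, Type S = {type_s:.2f}, exaggeration = {exaggeration:.1f}")
```

With these placeholder values the simulation gives a power of about 0.06, a Type S rate of roughly 24%, and an exaggeration ratio around 9, in line with the annotations above.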

38/45

The paradox of publication

39/45

Let us have the serenity to embrace the variation that we cannot reduce,
the courage to reduce the variation we cannot embrace,
and the wisdom to distinguish one from the other.

42/45

The Statistical Crisis in Science

Andrew Gelman, John Carlin, Eric Loken, Francis Tuerlinckx, Sara Steegen, Wolf Vanpaemel

Department of Statistics and Department of Political Science, Columbia University, New York

Talk given at the Department of Psychology, Harvard University, 29 Jan 2015

43/45