Problems with Science
“The first principle is that you must not fool yourself.” – Richard Feynman
THE FINAL
Final Exam
• 17 December 2013 (Tuesday)
• 18:30-20:30
• In the Gymnasium
• 20 Questions
• All short answer
• 5 marks each
• Worth 20% of the course grade
PROBLEMS WITH SCIENCE
Replication
In science, we require that our results be reproducible.
In a scientific article about an experiment, scientists are forced to describe every detail of what they did, so that someone else can do the same experiment and get the same result.
Replication
This is called replication. It is a basic principle of the scientific method. If a finding cannot be replicated, then we must reject it.
Does Replication Happen?
• Biotech firm Amgen tried to reproduce 53 “landmark” cancer studies, but only reproduced 6.
• Drug company Bayer tried to reproduce 67 different studies, but only reproduced 17.
• In the decade 2000-2010, there were 80,000 people involved in studies that later could not be replicated.
Why? What’s wrong?
FALSE POSITIVES
Classic Article
In a classic article titled “Why Most Published Research Findings Are False,” John Ioannidis argued:
“Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.”
Video Time!
http://www.economist.com/multimedia?bclid=1294626183001&bctid=2719450974001
False Positives
Why do we get 5% false positives?
In science we require p < .05. This means that if the null hypothesis is true, we would obtain results at least this extreme only 1 time in 20, or 5% of the time.
So about 5% of tests of true null hypotheses will turn up accidental correlations anyway. Repeating the experiment is unlikely to produce the same accident.
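The 1-in-20 figure can be checked by simulation. The sketch below (a rough two-sample z-test, with made-up group sizes and a fixed seed) runs many experiments in which the null hypothesis is true and counts how often p dips below .05:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(42)

def z_test_p(sample_a, sample_b):
    """Two-sample z-test p-value (normal approximation, equal group sizes)."""
    n = len(sample_a)
    se = ((stdev(sample_a) ** 2 + stdev(sample_b) ** 2) / n) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The null hypothesis is TRUE: both groups come from the same distribution.
false_positives = 0
trials = 2000
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    if z_test_p(a, b) < 0.05:
        false_positives += 1

print(f"False-positive rate: {false_positives / trials:.3f}")  # close to 0.05
```

Even with nothing but chance at work, roughly 1 experiment in 20 still comes out “significant.”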
False Negatives
And why can there be more than 5% false negatives?
There’s no cap on false negatives: you can’t punish people for failing to find the truth, because finding it is difficult! So studies often have low power and miss many real effects.
Put the two together: if true hypotheses are rare and power is low, the false positives can outnumber the true positives among “significant” results. That is how most published findings can be false.
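Ioannidis’s point can be made with simple arithmetic. In the sketch below (the prior and power numbers are purely illustrative), the positive predictive value is the share of “significant” findings that are actually true:

```python
def ppv(prior, alpha=0.05, power=0.8):
    """Share of 'significant' findings that are true (positive predictive value)."""
    true_positives = prior * power           # real effects that get detected
    false_positives = (1 - prior) * alpha    # true nulls flagged by accident
    return true_positives / (true_positives + false_positives)

# If only 1 in 10 tested hypotheses is real and power is modest,
# most "significant" findings are false (PPV below one half):
print(round(ppv(prior=0.10, power=0.35), 2))
```

When the prior is low and power is weak, the false positives swamp the true ones, even though each individual test is run “correctly” at p < .05.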
Solving the Problem
Remember that p = .05 is the maximum p-value that scientific journals will accept. There’s no reason you can’t demand p = .01 (one percent false positives) or p = .001 (one in 1,000 false positives).
How do you get results that pass a stricter threshold? Recruit more people for your experiment: larger samples make real effects detectable even at stricter cut-offs.
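A rough sense of the cost, using the standard normal-approximation sample-size formula for comparing two means (the half-standard-deviation effect size is a made-up example):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma=1.0, alpha=0.05, power=0.8):
    """Approximate subjects per group to detect a mean difference `delta`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# A medium-ish effect (half a standard deviation) at stricter thresholds:
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: about {n_per_group(0.5, alpha=alpha)} per group")
```

Cutting the false-positive rate from 5% to 0.1% roughly doubles the required sample, which is why bigger experiments are the price of stricter thresholds.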
FINDING YOUR HYPOTHESES IN THE DATA
Finding the Hypothesis in the Data
There’s a subtle fallacy in science that goes by different names: “finding hypotheses in the data,” “the problem of multiple comparisons,” “hypothesis fishing,” and so on.
Random Correlations Everywhere
Suppose I decide to test a new drug I made. I don’t have any idea what it does or doesn’t do. I’m just going to give the drug to the experimental group and a placebo to the control group, then see what happens.
Questionnaire
• How much have you slept in the past two weeks?
• How much sex did you have?
• Have you had any headaches? How many?
• Did you find yourself getting angry for no reason?
• How easy or hard did you find it to concentrate?
• Do you have more or fewer pimples?
• What is your blood pressure?
• What’s 14723 plus 9843?
Finding the Hypothesis in the Data
By pure random chance, if I ask enough questions, some answers will differ between the experimental group and the control group purely by accident.
Look! My drug gives you increased mathematical ability!
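The arithmetic behind the accident is simple: each question tested at the 5% level is a fresh chance for a false positive, so with k independent questions the chance of at least one accidental “finding” is 1 − 0.95^k:

```python
# Chance of at least one accidental p < .05 "finding" among k questions,
# each tested at the 5% level (assuming the questions are independent):
for k in (1, 8, 20, 60):
    print(f"{k:2d} questions -> {1 - 0.95 ** k:.0%} chance of a false 'hit'")
```

By the time you’ve asked 20 unrelated questions, an accidental “discovery” is more likely than not.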
Use New Data!
It’s not unusual to find accidental correlations in data.
So in science we require that hypotheses be tested by new data. After I decide that my drug gives people better math skills, I then need to do a new experiment. This is to avoid the problem of multiple comparisons.
Why New Data Is Important
It would be really unlikely if:
(i) I propose a correlation
(ii) I test it against some new data
(iii) The new data confirm the correlation
(iv) All of that was just an accident
Compare this to the fact that it is really likely to find random correlations in the data.
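The new-data requirement can be seen working in a simulation. In the sketch below (the group sizes, question count, and z-test are all illustrative choices), a useless drug is tested on 20 unrelated outcomes; whenever an accidental “hit” appears, it is re-tested on fresh data:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)

def p_value(a, b):
    """Two-sample z-test p-value (normal approximation, equal group sizes)."""
    se = ((stdev(a) ** 2 + stdev(b) ** 2) / len(a)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def experiment(n=30, questions=20):
    """Measure 20 unrelated outcomes; the drug does nothing at all."""
    return [[random.gauss(0, 1) for _ in range(n)] for _ in range(questions)]

found = replicated = 0
for _ in range(300):
    drug, placebo = experiment(), experiment()
    hits = [q for q in range(20) if p_value(drug[q], placebo[q]) < 0.05]
    if hits:
        found += 1
        new_drug, new_placebo = experiment(), experiment()  # fresh data
        if p_value(new_drug[hits[0]], new_placebo[hits[0]]) < 0.05:
            replicated += 1

print(f"'discoveries': {found}/300, survived a retest: {replicated}")
```

Accidental hits show up in most experiments, but almost none of them survive a second, independent test: that is exactly the filter replication provides.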
fMRI
fMRI = functional Magnetic Resonance Imaging. It’s a way of measuring changes in blood flow in the brain, which allows us to track changes in brain activity.
fMRI Neuroscience
Common neuroscience involving fMRI might go something like this: I put a bunch of people in fMRI machines, and have them look at various pictures.
Information Processing
When they look at pictures of happy things, like smiling babies, double rainbows, cute puppies, or whatever, I might notice that certain parts of their brains are active (and not active when I’m not showing them these pictures). I might then conclude that these parts of the brain (the active ones) are responsible for processing information about happy things.
Multiple Comparisons
But this methodology is ripe for the problem of multiple comparisons.
There are lots of areas of the brain and lots of different aspects of any picture. If I look at all the areas of the brain and all the aspects of the pictures, I will find many correlations totally by random chance.
The Proof
Dead Fish
Craig Bennett is a neuroscience graduate student. He wanted to test out his fMRI machine, so he bought a whole dead salmon.
He put the dead salmon in the machine and showed it “a series of photographs depicting human individuals in social situations.”
Experimental Design
The salmon “was asked to determine what emotion the individual in the photo must have been experiencing.”
Then Bennett looked to see whether there were correlations between changes in the blood flow in the salmon’s brain, and the pictures.
Correlations!
Unsurprisingly, there were.
16 out of 8,064 voxels (volumetric pixels) were correlated with picture-viewing.
The important thing is that lots of neuroscientists use these same methods for humans. The risk of error is great.
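The scale of the problem is easy to estimate. Assuming each voxel is tested independently at an uncorrected threshold (the alpha values below are hypothetical, not the settings from Bennett’s actual analysis):

```python
# 8,064 voxels, each tested independently at an uncorrected threshold:
voxels = 8064
for alpha in (0.05, 0.001):
    print(f"alpha = {alpha}: expect about {voxels * alpha:.0f} chance voxels")
```

With thousands of voxels tested, even a strict per-voxel threshold hands you a handful of “active” regions for free, dead fish or not.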
Solving the Problem
This problem is easily solved: don’t find your hypotheses in your data.
Well… it’s not that easy. You have to convince neuroscientists to behave!
MASSAGING THE DATA
Cheating in Science
There are lots of ways to cheat in science. If you want your study to show that antidepressants do better than placebos, you can skip double-blinding, or use improper randomization techniques (though this is obvious to real scientists).
You can also:
• Only correct the baseline when it suits you.
• Ignore dropouts.
• Remove outliers when it suits you.
• Choose a statistical test that gets the best results.
• Publish only positive findings.
The Baseline
Often, studies don’t have the power we would ideally desire. Remember that for a 95% confidence interval of 6%, we estimated that we’d need 1,000 subjects in our study.
But if you’re studying a new drug, how do you find 1,000 people who need it in your area who are willing to sign up for your trial?
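That 1,000-subject figure matches the usual sample-size formula for a proportion, assuming the 6% means a 95% interval 6 points wide, i.e. a margin of error of about ±3% at the worst case p = 0.5:

```python
import math
from statistics import NormalDist

def n_for_margin(margin, p=0.5, conf=0.95):
    """Subjects needed so a proportion's confidence interval stays within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(p * (1 - p) * (z / margin) ** 2)

print(n_for_margin(0.03))  # on the order of the 1,000 subjects quoted above
```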
The Baseline
Scientists often test much smaller groups, and then aggregate (put together) all the data later. This is called meta-analysis, and we’ll be discussing it later.
When you have a small group of people (for example, 20 or 30), there is a high probability that, by random chance, either the control group or the experimental group will start out doing better.
The Baseline
This is called “the baseline.”
If you’re testing a pain medication, for example, the control group might, merely as a matter of chance, have a higher degree of average pain than the experimental group.
They have a higher “baseline” degree of pain.
Controlling for the Baseline
You can “control for the baseline” by testing how much people’s pain improved over the course of the trial, instead of just testing how much pain they’re in at the end of the trial.
Not Controlling
The average pain score was 65 in the control group when the experiment started, and 52 in the experimental group. Nobody improved, so the scores were still 65 and 52 at the end. But if you report just the end scores, it looks like your treatment worked: the experimental group averaged 13 fewer pain points!
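Spelled out with the hypothetical scores from the example, the end-score comparison and the change-score comparison tell opposite stories:

```python
# Hypothetical pain scores from the example: nobody in either group improved.
control_start, control_end = 65, 65
drug_start, drug_end = 52, 52

end_gap = control_end - drug_end              # 13 points: looks like it worked!
control_change = control_start - control_end  # 0: no improvement
drug_change = drug_start - drug_end           # 0: no improvement

print(f"end-score gap: {end_gap}, improvement gap: {drug_change - control_change}")
```

Controlling for the baseline (comparing improvements, not end scores) correctly reports zero benefit.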
Controlling for the Baseline
It’s best to control for the baseline, but it’s OK if you don’t.
What’s bad is controlling for the baseline when the control group is doing better, but not controlling for it when the experimental group is doing better. That’s cheating.
Ignoring Dropouts
Sometimes a treatment won’t work, or will cause harmful side-effects. The people experiencing the worst of these side-effects might drop out of the trial.
If you collect data only on people who finished the trial, it will seem like your treatment has fewer side-effects than it actually does.
Outliers
Outliers
An outlier is a data point that is far away from all of your other data points: it doesn’t fit a pattern that is clearly there.
For example, in a trial for a pain medication, you might have some people get a little better, some people get a little worse, and one person who dies. Dying is an outlier, in this situation.
Controlling for Outliers
Outliers are often due to just random chance. Through no fault of your treatment, sometimes people die. It can’t be helped.
It’s accepted practice to control for outliers (which have specific definitions in statistics) by removing them from your data. You can also choose to leave all your data intact.
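One common “specific definition” is Tukey’s rule: anything more than 1.5 interquartile ranges outside the quartiles counts as an outlier. A minimal sketch with made-up pain scores:

```python
from statistics import quantiles

def iqr_outliers(data):
    """Flag points outside 1.5 * IQR of the quartiles (Tukey's rule)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

scores = [3, 4, 5, 5, 6, 6, 7, 8, 95]  # one subject far outside the pattern
print(iqr_outliers(scores))  # [95]
```

The point of a fixed rule like this is that it is chosen before you see which group a data point helps or hurts.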
Controlling for Outliers
Nothing is wrong with removing outliers, except when you do it only when it suits you.
If you keep negative outliers in the control group and positive outliers in the experimental group, but remove positive outliers from the control group and negative outliers from the experimental group, you’re cheating!
Publication Bias
Suppose I conduct a rigorous, scientific test of the claim that reading causes foot cancer. I show (high statistical significance, large effect size) that it is true!
That’s big news, and not only will I get published in the best science journals, like Science and Nature, I’ll probably get in the newspapers too.
Publication Bias
Instead, suppose I go out and conduct a rigorous, double-blind placebo-controlled randomized trial for the claim that reading does NOT cause foot cancer.
I use a large sample of a representative set of the population, and discover, with a high degree of statistical significance, that I’m right.
Publication Bias
Well who cares? Not Science or Nature!
We all knew that reading didn’t cause foot cancer. That’s silly.
Negative results are inherently boring and uninteresting. Positive results are exciting and informative.
Testing ESP
http://www.colbertnation.com/the-colbert-report-videos/372474/january-27-2011/time-traveling-porn---daryl-bem
Testing ESP
Dr. Daryl Bem conducted experiments where the task was for subjects to select which of two curtains had an image behind it.
The curtain with the picture was determined randomly by a computer. So we expect people to get the answer right about 50% of the time, just by random guessing.
Porn from the Future
What Bem found was that when the picture was normal and not pornographic, people did guess randomly: 49.8% of the time they guessed which curtain hid the picture.
But if the picture was pornographic, subjects guessed right 53.1% of the time. This was statistically significant.
Replicability
This is not unusual. Positive results will happen frequently through mere chance. We saw this in the salmon example.
This is why science relies on replicability. Other experimenters should be able to repeat your experiment and get the same results. If they can’t, then it looks like your result was lucky.
Publication Bias
So does ESP exist? Three separate teams of scientists, at the University of Edinburgh, the University of Hertfordshire, and Goldsmiths, University of London, decided to test it. They performed the same experiment Bem did, but got negative results: even the pornographic pictures were guessed at random.
“No Replications”
The original journal that published Bem’s results, the Journal of Personality and Social Psychology, refused to publish these new ones.
“We don’t publish replications” they said.
Two other psychological journals said the same thing. They wouldn’t even look at the paper.
Publication Bias
Just consider this for a moment.
Suppose you want to know whether ESP works. There are 4 studies: 1 is positive and 3 are negative. But only the positive one is published; no one will publish the negative ones. So you look at all the available evidence and conclude that ESP works. You’re wrong!
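The arithmetic makes the danger concrete: even if ESP does nothing, each study run at p < .05 has a 5% chance of a false positive, so 1 positive out of 4 studies is not surprising at all:

```python
# Each null study run at p < .05 has a 5% false-positive chance, so the
# odds that at least 1 of 4 independent studies comes out "positive" are:
p_some_positive = 1 - 0.95 ** 4
print(f"{p_some_positive:.1%}")
```

Nearly a 1-in-5 chance of an exciting, publishable “positive” from four studies of a non-existent effect, and the three sober negatives never see print.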
FRAUD ON BOTH ENDS
Diederik Stapel
Diederik Stapel was a professor of social psychology at Tilburg University in the Netherlands.
Diederik Stapel
His research was frequently in the news.
He found that people are more racist when there’s trash around.
He found that eating meat made people less social and more selfish.
Diederik Stapel
But it was a lie!
He made up all the experimental data.
Diederik Stapel
“Nobody ever checked my work. They trusted me.… I did everything myself, and next to me was a big jar of cookies. No mother, no lock, not even a lid.… Every day, I would be working and there would be this big jar of cookies, filled with sweets, within reach, right next to me — with nobody even near. All I had to do was take it.” (p. 164)
Publish or Perish
It’s a “publish or perish” world in science. If you don’t have exciting scientific discoveries, you won’t get promoted, or you may lose your job.
On the other side of things, many unscrupulous “scientific journals” will publish your results for money. They claim to “peer review” submissions, but this is obviously not always true…
Pay to Publish Scam
Researcher John Bohannon wanted to test whether these “pay to publish” journals were really just scams to make money off of scientists.
So he made up a fake paper, full of obvious mistakes, and sent it to 304 journals. More than half accepted it!
The One Paper that was 304 Papers
“The paper took this form: Molecule X from lichen species Y inhibits the growth of cancer cell Z. To substitute for those variables, I created a database of molecules, lichens, and cancer cell lines and wrote a computer program to generate hundreds of unique papers. Other than those differences, the scientific content of each paper is identical.” -- Bohannon
Basic Problems
The paper had basic problems with it.
For example, there was no control group. It showed that certain lichens stopped cancer cells from growing, but it didn’t show that they didn’t also stop normal cells from growing.
Stats
• Submitted to 304 journals.
• Journals were published by top publishers: Elsevier, Kluwer, Sage, Wolters
• Accepted by 157 (52%)
• Rejected by 98 (32%)
• No decision by 49 (16%)
Why?
This doesn’t mean all the research published by these journals is bad– it just means you can’t tell whether it’s good unless you look.
Why would anyone doing real research pay money to have it published? One example: Chinese universities often pay researchers between 5,000 and 10,000 RMB per published paper, and, again, publication is the only way to get ahead.