Surviving Statistics, Lecture 1

Surviving Statistics 1. Don’t Panic, by Mike Blyth, M.D., M.P.H., ECWA Evangel Hospital, Jos, Nigeria, 2006



DESCRIPTION

A presentation for medical students and residents in Jos, Nigeria. Brief orientation to a course in basic statistics.

TRANSCRIPT

Page 1: Surviving statistics lecture 1

Surviving Statistics

1. Don’t Panic

By Mike Blyth, M.D., M.P.H.
ECWA Evangel Hospital, Jos, Nigeria
2006

Page 2: Surviving statistics lecture 1

Don’t Panic

• It’s not about math
• You do not need a computer, but computers will do the algebra for you
• Focus on visualization and concepts
• Know who to trust when you look for help

Page 3: Surviving statistics lecture 1

Resources

• Traditional textbooks are about the worst place to look, unless you love math.

• There are some good books that help you get a foundation without much math.

• Online resources are plentiful.

• JUTH Research and Statistics course

Page 4: Surviving statistics lecture 1

Textbooks

This?

Page 5: Surviving statistics lecture 1

Textbooks

This?

Intuitive Biostatistics, by Harvey Motulsky

Page 6: Surviving statistics lecture 1

Textbooks

This?

Statistics for People Who (Think They) Hate Statistics, by Neil Salkind

Page 7: Surviving statistics lecture 1

Textbooks

This?

The Cartoon Guide to Statistics, by Larry Gonick


Page 8: Surviving statistics lecture 1

Textbooks

This?

“The intent of this book is to help you through the results section of a research article where the numbers are actually crunched, and little asterisks or ‘p < .05’ values appear as if by magic in the margins, to the apparent delight of the authors. We think that by reading this book, you won’t actually be able to do any statistics (actually, with computers on every street corner, no one—doctor, lawyer, beggarman, or statistician—should have to do statistics), but you will understand what researchers are doing and may even be able to tell when they’re doing it wrong.”

PDQ Statistics, by G. R. Norman and D. L. Streiner (cited in these slides as PDQS)

Page 9: Surviving statistics lecture 1

Hyperstat

http://davidmlane.com/hyperstat

This is a good place to start.

Page 10: Surviving statistics lecture 1

StatSoft Electronic Textbook

http://www.statsoft.com/textbook/stathome.html

•Emphasizes concepts

•Wide range of topics, but many are advanced

•Could be a good reference source for more info on specific topics

•Not a place to start learning

Page 11: Surviving statistics lecture 1

BMJ Statistics at Square One

http://bmj.bmjjournals.com/statsbk

Page 12: Surviving statistics lecture 1

BMJ Statistics at Square One

http://bmj.bmjjournals.com/statsbk

•Good as far as it goes

•Limited range of topics

•Tends to emphasize calculation techniques

Contents
Preface
1 Data display and summary
2 Mean and standard deviation
3 Populations and samples
4 Statements of probability and confidence intervals
5 Differences between means: type I and type II errors and power
6 Differences between percentages and paired alternatives
7 The t tests
8 The chi-squared tests
9 Exact probability test
10 Rank score tests
11 Correlation and regression
12 Survival analysis
13 Study design and choosing a statistical test

Page 13: Surviving statistics lecture 1

Uses or Purposes of Statistics

Description of a population in compact terms
– Rather than knowing the PCV (packed cell volume) of a million individuals, we can make use of an average value.
– Measures of spread or dispersion can tell us about the range of a value like PCV.
– We can show relationships between characteristics.

Page 14: Surviving statistics lecture 1

Uses or Purposes of Statistics

Inferential statistics: using statistical methods to help us accept or reject hypotheses
– May involve causation: x → y (x causes y)
– May involve comparison of groups to show that they are different
– Most often, used to show how likely it is that an observation at least this extreme would occur by chance alone, rather than because of some non-random effect such as a risk exposure or medical treatment.

Page 15: Surviving statistics lecture 1

Uses or Purposes of Statistics

Prediction
– In a sense, we’re nearly always interested in prediction. We care about history mainly because we expect that many consistent patterns will continue.
– Prognosis
– Guiding choice: if we do this, recovery is more likely than if we do that.

Page 16: Surviving statistics lecture 1

Limitations of Statistics

Statistics cannot tell us anything about values
– How much pain is worth how much extension of life?
– How much money should we spend on education vs malaria control?
– If male or female circumcision helps prevent HIV, should they be promoted despite other factors?

Page 17: Surviving statistics lecture 1

Limitations of Statistics

Statistics cannot tell you what effects are really important
– ‘The probability or “p” level associated with any test of significance is only a statement of the likelihood that an observed difference could have arisen by chance. Of itself, it says nothing about the size or importance of an effect.’ (PDQS p. xi)
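To see why a p-value says nothing about the size of an effect, the Python sketch below compares two hypothetical studies of a difference in mean PCV. It assumes a simplified two-sample z-test with a known, common standard deviation of 5 percentage points and equal group sizes; the numbers are invented for illustration and are not taken from PDQS.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def z_test_two_means(diff: float, sd: float, n_per_group: int) -> float:
    """p-value for a difference between two group means.

    Simplifying assumptions: the common standard deviation `sd` is known
    and both groups have `n_per_group` members.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    return two_sided_p_from_z(diff / standard_error)


# Hypothetical scenarios: PCV difference in percentage points, SD of 5 points
tiny_effect_big_study = z_test_two_means(diff=0.5, sd=5.0, n_per_group=5000)
big_effect_small_study = z_test_two_means(diff=4.0, sd=5.0, n_per_group=5)

print(f"0.5-point difference, 5000 per group: p = {tiny_effect_big_study:.2g}")
print(f"4-point difference, 5 per group:      p = {big_effect_small_study:.2g}")
```

Under these assumptions the trivial 0.5-point difference in the large study is highly “significant”, while the eight-times-larger difference in the small study does not reach p < 0.05.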

Page 18: Surviving statistics lecture 1

Limitations of Statistics

“No statistical method can effectively deal with the systematic biases that may result from a poorly designed study.”
– If data are lost, distorted, or not properly collected, statistics cannot help recover them.
– If the design is faulty, statistics can’t correct that either (though it might help sometimes).
– That’s why it’s critical to think through the whole design ahead of time, before starting the study.

Page 19: Surviving statistics lecture 1

Limitations of Statistics

Statistics always apply in a certain situation. In reality, no two situations are ever the same. Judgment is needed to decide when a given result is likely to be relevant in a new situation.
– “Whether you are a researcher or clinician, you must examine whether the results are applicable to the people with whom you deal. Are the people studied in the research paper sufficiently similar to your patients that the effects or associations are likely to be similar?” (PDQS p. xi)

Page 20: Surviving statistics lecture 1

Limitations of Statistics

“A statistician is a person whose lifetime ambition is to be wrong 5 percent of the time.”

The problem of “Why most research results are false.”

(And the problem is, no one can ever know which 5 percent of answers are wrong.)

Page 21: Surviving statistics lecture 1

Limitations of Statistics

“Why most research results are false”
– In many areas of research, true effects are relatively rare. For example, suppose one gene out of 1 million affects susceptibility to malaria. If we somehow manage to test the effect of 1 million genes and use the p < 0.05 cutoff for ‘significance’, then by chance alone about 5% of the genes that have no effect will come out ‘significant’. That’s roughly 50,000 false positive results, against at most one true one.
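The arithmetic on this slide can be written out as a short Python sketch. Besides the slide’s numbers (1 million genes, one real effect, a 0.05 cutoff), it assumes a hypothetical 80% chance of detecting the one real gene (the study’s power), so that it can also estimate how often a “significant” gene is actually real.

```python
def screening_outcomes(n_tests: int, n_true: int, alpha: float, power: float):
    """Expected counts when many hypotheses are tested at level alpha.

    n_true: how many tested effects are real (assumed known for illustration).
    power:  probability that a real effect is detected (an assumption).
    """
    n_null = n_tests - n_true
    false_positives = alpha * n_null   # null genes that reach p < alpha by chance
    true_positives = power * n_true    # real effects that are detected
    # Share of "significant" results that reflect a real effect
    ppv = true_positives / (true_positives + false_positives)
    return false_positives, true_positives, ppv


fp, tp, ppv = screening_outcomes(n_tests=1_000_000, n_true=1, alpha=0.05, power=0.8)
print(f"expected false positives: {fp:,.0f}")
print(f"expected true positives:  {tp:.1f}")
print(f"chance a 'significant' gene is real: {ppv:.6f}")
```

With 999,999 genes that have no effect, 5% of them is about 50,000 false positives, so even a perfectly run study leaves a “significant” gene with only a tiny chance of being the real one.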

Page 22: Surviving statistics lecture 1

Variables and Descriptive Statistics

What is a variable?
– Something that can be measured
– In certain disciplines, much of life is spent trying to measure what is not easily measured

Independent vs Dependent Variables
– Independent: controlled by the researcher, or occurring naturally in the environment
– Dependent: influenced to some degree by the independent variable(s). Generally, the ‘outcome’ variable of a study

Page 23: Surviving statistics lecture 1

Independent vs Dependent Variables

Which are the dependent and independent variables?
– Smoking and cancer
– Breastfeeding and HIV infection
– Reflux symptoms and PPI use

Page 24: Surviving statistics lecture 1

Types of variables

Nominal
– “Named” only; no logical order or rank
Ordinal
– Have a logical order, but no good measure of distance between values
Interval
– Ordered, with a meaningful way of comparing distances between values, but no true zero
Ratio
– Have order, meaningful distance, and a “true zero”

(PDQS 2)

[Figure: arrow labeled “Amount of information”]
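Temperature illustrates the difference between interval and ratio scales. In this Python sketch, degrees Celsius serve as the interval-scale example (0 °C does not mean “no temperature”) and kelvins, which start at absolute zero, as the ratio-scale example; the two temperatures are arbitrary.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert °C (interval scale, arbitrary zero) to K (ratio scale, true zero)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales: a 10-degree gap either way
difference_c = warm_c - cool_c
difference_k = warm_k - cool_k

# Ratios need a true zero: 20 °C is not "twice as hot" as 10 °C
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {difference_c:.2f} °C vs {difference_k:.2f} K")
print(f"ratio:      {ratio_c:.2f} (Celsius) vs {ratio_k:.2f} (Kelvin)")
```

The Celsius ratio of 2 is an artifact of where the scale happens to put its zero; on the Kelvin scale the warmer temperature is only a few percent higher.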

Page 25: Surviving statistics lecture 1

Discrete and Continuous Variables

Discrete variables have exact values with no possibility of anything between them.
– Number of children, number of anything
Continuous variables can be infinitely subdivided.
– Temperature, weight, most physical variables

(PDQS 6)

Page 26: Surviving statistics lecture 1

Describing Data: Frequency Distributions

“A key concept in statistics is the use of a frequency distribution to reflect the probability of the occurrence of an event. The distribution can be characterized by measures of the average—mean, median, and mode—and measures of dispersion—range and standard deviation.” (PDQS 5)
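The measures named in this quotation are all available in Python’s standard `statistics` module. The sketch below computes them for a small set of hypothetical PCV readings; the values are invented for illustration.

```python
import statistics

# Hypothetical PCV readings (%), made up for illustration only
pcv = [30, 32, 33, 33, 35, 36, 38, 40, 41]

summary = {
    # Measures of the average
    "mean": statistics.mean(pcv),
    "median": statistics.median(pcv),
    "mode": statistics.mode(pcv),
    # Measures of dispersion
    "range": max(pcv) - min(pcv),
    "sd": statistics.stdev(pcv),  # sample standard deviation (n - 1 denominator)
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```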

Page 27: Surviving statistics lecture 1

Frequency Distributions

These are one of the most basic and important tools in statistics: understand them well!

The value of the variable (the measurement) is on the x axis.

The y axis reflects not the value of anything, but the number of times a given value was obtained.

Number of children preferring each type of ice cream (PDQS 6)
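A frequency distribution of nominal data is just a count per category. This Python sketch tallies a hypothetical set of ice-cream preferences (invented, not the PDQS data) with `collections.Counter` and prints a crude text bar chart in which each category stands for a position on the x axis and the bar length for the count on the y axis.

```python
from collections import Counter

# Hypothetical survey responses (nominal data), invented for illustration
choices = ["vanilla", "chocolate", "strawberry", "chocolate", "vanilla",
           "chocolate", "strawberry", "chocolate", "vanilla", "chocolate"]

freq = Counter(choices)

# One bar per category; bar length = number of times the value occurred
for flavor, count in freq.most_common():
    print(f"{flavor:<10} {'#' * count} ({count})")
```

Because flavor is nominal, the order of the bars carries no meaning; here they are simply sorted from most to least common, so the first bar is the mode.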

Page 28: Surviving statistics lecture 1

Frequency Distributions

When displaying frequency distributions of continuous variables, we have to subdivide the range into discrete intervals (or our bars would have zero width).

(PDQS 7)
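For continuous variables, the intervals have to be chosen before counting. This Python sketch assumes hypothetical body weights and 5 kg intervals starting at 50 kg; each value goes into the half-open interval [low, high) that contains it, and empty intervals between occupied ones are kept so that gaps show up in the display. The function name `bin_counts` is illustrative only.

```python
import math
from collections import Counter


def bin_counts(values, width, start):
    """Count values in half-open intervals [start + k*width, start + (k+1)*width).

    Every interval between the lowest and highest occupied one is returned,
    including empty ones, so gaps in the data remain visible.
    """
    counts = Counter(math.floor((v - start) / width) for v in values)
    ks = range(min(counts), max(counts) + 1)
    return {(start + k * width, start + (k + 1) * width): counts[k] for k in ks}


# Hypothetical body weights in kg, for illustration only
weights = [52.3, 55.1, 58.7, 60.2, 61.9, 63.4, 64.0, 67.8, 71.5, 74.9]

for (low, high), n in bin_counts(weights, width=5, start=50).items():
    print(f"{low}-{high} kg: {'#' * n}")
```

Choosing a different width or starting point can change the shape of the resulting histogram, which is worth remembering when reading one.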

Page 29: Surviving statistics lecture 1

Measures of Average

Mode
– The most common value
Median
– The value with half the observations above and half below: the midpoint. (Data must be at least ordinal.) The median is not affected by how far outliers are from the average.
Mean
– The arithmetic average of the data (which must be at least interval). The mean is affected by extreme outliers.
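The difference in sensitivity to outliers is easy to check. In this Python sketch, using hypothetical lengths of hospital stay in days, the two data sets are identical except that the longest stay is made much longer.

```python
import statistics

typical = [2, 3, 3, 4, 5]
extreme = [2, 3, 3, 4, 50]  # same data, but the longest stay is now 50 days

for label, stays in [("typical", typical), ("extreme", extreme)]:
    print(f"{label}: mean = {statistics.mean(stays):.1f}, "
          f"median = {statistics.median(stays)}, "
          f"mode = {statistics.mode(stays)}")
```

Only the mean moves (from 3.4 to 12.4 days); the median and the mode stay at 3.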

Page 30: Surviving statistics lecture 1

Appendix: Why is the P value not the probability that the null hypothesis is true?

A moment's reflection should convince you that the P value could not be the probability that the null hypothesis is true. Suppose we got exactly the same value for the mean in two samples (if the samples were small and the observations coarsely rounded this would not be uncommon; the difference between the means is zero). The probability of getting the observed result (zero) or a result more extreme (a result that is either positive or negative) is unity, that is we can be certain that we must obtain a result which is positive, negative or zero. However, we can never be certain that the null hypothesis is true, especially with small samples, so clearly the statement that the P value is the probability that the null hypothesis is true is in error. We can think of it as a measure of the strength of evidence against the null hypothesis, but since it is critically dependent on the sample size we should not compare P values to argue that a difference found in one group is more "significant" than a difference found in another. – BMJ Statistics at Square One (http://bmj.bmjjournals.com/statsbk/5.shtml)