error bars; power and sample size - university of iowapower and sample size summary introduction in...

29
Presenting results with error bars Power and sample size Summary Error bars; Power and sample size Patrick Breheny March 27 Patrick Breheny Introduction to Biostatistics (171:161) 1/29

Upload: others

Post on 16-Feb-2021

9 views

Category:

Documents


0 download

TRANSCRIPT

  • Presenting results with error barsPower and sample size

    Summary

    Error bars; Power and sample size

    Patrick Breheny

    March 27

    Patrick Breheny Introduction to Biostatistics (171:161) 1/29

  • Presenting results with error barsPower and sample size

    Summary

    Introduction

    In this lecture, we’ll discuss two topics that come up in one-samplestudies (as well as other types of studies):

    One is the presentation of results, and specifically, thedecision when making figures and tables to include error barsand whether the error bar should be based on the standarddeviation or the standard error

    The other is the issue of planning a study, and determiningbeforehand the power of the study for a given sample size

    Patrick Breheny Introduction to Biostatistics (171:161) 2/29

  • Presenting results with error barsPower and sample size

    Summary

    A dynamite plot

    Figures like the following areincredibly common:

    Far Near

    IQ

    0

    20

    40

    60

    80

    100

    There are a number ofpossible variations, but thegeneral idea is to have a baror a dot or a line that showsthe mean, then error barsthat show . . . something

    These particular error barsare one SD, so theyillustrate something aboutvariability of individuals’ IQsin the two samples

    Patrick Breheny Introduction to Biostatistics (171:161) 3/29

  • Presenting results with error barsPower and sample size

    Summary

    Box plot

    ●●

    Far Near

    6080

    100

    120

    140

    IQ

    Keep in mind, however, thatthe “dynamite” plot is justshowing two numbers, themean and SD

    This has the potential tomisrepresent the actualdistribution of data, andmore informative sorts ofplots, such as the box plot,are alternatives worthconsidering

    Patrick Breheny Introduction to Biostatistics (171:161) 4/29

  • Presenting results with error barsPower and sample size

    Summary

    Strip plotIQ

    60

    80

    100

    120

    140

    Far Near

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    We could also just show allthe data

    This type of plot is knownas a strip plot

    Patrick Breheny Introduction to Biostatistics (171:161) 5/29

  • Presenting results with error barsPower and sample size

    Summary

    ±1 SD again

    75

    80

    85

    90

    95

    100

    105

    110

    IQ

    Far Near

    This is the same as the firstplot, although with dotsreplacing the bars; this hasthe advantage that the bardoesn’t block the lower errorbar, so we can see the±1 SD region more clearlyRecall that the mean ±1 SDtypically contains about themiddle two-thirds of thedata

    Patrick Breheny Introduction to Biostatistics (171:161) 6/29

  • Presenting results with error barsPower and sample size

    Summary

    ±1 SE

    75

    80

    85

    90

    95

    100

    105

    110

    IQ

    Far Near

    Here, we’re plotting ±1 SEinstead of ±1 SDThe error bars are a lotsmaller here, of course, sinceSE = SD/

    √n, and now say

    something about thevariability of the samplemean or, if you prefer, theerror with which we havemeasured the sample mean

    Patrick Breheny Introduction to Biostatistics (171:161) 7/29

  • Presenting results with error barsPower and sample size

    Summary

    Error bars: SD vs. SE

    So, the ±SD error bar plot emphasized the variability of dataThis doesn’t really have anything to do with any sort of“error” at all, and furthermore, other kinds of plots are usuallybetter for showing the distribution of your data

    On the other hand, the ±SE error bar plot emphasizesuncertainty (or possible error) about the mean

    However, this isn’t a particularly meaningful plot either, since±SE is only a 68% CI, and really, it isn’t even that, since aswe learned in the previous lecture, the coverage of the CIdepends on the degrees of freedom with which we’vemeasured the SD

    Patrick Breheny Introduction to Biostatistics (171:161) 8/29

  • Presenting results with error barsPower and sample size

    Summary

    CI plot

    75

    80

    85

    90

    95

    100

    105

    110

    IQ

    Far Near

    So typically, the mostmeaningful route is to plotthe confidence interval

    For example, the plot on theleft suggests that it isn’tunreasonable to think thatthe average IQ for bothgroups is 90

    Patrick Breheny Introduction to Biostatistics (171:161) 9/29

  • Presenting results with error barsPower and sample size

    Summary

    Variability and uncertainty are both of interest

    It should be noted, however, that the confidence interval plot,although very useful and informative, doesn’t really tell usanything about the distribution of the data, which is alsointeresting

    In principle, this is the choice one makes in deciding on afigure, whether to emphasize uncertainty about the average orto emphasize variability/diversity, although there are ways toillustrate both in a single figure

    Patrick Breheny Introduction to Biostatistics (171:161) 10/29

  • Presenting results with error barsPower and sample size

    Summary

    Illustrating both variability and uncertainty

    60

    80

    100

    120

    140

    IQ

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    Far Near

    In this plot, the shaded regiondenotes the 95% confidenceinterval, and we can get a senseof the variability amongindividuals as well as theuncertainty about the populationmeans

    Patrick Breheny Introduction to Biostatistics (171:161) 11/29

  • Presenting results with error barsPower and sample size

    Summary

    ±1 SD for NHANES height

    62

    64

    66

    68

    70

    72

    Hei

    ght

    Male Female

    Just to make sure the point isclear, let’s look at the NHANESheight data; here are the means±SD

    Patrick Breheny Introduction to Biostatistics (171:161) 12/29

  • Presenting results with error barsPower and sample size

    Summary

    CI plot for NHANES height

    62

    64

    66

    68

    70

    72

    Hei

    ght

    Male Female

    And here the error bars denotethe 95% confidence interval;both this plot and the previousone tell us different things

    It is obvious that men are,on average, taller thanwomen

    At the same time, it is justas obvious that there areplenty of women in thepopulation who are tallerthan many men

    Patrick Breheny Introduction to Biostatistics (171:161) 13/29

  • Presenting results with error barsPower and sample size

    Summary

    NHANES height data

    55

    60

    65

    70

    75

    80

    Hei

    ght

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    Male Female

    Patrick Breheny Introduction to Biostatistics (171:161) 14/29

  • Presenting results with error barsPower and sample size

    Summary

    Cystic Fibrosis

    One last comment: plottingseparate confidence intervals andlooking for overlap is not thesame as plotting a confidenceinterval for the difference

    0

    100

    200

    300

    400

    500

    600

    FV

    C R

    educ

    tion

    Drug Placebo

    The disparity is particularlydrastic for paired studies, but alsooccurs for two-sample studies, aswe will see in a week or two

    −50 0 50 100 150 200 250 300

    Difference in FVC reduction

    Patrick Breheny Introduction to Biostatistics (171:161) 15/29

  • Presenting results with error barsPower and sample size

    Summary

    Planning a study

    One of the most important questions as far as planning andbudgeting a study is concerned is: how many subjects do Ineed?

    The number of subjects tends to play a very large role indetermining the cost of a study, so funding agencies generallywant to know the number of subjects that a study will requirebefore they make a decision about whether or not to pay for it

    But of course, the fewer subjects you have, the harder it is todistinguish a real phenomenon from chance

    Patrick Breheny Introduction to Biostatistics (171:161) 16/29

  • Presenting results with error barsPower and sample size

    Summary

    Power

    The probability that you will successfully distinguish the realphenomenon from chance (i.e., reject the null hypothesis) iscaptured in the notion of power

    Power is the opposite of the type II error we discussed back atthe beginning of the semester

    Power is the probability of rejecting the null hypothesis giventhat it is in reality false; the type II error rate (β) was theprobability of failing to reject the null hypothesis given that itwas false

    Thus, by the compliment rule:

    Power = 1− β

    Patrick Breheny Introduction to Biostatistics (171:161) 17/29

  • Presenting results with error barsPower and sample size

    Summary

    Two important questions

    With the time remaining in today’s lecture, I want to addresstwo highly related questions:

    If I plan a study with a certain number of subjects, what is mypower going to be?If I want to achieve a certain power, how large does my samplesize need to be?

    Patrick Breheny Introduction to Biostatistics (171:161) 18/29

  • Presenting results with error barsPower and sample size

    Summary

    Two important questions (cont’d)

    These are important questions for any kind of data, and eachtype of study has its own formulas and procedures forcalculating power

    We won’t get into the details of power calculations in thisclass (which can be quite complicated)

    Instead, we will focus on the main concepts, which aregenerally similar for any type of study

    Patrick Breheny Introduction to Biostatistics (171:161) 19/29

  • Presenting results with error barsPower and sample size

    Summary

    Power: Basement analogy

    Consider the following analogy1: you send a child into thebasement to find an object

    What’s the probability that she actually finds it?

    This depends on three things:

    How long does she spend looking?How big is the object?How messy is the basement?

    1This analogy comes from Intuitive Biostatistics, which in turn credits JohnHartung for the original idea

    Patrick Breheny Introduction to Biostatistics (171:161) 20/29

  • Presenting results with error barsPower and sample size

    Summary

    Power: Basement analogy (cont’d)

    If the child spends a long time looking for a large object in aclean, organized basement, then she’ll probably find it

    If the child spends a short time looking for a small object in amessy basement, then there’s a good chance she won’t find it

    All three of these questions have statistical analogs:

    How long does she spend looking? = How big was the samplesize?How big is the object? = How large is the effect size?How messy is the basement? = How noisy/variable is thedata?

    Patrick Breheny Introduction to Biostatistics (171:161) 21/29

  • Presenting results with error barsPower and sample size

    Summary

    Specifying effect size and variability

    In general, one does not know the effect size or the variability– especially before we have conducted the study

    So, in order to calculate power, we are going to essentiallymake up values for these quantities, and our calculated powerwill depend on the values that we choose

    Of course, if we specify values that are far away from reality,our power calculations are not going to be accurate

    Sometimes, reasonable values for certain quantities can bechosen on the basis of past studies or observations

    Other times, a small initial study called a “pilot study” isconducted in order to provide some data with which toestimate these quantities and help plan for a larger study thatwould take place in the future.

    Patrick Breheny Introduction to Biostatistics (171:161) 22/29

  • Presenting results with error barsPower and sample size

    Summary

    Example

    Suppose we develop some intervention that we think canreduce the LDL cholesterol levels of individuals whoparticipate in it

    Suppose we plan to conduct a study in which individuals tryboth the intervention and a control, and we are going to lookat the difference in each individual’s LDL cholesterol levels onand off the intervention

    The power of our study (the probability that we will get ap-value under .05) depends on:

    The sample size (how many people we enroll in our study)The variability (how much variability there is in a person’s LDLcholesterol levels)The effect size (the amount by which our intervention actuallyreduces cholesterol)

    Patrick Breheny Introduction to Biostatistics (171:161) 23/29

  • Presenting results with error barsPower and sample size

    Summary

    Power curves

    The main concepts behind power and sample size can beillustrated in power curves – graphs of what happens to poweras we change one of sample size/variability/effect size

    In the curves that follow, I will start with:

    A sample size of 100A variability of 36 mg/dLAn effect size of 5 mg/dL

    And we will see what happens to power as we vary each ofthem

    Patrick Breheny Introduction to Biostatistics (171:161) 24/29

  • Presenting results with error barsPower and sample size

    Summary

    Power curve #1

    −20 −10 0 10 20

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    True effect

    Pow

    er

    Patrick Breheny Introduction to Biostatistics (171:161) 25/29

  • Presenting results with error barsPower and sample size

    Summary

    Power curve #2

    10 20 30 40 50 60

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    True SD

    Pow

    er

    Patrick Breheny Introduction to Biostatistics (171:161) 26/29

  • Presenting results with error barsPower and sample size

    Summary

    Power curve #3

    0 200 400 600 800 1000

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Sample size

    Pow

    er

    Patrick Breheny Introduction to Biostatistics (171:161) 27/29

  • Presenting results with error barsPower and sample size

    Summary

    Sample size determination

    Thus, we can determine our sample size by looking at thepower curve

    For instance, in the previous example, if we want a power of80%, we would need a sample size of about 400

    In reality, of course, lots of other things like money, time,resources, availability of subjects, etc., influence the actualsample size of a study

    Also, we may be interested in calculating the required samplesize under a few different designs to see which way is theeasiest/cheapest to conduct the study

    Patrick Breheny Introduction to Biostatistics (171:161) 28/29

  • Presenting results with error barsPower and sample size

    Summary

    Summary

    Displaying ±SD in a figure emphasizes variability, whiledisplaying ±SE emphasizes uncertainty, although ask yourself:Wouldn’t I be better off plotting the confidence interval?

    The power of a study depends on:

    Sample sizeVariabilityEffect size

    Patrick Breheny Introduction to Biostatistics (171:161) 29/29

    Presenting results with error barsPower and sample sizeSummary