confidence limits in statistics


  • 7/28/2019 Confidence Limits in Statistics

    1/30

    Planning, Performing, and Publishing Research with Confidence Limits

    A tutorial lecture given at the annual meeting of the American College of Sports Medicine, Seattle, June 4 1999.

    Will G Hopkins

    Physiology and Physical Education

    University of Otago

    Dunedin NZ

    [email protected]


    Outline

    Definitions and Mis/interpretations

    Planning: Sample size

    Performing: Sample size "on the fly"

    Publishing: Methods, Results, Discussion

    Meta-analysis

    Publishing non-significant outcomes

    Conclusions

    Dis/advantages


    Definitions and Mis/interpretations

    Confidence limits: Definitions

    "Margin of error" Example: Survey of 1000 voters

    Democrats 43%, Republicans 33%

    Margin of error is 3% (for a result of 50%...)

    Likely range of true value

    "Likely" is usually 95%.

    "True value" = population value

    = value if you studied the entire population.

    Example: Survey of 1000 voters

    Democrats 43% (likely range 40 to 46%)

    Democrats - Republicans 10% (likely range 5 to 15%)
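    The survey figures above can be checked with a short calculation. This is a minimal sketch that assumes simple random sampling and the normal approximation; the function names are hypothetical, and the formula for the difference assumes both percentages come from the same 1000 voters.

```python
from math import sqrt
from statistics import NormalDist


def proportion_limits(p, n, level=0.95):
    """Approximate confidence limits for a proportion p observed in n people."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def difference_limits(p1, p2, n, level=0.95):
    """Approximate limits for p1 - p2 when both come from the same survey."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error of a difference between two proportions of one sample.
    se = sqrt((p1 + p2 - (p1 - p2) ** 2) / n)
    diff = p1 - p2
    return diff - z * se, diff + z * se


margin_at_50 = proportion_limits(0.50, 1000)[1] - 0.50
dem_low, dem_high = proportion_limits(0.43, 1000)
diff_low, diff_high = difference_limits(0.43, 0.33, 1000)

print(f"margin of error at 50%: {100 * margin_at_50:.1f}%")
print(f"Democrats: {100 * dem_low:.0f} to {100 * dem_high:.0f}%")
print(f"Democrats - Republicans: {100 * diff_low:.0f} to {100 * diff_high:.0f}%")
```

    The margin comes out at about 3%, and the two likely ranges round to 40 to 46% and 5 to 15%.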


    Example: in a study of 64 subjects, the correlation between

    height and weight was 0.68 (likely range 0.52 to 0.79).

    [Figure: correlation-coefficient axis from 0.00 to 1, showing the observed value between its lower and upper confidence limits.]
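    The likely range for the height-weight correlation can be reproduced with Fisher's z transform. This is a sketch under the usual large-sample assumption that the transformed correlation is normally distributed with standard error 1/sqrt(n - 3); the function name is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_limits(r, n, level=0.95):
    """Confidence limits for a correlation r from n subjects (Fisher z)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = atanh(r)              # transform r to the z scale
    half_width = z / sqrt(n - 3)   # z times the standard error on that scale
    # Back-transform each limit; the interval is skewed on the r scale.
    return tanh(centre - half_width), tanh(centre + half_width)


lower, upper = correlation_limits(0.68, 64)
print(f"{lower:.2f} to {upper:.2f}")
```

    Note that the limits are not symmetric about 0.68, which is why a correlation cannot be summarized with a single plus-or-minus value.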


    Confidence interval: difference between the upper

    and lower confidence limits.

    Amazing facts about confidence intervals (for normally distributed statistics)

    To halve the interval, you have to quadruple sample size.

    A 99% interval is 1.3 times wider than a 95% interval.

    You need 1.7 times the sample size for the same width.

    A 90% interval is 0.8 of the width of a 95% interval.

    You need 0.7 times the sample size for the same width.
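    These ratios follow from the normal distribution: the width of an interval is proportional to the critical z value and to 1/sqrt(sample size). A minimal sketch, with hypothetical function names:

```python
from statistics import NormalDist


def z_for(level):
    """Critical z value for a two-sided interval at the given level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def width_ratio(level, reference=0.95):
    """Width of an interval at `level` relative to the reference level."""
    return z_for(level) / z_for(reference)


def sample_size_ratio(level, reference=0.95):
    """Relative sample size needed to keep the same width at `level`."""
    # Width scales with 1/sqrt(n), so n scales with the square of the width ratio.
    return width_ratio(level, reference) ** 2


for level in (0.99, 0.90):
    print(level, round(width_ratio(level), 1), round(sample_size_ratio(level), 1))
```

    The same 1/sqrt(n) scaling is why halving the interval needs four times the sample size.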


    How to Derive Confidence Limits

    Find a function(true value, observed value, data) with a known probability distribution, e.g. $(n-1)s^2/\sigma^2$, which has a $\chi^2$ distribution.

    Calculate a critical value, such that for 2.5% of the time, function(true value, observed value, data) < critical value.

    [Figure: probability distribution of the function (e.g. $\chi^2$), with the area of 0.025 below the critical value marked.]

    Rearranging, for 2.5% of the time, true value > function'(observed value, data, critical value) = upper confidence limit.
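    As a worked instance of this recipe, take the standard deviation of a normally distributed variable. Here $s$ is the observed standard deviation from $n$ subjects, $\sigma$ is the true value, and $\chi^2_{0.025,\,n-1}$ denotes the 2.5th percentile of the chi-squared distribution with $n-1$ degrees of freedom (this notation is an assumption, not taken from the slides).

```latex
% Step 1: the function with a known distribution falls below its
% critical value 2.5% of the time.
P\!\left( \frac{(n-1)s^2}{\sigma^2} < \chi^2_{0.025,\,n-1} \right) = 0.025

% Step 2: rearrange the inequality for the true value.
P\!\left( \sigma > s\sqrt{\frac{n-1}{\chi^2_{0.025,\,n-1}}} \right) = 0.025

% So the upper confidence limit is
\sigma_{\text{upper}} = s\sqrt{\frac{n-1}{\chi^2_{0.025,\,n-1}}}
```

    The lower limit follows in the same way from the 97.5th percentile.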


    Mis/interpretation of confidence limits

    Hard to misinterpret confidence limits for simple

    proportions and correlation coefficients. Easier to misinterpret changes in means.

    Example: The change in blood volume in a study

    was 0.52 L (likely range 0.12 to 0.92 L).

    For 95% of subjects, the change was/would be between 0.12 and 0.92 L.

    The average change in the population would be between

    0.12 and 0.92 L.

    The change for the average subject would be between

    0.12 and 0.92 L. There may be individual differences in the change.


    P value: Definition

    The probability of a more extreme absolute value

    than the observed value if the true value was zero

    or null.

    Example: 20 subjects, correlation = 0.25, p = 0.29.

    [Figure: distribution of correlations for no effect and n = 20, on a correlation-coefficient axis from -0.5 to 0.5, marking no effect and the observed effect (r = 0.25); the marked area is the p value = 0.29.]
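    The p value in the example above (20 subjects, correlation 0.25) can be approximated as follows. This sketch assumes the Fisher z normal approximation; statistics programs usually use a t distribution instead, so their answer may differ slightly in later decimals. The function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-sided p value for r against a true value of zero (Fisher z approximation)."""
    # Under no effect, atanh(r) * sqrt(n - 3) is approximately standard normal.
    z = atanh(r) * sqrt(n - 3)
    # Probability of a more extreme absolute value in either tail.
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = correlation_p_value(0.25, 20)
print(f"p = {p:.2f}")
```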


    "Statistically Significant": Definitions

    P < 0.05

    Zero lies outside the confidence interval.

    Examples: four correlations for samples of size 20.

    r      likely range     P
    0.70   0.37 to 0.87     0.007
    0.44   0.00 to 0.74     0.05
    0.25   -0.22 to 0.62    0.29
    0.00   -0.44 to 0.44    1.00

    [Figure: the four likely ranges plotted on a correlation-coefficient axis from -0.50 to 1.]


    Incredibly interesting information about statistical

    significance and confidence intervals

    Two independent estimates of a normally distributed statistic with equal confidence intervals are significantly different at the 5% level if the overlap of their intervals is less than 0.29 ($1 - \sqrt{2}/2$) of the length of the interval.

    If the intervals are very unequal...

    [Figure: two rows of paired intervals, each row illustrating p < 0.05, p = 0.05 and p > 0.05.]
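    The 0.29 overlap rule for equal intervals can be verified numerically. This sketch assumes two independent, normally distributed estimates with the same 95% half-width; the function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def overlap_fraction(est1, est2, half_width):
    """Overlap of two equal-width intervals as a fraction of one interval's length."""
    overlap = 2 * half_width - abs(est1 - est2)
    return overlap / (2 * half_width)


def p_value_for_difference(est1, est2, half_width):
    """Two-sided p value for the difference between the two estimates."""
    se = half_width / Z95                    # standard error of each estimate
    z = abs(est1 - est2) / (se * sqrt(2))    # standard error of a difference
    return 2 * (1 - NormalDist().cdf(z))


threshold = 1 - sqrt(2) / 2
half_width = 1.0
gap = sqrt(2) * half_width  # separation at which p is exactly 0.05

print(round(threshold, 2))
print(round(overlap_fraction(0.0, gap, half_width), 2))
print(round(p_value_for_difference(0.0, gap, half_width), 3))
```

    At that separation the overlap equals the threshold and the p value is 0.05; any smaller overlap gives p < 0.05.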


    Type I and II Errors

    You could be wrong about significance or lack of it.

    Type I error = false alarm. Rate = 5% for zero real effect.

    Type II error = failed alarm. Traditional acceptable rate = 20% for smallest worthwhile effect.

    Lots of tests for significance implies more chance of at least one false alarm: "inflated type I error". Ditto type II error?

    Deal with inflated type I error by reducing the p value.

    Should we adjust confidence intervals? No.
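    To see how quickly the type I error inflates, here is a small sketch. It assumes the tests are independent and all real effects are zero. The slide does not name a method for reducing the p value; the Bonferroni threshold shown is one common choice, included as an assumption.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Chance of at least one false alarm in n independent tests of zero real effects."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Reduced p-value threshold that keeps the overall rate near alpha (assumed method)."""
    return alpha / n_tests


for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_alarm(k), 2), bonferroni_threshold(k))
```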


    Mis/interpretation of P > 0.05 (for an observed positive effect)

    The effect is not publishable.

    There is no effect.

    The effect is probably zero or trivial.

    There's a reasonable chance the effect is < zero.

    Mis/interpretation of P < 0.05 (for an observed positive effect)

    The effect is probably big.

    There's a < 5% chance the effect is zero.

    There's a < 2.5% chance the effect is < zero.

    There's a high chance the effect is > zero.

    The effect is publishable.


    Planning Research

    Sample Size via Statistical Significance

    Sample size must be big enough to be sure you will detect the smallest worthwhile effect.

    To be sure: 80% of the time.

    Detect: P < 0.05.

    Smallest worthwhile effect: what impacts your subjects

    correlation = 0.10

    relative risk = 1.2 (or frequency difference = 10%)

    difference in means = 0.2 of a between-subject standard deviation

    change in means = 0.5 of a within-subject standard deviation

    Example: 760 subjects to detect a correlation of 0.10.

    Example: 68 subjects to detect a 0.5% change in a crossover study when the within-subject variation is 1%.
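    The order of magnitude of the correlation example can be checked with a textbook approximation. This sketch assumes Fisher's z transform and normal critical values, which is not necessarily the method behind the 760 quoted above, so the result lands in the same region rather than on the same number. The function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_to_detect_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect a correlation r (two-sided test)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # "detect": P < 0.05
    z_power = norm.inv_cdf(power)          # "to be sure": 80% of the time
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


n = n_to_detect_correlation(0.10)
print(n)
```

    Because the required size scales with roughly 1/r squared, larger smallest-worthwhile effects need far fewer subjects.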


    But 95% likely range doesn't work properly with traditional sample-size estimation (maybe).

    Example: Correlation of 0.06, sample size of 760...

    47.5% + 47.5% (=95%) likely range: Not significant, but could be substantial. Huh?

    47.5% + 30% likely range: Not significant, and can't be substantial. OK!

    [Figure: the two likely ranges plotted on a correlation-coefficient axis from -0.1 to 0.10.]


    Sample Size via Confidence Limits

    Sample size must be big enough for acceptable

    precision of the effect. Precision means 95% confidence limits.

    Acceptable means any value of the effect within these

    limits will not impact your subjects.

    Example: need 380 subjects to delimit a correlation of

    zero.

    [Figure: confidence interval for N = 380 on a correlation-coefficient axis, spanning the smallest worthwhile effects from -0.10 to 0.10.]
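    A rough check of this sample size: the sketch below assumes Fisher's z transform and a normal approximation (neither is stated on the slide) and uses a hypothetical function name. It asks how many subjects make the 95% limits for an observed correlation of zero fall at plus and minus the smallest worthwhile effect.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_to_delimit_zero_correlation(smallest_effect, level=0.95):
    """Approximate sample size so the limits for r = 0 are +/- smallest_effect."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # On the Fisher z scale the half-width is z / sqrt(n - 3).
    return ceil((z / atanh(smallest_effect)) ** 2 + 3)


n = n_to_delimit_zero_correlation(0.10)
print(n)
```

    The result is within a few subjects of the 380 quoted, and about half of what the significance-based calculation requires.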


    But sample size needed to detect or delimit

    smallest effect is overkill for larger effects.

    Example: confidence limits for correlations of 0.10 and 0.80 with a sample size of 760...

    [Figure: the two confidence intervals plotted on a correlation-coefficient axis from -0.1 to 1.]

    So why not start with a smaller sample and do

    more subjects only if necessary?

    Yes, I call it...


    Performing Research

    Sample Size "On the Fly"

    Start with a small sample; add subjects until you get acceptable precision for the effect.

    Acceptable precision defined as before.

    Need qualitative scale for magnitudes of effects.

    Example: sample sizes to delimit correlations...

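    [Figure: sample sizes needed to delimit correlations (values shown: 380, 350, 270, 155, 46), plotted on a correlation-coefficient axis from -0.1 to 1 with the magnitude scale trivial, small, moderate, large, very large, nearly perfect.]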


    Problems with sampling on the fly

    Do not sample until you get statistical significance: the resulting outcomes are biased larger than life.

    Sampling until the confidence interval is acceptable produces bias, but it is negligible.

    But researchers will rush into print as soon as they get statistical significance.

    And funding agencies prefer to give money once (but you could give some back!).

    And all the big effects have been researched anyway? No, not really.


    Publishing Research

    In the Methods

    "We show the precision of our estimates of outcome statistics as 95% confidence limits (which define the likely range of the true value in the population from which we drew our sample)."

    Amazingly useful tips on calculating confidence limits

    Simple differences between means: stats program.

    Other normally distributed statistics: mean and p value.

    Relative risks: stats program.

    Correlations: Fisher's z transform.

    Standard deviations and other root mean square variations: chi-squared distribution.


    Coefficients of variation: standard deviation of 100x natural log

    of the variable. Back transform for CV>5%.

    Use the adjustment of Tate and Klett to get shorter intervals for

    SDs and CVs from small samples.

    [Figure: example of usual and adjusted confidence intervals for a coefficient of variation for 10 subjects in 2 tests, on an axis of coefficient of variation (%) from 0 to 3.]

    Ratios of independent standard deviations: F distribution.

    R2 (variance explained): convert to a correlation.

    Use the spreadsheet at sportsci.org/stats for all the above.

    Effect-size (mean/standard deviation): non-central F distribution or bootstrapping.

    Really awful statistics: bootstrapping.
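    For the tip on other normally distributed statistics (mean and p value), the confidence limits can be recovered from the estimate and its p value alone. This is a sketch that assumes a two-sided p value against a null of zero and uses the normal rather than the t distribution; the function name and the p value of 0.011 are illustrative, not from the slides.

```python
from statistics import NormalDist


def limits_from_p_value(estimate, p, level=0.95):
    """Confidence limits for a normally distributed estimate, given its p value."""
    norm = NormalDist()
    # z score the estimate must have had to produce this two-sided p value
    # (requires 0 < p < 1).
    z_observed = norm.inv_cdf(1 - p / 2)
    standard_error = abs(estimate) / z_observed
    margin = norm.inv_cdf(0.5 + level / 2) * standard_error
    return estimate - margin, estimate + margin


# Illustrative: a change of 0.52 L reported with p = 0.011.
lower, upper = limits_from_p_value(0.52, 0.011)
print(f"{lower:.2f} to {upper:.2f}")
```

    With these inputs the limits come out close to the 0.12 to 0.92 L range used in the blood volume example.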


    Bootstrapping (Resampling) for confidence limits

    Use for difficult statistics, e.g. for grossly non-normal repeated measures with missing values. Here's how...

    For a large-enough sample, you can recreate (sort of) the population by duplicating the sample endlessly.

    Draw 1000 samples (of same size as your original) from this population.

    Calculate your outcome statistic for each of these samples, rank them, then find the 25th and 975th place-getters. These are the confidence limits.

    Problems

    Painful to generate.

    No good for infrequent levels of nominal variables.
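    The bootstrap steps above can be sketched in a few lines. Drawing with replacement from the sample is equivalent to drawing from a population made by duplicating the sample endlessly. The function name and the data values are made up for illustration, and the mean stands in for whatever outcome statistic is of interest.

```python
import random
from statistics import mean


def bootstrap_limits(sample, statistic, n_resamples=1000, seed=1):
    """Percentile bootstrap 95% confidence limits for any statistic of a sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    size = len(sample)
    # Resample with replacement, same size as the original, and rank the results.
    estimates = sorted(
        statistic(rng.choices(sample, k=size)) for _ in range(n_resamples)
    )
    # With 1000 resamples these are the 25th and 975th place-getters.
    lower_rank = round(0.025 * n_resamples)
    upper_rank = round(0.975 * n_resamples)
    return estimates[lower_rank - 1], estimates[upper_rank - 1]


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]  # made-up values
lower, upper = bootstrap_limits(data, mean)
print(f"mean {mean(data):.2f}, likely range {lower:.2f} to {upper:.2f}")
```

    Any function of a list can be passed as `statistic`, which is what makes the approach useful for statistics with no convenient formula.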


    In the Results

    In TEXT

    Change or difference in means

    First mention:

    ...0.42 (95% confidence/likely limits/range -0.09 to 0.93) or

    ...0.42 (95% confidence/likely limits/range ±0.51).

    Thereafter:

    ...2.6 (1.4 to 3.8) or 2.6 (±1.2) etc.

    Correlations, relative risks, odds ratios, standard deviations, ratios of standard deviations: can't use ± because the confidence interval is skewed:

    ...a correlation of 0.90 (0.67 to 0.97)...

    ...a coefficient of variation of 1.3% (0.9 to 1.9)...


    In TABLES

    Confidence intervals

                 r      likely range
    Variable A   0.70   0.37 to 0.87
    Variable B   0.44   0.00 to 0.74
    Variable C   0.25   -0.22 to 0.62
    Variable D   0.00   -0.44 to 0.44

    P values

                 r      p
    Variable A   0.70   0.007
    Variable B   0.44   0.05
    Variable C   0.25   0.29
    Variable D   0.00   1.00

    Asterisks

                 r
    Variable A   0.70**
    Variable B   0.44*
    Variable C   0.25
    Variable D   0.00


    In FIGURES

    [Figure: change in power (%) on an axis from -10 to 10 for three groups: told placebo, not told, told carbohydrate. Bars are 95% likely ranges.]


    [Figure: change in 5000-m time (%), axis from -3 to 4, against training time (weeks), axis from 0 to 14, for three groups (live low, train low; live high, train high; live high, train low) across sea level, altitude and sea level phases, with the likely range of the true change shown.]


    In the Discussion

    Interpret the observed effect and its 95%

    confidence limits qualitatively. Example: you observed a moderate correlation, but the

    true value of the correlation could be anything between

    trivial and very strong.

    [Figure: the observed correlation and its confidence limits on a correlation-coefficient axis from -0.1 to 1, with the magnitude scale trivial, small, moderate, large, very large, nearly perfect.]


    Meta-Analysis

    Deriving a single estimate and confidence interval

    for an effect from several studies. Here's how it works for two:

    [Figure: two panels, Equal Confidence Intervals and Unequal Confidence Intervals, each showing the intervals for Study 1, Study 2 and the combined Study 1+2.]
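    One common way to combine two studies is inverse-variance weighting, sketched below. The slides do not name the method, so treat this as an assumption: a fixed-effect combination of normally distributed estimates with symmetric 95% limits. The function name and the study values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def combine(studies):
    """Pool (estimate, lower, upper) triples with 95% limits by inverse-variance weighting."""
    weights, weighted = [], []
    for estimate, lower, upper in studies:
        se = (upper - lower) / (2 * Z95)  # standard error from the interval width
        w = 1 / se ** 2                   # narrower interval means more weight
        weights.append(w)
        weighted.append(w * estimate)
    pooled = sum(weighted) / sum(weights)
    margin = Z95 / sqrt(sum(weights))
    return pooled, pooled - margin, pooled + margin


# Equal intervals: the pooled estimate sits midway, with a narrower interval.
equal = combine([(1.0, 0.0, 2.0), (2.0, 1.0, 3.0)])
# Unequal intervals: the pooled estimate is pulled toward the more precise study.
unequal = combine([(1.0, 0.0, 2.0), (2.0, 1.5, 2.5)])
print([round(x, 2) for x in equal])
print([round(x, 2) for x in unequal])
```

    With equal intervals the combined interval is 1/sqrt(2) of the width of either study, the same gain as doubling the sample size.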


    Publishing non-significant outcomes

    Publishing only significant effects from small-scale studies leads to publication bias.

    Publishing effects with confidence limits regardless

    of magnitude is free of bias.

    Many smaller studies are probably better than a

    few larger ones anyway.

    So bully the editor into accepting the paper about

    your seemingly inconclusive small-scale study.


    Conclusions

    Disadvantages of Statistical Significance

    Emphasizes testing of hypotheses.

    Aim is to detect an effect: effects are zero until proven otherwise.

    Have to understand Type I and II errors.

    Hard to understand; easy to misinterpret.

    Have to consider sample size.

    Focuses on statistically significant effects.

    Advantages of Statistical Significance

    Familiar.

    All stats programs give p values.

    Easy to put asterisks in tables and figures.


    Disadvantages of Confidence Limits

    Unfamiliar.

    Not always available in stats programs.

    Cluttersome in tables.

    Display in time series can be a challenge.

    Advantages of Confidence Limits

    Emphasizes precision of estimation.

    Aim is to delimit an effect: effects are never zero.

    Only one kind of "error".

    Meaning is reasonably clear, even to lay readers.

    No confusion between significance and magnitude.

    Journals now require them.