how to learn everything you ever wanted to know about biostatistics
DESCRIPTION
How to Learn Everything You Ever Wanted to Know About Biostatistics. Daniel W. Byrne Director of Biostatistics and Study Design General Clinical Research Center Vanderbilt University Medical Center. The presenter has no financial interests in the products mentioned in this talk.
1
How to Learn Everything You Ever
Wanted to Know About Biostatistics
Daniel W. Byrne
Director of Biostatistics and Study Design
General Clinical Research Center
Vanderbilt University Medical Center
The presenter has no financial interests in the products mentioned in this talk.
2
Objective of This Workshop
To provide a 1-hour overview of the important practical information that a clinical investigator needs to know about biostatistics to be successful.
3
I. You Will Need the Right Tools
4
Install a powerful, yet easy to use, statistical software package on your computer.
I recommend SPSS for Windows.
Bring an 1180 for $80 to Karen Montefiori in 143 Hill Student Center (3-1630).
She will lend you the SPSS CD for the day and you can install this software easily.
5
 1   163   SAS
 2    52   SPSS
 3    48   STATA
 4    36   Epi Info
 5    22   SUDAAN
 6    19   S-PLUS
 7    12   Statxact
 8     8   BMDP
 9     6   Statistica
10     5   Statview
SPSS is the 2nd most popular package. It is much easier to use than SAS and Stata.
6
Install additional software for statistical “odds and ends”
Instat by GraphPad – graphpad.com
  for summary data analysis - $100
True Epistat by Epistat Services – true-epistat.com - $395
  for random number table, etc.
CIA (Confidence Interval Analysis) – bmj.com
  for confidence intervals - $35.95 with book “Statistics with Confidence” D. Altman
7
Install a sample size program.
If you can afford to spend $400, buy nQuery Advisor – Statistical Solutions - www.statsol.com
If you can afford to spend $0, download PS from the Vanderbilt web site – http://www.mc.vanderbilt.edu/prevmed/ps/index.htm
Both packages are on the CRC’s statistical workstation in room A-3101. VUMC investigators are welcome to use this workstation.
8
II. You Will Need a Plan
9
Use the scientific method to keep your project focused.
State the problem
Formulate the null hypothesis
Design the study
Collect the data
Interpret the data
Draw conclusions
10
State the Problem
Among patients hospitalized for a hip fracture who develop pneumonia during their stay in the hospital, the mortality rate is 2.3 times higher at non-trauma centers compared with trauma centers (48.7% vs. 21.1%, P=0.043).
It is not clear if, or how, those who will develop pneumonia could be identified on admission.
11
Formulate the Null Hypothesis
Among patients hospitalized for treatment of a hip fracture, there are no factors known upon admission that are statistically different between those who develop pneumonia during their stay and those who do not.
12
Why bother with a null hypothesis?
For the same reason that we assume that a person is innocent until proven guilty.
The burden of responsibility is on the prosecutor to demonstrate enough evidence for members of a jury to be convinced that the charges are true and to change their minds.
Outcome after treatment with Drug A will not be significantly different from placebo.
13
Design the Study
Data on 933 patients with a hip fracture from a New York trauma registry will be analyzed.
The 58 patients with pneumonia will be compared with the 875 without pneumonia.
14
The Most Common Type of Flaw
[Bar chart: Number of Responses (axis 0 to 25) by type of flaw. Categories: Presentation of the results, Importance of the topic, Interpretation of the findings, Study Design. Bar values shown: 4, 4, 20, 0.]
15
Example of Recall Bias
A control group is asked, “Two weeks ago from today, did you eat X for breakfast?”
Two weeks after their MI, patients are asked, “Did you eat X for breakfast on the day of your heart attack?”
You can prove any food causes an MI using this method (X=bacon, X=Flintstone vitamins, etc.)
16
John Bailar’s Quote:
“Study design and bias are much more important than complex statistical methods.”
Devote more time to improving the study design, and minimizing and measuring bias.
Become an expert at study design issues and biases in your area of research.
17
What is the statistical power of the study?
Power
Beta
Alpha
Sample size
Ratio of treated to control group
Measure of outcome
18
Sample Size Table
See Table 9-1 in the handout “Sample Size Requirements for Each of Two Groups”.
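The quantities listed for statistical power (alpha, beta, sample size, measure of outcome) can be tied together in a rough calculation. The sketch below is not the method behind Table 9-1, nQuery Advisor, or PS; it is a minimal normal-approximation formula for comparing two means, assuming a 1:1 ratio of treated to control, a two-sided alpha, and a known standard deviation. The function name and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means (1:1 ratio).

    Normal approximation: n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided alpha
    z_beta = z.inv_cdf(power)             # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return ceil(n)                        # round up to whole patients


# Hypothetical example: detect a 5-unit difference when the SD is 10.
print(n_per_group(delta=5, sd=10))
```

Because this uses the normal rather than the t distribution, dedicated sample size software will typically report a slightly larger number.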
19
20
Collect the Data
See the handouts for:
  TEC Trauma Systems Study
21
III. You Will Need Data Management Skills
22
Enter your data with statistical analysis in mind.
For small projects, enter data into Microsoft Excel or directly into SPSS.
For large projects, create a database with Microsoft Access.
Keep variable names in the first row, with <=8 characters, and no internal spaces.
Enter as little text as possible and use codes for categories, such as 1=male, 2=female.
23
Spreadsheet from Hell
24
Spreadsheet from Heaven
25
IV. You Will Need to Learn Descriptive Statistics
26
Descriptive vs. Inferential
Descriptive statistics summarize your group.
  average age 78.5, 89.3% white.
Inferential statistics use the theory of probability to make inferences about larger populations from your sample.
  White patients were significantly older than black and Hispanic patients, P<0.001.
27
Import your data into a statistical program for screening and analysis.
28
Screen your data thoroughly for errors and inconsistencies before doing ANY analyses.
Check the lowest and highest value for each variable.
  For example, age 1-777.
Look at histograms to detect typos.
Cross-check variables to detect impossible combinations.
  For example, pregnant males, survivors discharged to the morgue, patients in the ICU for 25 days with no complications.
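The screening steps on this slide (range checks and cross-checks for impossible combinations) can be sketched in a few lines. This is only an illustration outside SPSS: the record layout, field names, age limit, and rule names are all hypothetical, with sex coded 1=male, 2=female.

```python
def out_of_range(records, field, low, high):
    """Return records whose value for `field` is missing or outside [low, high]."""
    return [r for r in records
            if r.get(field) is None or not (low <= r[field] <= high)]


def impossible_combinations(records, rules):
    """Return (rule name, record) pairs for every record that breaks a rule."""
    return [(name, r) for r in records
            for name, is_bad in rules.items() if is_bad(r)]


# Hypothetical records, invented for illustration only.
patients = [
    {"id": 1, "age": 81, "sex": 2, "pregnant": 0, "survived": 1, "dispo": "HOME"},
    {"id": 2, "age": 777, "sex": 1, "pregnant": 0, "survived": 1, "dispo": "HOME"},
    {"id": 3, "age": 64, "sex": 1, "pregnant": 1, "survived": 1, "dispo": "MORGUE"},
]

# Each rule returns True when a record is internally inconsistent.
rules = {
    "pregnant male": lambda r: r["sex"] == 1 and r["pregnant"] == 1,
    "survivor discharged to morgue":
        lambda r: r["survived"] == 1 and r["dispo"] == "MORGUE",
}

print([r["id"] for r in out_of_range(patients, "age", 0, 110)])
print([(name, r["id"]) for name, r in impossible_combinations(patients, rules)])
```

Flagged records still need to be checked against the source documents; a value that looks suspicious may turn out to be correct.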
29
Analyze, descriptive statistics, frequencies, select the variable
Statistics: AGE

N Valid           933
N Missing           0
Mean           79.292
Median         81.300
Mode             90.0
Std. Deviation 26.537
Range           763.0
Minimum          14.0
Maximum         777.0

[Histogram of AGE: x-axis from 25.0 to 775.0, y-axis Frequency from 0 to 700. Std. Dev = 26.54, Mean = 79.3, N = 933.00]
30
Analyze, Descriptive Statistics, Crosstabs
SURVIVAL * 48-DISPOSITION Crosstabulation (Count)

48-DISPOSITION                          EXPIRED   SURVIVED   Total
HOME                                                   224     224
REHABILITATION FACILITY                                 56      56
OTHER HOSPITAL                                          12      12
MORGUE                                       63                 63
SKILLED NURSING FACILITY                               201     201
HOME WITH ASSISTANCE                                   236     236
AMA DISCHARGE AGAINST MEDICAL ADVICE                     3       3
8                                                      138     138
Total                                        63        870     933
31
Correct the data in the original database or spreadsheet and import a revised version into the statistical package.
The age of 777 should be checked and changed to the correct age.
Suspicious values, such as an age of 106, should be checked. In this case it is correct.
32
Interpret the Data
33
Run descriptive statistics to summarize your data.
SURVIVAL

            Frequency   Percent   Valid Percent   Cumulative Percent
EXPIRED            63       6.8             6.8                  6.8
SURVIVED          870      93.2            93.2                100.0
Total             933     100.0           100.0

Statistics: 49-DAYS IN HOSPITAL

N Valid           933
N Missing           0
Mean            23.34
Median          19.00
Mode               20
Std. Deviation  18.03
Range             236
Minimum             1
Maximum           237

[Histogram of 49-DAYS IN HOSPITAL: x-axis from 0.0 to 240.0, y-axis Frequency from 0 to 400. Std. Dev = 18.03, Mean = 23.3, N = 933.00]
34
V. You Will Need to Learn Inferential Statistics
35
P Value
A P value is an estimate of the probability that results such as yours could have occurred by chance alone if there truly was no difference or association.
P < 0.05 = 5% chance, 1 in 20.
P < 0.01 = 1% chance, 1 in 100.
Alpha is the threshold. If P is < this threshold, you consider it statistically significant.
36
Basic formula for inferential tests
Based on the total number of observations and the size of the test statistic, one can determine the P value.
$$\text{Test Statistic} = \frac{\text{Observed} - \text{Expected}}{\text{Variability}}$$
37
How many noise units?
Test statistic & sample size (degrees of freedom) convert to a probability or P Value.
$$\text{Test Statistic} = \frac{\text{Signal}}{\text{Noise}}$$
38
Use inferential statistics to test for differences and associations.
There are hundreds of statistical tests.
A clinical researcher does not need to know them all.
Learn how to perform the most common tests in SPSS.
Learn how to use the statistical flowchart to determine which test to use.
39
VI. You Will Need to Understand the Statistical
Terminology Required to Select the Proper Inferential Test
40
Univariate vs. Multivariate
Univariate analysis usually refers to one predictor variable and one outcome variable.
  Is gender a predictor of pneumonia?
Multivariate analysis usually refers to more than one predictor variable or more than one outcome variable being evaluated simultaneously.
  After adjusting for age, is gender a predictor of pneumonia?
41
Difference vs. Association
Some tests are designed to assess whether there are statistically significant differences between groups.
  Is there a statistically significant difference between the age of patients with and without pneumonia?
Some tests are designed to assess whether there are statistically significant associations between variables.
  Is the age of the patient associated with the number of days in the hospital?
42
Unmatched vs. Matched
Some statistical tests are designed to assess groups that are unmatched or independent.
  Is the admission systolic blood pressure different between men and women?
Some statistical tests are designed to assess groups that are matched or data that are paired.
  Is the systolic blood pressure different between admission and discharge?
43
Level of Measurement
Categorical vs. continuous variables
If you take the average of a continuous variable, it has meaning.
  Average age, blood pressure, days in the hospital.
If you take the average of a categorical variable, it has no meaning.
  Average gender, race, smoker.
44
Level of Measurement
Nominal - categorical
  gender, race, hypertensive
Ordinal - categories that can be ranked
  none, light, moderate, heavy smoker
Interval - continuous
  blood pressure, age, days in the hospital
45
Horse race example
Nominal
  Did this horse come in first place?
  0=no, 1=yes
Ordinal
  In what position did this horse finish?
  1=first, 2=second, 3=third, etc.
Interval (scale)
  How long did it take for this horse to finish?
  60 seconds, etc.
46
47
Normal vs. Skewed Distributions
Parametric statistical tests can be used to assess variables that have a “normal” or symmetrical bell-shaped distribution curve for a histogram.
Nonparametric statistical tests can be used to assess variables that are skewed or nonnormal.
Look at a histogram to decide.
48
Examples of Normal and Skewed
[Histogram of 44-DAYS IN ICU: x-axis from 0.0 to 70.0, y-axis Frequency from 0 to 1000. Std. Dev = 3.99, Mean = .9, N = 933.00]
[Histogram of 35-SYSTOLIC BLOOD PRESSURE FIRST ER: x-axis from 60.0 to 250.0, y-axis Frequency from 0 to 160. Std. Dev = 27.74, Mean = 146.9, N = 925.00]
49
VII. You Will Need to Know Which Statistical Test to Use
50
Flowchart of common inferential statistics
See the handout, Figure 16-1, pages 78-79.
51
Commonly used statistical methods
1. Chi-square
2. Logistic regression
3. Student's t-test
4. Fisher's exact test
5. Cox proportional-hazards
6. Kaplan-Meier method
7. Wilcoxon rank-sum test
8. Log-rank test
9. Linear regression analysis
10. Mantel-Haenszel method
52
Commonly used statistical methods
11. One-way analysis of variance (ANOVA)
12. Mann-Whitney U test
13. Kruskal-Wallis test
14. Repeated-measures analysis of variance
15. Paired t-test
16. Chi-square test for trend
17. Wilcoxon signed-rank test
18. Analysis of variance (two-way)
19. Spearman rank-order correlation
20. Analysis of covariance (ANCOVA)
53
Chi-square
The most commonly used statistical test.
Used to test if two or more percentages are different.
For example, suppose that in a study of 933 patients with a hip fracture, 10% of the men (22/219) develop pneumonia compared with 5% of the women (36/714).
What is the probability that this could happen by chance alone?
Univariate, difference, unmatched, nominal, >=2 groups, n>=20.
54
Chi-square example
PNEUMONIA COMPLICATION 480.00-486.99 * SEX Crosstabulation

PNEUMONIA COMPLICATION                      SEX
480.00-486.99                          MALE    FEMALE     Total
ABSENT    Count                         197       678       875
          % within SEX                90.0%     95.0%     93.8%
PRESENT   Count                          22        36        58
          % within SEX                10.0%      5.0%      6.2%
Total     Count                         219       714       933
          % within SEX               100.0%    100.0%    100.0%

Chi-Square Tests

                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             7.197(b)   1   .007
Continuity Correction(a)       6.364      1   .012
Likelihood Ratio               6.492      1   .011
Fisher's Exact Test                                         .010         .008
Linear-by-Linear Association   7.189      1   .007
N of Valid Cases               933

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 13.61.
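To see where the Pearson chi-square value in this output comes from, the calculation can be reproduced by hand from the four counts. The sketch below is not SPSS code; it is a plain-Python illustration with hypothetical function names. It sums (observed - expected)^2 / expected over the cells, and for 1 degree of freedom converts the statistic to a P value with the complementary error function.

```python
from math import erfc, sqrt


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Pneumonia (rows: absent, present) by sex (columns: male, female).
chi2, df = pearson_chi_square([[197, 678], [22, 36]])
print(round(chi2, 3), df, round(p_value_df1(chi2), 3))
```

The result should agree with the Pearson Chi-Square row of the output (7.197, 1 df, P = .007). The P value helper is only valid for 1 degree of freedom; larger tables need a general chi-square distribution function.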
55
Fisher’s Exact Test
This test can be used for 2 by 2 tables when the number of cases is too small to satisfy the assumptions of the chi-square.
  Total number of cases is <20 or
  The expected number of cases in any cell is <1 or
  More than 25% of the cells have expected frequencies <5.
56
Chi-Square Tests

                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             13.545(b)   1   .000
Continuity Correction(a)       8.674       1   .003
Likelihood Ratio               6.842       1   .009
Fisher's Exact Test                                          .010         .010
Linear-by-Linear Association   13.531      1   .000
N of Valid Cases               933

a. Computed only for a 2x2 table
b. 1 cells (25.0%) have expected count less than 5. The minimum expected count is .50.

PNEUMONIA COMPLICATION 480.00-486.99 * CIRRHOSIS OR CHRONIC LIVER 571 Crosstabulation

PNEUMONIA COMPLICATION                                CIRRHOSIS OR CHRONIC LIVER 571
480.00-486.99                                          ABSENT    PRESENT     Total
ABSENT    Count                                           870          5       875
          Expected Count                                867.5        7.5     875.0
          % within PNEUMONIA COMPLICATION               99.4%        .6%    100.0%
          % within CIRRHOSIS OR CHRONIC LIVER 571       94.1%      62.5%     93.8%
PRESENT   Count                                            55          3        58
          Expected Count                                 57.5         .5      58.0
          % within PNEUMONIA COMPLICATION               94.8%       5.2%    100.0%
          % within CIRRHOSIS OR CHRONIC LIVER 571        5.9%      37.5%      6.2%
Total     Count                                           925          8       933
          Expected Count                                925.0        8.0     933.0
          % within PNEUMONIA COMPLICATION               99.1%        .9%    100.0%
          % within CIRRHOSIS OR CHRONIC LIVER 571      100.0%     100.0%    100.0%
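Fisher's exact test for this pneumonia by cirrhosis table can be worked out directly from the hypergeometric distribution. The sketch below is a minimal plain-Python illustration, not the SPSS implementation; the function name is hypothetical, and the two-sided P value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_observed * (1 + 1e-9))


# Rows: pneumonia absent, present. Columns: cirrhosis absent, present.
p = fisher_exact_two_sided(870, 5, 55, 3)
print(round(p, 3))
```

The value should be close to the Exact Sig. (2-sided) figure of .010 reported for Fisher's Exact Test in the output.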
57
How to calculate the expected number in a cell
PNEUMONIA COMPLICATION 480.00-486.99 * CIRRHOSIS OR CHRONIC LIVER 571 Crosstabulation (Count)

PNEUMONIA COMPLICATION         CIRRHOSIS OR CHRONIC LIVER 571
480.00-486.99                   ABSENT    PRESENT     Total
ABSENT                             870          5       875
PRESENT                             55          3        58
Total                              925          8       933

PNEUMONIA COMPLICATION 480.00-486.99 * CIRRHOSIS OR CHRONIC LIVER 571 Crosstabulation

PNEUMONIA COMPLICATION                 CIRRHOSIS OR CHRONIC LIVER 571
480.00-486.99                           ABSENT    PRESENT     Total
ABSENT    Count                            870          5       875
          Expected Count                 867.5        7.5     875.0
PRESENT   Count                             55          3        58
          Expected Count                  57.5         .5      58.0
Total     Count                            925          8       933
          Expected Count                 925.0        8.0     933.0
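The expected count in each cell is the row total multiplied by the column total, divided by the grand total. A minimal sketch of that arithmetic for the pneumonia by cirrhosis table follows; the function name is hypothetical and the code is an illustration, not SPSS syntax.

```python
def expected_counts(table):
    """Expected cell counts under no association: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: pneumonia absent, present. Columns: cirrhosis absent, present.
observed = [[870, 5], [55, 3]]
for row in expected_counts(observed):
    print([round(x, 1) for x in row])
```

For the bottom-right cell this is 58 x 8 / 933, or about 0.5, which is the minimum expected count flagged in the chi-square footnote.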
58
Chi-square for a trend test
Used to assess a nominal variable and an ordinal variable.
Does the pneumonia rate increase with the total number of comorbidities?
Univariate, association, nominal.
Analyze, Descriptive Statistics, Crosstabs.
59
PNEUMONIA COMPLICATION 480.00-486.99 * NUMBER OF COMORBIDITIES (0-9) Crosstabulation

PNEUMONIA COMPLICATION                                 NUMBER OF COMORBIDITIES (0-9)
480.00-486.99                                  .00     1.00     2.00     3.00     4.00     5.00    Total
ABSENT    Count                                250      292      213       98       19        3      875
          % within NUMBER OF COMORBIDITIES   98.8%    94.2%    93.0%    86.0%    90.5%    50.0%    93.8%
PRESENT   Count                                  3       18       16       16        2        3       58
          % within NUMBER OF COMORBIDITIES    1.2%     5.8%     7.0%    14.0%     9.5%    50.0%     6.2%
Total     Count                                253      310      229      114       21        6      933
          % within NUMBER OF COMORBIDITIES  100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%

Chi-Square Tests

                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             43.381(a)    5   .000
Likelihood Ratio               34.576       5   .000
Linear-by-Linear Association   30.522       1   .000
N of Valid Cases               933

a. 2 cells (16.7%) have expected count less than 5. The minimum expected count is .37.
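The Linear-by-Linear Association row is the test for trend. One standard way to obtain it is M^2 = (N - 1) r^2, where r is the Pearson correlation between the ordinal score (here the number of comorbidities, 0 to 5) and the 0/1 outcome, referred to a chi-square distribution with 1 degree of freedom. The sketch below applies that formula to the counts in the table; the function name is hypothetical and the equally spaced scores are an assumption about how the categories were scored.

```python
def linear_by_linear(scores, absent, present):
    """Linear-by-linear association statistic, M^2 = (N - 1) * r^2, with 1 df.

    scores  : ordinal score for each column (e.g. number of comorbidities)
    absent  : count without the outcome in each column
    present : count with the outcome in each column
    """
    n = sum(absent) + sum(present)
    sum_x = sum(s * (a + p) for s, a, p in zip(scores, absent, present))
    sum_xx = sum(s * s * (a + p) for s, a, p in zip(scores, absent, present))
    sum_y = sum(present)                  # outcome coded 0/1, so y * y == y
    sum_xy = sum(s * p for s, p in zip(scores, present))
    sxx = sum_xx - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    r_squared = sxy ** 2 / (sxx * syy)
    return (n - 1) * r_squared


m2 = linear_by_linear(
    scores=[0, 1, 2, 3, 4, 5],
    absent=[250, 292, 213, 98, 19, 3],
    present=[3, 18, 16, 16, 2, 3],
)
print(round(m2, 2))
```

The result should agree with the 30.522 shown in the output, which supports the reading that the pneumonia rate rises with the number of comorbidities.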
60
Mantel-Haenszel Method
Used to assess a factor across a number of 2 by 2 tables.
Is the mortality rate associated with pneumonia different between trauma centers and nontrauma centers?
Analyze, Descriptive Statistics, Crosstabs.
61
62
Student’s t-test
Used to compare the average (mean) in one group with the average in another group.
Is the average age of patients significantly different between those who developed pneumonia and those who did not?
Univariate, Difference, Unmatched, Interval, Normal, 2 groups.
63
Independent Samples Test: AGE

Levene's Test for Equality of Variances: F = 1.937, Sig. = .164

t-test for Equality of Means

                                                   Sig.         Mean         Std. Error   95% Confidence Interval of the Difference
                                t        df        (2-tailed)   Difference   Difference   Lower        Upper
Equal variances assumed         -1.561   931       .119         -2.849       1.825        -6.429       .732
Equal variances not assumed     -2.085   72.574    .041         -2.849       1.366        -5.572       -.125
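The t values in this output follow the signal-to-noise idea: the mean difference (signal) divided by its standard error (noise). A minimal sketch using the numbers from the table is below; the function name is hypothetical, and because the inputs are already rounded the results match the reported t values only to about two decimal places.

```python
def t_statistic(mean_difference, standard_error):
    """Test statistic as signal (mean difference) divided by noise (standard error)."""
    return mean_difference / standard_error


# Mean difference and standard errors as reported in the output above.
t_equal = t_statistic(-2.849, 1.825)      # equal variances assumed
t_unequal = t_statistic(-2.849, 1.366)    # equal variances not assumed
print(round(t_equal, 2), round(t_unequal, 2))
```

The same difference gives a larger t when the standard error is smaller, which is why the two rows of the output lead to different P values.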
64
Mann-Whitney U test
Same as the Wilcoxon rank-sum test.
Used in place of the Student’s t-test when the data are skewed.
A nonparametric test that uses the rank of the value rather than the actual value.
Univariate, Difference, Unmatched, Interval, Nonnormal, 2 groups.
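The idea of using the rank of each value rather than the value itself can be shown with a short calculation of the U statistic. This is a sketch only: the data are hypothetical lengths of stay invented for illustration, the function names are not from any package, and converting U to a P value (which SPSS does for you) is left out.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2) for two independent samples."""
    ranks = average_ranks(list(group1) + list(group2))
    n1, n2 = len(group1), len(group2)
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


# Hypothetical days in hospital for patients with and without pneumonia.
with_pneumonia = [30, 45, 28, 60]
without_pneumonia = [10, 12, 30, 15, 20]
print(mann_whitney_u(with_pneumonia, without_pneumonia))
```

Because only ranks are used, replacing the 60 with 600 would leave U unchanged, which is what makes the test suitable for skewed data.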
65
Paired t-test
Used to compare the average for measurements made twice within the same person - before vs. after.
Used to compare a treatment group and a matched control group.
For example, did the systolic blood pressure change significantly from the scene of the injury to admission?
Univariate, Difference, Matched, Interval, Normal, 2 groups.
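A paired t-test works on the within-person differences: the mean difference divided by its standard error. The sketch below shows that calculation with hypothetical blood pressure readings invented for illustration; the function name is not from any package, and the P value lookup is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched measurements."""
    diffs = [b - a for a, b in zip(first, second)]   # second minus first
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical systolic blood pressure at the scene and on admission.
scene = [120, 130, 125, 140]
admission = [115, 128, 120, 130]
t, df = paired_t(scene, admission)
print(round(t, 3), df)
```

Each patient serves as his or her own control, so only the variability of the differences enters the standard error.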
66
Wilcoxon signed-rank test
Used to compare two skewed continuous variables that are paired or matched.
Nonparametric equivalent of the paired t-test.
For example, “Was the Glasgow Coma Scale score different between the scene and admission?”
Univariate, Difference, Matched, Interval, Nonnormal, 2 groups.
67
ANOVA
One-way used to compare 3 or more means from independent groups.
  “Is the age different between White, Black, Hispanic patients?”
Two-way used to compare 2 or more means by 2 or more factors.
  “Is the age different between Males and Females, With and Without Pneumonia?”
68
Tests of Between-Subjects Effects
Dependent Variable: AGE

Source           Type III Sum of Squares    df   Mean Square          F   Sig.
Model                         5769944(a)     4       1442486   8664.775   .000
SEX                             1981.683     1      1981.683     11.904   .001
PNEUMON                         1299.320     1      1299.320      7.805   .005
SEX * PNEUMON                    519.282     1       519.282      3.119   .078
Error                           154657.2   929       166.477
Total                            5924601   933

a. R Squared = .974 (Adjusted R Squared = .974)
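Each F value in this table is a mean square divided by the error mean square, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below redoes that arithmetic from the numbers in the table; the function name is hypothetical.

```python
def f_ratio(sum_of_squares, df, error_sum_of_squares, error_df):
    """F statistic: effect mean square divided by error mean square."""
    mean_square = sum_of_squares / df
    mean_square_error = error_sum_of_squares / error_df
    return mean_square / mean_square_error


# Sums of squares and degrees of freedom taken from the table above.
error_ss, error_df = 154657.2, 929
for source, ss in [("SEX", 1981.683), ("PNEUMON", 1299.320),
                   ("SEX * PNEUMON", 519.282)]:
    print(source, round(f_ratio(ss, 1, error_ss, error_df), 3))
```

The three values should agree with the F column of the output to within rounding.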
69
Kruskal-Wallis One-Way ANOVA
Used to compare continuous variables that are not normally distributed between more than 2 groups.
Nonparametric equivalent to the one-way ANOVA.
Is the length of stay different by ethnicity?
Analyze, nonparametric tests, K independent samples.
70
Repeated-Measures ANOVA
Used to assess the change in 2 or more continuous measurements made on the same person. Can also compare groups and adjust for covariates.
Do changes in the vital signs within the first 24 hours of a hip fracture predict which patients will develop pneumonia?
Analyze, General Linear Model, Repeated Measures.
71
Pearson Correlation
Used to assess the linear association between two continuous variables.
  r=1.0 perfect correlation
  r=0.0 no correlation
  r=-1.0 perfect inverse correlation
Univariate, Association, Interval
72
Correlations

Columns: (1) AGE, (2) 49-DAYS IN HOSPITAL, (3) NUMBER OF COMORBIDITIES (0-9), (4) 43-TOTAL NUMBER OF COMPLICATIONS, (5) 35-SYSTOLIC BLOOD PRESSURE FIRST ER, (6) 35-GLASGOW COMA SCALE FIRST ER, (7) 35-PULSE FIRST ER

                                                              (1)      (2)      (3)      (4)      (5)      (6)      (7)
AGE                                  Pearson Correlation    1.000   .088**   .211**   .137**   .149**    -.030    -.008
                                     Sig. (2-tailed)            .     .007     .000     .000     .000     .356     .809
                                     N                        933      933      933      933      925      926      923
49-DAYS IN HOSPITAL                  Pearson Correlation   .088**    1.000   .167**   .453**     .039     .016     .022
                                     Sig. (2-tailed)         .007        .     .000     .000     .237     .633     .499
                                     N                        933      933      933      933      925      926      923
NUMBER OF COMORBIDITIES (0-9)        Pearson Correlation   .211**   .167**    1.000   .222**     .034   -.079*     .055
                                     Sig. (2-tailed)         .000     .000        .     .000     .296     .017     .093
                                     N                        933      933      933      933      925      926      923
43-TOTAL NUMBER OF COMPLICATIONS     Pearson Correlation   .137**   .453**   .222**    1.000    -.033    -.028     .046
                                     Sig. (2-tailed)         .000     .000     .000        .     .310     .393     .161
                                     N                        933      933      933      933      925      926      923
35-SYSTOLIC BLOOD PRESSURE FIRST ER  Pearson Correlation   .149**     .039     .034    -.033    1.000     .043    .069*
                                     Sig. (2-tailed)         .000     .237     .296     .310        .     .196     .035
                                     N                        925      925      925      925      925      925      923
35-GLASGOW COMA SCALE FIRST ER       Pearson Correlation    -.030     .016   -.079*    -.028     .043    1.000  -.100**
                                     Sig. (2-tailed)         .356     .633     .017     .393     .196        .     .002
                                     N                        926      926      926      926      925      926      923
35-PULSE FIRST ER                    Pearson Correlation    -.008     .022     .055     .046    .069*  -.100**    1.000
                                     Sig. (2-tailed)         .809     .499     .093     .161     .035     .002        .
                                     N                        923      923      923      923      923      923      923

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
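The Pearson correlation coefficient itself is a short calculation: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares. The sketch below uses hypothetical paired values invented for illustration; the function name is not from any package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements on five patients.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 2))
```

With 933 patients, even small coefficients such as .088 reach statistical significance, so the size of r matters as well as its P value.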
73
Spearman rank-order correlation
Used to assess the relationship between two ordinal variables or two skewed continuous variables.
Nonparametric equivalent of the Pearson correlation.
Univariate, Association, Ordinal (or skewed).
74
Correlations (Spearman's rho)

Columns: (1) AGE, (2) 49-DAYS IN HOSPITAL, (3) NUMBER OF COMORBIDITIES (0-9), (4) 43-TOTAL NUMBER OF COMPLICATIONS, (5) 35-SYSTOLIC BLOOD PRESSURE FIRST ER, (6) 35-GLASGOW COMA SCALE FIRST ER, (7) 35-PULSE FIRST ER

                                                                  (1)      (2)      (3)      (4)      (5)      (6)      (7)
AGE                                  Correlation Coefficient    1.000   .089**   .158**   .145**   .091**  -.146**    -.008
                                     Sig. (2-tailed)                .     .007     .000     .000     .005     .000     .806
                                     N                            933      933      933      933      925      926      923
49-DAYS IN HOSPITAL                  Correlation Coefficient   .089**    1.000   .142**   .389**    .073*     .048     .037
                                     Sig. (2-tailed)             .007        .     .000     .000     .027     .149     .268
                                     N                            933      933      933      933      925      926      923
NUMBER OF COMORBIDITIES (0-9)        Correlation Coefficient   .158**   .142**    1.000   .229**     .037  -.091**     .042
                                     Sig. (2-tailed)             .000     .000        .     .000     .257     .006     .202
                                     N                            933      933      933      933      925      926      923
43-TOTAL NUMBER OF COMPLICATIONS     Correlation Coefficient   .145**   .389**   .229**    1.000    -.014   -.076*     .043
                                     Sig. (2-tailed)             .000     .000     .000        .     .676     .020     .196
                                     N                            933      933      933      933      925      926      923
35-SYSTOLIC BLOOD PRESSURE FIRST ER  Correlation Coefficient   .091**    .073*     .037    -.014    1.000    .079*    .080*
                                     Sig. (2-tailed)             .005     .027     .257     .676        .     .017     .015
                                     N                            925      925      925      925      925      925      923
35-GLASGOW COMA SCALE FIRST ER       Correlation Coefficient  -.146**     .048  -.091**   -.076*    .079*    1.000    -.038
                                     Sig. (2-tailed)             .000     .149     .006     .020     .017        .     .252
                                     N                            926      926      926      926      925      926      923
35-PULSE FIRST ER                    Correlation Coefficient    -.008     .037     .042     .043    .080*    -.038    1.000
                                     Sig. (2-tailed)             .806     .268     .202     .196     .015     .252        .
                                     N                            923      923      923      923      923      923      923

**. Correlation is significant at the .01 level (2-tailed).
*. Correlation is significant at the .05 level (2-tailed).
75
Summary of Inferential Tests
76
Unpaired vs. Paired
Unpaired                 Paired
Student’s t-test         Paired t-test
Chi-square               McNemar’s test
One-way ANOVA            Repeated-measures ANOVA
Mann-Whitney U test      Wilcoxon signed-rank
Kruskal-Wallis H test    Friedman ANOVA
77
Parametric vs. Nonparametric
Parametric                                        Nonparametric
Student’s t-test                                  Mann-Whitney U test
One-way ANOVA                                     Kruskal-Wallis test
Paired t-test                                     Wilcoxon signed-rank
Pearson correlation                               Spearman’s r
Correlated F ratio (repeated measures ANOVA)      Friedman ANOVA
78
A Good Rule to Follow
Always check your results with a nonparametric test.
If you test your null hypothesis with a Student’s t-test, also check it with a Mann-Whitney U test.
It will only take an extra 25 seconds.
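The rule can be sketched in a few lines. This is a minimal illustration, not a replacement for SPSS: it computes the pooled-variance Student's t statistic and the Mann-Whitney U statistic with a normal-approximation p-value (no tie or continuity correction, which is an assumption made here to keep the code short). The two samples are invented for the example.

```python
from math import erfc, sqrt


def student_t(a, b):
    """Pooled-variance Student's t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    t = (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def mann_whitney_u(a, b):
    """U statistic for group a and a two-sided normal-approximation p-value."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5  # ties count as half
    na, nb = len(a), len(b)
    mean_u = na * nb / 2
    sd_u = sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_two_sided


# Hypothetical lengths of stay in days; one extreme value in the first group.
treated = [4, 5, 5, 6, 7, 8, 9, 41]
control = [6, 7, 8, 9, 10, 11, 12, 13]

t, df = student_t(treated, control)
u, p = mann_whitney_u(treated, control)
print(f"Student's t = {t:.3f} on {df} df")
print(f"Mann-Whitney U = {u:.1f}, approximate two-sided p = {p:.3f}")
```

When one outlier drives the t-test and the rank-based test tells a different story, that disagreement is exactly what the extra check is meant to catch.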
79
VIII. You Will Need to Understand Regression Techniques
80
Linear Regression
Used to assess how one or more predictor variables can be used to predict a continuous outcome variable.
“Do age, number of comorbidities, or admission vital signs predict the length of stay in the hospital after a hip fracture?”
Multivariate, Association, Interval/Ordinal dependent variable.
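As a rough sketch of what the software does, here is least-squares linear regression with a single predictor. The function name and the data are hypothetical, and a real analysis with several predictors would be run in a statistical package.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from one predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data: number of comorbidities and days in hospital.
comorbidities = [0, 1, 1, 2, 3, 4, 5]
days_in_hospital = [5, 7, 6, 10, 12, 15, 16]

intercept, slope = fit_line(comorbidities, days_in_hospital)
print(f"days = {intercept:.2f} + {slope:.2f} x comorbidities")
```

The slope is read as the expected change in the outcome for a one-unit increase in the predictor.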
81
Coefficients (a)
Model 1                                B            Std. Error   Beta     t        Sig.
(Constant)                             -4.451       18.889                -.236    .814
AGE                                    7.136E-02    .045         .053     1.571    .117
NUMBER OF COMORBIDITIES (0-9)          2.606        .548         .159     4.757    .000
35-SYSTOLIC BLOOD PRESSURE FIRST ER    1.562E-02    .022         .024     .726     .468
35-GLASGOW COMA SCALE FIRST ER         1.067        1.170        .030     .912     .362
35-PULSE FIRST ER                      2.581E-02    .047         .019     .554     .580
35-RESPIRATION RATE FIRST ER           -8.00E-02    .188         -.014    -.425    .671
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: 49-DAYS IN HOSPITAL
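A short sketch of how the unstandardized B coefficients are used: multiply each predictor by its B, add them up, and add the constant. The coefficients are the B values from the SPSS output for this slide; the patient values and the dictionary keys are invented for illustration.

```python
# Unstandardized B coefficients as read from the Coefficients table.
CONSTANT = -4.451
B = {
    "age": 7.136e-02,
    "comorbidities": 2.606,
    "systolic_bp": 1.562e-02,
    "glasgow_coma_scale": 1.067,
    "pulse": 2.581e-02,
    "respiration_rate": -8.00e-02,
}


def predicted_days_in_hospital(patient):
    """Linear prediction: constant plus the sum of B times each predictor value."""
    return CONSTANT + sum(B[name] * patient[name] for name in B)


# Hypothetical patient (values chosen only for illustration).
patient = {
    "age": 80,
    "comorbidities": 2,
    "systolic_bp": 140,
    "glasgow_coma_scale": 15,
    "pulse": 80,
    "respiration_rate": 18,
}

print(f"Predicted length of stay: {predicted_days_in_hospital(patient):.1f} days")
```

Only the number of comorbidities has a small Sig. value in this model, so most of these terms add little reliable information to the prediction.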
82
Logistic Regression
Used to assess the predictive value of one or more variables on an outcome that is a yes/no question.
“Do age, gender, and comorbidities predict which hip fracture patients will develop pneumonia?”
Multivariate, Difference, Nominal dependent variable, not time-dependent, 2 groups.
83
1 Total number of comorbidities
2 Cirrhosis
3 COPD
4 Gender
5 Age
84
Draw Conclusions
We reject the null hypothesis.
Patients who are at high risk of developing pneumonia during their hospitalization for a hip fracture can be identified by:
  total number of pre-existing conditions
  cirrhosis
  COPD
  male gender
85
How this information could be used to predict pneumonia on admission
Z = -4.899 + (number of comorbidities x 0.469) + (cirrhosis x 2.275) + (COPD x 0.714) + (age x 0.021) + (gender [female=1, male=0] x -0.715)
Probability of pneumonia = 1 / (1 + e^(-Z)), where e = 2.718
Example: an 80-year-old male with cirrhosis and one other comorbidity (but not COPD).
Z = -4.899 + (2 x 0.469) + (1 x 2.275) + (0 x 0.714) + (80 x 0.021) + (0 x -0.715) = -0.006
Probability of pneumonia = 1 / (1 + e^(0.006)) = 0.499, or about a 50% chance of developing pneumonia.
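The calculation can be checked with a few lines of code. This is a minimal sketch using the coefficients printed on the slide; the function name and argument names are made up. With those coefficients, the example patient's Z comes to about -0.006, which corresponds to a probability of roughly 0.50.

```python
from math import exp


def pneumonia_probability(n_comorbidities, cirrhosis, copd, age, female):
    """Return (Z, probability) from the logistic model on the slide.

    cirrhosis, copd and female are coded 1 for yes and 0 for no.
    """
    z = (
        -4.899
        + 0.469 * n_comorbidities
        + 2.275 * cirrhosis
        + 0.714 * copd
        + 0.021 * age
        - 0.715 * female
    )
    probability = 1 / (1 + exp(-z))
    return z, probability


# The slide's example: 80-year-old male, cirrhosis plus one other
# comorbidity (2 in total), no COPD.
z, p = pneumonia_probability(n_comorbidities=2, cirrhosis=1, copd=0, age=80, female=0)
print(f"Z = {z:.3f}, probability of pneumonia = {p:.3f}")
```

A Z of 0 always maps to a probability of 0.5, positive Z to more than 0.5, and negative Z to less, which gives a quick sanity check on any hand calculation.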
86
Survival Analysis
Kaplan-Meier method
  Used to plot cumulative survival
Log-rank test
  Used to compare survival curves
Cox proportional-hazards
  Used to adjust for covariates in survival analysis
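For a feel of what the Kaplan-Meier method computes, here is a minimal sketch: at each time where an event occurs, cumulative survival is multiplied by the fraction of at-risk patients who did not have the event, and censored patients simply leave the at-risk group. The function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Hypothetical follow-up in months; 0 marks a censored patient.
months = [1, 2, 3, 4, 5]
died = [1, 0, 1, 1, 0]

for t, s in kaplan_meier(months, died):
    print(f"month {t}: cumulative survival = {s:.3f}")
```

Plotting these points as a step function gives the familiar survival curve; comparing two such curves is the job of the log-rank test.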
87
Odds and Ends You Will Need
88
95% Confidence Intervals
A 95% confidence interval is an estimate that you make from your sample as to where the true population value lies.
If your study were to be repeated 100 times, you would expect the 95% CIs to contain the true value for the population in about 95 of these 100 studies.
  The value might be a mean, percentage, or RR.
Confidence intervals should be included in publications for the major findings of the study.
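The "repeat the study" idea can be simulated. This sketch assumes a normally distributed measurement with a known true mean, and uses the large-sample formula mean +/- 1.96 standard errors (an approximation; small samples would use the t distribution). All names and numbers are made up for illustration.

```python
import random
from math import sqrt


def mean_ci_95(sample):
    """Large-sample 95% CI for a mean: mean +/- 1.96 standard errors."""
    n = len(sample)
    mean = sum(sample) / n
    sd = sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    half_width = 1.96 * sd / sqrt(n)
    return mean - half_width, mean + half_width


def coverage(true_mean=50.0, true_sd=10.0, n=100, studies=1000, seed=1):
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = mean_ci_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / studies


print(f"Share of simulated studies whose CI contains the true mean: {coverage():.3f}")
```

The share printed should land close to 0.95, though it will wobble a little from one seed to the next.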
89
Prevalence vs. Incidence
Prevalence
  How many of you now have the flu?
Incidence
  How many of you have had the flu in the past year?
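In numbers, the two questions become two different fractions. A small sketch with an invented audience of 200 people; the function names are hypothetical, and the incidence shown is the simple cumulative incidence over one year.

```python
def prevalence(current_cases, population):
    """Proportion of the population that has the condition right now."""
    return current_cases / population


def cumulative_incidence(new_cases, at_risk_at_start):
    """Proportion of initially at-risk people who developed the condition during the period."""
    return new_cases / at_risk_at_start


# Hypothetical audience: 200 people, 4 have the flu today,
# 30 came down with it at some point in the past year.
print(f"Prevalence today: {prevalence(4, 200):.1%}")
print(f"One-year incidence: {cumulative_incidence(30, 200):.1%}")
```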
90
Random
Random is not the same as haphazard, unplanned, incidental.
Allocating patients to the treatment group on even days and to the control group on odd days is systematic, not random.
Random refers to the idea that each element in a set has an equal probability of occurrence.
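A minimal sketch of simple randomization, in which every patient has the same 50/50 chance of each arm regardless of the day of enrollment. The function name and patient IDs are hypothetical; a real trial would typically use a prepared, concealed allocation schedule.

```python
import random


def simple_randomization(patient_ids, seed=None):
    """Assign each patient to 'treatment' or 'control' with equal probability."""
    rng = random.Random(seed)  # a fixed seed makes the list reproducible for auditing
    return {pid: rng.choice(["treatment", "control"]) for pid in patient_ids}


allocation = simple_randomization(range(1, 11), seed=2024)
for pid, arm in allocation.items():
    print(pid, arm)
```

Note that simple randomization does not guarantee equal group sizes; it only guarantees that no one, including the investigator, can predict the next assignment.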
91
Improving an RCT
See the handout, Table 3-2, pages 18-19.
“Checklist to Be Used by Authors When Preparing or by Readers When Analyzing a Report of a Randomized Controlled Trial”.
92
IX. You Will Need to Continue Learning About Statistics
93
Recommended books on statistics
Kuzma - Statistics in the Health Sciences
Norusis - Data Analysis with SPSS
Altman - Statistics with Confidence
Friedman - Fundamentals of Clinical Trials
Pagano - Principles of Biostatistics
Encyclopedia of Biostatistics
SPSS manuals
94
Future Workshops
95
Future CRC Workshops
Oct 11 - How to use wireless hand-helds for clinical research (Paul St Jacques, MD, Anesthesiology)
Oct 18 - How to conduct ANOVA statistical tests - Part 1/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)
Oct 25 - How to conduct ANOVA statistical tests - Part 2/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)
Nov 1 - How to conduct ANOVA statistical tests - Part 3/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)
Nov 8 - How to write a data and safety-monitoring plan (Harvey Murff, MD)
96
X. One Final Skill You Will Need to Master
97
A response to the comment: “You’re comparing apples and oranges”
“No, this is comparing apples and oranges!”