
Basics of Biostatistics for Health Research
Session 3 – February 21, 2013

Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry

[email protected]


Some General Principles of Data Analysis

• Data cleaning and checking is always the first step after data entry.

• Start with “univariate” analysis (frequencies and their CIs).

• Progress to “bivariate” analysis: does a “dependent” variable differ depending on an “independent” variable?

• The next stage is “multivariate” analysis.


Statistical Errors


• Go to www.ucalgary.ca/~patten

• Scroll to the bottom.

• Right click to download the files described as being “for PGME Students”
– One is a dataset
– One is a data dictionary

• Save them on your desktop.


Open the Datafile


Comparing Proportions

• We’ve looked at two procedures (e.g. for obesity in men vs. women):

generate obese = bmi

recode obese 0/30=0 30.001/1000=1

prtest obese, by(sex)


Generate Commands Using Logic

generate obese2 = .

recode obese2 .=0 if bmi <= 30

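* Stata treats a missing value as larger than any number, so exclude missing bmi explicitly
recode obese2 .=1 if bmi > 30 & !missing(bmi)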

tab obese obese2

prtest obese2, by(sex)
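The same indicator can also be built in a single line from a logical expression. This is a minimal sketch rather than the method used in the session: obese4 is a hypothetical variable name, and the “if !missing(bmi)” qualifier is there because Stata treats a missing value as larger than any number, so “bmi > 30” on its own would be true for respondents with no BMI recorded.

```stata
* obese4 is a hypothetical name chosen for this sketch
* (bmi > 30) evaluates to 1 when true and 0 when false
generate obese4 = (bmi > 30) if !missing(bmi)

* cross-check against the recoded version, showing missing values too
tabulate obese obese4, missing
```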


Generate as a Recode Subcommand

recode bmi (0/30=0) (30.01/1000=1), gen(obese3)

tab obese obese3


Alternative to prtest

• Can use tab with the subcommand “exact”

tab obese sex, exact


Epitab Commands


Risk Ratios

RR = (“risk” in the “exposed”) / (“risk” in the “non-exposed”)


Odds Ratios

OR = (odds in the “exposed”) / (odds in the “non-exposed”)


Measures of Association

• The most common ones are ratios:
– RR
– OR
– PR
– IR

• You’ll sometimes see differences as well:
– Risk Difference


Another Alternative…

• The “cs” command is for “cross-sectional” and will give you risk ratios or risk differences

• However, it requires 0 and 1 values.

recode sex (1=0) (2=1), gen(female)

cs obese female, exact


Odds and Proportions

• In our sample, there are…
– 1,560 obese
– 10,015 non-obese
– (and 52 missing)

• The frequency of obesity (prevalence) is 1,560/(1,560 + 10,015)

• The odds are: 1,560/10,015
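Both quantities can be checked with Stata's display command, which works as a calculator. A minimal sketch using only the counts on this slide; the rounded values in the comments are approximate.

```stata
* prevalence: obese / (obese + non-obese), roughly 0.135
display 1560/(1560 + 10015)

* odds: obese / non-obese, roughly 0.156
display 1560/10015

* the two are linked by odds = p/(1 - p); this reproduces the odds from the prevalence
display (1560/11575)/(1 - 1560/11575)
```

Because obesity is fairly uncommon in this sample, the odds and the prevalence are close; they diverge as the condition becomes more common.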


Odds and Proportions

• In other words…
– If ‘a’ means “have the disease” and ‘b’ means “does not have the disease” then…
– Proportion = a / (a + b)
– Odds = a / b


Another Alternative…

• The “cc” command is for “case-control” and will give you odds ratios

• However, it requires 0 and 1 values.

cc obese female, exact


A Task for You…

• What is the prevalence of diabetes? (Provide a 95% confidence interval for your estimate.)

• What is the prevalence of diabetes in men and women? (Hint: use “by” in the dialogue box.)

• What is the odds ratio for the association of diabetes and obesity?

• What is the risk ratio for the association of diabetes and obesity?

• Is the association statistically significant?
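One possible way to approach these questions from the command line, reusing commands that appear in this session. It is a sketch, not a worked answer: it assumes the obese indicator created earlier and the diabetes and sex variables from the dataset, and it uses the ci syntax shown in these slides (newer Stata releases write it as “ci proportions diabetes, exact”).

```stata
* prevalence of diabetes with an exact 95% confidence interval
ci diabetes, exact

* the same estimate separately for men and women
bysort sex: ci diabetes, exact

* odds ratio for diabetes (outcome) by obesity (exposure)
cc diabetes obese, exact

* risk ratio for the same association
cs diabetes obese, exact
```

With the exact option, both cc and cs also report Fisher's exact p-values, which speak to the question of statistical significance.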


A More Complex Problem..

• The prevalence of obesity is said to be associated with lower levels of education


Two-way Tables


A Two-way Table

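. tabulate obese educ, chi2

           |  0-11 years, hs or ged, some coll, coll grad+
     obese |       1        2        3        4 |    Total
-----------+------------------------------------+---------
         0 |   3,850    2,978    1,721    1,229 |    9,778
         1 |     813      417      162      112 |    1,504
-----------+------------------------------------+---------
     Total |   4,663    3,395    1,883    1,341 |   11,282

Pearson chi2(3) = 136.4094   Pr = 0.000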
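Adding the column option to tabulate reports the percentage obese within each education level, which makes the gradient easier to read than raw counts. The sketch below uses the immediate form tabi with the cell counts from this table, so it runs without the dataset in memory; the percentages in the comments are computed from those counts and rounded.

```stata
* rows: obese = 0, obese = 1; columns: educ = 1 to 4
tabi 3850 2978 1721 1229 \ 813 417 162 112, column chi2

* percent obese by education level, from the counts above:
*   educ 1: 813/4663 = about 17.4%
*   educ 2: 417/3395 = about 12.3%
*   educ 3: 162/1883 = about  8.6%
*   educ 4: 112/1341 = about  8.4%
```

With the dataset open, “tabulate obese educ, column chi2” should give the same column percentages.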


Bar Graphs

• It is under the graphics menu, the dialogue box…


Select Categories..


[Bar graph: mean of obese (vertical axis, 0 to .2) by education category (1 to 4)]
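The dialogue-box steps correspond roughly to the command below. This is a sketch of one equivalent; the exact options chosen in the session's dialogue box are not recoverable from the slides.

```stata
* mean of the 0/1 obese indicator (that is, the proportion obese) for each education level
graph bar (mean) obese, over(educ)
```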


Histograms, with “by”

• The pattern of obesity by education is different than that of mean BMI.

• Your Task: use the “by” subcommand with the histogram command to look at the distribution of BMI by education.
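One way to write this from the command line, assuming the bmi and educ variables used elsewhere in the session:

```stata
* one histogram panel of bmi per education level
histogram bmi, by(educ)
```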


Does BMI Differ by Education?

• If we had two groups we’d use a t-test.

• Our null would be Mean(1) = Mean(2), or as Stata says: Mean(1) – Mean(2) = 0

• But we have > 2 groups, so we could try to use ANOVA
– Can think of this test as an extension of the two-group t-test
– Assumes normal distribution and equal variances (like the t-test, it is “parametric”)


One-Way ANOVA


STATA Warns of a Problem

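. oneway bmi educ

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      4893.83027      3   1631.27676     99.64     0.0000
 Within groups      184631.134  11278    16.370911
------------------------------------------------------------------------
    Total           189524.965  11281   16.8003692

Bartlett's test for equal variances:  chi2(3) = 195.1798  Prob>chi2 = 0.000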
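The warning is the Bartlett's test line of this output: it rejects the hypothesis of equal variances across education groups, which is one of the ANOVA assumptions. Two ways to look further are sketched below; neither command appears in the original slides.

```stata
* means, standard deviations and counts of bmi for each education level
oneway bmi educ, tabulate

* Levene's and Brown-Forsythe tests of equal variances,
* which are less sensitive to non-normality than Bartlett's test
robvar bmi, by(educ)
```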


The Kruskal-Wallis Test


Kruskal-Wallis Output

. kwallis bmi, by(educ)

Kruskal-Wallis equality-of-populations rank test

  educ |   Obs |  Rank Sum
  -----+-------+----------
     1 |  4663 |  2.91e+07
     2 |  3395 |  1.83e+07
     3 |  1883 |  9.29e+06
     4 |  1341 |  6.93e+06

chi-squared =   294.007 with 3 d.f.
probability =     0.0001

chi-squared with ties =   294.008 with 3 d.f.
probability =     0.0001


Non-Parametric Tests

• Kruskal-Wallis and its 2 sample version (Wilcoxon Rank Sum Test) require that…
– The variable can be meaningfully ordered, and
– Has a roughly/loosely bell shaped frequency distribution (should have a central tendency)

• Your task: Repeat our analysis from last week in which we compared BMI in men and women, but use Kruskal-Wallis and Wilcoxon’s Rank Sum test.
– Do you get equivalent results?
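A sketch of the commands involved, assuming the bmi and sex variables used earlier. The ttest line is included only for comparison and assumes that last week's analysis was a two-group t-test.

```stata
* Wilcoxon rank-sum (Mann-Whitney) test for two groups
ranksum bmi, by(sex)

* Kruskal-Wallis test applied to the same two groups
kwallis bmi, by(sex)

* parametric comparison, for reference (assumed to be last week's approach)
ttest bmi, by(sex)
```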


Comparing Proportions?
• Yes: Fisher’s Exact Test
• No: Parametric Assumptions?
– Yes: Multiple Groups?
     Yes: ANOVA
     No: t-test
– No: Multiple Groups?
     Yes: Kruskal-Wallis
     No: Wilcoxon’s Rank Sum


Prevalence of Diabetes

. tab diabetes

   diabetic |
        y/n |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,097       95.44       95.44
          1 |        530        4.56      100.00
------------+-----------------------------------
      Total |     11,627      100.00

Your Task: try this command: cii 11627 530, exact

(Does your estimate resemble what you get with ci diabetes, exact?)
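In Stata 14 and later, both the immediate and the variable forms of this command take a proportions keyword; the form on the slide is the older syntax. A sketch of the newer equivalents, with display used to check the point estimate:

```stata
* point estimate: 530 diabetics out of 11,627, roughly 0.0456
display 530/11627

* exact (Clopper-Pearson) binomial confidence interval, newer syntax
cii proportions 11627 530, exact

* the same interval computed from the variable itself, newer syntax
ci proportions diabetes, exact
```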


The CI Calculator


The “CC” Calculator


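Your Task: Try the cci command to obtain the OR

. cc diabetes prevchd
                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       103         427  |        530       0.1943
        Controls |       739       10358  |      11097       0.0666
-----------------+------------------------+------------------------
           Total |       842       10785  |      11627       0.0724
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         3.380966       |     2.66545     4.25952 (exact)
 Attr. frac. ex. |         .7042266       |    .6248288    .7652318 (exact)
 Attr. frac. pop |         .1368591       |
                 +-------------------------------------------------
                               chi2(1) =   122.89  Pr>chi2 = 0.0000

Your Task: Can you reproduce these CIs with an immediate command?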
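A sketch of the immediate form, using the four cell counts from the output above. The display line recomputes the odds ratio by hand as a cross-check.

```stata
* cci takes the four cells in this order:
*   exposed cases, unexposed cases, exposed controls, unexposed controls
cci 103 427 739 10358, exact

* the odds ratio as a cross-product ratio, about 3.38
display (103*10358)/(427*739)
```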


Diagnostic Test Metrics

• Sensitivity

• Specificity

• Positive Predictive Value

• Negative Predictive Value


Common Notation for Test Metrics


Formulas for Test Metrics…

Let’s make formulas for Se, Sp, PPV and NPV using this terminology.
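The notation slide did not survive in this transcript, so the formulas below assume the usual 2×2 layout of test result against true disease status: TP and FN are the diseased people who test positive and negative, and FP and TN are the non-diseased people who test positive and negative. These symbols are an assumption and may differ from those used in the session.

```latex
\begin{aligned}
\mathrm{Se}  &= \frac{TP}{TP + FN} &\qquad \mathrm{Sp}  &= \frac{TN}{TN + FP} \\
\mathrm{PPV} &= \frac{TP}{TP + FP} &\qquad \mathrm{NPV} &= \frac{TN}{TN + FN}
\end{aligned}
```

Se and Sp condition on true disease status, while PPV and NPV condition on the test result, so the predictive values also depend on how common the disease is in the sample.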


(In Class) Assignment for Today

• Our database has random blood glucose (they call it “casual”)

• In these units (mg/dl) about 140 may be used as a cut-point for an “elevated” level

• Create a variable for “elevated” glucose and determine its Se, Sp, PPV and NPV as a diagnostic test for diabetes

• Calculate a confidence interval for each parameter.
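One possible outline for the assignment, offered as a sketch rather than a model answer. The variable name glucose is hypothetical (check the data dictionary for the real name), treating values of 140 or more as “elevated” is an assumption about where the cut-point falls, and the helper variables are names invented here. The ci syntax follows the form used in these slides; newer Stata releases use “ci proportions”.

```stata
* glucose is a hypothetical variable name; 140 mg/dl or more counts as elevated (assumption)
generate elevated = (glucose >= 140) if !missing(glucose)
tabulate elevated diabetes, row column

* sensitivity: proportion with elevated glucose among diabetics
ci elevated if diabetes == 1, exact

* specificity: proportion without elevated glucose among non-diabetics
generate notelevated = 1 - elevated
ci notelevated if diabetes == 0, exact

* PPV: proportion diabetic among those with elevated glucose
ci diabetes if elevated == 1, exact

* NPV: proportion non-diabetic among those without elevated glucose
generate nondiabetic = 1 - diabetes
ci nondiabetic if elevated == 0, exact
```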