Chapter 1 Basic Biostatistics
12/8/2014 1 Chapter 1 Basic Biostatistics Jamalludin Ab Rahman MDMPH Department of Community Medicine Kulliyyah of Medicine Content 12 August, 2014 www.jamalrahman.net 2 Basic premises – variables, level of measurements, probability distribution Descriptive statistics Inferential statistics, hypothesis testing


Page 1: Chapter 1 Basic Biostatistics

12/8/2014

Chapter 1 Basic Biostatistics
Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine

12 August, 2014 www.jamalrahman.net

Content

Basic premises – variables, level of measurements, probability distribution
Descriptive statistics
Inferential statistics, hypothesis testing

We observe, we believe. What we observe might not be the truth

… and we can't observe all. We sample.

Population → Parameter
Sample → Statistic

A parameter describes the population; a statistic describes the sample.

Variable

Characteristic of a population
Can take different values
Data = measurements collected/observed

Type of data (Level of measurement)

Categorical
  Nominal: e.g. Gender, Race
  Ordinal: e.g. Cancer staging, Severity of CXR for PTB

Numerical
  Discrete: e.g. Parity, Gravida
  Continuous: e.g. Hb, RBS, cholesterol

Normal distribution

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Other distributions

Discrete vs. continuous probability distribution

χ², F, Weibull, Binomial & Poisson

Characteristics of Normal distribution

Smooth, symmetrical (around the mean), uni-modal, bell-shaped curve
Mean = Median = Mode
Skewness = 0
Kurtosis = 0 (excess kurtosis; the raw kurtosis of a Normal distribution is 3)

Test of Normality

Anderson–Darling test
Corrected Kolmogorov–Smirnov test (Lilliefors test)
Cramér–von Mises criterion
D'Agostino's K-squared test
Jarque–Bera test
Pearson's chi-square test
Shapiro–Francia test
Shapiro–Wilk test

Use Normality test with caution

Small samples almost always pass a normality test. Normality tests have little power to tell whether or not a small sample of data comes from a Gaussian distribution.

With large samples, minor deviations from normality may be flagged as statistically significant, even though small deviations from a normal distribution won't affect the results of a t test or ANOVA.
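To make skewness, kurtosis and a normality test concrete, here is a minimal sketch of the Jarque–Bera test, one of the tests named in these slides. It uses simple population moments, and the `ages` list is invented for illustration. Statistical packages often apply small-sample adjustments, so their figures may differ slightly.

```python
import math

def shape_moments(values):
    """Return (skewness, excess kurtosis) from population (biased) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a Normal distribution
    return skewness, excess_kurtosis

def jarque_bera(values):
    """Jarque-Bera statistic with its asymptotic chi-square (df = 2) p-value."""
    n = len(values)
    skewness, excess_kurtosis = shape_moments(values)
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square upper tail for df = 2
    return jb, p_value

# Hypothetical sample, invented for illustration only
ages = [29, 31, 33, 35, 37]
jb, p = jarque_bera(ages)
print(f"JB = {jb:.3f}, p = {p:.3f}")
```

With only five observations the p-value is large, which illustrates the caution above: a small sample passing the test says little about the underlying distribution.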

Statistical objectives

1. Determine presence of difference (or similarity)
2. Determine degree of difference
3. Determine the direction of changes (outcome)
4. Predict changes (outcomes)

A B C

Is there any difference between A & B?
How big is the difference between A & B?
Which one is taller? A or B?
Is C different from A & B?
Is there any pattern now?
If there is a D, can you predict how tall D will be?

Association

a value and whose associated value may be changed

Independent (e.g. Smoking) → Dependent (e.g. Lung Cancer)

Disease model

Exposure, Exposure, Exposure → Outcome (over Time)

Disease model (example)

Smoking, Mineral Dust, Age → Lung Cancer (over Time)

Causal relationship

Factor 1, Factor 2, Factor 3, Factor 4, Factor 5, Factor 6, Factor 7 → Outcome

Descriptive statistics

Data
  Categorical: Frequency (count) & Percentage
  Numerical
    Normal: Mean (SD)
    Not Normal: Median (Range/IQR)
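The choice of summary above can be sketched in a few lines of Python. This is only an illustration using the standard library. The function names are made up for this example, and different software uses slightly different quartile definitions, so an IQR computed here may not match SPSS exactly.

```python
import statistics
from collections import Counter

def describe_numerical(values):
    """Summaries for numerical data: mean (SD) if Normal, median (IQR/range) if not."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "range": (min(values), max(values)),
    }

def describe_categorical(values):
    """Frequency (count) and percentage for each category."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, 100.0 * count / total)
            for category, count in counts.items()}

# Hypothetical data, invented for illustration only
hb = [11.2, 12.5, 13.1, 13.8, 14.0, 14.6, 15.3]
gender = ["Male", "Female", "Female", "Male", "Female"]
print(describe_numerical(hb))
print(describe_categorical(gender))
```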

Hypothesis Testing

Involves more than one variable
- exposure & outcome,
- predictor & criterion,
- risk & disease

Try to prove that Exposure causes the Disease, e.g. Smoking causing Lung Cancer

Example ~ H0: No difference in the risk of getting Lung Cancer between smokers and non-smokers

               Lung Cancer    No Lung Cancer
Smoking        20 (18.2%)     90 (81.8%)
Not Smoking    5 (4.5%)       105 (95.5%)

χ² (df = 1) = 10.150, p = 0.001, OR = 4.7 (95% CI 1.7 – 13.0)

Because p < 0.05, we reject H0. Therefore there is a difference between smokers & non-smokers
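The figures in the smoking example can be reproduced approximately by hand. The sketch below assumes the uncorrected Pearson chi-square and a Wald (log-based) confidence interval for the odds ratio. The slide does not say which interval method was used, so the last digit of the upper limit may differ slightly.

```python
import math

def two_by_two(a, b, c, d):
    """a, b = exposed with / without disease; c, d = unexposed with / without disease."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, df = 1
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.96  # two-sided 95% interval
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    return chi2, p_value, odds_ratio, ci

# Counts taken from the table in the example
chi2, p, odds_ratio, (lower, upper) = two_by_two(20, 90, 5, 105)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, OR = {odds_ratio:.1f} "
      f"(95% CI {lower:.1f} to {upper:.1f})")
```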

Statistical Test

Univariate ~ One dependent & one independent variable
Multivariate ~ Multiple dependent & multiple independent variables

What test to use?

Variable 1                       Variable 2                         Test
Categorical                      Categorical                        Chi-square
Categorical (2 pop)              Numerical (Normal)                 Independent sample t-test
Categorical (2 pop)              Numerical (Not Normal)             Mann-Whitney U test
Categorical (>2 pop)             Numerical (Normal)                 One-way ANOVA
Categorical (>2 pop)             Numerical (Not Normal)             Kruskal-Wallis test
Numerical (Normal)               Numerical (Normal)                 Pearson Correlation Coefficient Test
Numerical (Normal/Not Normal)    Numerical (Not Normal)             Spearman Correlation Coefficient Test
Numerical (Normal)               Numerical (Normal) – Paired        Paired t-test
Numerical (Not Normal)           Numerical (Not Normal) – Paired    Friedman test
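The table above can be read as a small decision rule. The following sketch encodes it as a function. The function name and argument names are hypothetical, chosen only for this illustration, and the returned labels simply follow the table.

```python
def choose_test(kind1, kind2, groups=2, normal=True, paired=False):
    """Suggest a bivariable test following the table above.

    kind1, kind2: "categorical" or "numerical"
    groups: number of populations when one variable is categorical
    normal: whether the numerical variable(s) are Normally distributed
    paired: whether two numerical measurements come from the same subjects
    """
    kinds = {kind1, kind2}
    if kinds == {"categorical"}:
        return "Chi-square"
    if kinds == {"categorical", "numerical"}:
        if groups == 2:
            return "Independent sample t-test" if normal else "Mann-Whitney U test"
        return "One-way ANOVA" if normal else "Kruskal-Wallis test"
    if kinds == {"numerical"}:
        if paired:
            return "Paired t-test" if normal else "Friedman test"
        return "Pearson correlation" if normal else "Spearman correlation"
    raise ValueError("kind must be 'categorical' or 'numerical'")

# Example: gender (2 groups) against a non-Normal numerical outcome
print(choose_test("categorical", "numerical", groups=2, normal=False))
```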

But life is not simple!

Outcome: Lung Cancer

Factor: Smoking – No. of cigarettes smoked per day
Factor: Exposure to mineral dust – PM2.5, PM10 (µg/m3)
Factor: Radiation – Radiation absorbed dose (mGy) per day

The multivariate model

Lung CA = Smoking + Radiation + Mineral dust + Others

Regression: Smoking + Radiation + Mineral dust
Residual: Others

A good model is when Regression > Residual
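The comparison of Regression and Residual can be illustrated by splitting the variation of an outcome into the part explained by a fitted line and the part left over. This is a deliberately simplified sketch with a single predictor (the model above has several), and the data are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sums_of_squares(x, y):
    """Return (regression SS, residual SS); together they add up to the total SS."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    predicted = [intercept + slope * xi for xi in x]
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)
    ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    return ss_regression, ss_residual

# Hypothetical data: cigarettes per day vs. an invented risk score
cigarettes = [0, 5, 10, 15, 20]
risk_score = [1.0, 2.1, 2.9, 4.2, 4.8]
ss_reg, ss_res = sums_of_squares(cigarettes, risk_score)
print(f"Regression SS = {ss_reg:.3f}, Residual SS = {ss_res:.3f}")
```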

Multivariate Analysis

Hypothesis testing & control for confounders – e.g. General Linear Model, Logistic Regression
Modeling – e.g. Linear Regression
Data reduction – e.g. Factor Analysis, Cluster Analysis

Writing plan for statistical analysis #1

Data were analyzed using the complex sample function of SPSS (version 13.0). Sampling errors were estimated using the primary sampling units and strata provided in the data set. Sampling weights were used to adjust for nonresponse bias and the oversampling of blacks, Mexican Americans, and the elderly in NHANES. The prevalence of hypertension, as well as the awareness, treatment, and control rates, were age adjusted by direct standardization to the US 2000 standard population. To analyze differences over time, the 2003–2004 data were compared with the 1999–2000 data. Estimates with a coefficient of variation >0.3 were considered unreliable. A 2-tailed P value <0.05 was considered statistically significant. (Ong et al. 2009)

Writing plan for statistical analysis #2

To assess the effect of the selection process on the characteristics of the cases, we compared cases included in the final analysis to the rest of the cases. Since controls included in the present analysis were different from the rest of the diabetes free participants by design, no similar comparisons were performed for that group. To compare baseline characteristics of cases and controls appropriate univariate statistics were used. Similar binary logistic and multiple linear regression models were built with incident diabetes or HbA1c as respective outcomes and additive block entry of adiponectin and potential confounders. For linear regression CRP and triglycerides were log transformed. Since HbA1c could be modified by drug treatment, we ran a sensitivity analysis excluding all participants on antidiabetic medication. A p-value of <0.05 was considered significant. Analyses were performed with SPSS 14.0 for Windows.

Reporting analysis (example)

Summary

1. Identify & define variables
2. Type – independent vs. dependent
3. Level of measurements – nominal, ordinal or continuous
4. Check distribution – Normal vs. Not Normal
5. Decide what to do - descriptive vs. analytical

Chapter 2 Introduction to SPSS
IBM SPSS Statistics v21 for Windows

Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine

IBM SPSS Statistics

IBM Corporation Software Group, Route 100, Somers, NY 10589. Produced in the United States of America, May 2012

SPSS is NOT Statistical Package for Social Sciences anymore!!!

SPSS LAYOUT

Layout

Main menu
Toolbar
Variables

Data editor

Variable View: rows = variables. Define & describe your variables here
Data View: rows = each data. Enter your data here

Viewer

The output of analyses will be displayed here. Output is separated from data

Syntax

We can compile all the steps of the analyses here. Extend the programming function in SPSS. Ability to perform complex steps e.g. “looping”

CREATING DATASET

Before you even start SPSS!

You must identify & define relevant variables. Define means:

1. Name – preferably short single name, begins with alphabet, no special character, no space
2. Type of data – e.g. Numeric, Date, String
3. Width & Decimal Places (if numeric)
4. Label – description for the Name (will be displayed in Viewer)
5. Values – labels for value e.g. 1=Male, 2=Female
6. Missing – define missing value e.g. 999 for N/A

Define your variables

Variable Types

Variable Type

Decide the suitable variable type
For numeric, determine Width & Decimal. Decimal < Width
For String, no option for Decimal Place
New option for Numerics with leading zeros

Value Labels

Missing Value

This question is Not Applicable to male
e.g. Assign 999 to represent N/A value & this won't be included in any analysis
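Outside SPSS, the same idea has to be applied by hand: values coded as missing must be dropped before any statistic is computed. The sketch below assumes 999 is the user-defined missing code, as in the example, and the `parity` list is invented for illustration.

```python
import statistics

MISSING_CODE = 999  # user-defined missing value, as in the example above

def valid_values(values, missing_code=MISSING_CODE):
    """Return only the observations that are not coded as missing."""
    return [v for v in values if v != missing_code]

# Hypothetical variable where 999 marks "Not Applicable"
parity = [2, 0, 999, 3, 999, 1]

print(statistics.mean(parity))                # wrong: 999 is treated as a real value
print(statistics.mean(valid_values(parity)))  # mean of the valid values only
```

Forgetting to exclude the code inflates the mean badly, which is why the missing value should be declared when the variable is defined.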

Chapter 3 Descriptive Statistics
IBM SPSS Statistics v22 for Windows

Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine

Dataset for the exercise

Filename: bp.sav
Hypothetical
Study to describe factors related to blood pressure
N = 150
13 variables

Retrieve file information

Exercise #1

1. Describe socio-demographic characteristics of the respondent
2. Describe the explanatory variables
   1. Smoke
   2. BMI
   3. Glucose
   4. Cholesterol
3. Measure prevalence of high blood pressure (using SBP ≥ 140, DBP ≥ 90)

DESCRIBE NUMERICAL DATA

Explore

Results

Check for Normality. Is Age data distributed Normally?

Describe age

Normal: The subjects were aged between 18 and 56 years, with a mean of 38 (SD = 8) years.

If not Normal: The subjects were aged between 18 and 56 years, with a median of 38 (IQR = 10) years.

DESCRIBE CATEGORICAL DATA

Frequency

Result

TRANSFORM

Compute

weight / ((height / 100) ** 2)
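The Compute expression above is the usual body mass index formula. As a sketch, the same calculation in Python looks like this, assuming weight is recorded in kilograms and height in centimetres (the division by 100 suggests centimetres, but the slide does not state the units).

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m2, mirroring weight / ((height / 100) ** 2)."""
    return weight_kg / ((height_cm / 100) ** 2)

# Hypothetical subject, invented for illustration
print(round(bmi(70, 175), 1))
```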

Visual binning

Normal: < 23
Overweight: 23 to < 27.5
Obese: >= 27.5
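The cut-points used for binning can be written as a small function. This is only an illustration of the same categories; the function name is made up for this example.

```python
def bmi_status(bmi_value):
    """Classify BMI using the cut-points above: < 23, 23 to < 27.5, >= 27.5."""
    if bmi_value < 23:
        return "Normal"
    if bmi_value < 27.5:
        return "Overweight"
    return "Obese"

# Values on each side of the boundaries
for value in (22.9, 23.0, 27.4, 27.5):
    print(value, bmi_status(value))
```

Checking values exactly at the boundaries is a quick way to confirm that each cut-point falls in the intended category.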

Chapter 4 Bivariable analyses
IBM SPSS Statistics v21 for Windows

Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine

To determine association of two variables?

BP and Age

The steps

1. Determine which is dependent & which is independent
2. Determine level of measurements
3. Check Normality of the numerical measurement
4. Determine the suitable statistical test

What are the tests?

Variable 1                       Variable 2                         Test
Categorical                      Categorical                        Chi-square
Categorical (2 pop)              Numerical (Normal)                 Independent sample t-test
Categorical (2 pop)              Numerical (Not Normal)             Mann-Whitney U test
Categorical (>2 pop)             Numerical (Normal)                 One-way ANOVA
Categorical (>2 pop)             Numerical (Not Normal)             Kruskal-Wallis test
Numerical (Normal)               Numerical (Normal)                 Pearson Correlation Coefficient Test
Numerical (Normal/Not Normal)    Numerical (Not Normal)             Spearman Correlation Coefficient Test
Numerical (Normal)               Numerical (Normal) – Paired        Paired t-test
Numerical (Not Normal)           Numerical (Not Normal) – Paired    Friedman test

Exercise #2

1. Determine association between socio-demographic characteristics & all the risk factors with BP
2. Determine association between socio-demographic characteristics & all the risk factors with HCY

Note: It would be good if you could construct dummy tables for the answers even before the analyses start

HCY normal range

INDEPENDENT SAMPLE t-TEST

COMPARING TWO MEANS

Age vs. BP

Results

Levene's test checks equality of variances. H0: There is no difference of variances. So if P is significant, we reject H0, and therefore equal variances are not assumed.

The original t-test (Student's t-test) assumes equal variances. If the variances are equal, it is robust to different sample sizes. If they are not, use the row with Welch's correction (equal variances not assumed).

Table – Distribution of age by blood pressure status

             N      Mean    SD     Statistics    df     P
Normal BP    156    33.9    7.9    t = 0.431     299    0.667
High BP      145    34.4    8.9
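The t statistic and its degrees of freedom can be approximated from the summary figures in the table. The sketch below shows both the pooled (Student) version and Welch's version. Because the table reports rounded means and SDs, the computed t will not match the reported 0.431 exactly; only the degrees of freedom (299) can be expected to agree.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Student's t-test from summary statistics (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / se, df

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t-test from summary statistics (equal variances not assumed)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary figures from the table (Normal BP first, High BP second)
print(pooled_t(156, 33.9, 7.9, 145, 34.4, 8.9))
print(welch_t(156, 33.9, 7.9, 145, 34.4, 8.9))
```

With similar variances and similar group sizes, as here, the two versions give nearly the same t, and Welch's degrees of freedom are only slightly smaller.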

CHI-SQUARED TEST

DIFFERENCE OF TWO PROPORTIONS

Gender vs. BP

Results

Describe this table first. What is your impression? 49% women vs. 47% men with high BP

Some books may suggest the use of Continuity Correction at ALL times, but recent simulations showed that CC (or Yates's correction) is OVERCONSERVATIVE. Hence, use Pearson χ² when < 20% of cells have expected count < 5

When ≥ 20% of cells have EC < 5, use Fisher's Exact Test

Linear-by-Linear Association: this is given because we code the variables using numbers. Can be used to measure P-trend
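The rule about expected counts can be checked directly from a table of observed counts. This is a minimal sketch of that rule only, with made-up function names. It reports which test the rule points to and does not compute the test itself.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def suggest_test(table):
    """Apply the 20% rule for cells with expected count below 5."""
    expected = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < 5 for e in expected) / len(expected)
    return "Fisher's Exact Test" if share_small >= 0.20 else "Pearson chi-square"

# Hypothetical 2 x 2 tables of observed counts
print(suggest_test([[20, 90], [5, 105]]))  # all expected counts are 5 or more
print(suggest_test([[3, 1], [1, 3]]))      # every expected count is below 5
```

Note that the rule refers to expected counts, not observed ones: the first table contains an observed count of 5, yet its smallest expected count is 12.5.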

ONE-WAY ANOVA

COMPARING MORE THAN TWO MEANS

Race vs. HbA1c

Results

Describe these results first. What is your impression? HbA1c between races? 6.4 (SD 2.1) vs. 6.7 (SD 2.2) vs. 6.5 (SD 2.2)

The F test shows that there is no significant difference between any two groups

Results – BMI status vs. HbA1c

The F test shows that there is at least ONE pair with a significant difference: either N vs. OW, N vs. OB or OW vs. OB

We need to run a Post-hoc test to determine which PAIR is significantly different.

To decide which Post-hoc test to choose, we have to test for equality of variances i.e. Homogeneity of variances (Levene's test)

Results – Post hoc

The significant difference is only for Normal vs. Obese (P=0.002)

Report

There is a significant association between BMI Status and HbA1c (F(2, 298) = 13.129, P < 0.001). Post-hoc test showed that Obese subjects have significantly higher HbA1c compared to Normal and Overweight subjects (P = 0.001 and P < 0.001 respectively).
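To show where the reported F and its two degrees of freedom come from, here is a minimal sketch of the one-way ANOVA calculation. In F(2, 298), the 2 is the number of groups minus one and the 298 is the number of subjects minus the number of groups. The data below are invented for illustration and are not the exercise dataset.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical HbA1c-like values for three groups
normal = [5.1, 5.6, 6.0, 5.8]
overweight = [5.9, 6.2, 6.4, 6.1]
obese = [6.8, 7.4, 7.1, 7.9]
print(one_way_anova([normal, overweight, obese]))
```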

MANN-WHITNEY U

NONPARAMETRIC TESTS

Gender vs. HCY

Results

This ranks table is not to be cited in the research paper. Instead, describe their MEDIAN
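The ranks table exists because the Mann-Whitney U statistic is built from ranks rather than from the raw values. The sketch below computes U by hand, using average ranks for ties, and also prints the medians that would be reported. The data and function names are invented for illustration, and no p-value is computed.

```python
import statistics

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return the smaller of the two U statistics."""
    ranks = average_ranks(list(group1) + list(group2))
    n1, n2 = len(group1), len(group2)
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical HCY values by gender
men = [9.8, 11.2, 12.5, 14.1, 10.4]
women = [8.1, 9.0, 10.9, 8.7, 9.5]
print("U =", mann_whitney_u(men, women))
print("Median (men) =", statistics.median(men))
print("Median (women) =", statistics.median(women))
```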

KRUSKAL-WALLIS

NON-PARAMETRIC TESTS

Race vs. HCY

Results

CORRELATION TEST

RELATIONSHIP OF TWO NUMERICAL DATA

Age vs. HbA1c

Results

Age vs. HCY
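As a sketch of the two correlation coefficients named in the test table, the code below computes Pearson's r from the raw values and Spearman's rho as Pearson's r applied to the ranks. The data are invented for illustration, and no p-values are computed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    sorted_values = sorted(values)
    rank_of = {}
    for v in set(values):
        first_position = sorted_values.index(v) + 1
        count = sorted_values.count(v)
        rank_of[v] = first_position + (count - 1) / 2
    return [rank_of[v] for v in values]

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical age and HbA1c values
age = [25, 32, 38, 44, 51, 56]
hba1c = [5.2, 5.6, 5.5, 6.3, 6.9, 8.4]
print(f"Pearson r = {pearson_r(age, hba1c):.3f}")
print(f"Spearman rho = {spearman_rho(age, hba1c):.3f}")
```

Spearman's rho depends only on the ordering of the values, which is why it is the option listed for data that are not Normally distributed.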