
University of Copenhagen, Department of Biostatistics

Faculty of Health Sciences

Varians- og regressionsanalyse (Analysis of variance and regression)
Introduction / Repetition

Lene Theil Skovgaard
Department of Biostatistics


Overview

Homepage: http://staff.pubhealth.ku.dk/~lts/regression11_2

Introduction / Practicalities / Repetition:
- Structure of the course
- Basics concerning quantitative measurements
- Paired and unpaired comparisons
- SAS coding examples
- Determining the size of an investigation


The aim of the course

- to make the participants able to
  - understand and interpret statistical analyses
  - judge the assumptions behind the use of various methods of analysis
  - perform their own analyses (using SAS)
  - understand output from a statistical program package in general, i.e. other than SAS
  - present results from a statistical analysis, numerically and graphically
- to create a better platform for communication between 'users' of statistics and statisticians, to benefit subsequent collaboration


Prerequisites

- Interest
- Motivation, ideally from your own research project, or thoughts about carrying out one
- Basic knowledge of statistical concepts:
  - mean, average
  - variance, standard deviation, standard error of the mean
  - estimation, confidence intervals
  - correlation, regression
  - t-test, χ²-test


Course overview

Course in analysis of variance and regression, autumn 2011

Date     Week  Time         Place           Teacher  Topic
22 Nov   47    9.15-12.00   Panum 31.01.4a  LTS      Paired and unpaired comparisons. Sample size
23 Nov   47    13.00-15.45  2.2.18          JJ+MG    Introduction to SAS
  References: DA 1-4, 8, 9.1-9.7, 15.3; ABM 4.3, 4.6; ABFJM
24 Nov   47    9.15-12.00   Panum 31.01.4a  MA       ANOVA. Introduction to SAS, continued
24 Nov   47    13.00-15.45  2.2.18          MA+IJC   SAS exercises
  References: DA 9.8-9.10, 12.1-12.3; ABM 8, 9; ABFJM
29 Nov   48    9.15-12.00   Panum 31.01.4a  EBJ      Multiple regression. The general linear model
29 Nov   48    13.00-15.45  9.2.14 a+b      JJ+MG    SAS exercises
  References: DA 11, 12.4; ABM 11
1 Dec    48    9.15-12.00   1.2.16          MA       Parametrization, splines, test of linearity. Polynomial regression
1 Dec    48    13.00-15.45  9.2.14 a+b      MA+IJC   SAS exercises
  References: ABM 11.3-11.10
6 Dec    49    9.15-12.00   Panum 31.01.4a  BLT      Generalized linear regression. Ordinal regression
6 Dec    49    13.00-15.45  9.2.14 a+b      LTS+MG   SAS exercises
  References: DA 12.5, 13; ABM 14, 17
8 Dec    49    9.15-12.00   Panum 31.01.4a  LTS      Non-linear regression
8 Dec    49    13.00-15.45  9.2.14 a+b      JJ+IJC   SAS exercises
  References: ABM 12.4
13 Dec   50    9.15-12.00   Panum 31.01.4a  LTS      Variance component models. Multi-level models
13 Dec   50    13.00-15.45  2.2.18          MG+EWA   SAS exercises
  References: ABM 8.3, 12.5
15 Dec   50    9.15-12.00   Panum 31.01.4a  LTS      Analysis of repeated measurements
15 Dec   50    13.00-15.45  2.2.18          JJ+EWA   SAS exercises
  References: DA 14.6; ABM 12.6


Lectures

- Tuesday and Thursday mornings: 9.15-12.00, maybe a bit longer some days
- in Danish
- overheads must be printed and brought along
- usually a long break of some 25 minutes, starting around 10.15-10.30
  - coffee and tea will be served
- smaller break later, if necessary


Exercises

- in the afternoon following each lecture, just not today!
- exercises must be printed from the homepage
- two teachers in each exercise class
- we use SAS programming
- a short introduction to SAS will be given
- the Language and Graphics notes are meant for self-study
- solutions may be downloaded after the exercises


Course diploma requirements

- 80% attendance is required
- it is your responsibility to sign the list each morning and afternoon
- 8*2=16 lists, 80% equals 13 half days
- no compulsory homework ... but you are supposed to work with the material at home


Topics

Quantitative data ('something with normality'): birth weight, blood pressure etc.

- T-tests → analysis of variance → variance component models
- Regression analysis

[Diagram: General linear model, with arrows to Non-linear model and Repeated measurements]


Non-normal outcome:
- Binary data: logistic regression
  Example: probability of complications following surgery of varying length
- Counts: Poisson regression
  Example: number of infections in a calendar year
- Ordinal data: proportional odds models
  Example: degree of fibrosis, predicted from blood samples
- Censored data: survival analysis
  Example: survival time following treatment for cancer of various kinds


Choice of method, by type of outcome (response) and type of explanatory variables (covariates):

Outcome               Dichotomous covariate    Categorical covariate    Quantitative, or categorical and quantitative
Dichotomous           2*2-tables               χ²-test                  Logistic regression
Categorical           Frequency tables / χ²-test                        Generalized logistic regression
Ordinal               difficult, e.g. proportional odds models
Quantitative          Mann-Whitney,            Kruskal-Wallis,          Robust multiple regression
                      Wilcoxon signed rank     Friedman
Normally distributed  T-test                   ANOVA                    ANCOVA,
                      (paired / unpaired)      (one-way / two-way)      multiple regression
Censored              Log-rank test                                     Cox regression
Correlated normal     Variance component models                         Models for repeated measurements


Literature

- D.G. Altman: Practical Statistics for Medical Research. Chapman and Hall, 1991.
- P. Armitage, G. Berry & J.N.S. Matthews: Statistical Methods in Medical Research. Blackwell, 2002.
- Aa. T. Andersen, T.V. Bedsted, M. Feilberg, R.B. Jakobsen and A. Milhøj: Elementær indføring i SAS. Akademisk Forlag, 2002 (in Danish).
- Aa. T. Andersen, M. Feilberg, R.B. Jakobsen and A. Milhøj: Statistik med SAS. Akademisk Forlag, 2002 (in Danish).
- D. Kronborg and L.T. Skovgaard: Regressionsanalyse med anvendelser i lægevidenskabelig forskning. FADL, 1990 (in Danish).
- H. Brown and R. Prescott: Applied Mixed Models in Medicine. Wiley, 2006.
- D.W. Hosmer and S. Lemeshow: Applied Logistic Regression. Second Edition. Wiley, 2000.


Example:
Two methods, expected to give the same result:

- MF: Transmitral volumetric flow, determined by Doppler echocardiography
- SV: Left ventricular stroke volume, determined by cross-sectional echocardiography

subject      MF       SV
1            47       43
2            66       70
3            68       72
4            69       81
5            70       60
...         ...      ...
18          105       98
19          112      108
20          120      131
21          132      131

average   86.05    85.81
SD        20.32    21.19
SEM        4.43     4.62

How do we compare the two measurement methods?


Paired design

The individual is its own control

This gives:
- the same power with fewer individuals
- greater power with the same number of individuals

Paired designs may be used:
- in cross-over studies
- using e.g. both arms for investigations
- for measurements over time (repeated measures), in particular before/after studies
- for matched pairs


The paired situation:

Look at differences - but on which scale?
- Is the size of the differences approximately the same over the entire range?
- Or do we rather see relative (percentage) differences?
  In that case, we have to take differences on a logarithmic scale.

When we have determined the proper scale:
Investigate whether the differences have mean zero.


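[Slides 17-18: figures for the MF/SV example (scatter plot, Bland-Altman plot, spaghetti plot, histogram of differences)]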


Example:
Two methods for determining concentration of glucose.

REFE: Colour test, may be 'polluted' by uric acid

TEST: Enzymatic test, more specific for glucose.

nr.     REFE     TEST
1        155      150
2        160      155
3        180      169
...      ...      ...
44        94       88
45       111      102
46       210      188

mean   144.1    134.2
SD      91.0     83.2

Ref: R.G. Miller et al. (eds): Biostatistics Casebook. Wiley, 1980.


Scatter plot and Bland-Altman plot:

Since the differences seem to be relative, we consider transformation with logarithms ... later


Location, centre

- average (mean value): $\bar{y} = \frac{1}{n}(y_1 + \cdots + y_n)$
- median ('middle observation')

[Figure legend: average, median]


Variation

- variance: $s^2 = \frac{1}{n-1}\sum_i (y_i - \bar{y})^2$
- standard deviation: $\mathrm{SD} = s = \sqrt{\text{variance}}$
- special quantiles/percentiles:
  - quartiles: 25% and 75% quantiles
  - 1%, 2½%, 5% quantiles etc.


Summary statistics

- Average / Mean
- Median
- Variance (quadratic units, hard to interpret)
- Standard deviation (units as outcome, interpretable)
- Standard deviation of an estimate, often called standard error (here SEM, standard error of the mean)

The MEANS Procedure

Variable    N          Mean        Median       Std Dev     Std Error
---------------------------------------------------------------------
mf         21    86.0476190    85.0000000    20.3211126     4.4344303
sv         21    85.8095238    82.0000000    21.1863613     4.6232431
dif        21     0.2380952     1.0000000     6.9635103     1.5195625
---------------------------------------------------------------------


Interpretation of the standard deviation, s

"Most" of the observations can be found in the interval

$$\bar{y} \pm \text{approx. } 2 \times s$$

i.e. the probability that a randomly chosen subject from a population has a value in this interval is "large"...

For the differences mf-sv we find

$$0.24 \pm 2 \times 6.96 = (-13.68, 14.16)$$

If the variable is normally distributed, this interval contains approx. 95% of future observations. If not....
In order to use the above interval, we should at least have reasonable symmetry....
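As a rough check of the arithmetic, the interval for the differences mf-sv can be recomputed from the rounded summary statistics. This is only a sketch in Python (the course itself uses SAS), and the function name `approx_reference_interval` is made up for illustration.

```python
# Rounded summary statistics for the differences mf - sv (from the slides).
mean_dif = 0.24   # average difference
sd_dif = 6.96     # standard deviation of the differences


def approx_reference_interval(mean, sd, factor=2.0):
    """Return (lower, upper) for mean +/- factor * sd."""
    half_width = factor * sd
    return mean - half_width, mean + half_width


lower, upper = approx_reference_interval(mean_dif, sd_dif)
print(f"({lower:.2f}, {upper:.2f})")
```

The factor 2 is the rule of thumb used on the slide, so the result is only meaningful when the differences are reasonably symmetric.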


Estimation

The estimated mean is the average.
Estimates are often written with 'hats' on top, e.g. $\hat{\alpha}$, $\hat{\beta}$, $\hat{\delta}$.
Here

$$\hat{\delta} = \bar{d} = 0.24 \text{ cm}^3$$

- The estimate is our best guess, but uncertainty (biological variation) might as well have given us a somewhat different result
- The estimate has a distribution, with a standard deviation often called the standard error of the estimate.
- Standard error (of the mean), SEM:

$$\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}}$$


Density of the normal distribution: $N(\mu, \sigma^2)$

- Mean, often denoted $\mu$
- Standard deviation, often denoted $\sigma$


Determination of quantiles

Cumulative distribution:
Fraction of observations less than a given value


Quantile plot ... or probability plot

The observed quantiles should correspond to the theoretical ones (except for a scale factor).

If the variable is normally distributed, the quantile plot will look like a straight line.


Reference regions

Regions containing 95% of the 'typical' (middle) observations (95% coverage):

- lower limit: 2½%-quantile
- upper limit: 97½%-quantile

If a distribution fits well to a normal distribution $N(\mu, \sigma^2)$, then these quantiles can be directly calculated as follows:

2½%-quantile: $\mu - 1.96\sigma \approx \bar{y} - 1.96 s$

97½%-quantile: $\mu + 1.96\sigma \approx \bar{y} + 1.96 s$

and the reference region is therefore calculated as

$$\bar{y} \pm \text{approx. } 2 \times s = (\bar{y} - \text{approx. } 2 \times s,\ \bar{y} + \text{approx. } 2 \times s)$$


Technicalities

What is the ’approx. 2’?

The reference region has to ’catch’ future observations, ynew, with95% probability

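We know that

$$y_{new} - \bar{y} \sim N\left(0,\ \sigma^2\left(1 + \frac{1}{n}\right)\right)$$

and hence,

$$\frac{y_{new} - \bar{y}}{s\sqrt{1 + \frac{1}{n}}} \sim t(n-1) \Rightarrow$$

$$t_{2.5\%}(n-1) < \frac{y_{new} - \bar{y}}{s\sqrt{1 + \frac{1}{n}}} < t_{97.5\%}(n-1)$$

$$\bar{y} + s\sqrt{1 + \frac{1}{n}} \times t_{2.5\%}(n-1) < y_{new} < \bar{y} + s\sqrt{1 + \frac{1}{n}} \times t_{97.5\%}(n-1)$$

Since $t_{2.5\%}(n-1) = -t_{97.5\%}(n-1)$, the interval is symmetric around $\bar{y}$.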


Technicalities, cont'd:

The meaning of 'approx. 2' is therefore

$$\sqrt{1 + \frac{1}{n}} \times t_{97.5\%}(n-1) \approx t_{97.5\%}(n-1)$$

The t-quantiles (2½% = - 97½%) may be looked up in tables, or calculated with the program R: freeware, may be downloaded from e.g.

http://mirrors.dotsrc.org/cran/


Technicalities, cont'd:

> degrees.freedom <- 10:25
> qt <- qt(0.975, degrees.freedom)
> cbind(degrees.freedom, qt)
      degrees.freedom       qt
 [1,]              10 2.228139
 [2,]              11 2.200985
 [3,]              12 2.178813
 [4,]              13 2.160369
 [5,]              14 2.144787
 [6,]              15 2.131450
 [7,]              16 2.119905
 [8,]              17 2.109816
 [9,]              18 2.100922
[10,]              19 2.093024
[11,]              20 2.085963  <----
[12,]              21 2.079614
[13,]              22 2.073873
[14,]              23 2.068658
[15,]              24 2.063899
[16,]              25 2.059539

For the differences mf-sv we have n = 21, so the relevant t-quantile is the one with 20 degrees of freedom, 2.086, and the correct reference region is

$$0.24 \pm 2.086 \times \sqrt{1 + \tfrac{1}{21}} \times 6.96 = 0.24 \pm 2.135 \times 6.96 = (-14.62, 15.10)$$


To sum up:
Statistical model for paired data:

$X_i$: MF-method for the i'th subject
$Y_i$: SV-method for the i'th subject

Differences $D_i = X_i - Y_i$ ($i = 1, \cdots, 21$) are independent, normally distributed

$$D_i \sim N(\delta, \sigma_D^2)$$

Note: No assumptions about the distribution of the basic flow measurements!


Point estimation

Estimated mean (estimate of $\delta$, 'delta-hat'):

$$\hat{\delta} = \bar{d} = 0.24 \text{ cm}^3$$

- The estimate is our best guess, but uncertainty (biological variation) might as well have given us a somewhat different result
- The estimate has a distribution, with a standard deviation often called the standard error of the estimate.


Central limit theorem (CLT)

The more observations that we have in the average
- the smaller is the uncertainty concerning the true mean.
  Standard error (of the mean), SEM: $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$
- the more the distribution of the average will look like a normal distribution


Central limit theorem

The average, $\bar{y}$, is 'much more normal' than the original observations.

SEM, standard error of the mean:

$$\mathrm{SEM} = \frac{6.96}{\sqrt{21}} = 1.52 \text{ cm}^3$$


Interval estimates

How do we summarize our knowledge about the unknownparameter?

We provide an entire interval that has a high probability of containing the true parameter:

A Confidence Interval (CI)

... not to be confused with reference regions!


Confidence intervals

- Confidence intervals tell us what the unknown parameter is likely to be
- An interval that 'catches' the true mean with a high (95%) probability is called a 95% confidence interval
- 95% is called the coverage

The usual construction (for some parameter $\beta$, say) is

$$\hat{\beta} \pm \text{approx. } 2 \times \mathrm{SE}(\hat{\beta})$$

This is often a good approximation, even if the original measurements are not especially normally distributed.


For the mean of the differences mf-sv we get the confidence interval:

$$\bar{d} \pm t_{97.5\%}(20) \times \mathrm{SEM} = 0.24 \pm 2.086 \times \frac{6.96}{\sqrt{21}} = (-2.93, 3.41)$$

If there is a systematic difference between the two methods, it is probably (with 95% certainty) within the limits (-2.93 cm³, 3.41 cm³), i.e.
we cannot rule out a difference of approx. 3 cm³.
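The confidence interval for the mean difference can be reproduced in a few lines. The following is a minimal sketch in Python, assuming the rounded values from the slides; the t-quantile 2.086 is typed in from the R table rather than computed, and the function name is hypothetical.

```python
import math


def mean_confidence_interval(mean, sd, n, t_quantile):
    """Confidence interval mean +/- t_quantile * SEM, with SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return mean - t_quantile * sem, mean + t_quantile * sem


# Differences mf - sv: mean 0.24, SD 6.96, n = 21, t_97.5%(20) = 2.086.
ci_low, ci_high = mean_confidence_interval(0.24, 6.96, 21, 2.086)
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Note the difference from the reference region: here the standard deviation is divided by the square root of n, so the interval describes the mean, not individual observations.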


- Standard deviation, SD
  tells us something about the variation in our sample, and presumably in the population
  - is used for description
- Standard error of the mean, SEM
  tells us something about the uncertainty of the estimate of the mean

  $$\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}}$$

  - is used for comparisons, relations etc.


Paired t-test

Test of the null hypothesis $H_0: \delta = 0$ (no systematic difference)

$$t = \frac{\hat{\delta} - 0}{\mathrm{s.e.}(\hat{\delta})} = \frac{0.24 - 0}{6.96/\sqrt{21}} = 0.158 \sim t(20)$$

P = 0.88, i.e. no indication of a systematic difference.

Tests and confidence intervals are equivalent, i.e. they agree on 'reasonable values for the mean'!
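The test statistic above, and the agreement between test and confidence interval, can be illustrated as follows. This is a sketch in Python using the rounded slide values; the critical value 2.086 is copied from the R table, the P-value itself is not computed, and all names are illustrative.

```python
import math


def one_sample_t(mean, sd, n, null_value=0.0):
    """t statistic for H0: true mean = null_value."""
    return (mean - null_value) / (sd / math.sqrt(n))


T_CRIT_20 = 2.086  # 97.5% quantile of t(20), from the R table on the slides

t_stat = one_sample_t(0.24, 6.96, 21)
reject_h0 = abs(t_stat) > T_CRIT_20

# Equivalent view: H0 is rejected exactly when 0 lies outside the 95% CI.
sem = 6.96 / math.sqrt(21)
zero_outside_ci = not (0.24 - T_CRIT_20 * sem <= 0.0 <= 0.24 + T_CRIT_20 * sem)

print(f"t = {t_stat:.3f}, reject H0 at the 5% level: {reject_h0}")
```

Both `reject_h0` and `zero_outside_ci` answer the same question, which is the sense in which tests and confidence intervals are equivalent.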


Read in from the data file 'mf_sv.tal'
(text file with two columns and 21 observations)

data mfsv;
  infile 'mf_sv.tal' firstobs=2;   /* start reading at line 2 of the file */
  input mf sv;
  dif=mf-sv;                       /* paired difference */
  average=(mf+sv)/2;               /* average of the two methods */
run;

proc means mean std data=mfsv;
run;

The MEANS Procedure

Variable          Mean       Std Dev
------------------------------------
mf          86.3913043    24.5759290
sv          86.1739130    25.2346302
dif          0.2380952     6.9635103
average     85.9285714    20.4641673
------------------------------------


Paired t-test in SAS
can be performed in two different ways:

1. as a one-sample test on the differences:

proc ttest data=mfsv;
  var dif;
run;

The TTEST Procedure
Variable: dif

 N      Mean    Std Dev    Std Err     Minimum    Maximum
21    0.2381     6.9635     1.5196    -13.0000    10.0000

  Mean       95% CL Mean       Std Dev     95% CL Std Dev
0.2381    -2.9317    3.4078     6.9635    5.3275    10.0558

DF    t Value    Pr > |t|
20       0.16      0.8771


2. as a paired two-sample test

proc ttest data=mfsv;
  paired mf*sv;
run;

The TTEST Procedure
Statistics

                    Lower CL              Upper CL
Difference     N        Mean      Mean        Mean    Std Dev
mf - sv       21      -2.932    0.2381      3.4078     6.9635

T-Tests

Difference    DF    t Value    Pr > |t|
mf - sv       20       0.16      0.8771


Assumptions for the paired comparison

The differences:
- are independent: the subjects are unrelated
- have identical variances: judged using the 'Bland-Altman plot' of differences vs. averages
- are normally distributed: judged graphically or numerically
  - we have seen the histogram ..... and the probability plot
  - we could make formal tests ..

But unfortunately, ....


Tests of normality are not very informative!

Why?

- They do not reflect the importance of the discrepancy from normality
- For small samples, the discrepancy must be large in order to be detected
- For large samples, even minor and unimportant discrepancies may give rise to significance


If the normal distribution is not a good description, then:

- Tests and confidence intervals are still reasonably OK, due to the central limit theorem
- Reference regions become untrustworthy!

When comparing measurement methods, the reference region is called the

limits-of-agreement

These limits are important for deciding whether or not two measurement methods may replace each other.


Nonparametric tests

Tests that do not assume a normal distribution
- but they are not assumption free

Drawbacks
- loss of efficiency (typically small)
- unclear problem formulation: no actual model, no interpretable parameters
- no estimates, and no confidence intervals
- may only be used for simple problems, unless you have plenty of computer power and an advanced computer package
- are of no use at all for small data sets


Nonparametric one-sample test

of mean 0 (paired two-sample test)

- Sign test
  - uses only the sign of the observations, not their size
  - has very little power
  - is invariant under transformation
- Wilcoxon signed rank test
  - uses the sign of the observations, combined with the rank of the numerical values
  - is more powerful than the sign test
  - demands that differences may be called 'large' or 'small'
  - may be influenced by transformation


For the comparison of MF and SV, we write:

proc univariate normal data=mfsv;
  var dif;
run;

and among a lot of other information, we get

The UNIVARIATE Procedure
Variable: dif

Moments

N                          21    Sum Weights                 21
Mean               0.23809524    Sum Observations             5
Std Deviation      6.96351034    Variance            48.4904762
Skewness           -0.5800231    Kurtosis            -0.5626393
Uncorrected SS            971    Corrected SS        969.809524
Coeff Variation    2924.67434    Std Error Mean      1.51956253


Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t  0.156687     Pr > |t|     0.8771
Sign           M       2.5     Pr >= |M|    0.3593
Signed Rank    S         8     Pr >= |S|    0.7603

so the conclusions stay the same...


Example, continued: the two methods for determining concentration of glucose (REFE and TEST).


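Systematic difference?

Test '$H_0: \delta = 0$' for the differences $D_i = \mathrm{REFE}_i - \mathrm{TEST}_i \sim N(\delta, \sigma_d^2)$

$$\hat{\delta} = 9.89,\quad s_d = 9.70 \ \Rightarrow\ t = \frac{\hat{\delta}}{\mathrm{sem}} = \frac{\hat{\delta}}{s_d/\sqrt{n}} = 8.27 \sim t(45)$$

P < 0.0001, i.e. strong indication of a systematic difference.

We should quantify this difference, but....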


The limits of agreement tell us that the typical differences are to be found in the interval

$$9.89 \pm 2 \times 9.70 = (-9.51, 29.29)$$

From the picture we see that this is a bad description since
- the differences increase with the level (average)
- the variation increases with the level too


After logarithmic transformation

We notice an obvious outlier (the smallest observation)



Note:
- It is the original measurements that have to be transformed with the logarithm, not the differences!
  Never make a logarithmic transformation on data that might be negative!!
- It does not matter which logarithm you choose (i.e. the base of the logarithm) since they are all proportional
- The procedure with construction of limits of agreement is now repeated for the transformed observations
- and the result is transformed back to the original scale with the anti-logarithm


Following a logarithmic transformation (and omitting the smallest observation), we get a reasonable picture.


Limits of agreement: $0.0285 \pm 2 \times 0.0182 = (-0.0079, 0.0649)$

This means that for 95% of the subjects we will have

$$-0.0079 < \log_{10}(\mathrm{REFE}) - \log_{10}(\mathrm{TEST}) = \log_{10}\left(\frac{\mathrm{REFE}}{\mathrm{TEST}}\right) < 0.0649$$

and when transforming back (using the anti-logarithm), this gives us

$$10^{-0.0079} = 0.982 < \frac{\mathrm{REFE}}{\mathrm{TEST}} < 1.162 = 10^{0.0649}$$

or 'reversed'

$$0.861 < \frac{\mathrm{TEST}}{\mathrm{REFE}} < 1.018$$

Interpretation: TEST will typically be between 14% below and 2% above REFE.
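The back-transformation can be traced step by step. Below is a minimal Python sketch, assuming base-10 logarithms and the rounded mean and SD of the log-differences quoted above; the function name is hypothetical, and because the inputs are rounded the last digit may differ slightly from the slide.

```python
def back_transformed_limits(mean_log_diff, sd_log_diff, factor=2.0, base=10.0):
    """Limits of agreement on the log scale, returned as limits for the ratio."""
    low = mean_log_diff - factor * sd_log_diff
    high = mean_log_diff + factor * sd_log_diff
    return base ** low, base ** high


# Limits for the ratio REFE / TEST.
ratio_low, ratio_high = back_transformed_limits(0.0285, 0.0182)

# 'Reversed' limits for TEST / REFE: take reciprocals and swap the ends.
inverse_low, inverse_high = 1.0 / ratio_high, 1.0 / ratio_low

print(f"REFE/TEST between {ratio_low:.3f} and {ratio_high:.3f}")
print(f"TEST/REFE between {inverse_low:.3f} and {inverse_high:.3f}")
```

Taking reciprocals swaps the two ends of the interval, which is why the lower limit for TEST/REFE comes from the upper limit for REFE/TEST.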


Limits of agreement on the original scale



New type of problem: Unpaired comparisons

If the two measurement methods were applied to separate groups of subjects, we would have two independent samples.

Traditional assumptions:
- all observations are independent
- both groups have the same variance (between subjects)
  - should be checked
- observations follow a normal distribution for each method, with possibly different mean values
  - the normality assumption should be checked 'to a certain extent' (if possible)


Ex. Calcium supplement to adolescent girls

A total of 112 11-year-old girls are randomised to get either calcium supplement or placebo.
Outcome: BMD = bone mineral density, in g/cm², measured 5 times over 2 years (6-month intervals).
We look at the first and the last, and define the new variable

increase=bmd5-bmd1;

Problem:
Does calcium induce a larger increase in bmd than placebo?




Boxplot of changes, divided into groups



Technicalities

Model:
The variable increase is normally distributed, with mean $\mu_g$ (different for the two groups) and common standard deviation ($\sigma$)

Two sample t-test: $H_0: \mu_1 = \mu_2$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{se}(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{0.019}{0.0064} = 2.95$$

which gives P = 0.0041 in a t distribution with 89 degrees of freedom


Technicalities, cont’d

The reasoning behind the test statistic:

$\bar{x}_1$ normally distributed $N(\mu_1, \frac{1}{n_1}\sigma^2)$

$\bar{x}_2$ normally distributed $N(\mu_2, \frac{1}{n_2}\sigma^2)$

$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\sigma^2\right)$$

$\sigma^2$ is estimated by $s^2$, a pooled variance estimate, and the degrees of freedom is

$$df = (n_1 - 1) + (n_2 - 1) = (44 - 1) + (47 - 1) = 89$$
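The pooled test statistic can be rebuilt from group summaries. The sketch below is in Python and uses the rounded means and standard deviations for the calcium (C) and placebo (P) groups as they appear in the SAS output for this example; the function name `pooled_t` is an illustration, not part of any package.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance estimate."""
    df = (n1 - 1) + (n2 - 1)
    # Pooled variance: the group variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, se


# Calcium: mean 0.1069, SD 0.0321, n = 44; placebo: mean 0.0879, SD 0.0294, n = 47.
t_stat, df, se = pooled_t(0.1069, 0.0321, 44, 0.0879, 0.0294, 47)
print(f"t = {t_stat:.2f} on {df} degrees of freedom (se = {se:.4f})")
```

The P-value would then be read from a t distribution with `df` degrees of freedom, which is left to SAS here.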


Unpaired t-test for increase, calcium vs. placebo:

proc ttest data=calcium;
  class grp;
  var increase;
run;

                                   Lower CL              Upper CL
Variable    grp           N            Mean      Mean        Mean    Std Dev

increase    C            44          0.0971    0.1069      0.1167     0.0321
increase    P            47          0.0793    0.0879      0.0965     0.0294
increase    Diff (1-2)               0.0062     0.019      0.0318     0.0307

T-Tests

Variable    Method           Variances      DF    t Value    Pr > |t|

increase    Pooled           Equal          89       2.95      0.0041
increase    Satterthwaite    Unequal      86.9       2.94      0.0042

Equality of Variances

Variable    Method      Num DF    Den DF    F Value    Pr > F

increase    Folded F        43        46       1.20    0.5513


Conclusions

- No detectable difference in standard deviations (0.0321 vs. 0.0294, P=0.55)
- Clear difference in means: 0.019 with CI: (0.0062, 0.0318)
- Note that we have two different versions of the t-test, one for equal variances and one for unequal variances.


Technicalities

The hypothesis of equal standard deviations is investigated by (with the larger variance in the numerator)

$$F = \frac{s_1^2}{s_2^2} = \frac{0.0321^2}{0.0294^2} = 1.20$$

If the two standard deviations are actually equal, this quantity has an F-distribution with (43,46) degrees of freedom. We find P=0.55 and therefore cannot reject the equality of the two variances.


Technicalities, cont’d

If we had rejected the hypothesis of equal standard deviations, then what?
Alternative form of t-test:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{se}(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(??)$$

This would have resulted in essentially the same as before:

$$t = 2.94 \sim t(86.9),\quad P = 0.0042$$
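The fractional degrees of freedom 86.9 are labelled 'Satterthwaite' in the SAS output. A sketch of that approximation in Python follows, assuming the standard Welch-Satterthwaite formula and the rounded group summaries from the SAS output; the function name is made up.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic with Satterthwaite degrees of freedom."""
    v1 = sd1**2 / n1   # squared standard error of the mean, group 1
    v2 = sd2**2 / n2   # squared standard error of the mean, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t_welch, df_welch = welch_t(0.1069, 0.0321, 44, 0.0879, 0.0294, 47)
print(f"t = {t_welch:.2f}, approximate df = {df_welch:.1f}")
```

With nearly equal standard deviations and group sizes, the result is very close to the pooled test with 89 degrees of freedom.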


Paired or unpaired comparisons?

Note the consequences for the MF vs. SV example:
- Difference according to the paired t-test: 0.24, CI: (-2.93, 3.41)
- Difference according to the unpaired t-test: 0.24, CI: (-12.71, 13.19)

i.e. with identical estimated mean differences, but a much wider confidence interval

You have to respect your design!!
- and not forget to take advantage of subjects serving as their own control
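The two confidence intervals can be compared directly from the rounded summary statistics of the MF/SV example. This is a sketch in Python; the quantile 2.086 for 20 degrees of freedom is from the R table on the slides, whereas 2.021 for 40 degrees of freedom is an assumption taken from a standard t table.

```python
import math

# Rounded summary statistics from the MF/SV example (n = 21 subjects).
N = 21
MEAN_DIF = 0.24               # average of mf - sv
SD_DIF = 6.96                 # SD of the paired differences
SD_MF, SD_SV = 20.32, 21.19   # SDs of the two methods separately

# 97.5% t-quantiles: 2.086 (20 df) from the slides; 2.021 (40 df) assumed
# from a standard t table, it does not appear in the slides.
T_20, T_40 = 2.086, 2.021


def interval(center, se, t_quantile):
    """Return (lower, upper) for center +/- t_quantile * se."""
    return center - t_quantile * se, center + t_quantile * se


# Paired analysis: standard error of the mean difference.
se_paired = SD_DIF / math.sqrt(N)

# Unpaired analysis: pooled SD of the two methods, ignoring the pairing.
pooled_sd = math.sqrt(((N - 1) * SD_MF**2 + (N - 1) * SD_SV**2) / (2 * N - 2))
se_unpaired = pooled_sd * math.sqrt(1 / N + 1 / N)

paired_ci = interval(MEAN_DIF, se_paired, T_20)
unpaired_ci = interval(MEAN_DIF, se_unpaired, T_40)
print("paired:   (%.2f, %.2f)" % paired_ci)
print("unpaired: (%.2f, %.2f)" % unpaired_ci)
```

The unpaired standard error is built from the between-subject variation (SD about 20), while the paired one only involves the variation of the differences (SD about 7), which explains the much narrower paired interval.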


Significance level α (usually 0.05) denotes the risk that we are willing to take of rejecting a true hypothesis, also denoted an error of type I.

              accept                   reject
H0 true       1-α                      α (error of type I)
H0 false      β (error of type II)     1-β

1-β is denoted the power.
This describes the probability of rejecting a false hypothesis.

But what does 'H0 false' mean? How false is H0?


The power is a function of the true difference:
'If the difference is xx, what is our probability of detecting it at the 5% level?'

Power:
- is calculated in order to determine the size of an investigation
- when the observations have been gathered, we present confidence intervals instead


Statistical significance depends upon:
- the size of the true difference (if any)
- the number of observations
- the size of the random variation, i.e. the biological variation
- the chosen significance level

Clinical significance depends upon:
- the size of the estimated difference, and the width of the confidence interval


Two active treatments: A and B, compared to Placebo: P

Results:

1. trial: A significantly better than P (n=100)
2. trial: B not significantly better than P (n=50)

Conclusion: A is better than B???

No, not necessarily! Why?


Determination of the size of an investigation

How many patients do we need?

This depends on the nature of the data, and on the type of conclusion wanted:

- Clinically relevant difference, $d_0$, i.e.
  which magnitude of the difference are we interested in detecting?
  Effects smaller than $d_0$ have no real interest.
  Values of $d_0$ are decided from
  - knowledge of the subject matter
  - relation to biological variation


- With how large a probability (power)?
  - ought to be large, at least 80%
- On which level of significance?
  - usually 5%, maybe 1%
- How large is the biological variation?
  - guess from previous (similar) investigations or pilot studies
  - pure guessing....


Example

New drug in anaesthesia: XX, given e.g. in the dose 0.1 mg/kg.

Outcome: Time until some event, e.g. 'head lift'.

2 groups: Eu1 Eu1 and Eu1 Ea1

We would like to establish a difference between these two groups, but not if it is uninterestingly small, say less than 3 minutes.

How many patients do we need to investigate?


Pilot study

From a study on a similar drug, we found:

group        N    time to first response (min. ± se)
Eu1 Eua      4    16.3 ± 2.6
Eu1 Eu1     10    10.1 ± 3.0

From this, we make a guess of the biological variation:
SD = 3 min.
We do not normally use pilot studies for determining the clinically relevant difference $d_0$!!


Nomogram



Quantities in nomogram

- $d_0$: clinically relevant difference, MIREDIF
- $s$: standard deviation
- $d_0/s$: standardized difference
- $1-\beta$: power at MIREDIF
- $\alpha$: significance level

What to do?
- Connect $d_0/s$ and $1-\beta$ with a straight line
- Read off N (required sample size, in total for both groups) for the relevant $\alpha$


$d_0 = 3$: clinically relevant difference
$s = 3$: standard deviation
$d_0/s = 1$: standardized difference

$1 - \beta = 0.80$: power

Significance level, $\alpha = 0.05$: N = 32

Significance level, $\alpha = 0.01$: N ≈ 50
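The nomogram readings can be cross-checked with the usual normal-approximation formula, n per group = 2 (z_{1-α/2} + z_{power})² (s/d0)². This Python sketch is an approximation introduced here for illustration, not the method behind the nomogram or PROC POWER; an exact calculation based on the t distribution gives a slightly larger number.

```python
import math
from statistics import NormalDist


def n_per_group(d0, sd, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # e.g. 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_power) * sd / d0) ** 2)


n_05 = n_per_group(d0=3, sd=3, alpha=0.05)
n_01 = n_per_group(d0=3, sd=3, alpha=0.01)
print(f"alpha = 0.05: {n_05} per group, {2 * n_05} in total")
print(f"alpha = 0.01: {n_01} per group, {2 * n_01} in total")
```

Only the ratio d0/s enters the formula, which is why the nomogram is entered with the standardized difference.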


Sample size determination with SAS

proc power;
  twosamplemeans test=diff
    meandiff = 3
    stddev = 3
    npergroup = .
    power = 0.8;
run;

The POWER Procedure
Two-sample t Test for Mean Difference

Fixed Scenario Elements

Distribution              Normal
Method                     Exact
Mean Difference                3
Standard Deviation             3
Nominal Power                0.8
Number of Sides                2
Null Difference                0
Alpha                       0.05

Computed N Per Group

Actual    N Per
 Power    Group

 0.807       17


What, if we cannot get hold of so many patients?

- Include more centres
  - multi-centre study
- Take fewer from one group, more from another
  - How many?
- Perform a paired comparison, i.e. use the patients as their own control.
  - How many?
- Be content to take less than needed
  - and hope for the best (!?)
- Give up on the investigation
  - instead of wasting time (and money)


Different group sizes?

$n_1$ in group 1 and $n_2$ in group 2, with $n_1 = k n_2$

The total necessary sample size gets bigger:
- Find N as before
- New total number needed: $N' = N \frac{(1+k)^2}{4k}$
- Necessary number in each group:

$$n_1 = N' \frac{k}{1+k} = N \frac{1+k}{4}$$

$$n_2 = N' \frac{1}{1+k} = N \frac{1+k}{4k}$$


Different group sizes, cont’d

- Least possible total number: 32 = 16 + 16
- Each group has to contain at least 8 = N/4 patients

Ex: $k = 2 \Rightarrow N' = 36 \Rightarrow n_1 = 24,\ n_2 = 12$
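The formulas for unequal group sizes are easy to tabulate. A small Python sketch follows, starting from the balanced total N = 32 found earlier; the function name is hypothetical.

```python
def unequal_group_sizes(total_equal, k):
    """Sizes needed when group 1 is k times as large as group 2.

    total_equal is the total N required with two equally sized groups.
    Returns (new total N', n1, n2).
    """
    new_total = total_equal * (1 + k) ** 2 / (4 * k)
    n1 = total_equal * (1 + k) / 4
    n2 = total_equal * (1 + k) / (4 * k)
    return new_total, n1, n2


for k in (1, 2, 3):
    total, n1, n2 = unequal_group_sizes(32, k)
    print(f"k = {k}: N' = {total:.1f}, n1 = {n1:.1f}, n2 = {n2:.1f}")
```

As k grows, the smaller group size approaches N/4 = 8 from above, which matches the lower bound stated on the slide.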


Necessary sample size
- in the paired situation

Standardized difference is now calculated as

$$\frac{\sqrt{2} \times \text{clinically relevant difference}}{s_D} = \frac{\text{clinically relevant difference}}{s\sqrt{1 - \rho}}$$

where
- $s_D$ denotes the standard deviation for the differences
- $\rho$ denotes the correlation between paired observations

Necessary number of patients will then be $N/2$


Necessary sample size

– when comparing frequencies

We would rather not overlook a situation such as

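treatment group    probability for complications
A                  $\theta_A$
B                  $\theta_B$

The standardised difference is then calculated as

$$\frac{\theta_A - \theta_B}{\sqrt{\bar{\theta}(1 - \bar{\theta})}}, \quad \text{where } \bar{\theta} = \frac{\theta_A + \theta_B}{2}$$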


Dictionary English-Danish

English                  Danish
Average                  Gennemsnit
Confidence Limit         Sikkerhedsinterval
Distribution             Fordeling
Mean                     Middelværdi
Reference Region         Normalområde
Sample                   Stikprøve
Standard Deviation       Spredning
One-way / Two-way        Ensidet / Tosidet
Power                    Styrke
Quantile / Percentile    Fraktil / Percentil
Quantile Plot            Fraktildiagram


The pictures on p. 17-18 are made using the code:

proc gplot data=mfsv;
  plot mf*sv / haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=none c=BLACK l=1 w=2;
  title h=3 'Scatter plot with identity line';
run;

proc gplot data=mfsv;
  plot dif*average / vref=0 lv=1 vref=0.24 15.5 -15.0 lv=2
                     haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2 'average');
  axis2 order=(-16 to 16 by 4) value=(H=2) minor=NONE
        label=(A=90 R=0 H=2 'difference MF-SV');
  symbol1 v=circle i=none l=1 w=2;
  title h=3 'Bland-Altman plot';
run;


data mfsv_lang;
  set mfsv;
  subject=_n_;   /* assumption: subject id = row number, since mfsv only holds mf and sv */
  flow=mf; method='mf'; output;
  flow=sv; method='sv'; output;
run;

proc gplot data=mfsv_lang;
  plot flow*method=subject / nolegend haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=join l=1 w=2 r=21;
  title h=3 'Spaghetti-plot';
run;

proc gchart data=mfsv;
  vbar dif;
  title 'Histogram of differences';
run;
