
    Guidelines in the Analysis Phase

Contents

Analysis plan
Data analysis in general
Initial data analysis
Post-hoc and sensitivity analyses
Data analysis documentation
Reporting results in tables and figures
Guidelines for reporting specific types of studies
Prognostic models
Handling Missing Data

Updated: 5 July 2012


Title of the document: Analysis plan
HB Nr.: 1.4-01 avs
Rev. Nr.: 1.2
Effective date: 1 Jan 2010

    1. Aim

    To promote structured and targeted data analysis

2. Definitions
An analysis plan is a stepwise plan created prior to the actual data analysis.

3. Keywords
Research questions, population, variables, analysis methods, stepwise plan

4. Description
An analysis plan should be created prior to the data analyses. The analysis plan contains a description of the research question and of the various steps the analysis is going to take. The analysis plan is intended as a starting point for the analysis: it ensures that the analysis can be undertaken in a targeted manner.

However, both the research questions and the analyses may be revised during the data analysis. It may also be that certain options are not yet clear before the start of the data analysis. Exploratory data analysis is also possible. The findings and decisions made during the analyses may be documented at a later stage in the analysis plan, meaning the analysis plan becomes a dynamic document. However, there is also the option of documenting findings and decisions made during the data analysis in SPSS syntax (see guideline 1.4-05 Documentation of data analysis). In this instance the analysis plan only serves as the starting point.

The concrete research question needs to be formulated first in the analysis plan; this is the question the analyses are intended to answer. Concrete research questions may be defined using the acronym PICO: Population, Intervention, Comparison, Outcomes. A question such as "What are the risk factors for back pain?" is too general. An example of a concrete question could be: "Does frequent bending at work lead to an elevated risk of lower back pain occurring in employees?" (Population = employees; Intervention = frequent bending; Comparison = infrequent bending; Outcome = occurrence of back pain). Concrete research questions are essential for determining the analyses required.

The analysis plan should then describe which statistical techniques are to be used to analyse the data. The following issues need to be considered in this process and described where applicable:

- Which (subgroup of the) population is to be included in the analyses?
- Data from which time point (T1, T2, etc.) will be used?
- Which (dependent and independent) variables are to be used in the analyses, and how are these variables to be analysed (e.g. continuous or in categories)?
- Which variables are to be investigated as potential confounders or effect modifiers, and how are these variables to be analysed? There are different ways of dealing with confounders. Often variables are only included as confounders if they actually influence the relationship between the determinant and the outcome (i.e. when they modify the regression coefficient of the determinant; see the example below). Another frequently used method is to include all variables that have a significant relationship with the outcome, even if they are perhaps not (strong) confounders.
- How to deal with missing values?
- Which analyses are to be carried out and in which order (e.g. univariate analyses, multivariate analyses, analysis of confounders, analysis of interaction effects, analysis of sub-populations, etc.)?

    A statistician may need to be consulted regarding the choice of statistical techniques.

It can be quite efficient to create, prior to the start of the data analysis, a number of empty tables that are to be included in the article. This is often very helpful in deciding exactly which analyses are required in order to analyse the data in a targeted manner.


5. Details
Audit questions:
- Has an analysis plan been created prior to the start of analysis?
- Has a concrete research question been formulated in the analysis plan?
- Have the points described under section 4 been considered and have the most important options been decided?
- Has a stepwise description of the analyses to be applied been provided in the analysis plan?

    6. Appendices/references/links

7. Amendments
V1.2 1 Jan 2010: English translation.
V1.1 21 Jan 2008: Text in guideline has been re-written with more emphasis on a flexible approach.

    EXAMPLE OF AN ANALYSIS PLAN

    Work-related psychosocial risk factors in relation to the occurrence of neck complaints.

Research question
What is the influence of the following psychosocial factors on the occurrence of neck complaints within 1 year in symptom-free employees?
1. Quantitative job demands
2. Skill discretion
3. Decision authority
4. Supervisor support
5. Co-worker support

Population
All 977 individuals who were symptom-free at the baseline measurement and had a full follow-up.

Outcome measure (dependent variable)
Dichotomous variable: presence (1) or absence (0) of neck complaints
Time variable: time prior to a neck complaint arising (minimum length of time of 1 day), in days

Independent variables
All independent variables and confounders are dimensions of the Job Content Questionnaire (Karasek questionnaire).
1. Quantitative job demands
2. Skill discretion
3. Decision authority
4. Supervisor support
5. Co-worker support

Confounders
1. Qualitative job demands
2. Job security
For each analysis with 1 central psychosocial factor, the other 4 will be analysed as potential confounders.

    Other potential confounders

    Age

    Sex

Coping styles (3 variables): Avoidance behaviour, seeking social support, approaching problems actively

    Life events


Physical factors in leisure time (9 variables): Intensive sport/heavy physical activity during the last 4 months requiring a lot of exertion; long-term sitting; computer screen work; working with hands above shoulder height; exertion with hands/arms; having to work in the same position for long periods of time; having to make the same hand/arm movements numerous times per minute; driving a vehicle; bending/twisting the upper body numerous times per hour.

Work-related physical factors (11 variables): Percentage of work time with neck flexion >45 degrees; percentage of work time seated; percentage of work time with neck rotation >45 degrees; frequency of lifting >25 kg per working day; percentage of work time making repetitive movements with arms/hands at a frequency >4 times per minute; percentage of work time with upper arm elevation >60 degrees; working with hands above shoulder height; computer screen work; working with vibrating or pulsating objects; driving a vehicle at work; bending/twisting of the upper body numerous times per hour.

Statistical analysis
One regression model for each psychosocial factor:
- Firstly, univariate Cox regressions; dependent variable is neck complaints, independent variable is the central psychosocial factor.

    Confounding

- Univariate Cox regressions of all potential confounders. Potential confounders with p > 0.25 will no longer be considered as confounders.
- Multivariate Cox regressions of the central psychosocial factor together with 1 potential confounder (selected using p < 0.25) at a time. When the change in the regression coefficient of the central psychosocial factor is around 10% or greater, the potential confounder should be viewed as a true confounder and included in the multivariable analysis.
- Always add 1 potential confounder at a time: if the change in the regression coefficient is greater than 10%, the confounder should be kept in the model, otherwise it can be excluded. (A sketch of this procedure follows below.)
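A minimal sketch of this change-in-coefficient check in Python (the handbook itself assumes SPSS; the lifelines package and the column names time_days, neck_complaint, job_demands and the candidate confounders are assumptions for illustration only):

    # 10%-change-in-coefficient confounder check for a Cox model.
    # Assumes a pandas DataFrame `df` with hypothetical columns: time_days,
    # neck_complaint (0/1 event), job_demands (central factor) and candidates.
    from lifelines import CoxPHFitter

    def central_coef(data, covariates):
        """Fit a Cox model and return the coefficient of the central factor."""
        cph = CoxPHFitter()
        cph.fit(data[["time_days", "neck_complaint"] + covariates],
                duration_col="time_days", event_col="neck_complaint")
        return cph.params_["job_demands"]

    base = central_coef(df, ["job_demands"])          # univariate model
    kept = []
    for conf in ["age", "sex", "job_security"]:       # candidates with p < 0.25
        adjusted = central_coef(df, ["job_demands", conf])
        if abs(adjusted - base) / abs(base) >= 0.10:  # ~10% change or more
            kept.append(conf)
    print("Confounders to keep in the multivariable model:", kept)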

Effect modification
- Sex: create a sex * psychosocial factor interaction term. Add the interaction to the final model (with confounders). If the interaction is significant, effect modification is present.


Title of the document: Data analysis in general
HB Nr.: 1.4-2 avs
Rev. Nr.: 1.1
Effective date: 1 Jan 2010

    1. Aim

    Outline of quality aspects of data analysis (principal analyses).

    2. Definitions

Modelling: Finding a statistical model that works well with the data.
Cross-validation: Method where the sample is split in two; one half is used to develop the models, the other to test the models developed.
Stepwise modelling: Modelling method involving stepwise procedures: a term is removed from or added to the model at each step. A distinction is made between forward stepwise, backward stepwise and stepwise.
Imputing: Method of filling in missing values in a dataset.
Multilevel analysis: Type of regression analysis where a distinction can be made between more than one level: for instance, data collected from patients within a general practice, where both patient-level and general-practitioner-level data play a role.
GEE: Generalized Estimating Equations: a specific type of multilevel analysis.
Logistic regression analysis: Type of regression analysis where the dependent variable is dichotomous.
Dichotomy: A variable that can only assume one of two values.
Cox regression: Type of survival analysis; the dependent variable reflects length of survival.
Normality: Property of a variable's distribution: the underlying distribution is normal (Gaussian).
Resampling method: Method where samples are repeatedly taken from the available data, either to by-pass the distribution requirements of a test (for instance for bootstrapping), or to increase the precision of an estimate.

3. Keywords
Data analysis, modelling, regression analysis, (co-)variance analysis, multilevel analysis, GEE, analysis of longitudinal data, factor analysis, structural models, exact testing, non-parametric testing, bootstrapping.

4. Description
The variety of methods used in data analysis for medical/epidemiological research is enormous. This note provides an overview of the classes of frequently used methods and, here and there, discusses factors that may influence the quality of interpretation, and therefore the conclusions as well. It is self-evident that no attempt has been made to provide an exhaustive list: the field is simply much too large for this. We briefly discuss the following topics:
- General modelling
- Regression analysis
- (Co-)variance analysis
- Multilevel analysis
- Methods for longitudinal data
- Factor analysis
- Methods for analysing structural models
- Special methods: exact tests, non-parametric tests, bootstrapping

General modelling
Not very much has been written about the general principles of statistical modelling, although there is literature on modelling within specific academic areas. A general book about statistical modelling is Dobson's [1]; Edwards discusses the advantages and disadvantages of iterative (stepwise) methods in detail [2] (compare this with the article by Adèr, Kuik, Hoeksma and Mellenbergh [3], which is available here as a handout). There are a number of issues that frequently occur in modelling:

Reliability of the models determined
Models may be specific to the data provided; this means that they may not be found in follow-up studies. A remedy for this is cross-validation. This method involves randomly splitting the sample in two halves: one half is used to develop the model, the other to verify it. In general, this will require a large number of observations.
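A minimal sketch of the split-half idea in Python (the DataFrame df and the columns y, x1, x2 are hypothetical; any model-fitting step could take the place of the regression shown here):

    # Split-half cross-validation: develop a regression model on one random half
    # of the sample and check how well it predicts the other half.
    import numpy as np
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(seed=1)
    mask = rng.random(len(df)) < 0.5                     # random 50/50 split
    develop, test = df[mask], df[~mask]

    model = smf.ols("y ~ x1 + x2", data=develop).fit()   # develop the model
    pred = model.predict(test)                           # apply it to the other half
    corr = np.corrcoef(pred, test["y"])[0, 1]
    print(f"Correlation between predicted and observed in the test half: {corr:.2f}")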

Stepwise analysis
In this method the models are built in a stepwise manner: at each step a term is removed from or added to the model. Although there are a number of arguments against this procedure (see Edwards [2]), it is still used frequently. In general it is recommended that only forward stepwise methods are used, preferably using a variation in which the user can confirm or prevent the removal or addition of a term suggested by the programme. Methods may also be used which, instead of stepwise procedures, run through and detail the competing models [2, 4]. The results of this type of analysis therefore consist of more than one model.

Misspecification
If there are terms missing from a statistical model (for example confounders), or if the specified model does not represent certain essential aspects (for instance, the use of linear regression analysis whilst the data have a hierarchical structure, meaning multilevel analysis would have been more appropriate), this may influence the results dramatically.

Missing observations
Many multivariate methods (such as multiple regression analysis) are sensitive to missing values, as they apply standard listwise deletion: if one observation is missing for a respondent, the respondent is not used in estimating the model parameters. Possible remedies: (i) apply multiple imputation [5], even though this is often impractical in practice; (ii) imputation using the EM algorithm or imputation using regression analysis, which can be carried out using the Missing Value Analysis (MVA) programme within SPSS; (iii) enter a "safe" value for the missing values, i.e. a value that is not expected to disrupt the estimates of the coefficients (mean imputation, last observation carried forward and similar methods); although this appears simple, it is not always the right option; (iv) multilevel analysis can be used for models with repeated measures where values are missing at certain time points.
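As an illustration of options (ii) and (iii), a sketch in Python using pandas and scikit-learn (the DataFrame df is assumed to contain only numeric variables; note that this is single, not multiple, imputation):

    # Two simple imputation sketches for a DataFrame `df` with missing values.
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer, SimpleImputer

    # (iii) "Safe" value: mean imputation per column (simple, but can bias results).
    mean_imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df),
                                columns=df.columns)

    # (ii) Regression-based imputation: each variable with missing values is
    # predicted from the other variables (akin to EM/regression imputation).
    reg_imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                               columns=df.columns)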

Violations of model assumptions
Both linear regression analysis and analysis of variance require that residuals are normally distributed. It is therefore good practice to calculate diagnostics for both analyses: probability plots and other diagnostic plots [6]. However, both methods are relatively robust against violations of the assumptions. The main purpose of analysing the diagnostics is therefore to get an impression of the reliability of the results.
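A sketch of such diagnostic plots in Python (the DataFrame df and the columns y, x1, x2 are hypothetical):

    # Diagnostic plots for a linear regression: normal probability (Q-Q) plot of
    # the residuals and a residuals-versus-fitted-values plot.
    import matplotlib.pyplot as plt
    import scipy.stats as stats
    import statsmodels.formula.api as smf

    fit = smf.ols("y ~ x1 + x2", data=df).fit()
    resid = fit.resid

    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    stats.probplot(resid, dist="norm", plot=axes[0])   # normal probability plot
    axes[0].set_title("Normal Q-Q plot of residuals")
    axes[1].scatter(fit.fittedvalues, resid, s=10)
    axes[1].axhline(0, color="grey")
    axes[1].set_title("Residuals vs fitted values")
    plt.tight_layout()
    plt.show()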

Regression analysis
A number of methods fall into this category, each with specific properties and assumptions:
- (Multiple) linear regression analysis
- Logistic regression analysis
- Poisson regression
- Cox regression analysis (survival analysis)
Some comments: a multilevel variant exists for all of these methods, which can be applied when the data have a hierarchical structure. It should be pointed out that there are specific assumptions that need to be met for Cox regression analysis, and that GEE is a preferred method for logistic multilevel analysis.

The diagnostics used for these methods differ greatly. The diagnostics for linear regression have already been described above. There are also diagnostics for logistic regression analysis: see Hosmer and Lemeshow's book [7]. Diagnostic assessment is less common for the other two methods.


A special type of logistic regression analysis is produced by calculating ROC curves: the result is a table and plot of sensitivity against (1 - specificity) at different thresholds for the predictor.

In Cox regression analysis the time dependency of the covariates can be taken into consideration. It is standard practice to assume that covariates are constant over time.

(Co-)variance analysis
Often a one-way ANOVA is used when the means of more than two groups need to be compared (for two groups this equates to a t-test). If, in addition, a number of covariates (both categorical and continuous) need to be included in the model, then an analysis of (co-)variance needs to be carried out. The advantage of covariance analysis over regression analysis is that all covariates can be specified in a single model: all interactions are included automatically in the model. A disadvantage is that the variance analysis imposes strict requirements on the (continuous) covariates (the regression coefficients need to be equal in all subgroups), which are not always met. Analysis of variance is useful in the exploratory phase in order to get an impression of the influential covariates/confounders.

Multilevel analysis
Multilevel analysis is used if the data are nested, for instance patient data collected from various GP practices, where some practices are group practices in which each doctor has his/her own patients. The methodology is complicated: it is advisable to take a course on the topic and to ask for advice prior to the analysis phase.

Methods for longitudinal data
Multilevel analysis can also be used if the lowest level contains observations over time. The GEE [8] programme can be used in this situation. The GEE estimating procedures are particularly reliable when the dependent variable is dichotomous (the equivalent of logistic regression analysis). In a methodological sense, use of GEE is recommended when comparisons between groups need to be made and the researcher is not interested in the variability between individual patients.
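A minimal GEE sketch in Python with statsmodels (the long-format DataFrame df, the binary outcome pain, the covariates group and time, and the cluster identifier patient_id are hypothetical):

    # GEE for a repeated dichotomous outcome: the logistic-regression analogue
    # for clustered/longitudinal data, with an exchangeable working correlation.
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    model = smf.gee("pain ~ group + time",
                    groups="patient_id",
                    data=df,
                    family=sm.families.Binomial(),
                    cov_struct=sm.cov_struct.Exchangeable())
    result = model.fit()
    print(result.summary())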

Factor analysis
Factor analysis is often used in validating questionnaires (see also guideline 1.1B-08 Selecting, translating and validating questionnaires), particularly when there is an assumption that the questionnaire contains more than one dimension. A distinction can be made between exploratory and confirmatory factor analysis. Principal Components Analysis (PCA) is often used in the former (as well as Common Factor Analysis); the latter often makes use of software for structural models (see below). The use of factor analysis is anything but trivial: there are various pitfalls to avoid. The same advice applies here as for multilevel analysis: take a course and ask for advice prior to the analysis phase.

Methods for analysing structural models
Two programmes are often used for this: EQS and Lisrel. The standard reference text for SEM (Structural Equation Modelling) is Bollen [9]. Lisrel is obtainable through EMGO.

Special methods: exact tests
In many instances a choice (in SPSS) can be made between asymptotic and exact tests, for instance when calculating chi-square tests on a cross-tabulation. A specific statistical package has been developed for this purpose (StatXact), which can also be used to calculate exact odds ratios. Consult one of the EMGO+ biostatisticians.

Special methods: non-parametric methods, bootstrapping
Methods such as regression analysis and variance analysis impose relatively strict requirements on the data they are applied to. It is important that the distribution of the data is unimodal and, more generally, that the data are normally distributed. Various options are available if these requirements cannot be met.


Non-parametric methods
SPSS has a (large) range of non-parametric tests available, for instance the Mann-Whitney U, Kruskal-Wallis, Wilcoxon and Friedman tests. These tests do not impose all of the usual requirements on the data distribution and in most cases use the rank order of the dependent variable.
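The same rank-based tests are available outside SPSS as well; a sketch in Python with SciPy (the DataFrame df with a continuous score and a grouping variable group is hypothetical):

    # Non-parametric comparisons of two or more independent groups.
    from scipy import stats

    a = df.loc[df["group"] == "A", "score"]
    b = df.loc[df["group"] == "B", "score"]
    c = df.loc[df["group"] == "C", "score"]

    print(stats.mannwhitneyu(a, b))    # Mann-Whitney U: two independent groups
    print(stats.kruskal(a, b, c))      # Kruskal-Wallis: three or more groups
    # Paired data: stats.wilcoxon(before, after); repeated measures over more
    # than two conditions: stats.friedmanchisquare(cond1, cond2, cond3).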

Bootstrapping
This is a so-called resampling method, which allows the distribution requirements for parametric tests to be by-passed. Bootstrapping is frequently used in cost-effectiveness analyses these days. The standard reference text is Efron and Tibshirani [10].
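A minimal bootstrap sketch in Python (the cost variable costs in the DataFrame df is hypothetical; the percentile method shown here is only one of several ways to form a bootstrap confidence interval):

    # Resample the data with replacement many times to obtain a 95% confidence
    # interval for a statistic, here the mean of a skewed cost variable.
    import numpy as np

    rng = np.random.default_rng(seed=2)
    costs = df["costs"].to_numpy()

    boot_means = [rng.choice(costs, size=len(costs), replace=True).mean()
                  for _ in range(2000)]                  # 2000 bootstrap samples
    low, high = np.percentile(boot_means, [2.5, 97.5])   # percentile-based 95% CI
    print(f"Mean costs {costs.mean():.1f}, bootstrap 95% CI {low:.1f} to {high:.1f}")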

    5. Details

Appendices/references/links
[1] Dobson AJ. Introduction to Statistical Modelling. London/New York: Chapman and Hall, 1983.
[2] Edwards D. Introduction to Graphical Modelling. New York: Springer, 2nd edn., 2000. ISBN 0-387-95054-0.
[3] Adèr HJ, Kuik DJ, Hoeksma JB, Mellenbergh GJ. Methodological aspects of statistical modelling: some new perspectives. In: Stasinopoulos M, Touloumi G, eds., Statistical Modelling in Society. Proceedings of the 17th International Workshop on Statistical Modelling, Chania, Crete, Greece, July 8-12, 2002. Athens: National & Kapodistrian University of Athens and University of North London, 2002; 59-68.
[4] Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. ??????, 2nd edn., 2002.
[5] Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: Wiley, 1987.
[6] Judd. Statistical methods in the social sciences. To be sorted out.
[7] Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: John Wiley & Sons, 1989.
[8] Liang K, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13-22.
[9] Bollen KA. Structural Equations with Latent Variables. New York: John Wiley and Sons, 1989.
[10] Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York/London: Chapman & Hall, 1993.

6. Amendments
V1.1 1 Jan 2010: English translation.


Title of the document: Initial data analysis
HB Nr.: 1.4-03 avs
Rev. Nr.: 1.1
Effective date: 1 Jan 2010

    1. Aim

To get a first impression of the data. Evaluating the randomisation procedures. Evaluating and potentially imputing missing values and outliers. To get an impression of the distribution properties of the continuous variables and the numbers in the subgroups. Exploration of the validity and reliability of the measurement instruments.

    2. Definitions

3. Keywords
Randomisation, missing values, distribution of continuous variables, subgroups, scale scores, measurement level of variables.

    4. Description

Exploratory analyses
This type of analysis helps in assessing whether there are missing values and/or outliers and whether categories need to be combined.

First impression
It is advisable to always review the distribution of all the variables to be used. Frequencies are reviewed for all categorical variables (e.g. marital status, education). Descriptive statistics (percentage of missing values, average, trimmed average, standard deviation, median, other percentiles where relevant, minimum, maximum, skewness and kurtosis) are calculated for continuous variables (e.g. body weight, blood pressure). It is advisable to create figures, e.g. boxplots or histograms, in order to review the distribution.
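A quick sketch of this first impression in Python with pandas (the DataFrame df and all column names are hypothetical):

    # Frequencies for categorical variables, descriptives and boxplots for
    # continuous variables, as a first impression of the data.
    import matplotlib.pyplot as plt

    for col in ["marital_status", "education"]:
        print(df[col].value_counts(dropna=False), "\n")   # frequencies incl. missing

    cont = ["weight", "blood_pressure"]
    print(df[cont].describe())                            # mean, sd, quartiles, min, max
    print("skewness:\n", df[cont].skew())
    print("kurtosis:\n", df[cont].kurtosis())
    print("percent missing:\n", df[cont].isna().mean() * 100)

    df[cont].plot(kind="box", subplots=True, figsize=(8, 4))
    plt.tight_layout()
    plt.show()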

Outliers
So-called outliers may occur in continuous variables. These are values that, theoretically, are not out of range, but are extremely unlikely given the observed distribution. Reviewing averages and standard deviations is not enough to discover outliers; a frequency table or boxplot will need to be generated for this.

Odd combinations
Cross-tabulations can be generated for categorical variables (e.g. gender x ADL limitations) in order to assess whether odd combinations are present. Scatterplots can be created for continuous variables to reveal any unlikely combinations (simply reviewing correlations is not sufficient). For instance, a weight of 120 kg combined with a height of 1.50 metres will be an outlier in most populations. When it has been decided that a certain value or combination of values is an outlier and the true value cannot be recovered from the raw data, it needs to be recoded as missing.

Missing values
Also carefully review missing values when evaluating the distributions. Often specific codes (e.g. -1 or 9) are used for missing values. Note whether these codes have been defined as missing values. If there are missing values, consider whether these need to be imputed (filled in). There are a number of methods for this: please consult a statistician.

Normal distribution
If a given analysis requires that the variables are normally distributed, it is advisable to evaluate whether a variable is in fact normally distributed. Graphs can be used for this, such as histograms or Q-Q plots. If it is apparent that the variable is not normally distributed, then a transformation could be considered (for instance a logarithmic transformation) to see whether this improves matters.
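A sketch of such a check in Python (the strictly positive variable triglycerides in the DataFrame df is hypothetical):

    # Q-Q plots of a continuous variable before and after a log transformation.
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import stats

    x = df["triglycerides"].dropna()
    x_log = np.log(x)                                  # log transformation

    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    stats.probplot(x, dist="norm", plot=axes[0])       # original scale
    axes[0].set_title("Original")
    stats.probplot(x_log, dist="norm", plot=axes[1])   # after log transform
    axes[1].set_title("Log-transformed")
    plt.tight_layout()
    plt.show()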


Distribution of categories
Categories can be combined if the numbers in one or more categories are too small. The need for this is not always evident from an ordinary frequency distribution; it can, however, be apparent from a cross-tabulation. For instance, in a study with stratification by gender and education, the cross-tabulation of education by gender may show that for men the lowest category ("not completed primary education") rarely occurs, whereas for women the highest category ("completed university education") rarely occurs. The lowest and next lowest categories can then be combined, as well as the highest and second highest.

Evaluating the randomisation procedure
In order to evaluate whether the randomisation has been successful, the distribution of all the relevant (prognostic) variables needs to be reviewed separately for each treatment arm. Descriptive statistics (percentages, averages, median, standard deviation, range) can be used for this. Differences between groups can be tested (e.g. chi-square or t-test), although it needs to be remembered that, due to the randomisation procedure, any differences found are by definition due to chance.

    Scale scores

Prior to the items in a scale being summed, the way in which the items behave in the sample needs to be evaluated. The first step in this process is a frequency plot of the items in the scale. Usually there are positively and negatively worded items. It may be necessary to reverse-score the positive or negative items prior to summing them to a sum score, to ensure all items are scored in the same direction. The items can then be summed, possibly after response categories have been combined (e.g. "very severe" and "severe"). The second step is a reliability and/or principal components analysis. Principal components or factor analysis can be used to explore which items belong to which (sub)scales. Cronbach's alpha can be used to determine the internal consistency (homogeneity) of the scale. It is advisable to always determine Cronbach's alpha for the scales and, if possible, to run a principal components analysis (refer to the guideline Selecting, translating and validating questionnaires) to evaluate whether the expected scales are also evident in the data.
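A minimal sketch of the reliability step in Python (the DataFrame items, with one column per item already scored in the same direction, is hypothetical; the alpha formula is the standard one based on item and total-score variances):

    # Cronbach's alpha and corrected item-total correlations for a scale.
    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    print("Cronbach's alpha:", round(cronbach_alpha(items), 2))

    # Corrected item-total correlation: each item vs the sum of the other items.
    for col in items.columns:
        rest = items.drop(columns=col).sum(axis=1)
        print(col, round(items[col].corr(rest), 2))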

If it is apparent from the study that a given item does not fit the scale (e.g. the item-total correlation is too low or it does not load adequately onto the principal component), then it needs to be considered whether this item should be excluded from the sum score. This does, of course, have consequences for the comparability of scores with other studies, so it should be considered carefully. In general it is not advisable to modify frequently used scales. It is better to use the original scales and report the findings (e.g. low alpha or low item-total correlations) in the discussion of the article.

    5. Details

Audit questions:
- Has the distribution of all the variables been reviewed?
- Were there variables with a high percentage of missing values? If so, how were these dealt with?
- Have outliers been explored? If so, how?
- Have the cell numbers for central variables been taken into consideration?
- Where relevant: how were (large) deviations from normality dealt with?
- Has it been assessed whether the items belonging to a scale actually fit the scale?

    6. Appendices/references/links

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 23 Apr 2007.


Title of the document: Post-hoc and sensitivity analyses
HB Nr.: 1.4-05 dd
Rev. Nr.: 1.1
Effective date: 1 Jan 2010

1. Aim

    Specifying and correctly implementing post-hoc and sensitivity analyses.

    2. Definitions

3. Keywords
Post-hoc analyses, sensitivity analyses

4. Description
Post-hoc analyses
Post-hoc analyses are required when a significant relationship has been found between the dependent variable and a categorical independent variable with more than two categories. They allow researchers to ascertain to which categories the significance can be ascribed. For logistic or Cox regressions the output provides both the overall significance for categorical variables and the significance of the OR or RR of the separate categories with respect to the reference category. The latter are, in fact, post-hoc analyses (albeit not corrected for repeated testing). However, there are also analysis methods in which the output does not automatically provide this specification. It is tempting to decide which categories differ significantly by eyeballing the results; however, additional analyses need to be undertaken for this to be determined. An example is analysis of variance, in which so-called post-hoc tests can be used (examples include the Tukey, Duncan, Scheffé or Bonferroni tests).
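A minimal sketch of an omnibus test followed by a post-hoc test in Python (the DataFrame df with a continuous outcome and a categorical group variable is hypothetical):

    # One-way ANOVA followed by Tukey's post-hoc test to see which group means differ.
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    groups = [g["outcome"].to_numpy() for _, g in df.groupby("group")]
    print(stats.f_oneway(*groups))                    # omnibus one-way ANOVA

    tukey = pairwise_tukeyhsd(endog=df["outcome"], groups=df["group"], alpha=0.05)
    print(tukey.summary())                            # all pairwise comparisons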

Sensitivity analyses
There is always more than one way to carry out an analysis. In order to be more certain about the results it is advisable to redo the analyses in a slightly different way, often by changing one or more (external) parameters. There are a number of situations in which a sensitivity analysis is almost always desirable; these are discussed here.

Firstly, when a cut-off has been selected for the dependent or independent variable for which there is, as yet, no consensus. Even if there is a consensus, there is the question of whether this cut-off is applicable to the study population. It is advisable to repeat the analyses with different cut-off values (see the sketch below).

Secondly, there may be variables with uncertain categories. This may concern either missing values or variables that are composed of data from various sources, where there may occasionally be conflicts between the sources. An example of the latter is a disease diagnosis based on data provided by both the general practitioner and the respondent. Missing values can be substituted, meaning the respondent can be retained for the analysis. Advanced statistical imputation methods can be used for this; substitution can also be made on the basis of a best guess. It is good practice to carry out the analyses both with and without the respondents with missing values, and to compare the results. An example is an uncertain diagnosis, where all uncertain cases are set to "no disease" in one analysis and to "diseased" in another; all uncertain cases can be omitted in a third analysis.

A third situation in which a sensitivity analysis is desirable is with longitudinal data. For instance, there may be data at two time points and the analysis concerns the definition of change in the dependent variable. There is an ongoing lively discussion around this topic in the literature. Whether the choice is a difference score or one or another definition of relevant change, it is advisable to carry out the analyses using different definitions. A similar strategy is recommended in all cases in which there is uncertainty regarding the best choice of statistical measure or procedure.

Finally, sensitivity analyses are a standard component of economic evaluations. The opportunities for multivariate analysis in economic evaluations are very limited, owing to the fact that the distribution of cost data is skewed. Sensitivity analyses are used to study the effect of, for instance, the value of cost prices on the outcomes. Often subgroup analyses and analyses with imputed missing values are carried out as sensitivity analyses (see Drummond et al., 1997).
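A sketch of the cut-off sensitivity analysis mentioned above, in Python (the DataFrame df, the continuous pain_score, the exposure variable exposed and the candidate cut-offs are hypothetical):

    # Repeat a logistic regression with different cut-offs for dichotomising the
    # outcome and compare the resulting odds ratios.
    import numpy as np
    import statsmodels.formula.api as smf

    for cutoff in (3, 4, 5):                          # candidate cut-off values
        df["case"] = (df["pain_score"] >= cutoff).astype(int)
        fit = smf.logit("case ~ exposed", data=df).fit(disp=False)
        or_, lo, hi = np.exp([fit.params["exposed"], *fit.conf_int().loc["exposed"]])
        print(f"cut-off {cutoff}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")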


    5. Details

Audit questions:
- Have post-hoc tests been carried out following the omnibus tests? If not, why not?
- Have sensitivity analyses been carried out? Would it still be useful to do this for some of the variables?
- Are cost variables being used? Are sensitivity analyses needed for these?

6. Appendices/references/links
Drummond MF, O'Brien BJ, Stoddart GL, Torrance GW. Methods for the Economic Evaluation of Health Care Programmes. Oxford: Oxford University Press, 2nd edn., 1997.

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 23 Apr 2007.


Title of the document: Data analysis documentation
HB Nr.: 1.4-05 avs
Rev. Nr.: 1.1
Effective date: 1 Jan 2010

    1. Aim

    To ensure that the analyses can be properly reproduced.

    2. Definitions

    3. Keywords

4. Description
For the reproducibility and efficiency of data analysis it is important that the data analysis is clearly documented. This may be done by creating a text file for all the relevant analyses, for instance in Word. This text file needs to include both the relevant control file (with clear information about all the steps taken) and the output (with clear information for all results). The text file needs to start with the research question to be answered and the date of the analysis, and should end with a (provisional) answer to the question. See Details for an example.

5. Details
SPSS syntax can be used to document your analyses (e.g. for an article) to allow you and others to easily retrieve and reproduce everything. Text can be included in syntax files (as a kind of analysis logbook). Place your analyses in a logical order (e.g. first all the analyses for Table 1, then Table 2, etc.). Don't forget to always include the GET FILE command, so that you know which file is related to your analysis (and where it is stored). A Dutch example of this can be found here.

    Tip: A PDF writer can be used to store the output in PDF format. This saves on paper.
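The handbook itself recommends annotated SPSS syntax for this logbook; the sketch below shows the same structure as a commented Python script, purely for illustration (the file name, variable names and research question are hypothetical):

    # Analysis logbook sketch, following the structure recommended above.
    #
    # Research question: does frequent bending at work increase the risk of low back pain?
    # Date of analysis: <fill in>
    # Data file: data/backpain_followup.sav  (always record which file was used)

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_spss("data/backpain_followup.sav")    # equivalent of GET FILE

    # Table 1: baseline description
    print(df[["age", "sex", "bending_freq"]].describe(include="all"))

    # Table 2: primary analysis
    fit = smf.logit("back_pain ~ bending_freq + age + sex", data=df).fit()
    print(fit.summary())

    # (Provisional) answer: <summarise the result here once the output is checked>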

Audit question:
Does the data documentation for the analysis contain the following elements: research question, control file with clear explanations, output with clear explanations, and an answer to the research question?

    6. Appendices/references/links

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 21 Apr 2004: Title modified: "Documentation" instead of "Report". Added Details with an example of documented syntax.


Title of the document: Reporting results in tables and figures
HB Nr.: 1.4-06
Rev. Nr.: 1.1
Effective date: 1 Jan 2010

1. Aim
To present the results of your analysis in a clear and well-organised way.

    2. Definitions

3. Keywords
Graphs, figures, tables

    4. Description

Graphs and tables
It is important to present your results in a clear and well-organised way in tables and graphs, since this will contribute significantly to the attractiveness of your article, poster or PowerPoint presentation. The choice between presenting results in a table or a graph depends on the aim, the number of variables, the analysis methods and personal preferences. Some journals have a fixed policy on the number and design of tables and graphs, usually a maximum of 5 to 6 tables or figures. This should be taken into consideration when writing your article. See examples of guidelines in the Details.

Tables and graphs need to be produced in such a way that the reader is able to understand them without having to read any additional text. The title needs to be informative, and the rows and columns of tables or the axes of graphs need to be properly labelled. All abbreviations used need to be explained in full in a footnote below the table or graph. In general, tables are appropriate when you want to display the exact numbers from your analyses; graphs are more appropriate for displaying trends or associations.

It is common practice to have the tables and figures follow a specific order in an article. Table 1 is the baseline table with the most important features of the study population. The results of the analyses of the primary outcome measures are usually displayed in Table 2 (or Figure 1). The remaining tables/figures follow after this.


5. Details
Almost every results section in an article starts with a paragraph about the recruitment of research participants. These days, when describing an RCT, the majority of medical journals require a patient flow chart to be included in the article. This represents how many patients were approached, which ones were selected and excluded (and the exclusion criteria), the dropouts and the number of patients ultimately remaining who participated in the trial. This will usually be Figure 1 in the article. For other articles these details can be represented in the text. Ensure that the numbers add up and that no participants appear to have disappeared (always ask someone to read through the article to check whether it is clear) (link to example).

A flow chart is also recommended for a systematic review, reflecting how many articles have been scanned, how many full-text articles have been requested and how many articles have been included (see the systematic review guideline). A flow chart can also be useful in clarifying a complex treatment protocol.

The baseline table (usually Table 1) is intended as a description of your research population. It will include the socio-demographic variables of your research population, such as age, gender and educational level. It will also contain the most important clinical characteristics describing your population, such as the severity of the disorder and general health status. Finally, all baseline values of the determinants, outcomes and potential prognostic variables will be included as well. The average, number of observations and standard deviation can be displayed in the baseline table (or the median and range for data that are not normally distributed, or for ordinal data).

When including effect estimates (e.g. when comparing two study populations in a trial), the effect estimate (e.g. average difference, relative risk or odds ratio) should always be included together with the 95% confidence interval.

For (multiple) linear regression analysis the regression coefficient(s) (B) should be included, along with the standard error(s) or a confidence interval. The p-value may also be included; however, this is not necessary if you present confidence intervals. For (multiple) logistic regression analyses the odds ratio(s) and the 95% confidence interval are often included. For an association model (e.g. what is the effect of alcohol use on developing a cardiac infarction?) it is advisable to include both the raw effect estimates (e.g. odds ratio with 95% confidence interval) and the corrected effect estimates (e.g. corrected for age and gender).

For a prognostic model (e.g. what predicts the level of recovery after 6 months?) a measure of how well the model performs needs to be included along with the regression coefficients, e.g. the percentage of variance explained or the discriminative ability (area under the ROC curve). For a prognostic model it is also necessary to properly describe the strategy used in selecting the variables and the criteria for including variables in the model.
N.B.: Please refer to the postgraduate course in logistic regression for more information about the difference between association and prognostic models.

6. Appendices/references/links
Scientific Style and Format: the CBE Manual for Authors, Editors, and Publishers, 6th ed. Style Manual Committee, Council of Biology Editors. New York: Cambridge University Press, 1994.
Iverson C, Flanagin A, Fontanarosa PB, et al. American Medical Association Manual of Style: a Guide for Authors and Editors, 9th ed. Hagerstown, Maryland: Lippincott Williams & Wilkins; 1997.

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 31 Jan 2008.


Title of the document: Guidelines for reporting specific types of studies
HB Nr.: 1.4-07 mp
Rev. Nr.: 1.3
Effective date: 1 Dec 2011

1. Aim
To present the necessary details for correct interpretation of the published results.

    2. Definitions

3. Keywords
RCTs, meta-analyses, diagnostic study, observational study

    4. Description

For each type of study it is strongly advised to follow the international standards or statements. There are statements for RCTs, meta-analyses, diagnostic and observational studies:

CONSORT (link to www.consort-statement.org)
The CONSORT statement is intended to improve the reporting of RCTs, to enable readers to understand the trial design and correctly interpret the results.

PRISMA
PRISMA stands for Preferred Reporting Items for Systematic Reviews and Meta-Analyses. It is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. The aim of the PRISMA statement is to help authors improve the reporting of systematic reviews and meta-analyses. http://www.prisma-statement.org/

QUOROM (link to http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10584742&dopt=Citation)
The QUOROM (Quality Of Reporting Of Meta-analyses) statement is specifically intended for the reporting of meta-analyses of RCTs.

STARD (link to http://www.consort-statement.org/stardstatement.htm)
The STARD statement is specifically intended for the accurate reporting of diagnostic studies.

MOOSE (link to http://www.meduohio.edu/lib/instr/pdf/MOOSE.pdf)
The MOOSE (Meta-analysis of Observational Studies in Epidemiology) statement is intended for the reporting of meta-analyses of observational studies.

STROBE statement
The STROBE statement (Strengthening the Reporting of Observational Studies in Epidemiology) is a good checklist for preparing a publication of an observational study. The statement has been developed for cohort, case-control and cross-sectional study designs. Anybody using this type of design is advised to employ the checklist: Strengthening the Reporting of Observational Studies in Epidemiology: guidelines for reporting observational studies. The Lancet, vol. 370, 20 Oct 2007, p. 1453-1457. The explanation of the checklist items is described in a separate publication: STROBE explanation and elaboration. See also www.strobe-statement.org

COREQ
Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. International Journal for Quality in Health Care 2007;19(6):349-357.


TREND statement
The TREND statement (Transparent Reporting of Evaluations with Nonrandomized Designs) is intended for the reporting of the theories used and descriptions of intervention and comparison conditions, research design, and methods of adjusting for possible biases in evaluation studies that use nonrandomized designs. (Am J Public Health 2004;94:361-366)

In addition to these statements based on research designs, there are also statements developed for research in a specific field:

APA statement
The APA statement of the American Psychological Association includes a) standards for all journal articles, b) more specific standards for reports of studies with experimental manipulations or evaluations of interventions using research designs involving random or non-random assignment, and c) standards for articles reporting meta-analyses. American Psychologist 2008;63:839-51.

AERA statement
The AERA statement of the American Educational Research Association provides guidelines for reporting on empirical social science research in AERA publications. These guidelines apply to reports of education research grounded in the empirical traditions of social sciences. They cover, but are not limited to, qualitative and quantitative methods. Educational Researcher 2006;35:33-40.

GRIPS
A checklist of 25 items recommended for strengthening the reporting of Genetic Risk Prediction Studies (GRIPS):
- Strengthening the Reporting of Genetic RIsk Prediction Studies: the GRIPS Statement
- Strengthening the Reporting of Genetic RIsk Prediction Studies (GRIPS): Explanation and Elaboration

    5. Details

    6. Appendices/references/links

7. Amendments
V1.4: 1 Dec 2011: Addition of COREQ and GRIPS.
V1.3: 14 Feb 2011: Addition of the PRISMA statement.
V1.2: 11 Oct 2010: Addition of the TREND, APA and AERA statements.
V1.1: 1 Jan 2010: English translation and separation of statements from graphs and tables.
V1.0: 31 Jan 2008: Addition of the STROBE statement for observational studies.


Title of the document: Prognostic models
HB Nr.: 1.4-08 mh
Rev. Nr.: 1.1
Effective date: 1 Mar 2011

Martijn W Heymans
Tobias van den Berg
Danielle van der Windt
Caroline Terwee

1. Aim
To describe how a prognostic model can be developed and validated as thoroughly as possible.

2. Definitions
A prognostic model is a multivariable model consisting of a combination of predictors that are as strongly associated with the outcome as possible.

3. Keywords
Prediction, prognosis/prognostic, model, regression, validity

4. Description
This guideline describes the methods and techniques used to develop and validate prognostic models. The aim of a prognostic model is to estimate the probability of a particular outcome based on as few variables as possible. This may involve prognostic (risk or outcome) prediction (predicting the course of a disease), as well as aetiological models (predicting who will get the disease on the basis of risk factors) or diagnostic models (predicting the presence of the disease). The various steps to develop a prognostic model are summarised, from the selection of predictors to the testing of the external validity. For a few steps there is a choice between a fundamental yet simple approach and the use of more complex techniques. These options are summarised briefly in this guideline.

Contents of this guideline
Introduction
Preparation
- Choice of predictors
- Defining the outcome measure
- Choice of model
- Sample size and number of predictors
- Linearity
- Correlation between predictors
- Handling missing values
Developing a prognostic model
- Preselecting predictors and building the model
  o Univariate and stepwise regression analysis
  o Least absolute shrinkage and selection operator (Lasso)
- The performance of the prognostic model
- Creating a prediction rule
Validity
- Internal validity
- External validity


5. Details
A. Introduction
The aim of a prognostic model is to estimate (predict) the probability of a particular outcome as accurately as possible, and not just to explore the causality of the association between a specific factor and the outcome (explanation). The way in which a prognostic model is developed therefore differs from the method for building an explanatory model. In an explanatory (causal) model there is normally a single central determinant and correction for confounding; when building a prognostic model the focus is on the search for a combination of factors that is as strongly as possible related to the outcome.

Prognostic models are often developed for clinical practice, where the risk of disease development or of a disease outcome (e.g. recovery from a specific disease) can be calculated for individual patients by combining information across patients. The model can then be presented in the form of a clinical prediction rule (1). To ensure that a prognostic model is applicable in (clinical) practice, it is often preferable that the variables in the model can be determined easily.

    B. Preparation

    Choice of predictors

Prognostic models can be developed using a broad variety of biological, psychological and social predictors. The right predictors need to be carefully selected. It is advisable to include all predictors that have been shown to be strongly associated with the outcome in previous research, or that can be expected to show an association on the basis of conceptual or theoretical models. A proper systematic literature review and expert advice are important in this step. When the practical applicability of the prognostic model is important, it is preferable for predictors to be determined quickly and simply (e.g. no complex or invasive tests and no extensive questionnaires).

Defining the outcome measure
The outcome is central to the prognostic model and needs to be carefully selected. Think carefully about the nature of the outcome (which concept), the method for determining the outcome (which measurement instrument, by whom) and the length of follow-up (which measurement time points). The outcome of a prognostic model is often dichotomous (e.g. ill or not ill), but it may also be a continuous outcome (for instance, the severity of functional limitations), or the time until a certain event occurs (time to event, for instance the time until work is resumed or time until death). When defining a dichotomous outcome, occasionally a cut-off point is chosen on a continuous scale. Bear in mind that this leads to a loss of information and should therefore only be considered if there are strong arguments for it. If the outcome is dichotomised, the cut-off needs to be carefully selected, preferably on the basis of substantive arguments and the use of a conceptual or theoretical model. For instance, at what point do we define whether or not there is a case of depression?

Choice of model
The choice of the statistical model to be used in creating the prognostic model depends on the definition of the outcome measure. A logistic regression model should be chosen for a dichotomous outcome. A Cox regression model can be used for a time-to-event outcome and a linear regression model for a continuous outcome measure. There are various other options, but these will not be discussed in this guideline.

Sample size and number of predictors
The precision of the estimates in the prognostic model is highly dependent on the size of the study population. There are different ways of generating power calculations for determining the minimal sample size of the study population. This, in particular, will determine the number of variables that can be included in the regression model. A rule of thumb is that for a continuous outcome measure (linear regression) you will need at least 10-15 participants per variable in the model. For a dichotomous outcome (logistic regression) at least 10-15 "events" or non-events, whichever has the lowest number of participants, need to be available per variable (2). Events and non-events refer to whether or not the outcome occurs, for instance disease/no disease. The logistic regression rule also applies to Cox regression models. When dealing with external validation of a prognostic model, the validation cohort also needs to have a sufficient number of participants (the validation cohort refers to a cohort used to externally test the model). The 10-15 participants rule also applies here.

Linearity
The regression models discussed in this guideline presuppose a linear relationship between the predictor and the outcome. However, more often than not this relationship is non-linear rather than linear. An example is the relationship between alcohol consumption and the risk of developing a cardiac infarction, which is U-shaped. One therefore needs to consider investigating, for all potential predictors (with the exception of nominal or dichotomous variables; nominal variables should always be included as dummy variables), whether the relationship with the outcome measure is indeed linear. However, a balance must be sought between a data-driven search for sample-idiosyncratic non-linearity and patterns that truly apply to the population. Most important is not the exact form of the relationship but the increase in predictive performance. There are various options for investigating non-linearity, including spline functions. More information about the various methods for investigating linearity will be available in the EpidM course Prediction modelling that will start in 2012.

Spline functions
Spline functions can be used to further explore the linear or non-linear relationship between a predictor and the outcome (spline functions are mathematical functions that are used to carefully analyse the relationship between a predictor and the outcome measure if this is non-linear). These spline functions do not assume a linear relationship between a predictor and the outcome measure, but follow the pattern of the data in more detail. If there is a non-linear relationship between the predictor and the outcome, it can be included as a function in the regression model. The advantage is that this does not reduce the power of the regression model as much as categorising the variable and including it as dummy variables, which is often done when the relationship is non-linear. Contact Martijn W Heymans for more information about spline functions and how to apply them.
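A minimal sketch of a spline term in Python using the patsy formula interface of statsmodels (the DataFrame df, outcome y and continuous predictor age are hypothetical; as noted above, the gain in predictive performance matters more than the exact shape of the curve):

    # Compare a linear term with a natural cubic spline term for one predictor.
    import statsmodels.formula.api as smf

    linear = smf.ols("y ~ age", data=df).fit()
    spline = smf.ols("y ~ cr(age, df=4)", data=df).fit()   # natural cubic spline, 4 df

    print("R-squared linear:", round(linear.rsquared, 3))
    print("R-squared spline:", round(spline.rsquared, 3))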

Correlation between predictors
A strong correlation between two variables will affect the selection of both predictors. It is therefore sensible to generate a correlation table including all potential predictors. When variables are strongly correlated (e.g. >0.70), it is sensible to choose which of the variables you are going to use in building the model, or whether you intend to combine them into a single variable. For instance, you could choose the variable most strongly associated with the outcome measure, or the one that is easiest to measure. N.B.: Strongly correlated variables within a single, fixed model are not in themselves a problem; problems arise when forward or backward selection takes place in combination with strongly correlated (independent) variables.

Handling missing values
There will be dropouts and missing values in virtually every cohort study*. Dropouts are participants who do not (or no longer) take part in follow-up assessments and whose outcome measures are therefore missing. The number of dropouts and the reasons for dropout need to be described. If possible, the personal characteristics of the dropouts should also be described and compared with those of the participants who did take part in the follow-up assessments, in order to investigate whether selective dropout took place. In addition to dropouts there are often also (incidental) missing values, where the results of one or more predictors are missing for some of the participants.

There are various strategies for dealing with missing values. One of these is to use only the data from participants with a complete dataset (complete case analysis). In the most favourable case, where values are missing completely at random, the main consequence is that the coefficients are estimated less precisely. In less ideal cases, i.e. missing at random or missing not at random, this method will have a negative effect on the composition of the model and on the regression coefficient estimates. This method is therefore strongly discouraged.

* Dropouts will not arise during the research in case-control studies. However, there may of course be missing values; the same solutions as described here apply.

It is possible to impute missing values in a dataset. There are various methods available for this, including imputing an average value or imputing a value estimated from regression methods; however, the use of these simple imputation techniques is strongly discouraged. Multiple imputation is considered to be one of the best methods. It is common practice to consult an expert or a statistician when applying these techniques (Martijn W Heymans can be consulted for this). Make sure that the numbers of dropouts and missing values are always described in your study. For detailed information on techniques to evaluate and handle missing data we refer to the missing data guideline in the quality handbook.

D. Developing the model

Preselecting predictors and building the model
Once a set of predictors has been selected, the next step is to create the prognostic model. It is important in this process to distinguish between relevant and less relevant predictors, so that the final model can be developed with as few predictors as possible while still leading to reliable predictions. The following techniques can be used for developing a prognostic model.

    1. Univariate and Stepwise regression analysis

Selecting variables
Firstly, the relationship between each individual predictor and the outcome measure is investigated in a model that only includes that predictor and the outcome measure (univariate analysis). The relationship between the predictor and the outcome is evaluated against a specific p-value: 0.20, or lower, is often used for this. If the predictor has a lower p-value, it can be considered relevant and included in the next step. The importance of each predictor to the prognostic model can be explored in this way. Should too many variables be retained in this pre-selection phase, then you can be stricter in the level of selection, i.e. choose a lower p-value such as p < 0.10 or p < 0.05. An important note is that pre-selection of predictors based on univariate statistical significance is arbitrary. It is a better choice to make use of previous research and expert opinion for the first selection of predictors, without relying too much on statistical pre-selection alone.
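A minimal sketch of this univariate pre-selection step in Python/statsmodels, keeping predictors with p < 0.20 (data, predictor and outcome names are purely illustrative):

```python
# Minimal sketch: univariate pre-selection of predictors at p < 0.20.
# Predictor and outcome names are hypothetical; this step should supplement,
# not replace, selection based on literature and expert opinion.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({"bending": rng.binomial(1, 0.4, n),
                   "age": rng.normal(45, 10, n),
                   "smoking": rng.binomial(1, 0.3, n)})
logit_p = -1.5 + 0.9 * df["bending"] + 0.03 * (df["age"] - 45)
df["back_pain"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

selected = []
for predictor in ["bending", "age", "smoking"]:
    fit = sm.Logit(df["back_pain"], sm.add_constant(df[[predictor]])).fit(disp=0)
    if fit.pvalues[predictor] < 0.20:          # common, but arbitrary, threshold
        selected.append(predictor)
print("Pre-selected predictors:", selected)
```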

You may also choose to work with groups of variables. For instance, you could first generate the model on the basis of all easily obtainable variables (e.g. details from the case history). The most important predictors can then be selected from this group of variables (see Building the model). You can then add the next group of variables (e.g. details from the physical examination). Select the most important predictors from this group of variables, plus from the variables that have been retained from the previous group, and so on.

Building the model
The options for this are to use a forward or a backward selection method, or a combination of the two (stepwise regression). Forward and backward selection methods can be used to select the predictors for the model step by step. In the forward selection method you add variables to the model, whereas in the backward selection method you remove variables from the model. The backward selection method is preferred, as it leads to fewer errors in the estimates for the predictors and in selecting the most relevant predictors. For these reasons this method is discussed in more detail here.
N.B.: Selecting predictors by using forward or backward selection techniques will always generate more problems than selecting variables on the basis of previous research (prospective studies or systematic literature reviews) or by consulting clinical experts, for instance choosing important variables on the basis of a Delphi procedure. It is therefore advisable to use forward and backward selection techniques as little as possible.

  • 7/29/2019 Guidelines in Analysis Phase.pdf

    23/32

    23

In backward selection all the selected variables are first entered into the model at the same time. Subsequently, the variable with the highest p-value (i.e. the variable contributing the least) is manually removed on the basis of the Wald test (which gives the significance level of each predictor), and the model is re-run. This step is repeated until no variables with a p-value larger than 0.10 or 0.20 remain. A p-value of 0.10 or 0.20 is commonly used in prognostic models, as variables that are less strongly associated with the outcome may still make a relevant contribution to the prediction.
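The loop below is a minimal, hypothetical sketch of such a manual backward selection for a logistic model: the predictor with the largest Wald p-value is dropped and the model re-run until all remaining p-values are below 0.20.

```python
# Minimal sketch: manual backward selection based on Wald p-values (threshold 0.20).
# Data and variable names follow the same hypothetical example as the sketch above.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({"bending": rng.binomial(1, 0.4, n),
                   "age": rng.normal(45, 10, n),
                   "smoking": rng.binomial(1, 0.3, n)})
df["back_pain"] = rng.binomial(1, 1 / (1 + np.exp(1.5 - 0.9 * df["bending"]
                                                  - 0.03 * (df["age"] - 45))))

candidates = ["bending", "age", "smoking"]
while True:
    fit = sm.Logit(df["back_pain"], sm.add_constant(df[candidates])).fit(disp=0)
    p_values = fit.pvalues.drop("const")       # Wald p-values of the current predictors
    if p_values.max() < 0.20 or len(candidates) == 1:
        break                                  # all remaining predictors contribute enough
    candidates.remove(p_values.idxmax())       # drop the weakest predictor and refit
print("Final model predictors:", candidates)
```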

Sometimes it may be informative, following this procedure, to add specific variables that did not end up in the final model (but were perhaps expected to fit in it), to assess whether they make a significant contribution to the final model. This process is occasionally successful. It may also be interesting to interchange variables on the basis of the correlation between variables (e.g. replacing a predictor by a correlated variable that is easier to measure), to assess whether this generates an equivalent, but more easily applicable, model.

    2. Least absolute shrinkage and selection operator (Lasso)

The Lasso is an advanced technique for the selection of variables. The Lasso is able to shrink regression coefficients to exactly zero, which is equivalent to not selecting those variables in the multivariable analysis. The Lasso thus combines shrinkage with variable selection and so does not need a separate shrinkage step (for more on shrinkage see paragraph G below). Furthermore, with the Lasso the number of potential prognostic variables to select from can be much larger than with normal backward selection. To learn more about this technique and how to apply it, contact Martijn W Heymans. The method is promising but has not yet been applied much in epidemiological studies.
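A minimal scikit-learn sketch of an L1-penalised (lasso) logistic regression; coefficients shrunk exactly to zero correspond to predictors that are not selected. The data are simulated and the penalty strength C would in practice be tuned (e.g. by cross-validation); this is an illustration, not a recommended default.

```python
# Minimal sketch: lasso (L1-penalised) logistic regression with scikit-learn.
# Simulated data; only the first two of eight candidate predictors carry signal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 8))                              # 8 candidate predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 1]))))

X_std = StandardScaler().fit_transform(X)                  # lasso needs comparable scales
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X_std, y)
print("Coefficients (zeros = not selected):", lasso.coef_.round(2))
```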

E. The performance of the prognostic model
Once you have developed a prognostic model, it is also important to investigate how well the model works, that is to say, how well does the model predict the outcome? The section below describes which techniques, depending on the choice of model, can be used to test how well your prognostic model works (1):

Linear regression
The percentage of variance explained (R²): this indicates the percentage of the total variance of the outcome measure that is explained by the predictors in the prognostic model.

Logistic and Cox regression
Calibration: calibration can be used to assess how well the observed probability of the outcome agrees with the probability predicted by the model. This can also be presented graphically in a calibration plot, in which groups of predicted probabilities of the outcome are plotted against groups of observed probabilities (groups of 10 are often used). Subsequently you can assess the extent to which these groups lie along the perfect calibration line, which forms a 45-degree angle with the horizontal axis. The Hosmer-Lemeshow test can also be used to investigate how well the predicted probabilities agree with the observed probabilities. This test should not be statistically significant (null hypothesis: there is no difference between predicted and observed values).
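A minimal Python sketch of a Hosmer-Lemeshow-type calibration check: predicted probabilities are grouped into deciles and observed event counts are compared with the counts predicted by the model. The outcomes and predictions are simulated for illustration.

```python
# Minimal sketch: Hosmer-Lemeshow-type calibration check over deciles of predicted risk.
# y and p_pred are simulated; by construction the predictions are well calibrated.
import numpy as np
import pandas as pd
from scipy.stats import chi2

rng = np.random.default_rng(6)
p_pred = rng.uniform(0.05, 0.90, 500)          # predicted probabilities from a model
y = rng.binomial(1, p_pred)                    # observed 0/1 outcomes

groups = pd.qcut(p_pred, 10, labels=False)     # 10 groups of predicted risk
table = (pd.DataFrame({"y": y, "p": p_pred, "g": groups})
         .groupby("g")
         .agg(observed=("y", "sum"), expected=("p", "sum"), n=("y", "size")))

# Hosmer-Lemeshow statistic; df = number of groups - 2
hl = (((table["observed"] - table["expected"]) ** 2
       / (table["expected"] * (1 - table["expected"] / table["n"]))).sum())
print("Hosmer-Lemeshow p-value:", round(1 - chi2.cdf(hl, df=len(table) - 2), 3))
```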

Discrimination: this indicates how well the model discriminates between people with and without the outcome. If there are few predictors in the model, many people will fall into the same group of predicted probabilities and the model will not be able to discriminate very well between groups. If there are numerous predictors in the model, few people will fall into the same group and the model will have better discriminatory power. An ROC curve can be generated for the predicted probabilities to determine the level of discrimination. The Area Under the Curve (AUC) of the ROC curve is a measure of the discriminatory power of the model, that is, how well the model is able to discriminate between people with and without the outcome based on the predicted probabilities (3). An AUC of 0.5 indicates that the model does not discriminate at all (no better than tossing a coin); an AUC of 1.0 indicates perfect discrimination.
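A minimal sketch of calculating the AUC with scikit-learn (simulated outcomes and predicted probabilities):

```python
# Minimal sketch: discrimination measured as the area under the ROC curve (AUC).
# Simulated outcomes and predicted probabilities; 0.5 = no discrimination, 1.0 = perfect.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
p_pred = rng.uniform(0, 1, 500)
y = rng.binomial(1, p_pred)                    # predictions carry real information here
print("AUC:", round(roc_auc_score(y, p_pred), 2))
```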


Reclassification tables: this is a novel method to evaluate the performance of a prediction model and can be seen as a refinement of the discrimination obtained from the ROC curve (4). This method is especially useful for detecting an improvement in discrimination when a new variable is added to an existing prediction model. It makes use of the reassignment of subjects with and without the outcome to their corresponding risk categories. When a new variable is added to the model and prediction is improved, subjects with the outcome are reassigned to a higher risk category; this means improved reclassification. When subjects with the outcome are reassigned to lower risk categories, reclassification is worsened. For subjects without the outcome it works in the opposite direction. The Net Reclassification Improvement (NRI) and the Integrated Discrimination Improvement (IDI) can be used to test the significance of the reclassification and to create confidence intervals.

F. Creating a prediction rule
For logistic and Cox regression models the regression coefficients can be used to calculate the outcome (predicted probabilities) based on individual patient characteristics (the values of the determinants). The regression coefficients can be transformed into risk scores in order to facilitate use of the prediction rule in practice. A frequently used method is to divide the regression coefficients by the lowest value, or to multiply the coefficients by a constant, for instance 10. A score card containing these scores can then be generated, allowing the probability of the outcome to be easily calculated for a given individual. This is easy to use in practice; refer to the article by Kuijpers et al. (2006) for an example (5). Another option is to implement the rule as a mathematical algorithm on a website.
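A minimal sketch of turning (purely illustrative) regression coefficients into such a score card by dividing each coefficient by the smallest one and rounding; the coefficient values and variable names are hypothetical.

```python
# Minimal sketch: a score card derived from purely illustrative regression
# coefficients by dividing each coefficient by the smallest one and rounding.
coefficients = {"frequent bending": 0.85, "age > 50": 0.42, "smoking": 0.21}

smallest = min(abs(b) for b in coefficients.values())
score_card = {name: round(b / smallest) for name, b in coefficients.items()}
print(score_card)   # {'frequent bending': 4, 'age > 50': 2, 'smoking': 1}

# A patient's total score is the sum of the points for the characteristics present;
# higher totals correspond to higher predicted probabilities of the outcome.
```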

G. Validity
This is perhaps the most important part of developing a prediction rule. Prediction models commonly perform better in the dataset used to develop the model than in new datasets (new subjects). This means that the model's regression coefficients and performance measures are too optimistic and have to be adapted to new situations (1, 6). One way to adapt prediction models is to shrink (i.e. make smaller) the regression coefficients before the model is applied to new subjects. Internal and external validation are used to estimate the amount of optimism. In other words, validating the model explores how well the predictions generated by the prognostic model hold for future patients or comparable patients who were not part of the study population. Determining the validity of a prediction rule can be achieved in a number of ways, which are discussed briefly below. A good reference for a more comprehensive overview is Vergouwe et al. (7).

    A distinction is made between internal and external validity when validating a prediction rule.

Internal validity
For internal validity the model is developed and validated using exactly the same dataset of patients. Techniques that can be used to determine internal validity include data-splitting (where the dataset is split in two at random), cross-validation (where the dataset is split into more than two parts at random) and bootstrapping (a type of simulation technique). The last method is recommended, as it makes efficient use of all the data.
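A minimal sketch of bootstrap internal validation for the AUC: the model is refitted in each bootstrap sample and the optimism is estimated as the average difference between the bootstrap-sample AUC and the AUC of that same model in the original data. Ideally the entire model-building strategy (including variable selection) is repeated in each bootstrap sample; this sketch only refits a fixed model on simulated data.

```python
# Minimal sketch: bootstrap estimate of optimism in the apparent AUC. The model is
# refitted in each bootstrap sample; optimism is the mean difference between the AUC
# in the bootstrap sample and the AUC of that model in the original data. Simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(10)
X = rng.normal(size=(300, 5))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.7 * X[:, 0] - 0.5 * X[:, 1]))))

apparent = roc_auc_score(y, LogisticRegression().fit(X, y).predict_proba(X)[:, 1])
optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))                   # draw a bootstrap sample
    model = LogisticRegression().fit(X[idx], y[idx])        # repeat the model fitting
    auc_boot = roc_auc_score(y[idx], model.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, model.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)
print("Apparent AUC:", round(apparent, 3),
      "| optimism-corrected AUC:", round(apparent - np.mean(optimism), 3))
```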

External validity
For external validity the model is developed in one cohort of patients and its validity is determined in another cohort of comparable patients.

The previously described measures, such as the variance explained (R²), calibration and discrimination, are used to determine validity.

Contacts:
If you would like more information about developing and/or validating prediction rules, please contact Martijn W Heymans. A new Epidm course on Prediction modelling will also start in 2012.

5. Audit questions

  • 7/29/2019 Guidelines in Analysis Phase.pdf

    25/32

    25

1. Was the selection of the predictors based on a literature search and advice from experts?
2. Has the outcome measure been clearly defined?
3. Have dropouts and missing values been described and have their potential consequences been discussed in the research report (have missing values been dealt with in a sensible way, e.g. multiple imputation)?
4. Is the sample size of the study population sufficient?
5. Has linearity been assessed for all potential predictors?
6. Has a correlation table been created for all potential predictors?
7. Has a (manual) backward selection been used for building the model?
8. Has the model quality been assessed? If possible, have calibration and discrimination been assessed?
9. Was the prediction model validated?

6. Appendices/references/links
1. Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, 2001. (A new edition will be available in June/July 2011.)
2. Peduzzi P, Concato J, Feinstein AR, Holford TR. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epidemiol 1995;48(12):1503-10.
3. Harrell F, Lee K, Mark D. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
4. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21(1):128-38.
5. Kuijpers T, van der Windt DA, Boeke AJ, Twisk JW, Vergouwe Y, Bouter LM, van der Heijden GJ. Clinical prediction rules for the prognosis of shoulder pain in general practice. Pain 2006;120(3):276-85.
6. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer Science+Business Media, 2009.
7. Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JD. Validity of prognostic models: when is a model clinically useful? Semin Urol Oncol 2002;20:96-107.

7. Amendments
V1.0: 1 Jan 2010. English translation.
V1.1: 1 Mar 2011. Several textual changes and additions; replacement of bootstrapping by the Lasso technique; addition of reclassification tables; more emphasis on validation of the model; references updated.


Page 26 of 32
Rev. Nr.: 1.1    Effective date: 5 July 2012
Title of the document: Handling Missing Data
HB Nr.: 1.4-09 aw

    1. Aim

    To give researchers a structured guideline for handling missing data

    2. Definitions

3. Keywords
Missing data, Missing completely at random, Missing at random, Missing not at random, Imputation.

4. Description
4.1 Introduction
Missing data is a common problem in all kinds of research. The way you deal with it depends on how much data is missing, the kind of data that is missing (single items, a full questionnaire, a measurement wave), and why the data are missing. Handling missing data is an important step in several phases of your study.

4.2 Why do you need to do something with missing data?
The default option in SPSS is that cases with missing values are not included in the analyses. Deleting cases or persons results in a smaller sample size and larger standard errors. As a result, the power to find a significant result decreases, and the chance that you correctly accept the alternative hypothesis of an effect (compared to the null hypothesis of no effect) becomes smaller. Secondly, you may introduce bias in effect estimates, such as mean differences (from t-tests) or regression coefficients (from regression analyses). When the group of non-responders is large and you delete them, your sample characteristics differ from your original sample and from the population you study, because there may be a difference in characteristics between responders and non-responders. Therefore you need to inspect the missing data before doing further analyses. Thus, always check the missing data in your dataset before starting your analyses, and never simply delete persons with missing values from your dataset (the default option in SPSS).

4.3 What to do with missing data in different phases of your study
Data preparation:
If you work with questionnaires, make sure that all questions are clear and applicable to your respondents. If necessary, use a 'not applicable' answer option. To decrease the chance of missing data, use digital applications to collect your data, such as web-based questionnaires in which you can set the option that answering a question is required. You can also use these applications for sending reminders and tracking the respondents' progress. If you work with physical or physiological data, the most frequent cause of missing data is a technical problem with the instruments. Testing the instruments in a pilot study will partly prevent these problems.

Data collection:
Closely monitor the completeness of the data when you receive or obtain the data. When you detect missing data during data collection, try to complete your data: look back in the raw data (questionnaires), or ask your respondents to fill out the missing items. Describe in your logbook why data are missing. This helps you to decide whether data are missing at random or not.

Data processing:
Investigate how much data is missing (see 4.4), estimate the need for imputation, and think about the most adequate imputation method (see 4.5 and further).

Data analyses:
If you have missing values in your dataset when starting your analyses, remember that casewise and listwise deletion (the default in SPSS regression and ANOVA procedures) may hamper the reliability of your results (see 4.2).


4.4 How much data is missing?
SPSS can help you to identify the amount of missing data. When you are interested in the percentage of missing values for each variable separately (e.g. an item of a questionnaire), use the Frequencies option in SPSS:
1. Select Analyze → Descriptive Statistics → Frequencies.
2. Move all variables into the Variable(s) window.
3. Click OK. The Statistics box tells you the number of missing values for each variable.
However, be aware that this only gives you information about the percentage of missing values for each variable separately. It is more important to study the overall percentage of missing data, especially when you use more variables in your analysis (a pandas sketch of both quantities is given at the end of this section).

When you are interested in the overall percentage of missing data, use the following option:
1. Select Analyze → Multiple Imputation → Analyze Patterns.
2. Move all variables into the Variable(s) window.
3. Click OK. The output tells you the percentage of variables with missing data, the percentage of cases with missing data, and the number of missing values. The final pie chart tells you the overall percentage of missing data; note the 5% borderline. Patterns of missing data are also presented.
4. Tip: use the Help button and click 'Show me' for more information about the options and output in SPSS.

When you want to find out more about the patterns of missing data and the relation between missing data on different variables, use the following option:
1. Select Analyze → Missing Value Analysis.
2. Move all variables of interest into the Quantitative or Categorical Variable(s) window.
3. Use the Patterns button to get information about the relation between missing data on several variables.
4. A tutorial of the Missing Value Analysis procedures in SPSS (version 16 and later) can be found via the Help button. A user's guide can be downloaded freely from the internet.
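For researchers working outside SPSS, a minimal pandas sketch of the same quantities (percentage missing per variable, per case, overall, and the missing data patterns), using a small hypothetical dataset:

```python
# Minimal sketch: amount and patterns of missing data in pandas (hypothetical data).
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [23, 45, np.nan, 37, 52],
                   "income": [np.nan, 3200, np.nan, 2800, 4100],
                   "pain": [2, 5, 3, np.nan, 1]})

print((df.isna().mean() * 100).round(1))                        # % missing per variable
print(round(df.isna().any(axis=1).mean() * 100, 1), "% of cases have a missing value")
print(round(df.isna().values.mean() * 100, 1), "% of all values are missing")
print(df.isna().value_counts())                                 # missing data patterns
```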

4.5 What kind of data is missing?
The next step is to identify the kind of data that is missing. You can find out this information from the steps described in 4.4:
1. A single item, or several items, of a questionnaire is missing.
2. A full questionnaire or a single variable (such as blood pressure) is missing.
3. A measurement wave is missing (in longitudinal / randomized studies).
The way you deal with missing data depends on the type of missing data.

4.6 What type of missings do you have?
Missing values are either random or non-random. Random missing values may occur because the subject accidentally did not answer some questions; for example, the subject may be tired and/or not paying attention and misses the question. Random missing values may also result from data entry mistakes. Non-random missing values may occur because subjects purposefully do not answer some questions. For example, the question may be confusing, so respondents do not answer it. The question may also not provide appropriate answer choices, such as 'no opinion' or 'not applicable', so the subject chooses not to answer it. Subjects may also be reluctant to answer some questions because of social desirability concerns about the content of the question, such as questions about sensitive topics like income, past crimes, sexual history, or prejudice or bias toward certain groups.

Think about your dataset: is there a chance that the missing values are non-random?


In 1976, Rubin developed a typology for missing data.

Type of missings and description:

MCAR: Missing Completely At Random. The data are MCAR when the probability that a value for a certain variable is missing is unrelated to the values of other observed variables and unrelated to the variable with missing values itself. An example is when respondents accidentally skip questions. In other words, the observed values in your dataset are just a random sample of the dataset as it would have been had it been complete.

MAR: Missing At Random (most of the time). The data are MAR when the probability that a value for a certain variable is missing is related to observed values on other variables. An example is when older respondents have more missing values than younger respondents; however, within the groups of older and younger respondents the data are still MCAR. Another example is when respondents with low scores on the first wave are not invited for a second wave.

MNAR: Missing Not At Random. The data are MNAR when the probability that a value for a certain variable is missing is related to the scores on that variable itself. An example is when respondents with a low income intentionally skip the income question because it violates their privacy. In that case the probability that an observation is missing depends on information that is not observed (the value of the income itself), because only low values are missing. MNAR is a serious problem, which cannot be solved with a technique such as multiple imputation.

How do you know what kind of missings you have?
There are three kinds of methods.
1. First, you can inspect the data yourself. Are the missing values equally distributed in the data? Are low and/or high scores missing? If the missing values are not equally spread, this may be an indication that the data are MNAR. With this method you must know a priori what the distribution of the variable normally looks like, i.e. whether it is normal or skewed; you need this information before you can judge which part of the data suffers from missing values. This method only applies if your dataset is large.
2. Second, SPSS can test whether the respondents with missing data differ from the respondents without missing data on important variables (Analyze → Missing Value Analysis → select important variables → Descriptives → t-tests formed by indicator variables; see the sketch after this list). A significant difference is an indication of MAR. Be aware that if your sample size is large (>500), this t-test may be statistically significant even when the difference is trivially small and the data are not truly MAR. So just looking at the means and their difference may be good enough: if the mean difference is very small, this may be an indication of MCAR.
3. In SPSS (Analyze → Missing Value Analysis → EM button) it is also possible to do a test for MCAR data, called Little's test. A tutorial of the Missing Value Analysis procedures in SPSS (version 16 and later) can be found via the Help button.
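A minimal Python sketch of point 2 above: compare an important observed variable (here, age) between respondents with and without a missing value on another variable (here, income). The data are simulated with a MAR-like mechanism, so a clear difference is expected; all names are hypothetical.

```python
# Minimal sketch: compare the mean age of respondents with and without a missing
# income value. Data are simulated so that older respondents skip the income
# question more often (a MAR-like mechanism).
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(12)
df = pd.DataFrame({"age": rng.normal(50, 10, 400)})
df["income"] = np.where(rng.uniform(size=400) < (df["age"] - 30) / 60,
                        np.nan, rng.normal(3000, 500, 400))

missing = df["income"].isna()
t_stat, p_value = ttest_ind(df.loc[missing, "age"], df.loc[~missing, "age"])
print("Mean age (income missing):", round(df.loc[missing, "age"].mean(), 1),
      "| mean age (income observed):", round(df.loc[~missing, "age"].mean(), 1),
      "| p =", round(p_value, 4))
```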

It is important to note that you are not able to test whether your missing data are MAR or MNAR. The above-mentioned procedures (1 and 2) will only give you an indication. Pay attention to the possibility of MNAR, because all analyses have serious problems when your missing data are MNAR.

4.7 How to handle missing data?
Missing data is random:


For MCAR and MAR, many missing data methods have been developed in the last two decades (Schafer & Graham, 2002). Although MCAR seems to be the least problematic mechanism, deleting cases can still reduce the power to find an effect. It is argued that the MAR mechanism is the one most frequently seen in practice. An argument for this is that most research studies multifactorial or multivariable problems, so when data on a variable are missing this is mostly related to other variables in the dataset.

    Missing data is not random:

For MNAR, imputation is not sufficient, because the missing data are systematically different from the available data, i.e. the complete cases have become a selective group of persons. If you think your data are MNAR, it is wise to contact a statistician from EMGO+ who is willing to help you.

For MCAR and MAR, there are roughly two kinds of imputation techniques: single and multiple imputation.

Single imputation is possible in SPSS and is an easy way to handle missing values when only a few cases have missing data (less than 5%) and you think your missing values are MCAR or MAR. However, after single imputation the cases are more alike, which may result in an underestimation of the standard errors, i.e. confidence intervals that are too narrow. This increases the chance of a Type I error (the null hypothesis of no effect is rejected while there truly is no effect). This method is therefore less adequate when you have more than 5% missing data.

Multiple imputation is more complex, but it is also implemented in SPSS 17.0 and later versions. Multiple imputation takes the uncertainty about the missing values into account and is therefore preferred over single imputation. When the amount of missing data is high (exceeds 5% on several variables and for different persons), multiple imputation is the more adequate choice.

Imputation techniques
Single imputation
Single imputation techniques are based on the idea that in a random sample every person can be replaced by a new person, provided that this new person is randomly chosen from the same source population as the original person. In that case you can use the observed data of the other persons to estimate the distribution of the variable in the source population. It is called single imputation because each missing value is imputed once. There are many methods for single imputation, such as replacement by the mean, regression imputation, and expectation maximisation (EM). Expectation maximisation is preferred, because in the other methods the variance and standard error are reduced and the chance of Type II errors increases. Expectation maximisation forms a missing data correlation matrix by assuming a distribution for the missing data and imputes missing values based on the likelihood under that distribution. Single imputation is possible in SPSS (Analyze → Missing Value Analysis → EM button). Contact a statistician from EMGO+ who is willing to help you with this procedure.

For the imputation of a missing score on a single item of a questionnaire (see 4.5), SPSS syntax can be found at http://www.tilburguniversity.edu/nl/over-tilburg-university/schools/socialsciences/organisatie/departementen/mto/onderzoek/software/tw.zip (software for two-way imputation in SPSS; Van Ginkel & Van der Ark, 2003a) or in rf.zip at the same location (software for response function imputation in SPSS; Van Ginkel & Van der Ark, 2003b).


Multiple imputation (MI)
The difference with single imputation is that in MI each missing value is imputed several times, so that several imputed datasets are created. The different imputations are based on random draws from different estimates of the underlying distribution in the source population. In this way the imputed values come from different distributions and are less alike, which creates more uncertainty in the dataset and therefore gives larger (more honest) standard errors than single imputation. The number of imputations depends on the amount of missing data, but usually 5 to 10 imputations are enough. A drawback of this method is that several imputed datasets are created and the statistical analysis has to be repeated in each of them; finally, the results have to be pooled into a summary measure. Most statistical packages can do this automatically. Multiple imputation is possible in recent versions (17 and later) of SPSS (Analyze → Multiple Imputation → Impute Missing Data Values). For more information see the references. Contact a statistician from EMGO+ who is willing to help you with this procedure.
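As an illustration outside SPSS, a sketch of multiple imputation by chained equations with statsmodels' MICE: the missing values are imputed repeatedly, the analysis model is fitted in each imputed dataset, and the results are pooled automatically. The dataset, the 20% missingness on x1 and the analysis formula are all hypothetical, and the exact statsmodels interface should be checked against the current documentation.

```python
# Minimal sketch: multiple imputation by chained equations with statsmodels.
# The data, the 20% missingness on x1 and the analysis formula are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation.mice import MICE, MICEData

rng = np.random.default_rng(13)
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 2 + 0.5 * df["x1"] - 0.3 * df["x2"] + rng.normal(scale=1.0, size=n)
df.loc[rng.uniform(size=n) < 0.2, "x1"] = np.nan            # ~20% missing on x1

imputed = MICEData(df)                                       # chained-equation imputations
analysis = MICE("y ~ x1 + x2", sm.OLS, imputed)
results = analysis.fit(n_burnin=10, n_imputations=10)        # 10 imputed datasets, pooled
print(results.summary())
```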

Sensitivity analysis
After imputation, a sensitivity analysis is needed to determine how your substantive results depend on how you handled the missing data. Follow these steps:
1. Do a complete case analysis (the default option in SPSS; cases with missing values are not included).
2. Repeat the analysis on the imputed data.
3. Compare the substantive conclusions and decide how to report them.

When is imputation of missing data not necessary?
1) When your missing data are MCAR or MAR and you use maximum likelihood estimation techniques in analyses such as Structural Equation Modelling (SEM) or Linear Mixed Models (LMM), imputation of missing data is not necessary. These techniques use the available data, ignore the missing values, and still give correct results. In such situations you do not have to use an extra imputation technique to handle your missing values. Missing data that are MNAR remain a problem for these methods.
2) A different approach may be used for descriptive studies. If you only want to show the (observed) study data (means and standard deviations), for example to compare them with other countries or settings, without directly linking them to a conclusion, imputation is not immediately needed. However, for the evaluative statistics (t-tests, regressions, etc.) the missing data certainly need to be addressed. So if you use statistical tests to compare the descriptives, imputation is needed (depending, of course, on the amount and type of missing data): in that case you link your descriptives to a conclusion and want a correct p-value / 95% CI, and therefore you need to use the data with imputed values. Do not forget the reviewer, who may sometimes have problems with the use of imputed and non-imputed data in one paper; be clear about the imputation and point out why you chose to present imputed or non-imputed data.

4.8 Summary
- Make every effort to avoid missing data, or failing that, to understand how much data is missing and why.
- Understand the missing data mechanisms (MCAR, MAR, MNAR) and their implications.
- Avoid default methods (listwise deletion, pairwise deletion).
- Avoid default fix-ups (mean imputation, etc.) where possible.
- Use multiple imputation to take proper account of missing values.
- Do a sensitivity analysis.


    5. Details

6. Appendices/references/links
Multiple Imputation Methods, Niels Smits (technical literature).
http://www2.chass.ncsu.edu/garson/pa765/missing.htm
http://www.ssc.upenn.edu/~allison/MultInt99.pdf (especially for Multiple Imputation)