

Journal of Clinical Epidemiology 67 (2014) 152–159

ORIGINAL ARTICLES

Criteria for monitoring tests were described: validity, responsiveness, detectability of long-term change, and practicality

Katy J.L. Bell a,b,*, Paul P. Glasziou a, Andrew Hayen c, Les Irwig b

a Centre for Research in Evidence-Based Practice (CREBP), Bond University, Gold Coast, QLD 4229, Australia

b Screening and Diagnostic Test Evaluation Program (STEP), School of Public Health, Building A27, University of Sydney, Sydney, NSW 2006, Australia

c School of Public Health and Community Medicine, University of New South Wales, Sydney, NSW 2052, Australia

Accepted 19 July 2013; Published online 1 November 2013

Abstract

Objectives: To describe how evidence from trials and cohort studies may be used to guide choice of test for monitoring patients with chronic disease.

Study Design and Setting: Exploration of potential criteria for choosing the best monitoring test. Criteria are defined and options for assessment measures for test performance on each criterion discussed.

Results: Monitoring in clinical practice occurs in three main phases: before treatment, response to treatment, and long-term monitoring. Four important criteria may be used to choose the best test for monitoring a patient in each of these phases. Clinical validity describes the ability of the test to predict the clinically relevant outcome that we are trying to control or prevent. Responsiveness describes how much the test changes in response to an intervention relative to background random variation. Detectability of long-term change describes the size of changes in the test over the long term relative to background random variation. Practicality describes the ease of use, invasiveness, and cost of the test. Test performance generally requires longitudinal data from trial and/or cohort studies using statistical methods such as those discussed.

Conclusion: Four specific criteria can help clinicians inform evidence-based decisions on which monitoring test to use. © 2014 Elsevier Inc. All rights reserved.

Keywords: Chronic disease; Disease management; Diagnostic tests; Biological markers; Reproducibility of results; Statistical models

Conflict of interest: The authors have no potential conflicts of interest to declare.

Funding sources: The authors have received funding from the Australian National Health and Medical Research Council (Program Grant No. 633003, Early Career Fellowship No. APP1013390). The funders had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.

* Corresponding author. Tel.: +61-293515994; fax: +61-293515049.

E-mail address: katy.bell@sydney.edu.au (K.J.L. Bell).

0895-4356/$ - see front matter © 2014 Elsevier Inc. All rights reserved.

http://dx.doi.org/10.1016/j.jclinepi.2013.07.015

What is new?

Key findings
Monitoring in clinical practice occurs in three main phases: before treatment, response to treatment, and long-term monitoring. Four important criteria may be used to choose the best test for monitoring a patient in each of these phases: clinical validity, responsiveness, long-term change, and practicality. The performance of tests on these criteria may be informed from trial and/or cohort data.

What this adds to what was known?
Evidence can be used to compare different tests' performance across these key criteria to judge which test is best for monitoring patients at each phase of clinical management.

What is the implication and what should change now?
Using methods we describe in this article, clinicians may make evidence-based decisions on which monitoring test to use.

1. Introduction

Monitoring is an important element of a patient's long-term care. Consider a 68-year-old patient with early stage chronic kidney disease who has an elevated clinic blood pressure (BP) measurement. A good clinician will wish to do a series of BP measurements to establish a baseline, assess the long-term BP, and assess the need for intervention. If BP treatment is required, then the clinician will wish to monitor the response to treatment to check if it has been adequate and monitor for adverse effects such as hyperkalemia. The patient will need to be monitored long term for not only BP changes but also further decline in renal function.

As this scenario illustrates, clinicians monitor their patients for many reasons: to assess the progress of disease, the need to start or change treatment, the adequacy of treatment, and the development of complications [1]. Monitoring may be summarized as consisting of three important phases: (1) pretreatment monitoring, (2) initial response monitoring, and (3) on-treatment long-term monitoring. These are illustrated in Fig. 1, which shows a theoretical trajectory of a monitoring test over time. Pretreatment monitoring is done to screen individuals on the need to start treatment. As these individuals are usually at lower risk of adverse clinical outcomes from disease, monitoring intervals are often longer during this phase than subsequent ones. Initial response monitoring is done over a relatively short time to assess whether the individual's response to treatment is the same as expected from mean response observed in trials. On-treatment long-term monitoring is done after initial treatment is stabilized to assess whether treatment remains adequate over the long term. Further clinical intervention may be indicated, for example, because of problems with adherence, lifestyle changes that modify the effect of treatment, or natural progression of the disease and the development of complications.

Methods have been described to help choose which tests or markers are best for diagnosis, predicting risk, and treatment decisions [2–4], but there are no accepted approaches for choosing which tests should be used for clinical monitoring. However, just as with diagnosis, there are often a number of different tests available for the clinician to choose from. For example, we might consider the clinical population of patients at high risk for cardiovascular disease (CVD). Should we monitor total cholesterol or low-density lipoprotein (LDL) cholesterol when deciding on the need for/adequacy of statin treatment? Or is there another lipid test that is better to monitor? Should we measure BP in the clinic or via a 24-hour ambulatory device, or teach patients how to do it themselves at home? Similarly, for patients with osteoporosis, should we monitor bone turnover markers (and if so, which one) or bone density (which can take several years to change)? If so, how often? After considering key criteria that may be used to help choose the best monitoring test, we will revisit some of these clinical situations at the end of the article.

2. Criteria for good monitoring measurements

To help choose between monitoring tests, a number of criteria may be used [5]. These include (1) clinical validity, (2) responsiveness, (3) detectability of long-term change, and (4) practicality. Clinical validity describes the ability of the test to predict the clinically relevant outcome that we are trying to prevent. For example, with both cholesterol and BP tests, the clinical outcomes we are interested in are cardiovascular events such as myocardial infarction and stroke. Responsiveness describes how much the test changes in response to an intervention relative to background random variation. Detectability of long-term change describes the size of changes in the test over the long term relative to background random variation. Practicality describes the ease of the test in terms of invasiveness, cost, and straightforwardness. This fourth criterion, although important, poses few theoretical questions for evaluation and is included for completeness but will not be considered further here.

These criteria may be considered within the context of the pathway that leads from the prescription of treatment to the outcome that the treatment aims to produce, illustrated in Fig. 2. The top section of Fig. 2 shows the natural history of disease from risk factors to the outcome via intermediate pathology, both early and late. Below this is a pathway that starts with prescription of a drug (or another type of intervention) that aims to alter that natural history of disease and decrease the patient's risk of the clinical outcome. The pathway suggests that choices for monitoring will include the treatment (compliance or blood levels), a proximal consequence of treatment (change in BP, cholesterol, or other marker), or the disease itself (symptoms or signs).

This treatment–risk factor–disease pathway may help us understand the choice of test in the different monitoring phases: before treatment, initial response, and long term. We often may choose the same test for each of the three monitoring phases for ease of comparison over time. However, there is often a trade-off between criteria; for example, as shown in Fig. 2, the most valid test is often not the most responsive one. This means that in some cases, a different test may be chosen for different monitoring phases, for example, a more responsive test for initial response and a more valid test for pretreatment and long term.

3. The three measurement criteria

We now consider each of the specific criteria as applied to monitoring tests with continuous outcomes and then apply these to analyze the choice among several options for monitoring of lipid- and BP-lowering treatments.

3.1. Clinical validity: how well do the tests predict patient outcomes?

A test has clinical validity if it predicts the clinical outcome of interest. It must be on the risk factor-to-outcome pathway, either directly or as a proxy for another marker on the pathway that is less easily measured. As Fig. 2 shows, the later the test is on the pathway (closer to outcome), the more predictive it is likely to be, with the most valid test being a measure of the outcome itself. For example, with our aforementioned 68-year-old patient, following treatment we might monitor compliance by pill counts, BP, atherosclerotic changes (such as renal stenosis), signs of kidney damage (such as proteinuria), or renal function. The measures along this path have progressively greater importance but take progressively longer for changes to become apparent.

Fig. 1. Phases of monitoring. Adapted from Ref. [1].

Hazard ratios (HRs) are the most obvious statistical measure for capturing how well tests predict clinical outcomes; however, in their natural form, they have limited capacity for comparison across tests, as the HR depends on both the units of measurement (which are test specific) and distributional range of test results. To compare the predictiveness of different tests, a standardized measure may be used, such as the HR over the interquartile range (HR/IQR) or the HR per standard deviation (SD) increase in the test (HR/SD). Using the IQR to standardize allows less reliance on the distributional assumptions of normality that apply when using SD and thus may be preferred in many instances. However, neither measure is robust against very skewed data, where a transformation may be required first before calculating predictiveness measures. Alternative methods of standardization include calculating the HR for the top quartile compared with the bottom quartile [6] (or decile, quintile, or similar). Although this usually gives results with a similar interpretation to standardizing by the IQR (or interdecile range, etc.), less information is used. Instead of estimating risk from the complete data and then calculating the ratio from two points (HR/IQR), the average risk from the top quartile is compared with that from the bottom quartile (HR for top vs. bottom quartile). HR comparing top with bottom quartile is also likely to be more sensitive to distributional assumptions than HR/IQR.

Fig. 2. Flow diagram of disease and the effects of treatment. Adapted from Ref. [2]. CHD, coronary heart disease.
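The standardization described above can be illustrated with a minimal sketch. It assumes a log-linear (Cox-type) model in which each test has a per-unit log hazard ratio; the function name, the two tests, and all numbers are hypothetical and serve only to show how rescaling by the IQR or SD puts tests measured in different units on a comparable footing.

```python
import math


def standardized_hr(log_hr_per_unit: float, spread: float) -> float:
    """Rescale a per-unit log hazard ratio to an HR per `spread` units.

    Assumes the log hazard is linear in the test value, so the HR over
    an interval of width `spread` is exp(beta * spread).
    """
    return math.exp(log_hr_per_unit * spread)


# Hypothetical tests measured on very different scales.
tests = {
    "test_A": {"log_hr": math.log(1.02), "iqr": 20.0, "sd": 15.0},
    "test_B": {"log_hr": math.log(1.30), "iqr": 1.5, "sd": 1.1},
}

for name, t in tests.items():
    hr_iqr = standardized_hr(t["log_hr"], t["iqr"])  # HR/IQR
    hr_sd = standardized_hr(t["log_hr"], t["sd"])    # HR/SD
    print(f"{name}: HR/IQR = {hr_iqr:.2f}, HR/SD = {hr_sd:.2f}")
```

In this illustration the per-unit HRs (1.02 vs. 1.30) look very different, yet the standardized values are close, which is the kind of comparison the unstandardized HR cannot support.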

To calculate our preferred standardized measure of predictiveness, we first need to choose the patient population that we will use for the data. This may include patients on no treatment (or placebo), patients on active treatment, or both patients on and off treatment.

In some cases, there is evidence that the relationships between monitoring tests and clinical outcome are similar whether patients are on treatment or not. For example, in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) trial, total cholesterol and LDL cholesterol had a similar ability to predict CVD whether the patient was on pravastatin treatment or placebo, after adjustment for other nonlipid risk factors and measurement error [7]. In this case, the same data may be used to compare validity for pretreatment and on-treatment monitoring, and this may come from individuals on placebo/no treatment or individuals on treatment. If data are from a randomized controlled trial (RCT), it may be better to include individuals from both placebo and treatment groups. This helps ensure a wide range of values for the monitoring test and increases the statistical precision because of the larger data set.

In other cases, the predictiveness of the monitoring test may differ depending on whether the patient is on treatment. For example, in the Framingham study, the relationship between BP and CVD appeared stronger if the patient was on BP-lowering treatment than if they were not [8]. In this case, it may be best to use data from patients not on treatment to decide on the clinical validity of a test for pretreatment monitoring and data from patients on treatment to decide on the clinical validity of a test for on-treatment monitoring. A caveat to this is that in the first couple of years of on-treatment monitoring, data from off-treatment populations may be more informative. This is because in many cases of chronic disease, the effect of the treatment on the monitoring test (e.g., drop in BP) is relatively fast, but the effect on the clinical outcome (e.g., CVD) may take a few years to become apparent, and pretreatment levels of the monitoring test may be more predictive during this time. This "lag time" phenomenon is evident in trial data for a number of chronic diseases, including treatments to lower lipids [9] and BP [10]. In each case, the difference in monitoring test levels between treatment and placebo is maximal after a few months, but the difference in CVD events between the two treatment groups continues to diverge for a number of years.

Other methods of assessing validity, such as proportion of treatment explained (PTE), also assess the responsiveness of the monitoring test and are discussed in the following section.

3.2. Responsiveness: how clearly and rapidly do the tests change with treatment change?

The responsiveness criterion is especially important for the initial response phase of monitoring, soon after a new treatment has been started. Although less obvious, this criterion is also important for both pretreatment and long-term monitoring. For all monitoring phases, we ideally want the test to be responsive to interventions that alter the patient's risk of the clinical outcome. Such interventions may be lifestyle changes in the pretreatment screening phase, pharmacologic treatments in the initial response phase, or measures to improve adherence in the long-term monitoring phase.

Responsiveness describes the amount the test changes from the expected trajectory in response to an intervention. It is dependent on the intervention and where the test is placed in the risk factor-to-outcome pathway. In general, responsive tests tend to be reversible and are placed earlier in the pathway, as illustrated in Fig. 2. For example, lipids and BP normalize in response to lipid- or BP-lowering therapy. However, this is not always the case; sometimes responsive tests may be less reversible or even irreversible and are placed later in the pathway. For example, we may observe a lesser decline in bone mineral density (BMD) for a postmenopausal woman started on bisphosphonate therapy or a slowing of visual field loss for a glaucoma patient started on therapy to lower intraocular pressure.

Related to the concept of responsiveness is the speed of change in response to an intervention. We often want tests that show a rapid response to an intervention. This is obviously a necessity when the change in outcome in response to the intervention is also rapid, for example, risk of hypoglycemia for glucose-lowering drugs (monitor glucose) and bleeding risk for patients on warfarin (monitor international normalized ratio). In other situations in which the change in outcome is much slower, we still often want tests that show a rapid response so that we may quickly judge whether treatment is working as expected, for example, risk of a cardiovascular event (monitor cholesterol and BP).

Not all responsive tests show rapid changes in response to the intervention; in fact, some take months/years to change, for example, HbA1c, left ventricular function, and BMD. Because changes in these tests reflect average treatment effects over a longer period of time, these tests may be preferred for judging effects over the medium–long term.

The concepts of "signal" and "noise" are relevant to both response monitoring and long-term monitoring, which follow on from this section. For response monitoring, signal includes both mean change and between-person variation in response. If the between-person variation component of the signal is small, then we may estimate the signal for an individual using the population mean change without needing to monitor (see Refs. [11–13] for examples). If the between-person variation is not small, we are unable to estimate signal on the basis of population mean change alone. We will also need to estimate the individual's true deviation from the mean change, and this is best done where there is a favorable signal-to-noise ratio.

Noise is a result of background random variation within individuals because of measurement error and biological fluctuations. The amount of noise in monitoring tests may not be appreciated by clinicians, and variations due to noise may be wrongly attributed to real change. For example, the SD for random variation in one measurement of systolic blood pressure is approximately 10 mm Hg [12–14]. This means that a change of 20 mm Hg between two measurements will often be because of background noise alone, without any real underlying change. We may estimate noise using simple methods such as halving the observed variance of the difference between two measurements made within a short interval [14] (or for the noise of change, we may simply use the observed variance of difference between the two measurements). More sophisticated methods include variograms [14–16] and using the residual estimate for mixed models [11–13,16]. A simple method of decreasing noise is to take the mean of multiple measurements; for example, for the BP of our 68-year-old patient, it would be common to use the average of several sets of home measurements.
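The simple noise calculations described above can be sketched as follows. This is an illustration only: it assumes independent, constant-variance measurement errors and no true change between paired measurements, and the function names are hypothetical. The 10 mm Hg figure is the single-measurement SD quoted in the text.

```python
import math
import statistics


def noise_variance_from_pairs(first, second):
    """Noise variance of one measurement: half the variance of paired differences.

    Assumes the two measurements were taken a short interval apart, so any
    difference reflects noise rather than true change.
    """
    diffs = [b - a for a, b in zip(first, second)]
    return statistics.variance(diffs) / 2.0


def sd_of_apparent_change(noise_sd, n_before=1, n_after=1):
    """SD of an apparent change when each side is the mean of n measurements."""
    return noise_sd * math.sqrt(1.0 / n_before + 1.0 / n_after)


noise_sd = 10.0  # mm Hg, single systolic BP measurement (value from the text)

single = sd_of_apparent_change(noise_sd)          # one reading before, one after
averaged = sd_of_apparent_change(noise_sd, 4, 4)  # mean of four readings each side

print(f"SD of change, single readings: {single:.1f} mm Hg")
print(f"95% range from noise alone:    +/- {1.96 * single:.1f} mm Hg")
print(f"SD of change, means of four:   {averaged:.1f} mm Hg")
```

Under these assumptions, the 95% range for a change produced by noise alone comfortably includes 20 mm Hg when single readings are compared, and averaging several readings narrows it.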

We may estimate response-monitoring signal using data from placebo-controlled randomized trials (these are usually drug trials but theoretically could include behavioral change or other nondrug interventions). Just as the true mean response is commonly estimated by comparing the difference in mean changes seen in the active and placebo arms before and after treatment, the true between-person variation in response may be estimated by comparing the difference in variation in changes. Variation in change in the active group results from both true between-person variation and noise, whereas variation in the placebo group results from just noise; thus, we may estimate true between-person variation from the difference in variation between the two groups.

If only summary data are available, it may be possible to estimate response-monitoring signal using a direct method [17]. For example, Fig. 3 shows the distributions of change in cholesterol concentration after 6 months found in the LIPID trial [9]. The placebo group had a mean increase in total cholesterol of 0.02 mmol/L (variance of change 0.4225 mmol²/L²), and the pravastatin group had a mean decrease in total cholesterol of 1.16 mmol/L (variance of change 0.5625 mmol²/L²). We estimated the mean response to treatment as a decrease in cholesterol concentration of 1.18 mmol/L (0.02 to −1.16 mmol/L) and the between-person variance of response as 0.14 mmol²/L² (0.5625 − 0.4225 mmol²/L²). Noise may be estimated from the placebo group: SD of change 0.65 mmol/L (variance of change 0.4225 mmol²/L²). Although this method outlines the conceptual framework for estimating response-monitoring signal and noise, in practice it relies on the distributional assumptions of normality and constant variance. Often it is difficult to test these assumptions directly, and other data may suggest that they are unlikely to hold (e.g., where measurement error is known to be proportional to level). For these reasons, the modeling approach described next is preferred wherever possible (with a transformation applied to data if necessary).

Fig. 3. Average and variation in change in cholesterol 6 months after starting 40-mg pravastatin. Adapted from Ref. [14].

Fig. 4. Distribution of apparent and true changes in systolic BP (SBP) after starting an ACE inhibitor (based on meta-analytic estimates from seven RCTs; see Ref. [10]). The pair of normal distribution curves was constructed to have the same height, and the area under each curve is proportional to the SD of change in BP. Only 3% of the apparent variance in systolic BP change is because of true between-person variance in treatment effects [proportion is calculated from between-person variance in treatment effects/(between-person variance in treatment effects + within-person variation in BP change)]. Adapted from Ref. [10].
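The arithmetic of the direct method can be reproduced in a few lines. The sketch below uses the LIPID summary figures quoted above; the function and key names are hypothetical, and the calculation assumes, as the text notes, normality and constant variance, with noise in the active arm equal to that in the placebo arm.

```python
import math


def direct_method(mean_active, var_active, mean_placebo, var_placebo):
    """Estimate response-monitoring signal and noise from trial summary data."""
    between_var = var_active - var_placebo  # true between-person variance
    return {
        "mean_response": mean_active - mean_placebo,
        "between_person_variance": between_var,
        "noise_sd": math.sqrt(var_placebo),
        # Share of the observed variance of change that is true variation.
        "signal_share_of_variance": between_var / var_active,
    }


# Changes in total cholesterol at 6 months (mmol/L); negative means a decrease.
lipid = direct_method(
    mean_active=-1.16, var_active=0.5625,
    mean_placebo=0.02, var_placebo=0.4225,
)

for key, value in lipid.items():
    print(f"{key}: {value:.4f}")
```

The first three entries correspond to the mean response (a decrease of 1.18 mmol/L), the between-person variance (0.14 mmol²/L²), and the noise SD (0.65 mmol/L) reported in the text; the last entry is an additional ratio included here only for illustration.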

If individual patient data are available, mixed models may be used to estimate mean and between-person variation in response as well as noise (e.g., see Refs. [11–13,18]). For example, Fig. 4 shows distribution of apparent and true changes in BP after starting an angiotensin-converting enzyme (ACE) inhibitor, estimated from a meta-analysis of mixed models from seven placebo-controlled RCTs of ACE inhibitors [13]. The curves give an indication of the signal-to-noise ratio, and in this case, the ratio is rather low (a fact that may not be widely appreciated by clinicians who use clinic BP to monitor response to BP-lowering treatments). The proportion of the observed variance of change in systolic BP after starting treatment that was actually because of true between-person variance in the response to ACE inhibitors (part of the signal) was estimated at only 3%. In contrast, the proportion of observed change that was because of measurement variability and random day-to-day fluctuations (the noise) was estimated at 97%.

These percentages do not account for mean change in systolic BP after starting treatment, which is also a component of the signal; even so, the signal-to-noise ratio is very poor, and monitoring systolic BP is unlikely to be helpful in judging response to treatment. For example, we estimated that if treatment had a mean systolic BP-lowering effect of 6.5 mm Hg, it would be necessary to average >90 measurement occasions, both before and after starting treatment, to be 95% certain that an apparent decrease of >4 mm Hg in systolic BP indicated a true decrease of >4 mm Hg (i.e., to be certain that treatment is having a substantial effect).

When there are insufficient data (neither individual patient nor summary data available; data analyzed in 2011) to estimate true variation in response, the largest probable variation in true response may be estimated using the formula: 1 SD of between-person variation in treatment effects = half the mean treatment effect (see Ref. [19]), or mean proportional change divided by coefficient of variation (Glasziou et al., unpublished data). For example, the mean treatment effect of tumor necrosis factor inhibitors on tender joint count among patients with rheumatoid arthritis has been reported as a decrease of 10.5 units [20,21]. Assuming that the minimum treatment effect is greater than zero, this means that the largest probable SD of variation in response to treatment is 5.3 units (i.e., half the mean effect). The SD for within-person variation for change in tender joint count (i.e., noise) was estimated at 5.7 [22]. From these estimates, we concluded that using tender joint count, we are unlikely to be able to disaggregate true treatment effects from the background day-to-day within-person variation.
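The tender joint count example can be written out as a short calculation. This is a sketch under one stated assumption, namely our reading of the rule: if treatment effects are roughly normally distributed and essentially all are greater than zero, the mean must sit at least about 2 SD above zero, so the SD can be at most half the mean. The function name is hypothetical; the inputs are the values quoted in the text.

```python
def largest_probable_sd(mean_effect: float) -> float:
    """Upper bound on the SD of true treatment effects: half the mean effect."""
    return abs(mean_effect) / 2.0


mean_effect = -10.5  # mean change in tender joint count on treatment
noise_sd = 5.7       # within-person SD of change in tender joint count

signal_sd = largest_probable_sd(mean_effect)  # the text rounds this to 5.3
ratio = signal_sd / noise_sd

print(f"Largest probable SD of true response: {signal_sd:.2f}")
print(f"Signal SD relative to noise SD:       {ratio:.2f}")
```

Even at its largest probable value, the SD of true response is smaller than the noise SD, which is the basis for the conclusion that individual responses cannot readily be separated from background variation.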

Responsiveness is also indirectly assessed using the PTE [23] and related methods. These statistical methods were developed to assess the validity of a surrogate outcome in the setting of an RCT of a treatment. PTE aims to estimate how much of the treatment effect on clinical outcome is explained by treatment effect on the surrogate; this includes both how much the surrogate predicts the clinical outcome and how responsive the surrogate is to treatment. Estimates from the LIPID trial found that most or all of the effect pravastatin had on reducing cardiovascular events could be attributable to the combined effects on total cholesterol and high-density lipoprotein (HDL). The estimated PTE for each of the single lipid parameters ranged from 8% (triglycerides) to 87% (apolipoprotein B) [7].
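The article does not give a formula for PTE. One common formulation, assumed here purely for illustration, compares the treatment effect on the clinical outcome (as a log hazard ratio) before and after adjusting for the on-treatment value of the surrogate. The function name and the hazard ratios below are hypothetical and are not taken from the LIPID trial.

```python
import math


def proportion_of_treatment_effect_explained(log_hr_unadjusted, log_hr_adjusted):
    """PTE = 1 - (adjusted treatment effect / unadjusted treatment effect).

    Both effects are log hazard ratios for treatment vs. control; the
    adjusted model additionally includes the surrogate marker.
    """
    if log_hr_unadjusted == 0:
        raise ValueError("PTE is undefined when the unadjusted effect is zero")
    return 1.0 - log_hr_adjusted / log_hr_unadjusted


# Hypothetical trial: treatment HR 0.76 overall, 0.95 after adjusting for the marker.
pte = proportion_of_treatment_effect_explained(math.log(0.76), math.log(0.95))
print(f"PTE = {pte:.0%}")
```

A PTE near 100% would suggest that the marker is both predictive of the outcome and responsive to treatment; estimates of this kind are often imprecise, so confidence intervals matter when comparing markers.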

Table 1. Ranking of lipid-monitoring tests

Lipid                                     Validity   Responsiveness   Long-term change
Total cholesterol                         7(a)       5                5
LDL cholesterol                           6(a)       2                6
HDL cholesterol                           =4(a)      6                3
Total-to-HDL cholesterol ratio            3(a)       4                1
LDL-to-HDL cholesterol ratio              2(a)       1                2
Non-HDL cholesterol                       =4(a)      3                4
Estimated absolute cardiovascular risk    1

Abbreviations: LDL, low-density lipoprotein; HDL, high-density lipoprotein; HR, hazard ratio; IQR, interquartile range; LIPID, Long-Term Intervention with Pravastatin in Ischemic Disease.

= indicates that two tests tied on rankings.

(a) Based on HR/IQR for change in lipid for predicting coronary heart disease in the LIPID trial [9].

33 Detection of long-term change how well do thetests distinguish long-term true change from randomvariation

Long-term change describes the ability of the test to dis-cern true long-term changes in the patientrsquos condition (sig-nal) from short-term measurement variability (noise) Thiscriterion is relevant to pretreatment screening and long termon treatment monitoring Although not directly relevant toinitial response monitoring (which is done over the shortterm) we might still consider it in this setting if for consis-tency we want to choose one test to use for all three mon-itoring phases

The signal for long-term change monitoring is the truelong-term trend in level within an individual over time Thisis a combination of the population mean change and thebetween-person variability or individual deviations fromthe mean change Noise is the same as for responsemonitoringdshort-term random variation in level withinan individual Unlike response monitoring in which thebetween-person variation component of the signal is oftensmall rendering monitoring unnecessary in long-term mon-itoring there is usually substantial between-person variationin the long-term trends (see Refs [141624] for examples)This means that we are unable to estimate signal on the basisof population mean change alone and need to also estimatethe individualrsquos true deviation from the mean change underconditions of a favorable signal-to-noise ratio

Methods analogous to those used for response monitoring may be used to estimate signal and noise. Between-person variation in long-term trends over time may be estimated by simple methods (eg, subtracting twice the short-term random variation from the total observed variation of the difference from the baseline to each subsequent time point [14,16]) or through modeling (mixed models, which may include random slopes [trends over time] [11,14,16]).
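The simple subtraction method can be sketched as follows. This is a minimal illustration rather than the procedure used in the cited studies: the function name is hypothetical, the short-term noise variance of a single measurement is assumed to be known already and independent between the two time points, and truncating negative estimates at zero is our own choice for the sketch.

```python
import statistics


def true_change_variance(baseline, followup, noise_variance_single):
    """Between-person variance of true change from baseline to one follow-up time.

    Observed variance of the differences contains true between-person
    variation plus noise from both measurements, so twice the single
    measurement noise variance is subtracted.
    """
    diffs = [f - b for b, f in zip(baseline, followup)]
    observed = statistics.variance(diffs)
    # Truncation at zero is an assumption of this sketch, not of the source.
    return max(observed - 2 * noise_variance_single, 0.0)
```

Repeating the calculation at each follow-up time point would show whether the estimated true variation grows over time relative to a constant level of noise.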

Other issues to consider are that variance may be dependent on level and that the distribution curves may be non-normal. For example, it is well accepted that the "noise" component of variance may be level dependent [25]. Other components of total variance (such as between-person variation in baseline levels and trajectories over time) may also be level dependent. This will often result in the residuals and/or random-effects distributions being non-normally distributed. Often a transformation such as natural log may solve both problems and should be done before calculation of the signal-to-noise ratio (for examples, see Refs. [14,18]).

4. Some clinical examples

We applied the aforementioned framework to the clinical situations of monitoring patients at high risk for CVD (lipids and BP). For the purposes of illustration, we have used ranking to compare the tests for each of the criteria. An alternative approach would be to present the actual quantitative measures for each criterion. Although data from a single study are used as illustration for both lipids and blood pressure, ideally we would use data from several sources or from a systematic review.

The results for lipids are shown in Table 1. Overall, the best lipid marker for monitoring appears to be LDL-to-HDL ratio, although other markers that combined two tests also ranked highly: total-to-HDL cholesterol ratio and non-HDL cholesterol.

The results for BP measurements are shown in Table 2. All three tests (clinic, ambulatory, and home BP) aim to measure the same risk factor: the individual's BP. In this case, the differences in ranking for validity arise because out-of-office measurement of BP offers a more accurate reflection of the individual's usual BP. This is because of both a more natural environment where measurement takes place (avoidance of "white coat hypertension") and reduction in noise (measurement variability) through averaging a larger number of measurements. In addition, the measures of variability possible with out-of-office measurement have been found to be predictive of CVD independent of mean BP level; this also adds to the superior predictiveness of these types of BP measurement. Both types of out-of-office BP also outrank clinic BP for both responsiveness and long-term change, primarily through the reduction in noise that is possible with these types of BP measurement.

Table 2. Ranking of BP-monitoring tests

BP test                                  Validity   Responsiveness   Long-term change
Clinic BP                                4          3                3
Ambulatory BP                            =2(a)      =1               =1
Home BP                                  =2(a)      =1               =1
Estimated absolute cardiovascular risk   1

Abbreviation: BP, blood pressure.
(a) Based on risk of cardiovascular events in the PAMELA study [28].

K.J.L. Bell et al. / Journal of Clinical Epidemiology 67 (2014) 152–159

Cardiovascular risk estimation (using the Framingham [8] or similar equation) outperforms both lipids and BP in terms of validity. Currently, this is only monitored in the pretreatment population to decide who should be started on lipid- and BP-lowering treatments [26], but theoretically it could be used for monitoring patients on treatment also. How this test performs on measures of responsiveness and long-term change is currently unknown.

5. Discussion and conclusions

The choice of the best test to monitor a patient may be made by applying the methods described previously to available evidence and then ranking tests in terms of validity, responsiveness, and long-term change. Although not discussed in this article, the fourth criterion of practicality is also important. For example, when choosing between home and ambulatory BP (which perform similarly on the criteria of validity, responsiveness, and long-term change), the clinician and patient may opt for home BP measurement because of its ease of application.

The criteria that we describe evaluate monitoring for tests assessed on a continuum. We do this for several reasons: many common monitoring tests are reported as continuous; separating "signal" from "noise" is most easily done using continuous measures; and maintaining the continuity of measurements will increase statistical power [27]. However, sometimes changes observed in continuous tests during monitoring may then be dichotomized, or split into more than two categories, by comparing the level to decision thresholds to decide whether there is a need for altering management or early retesting.

Unless one test dominates on all three criteria and on practicality, the choice of test will involve some trade-off between the criteria. Depending on the clinical circumstances, clinicians may give more weight to one of these criteria than the others, although validity will usually be the most important. By making an evidence-based choice on the monitoring test to use, we can expect better clinical outcomes, higher patient and clinician satisfaction, and less waste of scarce health resources.

Acknowledgments

The authors thank Dr Clement Loy and Dr Rafael Perera-Salazar for their helpful comments on an earlier draft of this article.

References

[1] Glasziou PP, Irwig L, Mant D. Monitoring in chronic disease: a rational approach. BMJ 2005;330:644–8.

[2] Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways [Review] [16 refs] [Erratum appears in BMJ 2006;332(7554):1368]. BMJ 2006;332:1089–92.

[3] Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association [Erratum appears in Circulation 2009;119(25):e606. Note: Hong, Yuling [added]]. Circulation 2009;119:2408–16.

[4] Janes H, Pepe M, Bossuyt P, Barlow W. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med 2011;154:253–9.

[5] Irwig L, Glasziou PP. Choosing the best monitoring tests. In: Glasziou PP, Irwig L, Aronson JK, editors. Evidence-based medical monitoring: from principles to practice. Malden, MA: BMJ Books; 2008:63–74.

[6] Boekholdt SM, Arsenault BJ, Mora S, Pedersen TR, LaRosa JC, Nestel PJ, et al. Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins: a meta-analysis. JAMA 2012;307:1302–9.

[7] Simes RJ, Marschner IC, Hunt D, Colquhoun D, Sullivan D, Steward RAH, et al. Relationship between lipid levels and clinical outcomes in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) Trial: to what extent is the reduction in coronary events with pravastatin explained by on-study lipid levels? Circulation 2002;105:1162–9.

[8] D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008;117:743–53.

[9] The Long-Term Intervention with Pravastatin in Ischaemic Disease (LIPID) Study Group. Prevention of cardiovascular events and death with pravastatin in patients with coronary heart disease and a broad range of initial cholesterol levels. N Engl J Med 1998;339:1349–57.

[10] PROGRESS Collaboration Group. Randomised trial of a perindopril-based blood-pressure-lowering regimen among 6105 individuals with previous stroke or transient ischaemic attack. Lancet 2001;358:1033–41.

[11] Bell KJL, Hayen A, Macaskill P, Irwig L, Craig JC, Ensrud KEE, et al. Value of routine monitoring of bone mineral density after starting bisphosphonate treatment: secondary analysis of trial data. BMJ 2009;338:b2266.

[12] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Irwig L. Mixed models showed no need for initial response monitoring after starting anti-hypertensive therapy. J Clin Epidemiol 2009;62:650–9.

[13] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Fox KM, et al. Monitoring initial response to angiotensin converting enzyme inhibitor based regimens: an individual patient data meta-analysis from randomised placebo controlled trials. Hypertension 2010;56:533–9.

[14] Keenan K, Hayen A, Neal B, Irwig L. Long term monitoring in patients receiving treatment to lower blood pressure: analysis of data from placebo controlled randomised controlled trial. BMJ 2009;338:b1492.

[15] Shepard D. Reliability of blood pressure measurements: implications for designing and evaluating programs to control hypertension. J Chronic Dis 1981;34:191–209.

[16] Glasziou PP, Irwig L, Heritier S, Simes J, Tonkin A; the LIPID Study Investigators. Monitoring cholesterol levels: measurement error or true change? Ann Intern Med 2008;148:656–61.

[17] Bell KJL, Irwig L, Craig JC, Macaskill P. Use of randomized trials to decide when to monitor response to new treatment. BMJ 2008;336:361–5.

[18] Bell K, Hayen A, Irwig L, Hochberg M, Ensrud K, Cummings S, et al. The potential value of monitoring bone turnover markers among women on alendronate. J Bone Miner Res 2012;27:195–201.

[19] Bell KJL, Irwig L, March L, Hayen A, Macaskill P, Craig JC. Should response rules be used to decide continued subsidy of very expensive drugs? A checklist for decision makers. Pharmacoepidemiol Drug Saf 2010;19(1):99–105.

[20] van de Putte LBA, Atkins C, Malaise M, Sany J, Russell AS, van Riel PLCM. Efficacy and safety of adalimumab as monotherapy in patients with rheumatoid arthritis for whom previous disease modifying antirheumatic drug treatment has failed. Ann Rheum Dis 2004;63:508–16.

[21] Westhoevens R, Yocum D, Han J, Berman A, Strusber I, Geusens P, et al. The safety of infliximab combined with background treatments among patients with rheumatoid arthritis and various comorbidities. Arthritis Rheum 2006;54:1075–86.

[22] Lassere M, van der Heijde D, Johnson KR, Boers M, Edmonds J. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomized controlled trials. J Rheumatol 2001;28:892–903.

[23] Freedman LS, Graubard BI. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167–78.

[24] Takahashi O, Glasziou PP, Perera R, Shimbo T, Fukui T. Blood pressure re-screening for healthy adults: what is the best measure and interval? J Hum Hypertens 2012;26:540–6. http://dx.doi.org/10.1038/jhh.2011.72.

[25] Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.

[26] Jackson R, Lawes C, Bennet D, Milne R, Roders A. Treatment with drugs to lower blood pressure and blood cholesterol based on an individual's absolute cardiovascular risk. Lancet 2005;365:434–41.

[27] Spruijt B, Vergouwe Y, Nijman RG, Thompson M, Oostenbrink R. Vital signs should be maintained as continuous variables when predicting bacterial infections in febrile children. J Clin Epidemiol 2013;66:453–7.

[28] Sega R, Facchetti R, Bombelli M, Cesana G, Corrao G, Grassi G, Mancia G. Prognostic value of ambulatory and home blood pressures compared with office blood pressure in the general population: follow-up results from the Pressioni Arteriose Monitorate e Loro Associazioni (PAMELA) study. Circulation 2005;111:1777–83.


What is new

Key findings

Monitoring in clinical practice occurs in three main phases: before treatment, response to treatment, and long-term monitoring. Four important criteria may be used to choose the best test for monitoring a patient in each of these phases: clinical validity, responsiveness, long-term change, and practicality. The performance of tests on these criteria may be informed from trial and/or cohort data.

What this adds to what was known?

Evidence can be used to compare different tests' performance across these key criteria to judge which test is best for monitoring patients at each phase of clinical management.

What is the implication and what should change now?

Using methods we describe in this article, clinicians may make evidence-based decisions on which monitoring test to use.

problems with adherence, lifestyle changes that modify the effect of treatment, or natural progression of the disease and the development of complications.

Methods have been described to help choose which tests or markers are best for diagnosis, predicting risk, and treatment decisions [2–4], but there are no accepted approaches for choosing which tests should be used for clinical monitoring. However, just as with diagnosis, there are often a number of different tests available for the clinician to choose from. For example, we might consider the clinical population of patients at high risk for cardiovascular disease (CVD). Should we monitor total cholesterol or low-density lipoprotein (LDL) cholesterol when deciding on the need for/adequacy of statin treatment? Or is there another lipid test that is better to monitor? Should we measure BP in the clinic, or via a 24-hour ambulatory device, or teach patients how to do it themselves at home? Similarly, for patients with osteoporosis, should we monitor bone turnover markers (and if so, which one) or bone density (which can take several years to change)? If so, how often? After considering key criteria that may be used to help choose the best monitoring test, we will revisit some of these clinical situations at the end of the article.

2. Criteria for good monitoring measurements

To help choose between monitoring tests, a number of criteria may be used [5]. These include (1) clinical validity, (2) responsiveness, (3) detectability of long-term change, and (4) practicality. Clinical validity describes the ability of the test to predict the clinically relevant outcome that we are trying to prevent. For example, with both cholesterol and BP tests, the clinical outcomes we are interested in are cardiovascular events such as myocardial infarction and stroke. Responsiveness describes how much the test changes in response to an intervention relative to background random variation. Detectability of long-term change describes the size of changes in the test over the long term relative to background random variation. Practicality describes the ease of the test in terms of invasiveness, cost, and straightforwardness. This fourth criterion, although important, poses few theoretical questions for evaluation; it is included for completeness but will not be considered further here.

These criteria may be considered within the context of the pathway that leads from the prescription of treatment to the outcome that the treatment aims to produce, illustrated in Fig. 2. The top section of Fig. 2 shows the natural history of disease from risk factors to the outcome via intermediate pathology, both early and late. Below this is a pathway that starts with prescription of a drug (or another type of intervention) that aims to alter that natural history of disease and decrease the patient's risk of the clinical outcome. The pathway suggests that choices for monitoring will include the treatment (compliance or blood levels), a proximal consequence of treatment (change in BP, cholesterol, or other marker), or the disease itself (symptoms or signs).

This treatment–risk factor–disease pathway may help us understand the choice of test in the different monitoring phases: before treatment, initial response, and long term. We often may choose the same test for each of the three monitoring phases for ease of comparison over time. However, there is often a trade-off between criteria; for example, as shown in Fig. 2, the most valid test is often not the most responsive one. This means that in some cases a different test may be chosen for different monitoring phases, for example, a more responsive test for initial response and a more valid test for pretreatment and long term.

3. The three measurement criteria

We now consider each of the specific criteria as applied to monitoring tests with continuous outcomes and then apply these to analyze the choice among several options for monitoring of lipid- and BP-lowering treatments.

3.1. Clinical validity: how well do the tests predict patient outcomes?

A test has clinical validity if it predicts the clinical outcome of interest. It must be on the risk factor–outcome pathway, either directly or as a proxy for another marker on the pathway that is less easily measured. As Fig. 2 shows, the later the test is on the pathway (closer to outcome), the more predictive it is likely to be, with the most valid test being a measure of the outcome itself. For example, with our aforementioned 68-year-old patient following treatment, we might monitor compliance by pill counts, BP, atherosclerotic changes (such as renal stenosis), signs of kidney damage (such as proteinuria), or renal function. The measures along this path have progressively greater importance but take progressively longer for changes to become apparent.

Fig. 1. Phases of monitoring. Adapted from Ref. [1].

Fig. 2. Flow diagram of disease and the effects of treatment. Adapted from Ref. [2]. CHD, coronary heart disease.

Hazard ratios (HRs) are the most obvious statistical measure for capturing how well tests predict clinical outcomes; however, in their natural form, they have limited capacity for comparison across tests, as the HR depends on both the units of measurement (which are test specific) and the distributional range of test results. To compare the predictiveness of different tests, a standardized measure may be used, such as the HR over the interquartile range (HR/IQR) or the HR per standard deviation (SD) increase in the test (HR/SD). Using the IQR to standardize allows less reliance on the distributional assumptions of normality that apply when using SD and thus may be preferred in many instances. However, neither measure is robust against very skewed data, where a transformation may be required first before calculating predictiveness measures. Alternative methods of standardization include calculating the HR for the top quartile compared with the bottom quartile [6] (or decile, quintile, or similar). Although this usually gives results with a similar interpretation to standardizing by the IQR (or interdecile range, etc.), less information is used. Instead of estimating risk from the complete data and then calculating the ratio from two points (HR/IQR), the average risk from the top quartile is compared with that from the bottom quartile (HR for top vs. bottom quartile). HR comparing top with bottom quartile is also likely to be more sensitive to distributional assumptions than HR/IQR.
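As a minimal sketch of these standardized measures, assume a proportional hazards model has already been fitted and has produced a log hazard ratio per unit increase in the test. The coefficient `beta_per_unit`, the function names, and any input values are hypothetical and do not come from the studies cited here. The HR over the IQR is then exp(beta × IQR), and the HR per SD is exp(beta × SD).

```python
import math
import statistics


def hr_per_iqr(beta_per_unit, values):
    """HR for moving from the 25th to the 75th percentile of the test."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return math.exp(beta_per_unit * (q3 - q1))


def hr_per_sd(beta_per_unit, values):
    """HR for a one standard deviation increase in the test."""
    return math.exp(beta_per_unit * statistics.stdev(values))
```

Because both functions rescale the same per-unit coefficient, two tests measured in different units can be placed on a common footing before they are ranked. If the test values are very skewed, a transformation would need to be applied before the model is fitted, which this sketch does not attempt.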

To calculate our preferred standardized measure of predictiveness, we first need to choose the patient population that we will use for the data. This may include patients on no treatment (or placebo), patients on active treatment, or both patients on and off treatment.

In some cases, there is evidence that the relationships between monitoring tests and clinical outcome are similar whether patients are on treatment or not. For example, in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) trial, total cholesterol and LDL cholesterol had a similar ability to predict CVD whether the patient was on pravastatin treatment or placebo, after adjustment for other nonlipid risk factors and measurement error [7]. In this case, the same data may be used to compare validity for pretreatment and on-treatment monitoring, and this may come from individuals on placebo/no treatment or individuals on treatment. If data are from a randomized controlled trial (RCT), it may be better to include individuals from both placebo and treatment groups. This helps ensure a wide range of values for the monitoring test and increases the statistical precision because of the larger data set.

In other cases, the predictiveness of the monitoring test may differ depending on whether the patient is on treatment. For example, in the Framingham study, the relationship between BP and CVD appeared stronger if the patient was on BP-lowering treatment than if they were not [8]. In this case, it may be best to use data from patients not on treatment to decide on the clinical validity of a test for pretreatment monitoring and data from patients on treatment to decide on the clinical validity of a test for on-treatment monitoring. A caveat to this is that in the first couple of years of on-treatment monitoring, data from off-treatment populations may be more informative. This is because in many cases of chronic disease, the effect of the treatment on the monitoring test (eg, drop in BP) is relatively fast, but the effect on the clinical outcome (eg, CVD) may take a few years to become apparent, and pretreatment levels of the monitoring test may be more predictive during this time. This "lag time" phenomenon is evident in trial data for a number of chronic diseases, including treatments to lower lipids [9] and BP [10]. In each case, the difference in monitoring test levels between treatment and placebo is maximal after a few months, but the difference in CVD events between the two treatment groups continues to diverge for a number of years.

Other methods of assessing validity, such as proportion of treatment explained (PTE), also assess the responsiveness of the monitoring test and are discussed in the following section.

3.2. Responsiveness: how clearly and rapidly do the tests change with treatment change?

The responsiveness criterion is especially important for the initial response phase of monitoring, soon after a new treatment has been started. Although less obvious, this criterion is also important for both pretreatment and long-term monitoring. For all monitoring phases, we ideally want the test to be responsive to interventions that alter the patient's risk of the clinical outcome. Such interventions may be lifestyle changes in the pretreatment screening phase, pharmacologic treatments in the initial response phase, or measures to improve adherence in the long-term monitoring phase.

Responsiveness describes the amount the test changes from the expected trajectory in response to an intervention. It is dependent on the intervention and where the test is placed in the risk factor–outcome pathway. In general, responsive tests tend to be reversible and are placed earlier in the risk factor–outcome pathway, as illustrated in Fig. 2. For example, lipids and BP normalize in response to lipid- or BP-lowering therapy. However, this is not always the case; sometimes responsive tests may be less reversible or even irreversible and are placed later in the risk factor–outcome pathway. For example, we may observe a lesser decline in bone mineral density (BMD) for a postmenopausal woman started on bisphosphonate therapy or a slowing of visual field loss for a glaucoma patient started on therapy to lower intraocular pressure.

Related to the concept of responsiveness is the speed of change in response to an intervention. We often want tests that show a rapid response to an intervention. This is obviously a necessity when the change in outcome in response to the intervention is also rapid, for example, risk of hypoglycemia for glucose-lowering drugs (monitor glucose) or bleeding risk for patients on warfarin (monitor international normalized ratio). In other situations in which the change in outcome is much slower, we still often want tests that show a rapid response so that we may quickly judge whether treatment is working as expected, for example, risk of a cardiovascular event (monitor cholesterol and BP).

Not all responsive tests show rapid changes in response to the intervention; in fact, some take months/years to change, for example, HbA1c, left ventricular function, and BMD. Because changes in these tests reflect average treatment effects over a longer period of time, these tests may be preferred for judging effects over the medium–long term.

The concepts of "signal" and "noise" are relevant to both response monitoring and long-term monitoring, which follow on from this section. For response monitoring, signal includes both mean change and between-person variation in response. If the between-person variation component of the signal is small, then we may estimate the signal for an individual using the population mean change without needing to monitor (see Refs. [11–13] for examples). If the between-person variation is not small, we are unable to estimate signal on the basis of population mean change alone. We will also need to estimate the individual's true deviation from the mean change, and this is best done where there is a favorable signal-to-noise ratio.

Noise is a result of background random variation within individuals because of measurement error and biological fluctuations. The amount of noise in monitoring tests may not be appreciated by clinicians, and variations due to noise may be wrongly attributed to real change. For example, the SD for random variation in one measurement of systolic BP is approximately 10 mm Hg [12–14]. This means that a change of 20 mm Hg between two measurements will often be because of background noise alone, without any real underlying change. We may estimate noise using simple methods such as halving the observed variance of the difference between two measurements made within a short interval [14] (or for the noise of change, we may simply use the observed variance of difference between the two measurements). More sophisticated methods include variograms [14–16] and using the residual estimate for mixed models [11–13,16]. A simple method of decreasing noise is to take the mean of multiple measurements; for example, for the BP of our 68-year-old patient, it would be common to use the average of several sets of home measurements.
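The simple noise calculations can be sketched as below. This is an illustration under stated assumptions, not the analysis used in the cited studies: the function names are hypothetical, the two measurements are assumed to be close enough in time that no true change occurs between them, and measurement errors are assumed to be independent with constant variance.

```python
import math
import statistics


def noise_variance_single(first, second):
    """Within-person variance of one measurement: half the variance of paired differences."""
    diffs = [b - a for a, b in zip(first, second)]
    return statistics.variance(diffs) / 2


def noise_sd_of_change(noise_sd_single):
    """SD of the apparent change between two single measurements when nothing truly changes."""
    return math.sqrt(2) * noise_sd_single


def noise_sd_of_mean(noise_sd_single, n_measurements):
    """SD of random variation in the mean of n independent measurements."""
    return noise_sd_single / math.sqrt(n_measurements)
```

With a single-measurement SD of 10 mm Hg, `noise_sd_of_change(10.0)` is about 14 mm Hg, so an apparent change of 20 mm Hg lies within about 1.4 SDs of no change. Under the same assumptions, averaging four measurements halves the noise SD to 5 mm Hg.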

We may estimate response-monitoring signal using data from placebo-controlled randomized trials (these are usually drug trials but theoretically could include behavioral change or other nondrug interventions). Just as the true mean response is commonly estimated by comparing the difference in mean changes seen in the active and placebo arms before and after treatment, the true between-person variation in response may be estimated by comparing the difference in variation in changes. Variation in change in the active group results from both true between-person variation and noise, whereas variation in the placebo group results from just noise; thus, we may estimate true between-person variation from the difference in variation between the two groups.

Fig. 3. Average and variation in change in cholesterol 6 months after starting 40-mg pravastatin. Adapted from Ref. [14].

Fig. 4. Distribution of apparent and true changes in systolic BP (SBP) after starting an ACE inhibitor (based on meta-analytic estimates from seven RCTs; see Ref. [10]). The pair of normal distribution curves was constructed to have the same height, and the area under each curve is proportional to the SD of change in BP. Only 3% of the apparent variance in systolic BP change is because of true between-person variance in treatment effects [proportion is calculated from between-person variance in treatment effects/(between-person variance in treatment effects + within-person variation in BP change)]. Adapted from Ref. [10].

If only summary data are available, it may be possible to estimate response-monitoring signal using a direct method [17]. For example, Fig. 3 shows the distributions of change in cholesterol concentration after 6 months found in the LIPID trial [9]. The placebo group had a mean increase in total cholesterol of 0.02 mmol/L (variance of change, 0.4225 mmol²/L²), and the pravastatin group had a mean decrease in total cholesterol of 1.16 mmol/L (variance of change, 0.5625 mmol²/L²). We estimated the mean response to treatment as a decrease in cholesterol concentration of 1.18 mmol/L (the difference between a 0.02 mmol/L increase and a 1.16 mmol/L decrease) and the between-person variance of response as 0.14 mmol²/L² (0.5625 − 0.4225 mmol²/L²). Noise may be estimated from the placebo group: SD of change, 0.65 mmol/L (variance of change, 0.4225 mmol²/L²). Although this method outlines the conceptual framework for estimating response-monitoring signal and noise, in practice it relies on the distributional assumptions of normality and constant variance. Often it is difficult to test these assumptions directly, and other data may suggest that they are unlikely to hold (eg, where measurement error is known to be proportional to level). For these reasons, the modeling approach described next is preferred wherever possible (with a transformation applied to data if necessary).
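The arithmetic of the direct method can be retraced in a few lines. This is a sketch with a hypothetical function name; it takes the LIPID summary values as a placebo-arm mean change of +0.02 mmol/L (variance 0.4225) and a pravastatin-arm mean change of −1.16 mmol/L (variance 0.5625), and it inherits the assumptions of normality and constant variance discussed in the text.

```python
import math


def direct_method(mean_change_active, var_change_active,
                  mean_change_placebo, var_change_placebo):
    """Estimate response signal and noise from two-arm summary data."""
    mean_response = mean_change_active - mean_change_placebo
    # Active-arm variance = true between-person variance + noise;
    # placebo-arm variance = noise only.
    between_person_var = var_change_active - var_change_placebo
    noise_sd_of_change = math.sqrt(var_change_placebo)
    return mean_response, between_person_var, noise_sd_of_change


# Changes in total cholesterol at 6 months, mmol/L (negative = decrease).
mean_resp, signal_var, noise_sd = direct_method(-1.16, 0.5625, 0.02, 0.4225)
```

Taking the square root of the between-person variance gives an SD of true response of about 0.37 mmol/L, which can be set against the noise SD of 0.65 mmol/L. A negative variance difference would indicate that the assumptions do not hold or that the estimates are too imprecise for this method.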

If individual patient data are available, mixed models may be used to estimate mean and between-person variation in response as well as noise (eg, see Refs. [11–13,18]). For example, Fig. 4 shows the distribution of apparent and true changes in BP after starting an angiotensin-converting enzyme (ACE) inhibitor, estimated from a meta-analysis of mixed models from seven placebo-controlled RCTs of ACE inhibitors [13]. The curves give an indication of the signal-to-noise ratio, and in this case the ratio is rather low (a fact that may not be widely appreciated by clinicians who use clinic BP to monitor response to BP-lowering treatments). The proportion of the observed variance of change in systolic BP after starting treatment that was actually because of true between-person variance in the response to ACE inhibitors (part of the signal) was estimated at only 3%. In contrast, the proportion of observed change that was because of measurement variability and random day-to-day fluctuations (the noise) was estimated at 97%.

These percentages do not account for mean change in systolic BP after starting treatment, which is also a component of the signal; even so, the signal-to-noise ratio is very poor, and monitoring systolic BP is unlikely to be helpful in judging response to treatment. For example, we estimated that if treatment had a mean systolic BP-lowering effect of 6.5 mm Hg, it would be necessary to average >90 measurement occasions both before and after starting treatment to be 95% certain that an apparent decrease of >4 mm Hg in systolic BP indicated a true decrease of >4 mm Hg (ie, to be certain that treatment is having a substantial effect).

When there are insufficient data (neither individual patient nor summary data available; data analyzed in 2011) to estimate true variation in response, the largest probable variation in true response may be estimated using the formula: 1 SD of between-person variation in treatment effects equals half the mean treatment effect (see Ref. [19]), or mean proportional change divided by coefficient of variation (Glasziou et al., unpublished data). For example, the mean treatment effect of tumor necrosis factor inhibitors on tender joint count among patients with rheumatoid arthritis has been reported as a decrease of 10.5 units [20,21]. Assuming that the minimum treatment effect is greater than zero, this means that the largest probable SD of variation in response to treatment is 5.3 units (ie, half the mean effect). The SD for within-person variation for change in tender joint count (ie, noise) was estimated at 5.7 [22]. From these estimates, we concluded that using tender joint count, we are unlikely to be able to disaggregate true treatment effects from the background day-to-day within-person variation.
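The rule of thumb can be written out as a short sketch. One way to read it, which is our interpretation rather than a statement from Ref. [19], is that if nearly all patients (roughly the mean ± 2 SD) respond in the same direction, then the SD of true response cannot exceed half the mean effect. The function name is hypothetical; the inputs are the tender joint count values given in the text.

```python
def largest_probable_sd(mean_effect):
    """Upper bound on the SD of true response: half the size of the mean effect."""
    return abs(mean_effect) / 2


# Mean decrease in tender joint count of 10.5 units; noise SD of change 5.7 units.
max_signal_sd = largest_probable_sd(-10.5)  # 5.25, reported as 5.3 in the text
noise_sd = 5.7
signal_to_noise = max_signal_sd / noise_sd
```

Even at its largest probable value, the SD of true response is smaller than the noise SD, so the ratio is below 1, which is the basis for the conclusion drawn in the text.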

Responsiveness is also indirectly assessed using the PTE [23] and related methods. These statistical methods were developed to assess the validity of a surrogate outcome in the setting of an RCT of a treatment. PTE aims to estimate how much of the treatment effect on clinical outcome is explained by treatment effect on the surrogate; this includes both how much the surrogate predicts the clinical outcome and how responsive the surrogate is to treatment. Estimates from the LIPID trial found that most or all of the effect pravastatin had on reducing cardiovascular events could be attributable to the combined effects on total cholesterol and high-density lipoprotein (HDL). The estimated PTE for each of the single lipid parameters ranged from 8% (triglycerides) to 87% (apolipoprotein B) [7].

Table 1. Ranking of lipid-monitoring tests

Lipid                                    Validity  Responsiveness  Long-term change
Total cholesterol                        7a        5               5
LDL cholesterol                          6a        2               6
HDL cholesterol                          =4a       6               3
Total-to-HDL cholesterol ratio           3a        4               1
LDL-to-HDL cholesterol ratio             2a        1               2
Non-HDL cholesterol                      =4a       3               4
Estimated absolute cardiovascular risk   1

Abbreviations: LDL, low-density lipoprotein; HDL, high-density lipoprotein; HR, hazard ratio; IQR, interquartile range; LIPID, Long-Term Intervention with Pravastatin in Ischemic Disease.
= indicates that two tests tied on rankings.
a Based on HR/IQR for change in lipid for predicting coronary heart disease in the LIPID trial [9].

3.3. Detection of long-term change: how well do the tests distinguish long-term true change from random variation?

Long-term change describes the ability of the test to discern true long-term changes in the patient's condition (signal) from short-term measurement variability (noise). This criterion is relevant to pretreatment screening and long-term on-treatment monitoring. Although not directly relevant to initial response monitoring (which is done over the short term), we might still consider it in this setting if, for consistency, we want to choose one test to use for all three monitoring phases.

The signal for long-term change monitoring is the true long-term trend in level within an individual over time. This is a combination of the population mean change and the between-person variability, or individual deviations from the mean change. Noise is the same as for response monitoring: short-term random variation in level within an individual. In response monitoring, the between-person variation component of the signal is often small, rendering monitoring unnecessary. In long-term monitoring, by contrast, there is usually substantial between-person variation in the long-term trends (see Refs. [14,16,24] for examples). This means that we are unable to estimate signal on the basis of population mean change alone and need to also estimate the individual's true deviation from the mean change, under conditions of a favorable signal-to-noise ratio.

Methods analogous to those for response monitoring may be used to estimate signal and noise. Between-person variation in long-term trends over time may be estimated by simple methods (eg, subtracting twice the short-term random variation from the total observed variation of the difference from the baseline to each subsequent time point [14,16]) or through modeling (mixed models, which may include random slopes [trends over time] [11,14,16]).
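The simple subtraction method can be sketched as follows. The factor of two arises because a change from baseline is the difference of two measurements, each carrying its own short-term noise. The function name and the example variances are hypothetical and chosen only for illustration; they do not come from any of the cited studies.

```python
# Sketch of the simple method: variance of true long-term change equals the
# observed variance of (follow-up - baseline) minus twice the short-term
# noise variance. Assumes independent noise on the two measurements.

def long_term_signal_variance(var_observed_change: float,
                              var_short_term_noise: float) -> float:
    """Estimate the between-person variance of true change from baseline."""
    estimate = var_observed_change - 2.0 * var_short_term_noise
    # Sampling error can push the estimate below zero; truncate at zero.
    return max(estimate, 0.0)


# Hypothetical variances (units squared), invented for this example.
var_observed_change = 0.50
var_short_term_noise = 0.20

signal_var = long_term_signal_variance(var_observed_change, var_short_term_noise)
noise_var_of_change = 2.0 * var_short_term_noise

print(f"signal variance: {signal_var:.2f}")
print(f"noise variance of change: {noise_var_of_change:.2f}")
```

Repeating the calculation at each follow-up time point would show how the signal grows relative to the fixed noise as time since baseline increases.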

Other issues to consider are that variance may be dependent on level and that the distribution curves may be non-normal. For example, it is well accepted that the "noise" component of variance may be level dependent [25]. Other components of total variance (such as between-person variation in baseline levels and trajectories over time) may also be level dependent. This will often result in the residuals and/or random-effects distributions being non-normally distributed. Often a transformation, such as natural log, may solve both problems and should be done before calculation of the signal-to-noise ratio (for examples, see Refs. [14,18]).

4. Some clinical examples

We applied the aforementioned framework to the clinical situations of monitoring patients at high risk for CVD (lipids and BP). For the purposes of illustration, we have used ranking to compare the tests for each of the criteria. An alternative approach would be to present the actual quantitative measures for each criterion. Although data from a single study are used as illustration for both lipids and blood pressure, ideally we would use data from several sources or from a systematic review.

The results for lipids are shown in Table 1. Overall, the best lipid marker for monitoring appears to be LDL-to-HDL ratio, although other markers that combined two tests also ranked highly: total-to-HDL cholesterol ratio and non-HDL cholesterol.

The results for BP measurements are shown in Table 2. All three tests (clinic, ambulatory, and home BP) aim to measure the same risk factor: the individual's BP. In this case, the differences in ranking for validity arise because out-of-office measurement of BP offers a more accurate reflection of the individual's usual BP. This is because of both a more natural environment where measurement takes place (avoidance of "white coat hypertension") and reduction in noise (measurement variability) through averaging a larger number of measurements. In addition, the measures of variability possible with out-of-office measurement have been found to be predictive of CVD independent of mean BP level; this also adds to the superior predictiveness of these types of BP measurement. Both types of out-of-office BP also outrank clinic BP for both responsiveness and long-term change, primarily through the reduction in noise that is possible with these types of BP measurement.

Table 2. Ranking of BP-monitoring tests

BP test                                  Validity  Responsiveness  Long-term change
Clinic BP                                4         3               3
Ambulatory BP                            =2a       =1              =1
Home BP                                  =2a       =1              =1
Estimated absolute cardiovascular risk   1

Abbreviation: BP, blood pressure.
a Based on risk of cardiovascular events in the PAMELA study [28].

Cardiovascular risk estimation (using the Framingham [8] or similar equation) outperforms both lipids and BP in terms of validity. Currently, this is only monitored in the pretreatment population to decide who should be started on lipid- and BP-lowering treatments [26], but theoretically it could be used for monitoring patients on treatment also. How this test performs on measures of responsiveness and long-term change is currently unknown.

5. Discussion and conclusions

The choice of the best test to monitor a patient may be made by applying the methods described previously to available evidence and then ranking tests in terms of validity, responsiveness, and long-term change. Although not discussed in this article, the fourth criterion of practicality is also important. For example, when choosing between home and ambulatory BP (which perform similarly on the criteria of validity, responsiveness, and long-term change), the clinician and patient may opt for home BP measurement because of its ease of application.

The criteria that we describe evaluate monitoring for tests assessed on a continuum. We do this for several reasons: many common monitoring tests are reported as continuous; separating "signal" from "noise" is most easily done using continuous measures; and maintaining the continuity of measurements will increase statistical power [27]. However, changes observed in continuous tests during monitoring may sometimes then be dichotomized, or split into more than two categories, by comparing the level to decision thresholds to decide whether there is a need for altering management or early retesting.

Unless one test dominates on all three criteria and on practicality, the choice of test will involve some trade-off between the criteria. Depending on the clinical circumstances, clinicians may give more weight to one of these criteria than the others, although validity will usually be the most important. By making an evidence-based choice on the monitoring test to use, we can expect better clinical outcomes, higher patient and clinician satisfaction, and less waste of scarce health resources.

Acknowledgments

The authors thank Dr. Clement Loy and Dr. Rafael Perera-Salazar for their helpful comments on an earlier draft of this article.

References

[1] Glasziou PP, Irwig L, Mant D. Monitoring in chronic disease: a rational approach. BMJ 2005;330:644–8.
[2] Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways [Review] [16 refs] [Erratum appears in BMJ 2006;332(7554):1368]. BMJ 2006;332:1089–92.
[3] Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association [Erratum appears in Circulation 2009;119(25):e606. Note: Hong Yuling [added]]. Circulation 2009;119:2408–16.
[4] Janes H, Pepe M, Bossuyt P, Barlow W. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med 2011;154:253–9.
[5] Irwig L, Glasziou PP. Choosing the best monitoring tests. In: Glasziou PP, Irwig L, Aronson JK, editors. Evidence-based medical monitoring: from principles to practice. Malden, MA: BMJ Books; 2008:63–74.
[6] Boekholdt SM, Arsenault BJ, Mora S, Pedersen TR, LaRosa JC, Nestel PJ, et al. Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins: a meta-analysis. JAMA 2012;307:1302–9.
[7] Simes RJ, Marschner IC, Hunt D, Colquhoun D, Sullivan D, Steward RAH, et al. Relationship between lipid levels and clinical outcomes in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) Trial: to what extent is the reduction in coronary events with pravastatin explained by on-study lipid levels? Circulation 2002;105:1162–9.
[8] D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008;117:743–53.
[9] The Long-Term Intervention with Pravastatin in Ischaemic Disease (LIPID) Study Group. Prevention of cardiovascular events and death with pravastatin in patients with coronary heart disease and a broad range of initial cholesterol levels. N Engl J Med 1998;339:1349–57.
[10] PROGRESS Collaboration Group. Randomised trial of a perindopril-based blood-pressure-lowering regimen among 6105 individuals with previous stroke or transient ischaemic attack. Lancet 2001;358:1033–41.
[11] Bell KJL, Hayen A, Macaskill P, Irwig L, Craig JC, Ensrud KEE, et al. Value of routine monitoring of bone mineral density after starting bisphosphonate treatment: secondary analysis of trial data. BMJ 2009;338:b2266.
[12] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Irwig L. Mixed models showed no need for initial response monitoring after starting anti-hypertensive therapy. J Clin Epidemiol 2009;62:650–9.
[13] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Fox KM, et al. Monitoring initial response to angiotensin converting enzyme inhibitor based regimens: an individual patient data meta-analysis from randomised placebo controlled trials. Hypertension 2010;56:533–9.
[14] Keenan K, Hayen A, Neal B, Irwig L. Long term monitoring in patients receiving treatment to lower blood pressure: analysis of data from placebo controlled randomised controlled trial. BMJ 2009;338:b1492.
[15] Shepard D. Reliability of blood pressure measurements: implications for designing and evaluating programs to control hypertension. J Chronic Dis 1981;34:191–209.
[16] Glasziou PP, Irwig L, Heritier S, Simes J, Tonkin A, the LIPID Study Investigators. Monitoring cholesterol levels: measurement error or true change? Ann Intern Med 2008;148:656–61.
[17] Bell KJL, Irwig L, Craig JC, Macaskill P. Use of randomized trials to decide when to monitor response to new treatment. BMJ 2008;336:361–5.
[18] Bell K, Hayen A, Irwig L, Hochberg M, Ensrud K, Cummings S, et al. The potential value of monitoring bone turnover markers among women on alendronate. J Bone Miner Res 2012;27:195–201.
[19] Bell KJL, Irwig L, March L, Hayen A, Macaskill P, Craig JC. Should response rules be used to decide continued subsidy of very expensive drugs? A checklist for decision makers. Pharmacoepidemiol Drug Saf 2010;19(1):99–105.
[20] van de Putte LBA, Atkins C, Malaise M, Sany J, Russell AS, van Riel PLCM. Efficacy and safety of adalimumab as monotherapy in patients with rheumatoid arthritis for whom previous disease modifying antirheumatic drug treatment has failed. Ann Rheum Dis 2004;63:508–16.
[21] Westhoevens R, Yocum D, Han J, Berman A, Strusber I, Geusens P, et al. The safety of infliximab, combined with background treatments, among patients with rheumatoid arthritis and various comorbidities. Arthritis Rheum 2006;54:1075–86.
[22] Lassere M, van der Heijde D, Johnson KR, Boers M, Edmonds J. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomized controlled trials. J Rheumatol 2001;28:892–903.
[23] Freedman LS, Graubard BI. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167–78.
[24] Takahashi O, Glasziou PP, Perera R, Shimbo T, Fukui T. Blood pressure re-screening for healthy adults: what is the best measure and interval? J Hum Hypertens 2012;26:540–6. http://dx.doi.org/10.1038/jhh.2011.72.
[25] Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.
[26] Jackson R, Lawes C, Bennet D, Milne R, Roders A. Treatment with drugs to lower blood pressure and blood cholesterol based on an individual's absolute cardiovascular risk. Lancet 2005;365:434–41.
[27] Spruijt B, Vergouwe Y, Nijman RG, Thompson M, Oostenbrink R. Vital signs should be maintained as continuous variables when predicting bacterial infections in febrile children. J Clin Epidemiol 2013;66:453–7.
[28] Sega R, Facchetti R, Bombelli M, Cesana G, Corrao G, Grassi G, Mancia G. Prognostic value of ambulatory and home blood pressures compared with office blood pressure in the general population: follow-up results from the Pressioni Arteriose Monitorate e Loro Associazioni (PAMELA) study. Circulation 2005;111:1777–83.

Fig. 1. Phases of monitoring. Adapted from Ref. [1].


the later the test is on the pathway (closer to outcome), the more predictive it is likely to be, with the most valid test being a measure of the outcome itself. For example, with our aforementioned 68-year-old patient, following treatment we might monitor compliance by pill counts, BP, atherosclerotic changes (such as renal stenosis), signs of kidney damage (such as proteinuria), or renal function. The measures along this path have progressively greater importance but take progressively longer for changes to become apparent.

Fig. 2. Flow diagram of disease and the effects of treatment. Adapted from Ref. [2]. CHD, coronary heart disease.

Hazard ratios (HRs) are the most obvious statistical measure for capturing how well tests predict clinical outcomes; however, in their natural form, they have limited capacity for comparison across tests, as the HR depends on both the units of measurement (which are test specific) and the distributional range of test results. To compare the predictiveness of different tests, a standardized measure may be used, such as the HR over the interquartile range (HR/IQR) or the HR per standard deviation (SD) increase in the test (HR/SD). Using the IQR to standardize allows less reliance on the distributional assumptions of normality that apply when using SD and thus may be preferred in many instances. However, neither measure is robust against very skewed data, where a transformation may be required before calculating predictiveness measures. Alternative methods of standardization include calculating the HR for the top quartile compared with the bottom quartile [6] (or decile, quintile, or similar). Although this usually gives results with a similar interpretation to standardizing by the IQR (or interdecile range, etc.), less information is used. Instead of estimating risk from the complete data and then calculating the ratio from two points (HR/IQR), the average risk from the top quartile is compared with that from the bottom quartile (HR for top vs. bottom quartile). The HR comparing top with bottom quartile is also likely to be more sensitive to distributional assumptions than HR/IQR.
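One way to obtain the standardized measures is to rescale a per-unit log hazard ratio by the chosen spread. The sketch below assumes a log-linear relationship between test level and hazard, as in a Cox model with the test entered as a continuous covariate; that modeling choice, the function names, the test values, and the per-unit HR are all hypothetical and not taken from the cited trials.

```python
import math
import statistics

# Sketch: rescale a per-unit log hazard ratio to HR/IQR and HR/SD.
# Assumes log-hazard is linear in the test level; all numbers are invented.

def interquartile_range(values):
    """IQR (third quartile minus first quartile) of the observed test results."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def standardized_hr(log_hr_per_unit: float, spread: float) -> float:
    """Hazard ratio for an increase of `spread` units in the test."""
    return math.exp(log_hr_per_unit * spread)


# Hypothetical test results (eg, LDL cholesterol in mmol/L).
ldl = [2.1, 2.6, 2.9, 3.1, 3.4, 3.6, 3.9, 4.2, 4.8]
log_hr_per_mmol = math.log(1.30)  # hypothetical HR of 1.30 per 1 mmol/L

hr_iqr = standardized_hr(log_hr_per_mmol, interquartile_range(ldl))
hr_sd = standardized_hr(log_hr_per_mmol, statistics.stdev(ldl))

print(f"HR/IQR: {hr_iqr:.2f}")
print(f"HR/SD:  {hr_sd:.2f}")
```

Because both measures are expressed per spread of the test's own distribution, they can be compared across tests reported in different units.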

To calculate our preferred standardized measure of predictiveness, we first need to choose the patient population that we will use for the data. This may include patients on no treatment (or placebo), patients on active treatment, or both patients on and off treatment.

In some cases, there is evidence that the relationships between monitoring tests and clinical outcome are similar whether patients are on treatment or not. For example, in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) trial, total cholesterol and LDL cholesterol had a similar ability to predict CVD whether the patient was on pravastatin treatment or placebo, after adjustment for other nonlipid risk factors and measurement error [7]. In this case, the same data may be used to compare validity for pretreatment and on-treatment monitoring, and this may come from individuals on placebo/no treatment or individuals on treatment. If data are from a randomized controlled trial (RCT), it may be better to include individuals from both placebo and treatment groups. This helps ensure a wide range of values for the monitoring test and increases the statistical precision because of the larger data set.

In other cases, the predictiveness of the monitoring test may differ depending on whether the patient is on treatment. For example, in the Framingham study, the relationship between BP and CVD appeared stronger if the patient was on BP-lowering treatment than if they were not [8]. In this case, it may be best to use data from patients not on treatment to decide on the clinical validity of a test for pretreatment monitoring and data from patients on treatment to decide on the clinical validity of a test for on-treatment monitoring. A caveat to this is that, in the first couple of years of on-treatment monitoring, data from off-treatment populations may be more informative. This is because, in many cases of chronic disease, the effect of the treatment on the monitoring test (eg, drop in BP) is relatively fast, but the effect on the clinical outcome (eg, CVD) may take a few years to become apparent, and pretreatment levels of the monitoring test may be more predictive during this time. This "lag time" phenomenon is evident in trial data for a number of chronic diseases, including treatments to lower lipids [9] and BP [10]. In each case, the difference in monitoring test levels between treatment and placebo is maximal after a few months, but the difference in CVD events between the two treatment groups continues to diverge for a number of years.

Other methods of assessing validity, such as proportion of treatment explained (PTE), also assess the responsiveness of the monitoring test and are discussed in the following section.

3.2. Responsiveness: how clearly and rapidly do the tests change with treatment change?

The responsiveness criterion is especially important for the initial response phase of monitoring, soon after a new treatment has been started. Although less obvious, this criterion is also important for both pretreatment and long-term monitoring. For all monitoring phases, we ideally want the test to be responsive to interventions that alter the patient's risk of the clinical outcome. Such interventions may be lifestyle changes in the pretreatment screening phase, pharmacologic treatments in the initial response phase, or measures to improve adherence in the long-term monitoring phase.

Responsiveness describes the amount the test changes from the expected trajectory in response to an intervention. It is dependent on the intervention and on where the test is placed in the risk factor-outcome pathway. In general, responsive tests tend to be reversible and are placed earlier in the risk factor-outcome pathway, as illustrated in Fig. 2. For example, lipids and BP normalize in response to lipid- or BP-lowering therapy. However, this is not always the case; sometimes responsive tests may be less reversible, or even irreversible, and are placed later in the risk factor-outcome pathway. For example, we may observe a lesser decline in bone mineral density (BMD) for a postmenopausal woman started on bisphosphonate therapy or a slowing of visual field loss for a glaucoma patient started on therapy to lower intraocular pressure.

Related to the concept of responsiveness is the speed of change in response to an intervention. We often want tests that show a rapid response to an intervention. This is obviously a necessity when the change in outcome in response to the intervention is also rapid, for example, risk of hypoglycemia for glucose-lowering drugs (monitor glucose) and bleeding risk for patients on warfarin (monitor international normalized ratio). In other situations, in which the change in outcome is much slower, we still often want tests that show a rapid response so that we may quickly judge whether treatment is working as expected, for example, risk of a cardiovascular event (monitor cholesterol and BP).

Not all responsive tests show rapid changes in response to the intervention; in fact, some take months/years to change, for example, HbA1c, left ventricular function, and BMD. Because changes in these tests reflect average treatment effects over a longer period of time, these tests may be preferred for judging effects over the medium-to-long term.

The concepts of "signal" and "noise" are relevant to both response monitoring and long-term monitoring, which follow on from this section. For response monitoring, signal includes both mean change and between-person variation in response. If the between-person variation component of the signal is small, then we may estimate the signal for an individual using the population mean change, without needing to monitor (see Refs. [11–13] for examples). If the between-person variation is not small, we are unable to estimate signal on the basis of population mean change alone. We will also need to estimate the individual's true deviation from the mean change, and this is best done where there is a favorable signal-to-noise ratio.

Noise is a result of background random variation within individuals because of measurement error and biological fluctuations. The amount of noise in monitoring tests may not be appreciated by clinicians, and variations due to noise may be wrongly attributed to real change. For example, the SD for random variation in one measurement of systolic blood pressure is approximately 10 mm Hg [12–14]. This means that a change of 20 mm Hg between two measurements will often be because of background noise alone, without any real underlying change. We may estimate noise using simple methods, such as halving the observed variance of the difference between two measurements made within a short interval [14] (or, for the noise of change, we may simply use the observed variance of difference between the two measurements). More sophisticated methods include variograms [14–16] and using the residual estimate for mixed models [11–13,16]. A simple method of decreasing noise is to take the mean of multiple measurements; for example, for the BP of our 68-year-old patient, it would be common to use the average of several sets of home measurements.
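The simple noise calculations described above can be sketched as follows. The sketch assumes independent, normally distributed noise with a constant SD and no true change between the two measurements; the function names and the three example pairs are hypothetical, and only the 10 mm Hg SD comes from the text.

```python
import math
from statistics import NormalDist, variance

# Sketch of the simple noise calculations. Assumes independent, normally
# distributed noise with constant SD; names are illustrative.

def noise_variance_from_pairs(pairs):
    """Single-measurement noise variance: half the variance of paired differences."""
    diffs = [second - first for first, second in pairs]
    return variance(diffs) / 2.0


def sd_of_change(noise_sd: float) -> float:
    """SD of the difference between two single measurements."""
    return math.sqrt(2.0) * noise_sd


def sd_of_mean(noise_sd: float, n: int) -> float:
    """Noise SD remaining after averaging n independent measurements."""
    return noise_sd / math.sqrt(n)


def prob_apparent_change(threshold: float, noise_sd: float) -> float:
    """Chance of an apparent change of at least `threshold` in either
    direction when there is no true change."""
    dist = NormalDist(mu=0.0, sigma=sd_of_change(noise_sd))
    return 2.0 * (1.0 - dist.cdf(threshold))


# Hypothetical repeat systolic BP readings taken a short interval apart.
pairs = [(120, 130), (125, 115), (130, 130)]
print(noise_variance_from_pairs(pairs))

# Noise SD of about 10 mm Hg for one systolic BP measurement, as in the text.
p_20 = prob_apparent_change(20.0, 10.0)
print(f"P(|apparent change| >= 20 mm Hg): {p_20:.2f}")
print(f"noise SD of the mean of 4 readings: {sd_of_mean(10.0, 4):.1f}")
```

Under these assumptions, about 16% of repeat measurements would differ by 20 mm Hg or more through noise alone, which illustrates why averaging several readings is worthwhile.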

We may estimate response-monitoring signal using data from placebo-controlled randomized trials (these are usually drug trials but theoretically could include behavioral change or other nondrug interventions). Just as the true mean response is commonly estimated by comparing the difference in mean changes seen in the active and placebo arms before and after treatment, the true between-person variation in response may be estimated by comparing the difference in variation in changes. Variation in change in the active group results from both true between-person variation and noise, whereas variation in the placebo group results from just noise; thus, we may estimate true between-person variation from the difference in variation between the two groups.

Fig. 3. Average and variation in change in cholesterol 6 months after starting 40-mg pravastatin. Adapted from Ref. [14].

Fig. 4. Distribution of apparent and true changes in systolic BP (SBP) after starting an ACE inhibitor (based on meta-analytic estimates from seven RCTs; see Ref. [10]). The pair of normal distribution curves was constructed to have the same height, and the area under each curve is proportional to the SD of change in BP. Only 3% of the apparent variance in systolic BP change is because of true between-person variance in treatment effects [proportion is calculated from between-person variance in treatment effects/(between-person variance in treatment effects + within-person variation in BP change)]. Adapted from Ref. [10].

If only summary data are available, it may be possible to estimate response-monitoring signal using a direct method [17]. For example, Fig. 3 shows the distributions of change in cholesterol concentration after 6 months found in the LIPID trial [9]. The placebo group had a mean increase in total cholesterol of 0.02 mmol/L (variance of change 0.4225 mmol²/L²), and the pravastatin group had a mean decrease in total cholesterol of 1.16 mmol/L (variance of change 0.5625 mmol²/L²). We estimated the mean response to treatment as a decrease in cholesterol concentration of 1.18 mmol/L (0.02 to −1.16 mmol/L) and the between-person variance of response as 0.14 mmol²/L² (0.5625 − 0.4225 mmol²/L²). Noise may be estimated from the placebo group: SD of change 0.65 mmol/L (variance of change 0.4225 mmol²/L²). Although this method outlines the conceptual framework for estimating response-monitoring signal and noise, in practice it relies on the distributional assumptions of normality and constant variance. Often it is difficult to test these assumptions directly, and other data may suggest that they are unlikely to hold (eg, where measurement error is known to be proportional to level). For these reasons, the modeling approach described next is preferred wherever possible (with a transformation applied to data if necessary).
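The direct method applied to the LIPID summary figures quoted above amounts to two subtractions and a square root. The sketch below reproduces that arithmetic; the function and variable names are illustrative, and the calculation inherits the normality and constant-variance assumptions noted in the text.

```python
import math

# Sketch of the direct (summary-data) method using the LIPID values quoted
# in the text. Changes are in mmol/L, variances in mmol^2/L^2.

def direct_method(active_mean, active_var, placebo_mean, placebo_var):
    """Return (mean response, between-person variance of response, noise SD)."""
    mean_response = active_mean - placebo_mean
    # The active arm contains signal plus noise; the placebo arm only noise.
    signal_var = max(active_var - placebo_var, 0.0)
    noise_sd = math.sqrt(placebo_var)
    return mean_response, signal_var, noise_sd


placebo_mean_change = 0.02   # mean increase on placebo
placebo_var_change = 0.4225
active_mean_change = -1.16   # mean decrease on pravastatin
active_var_change = 0.5625

mean_response, signal_var, noise_sd = direct_method(
    active_mean_change, active_var_change,
    placebo_mean_change, placebo_var_change,
)

print(f"mean response: {mean_response:.2f} mmol/L")               # -1.18
print(f"between-person variance: {signal_var:.2f} mmol^2/L^2")    # 0.14
print(f"noise SD of change: {noise_sd:.2f} mmol/L")               # 0.65
```

Truncating the variance difference at zero is a choice made for this sketch, since sampling error could otherwise produce a negative variance estimate.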

If individual patient data are available, mixed models may be used to estimate mean and between-person variation in response as well as noise (eg, see Refs. [11–13,18]). For example, Fig. 4 shows the distribution of apparent and true changes in BP after starting an angiotensin-converting enzyme (ACE) inhibitor, estimated from a meta-analysis of mixed models from seven placebo-controlled RCTs of ACE inhibitors [13]. The curves give an indication of the signal-to-noise ratio, and in this case the ratio is rather low (a fact that may not be widely appreciated by clinicians who use clinic BP to monitor response to BP-lowering treatments). The proportion of the observed variance of change in systolic BP after starting treatment that was actually because of true between-person variance in the response to ACE inhibitors (part of the signal) was estimated at only 3%. In contrast, the proportion of observed change that was because of measurement variability and random day-to-day fluctuations (the noise) was estimated at 97%.

These percentages do not account for mean change in systolic BP after starting treatment, which is also a component of the signal; even so, the signal-to-noise ratio is very poor, and monitoring systolic BP is unlikely to be helpful in judging response to treatment. For example, we estimated that if treatment had a mean systolic BP-lowering effect of 6.5 mm Hg, it would be necessary to average >90 measurement occasions, both before and after starting treatment, to be 95% certain that an apparent decrease of >4 mm Hg in systolic BP indicated a true decrease of >4 mm Hg (ie, to be certain that treatment is having a substantial effect).


Cardiovascular risk estimation (using the Framingham[8] or similar equation) outperforms both lipids and BP interms of validity Currently this is only monitored in thepretreatment population to decide who should be startedon lipid- and BP-lowering treatments [26] but theoreticallyit could be used for monitoring patients on treatment alsoHow this test performs on measures of responsiveness andlong-term change is currently unknown

5 Discussion and conclusions

The choice of the best test to monitor a patient may bemade by applying the methods described previously toavailable evidence and then ranking tests in terms of valid-ity responsiveness and long-term change Although notdiscussed in this article the fourth criterion of practicalityis also important For example when choosing betweenhome and ambulatory BP (which perform similarly on thecriteria of validity responsiveness and long-term change)the clinician and patient may opt for home BP measure-ment because of its ease of application

The criteria that we describe evaluate monitoring fortests assessed on a continuum We do this for several rea-sons many common monitoring tests are reported as con-tinuous separating lsquolsquosignalrsquorsquo from lsquolsquonoisersquorsquo is most easilydone using continuous measures and maintaining the con-tinuity of measurements will increase statistical power [27]However sometimes changes observed in continuous testsduring monitoring may then be dichotomized or split intomore than two categories by comparing the level to deci-sion thresholds to decide whether there is a need for alter-ing management or early retesting

Unless one test dominates on all three criteria and onpracticality the choice of test will involve some trade-offbetween the criteria Depending on the clinical circum-stances clinicians may give more weight to one of thesecriteria than the others although validity will usually bethe most important By making an evidence-based choiceon the monitoring test to use we can expect better clinicaloutcomes higher patient and clinician satisfaction and lesswaste of scarce health resources

Acknowledgments

The authors thank Dr Clement Loy and Dr RafaelPerera-Salazar for their helpful comments on an earlierdraft of this article

References

[1] Glasziou PP, Irwig L, Mant D. Monitoring in chronic disease: a rational approach. BMJ 2005;330:644-8.

[2] Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways [Erratum appears in BMJ 2006;332(7554):1368]. BMJ 2006;332:1089-92.

[3] Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association [Erratum appears in Circulation 2009;119(25):e606. Note: Hong, Yuling [added]]. Circulation 2009;119:2408-16.

[4] Janes H, Pepe M, Bossuyt P, Barlow W. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med 2011;154:253-9.

[5] Irwig L, Glasziou PP. Choosing the best monitoring tests. In: Glasziou PP, Irwig L, Aronson JK, editors. Evidence-based medical monitoring: from principles to practice. Malden, MA: BMJ Books; 2008:63-74.

[6] Boekholdt SM, Arsenault BJ, Mora S, Pedersen TR, LaRosa JC, Nestel PJ, et al. Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins: a meta-analysis. JAMA 2012;307:1302-9.

[7] Simes RJ, Marschner IC, Hunt D, Colquhoun D, Sullivan D, Steward RAH, et al. Relationship between lipid levels and clinical outcomes in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) Trial: to what extent is the reduction in coronary events with pravastatin explained by on-study lipid levels? Circulation 2002;105:1162-9.

[8] D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008;117:743-53.

[9] The Long-Term Intervention with Pravastatin in Ischaemic Disease (LIPID) Study Group. Prevention of cardiovascular events and death with pravastatin in patients with coronary heart disease and a broad range of initial cholesterol levels. N Engl J Med 1998;339:1349-57.

[10] PROGRESS Collaboration Group. Randomised trial of a perindopril-based blood-pressure-lowering regimen among 6105 individuals with previous stroke or transient ischaemic attack. Lancet 2001;358:1033-41.

[11] Bell KJL, Hayen A, Macaskill P, Irwig L, Craig JC, Ensrud KEE, et al. Value of routine monitoring of bone mineral density after starting bisphosphonate treatment: secondary analysis of trial data. BMJ 2009;338:b2266.

[12] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Irwig L. Mixed models showed no need for initial response monitoring after starting anti-hypertensive therapy. J Clin Epidemiol 2009;62:650-9.

[13] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Fox KM, et al. Monitoring initial response to angiotensin converting enzyme inhibitor based regimens: an individual patient data meta-analysis from randomised placebo controlled trials. Hypertension 2010;56:533-9.

[14] Keenan K, Hayen A, Neal B, Irwig L. Long term monitoring in patients receiving treatment to lower blood pressure: analysis of data from placebo controlled randomised controlled trial. BMJ 2009;338:b1492.

[15] Shepard D. Reliability of blood pressure measurements: implications for designing and evaluating programs to control hypertension. J Chronic Dis 1981;34:191-209.

[16] Glasziou PP, Irwig L, Heritier S, Simes J, Tonkin A; the LIPID Study Investigators. Monitoring cholesterol levels: measurement error or true change? Ann Intern Med 2008;148:656-61.

[17] Bell KJL, Irwig L, Craig JC, Macaskill P. Use of randomized trials to decide when to monitor response to new treatment. BMJ 2008;336:361-5.

[18] Bell K, Hayen A, Irwig L, Hochberg M, Ensrud K, Cummings S, et al. The potential value of monitoring bone turnover markers among women on alendronate. J Bone Miner Res 2012;27:195-201.

[19] Bell KJL, Irwig L, March L, Hayen A, Macaskill P, Craig JC. Should response rules be used to decide continued subsidy of very expensive drugs? A checklist for decision makers. Pharmacoepidemiol Drug Saf 2010;19(1):99-105.

[20] van de Putte LBA, Atkins C, Malaise M, Sany J, Russell AS, van Riel PLCM. Efficacy and safety of adalimumab as monotherapy in patients with rheumatoid arthritis for whom previous disease modifying antirheumatic drug treatment has failed. Ann Rheum Dis 2004;63:508-16.

[21] Westhoevens R, Yocum D, Han J, Berman A, Strusber I, Geusens P, et al. The safety of infliximab, combined with background treatments, among patients with rheumatoid arthritis and various comorbidities. Arthritis Rheum 2006;54:1075-86.

[22] Lassere M, van der Heijde D, Johnson KR, Boers M, Edmonds J. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomized controlled trials. J Rheumatol 2001;28:892-903.

[23] Freedman LS, Graubard BI. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167-78.

[24] Takahashi O, Glasziou PP, Perera R, Shimbo T, Fukui T. Blood pressure re-screening for healthy adults: what is the best measure and interval? J Hum Hypertens 2012;26:540-6. http://dx.doi.org/10.1038/jhh.2011.72.

[25] Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.

[26] Jackson R, Lawes C, Bennet D, Milne R, Roders A. Treatment with drugs to lower blood pressure and blood cholesterol based on an individual's absolute cardiovascular risk. Lancet 2005;365:434-41.

[27] Spruijt B, Vergouwe Y, Nijman RG, Thompson M, Oostenbrink R. Vital signs should be maintained as continuous variables when predicting bacterial infections in febrile children. J Clin Epidemiol 2013;66:453-7.

[28] Sega R, Facchetti R, Bombelli M, Cesana G, Corrao G, Grassi G, Mancia G. Prognostic value of ambulatory and home blood pressures compared with office blood pressure in the general population: follow-up results from the Pressioni Arteriose Monitorate e Loro Associazioni (PAMELA) study. Circulation 2005;111:1777-83.

K.J.L. Bell et al. / Journal of Clinical Epidemiology 67 (2014) 152-159

is maximal after a few months, but the difference in CVD events between the two treatment groups continues to diverge for a number of years.

Other methods of assessing validity, such as proportion of treatment explained (PTE), also assess the responsiveness of the monitoring test and are discussed in the following section.

3.2. Responsiveness: how clearly and rapidly do the tests change with treatment change?

The responsiveness criterion is especially important for the initial response phase of monitoring, soon after a new treatment has been started. Although less obvious, this criterion is also important for both pretreatment and long-term monitoring. For all monitoring phases, we ideally want the test to be responsive to interventions that alter the patient's risk of the clinical outcome. Such interventions may be lifestyle changes in the pretreatment screening phase, pharmacologic treatments in the initial response phase, or measures to improve adherence in the long-term monitoring phase.

Responsiveness describes the amount the test changes from the expected trajectory in response to an intervention. It is dependent on the intervention and where the test is placed in the risk factor-outcome pathway. In general, responsive tests tend to be reversible and are placed earlier in the risk factor-outcome pathway, as illustrated in Fig. 2. For example, lipids and BP normalize in response to lipid- or BP-lowering therapy. However, this is not always the case: sometimes responsive tests may be less reversible or even irreversible and are placed later in the risk factor-outcome pathway. For example, we may observe a lesser decline in bone mineral density (BMD) for a postmenopausal woman started on bisphosphonate therapy or a slowing of visual field loss for a glaucoma patient started on therapy to lower intraocular pressure.

Related to the concept of responsiveness is the speed of change in response to an intervention. We often want tests that show a rapid response to an intervention. This is obviously a necessity when the change in outcome in response to the intervention is also rapid, for example, risk of hypoglycemia for glucose-lowering drugs (monitor glucose) or bleeding risk for patients on warfarin (monitor international normalized ratio). In other situations, in which the change in outcome is much slower, we still often want tests that show a rapid response so that we may quickly judge whether treatment is working as expected, for example, risk of a cardiovascular event (monitor cholesterol and BP).

Not all responsive tests show rapid changes in response to the intervention; in fact, some take months/years to change, for example, HbA1c, left ventricular function, and BMD. Because changes in these tests reflect average treatment effects over a longer period of time, these tests may be preferred for judging effects over the medium-long term.

The concepts of "signal" and "noise" are relevant to both response monitoring and long-term monitoring, which follow on from this section. For response monitoring, signal includes both mean change and between-person variation in response. If the between-person variation component of the signal is small, then we may estimate the signal for an individual using the population mean change, without needing to monitor (see Refs. [11-13] for examples). If the between-person variation is not small, we are unable to estimate signal on the basis of population mean change alone. We will also need to estimate the individual's true deviation from the mean change, and this is best done where there is a favorable signal-to-noise ratio.

Noise is a result of background random variation within individuals because of measurement error and biological fluctuations. The amount of noise in monitoring tests may not be appreciated by clinicians, and variations due to noise may be wrongly attributed to real change. For example, the SD for random variation in one measurement of systolic BP is approximately 10 mm Hg [12-14]. This means that a change of 20 mm Hg between two measurements will often be because of background noise alone, without any real underlying change. We may estimate noise using simple methods such as halving the observed variance of the difference between two measurements made within a short interval [14] (or, for the noise of change, we may simply use the observed variance of difference between the two measurements). More sophisticated methods include variograms [14-16] and using the residual estimate for mixed models [11-13,16]. A simple method of decreasing noise is to take the mean of multiple measurements; for example, for the BP of our 68-year-old patient, it would be common to use the average of several sets of home measurements.
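The simple noise estimate and the effect of averaging can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that measurement errors are independent with constant variance, which the text does not guarantee for any particular test.

```python
import math

def noise_variance_from_pairs(pairs):
    """Single-measurement noise variance, estimated as half the sample
    variance of the difference between two measurements taken a short
    interval apart (so that no true change is expected)."""
    diffs = [second - first for first, second in pairs]
    mean_diff = sum(diffs) / len(diffs)
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1)
    return var_diff / 2.0

def sd_of_change(noise_sd, n_per_occasion=1):
    """SD of the apparent change between two occasions when each occasion
    is the mean of n_per_occasion independent measurements (assumption)."""
    return math.sqrt(2.0 / n_per_occasion) * noise_sd

# Systolic BP example from the text: noise SD of about 10 mm Hg.
sd_single = sd_of_change(10.0)                       # one reading per visit
sd_averaged = sd_of_change(10.0, n_per_occasion=4)   # mean of four readings
print(round(sd_single, 1), round(sd_averaged, 1))
```

Under these assumptions the SD of an apparent change between two single readings is about 14 mm Hg, which is why a 20 mm Hg difference can arise from noise alone; averaging four readings per occasion halves that SD.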

We may estimate response-monitoring signal using data from placebo-controlled randomized trials (these are usually drug trials but theoretically could include behavioral change or other nondrug interventions). Just as the true mean response is commonly estimated by comparing the difference in mean changes seen in the active and placebo arms before and after treatment, the true between-person variation in response may be estimated by comparing the difference in variation in changes. Variation in change in the active group results from both true between-person variation and noise, whereas variation in the placebo group results from just noise; thus, we may estimate true between-person variation from the difference in variation between the two groups.

If only summary data are available, it may be possible to estimate response-monitoring signal using a direct method [17]. For example, Fig. 3 shows the distributions of change in cholesterol concentration after 6 months found in the LIPID trial [9]. The placebo group had a mean increase in total cholesterol of 0.02 mmol/L (variance of change, 0.4225 mmol²/L²), and the pravastatin group had a mean decrease in total cholesterol of 1.16 mmol/L (variance of change, 0.5625 mmol²/L²). We estimated the mean response to treatment as a decrease in cholesterol concentration of 1.18 mmol/L (from 0.02 to -1.16 mmol/L) and the between-person variance of response as 0.14 mmol²/L² (0.5625 - 0.4225 mmol²/L²). Noise may be estimated from the placebo group: SD of change, 0.65 mmol/L (variance of change, 0.4225 mmol²/L²). Although this method outlines the conceptual framework for estimating response-monitoring signal and noise, in practice, it relies on the distributional assumptions of normality and constant variance. Often, it is difficult to test these assumptions directly, and other data may suggest that they are unlikely to hold (e.g., where measurement error is known to be proportional to level). For these reasons, the modeling approach described next is preferred wherever possible (with a transformation applied to data if necessary).

Fig. 3. Average and variation in change in cholesterol 6 months after starting 40-mg pravastatin. Adapted from Ref. [14].

Fig. 4. Distribution of apparent and true changes in systolic BP (SBP) after starting an ACE inhibitor (based on meta-analytic estimates from seven RCTs; see Ref. [10]). The pair of normal distribution curves was constructed to have the same height, and the area under each curve is proportional to the SD of change in BP. Only 3% of the apparent variance in systolic BP change is because of true between-person variance in treatment effects [proportion is calculated from between-person variance in treatment effects/(between-person variance in treatment effects + within-person variation in BP change)]. Adapted from Ref. [10].
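The direct-method arithmetic for the LIPID example can be reproduced with a short Python sketch. The function name `direct_method` is hypothetical, and the sketch inherits the assumptions stated in the text (normality and constant variance).

```python
import math

def direct_method(mean_placebo, var_placebo, mean_active, var_active):
    """Direct method from trial summary data (changes from baseline).

    Returns the mean response, the between-person variance of response,
    and the noise SD of change, all on the scale of the inputs.
    """
    mean_response = mean_active - mean_placebo   # difference in mean changes
    between_var = var_active - var_placebo       # difference in variances of change
    noise_sd = math.sqrt(var_placebo)            # placebo arm reflects noise only
    return mean_response, between_var, noise_sd

# Changes in total cholesterol at 6 months (mmol/L), as quoted in the text.
mean_response, between_var, noise_sd = direct_method(
    mean_placebo=0.02, var_placebo=0.4225,
    mean_active=-1.16, var_active=0.5625,
)
between_sd = math.sqrt(between_var)
print(round(mean_response, 2), round(between_var, 2), round(noise_sd, 2))
```

The between-person SD of response implied by these figures is the square root of 0.14, roughly 0.37 mmol/L, which is smaller than the noise SD of 0.65 mmol/L. In real data the variance difference can come out negative through sampling error, a case this sketch does not handle.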

If individual patient data are available, mixed models may be used to estimate mean and between-person variation in response as well as noise (e.g., see Refs. [11-13,18]). For example, Fig. 4 shows distribution of apparent and true changes in BP after starting an angiotensin-converting enzyme (ACE) inhibitor, estimated from a meta-analysis of mixed models from seven placebo-controlled RCTs of ACE inhibitors [13]. The curves give an indication of the signal-to-noise ratio, and in this case, the ratio is rather low (a fact that may not be widely appreciated by clinicians who use clinic BP to monitor response to BP-lowering treatments). The proportion of the observed variance of change in systolic BP after starting treatment that was actually because of true between-person variance in the response to ACE inhibitors (part of the signal) was estimated at only 3%. In contrast, the proportion of observed change that was because of measurement variability and random day-to-day fluctuations (the noise) was estimated at 97%.

These percentages do not account for mean change in systolic BP after starting treatment, which is also a component of the signal; even so, the signal-to-noise ratio is very poor, and monitoring systolic BP is unlikely to be helpful in judging response to treatment. For example, we estimated that if treatment had a mean systolic BP-lowering effect of 6.5 mm Hg, it would be necessary to average >90 measurement occasions, both before and after starting treatment, to be 95% certain that an apparent decrease of >4 mm Hg in systolic BP indicated a true decrease of >4 mm Hg (i.e., to be certain that treatment is having a substantial effect).

When there are insufficient data (neither individual patient nor summary data available; data analyzed in 2011) to estimate true variation in response, the largest probable variation in true response may be estimated using the formula: 1 SD of between-person variation in treatment effects = half the mean treatment effect (see Ref. [19]), or mean proportional change divided by coefficient of variation (Glasziou et al., unpublished data). For example, the mean treatment effect of tumor necrosis factor inhibitors on tender joint count among patients with rheumatoid arthritis has been reported as a decrease of 10.5 units [20,21]. Assuming that the minimum treatment effect is greater than zero, this means that the largest probable SD of variation in response to treatment is 5.3 units (i.e., half the mean effect). The SD for within-person variation for change in tender joint count (i.e., noise) was estimated at 5.7 [22]. From these estimates, we concluded that, using tender joint count, we are unlikely to be able to disaggregate true treatment effects from the background day-to-day within-person variation.
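The bound used in the tender joint count example can be checked with a few lines of Python. This is a sketch only: the function name is hypothetical, reading the rule as "mean minus 2 SD stays above zero" is an interpretation of the stated assumption, and treating signal and noise variances as additive is a further assumption not made explicit in the text.

```python
def largest_probable_sd(mean_effect):
    """Upper bound on the between-person SD of response, taken as half the
    mean treatment effect (so that effects about 2 SD below the mean are
    still no worse than zero)."""
    return abs(mean_effect) / 2.0

signal_sd = largest_probable_sd(10.5)   # 5.25 units, reported as 5.3 in the text
noise_sd = 5.7                          # within-person SD of change [22]

# Share of the variance of observed change attributable to true variation
# in response, assuming independent and additive components.
signal_share = signal_sd ** 2 / (signal_sd ** 2 + noise_sd ** 2)
print(signal_sd, round(signal_sd / noise_sd, 2), round(signal_share, 2))
```

Even at this upper bound the signal SD is smaller than the noise SD, so less than half of the variance in observed change could reflect true differences in response.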

Responsiveness is also indirectly assessed using the PTE [23] and related methods. These statistical methods were developed to assess the validity of a surrogate outcome in the setting of an RCT of a treatment. PTE aims to estimate how much of the treatment effect on clinical outcome is explained by treatment effect on the surrogate; this includes both how much the surrogate predicts the clinical outcome and how responsive the surrogate is to treatment. Estimates from the LIPID trial found that most or all of the effect pravastatin had on reducing cardiovascular events could be attributable to the combined effects on total cholesterol and high-density lipoprotein (HDL). The estimated PTE for each of the single lipid parameters ranged from 8% (triglycerides) to 87% (apolipoprotein B) [7].

Table 1. Ranking of lipid-monitoring tests

Lipid                                   Validity   Responsiveness   Long-term change
Total cholesterol                       7 (a)      5                5
LDL cholesterol                         6 (a)      2                6
HDL cholesterol                         =4 (a)     6                3
Total-to-HDL cholesterol ratio          3 (a)      4                1
LDL-to-HDL cholesterol ratio            2 (a)      1                2
Non-HDL cholesterol                     =4 (a)     3                4
Estimated absolute cardiovascular risk  1

Abbreviations: LDL, low-density lipoprotein; HDL, high-density lipoprotein; HR, hazard ratio; IQR, interquartile range; LIPID, Long-Term Intervention with Pravastatin in Ischemic Disease.

= indicates that two tests tied on rankings.

(a) Based on HR/IQR for change in lipid for predicting coronary heart disease in the LIPID trial [9].

3.3. Detection of long-term change: how well do the tests distinguish long-term true change from random variation?

Long-term change describes the ability of the test to discern true long-term changes in the patient's condition (signal) from short-term measurement variability (noise). This criterion is relevant to pretreatment screening and long-term on-treatment monitoring. Although not directly relevant to initial response monitoring (which is done over the short term), we might still consider it in this setting if, for consistency, we want to choose one test to use for all three monitoring phases.

The signal for long-term change monitoring is the true long-term trend in level within an individual over time. This is a combination of the population mean change and the between-person variability, or individual deviations from the mean change. Noise is the same as for response monitoring: short-term random variation in level within an individual. Unlike response monitoring, in which the between-person variation component of the signal is often small, rendering monitoring unnecessary, in long-term monitoring there is usually substantial between-person variation in the long-term trends (see Refs. [14,16,24] for examples). This means that we are unable to estimate signal on the basis of population mean change alone and need to also estimate the individual's true deviation from the mean change under conditions of a favorable signal-to-noise ratio.

Analogous methods to those used for response monitoring may be used for estimating signal and noise. Between-person variation in long-term trends over time may be estimated by simple methods (e.g., subtracting twice the short-term random variation from the total observed variation of the difference from the baseline to each subsequent time point [14,16]) or through modeling [mixed models, which may include random slopes (trends over time) [11,14,16]].
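The simple subtraction method for long-term change can be sketched in Python as follows. The function name and all numbers are hypothetical illustrations, not data from the cited studies, and the sketch assumes independent short-term errors of constant variance at baseline and follow-up.

```python
def true_change_variance(observed_var_of_difference, short_term_noise_var):
    """Between-person variance of true long-term change: the observed
    variance of (follow-up minus baseline) less twice the short-term noise
    variance, because each of the two measurements carries its own noise.
    Truncated at zero, since sampling error can push the estimate below it."""
    return max(observed_var_of_difference - 2.0 * short_term_noise_var, 0.0)

# Hypothetical systolic BP figures (mm Hg squared), for illustration only.
noise_var = 100.0                       # noise SD of 10 mm Hg
observed = {1: 210.0, 3: 260.0, 5: 320.0}   # years of follow-up -> observed variance

signal_var = {year: true_change_variance(v, noise_var) for year, v in observed.items()}
signal_to_noise = {year: s / (2.0 * noise_var) for year, s in signal_var.items()}
print(signal_var)
print(signal_to_noise)
```

In this made-up example the variance of true change grows with follow-up time while the noise stays fixed, so the signal-to-noise ratio improves the longer the interval between measurements.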

Other issues to consider are that variance may be dependent on level and that the distribution curves may be non-normal. For example, it is well accepted that the "noise" component of variance may be level dependent [25]. Other components of total variance (such as between-person variation in baseline levels and trajectories over time) may also be level dependent. This will often result in the residuals and/or random-effects distributions being non-normally distributed. Often, a transformation such as natural log may solve both problems and should be done before calculation of the signal-to-noise ratio (for examples, see Refs. [14,18]).
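A small simulation can show why a log transformation helps when noise is proportional to level. This is a sketch under assumed conditions: the multiplicative error model, the two true levels, the 10% error size, and the fixed seed are all illustrative choices rather than values from the article.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the illustration is repeatable

def simulate(true_level, log_sd=0.10, n=5000):
    """Repeated measurements of one true level with multiplicative error,
    that is, noise roughly proportional to the level (assumed model)."""
    return [true_level * math.exp(rng.gauss(0.0, log_sd)) for _ in range(n)]

low = simulate(2.0)    # hypothetical low true level
high = simulate(8.0)   # hypothetical high true level

# On the raw scale the noise SD grows with the level.
raw_sd_low = statistics.stdev(low)
raw_sd_high = statistics.stdev(high)

# After a natural log transformation the noise SD is about the same at both levels.
log_sd_low = statistics.stdev([math.log(x) for x in low])
log_sd_high = statistics.stdev([math.log(x) for x in high])

print(round(raw_sd_low, 3), round(raw_sd_high, 3))
print(round(log_sd_low, 3), round(log_sd_high, 3))
```

With a constant noise SD on the log scale, a single signal-to-noise ratio can be reported for the whole range of levels instead of one per level.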

4. Some clinical examples

We applied the aforementioned framework to the clinical situations of monitoring patients at high risk for CVD (lipids and BP). For the purposes of illustration, we have used ranking to compare the tests for each of the criteria. An alternative approach would be to present the actual quantitative measures for each criterion. Although data from a single study are used as illustration for both lipids and blood pressure, ideally, we would use data from several sources or from a systematic review.

The results for lipids are shown in Table 1. Overall, the best lipid marker for monitoring appears to be LDL-to-HDL ratio, although other markers that combined two tests also ranked highly: total-to-HDL cholesterol ratio and non-HDL cholesterol.

The results for BP measurements are shown in Table 2. All three tests (clinic, ambulatory, and home BP) aim to measure the same risk factor: the individual's BP. In this case, the differences in ranking for validity are because out-of-office measurement of BP offers a more accurate reflection of the individual's usual BP. This is because of both a more natural environment where measurement takes place (avoidance of "white coat hypertension") and reduction in noise (measurement variability) through averaging a larger number of measurements. In addition, the measures of variability possible with out-of-office measurement have been found to be predictive of CVD independent of mean BP level; this also adds to the superior predictiveness of these types of BP measurement. Both types of out-of-office BP also outrank clinic BP for both responsiveness and long-term change, primarily through the reduction in noise that is possible with these types of BP measurement.

Table 2. Ranking of BP-monitoring tests

BP test                                 Validity   Responsiveness   Long-term change
Clinic BP                               4          3                3
Ambulatory BP                           =2 (a)     =1               =1
Home BP                                 =2 (a)     =1               =1
Estimated absolute cardiovascular risk  1

Abbreviation: BP, blood pressure.

(a) Based on risk of cardiovascular events in the PAMELA study [28].

Cardiovascular risk estimation (using the Framingham [8] or similar equation) outperforms both lipids and BP in terms of validity. Currently, this is only monitored in the pretreatment population to decide who should be started on lipid- and BP-lowering treatments [26], but theoretically, it could be used for monitoring patients on treatment also. How this test performs on measures of responsiveness and long-term change is currently unknown.

5. Discussion and conclusions

The choice of the best test to monitor a patient may be made by applying the methods described previously to available evidence and then ranking tests in terms of validity, responsiveness, and long-term change. Although not discussed in this article, the fourth criterion of practicality is also important. For example, when choosing between home and ambulatory BP (which perform similarly on the criteria of validity, responsiveness, and long-term change), the clinician and patient may opt for home BP measurement because of its ease of application.

The criteria that we describe evaluate monitoring for tests assessed on a continuum. We do this for several reasons: many common monitoring tests are reported as continuous, separating "signal" from "noise" is most easily done using continuous measures, and maintaining the continuity of measurements will increase statistical power [27]. However, sometimes changes observed in continuous tests during monitoring may then be dichotomized, or split into more than two categories, by comparing the level to decision thresholds to decide whether there is a need for altering management or early retesting.

Unless one test dominates on all three criteria and on practicality, the choice of test will involve some trade-off between the criteria. Depending on the clinical circumstances, clinicians may give more weight to one of these criteria than the others, although validity will usually be the most important. By making an evidence-based choice on the monitoring test to use, we can expect better clinical outcomes, higher patient and clinician satisfaction, and less waste of scarce health resources.

Acknowledgments

The authors thank Dr. Clement Loy and Dr. Rafael Perera-Salazar for their helpful comments on an earlier draft of this article.


[24] Takahashi O Glasziou PP Perera R Shimbo T Fukui T Blood pres-

sure re-screening for healthy adults what is the best measure and in-

terval J Hum Hypertens 201226540e6 httpdxdoiorg101038

jhh201172

[25] Bland JM Altman DG Measurement error proportional to the mean

BMJ 1996313106

[26] Jackson R Lawes C Bennet D Milne R Roders A Treatment with

drugs to lower blood pressure and blood cholesterol based on an in-

dividualrsquos absolute cardiovascular risk Lancet 2005365434e41[27] Spruijt B Vergouwe Y Nijman RG Thompson M Oostenbrink R

Vital signs should be maintained as continuous variables when pre-

dicting bacterial infections in febrile children J Clin Epidemiol

201366453e7[28] Sega R Facchetti R Bombelli M Cesana G Corrao G Grassi G

Mancia G Prognostic value of ambulatory and home blood pressures

compared with office blood pressure in the general population fol-

low-up results from the Pressioni Arteriose Monitorate e Loro Asso-

ciazioni (PAMELA) study Circulation 20051111777e83

Fig. 3. Average and variation in change in cholesterol 6 months after starting 40-mg pravastatin. Adapted from Ref. [14].

Fig. 4. Distribution of apparent and true changes in systolic BP (SBP) after starting an ACE inhibitor (based on meta-analytic estimates from seven RCTs; see Ref. [10]). The pair of normal distribution curves was constructed to have the same height, and the area under each curve is proportional to the SD of change in BP. Only 3% of the apparent variance in systolic BP change is because of true between-person variance in treatment effects [proportion is calculated from between-person variance in treatment effects/(between-person variance in treatment effects + within-person variation in BP change)]. Adapted from Ref. [10].


person variance of response as 0.14 mmol²/L² (0.5625 − 0.4225 mmol²/L²). Noise may be estimated from the placebo group: SD of change 0.65 mmol/L (variance of change 0.4225 mmol²/L²). Although this method outlines the conceptual framework for estimating response-monitoring signal and noise, in practice it relies on the distributional assumptions of normality and constant variance. Often it is difficult to test these assumptions directly, and other data may suggest that they are unlikely to hold (e.g., where measurement error is known to be proportional to level). For these reasons, the modeling approach described next is preferred wherever possible (with a transformation applied to data if necessary).
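The subtraction used in this simple method can be written out as a short calculation. The following is a minimal sketch that uses only the summary variances quoted above; the function name `between_person_variance` is illustrative, 0.5625 is taken here to be the treated-group variance of change (an assumption), and the sketch inherits the normality and constant-variance assumptions already noted.

```python
import math


def between_person_variance(var_treated_change, var_placebo_change):
    """Estimate the true between-person variance in response (signal).

    Assumes the treated-group variance of change is the sum of true
    response variance and noise, and that the placebo-group variance
    of change reflects noise only.
    """
    diff = var_treated_change - var_placebo_change
    # A negative difference can arise by chance; truncate at zero.
    return max(diff, 0.0)


# Summary figures quoted in the text (mmol^2/L^2).
var_treated = 0.5625  # assumed: variance of change in the treated group
var_placebo = 0.4225  # variance of change in the placebo group (noise)

signal_var = between_person_variance(var_treated, var_placebo)
signal_sd = math.sqrt(signal_var)
noise_sd = math.sqrt(var_placebo)

print(f"between-person variance of response: {signal_var:.2f} mmol^2/L^2")
print(f"SD of true response: {signal_sd:.2f} mmol/L; noise SD: {noise_sd:.2f} mmol/L")
```

Truncating at zero is a design choice of this sketch: with small samples the placebo variance can exceed the treated variance by chance, and a negative variance has no interpretation.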

If individual patient data are available, mixed models may be used to estimate mean and between-person variation in response, as well as noise (e.g., see Refs. [11-13,18]). For example, Fig. 4 shows the distribution of apparent and true changes in BP after starting an angiotensin-converting enzyme (ACE) inhibitor, estimated from a meta-analysis of mixed models from seven placebo-controlled RCTs of ACE inhibitors [13]. The curves give an indication of the signal-to-noise ratio, and in this case, the ratio is rather low (a fact that may not be widely appreciated by clinicians who use clinic BP to monitor response to BP-lowering treatments). The proportion of the observed variance of change in systolic BP after starting treatment that was actually because of true between-person variance in the response to ACE inhibitors (part of the signal) was estimated at only 3%. In contrast, the proportion of observed change that was because of measurement variability and random day-to-day fluctuations (the noise) was estimated at 97%.

These percentages do not account for mean change in systolic BP after starting treatment, which is also a component of the signal; even so, the signal-to-noise ratio is very poor, and monitoring systolic BP is unlikely to be helpful in judging response to treatment. For example, we estimated that if treatment had a mean systolic BP-lowering effect of 6.5 mm Hg, it would be necessary to average >90 measurement occasions both before and after starting treatment to be 95% certain that an apparent decrease of >4 mm Hg in systolic BP indicated a true decrease of >4 mm Hg (i.e., to be certain that treatment is having a substantial effect).
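The variance proportion quoted for systolic BP, and the reason averaging helps, can be sketched as follows. The variances below are hypothetical values chosen only to reproduce the 3%/97% split; the function names are illustrative, and the averaging step assumes independent, equal-variance noise across occasions. The sketch does not attempt to reproduce the estimate of more than 90 occasions, which came from the fitted models.

```python
def signal_fraction(between_var, within_var):
    """Share of observed variance in change that is true between-person response."""
    return between_var / (between_var + within_var)


def within_var_after_averaging(within_var, n_occasions):
    """Noise variance of change when n occasions are averaged both before
    and after starting treatment (assumes independent, equal-variance noise)."""
    return within_var / n_occasions


# Hypothetical variances, chosen only to match the 3% / 97% split in the text.
between_var = 3.0
within_var = 97.0

print(f"single occasion: {signal_fraction(between_var, within_var):.2f}")
for n in (4, 16, 64):
    averaged = within_var_after_averaging(within_var, n)
    print(f"{n} occasions averaged: {signal_fraction(between_var, averaged):.2f}")
```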

When there are insufficient data (neither individual patient nor summary data available; data analyzed in 2011) to estimate true variation in response, the largest probable variation in true response may be estimated using the formula: 1 SD of between-person variation in treatment effects = half the mean treatment effect (see Ref. [19]), or mean proportional change divided by coefficient of variation (Glasziou et al., unpublished data). For example, the mean treatment effect of tumor necrosis factor inhibitors on tender joint count among patients with rheumatoid arthritis has been reported as a decrease of 10.5 units [20,21]. Assuming that the minimum treatment effect is greater than zero, this means that the largest probable SD of variation in response to treatment is 5.3 units (i.e., half the mean effect). The SD for within-person variation for change in tender joint count (i.e., noise) was estimated at 5.7 [22]. From these estimates, we concluded that using tender joint count, we are unlikely to be able to disaggregate true treatment effects from the background day-to-day within-person variation.
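The tender joint count example can be checked with a few lines of arithmetic. This is a sketch under one reading of the rule, namely that if nearly all patients have an effect greater than zero then the mean minus 2 SD is at least zero, so the SD is at most half the mean; that reading and the function name are assumptions of this sketch rather than statements from the cited checklist.

```python
def largest_probable_response_sd(mean_effect):
    """Upper bound on the SD of true response, assuming nearly all patients
    have an effect greater than zero (mean - 2 * SD >= 0)."""
    return abs(mean_effect) / 2.0


mean_effect = 10.5  # mean decrease in tender joint count [20,21]
noise_sd = 5.7      # within-person SD of change in tender joint count [22]

signal_sd = largest_probable_response_sd(mean_effect)
ratio = signal_sd / noise_sd

print(f"largest probable SD of true response: {signal_sd:.2f} units")
print(f"ratio of signal SD to noise SD: {ratio:.2f}")
```

Even at its largest probable value, the SD of true response is smaller than the noise SD, which is the basis for the conclusion above.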

Responsiveness is also indirectly assessed using the PTE [23] and related methods. These statistical methods were developed to assess the validity of a surrogate outcome in the setting of an RCT of a treatment. PTE aims to estimate how much of the treatment effect on clinical outcome is explained by treatment effect on the surrogate; this includes both how much the surrogate predicts the clinical outcome and how responsive the surrogate is to treatment. Estimates from the LIPID trial found that most or all of the effect pravastatin had on reducing cardiovascular events could be attributable to the combined effects on total cholesterol and high-density lipoprotein (HDL). The estimated PTE for each of the single lipid parameters ranged from 8% (triglycerides) to 87% (apolipoprotein B) [7].

Table 1. Ranking of lipid-monitoring tests

Lipid                                     Validity   Responsiveness   Long-term change
Total cholesterol                         7ᵃ         5                5
LDL cholesterol                           6ᵃ         2                6
HDL cholesterol                           =4ᵃ        6                3
Total-to-HDL cholesterol ratio            3ᵃ         4                1
LDL-to-HDL cholesterol ratio              2ᵃ         1                2
Non-HDL cholesterol                       =4ᵃ        3                4
Estimated absolute cardiovascular risk    1

Abbreviations: LDL, low-density lipoprotein; HDL, high-density lipoprotein; HR, hazard ratio; IQR, interquartile range; LIPID, Long-Term Intervention with Pravastatin in Ischemic Disease.
= indicates that two tests tied on rankings.
ᵃ Based on HR/IQR for change in lipid for predicting coronary heart disease in the LIPID trial [9].

3.3. Detection of long-term change: how well do the tests distinguish long-term true change from random variation?

Long-term change describes the ability of the test to discern true long-term changes in the patient's condition (signal) from short-term measurement variability (noise). This criterion is relevant to pretreatment screening and long-term on-treatment monitoring. Although not directly relevant to initial response monitoring (which is done over the short term), we might still consider it in this setting if, for consistency, we want to choose one test to use for all three monitoring phases.

The signal for long-term change monitoring is the true long-term trend in level within an individual over time. This is a combination of the population mean change and the between-person variability, or individual deviations from the mean change. Noise is the same as for response monitoring: short-term random variation in level within an individual. Unlike response monitoring, in which the between-person variation component of the signal is often small, rendering monitoring unnecessary, in long-term monitoring there is usually substantial between-person variation in the long-term trends (see Refs. [14,16,24] for examples). This means that we are unable to estimate signal on the basis of population mean change alone and need to also estimate the individual's true deviation from the mean change under conditions of a favorable signal-to-noise ratio.

Methods analogous to those used for response monitoring may be used to estimate signal and noise. Between-person variation in long-term trends over time may be estimated by simple methods (e.g., subtracting twice the short-term random variation from the total observed variation of the difference from the baseline to each subsequent time point [14,16]) or through modeling (mixed models, which may include random slopes [trends over time] [11,14,16]).
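The simple subtraction method for long-term change can be sketched as below. The short-term variance is subtracted twice because a difference from baseline involves two measurements, each carrying its own short-term random variation (assumed independent). All numbers are hypothetical and for illustration only; they are not taken from the cited studies, and the function names are illustrative.

```python
def true_change_variance(var_observed_diff, short_term_var):
    """Variance of true change from baseline (signal) at one time point.

    Observed variance of the difference = true-change variance
    + 2 * short-term variance (one term per measurement).
    """
    return max(var_observed_diff - 2.0 * short_term_var, 0.0)


def signal_to_noise(var_observed_diff, short_term_var):
    """Ratio of true-change variance to noise variance of the difference."""
    noise_var = 2.0 * short_term_var
    return true_change_variance(var_observed_diff, short_term_var) / noise_var


# Hypothetical inputs for illustration only.
short_term_var = 0.10
observed_diff_var_by_year = {1: 0.22, 2: 0.26, 3: 0.32}

for year, var_diff in observed_diff_var_by_year.items():
    signal = true_change_variance(var_diff, short_term_var)
    ratio = signal_to_noise(var_diff, short_term_var)
    print(f"year {year}: signal variance {signal:.2f}, signal-to-noise {ratio:.2f}")
```

In this hypothetical series the signal grows with time since baseline while the noise stays constant, which is the pattern that makes the choice of retesting interval matter.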

Other issues to consider are that variance may be dependent on level and that the distribution curves may be non-normal. For example, it is well accepted that the "noise" component of variance may be level dependent [25]. Other components of total variance (such as between-person variation in baseline levels and trajectories over time) may also be level dependent. This will often result in the residuals and/or random-effects distributions being non-normally distributed. Often, a transformation such as natural log may solve both problems and should be done before calculation of the signal-to-noise ratio (for examples, see Refs. [14,18]).
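A small simulation can illustrate why a log transformation helps when noise is proportional to level. This is a sketch on simulated data only: the error size (about 10%), the two true levels, and the function name are hypothetical, and the multiplicative error model is an assumption of the sketch.

```python
import math
import random
import statistics


def sd_of_replicate_differences(true_level, sigma, n, rng, log_scale=False):
    """SD of the difference between two replicate measurements per person,
    simulated with multiplicative (proportional) measurement error."""
    diffs = []
    for _ in range(n):
        first = true_level * math.exp(rng.gauss(0.0, sigma))
        second = true_level * math.exp(rng.gauss(0.0, sigma))
        if log_scale:
            diffs.append(math.log(second) - math.log(first))
        else:
            diffs.append(second - first)
    return statistics.stdev(diffs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sigma = 0.10            # hypothetical error SD on the log scale (about 10%)
low, high = 2.0, 8.0    # hypothetical low and high true levels

raw_ratio = (sd_of_replicate_differences(high, sigma, 2000, rng)
             / sd_of_replicate_differences(low, sigma, 2000, rng))
log_ratio = (sd_of_replicate_differences(high, sigma, 2000, rng, log_scale=True)
             / sd_of_replicate_differences(low, sigma, 2000, rng, log_scale=True))

print(f"noise SD ratio (high vs low level), raw scale: {raw_ratio:.2f}")
print(f"noise SD ratio (high vs low level), log scale: {log_ratio:.2f}")
```

On the raw scale the noise SD scales with the level, so a single signal-to-noise ratio would not describe all patients; on the log scale the noise is roughly constant across levels.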

4. Some clinical examples

We applied the aforementioned framework to the clinical situations of monitoring patients at high risk for CVD (lipids and BP). For the purposes of illustration, we have used ranking to compare the tests for each of the criteria. An alternative approach would be to present the actual quantitative measures for each criterion. Although data from a single study are used as illustration for both lipids and blood pressure, ideally we would use data from several sources or from a systematic review.

The results for lipids are shown in Table 1. Overall, the best lipid marker for monitoring appears to be LDL-to-HDL ratio, although other markers that combined two tests also ranked highly: total-to-HDL cholesterol ratio and non-HDL cholesterol.
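One way to see how the overall ordering follows from Table 1 is to add up the ranks for each test. The article does not state how ranks were combined, so the unweighted sum used here is an assumption of this sketch; estimated absolute cardiovascular risk is left out because only its validity rank is reported.

```python
# Ranks from Table 1 as (validity, responsiveness, long-term change); 1 = best.
# Tied validity ranks ("=4") are entered as 4.
lipid_ranks = {
    "Total cholesterol": (7, 5, 5),
    "LDL cholesterol": (6, 2, 6),
    "HDL cholesterol": (4, 6, 3),
    "Total-to-HDL cholesterol ratio": (3, 4, 1),
    "LDL-to-HDL cholesterol ratio": (2, 1, 2),
    "Non-HDL cholesterol": (4, 3, 4),
}


def rank_sum_order(ranks):
    """Order tests by the unweighted sum of their ranks (lowest sum first)."""
    return sorted(ranks, key=lambda name: sum(ranks[name]))


ordered = rank_sum_order(lipid_ranks)
for name in ordered:
    print(f"{sum(lipid_ranks[name]):>2}  {name}")
```

Under this assumption the three lowest sums belong to the LDL-to-HDL ratio, the total-to-HDL ratio, and non-HDL cholesterol, in that order. A weighted sum could be substituted where one criterion, such as validity, is judged more important.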

The results for BP measurements are shown in Table 2. All three tests (clinic, ambulatory, and home BP) aim to measure the same risk factor: the individual's BP. In this case, the differences in ranking for validity are because out-of-office measurement of BP offers a more accurate reflection of the individual's usual BP. This is because of both a more natural environment where measurement takes place (avoidance of "white coat hypertension") and reduction in noise (measurement variability) through averaging a larger number of measurements. In addition, the measures of variability possible with out-of-office measurement have been found to be predictive of CVD independent of mean BP level; this also adds to the superior predictiveness of these types of BP measurement. Both types of out-of-office BP also outrank clinic BP for both responsiveness and long-term change, primarily through the reduction in noise that is possible with these types of BP measurement.

Table 2. Ranking of BP-monitoring tests

BP test                                   Validity   Responsiveness   Long-term change
Clinic BP                                 4          3                3
Ambulatory BP                             =2ᵃ        =1               =1
Home BP                                   =2ᵃ        =1               =1
Estimated absolute cardiovascular risk    1

Abbreviation: BP, blood pressure.
ᵃ Based on risk of cardiovascular events in the PAMELA study [28].

Cardiovascular risk estimation (using the Framingham [8] or similar equation) outperforms both lipids and BP in terms of validity. Currently, this is only monitored in the pretreatment population to decide who should be started on lipid- and BP-lowering treatments [26], but theoretically it could be used for monitoring patients on treatment also. How this test performs on measures of responsiveness and long-term change is currently unknown.

5. Discussion and conclusions

The choice of the best test to monitor a patient may be made by applying the methods described previously to available evidence and then ranking tests in terms of validity, responsiveness, and long-term change. Although not discussed in this article, the fourth criterion of practicality is also important. For example, when choosing between home and ambulatory BP (which perform similarly on the criteria of validity, responsiveness, and long-term change), the clinician and patient may opt for home BP measurement because of its ease of application.

The criteria that we describe evaluate monitoring for tests assessed on a continuum. We do this for several reasons: many common monitoring tests are reported as continuous, separating "signal" from "noise" is most easily done using continuous measures, and maintaining the continuity of measurements will increase statistical power [27]. However, sometimes changes observed in continuous tests during monitoring may then be dichotomized or split into more than two categories by comparing the level to decision thresholds to decide whether there is a need for altering management or early retesting.

Unless one test dominates on all three criteria and on practicality, the choice of test will involve some trade-off between the criteria. Depending on the clinical circumstances, clinicians may give more weight to one of these criteria than the others, although validity will usually be the most important. By making an evidence-based choice on the monitoring test to use, we can expect better clinical outcomes, higher patient and clinician satisfaction, and less waste of scarce health resources.

Acknowledgments

The authors thank Dr. Clement Loy and Dr. Rafael Perera-Salazar for their helpful comments on an earlier draft of this article.

References

[1] Glasziou PP, Irwig L, Mant D. Monitoring in chronic disease: a rational approach. BMJ 2005;330:644-8.
[2] Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways [Review] [16 refs] [Erratum appears in BMJ 2006;332(7554):1368]. BMJ 2006;332:1089-92.
[3] Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association [Erratum appears in Circulation 2009;119(25):e606. Note: Hong, Yuling [added]]. Circulation 2009;119:2408-16.
[4] Janes H, Pepe M, Bossuyt P, Barlow W. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med 2011;154:253-9.
[5] Irwig L, Glasziou PP. Choosing the best monitoring tests. In: Glasziou PP, Irwig L, Aronson JK, editors. Evidence-based medical monitoring: from principles to practice. Malden, MA: BMJ Books; 2008:63-74.
[6] Boekholdt SM, Arsenault BJ, Mora S, Pedersen TR, LaRosa JC, Nestel PJ, et al. Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins: a meta-analysis. JAMA 2012;307:1302-9.
[7] Simes RJ, Marschner IC, Hunt D, Colquhoun D, Sullivan D, Steward RAH, et al. Relationship between lipid levels and clinical outcomes in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) Trial: to what extent is the reduction in coronary events with pravastatin explained by on-study lipid levels? Circulation 2002;105:1162-9.
[8] D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008;117:743-53.
[9] The Long-Term Intervention with Pravastatin in Ischaemic Disease (LIPID) Study Group. Prevention of cardiovascular events and death with pravastatin in patients with coronary heart disease and a broad range of initial cholesterol levels. N Engl J Med 1998;339:1349-57.
[10] PROGRESS Collaboration Group. Randomised trial of a perindopril-based blood-pressure-lowering regimen among 6105 individuals with previous stroke or transient ischaemic attack. Lancet 2001;358:1033-41.
[11] Bell KJL, Hayen A, Macaskill P, Irwig L, Craig JC, Ensrud KEE, et al. Value of routine monitoring of bone mineral density after starting bisphosphonate treatment: secondary analysis of trial data. BMJ 2009;338:b2266.
[12] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Irwig L. Mixed models showed no need for initial response monitoring after starting anti-hypertensive therapy. J Clin Epidemiol 2009;62:650-9.
[13] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Fox KM, et al. Monitoring initial response to angiotensin converting enzyme inhibitor based regimens: an individual patient data meta-analysis from randomised placebo controlled trials. Hypertension 2010;56:533-9.
[14] Keenan K, Hayen A, Neal B, Irwig L. Long term monitoring in patients receiving treatment to lower blood pressure: analysis of data from placebo controlled randomised controlled trial. BMJ 2009;338:b1492.
[15] Shepard D. Reliability of blood pressure measurements: implications for designing and evaluating programs to control hypertension. J Chronic Dis 1981;34:191-209.
[16] Glasziou PP, Irwig L, Heritier S, Simes J, Tonkin A, the LIPID Study Investigators. Monitoring cholesterol levels: measurement error or true change? Ann Intern Med 2008;148:656-61.
[17] Bell KJL, Irwig L, Craig JC, Macaskill P. Use of randomized trials to decide when to monitor response to new treatment. BMJ 2008;336:361-5.
[18] Bell K, Hayen A, Irwig L, Hochberg M, Ensrud K, Cummings S, et al. The potential value of monitoring bone turnover markers among women on alendronate. J Bone Miner Res 2012;27:195-201.
[19] Bell KJL, Irwig L, March L, Hayen A, Macaskill P, Craig JC. Should response rules be used to decide continued subsidy of very expensive drugs? A checklist for decision makers. Pharmacoepidemiol Drug Saf 2010;19(1):99-105.
[20] van de Putte LBA, Atkins C, Malaise M, Sany J, Russell AS, van Riel PLCM. Efficacy and safety of adalimumab as monotherapy in patients with rheumatoid arthritis for whom previous disease modifying antirheumatic drug treatment has failed. Ann Rheum Dis 2004;63:508-16.
[21] Westhoevens R, Yocum D, Han J, Berman A, Strusber I, Geusens P, et al. The safety of infliximab, combined with background treatments, among patients with rheumatoid arthritis and various comorbidities. Arthritis Rheum 2006;54:1075-86.
[22] Lassere M, van der Heijde D, Johnson KR, Boers M, Edmonds J. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomized controlled trials. J Rheumatol 2001;28:892-903.
[23] Freedman LS, Graubard BI. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167-78.
[24] Takahashi O, Glasziou PP, Perera R, Shimbo T, Fukui T. Blood pressure re-screening for healthy adults: what is the best measure and interval? J Hum Hypertens 2012;26:540-6. http://dx.doi.org/10.1038/jhh.2011.72.
[25] Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.
[26] Jackson R, Lawes C, Bennet D, Milne R, Roders A. Treatment with drugs to lower blood pressure and blood cholesterol based on an individual's absolute cardiovascular risk. Lancet 2005;365:434-41.
[27] Spruijt B, Vergouwe Y, Nijman RG, Thompson M, Oostenbrink R. Vital signs should be maintained as continuous variables when predicting bacterial infections in febrile children. J Clin Epidemiol 2013;66:453-7.
[28] Sega R, Facchetti R, Bombelli M, Cesana G, Corrao G, Grassi G, Mancia G. Prognostic value of ambulatory and home blood pressures compared with office blood pressure in the general population: follow-up results from the Pressioni Arteriose Monitorate e Loro Associazioni (PAMELA) study. Circulation 2005;111:1777-83.

157KJL Bell et al Journal of Clinical Epidemiology 67 (2014) 152e159

true treatment effects from the background day-to-daywithin-person variation

Responsiveness is also indirectly assessed using the PTE[23] and related methods These statistical methods weredeveloped to assess the validity of a surrogate outcome inthe setting of an RCT of a treatment PTE aims to estimatehow much of the treatment effect on clinical outcome is ex-plained by treatment effect on the surrogate this includesboth how much the surrogate predicts the clinical outcomeand how responsive the surrogate is to treatment Estimatesfrom the LIPID trial found that most or all of the effectpravastatin had on reducing cardiovascular events couldbe attributable to the combined effects on total cholesteroland high-density lipoprotein (HDL) The estimated PTE foreach of the single lipid parameters ranged from 8 (triglyc-erides) to 87 (apolipoprotein B) [7]

Table 1 Ranking of lipid-monitoring tests

Lipid Validity ResponsivenessLong-termchange

Total cholesterol 7a 5 5LDL cholesterol 6a 2 6HDL cholesterol 54a 6 3Total-to-HDL cholesterol ratio 3a 4 1LDL-to-HDL cholesterol ratio 2a 1 2Non-HDL cholesterol 54a 3 4Estimated absolutecardiovascular risk

1

Abbreviations LDL low-density lipoprotein HDL high-density li-poprotein HR hazard ratio IQR interquartile range LIPID Long-Term Intervention with Pravastatin in Ischemic Disease

5 indicates that two tests tied on rankingsa Based on HRIQR for change in lipid for predicting coronary

heart disease in the LIPID trial [9]

33 Detection of long-term change how well do thetests distinguish long-term true change from randomvariation

Long-term change describes the ability of the test to dis-cern true long-term changes in the patientrsquos condition (sig-nal) from short-term measurement variability (noise) Thiscriterion is relevant to pretreatment screening and long termon treatment monitoring Although not directly relevant toinitial response monitoring (which is done over the shortterm) we might still consider it in this setting if for consis-tency we want to choose one test to use for all three mon-itoring phases

The signal for long-term change monitoring is the truelong-term trend in level within an individual over time Thisis a combination of the population mean change and thebetween-person variability or individual deviations fromthe mean change Noise is the same as for responsemonitoringdshort-term random variation in level withinan individual Unlike response monitoring in which thebetween-person variation component of the signal is oftensmall rendering monitoring unnecessary in long-term mon-itoring there is usually substantial between-person variationin the long-term trends (see Refs [141624] for examples)This means that we are unable to estimate signal on the basisof population mean change alone and need to also estimatethe individualrsquos true deviation from the mean change underconditions of a favorable signal-to-noise ratio

Analogous methods may be used to those for estimatingsignal and noise as for the case of response monitoringBetween-person variation in long-term trends over timemay be estimated by simple methods (eg subtractingtwice the short-term random variation from the total ob-served variation of the difference from the baseline to eachsubsequent time point [1416]) or through modeling [mixedmodels which may include random slopes (trends overtime) [111416]]

Other issues to consider is that variance may be depen-dent on level and the distribution curves may be non-

normal For example it is well accepted that the lsquolsquonoisersquorsquocomponent of variance may be level dependent [25] Othercomponents of total variance (such as between-person var-iation in baseline levels and trajectories over time) may alsobe level dependent This will often result in the residualsandor random-effects distributions being non-normallydistributed Often a transformation such as natural logmay solve both problems and should be done before calcu-lation of the signal-to-noise ratio (for examples see Refs[1418])

4 Some clinical examples

We applied the aforementioned framework to the clini-cal situations of monitoring patients at high risk for CVD(lipids and BP) For the purposes of illustration we haveused ranking to compare the tests for each of the criteriaAn alternative approach would be to present the actualquantitative measures for each criterion Although datafrom a single study are used as illustration for both lipidsand blood pressure ideally we would use data from severalsources or from a systematic review

The results for lipids are shown in Table 1 Overall thebest lipid marker for monitoring appears to be LDL-to-HDL ratio although other markers that combined two testsalso ranked highly total-to-HDL cholesterol ratio and non-HDL cholesterol

The results for BP measurements are shown in Table 2All three tests (clinic ambulatory and home BP) aim tomeasure the same risk factor the individualrsquos BP In thiscase the differences in ranking for validity are becauseout-of-office measurement of BP offers a more accurate re-flection of the individualrsquos usual BP This is because of botha more natural environment where measurement takesplace (avoidance of lsquolsquowhite coat hypertensionrsquorsquo) and reduc-tion in noise (measurement variability) through averaginga larger number of measurements In addition the measuresof variability possible with out-of-office measurement havebeen found to be predictive of CVD independent of meanBP level this also adds to the superior predictiveness of

Table 2 Ranking of BP-monitoring tests

BP test Validity ResponsivenessLong-termchange

Clinic BP 4 3 3Ambulatory BP 52a 51 51Home BP 52a 51 51Estimated absolute

cardiovascular risk1

Abbreviation BP blood pressurea Based on risk of cardiovascular events in the PAMELA

study [28]

158 KJL Bell et al Journal of Clinical Epidemiology 67 (2014) 152e159

these types of BP measurement Both types of out-of-officeBP also outrank clinic BP for both responsiveness and long-term change primarily through the reduction in noise that ispossible with these types of BP measurement

Cardiovascular risk estimation (using the Framingham[8] or similar equation) outperforms both lipids and BP interms of validity Currently this is only monitored in thepretreatment population to decide who should be startedon lipid- and BP-lowering treatments [26] but theoreticallyit could be used for monitoring patients on treatment alsoHow this test performs on measures of responsiveness andlong-term change is currently unknown

5 Discussion and conclusions

The choice of the best test to monitor a patient may bemade by applying the methods described previously toavailable evidence and then ranking tests in terms of valid-ity responsiveness and long-term change Although notdiscussed in this article the fourth criterion of practicalityis also important For example when choosing betweenhome and ambulatory BP (which perform similarly on thecriteria of validity responsiveness and long-term change)the clinician and patient may opt for home BP measure-ment because of its ease of application

The criteria that we describe evaluate monitoring fortests assessed on a continuum We do this for several rea-sons many common monitoring tests are reported as con-tinuous separating lsquolsquosignalrsquorsquo from lsquolsquonoisersquorsquo is most easilydone using continuous measures and maintaining the con-tinuity of measurements will increase statistical power [27]However sometimes changes observed in continuous testsduring monitoring may then be dichotomized or split intomore than two categories by comparing the level to deci-sion thresholds to decide whether there is a need for alter-ing management or early retesting

Unless one test dominates on all three criteria and onpracticality the choice of test will involve some trade-offbetween the criteria Depending on the clinical circum-stances clinicians may give more weight to one of thesecriteria than the others although validity will usually bethe most important By making an evidence-based choiceon the monitoring test to use we can expect better clinicaloutcomes higher patient and clinician satisfaction and lesswaste of scarce health resources

Acknowledgments

The authors thank Dr Clement Loy and Dr RafaelPerera-Salazar for their helpful comments on an earlierdraft of this article

References

[1] Glasziou PP, Irwig L, Mant D. Monitoring in chronic disease: a rational approach. BMJ 2005;330:644-8.

[2] Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. [Review] [16 refs] [Erratum appears in BMJ 2006;332(7554):1368]. BMJ 2006;332:1089-92.

[3] Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. [Erratum appears in Circulation 2009;119(25):e606. Note: Hong, Yuling [added]]. Circulation 2009;119:2408-16.

[4] Janes H, Pepe M, Bossuyt P, Barlow W. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med 2011;154:253-9.

[5] Irwig L, Glasziou PP. Choosing the best monitoring tests. In: Glasziou PP, Irwig L, Aronson JK, editors. Evidence-based medical monitoring: from principles to practice. Malden, MA: BMJ Books; 2008:63-74.

[6] Boekholdt SM, Arsenault BJ, Mora S, Pedersen TR, LaRosa JC, Nestel PJ, et al. Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins. A meta-analysis. JAMA 2012;307:1302-9.

[7] Simes RJ, Marschner IC, Hunt D, Colquhoun D, Sullivan D, Stewart RAH, et al. Relationship between lipid levels and clinical outcomes in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) Trial: to what extent is the reduction in coronary events with pravastatin explained by on-study lipid levels? Circulation 2002;105:1162-9.

[8] D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008;117:743-53.

[9] The Long-Term Intervention with Pravastatin in Ischaemic Disease (LIPID) Study Group. Prevention of cardiovascular events and death with pravastatin in patients with coronary heart disease and a broad range of initial cholesterol levels. N Engl J Med 1998;339:1349-57.

[10] PROGRESS Collaboration Group. Randomised trial of a perindopril-based blood-pressure-lowering regimen among 6105 individuals with previous stroke or transient ischaemic attack. Lancet 2001;358:1033-41.

[11] Bell KJL, Hayen A, Macaskill P, Irwig L, Craig JC, Ensrud KEE, et al. Value of routine monitoring of bone mineral density after starting bisphosphonate treatment: secondary analysis of trial data. BMJ 2009;338:b2266.

[12] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Irwig L. Mixed models showed no need for initial response monitoring after starting anti-hypertensive therapy. J Clin Epidemiol 2009;62:650-9.

[13] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Fox KM, et al. Monitoring initial response to angiotensin converting enzyme inhibitor based regimens: an individual patient data meta-analysis from randomised placebo controlled trials. Hypertension 2010;56:533-9.

[14] Keenan K, Hayen A, Neal B, Irwig L. Long term monitoring in patients receiving treatment to lower blood pressure: analysis of data from placebo controlled randomised controlled trial. BMJ 2009;338:b1492.

K.J.L. Bell et al. / Journal of Clinical Epidemiology 67 (2014) 152-159

[15] Shepard D. Reliability of blood pressure measurements: implications for designing and evaluating programs to control hypertension. J Chronic Dis 1981;34:191-209.

[16] Glasziou PP, Irwig L, Heritier S, Simes J, Tonkin A; the LIPID Study Investigators. Monitoring cholesterol levels: measurement error or true change? Ann Intern Med 2008;148:656-61.

[17] Bell KJL, Irwig L, Craig JC, Macaskill P. Use of randomized trials to decide when to monitor response to new treatment. BMJ 2008;336:361-5.

[18] Bell K, Hayen A, Irwig L, Hochberg M, Ensrud K, Cummings S, et al. The potential value of monitoring bone turnover markers among women on alendronate. J Bone Miner Res 2012;27:195-201.

[19] Bell KJL, Irwig L, March L, Hayen A, Macaskill P, Craig JC. Should response rules be used to decide continued subsidy of very expensive drugs? A checklist for decision makers. Pharmacoepidemiol Drug Saf 2010;19(1):99-105.

[20] van de Putte LBA, Atkins C, Malaise M, Sany J, Russell AS, van Riel PLCM. Efficacy and safety of adalimumab as monotherapy in patients with rheumatoid arthritis for whom previous disease modifying antirheumatic drug treatment has failed. Ann Rheum Dis 2004;63:508-16.

[21] Westhovens R, Yocum D, Han J, Berman A, Strusberg I, Geusens P, et al. The safety of infliximab combined with background treatments among patients with rheumatoid arthritis and various comorbidities. Arthritis Rheum 2006;54:1075-86.

[22] Lassere M, van der Heijde D, Johnson KR, Boers M, Edmonds J. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomized controlled trials. J Rheumatol 2001;28:892-903.

[23] Freedman LS, Graubard BI. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167-78.

[24] Takahashi O, Glasziou PP, Perera R, Shimbo T, Fukui T. Blood pressure re-screening for healthy adults: what is the best measure and interval? J Hum Hypertens 2012;26:540-6. http://dx.doi.org/10.1038/jhh.2011.72.

[25] Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.

[26] Jackson R, Lawes C, Bennett D, Milne R, Rodgers A. Treatment with drugs to lower blood pressure and blood cholesterol based on an individual's absolute cardiovascular risk. Lancet 2005;365:434-41.

[27] Spruijt B, Vergouwe Y, Nijman RG, Thompson M, Oostenbrink R. Vital signs should be maintained as continuous variables when predicting bacterial infections in febrile children. J Clin Epidemiol 2013;66:453-7.

[28] Sega R, Facchetti R, Bombelli M, Cesana G, Corrao G, Grassi G, Mancia G. Prognostic value of ambulatory and home blood pressures compared with office blood pressure in the general population: follow-up results from the Pressioni Arteriose Monitorate e Loro Associazioni (PAMELA) study. Circulation 2005;111:1777-83.

Table 2. Ranking of BP-monitoring tests

BP test                                  Validity   Responsiveness   Long-term change
Clinic BP                                4          3                3
Ambulatory BP                            =2 (a)     =1               =1
Home BP                                  =2 (a)     =1               =1
Estimated absolute cardiovascular risk   1

Abbreviation: BP, blood pressure.
(a) Based on risk of cardiovascular events in the PAMELA study [28].


these types of BP measurement. Both types of out-of-office BP also outrank clinic BP for both responsiveness and long-term change, primarily through the reduction in noise that is possible with these types of BP measurement.

Cardiovascular risk estimation (using the Framingham [8] or similar equation) outperforms both lipids and BP in terms of validity. Currently, this is only monitored in the pretreatment population to decide who should be started on lipid- and BP-lowering treatments [26], but theoretically it could also be used for monitoring patients on treatment. How this test performs on measures of responsiveness and long-term change is currently unknown.

5. Discussion and conclusions

The choice of the best test to monitor a patient may be made by applying the methods described previously to available evidence and then ranking tests in terms of validity, responsiveness, and long-term change. Although not discussed in this article, the fourth criterion of practicality is also important. For example, when choosing between home and ambulatory BP (which perform similarly on the criteria of validity, responsiveness, and long-term change), the clinician and patient may opt for home BP measurement because of its ease of application.

The criteria that we describe evaluate monitoring for tests assessed on a continuum. We do this for several reasons: many common monitoring tests are reported as continuous, separating "signal" from "noise" is most easily done using continuous measures, and maintaining the continuity of measurements will increase statistical power [27]. However, sometimes changes observed in continuous tests during monitoring may then be dichotomized, or split into more than two categories, by comparing the level to decision thresholds to decide whether there is a need for altering management or early retesting.
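As an illustration of this last step, the following minimal Python sketch maps a continuous measurement onto ordered categories by comparing it with decision thresholds. The function name `categorize`, the systolic BP cut-offs of 140 and 160 mm Hg, and the category labels are all assumptions made for the example; they are not taken from this article and are not clinical recommendations.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical decision thresholds (mm Hg) and labels, for illustration only.
THRESHOLDS = [140.0, 160.0]
LABELS = ["continue current management", "retest early", "alter management"]


def categorize(level: float,
               thresholds: Sequence[float] = THRESHOLDS,
               labels: Sequence[str] = LABELS) -> str:
    """Return the category of a continuous test level.

    Thresholds must be sorted in ascending order, and there must be exactly
    one more label than thresholds. A level equal to a threshold falls into
    the higher category.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")
    # bisect_right counts how many thresholds are <= level,
    # which is the index of the matching category.
    return labels[bisect_right(thresholds, level)]


if __name__ == "__main__":
    for reading in (135.0, 140.0, 172.0):
        print(reading, "->", categorize(reading))
```

With a single threshold and two labels, the same function dichotomizes the measurement. The continuous value is kept as the input, so the categorization stays a final decision step rather than replacing the underlying measurement.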

Unless one test dominates on all three criteria and on practicality, the choice of test will involve some trade-off between the criteria. Depending on the clinical circumstances, clinicians may give more weight to one of these criteria than the others, although validity will usually be the most important. By making an evidence-based choice on the monitoring test to use, we can expect better clinical outcomes, higher patient and clinician satisfaction, and less waste of scarce health resources.
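The idea of one test dominating another can be made concrete with a short Python sketch. It uses the ranks from Table 2 (1 is best, tied tests share a rank) and treats a test as dominating when it is at least as good on every criterion and strictly better on at least one. The function name `dominates`, the use of `None` for ranks that Table 2 does not report, and the omission of practicality (which Table 2 does not rank) are assumptions of this sketch.

```python
from typing import Dict, Optional

Ranks = Dict[str, Dict[str, Optional[int]]]

# Ranks transcribed from Table 2 (1 = best; None = not reported).
TABLE_2: Ranks = {
    "Clinic BP": {"validity": 4, "responsiveness": 3, "long_term_change": 3},
    "Ambulatory BP": {"validity": 2, "responsiveness": 1, "long_term_change": 1},
    "Home BP": {"validity": 2, "responsiveness": 1, "long_term_change": 1},
    "Estimated absolute cardiovascular risk": {
        "validity": 1, "responsiveness": None, "long_term_change": None,
    },
}


def dominates(a: str, b: str, ranks: Ranks = TABLE_2) -> bool:
    """True if test `a` ranks no worse than `b` on every criterion and
    strictly better on at least one. If any rank is unknown, dominance
    cannot be established, so the function returns False."""
    ra, rb = ranks[a], ranks[b]
    pairs = [(ra[c], rb[c]) for c in ra]
    if any(x is None or y is None for x, y in pairs):
        return False
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


if __name__ == "__main__":
    print(dominates("Home BP", "Clinic BP"))      # True
    print(dominates("Home BP", "Ambulatory BP"))  # False: tied on all three
    print(dominates("Estimated absolute cardiovascular risk", "Clinic BP"))  # False: ranks unknown
```

On these ranks, home and ambulatory BP each dominate clinic BP but not each other, so the choice between them rests on practicality. Estimated cardiovascular risk cannot be shown to dominate any test until its responsiveness and long-term change are known.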

