
STATISTICS IN MEDICINE, VOL. 9, 301-314 (1990)

DISEASE ENTITIES, MIXED MULTI-NORMAL DISTRIBUTIONS, AND THE ROLE OF THE HYPERKINETIC STATE IN THE PATHOGENESIS OF HYPERTENSION

NICHOLAS J. SCHORK, ALAN B. WEDER, M. ANTHONY SCHORK, DAVID R. BASSETT AND STEVO JULIUS

University of Michigan, Kresge Medical Research Building I, R6661, Box 0500, Ann Arbor, Michigan 48109, U.S.A.

SUMMARY

This paper concerns the theory and relevance of finite mixtures of univariate and multivariate normal distributions in medical research, and suggests that multivariate normal mixture analysis, hitherto not extensively explored, is an appealing approach to the investigation of etiologically obscure, multifactorial diseases such as hypertension. We elaborate a statistical strategy to resolve and test for a normal mixture distribution in a seemingly heterogeneous population that actually comprises homogeneous subpopulations. We use this strategy to validate the hypothesis that the population at large contains a subgroup of individuals with a hyperkinetic circulatory state, defined as the association of an elevated cardiac index or heart rate with high blood pressure. This subgroup may have implications for the pathogenesis of hypertension. We discuss directions and implications for future research into the pathogenesis of hypertension.

1. INTRODUCTION

Human diseases, particularly those involving dysregulation of complex physiological systems, are commonly defined by relatively broad, easily observable traits: hypertension by high blood pressure; diabetes mellitus by high blood sugar; obesity by excessive body weight. A focus on these traits is often a useful simplification, such as for the design of therapeutic interventions, but may obscure the heterogeneity of the disease process. Broad traits actually define disease families composed of numerous pathogenetic entities that converge upon a common feature, and rarely does the study of a (broad) trait alone uncover the several causes of the disease. The basis for the search for the causes of complex dysregulations therefore must consist of a taxonomy created by the partition of disease traits into distinct subgroups based on discrete physiological, biochemical or genetic characters.

One approach to the partition of multifactorial disease has been the study of elements that contribute to the regulation of a trait of interest, the expectation being that extreme values of a regulatory element may dominate the expression of the final trait to such an extent as to constitute the sole cause of the disease in affected individuals. In hypertension research, this approach has proved useful in the definition of several clear causes of high blood pressure, for example, pheochromocytoma, as well as clinically useful physiological subgroupings of less obvious etiological importance, for example, high- and low-renin hypertensions. It has been argued, however, that for the bulk of hypertensives the number of potential regulators of blood pressure is too great, and the contribution of individual regulatory elements too small, to expect success with this approach.1,2 In part, this pessimism stems from the reductionist bias of many investigators, who favour beginning the search for subgroups at a very basic biochemical or genetic level. The attraction of such an approach is obvious, since a subgroup defined at that level will most likely have a unique cause of hypertension; however, it ignores the hierarchical nature of cardiovascular function and directs attention away from other potentially useful, albeit more complex, subgroup schemes based on functional characteristics of the circulation. Although such functional disorders retain a level of complexity that requires further investigation, they have appeal precisely because they are proximate to the final trait of interest, hypertension. We should view the approach to understanding hypertension as a co-operative effort that employs both ‘top-down’ and ‘bottom-up’ strategies, directed toward the single ultimate goal of explaining not only the root causes of hypertension but also the intermediary dysfunctions (referred to as ‘disease entities’ below) that occur at all levels of the cardiovascular hierarchy.

Received April 1988
Revised August 1989

0277-6715/90/030301-14$07.00
© 1990 by John Wiley & Sons, Ltd.

One major impediment to the delineation of disease entities that contribute to quantitative disease traits such as hypertension is the difficulty of isolating discrete subgroups from what are usually continuously distributed variables (for example, blood pressure) at the more integrated, higher-order levels of cardiovascular function. The problem is best approached statistically, but only recently have techniques of appropriate power been applied to it. In this paper we elaborate an approach to, and statistical techniques for, the identification, within a quantitative disease trait, of subgroups of individuals characterized by discrete, complex physiological characteristics. We apply the statistical models developed to a common disease entity, hyperkinetic hypertension, to demonstrate how such subgroup analysis can identify clusters of individuals suitable for further study at more basic levels.

2. CLUSTER ANALYSIS VIA NORMAL MIXTURE DISTRIBUTIONS

2.1. Background

Statistical techniques for the identification of subgroups or clusters of observations have been described previously.3,4 Many of these techniques have purely exploratory or speculative purposes and make no assumptions about the number, relative position, or distributional form of the subpopulations; as a result, the clusters or subgroups gleaned from such analyses may resist probabilistic and interpretative assessment. In the following we suggest that mixtures of normal (or, more specifically, multi-normal) distributions provide effective, statistically tractable, and intuitively appealing clustering devices that lend themselves to the delineation and investigation of disease entities.

In the mid-1960s Morton5 developed a terminology to describe the effects that underlying factors can have on the overall distribution of a quantitative trait (in Morton’s research, the factors were associated with genes). Although we borrow this terminology, we dispense with connotations that implicate purely genetic effects. Morton labelled as microphenic those factors whose effects are small relative to the overall standard deviation of the trait, and as megaphenic those whose effects are large. For most quantitative traits there are typically many microphenic factors; these can include products of an individual’s genome and purely environmental influences, as well as interactive effects in behavioural, physiological and biochemical systems. The identification of such microphenic effects often depends on devising an experimental approach in which one can study a proposed microphenic factor in isolation, so that its contribution to the overall variability of the trait is estimable. Megaphenic factors affecting quantitative traits, although conspicuous when isolated, are rare. When identified, however, megaphenic effects, in addition to explaining quantitative variability, can admit qualitative interpretations: we can subdivide populations into groups affected or not affected by the megaphenic factor. If we assume that traits with heterogeneous or multifactorial determinants are in part determined by a number of (microphenic) factors whose summed effects contribute to the overall variation of the trait in the population at large, we can invoke the Central Limit Theorem to assert that the distribution of values of a relevant quantitative trait is approximately normal. The effects of a megaphenic factor will then act to displace the average value of an affected subgroup of people from the average value of those people not affected. Within each subgroup the same microphenic factors are operative, so that the distribution of values within each subgroup is approximately normal. Thus, we can model the joint effects of microphenic and megaphenic factors on the variation of a specific trait as a mixture of normal distributions. The separation between the component distribution means will reflect a distance that is proportional to the displacing effect of the megaphenic factor or factors, and the proportion of observations assignable to each component distribution is proportional to the prevalence of the megaphenic factor.
When the subpopulations are sufficiently discrete, as assessed by a statistically significant likelihood-ratio test showing that a mixture of normal distributions explains the variation in a trait (or traits) better than a single distribution, we can say that a disease entity or entities (that is, significant megaphenic effects) exist. It is important to note that this approach is not limited to a single trait of interest: disease entities may manifest themselves as a complex of physiological and biochemical mechanisms, for example, the hyperkinetic circulatory state we describe below. Though published work on univariate normal mixture distributions is extensive (see the reference list in Titterington, Smith, and Makov6), multivariate normal mixtures have received considerably less attention. We believe that the multivariate normal setting is more appropriate for the isolation of disease entities simply because it is rare that a single factor, acting independently of other physiological mechanisms, is the sole determinant of a complex physiological dysfunction in a group of individuals. In addition, a multivariate normal model lets one take advantage of known correlates or associations that better characterize the presumed functional disorders as candidate disease entities. We emphasize that we make no claim that any single phenomenon (for example, the working of a major gene) can explain all megaphenic forces; we merely draw attention to concepts that can greatly facilitate the investigation of causally obscure but common traits known to have deleterious (that is, ‘disease’) manifestations.
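The microphenic/megaphenic argument above can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: the number of small effects (40), their uniform distribution, the 20 per cent prevalence and the displacement of 3.0 units are all hypothetical values, chosen only to show that summing many small effects and adding one large shift in a subgroup produces a two-component normal mixture.

```python
import random
import statistics

def simulate_trait(n, n_micro=40, prevalence=0.2, shift=3.0, seed=1):
    """Simulate a quantitative trait for n people (illustrative values only).

    Each value is the sum of `n_micro` small, independent (microphenic)
    effects, plus a large (megaphenic) displacement `shift` for a randomly
    chosen subgroup of expected relative size `prevalence`.
    Returns the trait values and a parallel list of affected flags.
    """
    rng = random.Random(seed)
    values, affected = [], []
    for _ in range(n):
        # Sum of many small effects: approximately normal by the CLT.
        micro = sum(rng.uniform(-0.5, 0.5) for _ in range(n_micro))
        is_affected = rng.random() < prevalence
        # The megaphenic factor displaces the affected subgroup's mean.
        values.append(micro + (shift if is_affected else 0.0))
        affected.append(is_affected)
    return values, affected

values, affected = simulate_trait(5000)
aff = [v for v, a in zip(values, affected) if a]
unaff = [v for v, a in zip(values, affected) if not a]
print(f"affected fraction: {len(aff) / len(values):.3f}")
print(f"mean difference:   {statistics.mean(aff) - statistics.mean(unaff):.2f}")
print(f"within-group SDs:  {statistics.stdev(aff):.2f}, {statistics.stdev(unaff):.2f}")
```

The affected fraction should land near the assumed prevalence and the difference in group means near the assumed shift, while the two within-group SDs are similar (about the square root of 40/12, roughly 1.83) because the same small effects operate in both groups.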

2.2. Mathematical details

In this section, we concentrate on the mathematical formulation and statistical assessment of models that depict disease entities and the workings of megaphenic forces.

Consider the case in which we observe a sample of size $N$, collected on $M$ different variables, from a population hypothesized to be composed of $K$ specific subpopulations, such that each observation has probability $p_k$ of belonging to the $k$th subpopulation, with $\sum_{k=1}^{K} p_k = 1$. Suppose further that the $M$ variables are multivariate normally distributed within each subpopulation. The probability density of a single observation vector, $\mathbf{x}_n$, under the assumption that the subdistributions either possess unique variance-covariance structures or share a common covariance structure, becomes:

$$p^{u}(\mathbf{x}_n) = \sum_{k=1}^{K} p_k \,(2\pi)^{-M/2} \lvert \Sigma_k \rvert^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf{T}} \Sigma_k^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_k)\right\} \tag{1a}$$

$$p^{e}(\mathbf{x}_n) = \sum_{k=1}^{K} p_k \,(2\pi)^{-M/2} \lvert \Sigma \rvert^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf{T}} \Sigma^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_k)\right\} \tag{1b}$$

where equation (1a) assumes unequal variance-covariance structures and equation (1b) assumes a common covariance structure; $\boldsymbol{\mu}_k$ is the mean vector and $\Sigma_k$ the variance-covariance matrix for the $k$th subdistribution ($\Sigma$ is the common variance-covariance matrix in equation (1b)). The log-likelihoods for the parameters $p_k$, $\boldsymbol{\mu}_k$ and $\Sigma_k$, given a sample of $N$ observations, are:

$$L^{u}(p_1, \ldots, p_K, \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_K, \Sigma_1, \ldots, \Sigma_K) = \sum_{n=1}^{N} \log_e\left[p^{u}(\mathbf{x}_n)\right] \tag{2a}$$

$$L^{e}(p_1, \ldots, p_K, \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_K, \Sigma) = \sum_{n=1}^{N} \log_e\left[p^{e}(\mathbf{x}_n)\right] \tag{2b}$$
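Equations (1a) to (2b) translate directly into code. The following is a minimal NumPy sketch; the function names `mvn_logpdf` and `mixture_loglik` are ours, not the paper's. It evaluates each normal density through a Cholesky factor and sums over components in log space (a log-sum-exp), a standard guard against numerical underflow that the paper does not discuss.

```python
import numpy as np

def mvn_logpdf(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    M = X.shape[1]
    L = np.linalg.cholesky(cov)                    # cov = L @ L.T
    z = np.linalg.solve(L, (X - mean).T)           # z = L^{-1} (x - mu)
    maha = np.sum(z ** 2, axis=0)                  # (x - mu)' cov^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log |cov|
    return -0.5 * (M * np.log(2.0 * np.pi) + log_det + maha)

def mixture_loglik(X, props, means, covs):
    """Log-likelihood (2a) or (2b) of a K-component normal mixture.

    Pass K covariance matrices for the unequal-covariance model (1a),
    or a single M x M matrix, shared by every component, for model (1b).
    """
    covs = np.asarray(covs, dtype=float)
    if covs.ndim == 2:                             # common covariance, (1b)
        covs = np.repeat(covs[None, :, :], len(props), axis=0)
    # log[p_k * phi(x_n; mu_k, Sigma_k)] for every observation n and component k
    log_terms = np.column_stack([np.log(p) + mvn_logpdf(X, m, S)
                                 for p, m, S in zip(props, means, covs)])
    # log-sum-exp over k gives log p(x_n) without underflow; then sum over n
    top = log_terms.max(axis=1, keepdims=True)
    log_px = top[:, 0] + np.log(np.exp(log_terms - top).sum(axis=1))
    return float(log_px.sum())

X = np.array([[0.0, 0.0], [1.0, 2.0], [5.0, 5.0]])
means = [np.zeros(2), np.full(2, 5.0)]
print(mixture_loglik(X, [0.7, 0.3], means, [np.eye(2), 2.0 * np.eye(2)]))  # model (2a)
print(mixture_loglik(X, [0.7, 0.3], means, np.eye(2)))                     # model (2b)
```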

To isolate a disease entity (that is, the presence of a megaphenic factor) we must test the hypothesis that a distribution assuming K subpopulations explains the variation exhibited by a trait better than does a distribution assuming K' ≠ K subpopulations. We can accomplish this through likelihood-ratio testing. Thus, tests of hypotheses that implicate different numbers of subpopulations involve maximization of equation (2) over the entire sample for each hypothesis, which can be done with a number of numerical techniques. Though success has been achieved with variants of Newton and quasi-Newton iteration schemes,7 an iterative scheme based on first partial derivatives (typically known as the EM algorithm8,9) possesses some favourable numerical properties, and is the scheme we both recommend and used for the analyses described in Sections 4 and 5 of this paper. The assumption concerning the equality of the variance-covariance structures is important. One can argue that disease entities uncovered in an affected group may involve a correlation structure among certain variables unlike the correlation structure among those same variables in an unaffected population. We therefore encourage an initial assumption of unequal variance-covariance structures, with a subsequent test for equality of the structures once the normal mixture parameters have been estimated. We also encourage re-running the analyses under the assumption of a common covariance structure, to allow comparison of the results and their interpretations. In addition to the covariance matrix assumption, there are several other aspects of normal mixture distribution fitting that one needs to address explicitly.
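The EM scheme recommended above alternates an E-step (posterior probabilities of component membership for every observation) with an M-step (responsibility-weighted estimates of $p_k$, $\boldsymbol{\mu}_k$ and $\Sigma_k$). What follows is a minimal NumPy sketch under stated assumptions, not the authors' program: the farthest-point initialisation, the convergence tolerance and the small ridge added to each covariance matrix are our own choices. Setting `common_cov=True` pools the component covariances, giving the common-covariance model (1b).

```python
import numpy as np

def _mvn_logpdf(X, mean, cov):
    """Log density of N(mean, cov) at each row of X (Cholesky-based)."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)
    return -0.5 * (X.shape[1] * np.log(2.0 * np.pi)
                   + 2.0 * np.log(np.diag(L)).sum() + (z ** 2).sum(axis=0))

def em_normal_mixture(X, K, common_cov=False, n_iter=500, tol=1e-8, seed=0):
    """Fit a K-component multivariate normal mixture by EM.

    Returns (props, means, covs, trace); trace holds the log-likelihood
    (2a) or (2b) at each iteration, which EM never decreases.
    """
    X = np.asarray(X, dtype=float)
    N, M = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point start (an assumption of this sketch): one random point,
    # then repeatedly the point farthest from those already chosen.
    idx = [int(rng.integers(N))]
    for _ in range(1, K):
        d2 = np.min([((X - X[i]) ** 2).sum(axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(d2)))
    means = X[idx].copy()
    props = np.full(K, 1.0 / K)
    covs = np.array([np.cov(X, rowvar=False)] * K)
    trace = []
    for _ in range(n_iter):
        # E-step: responsibilities resp[n, k] = P(component k | x_n)
        log_terms = np.column_stack([np.log(props[k]) + _mvn_logpdf(X, means[k], covs[k])
                                     for k in range(K)])
        top = log_terms.max(axis=1, keepdims=True)
        log_px = top + np.log(np.exp(log_terms - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_terms - log_px)
        trace.append(float(log_px.sum()))
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        # M-step: responsibility-weighted proportions, means and covariances
        nk = resp.sum(axis=0)
        props = nk / N
        means = (resp.T @ X) / nk[:, None]
        covs = np.array([(resp[:, k, None] * (X - means[k])).T @ (X - means[k]) / nk[k]
                         for k in range(K)])
        if common_cov:
            # Pooled within-component covariance for the common model (1b)
            covs = np.array([np.tensordot(props, covs, axes=1)] * K)
        covs = covs + 1e-9 * np.eye(M)   # small ridge against singular matrices
    return props, means, covs, trace
```

EM converges only to a local maximum, so in practice one would run it from several starting points and keep the fit with the largest log-likelihood.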

Skewness

The interpretation of skewness in a distribution, like that of conspicuous multimodality, is often that a mixture of distributions will fit the given set of (skewed) data better than a single distribution. Many traits, however, exhibit skewed distributions for reasons other than the presence of subpopulations; for instance, a trait may follow a simple log-normal or exponential distribution. What is more, Schork and Schork10 have shown that for given levels of skewing, a mixture of two normal distributions will always fit the data better than a single normal distribution. What is needed, then, is a test that guards against interpretations that implicate subpopulations when a skewed, single-population distribution is more appropriate. Schork and Schork10 elaborate such a technique. Though not trivial to implement, this technique does offer some protection against false inferences.
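The problem described above, that a two-normal mixture fits skewed data better than a single normal even when no subpopulations exist, can be seen in a small pure-Python sketch. It simulates one log-normal population, fits a single normal and a two-component mixture (by EM), and compares maximized log-likelihoods. It illustrates the pitfall only and is not an implementation of the Schork and Schork test; the sample size, log-normal parameters and half-split starting values are hypothetical choices.

```python
import math
import random

def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def logaddexp(a, b):
    """log(exp(a) + exp(b)) computed without overflow."""
    hi = max(a, b)
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi))

def loglik_one_normal(xs):
    """Maximized log-likelihood of a single normal distribution."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n     # maximum-likelihood variance
    return sum(normal_logpdf(x, mu, var) for x in xs)

def loglik_two_normals(xs, n_iter=200):
    """Log-likelihood of a two-component normal mixture fitted by EM,
    started from the lower and upper halves of the sorted data."""
    s = sorted(xs)
    n, h = len(s), len(s) // 2
    lo, up = s[:h], s[h:]
    p = 0.5
    m1, m2 = sum(lo) / len(lo), sum(up) / len(up)
    v1 = sum((x - m1) ** 2 for x in lo) / len(lo)
    v2 = sum((x - m2) ** 2 for x in up) / len(up)
    for _ in range(n_iter):
        # E-step: posterior probability that each x came from component 1
        r = []
        for x in xs:
            a = math.log(p) + normal_logpdf(x, m1, v1)
            b = math.log(1.0 - p) + normal_logpdf(x, m2, v2)
            r.append(math.exp(a - logaddexp(a, b)))
        # M-step: weighted proportion, means and variances
        n1 = sum(r)
        n2 = n - n1
        p = n1 / n
        m1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        m2 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / n2
        v1 = sum(ri * (x - m1) ** 2 for ri, x in zip(r, xs)) / n1
        v2 = sum((1.0 - ri) * (x - m2) ** 2 for ri, x in zip(r, xs)) / n2
    return sum(logaddexp(math.log(p) + normal_logpdf(x, m1, v1),
                         math.log(1.0 - p) + normal_logpdf(x, m2, v2)) for x in xs)

# One skewed (log-normal) population: no subpopulations exist.
rng = random.Random(7)
xs = [rng.lognormvariate(0.0, 0.6) for _ in range(400)]
l1, l2 = loglik_one_normal(xs), loglik_two_normals(xs)
print(f"single normal: {l1:.1f}   two-normal mixture: {l2:.1f}")
```

Because a single normal is a special case of the mixture, the mixture's global maximum can never be lower; with skewed data the gain is typically large, which is why a naive likelihood comparison can suggest subpopulations that do not exist.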

Hypothesis testing

In using the likelihood ratio test to investigate hypotheses concerning K and K' subpopulations, one cannot generally rely on the usual chi-square asymptotics (see, for example, Hartigan11 for theoretical details and Schork and Schork10 for empirical justification). Wolfe12 suggests that in certain situations the difference in log-likelihoods under the assumption of K and K' subpopulations is approximately distributed as chi-square when transformed as follows:

$$-\frac{2}{N}\left(N - 1 - M - \frac{K'}{2}\right)\left(L_K - L_{K'}\right) \sim \chi^2(\text{d.f.}) \tag{3}$$

where $L_K$ and $L_{K'}$ are the log-likelihoods under the assumption of K and K' = K + 1 subpopulations, and the degrees of freedom (d.f.) are 2 × (the difference in the number of parameters assuming K and K' subpopulations, not including the probability-of-membership parameters $p_k$). Though this statistic is appropriate for small samples, it is poor for large samples. We offer it for want of any other, though one could use a conservative implementation of this statistic by simply adding one degree of freedom to the total d.f. derived by Wolfe.12 As an alternative, one could forgo asymptotics and rely on bootstrap tests of hypotheses (see McLachlan13 or Schork and Schork14,15).
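Equation (3) and its degrees of freedom can be computed directly. The plain-Python sketch below assumes M-variate components with either unequal covariance matrices (M means plus M(M + 1)/2 covariance terms per component) or one shared matrix. Because Wolfe's d.f. are twice a parameter count, they are always even, so the chi-square upper-tail probability has the closed form $e^{-x/2}\sum_{i=0}^{\text{d.f.}/2-1}(x/2)^i/i!$ and no statistics library is needed. The function names are ours.

```python
import math

def n_component_params(M, K, common_cov=False):
    """Mean and covariance parameters of a K-component, M-variate normal
    mixture, excluding the mixing proportions p_k."""
    n_cov = M * (M + 1) // 2
    return K * M + (n_cov if common_cov else K * n_cov)

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability for even d.f. (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form needs a positive, even d.f.")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def wolfe_test(loglik_K, loglik_K1, N, M, K, common_cov=False):
    """Wolfe's modified likelihood-ratio test of K versus K' = K + 1 components."""
    K1 = K + 1
    stat = -(2.0 / N) * (N - 1 - M - K1 / 2.0) * (loglik_K - loglik_K1)
    df = 2 * (n_component_params(M, K1, common_cov) - n_component_params(M, K, common_cov))
    return stat, df, chi2_sf_even(stat, df)

# Hypothetical log-likelihoods for K = 1 versus K' = 2, N = 444, M = 2
stat, df, p = wolfe_test(-1250.0, -1200.0, N=444, M=2, K=1)
print(df, round(stat, 2), p)
print(round(chi2_sf_even(8.7, 10), 2))   # 0.56
```

As a check on the d.f. count, two variables with unequal covariances give 10 d.f. for any K versus K + 1, and treating the 2 versus 3 value of 8.7 reported in Table I for the Ann Arbor heart-rate analysis as chi-square on 10 d.f. gives p ≈ 0.56, the value reported there.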

Classification

Once we have found parameters for a mixed-normal distribution, we must classify the observations into the subgroups to which they most likely belong. There are many techniques for achieving this; an excellent survey of the methods is given by Johnson and Wichern.16 Classification of each observation into the appropriate group presents some difficulties. Typically, overlap between the subpopulations results in observations that could plausibly be classified into any one of several groups. Though classification techniques generally have mechanisms to resolve marginal cases objectively, we mention this phenomenon because the values of the mixture parameters will invariably differ before and after classification. Thus, one should not interpret the displacing effect of megaphenic factors until one has examined the parameter values after classification of the observations.
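A common classification rule, and the one sketched here in NumPy, assigns each observation to the component with the largest estimated posterior probability (a Bayes, or maximum-a-posteriori, rule). The paper does not say which of the surveyed rules was used, so treat this choice as an assumption. The second function re-estimates proportions, means and SDs from the classified groups, the step after which, as noted above, megaphenic displacements should be interpreted.

```python
import numpy as np

def _mvn_logpdf(X, mean, cov):
    """Log density of N(mean, cov) at each row of X (Cholesky-based)."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)
    return -0.5 * (X.shape[1] * np.log(2.0 * np.pi)
                   + 2.0 * np.log(np.diag(L)).sum() + (z ** 2).sum(axis=0))

def classify(X, props, means, covs):
    """Assign each row of X to the component with the largest posterior
    probability p_k phi(x; mu_k, Sigma_k) / sum_j p_j phi(x; mu_j, Sigma_j)."""
    X = np.asarray(X, dtype=float)
    log_num = np.column_stack([np.log(p) + _mvn_logpdf(X, m, S)
                               for p, m, S in zip(props, means, covs)])
    # The denominator is the same for every k, so the argmax of the numerators suffices.
    return np.argmax(log_num, axis=1)

def group_parameters(X, labels, K):
    """Proportion, mean vector and SD vector of each classified group."""
    X = np.asarray(X, dtype=float)
    return [(float(np.mean(labels == k)), X[labels == k].mean(axis=0),
             X[labels == k].std(axis=0, ddof=1)) for k in range(K)]

# Hypothetical fitted mixture and observations
props, means, covs = [0.5, 0.5], [np.zeros(2), np.full(2, 5.0)], [np.eye(2), np.eye(2)]
X = np.array([[0.1, -0.2], [-0.3, 0.4], [5.2, 4.9], [4.8, 5.3], [2.0, 2.0]])
labels = classify(X, props, means, covs)
print(labels, group_parameters(X, labels, 2))
```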

In the sections that follow, we focus on a hypothesis amenable to the techniques just outlined, and present the results of a large-scale study that investigated the existence of a disease entity based on this hypothesis.

3. BORDERLINE HYPERTENSION AND THE HYPERKINETIC STATE

Blood pressure is a quantitative physical sign continuously distributed in the population, and hypertensives are individuals whose average blood pressure exceeds an accepted, arbitrary limit, usually 90 mmHg diastolic. Although some 5-10 per cent of hypertensives have a definable cause (the ‘secondary’ hypertensions), the majority have high blood pressure for no discernible reason: we say such persons have primary or essential hypertension. In the search for clues to the pathophysiology of essential hypertension, considerable attention has focused on its mildest form, borderline hypertension. Patients with borderline hypertension are those whose blood pressure on different occasions oscillates above and below the arbitrary threshold that defines the hypertensive range. Because such patients are largely free of cardiovascular end-organ damage and have an increased risk of progression to higher blood pressure, the antecedents of essential hypertension may show themselves most clearly in borderline hypertensives. Using invasive haemodynamic techniques, Julius and co-workers described a subset of borderline hypertensives with increased cardiac output and quantitatively normal vascular resistance.17-19 Subsequent studies proved that this ‘hyperkinetic’ circulatory state is neurogenic in origin and probably arises from an abnormality in the central integration of neural inputs in medullary cardiovascular centres.20-22 The ‘hyperkinetic’ circulatory state, however, is not present in established essential hypertension, where vascular resistance is persistently elevated while cardiac output is normal to subnormal. What happens to ‘hyperkinetic’ borderline hypertensives? Julius has proposed that they evolve into high-resistance hypertension by a physiologically plausible mechanism involving beta-receptor downregulation in the heart, which would tend to decrease cardiac index, and, as a result, a simultaneous increase in vascular smooth muscle hypertrophy, which should amplify vascular reactivity and increase resistance.23 Lund-Johanson24 provided evidence supporting such a transition: he performed serial haemodynamic studies in a small group of borderline hypertensives over an 18-year period and observed a progressive rise in vascular resistance and a reciprocal normalization of cardiac output. If, as this construct implies, hyperkinetic borderline hypertension is ‘pre-hypertension’, we can undertake the development of strategies aimed at preventing the transition to established hypertension.

There are barriers to extending the studies of Lund-Johanson and to testing Julius’s hypothesis of the hyperkinetic state. Many of the studies that form the basis for the hypothesis were invasive and were performed in elaborate, well-controlled laboratory settings on patients recruited on the basis of their physiological traits (such as borderline high blood pressure). Clearly it is difficult to imagine such studies in large population-based samples. If, however, the hyperkinetic state is common and affected individuals have characteristics that one can assess at a simple clinical examination, then one can test hypotheses about the hyperkinetic state in large extant data sets. Because heart rate is a major determinant of cardiac output and is controlled by the same adrenergic and cholinergic systems implicated pathogenetically in the hyperkinetic state, we postulated that heart rate is a sufficiently precise indicator of cardiac output to permit detection of the hyperkinetic circulatory state. A hypothesis of interest, then, is whether or not the hyperkinetic state is a true disease entity. If it is not, one could argue that it is simply an artifact of sampling people from the upper end of the blood pressure, cardiac output, or heart rate distribution. To examine this hypothesis, we searched for megaphenic effects in the joint distributions of blood pressure and cardiac output, and of blood pressure and heart rate, in three large data sets. We describe in the next two sections the data sets, the study design, and the results of this investigation.

4. ASSESSING THE HYPERKINETIC STATE AS A DISEASE ENTITY: METHODS

4.1. The data

To examine the hypotheses outlined in Section 3, we analysed three different data sets. The variables of interest include cardiac index, heart rate and mean blood pressure. We used mean blood pressure, defined as diastolic blood pressure + (systolic − diastolic)/3, in preference to systolic or diastolic blood pressure since, from a physiological standpoint, mean blood pressure = cardiac output × resistance, and this relation integrates most easily into the context of the haemodynamic patterns implicated in the research outlined earlier. Cardiac output was indexed to body surface area (hence, cardiac index) in our data to eliminate the possible confounding effects of body size in the mixture analysis.
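The derived variables used in the analysis are simple to compute. The sketch below assumes that cardiac output (l/min) and body surface area (m²) are already available; the paper does not say how body surface area was obtained. The resistance function merely rearranges the relation mean blood pressure = cardiac output × resistance quoted above and, like that relation, ignores venous pressure. All names and numbers are hypothetical.

```python
def mean_blood_pressure(systolic, diastolic):
    """Mean blood pressure (mmHg) = diastolic + (systolic - diastolic) / 3."""
    return diastolic + (systolic - diastolic) / 3.0

def cardiac_index(cardiac_output, body_surface_area):
    """Cardiac output (l/min) indexed to body surface area (m^2), in l/min/m^2."""
    return cardiac_output / body_surface_area

def vascular_resistance(mbp, cardiac_output):
    """Resistance implied by MBP = cardiac output x resistance (mmHg.min/l)."""
    return mbp / cardiac_output

# Hypothetical subject: 140/90 mmHg, cardiac output 6.3 l/min, BSA 1.8 m^2
mbp = mean_blood_pressure(140.0, 90.0)
ci = cardiac_index(6.3, 1.8)
print(round(mbp, 2), round(ci, 2))   # 106.67 3.5
print(round(vascular_resistance(mbp, 6.3), 2))
```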

The first data set (subsequently referred to as the ‘Ann Arbor’ data set) consisted of 444 normotensive and borderline hypertensive subjects who had undergone invasive haemodynamic monitoring in the laboratories of the University of Michigan over the past two decades.25 We used these data to verify the existence of the hyperkinetic subgroup, as well as to explore the potential of heart rate as a surrogate marker for the more invasive haemodynamic measurement (that is, cardiac index) previously used to define the hyperkinetic state.

We then evaluated two independent data sets to test further the hypothesis that the hyperkinetic subgroup, as defined by the presence of both elevated heart rate and elevated mean blood pressure, is a true disease entity or process identifiable in the general population. The first of these consisted of 1005 patients never diagnosed as having hypertension, who were evaluated in an outpatient setting at the University of Michigan Atherosclerosis Risk Factor detection programme (the ‘Risk factor’ data set). The second consisted of 2633 randomly selected persons who participated in a Michigan state-wide survey of home blood pressure levels funded by the National Heart, Lung, and Blood Institute (the ‘State-wide’ data set26).

4.2. Data adjustment techniques

To ensure that the presence of a mixture in our data was not due to simple demographic and body size variables, we employed a data adjustment technique, which consisted of creating variables for race and sex, as well as the first-, second- and third-order (that is, power) effects of weight, height and age. In addition, we generated all combinations of these variables to allow for interaction effects. These variables were then entered into a stepwise regression algorithm as independent variables, with heart rate and mean blood pressure (and cardiac index for the Ann Arbor data set) as separate dependent variables. We retained in the final model only those variables that explained a significant (p < 0.05) portion of the variability in heart rate and mean blood pressure, based on a partial F-test criterion. We used in the mixture analysis the residuals from these models, which represent variability not explained by the demographic and body size factors. With this procedure, these factors explained 5 per cent, 7 per cent and 7 per cent of the heart rate variability, and 16 per cent, 18 per cent and 14 per cent of the mean blood pressure variability, for the Ann Arbor, Risk factor and State-wide data sets, respectively. We added back to the residuals the mean values of the unadjusted variables to aid in the interpretation of the results. For reasons of brevity we do not show the final models; however, weight, height and age, along with weight and age interactions, dominated the models.
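The adjustment step amounts to residualizing each variable on the demographic and body-size covariates and then shifting the residuals back to the original mean. Here is a minimal NumPy sketch with two stated simplifications: it fits ordinary least squares on whatever covariate columns are passed in (no stepwise partial F-test selection), and the covariates and data in the usage example are hypothetical.

```python
import numpy as np

def adjust(y, covariates):
    """Remove the linear effect of the covariates from y (OLS with an
    intercept) and add back the unadjusted mean of y.

    Unlike the stepwise procedure in the text, every supplied covariate
    column is kept in the model.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(c, dtype=float) for c in covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return residuals + y.mean()

# Hypothetical data: heart rate depending on age and weight
rng = np.random.default_rng(3)
age = rng.uniform(20, 70, 200)
weight = rng.normal(75, 12, 200)
hr = 60 + 0.1 * age + 0.05 * weight + rng.normal(0, 8, 200)
covariates = [age, age ** 2, weight, weight ** 2, age * weight]   # powers and an interaction
hr_adj = adjust(hr, covariates)
print(round(hr.mean(), 2), round(hr_adj.mean(), 2))
```

Adding back the mean changes only the location of the adjusted variable, so the residual structure that the mixture analysis examines is untouched while results stay in familiar units.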

4.3. Analytic strategy

To determine the status of the hyperkinetic state as a true disease entity, we searched for megaphenic effects in the adjusted joint distributions of blood pressure and cardiac index or heart rate, using the normal mixture likelihood methods outlined in Section 2. To assist interpretation and safeguard against misinterpretation, we employed a number of statistical strategies. For each analysis, we obtained bootstrap estimates of the standard error and bias of each parameter in the mixture distribution.27 Small estimates of the standard error and bias of the parameters suggest stability of the mixture distribution, and militate against the interpretation that the mixture arose from outliers or statistical noise (for example, poor convergence of the parameter estimates or too few sample values). In addition, we used the test outlined in Schork and Schork10 to test the hypothesis that a single skewed distribution may explain the variation in a given set of data better than a mixture of normal distributions. Finally, once we obtained reliable mixture parameters, we classified each observation into the appropriate subgroup by discriminant analysis. After classification, we tested the values of the observations in each subgroup for normality, to confirm the appropriateness of the assumption of a mixture of normal distributions in the identification of megaphenic effects.

Table I. Mixture distribution parameter estimates derived from the mixture analysis for each of the four analyses, using adjusted values (see text)

| Data set | Measure | Prop (1) | Mean (1) | SD (1) | Prop (2) | Mean (2) | SD (2) | 2 × log LR, 1 vs 2 | 2 × log LR, 2 vs 3 |
|---|---|---|---|---|---|---|---|---|---|
| Ann Arbor | CI | 0.84 | 5.81 | 0.63 | 0.16 | 6.81 | 1.33 | 89.4 (p < 0.0000001) | 11.7 (p = 0.30) |
| | MBP | | 86.74 | 0.80 | | 87.62 | 1.09 | | |
| Ann Arbor | HR | 0.93 | 64.77 | 0.82 | 0.07 | 66.68 | 1.07 | 45.7 (p < 0.0000001) | 8.7 (p = 0.56) |
| | MBP | | 86.79 | 0.91 | | 88.12 | 1.17 | | |
| Risk factor | HR | 0.83 | 71.53 | 0.78 | 0.17 | 72.66 | 1.16 | 90.6 (p < 0.0000001) | 13.9 (p = 0.18) |
| | MBP | | 93.77 | 0.75 | | 94.69 | 1.04 | | |
| State-wide | HR | 0.76 | 73.71 | 0.84 | 0.24 | 74.06 | 1.31 | 210.1 (p < 0.0000001) | 17.4 (p = 0.07) |
| | MBP | | 91.89 | 0.66 | | 92.49 | 1.14 | | |

(1): sub-distribution 1 (normokinetic); (2): sub-distribution 2 (hyperkinetic). Prop: proportion of observations in a sub-population; Mean: mean value; SD: standard deviation; 2 × log LR: twice the log-likelihood ratio of 1 versus 2, or 2 versus 3, mixed distributions. CI: cardiac index; HR: heart rate; MBP: mean blood pressure.
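The bootstrap standard-error and bias estimates described in Section 4.3 follow a simple recipe, sketched here in plain Python under stated assumptions: resampling is nonparametric (observations drawn with replacement), 1000 replicates are used, and the estimator in the usage example is a plain mean standing in for a mixture parameter.

```python
import random
import statistics

def bootstrap_se_bias(data, estimator, n_boot=1000, seed=0):
    """Nonparametric bootstrap standard error and bias of estimator(data).

    Resamples the observations with replacement n_boot times; the SE is the
    standard deviation of the replicates, and the bias is the replicate mean
    minus the original estimate.
    """
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    reps = [estimator([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    return theta_hat, statistics.stdev(reps), statistics.mean(reps) - theta_hat

# Hypothetical example: SE and bias of a sample mean
data = [float(i) for i in range(100)]
est, se, bias = bootstrap_se_bias(data, statistics.mean)
print(est, round(se, 2), round(bias, 3))
```

In the mixture setting, `estimator` would refit the whole mixture (for example by EM) on each resample and return one parameter. Components should then be put in a consistent order (say, by mean) in every replicate, because their labels can swap between fits.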

5. ASSESSING THE HYPERKINETIC STATE AS A DISEASE ENTITY: RESULTS

5.1. Hyperkinetic borderline hypertension

As shown in Table I (first section), the results of the mixture analysis of the joint distribution of resting cardiac index (CI) and mean blood pressure (MBP) strongly favoured the presence of two subgroups in the Ann Arbor data set. Table II presents the characteristics of the groups after classification of each observation; we present unadjusted values along with the adjusted values used in


Table II. Mixture distribution parameter estimates obtained after the classification of subjects, for the Ann Arbor data set, using two different haemodynamic characterizations of the hyperkinetic state

Sub-distribution 1 (normokinetic)    Sub-distribution 2 (hyperkinetic)

Measure Prop Mean SD Skew Kurt Prop Mean SD Skew Kurt

CI and MBP model
1 Distribution: Prop 1.0
Adjusted CI 3.06 0.97
Adjusted MBP 87.05 0.92
Unadjusted CI 3.06 0.72
Unadjusted MBP 87.05 11.06
2 Sub-distributions: Prop 0.90
Adjusted CI 2% 0.65
Adjusted MBP 86.75 0.79
Unadjusted CI 2.92 0.50
Unadjusted MBP 85.64 9.93
(Unadjusted HR) 63.42 9.37

HR and MBP model
1 Distribution: Prop 1.00
Adjusted HR 64.91 0.98
Adjusted MBP 87.05 0.92
Unadjusted HR 64.91 11.01
Unadjusted MBP 87.05 11.06
2 Sub-distributions: Prop 0.94
Adjusted HR 64.78 0.82
Adjusted MBP 86.81 0.82
Unadjusted HR 63.06 9.30
Unadjusted MBP 86.04 10.05
(Unadjusted CI) 2.99 0.64

1.12t 0.45 t 2.91 t 0.44 t

0.14 0.03 0.23 * 014

0.70t 0.45 t 0.79 t 0.44 t

0.01 005 0.13 0.13

2.47 t 0434 t 3.05 t 0.51 *

0.10 - 0.53 * 4.48 - 0.26 87.81

0.36* 4.39 - 0.20 99.41

79.21

1.58t 0.84 t 1.84t 0.517

0.06 - 0.37 69.8 1 - 0.21 88.31 - 0.26 89.43 - 027 104.59

4.37

1.41 - 066 1.13 0.13

13.34 0.10 14.66

1.03 - 0.82*

1.01 0.36 1.34 - 0.57

11.49 0.50 13.88 - 0.31 089

1.10 - 0.02

1.76 0.0 1

1.23 0.39

- 0.26 0.90

* p < 0.05; † p < 0.01

Prop: proportion of observations in a sub-population; Mean: mean value; SD: standard deviation; Skew: 3rd-moment estimate of skewness; Kurt: 4th-moment estimate of kurtosis

the analysis as an aid in interpretation. As expected for a hyperkinetic borderline hypertensive subgroup, the 10 per cent of subjects clustering in the upper sub-distribution have not only higher blood pressures but also significantly higher cardiac indices than those in the lower sub-distribution. Indeed, as we have shown elsewhere, the hyperkinetic state, as defined by mixture analysis, occurs only rarely in normotensives, and we therefore expect the hyperkinetic circulatory state to be strongly associated with the presence of borderline hypertension.~~

We next applied mixture analysis to the same data set but examined the joint distribution of heart rate (HR) and mean blood pressure. Again, two sub-distributions fitted far better than one or three (Tables I and II, second sections). Subjects in the upper (hyperkinetic) sub-distribution had high heart rates and mean blood pressures, with a mean blood pressure similar to that of subjects identified as hyperkinetic by analysis of cardiac index and blood pressure. In addition, the mean cardiac index of those classified as hyperkinetic using heart rate and blood pressure was significantly higher (p < 0.001) than that of the normokinetic subgroup (Table II, values in parentheses), which lends further support for the


Table III. Specificity and sensitivity of the classification to the hyperkinetic sub-group based on the elevated heart rate and mean blood pressure model

                                 Classification based on the CI and MBP model
Classification based on the
HR and MBP model                 Normokinetic    Hyperkinetic    Totals
Normokinetic                     396             24              420
Hyperkinetic                     4               20              24
Totals                           400             44              444

Chi-square: 132.08 (p < 0.00001)    Specificity: 396/400 = 0.99    Sensitivity: 20/44 = 0.45

role of heart rate as a surrogate for cardiac index in the identification of the hyperkinetic subgroup.

The proportion of subjects classified as hyperkinetic by heart rate and mean blood pressure was lower (24/444, 5.4 per cent) than that identified by mixture analysis of cardiac index and blood pressure (44/444, 9.9 per cent). Table III shows the two-way classification based on the two pairs of variables subjected to mixture analysis. The two approaches agree in classifying 396/444 (89.2 per cent) of subjects as normokinetic and 20/444 (4.5 per cent) as hyperkinetic. Use of heart rate results in 4/444 (0.9 per cent) subjects classified as hyperkinetic who were normokinetic by cardiac index (false positives) and 24/444 (5.4 per cent) classified as normokinetic who were in fact hyperkinetic by cardiac index (false negatives). Using the cardiac index and blood pressure classification as the 'gold standard', the specificity of the heart rate and blood pressure classification is 99 per cent (396/400), but its sensitivity is only 45 per cent (20/44).
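The agreement measures can be recomputed directly from the Table III counts. This minimal Python sketch treats the cardiac index and mean blood pressure classification as the reference, as the text does; the function and key names are illustrative. The false-positive and false-negative figures quoted in the text are shares of all 444 subjects rather than the conditional rates that usually carry those names, so the code labels them as shares.

```python
def classification_summary(tn, fn, fp, tp):
    """Agreement of the HR-and-MBP classification with the CI-and-MBP reference."""
    n = tn + fn + fp + tp
    return {
        "sensitivity": tp / (tp + fn),       # CI-hyperkinetic subjects also flagged by HR
        "specificity": tn / (tn + fp),       # CI-normokinetic subjects also normokinetic by HR
        "false_positive_share": fp / n,      # share of all subjects, as reported in the text
        "false_negative_share": fn / n,
        "agreement": (tn + tp) / n,
    }


# Table III counts: rows = HR-and-MBP classification, columns = CI-and-MBP classification.
summary = classification_summary(tn=396, fn=24, fp=4, tp=20)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```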

5.2. The hyperkinetic subgroup in the general population

Tables I (lower sections) and IV present the results of the mixture analysis for the risk-factor and state-wide data sets. The results support the resolution of two subdistributions from the joint heart rate and mean blood pressure values, with similar results in both data sets. The proportions of subjects identified as hyperkinetic are similar (17 per cent and 24 per cent before classification; 10 per cent and 12 per cent after classification), and the mean values for each of the variables agree closely. Direct comparison of these subjects with those in the Ann Arbor data set may not be justified since, in the latter study, heart rate and mean blood pressure were measured during invasive intra-arterial monitoring. Nevertheless, we note that the mean heart rate and mean blood pressure values differed significantly (p < 0.0001) between the normokinetic and hyperkinetic subgroups in each of the Ann Arbor, risk factor, and state-wide data sets.

Since the technique of multivariate normal mixture analysis entails several crucial assumptions, we assessed these assumptions with respect to the data sets we analysed:

1. The assumption that we could resolve two multivariate normal distributions from the combined heart rate and mean blood pressure levels seems justified. Not only is there overwhelming evidence from the likelihood ratio tests, but also the mixed subdistributions derived from the analysis appear to be normally distributed as evidenced by the marginal


Table IV. Mixture distribution parameter estimates obtained after classification of each subject for the risk factor and state-wide data sets

Sub-distribution 1 (normokinetic)    Sub-distribution 2 (hyperkinetic)

Measure Prop Mean SD Skew Kurt Prop Mean SD Skew Kurt

Risk-factor data
1 Distribution: Prop 1.00
Adjusted HR 72.11 0.96
Adjusted MBP 94.55 0.88
Unadjusted HR 72.11 10.73
Unadjusted MBP 94.55 10.85
2 Sub-distributions: Prop 0.90
Adjusted HR 72.06 0.77
Adjusted MBP 96.06 0.72
Unadjusted HR 70.29 8.83
Unadjusted MBP 92.90 9.35

State-wide data
1 Distribution: Prop 1.00
Adjusted HR 73.83 0.99
Adjusted MBP 91.95 0.84
Unadjusted HR 73.83 11.45
Unadjusted MBP 91.95 12.04
2 Sub-distributions: Prop 0.88
Adjusted HR 73.72 0.83
Adjusted MBP 91.88 0.64
Unadjusted HR 72.98 9.70
Unadjusted MBP 89.45 9.58

0.68 ** 0.40** 0.62** 0.45 **

- 0.04 - 0 0 2 - 0.05

0.05

0.18 0.64 t 0.18t 0.71 t

- 0.07 - 0.177 - 0.05

0.0 1

0.87** 0.31 * 0.66** 072**

- 0.41 * - 022 - 0.27 - 0.22

0 7 4 t 144 t 0.71 t 1.48 t

- 0.327 - 0.28 t - 0.26t

0.33t

0.10 73.76 99.06 87.7 1

108.67

0.12 74.3 1 96.12 79.87

107.01

1.04 0.9 1

12.73 1250

1 6 5 1.21

18.91 16.38

- 0.43 * - 0.25 - 0.26 - 0.22

- 0.33* - 0.76f - 0.34* - 0.26

0.7 1 - 0.12

0.4 1 0.07

- 0.48 0.81 t

- 0.42 - 0.49

* p < 0.05; † p < 0.01

Prop: proportion of observations in a sub-population; Mean: mean value; SD: standard deviation; Skew: 3rd-moment estimate of skewness; Kurt: 4th-moment estimate of kurtosis

levels of skewness and kurtosis (presented with the mixture parameters in Tables II and IV) determined after classification.

2. The mixtures were clearly not artefacts of skewness in a single bivariate distribution. Using the test discussed in Schork and Schork," we rejected the hypothesis of a single skewed distribution in favour of a mixture of two distributions in both data sets (χ² = 33.5, p < 0.005 for the risk factor data set, and χ² = 154.6, p < 0.00001 for the state-wide data set).

3. We assumed markedly different relationships between heart rate and mean blood pressure within the normokinetic and hyperkinetic subgroups; therefore, throughout the study we assumed unequal variance-covariance structures. Table V presents the correlation coefficients between heart rate and mean blood pressure within each subgroup for each of the three data sets. The test described by Morrison28 for the equality of correlation coefficients (also given in Table V) suggests that marked differences do indeed exist. Not only does this substantiate our assumption of unequal covariance structures, but it is also interesting in its own right. In addition, when we ran an analysis that forced a common covariance structure among the subgroups, we found only a relatively small (< 5 per cent) proportion of 'non-normals' in each of the risk factor and state-wide data sets. We could not draw any intuitively appealing conclusions about the hyperkinetic state from these results, since these smaller subgroups had higher pressures but lower heart rates. Therefore, we did not pursue the common-covariance assumption or the results obtained under it.

4. The estimates of the mixture parameters for each data set appeared relatively stable, as evidenced by the standard deviation and bias of each parameter estimated from 200 bootstrap replications (not reported). The proportion parameters, p_k, had the largest standard deviations. Using these to construct 95 per cent confidence limits, we determined that the proportion parameters differed significantly from 0 and 1, which supports our belief that subpopulations exist.

Table V. Correlations between heart rate and mean blood pressure within the normokinetic and hyperkinetic sub-groups for each of the Ann Arbor, risk-factor, and state-wide data sets

                      Normokinetic        Hyperkinetic        Equality test
                      r       p-value     r       p-value     |d|     p-value
Ann Arbor
  Adjusted values     0.25    < 0.001     -0.59   0.0029      4.69    < 0.0001
  Unadjusted values   0.31    0.0004      -0.63   0.0012      5.07    < 0.0001
Risk-factor
  Adjusted values     -0.03   0.32        -0.49   < 0.0001    3.87    < 0.0005
  Unadjusted values   -0.11   0.75        -0.53   < 0.0001    5.76    < 0.0001
State-wide
  Adjusted values     0.10    < 0.0001    -0.21   < 0.0002    6.13    < 0.0001
  Unadjusted values   0.10    < 0.0001    -0.19   0.0004      5.88    < 0.0001

|d|: statistic for testing the equality of two correlation coefficients given in Morrison.28
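Two of the assumption checks above, the moment estimates of skewness and kurtosis used to judge normality after classification (item 1) and the comparison of within-group correlations (item 3), can be sketched in a few lines of Python. This is a minimal sketch under two assumptions: that the tabled kurtosis is excess kurtosis (m4/m2² − 3), since the tabled values centre on zero, and that Morrison's |d| statistic takes the form of the standard Fisher z-transformation test for two independent correlations. The subgroup sizes in the usage line are placeholders, so its output is not expected to reproduce Table V. Function names are illustrative.

```python
import math


def moment_skew_kurt(values):
    """3rd- and 4th-moment estimates: skewness m3/m2**1.5, excess kurtosis m4/m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def compare_correlations(r1, n1, r2, n2):
    """Fisher z test of H0: rho1 == rho2 for two independent samples; returns (|d|, two-sided p)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    d = abs(z1 - z2) / se
    p = math.erfc(d / math.sqrt(2.0))  # two-sided standard normal tail probability
    return d, p


# Placeholder subgroup sizes (420 and 24) paired with the Ann Arbor adjusted correlations.
d, p = compare_correlations(0.25, 420, -0.59, 24)
print(f"|d| = {d:.2f}, two-sided p = {p:.2g}")
```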

6. DISCUSSION

The results of our mixture analysis demonstrate that the hyperkinetic borderline hypertensive state is indeed a distinct pathophysiologic entity, as suggested in earlier case-control studies. Since the study population that formed the basis for these earlier conclusions was not randomly selected, and the invasive nature of the haemodynamic profiling may have affected cardiovascular function, we examined a potential non-invasive surrogate for cardiac index: heart rate. We find that heart rate is a highly specific marker for the hyperkinetic state, although one with limited sensitivity. Low sensitivity would not seem to pose an important impediment to the use of mixture analysis in population-based studies of hyperkinetic borderline hypertension, as the patients identified by the heart rate and mean blood pressure criterion seem to constitute a representative subpopulation of those classified as hyperkinetic by cardiac index and mean blood pressure measurements (see Table III). For epidemiologic studies or hypothesis testing, the failure of mixture analysis to identify approximately half of the hyperkinetics is more than offset by the ease with which one can collect mean blood pressure and heart rate data and by the high specificity of classification based on such measurements.


In studies of borderline hypertension, where as many as 25 per cent of patients may have a hyperkinetic circulatory state, the positive predictive value of classification by mixture analysis based on heart rate and mean blood pressure would be 94 per cent, given the sensitivity of 0.45 and specificity of 0.99 in Table III. In addition, because one can obtain such measures quickly, repeatedly and non-invasively in naturalistic settings, concerns about the confounding effects of invasive techniques and the laboratory environment are avoided.
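The 94 per cent figure can be reproduced as a positive predictive value via Bayes' theorem, using the rounded sensitivity (0.45) and specificity (0.99) from Table III and a prevalence of 25 per cent. This is a minimal sketch; the function name is illustrative, and the prevalence argument can be varied to see how the value changes in populations where the hyperkinetic state is rarer.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(hyperkinetic by the CI-and-MBP reference | flagged by the HR-and-MBP model)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Rounded Table III values; 25 per cent prevalence as in the text.
ppv = positive_predictive_value(sensitivity=0.45, specificity=0.99, prevalence=0.25)
print(f"PPV = {ppv:.2f}")  # 0.1125 / (0.1125 + 0.0075), about 0.94
```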

The results from the two other data sets bear out the utility of mixture analysis for identifying the hyperkinetic state in the general population. Of great interest is the close agreement in the proportion of hyperkinetic subjects within the two study populations after classification of each subject. Since the setting for one study was an outpatient clinic (risk factor) while the other involved data collected in the home (state-wide), the results suggest that classification by mixture analysis is relatively unaffected by setting. The ability to identify hyperkinetic individuals in such data sets will allow the examination of other large existing data sets with longitudinal observations, as well as retrospective and prospective studies of the natural history of hyperkinetic borderline hypertension.

One can only speculate about the mechanism responsible for the megaphenic factor that affects the joint distribution of blood pressure and cardiac index or heart rate. It may be the effect of a major gene or gene constellation. Alternatively, the hyperkinetic state could result from a unique social or psycho-physiological disorder.~~.~~ At present, we feel that the analysis described above strongly supports the contention that the hyperkinetic state is a genuine disease entity, and that its etiology and its importance for the development of future hypertension demand further research.

REFERENCES

1. Pickering, G. W. High Blood Pressure, Grune and Stratton, Inc., New York, 1968.
2. Swales, J. D. (ed.) Platt vs. Pickering: An Episode in Recent Medical History, The Keynes Press, Cambridge, 1985.
3. Hartigan, J. A. Clustering Algorithms, Wiley, New York, 1975.
4. Anderberg, M. R. Cluster Analysis for Applications, Academic Press, New York, 1973.
5. Morton, N. E. 'The detection of major genes under additive continuous variation', American Journal of Human Genetics, 19, 23-34 (1967).
6. Titterington, D. M., Smith, A. F. M. and Makov, U. E. Statistical Analysis of Finite Mixture Distributions, Wiley, Great Britain, 1985.
7. Everitt, B. S. 'Maximum likelihood estimation of the parameters in a mixture of two univariate normal distributions; a comparison of different algorithms', The Statistician, 33, 205-215 (1984).
8. Peters, B. C. and Walker, H. F. 'An iterative procedure for obtaining maximum likelihood estimates of the parameters for a mixture of normal distributions', SIAM Journal of Applied Mathematics, 35, 362-378 (1978).
9. Dempster, A. P., Laird, N. M. and Rubin, D. B. 'Maximum likelihood from incomplete data via the EM algorithm', Journal of the Royal Statistical Society, Series B, 39, 1-38 (1977).
10. Schork, N. J. and Schork, M. A. 'Skewness and mixtures of normal distributions', Communications in Statistics, Theory and Methods, 17, 3951-3969 (1988).
11. Hartigan, J. A. 'A failure of likelihood asymptotics for normal mixtures', in Le Cam, L. M. and Olshen, R. A. (eds.) Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Volume II, Wadsworth, Monterey, 1985, pp. 807-810.
12. Wolfe, J. H. 'A Monte Carlo study of the sampling distribution of the likelihood ratio for mixtures of multinormal distributions', Navy Personnel and Training Research Laboratory, Technical Bulletin, STB 72-2, San Diego, 1971.
13. McLachlan, G. J. 'On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture', Applied Statistics, 36, 318-324 (1987).
14. Schork, N. J. and Schork, M. A. 'Bootstrapping likelihood ratios in tests of separate families of hypotheses: the SOS criterion', submitted for publication, 1989.


15. Schork, N. J. and Schork, M. A. 'Testing separate families of segregation hypotheses: bootstrap methods', American Journal of Human Genetics, 45, 803-813 (1989).
16. Johnson, R. A. and Wichern, D. W. Applied Multivariate Statistical Analysis, Prentice-Hall, New York, 1982.
17. Julius, S. and Schork, M. A. 'Borderline hypertension: a critical review', Journal of Chronic Diseases, 23, 723-754 (1971).
18. Julius, S. and Conway, J. 'Hemodynamic studies in patients with borderline blood pressure elevation', Circulation, 38, 282-288 (1968).
19. Julius, S., Pascual, A. V., Sannerstedt, R. and Mitchell, C. 'Relationship between cardiac output and peripheral resistance in borderline hypertension', Circulation, 43, 382-390 (1971).
20. Julius, S., Pascual, A. V. and London, R. 'Role of parasympathetic inhibition in the hyperkinetic type of borderline hypertension', Circulation, 44, 413-418 (1971).
21. Julius, S. and Hansson, L. 'Hemodynamics of prehypertension and hypertension', Verhandlungen der Deutschen Gesellschaft für innere Medizin, 80, 49-58 (1974).
22. Julius, S., Esler, M. D., Randall, O. S. and Ellis, C. N. 'Neurogenic maintenance of peripheral resistance in borderline hypertension', Acta Physiologica Latino Americana, 24, 425-431 (1974).
23. Lund-Johansen, P. 'Haemodynamic observations in mild hypertension', in Gross, F. and Strasser, T. (eds.) Mild Hypertension: Natural History and Management, Pitman Medical, Bath, England, 1979, pp. 102-115.
24. Julius, S., Weder, A. B. and Egan, B. M. 'Pathophysiology of early hypertension: implication for epidemiologic research', in Gross, F. and Strasser, T. (eds.) Recent Advances in Mild Hypertension, Raven Press, New York, 1983, pp. 219-236.
25. Julius, S., Schork, N. J. and Schork, M. A. 'Sympathetic hyperactivity in early stages of hypertension: the Ann Arbor data set', Journal of Cardiovascular Pharmacology, 12S, S121-S129 (1988).
26. Cottington, E. M., Brock, B. M., House, J. S. and Hawthorn, V. M. 'Psychosocial factors and blood pressure in the Michigan Statewide Blood Pressure Survey', American Journal of Epidemiology, 121, 515-529 (1985).
27. Efron, B. The Jackknife, the Bootstrap and Other Resampling Plans, SIAM, Philadelphia, 1982.
28. Morrison, D. F. Multivariate Statistical Methods, McGraw-Hill, United States, 1976.
29. Julius, S. 'The psychophysiology of borderline hypertension', in Weiner, H., Hofer, M. A. and Stunkard, A. J. (eds.) Brain, Behavior, and Bodily Disease, Raven Press, New York, 1977, pp. 293-303.
30. Esler, M., Julius, S., Zweifler, A., Randall, O., Harburg, E., Gardiner, H. and DeQuattro, V. 'Mild high-renin essential hypertension: neurogenic human hypertension?', New England Journal of Medicine, 296, 405-411 (1977).