
Genetic Epidemiology 10:587-592 (1993)

Use of the Regressive Models in Linkage Analysis of Quantitative Traits

Florence Demenais and Mark Lathrop

Unité de Recherche de Génétique des Maladies Humaines (INSERM U. 358), Centre d’Etude du Polymorphisme Humain, Paris, France

Use of the regressive models to account for residual familial correlations in linkage analysis of complex quantitative traits can increase the power to detect linkage. This is especially observed when the effect of the gene to be mapped is small or when the residual correlations are substantial. © 1993 Wiley-Liss, Inc.

Key words: multifactorial traits, lod score method, familial correlations

INTRODUCTION

With the advances made in mapping the human genome, linkage analysis has become an increasingly efficient tool to identify genes involved in complex diseases and quantitative traits associated with these diseases. Examples are linkage studies of lipid levels associated with cardiovascular diseases [Leppert et al., 1986]. The power of the lod score method to detect linkage between a major locus for a quantitative trait and a marker locus has been investigated in various situations where a single gene accounted for the familial transmission of the trait [Demenais et al., 1988; Boehnke, 1990]. Boehnke [1990] also studied the effect of a polygenic background on the sample size required to detect linkage when a single gene model is assumed for the trait, as is classically done in linkage analysis.

The regressive models, introduced by Bonney [1984], provide a general and computationally practical method to account for a major gene and various patterns of familial covariation of unspecified origin (genetic and/or environmental) as well as measured covariates. This approach has been extended to linked marker loci [Bonney et al., 1988]. The goal of the present paper is to investigate, through computer simulations, the sensitivity of the linkage test to the presence of multiple sources of familial covariation for a quantitative trait, using the regressive approach.

Address reprint requests to Dr. Florence Demenais, INSERM U.358, 27 rue Juliette Dodu, 75010, Paris, France.

© 1993 Wiley-Liss, Inc.


METHODS

Monte-Carlo methods were used to simulate a quantitative trait and a linked marker in eight-member nuclear families (two parents and six children). The quantitative data were generated under the regressive models including a major gene and four different patterns of residual correlations.

The major gene was assumed to be autosomal and diallelic, either dominant or recessive. When the gene was dominant, the frequency of the allele (A) responsible for high quantitative measures was set at 0.01 or 0.10. We let t be the ratio of the difference between the means of the high and low genotypic classes to the standard deviation of the distribution of the trait conditional on major genotypes. The displacement, t, was set at 1 or 2 for each gene frequency. When the gene was recessive, the allele A frequency was set at 0.10 or 0.30 and the displacement, t, was 1.5 or 2. Thus, the proportion of the total variance due to the gene varied from 2% to 38% when it was dominant and from 2% to 25% when it was recessive.

Conditioned on major genotypes, the phenotypes of all members of a nuclear family were assumed to follow a multivariate normal distribution with correlation structure as specified by the regressive models. In these models, the phenotype of each individual is regressed upon the phenotypes of preceding relatives, and the regression coefficients are expressed in terms of phenotypic correlations, without introducing a particular scheme of causal relationship [Bonney, 1984]. The polygenic model and its variants have been shown to correspond to particular patterns of the regressive models in nuclear families [Demenais and Bonney, 1989]. For each generated major locus model, four different patterns of residual correlations were considered, each pattern being characterized by the relationship between the parent-offspring (ρ_PO) and sib-sib (ρ_SS) residual correlations.
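The proportions of variance quoted for the major gene can be checked with a short calculation. The sketch below is illustrative and is not part of the original analysis: it assumes Hardy-Weinberg genotype proportions and a within-genotype variance of 1 (neither is stated explicitly in the text), and the function name `variance_fraction` is hypothetical. The fourth recessive model is taken as q = 0.30, t = 2, the remaining combination of the parameter values listed above.

```python
def variance_fraction(q: float, t: float, dominant: bool) -> float:
    """Fraction of total trait variance attributable to a diallelic major gene.

    Assumptions (not stated in the paper): Hardy-Weinberg proportions and a
    within-genotype variance of 1, so that the two genotypic classes (high
    and low) have means that differ by t.
    """
    # Frequency of the "high" class: carriers of A if dominant, AA if recessive.
    p_high = 1.0 - (1.0 - q) ** 2 if dominant else q ** 2
    # Variance of a two-point distribution of class means separated by t.
    between = p_high * (1.0 - p_high) * t ** 2
    return between / (between + 1.0)


# (q, t, dominant) for the eight generating models. REC4 is assumed to be
# q = 0.30, t = 2 (the combination implied by the parameter values in Methods).
MODELS = {
    "DOM1": (0.01, 1.0, True),
    "DOM2": (0.01, 2.0, True),
    "DOM3": (0.10, 1.0, True),
    "DOM4": (0.10, 2.0, True),
    "REC1": (0.10, 1.5, False),
    "REC2": (0.10, 2.0, False),
    "REC3": (0.30, 1.5, False),
    "REC4": (0.30, 2.0, False),
}

for name, (q, t, dominant) in MODELS.items():
    pct = 100.0 * variance_fraction(q, t, dominant)
    print(f"{name}: {pct:.1f}% of total variance")
```

Rounded to the nearest percent, these values agree with the proportions reported for the eight models: 2%, 7%, 13%, and 38% for the dominant gene, and 2%, 4%, 16%, and 25% for the recessive gene.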
The class A pattern specifies that the sibs are correlated only because of common parentage, imposing the constraint ρ_SS = 2(ρ_PO)²; ρ_PO was set at 0.25 (A1 pattern) or 0.45 (A2 pattern). The pure polygenic pattern (P pattern) implies that the sib-sib correlation is equal to the parent-offspring correlation; ρ_PO and ρ_SS were chosen to be each equal to 0.25 (P1) or 0.45 (P2). The S pattern is characterized by the presence of a residual sib-sib correlation only, with the parent-offspring correlation being zero; ρ_SS was set equal to 0.3 (S1) or 0.5 (S2). The general class D pattern corresponds to the sibling correlation being different from the parent-offspring correlation; ρ_PO and ρ_SS were set at 0.3 and 0.5, respectively. In all cases, we assumed no correlation between spouses and equality of variances conditional on major genotypes.

The linked marker was taken to be fully informative and completely linked to the trait locus (recombination fraction θ0 = 0.0). One hundred replicates of family samples were generated for each combination of parameter values. The sample sizes were chosen to yield at least 80% power of detecting linkage under the true generating model. Families were selected to include at least two or three individuals above a given threshold, depending on whether the frequency of allele A was high or low. The threshold corresponded to the upper 5% tail of the trait distribution in the population.
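To make the class A constraint concrete, the following sketch computes the sib-sib correlation it implies and the correlation that remains between two sibs once both parents are conditioned on. It is an illustration only, under the assumptions stated in this section (multivariate normality, equal variances, no spouse correlation); the function names are hypothetical and the partial-correlation formula is the standard one for that setting, not an expression taken from the paper.

```python
import math


def class_a_sib_corr(rho_po: float) -> float:
    """Sib-sib correlation implied by the class A constraint rho_ss = 2 * rho_po**2."""
    return 2.0 * rho_po ** 2


def sib_partial_corr(rho_po: float, rho_ss: float) -> float:
    """Correlation between two sibs conditional on both parents.

    Assumes unit variances and uncorrelated spouses, so the parents jointly
    explain 2 * rho_po**2 of each sib's variance and of the sib-sib covariance.
    """
    explained = 2.0 * rho_po ** 2
    if explained >= 1.0:
        raise ValueError("rho_po too large for uncorrelated parents")
    return (rho_ss - explained) / (1.0 - explained)


def is_class_a(rho_po: float, rho_ss: float) -> bool:
    """True when (rho_po, rho_ss) satisfies the class A constraint."""
    return math.isclose(rho_ss, class_a_sib_corr(rho_po), abs_tol=1e-12)


# (rho_po, rho_ss) for the seven simulated residual correlation patterns.
PATTERNS = {
    "A1": (0.25, 0.125), "A2": (0.45, 0.405),
    "P1": (0.25, 0.25), "P2": (0.45, 0.45),
    "S1": (0.0, 0.30), "S2": (0.0, 0.50),
    "D": (0.30, 0.50),
}

for name, (rho_po, rho_ss) in PATTERNS.items():
    partial = sib_partial_corr(rho_po, rho_ss)
    print(f"{name}: class A = {is_class_a(rho_po, rho_ss)}, "
          f"sib correlation given parents = {abs(partial):.3f}")
```

For A1 and A2 the conditional sib correlation vanishes, which is what "correlated only because of common parentage" means; for the S patterns it equals ρ_SS, since the parents carry no information about the sibs' residuals.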

To assess whether linkage detection of a major locus is affected by ignoring residual familial correlations, data were analyzed under each generating model including residual correlations (strategy 1) and under a model where these correlations were set to zero and the major locus parameters fixed at their generating values (strategy 2). The maximum lod score (Z) and the corresponding estimate of the recombination fraction, θ̂, were determined for each replicate under each strategy of analysis, and each quantity, Z or θ̂, was then averaged over the 100 replicates. Analysis strategies (1) and (2) were compared in terms of the ratio of the mean maximum lod scores (ZR = Z2/Z1) and the ratio of the biases in the mean θ estimates (BR = B2/B1, where B1 = θ̂1 − θ0, B2 = θ̂2 − θ0, the subscripts 1 and 2 refer to the analysis strategy, and θ0 = 0.0 is the true parameter value). All computations were done with the computer program REGRESS, a version of the LINKAGE program [Lathrop et al., 1984] incorporating the regressive approach.
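The two summary statistics can be written out as a small helper. This is a minimal sketch of the definitions of ZR and BR given above, not code from REGRESS; the function name and the replicate values in the usage example are hypothetical and chosen only to show the arithmetic.

```python
from statistics import fmean


def lod_and_bias_ratios(lods_1, thetas_1, lods_2, thetas_2, theta_true=0.0):
    """Return (ZR, BR) from per-replicate results of the two analysis strategies.

    lods_i   : maximum lod score of each replicate under strategy i
    thetas_i : recombination fraction estimate of each replicate under strategy i
    Strategy 1 includes the residual correlations; strategy 2 sets them to zero.
    """
    z1, z2 = fmean(lods_1), fmean(lods_2)
    b1 = fmean(thetas_1) - theta_true  # bias of the mean estimate, strategy 1
    b2 = fmean(thetas_2) - theta_true  # bias of the mean estimate, strategy 2
    zr = z2 / z1
    # BR is undefined when strategy 1 is unbiased; report NaN in that case.
    br = b2 / b1 if b1 != 0.0 else float("nan")
    return zr, br


# Hypothetical replicate results, for illustration only.
zr, br = lod_and_bias_ratios(
    lods_1=[4.0, 6.0], thetas_1=[0.00, 0.02],
    lods_2=[2.0, 3.0], thetas_2=[0.05, 0.15],
)
print(f"ZR = {zr:.2f}, BR = {br:.1f}")
```

A ZR below 1 indicates that ignoring the residual correlations lowers the mean maximum lod score, and a BR above 1 indicates that it inflates the bias in the recombination fraction estimate.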

RESULTS

The outcomes of linkage analyses of a quantitative trait controlled by a major gene and other sources of familial covariation are presented in Tables I (dominant gene) and II (recessive gene). Whether the gene is dominant or recessive, ignoring the residual correlations affects the detection of linkage. This is especially observed when the effect of the gene is small for all correlation values, or with a relatively large gene effect when the correlations are substantial.

When the gene is dominant (Table I), setting the residual correlations to zero seriously hampers the detection of linkage when the variance due to the gene is only 2% of the total variance. The ratio of the mean maximum lod scores, ZR, obtained by ignoring versus taking into account the residual correlations, decreases from 0.70 to 0.21 as the residual parent-offspring and sib-sib correlations increase. Interestingly, the decrease of ZR appears to be related to the values, high or low, of the residual correlations, especially ρ_SS, but not to a particular correlation pattern, characterized by whether ρ_SS is lower than, equal to, or greater than ρ_PO. When the effect of the gene is larger, accounting for 7% up to 38% of the total variance, ignoring the residual correlations leads to a substantial decrease in the mean maximum lod scores (ZR being in the range 0.40-0.70) if the residual correlations are moderate to high (ρ_SS greater than 0.30 with ρ_PO varying from 0.0 to 0.45). The decrease of ZR appears to be related more to the distance between the means of the two distributions for the trait than to the gene frequency. Analyzing the data by wrongly assuming a single gene model leads to an overestimation of the recombination fraction when the variance due to the gene is less than 10%. However, there is only a small bias in θ̂ for genes with larger effect, independent of the correlations.

Similar trends are observed when the gene is recessive (Table II). The mean maximum lod scores are greatly reduced when the residual correlations are ignored and the variance due to the gene is 4% or less of the total variance. For larger gene effects, the detection of linkage is also affected when the residual correlations are relatively important. Here, the lod score decrease depends more on the gene frequency than on the distance between the means of the two distributions. Again, the decrease in ZR values is related not to a particular correlation pattern but to an increase in correlations. As before, the recombination fraction is severely biased only when the gene has a small effect.

TABLE I. Ratios of the Mean Maximum Lod Scores (ZR) and Ratios of the Biases in the Mean Recombination Fraction Estimates (BR) Obtained in Linkage Analyses of a Quantitative Trait with a Dominant Gene Effect When Ignoring Versus Taking into Account the Presence of Residual Correlations

                      Major gene model
Residual              DOM1            DOM2            DOM3            DOM4
correlation           q=0.01, t=1ᵃ    q=0.01, t=2     q=0.1, t=1      q=0.1, t=2
pattern               (2%; 400)       (7%; 30)        (13%; 80)       (38%; 15)
(ρ_PO, ρ_SS)ᵇ         ZR      BR      ZR      BR      ZR      BR      ZR      BR
A1 (0.25, 0.125)      0.70   9.0      0.82   4.0      0.81   1.5      0.88   1.0
P1 (0.25, 0.25)       0.53  12.0      0.78   6.0      0.74   1.5      0.86   1.0
S1 (0.0, 0.30)        0.52   8.0      0.84   3.0      0.76   1.0      0.89   0.5
A2 (0.45, 0.405)      0.25  11.3      0.55  10.0      0.41   2.0      0.59   0.3
P2 (0.45, 0.45)       0.24  11.3      0.56  10.0      0.35   3.0      0.52   0.3
S2 (0.0, 0.50)        0.23   8.5      0.61   8.0      0.48   2.0      0.76   0.2
D (0.30, 0.50)        0.21  18.0      0.60   9.0      0.51   1.0      0.70   0.1

ᵃ For each dominant model, the gene frequency, q, and displacement, t, between the two homozygous means are given. The proportion of total variance due to the gene and the sample size (number of eight-member nuclear families) used are shown in parentheses.
ᵇ ρ_PO and ρ_SS denote the residual parent-offspring and sib-sib correlations, respectively.

TABLE II. Ratios of the Mean Maximum Lod Scores (ZR) and Ratios of the Biases in the Mean Recombination Fraction Estimates (BR) Obtained in Linkage Analyses of a Quantitative Trait with a Recessive Gene Effect When Ignoring Versus Taking into Account the Presence of Residual Correlations

                      Major gene model
Residual              REC1            REC2            REC3            REC4
correlation           q=0.1, t=1.5ᵃ   q=0.1, t=2      q=0.3, t=1.5    q=0.3, t=2
pattern               (2%; 250)       (4%; 80)        (16%; 40)       (25%; 15)
(ρ_PO, ρ_SS)ᵇ         ZR      BR      ZR      BR      ZR      BR      ZR      BR
A1 (0.25, 0.125)      0.56  14.0      0.69   7.0      0.82   2.0      0.89   1.0
P1 (0.25, 0.25)       0.49   7.0      0.58  13.0      0.81   1.5      0.88   1.0
S1 (0.0, 0.30)        0.43  12.5      0.54  16.0      0.79   2.0      0.89   2.0
A2 (0.45, 0.405)      0.20  26.0      0.37  16.0      0.56   1.0      0.65   1.0
P2 (0.45, 0.45)       0.22  14.0      0.29  19.0      0.51   1.0      0.64   0.3
S2 (0.0, 0.50)        0.21  11.3      0.33  12.0      0.61   2.5      0.75   2.0
D (0.30, 0.50)        0.25  15.0      0.35  22.0      0.61   1.0      0.71   1.0

ᵃ For each recessive model, the gene frequency, q, and displacement, t, between the two homozygous means are given. The proportion of total variance due to the gene and the sample size (number of eight-member nuclear families) used are shown in parentheses.
ᵇ ρ_PO and ρ_SS denote the residual parent-offspring and sib-sib correlations, respectively.

DISCUSSION

Our results show that ignoring residual sources of familial covariation for a quantitative trait can seriously hamper the detection of linkage. Although this is observed mainly for genes with a small effect on the trait, lod scores are also substantially decreased when the gene accounts for more than 10% of the total variance and the residual correlations are relatively important. It is of interest that the decrease of the maximum lod score depends more on the values, low or high, of the residual correlations, especially the sib-sib correlation, than on a particular correlation pattern (Class A, Polygenic, Sibling correlation alone, Class D). Thus, misspecifying the correlation pattern for a given sib-sib correlation may not much reduce the power to detect linkage. The detection of linkage is similarly affected for a dominant and for a recessive mode of inheritance at the major locus. However, the decrease in lod scores as a function of the gene effect appears to be more related to the distance between the means of the two distributions for the trait when the gene is dominant and to the gene frequency when the gene is recessive. This may be partly explained by the relative proportion of informative matings under each mode of inheritance, using our selection scheme. A displacement between means of less than 1.5 was not considered for a recessive gene since the sample size required to detect linkage with 80% power was at least 1,000 eight-member nuclear families when the gene frequency was low. Moreover, the recombination fraction is notably biased only when the effect of the gene is small. However, we noted that θ̂ was slightly biased (bias of 0.01-0.03) when the residual correlations were correctly specified or, for comparison, when the trait was due to a single gene.

All our results apply when the models and true parameter values are known. Taking into account but misspecifying the residual correlations may also affect the detection of linkage. We considered the extreme situation where the trait is controlled by a single gene and residual correlations are wrongly included in the analysis, using the same sets of parameters as before. The decrease in the maximum lod score is at most 0.70 in all instances but can reach 0.40-0.50 when the gene has a small effect and high residual correlations are spuriously added. The recombination fraction is biased when high residual correlations are wrongly included in the analysis with a small or large gene effect. This can be compared to the overestimation of the recombination fraction due to overestimating the penetrances in the case of discrete traits [Clerget-Darpoux et al., 1986]. Note that our primary concern is to detect linkage before getting an accurate estimate of the recombination fraction.

The linkage test was also found to be sensitive to ignoring residual covariation for discrete traits [Martinez et al., 1991] but to a lesser extent than observed here. Their study also showed that, when there is no linkage, ignoring or misspecifying residual family dependence does not lead to a false conclusion of linkage.

The present simulations were conducted in nuclear families with sibships of size six. Neglecting correlations in smaller sibships may lead to a smaller effect on linkage detection.

In conclusion, when quantitative traits are controlled by multiple factors (genetic and/or environmental), identification of the gene(s) involved by the lod score method requires more complex modelling. The present study focused on the effect of misspecifying residual correlations on linkage detection when the major gene effect is known. The combined segregation and linkage analysis strategy, estimating jointly the major gene effect and residual correlations together with the recombination fraction to detect a marker-linked-gene, requires further investigation.

ACKNOWLEDGMENTS

This work was supported by INSERM and the French Ministry of Research.

REFERENCES

Boehnke M (1990): Sample size guidelines for linkage analysis of a dominant locus for a quantitative trait by the method of lod scores. Am J Hum Genet 47:218-227.


Bonney GE (1984): On the statistical determination of major gene mechanisms in continuous human traits: Regressive models. Am J Med Genet 18:731-749.

Bonney GE, Lathrop GM, Lalouel JM (1988): Combined linkage and segregation analysis using regressive models. Am J Hum Genet 43:29-37.

Clerget-Darpoux F, Bonaiti-Pellie C, Hochez J (1986): Effects of misspecifying genetic parameters in lod score analysis. Biometrics 42:393-399.

Demenais FM, Bonney GE (1989): Equivalence of the mixed and regressive models for genetic analysis. I. Continuous traits. Genet Epidemiol 6:597-617.

Demenais F, Lathrop GM, Lalouel JM (1988): Detection of linkage between a quantitative trait and a marker locus by the lod score method: sample size and sampling considerations. Ann Hum Genet 52:237-246.

Lathrop GM, Lalouel JM, Julier C, Ott J (1984): Strategies for multilocus analysis in humans. Proc Natl Acad Sci USA 81:3443-3446.

Leppert MF, Hasstedt SJ, Holm T, O’Connell P, Wu L, Ash O, Williams RR, White R (1986): A DNA probe for LDL receptor gene is tightly linked to hypercholesterolemia in a pedigree with early coronary disease. Am J Hum Genet 39:300-306.

Martinez M, Demenais F, Bonney GE (1991): Use of the regressive logistic models in linkage analysis of complex disorders. Proceedings of the 8th International Congress of Human Genetics. Am J Hum Genet, Suppl. 49:350.