5.3.3 Estimating heritability using the classical twin design

Upload: naida-oneil

Post on 01-Jan-2016

DESCRIPTION

5.3.3 Estimating heritability using the classical twin design. Heritability: a quantity indicating how much of the phenotypic variation in a continuously varying quantitative trait is transmitted to the next generation. It measures the importance of genetics in relation to other factors in causing the variability of a trait in a population.

TRANSCRIPT

  • 5.3.3 Estimating heritability using the classical twin design
    Heritability measures the importance of genetics in relation to other factors in causing the variability of a trait in a population.
    Broad heritability (coefficient of genetic determination): the proportion of total phenotypic variance accounted for by all genetic components (additive, dominance and epistasis).
    Narrow heritability (or just heritability): the proportion of phenotypic variance accounted for by the additive genetic component.

  • Analysis of variance (Jinks and Fulker, 1970; Eaves, 1977)
    The classical twin method partitions the trait variance into:
    1. Genetic variance (additive and dominance components)
    2. Environmental variance (shared and non-shared components)
    It assumes that MZ and DZ twins do not differ in total environmental variance, or in the proportion of environmental variance that is common to members of the same twin-pair (the equal environment assumption).
    V_P = V_A + V_D + V_C + V_E
    V_P: total phenotypic variance
    V_A: additive variance
    V_D: dominance variance
    V_C: common environmental variance
    V_E: the remaining, non-shared environmental variance

  • The correlation between relatives
    Example (father, son)
    A: a locus
    A1 and A2: the two alleles at A
    m: measurement of the character
    p, q: frequencies of A1 and A2
    P(father = A1A1 and son = A1A1) = P(father = A1A1) × P(son receives A1 from mother), which is p^2 × p under random mating.
    Source: Mathematical Population Genetics

  • The correlation between relatives (Principles of Population Genetics)
    The coefficients r and u are determined from the coefficients of coancestry F_xy.
    F_xy of two individuals x and y is the inbreeding coefficient of a hypothetical offspring of x and y.
    If individuals A and B are the parents of x, and C and D are the parents of y, then
    r = 2 F_xy
    u = F_AC F_BD + F_AD F_BC
    Genetic covariance: Cotterman (1940)
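The two coefficients can be worked out for full sibs (and hence DZ twins) with a few lines of Python. This is only a sketch: the averaging rule F_xy = (F_AC + F_AD + F_BC + F_BD)/4 is the standard coancestry recursion rather than something stated on the slide, the function name is hypothetical, and the parents are assumed to be non-inbred and unrelated.

```python
from fractions import Fraction


def kinship_coefficients(F_AC, F_AD, F_BC, F_BD):
    """Return (r, u) for x (parents A and B) and y (parents C and D)."""
    # Assumption: the coancestry of x and y is the average of the four
    # parental coancestries (standard recursion, not shown on the slide).
    F_xy = (F_AC + F_AD + F_BC + F_BD) / 4
    r = 2 * F_xy
    u = F_AC * F_BD + F_AD * F_BC
    return r, u


half = Fraction(1, 2)  # coancestry of a non-inbred individual with itself
zero = Fraction(0)     # coancestry of two unrelated parents

# Full sibs: C is the same person as A, and D is the same person as B.
r_sib, u_sib = kinship_coefficients(F_AC=half, F_AD=zero, F_BC=zero, F_BD=half)
print(r_sib, u_sib)  # 1/2 1/4
```

With r as the weight on V_A and u as the weight on V_D, these values give the V_A/2 + V_D/4 genetic covariance used for DZ twins in the later slides.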

  • Twin study
    X is determined by two underlying variables, B and W, where
    B is perfectly correlated between members of the same twin-pair but uncorrelated between members of different twin-pairs, and
    W is uncorrelated between any two individuals.
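Why this decomposition leads to the intraclass correlation can be written out in a short derivation. It is a sketch that assumes X is the sum of B and W, that B and W are independent, and that V_B and V_W (symbols introduced here) denote their variances.

```latex
\begin{aligned}
X_1 &= B + W_1, \qquad X_2 = B + W_2 \\
\operatorname{Var}(X_i) &= V_B + V_W, \qquad \operatorname{Cov}(X_1, X_2) = V_B \\
r &= \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_i)} = \frac{V_B}{V_B + V_W}
\end{aligned}
```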

  • Analysis of variance
    Relationships between these variance components and the intraclass correlations for MZ and DZ twins
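The formulas on this slide were not preserved in the transcript. Judging from the regression coefficients given on the later slides for MZ and DZ twins, the relationships are presumably the following (with V_P = V_A + V_D + V_C + V_E):

```latex
\begin{aligned}
r_{MZ} &= \frac{V_A + V_D + V_C}{V_P} \\
r_{DZ} &= \frac{\tfrac{1}{2} V_A + \tfrac{1}{4} V_D + V_C}{V_P}
\end{aligned}
```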

  • Analysis of variance
    Expected mean squares from one-way ANOVA of MZ and DZ twins
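The table of expected mean squares is missing from the transcript. As a rough sketch, assuming the usual results for classes of size two, E(MSB) = V_W + 2 V_B and E(MSW) = V_W, the intraclass correlation can be estimated as (MSB - MSW)/(MSB + MSW). The function name and the toy scores below are hypothetical.

```python
def intraclass_correlation(pairs):
    """Estimate the intraclass correlation from one-way ANOVA on twin pairs."""
    n = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    # Between-pair sum of squares (two members per pair, n - 1 df).
    ss_between = sum(2 * ((a + b) / 2 - grand_mean) ** 2 for a, b in pairs)
    # Within-pair sum of squares (n df); for a pair it reduces to (a - b)^2 / 2.
    ss_within = sum((a - b) ** 2 / 2 for a, b in pairs)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / n
    # Assumed expectations: E(MSB) = V_W + 2 V_B and E(MSW) = V_W.
    return (ms_between - ms_within) / (ms_between + ms_within)


toy_pairs = [(10, 12), (8, 9), (15, 13), (11, 11), (6, 9)]  # made-up scores
r_toy = intraclass_correlation(toy_pairs)
print(round(r_toy, 3))
```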

  • Analysis of variance
    Set V_P = 1 with no loss of generality.
    Three unknown parameters (V_A, V_D, V_C) must be estimated from only two statistics (r_MZ, r_DZ), so there is no unique solution.
    If r_MZ/r_DZ < 1 or r_MZ/r_DZ > 4, then the model is inappropriate.

  • Analysis of variance
    This procedure does not imply that V_C and V_D cannot coexist, but merely that they cannot be jointly estimated with the data available.
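The selection procedure can be sketched in Python. The slides state only that ratios outside 1 to 4 are inappropriate and (in Example 5.9) that a ratio between 2 and 4 leads to a model with additive effects and dominance; treating a ratio between 1 and 2 as an (A, C, E) model with V_D = 0 is an assumption made here, as are the function name and the returned dictionary layout.

```python
def twin_variance_components(r_mz, r_dz):
    """Solve for the variance components (with V_P = 1) from r_MZ and r_DZ."""
    if r_dz <= 0:
        raise ValueError("r_DZ must be positive to form the ratio")
    ratio = r_mz / r_dz
    if ratio < 1 or ratio > 4:
        raise ValueError("r_MZ/r_DZ outside [1, 4]: the model is inappropriate")
    if ratio <= 2:
        # Assumed (A, C, E) model: r_MZ = V_A + V_C, r_DZ = V_A/2 + V_C.
        model, v_d = "ACE", 0.0
        v_a = 2 * (r_mz - r_dz)
        v_c = 2 * r_dz - r_mz
    else:
        # (A, D, E) model: r_MZ = V_A + V_D, r_DZ = V_A/2 + V_D/4.
        model, v_c = "ADE", 0.0
        v_a = 4 * r_dz - r_mz
        v_d = 2 * r_mz - 4 * r_dz
    return {"model": model, "V_A": v_a, "V_D": v_d, "V_C": v_c, "V_E": 1 - r_mz}


# Correlations implied by Example 5.9: r_MZ = 0.4623 and r_MZ/r_DZ = 2.3946123986.
components = twin_variance_components(0.4623, 0.4623 / 2.3946123986)
print({k: round(v, 4) if k != "model" else v for k, v in components.items()})
```

With these inputs the sketch gives values that round to the components quoted in Example 5.9 (V_A = 0.3099, dominance 0.1524, V_E = 0.5377).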

  • Analysis of variance
    Broad heritability
    Narrow heritability
    Variance of the intraclass correlation estimated from data on n twin-pairs
    This method can be used to obtain approximate standard errors of h^2 and H^2 for the different ranges of values of the r_MZ/r_DZ ratio.

  • Example 5.9
    MZ:DZ ratio in intraclass correlations (twin data in Example 5.1):
    r_MZ/r_DZ = 2.3946123986
    Since this is between 2 and 4, a model including additive genetic effects and dominance is selected.
    Components of variance:
    V_A = 0.3099
    V_D = 0.1524
    V_E = 0.5377
    Broad heritability: H^2 = 0.4623, SE(H^2) = 0.0344
    Narrow heritability: h^2 = 0.3099489195, SE(h^2) = 0.2360174398

  • Example 5.9
    The large standard error for narrow heritability is due to the partial confounding between additive genetic effects and dominance.
    Although the total genetic contribution can be estimated quite precisely, there is much more uncertainty about the relative contributions of additive genetic effects and dominance.
    Example: IQ has a broad heritability of 0.7, that is, about 70% of the variance in IQ was found to be associated with genetic variation.
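The contrast between the two standard errors in Example 5.9 can be reproduced approximately. The sketch below assumes the large-sample approximation Var(r) ≈ (1 - r^2)^2 / n for an intraclass correlation from n pairs (the slide's own formula is not preserved), independent MZ and DZ samples, and pair counts of 523 MZ and 273 DZ, which are half the individual counts quoted in Example 5.12.

```python
import math


def se_intraclass(r, n_pairs):
    """Approximate standard error of an intraclass correlation (assumed formula)."""
    return (1 - r ** 2) / math.sqrt(n_pairs)


r_mz = 0.4623
r_dz = r_mz / 2.3946123986
n_mz, n_dz = 523, 273  # twin-pairs, assumed from the counts in Example 5.12

# Under the (A, D, E) model: H^2 = r_MZ and h^2 = 4 r_DZ - r_MZ.
se_broad = se_intraclass(r_mz, n_mz)
se_narrow = math.sqrt(16 * se_intraclass(r_dz, n_dz) ** 2
                      + se_intraclass(r_mz, n_mz) ** 2)
print(round(se_broad, 4), round(se_narrow, 3))
```

The results are close to the reported 0.0344 and 0.2360. The factor of 16 on the DZ term is what makes the narrow heritability so much less precise than the broad heritability.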

  • Linear regression (DeFries and Fulker, 1985; the DF model)
    A single analysis of the entire dataset, instead of separate analyses on MZ and DZ twins.
    Heritability is estimated by a regression coefficient, so that its standard error is easily obtained.
    Applicable sampling schemes:
    The MZ and DZ twin-pairs are random samples from a population.
    The twin-pairs are ascertained through proband twins selected to over-represent certain ranges of trait values.

  • Linear regression
    y: mean of the male offspring for a quantitative trait
    x: phenotypic value of the father

  • The twin-pairs are random samples from a population
    This sampling procedure is also assumed by one-way ANOVA.
    Recall:
    The intraclass correlation is the proportion of trait variance due to a random effect shared by members of the same class.
    The intraclass correlation can be estimated from the values of the mean squares from a one-way ANOVA.
    For twin data, the intraclass correlation is also the covariance of the trait between twins divided by the variance of the trait.
    In order to estimate this without arbitrarily assigning one member of each twin-pair as variable 1 and the other as variable 2, duplicate the data of each twin-pair so that the order of assignment is reversed in the two duplicates.

  • The twin-pairs are random samples from a population
    Since the two variables in the duplicated data must have the same variance, their correlation is equal to the regression coefficient of either variable on the other.
    An estimate of the intraclass correlation can therefore be obtained by a linear regression analysis on the duplicated data.
    Since the duplicated sample is twice the real sample size, the estimated standard error should be inflated by a factor of sqrt(2).
    Alternatively, each observation should be given the weight of half an observation (using the weight command).
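The duplication trick is easy to check numerically. The following sketch uses made-up scores and a hypothetical `slope` helper; it shows that, once each pair is entered in both orders, the regression of cotwin on proband and of proband on cotwin give the same coefficient, which serves as the estimate of the intraclass correlation.

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


pairs = [(10, 12), (8, 9), (15, 13), (11, 11), (6, 9)]  # made-up twin scores

# Duplicate every pair with the order of the two twins reversed.
proband = [a for a, b in pairs] + [b for a, b in pairs]
cotwin = [b for a, b in pairs] + [a for a, b in pairs]

b_cotwin_on_proband = slope(proband, cotwin)
b_proband_on_cotwin = slope(cotwin, proband)
print(round(b_cotwin_on_proband, 4), round(b_proband_on_cotwin, 4))
```

Any standard error reported by regression software for this duplicated sample still needs the sqrt(2) inflation (or the half weights) described above.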

  • The twin-pairs are random samples from a population
    For MZ twins, the theoretical value of the intraclass correlation, and hence of the regression coefficient, is (V_A + V_D + V_C).
    The regression equation for MZ twins:
    X_C = K_M + (V_A + V_D + V_C) X_P + E_M
    X_C: the trait value of the cotwin
    X_P: the trait value of the proband twin
    K_M: constant
    E_M: random error term
    In a duplicated dataset, proband and cotwin status is entirely arbitrary, so that the expected trait values for probands and cotwins are equal.

  • The twin-pairs are random samples from a population
    Let the mean trait value be m; then K_M = (1 - V_A - V_D - V_C) m.
    The regression equation for MZ twins:
    X_C = K_M + (V_A + V_D + V_C) X_P + E_M
    X_C = m - V_A m - V_D m - V_C m + V_A X_P + V_D X_P + V_C X_P + E_M
    X_C - m = V_A (X_P - m) + V_D (X_P - m) + V_C (X_P - m) + E_M
    The regression equation for DZ twins:
    X_C = K_D + (V_A/2 + V_D/4 + V_C) X_P + E_D
    X_C - m = V_A (X_P - m)/2 + V_D (X_P - m)/4 + V_C (X_P - m) + E_D

  • The twin-pairs are random samples from a population
    These equations suggest a regression analysis of MZ and DZ twins together, through the origin, with X_C - m as the dependent variable and with three dummy independent variables, A, D and C, coded as:
          A             D             C
    MZ    X_P - m       X_P - m       X_P - m
    DZ    (X_P - m)/2   (X_P - m)/4   X_P - m
    This assumes a single error term for both MZ and DZ twins.
    DZ twins are expected to have a greater residual variance than MZ twins in the presence of a genetic component. This difference is often ignored, and the regression coefficients of A, D and C are taken as estimates of V_A, V_D and V_C respectively.
    However, taking appropriate account of the possible difference in error variance between MZ and DZ twins is expected to improve the precision of the parameter estimates.

  • The twin-pairs are random samples from a population
    Regardless of the treatment of the error variance, the complete model for the fixed effects might be denoted as (A, D, C, E): three dummy variables A, D and C, and an error term E.
    However, the three dummy variables are linearly dependent, C - 3A + 2D = 0, so they cannot all be entered into a single regression model.
    If the regression coefficients are unconstrained, the three submodels (A, D, E), (A, C, E) and (D, C, E) will fit the observed data equally well, so that it is impossible to choose a submodel on the basis of goodness-of-fit.
    However, the (D, C, E) model is usually discarded because the presence of dominance interactions in the absence of additive genetic effects is considered extremely unlikely.
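The dummy-variable coding and the linear dependence among A, D and C can be sketched as follows. The data are invented, the function names are hypothetical, and for brevity the toy observations are not duplicated; the (A, E) fit is a single-predictor regression through the origin.

```python
def df_row(zygosity, x_proband, mean):
    """Return the dummy variables (A, D, C) for one proband-cotwin observation."""
    z = x_proband - mean
    if zygosity == "MZ":
        return z, z, z
    if zygosity == "DZ":
        return z / 2, z / 4, z
    raise ValueError("zygosity must be 'MZ' or 'DZ'")


def fit_through_origin(x, y):
    """Least-squares slope of y on a single predictor x, with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical toy data: (zygosity, proband score, cotwin score).
data = [("MZ", 14, 13), ("MZ", 7, 9), ("DZ", 15, 11), ("DZ", 6, 10)]
mean = 10.0

rows = [df_row(zyg, x_p, mean) for zyg, x_p, _ in data]
y = [x_c - mean for _, _, x_c in data]

# C - 3A + 2D = 0 holds for every observation, MZ or DZ.
dependent = all(abs(c - 3 * a + 2 * d) < 1e-12 for a, d, c in rows)

# (A, E) submodel: the coefficient of A estimates the heritability.
h2 = fit_through_origin([a for a, _, _ in rows], y)
print(dependent, round(h2, 3))
```

Because of the dependence, at most two of the three dummies can enter the regression at once, which is why only submodels such as (A, D, E) or (A, C, E) are fitted.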

  • The twin-pairs are random samples from a population
    Of the submodels (A, D, E) and (A, C, E), the one with positive regression coefficients is selected.
    Whichever submodel is selected, it can be subjected to a backward elimination procedure to obtain a final model.
    The aim of the analysis is to assess the compatibility of the data with the alternative models (A, D, E), (A, C, E), (A, E), (D, E), (C, E) and (E), and to obtain the parameter estimates of the best supported model.
    Great care must be taken in assessing the significance of a variable, since the data have been duplicated for the analysis.

  • The twin-pairs are random samples from a population
    All the duplicated observations can be assigned a weight of 1/2.
    Alternatively, if an unadjusted analysis is performed on the duplicated sample, then the results will need to be corrected for the artificial inflation of the sample size.
    Duplicating the data has the effect of doubling all the sums of squares and increasing the residual degrees of freedom by the actual sample size, n.
    A reasonable adjustment is therefore to halve all the sums of squares, to reduce the residual degrees of freedom by n, and to revise the mean squares and the F-statistics accordingly.
    Similarly, in assessing the regression coefficient of a variable, its standard error should be multiplied by a factor of sqrt(2) to give an adjusted standard error.
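The correction of an unadjusted analysis amounts to a little arithmetic, sketched below. The function names and the sums of squares in the usage lines are hypothetical; only the rules themselves (halve the sums of squares, subtract n from the residual degrees of freedom, multiply standard errors by sqrt(2)) come from the slide.

```python
import math


def adjust_anova(ss_regression, ss_residual, df_regression, df_residual, n_pairs):
    """Correct ANOVA results from a fully duplicated twin dataset."""
    ss_reg = ss_regression / 2          # halve all sums of squares
    ss_res = ss_residual / 2
    df_res = df_residual - n_pairs      # remove the artificial residual df
    f_stat = (ss_reg / df_regression) / (ss_res / df_res)
    return ss_reg, ss_res, df_res, f_stat


def adjust_standard_error(se):
    """Inflate a standard error estimated from the duplicated sample."""
    return se * math.sqrt(2)


# Hypothetical output: 100 real pairs, 200 duplicated observations,
# one predictor fitted through the origin (residual df = 199).
ss_reg, ss_res, df_res, f_stat = adjust_anova(80.0, 120.0, 1, 199, n_pairs=100)
print(ss_reg, ss_res, df_res, round(f_stat, 2))
```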

  • The twin-pairs are random samples from a population
    Example 5.10: twin neuroticism data
    Method:
    All neuroticism scores are adjusted by subtracting the mean score, 10.23.
    Independent variables A and D are defined as above.
    The observations are duplicated, so that each twin acts as the proband twin in one duplicate and as the cotwin in the other.
    All observations are given a weight of 1/2.
    The (A, D, E) and (A, E) models are fitted.
    Result (SPSS):
    The (A, E) model is best supported, with a heritability estimate of 0.4539 (SE 0.05278).
    This analysis does not take account of the potentially greater residual variance of DZ twins as compared with MZ twins (heteroscedasticity).
    The MLN program can model this heteroscedasticity, but it has no weight function, so standard errors are multiplied by 2^(1/2) and -2 log-likelihoods by 1/2.
    The (A, E) model is again selected, with an estimated heritability of 0.4539.

  • The twin-pairs are ascertained through a sample of proband twins
    The twin-pairs are ascertained through a sample of proband twins who may be selected for particular ranges of values of the trait.
    The same principles apply, except that a twin-pair need not be duplicated unless both members of the pair are probands.
    Where there is only one proband in a pair, the non-proband member is treated as the dependent variable, and the proband member as the independent variable.

  • The twin-pairs are ascertained through a sample of proband twins
    The analysis proceeds as before, although the results should be adjusted for the ascertainment procedure and the duplication of some twin-pairs.
    Using software which allows fractional weightings for observations, each twin-pair in the regression analysis can be assigned a weight of n/N, where n is the actual number of twin-pairs and N is the total number of pairs in the regression analysis (including the duplicated pairs).
    If an unadjusted analysis is performed, then all sums of squares should be multiplied by n/N, and the residual degrees of freedom reduced by N - n.
    Similarly, standard errors should be increased by a factor of (n/N)^(-1/2), that is, sqrt(N/n).

  • The twin-pairs are ascertained through a sample of proband twins
    Example 5.11: twin neuroticism data of Example 5.1
    Method:
    Selection criterion: twin-pairs in which at least one member has a neuroticism score of 16 or above are included; all others are excluded.
    Selected: 108 MZ twin-pairs and 66 DZ twin-pairs.
    MZ: 85 singly ascertained pairs and 23 doubly ascertained pairs.
    The doubly ascertained twin-pairs were duplicated, so 85 + 2(23) = 131 MZ observations were subjected to the analysis.
    Similarly, 57 singly and 9 doubly ascertained DZ pairs were selected, so 57 + 2(9) = 75 DZ observations were subjected to the analysis.
    Weight = (108 + 66)/(131 + 75) = 0.845
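The weight in Example 5.11 follows directly from the pair counts, as this short sketch shows. The variable names are illustrative, and the standard-error factor sqrt(N/n) reflects a reading of the garbled factor on the preceding slide rather than a value quoted in the example.

```python
# Pair counts from Example 5.11.
mz_single, mz_double = 85, 23
dz_single, dz_double = 57, 9

n_actual = (mz_single + mz_double) + (dz_single + dz_double)            # n
n_analysed = (mz_single + 2 * mz_double) + (dz_single + 2 * dz_double)  # N

weight = n_actual / n_analysed
# Assumed reading of the slide: standard errors grow by sqrt(N/n).
se_factor = (n_analysed / n_actual) ** 0.5

print(n_actual, n_analysed, round(weight, 3))  # 174 206 0.845
```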

  • The twin-pairs are ascertained through a sample of proband twins
    Example 5.11: result
    The (A, E) model is selected, with a heritability estimate of 0.4545 (SE 0.05278).
    This is similar to Example 5.10, although the standard errors are larger.
    Using MLN, the (A, E) model is also selected.

  • 5.4 Scale
    The genetic analysis of a trait is sometimes simplified by a suitable transformation of scale.
    A transformation may help to:
    normalize the distribution of the trait in the population;
    reduce heteroscedasticity (e.g. a correlation between pair-differences and pair-means);
    reduce the need for interaction terms (e.g. dominance, epistasis, gene-environment interaction and shared-nonshared environment interaction) in an analysis of variance.
    For example, if gene action is multiplicative on the original scale of a variable, then analysis of variance would lead to significant dominance and epistasis.
    However, multiplicative action on the original scale translates to additive effects on a logarithmic scale, so that interactions will be absent in an analysis of variance of the logarithm of the original variable.
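The effect of the logarithmic transformation can be illustrated with invented genotype effects at two loci. In this sketch the effects multiply on the raw scale, so a two-by-two interaction contrast (a stand-in for epistasis) is non-zero there but vanishes after taking logarithms. All names and numbers are hypothetical.

```python
import math

# Hypothetical multiplicative genotype effects at two loci.
effect_a = {"aa": 1.0, "Aa": 1.5, "AA": 2.25}
effect_b = {"bb": 1.0, "Bb": 2.0, "BB": 4.0}


def raw(g, h):
    """Trait value on the original scale: the two loci act multiplicatively."""
    return effect_a[g] * effect_b[h]


def logged(g, h):
    """Trait value on the logarithmic scale."""
    return math.log(raw(g, h))


def interaction(value, g1, g2, h1, h2):
    """Two-by-two interaction contrast; zero when the loci act additively."""
    return value(g2, h2) - value(g2, h1) - value(g1, h2) + value(g1, h1)


raw_contrast = interaction(raw, "aa", "AA", "bb", "BB")
log_contrast = interaction(logged, "aa", "AA", "bb", "BB")
print(raw_contrast, round(log_contrast, 12))
```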

  • Scale
    For example, when the trait is a function of an area or a volume of a structure, but gene action is additive on the linear dimension of the structure, then an additive model will fit the data after a square root or a cube root transformation, but not the raw data on the original scale.
    Even in the absence of a theoretical rationale, a transformation may still be justified on empirical grounds if it reduces non-additivity.
    A transformation that normalizes the variable often has other desirable effects, such as removing the correlation between pair-differences and pair-means and the interaction terms in the analysis of variance.
    If this is not the case, then other transformations should be tried in order to find one that produces data compatible with an additive model.

  • Scale
    Negative view, Falconer and Mackay (1996):
    "Transformations of scale, however, should not be used without good reason. The first purpose of experimental observations is the description of the genetic properties of the population, and a scale transformation obscures rather than illuminates this description. If epistasis, for example, is found, this is an essential part of the description and it is better labelled as such than a scale effect."
    Positive view, Mather and Jinks (1977):
    "the justification for using a transformed scale is not theoretical but empirical ... while we must recognise that it is not always possible to find a transformation which in effect removes non-additivity when this is present in the direct measurements, the search for such a transformation is always well worth-while."

  • Scale
    There is no doubt that transformations can sometimes reduce the complexity of the genetic model necessary for providing an adequate description of the data.
    However, great care must be taken when drawing conclusions from such analyses, in that the simpler description applies to the transformed and not the original scale.
    For example, if an additive genetic model offers an adequate description of the cube root of body weight, this implies that an adequate genetic model for weight itself will probably include dominance and epistatic components.
    The simpler model based on the cube root transformation is more appealing from a statistical point of view, but the model based on the original scale may still be relevant, for example if the risk of heart disease is more directly related to body weight itself than to its cube root.

  • 5.5 Quasi-continuous characters
    Characters determined by a single locus are necessarily discontinuous.
    However, not all discrete traits demonstrate Mendelian segregation.
    Many discrete traits appear to be inherited in a fashion similar to continuous characters.
    In humans, multiple loci are thought to be involved in common diseases such as congenital malformations, ischaemic heart disease and diabetes mellitus.

  • Quasi-continuous characters: 5.5.1 Liability-threshold model
    Modelling the relationship between multiple genetic and environmental factors and the presence or absence of a discrete characteristic such as a common disease.
    Logistic regression model:
    Response: presence or absence of the disease
    Predictors: potential risk factors
    Liability-threshold model:
    The common method in genetics (Pearson and Lee, 1901)
    A natural extension of biometrical models for quantitative traits

  • Quasi-continuous characters
    "If we take a problem like that of coat-colour in horses, it is by no means difficult to construct an order of intensity of scale. The variable on which it depends may be the amount of pigment in the hair ... we may reasonably argue that, if we could find the quantity of pigment, we should be able to form a continuous curve of frequency ... Now if we take any line parallel to the axis of frequency and dividing the curve, we divide the total frequency into two classes, which, so long as there is a quantitative order of tint or colour, will have their relative frequency unchanged."
    Example: coat colour in horses (quantity of pigment)

  • Quasi-continuous characters: liability-threshold model
    The model requires a continuous variable, the liability X, distributed as N(0, 1) in the general population.
    For all individuals with liability above a threshold t, the disease is present; for all others, the disease is absent.
    t can be estimated from the population frequency of the disease, p: t = Φ^(-1)(1 - p), where Φ is the standard normal distribution function (CDF).

  • Quasi-continuous characters: liability-threshold model
    The threshold has been criticized on biological grounds.
    An alternative model has been proposed that relates risk of illness to liability by a probit function (Curnow and Smith, 1975).
    However, this model is mathematically equivalent to the liability-threshold model.

    Figure 5.1 Distribution of liability in the general population with threshold T.

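  • Quasi-continuous characters
    Example 5.12
    Consider the neuroticism data of Example 5.1, with a threshold score of 15 separating low scores from high scores.
    MZ: 132 high, 914 low; the frequency of high scores is 132/1046 = 0.126.
    DZ: 76 high, 470 low; the frequency of high scores is 76/546 = 0.139.
    For the MZ frequency, Φ(1.145) = 0.874, so the threshold on the liability scale is 1.145.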
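The thresholds for Example 5.12 can be computed from the frequencies with the standard library alone. This is a minimal sketch using `statistics.NormalDist`; the function name is hypothetical, and liability is assumed to be standard normal as in the model above.

```python
from statistics import NormalDist


def liability_threshold(prevalence):
    """Threshold t on the N(0, 1) liability scale with P(X > t) = prevalence."""
    return NormalDist().inv_cdf(1 - prevalence)


t_mz = liability_threshold(132 / 1046)  # frequency of high scores in MZ twins
t_dz = liability_threshold(76 / 546)    # frequency of high scores in DZ twins
print(round(t_mz, 2), round(t_dz, 2))
```

The MZ threshold agrees with the value of about 1.145 in the example; the slightly higher DZ frequency gives a slightly lower threshold.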
