Post on 20-Dec-2015
Linkage analysis: Two-factor testcross
AaBb x aabb
AaBb, Aabb, aaBb, aabb
What are the implications of phenotypes scored on these progeny?
Linkage analysis: Two-factor testcross
• Double heterozygotes are mated with homozygous recessives
• Genotypes of a large number of progeny are scored
• If loci A and B are on different chromosomes, alleles will follow Mendel’s law of Independent Assortment
• Genetically linked? Two of the four genotypes are more frequent than expected ($\chi^2$ test statistic)
Linkage analysis: Interval mapping (Haley and Knott, 1992)
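[Figure: marker A, QTL Q, and marker B in order along a chromosome; rA is the recombination fraction between A and Q, rB between Q and B]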
rAB = rA + rB - 2rArB
Frequencies for F1 gametes and RI genotypes (Markel et al., 1996)
F1 gametes   Frequency     RI genotype   Frequency
A1B1         (1 - R')/2    A1A1B1B1      (1 - R)/2
A1B2         R'/2          A1A1B2B2      R/2
A2B1         R'/2          A2A2B1B1      R/2
A2B2         (1 - R')/2    A2A2B2B2      (1 - R)/2
RI genotypic frequencies of two flanking markers and an intermediate QTL (Markel et al., 1996)
Genotype        Predicted frequency
A1A1Q1Q1B1B1    (1 - RA)(1 - RB)/2
A1A1Q2Q2B1B1    RARB/2
A1A1Q1Q1B2B2    (1 - RA)RB/2
A1A1Q2Q2B2B2    RA(1 - RB)/2
A2A2Q1Q1B1B1    RA(1 - RB)/2
A2A2Q2Q2B1B1    (1 - RA)RB/2
A2A2Q1Q1B2B2    RARB/2
A2A2Q2Q2B2B2    (1 - RA)(1 - RB)/2
Expected additive effect coefficients of each pair of RI genotypes (Markel et al., 1996)
RI Genotypes Expected additive effect
A1A1B1B1 [(1 - RA - RB)/(1 - R)](a)
A1A1B2B2 [(RB - RA)/R](a)
A2A2B1B1 [(RA - RB)/R](a)
A2A2B2B2 [(RA + RB - 1)/(1 - R)](a)
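The four coefficients above follow directly from the predicted genotype frequencies. A minimal sketch, assuming RA and RB are already expressed as RI recombinant fractions (not map distances in cM) and that R = RA + RB - 2RARB; the function name is illustrative and does not come from the slides.

```python
def additive_coefficients(ra, rb):
    """Expected additive-effect coefficient for each flanking-marker RI genotype."""
    r = ra + rb - 2.0 * ra * rb  # recombinant fraction between markers A and B
    return {
        "A1A1B1B1": (1 - ra - rb) / (1 - r),
        "A1A1B2B2": (rb - ra) / r,
        "A2A2B1B1": (ra - rb) / r,
        "A2A2B2B2": (ra + rb - 1) / (1 - r),
    }


# QTL midway between the markers: recombinant genotypes carry no information
print(additive_coefficients(0.1, 0.1))
# QTL located at marker A (RA = 0): every coefficient is +1 or -1
print(additive_coefficients(0.0, 0.2))
```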
Coefficients (xi) of the additive effect of a QTL at five positions between two flanking markers of A and B that are 20 cM apart (Markel et al., 1996)
Position of QTL (cM)
Genotype 0 5 10 15 20
A1A1B1B1 1.00 0.84 0.79 0.84 1.00
A1A1B2B2 1.00 0.43 0.00 -0.43 -1.00
A2A2B1B1 -1.00 -0.43 0.00 0.43 1.00
A2A2B2B2 -1.00 -0.84 -0.79 -0.84 -1.00
Maximum likelihood approach to QTL mapping (Lander and Botstein, 1988)
• Assuming complete map coverage, is it possible to design a cross to make it highly likely that QTLs will be found?
• Using flanking markers as opposed to single-marker analysis
• Reduce the number of markers individually tested and thus reduce type I error
Traditional approach
• Compare the mean phenotypic value of progeny with genotype AB to those with marker genotype AA
• One-way analysis of variance, i.e., a linear regression; assumes normally distributed residual environmental variance
Number of progeny required for detection (Soller and Brody, 1976)
• Assume that a QTL contributes $\sigma^2_{exp}$ to the genetic variance and is located exactly at a marker locus
• Number of progeny required: $(Z_\alpha)^2 (\sigma^2_{res} / \sigma^2_{exp})$
– $Z_\alpha$ is the number of standard deviations beyond which the normal curve contains probability $\alpha$
• Phenotypic effect may be underestimated if the QTL is not at the marker locus
• A greater number of progeny is required if the QTL is not at the marker
• No definition of the likely position of the QTL
• Multiple testing
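The sample-size expression above is easy to evaluate numerically. The sketch below assumes a one-sided tail for $Z_\alpha$ (the slide does not say which convention is used) and uses hypothetical variance values; the function name is illustrative.

```python
from statistics import NormalDist


def progeny_required(alpha, var_exp, var_res):
    """Approximate progeny number: Z_alpha^2 * (var_res / var_exp)."""
    # Assumption: Z_alpha cuts off probability alpha in one tail of the normal curve
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return z_alpha ** 2 * var_res / var_exp


# Hypothetical QTL explaining 10% of the variance, tested at alpha = 0.05
print(round(progeny_required(0.05, var_exp=0.1, var_res=0.9), 1))
```

Because the required number scales with the ratio of residual to explained variance, halving the QTL effect variance roughly doubles the progeny needed.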
Interval mapping of QTLs using LOD scores: Method of maximum likelihood
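• $\phi_i = a + b g_i + \varepsilon$
· $g_i$ is coded (0, 1) for the number of B alleles
· $\varepsilon$ is a random normal variable with mean 0 and variance $\sigma^2$
· b denotes the estimated phenotypic effect of a single allele substitution at a putative QTL
• $L(a, b, \sigma^2) = \prod_i z((\phi_i - (a + b g_i)), \sigma^2)$
• $\mathrm{LOD} = \log_{10}(L(\hat{a}, \hat{b}, \hat{\sigma}^2) / L(\hat{\mu}_A, 0, \hat{\sigma}^2_B))$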
Interval mapping of QTLs using LOD scores: Method of maximum likelihood
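• $\mathrm{ELOD} = \frac{1}{2}\log_{10}(1 + \sigma^2_{exp}/\sigma^2_{res})$ (a result from linear regression)
• $\approx \frac{1}{2}(\log_{10} e)(\sigma^2_{exp}/\sigma^2_{res})$ (Taylor expansion for small values of $\sigma^2_{exp}/\sigma^2_{res}$)
• $\approx 0.22\,(\sigma^2_{exp}/\sigma^2_{res})$
• $T/\mathrm{ELOD} \approx (Z_\alpha)^2/(\sigma^2_{exp}/\sigma^2_{res})$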
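A short numerical check of the ELOD approximation above. The variance values and the LOD threshold of 3 are hypothetical, and the function names are illustrative; only the formulas come from the slide.

```python
import math


def elod_exact(var_exp, var_res):
    """Expected LOD per progeny: 1/2 * log10(1 + var_exp / var_res)."""
    return 0.5 * math.log10(1.0 + var_exp / var_res)


def elod_approx(var_exp, var_res):
    """First-order Taylor approximation: 1/2 * log10(e) * var_exp / var_res."""
    return 0.5 * math.log10(math.e) * var_exp / var_res


def progeny_for_threshold(threshold, var_exp, var_res):
    """Progeny needed for the expected total LOD to reach a threshold T."""
    return threshold / elod_exact(var_exp, var_res)


print(f"constant 1/2 log10(e) = {0.5 * math.log10(math.e):.3f}")
print(f"exact ELOD   = {elod_exact(0.05, 0.95):.5f}")
print(f"approx ELOD  = {elod_approx(0.05, 0.95):.5f}")
print(f"progeny for T = 3: {progeny_for_threshold(3.0, 0.05, 0.95):.0f}")
```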
Interval mapping of QTLs using LOD scores(Lander and Botstein, 1988)
• $L(a, b, \sigma^2) = \prod_i [G_i(0)L_i(0) + G_i(1)L_i(1)]$
• $L_i(x) = z((\phi_i - (a + bx)), \sigma^2)$ denotes the likelihood function for individual i, assuming $g_i = x$
• $G_i(x)$ denotes the probability that $g_i = x$, conditional on the genotypes and positions of the flanking markers
Confirmation of EtOH sensitivity QTL in mouse (Markel et al., 1997)
Genetic map of EtOH-sensitivity QTL (Lore1 - 6; Markel et al., 1997)
Additive effect of confirmed QTL for alcohol sensitivity (Markel et al., 1997)
Marker-assisted breeding of congenic mouse strains (Markel et al, 1997b)
• Yellow indicates the donor (D) genome
• Blue represents the recipient (R) genome
• Apoe is the target region of introgression
• The left side represents the traditional approach, the right side the “speed” congenic method
Traditional congenic breeding strategy (Markel et al., 1997b)
Generation   Average % heterozygous (D/R) segments   SD     % recipient genome
F1           100.00                                   -      50.00
N2           50.00                                    7.07   75.00
N3           25.00                                    5.00   87.50
N4           12.50                                    3.54   93.75
N5           6.25                                     2.50   96.88
N6           3.13                                     1.76   98.44
N7           1.56                                     1.25   99.22
N8           0.78                                     0.88   99.61
N9           0.39                                     0.63   99.81
N10          0.20                                     0.44   99.90
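The averages in the table above follow a simple halving rule: each backcross halves the expected heterozygous (D/R) fraction, and a heterozygous segment is half recipient. A minimal sketch that reproduces the two average columns (the SD column is not modelled here; the function name is illustrative):

```python
def traditional_backcross(generations):
    """Expected % heterozygous segments and % recipient genome per generation."""
    rows = []
    for n in range(1, generations + 1):
        het = 100.0 * 0.5 ** (n - 1)   # % D/R segments, halved at every backcross
        recipient = 100.0 - het / 2.0  # a D/R segment is half recipient genome
        label = "F1" if n == 1 else f"N{n}"
        rows.append((label, het, recipient))
    return rows


for label, het, recipient in traditional_backcross(10):
    print(f"{label:<4} {het:7.2f} {recipient:7.2f}")
```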
Marker-assisted congenic breeding strategy (Markel et al., 1997)
Backcross generation   Average % D/R segments   SD     % D/R segments in 'best' male   % recipient genome of 'best' male
F1                     100                      0      100                             50
N2                     50.00                    7.07   38.32                           80.84
N3                     19.16                    4.38   11.93                           94.03
N4                     5.98                     2.44   1.95                            99.03
N5                     0.98                     0.98   ~0                              ~100
Theoretical potential (Markel et al., 1997b)
Number of male carriers   Potential reduction in D/R (x)
5                         0.85
10                        1.29
15                        1.50
20                        1.65
30                        1.84
40                        1.96
50                        2.06
Comparison of theoretical expectations and empirical data
Recipient strain at N5   Estimated % recipient genome for best male   Observed % recipient genome for best male
BALB/cByJ                99.52                                        99.11
C3H/HeJ                  99.27                                        99.41
C57BL/Ks                 99.66                                        99.70
CAST/Ei                  92.74 (N4)                                   95.54 (N4)
DBA/2J                   98.97                                        99.38
FVB/NJ                   99.38                                        99.73
Lecture 4: Mapping in humans (1 of 2)
• Linkage analysis
• Relative-pair analysis
Genetic mapping was uncommon in humans for most of the last century
• Lack of an abundant supply of markers
• Inability to arrange human crosses to suit experimental purposes
• Breakthrough with Botstein et al. (1980) for yeast
• Use naturally occurring DNA sequence variation in humans
• Led to mapping several hundred rare Mendelian diseases
Human Genetic Revolution
• Human genetics has sparked a revolution in medical science
• Can find genes behind disease without knowing how they function
• Completely generic approach
Last two decades ushered in complex traits
• Do not follow simple Mendelian monogenic inheritance
• Heart disease, hypertension, diabetes, cancer, and infection
Defining disease
• Clinical phenotype
• Age at onset
• Family history
• Severity
[Diagram: allele frequencies + environment (The Population); method/technique + time/place (The Sample); prevalence, risk, heritability, age of onset, family history, severity, etc. (The Metric)]
Linkage Analysis: Overview
• Simple Mendelian traits offer a small number of hypotheses for the geneticist to test.
• Thus, the geneticist speculates based on Mendelian rules what the most appropriate model is to explain the pattern of relationship between observed phenotype and genotype.
Linkage analysis: Hypothesis
• For simple Mendelian traits, Mendelian rules of gametic transmission can adequately explain the pattern of phenotypes in a multigenerational family:
• M1 = a specified model that suggests a specific location for a trait-causing gene
• Much more likely to have produced the observed data than
• M0 = a model that suggests no linkage to a trait-causing gene in the region
Linkage analysis: Hypothesis
• The evidence for M1 versus M0 is measured by the likelihood ratio
LR = Prob(Data|M1)/Prob (Data|M0)
• This is also presented as Z, the lod score
Z = log10(LR)
• (see 49, 50; Morton (1955))
[Pedigree figure: autosomal dominant trait. Parents: T/t, M1/M2 and t/t, M2/M2. Six children, genotypes as extracted: t/t M1/M2; T/t M2/M2; T/t M2/M2; T/t M1/M2; T/t M1/M2; t/t M1/M2]
Basic calculations in human linkage analysis
• Assign linkage phase
• Calculate conditional probabilities
• Observe the number of each class of paternal gametes in progeny
• Probability of observed family given a model [$L(\theta)$]
• Probability assuming independent assortment [$L(0.5)$]
• Calculate likelihood ratio: $LR = L(\theta)/L(0.5)$
Assign linkage phase
• Equivalent to experimental two-factor testcross
• Linkage phase
– Different sets of alleles on each member within a pair of homologous chromosomes (i.e., haplotype)
– AB/ab is in coupling; Ab/aB is in repulsion
– Marker alleles are codominant, so phase is arbitrary; coupling is TM1/tM2 and repulsion is tM1/TM2
Conditional probabilities
Gamete frequencies
Phase            TM1              TM2              tM1              tM2
Coupling         $(1-\theta)/2$   $\theta/2$       $\theta/2$       $(1-\theta)/2$
Repulsion        $\theta/2$       $(1-\theta)/2$   $(1-\theta)/2$   $\theta/2$
Observed count   n1               n2               n3               n4
Observe paternal gametes
• n1 = TM1, n2 = TM2, n3 = tM1, and n4 = tM2 gametes
• Six children in the present example
– n1 = 1
– n2 = 2
– n3 = 3
– n4 = 0
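Probability $L(\theta)$
• Each offspring is an independent event so that:
• $L(\theta) = P(\text{coupling})\,L(\theta \mid \text{coupling}) + P(\text{repulsion})\,L(\theta \mid \text{repulsion})$
$= 0.5\,[0.5^n (1-\theta)^{n_1+n_4}\,\theta^{n_2+n_3}] + 0.5\,[0.5^n (1-\theta)^{n_2+n_3}\,\theta^{n_1+n_4}]$
$= 0.5^{n+1}\,[(1-\theta)^{n_1+n_4}\,\theta^{n_2+n_3} + (1-\theta)^{n_2+n_3}\,\theta^{n_1+n_4}]$
• The geneticist provides a reasonable value for $\theta$; in this case, what is a reasonable value for $\theta$?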
Probability $L(0.167)$
• $L(0.167) = (0.5)^7\,[(0.833)^1(0.167)^5 + (0.833)^5(0.167)^1] = 0.000524$
L(0.5)
• $L(0.5) = 0.25^n$, where n is the number of progeny
• $L(0.5) = (0.25)^6 = 0.000244$
LR and Z
• $LR = L(\theta)/L(0.5) = 0.00052/0.00024 = 2.147$
• $Z = \log_{10} LR = 0.332$
• Try different values of $\theta$
• If recombinants (r) can be counted directly, then the maximum likelihood estimate (MLE) is $\hat{\theta} = r/n$
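The phase-unknown calculation above can be reproduced and scanned over a grid of $\theta$ values. This sketch uses the gamete counts from the example (n1 = 1, n2 = 2, n3 = 3, n4 = 0); the function names and the grid resolution are illustrative choices, not part of the slides.

```python
import math


def likelihood_phase_unknown(theta, n1, n2, n3, n4):
    """L(theta) averaged over coupling and repulsion, each with prior 0.5."""
    n = n1 + n2 + n3 + n4
    a = n1 + n4  # gametes that are non-recombinant if the father is in coupling
    b = n2 + n3  # gametes that are non-recombinant if the father is in repulsion
    coupling = (1 - theta) ** a * theta ** b
    repulsion = (1 - theta) ** b * theta ** a
    return 0.5 ** (n + 1) * (coupling + repulsion)


def lod(theta, n1, n2, n3, n4):
    """Z = log10 of L(theta) / L(0.5)."""
    return math.log10(
        likelihood_phase_unknown(theta, n1, n2, n3, n4)
        / likelihood_phase_unknown(0.5, n1, n2, n3, n4)
    )


counts = (1, 2, 3, 0)
grid = [i / 1000 for i in range(1, 501)]  # theta from 0.001 to 0.5
best_theta = max(grid, key=lambda t: lod(t, *counts))
print(f"Z(0.167) = {lod(0.167, *counts):.3f}")
print(f"theta maximising Z on the grid = {best_theta:.3f}")
```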
[Pedigree figure: the same family with the paternal grandparents added (t/t, M1/M2 and T/t, M2/M2), which establishes the father's linkage phase]
Father’s genotype is in repulsion
• Assume father’s alleles are in repulsion (TM2/tM1)
– $L(\theta) = 0.5^n (1-\theta)^{n_2+n_3}\,\theta^{n_1+n_4}$
– $L(0.167) = (0.5)^6 (0.833)^5 (0.167) = 0.001046$
• Multiple generations are thus valuable
– Nearly twice the earlier value
– Z improves by 0.3, underscoring the value of multi-generation pedigrees
• How about two families of 6 children versus one family of 12?
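For comparison with the phase-unknown case, the phase-known likelihood above can be evaluated the same way. A minimal sketch using the same gamete counts (1, 2, 3, 0); function names are illustrative.

```python
import math


def likelihood_repulsion(theta, n1, n2, n3, n4):
    """L(theta) when the father is known to be in repulsion (TM2/tM1)."""
    n = n1 + n2 + n3 + n4
    return 0.5 ** n * (1 - theta) ** (n2 + n3) * theta ** (n1 + n4)


def lod_repulsion(theta, n1, n2, n3, n4):
    """Z = log10 of L(theta) / L(0.5), with L(0.5) = 0.25^n."""
    n = n1 + n2 + n3 + n4
    return math.log10(likelihood_repulsion(theta, n1, n2, n3, n4) / 0.25 ** n)


z_known = lod_repulsion(0.167, 1, 2, 3, 0)
print(f"L(0.167) = {likelihood_repulsion(0.167, 1, 2, 3, 0):.6f}")
print(f"Z with known phase = {z_known:.3f}")
```

Knowing the phase removes the averaging over coupling and repulsion, which is why the lod score rises by about 0.3 (close to $\log_{10} 2$) for this family.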
Linkage analysis: Autosomal recessive trait
• More complicated analysis; more families are required to demonstrate linkage between a marker locus and an autosomal recessive trait compared to autosomal dominant
• Normal children can be Tt or TT; thus, they alone cannot be used to deduce the linkage phase of the doubly heterozygous parent
• Families with just one affected child are not informative, even when several normal children are available
• $LR(\theta) = 0.5\,[(1-\theta)^1\,\theta^0 + \theta^1 (1-\theta)^0]$
$= 0.5\,[(1-\theta) + \theta]$
$= 0.5$
Allele frequency estimation
• Allelic heterogeneity
• Critical; rare versus common allele
Allele-sharing studies
• Penrose (1935)
• Haseman and Elston (1972)
• Carey and Williamson (1993)
• Fulker and Cardon (1994)
• Lander et al. (1995)
Allele-sharing: Haseman and Elston (1972)
• Can genetic variance be assigned to a locus?
• Twin studies
– Partition genetic variance
– Do not address the contribution of individual loci
• Sib-pairs
– Address secular and age effects
– Include information about parents
Allele-sharing: Haseman and Elston (1972)
• $X_{ij} = \mu + g_{ij} + e_{ij}$
• $g_{ij}$ = genotypic value; $e_{ij}$ = environmental deviation
• Assume random mating and linkage equilibrium
• $Y_j$ = (sib-pair difference)$^2$
• Estimate Y based on best estimate of the number of alleles the sibs share identical by descent (IBD)
Allele-sharing: Haseman and Elston (1972)
• Let $\pi_j$ = proportion of genes shared IBD and $Y_j = (x_{1j} - x_{2j})^2$ for sib pair j
• Develop the expectation of Y if $\pi$ is known precisely at the disease locus
• Estimate $\pi$ ($\hat{\pi}$) given the genotypes of the parents (sometimes) and children for the marker locus
• Predict Y based on $\hat{\pi}$
Development of the model
• $E(Y_j \mid \pi_j)$
• $E(\hat{\pi} \mid I_m)$
– $\hat{\pi}$ = estimate of $\pi$
– $I_m$ = information about parent and sib genotypes
• $E(Y \mid \hat{\pi})$
$E(Y_j \mid \pi_j)$
• For sib pair BB-Bb
• $x_{1j} = \mu + a + e_{1j}$
• $x_{2j} = \mu + d + e_{2j}$
• $Y_j = (a + e_{1j} - d - e_{2j})^2 = (a - d + e_j)^2$
$E(Y_j \mid \pi_j)$
$\pi_j$   Genotype pair   Probability
0         BB - BB         $p^2(p^2) = p^4$
1/2       BB - BB         $p^2(p) = p^3$
1         BB - BB         $p^2(1) = p^2$
$E(Y_j \mid \pi_j)$
Expectation                  Variance components
$E(Y_j \mid \pi_j = 1)$      $\sigma^2_e$
$E(Y_j \mid \pi_j = 1/2)$    $\sigma^2_e + \sigma^2_a + 2\sigma^2_d$
$E(Y_j \mid \pi_j = 0)$      $\sigma^2_e + 2\sigma^2_a + 2\sigma^2_d$
[Figure: $E(Y_j \mid \pi_j)$ plotted against $\pi_j$ = 0, 1/2, 1, with $Y_j$ on the vertical axis]
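$E(Y_j \mid \pi_j)$
• Expectation for $Y_j$ varies with the proportion $\pi_j$
• $E(Y_j \mid \pi_j) = \alpha + \beta\pi_j$
– $\alpha = \sigma^2_e + 2\sigma^2_g$
– $\beta = -2\sigma^2_g$
– $\pi_j = 0, 1/2, 1$
• Note: $\sigma^2_d$ vanishes with large n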
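The linear relation above suggests a simple regression of squared sib-pair differences on $\pi_j$. The sketch below is an ordinary least-squares fit written out by hand; the input values are noise-free and hypothetical (chosen so that $\sigma^2_e = 1$ and $\sigma^2_g = 2$), and the function name is illustrative.

```python
def haseman_elston_fit(pi_values, squared_diffs):
    """Least-squares fit of Y = alpha + beta * pi; returns (alpha, beta, sigma2_g)."""
    n = len(pi_values)
    mean_pi = sum(pi_values) / n
    mean_y = sum(squared_diffs) / n
    sxx = sum((p - mean_pi) ** 2 for p in pi_values)
    sxy = sum((p - mean_pi) * (y - mean_y) for p, y in zip(pi_values, squared_diffs))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_pi
    sigma2_g = -beta / 2.0  # from beta = -2 * sigma2_g
    return alpha, beta, sigma2_g


# Hypothetical expectations with sigma2_e = 1 and sigma2_g = 2: Y = 5 - 4 * pi
pi_values = [0.0, 0.5, 1.0, 0.5]
squared_diffs = [5.0, 3.0, 1.0, 3.0]
print(haseman_elston_fit(pi_values, squared_diffs))
```

A significantly negative slope is the evidence for linkage; with real data the squared differences are noisy, so the fit would be judged against its sampling error.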
$E(\hat{\pi} \mid I_m)$
• Estimate $\pi$ based on sib-pair and parental genotypes for a marker locus
• $f_{ji}$ is the probability that the jth sib pair has i genes identical by descent
• $I_m$ is the information on sib-pair and parental genotypes
• Our best estimate of $\pi_j$ (strongest correlation) is given as
$\hat{\pi}_j = f_{j2} + \frac{1}{2} f_{j1}$
$\hat{\pi}_j$ is the Bayes estimate of $\pi_j$ when a squared error loss function is used
• Maximum possible correlation with $\pi_j$ when $\pi_j$ is a random variable taking on values of 0, 1/2, and 1 (Haseman, 1970).
$E(\hat{\pi} \mid I_m)$
Type Probability
7 parental mating types p(b)
34 offspring types p(a|b)
Joint probability p(ab)
$E(\hat{\pi} \mid I_m)$
Mating type    Sib pair type   p(ab)              $f_{j0}$   $f_{j1}$   $f_{j2}$   $\hat{\pi}_j$
AiAi x AiAi    AiAi - AiAi     $p_i^4$            1/4        1/2        1/4        1/2
AiAi x AjAj    AiAj - AiAj     $2p_i^2 p_j^2$     1/4        1/2        1/4        1/2
AiAi x AiAj    AiAi - AiAi     $p_i^3 p_j$        0          1/2        1/2        3/4
AiAi x AiAj    AiAi - AiAj     $2p_i^3 p_j$       1/2        1/2        0          1/4
AiAi x AiAj    AiAj - AiAj     $p_i^3 p_j$        0          1/2        1/2        3/4
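$$f_{ji} = \frac{\sum_{v \in P_p}\sum_{w \in P_s} P\{v \text{ and } w \text{ and } \pi_j = i/2\}}{\sum_{h=0}^{2}\sum_{v \in P_p}\sum_{w \in P_s} P\{v \text{ and } w \text{ and } \pi_j = h/2\}}, \quad i = 0, 1, 2$$
• Numerator: joint probability of observing $I_m$ and that $\pi_j$ should equal i/2
• Denominator: sum of the three joint probabilities, i = 0, 1, 2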
$E(Y_j \mid \hat{\pi}_j)$
• Assume a two-allele marker locus...
• No dominance...
• And complete parental information
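$E(Y \mid \hat{\pi})$
• Given complete $I_m$
• $E(Y_j \mid \hat{\pi}_j) = \alpha + \beta\hat{\pi}_j$
– $\beta = -2(1-2c)^2\sigma^2_g$
• $(1-2c)^2$ = correlation between $\pi_{jm}$ and $\pi_{jt}$, i.e., between the proportion of marker genes IBD and the proportion of QTL genes IBD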
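$$E(Y_j \mid \hat{\pi}_{jm}) = \sum_{\pi_{jm}}\sum_{\pi_{jt}} E(Y_j \mid \pi_{jt})\,P\{\pi_{jt} \mid \pi_{jm}\}\,P\{\pi_{jm} \mid \hat{\pi}_{jm}\}$$
• $P\{\pi_{jt} \mid \pi_{jm}\}$ comes from the joint distribution of $\pi_{jt}$ and $\pi_{jm}$
• $P\{\pi_{jm} \mid \hat{\pi}_{jm}\}$ comes from the joint distribution of $\hat{\pi}_{jm}$ and $\pi_{jm}$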
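$E(Y_j \mid \hat{\pi}_{jm}) = [\sigma^2_e + 2(1 - 2c + 2c^2)\,\sigma^2_g] - 2(1 - 2c)^2\,\sigma^2_g\,\hat{\pi}_{jm}$
$\alpha = \sigma^2_e + 2(1 - 2c + 2c^2)\,\sigma^2_g$
$\beta = -2(1 - 2c)^2\,\sigma^2_g$
If c = 1/2, then $\beta = 0$
If c = 0, then $\beta = -2\sigma^2_g$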
$P\{\pi_{jm} = \pi_{jt} = 1\}$
• Mating: A1B1/A2B2 x A3B3/A4B4 (A = marker, B = trait)
• Gamete frequencies, first parent: A1B1 $(1-c)/2$; A2B2 $(1-c)/2$; A1B2 $c/2$; A2B1 $c/2$
• Gamete frequencies, second parent: A3B3 $(1-c)/2$; A4B4 $(1-c)/2$; A3B4 $c/2$; A4B3 $c/2$
• Example: Sib 1 and Sib 2 are both A1B1/A3B3; each sib has probability $[(1-c)/2]^2$, so jointly
$[(1-c)/2]^2\,[(1-c)/2]^2 = (1-c)^4/16$
• Summing over all such cases:
$P\{\pi_{jm} = \pi_{jt} = 1\} = 4(c^4/16) + 8[c^2(1-c)^2/16] + 4[(1-c)^4/16]$
$= [c^2 + (1-c)^2]^2/4 = \Psi^2/4$, where
$\Psi = c^2 + (1-c)^2$
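The last simplification can be checked numerically. A minimal sketch comparing the expanded sum with the closed form for a few recombination fractions c; the function names are illustrative.

```python
def prob_both_ibd_expanded(c):
    """4(c^4/16) + 8[c^2 (1-c)^2 / 16] + 4[(1-c)^4 / 16]."""
    return (4 * c ** 4 + 8 * c ** 2 * (1 - c) ** 2 + 4 * (1 - c) ** 4) / 16.0


def prob_both_ibd_closed(c):
    """Psi^2 / 4 with Psi = c^2 + (1 - c)^2."""
    psi = c ** 2 + (1 - c) ** 2
    return psi ** 2 / 4.0


for c in (0.0, 0.1, 0.25, 0.5):
    print(c, prob_both_ibd_expanded(c), prob_both_ibd_closed(c))
```

At c = 0 the probability is 1/4 (marker and trait locus always share together), and at c = 1/2 it drops to 1/16, the product of two independent 1/4 probabilities.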
Contemporary sib-pair analysis (Kruglyak and Lander, 1995)
• Multipoint linkage analysis
– full inheritance information
– maximum likelihood estimates
• Qualitative traits
• Quantitative traits
Sib-pair analysis advantages
• Sib pairs are relatively easy to ascertain
• Closely matched, control for secular effects
• No assumptions about inheritance
• No assumptions about:
– penetrance
– phenocopy
– disease allele frequency
Sib-pair analysis: Basic model
• Determine whether a sib pair shares 0, 1, or 2 alleles identical by descent (IBD)
• Affected sibs should share alleles IBD more often than expected under random Mendelian segregation (qualitative trait)
• Sib-pairs should show a correlation between magnitude of phenotypic difference and number of alleles shared IBD (quantitative trait)
Sib-pair analysis: Qualitative traits
• Estimated proportions of IBD sharing
– $(z_0, z_1, z_2)$
• Mendelian expectation $(\alpha_0, \alpha_1, \alpha_2) = (1/4, 1/2, 1/4)$
• According to Holmans (1993):
– $z_0 + z_1 + z_2 = 1$; $1/2 \ge z_1$; $z_1 \ge 2z_0$
– If there is no dominance variance: $z_1 = 1/2$
Sib-pair analysis and relative risk (Risch, 1990)
• If only a single locus is involved...
• Relative-risk ratio for a sib (prevalence in siblings of affecteds divided by population prevalence)
– $\lambda_S$ = relative risk ratio for sibling
– $\lambda_O$ = relative risk ratio for offspring
– $\lambda_M$ = relative risk ratio for monozygotic twin
Sib-pair analysis and relative risk (Risch, 1990)
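• $z_0 = \alpha_0 / \lambda_S$
• $z_1 = \alpha_1 \lambda_O / \lambda_S$
• $z_2 = \alpha_2 \lambda_M / \lambda_S$
• In the absence of dominance variance, $\lambda_O = \lambda_S$ and $\lambda_M - 1 = 2(\lambda_S - 1)$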
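The three sharing proportions above can be computed directly from the relative risk ratios. A minimal sketch, assuming the Mendelian priors (1/4, 1/2, 1/4) and a hypothetical sibling risk ratio of 2; function names are illustrative.

```python
def ibd_sharing(lambda_s, lambda_o, lambda_m):
    """Expected (z0, z1, z2) for affected sib pairs under a single-locus model."""
    return (0.25 / lambda_s, 0.5 * lambda_o / lambda_s, 0.25 * lambda_m / lambda_s)


def ibd_sharing_no_dominance(lambda_s):
    """Special case: lambda_O = lambda_S and lambda_M - 1 = 2 (lambda_S - 1)."""
    return ibd_sharing(lambda_s, lambda_s, 2.0 * lambda_s - 1.0)


z0, z1, z2 = ibd_sharing_no_dominance(2.0)  # hypothetical sibling risk ratio
print(z0, z1, z2, z0 + z1 + z2)
```

With no dominance variance the proportions sum to 1 and $z_1$ stays at 1/2, so the excess sharing shows up as a shift from $z_0$ to $z_2$.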
IBD distribution (adapted from Kruglyak and Lander, 1995)
[Figure: marker genotypes of Sibling 1 and Sibling 2 along a chromosome from 0 to 100 cM, with the probability (0 to 1.00) that the pair shares 2, 1, or 0 alleles IBD at each position]
Quantitative trait sib-pair analysis
• Let $\phi_{1i}$, $\phi_{2i}$ denote the phenotypes of two siblings
• $D_i = \phi_{1i} - \phi_{2i}$
• $v_i$ represents the number of alleles shared IBD
• At the QTL, the variance of D depends on v
• So that $\sigma^2_0 > \sigma^2_1 > \sigma^2_2$, where $\sigma^2_j$ is the variance of the difference D when j alleles are shared
• How do we test this hypothesis?
Quantitative traits with complete information: Haseman-Elston
• $E(D_i^2 \mid v_i) = \alpha - \beta v_i$; $\beta = \sigma^2_g$ (additive genetic variance)
• Linear regression assures an ML estimate only if the noise process is normally distributed and uncorrelated with the dependent variable
• The squared difference $D^2$ does not necessarily follow these assumptions
• The standard error and the distribution of the test statistic are based on normal, uncorrelated error; thus, a t-test derived by dividing $\beta$ by its standard error is inappropriate
Quantitative traits with complete information: ML QTL variance estimation
• Derive direct estimates of $\sigma^2_j$ based on D for each value of v
• Assume the simple constraint
$\sigma^2_0 \ge \sigma^2_1 \ge \sigma^2_2$
• No dominance variance
$\sigma^2_1 = (\sigma^2_0 + \sigma^2_2)/2$
• How to deal with incomplete data?
Quantitative traits with complete information: Nonparametric QTL analysis
• Make no assumptions about the phenotypic distribution; Wilcoxon rank-sum test
• Rank sib pairs according to absolute D; rank(i) is the rank of the ith sib pair and s is a location in the genome
• $X_W(s) = \sum_{i=1}^{n} \mathrm{rank}(i)\, f(v_i)$
Quantitative traits with complete information: Nonparametric QTL analysis
• For f(v)
• Under no linkage, $X_W(s)$ has expectation 0 and variance $V = n(n+1)(2n+1)/12$
• Ratio $Z(s) = X_W(s) / V^{1/2}$
• Z(s) asymptotically distributed
– standard normal
– Ornstein-Uhlenbeck diffusion process
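A small sketch of the rank statistic at a single location with complete IBD information. The weighting f(v) is not given in the surviving slide text; the choice f(0) = -1, f(1) = 0, f(2) = 1 used here is an assumption, picked because it has mean 0 and variance 1/2 under Mendelian sharing and so matches the stated variance V. Ties in |D| are not handled, and the data are hypothetical.

```python
import math


def rank_statistic(abs_diffs, ibd_counts, f=None):
    """Return (X_W, Z) for one genome location given |D_i| and v_i per sib pair."""
    if f is None:
        f = {0: -1, 1: 0, 2: 1}  # assumed weights; not stated in the slides
    n = len(abs_diffs)
    order = sorted(range(n), key=lambda i: abs_diffs[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):  # rank 1 = smallest |D|
        ranks[i] = rank
    x_w = sum(ranks[i] * f[ibd_counts[i]] for i in range(n))
    variance = n * (n + 1) * (2 * n + 1) / 12.0
    return x_w, x_w / math.sqrt(variance)


abs_diffs = [0.1, 0.5, 0.9, 1.4]   # hypothetical |D_i|
ibd_counts = [2, 1, 1, 0]          # hypothetical v_i
print(rank_statistic(abs_diffs, ibd_counts))
```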
Lecture 5a: Mapping in humans (2 of 2)
• Linkage disequilibrium
• Allele frequency estimation
• Association analysis
Linkage equilibrium and disequilibrium
• The linkage analyses so far discussed assume linkage equilibrium
• All possible combinations of alleles on a single chromosome (all possible haplotypes or all possible gamete genotypes) occur as frequently as would be predicted from the random association of individual allele frequencies
For example, assume that: A = 0.2, a = 0.8, M = 0.6, m = 0.4
Haplotype   Expected frequency
AM          0.2 x 0.6 = 0.12
Am          0.2 x 0.4 = 0.08
aM          0.8 x 0.6 = 0.48
am          0.8 x 0.4 = 0.32
Total       1.00
Disequilibrium = D = observed frequency - expected frequency
Haplotype   Observed   O - E = D
AM          .04        .04 - .12 = -0.08
Am          .16        .16 - .08 = +0.08
aM          .56        .56 - .48 = +0.08
am          .24        .24 - .32 = -0.08
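The table above can be reproduced with a few lines of Python. This sketch takes the allele frequencies of A and M and the observed haplotype frequencies from the example; the function name is illustrative.

```python
def disequilibrium(p_a_allele, p_m_allele, observed):
    """Return D = observed - expected for each of the four haplotypes."""
    expected = {
        "AM": p_a_allele * p_m_allele,
        "Am": p_a_allele * (1 - p_m_allele),
        "aM": (1 - p_a_allele) * p_m_allele,
        "am": (1 - p_a_allele) * (1 - p_m_allele),
    }
    return {hap: observed[hap] - expected[hap] for hap in expected}


observed = {"AM": 0.04, "Am": 0.16, "aM": 0.56, "am": 0.24}
for hap, d in disequilibrium(0.2, 0.6, observed).items():
    print(f"{hap}: D = {d:+.2f}")
```

The four deviations have the same magnitude and alternate in sign, which is why a single value of D describes a two-allele, two-locus system.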
Comments on linkage disequilibrium
• Dmax is determined by setting one of the haplotypes involving the least common allele at a frequency of zero
– Dmax = 0.12, if frequency of AM were zero
– The absolute Dmax is 0.25 for any two-locus system (reached when all four allele frequencies are 0.5, so that each expected haplotype frequency is 0.25)
• Effect on linkage analysis
– If no assumptions are made about any genotype, D is not relevant
– If the genotype of one or more individuals must be guessed, the total lod score is less accurate
Linkage disequilibrium between marker and trait loci
• Most cases of trait are due to relatively few distinct ancestral mutations at trait-causing locus
• Allele A was present on an ancestral chromosome and lies close enough to the trait-causing locus that linkage has not been thoroughly “shuffled” in the population’s history
• Young mutation in an isolated population
Association Studies
• Disregard familial patterns of inheritance
• Case-control studies
• Allele A is associated with a trait if it is significantly more frequent among affecteds as compared to unrelated controls
• 2 x 2 contingency $\chi^2$ test
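A minimal sketch of the 2 x 2 test for a case-control comparison. The counts are hypothetical, no continuity correction is applied, and 3.841 is the standard $\chi^2$ critical value for 1 degree of freedom at the 0.05 level.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the table [[a, b], [c, d]] (rows: cases, controls)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05

# Hypothetical counts: allele present / absent among 100 cases and 100 controls
chi2 = chi_square_2x2(90, 10, 9, 91)
print(f"chi-square = {chi2:.2f}, associated = {chi2 > CRITICAL_1DF_05}")
```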
Association studies
• Choice of control group is a major issue
– Not an issue in linkage or allele-sharing methods; why?
• Association studies are most meaningful when they involve alleles with direct biological relevance
Association studies and complex traits
• HLA complex (chrom. 6) implicated in etiology of autoimmune diseases
• HLA-B27 allele
– Occurs in 90% of patients with ankylosing spondylitis
– Only 9% of the general population
• Type I diabetes, rheumatoid arthritis, multiple sclerosis, systemic lupus, late-onset Alzheimer’s disease
Three competing hypotheses (Hn) for positive associations
• H1: Allele is actually a cause of the disease
• H2: Allele is in linkage disequilibrium with the actual cause (syntenic with trait-causing allele)
• Recall that for D
– Most cases of the trait are due to relatively few distinct ancestral mutations at the trait-causing locus
– Allele A was present on one of these ancestral chromosomes and lies close enough to the trait-causing locus such that linkage has not been thoroughly “shuffled” in the population’s history
– Young mutation in an isolated population
Three competing hypotheses (Hn) for positive associations
• H3: Artifact of population admixture
• A trait present at a higher frequency in an ethnic group will be positively associated with any allele that happens to be more common in that group
• For example (Lander and Schork, 1994):
– eating with chopsticks in San Francisco
– HLA-A1 allele (more common among Asians than Caucasians)