Lecture 15: Linkage Analysis VII
DESCRIPTION
Lecture 15: Linkage Analysis VII. Date: 10/14/02. Correction: power calculation. Lander-Green Algorithm (titles on updated or added slides highlighted). Sample Size Calculation. What is the sample size needed in order to achieve a particular statistical power for an estimate?
![Page 1: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/1.jpg)
Lecture 15: Linkage Analysis VII
Date: 10/14/02
Correction: power calculation
Lander-Green Algorithm
(Titles on updated or added slides highlighted)
![Page 2: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/2.jpg)
Sample Size Calculation
What is the sample size needed in order to achieve a particular statistical power for an estimate?
We shall assume the relevant test statistic follows a chi-square distribution.
![Page 3: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/3.jpg)
Sample Size Calculation (cont.)
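$$P\left(\chi^2_{df,\,c} > \chi^2_{df,\,\alpha}\right) = 1 - \beta$$
where $\chi^2_{df,c}$ denotes a non-central chi-square variable.
$1-\beta$ is the statistical power.
$\chi^2_{df,\alpha}$ is the critical value to reject H0 with significance level $\alpha$.
$c$ is the non-centrality parameter, usually the expectation of the log-likelihood ratio test statistic under a particular HA and experimental conditions.
$df$ is the degrees of freedom.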
![Page 4: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/4.jpg)
Sample Size Calculation (cont.)
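$\chi^2_{c,\,df,\,1-\beta}$ is the critical value for the non-central chi-square with noncentrality parameter $c$ and $df$ degrees of freedom.
[Equation not recoverable from the transcript. Surviving symbols: $c$, $n$, $E$, $G^2$, and chi-square critical values with $df$ degrees of freedom.]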
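The power relation above can be explored numerically. The following is a minimal sketch under stated assumptions, not the lecture's own calculation: it assumes $df = 1$ (so the non-central chi-square variable can be written as $(Z + \sqrt{c})^2$ with $Z$ standard normal), and it assumes the noncentrality grows linearly with sample size, $c = n \cdot c_1$, where $c_1$ is a per-observation contribution supplied by the user. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_df1(c, alpha):
    """Power of a 1-df chi-square test when the statistic is non-central
    chi-square with noncentrality c, using X = (Z + sqrt(c))**2."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # square root of the central critical value
    root_c = math.sqrt(c)
    upper = 1 - _STD_NORMAL.cdf(z_crit - root_c)  # P(Z + sqrt(c) >  z_crit)
    lower = _STD_NORMAL.cdf(-z_crit - root_c)     # P(Z + sqrt(c) < -z_crit)
    return upper + lower

def required_noncentrality(power, alpha, upper_bound=1000.0, tol=1e-10):
    """Bisection for the c at which power_df1(c, alpha) equals the target power."""
    lo, hi = 0.0, upper_bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_df1(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size(power, alpha, c_per_observation):
    """Smallest n with n * c_per_observation >= required noncentrality
    (assumes c scales linearly with n)."""
    return math.ceil(required_noncentrality(power, alpha) / c_per_observation)

# Illustrative input only: c_per_observation = 0.1 is a made-up value.
print(sample_size(power=0.8, alpha=0.05, c_per_observation=0.1))
```

For other degrees of freedom the same bisection applies, but the power function would need a general non-central chi-square tail probability.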
![Page 5: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/5.jpg)
Modeling
Test your modeling skills. Propose a model for the following family ascertainment situation.
What if you knew that probands were detected independently and with the same probability in each family, except that all secondary probands (second, third, etc., all to the same degree) are more easily detected than the first proband in a family?
The model formulation and calculation of pr probabilities for families with 3 affected are now posted to the website.
![Page 6: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/6.jpg)
Lander-Green Algorithm
Like the Elston-Stewart algorithm, the Lander-Green algorithm models the pedigree and data as a Hidden Markov Model (HMM), except that the hidden states are the so-called inheritance vectors.
Like the Elston-Stewart algorithm, the Lander-Green algorithm assumes that there is no interference.
![Page 7: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/7.jpg)
LG – (Dis)Advantages
The Lander-Green algorithm is linear in the number of loci and exponential in the number of members in the pedigree.
Recall that the Elston-Stewart algorithm is complementary, linear in the number of members, but exponential in the number of loci.
Simulation methods (MCMC in particular) are used to deal with pedigrees with both high numbers of members and loci.
![Page 8: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/8.jpg)
LG – Inheritance Vector
The inheritance vector is a vector defined for each locus i in the dataset.
It is a binary vector with two components for each non-founder individual in the pedigree. Thus, for a pedigree of n members of whom f are founders, it is of length 2(n – f).
The entry in the inheritance vector is 0 if the individual’s allele at that position is grandmaternal. If grandpaternal, it is 1. There are $2^{2(n-f)}$ possible inheritance vectors for each locus.
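To make the size of the state space concrete, here is a small sketch that enumerates every inheritance vector for a given pedigree size. The function name and the example pedigree sizes are illustrative assumptions.

```python
from itertools import product

def inheritance_vectors(n_members, n_founders):
    """List every binary inheritance vector of length 2(n - f).

    Each non-founder contributes two bits (maternal and paternal gamete);
    0 = grandmaternal allele, 1 = grandpaternal allele, as on the slide.
    """
    length = 2 * (n_members - n_founders)
    return list(product((0, 1), repeat=length))

# A trio (two founders, one child) has 2**2 = 4 inheritance vectors.
trio = inheritance_vectors(n_members=3, n_founders=2)
print(len(trio), trio)

# Nine members with four founders: 2**10 vectors per locus.
print(len(inheritance_vectors(n_members=9, n_founders=4)))
```

The exponential growth in the number of non-founders is what makes the Lander-Green algorithm expensive for large pedigrees.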
![Page 9: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/9.jpg)
LG – Inheritance Vector (cont)
The inheritance vector holds information about the number of crossovers that occurred to produce each non-founder in the pedigree.
Thus, it is appropriate for estimating recombination fractions as is our goal here with the LG algorithm.
![Page 10: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/10.jpg)
LG – Inheritance Vector Example
[Figure: pedigree of nine individuals, numbered 1–9, each labeled with its genotype (AA, Aa, aA, or aa).]
Gamete v
4M 0|1
4P 0|1
5M 0|1
5P 0|1
7M 1
7P 0|1
8M 0
8P 0|1
9M 0
9P 0|1
![Page 11: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/11.jpg)
LG – Simplification by Conditioning
Fortunately, conditional on the inheritance vectors, the genotypes of each offspring are independent.
Of course, conditional on the genotype, the phenotype probabilities are independent.
Thus, we can calculate the probability for each individual in the pedigree independently of the others once we condition on the inheritance vectors.
![Page 12: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/12.jpg)
LG – Hidden States
The inheritance vector constitutes the unknown hidden state for each allele. We must define transition probabilities among the hidden states (from locus-to-locus).
Begin by considering the transition probability between loci within a single individual, where the inheritance vector is of length 2.
Therefore, the hidden state at each locus is a binary vector of length 2.
![Page 13: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/13.jpg)
LG – Initial State
We must define the initial state of the first marker locus.
Prior to viewing the genotypes, all inheritance vectors are equally likely.
Assume the initial state of the inheritance vector at marker 1 is uniform over {(0,0), (1,0), (0,1), (1,1)}, where we list the maternal status first. In other words, marker 1 has ¼ probability of being in each of these possible states.
![Page 14: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/14.jpg)
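LG – Pairwise Transition Probabilities
Because of the assumption of no interference, the transition probabilities from the state at locus i to the state at locus i+1 are given by:
$$
\begin{array}{c|cccc}
 & (0,0) & (1,0) & (0,1) & (1,1) \\ \hline
(0,0) & (1-\theta_i)^2 & \theta_i(1-\theta_i) & \theta_i(1-\theta_i) & \theta_i^2 \\
(1,0) & \theta_i(1-\theta_i) & (1-\theta_i)^2 & \theta_i^2 & \theta_i(1-\theta_i) \\
(0,1) & \theta_i(1-\theta_i) & \theta_i^2 & (1-\theta_i)^2 & \theta_i(1-\theta_i) \\
(1,1) & \theta_i^2 & \theta_i(1-\theta_i) & \theta_i(1-\theta_i) & (1-\theta_i)^2
\end{array}
$$
where $\theta_i$ is the recombination fraction between locus i and locus i+1.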
![Page 15: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/15.jpg)
LG – Switch in Notation
From this point on, assume there are n non-founders (rather than n – f).
The reason for this change is simplification of the equations.
![Page 16: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/16.jpg)
LG – Inheritance Vector Transition Probabilities
The transition probabilities between inheritance vectors defined on full pedigrees with n relevant members are given by
$$P(v_{i+1} = w \mid v_i = v) = \theta_i^{\,d(v,w)}\,(1-\theta_i)^{2n - d(v,w)}$$
where d(v,w) is the Hamming distance between inheritance vectors v and w, i.e. the number of discordances between them.
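The Hamming-distance formula can be checked with a short sketch that builds the full transition matrix for n non-founders. With n = 1 it should reproduce the 4 × 4 single-individual case, and every row should sum to 1. Function names are hypothetical, and the ordering of states follows `itertools.product`, which is an arbitrary choice.

```python
from itertools import product

def hamming(v, w):
    """Number of positions at which two inheritance vectors differ."""
    return sum(a != b for a, b in zip(v, w))

def transition_prob(v, w, theta):
    """theta**d * (1 - theta)**(2n - d), where 2n = len(v) and d = d(v, w)."""
    d = hamming(v, w)
    return theta ** d * (1 - theta) ** (len(v) - d)

def transition_matrix(n_nonfounders, theta):
    """Return (states, matrix) with matrix[j][k] = P(w = states[k] | v = states[j])."""
    states = list(product((0, 1), repeat=2 * n_nonfounders))
    matrix = [[transition_prob(v, w, theta) for w in states] for v in states]
    return states, matrix

states, T = transition_matrix(n_nonfounders=1, theta=0.1)
for v, row in zip(states, T):
    print(v, [round(p, 4) for p in row], "row sum:", round(sum(row), 12))
```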
![Page 17: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/17.jpg)
LG – Forward Variable
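Let $\alpha_i(v_i) = P(O_1,\dots,O_i, v_i \mid b, \theta)$, where $O_i$ is the data observed at locus $i$ and $b$ denotes the penetrance parameters.
$$\alpha_1(v_1) = P(O_1, v_1 \mid b) = \frac{1}{4}\,P(O_1 \mid v_1, b)$$
$$
\begin{aligned}
\alpha_{i+1}(v_{i+1}) &= P(O_1,\dots,O_{i+1}, v_{i+1} \mid b, \theta) \\
&= \sum_{v_i} P(O_1,\dots,O_i, v_i \mid b, \theta)\,P(O_{i+1}, v_{i+1} \mid O_1,\dots,O_i, v_i, b, \theta) \\
&= \sum_{v_i} \alpha_i(v_i)\,P(v_{i+1} \mid v_i, \theta_i)\,P(O_{i+1} \mid v_{i+1}, b)
\end{aligned}
$$
The factor 1/4 is the uniform initial probability for a single non-founder; with n non-founders it is $2^{-2n}$.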
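The forward recursion can be sketched in a few lines. This is an illustration under assumptions, not the lecture's implementation: the per-locus emission probabilities $P(O_i \mid v_i, b)$ are taken as given (computing them from pedigree genotypes is not shown), the initial distribution is uniform, and the function names and numbers are hypothetical.

```python
from itertools import product

def hamming(v, w):
    """Number of positions at which two inheritance vectors differ."""
    return sum(a != b for a, b in zip(v, w))

def forward_likelihood(emissions, thetas):
    """Return P(O) via the forward recursion.

    emissions[i][k] : assumed-known P(O_i | v_i = states[k], b)
    thetas[i]       : recombination fraction between locus i and locus i+1
    """
    n_states = len(emissions[0])
    n_bits = n_states.bit_length() - 1           # 2n bits give 2**(2n) states
    states = list(product((0, 1), repeat=n_bits))

    # alpha_1(v) = uniform prior * P(O_1 | v, b)
    alpha = [e / n_states for e in emissions[0]]
    for theta, emis in zip(thetas, emissions[1:]):
        new_alpha = []
        for k, w in enumerate(states):
            total = 0.0
            for j, v in enumerate(states):
                d = hamming(v, w)
                total += alpha[j] * theta ** d * (1 - theta) ** (n_bits - d)
            new_alpha.append(total * emis[k])
        alpha = new_alpha
    return sum(alpha)                            # sum over the last hidden state

# Illustrative numbers only: one non-founder (4 states), two loci.
example_emissions = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
print(forward_likelihood(example_emissions, thetas=[0.5]))
```

The cost is linear in the number of loci and quadratic in the number of states, which is itself exponential in the number of non-founders.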
![Page 18: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/18.jpg)
LG – Backward Variable
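$$\beta_l(v_l) = 1$$
$$
\begin{aligned}
\beta_i(v_i) &= P(O_{i+1},\dots,O_l \mid v_i, b, \theta) \\
&= \sum_{v_{i+1}} P(O_{i+1},\dots,O_l, v_{i+1} \mid v_i, b, \theta) \\
&= \sum_{v_{i+1}} P(v_{i+1} \mid v_i, \theta_i)\,P(O_{i+1} \mid v_{i+1}, b)\,P(O_{i+2},\dots,O_l \mid v_{i+1}, b, \theta) \\
&= \sum_{v_{i+1}} P(v_{i+1} \mid v_i, \theta_i)\,P(O_{i+1} \mid v_{i+1}, b)\,\beta_{i+1}(v_{i+1})
\end{aligned}
$$
where $l$ is the number of loci.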
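A matching sketch of the backward recursion follows. As before, the emission probabilities are assumed to be given, the initial distribution is assumed uniform, and the function names are hypothetical. The likelihood recovered from the backward variables at the first locus should agree with the one obtained from a forward pass.

```python
from itertools import product

def hamming(v, w):
    """Number of positions at which two inheritance vectors differ."""
    return sum(a != b for a, b in zip(v, w))

def backward_variables(emissions, thetas):
    """Return betas, where betas[i][j] is the backward variable at locus i
    for state j (loci indexed from 0; the last locus has all entries 1)."""
    n_states = len(emissions[0])
    n_bits = n_states.bit_length() - 1
    states = list(product((0, 1), repeat=n_bits))

    beta = [1.0] * n_states                      # backward variable at the last locus
    betas = [beta]
    # Walk from the last interval back to the first.
    for theta, emis in zip(reversed(thetas), reversed(emissions[1:])):
        previous = []
        for v in states:
            total = 0.0
            for k, w in enumerate(states):
                d = hamming(v, w)
                total += theta ** d * (1 - theta) ** (n_bits - d) * emis[k] * beta[k]
            previous.append(total)
        beta = previous
        betas.insert(0, beta)
    return betas

def likelihood_from_backward(emissions, thetas):
    """P(O) = sum_v (uniform prior) * P(O_1 | v, b) * beta_1(v)."""
    betas = backward_variables(emissions, thetas)
    n_states = len(emissions[0])
    return sum(e * b / n_states for e, b in zip(emissions[0], betas[0]))

# Illustrative numbers only: one non-founder (4 states), two loci.
print(likelihood_from_backward([[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]], [0.5]))
```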
![Page 19: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/19.jpg)
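LG – $\xi_i(v,w)$
$$
\begin{aligned}
\xi_i(v,w) &= P(v_i = v, v_{i+1} = w \mid O, \theta) \\
&= \frac{P(v_i = v, v_{i+1} = w, O \mid \theta)}{P(O \mid \theta)} \\
&= \frac{\alpha_i(v)\,P(w \mid v, \theta_i)\,P(O_{i+1} \mid w, b)\,\beta_{i+1}(w)}{P(O \mid \theta)} \\
&= \frac{\alpha_i(v)\,P(w \mid v, \theta_i)\,P(O_{i+1} \mid w, b)\,\beta_{i+1}(w)}{\sum_{v,w} \alpha_i(v)\,P(w \mid v, \theta_i)\,P(O_{i+1} \mid w, b)\,\beta_{i+1}(w)}
\end{aligned}
$$
Here $\alpha_i$ and $\beta_{i+1}$ are the forward and backward variables, $P(w \mid v, \theta_i)$ is the transition probability, and $b$ is the penetrance parameter.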
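The pairwise posterior can be sketched by combining a forward and a backward pass. This is a self-contained illustration under the same assumptions as before (emission probabilities given, uniform initial distribution); the function name `pair_posteriors` and the numbers are hypothetical. A useful sanity check is that the posteriors for each interval sum to 1.

```python
from itertools import product

def hamming(v, w):
    """Number of positions at which two inheritance vectors differ."""
    return sum(a != b for a, b in zip(v, w))

def pair_posteriors(emissions, thetas):
    """Return (states, xis), where xis[i][j][k] is the posterior probability
    of state j at locus i and state k at locus i+1, given all the data."""
    n_states = len(emissions[0])
    n_bits = n_states.bit_length() - 1
    states = list(product((0, 1), repeat=n_bits))
    idx = range(n_states)

    def trans(j, k, theta):
        d = hamming(states[j], states[k])
        return theta ** d * (1 - theta) ** (n_bits - d)

    # Forward pass, keeping every alpha_i.
    alphas = [[e / n_states for e in emissions[0]]]
    for theta, emis in zip(thetas, emissions[1:]):
        prev = alphas[-1]
        alphas.append([emis[k] * sum(prev[j] * trans(j, k, theta) for j in idx)
                       for k in idx])

    # Backward pass, keeping every beta_i.
    betas = [[1.0] * n_states]
    for theta, emis in zip(reversed(thetas), reversed(emissions[1:])):
        nxt = betas[0]
        betas.insert(0, [sum(trans(j, k, theta) * emis[k] * nxt[k] for k in idx)
                         for j in idx])

    likelihood = sum(alphas[-1])
    xis = []
    for i, theta in enumerate(thetas):
        xis.append([[alphas[i][j] * trans(j, k, theta)
                     * emissions[i + 1][k] * betas[i + 1][k] / likelihood
                     for k in idx] for j in idx])
    return states, xis

# Illustrative emission values only: one non-founder, three loci.
demo_emissions = [[0.2, 0.3, 0.1, 0.4], [0.5, 0.5, 0.1, 0.2], [0.9, 0.1, 0.3, 0.3]]
demo_states, demo_xis = pair_posteriors(demo_emissions, thetas=[0.1, 0.2])
print([round(sum(map(sum, xi)), 12) for xi in demo_xis])
```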
![Page 20: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/20.jpg)
LG – Baum’s Lemma
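Baum’s Lemma: Let
$$Q(\theta,\theta') = \sum_v P(v, O \mid \theta)\,\log P(v, O \mid \theta')$$
If
$$Q(\theta,\theta') \ge Q(\theta,\theta)$$
then
$$P(O \mid \theta') \ge P(O \mid \theta)$$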
![Page 21: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/21.jpg)
LG – Proof of Baum’s Lemma
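$$
\frac{P(O \mid \theta')}{P(O \mid \theta)} = \sum_v \frac{P(v,O \mid \theta)}{P(O \mid \theta)}\,\frac{P(v,O \mid \theta')}{P(v,O \mid \theta)},
\qquad \sum_v \frac{P(v,O \mid \theta)}{P(O \mid \theta)} = 1
$$
$$
\begin{aligned}
\log\frac{P(O \mid \theta')}{P(O \mid \theta)} &\ge \sum_v \frac{P(v,O \mid \theta)}{P(O \mid \theta)}\,\log\frac{P(v,O \mid \theta')}{P(v,O \mid \theta)} \\
&= \frac{Q(\theta,\theta') - Q(\theta,\theta)}{P(O \mid \theta)}
\end{aligned}
$$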
![Page 22: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/22.jpg)
LG – Jensen’s Inequality
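$E[f(x)] \le f(E[x])$ when $f(x)$ is a concave function.
$$
\begin{aligned}
\log\frac{P(O \mid \theta')}{P(O \mid \theta)} &= \log\sum_v \frac{P(v,O \mid \theta)}{P(O \mid \theta)}\,\frac{P(v,O \mid \theta')}{P(v,O \mid \theta)} \\
&= \log E\left[\frac{P(v,O \mid \theta')}{P(v,O \mid \theta)}\right] \\
&\ge E\left[\log\frac{P(v,O \mid \theta')}{P(v,O \mid \theta)}\right]
\end{aligned}
$$
where the expectation is taken over $v$ with weights $P(v,O \mid \theta)/P(O \mid \theta) = P(v \mid O, \theta)$.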
![Page 23: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/23.jpg)
LG – EM Algorithm
We maximize $Q(\theta,\theta')$ over $\theta'$ to increase the likelihood $P(O \mid \theta')$, conditional on the current parameter estimates $\theta$.
This may sound familiar. It is the M step of the EM algorithm, and the EM algorithm is how we maximize over a pedigree.
Details are shown below. Maximization is the difficult step. We show it first.
![Page 24: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/24.jpg)
LG - Maximization
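$$
\begin{aligned}
Q(\theta,\theta') &= \sum_v P(v,O \mid \theta)\,\log P(v,O \mid \theta') \\
&= \sum_v P(v,O \mid \theta)\,\log\left[P(v_1)\,P(O_1 \mid v_1)\,P(v_2 \mid v_1,\theta'_1)\,P(O_2 \mid v_2)\cdots\right] \\
\frac{\partial Q(\theta,\theta')}{\partial\theta'_i} &= \sum_v P(v,O \mid \theta)\,\frac{\partial}{\partial\theta'_i}\log\left[P(v_1)\,P(O_1 \mid v_1)\,P(v_2 \mid v_1,\theta'_1)\,P(O_2 \mid v_2)\cdots\right] \\
&= \sum_v P(v,O \mid \theta)\,\frac{\partial}{\partial\theta'_i}\log P(v_{i+1} \mid v_i,\theta'_i)
\end{aligned}
$$
Key step: by conditional independence, the probability $P(v,O \mid \theta')$ becomes a product of conditional probabilities.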
![Page 25: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/25.jpg)
LG - Maximization
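$$
\begin{aligned}
\frac{\partial Q(\theta,\theta')}{\partial\theta'_i} &= \sum_v P(v,O \mid \theta)\,\frac{\partial}{\partial\theta'_i}\log P(v_{i+1} \mid v_i,\theta'_i) \\
&= \sum_v P(v,O \mid \theta)\,\frac{\partial}{\partial\theta'_i}\log\left[{\theta'_i}^{\,d_i}\,(1-\theta'_i)^{2n-d_i}\right] \\
&= \sum_v P(v,O \mid \theta)\left[\frac{d_i}{\theta'_i} - \frac{2n-d_i}{1-\theta'_i}\right] \\
&= \sum_v P(v,O \mid \theta)\,\frac{d_i\,(1-\theta'_i) - (2n-d_i)\,\theta'_i}{\theta'_i\,(1-\theta'_i)}
\end{aligned}
$$
where $d_i = d(v_i, v_{i+1})$.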
![Page 26: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/26.jpg)
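LG – EM Algorithm (M Step)
Setting the derivative to zero at $\theta'_i = \hat\theta'_i$:
$$\sum_v P(v,O \mid \theta)\left[d_i\,(1-\hat\theta'_i) - (2n-d_i)\,\hat\theta'_i\right] = 0$$
$$
\hat\theta'_i = \frac{\sum_v P(v,O \mid \theta)\,d_i}{2n\sum_v P(v,O \mid \theta)}
= \frac{\sum_v P(v \mid O,\theta)\,P(O \mid \theta)\,d_i}{2n\sum_v P(v \mid O,\theta)\,P(O \mid \theta)}
= \frac{E\left[d_i \mid O,\theta\right]}{2n}
$$
where $d_i = d(v_i, v_{i+1})$.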
![Page 27: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/27.jpg)
LG – EM Algorithm (E Step)
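$$
E\left[d(v_i,v_{i+1}) \mid O,\theta\right] = \sum_{v,w} d(v,w)\,P(v_i = v, v_{i+1} = w \mid O,\theta) = \sum_{v,w} d(v,w)\,\xi_i(v,w)
$$
The sum runs over all pairs of inheritance vectors; the $\xi_i(v,w)$ are the usual conditional probabilities needed to calculate the expectation.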
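Given the pairwise posterior probabilities for one interval, the E and M steps reduce to a weighted average of Hamming distances. The sketch below assumes those posteriors are already available as a matrix indexed by pairs of states; the function names and the posterior values are hypothetical and purely illustrative.

```python
from itertools import product

def hamming(v, w):
    """Number of positions at which two inheritance vectors differ."""
    return sum(a != b for a, b in zip(v, w))

def expected_recombinations(states, xi):
    """E step: expected Hamming distance between adjacent inheritance vectors,
    where xi[j][k] is the posterior probability of the pair (states[j], states[k])."""
    return sum(hamming(v, w) * xi[j][k]
               for j, v in enumerate(states)
               for k, w in enumerate(states))

def m_step_theta(states, xi):
    """M step: updated recombination fraction = E[d | O] / (2n),
    where 2n is the length of an inheritance vector."""
    two_n = len(states[0])
    return expected_recombinations(states, xi) / two_n

# Made-up posterior for one non-founder (4 states): 90% of the mass on
# "no change" from (0,0), 10% on a single recombination.
em_states = list(product((0, 1), repeat=2))
em_xi = [[0.0] * 4 for _ in range(4)]
em_xi[0][0] = 0.9
em_xi[0][1] = 0.1
print(m_step_theta(em_states, em_xi))
```

Iterating this update, with the posteriors recomputed under the new recombination fractions each time, gives the EM algorithm described above.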
![Page 28: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/28.jpg)
Heterogeneity in Recombination Fraction
Allow for two recombination fraction parameters in each interval.
Allow for one recombination fraction in each interval and a universal constant relating male and female recombination fractions.
Use nested models to test for evidence of sex-based differences.
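A nested-model comparison of this kind is commonly carried out as a likelihood ratio test. The sketch below assumes the two models differ by exactly one parameter (for example, the universal male/female constant fixed at 1 versus freely estimated), that natural-log likelihoods are supplied, and that the usual 1-df chi-square approximation holds. The function name and the log-likelihood values are hypothetical.

```python
from statistics import NormalDist

def lrt_pvalue_df1(loglik_null, loglik_alt):
    """Likelihood ratio statistic G = 2(lnL_alt - lnL_null) and its p-value
    under a 1-df chi-square approximation: P(chi2_1 > G) = 2(1 - Phi(sqrt(G)))."""
    g = max(0.0, 2.0 * (loglik_alt - loglik_null))
    p_value = 2.0 * (1.0 - NormalDist().cdf(g ** 0.5))
    return g, p_value

# Made-up log-likelihoods for a sex-equal model and a sex-specific model.
g_stat, p_val = lrt_pvalue_df1(loglik_null=-100.0, loglik_alt=-98.0)
print(g_stat, round(p_val, 4))
```

Comparing the fully sex-specific model (two recombination fractions per interval) against the sex-equal model would involve more than one degree of freedom and would need a general chi-square tail probability.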
![Page 29: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/29.jpg)
Model Misspecification
Penetrance parameters and allele frequencies may be incorrectly specified.
The method is robust in the sense that the false positive rate for linkage is unaffected by misspecification of these parameters.
![Page 30: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/30.jpg)
Model Misspecification and Ascertainment
When ascertainment is made independent of disease state and marker loci, the method remains robust to misspecification in both.
When ascertainment is made with respect to disease state, then the method is robust to misspecification of the disease parameters.
![Page 31: Lecture 15: Linkage Analysis VII](https://reader033.vdocuments.mx/reader033/viewer/2022061605/56814541550346895db20b28/html5/thumbnails/31.jpg)
Effects on Power
Power in two-point linkage analysis is largely unaffected as long as the dominance is specified correctly.
Multipoint linkage analysis is much more sensitive to misspecification of the model. However, there is more information when model parameters are jointly estimated along with position.