interval mapping basics
TRANSCRIPT
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 1
2 Key Statistical Issues for QTLā¢ general notation and data structureā¢ recombination model
ā two linked markersā flanking markers to a QTLā map distance and map functions
ā¢ modelling the phenotypeā phenotype modelā model likelihoodā Bayesian posterior
ā¢ missing data concepts and algorithmsā¢ model selection
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 2
interval mapping basicsā¢ observed measurements
ā Y = phenotypic traitā X = markers & linkage map
ā¢ i = individual index 1,ā¦,nā¢ missing data
ā missing marker dataā Q = QT genotypes
ā¢ alleles QQ, Qq, or qq at locusā¢ unknown genetic architecture
ā Ī» = QT locus (or loci)ā Īø = genetic actionā m = number of QTL
ā¢ pr(Q|X,Ī»,m) recombination modelā grounded by linkage map, experimental crossā recombination yields multinomial for Q given X
ā¢ pr(Y|Q,Īø,m) phenotype modelā distribution shape (assumed normal here) ā unknown parameters Īø (could be non-parametric)
observed X Y
missing Q
unknown Ī» Īøafter
Sen Churchill (2001)
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 3
2.1 general notation and data structure
ā¢ Y = phenotype valuesā as concept and realized (observed) values
ā¢ X = marker genotypesā type of experimental crossā linkage map construction
ā¢ marker orders, positions, linkage phasesā observed marker genotypes (possibly with error)
ā¢ pr(Y,X) = joint probabilityā what we āknowā about Y and X for this experimentā usually assume linkage map is āknownā
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 4
conditional data likelihood
ā¢ condition on markers and linkage map
ā¢ pr(X) comprises information on linkage mapā not influenced by phenotypeā thus can āignoreā for QTL purposes
)(pr),(pr)|(pr
XXYXY =
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 5
unknown QTL genotypesā¢ usually have sparse linkage map of markers
ā want to condition on actual QTL genotype Qpr(Y|Q)
ā but actual QTL affecting phenotype not knownā¢ need to consider all possibilities
ā average pr(Y|Q) over all possible genotypes Qā weight by recombination pr(Q|X)
ā=Q
XQQYXY )|(pr)|(pr)|(pr
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 6
enter the (Greek) parametersā¢ Īø = genetic effects, or gene action
ā additive, dominance, epistasisā may include reference values
ā¢ grand mean (Āµ), environmental variance (Ļ2)
ā¢ Ī» = location(s) of QTLā measured along ālinearā genomeā related to recombination and map distance
ā==Q
XQQYXYXYL ),|(pr),|(pr),,|(pr),|,( Ī»ĪøĪ»ĪøĪ»Īø
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 7
2.2 recombination modelā¢ locus Ī» is distance along linkage map
ā identifies flanking marker regionā¢ flanking markers provide good approximation
ā map assumed known from earlier studyā inaccuracy slight using only flanking markers
ā¢ extend to next flanking markers if missing dataā could consider more complicated relationship
ā¢ but little change in results
pr(Q|X,Ī») = pr(geno | map, locus) āpr(geno | flanking markers, locus)
kX 1+kXQ?
Ī»
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 8
ā¢ backcross designā n individuals, 2 markersā follow one gamete
ā¢ recombinantsā Ab, aBā nR = n12 + n21
ā¢ non-recombinantsā ab, ABā nNR = n11 + n22
ā¢ recombination rate
2.2.1 two linked markersA B
22211211
2112Ėnnnn
nnnnr R
++++==
r
21122211
,,nnnnnn
aBAbABabRNR
RNR +=+=
n21
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 9
no linkage?ā¢ test for no linkage: r = 1/2ā¢ assumption: all individuals have same rate
ā implies binomial variation
ā¢ normal or chi-square test statisticn
rrrnnnn
nnnnr R )Ė1(Ė)Ė( var,Ė
22211211
2112 āā+++
+==
21
22 ~
)/()2/(Zor )1,0(~
)Ė(var2/1Ė Ļ
nnnnnN
rrZ
NRR
R ā=ā=
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 10
binomial probabilitiesbinomial probn = 30,100r = 0.4,0.2
normal approx
knkR rr
kn
kn āā
== )1()(pr
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 11
likelihood ratio and LOD testā¢ likelihood for linked markers
ā¢ likelihood for unlinked markers
ā¢ likelihood ratio and LOD
nCL )()( 21
21 =
22
10
21
2
217.)10log(2
)(log
~)log(2,)Ė1()Ė(2
GGLRLOD
LRGrrLR NRR nnn
===
=ā= Ļ
NRR nnR rCrrnnrL )1(),|(pr)( ā==
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 12
test statistic: distribution
ā¢ Z2 and G2 are generally close to each otherā Z2 based on properties of countsā G2 and LOD based on likelihood principleā both have approximate chi-square distribution
ā¢ (non)central chi-square distribution
22;1
22
21
22
)5.0(4,~,:5.0
~,:5.0
rnncpGZr
GZr
ncp ā=<
=
ĻĻ
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 13
backcross examples
ā¢ n=100 individuals, nR= 40 recombinantsā r = 0.4, se(r) = 0.049ā Z = -2.04, Z2 = 4.17, p-value = 0.041ā G2 = 4.03, LOD = 0.874, p-value = 0.045
ā¢ n=100 individuals, nR= 20 recombinantsā r = 0.2, se(r) = 0.04ā Z = -7.5, Z2 = 56.25, p-value < 0.0001ā G2 = 38.55, LOD = 8.37, p-value < 0.0001
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 14
backcross examples
ā¢ n=30 individuals, nR= 12 recombinantsā r = 0.4, se(r) = 0.089ā Z = -1.12, Z2 = 1.25, p-value = 0.26ā G2 = 1.21, LOD = 0.262, p-value = 0.27
ā¢ n=30 individuals, nR= 6 recombinantsā r = 0.2, se(r) = 0.073ā Z = -4.11, Z2 = 16.87, p-value < 0.0001ā G2 = 11.56, LOD = 2.51, p-value < 0.0001
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 15
simulations of LOD distribution
n=100,r=0.3,0.51000 sampleshistogram
chi-square curverescaled by 2log(10)
centralr = 0.5
non-centralr = 0.3
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 16
LOD and LR over possible rn = 30nR=12 or 6evaluate at
all possible rnot just ābestā
LR like a densityLOD is basis for
hypothesis test estimate interval
0.0 0.1 0.2 0.3 0.4 0.5
0.0
e+00
5.0
e-19
1.0
e-18
1.5
e-18
recombination rate
likel
ihoo
d ra
tio
0.0 0.1 0.2 0.3 0.4 0.5
-15
-10
-50
recombination rate
LOD
0.0 0.1 0.2 0.3 0.4 0.5
0.0
e+00
1.0
e-16
2.0
e-16
recombination rate
likel
ihoo
d ra
tio
0.0 0.1 0.2 0.3 0.4 0.5
-4-2
02
recombination rate
LOD
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 17
LR, LOD and p-values
p-value p-valueLR LOD 1 d.f. 2 d.f.10 1 0.0319 0.131.6 1.5 0.0086 0.0316100 2 0.0024 0.011000 3 0.0002 0.00110000 4 <0.0001 0.0001
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 18
LOD-based interval estimate for rpoint estimate interval estimate
from LOD peakdown 1.5 LOD(or 1 or 2 or ā¦)
n = 30, nR = 12, 6
nnr R /Ė =
0.20 0.25 0.30 0.35 0.40 0.45 0.50
-1.5
-1.0
-0.5
0.0
n=30,r=0.4
LOD
0.1 0.2 0.3 0.4
0.5
1.0
1.5
2.0
2.5
n=30,r=0.2
LOD
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 19
LOD-based interval calculationsconfidence 96.8% 99.1% 99.76%n nR r 1 LOD 1.5 LOD 2 LOD
30 3 0.1 0.03-0.25 0.02-0.29 0.01-0.33100 10 0.1 0.05-0.17 0.04-0.19 0.04-0.2130 6 0.2 0.08-0.37 0.06-0.42 0.05-0.46100 20 0.2 0.13-0.29 0.11-0.31 0.10-0.3330 12 0.4 0.23-0.50 0.19-0.50 0.17-0.50100 40 0.4 0.30-0.50 0.28-0.50 0.26-0.50
Note skew in intervals for small recombination rates.Note upper boundary of 0.5.
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 20
LOD-based intervals
0.00 0.10 0.20 0.30
3.0
3.5
4.0
4.5
n=30,r=0.1
LOD
0.1 0.2 0.3 0.4
0.5
1.0
1.5
2.0
2.5
n=30,r=0.2
LOD
0.20 0.30 0.40 0.50
-1.5
-1.0
-0.5
0.0
n=30,r=0.4
LOD
0.05 0.10 0.15 0.20
14.0
14.5
15.0
15.5
16.0
n=100,r=0.1
LOD
0.10 0.15 0.20 0.25 0.30
6.5
7.0
7.5
8.0
n=100,r=0.2
LOD
0.30 0.35 0.40 0.45 0.50
-1.0
-0.5
0.0
0.5
n=100,r=0.4
LOD
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 21
likelihood & Bayesian posteriorā¢ recall the likelihood and likelihood ratio:
ā¢ posterior turns likelihood into a densityā assume r may be any value prior to seeing dataā posterior = likelihood x prior / constant
NRR
NRR
nnn
nnR
rrrLR
rCrrnnrL
)1(2)(
)1(),|(pr)(
ā=
ā==
1),|(pr sumcurve or likelihoodunder area
/)(or /)(),|(pr
==
==
Rr
R
nnrLRA
ArLRArLnnr
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 22
LR and Bayes posteriorimagine LR as density
area under curve = 1pr(r | nR ) = LR(r) / A
what is probability that ris between 0.25 and 0.5?
where is interval with highest posterior mass? (HPD region)
example: n=30, nR =12,695% HPD regions
0.0 0.1 0.2 0.3 0.4 0.5
0.0
e+00
1.0
e-18
recombination rate
likel
ihoo
d ra
tio
0.0 0.1 0.2 0.3 0.4 0.5
0.00
00.
010
0.02
0
recombination rate
dens
ity
0.0 0.1 0.2 0.3 0.4 0.5
0.0
e+00
1.5
e-16
recombination rate
likel
ihoo
d ra
tio
0.0 0.1 0.2 0.3 0.4 0.5
0.00
00.
010
0.02
0
recombination rate
dens
ity
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 23
HPD-based interval calculationsHPD level 96.8% 99.1% 99.76%n nR r
30 3 0.1 0.03-0.25 0.02-0.29 0.01-0.33100 10 0.1 0.05-0.17 0.05-0.19 0.04-0.2130 6 0.2 0.08-0.37 0.07-0.41 0.05-0.45100 20 0.2 0.13-0.29 0.12-0.31 0.10-0.3330 12 0.4 0.25-0.50 0.22-0.50 0.19-0.50100 40 0.4 0.31-0.50 0.30-0.50 0.28-0.50
Note how these almost agree with LOD-based intervals.Density height for HPD varies by n and r.
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 24
Bayesian posteriors for r
0.0 0.1 0.2 0.3 0.4 0.5
0.00
0.01
0.02
0.03
n=30,r=0.1
dens
ity
0.0 0.1 0.2 0.3 0.4 0.5
0.00
00.
010
0.02
0
n=30,r=0.2
dens
ity
0.0 0.1 0.2 0.3 0.4 0.5
0.00
00.
010
0.02
0
n=30,r=0.4
dens
ity
0.0 0.1 0.2 0.3 0.4 0.5
0.00
0.02
0.04
0.06
n=100,r=0.1
dens
ity
0.0 0.1 0.2 0.3 0.4 0.5
0.00
0.01
0.02
0.03
0.04
0.05
n=100,r=0.2
dens
ity
0.0 0.1 0.2 0.3 0.4 0.5
0.00
0.01
0.02
0.03
0.04
n=100,r=0.4
dens
ity
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 25
ā¢ Reverend Thomas Bayes (1702-1761)ā part-time mathematicianā buried in Bunhill Cemetary, Moongate, Londonā famous paper in 1763 Phil Trans Roy Soc Londonā was Bayes the first with this idea? (Laplace)
ā¢ billiard balls on rectangular tableā two balls tossed at random (uniform) on tableā where is first ball if the second is to its left (right)?
who was Bayes?
prior pr(Īø) = 1likelihood pr(Y | Īø)= ĪøY(1- Īø)1-Y
posterior pr(Īø |Y)= ?
Īø
Y=1 Y=0
firstsecond
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 26
where is the first ball?
=ā=
=
=ā=
=
ā« ā
0)1(212
)|(pr
21)1()(pr
)(pr)(pr)|(pr)|pr(
1
0
1
YY
Y
dY
YYY
YY
ĪøĪø
Īø
ĪøĪøĪø
ĪøĪøĪø
prior pr(Īø) = 1likelihood pr(Y | Īø)= ĪøY(1- Īø)1-Y
posterior pr(Īø |Y)= ?
Īø
Y=1 Y=0
first ball
second ball
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
Y = 1
Y = 0
Īø
pr(Īø
|Y)
(now throw second ball n times)
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 27
what is Bayes theorem?ā¢ before and after observing data
ā prior: pr(Īø) = pr(parameters)ā posterior: pr(Īø|Y) = pr(parameters|data)
ā¢ posterior = likelihood * prior / constantā usual likelihood of parameters given dataā normalizing constant pr(Y) depends only on data
ā¢ constant often drops out of calculation
)(pr)(pr)|(pr
)(pr),(pr)|(pr
YY
YYY ĪøĪøĪøĪø Ć==
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 28
Bayes rule for recombination r
ā«=
Ć=
ā¤ā¤ā¤ā=ā¤ā¤
ā==
2/1
02),|(pr)|(pr
:constant g normalizin)|(pr
)(pr),|(pr),|(pr
:rule Bayes2/10
)(2)pr(: ion recombinat onprior
)1(),|(),|(pr
:likelihood
drrnnnn
nnrrnnnnr
baabbra
rrCrnnrLrnn
RR
R
RR
nnRR
NRR
0.0 0.1 0.2 0.3 0.4 0.5
01
23
45
posterior & prior
rela
tive
frequ
ency n=30
nR=12
0.0 0.1 0.2 0.3 0.4 0.5
01
23
45
posterior & prior
rela
tive
frequ
ency n=30
nR=6
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 29
two markers in F2 intercrossā¢ two meioses
ā follow both gametesā 16 possibilities
ā¢ ambiguity with AaBbā 0 or 2 recombinations
ā¢ log likelihood ratio:lkjlj
( ))5.0(/)(logsumlog xxxx frfnLR =
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 30
two markers in F2 intercrossā¢ Ī³ = probability of double recombinant
ā for AaBb genotype, haplotype not knownā need to āguessā the recombinant fraction of n11 offspring
ā¢ Ī³ and r are inter-relatedā no āclosedā solution, need to iterateā guess Ī³ , use to estimate r, improve Ī³ , etc.
[ ]) Ė(2)(21Ė
)1(AaBb
aBAbpr
11200221121001
22
2|nnnnnnn
nr
rrr
Ī³
Ī³
++++++=
+ā=
=
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 31
EM algorithm for F2 recombinationā¢ initial guess: r = 0.5, Ī³ = 0.5ā¢ Expectation (E) step
ā substitute expected values for nuisance Ī³ā update Ī³ given current value of r
ā¢ Maximization (M) stepā maximize likelihood for parameter rā update r given current value of Ī³
ā¢ iterate E-step and M-step until āconvergenceāā stop when change in log-likelihood is small
ā usually change in r is small at this point( ))5.0(/)(logsumlog xxxx frfnLR =
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 32
2.2.2 flanking markers to QTL
ā¢ most genotype information is localā linkage drops off with distanceā approximate by using only flanking markersā exception: linkage disequilibrium
ā¢ different chromosome regions could be correlatedā¢ due to selection, etc.ā¢ not a problem for backcross or F2 intercross
ā¢ missing marker data: use next flanking marker
A QrAQBrQB
rAB
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 33
backcross QTL & flanking markers1 meiosis8 possible genotypes3 recombination ratessmall distances & rates?
no double crossoversĻ = rAQ /rAB
A QrAQBrQB
rAB
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 34
F2 QTL & flanking markers2 meioses27 possible genotypes3 recombination ratesEM steps on Ī³ and rABsmall distances & rates?
no double crossovers
A QrAQBrQB
rAB
22
2
)1(,/
ABAB
ABABAQ rr
rrr+ā
== Ī³Ļ
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 35
2.2.3 map distance & map functionsā¢ How to relate genetic linkage to physical distance?
ā math assumptions = crude approximationsā critical for map building, minor effect on QTL
ā¢ x = genetic map distance (1Morgan = 100cM)ā expected number of crossovers per meiosis between
two loci on a single chromatid strand (Sturtevant 1913)
ā¢ typical map functionsā Morgan: interference rAB = rAQ + rQB
ā Kosambi: intermediate rAB = (rAQ + rQB)/(1+4rAQ rQB)ā Haldane: no interference rAB = rAQ + rQB - 2rAQ rQB
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 36
2.3 modelling the phenotypeā¢ trait = mean + genetic + environmentā¢ pr( trait Y | genotype Q, effects Īø )
EGYGQY
Q
Q
++=
=
ĀµĻĪø ),(normal),|(pr 2
8 9 10 11 12 13 14 15 16 17 18 19 20
qq QQ
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 37
caution: donāt trust raw histograms!
8 9 10 11 12 13 14 15 16
qqQq
no QTL?
8 9 10 11 12 13 14 15 16
qqQq
skewed?
8 9 10 11 12 13 14 15 16 17 18 19 20 21
qqQq
dominance?
0 1 2 3 4 5 6 7 8 9
QqQQ
zeros?
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 38
2.3.1 phenotype modelā¢ how is phenotype related to genotype?ā¢ typical assumptions
ā normal environmental variationā¢ residuals e (not Y!) have bell-shaped histogram
ā genetic value GQ is composite of a few QTLā¢ other polygenic effects same across all individuals
ā genetic effect uncorrelated with environment
effects ),,(
),|(var,),|(
),0(~,
2
2
2
ĻĀµĪø
ĻĪøĀµĪø
ĻĀµ
Q
Q
Q
G
QYGQYE
NeeGY
=
=+=
++=
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 39
F2 intercross phenotype modelā¢ here assume only one QTLā¢ genotypes QQ, Qq, qqā¢ genotypic values GQQ, GQq, Gqq
ā¢ decompose as additive, dominance effects
222:Cockerham-Fisher:Jinx-Mather
qqQqQQ: genotype
Ī“Ī“Ī“ Ī±ĀµĀµĪ±ĀµĪ±ĀµĪ“ĀµĪ±Āµāā+ā+=
ā++==
Q
Q
GG
Q
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 40
2.3.2 model likelihoodā¢ why study the likelihood?
ā uncover hidden aspects of QTLā loci Ī», effects Īø, given data (Y,X)
ā¢ what is evidence to support a QTL?ā¢ where are the QTL? ā¢ how precise can estimate the loci & effects?ā¢ what genetic architecture is supported?
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 41
building the model likelihoodā¢ likelihood links phenotype & recombination
ā through unknown QTL genotypes Qā mixture over all possible genotypes
ā¢ contribution from individual i
ā¢ product over all individuals
),,|(pr prod),|,(
),|(pr),|(prsum prod),|,(
),|(pr),|(pr sum),,|(pr
Ī»ĪøĪ»ĪøĪ»ĪøĪ»Īø
Ī»ĪøĪ»Īø
iii
iiQi
iiQii
XYXYL
XQQYXYL
XQQYXY
=
=
=
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 42
and if there are no QTL?
ā¢ Y = Āµ + e, or L(Āµ | Y )ā¢ no relationship with markers & map Xā¢ for normal data, maximum likelihood yields
nYYs
nYYYNYL
ii
ii
/)(sumĖ/sumĖ
),|()|(
222
2
ā==
===
ā¢
ĻĀµ
ĻĀµĀµ
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 43
maximum likelihood & LODā¢ likelihood peaks at some (Īø,Ī»)
ā use āhatā (^) to signify value at maximumā¢ LOD profiles likelihood peak
ā find Īø to maximize likelihood for each Ī»ā profile (scan) loci Ī» over entire genome
=
==
)|(max),|,(maxlog),|(
),,|(pr prod),,|(pr),|,(
10 YLXYLXYLOD
XYXYXYL iii
ĀµĪ»ĪøĪ»
Ī»ĪøĪ»ĪøĪ»Īø
Āµ
Īø
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 44
2.3.3 Bayesian posterior
ā¢ treat unknowns as randomā build āuncertaintyā into model frameworkā genetic architecture: gene action Īø, QTL locus Ī»
ā¢ interpret weighted likelihood as a densityā weights based on prior ābeliefsā
)|(pr)(pr)|,(pr)|(pr
)|,(pr),,|(pr),|,(pr
XXXY
XXYXY
Ī»ĪøĪ»Īø
Ī»ĪøĪ»ĪøĪ»Īø
=
=
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 45
choice of Bayesian priorsā¢ elicited priors
ā higher weight for more probable parameter valuesā¢ based on prior empirical knowledge
ā use previous study to inform current studyā¢ weather prediction, previous QTL studies on related organisms
ā¢ conjugate priorsā convenient mathematical formā essential before computers, helpful now to simply computationā large variances on priors reduces their influence on posterior
ā¢ non-informative priors ā may have ānoā information on unknown parametersā prior with all parameter values equally likely
ā¢ may not sum to 1 (improper), which can complicate useā¢ always check sensitivity of posterior to choice of prior
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 46
incorporate missing genotypes Qā¢ augment data with missing genotypes Q
ā use recombination model to state uncertaintyā avoid likelihood mixture by augmentation
ā¢ marginal posterior on unknowns of interestā average over fully augmented posterior
),|,,(pr sum),|,(pr)|(pr
)|,(pr),|(pr),|(pr),|,,(pr
XYQXYXY
XXQQYXYQ
Q Ī»ĪøĪ»Īø
Ī»ĪøĪ»ĪøĪ»Īø
=
=
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 47
Bayesian parameter estimatesā¢ estimates are posterior means or modes
ā mean = weighted average of all possible valuesā mode = maximum
ā¢ can get joint or marginal estimates
( )),|,(pr sum argmaxĖ),|,(pr sumĖ
mode
,mean
XY
XY
Ī»ĪøĪø
Ī»ĪøĪøĪø
Ī»Īø
Ī»Īø
=
=
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 48
2.4 missing data concepts
ā¢ missing QTL genotype Q--see section 2.3ā¢ missing marker data X
ā errors in genotypingā difficulty reading signal (on gel)ā marker not fully informative
ā¢ distinguish full data X from observed Xobsā weighted average over all possible marker values
)|(pr),|(prsum),|(pr obsobs XXXQXQ X Ī»Ī» =