interval mapping basics

24
ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 1 2 Key Statistical Issues for QTL ā€¢ general notation and data structure ā€¢ recombination model ā€“ two linked markers ā€“ flanking markers to a QTL ā€“ map distance and map functions ā€¢ modelling the phenotype ā€“ phenotype model ā€“ model likelihood ā€“ Bayesian posterior ā€¢ missing data concepts and algorithms ā€¢ model selection ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 2 interval mapping basics ā€¢ observed measurements ā€“ Y = phenotypic trait ā€“ X = markers & linkage map ā€¢ i = individual index 1,ā€¦,n ā€¢ missing data ā€“ missing marker data ā€“ Q = QT genotypes ā€¢ alleles QQ, Qq, or qq at locus ā€¢ unknown genetic architecture ā€“ Ī» = QT locus (or loci) ā€“ Īø = genetic action ā€“ m = number of QTL ā€¢ pr(Q|X,Ī»,m) recombination model ā€“ grounded by linkage map, experimental cross ā€“ recombination yields multinomial for Q given X ā€¢ pr(Y|Q,Īø,m) phenotype model ā€“ distribution shape (assumed normal here) ā€“ unknown parameters Īø (could be non-parametric) observed X Y missing Q unknown Ī» Īø after Sen Churchill (2001)

Upload: others

Post on 11-Feb-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 1

2 Key Statistical Issues for QTLā€¢ general notation and data structureā€¢ recombination model

ā€“ two linked markersā€“ flanking markers to a QTLā€“ map distance and map functions

ā€¢ modelling the phenotypeā€“ phenotype modelā€“ model likelihoodā€“ Bayesian posterior

ā€¢ missing data concepts and algorithmsā€¢ model selection

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 2

interval mapping basicsā€¢ observed measurements

ā€“ Y = phenotypic traitā€“ X = markers & linkage map

ā€¢ i = individual index 1,ā€¦,nā€¢ missing data

ā€“ missing marker dataā€“ Q = QT genotypes

ā€¢ alleles QQ, Qq, or qq at locusā€¢ unknown genetic architecture

ā€“ Ī» = QT locus (or loci)ā€“ Īø = genetic actionā€“ m = number of QTL

ā€¢ pr(Q|X,Ī»,m) recombination modelā€“ grounded by linkage map, experimental crossā€“ recombination yields multinomial for Q given X

ā€¢ pr(Y|Q,Īø,m) phenotype modelā€“ distribution shape (assumed normal here) ā€“ unknown parameters Īø (could be non-parametric)

observed X Y

missing Q

unknown Ī» Īøafter

Sen Churchill (2001)

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 3

2.1 general notation and data structure

ā€¢ Y = phenotype valuesā€“ as concept and realized (observed) values

ā€¢ X = marker genotypesā€“ type of experimental crossā€“ linkage map construction

ā€¢ marker orders, positions, linkage phasesā€“ observed marker genotypes (possibly with error)

ā€¢ pr(Y,X) = joint probabilityā€“ what we ā€œknowā€ about Y and X for this experimentā€“ usually assume linkage map is ā€œknownā€

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 4

conditional data likelihood

ā€¢ condition on markers and linkage map

ā€¢ pr(X) comprises information on linkage mapā€“ not influenced by phenotypeā€“ thus can ā€œignoreā€ for QTL purposes

)(pr),(pr)|(pr

XXYXY =

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 5

unknown QTL genotypesā€¢ usually have sparse linkage map of markers

ā€“ want to condition on actual QTL genotype Qpr(Y|Q)

ā€“ but actual QTL affecting phenotype not knownā€¢ need to consider all possibilities

ā€“ average pr(Y|Q) over all possible genotypes Qā€“ weight by recombination pr(Q|X)

āˆ‘=Q

XQQYXY )|(pr)|(pr)|(pr

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 6

enter the (Greek) parametersā€¢ Īø = genetic effects, or gene action

ā€“ additive, dominance, epistasisā€“ may include reference values

ā€¢ grand mean (Āµ), environmental variance (Ļƒ2)

ā€¢ Ī» = location(s) of QTLā€“ measured along ā€œlinearā€ genomeā€“ related to recombination and map distance

āˆ‘==Q

XQQYXYXYL ),|(pr),|(pr),,|(pr),|,( Ī»ĪøĪ»ĪøĪ»Īø

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 7

2.2 recombination modelā€¢ locus Ī» is distance along linkage map

ā€“ identifies flanking marker regionā€¢ flanking markers provide good approximation

ā€“ map assumed known from earlier studyā€“ inaccuracy slight using only flanking markers

ā€¢ extend to next flanking markers if missing dataā€“ could consider more complicated relationship

ā€¢ but little change in results

pr(Q|X,Ī») = pr(geno | map, locus) ā‰ˆpr(geno | flanking markers, locus)

kX 1+kXQ?

Ī»

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 8

ā€¢ backcross designā€“ n individuals, 2 markersā€“ follow one gamete

ā€¢ recombinantsā€“ Ab, aBā€“ nR = n12 + n21

ā€¢ non-recombinantsā€“ ab, ABā€“ nNR = n11 + n22

ā€¢ recombination rate

2.2.1 two linked markersA B

22211211

2112Ė†nnnn

nnnnr R

++++==

r

21122211

,,nnnnnn

aBAbABabRNR

RNR +=+=

n21

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 9

no linkage?ā€¢ test for no linkage: r = 1/2ā€¢ assumption: all individuals have same rate

ā€“ implies binomial variation

ā€¢ normal or chi-square test statisticn

rrrnnnn

nnnnr R )Ė†1(Ė†)Ė†( var,Ė†

22211211

2112 āˆ’ā‰ˆ+++

+==

21

22 ~

)/()2/(Zor )1,0(~

)Ė†(var2/1Ė† Ļ‡

nnnnnN

rrZ

NRR

R āˆ’=āˆ’=

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 10

binomial probabilitiesbinomial probn = 30,100r = 0.4,0.2

normal approx

knkR rr

kn

kn āˆ’āˆ’

== )1()(pr

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 11

likelihood ratio and LOD testā€¢ likelihood for linked markers

ā€¢ likelihood for unlinked markers

ā€¢ likelihood ratio and LOD

nCL )()( 21

21 =

22

10

21

2

217.)10log(2

)(log

~)log(2,)Ė†1()Ė†(2

GGLRLOD

LRGrrLR NRR nnn

===

=āˆ’= Ļ‡

NRR nnR rCrrnnrL )1(),|(pr)( āˆ’==

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 12

test statistic: distribution

ā€¢ Z2 and G2 are generally close to each otherā€“ Z2 based on properties of countsā€“ G2 and LOD based on likelihood principleā€“ both have approximate chi-square distribution

ā€¢ (non)central chi-square distribution

22;1

22

21

22

)5.0(4,~,:5.0

~,:5.0

rnncpGZr

GZr

ncp āˆ’=<

=

Ļ‡Ļ‡

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 13

backcross examples

ā€¢ n=100 individuals, nR= 40 recombinantsā€“ r = 0.4, se(r) = 0.049ā€“ Z = -2.04, Z2 = 4.17, p-value = 0.041ā€“ G2 = 4.03, LOD = 0.874, p-value = 0.045

ā€¢ n=100 individuals, nR= 20 recombinantsā€“ r = 0.2, se(r) = 0.04ā€“ Z = -7.5, Z2 = 56.25, p-value < 0.0001ā€“ G2 = 38.55, LOD = 8.37, p-value < 0.0001

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 14

backcross examples

ā€¢ n=30 individuals, nR= 12 recombinantsā€“ r = 0.4, se(r) = 0.089ā€“ Z = -1.12, Z2 = 1.25, p-value = 0.26ā€“ G2 = 1.21, LOD = 0.262, p-value = 0.27

ā€¢ n=30 individuals, nR= 6 recombinantsā€“ r = 0.2, se(r) = 0.073ā€“ Z = -4.11, Z2 = 16.87, p-value < 0.0001ā€“ G2 = 11.56, LOD = 2.51, p-value < 0.0001

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 15

simulations of LOD distribution

n=100,r=0.3,0.51000 sampleshistogram

chi-square curverescaled by 2log(10)

centralr = 0.5

non-centralr = 0.3

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 16

LOD and LR over possible rn = 30nR=12 or 6evaluate at

all possible rnot just ā€œbestā€

LR like a densityLOD is basis for

hypothesis test estimate interval

0.0 0.1 0.2 0.3 0.4 0.5

0.0

e+00

5.0

e-19

1.0

e-18

1.5

e-18

recombination rate

likel

ihoo

d ra

tio

0.0 0.1 0.2 0.3 0.4 0.5

-15

-10

-50

recombination rate

LOD

0.0 0.1 0.2 0.3 0.4 0.5

0.0

e+00

1.0

e-16

2.0

e-16

recombination rate

likel

ihoo

d ra

tio

0.0 0.1 0.2 0.3 0.4 0.5

-4-2

02

recombination rate

LOD

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 17

LR, LOD and p-values

p-value p-valueLR LOD 1 d.f. 2 d.f.10 1 0.0319 0.131.6 1.5 0.0086 0.0316100 2 0.0024 0.011000 3 0.0002 0.00110000 4 <0.0001 0.0001

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 18

LOD-based interval estimate for rpoint estimate interval estimate

from LOD peakdown 1.5 LOD(or 1 or 2 or ā€¦)

n = 30, nR = 12, 6

nnr R /Ė† =

0.20 0.25 0.30 0.35 0.40 0.45 0.50

-1.5

-1.0

-0.5

0.0

n=30,r=0.4

LOD

0.1 0.2 0.3 0.4

0.5

1.0

1.5

2.0

2.5

n=30,r=0.2

LOD

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 19

LOD-based interval calculationsconfidence 96.8% 99.1% 99.76%n nR r 1 LOD 1.5 LOD 2 LOD

30 3 0.1 0.03-0.25 0.02-0.29 0.01-0.33100 10 0.1 0.05-0.17 0.04-0.19 0.04-0.2130 6 0.2 0.08-0.37 0.06-0.42 0.05-0.46100 20 0.2 0.13-0.29 0.11-0.31 0.10-0.3330 12 0.4 0.23-0.50 0.19-0.50 0.17-0.50100 40 0.4 0.30-0.50 0.28-0.50 0.26-0.50

Note skew in intervals for small recombination rates.Note upper boundary of 0.5.

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 20

LOD-based intervals

0.00 0.10 0.20 0.30

3.0

3.5

4.0

4.5

n=30,r=0.1

LOD

0.1 0.2 0.3 0.4

0.5

1.0

1.5

2.0

2.5

n=30,r=0.2

LOD

0.20 0.30 0.40 0.50

-1.5

-1.0

-0.5

0.0

n=30,r=0.4

LOD

0.05 0.10 0.15 0.20

14.0

14.5

15.0

15.5

16.0

n=100,r=0.1

LOD

0.10 0.15 0.20 0.25 0.30

6.5

7.0

7.5

8.0

n=100,r=0.2

LOD

0.30 0.35 0.40 0.45 0.50

-1.0

-0.5

0.0

0.5

n=100,r=0.4

LOD

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 21

likelihood & Bayesian posteriorā€¢ recall the likelihood and likelihood ratio:

ā€¢ posterior turns likelihood into a densityā€“ assume r may be any value prior to seeing dataā€“ posterior = likelihood x prior / constant

NRR

NRR

nnn

nnR

rrrLR

rCrrnnrL

)1(2)(

)1(),|(pr)(

āˆ’=

āˆ’==

1),|(pr sumcurve or likelihoodunder area

/)(or /)(),|(pr

==

==

Rr

R

nnrLRA

ArLRArLnnr

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 22

LR and Bayes posteriorimagine LR as density

area under curve = 1pr(r | nR ) = LR(r) / A

what is probability that ris between 0.25 and 0.5?

where is interval with highest posterior mass? (HPD region)

example: n=30, nR =12,695% HPD regions

0.0 0.1 0.2 0.3 0.4 0.5

0.0

e+00

1.0

e-18

recombination rate

likel

ihoo

d ra

tio

0.0 0.1 0.2 0.3 0.4 0.5

0.00

00.

010

0.02

0

recombination rate

dens

ity

0.0 0.1 0.2 0.3 0.4 0.5

0.0

e+00

1.5

e-16

recombination rate

likel

ihoo

d ra

tio

0.0 0.1 0.2 0.3 0.4 0.5

0.00

00.

010

0.02

0

recombination rate

dens

ity

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 23

HPD-based interval calculationsHPD level 96.8% 99.1% 99.76%n nR r

30 3 0.1 0.03-0.25 0.02-0.29 0.01-0.33100 10 0.1 0.05-0.17 0.05-0.19 0.04-0.2130 6 0.2 0.08-0.37 0.07-0.41 0.05-0.45100 20 0.2 0.13-0.29 0.12-0.31 0.10-0.3330 12 0.4 0.25-0.50 0.22-0.50 0.19-0.50100 40 0.4 0.31-0.50 0.30-0.50 0.28-0.50

Note how these almost agree with LOD-based intervals.Density height for HPD varies by n and r.

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 24

Bayesian posteriors for r

0.0 0.1 0.2 0.3 0.4 0.5

0.00

0.01

0.02

0.03

n=30,r=0.1

dens

ity

0.0 0.1 0.2 0.3 0.4 0.5

0.00

00.

010

0.02

0

n=30,r=0.2

dens

ity

0.0 0.1 0.2 0.3 0.4 0.5

0.00

00.

010

0.02

0

n=30,r=0.4

dens

ity

0.0 0.1 0.2 0.3 0.4 0.5

0.00

0.02

0.04

0.06

n=100,r=0.1

dens

ity

0.0 0.1 0.2 0.3 0.4 0.5

0.00

0.01

0.02

0.03

0.04

0.05

n=100,r=0.2

dens

ity

0.0 0.1 0.2 0.3 0.4 0.5

0.00

0.01

0.02

0.03

0.04

n=100,r=0.4

dens

ity

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 25

ā€¢ Reverend Thomas Bayes (1702-1761)ā€“ part-time mathematicianā€“ buried in Bunhill Cemetary, Moongate, Londonā€“ famous paper in 1763 Phil Trans Roy Soc Londonā€“ was Bayes the first with this idea? (Laplace)

ā€¢ billiard balls on rectangular tableā€“ two balls tossed at random (uniform) on tableā€“ where is first ball if the second is to its left (right)?

who was Bayes?

prior pr(Īø) = 1likelihood pr(Y | Īø)= ĪøY(1- Īø)1-Y

posterior pr(Īø |Y)= ?

Īø

Y=1 Y=0

firstsecond

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 26

where is the first ball?

=āˆ’=

=

=āˆ’=

=

āˆ« āˆ’

0)1(212

)|(pr

21)1()(pr

)(pr)(pr)|(pr)|pr(

1

0

1

YY

Y

dY

YYY

YY

ĪøĪø

Īø

ĪøĪøĪø

ĪøĪøĪø

prior pr(Īø) = 1likelihood pr(Y | Īø)= ĪøY(1- Īø)1-Y

posterior pr(Īø |Y)= ?

Īø

Y=1 Y=0

first ball

second ball

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.5

1.0

1.5

2.0

Y = 1

Y = 0

Īø

pr(Īø

|Y)

(now throw second ball n times)

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 27

what is Bayes theorem?ā€¢ before and after observing data

ā€“ prior: pr(Īø) = pr(parameters)ā€“ posterior: pr(Īø|Y) = pr(parameters|data)

ā€¢ posterior = likelihood * prior / constantā€“ usual likelihood of parameters given dataā€“ normalizing constant pr(Y) depends only on data

ā€¢ constant often drops out of calculation

)(pr)(pr)|(pr

)(pr),(pr)|(pr

YY

YYY ĪøĪøĪøĪø Ɨ==

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 28

Bayes rule for recombination r

āˆ«=

Ɨ=

ā‰¤ā‰¤ā‰¤āˆ’=ā‰¤ā‰¤

āˆ’==

2/1

02),|(pr)|(pr

:constant g normalizin)|(pr

)(pr),|(pr),|(pr

:rule Bayes2/10

)(2)pr(: ion recombinat onprior

)1(),|(),|(pr

:likelihood

drrnnnn

nnrrnnnnr

baabbra

rrCrnnrLrnn

RR

R

RR

nnRR

NRR

0.0 0.1 0.2 0.3 0.4 0.5

01

23

45

posterior & prior

rela

tive

frequ

ency n=30

nR=12

0.0 0.1 0.2 0.3 0.4 0.5

01

23

45

posterior & prior

rela

tive

frequ

ency n=30

nR=6

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 29

two markers in F2 intercrossā€¢ two meioses

ā€“ follow both gametesā€“ 16 possibilities

ā€¢ ambiguity with AaBbā€“ 0 or 2 recombinations

ā€¢ log likelihood ratio:lkjlj

( ))5.0(/)(logsumlog xxxx frfnLR =

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 30

two markers in F2 intercrossā€¢ Ī³ = probability of double recombinant

ā€“ for AaBb genotype, haplotype not knownā€“ need to ā€œguessā€ the recombinant fraction of n11 offspring

ā€¢ Ī³ and r are inter-relatedā€“ no ā€œclosedā€ solution, need to iterateā€“ guess Ī³ , use to estimate r, improve Ī³ , etc.

[ ]) Ė†(2)(21Ė†

)1(AaBb

aBAbpr

11200221121001

22

2|nnnnnnn

nr

rrr

Ī³

Ī³

++++++=

+āˆ’=

=

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 31

EM algorithm for F2 recombinationā€¢ initial guess: r = 0.5, Ī³ = 0.5ā€¢ Expectation (E) step

ā€“ substitute expected values for nuisance Ī³ā€“ update Ī³ given current value of r

ā€¢ Maximization (M) stepā€“ maximize likelihood for parameter rā€“ update r given current value of Ī³

ā€¢ iterate E-step and M-step until ā€œconvergenceā€ā€“ stop when change in log-likelihood is small

ā€“ usually change in r is small at this point( ))5.0(/)(logsumlog xxxx frfnLR =

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 32

2.2.2 flanking markers to QTL

ā€¢ most genotype information is localā€“ linkage drops off with distanceā€“ approximate by using only flanking markersā€“ exception: linkage disequilibrium

ā€¢ different chromosome regions could be correlatedā€¢ due to selection, etc.ā€¢ not a problem for backcross or F2 intercross

ā€¢ missing marker data: use next flanking marker

A QrAQBrQB

rAB

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 33

backcross QTL & flanking markers1 meiosis8 possible genotypes3 recombination ratessmall distances & rates?

no double crossoversĻ = rAQ /rAB

A QrAQBrQB

rAB

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 34

F2 QTL & flanking markers2 meioses27 possible genotypes3 recombination ratesEM steps on Ī³ and rABsmall distances & rates?

no double crossovers

A QrAQBrQB

rAB

22

2

)1(,/

ABAB

ABABAQ rr

rrr+āˆ’

== Ī³Ļ

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 35

2.2.3 map distance & map functionsā€¢ How to relate genetic linkage to physical distance?

ā€“ math assumptions = crude approximationsā€“ critical for map building, minor effect on QTL

ā€¢ x = genetic map distance (1Morgan = 100cM)ā€“ expected number of crossovers per meiosis between

two loci on a single chromatid strand (Sturtevant 1913)

ā€¢ typical map functionsā€“ Morgan: interference rAB = rAQ + rQB

ā€“ Kosambi: intermediate rAB = (rAQ + rQB)/(1+4rAQ rQB)ā€“ Haldane: no interference rAB = rAQ + rQB - 2rAQ rQB

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 36

2.3 modelling the phenotypeā€¢ trait = mean + genetic + environmentā€¢ pr( trait Y | genotype Q, effects Īø )

EGYGQY

Q

Q

++=

=

ĀµĻƒĪø ),(normal),|(pr 2

8 9 10 11 12 13 14 15 16 17 18 19 20

qq QQ

Qq

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 37

caution: donā€™t trust raw histograms!

8 9 10 11 12 13 14 15 16

qqQq

QQ

no QTL?

8 9 10 11 12 13 14 15 16

qqQq

QQ

skewed?

8 9 10 11 12 13 14 15 16 17 18 19 20 21

qqQq

QQ

dominance?

0 1 2 3 4 5 6 7 8 9

qq

QqQQ

zeros?

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 38

2.3.1 phenotype modelā€¢ how is phenotype related to genotype?ā€¢ typical assumptions

ā€“ normal environmental variationā€¢ residuals e (not Y!) have bell-shaped histogram

ā€“ genetic value GQ is composite of a few QTLā€¢ other polygenic effects same across all individuals

ā€“ genetic effect uncorrelated with environment

effects ),,(

),|(var,),|(

),0(~,

2

2

2

ĻƒĀµĪø

ĻƒĪøĀµĪø

ĻƒĀµ

Q

Q

Q

G

QYGQYE

NeeGY

=

=+=

++=

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 39

F2 intercross phenotype modelā€¢ here assume only one QTLā€¢ genotypes QQ, Qq, qqā€¢ genotypic values GQQ, GQq, Gqq

ā€¢ decompose as additive, dominance effects

222:Cockerham-Fisher:Jinx-Mather

qqQqQQ: genotype

Ī“Ī“Ī“ Ī±ĀµĀµĪ±ĀµĪ±ĀµĪ“ĀµĪ±Āµāˆ’āˆ’+āˆ’+=

āˆ’++==

Q

Q

GG

Q

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 40

2.3.2 model likelihoodā€¢ why study the likelihood?

ā€“ uncover hidden aspects of QTLā€“ loci Ī», effects Īø, given data (Y,X)

ā€¢ what is evidence to support a QTL?ā€¢ where are the QTL? ā€¢ how precise can estimate the loci & effects?ā€¢ what genetic architecture is supported?

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 41

building the model likelihoodā€¢ likelihood links phenotype & recombination

ā€“ through unknown QTL genotypes Qā€“ mixture over all possible genotypes

ā€¢ contribution from individual i

ā€¢ product over all individuals

),,|(pr prod),|,(

),|(pr),|(prsum prod),|,(

),|(pr),|(pr sum),,|(pr

Ī»ĪøĪ»ĪøĪ»ĪøĪ»Īø

Ī»ĪøĪ»Īø

iii

iiQi

iiQii

XYXYL

XQQYXYL

XQQYXY

=

=

=

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 42

and if there are no QTL?

ā€¢ Y = Āµ + e, or L(Āµ | Y )ā€¢ no relationship with markers & map Xā€¢ for normal data, maximum likelihood yields

nYYs

nYYYNYL

ii

ii

/)(sumĖ†/sumĖ†

),|()|(

222

2

āˆ’==

===

ā€¢

ĻƒĀµ

ĻƒĀµĀµ

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 43

maximum likelihood & LODā€¢ likelihood peaks at some (Īø,Ī»)

ā€“ use ā€œhatā€ (^) to signify value at maximumā€¢ LOD profiles likelihood peak

ā€“ find Īø to maximize likelihood for each Ī»ā€“ profile (scan) loci Ī» over entire genome

=

==

)|(max),|,(maxlog),|(

),,|(pr prod),,|(pr),|,(

10 YLXYLXYLOD

XYXYXYL iii

ĀµĪ»ĪøĪ»

Ī»ĪøĪ»ĪøĪ»Īø

Āµ

Īø

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 44

2.3.3 Bayesian posterior

ā€¢ treat unknowns as randomā€“ build ā€œuncertaintyā€ into model frameworkā€“ genetic architecture: gene action Īø, QTL locus Ī»

ā€¢ interpret weighted likelihood as a densityā€“ weights based on prior ā€œbeliefsā€

)|(pr)(pr)|,(pr)|(pr

)|,(pr),,|(pr),|,(pr

XXXY

XXYXY

Ī»ĪøĪ»Īø

Ī»ĪøĪ»ĪøĪ»Īø

=

=

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 45

choice of Bayesian priorsā€¢ elicited priors

ā€“ higher weight for more probable parameter valuesā€¢ based on prior empirical knowledge

ā€“ use previous study to inform current studyā€¢ weather prediction, previous QTL studies on related organisms

ā€¢ conjugate priorsā€“ convenient mathematical formā€“ essential before computers, helpful now to simply computationā€“ large variances on priors reduces their influence on posterior

ā€¢ non-informative priors ā€“ may have ā€œnoā€ information on unknown parametersā€“ prior with all parameter values equally likely

ā€¢ may not sum to 1 (improper), which can complicate useā€¢ always check sensitivity of posterior to choice of prior

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 46

incorporate missing genotypes Qā€¢ augment data with missing genotypes Q

ā€“ use recombination model to state uncertaintyā€“ avoid likelihood mixture by augmentation

ā€¢ marginal posterior on unknowns of interestā€“ average over fully augmented posterior

),|,,(pr sum),|,(pr)|(pr

)|,(pr),|(pr),|(pr),|,,(pr

XYQXYXY

XXQQYXYQ

Q Ī»ĪøĪ»Īø

Ī»ĪøĪ»ĪøĪ»Īø

=

=

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 47

Bayesian parameter estimatesā€¢ estimates are posterior means or modes

ā€“ mean = weighted average of all possible valuesā€“ mode = maximum

ā€¢ can get joint or marginal estimates

( )),|,(pr sum argmaxĖ†),|,(pr sumĖ†

mode

,mean

XY

XY

Ī»ĪøĪø

Ī»ĪøĪøĪø

Ī»Īø

Ī»Īø

=

=

ch. 2 Ā© 2003 Broman, Churchill, Yandell, Zeng 48

2.4 missing data concepts

ā€¢ missing QTL genotype Q--see section 2.3ā€¢ missing marker data X

ā€“ errors in genotypingā€“ difficulty reading signal (on gel)ā€“ marker not fully informative

ā€¢ distinguish full data X from observed Xobsā€“ weighted average over all possible marker values

)|(pr),|(prsum),|(pr obsobs XXXQXQ X Ī»Ī» =