seattle summer institute 2006 advanced qtl brian s. yandell...


Page 1: Seattle Summer Institute 2006 Advanced QTL Brian S. Yandell …pages.stat.wisc.edu/~yandell/talk/sisg/seattle2006/bsy... · 2006. 5. 23. · • 10 QTL ok, 50 QTL are too many •

QTL 2: Overview Seattle SISG: Yandell © 2006 1

Seattle Summer Institute 2006
Advanced QTL
Brian S. Yandell
University of Wisconsin-Madison

• Bayesian QTL mapping & model selection
• data examples in detail
• multiple phenotypes & microarrays
• software demo & automated strategy

QTL 2: Overview Seattle SISG: Yandell © 2006 2

contact information & resources

• email: [email protected]
• web: www.stat.wisc.edu/~yandell/statgen
  – QTL & microarray resources
  – references, software, people
• thanks:
  – students: Jaya Satagopan, Pat Gaffney, Fei Zou, Amy Jin, W. Whipple Neely
  – faculty/staff: Alan Attie, Michael Newton, Nengjun Yi, Gary Churchill, Hong Lan, Christina Kendziorski, Tom Osborn, Jason Fine, Tapan Mehta, Hao Wu, Samprit Banerjee, Daniel Shriner


QTL 2: Bayes Seattle SISG: Yandell © 2006 1

Bayesian Interval Mapping

1. what is goal of QTL study? 2-8
2. Bayesian QTL mapping 9-20
3. Markov chain sampling 21-27
4. sampling across architectures 28-34
5. epistatic interactions 35-42
6. comparing models 43-46

QTL 2: Bayes Seattle SISG: Yandell © 2006 2

1. what is the goal of QTL study?
• uncover underlying biochemistry
  – identify how networks function, break down
  – find useful candidates for (medical) intervention
  – epistasis may play key role
  – statistical goal: maximize number of correctly identified QTL
• basic science/evolution
  – how is the genome organized?
  – identify units of natural selection
  – additive effects may be most important (Wright/Fisher debate)
  – statistical goal: maximize number of correctly identified QTL
• select “elite” individuals
  – predict phenotype (breeding value) using suite of characteristics (phenotypes) translated into a few QTL
  – statistical goal: minimize prediction error


QTL 2: Bayes Seattle SISG: Yandell © 2006 3

cross two inbred lines
→ linkage disequilibrium
→ associations
→ linked segregating QTL

[figure: diagram relating QTL, Marker, and Trait (after Gary Churchill)]

QTL 2: Bayes Seattle SISG: Yandell © 2006 4

pragmatics of multiple QTL
• evaluate some objective for model given data
  – classical likelihood
  – Bayesian posterior
• search over possible genetic architectures (models)
  – number and positions of loci
  – gene action: additive, dominance, epistasis
• estimate “features” of model
  – means, variances & covariances, confidence regions
  – marginal or conditional distributions
• art of model selection
  – how to select “best” or “better” model(s)?
  – how to search over useful subset of possible models?


QTL 2: Bayes Seattle SISG: Yandell © 2006 5

advantages of multiple QTL approach
• improve statistical power, precision
  – increase number of QTL detected
  – better estimates of loci: less bias, smaller intervals
• improve inference of complex genetic architecture
  – patterns and individual elements of epistasis
  – appropriate estimates of means, variances, covariances
    • asymptotically unbiased, efficient
  – assess relative contributions of different QTL
• improve estimates of genotypic values
  – less bias (more accurate) and smaller variance (more precise)
  – mean squared error = MSE = (bias)² + variance

QTL 2: Bayes Seattle SISG: Yandell © 2006 6

Pareto diagram of QTL effects

[figure: additive effect versus rank order of QTL (0 to 30); major QTL numbered 1 to 5 on linkage map, followed by minor QTL and polygenes (modifiers)]


QTL 2: Bayes Seattle SISG: Yandell © 2006 7

limits of multiple QTL?
• limits of statistical inference
  – power depends on sample size, heritability, environmental variation
  – “best” model balances fit to data and complexity (model size)
  – genetic linkage = correlated estimates of gene effects
• limits of biological utility
  – sampling: only see some patterns with many QTL
  – marker assisted selection (Bernardo 2001 Crop Sci)
    • 10 QTL ok, 50 QTL are too many
    • phenotype better predictor than genotype when too many QTL
    • increasing sample size may not give multiple QTL any advantage
  – hard to select many QTL simultaneously
    • 3^m possible genotypes to choose from

QTL 2: Bayes Seattle SISG: Yandell © 2006 8

QTL below detection level?
• problem of selection bias
  – QTL of modest effect only detected sometimes
  – their effects are biased upwards when detected
• probability that QTL detected
  – avoids sharp in/out dichotomy
  – avoid pitfalls of one “best” model
  – examine “better” models with more probable QTL
• build m = number of QTL detected into QTL model
  – directly allow uncertainty in genetic architecture
  – model selection over genetic architecture


QTL 2: Bayes Seattle SISG: Yandell © 2006 9

2. Bayesian QTL mapping

• Reverend Thomas Bayes (1702-1761)
  – part-time mathematician
  – buried in Bunhill Cemetery, Moorgate, London
  – famous paper in 1763 Phil Trans Roy Soc London
  – was Bayes the first with this idea? (Laplace?)
• basic idea (from Bayes’ original example)
  – two billiard balls tossed at random (uniform) on table
  – where is first ball if the second is to its left?
    • prior: anywhere on the table
    • posterior: more likely toward right end of table
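The billiard example can be checked by simulation. The sketch below is only an illustration, assuming the table is the unit interval and both balls land uniformly on it; the function name and the number of draws are hypothetical choices, not part of the original slides.

```python
import random

def billiard_posterior_mean(n_draws=100_000, seed=1):
    """Average position of the first ball, given the second landed to its left.

    Assumption: both positions are independent uniform draws on [0, 1].
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        first, second = rng.random(), rng.random()
        if second < first:          # condition on "second ball is to the left"
            kept.append(first)
    return sum(kept) / len(kept)

prior_mean = 0.5                     # uniform prior: anywhere on the table
post_mean = billiard_posterior_mean()
print(prior_mean, round(post_mean, 2))
```

Under these assumptions the posterior mean moves from 1/2 toward the right end of the table (about 2/3), matching the qualitative statement on the slide.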

QTL 2: Bayes Seattle SISG: Yandell © 2006 10

Bayes posterior for normal data

[figure: two panels, small prior variance and large prior variance; x-axis: y = phenotype values (6 to 16); curves for prior, n small, n large; prior mean and actual mean marked]


QTL 2: Bayes Seattle SISG: Yandell © 2006 11

Bayes posterior for normal data

model: yi = µ + ei
environment: e ~ N( 0, σ² ), σ² known
likelihood: y ~ N( µ, σ² )
prior: µ ~ N( µ0, κσ² ), κ known

posterior: mean tends to sample mean
single individual: µ ~ N( µ0 + b1(y1 – µ0), b1σ² )
sample of n individuals:

$$\mu \sim N\!\left( b_n \bar{y}_\bullet + (1 - b_n)\mu_0,\; b_n \sigma^2 / n \right), \quad \text{with } \bar{y}_\bullet = \sum_{i \in \{1,\dots,n\}} y_i / n$$

fudge factor (shrinks to 1):

$$b_n = \frac{\kappa n}{\kappa n + 1} \to 1$$
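As a worked illustration of the normal posterior, the sketch below computes the posterior mean and variance of µ for a small sample. It assumes the standard conjugate result with weight b_n = κn/(κn + 1), which reduces to the single-individual case on the slide when n = 1; the function name and the example numbers are hypothetical.

```python
def normal_posterior(y, mu0, kappa, sigma2):
    """Posterior mean and variance of mu for y_i ~ N(mu, sigma2), mu ~ N(mu0, kappa*sigma2)."""
    n = len(y)
    ybar = sum(y) / n
    b_n = kappa * n / (kappa * n + 1)          # shrinkage weight, tends to 1 as n grows
    post_mean = b_n * ybar + (1 - b_n) * mu0   # pulled from sample mean toward prior mean
    post_var = b_n * sigma2 / n
    return post_mean, post_var, b_n

# hypothetical data: two phenotype values, prior mean 8, kappa = 1, sigma2 = 4
post_mean, post_var, b_n = normal_posterior([10.0, 12.0], mu0=8.0, kappa=1.0, sigma2=4.0)
print(post_mean, post_var, b_n)
```

With a larger κ (weaker prior) or a larger n, b_n approaches 1 and the posterior mean approaches the sample mean.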

QTL 2: Bayes Seattle SISG: Yandell © 2006 12

Bayesian QTL: key players
• observed measurements
  – y = phenotypic trait
  – m = markers & linkage map
  – i = individual index (1,…,n)
• missing data
  – missing marker data
  – q = QT genotypes
    • alleles QQ, Qq, or qq at locus
• unknown quantities
  – λ = QT locus (or loci)
  – µ = phenotype model parameters
  – H = QTL model/genetic architecture
• pr(q|m,λ,H) genotype model
  – grounded by linkage map, experimental cross
  – recombination yields multinomial for q given m
• pr(y|q,µ,H) phenotype model
  – distribution shape (assumed normal here)
  – unknown parameters µ (could be non-parametric)

[figure: graph linking y, m, q, λ, µ, H; alternative notation: observed X, Y; missing Q; unknown λ, θ; after Sen Churchill (2001)]


QTL 2: Bayes Seattle SISG: Yandell © 2006 13

pr(y|q,µ) phenotype model

[figure: phenotype densities for genotypes qq, Qq, QQ; x-axis: y = phenotype values (6 to 16); curves for prior, n small, n large]

QTL 2: Bayes Seattle SISG: Yandell © 2006 14

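Bayes posterior QTL means

posterior centered on sample genotypic mean but shrunken slightly toward overall mean

prior:

$$\mu_q \sim N\!\left( \bar{y}_\bullet,\; \kappa\sigma^2 \right)$$

posterior:

$$\mu_q \sim N\!\left( b_q \bar{y}_q + (1 - b_q)\bar{y}_\bullet,\; b_q \sigma^2 / n_q \right), \quad n_q = \text{count}\{q_i = q\}, \quad \bar{y}_q = \sum_{\{q_i = q\}} y_i / n_q$$

fudge factor:

$$b_q = \frac{\kappa n_q}{\kappa n_q + 1} \to 1$$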


QTL 2: Bayes Seattle SISG: Yandell © 2006 15

partition of multiple QTL effects

• partition genotype-specific mean into QTL effects
  µq = mean + main effects + epistatic interactions
  µq = µ + βq = µ + sumj in H βqj
• priors on mean and effects
  µ ~ N(µ0, κ0σ²) grand mean
  βq ~ N(0, κ1σ²) model-independent genotypic effect
  βqj ~ N(0, κ1σ²/|H|) effects down-weighted by size of H
• determine hyper-parameters via empirical Bayes

$$\mu_0 \approx \bar{Y}_\bullet \quad \text{and} \quad \kappa_1 \approx \frac{h^2}{1 - h^2} = \frac{\sigma_G^2}{\sigma^2}$$

QTL 2: Bayes Seattle SISG: Yandell © 2006 16

pr(q|m,λ) recombination model

pr(q|m,λ) = pr(geno | map, locus) ≈ pr(geno | flanking markers, locus)

[figure: markers m1 to m6 along the chromosome (distance along chromosome), with unknown genotype q? at locus λ between markers]
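The flanking-marker approximation can be sketched for the simplest case. The code below is an illustration only, assuming a backcross (two genotypes coded 0 and 1), fully informative flanking markers, and the Haldane map function with no interference; the function names and distances are hypothetical.

```python
import math

def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def genotype_prior(left, right, d_left, d_right):
    """pr(q | flanking markers, locus) for a backcross, genotypes coded 0/1.

    d_left and d_right are distances (cM) from the locus to each flanking marker.
    """
    r1, r2 = haldane(d_left), haldane(d_right)
    weights = {}
    for q in (0, 1):
        p_left = r1 if q != left else 1.0 - r1     # recombination with left marker?
        p_right = r2 if q != right else 1.0 - r2   # recombination with right marker?
        weights[q] = p_left * p_right
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}

same = genotype_prior(left=0, right=0, d_left=5.0, d_right=5.0)
recomb = genotype_prior(left=0, right=1, d_left=5.0, d_right=5.0)
print(same, recomb)
```

When the flanking markers agree, the genotype between them almost surely matches; when they disagree and the locus is at the midpoint, the two genotypes are equally likely before the phenotype is considered.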


QTL 2: Bayes Seattle SISG: Yandell © 2006 17

how does phenotype Y improve posterior for genotype Q?

what are probabilities for genotype Q between markers?
recombinants AA:AB
all 1:1 if ignore Y
and if we use Y?

[figure: bp (90 to 120) by genotype at flanking markers D4Mit41 and D4Mit214: AA/AA, AB/AA, AA/AB, AB/AB]

QTL 2: Bayes Seattle SISG: Yandell © 2006 18

posterior on QTL genotypes
• full conditional for q depends only on data for individual i
  – proportional to prior pr(q | mi, λ )
    • weight toward q that agrees with flanking markers
  – proportional to likelihood pr(yi|q,µ)
    • weight toward q so that group mean µq ≈ yi
• phenotype and prior recombination may conflict
  – posterior recombination balances these two weights
  – this is “E step” in EM for classical QTL analysis

$$\text{pr}(q \mid y_i, m_i, \mu, \lambda) = \frac{\text{pr}(q \mid m_i, \lambda)\, \text{pr}(y_i \mid q, \mu)}{\text{pr}(y_i \mid m_i, \mu, \lambda)}$$
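The balance between recombination prior and phenotype likelihood can be sketched numerically. This is a minimal illustration assuming a normal phenotype model with a common standard deviation; the genotype labels, means, and phenotype value are hypothetical and not taken from the slides.

```python
import math

def normal_pdf(y, mean, sd):
    """Normal density, used here as the phenotype model pr(y_i | q, mu)."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def genotype_posterior(y_i, prior, means, sd):
    """pr(q | y_i, m_i, mu, lambda): prior times likelihood, normalized over q."""
    weights = {q: prior[q] * normal_pdf(y_i, means[q], sd) for q in prior}
    total = sum(weights.values())            # plays the role of pr(y_i | m_i, mu, lambda)
    return {q: w / total for q, w in weights.items()}

# hypothetical recombinant individual: flanking markers give a 1:1 prior
prior = {"AA": 0.5, "AB": 0.5}
means = {"AA": 100.0, "AB": 110.0}           # assumed genotypic means
post = genotype_posterior(108.0, prior, means, sd=5.0)
print(post)
```

Here the prior is 1:1, so the phenotype alone tips the posterior toward the genotype whose mean is closer to the observed value.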


QTL 2: Bayes Seattle SISG: Yandell © 2006 19

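Bayesian model posterior
• augment data (y,m) with unknowns q
• study unknowns (µ,λ,q) given data (y,m)
  – properties of posterior pr(µ,λ,q | y,m )
• sample from posterior in some clever way
  – multiple imputation or MCMC

$$\text{pr}(\mu, \lambda, q \mid y, m) = \frac{\text{pr}(y \mid q, \mu)\, \text{pr}(q \mid m, \lambda)\, \text{pr}(\mu)\, \text{pr}(\lambda \mid m)}{\text{pr}(y \mid m)}$$

$$\text{pr}(\mu, \lambda \mid y, m) = \sum_q \text{pr}(\mu, \lambda, q \mid y, m)$$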

QTL 2: Bayes Seattle SISG: Yandell © 2006 20

Bayesian priors for QTL
• missing genotypes q
  – pr( q | m, λ )
  – recombination model is formally a prior
• effects ( µ, σ² )
  – prior = pr( µq | σ² ) pr(σ² )
  – use conjugate priors for normal phenotype
    • pr( µq | σ² ) = normal
    • pr(σ² ) = inverse chi-square
• each locus λ may be uniform over genome
  – pr(λ | m ) = 1 / length of genome
• combined prior
  – pr( q, µ, λ | m ) = pr( q | m, λ ) pr( µ ) pr(λ | m )


QTL 2: Bayes Seattle SISG: Yandell © 2006 21

3. Markov chain sampling of architectures
• construct Markov chain around posterior
  – want posterior as stable distribution of Markov chain
  – in practice, the chain tends toward stable distribution
    • initial values may have low posterior probability
    • burn-in period to get chain mixing well
• hard to sample (q, µ, λ, H) from joint posterior
  – update (q,µ,λ) from full conditionals for model H
  – update genetic architecture H

$$(\lambda, q, \mu, H) \sim \text{pr}(\lambda, q, \mu, H \mid y, m)$$

$$(\lambda, q, \mu, H)_1 \to (\lambda, q, \mu, H)_2 \to \cdots \to (\lambda, q, \mu, H)_N$$

QTL 2: Bayes Seattle SISG: Yandell © 2006 22

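MCMC sampling of (λ,q,µ)
• Gibbs sampler
  – genotypes q
  – effects µ
  – not loci λ
• Metropolis-Hastings sampler
  – extension of Gibbs sampler
  – does not require normalization
    • pr( q | m ) = sumλ pr( q | m, λ ) pr(λ )

$$q \sim \text{pr}(q \mid y_i, m_i, \mu, \lambda)$$

$$\mu \sim \frac{\text{pr}(y \mid q, \mu)\, \text{pr}(\mu)}{\text{pr}(y \mid q)}$$

$$\lambda \sim \frac{\text{pr}(q \mid m, \lambda)\, \text{pr}(\lambda \mid m)}{\text{pr}(q \mid m)}$$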


QTL 2: Bayes Seattle SISG: Yandell © 2006 23

full conditional for locus
• cannot easily sample from locus full conditional
  pr(λ | y,m,µ,q) = pr( λ | m,q) = pr( q | m, λ ) pr(λ ) / constant
• constant is very difficult to compute explicitly
  – must average over all possible loci λ over genome
  – must do this for every possible genotype q
• Gibbs sampler will not work in general
  – but can use method based on ratios of probabilities
  – Metropolis-Hastings is extension of Gibbs sampler

QTL 2: Bayes Seattle SISG: Yandell © 2006 24

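Gibbs sampler idea
• toy problem
  – want to study two correlated effects
  – could sample directly from their bivariate distribution
• instead use Gibbs sampler:
  – sample each effect from its full conditional given the other
  – pick order of sampling at random
  – repeat many times

$$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$

$$\mu_1 \sim N\!\left( \rho\mu_2,\; 1 - \rho^2 \right)$$

$$\mu_2 \sim N\!\left( \rho\mu_1,\; 1 - \rho^2 \right)$$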
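The toy Gibbs sampler can be sketched in a few lines. This is a minimal illustration for a standard bivariate normal with correlation ρ, where each full conditional is N(ρ × other, 1 − ρ²); the function names, starting values, and chain length are hypothetical choices.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for two standard normal effects with correlation rho."""
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5              # conditional standard deviation
    mu1, mu2 = 0.0, 0.0                       # arbitrary starting values
    samples = []
    for _ in range(n_samples):
        if rng.random() < 0.5:                # pick order of sampling at random
            mu1 = rng.gauss(rho * mu2, sd)
            mu2 = rng.gauss(rho * mu1, sd)
        else:
            mu2 = rng.gauss(rho * mu1, sd)
            mu1 = rng.gauss(rho * mu2, sd)
        samples.append((mu1, mu2))
    return samples

def correlation(pairs):
    """Sample correlation of a list of (a, b) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / (v1 * v2) ** 0.5

samples = gibbs_bivariate_normal(rho=0.6, n_samples=20_000)
print(round(correlation(samples), 2))
```

With enough iterations the sampled pairs should reproduce the target correlation, even though no draw was ever taken from the joint distribution directly.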


QTL 2: Bayes Seattle SISG: Yandell © 2006 25

Gibbs sampler samples: ρ = 0.6

[figure: for N = 50 samples and N = 200 samples, trace plots of Gibbs: mean 1 and Gibbs: mean 2 versus Markov chain index, and scatter plots of Gibbs: mean 2 versus Gibbs: mean 1]

QTL 2: Bayes Seattle SISG: Yandell © 2006 26

Metropolis-Hastings idea
• want to study distribution f(λ)
  – take Monte Carlo samples
    • unless too complicated
  – take samples using ratios of f
• Metropolis-Hastings samples:
  – propose new value λ*
    • near (?) current value λ
    • from some distribution g
  – accept new value with prob a
    • Gibbs sampler: a = 1 always

$$a = \min\!\left( 1,\; \frac{f(\lambda^*)\, g(\lambda^* - \lambda)}{f(\lambda)\, g(\lambda - \lambda^*)} \right)$$

[figure: target density f(λ) for λ from 0 to 10, and proposal density g(λ–λ*) from -4 to 4]
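A random-walk version of this sampler can be sketched as follows. The target f is a hypothetical unnormalized gamma-like density chosen only for illustration, and the proposal g is assumed uniform and symmetric, so the g terms cancel in the acceptance probability; names and tuning values are not from the slides.

```python
import math
import random

def target(lam):
    """Unnormalized density f; a hypothetical gamma-like shape on lam > 0."""
    return lam ** 2 * math.exp(-lam) if lam > 0 else 0.0

def metropolis_hastings(f, start, half_width, n_samples, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric uniform proposal g."""
    rng = random.Random(seed)
    lam = start
    chain = []
    for _ in range(n_samples):
        proposal = lam + rng.uniform(-half_width, half_width)
        # symmetric g: g(lam* - lam) equals g(lam - lam*), so only the f ratio remains
        a = min(1.0, f(proposal) / f(lam))
        if rng.random() < a:
            lam = proposal                    # accept the proposed value
        chain.append(lam)                     # otherwise keep the current value
    return chain

chain = metropolis_hastings(target, start=1.0, half_width=2.0, n_samples=50_000)
mh_mean = sum(chain) / len(chain)
print(round(mh_mean, 1))
```

Only ratios of f are used, so the normalizing constant is never needed. The half-width plays the role of a narrow or wide g: too narrow and the chain moves slowly, too wide and most proposals are rejected.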


QTL 2: Bayes Seattle SISG: Yandell © 2006 27

Metropolis-Hastings samples

[figure: for N = 200 samples and N = 1000 samples, each with narrow g and wide g: mcmc sequence versus λ (0 to 8), and histogram with pr(θ|Y) versus λ (0 to 8)]

QTL 2: Bayes Seattle SISG: Yandell © 2006 28

4. sampling across architectures
• search across genetic architectures M of various sizes
  – allow change in number of QTL
  – allow change in types of epistatic interactions
• methods for search
  – reversible jump MCMC
  – Gibbs sampler with loci indicators
• complexity of epistasis
  – Fisher-Cockerham effects model
  – general multi-QTL interaction & limits of inference


QTL 2: Bayes Seattle SISG: Yandell © 2006 29

model selection in regression
• consider known genotypes q at 2 known loci λ
  – models with 1 or 2 QTL
• jump between 1-QTL and 2-QTL models
• adjust parameters when model changes
  – βq1 estimate changes between models 1 and 2
  – due to collinearity of QTL genotypes

$$m = 1: \quad \mu_q = \mu + \beta_{q1}$$

$$m = 2: \quad \mu_q = \mu + \beta_{q1} + \beta_{q2}$$
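The change in the first effect estimate between the 1-QTL and 2-QTL models can be seen in a small simulation. The sketch below is an assumption-laden illustration: genotypes are coded ±1 at two linked loci with recombination fraction r, effects are additive, and ordinary least squares stands in for the model fit; all names and parameter values are hypothetical.

```python
import random

def simulate(n, r, beta1, beta2, seed=0):
    """Simulate +/-1 genotypes at two linked loci and a normal phenotype."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        g1 = rng.choice((-1, 1))
        g2 = -g1 if rng.random() < r else g1      # recombination flips the genotype
        x1.append(g1)
        x2.append(g2)
        y.append(10.0 + beta1 * g1 + beta2 * g2 + rng.gauss(0.0, 1.0))
    return x1, x2, y

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def fit_one(x1, y):
    """Least-squares effect of locus 1 in the 1-QTL model."""
    a, yc = center(x1), center(y)
    return dot(a, yc) / dot(a, a)

def fit_two(x1, x2, y):
    """Least-squares effects of both loci in the 2-QTL model (normal equations)."""
    a, b, yc = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

x1, x2, y = simulate(n=2000, r=0.1, beta1=1.0, beta2=1.0)
b1_single = fit_one(x1, y)
b1_joint, b2_joint = fit_two(x1, x2, y)
print(round(b1_single, 2), round(b1_joint, 2), round(b2_joint, 2))
```

In the 1-QTL model the first locus absorbs part of the linked second effect, so its estimate is inflated; adding the second locus brings it back toward its simulated value.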

QTL 2: Bayes Seattle SISG: Yandell © 2006 30

geometry of reversible jump

[figure: two panels with axes β1 and β2 (0.0 to 0.8): "Move Between Models" (m=1, m=2; c21 = 0.7) and "Reversible Jump Sequence"]


QTL 2: Bayes Seattle SISG: Yandell © 2006 31

geometry allowing q and λ to change

[figure: two panels with axes β1 and β2: "a short sequence" and "first 1000 with m<3"]

QTL 2: Bayes Seattle SISG: Yandell © 2006 32

collinear QTL = correlated effects

• linked QTL = collinear genotypes
  – correlated estimates of effects (negative if in coupling phase)
  – sum of linked effects usually fairly constant

[figure: scatter plots of additive 2 (effect 2) versus additive 1 (effect 1); 4-week: cor = -0.81; 8-week: cor = -0.7]


QTL 2: Bayes Seattle SISG: Yandell © 2006 33

reversible jump MCMC idea

• Metropolis-Hastings updates: draw one of three choices
  – update m-QTL model with probability 1-b(m+1)-d(m)
    • update current model using full conditionals
    • sample m QTL loci, effects, and genotypes
  – add a locus with probability b(m+1)
    • propose a new locus and innovate new genotypes & genotypic effect
    • decide whether to accept the “birth” of new locus
  – drop a locus with probability d(m)
    • propose dropping one of existing loci
    • decide whether to accept the “death” of locus
• Satagopan Yandell (1996, 1998); Sillanpaa Arjas (1998); Stevens Fisch (1998)
  – these build on RJ-MCMC idea of Green (1995); Richardson Green (1997)

[figure: chromosome from 0 to L with loci λ1, λ2, …, λm and λm+1]

QTL 2: Bayes Seattle SISG: Yandell © 2006 34

Gibbs sampler with loci indicators
• consider only QTL at pseudomarkers
  – every 1-2 cM
  – modest approximation with little bias
• use loci indicators in each pseudomarker
  – δ = 1 if QTL present
  – δ = 0 if no QTL present
• Gibbs sampler on loci indicators δ
  – relatively easy to incorporate epistasis
  – Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005 Genetics)
    • (see earlier work of Nengjun Yi and Ina Hoeschele)

$$\mu_q = \mu + \delta_1 \beta_{q1} + \delta_2 \beta_{q2}$$


QTL 2: Bayes Seattle SISG: Yandell © 2006 35

5. Gene Action and Epistasis

additive, dominant, recessive, general effects of a single QTL (Gary Churchill)

QTL 2: Bayes Seattle SISG: Yandell © 2006 36

additive effects of two QTL (Gary Churchill)

µq = µ + βq1 + βq2


QTL 2: Bayes Seattle SISG: Yandell © 2006 37

Epistasis (Gary Churchill)

The allelic state at one locus can mask or uncover the effects of allelic variation at another.
- W. Bateson, 1907.

QTL 2: Bayes Seattle SISG: Yandell © 2006 38

epistasis in parallel pathways (GAC)
• Z keeps trait value low
• neither E1 nor E2 is rate limiting
• loss of function alleles are segregating from parent A at E1 and from parent B at E2

[figure: pathway diagram with X, Y, Z and enzymes E1, E2]


QTL 2: Bayes Seattle SISG: Yandell © 2006 39

epistasis in a serial pathway (GAC)
• Z keeps trait value high
• neither E1 nor E2 is rate limiting
• loss of function alleles are segregating from parent B at E1 and from parent A at E2

[figure: pathway diagram with X, Y, Z and enzymes E1, E2]

QTL 2: Bayes Seattle SISG: Yandell © 2006 40

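QTL with epistasis
• same phenotype model overview

$$y = \mu_q + e, \quad \text{var}(e) = \sigma^2$$

• partition of genotypic value with epistasis

$$\mu_q = \mu + \beta_{q1} + \beta_{q2} + \beta_{q12}$$

• partition of genetic variance & heritability

$$\text{var}(\mu_q) = \sigma_G^2 = \sigma_1^2 + \sigma_2^2 + \sigma_{12}^2$$

$$h_q^2 = \frac{\sigma_G^2}{\sigma_G^2 + \sigma^2} = h_1^2 + h_2^2 + h_{12}^2$$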


QTL 2: Bayes Seattle SISG: Yandell © 2006 41

epistatic interactions
• model space issues
  – 2-QTL interactions only?
    • or general interactions among multiple QTL?
  – partition of effects
    • Fisher-Cockerham or tree-structured or ?
• model search issues
  – epistasis between significant QTL
    • check all possible pairs when QTL included?
    • allow higher order epistasis?
  – epistasis with non-significant QTL
    • whole genome paired with each significant QTL?
    • pairs of non-significant QTL?
• Yi Xu (2000) Genetics; Yi, Xu, Allison (2003) Genetics; Yi et al. (2005) Genetics

QTL 2: Bayes Seattle SISG: Yandell © 2006 42

limits of epistatic inference
• power to detect effects
  – epistatic model size grows exponentially
    • |H| = 3^nqtl for general interactions
  – power depends on ratio of n to model size
    • want n / |H| to be fairly large (say > 5)
    • n = 100, nqtl = 3, n / |H| ≈ 4
• empty cells mess up adjusted (Type 3) tests
  – missing q1Q2 / q1Q2 or q1Q2q3 / q1Q2q3 genotype
  – null hypotheses not what you would expect
  – can confound main effects and interactions
  – can bias AA, AD, DA, DD partition
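The growth of the general interaction model can be tabulated directly. This short sketch assumes three genotypes per locus (as in an F2) and reads |H| as 3 raised to the number of QTL; the function name and the sample size are illustrative.

```python
def full_interaction_size(n_qtl, genotypes_per_locus=3):
    """Number of genotype cells |H| when all interactions among n_qtl loci are allowed."""
    return genotypes_per_locus ** n_qtl

n = 100                                   # hypothetical sample size
for n_qtl in (1, 2, 3, 4):
    size = full_interaction_size(n_qtl)
    print(n_qtl, size, round(n / size, 1))   # individuals per cell

ratio_3qtl = n / full_interaction_size(3)
```

With n = 100 and three QTL there are fewer than four individuals per cell on average, which falls below the rough guideline of 5 and makes empty cells likely.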


QTL 2: Bayes Seattle SISG: Yandell © 2006 43

6. comparing QTL models

• balance model fit with model "complexity"
  – want maximum likelihood
  – without too complicated a model
• information criteria quantify the balance
  – Bayes information criterion (BIC) for likelihood
  – Bayes factors for Bayesian approach

QTL 2: Bayes Seattle SISG: Yandell © 2006 44

Bayes factors & BIC

• what is a Bayes factor?
  – ratio of posterior odds to prior odds
  – ratio of model likelihoods

$$B_{12} = \frac{\text{pr}(\text{model}_1 \mid Y) / \text{pr}(\text{model}_2 \mid Y)}{\text{pr}(\text{model}_1) / \text{pr}(\text{model}_2)} = \frac{\text{pr}(Y \mid \text{model}_1)}{\text{pr}(Y \mid \text{model}_2)}$$

• BF is equivalent to LR statistic when
  – comparing two nested models
  – simple hypotheses (e.g. 1 vs 2 QTL)
• BF is equivalent to Bayes Information Criterion (BIC)
  – for general comparison of any models
  – want Bayes factor to be substantially larger than 1 (say 10 or more)

$$-2\log(B_{12}) = -2\log(LR) - (p_2 - p_1)\log(n)$$
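The BIC form of the Bayes factor can be evaluated from two fitted models. The sketch below assumes LR is the ratio of maximized likelihoods of model 1 to model 2, natural logarithms throughout, and p1, p2 as the numbers of parameters; the log-likelihood values are made up for illustration.

```python
import math

def log_bayes_factor_bic(loglik1, p1, loglik2, p2, n):
    """Approximate log(B12) from -2 log(B12) = -2 log(LR) - (p2 - p1) log(n)."""
    log_lr = loglik1 - loglik2                        # log of L1 / L2 (assumed definition)
    minus2_log_b12 = -2.0 * log_lr - (p2 - p1) * math.log(n)
    return -0.5 * minus2_log_b12

# hypothetical fits: model 2 has one more parameter and a slightly better likelihood
log_b12 = log_bayes_factor_bic(loglik1=-150.0, p1=2, loglik2=-148.0, p2=3, n=100)
b12 = math.exp(log_b12)
print(round(log_b12, 3), round(b12, 2))
```

In this made-up case the extra parameter does not buy enough likelihood, so the approximation slightly favors the smaller model, though far short of the "10 or more" guideline.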


QTL 2: Bayes Seattle SISG: Yandell © 2006 45

Bayes factors and genetic model H
• H = number of QTL
  – prior pr(H) chosen by user
  – posterior pr(H|y,m)
    • sampled marginal histogram
    • shape affected by prior pr(H)
• pattern of QTL across genome
• gene action and epistasis

$$BF_{H,H+1} = \frac{\text{pr}(H \mid y, m) / \text{pr}(H)}{\text{pr}(H+1 \mid y, m) / \text{pr}(H+1)}$$

[figure: prior probability versus m = number of QTL (0 to 10); e = exponential, p = Poisson, u = uniform]
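Given MCMC draws of the number of QTL, a Bayes factor between adjacent model sizes can be estimated as the ratio of posterior-to-prior ratios. This is a minimal sketch under assumed inputs: the draws are fabricated for illustration only, the prior is taken to be Poisson with a hypothetical mean of 2, and the function names are not from any package.

```python
import math
from collections import Counter

def poisson_pmf(m, mean):
    """Poisson prior probability for m QTL."""
    return math.exp(-mean) * mean ** m / math.factorial(m)

def bayes_factor_adjacent(m_samples, h, prior):
    """Estimate BF(H, H+1) = [pr(H|y,m)/pr(H)] / [pr(H+1|y,m)/pr(H+1)].

    The posterior is the marginal histogram of the sampled number of QTL;
    the estimate is undefined if H+1 was never sampled.
    """
    counts = Counter(m_samples)
    n = len(m_samples)
    post_h = counts[h] / n
    post_h1 = counts[h + 1] / n
    return (post_h / prior(h)) / (post_h1 / prior(h + 1))

# illustrative draws only (not real output): 10% m=1, 60% m=2, 30% m=3
m_samples = [1] * 10 + [2] * 60 + [3] * 30
bf_23 = bayes_factor_adjacent(m_samples, 2, lambda m: poisson_pmf(m, mean=2.0))
print(round(bf_23, 3))
```

Dividing by the prior removes its influence on the shape of the histogram, which is why the Bayes factor is less sensitive to the prior on the number of QTL than the posterior itself.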

QTL 2: Bayes Seattle SISG: Yandell © 2006 46

issues in computing Bayes factors
• BF insensitive to shape of prior on nqtl
  – geometric, Poisson, uniform
  – precision improves when prior mimics posterior
• BF sensitivity to prior variance on effects θ
  – prior variance should reflect data variability
  – resolved by using hyper-priors
    • automatic algorithm; no need for user tuning
• easy to compute Bayes factors from samples
  – sample posterior using MCMC
  – posterior pr(nqtl|y,m) is marginal histogram


QTL 2: Data Seattle SISG: Yandell © 2006 1

examples in detail
• simulation study (after Stephens & Fisch (1998))
• days to flower for Brassica napus (plant) (n = 108)
  – single chromosome with 2 linked loci
  – whole genome
• gonad shape in Drosophila spp. (insect) (n = 1000)
  – multiple traits reduced by PC
  – many QTL and epistasis
• expression phenotype (SCD1) in mice (n = 108)
  – multiple QTL and epistasis
• obesity in mice (n = 421)
  – epistatic QTLs with no main effects

QTL 2: Data Seattle SISG: Yandell © 2006 2

simulation with 8 QTL
• simulated F2 intercross, 8 QTL
  – (Stephens, Fisch 1998)
  – n=200, heritability = 50%
  – detected 3 QTL
• increase to detect all 8
  – n=500, heritability to 97%

QTL  chr  loci  effect
1    1    11    –3
2    1    50    –5
3    3    62    +2
4    6    107   –3
5    6    152   +3
6    8    32    –4
7    8    54    +1
8    9    195   +2

[figure: posterior frequency in % versus number of QTL (7 to 13); genetic map of ch1 to ch10 (0 to 200)]


QTL 2: Data Seattle SISG: Yandell © 2006 3

loci pattern across genome
• notice which chromosomes have persistent loci
• best pattern found 42% of the time

     Chromosome
m    1  2  3  4  5  6  7  8  9  10   Count of 8000
8    2  0  1  0  0  2  0  2  1  0    3371
9    3  0  1  0  0  2  0  2  1  0    751
7    2  0  1  0  0  2  0  1  1  0    377
9    2  0  1  0  0  2  0  2  1  0    218
9    2  0  1  0  0  3  0  2  1  0    218
9    2  0  1  0  0  2  0  2  2  0    198

QTL 2: Data Seattle SISG: Yandell © 2006 4

Brassica napus: 1 chromosome
• 4-week & 8-week vernalization effect
  – log(days to flower)
• genetic cross of
  – Stellar (annual canola)
  – Major (biennial rapeseed)
• 105 F1-derived double haploid (DH) lines
  – homozygous at every locus (QQ or qq)
• 10 molecular markers (RFLPs) on LG9
  – two QTLs inferred on LG9 (now chromosome N2)
  – corroborated by Butruille (1998)
  – exploiting synteny with Arabidopsis thaliana


QTL 2: Data Seattle SISG: Yandell © 2006 5

Brassica 4- & 8-week data

summaries of raw data
joint scatter plots (identity line)
separate histograms

[figure: scatter plot of 8-week versus 4-week; histograms of 4-week vernalization and 8-week vernalization]

QTL 2: Data Seattle SISG: Yandell © 2006 6

Brassica credible regions

[figure: additive effect versus distance (cM), 20 to 80, for 4-week and 8-week]


QTL 2: Data Seattle SISG: Yandell © 2006 7

B. napus 8-week vernalization whole genome study
• 108 plants from double haploid
  – similar genetics to backcross: follow 1 gamete
  – parents are Major (biennial) and Stellar (annual)
• 300 markers across genome
  – 19 chromosomes
  – average 6cM between markers
    • median 3.8cM, max 34cM
  – 83% markers genotyped
• phenotype is days to flowering
  – after 8 weeks of vernalization (cooling)
  – Stellar parent requires vernalization to flower
• Ferreira et al. (1994); Kole et al. (2001); Schranz et al. (2002)

QTL 2: Data Seattle SISG: Yandell © 2006 8

Bayesian model assessment

row 1: # QTL
row 2: pattern

col 1: posterior
col 2: Bayes factor
note error bars on bf

evidence suggests 4-5 QTL
N2(2-3), N3, N16

[figure: QTL posterior and Bayes factor ratios (posterior / prior; weak, moderate, strong) versus number of QTL; pattern posterior and Bayes factor ratios versus model index]


QTL 2: Data Seattle SISG: Yandell © 2006 9

Bayesian estimates of loci & effects

histogram of loci
blue line is density
red lines at estimates

estimate additive effects (red circles)
grey points sampled from posterior
blue line is cubic spline
dashed line for 2 SD

[figure: napus8 summaries with pattern 1,1,2,3 and m ≥ 4; loci histogram and additive effects along N2, N3, N16 (0 to 250)]

QTL 2: Data Seattle SISG: Yandell © 2006 10

Bayesian model diagnostics

pattern: N2(2), N3, N16
col 1: density
col 2: boxplots by m

environmental variance σ² = .008, σ = .09
heritability h² = 52%
LOD = 16 (highly significant)

but note change with m

[figure: marginal density (m ≥ 4) and boxplots conditional on number of QTL for envvar, heritability, and LOD]


QTL 2: Data Seattle SISG: Yandell © 2006 11

shape phenotype in BC study indexed by PC1

Liu et al. (1996) Genetics

QTL 2: Data Seattle SISG: Yandell © 2006 12

shape phenotype via PC

Liu et al. (1996) Genetics


QTL 2: Data Seattle SISG: Yandell © 2006 13

Zeng et al. (2000) CIM vs. MIM

composite interval mapping (Liu et al. 1996)
  narrow peaks
  miss some QTL

multiple interval mapping (Zeng et al. 2000)
  triangular peaks

both conditional 1-D scans
  fixing all other "QTL"

QTL 2: Data Seattle SISG: Yandell © 2006 14

CIM, MIM and IM pairscan

[figure: panels labeled mim and cim]


QTL 2: Data Seattle SISG: Yandell © 2006 15

2 QTL + epistasis: IM versus multiple imputation

[figure: panels labeled IM pairscan and multiple imputation]

QTL 2: Data Seattle SISG: Yandell © 2006 16

multiple QTL: CIM, MIM and BIM

[figure panels: bim, cim, mim]

QTL 2: Data Seattle SISG: Yandell © 2006 17

studying diabetes in an F2
• segregating cross of inbred lines
  – B6.ob x BTBR.ob → F1 → F2
  – selected mice with ob/ob alleles at leptin gene (chr 6)
  – measured and mapped body weight, insulin, glucose at various ages (Stoehr et al. 2000 Diabetes)
  – sacrificed at 14 weeks, tissues preserved
• gene expression data
  – Affymetrix microarrays on parental strains, F1
    • key tissues: adipose, liver, muscle, β-cells
    • novel discoveries of differential expression (Nadler et al. 2000 PNAS; Lan et al. 2002 in review; Ntambi et al. 2002 PNAS)
  – RT-PCR on 108 F2 mice liver tissues
    • 15 genes, selected as important in diabetes pathways
    • SCD1, PEPCK, ACO, FAS, GPAT, PPARgamma, PPARalpha, G6Pase, PDI,…

QTL 2: Data Seattle SISG: Yandell © 2006 18

Multiple Interval Mapping (QTLCart)
SCD1: multiple QTL plus epistasis!

[figure: two panels across chr2, chr5, chr9 (map position 0 to 300); top: LOD; bottom: effect (add=blue, dom=red)]

QTL 2: Data Seattle SISG: Yandell © 2006 19

Bayesian model assessment: number of QTL for SCD1

[figure: left panel "QTL posterior": QTL posterior vs. number of QTL (1 to 13); right panel "Bayes factor ratios": posterior / prior vs. number of QTL, with weak / moderate / strong evidence bands]
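The "posterior / prior" axis on this slide can be reproduced from MCMC output: divide the sampled frequency of each number of QTL by its prior probability, then take ratios between models to get Bayes factors. The sketch below is only an illustration; the Poisson prior, its mean, and the vector of draws `m_samples` are assumptions and are not taken from the SCD1 analysis.

```python
import math
from collections import Counter


def poisson_pmf(m, mean):
    """Prior probability of m QTL under an assumed Poisson(mean) prior."""
    return math.exp(-mean) * mean ** m / math.factorial(m)


def posterior_over_prior(m_samples, prior_mean):
    """Return {m: posterior(m) / prior(m)} from MCMC draws of the number of QTL."""
    counts = Counter(m_samples)
    n = len(m_samples)
    return {m: (c / n) / poisson_pmf(m, prior_mean)
            for m, c in sorted(counts.items())}


def bayes_factor(ratios, m1, m2):
    """Bayes factor comparing a model with m1 QTL against one with m2 QTL."""
    return ratios[m1] / ratios[m2]


# Hypothetical draws for illustration only (not the SCD1 results).
m_samples = [1] * 50 + [2] * 150 + [3] * 400 + [4] * 300 + [5] * 100
ratios = posterior_over_prior(m_samples, prior_mean=3.0)
bf_3_vs_1 = bayes_factor(ratios, 3, 1)
```

Because each ratio divides out the prior, the comparison between two values of m depends on the prior only through the marginal likelihoods, which is why the slide plots posterior / prior rather than the posterior alone.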

QTL 2: Data Seattle SISG: Yandell © 2006 20

Bayesian LOD and h2 for SCD1

[figure: two rows of panels for LOD and heritability; left column: marginal density for m ≥ 1; right column: boxplots conditional on number of QTL (m = 1 to 14)]

QTL 2: Data Seattle SISG: Yandell © 2006 21

Bayesian model assessment: chromosome QTL pattern for SCD1

[figure: left panel "pattern posterior": model posterior vs. model index (1 to 15); right panel "Bayes factor ratios": posterior / prior vs. model index, points labeled by number of QTL, with weak / moderate evidence bands]

pattern labels on the x-axis, in order: 4:1,2,2*3; 4:1,2*2,3; 5:3*1,2,3; 5:2*1,2,2*3; 5:2*1,2*2,3; 6:3*1,2,2*3; 6:3*1,2*2,3; 5:1,2*2,2*3; 6:4*1,2,3; 6:2*1,2*2,2*3; 2:1,3; 3:2*1,2; 2:1,2; 3:1,2,3; 4:2*1,2,3

QTL 2: Data Seattle SISG: Yandell © 2006 22

trans-acting QTL for SCD1
(no epistasis yet: see Yi, Xu, Allison 2003)

dominance?

QTL 2: Data Seattle SISG: Yandell © 2006 23

2-D scan: assumes only 2 QTL!

epistasis LOD peaks

joint LOD peaks

QTL 2: Data Seattle SISG: Yandell © 2006 24

sub-peaks can be easily overlooked!


QTL 2: Data Seattle SISG: Yandell © 2006 25

epistatic model fit

QTL 2: Data Seattle SISG: Yandell © 2006 26

Cockerham epistatic effects


QTL 2: Data Seattle SISG: Yandell © 2006 27

obesity in CAST/Ei BC onto M16i

• 421 mice (Daniel Pomp)
  – (213 male, 208 female)
• 92 microsatellites on 19 chromosomes
  – 1214 cM map
• subcutaneous fat pads
  – pre-adjusted for sex and dam effects
• Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005) Genetics (in press)

QTL 2: Data Seattle SISG: Yandell © 2006 28

non-epistatic analysis

[figure: left panel: single QTL LOD profile (y-axis: LOD score); right panel: multiple QTL Bayes factor profile (y-axis: Bayes factor)]

QTL 2: Data Seattle SISG: Yandell © 2006 29

posterior profile of main effects in epistatic analysis

[figure: left panels: main effects & heritability profile (y-axes: Main effect, Heritability); right panel: Bayes factor profile (y-axis: Bayes factor)]

QTL 2: Data Seattle SISG: Yandell © 2006 30

posterior profile of main effects in epistatic analysis

QTL 2: Data Seattle SISG: Yandell © 2006 31

model selection via Bayes factors for epistatic model

[figure panels: number of QTL, QTL pattern]

QTL 2: Data Seattle SISG: Yandell © 2006 32

posterior probability of effects

[figure: bar chart of posterior probability (0.0 to 1.0) for each effect]

single-locus effects: Chr2(72,85), Chr13(20,42), Chr15(1,31), Chr18(43,71), Chr1(26,54), Chr19(15,45), Chr7(50,75), Chr14(12,41)

pairs: Chr1(26,54)*Chr18(43,71), Chr2(72,85)*Chr13(20,42), Chr15(1,31)*Chr19(15,45), Chr2(72,85)*Chr14(12,41), Chr7(50,75)*Chr19(15,45), Chr13(20,42)*Chr15(1,31)

QTL 2: Data Seattle SISG: Yandell © 2006 33

scatterplot estimates of epistatic loci

QTL 2: Data Seattle SISG: Yandell © 2006 34

stronger epistatic effects


QTL 2: Data Seattle SISG: Yandell © 2006 35

model selection for pairs

QTL 2: Data Seattle SISG: Yandell © 2006 36

our RJ-MCMC software
• R: www.r-project.org
  – freely available statistical computing application R
  – library(bim) builds on Broman’s library(qtl)
• QTLCart: statgen.ncsu.edu/qtlcart
  – Bmapqtl incorporated into QTLCart (S Wang 2003)
• www.stat.wisc.edu/~yandell/qtl/software/bmqtl
• R/bim
  – initially designed by JM Satagopan (1996)
  – major revision and extension by PJ Gaffney (2001)
    • whole genome, multivariate and long range updates
    • speed improvements, pre-burnin
  – built as official R library (H Wu, Yandell, Gaffney, CF Jin 2003)
• R/bmqtl
  – collaboration with N Yi, H Wu, GA Churchill
  – initial working module: Winter 2005
  – improved module and official release: Summer/Fall 2005
  – major NIH grant (PI: Yi)

Traits NCSU QTL II: Yandell © 2005 1

Multiple Traits & Microarrays

1. why study multiple traits together? (slides 2-10)
   – diabetes case study
2. design issues (slides 11-13)
   – selective phenotyping
3. why are traits correlated? (slides 14-17)
   – close linkage or pleiotropy?
4. modern high throughput (slides 18-31)
   – principal components & discriminant analysis
5. graphical models (slides 32-36)
   – building causal biochemical networks

Traits NCSU QTL II: Yandell © 2005 2

1. why study multiple traits together?
• avoid reductionist approach to biology
  – address physiological/biochemical mechanisms
  – Schmalhausen (1942); Falconer (1952)
• separate close linkage from pleiotropy
  – 1 locus or 2 linked loci?
• identify epistatic interaction or canalization
  – influence of genetic background
• establish QTL x environment interactions
• decompose genetic correlation among traits
• increase power to detect QTL

Traits NCSU QTL II: Yandell © 2005 3

Type 2 Diabetes Mellitus

Traits NCSU QTL II: Yandell © 2005 4

Insulin Requirement

from Unger & Orci FASEB J. (2001) 15,312

decompensation


Traits NCSU QTL II: Yandell © 2005 5

glucose insulin

(courtesy AD Attie)

Traits NCSU QTL II: Yandell © 2005 6

studying diabetes in an F2
• segregating cross of inbred lines
  – B6.ob x BTBR.ob → F1 → F2
  – selected mice with ob/ob alleles at leptin gene (chr 6)
  – measured and mapped body weight, insulin, glucose at various ages (Stoehr et al. 2000 Diabetes)
  – sacrificed at 14 weeks, tissues preserved
• gene expression data
  – Affymetrix microarrays on parental strains, F1
    • (Nadler et al. 2000 PNAS; Ntambi et al. 2002 PNAS)
  – RT-PCR for a few mRNA on 108 F2 mice liver tissues
    • (Lan et al. 2003 Diabetes; Lan et al. 2003 Genetics)
  – Affymetrix microarrays on 60 F2 mice liver tissues
    • design (Jin et al. 2004 Genetics tent. accept)
    • analysis (work in prep.)

Traits NCSU QTL II: Yandell © 2005 7

why map gene expression as a quantitative trait?

• cis- or trans-action?
  – does gene control its own expression?
  – or is it influenced by one or more other genomic regions?
  – evidence for both modes (Brem et al. 2002 Science)
• simultaneously measure all mRNA in a tissue
  – ~5,000 mRNA active per cell on average
  – ~30,000 genes in genome
  – use genetic recombination as natural experiment
• mechanics of gene expression mapping
  – measure gene expression in intercross (F2) population
  – map expression as quantitative trait (QTL)
  – adjust for multiple testing
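The three mechanical steps (measure, map, adjust) can be sketched for a single expression trait as a marker-by-marker LOD scan with a permutation threshold. This is a minimal illustration under stated assumptions, not the analysis used in the cited studies: the data are simulated, markers are treated as unlinked, the scan is a one-way genotype model at each marker rather than interval mapping, and all function names are hypothetical.

```python
import math
import random


def lod_score(y, g):
    """LOD for a one-way genotype model at one marker: (n/2) log10(RSS0 / RSS1)."""
    n = len(y)
    mean = sum(y) / n
    rss0 = sum((v - mean) ** 2 for v in y)          # no-QTL model
    groups = {}
    for v, q in zip(y, g):
        groups.setdefault(q, []).append(v)
    rss1 = 0.0                                      # genotype-means model
    for vals in groups.values():
        m = sum(vals) / len(vals)
        rss1 += sum((v - m) ** 2 for v in vals)
    return (n / 2) * math.log10(rss0 / rss1)


def max_lod(y, markers):
    """Largest LOD across all markers for one trait."""
    return max(lod_score(y, g) for g in markers)


def permutation_threshold(y, markers, n_perm=200, level=0.05, seed=1):
    """Genome-wide threshold: upper quantile of max LOD with phenotypes shuffled."""
    rng = random.Random(seed)
    y_perm = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        maxima.append(max_lod(y_perm, markers))
    maxima.sort()
    return maxima[min(n_perm - 1, int(math.ceil((1 - level) * n_perm)) - 1)]


# Simulated F2 (genotypes coded 0, 1, 2 in 1:2:1 ratio); purely illustrative.
rng = random.Random(2006)
n_mice, n_markers, causal = 108, 5, 2
markers = [[rng.choice([0, 1, 1, 2]) for _ in range(n_mice)]
           for _ in range(n_markers)]
y = [1.0 * markers[causal][i] + rng.gauss(0.0, 0.5) for i in range(n_mice)]

lods = [lod_score(y, g) for g in markers]
threshold = permutation_threshold(y, markers)
```

With thousands of expression traits the permutation threshold handles multiplicity across markers only; multiplicity across traits needs a further adjustment, which this sketch does not attempt.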

Traits NCSU QTL II: Yandell © 2005 8

LOD map for PDI: cis-regulation (Lan et al. 2003)

Traits NCSU QTL II: Yandell © 2005 9

mapping microarray data
• single gene expression as trait (single QTL)
  – Dumas et al. (2000 J Hypertens)
• overview, wish lists
  – Jansen, Nap (2001 Trends Gen); Cheung, Spielman (2002); Doerge (2002 Nat Rev Gen); Bochner (2003 Nat Rev Gen)
• microarray scan via 1 QTL interval mapping
  – Brem et al. (2002 Science); Schadt et al. (2003 Nature); Yvert et al. (2003 Nat Gen)
  – found putative cis- and trans- acting genes
• multivariate and multiple QTL approach
  – Lan et al. (2003 Genetics)

Traits NCSU QTL II: Yandell © 2005 10


Traits NCSU QTL II: Yandell © 2005 11

2. design issues for expensive phenotypes
(thanks to CF “Amy” Jin)

• microarray analysis ~ $1000 per mouse
  – can only afford to assay 60 of 108 in panel
  – wish to not lose much power to detect QTL
• selective phenotyping
  – genotype all individuals in panel
  – select subset for phenotyping
  – previous studies can provide guide

Traits NCSU QTL II: Yandell © 2005 12

selective phenotyping
• emphasize additive effects in F2
  – F2 design: 1QQ:2Qq:1qq
  – best design for additive only: 1QQ:1qq
  – drop heterozygotes (Qq)
  – reduce sample size by half with no power loss
• emphasize general effects in F2
  – best design: 1QQ:1Qq:1qq
  – drop half of heterozygotes (25% reduction)
• multiple loci
  – same idea but care is needed
  – drop 7/16 of sample for two unlinked loci
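The "half the sample, no power loss" claim follows from a short calculation: for an additive-only model the precision of the estimated effect is proportional to n times the variance of the additive genotype code. The sketch below works this out with exact fractions. It assumes the QTL genotype itself is known and selected on, and that the model is purely additive; `additive_information` is a hypothetical helper name.

```python
from fractions import Fraction


def additive_information(freqs, fraction_kept=Fraction(1)):
    """Information on the additive effect, per mouse in the full panel.

    freqs maps additive codes (QQ = +1, Qq = 0, qq = -1) to their frequencies
    among the phenotyped mice. The variance of the estimated slope is
    proportional to 1 / (n * Var(code)), so information is n * Var(code).
    """
    mean = sum(x * p for x, p in freqs.items())
    var = sum((x - mean) ** 2 * p for x, p in freqs.items())
    return fraction_kept * var


full_f2 = {1: Fraction(1, 4), 0: Fraction(1, 2), -1: Fraction(1, 4)}
homozygotes_only = {1: Fraction(1, 2), -1: Fraction(1, 2)}

# Phenotype everyone (1QQ:2Qq:1qq) versus drop all heterozygotes (1QQ:1qq).
info_full = additive_information(full_f2)
info_selected = additive_information(homozygotes_only, Fraction(1, 2))

# General-effects design keeps 3/4 of the mice per locus (1QQ:1Qq:1qq),
# so two unlinked loci keep (3/4)^2 of the panel and drop the rest.
kept_two_loci = Fraction(3, 4) ** 2
dropped_two_loci = 1 - kept_two_loci
```

Both designs give information 1/2 per panel mouse: halving n is exactly offset by doubling the variance of the genotype code. In practice selection is on markers near the putative QTL, so the gain is somewhat smaller than this idealized case.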


Traits NCSU QTL II: Yandell © 2005 13

is this relevant to large QTL studies?

• why not phenotype entire mapping panel?
  – selectively phenotype subset of 50-67%
  – may capture most effects
  – with little loss of power
• two-stage selective phenotyping?
  – genotype & phenotype subset of 100-300
    • could selectively phenotype using whole genome
  – QTL map to identify key genomic regions
  – selectively phenotype subset using key regions

Traits NCSU QTL II: Yandell © 2005 14

3. why are traits correlated?
• environmental correlation
  – non-genetic, controllable by design
  – historical correlation (learned behavior)
  – physiological correlation (same body)
• genetic correlation
  – pleiotropy
    • one gene, many functions
    • common biochemical pathway, splicing variants
  – close linkage
    • two tightly linked genes
    • genotypes Q are collinear

Traits NCSU QTL II: Yandell © 2005 15

interplay of pleiotropy & correlation

[figure panels: pleiotropy only, both, correlation only]
Korol et al. (2001)

Traits NCSU QTL II: Yandell © 2005 16

3 correlated traits
(Jiang Zeng 1995)

ellipses centered on genotypic value
width for nominal frequency
main axis angle environmental correlation
3 QTL, F2
27 genotypes

note signs of genetic and environmental correlation

[figure: three scatterplot panels of trait pairs]
x = jiang3, y = jiang2: ρP = 0.06, ρG = 0.68, ρE = -0.2
x = jiang2, y = jiang1: ρP = 0.3, ρG = 0.54, ρE = 0.2
x = jiang3, y = jiang1: ρP = -0.07, ρG = -0.22, ρE = 0

Traits NCSU QTL II: Yandell © 2005 17

pleiotropy or close linkage?
2 traits, 2 qtl/trait
pleiotropy @ 54cM
linkage @ 114,128cM
Jiang Zeng (1995)

Traits NCSU QTL II: Yandell © 2005 18

4. modern high throughput biology
• measuring the molecular dogma of biology
  – DNA → RNA → protein → metabolites
  – measured one at a time only a few years ago
• massive array of measurements on whole systems (“omics”)
  – thousands measured per individual (experimental unit)
  – all (or most) components of system measured simultaneously
    • whole genome of DNA: genes, promoters, etc.
    • all expressed RNA in a tissue or cell
    • all proteins
    • all metabolites
• systems biology: focus on network interconnections
  – chains of behavior in ecological community
  – underlying biochemical pathways
• genetics as one experimental tool
  – perturb system by creating new experimental cross
  – each individual is a unique mosaic

Traits NCSU QTL II: Yandell © 2005 19

coordinated expression in mouse genome (Schadt et al. 2003)

expression pleiotropy in yeast genome (Brem et al. 2002)

Traits NCSU QTL II: Yandell © 2005 20

finding heritable traits
(from Christina Kendziorski)

• reduce 30,000 traits to 300-3,000 heritable traits
• probability a trait is heritable
  pr(H|Y,Q) = pr(Y|Q,H) pr(H|Q) / pr(Y|Q)   (Bayes rule)
  pr(Y|Q) = pr(Y|Q,H) pr(H|Q) + pr(Y|Q, not H) pr(not H|Q)
• phenotype averaged over genotypic mean µ
  pr(Y|Q, not H) = f0(Y) = ∫ f(Y|G) pr(G) dG   if not H
  pr(Y|Q, H) = f1(Y|Q) = ∏q f0(Yq)   if heritable
  Yq = {Yi | Qi = q} = trait values with genotype Q = q
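The Bayes-rule calculation on this slide can be made concrete once f and pr(G) are chosen. The sketch below assumes a normal likelihood with known variance `sigma2` and a normal prior on the genotypic mean with mean `mu0` and variance `tau2`, which gives f0 in closed form. These distributional choices, the fixed hyperparameters and the toy data are all assumptions for illustration; the EB arrays approach estimates the prior from the data rather than fixing it.

```python
import math


def log_f0(y, sigma2, mu0, tau2):
    """log f0(Y): normal data with a common mean G, integrated over G ~ N(mu0, tau2)."""
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)
    shrink = 1.0 + n * tau2 / sigma2
    quad = (ss + n * (ybar - mu0) ** 2 / shrink) / sigma2
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log(shrink) - 0.5 * quad)


def prob_heritable(y_by_genotype, prior_h, sigma2, mu0, tau2):
    """pr(H | Y, Q) by Bayes rule, with f1(Y | Q) = product over q of f0(Yq)."""
    all_y = [v for vals in y_by_genotype.values() for v in vals]
    log_not_h = math.log(1 - prior_h) + log_f0(all_y, sigma2, mu0, tau2)
    log_h = math.log(prior_h) + sum(
        log_f0(vals, sigma2, mu0, tau2) for vals in y_by_genotype.values())
    top = max(log_h, log_not_h)                 # stabilize the exponentials
    num = math.exp(log_h - top)
    return num / (num + math.exp(log_not_h - top))


# Toy traits (hypothetical values): one separates by genotype, one does not.
separated = {"QQ": [1.0, 1.1, 0.9, 1.05],
             "Qq": [0.0, 0.1, -0.1, 0.05],
             "qq": [-1.0, -0.9, -1.1, -1.05]}
flat = {"QQ": [0.1, -0.1, 0.2, -0.2],
        "Qq": [0.1, -0.1, 0.2, -0.2],
        "qq": [0.1, -0.1, 0.2, -0.2]}

post_separated = prob_heritable(separated, prior_h=0.1, sigma2=0.04, mu0=0.0, tau2=1.0)
post_flat = prob_heritable(flat, prior_h=0.1, sigma2=0.04, mu0=0.0, tau2=1.0)
```

The non-heritable model is penalized only when genotype groups have different means; when they do not, the extra integrals in f1 act as an automatic complexity penalty and the posterior falls below the prior.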


Traits NCSU QTL II: Yandell © 2005 21

hierarchical model for expression phenotypes
(EB arrays: Christina Kendziorski)

mRNA phenotype models given genotypic mean Gq
  YQQ ~ f(· | GQQ)
  YQq ~ f(· | GQq)
  Yqq ~ f(· | Gqq)

common prior on Gq across all mRNA
(use empirical Bayes to estimate prior)
  Gq ~ pr(·)

Traits NCSU QTL II: Yandell © 2005 22

expression meta-traits: pleiotropy
• reduce 3,000 heritable traits to 3 meta-traits(!)
• what are expression meta-traits?
  – pleiotropy: a few genes can affect many traits
    • transcription factors, regulators
  – weighted averages: Z = YW
    • principal components, discriminant analysis
• infer genetic architecture of meta-traits
  – model selection issues are subtle
    • missing data, non-linear search
    • what is the best criterion for model selection?
  – time consuming process
    • heavy computation load for many traits
    • subjective judgement on what is best
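For the principal-components case, the weights W in Z = YW are the leading eigenvector of the trait covariance matrix. A minimal sketch for two traits follows, using the closed-form eigen-decomposition of a 2x2 matrix so that no external library is needed. The expression values are made up for illustration and `pca_two_traits` is a hypothetical helper name.

```python
import math


def pca_two_traits(y1, y2):
    """First principal component of two traits.

    Returns the unit weight vector W, the meta-trait scores Z = YW (computed on
    centered traits), and the share of total variance carried by PC1.
    """
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    c1 = [v - m1 for v in y1]
    c2 = [v - m2 for v in y2]
    s11 = sum(a * a for a in c1) / (n - 1)
    s22 = sum(b * b for b in c2) / (n - 1)
    s12 = sum(a * b for a, b in zip(c1, c2)) / (n - 1)
    # Eigenvalues of the covariance matrix [[s11, s12], [s12, s22]].
    half_trace = (s11 + s22) / 2
    root = math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for the larger eigenvalue, scaled to unit length.
    if s12 != 0:
        w = (s12, lam1 - s11)
    else:
        w = (1.0, 0.0) if s11 >= s22 else (0.0, 1.0)
    norm = math.hypot(w[0], w[1])
    w = (w[0] / norm, w[1] / norm)
    z = [w[0] * a + w[1] * b for a, b in zip(c1, c2)]
    return w, z, lam1 / (lam1 + lam2)


# Hypothetical log2 expression values for two strongly correlated mRNA.
mrna_a = [8.2, 8.5, 8.7, 8.9, 9.1, 9.4]
mrna_b = [7.6, 7.9, 8.0, 8.3, 8.4, 8.6]
weights, meta_trait, pc1_share = pca_two_traits(mrna_a, mrna_b)
```

The meta-trait can then be mapped like any single quantitative trait. Note that PC weights ignore genotype entirely, whereas discriminant analysis chooses W to separate genotype groups.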


Traits NCSU QTL II: Yandell © 2005 23

PC for two correlated mRNA

[figure: left panel: scatterplot of ettf1 vs. etif3s6 expression; right panel: PC2 (7%) vs. PC1 (93%)]

Traits NCSU QTL II: Yandell © 2005 24

PC across microarray functional groups

Affy chips on 60 mice
~40,000 mRNA

2500+ mRNA show DE
(via EB arrays with marker regression)

1500+ organized in 85 functional groups
2-35 mRNA / group

which are interesting? examine PC1, PC2

circle size = # unique mRNA

Traits NCSU QTL II: Yandell © 2005 25

84 PC meta-traits by functional group
focus on 2 interesting groups

Traits NCSU QTL II: Yandell © 2005 26

red lines: peak for PC meta-trait

black/blue: peaks for mRNA traits

arrows: cis-action?

Traits NCSU QTL II: Yandell © 2005 27

(portion of) chr 4 region chr 15 region

?

Traits NCSU QTL II: Yandell © 2005 28

interaction plots for DA meta-traits
DA for all pairs of markers: separate 9 genotypes based on markers
(a) same locus pair found with PC meta-traits
(b) Chr 2 region interesting from biochemistry (Jessica Byers)
(c) Chr 5 & Chr 9 identified as important for insulin, SCD

Traits NCSU QTL II: Yandell © 2005 29

comparison of PC and DA meta-traits on 1500+ mRNA traits

genotypes from Chr 4/Chr 15 locus pair (circle=centroid)

PC captures spread without genotype

DA creates best separation by genotype

correlation of PC and DA meta-traits

note better spread of circles

PC ignores genotype; DA uses genotype

[figure: four scatterplot panels of per-mouse scores, each point labeled by its two-locus genotype (A, H or B at each locus, e.g. H.H, A.B): PC2 (12%) vs. PC1 (25%); DA2 (18%) vs. DA1 (37%); DA1 (37%) vs. PC1 (25%); DA2 (18%) vs. PC2 (12%)]

Traits NCSU QTL II: Yandell © 2005 30

relating meta-traits to mRNA traits

[figure: y-axis labels: SCD trait (log2 expression); DA meta-trait (standard units)]

Traits NCSU QTL II: Yandell © 2005 31

DA: a cautionary tale
(184 mRNA with |cor| > 0.5; mouse 13 drives heritability)

Traits NCSU QTL II: Yandell © 2005 32

building graphical models

• infer genetic architecture of meta-trait
  – E(Z | Q, M) = µq = β0 + ∑{q in M} βqk
• find mRNA traits correlated with meta-trait
  – Z ≈ YW for modest number of traits Y
• extend meta-trait genetic architecture
  – M = genetic architecture for Y
  – expect subset of QTL to affect each mRNA
  – may be additional QTL for some mRNA

Traits NCSU QTL II: Yandell © 2005 33

posterior for graphical models
• posterior for graph given multivariate trait & architecture
  pr(G | Y, Q, M) = pr(Y | Q, G) pr(G | M) / pr(Y | Q)
  – pr(G | M) = prior on valid graphs given architecture
• multivariate phenotype averaged over genotypic mean µ
  pr(Y | Q, G) = f1(Y | Q, G) = ∏q f0(Yq | G)
  f0(Yq | G) = ∫ f(Yq | µ, G) pr(µ) dµ
• graphical model G implies correlation structure on Y
• genotype mean prior assumed independent across traits
  pr(µ) = ∏t pr(µt)

Traits NCSU QTL II: Yandell © 2005 34

from graphical models to pathways

• build graphical models
  QTL → RNA1 → RNA2
  – class of possible models
  – best model = putative biochemical pathway
• parallel biochemical investigation
  – candidate genes in QTL regions
  – laboratory experiments on pathway components

Traits NCSU QTL II: Yandell © 2005 35

graphical models (with Elias Chaibub)
f1(Y | Q, G=g) = f1(Y1 | Q) f1(Y2 | Q, Y1)

[diagram: nodes QTL, D1, R1, P1, D2, R2, P2 (legend: DNA, RNA, protein); edge labels: observable cis-action?, observable trans-action, unobservable meta-trait]

Traits NCSU QTL II: Yandell © 2005 36

summary
• expression QTL are complicated
  – need to consider multiple interacting QTL
• coherent approach for high-throughput traits
  – identify heritable traits
  – dimension reduction to meta-traits
  – mapping genetic architecture
  – extension via graphical models to networks
• many open questions
  – model selection
  – computation efficiency
  – inference on graphical models