
Estimating genetic variation within families

Peter M. Visscher

Queensland Institute of Medical Research

Brisbane, Australia

peter.visscher@qimr.edu.au

1

Overview

• Estimation of genetic parameters
• Variation in identity
• Applications
– mean and variance of genome-wide IBD sharing for sibpairs
– estimation of heritability of height
– genome partitioning of genetic variation

2

Estimation of genetic parameters

• Model
– expected covariance between relatives
  • Genetics
  • Environment
• Data
– correlation/regression of observations between relatives
• Statistical method
– ANOVA
– regression
– maximum likelihood
– Bayesian analysis

3

[Figure: Galton, 1889]

4

The height vs. pea debate

(early 1900s)

Do quantitative traits have the same hereditary and evolutionary properties as discrete characters?

Biometricians Mendelians

5

RA Fisher (1918). Transactions of the Royal Society of Edinburgh 52: 399-433.

[Figure: trait scale showing the genotypic values m-a, m+d and m+a for the three genotypes (QQ, Qq, qq) at a single locus]

6

Genetic covariance between relatives

$\mathrm{cov}_G(y_i, y_j) = a_{ij}\sigma_A^2 + d_{ij}\sigma_D^2$

a = additive coefficient of relationship
  = 2 * coefficient of kinship (= E(π))

d = coefficient of fraternity
  = Prob(2 alleles are IBD)

7

Examples (no inbreeding)

Relatives              a     d
MZ twins               1     1
Parent-offspring       ½     0
Fullsibs               ½     ¼
Double first cousins   ¼     1/16
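As a minimal sketch of how the coefficients in this table enter the covariance formula cov_G = a σA² + d σD², the Python snippet below evaluates the expected genetic covariance for each type of relative. The function name and the variance components (0.6 and 0.2) are hypothetical, chosen only to make the arithmetic visible; they are not estimates from the talk.

```python
from fractions import Fraction

# (a, d) pairs copied from the table of examples (no inbreeding).
COEFFICIENTS = {
    "MZ twins": (Fraction(1), Fraction(1)),
    "Parent-offspring": (Fraction(1, 2), Fraction(0)),
    "Fullsibs": (Fraction(1, 2), Fraction(1, 4)),
    "Double first cousins": (Fraction(1, 4), Fraction(1, 16)),
}


def genetic_covariance(relatives, var_a, var_d):
    """Expected genetic covariance a * var_A + d * var_D for a pair of relatives."""
    a, d = COEFFICIENTS[relatives]
    return a * var_a + d * var_d


# Hypothetical variance components, for illustration only.
var_a, var_d = Fraction(6, 10), Fraction(2, 10)
for name in COEFFICIENTS:
    print(name, float(genetic_covariance(name, var_a, var_d)))
```

Parent-offspring pairs have d = 0, so their covariance carries no dominance variance, whereas fullsibs pick up a quarter of it.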

8

Controversy/confounding: nature vs nurture

• Is observed resemblance between relatives genetic or environmental?
– MZ & DZ twins (shared environment)
– Fullsibs (dominance & shared environment)
• Estimation and statistical inference
– Different models with many parameters may fit data equally well

9

Total mole count for MZ and DZ twins

[Figure: two scatter plots of total mole count, Twin 1 against Twin 2, one for MZ pairs and one for DZ pairs; both axes run from 0 to 400]

MZ twins - 153 pairs, r = 0.94

DZ twins - 199 pairs, r = 0.60

10

Sources of variation in Queensland school test results of 16-year olds

[Figure: chart with three components (additive genetic, shared environment, non-shared environment); the segments are labelled 12%, 78% and 10%]

11

An unbiased approach

Estimate genetic variance within families

12

Actual or realised genetic relationship

= proportion of genome shared IBD (πa)

• Varies around the expectation
– Apart from parent-offspring and MZ twins

• Can be estimated using marker data

13

[Figure: cross between two parents; each of the four possible offspring genotypes has probability 1/4]

14

IDENTITY BY DESCENT

[Figure: grid of the possible genotypes of Sib 1 against those of Sib 2]

4/16 = 1/4 sibs share BOTH parental alleles (IBD = 2)

8/16 = 1/2 sibs share ONE parental allele (IBD = 1)

4/16 = 1/4 sibs share NO parental alleles (IBD = 0)

15
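The 4/16, 8/16 and 4/16 counts can be checked by brute force. The sketch below, in Python, enumerates which paternal and which maternal allele each sib receives and counts how many alleles the pair shares IBD. It assumes non-inbred, unrelated parents and Mendelian segregation; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each sib inherits one of two paternal alleles (0 or 1) and one of two
# maternal alleles (0 or 1), so there are 4 equally likely outcomes per sib
# and 4 x 4 = 16 equally likely outcomes per sib pair.
transmissions = list(product((0, 1), repeat=2))  # (paternal, maternal)

counts = Counter()
for sib1, sib2 in product(transmissions, repeat=2):
    # Number of alleles shared IBD: same paternal allele plus same maternal allele.
    ibd = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
    counts[ibd] += 1

ibd_probs = {state: Fraction(counts[state], 16) for state in (0, 1, 2)}
print(ibd_probs)
```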

Single locus

Relatives            E(πa)   var(πa)
Fullsibs             ½       1/8
Halfsibs             ¼       1/16
Double 1st cousins   ¼       3/32
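These single-locus moments follow directly from the probabilities of sharing 0, 1 or 2 alleles IBD, because πa takes the values 0, ½ and 1 in those three states. The Python sketch below is one way to verify them. The fullsib probabilities come from the 4/16, 8/16, 4/16 split on the identity-by-descent slide; the halfsib and double-first-cousin probabilities are standard values for non-inbred pedigrees that are assumed here rather than taken from the slides.

```python
from fractions import Fraction

# IBD-state probabilities (k0, k1, k2). Halfsib and double-first-cousin
# values are assumed standard results, not quoted in the talk.
K = {
    "Fullsibs": (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
    "Halfsibs": (Fraction(1, 2), Fraction(1, 2), Fraction(0)),
    "Double 1st cousins": (Fraction(9, 16), Fraction(6, 16), Fraction(1, 16)),
}


def pi_a_moments(k0, k1, k2):
    """Mean and variance of pi_a, which equals 0, 1/2 or 1 for IBD 0, 1 or 2."""
    values = (Fraction(0), Fraction(1, 2), Fraction(1))
    probs = (k0, k1, k2)
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * v * v for p, v in zip(probs, values)) - mean**2
    return mean, var


for name, k in K.items():
    print(name, *pi_a_moments(*k))
```

For n unlinked loci the mean is unchanged and the variance of the average is the single-locus variance divided by n, since the loci segregate independently.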

16

Several notations

IBD     Probability   Actual
IBD0    k0            0 or 1
IBD1    k1            0 or 1
IBD2    k2            0 or 1
        Σ = 1         Σ = 1

Realisations (k0, k1, k2): (1, 0, 0), (0, 1, 0) or (0, 0, 1)

πa = ½k1 + k2 = R = 2θ

πd = k2 = ∆xy

[e.g., LW Chapter 7; Hill and Weir 2011, Genetics Research]

17

n multiple unlinked loci

Relatives            E(πa)   var(πa)
Fullsibs             ½       1/(8n)
Halfsibs             ¼       1/(16n)
Double 1st cousins   ¼       3/(32n)

18

Loci are on chromosomes

• Segregation of large chromosome segments within families
– increasing variance of IBD sharing
• Independent segregation of chromosomes
– decreasing variance of IBD sharing

19

Theoretical SD of πa

Relatives            1 chrom (1 M)   genome (35 M)
Fullsibs             0.217           0.038
Halfsibs             0.154           0.027
Double 1st cousins   0.173           0.030

[Stam 1980; Hill 1993; Guo 1996; Hill & Weir 2011]

20

Fullsibs: genome-wide (Total length L Morgan)

$\mathrm{var}(\pi_a) \approx 1/(16L) - 1/(3L^2)$

$\mathrm{var}(\pi_d) \approx 5/(64L) - 1/(3L^2)$

$\mathrm{var}(\pi_d) / \mathrm{var}(\pi_a) \approx 1.3$ if L = 35

[Stam 1980; Hill 1993; Guo 1996]

Genome-wide variance depends more on total genome length than on the number of chromosomes
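The two fullsib approximations are simple enough to evaluate directly. The Python sketch below plugs in a total map length of L = 35 Morgan; the function names are illustrative. The formulas are large-L approximations and should not be applied to a single chromosome of about 1 Morgan, where the second term dominates and the result becomes negative.

```python
import math


def fullsib_var_pi_a(total_length_morgan):
    """Approximate genome-wide variance of additive IBD sharing for fullsibs."""
    L = total_length_morgan
    return 1 / (16 * L) - 1 / (3 * L**2)


def fullsib_var_pi_d(total_length_morgan):
    """Approximate genome-wide variance of dominance IBD sharing for fullsibs."""
    L = total_length_morgan
    return 5 / (64 * L) - 1 / (3 * L**2)


L = 35.0
sd_a = math.sqrt(fullsib_var_pi_a(L))
sd_d = math.sqrt(fullsib_var_pi_d(L))
ratio = fullsib_var_pi_d(L) / fullsib_var_pi_a(L)
print(sd_a, sd_d, ratio)
```

With L = 35 the standard deviations come out at about 0.039 and 0.044, in line with the theoretical genome-wide values tabulated in the talk, and the variance ratio is close to 1.3.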

21

Fullsibs: Correlation additive and dominance relationships

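$r(\pi_a, \pi_d) = \sigma(\pi_a) / \sigma(\pi_d) \approx \left[ \frac{1/(16L)}{5/(64L)} \right]^{0.5} = 0.89$

Using cov(πa, πd) = var(πa), i.e. a regression coefficient of πd on πa equal to 1

Difficult but not impossible to disentangle additive and dominance variance

NB Practical

22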

Summary

Additive and dominance (fullsibs)

                       SD(πa)   SD(πd)
Single locus           0.354    0.433
One chromosome (1 M)   0.217    0.247
Whole genome (35 M)    0.038    0.043

Predicted correlation (genome-wide πa and πd): 0.89

23

Application (1)

Aim: estimate genetic variance from actual relationships between fullsib pairs

• Two cohorts of Australian twin families

                          Adolescent   Adult
Families                  500          1512
Individuals               1201         3804
Sibpairs with genotypes   950          3451
Markers per individual    211-791      201-1717
Average marker spacing    6 cM         5 cM

24

Application (1)

• Phenotype = height

Number of sibpairs with phenotypes and genotypes

Adolescent cohort   931
Adult cohort        2444
Combined            3375

25

Mean IBD sharing across the genome for the jth sib pair was based on IBD estimated from Merlin every centimorgan and averaged over all 3491 points

additive: $\hat{\pi}_{a(j)} = \sum_{i=1}^{3491} \hat{\pi}_{a(ij)} / 3491$

dominance: $\hat{\pi}_{d(j)} = \sum_{i=1}^{3491} \hat{p}_{2(ij)} / 3491$

26

And for the cth chromosome of length $l_c$ cM

additive: $\hat{\pi}^{c}_{a(j)} = \sum_{i=1}^{l_c} \hat{\pi}_{a(ij)} / l_c$

dominance: $\hat{\pi}^{c}_{d(j)} = \sum_{i=1}^{l_c} \hat{p}_{2(ij)} / l_c$
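The averaging step can be sketched in a few lines of Python. The function below takes, for one sib pair, the estimated probabilities of sharing one and two alleles IBD at each grid position and returns the mean additive and dominance sharing, using πa = ½k1 + k2 and πd = k2 from the notation slide. The function name, the argument layout and the four-point toy input are assumptions made for illustration; they do not reproduce Merlin's output format.

```python
def mean_ibd_sharing(p1, p2):
    """Average additive and dominance IBD sharing over grid positions.

    p1[i] and p2[i] are the estimated probabilities that the sib pair
    shares exactly one or exactly two alleles IBD at position i.
    """
    if not p1 or len(p1) != len(p2):
        raise ValueError("p1 and p2 must be non-empty and of equal length")
    n = len(p1)
    pi_a = sum(0.5 * one + two for one, two in zip(p1, p2)) / n
    pi_d = sum(p2) / n
    return pi_a, pi_d


# Toy input with four positions (hypothetical values, not real data).
p1 = [0.5, 0.5, 1.0, 0.0]
p2 = [0.25, 0.5, 0.0, 1.0]
print(mean_ibd_sharing(p1, p2))
```

Passing all genome-wide grid points gives the genome-wide coefficients; passing only the positions on one chromosome gives the chromosome-specific coefficients.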

27

Mean and SD of genome-wide additive relationships

28

Mean and SD of genome-wide dominance relationships

29

Empirical and theoretical SD of additive relationships

correlation = 0.98 (n = 4401)

30

Empirical and theoretical SD of dominance relationships

correlation = 0.98 (n = 4401)

31

Additive and dominance relationships

correlation = 0.91 (n = 4401)

32

Phenotypes

After adjustment for sex and age: σp = 7.7 cm; σp = 6.9 cm

33

Phenotypic correlation between siblings

              Raw    After age & sex
Adolescents   0.33   0.40
Adults        0.24   0.39

34

Models

C = Family effect
A = Genome-wide additive genetic
E = Residual

Full model      C + A + E
Reduced model   C + E

35

Estimation

• Maximum Likelihood variance components

• Likelihood-ratio-test (LRT) to calculate P-values for hypotheses
  H0: A = 0
  H1: A > 0
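Because the alternative A > 0 is one-sided and A = 0 lies on the boundary of the parameter space, a common convention is to refer the LRT statistic to a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. The slides do not state which reference distribution was used, so the Python sketch below is only an illustration of that convention; the function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_1df_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_pvalue_boundary(loglik_full, loglik_reduced):
    """P-value for H0: A = 0 against H1: A > 0 under the 50:50 mixture convention."""
    lrt = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    if lrt == 0.0:
        return 1.0  # both mixture components have all their mass at or above zero
    return 0.5 * chi2_1df_sf(lrt)


# Hypothetical log-likelihoods for the full (C + A + E) and reduced (C + E) models.
print(lrt_pvalue_boundary(-100.0, -103.0))
```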

36

Estimates: null model (CE)

Cohort       Family effect (C)
Adolescent   0.40 (0.34 – 0.45)
Adult        0.39 (0.36 – 0.43)
Combined     0.39 (0.36 – 0.42)

37

Estimates: full model (ACE)

Cohort       C   A      P
Adolescent   0   0.80   0.0869
Adult        0   0.80   0.0009
Combined     0   0.80   0.0003

►All family resemblance due to additive genetic variation

38

Sampling variances are large

Cohort       A (95% CI)
Adolescent   0.80 (0.00 – 0.90)
Adult        0.80 (0.43 – 0.86)
Combined     0.80 (0.46 – 0.85)

39

C+A more accurately estimated

Cohort       C+A (95% CI)
Adolescent   0.80 (0.36 – 0.90)
Adult        0.80 (0.61 – 0.86)
Combined     0.80 (0.62 – 0.85)

►Prediction of MZ correlation from fullsibs!

40

Power and SE of estimates

• True parameters (t)
• Sample size (n)
• Variance in genome-wide IBD sharing (var(π))

$\mathrm{NCP} = n h^4 \mathrm{var}(\pi)(1 + t^2) / (1 - t^2)^2$

$\mathrm{var}(\hat{h}^2) \approx (1 - t^2)^2 / [n(1 + t^2)\mathrm{var}(\pi)]$
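To get a feel for these expressions, the Python sketch below evaluates the non-centrality parameter and the approximate standard error of the heritability estimate. The function names are illustrative. The inputs (3375 sib pairs, a sib correlation of 0.39, an SD of genome-wide sharing of 0.038 and a heritability of 0.8) are taken from other slides in the talk and combined here purely as an example; this combination is an assumption, not a calculation presented by the author.

```python
import math


def ncp(n, h2, t, var_pi):
    """Non-centrality parameter: n * h^4 * var(pi) * (1 + t^2) / (1 - t^2)^2."""
    return n * h2**2 * var_pi * (1 + t**2) / (1 - t**2) ** 2


def se_h2(n, t, var_pi):
    """Approximate standard error of the heritability estimate."""
    return math.sqrt((1 - t**2) ** 2 / (n * (1 + t**2) * var_pi))


n, t, sd_pi, h2 = 3375, 0.39, 0.038, 0.8
print(ncp(n, h2, t, sd_pi**2), se_h2(n, t, sd_pi**2))
```

With these inputs the approximate SE is roughly 0.36, which illustrates why the sampling variances are large even with several thousand pairs. The two expressions are linked by NCP = h⁴ / var(ĥ²), so quadrupling n doubles the precision.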

41

Application (2)

Genome partitioning of additive genetic variance for height

• Aims
– Estimate genetic variance from genome-wide IBD in larger sample
– Partition genetic variance to individual chromosomes
  • using chromosome-wide coefficients of relationship
– Test hypotheses about the distribution of genetic variance in the genome

42

Sample   # Sibpairs   Sib Correlation
AU       5952         0.43
US       3996         0.50
NL       1266         0.45
Total    11,214       0.46

43

Realised relationships

Mean    0.499
Range   0.31 – 0.64
SD      0.036

44

Estimates from genome-wide additive and dominance coefficients

ACE model
Heritability   0.86 (0.49 – 0.95)   P<0.0001
Family         0.03 (0.00 – 0.03)   P=0.38

ADCE model
Additive component    0.70
Dominance component   0.16 (P=0.35)

45

46

Estimates of chromosomal heritabilities

[Figure: chromosomal heritability estimates from the combined chromosome analysis (y-axis) against those from single chromosome analyses (x-axis); both axes run from 0.00 to 0.12]

Fitted line: y = 1.006x + 0.0001, R² = 0.9715

No epistasis?

47

[Figure: estimate of heritability (y-axis, 0.00 to 0.14) against length of chromosome in cM (x-axis, 50 to 300); points are labelled by chromosome number, 1 to 22]

WLS analysis: P<0.001; intercept NS

Longer chromosomes explain more additive genetic variance: ~0.03 per 100 cM

48

Estimates are consistent across countries

[Figure: chromosomal heritability estimates for the USA sample (y-axis, 0.00 to 0.20) against those for the AU sample (x-axis, 0.00 to 0.12); points are labelled by chromosome number]

Fitted line through the origin: y = 0.9623x, R² = 0.2813

49

Stepwise analyses: at least 6 chromosomes are needed to explain the additive genetic variance

[Figure: scaled AIC (y-axis, 0 to -20) against number of chromosomal additive genetic variance components (x-axis, 0 to 8)]

50

Hypothesis test

Model              h²     c²     df   LRT
Full (22 chrom.)   0.92   0.00   22
Genome-wide        0.86   0.03   1    19.2

Additive genetic variance in proportion to length not rejected
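The "not rejected" conclusion can be illustrated by referring the LRT of 19.2 to a chi-square distribution. The Python sketch below assumes that the degrees of freedom of the test are the difference in the number of genetic variance components, 22 − 1 = 21, and that a plain chi-square reference is adequate even though variance components are constrained to be non-negative; neither assumption is stated on the slide. The upper tail is computed from the standard recurrence for the regularised incomplete gamma function, using only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2:  # odd df: start from the 1-df tail
        sf, nu = math.erfc(math.sqrt(half_x)), 1
    else:  # even df: start from the 2-df tail
        sf, nu = math.exp(-half_x), 2
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with a = nu / 2.
    while nu < df:
        sf += half_x ** (nu / 2.0) * math.exp(-half_x) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return sf


# LRT from the table; 21 degrees of freedom is an assumption (22 - 1).
p_value = chi2_sf(19.2, 21)
print(p_value)
```

Since 19.2 is below the mean of a chi-square with 21 degrees of freedom, the P-value is well above 0.05, consistent with the statement that the proportional model is not rejected.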

51

Data consistent with published QTL results

[Figure: estimate of heritability per chromosome (y-axis, 0.00 to 0.12) against number of publications with LOD > 1.9 (x-axis, 0 to 4); points are labelled by chromosome number]

Rank test: P=0.002

52

Conclusions

• Empirical variation in genome-wide IBD sharing follows theoretical predictions
• Genetic variance can be estimated from genome-wide IBD within families
– results for height consistent with estimates from between-relative comparisons
– no assumptions about nature/nurture causes of family resemblance
• Genetic variance can be partitioned onto chromosomes

53

Conclusions

• With large sample sizes it will become possible to estimate
– dominance variance
– epistatic variance
– genome-wide parent-of-origin variance
– genetic relative risk to disease

54

Genetic architecture for height

• Additive genetic variance
• No QTL of large effects
• Chromosomes explain ~10% of genetic variance
• Consequences for genome-wide association

55

Other applications: breeding programmes

• Exploit variance in genome-wide IBD by using the realised A-matrix
– large increase in accuracy of selection if
  • variance in identity is large
  • family size is large

• “Genomic Selection”

56

Using the realised A-matrix: Reliability of EBV for an unphenotyped individual from n-1 phenotyped relatives (a simulation study)

57
