Estimating genetic variation within families
Peter M. VisscherQueensland Institute of Medical
ResearchBrisbane, Australia
Overview
• Estimation of genetic parameters• Variation in identity• Applications
– mean and variance of genome-wide IBD sharing for sibpairs
– estimation of heritability of height– genome partitioning of genetic variation
2
Estimation of genetic parameters
• Model– expected covariance between relatives
• Genetics• Environment
• Data– correlation/regression of observations between
relatives• Statistical method
– ANOVA– regression– maximum likelihood– Bayesian analysis
3
[Galton, 1889] 4
The height vs. pea debate
(early 1900s)
Do quantitative traits have the same hereditary and evolutionary properties as discrete characters?
Biometricians Mendelians
5
RA Fisher (1918). Transactions of the Royal Societyof Edinburgh52: 399-433.
m-a m+d m+a
Trait
m-a m+d m+a
Trait
6
Genetic covariance between relatives
covG(yi,yj) = aijσA2 + dijσD
2
a = additive coefficient of relationship= 2 * coefficient of kinship (= E(π))
d = coefficient of fraternity= Prob(2 alleles are IBD)
7
Examples (no inbreeding)
Relatives a d
MZ twins 1 1Parent-offspring ½ 0Fullsibs ½ ¼Double first cousins ¼ 1/16
8
Controversy/confounding:nature vs nurture
• Is observed resemblance between relatives genetic or environmental?– MZ & DZ twins (shared environment)– Fullsibs (dominance & shared environment)
• Estimation and statistical inference– Different models with many parameters may
fit data equally well
9
Total mole count for MZ and DZ twins
0
100
200
300
400
0 100 200 300 400
Twin 2
Twin
1
0
100
200
300
400
0 100 200 300 400
Twin 2
Twin
1
MZ twins - 153 pairs, r = 0.94 DZ twins - 199 pairs, r = 0.60
10
Sources of variation in Queensland school test results of 16-year olds
12%78%
10%
Additivegenetic
Sharedenvironment
Non-sharedenvironment
11
An unbiased approach
Estimate genetic variance within
families
12
Actual or realised genetic relationship
= proportion of genome shared IBD (πa)
• Varies around the expectation– Apart from parent-offspring and MZ twins
• Can be estimated using marker data
13
x
1/4 1/4 1/4 1/414
IDENTITY BY DESCENTSib 1
Sib 2
4/16 = 1/4 sibs share BOTH parental alleles IBD = 2
8/16 = 1/2 sibs share ONE parental allele IBD = 1
4/16 = 1/4 sibs share NO parental alleles IBD = 015
Single locus
Relatives E(πa) var(πa)
Fullsibs ½ 1/8Halfsibs ¼ 1/16
Double 1st cousins ¼ 3/32
16
Several notations
IBD Probability Actual
IBD0 k0 0 or 1IBD1 k1 0 or 1IBD2 k2 0 or 1
Σ=1 Σ=1
πa = ½k1 + k2 = R = 2θπd = k2 = ∆xy
17
[e.g., LW Chapter 7; Weir and Hill 2011, Genetics Research]
Realisationsk0 k1 k2
1 0 00 1 00 0 1
n multiple unlinked loci
Relatives E(πa) var(πa)
Fullsibs ½ 1/8n
Halfsibs ¼ 1/16n
Double 1st cousins ¼ 3/32n
18
Loci are on chromosomes
• Segregation of large chromosome segments within families– increasing variance of IBD sharing
• Independent segregation of chromosomes– decreasing variance of IBD sharing
19
Theoretical SD of πa
Relatives 1 chrom (1 M) genome (35 M)
Fullsibs 0.217 0.038Halfsibs 0.154 0.027Double 1st cousins 0.173 0.030
[Stam 1980; Hill 1993; Guo 1996; Hill & Weir 2011]20
Fullsibs: genome-wide (Total length L Morgan)
var(πa) ≈ 1/(16L) – 1/(3L2)
var(πd) ≈ 5/(64L) – 1/(3L2)
var(πd)/ var(πa) ≈ 1.3 if L = 35
[Stam 1980; Hill 1993; Guo 1996]
Genome-wide variance depends more on total genome length than on the number of chromosomes
21
Fullsibs: Correlation additive and dominance relationships
r(πa, πd) = σ(πa) / σ(πd) ≈ [1/(16L) / (5/(64L))]0.5 = 0.89.
Using β(πa on πd) = 1
Difficult but not impossible to disentangle additive and dominance variance
NB Practical 22
SummaryAdditive and dominance (fullsibs)
SD(πa) SD(πd)
Single locus 0.354 0.433One chromsome (1M) 0.217 0.247Whole genome (35M) 0.038 0.043
Predicted correlation 0.89(genome-wide πa and πd)
23
Application (1)Aim: estimate genetic variance from actual
relationships between fullsib pairs
• Two cohorts of Australian twin families
Adolescent AdultFamilies 500 1512Individuals 1201 3804Sibpairs with genotypes 950 3451Markers per individual 211-791 201-1717Average marker spacing 6 cM 5 cM
24
Application (1)
• Phenotype = height
Number of sibpairs with phenotypesand genotypes
Adolescent cohort 931Adult cohort 2444Combined 3375
25
Mean IBD sharing across the genome for the jth sib pair was based on IBD estimated from Merlin every
centimorgan and averaged at all 3491 points
3491/ˆˆ3491
1)()( ∑
=
=i
ijaja ππ
3491/ˆ3491
1)(2)( ∑
=
=i
ijjd pπ
additive
dominance
26
And for the cth chromosome of length lc cM
c
l
i
cc lc
ijaja/ˆˆ
1)()( ∑
=
= ππ
c
l
iij
c lpc
jd/ˆ
1)(2)( ∑
=
=π
additive
dominance
27
Mean and SD of genome-wide additive relationships
28
Mean and SD of genome-wide dominance relationships
29
Empirical and theoretical SD of additive relationshipscorrelation = 0.98 (n = 4401)
30
Empirical and theoretical SD of dominance relationshipscorrelation = 0.98 (n = 4401)
31
Additive and dominance relationships correlation = 0.91 (n= 4401)
32
Phenotypes
After adjustment for sex and age:σp = 7.7 cm σp = 6.9 cm 33
Phenotypic correlation between siblings
Raw After age & sex
Adolescents 0.33 0.40Adults 0.24 0.39
34
Models
C= Family effectA = Genome-wide additive geneticE = Residual
Full model C + A + EReduced model C + E
35
Estimation
• Maximum Likelihood variance components
• Likelihood-ratio-test (LRT) to calculate P-values for hypothesesH0: A = 0H1: A > 0
36
Estimates: null model (CE)
Cohort Family effect (C)
Adolescent 0.40 (0.34 – 0.45)Adult 0.39 (0.36 – 0.43)Combined 0.39 (0.36 – 0.42)
37
Estimates: full model (ACE)
Cohort C A P
Adolescent 0 0.80 0.0869Adult 0 0.80 0.0009Combined 0 0.80 0.0003
►All family resemblance due to additive genetic variation
38
Sampling variances are large
Cohort A (95% CI)
Adolescent 0.80 (0.00 – 0.90)Adult 0.80 (0.43 – 0.86)Combined 0.80 (0.46 – 0.85)
39
F+A more accurately estimated
Cohort C+A (95% CI)
Adolescent 0.80 (0.36 – 0.90)Adult 0.80 (0.61 – 0.86)Combined 0.80 (0.62 – 0.85)
►Prediction of MZ correlation from fullsibs!
40
Power and SE of estimates
• True parameters (t)• Sample size (n)• Variance in genome-wide IBD sharing (var(π))
NCP = nh4var(π)(1+t2) / (1-t2)2
[ ]))var()(1(/)1()ˆvar( 2222 πntth +−≈
41
• Aims– Estimate genetic variance from genome-wide
IBD in larger sample– Partition genetic variance to individual
chromosomes• using chromosome-wide coefficients of relationship
– Test hypotheses about the distribution of genetic variance in the genome
Application (2)Genome partitioning of additive
genetic variance for height
42
Sample # Sibpairs Sib Correlation
AU 5952 0.43US 3996 0.50NL 1266 0.45
Total 11,214 0.46
43
Realised relationshipsMean 0.499Range 0.31 – 0.64SD 0.036
44
Estimates from genome-wide additive and dominance coefficients
ACE modelHeritability 0.86 (0.49 – 0.95) P<0.0001Family 0.03 (0.00 – 0.03) P=0.38
ADCE modelAdditive component 0.70Dominance component 0.16 (P=0.35)
45
46
y = 1.006x + 0.0001R2 = 0.9715
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.00 0.02 0.04 0.06 0.08 0.10 0.12
From single chromosome analyses
From
com
bine
d ch
rom
osom
e an
alys
is
Estimates of chromosomal heritabilities
No epistasis?
47
222120
19
18
17
16
15
14
13
12
11 10
9
8
7
6
5
4
3
2
1
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
50 100 150 200 250 300
Length of chromosome (cM)
Estim
ate
of h
erita
bilit
y
WLS analysis: P<0.001; intercept NS
Longer chromosomes explain more additive genetic variance: ~0.03 per 100 cM
48
19
9
7
3
17
14
4
8
15
12
18
1
2016
21
2213
1011
62 5
y = 0.9623xR2 = 0.2813
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
0.16
0.18
0.20
0.00 0.02 0.04 0.06 0.08 0.10 0.12
Heritability AU
Her
itabi
lity
USA
Estimates are consistent across countries
49
-20
-18
-16
-14
-12
-10
-8
-6
-4
-2
00 1 2 3 4 5 6 7 8
Number of chromosomal additive genetic variance components
Scal
ed A
ICStepwise analyses: at least 6 chromosomes are needed to explain the additive genetic variance
50
Hypothesis test
Model h2 c2 df LRT
Full (22 chrom.) 0.92 0.00 22Genome-wide 0.86 0.03 1 19.2
Additive genetic variance in proportion to lenght not rejected
51
17
612
18
15
9
34
8
14 5
10 13
17
19
216
2111
20220.00
0.02
0.04
0.06
0.08
0.10
0.12
0 1 2 3 4
Number of publications with LOD > 1.9
Estim
ate
of h
erita
bilit
y
Data consistent with published QTL results
Rank test: P=0.002
52
Conclusions
• Empirical variation in genome-wide IBD sharing follows theoretical predictions
• Genetic variance can be estimated from genome-wide IBD within families– results for height consistent with estimates from
between-relative comparisons– no assumptions about nature/nurture causes of family
resemblance• Genetic variance can be partitioned onto
chromosomes53
Conclusions
• With large sample sizes it will become possible to estimate – dominance variance– epistatic variance– genome-wide parent-of-origin variance
– genetic relative risk to disease
54
Genetic architecture for height
• Additive genetic variance• No QTL of large effects• Chromosomes explain ~10% of genetic
variance• Consequences for genome-wide
association
55
Other applications: breeding programmes
• Exploit variance in genome-wide IBD by using the realised A-matrix– large increase in accuracy of selection if
• variance in identity is large• family size is large
• “Genomic Selection”
56
Using the realised A-matrix: Reliability of EBV for an unphenotyped individual from n-1 phenotyped relatives(a simulation study)
57