Calculation of IBD probabilities
David Evans and Stacey Cherny
University of Oxford, Wellcome Trust Centre for Human Genetics
This Session…
- IBD vs IBS
- Why is IBD important?
- Calculating IBD probabilities
  - Lander-Green Algorithm (MERLIN)
    - Single locus probabilities
    - Hidden Markov Model
  - Other ways of calculating IBD status
    - Elston-Stewart Algorithm
    - MCMC approaches
- MERLIN
  - Practical Example
  - IBD determination
  - Information content mapping
  - SNPs vs micro-satellite markers?
Aim of Gene Mapping Experiments
Identify variants that control interesting traits:
- Susceptibility to human disease
- Phenotypic variation in the population
The hypothesis: individuals sharing these variants will be more similar for the traits they control.
The difficulty: testing ~10 million variants is impractical…
Identity-by-Descent (IBD)
Two alleles are IBD if they are descended from the same ancestral allele
If a stretch of chromosome is IBD among a set of individuals, ALL variants within that stretch will also be shared IBD (markers, QTLs, disease genes)
Allows large amounts of variation to be surveyed even when only a few polymorphisms are measured
A Segregating Disease Allele
[Pedigree diagram: each individual has genotype +/+ or +/mut]
All affected individuals IBD for disease causing mutation
Segregating Chromosomes
[Figure: segregating chromosomes with the MARKER and the DISEASE LOCUS labelled]
Affected individuals tend to share adjacent areas of chromosome IBD
Marker Shared Among Affecteds
“4” allele segregates with disease
[Pedigree diagram, marker genotypes by row: 1/2, 3/4; 4/4, 1/4, 2/4, 1/3, 3/4; 4/4, 1/4]
Why is IBD sharing important?
IBD sharing forms the basis of non-parametric linkage statistics
Affected relatives tend to share marker alleles close to the disease locus IBD more often than expected by chance
Linkage between QTL and marker
[Figure: Marker and QTL; panels for IBD 0, IBD 1 and IBD 2]
NO Linkage between QTL and marker
[Figure: Marker]
IBD vs IBS
[Figure: two sib-pair pedigrees with marker alleles 1 to 4]
First pedigree: Identical by Descent and Identical by State
Second pedigree: Identical by State only
Example: IBD in Siblings
Consider a mating between mother AB x father CD:
IBD 0 : 1 : 2 = 25% : 50% : 25%

| Sib1 \ Sib2 | AC | AD | BC | BD |
|---|---|---|---|---|
| AC | 2 | 1 | 1 | 0 |
| AD | 1 | 2 | 0 | 1 |
| BC | 1 | 0 | 2 | 1 |
| BD | 0 | 1 | 1 | 2 |
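The 25% : 50% : 25% split can be checked by enumerating all 16 equally likely transmissions from mother AB and father CD. Because every parental allele carries a distinct label here, two sibs share an allele IBD exactly when they received the same maternal (or the same paternal) allele. A small Python sketch:

```python
from collections import Counter
from itertools import product

mother = ("A", "B")
father = ("C", "D")

counts = Counter()
for m1, f1, m2, f2 in product(mother, father, mother, father):
    # Sib 1 receives m1 and f1, sib 2 receives m2 and f2
    counts[int(m1 == m2) + int(f1 == f2)] += 1

total = sum(counts.values())                          # 16 equally likely outcomes
dist = {k: counts[k] / total for k in sorted(counts)}
print(dist)   # {0: 0.25, 1: 0.5, 2: 0.25}
```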
IBD can be trivial…
[Pedigree: parents 1/2 and 1/2; offspring 1/1 and 2/2]
IBD=0
Two Other Simple Cases…
[Pedigree diagrams: two further cases with alleles 1 and 2]
IBD=2
A little more complicated…
[Pedigree: parents 1/2 and 2/2; offspring 1/2 and 1/2]
IBD=1 (50% chance)
IBD=2 (50% chance)
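These simple cases can be reproduced by enumerating the four meioses and keeping only transmissions consistent with the observed genotypes. This is a minimal sketch assuming both parents are typed and each meiosis transmits either parental allele copy with probability 1/2; the order of a parent's alleles only labels the two copies. `sib_ibd_distribution` is a hypothetical helper, not part of any package.

```python
from collections import Counter
from itertools import product

def sib_ibd_distribution(father, mother, sib1, sib2):
    """IBD distribution for a sib pair given typed parents (uniform prior over meioses)."""
    counts = Counter()
    # p = paternal meiosis, m = maternal meiosis; 0/1 picks the parent's first/second copy
    for p1, m1, p2, m2 in product((0, 1), repeat=4):
        kid1 = sorted((father[p1], mother[m1]))
        kid2 = sorted((father[p2], mother[m2]))
        if kid1 == sorted(sib1) and kid2 == sorted(sib2):
            # Same parental copy transmitted to both sibs = one allele shared IBD
            counts[int(p1 == p2) + int(m1 == m2)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("genotypes are not Mendelian consistent")
    return {k: c / total for k, c in sorted(counts.items())}

print(sib_ibd_distribution((1, 2), (1, 2), (1, 1), (2, 2)))  # {0: 1.0}
print(sib_ibd_distribution((1, 2), (2, 2), (1, 2), (1, 2)))  # {1: 0.5, 2: 0.5}
```

In the second case the 2/2 parent's two copies are indistinguishable by state, so the sibs receive the same copy half of the time.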
And even more complicated…
[Sib pair: 1/1 and 1/1]
IBD=?
Bayes Theorem
$$P(A_i \mid B) = \frac{P(A_i, B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{\sum_j P(A_j)\,P(B \mid A_j)}$$
Bayes Theorem for IBD Probabilities
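$$P(\mathrm{IBD}=i \mid G) = \frac{P(\mathrm{IBD}=i, G)}{P(G)} = \frac{P(\mathrm{IBD}=i)\,P(G \mid \mathrm{IBD}=i)}{P(G)} = \frac{P(\mathrm{IBD}=i)\,P(G \mid \mathrm{IBD}=i)}{\sum_j P(\mathrm{IBD}=j)\,P(G \mid \mathrm{IBD}=j)}$$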
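P(Marker Genotype | IBD State)
P(observing genotypes | k alleles IBD), for a marker with alleles A1 and A2 at frequencies $p_1$ and $p_2$:

| Sib 1 | Sib 2 | k = 0 | k = 1 | k = 2 |
|---|---|---|---|---|
| A1A1 | A1A1 | $p_1^4$ | $p_1^3$ | $p_1^2$ |
| A1A1 | A1A2 | $2p_1^3p_2$ | $p_1^2p_2$ | 0 |
| A1A1 | A2A2 | $p_1^2p_2^2$ | 0 | 0 |
| A1A2 | A1A1 | $2p_1^3p_2$ | $p_1^2p_2$ | 0 |
| A1A2 | A1A2 | $4p_1^2p_2^2$ | $p_1p_2$ | $2p_1p_2$ |
| A1A2 | A2A2 | $2p_1p_2^3$ | $p_1p_2^2$ | 0 |
| A2A2 | A1A1 | $p_1^2p_2^2$ | 0 | 0 |
| A2A2 | A1A2 | $2p_1p_2^3$ | $p_1p_2^2$ | 0 |
| A2A2 | A2A2 | $p_2^4$ | $p_2^3$ | $p_2^2$ |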
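The table can be generated rather than memorized. The Python sketch below assumes Hardy-Weinberg equilibrium and treats founder alleles as independent draws from the population; `genotype_likelihood` is a hypothetical helper that enumerates the founder alleles behind each IBD state k.

```python
from itertools import product

def genotype_likelihood(g1, g2, k, freqs):
    """P(sib genotypes g1, g2 | k alleles IBD) under Hardy-Weinberg equilibrium.

    g1, g2: pairs of allele labels such as ("A1", "A2"); order is ignored.
    freqs:  dict mapping each allele label to its population frequency.
    """
    g1, g2 = sorted(g1), sorted(g2)
    alleles = list(freqs)
    total = 0.0
    if k == 0:
        # Four independent founder alleles, two per sib
        for a, b, c, d in product(alleles, repeat=4):
            if sorted((a, b)) == g1 and sorted((c, d)) == g2:
                total += freqs[a] * freqs[b] * freqs[c] * freqs[d]
    elif k == 1:
        # One shared founder allele s, plus one unshared allele per sib
        for s, a, b in product(alleles, repeat=3):
            if sorted((s, a)) == g1 and sorted((s, b)) == g2:
                total += freqs[s] * freqs[a] * freqs[b]
    elif k == 2:
        # Both founder alleles shared, so the genotypes must be identical
        for a, b in product(alleles, repeat=2):
            if sorted((a, b)) == g1 == g2:
                total += freqs[a] * freqs[b]
    else:
        raise ValueError("k must be 0, 1 or 2")
    return total

freqs = {"A1": 0.5, "A2": 0.5}
print(genotype_likelihood(("A1", "A2"), ("A1", "A2"), 1, freqs))  # p1*p2 = 0.25
```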
Worked Example
Sib 1 = 1/1, sib 2 = 1/1
p1 = 0.5
Worked Example
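Sib 1 = 1/1, sib 2 = 1/1, $p = p_1 = 0.5$:
$$P(G \mid \mathrm{IBD}=0) = p^4 = \tfrac{1}{16}, \qquad P(G \mid \mathrm{IBD}=1) = p^3 = \tfrac{1}{8}, \qquad P(G \mid \mathrm{IBD}=2) = p^2 = \tfrac{1}{4}$$
$$P(G) = \tfrac14 p^4 + \tfrac12 p^3 + \tfrac14 p^2 = \tfrac{9}{64}$$
$$P(\mathrm{IBD}=0 \mid G) = \frac{\tfrac14 p^4}{P(G)} = \tfrac19, \qquad P(\mathrm{IBD}=1 \mid G) = \frac{\tfrac12 p^3}{P(G)} = \tfrac49, \qquad P(\mathrm{IBD}=2 \mid G) = \frac{\tfrac14 p^2}{P(G)} = \tfrac49$$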
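The worked example is a direct application of Bayes' theorem with the sib-pair prior 1/4, 1/2, 1/4. Exact fractions avoid rounding; `ibd_posterior` is a hypothetical helper for illustration.

```python
from fractions import Fraction

def ibd_posterior(likelihoods, prior=(Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))):
    """Bayes' theorem: return P(G) and [P(IBD = i | G) for i = 0, 1, 2].

    likelihoods: P(G | IBD = i) for i = 0, 1, 2.
    prior: P(IBD = i); the default is the sib-pair prior 1/4, 1/2, 1/4.
    """
    joint = [pr * lik for pr, lik in zip(prior, likelihoods)]
    p_g = sum(joint)   # P(G) = sum_j P(IBD = j) P(G | IBD = j)
    return p_g, [j / p_g for j in joint]

p = Fraction(1, 2)                              # p1 = 0.5
p_g, post = ibd_posterior([p**4, p**3, p**2])   # both sibs 1/1
print(p_g)    # 9/64
print(post)   # [Fraction(1, 9), Fraction(4, 9), Fraction(4, 9)]
```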
For ANY PEDIGREE the inheritance pattern at every point in the genome can be completely described by a binary inheritance vector:
v(x) = (p1, m1, p2, m2, …,pn,mn)
whose coordinates describe the outcome of the 2n paternal and maternal meioses giving rise to the n non-founders in the pedigree
pi (mi) is 0 if the grandpaternal allele is transmitted
pi (mi) is 1 if the grandmaternal allele is transmitted
Example: parents a/c and b/d; offspring a/b and c/d
v(x) = [p1, m1, p2, m2] = [0, 0, 1, 1]
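For a pair of full sibs (n = 2 non-founders) the IBD count follows directly from the inheritance vector: the sibs share the paternal allele IBD when $p_1 = p_2$ and the maternal allele when $m_1 = m_2$. A small Python sketch, which also lists the four vectors that give IBD = 2:

```python
from itertools import product

def sib_ibd(v):
    """Number of alleles IBD between two full sibs, from v = (p1, m1, p2, m2)."""
    p1, m1, p2, m2 = v
    # Equal bits: that parent transmitted the same grandparental copy to both sibs
    return int(p1 == p2) + int(m1 == m2)

print(sib_ibd((0, 0, 1, 1)))   # 0: the sibs a/b and c/d share nothing IBD

by_ibd = {0: [], 1: [], 2: []}
for v in product((0, 1), repeat=4):   # all 2^(2n) = 16 vectors for n = 2
    by_ibd[sib_ibd(v)].append("".join(map(str, v)))
print(by_ibd[2])   # ['0000', '0101', '1010', '1111']
```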
Inheritance Vector
| Inheritance vector | Prior | Posterior |
|---|---|---|
| 0000 | 1/16 | 1/8 |
| 0001 | 1/16 | 1/8 |
| 0010 | 1/16 | 0 |
| 0011 | 1/16 | 0 |
| 0100 | 1/16 | 1/8 |
| 0101 | 1/16 | 1/8 |
| 0110 | 1/16 | 0 |
| 0111 | 1/16 | 0 |
| 1000 | 1/16 | 1/8 |
| 1001 | 1/16 | 1/8 |
| 1010 | 1/16 | 0 |
| 1011 | 1/16 | 0 |
| 1100 | 1/16 | 1/8 |
| 1101 | 1/16 | 1/8 |
| 1110 | 1/16 | 0 |
| 1111 | 1/16 | 0 |
In practice, it is not possible to determine the true inheritance vector at every point in the genome; rather, we represent partial information as a probability distribution over the $2^{2n}$ possible inheritance vectors
[Pedigree diagram: individuals 1 to 5, alleles a, b, c, with meioses p1, m1, p2, m2 labelled]
Computer Representation
- Define inheritance vector $v_\ell$
- Each inheritance vector is indexed by a different memory location
- Likelihood for each gene flow pattern, conditional on observed genotypes at location ℓ
- $2^{2n}$ elements!!!

At each marker location ℓ:

| 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | L | L | L | L | L | L | L | L | L | L | L | L | L | L | L |
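A bit-indexed array is simple to sketch: vector $v$ is stored at the integer whose binary digits are $(p_1, m_1, \dots, p_n, m_n)$, most significant bit first (an assumed layout). The Python sketch below uses hypothetical 0/1 likelihoods that are nonzero only when $p_2 = 0$, which is the pattern in the posterior column of the inheritance vector table, and normalizes prior × likelihood to recover that column:

```python
n = 2                  # non-founders (a sib pair)
width = 2 * n          # meioses = bits per inheritance vector
size = 2 ** width      # one memory location per vector: 16

def bits(index):
    """Inheritance vector (p1, m1, ..., pn, mn) stored at `index` (p1 = most significant bit)."""
    return tuple((index >> (width - 1 - b)) & 1 for b in range(width))

# Hypothetical single-locus likelihoods: 1 if a vector is compatible with the
# genotypes, 0 otherwise; as in the posterior column, only p2 = 0 survives.
lik = [1.0 if bits(i)[2] == 0 else 0.0 for i in range(size)]
prior = [1.0 / size] * size
joint = [pr * li for pr, li in zip(prior, lik)]
total = sum(joint)
posterior = [j / total for j in joint]

for i in range(size):
    print(f"{i:0{width}b}", prior[i], posterior[i])   # e.g. "0000 0.0625 0.125"
```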
[Figure: three representations of the likelihoods for inheritance vectors 0000 to 1111: a) bit-indexed array; b) packed tree; c) sparse tree. Legend: node with zero likelihood; node identical to sibling; likelihood for this branch (L1, L2).]
Abecasis et al (2002) Nat Genet 30:97-101
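The packed and sparse trees exploit two regularities: a branch whose likelihoods are all zero can be stored as a single zero, and a node whose two children are identical only needs one copy. The Python sketch below is a simplified illustration of that idea (not MERLIN's actual data structure), using hypothetical likelihoods that are nonzero only for vectors with $p_2 = 0$:

```python
def build_tree(lik):
    """Compress a bit-indexed likelihood list (length a power of 2) into a tree.

    Returns a float leaf, ("same", child) when both halves are identical, or
    ("split", left, right). An all-zero branch collapses to the leaf 0.0.
    """
    if not any(lik):
        return 0.0                       # sparse: whole branch has zero likelihood
    if len(lik) == 1:
        return lik[0]
    half = len(lik) // 2
    left, right = build_tree(lik[:half]), build_tree(lik[half:])
    if left == right:
        return ("same", left)            # packed: this bit does not affect the likelihood
    return ("split", left, right)

def lookup(tree, index, width):
    """Likelihood of the vector stored at `index`, walking bits from the most significant."""
    for b in range(width - 1, -1, -1):
        if not isinstance(tree, tuple):
            break                        # reached a collapsed zero branch early
        if tree[0] == "same":
            tree = tree[1]
        else:
            tree = tree[1 + ((index >> b) & 1)]
    return tree

def count_nodes(tree):
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])

# Hypothetical likelihoods: nonzero only for vectors with p2 = 0 (bit value 2 unset)
lik = [1.0 if ((i >> 1) & 1) == 0 else 0.0 for i in range(16)]
tree = build_tree(lik)
print(tree)               # ('same', ('same', ('split', ('same', 1.0), 0.0)))
print(count_nodes(tree))  # 6 nodes instead of 16 array slots
```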
Multipoint IBD
IBD status may not be ascertainable with certainty, e.g. because the mating is not informative or parental information is not available
IBD information at uninformative loci can be made more precise by examining nearby linked loci
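Multipoint IBD
[Pedigree: parents a/c (genotype 1/1) and b/d (genotype 1/2); offspring a/b (1/1) and c/d (1/2)]
Marker with alleles a, b, c, d: IBD = 0
Marker with alleles 1 and 2: IBD = 0 or IBD = 1?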
Complexity of the Problem in Larger Pedigrees
- $2n$ meioses in a pedigree with $n$ non-founders
- Each meiosis has 2 possible outcomes
- Therefore $2^{2n}$ possibilities for each locus
- For each genetic locus: one location for each of $m$ genetic markers, with distinct, non-independent meiotic outcomes
- Up to $4^{nm}$ distinct outcomes!!!
Example: Sib-pair Genotyped at 10 Markers
[Figure: inheritance vectors 0000, 0001, 0010, …, 1111 at markers 1, 2, 3, 4, …, m = 10]
$(2^{2 \times n})^m = (2^{2 \times 2})^{10} \approx 10^{12}$ possible paths!!!
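The counts on this slide, together with the slide's later estimate of ~10 × (2^{2×2})^2 = 2560 operations for the Lander-Green approach, are simple to reproduce. The function names below are hypothetical:

```python
def n_vectors(n):
    """Inheritance vectors per locus: 2n meioses with 2 outcomes each."""
    return 2 ** (2 * n)

def n_paths(n, m):
    """Paths through m loci if every combination is enumerated: (2^(2n))^m = 4^(nm)."""
    return n_vectors(n) ** m

def lander_green_ops(n, m):
    """Rough cost of the chained matrix-vector products: m * (2^(2n))^2."""
    return m * n_vectors(n) ** 2

print(n_paths(2, 10))            # 1099511627776, i.e. about 10^12
print(lander_green_ops(2, 10))   # 2560
```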
Lander-Green Algorithm
- The inheritance vector at a locus is conditionally independent of the inheritance vectors at all preceding loci, given the inheritance vector at the immediately preceding locus (“Hidden Markov chain”)
- The conditional probability of an inheritance vector $v_{i+1}$ at locus $i+1$, given the inheritance vector $v_i$ at locus $i$, is $\theta_i^{\,j}(1-\theta_i)^{2n-j}$, where $\theta_i$ is the recombination fraction and $j$ is the number of elements that change between the two inheritance vectors (“transition probabilities”)

Example: [0000] at locus 1 to [0001] at locus 2: conditional probability = $(1-\theta)^3\theta$
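The transition probability depends only on the number of bits that differ between the two vectors. A short Python sketch, with θ = 0.1 as an arbitrary illustrative value, reproduces the example and checks that each row of the transition matrix sums to 1:

```python
from itertools import product

def transition_prob(v_from, v_to, theta):
    """P(v_{i+1} = v_to | v_i = v_from) = theta^j (1 - theta)^(2n - j)."""
    if len(v_from) != len(v_to):
        raise ValueError("inheritance vectors must have the same length")
    j = sum(a != b for a, b in zip(v_from, v_to))   # number of recombinations
    return theta ** j * (1 - theta) ** (len(v_from) - j)

theta = 0.1
print(transition_prob("0000", "0001", theta))   # (1 - theta)^3 * theta, about 0.0729

# Summing over all 16 possible vectors at the next locus gives 1
row = sum(transition_prob("0000", "".join(v), theta) for v in product("01", repeat=4))
print(round(row, 12))   # 1.0
```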
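[Figure: inheritance vectors 0000, 0001, 0010, …, 1111 at markers 1, 2, 3, …, m]
$$\text{Total Likelihood} = \mathbf{1}'\,\mathbf{Q}_1\mathbf{T}_1\mathbf{Q}_2\mathbf{T}_2\cdots\mathbf{T}_{m-1}\mathbf{Q}_m\,\mathbf{1}$$
$\mathbf{Q}_i$: $2^{2n} \times 2^{2n}$ diagonal matrix of single locus probabilities at locus $i$
$$\mathbf{Q}_i = \begin{pmatrix} P[0000] & 0 & \cdots & 0 \\ 0 & P[0001] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & P[1111] \end{pmatrix}$$
$\mathbf{T}_i$: $2^{2n} \times 2^{2n}$ matrix of transition probabilities between locus $i$ and locus $i+1$ (shown for a sib pair, $2n = 4$)
$$\mathbf{T}_i = \begin{pmatrix} (1-\theta)^4 & (1-\theta)^3\theta & \cdots & \theta^4 \\ (1-\theta)^3\theta & (1-\theta)^4 & \cdots & (1-\theta)\theta^3 \\ \vdots & \vdots & \ddots & \vdots \\ \theta^4 & (1-\theta)\theta^3 & \cdots & (1-\theta)^4 \end{pmatrix}$$
~10 × $(2^{2 \times 2})^2$ operations = 2560 for this case!!!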
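The matrix product can be evaluated left to right as a sequence of vector-matrix products, which keeps the cost linear in the number of markers. A minimal numpy sketch for a sib pair, with random hypothetical single-locus probabilities and recombination fractions. It folds a uniform prior $1/2^{2n}$ into the leading row vector (an assumption, since the slide's $\mathbf{1}'$ leaves the prior implicit) and checks the result against brute-force enumeration of all paths:

```python
from itertools import product

import numpy as np

N_BITS = 4          # 2n meioses for a sib pair (n = 2)
SIZE = 2 ** N_BITS  # inheritance vectors per locus

def transition_matrix(theta):
    """T[u, v] = theta^j (1 - theta)^(2n - j), j = number of bits differing between u and v."""
    j = np.array([[bin(u ^ v).count("1") for v in range(SIZE)] for u in range(SIZE)])
    return theta ** j * (1 - theta) ** (N_BITS - j)

def total_likelihood(q_diags, thetas):
    """1' Q1 T1 Q2 T2 ... T(m-1) Qm 1, with each Qi supplied as its diagonal."""
    row = np.full(SIZE, 1.0 / SIZE) * q_diags[0]        # uniform prior times Q1
    for q, theta in zip(q_diags[1:], thetas):
        row = (row @ transition_matrix(theta)) * q      # one vector-matrix product per interval
    return float(row.sum())                             # multiply by the column of ones

def brute_force(q_diags, thetas):
    """Sum over every path of inheritance vectors: (2^(2n))^m terms."""
    mats = [transition_matrix(t) for t in thetas]
    total = 0.0
    for path in product(range(SIZE), repeat=len(q_diags)):
        p = q_diags[0][path[0]] / SIZE
        for i in range(1, len(path)):
            p *= mats[i - 1][path[i - 1], path[i]] * q_diags[i][path[i]]
        total += p
    return total

rng = np.random.default_rng(0)
q_diags = [rng.random(SIZE) for _ in range(3)]   # hypothetical single-locus probabilities
thetas = [0.05, 0.10]                            # hypothetical recombination fractions
print(total_likelihood(q_diags, thetas), brute_force(q_diags, thetas))  # should agree
```

With m = 3 the brute force already sums 16^3 = 4096 paths; the chained product needs only m - 1 small matrix-vector products.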
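[Figure: inheritance vectors 0000, 0001, 0010, …, 1111 at markers 1, 2, 3, 4, …, m = 10]
P(IBD = 2) at Marker Three
= L[IBD = 2 at marker 3] / L[ALL]
= (L[0000] + L[0101] + L[1010] + L[1111]) / L[ALL]
where each L[v] sums the likelihood of all paths that pass through inheritance vector v at marker 3, and L[ALL] is the total likelihood.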
[Figure: inheritance vectors 0000, 0001, 0010, …, 1111 at markers 1, 2, 3, 4, …, m = 10]
P(IBD = 2) at an arbitrary position on the chromosome
= (L[0000] + L[0101] + L[1010] + L[1111]) / L[ALL]
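Both quantities come from a forward-backward pass over the same chain. The numpy sketch below is a hedged illustration for a sib pair (vector bits $p_1 m_1 p_2 m_2$, $p_1$ most significant). It reuses the two-marker pedigree from the Multipoint IBD slide, assuming the allele order shown for each parent gives the phase and using a hypothetical θ = 0.1: single-point analysis of the 1/2 marker gives IBD = 0 or 1 with equal probability, while the linked informative marker pulls the posterior towards IBD = 0. An arbitrary position is handled by inserting a pseudo-locus with no genotype data (all single-locus likelihoods equal to 1); the interval is split into two halves whose recombination fractions compose back to θ under the Haldane (no interference) model.

```python
import numpy as np

N_BITS, SIZE = 4, 16   # sib pair; vector bits (p1, m1, p2, m2), p1 most significant

def transition_matrix(theta):
    j = np.array([[bin(u ^ v).count("1") for v in range(SIZE)] for u in range(SIZE)])
    return theta ** j * (1 - theta) ** (N_BITS - j)

def sib_ibd(index):
    p1, m1, p2, m2 = ((index >> s) & 1 for s in (3, 2, 1, 0))
    return int(p1 == p2) + int(m1 == m2)

IBD = np.array([sib_ibd(i) for i in range(SIZE)])

def posteriors(q_diags, thetas):
    """Forward-backward: P(inheritance vector at each locus | genotypes at all loci)."""
    mats = [transition_matrix(t) for t in thetas]
    fwd = [np.full(SIZE, 1.0 / SIZE) * q_diags[0]]
    for i in range(1, len(q_diags)):
        fwd.append((fwd[-1] @ mats[i - 1]) * q_diags[i])
    bwd = [np.ones(SIZE)]
    for i in range(len(q_diags) - 2, -1, -1):
        bwd.insert(0, mats[i] @ (q_diags[i + 1] * bwd[0]))
    total = fwd[-1].sum()                                  # L[ALL]
    return [f * b / total for f, b in zip(fwd, bwd)]

def ibd_probs(post):
    """P(IBD = 0, 1, 2); e.g. IBD = 2 sums the vectors 0000, 0101, 1010, 1111."""
    return [float(post[IBD == k].sum()) for k in range(3)]

def indicator(indices):
    q = np.zeros(SIZE)
    q[list(indices)] = 1.0
    return q

theta = 0.1                                       # hypothetical recombination fraction
q1 = indicator([0b0011])                          # alleles a-d: a/c x b/d -> a/b, c/d
q2 = indicator([0b0001, 0b0011, 0b1001, 0b1011])  # alleles 1/2: 1/1 x 1/2 -> 1/1, 1/2
post = posteriors([q1, q2], [theta])
print(ibd_probs(post[1]))       # about [0.82, 0.18, 0.0] at the second marker

# Arbitrary position halfway between the markers: a pseudo-locus with Q = identity
half = 0.5 * (1 - np.sqrt(1 - 2 * theta))         # Haldane: two halves compose to theta
post_mid = posteriors([q1, np.ones(SIZE), q2], [half, half])
print(ibd_probs(post_mid[1]))   # IBD distribution at the midpoint
```

For this pedigree the posterior at the second marker works out to $P(\mathrm{IBD}=0) = (1-\theta)^2 + \theta^2$ and $P(\mathrm{IBD}=1) = 2\theta(1-\theta)$.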
Further speedups…
- Trees summarize redundant information:
  - portions of the inheritance vector that are repeated
  - portions of the inheritance vector that are constant or zero
- Use sparse-matrix by vector multiplication
- Regularities in transition matrices:
  - use symmetries in a divide-and-conquer algorithm (Idury & Elston, 1997)
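One such regularity is that, for a given θ, the transition matrix is the Kronecker product of one 2 × 2 matrix per meiosis, so it can be applied one bit at a time in $O(2n \cdot 2^{2n})$ work instead of $O(4^{2n})$. The numpy sketch below is in the spirit of that divide-and-conquer idea (it is not the Idury & Elston algorithm verbatim, nor MERLIN's code) and checks itself against the dense product:

```python
import numpy as np

def dense_transition(theta, n_bits):
    size = 2 ** n_bits
    j = np.array([[bin(u ^ v).count("1") for v in range(size)] for u in range(size)])
    return theta ** j * (1 - theta) ** (n_bits - j)

def apply_transition_fast(vec, theta, n_bits):
    """Return vec @ T without building T, using T = kron of [[1-θ, θ], [θ, 1-θ]] per meiosis."""
    out = np.array(vec, dtype=float)          # copy, 1-D and contiguous
    for b in range(n_bits):
        # View the index as (higher bits, bit b counted from the MSB, lower bits)
        view = out.reshape(2 ** b, 2, -1)
        x0, x1 = view[:, 0, :].copy(), view[:, 1, :].copy()
        view[:, 0, :] = (1 - theta) * x0 + theta * x1   # writes through to `out`
        view[:, 1, :] = theta * x0 + (1 - theta) * x1
    return out

rng = np.random.default_rng(1)
n_bits, theta = 6, 0.05                       # hypothetical: 3 non-founders
vec = rng.random(2 ** n_bits)
fast = apply_transition_fast(vec, theta, n_bits)
slow = vec @ dense_transition(theta, n_bits)
print(np.allclose(fast, slow))   # True
```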
Lander-Green Algorithm Summary
- Factorize likelihood by marker
- Complexity $m \cdot e^{n}$
- Large number of markers (e.g. dense SNP data)
- Relatively small pedigrees
- MERLIN, GENEHUNTER, ALLEGRO, etc.
Elston-Stewart Algorithm
- Factorize likelihood by individual
- Complexity $n \cdot e^{m}$
- Small number of markers
- Large pedigrees with little inbreeding
- VITESSE, etc.
Other methods
- A number of MCMC methods have been proposed
  - ~Linear in # markers
  - ~Linear in # people
- Hard to guarantee convergence on very large datasets
  - Many widely separated local minima
- E.g. SIMWALK, LOKI
MERLIN: Multipoint Engine for Rapid Likelihood Inference
Capabilities
- Linkage analysis: NPL and K&C LOD; variance components
- Haplotypes: most likely; sampling; all
- IBD and info content
- Error detection: most SNP typing errors are Mendelian consistent
- Recombination: no. of recombinants per family per interval can be controlled
- Simulation
MERLIN Website
- Reference
- FAQ
- Source
- Binaries
- Tutorial: Linkage, Haplotyping, Simulation, Error detection, IBD calculation

www.sph.umich.edu/csg/abecasis/Merlin
Test Case Pedigrees
Timings – Marker Locations
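Top Generation Genotyped

| | A (x1000) | B | C | D |
|---|---|---|---|---|
| Genehunter | 38s | 37s | 18m16s | * |
| Allegro | 18s | 2m17s | 3h54m13s | * |
| Merlin | 11s | 18s | 13m55s | * |

Top Generation Not Genotyped

| | A (x1000) | B | C | D |
|---|---|---|---|---|
| Genehunter | 45s | 1m54s | * | * |
| Allegro | 18s | 1m08s | 1h12m38s | * |
| Merlin | 13s | 25s | 15m50s | * |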
Intuition: Approximate Sparse T
- Dense maps, closely spaced markers
  - Small recombination fractions θ
  - Reasonable to set $\theta^k$ to zero
- Produces a very sparse transition matrix
  - Consider only elements of v separated by <k recombination events at consecutive locations
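A minimal sketch of the approximation, assuming a single θ per interval: entries of T whose vectors differ in $j \ge k$ positions are set to zero, leaving $2^{2n}\sum_{j<k}\binom{2n}{j}$ nonzero entries. The printout shows how many entries survive and how much probability mass each row loses (illustrative values only, unrelated to the timings reported below):

```python
from math import comb

import numpy as np

def approx_transition(theta, n_bits, k):
    """Transition matrix keeping only vector pairs separated by fewer than k recombinations.

    Entries theta^j (1 - theta)^(n_bits - j) with j >= k are set to zero.
    """
    size = 2 ** n_bits
    T = np.zeros((size, size))
    for u in range(size):
        for v in range(size):
            j = bin(u ^ v).count("1")     # recombinations between the two vectors
            if j < k:
                T[u, v] = theta ** j * (1 - theta) ** (n_bits - j)
    return T

n_bits, theta = 8, 0.01   # hypothetical: 4 non-founders, closely spaced markers
for k in (1, 2, 3, n_bits + 1):          # k = n_bits + 1 keeps every entry (exact)
    T = approx_transition(theta, n_bits, k)
    kept = np.count_nonzero(T)
    expected = 2 ** n_bits * sum(comb(n_bits, j) for j in range(min(k, n_bits + 1)))
    dropped = 1 - T[0].sum()             # probability mass lost from each row
    print(f"k={k}: {kept}/{T.size} entries kept (expected {expected}), {dropped:.1e} dropped")
```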
Additional Speedup…

| | Time | Memory |
|---|---|---|
| Exact | 40s | 100 MB |
| No recombination | <1s | 4 MB |
| ≤1 recombinant | 2s | 17 MB |
| ≤2 recombinants | 15s | 54 MB |
| Genehunter 2.1 | 16min | 1024 MB |

Keavney et al (1998) ACE data, 10 SNPs within gene, 4-18 individuals per family
Input Files
- Pedigree File: relationships, genotype data, phenotype data
- Data File: describes contents of pedigree file
- Map File: records location of genetic markers
Example Pedigree File
<contents of example.ped>
1 1 0 0 1 1 x 3 3 x x
1 2 0 0 2 1 x 4 4 x x
1 3 0 0 1 1 x 1 2 x x
1 4 1 2 2 1 x 4 3 x x
1 5 3 4 2 2 1.234 1 3 2 2
1 6 3 4 1 2 4.321 2 4 2 2
<end of example.ped>
Encodes family relationships, marker and phenotype information
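As a sanity check on the format, the sketch below reads the example pedigree, assuming the standard first five columns (family, individual, father, mother, sex) followed by the fields declared in the data file, and counts the non-founders $n$, which sets the size $2^{2n}$ of the inheritance vector space. `read_pedigree` is a hypothetical helper, not part of MERLIN.

```python
PED = """\
1 1 0 0 1 1 x 3 3 x x
1 2 0 0 2 1 x 4 4 x x
1 3 0 0 1 1 x 1 2 x x
1 4 1 2 2 1 x 4 3 x x
1 5 3 4 2 2 1.234 1 3 2 2
1 6 3 4 1 2 4.321 2 4 2 2
"""

def read_pedigree(text):
    """Split each record into the five pedigree columns and the remaining data fields."""
    people = []
    for line in text.strip().splitlines():
        fields = line.split()
        family, person, father, mother, sex = fields[:5]
        people.append({"family": family, "person": person, "father": father,
                       "mother": mother, "sex": sex, "data": fields[5:]})
    return people

people = read_pedigree(PED)
# Founders have both parents coded 0; everyone else is a non-founder
non_founders = [p["person"] for p in people if (p["father"], p["mother"]) != ("0", "0")]
n = len(non_founders)
print(non_founders)            # ['4', '5', '6']
print(2 * n, 2 ** (2 * n))     # 6 64
```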
Example Data File
<contents of example.dat>
T some_trait_of_interest
M some_marker
M another_marker
<end of example.dat>
Provides information necessary to decode pedigree file
Data File Field Codes
| Code | Description |
|---|---|
| M | Marker Genotype |
| A | Affection Status |
| T | Quantitative Trait |
| C | Covariate |
| Z | Zygosity |
Example Map File
<contents of example.map>
CHROMOSOME MARKER POSITION
2 D2S160 160.0
2 D2S308 165.0
…
<end of example.map>
Indicates the location of individual markers, necessary to derive recombination fractions between them
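Recombination fractions can be derived from the map positions with a map function. This sketch assumes the Haldane map function, $\theta = \tfrac12(1 - e^{-2d})$ with $d$ in Morgans; other map functions (e.g. Kosambi) give slightly different values, and `read_map` is a hypothetical helper that parses only the two rows shown above.

```python
import math

MAP = """\
CHROMOSOME MARKER POSITION
2 D2S160 160.0
2 D2S308 165.0
"""

def read_map(text):
    """Return (chromosome, marker, position in cM) tuples, skipping the header row."""
    rows = [line.split() for line in text.strip().splitlines()[1:]]
    return [(chrom, marker, float(pos)) for chrom, marker, pos in rows]

def haldane_theta(distance_cm):
    """Haldane map function: theta = (1 - exp(-2d)) / 2, with d in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

markers = read_map(MAP)
for (c1, m1, x1), (c2, m2, x2) in zip(markers, markers[1:]):
    if c1 == c2:   # recombination fractions only make sense within a chromosome
        print(m1, m2, round(haldane_theta(x2 - x1), 4))   # D2S160 D2S308 0.0476
```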
Worked Example
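Sib 1 = 1/1, sib 2 = 1/1, $p_1 = 0.5$:
$$P(\mathrm{IBD}=0 \mid G) = \tfrac19, \qquad P(\mathrm{IBD}=1 \mid G) = \tfrac49, \qquad P(\mathrm{IBD}=2 \mid G) = \tfrac49$$
merlin -d example.dat -p example.ped -m example.map --ibd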
Application: Information Content Mapping
- Information content: provides a measure of how well a marker set approaches the goal of completely determining the inheritance outcome
- Based on the concept of entropy: $E = -\sum_i P_i \log_2 P_i$, where $P_i$ is the probability of the $i$th outcome
- $I_E(x) = 1 - E(x)/E_0$
  - Always lies between 0 and 1
  - Does not depend on the test for linkage
  - Scales linearly with power
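A minimal sketch of the information content calculation. The slide does not define $E_0$; here we assume it is the entropy of the prior distribution over inheritance vectors (uniform over the $2^{2n}$ vectors), and we apply the formula to the posterior from the inheritance vector table (8 vectors with probability 1/8 each):

```python
import math

def entropy(probs):
    """E = -sum_i P_i log2 P_i, skipping outcomes with zero probability."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_content(posterior, prior):
    """I_E = 1 - E(posterior) / E0, assuming E0 is the entropy of the prior."""
    return 1.0 - entropy(posterior) / entropy(prior)

prior = [1 / 16] * 16                  # uniform over 2^(2n) vectors, n = 2
posterior = [1 / 8] * 8 + [0.0] * 8    # half the vectors ruled out
print(information_content(posterior, prior))  # 0.25
```

Fully determining the inheritance vector (one vector with probability 1) would give $I_E = 1$, and learning nothing gives $I_E = 0$.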
Application: Information Content Mapping
Simulations (sib-pairs with/without parental genotypes):
- 1 microsatellite per 10 cM (ABI)
- 1 microsatellite per 3 cM (deCODE)
- 1 SNP per 0.5 cM (Illumina)
- 1 SNP per 0.2 cM (Affymetrix)

Which panel performs best in terms of extracting marker information?
Do the results depend upon the presence of parental genotypes?
merlin -d file.dat -p file.ped -m file.map --information --step 1 --markerNames
[Figure: SNPs vs Microsatellites with parents. Information content (0.0 to 1.0) against position (0 to 100 cM) for SNPs + parents at densities of 0.2 cM and 0.5 cM, and microsat + parents at densities of 3 cM and 10 cM.]
[Figure: SNPs vs Microsatellites without parents. Information content (0.0 to 1.0) against position (0 to 100 cM) for SNPs - parents at densities of 0.2 cM and 0.5 cM, and microsat - parents at densities of 3 cM and 10 cM.]