ISBRA 2007 Tutorial A:
Scalable Algorithms for Genotype and Haplotype Analysis
Ion Mandoiu (University of Connecticut)
Alexander Zelikovsky (Georgia State University)
Outline
• Background on genetic variation
• Genotype phasing
• Error detection
• Disease association search
• Disease susceptibility prediction
Single Nucleotide Polymorphisms
Main form of variation between individual genomes: single nucleotide polymorphisms (SNPs)
High density in the human genome: $1 \times 10^7$ SNPs out of a total of $3 \times 10^9$ base pairs
… ataggtccCtatttcgcgcCgtatacacgggActata …
… ataggtccGtatttcgcgcCgtatacacgggTctata …
… ataggtccCtatttcgcgcCgtatacacgggTctata …
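A toy check of the definition: the helper below (`snp_positions`, a hypothetical name) reports the 0-based columns where the three aligned fragments above disagree. It assumes the fragments are already aligned and error-free; real SNP discovery must also cope with sequencing errors and much larger alignments.

```python
def snp_positions(seqs):
    """Return the 0-based columns where aligned sequences do not all agree."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Case is ignored: capital letters in the example only highlight columns.
    return [i for i in range(length) if len({s[i].upper() for s in seqs}) > 1]

seqs = [
    "ataggtccCtatttcgcgcCgtatacacgggActata",
    "ataggtccGtatttcgcgcCgtatacacgggTctata",
    "ataggtccCtatttcgcgcCgtatacacgggTctata",
]
print(snp_positions(seqs))  # [8, 31]: the highlighted C column at 19 is identical in all three
```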
Haplotypes and Genotypes
Diploids: two homologous copies of each autosomal chromosome
One inherited from mother and one from father
Haplotype: description of the SNP alleles on one chromosome, as a 0/1 vector: 0 for the major allele, 1 for the minor allele
Genotype: description of the alleles on both chromosomes, as a 0/1/2 vector: 0 (1) if both chromosomes carry the major (minor) allele; 2 if the chromosomes carry different alleles
Two haplotypes per individual:
011100110
001000010
combine into the genotype:
021200210
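The 0/1/2 encoding can be sketched in a few lines. `genotype_from_haplotypes` is a hypothetical helper, not part of the tutorial's software, and it assumes both haplotypes list the same SNPs in the same order.

```python
def genotype_from_haplotypes(h1: str, h2: str) -> str:
    """Combine two 0/1 haplotypes into a 0/1/2 genotype."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNPs")
    # Equal alleles give the homozygous code (0 or 1); different alleles give 2.
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))

print(genotype_from_haplotypes("011100110", "001000010"))  # 021200210
```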
Why SNPs?
• Identification and fine mapping of disease-related genes
• Methods: linkage analysis, allele sharing, association studies
• Genotype data: large pedigrees, sibling pairs, trios, unrelated individuals
Challenges in SNP Data Analysis
• Latest technologies deliver 1M SNP genotypes per sample at low cost
• Major challenges: efficiency, reproducibility
• Need simple methods!
Genotype Phasing
Genotype Phasing
For a genotype with k 2's there are $2^{k-1}$ possible pairs of haplotypes explaining it
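g:  0010212  (which pair?)
h1: 0010111
h2: 0010010
h3: 0010011
h4: 0010110
Both (h1, h2) and (h3, h4) explain g, which has k = 2 heterozygous SNPs and hence $2^{1} = 2$ possible pairs.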
Computational approaches to genotype phasing:
• Statistical methods: PHASE, Phamily, PL, GERBIL …
• Combinatorial methods: Parsimony, HAP, 2SNP, ENT …
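The $2^{k-1}$ count can be checked by brute-force enumeration. In this sketch (the name `enumerate_phasings` is our own) the allele of h at the first heterozygous SNP is fixed so that (h, h') and (h', h) are not listed twice; it is exponential in k and meant only for small examples.

```python
from itertools import product

def enumerate_phasings(g):
    """Unordered haplotype pairs (h, h') explaining a 0/1/2 genotype string g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    if not het:
        return [(g, g)]  # fully homozygous: exactly one explanation
    pairs = []
    # Fix h's allele at the first heterozygous SNP to 0 so each unordered pair appears once.
    for rest in product("01", repeat=len(het) - 1):
        h, h_prime = list(g), list(g)
        for i, allele in zip(het, ("0",) + rest):
            h[i] = allele
            h_prime[i] = "1" if allele == "0" else "0"
        pairs.append(("".join(h), "".join(h_prime)))
    return pairs

for h, h_prime in enumerate_phasings("0010212"):
    print(h, h_prime)
# 0010010 0010111
# 0010011 0010110
```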
Minimum Entropy Genotype Phasing
Phasing – function f that assigns to each genotype g a pair of haplotypes (h,h’) that explains g
Coverage of h in f – number of times h appears in the image of f
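Entropy of a phasing:
$$\mathrm{Entropy}(f) = -\sum_{h:\ \mathrm{cov}(h,f) \neq 0} \frac{\mathrm{cov}(h,f)}{2|G|} \log \frac{\mathrm{cov}(h,f)}{2|G|}$$
where G is the set of input genotypes.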
Minimum Entropy Genotype Phasing [Halperin&Karp 04]: Given a set of genotypes, find a phasing with minimum entropy
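A direct transcription of the entropy formula, assuming a phasing is stored as one (h, h') pair per genotype. The slide does not state the log base; natural log is used here, and changing the base only rescales the entropy without changing which phasing minimizes it. The function name is hypothetical.

```python
import math
from collections import Counter

def phasing_entropy(phasing):
    """Entropy(f) of a phasing given as one (h, h') pair per genotype (natural log)."""
    cov = Counter()
    for h, h_prime in phasing:
        cov[h] += 1          # cov(h, f): number of times h appears in the image of f
        cov[h_prime] += 1
    total = 2 * len(phasing)  # 2|G| haplotype slots
    # The Counter only holds haplotypes that occur, so the sum runs over cov(h, f) != 0.
    return -sum((c / total) * math.log(c / total) for c in cov.values())

f = [("0010111", "0010010"), ("0010111", "0000000")]
print(phasing_entropy(f))  # ≈ 1.0397, i.e. 1.5 * ln 2
```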
Connection with Likelihood Maximization
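One standard way to see this connection (our own derivation, assuming the 2|G| haplotypes in the image of a phasing f are independent draws from haplotype frequencies p):

$$\log L(f, p) = \sum_{h:\ \mathrm{cov}(h,f) \neq 0} \mathrm{cov}(h,f)\,\log p(h)$$

For a fixed f, this is maximized over distributions p at $p(h) = \mathrm{cov}(h,f)/(2|G|)$ (Gibbs' inequality), which gives

$$\max_{p} \log L(f, p) = \sum_{h:\ \mathrm{cov}(h,f) \neq 0} \mathrm{cov}(h,f)\,\log \frac{\mathrm{cov}(h,f)}{2|G|} = -2|G| \cdot \mathrm{Entropy}(f)$$

so, under these assumptions and with the same log base on both sides, a minimum-entropy phasing is also a phasing of maximum likelihood.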
Iterative Improvement Algorithm [Gusev et al. 07]
• Initialization: start with a random phasing
• Iterative improvement step: while there exists a genotype whose re-phasing decreases the entropy, find the genotype that yields the highest decrease in entropy and re-phase it
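A minimal sketch of this greedy loop for unrelated genotypes, assuming genotypes are 0/1/2 strings. The function names are hypothetical, and the entropy is recomputed from scratch for every candidate, which is much slower than the O(1)-per-pair gain described under Time Complexity; windows and batching are left out.

```python
import math
import random
from collections import Counter
from itertools import product

def explanations(g):
    """All unordered (h, h') pairs that explain a 0/1/2 genotype string g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    pairs = []
    # Fixing h's allele at the first 2 to "0" lists each unordered pair once.
    for bits in product("01", repeat=max(len(het) - 1, 0)):
        h, h_prime = list(g), list(g)
        for i, a in zip(het, ("0",) + bits):
            h[i], h_prime[i] = a, "1" if a == "0" else "0"
        pairs.append(("".join(h), "".join(h_prime)))
    return pairs

def entropy(phasing):
    """Entropy of a phasing (list of (h, h') pairs), recomputed from scratch."""
    cov = Counter(h for pair in phasing for h in pair)
    total = 2 * len(phasing)
    return -sum(c / total * math.log(c / total) for c in cov.values())

def min_entropy_phasing(genotypes, seed=0):
    """Start from a random phasing; repeatedly apply the best entropy-decreasing re-phasing."""
    rng = random.Random(seed)
    options = [explanations(g) for g in genotypes]
    phasing = [rng.choice(opts) for opts in options]  # random initial phasing
    current = entropy(phasing)
    while True:
        best = None  # (new entropy, genotype index, new pair)
        for i, opts in enumerate(options):
            for pair in opts:
                if pair == phasing[i]:
                    continue
                e = entropy(phasing[:i] + [pair] + phasing[i + 1:])
                if best is None or e < best[0]:
                    best = (e, i, pair)
        if best is None or best[0] >= current - 1e-12:
            return phasing, current  # no re-phasing decreases the entropy
        current, i, pair = best
        phasing[i] = pair

phasing, ent = min_entropy_phasing(["0010212", "0010111", "0010010"], seed=1)
print(phasing[0], round(ent, 4))  # ('0010010', '0010111') 0.6931
```

Here the first genotype is re-phased (if needed) to the pair whose haplotypes already appear in the other two genotypes, which is exactly the parsimony-like behavior that entropy minimization rewards.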
Overlapping Window approach
Entropy is computed over short windows of size l + f:
• l "locked" SNPs, phased previously
• f "free" SNPs, phased currently
(Figure: genotypes g1 … gn as rows; each window spans l locked SNPs followed by f free SNPs)
• Only phasings consistent with the l locked SNPs are considered
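To make the windowing concrete, here is a hedged sketch: `window_schedule` lists each range of free SNPs together with the preceding locked SNPs, and `extend_locked` enumerates every phasing of the free SNPs that keeps a genotype's already phased locked part fixed. Both names, and letting the first window start with fewer than l locked SNPs, are our assumptions rather than details from the slides.

```python
from itertools import product

def window_schedule(k, l, f):
    """Yield (lock_start, free_start, free_end): up to l locked SNPs, then f free SNPs."""
    for free_start in range(0, k, f):
        yield max(0, free_start - l), free_start, min(free_start + f, k)

def extend_locked(locked_pair, free_g):
    """Extend a phased pair over the locked SNPs by every phasing of the free SNPs.

    The locked alleles keep their orientation, so all 2^m allele assignments at the
    m heterozygous free SNPs are distinct candidates (at most 2^f per genotype).
    """
    h, h_prime = locked_pair
    het = [i for i, c in enumerate(free_g) if c == "2"]
    candidates = []
    for bits in product("01", repeat=len(het)):
        a, b = list(free_g), list(free_g)
        for i, x in zip(het, bits):
            a[i], b[i] = x, "1" if x == "0" else "0"
        candidates.append((h + "".join(a), h_prime + "".join(b)))
    return candidates

print(list(window_schedule(10, 2, 3)))  # [(0, 0, 3), (1, 3, 6), (4, 6, 9), (7, 9, 10)]
print(extend_locked(("01", "00"), "21"))  # [('0101', '0011'), ('0111', '0001')]
```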
Effect of Window Size
Time Complexity
• n unrelated genotypes over k SNPs
• k/f windows
• $n \cdot 2^f$ candidate haplotype pairs evaluated per window
• O(1) time per pair to compute the entropy gain
• Empirically, the number of iterations is linear in n, but it is reduced to $O(\log^3 n)$ by re-explaining multiple genotypes per iteration (batching)
• Total runtime: $O(n \log^3 n \cdot 2^f \cdot k/f)$
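The O(1) entropy gain is possible because re-phasing one genotype changes the coverage of at most four haplotypes while 2|G| stays fixed. A sketch of that incremental update, assuming coverages are kept in a `Counter` (all names hypothetical):

```python
import math
from collections import Counter

def _term(c, total):
    """-p log p for a haplotype with coverage c out of `total` slots (0 when c == 0)."""
    return 0.0 if c == 0 else -(c / total) * math.log(c / total)

def entropy(cov, total):
    return sum(_term(c, total) for c in cov.values())

def entropy_delta(cov, total, old_pair, new_pair):
    """Entropy change if one genotype is re-phased from old_pair to new_pair.

    Only haplotypes in the two pairs change coverage and 2|G| is unchanged, so this
    touches at most 4 entries of the coverage table.
    """
    change = Counter(new_pair)
    change.subtract(Counter(old_pair))
    delta = 0.0
    for h, d in change.items():
        if d:
            delta += _term(cov[h] + d, total) - _term(cov[h], total)
    return delta

phasing = [("0010010", "0010111"), ("0010111", "0010111"), ("0010010", "0010010")]
cov = Counter(h for pair in phasing for h in pair)
total = 2 * len(phasing)
d = entropy_delta(cov, total, phasing[0], ("0010011", "0010110"))
print(round(d, 4))  # positive: this re-phasing would increase the entropy
```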
Empirical Runtime
Extension to general pedigrees
• Parent-child relationships can be exploited to infer haplotype phase for a substantial fraction of the SNPs
• Phasing related genotypes based on the no-recombination assumption
• Algorithm modifications:
  • At each step, re-explain an entire family
  • Cache the inheritance pattern given by the first window to speed up computations for subsequent windows
  • Entropy computation based on founder haplotypes only
Enumerating No-Recombination Phasings for a Pedigree
• Gaussian elimination [Jiang et al.]
• [Gusev et al. 07] implementation based on simple backtracking
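A simple backtracking enumeration for the smallest pedigree, a mother/father/child trio, under the no-recombination assumption: each parent's genotype is split into a transmitted and an untransmitted haplotype, and the child must be explained by the two transmitted ones. This is our own illustrative sketch, not the [Gusev et al. 07] code; it ignores missing data and larger pedigrees, and the function name is hypothetical.

```python
def trio_phasings(mother, father, child):
    """Enumerate no-recombination phasings of a trio by backtracking over SNPs.

    Returns tuples (m_t, m_u, f_t, f_u) of transmitted/untransmitted parental
    haplotypes; the child is explained by (m_t, f_t).
    """
    options = {"0": [("0", "0")], "1": [("1", "1")], "2": [("0", "1"), ("1", "0")]}
    results = []

    def extend(j, mt, mu, ft, fu):
        if j == len(child):
            results.append((mt, mu, ft, fu))
            return
        for a_t, a_u in options[mother[j]]:
            for b_t, b_u in options[father[j]]:
                observed = a_t if a_t == b_t else "2"
                if observed == child[j]:  # prune assignments that contradict the child
                    extend(j + 1, mt + a_t, mu + a_u, ft + b_t, fu + b_u)

    extend(0, "", "", "", "")
    return results

for phasing in trio_phasings("012102", "022102", "022102"):
    print(phasing)
```

An empty result signals a Mendelian inconsistency, e.g. `trio_phasings("0", "0", "1")` returns `[]`.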
Empirical Evaluation
• International HapMap Project, Phase I & II datasets
  • 3.7 million SNP loci
  • Trio and unrelated genotypes from 4 different populations
  • Reference haplotypes obtained using PHASE
• Accuracy measures
  • Relative Genotype Error (RGE): percentage of missing genotypes inferred differently from the reference method
  • Relative Switching Error (RSE): number of switches needed to convert inferred haplotype pairs into the reference haplotype pairs
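The switching measure can be illustrated with a raw switch count for one genotype. The sketch assumes both pairs explain the same genotype with no missing calls; the normalization that makes RSE "relative" is not spelled out here, so only the count itself is computed (function name hypothetical).

```python
def switch_count(inferred, reference):
    """Phase switches needed to turn an inferred (h, h') pair into the reference pair.

    Only heterozygous SNPs carry phase. A switch is counted whenever the inferred h
    changes from following one reference haplotype to following the other between
    consecutive heterozygous SNPs.
    """
    h = inferred[0]
    r, r_prime = reference
    het = [j for j in range(len(r)) if r[j] != r_prime[j]]
    follows_r = [h[j] == r[j] for j in het]
    return sum(1 for a, b in zip(follows_r, follows_r[1:]) if a != b)

print(switch_count(("0010111", "0010010"), ("0010011", "0010110")))  # 1
print(switch_count(("0010110", "0010011"), ("0010011", "0010110")))  # 0: same pair, swapped order
```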
Empirical Evaluation (cont.)
Compared algorithms:
• ENT [Gusev et al. 07]
• 2SNP [Brinza&Zelikovsky 05]
• Pure Parsimony Trio Phasing (PPTP) [Brinza et al. 05]
• PHASE [Stephens et al. 01]
• HAP [Halperin&Eskin 04]
• FastPhase [Scheet&Stephens 06]
Results on Hapmap Phase II Trio Populations
ENT needs only a few hours on a regular workstation to phase the entire HapMap Phase II dataset, compared to PHASE, which required months of CPU time on two clusters with a total of 238 nodes
Complex Pedigree Phasing
Exploiting pedigree info significantly improves accuracy!
Application of Phasing: Missing data recovery
Genotype Error Detection
Genotyping Errors
A real problem despite advances in technology & typing algorithms
1.1% of 20 million dbSNP genotypes typed multiple times are inconsistent [Zaitlen et al. 2005]
Systematic errors (e.g., assay failure) are typically detected by departure from Hardy-Weinberg equilibrium (HWE) [Hosking et al. 2004]
In pedigrees, some errors detected as Mendelian Inconsistencies (MIs)
Many errors remain undetected: as much as 70% of errors are Mendelian consistent for mother/father/child trios [Gordon et al. 1999]
Effects of Undetected Genotyping Errors
Even low error levels can have large effects for some study designs (e.g. rare alleles, haplotype-based)
Errors as low as 0.1% can increase Type I error rates in the haplotype sharing transmission disequilibrium test (HS-TDT) [Knapp&Becker 04]
1% errors decrease power by 10-50% for linkage, and by 5-20% for association [Douglas et al. 00, Abecasis et al. 01]
Related Work
Improved genotype calling algorithms [Di et al. 05, Rabbee&Speed 06, Nicolae et al. 06]
Explicit modeling in analysis methods [Sieberts et al. 01, Sobel et al. 02, Abecasis et al. 02, Cheng 06]: computationally complex
Separate error detection step [Douglas et al. 00, Abecasis et al. 02, Becker et al. 06]: detected errors can be retyped, imputed, or ignored in downstream analyses
Likelihood Sensitivity Approach to Error Detection [Becker et al. 06]
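Mother: 0 1 2 1 0 2
Father: 0 2 2 1 0 2
Child: 0 2 2 1 0 2
Likelihood of best phasing for original trio T:
Mother: h1 = 0 1 1 1 0 0, h2 = 0 1 0 1 0 1
Father: h3 = 0 0 0 1 0 1, h4 = 0 1 1 1 0 0
Child: h1 = 0 1 1 1 0 0, h3 = 0 0 0 1 0 1
$$L(T) = \max \; p(h_1)\,p(h_2)\,p(h_3)\,p(h_4)$$
where the maximum is over all haplotype quadruples (h1, h2, h3, h4) that explain T, with h1 and h3 transmitted to the child.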
Likelihood Sensitivity Approach to Error Detection [Becker et al. 06]
Mother: 0 1 2 1 0 2
Father: 0 2 2 1 0 2
Child: 0 2 ? 1 0 2 (modified trio T': the genotype under test is replaced by ?)
Likelihood of best phasing for original trio T:
$$L(T) = \max \; p(h_1)\,p(h_2)\,p(h_3)\,p(h_4)$$
Mother: h'1 = 0 1 0 1 0 1, h'2 = 0 1 1 1 0 0
Father: h'3 = 0 0 0 1 0 0, h'4 = 0 1 1 1 0 1
Child: h'1 = 0 1 0 1 0 1, h'3 = 0 0 0 1 0 0
Likelihood of best phasing for modified trio T':
$$L(T') = \max \; p(h'_1)\,p(h'_2)\,p(h'_3)\,p(h'_4)$$
Likelihood Sensitivity Approach to Error Detection [Becker et al. 06]
Mother: 0 1 2 1 0 2
Father: 0 2 2 1 0 2
Child: 0 2 ? 1 0 2
• A large change in likelihood suggests a likely error
• Flag the genotype as an error if L(T')/L(T) > R, where R is the detection threshold (e.g., $R = 10^4$)
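The likelihood-sensitivity test can be prototyped by brute force on toy data. This sketch assumes known haplotype frequencies `freq` (haplotypes absent from it get probability 0), encodes a missing genotype as `?`, and maximizes exhaustively over all no-recombination trio phasings; FAMHAP and the HMM-based methods in the following slides compute L(T) differently. Function names are hypothetical.

```python
OPTIONS = {
    "0": [("0", "0")],
    "1": [("1", "1")],
    "2": [("0", "1"), ("1", "0")],
    "?": [("0", "0"), ("1", "1"), ("0", "1"), ("1", "0")],  # missing: any ordered allele pair
}

def best_trio_likelihood(mother, father, child, freq):
    """L(T) = max over no-recombination trio phasings of p(h1)p(h2)p(h3)p(h4).

    h1/h2 are the mother's transmitted/untransmitted haplotypes, h3/h4 the father's;
    the child is explained by (h1, h3). Exponential in the number of 2s and ?s.
    """
    best = 0.0

    def extend(j, h1, h2, h3, h4):
        nonlocal best
        if j == len(child):
            p = freq.get(h1, 0.0) * freq.get(h2, 0.0) * freq.get(h3, 0.0) * freq.get(h4, 0.0)
            best = max(best, p)
            return
        for m_t, m_u in OPTIONS[mother[j]]:
            for f_t, f_u in OPTIONS[father[j]]:
                observed = m_t if m_t == f_t else "2"
                if child[j] in ("?", observed):
                    extend(j + 1, h1 + m_t, h2 + m_u, h3 + f_t, h4 + f_u)

    extend(0, "", "", "", "")
    return best

def flag_error(trio, who, j, freq, R=1e4):
    """Flag SNP j of trio member `who` (0 mother, 1 father, 2 child) if L(T')/L(T) > R."""
    modified = list(trio)
    modified[who] = trio[who][:j] + "?" + trio[who][j + 1:]  # T': genotype under test set to missing
    L = best_trio_likelihood(*trio, freq)
    L_mod = best_trio_likelihood(*modified, freq)
    return L_mod > R * L  # avoids dividing by L(T), which may be 0

freq = {"011100": 0.4, "010101": 0.3, "000101": 0.2, "000100": 0.1}  # toy frequencies
trio = ("012102", "022102", "022102")
print(best_trio_likelihood(*trio, freq))  # 0.4 * 0.3 * 0.2 * 0.4
print(flag_error(trio, 2, 2, freq))       # False
```

A Mendelian inconsistency makes L(T) = 0, so e.g. `flag_error(("0", "0", "2"), 2, 0, {"0": 0.99, "1": 0.01})` is flagged.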
Implementation in FAMHAP [Becker et al. 06]
• Window-based algorithm: for each window including the SNP under test, generate a list of the H most frequent haplotypes (default H = 50)
• Find the most likely trio phasings by pruned search over the $H^4$ quadruples of frequent haplotypes
• Flag the genotype as an error if L(T')/L(T) > R for at least one window
Mother …201012 1 02210…
Father …201202 2 10211…
Child …000120 2 21021…
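A sketch of the window-level ingredients of this scheme: listing the windows that contain the SNP under test, truncating the haplotype list to the H most frequent entries, and the "at least one window" rule. The fixed window width `w` and all function names are assumptions for illustration; FAMHAP's actual window choices are not described here.

```python
def windows_containing(j, w, k):
    """All windows [start, start + w) over SNPs 0..k-1 that contain SNP j."""
    first = max(0, j - w + 1)
    last = min(j, k - w)
    return [(s, s + w) for s in range(first, last + 1)]

def top_haplotypes(freq, H=50):
    """Keep only the H most frequent haplotypes (ties keep insertion order)."""
    ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:H])

def flag_in_any_window(window_ratios, R=1e4):
    """Error if L(T')/L(T) > R in at least one window."""
    return any(ratio > R for ratio in window_ratios)

print(windows_containing(5, 3, 10))  # [(3, 6), (4, 7), (5, 8)]
print(top_haplotypes({"a": 0.5, "b": 0.2, "c": 0.3}, H=2))  # {'a': 0.5, 'c': 0.3}
```

Truncating to H haplotypes is what keeps the pruned search over $H^4$ quadruples tractable, and also why L(T) can be underestimated when a needed haplotype falls outside the list.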
Limitations of FAMHAP Implementation
Truncating the list of haplotypes to size H may lead to sub-optimal phasings and inaccurate L(T) values
False positives caused by nearby errors (due to the use of multiple short windows)
[Kennedy et al.]
• HMM model of haplotype diversity → all haplotypes are represented + no need for short windows
• Alternate likelihood functions → scalable runtime
HMM Model
Similar to models proposed by [Schwartz 04, Rastas et al. 05, Kimmel&Shamir 05, Scheet&Stephens 06]
Block-free model, paths with high transition probability correspond to “founder” haplotypes
(Figure from Rastas et al. 07)
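To show what "probability of a haplotype under the HMM" means computationally, here is a forward-algorithm sketch. It assumes a layered model with K states per SNP, transitions only between consecutive SNPs, and a per-state probability of emitting the minor allele; this parameterization is our assumption, and the cited models differ in details. One forward pass costs O(NK^2) for N SNPs.

```python
from itertools import product

def haplotype_probability(h, init, trans, emit):
    """Forward algorithm: total probability that the HMM emits the 0/1 haplotype h.

    init[q]        : probability of starting in state q at the first SNP
    trans[j][p][q] : probability of moving from state p at SNP j to state q at SNP j+1
    emit[j][q]     : probability that state q at SNP j emits the minor allele 1
    """
    def e(j, q):
        return emit[j][q] if h[j] == "1" else 1.0 - emit[j][q]

    K = len(init)
    alpha = [init[q] * e(0, q) for q in range(K)]
    for j in range(len(h) - 1):
        alpha = [sum(alpha[p] * trans[j][p][q] for p in range(K)) * e(j + 1, q) for q in range(K)]
    return sum(alpha)  # sum over all state paths

# Toy model with K = 2 states per SNP and N = 3 SNPs (numbers made up).
init = [0.6, 0.4]
trans = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]]
emit = [[0.1, 0.8], [0.2, 0.9], [0.05, 0.7]]
total = sum(haplotype_probability("".join(bits), init, trans, emit)
            for bits in product("01", repeat=3))
print(round(total, 10))  # 1.0: the probabilities of all 2^3 haplotypes sum to one
```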
HMM Training
Previous works use EM training of HMM based on unrelated genotype data
Two-step algorithm exploiting pedigree information [Kennedy et al. 07]
Step 1: Infer haplotypes using pedigree-aware algorithm based on entropy-minimization
Step 2: train HMM based on inferred haplotypes, using Baum-Welch
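Step 2 can be sketched as standard Baum-Welch on haplotype strings, assuming a layered model with K states per SNP, transitions between consecutive SNPs, and a per-state minor-allele probability. This is not the authors' implementation; names are hypothetical and probabilities are left unscaled, which is only safe for short toy haplotypes. Each call returns the log-likelihood of the model it started from, so the recorded history should be non-decreasing, the usual EM guarantee.

```python
import math
import random

def _e(emit, j, q, allele):
    """Emission probability of a 0/1 allele from state q at SNP j."""
    return emit[j][q] if allele == "1" else 1.0 - emit[j][q]

def forward_backward(h, init, trans, emit):
    """Unscaled forward/backward tables and the total probability of haplotype h."""
    N, K = len(h), len(init)
    alpha = [[0.0] * K for _ in range(N)]
    beta = [[1.0] * K for _ in range(N)]
    for q in range(K):
        alpha[0][q] = init[q] * _e(emit, 0, q, h[0])
    for j in range(1, N):
        for q in range(K):
            alpha[j][q] = _e(emit, j, q, h[j]) * sum(alpha[j - 1][p] * trans[j - 1][p][q] for p in range(K))
    for j in range(N - 2, -1, -1):
        for p in range(K):
            beta[j][p] = sum(trans[j][p][q] * _e(emit, j + 1, q, h[j + 1]) * beta[j + 1][q] for q in range(K))
    return alpha, beta, sum(alpha[N - 1])

def _normalize(v):
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)

def baum_welch_step(haplotypes, init, trans, emit):
    """One EM iteration; returns new (init, trans, emit) and the old model's log-likelihood."""
    N, K = len(emit), len(init)
    init_c = [0.0] * K
    trans_c = [[[0.0] * K for _ in range(K)] for _ in range(N - 1)]
    occ = [[0.0] * K for _ in range(N)]   # expected visits to state q at SNP j
    ones = [[0.0] * K for _ in range(N)]  # ... of which emit the minor allele 1
    loglik = 0.0
    for h in haplotypes:
        alpha, beta, P = forward_backward(h, init, trans, emit)
        loglik += math.log(P)
        for j in range(N):
            for q in range(K):
                g = alpha[j][q] * beta[j][q] / P  # posterior of state q at SNP j
                occ[j][q] += g
                if h[j] == "1":
                    ones[j][q] += g
        for q in range(K):
            init_c[q] += alpha[0][q] * beta[0][q] / P
        for j in range(N - 1):
            for p in range(K):
                for q in range(K):
                    trans_c[j][p][q] += (alpha[j][p] * trans[j][p][q]
                                         * _e(emit, j + 1, q, h[j + 1]) * beta[j + 1][q] / P)
    new_init = _normalize(init_c)
    new_trans = [[_normalize(row) for row in layer] for layer in trans_c]
    new_emit = [[ones[j][q] / occ[j][q] if occ[j][q] > 0 else emit[j][q] for q in range(K)]
                for j in range(N)]
    return new_init, new_trans, new_emit, loglik

rng = random.Random(1)
N, K = 4, 2
init = [0.5, 0.5]
trans = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(N - 1)]
emit = [[rng.uniform(0.2, 0.8) for _ in range(K)] for _ in range(N)]  # random start breaks symmetry
haps = ["0110", "0110", "1001", "1001", "0111"]  # stand-ins for haplotypes inferred in Step 1
history = []
for _ in range(20):
    init, trans, emit, ll = baum_welch_step(haps, init, trans, emit)
    history.append(ll)
print([round(x, 3) for x in history[:3]], round(history[-1], 3))
```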
Complexity of Computing Maximum Phasing Probability
How hard is it to compute the likelihood function of Becker et al.?
Theorem [Kennedy et al. 07]
• Cannot approximate L(T) within $O(n^{1/4-\epsilon})$, unless ZPP = NP, where n is the number of SNP loci
• For unrelated genotypes, computing the maximum phasing probability is hard to approximate within a factor of $O(n^{1/2-\epsilon})$
Open: complexity for fixed number of founder haplotypes
$$L(T) = \max \; p(h_1)\,p(h_2)\,p(h_3)\,p(h_4)$$
Complexity of Computing Maximum Phasing Probability
• Reductions from the clique problem
Alternate Likelihood Functions
• Viterbi probability (ViterbiProb): the maximum probability of a set of 4 HMM paths that emit 4 haplotypes compatible with the trio
• Probability of Viterbi Haplotypes (ViterbiHaps): product of total probabilities of the 4 Viterbi haplotypes
• Total Trio Probability (TotalProb): total probability P(T) that the HMM emits four haplotypes that explain trio T along all possible 4-tuples of paths
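Efficient Computation of Viterbi Probability for Trios
• For a fixed trio, Viterbi paths can be found using a 4-path version of Viterbi's algorithm in $O(NK^8)$ time
• $K^3$ speed-up by factoring common terms:
$$V(j+1; q_1,q_2,q_3,q_4) = E(j+1; q_1,q_2,q_3,q_4) \cdot \max_{q'_4 \in Q_j} \left\{ \mathrm{Pre}(j; q_1,q_2,q_3,q'_4)\, \tau(q'_4, q_4) \right\}$$
Where:
• $E(j+1; q_1,q_2,q_3,q_4)$ = maximum probability of emitting the SNP genotypes at locus j+1 from states $(q_1,q_2,q_3,q_4)$
• $\tau(q', q)$ = transition probability from state q' to state q
• $\mathrm{Pre}(j; q_1,q_2,q_3,q'_4)$ = precomputed maximum of $V(j; q'_1,q'_2,q'_3,q'_4)\,\tau(q'_1,q_1)\,\tau(q'_2,q_2)\,\tau(q'_3,q_3)$ over $q'_1,q'_2,q'_3 \in Q_j$, itself obtained by maximizing over one primed state at a time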
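The $K^3$ speed-up comes from the transition term factoring over the four paths: instead of maximizing over all $K^4$ predecessor tuples at once ($K^8$ work per locus), one primed state is replaced at a time, giving four passes of $K^5$ work each. The sketch below checks this on random toy tables; it covers only the maximization (the emission factor E would be multiplied in afterwards), and the function names are hypothetical.

```python
import math
import random
from itertools import product

def brute_force_step(V, tau, K):
    """For every (q1..q4): max over all (q'1..q'4) of V[q'] * prod_i tau[q'_i][q_i]. O(K^8)."""
    result = {}
    for q in product(range(K), repeat=4):
        result[q] = max(
            V[qp] * tau[qp[0]][q[0]] * tau[qp[1]][q[1]] * tau[qp[2]][q[2]] * tau[qp[3]][q[3]]
            for qp in product(range(K), repeat=4)
        )
    return result

def factored_step(V, tau, K):
    """Same maxima, replacing one primed state at a time: 4 passes of O(K^5) each."""
    cur = dict(V)
    for i in range(4):
        # After pass i, positions 0..i hold new states q and the rest still hold primed states q'.
        nxt = {}
        for key in product(range(K), repeat=4):
            nxt[key] = max(cur[key[:i] + (p,) + key[i + 1:]] * tau[p][key[i]] for p in range(K))
        cur = nxt
    return cur

rng = random.Random(0)
K = 3
V = {q: rng.random() for q in product(range(K), repeat=4)}   # toy V(j; q'_1..q'_4)
tau = [[rng.random() for _ in range(K)] for _ in range(K)]  # toy transition weights
slow, fast = brute_force_step(V, tau, K), factored_step(V, tau, K)
print(all(math.isclose(slow[q], fast[q], rel_tol=1e-12) for q in slow))  # True
```

The factoring is valid because all factors are non-negative, so a maximum can be pushed inside a product one coordinate at a time.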
Overall Runtimes
• Viterbi probability: likelihoods of all 3N modified trios can be computed within $O(NK^5)$ time using the forward-backward algorithm; overall runtime for M trios: $O(MNK^5)$
• Probability of Viterbi haplotypes: obtain haplotypes from standard traceback, then compute haplotype probabilities using the forward algorithm; overall runtime: $O(M(NK^5 + N^2K))$
• Total trio probability: similar pre-computation speed-up and forward-backward algorithm; overall runtime: $O(MNK^5)$
Empirical Evaluation
• Real dataset [Becker et al. 2006]
  • 35 SNP loci on chromosome 16 covering a region of 91 kb
  • 551 trios
• Synthetic datasets
  • 35 SNPs, 30-551 trios, same missing data pattern as the real dataset
  • Haplotypes assigned to trios based on frequencies inferred from the real dataset
  • 1% error rate, four error insertion models: random allele, random genotype, heterozygous-to-homozygous, homozygous-to-heterozygous
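One possible reading of the four error insertion models in 0/1/2 notation is sketched below. The exact definitions used in the experiments are not given on the slide, so these interpretations, the independent per-call sampling, and the function names are assumptions. Under the heterozygous-to-homozygous and homozygous-to-heterozygous models only eligible calls can change, so the realized error rate falls below `rate`.

```python
import random

def insert_error(g, model, rng):
    """Return a corrupted version of a single 0/1/2 genotype call (assumed model semantics)."""
    if model == "random_allele":
        # Flip one of the two alleles: a homozygote becomes heterozygous,
        # a heterozygote becomes one of the two homozygotes.
        return "2" if g in ("0", "1") else rng.choice("01")
    if model == "random_genotype":
        return rng.choice([x for x in "012" if x != g])
    if model == "het_to_hom":
        return rng.choice("01") if g == "2" else g
    if model == "hom_to_het":
        return "2" if g in ("0", "1") else g
    raise ValueError(f"unknown model: {model}")

def corrupt(genotypes, rate, model, seed=0):
    """Corrupt each non-missing call independently with probability `rate`.

    Returns the corrupted genotype strings and the (individual, SNP) positions changed.
    """
    rng = random.Random(seed)
    out, errors = [], []
    for n, g in enumerate(genotypes):
        calls = list(g)
        for j, c in enumerate(calls):
            if c != "?" and rng.random() < rate:
                new = insert_error(c, model, rng)
                if new != c:  # ineligible calls (e.g. a homozygote under het_to_hom) stay as-is
                    calls[j] = new
                    errors.append((n, j))
        out.append("".join(calls))
    return out, errors

data = ["0120" * 5, "2?10" * 5]
noisy, errs = corrupt(data, 0.01, "random_allele", seed=7)
print(len(errs), "errors inserted")
```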
Comparison of Alternative Likelihood Functions (1% Random Allele Errors)
(Plot: sensitivity vs. FP rate, FP rate from 0 to 0.015, for VitHaps-P, VitProb-P, TotalProb-P, VitHaps-C, VitProb-C and TotalProb-C)
Parents vs. Children (1% Random Allele Errors)
(Figure: two panels, Parents-TRIOS and Children-TRIOS, each plotting NO_ERR vs. ERR counts on a log scale from 1 to 1,000,000 against an x-axis running from 0 to 5.94)
FPs caused by same-locus errors in parents
“Combined” Detection Method
Compute 4 likelihood ratios:
• Trio
• Mother-child duo
• Father-child duo
• Child (unrelated)
Flag as error if all ratios are above the detection threshold
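The combined rule is a simple conjunction. In this sketch the four ratios for a child's genotype are assumed to be precomputed by one of the likelihood functions above (e.g., TotalProb), and the dictionary keys and function name are hypothetical. Because the child-alone ratio does not use the parents' genotypes, a same-locus error in a parent cannot push it above R, which fits the parental false-positive pattern noted earlier.

```python
def combined_flag(ratios, R=1e4):
    """Flag a child's genotype only if all four likelihood ratios exceed R."""
    required = ("trio", "mother_child_duo", "father_child_duo", "child_alone")
    return all(ratios[name] > R for name in required)

example = {"trio": 3e5, "mother_child_duo": 4e4, "father_child_duo": 2e5, "child_alone": 50.0}
print(combined_flag(example))  # False: the child-alone ratio stays below R
```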
Comparison with FAMHAP (Children)
(Plot: sensitivity vs. FP rate, FP rate from 0 to 0.015, for TotalProb-UNO, TotalProb-DUO, TotalProb-TRIO, TotalProb-COMBINED, FAMHAP-1 and FAMHAP-3)
Comparison with FAMHAP (Parents)
(Plot: sensitivity vs. FP rate, FP rate from 0 to 0.015, for TotalProb-UNO, TotalProb-DUO, TotalProb-TRIO, TotalProb-COMBINED, FAMHAP-1 and FAMHAP-3)
Acknowledgements
Sasha Gusev, Justin Kennedy, Bogdan Pasaniuc
NSF funding (Awards 0546457 and 0543365)
Software available at http://www.engr.uconn.edu/~ion/SOFT/