zhixiang chenuniversity of texas pan american bin fuuniversity of texas pan american
DESCRIPTION
Linear Time Probabilistic Algorithms for the Singular Haplotype Reconstruction Problem from SNP Fragments. Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American Robert Schweller University of Texas Pan American Boting YangUniversity of Regina - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/1.jpg)
Linear Time Probabilistic Algorithms for the Singular Haplotype Reconstruction Problem from SNP Fragments
Zhixiang Chen University of Texas Pan AmericanBin Fu University of Texas Pan AmericanRobert SchwellerUniversity of Texas Pan AmericanBoting Yang University of ReginaZhiyu Zhao University of New OrleansBinhai Zhu Montana State University
The Sixth Asia Pacific Bioinformatics ConferenceJanuary 16, 2008
![Page 2: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/2.jpg)
Outline
• Background, Haplotypes • Problem Formulation• Algorithms• Empirical Data
![Page 3: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/3.jpg)
AGATTACCACATAGTCGATAGAGATTACTAGTATCGATC
AGTTTACCACATAGTCGATCGAGATTACCAGTATCGAAC
AGATTACCACATAGGCGATCGAGATTACCAGTATCGATC
HaplotypesFull Chromosome for a cat
![Page 4: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/4.jpg)
AGATTACCACATAGTCGATAGAGATTACTAGTATCGATC
AGTTTACCACATAGTCGATCGAGATTACCAGTATCGAAC
AGATTACCACATAGGCGATCGAGATTACCAGTATCGATC
HaplotypesMost sites are the same, those that differ areSingle nucleotide polymorphisms ( SNP’s )
![Page 5: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/5.jpg)
AGATTACCACATAGTCGATAGAGATTACTAGTATCGATC
AGTTTACCACATAGTCGATCGAGATTACCAGTATCGAAC
AGATTACCACATAGGCGATCGAGATTACCAGTATCGATC
HaplotypesSingle Nucleotide Polymorphisms ( SNP’s )
![Page 6: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/6.jpg)
AGATTACCACATAGTCGATAGAGATTACTAGTATCGATC
AGTTTACCACATAGTCGATCGAGATTACCAGTATCGAAC
AGATTACCACATAGGCGATCGAGATTACCAGTATCGATC
ATATT
TTCCA
AGCCT
Haplotypes
Haplotype
Haplotype: Compact genetic fingerprint specifying identifying characteristics among individual specimens within the same species.
![Page 7: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/7.jpg)
ATATT
TTCCA
AGCCT
Haplotypes, simplified even further
Each SNP site only variesbetween two possiblevalues.
Use binary representation
![Page 8: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/8.jpg)
ATATT
TTCCA
AGCCT
HaplotypesEach SNP site only variesbetween two possiblevalues.
Use binary representation01001
11110
00111
Determination of an individual’s haplotype is a key step in the analysis of genetic variation.-Drug design- Medical applications
![Page 9: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/9.jpg)
Outline
• Background, Haplotypes • Problem Formulation• Algorithms• Empirical Data
![Page 10: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/10.jpg)
Haplotype Reconstruction Problem
0100100100101010
1110101000111011
Diploid organism:Two haplotypes
2 Unknown haplotypes
![Page 11: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/11.jpg)
Diploid organism:Two haplotypes
Unknown
Given: Set of n fragments1) Incomplete
n x m SNP matrix
--0010-10-10—01—1—101-1—0---10----001-01001--010010---01-010-01011----10001-101-111—10-0--1110-1--0010010-10-01--1101—100-111--1
0100100100101010
1110101000111011
Haplotype Reconstruction Problem
![Page 12: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/12.jpg)
Diploid organism:Two haplotypes
Unknown
Given: Set of n fragments1) Incomplete2) Inconsistent
--0010-10-10—01—1—101-1—0---10----001-01001--010010---01-010-01011----10001-101-111—10-0--1110-1--0010010-10-01--1101—100-111--1
n x m SNP matrix
0100100100101010
1110101000111011
Haplotype Reconstruction Problem
![Page 13: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/13.jpg)
Diploid organism:Two haplotypes
Unknown
Given: Set of n fragments1) Incomplete2) Inconsistent
--1010-11-10—01—1—101-0—0---10----011-01001--010010---01-010-00011----10101-101-111—01-0--1110-1--0110010-10-01--1111—000-111--0
n x m SNP matrix
0100100100101010
1110101000111011
Haplotype Reconstruction Problem
![Page 14: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/14.jpg)
Diploid organism:Two haplotypes
Unknown
Given: Set of n fragments1) Incomplete2) Inconsistent3) Don’t know which
fragments came fromthe same haplotype.
--1010-11-10—01—1—101-0—0---10----011-01001--010010---01-010-00011----10101-101-111—01-0--1110-1--0110010-10-01--1111—000-111--0
n x m SNP matrix
0100100100101010
1110101000111011
Haplotype Reconstruction Problem
![Page 15: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/15.jpg)
Haplotype Reconstruction Problem
Input:-n x m SNP matrix
Output:-The 2 “best” haplotypes for generating the input matrix.
--1010-11-10—01—1—101-0—0---10----011-01001--010010---01-010-00011----10101-101-111—01-0--1110-1--0110010-10-01--1111—000-111--0
n x m SNP matrix
- What is meant by “best” defines multiple optimization problems
![Page 16: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/16.jpg)
Haplotype Reconstruction Problem
--1010-11-10—01—1—101-0—0---10----011-01001--010010---01-010-00011----10101-101-111—01-0--1110-1--0110010-10-01--1111—000-111--0
n x m SNP matrixWhat is meant by “best” defines multiple optimization problems
Minimum Error Correction (MEC)- Flip the minimum number of matrix positions such that the matrix can be partitioned into 2 sets of consistent strings.
![Page 17: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/17.jpg)
Haplotype Reconstruction Problemn x m SNP matrixWhat is meant by “best” defines
multiple optimization problems
Minimum Error Correction (MEC)- Flip the minimum number of matrix positions such that the matrix can be partitioned into 2 sets of consistent strings.
--1010-11-10—01—1—101-0—0---10----011-01001--010010---01-010-00011----10101-101-111—01-0--1110-1--0110010-10-01--1111—000-111--0
![Page 18: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/18.jpg)
Haplotype Reconstruction Problemn x m SNP matrixWhat is meant by “best” defines
multiple optimization problems
Minimum Error Correction (MEC)- Flip the minimum number of matrix positions such that the matrix can be partitioned into 2 sets of consistent strings.
--0010-10-10—01—1—101-1—0---10----001-01001--010010---01-010-01011----10001-101-111—10-0--1110-1--0010010-10-01--1101—100-111--1
![Page 19: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/19.jpg)
Haplotype Reconstruction Problemn x m SNP matrixWhat is meant by “best” defines
multiple optimization problems
Minimum Error Correction (MEC)- Flip the minimum number of matrix positions such that the matrix can be partitioned into 2 sets of consistent strings.
--0010-10-10—01—1—101-1—0---10----001-01001--010010---01-010-01011----10001-101-111—10-0--1110-1--0010010-10-01--1101—100-111--1
![Page 20: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/20.jpg)
Haplotype Reconstruction Problemn x m SNP matrixWhat is meant by “best” defines
multiple optimization problems
Minimum Error Correction (MEC)- Flip the minimum number of matrix positions such that the matrix can be partitioned into 2 sets of consistent strings.
--0010-10-10—01—1—101-1—0---10----001-01001--010010---01-010-01011----10001-101-111—10-0--1110-1--0010010-10-01--1101—100-111--1Minimum Fragment Removal (MFR)
- Remove minimum number of fragments/rows
Minimum SNP Removal (MSR)- Remove minimum number of SNP sites/columns
![Page 21: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/21.jpg)
Haplotype Reconstruction Problemn x m SNP matrixWhat is meant by “best” defines
multiple optimization problems
Minimum Error Correction (MEC)- Flip the minimum number of matrix positions such that the matrix can be partitioned into 2 sets of consistent strings.
--0010-10-10—01—1—101-1—0---10----001-01001--010010---01-010-01011----10001-101-111—10-0--1110-1--0010010-10-01--1101—100-111--1Minimum Fragment Removal (MFR)
- Remove minimum number of fragments/rows
Minimum SNP Removal (MSR)- Remove minimum number of SNP sites/columns
Great problems. However, they are all computationally hard.[Bafna, Istrail, Lancia, Rizzi, 2005],[Cilibrasi, Iersel, Kelk, Tromp, 2005][Lippert, Schwartz, Lancia, Istrail, 2002]
![Page 22: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/22.jpg)
Haplotype Reconstruction Problem
Dealing with the hardness ofHaplotype Reconstruction
![Page 23: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/23.jpg)
Haplotype Reconstruction Problem
Dealing with the hardness ofHaplotype Reconstruction
Approximation algorithms forMEC, MFR, MSR.Some work here, but..- Some versions provably hard to approximate- No known O(1) approximation for MEC.
![Page 24: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/24.jpg)
Haplotype Reconstruction Problem
Dealing with the hardness ofHaplotype Reconstruction
Approximation algorithms forMEC, MFR, MSR.Some work here, but..- Some versions provably hard to approximate- No known O(1) approximation for MEC.
Consider a probabilistic modelFor SNP matrix generation.
(our approach)
![Page 25: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/25.jpg)
Haplotype Reconstruction ProblemConsider the following model parameters:
1) a1 , inconsistency rate2) a2 , incompleteness rate3) b , difference rate
![Page 26: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/26.jpg)
Haplotype Reconstruction ProblemConsider the following model parameters:
1) a1 , inconsistency rate2) a2 , incompleteness rate3) b , difference rate
Haplotype: 01001001
Random : 000-100-Fragment
Rate a1
A fragment is generated(a1, a2) by a Haplotype H if1) mismatch/inconsistency errors happened
with probability at most a1.
![Page 27: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/27.jpg)
Haplotype Reconstruction ProblemConsider the following model parameters:
1) a1 , inconsistency rate2) a2 , incompleteness rate3) b , difference rate
Haplotype: 01001001
Random : 000-100-Fragment
Rate a1 Rate a2
A fragment is generated(a1, a2) by a Haplotype H if1) mismatch/inconsistency errors happened
with probability at most a1.2) Holes happen with probability at most a2.
![Page 28: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/28.jpg)
Haplotype Reconstruction ProblemConsider the following model parameters:
1) a1 , inconsistency rate2) a2 , incompleteness rate3) b , difference rate
A fragment is generated(a1, a2) by a Haplotype H if1) mismatch/inconsistency errors happened
with probability at most a1.2) Holes happen with probability at most a2.
Haplotype Reconstruction Problem:
Let a1, a2, and b be small positive constants.
Let H1 and H2 be any two length m haplotypesSuch that: HAMM( H1, H2 )
m ≥ b
Input: n x m SNP matrix such that each rowIs generated(a1, a2) by either H1 or H2.
Output: H1 and H2. (with high probability)
![Page 29: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/29.jpg)
Outline
• Background, Haplotypes • Problem Formulation• Algorithms• Empirical Data
![Page 30: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/30.jpg)
Algorithm 1010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010010-10-10101—
1) Grab a random fragment X
![Page 31: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/31.jpg)
Algorithm 1010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010010-10-10101—
1) Grab a random fragment X2) Compare each remaining fragment Y to X:
- IF diff( Y, X ) < 2( α1 + ε )THEN put Y in GROUP1
ELSE put Y in GROUP2
![Page 32: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/32.jpg)
Algorithm 1010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
1—101-0—0--110--11----10101-101-111—10-0--1110-1-1111—100-111--1
010010-10-10101—--011-01001—-010010-1001-010-000--0110010-10-01-
1) Grab a random fragment X2) Compare each remaining fragment Y to X:
- IF diff( Y, X ) < 2( α1 + ε )THEN put Y in GROUP1
ELSE put Y in GROUP2
For Example:α1 = .03 ε = .04
2 or fewer mismatches, GROUP13 or more, GROUP2
![Page 33: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/33.jpg)
Algorithm 1010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
1—101-0—0--110--11----10101-101-111—10-0--1110-1-1111—100-111--1
010010-10-10101—--011-01001—-010010-1001-010-000--0110010-10-01-
1) Grab a random fragment X2) Compare each remaining fragment Y to X:
- IF diff( Y, X ) < 2( α1 + ε )THEN put Y in GROUP1
ELSE put Y in GROUP2
For Example:α1 = .03 ε = .04
2 or fewer mismatches, GROUP13 or more, GROUP2
0101100100101010 1110101000111011
Consensus Strings
![Page 34: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/34.jpg)
Algorithm 1
1) Grab a random fragment X2) Compare each remaining fragment Y to X:
- IF diff( Y, X ) < 2( α1 + ε )THEN put Y in GROUP1
ELSE put Y in GROUP2
GROUP1 GROUP2
1—101-0—0--110--11----10101-101-111—10-0--1110-1-1111—100-111--1
010010-10-10101—--011-01001—-010010-1001-010-000--0110010-10-01-
0101100100101010 1110101000111011
Consensus Strings
Theorem: n₁ and n₂ denote the number of fragments generated from H1 and H2 respectively
If 4(α₁ + ε) < β
0 < 2α₁ + α₂ - ε < 1 Then the target haplotypes are reconstructed with probability at least Pr ≥ 1 - 2ne^(-ε²m/3) – 2me ^(-ε²n₁/2) – 2me ^(-ε²n₂/2)
α₁ = 3% n = 50000α₂ = 3% m = 50000Β = 30% n₁ = 25000ε = 0.044 n₂ = 25000
Pr[success] ≥ 0.999993817
Example:
![Page 35: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/35.jpg)
Downside: Algorithm 1 needs to know the parameter α1
Algorithm 1
1) Grab a random fragment X2) Compare each remaining fragment Y to X:
- IF diff( Y, X ) < 2( α1 + ε )THEN put Y in GROUP1
ELSE put Y in GROUP2
GROUP1 GROUP2
1—101-0—0--110--11----10101-101-111—10-0--1110-1-1111—100-111--1
010010-10-10101—--011-01001—-010010-1001-010-000--0110010-10-01-
0101100100101010 1110101000111011
Consensus Strings
![Page 36: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/36.jpg)
Algorithm 2010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010-1001-010-000010010-10-10101—
Pick 2 fragments at random
![Page 37: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/37.jpg)
Algorithm 2010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
Partition each remaining fragment to closestrepresentative fragmentGROUP1 GROUP2
010-1001-010-0001—101-0—0--110----0110010-10-01-
010010-10-10101—--011-01001—-01011----10101-101-111—10-0--1110-1-1111—100-111--1
![Page 38: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/38.jpg)
Algorithm 2010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010-1001-010-0001—101-0—0--110----0110010-10-01-
010010-10-10101—--011-01001—-01011----10101-101-111—10-0--1110-1-1111—100-111--1
Can get a Bad Partition
So try again…
![Page 39: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/39.jpg)
Algorithm 2010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010-1001-010-0001—101-0—0--110----0110010-10-01-
010010-10-10101—--011-01001—-01011----10101-101-111—10-0--1110-1-1111—100-111--1
-Repeatedly pick 2 random fragments, and partition.
![Page 40: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/40.jpg)
Algorithm 2010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010-1001-010-0001—101-0—0--110----0110010-10-01-
010010-10-10101—--011-01001—-01011----10101-101-111—10-0--1110-1-1111—100-111--1
-Repeatedly pick 2 random fragments, and partition.
- For each iteration, computeMAX distance between representative fragment and any other fragment in the same group.
![Page 41: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/41.jpg)
Algorithm 2010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010-1001-010-0001—101-0—0--110----0110010-10-01-
010010-10-10101—--011-01001—-01011----10101-101-111—10-0--1110-1-1111—100-111--1
-Repeatedly pick 2 random fragments, and partition.
- For each iteration, computeMAX distance between representative fragment and any other fragment in the same group.
-Repeat O(1) times, outputthe best partition found over all iterations.
Achieves success withhigh probability
Does not require priorknowledge of modelparameters
![Page 42: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/42.jpg)
010010-10-10101—1—101-0—0--110----011-01001--010010-1001-010-00011----10101-101-111—10-0--1110-1--0110010-10-01--1111—100-111--1
GROUP1 GROUP2
010-1001-010-0001—101-0—0--110----0110010-10-01-
010010-10-10101—--011-01001—-01011----10101-101-111—10-0--1110-1-1111—100-111--1
-Repeatedly pick 2 random fragments, and partition.
- For each iteration, computeMAX distance between representative fragment and any other fragment in the same group.
-Repeat O(1) times, outputthe best partition found over all iterations.
Achieves success withhigh probability
Does not require priorknowledge of modelparameters
Measure SUM of distances between representative fragment and ALL other fragments in the same group
Algorithm 3
-This achieves the best empirical results
![Page 43: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/43.jpg)
Outline
• Background, Haplotypes • Problem Formulation• Algorithms• Empirical Data
![Page 44: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/44.jpg)
Empirical Tests
In experiments, Algorithm 3 gave the best performance:
![Page 45: Zhixiang ChenUniversity of Texas Pan American Bin FuUniversity of Texas Pan American](https://reader035.vdocuments.mx/reader035/viewer/2022062813/5681656f550346895dd80305/html5/thumbnails/45.jpg)
Summary / Future Work• Summary– Provably good probabilistic algorithms for singular
haplotype reconstruction.– Algorithms are fast: linear time – Good empirical performance on simulated data
• Future Work– Consider model with short fragments– Test with real biological data
The software of our algorithms is available for public access and for real-time on-line demonstration at
http://fpsa.cs.uno.edu/HapRec/HapRec.html
Thank you for listening. Questions?