Introduction Improved FPT algorithm Induced Haplotyping Conclusion
Extended Islands of Tractability for
Parsimony Haplotyping
Rudolf Fleischer¹, Jiong Guo², Rolf Niedermeier³, Johannes Uhlmann³, Yihui Wang¹, Mathias Weller³, and Xi Wu¹
¹Fudan University Shanghai, ²Universität des Saarlandes, and ³Friedrich-Schiller-Universität Jena
June 22, 2010
First, Some Biology...
A G T T A G C G A
A G T C A G C A A
approx. 0.1% of human nucleotide sites differ between individuals
the differing sites are SNPs (single nucleotide polymorphisms); the sequence of SNPs is called a haplotype
T G
C A
humans are diploid → 2 chromosome sets
haplotype = SNP sequence in one chromosome set
genotype = SNP sequence in the combined chromosome sets
Motivation
Haplotype Inference
goal: find relation between certain SNPs and genetic diseases
problem: difficult (expensive) to sequence both haplotypes
but: easy (cheap) to sequence the genotype instead
→ idea: sequence genotype and computationally infer haplotypes
Problems
impossible to infer haplotypes of just 1 genotype → sequence and infer groups/populations
which explanation should be preferred if there are multiple? → parsimony (“Ockham’s razor”)
how to perform the actual computation fast?
Preliminary Definitions
Definition (Haplotype, Genotype, Resolve)
haplotype h = string over {0, 1}
genotype g = string over {0, 1, 2}
h1, h2 resolve g ⇔ for all i ∈ N, g[i] = h1[i] = h2[i] or (g[i] = 2 and h1[i] ≠ h2[i])
multiset H → res(H)
multiset H resolves multiset G ⇔ G ⊆ res(H)
Example
haplotype1: 0 0 1 0 1 1 1 0 0 0 1
haplotype2: 0 0 1 1 0 1 0 0 1 1 1
genotype:   0 0 1 2 2 1 2 0 2 2 1
a different pair of haplotypes resolving the same genotype:
haplotype1: 0 0 1 1 1 1 1 0 0 0 1
haplotype2: 0 0 1 0 0 1 0 0 1 1 1
genotype:   0 0 1 2 2 1 2 0 2 2 1
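The "resolve" relation above can be checked position by position. The following is a minimal sketch, assuming haplotypes and genotypes are given as Python strings over '01' and '012'; the names `resolves` and `genotype_of` are hypothetical helpers, not taken from the talk.

```python
def resolves(h1: str, h2: str, g: str) -> bool:
    """Return True if haplotypes h1, h2 (over '01') resolve genotype g (over '012')."""
    if not (len(h1) == len(h2) == len(g)):
        return False
    for a, b, c in zip(h1, h2, g):
        if c == "2":
            # A 2 in the genotype means the two haplotypes differ here.
            if a == b:
                return False
        elif not (a == b == c):
            # A 0 or 1 means both haplotypes carry exactly that value.
            return False
    return True


def genotype_of(h1: str, h2: str) -> str:
    """Return the unique genotype resolved by the pair h1, h2."""
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


g = "00122120221"
print(resolves("00101110001", "00110100111", g))  # first pair from the example
print(resolves("00111110001", "00100100111", g))  # second pair from the example
```

Both haplotype pairs from the example resolve the same genotype, which is why a single genotype does not determine its haplotypes.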
Definition (Haplotype Graph)
Given H and G ⊆ res(H), define the haplotype graph of H and G:
|H| vertices (labeled by H)
|G| edges (labeled by G)
haplotypes of the endpoints of an edge resolve its genotype
Example
[Figure: haplotype graph with vertices 11011, 01001, 11111, 10110, 11100 and edges labeled 12120, 11122, 21221, 12212, 22222, 21021]
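Because the drawing of the example graph did not survive, the edges can be recomputed from the definition: each genotype labels an edge whose endpoint haplotypes resolve it. The sketch below does this by testing all pairs; `haplotype_graph` is a hypothetical name, and the code assumes every genotype is resolved by some pair of distinct entries of H.

```python
from itertools import combinations


def resolves(h1, h2, g):
    """h1, h2 resolve g: equal to g where g is 0/1, different where g is 2."""
    return all((a != b) if c == "2" else (a == b == c)
               for a, b, c in zip(h1, h2, g))


def haplotype_graph(H, G):
    """Return a list of (genotype, (h1, h2)) edges, one per genotype in G."""
    edges = []
    for g in G:
        # Take the first pair of haplotypes (in the order of H) that resolves g.
        pair = next(((a, b) for a, b in combinations(H, 2) if resolves(a, b, g)), None)
        if pair is None:
            raise ValueError(f"genotype {g} is not resolved by any pair in H")
        edges.append((g, pair))
    return edges


H = ["11011", "01001", "11111", "10110", "11100"]
G = ["12120", "11122", "21221", "12212", "22222", "21021"]
for g, (a, b) in haplotype_graph(H, G):
    print(f"{a} --{g}-- {b}")
```

For this example the resolving pair of each genotype is unique, so the output recovers the edge set of the example graph.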
Haplotype Inference by Parsimony
Definition (Haplotype Inference by Parsimony)
Input: multiset G of length-m genotypes, integer k ≥ 0
Question: ∃ multiset H of k haplotypes that resolves G?
Example
genotypes: 11122, 12120, 12212, 21021, 21221, 22222
[Figure: haplotype graph on the haplotypes below, with one edge per genotype]
haplotypes: 01001, 10110, 11011, 11100, 11111
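To make the decision problem concrete, here is a naive exhaustive baseline. It is not the algorithm of the talk: it simply tries every multiset of k haplotypes, so it is only usable for very small m and k. The names `resolves_all` and `is_yes_instance` are hypothetical, and the sketch assumes the genotypes in G are pairwise distinct, so that "resolves" reduces to "every genotype is resolved by some pair".

```python
from itertools import combinations, combinations_with_replacement, product


def resolves(h1, h2, g):
    """h1, h2 resolve g: equal to g where g is 0/1, different where g is 2."""
    return all((a != b) if c == "2" else (a == b == c)
               for a, b, c in zip(h1, h2, g))


def resolves_all(H, G):
    """True if every genotype in G is resolved by two entries of the multiset H."""
    pairs = list(combinations(H, 2))  # pairs of different positions in H
    return all(any(resolves(a, b, g) for a, b in pairs) for g in G)


def is_yes_instance(G, k):
    """Return a multiset of k haplotypes resolving G, or None if there is none."""
    m = len(G[0])
    candidates = ["".join(bits) for bits in product("01", repeat=m)]
    for H in combinations_with_replacement(candidates, k):
        if resolves_all(H, G):
            return H
    return None


print(is_yes_instance(["12", "21", "22"], 2))  # None: one pair resolves one genotype
print(is_yes_instance(["12", "21", "22"], 3))  # ('01', '10', '11')
```

The search space has size roughly 2^(m·k), which is what motivates looking for algorithms whose exponential part depends on k alone.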
Previous Work
Complexity
NP-hard [Halldórsson et al., DMTCS ’03]
APX-hard [Lancia et al., INFORMS ’04]
many special cases in P (e.g. [Lancia & Rizzi, ORL ’06])
Algorithms
Factor-2^{d−1} approximation [Lancia & Rizzi, ORL ’06]
O(2^{|G|·d}) Branch & Bound [Wang & Xu, Bioinformatics ’03]
O(k^{2k^2} · m) algorithm [Sharan et al., TCBB ’06]
k := #haplotypes, m := genotype length, d := max # 2’s in a genotype
Our Contribution
k^{4k} · poly(|G|, m) time algorithm
polynomial-time solvable special case: “Induced Haplotype Inference”
Inference Graph
Definition (Inference Graph)
inference graph Γ of G = an order-k graph with edges consistently labeled by the genotypes in G
Example
[Figure: inference graph with edges labeled 12120, 11122, 21221, 12212, 22222, 21021]
If Only We Had an Inference Graph...
Observation (non-bipartite components of Γ)
C = cycle in Γ → for all i, |{g ∈ C | g[i] = 2}| is even
non-bipartite components of Γ → O(|Γ| · m) time
Example
[Figure: inference graph with edges labeled 12120, 11122, 21221, 12212, 22222, 21021]
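The parity condition on cycles is easy to test. Walking around a cycle, every 2 at position i flips the bit at that position, and returning to the starting haplotype requires an even number of flips. The sketch below checks this; `even_twos_on_cycle` is a hypothetical name, and the triangle used here (edges 21021, 22222, 12212 joining 11011, 01001, 10110) is read off the haplotype graph example by applying the resolve definition.

```python
def even_twos_on_cycle(cycle_genotypes):
    """True if, at every position, an even number of the cycle's genotypes have a 2."""
    m = len(cycle_genotypes[0])
    return all(
        sum(g[i] == "2" for g in cycle_genotypes) % 2 == 0
        for i in range(m)
    )


# Edge labels of the triangle 11011 - 01001 - 10110 - 11011.
print(even_twos_on_cycle(["21021", "22222", "12212"]))

# Swapping in a genotype that is not on this triangle breaks the parity condition,
# so these three labels cannot be consistently placed on a cycle.
print(even_twos_on_cycle(["21021", "22222", "11122"]))
```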
Observation (bipartite components of Γ)
all g[i] = 2 for some i ⇒ choose arbitrarily
bipartite components of Γ → O(|Γ| · m) time
Example
[Figure: bipartite component with edges labeled 12120, 11122, 21221, 22222]
Observation
(G, k) yes-instance with solution H ⇒ ∃ Γ extendable (O(|Γ| · m) time) to a haplotype graph of H and G
algorithmic idea: guess Γ
better idea: guess a “spanning” subgraph of Γ
Algorithm Outline
Algorithm
1. guess “spanning” subgraph of Γ
   1. guess a size-k genotype subset of G → O(k^{2k}) possibilities
   2. for these genotypes, guess 2 (of k) vertices → O(k^{2k}) possibilities
2. infer the haplotype multiset H → O(k · m) time
3. check whether H resolves G → O(k^2 · m) time
Theorem
Haplotype Inference by Parsimony can be solved in O(k^{4k+2} · m) time.
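Step 2 of the outline can be illustrated as follows. This is a sketch under stated assumptions, not necessarily the authors' procedure: it takes an already guessed graph as a list of (u, v, genotype) edges on vertices 0..k-1 and fills in each position by breadth-first propagation. An edge with a 0 or 1 fixes both endpoints, an edge with a 2 forces the endpoints to differ, and a component in which all edges have a 2 at some position gets an arbitrary start value. The name `infer_haplotypes` is hypothetical.

```python
from collections import deque


def infer_haplotypes(k, m, edges):
    """Return k haplotype strings consistent with the labeled edges, or None."""
    adj = [[] for _ in range(k)]
    for u, v, g in edges:
        adj[u].append((v, g))
        adj[v].append((u, g))
    hap = [[None] * m for _ in range(k)]

    def propagate(queue, i):
        # Push known bits along edges: same bit across 0/1, flipped bit across 2.
        while queue:
            u = queue.popleft()
            for v, g in adj[u]:
                want = hap[u][i] if g[i] != "2" else 1 - hap[u][i]
                if hap[v][i] is None:
                    hap[v][i] = want
                    queue.append(v)
                elif hap[v][i] != want:
                    return False
        return True

    for i in range(m):
        queue = deque()
        # Seed: a 0 or 1 at position i fixes both endpoints of that edge.
        for u, v, g in edges:
            if g[i] != "2":
                for w in (u, v):
                    if hap[w][i] is None:
                        hap[w][i] = int(g[i])
                        queue.append(w)
                    elif hap[w][i] != int(g[i]):
                        return None
        if not propagate(queue, i):
            return None
        # Components without a seed: all their edges have a 2 here, choose arbitrarily.
        for s in range(k):
            if hap[s][i] is None:
                hap[s][i] = 0
                if not propagate(deque([s]), i):
                    return None
    return ["".join(map(str, row)) for row in hap]


# Edges of the earlier haplotype graph example (vertex numbering is an assumption).
edges = [(0, 1, "21021"), (1, 2, "21221"), (0, 3, "12212"),
         (1, 3, "22222"), (2, 4, "11122"), (3, 4, "12120")]
print(infer_haplotypes(5, 5, edges))
```

Step 3 would then test the inferred multiset against every genotype of G, and steps 1 to 3 are repeated over all guesses; multiplying the two O(k^{2k}) guess counts by the O(k^2 · m) check gives the k^{4k+2} · m bound of the theorem.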
Example
[Figure, built up over several steps: a graph on nine haplotype vertices with edges labeled by the genotypes 11122, 21221, 12212, 22222, 21021, 22020, 01022, 21221, 12222; the overview also shows the genotype 12120]
first haplotype, partially inferred: ?10?1
next step: 11011, 1??1?, 111??, 111?1, 1?0?0, 010?0, 010?1, 11??1
next step: 01001, 11011, 10110, 11100, 11111, 100?0, 010?0, 010?1, 111?1
final haplotypes: 01001, 11011, 10110, 11100, 11111, 10010, 01000, 01011, 11101
Induced Haplotype Inference
recall: H resolves G ⇔ G ⊆ res(H)
what if G = res(H)? → haplotype graph is a clique
Definition
G = res(H) → H induces G
Definition (Induced Haplotype Inference by Parsimony)
Input: multiset G of length-m genotypes
Question: ∃ multiset H of haplotypes that induces G?
Observations & Algorithm
Observation
G can be partitioned “nicely” into G0, G1, and G2.
Observation
G2 ≠ ∅ but G0 = ∅ or G1 = ∅ → poly
|G0| = |G1| = 1 → poly (although we may get 2 solutions)
Strategy: Divide & Conquer
divide-step → OK
base cases → OK
merge-step → problem!
need to find a way to compute H1 for given H0 and G2
Extending a Subclique Solution
Observation
Let H0 induce G0 and let g be a genotype in G2 with the smallest number of 2’s. → All h ∈ H0 that are consistent with g are equal.
Proof Idea
[Figure: H0 and H1 drawn side by side, with the genotypes g and g′ between them]
g = 2120 …, h′ = 1100 …, h = 0110 …
h′′ = ?1?0 …
Concluding The Induced Problem
Divide & Conquer like algorithm
divide steps O(|G| · k)
base solutions O(|G| · m)
extends (merges) O(|G| · k · m)
all in all O(|G| · k · m)
Conclusion
what we saw. . .
Induced Haplotyping in O(|G| · k · m) time
2^{O(k^2 log k)}-time algorithm improved to 2^{O(k log k)}
also in the paper
results also hold for sets instead of multisets
results basically also hold for constrained variant
data reduction (O(2^k · k^2)-bit kernel)
future work
find polynomial kernel (or prove nonexistence)
distance from triviality measures
find 2^{O(k)}-time algorithm
Thank you