Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems
Algorithmica (2003), Jens Gramm, Rolf Niedermeier, Peter Rossmanith
Outline
Introduction
Preliminaries
Linear-Time Solution for Constant d
Related Problems
Linear-Time Solution for Fixed k
Conclusion
Introduction: Problem Definition
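Input: Strings $s_1, s_2, \dots, s_k$ over an alphabet $\Sigma$, each of length $L$, and a nonnegative integer $d$.
Question: Is there a string $s$ of length $L$ such that $d_H(s, s_i) \le d$ for all $i = 1, \dots, k$?
Here $d_H$ is the Hamming distance: for strings with $|s_1| = |s_2|$, $d_H(s_1, s_2) = |\{i \mid s_1[i] \ne s_2[i]\}|$.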
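To make the definition concrete, here is a minimal Python sketch of the Hamming distance and of the decision question, answered by brute force over all $|\Sigma|^L$ candidate strings. It is only an illustration of the problem statement, not a method from the paper; the function names are hypothetical and the alphabet is passed in explicitly as an assumption.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """d_H(a, b) = |{i : a[i] != b[i]}| for strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_string_brute_force(strings, d, alphabet):
    """Return some s with d_H(s, s_i) <= d for all i, or None.

    Tries all |alphabet|**L candidates, so it is usable on tiny inputs only.
    """
    L = len(strings[0])
    for chars in product(alphabet, repeat=L):
        s = "".join(chars)
        if all(hamming(s, t) <= d for t in strings):
            return s
    return None


S = ["abcd", "aadb", "bcda", "accc"]
print(closest_string_brute_force(S, 2, "abcd"))  # some string within distance 2 of all four
print(closest_string_brute_force(S, 1, "abcd"))  # None
```

The exponential running time of this baseline is what the parameterized algorithms in this paper avoid.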
NP-completeness
CLOSEST STRING is NP-complete.
In biological applications, d is usually small.
Result in this paper: an $O(kL + kd \cdot d^d)$ time algorithm.
A PTAS was given by Li et al.
Extended problems
d-MISMATCH
DISTINGUISHING STRING SELECTION
DISTINGUISHING SUBSTRING SELECTION
Preliminaries
Given a set of strings $S = \{s_1, \dots, s_k\}$, each of length $L$:
$s$ is an optimal center string iff there is no $s'$ such that $\max_{i=1,\dots,k} d_H(s', s_i) < \max_{i=1,\dots,k} d_H(s, s_i)$.
$s$ is an optimal median string iff there is no $s'$ such that $\sum_{i=1}^{k} d_H(s', s_i) < \sum_{i=1}^{k} d_H(s, s_i)$.
Given a set of k strings of length L, think of these strings as a $k \times L$ matrix.
s1 a b c d
s2 a a d b
s3 b c d a
s4 a c c c
Optimal median string: a c c a
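The median can be computed column by column, because the sum of Hamming distances splits into independent per-column terms. The Python sketch below takes a most frequent symbol in each column; the name `optimal_median` is hypothetical, and ties are broken by first appearance (an arbitrary choice).

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def optimal_median(strings):
    """Column-wise majority vote.

    sum_i d_H(s, s_i) is a sum of independent per-column mismatch counts, so
    choosing a most frequent symbol in every column minimizes it.
    Counter.most_common breaks ties by first appearance in the column.
    """
    L = len(strings[0])
    return "".join(
        Counter(s[p] for s in strings).most_common(1)[0][0] for p in range(L)
    )


S = ["abcd", "aadb", "bcda", "accc"]
m = optimal_median(S)
print(m, sum(hamming(m, s) for s in S))  # accd 8
```

On this matrix the sketch returns accd rather than a c c a. Columns 3 and 4 contain ties, and both strings are optimal medians with total distance 8. No such column-wise shortcut exists for the center string, whose max objective couples the columns.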
Main idea
Bounded search tree
Fixed-parameter tractability
Reduction to a problem kernel
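LEMMA 1. Given a set of strings $S = \{s_1, \dots, s_k\}$, each of length $L$, and a permutation $\sigma: \{1, \dots, L\} \to \{1, \dots, L\}$. Then $s$ is an optimal center string for $\{s_1, \dots, s_k\}$ iff $\sigma(s)$ is an optimal center string for $\{\sigma(s_1), \sigma(s_2), \dots, \sigma(s_k)\}$, where $\sigma(s)$ denotes $s$ with its positions permuted by $\sigma$.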
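LEMMA 2. To compute an optimal center string, it is sufficient to solve a normalized and reordered instance. From its solution, the solution of the original instance can be derived in linear time.
Original instance:
s1 a b c d
s2 a a d b
s3 b c d a
s4 a c c c
Normalized (in each column, symbols are renamed a, b, c, ... in order of decreasing frequency):
s1 a b a a
s2 a c b b
s3 b a b c
s4 a a a d
Normalized and reordered (columns permuted, cf. Lemma 1):
s1 b a a a
s2 c a b b
s3 a b b c
s4 a a a d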
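The normalization step in the matrices above can be sketched in Python as follows. The sketch assumes that symbols are renamed by decreasing frequency within each column and that ties go to the symbol appearing first. This tie rule is an assumption that happens to reproduce the normalized matrix, and the column reordering is not implemented. `normalize` and `denormalize` are hypothetical names. The per-column inverse maps show how a solution is translated back in linear time.

```python
from collections import Counter
from string import ascii_lowercase


def normalize(strings):
    """Rename symbols per column: most frequent -> 'a', next -> 'b', ...

    Ties are broken by first appearance in the column (assumption).
    Returns the normalized strings and, per column, the inverse renaming.
    """
    k, L = len(strings), len(strings[0])
    if k > len(ascii_lowercase):
        raise ValueError("this sketch names symbols a-z, so it needs k <= 26")
    columns, inverse = [], []
    for p in range(L):
        col = [s[p] for s in strings]
        counts = Counter(col)
        first = {}
        for idx, ch in enumerate(col):
            first.setdefault(ch, idx)
        order = sorted(counts, key=lambda ch: (-counts[ch], first[ch]))
        rename = {ch: ascii_lowercase[r] for r, ch in enumerate(order)}
        columns.append([rename[ch] for ch in col])
        inverse.append({new: old for old, new in rename.items()})
    normalized = ["".join(col[i] for col in columns) for i in range(k)]
    return normalized, inverse


def denormalize(center, inverse):
    """Map a center of the normalized instance back to the original alphabet.

    A symbol absent from a column mismatches every row there, so replacing it
    by the column's most frequent symbol ('a') never increases any distance.
    """
    return "".join(inv.get(ch, inv["a"]) for ch, inv in zip(center, inverse))


normalized, inverse = normalize(["abcd", "aadb", "bcda", "accc"])
print(normalized)                    # ['abaa', 'acbb', 'babc', 'aaad']
print(denormalize("aaaa", inverse))  # accd
```

Because each column is renamed by a bijection on its symbols, Hamming distances to a mapped-back center are unchanged. This is also why at most k symbols per column are needed (Lemma 3).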
LEMMA 3. A CLOSEST STRING instance with an arbitrary alphabet $\Sigma$, $|\Sigma| > k$, is isomorphic to a CLOSEST STRING instance with an alphabet $\Sigma'$, $|\Sigma'| = k$. Proof: by normalization.
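LEMMA 4. Given a CLOSEST STRING instance $s_1, \dots, s_k$ of length $L$ and $d$. If the resulting $k \times L$ matrix has more than $kd$ dirty columns, then there is no string $s$ with $\max_{i=1,\dots,k} d_H(s, s_i) \le d$.
A column is dirty iff it contains at least two different symbols from alphabet $\Sigma$.
Proof: by the pigeonhole principle. In every dirty column, $s$ differs from at least one $s_i$, so $\sum_{i=1}^{k} d_H(s, s_i) > kd$, and hence $d_H(s, s_i) > d$ for some $i$.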
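A small Python sketch of the kernel reduction implied by Lemma 4 follows. It finds the dirty columns in $O(kL)$ time and rejects the instance if there are more than $kd$ of them. Otherwise it keeps only those columns, since clean columns can copy their common symbol into the center at zero cost. The function names are hypothetical.

```python
def dirty_columns(strings):
    """Indices of columns containing at least two different symbols (O(kL))."""
    L = len(strings[0])
    return [p for p in range(L) if any(s[p] != strings[0][p] for s in strings)]


def reduce_to_kernel(strings, d):
    """Lemma 4 test plus kernel.

    Returns None if there are more than k*d dirty columns (no solution exists);
    otherwise returns (dirty column indices, strings restricted to them).
    A center for the reduced strings, with the common symbols filled into the
    clean columns, is a center for the original strings with the same distances.
    """
    k = len(strings)
    dirty = dirty_columns(strings)
    if len(dirty) > k * d:
        return None
    return dirty, ["".join(s[p] for p in dirty) for s in strings]


print(reduce_to_kernel(["aaxa", "abxa", "acxb"], 0))  # None (2 dirty columns > k*d = 0)
print(reduce_to_kernel(["aaxa", "abxa", "acxb"], 1))  # ([1, 3], ['aa', 'ba', 'cb'])
```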
A Linear-Time Solution for Constant d
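Bounded search tree algorithm
LEMMA 5. Given a set of strings $S = \{s_1, \dots, s_k\}$ and a positive integer $d$. If there are $i, j \in \{1, \dots, k\}$ with $d_H(s_i, s_j) > 2d$, then there is no string $s$ with $\max_{i=1,\dots,k} d_H(s, s_i) \le d$.
(This follows from the triangle inequality $d_H(s_i, s_j) \le d_H(s_i, s) + d_H(s, s_j)$.)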
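Lemma 5 gives a quick necessary condition that can be checked before searching. The Python sketch below tests all pairs in $O(k^2 L)$ time, and the name `violates_lemma5` is hypothetical. A False result does not guarantee that a solution exists.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def violates_lemma5(strings, d):
    """True if some pair is more than 2d apart, which rules out any center
    within distance d (by d_H(s_i, s_j) <= d_H(s_i, s) + d_H(s, s_j))."""
    return any(hamming(a, b) > 2 * d for a, b in combinations(strings, 2))


S = ["abcd", "aadb", "bcda", "accc"]
print(violates_lemma5(S, 1))  # True: d_H(abcd, bcda) = 4 > 2
print(violates_lemma5(S, 2))  # False: no pair is more than 4 apart
```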
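Theorem 1. Given a set of strings $S = \{s_1, \dots, s_k\}$ and $d$, Algorithm D determines in $O(kL + kd \cdot d^d)$ time whether there is a string $s$ with $\max_{i=1,\dots,k} d_H(s, s_i) \le d$.
By Lemma 4, the input instance is reduced to $O(kd)$ columns in $O(kL)$ time.
The search tree has depth $d$ and branching factor $d + 1$, hence $O((d+1)^d) = O(d^d)$ nodes. Each node (steps D0 to D3) takes $O(kd)$ time by building a table containing the distances of the candidate $s_1$ to all other given strings.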
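Correctness
We show only the correctness of the first step. Suppose $s_1$ is not a solution but there exists a center string $s$. Choose $s_i$ with $d_H(s_1, s_i) > d$ and a set $P \subseteq \{p \mid s_1[p] \ne s_i[p]\}$ with $|P| = d + 1$.
Let $P_{s_1 \ne s = s_i} := \{p \mid s_1[p] \ne s[p] = s_i[p]\}$. Goal: $P$ contains a position of $P_{s_1 \ne s = s_i}$.
Every $p \in P$ lies in exactly one of $P_{s_1 \ne s = s_i}$ and $P_{s \ne s_i} := \{p \mid s[p] \ne s_i[p]\}$ (disjoint). Since $|P_{s \ne s_i}| \le d < |P|$, some position of $P$ lies in $P_{s_1 \ne s = s_i}$, and setting $s_1$ to $s_i$ there brings the candidate one step closer to $s$.
So $d + 1$ subcases are sufficient.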
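The search tree described above can be sketched recursively in Python. This is our reading of the slides, not a transcription of the paper's pseudocode: the comments loosely mirror steps D0 to D3, and the names are hypothetical. The sketch skips the Lemma 4 kernel and recomputes all distances at each node instead of maintaining the $O(kd)$ table, so it does not meet the bound of Theorem 1.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, d):
    """Bounded search tree: return s with max_i d_H(s, s_i) <= d, or None."""

    def search(cand, budget):
        # budget: how many more positions of cand may still be changed
        if budget < 0:
            return None
        dists = [hamming(cand, s) for s in strings]
        # Prune: a center within d of every s_i would need > budget changes
        if any(x > d + budget for x in dists):
            return None
        far = next((i for i, x in enumerate(dists) if x > d), None)
        if far is None:
            return cand  # every string is within distance d
        target = strings[far]
        diff = [p for p in range(len(cand)) if cand[p] != target[p]]
        # If a center exists, it agrees with target on one of any d + 1 of these
        for p in diff[: d + 1]:
            found = search(cand[:p] + target[p] + cand[p + 1:], budget - 1)
            if found is not None:
                return found
        return None

    return search(strings[0], d)


S = ["abcd", "aadb", "bcda", "accc"]
print(closest_string(S, 2))  # acdd
print(closest_string(S, 1))  # None
```

Each level branches into at most $d + 1$ subcases and the budget shrinks by one per level, which gives the $(d+1)^d$ bound on the tree size.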
Related Problems: d-MISMATCH
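Let $s_{i,p,L}$ denote the length-$L$ substring of a given string $s_i$ starting at position $p$.
Question: Given strings $s_1, \dots, s_k$ of length $n$, is there a string $s$ of length $L$ and a position $p$ with $1 \le p \le n - L + 1$ such that $d_H(s, s_{i,p,L}) \le d$ for all $i$?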
Stojanovic et al. gave a linear-time algorithm for 1-MISMATCH.
Theorem 2. d-MISMATCH is solvable in $O(kL + (n - L) \cdot kd \cdot d^d)$ time, which is $O(nk)$ for fixed $d$.
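Naively, running Algorithm D at every position takes $O(n \cdot (kL + kd \cdot d^d))$ time.
Instead, maintain a queue of the dirty columns: considering only the first $L$ columns, build a FIFO queue of dirty columns in $O(kL)$ time, then update it in $O(k)$ time each time the window moves by one position.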
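The sliding dirty-column queue can be sketched in Python with a `collections.deque`. The column leaving the window is always the oldest one, so it can only be at the front of the queue. For each window that passes the Lemma 4 test, the sketch runs a bounded search on the dirty columns. The names are hypothetical, positions are 0-based rather than 1-based, and the search recomputes distances, so this illustrates the idea without claiming the exact bound of Theorem 2.

```python
from collections import deque


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def bounded_search(strings, d, cand, budget):
    """Bounded search tree on a small instance (same idea as Algorithm D)."""
    if budget < 0:
        return None
    dists = [hamming(cand, s) for s in strings]
    if any(x > d + budget for x in dists):
        return None
    far = next((i for i, x in enumerate(dists) if x > d), None)
    if far is None:
        return cand
    target = strings[far]
    diff = [p for p in range(len(cand)) if cand[p] != target[p]]
    for p in diff[: d + 1]:
        found = bounded_search(strings, d, cand[:p] + target[p] + cand[p + 1:], budget - 1)
        if found is not None:
            return found
    return None


def d_mismatch(strings, L, d):
    """Return (p, s) with 0-based window start p such that
    d_H(s, s_i[p:p+L]) <= d for all i, or None."""
    k, n = len(strings), len(strings[0])
    if L > n:
        return None

    def is_dirty(c):  # O(k)
        return any(s[c] != strings[0][c] for s in strings)

    dirty = deque(c for c in range(L) if is_dirty(c))  # O(kL) for the first window
    for p in range(n - L + 1):
        if p > 0:
            # Slide the window: column p - 1 leaves, column p + L - 1 enters
            if dirty and dirty[0] == p - 1:
                dirty.popleft()
            if is_dirty(p + L - 1):
                dirty.append(p + L - 1)
        if len(dirty) > k * d:
            continue  # Lemma 4: no center for this window
        cols = list(dirty)
        reduced = ["".join(s[c] for c in cols) for s in strings]
        center = bounded_search(reduced, d, reduced[0], d)
        if center is not None:
            window = list(strings[0][p:p + L])  # clean columns keep the common symbol
            for j, c in enumerate(cols):
                window[c - p] = center[j]
            return p, "".join(window)
    return None


print(d_mismatch(["abcde", "abxde", "qbcdz"], 3, 1))  # (0, 'abc')
print(d_mismatch(["abcde", "abxde", "qbcdz"], 3, 0))  # None
```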
DSS: DISTINGUISHING STRING SELECTION
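Given $S = \{s_1, \dots, s_{k_1}\}$ and $S' = \{s'_1, \dots, s'_{k_2}\}$, all of the same length $L$, and $d_1, d_2 \ge 0$, is there a string $s$ such that
$\max_{i=1,\dots,k_1} d_H(s, s_i) \le d_1$ and $\min_{j=1,\dots,k_2} d_H(s, s'_j) \ge L - d_2$?
LEMMA 6. Given two sets of strings $S = \{s_1, \dots, s_{k_1}\}$ and $S' = \{s'_1, \dots, s'_{k_2}\}$ and positive $d_1, d_2$. If there are $i \in \{1, \dots, k_1\}$ and $j \in \{1, \dots, k_2\}$ with $d_H(s_i, s'_j) < L - (d_1 + d_2)$, then there is no string $s$ satisfying both $\max_{i=1,\dots,k_1} d_H(s, s_i) \le d_1$ and $\min_{j=1,\dots,k_2} d_H(s, s'_j) \ge L - d_2$.
Proof: by the triangle inequality, $d_H(s, s'_j) \le d_H(s, s_i) + d_H(s_i, s'_j) < d_1 + L - (d_1 + d_2) = L - d_2$.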
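A Python sketch of the Lemma 6 test follows, next to an exhaustive DSS check for tiny inputs so the two can be compared. The function names are hypothetical and the alphabet is passed in explicitly. Like Lemma 5, the test is only a necessary condition for a solution to exist.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def lemma6_rules_out(S, S_prime, d1, d2):
    """True if some pair (s_i, s'_j) has d_H < L - (d1 + d2): no DSS solution."""
    L = len(S[0])
    return any(hamming(a, b) < L - (d1 + d2) for a in S for b in S_prime)


def dss_brute_force(S, S_prime, d1, d2, alphabet):
    """Exhaustive DSS check over all |alphabet|**L strings (tiny inputs only)."""
    L = len(S[0])
    for chars in product(alphabet, repeat=L):
        s = "".join(chars)
        if all(hamming(s, a) <= d1 for a in S) and all(
            hamming(s, b) >= L - d2 for b in S_prime
        ):
            return s
    return None


print(lemma6_rules_out(["aaaa"], ["aaab"], 1, 1), dss_brute_force(["aaaa"], ["aaab"], 1, 1, "ab"))
# True None
print(lemma6_rules_out(["aaaa"], ["bbbb"], 1, 1), dss_brute_force(["aaaa"], ["bbbb"], 1, 1, "ab"))
# False aaaa
```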
A Linear-Time Solution for Fixed k
Is CLOSEST STRING fixed-parameter tractable with respect to k?
Use integer linear programming (ILP).
Lenstra: an ILP with a fixed number of variables can be solved in linear time (with exponential space).
CLOSEST STRING as an ILP: column types
For k = 3, the column types are $(a,a,a)^T, (a,a,b)^T, (a,b,a)^T, (b,a,a)^T, (a,b,c)^T$.
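The column types can be enumerated in Python as set partitions of the k rows, each written in normalized form. The sketch below uses restricted growth strings for the partitions and assumes the frequency-based naming of the slides, which gives (b,a,a) rather than (a,b,b). Its tie-breaking rule is an assumption, and the function names are hypothetical. For k = 3 it reproduces the five types listed above in sorted order.

```python
from collections import Counter
from string import ascii_lowercase


def normalize_column(col):
    """Rename symbols by decreasing frequency (ties: first appearance) -> a, b, c, ..."""
    counts = Counter(col)
    first = {}
    for idx, ch in enumerate(col):
        first.setdefault(ch, idx)
    order = sorted(counts, key=lambda ch: (-counts[ch], first[ch]))
    rename = {ch: ascii_lowercase[r] for r, ch in enumerate(order)}
    return tuple(rename[ch] for ch in col)


def set_partitions(k):
    """All partitions of k rows into blocks, as restricted growth strings."""
    def extend(prefix, blocks):
        if len(prefix) == k:
            yield tuple(prefix)
            return
        for v in range(blocks + 1):  # join an existing block or open a new one
            yield from extend(prefix + [v], max(blocks, v + 1))
    yield from extend([], 0)


def column_types(k):
    """Normalized column types for k strings; there are B(k) of them."""
    return sorted({normalize_column(p) for p in set_partitions(k)})


print(["".join(t) for t in column_types(3)])  # ['aaa', 'aab', 'aba', 'abc', 'baa']
print(len(column_types(4)))                   # 15
```

The normalized form fixes the naming of each partition's blocks, so the count is exactly the Bell number, for example B(4) = 15.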
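The number of column types is the Bell number $B(k) \le k!$.
Variables $x_{t,\phi}$ for each column type $t$ and $\phi \in \Sigma$: $x_{t,\phi}$ is the number of columns of type $t$ whose corresponding character in the desired solution string of CLOSEST STRING is set to $\phi$.
Since $|\Sigma| = k$ after normalization, $B(k) \cdot k$ variables are needed.
Let $\phi_{t,i}$ denote the alphabet symbol at the $i$th entry of column type $t$, and let $\#t$ be the number of columns of type $t$ in the instance. Minimize
$\max_{i=1,\dots,k} \sum_{t} \sum_{\phi \in \Sigma \setminus \{\phi_{t,i}\}} x_{t,\phi}$
subject to $\sum_{\phi \in \Sigma} x_{t,\phi} = \#t$ for every column type $t$, with all $x_{t,\phi} \ge 0$ integer.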
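The following Python sketch does not solve the ILP, which the approach delegates to Lenstra's algorithm. It only evaluates the objective for the assignment $x_{t,\phi}$ induced by a given candidate string, to show that the objective equals the candidate's maximum Hamming distance. The function names are hypothetical. The sentinel "*" is an assumption that stands for a symbol absent from a column.

```python
from collections import Counter
from string import ascii_lowercase


def normalize_column(col):
    """Return (column type t, renaming) with symbols named by decreasing frequency."""
    counts = Counter(col)
    first = {}
    for idx, ch in enumerate(col):
        first.setdefault(ch, idx)
    order = sorted(counts, key=lambda ch: (-counts[ch], first[ch]))
    rename = {ch: ascii_lowercase[r] for r, ch in enumerate(order)}
    return tuple(rename[ch] for ch in col), rename


def variables_from_center(strings, center):
    """x[(t, phi)] = number of columns of type t whose center symbol is phi."""
    x = Counter()
    for p, ch in enumerate(center):
        t, rename = normalize_column([s[p] for s in strings])
        x[(t, rename.get(ch, "*"))] += 1  # "*": a symbol absent from this column
    return x


def objective(x, k):
    """max_i sum_t sum_{phi != phi_{t,i}} x_{t,phi}, where phi_{t,i} = t[i]."""
    return max(sum(n for (t, phi), n in x.items() if phi != t[i]) for i in range(k))


S = ["abcd", "aadb", "bcda", "accc"]
x = variables_from_center(S, "acdd")
print(objective(x, len(S)))  # 2, the maximum Hamming distance from acdd to S
```

A column contributes to row i exactly when the center symbol differs from row i's symbol there, so for every candidate string the objective equals its maximum Hamming distance. The constraint $\sum_{\phi} x_{t,\phi} = \#t$ holds by construction, because every column is counted once.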
Conclusion
CLOSEST STRING is fixed-parameter tractable with respect to d and with respect to k.
Improves previous work on d-MISMATCH.
Open questions: DSS? CLOSEST SUBSTRING?