efficient data structures and a new randomized approach for sorting signed permutations by reversals...
Post on 22-Dec-2015
218 views
TRANSCRIPT
Efficient Data Structures and a New Randomized Approach
for Sorting Signed Permutations by Reversals
Haim Kaplan and Elad Verbin
Unsigned Sorting by reversals
Given a permutation, find a shortest sequence of reversals
that transforms it to
(1 2 … n)
(3 4 2 5 1) 3 4 2 5 1)(
(3 4 -2 -5 1) 3 4 2 5 1)( - --
Signed Sorting by reversals
Given a signed permutation, find a shortest sequence of reversals
that transforms it to
(+1 +2 … +n)
Motivation
• Biology:
computing large-scale
evolutionary distance
(most parsimonious scenario)
• Heuristics for TSP
History of Unsigned SBR
• 1993: Conjectured to be NP-Hard and 2-approx. algorithm, Kececioglu and Sankoff
• 1997: Proven NP-Hard, Caprara
• 1999: Proven MAX-SNP Hard, Berman and Karpinski
• 2001: 1.375-approximation by Berman, Hannenhalli and Karpinski
History of Signed SBR
• 1993: Conjectured NP-Hard, Kececioglu and Sankoff
• 1995: Polynomial Algorithm, Pevzner and Hannenhalli
• 1996: O(n2(n)) Algorithm, Berman and Hannenhalli
• 1997: Simpler O(n2) Algorithm, Kaplan, Shamir and Tarjan - Best to date.
• 2001: Very Simple cubic solution, Bergeron
Variants
• Computing just the distance (signed version) – linear (Bader et. al., 2001)
• Many, many, more
The Breakpoint Graph
+3 +4 -2 -5 +1
x -x
xa,xb xb,xa
0 3a 3b 4a 4b 2b 2a 5b 5a 1a 1b 6
• Transform Permutation
• Blue Edges Between Adjacent Vertices
• Red Edges Between Consecutive elements
Calculating the distance
dist=n+1-c+h+f
dist=n+1-c
Goal: Find a reversal that creates a cycle and keeps h+f=0 (a safe reversal)
Eliminated in linear time
h,f are small parameters of the breakpoint graph (usually h+f=0)
Note: single decomposition to alternating cycles
0 3a 3b 4a 4b 2b 2a 5b 5a 1a 1b 6
Red edges can be :Right-to-Right
Left-to-LeftOriented{
Left-to-Right
Right-to-LeftUnoriented{
Def:This reversal acts on the red edge
Oriented EdgesWe consider only reversals that reverse between the endpoints of a red edge
Red edges can be :
Oriented Edges
c=1Oriented
c=0Unoriented
Thm: A reversal acting on a red edge creates a new cycle the edge is oriented
Safe reversals
• Safe oriented reversal: does not increase h+f
• Theorem(H-P): a permutation with h+f=0 always has a safe oriented reversal (or is id)
• SBR algorithms iteratively find a safe reversal. KST does this in linear time (total running time – O(n2))
• H-P characterized safe oriented reversals using the overlap graph
Prop: if we ended up at id, then the sequence was a minimal sorting sequence, otherwise we can retry
RandomWalk
Each path is <n+2 steps
Whoops..
NO CHILDREN
H-P: all nodes with no children are either id or red
Prop: oriented reversals never decrease h+f (i.e. red points only to red)
Yey!
id
Failed but don’t know it
yet
h+f>0
=(1 -4 2 -3)
’s oriented reversals
After how many trials will RandomWalk succeed?
•Worst permutation – many
•Average permutation – very fewpi7=(2 4 6 -1 -3 -5 7)
Probability of success
22
1n
50% chance of failure
50% chance of failure
50% chance of failure
(permutation of Michal Ozery and Ron Shamir)
What is the average over all permutations of the
expected number of trials until success?
•Theoretically – we don’t know (but we do know that red permutations are polynomially rare)
•Experimentally – 1.6, regardless of n
Behavior of RandomWalk on the average
Empirical testing of RandomWalk on random permutations (selected uniformly without unoriented components)
n Number of permutations tested
% success at first trial Average number of trials until success
10 300M 66.20% 1.642
20 100M 63.86% 1.607
50 30M 63.01% 1.594
100 10M 62.86% 1.593
200 3M 62.76% 1.594
500 600K 62.69% 1.596
1000 200K 62.62% 1.596
2000 50K 63.35% 1.586
5000 6000 63.7% 1.58
10000 2000 63% 1.59
Basic structure
Our DS is based on a simple data structures ofFredman, Johnson, McGeoch & Hostheimer `95 (which are based on those by Chrobak, Szymacha & Krawczyk, `90). These structures were invented for implementing TSP heuristics
These data structures allow us to maintainpermutation under:• Reversals• Queries: ,All operations taking logarithmic time.
i 1i
Random Walk
• Repeatedly:
• Select a random oriented reversal
• Perform it and update the permutation
How fast can we run one RandomWalk iteration?
Repeatedly:• Select a random oriented reversal• Perform it and update the permutation
• Can be done in O(n) time• In the paper we show how to do it in so one Random Walk
takes time )log( nnO
)log( 2/3 nnO
Further Questions
• Why does RandomWalk work (so well)?
• Are there variants of RandomWalk that work better (i.e. have good worst-case behavior – no bad cases)?
• Can RandomWalk be easily and efficiently implemented?
Further Questions
• Are there variants to RandomWalk that can be easily and efficiently implemented but maintain good average-case behavior (i.e. by waving the demand for a uniform selection of an oriented reversal)?
• Can we maintain safety or hurdelity too?
General Further Research
• Is there a subquadratic algorithm for SBR?
• What is the structure of the space of all sorting reversal sequences? (i.e. the permutation graph we saw before)
Fredman et. al.’s DS
• Splay tree with the permutation in the leaves – inorder scan gives the permutation.
• Reverse bits at the nodes. If on means that the order of the subtree should be reverse, as should the signs of the elements.
i j
Reverse(i,j):
splay(j)
Splay operations should keep the invariant that the tree indeed represents the perumutation