efficient data structures and a new randomized approach for sorting signed permutations by reversals...

Efficient Data Structures and a New Randomized Approach

for Sorting Signed Permutations by Reversals

Haim Kaplan and Elad Verbin

Unsigned Sorting by reversals

Given a permutation, find a shortest sequence of reversals

that transforms it to

(1 2 … n)

(3 4 2 5 1) 3 4 2 5 1)(

(3 4 -2 -5 1) 3 4 2 5 1)( - --

Signed Sorting by reversals

Given a signed permutation, find a shortest sequence of reversals

that transforms it to

(+1 +2 … +n)

Example

• (3 4 -2 -5 1)

• (-4 -3 -2 -5 1)

• (-4 -3 -2 -1 5)

• ( 1 2 3 4 5)

Motivation

• Biology:

computing large-scale

evolutionary distance

(most parsimonious scenario)

• Heuristics for TSP

History of Unsigned SBR

• 1993: Conjectured to be NP-Hard and 2-approx. algorithm, Kececioglu and Sankoff

• 1997: Proven NP-Hard, Caprara

• 1999: Proven MAX-SNP Hard, Berman and Karpinski

• 2001: 1.375-approximation by Berman, Hannenhalli and Karpinski

History of Signed SBR

• 1993: Conjectured NP-Hard, Kececioglu and Sankoff

• 1995: Polynomial Algorithm, Pevzner and Hannenhalli

• 1996: O(n2(n)) Algorithm, Berman and Hannenhalli

• 1997: Simpler O(n2) Algorithm, Kaplan, Shamir and Tarjan - Best to date.

• 2001: Very Simple cubic solution, Bergeron

Variants

• Computing just the distance (signed version) – linear (Bader et. al., 2001)

• Many, many, more

Sorting Signed

Permutations

The Breakpoint Graph

+3 +4 -2 -5 +1

x -x

xa,xb xb,xa

0 3a 3b 4a 4b 2b 2a 5b 5a 1a 1b 6

• Transform Permutation

• Blue Edges Between Adjacent Vertices

• Red Edges Between Consecutive elements

Calculating the distance

dist=n+1-c+h+f

dist=n+1-c

Goal: Find a reversal that creates a cycle and keeps h+f=0 (a safe reversal)

Eliminated in linear time

h,f are small parameters of the breakpoint graph (usually h+f=0)

Note: single decomposition to alternating cycles

0 3a 3b 4a 4b 2b 2a 5b 5a 1a 1b 6

Red edges can be :Right-to-Right

Left-to-LeftOriented{

Left-to-Right

Right-to-LeftUnoriented{

Def:This reversal acts on the red edge

Oriented EdgesWe consider only reversals that reverse between the endpoints of a red edge

Red edges can be :

Oriented Edges

c=1Oriented

c=0Unoriented

Thm: A reversal acting on a red edge creates a new cycle the edge is oriented

Safe reversals

• Safe oriented reversal: does not increase h+f

• Theorem(H-P): a permutation with h+f=0 always has a safe oriented reversal (or is id)

• SBR algorithms iteratively find a safe reversal. KST does this in linear time (total running time – O(n2))

• H-P characterized safe oriented reversals using the overlap graph

What happens if we disregard

safety?

perms with h+f>0 are rare

Prop: if we ended up at id, then the sequence was a minimal sorting sequence, otherwise we can retry

RandomWalk

Each path is <n+2 steps

Whoops..

NO CHILDREN

H-P: all nodes with no children are either id or red

Prop: oriented reversals never decrease h+f (i.e. red points only to red)

Yey!

id

Failed but don’t know it

yet

h+f>0

=(1 -4 2 -3)

’s oriented reversals

After how many trials will RandomWalk succeed?

•Worst permutation – many

•Average permutation – very fewpi7=(2 4 6 -1 -3 -5 7)

Probability of success

22

1n

50% chance of failure



(permutation of Michal Ozery and Ron Shamir)

What is the average over all permutations of the

expected number of trials until success?

•Theoretically – we don’t know (but we do know that red permutations are polynomially rare)

•Experimentally – 1.6, regardless of n

Behavior of RandomWalk on the average

Empirical testing of RandomWalk on random permutations (selected uniformly without unoriented components)

n Number of permutations tested

% success at first trial Average number of trials until success

10 300M 66.20% 1.642

20 100M 63.86% 1.607

50 30M 63.01% 1.594

100 10M 62.86% 1.593

200 3M 62.76% 1.594

500 600K 62.69% 1.596

1000 200K 62.62% 1.596

2000 50K 63.35% 1.586

5000 6000 63.7% 1.58

10000 2000 63% 1.59

Implementing the Random

Walk

Basic structure

Our DS is based on a simple data structures ofFredman, Johnson, McGeoch & Hostheimer `95 (which are based on those by Chrobak, Szymacha & Krawczyk, `90). These structures were invented for implementing TSP heuristics

These data structures allow us to maintainpermutation under:• Reversals• Queries: ,All operations taking logarithmic time.

i 1i

Random Walk

• Repeatedly:

• Select a random oriented reversal

• Perform it and update the permutation

How fast can we run one RandomWalk iteration?

Repeatedly:• Select a random oriented reversal• Perform it and update the permutation

• Can be done in O(n) time• In the paper we show how to do it in so one Random Walk

takes time )log( nnO

)log( 2/3 nnO

Further Questions

• Why does RandomWalk work (so well)?

• Are there variants of RandomWalk that work better (i.e. have good worst-case behavior – no bad cases)?

• Can RandomWalk be easily and efficiently implemented?

Further Questions

• Are there variants to RandomWalk that can be easily and efficiently implemented but maintain good average-case behavior (i.e. by waving the demand for a uniform selection of an oriented reversal)?

• Can we maintain safety or hurdelity too?

General Further Research

• Is there a subquadratic algorithm for SBR?

• What is the structure of the space of all sorting reversal sequences? (i.e. the permutation graph we saw before)

Fredman et. al.’s DS

• Splay tree with the permutation in the leaves – inorder scan gives the permutation.

• Reverse bits at the nodes. If on means that the order of the subtree should be reverse, as should the signs of the elements.

i j

Reverse(i,j):

splay(j)

Splay operations should keep the invariant that the tree indeed represents the perumutation

j

Reverse(i,j):

splay(i)

i

i

Reverse(i,j):

jT1

T3T2

-j

-iT1

T3T2

R

i

kT1

T3

T2

T4

j

-j

-kT1

T2R

T3R

T4

-i

efficient data structures and a new randomized approach for sorting signed permutations by reversals...

Documents