
Page 1: Sorting with Forbidden Intermediates

Sorting with Forbidden Intermediates

Carlo Comin, Anthony Labarre, Romeo Rizzi, Stéphane Vialette

University of Trento (Trento, Italy), UPEM (Champs-sur-Marne, Paris, France), University of Verona (Verona, Italy)

3rd Int. Conf. on Algorithms for Computational Biology, Trujillo, Spain

June 21-22, 2016

Sorting with Forbidden Intermediates

C. Comin, A. Labarre, R. Rizzi, S. Vialette 1/17

Page 2: Sorting with Forbidden Intermediates

Introduction

Genome Rearrangements on Permutations [ref. Combinatorics of Genome Rearrangements, G. Fertin, A. Labarre, I. Rusu, E. Tannier, S. Vialette, 2009]

Most genome rearrangement problems on permutations can be recast as constrained sorting problems, where the goal is to compute a shortest sorting sequence of operations for a given permutation, under the restriction that the set of allowed operations is fixed beforehand. The corresponding sequences of permutations are useful, for instance, when reconstructing potential scenarios of evolution between species.


[fig. ref. Comparative primate genomics: emerging patterns of genome content and dynamics, Jeffrey Rogers, Richard A. Gibbs, Nature Reviews Genetics, 15, 347–359, (2014)]

Page 3: Sorting with Forbidden Intermediates

Permutation Sorting Problem
Given a permutation π and a set S of allowed operations, find a sequence of elements from S that sorts π and is as short as possible.

- e.g., $\pi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 1 & 2 \end{pmatrix}$ and S = exchanges (algebraic transpositions).
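For exchanges, the length of an optimal sorting sequence has a classical closed form: a permutation of n elements with c cycles needs exactly n − c exchanges, because every exchange changes the number of cycles by exactly one. A minimal Python sketch, assuming one-line notation (so the π above is written (4, 3, 1, 2)); the function names are illustrative only:

```python
def cycles(perm):
    """Cycle decomposition of a permutation in one-line notation on {1, ..., n}."""
    n = len(perm)
    seen = [False] * (n + 1)
    result = []
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycle = []
        i = start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = perm[i - 1]          # follow i -> pi(i)
        result.append(tuple(cycle))
    return result


def exchange_distance(perm):
    """Minimum number of exchanges needed to sort perm: n minus its number of cycles."""
    return len(perm) - len(cycles(perm))


def sort_by_exchanges(perm):
    """Greedy optimal sort: each exchange puts one value in place and splits one cycle.

    Returns the exchanges (1-based position pairs) and every permutation visited.
    """
    p = list(perm)
    exchanges, visited = [], [tuple(p)]
    for i in range(len(p)):
        while p[i] != i + 1:
            j = p[i] - 1             # position where the value p[i] belongs
            p[i], p[j] = p[j], p[i]
            exchanges.append((i + 1, j + 1))
            visited.append(tuple(p))
    return exchanges, visited


pi = (4, 3, 1, 2)                    # the example above, in one-line notation
print(cycles(pi))                    # [(1, 4, 2, 3)]
print(exchange_distance(pi))         # 3
print(sort_by_exchanges(pi)[0])      # [(1, 4), (1, 2), (1, 3)]
```

Each exchange moves the value found at position i to its final place, which always splits the cycle containing it, so the greedy loop attains the n − c bound. The `visited` list is exactly the sequence of intermediate permutations that the forbidden-intermediates constraint will restrict.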


[fig. ref. Odd-Even Transposition Sort & Shear Sort]

Page 4: Sorting with Forbidden Intermediates

However...
Biologists have known for more than a century (Cuénot, 1905) that some of these mutations, at a given point in time, can be lethal for the organism.

No genome rearrangement problem seems to take into account the constraint that the produced sequences may involve allele mutations that are lethal for the organism on which they act.


[fig. ref. Punnett square for the agouti gene in mice, demonstrating a lethal recessive allele, Wikipedia.]

Page 5: Sorting with Forbidden Intermediates

Lethals as Forbidden Intermediates
We revisit sorting problems by adding a natural constraint:

- namely, the presence of a set F of forbidden intermediate permutations, which the sorting sequence that we seek must avoid.

- e.g., consider $\mathrm{Sym}_4$ and its Cayley graph generated by the cycles (132) and (1234).
  - Let F = {4123, 1243, 1324, 1342, 2341}, s = 3412 and t = 1234.
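Without forbidden intermediates this is plain reachability in the Cayley graph; with them, the vertices of F are simply removed before searching. The Python sketch below runs a breadth-first search on the directed Cayley graph of $\mathrm{Sym}_4$ and returns a shortest path that avoids F. It assumes one-line notation and that a generator g acts by right composition (π ↦ π∘g, i.e., g permutes positions); the slide's figure may use the other convention, so the path found here need not match it. All function names are hypothetical. Note that this brute force visits up to n! vertices, which is why the talk looks for algorithms polynomial in |F| instead.

```python
from collections import deque


def cycle_to_map(cycle, n):
    """Turn a cycle such as (1, 3, 2), meaning 1->3->2->1, into a dict on {1, ..., n}."""
    g = {i: i for i in range(1, n + 1)}
    for k, a in enumerate(cycle):
        g[a] = cycle[(k + 1) % len(cycle)]
    return g


def apply_right(p, g):
    """Return p composed with g, i.e. i -> p(g(i)) (assumed convention: g acts on positions)."""
    return tuple(p[g[i] - 1] for i in range(1, len(p) + 1))


def avoiding_path(s, t, generators, forbidden):
    """Breadth-first search in the directed Cayley graph, never entering a forbidden vertex."""
    gens = [cycle_to_map(c, len(s)) for c in generators]
    if s in forbidden or t in forbidden:
        return None
    parent = {s: None}
    queue = deque([s])
    while queue:
        p = queue.popleft()
        if p == t:                       # rebuild the path by walking back through parents
            path = []
            while p is not None:
                path.append(p)
                p = parent[p]
            return path[::-1]
        for g in gens:
            q = apply_right(p, g)
            if q not in parent and q not in forbidden:
                parent[q] = p
                queue.append(q)
    return None                          # t is unreachable once F is removed


def parse(word):
    return tuple(int(ch) for ch in word)


F = {parse(w) for w in ["4123", "1243", "1324", "1342", "2341"]}
path = avoiding_path(parse("3412"), parse("1234"), [(1, 3, 2), (1, 2, 3, 4)], F)
print(path)
```

Since a 3-cycle and a 4-cycle generate all of $\mathrm{Sym}_4$, a path always exists when F is empty; the forbidden set is what can make t unreachable.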


[fig. ref. Symmetric group S4, Wikiversity.]


Page 7: Sorting with Forbidden Intermediates

Constrained st-Connectivity Problems
In the 1970s, motivated by automatic software testing and validation, K. Krause et al. (1973) introduced the path avoiding forbidden pairs problem,

- namely, that of finding a directed (s, t)-path in a graph G = (V, E) that contains at most one vertex from each pair in a prescribed set F ⊆ V × V of forbidden pairs of vertices.

- H. Yinnone (1997) proved that the problem is NP-complete. A number of special cases were shown to admit poly(|G|, |F|)-time algorithms.
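Because the forbidden-pairs problem is NP-complete, no polynomial-time algorithm is expected in general, and the simplest exact method is exhaustive search. The Python sketch below is a hypothetical helper (not taken from Krause et al. or Yinnone): it enumerates simple directed paths depth-first and prunes any vertex that clashes with a vertex already on the path.

```python
def path_avoiding_forbidden_pairs(adj, s, t, pairs):
    """Exhaustive depth-first search over simple directed s-t paths.

    adj maps each vertex to its out-neighbours; pairs holds forbidden pairs of
    distinct vertices, at most one of which may appear on the returned path.
    The running time is exponential in the worst case.
    """
    conflict = {}
    for a, b in pairs:
        conflict.setdefault(a, set()).add(b)
        conflict.setdefault(b, set()).add(a)

    path, on_path = [s], {s}

    def dfs(v):
        if v == t:
            return list(path)
        for w in adj.get(v, ()):
            # skip vertices already used or clashing with a vertex already on the path
            if w in on_path or conflict.get(w, set()) & on_path:
                continue
            path.append(w)
            on_path.add(w)
            found = dfs(w)
            if found is not None:
                return found
            path.pop()                   # backtrack
            on_path.remove(w)
        return None

    return dfs(s)


adj = {"s": ["a", "b"], "a": ["t"], "b": ["c"], "c": ["t"]}
print(path_avoiding_forbidden_pairs(adj, "s", "t", [("a", "t")]))   # ['s', 'b', 'c', 't']
```

Restricting the search to simple paths loses nothing: cutting a cycle out of a walk only deletes vertices, so it cannot introduce a forbidden pair.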

S. Szeider (2003) considered the problem of finding paths between two given vertices avoiding forbidden transitions,

- where a transition in a graph is a pair of adjacent edges.

In this work...
We consider (single) forbidden vertices and look for poly(|F|)-time algorithms.


Page 9: Sorting with Forbidden Intermediates

Guided Sorting by Exchanges
Only exchanges (i.e., algebraic transpositions) are allowed. The solutions we seek must be optimal,

- in the sense that no shorter sorting sequence of exchanges exists even when no intermediate permutation is forbidden.

Main Result
We identify a (non-trivial) polynomial-time algorithm for solving GUIDED SORTING by exchanges when the permutation π to sort is an involution, i.e., a permutation whose cycles have length at most two.

- The algorithm runs in time polynomial in |F|.
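Why involutions are special can be read off the exchange distance n − c(π): an exchange decreases it only when it splits a cycle, and a 2-cycle (a b) can only be split by exchanging a and b. Hence every optimal sorting sequence of an involution with k 2-cycles exchanges those cycles one at a time, in some order, and its intermediates are the involutions formed by subsets of them. The naive Python sketch below (hypothetical names) tries all k! orders; it is exponential and only illustrates the search space that a polynomial-time algorithm must handle.

```python
from itertools import permutations


def is_involution(perm):
    """True iff applying perm twice gives the identity (all cycles have length <= 2)."""
    return all(perm[perm[i] - 1] == i + 1 for i in range(len(perm)))


def two_cycles(perm):
    """The 2-cycles (a, b), a < b, of an involution in one-line notation."""
    return [(a, perm[a - 1]) for a in range(1, len(perm) + 1) if a < perm[a - 1]]


def exchange(p, a, b):
    """Swap the entries at 1-based positions a and b."""
    q = list(p)
    q[a - 1], q[b - 1] = q[b - 1], q[a - 1]
    return tuple(q)


def guided_sort_involution_naive(perm, forbidden):
    """Try every order of the 2-cycles; return the first one avoiding forbidden, or None."""
    if not is_involution(perm):
        raise ValueError("perm must be an involution")
    for order in permutations(two_cycles(perm)):
        p, ok = tuple(perm), True
        for a, b in order:
            p = exchange(p, a, b)        # removes the 2-cycle (a b)
            if p in forbidden:
                ok = False
                break
        if ok:
            return list(order)
    return None


pi = (2, 1, 4, 3)                                        # the involution (1 2)(3 4)
print(guided_sort_involution_naive(pi, {(1, 2, 4, 3)}))  # [(3, 4), (1, 2)]
```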


Page 10: Sorting with Forbidden Intermediates

HY-STCON

We reduce GUIDED SORTING (for involutions, by exchanges) to the problem HY-STCON of finding directed (s, t)-paths that avoid a prescribed set F ⊆ V of forbidden vertices in hypercube graphs $\mathcal{H}_n$.

INPUT: the size $n \in \mathbb{N}$ of the underlying ground set [n], a family of forbidden vertices $F \subseteq \wp_n$, a source set $S \in \wp_n$ and a target set $T \in \wp_n$.
DECISION-TASK: Decide whether there exists a directed path p in $\mathcal{H}_n$ that goes from source S to target T avoiding F;
SEARCH-TASK: Compute a directed path p in $\mathcal{H}_n$ that goes from source S to target T avoiding F, provided that at least one such path exists.
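A direct reading of the HY-STCON definition gives an exhaustive baseline. This Python sketch assumes that the directed edges of $\mathcal{H}_n$ go from a set X to X ∪ {e} (one level up, as BFS↑ does in the algorithm sketch), so a path from S to T exists only if S ⊆ T, and it then stays between S and T. It may visit up to $2^{|T \setminus S|}$ sets, so it is not the poly(|F|) algorithm of the talk; the function name is hypothetical.

```python
from collections import deque


def hy_stcon_bfs(n, forbidden, S, T):
    """Exhaustive BFS for a directed S-T path in the hypercube H_n avoiding forbidden.

    Assumed edge direction: X -> X | {e} for e not in X (one level up). Visits up to
    2**|T - S| vertices, so it is a brute-force baseline, not a poly(|F|) algorithm.
    """
    S, T = frozenset(S), frozenset(T)
    if not T <= frozenset(range(1, n + 1)):
        raise ValueError("T must be a subset of [n]")
    forbidden = {frozenset(X) for X in forbidden}
    if not S <= T or S in forbidden or T in forbidden:
        return None                      # an upward path can only reach supersets of S
    parent = {S: None}
    queue = deque([S])
    while queue:
        X = queue.popleft()
        if X == T:
            path = []
            while X is not None:
                path.append(set(X))
                X = parent[X]
            return path[::-1]
        for e in sorted(T - X):          # only elements of T keep the path inside [S, T]
            Y = X | {e}
            if Y not in parent and Y not in forbidden:
                parent[Y] = X
                queue.append(Y)
    return None


print(hy_stcon_bfs(3, [{1}, {2, 3}], set(), {1, 2, 3}))
```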


Page 11: Sorting with Forbidden Intermediates

Reduction From GUIDED SORTING to HY-STCON

Let (π, F, S, K) be an instance of GUIDED SORTING.
- S is the set of all exchanges.
- Assume π is an involution, and write π, which has k cycles of length 2, as $c_1 c_2 \cdots c_k$.
- Since we are looking for an optimal sorting sequence, we may assume that each permutation in F is an involution and its 2-cycles form a proper subset of those of π.
- Our instance of GUIDED SORTING then translates to the following instance of HY-STCON:
  - π ↦ [k] in the following way: $c_i \mapsto i$ for $1 \le i \le k$;
  - each permutation φ in F is mapped onto a subset of [k] by replacing its cycles with the indices obtained in the first step; we let F′ denote the collection of subsets of [k] obtained by applying that mapping to each φ in F.
- The resulting HY-STCON instance is then ⟨[k], ∅, F′, k⟩;
  - a solution to instance (π, F, S, K) of GUIDED SORTING exists if and only if a solution to instance ⟨[k], ∅, F′, k⟩ of HY-STCON exists.
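The reduction itself is a relabelling, sketched below in Python under the slide's assumptions. The function name is hypothetical; following the remark that only involutions whose 2-cycles are among those of π can occur on an optimal sequence, any other forbidden permutation is dropped rather than mapped. The returned tuple mirrors the instance ⟨[k], ∅, F′, k⟩.

```python
def two_cycles(perm):
    """The 2-cycles (a, b), a < b, of an involution in one-line notation."""
    return [(a, perm[a - 1]) for a in range(1, len(perm) + 1) if a < perm[a - 1]]


def is_involution(perm):
    return all(perm[perm[i] - 1] == i + 1 for i in range(len(perm)))


def guided_sorting_to_hystcon(pi, forbidden):
    """Map an involution pi and its forbidden permutations to (source, target, F', k)."""
    if not is_involution(pi):
        raise ValueError("pi must be an involution")
    index = {c: i for i, c in enumerate(two_cycles(pi), start=1)}   # c_i -> i
    f_prime = set()
    for phi in forbidden:
        cyc = two_cycles(phi)
        # Only involutions whose 2-cycles are among those of pi can occur on an
        # optimal sorting sequence; every other forbidden permutation is dropped.
        if is_involution(phi) and all(c in index for c in cyc):
            f_prime.add(frozenset(index[c] for c in cyc))
    k = len(index)
    return frozenset(range(1, k + 1)), frozenset(), f_prime, k


pi = (2, 1, 4, 3, 5)                     # (1 2)(3 4): c_1 = (1 2), c_2 = (3 4)
F = {(1, 2, 4, 3, 5), (2, 1, 3, 5, 4)}
print(guided_sorting_to_hystcon(pi, F))
```

Here (1, 2, 4, 3, 5) keeps only $c_2$ and becomes {2}, while (2, 1, 3, 5, 4) contains the 2-cycle (4 5), which π lacks, so it cannot lie on an optimal sequence and is discarded.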


Page 12: Sorting with Forbidden Intermediates

Theorem (Algorithm for HY-STCON)

Concerning the HY-STCON problem, the following propositions hold on any input ⟨S, T, F, n⟩, where $d_{S,T}$ is the distance between S and T.

1. There exists an algorithm for solving the DECISION-TASK of HY-STCON within $O\big(\min(\sqrt{|F|\, d_{S,T}\, n},\ |F|)\cdot |F|^2\, d_{S,T}^4\, n^2\big)$ time.

2. There exists an algorithm for solving the SEARCH-TASK of HY-STCON within $O\big(\min(\sqrt{|F|\, d_{S,T}\, n},\ |F|)\cdot |F|^2\, d_{S,T}^4\, n^2 + |F|^{5/2}\, n^{3/2}\, d_{S,T}\big)$ time.


Page 13: Sorting with Forbidden Intermediates

Sketch of the Algorithm
The algorithm mainly consists of the repeated alternation of two phases:

1. Double-BFS. This phase explores the outgoing neighbourhood of the source S by a breadth-first search, denoted by BFS↑, going from lower to higher levels of $\mathcal{H}_n$ while avoiding the vertices in F. BFS↑ collects a certain (polynomially bounded) number of visited vertices. Symmetrically, the incoming neighbourhood of the target vertex T is explored by another breadth-first search, BFS↓, going from higher to lower levels of $\mathcal{H}_n$ while avoiding the vertices in F, which also collects a certain (polynomially bounded) number of visited vertices.

2. Compression. If a valid solution has not yet been determined, then a compression technique is applied in order to shrink the size of the remaining search space. This is possible thanks to certain regularities of the search space and to certain connectivity properties of hypercube graphs [GoldsLehmanRon2001, LehmanRon2001]. The search space is thereby reduced enough to resume the Double-BFS phase and continue the search for valid solutions.
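The Double-BFS idea can be illustrated in isolation. The Python sketch below is a simplified, hypothetical bidirectional search, not the authors' procedure: it grows the family of sets reachable from S one level up and the family of sets that reach T one level down, always avoiding F. When the two frontiers sit on the same level, a path exists exactly when they intersect. The parameter `cap` is an assumed stand-in for the polynomial bound on collected vertices; when a frontier outgrows it, the sketch reports "undecided" (None), which is where a compression step would be needed. Edges are again assumed to go one level up.

```python
def level_up(frontier, T, forbidden):
    """All non-forbidden sets one level above the frontier that stay inside T."""
    return {X | {e} for X in frontier for e in T - X if X | {e} not in forbidden}


def level_down(frontier, S, forbidden):
    """All non-forbidden sets one level below the frontier that still contain S."""
    return {Y - {e} for Y in frontier for e in Y - S if Y - {e} not in forbidden}


def double_bfs(S, T, forbidden, cap):
    """Level-by-level bidirectional search between S and T in the hypercube.

    Returns True/False when it can decide, or None once a frontier exceeds the
    hypothetical budget `cap` (compression itself is not implemented here).
    """
    S, T = frozenset(S), frozenset(T)
    forbidden = {frozenset(X) for X in forbidden}
    if not S <= T or S in forbidden or T in forbidden:
        return False
    up, down = {S}, {T}                  # reachable from S / able to reach T
    lo, hi = len(S), len(T)              # levels of the two frontiers
    while lo < hi:
        if len(up) <= len(down):         # grow the smaller frontier first
            up, lo = level_up(up, T, forbidden), lo + 1
        else:
            down, hi = level_down(down, S, forbidden), hi - 1
        if not up or not down:
            return False                 # every S-T path must cross this level, and F blocks it
        if len(up) > cap or len(down) > cap:
            return None                  # budget exhausted: undecided
    return bool(up & down)               # frontiers on the same level must meet


print(double_bfs(set(), {1, 2}, [{1}], cap=10))   # True, via {2}
```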


Page 14: Sorting with Forbidden Intermediates

On Vertex-Disjoint Paths in Hypercube Graphs

Theorem (Lehman, Ron 2001)

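Given $n, m \in \mathbb{N}$, let $\mathcal{R} \subseteq \mathcal{H}_n^{(r)}$ and $\mathcal{S} \subseteq \mathcal{H}_n^{(s)}$ with $|\mathcal{R}| = |\mathcal{S}| = m$ and $0 \le r < s \le n$. Assume there exists a bijection $\varphi : \mathcal{S} \to \mathcal{R}$ such that $\varphi(S) \subset S$ for every $S \in \mathcal{S}$. Then there exist m vertex-disjoint directed paths in $\mathcal{H}_n$ whose union contains all the subsets in $\mathcal{S}$ and $\mathcal{R}$.

The theorem no longer holds if one requires the disjoint chains to correspond exactly to the given bijection φ.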


Page 15: Sorting with Forbidden Intermediates

From the proof of Lehman and Ron (2001), we can extract the following algorithm.

Computing Lehman-Ron paths in polynomial time

Theorem

There exists an algorithm for computing all Lehman-Ron paths within time $O(m^{5/2}\, n^{3/2}\, d)$ on any input ⟨R, S, φ, n⟩ with |R| = |S| = m, where d is the distance between R and S and n is the size of the underlying ground set.

In the time complexity stated above, Menger’s vertex-connectivity theorem and the Hopcroft-Karp algorithm for maximum cardinality matching in undirected bipartite graphs play a major role.


Page 16: Sorting with Forbidden Intermediates

A Poly(|F|)-Time Algorithm for Solving HY-STCON


Page 17: Sorting with Forbidden Intermediates

Compression Phase (Crux!)

At line 3, construct_bipartite_graph(S, T, n) builds an undirected bipartite graph $G = (V_G, E_G)$, where $V_G = S \cup T$ and every vertex U ∈ S is adjacent to a vertex V ∈ T iff U ⊂ V.

At line 4, compute_max_matching(G, |F| + 1) finds a matching M of size |M| = min(m*, |F| + 1),

- where m* is the size of a maximum cardinality matching of G.
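The bipartite graph and the capped matching described above can be sketched in Python as follows. This sketch uses Kuhn's simple augmenting-path method instead of the Hopcroft-Karp algorithm mentioned for the Lehman-Ron paths; both produce a maximum matching, Hopcroft-Karp just needs fewer rounds. The functions bipartite_subset_graph and capped_matching are hypothetical and are not the slide's construct_bipartite_graph and compute_max_matching.

```python
def bipartite_subset_graph(S_family, T_family):
    """Left vertices: sets of S_family; right vertices: sets of T_family; U -- V iff U < V."""
    left = [frozenset(U) for U in S_family]
    right = [frozenset(V) for V in T_family]
    adj = [[j for j, V in enumerate(right) if U < V] for U in left]   # proper inclusion
    return left, right, adj


def capped_matching(left, right, adj, limit):
    """Augmenting-path (Kuhn) matching that stops once it has `limit` edges."""
    match_of_right = {}                  # right index -> left index

    def augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # j is free, or its current partner can be re-matched elsewhere
            if j not in match_of_right or augment(match_of_right[j], seen):
                match_of_right[j] = i
                return True
        return False

    size = 0
    for i in range(len(left)):
        if size >= limit:
            break
        if augment(i, set()):
            size += 1
    return [(left[i], right[j]) for j, i in match_of_right.items()]


left, right, adj = bipartite_subset_graph([{1}, {2}], [{1, 2}, {1, 3}])
print(len(capped_matching(left, right, adj, limit=2)))   # 2
```

Because Kuhn's method grows the matching one edge per successful augmentation and ends with a maximum matching, stopping at `limit` edges yields a matching of size min(m*, limit), in line with |M| = min(m*, |F| + 1) above.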

The compression of T (possibly returned at line 16) is $T' = \bigcup_{i=1}^{\max_i} X^{(i)}_T$. Then, we can prove the following:

- $|T'| \le |F|\, d_{S,T}$; thus, the condition “$|X| \le |F|\, d_{S,T}$” checked at line 1 of bfs_phase() is satisfied.
- If p is any directed path in $\mathcal{H}_n$ that goes from S to T avoiding F, then p goes from S to $T'$.


Page 18: Sorting with Forbidden Intermediates

Conclusion

Future Works and Open Problems
Many questions remain open, most notably:

The computational complexity of the GUIDED SORTING problem, whether under our assumptions or in a more general setting,

- i.e., using structures other than permutations, operations other than exchanges, or sequences “as short as possible” instead of optimal ones.

One could also investigate “implicit” representations for the set of forbidden intermediate permutations, e.g., all permutations that avoid a given (set of) pattern(s).
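As an illustration of such an implicit representation, F then need not be listed: for a plain graph search such as a BFS, a membership test suffices. The Python sketch below (hypothetical names; brute-force pattern matching using about $\binom{n}{k} k^2$ comparisons for a pattern of length k) declares a permutation forbidden when it avoids every pattern in a given set.

```python
from itertools import combinations


def contains_pattern(perm, pattern):
    """True iff some subsequence of perm is order-isomorphic to pattern (brute force)."""
    k = len(pattern)
    for positions in combinations(range(len(perm)), k):
        sub = [perm[i] for i in positions]
        if all((sub[a] < sub[b]) == (pattern[a] < pattern[b])
               for a in range(k) for b in range(a + 1, k)):
            return True
    return False


def in_implicit_forbidden_set(perm, patterns):
    """Hypothetical membership oracle: perm is forbidden iff it avoids every pattern."""
    return not any(contains_pattern(perm, p) for p in patterns)


print(contains_pattern((3, 1, 4, 2), (1, 3, 2)))             # True (subsequence 1, 4, 2)
print(in_implicit_forbidden_set((2, 1, 4, 3), [(1, 2, 3)]))  # True: no increasing triple
```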


Page 19: Sorting with Forbidden Intermediates

Thank you

Thank you for your attention
