
Greedy Algorithms and Genome Rearrangements

Bioinfo I (Institut Pasteur de Montevideo) Greedy Algorithms -class3- July 19th, 2011 1 / 75


Outline

Greedy example
Transforming Cabbage into Turnip
Genome Rearrangements
Sorting By Reversals
Pancake Flipping Problem
Greedy Algorithm for Sorting by Reversals
Approximation Algorithms
Introduction to Dynamic Programming


Greedy Example

Goal: given a tree, find the longest path from the root to a leaf

[Figure: two panels, "Greedy approach" and "Actual path"]
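A minimal Python sketch of why the greedy choice can miss the longest path. It assumes the edges carry weights and that "longest" means largest total weight. The tree, its weights, and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical weighted tree: node -> list of (child, edge_weight).
TREE = {
    "root": [("a", 5), ("b", 3)],
    "a": [("a1", 1), ("a2", 2)],
    "b": [("b1", 9), ("b2", 4)],
}

def greedy_path(tree, node="root"):
    """Always follow the heaviest outgoing edge; return (total, path)."""
    total, path = 0, [node]
    while tree.get(node):
        child, weight = max(tree[node], key=lambda edge: edge[1])
        total += weight
        node = child
        path.append(node)
    return total, path

def best_path(tree, node="root"):
    """Exhaustive search over all root-to-leaf paths; return (total, path)."""
    children = tree.get(node)
    if not children:
        return 0, [node]
    candidates = []
    for child, weight in children:
        sub_total, sub_path = best_path(tree, child)
        candidates.append((weight + sub_total, [node] + sub_path))
    return max(candidates, key=lambda c: c[0])

print(greedy_path(TREE))  # (7, ['root', 'a', 'a2'])
print(best_path(TREE))    # (12, ['root', 'b', 'b1'])
```

The greedy walk commits to the heavier first edge and never sees the heavy edge below the other child.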


Turnip vs. Cabbage: Look and Taste Different

Although cabbages and turnips share a recent common ancestor, they look and taste different


Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information


Turnip vs Cabbage: Almost Identical mtDNA Gene Sequences

In the 1980s Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of the cabbage and the turnip
99% similarity between genes
These surprisingly similar gene sequences differed in gene order
This study helped pave the way to analyzing genome rearrangements in molecular evolution


Turnip vs Cabbage: Different mtDNA Gene Order

Gene order comparison:


Evolution is manifested as the divergence in gene order


Genome rearrangements

What are the similarity blocks and how to find them?
What is the architecture of the ancestral genome?
What is the evolutionary scenario for transforming one genome into the other?


Reversals

Blocks represent conserved genes
In the course of evolution or in a clinical context, blocks 1, ..., 10 could be misread as 1, 2, 3, −8, −7, −6, −5, −4, 9, 10


Reversals and Breakpoints


Reversals: Example


Types of rearrangements


Comparative Genomic Architectures: Mouse vs Human Genome

Humans and mice have similar genomes, but their genes are ordered differently
245 rearrangements
    Reversals
    Fusions
    Fissions
    Translocations


Comparative Genomic Architecture of Human and Mouse Genomes

To locate where the corresponding gene is in humans, we have to analyze the relative architecture of the human and mouse genomes


Reversals and Gene Orders

Gene order is represented by a permutation π:

π = π_1 π_2 ... π_{i−1} π_i π_{i+1} ... π_{j−1} π_j π_{j+1} ... π_n

Reversal ρ(i, j) reverses (flips) the elements from position i to position j in π:

π ∗ ρ(i, j) = π_1 π_2 ... π_{i−1} π_j π_{j−1} ... π_{i+1} π_i π_{j+1} ... π_n
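The definition of ρ(i, j) can be sketched in Python as follows. The function name `reversal` and the choice of 1-based, inclusive positions (to match the slide's indices) are assumptions of this sketch.

```python
def reversal(pi, i, j):
    """Return pi * rho(i, j): positions i..j (1-based, inclusive) reversed."""
    if not 1 <= i <= j <= len(pi):
        raise ValueError("need 1 <= i <= j <= n")
    return pi[:i - 1] + pi[i - 1:j][::-1] + pi[j:]

pi = [1, 2, 3, 4, 5, 6, 7, 8]
print(reversal(pi, 3, 6))  # [1, 2, 6, 5, 4, 3, 7, 8]
```

Applying the same reversal twice restores the original permutation, which is why a sorting scenario can be read in either direction.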


Reversal Distance Problem

Goal: Given two permutations, find the shortest series of reversals that transforms one into the other

Input: Permutations π and σ
Output: A series of reversals ρ_1, ..., ρ_t transforming π into σ, such that t is minimum

d(π, σ): the reversal distance between π and σ, i.e. the smallest possible value of t, given π and σ


Sorting By Reversals Problem

Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 ... n)

Input: Permutation π
Output: A series of reversals ρ_1, ..., ρ_t transforming π into the identity permutation such that t is minimum


Sorting By Reversals: Example

t = d(π): reversal distance of π
Example:

π = 3 4 2 1 5 6 7 10 9 8
    4 3 2 1 5 6 7 10 9 8
    4 3 2 1 5 6 7 8 9 10
    1 2 3 4 5 6 7 8 9 10

→ So d(π) = 3
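For small permutations the reversal distance can be checked by brute force. The sketch below is a plain breadth-first search over permutations, not an algorithm from the slides. Its running time grows factorially, so it is only meant for tiny inputs, and the function name is hypothetical.

```python
from collections import deque

def reversal_distance(pi):
    """Exact d(pi) by breadth-first search over permutations (tiny n only)."""
    start = tuple(pi)
    target = tuple(sorted(pi))
    if start == target:
        return 0
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        perm, dist = queue.popleft()
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                # Reverse the slice perm[i:j] (at least two elements).
                nxt = perm[:i] + perm[i:j][::-1] + perm[j:]
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise AssertionError("unreachable: every permutation can be sorted")

print(reversal_distance([3, 4, 2, 1, 5, 6, 7, 10, 9, 8]))  # 3
```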


Sorting by reversals: 5 steps

→ d(π) ≤ 5


Sorting by reversals: 4 steps

→ d(π) ≤ 4

What is the reversal distance for this permutation?
Can it be sorted in 3 steps?


Pancake Flipping Problem

The chef is sloppy; he prepares an unordered stack of pancakes of different sizes
The waiter wants to rearrange them (so that the smallest winds up on top, and so on, down to the largest at the bottom)
He does it by flipping over several from the top, repeating this as many times as necessary


Pancake Flipping Problem: Formulation

Goal: Given a stack of n pancakes, what is the minimum number of flips to rearrange them into a perfect stack?
Input: Permutation π
Output: A series of prefix reversals ρ_1, ..., ρ_t transforming π into the identity permutation such that t is minimum


Pancake Flipping Problem: Greedy Algorithm

Greedy approach: at most 2 prefix reversals place a pancake in its right position, so at most 2n − 2 steps in total
William Gates and Christos Papadimitriou showed in the mid-1970s that this problem can be solved by at most (5/3)(n + 1) prefix reversals
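The greedy approach can be sketched as below. This is one possible reading of the slide, assuming the stack is a Python list whose first element is the top pancake. The function name and the way flips are recorded are hypothetical.

```python
def pancake_sort(stack):
    """Greedy pancake sort; stack[0] is the top. Returns (sorted, flips).

    Each flip is recorded as k, meaning the top k pancakes were reversed.
    """
    stack = list(stack)
    flips = []

    def flip(k):
        stack[:k] = stack[:k][::-1]
        flips.append(k)

    # Place the largest unplaced pancake at the bottom of the unsorted part.
    for size in range(len(stack), 1, -1):
        pos = stack.index(max(stack[:size]))  # 0-based position of the largest
        if pos == size - 1:
            continue              # already in place, no flip needed
        if pos != 0:
            flip(pos + 1)         # bring it to the top
        flip(size)                # flip it down into position
    return stack, flips

result, flips = pancake_sort([3, 1, 4, 2])
print(result, flips)  # [1, 2, 3, 4] [3, 4, 2, 3]
```

Each of the n − 1 rounds uses at most two flips, which gives the 2n − 2 bound stated above.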


Sorting By Reversals: A Greedy Algorithm

If sorting permutation π = 1 2 3 6 4 5, the first three elements are already in order, so it does not make any sense to break them.
The length of the already sorted prefix of π is denoted prefix(π); here prefix(π) = 3
This results in an idea for a greedy algorithm: increase prefix(π) at every step


Greedy Algorithm: An Example

Doing so, π can be sorted

1 2 3 6 4 5
↓
1 2 3 4 6 5
↓
1 2 3 4 5 6

The number of steps to sort a permutation of length n is at most n − 1


Greedy Algorithm: Pseudocode

SimpleReversalSort(π)
    for i ← 1 to n − 1
        j ← position of element i in π (i.e., π_j = i)
        if j ≠ i
            π ← π ∗ ρ(i, j)
            output π
        if π is the identity permutation
            return
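A direct Python transcription of the pseudocode, as a sketch. The function name and the choice to return the intermediate permutations (instead of printing them inside the loop) are assumptions of this example.

```python
def simple_reversal_sort(pi):
    """Greedy sort: at step i, move element i into position i by one reversal.

    Returns the list of intermediate permutations (one per reversal applied).
    """
    pi = list(pi)
    steps = []
    for i in range(1, len(pi)):               # i = 1 .. n - 1
        j = pi.index(i) + 1                   # 1-based position of element i
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]   # apply rho(i, j)
            steps.append(list(pi))
        if pi == sorted(pi):
            break
    return steps

for step in simple_reversal_sort([1, 2, 3, 6, 4, 5]):
    print(step)
# [1, 2, 3, 4, 6, 5]
# [1, 2, 3, 4, 5, 6]
```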


Analyzing SimpleReversalSort

SimpleReversalSort does not guarantee the smallest number of reversalsand takes five steps on π = 6 1 2 3 4 5 :

Step 1: 1 6 2 3 4 5

Step 2: 1 2 6 3 4 5

Step 3: 1 2 3 6 4 5

Step 4: 1 2 3 4 6 5

Step 5: 1 2 3 4 5 6


But it can be sorted in two steps:

π = 6 1 2 3 4 5

Step 1: 5 4 3 2 1 6
Step 2: 1 2 3 4 5 6

So SimpleReversalSort(π) is not optimal
Optimal algorithms are unknown for many problems, so approximation algorithms are used


Approximation Algorithms

These algorithms find approximate solutions rather than optimal solutions
The approximation ratio of an algorithm A on input π is:

A(π) / OPT(π)

where
A(π): solution produced by algorithm A
OPT(π): optimal solution of the problem


Approximation Ratio/Performance Guarantee

Approximation ratio (performance guarantee) of algorithm A: the worst-case approximation ratio over all inputs of size n

For a minimization problem, the performance guarantee of A is:

max_{|π| = n} A(π) / OPT(π)

For a maximization problem:

min_{|π| = n} A(π) / OPT(π)
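As an illustration, the worst-case ratio of the greedy SimpleReversalSort can be measured by brute force for a small n. The sketch below assumes A(π) is the number of reversals the greedy algorithm uses and OPT(π) is the exact reversal distance found by breadth-first search. All function names are hypothetical, and the identity permutation is skipped because OPT is 0 there.

```python
from collections import deque
from itertools import permutations

def all_reversal_distances(n):
    """d(pi) for every permutation of 1..n, via one BFS from the identity.

    A reversal undoes itself, so the distance from the identity to pi
    equals the distance from pi to the identity.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                nxt = perm[:i] + perm[i:j][::-1] + perm[j:]
                if nxt not in dist:
                    dist[nxt] = dist[perm] + 1
                    queue.append(nxt)
    return dist

def simple_reversal_sort_steps(pi):
    """Number of reversals used by the greedy SimpleReversalSort."""
    pi, steps = list(pi), 0
    for i in range(1, len(pi)):
        j = pi.index(i) + 1
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]
            steps += 1
    return steps

def worst_case_ratio(n):
    """max over |pi| = n of A(pi) / OPT(pi), skipping the identity."""
    opt = all_reversal_distances(n)
    return max(simple_reversal_sort_steps(p) / opt[p]
               for p in permutations(range(1, n + 1)) if opt[p] > 0)

print(worst_case_ratio(6))
```

For n = 6 the permutation 6 1 2 3 4 5 alone already forces a ratio of 5/2, since the greedy algorithm needs five reversals where two suffice.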


Adjacencies and Breakpoints

Given π = π_1 π_2 π_3 ... π_{n−1} π_n

A pair of neighboring elements π_i and π_{i+1} is an adjacency (the two are consecutive) if

π_{i+1} = π_i ± 1

For example:

π = 1 9 3 4 7 8 2 6 5

(3, 4), (7, 8) and (6, 5) are adjacencies


Breakpoints: An Example

There is a breakpoint between any two neighboring elements that are not consecutive:

π = 1 9 3 4 7 8 2 6 5

Pairs (1, 9), (9, 3), (4, 7), (8, 2) and (2, 6) form the breakpoints of permutation π
b(π): number of breakpoints in permutation π


Adjacency & Breakpoints

An adjacency: a pair of adjacent elements that are consecutive
A breakpoint: a pair of adjacent elements that are not consecutive


Extending Permutations

We put two elements π_0 = 0 and π_{n+1} = n + 1 at the ends of π

Example: extending π = 1 9 3 4 7 8 2 6 5 gives 0 1 9 3 4 7 8 2 6 5 10

Note: A new breakpoint, (5, 10), was created after extending
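Breakpoints, with and without the extension, can be listed with a few lines of Python. The function name and the `extend` flag are hypothetical conveniences of this sketch.

```python
def breakpoints(pi, extend=False):
    """List the breakpoints of pi as pairs of neighbouring elements.

    With extend=True, 0 and n + 1 are added at the ends first.
    """
    seq = list(pi)
    if extend:
        seq = [0] + seq + [len(seq) + 1]
    return [(a, b) for a, b in zip(seq, seq[1:]) if abs(a - b) != 1]

pi = [1, 9, 3, 4, 7, 8, 2, 6, 5]
print(breakpoints(pi))
# [(1, 9), (9, 3), (4, 7), (8, 2), (2, 6)]
print(len(breakpoints(pi, extend=True)))  # 6
```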


Reversal Distance and Breakpoints

Each reversal eliminates at most 2 breakpoints

→ This implies:

reversal distance ≥ #breakpoints / 2, i.e. d(π) ≥ b(π) / 2

π = 2 3 1 4 6 5

0 | 2 3 | 1 | 4 | 6 5 | 7    b(π) = 5
0 1 | 3 2 | 4 | 6 5 | 7      b(π) = 4
0 1 2 3 4 | 6 5 | 7          b(π) = 2
0 1 2 3 4 5 6 7              b(π) = 0
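The breakpoint counts in this scenario, and the lower bound they imply, can be reproduced with the sketch below. The helper names are hypothetical, and rounding the bound up is an addition of this example, justified only by d(π) being an integer.

```python
import math

def b(pi):
    """Number of breakpoints of pi after extending it with 0 and n + 1."""
    ext = [0] + list(pi) + [len(pi) + 1]
    return sum(1 for x, y in zip(ext, ext[1:]) if abs(x - y) != 1)

def reversal_lower_bound(pi):
    """d(pi) >= b(pi) / 2, rounded up because d(pi) is an integer."""
    return math.ceil(b(pi) / 2)

# The sorting scenario from the slide: b drops 5 -> 4 -> 2 -> 0.
scenario = [
    [2, 3, 1, 4, 6, 5],
    [1, 3, 2, 4, 6, 5],
    [1, 2, 3, 4, 6, 5],
    [1, 2, 3, 4, 5, 6],
]
print([b(p) for p in scenario])           # [5, 4, 2, 0]
print(reversal_lower_bound(scenario[0]))  # 3
```

Since the scenario uses three reversals and the bound is also three, this particular scenario is optimal.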


Sorting By Reversals: A Better Greedy Algorithm

BreakPointReversalSort(π)
    while b(π) > 0
        Among all possible reversals, choose reversal ρ minimizing b(π ∗ ρ)
        π ← π ∗ ρ
        output π
    return

Problem: this algorithm may run forever, since nothing guarantees that the chosen reversal strictly decreases b(π)


Strips

Strip: an interval between two consecutive breakpoints in a permutation
    Decreasing strip: strip of elements in decreasing order (e.g. 6 5 and 3 2)
    Increasing strip: strip of elements in increasing order (e.g. 7 8)

A single-element strip can be declared either increasing or decreasing. We will choose to declare them as decreasing, with the exception of the strips with 0 and n + 1
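A small sketch of how strips could be extracted and labeled under this convention. The function name and the "inc"/"dec" tags are hypothetical. The input is an unextended permutation, and the framing elements 0 and n + 1 are added inside.

```python
def strips(pi):
    """Split the extended permutation into strips, each tagged inc/dec.

    Single-element strips count as decreasing, except those holding
    0 or n + 1, which count as increasing (the slide's convention).
    """
    n = len(pi)
    ext = [0] + list(pi) + [n + 1]
    groups, current = [], [ext[0]]
    for prev, cur in zip(ext, ext[1:]):
        if abs(prev - cur) == 1:
            current.append(cur)       # adjacency: same strip continues
        else:
            groups.append(current)    # breakpoint: close the strip
            current = [cur]
    groups.append(current)

    tagged = []
    for g in groups:
        if len(g) == 1:
            kind = "inc" if g[0] in (0, n + 1) else "dec"
        else:
            kind = "inc" if g[1] > g[0] else "dec"
        tagged.append((g, kind))
    return tagged

for strip, kind in strips([1, 4, 6, 5, 7, 8, 3, 2]):
    print(strip, kind)
# [0, 1] inc
# [4] dec
# [6, 5] dec
# [7, 8] inc
# [3, 2] dec
# [9] inc
```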


Reducing the Number of Breakpoints

Theorem 1:
If permutation π contains at least one decreasing strip, then there exists a reversal ρ which decreases the number of breakpoints (i.e. b(π ∗ ρ) < b(π))

Things To Consider
For π = 1 4 6 5 7 8 3 2

0 1 | 4 | 6 5 | 7 8 | 3 2 | 9    b(π) = 5

Choose the decreasing strip with the smallest element k in π (k = 2 in this case)
Find k − 1 in the permutation
Reverse the segment between k − 1 and k:

0 1 2 3 | 8 7 | 5 6 | 4 | 9    b(π) = 4
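The three steps above can be sketched in Python. The function name is hypothetical, and the handling of the case where k − 1 lies to the right of k (reverse the segment that starts just after k and ends at k − 1) is this sketch's reading of "the segment between k and k − 1".

```python
def theorem1_reversal(pi):
    """Apply the breakpoint-reducing reversal from Theorem 1, if any.

    Returns the new permutation, or None when pi has no decreasing strip.
    """
    n = len(pi)
    ext = [0] + list(pi) + [n + 1]

    # Collect elements of decreasing strips (singletons count, except 0, n+1).
    decreasing = []
    for p in range(1, n + 1):
        adj_left = abs(ext[p] - ext[p - 1]) == 1
        adj_right = abs(ext[p] - ext[p + 1]) == 1
        in_falling_run = ext[p - 1] - ext[p] == 1 or ext[p] - ext[p + 1] == 1
        if in_falling_run or not (adj_left or adj_right):
            decreasing.append(ext[p])
    if not decreasing:
        return None

    k = min(decreasing)
    pk, pm = ext.index(k), ext.index(k - 1)
    if pm < pk:
        ext[pm + 1:pk + 1] = ext[pm + 1:pk + 1][::-1]   # k - 1 is left of k
    else:
        ext[pk + 1:pm + 1] = ext[pk + 1:pm + 1][::-1]   # k - 1 is right of k
    return ext[1:-1]

print(theorem1_reversal([1, 4, 6, 5, 7, 8, 3, 2]))
# [1, 2, 3, 8, 7, 5, 6, 4]
```

Either way the reversal makes k − 1 and k neighbors, which removes at least one breakpoint.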


Reducing the number of breakpoints again

If there is no decreasing strip, there may be no reversal ρ that reduces the number of breakpoints (i.e. b(π ∗ ρ) ≥ b(π) for any reversal ρ)
By reversing an increasing strip (the number of breakpoints stays unchanged), we create a decreasing strip. Then the number of breakpoints will be reduced in the next step (Theorem 1).

There is no decreasing strip in π for:

ρ(6, 7) does not change the number of breakpoints
ρ(6, 7) creates a decreasing strip, thus guaranteeing that the next step will decrease the number of breakpoints


ImprovedBreakpointReversalSort

ImprovedBreakpointReversalSort(π)
    while b(π) > 0
        if π has a decreasing strip
            Among all possible reversals, choose reversal ρ that minimizes b(π ∗ ρ)
        else
            Choose a reversal ρ that flips an increasing strip in π
        π ← π ∗ ρ
        output π
    return
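A runnable sketch of the pseudocode, under a few assumptions not fixed by the slide: the reversal minimizing b is found by trying every reversal (ties broken by the first one found), the increasing strip that gets flipped is the first one that does not contain 0 or n + 1, and the intermediate permutations are returned instead of printed. All function names are hypothetical.

```python
def count_breakpoints(ext):
    """Breakpoints of an already extended permutation (0 ... n + 1)."""
    return sum(1 for x, y in zip(ext, ext[1:]) if abs(x - y) != 1)

def has_decreasing_strip(ext):
    """True if some element other than 0, n + 1 lies in a decreasing strip."""
    for p in range(1, len(ext) - 1):
        falling = ext[p - 1] - ext[p] == 1 or ext[p] - ext[p + 1] == 1
        alone = abs(ext[p] - ext[p - 1]) != 1 and abs(ext[p] - ext[p + 1]) != 1
        if falling or alone:
            return True
    return False

def first_increasing_strip(ext):
    """Slice bounds of the first increasing strip without 0 or n + 1."""
    last = len(ext) - 1
    p = 0
    while p < last:
        q = p
        while q < last and ext[q + 1] - ext[q] == 1:
            q += 1                    # ext[p..q] is a maximal increasing run
        if p > 0 and p < q < last:
            return p, q + 1
        p = q + 1
    raise ValueError("no increasing strip to flip")

def improved_breakpoint_reversal_sort(pi):
    """Return the permutations produced after each reversal."""
    n = len(pi)
    ext = [0] + list(pi) + [n + 1]
    steps = []
    while count_breakpoints(ext) > 0:
        if has_decreasing_strip(ext):
            # Try every reversal of ext[i:j]; keep the one minimizing b.
            best = None
            for i in range(1, n):
                for j in range(i + 2, n + 2):
                    cand = ext[:i] + ext[i:j][::-1] + ext[j:]
                    score = count_breakpoints(cand)
                    if best is None or score < best[0]:
                        best = (score, cand)
            ext = best[1]
        else:
            i, j = first_increasing_strip(ext)
            ext = ext[:i] + ext[i:j][::-1] + ext[j:]
        steps.append(ext[1:-1])
    return steps

for step in improved_breakpoint_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(step)
```

Trying every reversal costs O(n^2) candidates per step, each scored in O(n). That is acceptable for a sketch, although Theorem 1 also gives a direct way to pick a reducing reversal.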


ImprovedBreakpointReversalSort: Performance Guarantee

ImprovedBreakPointReversalSort is an approximation algorithm with a performance guarantee of at most 4

It eliminates at least one breakpoint in every two steps, so it takes at most 2b(π) steps
Approximation ratio: 2b(π) / d(π)
An optimal algorithm eliminates at most 2 breakpoints in every step: d(π) ≥ b(π) / 2
Performance guarantee:

2b(π) / d(π) ≤ 2b(π) / (b(π) / 2) = 4


Signed Permutations

Up to this point, all permutations to sort were unsigned
But genes have directions... so we should consider signed permutations

GRIMM Web Server

Real genome architectures are represented by signed permutations
Efficient algorithms to sort signed permutations have been developed
The GRIMM web server computes the reversal distances between signed permutations
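On a signed permutation, a reversal flips the order of a segment and also the direction (sign) of every element in it. A minimal sketch, with a hypothetical function name and 1-based inclusive positions, reproducing the earlier example of blocks 1, ..., 10:

```python
def signed_reversal(pi, i, j):
    """Reverse positions i..j (1-based, inclusive) and flip each sign."""
    segment = [-x for x in reversed(pi[i - 1:j])]
    return pi[:i - 1] + segment + pi[j:]

blocks = list(range(1, 11))
print(signed_reversal(blocks, 4, 8))
# [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
```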


Dynamic Programming
Part I: Edit Distance


DNA Sequence Comparison: First Success Story

Finding sequence similarities with genes of known function is a common approach to infer a newly sequenced gene’s function
In 1984 Russell Doolittle and colleagues found similarities between a cancer-causing gene and the normal growth factor (PDGF) gene


Cystic Fibrosis

Cystic fibrosis (CF) is a chronic and frequently fatal genetic disease of the body’s mucus glands (abnormally high level of mucus in glands). CF primarily affects the respiratory systems in children.
Mucus is a slimy material that coats many epithelial surfaces and is secreted into fluids such as saliva

Finding Similarities between the Cystic Fibrosis Gene and ATP binding proteins

In 1989 biologists found similarity between the cystic fibrosis gene and ATP binding proteins
ATP binding proteins are present on the cell membrane and act as transport channels
This suggests a plausible function for the cystic fibrosis gene, given the fact that CF involves sweat secretion with abnormally high sodium levels

Cystic Fibrosis: Finding the Gene

Cystic Fibrosis: Mutation Analysis

If a high percentage of cystic fibrosis (CF) patients have a certain mutation in the gene and unaffected individuals do not, then that mutation could be related to CF
A certain mutation was found in 70% of CF patients, convincing evidence that it is a predominant genetic diagnostic marker for CF

Cystic Fibrosis and CFTR Gene

Cystic Fibrosis and the CFTR Protein

The CFTR (Cystic Fibrosis Transmembrane conductance Regulator) protein acts in the cell membrane of epithelial cells that secrete mucus.
These cells line the airways of the nose, lungs, the stomach wall, etc.

Mechanism of Cystic Fibrosis

The CFTR protein (1480 amino acids) regulates a chloride ion channel
It adjusts the "wateriness" of fluids secreted by the cell
Those with cystic fibrosis are missing one single amino acid in their CFTR
Mucus ends up being too thick, affecting many organs

Bring in the Bioinformaticians!!!

Similarities between a gene of known function and a gene of unknown function alert biologists to some possibilities
Computing a similarity score between two genes tells how likely it is that they have similar functions
Dynamic programming is a technique for revealing similarities between genes
The Change Problem is a good problem to introduce the idea of dynamic programming

The Change Problem

Goal: Convert some amount of money $M$ into given denominations, using the fewest possible number of coins
Input: An amount of money $M$, and an array of $d$ denominations $c = (c_1, c_2, \ldots, c_d)$, in decreasing order of value ($c_1 > c_2 > \cdots > c_d$)
Output: A list of $d$ integers $i_1, i_2, \ldots, i_d$ such that $c_1 i_1 + c_2 i_2 + \cdots + c_d i_d = M$ and $i_1 + i_2 + \cdots + i_d$ is minimal

Change Problem: Example

Given the denominations 1, 3, and 5, what is the minimum number of coins needed to make change for a given value?

Only one coin is needed to make change for the values 1, 3 and 5

However, two coins are needed to make change for the values 2, 4, 6, 8 and 10.

Lastly, three coins are needed to make change for the values 7 and 9

Change Problem: Recurrence

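This example is expressed by the following recurrence relation:

$$
\mathrm{minNumCoins}(M) = \min
\begin{cases}
\mathrm{minNumCoins}(M - 1) + 1 \\
\mathrm{minNumCoins}(M - 3) + 1 \\
\mathrm{minNumCoins}(M - 5) + 1
\end{cases}
$$

Given the denominations $c$: $c_1, c_2, \ldots, c_d$, the recurrence relation is:

$$
\mathrm{minNumCoins}(M) = \min
\begin{cases}
\mathrm{minNumCoins}(M - c_1) + 1 \\
\mathrm{minNumCoins}(M - c_2) + 1 \\
\quad\vdots \\
\mathrm{minNumCoins}(M - c_d) + 1
\end{cases}
$$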
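As a worked instance of the recurrence, take the denominations 1, 3, and 5 and the amount M = 8. The three sub-values used here are the ones from the example: 7 needs three coins, while 5 and 3 need one coin each.

$$
\mathrm{minNumCoins}(8) = \min
\begin{cases}
\mathrm{minNumCoins}(7) + 1 = 3 + 1 \\
\mathrm{minNumCoins}(5) + 1 = 1 + 1 \\
\mathrm{minNumCoins}(3) + 1 = 1 + 1
\end{cases}
= 2
$$

The minimum is reached by taking a 3 or a 5 first, which corresponds to the change 8 = 5 + 3.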

Change Problem: A Recursive Algorithm

RecursiveChange(M, c, d)
    if M = 0
        return 0
    bestNumCoins ← ∞
    for i ← 1 to d
        if M ≥ c_i
            numCoins ← RecursiveChange(M − c_i, c, d)
            if numCoins + 1 < bestNumCoins
                bestNumCoins ← numCoins + 1
    return bestNumCoins
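The pseudocode can be transcribed into Python almost line for line. This is a minimal sketch: the name `recursive_change` is chosen for this example, the parameter `d` is dropped because Python can iterate over `c` directly, and returning `math.inf` for an amount that cannot be formed is an assumption about how to treat that case.

```python
import math


def recursive_change(M, c):
    """Minimum number of coins from denominations c that sum to M.

    Returns math.inf when M cannot be formed from the given denominations.
    """
    if M == 0:
        return 0
    best_num_coins = math.inf
    for coin in c:
        if M >= coin:
            num_coins = recursive_change(M - coin, c)
            if num_coins + 1 < best_num_coins:
                best_num_coins = num_coins + 1
    return best_num_coins


print(recursive_change(9, (5, 3, 1)))  # 3
```

It is only practical for small amounts, since the number of recursive calls grows very quickly with M.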

RecursiveChange Is Not Efficient

It recalculates the optimal coin combination for a given amount of money repeatedly
For example, with M = 77 and c = (1, 3, 7), the optimal coin combination for 70 cents is recomputed 9 times!
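The amount of repeated work can be checked with a short sketch. Running the plain recursion for M = 77 is not feasible, so the hypothetical helper `times_called` counts instead the ordered sequences of coins that lead from M down to a target amount; each such sequence corresponds to one call on that amount. The instrumented `recursive_change_counting` confirms the count on a smaller amount with the same gap of 7. Both function names are assumptions made for this example.

```python
import math
from collections import Counter


def recursive_change_counting(M, c, calls):
    """Plain recursive change that also records how often each amount is visited."""
    calls[M] += 1
    if M == 0:
        return 0
    best = math.inf
    for coin in c:
        if M >= coin:
            best = min(best, recursive_change_counting(M - coin, c, calls) + 1)
    return best


def times_called(M, c, target):
    """Number of ordered coin sequences summing to M - target."""
    gap = M - target
    ways = [1] + [0] * gap  # ways[g]: ordered sequences of coins summing to g
    for g in range(1, gap + 1):
        ways[g] = sum(ways[g - coin] for coin in c if coin <= g)
    return ways[gap]


calls = Counter()
recursive_change_counting(15, (1, 3, 7), calls)
print(calls[8])                          # 10
print(times_called(77, (1, 3, 7), 70))   # 10
```

Under these assumptions the amount 70 is reached by 10 calls in total: the first computation plus 9 repeats.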

We Can Do Better

We’re re-computing values in our algorithm more than once
Save the result of each computation for the amounts 0 to M
This way, we can look up an already computed value instead of re-computing it each time
The running time is proportional to M · d, where M is the amount of money and d is the number of denominations
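One way to save results is to keep the recursion and cache each amount the first time it is computed. The sketch below uses `functools.lru_cache` from the Python standard library for the cache; the name `memoized_change` is an assumption for this example, and this top-down variant is shown only as an illustration of the idea, not as the algorithm from the lecture.

```python
import math
from functools import lru_cache


def memoized_change(M, c):
    """Minimum number of coins for M, computing each amount from 0 to M at most once."""
    coins = tuple(c)

    @lru_cache(maxsize=None)
    def best(m):
        if m == 0:
            return 0
        # Try every usable coin; math.inf marks an amount that cannot be formed.
        return min((best(m - coin) + 1 for coin in coins if coin <= m),
                   default=math.inf)

    return best(M)


print(memoized_change(77, (1, 3, 7)))  # 11
```

Each amount is evaluated once and tries at most d coins, which is where the M · d running time comes from. Because the sketch still recurses, very large values of M could exceed Python's recursion limit.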

The Change Problem: Dynamic Programming

DPChange(M, c, d)
    bestNumCoins_0 ← 0
    for m ← 1 to M
        bestNumCoins_m ← ∞
        for i ← 1 to d
            if m ≥ c_i
                if bestNumCoins_{m−c_i} + 1 < bestNumCoins_m
                    bestNumCoins_m ← bestNumCoins_{m−c_i} + 1
    return bestNumCoins_M
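A Python transcription of DPChange might look as follows. The name `dp_change` is chosen for this example, and returning the whole table rather than only its last entry is an assumption made so that the intermediate values can be inspected; `dp_change(M, c)[M]` gives the value the pseudocode returns.

```python
import math


def dp_change(M, c):
    """Return the table best_num_coins[0..M] of minimum coin counts.

    Entries stay at math.inf for amounts that cannot be formed.
    """
    best_num_coins = [0] + [math.inf] * M
    for m in range(1, M + 1):
        for coin in c:
            if m >= coin and best_num_coins[m - coin] + 1 < best_num_coins[m]:
                best_num_coins[m] = best_num_coins[m - coin] + 1
    return best_num_coins


table = dp_change(10, (5, 3, 1))
print(table)  # [0, 1, 2, 1, 2, 1, 2, 3, 2, 3, 2]
```

For the denominations 1, 3, and 5, the table reproduces the earlier example: one coin for 1, 3, and 5, two coins for 2, 4, 6, 8, and 10, and three coins for 7 and 9.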

DPChange: Example
