submodular functions and optimizing biodiversityapnelson/moutons/charles.pdfrodrigues et al. (2005)...

Submodular Functions and OptimizingBiodiversity

Charles Semple

School of Mathematics and StatisticsUniversity of Canterbury

New Zealand

La Vacquerie 2015

Conservation biology

Measures are used in conservation biology to quantify thebiological diversity of a collection of species.

These measures are used for selecting which species to conserve.

Typically, individual species are the focus of attention, but this isnot necessarily the best way to conserve diversity:

Although conservation action is frequently targetedtoward single species, the most effective way ofpreserving overall species diversity is by conserving viablepopulations in their natural habitats, often by designatingnetworks of protected areas. Rodrigues et al. (2005)

Conservation biology

Measures are used in conservation biology to quantify thebiological diversity of a collection of species.

These measures are used for selecting which species to conserve.

Typically, individual species are the focus of attention, but this isnot necessarily the best way to conserve diversity:

Although conservation action is frequently targetedtoward single species, the most effective way ofpreserving overall species diversity is by conserving viablepopulations in their natural habitats, often by designatingnetworks of protected areas. Rodrigues et al. (2005)

Page 4: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

Phylogenetic diversity

Phylogenetic diversity (PD) is a quantitative tool for measuring thebiodiversity of a collection of species (Faith 1992).

For a subset Y of species, PD(Y ) is the evolutionary distancespanned by the species in Y .

• This distance is usually defined in reference to somephylogeny, but here we (initially) extend this with reference toa collection of 2-state characters.

Page 5: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

Phylogenetic diversity on splits

A bipartition {A,B} of a set X , where |A|, |B| ≥ 1, is a split of X .

A split system Σ of X is a collection of splits of X .

Σ is weighted if there is a map w : Σ→ R≥0.

For a subset Z of X ,

PD(Z ) =∑

A|B∈Σ;A∩Z ,B∩Z 6=∅

w(A|B).

Page 6: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

Example w(ag |bcdef ) = 1, w(abg |cdef ) = 5, w(abefg |cd) = 2,w(a|−) = 1, w(b|−) = 3, w(c |−) = 1, w(d |−) = 2, w(e|−) = 3,w(f |−) = 2, w(g |−) = 2

If Z = {a, b, f }, then

PD(Z ) = ag |bcdef + abg |cdef + a|−+ b|−+ f |−= 1 + 5 + 1 + 3 + 2 = 12

Page 7: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

Page 8: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

1

2

a1

2

g

53

2e

3

b

f

c

d

Page 9: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

1

2

a1

2

g

53

2e

3

b

f

c

d

Page 10: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

b

f

c

d

1

2

a1

2

g

53

2e

3

Page 11: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

Phylogenetic diversity across reserves

For a weighted split system Σ of X and a collection R of protectedreserves containing species in X , the phylogenetic diversity of asubset S of R is the PD score of the species contained within atleast one reserve in S .

Example S ={{a, b}, {c , e}, {a, g , e}

}PD({{a, b}, {c, e}, {a, g , e}

})= PD

({a, b, c , e, g}

)= 18

Page 12: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

}PD({{a, b}, {c, e}, {a, g , e}

})= PD

({a, b, c , e, g}

)= 18

1

2

a1

2

g

53

2e

3

b

f

c

d

Page 13: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

}PD({{a, b}, {c, e}, {a, g , e}

})= PD

({a, b, c , e, g}

)= 18

b

f

c

d

1

2

1

2

g

53

2e

3

a

Page 14: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

The problem

Budgeted Nature Reserve Selection (BNRS)Instance: A weighted split system Σ on X , a collection R ofregions containing species in X , a cost of preservation for eachregion, a fixed budget B.

Task: Find a subset of regions to preserve that maximizes the PDscore on Σ of the species contained within the preserved regionswhile keeping within budget B.

Applications of using phylogenetic diversity across regions to makeassessments in conservation planning include Moritz and Faith(1998), Rodrigues and Gaston (2002), Smith et al. (2000).

Page 15: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

The problem

Budgeted Nature Reserve Selection (BNRS)Instance: A weighted split system Σ on X , a collection R ofregions containing species in X , a cost of preservation for eachregion, a fixed budget B.Task: Find a subset of regions to preserve that maximizes the PDscore on Σ of the species contained within the preserved regionswhile keeping within budget B.

Page 16: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

The problem

Budgeted Nature Reserve Selection (BNRS)Instance: A weighted split system Σ on X , a collection R ofregions containing species in X , a cost of preservation for eachregion, a fixed budget B.Task: Find a subset of regions to preserve that maximizes the PDscore on Σ of the species contained within the preserved regionswhile keeping within budget B.

Page 17: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

Budgeted Nature Reserve Selection

If R consists of all singleton subsets of X with each subset havingunit cost and

• Σ is compatible, then BNRS is solvable in polynomial time(Faith, 1992; Pardi and Goldman, 2005; Steel, 2005).

• Σ is circular, then BNRS is solvable in polynomial time(Minh et al., 2009).

• Σ is affine, then BNRS is solvable in polynomial time(Spillner et al., 2008).

However, if R consists of all singleton subsets of X with eachsubset having unit cost and Σ is arbitrary, then BNRS is NP-hard(Spillner et al., 2008).

Page 18: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

If R consists of all singleton subsets of X with each subset havingunit cost and

• Σ is compatible, then BNRS is solvable in polynomial time(Faith, 1992; Pardi and Goldman, 2005; Steel, 2005).

• Σ is circular, then BNRS is solvable in polynomial time(Minh et al., 2009).

• Σ is affine, then BNRS is solvable in polynomial time(Spillner et al., 2008).

However, if R consists of all singleton subsets of X with eachsubset having unit cost and Σ is arbitrary, then BNRS is NP-hard(Spillner et al., 2008).

Page 19: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

If Σ is compatible and each region has unit cost, then BNRS isNP-hard (Moulton, S, Steel 2007).

Theorem (Bordewich, S 2008)If Σ is compatible, then there is a polynomial-time(1− 1

e

)-approximation algorithm for BNRS. Moreover, for any

δ > 0, BNRS cannot be approximated with an approximationratio of

(1− 1

e + δ)

unless P=NP.

Page 20: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

e

(1− 1

e + δ)

unless P=NP.

• The algorithm returns a feasible solution whose score is atleast

(1− 1

e

)(approx. 0.63) times the optimal score.

Page 21: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

e

(1− 1

e + δ)

unless P=NP.

The restriction to compatible split systems is redundant. Inparticular, we have

Theorem (Bordewich, S 2012)There is a polynomial-time

(1− 1

e

)-approximation algorithm for

BNRS.

Submodular functions

For a set E , a function f : 2E → R is submodular if, for all subsetsS ,T ⊆ E ,

f (S) + f (T ) ≥ f (S ∪ T ) + f (S ∩ T ).

Optimizing Submodular Functions (OSF)Instance: A non-negative, non-decreasing, submodular function fon 2E which is computable in polynomial time, a cost function onE , a fixed budget B.Task: Find a subset S of E which maximizes f while keepingwithin budget B.

Page 23: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

f (S) + f (T ) ≥ f (S ∪ T ) + f (S ∩ T ).

Optimizing Submodular Functions (OSF)Instance: A non-negative, non-decreasing, submodular function fon 2E which is computable in polynomial time, a cost function onE , a fixed budget B.

Task: Find a subset S of E which maximizes f while keepingwithin budget B.

Page 24: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

f (S) + f (T ) ≥ f (S ∪ T ) + f (S ∩ T ).

Optimizing Submodular Functions (OSF)Instance: A non-negative, non-decreasing, submodular function fon 2E which is computable in polynomial time, a cost function onE , a fixed budget B.Task: Find a subset S of E which maximizes f while keepingwithin budget B.

Page 25: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

Approximation solution to OSF

ApproxFunction

1 Exhaustively find a feasible solution of size at most two thatmaximizes f . Call the resulting solution H1.

2 For all subsets of E of size three,

(i) Sequentially add elements of E that maximize the ratio ofincremental f to cost while keeping within budget.

(ii) Do this until no more elements can be added.

Call the best solution H2.

3 Output H1 or H2 depending on which has the bigger value.

Theorem (Sviridenko, 2004)ApproxFunction is a polynomial-time

(1− 1

e

)-approximation

algorithm for OSF.

Page 26: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

A PD-based submodular function

Lemma Let Σ be a weighted split system of X , let Q be anon-empty subset of X , and let R be a collection of subsets of X .Then the function PD(Q,Σ) : 2R → R≥0 defined for all subsets R′of R, by the PD score of Q ∪

⋃R∈R′ R is a submodular function.

• Choosing E , f , c , and B to be R− Q, PD(Q,Σ), c , andB − c(Q), ApproxFunction is a polynomial-time(1− 1

e

)-approximation algorithm for BNRS for when the

selected set of reserves includes Q.

• By running through each possible choice for Q, we get apolynomial-time

(1− 1

e

)-approximation algorithm for BNRS.

Page 27: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

e

(1− 1

e

Page 28: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

e

(1− 1

e

Page 29: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

BNRS in the rooted setting

For a rooted phylogenetic X -tree T and a subset Y of X , PD(Y )is the sum of the edge weights of the minimal subtree in T thatconnects the elements in Y and the root.

(1− 1

e

rBNRS. Moreover, for any δ > 0, rBNRS cannot beapproximated with an approximation ratio of

(1− 1

e + δ)

unlessP=NP.

Can we extend rBNRS while maintaining the property of apolynomial-time

(1− 1

e

)-approximation algorithm for solving it?

Page 30: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

(1− 1

e

(1− 1

e + δ)

unlessP=NP.

(1− 1

e

Page 31: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

(1− 1

e

(1− 1

e + δ)

unlessP=NP.

(1− 1

e

Page 32: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

rBNRS extensions

1 Evolutionary relationships are not necessarily represented by asingle tree. For example, a collection of gene trees may be abetter representation.

• Allow for a collection of trees.

2 It’s unrealistic to expect a species probability of survival iszero if it is not contained in a selected region or, if it iscontained in such a region, its probability of survival is one.

• Allow for arbitrary survival probabilities.

3 PD assumes that features arise uniformly across a phylogenyand persist to be present in all descendant species. Butfeatures may disappear over time.

• Allow for features to disappear.Once a feature is present, it has a constant and memorylessprobability e−λ of surviving in each time step.

Page 33: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

rBNRS extensions

Page 34: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

rBNRS extensions

Page 35: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

PD in the extended rooted setting

Each x ∈ X has a probability p(x) of survival. Under the extendedmodel, the PD of X on T is the expected number of featurespresent amongst the surviving taxa.

PD(λ,T )(X , p) =

∫t∈T

P(t → X )dt,

where (t → X ) denotes the event that a feature arising at point ton T survives to be present in a taxa in X which itself survives.

Summing over a collection P = {T1, T2, . . . , Tk} of weighted trees,the PD of X on P is

PD(λ,P)(X , p) =k∑

j=1

w(Tj)∫t∈Tj

P(t → X )dt.

Page 36: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

PD(λ,T )(X , p) =

∫t∈T

P(t → X )dt,

PD(λ,P)(X , p) =k∑

j=1

w(Tj)∫t∈Tj

P(t → X )dt.

Page 37: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

PD(λ,T )(X , p) =

∫t∈T

P(t → X )dt,

PD(λ,P)(X , p) =k∑

j=1

w(Tj)∫t∈Tj

P(t → X )dt.

Page 38: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

rBNRS under the extended model

BNRS(λ,P)

Instance: A collection P of weighted trees on X , a collection R ofregions containing species in X , a cost of preservation for eachregion, a fixed budget B and, for all (x ,R) ∈ X ×R, probabilitiesa(x ,R) and b(x ,R), where b(x ,R) ≥ a(x ,R).

Task: Find a subset R′ of regions to preserve that maximizesPD(λ,P)(X , pR′) while keeping within budget B.

Page 39: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

BNRS(λ,P)

Instance: A collection P of weighted trees on X , a collection R ofregions containing species in X , a cost of preservation for eachregion, a fixed budget B and, for all (x ,R) ∈ X ×R, probabilitiesa(x ,R) and b(x ,R), where b(x ,R) ≥ a(x ,R).Task: Find a subset R′ of regions to preserve that maximizesPD(λ,P)(X , pR′) while keeping within budget B.

Page 40: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

BNRS(λ,P)

Lemma The function PD(λ,P) : 2R → R≥0 is non-negative,non-decreasing, submodular, and computable in polynomial time.

Page 41: Submodular Functions and Optimizing Biodiversityapnelson/moutons/Charles.pdfRodrigues et al. (2005) Conservation biology Measuresare used in conservation biology to quantify the biological

BNRS(λ,P)

(1− 1

e

BNRS(λ,P). Moreover, for any δ > 0, BNRS(λ,P) cannot be

approximated with an approximation ratio of(1− 1

e + δ)

unlessP=NP.

submodular functions and optimizing biodiversityapnelson/moutons/charles.pdfrodrigues et al. (2005)...

Documents