Maximization Problems with Submodular Objective Functions
Moran Feldman
Publication List• Improved Approximations for k-Exchange Systems.
Moran Feldman, Joseph (Seffi) Naor, Roy Schwartz and Justin Ward, ESA, 2011.• A Unified Continuous Greedy Algorithm for Submodular Maximization.
Moran Feldman, Joseph (Seffi) Naor and Roy Schwartz, FOCS 2011.• A Tight Linear Time (1/2)-Approximation for Unconstrained Submodular Maximization.
Niv Buchbinder, Moran Feldman, Joseph (Seffi) Naor and Roy Schwartz, FOCS 2012.
2
Outline• Preliminaries– What is a submodular function?
• Unconstrained Submodular Maximization• More Preliminaries• Maximizining a Submodular Function over a
Polytope Constraint
3
Set Functions
DefinitionGiven a ground set N, a set function f : 2N R assigns a number to every subset of the ground set.
Intuition• Consider a player participating in an auction on a set N of elements.• The utility of the player from buying a subset N’ N of elements is
given by a set function f.
Modular FunctionsA set function is modular (linear) if there is a fixed utility vu for every element u N, and the utility of a subset N’ N is given by:
Nu
uv
4
Properties of Set FunctionsMotivation• Modularity is often a too strong property.• Many weaker properties have been defined.• A modular function has all these properties.
Normalization• No utility is gained from the set of no elements.• f() = 0.
Monotonicity• More elements cannot give less utility.• For every two sets A B N, f(A) f(B).
Subadditivity• Two sets of elements give less utility together than separately.• For every two sets A, B N, f(A) + f(B) f(A B).• Intuition: I want to hang a single painting in the living room, but I have nothing
to do with a second painting.
5
Properties of Set Functions (cont.)Difficulty• Subadditivity is often a too weak property.
Solution• A stronger property called submodularity.• The marginal contribution of an element u to a set A decreases when
adding elements to A.
NotationGiven a set A, and an elements u, fu(A) is the marginal contribution of u to A:
Formal Definition
AfuAfAfu
For sets A B N, and u B:fu(A) fu(B)
For sets A, B N:f(A) + f(B) f(A B) + f(A B)
6
Submodular Function - Example
0567
10811
0
Too heavy
• Normalized• Nonmonotone• Submodular
54-8
7
Where can One Find Submodular Set Functions?
QuestionSubmodularity looks peculiar. Is it common in real life?
Answer• Yes, submodular set functions pop often in many settings.• Submodular functions represents “economy of scale”, which makes
them useful in economics.• Examples of submodular functions in combinatorial settings:
Ground Set Submodular FunctionNodes of a graph The number of edges leaving a
set of nodes.Collection of sets The number of elements in the
union of the of a sub-collection.
8
Unconstrained Submodular Maximization
InstanceA non-negative submodular function f : 2N R+.
ObjectiveFind a subset S N maximizing f(S).
Most Recent Previous Works• 0.41 approximation via simulated annealing [Gharan and Vondrak
11]• 0.42 approximation via a combination of simulated annealing and
the Structural Continuous Greedy Algorithm [Feldman et al. 11]• 0.5 hardness [Feige et al. 07]
9
The Players• The algorithm we present manages two solutions (sets):
– X – Initially the empty set.– Y – Initially the set of all elements (N).
• The elements are ordered in an arbitrary order: u1, u2, …, un.• The algorithm has n iterations, one for each element. In iteration i the algorithm
decides whether to add ui to X or to remove it from Y.
• The following notation is defined using these players:
Formally:
The contribution of adding ui to X
ai = f(X {ui}) – f(X)
The contribution of removing ui from Y
bi = f(Y \ {ui}) – f(Y)
10
More on the Algorithm
• Above we can see the initial configuration of the algorithm.– The set Y is red. Initially it contains all the elements.– The set X is green. Initially it contains no elements.
• Assume the algorithm now performs the following moves:– Add u1 to X.– Remove u2 from Y.
– Add u3 to X.
– Add u4 to X.– Remove u5 from Y.
• Notice: when the algorithm terminates, X = Y is the output of the algorithm.
IN
Out
u1 u2 u3 u4 u5
11
Notation• Xi and Yi - the values of the sets X and Y after i iterations.
• OPTi - the optimal set such that: Xi OPTi Yi.• V – the value of the algorithm’s output.
For i = 0OPT0 = OPT
X0 = Y0 = N
For i = nOPTn = Xn = Yn
f(OPTn) = f(Xn) = f(Yn) = V
Observations• As i increases, f(OPTi) deteriorates from f(OPT) to V.
• V [f(Xn) – f(X0) + f(Yn) – f(Y0)] / 2, i.e., V is at least the increase in f(X) + f(Y) over 2.
12
Roadmap• To analyze the algorithm we show that for some constant c and every
iteration i:f(OPTi-1) - f(OPTi) c ∙ [f(Xi) – f(Xi-1) + f(Yi) – f(Yi-1)]
Deterioration of OPT Improvement in f(X) + f(Y)
• Summing over all the iterations, we get:Deterioration in OPT c ∙ Improvement in f(X) + f(Y)f(OPT) - V 2c ∙ V
V f(OPT) / [1 + 2c]
• The value of c depends on how the algorithm decides whether to add ui to X or remove it from Y.
(*)
13
Deterministic RuleIf ai bi
• then add ui to X,
• else remove ui from Y.
LemmaAlways, ai + bi 0.
LemmaIf the algorithm adds ui to X (respectively, removes ui from Y), and OPTi-1 OPTi , then the deterioration in OPT is at most bi (respectively, ai).ProofSince OPTi-1 OPTi , we know that ui OPTi-1.
Non-standard greedy algorithm
f(OPTi-1) – f(OPTi) f(OPTi-1) – f(OPTi-1 {ui}) f(Y \ {ui}) – f(Y) = bi
14
Deterministic Rule - Analysis• Assume ai bi , and the algorithm adds ui to X (the other case is
analogues).• Cleary ai 0 since ai + bi 0.• If ui OPTi-1 , then:– f(OPTi-1) - f(OPTi) = 0– f(Xi) – f(Xi-1) + f(Yi) – f(Yi-1) = ai 0– Inequality (*) holds for every non-negative c.
• If ui OPTi-1 , then:– f(OPTi-1) - f(OPTi) bi
– f(Xi) – f(Xi-1) + f(Yi) – f(Yi-1) = ai bi
– Inequality (*) holds for c = 1.• The approximation ratio is: 1 / [1 + 2c] = 1/3.
15
Random Rule1. If bi 0, then add ui to X and quit.2. If ai 0, then remove ui from Y and quit.3. With probability ai / (ai + bi) add ui to X, otherwise (with probability bi / (ai +
bi)), remove ui from Y.
Three cases: one for each line of the algorithm.
Case 1 (bi 0)• Cleary ai 0 since ai + bi 0.
• Thus, f(Xi) – f(Xi-1) + f(Yi) – f(Yi-1) = ai 0.
• If ui OPTi-1 , then f(OPTi-1) - f(OPTi) = 0
• If ui OPTi-1 , then f(OPTi-1) - f(OPTi) bi 0• Inequality (*) holds for every non-negative c.
16
Random Rule - AnalysisCase 2 (ai 0)Analogues to the previous case.
Case 3 (ai , bi > 0)The improvement is:
Assume, w.l.o.g., ui OPTi-1, the deterioration in OPT is:
Inequality (*) holds for c = 0.5, since:
The approximation ratio is: 1 / [1 + 2c] = 1/2.
ii
iii
ii
ii
ii
iiiii ba
babba
baba
aYfYfXfXf
22
11
ii
iii
ii
i
ii
iii ba
baaba
bba
aOPTfOPTf
01
iiii baba 222
17
18
Polytope Constraints
• We abuse notation and identify a set S with its characteristic vector in [0, 1]N.– Given a set S, we use S also to denote its characteristic vector.– Given a vector x {0, 1}N, we denote by x also the set whose
characteristic vector is x.
NPx
xf1,0.t.s
max
Nxxxx
xxxxf
1,03322
13.t.smax
321
321
• Using this notation, we can define LP like problems:
• More generally, maximizing a submodular functrion subject to a polytope P constraint is the problem:
19
Relaxation• The last program:
– requires integer solutions.– generalizes “integer programming”.– is unlikely to have a reasonable approximation.
• We need to relax it.– We replace the constraint x {0,1}N with x [0,1]N.– The difficulty is to extend the objective function to fractional vectors.
• The multilinear extension F (a.k.a. extension by expectation) [Calinescu et al. 07].– Given a vector x [0, 1]N, let R(x) denote a random set containing
every element u N with probability xu independently.– F(x) = E[f(R(x))].
20
Introducing the ProblemMaximizining a Submodular Function over a Polytope Constraint• Asks to find (or approximate) the optimal solution for a relaxation
of the form presented.• Formally for as large a constant c as possible:
– Given , find a feasible x such that: F(x) c ∙ f(OPT).
Motivation• For many polytopes, a fractional solution can be rounded without
losing too much in the objective.– Matroid Polytopes – no loss [Calinescu et al. 11].– Constant number of knapsacks – (1 – ε) loss [Kulik et al. 09].– Unsplittable flow in trees – O(1) loss [Chekuri et al. 11].
NPx
xF1,0.t.s
max
21
The Continuous Greedy AlgorithmThe Algorithm [Vondrak 08]• Let δ > 0 be a small number.1. Initialize: y(0) and t 0.2. While t < 1 do:3. For every u N, let wu = F(y(t) u) – F(y(t)).4. Find a solution x in P [0, 1]N maximizing w ∙ x.5. y(t + δ) y(t) + δ ∙ x6. Set t t + δ 7. Return y(t)
Remark• Calculation of wu can be tricky for some functions.• Assuming one can evaluate f, the value of wu can be approximated via
sampling.
22
The Continuous Greedy Algorithm - Demonstration
Observations• The algorithm is somewhat like gradient descent.• The algorithm moves only in positive directions because the
extension F is guaranteed to be concave in such directions.
y(0)
y(0.01)y(0.02)
y(0.03)
y(0.04)
1x2x
3x
4x
23
The Continuous Greedy Algorithm - Results
Theorem [Calinescu et al. 11]• Assuming,– f is a normalized monotone submodular function.– P is a solvable polytope.
• The continuous greedy algorithm gives 1 – 1/e – o(n-1) approximation.
• By guessing the element of OPT with the maximal marginal contribution, one can get an optimal 1 – 1/e approximation.
Proof Idea• Show that w x∙ w ∙ OPT is large.• Show that the improvement in each iteration δ ∙ w ∙ x.• Sum up the improvements over all the iterations.
The Continuous Greedy Algorithm - Rethinking
QuestionWhat is the difference between gradient descent and the continuous greedy?
Answer• In gradient descent the direction chosen is the direction x
maximizing x ∙F.• In the continuous greedy the direction chosen is the direction x
maximizing x ∙ w, where wu is the marginal contribution of u.
24
)\( uNtyF tyF utyF tyu tyu1
Marginal contributionDerivative (gradient)
25
The Measured Continuous Greedy Algorithm
IdeaTo guarantee a gain of δ ∙ w ∙ x, it is enough to increase yu by: δ ∙ xu ∙ (1 – yu(t)).
The Algorithm• Let δ > 0 be a small number.1. Initialize: y(0) and t 0.2. While t < T do:3. For every u N, let wu = F(y(t) u) – F(y(t)).4. Find a solution x in P {0, 1}N maximizing w ∙ x.5. For every u N, yu(t + δ) yu(t) + δ ∙ xu (1 – ∙ yu(t)).6. Set t t + δ 7. Return y(t)
26
The Measured Continuous Greedy Algorithm - Results
Theorem• Assuming,– f is a non-negative submodular function.– P is a solvable down-montone polytope.
• The approximation ratio of the measured continuous greedy algorithm with T = 1 is 1/e – o(n-1).
Remarks• For monotone f, we get the an approximation ratio of 1-e-T.
For T = 1, this is the ratio of the continuous greedy.• The solution is no longer a convex combination of points of P:– The sum of the coefficients is at most T.– For T 1, the output is in P since P is down-monotone.
27
The Measured Continuous Greedy Algorithm - Analysis
Helper LemmaGiven x [0, 1]N and [0, 1], let T(x) be a set containing every element u N such that xu .• For every x [0, 1]N,ProofOmitted due to time constraints.
Lemma 1There is a good direction, i.e., w ∙ x e-t ∙ f(OPT) – F(y(t)).Proof• Notice that the increase in yu(t) is at most δ ∙ (1 – yu(t)).
• If δ is infinitely small, yu(t) is upper bounded by the solution of the following differential equation.
• For small δ, yu(t) is almost upper bounded by 1 – e-t.
1
0 dxTfxF
00),(1/ ytydtdy tety 1)(
28
The Continuous Greedy Algorithm – Analysis (cont.)
• OPT itself is a feasible direction. Its weight is:
Lemma 2The improvement is related to w ∙ x, i.e., F(y(t + δ)) F(y(t)) + δ ∙ w ∙ x.Proof• For small δ:
Nuuu
Nuuuu
xw
tyxtyFtyFtyF
1
tyFOPTfetyFdOPTtyTf
tyFOPTtyFtyFutyFw
t
e
OPTuOPTuu
t
1
1
29
Proof of the Theorem• Combining the last two lemmata gives:
F(y(t + δ)) - F(y(t)) δ ∙ [e-t ∙ f(OPT) – F(y(t))]
• Let g(t) = F(y(t)). For small δ the last equation becomes the differential equation:
dg/dt = e-t ∙ f(OPT) – g(t), g(0) = 0
• The solution of this equation is:g(t) = t ∙ e-t ∙ f(OPT)
• Hence, at time t = 1, the value of the solution is at least g(1) = e-1 ∙f(OPT).
30
Result for Monotone Functions
• For non-monotone functions, the approximation ratio is T ∙ e-T, which is maximized for T = 1.
• For monotone functions, the approximation ratio is 1 – e-T, which improves as T increases.
• In general, the solution produced for T > 1:– Is always within the cube [0, 1]N.– Might be outside the polytope.
• However, for some polytopes, somewhat larger values of T can be used.
31
The Submodular Welfare ProblemInstance• A set P of n players • A set Q of m items• Normalized monotone submodular utility function
wj: 2Q R+ for each player.
Objective• Let Qj Q denote the set of items the jth player gets.• The utility of the jth player is wj(Qj).• Distribute the items among the players, maximizing the sum
of utilities.
32
The PolytopeProblem Representation• Each item is represented by n elements in the ground set (each one
corresponding to its assignment to a different player).• The objective is the sum of the utility functions of the players, each
applied to the items allocated to it.• The polytope requires that every item will be allocated to at most one
player.
Analyzing a Single Constraint• All constraints are of the form:• Let ti be the amount of time in which the algorithm increases xi. Notice
that xi = 1 – e-ti.• By definition, , and it can be shown that for T -n ln (1–n∙ -1)
it always holds that:
11
n
i ix
Ttn
i i 1
1111
n
itn
i iiex
33
Approximating the Submodular Welfare Problem
• Apply the measured continuous greedy till time:T = - n ln (1 – 1/n)∙
• By the previous analysis, the solution produced will be in the polytope.
• The expected value of the solution is at least:
• Round the solution using the natural rounding:– Assign an item to each player with probability equal to the
corresponding variable.• This approximation ratio is tight for every n. [Vondrak 06]
nnn ne /1111 /11ln
34
Open Problems• The measured continuous greedy algorithm:– Provides tight approximation for monotone functions
[Nemhauser and Wolsey 78].– Is this also the case for non-monotone functions?– The current approximation ratio of e-1 is a natural number.
• Uniform Matroids– The approximability depends on k/n.
• For k/n = 0.5, we have 0.5 approximation.• Recently, a hardness of a bit less than 0.5 was shown for k/n
approaching 0. [Gharan and Vondrak 11]– What is the correct approximation ratio as a function of k/n?– We know it is always possible to bit the e-1 ratio.
?