submodular function optimization
TRANSCRIPT
Submodular Function Optimization
http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html
Acknowledgement: these slides are based on the lecture notes of Prof. Andreas Krause, Prof. Jeff Bilmes, Prof. Francis Bach, and Prof. Shaddin Dughmi
Outline
1 What is submodularity?
    Examples in recommendation sets
    Definition
2 Submodular maximization
3 Submodular minimization
4 Applications of submodular maximization
Case study: News article recommendation
Interactive recommendation
Number of recommendations to choose from is large
    Similar articles → similar click-through rates!
Performance depends on query / context
    Similar users → similar click-through rates!
Need to compile sets of k recommendations (instead of only one)
    Similar sets → similar click-through rates!
News recommendation
Which set of articles satisfies most users?
Sponsored search
Which set of ads should be displayed to maximize revenue?
Relevance vs. Diversity
Users may have different interests / queries may be ambiguous
    E.g., "jaguar", "squash", · · ·
Want to choose a set that is relevant to as many users as possible
Users may choose from the set the article they’re most interested in
Want to optimize both relevance and diversity
Simple abstract model
Suppose we’re given a set W of users and a collection V of articles/ads
Each article i is relevant to a set of users Si
For now suppose this is known!
For each set of articles A define

F(A) = |∪i∈A Si|

Want to select k articles to maximize the number of "users covered":

max|A|≤k F(A)

The number of sets A grows exponentially in k! Finding the optimal A is NP-hard.
Maximum coverage
Given: Collection V of sets, utility function F(A)
Want: A∗ ⊆ V such that

A∗ = argmax|A|≤k F(A)

NP-hard!
Greedy algorithm:
    Start with A0 = {}
    For i = 1 to k:
        s∗ = argmaxs F(Ai−1 ∪ {s})
        Ai = Ai−1 ∪ {s∗}
How well does this simple heuristic do?
Approximation guarantee
Theorem
Under some natural conditions, the greedy algorithm produces a solution A with F(A) ≥ (1 − 1/e) · (optimal value) (∼ 63%). [Nemhauser, Fisher, Wolsey ’78]

This result holds for utility functions F with 2 properties:
    F is (nonnegative) monotone: if A ⊆ B then 0 ≤ F(A) ≤ F(B)
    F is submodular: "diminishing returns"
Outline
1 What is submodularity?
    Examples in recommendation sets
    Definition
2 Submodular maximization
3 Submodular minimization
4 Applications of submodular maximization
Set Functions
Ground set: some finite set X; we consider its subsets
Given a set X, the power set is V := 2X = {A | A ⊆ X}
A set function takes as input a set and outputs a real number
    Inputs are subsets of the ground set X
    F : 2X → R
It is common in the literature to use either X or V as the ground set.
We will follow this inconsistency in the literature and will inconsistently use either X or V as our ground set (hopefully not in the same equation; if so, please point this out).
Set Functions
If F is a modular function, then for any A, B ⊆ V, we have

F(A) + F(B) = F(A ∩ B) + F(A ∪ B)

If F is a modular function, it may be written as

F(A) = F(∅) + ∑a∈A (F({a}) − F(∅))

Modular set functions:
    Associate a weight wi with each i ∈ X, and set F(S) = ∑i∈S wi
    Discrete analogue of linear functions
Other possibly useful properties a set function may have:
    Monotone: if A ⊆ B ⊆ X, then F(A) ≤ F(B)
    Nonnegative: F(A) ≥ 0 for all A ⊆ X
    Normalized: F(∅) = 0
Submodular Functions
Definition 1
A set function F : 2V → R is submodular if and only if

F(A) + F(B) ≥ F(A ∩ B) + F(A ∪ B)

for all A, B ⊆ V.

"Uncrossing" two sets reduces their total function value.

Definition
A set function F : 2V → R is supermodular if and only if −F is submodular.
Submodular Functions
Definition 2 (diminishing returns)
A set function F : 2V → R is submodular if and only if

F(B ∪ {s}) − F(B) ≤ F(A ∪ {s}) − F(A)
(gain of adding element s to the large set B ≤ gain of adding s to the small set A)

for all A ⊆ B ⊆ V and s ∉ B.

The marginal value of an additional element exhibits "diminishing marginal returns": the incremental "value", "gain", or "cost" of s decreases (diminishes) as the context in which s is considered grows from A to B.
Submodular: Consumer Costs of Living
Consumer costs are very often submodular; the original slides illustrate this with figures interpreting consumer costs as exhibiting diminishing returns.
Submodular Functions
Definition 3 (group diminishing returns)
A set function F : 2V → R is submodular if and only if

F(B ∪ C) − F(B) ≤ F(A ∪ C) − F(A)

for all A ⊆ B ⊆ V and C ⊆ V\B.

This means that the incremental "value", "gain", or "cost" of a set C decreases (diminishes) as the context in which C is considered grows from A to B.
Equivalence of Definitions
Definition 2 =⇒ Definition 3
Let C = {c1, . . . , ck} ⊆ V\B. Telescoping, and applying Definition 2 term by term (with A ∪ {c1, . . . , ci−1} ⊆ B ∪ {c1, . . . , ci−1} and ci not in the larger set), Definition 2 implies

F(A ∪ C) − F(A) = ∑_{i=1}^{k} [F(A ∪ {c1, . . . , ci}) − F(A ∪ {c1, . . . , ci−1})]
               ≥ ∑_{i=1}^{k} [F(B ∪ {c1, . . . , ci}) − F(B ∪ {c1, . . . , ci−1})]
               = F(B ∪ C) − F(B)
Equivalence of Definitions
Definition 1 =⇒ Definition 2
Let A ⊆ B and i ∉ B. To prove (2), let A′ = A ∪ {i} and B′ = B and apply (1):

F(A ∪ {i}) + F(B) = F(A′) + F(B′)
                  ≥ F(A′ ∩ B′) + F(A′ ∪ B′)
                  = F(A) + F(B ∪ {i})

Definition 2 =⇒ Definition 1
Assume A ≠ B (otherwise (1) is trivial). Define A′ = A ∩ B, C = A\B and B′ = B, so that A′ ⊆ B′ and C ⊆ V\B′. Then (using the group form of Definition 2)

F(A′ ∪ C) − F(A′) ≥ F(B′ ∪ C) − F(B′)
⇐⇒ F((A ∩ B) ∪ (A\B)) + F(B) ≥ F(B ∪ (A\B)) + F(A ∩ B)
⇐⇒ F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
Submodularity
Submodular functions have a long history in economics, game theory, combinatorial optimization, electrical networks, and operations research.
They are gaining importance in machine learning as well.
Arbitrary set functions are hopelessly difficult to optimize, while the minimum of a submodular function can be found in polynomial time, and the maximum can be constant-factor approximated in low-order polynomial time.
Submodular functions share properties in common with both convex and concave functions.
Example: Set cover
F is submodular: for A ⊆ B,

F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
(gain of adding a set s to a small solution ≥ gain of adding s to a large solution)

Natural example:
    Sets S1, S2, · · · , Sn
    F(A) = size of the union of the Si, i ∈ A (e.g., number of satisfied users)

F(A) = |∪i∈A Si|
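The coverage function is easy to state in code. A minimal sketch with toy, hypothetical sets Si, checking the diminishing-returns property numerically:

```python
def coverage(A, sets):
    """F(A) = |union of S_i for i in A| -- monotone and submodular."""
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

# Toy instance: three sets of "users" (hypothetical data).
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}

# Gain of adding set 3 in the small context {1} ...
gain_small = coverage({1, 3}, sets) - coverage({1}, sets)
# ... is at least its gain in the larger context {1, 2}.
gain_large = coverage({1, 2, 3}, sets) - coverage({1, 2}, sets)
assert gain_small >= gain_large  # diminishing returns
```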
Closedness properties
Let F1, · · · , Fm be submodular functions on V and λ1, · · · , λm ≥ 0.
Then F(A) = ∑i λi Fi(A) is submodular!
Submodularity is closed under nonnegative linear combinations.
Extremely useful fact: Fθ(A) submodular ⇒ ∑θ P(θ)Fθ(A) submodular!
Multi-objective optimization: F1, · · · , Fm submodular, λi > 0 ⇒ ∑i λi Fi(A) submodular
Probabilistic set cover
Document coverage function:
    coverd(c) = probability that document d covers concept c, e.g., how strongly d covers c
    It can model how relevant concept c is for user u
Set coverage function:

coverA(c) = 1 − Πd∈A(1 − coverd(c))

    Probability that at least one document in A covers c
Objective:

max|A|≤k F(A) = ∑c wc coverA(c)

    wc are the concept weights
The objective function is submodular.
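This objective is straightforward to evaluate. A minimal sketch (the documents, concepts, and probabilities are hypothetical toy data):

```python
import math

def concept_cover(A, cover):
    """cover_A(c) = 1 - prod_{d in A} (1 - cover_d(c)) for every concept c."""
    concepts = {c for d in cover for c in cover[d]}
    return {c: 1.0 - math.prod(1.0 - cover[d].get(c, 0.0) for d in A)
            for c in concepts}

def F(A, cover, w):
    """Weighted probabilistic coverage objective sum_c w_c * cover_A(c)."""
    cov = concept_cover(A, cover)
    return sum(w[c] * cov[c] for c in w)

# Toy data: two documents, two concepts.
cover = {"d1": {"sports": 0.9}, "d2": {"sports": 0.5, "politics": 0.8}}
w = {"sports": 1.0, "politics": 2.0}
# Diminishing returns: adding d2 to {d1} gains no more than adding d2 to {}.
assert (F({"d1", "d2"}, cover, w) - F({"d1"}, cover, w)
        <= F({"d2"}, cover, w) - F(set(), cover, w))
```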
A model of Influence in Social Networks
Given a graph G = (V, E), each v ∈ V corresponds to a person; to each v we attach an activation function fv : 2V → [0, 1] depending only on its neighbors, i.e., fv(A) = fv(A ∩ Γ(v)).
Goal - Viral Marketing: find a small subset S ⊆ V of individuals to directly influence, and thus indirectly influence the greatest number of other individuals (via the social network G).
We define a function f : 2V → Z+ that models the ultimate influence of an initial set S of nodes based on the following iterative process: at each step, a given set of nodes S is activated, and we activate a new node v ∈ V\S if fv(S) ≥ U[0, 1] (where U[0, 1] is a uniform random number between 0 and 1).
It can be shown for many fv (including simple linear functions, and cases where fv is itself submodular) that f is submodular.
Example: Influence in social networks [Kempe, Kleinberg, Tardos KDD ’03]
Which nodes are most influential?
V = {Alice, Bob, Charlie, Dorothy, Eric, Fiona}
F(A) = expected number of people influenced by set A
Influence in social networks is submodular [Kempe, Kleinberg, Tardos KDD ’03]
Key idea: Flip coins c in advance → "live" edges
Fc(A) = people influenced under outcome c
F(A) = ∑c P(c) Fc(A) is submodular as well!
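The coin-flipping idea also gives a way to estimate F by sampling outcomes c. A minimal Monte Carlo sketch on a hypothetical three-node chain (the graph and edge probability are illustrative assumptions, not from the slides):

```python
import random

def live_edge_influence(A, edges, p=0.5, samples=2000, seed=0):
    """Monte Carlo estimate of F(A) = sum_c P(c) F_c(A): sample an outcome c
    by flipping each directed edge "live" with probability p, then count the
    nodes reachable from the seed set A along live edges."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        live = [e for e in edges if rng.random() < p]
        reached, frontier = set(A), list(A)
        while frontier:
            u = frontier.pop()
            for (a, b) in live:
                if a == u and b not in reached:
                    reached.add(b)
                    frontier.append(b)
        total += len(reached)
    return total / samples

# Tiny chain 1 -> 2 -> 3: the exact value is 1 + p + p^2 = 1.75 for p = 0.5.
est = live_edge_influence({1}, [(1, 2), (2, 3)])
assert 1.6 < est < 1.9
```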
The value of a friend
Let V be a group of individuals. How valuable to you is a given friend v ∈ V?
It depends on how many friends you have.
Given a group of friends S ⊆ V, can you valuate them with a function F(S), and how?
Let F(S) be the value of the set of friends S. Is a submodular or a supermodular function a good model?
Information and Summarization
Let V be a set of information-containing elements (V might be words, sentences, documents, web pages, or blogs; each v ∈ V is one element, so v might be a word, a sentence, a document, etc.). The total amount of information in V is measured by a function F(V), and for any given subset S ⊆ V the amount of information in S is given by F(S).
How informative is any given item v in different-sized contexts? Any such real-world information function would exhibit diminishing returns, i.e., the value of v decreases when it is considered in a larger context.
So a submodular function would likely be a good model.
Restriction
Restriction
If F(S) is submodular on V and W ⊆ V, then F′(S) = F(S ∩ W) is submodular.

Proof: Given A ⊆ B ⊆ V\{i}, we prove

F((A ∪ {i}) ∩ W) − F(A ∩ W) ≥ F((B ∪ {i}) ∩ W) − F(B ∩ W).

If i ∉ W, then the differences on both sides are zero. Suppose that i ∈ W; then (A ∪ {i}) ∩ W = (A ∩ W) ∪ {i} and (B ∪ {i}) ∩ W = (B ∩ W) ∪ {i}. We have A ∩ W ⊆ B ∩ W, so the submodularity of F yields

F((A ∩ W) ∪ {i}) − F(A ∩ W) ≥ F((B ∩ W) ∪ {i}) − F(B ∩ W).
Conditioning
Conditioning
If F(S) is submodular on V and W ⊆ V, then F′(S) = F(S ∪ W) is submodular.
Reflection
Reflection
If F(S) is submodular on V, then F′(S) = F(V\S) is submodular.

Proof: Since V\(A ∪ B) = (V\A) ∩ (V\B) and V\(A ∩ B) = (V\A) ∪ (V\B), we have

F(V\A) + F(V\B) ≥ F(V\(A ∪ B)) + F(V\(A ∩ B))
Convex aspects
Submodularity as a discrete analogue of convexity
Convex extension
Duality
Polynomial-time minimization!

A∗ = argminA⊆V F(A)

Many applications (computer vision, ML, · · · )
Concave aspects
Marginal gain: ∆F(s|A) = F({s} ∪ A) − F(A)
Submodular:

∀A ⊆ B, s ∉ B : F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)

Concave:

∀a ≤ b, s > 0 : g(a + s) − g(a) ≥ g(b + s) − g(b)
Proof of: ∀a ≤ b, s > 0 : g(a + s) − g(a) ≥ g(b + s) − g(b)

Note that both a + s and b lie in [a, b + s].
Apply the concavity of g to the points a ≤ a + s ≤ b + s:

g(a + s) ≥ (b − a)/(b + s − a) · g(a) + s/(b + s − a) · g(b + s)
⇐⇒ g(a + s) − g(a) ≥ −s/(b + s − a) · g(a) + s/(b + s − a) · g(b + s)

Apply the concavity of g to the points a ≤ b ≤ b + s:

g(b) ≥ s/(b + s − a) · g(a) + (b − a)/(b + s − a) · g(b + s)
⇐⇒ g(b + s) − g(b) ≤ −s/(b + s − a) · g(a) + s/(b + s − a) · g(b + s)

Combining the two displays gives g(a + s) − g(a) ≥ g(b + s) − g(b).
Submodularity and Concavity
Let m ∈ RX+ be a modular function, and g a concave function over R.
Define F(A) = g(m(A)). Then F(A) is submodular.

Proof: Given A ⊆ B ⊆ X\{v}, we have 0 ≤ a = m(A) ≤ b = m(B) and 0 ≤ s = m(v). For g concave, we have g(a + s) − g(a) ≥ g(b + s) − g(b), which implies

g(m(A) + m(v)) − g(m(A)) ≥ g(m(B) + m(v)) − g(m(B))
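This construction can be checked exhaustively on a small ground set. A minimal sketch with g = sqrt and toy weights (the weights are hypothetical), verifying Definition 1 over all pairs of subsets:

```python
import itertools
import math

# F(A) = g(m(A)) with g = sqrt (concave) and m modular via weights w.
# By the result above this should be submodular; verify Definition 1
# exhaustively on a tiny ground set.
w = {0: 1.0, 1: 2.0, 2: 0.5}
F = lambda A: math.sqrt(sum(w[i] for i in A))

V = set(w)
subsets = [set(s) for r in range(len(V) + 1)
           for s in itertools.combinations(V, r)]
for A, B in itertools.product(subsets, repeat=2):
    # F(A) + F(B) >= F(A | B) + F(A & B), up to float tolerance
    assert F(A) + F(B) >= F(A | B) + F(A & B) - 1e-12
```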
Maximum of submodular functions
Suppose F1(A) and F2(A) are submodular. Is F(A) = max(F1(A), F2(A)) submodular?
max(F1, F2) is not submodular in general!
Minimum of submodular functions
Well, maybe F(A) = min(F1(A), F2(A)) instead?
min(F1, F2) is not submodular in general!
Max - normalized
Given V, let c ∈ RV+ be a given fixed vector. Then F : 2V → R+, where

F(A) = maxj∈A cj,

is submodular and normalized (we take F(∅) = 0).
Proof: Since

max(maxj∈A cj, maxj∈B cj) = maxj∈A∪B cj

and

min(maxj∈A cj, maxj∈B cj) ≥ maxj∈A∩B cj,

we have

maxj∈A cj + maxj∈B cj ≥ maxj∈A∪B cj + maxj∈A∩B cj
Monotone difference of two functions
Let F and G both be submodular functions on subsets of V, and let (F − G)(·) be either monotone increasing or monotone decreasing. Then h : 2V → R defined by h(A) = min(F(A), G(A)) is submodular.

If h agrees with either F or G on both X and Y, the result follows since, e.g.,

h(X) + h(Y) = F(X) + F(Y) ≥ F(X ∪ Y) + F(X ∩ Y)
            ≥ min(F(X ∪ Y), G(X ∪ Y)) + min(F(X ∩ Y), G(X ∩ Y))

Otherwise, w.l.o.g., h(X) = F(X) and h(Y) = G(Y), giving

h(X) + h(Y) = F(X) + G(Y) ≥ F(X ∪ Y) + F(X ∩ Y) + G(Y) − F(Y)

Assume F − G is monotone increasing. Then F(X ∪ Y) + G(Y) − F(Y) ≥ G(X ∪ Y), giving

h(X) + h(Y) ≥ G(X ∪ Y) + F(X ∩ Y) ≥ h(X ∪ Y) + h(X ∩ Y)
Min
Let F : 2V → R be an increasing or decreasing submodular function and let k be a constant. Then the function h : 2V → R defined by

h(A) = min(k, F(A))

is submodular.

In general, the minimum of two submodular functions is not submodular. However, when wishing to maximize two monotone non-decreasing submodular functions F and G, we can define the function h : 2V → R as

h(A) = (1/2) (min(k, F(A)) + min(k, G(A)))

Then h is submodular, and h(A) ≥ k if and only if both F(A) ≥ k and G(A) ≥ k.
Outline
1 What is submodularity?
    Examples in recommendation sets
    Definition
2 Submodular maximization
3 Submodular minimization
4 Applications of submodular maximization
Submodular Maximization under a Cardinality Constraint

Problem Definition
Given a non-decreasing and normalized submodular function F : 2X → R+ on a finite ground set X with |X| = n, and an integer k ≤ n:

max F(A), s.t. |A| ≤ k

Greedy Algorithm
    A0 ← ∅, set i = 0
    While |Ai| < k:
        Choose s ∈ X maximizing F(Ai ∪ {s})
        Ai+1 ← Ai ∪ {s}; i ← i + 1
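The greedy algorithm is only a few lines against a value oracle. A minimal sketch on a toy coverage instance (the sets S are hypothetical):

```python
import itertools
import math

def greedy_max(F, X, k):
    """Greedy maximization of a monotone submodular F under |A| <= k,
    using only value-oracle calls F(A)."""
    A = set()
    for _ in range(k):
        # pick the element with the largest marginal gain
        s = max(X - A, key=lambda e: F(A | {e}) - F(A))
        A.add(s)
    return A

# Toy coverage instance: F(A) = |union of S_i, i in A|.
S = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5, 6, 7}}
F = lambda A: len(set().union(*(S[i] for i in A)))
A = greedy_max(F, set(S), k=2)

# Compare against the brute-force optimum over all pairs.
opt = max(F(set(C)) for C in itertools.combinations(S, 2))
assert F(A) >= (1 - 1 / math.e) * opt  # the (1 - 1/e) guarantee
```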
Greedy maximization is near-optimal
Theorem [Nemhauser, Fisher & Wolsey ’78]
For monotone submodular functions, the greedy algorithm gives a constant-factor approximation:

F(Agreedy) ≥ (1 − 1/e) F(A∗)    (1 − 1/e ∼ 63%)

The greedy algorithm gives a near-optimal solution!
For many submodular objectives this guarantee is the best possible unless P = NP.
It can also handle more complex constraints.
Contraction/Conditioning
Let F : 2X → R and A ⊆ X. Define FA(S) = F(A ∪ S) − F(A).
Lemma: If F is monotone and submodular, then FA is monotone, submodular, and normalized for any A.

Proof:
Monotone: let S ⊆ T; then FA(S) = F(A ∪ S) − F(A) ≤ F(A ∪ T) − F(A) = FA(T).
Submodular: let S, T ⊆ X:

FA(S) + FA(T) = F(S ∪ A) − F(A) + F(T ∪ A) − F(A)
             ≥ F(S ∪ T ∪ A) − F(A) + F((S ∩ T) ∪ A) − F(A)
             = FA(S ∪ T) + FA(S ∩ T)
Lemma
If F is normalized and submodular, and A ⊆ X, then there is j ∈ A such that F({j}) ≥ (1/|A|) F(A).

Proof. If A1 and A2 partition A, i.e., A = A1 ∪ A2 and A1 ∩ A2 = ∅, then

F(A1) + F(A2) ≥ F(A1 ∪ A2) + F(A1 ∩ A2) = F(A)

Applying this recursively, we get

∑j∈A F({j}) ≥ F(A)

Therefore, maxj∈A F({j}) ≥ (1/|A|) F(A).
Greedy maximization is near-optimal
Theorem [Nemhauser, Fisher & Wolsey ’78]
For monotone submodular functions, the greedy algorithm gives a constant-factor approximation:

F(Agreedy) ≥ (1 − 1/e) F(A∗)

Proof: Let Ai be the working set in the algorithm and A∗ the optimal solution.
We will show that the suboptimality F(A∗) − F(Ai) shrinks by a factor of (1 − 1/k) each iteration.
After k iterations, it has shrunk by a factor of (1 − 1/k)^k ≤ 1/e from its original value.
The algorithm chooses s ∈ X maximizing F(Ai ∪ {s}). Hence:

F(Ai+1) = F(Ai) + maxs (F(Ai ∪ {s}) − F(Ai)) = F(Ai) + maxj FAi({j})
By our lemmas, there is j ∈ A∗ (with |A∗| = k) such that

FAi({j}) ≥ (1/|A∗|) FAi(A∗)    (applying the lemma to FAi)
         = (1/k) (F(Ai ∪ A∗) − F(Ai))
         ≥ (1/k) (F(A∗) − F(Ai))

Therefore

F(A∗) − F(Ai+1) = F(A∗) − F(Ai) − maxj FAi({j})
               ≤ (1 − 1/k) (F(A∗) − F(Ai))

and, applying this k times,

F(A∗) − F(Ak) ≤ (1 − 1/k)^k (F(A∗) − F(∅)) ≤ (1/e) F(A∗)

so F(Ak) ≥ (1 − 1/e) F(A∗).
Scaling up the greedy algorithm [Minoux’78]
In round i + 1, having picked Ai = {s1, · · · , si},
pick si+1 = argmaxs F(Ai ∪ {s}) − F(Ai),
i.e., maximize the "marginal benefit" ∆(s|Ai):

∆(s|Ai) = F(Ai ∪ {s}) − F(Ai)

Key observation: Submodularity implies that marginal benefits can never increase!
"Lazy" greedy algorithm [Minoux’78]
Lazy greedy algorithm:
    First iteration as usual
    Keep an ordered list of marginal benefits ∆i from the previous iteration
    Re-evaluate ∆i only for the top element
    If ∆i stays on top, use it; otherwise re-sort

Note: Very easy to compute online bounds, lazy evaluations, etc. [Leskovec, Krause et al. ’07]
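The ordered list of stale gains is naturally a max-heap. A minimal sketch of lazy greedy on the same kind of toy coverage instance (the sets S are hypothetical):

```python
import heapq

def lazy_greedy(F, X, k):
    """Lazy greedy [Minoux '78]: cache marginal gains in a max-heap; since
    submodularity says gains never increase, only the top entry ever needs
    to be re-evaluated."""
    A, fA = set(), F(set())
    heap = [(-(F({e}) - fA), e) for e in X]   # negate gains for a max-heap
    heapq.heapify(heap)
    while len(A) < k and heap:
        _, e = heapq.heappop(heap)
        gain = F(A | {e}) - fA                # refresh the possibly-stale gain
        if not heap or gain >= -heap[0][0]:   # still on top: take it
            A.add(e)
            fA += gain
        else:                                 # otherwise push back, re-sorted
            heapq.heappush(heap, (-gain, e))
    return A

# Toy coverage instance: F(A) = |union of S_i, i in A|.
S = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5, 6, 7}}
F = lambda A: len(set().union(*(S[i] for i in A)))
A = lazy_greedy(F, set(S), k=2)
assert F(A) == 7   # same value as plain greedy, fewer oracle calls
```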
Empirical improvements [Leskovec, Krause et al’06]
Outline
1 What is submodularity?
    Examples in recommendation sets
    Definition
2 Submodular maximization
3 Submodular minimization
4 Applications of submodular maximization
Optimizing Submodular Functions
As our examples suggest, optimization problems involvingsubmodular functions are very common
These can be classified on two axes: constrained/unconstrainedand maximization/minimization
                 Maximization                          Minimization
Unconstrained    NP-hard; 1/2 approximation            Polynomial time via convex opt
Constrained      Usually NP-hard; 1 − 1/e (monotone,   Usually NP-hard to approximate;
                 matroid), O(1) ("nice" constraints)   few easy special cases

Representation
In order to generalize all our examples, algorithmic results are often posed in the value oracle model. Namely, we only assume we have access to a subroutine evaluating F(S).
Problem Definition
Given a submodular function F : 2X → R on a finite ground set X,

min F(S)  s.t. S ⊆ X

We denote n = |X|.
We assume F(S) is a rational number with at most b bits.
Representation: in order to generalize all our examples, algorithmic results are often posed in the value oracle model. Namely, we only assume we have access to a subroutine evaluating F(S) in constant time.

Goal
An algorithm which runs in time polynomial in n and b.
Some more notations
E = {1, 2, . . . , n}
RE = {x = (xj ∈ R : j ∈ E)}
RE+ = {x ∈ RE : x ≥ 0}
Any vector x ∈ RE can be treated as a normalized modular function, and vice versa. That is,

x(A) = ∑a∈A xa.

Note that x is said to be normalized since x(∅) = 0.
Given A ⊆ E, define the vector 1A ∈ RE+ by

1A(j) = 1 if j ∈ A, and 1A(j) = 0 if j ∉ A.

Given a modular function x ∈ RE, we can write x(A) in a variety of ways, e.g., x(A) = x · 1A = ∑i∈A xi.
Continuous Extensions of a Set Function
A set function F on X = {1, . . . , n} can be thought of as a mapfrom the vertices {0, 1}n of the n-dimensional hypercube to thereal numbers.
Extension of a Set FunctionGiven a set function F : {0, 1}n → R, an extension of F to thehypercube [0, 1]n is a function g : [0, 1]n → R satisfying g(x) = F(x) forevery x ∈ {0, 1}n.
Minimizing F over subsets is then minimizing over hypercube vertices:

minA⊆X F(A) = minw∈{0,1}n F(w), with F(1A) = F(A) for all A ⊆ X
Choquet integral - Lovász extension
Subsets may be identified with elements of {0, 1}n
Given any set function F and w such that wj1 ≥ . . . ≥ wjn, define

f(w) = ∑_{k=1}^{n} wjk [F({j1, . . . , jk}) − F({j1, . . . , jk−1})]
     = ∑_{k=1}^{n−1} (wjk − wjk+1) F({j1, . . . , jk}) + wjn F({j1, . . . , jn})

If w = 1A, then f(w) = F(A) =⇒ f is an extension of F from {0, 1}n to Rn
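The sorting-and-telescoping definition translates directly into code. A minimal sketch with a toy submodular F (the coverage sets are hypothetical), checking the extension property on indicator vectors:

```python
def lovasz(F, w):
    """Lovász extension: sort coordinates decreasingly and accumulate
    f(w) = sum_k w_{j_k} [F({j_1..j_k}) - F({j_1..j_{k-1}})]."""
    order = sorted(range(len(w)), key=lambda j: -w[j])
    f, prefix, prev = 0.0, set(), F(frozenset())
    for j in order:
        prefix.add(j)
        val = F(frozenset(prefix))
        f += w[j] * (val - prev)
        prev = val
    return f

# Toy submodular F on ground set {0, 1}: coverage of two overlapping sets.
S = {0: {"a", "b"}, 1: {"b", "c"}}
F = lambda A: len(set().union(*(S[i] for i in A)))

# On indicator vectors the extension agrees with F.
assert lovasz(F, [1.0, 0.0]) == F(frozenset({0}))
assert lovasz(F, [1.0, 1.0]) == F(frozenset({0, 1}))
```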
Choquet integral - Lovász extension, example: p = 2
If w1 ≥ w2: f(w) = F({1})w1 + [F({1, 2}) − F({1})]w2
If w1 ≤ w2: f(w) = F({2})w2 + [F({1, 2}) − F({2})]w1
The level set {w ∈ R2 : f(w) = 1} is displayed in blue in the original figure.
Compact formulation: f(w) = [F({1, 2}) − F({1}) − F({2})] min(w1, w2) + F({1})w1 + F({2})w2
Links with convexity
Theorem (Lovász, 1982)F is submodular if and only if f is convex
The proof requires the submodular and base polyhedra:
Submodular polyhedron: P(F) = {s ∈ Rn : ∀A ⊆ V, s(A) ≤ F(A)}
Base polyhedron: B(F) = P(F) ∩ {s : s(V) = F(V)}
Submodular and base polyhedra
P(F) has non-empty interior.
Many facets (up to 2n), many extreme points (up to n!).
Fundamental property (Edmonds, 1970): If F is submodular, maximizing linear functions over P(F) may be done by a "greedy algorithm":
    Let w ∈ Rn+ be such that wj1 ≥ . . . ≥ wjn
    Let sjk = F({j1, . . . , jk}) − F({j1, . . . , jk−1}) for k ∈ {1, . . . , n}
    Then

f(w) = maxs∈P(F) w>s = maxs∈B(F) w>s

Both maxima are attained at the s defined above.
Proofs: pages 41-44 in http://bicmr.pku.edu.cn/~wenzw/bigdata/submodular_fbach_mlss2012.pdf
Links with convexity
Theorem (Lovász, 1982)F is submodular if and only if f is convex
If F is submodular, f is a maximum of linear functions (f(w) = maxs∈P(F) w>s), so f is convex.
If f is convex, let A, B ⊆ V. The vector 1A∪B + 1A∩B = 1A + 1B has components equal to 0 (on V\(A ∪ B)), 2 (on A ∩ B), and 1 (on A∆B = (A\B) ∪ (B\A)).
Thus f(1A∪B + 1A∩B) = F(A ∪ B) + F(A ∩ B), by writing out f(1A∪B + 1A∩B) from the definition of f(w).
By homogeneity and convexity, f(1A + 1B) ≤ f(1A) + f(1B) = F(A) + F(B), and thus F is submodular.
Links with convexity
Theorem (Lovász, 1982)
If F is submodular, then

minA⊆V F(A) = minw∈{0,1}n f(w) = minw∈[0,1]n f(w)

Since f is an extension of F,

minA⊆V F(A) = minw∈{0,1}n f(w) ≥ minw∈[0,1]n f(w)

Any w ∈ [0, 1]n can be decomposed as w = ∑_{i=1}^{m} λi 1Bi, where B1 ⊆ . . . ⊆ Bm = V, λ ≥ 0 and λ(V) ≤ 1.
Since minA⊆V F(A) ≤ 0 (as F(∅) = 0),

f(w) = ∑_{i=1}^{m} λi F(Bi) ≥ ∑_{i=1}^{m} λi minA⊆V F(A) ≥ minA⊆V F(A)

Thus minw∈[0,1]n f(w) ≥ minA⊆V F(A).
Links with convexity
For any w ∈ [0, 1]n, sort wj1 ≥ . . . ≥ wjn. Find λ such that

∑_{k=1}^{n} λjk = wj1, ∑_{k=2}^{n} λjk = wj2, . . . , λjn = wjn,

and set B1 = {j1}, B2 = {j1, j2}, . . . , Bn = {j1, j2, . . . , jn}.
Then we have w = ∑_{i=1}^{n} λi 1Bi, where B1 ⊆ . . . ⊆ Bn = V, λ ≥ 0 and λ(V) = ∑i∈V λi ≤ 1.
Submodular function minimization
Let F : 2V → R be a submodular function (such that F(∅) = 0).
Convex duality:

minA⊆V F(A) = minw∈[0,1]n f(w)
            = minw∈[0,1]n maxs∈B(F) w>s
            = maxs∈B(F) minw∈[0,1]n w>s = maxs∈B(F) s−(V)

where s−(V) = ∑j min(sj, 0).
Submodular function minimization
Convex optimization
If F is submodular, then

minA⊆V F(A) = minw∈{0,1}n f(w) = minw∈[0,1]n f(w)

Use projected subgradient descent to minimize f on [0, 1]n:
    Iteration: wt = Π[0,1]n (wt−1 − (C/√t) st), where st ∈ ∂f(wt−1)
    Since f(w) = maxs∈B(F) w>s, the greedy maximizer is a subgradient
Standard convergence results from convex optimization:

f(wt) − minw∈[0,1]n f(w) ≤ C/√t
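The whole minimization pipeline (Lovász extension, Edmonds' greedy subgradient, projection onto [0, 1]n, rounding by level sets) fits in a short routine. A minimal sketch; the test instance F(A) = √|A| − c(A), with hypothetical toy costs c, is submodular as a concave-of-modular minus a modular function:

```python
import math

def minimize_submodular(F, n, iters=300):
    """Projected subgradient descent on the Lovász extension over [0,1]^n.
    A subgradient at w is the greedy/Edmonds vertex of B(F):
    s_{j_k} = F({j_1..j_k}) - F({j_1..j_{k-1}}) with w sorted decreasingly.
    Candidate sets are read off as level sets of the iterate."""
    w = [0.5] * n
    best_set, best = frozenset(), F(frozenset())
    for t in range(1, iters + 1):
        # Edmonds' greedy subgradient
        order = sorted(range(n), key=lambda j: -w[j])
        s, prefix, prev = [0.0] * n, set(), F(frozenset())
        for j in order:
            prefix.add(j)
            val = F(frozenset(prefix))
            s[j] = val - prev
            prev = val
        # round: try every level set of the current iterate
        for th in set(w):
            A = frozenset(j for j in range(n) if w[j] >= th)
            if F(A) < best:
                best_set, best = A, F(A)
        # projected subgradient step with step size 1/sqrt(t)
        step = 1.0 / t ** 0.5
        w = [min(1.0, max(0.0, w[j] - step * s[j])) for j in range(n)]
    return best_set, best

# Toy instance: F(A) = sqrt(|A|) - c(A); by brute force the minimizer
# is {0} with value 1 - 1.2 = -0.2.
c = [1.2, 0.2, 0.2]
F = lambda A: math.sqrt(len(A)) - sum(c[j] for j in A)
A, val = minimize_submodular(F, 3)
assert A == frozenset({0}) and abs(val - (-0.2)) < 1e-9
```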
Outline
1 What is submodularity?
    Examples in recommendation sets
    Definition
2 Submodular maximization
3 Submodular minimization
4 Applications of submodular maximization
![Page 70: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/70.jpg)
70/84
Question
I have 10 minutes. Which blogs should I read to be most up to date?
[Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance '07]
![Page 71: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/71.jpg)
71/84
Detecting Cascades in the Blogosphere
Which blogs should we read to learn about big cascades early?
![Page 72: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/72.jpg)
72/84
Reward function is submodular
Consider cascade i:
  F_i(s_k) = benefit from reading blog s_k in cascade i
  F_i(A) = max_{s_k ∈ A} F_i(s_k)
⇒ F_i is submodular
Overall objective:
F(A) = (1/m) Σ_{i=1}^m F_i(A)

⇒ F is submodular (an average of submodular functions is submodular)
⇒ Can use the greedy algorithm to (approximately) solve max_{|A| ≤ k} F(A)!
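The greedy algorithm referenced here can be sketched directly; the blog names and per-cascade benefit values below are made up for illustration:

```python
def greedy_max(F, V, k):
    """Greedy for monotone submodular maximization under |A| <= k;
    guaranteed a (1 - 1/e) fraction of the optimum (Nemhauser et al.)."""
    A = set()
    for _ in range(k):
        best = max((s for s in V if s not in A), key=lambda s: F(A | {s}))
        A.add(best)
    return A

# Hypothetical per-cascade benefits: benefit[i][s] = F_i({s});
# F_i(A) = best benefit among chosen blogs, F = average over cascades
benefit = [{"blogA": 0.9, "blogB": 0.3, "blogC": 0.5},
           {"blogA": 0.1, "blogB": 0.8, "blogC": 0.4}]
F = lambda A: sum(max((b[s] for s in A), default=0.0) for b in benefit) / len(benefit)

chosen = greedy_max(F, ["blogA", "blogB", "blogC"], k=2)  # selects blogB, then blogA
```

Note the greedy rule only ever needs marginal-gain evaluations F(A ∪ {s}), which is what makes the lazy evaluations on the next slide possible.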
![Page 73: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/73.jpg)
73/84
Performance on Blog selection
The submodular formulation outperforms heuristics.
Lazy greedy gives a 700× speedup.
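The lazy greedy speedup exploits submodularity: an element's marginal gain can only shrink as the solution grows, so stale gains kept in a max-heap are valid upper bounds, and most elements are never re-evaluated. A sketch (the coverage function F is a hypothetical example):

```python
import heapq

def lazy_greedy(F, V, k):
    """Lazy greedy (Minoux): if a re-evaluated gain still tops the heap of
    stale upper bounds, it must be the true argmax of the greedy step."""
    A, fA = set(), F(set())
    heap = [(-(F({s}) - fA), s) for s in V]   # (-gain upper bound, element)
    heapq.heapify(heap)
    while len(A) < k and heap:
        neg_bound, s = heapq.heappop(heap)
        gain = F(A | {s}) - fA                # refresh the stale bound
        if not heap or gain >= -heap[0][0]:
            A.add(s)                          # still the best: commit
            fA += gain
        else:
            heapq.heappush(heap, (-gain, s))  # reinsert with updated bound
    return A

# Hypothetical monotone submodular coverage function
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
def F(A):
    covered = set()
    for s in A:
        covered |= cover[s]
    return float(len(covered))
```

The output is identical to plain greedy; only the number of F-evaluations changes, which is where speedups like the 700× above come from.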
![Page 74: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/74.jpg)
74/84
Application: Network inference
How can we learn who influences whom?
![Page 75: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/75.jpg)
75/84
Inferring diffusion networks[Gomez Rodriguez, Leskovec, Krause ACM TKDE 2012]
Given traces of influence, we wish to infer a sparse directed network G = (V, E).
⇒ Formulate as an optimization problem:

E* = arg max_{|E| ≤ k} F(E)
![Page 76: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/76.jpg)
76/84
Estimation problem
Many influence trees T are consistent with the data.
For each cascade C_i, model P(C_i | T).
Find a sparse graph that maximizes the likelihood of all observed cascades:

F(E) = Σ_i log max_{tree T ⊆ E} P(C_i | T)

⇒ This log-likelihood is monotone submodular in the selected edges E.
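Assuming edge transmissions are independent (as in the NetInf line of work), the inner maximization over trees decomposes: the best propagation tree lets every non-root infected node pick its most likely earlier-infected parent. A hedged sketch, where p[(i, j)] is a hypothetical precomputed transmission likelihood:

```python
import math

def cascade_loglik(E, times, p):
    """log max_{tree T ⊆ E} P(C | T): under independent edge transmissions,
    each non-root infected node independently picks its most likely parent
    among earlier-infected nodes (times: node -> infection time)."""
    nodes = sorted(times, key=times.get)      # infection order; nodes[0] = root
    ll = 0.0
    for j in nodes[1:]:
        parents = [p[(i, j)] for i in nodes
                   if times[i] < times[j] and (i, j) in E]
        if not parents:
            return -math.inf                  # no tree in E explains node j
        ll += math.log(max(parents))
    return ll

def F(E, cascades, p):
    # objective from the slide: sum over cascades of best-tree log-likelihoods
    return sum(cascade_loglik(E, C, p) for C in cascades)
```

Because each node's term is a log of a max over its incoming edges, adding edges to E can only increase F, and gains diminish — the monotone submodularity claimed above.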
![Page 77: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/77.jpg)
77/84
Evaluation: Synthetic networks
Performance does not depend on the network structure:
  Synthetic networks: Forest Fire, Kronecker, etc.
  Transmission time distributions: Exponential, Power law
Break-even point of > 90%
![Page 78: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/78.jpg)
78/84
Inferred Diffusion Network[Gomez Rodriguez, Leskovec, Krause ACM TKDE 2012]
Actual network inferred from 172 million articles from 1 million news sources
![Page 79: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/79.jpg)
79/84
Diffusion Network (small part)
![Page 80: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/80.jpg)
80/84
Application: Document summarization[Lin & Bilmes’11]
Which sentences should we select to best summarize a document?
![Page 81: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/81.jpg)
81/84
Marginal gain of a sentence
Many natural notions of “document coverage” are submodular[Lin & Bilmes’11]
![Page 82: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/82.jpg)
82/84
Relevance of a summary
F(S) = R(S) + λ D(S)
        ↑         ↑
   Relevance   Diversity

R(S) = Σ_i min{C_i(S), α C_i(V)},  where  C_i(S) = Σ_{j ∈ S} ω_{i,j}

C_i(S): how well sentence i is "covered" by S
ω_{i,j}: similarity between sentences i and j
![Page 83: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/83.jpg)
83/84
Diversity of a summary
D(S) = Σ_{k=1}^K √( Σ_{j ∈ P_k ∩ S} r_j ),   r_j = (1/N) Σ_i ω_{i,j}

(P_1, …, P_K partition V into clusters of similar sentences)

r_j: relevance of sentence j to the document
ω_{i,j}: similarity between sentences i and j
Can be made query-specific; multi-resolution; etc.
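The two terms combine into a single objective that can be sketched directly; the similarity matrix, partition, and parameter values below are made-up illustrations, not from Lin & Bilmes:

```python
import math

def summary_score(S, V, omega, parts, lam=1.0, alpha=0.3):
    """F(S) = R(S) + lam * D(S) in the spirit of Lin & Bilmes '11.
    omega[i][j]: similarity of sentences i, j; parts: partition P_1..P_K."""
    N = len(V)
    C = lambda i, A: sum(omega[i][j] for j in A)            # coverage of i by A
    R = sum(min(C(i, S), alpha * C(i, V)) for i in V)       # saturated relevance
    r = lambda j: sum(omega[i][j] for i in V) / N           # relevance of j
    D = sum(math.sqrt(sum(r(j) for j in P & S)) for P in parts)  # diversity
    return R + lam * D

# Hypothetical 3-sentence document with symmetric similarities
omega = [[1.0, 0.5, 0.1],
         [0.5, 1.0, 0.2],
         [0.1, 0.2, 1.0]]
score = summary_score({0, 2}, [0, 1, 2], omega, parts=[{0, 1}, {2}])
```

The square root in D rewards covering many clusters over piling up sentences in one, which is what makes the term encourage diversity while staying submodular.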
![Page 84: Submodular Function Optimization](https://reader031.vdocuments.mx/reader031/viewer/2022012502/617bd23ec041b85a3629cd59/html5/thumbnails/84.jpg)
84/84
Summary
Many problems of recommending sets can be cast as submodular maximization.
The greedy algorithm gives a near-optimal ((1 − 1/e)-approximate) set of size k.
Lazy evaluations can be used to speed up the greedy algorithm.
Approximate submodular maximization is possible under a variety of constraints:
  Matroid
  Knapsack
  Multiple matroid and knapsack constraints
  Path constraints (submodular orienteering)
  Connectedness (submodular Steiner)
  Robustness (minimax)