lecture 16: earth-mover distance - mit
TRANSCRIPT
Administrivia, Plan
β’ Administrivia:
β NO CLASS next Tuesday 11/3 (holiday)
β’ Plan:
β Earth-Mover Distance
β’ Scriber?
2
Earth-Mover Distance
β’ Definition:
β Given two sets π΄, π΅ of points in a metric space
β πΈππ·(π΄, π΅) = min cost bipartite matching between π΄ and π΅
β’ Which metric space?
β Can be plane, β2, β1β¦
β’ Applications in image vision
Images courtesy of Kristen Grauman
Embedding EMD into β1
β’ Why β1?
β’ At least as hard as β1
β Can embed 0,1 π into EMD with distortion 1
β’ β1 is richer than β2
β’ Will focus on integer grid Ξ 2:
4
Embedding EMD into β1
β’ Theorem: Can embed EMD over Ξ 2 into β1 with distortion π log Ξ . In fact, will construct a randomized π: 2 Ξ 2
β β1 such that:β for any π΄, π΅ β Ξ 2:
πΈππ· π΄, π΅ β€ π¬ ||π π΄ β π π΅ ||1 β€ π log Ξ β πΈππ·(π΄, π΅)
β time to embed a set of π points: π π logΞ .
β’ Consequences:β Nearest Neighbor Search: π(π logΞ ) approximation with
π(π π1+1/π) space, and π(π1/π β π log Ξ) query time.
β Computation: π(logΞ) approximation in π π logΞ timeβ’ Best known: 1 + π approximation in π π time [ASβ12]
[Charikarβ02, Indyk-Thaperβ03]
What if π΄ β |π΅| ?
β’ Suppose:
β π΄ = π
β π΅ = π < π
β’ Define
πΈππ·β π΄, π΅ = Ξ π β π + ππππ΄β²,π
πβπ΄β²
π(π, π π )
where
π΄β² ranges over all subsets of π΄ of size π
π: π΄β² β π΅ ranges over all 1-to-1 mappings
For optimal π΄β², call π β π΄\π΄β² unmatched
6
Embedding EMD over small gridβ’ Suppose = 3
β’ π(π΄) has nine coordinates, counting # points in each integer point
β π(π΄) = (2,1,1,0,0,0,1,0,0)
β π(π΅) = (1,1,0,0,2,0,0,0,1)
β’ Claim: 2 2 distortion embedding
High level embedding
β’ Set in Ξ 2 box
β’ Embedding of set π΄:β take a quad-tree
β’ grid of cell size Ξ/3β’ partition each cell in 3x3
β’ recurse until of size 3x3
β randomly shift it
β Each cell gives a
coordinate:
π (π΄)π=#points in the
cell π
β’ Want to proveπ¬ π π΄ β π π΅
1β πΈππ·(π΄, π΅)
8
2 2
1 0
02 11
1
0 00
0 0 0
0
0 2
21
π π¨ = β¦2210β¦0002β¦0011β¦0100β¦0000β¦
π π© = β¦1202β¦0100β¦0011β¦0000β¦1100β¦
Main idea: intuition
β’ Decompose EMD over []2 into EMDs over smaller grids
β’ Recursively reduce to = π(1)
+β
Decomposition Lemma
β’ For randomly-shifted cut-grid πΊ of side length π, will prove:1) πΈππ·(π΄, π΅) β€ πΈππ·π(π΄1, π΅1) + πΈππ·π(π΄2, π΅2) + β―
+ π β πΈππ·Ξ/π(π΄πΊ, π΅πΊ)
2) πΈππ·Ξ π΄, π΅ β₯1
3π¬[πΈππ·π π΄1, π΅1 + πΈππ·π π΄2, π΅2 + β―]
3) πΈππ·Ξ π΄, π΅ β₯ π¬[π β πΈππ·Ξ/π(π΄πΊ , π΅πΊ)]
β’ The distortion will
follow by applying the lemma
recursively to (AG,BG)
/π
π
1 (lower bound)β’ Claim 1: for a randomly-shifted cut-grid πΊ of side length π:
πΈππ·(π΄, π΅) β€ πΈππ·π(π΄1, π΅1) + πΈππ·π(π΄2, π΅2) + β―+ π β πΈππ·Ξ/π(π΄πΊ , π΅πΊ)
β’ Construct a matching π for πΈππ·Ξ π΄, π΅ from the matchings on RHS as follows
β’ For each ππ΄ (suppose ππ΄π) it is either:1) matched in πΈππ·(π΄π , π΅π) to some ππ΅π (if π β π΄πβ²)
β’ then π π = π
2) or π β π΄πβ², and then it is matched
in πΈππ·(π΄πΊ , π΅πΊ) to some ππ΅π (π β π)β’ then π π = π
β’ Cost? 1) paid by πΈππ·(π΄π , π΅π)2) Move π to center ()
β’ Charge to πΈππ·(π΄π , π΅π)
Move from cell π to cell πβ’ Charge π to πΈππ·(π΄πΊ , π΅πΊ)
β’ If π΄ > |π΅|, extra |π΄| β |π΅|
pay π β Ξ
π= Ξ on LHS & RHS
/π
π
2 & 3 (upper bound)
β’ Claims 2,3: for a randomly-shifted cut-grid πΊ of side length π, we have:
2) πΈππ·Ξ π΄, π΅ β₯1
3π¬[πΈππ·π π΄1, π΅1 + πΈππ·π π΄2, π΅2 + β―]
3) πΈππ·Ξ π΄, π΅ β₯ π¬[π β πΈππ·Ξ/π(π΄πΊ , π΅πΊ)]
β’ Fix a matching π minimizing πΈππ·Ξ(π΄, π΅)β Will construct matchings for each EMD on RHS
β’ Uncut pairs π, π β π are matched in respective (π΄, π΅)β’ Cut pairs π, π β π :
β are unmatched in their mini-grids
β are matched in (π΄πΊ , π΅πΊ)
3: Cost
β’ Claim 2: β’ 3 β πΈππ·Ξ π΄,π΅ β₯ π¬[πΈππ·π π΄1, π΅1 + πΈππ·π π΄2, π΅2 + β―]
β’ Uncut pairs (π, π) are matched in respective (π΄π , π΅π)β Total contribution from uncut pairs β€ πΈππ·Ξ(π΄, π΅)
β’ Consider a cut pair (π, π) at distance π β π = (ππ₯ , ππ¦)β (π, π) can contribute to RHS as they may be unmatched in their
own mini-grids
β Pr[(π, π) cut] = 1 β 1 βππ₯
π +1 β
ππ¦
π +β€
ππ₯
π+
ππ¦
πβ€
1
π||π β π||2
β Expected contribution of (π, π) to RHS:β€Pr[(π, π) cut] β 2π β€ 2 π β π
2
β Total expected cost contributed to RHS:2 β πΈππ·Ξ(π΄, π΅)
β’ Total (cut & uncut pairs): 3 β πΈππ·Ξ(π΄, π΅)
ππ₯
π
3: Cost
β’ Claim:
β πΈππ·Ξ π΄, π΅ β₯ π¬[π β πΈππ·Ξ/π(π΄πΊ , π΅πΊ)]
β’ Uncut pairs: contribute zero to RHS!
β’ Cut pair: π, π β π with π β π = (ππ₯, ππ¦)
β if ππ₯ = π₯π + ππ, and ππ¦ = π¦π + ππ¦ , then
β expected cost contribution to π β πΈππ·Ξ/π(π΄πΊ , π΅πΊ):
β€ π₯ +ππ₯
πβ π + π¦ +
ππ¦
πβ π = ππ₯ + ππ¦ = π β π
2
β’ Total expected cost β€ πΈππ·Ξ(π΄, π΅)
14
ππ₯
π π π
Recurse on decompositionβ’ For randomly-shifted cut-grid πΊ of side length π, we have:
1) πΈππ·(π΄, π΅) β€ πΈππ·π(π΄1, π΅1) + πΈππ·π(π΄2, π΅2) + β―
+ π β πΈππ·Ξ/π(π΄πΊ, π΅πΊ)
2) πΈππ·Ξ π΄, π΅ β₯1
3π¬[πΈππ·π π΄1, π΅1 + πΈππ·π π΄2, π΅2 + β―]
3) πΈππ·Ξ π΄, π΅ β₯ π¬[π β πΈππ·Ξ/π(π΄πΊ , π΅πΊ)]
β’ We applying decomposition recursively for π = 3β Choose randomly-shifted cut-grid πΊ1 on []2
β Obtain many grids [3]2, and a big grid [/3]2
β Then choose randomly-shifted cut-grid πΊ2 on [/3]2
β Obtain more grids [3]2, and another big grid [/9]2
β Then choose randomly-shifted cut-grid πΊ3 on [/9]2
β β¦
β’ Then, embed each of the small grids [3]2 into β1, using π(1)distortion embedding, and concatenate the embeddingsβ Each 3 2 grid occupies 9 coordinates on β1 embedding
15
Proving recursion worksβ’ Claim: embedding contracts distances by π 1 :
πΈππ·(π΄, π΅) β€
β€ βπ πΈππ·π π΄π, π΅π + π β πΈππ·Ξ/π(π΄πΊ1, π΅πΊ1
)
β€ βπ πΈππ·π π΄π, π΅π + πβπ πΈππ·π π΄πΊ1,π, π΅πΊ1,π
+π β πΈππ· Ξ
π2π΄πΊ2
, π΅πΊ2
β€ β¦β€ sum of πΈππ·3 costs of 3 Γ 3 instances
β€1
2 2||π π΄ β π π΅ ||1
β’ Claim: embedding distorts distances by π log Ξ in expectation:3 logπ Ξ β πΈππ·(π΄, π΅)
β₯ 3 β πΈππ·(π΄, π΅) + 3 logπΞ
πβ πΈππ·Ξ(π΄, π΅)
β₯ π[ βπ πΈππ·π(π΄π, π΅π) + 3 logπΞ
πβ π β πΈππ·Ξ/π(π΄πΊ1
, π΅πΊ1) ]
β₯ β―β₯ sum of πΈππ·3 costs of 3 Γ 3 instances
β₯ ||π π΄ β π π΅ ||1
Final theoremβ’ Theorem: can embed EMD over [Ξ]2 into β1
with π(log Ξ) distortion in expectation.
β’ Notes:β Dimension required: π(Ξ2), but a set π΄ of size π
maps to a vector that has only π(π β log Ξ) non-zero coordinates.
β Time: can compute in π(π β log )β By Markovβs, itβs π(log Ξ) distortion with 90%
probability
β’ Applications:β Can compute πΈππ·(π΄, π΅) in time π(π β log Ξ)β NNS: π(π β log Ξ) approximation, with π(π1+1/π β
π ) space, and π(π1/π β π β log Ξ) query time.