lecture 16: earth-mover distance - mit

17
1 1 Lecture 16: Earth-Mover Distance COMS E6998-9 F15

Upload: others

Post on 08-Feb-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

11

Lecture 16:

Earth-Mover Distance

COMS E6998-9 F15

Administrivia, Plan

β€’ Administrivia:

– NO CLASS next Tuesday 11/3 (holiday)

β€’ Plan:

– Earth-Mover Distance

β€’ Scriber?

2

Earth-Mover Distance

β€’ Definition:

– Given two sets 𝐴, 𝐡 of points in a metric space

– 𝐸𝑀𝐷(𝐴, 𝐡) = min cost bipartite matching between 𝐴 and 𝐡

β€’ Which metric space?

– Can be plane, β„“2, β„“1…

β€’ Applications in image vision

Images courtesy of Kristen Grauman

Embedding EMD into β„“1

β€’ Why β„“1?

β€’ At least as hard as β„“1

– Can embed 0,1 𝑑 into EMD with distortion 1

β€’ β„“1 is richer than β„“2

β€’ Will focus on integer grid Ξ” 2:

4

Embedding EMD into β„“1

β€’ Theorem: Can embed EMD over Ξ” 2 into β„“1 with distortion 𝑂 log Ξ” . In fact, will construct a randomized 𝑓: 2 Ξ” 2

β†’ β„“1 such that:– for any 𝐴, 𝐡 βŠ‚ Ξ” 2:

𝐸𝑀𝐷 𝐴, 𝐡 ≀ 𝑬 ||𝑓 𝐴 βˆ’ 𝑓 𝐡 ||1 ≀ 𝑂 log Ξ” β‹… 𝐸𝑀𝐷(𝐴, 𝐡)

– time to embed a set of 𝑠 points: 𝑂 𝑠 logΞ” .

β€’ Consequences:– Nearest Neighbor Search: 𝑂(𝑐 logΞ” ) approximation with

𝑂(𝑠𝑛1+1/𝑐) space, and 𝑂(𝑛1/𝑐 β‹… 𝑠 log Ξ”) query time.

– Computation: 𝑂(logΞ”) approximation in 𝑂 𝑠 logΞ” timeβ€’ Best known: 1 + πœ– approximation in 𝑂 𝑠 time [AS’12]

[Charikar’02, Indyk-Thaper’03]

What if 𝐴 β‰  |𝐡| ?

β€’ Suppose:

– 𝐴 = π‘Ž

– 𝐡 = 𝑏 < π‘Ž

β€’ Define

πΈπ‘€π·βˆ† 𝐴, 𝐡 = Ξ” π‘Ž βˆ’ 𝑏 + π‘šπ‘–π‘›π΄β€²,πœ‹

π‘Žβˆˆπ΄β€²

𝑑(π‘Ž, πœ‹ π‘Ž )

where

𝐴′ ranges over all subsets of 𝐴 of size 𝑏

πœ‹: 𝐴′ β†’ 𝐡 ranges over all 1-to-1 mappings

For optimal 𝐴′, call π‘Ž ∈ 𝐴\𝐴′ unmatched

6

Embedding EMD over small gridβ€’ Suppose = 3

β€’ 𝑓(𝐴) has nine coordinates, counting # points in each integer point

– 𝑓(𝐴) = (2,1,1,0,0,0,1,0,0)

– 𝑓(𝐡) = (1,1,0,0,2,0,0,0,1)

β€’ Claim: 2 2 distortion embedding

High level embedding

β€’ Set in Ξ” 2 box

β€’ Embedding of set 𝐴:– take a quad-tree

β€’ grid of cell size Ξ”/3β€’ partition each cell in 3x3

β€’ recurse until of size 3x3

– randomly shift it

– Each cell gives a

coordinate:

𝑓 (𝐴)𝑐=#points in the

cell 𝑐

β€’ Want to prove𝑬 𝑓 𝐴 βˆ’ 𝑓 𝐡

1β‰ˆ 𝐸𝑀𝐷(𝐴, 𝐡)

8

2 2

1 0

02 11

1

0 00

0 0 0

0

0 2

21

𝑓 𝑨 = …2210…0002…0011…0100…0000…

𝑓 𝑩 = …1202…0100…0011…0000…1100…

Main idea: intuition

β€’ Decompose EMD over []2 into EMDs over smaller grids

β€’ Recursively reduce to = 𝑂(1)

+β‰ˆ

Decomposition Lemma

β€’ For randomly-shifted cut-grid 𝐺 of side length π‘˜, will prove:1) 𝐸𝑀𝐷(𝐴, 𝐡) ≀ πΈπ‘€π·π‘˜(𝐴1, 𝐡1) + πΈπ‘€π·π‘˜(𝐴2, 𝐡2) + β‹―

+ π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺, 𝐡𝐺)

2) 𝐸𝑀𝐷Δ 𝐴, 𝐡 β‰₯1

3𝑬[πΈπ‘€π·π‘˜ 𝐴1, 𝐡1 + πΈπ‘€π·π‘˜ 𝐴2, 𝐡2 + β‹―]

3) 𝐸𝑀𝐷Δ 𝐴, 𝐡 β‰₯ 𝑬[π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺 , 𝐡𝐺)]

β€’ The distortion will

follow by applying the lemma

recursively to (AG,BG)

/π‘˜

π‘˜

1 (lower bound)β€’ Claim 1: for a randomly-shifted cut-grid 𝐺 of side length π‘˜:

𝐸𝑀𝐷(𝐴, 𝐡) ≀ πΈπ‘€π·π‘˜(𝐴1, 𝐡1) + πΈπ‘€π·π‘˜(𝐴2, 𝐡2) + β‹―+ π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺 , 𝐡𝐺)

β€’ Construct a matching πœ‹ for 𝐸𝑀𝐷Δ 𝐴, 𝐡 from the matchings on RHS as follows

β€’ For each π‘Žπ΄ (suppose π‘Žπ΄π‘–) it is either:1) matched in 𝐸𝑀𝐷(𝐴𝑖 , 𝐡𝑖) to some 𝑏𝐡𝑖 (if π‘Ž ∈ 𝐴𝑖′)

β€’ then πœ‹ π‘Ž = 𝑏

2) or π‘Ž βˆ‰ 𝐴𝑖′, and then it is matched

in 𝐸𝑀𝐷(𝐴𝐺 , 𝐡𝐺) to some 𝑏𝐡𝑗 (𝑗 β‰  𝑖)β€’ then πœ‹ π‘Ž = 𝑏

β€’ Cost? 1) paid by 𝐸𝑀𝐷(𝐴𝑖 , 𝐡𝑖)2) Move π‘Ž to center ()

β€’ Charge to 𝐸𝑀𝐷(𝐴𝑖 , 𝐡𝑖)

Move from cell 𝑖 to cell 𝑗‒ Charge π‘˜ to 𝐸𝑀𝐷(𝐴𝐺 , 𝐡𝐺)

β€’ If 𝐴 > |𝐡|, extra |𝐴| βˆ’ |𝐡|

pay π‘˜ β‹…Ξ”

π‘˜= Ξ” on LHS & RHS

/π‘˜

π‘˜

2 & 3 (upper bound)

β€’ Claims 2,3: for a randomly-shifted cut-grid 𝐺 of side length π‘˜, we have:

2) 𝐸𝑀𝐷Δ 𝐴, 𝐡 β‰₯1

3𝑬[πΈπ‘€π·π‘˜ 𝐴1, 𝐡1 + πΈπ‘€π·π‘˜ 𝐴2, 𝐡2 + β‹―]

3) 𝐸𝑀𝐷Δ 𝐴, 𝐡 β‰₯ 𝑬[π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺 , 𝐡𝐺)]

β€’ Fix a matching πœ‹ minimizing 𝐸𝑀𝐷Δ(𝐴, 𝐡)– Will construct matchings for each EMD on RHS

β€’ Uncut pairs π‘Ž, 𝑏 ∈ πœ‹ are matched in respective (𝐴, 𝐡)β€’ Cut pairs π‘Ž, 𝑏 ∈ πœ‹ :

– are unmatched in their mini-grids

– are matched in (𝐴𝐺 , 𝐡𝐺)

3: Cost

β€’ Claim 2: β€’ 3 β‹… 𝐸𝑀𝐷Δ 𝐴,𝐡 β‰₯ 𝑬[πΈπ‘€π·π‘˜ 𝐴1, 𝐡1 + πΈπ‘€π·π‘˜ 𝐴2, 𝐡2 + β‹―]

β€’ Uncut pairs (π‘Ž, 𝑏) are matched in respective (𝐴𝑖 , 𝐡𝑖)– Total contribution from uncut pairs ≀ 𝐸𝑀𝐷Δ(𝐴, 𝐡)

β€’ Consider a cut pair (π‘Ž, 𝑏) at distance π‘Ž βˆ’ 𝑏 = (𝑑π‘₯ , 𝑑𝑦)– (π‘Ž, 𝑏) can contribute to RHS as they may be unmatched in their

own mini-grids

– Pr[(π‘Ž, 𝑏) cut] = 1 βˆ’ 1 βˆ’π‘‘π‘₯

π‘˜ +1 βˆ’

𝑑𝑦

π‘˜ +≀

𝑑π‘₯

π‘˜+

𝑑𝑦

π‘˜β‰€

1

π‘˜||π‘Ž βˆ’ 𝑏||2

– Expected contribution of (π‘Ž, 𝑏) to RHS:≀Pr[(π‘Ž, 𝑏) cut] β‹… 2π‘˜ ≀ 2 π‘Ž βˆ’ 𝑏

2

– Total expected cost contributed to RHS:2 β‹… 𝐸𝑀𝐷Δ(𝐴, 𝐡)

β€’ Total (cut & uncut pairs): 3 β‹… 𝐸𝑀𝐷Δ(𝐴, 𝐡)

𝑑π‘₯

π‘˜

3: Cost

β€’ Claim:

– 𝐸𝑀𝐷Δ 𝐴, 𝐡 β‰₯ 𝑬[π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺 , 𝐡𝐺)]

β€’ Uncut pairs: contribute zero to RHS!

β€’ Cut pair: π‘Ž, 𝑏 ∈ πœ‹ with π‘Ž βˆ’ 𝑏 = (𝑑π‘₯, 𝑑𝑦)

– if 𝑑π‘₯ = π‘₯π‘˜ + π‘Ÿπ‘˜, and 𝑑𝑦 = π‘¦π‘˜ + π‘Ÿπ‘¦ , then

– expected cost contribution to π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺 , 𝐡𝐺):

≀ π‘₯ +π‘Ÿπ‘₯

π‘˜β‹… π‘˜ + 𝑦 +

π‘Ÿπ‘¦

π‘˜β‹… π‘˜ = 𝑑π‘₯ + 𝑑𝑦 = π‘Ž βˆ’ 𝑏

2

β€’ Total expected cost ≀ 𝐸𝑀𝐷Δ(𝐴, 𝐡)

14

𝑑π‘₯

π‘˜ π‘˜ π‘˜

Recurse on decompositionβ€’ For randomly-shifted cut-grid 𝐺 of side length π‘˜, we have:

1) 𝐸𝑀𝐷(𝐴, 𝐡) ≀ πΈπ‘€π·π‘˜(𝐴1, 𝐡1) + πΈπ‘€π·π‘˜(𝐴2, 𝐡2) + β‹―

+ π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺, 𝐡𝐺)

2) 𝐸𝑀𝐷Δ 𝐴, 𝐡 β‰₯1

3𝑬[πΈπ‘€π·π‘˜ 𝐴1, 𝐡1 + πΈπ‘€π·π‘˜ 𝐴2, 𝐡2 + β‹―]

3) 𝐸𝑀𝐷Δ 𝐴, 𝐡 β‰₯ 𝑬[π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺 , 𝐡𝐺)]

β€’ We applying decomposition recursively for π‘˜ = 3– Choose randomly-shifted cut-grid 𝐺1 on []2

– Obtain many grids [3]2, and a big grid [/3]2

– Then choose randomly-shifted cut-grid 𝐺2 on [/3]2

– Obtain more grids [3]2, and another big grid [/9]2

– Then choose randomly-shifted cut-grid 𝐺3 on [/9]2

– …

β€’ Then, embed each of the small grids [3]2 into β„“1, using 𝑂(1)distortion embedding, and concatenate the embeddings– Each 3 2 grid occupies 9 coordinates on β„“1 embedding

15

Proving recursion worksβ€’ Claim: embedding contracts distances by 𝑂 1 :

𝐸𝑀𝐷(𝐴, 𝐡) ≀

≀ βˆ‘π‘– πΈπ‘€π·π‘˜ 𝐴𝑖, 𝐡𝑖 + π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺1, 𝐡𝐺1

)

≀ βˆ‘π‘– πΈπ‘€π·π‘˜ 𝐴𝑖, 𝐡𝑖 + π‘˜βˆ‘π‘– πΈπ‘€π·π‘˜ 𝐴𝐺1,𝑖, 𝐡𝐺1,𝑖

+π‘˜ β‹… 𝐸𝑀𝐷 Ξ”

π‘˜2𝐴𝐺2

, 𝐡𝐺2

≀ …≀ sum of 𝐸𝑀𝐷3 costs of 3 Γ— 3 instances

≀1

2 2||𝑓 𝐴 βˆ’ 𝑓 𝐡 ||1

β€’ Claim: embedding distorts distances by 𝑂 log Ξ” in expectation:3 logπ‘˜ Ξ” β‹… 𝐸𝑀𝐷(𝐴, 𝐡)

β‰₯ 3 β‹… 𝐸𝑀𝐷(𝐴, 𝐡) + 3 logπ‘˜Ξ”

π‘˜β‹… 𝐸𝑀𝐷Δ(𝐴, 𝐡)

β‰₯ 𝐄[ βˆ‘π‘– πΈπ‘€π·π‘˜(𝐴𝑖, 𝐡𝑖) + 3 logπ‘˜Ξ”

π‘˜β‹… π‘˜ β‹… 𝐸𝑀𝐷Δ/π‘˜(𝐴𝐺1

, 𝐡𝐺1) ]

β‰₯ β‹―β‰₯ sum of 𝐸𝑀𝐷3 costs of 3 Γ— 3 instances

β‰₯ ||𝑓 𝐴 βˆ’ 𝑓 𝐡 ||1

Final theoremβ€’ Theorem: can embed EMD over [Ξ”]2 into β„“1

with 𝑂(log Ξ”) distortion in expectation.

β€’ Notes:– Dimension required: 𝑂(Ξ”2), but a set 𝐴 of size 𝑠

maps to a vector that has only 𝑂(𝑠 β‹… log Ξ”) non-zero coordinates.

– Time: can compute in 𝑂(𝑠 β‹… log )– By Markov’s, it’s 𝑂(log Ξ”) distortion with 90%

probability

β€’ Applications:– Can compute 𝐸𝑀𝐷(𝐴, 𝐡) in time 𝑂(𝑠 β‹… log Ξ”)– NNS: 𝑂(𝑐 β‹… log Ξ”) approximation, with 𝑂(𝑛1+1/𝑐 β‹…

𝑠) space, and 𝑂(𝑛1/𝑐 β‹… 𝑠 β‹… log Ξ”) query time.