
Page 1:

Ryan O'Donnell (CMU), Yi Wu (CMU, IBM), Yuan Zhou (CMU)

Page 2:

Locality Sensitive Hashing [Indyk-Motwani '98]

H : family of hash functions h : objects → sketches s.t.

“similar” objects collide w/ high prob.

“dissimilar” objects collide w/ low prob.

Page 3:

Abbreviated history

Page 4:

Min-wise hash functions [Broder '98]

Jaccard similarity:  |A ∩ B| / |A ∪ B|

Invented simple H s.t.  Pr_{h∼H}[h(A) = h(B)] = |A ∩ B| / |A ∪ B|

[Figure: sets A and B drawn as 0/1 indicator vectors over the word universe (word 1? word 2? word 3? … word d?), e.g. A = 0 1 1 1 0 0 1 0 0, B = 1 1 1 0 0 0 1 0 1.]
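To make the collision property concrete, here is a minimal min-wise hashing sketch in Python (my illustration, not necessarily Broder's exact construction): h maps a set to the rank of its first element under a random permutation of the universe, and then Pr_h[h(A) = h(B)] equals the Jaccard similarity.

import random

def make_minhash(universe, seed=None):
    """Sample one min-wise hash function: rank sets by a random permutation of the universe."""
    rng = random.Random(seed)
    order = list(universe)
    rng.shuffle(order)                                # random permutation pi
    rank = {w: i for i, w in enumerate(order)}
    return lambda S: min(rank[w] for w in S)          # h(S) = min_{w in S} pi(w)

# Pr over h of [h(A) == h(B)] equals |A ∩ B| / |A ∪ B| (the Jaccard similarity).
universe = range(9)
A, B = {1, 2, 3, 6}, {0, 1, 2, 6, 8}
hs = [make_minhash(universe, seed=i) for i in range(20000)]
print(sum(h(A) == h(B) for h in hs) / len(hs),        # empirical collision rate
      len(A & B) / len(A | B))                        # exact Jaccard similarity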

Page 5:

Indyk-Motwani '98

Defined LSH.

Invented very simple H good for {0, 1}^d under Hamming distance.

Showed good LSH implies good nearest-neighbor-search data structs.

Page 6:

Charikar '02, STOC

Proposed alternate H (“simhash”) for Jaccard similarity.

Patented by Google.

Page 7:

Many papers about LSH

Page 8:

Practice:

• Free code base [AI’04]
• Sequence comparison in bioinformatics
• Association-rule finding in data mining
• Collaborative filtering
• Clustering nouns by meaning in NLP
• Pose estimation in vision
• …

Theory:

[Tenesawa–Tanaka ’07], [Broder ’97], [Indyk–Motwani ’98], [Gionis–Indyk–Motwani ’98], [Charikar ’02], [Datar–Immorlica–Indyk–Mirrokni ’04], [Motwani–Naor–Panigrahi ’06], [Andoni–Indyk ’06], [Neylon ’10], [Andoni–Indyk ’08, CACM]

Page 9:

Given: (X, dist), r > 0, c > 1

distance space “radius” “approx factor”

Goal: Family H of functions X → S

(S can be any finite set)

s.t. ∀ x, y ∈ X,

dist(x, y) ≤ r   ⟹   Pr_{h∼H}[h(x) = h(y)] ≥ p,   p = q^ρ   (e.g. q^{0.5}, q^{0.25}, q^{0.1}, …)

dist(x, y) ≥ cr  ⟹   Pr_{h∼H}[h(x) = h(y)] ≤ q

Page 10:

Theorem

[IM’98, GIM’98]

Given LSH family for (X, dist),

can solve “(r,cr)-near-neighbor search”

for n points with data structure of

size: O(n^{1+ρ})

query time: Õ(n^ρ) hash fcn evals.

Here ρ is the exponent from the LSH guarantee:

dist(x, y) ≤ r   ⟹   Pr_{h∼H}[h(x) = h(y)] ≥ q^ρ

dist(x, y) ≥ cr  ⟹   Pr_{h∼H}[h(x) = h(y)] ≤ q
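For intuition, here is a minimal sketch of the standard construction behind this theorem (assumptions: the usual amplification that concatenates k hash functions per table and keeps L independent tables; the theory takes k ≈ log_{1/q} n and L ≈ n^ρ, while the values below are just illustrative):

import random
from collections import defaultdict

def build_lsh_index(points, sample_h, k, L):
    """Standard LSH index: L tables, each keyed by a k-fold concatenation g = (h_1, ..., h_k)."""
    tables = []
    for _ in range(L):
        g = [sample_h() for _ in range(k)]
        table = defaultdict(list)
        for p in points:
            table[tuple(h(p) for h in g)].append(p)
        tables.append((g, table))
    return tables

def query(tables, z, dist, r, c):
    """Return a point within distance c*r of z, if the query's bucket in some table contains one."""
    for g, table in tables:
        for p in table.get(tuple(h(z) for h in g), []):
            if dist(p, z) <= c * r:
                return p
    return None

# Toy usage on {0,1}^d with Hamming distance and the "random coordinate" family of the next slide.
d = 32
hamming = lambda x, y: sum(a != b for a, b in zip(x, y))
sample_h = lambda: (lambda i: (lambda x: x[i]))(random.randrange(d))
pts = [tuple(random.randrange(2) for _ in range(d)) for _ in range(200)]
idx = build_lsh_index(pts, sample_h, k=8, L=10)
print(query(idx, pts[0], hamming, r=2, c=5))          # finds pts[0] itself (distance 0)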

Page 11:

dist(x, y) ≤ r   ⟹   Pr_{h∼H}[h(x) = h(y)] ≥ q^ρ

dist(x, y) ≥ cr  ⟹   Pr_{h∼H}[h(x) = h(y)] ≤ q

Example

X = {0,1}^d, dist = Hamming

r = εd, c = 5

[Figure: two strings, e.g. 0 1 1 1 0 0 1 0 0 and 1 1 1 0 0 0 1 0 1, with the promise dist ≤ εd or ≥ 5εd.]

H = { h_1, h_2, …, h_d },  h_i(x) = x_i   [IM’98]

“output a random coord.”
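A quick numerical sanity check of this family (a sketch; for h_i(x) = x_i with i uniform, Pr[h(x) = h(y)] = 1 − dist(x, y)/d, which is what the next slide's analysis uses):

import random

d = 100
x = [random.randrange(2) for _ in range(d)]
y = x[:]
for i in random.sample(range(d), 20):                 # plant dist(x, y) = 20 = 0.2*d
    y[i] ^= 1

# A random h_i collides on (x, y) iff the sampled coordinate i is one where x and y agree,
# which happens with probability exactly 1 - dist(x, y)/d.
trials = 100_000
coll = sum(x[i] == y[i] for i in (random.randrange(d) for _ in range(trials)))
print(coll / trials, 1 - 20 / d)                      # both ≈ 0.8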

Page 12:

dist(x, y) ≤ εd   ⟹   Pr_{h∼H}[h(x) = h(y)] ≥ 1 − ε   = q^ρ

dist(x, y) ≥ 5εd  ⟹   Pr_{h∼H}[h(x) = h(y)] ≤ 1 − 5ε  = q

Analysis:

(1 − 5ε)^{1/5} ≈ 1 − ε.  ∴ ρ ≈ 1/5;  more precisely, (1 − 5ε)^{1/5} ≤ 1 − ε,  ∴ ρ ≤ 1/5.

In general, achieves ρ ≤ 1/c, ∀c (∀r).
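The same computation for general c, written out (a worked restatement of the slide's bound, in LaTeX):

\[
\Pr_{h \sim H}[h(x) = h(y)] = 1 - \frac{\mathrm{dist}(x,y)}{d}, \qquad
q = 1 - c\varepsilon, \qquad q^{\rho} = 1 - \varepsilon .
\]
\[
(1 - c\varepsilon)^{1/c} \le 1 - \varepsilon
\;\Longrightarrow\;
q^{1/c} \le q^{\rho}
\;\Longrightarrow\;
\rho \le 1/c \quad (\text{since } 0 < q < 1).
\]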

Page 13:

Optimal upper bound

( {0, 1}^d, Ham ), r > 0, c > 1.

S ≝ {0, 1}^d ∪ {✔},   H ≝ { h_{ab} : dist(a, b) ≤ r }

h_{ab}(x) = ✔ if x = a or x = b,
          = x otherwise.

dist(x, y) ≤ r   ⟹   Pr_{h∼H}[h(x) = h(y)] is positive (though tiny)

dist(x, y) ≥ cr  ⟹   Pr_{h∼H}[h(x) = h(y)] = 0

So any ρ works — 0.5, 0.1, 0.01, 0.0001, … — provided q may be taken tiny; this is why the lower bound must assume q is not tiny.
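Making the two probabilities explicit (my own computation; it is not spelled out on the slide): for x ≠ y, a collision h_{ab}(x) = h_{ab}(y) happens only when {a, b} = {x, y}.

\[
\mathrm{dist}(x,y) \le r:\quad
\Pr_{h_{ab} \sim H}\bigl[h_{ab}(x) = h_{ab}(y)\bigr] = \frac{1}{|H|} = 2^{-\Theta(d)} > 0,
\qquad
\mathrm{dist}(x,y) \ge cr:\quad
\Pr_{h_{ab} \sim H}\bigl[h_{ab}(x) = h_{ab}(y)\bigr] = 0 .
\]
So for any target \(\rho > 0\), choosing \(q \le (1/|H|)^{1/\rho}\) makes this a valid \((r, cr, q^{\rho}, q)\) family.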

Page 14:

The End.

Any questions?

Page 15:

Wait, what?  Theorem [IM’98, GIM’98]

Given LSH family for (X, dist),

can solve “(r,cr)-near-neighbor search”

for n points with data structure of

size: O(n^{1+ρ})

query time: Õ(n^ρ) hash fcn evals.

Page 16:

Wait, what?  Theorem [IM’98, GIM’98]

Given LSH family for (X, dist),

can solve “(r,cr)-near-neighbor search”

for n points with data structure of

size: O(n^{1+ρ})

query time: Õ(n^ρ) hash fcn evals.

q ≥ n^{−o(1)} ("not tiny")

Page 17:

More results

For R^d with ℓ_p-distance:  ρ ≤ 1/c^p,  when p = 1, 0 < p < 1, p = 2   [IM’98] [DIIM’04] [AI’06]

For Jaccard similarity:  ρ ≤ 1/c   [Bro’98]

For {0,1}^d with Hamming distance:  ρ ≥ 0.462/c − o_d(1)   (assuming q ≥ 2^{−o(d)})   [MNP’06]

⟹ immediately ρ ≥ 0.462/c^p for ℓ_p-distance

Page 18:

Our Theorem

For {0,1}^d with Hamming distance:  ρ ≥ 1/c − o_d(1)   (assuming q ≥ 2^{−o(d)})   (∃ r s.t.)

⟹ immediately ρ ≥ 1/c^p for ℓ_p-distance

Proof also yields ρ ≥ 1/c for Jaccard.

Page 19:

Proof

Page 20:

Proof:

Noise stability is log-convex.

Page 21:

Proof:

A definition, and two lemmas.

Page 22:

Definition: Noise stability at e^{−τ}

Fix any function h : {0,1}^d → S.

Pick x ∈ {0,1}^d at random:   x = 0 1 1 1 0 0 1 0 0,   h(x) = s

Flip each bit w.p. (1 − e^{−2τ})/2, independently:   y = 0 0 1 1 0 0 1 1 0,   h(y) = s′

def:  K_h(τ) ≝ Pr_{x ∼_τ y}[h(x) = h(y)]
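A small numerical illustration of this definition (a sketch: h below is just an example function, and K_h(τ) is estimated by Monte Carlo; for this particular h the exact value is (1 + e^{−2τ})/2):

import math, random

def K(h, d, tau, trials=200_000):
    """Monte Carlo estimate of K_h(tau) = Pr_{x ~_tau y}[h(x) = h(y)]."""
    flip_p = (1 - math.exp(-2 * tau)) / 2             # per-bit flip probability
    hits = 0
    for _ in range(trials):
        x = [random.randrange(2) for _ in range(d)]
        y = [b ^ (random.random() < flip_p) for b in x]   # flip each bit independently
        hits += h(x) == h(y)
    return hits / trials

d = 12
h = lambda z: z[0]                                    # example h: output the first coordinate
for tau in (0.1, 0.5, 1.0):
    print(tau, K(h, d, tau), (1 + math.exp(-2 * tau)) / 2)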

Page 23:

Lemma 1:  For x ∼_τ y,  dist(x, y) = (1 − e^{−2τ})·d/2 ± o(d) w.v.h.p.;  ≈ τd when τ ≪ 1.

Proof: Chernoff bound and Taylor expansion.

Lemma 2:  K_h(τ) is a log-convex function of τ (for any h).

Proof uses Fourier analysis of Boolean functions.

[Figure: plot of log K_h(τ) vs. τ, a convex curve.]

Page 24:

Fourier transform

• Theorem.  Every f : {0, 1}^d → R can be uniquely written as

  f(x) = Σ_{S ⊆ [d]} f̂(S) χ_S(x),   where   χ_S(x) = ∏_{i ∈ S} (−1)^{x_i}.

  (Basis fcns.: χ_S.  Fourier coefs.: f̂(S).)

• Proof.  {χ_S} is an orthonormal basis of {f : {0, 1}^d → R}.
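A tiny numerical check of this expansion (a sketch over {0, 1}^d for small d; f below is an arbitrary example function):

from itertools import combinations, product

d = 4
points = list(product([0, 1], repeat=d))
subsets = [frozenset(S) for k in range(d + 1) for S in combinations(range(d), k)]

chi = lambda S, x: (-1) ** sum(x[i] for i in S)       # basis function chi_S(x)
f = lambda x: float(sum(x) >= 2)                      # example f: indicator of >= 2 ones

# Fourier coefficient f_hat(S) = E_x[f(x) chi_S(x)], by orthonormality of {chi_S}.
f_hat = {S: sum(f(x) * chi(S, x) for x in points) / len(points) for S in subsets}

# Verify f(x) = sum_S f_hat(S) chi_S(x) at every point of {0,1}^d.
recon = lambda x: sum(f_hat[S] * chi(S, x) for S in subsets)
print(all(abs(recon(x) - f(x)) < 1e-9 for x in points))   # True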

Page 25:

Lemma 2:  K_h(τ) is a log-convex function of τ.

Proof:  Let h_i(x) = 1_{h(x)=i}.

K_h(τ) = Pr_{x ∼_τ y}[h(x) = h(y)] = Σ_i Pr_{x ∼_τ y}[h(x) = i ∧ h(y) = i]

       = Σ_i E_{x ∼_τ y}[ h_i(x) h_i(y) ]

       = Σ_i E_{x ∼_τ y}[ ( Σ_{S ⊆ [d]} ĥ_i(S) χ_S(x) ) ( Σ_{T ⊆ [d]} ĥ_i(T) χ_T(y) ) ]

       = Σ_i Σ_{S, T ⊆ [d]} ĥ_i(S) ĥ_i(T) E_{x ∼_τ y}[χ_S(x) χ_T(y)]

Page 26:

E_{x ∼_τ y}[χ_S(x) χ_T(y)] = E_{x ∼_τ y}[ ∏_{i∈S} (−1)^{x_i} · ∏_{i∈T} (−1)^{y_i} ]

 = ∏_{i ∈ S∩T} E[(−1)^{x_i + y_i}] · ∏_{i ∈ S\T} E[(−1)^{x_i}] · ∏_{i ∈ T\S} E[(−1)^{y_i}]   (coordinates independent)

where, per coordinate,

 E[(−1)^{x_i + y_i}] = (+1)·(1 + e^{−2τ})/2 + (−1)·(1 − e^{−2τ})/2 = e^{−2τ},   E[(−1)^{x_i}] = E[(−1)^{y_i}] = 0.

∴ E_{x ∼_τ y}[χ_S(x) χ_T(y)] = e^{−2τ|S|} if S = T,  and 0 otherwise.

Page 27:

Lemma 2:  K_h(τ) is a log-convex function of τ.

Proof:  Let h_i(x) = 1_{h(x)=i}.  Then

K_h(τ) = Pr_{x ∼_τ y}[h(x) = h(y)] = Σ_i Σ_{S, T ⊆ [d]} ĥ_i(S) ĥ_i(T) E_{x ∼_τ y}[χ_S(x) χ_T(y)]

       = Σ_i Σ_{S ⊆ [d]} ĥ_i(S)² e^{−2τ|S|},

a non-negative combination of log-convex functions of τ — hence log-convex.  ∎

Page 28:

Lemma 1:  For x ∼_τ y,  dist(x, y) = (1 − e^{−2τ})·d/2 ± o(d) w.v.h.p.;  ≈ τd when τ ≪ 1.

Lemma 2:  K_h(τ) is a log-convex function of τ (for any h).

[Figure: plot of log K_h(τ) vs. τ.]

Theorem:  LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).

Page 29:

Proof:  Say H is an LSH family for {0,1}^d with params (εd + o(d), cεd − o(d), q^ρ, q).

(i.e., radius r = εd + o(d) and "far" distance (c − o(1))·r = cεd − o(d))

def:  K_H(τ) ≝ E_{h∼H}[K_h(τ)] = E_{h∼H} Pr_{x ∼_τ y}[h(x) = h(y)] = E_{x ∼_τ y} Pr_{h∼H}[h(x) = h(y)]

(Non-neg. lin. comb. of log-convex fcns. ∴ K_H(τ) is also log-convex.)

w.v.h.p.,  dist(x, y) ≈ (1 − e^{−2τ})·d/2 ≈ τd

∴ K_H(ε) ≳ q^ρ

   K_H(cε) ≲ q   (in truth, ≤ q + 2^{−Θ(d)}; we assume q is not tiny)

Page 30:

∴ K_H(ε) ≳ q^ρ,   K_H(cε) ≲ q,   K_H(0) = 1.

K_H(τ) is log-convex.

[Figure: plot of ln K_H(τ) vs. τ, with ln K_H(0) = 0, ln K_H(ε) ≥ ρ ln q, ln K_H(cε) ≤ ln q, and the chord from (0, 0) to (cε, ln q) passing through (ε, (1/c)·ln q).]

∴ ρ ln q ≤ (1/c)·ln q.
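Spelling out that last convexity step (a worked restatement in LaTeX; the o(1) terms hidden in ≳ and ≲ are suppressed):

\[
\varepsilon = \Bigl(1 - \tfrac{1}{c}\Bigr)\cdot 0 + \tfrac{1}{c}\cdot c\varepsilon
\;\Longrightarrow\;
\ln K_H(\varepsilon)
\le \Bigl(1 - \tfrac{1}{c}\Bigr) \ln K_H(0) + \tfrac{1}{c} \ln K_H(c\varepsilon)
= \tfrac{1}{c} \ln K_H(c\varepsilon)
\le \tfrac{1}{c} \ln q .
\]
Combining with \(\ln K_H(\varepsilon) \ge \rho \ln q\) gives \(\rho \ln q \le \tfrac{1}{c} \ln q\),
and since \(\ln q < 0\) this yields \(\rho \ge \tfrac{1}{c}\), up to the \(o_d(1)\) terms.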

Page 31:

The End.

Any questions?