andoni beyondlsh mmdsmmds-data.org/presentations/2014/andoni_mmds14.pdf ·...


Beyond Locality Sensitive Hashing

Alex Andoni (Microsoft Research)

Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)

Nearest Neighbor Search (NNS)

Motivation

• Generic setup:
  • Points model objects (e.g. images)
  • Distance models (dis)similarity measure
• Application areas:
  • machine learning: k-NN rule
  • image/video/music recognition, deduplication, bioinformatics, etc.
• Distance can be:
  • Hamming, Euclidean, …
• Primitive for other problems:
  • find the similar pairs, clustering…

000000 011100 010100 000100 010100 011111

000000 001100 000100 000100 110100 111111
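The exact version of the problem can be sketched as a linear scan. The example below is a minimal illustration in Hamming space, assuming the first row of bit strings above is the dataset; the function names and the query string are hypothetical and do not come from the slides.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(points, query):
    """Exact NNS by linear scan: O(n * d) time per query, no preprocessing."""
    best = min(points, key=lambda p: hamming(p, query))
    return best, hamming(best, query)


# Assumed dataset: the six 6-bit strings from the first row above.
points = ["000000", "011100", "010100", "000100", "010100", "011111"]
query = "000110"  # hypothetical query, not taken from the slides

print(nearest_neighbor(points, query))
```

The scan touches every point for every query, which is the cost that approximate methods such as LSH aim to avoid on large datasets.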

Approximate NNS

[Figure: query q, a point p within distance r of q, and the relaxed radius cr]
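The labels q, p, r and cr match the standard c-approximate r-near neighbor formulation: if some point lies within distance r of the query, the answer may be any point within distance cr. The brute-force reference below is a sketch of that relaxation under this assumed definition; the function names, dataset and parameter values are illustrative only.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_near_neighbor(points, query, r, c):
    """Return the first point within distance c*r of the query, else None.

    This meets the (c, r) guarantee: whenever some point is within r,
    at least one point is within c*r, so the scan returns a valid answer.
    """
    if c < 1:
        raise ValueError("approximation factor c must be at least 1")
    for p in points:
        if hamming(p, query) <= c * r:
            return p  # may stop early; need not be the true nearest point
    return None


points = ["000000", "011100", "010100", "000100", "010100", "011111"]
answer = approx_near_neighbor(points, "000110", r=1, c=2)
print(answer, hamming(answer, "000110"))
```

In this run the returned point is at distance 2 even though a point at distance 1 exists. That is an acceptable answer for c = 2, and this slack is what makes sublinear query time possible.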

Locality-Sensitive Hashing [Indyk-Motwani’98]

[Figure: points q and p; labels 1 and “not-so-small”]

Locality sensitive hash functions [Indyk-Motwani’98]
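A classic locality-sensitive family for Hamming space is bit sampling: hash a point to the bits at k randomly chosen coordinates. Two points at Hamming distance t in dimension d then collide with probability (1 - t/d)^k. The sketch below is offered as standard background rather than as the content of the slide. It also computes the usual LSH exponent rho = ln(1/P1) / ln(1/P2), which for bit sampling is at most 1/c. All function names and parameter values are hypothetical.

```python
import math
import random


def make_bit_sampling_hash(dim, k, rng):
    """Pick k coordinates with replacement; the hash is the tuple of those bits."""
    coords = [rng.randrange(dim) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)


def collision_probability(dim, dist, k=1):
    """Exact Pr[g(p) == g(q)] for two points at Hamming distance dist."""
    return (1 - dist / dim) ** k


def rho(dim, r, c):
    """LSH exponent for single-bit sampling; requires c * r < dim."""
    if c * r >= dim:
        raise ValueError("need c * r < dim so that P2 > 0")
    p1 = collision_probability(dim, r)      # near points, distance r
    p2 = collision_probability(dim, c * r)  # far points, distance c*r
    return math.log(1 / p1) / math.log(1 / p2)


def estimate_collision(p, q, k, trials, seed=0):
    """Empirical collision rate over freshly drawn hash functions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = make_bit_sampling_hash(len(p), k, rng)
        hits += g(p) == g(q)
    return hits / trials


print(collision_probability(6, 3))                     # distance 3 of 6 bits
print(estimate_collision("000000", "000111", 1, 20000))
print(rho(100, 10, 2))                                 # stays below 1/c
```

Close pairs collide more often than far pairs, and concatenating k bits widens that gap at the price of a lower collision rate for near pairs.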

Algorithms and Lower Bounds

[Table: columns Space, Time, Comment, Reference. Only the Reference column is preserved; its entries, in row order:]

[IM’98]
[PTW’08, PTW’10]
[IM’98]
[DIIM’04, AI’06]
[MNP’06]
[OWZ’11]
[PTW’08, PTW’10]
[MNP’06]
[OWZ’11]

LSH is tight…

leave the rest to cell-probe lower bounds?

Main Result

A look at LSH lower bounds [O’Donnell-Wu-Zhou’11]

Why not NNS lower bound?

Our algorithm: intuition

Nice Configuration: “sparsity”

Reduction: into spherical LSH

Two-level algorithm

Details

Practice

• Practice uses data-dependent partitions!
  • “wherever theoreticians suggest to use random dimensionality reduction, use PCA”
• Lots of variants
  • Trees: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees…
• No guarantees: e.g., the trees are deterministic
• Is there a better way to do partitions in practice?
• Why do PCA-trees work?
  • [Abdullah-A-Kannan-Krauthgamer]: if the data has more structure
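To make the idea of a data-dependent partition concrete, here is a minimal sketch of a single PCA-tree split: project the points onto their top principal direction and cut at the median projection. Power iteration is used only as a dependency-free way to approximate that direction. It is an assumption of this sketch, not something the slides specify, and all names are hypothetical.

```python
import math


def top_principal_direction(points, iters=100):
    """Approximate the top eigenvector of the covariance by power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Deterministic start; assumed not orthogonal to the top direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # w = X^T (X v): the covariance matrix times v, up to a 1/n factor.
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break  # all points identical: any direction works
        v = [t / norm for t in w]
    return mean, v


def pca_split(points):
    """One PCA-tree node: split at the median projection on the top direction."""
    mean, v = top_principal_direction(points)
    proj = [sum((p[j] - mean[j]) * v[j] for j in range(len(v))) for p in points]
    median = sorted(proj)[len(proj) // 2]
    left = [p for p, t in zip(points, proj) if t < median]
    right = [p for p, t in zip(points, proj) if t >= median]
    return v, left, right


# Hypothetical data: spread along the x axis with small alternating y offsets.
data = [(float(i), 0.1 * (-1) ** i) for i in range(10)]
direction, left, right = pca_split(data)
print(direction, len(left), len(right))
```

On this data the split direction is almost exactly the x axis, where the data actually varies. A random projection has no such preference, but the PCA split is deterministic for a given dataset, which is why the guarantees of randomized schemes do not carry over directly.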

Finale
