Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order Edgar Chávez Karina Figueroa Gonzalo Navarro UNIVERSIDAD MICHOACANA, MEXICO UNIVERSIDAD DE CHILE, CHILE

Post on 14-Dec-2015


Page 1:

Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

Edgar Chávez

Karina Figueroa

Gonzalo Navarro

UNIVERSIDAD MICHOACANA, MEXICO

UNIVERSIDAD DE CHILE, CHILE

Page 2:

Content

1. About the problem

2. Basic concepts

3. Previous work

4. Our technique

5. Experiments

6. Conclusion and future work

Page 3:

Proximity Searching

Huge Database

• Exact searching is not possible

Expensive distance

Page 4:

Applications

• Information retrieval

• Classification

• People finder through the web

• Clustering

• Currently used on

– Classification of spider webs

– Face recognition on the Chilean Web

Page 5:

Problems (metric spaces)

Index

Extraction of characteristics

Complex objects

High dimension

Memory limited

Huge databases

Page 6:

Terminology

• Queries

– Range query

– K nearest neighbor

• Properties

– Symmetry

– Strict positiveness

– Triangle inequality
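The two query types can be illustrated with a plain linear scan, which is the baseline any index tries to beat. This is only a sketch: the Euclidean distance stands in for whatever metric the application uses, and the names `range_query` and `knn_query` are illustrative, not taken from the slides.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance; a stand-in for any metric d (assumption of this sketch)."""
    return math.dist(a, b)


def range_query(db, q, r, d=euclidean):
    """Return every element u of db with d(q, u) <= r."""
    return [u for u in db if d(q, u) <= r]


def knn_query(db, q, k, d=euclidean):
    """Return the k elements of db closest to q, nearest first."""
    return heapq.nsmallest(k, db, key=lambda u: d(q, u))


# Toy database of 2-D points (hypothetical data).
db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
q = (0.0, 0.0)

in_range = range_query(db, q, 1.5)  # elements within distance 1.5 of q
nearest = knn_query(db, q, 2)       # the 2 nearest neighbors of q
```

Both functions evaluate the distance once per database element, which is exactly the cost that becomes prohibitive when the database is huge or the distance is expensive.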

Page 7:

Previous work

• Pivot based • Partition based

Pivot

distance

q
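A pivot based index is usually explained through the triangle inequality: for any pivot p, |d(q,p) - d(u,p)| is a lower bound on d(q,u), so an element whose lower bound exceeds the query radius can be discarded without computing d(q,u). The sketch below shows that generic filter. It is not the specific algorithm compared in the experiments, and the function names and data are illustrative.

```python
import math


def build_pivot_table(db, pivots, d=math.dist):
    """Precompute d(u, p) for every element u and every pivot p."""
    return [[d(u, p) for p in pivots] for u in db]


def pivot_range_query(db, pivots, table, q, r, d=math.dist):
    """Range query that skips elements ruled out by the triangle inequality.

    Returns the answer and the number of database elements whose
    distance to q was actually computed.
    """
    dq = [d(q, p) for p in pivots]
    result, evaluated = [], 0
    for u, row in zip(db, table):
        # |d(q,p) - d(u,p)| <= d(q,u): if this lower bound exceeds r
        # for some pivot, u cannot be in the answer.
        if any(abs(a - b) > r for a, b in zip(dq, row)):
            continue
        evaluated += 1
        if d(q, u) <= r:
            result.append(u)
    return result, evaluated


# Toy data (hypothetical): one pivot at the origin.
db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
pivots = [(0.0, 0.0)]
table = build_pivot_table(db, pivots)
hits, evaluated = pivot_range_query(db, pivots, table, (0.5, 0.0), 1.0)
```

In this toy run two of the four elements are discarded by the pivot alone, so only two distances to the query are computed.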

Page 8:

Previous work

• Pivot based • Partition based

center

q

Page 9:

Our technique

Permutation

[Figure: an element u and the permutants p1, p2, p3, p4, p5, p6]
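The slide shows only the figure, so the following is a sketch under an assumption: the permutation of an element u is taken to be the list of permutants ordered by increasing distance to u, with ties broken by permutant index. The function name `permutation` and the sample points are illustrative.

```python
import math


def permutation(u, permutants, d=math.dist):
    """Indices of the permutants sorted by increasing distance to u.

    Ties are broken by permutant index (an assumption of this sketch).
    """
    return sorted(range(len(permutants)), key=lambda i: (d(u, permutants[i]), i))


# Three hypothetical permutants in the plane.
permutants = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

perm_a = permutation((1.0, 0.0), permutants)  # closest to permutant 0
perm_b = permutation((0.0, 2.5), permutants)  # closest to permutant 2
```

Each element is thus summarized by a short list of small integers, regardless of how complex the original object is.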

Page 10:

Our technique

• Exact matching elements have the same permutation

• Similar elements must have a similar permutation (we guess)

• Spearman footrule metric

– Measures the similarity of the permutations

– Promising elements first

Page 11:

Spearman Footrule metric

Example

3-1, 6-2, 3-2, 4-1, 5-5, 6-4

Difference of positions
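The Spearman footrule between two permutations of the same items is the sum, over all items, of the absolute difference between the item's position in one permutation and its position in the other. A minimal sketch follows; the function name is illustrative. Reading the six pairs on the slide as position pairs is an assumption, under which their absolute differences 2, 4, 1, 3, 0, 2 add up to 12.

```python
def spearman_footrule(perm_a, perm_b):
    """Sum of absolute position differences between two permutations.

    Both arguments must contain the same items, each exactly once.
    """
    pos_b = {item: i for i, item in enumerate(perm_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(perm_a))


# Identical permutations are at distance 0.
same = spearman_footrule(["p2", "p1", "p3"], ["p2", "p1", "p3"])

# Swapping the last two items moves each of them by one position.
swapped = spearman_footrule(["p2", "p1", "p3"], ["p2", "p3", "p1"])

# The six pairs listed on the slide, read as (position, position) pairs.
slide_pairs = [(3, 1), (6, 2), (3, 2), (4, 1), (5, 5), (6, 4)]
slide_total = sum(abs(a - b) for a, b in slide_pairs)
```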

Page 12:

Searching process (part 1)

Preprocessing time

Permutants: p1, p2, p3

Permutations stored for the database elements:

p3,p1,p2

p3,p2,p1

p2,p1,p3

p2,p3,p1

Page 13:

Searching process (part 2)

Query time

Permutants: p1, p2, p3

Permutations of the database elements:

p3,p1,p2

p3,p2,p1

p2,p1,p3

p2,p3,p1

Permutation of the query q: p2,p1,p3

Sorting elements by Spearman Footrule metric:

p2,p1,p3

p2,p3,p1

…

p3,p1,p2
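The two parts of the searching process can be put together in a short sketch. It assumes that a permutation lists the permutants by increasing distance, that the database is visited in footrule order, and that only a fixed fraction of it is compared against the query. The parameter `fraction`, the function names, and the sample points are illustrative and not taken from the slides.

```python
import math


def permutation(u, permutants, d=math.dist):
    """Permutant indices sorted by increasing distance to u (ties by index)."""
    return tuple(sorted(range(len(permutants)),
                        key=lambda i: (d(u, permutants[i]), i)))


def footrule(a, b):
    """Spearman footrule: sum of absolute position differences."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))


def build_index(db, permutants, d=math.dist):
    """Preprocessing time: store one permutation per database element."""
    return [permutation(u, permutants, d) for u in db]


def search(db, index, permutants, q, k, fraction, d=math.dist):
    """Query time: approximate k nearest neighbors of q.

    Elements are ranked by the footrule between their permutation and the
    permutation of q; only the first `fraction` of that ranking is compared
    against q with the real distance.
    """
    pq = permutation(q, permutants, d)
    order = sorted(range(len(db)), key=lambda i: footrule(index[i], pq))
    budget = max(k, int(fraction * len(db)))
    candidates = sorted(order[:budget], key=lambda i: d(q, db[i]))
    return [db[i] for i in candidates[:k]]


# Toy data (hypothetical).
permutants = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
db = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (2.0, 1.0), (8.0, 2.0)]
index = build_index(db, permutants)
answer = search(db, index, permutants, (1.2, 1.0), k=1, fraction=0.4)

# The example from the slide: rank the four stored permutations against q's.
slide_db = [("p3", "p1", "p2"), ("p3", "p2", "p1"),
            ("p2", "p1", "p3"), ("p2", "p3", "p1")]
slide_q = ("p2", "p1", "p3")
ranked = sorted(slide_db, key=lambda p: footrule(p, slide_q))
# The last two entries are tied at footrule 4, so their relative order
# depends on the tie-breaking rule.
```

In the toy run only two of the five elements are compared with the query, and the true nearest neighbor is among them. The answer is not guaranteed in general, which is why the method is probabilistic.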

Page 14:

Experiments

93% retrieved, comparing 10% of database

90% retrieved, comparing 60% of database

Pivot based algorithm

Retrieved 48%

[Plot; y-axis: % retrieved]
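The slides do not say how "% retrieved" was computed. One common reading, assumed here, is the share of the exact answer that also appears in the approximate answer. The function name and the sample identifiers are illustrative.

```python
def percent_retrieved(approx, exact):
    """Percentage of the exact answer that the approximate answer contains."""
    exact_set = set(exact)
    if not exact_set:
        return 100.0  # nothing to retrieve, so nothing was missed
    return 100.0 * len(exact_set & set(approx)) / len(exact_set)


# Hypothetical element identifiers: two of the four true answers were found.
score = percent_retrieved([1, 2, 3], [2, 3, 4, 5])
```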

Page 15:

Experiments

100% retrieved, comparing 15% of database

100% retrieved, comparing 90% of database

[Plot; y-axis: % retrieved]

Page 16:

How good is our prediction?

retrieved

Dimension 256, using 256 pivots

Percentage of the database compared

Metric algorithms are using one of them

Page 17:

Similarities between permutations

Almost the same value

Page 18:

Conclusion

• A new probabilistic algorithm for proximity searching in metric spaces.

• Our technique is based on permutations.

• Close elements will have similar permutations.

• This technique is the fastest known algorithm for high dimension.

• Permutations are good predictors.

Page 19:

Future Work

• Can non-metric spaces be tackled with this technique?

• An approximate all-K-nearest-neighbor algorithm.

• Improving other metric indexes.

Page 20:

Thank you

UNIVERSIDAD MICHOACANA, MEXICO

UNIVERSIDAD DE CHILE, CHILE