Learning to Project and Binarise for Hashing-based Approximate Nearest Neighbour Search




LEARNING TO PROJECT AND BINARISE FOR HASHING-BASED APPROXIMATE NEAREST NEIGHBOUR SEARCH

SEAN MORAN   sean.moran@ed.ac.uk

LEARNING THE HASHING THRESHOLDS AND HYPERPLANES

• Research Question: Can learning the hashing quantisation thresholds and hyperplanes lead to greater retrieval effectiveness than learning either in isolation?

• Approach: most hashing methods consist of two steps: data-point projection (hyperplane learning) followed by quantisation of those projections into binary. We show the benefits of explicitly learning the hyperplanes and thresholds based on the data.
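In code terms the two-step pipeline is simply the following (a minimal sketch; the names encode, W and thresholds are illustrative, not from the poster):

```python
import numpy as np

def encode(x, W, thresholds):
    """Generic two-step hashing sketch: project the descriptor with the
    hyperplanes in W, then quantise each projected dimension against its
    (learnt) thresholds. With a single threshold per dimension this yields
    one bit per hyperplane."""
    y = x @ W                                             # step 1: projection
    return np.array([np.digitize(y[k], thresholds[k])     # step 2: quantisation
                     for k in range(W.shape[1])])

# e.g. encode(x, W, thresholds=[[0.0]] * W.shape[1]) gives a standard binary code
```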

HASHING-BASED APPROXIMATE NEAREST NEIGHBOUR SEARCH

• Problem: Nearest Neighbour (NN) search in image datasets.
• Hashing-based approach (sketched in code after this list):
  – Generate a similarity-preserving binary hashcode for the query.
  – Use the fingerprint as an index into the buckets of a hash table.
  – If a collision occurs, only compare the query to items in the same bucket.
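A minimal sketch of the bucket lookup itself, assuming binary codes are already available (build_hash_table and query_hash_table are illustrative helper names):

```python
import numpy as np
from collections import defaultdict

def build_hash_table(db_codes):
    """Group database item indices by their binary hashcode (the bucket key)."""
    table = defaultdict(list)
    for idx, code in enumerate(db_codes):
        table[tuple(code)].append(idx)
    return table

def query_hash_table(table, query_code):
    """Return candidate neighbours: only the items that collide in the query's bucket."""
    return table.get(tuple(query_code), [])

# Toy example with 6-bit codes for four database images
db_codes = np.array([[1, 1, 0, 1, 0, 1],
                     [0, 1, 0, 1, 1, 1],
                     [1, 1, 0, 1, 0, 1],
                     [1, 1, 1, 1, 0, 1]])
table = build_hash_table(db_codes)
print(query_hash_table(table, np.array([1, 1, 0, 1, 0, 1])))  # -> [0, 2]
```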

[Figure: hashing-based retrieval pipeline. A query image is hashed (e.g. 110101) into a bucket of the hash table; similarity is computed only against the database items in the same bucket. Example applications shown: Content-Based IR (image: Imense Ltd), Location Recognition (image: Doersch et al.), Near-duplicate detection (image: Xu et al.).]

• Hashtable buckets are the polytopes formed by intersecting hyperplanes in the image descriptor space. Thresholds partition each bucket into sub-regions, each with a unique hashcode. We want related images to fall within the same sub-region of a bucket.

• This work: learn hyperplanes and thresholds to encourage collisions between semantically related images.

PART 1: SUPERVISED DATA-SPACE PARTITIONING

• Step A: Use LSH [1] to initialise the hashcode bits $B \in \{-1, 1\}^{N_{trd} \times K}$
  ($N_{trd}$: # training data-points, $K$: # bits)
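A sketch of this initialisation, assuming sign-random-projection LSH on the raw descriptors (mean-centring, if any, is omitted; the function name is mine):

```python
import numpy as np

def lsh_init_bits(X, K, seed=0):
    """Step A sketch: K random hyperplanes give the initial bits
    B in {-1, +1}^(N_trd x K)."""
    rng = np.random.default_rng(seed)
    W0 = rng.standard_normal((X.shape[1], K))   # random hyperplane normals
    B = np.sign(X @ W0)
    B[B == 0] = 1                               # resolve the rare zero projections
    return B
```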

• Repeat for M iterations:

– Step B: Graph regularisation, update the bits of each data-point to be the average of its nearest neighbours (Steps B and C are sketched in code after this list):

  $B \leftarrow \mathrm{sgn}\big(\alpha\, S D^{-1} B + (1-\alpha)\, B\big)$

  ∗ $S \in \{0, 1\}^{N_{trd} \times N_{trd}}$: adjacency matrix, $D \in \mathbb{Z}_{+}^{N_{trd} \times N_{trd}}$: diagonal degree matrix, $B \in \{-1, 1\}^{N_{trd} \times K}$: bits, $\alpha \in [0, 1]$: interpolation parameter, sgn: sign function

– Step C: Data-space partitioning, learn hyperplanes that predict the K bits with maximum margin:

  for $k = 1 \ldots K$:  $\min\; \|w_k\|^2 + C \sum_{i=1}^{N_{trd}} \xi_{ik}$
  s.t. $B_{ik}(w_k^{\top} x_i) \geq 1 - \xi_{ik}$ for $i = 1 \ldots N_{trd}$

  ∗ $w_k \in \mathbb{R}^D$: hyperplane, $x_i$: image descriptor, $B_{ik}$: bit $k$ for data-point $x_i$, $\xi_{ik}$: slack variable
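A sketch of Steps B and C under the notation above (function names are mine; LinearSVC minimises a squared-hinge loss by default, a small departure from the hinge-loss objective, and the bias term is dropped to match the formulation):

```python
import numpy as np
from sklearn.svm import LinearSVC

def graph_regularise_bits(B, S, alpha=0.5):
    """Step B: B <- sgn(alpha * S D^{-1} B + (1 - alpha) * B).
    Dividing each column of the 0/1 adjacency matrix S by the node degree
    implements the S D^{-1} normalisation."""
    deg = np.maximum(S.sum(axis=0), 1)           # diagonal of D, clamped to avoid /0
    B_new = np.sign(alpha * (S / deg) @ B + (1 - alpha) * B)
    B_new[B_new == 0] = 1                        # resolve zero projections
    return B_new

def learn_hyperplanes(X, B, C=1.0):
    """Step C: fit one linear SVM per bit so that hyperplane w_k predicts
    bit k of B with a large margin."""
    hyperplanes = []
    for k in range(B.shape[1]):
        svm = LinearSVC(C=C, fit_intercept=False)
        svm.fit(X, B[:, k])                      # labels are the {-1, +1} bits
        hyperplanes.append(svm.coef_.ravel())    # w_k in R^D
    return np.stack(hyperplanes, axis=1)         # W with shape (D, K)
```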

• Use the learnt hyperplanes $\{w_k \in \mathbb{R}^D\}_{k=1}^{K}$ to generate $K$ projected dimensions $\{y^k \in \mathbb{R}^{N_{trd}}\}_{k=1}^{K}$ for quantisation.
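Putting Part 1 together, a hypothetical driver using the sketches above (the value of M, alpha = 0.5 and whether the bits are refreshed between iterations are assumptions; the poster only fixes the order Step A, then repeated Steps B and C):

```python
# X: (N_trd, D) image descriptors, S: (N_trd, N_trd) adjacency matrix (supervision)
B = lsh_init_bits(X, K=32)               # Step A: LSH initialisation
for _ in range(2):                       # M iterations
    B = graph_regularise_bits(B, S)      # Step B: graph regularisation
    W = learn_hyperplanes(X, B)          # Step C: max-margin hyperplanes
Y = X @ W                                # columns of Y are the K projected dimensions
```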

PART 2: SUPERVISED QUANTISATION THRESHOLD LEARNING

• Thresholds $t^k = [t_{k1}, t_{k2}, \ldots, t_{kT}]$ are learnt to quantise projected dimension $y^k$, where $T \in \{1, 3, 7, 15\}$ is the number of thresholds.
• We formulate an $F_1$-measure objective function that seeks a quantisation respecting the constraints in $S$. Define $P^k \in \{0, 1\}^{N_{trd} \times N_{trd}}$:

$P^k_{ij} = \begin{cases} 1, & \text{if } \exists\, \gamma \text{ s.t. } t_{k\gamma} \leq (y^k_i, y^k_j) < t_{k(\gamma+1)} \\ 0, & \text{otherwise} \end{cases}$

• $\gamma \in \mathbb{Z}$, $0 \leq \gamma \leq T$. $P^k$ indicates whether or not the projections $(y^k_i, y^k_j)$ fall within the same thresholded region. The algorithm

counts true positives (TP), false negatives (FN) and false positives (FP):

$TP = \tfrac{1}{2}\|P \circ S\|_1 \qquad FN = \tfrac{1}{2}\|S\|_1 - TP \qquad FP = \tfrac{1}{2}\|P\|_1 - TP$

• $\circ$ is the Hadamard product, $\|\cdot\|_1$ is the $L_1$ matrix norm. TP is the number of +ve pairs in the same thresholded region, FP is the number of −ve pairs in the same region, and FN is the number of +ve pairs that fall in different regions. The counts are combined using the $F_1$-measure, which is optimised by an Evolutionary Algorithm [3]:

$F_1(t^k) = \dfrac{2\|P \circ S\|_1}{\|S\|_1 + \|P\|_1}$
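A sketch of scoring one candidate threshold vector against this objective (the function name is mine; np.digitize assigns each projection to a thresholded region, and diagonal self-pairs are not treated specially here):

```python
import numpy as np

def f1_threshold_objective(y_k, t_k, S):
    """F1(t_k) = 2*||P o S||_1 / (||S||_1 + ||P||_1) for one projected dimension.
    y_k: (N,) projections, t_k: candidate thresholds, S: (N, N) 0/1 adjacency."""
    regions = np.digitize(y_k, np.sort(t_k))                    # region index per data-point
    P = (regions[:, None] == regions[None, :]).astype(float)    # same-region indicator
    return 2.0 * np.sum(P * S) / (np.sum(S) + np.sum(P) + 1e-12)

# The thresholds themselves would then be searched, e.g. with an evolutionary
# algorithm as in [3], scoring each candidate t_k with this function.
```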

ILLUSTRATING THE KEY ALGORITHMIC STEPS

[Figure: four panels illustrating the algorithm on a 2-D toy example. (a) Initialisation: LSH assigns initial ±1 bits to the data-points. (b) Regularisation: bits are smoothed towards those of their nearest neighbours. (c) Partitioning: hyperplanes w1 and w2 are learnt to predict the bits with maximum margin. (d) Quantisation: the threshold t1 on projected dimension #1 is repositioned, raising the F-measure from 0.67 to 1.00.]

EXPERIMENTAL RESULTS (HAMMING RANKING AUPRC)

• Retrieval evaluation on CIFAR-10. Baselines: single static threshold (SBQ) [1], multiple threshold optimisation (NPQ) [3], supervised projection (GRH) [2], and variable threshold learning (VBQ) [4].
• LSH [1], PCA, SKLSH [5] and SH [6] are used to initialise the bits in B.

[Plot: Test AUPRC (0.00–0.60) versus number of bits (16–128) on CIFAR-10 for LSH+GRH+NPQ, LSH+GRH+SBQ and LSH+NPQ.]

CIFAR-10 (AUPRC)

           SBQ      NPQ      GRH      VBQ      GRH+NPQ
  LSH      0.0954   0.1621   0.2023   0.2035   0.2593
  SH       0.0626   0.1834   0.2147   0.2380   0.2958
  PCA      0.0387   0.1660   0.2186   0.2579   0.2791
  SKLSH    0.0513   0.1063   0.1652   0.2122   0.2566

• Learning hyperplanes and thresholds (GRH+NPQ) most effective.
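For context, a sketch of how such Hamming-ranking AUPRC figures could be computed (the function name and the label-based relevance protocol are assumptions, not details given on the poster):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def hamming_ranking_auprc(query_codes, db_codes, relevance):
    """Rank the database by Hamming distance to each query and average the
    area under the precision-recall curve. Codes are {-1, +1} matrices;
    relevance[q, i] = 1 if database item i shares query q's class label."""
    scores = []
    for q, code in enumerate(query_codes):
        dists = np.count_nonzero(db_codes != code, axis=1)           # Hamming distances
        prec, rec, _ = precision_recall_curve(relevance[q], -dists)  # closer = higher score
        scores.append(auc(rec, prec))
    return float(np.mean(scores))
```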

CONCLUSIONS AND REFERENCES

• Hashing model that learns both the hyperplanes and the thresholds. Found to have the highest retrieval effectiveness versus competing models.
• Future work: closer integration of both steps in a unified objective.

References:
[1] Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. STOC, 1998.
[2] Moran, S., Lavrenko, V.: Graph Regularised Hashing. In Proc. ECIR, 2015.
[3] Moran, S., Lavrenko, V., Osborne, M.: Neighbourhood Preserving Quantisation for LSH. In Proc. SIGIR, 2013.
[4] Moran, S., Lavrenko, V., Osborne, M.: Variable Bit Quantisation for LSH. In Proc. ACL, 2013.
[5] Raginsky, M., Lazebnik, S.: Locality-sensitive binary codes from shift-invariant kernels. In Proc. NIPS, 2009.
[6] Weiss, Y., Torralba, A., Fergus, R.: Spectral Hashing. In Proc. NIPS, 2008.