regularised cross-modal hashing (sigir'15 poster)

REGULARISED CROSS-MODAL HASHING

SEAN MORAN†, VICTOR LAVRENKO† [email protected]

CROSS-MODAL HASHING: FAST RETRIEVAL WITH HASHCODES

• Encourage collisions between similar images and documents

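[Figure: a query and the database items are each passed through the hash function H to obtain binary hashcodes (e.g. 110101, 010111, 010101, 111101). The query's hashcode indexes a hashtable bucket, the colliding items are returned as candidate nearest neighbours, and similarity is computed against those candidates only. Example labels: Tiger, Camera, Flowers, Car.]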
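The lookup idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item ids, the toy hashcodes, and the helper names `hamming`, `build_table` and `lookup` are invented here and do not come from the poster.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("hashcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def build_table(codes: dict) -> dict:
    """Map each hashcode (bucket) to the ids of the items stored under it."""
    table = defaultdict(list)
    for item_id, code in codes.items():
        table[code].append(item_id)
    return dict(table)


def lookup(table: dict, query_code: str, radius: int = 0) -> list:
    """Return ids whose bucket lies within `radius` bits of the query code."""
    hits = []
    for code, ids in table.items():
        if hamming(code, query_code) <= radius:
            hits.extend(ids)
    return sorted(hits)


# Toy database: ids and codes are invented for illustration only.
database = {"img_tiger": "110101", "img_car": "010111", "img_flowers": "010101"}
table = build_table(database)

exact = lookup(table, "110101")            # items colliding with the query
near = lookup(table, "010101", radius=1)   # buckets within one bit
print(exact, near)
```

The linear scan over buckets keeps the sketch short. For `radius=0`, a direct dictionary access `table.get(query_code, [])` gives the constant-time lookup that motivates hashing in the first place.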

REGULARISED CROSS-MODAL HASHING (RCMH)

• Step A: Randomly initialise word modality bits $B \in \{-1, 1\}^{N \times K}$

N: # data-points, K: # bits

• Repeat M times:

– Step B: Graph Regularisation

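$B \leftarrow \mathrm{sgn}\left(\alpha S D^{-1} B + (1 - \alpha) B\right)$

∗ $S \in \{0, 1\}^{N \times N}$: adjacency matrix, $D \in \mathbb{Z}_{+}^{N \times N}$: diagonal degree matrix, $B \in \{-1, 1\}^{N \times K}$: bits, $\alpha \in [0, 1]$: interpolation parameter, sgn: sign function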

– Step C: Word Data-Space Partitioning

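for $k = 1 \ldots K$: $\min \|f_k\|^2 + C \sum_{i=1}^{N} \xi_{ik}$

s.t. $B_{ik}(f_k^{\top} a_i + b_k) \geq 1 - \xi_{ik}$ for $i = 1 \ldots N$

∗ $f_k \in \mathbb{R}^D$: word hyperplane, $b_k \in \mathbb{R}$: bias, $a_i \in \mathbb{R}^D$: word descriptor, $B_{ik}$: bit k for word data-point $a_i$, $\xi_{ik}$: slack variable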

– Step C: Visual Data-Space Partitioning

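for $k = 1 \ldots K$: $\min \|g_k\|^2 + C \sum_{i=1}^{N} \xi_{ik}$

s.t. $B_{ik}(g_k^{\top} v_i + t_k) \geq 1 - \xi_{ik}$ for $i = 1 \ldots N$

∗ $g_k \in \mathbb{R}^D$: visual hyperplane, $t_k \in \mathbb{R}$: bias, $v_i \in \mathbb{R}^D$: visual descriptor, $B_{ik}$: bit k for word data-point $a_i$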

– Step D: Update matrix B using word hyperplanes: $B_{ik} = \mathrm{sgn}(f_k^{\top} a_i + b_k)$

• Use hyperplanes $\{f_k, g_k\}_{k=1}^{K}$ to encode unseen data-points
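Encoding an unseen data-point with learned hyperplanes (the same sign rule as Step D) can be sketched as follows. The hyperplane values, the descriptors, and the helper names `encode` and `to_hashcode` are hypothetical, and mapping a zero projection to +1 is an assumption the poster does not specify.

```python
def encode(x, hyperplanes):
    """Return K bits in {-1, +1}, one per (normal vector, bias) pair.

    Bit k is sgn(w_k . x + bias_k); a projection of exactly 0 is mapped
    to +1 (an assumption made here so that every bit is defined).
    """
    bits = []
    for w, bias in hyperplanes:
        projection = sum(wj * xj for wj, xj in zip(w, x)) + bias
        bits.append(1 if projection >= 0 else -1)
    return bits


def to_hashcode(bits):
    """Render {-1, +1} bits as a 0/1 string usable as a hashtable key."""
    return "".join("1" if b > 0 else "0" for b in bits)


# Hypothetical hyperplanes for K = 2 bits; values invented for illustration.
word_planes = [([1.0, 0.0], 0.0), ([0.0, 1.0], -0.5)]     # (f_k, b_k)
visual_planes = [([0.5, 0.5], 0.0), ([-1.0, 1.0], 0.0)]   # (g_k, t_k)

word_code = to_hashcode(encode([2.0, 1.0], word_planes))
image_code = to_hashcode(encode([3.0, 4.0], visual_planes))
other_code = to_hashcode(encode([-1.0, 0.0], word_planes))
print(word_code, image_code, other_code)
```

In this toy setting the word query and the image receive the same code, so they would land in the same hashtable bucket, while the second word descriptor falls into a different one.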

STEP B: GRAPH REGULARISATION

[Figure: neighbourhood graph over data-points a, b, c, d and e, each annotated with a 2-bit hashcode (e.g. 01, 00, 11, 10) and its labels (Car; Tiger, Car).]
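One graph-regularisation update, $B \leftarrow \mathrm{sgn}(\alpha S D^{-1} B + (1 - \alpha) B)$, might look like the sketch below. It is written in plain Python for self-containment. The toy graph, the tie-breaking rule sgn(0) = +1, and the skipping of zero-degree nodes are assumptions made here, not details from the poster.

```python
def sgn(x: float) -> int:
    # Assumption: ties (x == 0) map to +1 so that bits stay in {-1, +1}.
    return 1 if x >= 0 else -1


def graph_regularise(S, B, alpha):
    """One update B <- sgn(alpha * S D^-1 B + (1 - alpha) * B).

    S is an N x N 0/1 adjacency matrix (list of lists), B an N x K
    matrix of bits in {-1, +1}, and alpha lies in [0, 1].
    """
    n, k = len(B), len(B[0])
    degree = [sum(row) for row in S]  # diagonal of D (row sums of S)
    new_B = []
    for i in range(n):
        row = []
        for bit in range(k):
            # (S D^-1 B)[i][bit] = sum_j S[i][j] / degree[j] * B[j][bit]
            smoothed = sum(
                S[i][j] / degree[j] * B[j][bit]
                for j in range(n)
                if S[i][j] and degree[j] > 0
            )
            row.append(sgn(alpha * smoothed + (1 - alpha) * B[i][bit]))
        new_B.append(row)
    return new_B


# Toy graph: three mutually connected points, one of which disagrees.
S = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
B = [[1], [1], [-1]]

smoothed_B = graph_regularise(S, B, alpha=0.9)
unchanged_B = graph_regularise(S, B, alpha=0.0)
print(smoothed_B, unchanged_B)
```

With a large α the outlying bit is pulled towards the value held by its neighbours, while α = 0 leaves the bits untouched.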

STEP C: DATASET PARTITIONING IN WORD AND VISUAL SPACE (LEARN HASHING HYPERPLANES)

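[Figure: two panels. Word Space (axes a1, a2): hyperplanes f1 and f2 partition data-points a, b, c, d and e. Visual Space (axes v1, v2): hyperplanes g1 and g2 partition the same data-points. Each point carries a 2-bit hashcode and its labels (Car; Tiger, Car).]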
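Step C fits one max-margin hyperplane per bit in each modality, with the bits in B as targets. The sketch below minimises the equivalent unconstrained hinge-loss objective $\|w\|^2 + C \sum_i \max(0, 1 - y_i(w^{\top} x_i + b))$ by plain subgradient descent. That solver is a stand-in chosen for brevity, not the one used by the authors. The learning rate, epoch count, toy descriptors and function names are all assumptions.

```python
def train_hyperplane(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on ||w||^2 + C * sum of hinge losses."""
    d = len(X[0])
    w, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [2.0 * wj for wj in w]  # gradient of ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + bias)
            if margin < 1:  # point violates the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        bias -= lr * grad_b
    return w, bias


def learn_hyperplanes(X, B, **kwargs):
    """Fit one hyperplane per bit: column k of B supplies the +/-1 targets."""
    K = len(B[0])
    return [train_hyperplane(X, [row[k] for row in B], **kwargs) for k in range(K)]


def bit_for(x, w, bias):
    """Sign of the projection onto one hyperplane (0 maps to +1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + bias >= 0 else -1


# Toy paired data: row i of each matrix describes the same data-point.
word_X = [[2.0, 2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0]]
visual_X = [[1.0, 3.0], [3.0, -1.0], [-3.0, 1.0], [-1.0, -3.0]]
B = [[1, 1], [1, -1], [-1, 1], [-1, -1]]

word_planes = learn_hyperplanes(word_X, B)      # plays the role of (f_k, b_k)
visual_planes = learn_hyperplanes(visual_X, B)  # plays the role of (g_k, t_k)

word_bits = [[bit_for(x, *p) for p in word_planes] for x in word_X]
visual_bits = [[bit_for(x, *p) for p in visual_planes] for x in visual_X]
print(word_bits, visual_bits)
```

Because both modalities are trained against the same bit matrix B, a word descriptor and its paired visual descriptor are pushed to the same side of their respective k-th hyperplanes, which is what produces cross-modal collisions.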

QUANTITATIVE RESULTS (NUS-WIDE, 64 BITS, HAMMING RANKING MEAN AVERAGE PRECISION (MAP))

[Figure: two bar charts of mAP by model (CRH, CVH, CMSSH, IMH, PDH, RCMH). Panel "Text Database - Image Query" (mAP axis 0.30 to 0.45), annotated "+6% vs PDH". Panel "Image Query - Text Database" (mAP axis 0.30 to 0.50), annotated "+9% vs PDH".]
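For readers unfamiliar with the metric, Hamming-ranking mAP can be computed roughly as sketched below: rank the database by Hamming distance to each query, take the average precision of that ranking, and average over queries. The relevance rule (an item is relevant if it shares the query's label), the toy codes and the function names are assumptions; the poster does not state its exact evaluation protocol.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(ranked_relevance):
    """Mean of precision@rank taken at each rank holding a relevant item."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries, database):
    """queries and database are lists of (hashcode, label) pairs."""
    aps = []
    for q_code, q_label in queries:
        # sorted() is stable, so ties keep their database order.
        ranked = sorted(database, key=lambda item: hamming(q_code, item[0]))
        aps.append(average_precision([label == q_label for _, label in ranked]))
    return sum(aps) / len(aps)


# Toy data, invented for illustration only.
database = [("0101", "tiger"), ("0100", "tiger"), ("1010", "car"), ("1011", "car")]
queries = [("0101", "tiger"), ("1110", "car")]

score = mean_average_precision(queries, database)
print(round(score, 4))
```

The first query ranks both relevant items at the top (average precision 1), while the second places its relevant items at ranks 1 and 3 (average precision 5/6), so the mean is 11/12.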

• State-of-the-art retrieval effectiveness on standard datasets

• Graph regularisation can produce high-quality cross-modal hashcodes

• No need to solve an intermediate eigenvalue problem (saves $O(D^3)$)

• Future work: extend to fast cross-lingual (e.g. Spanish-English) retrieval
