Regularised Cross-Modal Hashing (SIGIR'15 poster)


Upload: sean-moran

Post on 07-Aug-2015


TRANSCRIPT

REGULARISED CROSS-MODAL HASHING

SEAN MORAN†, VICTOR LAVRENKO† sean.moran@ed.ac.uk

CROSS-MODAL HASHING: FAST RETRIEVAL WITH HASHCODES

• Encourage collisions between similar images and documents

[Figure: query and database items are mapped by a hash function H to binary hashcodes (e.g. 110101, 010111, 010101, 111101) stored in a hashtable; similarity is computed only between the query and its nearest neighbours. Example labels: Tiger, Camera, Flowers, Car.]
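The lookup idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the item ids, the hashcodes and the helper names (`build_hashtable`, `lookup`) are all made up for the example, and it assumes that both modalities have already been mapped to hashcodes of the same length.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of bit positions at which two equal-length hashcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_hashtable(codes):
    """Map each hashcode to the list of item ids stored under it."""
    table = defaultdict(list)
    for item_id, code in codes.items():
        table[code].append(item_id)
    return dict(table)


def lookup(table, query_code, radius=0):
    """Return ids from all buckets within `radius` bits of the query code.

    Linear scan over buckets for clarity; a real system would probe the
    neighbouring codes directly instead of scanning every bucket.
    """
    hits = []
    for code, ids in table.items():
        if hamming(code, query_code) <= radius:
            hits.extend(ids)
    return sorted(hits)


# Hypothetical image database: ids and hashcodes are illustrative only.
image_codes = {
    "tiger_1": "110101",
    "tiger_2": "110101",
    "car_1": "010111",
    "flowers_1": "111101",
}
table = build_hashtable(image_codes)

text_query_code = "110101"  # assumed hashcode of a text query such as "tiger"
print(lookup(table, text_query_code))            # exact collisions only
print(lookup(table, text_query_code, radius=1))  # also buckets one bit away
```

Only the items in colliding (or nearby) buckets need an exact similarity computation, which is what makes retrieval fast.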

REGULARISED CROSS-MODAL HASHING (RCMH)

• Step A: Randomly initialise word modality bits $B \in \{-1, 1\}^{N \times K}$

$N$: # data-points, $K$: # bits

• Repeat M times:

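– Step B: Graph Regularisation

$B \leftarrow \mathrm{sgn}\left(\alpha\, S D^{-1} B + (1-\alpha)\, B\right)$

∗ $S \in \{0, 1\}^{N \times N}$: adjacency matrix, $D \in \mathbb{Z}_{+}^{N \times N}$: diagonal degree matrix, $B \in \{-1, 1\}^{N \times K}$: bits, $\alpha \in [0, 1]$: interpolation parameter, sgn: sign function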

– Step C: Word Data-Space Partitioning

for $k = 1 \ldots K$: $\min \|f_k\|^2 + C \sum_{i=1}^{N} \xi_{ik}$

s.t. $B_{ik}(f_k^{\top} a_i + b_k) \geq 1 - \xi_{ik}$ for $i = 1 \ldots N$

∗ $f_k \in \mathbb{R}^{D}$: word hyperplane, $b_k \in \mathbb{R}$: bias, $a_i \in \mathbb{R}^{D}$: word descriptor, $B_{ik}$: bit $k$ for word data-point $a_i$, $\xi_{ik}$: slack variable

– Step C: Visual Data-Space Partitioning

for $k = 1 \ldots K$: $\min \|g_k\|^2 + C \sum_{i=1}^{N} \xi_{ik}$

s.t. $B_{ik}(g_k^{\top} v_i + t_k) \geq 1 - \xi_{ik}$ for $i = 1 \ldots N$

∗ $g_k \in \mathbb{R}^{D}$: visual hyperplane, $t_k \in \mathbb{R}$: bias, $v_i \in \mathbb{R}^{D}$: visual descriptor, $B_{ik}$: bit $k$ for word data-point $a_i$

– Step D: Update matrix $B$ using word hyperplanes: $B_{ik} = \mathrm{sgn}(f_k^{\top} a_i + b_k)$

• Use hyperplanes $\{f_k, g_k\}_{k=1}^{K}$ to encode unseen data-points
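To make the final encoding step concrete, here is a small sketch of how an unseen text or image descriptor could be turned into a hashcode once the hyperplanes are known. The hyperplane and bias values below are hypothetical placeholders rather than learned parameters, the `encode` helper is an illustrative name, and mapping sgn(0) to +1 is an assumption the poster does not specify.

```python
def sgn(x):
    """Sign function; sgn(0) is mapped to +1 here (an assumption)."""
    return 1 if x >= 0 else -1


def encode(x, hyperplanes, biases):
    """Bit k of the hashcode is sgn(w_k . x + bias_k)."""
    return [
        sgn(sum(w_j * x_j for w_j, x_j in zip(w, x)) + bias)
        for w, bias in zip(hyperplanes, biases)
    ]


# Hypothetical parameters for K = 2 bits (not learned values).
F = [[1.0, 0.0], [0.0, 1.0]]             # word hyperplanes f_1, f_2
b = [-0.5, -0.5]                         # word biases b_1, b_2
G = [[1.0, -1.0, 0.0], [0.0, 1.0, 1.0]]  # visual hyperplanes g_1, g_2
t = [0.0, -1.0]                          # visual biases t_1, t_2

text_code = encode([0.9, 0.1], F, b)         # unseen word descriptor
image_code = encode([0.8, 0.2, 0.3], G, t)   # unseen visual descriptor
print(text_code, image_code)
```

Because both modalities are trained against the same bit matrix, a text query encoded with the word hyperplanes can be compared directly, by Hamming distance, with images encoded with the visual hyperplanes.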

STEP B: GRAPH REGULARISATION

[Figure: neighbourhood graph over data-points a to e, each annotated with a 2-bit hashcode and a label (Car or Tiger).]
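The graph regularisation update $B \leftarrow \mathrm{sgn}(\alpha S D^{-1} B + (1-\alpha) B)$ can be traced on a tiny graph. The sketch below implements the formula literally in plain Python; the three-node graph, the starting bits and the choice sgn(0) = +1 are assumptions made for the example, and `graph_regularise` is an illustrative name.

```python
def sgn(x):
    """Sign function; sgn(0) is mapped to +1 here (an assumption)."""
    return 1 if x >= 0 else -1


def graph_regularise(S, B, alpha):
    """One update B <- sgn(alpha * S D^-1 B + (1 - alpha) * B).

    S is an N x N 0/1 adjacency matrix and B an N x K matrix of +/-1 bits,
    both given as lists of lists. D is the diagonal degree matrix of S.
    """
    n, k = len(B), len(B[0])
    degree = [sum(row) for row in S]
    new_B = []
    for i in range(n):
        row = []
        for bit in range(k):
            # Entry (i, bit) of S D^-1 B: neighbours' bits scaled by 1/degree.
            neighbour_term = sum(
                S[i][j] * B[j][bit] / degree[j]
                for j in range(n)
                if degree[j] > 0
            )
            row.append(sgn(alpha * neighbour_term + (1 - alpha) * B[i][bit]))
        new_B.append(row)
    return new_B


# Toy graph: three mutually connected data-points (no self-loops).
S = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
# The third point disagrees with its neighbours on the first bit.
B = [[1, 1],
     [1, 1],
     [-1, 1]]

print(graph_regularise(S, B, alpha=0.8))
```

With `alpha=0.8` the third point's first bit flips to agree with its two neighbours, while `alpha=0.0` leaves `B` unchanged, which shows how the interpolation parameter trades off neighbourhood consensus against the current bits.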

STEP C: DATASET PARTITIONING IN WORD AND VISUAL SPACE (LEARN HASHING HYPERPLANES)

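[Figure: two panels. Word Space (axes a1, a2): data-points a to e, labelled Car or Tiger and annotated with 2-bit hashcodes, partitioned by hyperplanes f1 and f2. Visual Space (axes v1, v2): the same data-points partitioned by hyperplanes g1 and g2.]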
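The partitioning step can be sketched as fitting one max-margin hyperplane per bit in each modality, with the regularised bits as labels. The code below is an illustration under stated assumptions: it minimises the same objective, $\|w\|^2 + C \sum_i \xi_i$, by plain subgradient descent on the hinge loss instead of using a dedicated SVM solver, and the descriptors, bits, learning rate and epoch count are invented for the example.

```python
def sgn(x):
    """Sign function; sgn(0) is mapped to +1 here (an assumption)."""
    return 1 if x >= 0 else -1


def fit_hyperplane(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)).

    Uses batch subgradient descent, a simple stand-in for an SVM solver.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [2.0 * w_j for w_j in w]
        grad_b = 0.0
        for x_i, y_i in zip(X, y):
            margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
            if margin < 1.0:  # point violates the margin: hinge term is active
                for j in range(dim):
                    grad_w[j] -= C * y_i * x_i[j]
                grad_b -= C * y_i
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def learn_hash_functions(X, B):
    """Fit one (hyperplane, bias) pair per bit, using column k of B as labels."""
    return [fit_hyperplane(X, [row[k] for row in B]) for k in range(len(B[0]))]


def encode(x, planes):
    """Hashcode of descriptor x under a list of (hyperplane, bias) pairs."""
    return [sgn(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b) for w, b in planes]


# Toy data: 4 data-points, K = 2 bits, 2-D word and 3-D visual descriptors.
A = [[2, 2], [2, -2], [-2, 2], [-2, -2]]              # word descriptors a_i
V = [[2, 2, 1], [2, -2, 1], [-2, 2, 1], [-2, -2, 1]]  # visual descriptors v_i
B = [[1, 1], [1, -1], [-1, 1], [-1, -1]]              # bits after Step B

word_planes = learn_hash_functions(A, B)    # plays the role of {f_k, b_k}
visual_planes = learn_hash_functions(V, B)  # plays the role of {g_k, t_k}

print([encode(a, word_planes) for a in A])
print([encode(v, visual_planes) for v in V])
```

On this separable toy data both sets of hyperplanes reproduce the bits in `B`, so a word descriptor and its paired visual descriptor receive the same hashcode. Re-encoding the word descriptors with `word_planes` corresponds to the Step D update.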

QUANTITATIVE RESULTS (NUS-WIDE, 64 BITS, HAMMING RANKING MEAN AVERAGE PRECISION (MAP))

[Figure: two bar charts of mAP by model (CRH, CVH, CMSSH, IMH, PDH, RCMH). Panel "Text Database - Image Query": mAP axis from 0.30 to 0.45, RCMH +6% vs PDH. Panel "Image Query - Text Database": mAP axis from 0.30 to 0.50, RCMH +9% vs PDH.]
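For readers unfamiliar with the metric, the following sketch shows one common way to compute mean average precision under Hamming ranking. The poster does not spell out its evaluation protocol, so the definition of average precision used here, the tie-breaking by database order, and the toy hashcodes and relevance sets are all assumptions.

```python
def hamming(a, b):
    """Number of bit positions at which two equal-length hashcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code, db_codes, relevant):
    """AP of the database ranked by Hamming distance to the query.

    `relevant` is the set of database indices that are true matches.
    Ties in distance keep database order (Python's sort is stable).
    """
    ranked = sorted(range(len(db_codes)),
                    key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(ranked, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries, db_codes):
    """Mean of AP over (query_code, relevant_set) pairs."""
    return sum(average_precision(q, db_codes, rel)
               for q, rel in queries) / len(queries)


# Toy database of 4-bit hashcodes (illustrative values only).
db_codes = ["1101", "1100", "0010", "0011"]
queries = [
    ("1101", {0, 1}),  # both relevant items rank first: AP = 1.0
    ("0011", {2}),     # the relevant item ranks second: AP = 0.5
]
print(mean_average_precision(queries, db_codes))
```

In a cross-modal setting the query codes would come from one modality (say, text) and the database codes from the other (images).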

• State-of-the-art retrieval effectiveness on standard datasets

• Graph regularisation can produce high-quality cross-modal hashcodes

• No need to solve an intermediate eigenvalue problem (saves $O(D^3)$)

• Future work: extend to fast cross-lingual (e.g. Spanish-English) retrieval