A Sketch-based Approach for Multimedia Retrieval
MS (by research) Thesis, August, 2012 - November, 2016
Koustav Ghosal
Supervisor: Dr. Anoop Namboodiri
Centre for Visual Information Technology, IIIT Hyderabad
November 29, 2016
Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 1 / 55
Outline
1 Motivation
2 Video Retrieval
3 Image Retrieval
4 Zero-Shot Learning
1 Motivation
Content Based Multimedia Retrieval: Textual and Example Based Queries
Figure: (a) Query by Text (b) Query by Example
Content Based Multimedia Retrieval: Limitations of Text based Query
"Car"
Figure: Image Search
Metadata may not represent the original content.
Content Based Multimedia Retrieval: Limitations of Text based Query
"All those red coloured vehicles which came south and turned left"
Figure: Tracking
A more complicated event means a more complicated query.
Content Based Multimedia Retrieval: Limitations of Example based Query
Examples are not always available. In fact, their absence is often the reason for the search.
Why Sketch-based queries? Advantages
A sketch efficiently encodes information such as shape, pose, colour and size, all at once.
A free-hand sketch is more convenient to draw than typing lengthy queries.
A sketch is closer to the content of a video than metadata (tags, comments, captions).
Challenges in Sketch-based systems: Perceptual Variability
Figure: Different interpretations of the same trajectory. (a) Original (b), (c), (d) User Inputs
Motion perception varies cognitively from one human being to another.
Challenges in Sketch-based systems: Multimodality
An image is a multi-channel dense representation of an object/scene.
Figure: (a) Query (b) Results (c) Desired Output
"A simple sketch is a high level sparse representation of the object/scene being searched for." [Li et al., 2015]
Challenges in Sketch-based systems: Scarcity of Data
Figure: Number of samples. (a) ImageNet: 14,197,122 (b) Caltech: 30,607 (c) TU-Berlin: 20,000 (d) Sketchy: 75,471
The system should be able to generalize to unknown classes.
Challenges in Sketch-based systems: Summary
Perceptual Variability : Video Retrieval
Multimodality : Image Retrieval
Scarcity of Data : Zero Shot Learning
2 Video Retrieval
Motion
Videos are characterized by motion.
Figure: (a) Human Tracking (b) Sports Analysis
Image courtesy: www.google.com
Features: Qualitative
Objective: features that minimize perceptual variability.
Qualitative Spatio-Temporal Features
Features based on motion properties which tell us “how” rather than “how much”.
Shape
Direction
Scale
Combines 3 different aspects of motion.
Aspects of Motion: Shape
Figure: A sample motion with the corresponding m-segments: (a) Original (b) Smooth and Normalized (c) m-segments (d) Circle-Based Representation
Aspects of Motion: Shape
$$J = \min_{x_0, y_0, r} \sum_{i}^{n} \left( x_i^2 + y_i^2 - 2 x_0 x_i - 2 y_0 y_i + x_0^2 + y_0^2 + r^2 \right)$$
$$S = (x_\mu, y_\mu, r, w, s)$$
$(x_\mu, y_\mu)$, $r$ = center and radius of the circle.
$w$ = slope of the segment.
$s$ = normalized length of the arc.
K-Means, Bag of Motion
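The circle-based representation needs a centre and a radius for every m-segment. The following is a minimal sketch of one way to obtain them, assuming the standard algebraic least-squares circle fit, in which the points are taken to satisfy $(x_i - x_0)^2 + (y_i - y_0)^2 = r^2$ and the equation is linearized. The function name `fit_circle` and the `(n, 2)` input layout are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def fit_circle(points):
    """Algebraic least-squares circle fit (an assumed, standard technique).

    points: array-like of shape (n, 2) holding the (x, y) samples of one segment.
    Returns the fitted centre (x0, y0) and radius r.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # (x - x0)^2 + (y - y0)^2 = r^2 rearranges to the linear system
    # 2*x0*x + 2*y0*y + c = x^2 + y^2, with c = r^2 - x0^2 - y0^2.
    A = np.column_stack([2.0 * x, 2.0 * y, np.ones_like(x)])
    b = x ** 2 + y ** 2
    x0, y0, c = np.linalg.lstsq(A, b, rcond=None)[0]
    r = np.sqrt(c + x0 ** 2 + y0 ** 2)
    return float(x0), float(y0), float(r)
```

The fitted centre and radius could then fill the first three entries of a segment descriptor, with the slope and normalized arc length computed separately.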
Aspects of Motion: Scale, Direction
Figure: A spiral motion from our synthetic dataset. (a) Points sampled equidistantly in each segment (b) Directions tracked for each equipoint segment (c) Temporal change of direction (d) Temporal change of scale
Trajectory Direction = $(\alpha_1, \alpha_2, \ldots, \alpha_n)$, where $\alpha_k = \sin \theta_k$
Trajectory Scale = $(d_1, d_2, \ldots, d_n)$, where $d_k$ = distance of the current segment from the mean.
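As an illustration of the two sequences, here is a small sketch that derives them from a sampled trajectory. It assumes that $\theta_k$ is the angle of the k-th segment with the x-axis and that the "distance from the mean" is measured from the segment midpoint to the mean of all trajectory points; both readings, and the name `direction_and_scale`, are assumptions rather than details given on the slide.

```python
import numpy as np

def direction_and_scale(points):
    """Direction and scale sequences of a trajectory sampled as (n, 2) points."""
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)                    # one vector per segment
    theta = np.arctan2(seg[:, 1], seg[:, 0])      # assumed: angle with the x-axis
    alpha = np.sin(theta)                         # alpha_k = sin(theta_k)
    midpoints = 0.5 * (pts[:-1] + pts[1:])
    centre = pts.mean(axis=0)                     # assumed: mean of all points
    d = np.linalg.norm(midpoints - centre, axis=1)
    return alpha, d
```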
Summary of Features: 4 types
Bag of Motion: Trajectory = Histogram.
Ordered Bag of Motion: Trajectory = $(s_1, s_2, \ldots, s_m)$, where $s_k = (x_\mu, y_\mu, r, w, s)_k$
Direction: $(\alpha_1, \alpha_2, \ldots, \alpha_n)$
Scale: $(d_1, d_2, \ldots, d_n)$
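The Bag of Motion feature turns a variable-length list of segment descriptors into a fixed-length histogram. A minimal sketch of that step follows, assuming a codebook of cluster centres has already been learned (for example with K-Means) and that each descriptor votes for its nearest centre; the function name `bag_of_motion` and the normalization to unit sum are illustrative assumptions.

```python
import numpy as np

def bag_of_motion(descriptors, codebook):
    """Histogram of nearest-codeword assignments for one trajectory.

    descriptors: (m, d) array, one row per m-segment descriptor.
    codebook:    (k, d) array of cluster centres (assumed to be precomputed).
    """
    D = np.asarray(descriptors, dtype=float)
    C = np.asarray(codebook, dtype=float)
    # Pairwise Euclidean distances between descriptors and codewords.
    dist = np.linalg.norm(D[:, None, :] - C[None, :, :], axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(C)).astype(float)
    return hist / hist.sum()
```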
Pipeline
Diagram labels: Query, Database, Filters (1. Bag of Motions, 2. Ordered BoM, 3. Direction, 4. Scale), Update Score, Results.
Figure: Cascaded Retrieval
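The slide gives only the diagram, so the following is a loose sketch of what a cascade of filters with a running score could look like, not the method used in the thesis. It assumes each filter is a distance function, that scores are accumulated additively, and that a fixed fraction of candidates survives each stage; `cascaded_retrieval`, `filters` and `keep` are hypothetical names.

```python
def cascaded_retrieval(query, database, filters, keep=0.5):
    """Rank database items by passing them through a sequence of filters.

    filters: list of functions f(query, item) -> distance (smaller is closer).
    keep:    assumed fraction of candidates kept after each filter.
    Returns the indices of the surviving items, best first.
    """
    candidates = list(range(len(database)))
    scores = {i: 0.0 for i in candidates}
    for distance in filters:
        for i in candidates:
            scores[i] += distance(query, database[i])   # update the running score
        candidates.sort(key=lambda i: scores[i])
        n_keep = max(1, int(len(candidates) * keep))
        candidates = candidates[:n_keep]
    return candidates
```

In this reading, the four filters would be distance functions over the Bag of Motion, Ordered BoM, Direction and Scale features, applied in that order.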
Datasets
Figure: (a) Pool Dataset: five classes containing 20 videos each. (b) Synthetic Dataset: five classes containing 20 videos each. Together, the two datasets contain 200 videos.
Results: A Sample Retrieval
Figure: Qualitative Results
Results: Quantitative Results (Accuracy)
Figure: Accuracy at different top K retrievals (x-axis: Top K retrievals, 0 to 16; y-axis: Accuracy, 0 to 100).
(a) Pool Videos dataset, bar values in order of increasing K: 33, 50, 56, 57, 60, 62, 66, 67, 69, 69, 71, 74, 74, 76, 78
(b) Synthetic Motion dataset, bar values in order of increasing K: 26, 38, 43, 50, 56, 58, 62, 67, 69, 71, 72, 75, 78, 78, 79
Results: Quantitative Results (Mean Reciprocal Rank)
Histograms of reciprocal ranks (x-axis: Reciprocal Ranks, 0.0 to 1.0; y-axis: Number of Queries, 0 to 50): (a) Pool Videos dataset (b) Synthetic Motion dataset
Figure: Mean Reciprocal Ranks
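For reference, both retrieval metrics can be computed from ranked result lists as in the sketch below. The reciprocal rank follows the usual definition (one over the position of the first relevant result). The top K accuracy is assumed here to be the fraction of queries with at least one relevant result among the first K, which is a common reading but is not spelled out on the slides; all function names are illustrative.

```python
def reciprocal_rank(ranked_labels, query_label):
    """1 / position of the first result whose label matches the query, else 0."""
    for position, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(results):
    """results: list of (ranked_labels, query_label) pairs, one per query."""
    return sum(reciprocal_rank(r, q) for r, q in results) / len(results)

def accuracy_at_k(results, k):
    """Assumed definition: share of queries with a relevant hit in the top k."""
    hits = sum(1 for r, q in results if q in r[:k])
    return hits / len(results)
```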
3 Image Retrieval
Sparsity of Sketches
Sketches and images are two different modalities.
Figure: (a) Query (b) Results (c) Desired Output
They should not be compared directly.
Our Model: Two Modalities
Our Model: Correspondence
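Standard CCA
Given two sets $I$ and $S$,
$$I = (I_1, I_2, \ldots, I_n), \qquad S = (S_1, S_2, \ldots, S_n)$$
we try to find two subspaces,
$$P_I = \langle W_I, I \rangle, \qquad P_S = \langle W_S, S \rangle$$
such that their correlation is maximized:
$$\rho = \max_{W_I, W_S} \operatorname{corr}(P_I, P_S) = \max_{W_I, W_S} \frac{\langle P_S, P_I \rangle}{\|P_S\| \, \|P_I\|} \qquad (1)$$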
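Standard CCA (continued)
As derived in [Hardoon et al., 2004], Equation 1 reduces to
$$\rho = \max_{W_I, W_S} \frac{W_I' \, \mathrm{Cov}_{IS} \, W_S}{\sqrt{W_I' \, \mathrm{Cov}_{II} \, W_I} \, \sqrt{W_S' \, \mathrm{Cov}_{SS} \, W_S}} \qquad (2)$$
where the covariance matrix of $(A_I, A_S)$ is given by:
$$\mathrm{Cov} = E\left[ \begin{pmatrix} A_I \\ A_S \end{pmatrix} \begin{pmatrix} A_I \\ A_S \end{pmatrix}' \right] = \begin{bmatrix} \mathrm{Cov}_{II} & \mathrm{Cov}_{IS} \\ \mathrm{Cov}_{SI} & \mathrm{Cov}_{SS} \end{bmatrix} \qquad (3)$$
Equation 2 can be solved as an eigenvalue problem for $W_I$ and $W_S$.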
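To make the eigenvalue formulation concrete, here is a small NumPy sketch that recovers the first pair of projection vectors from paired samples. It assumes rows are samples, adds a small ridge term `reg` for numerical stability, and whitens with Cholesky factors so that a symmetric eigenproblem can be solved; the name `cca_first_pair` and these implementation choices are assumptions, not the thesis code.

```python
import numpy as np

def cca_first_pair(I, S, reg=1e-6):
    """First canonical pair for paired data I (n, p) and S (n, q)."""
    I = np.asarray(I, dtype=float)
    S = np.asarray(S, dtype=float)
    I = I - I.mean(axis=0)
    S = S - S.mean(axis=0)
    n = I.shape[0]
    cov_ii = I.T @ I / n + reg * np.eye(I.shape[1])   # ridge term is an assumption
    cov_ss = S.T @ S / n + reg * np.eye(S.shape[1])
    cov_is = I.T @ S / n
    L_i = np.linalg.cholesky(cov_ii)
    L_s = np.linalg.cholesky(cov_ss)
    K = np.linalg.solve(L_i, cov_is)        # L_i^{-1} Cov_IS
    K = np.linalg.solve(L_s, K.T).T         # L_i^{-1} Cov_IS L_s^{-T}
    vals, vecs = np.linalg.eigh(K @ K.T)    # symmetric eigenproblem, ascending order
    rho = float(np.sqrt(max(vals[-1], 0.0)))
    w_i = np.linalg.solve(L_i.T, vecs[:, -1])
    w_s = np.linalg.solve(cov_ss, cov_is.T @ w_i)     # defined up to scale
    return w_i, w_s, rho
```

The largest eigenvalue equals the squared canonical correlation, so `rho` is its square root; further pairs would come from the remaining eigenvectors.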
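Cluster CCA
As suggested in [Rasiwasia et al., 2014], we compute the covariance matrices as follows,
$$\mathrm{Cov}_{IS} = \frac{1}{M} \sum_{c=1}^{C} \sum_{j=1}^{|I^c|} \sum_{k=1}^{|S^c|} I_j^c \, {S_k^c}' \qquad (4)$$
$$\mathrm{Cov}_{II} = \frac{1}{M} \sum_{c=1}^{C} \sum_{j=1}^{|I^c|} |S^c| \, I_j^c \, {I_j^c}' \qquad (5)$$
$$\mathrm{Cov}_{SS} = \frac{1}{M} \sum_{c=1}^{C} \sum_{k=1}^{|S^c|} |I^c| \, S_k^c \, {S_k^c}' \qquad (6)$$
where $M = \sum_{c=1}^{C} |I^c| |S^c|$ is the total number of pairwise correspondences across $C$ classes.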
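Equations 4 to 6 can be evaluated without enumerating every image-sketch pair, because the double sum in Equation 4 factorizes into a product of per-class sums. The sketch below shows this, assuming the features are already mean-centred and stored as one array per class with samples in rows; the function name `cluster_cca_covariances` is illustrative.

```python
import numpy as np

def cluster_cca_covariances(I_by_class, S_by_class):
    """Cluster-CCA covariance matrices from per-class feature arrays.

    I_by_class[c]: (|I^c|, p) image features of class c.
    S_by_class[c]: (|S^c|, q) sketch features of class c.
    Features are assumed to be mean-centred beforehand.
    """
    pairs = list(zip(I_by_class, S_by_class))
    M = sum(len(Ic) * len(Sc) for Ic, Sc in pairs)
    p = I_by_class[0].shape[1]
    q = S_by_class[0].shape[1]
    cov_is = np.zeros((p, q))
    cov_ii = np.zeros((p, p))
    cov_ss = np.zeros((q, q))
    for Ic, Sc in pairs:
        # sum_j sum_k I_j S_k' equals (sum_j I_j)(sum_k S_k)'
        cov_is += np.outer(Ic.sum(axis=0), Sc.sum(axis=0))
        cov_ii += len(Sc) * (Ic.T @ Ic)
        cov_ss += len(Ic) * (Sc.T @ Sc)
    return cov_is / M, cov_ii / M, cov_ss / M
```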
Pipeline: Training and Testing
Figure: Proposed pipeline (a) Training (b) Retrieval
Datasets
TU Berlin Dataset: 250 categories, 80 samples per category
Caltech 256, Pascal VOC 2007
Features
Table 1: Summary of Features

| Feature | Dimension | Source |
| --- | --- | --- |
| CALTECH - SIFT | 1000 | Vl-Feat [Vedaldi and Fulkerson, 2008] |
| CALTECH - HOG | 20000 | Vl-Feat [Vedaldi and Fulkerson, 2008] |
| CALTECH - CNN | 4096 | Krizhevsky et al. [Krizhevsky et al., 2012] |
| PASCAL - SIFT | 1000 | Guillaumin et al. [Guillaumin et al., 2010] |
| PASCAL - HOG | 20000 | Vl-Feat [Vedaldi and Fulkerson, 2008] |
| PASCAL - CNN | 4096 | Krizhevsky et al. [Krizhevsky et al., 2012] |
| TU-BERLIN - SIFT-Like | 501 | Eitz et al. [Eitz et al., 2012] |
| TU-BERLIN - HOG | 20000 | Vl-Feat [Vedaldi and Fulkerson, 2008] |
| TU-BERLIN - Fisher | 250000 | Schneider et al. [Schneider and Tuytelaars, 2014] |
| TU-BERLIN - CNN | 4096 | Yang et al. [Yang and Hospedales, 2015] |
Results: Qualitative
Figure: Example Retrievals From Our System
Results: Quantitative
Table: Mean Average Precision (MAP) for Image-Sketch feature combinations

| Dataset | SIFT-SIFT | SIFT-HOG | SIFT-Fisher | HOG-SIFT | HOG-HOG | HOG-Fisher | CNN-CNN |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Caltech | 0.06 | 0.03 | 0.20 | 0.14 | 0.02 | 0.01 | 0.20 |
| Pascal | 0.13 | 0.12 | 0.05 | 0.18 | 0.09 | 0.06 | 0.06 |

Table: Performance improvement in mAP values

| Dataset | Features | Before CCA | After CCA |
| --- | --- | --- | --- |
| Caltech | SIFT-Fisher | 0.01 | 0.20 |
| Caltech | CNN-CNN | 0.01 | 0.20 |
| Pascal | HOG-SIFT | 0.01 | 0.18 |
| Pascal | SIFT-SIFT | 0.06 | 0.13 |
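As a reminder of how the reported metric is usually computed, the sketch below evaluates average precision for one ranked list and averages it over queries. It assumes binary relevance and that each ranked list contains all relevant items, so the precision values are averaged over the relevant hits found in the list; the function names are illustrative.

```python
def average_precision(relevance):
    """relevance: sequence of 0/1 flags for one ranked list, best result first."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant position
    return precision_sum / hits if hits else 0.0

def mean_average_precision(relevance_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)
```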
4 Zero-Shot Learning
Zero-Shot Learning
Figure: (a) Standard Classifier (b) Zero-Shot Classifier
The knowledge database acts as an oracle that knows everything.
Word2Vec
Word2Vec by Mikolov et al. [Mikolov et al., 2013] maps words into a vector space in which semantically similar words lie close together.
Apples and Oranges are closer than Apples and Mumbai.
The distance between two classes in Word2Vec space represents their semantic similarity.
Figure: An artistic expression of Word2Vec vector space.
Training: Objective Function
We train a CNN as in [Yu et al., 2015] for the TU Berlin dataset and as in [Chatfield et al., 2014] for the Caltech256 dataset, and replace the soft-max layer as follows.
$$J(\theta) = \sum_{y \in Y} \sum_{x^i \in X_Y} \| w_y - g(x^i) \|^2$$
A close miss is penalized less than a distant miss.
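A small numerical sketch may help to see why a close miss costs less. It assumes that $w_y$ is the word vector of class $y$ and that $g(x^i)$ is the network output for sample $x^i$, which is how the objective reads in context but is not stated explicitly on the slide. The toy vectors and the helper names `embedding_loss` and `nearest_class` are hypothetical.

```python
import numpy as np

def embedding_loss(outputs, labels, class_vectors):
    """Sum of squared distances between outputs g(x) and their class vectors w_y."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(class_vectors, dtype=float)[np.asarray(labels)]
    return float(np.sum((targets - outputs) ** 2))

def nearest_class(output, class_vectors):
    """Index of the class vector closest to one output (assumed inference rule)."""
    diffs = np.asarray(class_vectors, dtype=float) - np.asarray(output, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

# Toy class vectors: classes 0 and 1 are semantically close, class 2 is far away.
toy_vectors = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
close_miss = embedding_loss([toy_vectors[1]], [0], toy_vectors)    # predicted near class 1
distant_miss = embedding_loss([toy_vectors[2]], [0], toy_vectors)  # predicted near class 2
```

Under this reading, an unseen class can be handled at test time by adding its word vector to `class_vectors` and picking the nearest one, without retraining the network.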
Datasets
TU Berlin Dataset: 250 categories, 80 samples per category
Caltech 256, Pascal VOC 2007
Results: Qualitative
Figure: Example Queries
Experiments
Sketch Classification.
Uni-Modal Sketch Retrieval.
Cross-Modal Retrieval.
Zero Shot Retrieval.
Results: Classification

TU-Berlin Dataset

| Algorithm | Features | Classifier | Accuracy |
| --- | --- | --- | --- |
| Yu et al. [Yu et al., 2015] | CNN-Ensemble | Soft-Max | 74.9% |
| Yu et al. [Yu et al., 2015] | CNN-Single | Soft-Max | 72.6% |
| Schneider et al. [Schneider and Tuytelaars, 2014] | Fisher | SVM | 63.1% |
| Eitz et al. [Eitz et al., 2012] | BOW | SVM | 56% |
| Proposed | DFSR | Random Forest | 70.22% |

Table: Classification results show that our features perform reasonably well, almost on par with the state-of-the-art methods.
Results: Uni-Modal Retrieval
Figure: Uni-Modal Retrieval: (a) Sketch Modality, (b) Image Modality
![Page 50: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities](https://reader034.vdocuments.mx/reader034/viewer/2022042911/5f44affc29bcf618280caf3c/html5/thumbnails/50.jpg)
Results: Cross-Modal Retrieval
Figure: Cross-Modal Retrieval
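To make the cross-modal setting concrete, here is a minimal sketch of how a sketch query might be matched against an image gallery once both modalities live in a common space. The slides do not state the similarity measure, so cosine similarity, the function names, and the toy 2-D embeddings are all assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_gallery(query, gallery):
    """Return gallery keys sorted by decreasing cosine similarity to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored]


# Hypothetical embeddings: assume sketch and image features were already
# projected into a shared space (the projection itself is not shown here).
sketch_query = [1.0, 0.1]
image_gallery = {
    "cat_photo": [0.9, 0.2],
    "car_photo": [0.1, 1.0],
    "dog_photo": [0.7, 0.7],
}
ranking = rank_gallery(sketch_query, image_gallery)
```

The ranked list is what a retrieval curve would then be computed from; any other similarity or distance could be swapped into `rank_gallery` without changing its shape.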
![Page 51: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities](https://reader034.vdocuments.mx/reader034/viewer/2022042911/5f44affc29bcf618280caf3c/html5/thumbnails/51.jpg)
Results: Zero-Shot Retrieval
Figure: Zero-Shot Cross-Modal Retrieval: (a) PR curve for the worst-performing partition. DFSR outperforms the features proposed by Yang et al. [Yang and Hospedales, 2015]. (b) PR curve for the best-performing partition.
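For readers unfamiliar with PR curves, the following is a minimal sketch of how the points of a precision-recall curve can be computed from one ranked retrieval list. The function name and the toy relevance labels are hypothetical; the slides do not describe the exact evaluation protocol (for example, how curves are averaged over queries or partitions).

```python
def precision_recall_points(ranked_relevance, num_relevant):
    """Return (recall, precision) after each position of a ranked result list.

    ranked_relevance: booleans, True where the retrieved item is relevant.
    num_relevant: total number of relevant items in the gallery.
    """
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        precision = hits / rank
        recall = hits / num_relevant if num_relevant else 0.0
        points.append((recall, precision))
    return points


# Toy example: 4 retrieved items, 2 relevant items exist in total.
ranked_relevance = [True, False, True, False]
points = precision_recall_points(ranked_relevance, num_relevant=2)
```

Plotting precision against recall for these points gives one curve; a curve that stays higher at the same recall indicates better ranking.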
![Page 52: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities](https://reader034.vdocuments.mx/reader034/viewer/2022042911/5f44affc29bcf618280caf3c/html5/thumbnails/52.jpg)
Summary
Figure: Related Domains
Qualitative features for video retrieval.
Cluster CCA for Multi Modal Image Retrieval.
Deep Features for Semantic Retrieval.
![Page 53: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities](https://reader034.vdocuments.mx/reader034/viewer/2022042911/5f44affc29bcf618280caf3c/html5/thumbnails/53.jpg)
References I
Chatfield, K., Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference.
Eitz, M., Hays, J., and Alexa, M. (2012). How do humans sketch objects? ACM Trans. Graph.
Guillaumin, M., Verbeek, J., and Schmid, C. (2010). Multimodal semi-supervised learning for image classification. In CVPR.
Hardoon, D., Szedmak, S., and Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural computation.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In NIPS.
Li, Y., Hospedales, T. M., Song, Y.-Z., and Gong, S. (2015). Free-hand sketch recognition by multi-kernel feature learning. CVIU.
![Page 54: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities](https://reader034.vdocuments.mx/reader034/viewer/2022042911/5f44affc29bcf618280caf3c/html5/thumbnails/54.jpg)
References II
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
Rasiwasia, N., Mahajan, D., Mahadevan, V., and Aggarwal, G. (2014). Cluster canonical correlation analysis. In AI Statistics.
Schneider, R. G. and Tuytelaars, T. (2014). Sketch classification and classification-driven analysis using fisher vectors. TOG.
Vedaldi, A. and Fulkerson, B. (2008). VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/.
Yang, Y. and Hospedales, T. M. (2015). Deep neural networks for sketch recognition. arXiv preprint arXiv:1501.07873.
Yu, Q., Yang, Y., Song, Y.-Z., Xiang, T., and Hospedales, T. M. (2015). Sketch-a-net that beats humans. In BMVC.
![Page 55: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities](https://reader034.vdocuments.mx/reader034/viewer/2022042911/5f44affc29bcf618280caf3c/html5/thumbnails/55.jpg)
Thank You
[email protected]