In Search of Art
Elliot J. Crowley and Andrew Zisserman
Visual Geometry Group
Department of Engineering Science
University of Oxford
The Goal
• An on-the-fly system for searching paintings visually
• A user can type in the name of any category...
• Then hundreds of paintings containing that category will be retrieved in a matter of seconds
[Figure: paintings retrieved for the query `dog']
Benefits
• In many instances, the retrieved paintings will not previously have been known to contain the category
• These are therefore new discoveries for the Art History community
[Figure: retrieved `dog' paintings]
Why is this good?
• Art historians can discover when something first appeared in paintings
• They can also observe how things have changed over time
How is this achieved?
• Natural images annotated with object categories are everywhere.
• These can be used to learn object classifiers.
[Figure: Google Images results for `dog']
Dataset of Paintings
• We use `Your Paintings’ as the dataset
• `Your Paintings’ consists of over 210,000 paintings from UK galleries
http://www.bbc.co.uk/arts/yourpaintings/
• The method is independent of the dataset, however
• Other datasets can be used, e.g. Rijksmuseum or PrintART
Outline
• Methodology
• Quantitative Evaluation
• Aligning retrieved objects
What do we do?
• We crawl Google Images for a given category and learn a CNN-based classifier
• This classifier is applied to a dataset of paintings, retrieving paintings containing the category
The Architecture
How do we do this quickly?
• The bulk of the data has been pre-processed offline (negative training data, dataset of paintings)
• Online processing of Google Images is done in parallel across multiple cores
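As a sketch, the per-image work parallelises trivially because each downloaded image is processed independently. Here `cnn_feature` is a hypothetical placeholder for the real ConvNet feature extractor:

```python
from multiprocessing import Pool

def cnn_feature(image_path):
    # Placeholder: a real implementation would run the image through a
    # ConvNet and return its penultimate-layer activations.
    return [float(len(image_path))]

def compute_features(image_paths, workers=4):
    # Each downloaded Google Image is independent of the others,
    # so the feature computation parallelises across cores.
    with Pool(workers) as pool:
        return pool.map(cnn_feature, image_paths)

if __name__ == "__main__":
    feats = compute_features([f"hit_{i:03d}.jpg" for i in range(200)])
    print(len(feats))  # one feature per downloaded image
```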
In more detail…
• For a given query, the top 200 Google Image Hits are downloaded
• For each of these a CNN feature is computed online
• This is the positive training data
Negative Training Data
• Offline, images are downloaded for Google searches of `things' and `photos'
• The features for these are pre-computed
Classification
• A Support Vector Machine is used to learn a classifier that discriminates the positive training data from the negative data
[Figure: `beard' vs. not-`beard' training images]
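A minimal sketch of this step, with random stand-ins for the CNN features (the real positives come from the Google Image hits and the negatives from the pre-computed pool):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins for CNN features: 200 positives from Google Image hits,
# and a larger fixed pool of pre-computed negatives.
pos = rng.normal(1.0, 1.0, size=(200, 128))
neg = rng.normal(-1.0, 1.0, size=(1000, 128))

X = np.vstack([pos, neg])
y = np.r_[np.ones(len(pos)), -np.ones(len(neg))]

# A linear SVM separates the query's positives from the generic negatives.
clf = LinearSVC(C=1.0).fit(X, y)
print(clf.score(X, y))  # training accuracy
```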
Retrieval
• The classifier is applied to the pre-processed features of `Your Paintings'
• Each painting is given a score by the classifier
• The paintings are displayed in order of score
[Figure: top-ranked paintings for `beard']
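Since the classifier is linear, scoring the whole collection is a single matrix-vector product over the pre-computed features. A sketch with stand-in features and weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: a learned linear classifier (w, b) and the pre-computed
# feature matrix of the painting collection.
w = rng.normal(size=128)
b = -0.1
paintings = rng.normal(size=(5000, 128))

# One dot product per painting, so scoring the collection is fast.
scores = paintings @ w + b

# Display order: highest-scoring paintings first.
ranking = np.argsort(-scores)
top10 = ranking[:10]
```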
The Architecture - Timings
[Figure: architecture diagram with per-stage timings: 0.5s, 4.5s, <0.5s, <0.5s, 2s]
Example Queries
• bridge
• carriage
• flower
• house
Outline
• Methodology
• Quantitative Evaluation
• Aligning retrieved objects
Quantitative Evaluation
• Evaluating the domain transfer problem of learning classifiers on natural images and applying them to paintings
Test Set
• For this an annotated dataset of paintings is required
• 10,000 paintings in `Your Paintings’ have been tagged by the public
• These tags + painting titles are used to form the `Paintings Dataset’ with annotations corresponding to classes of PASCAL VOC
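A hypothetical sketch of how tags and titles could be matched against VOC class names; the actual rules used to build the Paintings Dataset may differ:

```python
# Hypothetical sketch: mark a painting as containing a VOC class if the
# class name appears in its public tags or its title. (The matching rules
# actually used to build the Paintings Dataset may differ.)
VOC_CLASSES = ["aeroplane", "bird", "boat", "chair", "cow",
               "diningtable", "dog", "horse", "sheep", "train"]

def classes_for(painting):
    text = " ".join(painting["tags"] + [painting["title"]]).lower()
    return [c for c in VOC_CLASSES if c in text]

painting = {"title": "A Dog Guarding Sheep", "tags": ["animals"]}
print(classes_for(painting))  # ['dog', 'sheep']
```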
The Paintings Dataset

Class          Paintings with Class
Aeroplane      200
Bird           805
Boat           2143
Chair          1202
Cow            625
Dining-table   1201
Dog            1145
Horse          1493
Sheep          751
Train          329
• Annotation is assumed to be complete in the PASCAL VOC sense
• Performance is assessed by computing Average Precision (AP) per class
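AP per class can be computed from the binary ground-truth labels and the classifier's scores, e.g. with scikit-learn:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Stand-ins: binary ground truth for each painting (does it contain the
# class?) and the classifier's scores, already in descending order here.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

ap = average_precision_score(y_true, scores)
print(round(ap, 3))
```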
[Figure: example annotated paintings for Train, Dog, Horse]
Training Datasets
• Four datasets of natural images are used for training
• VOC12, VOC12+, Net Noisy, Net Curated
Experiments
Features compared:
• Shallow Features - Fisher Vectors
VS.
• Deep Features - Convolutional Neural Networks (CNNs)
Experiments - Features
• Fisher Vectors vs. CNN features
• CNN features outperform Fisher Vectors
• They have the added advantage of lower dimensionality
Augmentation
• No augmentation
• C+F (crop and flip) augmentation
[Figure: 224x224 crops taken from a 256-pixel image]
Experiments - Augmentation
• Sum Pool: Classifier applied to mean of augmented windows
• Max Pool: Classifier applied to each augmented window and maximum score recorded
• Best performance comes from augmentation with sum pooling, though no augmentation with sum pooling is almost as good
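The two pooling strategies can be sketched as follows; note that for a linear classifier, sum pooling the features is equivalent to averaging the per-window scores:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins: CNN features for the 10 augmented windows of one image
# (e.g. corner/centre crops and their flips), plus a linear classifier.
windows = rng.normal(size=(10, 128))
w, b = rng.normal(size=128), 0.0

# Sum pool: average the window features, then classify once.
sum_pool_score = np.mean(windows, axis=0) @ w + b

# Max pool: classify every window, keep the best score.
max_pool_score = np.max(windows @ w + b)
```

Because the max of the per-window scores can never be below their mean, `max_pool_score >= sum_pool_score` always holds for a linear classifier.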
Experiments - Dimensionality
• 1K-dimensional features perform best
• The difference from the other dimensionalities is small, however
Experiment Conclusions
• For the on-the-fly system, 1K CNN features are used, as these performed best
• Sum pooled features are used for `Your Paintings’ as time is not a factor in computing these
• No augmentation is used on the images downloaded from Google (0.3s per image per core vs. 2.4s)
Outline
• Methodology
• Quantitative Evaluation
• Aligning retrieved objects
Alignment
• Some objects are automatically aligned…
[Figure: `moustache' retrievals: The Pencil Moustache; Anonymous Trendsetter, 1565; Copycats, Now]
Alignment
• Other objects require some work…
[Figure: misaligned `train' retrievals]
Solution
Learn a DPM [1] on either
1. annotated bounding boxes (e.g. PASCAL VOC) or
2. the downloaded Google Images
[1] P Felzenszwalb, R Girshick, D McAllester, D Ramanan, Object Detection with Discriminatively Trained Part Based Models, CVPR 2010
Auto-alignment
[Figure: auto-aligned `train' detections]
Auto-alignment
[Figure: auto-aligned `horse' detections]
Conclusion
• We provide a system that can find objects in paintings with high precision in very little time
• The objects found can be further curated using a DPM
Links
• VISOR: Visual Search of BBC News [1] http://www.robots.ox.ac.uk/~vgg/research/on-the-fly/
• CNN code [2] http://www.robots.ox.ac.uk/~vgg/research/deep_eval/
• Our system: COMING SHORTLY!

[1] K Chatfield, A Zisserman, VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval, ACCV, 2012
[2] K Chatfield, K Simonyan, A Vedaldi, A Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, BMVC, 2014