Automatic Image Annotation (AIA)
Seminar Report
Presented to: Dr. Shanbehzadeh
Presented by: Farzaneh Rezaei
November 2015
What is the goal of computer vision? Perceive the story behind the picture. See the world! But what exactly does it mean to see?
Source: WALL-E movie, Pixar / Walt Disney Pictures
Outline
What is Automatic Image Annotation?
Automatic image annotation is the task of automatically assigning words to an image that describe the content of the image.
Munirathnam Srikanth, et al., Exploiting Ontologies for Automatic Image Annotation
Source: Personalizing Automated Image Annotation Using Cross-Entropy: https://ivi.fnwi.uva.nl/isis/publications/bibtexbrowser.php?key=LiICM2011&bib=all.bib
What is Automatic Image Annotation? (Cont.)
Source: MS COCO Captioning Challenge: http://mscoco.org/dataset/#captions-challenge2015
3,000 Photos Are Uploaded Every Second to Facebook
Why is Image Annotation Important?
Recently, we have witnessed exponential growth in user-generated images and videos due to the boom of social networks such as Facebook and Flickr.
Source: http://petapixel.com/2012/02/01/3000-photos-are-uploaded-every-second-to-facebook/
Why is Image Annotation Important? (Cont.)
Source: Barriuso, A., & Torralba, A. (2012). Notes on image annotation.
Applications, e.g. photo organizer apps, image classification systems
Number of articles per year for Automatic Image Annotation (in the title of the article). Reported by: Google Scholar
Outline
How do you annotate these images?
What are the components of an Automatic Image Annotation system?
How to classify images?
Feature Extraction
Classification Methods
Pattern Recognition!
So the main components of an AIA system follow the structures of pattern recognition (PR), and for this reason studying PR structures helps us. More importantly, the root causes of AIA's problems can be sought in these same structures.
Slide Credit
An example of classical approaches in AIA
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346-362. doi:10.1016/j.patcog.2011.05.013
How deep should we go down this tree? For example, for an image of a flower? Or an image of a crowded intersection? How many levels should we go?
Issues of classical approaches
Theoretical Limitations of Shallow Architectures*
Functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k-1 architecture.
*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning
Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
Shallow? Deep? Functions? Compact? Depth? Computational Elements?
Logic circuit
A logic circuit can serve as an example to clarify what a function and a computational element are: the output is the simplified form of our circuit, and each gate represents one computational element. Such an example comes on the next slide.
Issues of classical approaches (Cont.)
Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning
Depth 4 / Depth 3
Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
Linear regression and logistic regression have depth 1, i.e., a single level.
Ordinary multi-layer neural networks: with the most common choice of one hidden layer, they have depth two.
Decision trees can also be seen as having two levels.
Boosting (Freund & Schapire, 1996) usually adds one level to its base learners: that level computes a vote or linear combination of the outputs of the base learners.
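To make the depth counts above concrete, here is a minimal sketch, assuming scikit-learn and a synthetic two-moons dataset (neither appears in the slides): a depth-1 model (logistic regression, one linear level) next to a depth-2 model (a neural network with a single hidden layer).

```python
# A minimal sketch (assumed scikit-learn API, synthetic data) contrasting
# a depth-1 classifier with a depth-2 classifier.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

shallow = LogisticRegression().fit(X, y)             # depth 1: a single linear level
deeper = MLPClassifier(hidden_layer_sizes=(16,),     # depth 2: one hidden layer + output
                       max_iter=2000, random_state=0).fit(X, y)

print("depth-1 training accuracy:", shallow.score(X, y))
print("depth-2 training accuracy:", deeper.score(X, y))
```

On a non-linearly separable set such as two moons, the depth-2 model can fit the boundary that the depth-1 model cannot, which is the point the slide is making about levels of composition.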
Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
Shallow? Deep? Functions, Compact, Depth, Computational Elements
But what does being "deep" actually mean? For example, do we say that from depth 10 onward counts as deep? What is your opinion? Let us go back to that same sentence: note that we do not have a specific number in mind, because it depends on the problem. Our point is whether the target function can be represented compactly with depth k. And Zisserman's paper says the greater the depth, the better the result, but one has to ask how much the improvement is worth.
Issues of classical approaches
Theoretical Limitations of Shallow Architectures*
Functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k-1 architecture.
*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning
Issues of classical approaches (Cont.)
A two-layer circuit of logic gates can represent any boolean function (Mendelson, 1997).
With depth-two logical circuits, most boolean functions require an exponential number of logic gates (with respect to input size) to be represented (Wegener, 1987).
There are functions computable with a polynomial-size logic-gate circuit of depth k that require exponential size when restricted to depth k-1 (Hastad, 1986). The proof of this theorem relies on earlier results (Yao, 1985) showing that d-bit parity circuits of depth 2 have exponential size.
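The parity example can be made concrete with a small illustrative sketch (not from the cited papers, written here in Python rather than as gates): the "deep" form chains d-1 XOR elements, while the depth-2 form enumerates all odd-parity minterms, of which there are 2**(d-1).

```python
# A minimal sketch illustrating why d-bit parity is cheap when deep and
# exponentially large when forced to depth 2 (OR of ANDed minterms).
from itertools import product

def parity_deep(bits):
    """Chain of XOR gates: d-1 computational elements, depth grows with d."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def parity_depth2(bits):
    """Depth-2 form: OR over all odd-parity minterms, i.e. 2**(d-1) AND terms."""
    d = len(bits)
    minterms = [p for p in product([0, 1], repeat=d) if sum(p) % 2 == 1]
    return int(any(all(b == v for b, v in zip(bits, term)) for term in minterms))

x = [1, 0, 1, 1]
print(parity_deep(x), parity_depth2(x))   # both print 1, but the depth-2 form
                                          # needed 2**(len(x)-1) product terms
```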
One might wonder whether these computational complexity results for boolean circuits are relevant to machine learning. See Orponen (1994) for an early survey of theoretical results in computational complexity relevant to learning algorithms. Interestingly, many of the results for boolean circuits can be generalized to architectures whose computational elements are linear threshold units (also known as artificial neurons (McCulloch & Pitts, 1943)), which compute f(x) = 1[w^T x + b >= 0], with parameters w and b.
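A minimal sketch of such a linear threshold unit, with hand-picked (illustrative, not learned) weights realising an AND gate:

```python
# A minimal sketch of a linear threshold unit (artificial neuron), matching
# f(x) = 1[w.x + b >= 0] from the text; the weights below are illustrative.
import numpy as np

def threshold_unit(x, w, b):
    """Return 1 if the affine pre-activation is non-negative, else 0."""
    return int(np.dot(w, x) + b >= 0)

# Example: an AND gate realised by a single threshold unit.
w, b = np.array([1.0, 1.0]), -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, threshold_unit(np.array(x, dtype=float), w, b))
```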
Issues of classical approaches (Cont.)
Issues of classical approaches (Cont.)
1. Theoretical Limitations of Shallow Architectures
2. Theoretical Advantages of Deep Architectures
Which one?
On the question of which structure is better than the classical one, do not give the answer; say that Bengio's articles need to be read more thoroughly to understand the reasoning behind these terms. For now, think about it yourselves; in the final conclusions I will give my own opinion based on what I have read.
Slide Credit
Slide Credit
How to assign a word to an image?
What are the components of an Automatic Image Annotation system?
Feature Extraction, Classification Methods: Pattern Recognition!
http://graffiti-artist.net/corporate-offices/ny-facebook-office-graffiti/
Outline
Going Deeper!
Feature Extraction
Color
Color: Comparisons
Color method | Pros | Cons
Histogram | Simple to compute, intuitive | High dimension, no spatial info, sensitive to noise
CM | Compact, robust | Not enough to describe all colors, no spatial info
CCV | Spatial info | High dimension, high computation cost
Correlogram | Spatial info | Very high computation cost, sensitive to noise, rotation and scale
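As a minimal sketch of two descriptors from the table above, assuming a NumPy array holding an 8-bit RGB image (the function names are mine, not from the cited review): a global color histogram and color moments (CM).

```python
# A minimal sketch (pure NumPy, assumed 8-bit RGB input) of a global color
# histogram and of color moments, two of the descriptors compared above.
import numpy as np

def color_histogram(img, bins=8):
    """Concatenated per-channel histogram, L1-normalised (no spatial info)."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(img.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def color_moments(img):
    """Mean, standard deviation and (signed cube-root) skewness per channel."""
    x = img.reshape(-1, img.shape[-1]).astype(float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    skew = np.cbrt(((x - mean) ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])
```

The histogram is high-dimensional (bins per channel) while the moments give a compact 9-value descriptor, which mirrors the pros and cons listed in the table.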
Color: Comparisons (Cont.)
Color method | Pros | Cons
DCD | Compact, robust, perceptual meaning | Needs post-processing for spatial info
CSD | Spatial info | Sensitive to noise, rotation and scale
SCD | Compact on need, scalability | No spatial info, less accurate if compact
Spatial Texture: Comparisons
Texture method | Pros | Cons
Texton | Intuitive | Sensitive to noise, rotation and scale; difficult to define textons
GLCM-based method | Intuitive, compact, robust | High computation cost, not enough to describe all textures
Tamura | Perceptually meaningful | Too few features
SAR | Compact, robust, rotation invariant | High computation cost, difficult to define pattern size
FD | Compact, perceptually meaningful | Computation cost, sensitive to scale
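A minimal sketch of the GLCM idea from the table above, written in plain NumPy rather than any specific toolbox (quantisation level, offset and the contrast statistic are illustrative choices):

```python
# A minimal sketch (pure NumPy, illustrative only) of a grey-level co-occurrence
# matrix (GLCM) for one pixel offset, plus the contrast statistic derived from it.
import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    """Normalised co-occurrence counts of quantised grey levels at offset (dx, dy)."""
    q = (gray.astype(float) / 256 * levels).astype(int).clip(0, levels - 1)
    mat = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(h - dy):
        for j in range(w - dx):
            mat[q[i, j], q[i + dy, j + dx]] += 1
    return mat / mat.sum()

def glcm_contrast(p):
    """Contrast: expected squared grey-level difference of co-occurring pixels."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())
```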
Spectral Texture: Comparisons (Cont.)
Texture method | Pros | Cons
FT/DCT | Fast computation | Sensitive to scale and rotation
Wavelet | Fast computation, multi-resolution | Sensitive to rotation, limited orientations
Gabor | Multi-scale, multi-orientation, robust | Needs rotation normalisation, loss of spectral information due to incomplete cover of the spectrum plane
Curvelet | Multi-resolution, multi-orientation, robust | Needs rotation normalisation
Shape
Chart Source: [Zhang and Lu 2004]
Because contour-based techniques use only a portion of the region, they are more sensitive to noise than region-based techniques.
Shape (Cont.)
Chart Source: [M. Yang, K. Kpalma, J. Ronsin 2008]
Shape (Cont.)
Shape (Cont.)
Because contour-based techniques are more sensitive to noise than region-based techniques, color image retrieval usually employs region-based shape features.
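A minimal sketch of one common region-based shape descriptor, assuming a binary 0/1 region mask as input (the particular moment orders chosen below are illustrative): normalised central moments, which are translation- and scale-invariant.

```python
# A minimal sketch (pure NumPy) of a region-based shape descriptor built from
# normalised central moments eta_pq of a binary region mask.
import numpy as np

def central_moment(mask, p, q):
    """Central moment mu_pq of the set of foreground pixels."""
    ys, xs = np.nonzero(mask)
    xbar, ybar = xs.mean(), ys.mean()
    return ((xs - xbar) ** p * (ys - ybar) ** q).sum()

def region_shape_descriptor(mask):
    """A few low-order normalised central moments of the region."""
    m00 = mask.sum()                       # region area (binary mask assumed)
    feats = []
    for p, q in [(2, 0), (0, 2), (1, 1), (3, 0), (0, 3)]:
        norm = m00 ** (1 + (p + q) / 2.0)  # scale normalisation
        feats.append(central_moment(mask, p, q) / norm)
    return np.array(feats)
```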
Learning Methods
Learning Methods: Comparisons
Annotation method | Pros | Cons
SVM | Small sample, optimal class boundary, non-linear classification | Single labelling, one class per time, expensive trial and run, sensitive to noisy data, prone to over-fitting
ANN | Multiclass outputs, non-linear classification, robust to noisy data, suitable for complex problems | Single labelling, sub-optimal, expensive training, complex and black-box classification
DT | Intuitive, semantic rules, multiclass outputs, fast, allows missing values, handles both categorical and numerical values | Single labelling, sub-optimal, needs pruning, can be unstable
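A minimal sketch of the SVM row above, assuming scikit-learn and synthetic features (the 64-D features, keyword indices and class names are placeholders, not from the slides): one binary classifier per keyword, which is exactly the "one class per time, single labelling" limitation listed in the table.

```python
# A minimal sketch (assumed scikit-learn API, synthetic data) of SVM-based
# annotation with one binary classifier per keyword.
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))       # e.g. 64-D color/texture feature vectors
y_train = rng.integers(0, 3, size=200)     # keyword index: 0="sky", 1="grass", 2="water"

clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X_train, y_train)
print("predicted keyword index:", clf.predict(rng.normal(size=(1, 64))))
```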
Learning Methods: Comparisons (Cont.)
Annotation method | Pros | Cons
Non-parametric | Multi-labelling, model free, fast | Large number of parameters, large sample, sensitive to noisy data
Parametric | Multi-labelling, small sample, good approximation of unknown distribution | Predefined distribution, expensive training, approximated boundary
Metadata | Use of both textual and visual features | Difficult to relate visual features with textual features, difficult textual feature extraction
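For the non-parametric, multi-labelling row above (the family TagProp belongs to), here is a minimal sketch in plain NumPy with synthetic data; the neighbourhood size, vocabulary size and function name are my own illustrative choices, not TagProp itself.

```python
# A minimal sketch (pure NumPy, synthetic data) of non-parametric multi-label
# annotation: transfer tags from the k nearest training images.
import numpy as np

def knn_annotate(query_feat, train_feats, train_tags, k=5, n_words=3):
    """Rank vocabulary words by how often they occur among the k nearest images."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = train_tags[nearest].sum(axis=0)      # tag frequency among neighbours
    return np.argsort(scores)[::-1][:n_words]     # indices of the top-ranked words

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 32))                # training image features
tags = (rng.random((100, 20)) < 0.1).astype(int)  # binary tag matrix, 20-word vocabulary
print("top words:", knn_annotate(rng.normal(size=32), feats, tags))
```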
Deep Learning
Deep belief networks
Deep Boltzmann machines
Deep convolutional neural networks
Deep recurrent neural networks
Hierarchical temporal memory
Source: https://en.wikipedia.org/wiki/List_of_machine_learning_concepts
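To make the convolutional entry of the list concrete, here is a minimal sketch assuming the PyTorch API; the layer sizes, 64x64 input and 20-word vocabulary are illustrative and the model is untrained, it only shows the multi-label (sigmoid per word) output used for annotation.

```python
# A minimal sketch (assumed PyTorch API) of a deep convolutional network for
# multi-label image annotation: one sigmoid output per vocabulary word.
import torch
import torch.nn as nn

class AnnotationCNN(nn.Module):
    def __init__(self, n_words=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_words)

    def forward(self, x):                       # x: (batch, 3, 64, 64)
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(h))

model = AnnotationCNN()
scores = model(torch.randn(1, 3, 64, 64))       # per-word annotation scores in [0, 1]
print(scores.shape)                             # torch.Size([1, 20])
```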
Deep Learning (Cont.)
Source: Ranzato, 4 October 2013, Slides
Deep Learning (Cont.)
A potential problem with deep learning*: the optimization task. See Bengio's articles!
Hot videos about deep learning on YouTube: Ranzato, 4 October 2013: https://www.youtube.com/watch?v=clgMTk5V2Sk
*: Ranzato, 4 October 2013, Slides
Here we describe the overall structure of deep learning; to compare deep with classic, and the deep methods with one another, on the next slide we show the results of one of the 2015 articles.
Outline
2009, Shallow
Useful Information: Recent Articles
Source: Venkatesh N. Murthy, S. Maji, R. Manmatha, Automatic Image Annotation using Deep Learning Representations, 2015
Which one?
1. Theoretical Limitations of Shallow Architectures
2. Theoretical Advantages of Deep Architectures
Source: B. Klein, G. Lev, G. Sadeh, and L. Wolf, Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation 2015
Useful Information: Recent Articles (Cont.)
Useful Information: Toolbox
Useful Information: Databases
Other Databases: Flickr 8, 10, 30
Table Source: M. Guillaumin, T. Mensink, J. Verbeek and C. Schmid, TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation
Useful Information: Authors
Reported by: Google Scholar
Useful Information: Authors (Cont.)
Recursive Deep Learning for Natural Language Processing and Computer Vision, PhD Thesis, Computer Science Department, Stanford University
2014 Arthur L. Samuel Best Computer Science PhD Thesis Award
Reported by: Google Scholar
Outline
Conclusions
How to assign a word to an image? What are the components of an Automatic Image Annotation system?
Feature Extraction, Classification Methods: Pattern Recognition!
High-dimensional feature analysis
How to build an effective annotation model?
Annotation and ranking are currently done online simultaneously in the multiple-labelling annotation approaches; this is not efficient for image retrieval.
Lack of standard vocabulary and taxonomy.
There is no commonly accepted image database.
Insufficient depth of architectures, and locality of estimators [Bengio, 2009].
Conclusions (Cont.)
Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346-362. doi:10.1016/j.patcog.2011.05.013
"Locality of estimators" is another problem that deep learning has addressed.
And let us explain why we focused on this problem rather than the other problems; make a slide: because all AIA articles point to the semantic gap.
Let us come back to the question of whether the classical approach has been set aside altogether.
References