Automatic Image Annotation (AIA)


Seminar Report

Presented to: Dr. Shanbehzadeh

Presented by: Farzaneh Rezaei

November 2015

1

What is the goal of computer vision?
Perceive the story behind the picture. See the world!!
But what exactly does it mean to see?

2
Source: WALL-E movie: Pixar, Walt Disney Pictures

Outline

Introduction

What is Automatic Image Annotation?

Automatic image annotation is the task of automatically assigning words to an image that describe the content of the image.

Munirathnam Srikanth, et al., Exploiting Ontologies for Automatic Image Annotation

Source: Personalizing Automated Image Annotation Using Cross-Entropy: https://ivi.fnwi.uva.nl/isis/publications/bibtexbrowser.php?key=LiICM2011&bib=all.bib
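To make the definition concrete, here is a minimal, hypothetical sketch of annotation by nearest-neighbour tag transfer (illustrative only; this is not the cross-entropy method of the cited paper): the most frequent tags of the visually most similar training images are assigned to a new image. The feature vectors and tag lists below are toy placeholders.

```python
import numpy as np

def annotate_knn(query_feat, train_feats, train_tags, k=3, n_tags=3):
    """Return the n_tags most frequent tags among the k visually nearest training images."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)  # distance to every training image
    neighbours = np.argsort(dists)[:k]                        # indices of the k nearest images
    votes = {}
    for i in neighbours:
        for tag in train_tags[i]:
            votes[tag] = votes.get(tag, 0) + 1                # each neighbour votes for its tags
    return sorted(votes, key=votes.get, reverse=True)[:n_tags]

# Toy usage with random "features" and hand-made tag lists.
rng = np.random.default_rng(0)
train_feats = rng.random((6, 4))
train_tags = [["sky", "sea"], ["sky", "cloud"], ["grass", "cow"],
              ["grass", "tree"], ["sea", "boat"], ["sky", "sea", "boat"]]
print(annotate_knn(rng.random(4), train_feats, train_tags))
```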


What is Automatic Image Annotation? (Cont.)

Source: MS COCO Captioning Challenge: http://mscoco.org/dataset/#captions-challenge2015


3,000 Photos Are Uploaded Every Second to Facebook

Why is Image Annotation Important?

Recently, we have witnessed an exponential growth of user-generated videos and images, due to the boom of social networks such as Facebook and Flickr.
Source: http://petapixel.com/2012/02/01/3000-photos-are-uploaded-every-second-to-facebook/

7

Why is Image Annotation Important? (Cont.)

Applications: e.g., photo organizer apps, image classification systems.
Source: Barriuso, A., & Torralba, A. (2012). Notes on image annotation.

8


Number of articles per year for "Automatic Image Annotation" (in the article title). Reported by: Google Scholar.


Outline

Introduction

How do you annotate these images?

11

What are the components of an Automatic Image Annotation System?
12

How to classify images?
What are the components of an Automatic Image Annotation System?
13

What are the components of an Automatic Image Annotation System?
Feature Extraction
Classification Methods
14

What are the components of an Automatic Image Annotation System?
Classification Methods
Feature Extraction
15

What are the components of an Automatic Image Annotation System?
Feature Extraction
Classification Methods
Pattern Recognition!!
16

Speaker note: So the main components of an AIA system follow pattern recognition structures, which is why studying the structures used in pattern recognition helps us. More importantly, the root cause of AIA's problems can be traced back to these same structures.

17
Slide Credit

18

An Example of Classical Approaches in AIA
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346-362. doi:10.1016/j.patcog.2011.05.013

Speaker note: How deep should we go down this tree? For example, for an image of a flower? Or an image of a busy intersection? How many levels should we go?

Theoretical Limitations of Shallow Architectures*
Functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k-1 architecture.

Issues of classical approaches
19
*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning


Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
Shallow? Deep? Functions? Compact? Depth? Computational Elements?
20
logic circuit

Speaker note: An example that clarifies "function" and "computational element" is a logic circuit: the output is the simplified form of the circuit, and each gate represents one computational element. An example of this is on the next slide.

Issues of classical approaches (Cont.)
21

Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning

Figure: example circuits of depth 4 and depth 3

Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
Linear regression and logistic regression have depth 1, i.e., a single level.
Ordinary multi-layer neural networks, with the most common choice of one hidden layer, have depth two.
Decision trees can also be seen as having two levels.
Boosting (Freund & Schapire, 1996) usually adds one level to its base learners: that level computes a vote or linear combination of the outputs of the base learners.
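As a hedged illustration of how these depths are counted (NumPy only, toy numbers, not tied to any cited system), compare a depth-1 logistic regression with a depth-2 one-hidden-layer network:

```python
import numpy as np

x = np.random.randn(8)                         # an input feature vector

# Depth 1: logistic regression -- a single level (one weighted sum + squashing).
w, b = np.random.randn(8), 0.1
y_shallow = 1 / (1 + np.exp(-(w @ x + b)))

# Depth 2: one hidden layer -- two levels of adaptive computation stacked.
W1, b1 = np.random.randn(16, 8), np.zeros(16)
w2, b2 = np.random.randn(16), 0.0
h = np.tanh(W1 @ x + b1)                       # level 1: hidden representation
y_deep = 1 / (1 + np.exp(-(w2 @ h + b2)))      # level 2: output

print(round(float(y_shallow), 3), round(float(y_deep), 3))
```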

22

Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
Shallow? Deep? Functions, Compact, Depth, Computational Elements
23

Speaker note: But what does "deep" actually mean? Do we say that from depth 10 onward an architecture counts as deep? What is your opinion? Returning to the statement above: we do not have a specific number in mind, because it depends on the problem; the point is whether the target function can be represented compactly with some depth k. Zisserman's paper argues that the deeper the network, the better the results, but one should ask whether the improvement is worth it.

Theoretical Limitations of Shallow Architectures*
Functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k-1 architecture.

Issues of classical approaches
24
*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning


A two-layer circuit of logic gates can represent any Boolean function (Mendelson, 1997).
With depth-two logical circuits, most Boolean functions require an exponential number of logic gates (Wegener, 1987) to be represented (with respect to input size).
There are functions computable with a polynomial-size logic-gate circuit of depth k that require exponential size when restricted to depth k-1 (Håstad, 1986). The proof of this theorem relies on earlier results (Yao, 1985) showing that d-bit parity circuits of depth 2 have exponential size.
25
Issues of classical approaches (Cont.)
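A small, hedged illustration of the parity example (plain Python; it counts minterms rather than building actual circuits): at depth 2, parity of d bits needs exponentially many AND terms, while a deeper chain of two-input XOR gates grows only linearly.

```python
from itertools import product

def parity_depth2_terms(d):
    """Number of minterms of d-bit parity = AND gates needed in a depth-2 (AND-OR) circuit."""
    return sum(1 for bits in product([0, 1], repeat=d) if sum(bits) % 2 == 1)

def parity_deep(bits):
    """Deep computation: a chain of len(bits)-1 two-input XOR gates."""
    result = 0
    for b in bits:
        result ^= b
    return result

for d in (2, 4, 8, 16):
    print(d, "bits ->", parity_depth2_terms(d), "AND terms at depth 2 vs",
          d - 1, "XOR gates in a deep chain")
print(parity_deep([1, 0, 1, 1]))   # 1 (odd number of ones)
```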

25

One might wonder whether these computational complexity results for Boolean circuits are relevant to machine learning. See Orponen (1994) for an early survey of theoretical results in computational complexity relevant to learning algorithms. Interestingly, many of the results for Boolean circuits can be generalized to architectures whose computational elements are linear threshold units (also known as artificial neurons (McCulloch & Pitts, 1943)), which compute

f(x) = 1 if w·x + b ≥ 0, and 0 otherwise    (1)

with parameters w and b.
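A minimal sketch of such a linear threshold unit, following equation (1) (NumPy assumed; weights are arbitrary toy values):

```python
import numpy as np

def threshold_unit(x, w, b):
    """Linear threshold unit: fires (1) iff the weighted sum plus bias is non-negative."""
    return 1 if np.dot(w, x) + b >= 0 else 0

print(threshold_unit(np.array([1.0, -0.5]), np.array([0.8, 0.4]), -0.2))  # 1
print(threshold_unit(np.array([0.1, -0.9]), np.array([0.8, 0.4]), -0.2))  # 0
```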

26
Issues of classical approaches (Cont.)

26

27
Issues of classical approaches (Cont.)
1. Theoretical Limitations of Shallow Architectures
2. Theoretical Advantages of Deep Architectures
Which one??!

Speaker note: On the question of which structure is better: do not give the answer here; say instead that Bengio's papers need to be read more thoroughly to understand the reasoning. For now, think about it yourselves; in the final conclusion I will give my own opinion based on what I have read.

28
Slide Credit

29

Slide Credit

How to assign a word to an image?
What are the components of an Automatic Image Annotation System?
Feature Extraction
Classification Methods
Pattern Recognition!!
30

30

31

http://graffiti-artist.net/corporate-offices/ny-facebook-office-graffiti/

31

Outline

Introduction

Going Deeper!
33


Feature Extraction
34

Color

35

Color

36

Color: Comparisons
Histogram. Pros: simple to compute, intuitive. Cons: high dimension, no spatial info, sensitive to noise.
CM (color moments). Pros: compact, robust. Cons: not enough to describe all colors, no spatial info.
CCV (color coherence vector). Pros: spatial info. Cons: high dimension, high computation cost.
Correlogram. Pros: spatial info. Cons: very high computation cost; sensitive to noise, rotation and scale.
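As a hedged illustration of the simplest entry in the comparison above, a global per-channel color histogram (NumPy only; a random array stands in for a real image):

```python
import numpy as np

def color_histogram(image, bins=8):
    """image: H x W x 3 uint8 array -> normalised 3*bins-dimensional feature vector."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]                   # one histogram per RGB channel
    feat = np.concatenate(hists).astype(float)
    return feat / feat.sum()                      # normalise so the feature ignores image size

demo = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)   # toy "image"
print(color_histogram(demo).shape)                            # (24,)
```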

37

37

Color: Comparisons (Cont.)
DCD (dominant color descriptor). Pros: compact, robust, perceptual meaning. Cons: needs post-processing for spatial info.
CSD (color structure descriptor). Pros: spatial info. Cons: sensitive to noise, rotation and scale.
SCD (scalable color descriptor). Pros: compact on need, scalability. Cons: no spatial info, less accurate if compact.

38

Spatial Texture: Comparisons
Texton. Pros: intuitive. Cons: sensitive to noise, rotation and scale; difficult to define textons.
GLCM-based method. Pros: intuitive, compact, robust. Cons: high computation cost, not enough to describe all textures.
Tamura. Pros: perceptually meaningful. Cons: too few features.
SAR. Pros: compact, robust, rotation invariant. Cons: high computation cost, difficult to define pattern size.
FD. Pros: compact, perceptually meaningful. Cons: high computation cost, sensitive to scale.

39

Spectral Texture: Comparisons (Cont.)
FT/DCT. Pros: fast computation. Cons: sensitive to scale and rotation.
Wavelet. Pros: fast computation, multi-resolution. Cons: sensitive to rotation, limited orientations.
Gabor. Pros: multi-scale, multi-orientation, robust. Cons: needs rotation normalisation, loses spectral information due to incomplete coverage of the spectrum plane.
Curvelet. Pros: multi-resolution, multi-orientation, robust. Cons: needs rotation normalisation.
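A hedged sketch (assuming OpenCV is available) of a small Gabor filter bank, the multi-scale, multi-orientation descriptor from the table above; the mean and standard deviation of each filter response form a simple texture vector:

```python
import cv2
import numpy as np

image = (np.random.rand(64, 64) * 255).astype(np.uint8)    # toy grayscale image
features = []
for theta in np.arange(0, np.pi, np.pi / 4):                # 4 orientations
    for lam in (4, 8):                                       # 2 scales (wavelengths)
        kernel = cv2.getGaborKernel((15, 15), sigma=3.0, theta=theta,
                                    lambd=lam, gamma=0.5)
        response = cv2.filter2D(image, cv2.CV_32F, kernel)   # filter the image
        features += [response.mean(), response.std()]        # mean/std per filter
print(len(features))                                          # 16-dimensional texture vector
```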

40

Shape

Chart Source: [Zhang and Lu 2004]
41

Because contour based techniques use only a portion of the region, they are more sensitive to noise than region based techniques


Shape (Cont.)
Chart Source: [M. Yang, K. Kpalma, J. Ronsin 2008]
42

Because contour based techniques use only a portion of the region, they are more sensitive to noise than region based techniques


Shape (Cont.)
43

Shape (Cont.)
Because contour-based techniques are more sensitive to noise than region-based techniques, color image retrieval usually employs region-based shape features.
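As a hedged example of a region-based shape descriptor (one common choice; the slide does not name a specific one), Hu moments computed from a toy binary region with OpenCV:

```python
import cv2
import numpy as np

mask = np.zeros((100, 100), dtype=np.uint8)
cv2.circle(mask, (50, 50), 30, 255, thickness=-1)   # toy filled region (a disc)

moments = cv2.moments(mask, binaryImage=True)        # region (area) moments
hu = cv2.HuMoments(moments).flatten()                # 7 rotation/scale-invariant values
print(hu)                                             # region-based shape feature vector
```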

44

Because contour based techniques use only a portion of the region, they are more sensitive to noise than region based techniques


Learning Methods
45

Learning Methods: Comparisons

SVM. Pros: small sample, optimal class boundary, non-linear classification. Cons: single labelling, one class at a time, expensive trial runs, sensitive to noisy data, prone to over-fitting.
ANN. Pros: multiclass outputs, non-linear classification, robust to noisy data, suitable for complex problems. Cons: single labelling, sub-optimal, expensive training, complex and black-box classification.
DT (decision tree). Pros: intuitive, semantic rules, multiclass outputs, fast, allows missing values, handles both categorical and numerical values. Cons: single labelling, sub-optimal, needs pruning, can be unstable.
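Illustrative only (assuming scikit-learn is available): the three single-label classifiers compared above, fitted on the same synthetic feature vectors; a real annotation system would train on image features and tag labels instead.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for image feature vectors with 3 possible labels.
X, y = make_classification(n_samples=200, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)

for name, clf in [("SVM", SVC()),
                  ("ANN", MLPClassifier(max_iter=1000, random_state=0)),
                  ("DT",  DecisionTreeClassifier(random_state=0))]:
    clf.fit(X, y)
    print(name, "training accuracy:", round(clf.score(X, y), 2))
```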

46

Learning Methods: Comparisons

Non-parametric. Pros: multi-labelling, model free, fast. Cons: large number of parameters, large sample needed, sensitive to noisy data.
Parametric. Pros: multi-labelling, small sample, good approximation of the unknown distribution. Cons: predefined distribution, expensive training, approximated boundary.
Metadata. Pros: use of both textual and visual features. Cons: difficult to relate visual features with textual features, difficult textual feature extraction.

47

Deep Learning
48
Deep belief networks
Deep Boltzmann machines
Deep convolutional neural networks
Deep recurrent neural networks
Hierarchical temporal memory

Source: https://en.wikipedia.org/wiki/List_of_machine_learning_concepts
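A minimal, hypothetical sketch (assuming PyTorch; no framework is named in the slides) of one architecture from the list above: a small convolutional network producing independent per-tag probabilities for multi-label annotation.

```python
import torch
import torch.nn as nn

class TinyAnnotator(nn.Module):
    def __init__(self, n_tags=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(32, n_tags)

    def forward(self, x):                              # x: (batch, 3, H, W)
        h = self.features(x).flatten(1)                # global image representation
        return torch.sigmoid(self.classifier(h))       # independent tag probabilities

print(TinyAnnotator()(torch.randn(2, 3, 64, 64)).shape)   # torch.Size([2, 10])
```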

Deep Learning (Cont.)
49
Source: Ranzato, 4 October 2013, Slides

Deep Learning (Cont.)
50
A potential problem with Deep Learning*?? The optimization task. See: Bengio's articles!
Hot videos about Deep Learning on YouTube! Ranzato, 4 October 2013: https://www.youtube.com/watch?v=clgMTk5V2Sk

*: Ranzato, 4 October 2013, Slides

Speaker note: Here we describe the overall structure of a deep architecture; to compare deep with classical methods, and deep methods with one another, the next slide shows the results of one of the 2015 papers.

Outline

Introduction

2009, Shallow

Useful Information: Recent Articles
Source: Venkatesh N. Murthy, S. Maji, R. Manmatha, Automatic Image Annotation using Deep Learning Representations, 2015
52


53
Which one??!
1. Theoretical Limitations of Shallow Architectures
2. Theoretical Advantages of Deep Architectures

53

Source: B. Klein, G. Lev, G. Sadeh, and L. Wolf, Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation 2015

Useful Information: Recent Articles (Cont.)
54


Useful Information: Toolbox
55

55

Useful Information: Databases
56

56

Useful Information: Databases (Cont.)
Other databases: Flickr 8, 10, 30

Table Source: M. Guillaumin, T. Mensink, J. Verbeek and C. Schmid, TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation57

57

Useful Information: Authors

Reported by: Google Scholar
58

58

Useful Information: Authors (Cont.)

59
Recursive Deep Learning for Natural Language Processing and Computer Vision, PhD thesis, Computer Science Department, Stanford University.
2014 Arthur L. Samuel Best Computer Science PhD Thesis Award.
Reported by: Google Scholar

59

Outline

Introduction

Conclusions!!!
How to assign a word to an image?
What are the components of an Automatic Image Annotation System?
Feature Extraction
Classification Methods
Pattern Recognition!!
61

61

High-dimensional feature analysis.
How to build an effective annotation model?
Annotation and ranking are currently done online simultaneously in multi-labelling annotation approaches, which is not efficient for image retrieval.
Lack of a standard vocabulary and taxonomy.
There is no commonly accepted image database.
Insufficient depth of architectures, and locality of estimators [Bengio, 2009].

62
Conclusions (Cont.)
Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346-362. doi:10.1016/j.patcog.2011.05.013

Speaker note: "Locality of estimators" is another problem that deep learning has addressed.

Speaker note: Also explain why we focused on this problem rather than the others (make a slide for this): because all of the AIA papers point to the semantic gap.

Speaker note: Return to the question of whether classical approaches have been set aside entirely.


References

63

63