Ontology-based image representation and inference Ning Xu Advisor: Thomas Huang UIUC Many Slides from Shen-Fu Tsai, Derek Hoiem


  • Ontology-based image representation and inference

    Ning Xu

    Advisor: Thomas Huang

    UIUC

    Many Slides from Shen-Fu Tsai, Derek Hoiem

  • Outline

    • Traditional image representation and inference

    • Ontology-based image representation and inference

    – What is ontology

    – Why use ontology

    – Research on ontology

    • Semantic hierarchic classifiers [Schmid’07]

    • Album event recognition [Tsai’11]

    • Ontological image annotation [Tsai’12]

    • Conclusion and future work


  • Traditional image representation

    • Labels/categories are treated independently.

    Ex1. One-label problem: dog; bicycle; motorbike

    Ex2. Multi-label problem:

    office, desk, chair, computer

    bedroom, bed, mirror, drawer

    outdoor, sunlight, tree, sky

  • Traditional image inference

    • Image features: LAB histogram, textons, bag of SIFT, HOG

    • Examples + image features + classifier = category label

    [Figure: scatter plot of two classes (x and o) separated by a classifier]

    Slide from Derek Hoiem

  • Training phase

    [Figure: training pipeline. Training images → image features; image features and training labels → classifier training → trained classifier]

    Slide from Derek Hoiem

  • Testing phase

    [Figure: testing pipeline. Test image → image features → trained classifier → prediction (e.g. Outdoor)]

    Slide from Derek Hoiem
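The training and testing phases can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline from the slides: the feature is a toy intensity histogram, the classifier is a nearest-centroid rule, images are flat lists of intensities in [0, 255], and the names `extract_features`, `train`, and `predict` are hypothetical.

```python
from collections import defaultdict

def extract_features(image, bins=4):
    """Toy feature: normalized intensity histogram of a flat pixel list."""
    hist = [0.0] * bins
    for px in image:
        hist[min(px * bins // 256, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def train(images, labels):
    """Training phase: average the feature vectors of each label (centroids)."""
    sums, counts = {}, defaultdict(int)
    for img, lab in zip(images, labels):
        f = extract_features(img)
        acc = sums.setdefault(lab, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[lab] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(model, image):
    """Testing phase: return the label whose centroid is closest."""
    f = extract_features(image)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(f, centroid))

    return min(model, key=lambda lab: sq_dist(model[lab]))

# Made-up data: bright images are "outdoor", dark ones are "indoor".
train_images = [[200, 220, 250, 240], [230, 210, 255, 190],
                [10, 30, 20, 60], [40, 5, 50, 25]]
train_labels = ["outdoor", "outdoor", "indoor", "indoor"]
model = train(train_images, train_labels)
prediction = predict(model, [245, 215, 225, 205])
print(prediction)
```

Any of the features listed earlier (bag of SIFT, HOG, and so on) and any trainable classifier could replace the two toy components without changing the train/predict split.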


  • What is ontology

    • Ontology

    – Prior human knowledge, domain knowledge

    – A set of concepts and their relations (part-of, is-a, co-occurrence, etc.) in some domain

    Slide from Shen-Fu Tsai

    Parmenides was among the first to propose an ontological characterization of the fundamental nature of reality.

    http://en.wikipedia.org/wiki/Parmenides

  • General ontology structure

    • scene

    – indoor

    – outdoor

    • object

    – natural

    – artifact

    • event

    – sports

    – social

    Slide from Shen-Fu Tsai


  • Why use ontology

    • Scalability

    – Without an ontology, N concepts need N(N-1)/2 one-versus-one classifiers or N one-versus-rest classifiers.

    – With an ontology, classifying a test image takes approximately ⌈log2 N⌉ classifier evaluations for N concepts, assuming a balanced binary hierarchy.

    – N can be quite large in real datasets (ImageNet, Flickr, etc.).
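To make the scalability argument concrete, the three counts can be computed directly. This sketch assumes the ceil(log2 N) figure refers to the classifiers evaluated along one root-to-leaf path of a balanced binary hierarchy; `classifier_counts` is a hypothetical helper name.

```python
import math

def classifier_counts(n):
    """Return (one-vs-one, one-vs-rest, balanced-hierarchy depth) for n concepts."""
    one_vs_one = n * (n - 1) // 2
    one_vs_rest = n
    # Depth of a balanced binary tree with n leaves (assumption, see lead-in).
    hierarchy_depth = math.ceil(math.log2(n)) if n > 1 else 0
    return one_vs_one, one_vs_rest, hierarchy_depth

for n in (10, 1000):
    print(n, classifier_counts(n))
```

For 1000 concepts this gives 499500 one-versus-one classifiers, 1000 one-versus-rest classifiers, and a hierarchy depth of 10.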

  • Why use ontology

    • Independently trained concept classifiers are limited and can even be erroneous

    [Figure: scatter plot of three classes (x, o, Δ), with two annotations labeled 1 and 2]

  • Why use ontology

    • Ontology makes us more knowledgeable

    – If we know object A is a sedan, then we also know A is a car, a vehicle, and a means of transportation, without needing to train all of those classifiers.

    – If we can’t confidently say whether A is a sedan or an SUV, we can still label A as a car.

    – Bridges the gap between low-level concepts and high-level ones
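The sedan example can be written as a small lookup over is-a links. The `IS_A` table below is a hypothetical fragment made up for illustration (it assumes each concept has a single parent); `ancestors` and `common_label` are illustrative names.

```python
# Hypothetical is-a links (child -> parent), following the sedan example.
IS_A = {
    "sedan": "car",
    "SUV": "car",
    "car": "vehicle",
    "vehicle": "means of transportation",
}

def ancestors(concept):
    """All concepts implied by `concept` through is-a links, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain

def common_label(a, b):
    """Most specific concept that covers both a and b, or None."""
    covers_b = [b] + ancestors(b)
    for c in [a] + ancestors(a):
        if c in covers_b:
            return c
    return None

print(ancestors("sedan"))            # labels obtained without extra classifiers
print(common_label("sedan", "SUV"))  # fallback label when the two are confused
```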


  • Semantic hierarchies for image classification [Schmid’07]

    • Basic idea: use semantic hierarchies to reflect the similarity among categories in terms of visual appearance.

  • First step: feature extraction

    • Harris-Laplace detector and Laplacian detector

    • SIFT descriptor and hue color descriptor (128D + 36D = 164D)

    • Bag of words (1000D dictionary)

  • Choice of classifiers

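    • SVM classifier with an extended Gaussian kernel $K(H_i, H_j) = e^{-\frac{1}{A} D(H_i, H_j)}$,

    where $D(H_i, H_j) = \frac{1}{2} \sum_{k} \frac{(h_{ik} - h_{jk})^2}{h_{ik} + h_{jk}}$ is the $\chi^2$ distance, $H_i = \{h_{ik}\}$ and $H_j = \{h_{jk}\}$ are the dictionary histograms of images i and j, and A is the mean value of the distances between all training images.

    • $D = \sum_n D_n$, where n indexes the channels.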
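A minimal sketch of this kernel follows. It assumes the histogram distance is the chi-square distance and that A is the mean over all distinct image pairs; the function names and the tiny 3-bin histograms are made up for illustration.

```python
import math

def chi2_distance(h_i, h_j):
    """Chi-square distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def multichannel_distance(channels_i, channels_j):
    """D = sum of the per-channel distances D_n."""
    return sum(chi2_distance(a, b) for a, b in zip(channels_i, channels_j))

def kernel_matrix(images):
    """K(H_i, H_j) = exp(-D(H_i, H_j) / A), with A the mean pairwise distance."""
    n = len(images)
    dist = [[multichannel_distance(images[i], images[j]) for j in range(n)]
            for i in range(n)]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_dist = sum(pairs) / len(pairs)
    return [[math.exp(-d / mean_dist) for d in row] for row in dist]

# Each image has two hypothetical channels, each a tiny 3-bin histogram.
images = [
    [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]],
    [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
K = kernel_matrix(images)
```

The resulting matrix is symmetric with ones on the diagonal, so it could be passed to an SVM implementation that accepts precomputed kernels.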

  • Second step: extract semantic graph

    • WordNet contains over 80000 noun synonym sets called synsets.

    • Two kinds of semantic relations are defined as hypernymy/hyponymy (is-a) and holonymy/meronymy (part-of).

    WordNet: http://wordnetweb.princeton.edu/perl/webwn

  • Extracted subgraphs

  • Semantic graph pruning

    • The part-of relation may permit reasoning that is incorrect from the point of view of visual appearance. E.g. a car has fuel, which is an organic material, but that does not imply similarity to a living organism like a cat.

    • Pruning: starting from the base node, reject the nodes that are not connected to it through the is-a relation graph.
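One way to read the pruning rule is as a reachability test over is-a edges only. The sketch below follows that reading, which is an assumption; the edge triples and the `prune` helper are made up for illustration and loosely follow the car/fuel example.

```python
from collections import deque

def prune(edges, base):
    """Keep only the nodes reachable from `base` through is-a edges.

    `edges` is a list of (parent, child, relation) triples.
    """
    children = {}
    for parent, child, relation in edges:
        if relation == "is-a":
            children.setdefault(parent, []).append(child)
    kept, queue = {base}, deque([base])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in kept:
                kept.add(child)
                queue.append(child)
    return kept

# Made-up graph: fuel is only linked to car through a part-of edge.
edges = [
    ("entity", "vehicle", "is-a"),
    ("vehicle", "car", "is-a"),
    ("entity", "organism", "is-a"),
    ("organism", "cat", "is-a"),
    ("car", "fuel", "part-of"),
    ("fuel", "organic material", "is-a"),
]
kept = prune(edges, "entity")
print(sorted(kept))
```

Here "fuel" and "organic material" are rejected because no chain of is-a edges connects them to the base node.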

  • Third step: construct semantic hierarchic classifier

    • Define the support of concept A as

    • Train a binary SVM classifier Bi|A for each is-a or part-of relation between A and Bi.

    • Base node is supported by all training images.

    • When support(A) = support(Bi), generate a trivial classifier with only one label.

  • Inference

    • Given a test image, start from the base node.

    • Descend to a linked concept when the classifier returns a positive answer.

    • There may be multiple paths to one concept in the ontology, so the final decision value is defined as

    $g_c = \max_{p \in P} \min_{e \in p} g_e$

    where c is the concept, v is the concept set containing c, s is the base node, P is the set of possible paths from s to v, e ranges over the edges of a path p in P, and $g_e$ is the decision value of the classifier on edge e. In other words, the maximum decision value over all possible paths is returned, whereas for a given path the minimum decision value over its edges is chosen.
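The max-over-paths, min-over-edges rule can be sketched recursively. This version assumes an acyclic graph and, for simplicity, explores every path rather than stopping at the first negative classifier answer; the graph, the edge scores, and the name `decision_value` are made up for illustration.

```python
def decision_value(edges, scores, base, target):
    """Max over paths from base to target of the min edge score on the path.

    `edges` maps a node to its linked (more specific) concepts; `scores`
    maps (parent, child) to the decision value of that edge's classifier.
    The graph is assumed to be acyclic.
    """
    if base == target:
        return float("inf")  # empty path: no edge constrains the value
    best = float("-inf")     # stays -inf if the target is unreachable
    for child in edges.get(base, []):
        rest = decision_value(edges, scores, child, target)
        best = max(best, min(scores[(base, child)], rest))
    return best

# Two paths lead from "entity" to "car".
edges = {"entity": ["vehicle", "artifact"], "vehicle": ["car"], "artifact": ["car"]}
scores = {
    ("entity", "vehicle"): 0.9, ("vehicle", "car"): 0.2,
    ("entity", "artifact"): 0.6, ("artifact", "car"): 0.5,
}
value = decision_value(edges, scores, "entity", "car")
print(value)
```

The path through "vehicle" is limited by its weakest edge (0.2) and the path through "artifact" by 0.5, so the returned value is 0.5.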

  • Inference

    Test image

  • Complexity

    • Define complexity = the number of binary classifiers evaluated for a test image.

    • Complexity is difficult to measure, since it depends not only on the structure of the hierarchy but also on the number of paths considered.

    • A rough empirical estimate on VOC’06 is $O(N^{0.64})$, which is better than the $O(N)$ of the traditional one-versus-rest classifier.

  • Experimental results

    • Image dataset (VOC’06):

    – 10 concepts: bike, bus, car, cat, cow, dog, horse, motorbike, person, sheep;

    – 1277 training images, 1341 testing images

  • Experimental results

    • Compared algorithms:

    – OAR: One-Against-Rest classifier;

    – AVH: Automatically constructed Visual Hierarchy which is a binary tree obtained by iteratively merging categories with smallest average distance;

    – SSH: Simple Semantic Hierarchy which only considers is-a relation;

    – ESH: Extended Semantic Hierarchy which considers both is-a and part-of relations.

  • Experimental results

  • Experimental results

    • A: low-level concepts in VOC’06

    – SH methods are generally better than OAR: they improve efficiency with no loss of accuracy.

    – SH methods are generally better than AVH, meaning that apparent visual similarity may not generalize well to object classes, while semantic knowledge helps more.

    • B: high-level concepts in VOC’06

    – SH methods are capable of reasoning about high-level concepts.

    • C: images from an external dataset, obtained by querying “vehicle window”, “windscreen”, and “windshield” in Google

    – Tests the generalization ability of the classifiers.

    – SSH can’t be applied, since it considers only the is-a relation.

    – For OAR and AVH, simple reasoning is applied: if there is a car or bus, then there is a window.

  • Publication

    • Marszalek, Marcin, and Cordelia Schmid. "Semantic hierarchies for visual object recognition." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.


  • album event recognition [Tsai’11]

    • Goal: recognize the event/topic of a given album (a set of images);

  • Basic idea

    • Use co-occurrence relation to identify typical concepts for each event;

    – Event hiking:

    • Positive concepts: mountain, people walking, outdoor etc.;

    • Negative concepts: bedroom, indoor etc.;

    – Event Valentine’s day:

    • Positive concepts: chocolate, heart, candy;

    • Negative concepts: turkey, green clothes etc.;

  • Framework of album event classification

  • What is the object pattern: imperfect object detection

    Discovered patterns: {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7)}

    With imperfect detection: let’s discretize the continuous-valued scores of detector output:

    Quantized detection:

    {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7), person(2/7)}

    {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7), person(5/7)}

  • Dataset construction: select popular holidays using Flickr

  • Dataset construction: picking up relevant objects

    • For each tag T, Flickr provides some relevant tags

    • Take the union of the relevant object tags of all 10 holidays (500 tags)

    • For each holiday H

    – Rank each tag T by

    • R(H, T) = |I(H and T)| / |I(H or T)|

    • Pick the top 50 tags
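The ranking score R(H, T) is an intersection-over-union ratio, which is straightforward to compute on sets. The sketch below assumes I(·) denotes the set of images carrying a tag; the image ids, tag names, and helper names are made up for illustration.

```python
def relevance(images_h, images_t):
    """R(H, T) = |I(H and T)| / |I(H or T)| for two sets of image ids."""
    union = images_h | images_t
    return len(images_h & images_t) / len(union) if union else 0.0

def top_tags(images_h, tag_to_images, k=50):
    """Rank tags by relevance to the holiday and keep the k best."""
    ranked = sorted(tag_to_images,
                    key=lambda t: relevance(images_h, tag_to_images[t]),
                    reverse=True)
    return ranked[:k]

# Made-up image ids for one holiday and three candidate tags.
halloween = {1, 2, 3, 4}
tags = {"pumpkin": {1, 2, 3, 9}, "beach": {4, 7, 8}, "costume": {1, 2, 3, 4}}
best = top_tags(halloween, tags, k=2)
print(best)
```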

  • List of 38 object detectors

    | Holiday | Positively relevant objects |
    | --- | --- |
    | Christmas | Christmas tree, gift |
    | Easter | Easter egg, basket, rabbit, church |
    | Halloween | Attire, pumpkin, jack-o-lantern |
    | Independence Day | American flag, firework, crowd |
    | Mardi Gras | Mask, necklace, attire, feather boa |
    | Memorial Day | American flag, uniform, military uniform, music band |
    | New Year’s Eve | Champagne, firework, crowd |
    | St. Patrick’s Day | Music band, crowd |
    | Thanksgiving | Food, dinner, turkey, pumpkin |
    | Valentine’s Day | Heart, bouquet |
    | Other objects | Accordion, bassoon, child, cross, drum, euphonium, flag, french horn, light source, room light, shopping basket, soil, stage, table |

  • Some Mined Patterns

  • Some Mined Patterns

  • Pattern ranking for album event classification

    • Let f(p) = percentage of photos containing pattern p in an album

    • For each event E

    – For each pattern p

    • Try predicting E using f(p)

    • Measure the prediction performance by Average Precision(AP)

    – Rank all patterns by their APs with respect to E

    • Take the union of top patterns for all events
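The ranking loop above can be sketched as follows. It assumes each photo is represented as a set of detected objects, a pattern is a set of objects that must all appear in a photo, and AP is computed by ranking albums by f(p); the albums and helper names are made up for illustration.

```python
def pattern_frequency(album, pattern):
    """f(p): fraction of photos in the album whose objects include pattern p."""
    return sum(pattern <= photo for photo in album) / len(album)

def average_precision(scores, is_event):
    """AP of ranking albums by score, where is_event marks the positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if is_event[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def rank_patterns(albums, labels, event, patterns):
    """Sort patterns by how well f(p) alone predicts the event."""
    is_event = [lab == event for lab in labels]

    def ap(p):
        return average_precision([pattern_frequency(a, p) for a in albums], is_event)

    return sorted(patterns, key=ap, reverse=True)

# Three tiny made-up albums; each photo is a set of detected objects.
albums = [
    [{"mountain", "sky"}, {"mountain", "person"}],
    [{"mountain", "sky"}, {"sky", "tree"}],
    [{"table", "food"}, {"food", "person"}],
]
labels = ["hiking", "hiking", "potluck"]
patterns = [frozenset({"food"}), frozenset({"mountain"}), frozenset({"person"})]
ranked = rank_patterns(albums, labels, "hiking", patterns)
print(ranked)
```

Repeating `rank_patterns` for every event and taking the union of each event's top patterns gives the final pattern set.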

  • Experimental results

    • Datasets: 1) a small dataset with 3 topics: potluck, hiking, concert; 2) albums of 10 holidays collected from Flickr.

    • Compared algorithms:

    – Image-based multiclass AdaBoost (SAMME)

    • J. Yuan, J. Luo, and Y. Wu. Mining compositional features for boosting. In IEEE CVPR 2008.

    • Differences: 1) patterns are mined from the whole dataset; 2) the result is the majority vote of the image labels in the given album.

    – Compositional object pattern with non-flexible pattern (COPF_base)

    – Compositional object pattern with flexible pattern (COPF)

  • Classification results of small dataset

  • Classification results of 10 holiday dataset

  • Publication

    • Tsai, Shen-Fu, et al. "Compositional object pattern: a new model for album event recognition." Proceedings of the 19th ACM international conference on Multimedia. ACM, 2011.


  • Ontological image annotation [Tsai’12]

    • Input: image I; concepts C1, C2, …, Cn; output values x1, x2, …, xn from coarse detectors;

    • Output: y1, y2, …, yn, where yi is 1 or -1 which indicates whether Ci is present in the image or not.

    [Figure: image → coarse detectors for C1 to C5 → refined detections for C1 to C5, with a graph relating concepts C1 to C5]

  • Basic idea

    • Joint inference of concepts, considering their subclass and co-occurrence relations

    [Figure: concepts of interest and WordNet → subclass extraction → subclass relation; training images → co-occurrence learner → co-occurrence relation; both relations → inference]

  • Formulation

    Unary potential

    Potential function

    Pairwise potential

  • Relation constraints

    • Subclass constraint (hard constraint)

    – If Ca (dog) is a subclass of Cb (animal), then yb ≥ ya

    – Relation obtained from WordNet

    • Co-occurrence reward/penalty (soft constraint)

    – E.g. reward the (indoor, table) pair

    – E.g. penalize the (computer, beach) pair

    – Learned from the training set

    – Only positive pairs are considered

  • Inference

    • Find the assignment y that satisfies all constraints with the highest score
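A brute-force version of this inference step is sketched below. The potential functions are not reproduced in this transcript, so the score used here is an assumption: a unary term x_i * y_i per concept plus a reward for each co-occurring positive pair, maximized subject to the subclass constraint. Exhaustive search is only practical for a handful of concepts; `infer` and the example values are hypothetical.

```python
from itertools import product

def infer(x, subclass, cooccur):
    """Best labeling y in {-1, +1}^n under the subclass constraints.

    x        : coarse detector outputs, one per concept
    subclass : list of (a, b) index pairs, concept a is a subclass of b
    cooccur  : dict {(a, b): reward} added when both concepts are present
    """
    best_y, best_score = None, float("-inf")
    for y in product((-1, 1), repeat=len(x)):
        # Hard constraint y_b >= y_a: no subclass without its parent.
        if any(y[b] < y[a] for a, b in subclass):
            continue
        score = sum(xi * yi for xi, yi in zip(x, y))  # assumed unary terms
        score += sum(w for (a, b), w in cooccur.items()
                     if y[a] == 1 and y[b] == 1)      # assumed pairwise terms
        if score > best_score:
            best_y, best_score = y, score
    return best_y

# Concepts: 0 = animal, 1 = dog, 2 = indoor, 3 = table (made-up scores).
x = [-0.1, 0.8, 0.6, -0.2]
y = infer(x, subclass=[(1, 0)], cooccur={(2, 3): 0.5})
print(y)
```

In this example the weak "animal" detection is switched on because "dog" is confidently detected, and "table" is switched on by the co-occurrence reward with "indoor".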

  • subclass relation

    • Indoor

    – Bedroom

    – Office

    • Outdoor

    • Light

    – Room light

    – Street light

    • Computer

    – Laptop

    – Desktop computer

    [Figure: WordNet subclass relations among entity, artifact, device, structure/construction, personal computer, source of illumination, bedroom, office, laptop, desktop computer, room light, and street light]

  • Final subclass relation

  • Ontological learning

  • Baseline algorithms

    • RAW: raw output of initial detectors

    • Semantic Hierarchy (SH): conditional classifier on each subclass/part-of link

    • SVM fusion

  • Results: AUC with 50% training

    [Figure: bar chart of AUC (y-axis, 0.6 to 0.9) per concept (indoor, outdoor, bedroom, office, light, room light, street light, computer, laptop, desktop, average) for OI, SH, SVM, and RAW]

  • AUC vs. amount of training data

    [Figure: mean AUC (y-axis, 0.68 to 0.78) versus percentage of training data (20, 35, 50%) for OI, SH, SVM, and RAW]

  • Publication

    • Tsai, Shen-Fu, et al. "Ontological Inference Framework with Joint Ontology Construction and Learning for Image Understanding." Multimedia and Expo (ICME), 2012 IEEE International Conference on. IEEE, 2012.


  • Conclusion

    • Advantages:

    – Joint inference;

    – Scalability;

    – More robust and accurate classifiers;

    – Bridging low-level and high-level semantics;

    • Disadvantages:

    – Harder to understand than traditional methods

    – Sometimes prior knowledge is wrong;

    – Efficiency and accuracy usually trade off against each other;

  • Future work

    • Explore ontology deeper to see how much improvement can be achieved in terms of accuracy and efficiency;

    • Explore ontology wider to apply ontology on many other domains such as medical imaging, healthcare, AI etc.;

    • Explore how to construct ontology automatically or semiautomatically;

  • Thank you !!