
Ontology-based image representation and inference

Ning Xu

Advisor: Thomas Huang

UIUC

Many Slides from Shen-Fu Tsai, Derek Hoiem

Outline

• Traditional image representation and inference

• Ontology-based image representation and inference

– What is ontology

– Why use ontology

– Research on ontology

• Semantic hierarchic classifiers [Schmid’07]

• Album event recognition [Tsai’11]

• Ontological image annotation [Tsai’12]

• Conclusion and future work

Traditional image representation

• Labels/categories are treated independently.

Ex1. one-label problem (one label per image): Dog / Bicycle / Motorbike

Ex2. multi-label problem (several labels per image):

– office, desk, chair, computer

– bedroom, bed, mirror, drawer

– outdoor, sunlight, tree, sky

Traditional image inference

[Figure: Examples + Image Features (LAB histogram, textons, bag of SIFT, HOG) + Classifier = Category label]

Slide from Derek Hoiem

http://images.google.com/imgres?imgurl=http://scienceblogs.com/bushwells/upload/2006/07/IcePlantOrgy.JPG&imgrefurl=http://scienceblogs.com/bushwells/2006/07/friday_flower_porn.php&h=1704&w=2272&sz=838&hl=en&start=17&tbnid=RBGFTXqFUNjqAM:&tbnh=113&tbnw=150&prev=/images?q=plant&gbv=2&hl=en&safe=off

Training phase

Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier

Testing phase

Test Image → Image Features → Trained Classifier → Prediction (e.g. Outdoor)
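The two phases can be sketched in a few lines. This is only an illustration under stated assumptions: `extract_features` is a hypothetical stand-in (a 2-bin intensity histogram rather than SIFT or HOG), the classifier is a simple nearest-centroid rule chosen for brevity, and the toy "images" are short lists of pixel intensities.

```python
from collections import defaultdict

def extract_features(image):
    """Hypothetical feature extractor: fraction of dark and bright pixels."""
    dark = sum(1 for p in image if p < 128) / len(image)
    return [dark, 1.0 - dark]

def train(images, labels):
    """Training phase: store the mean feature vector of each label."""
    sums, counts = {}, defaultdict(int)
    for image, label in zip(images, labels):
        feats = extract_features(image)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, value in enumerate(feats):
            acc[k] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, image):
    """Testing phase: return the label whose centroid is closest to the test features."""
    feats = extract_features(image)
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))
    return min(model, key=lambda label: sqdist(model[label]))

# Toy data (made up for illustration): bright images are "outdoor", dark ones "indoor".
train_images = [[200, 220, 250, 90], [240, 210, 180, 230], [30, 60, 200, 40], [20, 10, 90, 50]]
train_labels = ["outdoor", "outdoor", "indoor", "indoor"]
model = train(train_images, train_labels)
prediction = predict(model, [230, 240, 100, 250])
print(prediction)
```

Note that the same feature extractor is used in both phases; only the classifier parameters are learned.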


What is ontology

• Ontology

– Prior human knowledge, domain knowledge

– A set of concepts and their relations (part-of, is-a, co-occurrence, etc.) in some domain

Slide from Shen-Fu Tsai

Parmenides was among the first to propose an ontological characterization of the fundamental nature of reality.

http://en.wikipedia.org/wiki/Parmenides

General ontology structure

– scene: indoor, outdoor

– object: natural, artifact

– event: sports, social

Slide from Shen-Fu Tsai

Why use ontology

• Scalability

– Without an ontology, N concepts need N(N−1)/2 one-versus-one classifiers or N one-versus-rest classifiers;

– With an ontology, roughly ceil(log2 N) classifiers need to be evaluated per image for N concepts (e.g. when descending a balanced binary hierarchy);

– N can be quite large in real datasets (ImageNet, Flickr, etc.).
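To get a feel for these numbers, the counts can be computed directly. The sketch below assumes the hierarchy is a balanced binary tree, so that about ceil(log2 N) classifiers are evaluated per test image; the function name and dictionary keys are made up for illustration.

```python
import math

def classifier_counts(n):
    """Number of classifiers involved for n concepts under each scheme."""
    return {
        "one_vs_one": n * (n - 1) // 2,
        "one_vs_rest": n,
        # Assumption: balanced binary hierarchy, one classifier evaluated per level.
        "hierarchy_evals_per_image": math.ceil(math.log2(n)),
    }

for n in (10, 1000):
    print(n, classifier_counts(n))
```

For N = 1000 this gives 499500 one-versus-one classifiers, 1000 one-versus-rest classifiers, and 10 evaluations per image in the hierarchy.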

Why use ontology

• Independently trained concept classifiers are limited and can even be erroneous

[Figure: scatter plot of three classes of points (x, o, Δ), annotated with labels 1 and 2]

Why use ontology

• An ontology makes the system more knowledgeable

– If we know object A is a sedan, then we also know that A is a car, a vehicle, and a means of transportation, without needing to train a separate classifier for each of these concepts.

– If we can’t confidently say whether A is a sedan or an SUV, we can still label A as a car.

– It bridges the gap between low-level concepts and high-level ones.
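Both kinds of reasoning can be sketched with a small is-a table. The table below encodes only the sedan/SUV example from the slide; the function names are hypothetical, and the code assumes each concept has a single parent.

```python
# Toy is-a relation (child -> parent), taken from the sedan example above.
IS_A = {
    "sedan": "car",
    "SUV": "car",
    "car": "vehicle",
    "vehicle": "means of transportation",
}

def implied_labels(concept, is_a=IS_A):
    """Return the concept followed by all of its ancestors along is-a links."""
    labels = [concept]
    while labels[-1] in is_a:
        labels.append(is_a[labels[-1]])
    return labels

def back_off(candidates, is_a=IS_A):
    """Most specific concept that covers every candidate (None if there is none)."""
    chains = [implied_labels(c, is_a) for c in candidates]
    for node in chains[0]:
        if all(node in chain for chain in chains[1:]):
            return node
    return None

print(implied_labels("sedan"))
print(back_off(["sedan", "SUV"]))
```

Knowing "sedan" yields the three more general labels for free, and an uncertain sedan/SUV decision backs off to "car".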

Semantic hierarchies for image classification [Schmid’07]

• Basic idea: use semantic hierarchies to reflect the similarity among categories in terms of visual appearance.

First step: feature extraction

• Harris-Laplace detector and Laplacian detector

• SIFT descriptor and hue color descriptor (128D + 36D = 164D)

• Bag of words (1000D dictionary)

Choice of classifiers

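• SVM classifier with an extended Gaussian kernel

$K(H_i, H_j) = \exp\left(-\frac{1}{A} D(H_i, H_j)\right)$

where $D(H_i, H_j) = \frac{1}{2} \sum_{k} \frac{(h_{ik} - h_{jk})^2}{h_{ik} + h_{jk}}$ is the $\chi^2$ distance, $H_i = (h_{i1}, h_{i2}, \dots)$ and $H_j$ are the dictionary histograms of images i and j, and A is the mean value of the distances between all training images.

• With several channels, $D = \sum_n D_n$, where n indexes the channels.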
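A minimal sketch of this kernel follows. It assumes D is the χ² histogram distance commonly paired with the extended Gaussian kernel, sums the per-channel distances, and takes A as the mean distance over distinct pairs of training images. The function names and the toy histograms are made up for illustration.

```python
import math

def chi2_distance(h_i, h_j):
    """Chi-square distance between two histograms (empty bins are skipped)."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def multichannel_distance(img_i, img_j):
    """D = sum over channels n of D_n; an image is a list of per-channel histograms."""
    return sum(chi2_distance(a, b) for a, b in zip(img_i, img_j))

def kernel_matrix(images):
    """K[i][j] = exp(-D(i, j) / A), with A the mean distance over pairs i < j."""
    n = len(images)
    dist = [[multichannel_distance(images[i], images[j]) for j in range(n)] for i in range(n)]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_dist = sum(pairs) / len(pairs)
    return [[math.exp(-d / mean_dist) for d in row] for row in dist]

# Three toy images with a single channel and a 2-bin histogram each.
toy_images = [[[1.0, 0.0]], [[0.0, 1.0]], [[0.5, 0.5]]]
K = kernel_matrix(toy_images)
```

Normalizing by A makes the kernel insensitive to the overall scale of the distances, which is convenient when channels with different ranges are summed.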

Second step: extract semantic graph

• WordNet contains over 80,000 noun synonym sets, called synsets.

• Two kinds of semantic relations are used: hypernymy/hyponymy (is-a) and holonymy/meronymy (part-of).

Wordnet: http://wordnetweb.princeton.edu/perl/webwn


Extracted subgraphs

Semantic graph pruning

• The part-of relation may permit reasoning that is incorrect from the point of view of visual appearance. E.g. a car has fuel, which is an organic material, but this does not imply similarity to a living organism like a cat.

• Pruning: starting from the base node, reject the nodes that are not connected to it through the is-a relation graph.
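One possible reading of this pruning rule is a graph search that follows is-a edges only. The sketch below is an interpretation, not the paper's exact procedure: edges are hypothetical (parent, child, relation) triples, nodes unreachable from the base node through is-a links are dropped, and part-of edges are kept only when both endpoints survive.

```python
from collections import deque

def prune_semantic_graph(edges, base):
    """Keep nodes reachable from `base` via is-a edges, plus edges among kept nodes."""
    is_a_children = {}
    for parent, child, relation in edges:
        if relation == "is-a":
            is_a_children.setdefault(parent, []).append(child)
    kept = {base}
    queue = deque([base])
    while queue:
        node = queue.popleft()
        for child in is_a_children.get(node, []):
            if child not in kept:
                kept.add(child)
                queue.append(child)
    kept_edges = [e for e in edges if e[0] in kept and e[1] in kept]
    return kept, kept_edges

# Toy graph (made up for illustration), with "entity" as the base node.
toy_edges = [
    ("entity", "vehicle", "is-a"), ("vehicle", "car", "is-a"),
    ("entity", "animal", "is-a"), ("animal", "cat", "is-a"),
    ("entity", "window", "is-a"), ("car", "window", "part-of"),
    ("car", "fuel", "part-of"), ("organic material", "fuel", "is-a"),
]
kept_nodes, kept_edges = prune_semantic_graph(toy_edges, "entity")
```

In this toy graph "fuel" is reachable only through a part-of link, so it is rejected, while the part-of link from "car" to "window" survives because both nodes are connected to the base by is-a links.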

Third step: construct semantic hierarchic classifier

• Define the support of concept A as

• For each is-a or part-of link from A to Bi, train a binary SVM classifier Bi|A.

• The base node is supported by all training images.

• When support(A) = support(Bi), generate a trivial classifier with only one label.

Inference

• Given a test image, start from the base node.

• Descend to a linked concept when the corresponding classifier returns a positive answer.

• There may be multiple paths to one concept in the ontology, so the final decision value is defined as

$d(c) = \max_{p \in P} \min_{e \in p} d(e)$

where c is the concept, v is the concept set containing c, s is the base node, P is the set of possible paths from s to v, e ranges over the edges of a path p in P, and d(e) is the decision value of the classifier on edge e. In other words, for a given path the minimum decision value over its edges is taken, and the maximum of these values over all possible paths is returned.
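The max-over-paths, min-over-edges rule can be sketched as a depth-first walk. This is a simplified illustration: it assumes an acyclic hierarchy, a single target node instead of a concept set, and per-edge decision values that are already computed; the names and numbers are hypothetical.

```python
def decision_value(children, edge_score, base, target):
    """Maximum over all base-to-target paths of the minimum edge score on the path."""
    best = float("-inf")

    def walk(node, bottleneck):
        nonlocal best
        if node == target:
            best = max(best, bottleneck)
            return
        for child in children.get(node, []):
            walk(child, min(bottleneck, edge_score[(node, child)]))

    walk(base, float("inf"))
    return best

# Toy hierarchy: two paths from the base node "s" to concept "c".
toy_children = {"s": ["a", "b"], "a": ["c"], "b": ["c"]}
toy_scores = {("s", "a"): 0.9, ("a", "c"): 0.2, ("s", "b"): 0.5, ("b", "c"): 0.6}
value = decision_value(toy_children, toy_scores, "s", "c")
print(value)
```

The path through "a" is limited by its weakest edge (0.2), the path through "b" by 0.5, so the returned value is 0.5.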


Complexity

• Define complexity = the number of binary classifiers evaluated for a test image.

• The complexity is difficult to measure, since it depends not only on the structure of the hierarchy but also on the number of paths considered.

• A rough estimate on VOC’06 is $O(N^{0.64})$, which is better than the $O(N)$ of the traditional one-versus-rest classifier.

Experimental results

• Image dataset (VOC’06):

– 10 concepts: bike, bus, car, cat, cow, dog, horse, motorbike, person, sheep;

– 1277 training images, 1341 testing images


• Compared algorithms:

– OAR: One-Against-Rest classifier;

– AVH: Automatically constructed Visual Hierarchy, a binary tree obtained by iteratively merging the categories with the smallest average distance;

– SSH: Simple Semantic Hierarchy, which only considers the is-a relation;

– ESH: Extended Semantic Hierarchy, which considers both is-a and part-of relations.


• A: low-level concepts in VOC’06

– SH methods are generally better than OAR: they improve efficiency with no loss of accuracy;

– SH methods are generally better than AVH, meaning that apparent visual similarity may not generalize well to object classes, while semantic knowledge helps more;

• B: high-level concepts in VOC’06

– SH methods are capable of reasoning about high-level concepts;

• C: images from an external dataset, obtained by querying “vehicle window”, “windscreen”, “windshield” in Google

– Tests the generalization ability of the classifiers;

– SSH cannot be applied, since it only considers the is-a relation;

– For OAR and AVH, a simple rule is applied: if there is a car or a bus, then there is a window.

Publication

• Marszalek, Marcin, and Cordelia Schmid. "Semantic hierarchies for visual object recognition." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.

Album event recognition [Tsai’11]

• Goal: recognize the event/topic of a given album (a set of images);

Basic idea

• Use co-occurrence relation to identify typical concepts for each event;

– Event hiking:

• Positive concepts: mountain, people walking, outdoor etc.;

• Negative concepts: bedroom, indoor etc.;

– Event Valentine’s day:

• Positive concepts: chocolate, heart, candy;

• Negative concepts: turkey, green clothes etc.;

Framework of album event classification

What is the object pattern: imperfect object detection

Discovered patterns: {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7)}

With imperfect detection, discretize the continuous-valued scores of the detector output:

Quantized detection:

– {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7), person(2/7)}

– {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7), person(5/7)}

Dataset construction: select popular holidays using Flickr

Dataset construction: picking up relevant objects

• For each tag T, Flickr provides some relevant tags

• Take the union of the relevant object tags of all 10 holidays: 500 tags

• For each holiday H

– Rank each tag T by R(H, T) = |I(H and T)| / |I(H or T)|

– Pick the top 50 tags
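The ranking score R(H, T) is an intersection-over-union ratio. The sketch below assumes I(·) denotes the set of images carrying the given tags, represented here as Python sets of image ids; the function names and the toy ids are made up for illustration.

```python
def relevance(images_with_h, images_with_t):
    """R(H, T) = |I(H and T)| / |I(H or T)| on sets of image ids."""
    union = images_with_h | images_with_t
    if not union:
        return 0.0
    return len(images_with_h & images_with_t) / len(union)

def top_tags(holiday_images, tag_images, k=50):
    """Rank tags by relevance to the holiday and keep the k best."""
    ranked = sorted(
        tag_images,
        key=lambda tag: relevance(holiday_images, tag_images[tag]),
        reverse=True,
    )
    return ranked[:k]

# Toy image ids (hypothetical): images tagged with the holiday and with each candidate tag.
holiday = {1, 2, 3, 4}
tags = {"pumpkin": {1, 2, 3}, "beach": {4, 5, 6, 7}, "costume": {1, 2, 3, 4, 5}}
best_two = top_tags(holiday, tags, k=2)
print(best_two)
```

A tag scores highly only if it appears on most of the holiday's images and rarely elsewhere, which is why "beach" (one shared image out of seven) ranks last here.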

List of 38 object detectors

Holiday              Positively relevant objects
Christmas            Christmas tree, gift
Easter               Easter egg, basket, rabbit, church
Halloween            Attire, pumpkin, jack-o-lantern
Independence Day     American flag, firework, crowd
Mardi Gras           Mask, necklace, attire, feather boa
Memorial Day         American flag, uniform, military uniform, music band
New Year’s Eve       Champagne, firework, crowd
St. Patrick’s Day    Music band, crowd
Thanksgiving         Food, dinner, turkey, pumpkin
Valentine’s Day      Heart, bouquet
Other objects        Accordion, bassoon, child, cross, drum, euphonium, flag, french horn, light source, room light, shopping basket, soil, stage, table

Some Mined Patterns

Pattern ranking for album event classification

• Let f(p) = percentage of photos containing pattern p in an album

• For each event E

– For each pattern p

• Try predicting E using f(p)

• Measure the prediction performance by Average Precision (AP)

– Rank all patterns by their APs with respect to E

• Take the union of top patterns for all events
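The ranking loop above can be sketched as follows. This is an illustration under assumptions: each photo is a set of detected object names, a pattern is a set of objects that must all appear in a photo, and AP is the standard non-interpolated average precision over albums sorted by f(p). The toy albums are made up.

```python
def pattern_frequency(album, pattern):
    """f(p): fraction of photos in the album that contain every object of the pattern."""
    return sum(1 for photo in album if pattern <= photo) / len(album)

def average_precision(scores, is_positive):
    """Non-interpolated AP of a ranking by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def rank_patterns(albums, events, patterns, event):
    """Rank patterns by how well f(p) alone predicts the given event."""
    is_pos = [e == event for e in events]
    ap = {p: average_precision([pattern_frequency(a, p) for a in albums], is_pos)
          for p in patterns}
    return sorted(patterns, key=lambda p: ap[p], reverse=True), ap

toy_albums = [
    [{"mountain", "sky"}, {"mountain", "person"}, {"sky"}],
    [{"stage", "crowd", "sky"}, {"stage", "sky"}, {"crowd", "sky"}],
    [{"mountain", "sky"}, {"mountain"}],
]
toy_events = ["hiking", "concert", "hiking"]
toy_patterns = [frozenset({"sky"}), frozenset({"mountain"}), frozenset({"stage"})]
ranked, ap = rank_patterns(toy_albums, toy_events, toy_patterns, "hiking")
```

Here the pattern {mountain} separates the hiking albums perfectly, while {sky} also fires on the concert album and therefore gets a lower AP.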


• Datasets: 1) a small dataset with 3 topics: potluck, hiking, concert; 2) albums of 10 holidays collected from Flickr;

• Compared algorithms:

– Image-based multiclass AdaBoost (SAMME)

• J. Yuan, J. Luo, and Y. Wu. Mining compositional features for boosting. In IEEE CVPR 2008;

• Differences: 1) patterns are mined from the whole dataset; 2) the result is the majority vote of the image labels of the given album.

– Compositional object pattern with non-flexible pattern (COPF_base)

– Compositional object pattern with flexible pattern (COPF)

Classification results of small dataset

Classification results of 10 holiday dataset

Publication

• Tsai, Shen-Fu, et al. "Compositional object pattern: a new model for album event recognition." Proceedings of the 19th ACM international conference on Multimedia. ACM, 2011.

Ontological image annotation [Tsai’12]

• Input: image I; concepts C1, C2, …, Cn; output values x1, x2, …, xn from coarse detectors;

• Output: y1, y2, …, yn, where yi is 1 or -1 which indicates whether Ci is present in the image or not.

[Figure: image → coarse detectors for each concept → concept graph over C1 to C5 → refined detections]

Basic idea

• Joint inference of concepts, considering their subclass and co-occurrence relations

[Figure: concepts of interest and WordNet feed subclass extraction, which yields the subclass relation; training images feed the co-occurrence learner, which yields the co-occurrence relation; both relations feed inference]

Formulation

Unary potential

Potential function

Pairwise potential

Relation constraints

• Subclass constraint (hard constraint)

– If Ca (dog) is a subclass of Cb (animal), then yb ≥ ya

– Relation obtained from WordNet

• Co-occurrence reward/penalty (soft constraint)

– E.g. reward the (indoor, table) pair

– E.g. penalize the (computer, beach) pair

– Learned from the training set

– Only positive pairs are considered

Inference

• Find the assignment y that satisfies all constraints with the highest score
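For a handful of concepts this search can be done by brute force. The sketch below is not the paper's formulation: it assumes a score made of a unary term (sum of xi·yi) plus a pairwise term that adds a learned weight whenever both concepts of a pair are present, and it enforces the subclass rule as a hard constraint. All names, scores and weights are hypothetical.

```python
from itertools import product

def infer(x, subclass_pairs, cooccur_weights):
    """Return the highest-scoring assignment y in {-1, +1}^n that satisfies all subclass constraints."""
    concepts = list(x)
    best_y, best_score = None, float("-inf")
    for values in product((-1, 1), repeat=len(concepts)):
        y = dict(zip(concepts, values))
        # Hard constraint: if child is a subclass of parent, then y[parent] >= y[child].
        if any(y[parent] < y[child] for child, parent in subclass_pairs):
            continue
        score = sum(x[c] * y[c] for c in concepts)  # assumed unary term
        score += sum(w for (a, b), w in cooccur_weights.items()
                     if y[a] == 1 and y[b] == 1)      # assumed pairwise term
        if score > best_score:
            best_y, best_score = y, score
    return best_y, best_score

# Toy coarse detector outputs and relations (made up for illustration).
x = {"animal": -0.2, "dog": 0.6, "beach": 0.1, "computer": 0.3}
subclass_pairs = [("dog", "animal")]
cooccur_weights = {("computer", "beach"): -1.0}
y, score = infer(x, subclass_pairs, cooccur_weights)
print(y, score)
```

The weak negative "animal" score is overridden because "dog" is detected confidently, and the penalized (computer, beach) pair forces the weaker of the two detections to be dropped. Exhaustive enumeration costs 2^n assignments, so it only illustrates the objective for small n.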

subclass relation

• Indoor

• Bedroom

• Office

• Outdoor

• Light

• Room light

• Street light

• Computer

• Laptop

• Desktop computer

[Figure: WordNet subclass relations over entity, artifact, device, structure/construction, personal computer, source of illumination, bedroom, office, laptop, desktop computer, room light and street light, and the final subclass relation extracted from them]

Ontological learning

Baseline algorithms

• RAW: raw output of initial detectors

• Semantic Hierarchy (SH): conditional classifier on each subclass/part-of link

• SVM fusion

Results: AUC with 50% training

[Figure: bar chart of AUC (axis from 0.6 to 0.9) for indoor, outdoor, bedroom, office, light, room light, streetlight, computer, laptop, desktop and their average (ave), comparing OI, SH, SVM and RAW]

AUC vs. #training

[Figure: mean AUC (axis from 0.68 to 0.78) against percentage of training data (20, 35, 50%) for OI, SH, SVM and RAW]

Publication

• Tsai, Shen-Fu, et al. "Ontological Inference Framework with Joint Ontology Construction and Learning for Image Understanding." Multimedia and Expo (ICME), 2012 IEEE International Conference on. IEEE, 2012.

Conclusion

• Advantages:

– Joint inference;

– Scalability;

– More robust and accurate classifiers;

– Bridging low-level semantics and high-level ones;

• Disadvantages:

– Harder to understand than traditional methods;

– Prior knowledge is sometimes wrong;

– Efficiency and accuracy usually trade off against each other;

Future work

• Explore ontologies in more depth to see how much improvement in accuracy and efficiency can be achieved;

• Explore ontologies more broadly by applying them to other domains such as medical imaging, healthcare, AI, etc.;

• Explore how to construct ontologies automatically or semi-automatically;

Thank you !!
