Multi-Level Annotation of Natural Scenes Using Dominant Image
Compounds and Semantic Concepts
Jianping Fan, Yuli Gao, Hangzai Luo
Department of Computer Science
University of North Carolina at Charlotte
Outline of Presentation
• Research Motivation
• Semantic Image Representation
• Semantic Image Concept Modeling
• Adaptive EM Algorithm for Classifier Training
• Multi-Level Image Annotation
• Conclusions
Google indexes images by using associated text, but the semantics of image contents may not be described by the associated text correctly!
(a) Semantic Image Indexing should be done by using the semantics of image contents!
(b) Semantics of Image Contents should be captured based on the human perception dimensions for judgment of image similarity!
1. Research Motivation
• Three Types of Image Similarity Judgment:
(a) Image Similarity with same Dominant Image Compound
Dog
1. Research Motivation
• Three Types of Image Similarity Judgment:
(b) Image Similarity with same Semantic Image Concept
Mountain View
1. Research Motivation
• Three Types of Image Similarity Judgment:
(c) Image Similarity with same Semantic Image Event
Sailing
1. Research Motivation
• Image Similarity Judgment: Conclusions
Image Similarity could be on Multiple Levels
Image Annotation should be on Multiple Levels
Dominant Image Compounds & Semantic Image Concepts & Events
1. Research Motivation
• Semantic Image Classification is widely used to enable Automatic Image Annotation, but its performance largely depends on three issues:
(a) The ability of the underlying Image Patterns used for image representation and feature extraction to capture the Middle-Level Semantics of Images!
(b) The ability of visual features to discriminate among different semantic image concepts!
(c) The performance of Classifier Training Algorithms!
1. Research Motivation
Semantic Image Concepts or Events
Low-Level Image Signals
• Challenges for Semantic Image Classification
Semantic Gap
What are the suitable image patterns that can be used to enhance quality of features and narrow the semantic gap?
2. Image Content Representation
Semantic Image Concepts
Image Patterns to Capture Middle-Level Image Semantics
Low-Level Image Signals
[Figure labels: Semantic Gap; Semantic Bridge 1; Semantic Bridge 2; Gap 1; Gap 2]
2. Image Content Representation
• Basic Requirements of Image Patterns:
(a) Be able to capture middle-level semantics of images, narrow the semantic gap, and be meaningful to human beings!
(b) Be able to enhance the quality of features and improve classifier performance!
2. Image Content Representation
Semantic Image Concepts
Semantic-Sensitive Salient Objects
Low-Level Image Signals
[Figure labels: Semantic Gap; Semantic Bridge 1; Semantic Bridge 2; Gap 1; Gap 2]
2. Image Content Representation
• Three Issues for Salient Objects
a. What are the salient objects?
---Requirement of salient objects: they should be able to capture middle-level semantics of images without performing semantic object detection, and they should be able to capture the dominant visual properties of the relevant semantic objects!
b. What is the basic vocabulary?
---The number of "key" salient objects in a specific image domain should be limited!
c. How can we detect them automatically?
---The same type of salient object may appear in different images with very different visual properties!
2. Image Content Representation
• Definition of Salient Objects
Salient Objects are defined as the dominant image compounds that are semantic to human beings!
2. Image Content Representation
• WordNet or Domain Knowledge can be used to define Basic Vocabulary of Salient Objects
Natural Images
  Sky: Blue Sky, Cloudy Sky
  Ground: Floor, Sand, Grass
  Foliage: Green Foliage, Floral Foliage
2. Image Content Representation
• Automatic Salient Object Detection Function
a. Homogeneous Image Regions are first detected!
---we boost mean shift, edgeflow, seeded region growing for image segmentation!
b. Support Vector Machines (SVM) for binary region classification and similarity-based Region Merging!
---classifier training is performed on multiple images to capture different visual properties of the same type of salient object under different viewing conditions!
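The merging step can be illustrated with a minimal sketch. It assumes each homogeneous region already carries a label predicted by a per-region classifier, and it merges adjacent regions whose labels agree. The function name `merge_regions`, the dictionary layout, and the use of label equality in place of a real similarity measure are all hypothetical simplifications, not the detection function described on the slide.

```python
def merge_regions(labels, adjacency):
    """Merge adjacent regions that share the same predicted label.

    labels:    dict region_id -> predicted salient-object label
               (hypothetical output of a per-region classifier).
    adjacency: iterable of (region_a, region_b) pairs that touch.
    Returns a dict region_id -> id of the merged group it belongs to.
    """
    parent = {r: r for r in labels}

    def find(r):
        # Follow parent links to the group representative (path halving).
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in adjacency:
        if labels[a] == labels[b]:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the smaller id as the representative of the group.
                parent[max(ra, rb)] = min(ra, rb)

    return {r: find(r) for r in labels}


# Hypothetical example: five regions in a row, labelled by a classifier.
labels = {0: "sky", 1: "sky", 2: "grass", 3: "grass", 4: "sky"}
adjacency = [(0, 1), (1, 2), (2, 3), (3, 4)]
groups = merge_regions(labels, adjacency)
```

Regions 0 and 1 end up in one group and regions 2 and 3 in another, while region 4 stays separate because it is not adjacent to the other sky regions.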
2. Image Content Representation
• Average Performance of Detection Functions
Objects          Precision   Recall
Brown Horse      95.6%       100%
Grass            92.9%       94.8%
Purple Flower    96.1%       95.2%
Red Flower       87.8%       86.4%
Sand Field       98.8%       96.6%
Rock             98.7%       100%
Water            86.7%       89.5%
Human Skin       86.2%       85.4%
Yellow Flower    87.4%       89.3%
Sunset/Sunrise   92.5%       95.2%
Sky              87.6%       94.5%
Snow             86.7%       87.5%
Waterfall        88.5%       87.1%
Sail Cloth       96.3%       94.9%
Forest           85.4%       84.8%
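For reference, precision and recall figures like those above are conventionally computed from detection counts. The sketch below uses the standard definitions; the counts in the example are made up for illustration and are not taken from the experiments.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard precision and recall from detection counts."""
    detected = true_positives + false_positives   # everything the detector reported
    actual = true_positives + false_negatives     # everything that was really there
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for one salient object type.
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.1%}, recall={r:.1%}")
```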
2. Image Content Representation
• Salient Object Detection Results
Observation: Salient Objects capture the dominant visual properties of the relevant semantic objects!
2. Image Content Representation
• Advantages & Benefits:
(a) It is able to reach a good balance between semantic sufficiency and segmentation cost!
(b) It is able to capture the middle-level semantics of images, thus it is able to enhance the quality of features and improve classifier performance!
(c) It enables compound-level image annotation: users have more choices of compound-level keywords to specify their query concepts!
3. Semantic Image Concept Modeling
Semantic Concept 1 ... Semantic Concept i ... Semantic Concept Nc
Salient Object Type 1 ... Salient Object Type k ... Salient Object Type Ns
Color/Texture Pattern 1 ... Color/Texture Pattern j ... Color/Texture Pattern Nr
How can we model semantic image concepts by using salient objects?
3. Semantic Image Concept Modeling
How can we quantify the semantic relationship (i.e., image context) between one specific image concept and the relevant salient objects?
3. Semantic Image Concept Modeling
• Finite Mixture Model for Image Concept Modeling
(a) The semantic relationship (i.e., image context) between semantic image concept and the relevant salient objects is modeled by finite mixture model!
(b) Different semantic image concepts are relevant to different types and different numbers of salient objects with different importance!
(c) The class distribution for each type of relevant salient objects is modeled by using multiple mixture components to capture the different visual properties under different conditions!
\[ P(X, S_j, \theta_{s_j}) = \sum_{i=1}^{\kappa} P(X \mid R_i, \theta_{r_i}) \, \varpi_{r_i} \]
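A finite mixture density is a weighted sum of component densities. The sketch below evaluates such a sum for one-dimensional Gaussian components; the Gaussian form, the one-dimensional feature, and the example weights are assumptions made only for illustration, since the slide does not state the component family.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(x, components):
    """Weighted sum of component densities.

    components: list of (weight, mean, std); the weights play the role of
    the mixing proportions and are assumed to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


# Hypothetical two-component model (kappa = 2).
components = [(0.6, 0.0, 1.0), (0.4, 5.0, 2.0)]
density_at_zero = mixture_density(0.0, components)
```

Using several components per salient object type lets one model cover visually different appearances of the same object, which is the point made in item (c).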
4. Adaptive EM Algorithm
Problems of traditional EM algorithm:
a. Knowledge of K is required and K is often predefined based on experience!
b. Local maximum!
c. Sensitive to initial values of parameters!
κ, θ, ϖ = ???
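To make these limitations concrete, here is a minimal sketch of traditional EM for a one-dimensional Gaussian mixture. The number of components is fixed by the length of the initial parameter lists, and the result depends on those initial values. The function name `em_gmm_1d`, the Gaussian components, and the toy data are assumptions for illustration, not the classifier-training procedure of the talk.

```python
import math


def em_gmm_1d(data, means, stds, weights, iterations=50):
    """Plain EM for a 1-D Gaussian mixture with a fixed number of components."""
    k = len(means)
    means, stds, weights = list(means), list(stds), list(weights)
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            probs = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, stds)
            ]
            total = sum(probs) or 1e-300  # guard against numerical underflow
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)  # floor to avoid collapse
    return means, stds, weights


# Toy data with two well-separated clusters; K = 2 is chosen by hand.
data = [-1.2, -0.8, -1.0, -0.9, -1.1, 4.9, 5.2, 5.0, 4.8, 5.1]
means, stds, weights = em_gmm_1d(
    data, means=[0.0, 1.0], stds=[1.0, 1.0], weights=[0.5, 0.5]
)
```

On cleanly separated data like this the two means settle near the cluster centres. With a poorly chosen K or poor initial values the same loop can stop at a worse local maximum.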
4. Adaptive EM Algorithm
• We start from a large value of K to capture the essential relationships between semantic image concept and relevant salient objects!
• We perform automatic merging, splitting and elimination of mixture components to search for the optimal model structure and parameters!
When should we perform merging, splitting and elimination?
4. Adaptive EM Algorithm
Merging: Two mixture components overpopulate the relevant sample regions!
Jensen-Shannon Divergence is used to quantify the overlap!
Original Mixture Components
Merged Mixture Component
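The Jensen-Shannon divergence between two distributions can be sketched as follows. The sketch assumes both distributions have been discretized into histograms over the same bins, which is a simplification: the slide does not say how the divergence between mixture components is evaluated.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical inputs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical histograms of two mixture components over the same bins.
component_a = [0.1, 0.4, 0.4, 0.1]
component_b = [0.1, 0.3, 0.5, 0.1]
overlap_score = js_divergence(component_a, component_b)
```

A small value means the two components cover nearly the same sample region. With natural logarithms the divergence is bounded above by log 2, which is reached when the two distributions do not overlap at all.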
4. Adaptive EM Algorithm
Splitting: One specific component under-populates the relevant sample region!
Original Mixture Component
Local Sample Distribution
Elimination: Two mixture components from two concept models overlap too much!
Mixture Components for Concept 1
Mixture Components for Concept 2
4. Adaptive EM Algorithm
(a) How can we make the margins among different concepts large enough?
(b) What can the negative samples do for us to maximize the margins?
4. Adaptive EM Algorithm
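• Criteria for Merging, Splitting and Elimination
Merge:
\[ J_{merge}(l, k, \Theta) = \frac{\Phi(\Theta)}{JS\big(P(X, C_j \mid S_{lk}, \theta_{lk}),\; P(X, C_j \mid \Theta)\big)} \]
Split:
\[ J_{split}(l, \Theta) = \Phi(\Theta) \, JS\big(P(X, C_j \mid S_l, \theta_l),\; P(X, C_j \mid \Theta)\big) \]
Elimination:
\[ J_{elimination}(l, m, \Theta) = \frac{\Phi(\Theta)}{JS\big(P(X, C_j \mid S_l, \theta_l),\; P(X, C_i \mid S_m, \theta_m)\big)} \]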
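One way to read these criteria is sketched below: merge and elimination candidates are scored inversely to their JS divergence, split candidates are scored in proportion to theirs, and a single normalization factor makes all scores sum to one. That reading, the function name `operation_probabilities`, and the example divergence values are assumptions for illustration.

```python
def operation_probabilities(js_split, js_merge, js_eliminate, eps=1e-12):
    """Turn JS divergences into normalized operation probabilities.

    js_split:     dict {l: JS between component l and the local samples}
    js_merge:     dict {(l, k): JS for a candidate merge of l and k}
    js_eliminate: dict {(l, m): JS between components of two concepts}
    Returns (phi, probabilities), where the probabilities sum to 1.
    """
    raw = {}
    for l, js in js_split.items():
        raw[("split", l)] = js                         # large divergence favors split
    for pair, js in js_merge.items():
        raw[("merge", pair)] = 1.0 / max(js, eps)      # small divergence favors merge
    for pair, js in js_eliminate.items():
        raw[("eliminate", pair)] = 1.0 / max(js, eps)  # heavy overlap favors elimination
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("no candidate operation has a positive score")
    phi = 1.0 / total                                  # normalization factor
    return phi, {op: phi * score for op, score in raw.items()}


# Hypothetical divergence values for a handful of candidate operations.
phi, probs = operation_probabilities(
    js_split={0: 0.30, 1: 0.05},
    js_merge={(0, 1): 0.50},
    js_eliminate={(0, 2): 0.25},
)
best = max(probs, key=probs.get)
```

With these numbers the elimination candidate receives the largest probability, because its two components have the smallest divergence and therefore overlap the most.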
4. Adaptive EM Algorithm
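• Normalization Factor \Phi(\Theta) is determined by:
\[ \sum_{l=1}^{\kappa_j} J_{split}(l, \Theta) + \sum_{l=1}^{\kappa_j} \sum_{k=l+1}^{\kappa_j} J_{merge}(l, k, \Theta) + \sum_{l=1}^{\kappa_j} \sum_{m=1}^{\kappa_i} J_{elimination}(l, m, \Theta) = 1 \]
• Acceptance Probability to prevent poor operations
\[ P_{accept} = \min\left\{ \exp\left( \frac{L(X, \Theta_1) - L(X, \Theta_2)}{\tau} \right),\; 1 \right\} \]
L(X, \Theta_1) and L(X, \Theta_2) are the penalized likelihood functions before and after performing merging, splitting or elimination!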
4. Adaptive EM Algorithm
• Advantages of Adaptive EM Algorithm
(a) It is able to search for the optimal model structure K and model parameters automatically in a single probabilistic scheme!
(b) It is able to avoid the local maximum problem by re-organizing the distribution of mixture components in feature space!
(c) It is able to integrate negative samples for classifier training, thus it is able to support margin-based classifier training like SVM!
4. Adaptive EM Algorithm
[Figure: Convergence of our Adaptive EM Algorithm]
5. Semantic Image Classification
6. Multi-Level Image Annotation
7. Benchmark: Image Representation
• Comparison with Blobs under same classifier
Concepts         Salient Objects           Blobs
Mountain View    81.7% (p), 84.3% (r)      78.5% (p), 75.5% (r)
Beach            80.5% (p), 84.7% (r)      74.6% (p), 75.9% (r)
Garden           80.6% (p), 90.6% (r)      73.3% (p), 78.2% (r)
Sailing          87.6% (p), 85.5% (r)      79.5% (p), 77.3% (r)
Skiing           85.4% (p), 83.7% (r)      79.3% (p), 78.2% (r)
Desert           89.6% (p), 82.8% (r)      76.6% (p), 78.5% (r)
7. Benchmark: Image Annotation
Image Blobs Salient Objects
It is very easy to use salient objects to specify query concepts!
7. Benchmark: Image Classification
• Comparison with SVM Classifiers
Negative samples are also integrated for FMM classifier training like SVM!
7. Benchmark: Text Classification
• Comparison with SVM Classifiers
SVM classifiers are better than FMM classifiers for high-dimensional text data classification!
FMM needs a large number of labeled samples in such high-dimensional space!
8. Conclusions
• A multi-level approach for image annotation is proposed. Two advantages of this approach are:
(1) it is able to capture middle-level image semantics, enhance quality of features and improve classifier performance!
(2) users will have more choices to specify their query concepts using the keywords at both compound and concept levels!
8. Conclusions
• An adaptive EM algorithm has been proposed for more effective model selection and model parameter estimation!
• Negative Samples are integrated to enable margin-based classifier training in an FMM framework!
---our FMM classifier can achieve the same advantage as SVM!
Acknowledgement
• Two Wonderful Papers:
@ Prof. R. Jain, IEEE Multimedia, 2001
@ Journal of Electronic Imaging, 2000
Ear -- Huge Fan
Body -- Wall
Leg -- Pillar
Tail -- Snake
Trunk -- Tree
Tusk -- Spear
Acknowledgement
• Solutions: Compound-Based Chinese Characters
Follow up
People
People
People follow people Follow up
Chinese Character:
English Meaning:
Acknowledgement
• Solutions: Compound-Based Chinese Characters
Crowd
people
people
people
Multiple people together Crowd
Chinese Character:
English Meaning:
We just transform this idea into a statistical image modeling framework!
Acknowledgement
• Prof. Ying Wu at Northwestern University for useful discussion!
• Prof. Edward Chang at UCSB, mentor for the final version of this paper!
• Reviewers for the information on SMEM, CEM!
Q/A
Online demo is available at:
http://www.cs.uncc.edu/~jfan