Attention in Computer Vision
Mica Arie-Nachimson and Michal Kiwkowitz
May 22, 2005
Advanced Topics in Computer Vision
Weizmann Institute of Science
Problem Definition - Search Order
Object recognition
• Vision applications apply “expensive” algorithms (e.g. recognition) to image patches
• Mostly naïve selection of patches
• The selection of patches determines the number of calls to the “expensive” algorithm
Problem Definition - Search Order
Object recognition
• A more sophisticated selection of patches would imply fewer calls to the “expensive” algorithm
• Attention is used to focus efficiently on incoming data (better use of limited processing capacity)
Problem Definition - Search Order
Object recognition
Outline
• What is Attention
• Attention in Object Recognition
  • Saliency Model
    • Feature Integration Theory
    • Saliency Algorithm
    • Saliency & Object Recognition
    • Comparison
  • Inner Scene Similarity Model
    • Biological motivation
    • Difficulty of Search Tasks
    • Algorithms
      • FLNN
      • VSLE
Attention
• Attention implies allocating resources, perceptual or cognitive, to some things at the expense of others.
What is Attention
• You are sitting in class listening to a lecture.
• Two people behind you are talking. – Can you hear the lecture?
• One of them mentions the name of a friend of yours. – How did you know?
Attention in Other Applications
• Face Detection (feature selection)
• Video Analysis (temporal block selection)
• Robot Navigation (select locations)
• …
Attention is Directed by:
Bottom-up:
• From small to large units of meaning
• Rapid
• Task-independent
Attention is Directed by:
Top-down:
• Uses higher levels (context, expectation) to process incoming information (a guess)
• Slower
• Task-dependent
http://www.rybak-et-al.net/nisms.html
When is information selected (filtered)?
– Early selection (Broadbent, 1958)
– Cocktail party phenomenon (Moray, 1959)
– Late selection / attenuation (Treisman, 1960)
• All information is sent to perceptual systems for processing
• Some is selected for complete processing
• Some is more likely to be selected
Attention
WHICH?
Parallel Search
Is there a green O?
A. Treisman, G. Gelade, 1980
Conjunction Search
Is there a green N ?
A. Treisman, G. Gelade, 1980
Results
A. Treisman, G. Gelade, 1980
Conjunction Search
A. Treisman, G. Gelade, 1980
Color map Orientation map
A. Treisman, G. Gelade, 1980
Conjunction Search
A. Treisman, G. Gelade, 1980
Primitives
(pre-attentive feature displays)
• Intensity
• Orientation
• Color
• Curvature
• Line end
• Movement
Feature Integration Theory
Attention - two stages:
• Pre-attention: parallel processing, low-level features, fast (parallel search)
• Attention: serial processing, localized focus, slower (conjunctive search)
How is the focus found & shifted?
A. Treisman, G. Gelade, 1980
Shifts in Attention
“Shifts in selective visual attention: towards the underlying neural circuitry”,
Christof Koch, and Shimon Ullman, 1985
C. Koch, and S. Ullman, 1985
Feature Maps
• Orientation
• Color
• Curvature
• Line end
• Movement
Central Representation – Attention
Saliency
“A model of saliency-based visual attention for rapid scene analysis”
Laurent Itti, Christof Koch, and Ernst Niebur, 1998
L. Itti, C. Koch, and E. Niebur, 1998
• Salient - stands out
• Example – telephone & road sign have high saliency
From C. Koch; L. Itti, C. Koch, and E. Niebur, 1998
Intensity
L. Itti, C. Koch, and E. Niebur, 1998
Cells in the retina
Intensity
Create nine spatial scales (0-8) using Gaussian pyramids
L. Itti, C. Koch, and E. Niebur, 1998
Intensity
Center-Surround difference operator:
– sensitive to local spatial discontinuities
– the principal computation in the retina & primary visual cortex
– subtract the coarse scale from the fine scale (center: fine scale; surround: coarse scale)
L. Itti, C. Koch, and E. Niebur, 1998
Toy Example
Fine level:
0   0   0
0 255   0
0   0   0
Coarse level (Gaussian pyramid, interpolated back to the fine grid):
0   0   0
0   0   0
0   0   0
Point-by-point subtraction:
0   0   0
0 255   0
0   0   0
An isolated bright point gives a strong center-surround response.

Toy Example
Fine level: all 255. Coarse level (interpolated): all 255.
Point-by-point subtraction: all 0.
A uniform bright region gives no response.
Intensity
Compute center-surround maps with center scales c ∈ {2,3,4} and surround scales s = c + δ, δ ∈ {3,4}:
I(c,s) = |I(c) ⊖ I(s)|
For example: I(2,5) = |I(2) ⊖ I(5)|, I(2,6) = |I(2) ⊖ I(6)|, I(3,6) = |I(3) ⊖ I(6)|
(⊖: across-scale subtraction – interpolate the coarser map to the finer scale and subtract point by point)
6 intensity maps
The different center/surround ratios give multiscale feature extraction
L. Itti, C. Koch, and E. Niebur, 1998
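The pyramid-plus-subtraction step can be sketched in a few lines of NumPy. This is a simplified stand-in for the paper's filtering: a 2x2 box average replaces the Gaussian filter and nearest-neighbour upsampling replaces interpolation; the test image is invented.

```python
import numpy as np

def gaussian_pyramid(img, levels=9):
    """Dyadic pyramid, scale 0 = original image.  A 2x2 box average
    stands in for proper Gaussian filtering (a simplification)."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        prev = pyr[-1]
        h, w = prev.shape[0] // 2, prev.shape[1] // 2
        pyr.append(prev[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3)))
    return pyr

def center_surround(pyr, c, s):
    """I(c,s) = |I(c) - interp(I(s))|: bring the coarse scale s up to
    the resolution of the fine scale c, then subtract point by point."""
    fine, coarse = pyr[c], pyr[s]
    factor = 2 ** (s - c)
    up = np.kron(coarse, np.ones((factor, factor)))  # nearest-neighbour upsampling
    return np.abs(fine - up[:fine.shape[0], :fine.shape[1]])

img = np.zeros((256, 256))
img[120:136, 120:136] = 255.0        # one bright patch on a dark background
pyr = gaussian_pyramid(img)

# c in {2,3,4}, s = c + delta, delta in {3,4}  ->  6 intensity maps
intensity_maps = {(c, c + d): center_surround(pyr, c, c + d)
                  for c in (2, 3, 4) for d in (3, 4)}
print(len(intensity_maps))           # 6
```

The bright patch survives at the fine scale but is washed out at the coarse scale, so the difference map responds strongly there.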
Color
Same c and s as with intensity; two color-opponency channels give 12 color maps
Kandel et al. (2000). Principles of Neural Science. McGraw-Hill/Appleton & Lange
L. Itti, C. Koch, and E. Niebur, 1998
Orientation
Same c and s as with intensity; four orientations give 24 orientation maps
θ ∈ {0°, 45°, 90°, 135°}
O(c,s,θ) = |O(c,θ) ⊖ O(s,θ)|
From the Visual System presentation by S. Ullman
L. Itti, C. Koch, and E. Niebur, 1998
From C. Koch; L. Itti, C. Koch, and E. Niebur, 1998
Normalization Operator
L. Itti, C. Koch, and E. Niebur, 1998
Saliency Map
S = (1/3) (N(I) + N(C) + N(O))
where I, C and O are the intensity, color and orientation conspicuity maps and N is the normalization operator
L. Itti, C. Koch, and E. Niebur, 1998
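The normalization operator and the final combination can be sketched as follows. The local-maxima detection here is a simplified interpretation of the paper's operator (promote maps with one strong peak, suppress maps with many similar peaks); the demo maps are invented.

```python
import numpy as np

def N(m, M=1.0):
    """Sketch of the normalization operator N(.): rescale to [0, M],
    then multiply by (M - mbar)^2, where mbar is the average of the
    local maxima other than the global maximum.  A map with one strong
    peak keeps its peak; a map with many similar peaks is suppressed."""
    m = m - m.min()
    if m.max() > 0:
        m = m * (M / m.max())
    inner = m[1:-1, 1:-1]            # interior pixels only (simplification)
    is_max = ((inner > m[:-2, 1:-1]) & (inner > m[2:, 1:-1]) &
              (inner > m[1:-1, :-2]) & (inner > m[1:-1, 2:]))
    others = inner[is_max & (inner < M - 1e-12)]   # exclude the global max
    mbar = others.mean() if others.size else 0.0
    return m * (M - mbar) ** 2

def saliency(I, C, O):
    """S = (1/3)(N(I) + N(C) + N(O)) over the three conspicuity maps."""
    return (N(I) + N(C) + N(O)) / 3.0

one_peak = np.zeros((32, 32)); one_peak[16, 16] = 1.0
many_peaks = np.zeros((32, 32)); many_peaks[4::8, 4::8] = 0.9
many_peaks[17, 17] = 1.0
print(N(one_peak).max() > N(many_peaks).max())   # True
```

After normalization, the single-peak map dominates the many-peak map, which is exactly what lets one conspicuous location win in the summed saliency map.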
Algorithm - up to now
1. Extract feature maps
2. Compute center-surround maps (42):
   • Intensity – I (6)
   • Color – C (12)
   • Orientation – O (24)
3. Combine each channel into a conspicuity map
4. Compute saliency by normalizing and summing the conspicuity maps
Laurent Itti, Christof Koch, and Ernst Niebur, 1998
Leaky integrate-and-fire neurons
“Inhibition of return”
Winner Takes All
Selection (FOA)
L. Itti, C. Koch, and E. Niebur, 1998
FOA – Focus Of Attention
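A sketch of this selection loop: plain argmax stands in for the winner-take-all network and a fixed suppression disc stands in for the decaying inhibition of return; the saliency values, radius and fixation count are invented for illustration.

```python
import numpy as np

def scan_path(saliency, n_fixations=3, inhibition_radius=2):
    """Sequentially select foci of attention (FOA): a winner-take-all
    picks the most salient location, then inhibition of return
    suppresses a disc around it so attention shifts elsewhere."""
    s = saliency.astype(float).copy()
    ys, xs = np.indices(s.shape)
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)   # winner takes all
        fixations.append((int(y), int(x)))
        mask = (ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2
        s[mask] = -np.inf                                # inhibition of return
        # note: real IOR decays after 500-900 ms, so the FOA can return later
    return fixations

s = np.zeros((10, 10))
s[2, 3], s[7, 7], s[5, 1] = 0.9, 0.7, 0.5
print(scan_path(s))   # [(2, 3), (7, 7), (5, 1)]
```

The fixations come out in decreasing order of saliency, which is the scan-path behavior the model is after.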
Results
• FOA shifts: 30-70 ms
• Inhibition: 500-900 ms (after which inhibition of return ends)
L. Itti, C. Koch, and E. Niebur, 1998
Results
Spatial Frequency Content (SFC), Reinagel & Zador, 1997
Image
SFC
Saliency
Output
L. Itti, C. Koch, and E. Niebur, 1998
Results
Image
SFC
Saliency
Output
L. Itti, C. Koch, and E. Niebur, 1998
Spatial Frequency Content (SFC), Reinagel & Zador, 1997
Attention & Object Recognition
• “Is bottom-up attention useful for object recognition?”
  – U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Computer recognition: segmented scenes, labeled objects
Human recognition: cluttered scenes, non-labeled objects
Attention (saliency model) → Object Recognition
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Grow a region in the strongest feature map
Pass it to object recognition (Lowe)
Attention & Object Recognition
Learning inventories – “grocery cart problem”
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Real-world scenes
• 1 image for training (15 fixations)
• 2-5 images for testing (20 fixations)
• Objects learned from the training image are matched in the testing images
“Grocery Cart” Problem
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
“Grocery Cart” Problem
Downsides:
• Bias of human photography
• Small image set
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Solution
• Robot as acquisition tool
Robot - Landmark Learning
Objective – how many objects are found and classified correctly?
Navigation – simple obstacle-avoidance algorithm using infrared sensors
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Object recognition
< 3 key points
Landmark Learning
With Attention
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Landmark Learning
With Random Selection
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Landmark Learning - Results
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Saliency Based Object Recognition
• Biologically motivated
• Uses bottom-up cues; allows combining top-down information
• Provides segmentation:
  – cluttered scenes
  – unlabeled objects
  – multiple objects in a single image
• Static priority map
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Comparison
“Comparing attention operators for learning landmarks”, R. Sim, S. Polifroni, G. Dudek , June 2003
Other attention operators for low level features
R. Sim, S. Polifroni, G. Dudek , June 2003
Comparison
R. Sim, S. Polifroni, G. Dudek , June 2003
Operators: edge density, radial symmetry, smallest eigenvalue, Caltech saliency
Comparison
• Landmark learning
• Training – learn landmarks knowing camera pose
• Testing - determine pose of camera according to landmarks (pose estimation)
R. Sim, S. Polifroni, G. Dudek , June 2003
Comparison - Results
• All operators perform better than random
• Radial symmetry gives the worst results
• The Caltech operator performs similarly to the edge-density and eigenvalue operators
• BUT it is more complex to implement and needs more computing time
• Hence it is a less preferred candidate in practice
R. Sim, S. Polifroni, G. Dudek , June 2003
The Problem
Object recognition
Biological Motivation
• An alternative approach: continuous search difficulty
• Based on similarity:
  – between Targets and Non-Targets in the scene
  – between Non-Targets and Non-Targets in the scene
• Similar structural units do not need separate treatment
• Structural units similar to a possible target get high priority
Duncan & Humphreys [89]
Biological Motivation
Search difficulty increases with target/non-target similarity and decreases with non-target/non-target similarity (2x2 diagram: similar vs. not similar along each axis).
Duncan & Humphreys [89]
Biological Motivation
• Explains the pop-out vs. serial search phenomenon: displays differ in how similar the target is to the non-targets and how similar the non-targets are to each other
Duncan & Humphreys [89]
Using Inner-scene Similarities
• Every candidate is characterized by a vector of n attributes
• n-dimensional metric space:
  – a candidate is a point in the space
  – some distance function d is associated with the space
Avraham & Lindenbaum [04] Avraham & Lindenbaum [05]
Using Inner-scene Similarities – Example
• One feature only: object area
• d: regular Euclidean distance
Feature space
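The setup can be made concrete in a few lines; the area values below are invented for illustration.

```python
import numpy as np

# Each candidate is a point in an n-dimensional feature space.  Here
# n = 1 (the single feature is object area) and d is the ordinary
# Euclidean distance.
candidates = np.array([[12.0],    # small object
                       [13.0],    # small object
                       [80.0],    # one large object (a likely pop-out)
                       [11.5]])   # small object

def d(a, b):
    return float(np.linalg.norm(a - b))

print(d(candidates[0], candidates[1]))   # 1.0
print(d(candidates[0], candidates[2]))   # 68.0
```

Candidate 2 is far from all the others in feature space, which is exactly the inner-scene dissimilarity that makes it stand out.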
Difficulty of Search
• The difficulty measure is the number of queries until the first target is found
• Two main factors:
  – distance between Targets and Non-Targets
  – distance between Non-Targets and Non-Targets
Difficulty of Search – Cover
Cover the candidates in feature space with circles
c: the number of circles in the cover
Difficulty of Search
c will be our measure of the search difficulty
We need some constraint on the circles’ size!
Difficulty of Search
• dt: the max-min target distance
• A dt-cover: a cover of the candidates by circles of diameter dt
• c: the number of circles in the minimal dt-cover
Difficulty of Search – examples
• A dt-cover of the candidates: c = 7
• Insects example: c = 3
• Easy search: c = 2
• Hard search: c = number of candidates
Define the Difficulty using c
• Lower bound: every search algorithm needs c calls to the oracle before finding the first target, in the worst case
• Upper bound: there is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks
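Computing a minimal cover exactly is hard, but a greedy pass gives a usable upper bound on c. This sketch makes a simplifying choice: it covers with balls of radius dt around chosen candidates (a constant-factor variant of the diameter-dt circles); the point set is invented.

```python
import numpy as np

def greedy_cover_size(points, dt):
    """Greedy upper bound on c, the size of a minimal dt-cover
    (finding the true minimum is NP-complete).  Repeatedly pick an
    uncovered point and cover everything within dt of it."""
    points = np.asarray(points, dtype=float)
    uncovered = list(range(len(points)))
    c = 0
    while uncovered:
        center = points[uncovered[0]]
        uncovered = [i for i in uncovered
                     if np.linalg.norm(points[i] - center) > dt]
        c += 1
    return c

# two tight clusters and one isolated point in a 1-D feature space
pts = [[0.0], [0.1], [0.2], [5.0], [5.1], [9.0]]
print(greedy_cover_size(pts, dt=0.5))   # 3
```

Small dt drives c toward the number of candidates (hard search); large dt drives it toward 1 (easy search), matching the examples above.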
Difficulty of Search
Lower bound
Every search algorithm needs c calls to the oracle before finding the first target in the worst case
Difficulty of Search
Upper bound
There is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks
FLNN: Farthest Labeled Nearest Neighbor
Difficulty of Search
FLNN: Farthest Labeled Nearest Neighbor
Efficient Algorithms
Query candidates one by one; at each step, pick the candidate whose nearest labeled (already queried) candidate is farthest away.
c is a tight bound!
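A sketch of FLNN: repeatedly query the unlabeled candidate whose nearest already-queried neighbor is farthest away, so the queries spread out over the feature space. The point set and oracle below are invented for illustration.

```python
import numpy as np

def flnn_search(points, is_target):
    """FLNN (Farthest Labeled Nearest Neighbor) sketch.  Returns the
    number of oracle calls made until the first target is found."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    labeled = [0]                      # query an arbitrary first candidate
    if is_target(0):
        return 1
    while len(labeled) < n:
        best, best_dist = None, -1.0
        for i in range(n):
            if i in labeled:
                continue
            # distance from candidate i to its nearest labeled candidate
            nearest = min(np.linalg.norm(points[i] - points[j])
                          for j in labeled)
            if nearest > best_dist:
                best, best_dist = i, nearest
        labeled.append(best)           # query the farthest such candidate
        if is_target(best):
            return len(labeled)
    return n

# two tight clutter clusters and one isolated target (index 5)
pts = [[0.0], [0.1], [0.2], [5.0], [5.1], [9.0]]
print(flnn_search(pts, is_target=lambda i: i == 5))   # 2
```

Because similar candidates share a labeled neighbor, FLNN skips over each clutter cluster after one query, which is how it meets the c-query bound.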
How do we compute c?
Difficulty of Search
– Need to know dt (but to know the exact dt we would need to know all the targets and non-targets, and that’s exactly what we’re looking for…)
– Compute the minimal dt-cover (NP-complete!)
– Count the number of circles (OK, that’s easy…)
Upper & Lower Bounds on c
• Upper bounds:
  – the number of candidates
  – knowing that dt is larger than some d0: can approximate the cover size
• Lower bounds:
  – FLNN worst case
  – knowing that dt is larger than some d0: can approximate the cover size
Difficulty of Search
Improving FLNN
• What’s wrong with FLNN?
  – relates only to the nearest known neighbor
  – finds only the first target efficiently
  – cannot be easily extended to include top-down information
Efficient Algorithms
VSLEVisual Search using Linear Estimation
• Each candidate has a probability of being a target
• Query the candidate with the highest probability
• Update the other candidates’ probabilities according to the known results
  – every known target/non-target affects the other candidates inversely to its distance
• If we know results for candidates 1,…,m, the remaining probabilities are updated by linear estimation (see the appendix)
• Dynamic priority map
Efficient Algorithms
Efficient Algorithms
VSLE: Visual Search using Linear Estimation
[animation: each candidate is annotated with its current target probability, updated after every query]
More
Combining Top-Down Information
• Simply specify the initial probabilities to match previously known data
• Add known target objects to the space; this alters the probabilities accordingly and speeds up the search
Efficient Algorithms
Experiment 1: COIL-100Efficient Algorithms
Columbia Object Image Library [96]
Experiment 1: COIL-100
• Features:
  – 1st, 2nd and 3rd Gaussian derivatives: 9 basis filters
  – 5 scales: 9 x 5 = 45 features
• Euclidean distance
Efficient Algorithms
Rao & Ballard [95]
Experiment 1: COIL-100Efficient Algorithms
10 cars / 10 cups
(results plotted against # queries)
Experiment 2: hand segmentedEfficient Algorithms
• Every large segment is a candidate
• 24 candidates
• 4 targets
Berkeley hand segmented DB
Martin, Fowlkes, Tal & Malik [01]
Experiment 2: hand segmented
• Features: color histograms separated into 8 bins each (64 features)
• Euclidean distance
Efficient Algorithms
Experiment 3: automatic color segmentation
• Automatic color segmented image for face detection
Efficient Algorithms
Experiment 3: color segmentation
• 146 candidates
• 4 features: segment size and the mean values of red, green and blue
• Euclidean distance
Efficient Algorithms
Combining top-down information
• Add known targets to the space
Efficient Algorithms
Without additional targets vs. with additional targets
(results plotted against # queries)
Summary: saliency model vs. similarity model

Saliency model:
• Biologically motivated
• Uses bottom-up cues; allows combining top-down information
• Segmentation
• Static priority map

Similarity model:
• Biologically motivated
• Uses bottom-up cues; allows combining top-down information
• No segmentation
• Dynamic priority map
• Measures the search difficulty
Summary
• What is attention
• Aid object recognition tasks by choosing the area of interest
• Two approaches: saliency model and similarity model
  – biological motivation
  – algorithms
Thank You!
Linearly Estimating l(x_k)

A linear estimate of l(x_k) from the m known labels:
  l_hat(x_k) = sum over i = 1,…,m of a_i * l(x_i)
which, of course, minimizes the mean squared estimation error.
Solving a set of linear equations gives the estimate:
  l_hat(x_k) = r^T R^(-1) L
where L is the vector of known labels, and the matrix R (i,j = 1,…,m) and vector r are computed from the distances between candidates.
R and r depend only on the distances, so they can be computed in advance, once.
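A numerical sketch of the linear estimation described above, under assumed forms: the exponential similarity kernel here is an invented stand-in for whatever R and r the paper derives. The point it illustrates is that R and r depend only on distances and the estimate is linear in the known labels.

```python
import numpy as np

def similarity(a, b, scale=1.0):
    """Assumed kernel: similarity decays exponentially with distance."""
    return np.exp(-np.linalg.norm(a - b) / scale)

def estimate_label(labeled_pts, labels, query):
    """l_hat(x_k) = r^T R^{-1} L: R holds similarities among the m
    labeled candidates, r holds their similarities to the query."""
    labeled_pts = np.asarray(labeled_pts, dtype=float)
    m = len(labeled_pts)
    R = np.array([[similarity(labeled_pts[i], labeled_pts[j])
                   for j in range(m)] for i in range(m)])
    r = np.array([similarity(p, query) for p in labeled_pts])
    # R and r depend only on distances, so they can be precomputed once
    a = np.linalg.solve(R, r)          # the estimation weights a_i
    return float(a @ np.asarray(labels, dtype=float))

known = [[0.0], [10.0]]                # one known target, one known non-target
labels = [1.0, 0.0]                    # 1 = target, 0 = non-target
print(estimate_label(known, labels, np.array([0.5])) > 0.5)   # True
```

A query close to the known target gets a label estimate near 1, so it is prioritized next, which is the dynamic-priority-map behavior of VSLE.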