1
Statistics of natural images
May 30, 2010
Ofer Bartal, Alon Faktor
2
Outline
• Motivation
• Classical statistical models
• New MRF model approach
• Learning the models
• Applications and results
3
Motivation
• Big variance in appearance
• Can we even dream of modeling this?
4
Motivation
• Main questions:
– Do all natural images obey some common “rules”?
– How can one find these “rules”?
– How can one use these “rules” for computer vision tasks?
5
Motivation
• Why bother to model at all?
• “Noise”, uncertainty
• Model helps choose the “best” possible answer
• Let's see some examples
Natural image model
6
Noise-blur removal
• Consider the classical de-convolution problem
• It can be formulated as a linear set of equations:
$Y = h * X + N$ (image $X$, blur kernel $h$, noise $N$)
In column-stacked (matrix) form: $y_{cs} = H\, x_{cs} + n_{cs}$
(Figure: blurred image $Y$ as $H$ applied to $X$ plus noise $N$)
7
Noise-blur removal
(Figure: the matrix equation $Y = HX + N$; the goal is to recover an estimate $\hat{X}$)
8
Inpainting
$Y = AX + n$
(Figure: observed image $Y$ with missing pixels; unknown image $X$)
$A$ = the identity matrix with some rows missing
Missing lines of the identity matrix = missing pixels (an under-determined system)
9
Motivation
• Problems:
– Unknown noise
– H may be singular (de-convolution)
– H may be under-determined (inpainting)
• So there can be many solutions
• How can we find the “right” one?
10
Motivation
• Goal: estimate $x$
– Assume:
• A prior model of natural images: $P_x(x)$
• A prior model of the noise: $P_n(n)$
– Use the MAP estimator to find $x$:
$x^* = \arg\max_x P(x \mid y) = \arg\max_x P(y \mid x)\, P(x)$
$x^* = \arg\max_x P_n(y - Hx)\, P_x(x)$
11
Energy Minimization problem
• The MAP problem can be reformulated as:
$\hat{x} = \arg\min_x\; E(y \mid x) + E(x)$
where $E(y \mid x)$ is the data term and $E(x)$ is the prior term
13
Classical models
• Smoothness prior (model of image gradients)
– Gaussian prior (LS problem)
– L1 prior and sparse prior (IRLS problem)
(Figure: image gradient)
14
Gaussian Priors
• Assume:
– A Gaussian prior on the gradients of $x$: $p(x) \propto \exp\!\left(-\frac{\|\nabla x\|^2}{2\sigma_x^2}\right)$
– Gaussian noise: $n \sim N(0, \sigma^2)$
• Using these assumptions, $x^* = \arg\max_x P_n(y - Hx)\, P_x(x)$ becomes a least-squares problem:
$x^* = \arg\min_x\; x^T T x - 2\, x^T b$
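A minimal NumPy sketch of this least-squares MAP estimate on a toy 1-D deconvolution problem; the blur kernel, noise level, and prior scale below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Toy 1-D deconvolution: y = Hx + n with a Gaussian prior on gradients.
n = 64
x_true = np.zeros(n); x_true[20:40] = 1.0              # a simple step signal
h = np.array([0.25, 0.5, 0.25])                        # blur kernel (assumed)
H = sum(np.eye(n, k=k - 1) * h[k] for k in range(3))   # convolution as a matrix
sigma_n, sigma_x = 0.05, 0.5                           # noise / prior scales (assumed)
rng = np.random.default_rng(0)
y = H @ x_true + sigma_n * rng.standard_normal(n)

# Gradient operator D: (Dx)_i = x_{i+1} - x_i
D = np.eye(n, k=1) - np.eye(n)

# MAP with Gaussian priors = regularized least squares:
#   x* = argmin ||y - Hx||^2 / sigma_n^2 + ||Dx||^2 / sigma_x^2
A = H.T @ H / sigma_n**2 + D.T @ D / sigma_x**2
b = H.T @ y / sigma_n**2
x_map = np.linalg.solve(A, b)
```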
15
Non-Gaussian Priors
• Empirical results: image gradients have a non-Gaussian, heavy-tailed distribution
• We assume an L1 or sparse prior
• We solve it by IRLS (iteratively re-weighted least squares); see the sketch below
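Continuing the sketch above (same H, D, y, and sigma_n), a minimal IRLS iteration for the L1 gradient prior; lam, eps, and the iteration count are assumptions:

```python
# IRLS for x* = argmin ||y - Hx||^2 / sigma_n^2 + lam * ||Dx||_1
# Each step approximates the L1 term by a weighted L2 term.
lam, eps = 1.0, 1e-4        # regularization weight and smoothing (assumed)
x = y.copy()                # initialize with the observation
for _ in range(30):
    # Reweighting: pixels with small gradients get large weights.
    w = 1.0 / (np.abs(D @ x) + eps)
    A = H.T @ H / sigma_n**2 + lam * D.T @ (w[:, None] * D)
    x = np.linalg.solve(A, H.T @ y / sigma_n**2)
```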
16
De-convolution Results
(Figure: blurred image; result with a Gaussian prior; result with a sparse prior)
Good results on simple images
17
De-noising Results
(Figure: noisy image; de-noising result)
Poor results on real natural images
18
Classical models – Pros and Cons
• Advantages:
– Simple and easy to implement
• Disadvantages:
– Too heuristic
– Captures only one property: smoothness
– Biased towards totally smooth images:
(Figure: a totally smooth image receives higher probability than a natural image)
19
Going Beyond Classical Models
(Plots: probability vs. number of similar patches, in log10 scale)
20
Modern Approach
• The model is based on image properties
• Properties are chosen using an image dataset
• Questions:
1. What types of properties? Responses to linear filters.
2. How do we find good properties? Either a pre-determined bank, or learning from data.
3. How should we combine the properties into one distribution? We will see how.
21
Mathematical framework
• Want: a model p(I) of the real distribution f(I)
• Computationally hard:
– A 100x100-pixel image has 10,000 variables
• We can explicitly model only a few dimensions at a time
(Arrow = viewpoint of a few dimensions)
22
Mathematical framework
• A viewpoint is a response to a linear filter
• A distribution over these responses is a marginal of the real distribution f(I)
• (Marginal = distribution over a subset of the variables)
(Arrow = marginal of f(I))
23
Mathematical framework
• If p(I) and f(I) have the same marginal distributions for every linear filter, then p(I) = f(I) (a proposition by Zhu and Mumford)
• “Hope”: if we choose K “good” filters, then p(I) and f(I) will be “close”
• How do we measure “close”?
24
Distance between distributions
• Kullback–Leibler divergence:
$KL\big(f(I), p(I; S, \Lambda)\big) = E_f\big[\log f(I)\big] - E_f\big[\log p(I; S, \Lambda)\big]$
• Problem: f(I) is unknown
• Proposition: use instead
$\widetilde{KL}\big(f(I), p(I; S, \Lambda)\big) = E_P\big[\log p(I; S, \Lambda)\big] - E_X\big[\log p(I; S, \Lambda)\big]$
where P denotes samples from the model and X the observed images
• This measures the fit of the model to the observations
25
Illustration
(Illustration: $\widetilde{KL}$ is the gap between $\log p(I; S, \Lambda)$ evaluated on model samples and on the observed images)
26
Getting synthesized images
• Get synthesized images by sampling the learned model
• Sample using Markov Chain Monte Carlo (MCMC).
• Drawback: Learning process is slow
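A minimal Metropolis sketch of MCMC sampling from an image model; the energy below is a placeholder smoothness energy rather than a learned model, and the step size and iteration count are arbitrary:

```python
import numpy as np

def energy(img):
    # Placeholder energy: quadratic smoothness on horizontal/vertical gradients.
    dx = np.diff(img, axis=1)
    dy = np.diff(img, axis=0)
    return (dx**2).sum() + (dy**2).sum()

def metropolis_sample(shape=(32, 32), n_steps=20000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    img = rng.standard_normal(shape)
    e = energy(img)
    for _ in range(n_steps):
        # Propose a perturbation of one random pixel.
        i, j = rng.integers(shape[0]), rng.integers(shape[1])
        proposal = img.copy()
        proposal[i, j] += step * rng.standard_normal()
        e_new = energy(proposal)
        # Accept with probability min(1, exp(-(E_new - E_old))).
        # (Recomputing the full energy each step is wasteful; a real
        # sampler would update it locally.)
        if np.log(rng.random()) < e - e_new:
            img, e = proposal, e_new
    return img

sample = metropolis_sample()
```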
27
Our model P(I) – A MRF
• MRF = Markov Random Field
• An MRF is based on a graph G = (V, E):
V = pixels; E = edges between pixels that affect each other
• Our distribution is the MRF:
$p(I) = \frac{1}{Z}\, \exp\!\left(-\sum_{c \in \text{Cliques}} U_c(I_c)\right)$
28
Simple grid MRF
• Here, the cliques are edges
• Every pixel belongs to 4 cliques
29
MRF
• We limit ourselves to:
– Cliques of a fixed size (overlapping patches)
– The same potential for all cliques
• We get:
$U_c(I) = \sum_{k=1}^{K} \phi^{(k)}\!\big(F^{(k)T} I_c\big)$
$p(I) = \frac{1}{Z}\, \exp\!\left(-\sum_{c=1}^{C} \sum_{k=1}^{K} \phi^{(k)}\!\big(F^{(k)T} I_c\big)\right)$
30
MRF simulation
(Simulation: samples drawn from the MRF model $p(I)$ above)
31
Histogram simulation
$H_n^{obs}(I)$ = the observed histogram of a marginal
32
MRF
• In terms of convolutions:
• Denote the set of potential functions: $\Lambda = \{\phi^{(1)}, \dots, \phi^{(K)}\}$
• Denote the set of filters: $S = \{F^{(1)}, \dots, F^{(K)}\}$
$p(I; S, \Lambda) = \frac{1}{Z}\, \exp\!\left(-\sum_{k=1}^{K} \sum_{(x,y)} \phi^{(k)}\!\big((F^{(k)} * I)(x, y)\big)\right)$
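A minimal sketch of evaluating this energy (the negative unnormalized log-probability) with convolutions via SciPy; the filters and the potential below are placeholders, not learned ones:

```python
import numpy as np
from scipy.signal import convolve2d

def neg_log_prob_unnorm(img, filters, phi):
    """Sum_k Sum_{x,y} phi((F_k * I)(x, y)); ignores the constant log Z."""
    total = 0.0
    for F in filters:
        resp = convolve2d(img, F, mode='valid')   # responses of one filter
        total += phi(resp).sum()
    return total

# Placeholder filters (horizontal/vertical gradients) and an L1-like potential.
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
phi = np.abs
img = np.random.default_rng(0).standard_normal((32, 32))
print(neg_log_prob_unnorm(img, filters, phi))
```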
33
MRF - A simple example
• Cliques of size 1
• Pixels are i.i.d., distributed according to the grayscale histogram
(Figure: grayscale histogram)
Drawback: cliques are too small
34
MRF - Another simple example
• Clique = the whole image
• Result: a uniform distribution over the images in the dataset
Drawback: cliques are too big
37
Revisiting classical models
• Actually, the classical model is a pairwise MRF:
$p(I) = \frac{1}{Z}\, \exp\!\left(-\sum_{(x,y)} \phi\big(I(x,y) - I(x{+}1,y)\big) + \phi\big(I(x,y) - I(x,y{+}1)\big)\right)$
• It has cliques of size 2
• It has only 2 linear filters ⇒ only 2 marginals are constrained
• No guarantee that p(I) will be close to f(I)
39
Zhu and Mumford’s approach (1997)
• We want to find K “good” filters
• Strategy:
– Start with a bank B of possible filters
– Choose the subset $S \subseteq B$, $|S| = K$, that minimizes the distance between p(I) and f(I)
– For computational reasons, choose the filters one by one using a greedy method
41
Choosing the next filter
• AIG = the difference between the model p(I) and the data, from the viewpoint of the marginal of a candidate filter
• AIF = the difference between different images in the dataset, from the viewpoint of the same marginal
$IC(\phi) = AIG(\phi) - AIF(\phi)$
$AIG(\phi) = \frac{1}{2}\left\| \frac{1}{M}\sum_{n=1}^{M} H_n^{obs}(I) - E_{p(I;S,\Lambda)}\big[H(I)\big] \right\|$
$AIF(\phi) = \frac{1}{2M}\sum_{n=1}^{M}\left\| H_n^{obs}(I) - \bar{H}^{obs} \right\|$, where $\bar{H}^{obs}$ denotes the mean observed histogram
42
Algorithm – Filter selection
(Flowchart: for each filter in the bank, compute its IC; select the filter with maximal IC, add it to the model, and learn the potentials; repeat. A sketch of this loop follows below.)
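A sketch of this selection loop in Python; compute_IC and learn_potentials are hypothetical helpers standing in for the IC computation and the potential learning described above:

```python
def greedy_filter_selection(bank, K, dataset):
    """Greedily pick K filters from the bank by maximal IC.

    compute_IC and learn_potentials are hypothetical placeholders for the
    IC computation and max-entropy potential learning described above.
    """
    chosen, model = [], None
    remaining = list(range(len(bank)))
    for _ in range(K):
        # Score every remaining candidate from the viewpoint of its marginal.
        scores = [(compute_IC(bank[i], model, dataset), i) for i in remaining]
        _, best = max(scores)
        chosen.append(best)
        remaining.remove(best)
        # Re-learn the potentials with the enlarged filter set.
        model = learn_potentials([bank[i] for i in chosen], dataset)
    return [bank[i] for i in chosen], model
```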
44
Learning the potentials
(Flowchart: initialize $\phi$, then repeatedly calculate an update from the current model and apply it to $\phi$, using maximum entropy on $p$. A sketch of one such update follows below.)
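A sketch of the non-parametric potential update as a histogram-matching gradient step; the binned representation and step size are assumptions about the setup:

```python
import numpy as np

def update_potential(phi_bins, H_obs, H_model, eta=0.1):
    """One gradient step on a binned (non-parametric) potential phi.

    Maximum-entropy learning raises phi in bins where the model marginal
    H_model exceeds the observed marginal H_obs (making those responses
    less likely), and lowers it where the model falls short.
    phi_bins, H_obs, H_model: 1-D arrays over response bins (assumed setup).
    """
    return phi_bins + eta * (H_model - H_obs)
```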
45
The bank of filters
• Filter types:
– Intensity filter (1x1)
– Isotropic filters: Laplacian of Gaussian (LG)
– Directional filters: Gabor (Gcos, Gsin)
• Computation at different scales using an image pyramid
(Figure: Laplacian of Gaussian and Gabor filters)
46
Running example of the algorithm – Experiment I
Use only small filters
47
Results
All learned potentials have a diffusive nature
48
Running example of the algorithm – Experiment II
• Only gradient filters, at different scales
• Small filters → diffusive potentials (as expected)
• Surprisingly: large filters → reactive potentials
(Figure: a diffusive and a reactive potential)
50
Examples of the synthesized images
(Figure: synthesized images from Experiment I and Experiment II)
The Experiment II image is more “natural” because it has some regions with sharp boundaries
51
Outline
• We have seen:
– MRF models
– Selection of filters from a bank
– Learning potentials
• Now:
– Data-driven filters
– Analytic results for simple potentials
– Making sense of the results
– Applications
52
Roth and Black’s approach
• Filters: learned from data (rather than chosen from a bank)
• Potentials: learned parametrically (rather than non-parametrically)
• Filters and potentials are learned together
53
Motivation – model of natural patches
• Why learn filters from data?
• Inspiration from models of natural patches:
– Sparse coding
– Component analysis
– Product of experts
54
Motivation – Sparse Coding of patches
• Goal: find a set of filters $\{F_i\}$ such that
$patch = \sum_{i=1}^{N} a_i F_i$, where the coefficients $a_i = \langle patch, F_i \rangle$ are sparse
• Learn $\{F_i\}$ from a database of natural patches
• Only a few filters should fire on a given patch
55
Motivation – Component analysis
• Learn $F_i$ by component analysis:
– PCA
– ICA
• This results in “filter-like” components:
– PCA: the first components look like contrast filters
– ICA: the components look like Gabor filters
56
PCA results
(Figure: PCA components, ordered from high to low variance)
57
ICA results
• Independent filter responses
• We can derive a model for patches:
$P(x) = \prod_{i=1}^{n} p_i\big(F_i^T x\big)$
where each response $F_i^T x$ is distributed according to its own $p_i$
58
Motivation – Product of experts
• A more sophisticated model for natural patches:
$p_{POE}(X; \Theta) = \frac{1}{Z(\Theta)} \prod_{i=1}^{K} \phi_{st}\big(F_i^T X; \alpha_i\big)$
• MLE training yields “intuitive” filters:
(Figure: learned contrast and texture filters)
59
• Extension of POE to FOE:
$p_{FOE}(I; \Theta) = \frac{1}{Z(\Theta)}\, \exp\!\big(-E_{FOE}(I; \Theta)\big)$
$E_{FOE}(I; \Theta) = -\sum_{c=1}^{C} \sum_{i=1}^{K} \log \phi_{st}\big(F_i^T I_c; \alpha_i\big)$
Roth S., Black M. J. Fields of Experts. IJCV, 2009
60
The experts
• Student-t experts:
$\phi_{st}(z; \alpha_i) = \left(1 + \frac{z^2}{2}\right)^{-\alpha_i}$
(Figure: $\phi_{st}(z)$ for several values of $\alpha_i$)
61
Meaning of $\alpha_i$
• A higher $\alpha_i$ means:
– High responses are punished more severely
– The filter carries a higher weight
• With $g(z) = \log\!\left(1 + \frac{z^2}{2}\right)$, the FOE model can be rewritten as:
$p_{FOE}(I; F, \alpha) = \frac{1}{Z(F, \alpha)}\, \exp\!\left(-\sum_{c=1}^{C} \sum_{i=1}^{K} \alpha_i\, g\big(F_i^T I_c\big)\right)$
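A small sketch of this reparametrized expert energy $g$ and its derivative (the derivative reappears in the gradient steps later); the sample values are arbitrary:

```python
import numpy as np

def g(z):
    """Log Student-t expert energy: g(z) = log(1 + z^2 / 2)."""
    return np.log1p(0.5 * z**2)

def g_prime(z):
    """g'(z) = z / (1 + z^2 / 2)."""
    return z / (1.0 + 0.5 * z**2)

# A larger weight alpha_i scales the whole energy alpha_i * g(z),
# punishing large filter responses more severely.
z = np.linspace(-5, 5, 11)
print(g(z), 2.0 * g(z))   # energies for alpha = 1 and alpha = 2
```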
Learning the model
(Flowchart: initialize $\{F_i, \alpha_i\}$ at random, draw samples from the current model by MCMC, and update the parameters to decrease $\widetilde{KL}$ – the gap between $\log p(I; S, \Lambda)$ on the data and on model samples; repeat. A schematic of this loop follows below.)
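A schematic sketch of this loop, under strong assumptions: mcmc_sample and grad_params are hypothetical helpers (drawing model samples and computing the $\widetilde{KL}$ gradient, respectively), so this is an outline rather than runnable code:

```python
def learn_foe(dataset, filters, alphas, n_iters=100, eta=0.01):
    """Schematic FOE learning loop (outline only, not runnable as-is).

    mcmc_sample and grad_params are hypothetical helpers: the first draws
    images from the current model, the second returns the ~KL gradient,
    i.e. the gap between expected statistics on data and on samples.
    """
    for _ in range(n_iters):
        samples = mcmc_sample(filters, alphas, n_samples=len(dataset))
        g_filters, g_alphas = grad_params(dataset, samples, filters, alphas)
        filters = [F - eta * gF for F, gF in zip(filters, g_filters)]
        alphas = [a - eta * ga for a, ga in zip(alphas, g_alphas)]
    return filters, alphas
```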
65
Results of learning FOE
Filters aren’t “intuitive”
67
So far…
• Zhu–Mumford: filters chosen from a bank, potentials learned non-parametrically
• Small filters → diffusive potentials; large filters → reactive potentials
• Non-intuitive?
68
So far…
• Roth–Black: filters learned from a database, potentials learned parametrically
• The learned filters: non-intuitive?
69
What now?
• Revisiting POE and FOE with Gaussian potentials
• Relation to non-Gaussian potentials
• Making sense of previous results

Weiss Y., Freeman W. T. What makes a good model of natural images? CVPR, 2007
70
Gaussian POE
$p_{GPOE}(x; F) = \frac{1}{Z(F)}\, \exp\!\left(-\frac{1}{2} \sum_{i=1}^{K} \big(F_i^T x\big)^2\right)$

$\ln p_{GPOE}(x; F) = -\frac{1}{2} \sum_{i=1}^{K} \big(F_i^T x\big)^2 - \ln Z(F)$

$F^*_{ML} = \arg\min_F\; \left\langle \frac{1}{2} \sum_{i=1}^{K} \big(F_i^T x\big)^2 \right\rangle + \ln Z(F)$
71
Gaussian POE
• Claim: $Z$ is constant over any set of $K$ orthonormal vectors
$F^* = \arg\min_{\{F_i\}\ \text{orthonormal}}\; \sum_{i=1}^{K} \left\langle \big(F_i^T x\big)^2 \right\rangle$
• This has an analytic solution – the $K$ minor components of the data
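A minimal sketch of that analytic solution: the minor components are the eigenvectors of the data covariance with the smallest eigenvalues (the random patches here are stand-ins for vectorized natural patches):

```python
import numpy as np

def minor_components(patches, K):
    """Return the K least-variance principal directions of the data.

    patches: (n_samples, dim) array of vectorized patches.
    """
    X = patches - patches.mean(axis=0)
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, :K]                    # columns = minor components

rng = np.random.default_rng(0)
patches = rng.standard_normal((1000, 25))    # stand-in for 5x5 natural patches
F_star = minor_components(patches, K=4)
```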
72
Results – example of learned filters
• Non-intuitive, high-frequency filters
• Reminder: PCA components, ordered from high to low variance
73
Gaussian FOE
$p_{GFOE}(I; \{F_i\}) = \frac{1}{Z(\{F_i\})}\, \exp\!\left(-\frac{1}{2} \sum_{i=1}^{K} \sum_{c=1}^{C} \big(F_i^T I_c\big)^2\right)$

$\sum_{c=1}^{C} \big(F_i^T I_c\big)^2 = \sum_{(x,y)} (F_i * I)^2(x, y)$

In the frequency domain: $\mathcal{F}\{F_i * I\}(\omega) = \mathcal{F}\{F_i\}(\omega)\, \mathcal{F}\{I\}(\omega)$, so

$\ln Z(\{F_i\}) = -\frac{1}{2} \sum_{\omega} \ln \sum_{i=1}^{K} \big|\mathcal{F}\{F_i\}(\omega)\big|^2 + \text{const}$
74
Gaussian FOE
$F^*_{ML} = \arg\min_F\; \left\langle \frac{1}{2} \sum_{i=1}^{K} \sum_{c=1}^{C} \big(F_i^T I_c\big)^2 \right\rangle + \ln Z(F)$

Writing $G(\omega) = \sum_{i=1}^{K} \big|\mathcal{F}\{F_i\}(\omega)\big|^2$, the maximum-likelihood solution is

$G^*_{ML}(\omega) = \frac{1}{\left\langle \big|\mathcal{F}\{I\}(\omega)\big|^2 \right\rangle}$
75
Gaussian FOE
• The optimal filters $F_i^*$ satisfy:
$\sum_{i=1}^{K} \big|\mathcal{F}\{F_i^*\}(\omega)\big|^2 = \frac{1}{\left\langle \big|\mathcal{F}\{I\}(\omega)\big|^2 \right\rangle}$
• Natural images have most of their power at low frequencies, so the right-hand side is largest at high frequencies
⇒ Optimal filters have high frequencies
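A short sketch of this relation in code: the optimal combined filter spectrum is the inverse of the average image power spectrum (the random images below are stand-ins; a real check would use natural-image crops):

```python
import numpy as np

def optimal_filter_spectrum(images):
    """|F{G*}(w)|^2 = 1 / <|F{I}(w)|^2>.

    For natural images the average power spectrum falls off at high
    frequencies, so this optimal spectrum is largest there: the ML
    filters are high-frequency filters.
    """
    avg_power = np.mean([np.abs(np.fft.fft2(I))**2 for I in images], axis=0)
    return 1.0 / avg_power

# Stand-in "images"; replace with crops of real natural images.
rng = np.random.default_rng(0)
images = [rng.standard_normal((64, 64)) for _ in range(10)]
spec = optimal_filter_spectrum(images)
```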
76
Gaussian Scale Mixture (GSM)
• Non-Gaussian potentials can be modeled by a GSM
• The properties of GFOE carry over to GSMs
77
Revisiting FOE
• The Student-t expert is well fit by a GSM
• The learned filters have the property
$\sum_{i=1}^{K} \big|\mathcal{F}\{F_i\}(\omega)\big|^2 \propto \frac{1}{\left\langle \big|\mathcal{F}\{I\}(\omega)\big|^2 \right\rangle}$
⇒ high-frequency filters
(Figure: power spectrum of a natural image vs. the Roth–Black filters)
78
Learning FOE with fixed filters
Algorithm prefers high-frequency filters
79
Conclusion
• For Gaussian potentials and GSMs: learning ⇒ high-frequency filters
• There is experimental evidence for this phenomenon
• Maybe there is a “logic” behind this non-intuitive result?
80
Making Sense of results
• Criterion for “good” filters for patches: they should rarely fire on natural images and fire frequently on everything else
(Figure: histograms of filter responses on patches from natural images vs. white noise)
81
Making Sense of results
• An image is modeled by what you don't expect to find in it
• This is satisfied by the classical prior of smooth gradients
• But why limit ourselves to intuitive filters?
• Maybe non-intuitive filters can do better…
82
Revisiting diffusive and reactive potentials
(Plots: filter-response histograms on patches from natural images vs. white noise, for the diffusive and the reactive potential)
83
Inference
• We learned a model; we can use it for inference problems:
– Corrupted information
– Missing information
• Inference by loopy belief propagation (BP)
• Approximate inference by gradient-based optimization
84
Belief Propagation
• Observed data $y_i$ is incorporated into the model by attaching an evidence node $y_i$ to each hidden pixel node $x_i$
(Figure: grid of hidden nodes $x_i$ with attached observation nodes $y_i$)
85
Belief Propagation
• Message-passing algorithm
• Exact only on tree MRFs
• Efficient only on pairwise MRFs
86
Alternative by Roth and Black
• Reminder:
$I_{MAP} = \arg\min_I \big[-\log P(\tilde{I} \mid I) - \log P(I)\big]$
(the first term is the uncertainty/noise model, the second the learned model)
• Approximate inference by gradient-based optimization:
$I^{(t+1)} = I^{(t)} - \eta\, \nabla_I E\big(I^{(t)}; \tilde{I}\big)$
• Advantage: low computational cost
• Drawback: reaches only a local minimum if the energy is not convex
87
Partition function
• We get:
$I_{MAP} = \arg\min_I \left[-\log P(\tilde{I} \mid I) + \log Z(F, \alpha) + \sum_{(x,y)} \sum_{i=1}^{n} \phi\big((F_i * I)(x, y); \alpha_i\big)\right]$
• Since $\log Z(F, \alpha)$ does not depend on $I$:
$I_{MAP} = \arg\min_I \left[-\log P(\tilde{I} \mid I) + \sum_{(x,y)} \sum_{i=1}^{n} \phi\big((F_i * I)(x, y); \alpha_i\big)\right]$
⇒ No need to estimate the partition function
I
88
The gradient step
• How do we differentiate the second term?
• By a mathematical “trick” we get:
$\nabla_I \sum_{(x,y)} \sum_{i=1}^{N} \phi\big((F_i * I)(x, y); \alpha_i\big) = \sum_{i=1}^{N} F_i^{-} * \phi'\big(F_i * I; \alpha_i\big)$
where $F_i^{-}$ is the filter $F_i$ mirrored around its center
89
De-noising
• Assume Gaussian noise:
$P(\tilde{I} \mid I) \propto \exp\!\left(-\frac{1}{2\sigma^2}\,\|\tilde{I} - I\|^2\right)$, so $-\log P(\tilde{I} \mid I) = \frac{1}{2\sigma^2}\,\|\tilde{I} - I\|^2 + \text{const}$
• So the gradient step is:
$I^{(t+1)} = I^{(t)} + \eta\left[\frac{1}{\sigma^2}\big(\tilde{I} - I^{(t)}\big) - \sum_{i=1}^{N} F_i^{-} * \phi'\big(F_i * I^{(t)}; \alpha_i\big)\right]$
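A minimal NumPy/SciPy sketch of this iteration, using the reparametrized expert derivative $g'$ from before; the filters, weights, step size, and noise level are illustrative stand-ins for a learned FOE model:

```python
import numpy as np
from scipy.signal import convolve2d

def g_prime(z):
    # Derivative of the Student-t log-energy g(z) = log(1 + z^2 / 2).
    return z / (1.0 + 0.5 * z**2)

def foe_denoise(noisy, filters, alphas, sigma, eta=0.05, n_iters=200):
    I = noisy.copy()
    for _ in range(n_iters):
        prior_grad = np.zeros_like(I)
        for F, a in zip(filters, alphas):
            resp = convolve2d(I, F, mode='same')
            # F[::-1, ::-1] is the mirrored filter F^-.
            prior_grad += a * convolve2d(g_prime(resp), F[::-1, ::-1], mode='same')
        I += eta * ((noisy - I) / sigma**2 - prior_grad)
    return I

# Toy usage with hand-picked gradient filters (stand-ins for learned ones).
rng = np.random.default_rng(0)
clean = np.zeros((32, 32)); clean[8:24, 8:24] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
out = foe_denoise(noisy, filters, alphas=[1.0, 1.0], sigma=0.1)
```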
90
Results
91
Results
92
Results
(Figure: Original; Noisy (20.29 dB); FOE (28.72 dB); Portilla, wavelets (28.9 dB – state of the art); Non-local means (28.21 dB); standard non-linear diffusion (27.18 dB). FOE uses a general prior yet comes close to the state of the art.)
93
Results on Berkeley database
(Plots: output PSNR vs. input PSNR, at low and high noise, for the Wiener filter, non-linear diffusion, FOE, Portilla 1 and Portilla 2)
94
How many 3x3 filters should we take?
(Plot: de-noising performance vs. number of filters, filter size 3x3)
Performance starts saturating once we reach 8 filters
95
Dependence on size and shape of clique
What is the best filter?
97
Inpainting - Reminder
$Y = AX + n$
(Figure: observed image $Y$; unknown image $X$)
Problem: pixels outside the mask can change
Solution: constrain them
Inpainting
• Assume pixels outside the mask $M$ don't change
• So the gradient step is:
$I^{(t+1)} = I^{(t)} - \eta\, M \cdot \sum_{i=1}^{N} F_i^{-} * \phi'\big(F_i * I^{(t)}; \alpha_i\big)$
where $M$ is 1 at missing pixels and 0 elsewhere
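A sketch of the masked update, reusing convolve2d and g_prime from the denoising sketch above; mask is a 0-1 array with 1 at missing pixels, and all parameter values are illustrative:

```python
def foe_inpaint(corrupted, mask, filters, alphas, eta=0.05, n_iters=500):
    """Gradient descent on the FOE prior, applied only inside the mask."""
    I = corrupted.copy()
    for _ in range(n_iters):
        prior_grad = np.zeros_like(I)
        for F, a in zip(filters, alphas):
            resp = convolve2d(I, F, mode='same')
            # F[::-1, ::-1] is the mirrored filter F^-.
            prior_grad += a * convolve2d(g_prime(resp), F[::-1, ::-1], mode='same')
        I -= eta * mask * prior_grad   # pixels outside the mask stay fixed
    return I
```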
(Figure: the 0-1 mask and the image we want to inpaint)
98
99
Results
100
Results
101
Results
      FOE        Bertalmio
PSNR  29.06 dB   27.56 dB
SSIM  0.9371     0.9167
102
Pros and Cons
• Performs well on narrow scratches or small holes (even if they cover most of the image)
• Cannot fill large holes
• Not designed to handle textures
103
Thank you for Listening…