Learning Low-Level Vision
William T. Freeman Egon C. Pasztor
Owen T. Carmichael
Model image and scene patches as nodes in a Markov network
Figure: a two-layer network. Image patches yi (bottom layer, the image) connect to their scene patches xi (top layer, the scene) through Φ(xi, yi); neighboring scene patches connect through Ψ(xi, xj).
Network joint probability
P(x, y) = (1/Z) ∏_(i,j) Ψ(xi, xj) ∏_i Φ(xi, yi)

where x is the scene and y is the image; Ψ is the scene-scene compatibility function between neighboring scene nodes, and Φ is the image-scene compatibility function tying each scene node to its local observation.
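A minimal sketch of evaluating this joint probability for a chain of scene nodes with discrete candidate states (the function name and the toy tables below are hypothetical, not from the original):

```python
import numpy as np

def joint_probability(phi, psi, states):
    """Unnormalized P(x, y) for a chain of scene nodes:
    product over edges of Psi(x_i, x_j) times product over nodes
    of Phi(x_i, y_i).
    phi[i][s]    : Phi at node i for candidate state s (y_i observed, fixed)
    psi[i][s, t] : Psi between node i (state s) and node i+1 (state t)"""
    p = 1.0
    for i, s in enumerate(states):
        p *= phi[i][s]
    for i in range(len(states) - 1):
        p *= psi[i][states[i], states[i + 1]]
    return p

# Toy two-node example.
phi = [np.array([1.0, 2.0]), np.array([3.0, 1.0])]
psi = [np.array([[1.0, 0.5], [0.5, 1.0]])]
print(joint_probability(phi, psi, (0, 0)))  # 1.0 * 3.0 * 1.0 = 3.0
```

Dividing by the normalization constant Z (the sum over all state combinations) would turn these scores into probabilities.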
Super-resolution
• Image: low resolution image
• Scene: high resolution image
ultimate goal...
Figure panels: full-frequency original; representation; zoomed low-frequency; low-band input (contrast normalized, PCA fitted); true high frequencies.

(To minimize the complexity of the relationships we have to learn, we remove the lowest frequencies from the input image and normalize the local contrast level.)
Training images, ~100,000 image/scene patch pairs
Images from two Corel database categories: “giraffes” and “urban skyline”.
Gather ~100,000 patches

Figure: training data samples (magnified), shown as paired low-frequency and high-frequency patches, alongside the input low frequencies.
Nearest neighbor estimate
Figure: for each input low-frequency patch, the nearest training low-frequency patch supplies its paired high frequencies, giving the estimated high frequencies.
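The nearest-neighbor estimate can be sketched as a simple lookup over the training pairs (function and variable names here are hypothetical, and patches are flattened to vectors for brevity):

```python
import numpy as np

def nearest_neighbor_highfreq(input_low, train_low, train_high):
    """Return the high-frequency patch paired with the training
    low-frequency patch closest (in L2 distance) to the input patch.
    train_low, train_high: (num_patches, patch_size) arrays of pairs."""
    dists = np.sum((train_low - input_low) ** 2, axis=1)
    return train_high[np.argmin(dists)]

# Toy example: two training pairs, flattened 2-pixel patches.
train_low = np.array([[0.0, 0.0], [1.0, 1.0]])
train_high = np.array([[10.0], [20.0]])
print(nearest_neighbor_highfreq(np.array([0.9, 1.1]), train_low, train_high))
```

As the slides go on to show, this per-patch estimate ignores neighbors, which is why the Markov network and its compatibility functions are needed.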
Image-scene compatibility function, (xi, yi)
Assume Gaussian noise takes you from the observed image patch to the synthetic sample:

Φ(xi, yi) ∝ exp( −|y(xi) − yi|² / 2σ² ),

where y(xi) is the image patch associated with candidate xi and yi is the observed image patch.
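A sketch of this Gaussian image-scene compatibility (the function name and the toy patches are hypothetical):

```python
import numpy as np

def phi(candidate_patch, observed_patch, sigma=1.0):
    """Gaussian image-scene compatibility: the observed patch is
    modeled as the candidate's associated image patch plus
    Gaussian noise of standard deviation sigma."""
    d = np.asarray(candidate_patch) - np.asarray(observed_patch)
    return float(np.exp(-np.sum(d * d) / (2.0 * sigma ** 2)))

print(phi([1.0, 2.0], [1.0, 2.0]))  # identical patches give 1.0
```

The closer a candidate's patch lies to the observation, the nearer its compatibility is to 1.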
Scene-scene compatibility function, (xi, xj)
Assume the overlapped regions, d, of neighboring high-resolution patches differ by Gaussian observation noise:

Ψ(xi, xj) ∝ exp( −|d_ij − d_ji|² / 2σ² ),

where d_ij and d_ji are the pixels of candidates xi and xj within the overlap region d. This is a uniqueness constraint, not a smoothness constraint.
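A sketch of this overlap-based scene-scene compatibility, assuming the two candidate patches are horizontal neighbors whose columns overlap (names and the overlap convention are hypothetical):

```python
import numpy as np

def psi(patch_left, patch_right, overlap, sigma=1.0):
    """Gaussian scene-scene compatibility: the pixels where two
    neighboring candidate high-res patches overlap should agree
    up to Gaussian noise. The left patch's last `overlap` columns
    coincide spatially with the right patch's first `overlap` columns."""
    d = patch_left[:, -overlap:] - patch_right[:, :overlap]
    return float(np.exp(-np.sum(d * d) / (2.0 * sigma ** 2)))

a = np.ones((3, 4))
b = np.ones((3, 4))
print(psi(a, b, overlap=2))  # agreeing overlaps give 1.0
```

Because only exact agreement in the overlap scores highly, this enforces that neighbors pick mutually consistent interpretations rather than merely smooth ones.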
Form linking matrices between nodes
Linking matrix Ψ(xk, xj) evaluated at the scene samples (rows: samples at node xk; columns: samples at node xj):

0.16 0.14 0.23 0.40 0.38
0.72 0.61 0.58 0.13 0.05
0.60 0.55 0.52 0.11 0.07
0.48 0.32 0.29 0.03 0.00
0.09 0.04 0.03 0.01 0.00

The local likelihoods are all 1 for the scene samples.
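Building such a linking matrix is just evaluating Ψ over every pair of candidate samples at two neighboring nodes (a sketch; the function name and the toy scalar samples are hypothetical):

```python
import numpy as np

def linking_matrix(samples_k, samples_j, psi):
    """Evaluate the compatibility Psi(x_k, x_j) for every pair of
    candidate scene samples at two neighboring nodes, giving one
    matrix per edge of the network."""
    return np.array([[psi(xk, xj) for xj in samples_j]
                     for xk in samples_k])

# Toy example with scalar "samples" and a Gaussian psi.
g = lambda a, b: float(np.exp(-(a - b) ** 2))
print(linking_matrix([0.0, 1.0], [0.0, 1.0], g))
```

Once these matrices are precomputed, belief propagation reduces to matrix-vector products over the candidate samples.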
Markov network
Figure: image patches yi tied to scene patches xi by Φ(xi, yi); neighboring scene patches linked by Ψ(xi, xj).

MMSE estimate at node 1:

x1_MMSE = mean_x1 Σ_x2 Σ_x3 P(x1, x2, x3, y1, y2, y3)
Derivation of belief propagation
Chain network: scene nodes x1, x2, x3 with observations y1, y2, y3; compatibilities Φ(x1, y1), Φ(x2, y2), Φ(x3, y3) between each node and its observation, and Ψ(x1, x2), Ψ(x2, x3) between neighbors.

The posterior factorizes:

x1_MMSE = mean_x1 Σ_x2 Σ_x3 P(x1, x2, x3, y1, y2, y3)
        = mean_x1 Φ(x1, y1) Σ_x2 Ψ(x1, x2) Φ(x2, y2) Σ_x3 Ψ(x2, x3) Φ(x3, y3)
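The factorization can be checked numerically on a toy chain: pushing each sum inward gives the same marginal as the full triple sum. A sketch with arbitrary random tables (all variable names hypothetical):

```python
import numpy as np

# Toy 3-node chain, 4 candidate states per node, arbitrary positive tables.
rng = np.random.default_rng(0)
n = 4
phi1, phi2, phi3 = rng.random(n), rng.random(n), rng.random(n)  # Phi(x_i, y_i)
psi12, psi23 = rng.random((n, n)), rng.random((n, n))           # Psi(x_i, x_j)

# Brute force: sum the full joint over x2 and x3.
brute = np.zeros(n)
for x1 in range(n):
    for x2 in range(n):
        for x3 in range(n):
            brute[x1] += (phi1[x1] * phi2[x2] * phi3[x3]
                          * psi12[x1, x2] * psi23[x2, x3])

# Factorized: push each sum inside as far as it will go.
inner = psi23 @ phi3             # sum over x3 of Psi(x2,x3) Phi(x3,y3)
outer = psi12 @ (phi2 * inner)   # sum over x2 of Psi(x1,x2) Phi(x2,y2) inner
factored = phi1 * outer

assert np.allclose(brute, factored)
```

The brute-force cost is exponential in the number of nodes; the factorized version is linear, which is exactly what the message-passing rules exploit.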
Propagation rules
Same chain network: scene nodes x1, x2, x3 with observations y1, y2, y3.

x1_MMSE = mean_x1 Φ(x1, y1) Σ_x2 Ψ(x1, x2) Φ(x2, y2) Σ_x3 Ψ(x2, x3) Φ(x3, y3)

Each inner sum becomes a message passed between neighbors:

M_1^2(x1) = Σ_x2 Ψ(x1, x2) Φ(x2, y2) M_2^3(x2)
Belief, and message updates
Message to node i from node j:

M_i^j(x_i) = Σ_{x_j} Ψ(x_i, x_j) Φ(x_j, y_j) ∏_{k ∈ N(j)\i} M_j^k(x_j)

Belief at node j:

b_j(x_j) = Φ(x_j, y_j) ∏_{k ∈ N(j)} M_j^k(x_j)

Estimate:

x̂_j = argmax_{x_j} b_j(x_j)
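These update rules can be sketched for a chain, where one forward and one backward sweep deliver all messages (the function name, array layout, and sweep schedule are implementation assumptions, not from the original):

```python
import numpy as np

def bp_chain(phi, psi):
    """Belief propagation on a chain of n nodes with s candidate states.
    phi: (n, s) local evidence Phi(x_j, y_j)
    psi: list of (s, s) matrices, psi[i] = Psi(x_i, x_{i+1})
    fwd[j] is the message into node j from node j-1, bwd[j] from j+1;
    the belief multiplies local evidence by all incoming messages."""
    n, s = phi.shape
    fwd = [np.ones(s) for _ in range(n)]
    bwd = [np.ones(s) for _ in range(n)]
    for j in range(1, n):                # left-to-right sweep
        fwd[j] = psi[j - 1].T @ (phi[j - 1] * fwd[j - 1])
    for j in range(n - 2, -1, -1):       # right-to-left sweep
        bwd[j] = psi[j] @ (phi[j + 1] * bwd[j + 1])
    beliefs = phi * np.array(fwd) * np.array(bwd)
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a chain the normalized beliefs equal the true marginals, and taking the argmax of each node's belief gives the estimate x̂_j.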
Optimal solution in a chain or tree: Belief Propagation
• “Do the right thing” Bayesian algorithm.
• For Gaussian random variables over time: Kalman filter.
• For hidden Markov models: forward/backward algorithm (and MAP variant is Viterbi).
No factorization with loops!
Figure: the same chain x1, x2, x3 with observations y1, y2, y3, plus an extra edge Ψ(x1, x3) closing a loop.

x1_MMSE = mean_x1 Φ(x1, y1) Σ_x2 Ψ(x1, x2) Φ(x2, y2) Σ_x3 Ψ(x2, x3) Ψ(x1, x3) Φ(x3, y3)

With the loop edge Ψ(x1, x3), the inner sum over x3 depends on x1 as well as x2, so the sums no longer factor into a chain of local messages.
Justification for running belief propagation in networks with loops
• Experimental results:
– Error-correcting codes (Kschischang and Frey, 1998; McEliece et al., 1998)
– Vision applications (Freeman and Pasztor, 1999; Frey, 2000)
• Theoretical results:
– For Gaussian processes, the means are correct (Weiss and Freeman, 1999).
– A fixed point is a local maximum of the MAP objective over a large neighborhood (Weiss and Freeman, 2000).
– Equivalent to the Bethe approximation in statistical physics (Yedidia, Freeman, and Weiss, 2000).
VISTA--Vision by Image-Scene TrAining
Figure: the same Markov network: image patches yi tied to scene patches xi by Φ(xi, yi), neighboring scene patches by Ψ(xi, xj).
Super-resolution application
Figure: belief propagation on the input image, showing results after iterations 0, 1, and 3.
After a few iterations of belief propagation, the algorithm selects spatially consistent high resolution
interpretations for each low-resolution patch of the input image.
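The final readout can be sketched as picking, at each node, the candidate high-resolution patch with the largest belief (the function name and toy data are hypothetical):

```python
import numpy as np

def select_patches(beliefs, candidates):
    """beliefs[i]: belief vector over node i's candidate scene patches;
    candidates[i]: the candidate high-res patches themselves.
    Returns the highest-belief patch at each node."""
    return [cands[int(np.argmax(b))]
            for b, cands in zip(beliefs, candidates)]

# Toy example: one node with two candidate patches.
print(select_patches([np.array([0.2, 0.8])], [["patchA", "patchB"]]))
```

The selected patches are then assembled (blending their overlaps) into the high-resolution output.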
Zooming 2 octaves
85 x 51 input
Cubic spline zoom to 340x204; maximum-likelihood zoom to 340x204.
We apply the super-resolution algorithm recursively, zooming up 2 octaves (a factor of 4 in each dimension).
Generic training images
Next, we train on a generic set of training images: photographs taken with the same camera as for the test image, but of a random collection of subjects.
Figure panels: original (70x70); cubic spline; Markov net, training: generic; true (280x280).
Figure: training image and processed image.