Learning Low-Level Vision
William T. Freeman Egon C. Pasztor
Owen T. Carmichael
Model image and scene patches as nodes in a Markov network
Figure: a two-layer network. Image patches yi (bottom layer, the image) connect to their scene patches xi (top layer, the scene) through Φ(xi, yi); neighboring scene patches connect through Ψ(xi, xj).
Network joint probability
P(x, y) = (1/Z) ∏_(i,j) Ψ(xi, xj) ∏_i Φ(xi, yi)

where x is the scene and y is the image; Ψ is the scene-scene compatibility function between neighboring scene nodes, and Φ is the image-scene compatibility function tying each scene node to its local observation.
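A minimal sketch of evaluating this joint probability for a chain of scene nodes with discrete candidate states (the function name and the toy tables below are hypothetical, not from the original):

```python
import numpy as np

def joint_probability(phi, psi, states):
    """Unnormalized P(x, y) for a chain of scene nodes:
    product over edges of Psi(x_i, x_j) times product over nodes
    of Phi(x_i, y_i).
    phi[i][s]    : Phi at node i for candidate state s (y_i observed, fixed)
    psi[i][s, t] : Psi between node i (state s) and node i+1 (state t)"""
    p = 1.0
    for i, s in enumerate(states):
        p *= phi[i][s]
    for i in range(len(states) - 1):
        p *= psi[i][states[i], states[i + 1]]
    return p

# Toy two-node example.
phi = [np.array([1.0, 2.0]), np.array([3.0, 1.0])]
psi = [np.array([[1.0, 0.5], [0.5, 1.0]])]
print(joint_probability(phi, psi, (0, 0)))  # 1.0 * 3.0 * 1.0 = 3.0
```

Dividing by the normalization constant Z (the sum over all state combinations) would turn these scores into probabilities.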
Super-resolution
• Image: low resolution image
• Scene: high resolution image
ultimate goal...
Figure panels: full-frequency original; representation; zoomed low-frequency; low-band input (contrast normalized, PCA fitted); true high frequencies.

(To minimize the complexity of the relationships we have to learn, we remove the lowest frequencies from the input image and normalize the local contrast level.)
Training images, ~100,000 image/scene patch pairs
Images from two Corel database categories: “giraffes” and “urban skyline”.
Gather ~100,000 patches

Figure: training data samples (magnified), shown as paired low-frequency and high-frequency patches, alongside the input low frequencies.
Nearest neighbor estimate
Figure: for each input low-frequency patch, the nearest training low-frequency patch supplies its paired high frequencies, giving the estimated high frequencies.
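The nearest-neighbor estimate can be sketched as a simple lookup over the training pairs (function and variable names here are hypothetical, and patches are flattened to vectors for brevity):

```python
import numpy as np

def nearest_neighbor_highfreq(input_low, train_low, train_high):
    """Return the high-frequency patch paired with the training
    low-frequency patch closest (in L2 distance) to the input patch.
    train_low, train_high: (num_patches, patch_size) arrays of pairs."""
    dists = np.sum((train_low - input_low) ** 2, axis=1)
    return train_high[np.argmin(dists)]

# Toy example: two training pairs, flattened 2-pixel patches.
train_low = np.array([[0.0, 0.0], [1.0, 1.0]])
train_high = np.array([[10.0], [20.0]])
print(nearest_neighbor_highfreq(np.array([0.9, 1.1]), train_low, train_high))
```

As the slides go on to show, this per-patch estimate ignores neighbors, which is why the Markov network and its compatibility functions are needed.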
Image-scene compatibility function, (xi, yi)
Assume Gaussian noise takes you from the observed image patch to the synthetic sample:

Φ(xi, yi) ∝ exp( −|y(xi) − yi|² / 2σ² ),

where y(xi) is the image patch associated with candidate xi and yi is the observed image patch.
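A sketch of this Gaussian image-scene compatibility (the function name and the toy patches are hypothetical):

```python
import numpy as np

def phi(candidate_patch, observed_patch, sigma=1.0):
    """Gaussian image-scene compatibility: the observed patch is
    modeled as the candidate's associated image patch plus
    Gaussian noise of standard deviation sigma."""
    d = np.asarray(candidate_patch) - np.asarray(observed_patch)
    return float(np.exp(-np.sum(d * d) / (2.0 * sigma ** 2)))

print(phi([1.0, 2.0], [1.0, 2.0]))  # identical patches give 1.0
```

The closer a candidate's patch lies to the observation, the nearer its compatibility is to 1.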
Scene-scene compatibility function, (xi, xj)
Assume the overlapped regions, d, of neighboring high-resolution patches differ by Gaussian observation noise:

Ψ(xi, xj) ∝ exp( −|d_ij − d_ji|² / 2σ² ),

where d_ij and d_ji are the pixels of candidates xi and xj within the overlap region d. This is a uniqueness constraint, not a smoothness constraint.
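A sketch of this overlap-based scene-scene compatibility, assuming the two candidate patches are horizontal neighbors whose columns overlap (names and the overlap convention are hypothetical):

```python
import numpy as np

def psi(patch_left, patch_right, overlap, sigma=1.0):
    """Gaussian scene-scene compatibility: the pixels where two
    neighboring candidate high-res patches overlap should agree
    up to Gaussian noise. The left patch's last `overlap` columns
    coincide spatially with the right patch's first `overlap` columns."""
    d = patch_left[:, -overlap:] - patch_right[:, :overlap]
    return float(np.exp(-np.sum(d * d) / (2.0 * sigma ** 2)))

a = np.ones((3, 4))
b = np.ones((3, 4))
print(psi(a, b, overlap=2))  # agreeing overlaps give 1.0
```

Because only exact agreement in the overlap scores highly, this enforces that neighbors pick mutually consistent interpretations rather than merely smooth ones.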
Form linking matrices between nodes
Linking matrix Ψ(xk, xj) evaluated at the scene samples (rows: samples at node xk; columns: samples at node xj):

0.16 0.14 0.23 0.40 0.38
0.72 0.61 0.58 0.13 0.05
0.60 0.55 0.52 0.11 0.07
0.48 0.32 0.29 0.03 0.00
0.09 0.04 0.03 0.01 0.00

The local likelihoods are all 1 for the scene samples.
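Building such a linking matrix is just evaluating Ψ over every pair of candidate samples at two neighboring nodes (a sketch; the function name and the toy scalar samples are hypothetical):

```python
import numpy as np

def linking_matrix(samples_k, samples_j, psi):
    """Evaluate the compatibility Psi(x_k, x_j) for every pair of
    candidate scene samples at two neighboring nodes, giving one
    matrix per edge of the network."""
    return np.array([[psi(xk, xj) for xj in samples_j]
                     for xk in samples_k])

# Toy example with scalar "samples" and a Gaussian psi.
g = lambda a, b: float(np.exp(-(a - b) ** 2))
print(linking_matrix([0.0, 1.0], [0.0, 1.0], g))
```

Once these matrices are precomputed, belief propagation reduces to matrix-vector products over the candidate samples.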
Markov network
Figure: image patches yi tied to scene patches xi by Φ(xi, yi); neighboring scene patches linked by Ψ(xi, xj).

MMSE estimate at node 1:

x1_MMSE = mean_x1 Σ_x2 Σ_x3 P(x1, x2, x3, y1, y2, y3)
Derivation of belief propagation
Chain network: scene nodes x1, x2, x3 with observations y1, y2, y3; compatibilities Φ(x1, y1), Φ(x2, y2), Φ(x3, y3) between each node and its observation, and Ψ(x1, x2), Ψ(x2, x3) between neighbors.

The posterior factorizes:

x1_MMSE = mean_x1 Σ_x2 Σ_x3 P(x1, x2, x3, y1, y2, y3)
        = mean_x1 Φ(x1, y1) Σ_x2 Ψ(x1, x2) Φ(x2, y2) Σ_x3 Ψ(x2, x3) Φ(x3, y3)
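The factorization can be checked numerically on a toy chain: pushing each sum inward gives the same marginal as the full triple sum. A sketch with arbitrary random tables (all variable names hypothetical):

```python
import numpy as np

# Toy 3-node chain, 4 candidate states per node, arbitrary positive tables.
rng = np.random.default_rng(0)
n = 4
phi1, phi2, phi3 = rng.random(n), rng.random(n), rng.random(n)  # Phi(x_i, y_i)
psi12, psi23 = rng.random((n, n)), rng.random((n, n))           # Psi(x_i, x_j)

# Brute force: sum the full joint over x2 and x3.
brute = np.zeros(n)
for x1 in range(n):
    for x2 in range(n):
        for x3 in range(n):
            brute[x1] += (phi1[x1] * phi2[x2] * phi3[x3]
                          * psi12[x1, x2] * psi23[x2, x3])

# Factorized: push each sum inside as far as it will go.
inner = psi23 @ phi3             # sum over x3 of Psi(x2,x3) Phi(x3,y3)
outer = psi12 @ (phi2 * inner)   # sum over x2 of Psi(x1,x2) Phi(x2,y2) inner
factored = phi1 * outer

assert np.allclose(brute, factored)
```

The brute-force cost is exponential in the number of nodes; the factorized version is linear, which is exactly what the message-passing rules exploit.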
Propagation rules
Same chain network: scene nodes x1, x2, x3 with observations y1, y2, y3.

x1_MMSE = mean_x1 Φ(x1, y1) Σ_x2 Ψ(x1, x2) Φ(x2, y2) Σ_x3 Ψ(x2, x3) Φ(x3, y3)

Each inner sum becomes a message passed between neighbors:

M_1^2(x1) = Σ_x2 Ψ(x1, x2) Φ(x2, y2) M_2^3(x2)
Belief, and message updates
Message to node i from node j:

M_i^j(x_i) = Σ_{x_j} Ψ(x_i, x_j) Φ(x_j, y_j) ∏_{k ∈ N(j)\i} M_j^k(x_j)

Belief at node j:

b_j(x_j) = Φ(x_j, y_j) ∏_{k ∈ N(j)} M_j^k(x_j)

Estimate:

x̂_j = argmax_{x_j} b_j(x_j)
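These update rules can be sketched for a chain, where one forward and one backward sweep deliver all messages (the function name, array layout, and sweep schedule are implementation assumptions, not from the original):

```python
import numpy as np

def bp_chain(phi, psi):
    """Belief propagation on a chain of n nodes with s candidate states.
    phi: (n, s) local evidence Phi(x_j, y_j)
    psi: list of (s, s) matrices, psi[i] = Psi(x_i, x_{i+1})
    fwd[j] is the message into node j from node j-1, bwd[j] from j+1;
    the belief multiplies local evidence by all incoming messages."""
    n, s = phi.shape
    fwd = [np.ones(s) for _ in range(n)]
    bwd = [np.ones(s) for _ in range(n)]
    for j in range(1, n):                # left-to-right sweep
        fwd[j] = psi[j - 1].T @ (phi[j - 1] * fwd[j - 1])
    for j in range(n - 2, -1, -1):       # right-to-left sweep
        bwd[j] = psi[j] @ (phi[j + 1] * bwd[j + 1])
    beliefs = phi * np.array(fwd) * np.array(bwd)
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a chain the normalized beliefs equal the true marginals, and taking the argmax of each node's belief gives the estimate x̂_j.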
Optimal solution in a chain or tree: Belief Propagation
• “Do the right thing” Bayesian algorithm.
• For Gaussian random variables over time: Kalman filter.
• For hidden Markov models: forward/backward algorithm (and MAP variant is Viterbi).
No factorization with loops!
Figure: the same chain x1, x2, x3 with observations y1, y2, y3, plus an extra edge Ψ(x1, x3) closing a loop.

x1_MMSE = mean_x1 Φ(x1, y1) Σ_x2 Ψ(x1, x2) Φ(x2, y2) Σ_x3 Ψ(x2, x3) Ψ(x1, x3) Φ(x3, y3)

With the loop edge Ψ(x1, x3), the inner sum over x3 depends on x1 as well as x2, so the sums no longer factor into a chain of local messages.
Justification for running belief propagation in networks with loops
• Experimental results:
– Error-correcting codes (Kschischang and Frey, 1998; McEliece et al., 1998)
– Vision applications (Freeman and Pasztor, 1999; Frey, 2000)
• Theoretical results:
– For Gaussian processes, the means are correct (Weiss and Freeman, 1999).
– A fixed point is a local maximum of the MAP objective over a large neighborhood (Weiss and Freeman, 2000).
– Equivalent to the Bethe approximation in statistical physics (Yedidia, Freeman, and Weiss, 2000).
VISTA--Vision by Image-Scene TrAining
Figure: the same Markov network: image patches yi tied to scene patches xi by Φ(xi, yi), neighboring scene patches by Ψ(xi, xj).
Super-resolution application
Figure: belief propagation on the input image, showing results after iterations 0, 1, and 3.
After a few iterations of belief propagation, the algorithm selects spatially consistent high resolution
interpretations for each low-resolution patch of the input image.
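The final readout can be sketched as picking, at each node, the candidate high-resolution patch with the largest belief (the function name and toy data are hypothetical):

```python
import numpy as np

def select_patches(beliefs, candidates):
    """beliefs[i]: belief vector over node i's candidate scene patches;
    candidates[i]: the candidate high-res patches themselves.
    Returns the highest-belief patch at each node."""
    return [cands[int(np.argmax(b))]
            for b, cands in zip(beliefs, candidates)]

# Toy example: one node with two candidate patches.
print(select_patches([np.array([0.2, 0.8])], [["patchA", "patchB"]]))
```

The selected patches are then assembled (blending their overlaps) into the high-resolution output.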
Zooming 2 octaves
85 x 51 input
Cubic spline zoom to 340x204; maximum-likelihood zoom to 340x204.
We apply the super-resolution algorithm recursively, zooming up 2 octaves (a factor of 4 in each dimension).
Generic training images
Next, we train on a generic set of training images: photographs taken with the same camera as for the test image, but of a random collection of subjects.
Figure panels: original (70x70); cubic spline; Markov net, training: generic; true (280x280).
Figure: training image and processed image.