graphical models in vision. alan l. yuille. ucla. dept. statistics

Graphical Models in Vision

Alan L. Yuille.

UCLA. Dept. Statistics

The Purpose of Vision

“To Know What is Where by Looking”. Aristotle. (384-322 BC).

Information Processing: receive a signal by light rays and decode its information.

Vision appears deceptively simple, but there is more to Vision than meets the Eye.

Ames Room

Perspective

What are Humans Ideal for?

Clearly humans are not good at determining the size of objects in images, at least for these types of stimuli.

But they are good at determining context and taking contextual cues into account, i.e. they use perspective cues to estimate depth and make adjustments.

What reasoning/statistical tasks are humans ideal for?

Brightness of Patterns: Adelson (MIT)

Visual Illusions

The perception of the brightness of a surface, or the length of a line, depends on context, not on basic measurements like the number of photons that reach the eye or the length of the line in the image.

Vision is ill-posed

Vision is ill-posed – the data in the retina is not sufficient to unambiguously determine the visual scene.

Vision is possible because we have prior knowledge about visual scenes.

Even simple perception is an act of creation.

Perception as Inference

Helmholtz (1821-1894): “Perception as Unconscious Inference”.

Ball in a Box (D. Kersten)

How Hard is Vision?

The Human Brain devotes an enormous amount of resources to vision.

(I) The optic nerve is the biggest nerve in the body.

(II) Roughly half of the neurons in the cortex are involved in vision (van Essen).

If intelligence is proportional to neural activity, then vision requires more intelligence than mathematics or chess.

Vision and the Brain

Half the Cortex does Vision

Vision and Artificial Intelligence

The hardness of vision became clearer when the Artificial Intelligence community tried to design computer programs to do vision.

In the ’60s, AI workers thought that vision was “low-level” and easy.

Prof. Marvin Minsky (pioneer of AI) asked a student to solve vision as a summer project.

Chess and Face Detection

Artificial Intelligence Community preferred Chess to Vision.

By the mid-90’s Chess programs could beat the world champion Kasparov.

But computers could not find faces in images.

Man and Machine

David Marr (1945-1980) Three Levels of explanation:

1. Computation Level/Information Processing

2. Algorithmic Level

3. Hardware: Neurons versus silicon chips.

Claim: Man and Machine are similar at Level 1.

Vision: Decoding Images

Vision as Probabilistic Inference

Represent the World by S.

Represent the Image by I.

Goal: decode I and infer S.

Model image formation by a likelihood function (generative model), P(I|S).

Model our knowledge of the world by a prior P(S).

Bayes' Theorem

Then Bayes’ Theorem states we should infer the world S from I by

P(S|I) = P(I|S) P(S) / P(I).

Rev. T. Bayes. 1702-1761

Bayes to Infer S from I

P(I|S): likelihood function. P(S): prior.
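As a rough illustration of how the likelihood and prior combine, the sketch below computes P(S|I) over a small discrete set of scene hypotheses for one fixed image. The hypothesis names ("face", "no_face") and all the numbers are made up for the example and do not come from the slides; the evidence P(I) is obtained by summing P(I|S)P(S) over the hypotheses.

```python
def posterior(prior, likelihood):
    """Return P(S|I) for each hypothesis S, given P(S) and P(I|S) for one fixed image I."""
    # Unnormalised posterior: P(I|S) * P(S) for each hypothesis S.
    joint = {s: likelihood[s] * prior[s] for s in prior}
    # Evidence P(I) is the sum of P(I|S) P(S) over all hypotheses.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the image has zero probability under every hypothesis")
    return {s: p / evidence for s, p in joint.items()}


# Hypothetical numbers, chosen only for illustration.
prior = {"face": 0.1, "no_face": 0.9}          # P(S)
likelihood = {"face": 0.8, "no_face": 0.05}    # P(I|S) for the observed image
post = posterior(prior, likelihood)
```

With these assumed values the image is far more likely under "face" than under "no_face", so the posterior favours "face" even though the prior favours "no_face".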

Ambiguity and Complexity of Images

Similar objects give rise to very different images. Different objects can cause similar images.

Ideal Observers

The image of a cylinder is consistent with multiple objects and viewpoints.

The likelihood is ambiguous (concave or convex).

The prior resolves the ambiguity by biasing towards convex objects viewed from above.
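The cylinder example can be sketched as a tiny ideal observer. This is only a toy under stated assumptions: the scene is reduced to a (shape, viewpoint) pair, the likelihood treats two interpretations as explaining the image equally well, and the prior values in `p_shape` and `p_view` are invented for illustration, not taken from the slides.

```python
from itertools import product

shapes = ("convex", "concave")
views = ("above", "below")

# Hypothetical prior P(S), factorised over shape and viewpoint.
p_shape = {"convex": 0.7, "concave": 0.3}
p_view = {"above": 0.8, "below": 0.2}

# Interpretations assumed to produce the observed image equally well.
CONSISTENT = {("convex", "above"), ("concave", "below")}


def likelihood(shape, view):
    """P(I|S) for the single observed image: ambiguous between two scenes."""
    return 1.0 if (shape, view) in CONSISTENT else 0.0


def ideal_observer():
    """Return the MAP scene and the full posterior P(S|I)."""
    joint = {
        (s, v): likelihood(s, v) * p_shape[s] * p_view[v]
        for s, v in product(shapes, views)
    }
    evidence = sum(joint.values())  # P(I)
    post = {scene: p / evidence for scene, p in joint.items()}
    best = max(post, key=post.get)
    return best, post


best, post = ideal_observer()
```

The likelihood alone cannot separate the two consistent scenes; the choice of ("convex", "above") comes entirely from the assumed prior.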

Influence Graphs and Visual Tasks

Influence Graphs and the Visual Task

A Simple Taxonomy of Graphs

A Taxonomy of Graphs (figure; the surviving panel labels are B, C and D).

Examples of Vision Tasks

Visual Inference:

(1) Estimating Shape.

(2) Segmenting Images.

(3) Detecting Faces.

(4) Detecting and Reading Text.

(5) Parsing the full image: detect and recognize all objects in the image, understand the viewed scene.

Segmentation (Level Sets)

Analysis by Synthesis

Invert the generation process to parse the image.

Probabilistic Grammars for image generation (week 2).

Probabilistic Grammars for Images

(I) Images are generated by composing visual patterns. (II) Parse an image by decomposing it into patterns.
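The generative half of this idea, point (I), can be sketched with a toy stochastic grammar. Everything in it is hypothetical: the `GRAMMAR` table, its symbols and its rule probabilities are made up for illustration and are not the grammar used in the talk. The sketch expands a "scene" top-down into terminal patterns and tracks the probability of the derivation; parsing, point (II), would search for the derivation that best explains an observed image and is not shown.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, expansion).
GRAMMAR = {
    "scene": [(0.6, ["background", "object"]),
              (0.4, ["background", "object", "object"])],
    "background": [(0.5, ["street"]), (0.5, ["landscape"])],
    "object": [(0.5, ["face"]), (0.3, ["text"]), (0.2, ["car"])],
}


def generate(symbol, rng):
    """Expand `symbol` top-down; return (terminal patterns, derivation probability)."""
    if symbol not in GRAMMAR:
        # Terminal pattern: nothing left to expand.
        return [symbol], 1.0
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    # Pick one production according to its probability.
    prob, expansion = rng.choices(rules, weights=weights, k=1)[0]
    patterns = []
    for child in expansion:
        child_patterns, child_prob = generate(child, rng)
        patterns.extend(child_patterns)
        prob *= child_prob  # derivation probability is the product of rule probabilities
    return patterns, prob


rng = random.Random(0)  # fixed seed so the sample is reproducible
patterns, prob = generate("scene", rng)
```

Each call yields a list of terminal patterns such as a background followed by one or two objects, together with the probability the grammar assigns to that particular derivation.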

Generative Models for Patterns

Examples of images synthesized from generative models (MCMC).

Shape Inference

Face and Text Detection

Text Detection

Towards Full Image Parsing

The image genome project (Zhu).

Attempt to determine the grammar for images by interactive parsing of images.

Thereby learn the statistical regularities of images – the priors and the representations.

Parse graph with horizontal relations

Example: street scene

Database

Figure: inventory of the annotated image database by Nov. 06. PO means a parsed object node in the database.

Database total: 561,726 images, 3,309,257 POs.

Branches named in the figure: scene (indoor, outdoor), generic object (natural, manmade), attribute curve (804 images, 86,665 curves), video (525,850 frames, 2,794,727 POs), text (Chinese, English), face (age, pose, expression), activity (meeting, shopping, sports, dinner, lecture), aerial image (business, parking, airport, residential, industry, intersection, marina, school), low-middle level vision, others. Further labels: surveillance video clips, cartoon movie clips, weak boundary, graphlet.

Object groups named under generic object: land mammal, bird, marine, insect, other animal, plant, furniture, electronic, weapon, vehicle, food, other.

Remaining per-branch counts shown in the figure: 602 images, 18,878 POs; 1,194 images, 13,889 POs; 22,405 images, 129,184 POs; 723 images, 48,907 POs; 10,139 images, 217,007 POs.

Back to the Brain

Top-Level: compare human performance to Ideal Observers.

Explain human perceptual biases (visual illusions) as strategies that are “statistically effective”.

Brain Architecture

The Bayesian models have interesting analogies to the brain.

Generative models and analysis by synthesis.

This is consistent with top-down processing? (Kersten’s talk next week).

Conclusion

Vision is unconscious inference.

The Bayesian approach leads to vision as analysis by synthesis: inverting the image generation process.

This requires “sophisticated” priors about the statistics of natural images.

This can be formulated mathematically in terms of Probabilistic Grammars for image formation.

These grammars can be learnt by analysing the “sophisticated” statistics of natural images.
