what is vision? why is it hard ? alan l. yuille. ucla

What is Vision? Why is it Hard?

Alan L. Yuille.

UCLA.

The Purpose of Vision.

“To Know What is Where by Looking”. Aristotle. (384-322 BC).

Information Processing: receive a signal by light rays and decode its information.

Vision appears deceptively simple, but this is highly misleading.

What can humans see?

Rich description: fox, tree trunk, grass and background twigs.

Also: the shape of the fox’s legs and head, its fur, what it is doing, whether it is old or young, and more.

Local regions (B,C,D) are highly ambiguous.

Figure labels: Airplane, Car, Boat, Sign, Building.

Local regions are ambiguous

Why can Humans see so well?

Humans appear to understand images effortlessly. But this is only because we devote an enormous amount of our brains to this task.

It is estimated that 40-50% of neurons in the cortex are involved in doing vision.

Humans are very visual. We get much more information from our eyes than other animals.

Vision and the Brain

Half the Cortex does Vision

But human vision is not perfect

Human perception is often incorrect. Visual illusions suggest that perception is often a reconstruction, or even a controlled hallucination.

Here are some illusions.

Ames Room

Perspective Cues

Are these lines horizontal?

Which square is brighter?

Visual Illusions

The perception of the brightness of a surface, or of the length of a line, depends on context, not on basic measurements like the number of photons that reach the eye or the length of the line in the image.

Perception is Inference

Helmholtz. 1821-1894. “Perception as Unconscious Inference”.

So why is vision hard?

Vision is an inverse inference problem. There is a highly complex process that generates the image.

This process is studied in Computer Graphics. It models objects, light sources, and how they interact.

Vision must invert this image formation process to estimate the “causal factors” – objects, lighting, and so on.

Vision as Inverse Inference: Inverting image formation.

Inverse Problems are often hard

There are an infinite number of ways that images can be formed.

Which ways are most plausible?
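
To make the ambiguity concrete, here is a minimal sketch under a deliberately simplified forward model. The model (observed intensity = surface reflectance times illumination) and all names and numbers in it are assumptions for illustration, not part of the lecture. It shows that many different causes reproduce exactly the same measurement.

```python
# Toy forward model (an assumption for illustration): the intensity measured
# at one pixel is the product of surface reflectance and illumination.
def render(reflectance, illumination):
    return reflectance * illumination


observed = 0.24  # hypothetical measured intensity

# Try every (reflectance, illumination) pair on a coarse grid in (0, 1] and
# keep the pairs that reproduce the observation up to rounding error.
grid = [k / 100 for k in range(1, 101)]
candidates = [
    (r, l)
    for r in grid
    for l in grid
    if abs(render(r, l) - observed) < 1e-9
]

# A dark surface under bright light and a bright surface under dim light
# are both consistent with the same measurement.
for r, l in candidates:
    print(f"reflectance={r:.2f}  illumination={l:.2f}")
```

Even on this coarse grid more than one explanation survives, and on a continuous domain there are infinitely many. Choosing among them requires extra knowledge about which causes are plausible.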


Vision and Bayes

Bayes’ Theorem gives a procedure to solve inverse inference problems.

It states that we can infer the state S of the world from the observed image I by P(S|I) = P(I|S)P(S)/P(I).

But it does not tell us how to specify these distributions.

Rev. T. Bayes. 1702-1761.
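
As a small worked example of the formula P(S|I) = P(I|S)P(S)/P(I), the sketch below computes a posterior over three candidate world states for one ambiguous image patch. The state names echo the labels used earlier, but the prior and likelihood values are invented for illustration only. In practice, specifying these distributions is the hard part.

```python
def posterior(prior, likelihood):
    """Return P(S|I) for every state S.

    prior[s]      is P(S = s)
    likelihood[s] is P(I | S = s) for the one observed image I
    """
    joint = {s: likelihood[s] * prior[s] for s in prior}   # P(I|S) P(S)
    evidence = sum(joint.values())                         # P(I)
    return {s: p / evidence for s, p in joint.items()}


# Hypothetical numbers: cars are common, airplanes are rare, but the
# observed patch looks somewhat more like an airplane than a car or boat.
prior = {"car": 0.6, "boat": 0.3, "airplane": 0.1}
likelihood = {"car": 0.2, "boat": 0.2, "airplane": 0.5}

post = posterior(prior, likelihood)
best = max(post, key=post.get)

for state, p in post.items():
    print(f"P({state} | I) = {p:.3f}")
print("most probable state:", best)
```

With these assumed numbers the prior outweighs the likelihood: the patch looks most like an airplane, yet "car" is the most probable interpretation. Context can override local evidence in this way.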

Complexity: The Fundamental Problem?

The set of all images is effectively infinite (Kersten 1987).

It is estimated that humans have seen only a tiny fraction of all possible images, even for images as small as 10x10.
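
The size of the image space can be checked with simple arithmetic. The sketch below counts all 10x10 images and compares that with a generous upper bound on the number of images humans could ever have seen. The bit depth, frame rate, lifetime, and population are assumed round figures, not numbers from the lecture.

```python
GRAY_LEVELS = 256      # assumption: 8-bit grayscale pixels
PIXELS = 10 * 10       # a 10x10 image

# Number of distinct 10x10 grayscale images (exact Python integer).
num_images = GRAY_LEVELS ** PIXELS

# Generous upper bound on images ever seen by humans (all values assumed).
FRAMES_PER_SECOND = 30
SECONDS_PER_YEAR = 365 * 24 * 3600
YEARS_PER_LIFETIME = 100
PEOPLE_EVER_LIVED = 10 ** 11

images_seen = (
    FRAMES_PER_SECOND * SECONDS_PER_YEAR * YEARS_PER_LIFETIME * PEOPLE_EVER_LIVED
)

fraction_seen = images_seen / num_images

print("digits in the number of possible images:", len(str(num_images)))
print("digits in the number of images seen:    ", len(str(images_seen)))
print("fraction of image space seen:", fraction_seen)
```

Under these assumptions the number of possible images has hundreds of digits, while the number of images seen has only a few dozen. The fraction seen is therefore vanishingly small.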

The number of objects is large: 30,000?

The number of scene types is large: 1,000?

Hence I and S are extremely complex.

Variability and Ambiguity

Variability: Images of similar objects (bicycle) can be very different (left, center).

Ambiguity: Images of different objects can be identical (right).

Complexity, Variability, Ambiguity

What is the space of all possible images? It is only now becoming possible to explore this question, thanks to technological advances.

Large datasets: 20,000 images (Pascal), 1,000,000 (ImageNet).

But are these datasets big enough for results on them to apply to other images?

Datasets: Promise and Peril

Vision researchers evaluate the performance of algorithms on benchmark datasets.

But results on these datasets may fail to generalize to other images.

Datasets need to be representative of the complexity of the real world.

Tyranny of Datasets. Unrepresentative datasets. Limited visual tasks.

Dataset Examples

Pascal (top). 20,000 images. 20 objects.

ImageNet (bottom) 1,000,000 images. 1,000 objects.

Brief History of Vision

The difficulty of vision became clear in the 1960s and 1970s, when Artificial Intelligence researchers tried to get computer programs to interpret images.

Initially AI researchers thought vision was easy. “Solve vision in a summer”.

Researchers gradually started realizing that vision was extremely hard. Much harder than Chess.


Big Picture Theories of Vision.

Work in vision in the 1980s was largely influenced by big picture theories, e.g., David Marr’s Vision (1982).

The theories were interesting, but practical results were limited.

Progress started being made by breaking up vision into subproblems, and by inventing or borrowing mathematical tools for these problems.

Techniques used in Vision

Multi-Disciplinary set of tools:

Linear and Non-linear filtering.

Variational Methods.

Probabilities on Graphs.

Perspective Geometry.

Differential Geometry.

And many more. We will introduce many of these at this summer school.

Vision as a Bag of Tricks?

Dividing vision up into sub-problems enabled progress to be made.

But it led to a situation where vision was seen as an enormous bag of tricks.

“What papers should I read in computer vision? There are so many, and they are so different.” Chinese graduate student.

Attempts are being made (summer schools, the Wikivision project) to give a core and foundation for vision.

The mathematical richness of vision

The complexity and ambiguity of vision is a challenge.

Vision problems (broadly defined) have attracted the attention of leading mathematicians (e.g., Fields medalists).

David Mumford, Shing-Tung Yau, Maxim Kontsevich, Stephen Smale, Terence Tao.

Relationships to other Disciplines

Computer vision relates closely to:

Image Processing, Machine Learning. Partially covered in this school.

Artificial Intelligence.

Robotics.

Study of Biological Vision Systems: Psychology and Neuroscience.

Imaging Devices

In this school we will be dealing with “natural images” of everyday scenes.

But the methods used can be applied to other imaging modalities.

Medical images: fMRI, EEG.

Astronomical images: …

Ever-increasing number of imaging devices.

Need a science of visual images.

Structure of the School

Vision can be broken down into low-, mid-, and high-level vision (very roughly).

Low-level vision – local image operations which have limited knowledge of the world.

Mid-level vision – non-local operations which know about surfaces and geometry.

High-level vision – operations which know about objects and scene structures.

Low-level vision

Image processing. Filtering, denoising, enhancement.

Edge detection. Image segmentation.

Right: ideal segmentation (followed by labeling).

Mid-Level Vision

Estimation of 3D surfaces: binocular stereo, structure from motion, other depth cues (e.g., perspective, shading, texture, focus).

Fig: Image, Depth (blue to red), Segmentation.
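
As one concrete instance of a depth cue, the sketch below uses the textbook relation for a rectified binocular stereo pair, Z = f * B / d: depth equals focal length times baseline divided by disparity. The slides do not give this formula. The function name, parameter names, and camera numbers are assumptions for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px    horizontal shift of the point between the two images (pixels)
    focal_length_px focal length of the cameras (pixels)
    baseline_m      distance between the two camera centres (metres)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700-pixel focal length, 10 cm baseline.
for d in (5, 20, 70):
    z = depth_from_disparity(d, focal_length_px=700, baseline_m=0.1)
    print(f"disparity {d:3d} px -> depth {z:.2f} m")
```

Nearby points shift a lot between the two views and distant points shift very little, so depth estimates for far-away surfaces are sensitive to small disparity errors.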

Mid-Level Vision: Grouping

Kanizsa. Humans have a strong tendency to group image structures as surfaces.

High-Level Vision

Object detection and Scene Understanding.

Example: detect objects in Pascal.

High-Level Vision: parts

Detect parts of objects:

High-Level Vision: AI.

Rich Explicit Representations enable:

Understanding of objects, scenes, and events.

Reasoning about functions and roles of objects, goals and intentions of agents, predicting the outcomes of events.

SC Zhu, MURI.

Visual Turing tests

Vision algorithms that have similar properties to those of humans:

(i) flexible, adaptive, robust,

(ii) capable of learning from limited data, ability to transfer,

(iii) able to perform multiple tasks,

(iv) closely coupled to reasoning, language, and other cognitive abilities.

Grammars, Compositionality, and Pattern Theory.

How to put everything together?

How to address the complexity problem of vision?

Pattern theory offers suggestions: Mumford and Desolneux 2010.

Describe the class of patterns that can happen in images using grammars.

Grammars and Pattern Theory

Parse images by decomposing them into elementary image patterns.

Inverse inference on a grammar for generating images.

The next few lectures

The next few lectures will concentrate on basic low-level vision tasks:

Filtering: linear and non-linear.

Segmentation. Variational and probabilistic methods.
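
To illustrate the difference between linear and non-linear filtering, here is a minimal one-dimensional sketch comparing a moving-average (linear) filter with a median (non-linear) filter. The function names, the window radius, and the test signal are assumptions chosen for illustration. The lectures may use different filters.

```python
from statistics import median


def mean_filter(signal, radius=1):
    """Linear filter: replace each sample by the average of its window."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def median_filter(signal, radius=1):
    """Non-linear filter: replace each sample by the median of its window."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(median(window))
    return out


# A step edge (0 -> 5) corrupted by a single noise spike at index 3.
signal = [0, 0, 0, 9, 0, 0, 5, 5, 5, 5]

print("input :", signal)
print("mean  :", [round(v, 2) for v in mean_filter(signal)])
print("median:", median_filter(signal))
```

On this signal the averaging filter spreads the spike over its neighbours and blurs the step, while the median filter removes the spike and keeps the step sharp. This is one reason non-linear filters are often preferred for denoising near edges.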

Conclusion

Vision is difficult and challenging. Humans devote half of their cortex to doing it.

It is difficult because it is inverse inference in an extremely complex domain.

It involves a great variety of mathematical and computational techniques.

Low-, Mid-, High-Level taxonomy. Pattern Theory and Grammars.
