what is vision? why is it hard ? alan l. yuille. ucla

42
What is Vision? What is Vision? Why is it Hard Why is it Hard ? ? Alan L. Yuille. UCLA.

Upload: bertram-harrell

Post on 17-Dec-2015

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

What is Vision? What is Vision? Why is it Hard Why is it Hard??

Alan L. Yuille.

UCLA.

Page 2: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

The Purpose of Vision.The Purpose of Vision.

“To Know What is Where by Looking”. Aristotle. (384-322 BC).

Information Processing: receive a signal by light rays and decode its information.

Vision appears deceptively simple, but this is highly misleading.

Page 3: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

What can humans see?What can humans see?

Rich description: fox, tree trunk, grass and background twigs.

Also: shape of fox’s legs and head, its fur, what it is doing? old or young? and more.

Local regions (B,C,D) are highly ambiguous.

Page 4: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

AirplaneCarBoatSign

Building

Local regions are ambiguousLocal regions are ambiguous

Page 5: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Why can Humans see so well?Why can Humans see so well?

Humans appear understand images effortlessly. But this is only because of the enormous amount of our brains that we devote to this task.

It is estimated that 40-50% of neurons in the cortex are involved in doing vision.

Humans are very visual. We get much more information from our eyes than other animals.

Page 6: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Vision and the BrainVision and the Brain

Page 7: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Half the Cortex does VisionHalf the Cortex does Vision

Page 8: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

But human vision is not perfectBut human vision is not perfect

Human perception is often incorrect. Visual illusions suggest that perception is

often a reconstruction, or even a controlled hallucination.

Here are some illusions.

Page 9: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Ames RoomAmes Room

Page 10: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Perspective CuesPerspective Cues

Page 11: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Are these lines horizontal? Are these lines horizontal?

Page 12: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Which square is brighter? Which square is brighter?

Page 13: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Visual IllusionsVisual Illusions

The perception of brightness of a surface, or the length of a line depends on context. Not on basic measurements like:the no. of photons that reach the eyeor the length of line in the image.

Page 14: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Perception is InferencePerception is Inference

Helmholtz. 1821-1894.“Perception as Unconscious Inference”.

Page 15: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

So why is vision hard?So why is vision hard?Vision is an inverse inference problem. There is

a highly complex process that generates the image.

This process is studied in Computer Graphics. It models objects, light sources, and how they interact.

Vision must invert this image formation process to estimate the “causal factors” – objects, lighting, and so on.

Page 16: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Vision as Inverse Inference:Vision as Inverse Inference:Inverting image formation.Inverting image formation.

Page 17: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Inverse Problems are often hardInverse Problems are often hard

There are an infinite number of ways that images can be formed.

Which ways are most plausible?

.

Page 18: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Vision and BayesVision and Bayes

Bayes’ Theorem gives a procedure to solve inverse inference problems.

It states that we can infer the

state S of the world from the

observed image I by, P(S|I) = P(I|S)P(S)/P(I).But doesn’t tell us how toSpecify these distributions.Rev. T. Bayes. 1702-1761

Page 19: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Complexity: Complexity: The Fundamental Problem?The Fundamental Problem?

The set of all images is almost infinite (Kersten 1987).

Estimate that only a tiny fraction of all possible images have been seen by humans (even small images 10x10).

The no. of objects is large – 30,000?The no. of scene types is large – 1,000?Hence I and S are extremely complex.

Page 20: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Variability and AmbiguityVariability and Ambiguity

Variability: Images of similar objects (bicycle) can be very different (left, center).

Ambiguity: Images of different objects can be identical (right).

Page 21: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Complexity, Variability, AmbiguityComplexity, Variability, Ambiguity

What is the space of all possible images? It is only now becoming possible to explore

this questions – (technological advances).Large datasets – 20,000 images (Pascal),

1,000,000 (ImageNet).But are these datasets big enough for results

on them to apply to other images?

Page 22: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Datasets: Promise and PerilDatasets: Promise and Peril

Vision evaluates performance of algorithms on benchmarked datasets.

But results on these datasets may fail to generalize to other images.

Datasets need to be representative of the complexity of the real world.

Tyranny of Datasets. Unrepresentative datasets. Limited visual tasks.

Page 23: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Dataset ExamplesDataset Examples

Pascal (top). 20,000 images. 20 objects.

ImageNet (bottom) 1,000,000 images. 1,000 objects.

Page 24: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Brief History of VisionBrief History of Vision

The difficulty of vision became clear in the 1960’s,70’s when Artificial Intelligence researchers tried to get computer programs to interpret images.

Initially AI researchers thought vision was easy. “Solve vision in a summer”.

Researchers gradually started realizing that vision was extremely hard. Much harder than Chess.

.

Page 25: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Big Picture Theories of Vision.Big Picture Theories of Vision.

Work in the 1980’s in vision was largely influenced by big picture theories.

e.g., David Marr’s Vision (1982).The theories were interesting, but practical

results were limited.Progress started being made by breaking up

vision into subproblems. Inventing or borrowing mathematical tools for these problems.

Page 26: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Techniques used in VisionTechniques used in Vision

Multi-Disciplinary set of tools. Linear and Non-linear filtering.Variational Methods.Probabilities on Graphs.Perspective Geometry.Differential Geometry.And many more. We will introduce many of

these at this summer school.

Page 27: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Vision as a Bag of Tricks?Vision as a Bag of Tricks?Dividing vision up into sub-problems enabled

progress to be made.But lead to a situation where vision was seen as

an enormous bag of tricks.“What papers should I read in computer vision?

There are so many, and they are so different.” Chinese Graduate student.

Attempts are being made – summer schools, Wikivision project – to give a core and foundation for vision,

Page 28: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

The mathematical richness of The mathematical richness of visionvision

The complexity and ambiguity of vision is a challenge.

Vision problems (broadly defined) have attracted the attention of learning mathematicians (e.g., Field’s medalists).

David Mumford, Shing-Tung Yau, Maxim Kontsevich, Stephen Smale, Terrence Tao.

Page 29: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Relationships to other Relationships to other DisciplinesDisciplines

Computer vision relates closely toImage Processing Machine Learning.Partially covered in this school. Artificial Intelligence.Robotics.Study of Biological Vision Systems –

Psychology and Neuroscience.

Page 30: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Imaging DevicesImaging Devices

In this school we will be dealing with “natural images” of everyday scenes.

But the methods used can be applied to other imaging modalities.

Medical images: fMRI, EEC,Astronomial Images:…Ever-increasing number of imaging devices.

Need a science of visual images.

Page 31: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Structure of the SchoolStructure of the School

Vision can be broken down into low-, mid-, and high-level vision (very roughly).

Low-level vision – local image operations which have limited knowledge of the world.

Mid-level vision – non-local operations which know about surfaces and geometry.

High-level vision – operations which know about objects and scene structures.

Page 32: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Low-level visionLow-level vision

Image processing. Filtering, denoising, enhancement.Edge detection. Image segmentation.Right: ideal segmentation(followed by labeling).

Page 33: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Mid-Level VisionMid-Level Vision

Estimation of 3D surfaces:Binocular stereo,structure from motion,other depth cues (e.g., perspective, shading,texture, focus).Fig: Image, Depth (blue to red),Segmentation.

Page 34: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Mid-Level Vision: GroupingMid-Level Vision: Grouping

Kanisza. Humans have a strong tendency to group image structures as surfaces.

Page 35: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

High-Level VisionHigh-Level Vision

Object detection and Scene Understanding.Example: detect objects in Pascal.

Page 36: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

High-Level Vision: partsHigh-Level Vision: parts

Detect parts of objects:

Page 37: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

High-Level Vision – AI.High-Level Vision – AI. Rich Explicit Representations enable: Understanding of objects, scenes, and events. Reasoning about functions and roles of objects, goals

and intentions of agents, predicting the outcomes of events. SC Zhu – MURI.

Page 38: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Visual Turing testsVisual Turing tests

Vision algorithms that have similar properties to those of humans:

(i) flexible, adaptive, robust(ii) capable of learning from limited data,

ability to transfer, (iii) able to perform multiple tasks,(iv) closely coupled to reasoning, language,

and other cognitive abilities.

Page 39: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Grammars, Compositionality, Grammars, Compositionality, and Pattern Theory.and Pattern Theory.

How to put everything together?How to address the complexity problem of

vision.Pattern theory offers suggestions:Mumford and Desolneux 2010.Describe the class of patterns that can

happen in images using grammars.

Page 40: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

Grammars and Pattern TheoryGrammars and Pattern TheoryParse images by decomposing them into

elementary image patterns.Inverse inference on a grammar for

generating images.

Page 41: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

The next few lecturesThe next few lectures

The next few lectures will concentrate on basic low-level vision tasks:

Filtering – linear and non-linear.Segmentation. Variational and probabilistic

methods.

Page 42: What is Vision? Why is it Hard ? Alan L. Yuille. UCLA

ConclusionConclusion

Vision is difficult and challenging. Humans devote half of their cortex to doing it.

It is difficult because it is inverse inference in an extremely complex domain.

It involves a great variety of mathematical and computational techniques.

Low-, Mid-, High-Level taxonomy. Pattern Theory and Grammars.