image processing, image recognition, computer vision...

Image Processing, Image Recognition,

Computer Vision, Image Understanding

Takashi Matsuyama

[email protected]

Dept. of Intelligence Science and Technology

Graduate School of Informatics, Kyoto University

Sound Processing, Speech Recognition,

Auditory Scene Understanding

Mathematical Theory of Pattern Recognition

The introduction will be given in two weeks.
Slides will be available at http://vision.kuee.kyoto-u.ac.jp/lecture/dsp
Questions should be sent to [email protected]
All reports should be sent to [email protected] by May 2nd (Fri.).





What is recognition?

Longman Dictionary: 1. the act of realizing and accepting that something is true or important 2. public respect and thanks for someone's work or achievements 3. the act of knowing someone or something because you have known or learned about them in the past 4. the act of officially accepting that an organization, government, person etc has legal or official authority

awareness perception cognition

understanding

Intelligent mental function

Outer

World Environments

Other systems

Intelligent System

Reasoning, Learning

Knowledge

Perception

(Sensory System)

Recognition Sensation

Action, Manipulation

(Motor System)

Architecture of Intelligent Systems

Thought

Interaction

Report 1

(a) Describe the meanings of and differences among 1. Sensation, 2. Perception, and 3. Recognition.

(b) Describe the differences between 1. Cognition and 2. Recognition.

Information at the (physical) signal level

VS

Information at the (mental) cognitive level

Discrimination between these levels is important!


Physical Quantity vs Psychophysical Quantity

We see what we want to see and

hear what we want to hear.

Information at the image (signal) level

a pair of intersecting line segments in an image

an imaged part

(b) Electric circuit world

Information at the cognitive level

(a) block world

an imaged part

3D world Semantic world

θ

Shape from Shading

Outer

World Environments

Other systems

Intelligent System

Reasoning, Learning

Knowledge

Perception

(Sensory System)

Recognition Sensation

Action, Manipulation

(Motor System)

Architecture of Intelligent Systems

Mental World (Informatics) Physical World

(Physics)

How to bridge two worlds

Thought Thought

Architecture of 21st Century

Cyber Society (mental world)

Physical World

Cyber Network Society

Physical Real World

Social Structure in the 21st Century

？

Physical Laws (obey)

Rules, Standards (comply)

Physical Model

Computation Model

Cyber-Physical Systems


Cyber-Physical Systems for Developing Smart Society

1. e-money in economy
2. e-Tag in transportation (ubiquitous systems)
3. Digitizing Human Activities
4. Smart Energy Management


Physical Real World

(1) e-Money in economy

Authentication Security Pricing

Credit Warranty


Physical Real World

(2) e-Tag in transportation (ubiquitous systems)

ID  type   age  origin  grade
11  onion  1    Kyoto   A
12  beef   3    USA     B

E-tag

E-tag

ID  role    name  opinion
8   leader  Jim   Yes
10  chair   John  ?

Location Information GIS

(3) Digitizing Human Activities

Real-time Integration

Sensing and Recognition Presentation and Control


Human

Sensor Networks Embedded in the Real World

Motion and Blood Pressure Sensor

Taken from Panasonic Homepage

Real-Time Sensing &

Control

Power, Frequency, Phase Sensing

Power, Frequency, Phase Control


(4) Integrating Information and Electricity Networks

Physical Real World (Electricity Power Network)

Solar Cell

Fuel Cell

Solar Cell

Solar Cell

Solar Cell

Solar Cell

battery

EV

EV

battery battery

battery

・distributed ・personalization ・bi-directional

Fundamental Concepts

Symbol (in the cyber network society): a segmented entity with a unique ID

(basic processing: entity identification)

Signal (in the physical real world): non-segmented numerical data with physical measures

(basic processing: segmentation, similarity evaluation)

Pattern Recognition: transforming signal data into symbols (informationization)

Informationization: bridging the cyber and physical worlds

Digitization ≠ Informationization


Physical Real World

ID Authentication

Object Recognition

Real world objects human

car dog cat

real estate

Data Structure numeral character

figure graph tree

Relation Interaction

Computation

Modeling Prediction

Mechanism of Informationization

bit sequence：0100, 1110

Representing Information in a Computer

Internal State of a computer (Electronic Circuit)

World of Information (Cyber Society)

Algorithm, Coding

numeral, character, sound, image tree, graph, knowledge, concept

computation reasoning

bit operation

Report 2

In each application of the cyber-physical systems, explain how pattern recognition technologies can be used to realize the informationization. 1. e-money in economy 2. e-Tag in transportation (ubiquitous systems) 3. Digitizing Human Activities 4. Smart Energy Management

Pattern Recognition

Pattern Recognition in Informatics

1. What are patterns?
   ① Classes/categories/types of objects (class: a set of objects)
   ② Internal structures of objects (examples: design patterns, fabric patterns, sound and image patterns, behavior patterns)

2. What is recognition?
   ① Decision about the membership of a set (class/category/type classification): X (observed data) ∈ C (class)?
   ② Identification of an object (similarity, identity): X (observed data) = M (object model)?

Types of Pattern Recognition Methods

Types of

information

Method of

recognition

Classification

(Categorization)

Matching

(Identification)

Attributes Relations

Statistical

Pattern Classification

Syntactic


Pattern Matching Computer Vision

Image Understanding

Data Representation in Statistical Pattern Classification

All data are represented by Feature Vectors

$X = (X_1, X_2, \ldots, X_n)^t$, e.g. $X = (\text{height}, \text{weight}, \ldots, \text{age})^t$

Vector Representation of Video Data

Video Data i-th frame 1D signal

1/30 second

Scan line

Raster scan

Frame 1 Frame 2

row1 row2…rowN row1 row2

t 5 10 11 9 6 3 3 3 5 13 15 11
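The raster-scan idea on this slide can be sketched in a few lines: each frame is read row by row, and the frames are concatenated in time order to give one long 1D signal. The 2x3 frame size and the helper names `raster_scan` and `video_to_vector` are assumptions made for illustration; the pixel values simply reuse the numbers shown on the slide.

```python
def raster_scan(frame):
    """Flatten one frame (a list of rows) into a 1D list, row by row."""
    return [pixel for row in frame for pixel in row]


def video_to_vector(frames):
    """Concatenate the raster scans of consecutive frames into one 1D signal."""
    signal = []
    for frame in frames:
        signal.extend(raster_scan(frame))
    return signal


# Two hypothetical 2x3 frames (frame size is an assumption for this sketch).
frame1 = [[5, 10, 11],
          [9, 6, 3]]
frame2 = [[3, 3, 5],
          [13, 15, 11]]

video_vector = video_to_vector([frame1, frame2])
print(video_vector)
```

A real video frame would give a vector with one component per pixel, which is why the feature space of raw video data is extremely high-dimensional.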

Class1

Class3 x1

x2

Class2

decision boundaries

Feature Vectors

X ＝

X1 X2 ・・・ Xn


Basic Scheme of Statistical Pattern Classification

Processes of Image Recognition

How Pattern Classification is used in practical applications.

Image Processing, Image Recognition,

Computer Vision, Image Understanding

•Image Processing ： image → image

•Pattern Classification ： feature vector → class name

•Computer Vision, Image Understanding ： image → scene description

•Image Processing ： signal processing + geometric processing

•Image Recognition ： image processing + pattern recognition

•Computer Vision ： image processing + camera/3D model

•Image Understanding ： image processing + knowledge/reasoning

Input / Output Data

Computational Methods

Image Processing: contrast enhancement

Image input preprocessing

Output Image

Input Image

Weighting Matrix

Sum of Products

Spatial Filtering (2D Convolution)

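2D Convolution:
$$S(\alpha, \beta) = \iint_S f(x, y)\, t(\alpha - x, \beta - y)\, dx\, dy$$

Sharpening Filter:
$$t = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{pmatrix}$$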
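As a rough illustration of spatial filtering as a sum of products, the sketch below applies the 3x3 sharpening weights to a small image with a vertical step edge. It is a discrete stand-in for the integral, and the zero padding at the image border, the function name `convolve2d`, and the test image are assumptions of this sketch.

```python
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]


def convolve2d(image, kernel):
    """Discrete 2D convolution S(a, b) = sum_x sum_y f(x, y) * t(a - x, b - y).

    The kernel is indexed from its centre; pixels outside the image count as 0.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for b in range(h):
        for a in range(w):
            total = 0
            for j in range(kh):
                for i in range(kw):
                    # kernel offset (di, dj) corresponds to (a - x, b - y)
                    dj, di = j - cy, i - cx
                    y, x = b - dj, a - di
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x] * kernel[j][i]
            out[b][a] = total
    return out


# A vertical step edge: dark (10) on the left, bright (20) on the right.
image = [[10, 10, 20, 20] for _ in range(3)]
sharpened = convolve2d(image, SHARPEN)
print(sharpened[1])
```

In the middle row the values next to the edge are pushed apart (the dark side gets darker, the bright side brighter), which is the contrast-enhancing effect of the filter. The first and last pixels of the row are affected by the zero padding and should be ignored.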

Image Processing: silhouette extraction

Image input preprocessing Image feature extraction (segmentation)

Geometric Processing: extraction of small defects

Input binary image expansion erosion

expansion erosion output XOR
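One plausible reading of the expansion, erosion, XOR sequence is sketched below: expanding and then eroding the binary image fills small holes, and the XOR with the input leaves only those small defects. The 3x3 neighbourhood, the treatment of the image border, and the single expansion/erosion pass are assumptions of this sketch; the slide may use a different number of passes.

```python
def neighbourhood(img, y, x):
    """Values in the 3x3 neighbourhood of (y, x) that lie inside the image."""
    h, w = len(img), len(img[0])
    return [img[y + dy][x + dx]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if 0 <= y + dy < h and 0 <= x + dx < w]


def dilate(img):
    """Expansion: a pixel becomes 1 if any pixel in its neighbourhood is 1."""
    return [[int(any(neighbourhood(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img):
    """Erosion: a pixel stays 1 only if its whole neighbourhood is 1."""
    return [[int(all(neighbourhood(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def xor(a, b):
    """Pixel-wise exclusive OR of two binary images."""
    return [[p ^ q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# A 5x5 object with a one-pixel hole (the "small defect") in the middle.
image = [[1] * 5 for _ in range(5)]
image[2][2] = 0

closed = erode(dilate(image))   # expansion followed by erosion fills the hole
defects = xor(image, closed)    # only the filled pixels remain
for row in defects:
    print(row)
```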

Image input preprocessing Image feature extraction (segmentation)

Feature Measurement

Concavity (area size, number)

×

Shape projection

Bounding box (area size, location)

Principal axis (moment, direction)

Convex hull (area size)

Chord (length)

Area size

Boundary length

Hole (area size, number)

Feature measurement Image input preprocessing Image feature extraction (segmentation)

Feature vector

X ＝

X1 X2 ・・・・・・・ Xn

Color features

Shape features

Texture features

Region / line in an image

Image input

preprocessing Image feature

extraction (segmentation)

Feature measurement

Image Processing

recognition


Class1

Class3 x1

x2

Class2

decision boundaries

Feature Vectors

X ＝

X1 X2 ・・・ Xn


Basic Scheme of Statistical Pattern Classification


Types of

information

Method of

recognition

Classification

(Categorization)

Matching

(Identification)


Statistical


Syntactic



Image Understanding

Recognition by Matching

【Signal Matching】
① Template Matching
② Elastic Matching
③ Model Matching: $\arg\min_{\theta} \left( \mathrm{model}(t \mid \theta) - \mathrm{signal}(t) \right)^2$

【Symbol Matching】
① Word Matching
② DNA Analysis
③ String Pattern Matching: "at" matches with "hat", "cat", "bat", ...
④ Unification: unify(f(x), f(g(a))) gives x = g(a)

Correlation

Correlation Function between $f(t)$ and $g(t)$:
$$y(t) = \int_{-\infty}^{\infty} f(s)\, g(s - t)\, ds$$

Fourier Transform:
$$Y(\omega) = F(\omega)\, G(\omega)^{*}$$
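The relation between the correlation function and its Fourier transform can be checked numerically. The sketch below uses a discrete, circular version of the correlation and a naive DFT written with the standard library; the sampled signals and the function names are assumptions made for illustration.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[s] * cmath.exp(-2j * cmath.pi * k * s / n) for s in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def correlate_direct(f, g):
    """Circular correlation y[t] = sum_s f[s] * g[(s - t) mod n]."""
    n = len(f)
    return [sum(f[s] * g[(s - t) % n] for s in range(n)) for t in range(n)]


def correlate_fourier(f, g):
    """Same correlation computed as the inverse transform of F(w) * conj(G(w))."""
    F, G = dft(f), dft(g)
    Y = [Fk * Gk.conjugate() for Fk, Gk in zip(F, G)]
    return [y.real for y in idft(Y)]


# f is the pattern g shifted to the right by 2 samples.
g = [1, 2, 1, 0, 0, 0, 0, 0]
f = [0, 0, 1, 2, 1, 0, 0, 0]

direct = correlate_direct(f, g)
fourier = correlate_fourier(f, g)
best_shift = max(range(len(direct)), key=direct.__getitem__)
print(direct, best_shift)
```

Both computations agree up to rounding, and the correlation peaks at the shift that aligns `g` with `f`.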

　　

Correlation Function between Signals

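Sum of Squared Differences:
$$\int_{-\infty}^{\infty} (f(t) - g(t))^2\, dt = \int_{-\infty}^{\infty} \left( f(t)^2 - 2 f(t) g(t) + g(t)^2 \right) dt = \int_{-\infty}^{\infty} f(t)^2\, dt + \int_{-\infty}^{\infty} g(t)^2\, dt - 2 \int_{-\infty}^{\infty} f(t)\, g(t)\, dt$$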

Correlation Function:
$$r(t) = \int_{-\infty}^{\infty} g(s)\, m(s - t)\, ds$$

$g$: input signal to be processed

$m$: target signal to be matched

Normalized Correlation Function

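Correlation Function (Similarity):
$$y(t) = \int_{-\infty}^{\infty} f(s)\, g(s - t)\, ds$$

Normalized Correlation Function:
$$y^{*}(t) = \frac{\int_{-\infty}^{\infty} (f(s) - \bar{f})\,(g(s - t) - \bar{g})\, ds}{\sqrt{\int_{-\infty}^{\infty} (f(s) - \bar{f})^2\, ds}\ \sqrt{\int_{-\infty}^{\infty} (g(s - t) - \bar{g})^2\, ds}}$$

where $\bar{f}$ and $\bar{g}$ denote the mean values of $f$ and $g$, and $\|f(t)\|^2 = \int_{-\infty}^{\infty} |f(t)|^2\, dt$, $\|g(t)\|^2 = \int_{-\infty}^{\infty} |g(t)|^2\, dt$.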

Invariant against biasing and scaling

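Difference between two images (dissimilarity):
$$D(\alpha, \beta) = \iint_S \left( f(x, y) - t(x - \alpha, y - \beta) \right)^2 dx\, dy$$
$$= \iint_S f(x, y)^2\, dx\, dy + \iint_S t(x - \alpha, y - \beta)^2\, dx\, dy - 2 \iint_S f(x, y)\, t(x - \alpha, y - \beta)\, dx\, dy$$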

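Correlation Function (Similarity):
$$S(\alpha, \beta) = \iint_S f(x, y)\, t(x - \alpha, y - \beta)\, dx\, dy$$

Normalized Correlation Function:
$$S^{*}(\alpha, \beta) = \frac{\iint_S (f(x, y) - \bar{f})\,(t(x - \alpha, y - \beta) - \bar{t})\, dx\, dy}{\sqrt{\iint_S (f(x, y) - \bar{f})^2\, dx\, dy}\ \sqrt{\iint_S (t(x - \alpha, y - \beta) - \bar{t})^2\, dx\, dy}}$$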

Image Processing by Correlation

Template Matching

$f(x, y)$: input image, $t(x, y)$: template image
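Template matching with the normalized correlation can be sketched as an exhaustive search over all template positions. The function names `ncc` and `match_template`, the tiny test image, and the choice to return the first best position are assumptions of this sketch; the means are computed over the area covered by the template.

```python
import math


def ncc(patch, template):
    """Normalized correlation between two equally sized 2D arrays."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    p_mean, t_mean = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - p_mean) * (b - t_mean) for a, b in zip(p, t))
    den = math.sqrt(sum((a - p_mean) ** 2 for a in p)
                    * sum((b - t_mean) ** 2 for b in t))
    return num / den if den > 0 else 0.0


def match_template(image, template):
    """Return ((alpha, beta), score) of the best matching template position."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0
    for beta in range(h - th + 1):
        for alpha in range(w - tw + 1):
            patch = [row[alpha:alpha + tw] for row in image[beta:beta + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (alpha, beta), score
    return best_pos, best_score


template = [[1, 2],
            [3, 4]]
# The image contains 2 * template + 10 at position (alpha, beta) = (2, 1).
image = [[10, 10, 10, 10, 10],
         [10, 10, 12, 14, 10],
         [10, 10, 16, 18, 10],
         [10, 10, 10, 10, 10]]

position, score = match_template(image, template)
print(position, score)
```

The embedded pattern differs from the template by a gain of 2 and an offset of 10, yet it still scores 1, which illustrates the invariance against biasing and scaling.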

Image Light Source P Light Source

Camera

3D scene

P

P’

P’’

Stereo Image Analysis

Finding the best matching point

• Resultant displacement is in units of pixels.

$f(x, y)$, $t(x, y)$

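Depth Measurement by Triangulation
The 3D depth of a scene point can be computed from a pair of matching image points in the left and right images.

Baseline: b; viewing angles at camera 1 and camera 2: θ1, θ2; distances from the cameras to the scene point: l1, l2; depth to be computed: d

(Figure labels: image plane of camera 1, image plane of camera 2)

l1 cos θ1 + l2 cos θ2 = b
l1 sin θ1 = l2 sin θ2 = d

Eliminating l1 and l2:

d = b / (1/tan θ1 + 1/tan θ2) = b / (cot θ1 + cot θ2)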
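The triangulation formula follows from substituting l1 = d / sin θ1 and l2 = d / sin θ2 into the baseline equation, which gives d (cot θ1 + cot θ2) = b. A minimal numerical check is sketched below; the function name and the test geometry are assumptions made for illustration, and angles are in radians.

```python
import math


def depth_from_angles(baseline, theta1, theta2):
    """Depth d = b / (cot(theta1) + cot(theta2)); angles in radians."""
    return baseline / (1.0 / math.tan(theta1) + 1.0 / math.tan(theta2))


# Symmetric case: both cameras see the point at 45 degrees, baseline 2.
d_symmetric = depth_from_angles(2.0, math.radians(45), math.radians(45))

# General case: cameras at x = 0 and x = 2, scene point at (0.5, 2.0).
theta1 = math.atan2(2.0, 0.5)        # angle at camera 1
theta2 = math.atan2(2.0, 2.0 - 0.5)  # angle at camera 2
d_general = depth_from_angles(2.0, theta1, theta2)

print(d_symmetric, d_general)
```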

Motion Analysis by Correlation

Observed 2D motion images T=2 T=F T=1

template image

Best matching position

Motion vector: (i0-x0, j0-y0)

Elastic (DP) Matching

Model

Signal

P1 P2 P3 P4 ・・・・・・・・・・・・・・・・・・・・・・・・ Pn

Qm ・・・・・・・・・ Q4 Q3 Q2 Q1

Model

Signal

Q1 matches with P1 and P2

Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

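$P(i, j)$: the minimum-cost path to $\langle P_i, Q_j \rangle$; $E[P(i, j)]$: its cost

$$E[P(i, j)] = \min \begin{cases} E[P(i-1, j)] + c(\langle P_{i-1}, Q_j \rangle, \langle P_i, Q_j \rangle) \\ E[P(i-1, j-1)] + c(\langle P_{i-1}, Q_{j-1} \rangle, \langle P_i, Q_j \rangle) \\ E[P(i, j-1)] + c(\langle P_i, Q_{j-1} \rangle, \langle P_i, Q_j \rangle) \end{cases}$$

The optimal paths to the points $\langle P_{i-1}, Q_j \rangle$, $\langle P_{i-1}, Q_{j-1} \rangle$ and $\langle P_i, Q_{j-1} \rangle$ have already been computed.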
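The recurrence can be turned into a short dynamic programming routine. In the sketch below the transition cost c is simplified to the local mismatch |P_i - Q_j| of the pair being entered; this cost, the function name `dp_match`, and the numeric sequences are assumptions of the sketch, not the cost used in the lecture.

```python
def dp_match(model, signal):
    """Minimum total cost of an elastic alignment between two sequences.

    E[i][j] is the minimum cost of a path that ends at the pair
    (model[i], signal[j]); it is built from the three predecessors
    (i-1, j), (i-1, j-1) and (i, j-1), whose optimal costs are already known.
    """
    n, m = len(model), len(signal)
    INF = float("inf")
    E = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local = abs(model[i] - signal[j])  # assumed cost of entering (i, j)
            if i == 0 and j == 0:
                E[i][j] = local
                continue
            best_prev = min(
                E[i - 1][j] if i > 0 else INF,
                E[i - 1][j - 1] if i > 0 and j > 0 else INF,
                E[i][j - 1] if j > 0 else INF,
            )
            E[i][j] = best_prev + local
    return E[n - 1][m - 1]


# The first model element is stretched over two signal elements at no cost.
print(dp_match([1, 2, 3], [1, 1, 2, 3]))
# A mismatch in the last element costs 1.
print(dp_match([1, 2, 3], [1, 2, 4]))
```

Because each cell only needs its three predecessors, the table is filled in O(nm) time instead of enumerating all possible alignments, which is exactly what the Principle of Optimality permits.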

Report 3

Convolution and Correlation look very similar. Discuss their relation.

Convolution:
$$C(\alpha, \beta) = \iint_S f(x, y)\, t(\alpha - x, \beta - y)\, dx\, dy$$

Correlation:
$$S(\alpha, \beta) = \iint_S f(x, y)\, t(x - \alpha, y - \beta)\, dx\, dy$$

Computational Scheme of Statistical Pattern Classification


Types of

information

Method of

recognition

Classification

(Categorization)

Matching

(Identification)


Statistical


Syntactic



Image Understanding

Natural Pattern → Feature Measurement → Feature Selection (Extraction) → Classification

Learning from Sample Patterns (Training Samples)

Picture Data → Feature Set 1: $x = (x_1, x_2, \ldots, x_m)^t$ → Feature Set 2: $y = (y_1, y_2, \ldots, y_n)^t$ → Class Name $\alpha$

Feature Vector: $x = (x_1, x_2, \ldots, x_n)^t$ = (measurement 1, measurement 2, ..., measurement n)

Architecture of Statistical Pattern Classification Systems

FEATURE VECTOR $x = (x_1, x_2, \ldots, x_d)$ → DISCRIMINANT FUNCTIONS $g_1(x), g_2(x), \ldots, g_c(x)$ → MAXIMUM SELECTOR (MAX) → DECISION $\alpha$

Architecture of Pattern Classifiers

[1] Nearest Neighbor Classification

Class1

Class3 x1

x2

Class2

decision boundaries

Feature Vectors

X ＝

X1 X2 ・・・ Xn


Basic Scheme of Nearest Neighbor Classification

[Q1]What distance measures?

Measuring Unit Problem

height(cm)

weight (Kg)

Unit Change

height(cm)

weight (g)

Non-isotropic distance measure based on the shape of data distribution

Distance between vectors $X = (x_1, x_2, \ldots, x_n)^t$ and $Y = (y_1, y_2, \ldots, y_n)^t$

1. Euclidean Distance: $\left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}$

2. $L_1$ Distance: $\sum_{i=1}^{n} |x_i - y_i|$

3. Similarity: $\cos\theta = \dfrac{X \cdot Y}{|X| \cdot |Y|}$

4. Mahalanobis Distance: $\left[ (X - M)^t\, \Sigma^{-1} (X - M) \right]^{1/2}$

($M$: Mean Vector, $\Sigma$: Covariance Matrix)

Distance Measures between a pair of feature vectors
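The four measures, and the measuring unit problem, can be tried out with a small sketch. The Mahalanobis distance is written here for two-dimensional vectors only, so that the inverse covariance matrix can be spelled out by hand; that restriction, the function names, and the sample numbers are assumptions made for illustration.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def mahalanobis_2d(x, mean, cov):
    """[(x - m)^t cov^{-1} (x - m)]^{1/2} for 2-dimensional vectors."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


# Measuring unit problem: the second feature is given in kg, then in g.
d_kg = mahalanobis_2d([2, 3], [0, 0], [[4, 0], [0, 1]])
d_g = mahalanobis_2d([2, 3000], [0, 0], [[4, 0], [0, 1e6]])
e_kg = euclidean([2, 3], [0, 0])
e_g = euclidean([2, 3000], [0, 0])
print(d_kg, d_g, e_kg, e_g)
```

The Euclidean distance changes drastically when the unit changes, while the Mahalanobis distance stays the same because the covariance matrix rescales together with the data.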

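$x = (x_1, x_2, \ldots, x_p)^t$

Mean: $\mu = (\mu_1, \mu_2, \ldots, \mu_p)^t$

Covariance Matrix: $\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$

Parameter Estimation from sample set $x_1, x_2, \ldots, x_N$:

$$\mu_i = \frac{1}{N} \sum_{l=1}^{N} x_{il}, \quad i = 1, 2, \ldots, p$$

$$\sigma_{jk} = \frac{1}{N - 1} \sum_{l=1}^{N} (x_{jl} - \mu_j)(x_{kl} - \mu_k), \quad j = 1, 2, \ldots, p, \quad k = 1, 2, \ldots, p$$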

Mean and Covariance matrix of data distribution
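The two estimators translate directly into code. The sketch below stores each sample as a list of p feature values; the function names and the three sample points are assumptions made for illustration.

```python
def mean_vector(samples):
    """mu_i = (1/N) * sum over samples of the i-th component."""
    n, p = len(samples), len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(p)]


def covariance_matrix(samples):
    """sigma_jk = 1/(N-1) * sum over samples of (x_j - mu_j) * (x_k - mu_k)."""
    n, p = len(samples), len(samples[0])
    mu = mean_vector(samples)
    return [[sum((x[j] - mu[j]) * (x[k] - mu[k]) for x in samples) / (n - 1)
             for k in range(p)]
            for j in range(p)]


samples = [[1, 2], [3, 6], [5, 10]]
mu = mean_vector(samples)
sigma = covariance_matrix(samples)
print(mu, sigma)
```

In this example the second feature is exactly twice the first, so the off-diagonal covariance is large and the matrix is singular; such perfectly correlated features carry no additional information.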

(Figure: (a) bivariate normal density; (b) scatter diagram, with axes $x_1$, $x_2$ and mean $(\mu_1, \mu_2)$)

Two representations of a normal density.

n-dimensional Normal Distribution

[Q2]Distance between which entities?

Distance to distribution centers

Class1

Class3 x1

x2

Class2

Feature vector

X ＝

X1 X2 ・・・ Xn

Decision boundary

Distance to sample data

• Decision rule:
  – Find the k nearest neighbor sample data.
  – Find the most popular class by voting among the k nearest neighbor sample data.

(Figure: input feature vector X and sample data of classes $\omega_1$ and $\omega_2$ in the n-dimensional feature space with axes $x_1, x_2, \ldots, x_j, \ldots, x_n$)

Distance by voting: k-nearest neighbor classification
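The voting rule can be sketched in a few lines. The Euclidean distance, the label strings, and the sample points (including one outlier of class w2 placed inside the w1 cluster) are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(x, samples, k=3):
    """Classify x by majority vote among its k nearest sample data.

    samples is a list of (feature_vector, class_label) pairs.
    """
    neighbours = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


samples = [
    ((0.0, 0.0), "w1"), ((0.0, 1.0), "w1"), ((1.0, 0.0), "w1"),
    ((5.0, 5.0), "w2"), ((5.0, 6.0), "w2"), ((6.0, 5.0), "w2"),
    ((0.4, 0.4), "w2"),  # an outlier of class w2 inside the w1 cluster
]

print(knn_classify((0.5, 0.5), samples, k=1))
print(knn_classify((0.5, 0.5), samples, k=3))
print(knn_classify((5.5, 5.5), samples, k=3))
```

With k = 1 the single outlier decides the class, whereas with k = 3 it is outvoted by the surrounding w1 samples, which shows how voting makes the decision less sensitive to isolated samples.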

Report 4

Compare the performance between 1. the nearest neighbor classification with the Mahalanobis distance and 2. the k-nearest neighbor classification in the following case.

[2]Statistical Pattern Classification

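$\Omega = \{\omega_1, \ldots, \omega_s\}$: the finite set of $s$ states of nature, i.e. object classes

$A = \{\alpha_1, \ldots, \alpha_a\}$: the finite set of $a$ possible actions, i.e. diagnoses

$\lambda(\alpha_i \mid \omega_j)$: the loss incurred for taking action $\alpha_i$ when the state of nature is $\omega_j$

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}, \quad \text{where } p(x) = \sum_{j=1}^{s} p(x \mid \omega_j)\, P(\omega_j)$$

Conditional risk:
$$R(\alpha_i \mid x) = \sum_{j=1}^{s} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)$$

Two-category case:
$$R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)$$
$$R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)$$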

Statistical Pattern Classification (BAYES DECISION THEORY)

A priori probability

a posteriori probability

Bayesian Decision Rule
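The Bayesian decision rule selects the action with the smallest conditional risk. A minimal sketch for the two-category case is given below; the likelihood values, the priors, and the two loss matrices are made-up numbers, and the function names are assumptions of this sketch.

```python
def posteriors(likelihoods, priors):
    """P(w_j | x) = p(x | w_j) P(w_j) / p(x), with p(x) the sum of the numerators."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def conditional_risks(loss, post):
    """R(a_i | x) = sum_j loss[i][j] * P(w_j | x)."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, post)) for row in loss]


def bayes_decision(loss, likelihoods, priors):
    """Index of the action with minimum conditional risk."""
    risks = conditional_risks(loss, posteriors(likelihoods, priors))
    return min(range(len(risks)), key=risks.__getitem__)


likelihoods = [0.2, 0.6]   # hypothetical p(x | w1), p(x | w2) for one observation x
priors = [0.5, 0.5]        # hypothetical P(w1), P(w2)

zero_one_loss = [[0, 1],
                 [1, 0]]
# Choosing action 2 when the true class is w1 is ten times as costly.
asymmetric_loss = [[0, 1],
                   [10, 0]]

post = posteriors(likelihoods, priors)
print(post)
print(bayes_decision(zero_one_loss, likelihoods, priors))
print(bayes_decision(asymmetric_loss, likelihoods, priors))
```

With the zero-one loss the rule reduces to picking the class with the largest a posteriori probability; with the asymmetric loss the decision flips even though the probabilities are unchanged.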

Probability distribution

【１】 Assume Normal Distribution:
$$p(x \mid \omega_1) = N(\mu_1, \Sigma_1), \quad p(x \mid \omega_2) = N(\mu_2, \Sigma_2)$$
Estimate $\mu_1, \mu_2, \Sigma_1, \Sigma_2$ from sample data.

【２】 Assume Mixture of Normal Distributions:
$$p(x \mid \omega_1) = \sum_{i=1}^{N_1} a_i\, N(\mu_i, \Sigma_i), \quad p(x \mid \omega_2) = \sum_{j=1}^{N_2} b_j\, N(\mu_j, \Sigma_j)$$
Estimate $N_1, N_2$ as well as $a_i, b_j, \mu, \Sigma$ from sample data.
It is difficult to estimate $N_1$ and $N_2$.

Estimation of Probability Distribution Function

Report 5

Discuss the differences between Probability Theory and Statistics.

Using histograms computed from sample data as probability distribution functions is not effective due to over-fitting (over-training). Explain why.

[3]Linear Discriminant Function

FEATURE VECTOR $x = (x_1, x_2, \ldots, x_d)$ → DISCRIMINANT FUNCTIONS $g_1(x), g_2(x), \ldots, g_c(x)$ → MAXIMUM SELECTOR (MAX) → DECISION $\alpha$

Architecture of Pattern Classifiers

Linear Discriminant Functions: $g_1(x) = w_1^t x + w_{10}$, $g_2(x) = w_2^t x + w_{20}$

Decision Boundary: $g_1(x) = g_2(x) \Rightarrow g(x) = (w_1 - w_2)^t x + (w_{10} - w_{20}) = 0$

Let us denote it as $g(x) = w^t x + w_0$, and make the decision based on the sign of $g(x)$.

Class1

Class2

x

g＞0

g＜0 g = 0

Two Class Linear Discriminant Function

Geometric Representation

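Let $x_1$ and $x_2$ denote points on the decision boundary. Then,
$$w^t x_1 + w_0 = w^t x_2 + w_0 \Rightarrow w^t (x_1 - x_2) = 0$$
This means that $w$ is orthogonal to the hyperplane defined by $g(x) = 0$.

Let $x_p$ denote the orthogonal projection of $x$ onto $g(x) = 0$. Then we can represent $x$ as follows:
$$x = x_p + r \frac{w}{\|w\|},$$
from which we have
$$g(x) = w^t \left( x_p + r \frac{w}{\|w\|} \right) + w_0 = g(x_p) + r \|w\|.$$
Then since $g(x_p) = 0$, $g(x) = r \|w\|$, $|r| = |g(x)| / \|w\|$.

$w$ is heading toward the positive region of $g(x)$.

(Figure: hyperplane $g = 0$ separating the regions $g > 0$ and $g < 0$ in the $(x_1, x_2)$ plane; the distance from $x$ to the hyperplane is $|g(x)| / \|w\|$, the distance from the origin to the hyperplane is $|w_0| / \|w\|$, and $x_p$ is the projection of $x$.)

Note that the coefficient vector $w$ and the feature vector $x$ are represented in the same coordinate system!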

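【STEP 1】 Extend the feature and coefficient vectors:
$$y = \begin{pmatrix} 1 \\ x \end{pmatrix}, \quad a = \begin{pmatrix} w_0 \\ w \end{pmatrix}$$
Then the discriminant function is represented by $g(x) = w^t x + w_0 = a^t y$.

【STEP 2】 Flip the sign of all sample data of class $\omega_2$: if $y \in \omega_2$ then $y \leftarrow -y$.
Then, find $a$ such that $a^t y_i > 0$ for all sample data $y_i$.

【STEP 3】 Minimize the Perceptron Criterion Function
$$J_p(a) = \sum_{y \in Y(a)} (-a^t y), \quad \text{where } Y(a) = \{ y : a^t y < 0 \} \text{ is the set of misclassified sample data.}$$

GRADIENT DESCENT PROCEDURE:
$$a_{k+1} = a_k - \rho_k \nabla J_p(a_k) = a_k + \rho_k \sum_{y \in Y(a_k)} y$$

SINGLE SAMPLE CORRECTION PROCEDURE:
cyclic sequence of sample data: $y_1, y_2, y_3, y_1, y_2, y_3, y_1, y_2, y_3, \ldots$
$a_1$: arbitrary initial parameter
$a_{k+1} = a_k + y^k$, where $y^k$ is a sample misclassified by $a_k$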

Learning the coefficient vector from sample data

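(Figure: sample vectors $y_1$ and $y_2$ with the corresponding boundaries in the coefficient space, and the half-space $a^t y_1 > 0$)

【STEP 3】 Minimize the Perceptron Criterion Function
$$J_p(a) = \sum_{y \in Y(a)} (-a^t y), \quad \text{where } Y(a) = \{ y : a^t y < 0 \} \text{ is the set of misclassified sample data.}$$

GRADIENT DESCENT PROCEDURE:
$$a_{k+1} = a_k - \rho_k \nabla J_p(a_k) = a_k + \rho_k \sum_{y \in Y(a_k)} y$$

Note that the coefficient vector $a$ and the feature vector $y$ are represented in the same coordinate system.

(Figure: the update from $a_k$ to $a_{k+1}$ by adding the misclassified sample $y_k$, moving $a$ toward the half-space $a^t y_k > 0$)

SINGLE SAMPLE CORRECTION PROCEDURE:
cyclic sequence of sample data: $y_1, y_2, y_3, y_1, y_2, y_3, y_1, y_2, y_3, \ldots$
$a_1$: arbitrary initial parameter
$a_{k+1} = a_k + y^k$, where $y^k$ is a sample misclassified by $a_k$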
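The single sample correction procedure can be sketched as follows. The zero initial vector, the use of `<= 0` as the misclassification test (so that learning can start from the zero vector), the epoch limit, and the four training points are assumptions made for illustration.

```python
def train_perceptron(samples, labels, max_epochs=100):
    """Single sample correction for a two-class linear discriminant.

    samples: feature vectors x; labels: 1 for class w1, 2 for class w2.
    Returns a = (w0, w1, ..., wd) with a^t y > 0 for every normalised sample y.
    """
    # STEP 1: extend each feature vector to y = (1, x).
    # STEP 2: flip the sign of the samples of class w2.
    ys = []
    for x, c in zip(samples, labels):
        y = [1.0] + [float(v) for v in x]
        ys.append(y if c == 1 else [-v for v in y])

    a = [0.0] * len(ys[0])            # arbitrary initial parameter
    for _ in range(max_epochs):
        corrected = False
        for y in ys:                  # cyclic sequence of sample data
            if sum(ai * yi for ai, yi in zip(a, y)) <= 0:   # misclassified by a
                a = [ai + yi for ai, yi in zip(a, y)]       # a_{k+1} = a_k + y^k
                corrected = True
        if not corrected:             # every sample satisfies a^t y > 0
            return a
    raise RuntimeError("no separating vector found within max_epochs")


def g(a, x):
    """Discriminant function g(x) = w^t x + w0 with a = (w0, w)."""
    return a[0] + sum(w * v for w, v in zip(a[1:], x))


samples = [(2, 2), (3, 3), (0, 0), (1, 0)]
labels = [1, 1, 2, 2]
a = train_perceptron(samples, labels)
print(a, [g(a, x) for x in samples])
```

For linearly separable data the procedure stops after a finite number of corrections; for non-separable data it never settles, which is why the sketch gives up after `max_epochs` passes.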

Report 6

Describe how we can generalize the two-class linear classifier to a multi-class classifier.

Class1

Class2

x

g＞0

g＜0 g = 0

Maximize the margin

Optimizing Generalization Capability in Linear Discriminant Function

Note that when two-class sample data are linearly separable, we can get a discriminant function separating them. But the function is not unique!

Optimizing Generalization Capability in Linear Discriminant Function

Find $a$ such that $g(x) = a^t y_i > 0$ for all sample data $y_i$.

Find $\hat{a}$ such that $g(x) = \hat{a}^t y_i > b > 0$ for all sample data $y_i$.

(Figure: sample vector $y_1$ with the half-space $a^t y_1 > 0$ and the narrower half-space $\hat{a}^t y_1 > b$)

(Figure: sample data in the $(x_1, x_2)$ plane)

Linearly non-separable

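Extend the feature vector by a non-linear mapping $x \to y$:
$$x_1 \to y_1, \quad x_2 \to y_2, \quad x_1 x_2 \to y_3$$

Find the linear discriminant function $g(y)$ in the extended feature space: $g(y)$ is a linear combination of $y_1$, $y_2$, $y_3$ and a constant.

In the original feature space, the same function $g(x)$ is a non-linear discriminant function, because it contains the product term $x_1 x_2$.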

Generalized Linear Discriminant Function
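A standard illustration of this idea is the XOR configuration, which no straight line can separate in the original (x1, x2) space. After adding the product feature y3 = x1 x2, a single linear function of y separates the two classes. The coefficients (1, 1, -2) and the constant -0.5 below are chosen for this sketch and are not the values from the slide.

```python
def extend(x):
    """Non-linear mapping (x1, x2) -> (y1, y2, y3) = (x1, x2, x1 * x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def g(y, a=(1.0, 1.0, -2.0), a0=-0.5):
    """Linear discriminant in the extended space (coefficients are assumptions)."""
    return sum(ai * yi for ai, yi in zip(a, y)) + a0


# XOR configuration: class 1 = {(1,0), (0,1)}, class 2 = {(0,0), (1,1)}.
class1 = [(1, 0), (0, 1)]
class2 = [(0, 0), (1, 1)]

for x in class1 + class2:
    print(x, extend(x), g(extend(x)))
```

Written in terms of the original features, the function is g(x) = x1 + x2 - 2 x1 x2 - 0.5, which is non-linear in x although it is linear in y.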

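S layer → A layer → R layer

S layer → A layer: random connection, coefficients $r_{mi} = \pm 1$

A layer → R layer: complete connection, coefficients $\omega_{ij}$

$$a_i = \sum_{m=1}^{N_s} r_{mi}\, s_m \quad (r_{mi} = \pm 1), \qquad \alpha_i = \begin{cases} 1 & \text{if } a_i \ge T \\ 0 & \text{if } a_i < T \end{cases}$$

$$R_j = \sum_{i=1}^{N_a} \omega_{ij}\, \alpha_i$$

Perceptron

The Perceptron is a linear discriminant function.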

Natural Pattern → Feature Measurement → Feature Selection (Extraction) → Classification

Learning from Sample Patterns (Training Samples)

Picture Data → Feature Set 1: $x = (x_1, x_2, \ldots, x_m)^t$ → Feature Set 2: $y = (y_1, y_2, \ldots, y_n)^t$ → Class Name $\alpha$

Feature Vector: $x = (x_1, x_2, \ldots, x_n)^t$ = (measurement 1, measurement 2, ..., measurement n)

Principal Component Analysis, Independent Component Analysis, Discriminant Analysis, Multivariate Analysis

k-NN method, Bayesian Method, Sub-space method, Support Vector Machine, Hidden Markov Model, Dynamic Programming

Model Fitting, Clustering, Self-Organizing Map, Multidimensional Scaling, ML Estimation, EM Algorithm

Architecture of Statistical Pattern Recognition Systems

【Artifacts】
1. Highly correlated features are included: the recognition rate does not improve as expected.
2. Sparse distribution: curse of dimensionality (Hughes effect); larger training samples are required for learning.
3. The recognition error rate may increase (over-fitting)!

Select useful features from the observed features.
(Pattern recognition systems should be designed to recognize UNKNOWN data correctly!)

Myth: Increase features to improve recognition rate.
