
Karhunen-Loeve (KL) Transform
Face Recognition and Eigen-Faces
Short-Time Fourier Transform

Digital Image Processing
Lectures 13 & 14

M.R. Azimi, Professor

Department of Electrical and Computer Engineering
Colorado State University

Spring 2013



Properties of KL Transform

The KL transform has many desirable properties that make it optimal for many signal/image processing applications.

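i. Decorrelation

The KL coefficients $X(k)$, $k \in [0, N-1]$, are uncorrelated, i.e.

$$E[X(k)X^{*}(l)] = \gamma_k\,\delta(k-l)$$

since

$$E[\mathbf{X}\mathbf{X}^{*t}] \triangleq \Psi^{*t}E[\mathbf{x}\mathbf{x}^{t}]\Psi = \Psi^{*t}R\Psi = \Gamma = \mathrm{Diag}(\gamma_0, \cdots, \gamma_{N-1})$$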

That is, each eigenvalue $\gamma_i$ is the variance (or energy) of the $i$th element of $\mathbf{X}$ along eigenvector $\boldsymbol{\xi}_i$. Thus, the KL components represent the contributions of the data along the relevant coordinates. Note that $\Psi$ is not a unique matrix w.r.t. this property, and there could be many matrices (unitary or non-unitary) that would decorrelate the data in the transformed domain.
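The decorrelation property can be checked numerically. The following is a minimal NumPy sketch, not part of the original lecture: the helper `kl_basis`, the random mixing matrix, and the sample size are all assumptions chosen for illustration, and the data are real so the conjugate is a no-op.

```python
import numpy as np

def kl_basis(R):
    """Eigenvalues (descending) and eigenvectors (as columns) of a symmetric matrix R."""
    gamma, Psi = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
    order = np.argsort(gamma)[::-1]
    return gamma[order], Psi[:, order]

rng = np.random.default_rng(0)
N, num_samples = 4, 50_000

# Hypothetical correlated, zero-mean data: white noise passed through a fixed mixing matrix.
mixing = rng.standard_normal((N, N))
x = mixing @ rng.standard_normal((N, num_samples))   # each column is one realization of x

R = (x @ x.T) / num_samples          # sample estimate of R = E[x x^t]
gamma, Psi = kl_basis(R)

X = Psi.T @ x                        # KL coefficients X = Psi^{*t} x
cov_X = (X @ X.T) / num_samples      # sample estimate of E[X X^{*t}]

print(np.round(cov_X, 6))            # off-diagonal entries vanish up to round-off
print(np.round(gamma, 6))            # diagonal entries match the eigenvalues
```

Because the same samples are used to estimate $R$ and to evaluate the covariance of $\mathbf{X}$, the result is diagonal up to floating-point error rather than only approximately.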



ii. Optimality and Data Reduction

Consider the scenario depicted below where vector $\mathbf{x}$ is transformed to $\mathbf{X}$ using an $N \times N$ unitary matrix $A$. The elements of $\mathbf{Z}$ are chosen to be the first $m$ elements of $\mathbf{X}$ and zero elsewhere, i.e.

$$Z(k) = \begin{cases} X(k), & k \in [0, m-1] \\ 0, & k \ge m \end{cases}$$

That is, $\mathbf{Z}$ is in the ($m \le N$)-D subspace, though both $\mathbf{x}$ and $\mathbf{X}$ are in $N$-D space. Then, $\mathbf{Z}$ is transformed to $\hat{\mathbf{x}}$ using an $N \times N$ unitary matrix $B$. The average MSE between the original signal $x(n)$ and the reconstructed signal $\hat{x}(n)$ is

$$J_m \triangleq \frac{1}{N}E\left[\sum_{n=0}^{N-1}|x(n)-\hat{x}(n)|^2\right]$$



Now, it is desired to find matrices $A$ and $B$ such that $J_m$ is minimized for each and every choice of $m \in [1, N]$.

Theorem: The MSE $J_m$ is minimized for every choice of $m$ when we have $A = \Psi^{*t}$, $B = \Psi$, $AB = I$, where the columns of $\Psi$ are arranged according to the decreasing order of the eigenvalues of $R$.

Proof: See A.K. Jain, “Fundamentals of Digital Image Processing”.

Note that $J_m$ is equal to the total energy in the discarded eigenvalues. To see this, using the unitary property we can rewrite $J_m$ as

$$J_m = \frac{1}{N}E\left[\sum_{k=0}^{N-1}|X(k)-Z(k)|^2\right] = \frac{1}{N}E\left[\sum_{k=m}^{N-1}|X(k)|^2\right] = \frac{1}{N}\sum_{k=m}^{N-1}\gamma_k$$

This result leads to the following procedure for applying the KL transform for data reduction.



KLT/PCA Procedure for Data Reduction:

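1 Form the $R$ matrix and diagonalize it to find $\gamma_0 > \gamma_1 > \cdots > \gamma_{N-1}$.

2 If the first $m$ eigenvalues contain most of the energy, i.e.

$$\eta = \frac{\sum_{i=0}^{m-1}\gamma_i}{\sum_{i=0}^{N-1}\gamma_i} \ge \text{e.g., } 95\%,$$

then form

$$\Psi_{red} = [\boldsymbol{\xi}_0, \boldsymbol{\xi}_1, \cdots, \boldsymbol{\xi}_{m-1}]$$

3 Transform the data to an $m \times 1$ PCA/KLT vector

$$\mathbf{X}_{red} = \Psi_{red}^{*t}\,\mathbf{x}$$

4 To reconstruct,

$$\mathbf{x}_{rec} = \Psi_{red}\mathbf{X}_{red}$$

Clearly, if $m = N$ we have perfect reconstruction, i.e. $\mathbf{x}_{rec} = \mathbf{x}$.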
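The four steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `klt_reduce`, the AR(1)-like covariance used to generate data, and the 95% threshold are hypothetical choices, and $R$ is replaced by its sample estimate.

```python
import numpy as np

def klt_reduce(R, energy=0.95):
    """Return (gamma, Psi_red, m): sorted eigenvalues, the m leading eigenvectors, and m."""
    gamma, Psi = np.linalg.eigh(R)               # ascending order
    order = np.argsort(gamma)[::-1]
    gamma, Psi = gamma[order], Psi[:, order]     # step 1: gamma_0 >= gamma_1 >= ...
    eta = np.cumsum(gamma) / np.sum(gamma)       # step 2: energy fraction for m = 1..N
    m = min(int(np.searchsorted(eta, energy)) + 1, len(gamma))
    return gamma, Psi[:, :m], m

rng = np.random.default_rng(1)
N, num_samples = 16, 20_000
idx = np.arange(N)
R_true = 0.95 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical AR(1)-like covariance
x = np.linalg.cholesky(R_true) @ rng.standard_normal((N, num_samples))

R = (x @ x.T) / num_samples                     # sample covariance used as R
gamma, Psi_red, m = klt_reduce(R, energy=0.95)

X_red = Psi_red.T @ x                           # step 3: m x 1 vector per sample
x_rec = Psi_red @ X_red                         # step 4: reconstruction

J_m = np.mean(np.sum((x - x_rec) ** 2, axis=0)) / N
print("m =", m)
print("measured J_m          :", J_m)
print("(1/N) * discarded sum :", gamma[m:].sum() / N)
```

Since the average is taken over the same samples that define $R$, the measured $J_m$ agrees with the scaled sum of the discarded eigenvalues up to round-off.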



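iii. Distribution of Variance

Among all unitary transforms, the KL transform packs the maximum average energy into $m \le N$ elements of $\mathbf{X}$. That is, if $A$ is any other unitary transform and $\Psi^{*t}$ is the KL transform matrix, then for any $m \in [1, N]$,

$$S_m(\Psi^{*t}) \ge S_m(A)$$

where $S_m(A)$ is the energy function of unitary transform $A$, i.e.

$$S_m(A) \triangleq \sum_{k=0}^{m-1}\sigma_k^2$$

and $\sigma_k^2 \triangleq E[|X(k)|^2]$ with $\sigma_0^2 \ge \sigma_1^2 \ge \cdots \ge \sigma_{N-1}^2$.

Proof: Note that

$$S_m(A) = \sum_{k=0}^{m-1}(ARA^{*t})_{k,k} = \mathrm{tr}(I_m ARA^{*t}) = \tilde{J}_m$$

where $I_m$ is the diagonal matrix whose first $m$ diagonal entries are one and the rest are zero, and $\tilde{J}_m$ is the total energy in the retained coefficients. We know from property (ii) that $\tilde{J}_m$ is maximized (or $J_m$ minimized) when $A$ is the KLT. Since $\sigma_k^2 = \gamma_k$ when $A = \Psi^{*t}$, from the KL property

$$\tilde{J}_m = \sum_{k=0}^{m-1}\gamma_k \ge \sum_{k=0}^{m-1}\sigma_k^2, \quad m \in [1, N]$$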
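As a numerical illustration of this inequality, the sketch below compares $S_m$ for the KLT against two other unitary transforms. The covariance model, the randomly generated orthogonal matrix `Q`, and the helper name `packed_energy` are assumptions made for this example only.

```python
import numpy as np

def packed_energy(A, R):
    """S_m(A) for m = 1..N: cumulative sum of the coefficient variances, largest first."""
    variances = np.real(np.diag(A @ R @ A.conj().T))   # sigma_k^2 = (A R A^{*t})_{k,k}
    return np.cumsum(np.sort(variances)[::-1])

rng = np.random.default_rng(2)
N = 8
idx = np.arange(N)
R = 0.9 ** np.abs(idx[:, None] - idx[None, :])         # hypothetical covariance matrix

gamma, Psi = np.linalg.eigh(R)
Psi = Psi[:, np.argsort(gamma)[::-1]]                  # columns by decreasing eigenvalue

Q, _ = np.linalg.qr(rng.standard_normal((N, N)))       # an arbitrary real unitary matrix

S_klt = packed_energy(Psi.T, R)        # A = Psi^{*t}
S_other = packed_energy(Q.T, R)        # A = some other unitary transform
S_identity = packed_energy(np.eye(N), R)

for m in range(1, N + 1):
    print(m, round(S_klt[m - 1], 4), round(S_other[m - 1], 4), round(S_identity[m - 1], 4))
```

All three columns end at the same value for $m = N$, since a unitary transform preserves the total energy $\mathrm{tr}(R)$; the KLT column is never smaller than the others for any $m$.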

2-D KL Transform of Images

For a zero-mean 2-D random process $x(m,n)$, $m, n \in [0, N-1]$, using the same procedure adopted for the DFT and DCT, the 2-D KL transform of image matrix $x$ is

$$X = \Psi_1^{*t}\,x\,\Psi_2^{*}$$

where $\Psi_1^{*t}$ and $\Psi_2^{*}$ are 1-D KL matrices applied to the columns and rows of the image, respectively. The inverse KL transform is

$$x = \Psi_1 X \Psi_2^{t}$$



Eigen-images of 2-D KL Transform

The basis images (or eigen-images) of the 2-D KL transform or PCA are

$$K(k,l) = \boldsymbol{\xi}_{1k}\,\boldsymbol{\xi}_{2l}^{t}, \quad k, l \in [0, N-1]$$

where $\boldsymbol{\xi}_{1k}$ is the $k$th column of $\Psi_1$ and $\boldsymbol{\xi}_{2l}^{t}$ is the $l$th row of $\Psi_2^{t}$. Image $x$ is decomposed as a linear combination of these eigen-images with the KL coefficients (or PCs), $X(k,l)$, i.e.

$$x = \sum_{k=0}^{N-1}\sum_{l=0}^{N-1}X(k,l)K(k,l).$$
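The separable 2-D transform, its inverse, and the eigen-image expansion can be exercised with a short NumPy sketch. The lecture does not say how $\Psi_1$ and $\Psi_2$ are estimated; here, as an assumption for illustration only, they are taken as eigenvectors of sample column and row covariance matrices of a synthetic image ensemble.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 6, 500

# Hypothetical ensemble of zero-mean N x N images with correlated rows and columns.
C = rng.standard_normal((N, N))
D = rng.standard_normal((N, N))
images = np.array([C @ rng.standard_normal((N, N)) @ D for _ in range(M)])

# Assumed estimates of the 1-D covariances along columns and rows.
R1 = np.mean([im @ im.T for im in images], axis=0)
R2 = np.mean([im.T @ im for im in images], axis=0)
Psi1 = np.linalg.eigh(R1)[1][:, ::-1]     # eigenvectors, decreasing eigenvalue order
Psi2 = np.linalg.eigh(R2)[1][:, ::-1]

x = images[0]
X = Psi1.conj().T @ x @ Psi2.conj()       # forward: X = Psi1^{*t} x Psi2^{*}
x_back = Psi1 @ X @ Psi2.T                # inverse: x = Psi1 X Psi2^t

# Expansion over eigen-images K(k, l) = xi_{1k} xi_{2l}^t.
x_sum = np.zeros_like(x)
for k in range(N):
    for l in range(N):
        K_kl = np.outer(Psi1[:, k], Psi2[:, l])
        x_sum += X[k, l] * K_kl

print(np.max(np.abs(x_back - x)), np.max(np.abs(x_sum - x)))
```

Both printed errors are at round-off level, which confirms that the matrix form of the inverse and the double sum over eigen-images are the same reconstruction.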

However, this decomposition requires finding two KL matrices $\Psi_1$ and $\Psi_2$. On the other hand, arranging image $x$ into a 1-D vector $\mathbf{x}$ leads to a covariance matrix of size $N^2 \times N^2$, which is not practical either. The following algorithm gives an efficient method to find these eigen-images.

Algorithm for Extracting Eigen-images

Let $x_i$, $i \in [1, M]$, be a set of $N \times N$ training images. Each image $x_i$ is converted to a vector, $\mathbf{x}_i$, of size $N^2 \times 1$. Then,



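(i) Find the ensemble mean image $\boldsymbol{\mu} = \frac{1}{M}\sum_{i=1}^{M}\mathbf{x}_i$ and mean-subtract each image, i.e. $\tilde{\mathbf{x}}_i = \mathbf{x}_i - \boldsymbol{\mu}$, and form the data matrix $\Upsilon = \frac{1}{\sqrt{M}}[\tilde{\mathbf{x}}_1 \cdots \tilde{\mathbf{x}}_M]$.

(ii) Find the ensemble covariance matrix $R = \frac{1}{M}\sum_{i=1}^{M}\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_i^{t}$ or $R = \Upsilon\Upsilon^{t}$.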

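(iii) Find the principal eigenvectors of $R$ by solving $R\boldsymbol{\xi}_k = \gamma_k\boldsymbol{\xi}_k$. These $\boldsymbol{\xi}_k$s rearranged in image matrix form are the eigen-images. However, since $R$ is rank $M-1$ (i.e. $M \ll N^2$ eigen-images), solving the original $N^2$-D eigenvalue problem is inefficient. Thus, instead we solve the $M$-D eigenvalue problem,

$$\Upsilon^{t}\Upsilon\boldsymbol{\zeta}_k = \gamma_k\boldsymbol{\zeta}_k$$

Now, pre-multiplying this equation by $\Upsilon$ yields

$$(\Upsilon\Upsilon^{t})\Upsilon\boldsymbol{\zeta}_k = \gamma_k\Upsilon\boldsymbol{\zeta}_k$$

Alternatively,

$$R\,\Upsilon\boldsymbol{\zeta}_k = \gamma_k\Upsilon\boldsymbol{\zeta}_k$$

implying that the eigen-images can be obtained from

$$\boldsymbol{\xi}_k = \Upsilon\boldsymbol{\zeta}_k = \frac{1}{\sqrt{M}}\sum_{l=1}^{M}\zeta_{l,k}\tilde{\mathbf{x}}_l$$

where $\zeta_{l,k}$ is the $l$th element of vector $\boldsymbol{\zeta}_k$ and $\tilde{\mathbf{x}}_l$ is the $l$th mean-subtracted image vector. An eigenvalue associated with an eigen-image represents how much the images in the training set vary from the mean image. We keep $P \le M$ eigen-images associated with the largest eigenvalues.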

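(iv) The reconstructed version of the $i$th training image from only $P$ eigen-images is

$$\hat{\mathbf{x}}_i = \sum_{k=1}^{P}X_i(k)\boldsymbol{\xi}_k + \boldsymbol{\mu}$$

where $X_i(k) = \boldsymbol{\xi}_k^{*t}\tilde{\mathbf{x}}_i/\gamma_k$ is its $k$th PC, computed from the mean-subtracted image $\tilde{\mathbf{x}}_i = \mathbf{x}_i - \boldsymbol{\mu}$. The division by $\gamma_k$ is due to

$$\boldsymbol{\xi}_l^{*t}\boldsymbol{\xi}_k = \boldsymbol{\zeta}_l^{*t}\Upsilon^{t}\Upsilon\boldsymbol{\zeta}_k = \boldsymbol{\zeta}_l^{*t}\gamma_k\boldsymbol{\zeta}_k = \gamma_k\delta(k-l).$$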



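Any other image $\mathbf{y}$ can also be represented fairly accurately by the same eigen-images using

$$\hat{\mathbf{y}} = \sum_{k=1}^{P}Y(k)\boldsymbol{\xi}_k + \boldsymbol{\mu}$$

where $Y(k) = \boldsymbol{\xi}_k^{*t}\tilde{\mathbf{y}}/\gamma_k$ is the $k$th PC of $\mathbf{y}$ and $\tilde{\mathbf{y}} = \mathbf{y} - \boldsymbol{\mu}$ is the image mean-subtracted by $\boldsymbol{\mu}$. The PC or KL transform vector $\mathbf{Y} = [Y(1), \cdots, Y(P)]^{t}$ can be used as a feature vector to represent this image.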

(v) A simple image recognition (minimum distance classifier) can be implemented using

$$j = \arg\min_k\|\mathbf{Y}-\mathbf{X}_k\|^2$$

where $\mathbf{X}_k$ is the PC vector of the $k$th training sample and $j$ is the class of the training sample that has the closest match (in the MSE sense) to the unknown image $\mathbf{y}$.
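Steps (i) through (v) can be sketched compactly in NumPy. This is a hedged illustration rather than the lecture's implementation: the function names, the random arrays standing in for face images, and the noise level are all hypothetical. The sketch assumes $P \le M-1$ so that every retained $\gamma_k$ is nonzero and the division by $\gamma_k$ is safe.

```python
import numpy as np

def fit_eigen_images(train, P):
    """train: (M, N, N) array. Returns mu (N^2 x 1), xi (N^2 x P), gamma (P,)."""
    M = train.shape[0]
    vecs = train.reshape(M, -1).T.astype(float)          # N^2 x M, one image per column
    mu = vecs.mean(axis=1, keepdims=True)                # step (i): ensemble mean
    Upsilon = (vecs - mu) / np.sqrt(M)                   # step (i): data matrix
    gamma, zeta = np.linalg.eigh(Upsilon.T @ Upsilon)    # step (iii): small M x M problem
    order = np.argsort(gamma)[::-1][:P]                  # keep the P largest eigenvalues
    gamma, zeta = gamma[order], zeta[:, order]
    xi = Upsilon @ zeta                                  # eigen-images, ||xi_k||^2 = gamma_k
    return mu, xi, gamma

def pc_vector(image, mu, xi, gamma):
    """Feature vector with entries xi_k^t (image - mu) / gamma_k."""
    return (xi.T @ (image.reshape(-1, 1) - mu)).ravel() / gamma

def reconstruct(pcs, mu, xi, shape):
    """Step (iv): sum_k pcs[k] * xi_k + mu, reshaped to an image."""
    return (xi @ pcs.reshape(-1, 1) + mu).reshape(shape)

def classify(image, mu, xi, gamma, train_pcs):
    """Step (v): index of the training sample with the closest PC vector."""
    Y = pc_vector(image, mu, xi, gamma)
    return int(np.argmin(np.sum((train_pcs - Y) ** 2, axis=1)))

rng = np.random.default_rng(4)
M, N = 10, 16
train = rng.random((M, N, N))                            # hypothetical stand-in for face images
mu, xi, gamma = fit_eigen_images(train, P=M - 1)
train_pcs = np.array([pc_vector(img, mu, xi, gamma) for img in train])

noisy = train[3] + 0.01 * rng.standard_normal((N, N))    # perturbed copy of training image 3
print("closest training sample:", classify(noisy, mu, xi, gamma, train_pcs))
print("reconstruction error   :",
      np.max(np.abs(reconstruct(train_pcs[0], mu, xi, (N, N)) - train[0])))
```

Only an $M \times M$ eigenproblem is solved here, even though each eigen-image has $N^2$ entries. With $P = M-1$ the training images are reconstructed essentially exactly; a smaller $P$ trades reconstruction accuracy for a shorter feature vector.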



Face Recognition using Eigen-Faces

Turk & Pentland (1991) showed that with only a few eigen-faces (standardized face ingredients) extracted from an ensemble of images, any other face can be fairly accurately represented. This method is used not only in face recognition but also in handwriting analysis, lip reading, voice recognition, sign language/hand gestures interpretation and medical imaging.

Figures below show the original M=20 training faces and first 16 (out of 20) eigen-faces ordered column-wise.




Figures below show fairly accurate reconstruction of two training images using only P=7 eigen-faces.




Figures below show the reconstructed images of a face wearing two different glasses (transparent and dark). The error images are also provided. The reconstructed image in the second case is not as good. Why? There are more robust methods (e.g., non-linear PCA) for these cases where the test sample is different from the training samples.



Short-Time Fourier & Wavelet Transforms

Several major shortcomings of the Fourier transform include:

1 Doesn't allow for simultaneous time and frequency domain analysis (e.g., FT cannot localize a particular note in a given piece of music). Lack of time-frequency localization is an important drawback for detecting and isolating events in both the time and frequency domains.

2 Not useful for analyzing non-stationary signals.

3 Not appropriate for representing discontinuities or sharp changes (i.e., requires a large number of Fourier components to represent discontinuities).

4 Doesn't provide a multi-resolution look at the signals/images.

These deficiencies were first identified by D. Gabor (1946), who introduced time localization using the STFT or Gabor transform.



Short-Time Fourier Transform (STFT)-Fixed resolution

Time localization in FT can be achieved by windowing the signal $x(t)$ with a window over which the signal is nearly stationary. The FT of the windowed signal yields the STFT as

$$X_{STFT}(\tau,\omega) = \int_{-\infty}^{\infty}x(t)\,g^{*}(t-\tau)\,e^{-j\omega t}\,dt$$

where $g(t)$ is the window function and $\tau$ is the center of the window. This windowing in STFT introduces time dependency in the analysis.

There are two ways to interpret STFT: (1) FT (over all frequencies) of the windowed signal around every $\tau$; or (2) if we let $h(t-\tau) = g^{*}(t-\tau)e^{-j\omega t}$, then STFT becomes a convolution integral representing the output of a bandpass filter whose frequency response is centered around every $\omega$, i.e., STFT amounts to filtering the signal “at all times” with a bandpass filter having an impulse response which is the window function modulated to that frequency. Thus, STFT may be viewed as a modulated filter bank.



Time-Frequency Resolutions

The resolution in the frequency domain corresponds to the bandwidth of the bandpass filter, which is measured by the RMS value

$$\Delta\omega = \frac{\left(\int_{-\infty}^{\infty}\omega^2|G(\omega)|^2\,d\omega\right)^{1/2}}{\left(\int_{-\infty}^{\infty}|G(\omega)|^2\,d\omega\right)^{1/2}}$$

where $G(\omega)$ is the FT of $g(t)$. The time resolution is given by the spread (window width) in the time domain, i.e.

$$\Delta t = \frac{\left(\int_{-\infty}^{\infty}t^2|g(t)|^2\,dt\right)^{1/2}}{\left(\int_{-\infty}^{\infty}|g(t)|^2\,dt\right)^{1/2}}$$

Owing to the Heisenberg inequality we have

$$\text{Time-Bandwidth} = \Delta t\,\Delta\omega \ge \frac{1}{2} \quad \text{(lower bound)}$$

i.e. the resolutions in time and frequency cannot both be made arbitrarily small. Thus, two sinusoids in the frequency domain may only be discriminated if they are more than $\Delta\omega$ apart, and two impulses in the time domain can be separated if they are more than $\Delta t$ apart.
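The time-bandwidth product can be evaluated numerically for specific windows. The sketch below is an illustration under assumptions not in the lecture: it uses simple Riemann sums on a finite grid, and it computes $\Delta\omega$ through the Parseval identity $\int\omega^2|G(\omega)|^2d\omega / \int|G(\omega)|^2d\omega = \int|g'(t)|^2dt / \int|g(t)|^2dt$ instead of an explicit Fourier transform. The Hann (raised-cosine) window is a comparison case chosen here.

```python
import numpy as np

def time_bandwidth(g, t):
    """Return (delta_t, delta_omega) for window samples g on a uniform grid t."""
    step = t[1] - t[0]
    energy = np.sum(np.abs(g) ** 2) * step
    delta_t = np.sqrt(np.sum(t ** 2 * np.abs(g) ** 2) * step / energy)
    dg = np.gradient(g, step)                   # numerical derivative g'(t)
    delta_omega = np.sqrt(np.sum(np.abs(dg) ** 2) * step / energy)
    return delta_t, delta_omega

t = np.linspace(-20.0, 20.0, 40001)

alpha = 1.0
gauss = np.exp(-t ** 2 / (4 * alpha))           # Gaussian window (scale factor is irrelevant)
hann = np.where(np.abs(t) <= 2.0, 0.5 * (1 + np.cos(np.pi * t / 2.0)), 0.0)

for name, g in (("gaussian", gauss), ("hann", hann)):
    dt_rms, dw_rms = time_bandwidth(g, t)
    print(name, dt_rms, dw_rms, dt_rms * dw_rms)
```

The Gaussian window reaches the lower bound of 1/2 up to discretization error, while the Hann window gives a slightly larger product, consistent with the inequality.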



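The lower bound (equality) is achieved when a Gaussian window (whose FT is also a Gaussian) is used. This gives the “Gabor transform”.

$$g_\alpha(t) = \frac{1}{2\sqrt{\pi\alpha}}\,e^{-\frac{t^2}{4\alpha}}$$

$$X_{GT}(\tau,\omega) = \int_{-\infty}^{\infty}x(t)\,g_\alpha(t-\tau)\,e^{-j\omega t}\,dt$$

Note that because $\int_{-\infty}^{\infty}g_\alpha(t-\tau)\,d\tau = \int_{-\infty}^{\infty}g_\alpha(\tau)\,d\tau = 1$, we get

$$\int_{-\infty}^{\infty}X_{GT}(\tau,\omega)\,d\tau = X(\omega)$$

i.e. the collection of localized Gabor transforms (local spectral information values) gives the global FT of the signal. The width of the Gabor window is $\Delta t = \sqrt{\alpha}$ with $\alpha > 0$. If we define the Gabor basis function as $g_{\tau,\omega}(t) \equiv g_\alpha(t-\tau)e^{-j\omega t}$, then

$$X_{GT}(\tau,\omega) = \int_{-\infty}^{\infty}x(t)\,g_{\tau,\omega}(t)\,dt$$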
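The statement that integrating the Gabor transform over $\tau$ recovers the global FT can be checked with a small numerical experiment. The test signal, the grids, and the values of $\alpha$ and $\omega$ below are assumptions for illustration, and the integrals are approximated by Riemann sums.

```python
import numpy as np

def gabor_window(t, alpha):
    """g_alpha(t) = exp(-t^2 / (4 alpha)) / (2 sqrt(pi alpha)), which has unit area."""
    return np.exp(-t ** 2 / (4 * alpha)) / (2 * np.sqrt(np.pi * alpha))

def gabor_transform(x, t, taus, omega, alpha):
    """Riemann-sum approximation of X_GT(tau, omega) for every tau in taus."""
    step = t[1] - t[0]
    windows = gabor_window(t[None, :] - taus[:, None], alpha)   # shape (len(taus), len(t))
    return (windows @ (x * np.exp(-1j * omega * t))) * step

t = np.arange(-10.0, 10.0, 0.02)
taus = np.arange(-20.0, 20.0, 0.02)          # wide enough to cover every window position
x = np.exp(-t ** 2) * np.cos(5.0 * t)        # hypothetical test signal
alpha, omega = 0.5, 5.0

X_gt = gabor_transform(x, t, taus, omega, alpha)
integral_over_tau = np.sum(X_gt) * (taus[1] - taus[0])
direct_ft = np.sum(x * np.exp(-1j * omega * t)) * (t[1] - t[0])

print(integral_over_tau, direct_ft)
```

The two printed values agree closely because the shifted windows sum to one at every $t$, which is exactly the unit-area property of $g_\alpha$ used in the derivation.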



Remarks

1 The plot of $|X_{STFT}(\tau,\omega)|^2$ in the $(\tau,\omega)$-plane is referred to as the “spectrogram”, which is a very useful tool in signal analysis as it provides a distribution of the signal in the time-frequency plane. Some examples of spectrograms for two different signals (linear FM and signal with transients) are shown.



2 For a discrete-time signal, $x(n)$, STFT becomes

$$X_{STFT}(m,\Omega) = \sum_{n=-\infty}^{\infty}x(n)\,g(n-m)\,e^{-j\Omega n}$$

where $g(n)$ is the window function and $m$ is the center of the window (or shift) in this case.

3 The biggest drawback of the STFT is its fixed resolution, which implies that arbitrarily good resolution in time and frequency cannot be achieved simultaneously. Choosing a narrow window ⟹ good time resolution but poor frequency resolution, while choosing a wide window ⟹ good frequency resolution but poor time resolution.
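The discrete-time STFT of remark 2 and the trade-off of remark 3 can be illustrated with a direct NumPy evaluation of the sum. This is a sketch under assumptions: the function `stft` below is a hypothetical helper (not a library routine), the window is a Hann window of odd length centered at $n = 0$, and the "width" measure (number of samples above half of the peak magnitude) is an ad hoc choice for this example.

```python
import numpy as np

def stft(x, window, hop, num_freqs):
    """Evaluate X_STFT(m, Omega) = sum_n x(n) g(n - m) exp(-j Omega n) on a grid.

    window holds g(n) for n = -L..L (odd length), the shifts m are multiples of hop,
    and Omega = 2*pi*k/num_freqs for k = 0..num_freqs-1.
    """
    L = len(window) // 2
    omegas = 2 * np.pi * np.arange(num_freqs) / num_freqs
    shifts = np.arange(0, len(x), hop)
    out = np.zeros((len(shifts), num_freqs), dtype=complex)
    for row, m in enumerate(shifts):
        lo, hi = max(0, m - L), min(len(x), m + L + 1)
        n = np.arange(lo, hi)
        g = window[n - m + L]                              # samples of g(n - m)
        out[row] = (x[lo:hi] * g) @ np.exp(-1j * np.outer(n, omegas))
    return shifts, omegas, out

n = np.arange(512)
tone = np.cos(2 * np.pi * 16 / 128 * n)                    # sinusoid at frequency bin k = 16
impulse = np.zeros(512)
impulse[256] = 1.0                                         # transient at n = 256

widths = {}
for length in (17, 129):                                   # narrow and wide windows
    g = np.hanning(length)
    _, _, S_tone = stft(tone, g, hop=64, num_freqs=128)
    _, _, S_imp = stft(impulse, g, hop=1, num_freqs=128)
    mag_f = np.abs(S_tone[4, :64])                         # spectrum at shift m = 256
    mag_t = np.abs(S_imp[:, 0])                            # response versus shift at Omega = 0
    freq_width = int(np.sum(mag_f > 0.5 * mag_f.max()))
    time_width = int(np.sum(mag_t > 0.5 * mag_t.max()))
    widths[length] = (freq_width, time_width)
    print(length, "frequency width:", freq_width, "time width:", time_width)
```

With the narrow window the impulse is smeared over few shifts but the tone spreads over many frequency bins; with the wide window the opposite happens, which is the fixed-resolution limitation described in remark 3. The squared magnitude of `out` is the spectrogram of remark 1.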

