kernels for dynamic textures - purdue universityvishy/talks/dynamic.pdf · 2009. 8. 22. · dynamic...

Kernels for Dynamic Textures

S.V.N. Vishwanathan

SVN.Vishwanathan@nicta.com.au

http://web.anu.edu.au/~vishy

National ICT Australia and Australian National University

Joint work with Alex Smola and René Vidal

Roadmap

Introduction to Kernel Methods

Why kernels?

Kernels on Dynamical Systems

Trajectories, Noise Models

Computation

Dynamical Textures

ARMA Models

Approximate Solutions

Kernel Computation

Experiments

Outlook and Conclusion

Classification

Pairs of observations (xi, yi)

Underlying distribution P(x, y)

Examples (blood status, cancer), (transactions, fraud)

Find a function f (x) which predicts y given x

The function f (x) must generalize well

Optimal Separating Hyperplane

Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(\langle w, x_i \rangle + b) \geq 1$ for all $i$

Kernels and Nonlinearity

Problem: Linear functions are often too simple to provide good estimators

Idea 1: Map to a higher dimensional feature space via Φ : x → Φ(x) and solve the problem there. Replace every 〈x, x′〉 by 〈Φ(x), Φ(x′)〉

Idea 2: Instead of computing Φ(x) explicitly use a kernel function k(x, x′) := 〈Φ(x), Φ(x′)〉. A large class of functions are admissible as kernels

Non-vectorial data can be handled if we can compute meaningful k(x, x′)
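Ideas 1 and 2 can be illustrated with a small sketch. The choice of a homogeneous degree-2 polynomial kernel on 2-D inputs, and the names `phi` and `poly_kernel`, are assumptions made for illustration only; the talk does not single out a particular kernel.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(x, xp):
    """Homogeneous polynomial kernel k(x, x') = <x, x'>^2."""
    return float(np.dot(x, xp)) ** 2

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

# Idea 1: map both points to feature space and take the dot product there.
explicit = float(np.dot(phi(x), phi(xp)))
# Idea 2: evaluate the kernel directly, never forming phi.
implicit = poly_kernel(x, xp)
```

Both routes give the same number, but the second never builds the feature vector, which is what makes kernels practical when Φ is high dimensional.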

Roadmap

Why kernels?

Dynamical Textures

The Basic Idea

Key Observation:

Trajectories are easily observable

Similar trajectories ⇒ similar systems

Restrict attention to interesting cases

Average over noise models

Kernels Using Dynamical Systems:

Simulate system for both inputs

Similar time evolution ⇒ similar inputs

Kernels on Dynamical Systems:

Restrict to interesting initial conditions

Simulate both the systems

Similar time evolution ⇒ similar systems

Notation

X - state space (Hilbert space)

A - time evolution operators

T - time of measurement

µ - nice probability measure on T

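Discounting Factors: For some λ > 0

$\mu(t) = \lambda^{-1} e^{-\lambda t}$ for $T = \mathbb{R}_0^+$

$\mu(t) = \frac{e^{-\lambda t}}{1 - e^{-\lambda}}$ for $T = \mathbb{N}_0$

Time Evolution: We study

$x_A(t) := A(t)\,x$ for $A \in \mathcal{A}$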

Trajectories and Kernels

Comparing Trajectories: Using the dot product on $\mathcal{X}$ we define a dot product on $\mathcal{X}^{\mathcal{T}}$

$\langle \theta, \theta' \rangle := E_\mu[\langle \theta(t), \theta'(t) \rangle]$ for $\theta, \theta' \in \mathcal{X}^{\mathcal{T}}$

Extending to Dynamical Systems: Identify a dynamical system with its trajectory and define

$k((x, A), (\tilde{x}, \tilde{A})) := E_\mu[\langle A(t)x, \tilde{A}(t)\tilde{x} \rangle]$

Other Ideas:

A nicely decaying measure required for convergence

Modify the dot product in $\mathcal{X}$

Covariance matrices?

Rational kernels and transducers
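A minimal sketch of the trajectory kernel for the discrete-time case follows. It assumes noise-free linear evolution $A(t)x = A^t x$, the standard dot product on the state space, and a truncation of the infinite sum at a finite `horizon`. The function name, the example matrices and the truncation length are all illustrative assumptions.

```python
import numpy as np

def trajectory_kernel(x, A, x_tilde, A_tilde, lam, horizon=200):
    """Truncated sum of e^{-lam t} <A^t x, A_tilde^t x_tilde> / (1 - e^{-lam})."""
    weighted_sum = 0.0
    u = np.array(x, dtype=float)
    v = np.array(x_tilde, dtype=float)
    for t in range(horizon):
        weighted_sum += np.exp(-lam * t) * float(u @ v)
        u, v = A @ u, A_tilde @ v          # advance both trajectories one step
    return weighted_sum / (1.0 - np.exp(-lam))

A = np.array([[0.5, 0.0], [0.0, 0.5]])         # pure contraction
A_tilde = np.array([[0.0, -0.5], [0.5, 0.0]])  # contraction plus 90 degree rotation
x = np.array([1.0, 0.0])

k_same = trajectory_kernel(x, A, x, A, lam=1.0)
k_diff = trajectory_kernel(x, A, x, A_tilde, lam=1.0)
```

With identical initial conditions, the system compared with itself scores higher than the pair whose trajectories rotate apart, which matches the "similar time evolution ⇒ similar systems" intuition.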

Special Cases

Kernels on Dynamical Systems:

Restrict attention to $x = \tilde{x}$

Compare trajectory for identical initial conditions

Take expectation if interested in a range of $x$

$k(A, \tilde{A}) := E_x[k((x, A), (x, \tilde{A}))]$

More generally

$k(A, \tilde{A}) := E_A E_{\tilde{A}} E_x[k((x, A), (x, \tilde{A}))]$

Kernels Using Dynamical Systems:

Restrict attention to a particular dynamical system

As before we can take expectations over $A$

$k(x, \tilde{x}) := E_x E_{\tilde{x}} E_A[k((x, A), (\tilde{x}, A))]$

Discrete Linear Systems

Linear Systems:

We assume time propagation occurs as

$x_A(t+1) = A x_A(t) + a_t + \xi_t$

In closed form

$x_A(t) = A^t x_0 + \sum_{i=0}^{t} \left( A^{t-i} \xi_i + A^{t-i} a_i \right)$

To avoid messy math assume $a_t = 0$ and hence

$x_A(t) = A^t x_0 + \sum_{i=0}^{t} A^{t-i} \xi_i$

Contribution to kernel due to $A$ as well as noise

Continuous Linear Systems

Linear Systems:

System dynamics here are described by

$\frac{d}{dt} x_A(t) = A x_A(t) + a(t) + \xi(t)$

Here $\xi(t)$ with $E[\xi(t)] = 0$ is a stochastic process and

$x_A(t) = \exp(At)\,x_0 + \int_0^t \exp(A(t - \tau))\,(a(\tau) + \xi(\tau))\,d\tau$

As before we assume $a(t) = 0$

We even assume $\xi(\tau) = 0$ (avoids messy math again!)

$x_A(t) = \exp(At)\,x_0$

Kernel contribution only due to A

Convergence Criterion

Discrete Case:

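Let $A$, $B$ and $W$ be linear operators

The matrix norms obey $0 \leq \|A\|, \|B\| \leq \Lambda$

For suitable $\lambda$ with $e^{\lambda} > \Lambda^2$ and $W \succeq 0$ we define

$M := \sum_{t=0}^{\infty} e^{-\lambda t} A^t W B^t$

Sylvester equation: $e^{-\lambda} A M B + W = M$

Continuous Case: We define

$M := \int_0^{\infty} e^{-\lambda t} \exp(At)^\top W \exp(Bt)\,dt$

Sylvester equation: $\left(A^\top - \frac{\lambda}{2}\mathbf{1}\right) M + M \left(B - \frac{\lambda}{2}\mathbf{1}\right) = -W$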
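The discrete case can be checked numerically. The sketch below solves $M = W + e^{-\lambda} A M B$ through the identity $\mathrm{vec}(AMB) = (B^\top \otimes A)\,\mathrm{vec}(M)$ and compares the result with a truncated version of the series. The vectorised solve is chosen for brevity; it costs more than the cubic-time solvers the talk refers to. The function names and example matrices are assumptions.

```python
import numpy as np

def solve_discrete_sylvester(A, B, W, lam):
    """Solve M = W + e^{-lam} A M B using column-major vectorisation."""
    n, m = W.shape
    K = np.kron(B.T, A)                                   # vec(A M B) = K vec(M)
    lhs = np.eye(n * m) - np.exp(-lam) * K
    vec_m = np.linalg.solve(lhs, W.flatten(order="F"))
    return vec_m.reshape((n, m), order="F")

def series_sum(A, B, W, lam, terms=500):
    """Truncated series sum_t e^{-lam t} A^t W B^t."""
    M = np.zeros_like(W)
    At, Bt = np.eye(A.shape[0]), np.eye(B.shape[0])
    for t in range(terms):
        M += np.exp(-lam * t) * At @ W @ Bt
        At, Bt = At @ A, Bt @ B
    return M

A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[0.4, -0.1], [0.1, 0.6]])
W = np.eye(2)
lam = 0.5

M_direct = solve_discrete_sylvester(A, B, W, lam)
M_series = series_sum(A, B, W, lam)
residual = np.exp(-lam) * A @ M_direct @ B + W - M_direct
```

The two routes agree because multiplying the series by $e^{-\lambda} A (\cdot) B$ shifts every term by one time step, leaving only the $t = 0$ term $W$ unaccounted for.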

Gory Details

Contribution due to A:

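$p \sum_{t=0}^{\infty} e^{-\lambda t} \langle A^t x, \tilde{A}^t \tilde{x} \rangle := p \cdot x^\top \left[ \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top W \tilde{A}^t \right] \tilde{x} = p \cdot x^\top M \tilde{x}$

Contribution due to noise:

$p \sum_{t=0}^{\infty} \sum_{j,j'=0}^{t} e^{-\lambda t} \langle A^{t-j} \xi_j, \tilde{A}^{t-j'} \xi_{j'} \rangle = p \,\mathrm{tr}\left( C_\xi \left[ \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M \tilde{A}^t \right] \right) := p \,\mathrm{tr}(C_\xi \tilde{M})$

In the above equations $p$ is a normalizing term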

Delving Deeper

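More on $M$ and $\tilde{M}$:

The matrices $M$ and $\tilde{M}$ look like

$M = \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top W \tilde{A}^t$

$\tilde{M} = \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M \tilde{A}^t$

Sylvester Equation:

Both $M$ and $\tilde{M}$ satisfy the Sylvester equation

$e^{-\lambda} A^\top M \tilde{A} + W = M$ and $e^{-\lambda} A^\top \tilde{M} \tilde{A} + M = \tilde{M}$

Both can be solved in cubic time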

Discrete Kernel

Discrete Case:

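Putting it all together

$k((A, x), (\tilde{A}, \tilde{x})) = p \left[ x^\top M \tilde{x} + \mathrm{tr}(C_\xi \tilde{M}) \right]$

Note that $C_\xi$ is the covariance matrix of $\xi_t$

Can assume different noise models per time step

Initial Conditions:

Let $C$ be the covariance matrix of the initial conditions

If we set $x = \tilde{x}$ then

$k((A, x), (\tilde{A}, x)) = p \left[ \mathrm{tr}(CM) + \mathrm{tr}(C_\xi \tilde{M}) \right]$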

Continuous Kernel

Contribution due to A:

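Since we assumed $a(t) = \xi(t) = 0$ we get

$k((x, A), (\tilde{x}, \tilde{A})) = \lambda^{-1} \int_0^{\infty} e^{-\lambda t} \langle \exp(At)x, \exp(\tilde{A}t)\tilde{x} \rangle\,dt$

The Final Form:

The kernel can be expressed as

$k((x, A), (\tilde{x}, \tilde{A})) = \lambda^{-1} x^\top M \tilde{x}$

where $M$ solves

$\left(A^\top - \frac{\lambda}{2}\mathbf{1}\right) M + M \left(\tilde{A} - \frac{\lambda}{2}\mathbf{1}\right) = -W$

Solution in cubic time by solving Sylvester equation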
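The continuous case can be sanity-checked in the same spirit. Integrating $\frac{d}{dt}\left[e^{-\lambda t}\exp(At)^\top W \exp(\tilde{A}t)\right]$ from $0$ to $\infty$ gives $\left(A^\top - \frac{\lambda}{2}\mathbf{1}\right)M + M\left(\tilde{A} - \frac{\lambda}{2}\mathbf{1}\right) = -W$, which is the equation solved below. The sketch assumes stable example matrices, a small hand-written matrix exponential (scaling and squaring with a Taylor series) and trapezoidal quadrature for the comparison. All names and numbers are illustrative.

```python
import numpy as np

def expm(X, squarings=8, order=12):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    Y = X / (2.0 ** squarings)
    term = np.eye(X.shape[0])
    out = np.eye(X.shape[0])
    for k in range(1, order + 1):
        term = term @ Y / k
        out = out + term
    for _ in range(squarings):
        out = out @ out
    return out

def solve_continuous_sylvester(A, A_tilde, W, lam):
    """Solve (A^T - lam/2 I) M + M (A_tilde - lam/2 I) = -W by vectorisation."""
    n = A.shape[0]
    eye = np.eye(n)
    P = A.T - 0.5 * lam * eye
    Q = A_tilde - 0.5 * lam * eye
    # vec(P M) = (I kron P) vec(M) and vec(M Q) = (Q^T kron I) vec(M)
    K = np.kron(eye, P) + np.kron(Q.T, eye)
    vec_m = np.linalg.solve(K, -W.flatten(order="F"))
    return vec_m.reshape((n, n), order="F")

def quadrature_M(A, A_tilde, W, lam, dt=0.002, t_max=15.0):
    """Trapezoidal approximation of the defining integral, for comparison only."""
    step_a, step_b = expm(A * dt), expm(A_tilde * dt)
    Ea, Eb = np.eye(A.shape[0]), np.eye(A.shape[0])
    steps = int(round(t_max / dt))
    M = np.zeros_like(W)
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        M += weight * dt * np.exp(-lam * i * dt) * Ea.T @ W @ Eb
        Ea, Eb = step_a @ Ea, step_b @ Eb
    return M

A = np.array([[-1.0, 0.5], [0.0, -2.0]])
A_tilde = np.array([[-1.5, 0.0], [0.3, -0.5]])
W = np.eye(2)
lam = 1.0
x = np.array([1.0, 2.0])
x_tilde = np.array([0.5, -1.0])

M = solve_continuous_sylvester(A, A_tilde, W, lam)
M_quad = quadrature_M(A, A_tilde, W, lam)
kernel_value = float(x @ M @ x_tilde) / lam
```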

Special Cases

Snapshot:

If we consider only the snapshot at time instance T

$k((x, A), (\tilde{x}, \tilde{A})) = \lambda^{-1} x^\top \exp(AT)^\top W \exp(\tilde{A}T)\,\tilde{x}$

Initial Conditions:

Fix $A = \tilde{A}$

Now we just solve

$M = -\frac{1}{2} \left(A - \frac{\lambda}{2}\mathbf{1}\right)^{-1} W$

Dynamical Systems:

Fix $x = \tilde{x}$ to get $k(A, \tilde{A}) = \lambda^{-1}\,\mathrm{tr}(MC)$

Here C is the covariance matrix of initial conditions

Graph Kernels

Graph Laplacian:

Let $E$ be the adjacency matrix and $D := \mathrm{diag}(E\mathbf{1})$

$L := E - D$ and $\tilde{L} := D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$

Diffusion Process:

We can define a diffusion process by

$\frac{d}{dt} x(t) = L x(t)$

Diffusion Kernel (Kondor and Lafferty, 2002):

If we measure overlap at time instance $T$ we get

$K = \exp(LT)^\top \exp(LT)$

$K_{ij}$ is the probability that the same state $l$ is reached from $i$ and from $j$

Graph Kernels

Undirected Graphs (Kondor and Lafferty, 2002):

Here L is symmetric and hence yields

K = exp(2LT )
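A small sketch of the undirected case: for a symmetric $L$ the overlap $\exp(LT)^\top \exp(LT)$ collapses to $\exp(2LT)$. The three-node path graph and the value of $T$ are arbitrary illustrative choices, and the matrix exponential is computed through an eigendecomposition, which is valid here because $L$ is symmetric.

```python
import numpy as np

# Adjacency matrix of a 3-node path graph (hypothetical example).
E = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
D = np.diag(E.sum(axis=1))
L = E - D                      # sign convention used in the talk

def expm_sym(S, t):
    """exp(t S) for symmetric S via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(t * w)) @ V.T

T = 0.7
P = expm_sym(L, T)
K_overlap = P.T @ P            # exp(LT)^T exp(LT)
K_closed = expm_sym(L, 2.0 * T)  # exp(2LT)
```

Since every eigenvalue of $\exp(2LT)$ is an exponential, the resulting kernel matrix is positive definite.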

Labeled Graphs (Gärtner, 2002):

If W acts as an indicator for node labels

Say Wij = 1 if two nodes have same label

For other fancy weights see (Kashima et al., 2003)

Averaged Graph Laplacian:

If we average over a range of T values

Roadmap

Why kernels?

Dynamical Textures

ARMA Models

ARMA Model:

An auto-regressive moving average model is

x(t + 1) = Ax(t) + B v(t)

y(t) = φ(x(t)) + w(t)

x(t) is a hidden variable

v(t) and w(t) are IID random noise

Linear Gaussian Model:

If φ is linear and the noise is white Gaussian:

$x(t+1) = A x(t) + v(t)$, $v(t) \sim N(0, Q)$

$y(t) = C x(t) + w(t)$, $w(t) \sim N(0, R)$

Fix scaling by demanding that $C^\top C = 1$
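The linear Gaussian model can be simulated in a few lines. In this sketch the dimensions, the matrices and the use of a QR factorisation to obtain a $C$ with $C^\top C = 1$ are illustrative assumptions, not values from the talk.

```python
import numpy as np

def simulate_lds(A, C, Q, R, x0, tau, rng):
    """Draw tau steps from x(t+1) = A x(t) + v(t), y(t) = C x(t) + w(t)."""
    n, m = A.shape[0], C.shape[0]
    X = np.zeros((n, tau))
    Y = np.zeros((m, tau))
    x = np.array(x0, dtype=float)
    for t in range(tau):
        X[:, t] = x
        Y[:, t] = C @ x + rng.multivariate_normal(np.zeros(m), R)   # w(t) ~ N(0, R)
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)         # v(t) ~ N(0, Q)
    return X, Y

rng = np.random.default_rng(0)
n, m, tau = 2, 5, 50
A = np.array([[0.9, -0.2], [0.2, 0.9]])
C, _ = np.linalg.qr(rng.standard_normal((m, n)))   # orthonormal columns: C^T C = 1
Q = 0.01 * np.eye(n)
R = 0.01 * np.eye(m)

X, Y = simulate_lds(A, C, Q, R, x0=np.ones(n), tau=tau, rng=rng)
```

Each column of `Y` plays the role of one observed frame and each column of `X` is the corresponding hidden state.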

Dynamic Textures

Image Model:

$y(t) \in \mathbb{R}^m$ are the observed noisy images

$x(t) \in \mathbb{R}^n$ ($n < m$) are hidden variables

Modeling:

A sequence of images $\{y(1), \ldots, y(\tau)\}$ is observed

Ideally we want to solve

$A(\tau), C(\tau), Q(\tau), R(\tau) = \arg\max_{A,C,Q,R}\; p(y(1), \ldots, y(\tau))$

Exact Solution:

n4sid in MATLAB solves above problem

Does not scale well if $m$ is large

Impractical for images where $m \sim 10^5$

Approximate Solution

Problem To Solve:

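For any variable $z(t)$ define $Z_i^\tau := [z(i), \ldots, z(\tau)]$

We are solving

$Y_1^\tau = C X_1^\tau + W_1^\tau$ with $C^\top C = 1$

Solving By SVD:

Solving for $\arg\min_{C, X_1^\tau} \|W\|$ yields

$C(\tau) = U$ and $X(\tau) = \Sigma V^\top$ where $Y_1^\tau = U \Sigma V^\top$

Solving for $\arg\min_A \|X_2^\tau - A X_1^{\tau-1}\|$ yields

$A(\tau) = \Sigma V^\top D_1 V (V^\top D_2 V)^{-1} \Sigma^{-1}$

Here $D_1 = \begin{bmatrix} 0 & 0 \\ 1_{\tau-1} & 0 \end{bmatrix}$ and $D_2 = \begin{bmatrix} 1_{\tau-1} & 0 \\ 0 & 0 \end{bmatrix}$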
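The SVD recipe can be sketched as follows. The closed form for $A(\tau)$ is checked against an ordinary least-squares fit of $X_2^\tau \approx A X_1^{\tau-1}$. Truncating the SVD to rank `n` and using random data in place of real frames are assumptions made for illustration, and `fit_dynamic_texture` is a hypothetical helper name.

```python
import numpy as np

def fit_dynamic_texture(Y, n):
    """Estimate (A, C, X) from a frame matrix Y of shape (m, tau) with a rank-n SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                          # C(tau) = U
    Sigma = np.diag(s[:n])
    V = Vt[:n, :].T                       # shape (tau, n)
    X = Sigma @ V.T                       # X(tau) = Sigma V^T
    tau = Y.shape[1]
    D1 = np.zeros((tau, tau))
    D1[1:, :-1] = np.eye(tau - 1)         # [[0, 0], [1_{tau-1}, 0]]
    D2 = np.zeros((tau, tau))
    D2[:-1, :-1] = np.eye(tau - 1)        # [[1_{tau-1}, 0], [0, 0]]
    A = Sigma @ V.T @ D1 @ V @ np.linalg.inv(V.T @ D2 @ V) @ np.linalg.inv(Sigma)
    return A, C, X

rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 12))         # stand-in for 12 frames of 20 pixels each
A, C, X = fit_dynamic_texture(Y, n=3)

# Direct least-squares solution of X[:, 1:] ~ A X[:, :-1] for comparison.
A_ls = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
```

The selector matrices $D_1$ and $D_2$ only pick out shifted and unshifted columns of $X$, so the closed form reduces to $X_2 X_1^\top (X_1 X_1^\top)^{-1}$, the usual least-squares estimate.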

Dynamic Texture Kernel

Kernel Definition:

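Estimate model and compute kernels between models

If we average out the noise then for some $W \succeq 0$

$k((x_0, A, C), (x_0', A', C')) := E\left[ \sum_{t=1}^{\infty} e^{-\lambda t} y_t^\top W y_t' \right]$

Kernel Computation:

The kernel can be computed as

$k = x_0^\top M x_0' + (e^{\lambda} - 1)^{-1}\,\mathrm{tr}\left[ Q \tilde{M} + W R \right]$

The matrices $M$ and $\tilde{M}$ satisfy

$M = e^{-\lambda} A^\top C^\top W C' A' + e^{-\lambda} A^\top M A'$

$\tilde{M} = C^\top W C' + e^{-\lambda} A^\top \tilde{M} A'$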
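A hedged sketch of the kernel computation follows. It solves the two equations for $M$ and $\tilde{M}$ by vectorisation, using $\mathrm{vec}(A^\top M A') = (A'^\top \otimes A^\top)\,\mathrm{vec}(M)$, and compares the noise-free part with a truncated version of the defining sum. Equal state dimensions for both models, the example matrices and the helper names are assumptions. The vectorised solve is used for brevity and is not the cubic-time method mentioned in the talk.

```python
import numpy as np

def solve_discrete_sylvester(A, Ap, S, lam):
    """Solve M = S + e^{-lam} A^T M A' by column-major vectorisation."""
    n, n_p = S.shape
    K = np.kron(Ap.T, A.T)
    lhs = np.eye(n * n_p) - np.exp(-lam) * K
    vec_m = np.linalg.solve(lhs, S.flatten(order="F"))
    return vec_m.reshape((n, n_p), order="F")

def dynamic_texture_kernel(x0, A, C, x0p, Ap, Cp, W, Q, R, lam):
    """k = x0^T M x0' + (e^lam - 1)^{-1} tr[Q M_tilde + W R]."""
    G = C.T @ W @ Cp
    M_tilde = solve_discrete_sylvester(A, Ap, G, lam)
    M = solve_discrete_sylvester(A, Ap, np.exp(-lam) * A.T @ G @ Ap, lam)
    noise = (np.trace(Q @ M_tilde) + np.trace(W @ R)) / (np.exp(lam) - 1.0)
    return float(x0 @ M @ x0p) + noise

def series_kernel(x0, A, C, x0p, Ap, Cp, W, lam, terms=400):
    """Noise-free truncated sum of e^{-lam t} y_t^T W y'_t for t = 1..terms."""
    total = 0.0
    u, v = np.array(x0, dtype=float), np.array(x0p, dtype=float)
    for t in range(1, terms + 1):
        u, v = A @ u, Ap @ v
        total += np.exp(-lam * t) * float((C @ u) @ W @ (Cp @ v))
    return total

rng = np.random.default_rng(1)
m, n = 4, 2
A = np.array([[0.8, -0.1], [0.1, 0.7]])
Ap = np.array([[0.6, 0.2], [0.0, 0.9]])
C, _ = np.linalg.qr(rng.standard_normal((m, n)))
Cp, _ = np.linalg.qr(rng.standard_normal((m, n)))
W = np.eye(m)
Q, R = 0.1 * np.eye(n), 0.05 * np.eye(m)
x0, x0p = np.array([1.0, -1.0]), np.array([0.5, 2.0])
lam = 0.5

k_full = dynamic_texture_kernel(x0, A, C, x0p, Ap, Cp, W, Q, R, lam)
k_noise_free = dynamic_texture_kernel(x0, A, C, x0p, Ap, Cp, W, 0 * Q, 0 * R, lam)
k_series = series_kernel(x0, A, C, x0p, Ap, Cp, W, lam)
```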

Experimental Setup

Typical Textures:

Some sample textures

A long clip was cut to shorter clips of 120 frames each

Freak Textures:

We also collected some freak textures

Results

Kernel Induced Metric:

Clips closer on an axis are from the same master clip

We plot the kernel induced metric for λ = 0.9 and 0.1

Results fairly independent of the choice of λ

Notice the block diagonal structure of the metric matrix

Roadmap

Why kernels?

Dynamical Textures

Conclusion

A new method to embed dynamical systems

Analytical solutions for linear systems

Many graph kernels are special cases

Analytical solutions require cubic time

Are better solutions possible for special cases?

Extensions to nonlinear systems?

Application to dynamical textures

Works with approximate model parameters

Picks out clips from the same master clip

Close relations to rational kernels of Cortes et al.

More information at http://mlg.anu.edu.au/~vishy

Questions?
