Computational Multilinear Algebra
Charles F. Van Loan
Supported in part by the NSF contract CCR-9901988.
Involves Working With Kronecker Products
  [ b11 b12 ]     [ c11 c12 c13 ]
  [ b21 b22 ]  ⊗  [ c21 c22 c23 ]  =
                  [ c31 c32 c33 ]

  [ b11c11 b11c12 b11c13 | b12c11 b12c12 b12c13 ]
  [ b11c21 b11c22 b11c23 | b12c21 b12c22 b12c23 ]
  [ b11c31 b11c32 b11c33 | b12c31 b12c32 b12c33 ]
  [ b21c11 b21c12 b21c13 | b22c11 b22c12 b22c13 ]
  [ b21c21 b21c22 b21c23 | b22c21 b22c22 b22c23 ]
  [ b21c31 b21c32 b21c33 | b22c31 b22c32 b22c33 ]
“Replicated Block Structure”
Properties
Quite predictable:

  (B ⊗ C)^T = B^T ⊗ C^T
  (B ⊗ C)^{-1} = B^{-1} ⊗ C^{-1}
  (B ⊗ C)(D ⊗ F) = BD ⊗ CF
  B ⊗ (C ⊗ D) = (B ⊗ C) ⊗ D

Think twice: B ⊗ C ≠ C ⊗ B in general. Instead,

  B ⊗ C = (Perfect Shuffle)(C ⊗ B)(Perfect Shuffle)^T
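These identities are easy to confirm numerically; a small sketch (numpy is an assumption here, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((2, 2))
C = rng.standard_normal((3, 3))
D = rng.standard_normal((2, 2))
F = rng.standard_normal((3, 3))

# Transpose and inverse distribute over the Kronecker factors
assert np.allclose(np.kron(B, C).T, np.kron(B.T, C.T))
assert np.allclose(np.linalg.inv(np.kron(B, C)),
                   np.kron(np.linalg.inv(B), np.linalg.inv(C)))

# Mixed-product rule: (B ⊗ C)(D ⊗ F) = BD ⊗ CF
assert np.allclose(np.kron(B, C) @ np.kron(D, F), np.kron(B @ D, C @ F))

# Associativity: B ⊗ (C ⊗ D) = (B ⊗ C) ⊗ D
assert np.allclose(np.kron(B, np.kron(C, D)), np.kron(np.kron(B, C), D))

# But the Kronecker product does not commute in general
assert not np.allclose(np.kron(B, C), np.kron(C, B))
```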
The Perfect Shuffle S_{p,q}

S_{3,4} applied to the vector (0, 1, 2, ..., 11)^T:

  S_{3,4} [0 1 2 3 4 5 6 7 8 9 10 11]^T = [0 4 8 1 5 9 2 6 10 3 7 11]^T

Equivalently, viewing the vectors as column-major arrays:

  [ 0 4 8  ]                 [ 0 1 2  3  ]
  [ 1 5 9  ]   --S_{3,4}-->  [ 4 5 6  7  ]
  [ 2 6 10 ]                 [ 8 9 10 11 ]
  [ 3 7 11 ]
Takes the length-pq “card deck” x, splits it into p piles of length q each, and then takes one card from each pile in turn until the deck is reassembled.
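In code, the card-deck description is just a reshape-transpose-flatten; a minimal numpy sketch (the helper name `perfect_shuffle` is mine, not from the slides):

```python
import numpy as np

def perfect_shuffle(p, q, x):
    """Apply the perfect shuffle S_{p,q} to a length-pq vector:
    split x into p piles of length q, then deal one card from
    each pile in turn."""
    # Pile i occupies x[i*q:(i+1)*q]; reading the p-by-q pile array
    # column by column deals one card from each pile per column.
    return np.asarray(x).reshape(p, q).T.ravel()

# The S_{3,4} example from the slide
assert (perfect_shuffle(3, 4, np.arange(12))
        == [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]).all()
```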
  C ⊗ B = S_{m1,m2} (B ⊗ C) S_{n1,n2}^T     B ∈ IR^{m1×n1}, C ∈ IR^{m2×n2}
E.g., with B 2-by-2 and C 2-by-3:

A = B ⊗ C =

  [ b11c11 b11c12 b11c13 | b12c11 b12c12 b12c13 ]
  [ b11c21 b11c22 b11c23 | b12c21 b12c22 b12c23 ]
  [ b21c11 b21c12 b21c13 | b22c11 b22c12 b22c13 ]
  [ b21c21 b21c22 b21c23 | b22c21 b22c22 b22c23 ]

S_{2,2} A S_{2,3}^T = C ⊗ B =

  [ c11b11 c11b12 | c12b11 c12b12 | c13b11 c13b12 ]
  [ c11b21 c11b22 | c12b21 c12b22 | c13b21 c13b22 ]
  [ c21b11 c21b12 | c22b11 c22b12 | c23b11 c23b12 ]
  [ c21b21 c21b22 | c22b21 c22b22 | c23b21 c23b22 ]
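The shuffle relation can be checked with explicit permutation matrices; `shuffle_matrix` below is a hypothetical helper (it builds S_{p,q} by reordering the rows of the identity), not from the slides:

```python
import numpy as np

def shuffle_matrix(p, q):
    """The pq-by-pq perfect shuffle permutation matrix S_{p,q}."""
    return np.eye(p * q)[np.arange(p * q).reshape(p, q).T.ravel()]

rng = np.random.default_rng(0)
B = rng.standard_normal((2, 3))    # B in IR^{m1 x n1}, m1 = 2, n1 = 3
C = rng.standard_normal((3, 2))    # C in IR^{m2 x n2}, m2 = 3, n2 = 2

# C ⊗ B = S_{m1,m2} (B ⊗ C) S_{n1,n2}^T
lhs = np.kron(C, B)
rhs = shuffle_matrix(2, 3) @ np.kron(B, C) @ shuffle_matrix(3, 2).T
assert np.allclose(lhs, rhs)
```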
Inheriting Structure
If B and C are

  { nonsingular, lower (upper) triangular, banded, symmetric,
    positive definite, stochastic, Toeplitz, permutations, orthogonal }

then B ⊗ C is

  { nonsingular, lower (upper) triangular, block banded, symmetric,
    positive definite, stochastic, block Toeplitz, a permutation, orthogonal }
Factorizations of B ⊗ C

Obvious...

  B ⊗ C = (P_B^T L_B U_B) ⊗ (P_C^T L_C U_C) = (P_B ⊗ P_C)^T (L_B ⊗ L_C)(U_B ⊗ U_C)

  B ⊗ C = (G_B G_B^T) ⊗ (G_C G_C^T) = (G_B ⊗ G_C)(G_B ⊗ G_C)^T

  B ⊗ C = (Q_B R_B) ⊗ (Q_C R_C) = (Q_B ⊗ Q_C)(R_B ⊗ R_C)

Sort of...

  B ⊗ C = (Q_B Λ_B Q_B^T) ⊗ (Q_C Λ_C Q_C^T) = (Q_B ⊗ Q_C)(Λ_B ⊗ Λ_C)(Q_B ⊗ Q_C)^T

  B ⊗ C = (U_B Σ_B V_B^T) ⊗ (U_C Σ_C V_C^T) = (U_B ⊗ U_C)(Σ_B ⊗ Σ_C)(V_B ⊗ V_C)^T

Problematic...

  B ⊗ C = (Q_B R_B P_B^T) ⊗ (Q_C R_C P_C^T) = (Q_B ⊗ Q_C)(R_B ⊗ R_C)(P_B ⊗ P_C)^T
“Sort of”

The Kronecker product of two rectangular “diagonal” matrices is not diagonal:

  [ × 0 ]   [ × 0 0 ]   [ × 0 0 0 0 0 ]
  [ 0 × ] ⊗ [ 0 × 0 ] = [ 0 × 0 0 0 0 ]
  [ 0 0 ]   [ 0 0 × ]   [ 0 0 × 0 0 0 ]
            [ 0 0 0 ]   [ 0 0 0 0 0 0 ]
            [ 0 0 0 ]   [ 0 0 0 0 0 0 ]
                        [ 0 0 0 × 0 0 ]
                        [ 0 0 0 0 × 0 ]
                        [ 0 0 0 0 0 × ]
                        [ (rows 9-15 are zero) ]

but a row permutation P restores diagonal form:

  P ( [ × 0 ]   [ × 0 0 ] )   [ × 0 0 0 0 0 ]
    ( [ 0 × ] ⊗ [ 0 × 0 ] ) = [ 0 × 0 0 0 0 ]
    ( [ 0 0 ]   [ 0 0 × ] )   [ 0 0 × 0 0 0 ]
              ( [ 0 0 0 ] )   [ 0 0 0 × 0 0 ]
              ( [ 0 0 0 ] )   [ 0 0 0 0 × 0 ]
                              [ 0 0 0 0 0 × ]
                              [ (rows 7-15 are zero) ]
“Fast” Factoring
Suppose B ∈ IR^{m×m} and C ∈ IR^{n×n} are positive definite.

A = B ⊗ C is an mn-by-mn symmetric positive definite matrix.

Ordinarily, a Cholesky factorization GG^T of a matrix this size would cost O(m^3 n^3) flops.

But G_A = G_B ⊗ G_C, and computing G_B and G_C requires only O(m^3 + n^3) flops.
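Because the Cholesky factor of an SPD matrix is unique, the Kronecker product of the small factors is exactly the Cholesky factor of the big matrix; a sketch (the `random_spd` helper is mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(k):
    """A random k-by-k symmetric positive definite matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

B, C = random_spd(3), random_spd(4)

GA = np.linalg.cholesky(np.kron(B, C))                  # O(m^3 n^3) work
GB, GC = np.linalg.cholesky(B), np.linalg.cholesky(C)   # O(m^3 + n^3) work
assert np.allclose(GA, np.kron(GB, GC))                 # G_A = G_B ⊗ G_C
```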
“Fast” Solving
If B, C ∈ IR^{m×m}, then the m^2-by-m^2 system

  (B ⊗ C)x = f   ≡   C X B^T = F,   x = vec(X), f = vec(F)

can be solved in O(m^3) flops:

  C Y = F       (solve for Y)
  X B^T = Y     (solve for X)

via factorizations of B and C.
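The two-solve procedure can be sketched as follows (numpy, column-major `order='F'` to match the vec convention; `kron_solve` is a name I introduce):

```python
import numpy as np

def kron_solve(B, C, f):
    """Solve (B ⊗ C) x = f in O(m^3) flops via the equivalent
    matrix equation C X B^T = F, with x = vec(X), f = vec(F)."""
    m = B.shape[0]
    F = f.reshape((m, m), order='F')     # f = vec(F), column-major
    Y = np.linalg.solve(C, F)            # C Y = F
    X = np.linalg.solve(B, Y.T).T        # X B^T = Y  <=>  B X^T = Y^T
    return X.reshape(-1, order='F')      # x = vec(X)

rng = np.random.default_rng(2)
m = 4
B = rng.standard_normal((m, m)) + m * np.eye(m)
C = rng.standard_normal((m, m)) + m * np.eye(m)
f = rng.standard_normal(m * m)
assert np.allclose(np.kron(B, C) @ kron_solve(B, C, f), f)
```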
The vec Operation
  X ∈ IR^{m×n}  ⇒  vec(X) = [ X(:,1) ]
                            [ X(:,2) ]
                            [   ...  ]
                            [ X(:,n) ]

  Y = C X B^T  ⇒  vec(Y) = (B ⊗ C) vec(X)
Matrix Equations
Sylvester:

  F X + X G^T = C   ≡   (I_n ⊗ F + G ⊗ I_m) vec(X) = vec(C)

Lyapunov:

  F X + X F^T = C   ≡   (I_n ⊗ F + F ⊗ I_n) vec(X) = vec(C)

Generalized Sylvester:

  F_1 X G_1^T + F_2 X G_2^T = C   ≡   (G_1 ⊗ F_1 + G_2 ⊗ F_2) vec(X) = vec(C)

These can be solved in O(n^3) time via the (generalized) Schur decomposition. See Bartels and Stewart; Golub, Nash, and Van Loan; and Gardiner, Wette, Laub, Amato, and Moler.
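The vec identities themselves are easy to exercise; here is a naive O(n^6) Lyapunov solve that just forms the Kronecker system (fine for tiny n, and not the O(n^3) Schur approach the slide cites):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
F = rng.standard_normal((n, n)) - n * np.eye(n)   # shifted so eig(F)+eig(F) != 0
C = rng.standard_normal((n, n))

# Lyapunov equation F X + X F^T = C in vec form (column-major vec)
I = np.eye(n)
K = np.kron(I, F) + np.kron(F, I)
X = np.linalg.solve(K, C.reshape(-1, order='F')).reshape((n, n), order='F')
assert np.allclose(F @ X + X @ F.T, C)
```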
Talk Outline
How do we solve

  (1) min ‖ W((B ⊗ C)x − d) ‖          (weighted least squares)
  (2) U^T [B ⊗ C | d] V = Σ            (total least squares)
  (3) (A_p ⊗ ··· ⊗ A_1 − λI)x = b      (shifted linear systems)

given that these problems are easy:

  (1') min ‖ (B ⊗ C)x − d ‖
  (2') U^T (B ⊗ C) V = Σ
  (3') (A_p ⊗ ··· ⊗ A_1)x = b
The Rise of the Kronecker Product
• Thriving Application Areas
• Sparse Factorizations for fast linear transforms
• Tensoring Low-Dimension Techniques
Thriving application areas such as semidefinite programming...

Some sample problems...

  (X ⊗ X + A^T D A) u = f

  [ 0      A^T   I     ] [ Δx ]   [ r_d ]
  [ A      0     0     ] [ Δy ] = [ r_p ]
  [ Z ⊗ I  0     X ⊗ I ] [ Δz ]   [ r_c ]
See Alizadeh, Haeberly, and Overton (1998).
Symmetric Kronecker Products
For symmetric X ∈ IR^{n×n} and arbitrary B, C ∈ IR^{n×n} this operation is defined by

  (B ⊗ C) svec(X) = svec( (1/2)(C X B^T + B X C^T) )

where the “svec” operation is a normalized stacking of X’s subdiagonal columns, e.g.,

      [ x11 x12 x13 ]
  X = [ x21 x22 x23 ]  ⇒  svec(X) = [ x11  √2·x21  √2·x31  x22  √2·x32  x33 ]^T
      [ x31 x32 x33 ]

svec stacks the subdiagonal portion of X’s columns.
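A minimal sketch of svec (my helper, assuming the normalization above); the √2 scaling makes svec preserve the trace inner product, svec(X)^T svec(Y) = trace(XY) for symmetric X, Y:

```python
import numpy as np

def svec(X):
    """Stack the subdiagonal portion of each column of symmetric X,
    scaling the strictly subdiagonal entries by sqrt(2)."""
    n = X.shape[0]
    parts = []
    for j in range(n):
        col = X[j:, j].astype(float).copy()   # x_jj, x_{j+1,j}, ..., x_{n,j}
        col[1:] *= np.sqrt(2.0)               # normalize off-diagonal entries
        parts.append(col)
    return np.concatenate(parts)

# svec([[1,2],[2,3]]) = [1, 2*sqrt(2), 3]
assert np.allclose(svec(np.array([[1., 2.], [2., 3.]])),
                   [1.0, 2.0 * np.sqrt(2.0), 3.0])

# Trace inner product is preserved
X = np.array([[1., 3., 5.], [3., 5., 7.], [5., 7., 9.]])
Y = np.array([[2., 0., 1.], [0., 1., 4.], [1., 4., 5.]])
assert np.isclose(svec(X) @ svec(Y), np.trace(X @ Y))
```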
The Rise of the Kronecker Product Cont’d
Sparse Factorizations
Kronecker products are proving to be a very effective way to look at fast linear transforms.
A Sparse Factorization of the DFT Matrix
  n = 2^t

  F_n = A_t ··· A_1 P_n

  P_n = S_{2,n/2} (I_2 ⊗ S_{2,n/4}) ··· (I_{n/4} ⊗ S_{2,2})

  A_q = I_r ⊗ [ I_{L/2}   Ω_{L/2} ]     L = 2^q,  r = n/L
              [ I_{L/2}  −Ω_{L/2} ]

  Ω_{L/2} = diag(1, ω_L, ..., ω_L^{L/2−1}),   ω_L = exp(−2πi/L)
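A runnable sketch of this factorization (my code, not the slides'): P_n is realized directly as the bit-reversal permutation, and each butterfly A_q is applied blockwise rather than formed as a matrix.

```python
import numpy as np

def bit_reverse_perm(t):
    """Index permutation realizing P_n (bit reversal), n = 2^t."""
    n = 1 << t
    idx = np.zeros(n, dtype=int)
    for k in range(n):
        b, r = k, 0
        for _ in range(t):
            r = (r << 1) | (b & 1)
            b >>= 1
        idx[k] = r
    return idx

def fft_via_factors(x):
    """Cooley-Tukey: y = A_t ... A_1 P_n x."""
    n = x.size
    t = n.bit_length() - 1
    y = np.asarray(x, dtype=complex)[bit_reverse_perm(t)]   # x <- P_n x
    for q in range(1, t + 1):
        L = 1 << q
        w = np.exp(-2j * np.pi / L)
        Omega = w ** np.arange(L // 2)          # diag(1, w, ..., w^{L/2-1})
        y = y.reshape(n // L, L)                # r = n/L blocks of length L
        top = y[:, :L // 2] + Omega * y[:, L // 2:]
        bot = y[:, :L // 2] - Omega * y[:, L // 2:]
        y = np.hstack([top, bot]).reshape(n)    # x <- A_q x
    return y

x = np.random.default_rng(4).standard_normal(8)
assert np.allclose(fft_via_factors(x), np.fft.fft(x))
```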
Different FFTs Correspond to Different Factorizations of F_n
The Cooley-Tukey FFT is based on y = F_n x = A_t ··· A_1 P_n x:

  x ← P_n x
  for q = 1:t
     x ← A_q x
  end
  y ← x

The Gentleman-Sande FFT is based on y = F_n x = F_n^T x = P_n^T A_1^T ··· A_t^T x:

  for q = t:−1:1
     x ← A_q^T x
  end
  y ← P_n^T x
Matrix Transpose
If A ∈ IR^{m×n} and B = A^T, then vec(B) = S_{n,m} · vec(A). E.g., with A ∈ IR^{2×3}:

  [ a11 ]   [ 1 0 0 0 0 0 ] [ a11 ]
  [ a12 ]   [ 0 0 1 0 0 0 ] [ a21 ]
  [ a13 ] = [ 0 0 0 0 1 0 ] [ a12 ]
  [ a21 ]   [ 0 1 0 0 0 0 ] [ a22 ]
  [ a22 ]   [ 0 0 0 1 0 0 ] [ a13 ]
  [ a23 ]   [ 0 0 0 0 0 1 ] [ a23 ]

  [ a11 a21 ]     [ a11 a12 a13 ]^T
  [ a12 a22 ]  =  [ a21 a22 a23 ]
  [ a13 a23 ]
Multiple Pass Transpose
To compute B = A^T where A ∈ IR^{m×n}, factor

  S_{n,m} = Γ_t ··· Γ_1

and then execute

  a = vec(A)
  for k = 1:t
     a ← Γ_k a
  end
  Define B ∈ IR^{n×m} by vec(B) = a.

Different transpose algorithms correspond to different factorizations of S_{n,m}.
An Example
If m = pn, then S_{n,m} = Γ_2 Γ_1 = (I_p ⊗ S_{n,n})(S_{n,p} ⊗ I_n).

The first pass b^{(1)} = Γ_1 vec(A) corresponds to a block transposition:

      [ A_1 ]
  A = [ A_2 ]  →  B^{(1)} = [ A_1  A_2  A_3  A_4 ].
      [ A_3 ]
      [ A_4 ]

The second pass b^{(2)} = Γ_2 b^{(1)} carries out the transposition of the blocks:

  B^{(1)} = [ A_1  A_2  A_3  A_4 ]  →  B^{(2)} = [ A_1^T  A_2^T  A_3^T  A_4^T ].

Note that B^{(2)} = A^T.
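Both the two-pass factorization and the vec-transpose property can be checked with explicit shuffle matrices (`shuffle_matrix` is a hypothetical helper building S_{p,q} from the identity):

```python
import numpy as np

def shuffle_matrix(p, q):
    """The pq-by-pq perfect shuffle permutation matrix S_{p,q}."""
    return np.eye(p * q)[np.arange(p * q).reshape(p, q).T.ravel()]

n, p = 3, 4
m = p * n
# Two-pass factorization: S_{n,m} = (I_p ⊗ S_{n,n})(S_{n,p} ⊗ I_n)
G1 = np.kron(shuffle_matrix(n, p), np.eye(n))
G2 = np.kron(np.eye(p), shuffle_matrix(n, n))
assert np.allclose(G2 @ G1, shuffle_matrix(n, m))

# And S_{n,m} vec(A) = vec(A^T) for A in IR^{m x n}
A = np.random.default_rng(5).standard_normal((m, n))
assert np.allclose(shuffle_matrix(n, m) @ A.reshape(-1, order='F'),
                   A.T.reshape(-1, order='F'))
```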
The Rise of the Kronecker Product Cont’d
Tensoring Low-Dimension Techniques
Greater willingness to entertain problems of high dimension.
Tensoring Low Dimensional Ideas
Quadrature in one and three dimensions:

  ∫_a^b f(x) dx ≈ Σ_{i=1}^n w_i f(x_i) = w^T f(x)

  ∫_{a1}^{b1} ∫_{a2}^{b2} ∫_{a3}^{b3} f(x, y, z) dx dy dz
      ≈ Σ_{i=1}^{nx} Σ_{j=1}^{ny} Σ_{k=1}^{nz} w_i^{(x)} w_j^{(y)} w_k^{(z)} f(x_i, y_j, z_k)
      = (w^{(x)} ⊗ w^{(y)} ⊗ w^{(z)})^T f(x ⊗ y ⊗ z)
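A small sketch of the tensor-product rule (my example: Gauss-Legendre nodes on [−1,1]^3 and a separable integrand, so the exact value is a product of 1-D integrals):

```python
import numpy as np

# Tensor-product Gauss-Legendre quadrature on [-1,1]^3 for the separable
# integrand f(x,y,z) = exp(x)*cos(y)*z^2.
nx, ny, nz = 6, 6, 6
x, wx = np.polynomial.legendre.leggauss(nx)
y, wy = np.polynomial.legendre.leggauss(ny)
z, wz = np.polynomial.legendre.leggauss(nz)

# Kronecker product of the weight vectors pairs with the tensor grid
w = np.kron(wx, np.kron(wy, wz))
F = np.kron(np.exp(x), np.kron(np.cos(y), z**2))   # f on the grid (separable)
approx = w @ F

# Separable integrand => the 3-D integral is a product of 1-D integrals
exact = (np.e - 1/np.e) * (2*np.sin(1.0)) * (2.0/3.0)
assert np.isclose(approx, exact)
```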
Higher Dimensional KP Problems
1. Andre, Nowak, and Van Veen (1997).
In higher-order statistics the calculations involve the “cumulants” x ⊗ x ⊗ ··· ⊗ x. (Note that vec(xx^T) = x ⊗ x.)
2. de Lathauwer, de Moor, and Vandewalle (2000)
multilinear SVD, low rank approximation of tensors
Least Squares Problems that Involve Kronecker Products
Some Least Squares Problems
Least squares problems of the form

  min ‖ (B ⊗ C)x − b ‖

can be efficiently solved by computing the QR factorizations (or SVDs) of B and C. See Fausett and Fulton (1994) and Fausett, Fulton, and Hashish (1997).

Barrlund (1998) on surface fitting with splines...

  min ‖ (A_1 ⊗ A_2)x − f ‖  such that  (B_1 ⊗ B_2)x = g
Weighted Least Squares
  min ‖ W^{−1/2}((B ⊗ C)x − b) ‖_2   ≡   [ W           B ⊗ C ] [ r ]   [ b ]
                                         [ B^T ⊗ C^T   0     ] [ x ] = [ 0 ]

Compute the QR factorizations

  B = Q_B [ R_B ]      C = Q_C [ R_C ]
          [ 0   ]              [ 0   ]

The augmented system transforms to

  [ E11             E12   R_B ⊗ R_C ] [ r1 ]   [ b1 ]
  [ E21             E22   0         ] [ r2 ] = [ b2 ]
  [ R_B^T ⊗ R_C^T   0     0         ] [ x  ]   [ 0  ]

where

  [ E11 E12 ]
  [ E21 E22 ]

is a simple permutation of (Q_B ⊗ Q_C)^T W (Q_B ⊗ Q_C). Solve the E22 system via conjugate gradients, exploiting structure.
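A sanity check (small sizes of my choosing, solving the augmented system densely rather than by the QR/CG scheme above) that the augmented system does produce the weighted least squares minimizer:

```python
import numpy as np

rng = np.random.default_rng(6)
m1, n1, m2, n2 = 4, 2, 5, 3            # B: m1 x n1, C: m2 x n2 (tall)
B = rng.standard_normal((m1, n1))
C = rng.standard_normal((m2, n2))
A = np.kron(B, C)
M, N = A.shape
b = rng.standard_normal(M)
S = rng.standard_normal((M, M))
W = S @ S.T + M * np.eye(M)            # SPD weight matrix

# Augmented system  [W  A; A^T  0] [r; x] = [b; 0]
K = np.block([[W, A], [A.T, np.zeros((N, N))]])
rx = np.linalg.solve(K, np.concatenate([b, np.zeros(N)]))
x_aug = rx[M:]

# Normal equations for min || W^{-1/2}(Ax - b) ||_2
x_ne = np.linalg.solve(A.T @ np.linalg.solve(W, A), A.T @ np.linalg.solve(W, b))
assert np.allclose(x_aug, x_ne)
```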
When the blocks are Kronecker products...
  min ‖ [ B_1 ⊗ C_1 ] x − b ‖
      ‖ [ B_2 ⊗ C_2 ]       ‖_2

Compute a pair of GSVDs:

  B_1 = U_{1B} D_{1B} X_B^T     B_2 = U_{2B} D_{2B} X_B^T
  C_1 = U_{1C} D_{1C} X_C^T     C_2 = U_{2C} D_{2C} X_C^T

Then

  [ B_1 ⊗ C_1 ]   [ U_{1B} ⊗ U_{1C}   0               ] [ D_{1B} ⊗ D_{1C} ]
  [ B_2 ⊗ C_2 ] = [ 0                 U_{2B} ⊗ U_{2C} ] [ D_{2B} ⊗ D_{2C} ] (X_B ⊗ X_C)^T
Total Least Squares
LS:   min_{b+r ∈ ran(A)} ‖r‖_2^2,               A x_LS = b + r_opt

TLS:  min_{b+r ∈ ran(A+E)} ‖E‖_F^2 + ‖r‖_2^2,   (A + E_opt) x_TLS = b + r_opt

“Errors in Variables”
Total Least Squares Solution
To solve

  min_{b+r ∈ ran(A+E)} ‖E‖_F^2 + ‖r‖_2^2,   (A + E_opt) x_TLS = b + r_opt

compute the SVD of [A | b] ∈ IR^{m×(n+1)}:

  U^T [A | b] V = Σ

and set

  x_TLS = −V(1:n, n+1) / V(n+1, n+1)
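The recipe is a few lines of numpy (`tls_solve` is my name for it); on a consistent system the TLS solution coincides with the exact solution, which gives an easy check:

```python
import numpy as np

def tls_solve(A, b):
    """Total least squares via the SVD of the augmented matrix [A | b]:
    x = -V(1:n, n+1) / V(n+1, n+1)."""
    n = A.shape[1]
    V = np.linalg.svd(np.column_stack([A, b]))[2].T   # right singular vectors
    return -V[:n, n] / V[n, n]

# On a consistent system (b in ran(A)) TLS recovers the exact solution
rng = np.random.default_rng(7)
A = rng.standard_normal((8, 3))
x0 = np.array([1.0, -2.0, 0.5])
assert np.allclose(tls_solve(A, A @ x0), x0)
```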
If A is a Kronecker Product B ⊗ C...
We need the last column of V in U^T F V = Σ where

  F = [ B ⊗ C | b ]

First compute the SVDs of B and C:

  U_B^T B V_B = Σ_B     U_C^T C V_C = Σ_C

If

  U = U_B ⊗ U_C     V = [ V_B ⊗ V_C   0 ]
                        [ 0           1 ]

then

  U^T F V = [ Σ_B ⊗ Σ_C | g ] ≡ F̃   where g = U^T b

We need the smallest right singular vector of F̃.

Think Eigenvector

The right singular vectors of F̃ are the eigenvectors of

  C = F̃^T F̃ = [ Σ_B ⊗ Σ_C | g ]^T [ Σ_B ⊗ Σ_C | g ] = [ D    z ]
                                                       [ z^T  α ]

where

  D = Σ_B^T Σ_B ⊗ Σ_C^T Σ_C     (square and diagonal)
  z = (Σ_B ⊗ Σ_C)^T g
  α = g^T g
Eigenvectors/Eigenvalues of Bordered Matrices
      [ d1 0  0  0  z1 ]
      [ 0  d2 0  0  z2 ]
  C = [ 0  0  d3 0  z3 ]
      [ 0  0  0  d4 z4 ]
      [ z1 z2 z3 z4 α  ]

The eigenvalues of C are the zeros of

  f(µ) = µ − α + z^T (D − µI)^{−1} z = µ − α + Σ_{i=1}^N z_i^2 / (d_i − µ)

Can assume that the d_i are distinct and that the z_i are nonzero.

The zeros of f are separated by the d_i. Can show that the smallest root of f (a.k.a. the smallest eigenvalue of C) is between 0 and the smallest d_i.
From the eigenvalue equation

  [ D    z ] [ x  ]         [ x  ]
  [ z^T  α ] [ −1 ] = µ_min [ −1 ]

it follows that

  x = (D − µ_min I)^{−1} z

Thus the sought-after singular vector is just a unit vector in the direction

  [ (D − µ_min I)^{−1} z ]
  [ −1                   ]
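A sketch of the secular-equation step (my construction: random distinct d_i, nonzero z_i, and plain bisection on the interval left of the smallest d_i, where f is strictly increasing):

```python
import numpy as np

rng = np.random.default_rng(8)
d = np.sort(rng.uniform(1.0, 9.0, 5))    # distinct diagonal entries
z = rng.uniform(0.5, 1.5, 5)             # nonzero border entries
alpha = 10.0
Cb = np.block([[np.diag(d), z[:, None]], [z[None, :], np.array([[alpha]])]])

# Smallest eigenvalue = smallest root of f(mu) = mu - alpha + sum z_i^2/(d_i - mu);
# f is increasing on (-inf, min d_i), -inf at the left end, +inf at the right.
def f(mu):
    return mu - alpha + np.sum(z**2 / (d - mu))

lo = min(d.min(), alpha) - np.abs(z).sum() - 1.0   # Gershgorin-style lower bound
hi = d.min() - 1e-12
for _ in range(200):                               # bisection
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
mu_min = 0.5 * (lo + hi)
assert np.isclose(mu_min, np.linalg.eigvalsh(Cb)[0])

# The eigenvector is proportional to [ (D - mu_min I)^{-1} z ; -1 ]
v = np.append(z / (d - mu_min), -1.0)
assert np.allclose(Cb @ v, mu_min * v, atol=1e-6)
```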
Overall Solution Procedure
1. First compute the SVDs of B and C:

     U_B^T B V_B = Σ_B     U_C^T C V_C = Σ_C

2. Find the smallest root µ_min of f(µ) = µ − α + z^T (D − µI)^{−1} z.

3. Get the smallest singular vector w of [ Σ_B ⊗ Σ_C | g ].

4. Get the TLS solution from

     v = [ V_B ⊗ V_C   0 ] w
         [ 0           1 ]
A Shifted Kronecker Product System
Frequency Response
Suppose we wish to evaluate

  φ(µ) = c^T (A − µI)^{−1} d

for many different values of µ.

It pays to precompute the real Schur decomposition

  Q^T A Q = T = [ × × × × × ]
                [ 0 × × × × ]
                [ 0 × × × × ]
                [ 0 0 0 × × ]
                [ 0 0 0 0 × ]

Then

  φ(µ) = c^T Q (T − µI)^{−1} Q^T d     (O(n^2) per evaluation)
Solve (A_p ⊗ ··· ⊗ A_1 − λI) x = b.

First compute the real Schur decompositions A_k = Q_k T_k Q_k^T, k = 1:p.

Set Q = Q_p ⊗ ··· ⊗ Q_1. It follows that

  Q^T (A_p ⊗ ··· ⊗ A_1) Q = T_p ⊗ ··· ⊗ T_1 ≡ T

and we get

  (T_p ⊗ ··· ⊗ T_1 − λ I_N) y = c

where y ∈ IR^N and c ∈ IR^N (N = n_1···n_p) are defined by

  x = (Q_p ⊗ ··· ⊗ Q_1) y     c = (Q_p ⊗ ··· ⊗ Q_1)^T b.
Unshifted Case
The standard approach for solving

  (T_p ⊗ ··· ⊗ T_1) y = c,     T_i ∈ IR^{n_i×n_i}

is to recognize that

  T = Π_{i=1}^p ( I_{µ_i} ⊗ T_i ⊗ I_{ρ_i} )

where ρ_i = n_1···n_{i−1} and µ_i = n_{i+1}···n_p for i = 1:p (the factors commute). We then obtain...

  y ← c
  for i = 1:p
     y ← (I_{µ_i} ⊗ T_i ⊗ I_{ρ_i})^{−1} y
  end
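The commuting-factor identity can be verified directly for a small p = 3 example (my sketch; the factors are formed densely, which defeats the purpose but confirms the algebra):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(9)
n = [2, 3, 2]                                   # n_1, n_2, n_3
T = [rng.standard_normal((k, k)) for k in n]

# T_p ⊗ ... ⊗ T_1
big = reduce(lambda acc, Ti: np.kron(Ti, acc), T)

# Product of the commuting factors I_{mu_i} ⊗ T_i ⊗ I_{rho_i},
# rho_i = n_1...n_{i-1}, mu_i = n_{i+1}...n_p
N = int(np.prod(n))
P = np.eye(N)
for i, Ti in enumerate(T):
    rho = int(np.prod(n[:i]))
    mu = int(np.prod(n[i + 1:]))
    P = P @ np.kron(np.eye(mu), np.kron(Ti, np.eye(rho)))
assert np.allclose(big, P)
```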
The p = 2 case: (F ⊗ G − λI)x = b, with F 4-by-4 upper triangular:

  [ f11·G−λI_m  f12·G        f13·G        f14·G       ] [ y1 ]   [ c1 ]
  [ 0           f22·G−λI_m   f23·G        f24·G       ] [ y2 ] = [ c2 ]
  [ 0           0            f33·G−λI_m   f34·G       ] [ y3 ]   [ c3 ]
  [ 0           0            0            f44·G−λI_m  ] [ y4 ]   [ c4 ]

Solve the triangular system

  (f44·G − λI_m) y4 = c4

for y4. Substituting this into the above system reduces it to

  [ f11·G−λI_m  f12·G        f13·G       ] [ y1 ]   [ c̃1 ]
  [ 0           f22·G−λI_m   f23·G       ] [ y2 ] = [ c̃2 ]
  [ 0           0            f33·G−λI_m  ] [ y3 ]   [ c̃3 ]

where c̃_i = c_i − f_{i4}·G·y4, i = 1:3.
The General Triangular Case
It is recursive...
function y = KPShiftSolve(T, c, λ, n, α)

  n is a p-by-1 integer vector; T is a p-by-1 cell array where T{i} is an
  n_i-by-n_i upper triangular matrix for i = 1:p. Assume that c ∈ IR^N where
  N = n_1···n_p, and that λ and α are scalars such that
  A = α(T_p ⊗ ··· ⊗ T_1) − λI_N is nonsingular. y ∈ IR^N solves Ay = c.

  p ← length(n)
  if p == 1
     y ← (αT_1 − λI) \ c
  else
     m ← n_1···n_{p−1}
     for i = n_p : −1 : 1
        idx ← 1 + (i−1)m : im
        ᾱ ← α·T_p(i,i)
        y(idx) ← KPShiftSolve(T, c(idx), λ, n(1:p−1), ᾱ)
        z ← α·(T_{p−1} ⊗ ··· ⊗ T_1) y(idx)
        for j = 1 : i−1
           jdx ← 1 + (j−1)m : jm
           c(jdx) ← c(jdx) − T_p(j,i)·z
        end
     end
  end
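A runnable sketch of this recursion (my translation; the middle Kronecker product is formed explicitly for clarity, which gives up the flop savings of the triangular version):

```python
import numpy as np

def kp_shift_solve(T, c, lam, alpha=1.0):
    """Solve (alpha*(T[p-1] kron ... kron T[0]) - lam*I) y = c for upper
    triangular T[i], recursing on the block structure induced by T[p-1]."""
    p = len(T)
    if p == 1:
        return np.linalg.solve(alpha * T[0] - lam * np.eye(T[0].shape[0]), c)
    Tp = T[-1]
    m = c.size // Tp.shape[0]
    c = c.astype(float).copy()
    y = np.zeros_like(c)
    K = T[0]
    for Ti in T[1:-1]:                     # K = T[p-2] kron ... kron T[0]
        K = np.kron(Ti, K)
    for i in range(Tp.shape[0] - 1, -1, -1):
        blk = slice(i * m, (i + 1) * m)
        # Diagonal block: alpha*Tp(i,i)*K - lam*I -> recursive solve
        y[blk] = kp_shift_solve(T[:-1], c[blk], lam, alpha * Tp[i, i])
        z = alpha * (K @ y[blk])
        for j in range(i):                 # update the blocks above
            c[j * m:(j + 1) * m] -= Tp[j, i] * z
    return y

rng = np.random.default_rng(10)
T = [np.triu(rng.standard_normal((k, k))) + 3 * np.eye(k) for k in (2, 3, 2)]
c = rng.standard_normal(12)
lam = 0.3
y = kp_shift_solve(T, c, lam)
A = np.kron(T[2], np.kron(T[1], T[0])) - lam * np.eye(12)
assert np.allclose(A @ y, c)
```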
Two-by-Two Bumps
  [ f11·G−λI_m  f12·G        f13·G        f14·G       ] [ y1 ]   [ c1 ]
  [ 0           f22·G−λI_m   f23·G        f24·G       ] [ y2 ] = [ c2 ]
  [ 0           0            f33·G−λI_m   f34·G       ] [ y3 ]   [ c3 ]
  [ 0           0            f43·G        f44·G−λI_m  ] [ y4 ]   [ c4 ]

Either solve for y3 and y4 at the same time, or revert to localized complex arithmetic:

  Z^H [ f33 f34 ] Z = [ t33 t34 ]
      [ f43 f44 ]     [ 0   t44 ]
High Order Kronecker Products
  y = (F_p ⊗ ··· ⊗ F_1) x,     F_i ∈ IR^{n_i×n_i}

It is convenient to make use of the factorization

  F_p ⊗ ··· ⊗ F_1 = M_p ··· M_1

where

  M_i = Π^T_{n_i, N/n_i} (I_{N/n_i} ⊗ F_i),     N = n_1···n_p

Matlab:

  Z = x;
  for i = 1:p
     Z = (F_i * reshape(Z, n_i, N/n_i)).';
  end
  y = reshape(Z, N, 1);
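The same sweep in numpy (column-major `order='F'` reshapes stand in for Matlab's reshape; `kron_mult` is my name):

```python
import numpy as np

def kron_mult(Fs, x):
    """Compute y = (F_p ⊗ ... ⊗ F_1) x, Fs = [F_1, ..., F_p],
    via the reshape/transpose sweep (column-major, as in Matlab)."""
    N = x.size
    Z = np.asarray(x, dtype=float)
    for F in Fs:
        ni = F.shape[0]
        Z = (F @ Z.reshape((ni, N // ni), order='F')).T
    return Z.reshape(N, order='F')

rng = np.random.default_rng(11)
F1, F2, F3 = (rng.standard_normal((k, k)) for k in (2, 3, 4))
x = rng.standard_normal(24)
assert np.allclose(kron_mult([F1, F2, F3], x),
                   np.kron(F3, np.kron(F2, F1)) @ x)
```

This costs O(N · Σ n_i) flops instead of the O(N^2) of forming the Kronecker product explicitly.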
Conclusion
Our goal is to heighten the profile of the Kronecker product and develop an “infrastructure” of methods, thereby making it easier for the numerical linear algebra community to spot Kronecker “opportunities”.
With precedent...
• The development of effective algorithms for the QR and SVD factorizations turned many “A^T A” problems into least squares/singular value calculations.

• The development of the QZ algorithm (Moler and Stewart (1973)) for the problem Ax = λBx prompted a rethink of “standard” eigenproblems that have the form B^{−1}Ax = λx.