Lect. 3. Tensor-product interpolation. Introduction to MLA. (B. Khoromskij, Leipzig 2007)
Contents of Lecture 3
1. Best polynomial approximation.
2. Error bound for tensor-product interpolants.
- Polynomial interpolation.
- Sinc interpolation.
3. Data-sparse formats to represent high-order tensors.
- Tucker model.
- Canonical (PARAFAC) model.
- Two-level and mixed models.
4. Multi-linear algebra (MLA) with Kronecker-product data.
Chebyshev polynomials
By Eρ = Eρ(B), with the reference interval B := [−1, 1], we denote Bernstein's regularity ellipse (with foci at w = ±1 and the sum of semi-axes equal to ρ > 1),
Eρ := {w ∈ C : |w − 1| + |w + 1| ≤ ρ + ρ^{−1}}.
The Chebyshev polynomials, Tn(w), are defined recursively
T0(w) = 1, T1(w) = w,
Tn+1(w) = 2wTn(w) − Tn−1(w), n = 1, 2, . . . .
The representation Tn(x) = cos(n arccos x), x ∈ [−1, 1], implies Tn(1) = 1, Tn(−1) = (−1)^n. There holds
Tn(w) = (z^n + z^{−n})/2 with w = (z + 1/z)/2.
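A quick numerical check (an added illustration, not part of the original slides; the helper name chebyshev_T is mine) of the three-term recurrence against the cos-representation:

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_n(x) by the three-term recurrence."""
    t_prev, t_curr = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

# Check against the representation T_n(x) = cos(n arccos x) on [-1, 1].
x = np.linspace(-1.0, 1.0, 101)
for n in range(6):
    assert np.allclose(chebyshev_T(n, x), np.cos(n * np.arccos(x)))
```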
Best polynomial approximation by Chebyshev series
Thm. 3.1. Let F be analytic and bounded by M in Eρ (with
ρ > 1). Then the expansion
F(w) = C0 + 2 Σ_{n=1}^{∞} Cn Tn(w)   (1)
holds for all w ∈ Eρ (Chebyshev series), with
Cn = (1/π) ∫_{−1}^{1} F(w) Tn(w) / √(1 − w²) dw.
Moreover, |Cn| ≤ M/ρ^n, and for m = 1, 2, 3, . . . ,
|F(w) − C0 − 2 Σ_{n=1}^{m} Cn Tn(w)| ≤ (2M/(ρ − 1)) ρ^{−m}, w ∈ B.   (2)
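The coefficient integral can be checked numerically. A small sketch (my choice of test function F(w) = 1/(2 − w), which is analytic in Eρ for all ρ < 2 + √3; trapezoidal quadrature after substituting w = cos t):

```python
import numpy as np

# Substituting w = cos(t) in the coefficient integral gives
# C_n = (1/pi) * int_0^pi F(cos t) cos(n t) dt.
F = lambda w: 1.0 / (2.0 - w)     # analytic in E_rho for rho < 2 + sqrt(3)
t = np.linspace(0.0, np.pi, 4001)
C = [np.trapz(F(np.cos(t)) * np.cos(n * t), t) / np.pi for n in range(8)]
print(np.abs(C))                  # observe |C_n| ~ rho^{-n}, cf. Thm. 3.1
```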
Lagrangian polynomial interpolation
Let PN (B) be the set of polynomials of degree ≤ N on B.
Denote by (INF)(x) ∈ PN(B) the interpolation polynomial of F w.r.t. the Chebyshev–Gauss–Lobatto (CGL) nodes
ξj = cos(πj/N) ∈ B, j = 0, 1, . . . , N, with ξ0 = 1, ξN = −1,
where the ξj are the zeros of the polynomial (1 − x²) T′N(x), x ∈ B.
The Lagrangian interpolant INF has the form
INF := Σ_{j=0}^{N} F(ξj) lj(x) ∈ PN(B)   (3)
with lj(x) the Lagrange basis polynomials
lj := ∏_{k=0, k≠j}^{N} (x − ξk)/(ξj − ξk) ∈ PN(B), j = 0, . . . , N.
Clearly, (INF)(ξj) = F(ξj), since lj(ξj) = 1 and lj(ξk) = 0 ∀ k ≠ j.
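A direct (O(N²) per evaluation point) numerical realisation of (3) at the CGL nodes; a minimal sketch, the helper names and the test function are mine:

```python
import numpy as np

def cgl_nodes(N):
    """CGL nodes xi_j = cos(pi*j/N), j = 0..N."""
    return np.cos(np.pi * np.arange(N + 1) / N)

def lagrange_interp(F, N, x):
    """Evaluate (I_N F)(x) directly from formula (3)."""
    xi = cgl_nodes(N)
    result = np.zeros_like(x)
    for j in range(N + 1):
        lj = np.ones_like(x)
        for k in range(N + 1):
            if k != j:
                lj *= (x - xi[k]) / (xi[j] - xi[k])
        result += F(xi[j]) * lj
    return result

x = np.linspace(-1.0, 1.0, 7)
assert np.allclose(lagrange_interp(np.exp, 12, x), np.exp(x))
```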
Lebesgue constant for Chebyshev interpolation
Let {ξj}_{j=0}^{N} be the set of interpolation points on [−1, 1] and IN the associated Lagrangian interpolation operator. The approximation theory for polynomial interpolation involves the so-called Lebesgue constant ΛN > 1, defined by
‖INu‖∞,B ≤ ΛN ‖u‖∞,B ∀ u ∈ C(B).   (4)
In the case of Chebyshev interpolation it can be shown that ΛN grows at most logarithmically in N:
ΛN ≤ (2/π) log N + 1.
The interpolation points which produce the smallest value Λ*N of all ΛN are not known, but Bernstein '54 proves that
Λ*N = (2/π) log N + O(1).
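The growth of ΛN for the CGL nodes can be observed numerically. A brute-force sketch (sampling the Lebesgue function on a fine grid; the helper name is mine):

```python
import numpy as np

def lebesgue_constant(xi, samples=20001):
    """Estimate Lambda_N = max_x sum_j |l_j(x)| over [-1, 1]."""
    x = np.linspace(-1.0, 1.0, samples)
    s = np.zeros_like(x)
    for j in range(len(xi)):
        lj = np.ones_like(x)
        for k in range(len(xi)):
            if k != j:
                lj *= (x - xi[k]) / (xi[j] - xi[k])
        s += np.abs(lj)
    return s.max()

for N in (4, 8, 16, 32):
    xi = np.cos(np.pi * np.arange(N + 1) / N)
    print(N, lebesgue_constant(xi), 2.0 / np.pi * np.log(N) + 1.0)
```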
Error bound for polynomial interpolation
Thm. 3.2. Let u ∈ C∞[−1, 1] have an analytic extension to Eρ, bounded by M > 0 in Eρ (with ρ > 1). Then we have
‖u − INu‖∞,B ≤ (1 + ΛN) (2M/(ρ − 1)) ρ^{−N}, N ∈ N≥1.   (5)
Proof. Due to (2), one obtains for the best polynomial approximation to u on [−1, 1]
min_{v∈PN} ‖u − v‖∞,B ≤ (2M/(ρ − 1)) ρ^{−N}.
The interpolation operator IN is a projection, that is, for all
v ∈ PN we have INv = v.
Now choose v as the best approximation and apply the triangle inequality together with (4):
‖u − INu‖∞,B = ‖u − v − IN(u − v)‖∞,B ≤ (1 + ΛN) ‖u − v‖∞,B.
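A numerical illustration of the geometric convergence (5); the example function and the use of scipy's BarycentricInterpolator are my choices:

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

# u(x) = 1/(1 + 25 x^2) has poles at +-i/5, hence an analytic extension to
# E_rho only for rho < (1 + sqrt(26))/5; (5) then predicts errors ~ rho^{-N}.
u = lambda x: 1.0 / (1.0 + 25.0 * x * x)
x = np.linspace(-1.0, 1.0, 5001)
for N in (8, 16, 32, 64):
    xi = np.cos(np.pi * np.arange(N + 1) / N)
    err = np.max(np.abs(BarycentricInterpolator(xi, u(xi))(x) - u(x)))
    print(N, err)    # geometric decay in N
```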
Tensor-product polynomial interpolation
Consider a multi-variate function
f = f(x1, . . . , xd) : B^d → R, d ≥ 2,
defined on the box B^d = B1 × B2 × . . . × Bd with Bk = B = [−1, 1].
Define the N-th order tensor-product interpolation operator
INf = (I^1_N × I^2_N × . . . × I^d_N) f ∈ PN[B^d],
where I^k_N f denotes the interpolation polynomial w.r.t. xk at the nodes ξk ∈ Bk, k = 1, . . . , d.
We choose the CGL nodes, hence the interpolation points ξα ∈ B^d, α = (i1, . . . , id) ∈ N_0^d, are obtained by the Cartesian product of the 1D nodes,
ξα := (cos(πi1/N), . . . , cos(πid/N)).
Tensor-product polynomial interpolation (cont.)
Again, IN is the projection map,
IN : C(B^d) → P_N := {p1 × . . . × pd : pi ∈ PN, i = 1, . . . , d},
implying stability of IN in the multidimensional case, cf. (4),
‖INf‖∞,B^d ≤ Λ^d_N ‖f‖∞,B^d ∀ f ∈ C(B^d).   (6)
To derive an analogue of Thm. 3.2, introduce the product domain
E^(j)_ρ := B1 × . . . × Bj−1 × Eρ(Bj) × Bj+1 × . . . × Bd,
and denote by X−j the (d − 1)-dimensional set of the variables x1, . . . , xj−1, xj+1, . . . , xd, with xj ∈ Bj, j = 1, ..., d.
Tensor-product polynomial interpolation (cont.)
Assump. 3.1. Given f ∈ C∞(B^d), assume there is ρ > 1 s.t. for all j = 1, . . . , d and each fixed ξ ∈ X−j, there exists an analytic extension fj(xj, ξ) of f(xj, ξ) to Eρ(Bj) ⊂ C w.r.t. xj, bounded in Eρ(Bj) by a certain Mj > 0 independent of ξ.
Thm. 3.3. For f ∈ C∞(B^d), let Assump. 3.1 be satisfied. Then the interpolation error can be estimated by
‖f − INf‖∞,B^d ≤ Λ^d_N (2Mρ(f)/(ρ − 1)) ρ^{−N},   (7)
where ΛN is the Lebesgue constant for the 1D interpolants I^k_N, and
Mρ(f) := max_{1≤j≤d} max_{x∈E^(j)_ρ} |fj(x, ξ)|.
Tensor-product polynomial interpolation (cont.)
Proof. Multiple use of (4), (5) and the triangle inequality leads to
|f − INf| ≤ |f − I^1_N f| + |I^1_N (f − I^2_N × . . . × I^d_N f)|
  ≤ |f − I^1_N f| + |I^1_N (f − I^2_N f)| + |I^1_N I^2_N (f − I^3_N f)| + . . . + |I^1_N × . . . × I^{d−1}_N (f − I^d_N f)|
  ≤ [(1 + ΛN) max_{x∈E^(1)_ρ} |f1(x, ξ)| + ΛN (1 + ΛN) max_{x∈E^(2)_ρ} |f2(x, ξ)| + . . . + Λ^{d−1}_N (1 + ΛN) max_{x∈E^(d)_ρ} |fd(x, ξ)|] · (2/(ρ − 1)) ρ^{−N}
  ≤ ((1 + ΛN)(Λ^d_N − 1)/(ΛN − 1)) · (2Mρ/(ρ − 1)) ρ^{−N}.
Hence (7) follows, since for x > 1 the factor (1 + x)(x^n − 1)/(x − 1) behaves like x^n (cf. the analogous "≈" step in the Sinc-interpolation proof below).
Sinc-approximation of multi-variate functions
Consider the separable approximation in the case Ω = R.
Extension to the case Ω = R+ or Ω = (a, b) is possible.
The tensor-product Sinc interpolant CM w.r.t. the first d − 1 variables reads
CMf := (C^1_M × ... × C^{d−1}_M) f, f : R^d → R,
where C^ℓ_M f = C^ℓ_M(f, h), 1 ≤ ℓ ≤ d − 1, is the univariate Sinc interpolant
CM(f, h) = Σ_{k=−M}^{M} f(kh) Sk,h(x), Sk,h(x) := sin(π(x − kh)/h) / (π(x − kh)/h),
in xℓ ∈ Iℓ = R, with R^d = I1 × ... × Id.
Ex. 3.1. Examples of approximated functions (x, y ∈ R^d):
f(x) = |x|^α, f(x) = exp(−κ|x|)/|x|, f(x, y) = sinc(|x||y|).
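A univariate sketch of CM(f, h) (the test function f(x) = 1/cosh x is my choice; note np.sinc is the normalised sinc, so Sk,h(x) = np.sinc((x − kh)/h)):

```python
import numpy as np

def sinc_interp(f, M, x):
    """C_M(f, h) with h = log(M)/M; np.sinc(t) = sin(pi t)/(pi t)."""
    h = np.log(M) / M
    return sum(f(k * h) * np.sinc((x - k * h) / h) for k in range(-M, M + 1))

f = lambda x: 1.0 / np.cosh(x)    # analytic in a strip, exponentially decaying
x = np.linspace(-3.0, 3.0, 601)
for M in (8, 16, 32, 64):
    print(M, np.max(np.abs(sinc_interp(f, M, x) - f(x))))
```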
Sinc-approximation of multi-variate functions (cont.)
Error bound for the tensor-product Sinc interpolant.
The estimation of the error f − CMf requires the Lebesgue constant ΛM ≥ 1 defined by
‖CM(f, h)‖∞ ≤ ΛM ‖f‖∞ for all f ∈ C(R).   (8)
Stenger '93 proves the inequality
ΛM = max_{x∈R} Σ_{k=−M}^{M} |Sk,h(x)| ≤ (2/π)(3 + log M).   (9)
For each fixed ℓ ∈ {1, . . . , d − 1}, choose ζℓ ∈ Iℓ and define the remaining parameter set by
Yℓ := I1 × ... × Iℓ−1 × Iℓ+1 × ... × Id = R^{d−1}.
Sinc-approximation of multi-variate functions (cont.)
Introduce the univariate (parameter-dependent) function
Fℓ(·, y) : Iℓ → R, y ∈ Yℓ,
which is the restriction of f onto Iℓ (the remaining variables being frozen at y).
Thm. 3.4. (Hackbusch, Khoromskij) For each ℓ = 1, ..., d − 1 we
assume that for any fixed y ∈ Yℓ, Fℓ(·, y) satisfies
(a) Fℓ(·, y) ∈ H1(Dδ) with N(Fℓ, Dδ) ≤ Nℓ < ∞ uniformly in y;
(b) Fℓ(·, y) has hyper-exponential decay with a = 1, C, b > 0.
Then, for all y ∈ Yℓ, the “optimal” choice h := log M / M yields
|f − CM(f, h)| ≤ (C/(2πδ)) Λ^{d−2}_M max_{ℓ=1,...,d−1} Nℓ e^{−πδM/ log M}   (10)
with ΛM defined by (9).
Proof of the Sinc-interpolation error
The multiple use of (8) and the triangle inequality leads to
|f − CMf| ≤ |f − C^1_M f| + |C^1_M (f − C^2_M . . . C^{d−1}_M f)|
  ≤ |f − C^1_M f| + |C^1_M (f − C^2_M f)| + |C^1_M C^2_M (f − C^3_M f)| + . . . + |C^1_M . . . C^{d−2}_M (f − C^{d−1}_M f)|
  ≤ [N1 + ΛM N2 + . . . + Λ^{d−2}_M Nd−1] · (1/(2πδ)) e^{−πδM/ log M}
  ≤ ((1 + ΛM + ... + Λ^{d−2}_M)/(2πδ)) max_{ℓ=1,...,d−1} Nℓ e^{−πδM/ log M}.
Note that
(Λ^{d−1}_M − 1)/(ΛM − 1) = 1 + ΛM + ... + Λ^{d−2}_M ≈ Λ^{d−2}_M as ΛM → ∞,
hence (10) follows.
Data-sparse representation of high-order tensors
Def. 3.1. A d-th order tensor on I^d = I1 × ... × Id is an array
A := [a_{i1...id}] ∈ R^{I^d}, p, d, n ∈ N,
with multi-indices
iℓ = (iℓ,1, ..., iℓ,p) ∈ Iℓ = I × ... × I (p factors, I := {1, ..., n}), ℓ = 1, ..., d,
and iℓ,m ∈ {1, ..., n} for m = 1, ..., p (p = 1, 2, 3).
The L2 inner product of tensors induces the Frobenius norm,
⟨A, B⟩ := Σ_{(i1...id)∈I^d} a_{i1...id} b_{i1...id}, ‖A‖F := √⟨A, A⟩.
A ∈ R^{I^d} has |I^d| = n^{pd} entries.
How to remove d from the exponent?
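A numerical illustration of the storage problem and of the Frobenius inner product (toy sizes, my choice):

```python
import numpy as np

# Curse of dimensionality (p = 1): a full tensor needs n^d entries,
# while d canonical vectors need only d*n numbers.
n, d = 100, 6
print(f"full tensor: {n**d:.2e} entries, rank-1 data: {d * n} entries")

# Frobenius inner product and norm of small full tensors (n = 4, d = 3):
A, B = np.random.rand(4, 4, 4), np.random.rand(4, 4, 4)
print(np.sum(A * B), np.linalg.norm(A))    # <A,B> and ||A||_F
```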
Data-sparse representation of high-order tensors (cont.)
Key ingredient: representation by a sum of rank-1 tensors,
A = V(1) ×2 · · · ×d V(d), a_{i1...id} = v(1)_{i1} · · · v(d)_{id},
with low-dimensional (canonical) components V(ℓ) = [v(ℓ)_{iℓ}] ∈ R^{n^p}.
Complexity: dn^p.
Standard MLA has linear scaling in d.
Ex. 3.2. Let A = a1 × a2, B = b1 × b2 with ai, bi ∈ R^n (d = 2, p = 1). Then
⟨A, B⟩ = ⟨a1, b1⟩⟨a2, b2⟩, ‖A‖F = √(⟨a1, a1⟩⟨a2, a2⟩) = ‖a1‖ ‖a2‖,
where the latter corresponds to the Frobenius norm of a rank-1 matrix.
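A quick numpy check of this example (random vectors, my choice):

```python
import numpy as np

a1, a2 = np.random.rand(5), np.random.rand(5)
b1, b2 = np.random.rand(5), np.random.rand(5)
A, B = np.outer(a1, a2), np.outer(b1, b2)   # rank-1 tensors for d = 2, p = 1

assert np.isclose(np.sum(A * B), (a1 @ b1) * (a2 @ b2))
assert np.isclose(np.linalg.norm(A), np.linalg.norm(a1) * np.linalg.norm(a2))
```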
Rank-(r1, ..., rd) Tucker model
Tucker Model (T_r) (with orthonormalised sets {V(ℓ)_{kℓ}} ⊂ R^{Iℓ}):
A(r) = Σ_{k1=1}^{r1} ... Σ_{kd=1}^{rd} b_{k1...kd} V(1)_{k1} ×2 ... ×d V(d)_{kd} ∈ R^{I1×...×Id}.   (11)
The core tensor B = [b_k] ∈ R^{r1×...×rd} is not unique (determined only up to rotations).
Complexity (p = 1): r^d + drn ≪ n^d with r = max rℓ ≪ n.
[Figure: visualization of the Tucker model for d = 3 — the I1 × I2 × I3 tensor A is represented by the r1 × r2 × r3 core B contracted with the factor matrices V(1), V(2), V(3) of sizes Iℓ × rℓ.]
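A minimal reconstruction sketch of (11) for d = 3, p = 1 via einsum (random data, my choice; the QR factorisation only serves to orthonormalise the factors):

```python
import numpy as np

# Reconstruct A(r) from a random core and orthonormal factors, cf. (11).
n, r = 20, (3, 4, 5)
B = np.random.rand(*r)                                     # core tensor
V = [np.linalg.qr(np.random.rand(n, rk))[0] for rk in r]   # orthonormal V^(l)
A = np.einsum('abc,ia,jb,kc->ijk', B, V[0], V[1], V[2])
print(A.shape)   # (20, 20, 20): n^3 = 8000 entries from 60 + 240 numbers
```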
CANDECOMP/PARAFAC (CP) tensor format
CP Model (C_r). Approximate A by a sum of rank-1 tensors,
A(r) = Σ_{k=1}^{r} bk V(1)_k ×2 · · · ×d V(d)_k ≈ A, bk ∈ R,
with normalised V(ℓ)_k ∈ R^{n^p}. Uniqueness (under mild conditions) is due to J. Kruskal '77.
Complexity: r + rdn.
The minimal number r is called the tensor rank of A(r).
[Figure 1: visualization of the CP model for d = 3 — A ≈ b1 V(1)_1 × V(2)_1 × V(3)_1 + . . . + br V(1)_r × V(2)_r × V(3)_r.]
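The corresponding CP reconstruction sketch for d = 3, p = 1 (random data, my choice):

```python
import numpy as np

# Rank-r CP reconstruction A(r) = sum_k b_k V1_k x V2_k x V3_k (d = 3, p = 1).
n, r = 20, 5
b = np.random.rand(r)
V1, V2, V3 = np.random.rand(n, r), np.random.rand(n, r), np.random.rand(n, r)
A = np.einsum('k,ik,jk,lk->ijl', b, V1, V2, V3)
print(A.shape)   # (20, 20, 20) represented by r + r*d*n = 305 numbers
```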
Two-level and mixed models
Two-level Tucker model T_(U,r,q):
A(r,q) = B ×1 V(1) ×2 V(2) ... ×d V(d) ∈ T_(U,r,q) ⊂ C(n,q), where
1. B ∈ R^{r1×...×rd} is retrieved by the rank-q CP model C(r,q);
2. V(ℓ) = [V(ℓ)_1 V(ℓ)_2 ... V(ℓ)_{rℓ}] ∈ U, ℓ = 1, ..., d, where U spans a fixed (uniform/adaptive) basis.
⇒ The O(r^d) core cost, with r = max_{ℓ≤d} rℓ, reduces to O(dqr) (independent of n!).
Mixed model M_{C,T}:
A = A1 + A2, A1 ∈ C_{r1}, A2 ∈ T_{r2}.
It applies to “ill-conditioned” tensors.
Challenge of multi-factor analysis
There is little analogy between the cases d = 2 and d ≥ 3. Paradigm: linear algebra vs. multi-linear algebra (MLA).
CP/Tucker tensor-product models have plenty of merits:
1. A(r) is represented with low cost drn (resp. drn + r^d) ≪ n^d.
2. The V(ℓ)_k can be represented in data-sparse form: H-matrix (HKT), wavelet-based (WKT), or uniform basis.
3. The core tensor B = [bk] can be sparsified as well.
4. Efficient numerical MLA (practical experience).
Remark. The CP decomposition (unique!) cannot be retrieved by rotation and truncation of the Tucker model:
Cr = Tr if r = 1, but Cr ⊄ Tr if r = |r| ≥ 2.
Examples of T(U,r,q)-models
(I) Tensor-product sinc-interpolation: analytic functions with point singularities,
r = (r, ..., r), r = q = O(log n |log ε|) ⇒ O(dqr).
(II) Sparse grids: regularity of mixed derivatives,
r = (n1, ..., nd), hyperbolic cross ⇒ q = n log^d n ⇒ O(n log^d n).
(III) Adaptive two-level approximation: Tucker + CP decomposition of B with q ≤ |r| ⇒ O(dqn).
Structured Kronecker product models (d-th order tensors of size n^d):

Model            | Notation  | Memory / A·x       | A·B           | Comp. tools
Canonical (CP)   | Cr        | drn                | drn²          | ALS/Newton
HKT–CP           | CH,r      | dr√n log^q n       | drn log^q n   | Analytic (quadr.)
Nested CP        | CT(I),L   | dr log d · n + rd  | dr log d · n  | SVD/QR/orthog. iter.
Tucker           | Tr        | r^d + drn          | –             | Orthogonal ALS
Two-level Tucker | T(U,r,q)  | drq / dr r0 q n²   | dr²q² (mem.)  | Analyt. (interp.) + CP
Properties of the Kronecker product
The Kronecker product (KP) operation A ⊗ B of two matrices A = [aij] ∈ R^{m×n}, B ∈ R^{h×g} is an mh × ng matrix that has the block representation [aij B] (corresponding to p = 2).
1. Let C ∈ R^{s×t}. Then the KP satisfies the associative law,
(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C),
and therefore we do not use brackets. The matrix A ⊗ B ⊗ C := (A ⊗ B) ⊗ C has mhs rows and ngt columns.
2. Let C ∈ R^{n×r} and D ∈ R^{g×s}. Then the standard matrix-matrix product in the Kronecker format takes the form
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
The corresponding extension to q-th order tensors is
(A1 ⊗ ... ⊗ Aq)(B1 ⊗ ... ⊗ Bq) = (A1B1) ⊗ ... ⊗ (AqBq).
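A quick numpy verification of the mixed-product property (random rectangular matrices of compatible sizes, my choice):

```python
import numpy as np

A, B = np.random.rand(3, 4), np.random.rand(2, 5)
C, D = np.random.rand(4, 6), np.random.rand(5, 2)

# Mixed-product property: (A x B)(C x D) = (AC) x (BD).
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
```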
Properties of the Kronecker product (cont.)
3. We have the distributive law
(A + B) ⊗ (C + D) = A ⊗ C + A ⊗ D + B ⊗ C + B ⊗ D.
4. Rank relation: rank(A ⊗ B) = rank(A)rank(B).
Ex. 3.3. In general A ⊗ B ≠ B ⊗ A. What is the condition on A and B that provides A ⊗ B = B ⊗ A?
Invariance of some matrix properties:
(1) If A and B are diagonal, then A ⊗ B is also diagonal, and conversely (if A ⊗ B ≠ 0).
(2) Let A and B be Hermitian resp. normal matrices (A∗ = A resp. A∗A = AA∗). Then A ⊗ B is of the corresponding type.
(3) A ∈ R^{n×n}, B ∈ R^{m×m} ⇒ det(A ⊗ B) = (det A)^m (det B)^n.
Kronecker product: matrix operations
Thm. 3.5. Let A ∈ R^{n×n} and B ∈ R^{m×m} be invertible matrices. Then
(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.
Proof. Since det(A) ≠ 0 and det(B) ≠ 0, property (3) above gives det(A ⊗ B) ≠ 0. Thus (A ⊗ B)^{−1} exists, and
(A^{−1} ⊗ B^{−1})(A ⊗ B) = (A^{−1}A) ⊗ (B^{−1}B) = Inm.
Lem. 3.6. Let A ∈ R^{n×n} and B ∈ R^{m×m} be unitary matrices. Then A ⊗ B is a unitary matrix.
Proof. Since A∗ = A^{−1} and B∗ = B^{−1}, we have
(A ⊗ B)∗ = A∗ ⊗ B∗ = A^{−1} ⊗ B^{−1} = (A ⊗ B)^{−1}.
Kronecker product: matrix operations (cont.)
Define the commutator [A, B] := AB − BA.
Lem. 3.7. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then
[A ⊗ Im, In ⊗ B] = 0 ∈ R^{nm×nm}.
Proof.
[A ⊗ Im, In ⊗ B] = (A ⊗ Im)(In ⊗ B) − (In ⊗ B)(A ⊗ Im) = A ⊗ B − A ⊗ B = 0.
Rem. 3.1. Let A, B ∈ R^{n×n} and C, D ∈ R^{m×m} with [A, B] = 0 and [C, D] = 0. Then
[A ⊗ C, B ⊗ D] = 0.
Proof. Apply the identity (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
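A quick numpy check of Lem. 3.7 (random matrices, my choice):

```python
import numpy as np

n, m = 3, 4
A, B = np.random.rand(n, n), np.random.rand(m, m)
X = np.kron(A, np.eye(m))      # A (x) I_m
Y = np.kron(np.eye(n), B)      # I_n (x) B

assert np.allclose(X @ Y - Y @ X, 0.0)   # Lem. 3.7
assert np.allclose(X @ Y, np.kron(A, B))
```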