Lect. 3. Tensor-product interpolation. Introduction to MLA. (B. Khoromskij, Leipzig 2007)
Contents of Lecture 3
1. Best polynomial approximation.
2. Error bound for tensor-product interpolants.
- Polynomial interpolation.
- Sinc interpolation.
3. Data-sparse formats to represent high-order tensors.
- Tucker model.
- Canonical (PARAFAC) model.
- Two-level and mixed models.
4. Multi-linear algebra (MLA) with Kronecker-product data.
Chebyshev polynomials
By Eρ = Eρ(B), with the reference interval B := [−1, 1], we denote Bernstein's regularity ellipse (with foci at w = ±1 and the sum of semi-axes equal to ρ > 1),
Eρ := {w ∈ C : |w − 1| + |w + 1| ≤ ρ + ρ^{−1}}.
The Chebyshev polynomials, Tn(w), are defined recursively
T0(w) = 1, T1(w) = w,
Tn+1(w) = 2wTn(w) − Tn−1(w), n = 1, 2, . . . .
The representation Tn(x) = cos(n arccos x), x ∈ [−1, 1], implies Tn(1) = 1, Tn(−1) = (−1)^n. There holds
Tn(w) = (z^n + z^{−n})/2 with w = (z + 1/z)/2.
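A quick numerical check (an added illustration, not part of the original slides; the helper name chebyshev_T is mine) of the three-term recurrence against the cos-representation:

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_n(x) by the three-term recurrence."""
    t_prev, t_curr = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

# Check against the representation T_n(x) = cos(n arccos x) on [-1, 1].
x = np.linspace(-1.0, 1.0, 101)
for n in range(6):
    assert np.allclose(chebyshev_T(n, x), np.cos(n * np.arccos(x)))
```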
Best polynomial approximation by Chebyshev series
Thm. 3.1. Let F be analytic and bounded by M in Eρ (with
ρ > 1). Then the expansion
F(w) = C0 + 2 Σ_{n=1}^{∞} Cn Tn(w)   (1)
holds for all w ∈ Eρ (Chebyshev series), with
Cn = (1/π) ∫_{−1}^{1} F(w) Tn(w) / √(1 − w²) dw.
Moreover, |Cn| ≤ M/ρ^n, and for m = 1, 2, 3, . . . ,
|F(w) − C0 − 2 Σ_{n=1}^{m} Cn Tn(w)| ≤ (2M/(ρ − 1)) ρ^{−m}, w ∈ B.   (2)
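The coefficient integral can be checked numerically. A small sketch (my choice of test function F(w) = 1/(2 − w), which is analytic in Eρ for all ρ < 2 + √3; trapezoidal quadrature after substituting w = cos t):

```python
import numpy as np

# Substituting w = cos(t) in the coefficient integral gives
# C_n = (1/pi) * int_0^pi F(cos t) cos(n t) dt.
F = lambda w: 1.0 / (2.0 - w)     # analytic in E_rho for rho < 2 + sqrt(3)
t = np.linspace(0.0, np.pi, 4001)
C = [np.trapz(F(np.cos(t)) * np.cos(n * t), t) / np.pi for n in range(8)]
print(np.abs(C))                  # observe |C_n| ~ rho^{-n}, cf. Thm. 3.1
```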
Lagrangian polynomial interpolation
Let PN (B) be the set of polynomials of degree ≤ N on B.
Denote by (INF)(x) ∈ PN(B) the interpolation polynomial of F w.r.t. the Chebyshev–Gauss–Lobatto (CGL) nodes
ξj = cos(πj/N) ∈ B, j = 0, 1, . . . , N, with ξ0 = 1, ξN = −1,
where the ξj are the zeros of the polynomial (1 − x²) T′N(x), x ∈ B.
The Lagrangian interpolant INF has the form
INF := Σ_{j=0}^{N} F(ξj) lj(x) ∈ PN(B)   (3)
with lj(x) the Lagrange basis polynomials
lj := ∏_{k=0, k≠j}^{N} (x − ξk)/(ξj − ξk) ∈ PN(B), j = 0, . . . , N.
Clearly, (INF)(ξj) = F(ξj), since lj(ξj) = 1 and lj(ξk) = 0 ∀ k ≠ j.
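A direct (O(N²) per evaluation point) numerical realisation of (3) at the CGL nodes; a minimal sketch, the helper names and the test function are mine:

```python
import numpy as np

def cgl_nodes(N):
    """CGL nodes xi_j = cos(pi*j/N), j = 0..N."""
    return np.cos(np.pi * np.arange(N + 1) / N)

def lagrange_interp(F, N, x):
    """Evaluate (I_N F)(x) directly from formula (3)."""
    xi = cgl_nodes(N)
    result = np.zeros_like(x)
    for j in range(N + 1):
        lj = np.ones_like(x)
        for k in range(N + 1):
            if k != j:
                lj *= (x - xi[k]) / (xi[j] - xi[k])
        result += F(xi[j]) * lj
    return result

x = np.linspace(-1.0, 1.0, 7)
assert np.allclose(lagrange_interp(np.exp, 12, x), np.exp(x))
```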
Lebesgue constant for Chebyshev interpolation
Let {ξj}_{j=0}^{N} be the set of interpolation points on [−1, 1] and IN the associated Lagrangian interpolation operator. The approximation theory for polynomial interpolation involves the so-called Lebesgue constant ΛN > 1, defined by
‖INu‖∞,B ≤ ΛN ‖u‖∞,B ∀ u ∈ C(B).   (4)
In the case of Chebyshev interpolation it can be shown that ΛN grows at most logarithmically in N:
ΛN ≤ (2/π) log N + 1.
The interpolation points which produce the smallest value Λ*N of all ΛN are not known, but Bernstein '54 proves that
Λ*N = (2/π) log N + O(1).
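The growth of ΛN for the CGL nodes can be observed numerically. A brute-force sketch (sampling the Lebesgue function on a fine grid; the helper name is mine):

```python
import numpy as np

def lebesgue_constant(xi, samples=20001):
    """Estimate Lambda_N = max_x sum_j |l_j(x)| over [-1, 1]."""
    x = np.linspace(-1.0, 1.0, samples)
    s = np.zeros_like(x)
    for j in range(len(xi)):
        lj = np.ones_like(x)
        for k in range(len(xi)):
            if k != j:
                lj *= (x - xi[k]) / (xi[j] - xi[k])
        s += np.abs(lj)
    return s.max()

for N in (4, 8, 16, 32):
    xi = np.cos(np.pi * np.arange(N + 1) / N)
    print(N, lebesgue_constant(xi), 2.0 / np.pi * np.log(N) + 1.0)
```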
Error bound for polynomial interpolation
Thm. 3.2. Let u ∈ C∞[−1, 1] have an analytic extension to Eρ, bounded by M > 0 in Eρ (with ρ > 1). Then we have
‖u − INu‖∞,B ≤ (1 + ΛN) (2M/(ρ − 1)) ρ^{−N}, N ∈ N≥1.   (5)
Proof. Due to (2), one obtains for the best polynomial approximation to u on [−1, 1]
min_{v∈PN} ‖u − v‖∞,B ≤ (2M/(ρ − 1)) ρ^{−N}.
The interpolation operator IN is a projection, that is, for all
v ∈ PN we have INv = v.
Now choose v as the best approximation and apply the triangle inequality together with (4):
‖u − INu‖∞,B = ‖u − v − IN(u − v)‖∞,B ≤ (1 + ΛN) ‖u − v‖∞,B.
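A numerical illustration of the geometric convergence (5); the example function and the use of scipy's BarycentricInterpolator are my choices:

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

# u(x) = 1/(1 + 25 x^2) has poles at +-i/5, hence an analytic extension to
# E_rho only for rho < (1 + sqrt(26))/5; (5) then predicts errors ~ rho^{-N}.
u = lambda x: 1.0 / (1.0 + 25.0 * x * x)
x = np.linspace(-1.0, 1.0, 5001)
for N in (8, 16, 32, 64):
    xi = np.cos(np.pi * np.arange(N + 1) / N)
    err = np.max(np.abs(BarycentricInterpolator(xi, u(xi))(x) - u(x)))
    print(N, err)    # geometric decay in N
```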
Tensor-product polynomial interpolation
Consider a multi-variate function
f = f(x1, . . . , xd) : B^d → R, d ≥ 2,
defined on the box B^d = B1 × B2 × . . . × Bd with Bk = B = [−1, 1].
Define the N-th order tensor-product interpolation operator
INf = (I^1_N × I^2_N × . . . × I^d_N) f ∈ PN[B^d],
where I^k_N f denotes the interpolation polynomial w.r.t. xk at the nodes ξk ∈ Bk, k = 1, . . . , d.
We choose the CGL nodes, hence the interpolation points ξα ∈ B^d, α = (i1, . . . , id) ∈ N_0^d, are obtained by the Cartesian product of the 1D nodes,
ξα := (cos(πi1/N), . . . , cos(πid/N)).
Tensor-product polynomial interpolation (cont.)
Again, IN is the projection map,
IN : C(B^d) → P_N := {p1 × . . . × pd : pi ∈ PN, i = 1, . . . , d},
implying stability of IN in the multidimensional case, cf. (4),
‖INf‖∞,B^d ≤ Λ^d_N ‖f‖∞,B^d ∀ f ∈ C(B^d).   (6)
To derive an analogue of Thm. 3.2, introduce the product domain
E^(j)_ρ := B1 × . . . × Bj−1 × Eρ(Bj) × Bj+1 × . . . × Bd,
and denote by X−j the (d − 1)-dimensional set of the variables x1, . . . , xj−1, xj+1, . . . , xd, with xj ∈ Bj, j = 1, ..., d.
Tensor-product polynomial interpolation (cont.)
Assump. 3.1. Given f ∈ C∞(B^d), assume there is ρ > 1 s.t. for all j = 1, . . . , d and each fixed ξ ∈ X−j, there exists an analytic extension fj(xj, ξ) of f(xj, ξ) to Eρ(Bj) ⊂ C w.r.t. xj, bounded in Eρ(Bj) by a certain Mj > 0 independent of ξ.
Thm. 3.3. For f ∈ C∞(B^d), let Assump. 3.1 be satisfied. Then the interpolation error can be estimated by
‖f − INf‖∞,B^d ≤ Λ^d_N (2Mρ(f)/(ρ − 1)) ρ^{−N},   (7)
where ΛN is the Lebesgue constant for the 1D interpolants I^k_N, and
Mρ(f) := max_{1≤j≤d} max_{x∈E^(j)_ρ} |fj(x, ξ)|.
Tensor-product polynomial interpolation (cont.)
Proof. Multiple use of (4), (5) and the triangle inequality leads to
|f − INf| ≤ |f − I^1_N f| + |I^1_N (f − I^2_N × . . . × I^d_N f)|
  ≤ |f − I^1_N f| + |I^1_N (f − I^2_N f)| + |I^1_N I^2_N (f − I^3_N f)| + . . . + |I^1_N × . . . × I^{d−1}_N (f − I^d_N f)|
  ≤ [(1 + ΛN) max_{x∈E^(1)_ρ} |f1(x, ξ)| + ΛN (1 + ΛN) max_{x∈E^(2)_ρ} |f2(x, ξ)| + . . . + Λ^{d−1}_N (1 + ΛN) max_{x∈E^(d)_ρ} |fd(x, ξ)|] · (2/(ρ − 1)) ρ^{−N}
  ≤ ((1 + ΛN)(Λ^d_N − 1)/(ΛN − 1)) · (2Mρ/(ρ − 1)) ρ^{−N}.
Hence (7) follows, since for x > 1 the factor (1 + x)(x^n − 1)/(x − 1) behaves like x^n (cf. the analogous "≈" step in the Sinc-interpolation proof below).
Sinc-approximation of multi-variate functions
Consider the separable approximation in the case Ω = R.
Extension to the case Ω = R+ or Ω = (a, b) is possible.
The tensor-product Sinc interpolant CM w.r.t. the first d − 1 variables reads
CMf := (C^1_M × ... × C^{d−1}_M) f, f : R^d → R,
where C^ℓ_M f = C^ℓ_M(f, h), 1 ≤ ℓ ≤ d − 1, is the univariate Sinc interpolant
CM(f, h) = Σ_{k=−M}^{M} f(kh) Sk,h(x), Sk,h(x) := sin(π(x − kh)/h) / (π(x − kh)/h),
in xℓ ∈ Iℓ = R, with R^d = I1 × ... × Id.
Ex. 3.1. Examples of approximated functions (x, y ∈ R^d):
f(x) = |x|^α, f(x) = exp(−κ|x|)/|x|, f(x, y) = sinc(|x||y|).
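A univariate sketch of CM(f, h) (the test function f(x) = 1/cosh x is my choice; note np.sinc is the normalised sinc, so Sk,h(x) = np.sinc((x − kh)/h)):

```python
import numpy as np

def sinc_interp(f, M, x):
    """C_M(f, h) with h = log(M)/M; np.sinc(t) = sin(pi t)/(pi t)."""
    h = np.log(M) / M
    return sum(f(k * h) * np.sinc((x - k * h) / h) for k in range(-M, M + 1))

f = lambda x: 1.0 / np.cosh(x)    # analytic in a strip, exponentially decaying
x = np.linspace(-3.0, 3.0, 601)
for M in (8, 16, 32, 64):
    print(M, np.max(np.abs(sinc_interp(f, M, x) - f(x))))
```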
Sinc-approximation of multi-variate functions (cont.)
Error bound for the tensor-product Sinc interpolant.
The estimation of the error f − CMf requires the Lebesgue constant ΛM ≥ 1 defined by
‖CM(f, h)‖∞ ≤ ΛM ‖f‖∞ for all f ∈ C(R).   (8)
Stenger '93 proves the inequality
ΛM = max_{x∈R} Σ_{k=−M}^{M} |Sk,h(x)| ≤ (2/π)(3 + log M).   (9)
For each fixed ℓ ∈ {1, . . . , d − 1}, choose ζℓ ∈ Iℓ and define the remaining parameter set by
Yℓ := I1 × ... × Iℓ−1 × Iℓ+1 × ... × Id = R^{d−1}.
Sinc-approximation of multi-variate functions (cont.)
Introduce the univariate (parameter-dependent) function
Fℓ(·, y) : Iℓ → R, y ∈ Yℓ,
which is the restriction of f onto Iℓ (the remaining variables being frozen at y).
Thm. 3.4. (Hackbusch, Khoromskij) For each ℓ = 1, ..., d − 1 we
assume that for any fixed y ∈ Yℓ, Fℓ(·, y) satisfies
(a) Fℓ(·, y) ∈ H1(Dδ) with N(Fℓ, Dδ) ≤ Nℓ < ∞ uniformly in y;
(b) Fℓ(·, y) has hyper-exponential decay with a = 1, C, b > 0.
Then, for all y ∈ Yℓ, the “optimal” choice h := log M / M yields
|f − CM(f, h)| ≤ (C/(2πδ)) Λ^{d−2}_M max_{ℓ=1,...,d−1} Nℓ e^{−πδM/ log M}   (10)
with ΛM defined by (9).
Proof of the Sinc-interpolation error
The multiple use of (8) and the triangle inequality leads to
|f − CMf| ≤ |f − C^1_M f| + |C^1_M (f − C^2_M . . . C^{d−1}_M f)|
  ≤ |f − C^1_M f| + |C^1_M (f − C^2_M f)| + |C^1_M C^2_M (f − C^3_M f)| + . . . + |C^1_M . . . C^{d−2}_M (f − C^{d−1}_M f)|
  ≤ [N1 + ΛM N2 + . . . + Λ^{d−2}_M Nd−1] · (1/(2πδ)) e^{−πδM/ log M}
  ≤ ((1 + ΛM + ... + Λ^{d−2}_M)/(2πδ)) max_{ℓ=1,...,d−1} Nℓ e^{−πδM/ log M}.
Note that
(Λ^{d−1}_M − 1)/(ΛM − 1) = 1 + ΛM + ... + Λ^{d−2}_M ≈ Λ^{d−2}_M as ΛM → ∞,
hence (10) follows.
Data-sparse representation of high-order tensors
Def. 3.1. A d-th order tensor on I^d = I1 × ... × Id is an array
A := [a_{i1...id}] ∈ R^{I^d}, p, d, n ∈ N,
with multi-indices
iℓ = (iℓ,1, ..., iℓ,p) ∈ Iℓ = I × ... × I (p factors, I := {1, ..., n}), ℓ = 1, ..., d,
and iℓ,m ∈ {1, ..., n} for m = 1, ..., p (p = 1, 2, 3).
The L2 inner product of tensors induces the Frobenius norm,
⟨A, B⟩ := Σ_{(i1...id)∈I^d} a_{i1...id} b_{i1...id}, ‖A‖F := √⟨A, A⟩.
A ∈ R^{I^d} has |I^d| = n^{pd} entries.
How to remove d from the exponent?
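A numerical illustration of the storage problem and of the Frobenius inner product (toy sizes, my choice):

```python
import numpy as np

# Curse of dimensionality (p = 1): a full tensor needs n^d entries,
# while d canonical vectors need only d*n numbers.
n, d = 100, 6
print(f"full tensor: {n**d:.2e} entries, rank-1 data: {d * n} entries")

# Frobenius inner product and norm of small full tensors (n = 4, d = 3):
A, B = np.random.rand(4, 4, 4), np.random.rand(4, 4, 4)
print(np.sum(A * B), np.linalg.norm(A))    # <A,B> and ||A||_F
```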
Data-sparse representation of high-order tensors (cont.)
Key ingredient: representation by a sum of rank-1 tensors,
A = V(1) ×2 · · · ×d V(d), a_{i1...id} = v(1)_{i1} · · · v(d)_{id},
with low-dimensional (canonical) components V(ℓ) = [v(ℓ)_{iℓ}] ∈ R^{n^p}.
Complexity: dn^p.
Standard MLA has linear scaling in d.
Ex. 3.2. Let A = a1 × a2, B = b1 × b2 with ai, bi ∈ R^n (d = 2, p = 1). Then
⟨A, B⟩ = ⟨a1, b1⟩⟨a2, b2⟩, ‖A‖F = √(⟨a1, a1⟩⟨a2, a2⟩) = ‖a1‖ ‖a2‖,
where the latter corresponds to the Frobenius norm of a rank-1 matrix.
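A quick numpy check of this example (random vectors, my choice):

```python
import numpy as np

a1, a2 = np.random.rand(5), np.random.rand(5)
b1, b2 = np.random.rand(5), np.random.rand(5)
A, B = np.outer(a1, a2), np.outer(b1, b2)   # rank-1 tensors for d = 2, p = 1

assert np.isclose(np.sum(A * B), (a1 @ b1) * (a2 @ b2))
assert np.isclose(np.linalg.norm(A), np.linalg.norm(a1) * np.linalg.norm(a2))
```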
Rank-(r1, ..., rd) Tucker model
Tucker Model (T_r) (with orthonormalised sets {V(ℓ)_{kℓ}} ⊂ R^{Iℓ}):
A(r) = Σ_{k1=1}^{r1} ... Σ_{kd=1}^{rd} b_{k1...kd} V(1)_{k1} ×2 ... ×d V(d)_{kd} ∈ R^{I1×...×Id}.   (11)
The core tensor B = [b_k] ∈ R^{r1×...×rd} is not unique (determined only up to rotations).
Complexity (p = 1): r^d + drn ≪ n^d with r = max rℓ ≪ n.
[Figure: visualization of the Tucker model for d = 3 — the I1 × I2 × I3 tensor A is represented by the r1 × r2 × r3 core B contracted with the factor matrices V(1), V(2), V(3) of sizes Iℓ × rℓ.]
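A minimal reconstruction sketch of (11) for d = 3, p = 1 via einsum (random data, my choice; the QR factorisation only serves to orthonormalise the factors):

```python
import numpy as np

# Reconstruct A(r) from a random core and orthonormal factors, cf. (11).
n, r = 20, (3, 4, 5)
B = np.random.rand(*r)                                     # core tensor
V = [np.linalg.qr(np.random.rand(n, rk))[0] for rk in r]   # orthonormal V^(l)
A = np.einsum('abc,ia,jb,kc->ijk', B, V[0], V[1], V[2])
print(A.shape)   # (20, 20, 20): n^3 = 8000 entries from 60 + 240 numbers
```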
CANDECOMP/PARAFAC (CP) tensor format
CP Model (C_r). Approximate A by a sum of rank-1 tensors,
A(r) = Σ_{k=1}^{r} bk V(1)_k ×2 · · · ×d V(d)_k ≈ A, bk ∈ R,
with normalised V(ℓ)_k ∈ R^{n^p}. Uniqueness (under mild conditions) is due to J. Kruskal '77.
Complexity: r + rdn.
The minimal number r is called the tensor rank of A(r).
[Figure 1: visualization of the CP model for d = 3 — A ≈ b1 V(1)_1 × V(2)_1 × V(3)_1 + . . . + br V(1)_r × V(2)_r × V(3)_r.]
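The corresponding CP reconstruction sketch for d = 3, p = 1 (random data, my choice):

```python
import numpy as np

# Rank-r CP reconstruction A(r) = sum_k b_k V1_k x V2_k x V3_k (d = 3, p = 1).
n, r = 20, 5
b = np.random.rand(r)
V1, V2, V3 = np.random.rand(n, r), np.random.rand(n, r), np.random.rand(n, r)
A = np.einsum('k,ik,jk,lk->ijl', b, V1, V2, V3)
print(A.shape)   # (20, 20, 20) represented by r + r*d*n = 305 numbers
```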
Two-level and mixed models
Two-level Tucker model T_(U,r,q):
A(r,q) = B ×1 V(1) ×2 V(2) ... ×d V(d) ∈ T_(U,r,q) ⊂ C(n,q), where
1. B ∈ R^{r1×...×rd} is retrieved by the rank-q CP model C(r,q);
2. V(ℓ) = [V(ℓ)_1 V(ℓ)_2 ... V(ℓ)_{rℓ}] ∈ U, ℓ = 1, ..., d, where U spans a fixed (uniform/adaptive) basis.
⇒ The O(r^d) core cost, with r = max_{ℓ≤d} rℓ, reduces to O(dqr) (independent of n!).
Mixed model M_{C,T}:
A = A1 + A2, A1 ∈ C_{r1}, A2 ∈ T_{r2}.
It applies to “ill-conditioned” tensors.
Challenge of multi-factor analysis
There is little analogy between the cases d = 2 and d ≥ 3. Paradigm: linear algebra vs. multi-linear algebra (MLA).
CP/Tucker tensor-product models have plenty of merits:
1. A(r) is represented with low cost drn (resp. drn + r^d) ≪ n^d.
2. The V(ℓ)_k can be represented in data-sparse form: H-matrix (HKT), wavelet-based (WKT), or uniform basis.
3. The core tensor B = [bk] can be sparsified as well.
4. Efficient numerical MLA (practical experience).
Remark. The CP decomposition (unique!) cannot be retrieved by rotation and truncation of the Tucker model:
Cr = Tr if r = 1, but Cr ⊄ Tr if r = |r| ≥ 2.
Examples of T(U,r,q)-models
(I) Tensor-product sinc-interpolation: analytic functions with point singularities,
r = (r, ..., r), r = q = O(log n |log ε|) ⇒ O(dqr).
(II) Sparse grids: regularity of mixed derivatives,
r = (n1, ..., nd), hyperbolic cross ⇒ q = n log^d n ⇒ O(n log^d n).
(III) Adaptive two-level approximation: Tucker + CP decomposition of B with q ≤ |r| ⇒ O(dqn).
Structured Kronecker product models (d-th order tensors of size n^d):

Model            | Notation  | Memory / A·x       | A·B           | Comp. tools
Canonical (CP)   | Cr        | drn                | drn²          | ALS/Newton
HKT–CP           | CH,r      | dr√n log^q n       | drn log^q n   | Analytic (quadr.)
Nested CP        | CT(I),L   | dr log d · n + rd  | dr log d · n  | SVD/QR/orthog. iter.
Tucker           | Tr        | r^d + drn          | –             | Orthogonal ALS
Two-level Tucker | T(U,r,q)  | drq / dr r0 q n²   | dr²q² (mem.)  | Analyt. (interp.) + CP
Properties of the Kronecker product
The Kronecker product (KP) operation A ⊗ B of two matrices A = [aij] ∈ R^{m×n}, B ∈ R^{h×g} is an mh × ng matrix that has the block representation [aij B] (corresponding to p = 2).
1. Let C ∈ R^{s×t}. Then the KP satisfies the associative law,
(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C),
and therefore we do not use brackets. The matrix A ⊗ B ⊗ C := (A ⊗ B) ⊗ C has mhs rows and ngt columns.
2. Let C ∈ R^{n×r} and D ∈ R^{g×s}. Then the standard matrix-matrix product in the Kronecker format takes the form
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
The corresponding extension to q-th order tensors is
(A1 ⊗ ... ⊗ Aq)(B1 ⊗ ... ⊗ Bq) = (A1B1) ⊗ ... ⊗ (AqBq).
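A quick numpy verification of the mixed-product property (random rectangular matrices of compatible sizes, my choice):

```python
import numpy as np

A, B = np.random.rand(3, 4), np.random.rand(2, 5)
C, D = np.random.rand(4, 6), np.random.rand(5, 2)

# Mixed-product property: (A x B)(C x D) = (AC) x (BD).
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
```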
Properties of the Kronecker product (cont.)
3. We have the distributive law
(A + B) ⊗ (C + D) = A ⊗ C + A ⊗ D + B ⊗ C + B ⊗ D.
4. Rank relation: rank(A ⊗ B) = rank(A)rank(B).
Ex. 3.3. In general A ⊗ B ≠ B ⊗ A. What is the condition on A and B that provides A ⊗ B = B ⊗ A?
Invariance of some matrix properties:
(1) If A and B are diagonal, then A ⊗ B is also diagonal, and conversely (if A ⊗ B ≠ 0).
(2) Let A and B be Hermitian resp. normal matrices (A∗ = A resp. A∗A = AA∗). Then A ⊗ B is of the corresponding type.
(3) A ∈ R^{n×n}, B ∈ R^{m×m} ⇒ det(A ⊗ B) = (det A)^m (det B)^n.
Kronecker product: matrix operations
Thm. 3.5. Let A ∈ R^{n×n} and B ∈ R^{m×m} be invertible matrices. Then
(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.
Proof. Since det(A) ≠ 0 and det(B) ≠ 0, property (3) above gives det(A ⊗ B) ≠ 0. Thus (A ⊗ B)^{−1} exists, and
(A^{−1} ⊗ B^{−1})(A ⊗ B) = (A^{−1}A) ⊗ (B^{−1}B) = Inm.
Lem. 3.6. Let A ∈ R^{n×n} and B ∈ R^{m×m} be unitary matrices. Then A ⊗ B is a unitary matrix.
Proof. Since A∗ = A^{−1} and B∗ = B^{−1}, we have
(A ⊗ B)∗ = A∗ ⊗ B∗ = A^{−1} ⊗ B^{−1} = (A ⊗ B)^{−1}.
Kronecker product: matrix operations (cont.)
Define the commutator [A, B] := AB − BA.
Lem. 3.7. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then
[A ⊗ Im, In ⊗ B] = 0 ∈ R^{nm×nm}.
Proof.
[A ⊗ Im, In ⊗ B] = (A ⊗ Im)(In ⊗ B) − (In ⊗ B)(A ⊗ Im) = A ⊗ B − A ⊗ B = 0.
Rem. 3.1. Let A, B ∈ R^{n×n} and C, D ∈ R^{m×m} with [A, B] = 0 and [C, D] = 0. Then
[A ⊗ C, B ⊗ D] = 0.
Proof. Apply the identity (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
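A quick numpy check of Lem. 3.7 (random matrices, my choice):

```python
import numpy as np

n, m = 3, 4
A, B = np.random.rand(n, n), np.random.rand(m, m)
X = np.kron(A, np.eye(m))      # A (x) I_m
Y = np.kron(np.eye(n), B)      # I_n (x) B

assert np.allclose(X @ Y - Y @ X, 0.0)   # Lem. 3.7
assert np.allclose(X @ Y, np.kron(A, B))
```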