
MITSUBISHI ELECTRIC RESEARCH LABORATORIES
http://www.merl.com

Principal Angles Between Subspaces and Their Tangents

Knyazev, A.V.; Zhu, P.

TR2012-058 September 2012

Abstract

Principal angles between subspaces (PABS) (also called canonical angles) serve as a classical tool in mathematics, statistics, and applications, e.g., data mining. Traditionally, PABS are introduced and used via their cosines. The tangents of PABS have attracted relatively less attention, but are important for analysis of convergence of subspace iterations for eigenvalue problems. We explicitly construct matrices, such that their singular values are equal to the tangents of PABS, using several approaches: orthonormal and nonorthonormal bases for subspaces, and orthogonal projectors.

Cornell University Library

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2012
201 Broadway, Cambridge, Massachusetts 02139


Principal angles between subspaces and their tangents

Peizhen Zhu^a, Andrew V. Knyazev^{a,b,*}

^a Department of Mathematical and Statistical Sciences, University of Colorado Denver, P.O. Box 173364, Campus Box 170, Denver, CO 80217-3364, USA.

^b Mitsubishi Electric Research Laboratories, 201 Broadway, Cambridge, MA 02139


Keywords: principal angles, canonical angles, singular value, projector

2000 MSC: 15A42, 15A60, 65F35

1. Introduction

The concept of principal angles between subspaces (PABS) was first introduced by Jordan [1] in 1875. Hotelling [2] defined PABS in the form of canonical correlations in statistics in 1936. Numerous researchers have worked on PABS; see, e.g., our pseudo-random choice of initial references [3, 4, 5, 6, 7, 8], out of tens of thousands of Internet links for the keywords “principal angles” and “canonical angles”.

September 5, 2012. The preliminary version is at http://arxiv.org/abs/math/. This material is based upon work partially supported by the National Science Foundation under Grant No. 1115734.

* Corresponding author. Email addresses: peizhen.zhu[at]ucdenver.edu (Peizhen Zhu), andrew.knyazev[at]ucdenver.edu (Andrew V. Knyazev). URL: http://math.ucdenver.edu/~aknyazev/ (Andrew V. Knyazev)

Preprint submitted to Linear Algebra and Its Applications September 5, 2012

arXiv:1209.0523v1 [math.NA] 4 Sep 2012

In this paper, we first briefly review the concept and some important properties of PABS in Section 2. Traditionally, PABS are introduced and used via their sines and, more commonly, because of the connection to canonical correlations, cosines. The tangents of PABS have attracted relatively less attention, despite the celebrated work of Davis and Kahan [9], which includes several tangent-related theorems. Our interest in the tangents of PABS is motivated by their applications in theoretical analysis of convergence of subspace iterations for eigenvalue problems.

We review some previous work on the tangents of PABS in Section 3. The main goal of this work is to construct explicitly a family of matrices such that their singular values are equal to the tangents of PABS. We form these matrices using several different approaches: orthonormal bases for subspaces in Section 4, non-orthonormal bases in Section 5, and orthogonal projectors in Section 6.

2. Definition and basic properties of PABS

In this section, we remind the reader of the concept of PABS and some of their fundamental properties. First, we recall that the acute angle between two unit vectors $x$ and $y$, i.e., with $x^H x = y^H y = 1$, is defined by $\cos\theta(x, y) = |x^H y|$, where $0 \le \theta(x, y) \le \pi/2$.

The definition of an acute angle between two vectors can be recursively extended to PABS; see, e.g., [2, 10, 11].

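Definition 2.1. Let $\mathcal{X} \subset \mathbb{C}^n$ and $\mathcal{Y} \subset \mathbb{C}^n$ be subspaces with $\dim(\mathcal{X}) = p$ and $\dim(\mathcal{Y}) = q$. Let $m = \min(p, q)$. The principal angles

$$\Theta(\mathcal{X}, \mathcal{Y}) = [\theta_1, \ldots, \theta_m], \quad \text{where } \theta_k \in [0, \pi/2], \; k = 1, \ldots, m,$$

between $\mathcal{X}$ and $\mathcal{Y}$ are recursively defined by

$$s_k = \cos(\theta_k) = \max_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} |x^H y| = |x_k^H y_k|,$$

subject to

$$\|x\| = \|y\| = 1, \quad x^H x_i = 0, \quad y^H y_i = 0, \quad i = 1, \ldots, k-1.$$

The vectors $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_m\}$ are called the principal vectors.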

An alternative definition of PABS, from [10, 11], is based on the singular value decomposition (SVD) and reproduced here as the following theorem.


Theorem 2.1. Let the columns of matrices $X \in \mathbb{C}^{n \times p}$ and $Y \in \mathbb{C}^{n \times q}$ form orthonormal bases for the subspaces $\mathcal{X}$ and $\mathcal{Y}$, correspondingly. Let the SVD of $X^H Y$ be $U \Sigma V^H$, where $U$ and $V$ are unitary matrices and $\Sigma$ is a $p \times q$ diagonal matrix with the real diagonal elements $s_1(X^H Y), \ldots, s_m(X^H Y)$ in nonincreasing order with $m = \min(p, q)$. Then

$$\cos\Theta^{\uparrow}(\mathcal{X}, \mathcal{Y}) = S(X^H Y) = [s_1(X^H Y), \ldots, s_m(X^H Y)],$$

where $\Theta^{\uparrow}(\mathcal{X}, \mathcal{Y})$ denotes the vector of principal angles between $\mathcal{X}$ and $\mathcal{Y}$ arranged in nondecreasing order and $S(A)$ denotes the vector of singular values of $A$. Moreover, the principal vectors associated with this pair of subspaces are given by the first $m$ columns of $XU$ and $YV$, correspondingly.
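As an illustration of Theorem 2.1, the following NumPy sketch computes principal angles from the singular values of $X^H Y$. The helper name `principal_angles` and the test subspaces are assumptions made for this example only; the cosine-based formula is used here for clarity rather than for accuracy at very small angles.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles (nondecreasing) between R(X) and R(Y).

    X and Y are assumed to have orthonormal columns, as in Theorem 2.1.
    """
    # Singular values of X^H Y are the cosines, in nonincreasing order.
    s = np.linalg.svd(X.conj().T @ Y, compute_uv=False)
    # Clip guards against rounding slightly outside [0, 1].
    return np.arccos(np.clip(s, 0.0, 1.0))

# Two planes in R^3 sharing the e1 axis; the second is tilted by 0.3 rad.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, np.cos(0.3)], [0.0, np.sin(0.3)]])
theta = principal_angles(X, Y)
print(theta)
```

The two planes intersect along one line, so one principal angle is zero and the other equals the tilt.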

Theorem 2.1 implies that PABS are symmetric, i.e., $\Theta(\mathcal{X}, \mathcal{Y}) = \Theta(\mathcal{Y}, \mathcal{X})$, and unitarily invariant, i.e., $\Theta(U\mathcal{X}, U\mathcal{Y}) = \Theta(\mathcal{X}, \mathcal{Y})$ for any unitary matrix $U \in \mathbb{C}^{n \times n}$, since $(UX)^H (UY) = X^H Y$. A number of other important properties of PABS have been established, for finite dimensional subspaces, e.g., in [4, 5, 6, 7, 8], and for infinite dimensional subspaces in [3, 12]. We list a few of the most useful properties of PABS below.

Property 2.1. [7] In the notation of Theorem 2.1, let $p \ge q$ and let $[X, X_\perp]$ be a unitary matrix such that $S(X_\perp^H Y) = [s_1(X_\perp^H Y), \ldots, s_q(X_\perp^H Y)]$, where $s_1(X_\perp^H Y) \ge \cdots \ge s_q(X_\perp^H Y)$. Then $s_k(X_\perp^H Y) = \sin(\theta_{q+1-k})$, $k = 1, \ldots, q$.
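Property 2.1 can be checked numerically. The sketch below is an illustration under assumed random test data (the seed and the sizes `n`, `p`, `q` are arbitrary choices, and real matrices are used for simplicity): it pairs the largest sine with the smallest cosine and verifies $\sin^2 + \cos^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 6, 3, 2                      # p >= q, as Property 2.1 requires

# A random orthogonal [X, X_perp] obtained from a QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Q[:, :p], Q[:, p:]
# Orthonormal basis of a random q-dimensional subspace.
Y, _ = np.linalg.qr(rng.standard_normal((n, q)))

cosines = np.linalg.svd(X.T @ Y, compute_uv=False)       # nonincreasing
sines = np.linalg.svd(X_perp.T @ Y, compute_uv=False)    # nonincreasing

# s_k(X_perp^H Y) = sin(theta_{q+1-k}): the largest sine belongs to the
# largest angle, i.e., to the smallest cosine.
residual = np.max(np.abs(sines**2 + cosines[::-1]**2 - 1.0))
print(residual)
```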

Relationships of principal angles between $\mathcal{X}$ and $\mathcal{Y}$, and between their orthogonal complements are thoroughly investigated in [4, 5, 12]. The nonzero principal angles between $\mathcal{X}$ and $\mathcal{Y}$ are the same as those between $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$. Similarly, the nonzero principal angles between $\mathcal{X}$ and $\mathcal{Y}^\perp$ are the same as those between $\mathcal{X}^\perp$ and $\mathcal{Y}$.

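Property 2.2. [4, 5, 12] Let the orthogonal complements of the subspaces $\mathcal{X}$ and $\mathcal{Y}$ be denoted by $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$, correspondingly. Then,

1. $[\Theta^{\downarrow}(\mathcal{X}, \mathcal{Y}), 0, \ldots, 0] = [\Theta^{\downarrow}(\mathcal{X}^\perp, \mathcal{Y}^\perp), 0, \ldots, 0]$.

2. $[\Theta^{\downarrow}(\mathcal{X}, \mathcal{Y}^\perp), 0, \ldots, 0] = [\Theta^{\downarrow}(\mathcal{X}^\perp, \mathcal{Y}), 0, \ldots, 0]$.

3. $[\pi/2, \ldots, \pi/2, \Theta^{\downarrow}(\mathcal{X}, \mathcal{Y})] = [\pi/2 - \Theta^{\uparrow}(\mathcal{X}, \mathcal{Y}^\perp), 0, \ldots, 0]$,

where there are $\max(\dim(\mathcal{X}) - \dim(\mathcal{Y}), 0)$ additional $\pi/2$s on the left. Extra 0s at the end may need to be added on either side to match the sizes. Symbols $\downarrow$ and $\uparrow$ denote the components of a vector arranged in nonincreasing and nondecreasing order, correspondingly.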

These statements are widely used to discover new properties of PABS.


3. Tangents of PABS

The tangents of PABS serve as an important tool in numerical matrix analysis. For example, in [13, 14], the authors use the tangent of the largest principal angle derived from a norm of a specific matrix. In [6, p. 231-232] and [15, Theorem 2.4, p. 252] the tangents of PABS, related to singular values of a matrix (without an explicit matrix formulation), are used to analyze perturbations of invariant subspaces. In [6, 13, 14, 15], the two subspaces have the same dimensions.

Let the orthonormal columns of matrices $X$, $X_\perp$, and $Y$ span the subspaces $\mathcal{X}$, the orthogonal complement $\mathcal{X}^\perp$ of $\mathcal{X}$, and $\mathcal{Y}$, correspondingly. According to Theorem 2.1 and Property 2.1, $\cos\Theta(\mathcal{X}, \mathcal{Y}) = S(X^H Y)$ and $\sin\Theta(\mathcal{X}, \mathcal{Y}) = S(X_\perp^H Y)$. The properties of sines and especially cosines of PABS are well studied; e.g., in [7, 10]. One could obtain the tangents of PABS by using the sines and the cosines of PABS directly; however, in some situations this does not reveal enough information about their properties.

In this work, we construct a family $\mathcal{F}$ of explicitly given matrices, such that the singular values of the matrix $T$ are the tangents of PABS, where $T \in \mathcal{F}$. We form $T$ in three different ways. First, we derive $T$ as the product of matrices whose columns form the orthonormal bases of subspaces. Second, we present $T$ as the product of matrices with non-orthonormal columns. Third, we form $T$ as the product of orthogonal projections on subspaces. To better understand the matrix $T$, we provide a geometric interpretation of singular values of $T$ and of the action of $T$ as a linear operator.

Our constructions include the matrices used in [6, 13, 14, 15]. Furthermore, we consider the tangents of principal angles between two subspaces not only with the same dimensions, but also with different dimensions.

4. tan Θ in terms of the orthonormal bases of subspaces

The sines and cosines of PABS are obtained from the singular values of explicitly given matrices, which motivates us to construct $T$ as follows: if matrices $X$ and $Y$ have orthonormal columns and $X^H Y$ has full rank, let $T = X_\perp^H Y (X^H Y)^{-1}$; then $\tan\Theta(\mathcal{X}, \mathcal{Y})$ could be equal to the positive singular values of $T$. We begin with an example in two dimensions in Figure 1.

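Let

$$X = \begin{pmatrix} \cos\beta \\ \sin\beta \end{pmatrix}, \quad X_\perp = \begin{pmatrix} -\sin\beta \\ \cos\beta \end{pmatrix}, \quad \text{and} \quad Y = \begin{pmatrix} \cos(\theta + \beta) \\ \sin(\theta + \beta) \end{pmatrix},$$

where $0 \le \theta < \pi/2$ and $\beta \in [0, \pi/2]$. Then, $X^H Y = \cos\theta$ and $X_\perp^H Y = \sin\theta$. Obviously, $\tan\theta$ coincides with the singular value of $T = X_\perp^H Y (X^H Y)^{-1}$.

Figure 1: The PABS in 2D.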
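The two-dimensional example is easy to reproduce numerically. In the sketch below, the particular values of `beta` and `theta` are arbitrary assumptions chosen for illustration.

```python
import numpy as np

beta, theta = 0.4, 1.1                      # arbitrary, with 0 <= theta < pi/2
X = np.array([[np.cos(beta)], [np.sin(beta)]])
X_perp = np.array([[-np.sin(beta)], [np.cos(beta)]])
Y = np.array([[np.cos(theta + beta)], [np.sin(theta + beta)]])

# T = X_perp^H Y (X^H Y)^{-1} is a 1x1 matrix here.
T = (X_perp.T @ Y) @ np.linalg.inv(X.T @ Y)
s = np.linalg.svd(T, compute_uv=False)[0]
print(s, np.tan(theta))
```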

If $\theta = \pi/2$, then the matrix $X^H Y$ is singular, which demonstrates that, in general, we need to use its Moore-Penrose Pseudoinverse to form our matrix

$$T = X_\perp^H Y (X^H Y)^\dagger.$$

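The Moore-Penrose Pseudoinverse is well known. For a matrix $A \in \mathbb{C}^{n \times m}$, its Moore-Penrose Pseudoinverse is the unique matrix $A^\dagger \in \mathbb{C}^{m \times n}$ satisfying

$$A A^\dagger A = A, \quad A^\dagger A A^\dagger = A^\dagger, \quad (A A^\dagger)^H = A A^\dagger, \quad (A^\dagger A)^H = A^\dagger A.$$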

Some properties of the Moore-Penrose Pseudoinverse are listed as follows.

• If $A = U \Sigma V^H$ is the SVD of $A$, then $A^\dagger = V \Sigma^\dagger U^H$.

• If $A$ has full column rank and $B$ has full row rank, then $(AB)^\dagger = B^\dagger A^\dagger$. However, this formula does not hold in general.

• $A A^\dagger$ is the orthogonal projector onto the range of $A$, and $A^\dagger A$ is the orthogonal projector onto the range of $A^H$.
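The defining conditions and the listed properties can be verified with `numpy.linalg.pinv`. The following is a small sketch on an assumed random complex test matrix of full column rank; the helper `H` is introduced only for this example.

```python
import numpy as np

def H(M):
    """Conjugate transpose."""
    return M.conj().T

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
A_pinv = np.linalg.pinv(A)

# The four Penrose conditions.
penrose = [
    np.allclose(A @ A_pinv @ A, A),
    np.allclose(A_pinv @ A @ A_pinv, A_pinv),
    np.allclose(H(A @ A_pinv), A @ A_pinv),
    np.allclose(H(A_pinv @ A), A_pinv @ A),
]

# SVD formula A^dagger = V Sigma^dagger U^H (A has full column rank here,
# so every singular value is inverted).
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_pinv_svd = H(Vh) @ np.diag(1.0 / s) @ H(U)

# A A^dagger is an orthogonal projector: Hermitian and idempotent.
P = A @ A_pinv
print(penrose)
```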

For additional properties of the Moore-Penrose Pseudoinverse, we refer the reader to [6]. Now we are ready to prove that the intuition discussed above, suggesting to try $T = X_\perp^H Y (X^H Y)^\dagger$, is correct. Moreover, we cover the case of the subspaces $\mathcal{X}$ and $\mathcal{Y}$ having possibly different dimensions.


Theorem 4.1. Let $X \in \mathbb{C}^{n \times p}$ have orthonormal columns and be arbitrarily completed to a unitary matrix $[X, X_\perp]$. Let the matrix $Y \in \mathbb{C}^{n \times q}$ be such that $Y^H Y = I$. Then the positive singular values, denoted by $S_+(T)$, of the matrix $T = X_\perp^H Y (X^H Y)^\dagger$ satisfy

$$\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = [\infty, \ldots, \infty, S_+(T), 0, \ldots, 0],$$

where $\mathcal{R}(\cdot)$ denotes the matrix range.

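Proof. Let $[\, Y \; Y_\perp \,]$ be unitary; then $[\, X \; X_\perp \,]^H [\, Y \; Y_\perp \,]$ is unitary, such that

$$[\, X \; X_\perp \,]^H [\, Y \; Y_\perp \,] = \begin{bmatrix} X^H Y & X^H Y_\perp \\ X_\perp^H Y & X_\perp^H Y_\perp \end{bmatrix},$$

where the block rows have $p$ and $n - p$ rows and the block columns have $q$ and $n - q$ columns.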

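Applying the CS-decomposition (CSD), e.g., [16, 17, 18], we get

$$[\, X \; X_\perp \,]^H [\, Y \; Y_\perp \,] = \begin{bmatrix} X^H Y & X^H Y_\perp \\ X_\perp^H Y & X_\perp^H Y_\perp \end{bmatrix} = \begin{bmatrix} U_1 & \\ & U_2 \end{bmatrix} D \begin{bmatrix} V_1 & \\ & V_2 \end{bmatrix}^H,$$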

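where $U_1 \in \mathcal{U}(p)$, $U_2 \in \mathcal{U}(n-p)$, $V_1 \in \mathcal{U}(q)$, and $V_2 \in \mathcal{U}(n-q)$. The symbol $\mathcal{U}(k)$ denotes the family of unitary matrices of the size $k$. The matrix $D$ has the following structure, with block sizes shown along the top and the left:

$$D = \begin{array}{c|cccccc}
 & r & s & q-r-s & n-p-q+r & s & p-r-s \\ \hline
r & I & & & O_s^H & & \\
s & & C & & & S & \\
p-r-s & & & O_c & & & I \\
n-p-q+r & O_s & & & -I & & \\
s & & S & & & -C & \\
q-r-s & & & I & & & O_c^H
\end{array},$$

where $C = \mathrm{diag}(\cos(\theta_1), \ldots, \cos(\theta_s))$ and $S = \mathrm{diag}(\sin(\theta_1), \ldots, \sin(\theta_s))$ with $\theta_k \in (0, \pi/2)$ for $k = 1, \ldots, s$, which are the principal angles between the subspaces $\mathcal{R}(Y)$ and $\mathcal{R}(X)$. The matrices $C$ and $S$ may be absent. The matrices $O_s$ and $O_c$ are matrices of zeros and do not have to be square. $I$ denotes the identity matrix. We may have different sizes of $I$ in $D$.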


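Therefore,

$$T = X_\perp^H Y (X^H Y)^\dagger = U_2 \begin{bmatrix} O_s & & \\ & S & \\ & & I \end{bmatrix} V_1^H V_1 \begin{bmatrix} I & & \\ & C & \\ & & O_c \end{bmatrix}^\dagger U_1^H = U_2 \begin{bmatrix} O_s & & \\ & S C^{-1} & \\ & & O_c^H \end{bmatrix} U_1^H.$$

Hence, $S_+(T) = (\tan(\theta_1), \ldots, \tan(\theta_s))$, where $0 < \theta_1 \le \theta_2 \le \cdots \le \theta_s < \pi/2$.

From the matrix $D$, we obtain $S(X^H Y) = S(\mathrm{diag}(I, C, O_c))$. According to Theorem 2.1, $\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = [0, \ldots, 0, \theta_1, \ldots, \theta_s, \pi/2, \ldots, \pi/2]$, where $\theta_k \in (0, \pi/2)$ for $k = 1, \ldots, s$, and there are $\min(q-r-s, p-r-s)$ additional $\pi/2$'s and $r$ additional 0's on the right-hand side, which completes the proof.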
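A numerical check of Theorem 4.1 in the generic case, where no principal angle equals 0 or $\pi/2$, can be sketched as follows. The helper `tangents_from_T`, the seed, and the dimensions are assumptions for illustration; the reference tangents are obtained from the cosines of Theorem 2.1.

```python
import numpy as np

def tangents_from_T(X, X_perp, Y):
    """Singular values of T = X_perp^H Y (X^H Y)^dagger, nonincreasing."""
    T = X_perp.conj().T @ Y @ np.linalg.pinv(X.conj().T @ Y)
    return np.linalg.svd(T, compute_uv=False)

rng = np.random.default_rng(2)
n, p, q = 7, 3, 2                        # subspaces of different dimensions
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Q[:, :p], Q[:, p:]
Y, _ = np.linalg.qr(rng.standard_normal((n, q)))

# Reference values: tan = sin / cos, sorted in nonincreasing order.
cosines = np.linalg.svd(X.T @ Y, compute_uv=False)
tan_ref = np.sort(np.sqrt(1.0 - cosines**2) / cosines)[::-1]

sv = tangents_from_T(X, X_perp, Y)       # T is (n-p) x p, so 3 values
tan_T = sv[:min(p, q)]                   # generic case: the positive ones
print(tan_T, tan_ref)
```

Here $T$ has rank at most $q = 2$, so its third singular value is zero up to rounding, in agreement with the zero padding in the theorem.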

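Remark 4.2. If more information is available about the subspaces $\mathcal{X}$, $\mathcal{Y}$ and their orthogonal complements, we can obtain the exact sizes of block matrices in the matrix $D$. Let us consider the decomposition of the space $\mathbb{C}^n$ into an orthogonal sum of five subspaces as in [12, 19],

$$\mathbb{C}^n = \mathcal{M}_{00} \oplus \mathcal{M}_{01} \oplus \mathcal{M}_{10} \oplus \mathcal{M}_{11} \oplus \mathcal{M},$$

where $\mathcal{M}_{00} = \mathcal{X} \cap \mathcal{Y}$, $\mathcal{M}_{01} = \mathcal{X} \cap \mathcal{Y}^\perp$, $\mathcal{M}_{10} = \mathcal{X}^\perp \cap \mathcal{Y}$, $\mathcal{M}_{11} = \mathcal{X}^\perp \cap \mathcal{Y}^\perp$ ($\mathcal{X} = \mathcal{R}(X)$ and $\mathcal{Y} = \mathcal{R}(Y)$). Using Tables 1 and 2 in [12] we get

$$\dim(\mathcal{M}_{00}) = r, \quad \dim(\mathcal{M}_{10}) = q - r - s, \quad \dim(\mathcal{M}_{11}) = n - p - q + r, \quad \dim(\mathcal{M}_{01}) = p - r - s,$$

where $\mathcal{M} = \mathcal{M}_{\mathcal{X}} \oplus \mathcal{M}_{\mathcal{X}^\perp} = \mathcal{M}_{\mathcal{Y}} \oplus \mathcal{M}_{\mathcal{Y}^\perp}$ with

$$\mathcal{M}_{\mathcal{X}} = \mathcal{X} \cap (\mathcal{M}_{00} \oplus \mathcal{M}_{01})^\perp, \quad \mathcal{M}_{\mathcal{X}^\perp} = \mathcal{X}^\perp \cap (\mathcal{M}_{10} \oplus \mathcal{M}_{11})^\perp,$$

$$\mathcal{M}_{\mathcal{Y}} = \mathcal{Y} \cap (\mathcal{M}_{00} \oplus \mathcal{M}_{10})^\perp, \quad \mathcal{M}_{\mathcal{Y}^\perp} = \mathcal{Y}^\perp \cap (\mathcal{M}_{01} \oplus \mathcal{M}_{11})^\perp,$$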


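and $s = \dim(\mathcal{M}_{\mathcal{X}}) = \dim(\mathcal{M}_{\mathcal{Y}}) = \dim(\mathcal{M}_{\mathcal{X}^\perp}) = \dim(\mathcal{M}_{\mathcal{Y}^\perp})$. Thus, we can represent $D$ in the following block form, with block sizes shown along the top and the left:

$$D = \begin{array}{c|cccccc}
 & \dim(\mathcal{M}_{00}) & \dim(\mathcal{M})/2 & \dim(\mathcal{M}_{10}) & \dim(\mathcal{M}_{11}) & \dim(\mathcal{M})/2 & \dim(\mathcal{M}_{01}) \\ \hline
\dim(\mathcal{M}_{00}) & I & & & O_s^H & & \\
\dim(\mathcal{M})/2 & & C & & & S & \\
\dim(\mathcal{M}_{01}) & & & O_c & & & I \\
\dim(\mathcal{M}_{11}) & O_s & & & -I & & \\
\dim(\mathcal{M})/2 & & S & & & -C & \\
\dim(\mathcal{M}_{10}) & & & I & & & O_c^H
\end{array}.$$

In addition, it is possible to permute the first $q$ columns or the last $n - q$ columns of $D$, or the first $p$ rows or the last $n - p$ rows, and to change the sign of any column or row to obtain the variants of the CSD.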

Remark 4.3. From Remark 4.2, we see that some identity and zero matrices may be absent from the matrix $D$. To obtain more details for $D$, we have to deal with $D$ in several cases based on either $p \le q$ or $p \ge q$ and either $p + q \le n$ or $p + q \ge n$. Since the angles are symmetric, i.e., $\Theta(\mathcal{X}, \mathcal{Y}) = \Theta(\mathcal{Y}, \mathcal{X})$, it is enough to discuss one case of either $p \le q$ or $q \le p$. Let us assume $q \le p$, which implies $\dim(\mathcal{M}_{01}) - \dim(\mathcal{M}_{10}) = p - q$.

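Case 1: assuming $p + q \le n$, we have (also see [16, 20]),

$$D = \left[\begin{array}{c|ccc}
C_0 & O & S_0 & O \\
O_{(p-q) \times q} & O & O & I_{p-q} \\ \hline
-S_0 & O & C_0 & O \\
O_{(n-p-q) \times q} & I_{n-p-q} & O & O
\end{array}\right],$$

where $C_0 = \mathrm{diag}(\cos(\theta_1), \ldots, \cos(\theta_q))$ and $S_0 = \mathrm{diag}(\sin(\theta_1), \ldots, \sin(\theta_q))$ with $\theta_k \in [0, \pi/2]$ for $k = 1, \ldots, q$.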

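Case 2: assuming $p + q \ge n$, we have

$$D = \left[\begin{array}{cc|cc}
C_0 & O_{(n-p) \times (p+q-n)} & S_0 & O \\
O_{(p+q-n) \times (n-p)} & I_{p+q-n} & O & O \\
O_{(p-q) \times (n-p)} & O_{(p-q) \times (p+q-n)} & O & I_{p-q} \\ \hline
-S_0 & O_{(n-p) \times (p+q-n)} & C_0 & O
\end{array}\right],$$

where $C_0 = \mathrm{diag}(\cos(\theta_1), \ldots, \cos(\theta_{n-p}))$ and $S_0 = \mathrm{diag}(\sin(\theta_1), \ldots, \sin(\theta_{n-p}))$ with $\theta_k \in [0, \pi/2]$ for $k = 1, \ldots, n - p$.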


Due to the fact that the angles are symmetric, i.e., $\Theta(\mathcal{X}, \mathcal{Y}) = \Theta(\mathcal{Y}, \mathcal{X})$, the matrix $T$ in Theorem 4.1 could be substituted with $Y_\perp^H X (Y^H X)^\dagger$. Moreover, the nonzero angles between the subspaces $\mathcal{X}$ and $\mathcal{Y}$ are the same as those between the subspaces $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$. Hence, $T$ can be presented as $X^H Y_\perp (X_\perp^H Y_\perp)^\dagger$ and $Y^H X_\perp (Y_\perp^H X_\perp)^\dagger$. Furthermore, for any matrix $T$, we have $S(T) = S(T^H)$, which implies that all conjugate transposes of $T$ are valid. Let $\mathcal{F}$ denote a family of matrices, such that the singular values of the matrix in $\mathcal{F}$ are the tangents of PABS. To sum up, any of the formulas for $T$ in $\mathcal{F}$ in the first column of Table 1 can be used in Theorem 4.1.

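Table 1: Different matrices in $\mathcal{F}$: matrix $T \in \mathcal{F}$ using orthonormal bases.

$$\begin{array}{ll}
X_\perp^H Y (X^H Y)^\dagger & P_{\mathcal{X}^\perp} Y (X^H Y)^\dagger \\
Y_\perp^H X (Y^H X)^\dagger & P_{\mathcal{Y}^\perp} X (Y^H X)^\dagger \\
X^H Y_\perp (X_\perp^H Y_\perp)^\dagger & P_{\mathcal{X}} Y_\perp (X_\perp^H Y_\perp)^\dagger \\
Y^H X_\perp (Y_\perp^H X_\perp)^\dagger & P_{\mathcal{Y}} X_\perp (Y_\perp^H X_\perp)^\dagger \\
(Y^H X)^\dagger Y^H X_\perp & (Y^H X)^\dagger Y^H P_{\mathcal{X}^\perp} \\
(X^H Y)^\dagger X^H Y_\perp & (X^H Y)^\dagger X^H P_{\mathcal{Y}^\perp} \\
(Y_\perp^H X_\perp)^\dagger Y_\perp^H X & (Y_\perp^H X_\perp)^\dagger Y_\perp^H P_{\mathcal{X}} \\
(X_\perp^H Y_\perp)^\dagger X_\perp^H Y & (X_\perp^H Y_\perp)^\dagger X_\perp^H P_{\mathcal{Y}}
\end{array}$$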

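Using the fact that the singular values are invariant under unitary multiplications, we can also use $P_{\mathcal{X}^\perp} Y (X^H Y)^\dagger$ for $T$ in Theorem 4.1, where $P_{\mathcal{X}^\perp}$ is an orthogonal projector onto the subspace $\mathcal{X}^\perp$. Thus, we can use $T$ as in the second column in Table 1, where $P_{\mathcal{X}}$, $P_{\mathcal{Y}}$, and $P_{\mathcal{Y}^\perp}$ denote the orthogonal projectors onto the subspaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Y}^\perp$, correspondingly.

Remark 4.4. From the proof of Theorem 4.1, we immediately derive that

$$X_\perp^H Y (X^H Y)^\dagger = -(Y_\perp^H X_\perp)^\dagger Y_\perp^H X \quad \text{and} \quad Y_\perp^H X (Y^H X)^\dagger = -(X_\perp^H Y_\perp)^\dagger X_\perp^H Y.$$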
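The first identity of Remark 4.4 and the equivalence of the two columns of Table 1 can be checked on random data. This is a sketch under assumed generic real test subspaces; the helper `positive_sv` and its tolerance are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 7, 3, 2
QX, _ = np.linalg.qr(rng.standard_normal((n, n)))
QY, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = QX[:, :p], QX[:, p:]
Y, Y_perp = QY[:, :q], QY[:, q:]

pinv = np.linalg.pinv
T1 = X_perp.T @ Y @ pinv(X.T @ Y)                  # first row of Table 1
T2 = -pinv(Y_perp.T @ X_perp) @ Y_perp.T @ X       # right-hand side, Remark 4.4
T3 = (np.eye(n) - X @ X.T) @ Y @ pinv(X.T @ Y)     # P_{X_perp} Y (X^H Y)^dagger

def positive_sv(M, tol=1e-10):
    """Singular values of M above a small threshold."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[s > tol]

print(positive_sv(T1), positive_sv(T3))
```

The matrices `T1` and `T2` agree entrywise, while `T3` has a different shape but the same positive singular values.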

Remark 4.5. Let $X_\perp^H Y (X^H Y)^\dagger = U_2 \Sigma_1 U_1^H$ and $Y_\perp^H X (Y^H X)^\dagger = V_2 \Sigma_2 V_1^H$ be the SVDs, where $U_i$ and $V_i$ ($i = 1, 2$) are unitary matrices and $\Sigma_i$ ($i = 1, 2$) are diagonal matrices. Let the diagonal elements of $\Sigma_1$ and $\Sigma_2$ be in the same order. Then the principal vectors associated with the pair of subspaces $\mathcal{X}$ and $\mathcal{Y}$ are given by the columns of $X U_1$ and $Y V_1$.

5. tan Θ in terms of the bases of subspaces

In the previous section, we use matrices $X$ and $Y$ with orthonormal columns to formulate $T$. It turns out that we can choose a non-orthonormal basis for one of the subspaces $\mathcal{X}$ or $\mathcal{Y}$.

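Theorem 5.1. Let $[X, X_\perp]$ be a unitary matrix with $X \in \mathbb{C}^{n \times p}$. Let $Y \in \mathbb{C}^{n \times q}$ and $\mathrm{rank}(Y) = \mathrm{rank}(X^H Y) \le p$. Then the positive singular values of the matrix $T = X_\perp^H Y (X^H Y)^\dagger$ satisfy $\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = [S_+(T), 0, \ldots, 0]$.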

Proof. Let the rank of $Y$ be $r$; thus $r \le p$. Let the SVD of $Y$ be $U \Sigma V^H$, where $U$ is an $n \times n$ unitary matrix; $\Sigma$ is an $n \times q$ real diagonal matrix with diagonal entries $s_1, \ldots, s_r, 0, \ldots, 0$ ordered by decreasing magnitude such that $s_1 \ge s_2 \ge \cdots \ge s_r > 0$; and $V$ is a $q \times q$ unitary matrix. Since $\mathrm{rank}(Y) = r \le q$, we can compute a reduced SVD such that $Y = U_r \Sigma_r V_r^H$. Only the $r$ column vectors of $U$ and the $r$ row vectors of $V^H$ corresponding to nonzero singular values are used, which means that $\Sigma_r$ is an $r \times r$ invertible diagonal matrix. Based on the fact that the left singular vectors corresponding to the nonzero singular values of $Y$ span the range of $Y$, we have

$$\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = \tan\Theta(\mathcal{R}(X), \mathcal{R}(U_r)).$$

Since $\mathrm{rank}(X^H Y) = r$, we have $\mathrm{rank}(X^H U_r \Sigma_r) = \mathrm{rank}(X^H U_r) = r$. Let $T_1 = X_\perp^H U_r (X^H U_r)^\dagger$. According to Theorem 4.1, it follows that

$$\tan\Theta(\mathcal{R}(X), \mathcal{R}(U_r)) = [S_+(T_1), 0, \ldots, 0].$$

It is worth noting that the angles between $\mathcal{R}(X)$ and $\mathcal{R}(U_r)$ are in $[0, \pi/2)$, since $X^H U_r$ is full rank.

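Our task is now to show that $T_1 = T$. By direct computation, we have

$$\begin{aligned}
T = X_\perp^H Y (X^H Y)^\dagger &= X_\perp^H U_r \Sigma_r V_r^H (X^H U_r \Sigma_r V_r^H)^\dagger \\
&= X_\perp^H U_r \Sigma_r V_r^H (V_r^H)^\dagger (X^H U_r \Sigma_r)^\dagger \\
&= X_\perp^H U_r \Sigma_r (X^H U_r \Sigma_r)^\dagger \\
&= X_\perp^H U_r \Sigma_r \Sigma_r^\dagger (X^H U_r)^\dagger \\
&= X_\perp^H U_r (X^H U_r)^\dagger.
\end{aligned}$$

In two identities above we use the fact that if a matrix $A$ is of full column rank, and a matrix $B$ is of full row rank, then $(AB)^\dagger = B^\dagger A^\dagger$. Hence, $\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = [S_+(T), 0, \ldots, 0]$, which completes the proof.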
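The conclusion of the proof, that a non-orthonormal $Y$ and an orthonormal basis of the same range give the same $T$, can be illustrated numerically. The random data and the helper `T_matrix` below are assumptions for this sketch; the rank condition of Theorem 5.1 holds generically for the chosen sizes.

```python
import numpy as np

def T_matrix(X, X_perp, Y):
    """T = X_perp^H Y (X^H Y)^dagger for real test matrices."""
    return X_perp.T @ Y @ np.linalg.pinv(X.T @ Y)

rng = np.random.default_rng(4)
n, p, q = 7, 3, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Q[:, :p], Q[:, p:]

Y = rng.standard_normal((n, q))          # generic, non-orthonormal columns
Y_orth, _ = np.linalg.qr(Y)              # orthonormal basis of the same range

T_raw = T_matrix(X, X_perp, Y)
T_orth = T_matrix(X, X_perp, Y_orth)

# Rank condition of Theorem 5.1: rank(Y) = rank(X^H Y) <= p.
ranks = (np.linalg.matrix_rank(Y), np.linalg.matrix_rank(X.T @ Y))
print(ranks, np.max(np.abs(T_raw - T_orth)))
```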


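Remark 5.2. Let $X^H Y$ have full rank. Let us define $Z = X + X_\perp T$ following [6, p. 231-232] and [15, Theorem 2.4, p. 252]. Since $(X^H Y)^\dagger X^H Y = I$, the identities $Y = P_{\mathcal{X}} Y + P_{\mathcal{X}^\perp} Y = X X^H Y + X_\perp X_\perp^H Y = Z X^H Y$ imply that $\mathcal{R}(Y) \subseteq \mathcal{R}(Z)$. By direct calculation, we obtain that

$$X^H Z = X^H (X + X_\perp T) = I \quad \text{and} \quad Z^H Z = (X + X_\perp T)^H (X + X_\perp T) = I + T^H T.$$

Thus, $X^H Z (Z^H Z)^{-1/2} = (I + T^H T)^{-1/2}$ is Hermitian positive definite. The matrix $Z (Z^H Z)^{-1/2}$ by construction has orthonormal columns which span the space $\mathcal{Z}$. Moreover, we observe that

$$S^{\uparrow}\left(X^H Z (Z^H Z)^{-1/2}\right) = \Lambda^{\uparrow}\left((I + T^H T)^{-1/2}\right) = \left(1 + S^2(T)\right)^{-1/2}.$$

Therefore, $\tan\Theta(\mathcal{R}(X), \mathcal{R}(Z)) = [S_+(T), 0, \ldots, 0]$ and $\dim(\mathcal{X}) = \dim(\mathcal{Z})$. From Theorem 5.1, we have $\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) \subset \tan\Theta(\mathcal{R}(X), \mathcal{R}(Z))$. It is worth noting that the angles of $\Theta(\mathcal{R}(X), \mathcal{R}(Y))$ and $\Theta(\mathcal{R}(X), \mathcal{R}(Z))$ in $(0, \pi/2)$ are the same. For the case $p = q$, we have that $\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = \Theta(\mathcal{R}(X), \mathcal{R}(Z))$. This gives us an explicit matrix expression $X_\perp^H Y (X^H Y)^{-1}$ for the matrix $P$ in [6, p. 231-232] and [15, Theorem 2.4, p. 252].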
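The identities of Remark 5.2 can be checked for $p = q$ with a short sketch. The random real test matrices and the seed are assumptions; $X^H Y$ is square and generically invertible for this choice.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 7, 3                              # p = q
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Q[:, :p], Q[:, p:]
Y = rng.standard_normal((n, p))          # generic, not orthonormal

T = X_perp.T @ Y @ np.linalg.inv(X.T @ Y)
Z = X + X_perp @ T

# Cosines of the angles between R(X) and R(Z) versus (1 + s^2)^(-1/2).
Z_orth, _ = np.linalg.qr(Z)
cos_XZ = np.sort(np.linalg.svd(X.T @ Z_orth, compute_uv=False))
s = np.linalg.svd(T, compute_uv=False)
cos_from_T = np.sort(1.0 / np.sqrt(1.0 + s**2))
print(cos_XZ, cos_from_T)
```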

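Remark 5.3. The condition $\mathrm{rank}(Y) = \mathrm{rank}(X^H Y) \le p$ is necessary in Theorem 5.1. For example, let

$$X = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad X_\perp = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \text{and} \quad Y = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

Then, we have $X^H Y = (\, 1 \;\; 1 \,)$ and $(X^H Y)^\dagger = (\, 1/2 \;\; 1/2 \,)^H$. Thus, $T = X_\perp^H Y (X^H Y)^\dagger = (\, 1/2 \;\; 0 \,)^H$, and $s(T) = 1/2$. On the other hand, we obtain that $\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = 0$. From this example, we see that the result in Theorem 5.1 may fail when $\mathrm{rank}(Y) > p$, in which case $\mathrm{rank}(Y) = \mathrm{rank}(X^H Y)$ cannot hold.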
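The counterexample of Remark 5.3 can be reproduced directly. The sketch below uses exactly the matrices of the remark; only the variable names are assumptions.

```python
import numpy as np

X = np.array([[1.0], [0.0], [0.0]])
X_perp = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y = np.array([[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])

T = X_perp.T @ Y @ np.linalg.pinv(X.T @ Y)
s_T = np.linalg.svd(T, compute_uv=False)          # value given by the formula

# True angle: R(X) is contained in R(Y), so the only principal angle is 0.
Y_orth, _ = np.linalg.qr(Y)
cos_theta = np.linalg.svd(X.T @ Y_orth, compute_uv=False)

ranks = (np.linalg.matrix_rank(Y), np.linalg.matrix_rank(X.T @ Y))
print(s_T, cos_theta, ranks)
```

The rank condition fails here because $\mathrm{rank}(Y) = 2$ while $\mathrm{rank}(X^H Y) = 1$.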

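Corollary 5.1. Using the notation of Theorem 5.1, let $Y$ be full rank and $p = q$. Let $P_{\mathcal{X}^\perp}$ be an orthogonal projection onto the subspace $\mathcal{R}(X_\perp)$. Then $\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = S\left(P_{\mathcal{X}^\perp} Y (X^H Y)^{-1}\right)$.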

Proof. Since the singular values are invariant under unitary transforms, it is easy to obtain

$$\left[S\left(X_\perp^H Y (X^H Y)^{-1}\right), 0, \ldots, 0\right] = S\left(P_{\mathcal{X}^\perp} Y (X^H Y)^{-1}\right).$$

Moreover, the number of the singular values of $P_{\mathcal{X}^\perp} Y (X^H Y)^{-1}$ is $p$; hence we obtain the result as stated.


Above we construct $T$ explicitly in terms of bases of subspaces $\mathcal{X}$ and $\mathcal{Y}$. Next, we provide an alternative construction by adopting only the triangular matrix from the QR factorization of a basis of the subspace $\mathcal{X} + \mathcal{Y}$.

Corollary 5.2. Let $X \in \mathbb{C}^{n \times p}$ and $Y \in \mathbb{C}^{n \times q}$ be matrices of full rank where $q \le p$. Let the triangular part $R$ of the reduced QR factorization of the matrix $L = [X, Y]$ be

$$R = \begin{bmatrix} R_{11} & R_{12} \\ O & R_{22} \end{bmatrix},$$

where $R_{12}$ is full rank. Then the positive singular values of the matrix $T = R_{22} (R_{12})^\dagger$ satisfy $\tan\Theta(\mathcal{R}(X), \mathcal{R}(Y)) = [S_+(T), 0, \ldots, 0]$.

Proof. Let the reduced QR factorization of $L$ be

$$[\, X \; Y \,] = [\, Q_1 \; Q_2 \,] \begin{bmatrix} R_{11} & R_{12} \\ O & R_{22} \end{bmatrix},$$

where the columns of $[\, Q_1 \; Q_2 \,]$ are orthonormal, and $\mathcal{R}(Q_1) = \mathcal{X}$. We directly obtain $Y = Q_1 R_{12} + Q_2 R_{22}$. Multiplying by $Q_1^H$ on both sides of the above equality for $Y$, we get $R_{12} = Q_1^H Y$. Similarly, multiplying by $Q_2^H$ we have $R_{22} = Q_2^H Y$. Combining these equalities with Theorem 5.1 completes the proof.
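Corollary 5.2 translates into a few lines of NumPy. The sketch below assumes generic random full-rank real matrices with $p + q \le n$, so that the reduced QR factorization of $[X, Y]$ has a square triangular factor; the reference tangents come from orthonormal bases as in Theorem 2.1.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 7, 3, 2                        # q <= p and p + q <= n
X = rng.standard_normal((n, p))          # full rank, not orthonormal
Y = rng.standard_normal((n, q))

# Reduced QR of L = [X, Y]; R is (p+q) x (p+q) upper triangular.
_, R = np.linalg.qr(np.hstack([X, Y]))
R12, R22 = R[:p, p:], R[p:, p:]
T = R22 @ np.linalg.pinv(R12)            # q x p matrix
tan_qr = np.linalg.svd(T, compute_uv=False)

# Reference tangents from orthonormal bases of R(X) and R(Y).
QX, _ = np.linalg.qr(X)
QY, _ = np.linalg.qr(Y)
c = np.linalg.svd(QX.T @ QY, compute_uv=False)
tan_ref = np.sort(np.sqrt(1.0 - c**2) / c)[::-1]
print(tan_qr, tan_ref)
```

Only the triangular factor is used, so the orthogonal factor never has to be formed explicitly in this approach.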

6. tan Θ in terms of the orthogonal projections onto subspaces

In this section, we present $T$ using orthogonal projectors only. Before we proceed, we need a lemma and a corollary.

Lemma 6.1. [6] If $U$ and $V$ are unitary matrices, then for any matrix $A$ we have $(UAV)^\dagger = V^H A^\dagger U^H$.

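Corollary 6.1. Let $A \in \mathbb{C}^{n \times n}$ be a block matrix, such that

$$A = \begin{bmatrix} D & O_{12} \\ O_{21} & O_{22} \end{bmatrix},$$

where $D$ is a $p \times q$ matrix and $O_{12}$, $O_{21}$, and $O_{22}$ are zero matrices. Then

$$A^\dagger = \begin{bmatrix} D^\dagger & O_{21}^H \\ O_{12}^H & O_{22}^H \end{bmatrix}.$$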


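Proof. Let $\mathcal{U}(n)$ denote a family of unitary matrices of size $n$ by $n$. Let the SVD of $D$ be $U_1 \Sigma V_1^H$, where $\Sigma$ is a $p \times q$ diagonal matrix and $U_1 \in \mathcal{U}(p)$, $V_1 \in \mathcal{U}(q)$. Then there exist unitary matrices $U_2 \in \mathcal{U}(n-p)$ and $V_2 \in \mathcal{U}(n-q)$, such that

$$A = \begin{bmatrix} U_1 & \\ & U_2 \end{bmatrix} \begin{bmatrix} \Sigma & O_{12} \\ O_{21} & O_{22} \end{bmatrix} \begin{bmatrix} V_1^H & \\ & V_2^H \end{bmatrix}.$$

Thus,

$$A^\dagger = \begin{bmatrix} V_1 & \\ & V_2 \end{bmatrix} \begin{bmatrix} \Sigma & O_{12} \\ O_{21} & O_{22} \end{bmatrix}^\dagger \begin{bmatrix} U_1^H & \\ & U_2^H \end{bmatrix} = \begin{bmatrix} V_1 & \\ & V_2 \end{bmatrix} \begin{bmatrix} \Sigma^\dagger & O_{21}^H \\ O_{12}^H & O_{22}^H \end{bmatrix} \begin{bmatrix} U_1^H & \\ & U_2^H \end{bmatrix}.$$

Since $D^\dagger = V_1 \Sigma^\dagger U_1^H$, we have

$$A^\dagger = \begin{bmatrix} D^\dagger & O_{21}^H \\ O_{12}^H & O_{22}^H \end{bmatrix}$$

as desired.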
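Corollary 6.1 is simple to confirm numerically. In this sketch the block sizes and the random block `D` are assumptions, and an explicit `rcond` is passed to `numpy.linalg.pinv` so that rounding-level singular values of the zero-padded matrix are treated as zero.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, q = 5, 2, 3
D = rng.standard_normal((p, q))

# A = [[D, O12], [O21, O22]] with zero blocks.
A = np.zeros((n, n))
A[:p, :q] = D

# Expected pseudoinverse: [[D^dagger, O21^H], [O12^H, O22^H]].
expected = np.zeros((n, n))
expected[:q, :p] = np.linalg.pinv(D)

A_pinv = np.linalg.pinv(A, rcond=1e-10)
print(np.max(np.abs(A_pinv - expected)))
```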

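Theorem 6.2. Let $P_{\mathcal{X}}$, $P_{\mathcal{X}^\perp}$ and $P_{\mathcal{Y}}$ be orthogonal projectors onto the subspaces $\mathcal{X}$, $\mathcal{X}^\perp$ and $\mathcal{Y}$, correspondingly. Then the positive singular values $S_+(T)$ of the matrix $T = P_{\mathcal{X}^\perp} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$ satisfy

$$\tan\Theta(\mathcal{X}, \mathcal{Y}) = [\infty, \ldots, \infty, S_+(T), 0, \ldots, 0].$$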

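Proof. Let the matrices $[\, X \; X_\perp \,]$ and $[\, Y \; Y_\perp \,]$ be unitary, where $\mathcal{R}(X) = \mathcal{X}$ and $\mathcal{R}(Y) = \mathcal{Y}$. By direct calculation, we obtain

$$\begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix} P_{\mathcal{X}} P_{\mathcal{Y}} [\, Y \; Y_\perp \,] = \begin{bmatrix} X^H Y & O \\ O & O \end{bmatrix}.$$

The matrix $P_{\mathcal{X}} P_{\mathcal{Y}}$ can be written as

$$P_{\mathcal{X}} P_{\mathcal{Y}} = [\, X \; X_\perp \,] \begin{bmatrix} X^H Y & O \\ O & O \end{bmatrix} \begin{bmatrix} Y^H \\ Y_\perp^H \end{bmatrix}. \qquad (1)$$

From Lemma 6.1 and Corollary 6.1, we have

$$(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = [\, Y \; Y_\perp \,] \begin{bmatrix} (X^H Y)^\dagger & O \\ O & O \end{bmatrix} \begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix}. \qquad (2)$$

In a similar way,

$$P_{\mathcal{X}^\perp} P_{\mathcal{Y}} = [\, X_\perp \; X \,] \begin{bmatrix} X_\perp^H Y & O \\ O & O \end{bmatrix} \begin{bmatrix} Y^H \\ Y_\perp^H \end{bmatrix}. \qquad (3)$$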


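Let $T = P_{\mathcal{X}^\perp} P_{\mathcal{Y}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$ and $B = X_\perp^H Y (X^H Y)^\dagger$, so

$$T = [\, X_\perp \; X \,] \begin{bmatrix} X_\perp^H Y & O \\ O & O \end{bmatrix} \begin{bmatrix} (X^H Y)^\dagger & O \\ O & O \end{bmatrix} \begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix} = [\, X_\perp \; X \,] \begin{bmatrix} B & O \\ O & O \end{bmatrix} \begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix}.$$

The singular values are invariant under unitary multiplications, therefore $S_+(T) = S_+(B)$.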

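Finally, we show that $P_{\mathcal{X}^\perp} P_{\mathcal{Y}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = P_{\mathcal{X}^\perp} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$. The null space of $P_{\mathcal{X}} P_{\mathcal{Y}}$ is the orthogonal sum $\mathcal{Y}^\perp \oplus (\mathcal{Y} \cap \mathcal{X}^\perp)$. The range of $(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$ is thus $\mathcal{Y} \cap (\mathcal{Y} \cap \mathcal{X}^\perp)^\perp$, which is the orthogonal complement of the null space of $P_{\mathcal{X}} P_{\mathcal{Y}}$. Moreover, $(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$ is an oblique projector, because $(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = \left((P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger\right)^2$, which can be obtained by direct calculation using equality (2). The oblique projector $(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$ projects onto $\mathcal{Y}$ along $\mathcal{X}^\perp$. Thus, we have $P_{\mathcal{Y}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$. Consequently, we can simplify $T = P_{\mathcal{X}^\perp} P_{\mathcal{Y}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$ as $T = P_{\mathcal{X}^\perp} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$.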
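The projector-only construction of Theorem 6.2 can be checked numerically in the generic case. The random real subspaces, the seed, and the thresholds in this sketch are assumptions; an explicit `rcond` is used because $P_{\mathcal{X}} P_{\mathcal{Y}}$ is rank deficient by construction.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, q = 7, 3, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, q)))
P_X, P_Y = X @ X.T, Y @ Y.T              # orthogonal projectors
P_X_perp = np.eye(n) - P_X

# rcond is set explicitly so that rounding-level singular values of the
# rank-deficient product P_X P_Y are treated as zero.
pinv_PP = np.linalg.pinv(P_X @ P_Y, rcond=1e-10)
T = P_X_perp @ pinv_PP
s = np.linalg.svd(T, compute_uv=False)
tan_proj = s[s > 1e-10]                  # positive singular values

# Reference tangents from the cosines of Theorem 2.1.
c = np.linalg.svd(X.T @ Y, compute_uv=False)
tan_ref = np.sort(np.sqrt(1.0 - c**2) / c)[::-1]
print(tan_proj, tan_ref)
```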

Figure 2: Geometrical meaning of T = PX⊥ (PXPY)†.

To gain geometrical insight into $T$, in Figure 2 we choose a generic unit vector $z$. We project $z$ onto $\mathcal{Y}$ along $\mathcal{X}^\perp$, then project onto the subspace $\mathcal{X}^\perp$; the result is interpreted as $Tz$. The red segment in the graph is the image under $T$ of all unit vectors. It is straightforward to see that $s(T) = \|T\| = \tan(\theta)$.


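Using Property 2.2 and the fact that the principal angles are symmetric with respect to the subspaces $\mathcal{X}$ and $\mathcal{Y}$, the expressions $P_{\mathcal{Y}^\perp} (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger$ and $P_{\mathcal{X}} (P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger$ can also be used in Theorem 6.2.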

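Remark 6.3. According to Remark 4.4 and Theorem 6.2, we have

$$P_{\mathcal{X}} (P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger = P_{\mathcal{X}} P_{\mathcal{Y}^\perp} (P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger = -\left(P_{\mathcal{X}^\perp} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger\right)^H.$$

Furthermore, by the geometrical properties of the oblique projector $(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger$, it follows that $(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = P_{\mathcal{Y}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger P_{\mathcal{X}}$. Therefore, we obtain

$$P_{\mathcal{X}^\perp} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = P_{\mathcal{X}^\perp} P_{\mathcal{Y}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = (P_{\mathcal{Y}} - P_{\mathcal{X}}) (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger.$$

Similarly, we have

$$P_{\mathcal{Y}^\perp} (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger = P_{\mathcal{Y}^\perp} P_{\mathcal{X}} (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger = (P_{\mathcal{X}} - P_{\mathcal{Y}}) (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger = -\left(P_{\mathcal{Y}} (P_{\mathcal{Y}^\perp} P_{\mathcal{X}^\perp})^\dagger\right)^H.$$

From the equalities above, we see that there are several different expressions for $T$ in Theorem 6.2.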

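Theorem 6.4. For subspaces $\mathcal{X}$ and $\mathcal{Y}$,

1. If $\dim(\mathcal{X}) \le \dim(\mathcal{Y})$, we have $P_{\mathcal{X}} P_{\mathcal{Y}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = P_{\mathcal{X}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = P_{\mathcal{X}}$.

2. If $\dim(\mathcal{X}) \ge \dim(\mathcal{Y})$, we have $(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger P_{\mathcal{X}} P_{\mathcal{Y}} = (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger P_{\mathcal{Y}} = P_{\mathcal{Y}}$.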

Proof. Using the fact that $X^H Y (X^H Y)^\dagger = I$ for $\dim(\mathcal{X}) \le \dim(\mathcal{Y})$ and combining identities (1) and (2), we obtain the first statement. Identities (1) and (2), together with the fact that $(X^H Y)^\dagger X^H Y = I$ for $\dim(\mathcal{X}) \ge \dim(\mathcal{Y})$, imply the second statement.

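Remark 6.5. From Theorem 6.4, for the case $\dim(\mathcal{X}) \le \dim(\mathcal{Y})$ we have

$$P_{\mathcal{X}^\perp} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger - P_{\mathcal{X}} (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger = (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger - P_{\mathcal{X}}.$$

Since the angles are symmetric, using the second statement in Theorem 6.4 we have $(P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger P_{\mathcal{Y}} = (P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger - P_{\mathcal{Y}^\perp}$.

On the other hand, for the case $\dim(\mathcal{X}) \ge \dim(\mathcal{Y})$ we obtain

$$(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger P_{\mathcal{Y}^\perp} = (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger - P_{\mathcal{Y}} \quad \text{and} \quad (P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger P_{\mathcal{X}} = (P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger - P_{\mathcal{X}^\perp}.$$

To sum up, the following formulas for $T$ in Table 2 can also be used in Theorem 6.2. An alternative proof for $T = (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger - P_{\mathcal{Y}}$ is provided by Drmač in [14] for the particular case $\dim(\mathcal{X}) = \dim(\mathcal{Y})$.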


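Table 2: Different formulas for $T$: left for $\dim(\mathcal{X}) \le \dim(\mathcal{Y})$; right for $\dim(\mathcal{X}) \ge \dim(\mathcal{Y})$.

$$\begin{array}{ll}
(P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger - P_{\mathcal{X}} & (P_{\mathcal{X}} P_{\mathcal{Y}})^\dagger - P_{\mathcal{Y}} \\
(P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger - P_{\mathcal{Y}^\perp} & (P_{\mathcal{X}^\perp} P_{\mathcal{Y}^\perp})^\dagger - P_{\mathcal{X}^\perp}
\end{array}$$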
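The first entry of the left column of Table 2 can be checked with a sketch for $\dim(\mathcal{X}) \le \dim(\mathcal{Y})$. The random real subspaces are assumptions, chosen so that $X^H Y$ generically has full row rank, as the proof of Theorem 6.4 uses.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, q = 7, 2, 3                        # dim(X) <= dim(Y): left column
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, q)))
P_X, P_Y = X @ X.T, Y @ Y.T

# Explicit rcond: P_X P_Y is rank deficient by construction.
pinv_PP = np.linalg.pinv(P_X @ P_Y, rcond=1e-10)
T_left = pinv_PP - P_X                   # (P_X P_Y)^dagger - P_X
T_ref = (np.eye(n) - P_X) @ pinv_PP      # P_{X_perp} (P_X P_Y)^dagger

s = np.linalg.svd(T_left, compute_uv=False)
c = np.linalg.svd(X.T @ Y, compute_uv=False)
tan_ref = np.sort(np.sqrt(1.0 - c**2) / c)[::-1]
print(s[:p], tan_ref)
```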

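Remark 6.6. Finally, we note that our choice of the space $\mathcal{H} = \mathbb{C}^n$ may appear natural to the reader familiar with the matrix theory, but in fact is somewhat misleading. The principal angles (and the corresponding principal vectors) between the subspaces $\mathcal{X} \subset \mathcal{H}$ and $\mathcal{Y} \subset \mathcal{H}$ are exactly the same as those between the subspaces $\mathcal{X} \subset \mathcal{X} + \mathcal{Y}$ and $\mathcal{Y} \subset \mathcal{X} + \mathcal{Y}$, i.e., we can reduce the space $\mathcal{H}$ to the space $\mathcal{X} + \mathcal{Y} \subset \mathcal{H}$ without changing PABS.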

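This reduction changes the definition of the subspaces $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$ and, thus, of the matrices $X_\perp$ and $Y_\perp$ whose columns span the subspaces $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$. All our statements that use the subspaces $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$ or the matrices $X_\perp$ and $Y_\perp$ therefore have their new analogs, if the space $\mathcal{X} + \mathcal{Y}$ substitutes for $\mathcal{H}$. The formulas $P_{\mathcal{X}^\perp} = I - P_{\mathcal{X}}$ and $P_{\mathcal{Y}^\perp} = I - P_{\mathcal{Y}}$ look the same, but the identity operators are different in the spaces $\mathcal{X} + \mathcal{Y}$ and $\mathcal{H}$.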

References

[1] C. Jordan, Essai sur la géométrie à n dimensions, Bull. Soc. Math. France 3 (1875) 103–174.

[2] H. Hotelling, Relations between two sets of variates, Biometrika 28 (1936) 321–377.

[3] F. Deutsch, The angle between subspaces of a Hilbert space, in: Approximation theory, wavelets and applications (Maratea, 1994), volume 454 of NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., Kluwer Acad. Publ., Dordrecht, 1995, pp. 107–130.

[4] I. C. F. Ipsen, C. D. Meyer, The angle between complementary subspaces, Amer. Math. Monthly 102 (1995) 904–911.

[5] A. V. Knyazev, M. E. Argentati, Majorization for changes in angles between subspaces, Ritz values, and graph Laplacian spectra, SIAM J. Matrix Anal. Appl. 29 (2006/07) 15–32 (electronic).

[6] G. W. Stewart, J. Sun, Matrix perturbation theory, Computer Science and Scientific Computing, Academic Press Inc., Boston, MA, 1990.

[7] J. Sun, Perturbation of angles between linear subspaces, J. Comput. Math. 5 (1987) 58–61.

[8] P. Wedin, On angles between subspaces of a finite dimensional inner product space, in: B. Kågström, A. Ruhe (Eds.), Matrix Pencils, volume 973 of Lecture Notes in Mathematics, Springer Berlin Heidelberg, 1983, pp. 263–285.

[9] C. Davis, W. M. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal. 7 (1970) 1–46.

[10] Å. Björck, G. Golub, Numerical methods for computing angles between linear subspaces, Math. Comp. 27 (1973) 579–594.

[11] G. H. Golub, H. Zha, The canonical correlations of matrix pairs and their numerical computation, in: Linear algebra for signal processing (Minneapolis, MN, 1992), volume 69 of IMA Vol. Math. Appl., Springer, New York, 1995, pp. 27–49.

[12] A. V. Knyazev, A. Jujunashvili, M. Argentati, Angles between infinite dimensional subspaces with applications to the Rayleigh-Ritz and alternating projectors methods, J. Funct. Anal. 259 (2010) 1323–1345.

[13] N. Bosner, Z. Drmač, Subspace gap residuals for Rayleigh-Ritz approximations, SIAM J. Matrix Anal. Appl. 31 (2009) 54–67.

[14] Z. Drmač, On relative residual bounds for the eigenvalues of a Hermitian matrix, Linear Algebra Appl. 244 (1996) 155–163.

[15] G. W. Stewart, Matrix algorithms. Vol. II, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. Eigensystems.

[16] A. Edelman, B. D. Sutton, The beta-Jacobi matrix model, the CS decomposition, and generalized singular value problems, Found. Comput. Math. 8 (2008) 259–285.

[17] C. C. Paige, M. A. Saunders, Towards a generalized singular value decomposition, SIAM J. Numer. Anal. 18 (1981) 398–405.

[18] C. C. Paige, M. Wei, History and generality of the CS decomposition, Linear Algebra Appl. 208/209 (1994) 303–326.

[19] P. R. Halmos, Two subspaces, Trans. Amer. Math. Soc. 144 (1969) 381–389.

[20] B. D. Sutton, Computing the complete CS decomposition, Numer. Algorithms 50 (2009) 33–65.
