
Lecture 4: Randomized Algorithms for Low-Rank Approximations

March 6 - 13, 2020

1. Low-rank approximation

Given $A \in \mathbb{R}^{m\times n}$, we seek to compute $E \in \mathbb{R}^{m\times k}$ and $F \in \mathbb{R}^{k\times n}$ such that

$$A \approx EF, \qquad \mathrm{rank}(E) = \mathrm{rank}(F) = k, \qquad k \ll \min\{m,n\}.$$

Case 1: the rank $k$ is given in advance.

Case 2: determine the rank $k$ such that

$$\|A - EF\| \le \varepsilon,$$

where $\varepsilon$ is a given tolerance and $\|\cdot\|$ is the spectral or the Frobenius norm.

Advantages: (1) storage; (2) complexity; (3) many others.

Rank-$k$ EVD for symmetric $A$: $A \approx UDU^T$ with diagonal $D \in \mathbb{R}^{k\times k}$.

Rank-$k$ SVD for general $A$: $A \approx UDV^T$.

1.1 A simple prototypical randomized algorithm

Let $A$ be a matrix of size $m\times n$ that is approximately of low rank. Draw a Gaussian random matrix $G$ of size $n\times k$ and form the sampling matrix

$$E = AG,$$

and then compute the factor $F$ via

$$F = E^{\dagger}A,$$

where $E^{\dagger}$ is the Moore-Penrose pseudo-inverse of $E$. In many important situations the approximation

$$A \approx E(E^{\dagger}A)$$

is close to optimal.
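As a concrete illustration, here is a minimal Matlab sketch of this prototypical algorithm on a synthetic test matrix (the sizes and variable names are ours, chosen for the example):

m = 200; n = 100; k = 10;
A = randn(m, k) * randn(k, n);   % test matrix of exact rank k
G = randn(n, k);                 % Gaussian random matrix
E = A * G;                       % sampling matrix E = AG
F = pinv(E) * A;                 % F = E^dagger * A
norm(A - E*F, 'fro')             % small, since rank(A) = k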


1.2 Advantages of randomized methods

Given an $m\times n$ matrix $A$, the cost of computing a rank-$k$ approximant using classical methods is $O(mnk)$. Randomized algorithms can attain complexity $O(mn\log k + k^2(m+n))$.

Algorithms for performing principal component analysis (PCA) of large data sets have been greatly accelerated, in particular when the data is stored out-of-core.

Randomized methods tend to require less communication than traditional methods and can be efficiently implemented in severely communication-constrained environments such as GPUs and distributed computing platforms.

Randomized algorithms have enabled the development of single-pass matrix factorization algorithms in which the matrix is "streamed" and never stored.

1.3 A two-stage approach for an approximate rank-k SVD

Stage A (find an approximate range): construct an $m\times k$ matrix $Q$ with orthonormal columns such that

$$A \approx QQ^TA.$$

(In other words, the columns of $Q$ form an approximate basis for the column space of $A$.) This step will be executed via a randomized process.

Stage B (form a specific factorization): given the matrix $Q$ computed in Stage A, form the factors $U$, $D$, $V$ using classical deterministic techniques. For instance, this stage can be executed via the following steps:

(1) Form the $k\times n$ matrix $B = Q^TA$.

(2) Compute an SVD of the (small) matrix $B$: $B = \widehat{U}DV^T$.

(3) Form $U = Q\widehat{U}$.

A randomized algorithm for "Stage A":

(1) Draw a Gaussian random matrix $G$ and form $AG$.

(2) Construct an orthonormal matrix $Q$ s.t.

$$\mathrm{range}(Q) = \mathrm{range}(AG).$$

Motivation:

(i) The special case $\mathrm{rank}(A) = k$: let $G \in \mathbb{R}^{n\times k}$. One can prove that

$$\mathrm{range}(Q) = \mathrm{range}(A)$$

holds with probability 1.

(ii) Practical cases $\mathrm{rank}(A) \gg k$: simply take a few extra samples. It turns out that if we take, say, $k+10$ samples instead of $k$, then the process will, with probability almost 1, produce a basis $Q_k$ that is comparable to the best possible basis $U_k$, where $U_k$ is the matrix consisting of the $k$ leading left singular vectors; see the sketch below.
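A small Matlab experiment (a sketch; the sizes and the decay rate of the spectrum are ours) comparing the randomized basis with over-sampling 10 against the optimal error $\sigma_{k+1}$:

m = 500; n = 400; k = 20;
[U,~] = qr(randn(m)); [V,~] = qr(randn(n));
s = 0.7.^(0:n-1);                    % decaying singular values
A = U(:,1:n) * diag(s) * V';         % test matrix with known spectrum
[Q,~] = qr(A * randn(n, k+10), 0);   % randomized basis from k+10 samples
norm(A - Q*(Q'*A))                   % compare against ...
s(k+1)                               % ... the optimal error sigma_{k+1}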


Notation (orthonormalization): given an $m\times \ell$ matrix $X$ with $\mathrm{rank}(X) = \ell$, where $m \ge \ell$, we introduce the function

$$Q = \mathrm{orth}(X)$$

to denote orthonormalization of the columns of $X$. In other words, $Q$ will be an $m\times \ell$ orthonormal matrix whose columns form a basis for the column space of $X$, i.e.,

$$Q^TQ = I \quad \text{and} \quad \mathrm{range}(Q) = \mathrm{range}(X).$$

(1) Via QR factorization:

$$X = QR.$$

In Matlab: [Q,~] = qr(X, 0).

(2) Via SVD factorization:

$$X = QDV^T.$$

Algorithm 1: Randomized SVD

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, over-sampling parameter $p$.

Output: $U \in \mathbb{R}^{m\times(k+p)}$, $D \in \mathbb{R}^{(k+p)\times(k+p)}$, and $V \in \mathbb{R}^{n\times(k+p)}$.

Stage A:
(1) Form an $n\times(k+p)$ Gaussian random matrix $G$.
(2) Form the sample matrix $Y = AG$.
(3) Orthonormalize the columns of the sample matrix: $Q = \mathrm{orth}(Y)$.

Stage B:
(4) Form the $(k+p)\times n$ matrix $B = Q^TA$.
(5) Form the SVD of the small matrix $B$: $B = \widehat{U}DV^T$.
(6) Form $U = Q\widehat{U}$.

Randomized SVD accesses the matrix $A$ twice.
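A runnable Matlab version of Algorithm 1 (a sketch; the function name is ours):

function [U, D, V] = rsvd(A, k, p)
    n = size(A, 2);
    G = randn(n, k+p);              % Stage A: Gaussian random matrix
    Y = A * G;                      % sample matrix
    [Q, ~] = qr(Y, 0);              % Q = orth(Y)
    B = Q' * A;                     % Stage B: small (k+p) x n matrix
    [Uhat, D, V] = svd(B, 'econ');  % SVD of the small matrix
    U = Q * Uhat;                   % lift the left factor back to R^m
end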


Theorem 1

Let $A$ be an $m\times n$ matrix with singular values $\{\sigma_j\}_{j=1}^{\min(m,n)}$. Let $k$ be a target rank and let $p$ be an over-sampling parameter such that $p \ge 2$ and $k+p \le \min(m,n)$. Let $G$ be a Gaussian random matrix of size $n\times(k+p)$ and set $Q = \mathrm{orth}(AG)$. Then the average error, as measured in the Frobenius norm, satisfies

$$\mathbb{E}\,\|A - QQ^TA\|_F \le \left(1 + \frac{k}{p-1}\right)^{1/2} \left(\sum_{j=k+1}^{\min(m,n)} \sigma_j^2\right)^{1/2},$$

where $\mathbb{E}$ refers to expectation with respect to the draw of $G$. The corresponding result for the spectral norm reads

$$\mathbb{E}\,\|A - QQ^TA\|_2 \le \left(1 + \sqrt{\frac{k}{p-1}}\right)\sigma_{k+1} + \frac{e\sqrt{k+p}}{p} \left(\sum_{j=k+1}^{\min(m,n)} \sigma_j^2\right)^{1/2}.$$

Bounds on the likelihood of large deviations

The likelihood of a large deviation from the mean depends only on the over-sampling parameter $p$, and decays extraordinarily fast. For instance, one can prove that if $p \ge 4$, then

$$\|A - QQ^TA\|_2 \le \left(1 + 17\sqrt{1 + \frac{k}{p}}\right)\sigma_{k+1} + \frac{8\sqrt{k+p}}{p+1} \left(\sum_{j=k+1}^{\min(m,n)} \sigma_j^2\right)^{1/2}$$

with failure probability at most $3e^{-p}$.

1.4 Single-pass algorithms

Algorithm 2: Single-pass randomized EVD for a symmetric matrix

Input: symmetric $A \in \mathbb{R}^{n\times n}$, a target rank $k$, over-sampling parameter $p$.

Output: $U \in \mathbb{R}^{n\times k}$ and $D \in \mathbb{R}^{k\times k}$ with $A \approx UDU^T$.

Stage A:
(1) Form an $n\times(k+p)$ Gaussian random matrix $G$.
(2) Form the sample matrix $Y = AG$.
(3) Let $Q$ denote the orthonormal matrix formed by the $k$ dominant left singular vectors of $Y$.

Stage B:
(4) Let $C = \mathrm{argmin}_{X \in \mathbb{R}^{k\times k},\ X = X^T} \|XQ^TG - Q^TY\|_F$.
(5) Form the EVD of the small matrix $C$: $C = \widehat{U}D\widehat{U}^T$.
(6) Form $U = Q\widehat{U}$.
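Step (4) uses only $G$, $Y$, and $Q$, so no second pass over $A$ is needed. One way to realize it (a sketch, not the lecture's prescription: we solve the unconstrained least-squares problem and then symmetrize) is:

n = 200; k = 10; p = 5;
M = randn(n, k); A = M * M';          % symmetric rank-k test matrix
G = randn(n, k+p); Y = A * G;         % the single pass over A
[W, ~, ~] = svd(Y, 'econ'); Q = W(:, 1:k);
C = (Q' * Y) / (Q' * G);              % least-squares solve of C*(Q'G) = Q'Y
C = (C + C') / 2;                     % project onto symmetric matrices
[Uhat, D] = eig(C); U = Q * Uhat;
norm(A - U*D*U') / norm(A)            % small for this rank-k example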


Algorithm 3: Single-pass randomized SVD for a general matrix

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, over-sampling parameter $p$.
Output: $U \in \mathbb{R}^{m\times k}$, $D \in \mathbb{R}^{k\times k}$, $V \in \mathbb{R}^{n\times k}$ with $A \approx UDV^T$.

Stage A:
(1) Form two Gaussian random matrices $G_c \in \mathbb{R}^{n\times(k+p)}$ and $G_r \in \mathbb{R}^{m\times(k+p)}$.
(2) Form the sample matrices $Y_c = AG_c$ and $Y_r = A^TG_r$.
(3) Form orthonormal matrices $Q_c$ and $Q_r$ consisting of the $k$ dominant left singular vectors of $Y_c$ and $Y_r$.

Stage B:
(4) Let

$$C = \mathrm{argmin}_{X \in \mathbb{R}^{k\times k}} \|G_r^TQ_cX - Y_r^TQ_r\|_F^2 + \|XQ_r^TG_c - Q_c^TY_c\|_F^2.$$

(5) Form the SVD of the small matrix $C$: $C = \widehat{U}D\widehat{V}^T$.
(6) Form $U = Q_c\widehat{U}$ and $V = Q_r\widehat{V}$.

Streaming algorithms

We say that an algorithm for processing a matrix is a streaming algorithm if both (1) and (2) are satisfied:

(1) each entry of the matrix is accessed only once;

(2) the entries can be fed in any order. (In other words, the algorithm is not allowed to dictate the order in which the elements are viewed.)

Complexity: an $O(mn\log k)$ algorithm

Replace the Gaussian random matrix $G$ with a different random matrix $\Omega$ such that:

(1) $\Omega$ is sufficiently structured that $A\Omega$ can be evaluated in $O(mn\log k)$ flops;

(2) $\Omega$ is sufficiently random that the columns of $A\Omega$ accurately span the range of $A$.

Candidate: "Fast Johnson-Lindenstrauss Transform"

(1) Complex case. A good candidate for $\Omega$ is

$$\Omega = DFS,$$

where $D$ is diagonal with entries $D_{ii}$ that are complex numbers of modulus one drawn from a uniform distribution on the unit circle in $\mathbb{C}$; $F$ is the discrete Fourier transform,

$$F_{pq} = \frac{1}{\sqrt{n}}\, e^{-2\pi i(p-1)(q-1)/n}, \qquad p,q = 1,\dots,n;$$

and $S$ is a matrix consisting of a random subset of $l$ columns from the $n\times n$ identity matrix (drawn without replacement). In other words, given an arbitrary matrix $X$ of size $m\times n$, the matrix $XS$ consists of a randomly drawn subset of $l$ columns of $X$.

(2) Real case: $\Omega = DHS$, where $H$ is the Hadamard transform and $D$ is diagonal with $D_{ii} = \pm 1$ with equal probability.
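A Matlab sketch of applying $\Omega = DFS$ to $A$ without ever forming $\Omega$ (assumes $A$ and the sample count $l$ are given; the row-wise FFT supplies the $O(mn\log n)$ cost):

[m, n] = size(A);
d = exp(2i*pi*rand(n,1));        % D: random unimodular diagonal entries
idx = randperm(n, l);            % S: l column indices, without replacement
Z = A * diag(d);                 % A*D (scale the columns of A)
Z = fft(Z, [], 2) / sqrt(n);     % (A*D)*F via a row-wise FFT
Y = Z(:, idx);                   % (A*D*F)*S: keep l random columns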


1.5 Accuracy-enhanced algorithms

Algorithm 4: Accuracy-enhanced randomized SVD

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, over-sampling parameter $p$, a small integer $q$ (the number of steps of power iteration).

Output: $U \in \mathbb{R}^{m\times(k+p)}$, $D \in \mathbb{R}^{(k+p)\times(k+p)}$, and $V \in \mathbb{R}^{n\times(k+p)}$.

G = randn(n, k+p)
Y = A*G
for j = 1:q
    Z = A'*Y
    Y = A*Z
end
Q = orth(Y)   (for example, [Q,~] = qr(Y, 0))
B = Q'*A
[Uhat, D, V] = svd(B, 'econ')
U = Q*Uhat

Theorem 2

Let $A$ be an $m\times n$ matrix with singular values $\{\sigma_j\}_{j=1}^{\min(m,n)}$. Let $k$ be a target rank and let $p$ be an over-sampling parameter such that $p \ge 2$ and $k+p \le \min(m,n)$. Let $G$ be a Gaussian random matrix of size $n\times(k+p)$, let $q$ be a small integer, set

$$Y = (AA^T)^qAG,$$

and let $Q = \mathrm{orth}(Y)$. Then

$$\mathbb{E}\,\|A - QQ^TA\|_2 \le \left[\left(1 + \sqrt{\frac{k}{p-1}}\right)\sigma_{k+1}^{2q+1} + \frac{e\sqrt{k+p}}{p}\left(\sum_{j=k+1}^{\min(m,n)} \sigma_j^{2(2q+1)}\right)^{1/2}\right]^{\frac{1}{2q+1}}$$

$$\le \left(1 + \sqrt{\frac{k}{p-1}} + \frac{e\sqrt{k+p}}{p}\sqrt{\min(m,n) - k}\right)^{\frac{1}{2q+1}}\sigma_{k+1}.$$

Algorithm 5: Accuracy-enhanced randomized SVD with orthonormalization

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, over-sampling parameter $p$, a small integer $q$ (the number of steps of power iteration).

Output: $U \in \mathbb{R}^{m\times(k+p)}$, $D \in \mathbb{R}^{(k+p)\times(k+p)}$, and $V \in \mathbb{R}^{n\times(k+p)}$.

G = randn(n, k+p)
Q = orth(A*G)
for j = 1:q
    W = orth(A'*Q)
    Q = orth(A*W)
end
B = Q'*A
[Uhat, D, V] = svd(B, 'econ')
U = Q*Uhat

1.6 The Nyström method for SPD matrices

Algorithm 6: Randomized EVD for an SPD matrix via the Nyström method

Input: SPD $A \in \mathbb{R}^{n\times n}$, a target rank $k$, over-sampling parameter $p$.

Output: $U \in \mathbb{R}^{n\times(k+p)}$ and $\Lambda \in \mathbb{R}^{(k+p)\times(k+p)}$.

G = randn(n, k+p)
Q = orth(A*G)
B1 = A*Q
B2 = Q'*B1
C = chol(B2)
F = B1/C   (i.e., F = B1*inv(C))
[U, Sigma, ~] = svd(F, 'econ')
Lambda = Sigma.^2
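A usage sketch on a synthetic SPD matrix (the sizes are ours); the method returns $A \approx U\Lambda U^T$ since $FF^T = B_1B_2^{-1}B_1^T$:

n = 300; k = 10; p = 5;
M = randn(n, k); A = M*M' + 1e-10*eye(n);  % SPD test matrix
G = randn(n, k+p);
[Q, ~] = qr(A*G, 0);
B1 = A*Q; B2 = Q'*B1;
C = chol((B2 + B2')/2);            % symmetrize before Cholesky
F = B1 / C;                        % triangular solve for F = B1*inv(C)
[U, Sigma, ~] = svd(F, 'econ');
Lambda = Sigma.^2;                 % eigenvalue estimates
norm(A - U*Lambda*U') / norm(A)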


2. Interpolative decomposition (ID)

Any matrix $A$ of size $m\times n$ and rank $k$, where $k < \min(m,n)$, admits a so-called "column ID", which takes the form

$$A = CZ, \qquad C \in \mathbb{R}^{m\times k}, \quad Z \in \mathbb{R}^{k\times n},$$

where $C = A(:,J_s)$ and $Z$ is a matrix that contains the $k\times k$ identity matrix ($Z(:,J_s) = I_k$).

Advantages compared to QR or SVD:

(1) If $A$ is sparse or non-negative, then $C$ shares these properties.

(2) The ID requires less memory to store than either the QR or SVD.

(3) Finding the indices associated with the spanning columns is often helpful in data interpretation.

(4) In the context of numerical algorithms for discretizing PDEs and integral equations, the ID often preserves "the physics" of a problem in a way that the QR or SVD do not.

"Row ID":

$$A = XR, \qquad X \in \mathbb{R}^{m\times k}, \quad R \in \mathbb{R}^{k\times n},$$

where $R = A(I_s,:)$ and $X$ is a matrix that contains the $k\times k$ identity matrix ($X(I_s,:) = I_k$).

"Double-sided ID":

$$A = XA_sZ, \qquad X \in \mathbb{R}^{m\times k}, \quad A_s \in \mathbb{R}^{k\times k}, \quad Z \in \mathbb{R}^{k\times n},$$

where $X$ and $Z$ are the same matrices as those that appear in the row ID and column ID, and $A_s$ is the $k\times k$ submatrix of $A$ given by

$$A_s = A(I_s, J_s).$$

2.1 Deterministic techniques for the approximate ID

The standard software used to compute the classical column pivoted QR factorization (CPQR),

$$AP = QS,$$

can, with some light post-processing, be used to compute the column ID. Let

$$Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & S_{12} \\ 0 & S_{22} \end{bmatrix}.$$

Then

$$AP = \begin{bmatrix} Q_1S_{11} & Q_1S_{12} + Q_2S_{22} \end{bmatrix}.$$

Let $J$ denote the permutation vector associated with the permutation matrix $P$, so that $P = I(:,J)$ and $AP = A(:,J)$. Let $J_s = J(1\!:\!k)$. Then set

$$C = A(:,J_s) = Q_1S_{11}.$$

By

$$AP = Q_1S_{11}\begin{bmatrix} I & S_{11}^{-1}S_{12} \end{bmatrix} + \begin{bmatrix} 0 & Q_2S_{22} \end{bmatrix},$$

we have

$$A = C\begin{bmatrix} I & S_{11}^{-1}S_{12} \end{bmatrix}P^T + \begin{bmatrix} 0 & Q_2S_{22} \end{bmatrix}P^T = CZ + \begin{bmatrix} 0 & Q_2S_{22} \end{bmatrix}P^T.$$

Algorithm 7: Column ID, $A \approx A(:,J_s)Z$

function [Js, Z] = ID_col(A, k)
    n = size(A, 2);
    [~, S, J] = qr(A, 0);                  % CPQR with permutation vector J
    T = S(1:k, 1:k) \ S(1:k, (k+1):n);
    Z = zeros(k, n);
    Z(:, J) = [eye(k), T];
    Js = J(1:k);
end
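A usage sketch (the test matrix and sizes are ours):

m = 100; n = 80; k = 15;
A = randn(m, k) * randn(k, n);     % exact rank k
[Js, Z] = ID_col(A, k);
norm(A - A(:, Js) * Z, 'fro')      % ~ machine precision here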


The row ID can be computed via an entirely analogous process that starts with a CPQR of the transpose of $A$. In other words, we execute a pivoted Gram-Schmidt orthonormalization process on the rows of $A$.

Algorithm 8: Row ID, $A \approx XA(I_s,:)$

function [Is, X] = ID_row(A, k)
    m = size(A, 1);
    [~, S, J] = qr(A', 0);                 % CPQR of the transpose
    T = S(1:k, 1:k) \ S(1:k, (k+1):m);
    X = zeros(m, k);
    X(J, :) = [eye(k), T]';
    Is = J(1:k);
end

To obtain the double-sided ID, we start by using the CPQR-based process to build the column ID. Then compute the row ID by performing Gram-Schmidt on the rows of the tall thin matrix $C$.

Algorithm 9: Double-sided ID, $A \approx XA(I_s,J_s)Z$

function [Is, Js, X, Z] = ID_double(A, k)
    [Js, Z] = ID_col(A, k);
    [Is, X] = ID_row(A(:, Js), k);
end

These algorithms for computing interpolatory decompositions are wasteful when $k \ll \min(m,n)$, since they involve a full QR factorization, which has complexity $O(mn\min(m,n))$. This problem is very easily remedied by replacing the full QR factorization with a partial QR factorization (the QR factorization is interrupted after $k$ steps), which has cost $O(mnk)$.

2.2 Randomized techniques for the approximate ID

Suppose $A \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(A) = k$. Let

$$A = YF, \qquad Y \in \mathbb{R}^{m\times k}, \quad F \in \mathbb{R}^{k\times n}.$$

Then, using [Is, X] = ID_row(Y, k) to compute a row ID of $Y$, we have

$$Y = XY(I_s,:)$$

and

$$XA(I_s,:) = XY(I_s,:)F = YF = A.$$

That is, $XA(I_s,:)$ is automatically a row ID of $A$ as well.

Observation: in order to compute a row ID of a matrix $A$, the only information needed is a matrix $Y$ whose columns span the column space of $A$.

As we have seen, the task of finding a matrix $Y$ whose columns form a good basis for the column space of a matrix is ideally suited to randomized sampling.

Algorithm 10: Accuracy-enhanced randomized row ID

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, over-sampling parameter $p$, a small integer $q$ (the number of steps of power iteration).

Output: $X \in \mathbb{R}^{m\times k}$ and an index vector $I_s \in \mathbb{N}^k$ such that $A \approx XA(I_s,:)$.

G = randn(n, k+p)
Y = A*G
for j = 1:q
    Z = A'*Y
    Y = A*Z
end
[Is, X] = ID_row(Y, k)

We can reduce the complexity by using a structured random matrix instead of a Gaussian, for example $\Omega = DFS$ or $\Omega = DHS$.

3. The CUR decomposition

The CUR factorization approximates an $m\times n$ matrix $A$ as a product

$$A \approx CUR, \qquad C \in \mathbb{R}^{m\times k}, \quad U \in \mathbb{R}^{k\times k}, \quad R \in \mathbb{R}^{k\times n},$$

where $C$ consists of a subset of the columns of $A$ and $R$ consists of a subset of the rows of $A$.

From double-sided ID to CUR: let

$$C = A(:,J_s), \qquad R = A(I_s,:).$$

By

$$A \approx CZ \approx CUR,$$

we set

$$U = ZR^{\dagger}.$$

Algorithm 11: Accuracy-enhanced randomized CUR

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, over-sampling parameter $p$, a small integer $q$ (the number of steps of power iteration).

Output: $U \in \mathbb{R}^{k\times k}$ and index vectors $I_s, J_s \in \mathbb{N}^k$ such that $A \approx A(:,J_s)\,U\,A(I_s,:)$.

G = randn(k+p, m)
Y = G*A
for j = 1:q
    Z = Y*A'
    Y = Z*A
end
[Js, Z] = ID_col(Y, k)
[Is, ~] = ID_row(A(:, Js), k)
Solve U*A(Is,:) = Z in the least-squares sense.
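Assembling the factors in Matlab (a usage sketch with ID_col and ID_row as above; the least-squares solve uses mrdivide):

m = 120; n = 90; k = 10; p = 5; q = 1;
A = randn(m, k) * randn(k, n);
G = randn(k+p, m); Y = G*A;
for j = 1:q, Z = Y*A'; Y = Z*A; end
[Js, Z] = ID_col(Y, k);
[Is, ~] = ID_row(A(:, Js), k);
U = Z / A(Is, :);                          % least-squares: U*A(Is,:) = Z
norm(A - A(:, Js) * U * A(Is, :), 'fro')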


4. Adaptive rank determination

Given a matrix $A$ and a tolerance $\varepsilon$, the task is to determine matrices $Q_k$ and $B_k$ of rank $\le k$ such that $\|A - Q_kB_k\| \le \varepsilon$.

4.1 Algorithm with updating the matrix

$Q_0 = [\,]$; $B_0 = [\,]$; $A_0 = A$; $k = 0$
while $\|A_k\|_2 > \varepsilon$:
    $k = k + 1$
    pick $y \in \mathrm{range}(A_{k-1})$
    $q_k = y/\|y\|_2$
    $b_k = q_k^TA_{k-1}$
    $Q_k = [Q_{k-1},\ q_k]$
    $B_k = [B_{k-1};\ b_k]$
    $A_k = A_{k-1} - q_kb_k$
end
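A runnable Matlab sketch of this loop with the randomized choice $y = A_{k-1}g$ (strategy (3) on the next slide); overwriting $A$ plays the role of $A_k$:

function [Q, B] = adaptive_basis(A, tol)
    [m, n] = size(A);
    Q = zeros(m, 0); B = zeros(0, n);
    while norm(A) > tol          % spectral norm of the residual
        y = A * randn(n, 1);     % random vector in range(A)
        q = y / norm(y);
        b = q' * A;
        Q = [Q, q]; B = [B; b];
        A = A - q * b;           % deflate the residual
    end
end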


Strategies for choosing $y \in \mathrm{range}(A_{k-1})$:

(1) Pick the largest remaining column:

$$y = A_{k-1}(:,j_k) \quad \text{with} \quad j_k = \mathrm{argmax}_j \|A_{k-1}(:,j)\|_2.$$

(2) Pick the locally optimal vector (computationally hard):

$$y = \mathrm{argmin}_{\|x\|_2=1,\ x \in \mathrm{range}(A_{k-1})} \|(I - xx^T)A_{k-1}\|_2.$$

(3) A randomized selection strategy: set

$$y = A_{k-1}g,$$

where $g \in \mathbb{R}^n$ is a Gaussian random vector.

Exercise: Prove that for each $k$, $Q_k$ is orthonormal and

$$B_k = Q_k^TA, \qquad A = Q_kB_k + A_k.$$

Exercise: Prove that for each $k$,

$$\|A\|_F^2 = \|B_k\|_F^2 + \|A_k\|_F^2.$$

Exercise: Prove that for each $k$, with strategy (2), i.e.,

$$y = \mathrm{argmin}_{\|x\|_2=1,\ x \in \mathrm{range}(A_{k-1})} \|(I - xx^T)A_{k-1}\|_2,$$

the algorithm will produce matrices that attain the theoretically optimal precision, i.e.,

$$\|A - Q_kB_k\|_2 = \sigma_{k+1}.$$

4.2 Block version for the randomized selection strategy

$Q = [\,]$; $B = [\,]$
while $\|A\|_2 > \varepsilon$:
    G = randn(n, p)
    [Qnew, ~] = qr(A*G, 0)
    Bnew = Qnew'*A
    Q = [Q, Qnew]
    B = [B; Bnew]
    A = A - Qnew*Bnew
end

Exercise: Show that the block version's output is an orthonormal matrix $Q$ of size $m\times k$ (where $k$ is a multiple of $p$) and a $k\times n$ matrix $B$ such that

$$\|A - QB\|_2 \le \varepsilon.$$

4.3 Algorithm without updating the matrix

Draw standard Gaussian vectors $g_1,\dots,g_r$ of length $n$.
$y_i = Ag_i$, $i = 1,\dots,r$; $j = 0$; $Q_0 = [\,]$
while $\max\{\|y_{j+1}\|_2, \|y_{j+2}\|_2, \dots, \|y_{j+r}\|_2\} > \varepsilon/(10\sqrt{2/\pi})$:
    $j = j + 1$
    $y_j = y_j - Q_{j-1}Q_{j-1}^Ty_j$
    $q_j = y_j/\|y_j\|_2$
    $Q_j = [Q_{j-1},\ q_j]$
    Draw a standard Gaussian vector $g_{j+r}$ of length $n$
    $y_{j+r} = (I - Q_jQ_j^T)Ag_{j+r}$
    $y_i = y_i - q_jq_j^Ty_i$, $i = (j+1),\dots,(j+r-1)$
end
$Q = Q_j$

The resulting orthonormal matrix $Q$ satisfies $\|A - QQ^TA\|_2 \le \varepsilon$ with probability at least $1 - \min\{m,n\}\,10^{-r}$.

5. Rank-revealing QR factorization

Column pivoted QR factorization (CPQR) of $A \in \mathbb{R}^{m\times n}$:

$$AP = QR,$$

where $P$ is a permutation matrix, $Q$ is an orthogonal matrix, and $R$ is upper triangular.

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of $n-1$ rank-1 updates (so-called BLAS2 operations).

Householder column pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, when the matrix is stored in distributed memory or on a hard drive, etc. The reason is that the process is very communication-intensive. The remedy is to block the process.

Blocked Householder column pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm:

G = randn(b+p, m); Y = G*A; [~, ~, P1] = qr(Y, 0)
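The idea behind the one-liner (our reading; $b$ is the block size and $p$ the over-sampling): the pivots chosen by a CPQR of the small sample $Y$ serve as a whole block of $b$ pivot columns of $A$ at once. A sketch:

b = 10; p = 5;
[m, n] = size(A);
G = randn(b+p, m);
Y = G * A;                 % (b+p) x n sample of the row space of A
[~, ~, J] = qr(Y, 0);      % CPQR of the sample; J is a permutation vector
pivots = J(1:b);           % use these b columns as the next block of pivots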


6. Rank-revealing UTV decomposition

Given an $m\times n$ matrix $A$ with $m \ge n$, a UTV decomposition of $A$ is a factorization of the form

$$A = UTV^T,$$

where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are unitary matrices and $T$ is a triangular matrix (either lower or upper triangular).

The UTV decomposition can be viewed as a generalization of other standard factorizations such as, e.g., the Singular Value Decomposition (SVD) or the Column Pivoted QR decomposition (CPQR). (To be precise, the SVD is the special case where $T$ is diagonal, and the CPQR is the special case where $V$ is a permutation matrix.) The additional flexibility inherent in the UTV decomposition enables the design of efficient updating procedures.

randUTV provides close to optimal low-rank approximation and highly accurate estimates for the singular values of a matrix.

The algorithm randUTV builds the factorization incrementally, which means that when it is applied to a matrix of numerical rank $k$, the algorithm can be stopped early and incur an overall cost of $O(mnk)$.

Like HQRRP, the algorithm randUTV is blocked, which enables it to execute fast on modern hardware.

The algorithm randUTV is not an iterative algorithm. In this regard it is closer to the CPQR than standard SVD algorithms, which substantially simplifies software optimization.

Blocked UTV factorization process

6.1 A single-step block factorization from $A_0$ to $A_1$

Draw an $m\times p$ Gaussian matrix $G$ and form the sample matrix

$$Y = (A^TA)^qA^TG,$$

where $q$ is a parameter indicating the number of steps of power iteration taken.

Form a unitary matrix $V$ whose first $p$ columns form an orthonormal basis for the column space of $Y$; for example,

[V, ~] = qr(Y)

We then compute an SVD of the matrix $AV(:,1\!:\!p)$:

$$AV(:,1\!:\!p) = U_1A_{11}W^T,$$

and set

$$V_1 = V\begin{bmatrix} W & 0 \\ 0 & I \end{bmatrix}.$$

One step of randUTV:

function [U, T, V] = stepUTV(A, p, q)
    G = randn(size(A, 1), p);
    Y = A' * G;
    for i = 1:q
        Y = A' * (A * Y);
    end
    [V, ~] = qr(Y);
    [U, D, W] = svd(A * V(:, 1:p));
    T = [D, U' * A * V(:, (p+1):end)];
    V(:, 1:p) = V(:, 1:p) * W;
end

randUTV then applies the same procedure to the remaining block, and so on.

function [U, T, V] = randUTV(A, p, q)
    T = A;
    U = eye(size(A, 1)); V = eye(size(A, 2));
    for i = 1:ceil(size(A, 2)/p)
        I1 = 1:(p*(i-1));
        I2 = (p*(i-1)+1):size(A, 1);
        J2 = (p*(i-1)+1):size(A, 2);
        if length(J2) > p
            [Uhat, That, Vhat] = stepUTV(T(I2, J2), p, q);
        else
            [Uhat, That, Vhat] = svd(T(I2, J2));
        end
        U(:, I2) = U(:, I2) * Uhat;
        V(:, J2) = V(:, J2) * Vhat;
        T(I2, J2) = That;
        T(I1, J2) = T(I1, J2) * Vhat;
    end
end


43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

12 Advantages of randomized methods

Given an mtimes n matrix A the cost of computing a rank-kapproximant using classical methods is O(mnk) Randomizedalgorithms can attain complexity O(mn log k + k2(m+ n))

Algorithms for performing principal component analysis (PCA) oflarge data sets have been greatly accelerated in particular whenthe data is stored out-of-core

Randomized methods tend to require less communication thantraditional methods and can be efficiently implemented onseverely communication constrained environments such as GPUsand distributed computing platforms

Randomized algorithms have enabled the development ofsingle-pass matrix factorization algorithms in which the matrix isldquostreamedrdquo and never stored

Low-Rank Approximations Lecture 4 March 6 - 13 2020 4 40

13 A two-stage approach for an approximate rank-k SVD

Stage Amdashfind an approximate range Construct an mtimes k matrixQ with orthonormal columns such that

A asymp QQTA

(In other words the columns of Q form an approximate basis forthe column space of A) This step will be executed via arandomized process

Stage Bmdashform a specific factorization Given the matrix Qcomputed in Stage A form the factors UDV using classicaldeterministic techniques For instance this stage can be executedvia the following steps

(1) Form the k times n matrix B = QTA

(2) Compute an SVD of the (small) matrix B B = 983141UDVT

(3) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 5 40

A randomized algorithm for ldquoStage Ardquo

(1) Draw a Gaussian random matrix G and form AG

(2) Construct orthonormal matrix Q st

range(Q) = range(AG)

Motivation

(i) The special case rank(A) = k Let G isin Rntimesk One can prove

range(Q) = range(A)

holds with probability 1

(ii) Practical cases rank(A) ≫ k Simply take a few extra samplesIt turns out that if we take say k + 10 samples instead of k thenthe process will with probability almost 1 produce Qk that iscomparable to the best possible basis Uk where Uk are thematrix consisting of the k leading left singular vectors

Low-Rank Approximations Lecture 4 March 6 - 13 2020 6 40

Notation (orthonormalization): Given an m × l matrix X with rank(X) = l and m ≥ l, we introduce the function

Q = orth(X)

to denote orthonormalization of the columns of X. In other words, Q will be an m × l orthonormal matrix whose columns form a basis for the column space of X, i.e.,

QᵀQ = I and range(Q) = range(X).

(1) Via QR factorization: X = QR. In Matlab: [Q, ~] = qr(X, 0).

(2) Via SVD factorization: X = QDVᵀ, keeping the factor Q.
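A minimal Matlab sketch of such a helper (the name orth_cols is ours; Matlab's built-in orth takes the SVD route):

function Q = orth_cols(X)
% Orthonormalize the columns of X via economy-size QR.
% Assumes X has full column rank, so that range(Q) = range(X).
[Q, ~] = qr(X, 0);
end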


Algorithm 1: Randomized SVD

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p
Output: U ∈ R^{m×(k+p)}, D ∈ R^{(k+p)×(k+p)}, and V ∈ R^{n×(k+p)}

Stage A:
(1) Form an n × (k+p) Gaussian random matrix G.
(2) Form the sample matrix Y = AG.
(3) Orthonormalize the columns of the sample matrix: Q = orth(Y).

Stage B:
(4) Form the (k+p) × n matrix B = QᵀA.
(5) Form the SVD of the small matrix B: B = ÛDVᵀ.
(6) Form U = QÛ.

Randomized SVD accesses the matrix A twice.
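As a runnable Matlab sketch (the function name rsvd is ours, not a library routine):

function [U, D, V] = rsvd(A, k, p)
% Randomized SVD: A ≈ U*D*V' with k+p columns.
n = size(A, 2);
G = randn(n, k + p);        % Gaussian test matrix
Y = A * G;                  % sample the range of A   (first pass over A)
[Q, ~] = qr(Y, 0);          % Q = orth(Y)
B = Q' * A;                 % project down            (second pass over A)
[Uhat, D, V] = svd(B, 'econ');
U = Q * Uhat;
end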


Theorem 1

Let A be an m × n matrix with singular values {σ_j}_{j=1}^{min(m,n)}. Let k be a target rank, and let p be an over-sampling parameter such that p ≥ 2 and k + p ≤ min(m,n). Let G be a Gaussian random matrix of size n × (k+p), and set Q = orth(AG). Then the average error, as measured in the Frobenius norm, satisfies

E[‖A − QQᵀA‖_F] ≤ (1 + k/(p−1))^{1/2} ( Σ_{j=k+1}^{min(m,n)} σ_j^2 )^{1/2},

where E refers to expectation with respect to the draw of G. The corresponding result for the spectral norm reads

E[‖A − QQᵀA‖_2] ≤ (1 + √(k/(p−1))) σ_{k+1} + (e√(k+p)/p) ( Σ_{j=k+1}^{min(m,n)} σ_j^2 )^{1/2}.


Bounds on the likelihood of large deviations

The likelihood of a large deviation from the mean depends only on the over-sampling parameter p, and decays extraordinarily fast. For instance, one can prove that if p ≥ 4, then

‖A − QQᵀA‖_2 ≤ (1 + 17√(1 + k/p)) σ_{k+1} + (8√(k+p)/(p+1)) ( Σ_{j=k+1}^{min(m,n)} σ_j^2 )^{1/2}

with failure probability at most 3e^{−p}.


1.4 Single-pass algorithms

Algorithm 2: Single-pass randomized EVD for a symmetric matrix

Input: symmetric A ∈ R^{n×n}, a target rank k, an over-sampling parameter p
Output: U ∈ R^{n×k} and D ∈ R^{k×k} such that A ≈ UDUᵀ

Stage A:
(1) Form an n × (k+p) Gaussian random matrix G.
(2) Form the sample matrix Y = AG.
(3) Let Q denote the orthonormal matrix formed by the k dominant left singular vectors of Y.

Stage B:
(4) Let C = argmin_{X ∈ R^{k×k}, X = Xᵀ} ‖XQᵀG − QᵀY‖_F.
(5) Form the EVD of the small matrix C: C = ÛDÛᵀ.
(6) Form U = QÛ.
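Stage B uses only G and Y, so A is touched exactly once. A minimal sketch of Stage B, assuming we solve the unconstrained least squares problem and then symmetrize (a common simplification, not the exact constrained minimizer of step (4)):

QtG = Q' * G;                 % k x (k+p)
QtY = Q' * Y;                 % k x (k+p)
C = QtY * pinv(QtG);          % least squares solve of C*(Q'*G) ≈ Q'*Y
C = (C + C') / 2;             % enforce the symmetry constraint a posteriori
[Uhat, D] = eig(C);
U = Q * Uhat;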


Algorithm 3: Single-pass randomized SVD for a general matrix

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p
Output: U ∈ R^{m×k}, D ∈ R^{k×k}, and V ∈ R^{n×k} such that A ≈ UDVᵀ

Stage A:
(1) Form two Gaussian random matrices Gc ∈ R^{n×(k+p)} and Gr ∈ R^{m×(k+p)}.
(2) Form the sample matrices Yc = AGc and Yr = AᵀGr.
(3) Form orthonormal matrices Qc and Qr consisting of the k dominant left singular vectors of Yc and Yr.

Stage B:
(4) Let C = argmin_{X ∈ R^{k×k}} ‖GrᵀQcX − YrᵀQr‖_F^2 + ‖XQrᵀGc − QcᵀYc‖_F^2.
(5) Form the SVD of the small matrix C: C = ÛDV̂ᵀ.
(6) Form U = QcÛ and V = QrV̂.


Streaming algorithms

We say that an algorithm for processing a matrix is a streaming algorithm if both (1) and (2) are satisfied:

(1) each entry of the matrix is accessed only once;

(2) the entries can be fed in any order. (In other words, the algorithm is not allowed to dictate the order in which the elements are viewed.)

Complexity: an O(mn log k) algorithm

Replace the Gaussian random matrix G with a different random matrix Ω such that:

(1) Ω is sufficiently structured that AΩ can be evaluated in O(mn log k) flops;

(2) Ω is sufficiently random that the columns of AΩ accurately span the range of A.


Candidate: the "Fast Johnson-Lindenstrauss Transform"

(1) Complex case. A good candidate for Ω is

Ω = DFS,

where D is diagonal with entries D_{ii} that are complex numbers of modulus one, drawn from a uniform distribution on the unit circle in C; F is the discrete Fourier transform,

F_{pq} = (1/√n) e^{−2πi(p−1)(q−1)/n}, p, q = 1, …, n;

and S is a matrix consisting of a random subset of l columns from the n × n unit matrix (drawn without replacement). In other words, given an arbitrary matrix X of size m × n, the matrix XS consists of a randomly drawn subset of l columns of X.

(2) Real case: Ω = DHS, where H is the Hadamard transform, D is diagonal, and D_{ii} = ±1 with equal probability.
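A sketch of applying Ω = DFS to A in the complex case without forming Ω explicitly (l is illustrative; Matlab's fft matches the sign and, after division by √n, the scaling of F above):

[m, n] = size(A);
l = 50;                                 % number of samples drawn
d = exp(2i * pi * rand(1, n));          % diagonal of D: unimodular entries
idx = randperm(n, l);                   % S: l columns without replacement
Y = fft(A .* d, [], 2) / sqrt(n);       % (A*D)*F via row-wise FFTs
Y = Y(:, idx);                          % ((A*D)*F)*S: keep the sampled columns

A full FFT costs O(mn log n); a subsampled FFT that computes only the l needed columns attains the advertised O(mn log l).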


1.5 Accuracy enhanced algorithms

Algorithm 4: Accuracy enhanced randomized SVD

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, a small q (the number of steps in the power iteration)
Output: U ∈ R^{m×(k+p)}, D ∈ R^{(k+p)×(k+p)}, and V ∈ R^{n×(k+p)}

G = randn(n, k+p)
Y = AG
for j = 1 : q
    Z = AᵀY
    Y = AZ
end
Q = orth(Y) (for example, [Q, ~] = qr(Y, 0))
B = QᵀA
[Û, D, V] = svd(B, 'econ')
U = QÛ
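The loop computes Y = (AAᵀ)^q AG, so the sampled spectrum decays like σ_j^{2q+1} rather than σ_j. An illustrative comparison (the sizes and spectrum are arbitrary choices of ours):

m = 500; n = 400; k = 10; p = 5;
[Ue, ~] = qr(randn(m, n), 0);
[Ve, ~] = qr(randn(n), 0);
s = 0.9 .^ (1:n);                        % slowly decaying singular values
A = Ue * diag(s) * Ve';
for q = 0:2
    Y = A * randn(n, k + p);
    for j = 1:q, Y = A * (A' * Y); end
    [Q, ~] = qr(Y, 0);
    err(q + 1) = norm(A - Q * (Q' * A)); % approaches s(k+1) as q grows
end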


Theorem 2

Let A be an m × n matrix with singular values {σ_j}_{j=1}^{min(m,n)}. Let k be a target rank, and let p be an over-sampling parameter such that p ≥ 2 and k + p ≤ min(m,n). Let G be a Gaussian random matrix of size n × (k+p), let q be a small integer, set

Y = (AAᵀ)^q AG,

and let Q = orth(Y). Then

E[‖A − QQᵀA‖_2] ≤ [ (1 + √(k/(p−1))) σ_{k+1}^{2q+1} + (e√(k+p)/p) ( Σ_{j=k+1}^{min(m,n)} σ_j^{2(2q+1)} )^{1/2} ]^{1/(2q+1)}

≤ ( 1 + √(k/(p−1)) + (e√(k+p)/p) √(min(m,n) − k) )^{1/(2q+1)} σ_{k+1}.


Algorithm 5: Accuracy enhanced randomized SVD with orthonormalization

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, a small q (the number of steps in the power iteration)
Output: U ∈ R^{m×(k+p)}, D ∈ R^{(k+p)×(k+p)}, and V ∈ R^{n×(k+p)}

G = randn(n, k+p)
Q = orth(AG)
for j = 1 : q
    W = orth(AᵀQ)
    Q = orth(AW)
end
B = QᵀA
[Û, D, V] = svd(B, 'econ')
U = QÛ

(The intermediate orthonormalizations change nothing in exact arithmetic; they combat round-off, which otherwise destroys the information carried by singular modes with σ_j ≪ σ_1 when (AAᵀ)^q AG is formed directly.)


1.6 The Nystrom method for SPD matrices

Algorithm 6: Randomized EVD for an SPD matrix via the Nystrom method

Input: SPD A ∈ R^{n×n}, a target rank k, an over-sampling parameter p
Output: U ∈ R^{n×(k+p)} and Λ ∈ R^{(k+p)×(k+p)} such that A ≈ UΛUᵀ

G = randn(n, k+p)
Q = orth(AG)
B1 = AQ
B2 = QᵀB1
C = chol(B2)
F = B1C⁻¹
[U, Σ, ~] = svd(F, 'econ')
Λ = Σ²
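In Matlab, F = B1C⁻¹ is a triangular solve. A sketch (nystrom_evd is our name; symmetrizing before chol is a numerical safeguard):

function [U, Lambda] = nystrom_evd(A, k, p)
% Nystrom-based randomized EVD of an SPD matrix: A ≈ U*Lambda*U'.
n = size(A, 1);
G = randn(n, k + p);
[Q, ~] = qr(A * G, 0);
B1 = A * Q;
B2 = Q' * B1;
C = chol((B2 + B2') / 2);     % B2 = C'*C with C upper triangular
F = B1 / C;                   % F = B1*inv(C) via a triangular solve
[U, S, ~] = svd(F, 'econ');
Lambda = S .^ 2;              % eigenvalue estimates
end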


2 Interpolative decomposition (ID)

Any matrix A of size m × n and rank k, where k < min(m,n), admits a so-called "column ID", which takes the form

A = CZ, C ∈ R^{m×k}, Z ∈ R^{k×n},

where the matrix C = A(:, Js), and Z is a matrix that contains the k × k identity matrix (Z(:, Js) = I_k).

Advantages compared to QR or SVD:

(1) If A is sparse or non-negative, then C shares these properties.

(2) The ID requires less memory to store than either the QR or SVD.

(3) Finding the indices associated with the spanning columns is often helpful in data interpretation.

(4) In the context of numerical algorithms for discretizing PDEs and integral equations, the ID often preserves "the physics" of a problem in a way that the QR or SVD do not.


"Row ID":

A = XR, X ∈ R^{m×k}, R ∈ R^{k×n},

where R = A(Is, :), and X is a matrix that contains the k × k identity matrix (X(Is, :) = I_k).

"Double-sided ID":

A = X As Z, X ∈ R^{m×k}, As ∈ R^{k×k}, Z ∈ R^{k×n},

where X and Z are the same matrices as those that appear in the row ID and the column ID, and As is the k × k submatrix of A given by

As = A(Is, Js).


2.1 Deterministic techniques for the approximate ID

The standard software used to compute the classical column pivoted QR factorization (CPQR)

AP = QS

can, with some light post-processing, be used to compute the column ID. Let

Q = [Q1  Q2],  S = [S11  S12; 0  S22].

Then

AP = [Q1S11,  Q1S12 + Q2S22].

Let J denote the permutation vector associated with the permutation matrix P, so that P = I(:, J) and AP = A(:, J). Let Js = J(1:k). Then set

C = A(:, Js) = Q1S11.


By

AP = Q1S11 [I,  S11⁻¹S12] + [0,  Q2S22],

we have

A = C [I,  S11⁻¹S12] Pᵀ + [0,  Q2S22] Pᵀ = CZ + [0,  Q2S22] Pᵀ.

Algorithm 7: Column ID, A ≈ A(:, Js)Z

function [Js, Z] = ID_col(A, k)
n = size(A, 2);
[~, S, J] = qr(A, 0);
T = S(1:k, 1:k) \ S(1:k, (k+1):n);
Z = zeros(k, n);
Z(:, J) = [eye(k), T];
Js = J(1:k);
end


The row ID can be computed via an entirely analogous process that starts with a CPQR of the transpose of A. In other words, we execute a pivoted Gram-Schmidt orthonormalization process on the rows of A.

Algorithm 8: Row ID, A ≈ XA(Is, :)

function [Is, X] = ID_row(A, k)
m = size(A, 1);
[~, S, J] = qr(A', 0);
T = S(1:k, 1:k) \ S(1:k, (k+1):m);
X = zeros(m, k);
X(J, :) = [eye(k), T]';
Is = J(1:k);
end


To obtain the double-sided ID, we start by using the CPQR-based process to build the column ID. Then compute the row ID by performing Gram-Schmidt on the rows of the tall thin matrix C.

Algorithm 9: Double-sided ID, A ≈ XA(Is, Js)Z

function [Is, Js, X, Z] = ID_double(A, k)
[Js, Z] = ID_col(A, k);
[Is, X] = ID_row(A(:, Js), k);
end
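A quick sanity check of the double-sided ID (illustrative; builds an exactly rank-k matrix and verifies the reconstruction):

m = 200; n = 100; k = 15;
A = randn(m, k) * randn(k, n);             % exactly rank k
[Is, Js, X, Z] = ID_double(A, k);
norm(A - X * A(Is, Js) * Z) / norm(A)      % should be near machine precision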

The algorithms for computing interpolatory decompositions are wasteful when k ≪ min(m,n), since they involve a full QR factorization, which has complexity O(mn min(m,n)). This problem is very easily remedied by replacing the full QR factorization with a partial QR factorization (the QR factorization is interrupted after k steps), which has cost O(mnk).


2.2 Randomized techniques for the approximate ID

Suppose A ∈ R^{m×n} with rank(A) = k. Let

A = YF, Y ∈ R^{m×k}, F ∈ R^{k×n}.

Then, using [Is, X] = ID_row(Y, k) to compute a row ID of Y, we have

Y = XY(Is, :)

and

XA(Is, :) = XY(Is, :)F = YF = A.

That is, XA(Is, :) is automatically a row ID of A as well.

Observation: In order to compute a row ID of a matrix A, the only information needed is a matrix Y whose columns span the column space of A.

As we have seen, the task of finding a matrix Y whose columns form a good basis for the column space of a matrix is ideally suited to randomized sampling.


Algorithm 10: Accuracy enhanced randomized row ID

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, a small q (the number of steps in the power iteration)
Output: X ∈ R^{m×k} and an index vector Is ∈ N^k such that A ≈ XA(Is, :)

G = randn(n, k+p)
Y = AG
for j = 1 : q
    Z = AᵀY
    Y = AZ
end
[Is, X] = ID_row(Y, k)

We can reduce the complexity by using a structured random matrix instead of a Gaussian one, for example Ω = DFS or Ω = DHS.


3 The CUR decomposition

The CUR factorization approximates an m × n matrix A as a product

A ≈ CUR, C ∈ R^{m×k}, U ∈ R^{k×k}, R ∈ R^{k×n},

where C consists of a subset of the columns of A and R consists of a subset of the rows of A.

From double-sided ID to CUR: let

C = A(:, Js), R = A(Is, :).

By

A ≈ CZ ≈ CUR,

we set

U = ZR†.
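A sketch of the conversion, given the double-sided ID above (pinv realizes R†; for large problems a least squares solve is preferable):

[Is, Js, ~, Z] = ID_double(A, k);
C = A(:, Js);
R = A(Is, :);
U = Z * pinv(R);                 % U = Z*R^dagger, so that A ≈ C*U*R
norm(A - C * U * R) / norm(A)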


Algorithm 11: Accuracy enhanced randomized CUR

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, a small q (the number of steps in the power iteration)
Output: U ∈ R^{k×k} and index vectors Is, Js ∈ N^k such that A ≈ A(:, Js)UA(Is, :)

G = randn(k+p, m)
Y = GA
for j = 1 : q
    Z = YAᵀ
    Y = ZA
end
[Js, Z] = ID_col(Y, k)
[Is, ~] = ID_row(A(:, Js), k)
Solve UA(Is, :) = Z in the least squares sense.


4 Adaptive rank determination

Given a matrix A and a tolerance ε, the task is to determine matrices Q_k and B_k of rank ≤ k such that ‖A − Q_kB_k‖ ≤ ε.

4.1 Algorithm with updating the matrix

Q0 = [ ]; B0 = [ ]; A0 = A; k = 0
while ‖A_k‖_2 > ε
    k = k + 1
    pick y ∈ range(A_{k−1})
    q_k = y / ‖y‖_2
    b_k = q_kᵀ A_{k−1}
    Q_k = [Q_{k−1}, q_k]
    B_k = [B_{k−1}; b_k]
    A_k = A_{k−1} − q_k b_k
end


Strategies for choosing y ∈ range(A_{k−1}):

(1) Pick the largest remaining column:

y = A_{k−1}(:, j_k) with j_k = argmax_j ‖A_{k−1}(:, j)‖_2.

(2) Pick the locally optimal vector (computationally hard):

y = argmin_{‖x‖_2 = 1, x ∈ range(A_{k−1})} ‖(I − xxᵀ)A_{k−1}‖_2.

(3) A randomized selection strategy: set

y = A_{k−1} g,

where g ∈ R^n is a Gaussian random vector.


Exercise: Prove that for each k, Q_k is orthonormal and

B_k = Q_kᵀA, A = Q_kB_k + A_k.

Exercise: Prove that for each k,

‖A‖_F^2 = ‖B_k‖_F^2 + ‖A_k‖_F^2.

Exercise: Prove that for each k, with strategy (2), i.e.

y = argmin_{‖x‖_2 = 1, x ∈ range(A_{k−1})} ‖(I − xxᵀ)A_{k−1}‖_2,

the algorithm will produce matrices that attain the theoretically optimal precision, i.e.

‖A − Q_kB_k‖_2 = σ_{k+1}.


4.2 Block version for the randomized selection strategy

Q = [ ]; B = [ ]
while ‖A‖_2 > ε
    G = randn(n, p)
    [Q_new, ~] = qr(AG, 0)
    B_new = Q_newᵀ A
    Q = [Q, Q_new]
    B = [B; B_new]
    A = A − Q_new B_new
end

Exercise: Show that the block version algorithm's output is an orthonormal matrix Q of size m × k (where k is a multiple of p) and a k × n matrix B such that

‖A − QB‖_2 ≤ ε.
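A runnable Matlab sketch (adaptive_range is our name; for large matrices one would estimate ‖A‖_2 rather than compute it exactly):

function [Q, B] = adaptive_range(A, tol, p)
% Build Q, B block by block until ||A - Q*B||_2 <= tol.
n = size(A, 2);
Q = []; B = [];
while norm(A) > tol               % spectral norm of the current residual
    G = randn(n, p);
    [Qnew, ~] = qr(A * G, 0);
    Bnew = Qnew' * A;
    Q = [Q, Qnew];
    B = [B; Bnew];
    A = A - Qnew * Bnew;          % deflate the directions just captured
end
end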


4.3 Algorithm without updating the matrix

Draw standard Gaussian vectors g1, …, gr of length n
y_i = Ag_i, i = 1, …, r; j = 0; Q0 = [ ]
while max{‖y_{j+1}‖_2, ‖y_{j+2}‖_2, …, ‖y_{j+r}‖_2} > ε/(10√(2/π))
    j = j + 1
    y_j = y_j − Q_{j−1}Q_{j−1}ᵀ y_j
    q_j = y_j / ‖y_j‖_2
    Q_j = [Q_{j−1}, q_j]
    Draw a standard Gaussian vector g_{j+r} of length n
    y_{j+r} = (I − Q_jQ_jᵀ)Ag_{j+r}
    y_i = y_i − q_j q_jᵀ y_i, i = (j+1), …, (j+r−1)
end
Q = Q_j

The resulting orthonormal matrix Q satisfies ‖A − QQᵀA‖_2 ≤ ε with probability at least 1 − min{m,n}·10^{−r}.


5 Rank-revealing QR factorization

Column pivoted QR factorization (CPQR) of A ∈ R^{m×n}:

AP = QR,

where P is a permutation matrix, Q is an orthogonal matrix, and R is upper triangular.

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of n − 1 rank-1 updates (so-called BLAS2 operations).


Householder column pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, when the matrix is stored in distributed memory or on a hard drive, etc. The reason is that the process is very communication intensive. We therefore seek to block the process.

Blocked Householder column pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm (to select the next block of pivot columns from a small sketch):

G = randn(b+p, m); Y = GA; [~, ~, P1] = qr(Y, 0)
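One block step under our reading, where the leading b entries of the pivot vector P1 of the sketch become the pivots for the block (b and p are illustrative values):

b = 64; p = 10;
G = randn(b + p, m);            % short, wide Gaussian sketch
Y = G * A;                      % (b+p) x n sample of the row space of A
[~, ~, P1] = qr(Y, 0);          % CPQR of the small matrix Y
piv = P1(1:b);                  % pivot columns to process in this block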


6 Rank-revealing UTV decomposition

Given an m × n matrix A with m ≥ n, a UTV decomposition of A is a factorization of the form

A = UTVᵀ,

where U ∈ R^{m×m} and V ∈ R^{n×n} are unitary matrices, and T is a triangular matrix (either lower or upper triangular).

The UTV decomposition can be viewed as a generalization of other standard factorizations, such as, e.g., the singular value decomposition (SVD) or the column pivoted QR decomposition (CPQR). (To be precise, the SVD is the special case where T is diagonal, and the CPQR is the special case where V is a permutation matrix.) The additional flexibility inherent in the UTV decomposition enables the design of efficient updating procedures.

randUTV provides close to optimal low-rank approximation and highly accurate estimates for the singular values of a matrix.


The algorithm randUTV builds the factorization incrementally, which means that when it is applied to a matrix of numerical rank k, the algorithm can be stopped early and incur an overall cost of O(mnk).

Like HQRRP, the algorithm randUTV is blocked, which enables it to execute fast on modern hardware.

The algorithm randUTV is not an iterative algorithm. In this regard, it is closer to the CPQR than to standard SVD algorithms, which substantially simplifies software optimization.

Blocked UTV factorization process


6.1 A single step block factorization from A0 to A1

Draw an m × p Gaussian matrix G and form the sample matrix

Y = (AᵀA)^q AᵀG,

where q is a parameter indicating the number of steps of power iteration taken.

Form a unitary matrix V whose first p columns form an orthonormal basis for the column space of Y; for example,

[V, ~] = qr(Y).

We then compute the SVD of the matrix AV(:, 1:p),

AV(:, 1:p) = U1 A11 Wᵀ,

and set

V1 = V [W 0; 0 I].


One step of randUTV:

function [U, T, V] = stepUTV(A, p, q)
G = randn(size(A, 1), p);
Y = A' * G;
for i = 1 : q
    Y = A' * (A * Y);
end
[V, ~] = qr(Y);
[U, D, W] = svd(A * V(:, 1:p));
T = [D, U' * A * V(:, (p+1):end)];
V(:, 1:p) = V(:, 1:p) * W;
end

randUTV then applies the same procedure to the remaining block, and so on.


function [U, T, V] = randUTV(A, p, q)
T = A;
U = eye(size(A, 1)); V = eye(size(A, 2));
for i = 1 : ceil(size(A, 2) / p)
    I1 = 1 : (p * (i-1));
    I2 = (p * (i-1) + 1) : size(A, 1);
    J2 = (p * (i-1) + 1) : size(A, 2);
    if length(J2) > p
        [Uhat, That, Vhat] = stepUTV(T(I2, J2), p, q);
    else
        [Uhat, That, Vhat] = svd(T(I2, J2));
    end
    U(:, I2) = U(:, I2) * Uhat;
    V(:, J2) = V(:, J2) * Vhat;
    T(I2, J2) = That;
    T(I1, J2) = T(I1, J2) * Vhat;
end
end
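A quick check that the factorization is exact and that diag(T) tracks the singular values (illustrative sizes; stepUTV as above):

A = randn(300, 200);
[U, T, V] = randUTV(A, 50, 1);
norm(U * T * V' - A) / norm(A)     % near machine precision
semilogy(diag(T), 'o'); hold on;
semilogy(svd(A), '.');             % diag(T) ≈ singular values of A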

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

13 A two-stage approach for an approximate rank-k SVD

Stage Amdashfind an approximate range Construct an mtimes k matrixQ with orthonormal columns such that

A asymp QQTA

(In other words the columns of Q form an approximate basis forthe column space of A) This step will be executed via arandomized process

Stage Bmdashform a specific factorization Given the matrix Qcomputed in Stage A form the factors UDV using classicaldeterministic techniques For instance this stage can be executedvia the following steps

(1) Form the k times n matrix B = QTA

(2) Compute an SVD of the (small) matrix B B = 983141UDVT

(3) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 5 40

A randomized algorithm for ldquoStage Ardquo

(1) Draw a Gaussian random matrix G and form AG

(2) Construct orthonormal matrix Q st

range(Q) = range(AG)

Motivation

(i) The special case rank(A) = k Let G isin Rntimesk One can prove

range(Q) = range(A)

holds with probability 1

(ii) Practical cases rank(A) ≫ k Simply take a few extra samplesIt turns out that if we take say k + 10 samples instead of k thenthe process will with probability almost 1 produce Qk that iscomparable to the best possible basis Uk where Uk are thematrix consisting of the k leading left singular vectors

Low-Rank Approximations Lecture 4 March 6 - 13 2020 6 40

Notation Orthonormalization Given an mtimes l matrix X withrank(X) = l with m ge l we introduce the function

Q = orth(X)

to denote orthonormalization of the columns of X In other wordsQ will be an mtimes l orthonormal matrix whose columns form abasis for the column space of X ie

QTQ = I and range(Q) = range(X)

(1) Via QR factorization

X = QR

In Matlab[Qsim] = qr(X 0)

(2) Via SVD factorization

X = QDVT

Low-Rank Approximations Lecture 4 March 6 - 13 2020 7 40

Algorithm 1 Randomized SVD

Input A isin Rmtimesn a target rank k over-sampling parameter p

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Orthonormalize the columns of the sample matrix

Q = orth(Y)

Stage B(4) Form the (k + p)times n matrix B = QTA

(5) Form the SVD of the small matrix B B = 983141UDVT

(6) Form U = Q983141U

Randomized SVD accesses the matrix A twice

Low-Rank Approximations Lecture 4 March 6 - 13 2020 8 40

Theorem 1

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) and set Q = orth(AG) Then the average error asmeasured in the Frobenius norm satisfies

E[983042AminusQQTA983042F] le9830611 +

k

pminus 1

98306212983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

where E refers to expectation with respect to the draw of G Thecorresponding result for the spectral norm reads

E[983042AminusQQTA9830422] le9830751 +

983158k

pminus 1

983076σk+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

Low-Rank Approximations Lecture 4 March 6 - 13 2020 9 40

Bounds on the likelihood of large deviations

The likelihood of a large deviation from the mean depends only onthe over-sampling parameter p and decays extraordinarily fastFor instance one can prove that if p ge 4 then

983042AminusQQTA9830422 le9830751 + 17

983158

1 +k

p

983076σk+1

+8radick + p

p+ 1

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

with failure probability at most 3eminusp

Low-Rank Approximations Lecture 4 March 6 - 13 2020 10 40

14 Single-pass algorithms

Algorithm 2 Single-pass randomized EVD for symmetric matrix

Input Symmetric A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rntimesk D isin Rktimesk and A asymp UDUT

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Let Q denote the orthonormal matrix formed by the k

dominant left singular vectors of Y

Stage B

(4) Let C = argminXisinRktimeskX=XT

983042XQTGminusQTY983042F

(5) Form the EVD of the small matrix C C = 983141UD983141UT

(6) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 11 40

Algorithm 3 Single-pass randomized SVD for general matrix

Input A isin Rmtimesn a target rank k over-sampling parameter pOutput U isin Rntimesk D isin Rktimesk V isin Rntimesk and A asymp UDVT

Stage A

(1) Form two Gaussian random matrices Gc isin Rntimes(k+p) and

Gr isin Rmtimes(k+p) (2) Form the sample matrices Yc = AGc and Yr = ATGr(3) Form orthonormal matrices Qc and Qr consisting of the

k dominant left singular vectors of Yc and Yr

Stage B(4) Let

C = argminXisinRktimesk

983042GTr QcXminusYT

r Qr9830422F + 983042XQTr Gc minusQT

c Yc9830422F

(5) Form the SVD of the small matrix C C = 983141UD983141VT

(6) Form U = Qc983141U and V = Qr

983141V

Low-Rank Approximations Lecture 4 March 6 - 13 2020 12 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

A randomized algorithm for ldquoStage Ardquo

(1) Draw a Gaussian random matrix G and form AG

(2) Construct orthonormal matrix Q st

range(Q) = range(AG)

Motivation

(i) The special case rank(A) = k Let G isin Rntimesk One can prove

range(Q) = range(A)

holds with probability 1

(ii) Practical cases rank(A) ≫ k Simply take a few extra samplesIt turns out that if we take say k + 10 samples instead of k thenthe process will with probability almost 1 produce Qk that iscomparable to the best possible basis Uk where Uk are thematrix consisting of the k leading left singular vectors

Low-Rank Approximations Lecture 4 March 6 - 13 2020 6 40

Notation Orthonormalization Given an mtimes l matrix X withrank(X) = l with m ge l we introduce the function

Q = orth(X)

to denote orthonormalization of the columns of X In other wordsQ will be an mtimes l orthonormal matrix whose columns form abasis for the column space of X ie

QTQ = I and range(Q) = range(X)

(1) Via QR factorization

X = QR

In Matlab[Qsim] = qr(X 0)

(2) Via SVD factorization

X = QDVT

Low-Rank Approximations Lecture 4 March 6 - 13 2020 7 40

Algorithm 1 Randomized SVD

Input A isin Rmtimesn a target rank k over-sampling parameter p

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Orthonormalize the columns of the sample matrix

Q = orth(Y)

Stage B(4) Form the (k + p)times n matrix B = QTA

(5) Form the SVD of the small matrix B B = 983141UDVT

(6) Form U = Q983141U

Randomized SVD accesses the matrix A twice

Low-Rank Approximations Lecture 4 March 6 - 13 2020 8 40

Theorem 1

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) and set Q = orth(AG) Then the average error asmeasured in the Frobenius norm satisfies

E[983042AminusQQTA983042F] le9830611 +

k

pminus 1

98306212983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

where E refers to expectation with respect to the draw of G Thecorresponding result for the spectral norm reads

E[983042AminusQQTA9830422] le9830751 +

983158k

pminus 1

983076σk+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

Low-Rank Approximations Lecture 4 March 6 - 13 2020 9 40

Bounds on the likelihood of large deviations

The likelihood of a large deviation from the mean depends only onthe over-sampling parameter p and decays extraordinarily fastFor instance one can prove that if p ge 4 then

983042AminusQQTA9830422 le9830751 + 17

983158

1 +k

p

983076σk+1

+8radick + p

p+ 1

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

with failure probability at most 3eminusp

Low-Rank Approximations Lecture 4 March 6 - 13 2020 10 40

14 Single-pass algorithms

Algorithm 2 Single-pass randomized EVD for symmetric matrix

Input Symmetric A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rntimesk D isin Rktimesk and A asymp UDUT

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Let Q denote the orthonormal matrix formed by the k

dominant left singular vectors of Y

Stage B

(4) Let C = argminXisinRktimeskX=XT

983042XQTGminusQTY983042F

(5) Form the EVD of the small matrix C C = 983141UD983141UT

(6) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 11 40

Algorithm 3 Single-pass randomized SVD for general matrix

Input A isin Rmtimesn a target rank k over-sampling parameter pOutput U isin Rntimesk D isin Rktimesk V isin Rntimesk and A asymp UDVT

Stage A

(1) Form two Gaussian random matrices Gc isin Rntimes(k+p) and

Gr isin Rmtimes(k+p) (2) Form the sample matrices Yc = AGc and Yr = ATGr(3) Form orthonormal matrices Qc and Qr consisting of the

k dominant left singular vectors of Yc and Yr

Stage B(4) Let

C = argminXisinRktimesk

983042GTr QcXminusYT

r Qr9830422F + 983042XQTr Gc minusQT

c Yc9830422F

(5) Form the SVD of the small matrix C C = 983141UD983141VT

(6) Form U = Qc983141U and V = Qr

983141V

Low-Rank Approximations Lecture 4 March 6 - 13 2020 12 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

"Row ID":

A = XR,  X ∈ R^{m×k},  R ∈ R^{k×n},

where R = A(Is, :) and X is a matrix that contains the k × k identity matrix (X(Is, :) = I_k).

"Double-sided ID":

A = X As Z,  X ∈ R^{m×k},  As ∈ R^{k×k},  Z ∈ R^{k×n},

where X and Z are the same matrices as those that appear in the row ID and the column ID, and As is the k × k submatrix of A given by

As = A(Is, Js).


2.1 Deterministic techniques for the approximate ID

The standard software used to compute the classical column pivoted QR factorization (CPQR),

AP = QS,

can, with some light post-processing, be used to compute the column ID. Let

Q = [Q1, Q2],  S = [S11, S12; 0, S22].

Then

AP = [Q1 S11, Q1 S12 + Q2 S22].

Let J denote the permutation vector associated with the permutation matrix P, so that P = I(:, J) and AP = A(:, J). Let Js = J(1:k). Then set

C = A(:, Js) = Q1 S11.


By

AP = Q1 S11 [I, S11^{−1} S12] + [0, Q2 S22],

we have

A = C [I, S11^{−1} S12] Pᵀ + [0, Q2 S22] Pᵀ
  = CZ + [0, Q2 S22] Pᵀ.

Algorithm 7 Column ID: A ≈ A(:, Js) Z

function [Js, Z] = ID_col(A, k)
  n = size(A, 2);
  [~, S, J] = qr(A, 0);                  % CPQR: A(:, J) = Q*S
  T = S(1:k, 1:k) \ S(1:k, (k+1):n);     % T = inv(S11)*S12
  Z = zeros(k, n);
  Z(:, J) = [eye(k), T];
  Js = J(1:k);
end

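A quick check of ID_col on a synthetic matrix of exact rank k (our own illustration):

% Hypothetical test of ID_col
m = 300;  n = 200;  k = 15;
A = randn(m, k) * randn(k, n);             % exact rank k
[Js, Z] = ID_col(A, k);
relerr = norm(A - A(:, Js)*Z)/norm(A)      % should be near machine precision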

The row ID can be computed via an entirely analogous process that starts with a CPQR of the transpose of A. In other words, we execute a pivoted Gram-Schmidt orthonormalization process on the rows of A.

Algorithm 8 Row ID: A ≈ X A(Is, :)

function [Is, X] = ID_row(A, k)
  m = size(A, 1);
  [~, S, J] = qr(A', 0);                 % CPQR of the transpose
  T = S(1:k, 1:k) \ S(1:k, (k+1):m);
  X = zeros(m, k);
  X(J, :) = [eye(k), T]';
  Is = J(1:k);
end


To obtain the double-sided ID, we start by using the CPQR-based process to build the column ID. Then compute the row ID by performing Gram-Schmidt on the rows of the tall thin matrix C.

Algorithm 9 Double-sided ID: A ≈ X A(Is, Js) Z

function [Is, Js, X, Z] = ID_double(A, k)
  [Js, Z] = ID_col(A, k);
  [Is, X] = ID_row(A(:, Js), k);
end

The algorithms for computing interpolatory decompositions are wasteful when k ≪ min(m, n), since they involve a full QR factorization, which has complexity O(mn min(m, n)). This problem is easily remedied by replacing the full QR factorization by a partial QR factorization (interrupted after k steps), which has cost O(mnk).


2.2 Randomized techniques for the approximate ID

Suppose A ∈ R^{m×n} with rank(A) = k. Let

A = YF,  Y ∈ R^{m×k},  F ∈ R^{k×n}.

Then, using [Is, X] = ID_row(Y, k) to compute a row ID of Y, we have

Y = X Y(Is, :),

and

X A(Is, :) = X Y(Is, :) F = YF = A.

That is, X A(Is, :) is automatically a row ID of A as well.

Observation: in order to compute a row ID of a matrix A, the only information needed is a matrix Y whose columns span the column space of A.

As we have seen, the task of finding a matrix Y whose columns form a good basis for the column space of a matrix is ideally suited to randomized sampling.


Algorithm 10 Accuracy enhanced randomized row ID

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, and a small q, the number of steps in the power iteration.

Output: X ∈ R^{m×k} and an index vector Is ∈ N^k such that A ≈ X A(Is, :).

G = randn(n, k+p);  Y = A*G;
for j = 1:q
    Z = A'*Y;  Y = A*Z;
end
[Is, X] = ID_row(Y, k);

We can reduce the complexity by using a structured random matrix instead of a Gaussian one, for example Ω = DFS or Ω = DHS.


3 The CUR decomposition

The CUR factorization approximates an m × n matrix A as a product

A ≈ CUR,  C ∈ R^{m×k},  U ∈ R^{k×k},  R ∈ R^{k×n},

where C consists of a subset of the columns of A and R consists of a subset of the rows of A.

From the double-sided ID to CUR: let

C = A(:, Js),  R = A(Is, :).

Since

A ≈ CZ ≈ CUR,

we set

U = Z R†.

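Assembled in code (our own illustration, reusing ID_double from Section 2.1):

% Hypothetical example: CUR from the double-sided ID
m = 300;  n = 200;  k = 15;
A = randn(m, k) * randn(k, n);             % exact rank k
[Is, Js, X, Z] = ID_double(A, k);
C = A(:, Js);  R = A(Is, :);
U = Z * pinv(R);                           % U = Z*R^dagger
relerr = norm(A - C*U*R)/norm(A)           % near machine precision for exact rank k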

Algorithm 11 Accuracy enhanced randomized CUR

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, and a small q, the number of steps in the power iteration.

Output: U ∈ R^{k×k} and index vectors Is, Js ∈ N^k such that A ≈ A(:, Js) U A(Is, :).

G = randn(k+p, m);  Y = G*A;
for j = 1:q
    Z = Y*A';  Y = Z*A;
end
[Js, Z] = ID_col(Y, k);
[Is, ~] = ID_row(A(:, Js), k);
Solve U A(Is, :) = Z in the least-squares sense, i.e. U = Z * pinv(A(Is, :)).


4 Adaptive rank determination

Given a matrix A and a tolerance ε, the task is to determine matrices Q_k and B_k of rank ≤ k such that ‖A − Q_k B_k‖ ≤ ε.

4.1 Algorithm with updating the matrix

Q_0 = [ ];  B_0 = [ ];  A_0 = A;  k = 0;
while ‖A_k‖₂ > ε
    k = k + 1
    pick y ∈ range(A_{k−1})
    q_k = y / ‖y‖₂
    b_k = q_kᵀ A_{k−1}
    Q_k = [Q_{k−1}, q_k]
    B_k = [B_{k−1}; b_k]
    A_k = A_{k−1} − q_k b_k
end


Strategies for choosing y ∈ range(A_{k−1}):

(1) Pick the largest remaining column:

y = A_{k−1}(:, j_k), where j_k = argmax_j ‖A_{k−1}(:, j)‖₂.

(2) Pick the locally optimal vector (computationally hard):

y = argmin_{‖x‖₂=1, x∈range(A_{k−1})} ‖(I − xxᵀ) A_{k−1}‖₂.

(3) A randomized selection strategy: set

y = A_{k−1} g,

where g ∈ R^n is a Gaussian random vector.

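The loop of Section 4.1 with strategy (1) can be implemented as follows (a minimal sketch, ours; the Frobenius norm serves as a cheap, conservative upper bound on the spectral norm in the stopping test):

function [Q, B] = greedyQB(A, tol)
% Adaptive rank determination with updating, using the
% largest-remaining-column strategy (1).
  [m, n] = size(A);
  Q = zeros(m, 0);  B = zeros(0, n);
  while norm(A, 'fro') > tol           % ||A||_F >= ||A||_2, so this test is safe
      [~, jk] = max(sum(A.^2, 1));     % index of the largest remaining column
      y = A(:, jk);
      q = y / norm(y);
      b = q' * A;
      Q = [Q, q];  B = [B; b];
      A = A - q*b;                     % deflate: A_k = A_{k-1} - q_k b_k
  end
end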

Exercise: Prove that for each k, Q_k is orthonormal, and

B_k = Q_kᵀ A,  A = Q_k B_k + A_k.

Exercise: Prove that for each k,

‖A‖_F² = ‖B_k‖_F² + ‖A_k‖_F².

Exercise: Prove that for each k, with strategy (2), i.e.

y = argmin_{‖x‖₂=1, x∈range(A_{k−1})} ‖(I − xxᵀ) A_{k−1}‖₂,

the algorithm will produce matrices that attain the theoretically optimal precision, i.e.

‖A − Q_k B_k‖₂ = σ_{k+1}.


4.2 Block version for randomized selection strategy

Q = [ ];  B = [ ];
while ‖A‖₂ > ε
    G = randn(n, p);
    [Qnew, ~] = qr(A*G, 0);
    Bnew = Qnew' * A;
    Q = [Q, Qnew];
    B = [B; Bnew];
    A = A - Qnew*Bnew;
end

Exercise: Show that the block version of the algorithm outputs an orthonormal matrix Q of size m × k (where k is a multiple of p) and a k × n matrix B such that

‖A − QB‖₂ ≤ ε.

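For completeness, a runnable sketch of the block loop (ours; the built-in normest stands in for the exact spectral norm):

function [Q, B] = randQB_adaptive(A, p, tol)
% Blocked adaptive randomized range finder: Q orthonormal, B = Q'*A,
% and ||A - Q*B||_2 <= tol up to the accuracy of the normest estimate.
  [m, n] = size(A);
  Q = zeros(m, 0);  B = zeros(0, n);
  while normest(A) > tol
      G = randn(n, p);
      [Qnew, ~] = qr(A*G, 0);
      Bnew = Qnew' * A;
      Q = [Q, Qnew];  B = [B; Bnew];
      A = A - Qnew*Bnew;               % deflate by the newly captured block
  end
end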

4.3 Algorithm without updating the matrix

Draw standard Gaussian vectors g_1, …, g_r of length n;
y_i = A g_i, i = 1, …, r;  j = 0;  Q_0 = [ ];
while max{ ‖y_{j+1}‖₂, ‖y_{j+2}‖₂, …, ‖y_{j+r}‖₂ } > ε/(10√(2/π))
    j = j + 1
    y_j = y_j − Q_{j−1} (Q_{j−1}ᵀ y_j)
    q_j = y_j / ‖y_j‖₂
    Q_j = [Q_{j−1}, q_j]
    Draw a standard Gaussian vector g_{j+r} of length n, and set y_{j+r} = (I − Q_j Q_jᵀ) A g_{j+r}
    y_i = y_i − q_j (q_jᵀ y_i), i = (j+1), …, (j+r−1)
end
Q = Q_j

The resulting orthonormal matrix Q satisfies ‖A − QQᵀA‖₂ ≤ ε with probability at least 1 − min{m, n} · 10^{−r}.


5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A ∈ R^{m×n}:

AP = QR,

where P is a permutation matrix, Q is an orthogonal matrix, and R is upper triangular.

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of n − 1 rank-1 updates (so-called BLAS2 operations).


Householder column pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, when the matrix is stored in distributed memory or on a hard drive, etc. The reason is that the process is very communication intensive. We therefore turn to a blocked version of the process.

Blocked Householder column pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm for selecting the next block of b pivot columns from a small sample matrix:

G = randn(b+p, m);  Y = G*A;  [~, ~, P1] = qr(Y, 0);

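Spelled out (our own illustration; the variable names and the test matrix are hypothetical), one block step reads:

% One randomized pivot-selection step for a blocked CPQR (sketch)
A = randn(1000, 600);            % example matrix
b = 32;  p = 8;                  % block size and over-sampling
m = size(A, 1);
G = randn(b+p, m);
Y = G*A;                         % (b+p)-by-n sketch of the rows of A
[~, ~, P1] = qr(Y, 0);           % CPQR of the small sketch: pivot vector P1
block_pivots = P1(1:b);          % columns of A to eliminate in this block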

6 Rank-revealing UTV decomposition

Given an m × n matrix A with m ≥ n, a UTV decomposition of A is a factorization of the form

A = UTVᵀ,

where U ∈ R^{m×m} and V ∈ R^{n×n} are unitary matrices, and T is a triangular matrix (either lower or upper triangular).

The UTV decomposition can be viewed as a generalization of other standard factorizations, such as, e.g., the singular value decomposition (SVD) or the column pivoted QR decomposition (CPQR). (To be precise, the SVD is the special case where T is diagonal, and the CPQR is the special case where V is a permutation matrix.) The additional flexibility inherent in the UTV decomposition enables the design of efficient updating procedures.

randUTV provides close to optimal low-rank approximation and highly accurate estimates for the singular values of a matrix.


The algorithm randUTV builds the factorization incrementally, which means that when it is applied to a matrix of numerical rank k, the algorithm can be stopped early and incur an overall cost of O(mnk).

Like HQRRP, the algorithm randUTV is blocked, which enables it to execute fast on modern hardware.

The algorithm randUTV is not an iterative algorithm. In this regard, it is closer to the CPQR than standard SVD algorithms, which substantially simplifies software optimization.

Blocked UTV factorization process:


6.1 A single step block factorization from A0 to A1

Draw an m × p Gaussian matrix G and form the sample matrix

Y = (AᵀA)^q Aᵀ G,

where q is a parameter indicating the number of steps of power iteration taken.

Form a unitary matrix V whose first p columns form an orthonormal basis for the column space of Y. For example,

[V, ~] = qr(Y).

We then compute the SVD of the matrix A V(:, 1:p),

A V(:, 1:p) = U1 A11 Wᵀ,

and set

V1 = V [W, 0; 0, I],

i.e. V with its first p columns rotated by W.


One step of randUTV:

function [U, T, V] = stepUTV(A, p, q)
  G = randn(size(A, 1), p);
  Y = A'*G;
  for i = 1:q
      Y = A'*(A*Y);                          % power iteration
  end
  [V, ~] = qr(Y);                            % full QR, so V is square
  [U, D, W] = svd(A*V(:, 1:p));
  T = [D, U'*A*V(:, (p+1):end)];
  V(:, 1:p) = V(:, 1:p)*W;
end

randUTV then applies the same procedure to the remaining block, and so on.


function [U, T, V] = randUTV(A, p, q)
  T = A;
  U = eye(size(A, 1));  V = eye(size(A, 2));
  for i = 1:ceil(size(A, 2)/p)
      I1 = 1:(p*(i-1));
      I2 = (p*(i-1)+1):size(A, 1);
      J2 = (p*(i-1)+1):size(A, 2);
      if length(J2) > p
          [Uhat, That, Vhat] = stepUTV(T(I2, J2), p, q);
      else
          [Uhat, That, Vhat] = svd(T(I2, J2));   % last (small) block: exact SVD
      end
      U(:, I2) = U(:, I2)*Uhat;
      V(:, J2) = V(:, J2)*Vhat;
      T(I2, J2) = That;
      T(I1, J2) = T(I1, J2)*Vhat;
  end
end

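A quick demonstration (ours): the factorization is exact, and the diagonal of T estimates the singular values.

% Hypothetical test of randUTV
A = randn(300, 200);
[U, T, V] = randUTV(A, 50, 2);
norm(A - U*T*V')/norm(A)                   % near machine precision
s = svd(A);
[diag(T(1:5, 1:5)), s(1:5)]                % diagonal of T vs. true singular values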

Notation Orthonormalization Given an mtimes l matrix X withrank(X) = l with m ge l we introduce the function

Q = orth(X)

to denote orthonormalization of the columns of X In other wordsQ will be an mtimes l orthonormal matrix whose columns form abasis for the column space of X ie

QTQ = I and range(Q) = range(X)

(1) Via QR factorization

X = QR

In Matlab[Qsim] = qr(X 0)

(2) Via SVD factorization

X = QDVT

Low-Rank Approximations Lecture 4 March 6 - 13 2020 7 40

Algorithm 1 Randomized SVD

Input A isin Rmtimesn a target rank k over-sampling parameter p

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Orthonormalize the columns of the sample matrix

Q = orth(Y)

Stage B(4) Form the (k + p)times n matrix B = QTA

(5) Form the SVD of the small matrix B B = 983141UDVT

(6) Form U = Q983141U

Randomized SVD accesses the matrix A twice

Low-Rank Approximations Lecture 4 March 6 - 13 2020 8 40

Theorem 1

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) and set Q = orth(AG) Then the average error asmeasured in the Frobenius norm satisfies

E[983042AminusQQTA983042F] le9830611 +

k

pminus 1

98306212983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

where E refers to expectation with respect to the draw of G Thecorresponding result for the spectral norm reads

E[983042AminusQQTA9830422] le9830751 +

983158k

pminus 1

983076σk+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

Low-Rank Approximations Lecture 4 March 6 - 13 2020 9 40

Bounds on the likelihood of large deviations

The likelihood of a large deviation from the mean depends only onthe over-sampling parameter p and decays extraordinarily fastFor instance one can prove that if p ge 4 then

983042AminusQQTA9830422 le9830751 + 17

983158

1 +k

p

983076σk+1

+8radick + p

p+ 1

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

with failure probability at most 3eminusp

Low-Rank Approximations Lecture 4 March 6 - 13 2020 10 40

14 Single-pass algorithms

Algorithm 2 Single-pass randomized EVD for symmetric matrix

Input Symmetric A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rntimesk D isin Rktimesk and A asymp UDUT

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Let Q denote the orthonormal matrix formed by the k

dominant left singular vectors of Y

Stage B

(4) Let C = argminXisinRktimeskX=XT

983042XQTGminusQTY983042F

(5) Form the EVD of the small matrix C C = 983141UD983141UT

(6) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 11 40

Algorithm 3 Single-pass randomized SVD for general matrix

Input A isin Rmtimesn a target rank k over-sampling parameter pOutput U isin Rntimesk D isin Rktimesk V isin Rntimesk and A asymp UDVT

Stage A

(1) Form two Gaussian random matrices Gc isin Rntimes(k+p) and

Gr isin Rmtimes(k+p) (2) Form the sample matrices Yc = AGc and Yr = ATGr(3) Form orthonormal matrices Qc and Qr consisting of the

k dominant left singular vectors of Yc and Yr

Stage B(4) Let

C = argminXisinRktimesk

983042GTr QcXminusYT

r Qr9830422F + 983042XQTr Gc minusQT

c Yc9830422F

(5) Form the SVD of the small matrix C C = 983141UD983141VT

(6) Form U = Qc983141U and V = Qr

983141V

Low-Rank Approximations Lecture 4 March 6 - 13 2020 12 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Algorithm 1 Randomized SVD

Input A isin Rmtimesn a target rank k over-sampling parameter p

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Orthonormalize the columns of the sample matrix

Q = orth(Y)

Stage B(4) Form the (k + p)times n matrix B = QTA

(5) Form the SVD of the small matrix B B = 983141UDVT

(6) Form U = Q983141U

Randomized SVD accesses the matrix A twice

Low-Rank Approximations Lecture 4 March 6 - 13 2020 8 40

Theorem 1

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) and set Q = orth(AG) Then the average error asmeasured in the Frobenius norm satisfies

E[983042AminusQQTA983042F] le9830611 +

k

pminus 1

98306212983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

where E refers to expectation with respect to the draw of G Thecorresponding result for the spectral norm reads

E[983042AminusQQTA9830422] le9830751 +

983158k

pminus 1

983076σk+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

Low-Rank Approximations Lecture 4 March 6 - 13 2020 9 40

Bounds on the likelihood of large deviations

The likelihood of a large deviation from the mean depends only onthe over-sampling parameter p and decays extraordinarily fastFor instance one can prove that if p ge 4 then

983042AminusQQTA9830422 le9830751 + 17

983158

1 +k

p

983076σk+1

+8radick + p

p+ 1

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

with failure probability at most 3eminusp

Low-Rank Approximations Lecture 4 March 6 - 13 2020 10 40

14 Single-pass algorithms

Algorithm 2 Single-pass randomized EVD for symmetric matrix

Input Symmetric A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rntimesk D isin Rktimesk and A asymp UDUT

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Let Q denote the orthonormal matrix formed by the k

dominant left singular vectors of Y

Stage B

(4) Let C = argminXisinRktimeskX=XT

983042XQTGminusQTY983042F

(5) Form the EVD of the small matrix C C = 983141UD983141UT

(6) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 11 40

Algorithm 3 Single-pass randomized SVD for general matrix

Input A isin Rmtimesn a target rank k over-sampling parameter pOutput U isin Rntimesk D isin Rktimesk V isin Rntimesk and A asymp UDVT

Stage A

(1) Form two Gaussian random matrices Gc isin Rntimes(k+p) and

Gr isin Rmtimes(k+p) (2) Form the sample matrices Yc = AGc and Yr = ATGr(3) Form orthonormal matrices Qc and Qr consisting of the

k dominant left singular vectors of Yc and Yr

Stage B(4) Let

C = argminXisinRktimesk

983042GTr QcXminusYT

r Qr9830422F + 983042XQTr Gc minusQT

c Yc9830422F

(5) Form the SVD of the small matrix C C = 983141UD983141VT

(6) Form U = Qc983141U and V = Qr

983141V

Low-Rank Approximations Lecture 4 March 6 - 13 2020 12 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Theorem 1

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) and set Q = orth(AG) Then the average error asmeasured in the Frobenius norm satisfies

E[983042AminusQQTA983042F] le9830611 +

k

pminus 1

98306212983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

where E refers to expectation with respect to the draw of G Thecorresponding result for the spectral norm reads

E[983042AminusQQTA9830422] le9830751 +

983158k

pminus 1

983076σk+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

Low-Rank Approximations Lecture 4 March 6 - 13 2020 9 40

Bounds on the likelihood of large deviations

The likelihood of a large deviation from the mean depends only onthe over-sampling parameter p and decays extraordinarily fastFor instance one can prove that if p ge 4 then

983042AminusQQTA9830422 le9830751 + 17

983158

1 +k

p

983076σk+1

+8radick + p

p+ 1

983091

983107min(mn)983131

j=k+1

σ2j

983092

98310812

with failure probability at most 3eminusp

Low-Rank Approximations Lecture 4 March 6 - 13 2020 10 40

14 Single-pass algorithms

Algorithm 2 Single-pass randomized EVD for symmetric matrix

Input Symmetric A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rntimesk D isin Rktimesk and A asymp UDUT

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Let Q denote the orthonormal matrix formed by the k

dominant left singular vectors of Y

Stage B

(4) Let C = argminXisinRktimeskX=XT

983042XQTGminusQTY983042F

(5) Form the EVD of the small matrix C C = 983141UD983141UT

(6) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 11 40

Algorithm 3 Single-pass randomized SVD for general matrix

Input A isin Rmtimesn a target rank k over-sampling parameter pOutput U isin Rntimesk D isin Rktimesk V isin Rntimesk and A asymp UDVT

Stage A

(1) Form two Gaussian random matrices Gc isin Rntimes(k+p) and

Gr isin Rmtimes(k+p) (2) Form the sample matrices Yc = AGc and Yr = ATGr(3) Form orthonormal matrices Qc and Qr consisting of the

k dominant left singular vectors of Yc and Yr

Stage B(4) Let

C = argminXisinRktimesk

983042GTr QcXminusYT

r Qr9830422F + 983042XQTr Gc minusQT

c Yc9830422F

(5) Form the SVD of the small matrix C C = 983141UD983141VT

(6) Form U = Qc983141U and V = Qr

983141V

Low-Rank Approximations Lecture 4 March 6 - 13 2020 12 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling


Algorithm 10: Accuracy enhanced randomized row ID

Input: A ∈ R^(m×n), a target rank k, an over-sampling parameter p, and a small q, the number of steps in the power iteration
Output: X ∈ R^(m×k) and an index vector Is ∈ N^k such that A ≈ X A(Is, :)

G = randn(n, k + p);
Y = A * G;
for j = 1:q
    Z = A' * Y;
    Y = A * Z;
end
[Is, X] = ID_row(Y, k);

We can reduce the complexity by using a structured random matrix instead of a Gaussian one, for example Ω = DFS or Ω = DHS; a sketch of how such a matrix is applied follows below.
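For instance, with Ω = DFS (D a diagonal of random unit-modulus phases, F the unitary DFT, and S a random column selection), the product AΩ can be formed with an FFT rather than by building Ω explicitly. A minimal MATLAB sketch, where the sizes and the sample count l are arbitrary illustrative choices:

% Apply Omega = D*F*S via the FFT: A*Omega = ((A*D)*F)*S,
% costing O(m n log n) instead of O(m n l) for a dense Gaussian.
m = 200; n = 512; l = 30;                 % l = k + p samples (illustrative)
A = randn(m, n);
d = exp(2i * pi * rand(n, 1));            % diagonal entries of D
perm = randperm(n); cols = perm(1:l);     % columns kept by S
AD  = A .* d.';                           % A*D: scales column q by d(q)
ADF = fft(AD, [], 2) / sqrt(n);           % A*D*F: unitary DFT of each row
Y   = ADF(:, cols);                       % A*D*F*S: the sample matrix

The sample Y is complex; the subsequent orthonormalization step can simply be carried out in complex arithmetic.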


3 The CUR decomposition

The CUR factorization approximates an m × n matrix A as a product

A ≈ C U R,   C ∈ R^(m×k),  U ∈ R^(k×k),  R ∈ R^(k×n),

where C consists of a subset of the columns of A and R consists of a subset of the rows of A.

From the double-sided ID to CUR: Let

C = A(:, Js),   R = A(Is, :).

From A ≈ C Z ≈ C U R, we set

U = Z R†.
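A minimal sketch of this conversion, reusing ID_double from Algorithm 9 (the test matrix and rank are arbitrary; the pseudo-inverse is applied as a least-squares solve):

% CUR from the double-sided ID: C and R are actual columns/rows of A.
m = 200; n = 150; k = 12;
A = randn(m, k) * randn(k, n);       % exactly rank k
[Is, Js, X, Z] = ID_double(A, k);
C = A(:, Js);
R = A(Is, :);
U = Z / R;                           % U = Z*pinv(R): solve U*R = Z in LS sense
fprintf('relative CUR error: %.2e\n', norm(A - C*U*R) / norm(A));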


Algorithm 11: Accuracy enhanced randomized CUR

Input: A ∈ R^(m×n), a target rank k, an over-sampling parameter p, and a small q, the number of steps in the power iteration
Output: U ∈ R^(k×k) and index vectors Is, Js ∈ N^k such that A ≈ A(:, Js) U A(Is, :)

G = randn(k + p, m);
Y = G * A;
for j = 1:q
    Z = Y * A';
    Y = Z * A;
end
[Js, Z] = ID_col(Y, k);
[Is, ~] = ID_row(A(:, Js), k);
Solve U A(Is, :) = Z in the least-squares sense.


4 Adaptive rank determination

Given a matrix A and a tolerance ε, the task is to determine matrices Q_k and B_k of rank ≤ k such that ‖A − Q_k B_k‖ ≤ ε.

4.1 Algorithm with updating the matrix

Q_0 = [ ];  B_0 = [ ];  A_0 = A;  k = 0;
while ‖A_k‖_2 > ε
    k = k + 1
    pick y ∈ range(A_{k−1})
    q_k = y / ‖y‖_2
    b_k = q_k^T A_{k−1}
    Q_k = [Q_{k−1}, q_k]
    B_k = [B_{k−1}; b_k]
    A_k = A_{k−1} − q_k b_k
end


Strategies for choosing y ∈ range(A_{k−1}):

(1) Pick the largest remaining column:

y = A_{k−1}(:, j_k),  with  j_k = argmax_j ‖A_{k−1}(:, j)‖_2.

(2) Pick the locally optimal vector (computationally hard):

y = argmin_{‖x‖_2 = 1, x ∈ range(A_{k−1})} ‖(I − x x^T) A_{k−1}‖_2.

(3) A randomized selection strategy (see the runnable sketch below): set

y = A_{k−1} g,

where g ∈ R^n is a Gaussian random vector.
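A minimal runnable MATLAB sketch of the updating algorithm with the randomized strategy (3); the test matrix and tolerance are arbitrary illustrative choices:

% Adaptive rank determination, updating variant, with y = A_{k-1}*g.
n = 100;
A = gallery('randsvd', n, 1e12);        % test matrix with decaying spectrum
tol = 1e-6 * norm(A);
Q = []; B = []; k = 0;
while norm(A) > tol                      % spectral norm of the remainder A_k
    k = k + 1;
    y = A * randn(n, 1);                 % strategy (3)
    q = y / norm(y);
    b = q' * A;
    Q = [Q, q];  B = [B; b];
    A = A - q * b;                       % deflate: A_k = A_{k-1} - q_k b_k
end
fprintf('rank found: %d\n', k);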


Exercise: Prove that, for each k, Q_k is orthonormal and

B_k = Q_k^T A,   A = Q_k B_k + A_k.

Exercise: Prove that, for each k,

‖A‖_F^2 = ‖B_k‖_F^2 + ‖A_k‖_F^2.

Exercise: Prove that, for each k, with strategy (2), i.e.

y = argmin_{‖x‖_2 = 1, x ∈ range(A_{k−1})} ‖(I − x x^T) A_{k−1}‖_2,

the algorithm produces matrices that attain the theoretically optimal precision, i.e.

‖A − Q_k B_k‖_2 = σ_{k+1}.


4.2 Block version for the randomized selection strategy

Q = [ ];  B = [ ];
while ‖A‖_2 > ε
    G = randn(n, p)
    [Q_new, ~] = qr(A G, 0)
    B_new = Q_new^T A
    Q = [Q, Q_new]
    B = [B; B_new]
    A = A − Q_new B_new
end

Exercise: Show that the block version of the algorithm outputs an orthonormal matrix Q of size m × k (where k is a multiple of p) and a k × n matrix B such that

‖A − Q B‖_2 ≤ ε.


4.3 Algorithm without updating the matrix

Draw standard Gaussian vectors g_1, ..., g_r of length n;
y_i = A g_i for i = 1, ..., r;  j = 0;  Q_0 = [ ].

while max{‖y_{j+1}‖_2, ‖y_{j+2}‖_2, ..., ‖y_{j+r}‖_2} > ε / (10 √(2/π))
    j = j + 1
    y_j = y_j − Q_{j−1} Q_{j−1}^T y_j
    q_j = y_j / ‖y_j‖_2;  Q_j = [Q_{j−1}, q_j]
    Draw a standard Gaussian vector g_{j+r} of length n;  y_{j+r} = (I − Q_j Q_j^T) A g_{j+r}
    y_i = y_i − q_j (q_j^T y_i) for i = (j + 1), ..., (j + r − 1)
end
Q = Q_j

The resulting orthonormal matrix Q satisfies ‖A − Q Q^T A‖_2 ≤ ε with probability at least 1 − min{m, n} · 10^(−r).
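A runnable MATLAB sketch of this adaptive range finder (the matrix, tolerance, and r are illustrative choices):

% Adaptive randomized range finder: stop once r consecutive residual
% samples are all small, which certifies the error bound w.h.p.
n = 200; A = gallery('randsvd', n, 1e10);
tol = 1e-6; r = 10;
Y = A * randn(n, r);                          % y_1, ..., y_r
Q = zeros(n, 0); j = 0;
while max(sqrt(sum(Y(:, j+1:j+r).^2, 1))) > tol / (10 * sqrt(2/pi))
    j = j + 1;
    Y(:, j) = Y(:, j) - Q * (Q' * Y(:, j));   % project out current basis
    q = Y(:, j) / norm(Y(:, j));
    Q = [Q, q];
    Ag = A * randn(n, 1);                     % fresh sample y_{j+r}
    Y(:, j+r) = Ag - Q * (Q' * Ag);
    Y(:, j+1:j+r-1) = Y(:, j+1:j+r-1) - q * (q' * Y(:, j+1:j+r-1));
end
fprintf('basis size: %d, error: %.2e\n', size(Q, 2), norm(A - Q*(Q'*A)));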


5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A ∈ R^(m×n):

A P = Q R,

where P is a permutation matrix, Q is an orthogonal matrix, and R is upper triangular.

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of n − 1 rank-1 updates (so-called BLAS2 operations).


Householder column pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, when the matrix is stored in distributed memory or on a hard drive, etc. The reason is that the process is very communication intensive. We therefore turn to blocking the process.

Blocked Householder column pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm for choosing a block of b pivots at once:

G = randn(b + p, m);  Y = G * A;  [~, ~, P1] = qr(Y, 0);
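A simplified illustration of how the sampled pivots can drive one blocked step (a sketch of the idea behind HQRRP, not the full algorithm; the sizes b and p are arbitrary choices):

% One blocked step: choose b pivot columns from the small sketch Y = G*A,
% permute them to the front, and eliminate them with unpivoted (BLAS3) QR.
m = 300; n = 250; b = 32; p = 8;
A = randn(m, n);
G = randn(b + p, m);
Y = G * A;
[~, ~, P1] = qr(Y, 0);                 % CPQR of the (b+p) x n sketch
piv  = P1(1:b);                        % block of b pivot columns
rest = setdiff(1:n, piv, 'stable');
A = A(:, [piv, rest]);                 % apply the block permutation to A
[Q1, ~] = qr(A(:, 1:b), 0);            % unpivoted QR of the pivot block
A(:, b+1:n) = A(:, b+1:n) - Q1 * (Q1' * A(:, b+1:n));   % BLAS3 update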


6 Rank-revealing UTV decomposition

Given an m × n matrix A with m ≥ n, a UTV decomposition of A is a factorization of the form

A = U T V^T,

where U ∈ R^(m×m) and V ∈ R^(n×n) are unitary matrices and T is a triangular matrix (either lower or upper triangular).

The UTV decomposition can be viewed as a generalization of other standard factorizations such as, e.g., the Singular Value Decomposition (SVD) or the Column Pivoted QR decomposition (CPQR). (To be precise, the SVD is the special case where T is diagonal, and the CPQR is the special case where V is a permutation matrix.) The additional flexibility inherent in the UTV decomposition enables the design of efficient updating procedures.

randUTV provides close to optimal low-rank approximation and highly accurate estimates for the singular values of a matrix.


The algorithm randUTV builds the factorization incrementally, which means that when it is applied to a matrix of numerical rank k, the algorithm can be stopped early and incur an overall cost of O(mnk).

Like HQRRP, the algorithm randUTV is blocked, which enables it to execute fast on modern hardware.

The algorithm randUTV is not an iterative algorithm. In this regard, it is closer to the CPQR than to standard SVD algorithms, which substantially simplifies software optimization.

Blocked UTV factorization process


6.1 A single step block factorization from A0 to A1

Draw an m × p Gaussian matrix G and form the sample matrix

Y = (A^T A)^q A^T G,

where q is a parameter indicating the number of steps of power iteration taken.

Form a unitary matrix V whose first p columns form an orthonormal basis for the column space of Y. For example,

[V, ~] = qr(Y);

We then compute an SVD of the matrix A V(:, 1:p),

A V(:, 1:p) = U1 A11 W^T,

and set

V1 = V [W  0; 0  I].


One step of randUTV

function [U, T, V] = stepUTV(A, p, q)
    G = randn(size(A, 1), p);
    Y = A' * G;
    for i = 1:q
        Y = A' * (A * Y);                % Y = (A'A)^q A' G
    end
    [V, ~] = qr(Y);                      % full QR: V is unitary
    [U, D, W] = svd(A * V(:, 1:p));      % SVD of the leading block
    T = [D, U' * A * V(:, (p+1):end)];
    V(:, 1:p) = V(:, 1:p) * W;
end

randUTV then applies the same procedure to the remaining block, and so on.


function [U, T, V] = randUTV(A, p, q)
    T = A;
    U = eye(size(A, 1));  V = eye(size(A, 2));
    for i = 1:ceil(size(A, 2) / p)
        I1 = 1:(p * (i - 1));
        I2 = (p * (i - 1) + 1):size(A, 1);
        J2 = (p * (i - 1) + 1):size(A, 2);
        if length(J2) > p
            [Uhat, That, Vhat] = stepUTV(T(I2, J2), p, q);
        else
            [Uhat, That, Vhat] = svd(T(I2, J2));   % final (small) block
        end
        U(:, I2) = U(:, I2) * Uhat;
        V(:, J2) = V(:, J2) * Vhat;
        T(I2, J2) = That;
        T(I1, J2) = T(I1, J2) * Vhat;
    end
end
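A quick check of the factorization (illustrative sizes; the diagonal of T should track the singular values closely):

% Hypothetical demo: factor a random matrix and compare against the SVD.
m = 300; n = 200; p = 40; q = 1;
A = randn(m, n);
[U, T, V] = randUTV(A, p, q);
fprintf('factorization error: %.2e\n', norm(A - U*T*V'));
s = svd(A);
fprintf('max relative diagonal error: %.2e\n', ...
        max(abs(diag(T(1:n, 1:n)) - s) ./ s));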



2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

14 Single-pass algorithms

Algorithm 2 Single-pass randomized EVD for symmetric matrix

Input Symmetric A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rntimesk D isin Rktimesk and A asymp UDUT

Stage A(1) Form an ntimes (k + p) Gaussian random matrix G(2) Form the sample matrix Y = AG(3) Let Q denote the orthonormal matrix formed by the k

dominant left singular vectors of Y

Stage B

(4) Let C = argminXisinRktimeskX=XT

983042XQTGminusQTY983042F

(5) Form the EVD of the small matrix C C = 983141UD983141UT

(6) Form U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 11 40

Algorithm 3 Single-pass randomized SVD for general matrix

Input A isin Rmtimesn a target rank k over-sampling parameter pOutput U isin Rntimesk D isin Rktimesk V isin Rntimesk and A asymp UDVT

Stage A

(1) Form two Gaussian random matrices Gc isin Rntimes(k+p) and

Gr isin Rmtimes(k+p) (2) Form the sample matrices Yc = AGc and Yr = ATGr(3) Form orthonormal matrices Qc and Qr consisting of the

k dominant left singular vectors of Yc and Yr

Stage B(4) Let

C = argminXisinRktimesk

983042GTr QcXminusYT

r Qr9830422F + 983042XQTr Gc minusQT

c Yc9830422F

(5) Form the SVD of the small matrix C C = 983141UD983141VT

(6) Form U = Qc983141U and V = Qr

983141V

Low-Rank Approximations Lecture 4 March 6 - 13 2020 12 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Algorithm 3 Single-pass randomized SVD for general matrix

Input A isin Rmtimesn a target rank k over-sampling parameter pOutput U isin Rntimesk D isin Rktimesk V isin Rntimesk and A asymp UDVT

Stage A

(1) Form two Gaussian random matrices Gc isin Rntimes(k+p) and

Gr isin Rmtimes(k+p) (2) Form the sample matrices Yc = AGc and Yr = ATGr(3) Form orthonormal matrices Qc and Qr consisting of the

k dominant left singular vectors of Yc and Yr

Stage B(4) Let

C = argminXisinRktimesk

983042GTr QcXminusYT

r Qr9830422F + 983042XQTr Gc minusQT

c Yc9830422F

(5) Form the SVD of the small matrix C C = 983141UD983141VT

(6) Form U = Qc983141U and V = Qr

983141V

Low-Rank Approximations Lecture 4 March 6 - 13 2020 12 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Streaming algorithms

We say that an algorithm for processing a matrix is a streamingalgorithm if both (1) and (2) are satisfied

(1) each entry of the matrix is accessed only once

(2) the entries can be fed in any order (In other words thealgorithm is not allowed to dictate the order in which the elementsare viewed)

Complexity O(mn log k) algorithm

Replace Gaussian random matrix G with a different randommatrix Ω

(1) Ω is sufficiently structured that AΩ can be evaluated inO(mn log k) flops

(2) Ω is sufficiently random such that the columns of AΩaccurately span the range of A

Low-Rank Approximations Lecture 4 March 6 - 13 2020 13 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case. A good candidate for Ω is

$$\Omega = DFS,$$

where D is diagonal with entries $D_{ii}$ that are complex numbers of modulus one, drawn from a uniform distribution on the unit circle in $\mathbb{C}$; F is the discrete Fourier transform,

$$F_{pq} = \frac{1}{\sqrt{n}}\, e^{-2\pi i (p-1)(q-1)/n}, \qquad p, q = 1, \dots, n;$$

and S is a matrix consisting of a random subset of $\ell$ columns from the $n \times n$ identity matrix (drawn without replacement). In other words, given an arbitrary matrix X of size $m \times n$, the matrix XS consists of a randomly drawn subset of $\ell$ columns of X.

(2) Real case: $\Omega = DHS$, where H is the Hadamard transform, D is diagonal, and $D_{ii} = \pm 1$ with equal probability.
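To make the fast-evaluation claim concrete, here is a minimal MATLAB sketch of forming $Y = A\Omega$ for the complex case via the FFT; the function name srft_sample and the use of implicit column scaling are our own assumptions, not part of the lecture:

function Y = srft_sample(A, l)
  % Form Y = A*Omega with Omega = D*F*S in O(m n log n) flops (sketch).
  n = size(A, 2);
  d = exp(2i * pi * rand(1, n));     % D: random unimodular diagonal entries
  B = A .* d;                        % A*D: scale the columns of A
  B = fft(B, [], 2) / sqrt(n);       % (A*D)*F: the DFT applied to each row
  cols = randperm(n, l);             % S: l columns drawn without replacement
  Y = B(:, cols);
end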


1.5 Accuracy enhanced algorithms

Algorithm 4: Accuracy enhanced randomized SVD

Input: $A \in \mathbb{R}^{m \times n}$, a target rank k, an over-sampling parameter p, and a small q (the number of steps in the power iteration).

Output: $U \in \mathbb{R}^{m \times (k+p)}$, $D \in \mathbb{R}^{(k+p) \times (k+p)}$, and $V \in \mathbb{R}^{n \times (k+p)}$.

G = randn(n, k+p);
Y = A * G;
for j = 1:q
    Z = A' * Y;
    Y = A * Z;
end
Q = orth(Y);               % for example, [Q, ~] = qr(Y, 0)
B = Q' * A;
[Uh, D, V] = svd(B, 'econ');
U = Q * Uh;
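To truncate to a rank-k approximation one keeps the leading k columns of the computed factors; a one-line hedged usage sketch (k as in the input):

Ak = U(:, 1:k) * D(1:k, 1:k) * V(:, 1:k)';    % rank-k approximation of A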


Theorem 2

Let A be an $m \times n$ matrix with singular values $\{\sigma_j\}_{j=1}^{\min(m,n)}$. Let k be a target rank and let p be an over-sampling parameter such that $p \ge 2$ and $k + p \le \min(m,n)$. Let G be a Gaussian random matrix of size $n \times (k+p)$, let q be a small integer, set

$$Y = (AA^T)^q A G,$$

and let Q = orth(Y). Then

$$\mathbb{E}\!\left[\|A - QQ^T A\|_2\right] \le \left[\left(1 + \sqrt{\tfrac{k}{p-1}}\right)\sigma_{k+1}^{2q+1} + \frac{e\sqrt{k+p}}{p}\left(\sum_{j=k+1}^{\min(m,n)} \sigma_j^{2(2q+1)}\right)^{1/2}\right]^{\frac{1}{2q+1}} \le \left(1 + \sqrt{\tfrac{k}{p-1}} + \frac{e\sqrt{k+p}}{p}\sqrt{\min(m,n) - k}\right)^{\frac{1}{2q+1}} \sigma_{k+1}.$$
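The following hedged MATLAB experiment illustrates the role of q in Theorem 2 (the test matrix, its spectrum, and all parameters are our own illustrative choices):

m = 400; n = 300; k = 10; p = 5;
[Ua, ~] = qr(randn(m, n), 0);                % orthonormal factors for a test matrix
[Va, ~] = qr(randn(n));
s = (1:n).^(-1);                             % slowly decaying singular values
A = Ua * diag(s) * Va';
G = randn(n, k+p);
for q = 0:2
    Y = A * G;
    for j = 1:q, Y = A * (A' * Y); end       % Y = (A*A')^q * A * G
    Q = orth(Y);
    fprintf('q = %d: ||A - QQ''A||_2 = %.3e (sigma_{k+1} = %.3e)\n', ...
            q, norm(A - Q*(Q'*A)), s(k+1));
end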


Algorithm 5: Accuracy enhanced randomized SVD with orthonormalization

Input: $A \in \mathbb{R}^{m \times n}$, a target rank k, an over-sampling parameter p, and a small q (the number of steps in the power iteration).

Output: $U \in \mathbb{R}^{m \times (k+p)}$, $D \in \mathbb{R}^{(k+p) \times (k+p)}$, and $V \in \mathbb{R}^{n \times (k+p)}$.

G = randn(n, k+p);
Q = orth(A * G);
for j = 1:q
    W = orth(A' * Q);
    Q = orth(A * W);
end
B = Q' * A;
[Uh, D, V] = svd(B, 'econ');
U = Q * Uh;

(Orthonormalizing after each application of A and $A^T$ gives the same result in exact arithmetic as Algorithm 4, but avoids the loss of accuracy in floating point that occurs when the products $\sigma_j^{2q+1}$ underflow.)


1.6 The Nyström method for SPD matrices

Algorithm 6: Randomized EVD for an SPD matrix via the Nyström method

Input: SPD $A \in \mathbb{R}^{n \times n}$, a target rank k, an over-sampling parameter p.

Output: $U \in \mathbb{R}^{n \times (k+p)}$ and $\Lambda \in \mathbb{R}^{(k+p) \times (k+p)}$.

G = randn(n, k+p);
Q = orth(A * G);
B1 = A * Q;
B2 = Q' * B1;
C = chol(B2);              % B2 = C'*C with C upper triangular
F = B1 / C;                % F = B1 * inv(C)
[U, Sigma, ~] = svd(F, 'econ');
Lambda = Sigma.^2;
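A hedged sanity check of the Nyström approximation $A \approx U \Lambda U^T$ (the SPD test matrix and all parameters are our own illustrative choices; symmetrizing B2 before chol is a round-off guard we added):

n = 500; W = randn(n, 20);
A = W*W' + 1e-6*eye(n);               % SPD, numerical rank ~ 20
k = 20; p = 5;
G = randn(n, k+p); Q = orth(A*G);
B1 = A*Q; B2 = Q'*B1;
B2 = (B2 + B2')/2;                    % symmetrize before chol
C = chol(B2); F = B1 / C;
[U, Sigma, ~] = svd(F, 'econ');
Lambda = Sigma.^2;
norm(A - U*Lambda*U')                 % small relative to norm(A)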


2 Interpolative decomposition (ID)

Any matrix A of size $m \times n$ and rank k, where $k < \min(m,n)$, admits a so-called "column ID", which takes the form

$$A = CZ, \qquad C \in \mathbb{R}^{m \times k},\ Z \in \mathbb{R}^{k \times n},$$

where $C = A(:, J_s)$ and Z is a matrix that contains the $k \times k$ identity matrix ($Z(:, J_s) = I_k$).

Advantages compared to QR or SVD:

(1) If A is sparse or non-negative, then C shares these properties.

(2) The ID requires less memory to store than either the QR or SVD.

(3) Finding the indices associated with the spanning columns is often helpful in data interpretation.

(4) In the context of numerical algorithms for discretizing PDEs and integral equations, the ID often preserves "the physics" of a problem in a way that the QR or SVD do not.


"Row ID":

$$A = XR, \qquad X \in \mathbb{R}^{m \times k},\ R \in \mathbb{R}^{k \times n},$$

where $R = A(I_s, :)$ and X is a matrix that contains the $k \times k$ identity matrix ($X(I_s, :) = I_k$).

"Double-sided ID":

$$A = X A_s Z, \qquad X \in \mathbb{R}^{m \times k},\ A_s \in \mathbb{R}^{k \times k},\ Z \in \mathbb{R}^{k \times n},$$

where X and Z are the same matrices as those that appear in the row ID and the column ID, and $A_s$ is the $k \times k$ submatrix of A given by $A_s = A(I_s, J_s)$.


2.1 Deterministic techniques for the approximate ID

The standard software used to compute the classical column pivoted QR factorization (CPQR),

$$AP = QS,$$

can, with some light post-processing, be used to compute the column ID. Let

$$Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & S_{12} \\ 0 & S_{22} \end{bmatrix}.$$

Then

$$AP = \begin{bmatrix} Q_1 S_{11} & Q_1 S_{12} + Q_2 S_{22} \end{bmatrix}.$$

Let J denote the permutation vector associated with the permutation matrix P, so that $P = I(:, J)$ and $AP = A(:, J)$. Let $J_s = J(1{:}k)$. Then set

$$C = A(:, J_s) = Q_1 S_{11}.$$


By

$$AP = Q_1 S_{11} \begin{bmatrix} I & S_{11}^{-1} S_{12} \end{bmatrix} + \begin{bmatrix} 0 & Q_2 S_{22} \end{bmatrix},$$

we have

$$A = C \begin{bmatrix} I & S_{11}^{-1} S_{12} \end{bmatrix} P^T + \begin{bmatrix} 0 & Q_2 S_{22} \end{bmatrix} P^T = C Z + \begin{bmatrix} 0 & Q_2 S_{22} \end{bmatrix} P^T.$$

Algorithm 7: Column ID, $A \approx A(:, J_s)\, Z$

function [Js, Z] = ID_col(A, k)
  [~, S, J] = qr(A, 0);                        % CPQR; J is the permutation vector
  T = S(1:k, 1:k) \ S(1:k, (k+1):end);         % T = inv(S11) * S12
  Z = zeros(k, size(A, 2));
  Z(:, J) = [eye(k), T];
  Js = J(1:k);
end
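A quick hedged sanity check of ID_col (the rank-8 test matrix is an illustrative choice):

A = randn(100, 8) * randn(8, 60);    % exact rank 8
[Js, Z] = ID_col(A, 8);
norm(A - A(:, Js) * Z) / norm(A)     % ~ machine precision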


The row ID can be computed via an entirely analogous process that starts with a CPQR of the transpose of A. In other words, we execute a pivoted Gram-Schmidt orthonormalization process on the rows of A.

Algorithm 8: Row ID, $A \approx X\, A(I_s, :)$

function [Is, X] = ID_row(A, k)
  [~, S, J] = qr(A', 0);                       % CPQR of A^T
  T = S(1:k, 1:k) \ S(1:k, (k+1):end);
  X = zeros(size(A, 1), k);
  X(J, :) = [eye(k), T]';
  Is = J(1:k);
end


To obtain the double-sided ID, we start by using the CPQR-based process to build the column ID. Then compute the row ID by performing Gram-Schmidt on the rows of the tall thin matrix C.

Algorithm 9: Double-sided ID, $A \approx X\, A(I_s, J_s)\, Z$

function [Is, Js, X, Z] = ID_double(A, k)
  [Js, Z] = ID_col(A, k);
  [Is, X] = ID_row(A(:, Js), k);
end

The algorithms above for computing interpolative decompositions are wasteful when $k \ll \min(m,n)$, since they involve a full QR factorization, which has complexity $O(mn \min(m,n))$. This problem is easily remedied by replacing the full QR factorization with a partial QR factorization that is interrupted after k steps, which has cost $O(mnk)$; a sketch follows below.
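A hedged sketch of such a partial CPQR-based column ID (our own implementation: k steps of classical pivoted Gram-Schmidt with no reorthogonalization, so a production code would be more careful):

function [Js, Z] = ID_col_partial(A, k)
  % Column ID via k steps of column-pivoted Gram-Schmidt: O(mnk) flops.
  [m, n] = size(A);
  J = 1:n;  Qk = zeros(m, k);  R = zeros(k, n);
  for j = 1:k
    nrms = sum(A(:, j:n).^2, 1);             % squared norms of trailing columns
    [~, t] = max(nrms);  t = t + j - 1;      % pivot column
    A(:, [j t]) = A(:, [t j]);               % swap columns j and t everywhere
    J([j t]) = J([t j]);
    R(:, [j t]) = R(:, [t j]);
    R(j, j) = norm(A(:, j));
    Qk(:, j) = A(:, j) / R(j, j);
    R(j, (j+1):n) = Qk(:, j)' * A(:, (j+1):n);
    A(:, (j+1):n) = A(:, (j+1):n) - Qk(:, j) * R(j, (j+1):n);
  end
  T = R(1:k, 1:k) \ R(1:k, (k+1):n);
  Z = zeros(k, n);  Z(:, J) = [eye(k), T];
  Js = J(1:k);
end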


2.2 Randomized techniques for the approximate ID

Suppose $A \in \mathbb{R}^{m \times n}$ with rank(A) = k. Let

$$A = YF, \qquad Y \in \mathbb{R}^{m \times k},\ F \in \mathbb{R}^{k \times n}.$$

Then, using [Is, X] = ID_row(Y, k) to compute a row ID of Y, we have

$$Y = X\, Y(I_s, :),$$

and

$$X\, A(I_s, :) = X\, Y(I_s, :)\, F = Y F = A.$$

That is, $X A(I_s, :)$ is automatically a row ID of A as well.

Observation: in order to compute a row ID of a matrix A, the only information needed is a matrix Y whose columns span the column space of A.

As we have seen, the task of finding a matrix Y whose columns form a good basis for the column space of a matrix is ideally suited to randomized sampling.


Algorithm 10: Accuracy enhanced randomized row ID

Input: $A \in \mathbb{R}^{m \times n}$, a target rank k, an over-sampling parameter p, and a small q (the number of steps in the power iteration).

Output: $X \in \mathbb{R}^{m \times k}$ and an index vector $I_s \in \mathbb{N}^k$ such that $A \approx X\, A(I_s, :)$.

G = randn(n, k+p);
Y = A * G;
for j = 1:q
    Z = A' * Y;
    Y = A * Z;
end
[Is, X] = ID_row(Y, k);

We can reduce the complexity by using a structured random matrix instead of a Gaussian one, for example $\Omega = DFS$ or $\Omega = DHS$ as above.


3 The CUR decomposition

The CUR factorization approximates an $m \times n$ matrix A as a product

$$A \approx CUR, \qquad C \in \mathbb{R}^{m \times k},\ U \in \mathbb{R}^{k \times k},\ R \in \mathbb{R}^{k \times n},$$

where C consists of a subset of the columns of A and R consists of a subset of the rows of A.

From the double-sided ID to CUR: let

$$C = A(:, J_s), \qquad R = A(I_s, :).$$

Since $A \approx CZ \approx CUR$, we set $U = Z R^{\dagger}$; a short sketch of this assembly follows.
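A minimal sketch of the assembly, reusing ID_col and ID_row from Algorithms 7-8 (solving the least-squares problem with mrdivide is our own choice):

k = 8;                                 % illustrative target rank
A = randn(100, k) * randn(k, 60);      % exact rank-k test matrix
[Js, Z] = ID_col(A, k);
C = A(:, Js);
[Is, ~] = ID_row(C, k);
R = A(Is, :);
U = Z / R;                             % solves U*R = Z in the least-squares sense
norm(A - C*U*R) / norm(A)              % ~ machine precision for exact rank k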


Algorithm 11: Accuracy enhanced randomized CUR

Input: $A \in \mathbb{R}^{m \times n}$, a target rank k, an over-sampling parameter p, and a small q (the number of steps in the power iteration).

Output: $U \in \mathbb{R}^{k \times k}$ and index vectors $I_s, J_s \in \mathbb{N}^k$ such that $A \approx A(:, J_s)\, U\, A(I_s, :)$.

G = randn(k+p, m);
Y = G * A;
for j = 1:q
    Z = Y * A';
    Y = Z * A;
end
[Js, Z] = ID_col(Y, k);
[Is, ~] = ID_row(A(:, Js), k);
Solve U * A(Is, :) = Z in the least-squares sense.


4 Adaptive rank determination

Given a matrix A and a tolerance ε, the task is to determine matrices $Q_k$ and $B_k$ of rank $\le k$ such that $\|A - Q_k B_k\| \le \varepsilon$.

4.1 Algorithm with updating the matrix

Q_0 = [ ];  B_0 = [ ];  A_0 = A;  k = 0
while ||A_k||_2 > ε
    k = k + 1
    pick y ∈ range(A_{k-1})
    q_k = y / ||y||_2
    b_k = q_k' * A_{k-1}
    Q_k = [Q_{k-1}, q_k]
    B_k = [B_{k-1}; b_k]
    A_k = A_{k-1} - q_k * b_k
end


Strategies for choosing $y \in \mathrm{range}(A_{k-1})$:

(1) Pick the largest remaining column:
$$y = A_{k-1}(:, j_k), \qquad j_k = \arg\max_j \|A_{k-1}(:, j)\|_2.$$

(2) Pick the locally optimal vector (computationally hard):
$$y = \arg\min_{\|x\|_2 = 1,\ x \in \mathrm{range}(A_{k-1})} \|(I - xx^T) A_{k-1}\|_2.$$

(3) A randomized selection strategy: set
$$y = A_{k-1}\, g,$$
where $g \in \mathbb{R}^n$ is a Gaussian random vector. (A runnable sketch of this randomized variant appears right after this list.)
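A runnable sketch of the loop above with the randomized choice (3); the function name and the explicit 2-norm test are our own assumptions (in practice the norm would be replaced by a cheap randomized estimate):

function [Q, B] = adaptive_rank(A, tol)
  % Greedy rank determination with y = A*g, deflating A as we go.
  [m, n] = size(A);
  Q = zeros(m, 0);  B = zeros(0, n);
  while norm(A, 2) > tol
      y = A * randn(n, 1);         % random vector in range(A)
      q = y / norm(y, 2);
      b = q' * A;
      Q = [Q, q];  B = [B; b];
      A = A - q * b;               % deflate
  end
end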


Exercise: Prove that for each k, $Q_k$ is orthonormal, and
$$B_k = Q_k^T A, \qquad A = Q_k B_k + A_k.$$

Exercise: Prove that for each k,
$$\|A\|_F^2 = \|B_k\|_F^2 + \|A_k\|_F^2.$$

Exercise: Prove that for each k, with strategy (2), i.e.
$$y = \arg\min_{\|x\|_2 = 1,\ x \in \mathrm{range}(A_{k-1})} \|(I - xx^T) A_{k-1}\|_2,$$
the algorithm will produce matrices that attain the theoretically optimal precision, i.e.
$$\|A - Q_k B_k\|_2 = \sigma_{k+1}.$$


4.2 Block version of the randomized selection strategy

Q = [ ];  B = [ ]
while ||A||_2 > ε
    G = randn(n, p)
    [Qnew, ~] = qr(A*G, 0)
    Bnew = Qnew' * A
    Q = [Q, Qnew]
    B = [B; Bnew]
    A = A - Qnew*Bnew
end

Exercise: Prove that the block version algorithm's output is an orthonormal matrix Q of size $m \times k$ (where k is a multiple of p) and a $k \times n$ matrix B such that
$$\|A - QB\|_2 \le \varepsilon.$$


4.3 Algorithm without updating the matrix

Draw standard Gaussian vectors $g_1, \dots, g_r$ of length n; set $y_i = A g_i$ for $i = 1, \dots, r$; j = 0; $Q_0 = [\ ]$.

while $\max\{\|y_{j+1}\|_2, \|y_{j+2}\|_2, \dots, \|y_{j+r}\|_2\} > \varepsilon / (10\sqrt{2/\pi})$:
    j = j + 1
    $y_j = y_j - Q_{j-1} Q_{j-1}^T y_j$
    $q_j = y_j / \|y_j\|_2$;  $Q_j = [Q_{j-1}\ q_j]$
    draw a standard Gaussian vector $g_{j+r}$ of length n
    $y_{j+r} = (I - Q_j Q_j^T) A\, g_{j+r}$
    $y_i = y_i - q_j q_j^T y_i$ for $i = (j+1), \dots, (j+r-1)$
end
Q = $Q_j$

The algorithm produces an orthonormal matrix Q such that $\|A - QQ^T A\|_2 \le \varepsilon$ holds with probability at least $1 - \min\{m, n\}\, 10^{-r}$.


5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of $A \in \mathbb{R}^{m \times n}$:

$$AP = QR,$$

where P is a permutation matrix, Q is an orthogonal matrix, and R is upper triangular.

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of $n-1$ rank-1 updates (so-called BLAS2 operations).


Householder column pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, or when the matrix is stored in distributed memory or on a hard drive. The reason is that the process is very communication intensive. The remedy is to block the process.

Blocked Householder column pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm

G = randn(b+p, m);  Y = G*A;  [~, ~, P1] = qr(Y, 0);
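In our reading (a hedged note, consistent with the HQRRP idea), Y = G*A is a small $(b+p) \times n$ sketch of A, the CPQR of Y is cheap, and its leading pivots determine the next block of b pivot columns of A, e.g.:

Jblock = P1(1:b);      % the b columns of A to process as one block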

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Candidate ldquoFast Johnson-Lindenstrauss Transformrdquo

(1) Complex case A good candidate for Ω

Ω = DFS

where D is diagonal and Dii are complex numbers of modulus onedrawn from a uniform distribution on the unit circle in C F is thediscrete Fourier transform

Fpq =1radicneminus2πi(pminus1)(qminus1)n p q = 1 n

and S is a matrix consisting of a random subset of l columns fromthe ntimes n unit matrix (drawn without replacement) In otherwords given an arbitrary matrix X of size mtimes n the matrix XSconsists of a randomly drawn subset of l columns of X

(2) Real case Ω = DHS where H is the Hadamard Transform Dis diagonal and Dii = plusmn1 with equal probability

Low-Rank Approximations Lecture 4 March 6 - 13 2020 14 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

15 Accuracy enhanced algorithms

Algorithm 4 Accuracy enhanced randomized SVD

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

endQ = orth(Y) (for example [Qsim] = qr(Y 0))B = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 15 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Theorem 2

Let A be an mtimes n matrix with singular values σjmin(mn)j=1 Let k be a

target rank and let p be an over-sampling parameter such that p ge 2and k + p le min(mn) Let G be a Gaussian random matrix of sizentimes (k + p) let q be a small integer set

Y = (AAT)qAG

and let Q = orth(Y) Then

E[983042AminusQQTA9830422]

le

983093

983097983095

9830751 +

983158k

pminus 1

983076σ2q+1k+1 +

eradick + p

p

983091

983107min(mn)983131

j=k+1

σ2(2q+1)j

983092

983108

12

983094

983098983096

12q+1

le9830771 +

983158k

pminus 1+

eradick + p

p

983155min(mn)minus k

983078 12q+1

σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 16 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Algorithm 5 Accuracy enhanced randomized SVDwith orthonormalization

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rmtimes(k+p) D isin R(k+p)times(k+p) and V isin Rntimes(k+p)

G = randn(n k + p)Q = orth(AG)for j = 1 q

W = orth(ATQ)Q = orth(AW)

endB = QTA

[983141UDV] = svd(Blsquoeconrsquo)

U = Q983141U

Low-Rank Approximations Lecture 4 March 6 - 13 2020 17 40

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

"Row ID":
$$A = XR, \qquad X \in \mathbb{R}^{m\times k}, \quad R \in \mathbb{R}^{k\times n},$$
where $R = A(I_s, :)$ and $X$ is a matrix that contains the $k\times k$ identity matrix ($X(I_s, :) = I_k$).

"Double-sided ID":
$$A = X A_s Z, \qquad X \in \mathbb{R}^{m\times k}, \quad A_s \in \mathbb{R}^{k\times k}, \quad Z \in \mathbb{R}^{k\times n},$$
where $X$ and $Z$ are the same matrices as those that appear in the row ID and column ID, and $A_s$ is the $k\times k$ submatrix of $A$ given by
$$A_s = A(I_s, J_s).$$

2.1 Deterministic techniques for the approximate ID

The standard software used to compute the classical column-pivoted QR factorization (CPQR),
$$AP = QS,$$
can, with some light post-processing, be used to compute the column ID. Partition
$$Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & S_{12} \\ 0 & S_{22} \end{bmatrix},$$
where $Q_1$ holds the first $k$ columns of $Q$ and $S_{11}$ is $k\times k$. Then
$$AP = \begin{bmatrix} Q_1 S_{11} & Q_1 S_{12} + Q_2 S_{22} \end{bmatrix}.$$
Let $J$ denote the permutation vector associated with the permutation matrix $P$, so that $P = I(:, J)$ and $AP = A(:, J)$. Let $J_s = J(1:k)$. Then set
$$C = A(:, J_s) = Q_1 S_{11}.$$

By
$$AP = Q_1 S_{11}\begin{bmatrix} I & S_{11}^{-1} S_{12} \end{bmatrix} + \begin{bmatrix} 0 & Q_2 S_{22} \end{bmatrix},$$
we have
$$A = C\begin{bmatrix} I & S_{11}^{-1} S_{12} \end{bmatrix}P^T + \begin{bmatrix} 0 & Q_2 S_{22} \end{bmatrix}P^T = CZ + \begin{bmatrix} 0 & Q_2 S_{22} \end{bmatrix}P^T.$$

Algorithm 7: Column ID, $A \approx A(:, J_s)\,Z$

    function [Js, Z] = ID_col(A, k)
        [~, S, J] = qr(A, 0)                  % CPQR; J is the permutation vector
        T = S(1:k, 1:k) \ S(1:k, (k+1):end)   % T = S11^{-1} * S12
        Z = zeros(k, size(A, 2))
        Z(:, J) = [eye(k), T]
        Js = J(1:k)
    end
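As a quick illustration of Algorithm 7 (our own example, not from the slides), the column ID reproduces a matrix of exact rank $k$ to machine precision:

    m = 100; n = 80; k = 5;
    A = randn(m, k) * randn(k, n);          % rank k by construction
    [~, S, J] = qr(A, 0);                   % CPQR
    T = S(1:k, 1:k) \ S(1:k, (k+1):n);
    Z = zeros(k, n); Z(:, J) = [eye(k), T];
    Js = J(1:k);
    norm(A - A(:, Js)*Z)                    % ~ machine precision
    norm(Z(:, Js) - eye(k))                 % Z contains the identity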

The row ID can be computed via an entirely analogous process that starts with a CPQR of the transpose of $A$. In other words, we execute a pivoted Gram-Schmidt orthonormalization process on the rows of $A$.

Algorithm 8: Row ID, $A \approx X\,A(I_s, :)$

    function [Is, X] = ID_row(A, k)
        [~, S, J] = qr(A', 0)                 % CPQR of A^T
        T = S(1:k, 1:k) \ S(1:k, (k+1):end)
        X = zeros(size(A, 1), k)
        X(J, :) = [eye(k), T]'
        Is = J(1:k)
    end

To obtain the double-sided ID, we first use the CPQR-based process to build the column ID, and then compute the row ID by performing Gram-Schmidt on the rows of the tall thin matrix $C$.

Algorithm 9: Double-sided ID, $A \approx X\,A(I_s, J_s)\,Z$

    function [Is, Js, X, Z] = ID_double(A, k)
        [Js, Z] = ID_col(A, k)
        [Is, X] = ID_row(A(:, Js), k)
    end

The algorithms above for computing interpolative decompositions are wasteful when $k \ll \min(m,n)$, since they involve a full QR factorization, which has complexity $O(mn\min(m,n))$. This problem is easily remedied by replacing the full QR factorization with a partial QR factorization (interrupted after $k$ steps), which has cost $O(mnk)$.

2.2 Randomized techniques for the approximate ID

Suppose $A \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(A) = k$, and let
$$A = YF, \qquad Y \in \mathbb{R}^{m\times k}, \quad F \in \mathbb{R}^{k\times n}.$$
Then, using [Is, X] = ID_row(Y, k) to compute a row ID of $Y$, we have
$$Y = X\,Y(I_s, :)$$
and
$$X\,A(I_s, :) = X\,Y(I_s, :)\,F = YF = A.$$
That is, $X\,A(I_s, :)$ is automatically a row ID of $A$ as well.

Observation: In order to compute a row ID of a matrix $A$, the only information needed is a matrix $Y$ whose columns span the column space of $A$.

As we have seen, the task of finding a matrix $Y$ whose columns form a good basis for the column space of a matrix is ideally suited to randomized sampling.

Algorithm 10: Accuracy-enhanced randomized row ID

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, an over-sampling parameter $p$, and a small $q$, the number of steps of power iteration.
Output: $X \in \mathbb{R}^{m\times k}$ and an index vector $I_s \in \mathbb{N}^k$ such that $A \approx X\,A(I_s, :)$.

    G = randn(n, k+p)
    Y = A*G
    for j = 1:q
        Z = A'*Y
        Y = A*Z
    end
    [Is, X] = ID_row(Y, k)

We can reduce the complexity of forming the sample matrix by using a structured random matrix instead of a Gaussian one, for example a subsampled randomized Fourier or Hadamard transform, $\Omega = DFS$ or $\Omega = DHS$.
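Combining the sampling loop above with Algorithm 8 gives a complete randomized row ID in a few lines (our own illustration, reusing the ID_row function defined earlier):

    m = 2000; n = 1500; k = 10; p = 5; q = 1;
    A = randn(m, k) * randn(k, n);      % rank-k test matrix
    G = randn(n, k+p); Y = A*G;
    for j = 1:q, Z = A'*Y; Y = A*Z; end
    [Is, X] = ID_row(Y, k);
    norm(A - X*A(Is, :))                % ~ machine precision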

3 The CUR decomposition

The CUR factorization approximates an $m\times n$ matrix $A$ as a product
$$A \approx CUR, \qquad C \in \mathbb{R}^{m\times k}, \quad U \in \mathbb{R}^{k\times k}, \quad R \in \mathbb{R}^{k\times n},$$
where $C$ consists of a subset of the columns of $A$ and $R$ consists of a subset of the rows of $A$.

From double-sided ID to CUR: let
$$C = A(:, J_s), \qquad R = A(I_s, :).$$
By
$$A \approx CZ \approx CUR,$$
we set
$$U = ZR^{\dagger}.$$
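In MATLAB, $U = ZR^{\dagger}$ can be computed with a least-squares solve rather than an explicit pseudo-inverse. A compact sketch building a CUR factorization on top of ID_col and ID_row from Algorithms 7-8 (our own illustration):

    m = 500; n = 400; k = 8;
    A = randn(m, k) * randn(k, n);   % rank-k test matrix
    [Js, Z] = ID_col(A, k);          % A ~ A(:,Js) * Z
    [Is, ~] = ID_row(A(:, Js), k);   % row skeleton of C
    C = A(:, Js); R = A(Is, :);
    U = Z / R;                       % minimizes ||U*R - Z||_F, i.e. U = Z*pinv(R)
    norm(A - C*U*R)                  % small for a (numerically) rank-k matrix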

Algorithm 11: Accuracy-enhanced randomized CUR

Input: $A \in \mathbb{R}^{m\times n}$, a target rank $k$, an over-sampling parameter $p$, and a small $q$, the number of steps of power iteration.
Output: $U \in \mathbb{R}^{k\times k}$ and index vectors $I_s, J_s \in \mathbb{N}^k$ such that $A \approx A(:, J_s)\,U\,A(I_s, :)$.

    G = randn(k+p, m)
    Y = G*A
    for j = 1:q
        Z = Y*A'
        Y = Z*A
    end
    [Js, Z] = ID_col(Y, k)
    [Is, ~] = ID_row(A(:, Js), k)
    Solve U*A(Is, :) = Z in the least-squares sense

4 Adaptive rank determination

Given a matrix $A$ and a tolerance $\varepsilon$, the task is to determine matrices $Q_k$ and $B_k$ of rank $\le k$ such that $\|A - Q_k B_k\| \le \varepsilon$.

4.1 Algorithm with updating the matrix

    Q_0 = []; B_0 = []; A_0 = A; k = 0
    while ||A_k||_2 > epsilon
        k = k + 1
        pick y in range(A_{k-1})
        q_k = y / ||y||_2
        b_k = q_k' * A_{k-1}
        Q_k = [Q_{k-1}, q_k]
        B_k = [B_{k-1}; b_k]
        A_k = A_{k-1} - q_k * b_k
    end

Strategies for choosing $y \in \mathrm{range}(A_{k-1})$:

(1) Pick the largest remaining column:
$$y = A_{k-1}(:, j_k) \quad\text{with}\quad j_k = \operatorname*{argmax}_{j} \|A_{k-1}(:, j)\|_2.$$

(2) Pick the locally optimal vector (computationally hard):
$$y = \operatorname*{argmin}_{\|x\|_2 = 1,\; x \in \mathrm{range}(A_{k-1})} \|(I - xx^T)A_{k-1}\|_2.$$

(3) A randomized selection strategy: set
$$y = A_{k-1}\,g,$$
where $g \in \mathbb{R}^n$ is a Gaussian random vector.
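A runnable version of the updating algorithm with the randomized selection strategy (3) (the test matrix and tolerance are our own illustration, not from the slides):

    m = 300; n = 200; tol = 1e-6;
    [Ua, ~] = qr(randn(m, n), 0); [Va, ~] = qr(randn(n), 0);
    A = Ua * diag(2.^(-(1:n))) * Va';
    Q = []; B = []; Ak = A; k = 0;
    while norm(Ak, 2) > tol
        k = k + 1;
        y = Ak * randn(n, 1);        % strategy (3): y = A_{k-1} g
        q = y / norm(y);
        b = q' * Ak;
        Q = [Q, q]; B = [B; b];
        Ak = Ak - q*b;
    end
    k, norm(A - Q*B, 2)              % rank found and achieved error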

Exercise: Prove that for each $k$, $Q_k$ is orthonormal and
$$B_k = Q_k^T A, \qquad A = Q_k B_k + A_k.$$

Exercise: Prove that for each $k$,
$$\|A\|_F^2 = \|B_k\|_F^2 + \|A_k\|_F^2.$$

Exercise: Prove that for each $k$, with strategy (2), i.e.
$$y = \operatorname*{argmin}_{\|x\|_2 = 1,\; x \in \mathrm{range}(A_{k-1})} \|(I - xx^T)A_{k-1}\|_2,$$
the algorithm will produce matrices that attain the theoretically optimal precision, i.e.
$$\|A - Q_k B_k\|_2 = \sigma_{k+1}.$$

4.2 Block version for the randomized selection strategy

    Q = []; B = []
    while ||A||_2 > epsilon
        G = randn(n, p)
        [Qnew, ~] = qr(A*G, 0)
        Bnew = Qnew' * A
        Q = [Q, Qnew]
        B = [B; Bnew]
        A = A - Qnew*Bnew
    end

Exercise: The block version algorithm's output is an orthonormal matrix $Q$ of size $m\times k$ (where $k$ is a multiple of $p$) and a $k\times n$ matrix $B$ such that
$$\|A - QB\|_2 \le \varepsilon.$$
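The same demo as before, now processing $p$ columns per pass (our own illustration; note that the guarantee in the exercise is on the spectral norm of the final residual):

    m = 300; n = 200; p = 10; tol = 1e-6;
    [Ua, ~] = qr(randn(m, n), 0); [Va, ~] = qr(randn(n), 0);
    A0 = Ua * diag(2.^(-(1:n))) * Va';
    A = A0; Q = []; B = [];
    while norm(A, 2) > tol
        G = randn(n, p);
        [Qnew, ~] = qr(A*G, 0);
        Bnew = Qnew' * A;
        Q = [Q, Qnew]; B = [B; Bnew];
        A = A - Qnew*Bnew;
    end
    size(Q, 2), norm(A0 - Q*B, 2)    % rank found (a multiple of p) and error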

4.3 Algorithm without updating the matrix

    Draw standard Gaussian vectors g_1, ..., g_r of length n
    y_i = A g_i for i = 1, ..., r;  j = 0;  Q_0 = []
    while max{ ||y_{j+1}||_2, ||y_{j+2}||_2, ..., ||y_{j+r}||_2 } > epsilon / (10 sqrt(2/pi))
        j = j + 1
        y_j = y_j - Q_{j-1} Q_{j-1}' y_j
        q_j = y_j / ||y_j||_2;  Q_j = [Q_{j-1}, q_j]
        Draw a standard Gaussian vector g_{j+r} of length n
        y_{j+r} = (I - Q_j Q_j') A g_{j+r}
        y_i = y_i - q_j q_j' y_i for i = (j+1), ..., (j+r-1)
    end
    Q = Q_j

The resulting orthonormal matrix $Q$ satisfies $\|A - QQ^TA\|_2 \le \varepsilon$ with probability at least $1 - \min\{m,n\}\,10^{-r}$.

5 Rank-revealing QR Factorization

Column-pivoted QR factorization (CPQR) of $A \in \mathbb{R}^{m\times n}$:
$$AP = QR,$$
where $P$ is a permutation matrix, $Q$ is an orthogonal matrix, and $R$ is upper triangular.

Householder column-pivoted QR

Householder column-pivoted QR consists of a sequence of $n-1$ rank-1 updates (so-called BLAS2 operations).

Householder column-pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, when the matrix is stored in distributed memory or on a hard drive, etc. The reason is that the process is very communication intensive. We therefore turn to blocking the process.

Blocked Householder column-pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm for choosing the next block of pivot columns:

    G = randn(b+p, m);  Y = G*A;  [~, ~, P1] = qr(Y, 0)
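The idea, as we read it (this sketch is our own illustration of the pivot-selection step, not code from the slides), is that a CPQR of the small sketch $Y = GA$ identifies $b$ columns that can serve as the next pivot block for the large matrix:

    m = 1000; n = 800; b = 20; p = 10;
    A = randn(m, 15) * randn(15, n);      % low-rank test matrix, rank 15 <= b
    G = randn(b+p, m);
    Y = G*A;                              % (b+p) x n sketch: cheap to pivot
    [~, ~, P1] = qr(Y, 0);                % CPQR of the sketch
    J = P1(1:b);                          % indices of the chosen pivot block
    % The b selected columns span the dominant column space of A well:
    [Qb, ~] = qr(A(:, J), 0);
    norm(A - Qb*(Qb'*A), 2)               % ~ machine precision here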

6 Rank-revealing UTV decomposition

Given an $m\times n$ matrix $A$ with $m \ge n$, a UTV decomposition of $A$ is a factorization of the form
$$A = UTV^T,$$
where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are unitary matrices and $T$ is a triangular matrix (either lower or upper triangular).

The UTV decomposition can be viewed as a generalization of other standard factorizations, such as, e.g., the singular value decomposition (SVD) or the column-pivoted QR decomposition (CPQR). (To be precise, the SVD is the special case where $T$ is diagonal, and the CPQR is the special case where $V$ is a permutation matrix.) The additional flexibility inherent in the UTV decomposition enables the design of efficient updating procedures.

randUTV provides close to optimal low-rank approximation and highly accurate estimates for the singular values of a matrix.

The algorithm randUTV builds the factorization incrementally, which means that when it is applied to a matrix of numerical rank $k$, the algorithm can be stopped early and incur an overall cost of $O(mnk)$.

Like HQRRP, the algorithm randUTV is blocked, which enables it to execute fast on modern hardware.

The algorithm randUTV is not an iterative algorithm. In this regard, it is closer to the CPQR than to standard SVD algorithms, which substantially simplifies software optimization.

[Figure: blocked UTV factorization process]

6.1 A single-step block factorization, from $A_0$ to $A_1$

Draw an $m\times p$ Gaussian matrix $G$ and form the sample matrix
$$Y = (A^TA)^q A^T G,$$
where $q$ is a parameter indicating the number of steps of power iteration taken.

Form a unitary matrix $V$ whose first $p$ columns form an orthonormal basis for the column space of $Y$. For example,

    [V, ~] = qr(Y)

We then compute an SVD of the matrix $AV(:, 1:p)$,
$$AV(:, 1:p) = U_1 A_{11} W^T,$$
and set
$$V_1 = V\begin{bmatrix} W & \\ & I \end{bmatrix}.$$
Then $A_1 = U_1^T A_0 V_1$ has a diagonal leading $p\times p$ block, with zeros below it in the first $p$ columns.

One step of randUTV:

    function [U, T, V] = stepUTV(A, p, q)
        G = randn(size(A, 1), p)
        Y = A'*G
        for i = 1:q
            Y = A'*(A*Y)
        end
        [V, ~] = qr(Y)                       % full QR: V is n x n
        [U, D, W] = svd(A*V(:, 1:p))
        T = [D, U'*A*V(:, (p+1):end)]
        V(:, 1:p) = V(:, 1:p)*W
    end

randUTV then applies the same procedure to the remaining block, and so on.
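A structural sanity check of a single step (our own illustration, assuming stepUTV as reconstructed above): one step is an exact factorization $A = UTV^T$ in which the first $p$ columns of $T$ are already diagonalized.

    A = randn(120, 90); p = 20; q = 1;
    [U, T, V] = stepUTV(A, p, q);
    norm(A - U*T*V')                              % ~ machine precision
    norm(T((p+1):end, 1:p))                       % zeros below the leading block
    norm(T(1:p, 1:p) - diag(diag(T(1:p, 1:p))))   % leading block is diagonal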

    function [U, T, V] = randUTV(A, p, q)
        T = A
        U = eye(size(A, 1));  V = eye(size(A, 2))
        for i = 1:ceil(size(A, 2)/p)
            I1 = 1:(p*(i-1))
            I2 = (p*(i-1)+1):size(A, 1)
            J2 = (p*(i-1)+1):size(A, 2)
            if length(J2) > p
                [Uhat, That, Vhat] = stepUTV(T(I2, J2), p, q)
            else
                [Uhat, That, Vhat] = svd(T(I2, J2))
            end
            U(:, I2) = U(:, I2)*Uhat
            V(:, J2) = V(:, J2)*Vhat
            T(I2, J2) = That
            T(I1, J2) = T(I1, J2)*Vhat    % update the block rows above
        end
    end
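A short usage example (our own illustration, assuming the functions as reconstructed above): the factorization is exact, and the diagonal of $T$ tracks the singular values of $A$.

    A = randn(300, 200); p = 40; q = 2;
    [U, T, V] = randUTV(A, p, q);
    norm(A - U*T*V')              % ~ machine precision
    d = diag(T); s = svd(A);
    [d(1:10), s(1:10)]            % diagonal of T vs. true singular values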

16 The Nystrom method for SPD matrices

Algorithm 6 Randomized EVD for SPD matrixvia Nystrom method

Input SPD A isin Rntimesn a target rank kover-sampling parameter p

Output U isin Rmtimes(k+p) and Λ isin R(k+p)times(k+p)

G = randn(n k + p)Q = orth(AG)B1 = AQB2 = QTB1C = chol(B2)F = B1C

minus1[UΣsim] = svd(Flsquoeconrsquo)Λ = Σ2

Low-Rank Approximations Lecture 4 March 6 - 13 2020 18 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

2 Interpolative decomposition (ID)

Any matrix A of size mtimes n and rank k where k lt min(mn)admits a so called ldquocolumn IDrdquo which takes the form

A = CZ C isin Rmtimesk Z isin Rktimesn

where the matrix C = A( Js) and Z is a matrix that containsthe k times k identity matrix (Z( Js) = Ik)

Advantages compared to QR or SVD

(1) If A is sparse or non-negative then C shares these properties

(2) The ID requires less memory to store than either the QR orSVD

(3) Finding the indices associated with the spanning columns isoften helpful in data interpretation

(4) In the context of numerical algorithms for discretizing PDEsand integral equations the ID often preserves ldquothe physicsrdquo of aproblem in a way that the QR or SVD do not

Low-Rank Approximations Lecture 4 March 6 - 13 2020 19 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

ldquoRow IDrdquo

A = XR X isin Rmtimesk R isin Rktimesn

where R = A(Is ) and X is a matrix that contains the k times kidentity matrix (X(Is ) = Ik)

ldquoDouble-sided IDrdquo

A = XAsZ X isin Rmtimesk As isin Rktimesk Z isin Rktimesn

where X and Z are the same matrices as those that appear in rowID and column ID and As is the k times k submatrix of A given by

As = A(Is Js)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 20 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

21 Deterministic techniques for the approximate ID

The standard software used to compute the classical columnpivoted QR factorization (CPQR)

AP = QS

can with some light post-processing be used to compute thecolumn ID Let

Q =983045Q1 Q2

983046 S =

983063S11 S12

0 S22

983064

ThenAP =

983045Q1S11 Q1S12 +Q2S22

983046

Let J denote the permutation vector associated with thepermutation matrix P so that P = I( J) and AP = A( J) LetJs = J(1 k) Then set

C = A( Js) = Q1S11

Low-Rank Approximations Lecture 4 March 6 - 13 2020 21 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

ByAP = Q1S11

983045I Sminus1

11 S12

983046+

9830450 Q2S22

983046

we have

A = C983045I Sminus1

11 S12

983046PT +

9830450 Q2S22

983046PT

= CZ+9830450 Q2S22

983046PT

Algorithm 7 Column ID A asymp A( Js)Z

function [JsZ] = ID col(A k)[simS J ] = qr(A 0)T = S(1 k 1 k)S(1 k (k + 1) n)Z = zeros(k n)Z( J) =

983045Ik T

983046

Js = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 22 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10: Accuracy-enhanced randomized row ID

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, and a small integer q (the number of steps of power iteration).

Output: X ∈ R^{m×k} and an index vector Is ∈ N^k such that A ≈ X A(Is, :).

    G = randn(n, k+p);
    Y = A*G;
    for j = 1:q
        Z = A'*Y;
        Y = A*Z;
    end
    [Is, X] = ID_row(Y, k);
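A NumPy sketch of Algorithm 10, reusing id_row from above; the function name and default parameter values are my choices:

    import numpy as np

    def randomized_id_row(A, k, p=10, q=2, rng=None):
        # Sample Y = (A A^T)^q A G, then take a row ID of the sample.
        rng = np.random.default_rng() if rng is None else rng
        n = A.shape[1]
        Y = A @ rng.standard_normal((n, k + p))
        for _ in range(q):            # power iteration sharpens the sample
            Y = A @ (A.T @ Y)
        return id_row(Y, k)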

We can reduce the complexity by using a structured random matrix instead of a Gaussian one, for example a subsampled randomized Fourier or Hadamard transform, Ω = DFS or Ω = DHS.

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an m×n matrix A as a product

    A ≈ C U R,    C ∈ R^{m×k},  U ∈ R^{k×k},  R ∈ R^{k×n},

where C consists of a subset of the columns of A and R consists of a subset of the rows of A.

From double-sided ID to CUR: let

    C = A(:, Js),    R = A(Is, :).

By

    A ≈ C Z ≈ C U R,

we set U = Z R†.

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11: Accuracy-enhanced randomized CUR

Input: A ∈ R^{m×n}, a target rank k, an over-sampling parameter p, and a small integer q (the number of steps of power iteration).

Output: U ∈ R^{k×k} and index vectors Is, Js ∈ N^k such that A ≈ A(:, Js) U A(Is, :).

    G = randn(k+p, m);
    Y = G*A;
    for j = 1:q
        Z = Y*A';
        Y = Z*A;
    end
    [Js, Z] = ID_col(Y, k);
    [Is, ~] = ID_row(A(:, Js), k);
    Solve U A(Is, :) = Z in the least-squares sense.
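A NumPy sketch of Algorithm 11, reusing id_col and id_row from the sketches above; the name, the defaults, and the use of a pseudo-inverse for the least-squares solve are my choices:

    import numpy as np

    def randomized_cur(A, k, p=10, q=2, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        Y = rng.standard_normal((k + p, A.shape[0])) @ A   # row-space sample
        for _ in range(q):                                  # power iteration
            Y = (Y @ A.T) @ A
        Js, Z = id_col(Y, k)
        Is, _ = id_row(A[:, Js], k)
        U = Z @ np.linalg.pinv(A[Is, :])   # least-squares solve of U A[Is,:] = Z
        return A[:, Js], U, A[Is, :]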

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε, the task is to determine matrices Qk and Bk of rank ≤ k such that ‖A − Qk Bk‖ ≤ ε.

4.1 Algorithm with updating the matrix

    Q0 = []; B0 = []; A0 = A; k = 0;
    while ‖A_k‖_2 > ε
        k = k + 1
        pick y ∈ range(A_{k-1})
        q_k = y / ‖y‖_2
        b_k = q_k^T A_{k-1}
        Q_k = [Q_{k-1}, q_k]
        B_k = [B_{k-1}; b_k]
        A_k = A_{k-1} − q_k b_k
    end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y ∈ range(A_{k-1}):

(1) Pick the largest remaining column:

    y = A_{k-1}(:, j_k)  with  j_k = argmax_j ‖A_{k-1}(:, j)‖_2.

(2) Pick the locally optimal vector (computationally hard):

    y = argmin_{‖x‖_2 = 1, x ∈ range(A_{k-1})} ‖(I − x x^T) A_{k-1}‖_2.

(3) A randomized selection strategy: set

    y = A_{k-1} g,

where g ∈ R^n is a Gaussian random vector. (A sketch using this strategy follows.)
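Here is a minimal NumPy sketch of the updating algorithm of Section 4.1 with the randomized choice (3); the function name and the use of the exact spectral norm in the stopping test are my choices:

    import numpy as np

    def adaptive_rank(A, eps, rng=None):
        # Greedy rank determination: peel off one direction per step.
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        Ak = A.copy()
        Q = np.zeros((m, 0)); B = np.zeros((0, n))
        while np.linalg.norm(Ak, 2) > eps:
            y = Ak @ rng.standard_normal(n)   # strategy (3): y = A_{k-1} g
            q = y / np.linalg.norm(y)
            b = q @ Ak                        # b_k = q_k^T A_{k-1}
            Q = np.column_stack([Q, q])
            B = np.vstack([B, b])
            Ak = Ak - np.outer(q, b)          # A_k = A_{k-1} - q_k b_k
        return Q, B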

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise: Prove that for each k, Qk is orthonormal and

    B_k = Q_k^T A,    A = Q_k B_k + A_k.

Exercise: Prove that for each k,

    ‖A‖_F^2 = ‖B_k‖_F^2 + ‖A_k‖_F^2.

Exercise: Prove that for each k, with strategy (2), i.e.

    y = argmin_{‖x‖_2 = 1, x ∈ range(A_{k-1})} ‖(I − x x^T) A_{k-1}‖_2,

the algorithm produces matrices that attain the theoretically optimal precision, i.e.

    ‖A − Q_k B_k‖_2 = σ_{k+1}.

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

4.2 Block version for the randomized selection strategy

    Q = []; B = [];
    while ‖A‖_2 > ε
        G = randn(n, p);
        [Qnew, ~] = qr(A*G, 0);
        Bnew = Qnew' * A;
        Q = [Q, Qnew];
        B = [B; Bnew];
        A = A − Qnew*Bnew;
    end

Exercise: Show that the block version algorithm's output is an orthonormal matrix Q of size m×k (where k is a multiple of p) and a k×n matrix B such that

    ‖A − Q B‖_2 ≤ ε.
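A NumPy sketch of this block version; the function name and the block-size default are my additions:

    import numpy as np

    def adaptive_rank_blocked(A, eps, p=10, rng=None):
        # Same greedy scheme, but p directions per step (BLAS3-friendly).
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        Q = np.zeros((m, 0)); B = np.zeros((0, n))
        while np.linalg.norm(A, 2) > eps:
            G = rng.standard_normal((n, p))
            Qnew, _ = np.linalg.qr(A @ G)    # economy QR of the sample
            Bnew = Qnew.T @ A
            Q = np.column_stack([Q, Qnew])
            B = np.vstack([B, Bnew])
            A = A - Qnew @ Bnew              # deflate (rebinds A locally)
        return Q, B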

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

4.3 Algorithm without updating the matrix

    Draw standard Gaussian vectors g_1, ..., g_r of length n.
    y_i = A g_i, i = 1, ..., r;  j = 0;  Q_0 = [];
    while max{‖y_{j+1}‖_2, ‖y_{j+2}‖_2, ..., ‖y_{j+r}‖_2} > ε / (10 sqrt(2/π))
        j = j + 1
        y_j = y_j − Q_{j-1} Q_{j-1}^T y_j
        q_j = y_j / ‖y_j‖_2
        Q_j = [Q_{j-1}, q_j]
        Draw a standard Gaussian vector g_{j+r} of length n
        y_{j+r} = (I − Q_j Q_j^T) A g_{j+r}
        y_i = y_i − q_j q_j^T y_i,  i = (j+1), ..., (j+r−1)
    end
    Q = Q_j

The resulting orthonormal matrix Q satisfies ‖A − Q Q^T A‖_2 ≤ ε with probability at least 1 − min{m, n} · 10^{−r}.
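A NumPy sketch of this single-vector scheme, under my naming; the list Y holds the r waiting sample vectors, and A is only touched through matrix-vector products:

    import numpy as np

    def adaptive_range_no_update(A, eps, r=10, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        Y = [A @ rng.standard_normal(n) for _ in range(r)]
        Q = np.zeros((m, 0))
        thresh = eps / (10.0 * np.sqrt(2.0 / np.pi))
        while max(np.linalg.norm(y) for y in Y) > thresh:
            y = Y.pop(0)
            y = y - Q @ (Q.T @ y)                 # project out current basis
            q = y / np.linalg.norm(y)
            Q = np.column_stack([Q, q])
            ynew = A @ rng.standard_normal(n)
            ynew = ynew - Q @ (Q.T @ ynew)        # fresh, already-projected sample
            Y = [yi - q * (q @ yi) for yi in Y]   # downdate the waiting samples
            Y.append(ynew)
        return Q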

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A ∈ R^{m×n}:

    A P = Q R,

where P is a permutation matrix, Q is an orthogonal matrix, and R is upper triangular.

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of n − 1 rank-1 updates (so-called BLAS2 operations).

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, when the matrix is stored in distributed memory or on a hard drive, etc. The reason is that the process is very communication intensive. The remedy is to block the process.

Blocked Householder column pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm for choosing the next block of b pivots:

    G = randn(b+p, m);  Y = G*A;  [~, ~, P1] = qr(Y, 0);
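In NumPy/SciPy terms, one block of pivots can be chosen as follows; this is a sketch, and the function name is my own (b is the block size and p the over-sampling, as above):

    import numpy as np
    from scipy.linalg import qr

    def choose_pivot_block(A, b, p=5, rng=None):
        # CPQR of a short sketch Y = G A yields b good column pivots for A.
        rng = np.random.default_rng() if rng is None else rng
        G = rng.standard_normal((b + p, A.shape[0]))
        Y = G @ A
        _, _, perm = qr(Y, mode='economic', pivoting=True)
        return perm[:b]      # move these b columns of A to the front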

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an m×n matrix A with m ≥ n, a UTV decomposition of A is a factorization of the form

    A = U T V^T,

where U ∈ R^{m×m} and V ∈ R^{n×n} are unitary matrices and T is a triangular matrix (either lower or upper triangular).

The UTV decomposition can be viewed as a generalization of other standard factorizations such as, e.g., the Singular Value Decomposition (SVD) or the Column Pivoted QR decomposition (CPQR). (To be precise, the SVD is the special case where T is diagonal, and the CPQR is the special case where V is a permutation matrix.) The additional flexibility inherent in the UTV decomposition enables the design of efficient updating procedures.

randUTV provides close-to-optimal low-rank approximation and highly accurate estimates for the singular values of a matrix.

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementally, which means that when it is applied to a matrix of numerical rank k, the algorithm can be stopped early and incur an overall cost of O(mnk).

Like HQRRP, the algorithm randUTV is blocked, which enables it to execute fast on modern hardware.

The algorithm randUTV is not an iterative algorithm. In this regard it is closer to the CPQR than to standard SVD algorithms, which substantially simplifies software optimization.

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

6.1 A single-step block factorization from A0 to A1

Draw an m×p Gaussian matrix G and form the sample matrix

    Y = (A^T A)^q A^T G,

where q is a parameter indicating the number of steps of power iteration taken.

Form a unitary matrix V whose first p columns form an orthonormal basis for the column space of Y. For example,

    [V, ~] = qr(Y);

We then compute an SVD of the matrix A V(:, 1:p),

    A V(:, 1:p) = U1 A11 W^T,

and set V1 = V [ W 0 ; 0 I ] (block diagonal).

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

    function [U, T, V] = stepUTV(A, p, q)
        G = randn(size(A, 1), p);
        Y = A' * G;
        for i = 1:q
            Y = A' * (A * Y);
        end
        [V, ~] = qr(Y);
        [U, D, W] = svd(A * V(:, 1:p));
        T = [D, U' * A * V(:, (p+1):end)];
        V(:, 1:p) = V(:, 1:p) * W;
    return

randUTV then applies the same procedure to the remaining block, and so on; a NumPy sketch of this single step follows.
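Here is a NumPy sketch of one such step; the function name step_utv and the size conventions are mine, and it assumes the block has at least p columns:

    import numpy as np

    def step_utv(A, p, q, rng=None):
        # One block step: A = U @ T @ V.T, with a p x p diagonal leading block.
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        Y = A.T @ rng.standard_normal((m, p))    # sample the row space
        for _ in range(q):                       # power iteration
            Y = A.T @ (A @ Y)
        V, _ = np.linalg.qr(Y, mode='complete')  # full n x n orthogonal V
        U, s, Wt = np.linalg.svd(A @ V[:, :p])   # SVD of the m x p panel
        D = np.zeros((m, p)); np.fill_diagonal(D, s)
        T = np.hstack([D, U.T @ A @ V[:, p:]])   # [D, U^T A V2]
        V[:, :p] = V[:, :p] @ Wt.T               # absorb W into V
        return U, T, V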

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

    function [U, T, V] = randUTV(A, p, q)
        T = A;
        U = eye(size(A, 1));  V = eye(size(A, 2));
        for i = 1:ceil(size(A, 2)/p)
            I1 = 1:(p*(i-1));
            I2 = (p*(i-1) + 1):size(A, 1);
            J2 = (p*(i-1) + 1):size(A, 2);
            if length(J2) > p
                [Uhat, That, Vhat] = stepUTV(T(I2, J2), p, q);
            else
                [Uhat, That, Vhat] = svd(T(I2, J2));
            end
            U(:, I2) = U(:, I2) * Uhat;
            V(:, J2) = V(:, J2) * Vhat;
            T(I2, J2) = That;
            T(I1, J2) = T(I1, J2) * Vhat;
        end
    return
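And a NumPy sketch of the full driver, built on step_utv above (my naming; it assumes m ≥ n as in the setup of this section):

    import numpy as np

    def rand_utv(A, p, q, rng=None):
        # Blocked randUTV sketch: sweep over p-column blocks of the trailing
        # submatrix; finish the last (narrow) block with a dense SVD.
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        T = np.array(A, dtype=float)
        U = np.eye(m);  V = np.eye(n)
        for i in range(int(np.ceil(n / p))):
            j = p * i                          # start of the trailing block
            if n - j > p:
                Uh, Th, Vh = step_utv(T[j:, j:], p, q, rng)
            else:
                Uh, s, Wt = np.linalg.svd(T[j:, j:])
                Th = np.zeros((m - j, n - j)); np.fill_diagonal(Th, s)
                Vh = Wt.T
            U[:, j:] = U[:, j:] @ Uh
            V[:, j:] = V[:, j:] @ Vh
            T[j:, j:] = Th
            T[:j, j:] = T[:j, j:] @ Vh         # rotate already-processed rows
        return U, T, V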

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

The row ID can be computed via an entirely analogous processthat starts with a CPQR of the transpose of A In other wordswe execute a pivoted Gram-Schmidt orthonormalization processon the rows of A

Algorithm 8 Row ID A asymp XA(Is )

function [IsX] = ID row(A k)[simS J ] = qr(AT 0)T = S(1 k 1 k)S(1 k (k + 1) m)X = zeros(m k)

X(J ) =983045Ik T

983046T

Is = J(1 k)end

Low-Rank Approximations Lecture 4 March 6 - 13 2020 23 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

To obtain the double-sided ID we start with using theCPQR-based process to build the column ID Then compute therow ID by performing Gram-Schmidt on the rows of the tall thinmatrix C

Algorithm 9 Double-sided ID A asymp XA(Is Js)Z

function [Is JsXZ] = ID double(A k)[JsZ] = ID col(A k)[IsX] = ID row(A( Js) k)

end

The algorithms for computing interpolatory decompositions arewasteful when k ≪ min(mn) since they involve a full QRfactorization which has complexity O(mnmin(mn)) Thisproblem is very easily remedied by replacing the full QRfactorization by a partial QR factorization (the QR factorization isinterrupted after k steps) which has cost O(mnk)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 24 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

22 Randomized techniques for the approximate ID

Suppose A isin Rmtimesn with rank(A) = k Let

A = YF Y isin Rmtimesk F isin Rktimesn

Then by [IsX] = ID row(Y k) to compute a row ID of Y wehave

Y = XY(Is )

andXA(Is ) = XY(Is )F = YF = A

That is XA(Is ) is automatically a row ID of A as well

Observation In order to compute a row ID of a matrix A theonly information needed is a matrix Y whose columns span thecolumn space of A

As we have seen the task of finding a matrix Y whose columnsform a good basis for the column space of a matrix is ideallysuited to randomized sampling

Low-Rank Approximations Lecture 4 March 6 - 13 2020 25 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Algorithm 10 Accuracy enhanced randomized row ID

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output X isin Rmtimesk and an index vector Is isin Nk

such that A asymp XA(Is )

G = randn(n k + p)Y = AGfor j = 1 q

Z = ATYY = AZ

end[IsX] = ID row(Y k)

We can reduce the complexity by using a structured randommatrix instead of a Gaussian For example Ω = DFSDHS

Low-Rank Approximations Lecture 4 March 6 - 13 2020 26 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

3 The CUR decomposition

The CUR factorization approximates an mtimes n matrix A as aproduct

A asymp CUR C isin Rmtimesk U isin Rktimesk R isin Rktimesn

where C consists of a subset of the columns of A and R consists ofa subset of the rows of A

From double-sided ID to CUR Let

C = A( Js) R = A(Is )

ByA asymp CZ asymp CUR

we setU = ZRdagger

Low-Rank Approximations Lecture 4 March 6 - 13 2020 27 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

Algorithm 11 Accuracy enhanced randomized CUR

Input A isin Rmtimesn a target rank kover-sampling parameter pa small q the number of steps in the power iteration

Output U isin Rktimesk index vectors Is Js isin Nk

such that A asymp A( Js)UA(Is )

G = randn(k + pm)Y = GAfor j = 1 q

Z = YATY = ZA

end[IsZ] = ID col(Y k)[Issim] = ID row(A( Js) k)Solve UA(Is ) = Z in least squares sense

Low-Rank Approximations Lecture 4 March 6 - 13 2020 28 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise The block version algorithmrsquos output is an orthonormalmatrix Q of size mtimes k (where k is a multiple of p) and a k times nmatrix B such that

983042AminusQB9830422 le ε

Low-Rank Approximations Lecture 4 March 6 - 13 2020 32 40

43 Algorithm without updating the matrix

Draw standard Gaussian vectors g1 gr of length nyi = Agi i = 1 r j = 0 Q0 = [ ]

while max983042yj+19830422 983042yj+29830422 983042yj+r9830422 gt ε(10983155

2π)j = j + 1yj = yj minusQjminus1Q

Tjminus1yj

qj = yj983042yj9830422Qj =

983045Qjminus1 qj

983046

Draw a standard Gaussian vector gj+r of length nyj+r = (IminusQjQ

Tj )Agj+r

yi = yi minus qjqTj yi i = (j + 1) (j + r minus 1)

endQ = Qj

The orthonormal matrix Q such that 983042AminusQQTA9830422 le ε holdswith probability at least 1minusminmn10minusr

Low-Rank Approximations Lecture 4 March 6 - 13 2020 33 40

5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A isin Rmtimesn

AP = QR

where P is a permutation matrix Q is an orthogonal matrix andR is upper triangular

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of nminus 1rank-1 updates (so called BLAS2 operations)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 34 40

Householder column pivoted QR executes rather slowly on modernhardware in particular on systems involving many cores when thematrix is stored in distributed memory or on a hard drive etcThe reason is the process very communication intensive Wereturn to block the process

Blocked Householder column pivoted QR

BLAS3 operations are involved which can be executed very rapidlyon a broad range of computing hardware

A randomized algorithm

G = randn(b+ pm) Y = GA [simsimP1] = qr(Y 0)

Low-Rank Approximations Lecture 4 March 6 - 13 2020 35 40

6 Rank-revealing UTV decomposition

Given an mtimes n matrix A with m ge n a UTV decomposition ofA is a factorization of the form

A = UTVT

where U isin Rmtimesm and V isin Rntimesn are unitary matrices and T is atriangular matrix (either lower or upper triangular)

The UTV decomposition can be viewed as a generalization ofother standard factorizations such as eg the Singular ValueDecomposition (SVD) or the Column Pivoted QR decomposition(CPQR) (To be precise the SVD is the special case where T isdiagonal and the CPQR is the special case where V is apermutation matrix) The additional flexibility inherent in UTVdecomposition enables the design of efficient updating procedures

randUTV provides close to optimal low-rank approximation andhighly accurate estimates for the singular values of a matrix

Low-Rank Approximations Lecture 4 March 6 - 13 2020 36 40

The algorithm randUTV builds the factorization incrementallywhich means that when it is applied to a matrix of numerical rankk the algorithm can be stopped early and incur an overall cost ofO(mnk)

Like HQRRP the algorithm randUTV is blocked which enables itto execute fast on modern hardware

The algorithm randUTV is not an iterative algorithm In thisregard it is closer to the CPQR than standard SVD algorithmswhich substantially simplifies software optimization

Blocked UTV factorization process

Low-Rank Approximations Lecture 4 March 6 - 13 2020 37 40

61 A single step block factorization from A0 to A1

Draw an mtimes p Gaussian matrix G and form the sample matrix

Y = (ATA)qATG

where q is a parameter indicating the number of steps of poweriteration taken

Form a unitary matrix V whose first p columns form anorthonormal basis for the column space of Y For example

[Vsim] = qr(Y)

We then compute SVD of the matrix AV( 1 p)

AV( 1 p) = U1A11WT

and set V1 = V

983063W

I

983064

Low-Rank Approximations Lecture 4 March 6 - 13 2020 38 40

One step of randUTV

function [UTV] = stepUTV(A p q)G =randn(size(A 1) p)Y = ATGfor i = 1 q

Y = AT(AY)end[Vsim] = qr(Y)[UDW] = svd(AV( 1 p))T =

983045D UTAV( (p+ 1) end)

983046

V( 1 p) = V( 1 p)Wreturn

randUTV then applies the same procedure to the remaining blockand so on

Low-Rank Approximations Lecture 4 March 6 - 13 2020 39 40

function [UTV] = randUTV(A p q)T = AU = eye(size(A 1)) V = eye(size(A 2))for i = 1 ceil(size(A 2)p)

I1 = 1 (p(iminus 1))I2 = (p(iminus 1) + 1) size(A 1)J2 = (p(iminus 1) + 1) size(A 2)if length(J2) gt p

[983141U 983141T 983141V] = stepUTV(T(I2 J2) p q)else

[983141U 983141T 983141V] = svd(T(I2 J2))end

U( I2) = U( I2)983141U V( J2) = V( J2)983141V

T(I2 J2) = 983141T T(I1 J2) = 983141T(I1 J2)983141Vend

return

Low-Rank Approximations Lecture 4 March 6 - 13 2020 40 40

4 Adaptive rank determination

Given a matrix A and a tolerance ε The task is to determinematrices Qk and Bk of rank le k such that 983042AminusQkBk983042 le ε

41 Algorithm with updating the matrix

Q0 = [ ] B0 = [ ] A0 = A k = 0while 983042Ak9830422 gt ε

k = k + 1pick y isin range(Akminus1)qk = y983042y9830422bk = qT

kAkminus1Qk =

983045Qkminus1 qk

983046

Bk =

983063Bkminus1

bk

983064

Ak = Akminus1 minus qkbkend

Low-Rank Approximations Lecture 4 March 6 - 13 2020 29 40

Strategies for choosing y isin range(Akminus1)

(1) Pick the largest remaining column

y = Akminus1( jk) with jk = argmaxj

983042Akminus1( j)9830422

(2) Pick the locally optimal vector (computationally hard)

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

(3) A randomized selection strategy Set

y = Akminus1g

where g isin Rn is a Gaussian random vector

Low-Rank Approximations Lecture 4 March 6 - 13 2020 30 40

Exercise Prove that for each k Qk is orthonormal and

Bk = QTkA A = QkBk +Ak

Exercise Prove that for each k

983042A9830422F = 983042Bk9830422F + 983042Ak9830422F

Exercise Prove that for each k with the strategy (2) ie

y = argmin983042x9830422=1 xisinrange(Akminus1)

983042(Iminus xxT)Akminus19830422

the algorithm will produce matrices that attain the theoreticallyoptimal precision ie

983042AminusQkBk9830422 = σk+1

Low-Rank Approximations Lecture 4 March 6 - 13 2020 31 40

42 Block version for randomized selection strategy

Q = [ ] B = [ ]while 983042A9830422 gt ε

G = randn(n p)[Qnewsim] = qr(AG 0)Bnew = QT

newAQ =

983045Q Qnew

983046

B =

983063B

Bnew

983064

A = AminusQnewBnewend

Exercise. Show that the block version algorithm outputs an orthonormal matrix Q of size m × k (where k is a multiple of p) and a k × n matrix B such that

    ‖A − QB‖_2 ≤ ε.


4.3 Algorithm without updating the matrix

Draw standard Gaussian vectors g_1, ..., g_r of length n
y_i = A g_i,  i = 1, ..., r;   j = 0;   Q_0 = [ ]
while max{‖y_{j+1}‖_2, ‖y_{j+2}‖_2, ..., ‖y_{j+r}‖_2} > ε/(10√(2/π))
    j = j + 1
    y_j = y_j − Q_{j−1} Q_{j−1}^T y_j
    q_j = y_j/‖y_j‖_2
    Q_j = [Q_{j−1}  q_j]
    Draw a standard Gaussian vector g_{j+r} of length n
    y_{j+r} = (I − Q_j Q_j^T) A g_{j+r}
    y_i = y_i − q_j q_j^T y_i,  i = (j + 1), ..., (j + r − 1)
end
Q = Q_j

The algorithm returns an orthonormal matrix Q such that ‖A − QQ^T A‖_2 ≤ ε holds with probability at least 1 − min{m, n} · 10^{−r}.
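
A runnable MATLAB sketch of this scheme is given below (the function name is ours). The key point is that A itself is never modified: the r candidate vectors are kept orthogonal to the current basis instead, and their norms provide the probabilistic error estimate used in the stopping test.

    function Q = adaptive_range(A, eps_tol, r)
        % Rank-adaptive range finder that never updates A (Section 4.3 sketch).
        n = size(A, 2);
        Y = A*randn(n, r);                   % initial samples y_i = A*g_i
        Q = zeros(size(A, 1), 0);
        while max(sqrt(sum(Y.^2, 1))) > eps_tol/(10*sqrt(2/pi))
            y = Y(:, 1);  Y(:, 1) = [];      % take the oldest candidate
            y = y - Q*(Q'*y);                % project away the current basis
            q = y/norm(y);
            Q = [Q, q];
            Y = Y - q*(q'*Y);                % keep remaining candidates orthogonal
            ynew = A*randn(n, 1);
            Y = [Y, ynew - Q*(Q'*ynew)];     % draw a fresh projected sample
        end
    end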


5 Rank-revealing QR Factorization

Column pivoted QR factorization (CPQR) of A ∈ R^{m×n}:

    AP = QR,

where P is a permutation matrix, Q is an orthogonal matrix, and R is upper triangular.
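
In MATLAB, CPQR is available through the built-in qr; the toy example below (ours) shows how the decaying diagonal of R reveals the numerical rank.

    A = randn(8, 3)*randn(3, 5);   % an 8-by-5 matrix of rank 3
    [Q, R, p] = qr(A, 'vector');   % CPQR: A(:, p) = Q*R
    abs(diag(R))'                  % first 3 entries O(1), last 2 at rounding level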

Householder column pivoted QR

Householder column pivoted QR consists of a sequence of n − 1 rank-1 updates (so-called BLAS2 operations).


Householder column pivoted QR executes rather slowly on modern hardware, in particular on systems involving many cores, when the matrix is stored in distributed memory or on a hard drive, etc. The reason is that the process is very communication intensive. We therefore turn to blocking the process.

Blocked Householder column pivoted QR

BLAS3 operations are involved, which can be executed very rapidly on a broad range of computing hardware.

A randomized algorithm

G = randn(b + p, m);   Y = GA;   [~, ~, P_1] = qr(Y, 0)
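
The line above compresses the pivot selection of a blocked step into one sketch: rather than pivoting on the m × n matrix A, one pivots on a small (b + p) × n sketch Y = GA. A hedged expansion of that one-liner follows (variable names are ours; 'vector' is the modern MATLAB spelling of the qr(Y, 0) permutation output):

    A = randn(200, 150);            % example input
    b = 10;  p = 5;                 % block size and oversampling
    G = randn(b + p, 200);          % short, wide Gaussian sketching matrix
    Y = G*A;                        % (b+p)-by-n sketch: cheap to pivot on
    [~, ~, P1] = qr(Y, 'vector');   % CPQR of the sketch
    J = P1(1:b);                    % the b column pivots for the next block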


6 Rank-revealing UTV decomposition

Given an m × n matrix A with m ≥ n, a UTV decomposition of A is a factorization of the form

    A = UTV^T,

where U ∈ R^{m×m} and V ∈ R^{n×n} are unitary matrices and T is a triangular matrix (either lower or upper triangular).

The UTV decomposition can be viewed as a generalization of other standard factorizations, such as, e.g., the Singular Value Decomposition (SVD) or the Column Pivoted QR decomposition (CPQR). (To be precise, the SVD is the special case where T is diagonal, and the CPQR is the special case where V is a permutation matrix.) The additional flexibility inherent in the UTV decomposition enables the design of efficient updating procedures.

randUTV provides close to optimal low-rank approximation and highly accurate estimates for the singular values of a matrix.


The algorithm randUTV builds the factorization incrementally, which means that when it is applied to a matrix of numerical rank k, the algorithm can be stopped early and incur an overall cost of O(mnk).

Like HQRRP, the algorithm randUTV is blocked, which enables it to execute fast on modern hardware.

The algorithm randUTV is not an iterative algorithm. In this regard it is closer to the CPQR than standard SVD algorithms, which substantially simplifies software optimization.

Blocked UTV factorization process


6.1 A single step block factorization from A_0 to A_1

Draw an m × p Gaussian matrix G and form the sample matrix

    Y = (A^T A)^q A^T G,

where q is a parameter indicating the number of steps of power iteration taken.

Form a unitary matrix V whose first p columns form an orthonormal basis for the column space of Y. For example,

    [V, ~] = qr(Y)

We then compute an SVD of the matrix AV(:, 1:p),

    AV(:, 1:p) = U_1 A_{11} W^T,

and set

    V_1 = V [ W 0 ; 0 I ].


One step of randUTV

function [U, T, V] = stepUTV(A, p, q)
    % One step of randUTV: drives the leading p columns to diagonal form.
    G = randn(size(A, 1), p);
    Y = A'*G;                          % build Y = (A'A)^q A' G ...
    for i = 1:q
        Y = A'*(A*Y);                  % ... via q steps of power iteration
    end
    [V, ~] = qr(Y);                    % full QR: V is square unitary
    [U, D, W] = svd(A*V(:, 1:p));      % small SVD of the m-by-p block
    T = [D, U'*A*V(:, (p+1):end)];     % [A_11, upper-right block]
    V(:, 1:p) = V(:, 1:p)*W;
return

randUTV then applies the same procedure to the remaining block, and so on.


function [U, T, V] = randUTV(A, p, q)
    % Blocked randomized UTV: processes p columns of A per iteration.
    T = A;
    U = eye(size(A, 1));  V = eye(size(A, 2));
    for i = 1:ceil(size(A, 2)/p)
        I1 = 1:(p*(i - 1));                  % rows already processed
        I2 = (p*(i - 1) + 1):size(A, 1);     % remaining rows
        J2 = (p*(i - 1) + 1):size(A, 2);     % remaining columns
        if length(J2) > p
            [Uhat, That, Vhat] = stepUTV(T(I2, J2), p, q);
        else
            [Uhat, That, Vhat] = svd(T(I2, J2));   % last (small) block: exact SVD
        end
        U(:, I2) = U(:, I2)*Uhat;
        V(:, J2) = V(:, J2)*Vhat;
        T(I2, J2) = That;
        T(I1, J2) = T(I1, J2)*Vhat;          % rotate the columns of the rows above
    end
return
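
As a quick check of the code above (our own usage sketch): the returned factorization is exactly orthogonal, so the reconstruction error is at rounding level, and with a couple of power iteration steps the diagonal of T typically tracks the singular values of A closely.

    A = randn(300, 200);
    [U, T, V] = randUTV(A, 50, 2);          % block size p = 50, q = 2 power steps
    norm(A - U*T*V')                        % ~1e-13: exact factorization
    s_est = sort(abs(diag(T)), 'descend');  % diagonal of T
    s_true = svd(A);
    max(abs(s_est - s_true)./s_true)        % small: accurate singular value estimates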

