TRANSCRIPT
Random matrices with independent rows or columns
Nicole Tomczak-Jaegermann
Phenomena in High Dimensions in geometric analysis, random matrices, and computational geometry
Roscoff, June 25–29, 2012
Nicole Tomczak-Jaegermann (U of A) May, 2012 1 / 32
Project on random matrices with independent rows or columns, norms and condition numbers of their submatrices.
Involved in this project: Alain Pajor and various subsets of:
Radosław Adamczak, Olivier Guedon, Rafał Latała, Alexander Litvak, Krzysztof Oleszkiewicz, Nicole Tomczak-Jaegermann
Basic definitions and Notation
Let X = (X(1), . . . , X(N)) be a random vector in R^N with full-dimensional support. We say that the distribution of X is
- logarithmically concave, if X has a density of the form e^{−h(x)} with h : R^N → (−∞, ∞] convex (one of the equivalent definitions, due to C. Borell);
- isotropic, if EX(i) = 0 and EX(i)X(j) = δ_{i,j}.
For x ∈ R^N we put |x| = ‖x‖_2 = ( ∑_{i=1}^N x_i² )^{1/2}.
P_I x denotes the canonical projection of x onto {y ∈ R^N : supp(y) ⊂ I}, for I ⊂ {1, . . . , N}.
For integers k ≤ ℓ we use the shorthand notation [k, ℓ] = {k, . . . , ℓ}.
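As an illustrative aside (not part of the talk), the isotropy conditions EX(i) = 0 and EX(i)X(j) = δ_{i,j} can be sanity-checked numerically; here the standard Gaussian vector serves as the isotropic example, and the sample size is an arbitrary choice:

```python
# Numerical sanity check of isotropy: E X(i) = 0 and E X(i)X(j) = delta_{ij}.
# Uses a standard Gaussian vector as the isotropic example (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, samples = 5, 200_000
X = rng.standard_normal((samples, N))   # rows are i.i.d. copies of the vector

mean = X.mean(axis=0)                   # should be close to the zero vector
cov = (X.T @ X) / samples               # should be close to the identity

print(np.abs(mean).max(), np.abs(cov - np.eye(N)).max())
```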
Examples
1. Let K ⊂ R^n be a convex body (= compact convex, with non-empty interior; symmetric means −K = K), and let X be a random vector uniformly distributed in K. Then the corresponding probability measure on R^n,
μ_K(A) = |K ∩ A| / |K|,
is log-concave (by Brunn-Minkowski). Moreover, for every convex body K there exists an affine map T such that μ_{TK} is isotropic.
2. The Gaussian vector G = (g_1, . . . , g_n), where the g_i have the N(0, 1) distribution, is isotropic and log-concave.
3. Similarly, the vector X = (ξ_1, . . . , ξ_n), where the ξ_i have the exponential distribution (i.e., with density f(t) = (1/√2) exp(−√2 |t|), for t ∈ R), is isotropic and log-concave.
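As a quick illustrative check (not part of the talk): the density f(t) = (1/√2) exp(−√2 |t|) is a Laplace distribution with scale 1/√2, whose variance is 2·(1/√2)² = 1, so each coordinate is indeed isotropic. A minimal numerical sketch:

```python
# Sample the two-sided exponential density f(t) = (1/sqrt2) exp(-sqrt2 |t|):
# a Laplace distribution with scale 1/sqrt2, hence mean 0 and variance 1.
import numpy as np

rng = np.random.default_rng(1)
xi = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2), size=500_000)

print(xi.mean(), xi.var())  # both should be close to 0 and 1 respectively
```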
Random Matrices
Let n, N ≥ 1 be integers (a priori, no relation between them), fixed throughout. Our interest is in the behaviour of invariants as functions of n, N.
Random matrix: A is an n × N matrix, defined either by a sequence of rows or by a sequence of columns, which will be independent random vectors.
[Two schematic matrices: A written row by row (independent random rows), and A written column by column (independent random columns).]
The difference with classical RMT, where the entries are independent: there one studies the limiting behaviour of invariants as the size → ∞.
Random Matrices, Norms of submatrices, Ak,m
Let k ≤ n and m ≤ N be integers.
A_{k,m} = the maximal operator norm over submatrices of A with k rows and m columns.
Here "operator norm" means the norm of A : R^m → R^k with the Euclidean norms.
Example: Let X_1, . . . , X_n ∈ R^N be independent random vectors, and let A be the n × N random matrix with rows X_1, . . . , X_n. It acts as an operator
A : R^N → R^n,   Ax = ( 〈X_j, x〉 )_{j=1}^n ∈ R^n,  for x ∈ R^N.
Then
A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |〈X_j, x〉|² )^{1/2},
where U_m = {x ∈ S^{N−1} : |supp x| ≤ m}.
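As an illustrative aside, for very small matrices A_{k,m} can be evaluated by brute force directly from its definition as the maximal spectral norm over all k × m submatrices; this is only a sketch for building intuition, not a practical method:

```python
# Brute-force A_{k,m}: the maximal operator (spectral) norm over all
# submatrices of A with k rows and m columns. Feasible only for tiny n, N.
import itertools
import numpy as np

def A_km(A, k, m):
    n, N = A.shape
    best = 0.0
    for J in itertools.combinations(range(n), k):
        for I in itertools.combinations(range(N), m):
            sub = A[np.ix_(J, I)]
            best = max(best, np.linalg.norm(sub, 2))  # spectral norm
    return best

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
print(A_km(A, 2, 3))
# A_{n,N} is just the operator norm of the whole matrix:
assert abs(A_km(A, 4, 6) - np.linalg.norm(A, 2)) < 1e-12
```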
Ak,m – examples: independent columns
I: A is an n × N matrix defined by independent isotropic log-concave columns.
[Schematic matrix: A written column by column.]
The main application is to the approximation of a covariance matrix by empirical covariance matrices. There A_{n,m} (i.e., k = n) was sufficient: it corresponds to submatrices consisting of full columns, thus preserving the structure of the matrix.
Approximation of a covariance matrix
Let X ∈ R^n be isotropic and log-concave, and let (X_i)_{i≤N} be independent copies of X. By isotropicity, EX ⊗ X = Id.
By the law of large numbers, the empirical covariance matrix converges to Id:
(1/N) ∑_{i=1}^N X_i ⊗ X_i → Id  as N → ∞.
Kannan-Lovász-Simonovits asked (around 1995), motivated by a problem of complexity in computing volume in high dimension: under the above assumptions, estimate the size N for which, given ε ∈ (0, 1),
‖ (1/N) ∑_{i=1}^N X_i ⊗ X_i − Id ‖ ≤ ε
holds with high probability.
This is a typical "translation" of a limit law into a quantitative statement in the non-limit theory.
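As an illustrative aside, the quantitative behaviour can be observed in simulation: with Gaussian vectors standing in for the isotropic log-concave example, the operator-norm error of the empirical covariance matrix shrinks as N grows, roughly like √(n/N). A minimal sketch (sizes chosen arbitrarily):

```python
# Empirical covariance approximation: for isotropic X in R^n the error
# || (1/N) sum X_i (x) X_i - Id || shrinks as N grows (roughly like sqrt(n/N)).
import numpy as np

rng = np.random.default_rng(3)
n = 20

def cov_error(N):
    X = rng.standard_normal((N, n))     # N i.i.d. isotropic vectors in R^n
    emp = (X.T @ X) / N                 # empirical covariance matrix
    return np.linalg.norm(emp - np.eye(n), 2)

errs = {N: cov_error(N) for N in (200, 2000, 20000)}
print(errs)  # the error decreases as N increases
```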
KLS question
KLS showed that for any ε, δ ∈ (0, 1) (under a finite third moment assumption), N ≥ (C/(εδ)) n² gives the required approximation, with probability 1 − δ.
Bourgain (1996): for any ε, δ ∈ (0, 1), there exists C(ε, δ) > 0 such that N = C(ε, δ) n log³n gives the approximation with probability 1 − δ.
Rudelson improved this:
- using non-commutative Khinchine inequalities of Pisier and Lust-Piquard/Pisier;
- by the majorizing measure approach of Talagrand.
Several other authors improved the powers of the logarithm, from the late 1990s to 2010.
ALPT: N proportional to n is sufficient (JAMS 2010), improved in CRAS 2011. Let X ∈ R^n be isotropic log-concave, and let X_1, . . . , X_N be independent copies of X. Then
P( ‖ (1/N) ∑_{i=1}^N X_i ⊗ X_i − Id ‖ ≤ C √(n/N) ) ≥ 1 − e^{−c√n}.
So letting ε = C√(n/N) we get N = Cn/ε².
Extremal s-numbers of matrices with independent rows
As a corollary of ALPT we get a quantitative version of the Bai-Yin theorem for matrices of a fixed size.
Ak,m – examples: independent rows
II: A is an n × N matrix, defined by independent (isotropic log-concave) rows.
[Schematic matrix: A written row by row.]
A_{k,m} is harder to tackle if m < N, because the row structure of A is destroyed:
A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |〈X_j, x〉|² )^{1/2}.
This is applicable to the study of reconstruction problems, in particular RIP, and to uniform versions of some geometric questions on large deviation estimates.
Large Deviation for Ak,m
Intuition: let A be a random matrix with independent isotropic log-concave rows. Then for a submatrix A_{J,I} with k rows and m columns,
( E ‖A_{J,I}‖² )^{1/2} ≥ √(max{k, m}).
One of the main results of ALLPT is a large deviation theorem for A_{k,m}.
Let k ≤ n and m ≤ N, and let A be an n × N matrix with independent isotropic log-concave rows. For t ≥ 1 we have
P( A_{k,m} ≥ C t λ ) ≤ exp( −tλ/√(log(3m)) ),
where
λ = √(log log(3m)) √m log( e max{N, n} / m ) + √k log(en/k),
and C is a universal constant.
The bound is essentially optimal, up to the √(log log m) factor.
Paouris’ large deviation theorem
Paouris' large deviation theorem (2005): there exists c > 0 such that if X is an isotropic log-concave random vector in R^N, then for all t ≥ 1,
P( |X| ≥ c t √N ) ≤ exp(−t√N).
There are equivalent formulations, for pth moments, etc.
Weak parameter. For a vector X in R^N we define
σ_X(p) := sup_{t∈S^{N−1}} ( E|〈t, X〉|^p )^{1/p},  p ≥ 1.
Examples:
- For isotropic log-concave vectors X, σ_X(p) ≤ p/√2.
- For subgaussian vectors X, σ_X(p) ≤ C√p.
Paouris’ theorem with weak parameter
For any log-concave random vector X,
( E|X|^p )^{1/p} ≤ C ( E|X| + σ_X(p) )  for p ≥ 2,
and if X is isotropic,
P( |X| ≥ t ) ≤ exp( −σ_X^{−1}( t/C ) )  for t ≥ C ( E|X|² )^{1/2}.
Uniform large deviation theorem
Uniform Paouris-type theorem [ALLPT]: for 1 ≤ m ≤ N and an isotropic log-concave vector X in R^N we have, for t ≥ 1,
P( sup_{I⊂[1,N], |I|=m} |P_I X| ≥ c t √m log(eN/m) ) ≤ exp( −σ_X^{−1}( t √m √(log(em)) log(eN/m) ) ).
If X is isotropic log-concave in R^N, then so is P_I X, for every I ⊂ [1, N]. However, the probability is too high to beat the complexity of the family of subsets (which is (N choose m)), so a direct union bound argument cannot be used.
The trade-off is an extra logarithm in the threshold; the proof is based on new non-trivial estimates for order statistics.
Order Statistics
For an N-dimensional random vector X, by X*_1 ≥ X*_2 ≥ . . . ≥ X*_N we denote the nonincreasing rearrangement of |X(1)|, . . . , |X(N)|.
In particular, X*_1 = max{|X(1)|, . . . , |X(N)|} and X*_N = min{|X(1)|, . . . , |X(N)|}. The random variables X*_k, 1 ≤ k ≤ N, are called the order statistics of X.
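As an illustrative aside, the nonincreasing rearrangement is simply a sort of the absolute values; a minimal helper (the function name is ours):

```python
# Order statistics X*_1 >= ... >= X*_N: the absolute values of the
# coordinates of x, sorted in nonincreasing order.
import numpy as np

def order_statistics(x):
    """Nonincreasing rearrangement of |x(1)|, ..., |x(N)|."""
    return np.sort(np.abs(x))[::-1]

x = np.array([0.5, -2.0, 1.0, -0.1])
print(order_statistics(x))  # [2.  1.  0.5 0.1]
```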
Problem: Find an upper bound for P(X*_k ≥ t).
Order Statistics for isotropic log-concave vectors
Let X be an N-dimensional isotropic log-concave vector. Then
P( X*_k ≥ t ) ≤ exp( −σ_X^{−1}( t√k / C ) )  for t ≥ C log(eN/k).
The weak parameter is needed for better control of the probability for random vectors which are sums of independent random vectors, in terms of the sequences of coefficients in these sums. Latała (2010) proved a version without the weak parameter; ALLPT (2012) proved the present version.
The approach is based on a suitable estimate of the moments of the process N_X(t),
N_X(t) := ∑_{i=1}^N 1_{{X(i) ≥ t}},  t ≥ 0.
That is, N_X(t) equals the number of coordinates of X larger than or equal to t.
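As an illustrative aside, N_X(t) is just a coordinate count, and the elementary fact used on the next slide is easy to see: if X*_k ≥ t then at least k coordinates satisfy |X(i)| ≥ t, so N_X(t) ≥ k/2 or N_{−X}(t) ≥ k/2. A minimal sketch (the example vector is ours):

```python
# N_X(t) counts the coordinates of X that are >= t. Check the implication
# X*_k >= t  =>  N_X(t) >= k/2 or N_{-X}(t) >= k/2 on a small example.
import numpy as np

def N_X(x, t):
    return int(np.sum(x >= t))

x = np.array([3.0, -2.5, 1.0, -0.2, 2.0])
k, t = 3, 1.0                              # here X*_3 = 2.0 >= t
assert np.sort(np.abs(x))[::-1][k - 1] >= t
assert N_X(x, t) >= k / 2 or N_X(-x, t) >= k / 2
print(N_X(x, t), N_X(-x, t))
```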
Estimate for NX
For any isotropic log-concave vector X and p ≥ 1 we have
E( t² N_X(t) )^p ≤ ( C σ_X(p) )^{2p}  for t ≥ C log( N t² / σ_X(p)² ).
To get the estimate for order statistics we observe that X*_k ≥ t implies N_X(t) ≥ k/2 or N_{−X}(t) ≥ k/2, and the vector −X is also isotropic and log-concave. The estimates for N_X and Chebyshev's inequality give
P( X*_k ≥ t ) ≤ (2/k)^p ( E N_X(t)^p + E N_{−X}(t)^p ) ≤ 2 ( Cp / (t√k) )^{2p},
provided that t ≥ C log(N t² / p²). We take p = t√k / (eC) and notice that the restriction on t follows from the assumption that t ≥ C log(eN/k).
Estimate for NX(t)
The proof of the estimate for N_X(t) is based on two ideas:
- the restriction of a log-concave vector X to a convex set is log-concave;
- Paouris' large deviation theorem.
Uniform Paouris-type estimate
For any m ≤ N and any isotropic log-concave vector X in R^N we have, for t ≥ 1,
P( sup_{I⊂[1,N], |I|=m} |P_I X| ≥ c t √m log(eN/m) ) ≤ exp( −σ_X^{−1}( t √m √(log(em)) log(eN/m) ) ).
Idea of the proof (it is easy):
sup_{I⊂[1,N], |I|=m} |P_I X| = ( ∑_{k=1}^m (X*_k)² )^{1/2} ≤ 2 ( ∑_{i=0}^{s−1} 2^i (X*_{2^i})² )^{1/2},
where s = ⌈log₂ m⌉.
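As an illustrative aside, this dyadic-block bound is easy to verify numerically: within the i-th block of ranks [2^i, 2^{i+1}) each order statistic is at most X*_{2^i}. A minimal sketch (sizes chosen arbitrarily):

```python
# Numerical check of the dyadic bound
#   (sum_{k<=m} (X*_k)^2)^{1/2} <= 2 (sum_{i<s} 2^i (X*_{2^i})^2)^{1/2},
# with s = ceil(log2 m). Illustrative only.
import math
import numpy as np

rng = np.random.default_rng(4)
N, m = 40, 5
X = rng.standard_normal(N)
Xs = np.sort(np.abs(X))[::-1]               # X*_1 >= X*_2 >= ... (0-indexed)

lhs = math.sqrt(sum(Xs[k] ** 2 for k in range(m)))
s = math.ceil(math.log2(m))
rhs = 2 * math.sqrt(sum(2**i * Xs[2**i - 1] ** 2 for i in range(s)))
print(lhs, rhs)
assert lhs <= rhs
```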
Applications – reconstruction, compressed sensing
Let n, N ≥ 1, let T ⊂ R^N, and let Γ be an n × N matrix.
Consider any vector x ∈ T. Assuming that Γx is known, the problem is to reconstruct x with a fast algorithm.
Hypotheses are placed on T and on Γ. The common hypothesis is that T = U_m, together with
the Restricted Isometry Property (RIP) of order m: for all m-sparse vectors x,
(1 − δ)|x| ≤ |Γx| ≤ (1 + δ)|x|.
The RIP parameter:
δ_m = δ_m(Γ) = sup_{x∈U_m} | |Γx|² − E|Γx|² |.
It was introduced by E. Candès, J. Romberg and T. Tao around 2006. If δ_{2m} is appropriately small, then every m-sparse vector x can be reconstructed from Γx by the ℓ_1-minimization method.
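As an illustrative aside, for tiny sizes δ_m of the normalized matrix Γ/√n can be computed exactly: over each m-column submatrix Γ_I, the sup over unit vectors supported in I equals the spectral norm of (1/n) Γ_I^T Γ_I − Id. A minimal sketch with Gaussian rows as the example:

```python
# Exact delta_m for Gamma/sqrt(n) on tiny instances: the largest spectral
# deviation of (1/n) Gamma_I^T Gamma_I from Id over all m-column subsets I.
import itertools
import numpy as np

def delta_m(G, m):
    n, N = G.shape
    best = 0.0
    for I in itertools.combinations(range(N), m):
        B = G[:, I] / np.sqrt(n)
        best = max(best, np.linalg.norm(B.T @ B - np.eye(m), 2))
    return best

rng = np.random.default_rng(5)
n, N = 50, 8
G = rng.standard_normal((n, N))
print(delta_m(G, 1), delta_m(G, 2))  # nondecreasing in m, since U_1 ⊂ U_2
```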
More notation
We want upper estimates for
δ_m = δ_m(Γ) = sup_{x∈U_m} | |Γx|² − E|Γx|² |,
and more generally, for any T ⊂ S^{N−1},
δ_T(Γ) = sup_{x∈T} | |Γx|² − E|Γx|² |.
Let X_1, . . . , X_n ∈ R^N be independent, and let Γ be the n × N matrix with rows X_i. (In reconstruction problems we look for vectors given by their measurements.)
Let 1 ≤ k ≤ n and define the parameter Γ_k(T) by
Γ_k(T)² = sup_{y∈T} sup_{I⊂{1,...,n}, |I|=k} ∑_{i∈I} |〈X_i, y〉|².
We write Γ_{k,m} = Γ_k(U_m). This agrees with the definition of A_{k,m} introduced earlier.
Fundamental Lemma:
[ALPT, CRAS], [ALLPT]: Let X_1, . . . , X_n ∈ R^N be independent isotropic, and let T ⊂ S^{N−1} be finite. Let 0 < θ < 1 and B ≥ 1. Then with probability at least 1 − |T| exp(−3θ²n/(8B²)),
δ_T(Γ/√n) = sup_{y∈T} | (1/n) ∑_{i=1}^n ( |〈X_i, y〉|² − E|〈X_i, y〉|² ) |
≤ θ + (1/n) ( sup_{y∈T} ∑_{i=1}^n |〈X_i, y〉|² 1_{{|〈X_i,y〉|>B}} + sup_{y∈T} E ∑_{i=1}^n |〈X_i, y〉|² 1_{{|〈X_i,y〉|>B}} )
≤ θ + (1/n) ( Γ_k(T)² + E Γ_k(T)² ),
where k ≤ n is the largest integer satisfying k ≤ (Γ_k(T)/B)².
Corollary for RIP:
Let X_i, Γ, 0 < θ < 1 and B ≥ 1 be as before. Assume that m ≤ N satisfies
m log(11eN/m) ≤ 3θ²n/(16B²).
Then with probability at least 1 − exp(−3θ²n/(16B²)) one has
δ_m(Γ/√n) = sup_{y∈U_m} | (1/n) ∑_{i=1}^n ( |〈X_i, y〉|² − E|〈X_i, y〉|² ) | ≤ 2θ + (2/n) ( Γ²_{k,m} + E Γ²_{k,m} ),
where k ≤ n is the largest integer satisfying k ≤ (Γ_{k,m}/B)².
RIP Theorem for matrices with independent rows:
Let n, N ≥ 1 and 0 < θ < 1. Let Γ be an n × N matrix whose rows are independent isotropic log-concave random vectors X_i, i ≤ n. There exists an absolute constant c > 0 such that if m ≤ N satisfies
m log log(3m) ( log( 3 max{N, n} / m ) )² ≤ c ( θ / log(3/θ) )² n,
then
δ_m(Γ/√n) ≤ θ
with high probability.
This is optimal up to the log log factor. For unconditional distributions we know that this factor can be removed; we conjecture that in general it can be removed as well.
Return to Large Deviation for Ak,m
Recall the result for A_{k,m}. For n ≤ N, k ≤ n, m ≤ N,
A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |〈X_j, x〉|² )^{1/2}.
Then for t ≥ 1 we have
P( A_{k,m} ≥ C t λ ) ≤ exp( −tλ/√(log(3m)) ),
where
λ = √(log log(3m)) √m log(eN/m) + √k log(en/k).
Ak,m, idea of proof
To bound A_{k,m} one then has to prove uniformity with respect to two families of different character: one being the family of subsets I ⊂ [1, N] with |I| = m, and the other the set U_k of k-sparse unit vectors in R^n.
Let X_1, . . . , X_n be independent isotropic N-dimensional log-concave vectors, and let x = (x_i) ∈ R^n satisfy some structural assumptions, like sparsity. We consider Y = ∑_{i=1}^n x_i X_i.
By duality we need to estimate, for every t > 0, the probability that
sup_{J⊂[1,N], |J|=m} | P_J( ∑_{i=1}^n x_i X_i ) | = sup_{J⊂[1,N], |J|=m} |P_J Y| > t,
depending on the norms |x| and ‖x‖_∞.
The complexity of these families is too high to use a union bound argument, so we need to come up with some chaining.
Ak,m, continuation
This leads us to distinguish two cases, depending on the relation between k and k′, where
k′ = inf{ ℓ ≥ 1 : m log(eN/m) ≤ ℓ log(en/ℓ) }.
Step 1: when k ≥ k′, we reduce to the case k ≤ k′.
Step 2: the case k ≤ k′.
To build intuition we may take k′ ∼ k.
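As an illustrative aside, k′ can be computed by a direct scan, since ℓ log(en/ℓ) is nondecreasing for ℓ ≤ n (its derivative in ℓ is log(n/ℓ) ≥ 0). A minimal helper (the function name and example values are ours):

```python
# k' = inf{ l >= 1 : m log(eN/m) <= l log(en/l) }, found by a direct scan;
# l * log(en/l) is nondecreasing for l <= n, so the first hit is the infimum.
import math

def k_prime(n, N, m):
    target = m * math.log(math.e * N / m)
    for l in range(1, n + 1):
        if target <= l * math.log(math.e * n / l):
            return l
    return n  # the threshold is not reached below n

print(k_prime(100, 1000, 5))
```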
Ak,m, Step 1
Step 1. We take only the family of k-sparse vectors, but do not need projections:
{ sup_{x∈U_k} | ∑_{i=1}^n x_i X_i | > t }.
Assume first that x is a "flat" vector: x_i = ±a or 0, with a = k^{−1/2}, where k = |supp(x)|. That is, |x| = 1 and ‖x‖_∞ = k^{−1/2}. A direct argument shows that the estimate is right for such vectors.
In general we may have 0 < |x_1| ≤ |x_2| ≤ . . . ≤ |x_k| and x_j = 0 for j > k, and x may consist of some number of "flat" pieces. The first natural try is to consider each flat piece separately and then add the results together. This works, but may produce an extra logarithmic factor.
Ak,m, Step 1, chaining
Chaining: let k_1 ∼ k/2, k_2 ∼ k/4, . . . , k_s ∼ k/2^s ∼ k′, so that ∑_{j=1}^s k_j ∼ k.
Given x ∈ U_k, let x_1 be the restriction of x to the k_1 largest coordinates, x_2 the restriction of x to the next k_2 largest coordinates, etc.
This way,
x = ∑_{i=1}^s x_i,
where the x_i have mutually disjoint supports, each of cardinality ≤ k_i, and the coordinates of x_i are larger than the coordinates of x_j if i < j.
We use Paouris-type estimates for each x_i. This is similar to ALPT (JAMS).
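As an illustrative aside, the decomposition into disjoint blocks of geometrically decreasing size, ordered by coordinate magnitude, can be sketched as follows; the concrete block sizes here are one arbitrary choice, not the exact ones used in the proof:

```python
# Sketch of the chaining decomposition: split a vector into pieces with
# disjoint supports of sizes ~ k/2, k/4, ..., grouped by coordinate magnitude,
# so that x = sum_i x_i. Block sizes are an illustrative choice only.
import numpy as np

def dyadic_blocks(x):
    idx = np.argsort(-np.abs(x))          # coordinate indices, largest first
    k = len(x)
    pieces, start, size = [], 0, (k + 1) // 2
    while start < k:
        piece = np.zeros_like(x)
        block = idx[start:start + size]
        piece[block] = x[block]
        pieces.append(piece)
        start += size
        size = max(1, size // 2)
    return pieces

x = np.array([0.1, -0.9, 0.3, 0.2, -0.5, 0.4])
pieces = dyadic_blocks(x)
assert np.allclose(sum(pieces), x)        # disjoint supports, summing to x
print(len(pieces))
```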
Ak,m, Step 2
Step 2. Another chaining argument, more delicate in the definitions of the ε-nets. We use the uniform estimate for projections of sums, which in general is weaker than in Case 1. At this step we lose the log log m factor.
Congratulations Alain!