TRANSCRIPT
Random matrices with independent rows or columns
Nicole Tomczak-Jaegermann
Phenomena in High Dimensions in geometric analysis, random matrices, and computational geometry
Roscoff, June 25–29, 2012
Nicole Tomczak-Jaegermann (U of A) May, 2012 1 / 32
Project on random matrices with independent rows or columns, norms and condition numbers of their submatrices.
Involved in this project: Alain Pajor and various subsets of:
Radosław Adamczak, Olivier Guedon, Rafał Latała, Alexander Litvak, Krzysztof Oleszkiewicz, Nicole Tomczak-Jaegermann
Basic definitions and Notation
Let X = (X(1), . . . , X(N)) be a random vector in R^N with full-dimensional support. We say that the distribution of X is
- logarithmically concave, if X has a density of the form e^{−h(x)} with h : R^N → (−∞, ∞] convex (one of the equivalent definitions, due to C. Borell);
- isotropic, if EX(i) = 0 and EX(i)X(j) = δ_{i,j}.
For x ∈ R^N we put |x| = ‖x‖_2 = ( ∑_{i=1}^N x_i² )^{1/2}.
P_I x denotes the canonical projection of x onto {y ∈ R^N : supp(y) ⊂ I}, for I ⊂ {1, . . . , N}.
For integers k ≤ ℓ we use the shorthand notation [k, ℓ] = {k, . . . , ℓ}.
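As an illustrative aside (not part of the talk), the isotropy conditions EX(i) = 0 and EX(i)X(j) = δ_{i,j} can be sanity-checked numerically; here the standard Gaussian vector serves as the isotropic example, and the sample size is an arbitrary choice:

```python
# Numerical sanity check of isotropy: E X(i) = 0 and E X(i)X(j) = delta_{ij}.
# Uses a standard Gaussian vector as the isotropic example (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, samples = 5, 200_000
X = rng.standard_normal((samples, N))   # rows are i.i.d. copies of the vector

mean = X.mean(axis=0)                   # should be close to the zero vector
cov = (X.T @ X) / samples               # should be close to the identity

print(np.abs(mean).max(), np.abs(cov - np.eye(N)).max())
```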
Examples
1. Let K ⊂ R^n be a convex body (= compact convex, with non-empty interior; symmetric means −K = K), and let X be a random vector uniformly distributed in K. Then the corresponding probability measure on R^n,
μ_K(A) = |K ∩ A| / |K|,
is log-concave (by Brunn-Minkowski). Moreover, for every convex body K there exists an affine map T such that μ_{TK} is isotropic.
2. The Gaussian vector G = (g_1, . . . , g_n), where the g_i have the N(0, 1) distribution, is isotropic and log-concave.
3. Similarly, the vector X = (ξ_1, . . . , ξ_n), where the ξ_i have the exponential distribution (i.e., with density f(t) = (1/√2) exp(−√2 |t|), for t ∈ R), is isotropic and log-concave.
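As a quick illustrative check (not part of the talk): the density f(t) = (1/√2) exp(−√2 |t|) is a Laplace distribution with scale 1/√2, whose variance is 2·(1/√2)² = 1, so each coordinate is indeed isotropic. A minimal numerical sketch:

```python
# Sample the two-sided exponential density f(t) = (1/sqrt2) exp(-sqrt2 |t|):
# a Laplace distribution with scale 1/sqrt2, hence mean 0 and variance 1.
import numpy as np

rng = np.random.default_rng(1)
xi = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2), size=500_000)

print(xi.mean(), xi.var())  # both should be close to 0 and 1 respectively
```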
Random Matrices
Let n, N ≥ 1 be integers (a priori, no relation between them), fixed throughout. Our interest is in the behaviour of invariants as functions of n, N.
Random matrix: A is an n × N matrix, defined either by a sequence of rows or by a sequence of columns, which will be independent random vectors.
[Two schematic matrices: A written row by row (independent random rows), and A written column by column (independent random columns).]
The difference with classical RMT, where the entries are independent: there one studies the limiting behaviour of invariants as the size → ∞.
Random Matrices, Norms of submatrices, Ak,m
Let k ≤ n and m ≤ N be integers.
A_{k,m} = the maximal operator norm over submatrices of A with k rows and m columns.
Here "operator norm" means the norm of A : R^m → R^k with the Euclidean norms.
Example: Let X_1, . . . , X_n ∈ R^N be independent random vectors, and let A be the n × N random matrix with rows X_1, . . . , X_n. It acts as an operator
A : R^N → R^n,   Ax = ( 〈X_j, x〉 )_{j=1}^n ∈ R^n,  for x ∈ R^N.
Then
A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |〈X_j, x〉|² )^{1/2},
where U_m = {x ∈ S^{N−1} : |supp x| ≤ m}.
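As an illustrative aside, for very small matrices A_{k,m} can be evaluated by brute force directly from its definition as the maximal spectral norm over all k × m submatrices; this is only a sketch for building intuition, not a practical method:

```python
# Brute-force A_{k,m}: the maximal operator (spectral) norm over all
# submatrices of A with k rows and m columns. Feasible only for tiny n, N.
import itertools
import numpy as np

def A_km(A, k, m):
    n, N = A.shape
    best = 0.0
    for J in itertools.combinations(range(n), k):
        for I in itertools.combinations(range(N), m):
            sub = A[np.ix_(J, I)]
            best = max(best, np.linalg.norm(sub, 2))  # spectral norm
    return best

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
print(A_km(A, 2, 3))
# A_{n,N} is just the operator norm of the whole matrix:
assert abs(A_km(A, 4, 6) - np.linalg.norm(A, 2)) < 1e-12
```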
Ak,m – examples: independent columns
I: A is an n × N matrix defined by independent isotropic log-concave columns.
[Schematic matrix: A written column by column.]
The main application is to the approximation of a covariance matrix by empirical covariance matrices. There A_{n,m} (i.e., k = n) was sufficient: it corresponds to submatrices consisting of full columns, thus preserving the structure of the matrix.
Approximation of a covariance matrix
Let X ∈ R^n be isotropic and log-concave, and let (X_i)_{i≤N} be independent copies of X. By isotropicity, EX ⊗ X = Id.
By the law of large numbers, the empirical covariance matrix converges to Id:
(1/N) ∑_{i=1}^N X_i ⊗ X_i → Id  as N → ∞.
Kannan-Lovász-Simonovits asked (around 1995), motivated by a problem of complexity in computing volume in high dimension: under the above assumptions, estimate the size N for which, given ε ∈ (0, 1),
‖ (1/N) ∑_{i=1}^N X_i ⊗ X_i − Id ‖ ≤ ε
holds with high probability.
This is a typical "translation" of a limit law into a quantitative statement in the non-limit theory.
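As an illustrative aside, the quantitative behaviour can be observed in simulation: with Gaussian vectors standing in for the isotropic log-concave example, the operator-norm error of the empirical covariance matrix shrinks as N grows, roughly like √(n/N). A minimal sketch (sizes chosen arbitrarily):

```python
# Empirical covariance approximation: for isotropic X in R^n the error
# || (1/N) sum X_i (x) X_i - Id || shrinks as N grows (roughly like sqrt(n/N)).
import numpy as np

rng = np.random.default_rng(3)
n = 20

def cov_error(N):
    X = rng.standard_normal((N, n))     # N i.i.d. isotropic vectors in R^n
    emp = (X.T @ X) / N                 # empirical covariance matrix
    return np.linalg.norm(emp - np.eye(n), 2)

errs = {N: cov_error(N) for N in (200, 2000, 20000)}
print(errs)  # the error decreases as N increases
```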
KLS question
KLS showed that for any ε, δ ∈ (0, 1) (under a finite third moment assumption), N ≥ (C/(εδ)) n² gives the required approximation, with probability 1 − δ.
Bourgain (1996): for any ε, δ ∈ (0, 1), there exists C(ε, δ) > 0 such that N = C(ε, δ) n log³n gives the approximation with probability 1 − δ.
Rudelson improved this:
- using non-commutative Khinchine inequalities of Pisier and Lust-Piquard/Pisier;
- by the majorizing measure approach of Talagrand.
Several other authors improved the powers of the logarithm, from the late 1990s to 2010.
ALPT: N proportional to n is sufficient (JAMS 2010), improved in CRAS 2011. Let X ∈ R^n be isotropic log-concave, and let X_1, . . . , X_N be independent copies of X. Then
P( ‖ (1/N) ∑_{i=1}^N X_i ⊗ X_i − Id ‖ ≤ C √(n/N) ) ≥ 1 − e^{−c√n}.
So letting ε = C√(n/N) we get N = Cn/ε².
Extremal s-numbers of matrices with independent rows
As a corollary of ALPT we get a quantitative version of the Bai-Yin theorem for matrices of a fixed size.
Ak,m – examples: independent rows
II: A is an n × N matrix, defined by independent (isotropic log-concave) rows.
[Schematic matrix: A written row by row.]
A_{k,m} is harder to tackle if m < N, because the row structure of A is destroyed:
A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |〈X_j, x〉|² )^{1/2}.
This is applicable to the study of reconstruction problems, in particular RIP, and to uniform versions of some geometric questions on large deviation estimates.
Large Deviation for Ak,m
Intuition: let A be a random matrix with independent isotropic log-concave rows. Then for a submatrix A_{J,I} with k rows and m columns,
( E ‖A_{J,I}‖² )^{1/2} ≥ √(max{k, m}).
One of the main results of ALLPT is a large deviation theorem for A_{k,m}.
Let k ≤ n and m ≤ N, and let A be an n × N matrix with independent isotropic log-concave rows. For t ≥ 1 we have
P( A_{k,m} ≥ C t λ ) ≤ exp( −tλ/√(log(3m)) ),
where
λ = √(log log(3m)) √m log( e max{N, n} / m ) + √k log(en/k),
and C is a universal constant.
The bound is essentially optimal, up to the √(log log m) factor.
Paouris’ large deviation theorem
Paouris' large deviation theorem (2005): there exists c > 0 such that if X is an isotropic log-concave random vector in R^N, then for all t ≥ 1,
P( |X| ≥ c t √N ) ≤ exp(−t√N).
There are equivalent formulations, for pth moments, etc.
Weak parameter. For a vector X in R^N we define
σ_X(p) := sup_{t∈S^{N−1}} ( E|〈t, X〉|^p )^{1/p},  p ≥ 1.
Examples:
- For isotropic log-concave vectors X, σ_X(p) ≤ p/√2.
- For subgaussian vectors X, σ_X(p) ≤ C√p.
Paouris’ theorem with weak parameter
For any log-concave random vector X,
( E|X|^p )^{1/p} ≤ C ( E|X| + σ_X(p) )  for p ≥ 2,
and if X is isotropic,
P( |X| ≥ t ) ≤ exp( −σ_X^{−1}( t/C ) )  for t ≥ C ( E|X|² )^{1/2}.
Uniform large deviation theorem
Uniform Paouris-type theorem [ALLPT]: for 1 ≤ m ≤ N and an isotropic log-concave vector X in R^N we have, for t ≥ 1,
P( sup_{I⊂[1,N], |I|=m} |P_I X| ≥ c t √m log(eN/m) ) ≤ exp( −σ_X^{−1}( t √m √(log(em)) log(eN/m) ) ).
If X is isotropic log-concave in R^N, then so is P_I X, for every I ⊂ [1, N]. However, the probability is too high to beat the complexity of the family of subsets (which is (N choose m)), so a direct union bound argument cannot be used.
The trade-off is an extra logarithm in the threshold; the proof is based on new non-trivial estimates for order statistics.
Order Statistics
For an N-dimensional random vector X, by X*_1 ≥ X*_2 ≥ . . . ≥ X*_N we denote the nonincreasing rearrangement of |X(1)|, . . . , |X(N)|.
In particular, X*_1 = max{|X(1)|, . . . , |X(N)|} and X*_N = min{|X(1)|, . . . , |X(N)|}. The random variables X*_k, 1 ≤ k ≤ N, are called the order statistics of X.
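As an illustrative aside, the nonincreasing rearrangement is simply a sort of the absolute values; a minimal helper (the function name is ours):

```python
# Order statistics X*_1 >= ... >= X*_N: the absolute values of the
# coordinates of x, sorted in nonincreasing order.
import numpy as np

def order_statistics(x):
    """Nonincreasing rearrangement of |x(1)|, ..., |x(N)|."""
    return np.sort(np.abs(x))[::-1]

x = np.array([0.5, -2.0, 1.0, -0.1])
print(order_statistics(x))  # [2.  1.  0.5 0.1]
```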
Problem: Find an upper bound for P(X*_k ≥ t).
Order Statistics for isotropic log-concave vectors
Let X be an N-dimensional isotropic log-concave vector. Then
P( X*_k ≥ t ) ≤ exp( −σ_X^{−1}( t√k / C ) )  for t ≥ C log(eN/k).
The weak parameter is needed for better control of the probability for random vectors which are sums of independent random vectors, in terms of the sequences of coefficients in these sums. Latała (2010) proved a version without the weak parameter; ALLPT (2012) proved the present version.
The approach is based on a suitable estimate of the moments of the process N_X(t),
N_X(t) := ∑_{i=1}^N 1_{{X(i) ≥ t}},  t ≥ 0.
That is, N_X(t) equals the number of coordinates of X larger than or equal to t.
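As an illustrative aside, N_X(t) is just a coordinate count, and the elementary fact used on the next slide is easy to see: if X*_k ≥ t then at least k coordinates satisfy |X(i)| ≥ t, so N_X(t) ≥ k/2 or N_{−X}(t) ≥ k/2. A minimal sketch (the example vector is ours):

```python
# N_X(t) counts the coordinates of X that are >= t. Check the implication
# X*_k >= t  =>  N_X(t) >= k/2 or N_{-X}(t) >= k/2 on a small example.
import numpy as np

def N_X(x, t):
    return int(np.sum(x >= t))

x = np.array([3.0, -2.5, 1.0, -0.2, 2.0])
k, t = 3, 1.0                              # here X*_3 = 2.0 >= t
assert np.sort(np.abs(x))[::-1][k - 1] >= t
assert N_X(x, t) >= k / 2 or N_X(-x, t) >= k / 2
print(N_X(x, t), N_X(-x, t))
```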
Estimate for NX
For any isotropic log-concave vector X and p ≥ 1 we have
E( t² N_X(t) )^p ≤ ( C σ_X(p) )^{2p}  for t ≥ C log( N t² / σ_X(p)² ).
To get the estimate for order statistics we observe that X*_k ≥ t implies N_X(t) ≥ k/2 or N_{−X}(t) ≥ k/2, and the vector −X is also isotropic and log-concave. The estimates for N_X and Chebyshev's inequality give
P( X*_k ≥ t ) ≤ (2/k)^p ( E N_X(t)^p + E N_{−X}(t)^p ) ≤ 2 ( Cp / (t√k) )^{2p},
provided that t ≥ C log(N t² / p²). We take p = t√k / (eC) and notice that the restriction on t follows from the assumption that t ≥ C log(eN/k).
Estimate for NX(t)
The proof of the estimate for N_X(t) is based on two ideas:
- the restriction of a log-concave vector X to a convex set is log-concave;
- Paouris' large deviation theorem.
Uniform Paouris-type estimate
For any m ≤ N and any isotropic log-concave vector X in R^N we have, for t ≥ 1,
P( sup_{I⊂[1,N], |I|=m} |P_I X| ≥ c t √m log(eN/m) ) ≤ exp( −σ_X^{−1}( t √m √(log(em)) log(eN/m) ) ).
Idea of the proof (it is easy):
sup_{I⊂[1,N], |I|=m} |P_I X| = ( ∑_{k=1}^m (X*_k)² )^{1/2} ≤ 2 ( ∑_{i=0}^{s−1} 2^i (X*_{2^i})² )^{1/2},
where s = ⌈log₂ m⌉.
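As an illustrative aside, this dyadic-block bound is easy to verify numerically: within the i-th block of ranks [2^i, 2^{i+1}) each order statistic is at most X*_{2^i}. A minimal sketch (sizes chosen arbitrarily):

```python
# Numerical check of the dyadic bound
#   (sum_{k<=m} (X*_k)^2)^{1/2} <= 2 (sum_{i<s} 2^i (X*_{2^i})^2)^{1/2},
# with s = ceil(log2 m). Illustrative only.
import math
import numpy as np

rng = np.random.default_rng(4)
N, m = 40, 5
X = rng.standard_normal(N)
Xs = np.sort(np.abs(X))[::-1]               # X*_1 >= X*_2 >= ... (0-indexed)

lhs = math.sqrt(sum(Xs[k] ** 2 for k in range(m)))
s = math.ceil(math.log2(m))
rhs = 2 * math.sqrt(sum(2**i * Xs[2**i - 1] ** 2 for i in range(s)))
print(lhs, rhs)
assert lhs <= rhs
```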
Applications – reconstruction, compressed sensing
Let n, N ≥ 1, let T ⊂ R^N, and let Γ be an n × N matrix.
Consider any vector x ∈ T. Assuming that Γx is known, the problem is to reconstruct x with a fast algorithm.
Hypotheses are placed on T and on Γ. The common hypothesis is that T = U_m, together with
the Restricted Isometry Property (RIP) of order m: for all m-sparse vectors x,
(1 − δ)|x| ≤ |Γx| ≤ (1 + δ)|x|.
The RIP parameter:
δ_m = δ_m(Γ) = sup_{x∈U_m} | |Γx|² − E|Γx|² |.
It was introduced by E. Candès, J. Romberg and T. Tao around 2006. If δ_{2m} is appropriately small, then every m-sparse vector x can be reconstructed from Γx by the ℓ_1-minimization method.
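As an illustrative aside, for tiny sizes δ_m of the normalized matrix Γ/√n can be computed exactly: over each m-column submatrix Γ_I, the sup over unit vectors supported in I equals the spectral norm of (1/n) Γ_I^T Γ_I − Id. A minimal sketch with Gaussian rows as the example:

```python
# Exact delta_m for Gamma/sqrt(n) on tiny instances: the largest spectral
# deviation of (1/n) Gamma_I^T Gamma_I from Id over all m-column subsets I.
import itertools
import numpy as np

def delta_m(G, m):
    n, N = G.shape
    best = 0.0
    for I in itertools.combinations(range(N), m):
        B = G[:, I] / np.sqrt(n)
        best = max(best, np.linalg.norm(B.T @ B - np.eye(m), 2))
    return best

rng = np.random.default_rng(5)
n, N = 50, 8
G = rng.standard_normal((n, N))
print(delta_m(G, 1), delta_m(G, 2))  # nondecreasing in m, since U_1 ⊂ U_2
```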
More notation
We want upper estimates for
δ_m = δ_m(Γ) = sup_{x∈U_m} | |Γx|² − E|Γx|² |,
and more generally, for any T ⊂ S^{N−1},
δ_T(Γ) = sup_{x∈T} | |Γx|² − E|Γx|² |.
Let X_1, . . . , X_n ∈ R^N be independent, and let Γ be the n × N matrix with rows X_i. (In reconstruction problems we look for vectors given by their measurements.)
Let 1 ≤ k ≤ n and define the parameter Γ_k(T) by
Γ_k(T)² = sup_{y∈T} sup_{I⊂{1,...,n}, |I|=k} ∑_{i∈I} |〈X_i, y〉|².
We write Γ_{k,m} = Γ_k(U_m). This agrees with the definition of A_{k,m} introduced earlier.
Fundamental Lemma:
[ALPT, CRAS], [ALLPT]: Let X_1, . . . , X_n ∈ R^N be independent isotropic, and let T ⊂ S^{N−1} be finite. Let 0 < θ < 1 and B ≥ 1. Then with probability at least 1 − |T| exp(−3θ²n/(8B²)),
δ_T(Γ/√n) = sup_{y∈T} | (1/n) ∑_{i=1}^n ( |〈X_i, y〉|² − E|〈X_i, y〉|² ) |
≤ θ + (1/n) ( sup_{y∈T} ∑_{i=1}^n |〈X_i, y〉|² 1_{{|〈X_i,y〉|>B}} + sup_{y∈T} E ∑_{i=1}^n |〈X_i, y〉|² 1_{{|〈X_i,y〉|>B}} )
≤ θ + (1/n) ( Γ_k(T)² + E Γ_k(T)² ),
where k ≤ n is the largest integer satisfying k ≤ (Γ_k(T)/B)².
Corollary for RIP:
Let X_i, Γ, 0 < θ < 1 and B ≥ 1 be as before. Assume that m ≤ N satisfies
m log(11eN/m) ≤ 3θ²n/(16B²).
Then with probability at least 1 − exp(−3θ²n/(16B²)) one has
δ_m(Γ/√n) = sup_{y∈U_m} | (1/n) ∑_{i=1}^n ( |〈X_i, y〉|² − E|〈X_i, y〉|² ) | ≤ 2θ + (2/n) ( Γ²_{k,m} + E Γ²_{k,m} ),
where k ≤ n is the largest integer satisfying k ≤ (Γ_{k,m}/B)².
RIP Theorem for matrices with independent rows:
Let n, N ≥ 1 and 0 < θ < 1. Let Γ be an n × N matrix whose rows are independent isotropic log-concave random vectors X_i, i ≤ n. There exists an absolute constant c > 0 such that if m ≤ N satisfies
m log log(3m) ( log( 3 max{N, n} / m ) )² ≤ c ( θ / log(3/θ) )² n,
then
δ_m(Γ/√n) ≤ θ
with high probability.
This is optimal up to the log log factor. For unconditional distributions we know that this factor can be removed; we conjecture that in general it can be removed as well.
Return to Large Deviation for Ak,m
Recall the result for A_{k,m}. For n ≤ N, k ≤ n, m ≤ N,
A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |〈X_j, x〉|² )^{1/2}.
Then for t ≥ 1 we have
P( A_{k,m} ≥ C t λ ) ≤ exp( −tλ/√(log(3m)) ),
where
λ = √(log log(3m)) √m log(eN/m) + √k log(en/k).
Ak,m, idea of proof
To bound A_{k,m} one then has to prove uniformity with respect to two families of different character: one being the family of subsets I ⊂ [1, N] with |I| = m, and the other the set U_k of k-sparse unit vectors in R^n.
Let X_1, . . . , X_n be independent isotropic N-dimensional log-concave vectors, and let x = (x_i) ∈ R^n satisfy some structural assumptions, like sparsity. We consider Y = ∑_{i=1}^n x_i X_i.
By duality we need to estimate, for every t > 0, the probability that
sup_{J⊂[1,N], |J|=m} | P_J( ∑_{i=1}^n x_i X_i ) | = sup_{J⊂[1,N], |J|=m} |P_J Y| > t,
depending on the norms |x| and ‖x‖_∞.
The complexity of these families is too high to use a union bound argument, so we need to come up with some chaining.
Ak,m, continuation
This leads us to distinguish two cases, depending on the relation between k and k′, where
k′ = inf{ ℓ ≥ 1 : m log(eN/m) ≤ ℓ log(en/ℓ) }.
Step 1: when k ≥ k′, we reduce to the case k ≤ k′.
Step 2: the case k ≤ k′.
To build intuition we may take k′ ∼ k.
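As an illustrative aside, k′ can be computed by a direct scan, since ℓ log(en/ℓ) is nondecreasing for ℓ ≤ n (its derivative in ℓ is log(n/ℓ) ≥ 0). A minimal helper (the function name and example values are ours):

```python
# k' = inf{ l >= 1 : m log(eN/m) <= l log(en/l) }, found by a direct scan;
# l * log(en/l) is nondecreasing for l <= n, so the first hit is the infimum.
import math

def k_prime(n, N, m):
    target = m * math.log(math.e * N / m)
    for l in range(1, n + 1):
        if target <= l * math.log(math.e * n / l):
            return l
    return n  # the threshold is not reached below n

print(k_prime(100, 1000, 5))
```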
Ak,m, Step 1
Step 1. We take only the family of k-sparse vectors, but do not need projections:
{ sup_{x∈U_k} | ∑_{i=1}^n x_i X_i | > t }.
Assume first that x is a "flat" vector: x_i = ±a or 0, with a = k^{−1/2}, where k = |supp(x)|. That is, |x| = 1 and ‖x‖_∞ = k^{−1/2}. A direct argument shows that the estimate is right for such vectors.
In general we may have 0 < |x_1| ≤ |x_2| ≤ . . . ≤ |x_k| and x_j = 0 for j > k, and x may consist of some number of "flat" pieces. The first natural try is to consider each flat piece separately and then add the results together. This works, but may produce an extra logarithmic factor.
Ak,m, Step 1, chaining
Chaining: let k_1 ∼ k/2, k_2 ∼ k/4, . . . , k_s ∼ k/2^s ∼ k′, so that ∑_{j=1}^s k_j ∼ k.
Given x ∈ U_k, let x_1 be the restriction of x to the k_1 largest coordinates, x_2 the restriction of x to the next k_2 largest coordinates, etc.
This way,
x = ∑_{i=1}^s x_i,
where the x_i have mutually disjoint supports, each of cardinality ≤ k_i, and the coordinates of x_i are larger than the coordinates of x_j if i < j.
We use Paouris-type estimates for each x_i. This is similar to ALPT (JAMS).
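As an illustrative aside, the decomposition into disjoint blocks of geometrically decreasing size, ordered by coordinate magnitude, can be sketched as follows; the concrete block sizes here are one arbitrary choice, not the exact ones used in the proof:

```python
# Sketch of the chaining decomposition: split a vector into pieces with
# disjoint supports of sizes ~ k/2, k/4, ..., grouped by coordinate magnitude,
# so that x = sum_i x_i. Block sizes are an illustrative choice only.
import numpy as np

def dyadic_blocks(x):
    idx = np.argsort(-np.abs(x))          # coordinate indices, largest first
    k = len(x)
    pieces, start, size = [], 0, (k + 1) // 2
    while start < k:
        piece = np.zeros_like(x)
        block = idx[start:start + size]
        piece[block] = x[block]
        pieces.append(piece)
        start += size
        size = max(1, size // 2)
    return pieces

x = np.array([0.1, -0.9, 0.3, 0.2, -0.5, 0.4])
pieces = dyadic_blocks(x)
assert np.allclose(sum(pieces), x)        # disjoint supports, summing to x
print(len(pieces))
```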
Ak,m, Step 2
Step 2. Another chaining argument, more delicate in the definitions of the ε-nets. We use the uniform estimate for projections of sums, which in general is weaker than in Case 1. At this step we lose the log log m factor.
Congratulations Alain!