Lecture 2: Randomized Iterative Methods for Linear Systems
February 21 - 28 2020
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 1 43
1 Pseudoinverse solutions of linear systems
Consider a linear system of equations
\[ Ax = b, \qquad A \in \mathbb{R}^{m \times n}, \quad b \in \mathbb{R}^m. \]
The system is called consistent if $b \in \mathrm{range}(A)$, otherwise inconsistent.
We are interested in the pseudoinverse solution $A^\dagger b$, where $A^\dagger$ denotes the Moore-Penrose pseudoinverse of $A$.

$Ax = b$        $\mathrm{rank}(A)$    $A^\dagger b$
consistent      $= n$                 unique solution
consistent      $< n$                 unique minimum $\ell_2$-norm solution
inconsistent    $= n$                 unique least-squares (LS) solution
inconsistent    $< n$                 unique minimum $\ell_2$-norm LS solution
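As a small illustration of two rows of the table above, the following sketch computes $A^\dagger b$ with NumPy. The helper name `pinv_solution` and the tiny test matrices are assumptions chosen for illustration; they are not part of the lecture.

```python
import numpy as np

def pinv_solution(A, b):
    """Return the pseudoinverse solution A^+ b (hypothetical helper name)."""
    return np.linalg.pinv(A) @ b

# Consistent, rank(A) < n: A^+ b is the minimum 2-norm solution of x1 + x2 = 2.
A1 = np.array([[1.0, 1.0]])
b1 = np.array([2.0])
x1 = pinv_solution(A1, b1)

# Inconsistent, rank(A) = n: A^+ b is the unique least-squares solution
# of the incompatible equations x = 0 and x = 2.
A2 = np.array([[1.0], [1.0]])
b2 = np.array([0.0, 2.0])
x2 = pinv_solution(A2, b2)

print(x1, x2)
```

In the first case every vector with $x_1 + x_2 = 2$ solves the system, and the pseudoinverse picks the one of smallest Euclidean norm. In the second case no exact solution exists, and the pseudoinverse returns the least-squares fit.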
2 Notation and preliminaries
For any random variable $\xi$, let $\mathbb{E}[\xi]$ denote its expectation.
For an integer $m \ge 1$, let
\[ [m] = \{1, 2, 3, \ldots, m\}. \]
For any vector $u \in \mathbb{R}^m$, we use $u^T$ and $\|u\|_2$ to denote the transpose and the Euclidean norm ($\ell_2$-norm) of $u$, respectively.
$I$ denotes the identity matrix whose order is clear from the context.
For any matrix $A \in \mathbb{R}^{m \times n}$, we use $A^T$, $A^\dagger$, $\|A\|_F$, $\mathrm{range}(A)$, and
\[ \sigma_1(A) \ge \sigma_2(A) \ge \cdots \ge \sigma_r(A) > 0 \]
to denote the transpose, the Moore-Penrose pseudoinverse, the Frobenius norm, the column space, and all the nonzero singular values of $A$, respectively. Obviously, $r$ is the rank of $A$.
For index sets $\mathcal{I} \subseteq [m]$ and $\mathcal{J} \subseteq [n]$, let $A_{\mathcal{I},:}$, $A_{:,\mathcal{J}}$, and $A_{\mathcal{I},\mathcal{J}}$ denote the row submatrix indexed by $\mathcal{I}$, the column submatrix indexed by $\mathcal{J}$, and the submatrix that lies in the rows indexed by $\mathcal{I}$ and the columns indexed by $\mathcal{J}$, respectively.
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ denote a partition of $[m]$, that is,
\[ \mathcal{I}_i \cap \mathcal{I}_j = \emptyset \ (i \ne j), \qquad \cup_{i=1}^{s} \mathcal{I}_i = [m]. \]
Let $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ denote a partition of $[n]$. Let
\[ \mathcal{P} = \{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\} \times \{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}. \]
Lemma 1
For any vector $u \in \mathbb{R}^m$ and any matrix $A \in \mathbb{R}^{m \times n}$, it holds
\[ u^T A A^T u \le \|A\|_F^2 \, u^T u. \]
Lemma 2
For any matrix $A \in \mathbb{R}^{m \times n}$ with rank $r$ and any vector $u \in \mathrm{range}(A)$, it holds
\[ u^T A A^T u \ge \sigma_r^2(A) \|u\|_2^2. \]
Lemma 3
Let $\alpha > 0$ and $A$ be any nonzero real matrix. For every $u \in \mathrm{range}(A)$, it holds
\[ \left\| \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right)^k u \right\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|u\|_2. \]
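The three lemmas can be sanity-checked numerically. The sketch below does so on one random rank-deficient matrix. The matrix sizes, the seed, the values of `alpha` and `k`, and the small floating-point slack factors are all assumptions made for illustration. A numerical check of this kind is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
# A 6 x 5 matrix of rank 3 (generically), built as a product of thin factors.
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))
fro2 = np.linalg.norm(A, "fro") ** 2

sigma = np.linalg.svd(A, compute_uv=False)
sigma = sigma[sigma > 1e-10 * sigma[0]]        # keep the nonzero singular values
u = A @ rng.standard_normal(5)                 # a vector in range(A)
alpha, k = 1.0, 7

quad = u @ A @ A.T @ u                         # u^T A A^T u
lemma1_holds = quad <= fro2 * (u @ u) * (1 + 1e-12)
lemma2_holds = quad >= sigma[-1] ** 2 * (u @ u) * (1 - 1e-8)

M = np.eye(6) - alpha * (A @ A.T) / fro2       # I - alpha A A^T / ||A||_F^2
delta = np.max(np.abs(1 - alpha * sigma**2 / fro2))
lhs = np.linalg.norm(np.linalg.matrix_power(M, k) @ u)
lemma3_holds = lhs <= delta**k * np.linalg.norm(u) * (1 + 1e-8)

print(lemma1_holds, lemma2_holds, lemma3_holds)
```

Note that Lemmas 2 and 3 need $u \in \mathrm{range}(A)$. For a vector with a component orthogonal to $\mathrm{range}(A)$, that component is left unchanged by $I - \alpha A A^T / \|A\|_F^2$ and the contraction bound fails.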
3 A doubly stochastic block Gauss-Seidel algorithm [2]
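Algorithm 1: Doubly stochastic block Gauss-Seidel (DSBGS)
    Let $\alpha > 0$. Initialize $x^0 \in \mathbb{R}^n$.
    for $k = 1, 2, \ldots$ do
        Pick $(\mathcal{I}, \mathcal{J}) \in \mathcal{P}$ with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2 / \|A\|_F^2$
        Set $x^k = x^{k-1} - \alpha \dfrac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b)$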
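Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function name `dsbgs`, its signature, the fixed iteration count, and the test problem are assumptions. The update touches only the entries of $x$ indexed by $\mathcal{J}$ and only needs the residual rows indexed by $\mathcal{I}$.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, x0=None, seed=0):
    """Sketch of DSBGS; row_blocks / col_blocks are lists of index lists."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    # Squared Frobenius norm of every block A_{I,J}; they sum to ||A||_F^2.
    norms = np.array([np.linalg.norm(A[np.ix_(I, J)], "fro") ** 2 for I, J in pairs])
    probs = norms / norms.sum()
    for _ in range(iters):
        idx = rng.choice(len(pairs), p=probs)
        I, J = pairs[idx]
        residual_I = A[I, :] @ x - b[I]                  # (I_{:,I})^T (A x - b)
        # x_J <- x_J - alpha * A_{I,J}^T residual_I / ||A_{I,J}||_F^2
        x[J] -= alpha * (A[np.ix_(I, J)].T @ residual_I) / norms[idx]
    return x

# Consistent, full-column-rank test problem (an assumption for illustration).
A = np.vstack([np.eye(4), np.eye(4)]) + 0.1
x_true = np.array([1.0, 2.0, 3.0, 4.0])
b = A @ x_true
rows = [[0, 1], [2, 3], [4, 5], [6, 7]]   # s = 4 row blocks
cols = [[0, 1], [2, 3]]                   # t = 2 column blocks
x = dsbgs(A, b, rows, cols, alpha=0.5, iters=3000)
print(np.linalg.norm(x - x_true))
```

The step size `alpha=0.5` is chosen below $2/t = 1$, which is the range assumed in the expected-norm analysis for full-column-rank consistent systems.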
Landweber [3] ($s = 1$ and $t = 1$):
\[ x^k = x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b). \]
Randomized Kaczmarz (RK) [7] ($s = m$, $t = 1$, $\alpha = 1$):
At step $k$, RK projects $x^{k-1}$ onto the hyperplane $\{x \mid A_{i,:} x = b_i\}$:
\[ x^k = x^{k-1} - \frac{A_{i,:} x^{k-1} - b_i}{\|A_{i,:}\|_2^2} (A_{i,:})^T, \]
where $A_{i,:}$ is the $i$th row of $A$ and $b_i$ is the $i$th component of $b$.
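The RK special case is short enough to write out directly. The sketch below assumes rows are sampled with probability $\|A_{i,:}\|_2^2 / \|A\|_F^2$, which is what the DSBGS sampling rule reduces to for $s = m$, $t = 1$. The function name `randomized_kaczmarz` and the small consistent test system are assumptions for illustration.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters, x0=None, seed=0):
    """Sketch of RK: project onto one randomly chosen hyperplane per step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms = np.einsum("ij,ij->i", A, A)     # squared 2-norm of each row
    probs = row_norms / row_norms.sum()
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Orthogonal projection of x onto {x : A[i] @ x = b[i]}.
        x -= (A[i] @ x - b[i]) / row_norms[i] * A[i]
    return x

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_true = np.array([1.0, 2.0])
b = A @ x_true                                  # consistent by construction
x_rk = randomized_kaczmarz(A, b, iters=500)
print(x_rk)
```

Each step costs one row of $A$, which is the main appeal of the method when $m$ is large.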
Randomized Gauss-Seidel [4][5] ($s = 1$, $t = n$, $\alpha = 1$):
\[ x^k = x^{k-1} - \frac{(A_{:,j})^T (A x^{k-1} - b)}{\|A_{:,j}\|_2^2} I_{:,j}, \]
where $A_{:,j}$ is the $j$th column of $A$ and $I_{:,j}$ is the $j$th column of the $n \times n$ identity matrix $I$.
Doubly stochastic Gauss-Seidel [6] ($s = m$, $t = n$):
\[ x^k = x^{k-1} - \alpha \frac{A_{i,j} (A_{i,:} x^{k-1} - b_i)}{|A_{i,j}|^2} I_{:,j}, \]
where $A_{i,j}$ is the $(i, j)$ entry of $A$.
3.1 Convergence of the norms of the expectations
Theorem 4
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system
\[ Ax = b \]
with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
\[ \|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2, \]
where
\[ x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b, \]
i.e., the projection of $x^0$ onto the solution set
\[ \{x \in \mathbb{R}^n \mid Ax = b\}. \]
Proof of Theorem 4
Note that the expectation conditioned on $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha \, \mathbb{E}\left[ \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right] (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \left( \sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \cdot \frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b).
\end{aligned}
\]
Then the conditioned expectation $\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]$ is given by
\[
\begin{aligned}
\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star^0) - x_\star^0 \\
&= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right) (x^{k-1} - x_\star^0).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - x_\star^0]
&= \mathbb{E}[\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]]
= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - x_\star^0] \\
&= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right)^k (x^0 - x_\star^0).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[ \|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2. \]
Here the inequality follows from the fact that
\[ x^0 - x_\star^0 = A^\dagger A x^0 - A^\dagger b \in \mathrm{range}(A^T) \]
and Lemma 3.
Remark 1
If $x^0 \in \mathrm{range}(A^T)$, then $x_\star^0 = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
\[ \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2 \|A\|_F^2}{\sigma_1^2(A)}. \]
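The step-size condition in Remark 1 is easy to evaluate from the singular values. The following sketch computes the contraction factor and the upper limit $2\|A\|_F^2 / \sigma_1^2(A)$ for a small diagonal example. The helper names `contraction_factor` and `max_step_size`, the relative rank tolerance, and the test matrix are assumptions for illustration.

```python
import numpy as np

def contraction_factor(A, alpha):
    """max_i |1 - alpha * sigma_i^2 / ||A||_F^2| over the nonzero sigma_i."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > 1e-12 * s[0]]                 # drop numerically zero values
    return np.max(np.abs(1.0 - alpha * s**2 / np.sum(s**2)))

def max_step_size(A):
    """Upper limit 2 ||A||_F^2 / sigma_1^2 from Remark 1."""
    s = np.linalg.svd(A, compute_uv=False)
    return 2.0 * np.sum(s**2) / s[0] ** 2

A = np.diag([2.0, 1.0])                     # sigma = (2, 1), ||A||_F^2 = 5
alpha_max = max_step_size(A)                # 2 * 5 / 4
delta_at_one = contraction_factor(A, 1.0)   # max(|1 - 4/5|, |1 - 1/5|)
print(alpha_max, delta_at_one)
```

The code uses the identity $\|A\|_F^2 = \sum_i \sigma_i^2(A)$, so no separate Frobenius norm computation is needed.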
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system
\[ Ax = b \]
with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
\[ \|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2, \]
where $x_\star$ is any solution of
\[ A^T A x = A^T b. \]
Proof of Theorem 5
Note that the expectation conditioned on $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]
&= A \left( \mathbb{E}[x^k \mid x^{k-1}] - x_\star \right) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x_\star \right) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) - x_\star \right) \quad (\text{by } A^T b = A^T A x_\star) \\
&= A x^{k-1} - A x_\star - \frac{\alpha A A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) (A x^{k-1} - A x_\star).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[A x^k - A x_\star]
&= \mathbb{E}[\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]] \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) \mathbb{E}[A x^{k-1} - A x_\star] \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right)^k (A x^0 - A x_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[ \|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2. \]
Here the inequality follows from the fact that
\[ A x^0 - A x_\star \in \mathrm{range}(A) \]
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system
\[ Ax = b \]
with arbitrary $x^0 \in \mathbb{R}^n$. Assume
\[ 0 < \alpha < 2/t. \]
In exact arithmetic, it holds
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2. \]
Proof of Theorem 6
Since the system is consistent, $b = A A^\dagger b$, so $A x^{k-1} - b = A (x^{k-1} - A^\dagger b)$ and
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) A (x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b).
\end{aligned}
\]
The last inequality follows from $(I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2) (x^{k-1} - A^\dagger b)^T \left( \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
\]
Remark 2
If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show
\[ x^k - x_\star^0 \in \mathrm{range}(A^T) \]
by induction, where
\[ x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b, \]
i.e., the projection of $x^0$ onto the set
\[ \{x \in \mathbb{R}^n \mid Ax = b\}. \]
Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
\[ \mathbb{E}[\|x^k - x_\star^0\|_2^2] \le \left( 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|x^0 - x_\star^0\|_2^2. \]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
\[ Ax = b \]
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A) / \|A\|_F^2$, then
\[ \mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2. \]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A) / (t\rho)$, then
\[ \mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2, \]
where
\[ \rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|A x^k - b\|_2^2
&= \left\| A x^{k-1} - \alpha \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - b \right\|_2^2 \\
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2 \]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|A x^k - b\|_2^2
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2 \, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2 \, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $A x^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I \]
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|A x^k - b\|_2^2
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho \, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho \, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $A x^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[ (A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star, \]
where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS(1, $t$) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A) / \|A\|_F^2$, then
\[ \mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2. \]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A) / (t\rho)$, then
\[ \mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2, \]
where
\[ \rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]
4 Randomized extended block Kaczmarz (REBK) [1]
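Algorithm 2: Randomized extended block Kaczmarz (REBK)
    Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
    Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
    for $k = 1, 2, \ldots$ do
        Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
        Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
        Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
        Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$
By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].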
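Algorithm 2 can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the function name `rebk`, the choice $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization requirements), the fixed iteration count, and the small inconsistent test system are all assumptions.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of REBK; row_blocks / col_blocks are lists of index lists."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    row_norms = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_norms = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_probs = row_norms / row_norms.sum()
    col_probs = col_norms / col_norms.sum()
    z = np.array(b, dtype=float)     # z^0 = b lies in b + range(A)
    x = np.zeros(n)                  # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step: drives z toward the part of b orthogonal to range(A).
        j = rng.choice(len(col_blocks), p=col_probs)
        AJ = A[:, col_blocks[j]]
        z -= alpha / col_norms[j] * (AJ @ (AJ.T @ z))
        # Row-block step on the corrected right-hand side b - z.
        i = rng.choice(len(row_blocks), p=row_probs)
        I = row_blocks[i]
        AI = A[I, :]
        x -= alpha / row_norms[i] * (AI.T @ (AI @ x - b[I] + z[I]))
    return x, z

# Inconsistent test system (an assumption for illustration).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])
# s = m, t = n, alpha = 1 corresponds to REK.
x_rek, z_rek = rebk(A, b, [[0], [1], [2]], [[0], [1]], alpha=1.0, iters=2000)
print(x_rek, z_rek)
```

For this test system the least-squares solution is $A^\dagger b = (0, 1)^T$ and $(I - A A^\dagger) b = (1, 1, -1)^T$, which is what the two returned vectors should approach.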
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,
\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\, \cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}], \]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as
\[ \mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\, \cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k]. \]
Then by the law of total expectation we have
\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}[\mathbb{E}_{k-1}^{i}[\cdot]]. \]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
\[ Ax = b, \]
let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T). \]
It holds
\[ \|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right), \]
where
\[ \delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right|. \]
Proof. By $A^T b = A^T A A^\dagger b$ and
\[ x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}), \]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (A x^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}[\mathbb{E}_{k-1}[x^k - A^\dagger b]] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[ x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^T z^0 \in \mathrm{range}(A^T), \]
and Lemma 3.
4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$
The convergence of
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \]
depends on the positive number $\rho$ defined as
\[ \rho = 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2}. \]
In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[ b_\perp = (I - A A^\dagger) b, \]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.
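The claim that every $z^0 \in b + \mathrm{range}(A)$ has the same projection $b_\perp$ onto $\{z \mid A^T z = 0\}$ can be checked numerically on a small case. The matrix, the right-hand side, and the shift vector below are assumptions chosen for illustration.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])              # not in range(A)

P_range = A @ np.linalg.pinv(A)            # orthogonal projector onto range(A)
b_perp = b - P_range @ b                   # (I - A A^+) b

z0 = b + A @ np.array([3.0, -5.0])         # some z0 in b + range(A)
z0_proj = z0 - P_range @ z0                # projection of z0 onto {z : A^T z = 0}

print(b_perp, z0_proj)
```

The shift $A w$ lies in $\mathrm{range}(A)$, so it is removed by $I - A A^\dagger$, and both projections agree.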
Lemma 9
For any given consistent or inconsistent linear system
\[ Ax = b, \]
let $z^k$ be the vector generated in REBK with
\[ z^0 \in b + \mathrm{range}(A). \]
It holds
\[ \mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2. \]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have
\[ z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1} \]
By $z^0 - b_\perp = A A^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
\[ z^k - b_\perp \in \mathrm{range}(A). \]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking the conditioned expectation on the first $k - 1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]] \\
&\le \rho \, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T). \]
For any $\varepsilon > 0$, it holds
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]
Proof. Let
\[ \tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b), \]
which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.
It follows from
\[ x^k - \tilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \]
that
\[
\begin{aligned}
\|x^k - \tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}
\tag{2}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \tilde{x}^k\|_2^2]
&= \mathbb{E}_{k-1}[\mathbb{E}_{k-1}^{i}[\|x^k - \tilde{x}^k\|_2^2]] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad (\text{by (2)})
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}[\|x^k - \tilde{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}[\|b - A A^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad (\text{by Lemma 9}).
\end{aligned}
\tag{3}
\]
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have
\[ x^0 - A^\dagger b \in \mathrm{range}(A^T). \]
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\tilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1}),
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|\tilde{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2 applied to } A^T),
\end{aligned}
\]
which yields
\[ \mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2] \le \rho \, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4} \]
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \tilde{x}^k + \tilde{x}^k - A^\dagger b\|_2^2 \\
&\le (\|x^k - \tilde{x}^k\|_2 + \|\tilde{x}^k - A^\dagger b\|_2)^2 \\
&\le \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \tilde{x}^k\|_2 \|\tilde{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \tilde{x}^k\|_2^2 + (1 + \varepsilon) \|\tilde{x}^k - A^\dagger b\|_2^2.
\end{aligned}
\tag{5}
\]
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}[\|x^k - \tilde{x}^k\|_2^2] + (1 + \varepsilon) \mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho \, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \bigl( 1 + (1 + \varepsilon) \bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2 \, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \bigl( 1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} \bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[ (\tilde{x}^k - A^\dagger b)^T (x^k - \tilde{x}^k) = 0, \]
the inequality (5) becomes the equality
\[ \|x^k - A^\dagger b\|_2^2 = \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2, \]
which yields the following convergence for REK:
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
1 Pseudoinverse solutions of linear systems
Consider a linear system of equations iℓ
Ax = b A isin Rmtimesn b isin Rm
The system is called consistent if b isin range(A) otherwiseinconsistent
We are interested in the pseudoinverse solution Adaggerb where Adagger
denotes the Moore-Penrose pseudoinverse of A
Ax = b rank(A) Adaggerbconsistent = n unique solutionconsistent lt n unique minimum ℓ2-norm solution
inconsistent = n unique least-squares (LS) solution
inconsistent lt n unique minimum ℓ2-norm LS solution
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 2 43
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 3 43
2 Notation and preliminaries
For any random variable ξ let E983045ξ983046denote its expectation
For an integer m ge 1 let
[m] = 1 2 3 m
For any vector u isin Rm we use uT and 983042u9830422 to denote thetranspose and the Euclidean norm (ℓ2-norm) of u respectively
I the identity matrix whose order is clear from the context
For any matrix A isin Rmtimesn we use AT Adagger 983042A983042F range(A)
σ1(A) ge σ2(A) ge middot middot middot ge σr(A) gt 0
to denote the transpose the Moore-Penrose pseudoinverse theFrobenius norm the column space and all the nonzero singularvalues of A respectively Obviously r is the rank of A
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 4 43
For index sets I sube [m] and J sube [n] let AI AJ and AIJdenote the row submatrix indexed by I the column submatrixindexed by J and the submatrix that lies in the rows indexed byI and the columns indexed by J respectively
Let I1 I2 Is denote a partition of [m] that is
Ii cap Ij = empty cupsi=1Ii = [m]
Let J1J2 Jt denote a partition of [n] Let
P = I1 I2 Istimes J1J2 Jt
Lemma 1
For any vector u isin Rm and any matrix A isin Rmtimesn it holds
uTAATu le 983042A9830422FuTu
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 5 43
Lemma 2
For any matrix A isin Rmtimesn with rank r and any vector u isin range(A) itholds
uTAATu ge σ2r (A)983042u98304222
Lemma 3
Let α gt 0 and A be any nonzero real matrix For every u isin range(A)it holds
983056983056983056983056983056
983061Iminus αAAT
983042A9830422F
983062k
u
9830569830569830569830569830562
le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042u9830422
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 6 43
3 A doubly stochastic block Gauss-Seidel algorithm [2]
Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)
Let α gt 0 Initialize x0 isin Rn
for k = 1 2 do
Pick (IJ ) isin P with probability983042AIJ 9830422F983042A9830422F
Set xk = xkminus1 minus αIJ (AIJ )T(II)T
983042AIJ 9830422F(Axkminus1 minus b)
Landweber [3] (s = 1 and t = 1)
xk = xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 7 43
Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)
At step k RK projects xkminus1 onto the hyperplane x | Aix = bi
xk = xkminus1 minus Aixkminus1 minus bi
983042Ai98304222(Ai)
T
where Ai is the ith row of A and bi is the ith component of b
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43
Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)
xk = xkminus1 minus (Aj)T(Axkminus1 minus b)
983042Aj98304222Ij
where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I
Doubly stochastic Gauss-Seidel [6] (s = m t = n)
xk = xkminus1 minus αAij(Aix
kminus1 minus bi)
|Aij |2Ij
where Aij is the (i j) entry of A
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43
31 Convergence of the norms of the expectations
Theorem 4
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
wherex0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the solution set
x isin Rn | Ax = b
Proof of Theorem 4
Note that the conditioned expectation on $x^{k-1}$:
$$\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\,\mathbb{E}\left[\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right](Ax^{k-1} - b) \\
&= x^{k-1} - \alpha\left(\sum_{(\mathcal I,\mathcal J)\in\mathcal P}\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\cdot\frac{\|A_{\mathcal I,\mathcal J}\|_F^2}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&= x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - b).
\end{aligned}$$
Then the conditioned expectation $\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]$ is given by
$$\begin{aligned}
\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x^0_\star \\
&= x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x^0_\star \\
&= x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - Ax^0_\star) - x^0_\star \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - x^0_\star).
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[x^k - x^0_\star]
&= \mathbb{E}\big[\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]\big]
= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - x^0_\star] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k (x^0 - x^0_\star).
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\|\mathbb{E}[x^k - x^0_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|x^0 - x^0_\star\|_2.$$
Here the inequality follows from the fact that
$$x^0 - x^0_\star = A^\dagger Ax^0 - A^\dagger b \in \mathrm{range}(A^T)$$
and Lemma 3.
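The first identity in the proof, that the conditional expectation of a DSBGS step is a Landweber step, can be checked numerically by enumerating every block with its sampling probability. The sketch below assumes a small random test problem; the partitions, the step size, and the helper name `dsbgs_step` are illustrative choices only.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
m, n, alpha = 6, 4, 0.7
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
x_prev = rng.standard_normal(n)

row_blocks = [np.array([0, 1]), np.array([2, 3, 4]), np.array([5])]  # partition of [m]
col_blocks = [np.array([0, 1]), np.array([2, 3])]                     # partition of [n]
fro2 = np.sum(A**2)                                                   # ||A||_F^2

def dsbgs_step(x, I, J):
    """One DSBGS update for the block (I, J): only the entries x[J] change."""
    AIJ = A[np.ix_(I, J)]
    r_I = (A @ x - b)[I]
    x_new = x.copy()
    x_new[J] -= alpha * AIJ.T @ r_I / np.sum(AIJ**2)
    return x_new

# Exact conditional expectation: sum over all blocks weighted by their probabilities.
expected = np.zeros(n)
for I, J in product(row_blocks, col_blocks):
    prob = np.sum(A[np.ix_(I, J)]**2) / fro2
    expected += prob * dsbgs_step(x_prev, I, J)

# Landweber step with the same alpha.
landweber = x_prev - alpha * A.T @ (A @ x_prev - b) / fro2
```

The factor $\|A_{\mathcal I,\mathcal J}\|_F^2$ in the sampling probability cancels the normalization of each update, which is why the two vectors agree.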
Remark 1
If $x^0 \in \mathrm{range}(A^T)$, then $x^0_\star = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
$$\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| < 1, \quad\text{i.e.,}\quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.$$
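As a rough illustration of Remark 1, the contraction factor and the admissible step-size range can be computed from the singular values. The helper name `contraction_factor`, the relative tolerance used to discard zero singular values, and the test matrix are assumptions made for this sketch.

```python
import numpy as np

def contraction_factor(A, alpha):
    """max_i |1 - alpha * sigma_i(A)^2 / ||A||_F^2| over the nonzero singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > 1e-12 * s[0]]                  # keep the numerically nonzero singular values
    return np.max(np.abs(1.0 - alpha * s**2 / np.linalg.norm(A, "fro")**2))

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
s = np.linalg.svd(A, compute_uv=False)
alpha_max = 2.0 * np.sum(s**2) / s[0]**2     # upper end of the range in Remark 1

inside = contraction_factor(A, 0.5 * alpha_max)    # step size inside the range
outside = contraction_factor(A, 1.1 * alpha_max)   # step size outside the range
```

For a step size inside the range the factor is below one, while a step size beyond `alpha_max` pushes the term for $\sigma_1(A)$ above one in absolute value.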
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system
$$Ax = b$$
with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
$$\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2,$$
where $x_\star$ is any solution of
$$A^TAx = A^Tb.$$
Proof of Theorem 5
Note that the conditioned expectation on $x^{k-1}$:
$$\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\big(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\big) \\
&= A\left(x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star\right) \\
&= A\left(x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star\right) \quad (\text{by } A^Tb = A^TAx_\star) \\
&= Ax^{k-1} - Ax_\star - \frac{\alpha AA^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\big[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\big] \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)^k (Ax^0 - Ax_\star).
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2.$$
Here the inequality follows from the fact that
$$Ax^0 - Ax_\star \in \mathrm{range}(A)$$
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system
$$Ax = b$$
with arbitrary $x^0 \in \mathbb{R}^n$. Assume
$$0 < \alpha < 2/t.$$
In exact arithmetic, it holds
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.$$
Proof of Theorem 6
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - A^\dagger b - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b)\right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\frac{A^TI_{:,\mathcal I}A_{\mathcal I,\mathcal J}(I_{:,\mathcal J})^TI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^4}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\frac{A^TI_{:,\mathcal I}(I_{:,\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b).
\end{aligned}$$
The second equality uses $b = AA^\dagger b$, which holds because the system is consistent. The last inequality follows from $(I_{:,\mathcal J})^TI_{:,\mathcal J} = I$ and Lemma 1. Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)(x^{k-1} - A^\dagger b)^T\frac{A^TA}{\|A\|_F^2}(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$
Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}$$
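The one-step inequality in the proof can be checked exactly on a small example by summing over all blocks with their probabilities, with no sampling involved. The problem sizes, partitions, and the choice $\alpha = 1/t$ below are assumptions made for this sketch.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
m, n = 6, 4
A = rng.standard_normal((m, n))            # full column rank with probability 1
x_star = rng.standard_normal(n)
b = A @ x_star                             # consistent, so A^+ b equals x_star
row_blocks = [np.arange(0, 3), np.arange(3, 6)]
col_blocks = [np.arange(0, 2), np.arange(2, 4)]
t = len(col_blocks)
alpha = 1.0 / t                            # inside the interval (0, 2/t)
fro2 = np.sum(A**2)
sigma_n = np.linalg.svd(A, compute_uv=False)[-1]

x_prev = rng.standard_normal(n)
lhs = 0.0                                  # E[ ||x^k - A^+ b||^2 | x^{k-1} ], computed exactly
for I, J in product(row_blocks, col_blocks):
    AIJ = A[np.ix_(I, J)]
    x_new = x_prev.copy()
    x_new[J] -= alpha * AIJ.T @ (A @ x_prev - b)[I] / np.sum(AIJ**2)
    lhs += np.sum(AIJ**2) / fro2 * np.sum((x_new - x_star)**2)

rate = 1.0 - (2 * alpha - t * alpha**2) * sigma_n**2 / fro2
rhs = rate * np.sum((x_prev - x_star)**2)
```

The quantity `rate` is the per-step factor of Theorem 6; it is smallest at $\alpha = 1/t$, where $2\alpha - t\alpha^2$ attains its maximum $1/t$.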
Remark 2
If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show
$$x^k - x^0_\star \in \mathrm{range}(A^T)$$
by induction, where
$$x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,$$
i.e., the projection of $x^0$ onto the set
$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$
Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
$$\mathbb{E}[\|x^k - x^0_\star\|_2^2] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x^0_\star\|_2^2.$$
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
$$Ax = b$$
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j}).$$
Proof of Theorem 7
Note that
$$\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\,\frac{AI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) - b\right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{I_{:,\mathcal I}A_{\mathcal I,\mathcal J}(I_{:,\mathcal J})^TA^TAI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}(Ax^{k-1} - b).
\end{aligned}$$
If $t = n$, then it follows from
$$(I_{:,\mathcal J})^TA^TAI_{:,\mathcal J} = \|A_{:,\mathcal J}\|_F^2$$
(since $AI_{:,\mathcal J} = A_{:,\mathcal J}$ is a column vector) that
$$\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\|A_{:,\mathcal J}\|_F^2\,I_{:,\mathcal I}A_{\mathcal I,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\|A_{:,\mathcal J}\|_F^2\,I_{:,\mathcal I}(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}$$
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}$$
If $t < n$, then it follows from
$$(I_{:,\mathcal J})^TA^TAI_{:,\mathcal J} = (A_{:,\mathcal J})^TA_{:,\mathcal J} \preceq \rho I$$
(since $\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j})$) that
$$\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\rho\,I_{:,\mathcal I}A_{\mathcal I,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\rho\,I_{:,\mathcal I}(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}$$
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}$$
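For the case $t < n$, the constants of Theorem 7 can be computed directly, and the one-step residual bound can be checked by exact enumeration over the blocks. The sketch below assumes a rank-deficient test matrix built as a product of two random factors and picks $\alpha$ at the midpoint of the admissible interval; these choices are illustrative only.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(8)
m, n = 6, 4
A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))   # rank 3 < n
b = A @ rng.standard_normal(n)                                   # consistent system
row_blocks = [np.arange(0, 3), np.arange(3, 6)]
col_blocks = [np.arange(0, 2), np.arange(2, 4)]                  # t = 2 < n
t = len(col_blocks)
fro2 = np.sum(A**2)
r = np.linalg.matrix_rank(A)
sigma_r = np.linalg.svd(A, compute_uv=False)[r - 1]              # smallest nonzero singular value
rho = max(np.linalg.norm(A[:, J], 2)**2 for J in col_blocks)    # max_j sigma_1^2 of column blocks
alpha = sigma_r**2 / (t * rho)                                   # midpoint of (0, 2 sigma_r^2 / (t rho))
rate = 1.0 - (2 * alpha * sigma_r**2 - t * rho * alpha**2) / fro2

x_prev = rng.standard_normal(n)
res_prev = A @ x_prev - b
expected_sq = 0.0                     # E[ ||A x^k - b||^2 | x^{k-1} ], computed exactly
for I, J in product(row_blocks, col_blocks):
    AIJ = A[np.ix_(I, J)]
    x_new = x_prev.copy()
    x_new[J] -= alpha * AIJ.T @ res_prev[I] / np.sum(AIJ**2)
    expected_sq += np.sum(AIJ**2) / fro2 * np.sum((A @ x_new - b)**2)
bound = rate * np.sum(res_prev**2)
```

Note that this bound controls the residual $Ax^k - b$ rather than the iterate itself, so it remains meaningful when $A$ is rank-deficient.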
Remark 3
Note that
$$(A_{:,\mathcal J})^Tb = (A_{:,\mathcal J})^TAx_\star,$$
where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j}).$$
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal I_1, \mathcal I_2, \ldots, \mathcal I_s\}$ and $\{\mathcal J_1, \mathcal J_2, \ldots, \mathcal J_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal J_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2}\, A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal I_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\, (A_{\mathcal I_i,:})^T\big(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}\big)$
By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
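A minimal NumPy sketch of Algorithm 2 follows. It assumes the starting points $z^0 = b$ and $x^0 = 0$, which satisfy the two initialization conditions; the function name `rebk`, the iteration count, and the test problem are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000, seed=0):
    """Sketch of REBK: returns the final iterate x^k."""
    rng = np.random.default_rng(seed)
    fro2 = np.sum(A**2)
    p_rows = np.array([np.sum(A[I, :]**2) for I in row_blocks]) / fro2
    p_cols = np.array([np.sum(A[:, J]**2) for J in col_blocks]) / fro2
    z = b.astype(float).copy()           # z^0 = b lies in b + range(A)
    x = np.zeros(A.shape[1])             # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step: drives z toward the part of b orthogonal to range(A).
        J = col_blocks[rng.choice(len(col_blocks), p=p_cols)]
        AJ = A[:, J]
        z -= alpha / np.sum(AJ**2) * (AJ @ (AJ.T @ z))
        # Row-block step on the corrected right-hand side b - z^k.
        I = row_blocks[rng.choice(len(row_blocks), p=p_rows)]
        AI = A[I, :]
        x -= alpha / np.sum(AI**2) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)                       # generically inconsistent
row_blocks = np.array_split(np.arange(30), 6)     # s = 6 row blocks
col_blocks = np.array_split(np.arange(5), 5)      # t = 5 column blocks
x = rebk(A, b, row_blocks, col_blocks)
x_ls = np.linalg.pinv(A) @ b                      # reference pseudoinverse solution
err = np.linalg.norm(x - x_ls)
```

The loop itself uses only matrix-vector products with blocks of $A$; the pseudoinverse appears only to form the reference solution for comparison.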
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as
$$\mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$
Then by the law of total expectation we have
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\cdot]\big].$$
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).$$
It holds
$$\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),$$
where
$$\delta = \max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.$$
Proof. By $A^Tb = A^TAA^\dagger b$ and
$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}(A_{\mathcal I_i,:})^T\big(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}\big),$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\,\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \frac{\alpha A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2}\left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}$$
Here the last inequality follows from the facts that
$$x^0 - A^\dagger b \in \mathrm{range}(A^T), \quad A^Tz^0 \in \mathrm{range}(A^T),$$
and Lemma 3.
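Because the expected iterates obey a deterministic linear recursion, the closed form and the bound of Theorem 8 can be checked without any sampling. The sketch below assumes a small full-column-rank test matrix, $z^0 = b$, and uses $\mathbb{E}[z^k] = (I - \alpha AA^T/\|A\|_F^2)^k z^0$, which is the same identity used inside the proof; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, alpha, K = 8, 3, 1.0, 25
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)                      # inconsistent in general
fro2 = np.sum(A**2)
x_ls = np.linalg.pinv(A) @ b                    # A^+ b
z0 = b.copy()                                   # z^0 in b + range(A)
x0 = A.T @ rng.standard_normal(m)               # x^0 in range(A^T)

Mx = np.eye(n) - alpha * A.T @ A / fro2
Mz = np.eye(m) - alpha * A @ A.T / fro2
s = np.linalg.svd(A, compute_uv=False)
delta = np.max(np.abs(1 - alpha * s**2 / fro2))

e, Ez = x0 - x_ls, z0.copy()                    # E[x^k - A^+ b] and E[z^k] at k = 0
ok = True
for k in range(1, K + 1):
    Ez = Mz @ Ez                                # E[z^k] = Mz E[z^{k-1}]
    e = Mx @ e - alpha * A.T @ Ez / fro2        # one step of the expected recursion
    Mxk = np.linalg.matrix_power(Mx, k)
    closed = Mxk @ (x0 - x_ls) - alpha * k * Mxk @ A.T @ z0 / fro2
    bound = delta**k * (np.linalg.norm(x0 - x_ls)
                        + alpha * k * np.linalg.norm(A.T @ z0) / fro2)
    ok = ok and np.allclose(e, closed) and np.linalg.norm(e) <= bound + 1e-12
```

The factor $k$ multiplying $\delta^k$ in the bound reflects the $k$ identical terms that accumulate when the recursion is unrolled.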
4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$
The convergence of
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$$
depends on the positive number $\rho$ defined as
$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
$$b_\perp = (I - AA^\dagger)b,$$
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $z^k$ be the vector generated in REBK with
$$z^0 \in b + \mathrm{range}(A).$$
It holds
$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k\|z^0 - b_\perp\|_2^2.$$
Proof. By $(A_{:,\mathcal J_j})^Tb_\perp = 0$, we have
$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2}A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp). \qquad (1)$$
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
$$z^k - b_\perp \in \mathrm{range}(A).$$
It follows from (1) that
$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal J_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,\mathcal J_j}(A_{:,\mathcal J_j})^TA_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}$$
Taking the conditioned expectation on the first $k - 1$ iterations yields
$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$
Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\,\mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}$$
This completes the proof.
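The one-step contraction at the heart of Lemma 9 can be checked exactly by enumerating the column blocks with their probabilities. The sketch assumes a full-column-rank test matrix, an arbitrary point of $b + \mathrm{range}(A)$ as $z^{k-1}$, and $\alpha = 1.2$; these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, alpha = 10, 4, 1.2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
fro2 = np.sum(A**2)
sigma_r = np.linalg.svd(A, compute_uv=False)[-1]      # A has full column rank here
rho = 1.0 - (2 * alpha - alpha**2) * sigma_r**2 / fro2

b_perp = b - A @ (np.linalg.pinv(A) @ b)               # (I - A A^+) b
z_prev = b + A @ rng.standard_normal(n)                # any point of b + range(A)
col_blocks = [np.array([0, 1]), np.array([2]), np.array([3])]

expected_sq = 0.0                                      # E_{k-1}[ ||z^k - b_perp||^2 ]
for J in col_blocks:
    AJ = A[:, J]
    z_new = z_prev - alpha / np.sum(AJ**2) * (AJ @ (AJ.T @ z_prev))
    expected_sq += np.sum(AJ**2) / fro2 * np.sum((z_new - b_perp)**2)

bound = rho * np.sum((z_prev - b_perp)**2)
```

Since $\rho$ depends on $\alpha$ only through $2\alpha - \alpha^2$, the factor is below one exactly when $0 < \alpha < 2$.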
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).$$
For any $\varepsilon > 0$, it holds
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
Proof. Let
$$\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}(A_{\mathcal I_i,:})^TA_{\mathcal I_i,:}(x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
$$x^k - \widetilde{x}^k = \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}(A_{\mathcal I_i,:})^T\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big)$$
that
$$\begin{aligned}
\|x^k - \widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4}\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big)^TA_{\mathcal I_i,:}(A_{\mathcal I_i,:})^T\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big) \\
&\le \frac{\alpha^2\|b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \quad (\text{by Lemma 1}). \qquad (2)
\end{aligned}$$
It follows from
$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widetilde{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\|x^k - \widetilde{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad (\text{by (2)})
\end{aligned}$$
that
$$\begin{aligned}
\mathbb{E}[\|x^k - \widetilde{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \quad (\text{by Lemma 9}). \qquad (3)
\end{aligned}$$
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have
$$x^0 - A^\dagger b \in \mathrm{range}(A^T).$$
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
$$\begin{aligned}
\|\widetilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal I_i,:})^TA_{\mathcal I_i,:}(A_{\mathcal I_i,:})^TA_{\mathcal I_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \quad (\text{by Lemma 1}),
\end{aligned}$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}[\|\widetilde{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2 applied to } A^T,\ \text{since } x^{k-1} - A^\dagger b \in \mathrm{range}(A^T)),
\end{aligned}$$
which yields
$$\mathbb{E}[\|\widetilde{x}^k - A^\dagger b\|_2^2] \le \rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \qquad (4)$$
Note that for any $\varepsilon > 0$, we have
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2
\le \big(\|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2\big)^2 \\
&= \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widetilde{x}^k\|_2\|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widetilde{x}^k\|_2^2 + (1 + \varepsilon)\|\widetilde{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}$$
Combining (3), (4), and (5) yields
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k - \widetilde{x}^k\|_2^2] + (1 + \varepsilon)\mathbb{E}[\|\widetilde{x}^k - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}$$
The last step uses $1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} = \big((1 + \varepsilon)^k - 1\big)/\varepsilon \le (1 + \varepsilon)^k/\varepsilon$.
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\widetilde{x}^k - A^\dagger b)^T(x^k - \widetilde{x}^k) = 0,$$
the equation (5) becomes
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence for REK:
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
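The orthogonality used in Remark 4 is easy to confirm numerically for a single REK step. In the sketch below the vector `z` is an arbitrary stand-in for $z^k$ (the orthogonality does not depend on its value), and the row index and test data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 7, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_ls = np.linalg.pinv(A) @ b                  # A^+ b
x_prev = A.T @ rng.standard_normal(m)         # previous iterate in range(A^T)
z = rng.standard_normal(m)                    # stand-in for z^k
i = 2
a = A[i, :]                                   # the selected row (s = m, so blocks are single rows)

x_new = x_prev - a * (a @ x_prev - b[i] + z[i]) / (a @ a)     # REK update with alpha = 1
x_tilde = x_prev - a * (a @ (x_prev - x_ls)) / (a @ a)        # auxiliary iterate from the proof

inner = (x_tilde - x_ls) @ (x_new - x_tilde)
lhs = np.sum((x_new - x_ls)**2)
rhs = np.sum((x_new - x_tilde)**2) + np.sum((x_tilde - x_ls)**2)
```

Here `x_tilde - x_ls` is orthogonal to the row `a` while `x_new - x_tilde` is a multiple of `a`, so the cross term vanishes and the Pythagorean identity holds without the factors involving $\varepsilon$.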
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 3 43
2 Notation and preliminaries
For any random variable ξ let E983045ξ983046denote its expectation
For an integer m ge 1 let
[m] = 1 2 3 m
For any vector u isin Rm we use uT and 983042u9830422 to denote thetranspose and the Euclidean norm (ℓ2-norm) of u respectively
I the identity matrix whose order is clear from the context
For any matrix A isin Rmtimesn we use AT Adagger 983042A983042F range(A)
σ1(A) ge σ2(A) ge middot middot middot ge σr(A) gt 0
to denote the transpose the Moore-Penrose pseudoinverse theFrobenius norm the column space and all the nonzero singularvalues of A respectively Obviously r is the rank of A
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 4 43
For index sets I sube [m] and J sube [n] let AI AJ and AIJdenote the row submatrix indexed by I the column submatrixindexed by J and the submatrix that lies in the rows indexed byI and the columns indexed by J respectively
Let I1 I2 Is denote a partition of [m] that is
Ii cap Ij = empty cupsi=1Ii = [m]
Let J1J2 Jt denote a partition of [n] Let
P = I1 I2 Istimes J1J2 Jt
Lemma 1
For any vector u isin Rm and any matrix A isin Rmtimesn it holds
uTAATu le 983042A9830422FuTu
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 5 43
Lemma 2
For any matrix A isin Rmtimesn with rank r and any vector u isin range(A) itholds
uTAATu ge σ2r (A)983042u98304222
Lemma 3
Let α gt 0 and A be any nonzero real matrix For every u isin range(A)it holds
983056983056983056983056983056
983061Iminus αAAT
983042A9830422F
983062k
u
9830569830569830569830569830562
le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042u9830422
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 6 43
3 A doubly stochastic block Gauss-Seidel algorithm [2]
Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)
Let α gt 0 Initialize x0 isin Rn
for k = 1 2 do
Pick (IJ ) isin P with probability983042AIJ 9830422F983042A9830422F
Set xk = xkminus1 minus αIJ (AIJ )T(II)T
983042AIJ 9830422F(Axkminus1 minus b)
Landweber [3] (s = 1 and t = 1)
xk = xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 7 43
Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)
At step k RK projects xkminus1 onto the hyperplane x | Aix = bi
xk = xkminus1 minus Aixkminus1 minus bi
983042Ai98304222(Ai)
T
where Ai is the ith row of A and bi is the ith component of b
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43
Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)
xk = xkminus1 minus (Aj)T(Axkminus1 minus b)
983042Aj98304222Ij
where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I
Doubly stochastic Gauss-Seidel [6] (s = m t = n)
xk = xkminus1 minus αAij(Aix
kminus1 minus bi)
|Aij |2Ij
where Aij is the (i j) entry of A
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43
31 Convergence of the norms of the expectations
Theorem 4
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
wherex0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the solution set
x isin Rn | Ax = b
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 10 43
Proof of Theorem 4
Note that the conditioned expectation on xkminus1
E[xk |xkminus1]
= xkminus1 minus αE983063IJ (AIJ )T(II)T
983042AIJ 9830422F
983064(Axkminus1 minus b)
= xkminus1 minus α
983091
983107983131
(IJ )isinP
IJ (AIJ )T(II)T
983042AIJ 9830422F983042AIJ 9830422F983042A9830422F
983092
983108 (Axkminus1 minus b)
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Then the conditioned expectation E[xk minus x0983183 |xkminus1] is given by
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 11 43
E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx0
983183)minus x0983183
=
983061Iminus αATA
983042A9830422F
983062(xkminus1 minus x0
983183)
Taking expectation gives
E[xk minus x0983183] = E[E[xk minus x0
983183 |xkminus1]] =
983061Iminus αATA
983042A9830422F
983062E[xkminus1 minus x0
983183]
=
983061Iminus αATA
983042A9830422F
983062k
(x0 minus x0983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43
Applying the norms to both sides we obtain
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
Here the inequality follows from the fact that
x0 minus x0983183 = AdaggerAx
0 minusAdaggerb isin range(AT)
and Lemma 3
Remark 1
If x0 isin range(AT) then x0983183 = Adaggerb
To ensure convergence of the expected iterate it suffices to have
max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43
Theorem 5
Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
where x983183 is any solution of
ATAx = ATb
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43
Proof of Theorem 5
Note that the conditioned expectation on xkminus1
E[Axk minusAx983183 |xkminus1]
= A(E[xk |xkminus1]minus x983183)
= A
983061xkminus1 minus α
AT
983042A9830422F(Axkminus1 minus b)minus x983183
983062
= A
983061xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx983183)minus x983183
983062
(by ATb = ATAx983183)
= Axkminus1 minusAx983183 minusαAAT
983042A9830422F(Axkminus1 minusAx983183)
=
983061Iminus αAAT
983042A9830422F
983062(Axkminus1 minusAx983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43
Taking expectation gives
E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]
=
983061Iminus αAAT
983042A9830422F
983062E[Axkminus1 minusAx983183]
=
983061Iminus αAAT
983042A9830422F
983062k
(Ax0 minusAx983183)
Applying the norms to both sides we obtain
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
Here the inequality follows from the fact that
Ax0 minusAx983183 isin range(A)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43
32 Convergence of the expected norms
Theorem 6
Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system
Ax = b
with arbitrary x0 isin Rn Assume
0 lt α lt 2t
In exact arithmetic it holds
E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If t = 1 and x0 isin range(AT) we can show
xk minus x0983183 isin range(AT)
by induction where
x0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the set
x isin Rn | Ax = b
Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound
E[983042xk minus x098318398304222] le
9830611minus (2αminus α2)σ2
r (A)
983042A9830422F
983062k
983042x0 minus x098318398304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43
Theorem 7
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)
Ax = b
with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then
E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minus b98304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2 \]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{AA^T}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2] &= \mathbb{E}[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right) \mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I \]
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{AA^T}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2] &= \mathbb{E}[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
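The residual bound proved above can be observed on a single sample path. The following is a minimal sketch of the DSBGS update from Algorithm 1 (pick a block $(\mathcal{I}, \mathcal{J})$ with probability proportional to its squared Frobenius norm, then update only the coordinates in $\mathcal{J}$). The function name `dsbgs`, the test matrix, the partitions, the step size, and the seed are assumptions made for this example, and one run is an illustration of the expected behaviour rather than a verification of the bound.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of DSBGS starting from x^0 = 0 (assumes A is not the zero matrix)."""
    rng = np.random.default_rng(seed)
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    norms2 = np.array([np.linalg.norm(A[np.ix_(I, J)], "fro") ** 2 for I, J in pairs])
    probs = norms2 / norms2.sum()          # zero blocks get probability 0
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        idx = rng.choice(len(pairs), p=probs)
        I, J = pairs[idx]
        A_IJ = A[np.ix_(I, J)]
        r_I = A[I, :] @ x - b[I]           # rows I of the residual A x - b
        x[J] -= alpha * (A_IJ.T @ r_I) / norms2[idx]
    return x

# Hypothetical consistent system; here sigma_r^2(A) / ||A||_F^2 = 1/2.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
x_true = np.array([1.0, -2.0])
b = A @ x_true

# t = n = 2 (single-column blocks), s = 2 row blocks, alpha inside (0, 1).
x = dsbgs(A, b, row_blocks=[[0, 1], [2, 3]], col_blocks=[[0], [1]], alpha=0.5, iters=400)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))   # relative residual, expected to be tiny
```

With these values the factor in Theorem 7 is $1 + 0.25 - 0.5 = 0.75$ per iteration, so after a few hundred iterations the residual is expected to sit at rounding level.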
Remark 3
Note that
\[ (A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star, \]
where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[ \mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2. \]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[ \mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2, \]
where
\[ \rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]
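4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
  Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$.
  Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$.
  Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$.
  Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$.
By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].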
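Algorithm 2 translates almost line by line into NumPy. The sketch below is one possible rendering, not the authors' reference code: the function name `rebk`, the choice $z^0 = b$ and $x^0 = 0$ (which satisfy the two initialization conditions), the test problem, and the seed are all assumptions, and every block is assumed to have nonzero Frobenius norm.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=300, seed=0):
    """Sketch of REBK; returns the final x and z."""
    rng = np.random.default_rng(seed)
    fro2 = np.linalg.norm(A, "fro") ** 2
    row_norms2 = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_norms2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    z = b.astype(float).copy()       # z^0 = b lies in b + range(A)
    x = np.zeros(A.shape[1])         # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: drive z towards the part of b orthogonal to range(A).
        j = rng.choice(len(col_blocks), p=col_norms2 / fro2)
        A_J = A[:, col_blocks[j]]
        z -= alpha / col_norms2[j] * (A_J @ (A_J.T @ z))
        # Row step: block Kaczmarz update on the corrected right-hand side b - z.
        i = rng.choice(len(row_blocks), p=row_norms2 / fro2)
        I = row_blocks[i]
        A_I = A[I, :]
        x -= alpha / row_norms2[i] * (A_I.T @ (A_I @ x - b[I] + z[I]))
    return x, z

# Hypothetical inconsistent least-squares problem.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x, z = rebk(A, b, row_blocks=[[0, 1], [2, 3]], col_blocks=[[0], [1]])
x_ls = np.linalg.pinv(A) @ b
print(np.linalg.norm(x - x_ls))      # expected to be at rounding level
```

Passing single-index blocks for both partitions together with `alpha=1.0` gives the REK special case mentioned above.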
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}], \]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[ \mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k]. \]
Then, by the law of total expectation, we have
\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big]. \]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
\[ Ax = b, \]
let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T). \]
It holds
\[ \|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k\, \|A^T z^0\|_2}{\|A\|_F^2} \right), \]
where
\[ \delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2} \right|. \]
Proof. By $A^T b = A^T A A^\dagger b$ and
\[ x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big), \]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha\,\frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha\,\frac{A^T}{\|A\|_F^2}\, \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\, \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2} \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2}\, z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2}\, \mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]
Here the third equality uses $\mathbb{E}[z^{k-1}] = \left(I - \alpha AA^T/\|A\|_F^2\right)^{k-1} z^0$ together with $A^T \left(I - \alpha AA^T/\|A\|_F^2\right)^{k-1} = \left(I - \alpha A^T A/\|A\|_F^2\right)^{k-1} A^T$.
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k\, \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[ x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^T z^0 \in \mathrm{range}(A^T), \]
and Lemma 3.
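Because the proof gives $\mathbb{E}[x^k - A^\dagger b]$ in closed form, the bound of Theorem 8 can be checked without any sampling. The sketch below evaluates both sides; the function names and the test data are assumptions for illustration, with $x^0 = 0$ and $z^0 = b$ chosen so that the initialization conditions hold.

```python
import numpy as np

def expected_error_closed_form(A, b, x0, z0, alpha, k):
    """Closed form of E[x^k - A^+ b] derived in the proof of Theorem 8."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    M = np.eye(A.shape[1]) - alpha * (A.T @ A) / fro2
    Mk = np.linalg.matrix_power(M, k)
    x_dagger = np.linalg.pinv(A) @ b
    return Mk @ (x0 - x_dagger) - alpha * k * (Mk @ (A.T @ z0)) / fro2

def theorem8_bound(A, b, x0, z0, alpha, k):
    """Right-hand side of Theorem 8: delta^k (||x0 - A^+ b|| + alpha k ||A^T z0|| / ||A||_F^2)."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    sing = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)
    delta = np.max(np.abs(1.0 - alpha * sing[:r] ** 2 / fro2))
    x_dagger = np.linalg.pinv(A) @ b
    return delta**k * (np.linalg.norm(x0 - x_dagger)
                       + alpha * k * np.linalg.norm(A.T @ z0) / fro2)

# Hypothetical inconsistent system.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x0, z0 = np.zeros(2), b.copy()
for k in (1, 5, 10):
    lhs = np.linalg.norm(expected_error_closed_form(A, b, x0, z0, 1.0, k))
    rhs = theorem8_bound(A, b, x0, z0, 1.0, k)
    print(k, lhs, rhs)
```

For this particular matrix $A^T A$ is a multiple of the identity, so the two sides coincide up to rounding; a less symmetric matrix would typically show a strict gap.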
4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$
The convergence of
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \]
depends on the positive number $\rho$ defined as
\[ \rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}. \]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[ b_\perp = (I - AA^\dagger)b, \]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
\[ Ax = b, \]
let $z^k$ be the vector generated in REBK with
\[ z^0 \in b + \mathrm{range}(A). \]
It holds
\[ \mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2. \]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have
\[ z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1} \]
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
\[ z^k - b_\perp \in \mathrm{range}(A). \]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\, \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}\, (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking the conditioned expectation on the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\, \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2] &= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
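For a very small problem the expectation in Lemma 9 can be computed exactly by enumerating every sequence of column-block choices, which avoids Monte Carlo noise. The sketch below does this and compares the result with $\rho^k \|z^0 - b_\perp\|_2^2$. The helper names, the matrix, the partition, and $\alpha = 0.5$ are assumptions for this example, and the enumeration costs $t^k$ paths, so it is only practical for tiny $t$ and $k$.

```python
import itertools
import numpy as np

def exact_expected_z_error(A, b, col_blocks, alpha, k):
    """Exact E[||z^k - b_perp||^2] with z^0 = b, by enumerating all block sequences."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    norms2 = [np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks]
    b_perp = b - A @ (np.linalg.pinv(A) @ b)       # (I - A A^+) b
    total = 0.0
    for path in itertools.product(range(len(col_blocks)), repeat=k):
        z, weight = b.copy(), 1.0
        for j in path:
            A_J = A[:, col_blocks[j]]
            z = z - alpha / norms2[j] * (A_J @ (A_J.T @ z))
            weight *= norms2[j] / fro2             # probability of picking block j
        total += weight * np.linalg.norm(z - b_perp) ** 2
    return total

def lemma9_bound(A, b, alpha, k):
    """rho^k ||z^0 - b_perp||^2 with z^0 = b."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    sing = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)
    rho = 1.0 - (2.0 * alpha - alpha**2) * sing[r - 1] ** 2 / fro2
    b_perp = b - A @ (np.linalg.pinv(A) @ b)
    return rho**k * np.linalg.norm(b - b_perp) ** 2

# Hypothetical 3 x 2 problem with non-orthogonal columns.
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([1.0, 0.0, 2.0])
for k in (1, 3, 6):
    print(k, exact_expected_z_error(A, b, [[0], [1]], 0.5, k), lemma9_bound(A, b, 0.5, k))
```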
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T). \]
For any $\varepsilon > 0$, it holds
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\, \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]
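The free parameter $\varepsilon$ trades the geometric rate $(1+\varepsilon)\rho$ against the constant in front, and the bound only decays when $(1+\varepsilon)\rho < 1$. The following sketch evaluates the right-hand side of Theorem 10 for given data; the function name `theorem10_bound` and the test values are assumptions for illustration.

```python
import numpy as np

def theorem10_bound(A, b, x0, z0, alpha, k, eps):
    """Right-hand side of Theorem 10 for a given eps > 0."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    sing = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)
    rho = 1.0 - (2.0 * alpha - alpha**2) * sing[r - 1] ** 2 / fro2
    x_dagger = np.linalg.pinv(A) @ b
    b_perp = b - A @ x_dagger                      # (I - A A^+) b
    const = ((1.0 + eps) * alpha**2 * np.linalg.norm(z0 - b_perp) ** 2 / (eps**2 * fro2)
             + np.linalg.norm(x0 - x_dagger) ** 2)
    return ((1.0 + eps) * rho) ** k * const

# Hypothetical data: here rho = 0.5 for alpha = 1, so eps = 0.5 gives rate 0.75.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x0, z0 = np.zeros(2), b.copy()
for k in (0, 10, 20):
    print(k, theorem10_bound(A, b, x0, z0, alpha=1.0, k=k, eps=0.5))
```

For these values the constant equals $260/9$, so the printed bound at $k = 0$ is about 28.89 and shrinks by a factor $0.75$ per iteration.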
Proof. Let
\[ \tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b), \]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
\[ x^k - \tilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big) \]
that
\[
\begin{aligned}
\|x^k - \tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\, \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big)^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big) \\
&\le \frac{\alpha^2\, \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\tag{2}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \tilde{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \tilde{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2\, \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad \text{(by (2))}
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}[\|x^k - \tilde{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\, \mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned}
\tag{3}
\]
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have
\[ x^0 - A^\dagger b \in \mathrm{range}(A^T). \]
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\tilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\, \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\, (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)},
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|\tilde{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\, \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2, applied to $A^T$)},
\end{aligned}
\]
which yields
\[ \mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2] \le \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4} \]
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \tilde{x}^k + \tilde{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \tilde{x}^k\|_2 + \|\tilde{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \tilde{x}^k\|_2 \|\tilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \tilde{x}^k\|_2^2 + (1+\varepsilon)\, \|\tilde{x}^k - A^\dagger b\|_2^2.
\end{aligned}
\tag{5}
\]
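The last step of (5) is the weighted arithmetic-geometric mean (Young) inequality. A short derivation, using the shorthand $a = \|x^k - \tilde{x}^k\|_2$ and $c = \|\tilde{x}^k - A^\dagger b\|_2$ (introduced here only for readability, not taken from the slides):

```latex
\[
0 \le \left(\frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,c\right)^2
  = \frac{a^2}{\varepsilon} - 2ac + \varepsilon c^2
\quad\Longrightarrow\quad
2ac \le \frac{a^2}{\varepsilon} + \varepsilon c^2 ,
\]
\[
a^2 + c^2 + 2ac \le \left(1 + \frac{1}{\varepsilon}\right) a^2 + (1+\varepsilon)\,c^2 .
\]
```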
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}[\|x^k - \tilde{x}^k\|_2^2] + (1+\varepsilon)\, \mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) (1 + 1 + \varepsilon)\, \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + 1 + \varepsilon + \cdots + (1+\varepsilon)^{k-1}\big)\, \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\, \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
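The final inequality in this chain rests on bounding a geometric sum. Spelled out as a worked step (this intermediate computation is filled in here and is not shown on the slides):

```latex
\[
\left(1 + \frac{1}{\varepsilon}\right) \sum_{i=0}^{k-1} (1+\varepsilon)^i
= \frac{1+\varepsilon}{\varepsilon} \cdot \frac{(1+\varepsilon)^k - 1}{\varepsilon}
\le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
= (1+\varepsilon)^k\, \frac{1+\varepsilon}{\varepsilon^2} .
\]
```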
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[ (\tilde{x}^k - A^\dagger b)^T (x^k - \tilde{x}^k) = 0, \]
the equation (5) becomes
\[ \|x^k - A^\dagger b\|_2^2 = \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2, \]
which yields the following convergence for REK:
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k\, \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]
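The orthogonality used in Remark 4 is easy to confirm numerically for one REK step: with a single row and $\alpha = 1$, $\tilde{x}^k - A^\dagger b$ is orthogonal to that row, while $x^k - \tilde{x}^k$ is a multiple of it. In the sketch below the matrix, the previous iterate, the vector `z`, and the chosen row index are hypothetical values picked only to satisfy the assumptions of the remark.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([1.0, 0.0, 2.0])
x_dagger = np.linalg.pinv(A) @ b

x_prev = A.T @ np.array([0.3, -0.2, 0.5])   # some point in range(A^T)
z = b - 0.4 * A[:, 0]                        # some point in b + range(A)
i = 1                                        # row picked in this step (s = m)
a_i = A[i, :]

# One exact projection step for Ax = A A^+ b (the auxiliary iterate x-tilde).
x_tilde = x_prev - a_i * (a_i @ (x_prev - x_dagger)) / (a_i @ a_i)
# The actual REK step with alpha = 1.
x_new = x_prev - a_i * (a_i @ x_prev - b[i] + z[i]) / (a_i @ a_i)

inner = (x_tilde - x_dagger) @ (x_new - x_tilde)
lhs = np.linalg.norm(x_new - x_dagger) ** 2
rhs = np.linalg.norm(x_new - x_tilde) ** 2 + np.linalg.norm(x_tilde - x_dagger) ** 2
print(inner, lhs - rhs)                      # both expected to be at rounding level
```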
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
2 Notation and preliminaries
For any random variable ξ let E983045ξ983046denote its expectation
For an integer m ge 1 let
[m] = 1 2 3 m
For any vector u isin Rm we use uT and 983042u9830422 to denote thetranspose and the Euclidean norm (ℓ2-norm) of u respectively
I the identity matrix whose order is clear from the context
For any matrix A isin Rmtimesn we use AT Adagger 983042A983042F range(A)
σ1(A) ge σ2(A) ge middot middot middot ge σr(A) gt 0
to denote the transpose the Moore-Penrose pseudoinverse theFrobenius norm the column space and all the nonzero singularvalues of A respectively Obviously r is the rank of A
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 4 43
For index sets I sube [m] and J sube [n] let AI AJ and AIJdenote the row submatrix indexed by I the column submatrixindexed by J and the submatrix that lies in the rows indexed byI and the columns indexed by J respectively
Let I1 I2 Is denote a partition of [m] that is
Ii cap Ij = empty cupsi=1Ii = [m]
Let J1J2 Jt denote a partition of [n] Let
P = I1 I2 Istimes J1J2 Jt
Lemma 1
For any vector u isin Rm and any matrix A isin Rmtimesn it holds
uTAATu le 983042A9830422FuTu
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 5 43
Lemma 2
For any matrix A isin Rmtimesn with rank r and any vector u isin range(A) itholds
uTAATu ge σ2r (A)983042u98304222
Lemma 3
Let α gt 0 and A be any nonzero real matrix For every u isin range(A)it holds
983056983056983056983056983056
983061Iminus αAAT
983042A9830422F
983062k
u
9830569830569830569830569830562
le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042u9830422
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 6 43
3 A doubly stochastic block Gauss-Seidel algorithm [2]
Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)
Let α gt 0 Initialize x0 isin Rn
for k = 1 2 do
Pick (IJ ) isin P with probability983042AIJ 9830422F983042A9830422F
Set xk = xkminus1 minus αIJ (AIJ )T(II)T
983042AIJ 9830422F(Axkminus1 minus b)
Landweber [3] (s = 1 and t = 1)
xk = xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 7 43
Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)
At step k RK projects xkminus1 onto the hyperplane x | Aix = bi
xk = xkminus1 minus Aixkminus1 minus bi
983042Ai98304222(Ai)
T
where Ai is the ith row of A and bi is the ith component of b
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43
Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)
xk = xkminus1 minus (Aj)T(Axkminus1 minus b)
983042Aj98304222Ij
where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I
Doubly stochastic Gauss-Seidel [6] (s = m t = n)
xk = xkminus1 minus αAij(Aix
kminus1 minus bi)
|Aij |2Ij
where Aij is the (i j) entry of A
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43
31 Convergence of the norms of the expectations
Theorem 4
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
wherex0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the solution set
x isin Rn | Ax = b
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 10 43
Proof of Theorem 4
Note that the conditioned expectation on xkminus1
E[xk |xkminus1]
= xkminus1 minus αE983063IJ (AIJ )T(II)T
983042AIJ 9830422F
983064(Axkminus1 minus b)
= xkminus1 minus α
983091
983107983131
(IJ )isinP
IJ (AIJ )T(II)T
983042AIJ 9830422F983042AIJ 9830422F983042A9830422F
983092
983108 (Axkminus1 minus b)
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Then the conditioned expectation E[xk minus x0983183 |xkminus1] is given by
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 11 43
E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx0
983183)minus x0983183
=
983061Iminus αATA
983042A9830422F
983062(xkminus1 minus x0
983183)
Taking expectation gives
E[xk minus x0983183] = E[E[xk minus x0
983183 |xkminus1]] =
983061Iminus αATA
983042A9830422F
983062E[xkminus1 minus x0
983183]
=
983061Iminus αATA
983042A9830422F
983062k
(x0 minus x0983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43
Applying the norms to both sides we obtain
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
Here the inequality follows from the fact that
x0 minus x0983183 = AdaggerAx
0 minusAdaggerb isin range(AT)
and Lemma 3
Remark 1
If x0 isin range(AT) then x0983183 = Adaggerb
To ensure convergence of the expected iterate it suffices to have
max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43
Theorem 5
Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
where x983183 is any solution of
ATAx = ATb
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43
Proof of Theorem 5
Note that the conditioned expectation on xkminus1
E[Axk minusAx983183 |xkminus1]
= A(E[xk |xkminus1]minus x983183)
= A
983061xkminus1 minus α
AT
983042A9830422F(Axkminus1 minus b)minus x983183
983062
= A
983061xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx983183)minus x983183
983062
(by ATb = ATAx983183)
= Axkminus1 minusAx983183 minusαAAT
983042A9830422F(Axkminus1 minusAx983183)
=
983061Iminus αAAT
983042A9830422F
983062(Axkminus1 minusAx983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43
Taking expectation gives
E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]
=
983061Iminus αAAT
983042A9830422F
983062E[Axkminus1 minusAx983183]
=
983061Iminus αAAT
983042A9830422F
983062k
(Ax0 minusAx983183)
Applying the norms to both sides we obtain
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
Here the inequality follows from the fact that
Ax0 minusAx983183 isin range(A)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43
32 Convergence of the expected norms
Theorem 6
Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system
Ax = b
with arbitrary x0 isin Rn Assume
0 lt α lt 2t
In exact arithmetic it holds
E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If t = 1 and x0 isin range(AT) we can show
xk minus x0983183 isin range(AT)
by induction where
x0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the set
x isin Rn | Ax = b
Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound
E[983042xk minus x098318398304222] le
9830611minus (2αminus α2)σ2
r (A)
983042A9830422F
983062k
983042x0 minus x098318398304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43
Theorem 7
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)
Ax = b
with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then
E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minus b98304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43
Proof of Theorem 7
Note that
983042Axk minus b98304222
=
983056983056983056983056Axkminus1 minus α
983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minus b
9830569830569830569830562
2
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
If t = n then it follows from
(IJ )TATAIJ = 983042AJ 9830422F
(since AIJ = AJ is a column vector) that
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43
983042Axk minus b98304222
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611 + α2 minus 2ασ2
r(A)
983042A9830422F
983062983042Axkminus1 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43
If t lt n then it follows from
(IJ )TATAIJ = AT
JAJ ≼ ρI
(since ρ = max1lejlet σ21(AJj )) that
983042Axk minus b98304222
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρII(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do
Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α
983042AJj9830422FAJj (AJj )
Tzkminus1
Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43
Let Ekminus1
983045middot983046denote the conditional expectation conditioned on the first
k minus 1 iterations of REBK That is
Ekminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1
983046
where jl is the lth column block chosen and il is the lth row blockchosen
We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as
Eikminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk
983046
Then by the law of total expectation we have
Ekminus1
983045middot983046= Ekminus1
983045Eikminus1
983045middot983046983046
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Proof. By $A^T b = A^T A A^\dagger b$ and
$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big),$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b] &= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (A x^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b] &= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 &= \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}$$
Here the last inequality follows from the facts that
$$x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^T z^0 \in \operatorname{range}(A^T),$$
and Lemma 3.
4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$
The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as
$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to
$$b_\perp = (I - A A^\dagger) b,$$
which is the orthogonal projection of $z^0$ onto the set $\{ z \mid A^T z = 0 \}$.
Lemma 9
For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \operatorname{range}(A)$. It holds
$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2.$$
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have
$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}$$
By $z^0 - b_\perp = A A^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that
$$z^k - b_\perp \in \operatorname{range}(A).$$
It follows from (1) that
$$\begin{aligned}
\|z^k - b_\perp\|_2^2 &= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}$$
Taking the conditioned expectation on the first $k-1$ iterations yields
$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2] &\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}$$
Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2] &= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}$$
This completes the proof.
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$. For any $\varepsilon > 0$, it holds
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
Proof. Let
$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.
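It follows from
$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big)$$
that
$$\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2 &= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big)^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. \tag{2}
\end{aligned}$$
It follows from
$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widehat{x}^k\|_2^2] &= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \widehat{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad \text{(by (2))}
\end{aligned}$$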
that
$$\begin{aligned}
\mathbb{E}[\|x^k - \widehat{x}^k\|_2^2] &\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}[\|b - A A^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \tag{3}
\end{aligned}$$
By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$, we have
$$x^0 - A^\dagger b \in \operatorname{range}(A^T).$$
Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By
$$\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2 &= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}[\|\widehat{x}^k - A^\dagger b\|_2^2] &\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2 applied to } A^T \text{, since } x^{k-1} - A^\dagger b \in \operatorname{range}(A^T)\text{)},
\end{aligned}$$
which yields
$$\mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \le \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4}$$
Note that for any $\varepsilon > 0$, we have
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2 &= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \le \big( \|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2 \big)^2 \\
&= \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \widehat{x}^k\|_2^2 + (1 + \varepsilon) \|\widehat{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}$$
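The last step of (5) is an instance of the weighted arithmetic-geometric mean (Young) inequality. A short derivation, with $p$ and $q$ introduced here only as shorthand for the two norms (they are not notation from the lecture):

```latex
\[
0 \le \left( \frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, q \right)^2
  = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\quad\Longrightarrow\quad
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2 ,
\]
\[
p^2 + q^2 + 2pq \le \left( 1 + \frac{1}{\varepsilon} \right) p^2 + (1 + \varepsilon)\, q^2 ,
\qquad p, q \ge 0,\ \varepsilon > 0 .
\]
```

Taking $p$ to be the distance between $x^k$ and the auxiliary DSBGS-type iterate, and $q$ the distance between that auxiliary iterate and $A^\dagger b$, gives (5).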
Combining (3), (4), and (5) yields
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] &\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}[\|x^k - \widehat{x}^k\|_2^2] + (1 + \varepsilon)\, \mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}$$
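The final inequality in this chain relies on bounding a geometric sum. Spelled out as a short supplementary computation (not part of the original slides):

```latex
\[
\sum_{l=0}^{k-1} (1 + \varepsilon)^l = \frac{(1 + \varepsilon)^k - 1}{\varepsilon} \le \frac{(1 + \varepsilon)^k}{\varepsilon},
\]
\[
\left( 1 + \frac{1}{\varepsilon} \right) \sum_{l=0}^{k-1} (1 + \varepsilon)^l
\le \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^k}{\varepsilon}
= \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}.
\]
```

Factoring $(1 + \varepsilon)^k \rho^k$ out of both terms then gives the bound stated in Theorem 10.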
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0,$$
the equation (5) becomes
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence for REK:
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
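A sketch of how the factor $k$ in the REK bound arises, assuming the bounds (3) and (4) with $\alpha = 1$; here $e_k$ is shorthand introduced only for this computation:

```latex
\[
e_k := \mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
\le \frac{\rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + \rho\, e_{k-1},
\]
\[
e_k \le \sum_{l=0}^{k-1} \rho^{l} \cdot \frac{\rho^{k-l}}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + \rho^k e_0
= \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
```

Each of the $k$ unrolled terms contributes the same amount $\rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$, which is why the sum is linear in $k$ rather than geometric in $1 + \varepsilon$.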
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Lemma 2
For any matrix $A \in \mathbb{R}^{m \times n}$ with rank $r$ and any vector $u \in \operatorname{range}(A)$, it holds
$$u^T A A^T u \ge \sigma_r^2(A) \|u\|_2^2.$$
Lemma 3
Let $\alpha > 0$ and $A$ be any nonzero real matrix. For every $u \in \operatorname{range}(A)$, it holds
$$\left\| \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right)^k u \right\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|u\|_2.$$
3 A doubly stochastic block Gauss-Seidel algorithm [2]
Algorithm 1: Doubly stochastic block Gauss-Seidel (DSBGS)
Let $\alpha > 0$. Initialize $x^0 \in \mathbb{R}^n$.
for $k = 1, 2, \ldots$ do
  Pick $(\mathcal{I}, \mathcal{J}) \in \mathcal{P}$ with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2 / \|A\|_F^2$
  Set $x^k = x^{k-1} - \alpha \dfrac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b)$
Landweber [3] ($s = 1$ and $t = 1$):
$$x^k = x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b)$$
Randomized Kaczmarz (RK) [7] ($s = m$, $t = 1$, $\alpha = 1$):
At step $k$, RK projects $x^{k-1}$ onto the hyperplane $\{ x \mid A_{i,:} x = b_i \}$:
$$x^k = x^{k-1} - \frac{A_{i,:} x^{k-1} - b_i}{\|A_{i,:}\|_2^2} (A_{i,:})^T,$$
where $A_{i,:}$ is the $i$th row of $A$ and $b_i$ is the $i$th component of $b$.
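The RK projection step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function name `randomized_kaczmarz`, the list-of-lists storage, the iteration count, and the small consistent test system are hypothetical. Rows are sampled with probability proportional to their squared Euclidean norms, which matches the DSBGS sampling rule with $s = m$ and $t = 1$.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Sketch of RK: project onto the hyperplane of one randomly chosen row."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_w = [sum(a * a for a in row) for row in A]   # ||A_{i,:}||_2^2
    x = [0.0] * n
    for _ in range(iters):
        # Row i is picked with probability ||A_{i,:}||_2^2 / ||A||_F^2.
        i = rng.choices(range(m), weights=row_w)[0]
        # Scaled residual of equation i at the current iterate.
        r = (sum(A[i][l] * x[l] for l in range(n)) - b[i]) / row_w[i]
        for l in range(n):
            x[l] -= r * A[i][l]
    return x

# Hypothetical consistent system with solution (1, 2).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
x = randomized_kaczmarz(A, b)
```

After each step the chosen equation is satisfied exactly, which is the geometric meaning of the projection.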
Randomized Gauss-Seidel [4][5] ($s = 1$, $t = n$, $\alpha = 1$):
$$x^k = x^{k-1} - \frac{(A_{:,j})^T (A x^{k-1} - b)}{\|A_{:,j}\|_2^2} I_{:,j},$$
where $A_{:,j}$ is the $j$th column of $A$ and $I_{:,j}$ is the $j$th column of the $n \times n$ identity matrix $I$.
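A plain-Python sketch of the randomized Gauss-Seidel update is given below. The function name `randomized_gauss_seidel`, the data layout, the iteration count, and the example system are assumptions made for illustration. The residual $Ax - b$ is kept up to date incrementally, so each step touches only one column of $A$.

```python
import random

def randomized_gauss_seidel(A, b, iters=2000, seed=0):
    """Sketch of RGS: update one randomly chosen coordinate of x per step."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    col_w = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]  # ||A_{:,j}||_2^2
    x = [0.0] * n
    r = [-bi for bi in b]            # residual A x - b at x = 0
    for _ in range(iters):
        # Column j is picked with probability ||A_{:,j}||_2^2 / ||A||_F^2.
        j = rng.choices(range(n), weights=col_w)[0]
        step = sum(A[i][j] * r[i] for i in range(m)) / col_w[j]
        x[j] -= step
        for i in range(m):
            r[i] -= step * A[i][j]   # keep r equal to A x - b
    return x

# Hypothetical inconsistent full-column-rank system; least-squares solution (4/3, 7/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x = randomized_gauss_seidel(A, b)
```

Unlike RK, this iteration can be applied to the inconsistent example above, where the iterate should approach the least-squares solution because $A$ has full column rank.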
Doubly stochastic Gauss-Seidel [6] ($s = m$, $t = n$):
$$x^k = x^{k-1} - \alpha \frac{A_{i,j} (A_{i,:} x^{k-1} - b_i)}{|A_{i,j}|^2} I_{:,j},$$
where $A_{i,j}$ is the $(i, j)$ entry of $A$.
3.1 Convergence of the norms of the expectations
Theorem 4
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
$$\big\|\mathbb{E}[x^k - x_\star^0]\big\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2,$$
where
$$x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b,$$
i.e., the projection of $x^0$ onto the solution set $\{ x \in \mathbb{R}^n \mid Ax = b \}$.
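Proof of Theorem 4
Note that the conditioned expectation on $x^{k-1}$ is
$$\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}] &= x^{k-1} - \alpha\, \mathbb{E}\left[ \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right] (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \left( \sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \cdot \frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b).
\end{aligned}$$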
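The last equality in this computation uses the fact that the blocks of a partition tile the whole matrix. As a brief supplementary note (written with the index-set notation of Section 2):

```latex
\[
I_{:,\mathcal{I}}\, A_{\mathcal{I},\mathcal{J}}\, (I_{:,\mathcal{J}})^T
\ \text{is the } m \times n \text{ matrix that equals } A \text{ on the block } (\mathcal{I}, \mathcal{J}) \text{ and is zero elsewhere,}
\]
\[
\sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} I_{:,\mathcal{J}}\, (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T
= \Bigg( \sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} I_{:,\mathcal{I}}\, A_{\mathcal{I},\mathcal{J}}\, (I_{:,\mathcal{J}})^T \Bigg)^{T}
= A^T .
\]
```

The sampling probabilities are chosen so that the block norms cancel, leaving exactly this sum divided by the squared Frobenius norm of $A$.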
Then the conditioned expectation $\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]$ is given by
$$\begin{aligned}
\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}] &= \mathbb{E}[x^k \mid x^{k-1}] - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star^0) - x_\star^0 \\
&= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right) (x^{k-1} - x_\star^0).
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[x^k - x_\star^0] &= \mathbb{E}\big[\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]\big] = \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - x_\star^0] \\
&= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right)^k (x^0 - x_\star^0).
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\big\|\mathbb{E}[x^k - x_\star^0]\big\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2.$$
Here the inequality follows from the fact that
$$x^0 - x_\star^0 = A^\dagger A x^0 - A^\dagger b \in \operatorname{range}(A^T)$$
and Lemma 3.
Remark 1
If $x^0 \in \operatorname{range}(A^T)$, then $x_\star^0 = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
$$\max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2 \|A\|_F^2}{\sigma_1^2(A)}.$$
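The equivalence in Remark 1 can be checked term by term. A short supplementary derivation:

```latex
\[
\left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| < 1
\iff 0 < \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} < 2
\iff 0 < \alpha < \frac{2 \|A\|_F^2}{\sigma_i^2(A)} ,
\]
\[
\min_{1 \le i \le r} \frac{2 \|A\|_F^2}{\sigma_i^2(A)} = \frac{2 \|A\|_F^2}{\sigma_1^2(A)} \ge 2 ,
\qquad \text{since } \|A\|_F^2 = \sum_{i=1}^{r} \sigma_i^2(A) \ge \sigma_1^2(A) .
\]
```

The largest singular value gives the binding constraint, and in particular any step size $0 < \alpha < 2$ satisfies the condition.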
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
$$\big\|\mathbb{E}[A x^k - A x_\star]\big\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2,$$
where $x_\star$ is any solution of
$$A^T A x = A^T b.$$
Proof of Theorem 5
Note that the conditioned expectation on $x^{k-1}$ is
$$\begin{aligned}
\mathbb{E}[A x^k - A x_\star \mid x^{k-1}] &= A \big( \mathbb{E}[x^k \mid x^{k-1}] - x_\star \big) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x_\star \right) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) - x_\star \right) \quad \text{(by } A^T b = A^T A x_\star \text{)} \\
&= A x^{k-1} - A x_\star - \frac{\alpha A A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) (A x^{k-1} - A x_\star).
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[A x^k - A x_\star] &= \mathbb{E}\big[\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]\big] \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) \mathbb{E}[A x^{k-1} - A x_\star] \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right)^k (A x^0 - A x_\star).
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\big\|\mathbb{E}[A x^k - A x_\star]\big\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2.$$
Here the inequality follows from the fact that
$$A x^0 - A x_\star \in \operatorname{range}(A)$$
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. Assume
$$0 < \alpha < \frac{2}{t}.$$
In exact arithmetic, it holds
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.$$
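The step size that minimizes the contraction factor in this bound follows from completing the square. This is a supplementary observation about the bound itself, not a claim from the lecture about the best step size in practice:

```latex
\[
2\alpha - t\alpha^2 = \frac{1}{t} - t \left( \alpha - \frac{1}{t} \right)^2 \le \frac{1}{t},
\qquad \text{with equality at } \alpha = \frac{1}{t},
\]
\[
\text{so the smallest factor given by the bound is } 1 - \frac{\sigma_n^2(A)}{t\, \|A\|_F^2} .
\]
```

The same identity shows why the assumption is $0 < \alpha < 2/t$: outside this interval $2\alpha - t\alpha^2 \le 0$ and the bound no longer guarantees a decrease.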
Proof of Theorem 6
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2 &= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) A (x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b).
\end{aligned}$$
The last inequality follows from $(I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}] &\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2) (x^{k-1} - A^\dagger b)^T \left( \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}$$
Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] &= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}$$
Remark 2
If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show
$$x^k - x_\star^0 \in \operatorname{range}(A^T)$$
by induction, where
$$x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b,$$
i.e., the projection of $x^0$ onto the set $\{ x \in \mathbb{R}^n \mid Ax = b \}$.
Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
$$\mathbb{E}[\|x^k - x_\star^0\|_2^2] \le \left( 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|x^0 - x_\star^0\|_2^2.$$
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient) $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A) / \|A\|_F^2$, then
$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A) / (t\rho)$, then
$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$
where
$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$
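As a supplementary computation (not stated in the lecture), the step sizes that minimize the two contraction factors in Theorem 7 can be found by completing the square and by setting the derivative to zero, respectively:

```latex
\[
t = n: \quad 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2}
= 1 - \frac{\sigma_r^4(A)}{\|A\|_F^4} + \left( \alpha - \frac{\sigma_r^2(A)}{\|A\|_F^2} \right)^2 ,
\quad \text{minimized at } \alpha = \frac{\sigma_r^2(A)}{\|A\|_F^2} ;
\]
\[
t < n: \quad 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}
\quad \text{is minimized at } \alpha = \frac{\sigma_r^2(A)}{t\rho},
\quad \text{with value } 1 - \frac{\sigma_r^4(A)}{t\rho\, \|A\|_F^2} .
\]
```

In both cases the minimizing step size is the midpoint of the admissible interval stated in the theorem.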
Proof of Theorem 7
Note that
$$\begin{aligned}
\|A x^k - b\|_2^2 &= \left\| A x^{k-1} - \alpha \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - b \right\|_2^2 \\
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b).
\end{aligned}$$
If $t = n$, then it follows from
$$(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2$$
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
$$\begin{aligned}
\|A x^k - b\|_2^2 &= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}] &\le (1 + \alpha^2) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}$$
The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2] &= \mathbb{E}\big[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}$$
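If $t < n$, then it follows from
$$(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I$$
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
$$\begin{aligned}
\|A x^k - b\|_2^2 &\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}$$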
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}] &\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}$$
The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2] &= \mathbb{E}\big[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}$$
Remark 3
Note that
$$(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,$$
where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS$(1, t)$ has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A) / \|A\|_F^2$, then
$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A) / (t\rho)$, then
$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$
where
$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$
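4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
  Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
  Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
  Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
  Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$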
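The two block updates of REBK can be sketched in plain Python as below. This is a rough sketch under stated assumptions, not the authors' implementation: the function name `rebk`, the representation of the partitions as lists of index lists, the default step size and iteration count, and the example system are all hypothetical. No pseudoinverse of a block is formed, only products with the blocks and their transposes.

```python
import random

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=3000, seed=0):
    """Sketch of REBK with z^0 = b and x^0 = 0."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the row blocks and of the column blocks.
    row_w = [sum(A[i][l] ** 2 for i in I for l in range(n)) for I in row_blocks]
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    z = list(b)      # z^0 = b lies in b + range(A)
    x = [0.0] * n    # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # z-update with a random column block J: z <- z - alpha/||A_J||_F^2 * A_J (A_J^T z)
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        u = [sum(A[i][j] * z[i] for i in range(m)) for j in J]   # A_J^T z (old z)
        for i in range(m):
            z[i] -= alpha / col_w[jb] * sum(A[i][j] * uj for j, uj in zip(J, u))
        # x-update with a random row block I, using the freshly updated z:
        # x <- x - alpha/||A_I||_F^2 * A_I^T (A_I x - b_I + z_I)
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        v = [sum(A[i][l] * x[l] for l in range(n)) - b[i] + z[i] for i in I]
        for l in range(n):
            x[l] -= alpha / row_w[ib] * sum(A[i][l] * vi for i, vi in zip(I, v))
    return x, z

# Hypothetical inconsistent system; least-squares solution (4/3, 7/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x, z = rebk(A, b, row_blocks=[[0, 1], [2]], col_blocks=[[0], [1]])
```

With single-index blocks and `alpha=1.0` this reduces to the REK iteration; larger blocks trade more work per step for fewer steps, and the block norms used for sampling are computed once up front.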
Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015
Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo
A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019
Thomas Strohmer and Roman Vershynin
A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009
Anastasios Zouzias and Nikolaos M Freris
Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Lemma 2
For any matrix $A \in \mathbb{R}^{m\times n}$ with rank $r$ and any vector $u \in \operatorname{range}(A)$, it holds
\[
u^T A A^T u \ge \sigma_r^2(A)\,\|u\|_2^2.
\]
Lemma 3
Let $\alpha > 0$ and let $A$ be any nonzero real matrix. For every $u \in \operatorname{range}(A)$, it holds
\[
\left\| \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right)^{k} u \right\|_2
\le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^{k} \|u\|_2.
\]
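3 A doubly stochastic block Gauss-Seidel algorithm [2]
Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)
Let $\alpha > 0$. Initialize $x^0 \in \mathbb{R}^n$.
for $k = 1, 2, \ldots$ do
  Pick $(\mathcal{I}, \mathcal{J}) \in \mathcal{P}$ with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2 / \|A\|_F^2$
  Set $x^k = x^{k-1} - \alpha \dfrac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b)$
Here $I_{:,\mathcal{J}}$ and $I_{:,\mathcal{I}}$ denote the column submatrices of the $n\times n$ and $m\times m$ identity matrices indexed by $\mathcal{J}$ and $\mathcal{I}$, respectively.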
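The update in Algorithm 1 only touches the coordinates of $x$ indexed by $\mathcal{J}$ and only reads the residual entries indexed by $\mathcal{I}$. The following is a minimal dependency-free sketch of that loop, not the authors' implementation. The function name `dsbgs`, the arguments `row_blocks` and `col_blocks`, the starting point $x^0 = 0$, and the small test system are all assumptions made for illustration.

```python
import random

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of DSBGS on dense lists; blocks are lists of row/column indices."""
    rng = random.Random(seed)
    n = len(A[0])
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    # Squared Frobenius norm of every block A[I, J]; these are the sampling weights.
    weights = [sum(A[i][j] ** 2 for i in I for j in J) for I, J in pairs]
    x = [0.0] * n  # assumed starting point x^0 = 0
    for _ in range(iters):
        idx = rng.choices(range(len(pairs)), weights=weights)[0]
        (I, J), w = pairs[idx], weights[idx]
        # Residual restricted to the chosen rows: (A x - b)_I.
        r = {i: sum(A[i][c] * x[c] for c in range(n)) - b[i] for i in I}
        # Only the coordinates in J are updated.
        for j in J:
            x[j] -= alpha * sum(A[i][j] * r[i] for i in I) / w
    return x

A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
b = [1.0, -2.0, 0.0]  # consistent system: A times (1, -1) equals b
x = dsbgs(A, b, row_blocks=[[0], [1, 2]], col_blocks=[[0], [1]], alpha=0.5, iters=2000)
```

With $s = 2$ row blocks and $t = 2$ column blocks, the step size $\alpha = 0.5$ lies inside the range $0 < \alpha < 2/t$ required for the expected-norm bound on full column rank consistent systems.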
Landweber [3] ($s = 1$ and $t = 1$):
\[
x^k = x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b).
\]
Randomized Kaczmarz (RK) [7] ($s = m$, $t = 1$, $\alpha = 1$):
At step $k$, RK projects $x^{k-1}$ onto the hyperplane $\{x \mid A_{i,:} x = b_i\}$:
\[
x^k = x^{k-1} - \frac{A_{i,:} x^{k-1} - b_i}{\|A_{i,:}\|_2^2} (A_{i,:})^T,
\]
where $A_{i,:}$ is the $i$th row of $A$ and $b_i$ is the $i$th component of $b$.
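The projection property of RK can be checked directly: after one step with row $i$, the new iterate satisfies the $i$th equation exactly. A small sketch follows; the helper name `rk_step` and the example data are hypothetical, and the random choice of $i$ is left to the caller.

```python
def rk_step(A, b, x, i):
    """One Kaczmarz step (alpha = 1) that projects x onto {x : A[i] x = b[i]}."""
    row = A[i]
    residual = sum(a * xj for a, xj in zip(row, x)) - b[i]
    norm_sq = sum(a * a for a in row)
    return [xj - residual / norm_sq * a for a, xj in zip(row, x)]

A = [[3.0, 4.0], [1.0, 0.0]]
b = [10.0, 2.0]
x_new = rk_step(A, b, [0.0, 0.0], 0)
# x_new lies on the hyperplane of row 0: 3 * x_new[0] + 4 * x_new[1] equals 10.
```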
Randomized Gauss-Seidel [4][5] ($s = 1$, $t = n$, $\alpha = 1$):
\[
x^k = x^{k-1} - \frac{(A_{:,j})^T (A x^{k-1} - b)}{\|A_{:,j}\|_2^2} I_{:,j},
\]
where $A_{:,j}$ is the $j$th column of $A$ and $I_{:,j}$ is the $j$th column of the $n\times n$ identity matrix $I$.
Doubly stochastic Gauss-Seidel [6] ($s = m$, $t = n$):
\[
x^k = x^{k-1} - \alpha \frac{A_{ij} (A_{i,:} x^{k-1} - b_i)}{|A_{ij}|^2} I_{:,j},
\]
where $A_{ij}$ is the $(i,j)$ entry of $A$.
3.1 Convergence of the norms of the expectations
Theorem 4
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
\[
\|\mathbb{E}[x^k - x_\star^0]\|_2
\le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^{k} \|x^0 - x_\star^0\|_2,
\]
where $x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b$, i.e., the projection of $x^0$ onto the solution set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
Proof of Theorem 4
Note that the expectation of $x^k$ conditioned on $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\, \mathbb{E}\left[ \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right] (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \left( \sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \cdot \frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b).
\end{aligned}
\]
Then the conditional expectation $\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]$ is given by
\[
\begin{aligned}
\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star^0) - x_\star^0 \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - x_\star^0).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - x_\star^0]
&= \mathbb{E}\big[ \mathbb{E}[x^k - x_\star^0 \mid x^{k-1}] \big]
= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - x_\star^0] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} (x^0 - x_\star^0).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\|\mathbb{E}[x^k - x_\star^0]\|_2
\le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^{k} \|x^0 - x_\star^0\|_2.
\]
Here the inequality follows from the fact that
\[
x^0 - x_\star^0 = A^\dagger A x^0 - A^\dagger b \in \operatorname{range}(A^T)
\]
and Lemma 3.
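The key step of this proof is that averaging the DSBGS update over all block pairs, weighted by their selection probabilities, reproduces one Landweber step. The sketch below checks this identity by exact enumeration on a small example. The function names `expected_dsbgs_step` and `landweber_step`, the partitions, and the data are assumptions chosen for illustration.

```python
def expected_dsbgs_step(A, b, x, row_blocks, col_blocks, alpha):
    """Average the DSBGS update over all block pairs, weighted by probability."""
    m, n = len(A), len(A[0])
    fro_sq = sum(a * a for row in A for a in row)
    r = [sum(A[i][c] * x[c] for c in range(n)) - b[i] for i in range(m)]
    mean = list(x)
    for I in row_blocks:
        for J in col_blocks:
            w = sum(A[i][j] ** 2 for i in I for j in J)
            if w == 0.0:
                continue  # zero-probability block, never selected
            prob = w / fro_sq
            for j in J:
                mean[j] -= prob * alpha * sum(A[i][j] * r[i] for i in I) / w
    return mean

def landweber_step(A, b, x, alpha):
    """One Landweber step x - alpha * A^T (A x - b) / ||A||_F^2."""
    m, n = len(A), len(A[0])
    fro_sq = sum(a * a for row in A for a in row)
    r = [sum(A[i][c] * x[c] for c in range(n)) - b[i] for i in range(m)]
    return [x[j] - alpha * sum(A[i][j] * r[i] for i in range(m)) / fro_sq
            for j in range(n)]

A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
b = [1.0, -2.0, 0.0]
x = [0.5, 0.25]
lhs = expected_dsbgs_step(A, b, x, [[0], [1, 2]], [[0], [1]], alpha=0.5)
rhs = landweber_step(A, b, x, alpha=0.5)
```

The two vectors `lhs` and `rhs` agree up to rounding, whatever partition is used.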
Remark 1
If $x^0 \in \operatorname{range}(A^T)$, then $x_\star^0 = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
\[
\max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| < 1,
\quad \text{i.e.,} \quad
0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.
\]
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
\[
\|\mathbb{E}[A x^k - A x_\star]\|_2
\le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^{k} \|A x^0 - A x_\star\|_2,
\]
where $x_\star$ is any solution of $A^T A x = A^T b$.
Proof of Theorem 5
Note that the expectation conditioned on $x^{k-1}$ satisfies
\[
\begin{aligned}
\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]
&= A \big( \mathbb{E}[x^k \mid x^{k-1}] - x_\star \big) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x_\star \right) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) - x_\star \right)
\quad (\text{by } A^T b = A^T A x_\star) \\
&= A x^{k-1} - A x_\star - \alpha \frac{A A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) \\
&= \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - A x_\star).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[A x^k - A x_\star]
&= \mathbb{E}\big[ \mathbb{E}[A x^k - A x_\star \mid x^{k-1}] \big] \\
&= \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) \mathbb{E}[A x^{k-1} - A x_\star] \\
&= \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right)^{k} (A x^0 - A x_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\|\mathbb{E}[A x^k - A x_\star]\|_2
\le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^{k} \|A x^0 - A x_\star\|_2.
\]
Here the inequality follows from the fact that $A x^0 - A x_\star \in \operatorname{range}(A)$ and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. Assume
\[
0 < \alpha < \frac{2}{t}.
\]
In exact arithmetic, it holds
\[
\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \big]
\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right)^{k} \|x^0 - A^\dagger b\|_2^2.
\]
Proof of Theorem 6
Since the system is consistent, $b = A A^\dagger b$. Hence
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) A (x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2
 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2
 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b).
\end{aligned}
\]
The last inequality follows from $(I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \mid x^{k-1} \big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2) (x^{k-1} - A^\dagger b)^T \left( \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2
\quad (\text{by Lemma 2}).
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \big]
&= \mathbb{E}\big[ \mathbb{E}[ \|x^k - A^\dagger b\|_2^2 \mid x^{k-1} ] \big] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}\big[ \|x^{k-1} - A^\dagger b\|_2^2 \big] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right)^{k} \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
\]
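The contraction factor in this bound is a quadratic in $\alpha$. Since $2\alpha - t\alpha^2$ is maximized at $\alpha = 1/t$, that step size gives the smallest factor, $1 - \sigma_n^2(A)/(t\|A\|_F^2)$. The sketch below evaluates the factor and the number of iterations after which the bound has shrunk by a given ratio. The helper names `dsbgs_rate` and `iterations_for` and the sample values are hypothetical.

```python
import math

def dsbgs_rate(alpha, t, sigma_min_sq, fro_sq):
    """Per-iteration factor 1 - (2*alpha - t*alpha^2) * sigma_n^2 / ||A||_F^2."""
    if not 0 < alpha < 2 / t:
        raise ValueError("the bound assumes 0 < alpha < 2/t")
    return 1 - (2 * alpha - t * alpha ** 2) * sigma_min_sq / fro_sq

def iterations_for(tol, rate):
    """Smallest k with rate**k <= tol, for 0 < rate < 1 and 0 < tol < 1."""
    return math.ceil(math.log(tol) / math.log(rate))

# Example values: t = 2 column blocks, sigma_n^2 = 2, ||A||_F^2 = 17.
best = dsbgs_rate(0.5, 2, 2.0, 17.0)      # alpha = 1/t
smaller = dsbgs_rate(0.25, 2, 2.0, 17.0)  # alpha below 1/t
larger = dsbgs_rate(0.75, 2, 2.0, 17.0)   # alpha above 1/t
```

Both `smaller` and `larger` exceed `best`, which illustrates that moving $\alpha$ away from $1/t$ in either direction weakens the guaranteed rate.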
Remark 2
If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show
\[
x^k - x_\star^0 \in \operatorname{range}(A^T)
\]
by induction, where $x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b$, i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
\[
\mathbb{E}\big[ \|x^k - x_\star^0\|_2^2 \big]
\le \left( 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2} \right)^{k} \|x^0 - x_\star^0\|_2^2.
\]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system $Ax = b$ (full column rank or rank-deficient) with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[ \|A x^k - b\|_2^2 \big]
\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right)^{k} \|A x^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[ \|A x^k - b\|_2^2 \big]
\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^{k} \|A x^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
\]
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|A x^k - b\|_2^2
&= \left\| A x^{k-1} - \alpha \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - b \right\|_2^2 \\
&= \|A x^{k-1} - b\|_2^2
 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
\]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|A x^k - b\|_2^2
&= \|A x^{k-1} - b\|_2^2
 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2
 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b)
\quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[ \|A x^k - b\|_2^2 \mid x^{k-1} \big]
&\le (1 + \alpha^2) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[ \|A x^k - b\|_2^2 \big]
&= \mathbb{E}\big[ \mathbb{E}[ \|A x^k - b\|_2^2 \mid x^{k-1} ] \big] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}\big[ \|A x^{k-1} - b\|_2^2 \big] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right)^{k} \|A x^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I
\]
(since $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|A x^k - b\|_2^2
&\le \|A x^{k-1} - b\|_2^2
 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2
 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b)
\quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[ \|A x^k - b\|_2^2 \mid x^{k-1} \big]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[ \|A x^k - b\|_2^2 \big]
&= \mathbb{E}\big[ \mathbb{E}[ \|A x^k - b\|_2^2 \mid x^{k-1} ] \big] \\
&\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}\big[ \|A x^{k-1} - b\|_2^2 \big] \\
&\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^{k} \|A x^0 - b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[
(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,
\]
where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS$(1, t)$ has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[ \|A x^k - A x_\star\|_2^2 \big]
\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right)^{k} \|A x^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[ \|A x^k - A x_\star\|_2^2 \big]
\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^{k} \|A x^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
\]
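4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
  Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
  Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
  Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
  Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$
By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].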
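For the REK special case ($s = m$, $t = n$, $\alpha = 1$), each iteration is one column projection of $z$ followed by one Kaczmarz projection of $x$. The following dependency-free sketch is an illustration under assumptions, not the reference implementation. The function name `rek`, the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization conditions of Algorithm 2), and the small inconsistent test system are all made up for this example.

```python
import random

def rek(A, b, iters, seed=0):
    """Sketch of REBK with single-row and single-column blocks and alpha = 1."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_sq = [sum(a * a for a in A[i]) for i in range(m)]
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    z = list(b)      # z^0 = b lies in b + range(A)
    x = [0.0] * n    # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: remove the component of z along column j.
        j = rng.choices(range(n), weights=col_sq)[0]
        c = sum(A[i][j] * z[i] for i in range(m)) / col_sq[j]
        z = [z[i] - c * A[i][j] for i in range(m)]
        # Row step: Kaczmarz projection for row i with right-hand side b_i - z_i.
        i = rng.choices(range(m), weights=row_sq)[0]
        r = (sum(A[i][q] * x[q] for q in range(n)) - b[i] + z[i]) / row_sq[i]
        x = [x[q] - r * A[i][q] for q in range(n)]
    return x, z

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]  # inconsistent; the least-squares solution is (0, 1)
x, z = rek(A, b, iters=400)
```

On this example `x` approaches the least-squares solution $(0, 1)$ and `z` approaches the residual component $(1, 1, -1)$, which is orthogonal to both columns of $A$.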
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]
Then, by the law of total expectation, we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[ \mathbb{E}_{k-1}^{i}[\cdot] \big].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$. It holds
\[
\|\mathbb{E}[x^k - A^\dagger b]\|_2
\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right),
\]
where
\[
\delta = \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right|.
\]
Proof. By $A^T b = A^T A A^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[ \mathbb{E}_{k-1}^{i}[x^k - A^\dagger b] \big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (A x^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[ \mathbb{E}_{k-1}[x^k - A^\dagger b] \big] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{2} \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^T z^0 \in \operatorname{range}(A^T),
\]
and Lemma 3.
4.2 Convergence of $\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \big]$
The convergence of $\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \big]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to
\[
b_\perp = (I - A A^\dagger) b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \operatorname{range}(A)$. It holds
\[
\mathbb{E}\big[ \|z^k - b_\perp\|_2^2 \big] \le \rho^k \|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}
\]
By $z^0 - b_\perp = A A^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that
\[
z^k - b_\perp \in \operatorname{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - 2\alpha \frac{\|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2) \frac{\|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}
\quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking the conditional expectation conditioned on the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[ \|z^k - b_\perp\|_2^2 \big]
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2) \frac{\|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2
\quad (\text{by Lemma 2}).
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[ \|z^k - b_\perp\|_2^2 \big]
&= \mathbb{E}\big[ \mathbb{E}_{k-1}[ \|z^k - b_\perp\|_2^2 ] \big] \\
&\le \rho\, \mathbb{E}\big[ \|z^{k-1} - b_\perp\|_2^2 \big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$. For any $\varepsilon > 0$, it holds
\[
\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \big]
\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\,\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
Proof. Let
\[
\hat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.
It follows from
\[
x^k - \hat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})
\]
that
\[
\begin{aligned}
\|x^k - \hat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}
\quad (\text{by Lemma 1}).
\end{aligned} \tag{2}
\]
It follows from
\[
\mathbb{E}_{k-1}\big[ \|x^k - \hat{x}^k\|_2^2 \big]
= \mathbb{E}_{k-1}\big[ \mathbb{E}_{k-1}^{i}[ \|x^k - \hat{x}^k\|_2^2 ] \big]
\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right]
\quad (\text{by (2)})
\]
that
\[
\begin{aligned}
\mathbb{E}\big[ \|x^k - \hat{x}^k\|_2^2 \big]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}\big[ \|b - A A^\dagger b - z^k\|_2^2 \big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2
\quad (\text{by Lemma 9}).
\end{aligned} \tag{3}
\]
By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$, we have $x^0 - A^\dagger b \in \operatorname{range}(A^T)$. Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha \frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2) \frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}
\quad (\text{by Lemma 1}),
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[ \|\hat{x}^k - A^\dagger b\|_2^2 \big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2) \frac{\|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2
\quad (\text{by Lemma 2 applied to } A^T),
\end{aligned}
\]
which yields
\[
\mathbb{E}\big[ \|\hat{x}^k - A^\dagger b\|_2^2 \big] \le \rho\, \mathbb{E}\big[ \|x^{k-1} - A^\dagger b\|_2^2 \big]. \tag{4}
\]
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2
\le \big( \|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2 \big)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \hat{x}^k\|_2 \|\hat{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \hat{x}^k\|_2^2 + (1 + \varepsilon) \|\hat{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
\]
Combining (3), (4) and (5) yields
\[
\begin{aligned}
\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \big]
&\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}\big[ \|x^k - \hat{x}^k\|_2^2 \big] + (1 + \varepsilon)\, \mathbb{E}\big[ \|\hat{x}^k - A^\dagger b\|_2^2 \big] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho\, \mathbb{E}\big[ \|x^{k-1} - A^\dagger b\|_2^2 \big] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}\big[ \|x^{k-2} - A^\dagger b\|_2^2 \big] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\,\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
Remark 4
For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\hat{x}^k - A^\dagger b)^T (x^k - \hat{x}^k) = 0,
\]
the inequality (5) becomes the equality
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence bound for REK:
\[
\mathbb{E}\big[ \|x^k - A^\dagger b\|_2^2 \big]
\le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
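To compare the two bounds numerically, both right-hand sides can be evaluated from a handful of scalars. This is a sketch with hypothetical helper names (`rek_bound`, `rebk_bound`); the inputs $\sigma_r^2(A)$, $\|A\|_F^2$, $\|z^0 - b_\perp\|_2^2$ and $\|x^0 - A^\dagger b\|_2^2$ are assumed to be known or estimated separately.

```python
def rek_bound(k, sigma_r_sq, fro_sq, z_err_sq, x_err_sq):
    """REK bound rho^k * (k * ||z0 - b_perp||^2 / ||A||_F^2 + ||x0 - A^+ b||^2)."""
    rho = 1 - sigma_r_sq / fro_sq  # rho with alpha = 1
    return rho ** k * (k * z_err_sq / fro_sq + x_err_sq)

def rebk_bound(k, eps, alpha, sigma_r_sq, fro_sq, z_err_sq, x_err_sq):
    """REBK bound for a given eps > 0 and step size alpha."""
    rho = 1 - (2 * alpha - alpha ** 2) * sigma_r_sq / fro_sq
    inner = (1 + eps) * alpha ** 2 * z_err_sq / (eps ** 2 * fro_sq) + x_err_sq
    return ((1 + eps) * rho) ** k * inner

# Example values: sigma_r^2 = 1, ||A||_F^2 = 4, ||z0 - b_perp||^2 = 2, ||x0 - A^+ b||^2 = 1.
one_step = rek_bound(1, 1.0, 4.0, 2.0, 1.0)
general = rebk_bound(0, 0.1, 1.0, 1.0, 4.0, 2.0, 1.0)
```

The general bound only decays when $(1 + \varepsilon)\rho < 1$, so a useful $\varepsilon$ has to be smaller than $1/\rho - 1$, while small $\varepsilon$ inflates the constant through the factor $1/\varepsilon^2$.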
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615-624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641-654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590-1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465-496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262-278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773-793, 2013.
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
3 A doubly stochastic block Gauss-Seidel algorithm [2]
Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)
Let α gt 0 Initialize x0 isin Rn
for k = 1 2 do
Pick (IJ ) isin P with probability983042AIJ 9830422F983042A9830422F
Set xk = xkminus1 minus αIJ (AIJ )T(II)T
983042AIJ 9830422F(Axkminus1 minus b)
Landweber [3] (s = 1 and t = 1)
xk = xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 7 43
Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)
At step k RK projects xkminus1 onto the hyperplane x | Aix = bi
xk = xkminus1 minus Aixkminus1 minus bi
983042Ai98304222(Ai)
T
where Ai is the ith row of A and bi is the ith component of b
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43
Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)
xk = xkminus1 minus (Aj)T(Axkminus1 minus b)
983042Aj98304222Ij
where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I
Doubly stochastic Gauss-Seidel [6] (s = m t = n)
xk = xkminus1 minus αAij(Aix
kminus1 minus bi)
|Aij |2Ij
where Aij is the (i j) entry of A
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43
31 Convergence of the norms of the expectations
Theorem 4
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
wherex0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the solution set
x isin Rn | Ax = b
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 10 43
Proof of Theorem 4
Note that the conditioned expectation on xkminus1
E[xk |xkminus1]
= xkminus1 minus αE983063IJ (AIJ )T(II)T
983042AIJ 9830422F
983064(Axkminus1 minus b)
= xkminus1 minus α
983091
983107983131
(IJ )isinP
IJ (AIJ )T(II)T
983042AIJ 9830422F983042AIJ 9830422F983042A9830422F
983092
983108 (Axkminus1 minus b)
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Then the conditioned expectation E[xk minus x0983183 |xkminus1] is given by
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 11 43
E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx0
983183)minus x0983183
=
983061Iminus αATA
983042A9830422F
983062(xkminus1 minus x0
983183)
Taking expectation gives
E[xk minus x0983183] = E[E[xk minus x0
983183 |xkminus1]] =
983061Iminus αATA
983042A9830422F
983062E[xkminus1 minus x0
983183]
=
983061Iminus αATA
983042A9830422F
983062k
(x0 minus x0983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43
Applying the norms to both sides we obtain
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
Here the inequality follows from the fact that
x0 minus x0983183 = AdaggerAx
0 minusAdaggerb isin range(AT)
and Lemma 3
Remark 1
If x0 isin range(AT) then x0983183 = Adaggerb
To ensure convergence of the expected iterate it suffices to have
max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43
Theorem 5
Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
where x983183 is any solution of
ATAx = ATb
Proof of Theorem 5
Note that the conditional expectation given $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\big(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\big) \\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star\right) \\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star\right)
\quad (\text{by } A^T b = A^T A x_\star) \\
&= Ax^{k-1} - Ax_\star - \frac{\alpha A A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) \\
&= \left(I - \frac{\alpha A A^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}
\]
Taking expectations gives
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\big[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\big] \\
&= \left(I - \frac{\alpha A A^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \left(I - \frac{\alpha A A^T}{\|A\|_F^2}\right)^k (Ax^0 - Ax_\star).
\end{aligned}
\]
Taking norms on both sides, we obtain
\[
\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2.
\]
Here the inequality follows from the fact that
\[
Ax^0 - Ax_\star \in \operatorname{range}(A)
\]
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. Assume
\[
0 < \alpha < \frac{2}{t}.
\]
In exact arithmetic, it holds that
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\]
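As a reading aid, the short computation below finds the step size that makes the bound of Theorem 6 tightest. It uses only the factor $2\alpha - t\alpha^2$ from the theorem; the choice $\alpha = 1/t$ is a consequence of the bound, not a recommendation stated in the lecture.

```latex
\[
\frac{d}{d\alpha}\,(2\alpha - t\alpha^2) = 2 - 2t\alpha = 0
\;\Longrightarrow\; \alpha = \frac{1}{t},
\qquad
(2\alpha - t\alpha^2)\Big|_{\alpha = 1/t} = \frac{1}{t},
\]
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
\le \left(1 - \frac{\sigma_n^2(A)}{t\,\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2
\qquad (\alpha = 1/t).
\]
```

The value $\alpha = 1/t$ lies inside the admissible interval $0 < \alpha < 2/t$, so the theorem applies to it directly.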
Proof of Theorem 6
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - A^\dagger b - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b)\right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2
 - 2\alpha\,(x^{k-1} - A^\dagger b)^T \frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \frac{A^T I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (I_{:,\mathcal J})^T I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^4}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2
 - 2\alpha\,(x^{k-1} - A^\dagger b)^T \frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \frac{A^T I_{:,\mathcal I} (I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b).
\end{aligned}
\]
The last inequality follows from $(I_{:,\mathcal J})^T I_{:,\mathcal J} = I$ and Lemma 1. Taking the conditional expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)\,(x^{k-1} - A^\dagger b)^T \frac{A^T A}{\|A\|_F^2}(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2
\quad (\text{by Lemma 2}).
\end{aligned}
\]
Taking expectations again gives
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
\]
Remark 2
If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show
\[
x^k - x_\star^0 \in \operatorname{range}(A^T)
\]
by induction, where
\[
x_\star^0 = (I - A^\dagger A)x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the set
\[
\{x \in \mathbb{R}^n \mid Ax = b\}.
\]
Then, for rank-deficient consistent linear systems, the same approach yields the convergence bound
\[
\mathbb{E}\big[\|x^k - x_\star^0\|_2^2\big] \le \left(1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x_\star^0\|_2^2.
\]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j}).
\]
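The two cases of Theorem 7 can be evaluated numerically with a small helper. This is a sketch for exploring how the per-step factor depends on $\alpha$; the function name `theorem7_factor` and its argument names are assumptions made for this illustration.

```python
def theorem7_factor(alpha, sigma_r, fro2, t, n, rho=None):
    """Per-step factor in the residual bound of Theorem 7.

    sigma_r: smallest nonzero singular value of A
    fro2:    squared Frobenius norm of A
    rho:     max_j sigma_1^2 of the column blocks (needed only when t < n)
    """
    s2 = sigma_r ** 2
    if t == n:
        if not 0 < alpha < 2 * s2 / fro2:
            raise ValueError("alpha outside (0, 2*sigma_r^2/||A||_F^2)")
        return 1 + alpha ** 2 - 2 * alpha * s2 / fro2
    if rho is None:
        raise ValueError("rho is required when t < n")
    if not 0 < alpha < 2 * s2 / (t * rho):
        raise ValueError("alpha outside (0, 2*sigma_r^2/(t*rho))")
    return 1 - (2 * alpha * s2 - t * rho * alpha ** 2) / fro2


# Assumed example: sigma_r = 1, ||A||_F^2 = 4, n = 2.
factor_single_columns = theorem7_factor(0.25, 1.0, 4.0, t=2, n=2)
factor_one_block = theorem7_factor(1.0 / 3.0, 1.0, 4.0, t=1, n=2, rho=3.0)
```

Both calls use the step size that minimizes the respective factor, $\alpha = \sigma_r^2/\|A\|_F^2$ for $t = n$ and $\alpha = \sigma_r^2/(t\rho)$ for $t < n$, which follow from setting the derivative of each quadratic in $\alpha$ to zero.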
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\,\frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) - b\right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2
 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (I_{:,\mathcal J})^T A^T A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}(Ax^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[
(I_{:,\mathcal J})^T A^T A I_{:,\mathcal J} = \|A_{:,\mathcal J}\|_F^2
\]
(since $A I_{:,\mathcal J} = A_{:,\mathcal J}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2
 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal J}\|_F^2\, I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2
 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal J}\|_F^2\, I_{:,\mathcal I} (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b)
\quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking the conditional expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A A^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectations again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[
(I_{:,\mathcal J})^T A^T A I_{:,\mathcal J} = (A_{:,\mathcal J})^T A_{:,\mathcal J} \preceq \rho I
\]
(since $\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j})$) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2
 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2
 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal I} (I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b)
\quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking the conditional expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A A^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectations again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[
(A_{:,\mathcal J})^T b = (A_{:,\mathcal J})^T A x_\star,
\]
where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j}).
\]
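4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal I_1, \mathcal I_2, \ldots, \mathcal I_s\}$ and $\{\mathcal J_1, \mathcal J_2, \ldots, \mathcal J_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
  Pick $j \in [t]$ with probability $\|A_{:,\mathcal J_j}\|_F^2 / \|A\|_F^2$.
  Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2}\, A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T z^{k-1}$.
  Pick $i \in [s]$ with probability $\|A_{\mathcal I_i,:}\|_F^2 / \|A\|_F^2$.
  Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\, (A_{\mathcal I_i,:})^T\big(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}\big)$.
By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].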
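The two updates of Algorithm 2 can be sketched in pure Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `rebk`, the list-of-lists matrix format, the fixed iteration count, and the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization conditions) are all introduced here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK on dense lists-of-lists.

    row_blocks / col_blocks: partitions of the row / column indices.
    Returns (x, z) after `iters` iterations.
    """
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    # Squared Frobenius norms of the column blocks and of the row blocks.
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    z = list(b)      # z^0 = b lies in b + range(A)
    x = [0.0] * n    # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: sample a block with probability proportional to col_w.
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        u = [sum(A[i][j] * z[i] for i in range(m)) for j in J]  # (A_J)^T z
        for i in range(m):
            z[i] -= alpha / col_w[jb] * sum(A[i][j] * uj for j, uj in zip(J, u))
        # Row step: sample a block with probability proportional to row_w.
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha / row_w[ib] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x, z


# Inconsistent 3x2 system; its least-squares solution is (4/3, 7/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x_rek, z_rek = rebk(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]])
```

With single-row and single-column blocks and $\alpha = 1$, this call is the REK special case mentioned above. Larger blocks are passed as longer index lists, for example `row_blocks=[[0, 1], [2]]`.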
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]
Then, by the law of total expectation, we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\cdot]\big].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).
\]
It holds that
\[
\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^T z^0\|_2}{\|A\|_F^2}\right),
\]
where
\[
\delta = \max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.
\]
Proof. By $A^T b = A^T A A^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}(A_{\mathcal I_i,:})^T\big(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}\big),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\,\frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{A A^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,z^{k-1}.
\end{aligned}
\]
Taking expectations gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]
Taking norms on both sides, we obtain
\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^T z^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^T z^0 \in \operatorname{range}(A^T),
\]
and Lemma 3.
4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$
The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to
\[
b_\perp = (I - A A^\dagger)b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $z^k$ be the vector generated in REBK with
\[
z^0 \in b + \operatorname{range}(A).
\]
It holds that
\[
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal J_j})^T b_\perp = 0$, we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2}\, A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T (z^{k-1} - b_\perp). \tag{1}
\]
By $z^0 - b_\perp = A A^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that
\[
z^k - b_\perp \in \operatorname{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\,\|(A_{:,\mathcal J_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal J_j}\|_F^4}\,(z^{k-1} - b_\perp)^T A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|(A_{:,\mathcal J_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2}
\quad (\text{by Lemma 1}).
\end{aligned}
\]
Taking the conditional expectation given the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}
\]
Taking expectations again gives
\[
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\,\mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).
\]
For any $\varepsilon > 0$, it holds that
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1 + \varepsilon)^k \rho^k \left(\frac{(1 + \varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
Proof. Let
\[
\hat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\,(A_{\mathcal I_i,:})^T A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.
It follows from
\[
x^k - \hat{x}^k = \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\,(A_{\mathcal I_i,:})^T\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big)
\]
that
\[
\begin{aligned}
\|x^k - \hat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4}\,\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big)^T A_{\mathcal I_i,:}(A_{\mathcal I_i,:})^T \big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big) \\
&\le \frac{\alpha^2\,\|b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2}
\quad (\text{by Lemma 1}).
\end{aligned}
\tag{2}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \hat{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\|x^k - \hat{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right]
\quad (\text{by (2)})
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - \hat{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}\big[\|b - A A^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2
\quad (\text{by Lemma 9}).
\end{aligned}
\tag{3}
\]
By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$, we have
\[
x^0 - A^\dagger b \in \operatorname{range}(A^T).
\]
Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\,\|A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4}\,(x^{k-1} - A^\dagger b)^T (A_{\mathcal I_i,:})^T A_{\mathcal I_i,:}(A_{\mathcal I_i,:})^T A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2}
\quad (\text{by Lemma 1}),
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 3}),
\end{aligned}
\]
which yields
\[
\mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}
\]
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \hat{x}^k\|_2\,\|\hat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \hat{x}^k\|_2^2 + (1 + \varepsilon)\,\|\hat{x}^k - A^\dagger b\|_2^2.
\end{aligned}
\tag{5}
\]
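The last step of (5) is a weighted form of the elementary inequality $2pq \le p^2 + q^2$. Spelled out below, with the shorthand $p$ and $q$ introduced only for this note:

```latex
\[
p = \|x^k - \hat{x}^k\|_2, \qquad q = \|\hat{x}^k - A^\dagger b\|_2,
\]
\[
0 \le \left(\frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,q\right)^2
= \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\;\Longrightarrow\;
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2,
\]
\[
p^2 + q^2 + 2pq \le \left(1 + \frac{1}{\varepsilon}\right)p^2 + (1 + \varepsilon)\,q^2 .
\]
```

A small $\varepsilon$ keeps the factor on the contracting term $q^2$ close to 1 at the price of a large factor $1 + 1/\varepsilon$ on the perturbation term $p^2$.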
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \hat{x}^k\|_2^2\big] + (1 + \varepsilon)\,\mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k\rho^k\,\|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
\]
The last inequality uses the geometric sum $1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} = \big((1 + \varepsilon)^k - 1\big)/\varepsilon \le (1 + \varepsilon)^k/\varepsilon$.
Remark 4
For REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\hat{x}^k - A^\dagger b)^T (x^k - \hat{x}^k) = 0,
\]
equation (5) becomes
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence bound for REK:
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Applying the norms to both sides we obtain
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
Here the inequality follows from the fact that
x0 minus x0983183 = AdaggerAx
0 minusAdaggerb isin range(AT)
and Lemma 3
Remark 1
If x0 isin range(AT) then x0983183 = Adaggerb
To ensure convergence of the expected iterate it suffices to have
max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43
Theorem 5
Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
where x983183 is any solution of
ATAx = ATb
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43
Proof of Theorem 5
Note that the conditioned expectation on xkminus1
E[Axk minusAx983183 |xkminus1]
= A(E[xk |xkminus1]minus x983183)
= A
983061xkminus1 minus α
AT
983042A9830422F(Axkminus1 minus b)minus x983183
983062
= A
983061xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx983183)minus x983183
983062
(by ATb = ATAx983183)
= Axkminus1 minusAx983183 minusαAAT
983042A9830422F(Axkminus1 minusAx983183)
=
983061Iminus αAAT
983042A9830422F
983062(Axkminus1 minusAx983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43
Taking expectation gives
E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]
=
983061Iminus αAAT
983042A9830422F
983062E[Axkminus1 minusAx983183]
=
983061Iminus αAAT
983042A9830422F
983062k
(Ax0 minusAx983183)
Applying the norms to both sides we obtain
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
Here the inequality follows from the fact that
Ax0 minusAx983183 isin range(A)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43
32 Convergence of the expected norms
Theorem 6
Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system
Ax = b
with arbitrary x0 isin Rn Assume
0 lt α lt 2t
In exact arithmetic it holds
E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If t = 1 and x0 isin range(AT) we can show
xk minus x0983183 isin range(AT)
by induction where
x0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the set
x isin Rn | Ax = b
Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound
E[983042xk minus x098318398304222] le
9830611minus (2αminus α2)σ2
r (A)
983042A9830422F
983062k
983042x0 minus x098318398304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43
Theorem 7
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)
Ax = b
with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then
E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minus b98304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43
Proof of Theorem 7
Note that
983042Axk minus b98304222
=
983056983056983056983056Axkminus1 minus α
983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minus b
9830569830569830569830562
2
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
If t = n then it follows from
(IJ )TATAIJ = 983042AJ 9830422F
(since AIJ = AJ is a column vector) that
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43
983042Axk minus b98304222
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611 + α2 minus 2ασ2
r(A)
983042A9830422F
983062983042Axkminus1 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43
If t lt n then it follows from
(IJ )TATAIJ = AT
JAJ ≼ ρI
(since ρ = max1lejlet σ21(AJj )) that
983042Axk minus b98304222
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρII(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
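Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1,\mathcal{I}_2,\dots,\mathcal{I}_s\}$ and $\{\mathcal{J}_1,\mathcal{J}_2,\dots,\mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \dots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$.
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T z^{k-1}$.
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2/\|A\|_F^2$.
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T\big(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$.

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].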
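The two updates of Algorithm 2 can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `rebk`, the dense list-of-rows storage, the fixed iteration count, the seed, and the particular choices $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization conditions) are all assumptions made here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK on a dense matrix A given as a list of rows.

    row_blocks / col_blocks are lists of index lists that partition the
    row and column indices. Returns the final pair (x, z).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the column blocks A[:, J] and row blocks A[I, :].
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    z = [float(v) for v in b]   # assumed start z^0 = b, which lies in b + range(A)
    x = [0.0] * n               # assumed start x^0 = 0, which lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha / ||A[:, J]||_F^2 * A[:, J] A[:, J]^T z.
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        u = [sum(A[i][j] * z[i] for i in range(m)) for j in J]
        for i in range(m):
            z[i] -= alpha / col_w[jb] * sum(A[i][j] * uj for j, uj in zip(J, u))
        # Row step: x <- x - alpha / ||A[I, :]||_F^2 * A[I, :]^T (A[I, :] x - b_I + z_I),
        # using the freshly updated z.
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha / row_w[ib] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x, z


# Small inconsistent system whose least-squares solution is (4/3, 7/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x, z = rebk(A, b, row_blocks=[[0, 1], [2]], col_blocks=[[0], [1]])
```

In this small example `x` approaches the least-squares solution and `z` approaches the component of `b` orthogonal to $\operatorname{range}(A)$. Setting every block to a single index and `alpha=1.0` gives the REK special case mentioned above.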
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}\big[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}\big],$$
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}\big[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}, j_k\big].$$
Then, by the law of total expectation, we have
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].$$
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$. It holds
$$\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right),$$
where
$$\delta = \max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| .$$
Proof. By $A^Tb = A^TAA^\dagger b$ and
$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T\big(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big),$$
we have
$$
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1} .
\end{aligned}
$$
Taking expectation gives
$$
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} .
\end{aligned}
$$
Applying the norms to both sides, we obtain
$$
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
$$
Here the last inequality follows from the facts that
$$x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^Tz^0 \in \operatorname{range}(A^T),$$
and Lemma 3.
4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$
The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2} .$$
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to
$$b_\perp = (I - AA^\dagger)b,$$
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
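To see why $b_\perp$ is that projection, one can write $z^0 = b + Ay$ for some vector $y$ (this is what $z^0 \in b + \operatorname{range}(A)$ means; the symbol $y$ is introduced here for illustration) and use the standard fact that $I - AA^\dagger$ is the orthogonal projector onto $\{z \mid A^Tz = 0\}$:

$$
(I - AA^\dagger)z^0 = (I - AA^\dagger)b + (A - AA^\dagger A)y = (I - AA^\dagger)b = b_\perp ,
$$

where the middle step uses $AA^\dagger A = A$. In particular, the simplest admissible initialization is $z^0 = b$.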
Lemma 9
For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \operatorname{range}(A)$. It holds
$$\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k\|z^0 - b_\perp\|_2^2 .$$
Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp = 0$, we have
$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp). \tag{1}$$
By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that
$$z^k - b_\perp \in \operatorname{range}(A).$$
It follows from (1) that
$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\,\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
$$
Taking the conditional expectation on the first $k-1$ iterations yields
$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$
Taking expectation again gives
$$
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\,\mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2 .
\end{aligned}
$$
This completes the proof.
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$. For any $\varepsilon > 0$, it holds
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
Proof. Let
$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
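It follows from
$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T\big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\big)$$
that
$$
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\big)^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T\big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\big) \\
&\le \frac{\alpha^2\,\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned} \tag{2}
$$
It follows from
$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[\|x^k - \widehat{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
$$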
that
$$
\begin{aligned}
\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned} \tag{3}
$$
By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$, we have
$$x^0 - A^\dagger b \in \operatorname{range}(A^T).$$
Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By
$$
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
$$
we have
$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
$$
which yields
$$\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}$$
Note that for any $\varepsilon > 0$, we have
$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \widehat{x}^k\|_2\,\|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon)\,\|\widehat{x}^k - A^\dagger b\|_2^2 .
\end{aligned} \tag{5}
$$
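The last step of (5) is an instance of Young's inequality. As a short worked version, write $a$ for the norm of $x^k$ minus its auxiliary iterate and $c$ for the norm of the auxiliary iterate minus $A^\dagger b$ (shorthand introduced here, not in the original slides). Then for any $\varepsilon > 0$:

$$
0 \le \left(\frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,c\right)^2
\;\Longrightarrow\; 2ac \le \frac{a^2}{\varepsilon} + \varepsilon c^2
\;\Longrightarrow\; a^2 + c^2 + 2ac \le \left(1 + \frac{1}{\varepsilon}\right)a^2 + (1+\varepsilon)\,c^2 .
$$

A small $\varepsilon$ keeps the factor on the contracting term close to 1 at the price of a large factor on the perturbation term, which is why $\varepsilon$ remains a free parameter in Theorem 10.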
Combining (3), (4), and (5) yields
$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] + (1+\varepsilon)\,\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
$$
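The final inequality in this chain bounds a geometric sum. A worked version of that one step, for any $\varepsilon > 0$:

$$
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l
= \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k - 1}{\varepsilon}
\le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2} .
$$

Multiplying by $\alpha^2\rho^k/\|A\|_F^2$ gives the coefficient of $\|z^0 - b_\perp\|_2^2$ stated in Theorem 10.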
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\widehat{x}^k - A^\dagger b)^T(x^k - \widehat{x}^k) = 0,$$
the equation (5) becomes
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence for REK:
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
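One way to recover the factor $k$ in this bound is to unroll the recursion that (3) and (4) give once the Pythagorean identity replaces (5). The sketch below uses shorthand introduced here, not in the original slides: $e_k$ for the expected squared error at iteration $k$ and $c = \|z^0 - b_\perp\|_2^2/\|A\|_F^2$, with $\alpha = 1$:

$$
e_k \le \rho^k c + \rho\,e_{k-1}
\le \rho^k c + \rho\big(\rho^{k-1}c + \rho\,e_{k-2}\big)
= 2\rho^k c + \rho^2 e_{k-2}
\le \cdots \le k\rho^k c + \rho^k e_0 .
$$

Each of the $k$ unrolling steps contributes one copy of $\rho^k c$, so no $(1+\varepsilon)^k$ factor is needed in the REK case.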
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Randomized Gauss-Seidel [4][5] ($s = 1$, $t = n$, $\alpha = 1$):
$$x^k = x^{k-1} - \frac{(A_{:,j})^T(Ax^{k-1} - b)}{\|A_{:,j}\|_2^2}\, I_{:,j},$$
where $A_{:,j}$ is the $j$th column of $A$ and $I_{:,j}$ is the $j$th column of the $n \times n$ identity matrix $I$.
Doubly stochastic Gauss-Seidel [6] ($s = m$, $t = n$):
$$x^k = x^{k-1} - \alpha\,\frac{A_{i,j}(A_{i,:}x^{k-1} - b_i)}{|A_{i,j}|^2}\, I_{:,j},$$
where $A_{i,j}$ is the $(i, j)$ entry of $A$ and $A_{i,:}$ is the $i$th row of $A$.
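The randomized Gauss-Seidel update above changes a single coordinate of $x$ per step, so the residual can be maintained incrementally. The following plain-Python sketch illustrates this under stated assumptions: the function name `randomized_gauss_seidel`, the start $x^0 = 0$, the iteration count, and the seed are choices made here, and column $j$ is sampled with probability $\|A_{:,j}\|_2^2/\|A\|_F^2$, which is the DSBGS sampling rule specialized to $s = 1$, $t = n$.

```python
import random


def randomized_gauss_seidel(A, b, iters=500, seed=0):
    """Column-action sketch for a dense matrix A stored as a list of rows."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    col_norm2 = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    x = [0.0] * n
    r = [-float(bi) for bi in b]          # residual A x - b at x = 0
    for _ in range(iters):
        # Sample column j with probability ||A[:, j]||^2 / ||A||_F^2.
        j = rng.choices(range(n), weights=col_norm2)[0]
        step = sum(A[i][j] * r[i] for i in range(m)) / col_norm2[j]
        x[j] -= step                      # only coordinate j of x changes
        for i in range(m):
            r[i] -= step * A[i][j]        # keep r equal to A x - b
    return x


# Full-column-rank example whose least-squares solution is (4/3, 7/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x = randomized_gauss_seidel(A, b)
```

Updating the residual in place costs one column access per iteration instead of a full matrix-vector product.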
3.1 Convergence of the norms of the expectations
Theorem 4
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
$$\big\|\mathbb{E}[x^k - x_\star^0]\big\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k\|x^0 - x_\star^0\|_2,$$
where
$$x_\star^0 = (I - A^\dagger A)x^0 + A^\dagger b,$$
i.e., the projection of $x^0$ onto the solution set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
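Proof of Theorem 4
Note that the conditional expectation on $x^{k-1}$ is
$$
\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\,\mathbb{E}\left[\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right](Ax^{k-1} - b) \\
&= x^{k-1} - \alpha\left(\sum_{(\mathcal{I},\mathcal{J})\in\mathcal{P}}\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\cdot\frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b).
\end{aligned}
$$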
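The collapse of the block sum to $A^T(Ax^{k-1} - b)/\|A\|_F^2$ can be checked numerically by enumerating every block of a partition. The plain-Python sketch below is only an illustration: the function names `expected_dsbgs_direction` and `landweber_direction`, the test matrix, and the vector `r` (standing in for the residual $Ax^{k-1} - b$) are assumptions made here.

```python
def expected_dsbgs_direction(A, r, row_blocks, col_blocks):
    """Average the block direction I[:, J] A[I, J]^T r_I / ||A[I, J]||_F^2 over all
    blocks (I, J), weighted by the sampling probability ||A[I, J]||_F^2 / ||A||_F^2.
    Here r stands for the residual A x - b."""
    n = len(A[0])
    fro2 = sum(a * a for row in A for a in row)
    d = [0.0] * n
    for I in row_blocks:
        for J in col_blocks:
            w = sum(A[i][j] ** 2 for i in I for j in J)   # ||A[I, J]||_F^2
            if w == 0.0:
                continue                                   # a zero block is never sampled
            for j in J:
                d[j] += (w / fro2) * sum(A[i][j] * r[i] for i in I) / w
    return d


def landweber_direction(A, r):
    """Closed form A^T r / ||A||_F^2 from the last line of the derivation."""
    m, n = len(A), len(A[0])
    fro2 = sum(a * a for row in A for a in row)
    return [sum(A[i][j] * r[i] for i in range(m)) / fro2 for j in range(n)]


A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
r = [0.5, -1.0, 2.0, 3.0]
d_blocks = expected_dsbgs_direction(
    A, r, row_blocks=[[0, 1], [2, 3]], col_blocks=[[0], [1, 2]]
)
d_closed = landweber_direction(A, r)
```

The two vectors agree up to rounding for any partition, which reflects that the expected DSBGS step does not depend on how the rows and columns are grouped.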
Then the conditional expectation $\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]$ is given by
$$
\begin{aligned}
\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x_\star^0 \\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star^0 \\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star^0) - x_\star^0 \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - x_\star^0).
\end{aligned}
$$
Taking expectation gives
$$
\begin{aligned}
\mathbb{E}[x^k - x_\star^0]
&= \mathbb{E}\big[\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]\big]
= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - x_\star^0] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - x_\star^0).
\end{aligned}
$$
Applying the norms to both sides, we obtain
$$\big\|\mathbb{E}[x^k - x_\star^0]\big\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k\|x^0 - x_\star^0\|_2 .$$
Here the inequality follows from the fact that
$$x^0 - x_\star^0 = A^\dagger Ax^0 - A^\dagger b \in \operatorname{range}(A^T)$$
and Lemma 3.
Remark 1
If $x^0 \in \operatorname{range}(A^T)$, then $x_\star^0 = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
$$\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| < 1, \quad\text{i.e.,}\quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)} .$$
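The step-size range in Remark 1 follows by unpacking the absolute value for each nonzero singular value. A short derivation:

$$
\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| < 1
\iff 0 < \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} < 2
\iff 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_i^2(A)}, \qquad i = 1, \dots, r .
$$

The binding constraint comes from the largest singular value $\sigma_1(A)$. Since $\sigma_1^2(A) \le \|A\|_F^2$, every $\alpha \in (0, 2)$ lies in the admissible range.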
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
$$\big\|\mathbb{E}[Ax^k - Ax_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k\|Ax^0 - Ax_\star\|_2,$$
where $x_\star$ is any solution of
$$A^TAx = A^Tb .$$
Proof of Theorem 5
Note that the conditional expectation on $x^{k-1}$ is
$$
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\big(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\big) \\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star\right) \\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star\right) \quad \text{(by } A^Tb = A^TAx_\star\text{)} \\
&= Ax^{k-1} - Ax_\star - \frac{\alpha AA^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}
$$
Taking expectation gives
$$
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\big[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\big] \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)^k(Ax^0 - Ax_\star).
\end{aligned}
$$
Applying the norms to both sides, we obtain
$$\big\|\mathbb{E}[Ax^k - Ax_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k\|Ax^0 - Ax_\star\|_2 .$$
Here the inequality follows from the fact that
$$Ax^0 - Ax_\star \in \operatorname{range}(A)$$
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. Assume
$$0 < \alpha < 2/t .$$
In exact arithmetic, it holds
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0 - A^\dagger b\|_2^2 .$$
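The bound in Theorem 6 also suggests a step size: its contraction factor is smallest when $2\alpha - t\alpha^2$ is largest. A quick calculation, which is not part of the original slides:

$$
\frac{d}{d\alpha}\big(2\alpha - t\alpha^2\big) = 2 - 2t\alpha = 0 \iff \alpha = \frac{1}{t},
\qquad \big(2\alpha - t\alpha^2\big)\Big|_{\alpha = 1/t} = \frac{1}{t} .
$$

With this choice the factor becomes $1 - \sigma_n^2(A)/(t\|A\|_F^2)$. This only optimizes the upper bound; the step size that performs best in practice may differ.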
Proof of Theorem 6
$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - A^\dagger b - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b)\right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b).
\end{aligned}
$$
The last inequality follows from $(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2 \,\big|\, x^{k-1}\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)(x^{k-1} - A^\dagger b)^T\left(\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$
Taking expectation again gives
$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0 - A^\dagger b\|_2^2 .
\end{aligned}
$$
Remark 2
If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show
$$x^k - x_\star^0 \in \operatorname{range}(A^T)$$
by induction, where
$$x_\star^0 = (I - A^\dagger A)x^0 + A^\dagger b,$$
i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
$$\mathbb{E}\big[\|x^k - x_\star^0\|_2^2\big] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|x^0 - x_\star^0\|_2^2 .$$
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient) $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2 .$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}) .$$
Proof of Theorem 7
Note that
$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) - b\right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}
$$
If $t = n$, then it follows from
$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2$$
(since $AI_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$
Taking expectation gives
$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 .
\end{aligned}
$$
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2 .
\end{aligned}
$$
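If $t < n$, then it follows from
$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^TA_{:,\mathcal{J}} \preceq \rho I$$
(since $\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that
$$
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\rho\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\rho\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$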
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do
Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α
983042AJj9830422FAJj (AJj )
Tzkminus1
Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43
Let Ekminus1
983045middot983046denote the conditional expectation conditioned on the first
k minus 1 iterations of REBK That is
Ekminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1
983046
where jl is the lth column block chosen and il is the lth row blockchosen
We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as
Eikminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk
983046
Then by the law of total expectation we have
Ekminus1
983045middot983046= Ekminus1
983045Eikminus1
983045middot983046983046
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43
Proof By ATb = ATAAdaggerb and
xk minusAdaggerb = xkminus1 minusAdaggerbminus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
we have
Ekminus1
983045xk minusAdaggerb
983046= Ekminus1
983045Eikminus1
983045xk minusAdaggerb
983046983046
= xkminus1 minusAdaggerbminus Ekminus1
983063αAT(Axkminus1 minus b+ zk)
983042A9830422F
983064
= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb
983042A9830422Fminus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422F
983061Iminus α
AAT
983042A9830422F
983062zkminus1
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422Fzkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43
Taking expectation gives
E983045xk minusAdaggerb
983046
= E983045Ekminus1
983045xk minusAdaggerb
983046983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422FE983045zkminus1
983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
=
983061Iminus α
ATA
983042A9830422F
9830622
E983045xkminus2 minusAdaggerb
983046minus 2α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F= middot middot middot
=
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
42 Convergence of E983045983042xk minusAdaggerb98304222
983046
The convergence ofE983045983042xk minusAdaggerb98304222
983046
depends on the positive number ρ defined as
ρ = 1minus (2αminus α2)σ2r (A)
983042A9830422F
In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to
bperp = (IminusAAdagger)b
which is the orthogonal projection of z0 onto the set z | ATz = 0
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43
Lemma 9
For any given consistent or inconsistent linear system
Ax = b
let zk be the vector generated in REBK with
z0 isin b+ range(A)
It holdsE983045983042zk minus bperp98304222
983046le ρk983042z0 minus bperp98304222
Proof By (AJj )Tbperp = 0 we have
zk minus bperp = zkminus1 minus bperp minus α
983042AJj9830422FAJj (AJj )
T(zkminus1 minus bperp) (1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43
By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that
zk minus bperp isin range(A)
It follows from (1) that
983042zk minus bperp98304222
= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
+α2
983042AJj9830424F(zkminus1 minus bperp)
TAJj (AJj )TAJj (AJj )
T(zkminus1 minus bperp)
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43
Taking the conditioned expectation on the first k minus 1 iterations yields
Ekminus1
983045983042zk minus bperp98304222
983046
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222
983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)
Taking expectation again gives
E983045983042zk minus bperp98304222
983046= E
983045Ekminus1
983045983042zk minus bperp98304222
983046983046
le ρE983045983042zkminus1 minus bperp98304222
983046
le ρk983042z0 minus bperp98304222
This completes the proof
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 37 43
Theorem 10
For any given consistent or inconsistent linear system Ax = b let xk
be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
For any ε gt 0 it holds
E983045983042xk minusAdaggerb98304222
983046le (1 + ε)kρk
983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Proof Let
983141xk = xkminus1 minus α
983042AIi9830422F(AIi)
TAIi(xkminus1 minusAdaggerb)
which is actually one DSBGS update for the linear systemAx = AAdaggerb from xkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43
It follows from
xk minus 983141xk =α
983042AIi9830422F(AIi)
T(bIi minusAIiAdaggerbminus zkIi)
that
983042xk minus 983141xk98304222
=α2
983042AIi9830424F(bIi minusAIiA
daggerbminus zkIi)TAIi(AIi)
T(bIi minusAIiAdaggerbminus zkIi
)
leα2983042bIi minusAIiA
daggerbminus zkIi98304222
983042AIi9830422F (by Lemma 1) (2)
It follows from
Ekminus1
983045983042xk minus 983141xk98304222
983046= Ekminus1
983045Eikminus1
983045983042xk minus 983141xk98304222
983046983046
le Ekminus1
983063α2983042bminusAAdaggerbminus zk98304222
983042A9830422F
983064(by (2))
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43
that
E983045983042xk minus 983141xk98304222
983046le α2
983042A9830422FE983045983042bminusAAdaggerbminus zk98304222
983046
le α2ρk
983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)
By x0 isin range(AT) and Adaggerb isin range(AT) we have
x0 minusAdaggerb isin range(AT)
Then we can show that xk minusAdaggerb isin range(AT) by induction By
983042983141xk minusAdaggerb98304222
= 983042xkminus1 minusAdaggerb98304222 minus2α983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
+α2
983042AIi9830424F(xkminus1 minusAdaggerb)T(AIi)
TAIi(AIi)TAIi(x
kminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 40 43
we have
Ekminus1
983045983042983141xk minusAdaggerb98304222
983046
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042A(xkminus1 minusAdaggerb)98304222
983042A9830422Fle ρ983042xkminus1 minusAdaggerb98304222 (by Lemma 3)
which yields
E983045983042983141xk minusAdaggerb98304222
983046le ρE
983045983042xkminus1 minusAdaggerb98304222
983046 (4)
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2
\le \bigl(\|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2\bigr)^2 \\
&\le \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \widetilde{x}^k\|_2\,\|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\|x^k - \widetilde{x}^k\|_2^2 + (1 + \varepsilon)\,\|\widetilde{x}^k - A^\dagger b\|_2^2.
\end{aligned}
\tag{5}
\]
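The last step of (5) is an instance of Young's inequality. A short derivation is sketched here for completeness; the scalars $p$ and $q$ are shorthand introduced only for this illustration and do not appear in the slides.

```latex
% Shorthand (assumption for illustration):
%   p = \|x^k - \widetilde{x}^k\|_2,  q = \|\widetilde{x}^k - A^\dagger b\|_2.
\[
0 \le \Bigl(\frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,q\Bigr)^2
  = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\quad\Longrightarrow\quad
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2 ,
\]
\[
p^2 + q^2 + 2pq \le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)p^2 + (1 + \varepsilon)\,q^2 .
\]
```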
Combining (3), (4) and (5) yields
\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\mathbb{E}\bigl[\|x^k - \widetilde{x}^k\|_2^2\bigr] + (1 + \varepsilon)\,\mathbb{E}\bigl[\|\widetilde{x}^k - A^\dagger b\|_2^2\bigr] \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\,\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\bigl(1 + (1 + \varepsilon)\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2\rho^2\,\mathbb{E}\bigl[\|x^{k-2} - A^\dagger b\|_2^2\bigr] \\
&\le \cdots \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\bigl(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k\rho^k\,\|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
\]
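The final inequality in the chain above rests on a geometric sum. The step is sketched below as a worked equation; nothing is assumed beyond $\varepsilon > 0$.

```latex
\[
1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}
  = \frac{(1+\varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1+\varepsilon)^k}{\varepsilon},
\]
\[
\Bigl(1 + \frac{1}{\varepsilon}\Bigr)\frac{(1+\varepsilon)^k}{\varepsilon}
  = \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
  = (1+\varepsilon)^k\,\frac{1+\varepsilon}{\varepsilon^2}.
\]
```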
Remark 4
For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\widetilde{x}^k - A^\dagger b)^T(x^k - \widetilde{x}^k) = 0,
\]
the inequality (5) becomes the equality
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence bound for REK:
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
3.1 Convergence of the norms of the expectations
Theorem 4
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
\[
\|\mathbb{E}[x^k - x^0_\star]\|_2 \le \left(\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|x^0 - x^0_\star\|_2,
\]
where
\[
x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the solution set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
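Proof of Theorem 4
Note that the expectation conditioned on $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\,\mathbb{E}\left[\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right](Ax^{k-1} - b) \\
&= x^{k-1} - \alpha\left(\sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\cdot\frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&= x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - b).
\end{aligned}
\]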
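The identity used in the last step, namely that the probability-weighted block update matrices sum to $A^T/\|A\|_F^2$, can be checked numerically. The following is a minimal sketch under stated assumptions: the function name `expected_update_matrix`, the test matrix, and the particular partitions are hypothetical choices made for illustration, not part of the lecture.

```python
import numpy as np

def expected_update_matrix(A, row_blocks, col_blocks):
    """Sum over (I, J) of prob(I, J) * I_{:,J} (A_{I,J})^T (I_{:,I})^T / ||A_{I,J}||_F^2."""
    m, n = A.shape
    fro2 = np.linalg.norm(A, "fro") ** 2
    M = np.zeros((n, m))
    for I in row_blocks:
        for J in col_blocks:
            block = A[np.ix_(I, J)]
            w = np.linalg.norm(block, "fro") ** 2   # ||A_{I,J}||_F^2
            if w == 0.0:
                continue                            # a zero block is picked with probability 0
            prob = w / fro2                         # sampling probability of (I, J)
            U = np.zeros((n, m))
            U[np.ix_(J, I)] = block.T               # I_{:,J} (A_{I,J})^T (I_{:,I})^T
            M += prob * U / w
    return M

# Hypothetical example data: a random 6-by-4 matrix with arbitrary partitions.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
row_blocks = [[0, 1], [2, 3, 4], [5]]
col_blocks = [[0, 2], [1, 3]]
M = expected_update_matrix(A, row_blocks, col_blocks)
target = A.T / np.linalg.norm(A, "fro") ** 2
print(np.allclose(M, target))
```

The weights cancel block by block, so each block of $A^T$ is placed exactly once, which is why the result does not depend on the chosen partition.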
Then the conditional expectation $\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]$ is given by
\[
\begin{aligned}
\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x^0_\star \\
&= x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x^0_\star \\
&= x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - Ax^0_\star) - x^0_\star \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - x^0_\star).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - x^0_\star]
&= \mathbb{E}\bigl[\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]\bigr]
= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - x^0_\star] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - x^0_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\|\mathbb{E}[x^k - x^0_\star]\|_2 \le \left(\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|x^0 - x^0_\star\|_2.
\]
Here the inequality follows from the fact that
\[
x^0 - x^0_\star = A^\dagger Ax^0 - A^\dagger b \in \mathrm{range}(A^T)
\]
and Lemma 3.
Remark 1
If $x^0 \in \mathrm{range}(A^T)$, then $x^0_\star = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
\[
\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.
\]
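The contraction factor and the admissible step-size range in Remark 1 are easy to evaluate for a given matrix. The sketch below is illustrative only: the helper names `contraction_factor` and `max_step_size`, the rank tolerance `tol`, and the $2 \times 2$ example are assumptions introduced here, and a nonzero $A$ is assumed.

```python
import numpy as np

def contraction_factor(A, alpha, tol=1e-12):
    """max over nonzero singular values of |1 - alpha * sigma_i(A)^2 / ||A||_F^2|."""
    sigma = np.linalg.svd(A, compute_uv=False)
    sigma = sigma[sigma > tol * sigma[0]]          # keep the r nonzero singular values
    fro2 = np.linalg.norm(A, "fro") ** 2
    return float(np.max(np.abs(1.0 - alpha * sigma ** 2 / fro2)))

def max_step_size(A):
    """Upper end of the interval 0 < alpha < 2 ||A||_F^2 / sigma_1(A)^2."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return float(2.0 * np.linalg.norm(A, "fro") ** 2 / sigma[0] ** 2)

# Hypothetical example: sigma = (2, 1) and ||A||_F^2 = 5.
A = np.diag([2.0, 1.0])
delta = contraction_factor(A, alpha=1.0)   # max(|1 - 4/5|, |1 - 1/5|)
alpha_max = max_step_size(A)               # 2 * 5 / 4
print(delta, alpha_max)
```

Step sizes just below `alpha_max` give a factor below 1, and step sizes just above it give a factor above 1, which matches the condition stated in the remark.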
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
\[
\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2,
\]
where $x_\star$ is any solution of
\[
A^TAx = A^Tb.
\]
Proof of Theorem 5
Note that the expectation conditioned on $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\bigl(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\bigr) \\
&= A\left(x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star\right) \\
&= A\left(x^{k-1} - \frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star\right) \quad \text{(by } A^Tb = A^TAx_\star\text{)} \\
&= Ax^{k-1} - Ax_\star - \frac{\alpha AA^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\bigl[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\bigr] \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)^k(Ax^0 - Ax_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2.
\]
Here the inequality follows from the fact that
\[
Ax^0 - Ax_\star \in \mathrm{range}(A)
\]
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. Assume
\[
0 < \alpha < \frac{2}{t}.
\]
In exact arithmetic, it holds
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\]
Proof of Theorem 6
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - A^\dagger b - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b)\right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2\,(x^{k-1} - A^\dagger b)^T\frac{A^TI_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2\,(x^{k-1} - A^\dagger b)^T\frac{A^TI_{:,\mathcal{I}}(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b).
\end{aligned}
\]
The last inequality follows from $(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\bigr]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)\,(x^{k-1} - A^\dagger b)^T\frac{A^TA}{\|A\|_F^2}(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0 - A^\dagger b\|_2^2.
\end{aligned}
\]
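The DSBGS update analysed in this proof can be sketched in a few lines of NumPy. This is a minimal illustration reconstructed from the update formula and sampling probabilities appearing in the proofs above, not the authors' reference implementation; the function name `dsbgs`, the zero starting vector, the fixed seed, and the small test system are all assumptions made here.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of DSBGS: pick (I, J) with probability ||A_{I,J}||_F^2 / ||A||_F^2, then
    x <- x - alpha * I_{:,J} (A_{I,J})^T (I_{:,I})^T (A x - b) / ||A_{I,J}||_F^2."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    weights = np.array([np.linalg.norm(A[np.ix_(I, J)], "fro") ** 2 for I, J in pairs])
    probs = weights / weights.sum()          # weights.sum() equals ||A||_F^2 for a partition
    x = np.zeros(n)                          # x^0 = 0 (any starting vector is allowed)
    for _ in range(iters):
        p = rng.choice(len(pairs), p=probs)
        I, J = pairs[p]
        residual = A[I, :] @ x - b[I]        # (I_{:,I})^T (A x - b)
        x[J] -= alpha * (A[np.ix_(I, J)].T @ residual) / weights[p]
    return x

# Hypothetical full column rank, consistent test system with t = 2 column blocks.
A = np.vstack([np.eye(3), np.eye(3), np.ones((2, 3))])
x_true = np.array([1.0, -2.0, 3.0])
b = A @ x_true
x = dsbgs(A, b, [[0, 1, 2, 3], [4, 5, 6, 7]], [[0], [1, 2]], alpha=0.5, iters=2000)
print(np.linalg.norm(x - x_true))
```

The step size `alpha=0.5` is chosen to satisfy the assumption $0 < \alpha < 2/t$ of Theorem 6 with $t = 2$; only the entries of `x` indexed by the chosen column block change in each iteration.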
Remark 2
If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show
\[
x^k - x^0_\star \in \mathrm{range}(A^T)
\]
by induction, where
\[
x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
\[
\mathbb{E}\bigl[\|x^k - x^0_\star\|_2^2\bigr] \le \left(1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|x^0 - x^0_\star\|_2^2.
\]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,\mathcal{J}_j}).
\]
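For a concrete matrix and column partition, the two rate factors of Theorem 7 can be evaluated directly. The sketch below only plugs numbers into the stated formulas; the function name `theorem7_rate`, the rank tolerance `tol`, and the diagonal example are hypothetical and a nonzero $A$ is assumed.

```python
import numpy as np

def theorem7_rate(A, col_blocks, alpha, tol=1e-12):
    """Per-iteration factor from Theorem 7 for the given column partition and step size."""
    sigma = np.linalg.svd(A, compute_uv=False)
    sigma_r2 = float(np.min(sigma[sigma > tol * sigma[0]]) ** 2)   # smallest nonzero sigma^2
    fro2 = np.linalg.norm(A, "fro") ** 2
    t, n = len(col_blocks), A.shape[1]
    if t == n:
        # Case t = n: every column block is a single column.
        return 1.0 + alpha ** 2 - 2.0 * alpha * sigma_r2 / fro2
    # Case t < n: rho is the largest squared spectral norm of a column block.
    rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)
    return 1.0 - (2.0 * alpha * sigma_r2 - t * rho * alpha ** 2) / fro2

# Hypothetical example: sigma_r^2 = 1 and ||A||_F^2 = 5.
A = np.diag([2.0, 1.0])
rate_single_columns = theorem7_rate(A, [[0], [1]], alpha=0.2)   # alpha < 2 * 1 / 5
rate_one_block = theorem7_rate(A, [[0, 1]], alpha=0.25)         # alpha < 2 * 1 / (1 * 4)
print(rate_single_columns, rate_one_block)
```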
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\,\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) - b\right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2\,(Ax^{k-1} - b)^T\frac{I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(Ax^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[
(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
\]
(since $AI_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2\,(Ax^{k-1} - b)^T\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2\,(Ax^{k-1} - b)^T\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]
&\le (1 + \alpha^2)\,\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[
(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^TA_{:,\mathcal{J}} \preceq \rho I
\]
(since $\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2\,(Ax^{k-1} - b)^T\frac{\rho\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2\,(Ax^{k-1} - b)^T\frac{\rho\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[
(A_{:,\mathcal{J}})^Tb = (A_{:,\mathcal{J}})^TAx_\star,
\]
where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,\mathcal{J}_j}).
\]
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
  Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
  Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
  for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$.
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^Tz^{k-1}$.
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2/\|A\|_F^2$.
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$.
  end for
By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
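Algorithm 2 translates almost line by line into NumPy. The following is a minimal sketch, assuming the particular initialization $z^0 = b$ and $x^0 = 0$ (both satisfy the stated conditions), a fixed iteration count instead of a stopping rule, and a small hypothetical inconsistent test system; the function name `rebk` and its parameters are illustrative.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of REBK (Algorithm 2) with z^0 = b and x^0 = 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_p = row_w / row_w.sum()      # ||A_{I_i,:}||_F^2 / ||A||_F^2
    col_p = col_w / col_w.sum()      # ||A_{:,J_j}||_F^2 / ||A||_F^2
    z = np.array(b, dtype=float)     # z^0 = b lies in b + range(A)
    x = np.zeros(n)                  # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        j = rng.choice(len(col_blocks), p=col_p)
        AJ = A[:, col_blocks[j]]
        z -= alpha / col_w[j] * (AJ @ (AJ.T @ z))                   # column-block step on z
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        AI = A[I, :]
        x -= alpha / row_w[i] * (AI.T @ (AI @ x - b[I] + z[I]))     # row-block step on x
    return x, z

# Hypothetical inconsistent test problem (b is generally not in range(A)).
A = np.vstack([np.eye(3), np.eye(3), np.ones((2, 3))])
b = np.arange(1.0, 9.0)
x, z = rebk(A, b, [[0, 1, 2, 3], [4, 5, 6, 7]], [[0], [1, 2]], alpha=1.0, iters=1000)
x_ls = np.linalg.pinv(A) @ b
print(np.linalg.norm(x - x_ls), np.linalg.norm(z - (b - A @ x_ls)))
```

Passing single-row and single-column blocks with `alpha=1.0` gives the REK special case mentioned above. The method is pseudoinverse-free: `np.linalg.pinv` is used only to produce the reference solution for comparison.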
Let $\mathbb{E}_{k-1}[\,\cdot\,]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\,\cdot\,] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}_{k-1}^{i}[\,\cdot\,] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]
Then, by the law of total expectation, we have
\[
\mathbb{E}_{k-1}[\,\cdot\,] = \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\,\cdot\,]\bigr].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
\]
It holds
\[
\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right),
\]
where
\[
\delta = \max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|.
\]
Proof. By $A^Tb = A^TAA^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\bigr] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \frac{\alpha(A^TAx^{k-1} - A^Tb)}{\|A\|_F^2} - \frac{\alpha A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2}\left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}[x^k - A^\dagger b]\bigr] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
\]
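The step that replaces the expectation of $z^{k-1}$ by a power of the iteration matrix applied to $z^0$ uses two facts that the slides leave implicit. They are sketched below as a worked equation, derived from the $z$-update of Algorithm 2 and its sampling probabilities.

```latex
% Averaging the z-update over the column blocks (picked with probability
% \|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2):
\[
\mathbb{E}_{k-1}[z^k]
 = z^{k-1} - \alpha \sum_{j=1}^{t} \frac{\|A_{:,\mathcal{J}_j}\|_F^2}{\|A\|_F^2}
   \cdot \frac{A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T}{\|A_{:,\mathcal{J}_j}\|_F^2}\, z^{k-1}
 = \Bigl(I - \frac{\alpha AA^T}{\|A\|_F^2}\Bigr) z^{k-1},
\qquad\text{so}\qquad
\mathbb{E}[z^{k-1}] = \Bigl(I - \frac{\alpha AA^T}{\|A\|_F^2}\Bigr)^{k-1} z^0 .
\]
% Commuting A^T past the iteration matrix:
\[
A^T\Bigl(I - \frac{\alpha AA^T}{\|A\|_F^2}\Bigr)^{k-1}
 = \Bigl(I - \frac{\alpha A^TA}{\|A\|_F^2}\Bigr)^{k-1} A^T .
\]
```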
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T), \quad A^Tz^0 \in \mathrm{range}(A^T),
\]
and Lemma 3.
4.2 Convergence of $\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]$
The convergence of
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
\]
depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[
b_\perp = (I - AA^\dagger)b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $z^k$ be the vector generated in REBK with
\[
z^0 \in b + \mathrm{range}(A).
\]
It holds
\[
\mathbb{E}\bigl[\|z^k - b_\perp\|_2^2\bigr] \le \rho^k\,\|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp = 0$, we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp).
\tag{1}
\]
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
\[
z^k - b_\perp \in \mathrm{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\,\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking the conditional expectation conditioned on the first $k - 1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]\bigr] \\
&\le \rho\,\mathbb{E}\bigl[\|z^{k-1} - b_\perp\|_2^2\bigr] \\
&\le \rho^k\,\|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
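The two facts about $b_\perp$ that the proof relies on can be sanity-checked numerically. This is a small sketch on hypothetical random data; the variable names, sizes, and seed are illustrative, and `np.linalg.pinv` stands in for $A^\dagger$.

```python
import numpy as np

# Hypothetical example data: a random 7-by-3 matrix and right-hand side.
rng = np.random.default_rng(1)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)
A_pinv = np.linalg.pinv(A)

b_perp = b - A @ (A_pinv @ b)            # b_perp = (I - A A^+) b
z0 = b + A @ rng.standard_normal(3)      # some z^0 in b + range(A)

# Fact 1: A^T b_perp = 0, hence (A_{:,J})^T b_perp = 0 for every column block J.
ortho_residual = np.linalg.norm(A.T @ b_perp)

# Fact 2: z^0 - b_perp = A A^+ z^0, which lies in range(A).
gap = np.linalg.norm((z0 - b_perp) - A @ (A_pinv @ z0))

print(ortho_residual, gap)
```

Both printed quantities should be at the level of rounding error.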
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
\]
For any $\varepsilon > 0$, it holds
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
Proof. Let
\[
\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43
It follows from
xk minus 983141xk =α
983042AIi9830422F(AIi)
T(bIi minusAIiAdaggerbminus zkIi)
that
983042xk minus 983141xk98304222
=α2
983042AIi9830424F(bIi minusAIiA
daggerbminus zkIi)TAIi(AIi)
T(bIi minusAIiAdaggerbminus zkIi
)
leα2983042bIi minusAIiA
daggerbminus zkIi98304222
983042AIi9830422F (by Lemma 1) (2)
It follows from
Ekminus1
983045983042xk minus 983141xk98304222
983046= Ekminus1
983045Eikminus1
983045983042xk minus 983141xk98304222
983046983046
le Ekminus1
983063α2983042bminusAAdaggerbminus zk98304222
983042A9830422F
983064(by (2))
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43
that
E983045983042xk minus 983141xk98304222
983046le α2
983042A9830422FE983045983042bminusAAdaggerbminus zk98304222
983046
le α2ρk
983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)
By x0 isin range(AT) and Adaggerb isin range(AT) we have
x0 minusAdaggerb isin range(AT)
Then we can show that xk minusAdaggerb isin range(AT) by induction By
983042983141xk minusAdaggerb98304222
= 983042xkminus1 minusAdaggerb98304222 minus2α983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
+α2
983042AIi9830424F(xkminus1 minusAdaggerb)T(AIi)
TAIi(AIi)TAIi(x
kminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 40 43
we have
Ekminus1
983045983042983141xk minusAdaggerb98304222
983046
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042A(xkminus1 minusAdaggerb)98304222
983042A9830422Fle ρ983042xkminus1 minusAdaggerb98304222 (by Lemma 3)
which yields
E983045983042983141xk minusAdaggerb98304222
983046le ρE
983045983042xkminus1 minusAdaggerb98304222
983046 (4)
Note that for any ε gt 0 we have
983042xk minusAdaggerb98304222= 983042xk minus 983141xk + 983141xk minusAdaggerb98304222le (983042xk minus 983141xk9830422 + 983042983141xk minusAdaggerb9830422)2
le 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222 + 2983042xk minus 983141xk9830422983042983141xk minusAdaggerb9830422
le9830611 +
1
ε
983062983042xk minus 983141xk98304222 + (1 + ε)983042983141xk minusAdaggerb98304222 (5)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 41 43
Combining (3) (4) and (5) yields
E983045983042xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062E983045983042xk minus 983141xk98304222
983046+ (1 + ε)E
983045983042983141xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062α2ρk
983042A9830422F983042z0 minus bperp98304222 + (1 + ε)ρE
983045983042xkminus1 minusAdaggerb98304222
983046
le9830611 +
1
ε
983062(1 + 1 + ε)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)2ρ2E983045983042xkminus2 minusAdaggerb98304222
983046
le middot middot middot
le9830611 +
1
ε
983062(1 + 1 + ε+ middot middot middot+ (1 + ε)kminus1)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)kρk983042x0 minusAdaggerb98304222
le (1 + ε)kρk983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 42 43
Remark 4
For the case REBK with s = m t = n and α = 1 (ie REK) by theorthogonality
(983141xk minusAdaggerb)T(xk minus 983141xk) = 0
the equation (5) becomes
983042xk minusAdaggerb98304222 = 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222
which yields the following convergence for REK
E983045983042xk minusAdaggerb98304222
983046le ρk
983061k983042z0 minus bperp98304222
983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Kui Du Wutao Si and Xiaohui Sun
Pseudoinverse-free randomized extended block kaczmarz for solving least squaresarXiv preprint arXiv200104179 2020
Kui Du and Xiaohui Sun
A doubly stochastic block Gauss-Seidel algorithm for solving linear equationsarXiv preprint arXiv191213291 2019
Lawrence H Landweber
An iteration formula for Fredholm integral equations of the first kindAmer J Math 73615ndash624 1951
D Leventhal and A S Lewis
Randomized methods for linear constraints convergence rates and conditioningMath Oper Res 35(3)641ndash654 2010
Anna Ma Deanna Needell and Aaditya Ramdas
Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015
Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo
A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019
Thomas Strohmer and Roman Vershynin
A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009
Anastasios Zouzias and Nikolaos M Freris
Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Proof of Theorem 4
Note that the conditioned expectation on xkminus1
E[xk |xkminus1]
= xkminus1 minus αE983063IJ (AIJ )T(II)T
983042AIJ 9830422F
983064(Axkminus1 minus b)
= xkminus1 minus α
983091
983107983131
(IJ )isinP
IJ (AIJ )T(II)T
983042AIJ 9830422F983042AIJ 9830422F983042A9830422F
983092
983108 (Axkminus1 minus b)
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)
Then the conditioned expectation E[xk minus x0983183 |xkminus1] is given by
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 11 43
E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minus b)minus x0
983183
= xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx0
983183)minus x0983183
=
983061Iminus αATA
983042A9830422F
983062(xkminus1 minus x0
983183)
Taking expectation gives
E[xk minus x0983183] = E[E[xk minus x0
983183 |xkminus1]] =
983061Iminus αATA
983042A9830422F
983062E[xkminus1 minus x0
983183]
=
983061Iminus αATA
983042A9830422F
983062k
(x0 minus x0983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43
Applying the norms to both sides we obtain
983042E[xk minus x0983183]9830422 le
983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042x0 minus x09831839830422
Here the inequality follows from the fact that
x0 minus x0983183 = AdaggerAx
0 minusAdaggerb isin range(AT)
and Lemma 3
Remark 1
If x0 isin range(AT) then x0983183 = Adaggerb
To ensure convergence of the expected iterate it suffices to have
max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43
Theorem 5
Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
where x983183 is any solution of
ATAx = ATb
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43
Proof of Theorem 5
Note that the conditioned expectation on xkminus1
E[Axk minusAx983183 |xkminus1]
= A(E[xk |xkminus1]minus x983183)
= A
983061xkminus1 minus α
AT
983042A9830422F(Axkminus1 minus b)minus x983183
983062
= A
983061xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx983183)minus x983183
983062
(by ATb = ATAx983183)
= Axkminus1 minusAx983183 minusαAAT
983042A9830422F(Axkminus1 minusAx983183)
=
983061Iminus αAAT
983042A9830422F
983062(Axkminus1 minusAx983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43
Taking expectation gives
E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]
=
983061Iminus αAAT
983042A9830422F
983062E[Axkminus1 minusAx983183]
=
983061Iminus αAAT
983042A9830422F
983062k
(Ax0 minusAx983183)
Applying the norms to both sides we obtain
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
Here the inequality follows from the fact that
Ax0 minusAx983183 isin range(A)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43
32 Convergence of the expected norms
Theorem 6
Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system
Ax = b
with arbitrary x0 isin Rn Assume
0 lt α lt 2t
In exact arithmetic it holds
E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show by induction that
\[
x^k - x^0_\star \in \mathrm{range}(A^T),
\]
where
\[
x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$. Then, for rank-deficient consistent linear systems, the same approach yields the convergence bound
\[
\mathbb{E}\big[\|x^k - x^0_\star\|_2^2\big] \le \left(1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x^0_\star\|_2^2.
\]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
\]
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&= \left\| Ax^{k-1} - \alpha\,\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) - b \right\|_2^2 \\
&= \|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) \\
&\quad + \alpha^2 (Ax^{k-1}-b)^T \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A\, I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1}-b).
\end{aligned}
\]
If $t = n$, then it follows from
\[
(I_{:,\mathcal{J}})^T A^T A\, I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
\]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&= \|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) \\
&\quad + \alpha^2 (Ax^{k-1}-b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1}-b) \\
&\le \|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) \\
&\quad + \alpha^2 (Ax^{k-1}-b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2 \,\big|\, x^{k-1}\big]
&\le (1+\alpha^2)\,\|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T \frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1}-b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1}-b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k-b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1}-b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0-b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[
(I_{:,\mathcal{J}})^T A^T A\, I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I
\]
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&\le \|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) \\
&\quad + \alpha^2 (Ax^{k-1}-b)^T \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1}-b) \\
&\le \|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) \\
&\quad + \alpha^2 (Ax^{k-1}-b)^T \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1}-b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2 \,\big|\, x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T \frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1}-b) \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1}-b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k-b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1}-b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0-b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[
(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,
\]
where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
\]
4 Randomized extended block Kaczmarz (REBK) [1]
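Algorithm 2 Randomized extended block Kaczmarz (REBK)
  Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
  Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
  for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:}\, x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].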
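Algorithm 2 translates almost line by line into NumPy. The following is a minimal sketch rather than a reference implementation: the function name `rebk`, the fixed iteration count, and the seeded generator are illustrative choices, and the starting points $z^0 = b$ and $x^0 = 0$ are just one admissible pair satisfying $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$. It assumes every row and column block of $A$ is nonzero.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=10000, seed=0):
    """Sketch of randomized extended block Kaczmarz (Algorithm 2)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Squared Frobenius norms of the column blocks and row blocks.
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_p = col_w / col_w.sum()  # equals ||A_J||_F^2 / ||A||_F^2
    row_p = row_w / row_w.sum()  # equals ||A_I||_F^2 / ||A||_F^2
    z = np.array(b, dtype=float)  # z^0 = b lies in b + range(A)
    x = np.zeros(n)               # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        j = rng.choice(len(col_blocks), p=col_p)
        A_J = A[:, col_blocks[j]]
        z = z - (alpha / col_w[j]) * (A_J @ (A_J.T @ z))
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        A_I = A[I, :]
        x = x - (alpha / row_w[i]) * (A_I.T @ (A_I @ x - b[I] + z[I]))
    return x, z
```

On a small inconsistent system, the returned `x` can be compared with `np.linalg.lstsq(A, b, rcond=None)[0]`, and `z` with the least-squares residual `b - A @ x_ls`.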
Let $\mathbb{E}_{k-1}[\,\cdot\,]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\,\cdot\,] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}^i_{k-1}[\,\cdot\,] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]
Then, by the law of total expectation, we have
\[
\mathbb{E}_{k-1}[\,\cdot\,] = \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\,\cdot\,]\big].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
\]
It holds that
\[
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k \left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^T z^0\|_2}{\|A\|_F^2}\right),
\]
where
\[
\delta = \max_{1 \le i \le r} \left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|.
\]
Proof. By $A^Tb = A^TAA^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:}\,x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\!\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\,\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^Tz^0 \in \mathrm{range}(A^T),
\]
and Lemma 3.
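The one-step identity for the conditional expectation of the REBK iterate can be verified by brute force: enumerate every (column block, row block) pair, weight each outcome by its probability, and compare with the closed form from the proof. The sketch below is illustrative only; both function names are hypothetical, and all blocks are assumed to be nonzero.

```python
import numpy as np

def rebk_expected_iterate(A, b, x, z, row_blocks, col_blocks, alpha):
    """E_{k-1}[x^k] by enumerating all (column block, row block) pairs."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    expected = np.zeros_like(x, dtype=float)
    for J in col_blocks:
        A_J = A[:, J]
        w_j = np.linalg.norm(A_J, "fro") ** 2
        z_new = z - (alpha / w_j) * (A_J @ (A_J.T @ z))
        for I in row_blocks:
            A_I = A[I, :]
            w_i = np.linalg.norm(A_I, "fro") ** 2
            x_new = x - (alpha / w_i) * (A_I.T @ (A_I @ x - b[I] + z_new[I]))
            # The pair (j, i) occurs with probability (w_j / fro2) * (w_i / fro2).
            expected += (w_j / fro2) * (w_i / fro2) * x_new
    return expected

def rebk_expected_iterate_closed_form(A, b, x, z, alpha):
    """Closed form: x* + M (x - x*) - alpha * M A^T z / ||A||_F^2."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    x_star = np.linalg.pinv(A) @ b
    M = np.eye(A.shape[1]) - alpha * (A.T @ A) / fro2
    return x_star + M @ (x - x_star) - alpha * (M @ (A.T @ z)) / fro2
```

The two functions should agree to rounding error for any `x` and `z`, since the identity itself uses only $A^Tb = A^TAA^\dagger b$.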
4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$
The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.
\]
Note that this $\rho$ is not the quantity $\rho$ used in Theorem 7. In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[
b_\perp = (I - AA^\dagger)b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $z^k$ be the vector generated in REBK with
\[
z^0 \in b + \mathrm{range}(A).
\]
It holds that
\[
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k\,\|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\,A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}
\]
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
\[
z^k - b_\perp \in \mathrm{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\,\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}\,(z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking the expectation conditioned on the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\,\mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k\,\|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
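The contraction of the auxiliary vector toward $b_\perp$ can also be checked without sampling, by averaging the squared distance over all column blocks. This is a hedged sketch: the function names are hypothetical, the blocks are assumed nonzero, and $\sigma_r(A)$ is taken as the smallest singular value, which presumes $A$ has full column rank in the example.

```python
import numpy as np

def z_step_expected_sq_error(A, b, z, col_blocks, alpha):
    """Exact E_{k-1}[||z^k - b_perp||^2] for the z-update of REBK."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    b_perp = b - A @ (np.linalg.pinv(A) @ b)  # (I - A A^+) b
    expected = 0.0
    for J in col_blocks:
        A_J = A[:, J]
        w_j = np.linalg.norm(A_J, "fro") ** 2
        z_new = z - (alpha / w_j) * (A_J @ (A_J.T @ z))
        expected += (w_j / fro2) * np.sum((z_new - b_perp) ** 2)
    return expected

def lemma9_rho(A, alpha):
    """rho = 1 - (2*alpha - alpha^2) * sigma_r(A)^2 / ||A||_F^2 (full column rank assumed)."""
    sigma_r = np.linalg.svd(A, compute_uv=False)[-1]
    return 1.0 - (2 * alpha - alpha**2) * sigma_r**2 / np.linalg.norm(A, "fro") ** 2
```

For any `z` of the form `b + A @ w`, the expected squared distance should be at most `lemma9_rho(A, alpha)` times the current squared distance to `b_perp`.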
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
\]
For any $\varepsilon > 0$, it holds that
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k \rho^k \left(\frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
Proof. Let
\[
\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}\,(x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
\[
x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\big)
\]
that
\[
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,\big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\big)^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\big) \\
&\le \frac{\alpha^2\,\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. 
\end{aligned} \tag{2}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\!\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned} \tag{3}
\]
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T).
\]
Then we can show by induction that $x^k - A^\dagger b \in \mathrm{range}(A^T)$. By
\[
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,(x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}\,(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)},
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[
\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}
\]
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \widehat{x}^k\|_2\,\|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon)\,\|\widehat{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
\]
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] + (1+\varepsilon)\,\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\,\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\,\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
\]
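Two elementary facts are used silently in this proof: the weighted arithmetic-geometric mean inequality behind (5), and a geometric-series bound in the final step. A short worked version, with $u$ and $v$ standing for the two nonnegative norms in (5) (these letters are introduced here only for illustration):

```latex
% Step (5): for u, v >= 0 and any eps > 0,
\[
0 \le \left(\frac{u}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,v\right)^2
\quad\Longrightarrow\quad
2uv \le \frac{1}{\varepsilon}\,u^2 + \varepsilon\,v^2,
\]
% hence u^2 + v^2 + 2uv <= (1 + 1/eps) u^2 + (1 + eps) v^2.

% Final step: sum the geometric series and drop the negative term,
\[
\left(1+\frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l
= \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k-1}{\varepsilon}
\le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
= (1+\varepsilon)^k\,\frac{1+\varepsilon}{\varepsilon^2}.
\]
```

The second display is what turns the sum multiplying $\alpha^2\rho^k\|z^0 - b_\perp\|_2^2/\|A\|_F^2$ into the factor $(1+\varepsilon)^k(1+\varepsilon)/\varepsilon^2$ in the stated bound.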
Remark 4
For REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0,
\]
the inequality (5) becomes the equality
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence bound for REK:
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
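The orthogonality used in this remark is easy to see numerically for a single-row step with $\alpha = 1$: the auxiliary iterate from the proof of Theorem 10 (called `x_hat` below) leaves an error orthogonal to the chosen row, while the difference between the REK iterate and `x_hat` is a multiple of that row. The sketch is illustrative; `rek_split` is a hypothetical helper, and the chosen row is assumed nonzero.

```python
import numpy as np

def rek_split(A, b, x_prev, z, i):
    """Return (x_new - x_hat, x_hat - pinv(A) b) for one REK row step (alpha = 1)."""
    a = A[i]                         # the selected row, as a 1-D array
    x_star = np.linalg.pinv(A) @ b
    # REK update with the current z: project using the corrected entry b_i - z_i.
    x_new = x_prev - (a @ x_prev - b[i] + z[i]) / (a @ a) * a
    # Auxiliary iterate: the same projection for the system A x = A A^+ b.
    x_hat = x_prev - (a @ (x_prev - x_star)) / (a @ a) * a
    return x_new - x_hat, x_hat - x_star
```

The inner product of the two returned vectors should vanish up to rounding, so their squared norms add up to the squared error of the new iterate.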
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
\[
\begin{aligned}
\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x^0_\star \\
&= x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}\,(Ax^{k-1} - b) - x^0_\star \\
&= x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}\,(Ax^{k-1} - Ax^0_\star) - x^0_\star \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - x^0_\star).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - x^0_\star]
&= \mathbb{E}\big[\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]\big]
= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - x^0_\star] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - x^0_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\big\|\mathbb{E}[x^k - x^0_\star]\big\|_2 \le \left(\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|x^0 - x^0_\star\|_2.
\]
Here the inequality follows from the fact that
\[
x^0 - x^0_\star = A^\dagger A x^0 - A^\dagger b \in \mathrm{range}(A^T)
\]
and Lemma 3.
Remark 1
If $x^0 \in \mathrm{range}(A^T)$, then $x^0_\star = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
\[
\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2\,\|A\|_F^2}{\sigma_1^2(A)}.
\]
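The step-size condition in Remark 1 can be made concrete by computing the contraction factor directly from the singular values. The helper names below are hypothetical, and the relative tolerance used to decide which singular values count as nonzero is an assumption of this sketch.

```python
import numpy as np

def contraction_factor(A, alpha, rtol=1e-12):
    """max_i |1 - alpha * sigma_i(A)^2 / ||A||_F^2| over the nonzero singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > rtol * s[0]]           # keep sigma_1, ..., sigma_r
    fro2 = np.sum(s**2)              # ||A||_F^2 equals the sum of sigma_i^2
    return np.max(np.abs(1.0 - alpha * s**2 / fro2))

def step_size_upper_bound(A):
    """Largest admissible alpha (exclusive): 2 * ||A||_F^2 / sigma_1(A)^2."""
    s = np.linalg.svd(A, compute_uv=False)
    return 2.0 * np.sum(s**2) / s[0] ** 2
```

For example, a matrix with singular values 2 and 1 has $\|A\|_F^2 = 5$, so the admissible range is $0 < \alpha < 2.5$, and $\alpha = 1$ gives the factor $\max(|1 - 4/5|, |1 - 1/5|) = 0.8$.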
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds that
\[
\big\|\mathbb{E}[Ax^k - Ax_\star]\big\|_2 \le \left(\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2,
\]
where $x_\star$ is any solution of
\[
A^TAx = A^Tb.
\]
Proof of Theorem 5
Note that the expectation conditioned on $x^{k-1}$ satisfies
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\big(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\big) \\
&= A\left(x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}\,(Ax^{k-1} - b) - x_\star\right) \\
&= A\left(x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}\,(Ax^{k-1} - Ax_\star) - x_\star\right) \quad \text{(by } A^Tb = A^TAx_\star\text{)} \\
&= Ax^{k-1} - Ax_\star - \alpha\,\frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1} - Ax_\star) \\
&= \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\big[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\big] \\
&= \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)^k (Ax^0 - Ax_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\big\|\mathbb{E}[Ax^k - Ax_\star]\big\|_2 \le \left(\max_{1 \le i \le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2.
\]
Here the inequality follows from the fact that
\[
Ax^0 - Ax_\star \in \mathrm{range}(A)
\]
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. Assume
\[
0 < \alpha < 2/t.
\]
In exact arithmetic, it holds that
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If t = 1 and x0 isin range(AT) we can show
xk minus x0983183 isin range(AT)
by induction where
x0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the set
x isin Rn | Ax = b
Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound
E[983042xk minus x098318398304222] le
9830611minus (2αminus α2)σ2
r (A)
983042A9830422F
983062k
983042x0 minus x098318398304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43
Theorem 7
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)
Ax = b
with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then
E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minus b98304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43
Proof of Theorem 7
Note that
983042Axk minus b98304222
=
983056983056983056983056Axkminus1 minus α
983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minus b
9830569830569830569830562
2
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
If t = n then it follows from
(IJ )TATAIJ = 983042AJ 9830422F
(since AIJ = AJ is a column vector) that
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43
983042Axk minus b98304222
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611 + α2 minus 2ασ2
r(A)
983042A9830422F
983062983042Axkminus1 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43
If t lt n then it follows from
(IJ )TATAIJ = AT
JAJ ≼ ρI
(since ρ = max1lejlet σ21(AJj )) that
983042Axk minus b98304222
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρII(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do
Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α
983042AJj9830422FAJj (AJj )
Tzkminus1
Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43
Let Ekminus1
983045middot983046denote the conditional expectation conditioned on the first
k minus 1 iterations of REBK That is
Ekminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1
983046
where jl is the lth column block chosen and il is the lth row blockchosen
We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as
Eikminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk
983046
Then by the law of total expectation we have
Ekminus1
983045middot983046= Ekminus1
983045Eikminus1
983045middot983046983046
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43
Proof By ATb = ATAAdaggerb and
xk minusAdaggerb = xkminus1 minusAdaggerbminus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
we have
Ekminus1
983045xk minusAdaggerb
983046= Ekminus1
983045Eikminus1
983045xk minusAdaggerb
983046983046
= xkminus1 minusAdaggerbminus Ekminus1
983063αAT(Axkminus1 minus b+ zk)
983042A9830422F
983064
= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb
983042A9830422Fminus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422F
983061Iminus α
AAT
983042A9830422F
983062zkminus1
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422Fzkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43
Taking expectation gives
E983045xk minusAdaggerb
983046
= E983045Ekminus1
983045xk minusAdaggerb
983046983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422FE983045zkminus1
983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
=
983061Iminus α
ATA
983042A9830422F
9830622
E983045xkminus2 minusAdaggerb
983046minus 2α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F= middot middot middot
=
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
42 Convergence of E983045983042xk minusAdaggerb98304222
983046
The convergence ofE983045983042xk minusAdaggerb98304222
983046
depends on the positive number ρ defined as
ρ = 1minus (2αminus α2)σ2r (A)
983042A9830422F
In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to
bperp = (IminusAAdagger)b
which is the orthogonal projection of z0 onto the set z | ATz = 0
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43
Lemma 9
For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \operatorname{range}(A)$. It holds
\[ \mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2. \]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$ we have
\[ z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1} \]
By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$ we can show by induction that $z^k - b_\perp \in \operatorname{range}(A)$. It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking the expectation conditioned on the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\bigr] \\
&\le \rho\, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
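The per-step inequality used in the proof of Lemma 9 holds for every column block, not only in expectation, so it can be checked directly. The sketch below does this for a small dense example; the helper `z_step`, the matrix, the blocks, and the step size are illustrative assumptions.

```python
import numpy as np

def z_step(A, z, J, alpha):
    """One REBK z-update using the column block J."""
    A_J = A[:, J]
    return z - (alpha / np.linalg.norm(A_J, "fro") ** 2) * (A_J @ (A_J.T @ z))

# Hypothetical 5 x 4 inconsistent system and a partition of the columns.
A = np.array([[2.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0, 2.0],
              [0.0, 2.0, 1.0, 1.0]])
b = np.array([1.0, -1.0, 2.0, 0.0, 3.0])
alpha = 1.5
col_blocks = [[0, 1], [2, 3]]

# b_perp is the least-squares residual b - A A^+ b.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
b_perp = b - A @ x_ls

z = b.copy()                      # z0 = b lies in b + range(A)
gaps = []                         # (right-hand side) - (left-hand side), step by step
for J in col_blocks * 3:          # sweep the blocks a few times
    A_J = A[:, J]
    e_old = z - b_perp
    z = z_step(A, z, J, alpha)
    e_new = z - b_perp
    rhs = e_old @ e_old - (2 * alpha - alpha**2) * np.linalg.norm(A_J.T @ e_old) ** 2 \
        / np.linalg.norm(A_J, "fro") ** 2
    gaps.append(rhs - e_new @ e_new)
```

Every entry of `gaps` should be nonnegative up to rounding, which is the inequality obtained from Lemma 1 above.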
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \operatorname{range}(A) \quad \text{and} \quad x^0 \in \operatorname{range}(A^T). \]
For any $\varepsilon > 0$, it holds
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]
Proof. Let
\[ \hat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b), \]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
\[ x^k - \hat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}) \]
that
\[
\begin{aligned}
\|x^k - \hat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. \qquad (2)
\end{aligned}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \hat{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\|x^k - \hat{x}^k\|_2^2]\bigr] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad \text{(by (2))}
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}[\|x^k - \hat{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\, \mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \qquad (3)
\end{aligned}
\]
By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$ we have $x^0 - A^\dagger b \in \operatorname{range}(A^T)$. Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)},
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|\hat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[ \mathbb{E}[\|\hat{x}^k - A^\dagger b\|_2^2] \le \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \qquad (4) \]
Note that for any $\varepsilon > 0$ we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2 \\
&\le \bigl(\|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2\bigr)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \hat{x}^k\|_2 \|\hat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \hat{x}^k\|_2^2 + (1+\varepsilon) \|\hat{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}
\]
Combining (3), (4) and (5) yields
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}[\|x^k - \hat{x}^k\|_2^2] + (1+\varepsilon)\, \mathbb{E}[\|\hat{x}^k - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \bigl(1 + (1+\varepsilon)\bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \bigl(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
Remark 4
For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
\[ (\hat{x}^k - A^\dagger b)^T (x^k - \hat{x}^k) = 0, \]
the inequality (5) becomes
\[ \|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2, \]
which yields the following convergence for REK:
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]
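To get a feel for the two bounds, the sketch below evaluates the right-hand sides of Theorem 10 and of Remark 4 as plain functions. The function names and the sample numbers are illustrative assumptions. Note that the Theorem 10 bound only decays when $(1+\varepsilon)\rho < 1$, so $\varepsilon$ has to be chosen below $1/\rho - 1$.

```python
def rebk_bound(k, eps, alpha, rho, fro2, z_gap2, x_gap2):
    """Right-hand side of the Theorem 10 bound.

    fro2   = ||A||_F^2
    z_gap2 = ||z^0 - b_perp||_2^2
    x_gap2 = ||x^0 - A^+ b||_2^2
    """
    return ((1 + eps) * rho) ** k * (
        (1 + eps) * alpha**2 * z_gap2 / (eps**2 * fro2) + x_gap2
    )

def rek_bound(k, rho, fro2, z_gap2, x_gap2):
    """Right-hand side of the Remark 4 bound (REK, alpha = 1)."""
    return rho**k * (k * z_gap2 / fro2 + x_gap2)

# Hypothetical problem constants, chosen only for illustration.
rho, fro2, z_gap2, x_gap2 = 0.9, 10.0, 4.0, 1.0
eps_grid = [0.01, 0.02, 0.05, 0.1]          # all satisfy (1 + eps) * rho < 1
k = 200

# For a fixed k, pick the eps on the grid that gives the smallest bound.
best_eps = min(eps_grid, key=lambda e: rebk_bound(k, e, 1.0, rho, fro2, z_gap2, x_gap2))
general = rebk_bound(k, best_eps, 1.0, rho, fro2, z_gap2, x_gap2)
special = rek_bound(k, rho, fro2, z_gap2, x_gap2)
```

With $\alpha = 1$ the REK bound is never larger than the general one, since $k \le (1+\varepsilon)^{k+1}/\varepsilon^2$ and $\rho^k \le ((1+\varepsilon)\rho)^k$.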
References
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Applying the norms to both sides, we obtain
\[ \|\mathbb{E}[x^k - x^0_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x^0_\star\|_2. \]
Here the inequality follows from the fact that
\[ x^0 - x^0_\star = A^\dagger A x^0 - A^\dagger b \in \operatorname{range}(A^T) \]
and Lemma 3.
Remark 1
If $x^0 \in \operatorname{range}(A^T)$, then $x^0_\star = A^\dagger b$.
To ensure convergence of the expected iterate, it suffices to have
\[ \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}. \]
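The step-size condition in Remark 1 can be illustrated numerically. The following sketch computes the contraction factor $\max_i |1 - \alpha\sigma_i^2(A)/\|A\|_F^2|$ and compares it against the threshold $2\|A\|_F^2/\sigma_1^2(A)$. The function name `delta`, the tolerance `rtol`, and the test matrix are assumptions made for the example.

```python
import numpy as np

def delta(A, alpha, rtol=1e-10):
    """max_i |1 - alpha * sigma_i(A)^2 / ||A||_F^2| over the nonzero singular values."""
    sigma = np.linalg.svd(A, compute_uv=False)
    sigma = sigma[sigma > rtol * sigma[0]]       # sigma_1 >= ... >= sigma_r > 0
    fro2 = np.linalg.norm(A, "fro") ** 2
    return float(np.max(np.abs(1.0 - alpha * sigma**2 / fro2)))

# Hypothetical 3 x 2 matrix.
A = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])
sigma_1 = np.linalg.svd(A, compute_uv=False)[0]
alpha_max = 2.0 * np.linalg.norm(A, "fro") ** 2 / sigma_1**2

inside = delta(A, 0.5 * alpha_max)    # step size inside the admissible interval
outside = delta(A, 1.01 * alpha_max)  # step size slightly too large
```

Since $\sigma_1^2(A) \le \|A\|_F^2$, the threshold is at least 2, so every $\alpha \in (0, 2)$ is admissible.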
Theorem 5
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds
\[ \|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|Ax^0 - Ax_\star\|_2, \]
where $x_\star$ is any solution of $A^TAx = A^Tb$.
Proof of Theorem 5.
Note that the expectation conditioned on $x^{k-1}$ satisfies
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\bigl(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\bigr) \\
&= A\left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star \right) \\
&= A\left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star \right) \quad \text{(by } A^Tb = A^TAx_\star\text{)} \\
&= Ax^{k-1} - Ax_\star - \alpha \frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) \\
&= \left( I - \alpha \frac{AA^T}{\|A\|_F^2} \right)(Ax^{k-1} - Ax_\star).
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\bigl[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\bigr] \\
&= \left( I - \alpha \frac{AA^T}{\|A\|_F^2} \right) \mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \left( I - \alpha \frac{AA^T}{\|A\|_F^2} \right)^k (Ax^0 - Ax_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[ \|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|Ax^0 - Ax_\star\|_2. \]
Here the inequality follows from the fact that $Ax^0 - Ax_\star \in \operatorname{range}(A)$ and Lemma 3.
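The identity $\mathbb{E}[x^k \mid x^{k-1}] = x^{k-1} - \alpha A^T(Ax^{k-1} - b)/\|A\|_F^2$ used at the start of the proof can be verified by averaging the DSBGS update over all blocks. The sketch below does so under two assumptions that are not restated on these slides: the update is the one appearing in the proof of Theorem 6, and block $(\mathcal{I}, \mathcal{J})$ is picked with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2/\|A\|_F^2$. The function name and example data are hypothetical.

```python
import numpy as np

def dsbgs_expected_step(A, b, x, row_blocks, col_blocks, alpha):
    """Average the DSBGS update over all blocks (I, J) of the partition.

    Assumption: block (I, J) is picked with probability ||A_IJ||_F^2 / ||A||_F^2.
    """
    fro2 = np.linalg.norm(A, "fro") ** 2
    r = A @ x - b                                  # current residual
    expected = np.zeros(A.shape[1])
    for I in row_blocks:
        for J in col_blocks:
            A_IJ = A[np.ix_(I, J)]
            w = np.linalg.norm(A_IJ, "fro") ** 2
            if w == 0.0:
                continue                           # zero block: probability 0
            x_new = np.array(x, dtype=float)       # copy of x^{k-1}
            x_new[J] -= (alpha / w) * (A_IJ.T @ r[I])
            expected += (w / fro2) * x_new
    return expected

# Hypothetical 4 x 3 system with two row blocks and two column blocks.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [3.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
b = np.array([1.0, 0.0, 2.0, -1.0])
x = np.array([0.5, -1.0, 2.0])
alpha = 0.8

expected = dsbgs_expected_step(A, b, x, [[0, 1], [2, 3]], [[0], [1, 2]], alpha)
landweber = x - alpha * A.T @ (A @ x - b) / np.linalg.norm(A, "fro") ** 2
```

The two vectors agree, which is why the expected iterate behaves like a Landweber iteration with step size $\alpha/\|A\|_F^2$.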
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. Assume
\[ 0 < \alpha < 2/t. \]
In exact arithmetic, it holds
\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2. \]
Proof of Theorem 6.
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} A(x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \frac{A^T I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \frac{A^T I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (x^{k-1} - A^\dagger b).
\end{aligned}
\]
The second equality uses $b = AA^\dagger b$, which holds because the system is consistent. The last inequality follows from $(I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)(x^{k-1} - A^\dagger b)^T \frac{A^TA}{\|A\|_F^2} (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\bigr] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
\]
Remark 2
If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show
\[ x^k - x^0_\star \in \operatorname{range}(A^T) \]
by induction, where
\[ x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b, \]
i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
\[ \mathbb{E}[\|x^k - x^0_\star\|_2^2] \le \left( 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2} \right)^k \|x^0 - x^0_\star\|_2^2. \]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient) $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[ \mathbb{E}[\|Ax^k - b\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2. \]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[ \mathbb{E}[\|Ax^k - b\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2, \]
where
\[ \rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]
Proof of Theorem 7.
Note that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^TA I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (Ax^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[ (I_{:,\mathcal{J}})^T A^TA I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2 \]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \frac{AA^T}{\|A\|_F^2} (Ax^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\bigr] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[ (I_{:,\mathcal{J}})^T A^TA I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I \]
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \frac{AA^T}{\|A\|_F^2} (Ax^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\bigr] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[ (A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star, \]
where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS$(1, t)$ has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[ \mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2. \]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[ \mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2, \]
where
\[ \rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]
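4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$.
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T z^{k-1}$.
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$.
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$.
By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].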
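Algorithm 2 translates almost line by line into NumPy. The following is a minimal dense sketch, not the authors' implementation: the function name `rebk`, the fixed iteration count, the seed handling, and the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the two initialization conditions) are assumptions made for illustration.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=1000, seed=0):
    """Sketch of REBK; row_blocks / col_blocks are lists of index lists."""
    rng = np.random.default_rng(seed)
    # Squared Frobenius norms of the row blocks and column blocks.
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_p = row_w / row_w.sum()          # ||A_I||_F^2 / ||A||_F^2
    col_p = col_w / col_w.sum()          # ||A_J||_F^2 / ||A||_F^2
    z = np.array(b, dtype=float)         # z0 = b lies in b + range(A)
    x = np.zeros(A.shape[1])             # x0 = 0 lies in range(A^T)
    for _ in range(iters):
        j = rng.choice(len(col_blocks), p=col_p)
        A_J = A[:, col_blocks[j]]
        z = z - (alpha / col_w[j]) * (A_J @ (A_J.T @ z))
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        A_I = A[I, :]
        x = x - (alpha / row_w[i]) * (A_I.T @ (A_I @ x - b[I] + z[I]))
    return x, z

# Hypothetical well-conditioned 8 x 4 inconsistent system.
A = np.vstack([np.eye(4), np.eye(4)]) + 0.1 * np.arange(32.0).reshape(8, 4) / 32
b = np.array([1.0, -2.0, 3.0, 0.0, 2.0, 1.0, -1.0, 4.0])
row_blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]
col_blocks = [[0, 1], [2, 3]]
x, z = rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=3000, seed=0)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # reference least-squares solution
```

The update needs only matrix-vector products with the selected blocks and no pseudoinverse, which is the point of the pseudoinverse-free formulation. Passing single-index blocks with `alpha=1.0` gives the REK special case mentioned above.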
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}], \]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[ \mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k]. \]
Then by the law of total expectation we have
\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\cdot]\bigr]. \]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \operatorname{range}(A) \quad \text{and} \quad x^0 \in \operatorname{range}(A^T). \]
It holds
\[ \|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^Tz^0\|_2}{\|A\|_F^2} \right), \]
where
\[ \delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right|. \]
Proof. By $A^Tb = A^TAA^\dagger b$ and
\[ x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}), \]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\bigr] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2}\, \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2}\, \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left( I - \alpha \frac{AA^T}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}[x^k - A^\dagger b]\bigr] \\
&= \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2}\, \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)^k \frac{A^Tz^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)^k \frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^TA}{\|A\|_F^2} \right)^k \frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
\]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
42 Convergence of E983045983042xk minusAdaggerb98304222
983046
The convergence ofE983045983042xk minusAdaggerb98304222
983046
depends on the positive number ρ defined as
ρ = 1minus (2αminus α2)σ2r (A)
983042A9830422F
In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to
bperp = (IminusAAdagger)b
which is the orthogonal projection of z0 onto the set z | ATz = 0
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43
Lemma 9
For any given consistent or inconsistent linear system
Ax = b
let zk be the vector generated in REBK with
z0 isin b+ range(A)
It holdsE983045983042zk minus bperp98304222
983046le ρk983042z0 minus bperp98304222
Proof By (AJj )Tbperp = 0 we have
zk minus bperp = zkminus1 minus bperp minus α
983042AJj9830422FAJj (AJj )
T(zkminus1 minus bperp) (1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43
By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that
zk minus bperp isin range(A)
It follows from (1) that
983042zk minus bperp98304222
= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
+α2
983042AJj9830424F(zkminus1 minus bperp)
TAJj (AJj )TAJj (AJj )
T(zkminus1 minus bperp)
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43
Taking the conditioned expectation on the first k minus 1 iterations yields
Ekminus1
983045983042zk minus bperp98304222
983046
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222
983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)
Taking expectation again gives
E983045983042zk minus bperp98304222
983046= E
983045Ekminus1
983045983042zk minus bperp98304222
983046983046
le ρE983045983042zkminus1 minus bperp98304222
983046
le ρk983042z0 minus bperp98304222
This completes the proof
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 37 43
Theorem 10
For any given consistent or inconsistent linear system Ax = b let xk
be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
For any ε gt 0 it holds
E983045983042xk minusAdaggerb98304222
983046le (1 + ε)kρk
983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Proof Let
983141xk = xkminus1 minus α
983042AIi9830422F(AIi)
TAIi(xkminus1 minusAdaggerb)
which is actually one DSBGS update for the linear systemAx = AAdaggerb from xkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43
It follows from
xk minus 983141xk =α
983042AIi9830422F(AIi)
T(bIi minusAIiAdaggerbminus zkIi)
that
983042xk minus 983141xk98304222
=α2
983042AIi9830424F(bIi minusAIiA
daggerbminus zkIi)TAIi(AIi)
T(bIi minusAIiAdaggerbminus zkIi
)
leα2983042bIi minusAIiA
daggerbminus zkIi98304222
983042AIi9830422F (by Lemma 1) (2)
It follows from
Ekminus1
983045983042xk minus 983141xk98304222
983046= Ekminus1
983045Eikminus1
983045983042xk minus 983141xk98304222
983046983046
le Ekminus1
983063α2983042bminusAAdaggerbminus zk98304222
983042A9830422F
983064(by (2))
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43
that
E983045983042xk minus 983141xk98304222
983046le α2
983042A9830422FE983045983042bminusAAdaggerbminus zk98304222
983046
le α2ρk
983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)
By x0 isin range(AT) and Adaggerb isin range(AT) we have
x0 minusAdaggerb isin range(AT)
Then we can show that xk minusAdaggerb isin range(AT) by induction By
983042983141xk minusAdaggerb98304222
= 983042xkminus1 minusAdaggerb98304222 minus2α983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
+α2
983042AIi9830424F(xkminus1 minusAdaggerb)T(AIi)
TAIi(AIi)TAIi(x
kminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 40 43
we have
Ekminus1
983045983042983141xk minusAdaggerb98304222
983046
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042A(xkminus1 minusAdaggerb)98304222
983042A9830422Fle ρ983042xkminus1 minusAdaggerb98304222 (by Lemma 3)
which yields
E983045983042983141xk minusAdaggerb98304222
983046le ρE
983045983042xkminus1 minusAdaggerb98304222
983046 (4)
Note that for any ε gt 0 we have
983042xk minusAdaggerb98304222= 983042xk minus 983141xk + 983141xk minusAdaggerb98304222le (983042xk minus 983141xk9830422 + 983042983141xk minusAdaggerb9830422)2
le 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222 + 2983042xk minus 983141xk9830422983042983141xk minusAdaggerb9830422
le9830611 +
1
ε
983062983042xk minus 983141xk98304222 + (1 + ε)983042983141xk minusAdaggerb98304222 (5)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 41 43
Combining (3) (4) and (5) yields
E983045983042xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062E983045983042xk minus 983141xk98304222
983046+ (1 + ε)E
983045983042983141xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062α2ρk
983042A9830422F983042z0 minus bperp98304222 + (1 + ε)ρE
983045983042xkminus1 minusAdaggerb98304222
983046
le9830611 +
1
ε
983062(1 + 1 + ε)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)2ρ2E983045983042xkminus2 minusAdaggerb98304222
983046
le middot middot middot
le9830611 +
1
ε
983062(1 + 1 + ε+ middot middot middot+ (1 + ε)kminus1)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)kρk983042x0 minusAdaggerb98304222
le (1 + ε)kρk983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 42 43
Remark 4
For the case REBK with s = m t = n and α = 1 (ie REK) by theorthogonality
(983141xk minusAdaggerb)T(xk minus 983141xk) = 0
the equation (5) becomes
983042xk minusAdaggerb98304222 = 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222
which yields the following convergence for REK
E983045983042xk minusAdaggerb98304222
983046le ρk
983061k983042z0 minus bperp98304222
983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Kui Du Wutao Si and Xiaohui Sun
Pseudoinverse-free randomized extended block kaczmarz for solving least squaresarXiv preprint arXiv200104179 2020
Kui Du and Xiaohui Sun
A doubly stochastic block Gauss-Seidel algorithm for solving linear equationsarXiv preprint arXiv191213291 2019
Lawrence H Landweber
An iteration formula for Fredholm integral equations of the first kindAmer J Math 73615ndash624 1951
D Leventhal and A S Lewis
Randomized methods for linear constraints convergence rates and conditioningMath Oper Res 35(3)641ndash654 2010
Anna Ma Deanna Needell and Aaditya Ramdas
Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015
Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo
A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019
Thomas Strohmer and Roman Vershynin
A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009
Anastasios Zouzias and Nikolaos M Freris
Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Theorem 5
Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system
Ax = b
with arbitrary x0 isin Rn In exact arithmetic it holds
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
where x983183 is any solution of
ATAx = ATb
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43
Proof of Theorem 5
Note that the conditioned expectation on xkminus1
E[Axk minusAx983183 |xkminus1]
= A(E[xk |xkminus1]minus x983183)
= A
983061xkminus1 minus α
AT
983042A9830422F(Axkminus1 minus b)minus x983183
983062
= A
983061xkminus1 minus αAT
983042A9830422F(Axkminus1 minusAx983183)minus x983183
983062
(by ATb = ATAx983183)
= Axkminus1 minusAx983183 minusαAAT
983042A9830422F(Axkminus1 minusAx983183)
=
983061Iminus αAAT
983042A9830422F
983062(Axkminus1 minusAx983183)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43
Taking expectation gives
E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]
=
983061Iminus αAAT
983042A9830422F
983062E[Axkminus1 minusAx983183]
=
983061Iminus αAAT
983042A9830422F
983062k
(Ax0 minusAx983183)
Applying the norms to both sides we obtain
983042E[Axk minusAx983183]9830422 le983061max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
983062k
983042Ax0 minusAx9831839830422
Here the inequality follows from the fact that
Ax0 minusAx983183 isin range(A)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43
32 Convergence of the expected norms
Theorem 6
Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system
Ax = b
with arbitrary x0 isin Rn Assume
0 lt α lt 2t
In exact arithmetic it holds
E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Remark 2
If $t = 1$ and $x^0 \in \mathrm{range}(A^\top)$, we can show
$$x^k - x^0_\star \in \mathrm{range}(A^\top)$$
by induction, where
$$x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,$$
i.e., the projection of $x^0$ onto the set
$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$
Then, for rank-deficient consistent linear systems, the same approach gives the convergence bound
$$E[\|x^k - x^0_\star\|_2^2] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x^0_\star\|_2^2.$$
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
$$Ax = b$$
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$E[\|Ax^k - b\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$E[\|Ax^k - b\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$
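As a worked consequence (not stated on the slides, and derived only from the two bounds in Theorem 7), each contraction factor is a quadratic in $\alpha$ and can be minimized directly. Setting the derivative with respect to $\alpha$ to zero gives the midpoint of each admissible step-size interval:
$$
\begin{aligned}
t = n:&\quad \alpha = \frac{\sigma_r^2(A)}{\|A\|_F^2}, \qquad 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} = 1 - \frac{\sigma_r^4(A)}{\|A\|_F^4}, \\
t < n:&\quad \alpha = \frac{\sigma_r^2(A)}{t\rho}, \qquad 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} = 1 - \frac{\sigma_r^4(A)}{t\rho\|A\|_F^2}.
\end{aligned}
$$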
Proof of Theorem 7.
Note that
$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (Ax^{k-1} - b).
\end{aligned}
$$
If $t = n$, then it follows from
$$(I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2$$
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \frac{\|A_{:,\mathcal{J}}\|_F^2 \, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \frac{\|A_{:,\mathcal{J}}\|_F^2 \, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$
Taking the conditional expectation gives
$$
\begin{aligned}
E[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \frac{AA^\top}{\|A\|_F^2} (Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
$$
\begin{aligned}
E[\|Ax^k - b\|_2^2] &= E[E[\|Ax^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right) E[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$
If $t < n$, then it follows from
$$(I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^\top A_{:,\mathcal{J}} \preceq \rho I$$
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
$$
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \frac{\rho \, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \frac{\rho \, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$
Taking the conditional expectation gives
$$
\begin{aligned}
E[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \frac{AA^\top}{\|A\|_F^2} (Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
$$
\begin{aligned}
E[\|Ax^k - b\|_2^2] &= E[E[\|Ax^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) E[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$
Remark 3
Note that
$$(A_{:,\mathcal{J}})^\top b = (A_{:,\mathcal{J}})^\top A x_\star,$$
where $x_\star$ is any solution of $A^\top A x = A^\top b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$E[\|Ax^k - Ax_\star\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$E[\|Ax^k - Ax_\star\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^\top)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$
By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
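The two sampled updates of Algorithm 2 can be sketched in pure Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `rebk`, its arguments, the fixed iteration count, and the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the two initialization conditions) are assumptions made here.

```python
import random

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of randomized extended block Kaczmarz for min ||Ax - b||_2."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the row blocks A_{I_i,:} and column blocks A_{:,J_j}.
    fro2_rows = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    fro2_cols = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    z = list(b)       # z0 = b lies in b + range(A)
    x = [0.0] * n     # x0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha * A_J (A_J^T z) / ||A_J||_F^2.
        jb = rng.choices(range(len(col_blocks)), weights=fro2_cols)[0]
        J = col_blocks[jb]
        w = [sum(A[i][j] * z[i] for i in range(m)) for j in J]
        for i in range(m):
            z[i] -= alpha * sum(A[i][j] * wj for j, wj in zip(J, w)) / fro2_cols[jb]
        # Row step: x <- x - alpha * A_I^T (A_I x - b_I + z_I) / ||A_I||_F^2.
        ib = rng.choices(range(len(row_blocks)), weights=fro2_rows)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha * sum(A[i][j] * ri for i, ri in zip(I, r)) / fro2_rows[ib]
    return x, z

# Inconsistent 3 x 2 example; the least-squares solution is (1/3, 1/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 0.0]
x, z = rebk(A, b, row_blocks=[[0, 1], [2]], col_blocks=[[0], [1]])
```

Passing singleton blocks for every row and column with `alpha=1.0` corresponds to the REK special case mentioned above.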
Let $E_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
$$E_{k-1}[\cdot] = E[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
$$E^i_{k-1}[\cdot] = E[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$
Then, by the law of total expectation, we have
$$E_{k-1}[\cdot] = E_{k-1}[E^i_{k-1}[\cdot]].$$
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^\top).$$
It holds
$$\|E[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right),$$
where
$$\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right|.$$
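A short worked note, derived here from the definition of $\delta$ only (it is not on the slides): the bound of Theorem 8 decays when $\delta < 1$, and since the singular values are ordered $\sigma_1(A) \ge \cdots \ge \sigma_r(A) > 0$,
$$
\delta < 1 \iff 0 < \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} < 2 \ \text{ for all } 1 \le i \le r \iff 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.
$$
In that range the factor $k\delta^k$ in the second term also tends to zero as $k \to \infty$.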
Proof. By $A^\top b = A^\top A A^\dagger b$ and
$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),$$
we have
$$
\begin{aligned}
E_{k-1}[x^k - A^\dagger b] &= E_{k-1}[E^i_{k-1}[x^k - A^\dagger b]] \\
&= x^{k-1} - A^\dagger b - E_{k-1}\left[ \frac{\alpha A^\top (Ax^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^\top A x^{k-1} - A^\top b}{\|A\|_F^2} - \alpha \frac{A^\top}{\|A\|_F^2} E_{k-1}[z^k] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} E_{k-1}[z^k] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} \left(I - \alpha \frac{AA^\top}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) \frac{A^\top}{\|A\|_F^2} z^{k-1}.
\end{aligned}
$$
Taking expectation gives
$$
\begin{aligned}
E[x^k - A^\dagger b] &= E[E_{k-1}[x^k - A^\dagger b]] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) E[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) \frac{A^\top}{\|A\|_F^2} E[z^{k-1}] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) E[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k \frac{A^\top z^0}{\|A\|_F^2} \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^2 E[x^{k-2} - A^\dagger b] - 2\alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k \frac{A^\top z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k \frac{A^\top z^0}{\|A\|_F^2}.
\end{aligned}
$$
The third line uses $E[z^{k-1}] = \left(I - \alpha AA^\top/\|A\|_F^2\right)^{k-1} z^0$ together with $A^\top \left(I - \alpha AA^\top/\|A\|_F^2\right) = \left(I - \alpha A^\top A/\|A\|_F^2\right) A^\top$.
Applying the norms to both sides, we obtain
$$
\begin{aligned}
\|E[x^k - A^\dagger b]\|_2
&= \left\| \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
$$
Here the last inequality follows from the facts that
$$x^0 - A^\dagger b \in \mathrm{range}(A^\top), \qquad A^\top z^0 \in \mathrm{range}(A^\top),$$
and Lemma 3.
4.2 Convergence of $E[\|x^k - A^\dagger b\|_2^2]$
The convergence of $E[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as
$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
$$b_\perp = (I - AA^\dagger)b,$$
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^\top z = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $z^k$ be the vector generated in REBK with
$$z^0 \in b + \mathrm{range}(A).$$
It holds
$$E[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2.$$
Proof. By $(A_{:,\mathcal{J}_j})^\top b_\perp = 0$ we have
$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp). \tag{1}$$
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
$$z^k - b_\perp \in \mathrm{range}(A).$$
It follows from (1) that
$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^\top A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
$$
Taking the conditional expectation given the first $k-1$ iterations yields
$$
\begin{aligned}
E_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^\top (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$
Taking expectation again gives
$$
\begin{aligned}
E[\|z^k - b_\perp\|_2^2] &= E[E_{k-1}[\|z^k - b_\perp\|_2^2]] \\
&\le \rho E[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
$$
This completes the proof.
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^\top).$$
For any $\varepsilon > 0$, it holds
$$E[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
Proof. Let
$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
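It follows from
$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})$$
that
$$
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. \tag{2}
\end{aligned}
$$
It follows from
$$
\begin{aligned}
E_{k-1}[\|x^k - \widehat{x}^k\|_2^2] &= E_{k-1}[E^i_{k-1}[\|x^k - \widehat{x}^k\|_2^2]] \\
&\le E_{k-1}\left[ \frac{\alpha^2 \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad \text{(by (2))}
\end{aligned}
$$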
that
$$
\begin{aligned}
E[\|x^k - \widehat{x}^k\|_2^2] &\le \frac{\alpha^2}{\|A\|_F^2} E[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \tag{3}
\end{aligned}
$$
By $x^0 \in \mathrm{range}(A^\top)$ and $A^\dagger b \in \mathrm{range}(A^\top)$ we have
$$x^0 - A^\dagger b \in \mathrm{range}(A^\top).$$
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^\top)$ by induction. By
$$
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^\top (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
$$
we have
$$
\begin{aligned}
E_{k-1}[\|\widehat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
$$
which yields
$$E[\|\widehat{x}^k - A^\dagger b\|_2^2] \le \rho E[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4}$$
Note that for any $\varepsilon > 0$ we have
$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2 &= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le (\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \widehat{x}^k\|_2^2 + (1 + \varepsilon) \|\widehat{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}
$$
Combining (3), (4) and (5) yields
$$
\begin{aligned}
E[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right) E[\|x^k - \widehat{x}^k\|_2^2] + (1 + \varepsilon) E[\|\widehat{x}^k - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho E[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) (1 + 1 + \varepsilon) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2 E[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \left(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
$$
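A brief worked note on the free parameter $\varepsilon$ (derived here from the statement of Theorem 10; it is not on the slides). The bound decays geometrically only when $(1 + \varepsilon)\rho < 1$, which for $0 < \rho < 1$ means
$$
0 < \varepsilon < \frac{1 - \rho}{\rho}.
$$
For instance, the illustrative choice $\varepsilon = (1 - \rho)/(2\rho)$ gives the contraction factor
$$
(1 + \varepsilon)\rho = \rho + \frac{1 - \rho}{2} = \frac{1 + \rho}{2} < 1.
$$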
Remark 4
For REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\widehat{x}^k - A^\dagger b)^\top (x^k - \widehat{x}^k) = 0,$$
the inequality (5) becomes the equality
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence bound for REK:
$$E[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
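To get a feel for how the REK bound of Remark 4 compares with the general REBK bound of Theorem 10, the two right-hand sides can be evaluated numerically. The helper names and argument names below are hypothetical; the inputs `z_err2`, `x_err2` and `fro2` stand for $\|z^0 - b_\perp\|_2^2$, $\|x^0 - A^\dagger b\|_2^2$ and $\|A\|_F^2$, which the caller is assumed to supply.

```python
def rebk_bound(k, rho, eps, alpha, z_err2, x_err2, fro2):
    """Right-hand side of the Theorem 10 bound for a given eps > 0."""
    return ((1 + eps) * rho) ** k * (
        (1 + eps) * alpha ** 2 * z_err2 / (eps ** 2 * fro2) + x_err2
    )

def rek_bound(k, rho, z_err2, x_err2, fro2):
    """Right-hand side of the Remark 4 bound (s = m, t = n, alpha = 1)."""
    return rho ** k * (k * z_err2 / fro2 + x_err2)

# Illustrative numbers only: rho = 0.75, ||A||_F^2 = 4.
general = rebk_bound(10, 0.75, 0.1, 1.0, 1.0, 2.0, 4.0)
special = rek_bound(10, 0.75, 1.0, 2.0, 4.0)
```

In this example the REK bound is the smaller of the two, because it avoids both the $(1 + \varepsilon)^k$ factor and the $1/\varepsilon^2$ constant.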
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Proof of Theorem 5.
Note that the conditional expectation given $x^{k-1}$ satisfies
$$
\begin{aligned}
E[Ax^k - Ax_\star \mid x^{k-1}]
&= A(E[x^k \mid x^{k-1}] - x_\star) \\
&= A\left( x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (Ax^{k-1} - b) - x_\star \right) \\
&= A\left( x^{k-1} - \frac{\alpha A^\top}{\|A\|_F^2} (Ax^{k-1} - Ax_\star) - x_\star \right) \quad \text{(by } A^\top b = A^\top A x_\star \text{)} \\
&= Ax^{k-1} - Ax_\star - \frac{\alpha AA^\top}{\|A\|_F^2} (Ax^{k-1} - Ax_\star) \\
&= \left(I - \frac{\alpha AA^\top}{\|A\|_F^2}\right) (Ax^{k-1} - Ax_\star).
\end{aligned}
$$
Taking expectation gives
$$
\begin{aligned}
E[Ax^k - Ax_\star] &= E[E[Ax^k - Ax_\star \mid x^{k-1}]] \\
&= \left(I - \frac{\alpha AA^\top}{\|A\|_F^2}\right) E[Ax^{k-1} - Ax_\star] \\
&= \left(I - \frac{\alpha AA^\top}{\|A\|_F^2}\right)^k (Ax^0 - Ax_\star).
\end{aligned}
$$
Applying the norms to both sides, we obtain
$$\|E[Ax^k - Ax_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|Ax^0 - Ax_\star\|_2.$$
Here the inequality follows from the fact that
$$Ax^0 - Ax_\star \in \mathrm{range}(A)$$
and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system
$$Ax = b$$
with arbitrary $x^0 \in \mathbb{R}^n$. Assume
$$0 < \alpha < 2/t.$$
In exact arithmetic, it holds
$$E[\|x^k - A^\dagger b\|_2^2] \le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.$$
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If t = 1 and x0 isin range(AT) we can show
xk minus x0983183 isin range(AT)
by induction where
x0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the set
x isin Rn | Ax = b
Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound
E[983042xk minus x098318398304222] le
9830611minus (2αminus α2)σ2
r (A)
983042A9830422F
983062k
983042x0 minus x098318398304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43
Theorem 7
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)
Ax = b
with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then
E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minus b98304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43
Proof of Theorem 7
Note that
983042Axk minus b98304222
=
983056983056983056983056Axkminus1 minus α
983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minus b
9830569830569830569830562
2
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
If t = n then it follows from
(IJ )TATAIJ = 983042AJ 9830422F
(since AIJ = AJ is a column vector) that
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43
983042Axk minus b98304222
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611 + α2 minus 2ασ2
r(A)
983042A9830422F
983062983042Axkminus1 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43
If t lt n then it follows from
(IJ )TATAIJ = AT
JAJ ≼ ρI
(since ρ = max1lejlet σ21(AJj )) that
983042Axk minus b98304222
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρII(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do
Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α
983042AJj9830422FAJj (AJj )
Tzkminus1
Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43
Let Ekminus1
983045middot983046denote the conditional expectation conditioned on the first
k minus 1 iterations of REBK That is
Ekminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1
983046
where jl is the lth column block chosen and il is the lth row blockchosen
We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as
Eikminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk
983046
Then by the law of total expectation we have
Ekminus1
983045middot983046= Ekminus1
983045Eikminus1
983045middot983046983046
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43
Proof By ATb = ATAAdaggerb and
xk minusAdaggerb = xkminus1 minusAdaggerbminus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
we have
Ekminus1
983045xk minusAdaggerb
983046= Ekminus1
983045Eikminus1
983045xk minusAdaggerb
983046983046
= xkminus1 minusAdaggerbminus Ekminus1
983063αAT(Axkminus1 minus b+ zk)
983042A9830422F
983064
= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb
983042A9830422Fminus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422F
983061Iminus α
AAT
983042A9830422F
983062zkminus1
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422Fzkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43
Taking expectation gives
E983045xk minusAdaggerb
983046
= E983045Ekminus1
983045xk minusAdaggerb
983046983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422FE983045zkminus1
983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
=
983061Iminus α
ATA
983042A9830422F
9830622
E983045xkminus2 minusAdaggerb
983046minus 2α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F= middot middot middot
=
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
42 Convergence of E983045983042xk minusAdaggerb98304222
983046
The convergence ofE983045983042xk minusAdaggerb98304222
983046
depends on the positive number ρ defined as
ρ = 1minus (2αminus α2)σ2r (A)
983042A9830422F
In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to
bperp = (IminusAAdagger)b
which is the orthogonal projection of z0 onto the set z | ATz = 0
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43
Lemma 9
For any given consistent or inconsistent linear system
Ax = b
let zk be the vector generated in REBK with
z0 isin b+ range(A)
It holdsE983045983042zk minus bperp98304222
983046le ρk983042z0 minus bperp98304222
Proof By (AJj )Tbperp = 0 we have
zk minus bperp = zkminus1 minus bperp minus α
983042AJj9830422FAJj (AJj )
T(zkminus1 minus bperp) (1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43
By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that
zk minus bperp isin range(A)
It follows from (1) that
983042zk minus bperp98304222
= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
+α2
983042AJj9830424F(zkminus1 minus bperp)
TAJj (AJj )TAJj (AJj )
T(zkminus1 minus bperp)
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43
Taking the conditioned expectation on the first k minus 1 iterations yields
Ekminus1
983045983042zk minus bperp98304222
983046
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222
983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)
Taking expectation again gives
E983045983042zk minus bperp98304222
983046= E
983045Ekminus1
983045983042zk minus bperp98304222
983046983046
le ρE983045983042zkminus1 minus bperp98304222
983046
le ρk983042z0 minus bperp98304222
This completes the proof
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^\top)$. For any $\varepsilon > 0$, it holds
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
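Proof. Let
\[
\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$. It follows from
\[
x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top \bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\bigr)
\]
that
\[
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} \bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\bigr)^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top \bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\bigr) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned} \tag{2}
\]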
It follows from
\[
\mathbb{E}_{k-1}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr]
= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr]\bigr]
\le \mathbb{E}_{k-1}\left[\frac{\alpha^2 \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\]
that
\[
\mathbb{E}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr]
\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}\bigl[\|b - AA^\dagger b - z^k\|_2^2\bigr]
\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \tag{3}
\]
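By $x^0 \in \operatorname{range}(A^\top)$ and $A^\dagger b \in \operatorname{range}(A^\top)$ we have $x^0 - A^\dagger b \in \operatorname{range}(A^\top)$. Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^\top)$ by induction. By
\[
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^\top (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
\]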
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|\widehat{x}^k - A^\dagger b\|_2^2\bigr]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[
\mathbb{E}\bigl[\|\widehat{x}^k - A^\dagger b\|_2^2\bigr] \le \rho\, \mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr]. \tag{4}
\]
Note that for any $\varepsilon > 0$ we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2
\le \bigl(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\bigr)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr) \|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon) \|\widehat{x}^k - A^\dagger b\|_2^2 .
\end{aligned} \tag{5}
\]
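The last step of (5) is the weighted arithmetic-geometric mean (Young) inequality applied to the cross term. As a worked sketch, write $p = \|x^k - \widehat{x}^k\|_2$ and $q = \|\widehat{x}^k - A^\dagger b\|_2$ (shorthand introduced here only for illustration):

```latex
0 \le \Bigl(\frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, q\Bigr)^2
  = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\quad\Longrightarrow\quad
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2 ,
\qquad\text{hence}\qquad
p^2 + q^2 + 2pq \le \Bigl(1 + \frac{1}{\varepsilon}\Bigr) p^2 + (1 + \varepsilon)\, q^2 .
```

A small $\varepsilon$ keeps the factor on the contracting term $q^2$ close to one, at the cost of a large factor $1 + 1/\varepsilon$ on the perturbation term $p^2$.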
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr) \mathbb{E}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr] + (1+\varepsilon)\, \mathbb{E}\bigl[\|\widehat{x}^k - A^\dagger b\|_2^2\bigr] \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon) \rho\, \mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr) \bigl(1 + (1+\varepsilon)\bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\, \mathbb{E}\bigl[\|x^{k-2} - A^\dagger b\|_2^2\bigr] \\
&\le \cdots \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr) \bigl(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
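The final inequality in the chain above bounds a finite geometric sum. Spelled out as a short worked step (nothing here goes beyond the quantities already in the proof):

```latex
\Bigl(1 + \frac{1}{\varepsilon}\Bigr) \sum_{l=0}^{k-1} (1+\varepsilon)^l
  = \frac{1+\varepsilon}{\varepsilon} \cdot \frac{(1+\varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
  = (1+\varepsilon)^k \cdot \frac{1+\varepsilon}{\varepsilon^2} .
```

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives the first term inside the parentheses of the bound in Theorem 10.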
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\widehat{x}^k - A^\dagger b)^\top (x^k - \widehat{x}^k) = 0,
\]
the inequality (5) becomes the equality
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence bound for REK:
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
\le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
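The orthogonality used in Remark 4 can be verified numerically for single-row updates with $\alpha = 1$. The following is a minimal sketch under assumptions: the matrix, right-hand side, current iterate, and the vector standing in for $z^k$ are toy values chosen for illustration, the value of $A^\dagger b$ was worked out by hand for this system, and the function names are hypothetical.

```python
# Toy data (assumed for illustration; not from the lecture).
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
x_star = [0.0, 1.0]        # A^+ b for this A and b, worked out by hand

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def rek_x_step(x, z, i):
    """REK x-update with row i and alpha = 1."""
    a = A[i]
    coef = (dot(a, x) - b[i] + z[i]) / dot(a, a)
    return [xj - coef * aj for xj, aj in zip(x, a)]

def auxiliary_step(x, i):
    """The auxiliary iterate x_hat: same row, but applied to A x = A A^+ b."""
    a = A[i]
    coef = dot(a, sub(x, x_star)) / dot(a, a)
    return [xj - coef * aj for xj, aj in zip(x, a)]

x_prev = [3.0, -2.0]       # arbitrary current iterate
z_curr = [0.5, 2.0, -0.5]  # arbitrary stand-in for z^k

checks = []                # (inner product, left side, right side) per row
for i in range(len(A)):
    x_new = rek_x_step(x_prev, z_curr, i)
    x_hat = auxiliary_step(x_prev, i)
    inner = dot(sub(x_hat, x_star), sub(x_new, x_hat))
    lhs = dot(sub(x_new, x_star), sub(x_new, x_star))
    rhs = dot(sub(x_new, x_hat), sub(x_new, x_hat)) + dot(sub(x_hat, x_star), sub(x_hat, x_star))
    checks.append((inner, lhs, rhs))
    print(i, inner, lhs, rhs)
```

For each row the inner product should vanish up to rounding and the two squared norms should add up exactly, which is why no $\varepsilon$ is needed in the REK bound.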
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\bigl[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\bigr] \\
&= \Bigl(I - \frac{\alpha AA^\top}{\|A\|_F^2}\Bigr) \mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \Bigl(I - \frac{\alpha AA^\top}{\|A\|_F^2}\Bigr)^k (Ax^0 - Ax_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|Ax^0 - Ax_\star\|_2 .
\]
Here the inequality follows from the fact that $Ax^0 - Ax_\star \in \operatorname{range}(A)$ and Lemma 3.
3.2 Convergence of the expected norms
Theorem 6
Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. Assume $0 < \alpha < 2/t$. In exact arithmetic, it holds
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2 .
\]
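The bound in Theorem 6 depends on the step size only through the concave quadratic $2\alpha - t\alpha^2$, so the step size that makes this bound smallest can be read off directly. The following worked step is an observation about the bound itself, not a claim about the best step size in practice:

```latex
\frac{d}{d\alpha}\bigl(2\alpha - t\alpha^2\bigr) = 2 - 2t\alpha = 0
\;\Longrightarrow\; \alpha = \frac{1}{t},
\qquad
2\alpha - t\alpha^2 \Big|_{\alpha = 1/t} = \frac{1}{t},
\qquad
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
\le \left( 1 - \frac{\sigma_n^2(A)}{t\,\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2 .
```

The quadratic is positive exactly for $0 < \alpha < 2/t$, which is the step-size range assumed in the theorem.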
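Proof of Theorem 6
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) A (x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^\top \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^\top \left( \frac{A^\top I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^\top I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^\top \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^\top \left( \frac{A^\top I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b).
\end{aligned}
\]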
The last inequality follows from $(I_{:,\mathcal{J}})^\top I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\bigr]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2) (x^{k-1} - A^\dagger b)^\top \left( \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2 .
\end{aligned}
\]
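The conditional-expectation step in the proof of Theorem 6 can be checked exactly when every block is a single entry, since the expectation is then a finite sum over the nonzero entries of $A$. This is a sketch under assumptions: the consistent toy system, the step size, and the hand-computed value of $\sigma_n^2(A)$ are illustrative and not from the lecture, the blocks are taken to be single rows and single columns (so $t = n = 2$), and the function names are hypothetical.

```python
# Toy consistent system (assumed for illustration; not from the lecture).
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
x_star = [1.0, -1.0]
b = [sum(a * s for a, s in zip(row, x_star)) for row in A]   # b = A x_star
sigma_n_sq = 1.0      # smallest eigenvalue of A^T A = [[2, 1], [1, 2]]
alpha = 0.5           # must satisfy 0 < alpha < 2 / t
t = len(A[0])         # single-column blocks, so t = n

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def dsbgs_expected_sq_error(x):
    """Exact E[||x^k - x_star||^2 | x^{k-1} = x] for single-entry blocks."""
    fro_sq = sum(v * v for row in A for v in row)
    total = 0.0
    for i, row in enumerate(A):
        residual_i = dot(row, x) - b[i]
        for j, a_ij in enumerate(row):
            if a_ij == 0.0:
                continue                      # such a block has probability zero
            prob = a_ij ** 2 / fro_sq         # |a_ij|^2 / ||A||_F^2
            x_new = list(x)
            x_new[j] -= alpha * residual_i / a_ij
            total += prob * sq_dist(x_new, x_star)
    return total

fro_sq_A = sum(v * v for row in A for v in row)
rate = 1.0 - (2 * alpha - t * alpha ** 2) * sigma_n_sq / fro_sq_A

starts = [[0.0, 0.0], [3.0, -2.0], [-1.0, 4.0]]
results = [(sq_dist(x, x_star), dsbgs_expected_sq_error(x)) for x in starts]
for before, expected in results:
    print(f"{before:.4f} -> expected {expected:.4f} (bound {rate * before:.4f})")
```

Each expected value should be no larger than the corresponding bound; iterating that one-step inequality is what produces the $k$th power in the theorem.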
Remark 2
If $t = 1$ and $x^0 \in \operatorname{range}(A^\top)$, we can show $x^k - x^0_\star \in \operatorname{range}(A^\top)$ by induction, where
\[
x^0_\star = (I - A^\dagger A) x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$. Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
\[
\mathbb{E}\bigl[\|x^k - x^0_\star\|_2^2\bigr] \le \left( 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|x^0 - x^0_\star\|_2^2 .
\]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient) $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2 .
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2,
\]
where $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$.
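Both contraction factors in Theorem 7 are convex quadratics in $\alpha$, so the step size that minimizes each bound follows from setting the derivative to zero. This worked step concerns the bounds only; it is not a statement about the empirically best step size:

```latex
t = n:\quad
\alpha = \frac{\sigma_r^2(A)}{\|A\|_F^2}
\;\Longrightarrow\;
1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}
  = 1 - \frac{\sigma_r^4(A)}{\|A\|_F^4},
\qquad
t < n:\quad
\alpha = \frac{\sigma_r^2(A)}{t\rho}
\;\Longrightarrow\;
1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}
  = 1 - \frac{\sigma_r^4(A)}{t\rho\,\|A\|_F^2} .
```

In both cases the minimizing step size is the midpoint of the admissible interval stated in the theorem.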
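Proof of Theorem 7
Note that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \left( \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[
(I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
\]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that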
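\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]
&\le (1 + \alpha^2) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \left( \frac{AA^\top}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2 .
\end{aligned}
\]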
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2 .
\end{aligned}
\]
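If $t < n$, then it follows from
\[
(I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^\top A_{:,\mathcal{J}} \preceq \rho I
\]
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \left( \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^\top \left( \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]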
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^\top \left( \frac{AA^\top}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2 .
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2 .
\end{aligned}
\]
Remark 3
Note that
\[
(A_{:,\mathcal{J}})^\top b = (A_{:,\mathcal{J}})^\top A x_\star,
\]
where $x_\star$ is any solution of $A^\top A x = A^\top b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2 .
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2,
\]
where $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$.
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \dots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \dots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively. Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^\top)$.
for $k = 1, 2, \dots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$
By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
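Algorithm 2 translates almost line by line into code. The following is a minimal sketch using only the Python standard library and dense list-of-lists matrices; the function name `rebk`, its parameters, the choice $z^0 = b$ and $x^0 = 0$, the block partitions, and the toy inconsistent system are all assumptions made for illustration rather than anything prescribed by the lecture.

```python
import random

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=500, seed=0):
    """Sketch of REBK on dense list-of-lists data; returns (x, z)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])

    def fro_sq(rows, cols):
        return sum(A[i][j] ** 2 for i in rows for j in cols)

    col_w = [fro_sq(range(m), J) for J in col_blocks]   # ||A_{:,J_j}||_F^2
    row_w = [fro_sq(I, range(n)) for I in row_blocks]   # ||A_{I_i,:}||_F^2
    z = list(b)          # z^0 = b lies in b + range(A)
    x = [0.0] * n        # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha * A_J (A_J^T z) / ||A_J||_F^2
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        u = [sum(A[i][j] * z[i] for i in range(m)) for j in J]
        for i in range(m):
            z[i] -= alpha / col_w[jb] * sum(A[i][j] * uj for j, uj in zip(J, u))
        # Row step: x <- x - alpha * A_I^T (A_I x - b_I + z_I) / ||A_I||_F^2
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha / row_w[ib] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x, z

# Toy inconsistent system (assumed): least-squares solution (0, 1), b_perp = (1, 1, -1).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
x, z = rebk(A, b, row_blocks=[[0, 1], [2]], col_blocks=[[0], [1]])
print(x, z)
```

Note that the $x$ update uses the freshly updated $z^k$, matching the order of the two steps in the algorithm, and that no pseudoinverse of a block is ever formed.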
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}, j_k].
\]
Then, by the law of total expectation, we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\cdot]\bigr].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^\top)$. It holds
\[
\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right),
\]
where
\[
\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| .
\]
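The bound in Theorem 8 decays only when $\delta < 1$. As a short worked remark (a consequence of the definition of $\delta$, not a statement taken from the slides), the admissible step sizes are:

```latex
\delta < 1
\iff 0 < \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2} < 2 \quad (1 \le i \le r)
\iff 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)} .
```

Since $\sigma_1^2(A) \le \|A\|_F^2$, every $\alpha \in (0, 2)$ qualifies, and the extra factor $k$ in the second term does not prevent convergence because $k\delta^k \to 0$ as $k \to \infty$.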
Proof. By $A^\top b = A^\top A A^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\bigr] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^\top (Ax^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \frac{\alpha (A^\top A x^{k-1} - A^\top b)}{\|A\|_F^2} - \frac{\alpha A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)(x^{k-1} - A^\dagger b) - \frac{\alpha A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)(x^{k-1} - A^\dagger b) - \frac{\alpha A^\top}{\|A\|_F^2} \Bigl(I - \frac{\alpha AA^\top}{\|A\|_F^2}\Bigr) z^{k-1} \\
&= \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)(x^{k-1} - A^\dagger b) - \alpha \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr) \frac{A^\top}{\|A\|_F^2} z^{k-1} .
\end{aligned}
\]
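Two facts are used without comment in the last two lines of this chain: the conditional mean of the $z$ update, and a commutation identity. Both follow from the sampling probabilities in Algorithm 2 and can be sketched as:

```latex
\mathbb{E}_{k-1}[z^k]
  = \sum_{j=1}^{t} \frac{\|A_{:,\mathcal{J}_j}\|_F^2}{\|A\|_F^2}
    \Bigl(I - \frac{\alpha\, A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^\top}{\|A_{:,\mathcal{J}_j}\|_F^2}\Bigr) z^{k-1}
  = \Bigl(I - \frac{\alpha}{\|A\|_F^2}\sum_{j=1}^{t} A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^\top\Bigr) z^{k-1}
  = \Bigl(I - \frac{\alpha AA^\top}{\|A\|_F^2}\Bigr) z^{k-1},
\qquad
A^\top \Bigl(I - \frac{\alpha AA^\top}{\|A\|_F^2}\Bigr)
  = \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr) A^\top .
```

The middle equality uses that the column blocks partition $[n]$, so the block outer products sum to $AA^\top$. Iterating the first identity gives $\mathbb{E}[z^{k-1}] = \bigl(I - \alpha AA^\top / \|A\|_F^2\bigr)^{k-1} z^0$.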
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}[x^k - A^\dagger b]\bigr] \\
&= \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr) \frac{A^\top}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k \frac{A^\top z^0}{\|A\|_F^2} \\
&= \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k \frac{A^\top z^0}{\|A\|_F^2} = \cdots \\
&= \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k (x^0 - A^\dagger b) - \alpha k \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k \frac{A^\top z^0}{\|A\|_F^2} .
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k (x^0 - A^\dagger b) - \alpha k \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \Bigl(I - \frac{\alpha A^\top A}{\|A\|_F^2}\Bigr)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]
Here the last inequality follows from the facts that $x^0 - A^\dagger b \in \operatorname{range}(A^\top)$, $A^\top z^0 \in \operatorname{range}(A^\top)$, and Lemma 3.
4.2 Convergence of $\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]$
The convergence of $\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2} .
\]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to
\[
b_\perp = (I - AA^\dagger) b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^\top z = 0\}$.
32 Convergence of the expected norms
Theorem 6
Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system
Ax = b
with arbitrary x0 isin Rn Assume
0 lt α lt 2t
In exact arithmetic it holds
E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43
Proof of Theorem 6
983042xk minusAdaggerb98304222
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minus α
983061IJ (AIJ )T(II)
T
983042AIJ 9830422F
983062A(xkminus1 minusAdaggerb)minusAdaggerb
9830569830569830569830562
2
=
983056983056983056983056xkminus1 minusAdaggerbminus α
983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
9830569830569830569830562
2
= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)
TA
983042AIJ 9830424F
983062(xkminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
+α2(xkminus1 minusAdaggerb)T983061ATII(II)
TA
983042AIJ 9830422F
983062(xkminus1 minusAdaggerb)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If t = 1 and x0 isin range(AT) we can show
xk minus x0983183 isin range(AT)
by induction where
x0983183 = (IminusAdaggerA)x0 +Adaggerb
ie the projection of x0 onto the set
x isin Rn | Ax = b
Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound
E[983042xk minus x098318398304222] le
9830611minus (2αminus α2)σ2
r (A)
983042A9830422F
983062k
983042x0 minus x098318398304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43
Theorem 7
Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)
Ax = b
with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then
E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minus b98304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43
Proof of Theorem 7
Note that
983042Axk minus b98304222
=
983056983056983056983056Axkminus1 minus α
983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)minus b
9830569830569830569830562
2
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
If t = n then it follows from
(IJ )TATAIJ = 983042AJ 9830422F
(since AIJ = AJ is a column vector) that
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43
983042Axk minus b98304222
= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611 + α2 minus 2ασ2
r(A)
983042A9830422F
983062983042Axkminus1 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43
If t lt n then it follows from
(IJ )TATAIJ = AT
JAJ ≼ ρI
(since ρ = max1lejlet σ21(AJj )) that
983042Axk minus b98304222
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρII(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Remark 3
Note that
$$(A_{\mathcal J})^T b = (A_{\mathcal J})^T A x_\star,$$
where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{\mathcal J_j}).$$
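The identity at the start of Remark 3 is just the normal equations restricted to a column block. A minimal numerical sketch, assuming a random test matrix and a hypothetical column block `J` (neither comes from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)           # generic b, so Ax = b is inconsistent

# One particular solution of A^T A x = A^T b (the pseudoinverse solution).
x_star = np.linalg.pinv(A) @ b

# Hypothetical column block J (0-based indices), chosen for illustration.
J = [1, 3]
lhs = A[:, J].T @ b
rhs = A[:, J].T @ (A @ x_star)

normal_eq_residual = np.linalg.norm(A.T @ (A @ x_star - b))
block_gap = np.linalg.norm(lhs - rhs)
```

Both `normal_eq_residual` and `block_gap` should be at rounding level, since the rows of $A^T(Ax_\star - b) = 0$ indexed by `J` are exactly the block identity.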
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal I_1, \mathcal I_2, \ldots, \mathcal I_s\}$ and $\{\mathcal J_1, \mathcal J_2, \ldots, \mathcal J_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{\mathcal J_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{\mathcal J_j}\|_F^2} A_{\mathcal J_j}(A_{\mathcal J_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal I_i}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal I_i}\|_F^2} (A_{\mathcal I_i})^T (A_{\mathcal I_i}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i})$
By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
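The two updates of Algorithm 2 can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the function name `rebk`, its arguments, the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization conditions), and the random test problem are all assumptions made here.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000, seed=0):
    """Sketch of REBK (Algorithm 2) started from z^0 = b and x^0 = 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    fro2 = np.linalg.norm(A, "fro") ** 2
    # Squared Frobenius norms of the column blocks A[:, J] and row blocks A[I, :].
    col_norm2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_norm2 = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    # Pre-sample the block indices with the probabilities of Algorithm 2.
    js = rng.choice(len(col_blocks), size=iters, p=col_norm2 / fro2)
    rows = rng.choice(len(row_blocks), size=iters, p=row_norm2 / fro2)

    z = b.astype(float).copy()   # z^0 = b lies in b + range(A)
    x = np.zeros(n)              # x^0 = 0 lies in range(A^T)
    for j, i in zip(js, rows):
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_norm2[j]) * (AJ @ (AJ.T @ z))
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_norm2[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x, z

# Small inconsistent least-squares problem (test data is an assumption).
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
row_blocks = np.array_split(np.arange(30), 6)
col_blocks = np.array_split(np.arange(5), 2)
x, z = rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000)
err = np.linalg.norm(x - np.linalg.pinv(A) @ b)
```

On this well-conditioned example `err` should be tiny: the iterate approaches $A^\dagger b$ even though $b \notin \operatorname{range}(A)$, because $z^k$ absorbs the component of $b$ orthogonal to $\operatorname{range}(A)$.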
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
$$\mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$
Then by the law of total expectation we have
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}[\mathbb{E}^i_{k-1}[\cdot]].$$
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$
It holds
$$\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),$$
where
$$\delta = \max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.$$
Proof. By $A^Tb = A^TAA^\dagger b$ and
$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal I_i}\|_F^2}(A_{\mathcal I_i})^T(A_{\mathcal I_i}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}),$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]]\\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right]\\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1}\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}[\mathbb{E}_{k-1}[x^k - A^\dagger b]]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&= \cdots\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}$$
Here the last inequality follows from the facts that
$$x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^Tz^0 \in \operatorname{range}(A^T),$$
and Lemma 3.
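The proof relies on two averaging identities: over the column blocks, the mean of the $z$-update matrices is $I - \alpha AA^T/\|A\|_F^2$, and over the row blocks, the mean $x$-update direction is $\alpha A^T(Ax - b + z)/\|A\|_F^2$. A small deterministic check, assuming hypothetical partitions and random test data that are not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 6, 4, 0.7
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)
z = rng.standard_normal(m)
fro2 = np.linalg.norm(A, "fro") ** 2
col_blocks = [[0, 1], [2], [3]]          # hypothetical partition of the columns
row_blocks = [[0, 1, 2], [3, 4], [5]]    # hypothetical partition of the rows

# Mean of the z-update matrices under P(j) = ||A_J||_F^2 / ||A||_F^2.
mean_z_op = np.zeros((m, m))
for J in col_blocks:
    AJ = A[:, J]
    nJ = np.linalg.norm(AJ, "fro") ** 2
    mean_z_op += (nJ / fro2) * (np.eye(m) - alpha * (AJ @ AJ.T) / nJ)

# Mean of the x-update directions under P(i) = ||A_I||_F^2 / ||A||_F^2.
mean_dir = np.zeros(n)
for I in row_blocks:
    AI = A[I, :]
    nI = np.linalg.norm(AI, "fro") ** 2
    mean_dir += (nI / fro2) * (alpha / nI) * (AI.T @ (AI @ x - b[I] + z[I]))

gap_z = np.linalg.norm(mean_z_op - (np.eye(m) - alpha * (A @ A.T) / fro2))
gap_x = np.linalg.norm(mean_dir - alpha * (A.T @ (A @ x - b + z)) / fro2)
```

Both gaps should be at rounding level, because the block probabilities cancel the block normalizations term by term.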
4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$
The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as
$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to
$$b_\perp = (I - AA^\dagger)b,$$
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
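A brief numerical sketch of these two objects, assuming a random full-column-rank test matrix (an assumption made here, not data from the slides): it forms $b_\perp$, checks that it is the projection of an arbitrary $z^0 \in b + \operatorname{range}(A)$ onto $\{z \mid A^Tz = 0\}$, and evaluates $\rho$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 7, 3, 1.0
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
A_pinv = np.linalg.pinv(A)

b_perp = b - A @ (A_pinv @ b)            # (I - A A^+) b
z0 = b + A @ rng.standard_normal(n)      # an arbitrary point of b + range(A)
proj_z0 = z0 - A @ (A_pinv @ z0)         # projection of z0 onto {z : A^T z = 0}

sigma = np.linalg.svd(A, compute_uv=False)
sigma_r = sigma[sigma > 1e-12 * sigma[0]][-1]   # smallest nonzero singular value
rho = 1 - (2 * alpha - alpha**2) * sigma_r**2 / np.linalg.norm(A, "fro") ** 2
```

Since $\sigma_r^2(A) \le \|A\|_F^2$ and $0 < 2\alpha - \alpha^2 \le 1$ for $0 < \alpha < 2$, the value `rho` lies in $[0, 1)$.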
Lemma 9
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $z^k$ be the vector generated in REBK with
$$z^0 \in b + \operatorname{range}(A).$$
It holds
$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k\|z^0 - b_\perp\|_2^2.$$
Proof. By $(A_{\mathcal J_j})^Tb_\perp = 0$ we have
$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{\mathcal J_j}\|_F^2}A_{\mathcal J_j}(A_{\mathcal J_j})^T(z^{k-1} - b_\perp). \tag{1}$$
By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that
$$z^k - b_\perp \in \operatorname{range}(A).$$
It follows from (1) that
$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{\mathcal J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{\mathcal J_j}\|_F^2}\\
&\quad + \frac{\alpha^2}{\|A_{\mathcal J_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{\mathcal J_j}(A_{\mathcal J_j})^TA_{\mathcal J_j}(A_{\mathcal J_j})^T(z^{k-1} - b_\perp)\\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{\mathcal J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{\mathcal J_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}$$
Taking the conditional expectation on the first $k-1$ iterations yields
$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2}\\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}$$
Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2] &= \mathbb{E}[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]]\\
&\le \rho\,\mathbb{E}[\|z^{k-1} - b_\perp\|_2^2]\\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}$$
This completes the proof.
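The one-step contraction in this proof can be checked exactly, without sampling, by summing over the column blocks with their probabilities. The sketch below assumes a random full-column-rank test matrix, a hypothetical two-block column partition, and $\alpha = 0.9$; none of these come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 8, 4, 0.9
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
fro2 = np.linalg.norm(A, "fro") ** 2
col_blocks = [[0, 1], [2, 3]]              # hypothetical column partition

b_perp = b - A @ (np.linalg.pinv(A) @ b)
z_prev = b + A @ rng.standard_normal(n)    # plays the role of z^{k-1}

# Exact conditional expectation of ||z^k - b_perp||^2 given z^{k-1}.
expected = 0.0
for J in col_blocks:
    AJ = A[:, J]
    nJ = np.linalg.norm(AJ, "fro") ** 2
    z_new = z_prev - (alpha / nJ) * (AJ @ (AJ.T @ z_prev))
    expected += (nJ / fro2) * np.linalg.norm(z_new - b_perp) ** 2

sigma_r = np.linalg.svd(A, compute_uv=False)[-1]   # A has full column rank here
rho = 1 - (2 * alpha - alpha**2) * sigma_r**2 / fro2
bound = rho * np.linalg.norm(z_prev - b_perp) ** 2
```

The value `expected` should not exceed `bound`, which is the inequality $\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2] \le \rho\|z^{k-1} - b_\perp\|_2^2$ for this particular $z^{k-1}$.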
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$
For any $\varepsilon > 0$, it holds
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
Proof. Let
$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal I_i}\|_F^2}(A_{\mathcal I_i})^TA_{\mathcal I_i}(x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal I_i}\|_F^2}(A_{\mathcal I_i})^T(b_{\mathcal I_i} - A_{\mathcal I_i}A^\dagger b - z^k_{\mathcal I_i})$$
that
$$\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal I_i}\|_F^4}(b_{\mathcal I_i} - A_{\mathcal I_i}A^\dagger b - z^k_{\mathcal I_i})^TA_{\mathcal I_i}(A_{\mathcal I_i})^T(b_{\mathcal I_i} - A_{\mathcal I_i}A^\dagger b - z^k_{\mathcal I_i})\\
&\le \frac{\alpha^2\|b_{\mathcal I_i} - A_{\mathcal I_i}A^\dagger b - z^k_{\mathcal I_i}\|_2^2}{\|A_{\mathcal I_i}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned} \tag{2}$$
It follows from
$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widehat{x}^k\|_2^2]
&= \mathbb{E}_{k-1}[\mathbb{E}^i_{k-1}[\|x^k - \widehat{x}^k\|_2^2]]\\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}$$
that
$$\begin{aligned}
\mathbb{E}[\|x^k - \widehat{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2]\\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned} \tag{3}$$
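The step from (2) to the bound inside $\mathbb{E}_{k-1}$ is stated without detail. Spelled out as a sketch, the probability of each row block cancels the normalization in (2), and the squared block norms add up to the full squared norm because the row blocks partition $[m]$:

```latex
\mathbb{E}^i_{k-1}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr]
\le \sum_{i=1}^{s} \frac{\|A_{\mathcal I_i}\|_F^2}{\|A\|_F^2}
    \cdot \frac{\alpha^2\|b_{\mathcal I_i} - A_{\mathcal I_i}A^\dagger b - z^k_{\mathcal I_i}\|_2^2}{\|A_{\mathcal I_i}\|_F^2}
= \frac{\alpha^2}{\|A\|_F^2}\sum_{i=1}^{s}\bigl\|(b - AA^\dagger b - z^k)_{\mathcal I_i}\bigr\|_2^2
= \frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}.
```

Since $b - AA^\dagger b = b_\perp$, the quantity on the right is $\alpha^2\|z^k - b_\perp\|_2^2/\|A\|_F^2$, which is why Lemma 9 applies in (3).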
By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$ we have
$$x^0 - A^\dagger b \in \operatorname{range}(A^T).$$
Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By
$$\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal I_i}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i}\|_F^2}\\
&\quad + \frac{\alpha^2}{\|A_{\mathcal I_i}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal I_i})^TA_{\mathcal I_i}(A_{\mathcal I_i})^TA_{\mathcal I_i}(x^{k-1} - A^\dagger b)\\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{\mathcal I_i}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}[\|\widehat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2}\\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}$$
which yields
$$\mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \le \rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4}$$
Note that for any $\varepsilon > 0$ we have
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2 &= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2\\
&\le (\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2)^2\\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widehat{x}^k\|_2\|\widehat{x}^k - A^\dagger b\|_2\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon)\|\widehat{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}$$
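The last step of (5) is the weighted arithmetic-geometric mean (Young) inequality. As a short sketch, with $a = \|x^k - \widehat{x}^k\|_2$ and $b = \|\widehat{x}^k - A^\dagger b\|_2$ used as shorthand only inside this block:

```latex
0 \le \left(\frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,b\right)^2
  = \frac{a^2}{\varepsilon} - 2ab + \varepsilon b^2
\;\Longrightarrow\;
2ab \le \frac{a^2}{\varepsilon} + \varepsilon b^2
\;\Longrightarrow\;
a^2 + b^2 + 2ab \le \left(1 + \frac{1}{\varepsilon}\right)a^2 + (1+\varepsilon)\,b^2 .
```

A small $\varepsilon$ keeps the factor on the contracting term close to $1$ at the price of a large factor $1 + 1/\varepsilon$ on the perturbation term.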
Combining (3), (4) and (5) yields
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k - \widehat{x}^k\|_2^2] + (1+\varepsilon)\mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2]\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1+\varepsilon)\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2]\\
&\le \cdots\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2\\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}$$
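The final inequality of this chain uses the closed form of the geometric sum. Written out as a sketch:

```latex
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l
= \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k - 1}{\varepsilon}
\le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}.
```

Multiplying by $\alpha^2\rho^k\|z^0 - b_\perp\|_2^2/\|A\|_F^2$ and factoring out $(1+\varepsilon)^k\rho^k$ gives the first term of the bound in Theorem 10.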
Remark 4
For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\widehat{x}^k - A^\dagger b)^T(x^k - \widehat{x}^k) = 0,$$
the inequality (5) becomes the equality
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence bound for REK:
$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
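The REK bound follows by unrolling the recurrence obtained from (3), (4) with $\alpha = 1$ and the orthogonal splitting. In the sketch below, $e_k$ and $c$ are shorthand introduced only for this block.

```latex
e_k := \mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr], \qquad
c := \frac{\|z^0 - b_\perp\|_2^2}{\|A\|_F^2}, \qquad
e_k \le c\,\rho^k + \rho\, e_{k-1}
\le c\,\rho^k + \rho\bigl(c\,\rho^{k-1} + \rho\, e_{k-2}\bigr)
= 2c\,\rho^k + \rho^2 e_{k-2}
\le \cdots \le k\,c\,\rho^k + \rho^k e_0 .
```

Each unrolling step contributes the same amount $c\,\rho^k$, which is where the factor $k$ comes from; no $(1+\varepsilon)^k$ factor appears because Young's inequality is not needed.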
References
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Proof of Theorem 6
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\left(\frac{I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(Ax^{k-1} - b) - A^\dagger b\right\|_2^2\\
&= \left\|x^{k-1} - \alpha\left(\frac{I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2\\
&= \left\|x^{k-1} - A^\dagger b - \alpha\left(\frac{I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(x^{k-1} - A^\dagger b)\right\|_2^2\\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\left(\frac{I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(x^{k-1} - A^\dagger b)\\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\left(\frac{A^TI_{\mathcal I}A_{\mathcal I,\mathcal J}(I_{\mathcal J})^TI_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^4}\right)(x^{k-1} - A^\dagger b)\\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\left(\frac{I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(x^{k-1} - A^\dagger b)\\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\left(\frac{A^TI_{\mathcal I}(I_{\mathcal I})^TA}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(x^{k-1} - A^\dagger b).
\end{aligned}$$
The last inequality follows from $(I_{\mathcal J})^TI_{\mathcal J} = I$ and Lemma 1. Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)(x^{k-1} - A^\dagger b)^T\left(\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b)\\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}$$
Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] &= \mathbb{E}[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]]\\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]\\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0 - A^\dagger b\|_2^2.
\end{aligned}$$
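As a side calculation that is not on the slides, the rate just derived depends on $\alpha$ only through $g(\alpha) = 2\alpha - t\alpha^2$ (the name $g$ is introduced here), so the step size giving the smallest bound is explicit:

```latex
g(\alpha) = 2\alpha - t\alpha^2, \qquad
g'(\alpha) = 2 - 2t\alpha = 0 \;\Longrightarrow\; \alpha_\ast = \frac{1}{t}, \qquad
g(\alpha_\ast) = \frac{1}{t}, \qquad
1 - \frac{g(\alpha_\ast)\,\sigma_n^2(A)}{\|A\|_F^2} = 1 - \frac{\sigma_n^2(A)}{t\,\|A\|_F^2}.
```

The factor is below $1$ exactly when $g(\alpha) > 0$, that is, for $0 < \alpha < 2/t$.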
Remark 2
If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show
$$x^k - x^0_\star \in \operatorname{range}(A^T)$$
by induction, where
$$x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,$$
i.e., the projection of $x^0$ onto the set
$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$
Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
$$\mathbb{E}[\|x^k - x^0_\star\|_2^2] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|x^0 - x^0_\star\|_2^2.$$
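The projection formula can be illustrated numerically. The sketch below assumes a random rank-3 matrix, a consistent right-hand side, and an arbitrary starting point `x0` (all test data are assumptions made here); the cutoff `rcond=1e-10` is chosen so that numerically zero singular values are not inverted.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-deficient
b = A @ rng.standard_normal(n)                                   # consistent system
x0 = rng.standard_normal(n)
A_pinv = np.linalg.pinv(A, rcond=1e-10)

# x0_star = (I - A^+ A) x0 + A^+ b, the projection of x0 onto {x : Ax = b}.
x0_star = x0 - A_pinv @ (A @ x0) + A_pinv @ b

residual = np.linalg.norm(A @ x0_star - b)
# x0 - x0_star should have no component in null(A), i.e. it lies in range(A^T).
P_null = np.eye(n) - A_pinv @ A
null_component = np.linalg.norm(P_null @ (x0 - x0_star))
```

Both `residual` and `null_component` should be at rounding level: $x^0_\star$ solves the system, and the correction $x^0 - x^0_\star = A^\dagger(Ax^0 - b)$ is orthogonal to the null space of $A$.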
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
$$Ax = b$$
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{\mathcal J_j}).$$
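For orientation, here is a hedged NumPy sketch of a DSBGS iteration. It uses the update form that appears in the proofs of Theorems 6 and 7, $x^k = x^{k-1} - \alpha\, I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T(Ax^{k-1} - b)/\|A_{\mathcal I,\mathcal J}\|_F^2$, and it assumes that the block pair $(i, j)$ is drawn with probability $\|A_{\mathcal I_i,\mathcal J_j}\|_F^2/\|A\|_F^2$, which is consistent with the expectation steps in those proofs. The function name `dsbgs`, its arguments, and the test problem are illustrative assumptions.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of a doubly stochastic block Gauss-Seidel iteration from x^0 = 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    fro2 = np.linalg.norm(A, "fro") ** 2
    s, t = len(row_blocks), len(col_blocks)
    # Squared Frobenius norm of every submatrix A[I_i, J_j].
    blk2 = np.array([[np.linalg.norm(A[np.ix_(I, J)], "fro") ** 2
                      for J in col_blocks] for I in row_blocks])
    # Assumed sampling rule: P(i, j) = ||A[I_i, J_j]||_F^2 / ||A||_F^2.
    picks = rng.choice(s * t, size=iters, p=(blk2 / fro2).ravel())
    x = np.zeros(n)
    res = A @ x - b                    # residual Ax - b, kept up to date
    for idx in picks:
        i, j = divmod(int(idx), t)
        I, J = row_blocks[i], col_blocks[j]
        step = (alpha / blk2[i, j]) * (A[np.ix_(I, J)].T @ res[I])
        x[J] -= step                   # only the coordinates in J change
        res -= A[:, J] @ step
    return x

# Consistent, full-column-rank test problem (an assumption for illustration).
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
row_blocks = np.array_split(np.arange(30), 6)
col_blocks = np.array_split(np.arange(5), 2)
x = dsbgs(A, b, row_blocks, col_blocks, alpha=0.5, iters=5000)
err = np.linalg.norm(x - x_true)
```

Each step touches only one submatrix $A_{\mathcal I,\mathcal J}$ and one column block of $A$ for the residual update, which is the practical appeal of the doubly stochastic scheme. The step size `alpha=0.5` equals $1/t$ for the two column blocks used here.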
Proof of Theorem 7
Note that
$$\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\left(\frac{A I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(Ax^{k-1} - b) - b\right\|_2^2\\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{I_{\mathcal I}A_{\mathcal I,\mathcal J}(I_{\mathcal J})^TA^TA I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}$$
If $t = n$, then it follows from
$$(I_{\mathcal J})^TA^TA I_{\mathcal J} = \|A_{\mathcal J}\|_F^2$$
(since $A I_{\mathcal J} = A_{\mathcal J}$ is a column vector) that
$$\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\|A_{\mathcal J}\|_F^2\, I_{\mathcal I}A_{\mathcal I,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}\right)(Ax^{k-1} - b)\\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\|A_{\mathcal J}\|_F^2\, I_{\mathcal I}(I_{\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b)\\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}$$
The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2] &= \mathbb{E}[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]]\\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2]\\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0 - b\|_2^2.
\end{aligned}$$
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43
If t lt n then it follows from
(IJ )TATAIJ = AT
JAJ ≼ ρI
(since ρ = max1lejlet σ21(AJj )) that
983042Axk minus b98304222
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρII(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do
Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α
983042AJj9830422FAJj (AJj )
Tzkminus1
Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43
Let Ekminus1
983045middot983046denote the conditional expectation conditioned on the first
k minus 1 iterations of REBK That is
Ekminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1
983046
where jl is the lth column block chosen and il is the lth row blockchosen
We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as
Eikminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk
983046
Then by the law of total expectation we have
Ekminus1
983045middot983046= Ekminus1
983045Eikminus1
983045middot983046983046
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43
Proof By ATb = ATAAdaggerb and
xk minusAdaggerb = xkminus1 minusAdaggerbminus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
we have
Ekminus1
983045xk minusAdaggerb
983046= Ekminus1
983045Eikminus1
983045xk minusAdaggerb
983046983046
= xkminus1 minusAdaggerbminus Ekminus1
983063αAT(Axkminus1 minus b+ zk)
983042A9830422F
983064
= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb
983042A9830422Fminus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422F
983061Iminus α
AAT
983042A9830422F
983062zkminus1
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422Fzkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43
Taking expectation gives
E983045xk minusAdaggerb
983046
= E983045Ekminus1
983045xk minusAdaggerb
983046983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422FE983045zkminus1
983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
=
983061Iminus α
ATA
983042A9830422F
9830622
E983045xkminus2 minusAdaggerb
983046minus 2α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F= middot middot middot
=
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
42 Convergence of E983045983042xk minusAdaggerb98304222
983046
The convergence ofE983045983042xk minusAdaggerb98304222
983046
depends on the positive number ρ defined as
ρ = 1minus (2αminus α2)σ2r (A)
983042A9830422F
In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to
bperp = (IminusAAdagger)b
which is the orthogonal projection of z0 onto the set z | ATz = 0
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43
Lemma 9
For any given consistent or inconsistent linear system
Ax = b
let zk be the vector generated in REBK with
z0 isin b+ range(A)
It holdsE983045983042zk minus bperp98304222
983046le ρk983042z0 minus bperp98304222
Proof By (AJj )Tbperp = 0 we have
zk minus bperp = zkminus1 minus bperp minus α
983042AJj9830422FAJj (AJj )
T(zkminus1 minus bperp) (1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43
By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that
zk minus bperp isin range(A)
It follows from (1) that
983042zk minus bperp98304222
= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
+α2
983042AJj9830424F(zkminus1 minus bperp)
TAJj (AJj )TAJj (AJj )
T(zkminus1 minus bperp)
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43
Taking the conditioned expectation on the first k minus 1 iterations yields
Ekminus1
983045983042zk minus bperp98304222
983046
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222
983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)
Taking expectation again gives
E983045983042zk minus bperp98304222
983046= E
983045Ekminus1
983045983042zk minus bperp98304222
983046983046
le ρE983045983042zkminus1 minus bperp98304222
983046
le ρk983042z0 minus bperp98304222
This completes the proof
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 37 43
Theorem 10
For any given consistent or inconsistent linear system Ax = b let xk
be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
For any ε gt 0 it holds
E983045983042xk minusAdaggerb98304222
983046le (1 + ε)kρk
983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Proof Let
983141xk = xkminus1 minus α
983042AIi9830422F(AIi)
TAIi(xkminus1 minusAdaggerb)
which is actually one DSBGS update for the linear systemAx = AAdaggerb from xkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43
It follows from
xk minus 983141xk =α
983042AIi9830422F(AIi)
T(bIi minusAIiAdaggerbminus zkIi)
that
983042xk minus 983141xk98304222
=α2
983042AIi9830424F(bIi minusAIiA
daggerbminus zkIi)TAIi(AIi)
T(bIi minusAIiAdaggerbminus zkIi
)
leα2983042bIi minusAIiA
daggerbminus zkIi98304222
983042AIi9830422F (by Lemma 1) (2)
It follows from
Ekminus1
983045983042xk minus 983141xk98304222
983046= Ekminus1
983045Eikminus1
983045983042xk minus 983141xk98304222
983046983046
le Ekminus1
983063α2983042bminusAAdaggerbminus zk98304222
983042A9830422F
983064(by (2))
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43
that
E983045983042xk minus 983141xk98304222
983046le α2
983042A9830422FE983045983042bminusAAdaggerbminus zk98304222
983046
le α2ρk
983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)
By x0 isin range(AT) and Adaggerb isin range(AT) we have
x0 minusAdaggerb isin range(AT)
Then we can show that xk minusAdaggerb isin range(AT) by induction By
983042983141xk minusAdaggerb98304222
= 983042xkminus1 minusAdaggerb98304222 minus2α983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
+α2
983042AIi9830424F(xkminus1 minusAdaggerb)T(AIi)
TAIi(AIi)TAIi(x
kminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 40 43
we have
Ekminus1
983045983042983141xk minusAdaggerb98304222
983046
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042A(xkminus1 minusAdaggerb)98304222
983042A9830422Fle ρ983042xkminus1 minusAdaggerb98304222 (by Lemma 3)
which yields
E983045983042983141xk minusAdaggerb98304222
983046le ρE
983045983042xkminus1 minusAdaggerb98304222
983046 (4)
Note that for any ε gt 0 we have
983042xk minusAdaggerb98304222= 983042xk minus 983141xk + 983141xk minusAdaggerb98304222le (983042xk minus 983141xk9830422 + 983042983141xk minusAdaggerb9830422)2
le 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222 + 2983042xk minus 983141xk9830422983042983141xk minusAdaggerb9830422
le9830611 +
1
ε
983062983042xk minus 983141xk98304222 + (1 + ε)983042983141xk minusAdaggerb98304222 (5)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 41 43
Combining (3) (4) and (5) yields
E983045983042xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062E983045983042xk minus 983141xk98304222
983046+ (1 + ε)E
983045983042983141xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062α2ρk
983042A9830422F983042z0 minus bperp98304222 + (1 + ε)ρE
983045983042xkminus1 minusAdaggerb98304222
983046
le9830611 +
1
ε
983062(1 + 1 + ε)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)2ρ2E983045983042xkminus2 minusAdaggerb98304222
983046
le middot middot middot
le9830611 +
1
ε
983062(1 + 1 + ε+ middot middot middot+ (1 + ε)kminus1)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)kρk983042x0 minusAdaggerb98304222
le (1 + ε)kρk983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 42 43
Remark 4
For the case REBK with s = m t = n and α = 1 (ie REK) by theorthogonality
(983141xk minusAdaggerb)T(xk minus 983141xk) = 0
the equation (5) becomes
983042xk minusAdaggerb98304222 = 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222
which yields the following convergence for REK
E983045983042xk minusAdaggerb98304222
983046le ρk
983061k983042z0 minus bperp98304222
983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Kui Du Wutao Si and Xiaohui Sun
Pseudoinverse-free randomized extended block kaczmarz for solving least squaresarXiv preprint arXiv200104179 2020
Kui Du and Xiaohui Sun
A doubly stochastic block Gauss-Seidel algorithm for solving linear equationsarXiv preprint arXiv191213291 2019
Lawrence H Landweber
An iteration formula for Fredholm integral equations of the first kindAmer J Math 73615ndash624 1951
D Leventhal and A S Lewis
Randomized methods for linear constraints convergence rates and conditioningMath Oper Res 35(3)641ndash654 2010
Anna Ma Deanna Needell and Aaditya Ramdas
Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015
Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo
A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019
Thomas Strohmer and Roman Vershynin
A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009
Anastasios Zouzias and Nikolaos M Freris
Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives
E[983042xk minusAdaggerb98304222 |xkminus1]
le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)
Taking expectation again gives
E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062E[983042xkminus1 minusAdaggerb98304222]
le9830611minus (2αminus tα2)σ2
n(A)
983042A9830422F
983062k
983042x0 minusAdaggerb98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43
Remark 2
If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show by induction that
\[
x^k - x^0_\star \in \mathrm{range}(A^T),
\]
where
\[
x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.
Then, for rank-deficient consistent linear systems, the same approach gives the convergence bound
\[
\mathbb{E}\big[\|x^k - x^0_\star\|_2^2\big] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x^0_\star\|_2^2.
\]
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
\[
Ax = b
\]
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
\]
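The step-size limits and contraction factors in Theorem 7 can be evaluated numerically. The sketch below is illustrative only: the function name `dsbgs_residual_rates` and the choice of the midpoint step size (which minimizes each factor as a function of $\alpha$) are assumptions made here, not part of the lecture.

```python
import numpy as np

def dsbgs_residual_rates(A, col_blocks):
    """Evaluate the Theorem 7 step-size limit and contraction factor.

    col_blocks is a partition of the column indices (hypothetical input format).
    Returns (alpha_max, alpha, rate), where alpha = alpha_max / 2 minimizes the factor.
    """
    fro2 = np.linalg.norm(A, "fro") ** 2
    sv = np.linalg.svd(A, compute_uv=False)        # singular values, descending
    r = np.linalg.matrix_rank(A)
    sigma_r2 = sv[r - 1] ** 2                      # smallest nonzero singular value, squared
    t, n = len(col_blocks), A.shape[1]
    if t == n:                                     # every block is a single column
        alpha_max = 2 * sigma_r2 / fro2
        alpha = alpha_max / 2
        rate = 1 + alpha**2 - 2 * alpha * sigma_r2 / fro2
    else:                                          # genuine column blocks
        rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)
        alpha_max = 2 * sigma_r2 / (t * rho)
        alpha = alpha_max / 2
        rate = 1 - (2 * alpha * sigma_r2 - t * rho * alpha**2) / fro2
    return alpha_max, alpha, rate
```

For $t = n$ the midpoint choice gives the factor $1 - \sigma_r^4(A)/\|A\|_F^4$, and for $t < n$ it gives $1 - \sigma_r^4(A)/(t\rho\|A\|_F^2)$.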
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
\]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I
\]
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[
(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,
\]
where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS$(1, t)$ has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where
\[
\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
\]
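The behaviour described in Remark 3 can be tried numerically. The sketch below is not taken from the lecture: the update for $s = 1$ is inferred here from the residual recursion in the proof of Theorem 7, and the function name `dsbgs_1t`, the zero starting point, and the fixed seed are all assumptions.

```python
import numpy as np

def dsbgs_1t(A, b, col_blocks, alpha, iters=3000, seed=0):
    """Sketch of a DSBGS(1, t)-style iteration (update rule assumed, see lead-in).

    Each step picks a column block J with probability ||A[:, J]||_F^2 / ||A||_F^2
    and updates only the entries of x indexed by J.
    """
    rng = np.random.default_rng(seed)
    fro2 = np.linalg.norm(A, "fro") ** 2
    p = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks]) / fro2
    x = np.zeros(A.shape[1])                       # x0 = 0 (arbitrary choice)
    for _ in range(iters):
        J = col_blocks[rng.choice(len(col_blocks), p=p)]
        AJ = A[:, J]
        # x_J <- x_J - alpha * A_J^T (A x - b) / ||A_J||_F^2
        x[J] -= alpha / np.linalg.norm(AJ, "fro") ** 2 * (AJ.T @ (A @ x - b))
    return x
```

With an inconsistent right-hand side, the quantity to monitor is $\|Ax^k - Ax_\star\|_2$ with $x_\star = A^\dagger b$, not $\|Ax^k - b\|_2$, which cannot go to zero.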
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$
By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
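Algorithm 2 translates almost line by line into NumPy. The following is a minimal sketch, not a reference implementation: the function name `rebk`, the fixed iteration count, the seed, and the particular starting points $z^0 = b$ and $x^0 = 0$ (which satisfy the stated initialization conditions) are choices made here for illustration.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of Algorithm 2 (REBK) with z0 = b and x0 = 0.

    row_blocks / col_blocks are partitions of the row / column indices.
    Returns the final iterates (x, z).
    """
    rng = np.random.default_rng(seed)
    fro2 = np.linalg.norm(A, "fro") ** 2
    # Sampling probabilities ||block||_F^2 / ||A||_F^2 for column and row blocks.
    p_col = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks]) / fro2
    p_row = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks]) / fro2
    z = np.array(b, dtype=float)                   # z0 = b lies in b + range(A)
    x = np.zeros(A.shape[1])                       # x0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step on z.
        AJ = A[:, col_blocks[rng.choice(len(col_blocks), p=p_col)]]
        z -= alpha / np.linalg.norm(AJ, "fro") ** 2 * (AJ @ (AJ.T @ z))
        # Row-block step on x, using the freshly updated z.
        I = row_blocks[rng.choice(len(row_blocks), p=p_row)]
        AI = A[I, :]
        x -= alpha / np.linalg.norm(AI, "fro") ** 2 * (AI.T @ (AI @ x - b[I] + z[I]))
    return x, z
```

A possible call is `rebk(A, b, np.array_split(np.arange(m), s), np.array_split(np.arange(n), t))`; the returned `x` can then be compared with `np.linalg.pinv(A) @ b`. Note that neither update needs a pseudoinverse of a block, only products with the block and its transpose.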
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]
Then by the law of total expectation we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\cdot]\big].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
\]
It holds
\[
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k \left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2}\right),
\]
where
\[
\delta = \max_{1 \le i \le r} \left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.
\]
Proof. By $A^T b = A^T A A^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{A A^T}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\| \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^T z^0 \in \mathrm{range}(A^T),
\]
and Lemma 3.
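The closed form for $\mathbb{E}[x^k - A^\dagger b]$ obtained in the proof is deterministic, so it can be evaluated directly and compared with the right-hand side of Theorem 8. The sketch below does this; the function name `rebk_mean_error` is hypothetical, and the caller is assumed to pass $x^0 \in \mathrm{range}(A^T)$ and $z^0 \in b + \mathrm{range}(A)$.

```python
import numpy as np

def rebk_mean_error(A, b, x0, z0, alpha, k):
    """Return (||E[x^k - A^+ b]||_2, Theorem 8 bound) using the proof's closed form."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    n = A.shape[1]
    M = np.eye(n) - alpha * (A.T @ A) / fro2       # I - alpha A^T A / ||A||_F^2
    Mk = np.linalg.matrix_power(M, k)
    e0 = x0 - np.linalg.pinv(A) @ b                # x^0 - A^+ b
    g = A.T @ z0 / fro2                            # A^T z^0 / ||A||_F^2
    mean_err = Mk @ e0 - alpha * k * (Mk @ g)      # closed form from the proof
    sv = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)
    delta = np.max(np.abs(1 - alpha * sv[:r] ** 2 / fro2))
    bound = delta**k * (np.linalg.norm(e0) + alpha * k * np.linalg.norm(g))
    return np.linalg.norm(mean_err), bound
```

For a consistent system with $z^0 = b$, the term $A^T z^0$ does not vanish in general, so the factor $k\delta^k$ in the bound is what governs the decay of the expected error for large $k$.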
4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$
The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[
b_\perp = (I - AA^\dagger) b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
\[
Ax = b,
\]
let $z^k$ be the vector generated in REBK with
\[
z^0 \in b + \mathrm{range}(A).
\]
It holds
\[
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}
\]
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
\[
z^k - b_\perp \in \mathrm{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking the conditional expectation on the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\, \mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
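Lemma 9 concerns only the $z$-recursion of REBK, which can be run on its own. The sketch below tracks $\|z^k - b_\perp\|_2$ along one random trajectory; the name `z_iteration`, the starting point $z^0 = b$, and the seed are illustrative assumptions, and a single trajectory only illustrates, rather than verifies, a bound stated in expectation.

```python
import numpy as np

def z_iteration(A, b, col_blocks, alpha=1.0, iters=3000, seed=0):
    """Run the z-update of REBK from z0 = b and record ||z^k - b_perp||_2."""
    rng = np.random.default_rng(seed)
    fro2 = np.linalg.norm(A, "fro") ** 2
    p = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks]) / fro2
    b_perp = b - A @ (np.linalg.pinv(A) @ b)       # (I - A A^+) b
    z = np.array(b, dtype=float)                   # z0 = b lies in b + range(A)
    errs = [np.linalg.norm(z - b_perp)]
    for _ in range(iters):
        AJ = A[:, col_blocks[rng.choice(len(col_blocks), p=p)]]
        z -= alpha / np.linalg.norm(AJ, "fro") ** 2 * (AJ @ (AJ.T @ z))
        errs.append(np.linalg.norm(z - b_perp))
    return b_perp, np.array(errs)
```

For $0 < \alpha \le 2$ each step multiplies $z^{k-1} - b_\perp$ by a symmetric matrix with eigenvalues in $[1 - \alpha, 1]$, so the recorded errors are non-increasing along every trajectory, up to rounding.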
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
\]
For any $\varepsilon > 0$, it holds
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1 + \varepsilon)^k \rho^k \left(\frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
Proof. Let
\[
\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
\[
x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})
\]
that
\[
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned} \tag{2}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2 \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned} \tag{3}
\]
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T).
\]
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[
\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \le \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}
\]
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\big)^2 \\
&= \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1 + \varepsilon)\|\widehat{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
\]
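The last step of (5) is an instance of the weighted arithmetic-geometric mean (Young) inequality, which the slide uses without naming. Spelled out, with $a = \|x^k - \widehat{x}^k\|_2$ and $c = \|\widehat{x}^k - A^\dagger b\|_2$ (shorthand introduced here only for this note):

```latex
\[
0 \le \left(\frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, c\right)^2
= \frac{a^2}{\varepsilon} - 2ac + \varepsilon c^2
\;\Longrightarrow\;
2ac \le \frac{a^2}{\varepsilon} + \varepsilon c^2,
\]
\[
a^2 + c^2 + 2ac \le \left(1 + \frac{1}{\varepsilon}\right) a^2 + (1 + \varepsilon)\, c^2 .
\]
```

A small $\varepsilon$ keeps the factor on the second term close to 1 at the price of a large factor $1 + 1/\varepsilon$ on the first.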
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] + (1 + \varepsilon)\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon)\big)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left(\frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
\]
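The final inequality in the chain above rests on a geometric sum that the slide leaves implicit. Written out:

```latex
\[
\left(1 + \frac{1}{\varepsilon}\right) \sum_{l=0}^{k-1} (1 + \varepsilon)^l
= \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
\le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}
= (1 + \varepsilon)^k \cdot \frac{1 + \varepsilon}{\varepsilon^2}.
\]
```

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives the first term of the bound in Theorem 10. The bound is only informative when $(1 + \varepsilon)\rho < 1$, which restricts how large $\varepsilon$ may be chosen.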
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0,
\]
the equation (5) becomes
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence for REK:
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left(\frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
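The REK bound in Remark 4 follows by unrolling a one-step recursion. As a sketch of that step, write $e_k = \mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ and $c = \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ (both are shorthand introduced here, not notation from the slides). Using the Pythagorean identity together with (3) and (4) at $\alpha = 1$:

```latex
\[
e_k \le \rho^k c + \rho\, e_{k-1}
\le \rho^k c + \rho\left(\rho^{k-1} c + \rho\, e_{k-2}\right)
= 2\rho^k c + \rho^2 e_{k-2}
\le \cdots
\le k \rho^k c + \rho^k e_0 .
\]
```

Compared with Theorem 10, the factor $(1 + \varepsilon)^k$ disappears and is replaced by the linear factor $k$ on the $z^0$ term.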
Kui Du, Wutao Si, and Xiaohui Sun.
Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
Kui Du and Xiaohui Sun.
A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
Lawrence H. Landweber.
An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
D. Leventhal and A. S. Lewis.
Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
Anna Ma, Deanna Needell, and Aaditya Ramdas.
Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo.
A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
Thomas Strohmer and Roman Vershynin.
A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
Anastasios Zouzias and Nikolaos M. Freris.
Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Theorem 7
Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)
\[ Ax = b \]
with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,J_j})$.
Proof of Theorem 7
Note that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha\left(\frac{A I_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T\left(\frac{A I_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{I_{:,I}A_{I,J}(I_{:,J})^TA^TA I_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}
\]
If $t = n$, then it follows from
\[ (I_{:,J})^TA^TA I_{:,J} = \|A_{:,J}\|_F^2 \]
(since $A I_{:,J} = A_{:,J}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T\left(\frac{A I_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\|A_{:,J}\|_F^2\, I_{:,I}A_{I,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T\left(\frac{A I_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\|A_{:,J}\|_F^2\, I_{:,I}(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \qquad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
If $t < n$, then it follows from
\[ (I_{:,J})^TA^TA I_{:,J} = (A_{:,J})^TA_{:,J} \preceq \rho I \]
(since $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,J_j})$) that
\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T\left(\frac{A I_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\rho\, I_{:,I}A_{I,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T\left(\frac{A I_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\rho\, I_{:,I}(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \qquad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]
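The one-step inequalities used in this proof can be checked numerically. The following is only an illustrative sketch. It assumes the DSBGS update $x^k = x^{k-1} - \alpha I_{:,J}(A_{I,J})^T(I_{:,I})^T(Ax^{k-1} - b)/\|A_{I,J}\|_F^2$ with the pair $(I, J)$ drawn with probability $\|A_{I,J}\|_F^2/\|A\|_F^2$, as read off from the first line of the proof. The function name `dsbgs_expected_residual`, the random test matrix, and the block partitions are hypothetical choices made for the example. The conditional expectation is computed exactly by enumerating all block pairs, so no sampling is involved.

```python
import numpy as np

def dsbgs_expected_residual(A, b, x, row_blocks, col_blocks, alpha):
    """Exact E[||A x_new - b||_2^2] for one (assumed) DSBGS step started at x."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    r = A @ x - b
    expected = 0.0
    for I in row_blocks:
        for J in col_blocks:
            A_IJ = A[np.ix_(I, J)]
            w = np.linalg.norm(A_IJ, "fro") ** 2
            if w == 0.0:
                continue  # a zero block is sampled with probability zero
            x_new = x.copy()
            x_new[J] -= alpha * (A_IJ.T @ r[I]) / w
            expected += (w / fro2) * np.linalg.norm(A @ x_new - b) ** 2
    return expected

rng = np.random.default_rng(0)
m, n = 8, 6
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)            # consistent system
x0 = np.zeros(n)
fro2 = np.linalg.norm(A, "fro") ** 2
# Assumption: this random A has full column rank, so sigma_r is the smallest singular value.
sigma_r2 = np.linalg.svd(A, compute_uv=False)[-1] ** 2
res0 = np.linalg.norm(A @ x0 - b) ** 2
row_blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]

# Case t = n: every column block is a single column.
cols_single = [[j] for j in range(n)]
alpha1 = sigma_r2 / fro2                  # midpoint of the admissible interval
factor1 = 1 + alpha1**2 - 2 * alpha1 * sigma_r2 / fro2
lhs1 = dsbgs_expected_residual(A, b, x0, row_blocks, cols_single, alpha1)

# Case t < n: column blocks of size two.
cols_pairs = [[0, 1], [2, 3], [4, 5]]
t = len(cols_pairs)
rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in cols_pairs)
alpha2 = sigma_r2 / (t * rho)             # midpoint of the admissible interval
factor2 = 1 - (2 * alpha2 * sigma_r2 - t * rho * alpha2**2) / fro2
lhs2 = dsbgs_expected_residual(A, b, x0, row_blocks, cols_pairs, alpha2)

print(lhs1 <= factor1 * res0, lhs2 <= factor2 * res0)
```

Both step sizes are taken at the midpoint of the interval allowed by Theorem 7, which keeps each contraction factor strictly between 0 and 1.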
Remark 3
Note that
\[ (A_{:,J})^T b = (A_{:,J})^T A x_\star, \]
where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]
where $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,J_j})$.
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{I_1, I_2, \ldots, I_s\}$ and $\{J_1, J_2, \ldots, J_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,J_j}\|_F^2/\|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,J_j}\|_F^2} A_{:,J_j}(A_{:,J_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{I_i,:}\|_F^2/\|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{I_i,:}\|_F^2} (A_{I_i,:})^T (A_{I_i,:}x^{k-1} - b_{I_i} + z^k_{I_i})$
By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
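Algorithm 2 translates almost line by line into NumPy. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `rebk`, the fixed iteration count, the particular initial vectors $z^0 = b$ and $x^0 = 0$ (which satisfy the two initialization conditions), and the test problem are all choices made here for illustration.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=1000, seed=0):
    """Sketch of REBK; row_blocks / col_blocks are lists of index lists."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Squared Frobenius norms of the row and column blocks.
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_p = row_w / row_w.sum()   # probabilities ||A_{I_i,:}||_F^2 / ||A||_F^2
    col_p = col_w / col_w.sum()   # probabilities ||A_{:,J_j}||_F^2 / ||A||_F^2
    z = b.astype(float)           # z0 = b lies in b + range(A)
    x = np.zeros(n)               # x0 = 0 lies in range(A^T)
    for _ in range(iters):
        j = rng.choice(len(col_blocks), p=col_p)
        A_J = A[:, col_blocks[j]]
        z -= (alpha / col_w[j]) * (A_J @ (A_J.T @ z))
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        A_I = A[I, :]
        x -= (alpha / row_w[i]) * (A_I.T @ (A_I @ x - b[I] + z[I]))
    return x, z

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 6))
b = rng.standard_normal(20)       # a generic b, so the system is inconsistent
row_blocks = [list(range(i, i + 4)) for i in range(0, 20, 4)]
col_blocks = [list(range(j, j + 2)) for j in range(0, 6, 2)]
x, z = rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=10000)
x_ls = np.linalg.pinv(A) @ b      # reference pseudoinverse solution
print(np.linalg.norm(x - x_ls), np.linalg.norm(A.T @ z))
```

No pseudoinverse of a block is formed inside the loop; each step needs only two products with a block and a precomputed block norm. `np.linalg.pinv` appears only to produce a reference solution for comparison.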
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]
Then by the law of total expectation we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\cdot]\big].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
\[ Ax = b, \]
let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T). \]
It holds
\[
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),
\]
where
\[
\delta = \max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.
\]
Proof. By $A^Tb = A^TAA^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^T(A_{I_i,:}x^{k-1} - b_{I_i} + z^k_{I_i}),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
\]
The third line uses $\mathbb{E}[z^{k-1}] = \left(I - \alpha AA^T/\|A\|_F^2\right)^{k-1}z^0$ together with $A^T\left(I - \alpha AA^T/\|A\|_F^2\right) = \left(I - \alpha A^TA/\|A\|_F^2\right)A^T$.
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[ x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^Tz^0 \in \mathrm{range}(A^T), \]
and Lemma 3.
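The one-step identity for $\mathbb{E}_{k-1}[x^k - A^\dagger b]$ derived in this proof can be verified exactly on a small example by enumerating every (column block, row block) pair with its probability. The sketch below is illustrative only: the helper name `expected_rebk_iterate`, the random data, and the partitions are assumptions made for the example.

```python
import numpy as np

def expected_rebk_iterate(A, b, x, z, row_blocks, col_blocks, alpha):
    """Exact E[x_new] over the column and row block choices of one REBK step."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    mean_x = np.zeros_like(x)
    for J in col_blocks:
        A_J = A[:, J]
        w_J = np.linalg.norm(A_J, "fro") ** 2
        z_new = z - (alpha / w_J) * (A_J @ (A_J.T @ z))
        for I in row_blocks:
            A_I = A[I, :]
            w_I = np.linalg.norm(A_I, "fro") ** 2
            x_new = x - (alpha / w_I) * (A_I.T @ (A_I @ x - b[I] + z_new[I]))
            mean_x += (w_J / fro2) * (w_I / fro2) * x_new
    return mean_x

rng = np.random.default_rng(2)
m, n, alpha = 9, 4, 0.8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_prev = rng.standard_normal(n)
z_prev = rng.standard_normal(m)
row_blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
col_blocks = [[0, 1], [2, 3]]

fro2 = np.linalg.norm(A, "fro") ** 2
x_star = np.linalg.pinv(A) @ b
M = np.eye(n) - alpha * (A.T @ A) / fro2
# Right-hand side of the last line of the one-step derivation.
predicted = M @ (x_prev - x_star) - alpha * (M @ (A.T @ z_prev)) / fro2
computed = expected_rebk_iterate(A, b, x_prev, z_prev, row_blocks, col_blocks, alpha) - x_star
print(np.allclose(computed, predicted))
```

The identity itself only needs $A^Tb = A^TAA^\dagger b$, so the check uses arbitrary `x_prev` and `z_prev`; the range conditions on $x^0$ and $z^0$ matter later, when Lemma 3 is applied to bound the norms.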
4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$
The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[ b_\perp = (I - AA^\dagger)b, \]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
Lemma 9
For any given consistent or inconsistent linear system
\[ Ax = b, \]
let $z^k$ be the vector generated in REBK with
\[ z^0 \in b + \mathrm{range}(A). \]
It holds
\[
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k\|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,J_j})^Tb_\perp = 0$ we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,J_j}\|_F^2}A_{:,J_j}(A_{:,J_j})^T(z^{k-1} - b_\perp). \tag{1}
\]
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$ we can show by induction that
\[ z^k - b_\perp \in \mathrm{range}(A). \]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,J_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,J_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,J_j}(A_{:,J_j})^TA_{:,J_j}(A_{:,J_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{:,J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,J_j}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking the conditioned expectation on the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\,\mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
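A small numerical check of the one-step contraction in Lemma 9 is sketched below. It is illustrative only: the helper name `expected_z_error`, the random data, the step size $\alpha = 1.5$, and the column partition are assumptions made for the example. The expectation over the column block is computed exactly by enumeration, and the check also confirms that $A^Tb_\perp = 0$.

```python
import numpy as np

def expected_z_error(A, z, b_perp, col_blocks, alpha):
    """Exact E[||z_new - b_perp||_2^2] over the column block choice."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    total = 0.0
    for J in col_blocks:
        A_J = A[:, J]
        w_J = np.linalg.norm(A_J, "fro") ** 2
        z_new = z - (alpha / w_J) * (A_J @ (A_J.T @ z))
        total += (w_J / fro2) * np.linalg.norm(z_new - b_perp) ** 2
    return total

rng = np.random.default_rng(3)
m, n, alpha = 10, 4, 1.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
b_perp = b - A @ (np.linalg.pinv(A) @ b)   # (I - A A^+) b
z0 = b + A @ rng.standard_normal(n)        # z0 in b + range(A)
col_blocks = [[0, 1], [2, 3]]

fro2 = np.linalg.norm(A, "fro") ** 2
# Assumption: this random A has full column rank, so sigma_r is the smallest singular value.
sigma_r2 = np.linalg.svd(A, compute_uv=False)[-1] ** 2
rho = 1 - (2 * alpha - alpha**2) * sigma_r2 / fro2
lhs = expected_z_error(A, z0, b_perp, col_blocks, alpha)
rhs = rho * np.linalg.norm(z0 - b_perp) ** 2
print(np.linalg.norm(A.T @ b_perp), lhs <= rhs)
```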
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
\[ z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T). \]
For any $\varepsilon > 0$, it holds
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
Proof. Let
\[
\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^TA_{I_i,:}(x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
\[
x^k - \widetilde{x}^k = \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^T(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i})
\]
that
\[
\begin{aligned}
\|x^k - \widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{I_i,:}\|_F^4}(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i})^TA_{I_i,:}(A_{I_i,:})^T(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}) \\
&\le \frac{\alpha^2\|b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned} \tag{2}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}\big[\|x^k - \widetilde{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \qquad \text{(by (2))}
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \qquad \text{(by Lemma 9)}.
\end{aligned} \tag{3}
\]
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$ we have
\[ x^0 - A^\dagger b \in \mathrm{range}(A^T). \]
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\widetilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{I_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{I_i,:})^TA_{I_i,:}(A_{I_i,:})^TA_{I_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \qquad \text{(by Lemma 1)},
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[
\mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}
\]
Note that for any $\varepsilon > 0$ we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2 \le \big(\|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widetilde{x}^k\|_2\|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widetilde{x}^k\|_2^2 + (1+\varepsilon)\|\widetilde{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
\]
The last step uses $2ab \le a^2/\varepsilon + \varepsilon b^2$, which holds for all real $a$, $b$ and any $\varepsilon > 0$.
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big] + (1+\varepsilon)\mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
\]
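The final inequality of this chain rests on a geometric sum that the slide does not spell out. A short worked version of that routine step, written in the notation of the proof:

```latex
\[
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l
= \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k - 1}{\varepsilon}
\le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2},
\]
so that
\[
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l\,
\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2
\le (1+\varepsilon)^k\rho^k\,\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2}.
\]
```

This is exactly the first summand inside the parentheses of the bound in Theorem 10.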
Remark 4
For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[ (\widetilde{x}^k - A^\dagger b)^T(x^k - \widetilde{x}^k) = 0, \]
the inequality (5) becomes the equality
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence for REK:
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]
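The orthogonality used in Remark 4 is easy to confirm numerically for single-row updates with $\alpha = 1$. The sketch below is illustrative only; the random data and variable names are assumptions made for the example, and `z_k` is an arbitrary vector because the identity does not depend on how $z^k$ was produced.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 7, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star = np.linalg.pinv(A) @ b            # A^+ b
x_prev = A.T @ rng.standard_normal(m)     # x^{k-1} in range(A^T)
z_k = rng.standard_normal(m)              # any z^k works for this identity

max_inner = 0.0
max_gap = 0.0
for i in range(m):                        # s = m: each row block is a single row
    a = A[i, :]
    # Auxiliary iterate (alpha = 1): project x_prev - x_star off the direction a.
    x_tilde = x_prev - a * (a @ (x_prev - x_star)) / (a @ a)
    # REK update of x with the same row.
    x_new = x_prev - a * (a @ x_prev - b[i] + z_k[i]) / (a @ a)
    inner = (x_tilde - x_star) @ (x_new - x_tilde)
    gap = (np.linalg.norm(x_new - x_star) ** 2
           - np.linalg.norm(x_new - x_tilde) ** 2
           - np.linalg.norm(x_tilde - x_star) ** 2)
    max_inner = max(max_inner, abs(inner))
    max_gap = max(max_gap, abs(gap))
print(max_inner, max_gap)
```

Both printed quantities should be at rounding-error level: $x^k - \widetilde{x}^k$ is a multiple of the selected row, while $\widetilde{x}^k - A^\dagger b$ has had its component along that row removed.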
References
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&= \|Ax^{k-1}-b\|_2^2
 - 2\alpha\,(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2\,(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le \|Ax^{k-1}-b\|_2^2
 - 2\alpha\,(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2\,(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)
 \quad\text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2 \mid x^{k-1}\big]
&\le (1+\alpha^2)\|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&\le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1}-b\in\operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k-b\|_2^2 \mid x^{k-1}\big]\big]\\
&\le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1}-b\|_2^2\big]\\
&\le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.
\end{aligned}
\]
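The conditional expectation used for the case $t=n$ can be unpacked as follows. This is only a sketch: it assumes that the pair $(\mathcal{I},\mathcal{J})$ is drawn from the partition $\mathcal{P}$ with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2/\|A\|_F^2$, which is the sampling rule the displayed bounds are consistent with but which is not restated in this part of the notes.

```latex
\[
\mathbb{E}\left[\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right]
= \frac{1}{\|A\|_F^2}\sum_{\mathcal{J}} A_{:,\mathcal{J}}\sum_{\mathcal{I}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T
= \frac{1}{\|A\|_F^2}\sum_{\mathcal{J}} A_{:,\mathcal{J}}(A_{:,\mathcal{J}})^T
= \frac{AA^T}{\|A\|_F^2},
\]
\[
\mathbb{E}\left[\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right]
= \frac{1}{\|A\|_F^2}\left(\sum_{\mathcal{J}}\|A_{:,\mathcal{J}}\|_F^2\right)\left(\sum_{\mathcal{I}} I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T\right)
= I.
\]
```

The first identity produces the term with $AA^T/\|A\|_F^2$, and the second produces the factor $\alpha^2$ in $(1+\alpha^2)$.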
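If $t<n$, then it follows from
\[
(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^TA_{:,\mathcal{J}} \preceq \rho I
\]
(since $\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&\le \|Ax^{k-1}-b\|_2^2
 - 2\alpha\,(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2\,(Ax^{k-1}-b)^T\left(\frac{\rho\, I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le \|Ax^{k-1}-b\|_2^2
 - 2\alpha\,(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2\,(Ax^{k-1}-b)^T\left(\frac{\rho\, I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)
 \quad\text{(by Lemma 1)}.
\end{aligned}
\]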
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2 \mid x^{k-1}\big]
&\le \left(1+\frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2 - 2\alpha\,(Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&\le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}
\]
The last inequality follows from $Ax^{k-1}-b\in\operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k-b\|_2^2 \mid x^{k-1}\big]\big]\\
&\le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1}-b\|_2^2\big]\\
&\le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.
\end{aligned}
\]
Remark 3
Note that
\[
(A_{:,\mathcal{J}})^Tb = (A_{:,\mathcal{J}})^TAx^\star,
\]
where $x^\star$ is any solution of $A^TAx=A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t=n$ and $0<\alpha<2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k-Ax^\star\|_2^2\big] \le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.
\]
If $t<n$ and $0<\alpha<2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k-Ax^\star\|_2^2\big] \le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2,
\]
where
\[
\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).
\]
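To get a feel for the two step-size conditions in Remark 3, the contraction factors can be evaluated numerically. The sketch below is illustrative only: the helper name `dsbgs_factor`, the random test matrix, and the column partitions are assumptions, not part of the lecture. Each factor is a quadratic in $\alpha$, so it equals 1 at the upper end of the admissible interval and is smallest at the midpoint.

```python
import numpy as np

def dsbgs_factor(A, col_blocks, alpha):
    """Per-iteration factor from the bounds in Remark 3 (hypothetical helper)."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    sing = np.linalg.svd(A, compute_uv=False)
    sigma_r2 = sing[np.linalg.matrix_rank(A) - 1] ** 2   # smallest nonzero singular value, squared
    t, n = len(col_blocks), A.shape[1]
    if t == n:   # every column block is a single column
        return 1.0 + alpha**2 - 2.0 * alpha * sigma_r2 / fro2
    # rho = max_j sigma_1^2(A[:, J_j]); the matrix 2-norm is the largest singular value
    rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)
    return 1.0 - (2.0 * alpha * sigma_r2 - t * rho * alpha**2) / fro2

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 6))                  # assumed test matrix, full column rank
fro2 = np.linalg.norm(A, "fro") ** 2
sigma_r2 = np.linalg.svd(A, compute_uv=False)[-1] ** 2

single = [[j] for j in range(6)]                  # t = n
alpha_max_single = 2.0 * sigma_r2 / fro2
factor_single = dsbgs_factor(A, single, alpha_max_single / 2)

pairs = [[0, 1], [2, 3], [4, 5]]                  # t = 3 < n
rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in pairs)
alpha_max_pairs = 2.0 * sigma_r2 / (3 * rho)
factor_pairs = dsbgs_factor(A, pairs, alpha_max_pairs / 2)

print(factor_single, factor_pairs)
```

Both printed factors lie strictly between 0 and 1, while evaluating `dsbgs_factor` at `alpha_max_single` or `alpha_max_pairs` returns 1 up to rounding, which is why the intervals in Remark 3 are open.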
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1,\mathcal{I}_2,\dots,\mathcal{I}_s\}$ and $\{\mathcal{J}_1,\mathcal{J}_2,\dots,\mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha>0$. Initialize $z^0\in b+\operatorname{range}(A)$ and $x^0\in\operatorname{range}(A^T)$.
for $k=1,2,\dots$ do
    Pick $j\in[t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\,A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^Tz^{k-1}$
    Pick $i\in[s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2/\|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T\big(A_{\mathcal{I}_i,:}x^{k-1}-b_{\mathcal{I}_i}+z^k_{\mathcal{I}_i}\big)$
By choosing $s=m$, $t=n$, and $\alpha=1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
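The two updates of Algorithm 2 translate almost line by line into NumPy. The following is a minimal sketch rather than a reference implementation: the function name `rebk`, the fixed iteration count, the choices $z^0=b$ and $x^0=0$ (which satisfy the initialization conditions), and the equal-size partitions built with `np.array_split` are all assumptions made for illustration.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=10000, seed=0):
    """Sketch of REBK (Algorithm 2); returns the last iterate x^k."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Squared Frobenius norms of the column blocks and row blocks.
    col_norms2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_norms2 = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    # Both sums equal ||A||_F^2, so these are the sampling probabilities of Algorithm 2.
    p_col = col_norms2 / col_norms2.sum()
    p_row = row_norms2 / row_norms2.sum()
    z = b.astype(float).copy()   # z^0 = b lies in b + range(A)
    x = np.zeros(n)              # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: drive z toward b_perp = (I - A A^+) b.
        j = rng.choice(len(col_blocks), p=p_col)
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_norms2[j]) * (AJ @ (AJ.T @ z))
        # Row step: block Kaczmarz-type update on the corrected right-hand side b - z.
        i = rng.choice(len(row_blocks), p=p_row)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_norms2[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)                      # generic b, so Ax = b is inconsistent
row_blocks = np.array_split(np.arange(30), 6)    # s = 6 row blocks (assumed partition)
col_blocks = np.array_split(np.arange(10), 5)    # t = 5 column blocks (assumed partition)
x = rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=10000)
err = np.linalg.norm(x - np.linalg.pinv(A) @ b)
print(err)
```

Note that no pseudoinverse or linear solve appears inside the loop; `np.linalg.pinv` is used only afterwards to measure the distance to $A^\dagger b$.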
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}\big[\,\cdot \mid j_1,i_1,j_2,i_2,\dots,j_{k-1},i_{k-1}\big],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}\big[\,\cdot \mid j_1,i_1,j_2,i_2,\dots,j_{k-1},i_{k-1},j_k\big].
\]
Then, by the law of total expectation, we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with $z^0\in b+\operatorname{range}(A)$ and $x^0\in\operatorname{range}(A^T)$. It holds
\[
\big\|\mathbb{E}\big[x^k-A^\dagger b\big]\big\|_2 \le \delta^k\left(\|x^0-A^\dagger b\|_2+\frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),
\]
where
\[
\delta=\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.
\]
Proof. By $A^Tb=A^TAA^\dagger b$ and
\[
x^k-A^\dagger b = x^{k-1}-A^\dagger b-\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T\big(A_{\mathcal{I}_i,:}x^{k-1}-b_{\mathcal{I}_i}+z^k_{\mathcal{I}_i}\big),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[x^k-A^\dagger b\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[x^k-A^\dagger b\big]\big]\\
&= x^{k-1}-A^\dagger b-\mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1}-b+z^k)}{\|A\|_F^2}\right]\\
&= x^{k-1}-A^\dagger b-\alpha\frac{A^TAx^{k-1}-A^Tb}{\|A\|_F^2}-\alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}\big[z^k\big]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}\big[z^k\big]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\alpha\frac{A^T}{\|A\|_F^2}\left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1}\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[x^k-A^\dagger b\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[x^k-A^\dagger b\big]\big]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}\big[x^{k-1}-A^\dagger b\big]-\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}\big[z^{k-1}\big]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}\big[x^{k-1}-A^\dagger b\big]-\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}\big[x^{k-2}-A^\dagger b\big]-2\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&= \cdots\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)-\alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
\]
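The step that replaces $\mathbb{E}[z^{k-1}]$ by a power acting on $z^0$ is stated without comment. A short justification, using only the column update of REBK and its sampling probabilities, might read as follows.

```latex
\[
\mathbb{E}_{k-1}\big[z^k\big]
= \sum_{j=1}^{t}\frac{\|A_{:,\mathcal{J}_j}\|_F^2}{\|A\|_F^2}
  \left(z^{k-1}-\frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^Tz^{k-1}\right)
= \left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1},
\qquad\text{so}\qquad
\mathbb{E}\big[z^{k-1}\big]=\left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)^{k-1}z^0.
\]
\[
A^T\left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)A^T
\quad\Longrightarrow\quad
\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,\mathbb{E}\big[z^{k-1}\big]
=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}.
\]
```

The first line uses $\sum_j A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T=AA^T$, and the second moves $A^T$ through each of the $k-1$ factors in turn.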
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\big\|\mathbb{E}\big[x^k-A^\dagger b\big]\big\|_2
&= \left\|\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)-\alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \left\|\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)\right\|_2
 +\left\|\alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \delta^k\left(\|x^0-A^\dagger b\|_2+\frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0-A^\dagger b\in\operatorname{range}(A^T),\qquad A^Tz^0\in\operatorname{range}(A^T),
\]
and Lemma 3.
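The closed form derived in this proof can be checked numerically without any sampling, because averaging one REBK step over the block choices gives linear recursions for $\mathbb{E}[z^k]$ and $\mathbb{E}[x^k]$. The sketch below assumes a small random full-column-rank matrix, $z^0=b$, $x^0=0$, and $\alpha=1$; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))      # assumed test matrix (full column rank)
b = rng.standard_normal(8)
alpha, k = 1.0, 25
fro2 = np.linalg.norm(A, "fro") ** 2
x_star = np.linalg.pinv(A) @ b       # pseudoinverse solution A^+ b

# Mean recursions: E[z^k] = (I - alpha A A^T/||A||_F^2) E[z^{k-1}], and
# E[x^k] = E[x^{k-1}] - alpha A^T (A E[x^{k-1}] - b + E[z^k]) / ||A||_F^2.
Ez, Ex = b.copy(), np.zeros(5)       # z^0 = b, x^0 = 0
for _ in range(k):
    Ez = Ez - alpha * (A @ (A.T @ Ez)) / fro2
    Ex = Ex - alpha * (A.T @ (A @ Ex - b + Ez)) / fro2

# Closed form from the proof of Theorem 8.
M = np.eye(5) - alpha * (A.T @ A) / fro2
Mk = np.linalg.matrix_power(M, k)
closed = Mk @ (np.zeros(5) - x_star) - alpha * k * (Mk @ (A.T @ b)) / fro2

# Bound of Theorem 8; all five singular values are nonzero here, so r = 5.
sigma = np.linalg.svd(A, compute_uv=False)
delta = np.max(np.abs(1.0 - alpha * sigma**2 / fro2))
bound = delta**k * (np.linalg.norm(x_star) + alpha * k * np.linalg.norm(A.T @ b) / fro2)

lhs = np.linalg.norm(Ex - x_star)
print(lhs, bound)
```

The recursion and the closed form agree to rounding error, and `lhs` does not exceed `bound`, as Theorem 8 predicts for this instance.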
4.2 Convergence of $\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]$
The convergence of $\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
\[
\rho = 1-\frac{(2\alpha-\alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0\in b+\operatorname{range}(A)$ converges to
\[
b_\perp=(I-AA^\dagger)b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz=0\}$.
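The claim that $b_\perp$ is the projection of every admissible $z^0$ onto $\{z \mid A^Tz=0\}$ is easy to confirm numerically. The following sketch uses an arbitrary random matrix and an arbitrary point of $b+\operatorname{range}(A)$; the names `P`, `b_perp`, and `proj_z0` are introduced here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))          # assumed test data
b = rng.standard_normal(6)

P = A @ np.linalg.pinv(A)                # A A^+: orthogonal projector onto range(A)
b_perp = b - P @ b                       # b_perp = (I - A A^+) b

z0 = b + A @ rng.standard_normal(3)      # an arbitrary point of b + range(A)
proj_z0 = z0 - P @ z0                    # projection of z0 onto {z : A^T z = 0}

print(np.linalg.norm(A.T @ b_perp))      # numerically zero
print(np.linalg.norm(proj_z0 - b_perp))  # numerically zero
```

The second printed value is zero up to rounding because $z^0-b\in\operatorname{range}(A)$ is annihilated by $I-AA^\dagger$.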
Lemma 9
For any given consistent or inconsistent linear system $Ax=b$, let $z^k$ be the vector generated in REBK with $z^0\in b+\operatorname{range}(A)$. It holds
\[
\mathbb{E}\big[\|z^k-b_\perp\|_2^2\big]\le\rho^k\|z^0-b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp=0$, we have
\[
z^k-b_\perp = z^{k-1}-b_\perp-\frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp). \tag{1}
\]
By $z^0-b_\perp=AA^\dagger z^0\in\operatorname{range}(A)$, we can show by induction that
\[
z^k-b_\perp\in\operatorname{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k-b_\perp\|_2^2
&= \|z^{k-1}-b_\perp\|_2^2-2\alpha\frac{\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}\\
&\quad+\frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1}-b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\\
&\le \|z^{k-1}-b_\perp\|_2^2-(2\alpha-\alpha^2)\frac{\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}
 \quad\text{(by Lemma 1)}.
\end{aligned}
\]
Taking the expectation conditioned on the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k-b_\perp\|_2^2\big]
&\le \|z^{k-1}-b_\perp\|_2^2-(2\alpha-\alpha^2)\frac{\|A^T(z^{k-1}-b_\perp)\|_2^2}{\|A\|_F^2}\\
&\le \rho\|z^{k-1}-b_\perp\|_2^2 \quad\text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|z^k-b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k-b_\perp\|_2^2\big]\big]\\
&\le \rho\,\mathbb{E}\big[\|z^{k-1}-b_\perp\|_2^2\big]\\
&\le \rho^k\|z^0-b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
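The one-step contraction at the heart of Lemma 9 can be verified exactly by summing over all column blocks instead of sampling. The sketch below is an illustration under assumed data: a random full-column-rank matrix, three equal column blocks, $\alpha=0.8$, and $z^{k-1}=b$; the helper name `expected_next_sq_error` is hypothetical.

```python
import numpy as np

def expected_next_sq_error(A, col_blocks, z, b_perp, alpha):
    """Exact E_{k-1}||z^k - b_perp||^2 for one REBK column step from z^{k-1} = z."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    total = 0.0
    for J in col_blocks:
        AJ = A[:, J]
        nJ2 = np.linalg.norm(AJ, "fro") ** 2
        z_next = z - (alpha / nJ2) * (AJ @ (AJ.T @ z))
        # Weight each outcome by its sampling probability ||A_{:,J}||_F^2 / ||A||_F^2.
        total += (nJ2 / fro2) * np.sum((z_next - b_perp) ** 2)
    return total

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 6))                 # assumed test matrix, full column rank
b = rng.standard_normal(10)
alpha = 0.8
col_blocks = np.array_split(np.arange(6), 3)     # assumed partition of the columns

b_perp = b - A @ (np.linalg.pinv(A) @ b)
fro2 = np.linalg.norm(A, "fro") ** 2
sigma_r2 = np.linalg.svd(A, compute_uv=False)[-1] ** 2
rho = 1.0 - (2.0 * alpha - alpha**2) * sigma_r2 / fro2

z = b.copy()                                     # z - b_perp = A A^+ b lies in range(A)
lhs = expected_next_sq_error(A, col_blocks, z, b_perp, alpha)
rhs = rho * np.sum((z - b_perp) ** 2)
print(lhs, rhs)
```

Here `lhs` is no larger than `rhs`. The gap between the two reflects the slack in Lemma 1 and Lemma 2, so the actual decay is typically faster than $\rho^k$.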
Theorem 10
For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with $z^0\in b+\operatorname{range}(A)$ and $x^0\in\operatorname{range}(A^T)$. For any $\varepsilon>0$, it holds
\[
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]\le(1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).
\]
Proof. Let
\[
\widetilde{x}^k = x^{k-1}-\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax=AA^\dagger b$ from $x^{k-1}$.
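It follows from
\[
x^k-\widetilde{x}^k=\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T\big(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\big)
\]
that
\[
\begin{aligned}
\|x^k-\widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\big(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\big)^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T\big(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\big)\\
&\le \frac{\alpha^2\|b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}
 \quad\text{(by Lemma 1)}. \tag{2}
\end{aligned}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k-\widetilde{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[\|x^k-\widetilde{x}^k\|_2^2\big]\big]\\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b-AA^\dagger b-z^k\|_2^2}{\|A\|_F^2}\right]
 \quad\text{(by (2))}
\end{aligned}
\]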
that
\[
\begin{aligned}
\mathbb{E}\big[\|x^k-\widetilde{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}\big[\|b-AA^\dagger b-z^k\|_2^2\big]\\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2 \quad\text{(by Lemma 9)}. \tag{3}
\end{aligned}
\]
By $x^0\in\operatorname{range}(A^T)$ and $A^\dagger b\in\operatorname{range}(A^T)$, we have
\[
x^0-A^\dagger b\in\operatorname{range}(A^T).
\]
Then we can show that $x^k-A^\dagger b\in\operatorname{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\widetilde{x}^k-A^\dagger b\|_2^2
&= \|x^{k-1}-A^\dagger b\|_2^2-2\alpha\frac{\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\\
&\quad+\frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1}-A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\\
&\le \|x^{k-1}-A^\dagger b\|_2^2-(2\alpha-\alpha^2)\frac{\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}
 \quad\text{(by Lemma 1)},
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widetilde{x}^k-A^\dagger b\|_2^2\big]
&\le \|x^{k-1}-A^\dagger b\|_2^2-(2\alpha-\alpha^2)\frac{\|A(x^{k-1}-A^\dagger b)\|_2^2}{\|A\|_F^2}\\
&\le \rho\|x^{k-1}-A^\dagger b\|_2^2 \quad\text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[
\mathbb{E}\big[\|\widetilde{x}^k-A^\dagger b\|_2^2\big]\le\rho\,\mathbb{E}\big[\|x^{k-1}-A^\dagger b\|_2^2\big]. \tag{4}
\]
Note that for any $\varepsilon>0$, we have
\[
\begin{aligned}
\|x^k-A^\dagger b\|_2^2
&= \|x^k-\widetilde{x}^k+\widetilde{x}^k-A^\dagger b\|_2^2\\
&\le \big(\|x^k-\widetilde{x}^k\|_2+\|\widetilde{x}^k-A^\dagger b\|_2\big)^2\\
&\le \|x^k-\widetilde{x}^k\|_2^2+\|\widetilde{x}^k-A^\dagger b\|_2^2+2\|x^k-\widetilde{x}^k\|_2\|\widetilde{x}^k-A^\dagger b\|_2\\
&\le \left(1+\frac{1}{\varepsilon}\right)\|x^k-\widetilde{x}^k\|_2^2+(1+\varepsilon)\|\widetilde{x}^k-A^\dagger b\|_2^2. \tag{5}
\end{aligned}
\]
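The last step of (5) is a weighted form of the elementary inequality $2ab\le a^2+b^2$. A short derivation, with $a$ and $b$ introduced here only as shorthand for the two norms:

```latex
\[
a=\|x^k-\widetilde{x}^k\|_2,\qquad b=\|\widetilde{x}^k-A^\dagger b\|_2,\qquad
0\le\left(\frac{a}{\sqrt{\varepsilon}}-\sqrt{\varepsilon}\,b\right)^2
=\frac{a^2}{\varepsilon}-2ab+\varepsilon b^2
\quad\Longrightarrow\quad
2ab\le\frac{a^2}{\varepsilon}+\varepsilon b^2,
\]
\[
a^2+b^2+2ab\le\left(1+\frac{1}{\varepsilon}\right)a^2+(1+\varepsilon)b^2 .
\]
```

A small $\varepsilon$ keeps the factor in front of $\|\widetilde{x}^k-A^\dagger b\|_2^2$ close to 1 at the price of a large factor in front of $\|x^k-\widetilde{x}^k\|_2^2$.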
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]
&\le \left(1+\frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k-\widetilde{x}^k\|_2^2\big]+(1+\varepsilon)\mathbb{E}\big[\|\widetilde{x}^k-A^\dagger b\|_2^2\big]\\
&\le \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1}-A^\dagger b\|_2^2\big]\\
&\le \left(1+\frac{1}{\varepsilon}\right)(1+1+\varepsilon)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2}-A^\dagger b\|_2^2\big]\\
&\le \cdots\\
&\le \left(1+\frac{1}{\varepsilon}\right)\big(1+(1+\varepsilon)+\cdots+(1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)^k\rho^k\|x^0-A^\dagger b\|_2^2\\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).
\end{aligned}
\]
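The final inequality in the chain above rests on summing a geometric series. Spelled out as a worked step:

```latex
\[
1+(1+\varepsilon)+\cdots+(1+\varepsilon)^{k-1}
=\frac{(1+\varepsilon)^k-1}{\varepsilon}
\le\frac{(1+\varepsilon)^k}{\varepsilon},
\qquad\text{hence}\qquad
\left(1+\frac{1}{\varepsilon}\right)\big(1+(1+\varepsilon)+\cdots+(1+\varepsilon)^{k-1}\big)
\le\frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k}{\varepsilon}
=\frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}.
\]
```

The resulting bound decays linearly only when $(1+\varepsilon)\rho<1$, that is, when $\varepsilon<1/\rho-1$, so in practice $\varepsilon$ would be chosen inside that range.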
Remark 4
For the case of REBK with $s=m$, $t=n$, and $\alpha=1$ (i.e., REK), by the orthogonality
\[
(\widetilde{x}^k-A^\dagger b)^T(x^k-\widetilde{x}^k)=0,
\]
the equation (5) becomes
\[
\|x^k-A^\dagger b\|_2^2=\|x^k-\widetilde{x}^k\|_2^2+\|\widetilde{x}^k-A^\dagger b\|_2^2,
\]
which yields the following convergence for REK:
\[
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]\le\rho^k\left(\frac{k\|z^0-b_\perp\|_2^2}{\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).
\]
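Both claims in Remark 4 can be traced in a few lines. In the sketch below, $a_k$ and $c$ are shorthand introduced here, and $A_{i,:}$ denotes the single row selected at step $k$ (each row block is one row when $s=m$).

```latex
% Orthogonality: with alpha = 1, the row step is an exact projection.
\[
A_{i,:}(\widetilde{x}^k-A^\dagger b)
=A_{i,:}(x^{k-1}-A^\dagger b)-\frac{A_{i,:}(A_{i,:})^T}{\|A_{i,:}\|_2^2}\,A_{i,:}(x^{k-1}-A^\dagger b)=0,
\qquad
x^k-\widetilde{x}^k\in\operatorname{span}\{(A_{i,:})^T\}.
\]
% Recursion: combine the equality version of (5) with (3) and (4) for alpha = 1.
\[
a_k:=\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big],\qquad
c:=\frac{\|z^0-b_\perp\|_2^2}{\|A\|_F^2},\qquad
a_k\le c\rho^k+\rho\,a_{k-1}
\le 2c\rho^k+\rho^2a_{k-2}\le\cdots\le kc\rho^k+\rho^ka_0 .
\]
```

Because the cross term vanishes, no factor $(1+\varepsilon)^k$ appears, and the price is only the linear factor $k$ in front of $\|z^0-b_\perp\|_2^2$.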
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43
If t lt n then it follows from
(IJ )TATAIJ = AT
JAJ ≼ ρI
(since ρ = max1lejlet σ21(AJj )) that
983042Axk minus b98304222
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)
T
983042AIJ 9830424F
983062(Axkminus1 minus b)
le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b)
+α2(Axkminus1 minus b)T983061ρII(II)
T
983042AIJ 9830422F
983062(Axkminus1 minus b) (by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43
Taking expectation gives
E[983042Axk minus b98304222 |xkminus1]
le9830611 +
tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T
983061AAT
983042A9830422F
983062(Axkminus1 minus b)
le9830611minus 2ασ2
r(A)minus tρα2
983042A9830422F
983062983042Axkminus1 minus b98304222
The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives
E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062E[983042Axkminus1 minus b98304222]
le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do
Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α
983042AJj9830422FAJj (AJj )
Tzkminus1
Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43
Let Ekminus1
983045middot983046denote the conditional expectation conditioned on the first
k minus 1 iterations of REBK That is
Ekminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1
983046
where jl is the lth column block chosen and il is the lth row blockchosen
We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as
Eikminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk
983046
Then by the law of total expectation we have
Ekminus1
983045middot983046= Ekminus1
983045Eikminus1
983045middot983046983046
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43
Proof By ATb = ATAAdaggerb and
xk minusAdaggerb = xkminus1 minusAdaggerbminus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
we have
Ekminus1
983045xk minusAdaggerb
983046= Ekminus1
983045Eikminus1
983045xk minusAdaggerb
983046983046
= xkminus1 minusAdaggerbminus Ekminus1
983063αAT(Axkminus1 minus b+ zk)
983042A9830422F
983064
= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb
983042A9830422Fminus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422F
983061Iminus α
AAT
983042A9830422F
983062zkminus1
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422Fzkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43
Taking expectation gives
E983045xk minusAdaggerb
983046
= E983045Ekminus1
983045xk minusAdaggerb
983046983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422FE983045zkminus1
983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
=
983061Iminus α
ATA
983042A9830422F
9830622
E983045xkminus2 minusAdaggerb
983046minus 2α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F= middot middot middot
=
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
42 Convergence of E983045983042xk minusAdaggerb98304222
983046
The convergence ofE983045983042xk minusAdaggerb98304222
983046
depends on the positive number ρ defined as
ρ = 1minus (2αminus α2)σ2r (A)
983042A9830422F
In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to
bperp = (IminusAAdagger)b
which is the orthogonal projection of z0 onto the set z | ATz = 0
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43
Lemma 9
For any given consistent or inconsistent linear system
Ax = b
let zk be the vector generated in REBK with
z0 isin b+ range(A)
It holdsE983045983042zk minus bperp98304222
983046le ρk983042z0 minus bperp98304222
Proof By (AJj )Tbperp = 0 we have
zk minus bperp = zkminus1 minus bperp minus α
983042AJj9830422FAJj (AJj )
T(zkminus1 minus bperp) (1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43
By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that
zk minus bperp isin range(A)
It follows from (1) that
983042zk minus bperp98304222
= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
+α2
983042AJj9830424F(zkminus1 minus bperp)
TAJj (AJj )TAJj (AJj )
T(zkminus1 minus bperp)
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43
Taking the conditioned expectation on the first k minus 1 iterations yields
Ekminus1
983045983042zk minus bperp98304222
983046
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222
983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)
Taking expectation again gives
E983045983042zk minus bperp98304222
983046= E
983045Ekminus1
983045983042zk minus bperp98304222
983046983046
le ρE983045983042zkminus1 minus bperp98304222
983046
le ρk983042z0 minus bperp98304222
This completes the proof
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 37 43
Theorem 10
For any given consistent or inconsistent linear system Ax = b let xk
be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
For any ε gt 0 it holds
E983045983042xk minusAdaggerb98304222
983046le (1 + ε)kρk
983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Proof Let
983141xk = xkminus1 minus α
983042AIi9830422F(AIi)
TAIi(xkminus1 minusAdaggerb)
which is actually one DSBGS update for the linear systemAx = AAdaggerb from xkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43
It follows from
xk minus 983141xk =α
983042AIi9830422F(AIi)
T(bIi minusAIiAdaggerbminus zkIi)
that
983042xk minus 983141xk98304222
=α2
983042AIi9830424F(bIi minusAIiA
daggerbminus zkIi)TAIi(AIi)
T(bIi minusAIiAdaggerbminus zkIi
)
leα2983042bIi minusAIiA
daggerbminus zkIi98304222
983042AIi9830422F (by Lemma 1) (2)
It follows from
Ekminus1
983045983042xk minus 983141xk98304222
983046= Ekminus1
983045Eikminus1
983045983042xk minus 983141xk98304222
983046983046
le Ekminus1
983063α2983042bminusAAdaggerbminus zk98304222
983042A9830422F
983064(by (2))
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43
that
E983045983042xk minus 983141xk98304222
983046le α2
983042A9830422FE983045983042bminusAAdaggerbminus zk98304222
983046
le α2ρk
983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)
By x0 isin range(AT) and Adaggerb isin range(AT) we have
x0 minusAdaggerb isin range(AT)
Then we can show that xk minusAdaggerb isin range(AT) by induction By
983042983141xk minusAdaggerb98304222
= 983042xkminus1 minusAdaggerb98304222 minus2α983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
+α2
983042AIi9830424F(xkminus1 minusAdaggerb)T(AIi)
TAIi(AIi)TAIi(x
kminus1 minusAdaggerb)
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042AIi(x
kminus1 minusAdaggerb)98304222983042AIi9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 40 43
we have
Ekminus1
983045983042983141xk minusAdaggerb98304222
983046
le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042A(xkminus1 minusAdaggerb)98304222
983042A9830422Fle ρ983042xkminus1 minusAdaggerb98304222 (by Lemma 3)
which yields
E983045983042983141xk minusAdaggerb98304222
983046le ρE
983045983042xkminus1 minusAdaggerb98304222
983046 (4)
Note that for any ε gt 0 we have
983042xk minusAdaggerb98304222= 983042xk minus 983141xk + 983141xk minusAdaggerb98304222le (983042xk minus 983141xk9830422 + 983042983141xk minusAdaggerb9830422)2
le 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222 + 2983042xk minus 983141xk9830422983042983141xk minusAdaggerb9830422
le9830611 +
1
ε
983062983042xk minus 983141xk98304222 + (1 + ε)983042983141xk minusAdaggerb98304222 (5)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 41 43
Combining (3) (4) and (5) yields
E983045983042xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062E983045983042xk minus 983141xk98304222
983046+ (1 + ε)E
983045983042983141xk minusAdaggerb98304222
983046
le9830611 +
1
ε
983062α2ρk
983042A9830422F983042z0 minus bperp98304222 + (1 + ε)ρE
983045983042xkminus1 minusAdaggerb98304222
983046
le9830611 +
1
ε
983062(1 + 1 + ε)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)2ρ2E983045983042xkminus2 minusAdaggerb98304222
983046
le middot middot middot
le9830611 +
1
ε
983062(1 + 1 + ε+ middot middot middot+ (1 + ε)kminus1)α2ρk
983042A9830422F983042z0 minus bperp98304222
+(1 + ε)kρk983042x0 minusAdaggerb98304222
le (1 + ε)kρk983061(1 + ε)α2983042z0 minus bperp98304222
ε2983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 42 43
Remark 4
For the case REBK with s = m t = n and α = 1 (ie REK) by theorthogonality
(983141xk minusAdaggerb)T(xk minus 983141xk) = 0
the equation (5) becomes
983042xk minusAdaggerb98304222 = 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222
which yields the following convergence for REK
E983045983042xk minusAdaggerb98304222
983046le ρk
983061k983042z0 minus bperp98304222
983042A9830422F+ 983042x0 minusAdaggerb98304222
983062
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
Kui Du Wutao Si and Xiaohui Sun
Pseudoinverse-free randomized extended block kaczmarz for solving least squaresarXiv preprint arXiv200104179 2020
Kui Du and Xiaohui Sun
A doubly stochastic block Gauss-Seidel algorithm for solving linear equationsarXiv preprint arXiv191213291 2019
Lawrence H Landweber
An iteration formula for Fredholm integral equations of the first kindAmer J Math 73615ndash624 1951
D Leventhal and A S Lewis
Randomized methods for linear constraints convergence rates and conditioningMath Oper Res 35(3)641ndash654 2010
Anna Ma Deanna Needell and Aaditya Ramdas
Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015
Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo
A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019
Thomas Strohmer and Roman Vershynin
A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009
Anastasios Zouzias and Nikolaos M Freris
Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43
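If $t < n$, then it follows from
$$(I_{:,\mathcal{J}})^T A^T A\, I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I$$
(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that
$$\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A\, I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A\, I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}$$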
Taking expectation gives
$$\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{AA^T}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}$$
The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives
$$\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}$$
Remark 3
Note that
$$(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,$$
where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then
$$\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$
If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then
$$\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$
where
$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$
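The quantities in the $t < n$ case of Remark 3 are easy to evaluate numerically. The following is a minimal sketch, assuming NumPy is available and that the column partition is given as a list of index arrays; the function name `dsbgs_step_bounds` is hypothetical and is not part of the lecture.

```python
import numpy as np

def dsbgs_step_bounds(A, col_blocks):
    """Return (rho, alpha_max, factor) for the t < n case of Remark 3.

    rho       = max_j sigma_1^2(A[:, J_j])
    alpha_max = 2 * sigma_r^2(A) / (t * rho)   (upper end of the step-size range)
    factor(a) = 1 - (2*a*sigma_r^2(A) - t*rho*a^2) / ||A||_F^2
    """
    fro2 = np.linalg.norm(A, "fro") ** 2
    s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
    r = np.linalg.matrix_rank(A)
    sigma_r2 = s[r - 1] ** 2                    # smallest nonzero singular value, squared
    t = len(col_blocks)
    # The matrix 2-norm is the largest singular value of each column block.
    rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)
    alpha_max = 2.0 * sigma_r2 / (t * rho)

    def factor(alpha):
        return 1.0 - (2.0 * alpha * sigma_r2 - t * rho * alpha ** 2) / fro2

    return rho, alpha_max, factor

# Illustrative data (an assumption, not from the lecture).
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))
cols = np.array_split(np.arange(6), 3)
rho, alpha_max, factor = dsbgs_step_bounds(A, cols)
print(rho, alpha_max, factor(alpha_max / 2))
```

The contraction factor is smallest at $\alpha = \sigma_r^2(A)/(t\rho)$, the midpoint of the admissible range, and equals 1 at both endpoints.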
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$
By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
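Algorithm 2 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' reference implementation: the function name `rebk`, the fixed iteration count, the seed, and the particular initialization $z^0 = b$, $x^0 = 0$ (which satisfies $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$) are all assumptions made here.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=1000, seed=0):
    """Sketch of REBK (Algorithm 2); returns the final iterate x^k.

    row_blocks / col_blocks are lists of index arrays partitioning [m] / [n].
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Squared Frobenius norms of every column block and row block.
    col_norm2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_norm2 = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    p_col = col_norm2 / col_norm2.sum()   # equals ||A_{:,J_j}||_F^2 / ||A||_F^2
    p_row = row_norm2 / row_norm2.sum()   # equals ||A_{I_i,:}||_F^2 / ||A||_F^2

    z = np.asarray(b, dtype=float).copy()  # z^0 = b (assumed choice)
    x = np.zeros(n)                        # x^0 = 0 (assumed choice)
    for _ in range(iters):
        # Column-block step: drives z towards b_perp.
        j = rng.choice(len(col_blocks), p=p_col)
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_norm2[j]) * (AJ @ (AJ.T @ z))
        # Row-block step on the corrected right-hand side b - z.
        i = rng.choice(len(row_blocks), p=p_row)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_norm2[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x

# Usage on a small inconsistent system (random data, purely illustrative).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
rows = np.array_split(np.arange(20), 4)
cols = np.array_split(np.arange(5), 2)
x = rebk(A, b, rows, cols, alpha=1.0, iters=5000)
print(np.linalg.norm(x - np.linalg.pinv(A) @ b))
```

Note that only products with $A_{\mathcal{I}_i,:}$, $A_{:,\mathcal{J}_j}$ and their transposes are needed; no pseudoinverse of a block is formed inside the loop. The call to `np.linalg.pinv` appears only to measure the error in the usage example.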
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$
Then by the law of total expectation we have
$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].$$
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).$$
It holds
$$\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right),$$
where
$$\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right|.$$
Proof. By $A^T b = A^T A A^\dagger b$ and
$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big),$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}$$
Taking expectation gives
$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}$$
Applying the norms to both sides, we obtain
$$\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\| \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}$$
Here the last inequality follows from the facts that
$$x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^T z^0 \in \mathrm{range}(A^T),$$
and Lemma 3.
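Since the proof gives $\mathbb{E}[x^k - A^\dagger b]$ in closed form, both sides of the bound in Theorem 8 can be evaluated without any sampling. The sketch below does this with NumPy; the function name `theorem8_sides` and the random test data are assumptions introduced for illustration only.

```python
import numpy as np

def theorem8_sides(A, b, x0, z0, alpha, k):
    """Return (||E[x^k - A^+ b]||_2, right-hand side of Theorem 8).

    The mean error uses the closed form from the proof:
        M^k (x0 - A^+ b) - alpha * k * M^k A^T z0 / ||A||_F^2,
    with M = I - alpha * A^T A / ||A||_F^2.
    """
    fro2 = np.linalg.norm(A, "fro") ** 2
    x_star = np.linalg.pinv(A) @ b
    M = np.eye(A.shape[1]) - alpha * (A.T @ A) / fro2
    Mk = np.linalg.matrix_power(M, k)
    mean_err = Mk @ (x0 - x_star) - alpha * k * (Mk @ (A.T @ z0)) / fro2

    s = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)
    # delta = max over the r nonzero singular values of |1 - alpha*sigma_i^2/||A||_F^2|.
    delta = np.max(np.abs(1.0 - alpha * s[:r] ** 2 / fro2))
    rhs = delta ** k * (np.linalg.norm(x0 - x_star)
                        + alpha * k * np.linalg.norm(A.T @ z0) / fro2)
    return np.linalg.norm(mean_err), rhs

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)
x0 = A.T @ rng.standard_normal(8)        # x^0 in range(A^T)
z0 = b + A @ rng.standard_normal(4)      # z^0 in b + range(A)
for k in (1, 5, 20):
    print(k, *theorem8_sides(A, b, x0, z0, alpha=1.0, k=k))
```

The factor $k$ in front of $\|A^T z^0\|_2$ grows only linearly, so it is dominated by $\delta^k$ whenever $\delta < 1$.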
4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$
The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
$$b_\perp = (I - AA^\dagger) b,$$
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.
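The claim that $b_\perp$ is the projection of any admissible $z^0$ onto $\{z \mid A^Tz = 0\}$ can be checked numerically. The snippet below is a small sketch on random data (an assumption for illustration), using `np.linalg.pinv` to form the orthogonal projector $AA^\dagger$ onto $\mathrm{range}(A)$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

P = A @ np.linalg.pinv(A)            # orthogonal projector onto range(A)
b_perp = b - P @ b                   # b_perp = (I - A A^+) b

# Any z^0 in b + range(A) has the same projection onto null(A^T).
z0 = b + A @ rng.standard_normal(3)
proj_z0 = z0 - P @ z0

print(np.linalg.norm(A.T @ b_perp))      # ~ 0: b_perp satisfies A^T z = 0
print(np.linalg.norm(proj_z0 - b_perp))  # ~ 0: projection of z^0 equals b_perp
```

This is why the initialization only needs $z^0 \in b + \mathrm{range}(A)$: the component of $z^0$ in $\mathrm{range}(A)$ is what the column-block steps remove.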
Lemma 9
For any given consistent or inconsistent linear system
$$Ax = b,$$
let $z^k$ be the vector generated in REBK with
$$z^0 \in b + \mathrm{range}(A).$$
It holds
$$\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2.$$
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$ we have
$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}$$
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$ we can show by induction that
$$z^k - b_\perp \in \mathrm{range}(A).$$
It follows from (1) that
$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}$$
Taking the conditional expectation given the first $k-1$ iterations yields
$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}$$
Taking expectation again gives
$$\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\, \mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}$$
This completes the proof.
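The one-step inequality in the proof of Lemma 9 can be verified exactly, because the conditional expectation is a finite sum over the $t$ column blocks. The following sketch assumes NumPy; the helper names `expected_z_step` and `rho_factor` are hypothetical and the data are random.

```python
import numpy as np

def expected_z_step(A, e, col_blocks, alpha):
    """Exact E_{k-1} ||z^k - b_perp||_2^2 given e = z^{k-1} - b_perp.

    Sums over all column blocks J_j, weighted by ||A[:, J_j]||_F^2 / ||A||_F^2.
    """
    fro2 = np.linalg.norm(A, "fro") ** 2
    total = 0.0
    for J in col_blocks:
        AJ = A[:, J]
        nJ2 = np.linalg.norm(AJ, "fro") ** 2
        e_new = e - (alpha / nJ2) * (AJ @ (AJ.T @ e))   # update (1)
        total += (nJ2 / fro2) * float(e_new @ e_new)
    return total

def rho_factor(A, alpha):
    """rho = 1 - (2*alpha - alpha^2) * sigma_r^2(A) / ||A||_F^2."""
    s = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)
    return 1.0 - (2.0 * alpha - alpha ** 2) * s[r - 1] ** 2 / np.linalg.norm(A, "fro") ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((9, 6))
cols = np.array_split(np.arange(6), 3)
e = A @ rng.standard_normal(6)           # e must lie in range(A)
for alpha in (0.5, 1.0, 1.5):
    lhs = expected_z_step(A, e, cols, alpha)
    rhs = rho_factor(A, alpha) * float(e @ e)
    print(alpha, lhs, rhs)
```

The requirement $e \in \mathrm{range}(A)$ matters: Lemma 2 is applied to $z^{k-1} - b_\perp$, which stays in $\mathrm{range}(A)$ by the induction argument in the proof.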
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).$$
For any $\varepsilon > 0$, it holds
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
Proof. Let
$$\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
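It follows from
$$x^k - \widetilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big)$$
that
$$\begin{aligned}
\|x^k - \widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big)^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\big) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. \tag{2}
\end{aligned}$$
It follows from
$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[\|x^k - \widetilde{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad \text{(by (2))}
\end{aligned}$$
that
$$\begin{aligned}
\mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \tag{3}
\end{aligned}$$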
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$ we have
$$x^0 - A^\dagger b \in \mathrm{range}(A^T).$$
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
$$\begin{aligned}
\|\widetilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}$$
which yields
$$\mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \le \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}$$
Note that for any $\varepsilon > 0$ we have
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widetilde{x}^k\|_2 \|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \widetilde{x}^k\|_2^2 + (1+\varepsilon) \|\widetilde{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}$$
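The last step of (5) is an instance of the weighted arithmetic-geometric mean (Young) inequality. As a short worked step, with the shorthand $p$ and $q$ for the two norms (these symbols are introduced here only for illustration):

```latex
% p and q are shorthand introduced for this step only:
%   p = \|x^k - \widetilde{x}^k\|_2, \qquad q = \|\widetilde{x}^k - A^\dagger b\|_2 .
\[
0 \le \left(\frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,q\right)^2
  = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\quad\Longrightarrow\quad
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2 ,
\]
\[
p^2 + q^2 + 2pq \le \left(1 + \frac{1}{\varepsilon}\right) p^2 + (1+\varepsilon)\, q^2 .
\]
```

A small $\varepsilon$ keeps the factor on the contracting term $\|\widetilde{x}^k - A^\dagger b\|_2^2$ close to 1 at the price of a large factor $1 + 1/\varepsilon$ on the perturbation term.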
Combining (3), (4) and (5) yields
$$\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big] + (1+\varepsilon)\, \mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) (1 + 1 + \varepsilon) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\, \mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + 1 + \varepsilon + \cdots + (1+\varepsilon)^{k-1}\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}$$
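The final inequality in the chain above rests on summing a geometric series. Written out as a worked step (the summation index $l$ is introduced here for illustration):

```latex
\[
\left(1 + \frac{1}{\varepsilon}\right) \sum_{l=0}^{k-1} (1+\varepsilon)^l
  = \frac{1+\varepsilon}{\varepsilon} \cdot \frac{(1+\varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
  = (1+\varepsilon)^k \cdot \frac{1+\varepsilon}{\varepsilon^2} .
\]
```

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives exactly the first term inside the parentheses of the bound in Theorem 10.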
Remark 4
For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\widetilde{x}^k - A^\dagger b)^T (x^k - \widetilde{x}^k) = 0,$$
the equation (5) becomes
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence for REK:
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left( \frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
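The orthogonality used in Remark 4 can be observed numerically for a single REK step (one row, $\alpha = 1$). The snippet below is a sketch on random data; the row index `i` and the vector `z` standing in for $z^k$ are arbitrary choices made for illustration, since the orthogonality does not depend on them.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 7, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star = np.linalg.pinv(A) @ b            # A^+ b

x_prev = A.T @ rng.standard_normal(m)     # some x^{k-1} in range(A^T)
z = rng.standard_normal(m)                # stand-in for z^k (arbitrary here)
i = 2                                     # chosen row (arbitrary here)
a = A[i]

# REK step (s = m, alpha = 1) and the auxiliary point x~^k from the proof.
x_new = x_prev - a * (a @ x_prev - b[i] + z[i]) / (a @ a)
x_tilde = x_prev - a * (a @ (x_prev - x_star)) / (a @ a)

inner = (x_tilde - x_star) @ (x_new - x_tilde)
lhs = np.linalg.norm(x_new - x_star) ** 2
rhs = np.linalg.norm(x_new - x_tilde) ** 2 + np.linalg.norm(x_tilde - x_star) ** 2
print(inner, lhs - rhs)                   # both ~ 0
```

Geometrically, $\widetilde{x}^k - A^\dagger b$ is the projection of $x^{k-1} - A^\dagger b$ onto the orthogonal complement of the selected row, while $x^k - \widetilde{x}^k$ is a multiple of that row, which is why the cross term in (5) vanishes.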
Remark 3
Note that(AJ )
Tb = (AJ )TAx983183
where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2
r (A)983042A9830422F then
E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2
r (A)
983042A9830422F
983062k
983042Ax0 minus b98304222
If t lt n and 0 lt α lt 2σ2r (A)(tρ) then
E[983042Axk minusAx98318398304222] le9830611minus 2ασ2
r (A)minus tρα2
983042A9830422F
983062k
983042Ax0 minus b98304222
whereρ = max
1lejletσ21(AJj )
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2 Randomized extended block Kaczmarz (REBK)
Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do
Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α
983042AJj9830422FAJj (AJj )
Tzkminus1
Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43
Let Ekminus1
983045middot983046denote the conditional expectation conditioned on the first
k minus 1 iterations of REBK That is
Ekminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1
983046
where jl is the lth column block chosen and il is the lth row blockchosen
We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as
Eikminus1
983045middot983046= E
983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk
983046
Then by the law of total expectation we have
Ekminus1
983045middot983046= Ekminus1
983045Eikminus1
983045middot983046983046
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43
Proof By ATb = ATAAdaggerb and
xk minusAdaggerb = xkminus1 minusAdaggerbminus α
983042AIi9830422F(AIi)
T(AIixkminus1 minus bIi + zkIi)
we have
Ekminus1
983045xk minusAdaggerb
983046= Ekminus1
983045Eikminus1
983045xk minusAdaggerb
983046983046
= xkminus1 minusAdaggerbminus Ekminus1
983063αAT(Axkminus1 minus b+ zk)
983042A9830422F
983064
= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb
983042A9830422Fminus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422FEkminus1
983045zk
983046
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
AT
983042A9830422F
983061Iminus α
AAT
983042A9830422F
983062zkminus1
=
983061Iminus α
ATA
983042A9830422F
983062(xkminus1 minusAdaggerb)minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422Fzkminus1
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43
Taking expectation gives
E983045xk minusAdaggerb
983046
= E983045Ekminus1
983045xk minusAdaggerb
983046983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062AT
983042A9830422FE983045zkminus1
983046
=
983061Iminus α
ATA
983042A9830422F
983062E983045xkminus1 minusAdaggerb
983046minus α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
=
983061Iminus α
ATA
983042A9830422F
9830622
E983045xkminus2 minusAdaggerb
983046minus 2α
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F= middot middot middot
=
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$
The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[
b_\perp = (I - AA^\dagger)b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.
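As an illustration of these two quantities, the sketch below computes ρ and b_⊥ for a small random problem and checks that b_⊥ is annihilated by A^T and is the projection of any z^0 in b + range(A). The test matrix, the step size α = 1, and the variable names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))      # assumed test matrix (full column rank generically)
b = rng.standard_normal(6)
alpha = 1.0

fro2 = np.linalg.norm(A, "fro") ** 2
# With full column rank, the smallest singular value is sigma_r(A).
sigma_r = np.linalg.svd(A, compute_uv=False)[-1]
rho = 1.0 - (2 * alpha - alpha**2) * sigma_r**2 / fro2

A_pinv = np.linalg.pinv(A)
b_perp = b - A @ (A_pinv @ b)        # (I - A A^+) b

# z0 = b + A y is an arbitrary point of b + range(A); projecting it onto
# {z : A^T z = 0} gives b_perp again.
z0 = b + A @ rng.standard_normal(3)
proj_z0 = z0 - A @ (A_pinv @ z0)
```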
Lemma 9
For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \mathrm{range}(A)$. It holds
\[
\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k\|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$ we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp). \tag{1}
\]
By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$ we can show by induction that
\[
z^k - b_\perp \in \mathrm{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1).}
\end{aligned}
\]
Taking the conditional expectation given the first $k - 1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2).}
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\bigr] \\
&\le \rho\,\mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
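The one-step contraction used in this proof can be verified exactly on a small example by summing over all column blocks instead of sampling. This is only a sketch: the test problem, the two-block column partition, and the variable names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))                      # assumed test matrix (full column rank generically)
b = rng.standard_normal(6)
alpha = 1.0
col_blocks = [np.array([0, 1]), np.array([2, 3])]    # hypothetical partition of the columns

fro2 = np.linalg.norm(A, "fro") ** 2
sigma_r = np.linalg.svd(A, compute_uv=False)[-1]     # smallest nonzero singular value here
rho = 1.0 - (2 * alpha - alpha**2) * sigma_r**2 / fro2
b_perp = b - A @ (np.linalg.pinv(A) @ b)

z_prev = b + A @ rng.standard_normal(4)              # any point of b + range(A)

# Exact conditional expectation: sum over blocks j of P(j) * ||z^k - b_perp||^2.
cond_exp = 0.0
for J in col_blocks:
    AJ = A[:, J]
    nJ2 = np.linalg.norm(AJ, "fro") ** 2
    z_new = z_prev - alpha / nJ2 * (AJ @ (AJ.T @ z_prev))
    cond_exp += (nJ2 / fro2) * np.linalg.norm(z_new - b_perp) ** 2

one_step_bound = rho * np.linalg.norm(z_prev - b_perp) ** 2
```

Here `cond_exp` plays the role of $\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]$ and should not exceed `one_step_bound`.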
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$. For any $\varepsilon > 0$, it holds
\[
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k\rho^k\Bigl(\frac{(1 + \varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\Bigr).
\]
Proof. Let
\[
\hat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
It follows from
\[
x^k - \hat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})
\]
that
\[
\begin{aligned}
\|x^k - \hat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1).}
\end{aligned} \tag{2}
\]
It follows from
\[
\mathbb{E}_{k-1}[\|x^k - \hat{x}^k\|_2^2] = \mathbb{E}_{k-1}\bigl[\mathbb{E}^i_{k-1}[\|x^k - \hat{x}^k\|_2^2]\bigr] \le \mathbb{E}_{k-1}\Bigl[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\Bigr] \quad \text{(by (2))}
\]
that
\[
\begin{aligned}
\mathbb{E}[\|x^k - \hat{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9).}
\end{aligned} \tag{3}
\]
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$ we have
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T).
\]
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|\hat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3),}
\end{aligned}
\]
which yields
\[
\mathbb{E}[\|\hat{x}^k - A^\dagger b\|_2^2] \le \rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4}
\]
Note that for any $\varepsilon > 0$ we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2 \le (\|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \hat{x}^k\|_2\|\hat{x}^k - A^\dagger b\|_2 \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\|x^k - \hat{x}^k\|_2^2 + (1 + \varepsilon)\|\hat{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
\]
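The last step of (5) rests on the elementary bound $2ab \le a^2/\varepsilon + \varepsilon b^2$ for $a, b \ge 0$. A quick numerical sanity check on random vectors, where `u` and `v` are stand-ins (assumed only for this example) for $x^k - \hat{x}^k$ and $\hat{x}^k - A^\dagger b$:

```python
import numpy as np

rng = np.random.default_rng(0)
results = []
for eps in (0.01, 0.5, 1.0, 10.0):
    for _ in range(100):
        u = rng.standard_normal(5)    # stand-in for x^k - xhat^k
        v = rng.standard_normal(5)    # stand-in for xhat^k - A^+ b
        lhs = np.linalg.norm(u + v) ** 2
        rhs = ((1 + 1 / eps) * np.linalg.norm(u) ** 2
               + (1 + eps) * np.linalg.norm(v) ** 2)
        results.append(lhs <= rhs + 1e-12)
```

A small ε keeps the factor on the second term close to 1 at the cost of a large factor 1 + 1/ε on the first, which is the trade-off that appears in the bound of Theorem 10.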
Combining (3), (4), and (5) yields
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\mathbb{E}[\|x^k - \hat{x}^k\|_2^2] + (1 + \varepsilon)\mathbb{E}[\|\hat{x}^k - A^\dagger b\|_2^2] \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\bigl(1 + (1 + \varepsilon)\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\bigl(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k\rho^k\Bigl(\frac{(1 + \varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\Bigr).
\end{aligned}
\]
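The final inequality in this chain is not spelled out on the slide; one way to see it is to sum the geometric series in $(1 + \varepsilon)$. The index $l$ below is introduced only for this worked step.

```latex
\[
\Bigl(1 + \frac{1}{\varepsilon}\Bigr)\sum_{l=0}^{k-1}(1 + \varepsilon)^l
= \frac{1 + \varepsilon}{\varepsilon}\cdot\frac{(1 + \varepsilon)^k - 1}{\varepsilon}
\le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}.
\]
```

Multiplying by $\alpha^2\rho^k\|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ and factoring out $(1 + \varepsilon)^k\rho^k$ gives the first term of the stated bound.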
Remark 4
For the case of REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\hat{x}^k - A^\dagger b)^T(x^k - \hat{x}^k) = 0,
\]
the inequality (5) becomes the equality
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence for REK:
\[
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k\Bigl(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\Bigr).
\]
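The orthogonality behind this remark is easy to confirm numerically for a single-row update with α = 1. In the sketch below the test problem, the row index `i`, and the random stand-in for $z^k$ are assumptions for illustration; the identity holds for any such choice.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))         # assumed test problem
b = rng.standard_normal(6)
x_star = np.linalg.pinv(A) @ b

x_prev = A.T @ rng.standard_normal(6)   # some x^{k-1} in range(A^T)
z_k = rng.standard_normal(6)            # stand-in for z^k; any vector works here
i = 2                                   # a single row (s = m), with alpha = 1
a = A[i, :]
na2 = a @ a

# xhat^k: one Kaczmarz step for Ax = A A^+ b; x^k: the REK step.
x_hat = x_prev - (a @ (x_prev - x_star)) / na2 * a
x_k = x_prev - (a @ x_prev - b[i] + z_k[i]) / na2 * a

inner = (x_hat - x_star) @ (x_k - x_hat)
pythagoras_gap = (np.linalg.norm(x_k - x_star) ** 2
                  - np.linalg.norm(x_k - x_hat) ** 2
                  - np.linalg.norm(x_hat - x_star) ** 2)
```

Both `inner` and `pythagoras_gap` should vanish up to rounding: $\hat{x}^k - A^\dagger b$ is orthogonal to the chosen row, while $x^k - \hat{x}^k$ is a multiple of it.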
[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
4 Randomized extended block Kaczmarz (REBK) [1]
Algorithm 2: Randomized extended block Kaczmarz (REBK)
Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$
By choosing $s = m$, $t = n$, and $\alpha = 1$ we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
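The steps of Algorithm 2 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `rebk`, its parameters, the fixed iteration count, the choices $z^0 = b$ and $x^0 = 0$, and the random test problem are all assumptions made for this example, and indices are 0-based.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=10000, seed=0):
    """Sketch of REBK; returns the final iterate x^k.

    row_blocks / col_blocks are lists of index arrays that partition
    the rows and the columns of A (0-based, unlike [m] and [n] in the text).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    fro2 = np.linalg.norm(A, "fro") ** 2
    # Squared Frobenius norms of the row and column blocks.
    row_n2 = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_n2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    z = b.astype(float)      # z^0 = b lies in b + range(A)
    x = np.zeros(n)          # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step on z.
        j = rng.choice(len(col_blocks), p=col_n2 / fro2)
        AJ = A[:, col_blocks[j]]
        z = z - alpha / col_n2[j] * (AJ @ (AJ.T @ z))
        # Row-block step on x, using the freshly updated z.
        i = rng.choice(len(row_blocks), p=row_n2 / fro2)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - alpha / row_n2[i] * (AI.T @ (AI @ x - b[I] + z[I]))
    return x

# Small, generically inconsistent test problem (assumed for illustration).
rng = np.random.default_rng(1)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)
rows = np.array_split(np.arange(12), 4)   # s = 4 row blocks
cols = np.array_split(np.arange(4), 2)    # t = 2 column blocks
x = rebk(A, b, rows, cols, alpha=1.0)
err = np.linalg.norm(x - np.linalg.pinv(A) @ b)
```

Note that no pseudoinverse of any block is formed inside the loop; `np.linalg.pinv` is used only afterwards to measure the distance to $A^\dagger b$.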
Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.
We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]
Then by the law of total expectation we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\bigl[\mathbb{E}^i_{k-1}[\cdot]\bigr].
\]
4.1 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$. It holds
\[
\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\Bigl(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^T z^0\|_2}{\|A\|_F^2}\Bigr),
\]
where
\[
\delta = \max_{1 \le i \le r}\Bigl|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\Bigr|.
\]
Proof. By $A^T b = A^T A A^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\bigr] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\Bigl[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\Bigr] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \Bigl(I - \alpha\frac{A^T A}{\|A\|_F^2}\Bigr)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \Bigl(I - \alpha\frac{A^T A}{\|A\|_F^2}\Bigr)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\Bigl(I - \alpha\frac{AA^T}{\|A\|_F^2}\Bigr)z^{k-1} \\
&= \Bigl(I - \alpha\frac{A^T A}{\|A\|_F^2}\Bigr)(x^{k-1} - A^\dagger b) - \alpha\Bigl(I - \alpha\frac{A^T A}{\|A\|_F^2}\Bigr)\frac{A^T z^{k-1}}{\|A\|_F^2}.
\end{aligned}
\]
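Two averaging identities drive this derivation: averaging the z-update over the column blocks gives $(I - \alpha AA^T/\|A\|_F^2)z^{k-1}$, and averaging the x-update over the row blocks (with $z^k$ held fixed) gives $x^{k-1} - \alpha A^T(Ax^{k-1} - b + z^k)/\|A\|_F^2$. The sketch below checks both by exact enumeration; the test problem, the partitions, the value of α, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))                           # assumed test problem
b = rng.standard_normal(6)
alpha = 0.8
row_blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]   # hypothetical partition of rows
col_blocks = [np.array([0]), np.array([1, 2, 3])]         # hypothetical partition of columns
fro2 = np.linalg.norm(A, "fro") ** 2

z_prev = rng.standard_normal(6)
x_prev = rng.standard_normal(4)
z_k = rng.standard_normal(6)      # held fixed while averaging over the row block

# Average the z-update over the column block j.
mean_z = np.zeros(6)
for J in col_blocks:
    AJ = A[:, J]
    nJ2 = np.linalg.norm(AJ, "fro") ** 2
    mean_z += (nJ2 / fro2) * (z_prev - alpha / nJ2 * (AJ @ (AJ.T @ z_prev)))
closed_z = z_prev - alpha * (A @ (A.T @ z_prev)) / fro2

# Average the x-update over the row block i.
mean_x = np.zeros(4)
for I in row_blocks:
    AI = A[I, :]
    nI2 = np.linalg.norm(AI, "fro") ** 2
    mean_x += (nI2 / fro2) * (x_prev - alpha / nI2 * (AI.T @ (AI @ x_prev - b[I] + z_k[I])))
closed_x = x_prev - alpha * (A.T @ (A @ x_prev - b + z_k)) / fro2
```

The block norms cancel against the sampling probabilities, which is why the partition does not appear in the expected update.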
41 Convergence of the norms of the expectations
Theorem 8
For any given consistent or inconsistent linear system
Ax = b
let xk be the kth iterate of REBK with
z0 isin b+ range(A) and x0 isin range(AT)
It holds
983042E983045xk minusAdaggerb
9830469830422 le δk
983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
where
δ = max1leiler
9830559830559830559830551minusασ2
i (A)
983042A9830422F
983055983055983055983055
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43
Proof. By $A^T b = A^T A A^\dagger b$ and
\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \left( A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i} \right),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\left[ \mathbb{E}_{k-1}^{i}[x^k - A^\dagger b] \right] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (A x^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)(x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}
\]
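The last two equalities of this chain rest on two facts that the slide leaves implicit. The following is a short derivation, under the assumption (suggested by the conditional-expectation identities used in this proof, but not restated in this part of the lecture) that the column block $\mathcal{J}_j$ is drawn with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$:

```latex
\begin{aligned}
\mathbb{E}_{k-1}[z^k]
&= z^{k-1} - \sum_{j=1}^{t} \frac{\|A_{:,\mathcal{J}_j}\|_F^2}{\|A\|_F^2}
   \cdot \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}
   A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1} \\
&= \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1},
   \qquad \text{since } \sum_{j=1}^{t} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T = A A^T, \\
A^T \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right)
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) A^T .
\end{aligned}
```

Iterating the first identity gives $\mathbb{E}[z^{k-1}] = (I - \alpha A A^T / \|A\|_F^2)^{k-1} z^0$. Combined with the second identity, this is what turns the $\mathbb{E}[z^{k-1}]$ term into $(I - \alpha A^T A / \|A\|_F^2)^k A^T z^0 / \|A\|_F^2$ once the full expectation is taken.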
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\left[ \mathbb{E}_{k-1}[x^k - A^\dagger b] \right] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{2} \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k} \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^T z^0 \in \mathrm{range}(A^T),
\]
and Lemma 3.
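Because the proof gives the mean error in closed form, the bound of Theorem 8 can be checked numerically without any sampling. The sketch below is only an illustration: the function name `mean_error_bound_check` and its arguments are hypothetical, and NumPy's `pinv`, `svd` and `matrix_rank` are used to obtain $A^\dagger b$, the singular values and $r$.

```python
import numpy as np

def mean_error_bound_check(A, b, x0, z0, alpha, k):
    """Return (||E[x^k - A^+ b]||_2, Theorem 8 bound) using the closed form from the proof."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    x_star = np.linalg.pinv(A) @ b                      # pseudoinverse solution A^+ b
    n = A.shape[1]
    M = np.eye(n) - alpha * (A.T @ A) / fro2            # I - alpha A^T A / ||A||_F^2
    Mk = np.linalg.matrix_power(M, k)
    # Closed-form mean error: M^k (x0 - A^+ b) - alpha k M^k A^T z0 / ||A||_F^2
    mean_err = Mk @ (x0 - x_star) - alpha * k * (Mk @ (A.T @ z0)) / fro2
    sigma = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)                        # number of nonzero singular values
    delta = np.max(np.abs(1.0 - alpha * sigma[:r] ** 2 / fro2))
    bound = delta ** k * (
        np.linalg.norm(x0 - x_star) + alpha * k * np.linalg.norm(A.T @ z0) / fro2
    )
    return np.linalg.norm(mean_err), bound
```

The inputs are expected to satisfy the hypotheses of the theorem, for example `x0 = A.T @ w` and `z0 = b + A @ v` for arbitrary vectors `w` and `v`.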
4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$
The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as
\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to
\[
b_\perp = (I - A A^\dagger) b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{ z \mid A^T z = 0 \}$.
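As a small illustration of these two quantities, the following sketch computes $\rho$ and $b_\perp$ with NumPy. The helper name `rho_and_b_perp` is hypothetical; the rank $r$ is obtained from `numpy.linalg.matrix_rank`, which is an implementation choice rather than part of the lecture.

```python
import numpy as np

def rho_and_b_perp(A, b, alpha):
    """Compute rho = 1 - (2*alpha - alpha^2) * sigma_r^2 / ||A||_F^2 and b_perp = (I - A A^+) b."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    sigma = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
    r = np.linalg.matrix_rank(A)
    rho = 1.0 - (2.0 * alpha - alpha ** 2) * sigma[r - 1] ** 2 / fro2
    b_perp = b - A @ (np.linalg.pinv(A) @ b)     # component of b orthogonal to range(A)
    return rho, b_perp
```

For any `z0 = b + A @ v`, projecting `z0` in the same way returns the same `b_perp`, which is why the limit of $z^k$ does not depend on the particular choice of $z^0$ in $b + \mathrm{range}(A)$.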
Lemma 9
For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \mathrm{range}(A)$. It holds
\[
\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2.
\]
Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$ we have
\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}
\]
By $z^0 - b_\perp = A A^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that
\[
z^k - b_\perp \in \mathrm{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned}
\]
Taking the expectation conditioned on the first $k - 1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\left[ \mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2] \right] \\
&\le \rho\, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \cdots \le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]
This completes the proof.
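The one-step contraction in this proof can be checked exactly, because the conditional expectation is a finite sum over the column blocks. The sketch below assumes that block $\mathcal{J}_j$ is sampled with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$ (the rule under which the averaged term becomes $\|A^T u\|_2^2 / \|A\|_F^2$); the function name `expected_one_step` and the representation of blocks as index lists are hypothetical.

```python
import numpy as np

def expected_one_step(A, u, col_blocks, alpha):
    """Exact E_{k-1}||z^k - b_perp||_2^2 given u = z^{k-1} - b_perp, via update (1)."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    total = 0.0
    for J in col_blocks:
        AJ = A[:, J]                                  # column submatrix A_{:,J}
        w = np.linalg.norm(AJ, "fro") ** 2
        p = w / fro2                                  # assumed sampling probability of block J
        u_new = u - (alpha / w) * (AJ @ (AJ.T @ u))   # update (1) applied to u
        total += p * float(u_new @ u_new)
    return total
```

For any `u` in `range(A)` and any step size with $0 < \alpha < 2$, the returned value should not exceed $\rho \|u\|_2^2$, which is the inequality obtained from Lemma 2 above.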
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$. For any $\varepsilon > 0$, it holds
\[
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
Proof. Let
\[
\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.
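It follows from
\[
x^k - \widetilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \left( b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i} \right)
\]
that
\[
\begin{aligned}
\|x^k - \widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} \left( b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i} \right)^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T \left( b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i} \right) \\
&\le \frac{\alpha^2 \| b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i} \|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned} \tag{2}
\]
It follows from
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widetilde{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\left[ \mathbb{E}_{k-1}^{i}[\|x^k - \widetilde{x}^k\|_2^2] \right] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \qquad \text{(by (2))}
\end{aligned}
\]
that
\[
\begin{aligned}
\mathbb{E}[\|x^k - \widetilde{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}[\|b - A A^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \qquad \text{(by Lemma 9)}.
\end{aligned} \tag{3}
\]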
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$ we have
\[
x^0 - A^\dagger b \in \mathrm{range}(A^T).
\]
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\widetilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)},
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[\|\widetilde{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[
\mathbb{E}[\|\widetilde{x}^k - A^\dagger b\|_2^2] \le \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4}
\]
Note that for any $\varepsilon > 0$, we have
\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2 \\
&\le \left( \|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2 \right)^2 \\
&\le \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \widetilde{x}^k\|_2 \|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \widetilde{x}^k\|_2^2 + (1 + \varepsilon) \|\widetilde{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
\]
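The last step of (5) is an instance of the weighted arithmetic-geometric mean (Young) inequality, which the slide does not spell out. A short worked version, with the shorthand $a = \|x^k - \widetilde{x}^k\|_2$ and $c = \|\widetilde{x}^k - A^\dagger b\|_2$ introduced here only for readability:

```latex
0 \le \left( \frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, c \right)^2
  = \frac{a^2}{\varepsilon} - 2ac + \varepsilon c^2
\quad \Longrightarrow \quad
2ac \le \frac{a^2}{\varepsilon} + \varepsilon c^2 ,
\qquad \text{hence} \qquad
a^2 + c^2 + 2ac \le \left( 1 + \frac{1}{\varepsilon} \right) a^2 + (1 + \varepsilon)\, c^2 .
```

A small $\varepsilon$ keeps the factor on the contracting term $c^2$ close to 1, at the price of a large factor $1 + 1/\varepsilon$ on the perturbation term $a^2$.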
Combining (3), (4) and (5) yields
\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}[\|x^k - \widetilde{x}^k\|_2^2] + (1 + \varepsilon)\, \mathbb{E}[\|\widetilde{x}^k - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \bigl( 1 + (1 + \varepsilon) \bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \bigl( 1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} \bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
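The final inequality of this chain uses a geometric-series estimate that is left implicit. A brief worked version of that step:

```latex
\left( 1 + \frac{1}{\varepsilon} \right) \sum_{l=0}^{k-1} (1 + \varepsilon)^{l}
  = \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^{k} - 1}{\varepsilon}
  \le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^{2}} .
```

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives the first term inside the parentheses of the bound. Note that the bound decays in $k$ only when $(1 + \varepsilon)\rho < 1$, that is, for $\varepsilon < 1/\rho - 1$.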
Remark 4
For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
\[
(\widetilde{x}^k - A^\dagger b)^T (x^k - \widetilde{x}^k) = 0,
\]
the inequality (5) becomes the equality
\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2,
\]
which yields the following convergence for REK:
\[
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
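To make the iteration analysed in this section concrete, here is a minimal NumPy sketch of the two updates that appear in the proofs (update (1) for $z^k$ and the row-block update for $x^k$). It is an illustration under stated assumptions, not the authors' reference implementation: the function name `rebk`, its parameters, the choice $x^0 = 0$ and $z^0 = b$, and the sampling of blocks with probability proportional to their squared Frobenius norms are all assumptions made here.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000, seed=0):
    """Sketch of a randomized extended block Kaczmarz iteration (assumed sampling rule)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Squared Frobenius norms of the row blocks A_{I,:} and column blocks A_{:,J}
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_p = row_w / row_w.sum()          # assumed: P(block) proportional to its squared norm
    col_p = col_w / col_w.sum()
    x = np.zeros(n)                      # x^0 = 0 lies in range(A^T)
    z = np.asarray(b, dtype=float).copy()  # z^0 = b lies in b + range(A)
    for _ in range(iters):
        # z-update: damped step that drives z toward b_perp
        j = rng.choice(len(col_blocks), p=col_p)
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_w[j]) * (AJ @ (AJ.T @ z))
        # x-update: block Kaczmarz step on the corrected right-hand side b - z
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_w[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x
```

Passing singleton blocks, `row_blocks = [[i] for i in range(m)]` and `col_blocks = [[j] for j in range(n)]`, together with `alpha=1.0` gives the REK case discussed in Remark 4. No pseudoinverse of a block is formed; each step needs only products with `AI`, `AJ` and their transposes.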
Kui Du, Wutao Si, and Xiaohui Sun.
Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
Kui Du and Xiaohui Sun.
A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
Lawrence H. Landweber.
An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
D. Leventhal and A. S. Lewis.
Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
Anna Ma, Deanna Needell, and Aaditya Ramdas.
Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo.
A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
Thomas Strohmer and Roman Vershynin.
A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
Anastasios Zouzias and Nikolaos M. Freris.
Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
Applying the norms to both sides we obtain
983042E983045xk minusAdaggerb
9830469830422
=
983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)minus αk
983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le983056983056983056983056983056
983061Iminus α
ATA
983042A9830422F
983062k
(x0 minusAdaggerb)
9830569830569830569830569830562
+
983056983056983056983056983056αk983061Iminus α
ATA
983042A9830422F
983062kATz0
983042A9830422F
9830569830569830569830569830562
le δk983061983042x0 minusAdaggerb9830422 +
αk983042ATz09830422983042A9830422F
983062
Here the last inequality follows from the facts that
x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)
and Lemma 3
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43
42 Convergence of E983045983042xk minusAdaggerb98304222
983046
The convergence ofE983045983042xk minusAdaggerb98304222
983046
depends on the positive number ρ defined as
ρ = 1minus (2αminus α2)σ2r (A)
983042A9830422F
In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to
bperp = (IminusAAdagger)b
which is the orthogonal projection of z0 onto the set z | ATz = 0
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43
Lemma 9
For any given consistent or inconsistent linear system
Ax = b
let zk be the vector generated in REBK with
z0 isin b+ range(A)
It holdsE983045983042zk minus bperp98304222
983046le ρk983042z0 minus bperp98304222
Proof By (AJj )Tbperp = 0 we have
zk minus bperp = zkminus1 minus bperp minus α
983042AJj9830422FAJj (AJj )
T(zkminus1 minus bperp) (1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43
By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that
zk minus bperp isin range(A)
It follows from (1) that
983042zk minus bperp98304222
= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
+α2
983042AJj9830424F(zkminus1 minus bperp)
TAJj (AJj )TAJj (AJj )
T(zkminus1 minus bperp)
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )
T(zkminus1 minus bperp)98304222983042AJj9830422F
(by Lemma 1)
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43
Taking the conditioned expectation on the first k minus 1 iterations yields
Ekminus1
983045983042zk minus bperp98304222
983046
le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222
983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)
Taking expectation again gives
E983045983042zk minus bperp98304222
983046= E
983045Ekminus1
983045983042zk minus bperp98304222
983046983046
le ρE983045983042zkminus1 minus bperp98304222
983046
le ρk983042z0 minus bperp98304222
This completes the proof
Randomized Iterative Methods Lecture 2 February 21 - 28 2020 37 43
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$. For any $\varepsilon > 0$, it holds
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
Proof. Let
$$\hat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{I_i,:}\|_F^2} (A_{I_i,:})^T A_{I_i,:} (x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
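It follows from
$$x^k - \hat{x}^k = \frac{\alpha}{\|A_{I_i,:}\|_F^2} (A_{I_i,:})^T (b_{I_i} - A_{I_i,:} A^\dagger b - z^k_{I_i})$$
that
$$\begin{aligned}
\|x^k - \hat{x}^k\|_2^2 &= \frac{\alpha^2}{\|A_{I_i,:}\|_F^4} (b_{I_i} - A_{I_i,:} A^\dagger b - z^k_{I_i})^T A_{I_i,:} (A_{I_i,:})^T (b_{I_i} - A_{I_i,:} A^\dagger b - z^k_{I_i}) \\
&\le \frac{\alpha^2 \|b_{I_i} - A_{I_i,:} A^\dagger b - z^k_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned} \tag{2}$$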
It follows from
$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \hat{x}^k\|_2^2\big] &= \mathbb{E}_{k-1}\big[\mathbb{E}^{i}_{k-1}\big[\|x^k - \hat{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2 \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}$$
that
$$\begin{aligned}
\mathbb{E}\big[\|x^k - \hat{x}^k\|_2^2\big] &\le \frac{\alpha^2}{\|A\|_F^2}\, \mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned} \tag{3}$$
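The step marked "by (2)" averages over the row block index $i$. As a sketch of that computation, assume (consistently with the expectation steps in this proof, but not stated on this slide) that the block $I_i$ is drawn with probability $\|A_{I_i,:}\|_F^2 / \|A\|_F^2$, and write $r = b - AA^\dagger b - z^k$ as shorthand introduced here. Then
$$\mathbb{E}^{i}_{k-1}\left[\frac{\alpha^2 \|r_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2}\right] = \sum_{i=1}^{s} \frac{\|A_{I_i,:}\|_F^2}{\|A\|_F^2} \cdot \frac{\alpha^2 \|r_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2} = \frac{\alpha^2}{\|A\|_F^2} \sum_{i=1}^{s} \|r_{I_i}\|_2^2 = \frac{\alpha^2 \|r\|_2^2}{\|A\|_F^2},$$
where the last equality holds because $I_1, \dots, I_s$ partition $[m]$. Since $b - AA^\dagger b = b_\perp$, we have $\|r\|_2 = \|z^k - b_\perp\|_2$, which is what Lemma 9 bounds.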
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$ we have
$$x^0 - A^\dagger b \in \mathrm{range}(A^T).$$
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
$$\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2 &= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{I_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{I_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{I_i,:})^T A_{I_i,:} (A_{I_i,:})^T A_{I_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{I_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}$$
we have
$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] &\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}$$
which yields
$$\mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \le \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}$$
Note that for any $\varepsilon > 0$ we have
$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2 &= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2 \le \big(\|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \hat{x}^k\|_2 \|\hat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \hat{x}^k\|_2^2 + (1 + \varepsilon) \|\hat{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}$$
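The last inequality in (5) is an instance of Young's inequality. Spelled out, with the shorthand $p = \|x^k - \hat{x}^k\|_2$ and $q = \|\hat{x}^k - A^\dagger b\|_2$ (names introduced here only for this step):
$$0 \le \left(\frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, q\right)^2 = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2 \quad\Longrightarrow\quad 2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2,$$
so that
$$p^2 + q^2 + 2pq \le \left(1 + \frac{1}{\varepsilon}\right) p^2 + (1 + \varepsilon)\, q^2.$$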
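Combining (3), (4) and (5) yields
$$\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}\big[\|x^k - \hat{x}^k\|_2^2\big] + (1+\varepsilon)\, \mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1+\varepsilon)\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\, \mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}$$
The last inequality uses $1 + 1/\varepsilon = (1+\varepsilon)/\varepsilon$ and $1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1} = \big((1+\varepsilon)^k - 1\big)/\varepsilon \le (1+\varepsilon)^k/\varepsilon$.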
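To make the two coupled updates analyzed in this proof concrete, here is a minimal NumPy sketch of an REBK-style iteration. It is reconstructed from the formulas above (the $z$-update from (1), and the $x$-update obtained by adding $x^k - \hat{x}^k$ to $\hat{x}^k$), not taken from the lecture's algorithm listing. The function name `rebk`, the block partitions, and the sampling probabilities $\|A_{I_i,:}\|_F^2/\|A\|_F^2$ and $\|A_{:,J_j}\|_F^2/\|A\|_F^2$ are assumptions made for illustration.

```python
import numpy as np


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch: z-update on a random column block, then x-update on a random row block."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    fro2 = np.linalg.norm(A, "fro") ** 2
    # Assumed sampling rule: blocks are drawn proportionally to their squared Frobenius norms.
    p_rows = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks]) / fro2
    p_cols = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks]) / fro2
    z = b.astype(float).copy()   # z^0 = b lies in b + range(A)
    x = np.zeros(n)              # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        J = col_blocks[rng.choice(len(col_blocks), p=p_cols)]
        AJ = A[:, J]
        z -= alpha / np.linalg.norm(AJ, "fro") ** 2 * (AJ @ (AJ.T @ z))
        I = row_blocks[rng.choice(len(row_blocks), p=p_rows)]
        AI = A[I, :]
        x -= alpha / np.linalg.norm(AI, "fro") ** 2 * (AI.T @ (AI @ x - b[I] + z[I]))
    return x, z


# Usage on a small inconsistent system with hypothetical partitions.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
row_blocks = np.array_split(np.arange(30), 6)
col_blocks = np.array_split(np.arange(8), 4)
x, z = rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000)
x_star = np.linalg.pinv(A) @ b            # pseudoinverse solution A^+ b
print(np.linalg.norm(x - x_star))
```

No pseudoinverse or linear solve appears inside the loop; `np.linalg.pinv` is used only to form the reference solution for comparison.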
Remark 4
For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\hat{x}^k - A^\dagger b)^T (x^k - \hat{x}^k) = 0,$$
the inequality (5) becomes the equality
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence for REK:
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
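The REK bound can be obtained by unrolling a recursion. As a sketch: with the equality in place of (5), taking expectations and applying (3) and (4) with $\alpha = 1$ gives
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \frac{\rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big].$$
Applying this bound repeatedly down to $x^0$, the term produced at depth $l$ is $\rho^{l} \cdot \rho^{k-l} / \|A\|_F^2 = \rho^k / \|A\|_F^2$, so
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \sum_{l=0}^{k-1} \rho^{l}\, \frac{\rho^{k-l}}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + \rho^k \|x^0 - A^\dagger b\|_2^2 = \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$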
Theorem 10
For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with
$$z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).$$
For any $\varepsilon > 0$, it holds
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
Proof. Let
$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i}\|_F^2}\,(A_{\mathcal{I}_i})^T A_{\mathcal{I}_i}\,(x^{k-1} - A^\dagger b),$$
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
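It follows from
$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i}\|_F^2}\,(A_{\mathcal{I}_i})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i} A^\dagger b - z^k_{\mathcal{I}_i}\big)$$
that
$$\|x^k - \widehat{x}^k\|_2^2 = \frac{\alpha^2}{\|A_{\mathcal{I}_i}\|_F^4}\,\big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i} A^\dagger b - z^k_{\mathcal{I}_i}\big)^T A_{\mathcal{I}_i} (A_{\mathcal{I}_i})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i} A^\dagger b - z^k_{\mathcal{I}_i}\big) \le \frac{\alpha^2\,\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i}\|_F^2} \quad\text{(by Lemma 1)}. \tag{2}$$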
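The bound (2) rests on Lemma 1, $u^T A A^T u \le \|A\|_F^2\, u^T u$. The following is a minimal numerical sanity check in Python with NumPy; the random test matrices, the seed, and the helper name `lemma1_gap` are illustrative assumptions and not part of the lecture.

```python
import numpy as np

def lemma1_gap(A, u):
    """Return ||A||_F^2 * u^T u - u^T A A^T u, which Lemma 1 says is >= 0."""
    lhs = u @ (A @ (A.T @ u))                      # u^T A A^T u = ||A^T u||_2^2
    rhs = np.linalg.norm(A, "fro") ** 2 * (u @ u)  # ||A||_F^2 * u^T u
    return rhs - lhs

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
gaps = [lemma1_gap(rng.standard_normal((5, 3)), rng.standard_normal(5))
        for _ in range(1000)]
print(min(gaps) >= 0.0)
```

The gap vanishes when $A$ has rank one and $u$ is aligned with its column space, which suggests that Lemma 1 is loosest for well-conditioned blocks.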
It follows from
$$\mathbb{E}_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big] = \mathbb{E}_{k-1}\Big[\mathbb{E}^i_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]\Big] \le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad\text{(by (2))}$$
that
$$\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] \le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \le \frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \quad\text{(by Lemma 9)}. \tag{3}$$
By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have
$$x^0 - A^\dagger b \in \mathrm{range}(A^T).$$
Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By
$$\|\widehat{x}^k - A^\dagger b\|_2^2 = \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\,\|A_{\mathcal{I}_i}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i}\|_F^2} + \frac{\alpha^2}{\|A_{\mathcal{I}_i}\|_F^4}\,(x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i})^T A_{\mathcal{I}_i} (A_{\mathcal{I}_i})^T A_{\mathcal{I}_i}\,(x^{k-1} - A^\dagger b)$$
$$\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A_{\mathcal{I}_i}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i}\|_F^2} \quad\text{(by Lemma 1)},$$
we have
$$\mathbb{E}_{k-1}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad\text{(by Lemma 3)},$$
which yields
$$\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}$$
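Passing from the block quantity $\|A_{\mathcal{I}_i} v\|_2^2 / \|A_{\mathcal{I}_i}\|_F^2$ to $\|Av\|_2^2 / \|A\|_F^2$ in conditional expectation can be checked numerically. The sketch below assumes that block $i$ is sampled with probability $\|A_{\mathcal{I}_i}\|_F^2 / \|A\|_F^2$; this sampling rule is not restated in this part of the notes and is an assumption here, as are the matrix sizes and the row partition `blocks`.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
m, n = 6, 4
A = rng.standard_normal((m, n))
v = rng.standard_normal(n)       # stands in for x^{k-1} - pinv(A) b

# Hypothetical row partition I_1, I_2, I_3 of the row indices (0-based).
blocks = [np.array([0, 1]), np.array([2, 3, 4]), np.array([5])]

fro2 = np.linalg.norm(A, "fro") ** 2
# Assumed sampling rule: P(i) = ||A_{I_i}||_F^2 / ||A||_F^2.
probs = np.array([np.linalg.norm(A[I], "fro") ** 2 for I in blocks]) / fro2

# E_i[ ||A_{I_i} v||^2 / ||A_{I_i}||_F^2 ], computed exactly as a finite sum.
expected = sum(p * np.linalg.norm(A[I] @ v) ** 2 / np.linalg.norm(A[I], "fro") ** 2
               for p, I in zip(probs, blocks))
target = np.linalg.norm(A @ v) ** 2 / fro2
print(np.isclose(expected, target))
```

Under this rule the block norms cancel term by term, so the expectation equals $\|Av\|_2^2 / \|A\|_F^2$ for any partition.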
Note that for any $\varepsilon > 0$, we have
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \le \big(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\big)^2$$
$$\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \widehat{x}^k\|_2\,\|\widehat{x}^k - A^\dagger b\|_2$$
$$\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon)\,\|\widehat{x}^k - A^\dagger b\|_2^2, \tag{5}$$
where the last step uses $2ab \le a^2/\varepsilon + \varepsilon b^2$ for $a, b \ge 0$.
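Inequality (5) is an instance of $\|a + b\|_2^2 \le (1 + 1/\varepsilon)\|a\|_2^2 + (1 + \varepsilon)\|b\|_2^2$, with $a = x^k - \widehat{x}^k$ and $b = \widehat{x}^k - A^\dagger b$. The following is a small Python check on random vectors; the helper name `young_gap`, the test values of $\varepsilon$, and the seed are assumptions made for illustration.

```python
import numpy as np

def young_gap(a, b, eps):
    """(1 + 1/eps)||a||^2 + (1 + eps)||b||^2 - ||a + b||^2; should be >= 0."""
    na2, nb2 = a @ a, b @ b
    return (1 + 1 / eps) * na2 + (1 + eps) * nb2 - (a + b) @ (a + b)

rng = np.random.default_rng(2)   # arbitrary seed
worst = min(young_gap(rng.standard_normal(4), rng.standard_normal(4), eps)
            for eps in (0.01, 0.1, 1.0, 10.0) for _ in range(500))
print(worst >= 0.0)
```

The gap equals $\|a/\sqrt{\varepsilon} - \sqrt{\varepsilon}\, b\|_2^2$, so it is zero exactly when $a = \varepsilon b$.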
Combining (3), (4) and (5) yields
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] + (1+\varepsilon)\,\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big]$$
$$\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\,\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]$$
$$\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon)\big)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big]$$
$$\le \cdots$$
$$\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k\,\|x^0 - A^\dagger b\|_2^2$$
$$\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
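The last inequality in this chain uses $(1 + 1/\varepsilon)\sum_{j=0}^{k-1}(1+\varepsilon)^j = \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k - 1}{\varepsilon} \le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}$. The following sketch checks this coefficient bound in exact rational arithmetic; the function names and the sampled values of $\varepsilon$ and $k$ are illustrative assumptions.

```python
from fractions import Fraction

def geometric_factor(eps, k):
    """(1 + 1/eps) * sum_{j=0}^{k-1} (1 + eps)^j, the coefficient before the last step."""
    return (1 + 1 / eps) * sum((1 + eps) ** j for j in range(k))

def closed_form_bound(eps, k):
    """(1 + eps)^(k+1) / eps^2, the coefficient appearing in the final bound."""
    return (1 + eps) ** (k + 1) / eps ** 2

# Exact rationals avoid floating-point rounding in the comparison.
checks = [geometric_factor(e, k) <= closed_form_bound(e, k)
          for e in (Fraction(1, 20), Fraction(1, 2), Fraction(2))
          for k in range(1, 31)]
print(all(checks))
```

The two sides differ by exactly $(1+\varepsilon)/\varepsilon^2$, independent of $k$, so the final step loses only a constant in the coefficient of $\|z^0 - b_\perp\|_2^2$.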
Remark 4
For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality
$$(\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0,$$
the equation (5) becomes
$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$
which yields the following convergence for REK:
$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left( \frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
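To illustrate the REK case, here is a minimal Python sketch. It assumes the standard REK updates of Zouzias and Freris: a column $j$ and a row $i$ are each sampled with probability proportional to their squared Euclidean norms, $z$ is projected against column $j$, and $x$ then takes a Kaczmarz step on row $i$ with right-hand side $b_i - z_i$. The initial values $z^0 = b$ and $x^0 = 0$ satisfy the hypotheses of Theorem 10. The function name `rek`, the problem sizes, and the iteration count are illustrative assumptions.

```python
import numpy as np

def rek(A, b, iters, seed=0):
    """Randomized extended Kaczmarz sketch with z^0 = b and x^0 = 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norm2 = np.sum(A**2, axis=1)   # squared row norms
    col_norm2 = np.sum(A**2, axis=0)   # squared column norms
    fro2 = row_norm2.sum()             # ||A||_F^2
    x, z = np.zeros(n), b.astype(float).copy()
    for _ in range(iters):
        # z-update: remove the component of z along a random column of A.
        j = rng.choice(n, p=col_norm2 / fro2)
        z -= (A[:, j] @ z) / col_norm2[j] * A[:, j]
        # x-update: Kaczmarz step on row i for the system A x = b - z.
        i = rng.choice(m, p=row_norm2 / fro2)
        x -= (A[i] @ x - b[i] + z[i]) / row_norm2[i] * A[i]
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)            # generic b, so the system is inconsistent
x_star = np.linalg.pinv(A) @ b         # pseudoinverse (least-squares) solution
err = np.linalg.norm(rek(A, b, 5000) - x_star)
print(err)
```

On a small, well-conditioned test problem such as this one, the error should fall to roughly machine precision, in line with the linear rate $\rho^k$ (up to the factor $k$) in the bound above.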
Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.
Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.