Hiroyuki Sato
TRANSCRIPT
-
March 10, 2016
-
Outline
1. Optimization in R^n and on manifolds
2. Geometry of Riemannian manifolds
3. Riemannian conjugate gradient methods
4. Applications
5. Summary
-
Optimization in R^n
Problem 1.1 (unconstrained optimization in R^n):
  minimize f(x), subject to x ∈ R^n.
Algorithm 1.1 (line-search method in R^n)
  1: Choose an initial point x_0 ∈ R^n.
  2: for k = 0, 1, 2, ... do
  3:   Compute a search direction η_k ∈ R^n and a step size t_k > 0.
  4:   Update x_{k+1} := x_k + t_k η_k.
  5: end for
The choice of η_k and t_k determines the method.
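To make Algorithm 1.1 concrete, here is a minimal Python sketch with the steepest-descent direction η_k = -∇f(x_k) and Armijo backtracking for t_k; the quadratic test function and all parameter values are illustrative, not taken from the slides.

import numpy as np

def line_search(f, grad_f, x0, max_iter=200, c1=1e-4):
    # Algorithm 1.1 with eta_k = -grad f(x_k) and Armijo backtracking.
    x = x0
    for _ in range(max_iter):
        eta = -grad_f(x)                   # search direction eta_k
        g_eta = grad_f(x) @ eta
        t = 1.0
        while f(x + t * eta) > f(x) + c1 * t * g_eta:
            t *= 0.5                       # backtrack until Armijo holds
        x = x + t * eta                    # x_{k+1} := x_k + t_k eta_k
    return x

A = np.diag([1.0, 10.0])                   # illustrative convex quadratic
f = lambda x: x @ A @ x
grad_f = lambda x: 2 * A @ x
print(line_search(f, grad_f, np.array([1.0, 1.0])))   # tends to [0, 0]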
-
Typical search directions η_k in R^n
Let ∇f and ∇²f denote the gradient and Hessian of f, and write ∇f_k := ∇f(x_k). Common choices of η_k ∈ R^n are:
  Steepest descent: η_k = -∇f(x_k).
  Newton's method: the solution η_k of ∇²f(x_k)[η_k] = -∇f(x_k).
  Conjugate gradient: η_0 := -∇f(x_0), η_{k+1} := -∇f(x_{k+1}) + β_{k+1} η_k, k ≥ 0,
where β_k is a parameter that specifies the particular conjugate gradient method.
-
Rayleigh quotient minimization
Let A be an n×n symmetric matrix.
Problem 1.2:
  minimize f(x) = (x^T A x)/(x^T x),
  subject to x ∈ R^n \ {0}.
The minimum value of f is the smallest eigenvalue of A, attained at the corresponding eigenvectors. Indeed, x is a critical point of f if and only if
  A x = ((x^T A x)/‖x‖²) x,
that is, A x = λ x with λ = (x^T A x)/‖x‖².
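A quick numerical check of this characterization (illustrative; A and the test vectors are made up): the Euclidean gradient of the Rayleigh quotient, ∇f(x) = (2/(x^T x))(A x - f(x) x), vanishes at an eigenvector and not at a generic point.

import numpy as np

A = np.diag([1.0, 2.0, 3.0])

def rayleigh_grad(x):
    # grad of f(x) = (x^T A x)/(x^T x): (2/||x||^2) (A x - f(x) x)
    fx = (x @ A @ x) / (x @ x)
    return 2.0 / (x @ x) * (A @ x - fx * x)

e1 = np.array([1.0, 0.0, 0.0])       # eigenvector of A
x = np.array([1.0, 1.0, 0.0])        # not an eigenvector
print(np.linalg.norm(rayleigh_grad(e1)))   # ~0
print(np.linalg.norm(rayleigh_grad(x)))    # clearly nonzero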
-
Problem 1.2 can be rewritten as a constrained problem in R^n:
Problem 1.3:
  minimize f(x) = x^T A x, subject to x ∈ R^n, x^T x = 1.
Using the (n-1)-dimensional unit sphere S^{n-1}, this is in turn an unconstrained problem on a manifold:
Problem 1.4:
  minimize f(x) = x^T A x, subject to x ∈ S^{n-1}.
-
Definition 1.1 (manifold)
A manifold M is a set covered by open subsets U_i, each equipped with a coordinate map φ_i : U_i → φ_i(U_i) ⊂ R^n, such that
  ∪_i U_i = M,
and, whenever U_i ∩ U_j ≠ ∅, the transition map
  φ_i ∘ φ_j^{-1}|_{φ_j(U_i ∩ U_j)} : φ_j(U_i ∩ U_j) → φ_i(U_i ∩ U_j)
is of class C^∞.
A manifold M thus looks locally like R^n; it need not be given as a subset of a Euclidean space, although smooth surfaces M in R^3 are familiar examples.
-
Examples of manifolds (with p ≤ n):
  The (n-1)-dimensional unit sphere S^{n-1} = {x ∈ R^n | x^T x = 1} ⊂ R^n.
  The orthogonal group O(n) = {X ∈ R^{n×n} | X^T X = I_n} ⊂ R^{n×n}.
  The Stiefel manifold St(p, n) = {Y ∈ R^{n×p} | Y^T Y = I_p} ⊂ R^{n×p}.
  The (n-1)-dimensional real projective space RP^{n-1} = {l : l is a line through the origin in R^n}.
  The Grassmann manifold Grass(p, n) = {W : W is a p-dimensional subspace of R^n}.
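As a small illustration (assumptions: numpy only, random test data), a point on St(p, n) can be produced with a QR decomposition and checked against the defining constraint:

import numpy as np

n, p = 6, 3
rng = np.random.default_rng(0)
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormalize columns
print(np.allclose(Y.T @ Y, np.eye(p)))             # True: Y lies on St(p, n)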
-
From line search in R^n to line search on a manifold M
At the k-th iterate x_k ∈ M, the Euclidean update
  x_{k+1} := x_k + t_k η_k
is not available, since M is not a vector space. Instead, we move away from x_k along a curve γ on M with γ(0) = x_k and γ'(0) = η_k, and take x_{k+1} on that curve. This is expressed with a retraction R : TM → M, writing R_x := R|_{T_xM}:
  x_{k+1} := R_{x_k}(t_k η_k), where R_{x_k} : T_{x_k}M → M.
-
Line search on a manifold M with a retraction R
Algorithm 1.2 (line-search method on M)
  1: Choose an initial point x_0 ∈ M.
  2: for k = 0, 1, 2, ... do
  3:   Compute a search direction η_k ∈ T_{x_k}M and a step size t_k > 0.
  4:   Update x_{k+1} := R_{x_k}(t_k η_k).
  5: end for
The choice of η_k and t_k determines the method.
-
Conjugate gradient directions on a manifold M
Steepest descent carries over directly: η_k := -grad f(x_k), where grad f is the Riemannian gradient of f on M. A naive conjugate gradient update
  η_0 := -grad f(x_0),   (?) η_{k+1} := -grad f(x_{k+1}) + β_{k+1} η_k, k ≥ 0,
does not make sense, however: grad f(x_{k+1}) ∈ T_{x_{k+1}}M while η_k ∈ T_{x_k}M, so the two vectors lie in different tangent spaces and cannot be added.
-
2. Geometry of Riemannian manifolds
-
Tangent space T_xM at x ∈ M
A tangent vector at x ∈ M is the velocity γ'(0) of a curve γ on M with γ(0) = x. It acts on smooth functions f : M → R by
  γ'(0) f = (d/dt) f(γ(t))|_{t=0},
and, when M sits in a Euclidean space, γ'(0) is simply (d/dt) γ(t)|_{t=0}. The tangent space T_xM is the set of all tangent vectors at x.
Example: for S^{n-1} := {x ∈ R^n | x^T x = 1},
  T_x S^{n-1} = {ξ ∈ R^n | ξ^T x = 0}.
-
Riemannian metric g
A Riemannian metric g assigns to each x ∈ M an inner product g_x on T_xM, varying smoothly with x.
Example: S^{n-1} inherits a metric from the standard inner product ⟨a, b⟩ = a^T b on R^n:
  g_x(ξ, η) = ξ^T η, ξ, η ∈ T_x S^{n-1}.
Given a metric g, we write g_x(ξ, η) = ⟨ξ, η⟩_x and ‖ξ‖_x = √⟨ξ, ξ⟩_x on T_xM.
-
Riemannian gradient grad f(x)
For f : M → R, the gradient grad f(x) at x ∈ M is the unique tangent vector in T_xM satisfying
  D f(x)[ξ] = g_x(grad f(x), ξ), for all ξ ∈ T_xM.
Example: on S^{n-1}, consider f(x) = x^T A x with A symmetric. Extend f to f̄(x) = x^T A x on R^n, whose Euclidean gradient is ∇f̄(x) = 2 A x. For ξ ∈ T_x S^{n-1},
  D f(x)[ξ] = 2 x^T A ξ = 2 x^T A (I_n - x x^T) ξ = g_x(2 (I_n - x x^T) A x, ξ),
so that
  grad f(x) = 2 (I_n - x x^T) A x.
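The defining identity D f(x)[ξ] = g_x(grad f(x), ξ) can be verified numerically with a finite difference along a curve through x with velocity ξ; this sketch, with made-up A, x, ξ, assumes the formulas just derived.

import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
x = rng.standard_normal(n); x /= np.linalg.norm(x)        # x on S^{n-1}
xi = rng.standard_normal(n); xi -= (xi @ x) * x           # xi in T_x S^{n-1}

grad = 2 * (A @ x - (x @ A @ x) * x)                      # 2 (I - x x^T) A x
curve = lambda t: (x + t * xi) / np.linalg.norm(x + t * xi)  # gamma'(0) = xi
f = lambda z: z @ A @ z
h = 1e-6
Df = (f(curve(h)) - f(curve(-h))) / (2 * h)               # D f(x)[xi]
print(Df, grad @ xi)                                      # nearly equal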
-
Retraction R : TM → M
A retraction provides the curves used for line search [Absil et al., 2008].
Definition 2.1: a smooth map R : TM → M, with R_x := R|_{T_xM}, is a retraction if
  R_x(0_x) = x for all x ∈ M, where 0_x is the zero vector of T_xM, and
  DR_x(0_x)[ξ] = ξ for all x ∈ M, ξ ∈ T_xM.
For x ∈ M and ξ ∈ T_xM, the curve γ(t) = R_x(tξ) then satisfies γ(0) = R_x(0_x) = x and γ'(0) = DR_x(0_x)[ξ] = ξ; that is, γ emanates from x in the direction ξ.
-
A retraction on S^{n-1}
  R_x(η) = (x + η)/‖x + η‖, x ∈ S^{n-1}, η ∈ T_x S^{n-1}.
This R normalizes x + η back onto the sphere and satisfies the conditions of Definition 2.1, so it is a retraction.
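Combining this retraction with the gradient formula above gives a runnable instance of Algorithm 1.2 (steepest descent on S^{n-1} for Problem 1.4); the Armijo backtracking used here is an illustrative choice, not prescribed by the slides.

import numpy as np

n = 20
A = np.diag(np.arange(1.0, n + 1))
f = lambda x: x @ A @ x
R = lambda x, eta: (x + eta) / np.linalg.norm(x + eta)   # retraction on S^{n-1}

x = np.ones(n) / np.sqrt(n)
for k in range(500):
    grad = 2 * (A @ x - f(x) * x)        # grad f(x) = 2 (I - x x^T) A x
    eta = -grad                          # steepest-descent direction
    t = 1.0
    while f(R(x, t * eta)) > f(x) - 1e-4 * t * (grad @ grad):  # Armijo via R
        t *= 0.5
    x = R(x, t * eta)
print(f(x))   # approaches 1, the smallest eigenvalue of A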
-
3. Riemannian conjugate gradient methods
-
Conjugate gradient method in R^n
Algorithm 3.1 (conjugate gradient method in R^n)
  1: Choose an initial point x_0 ∈ R^n.
  2: Set η_0 := -∇f(x_0).
  3: while ∇f(x_k) ≠ 0 do
  4:   Compute a step size α_k and set x_{k+1} := x_k + α_k η_k.
  5:   Compute β_{k+1} and set
         η_{k+1} := -∇f(x_{k+1}) + β_{k+1} η_k.   (1)
  6:   k := k + 1.
  7: end while
On a manifold M, the sum in (1) is not defined:
  grad f(x_{k+1}) ∈ T_{x_{k+1}}M, while η_k ∈ T_{x_k}M.
-
Vector transport
A vector transport on M is a map T : TM ⊕ TM → TM, (η_x, ξ_x) ↦ T_{η_x}(ξ_x), satisfying the following [Absil et al., 2008]:
  1. There is a retraction R associated with T such that T_{η_x}(ξ_x) ∈ T_{R_x(η_x)}M.
  2. T_{0_x}(ξ_x) = ξ_x for all ξ_x ∈ T_xM.
  3. T_{η_x}(a ξ_x + b ζ_x) = a T_{η_x}(ξ_x) + b T_{η_x}(ζ_x) for all a, b ∈ R.
A vector transport thus carries a tangent vector ξ_x at x to a tangent vector at R_x(η_x).
-
Vector transport from a retraction
Given a retraction R on M, define
  T^R_{η_x}(ξ_x) := DR_x(η_x)[ξ_x].
This T^R, called the differentiated retraction, is a vector transport whose associated retraction is R.
-
Conjugate gradient method with a vector transport
Algorithm 3.1' (conjugate gradient method on a manifold M)
  1: Choose an initial point x_0 ∈ M.
  2: Set η_0 := -grad f(x_0).
  3: while grad f(x_k) ≠ 0 do
  4:   Compute a step size α_k and set x_{k+1} := R_{x_k}(α_k η_k).
  5:   Compute β_{k+1} and set η_{k+1} := -grad f(x_{k+1}) + β_{k+1} T_{α_k η_k}(η_k).
  6:   k := k + 1.
  7: end while
It remains to specify the step size α_k and the parameter β_k.
-
Step size: Wolfe conditions in R^n
Let 0 < c_1 < c_2 < 1. Given x_k ∈ R^n and a descent direction η_k with ∇f(x_k)^T η_k < 0, consider the conditions
  f(x_k + α_k η_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)^T η_k,  (2)
  ∇f(x_k + α_k η_k)^T η_k ≥ c_2 ∇f(x_k)^T η_k,  (3)
  |∇f(x_k + α_k η_k)^T η_k| ≤ c_2 |∇f(x_k)^T η_k|.  (4)
Condition (2) is the Armijo condition; (2) and (3) together are the (weak) Wolfe conditions; (2) and (4) are the strong Wolfe conditions.
-
With φ(α) := f(x_k + α η_k), conditions (2), (3), (4) become
  φ(α_k) ≤ φ(0) + c_1 α_k φ'(0),  (5)
  φ'(α_k) ≥ c_2 φ'(0),  (6)
  |φ'(α_k)| ≤ c_2 |φ'(0)|.  (7)
Again, (5) is the Armijo condition, (5)-(6) are the weak Wolfe conditions, and (5) with (7) gives the strong Wolfe conditions. On a manifold M, setting φ(α) := f(R_{x_k}(α η_k)) carries (5), (6), (7) over unchanged.
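A standard way to find a step size satisfying the weak Wolfe conditions (5)-(6) is bisection on φ; this sketch is illustrative (the toy φ and the constants c_1, c_2 are made up), and it applies verbatim on a manifold once φ(α) := f(R_{x_k}(α η_k)).

import numpy as np

def wolfe_step(phi, dphi, c1=1e-4, c2=0.9, max_iter=50):
    # Bisection for a step alpha satisfying (5) and (6).
    lo, hi, alpha = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if phi(alpha) > phi(0) + c1 * alpha * dphi(0):   # (5) fails: too long
            hi = alpha
        elif dphi(alpha) < c2 * dphi(0):                 # (6) fails: too short
            lo = alpha
        else:
            return alpha
        alpha = (lo + hi) / 2 if np.isfinite(hi) else 2 * lo
    return alpha

phi = lambda a: (a - 1.0) ** 2        # toy phi with minimizer at 1
dphi = lambda a: 2 * (a - 1.0)
print(wolfe_step(phi, dphi))          # 1.0 satisfies both conditions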
-
Step size: Wolfe conditions on a manifold M
Let 0 < c_1 < c_2 < 1. Given x_k ∈ M and η_k ∈ T_{x_k}M with ⟨grad f(x_k), η_k⟩_{x_k} < 0, and writing x_{k+1} = R_{x_k}(α_k η_k), consider
  f(R_{x_k}(α_k η_k)) ≤ f(x_k) + c_1 α_k ⟨grad f(x_k), η_k⟩_{x_k},  (8)
  ⟨grad f(x_{k+1}), DR_{x_k}(α_k η_k)[η_k]⟩_{x_{k+1}} ≥ c_2 ⟨grad f(x_k), η_k⟩_{x_k},  (9)
  |⟨grad f(x_{k+1}), DR_{x_k}(α_k η_k)[η_k]⟩_{x_{k+1}}| ≤ c_2 |⟨grad f(x_k), η_k⟩_{x_k}|.  (10)
[Absil et al., 2008] uses the Armijo condition (8); [Sato, 2015] uses the weak Wolfe conditions (8)-(9); [Ring & Wirth, 2012] uses the strong Wolfe conditions (8) and (10).
Note that DR_{x_k}(α_k η_k)[η_k] = T^R_{α_k η_k}(η_k).
-
The parameter β_k
Classical choices of β_k in R^n, with g_k := ∇f(x_k) and y_k := g_{k+1} - g_k:
  β^{HS}_{k+1} = (g_{k+1}^T y_k)/(η_k^T y_k).  [Hestenes & Stiefel, 1952]
  β^{FR}_{k+1} = ‖g_{k+1}‖²/‖g_k‖².  [Fletcher & Reeves, 1964]
  β^{PRP}_{k+1} = (g_{k+1}^T y_k)/‖g_k‖².  [Polak, Ribière, Polyak, 1969]
  β^{CD}_{k+1} = -‖g_{k+1}‖²/(η_k^T g_k).  [Fletcher, 1987]
  β^{LS}_{k+1} = -(g_{k+1}^T y_k)/(η_k^T g_k).  [Liu & Storey, 1991]
  β^{DY}_{k+1} = ‖g_{k+1}‖²/(η_k^T y_k).  [Dai & Yuan, 1999]
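For reference, the six formulas side by side as a small helper (illustrative; g and g_new are consecutive gradients, eta the previous search direction):

import numpy as np

def betas(g, g_new, eta):
    y = g_new - g
    return {
        "HS": (g_new @ y) / (eta @ y),
        "FR": (g_new @ g_new) / (g @ g),
        "PRP": (g_new @ y) / (g @ g),
        "CD": -(g_new @ g_new) / (eta @ g),
        "LS": -(g_new @ y) / (eta @ g),
        "DY": (g_new @ g_new) / (eta @ y),
    }

g = np.array([1.0, -2.0]); g_new = np.array([0.5, 0.3]); eta = -g
print(betas(g, g_new, eta))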
-
Generalizing β_k to manifolds
With g_k := ∇f(x_k) and y_k := g_{k+1} - g_k in R^n:
Fletcher–Reeves in R^n: β^{FR}_{k+1} = ‖g_{k+1}‖²/‖g_k‖². On M this generalizes naturally:
  β_{k+1} = ⟨grad f(x_{k+1}), grad f(x_{k+1})⟩_{x_{k+1}} / ⟨grad f(x_k), grad f(x_k)⟩_{x_k}.
Dai–Yuan in R^n: β^{DY}_{k+1} = ‖g_{k+1}‖²/(η_k^T y_k). A naive generalization to M,
  (?) β_{k+1} := ⟨grad f(x_{k+1}), grad f(x_{k+1})⟩_{x_{k+1}} / ⟨η_k, y_k⟩_{x_k},
is problematic: should y_k = grad f(x_{k+1}) - T_{α_k η_k}(grad f(x_k))? Then y_k ∈ T_{x_{k+1}}M while η_k ∈ T_{x_k}M, and the pairing ⟨η_k, y_k⟩ is not defined.
-
Fletcher–Reeves: scaled vector transport
The Euclidean convergence analysis implicitly uses norm relations for η_{k-1} that, on M, become the requirement
  ‖T_{α_{k-1} η_{k-1}}(η_{k-1})‖_{x_k} ≤ ‖η_{k-1}‖_{x_{k-1}},
which a general vector transport T need not satisfy.
The scaled vector transport T^0 associated with T^R [Sato & Iwai, 2015] is defined by
  T^0_η(ξ) := (‖ξ‖_x / ‖T^R_η(ξ)‖_{R_x(η)}) T^R_η(ξ), ξ, η ∈ T_xM,
so that ‖T^0_η(ξ)‖_{R_x(η)} = ‖ξ‖_x always holds.
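A sketch of T^R and its scaled version T^0 for the sphere retraction R_x(η) = (x + η)/‖x + η‖ with the standard metric (the test vectors are made up); by construction the norm of T^0_η(ξ) always equals ‖ξ‖_x:

import numpy as np

def TR(x, eta, xi):
    # T^R_eta(xi) = DR_x(eta)[xi] for R_x(eta) = (x + eta)/||x + eta||
    w = x + eta
    nw = np.linalg.norm(w)
    return (xi - (w @ xi) / nw**2 * w) / nw

def T0(x, eta, xi):
    # scaled vector transport: rescale so that the norm of xi is preserved
    v = TR(x, eta, xi)
    return (np.linalg.norm(xi) / np.linalg.norm(v)) * v

x = np.array([1.0, 0.0]); eta = np.array([0.0, 0.8]); xi = np.array([0.0, 0.5])
print(np.linalg.norm(TR(x, eta, xi)), np.linalg.norm(T0(x, eta, xi)))  # ..., 0.5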
-
Fletcher–Reeves method with scaled vector transport
Algorithm 3.2 (Fletcher–Reeves method on M)
  1: Choose an initial point x_0 ∈ M.
  2: Set η_0 := -grad f(x_0).
  3: while grad f(x_k) ≠ 0 do
  4:   Compute a step size α_k and set x_{k+1} := R_{x_k}(α_k η_k).
  5:   Set
         β_{k+1} := ⟨grad f(x_{k+1}), grad f(x_{k+1})⟩_{x_{k+1}} / ⟨grad f(x_k), grad f(x_k)⟩_{x_k},
         η_{k+1} := -grad f(x_{k+1}) + β_{k+1} T^{(k)}_{α_k η_k}(η_k).
  6:   k := k + 1.
  7: end while
Here T^{(k)} switches to the scaled transport whenever T^R expands the norm:
  T^{(k)}_{α_k η_k}(η_k) := T^R_{α_k η_k}(η_k) if ‖T^R_{α_k η_k}(η_k)‖_{x_{k+1}} ≤ ‖η_k‖_{x_k}, and T^0_{α_k η_k}(η_k) otherwise.
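Putting the pieces together, here is a compact sketch of Algorithm 3.2 on S^{n-1} for f(x) = x^T A x. For brevity it substitutes Armijo backtracking for the strong Wolfe line search assumed by the convergence theory, and adds a steepest-descent restart as a safeguard; both are illustrative choices, not part of the slides.

import numpy as np

n = 100
A = np.diag(np.arange(1.0, n + 1))
f = lambda x: x @ A @ x
grad_f = lambda x: 2 * (A @ x - (x @ A @ x) * x)
R = lambda x, eta: (x + eta) / np.linalg.norm(x + eta)

def TR(x, eta, xi):                      # differentiated retraction
    w = x + eta
    nw = np.linalg.norm(w)
    return (xi - (w @ xi) / nw**2 * w) / nw

x = np.ones(n) / np.sqrt(n)
g = grad_f(x)
eta = -g
for k in range(2000):
    if np.linalg.norm(g) < 1e-6:
        break
    if g @ eta >= 0:                     # safeguard: restart if not descent
        eta = -g
    alpha = 1.0
    while f(R(x, alpha * eta)) > f(x) + 1e-4 * alpha * (g @ eta):
        alpha *= 0.5                     # Armijo backtracking (illustrative)
    x_new = R(x, alpha * eta)
    g_new = grad_f(x_new)
    t = TR(x, alpha * eta, eta)          # T^R_{alpha_k eta_k}(eta_k)
    if np.linalg.norm(t) > np.linalg.norm(eta):
        t *= np.linalg.norm(eta) / np.linalg.norm(t)   # scaled transport T^0
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves
    eta = -g_new + beta * t
    x, g = x_new, g_new
print(f(x))   # approaches 1, the smallest eigenvalue of A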
-
Fletcher–Reeves: global convergence
Assumption 3.1 (Sato & Iwai, 2015): f ∈ C¹ and there exists L > 0 such that
  |D(f ∘ R_x)(tη)[η] - D(f ∘ R_x)(0)[η]| ≤ L t,
for all η ∈ T_xM with ‖η‖_x = 1, x ∈ M, t ≥ 0.
Theorem 3.2: under Assumption 3.1, the sequence {x_k} generated by Algorithm 3.2 satisfies
  lim inf_{k→∞} ‖grad f(x_k)‖_{x_k} = 0.
-
Fletcher–Reeves: comparison with earlier work
[Ring & Wirth, 2012] proved global convergence under the additional assumption that, for all k,
  ‖T^R_{α_{k-1} η_{k-1}}(η_{k-1})‖_{x_k} ≤ ‖η_{k-1}‖_{x_{k-1}}.  (11)
Whether the vector transport T^R satisfies (11) is hard to verify in advance. [Sato & Iwai, 2015] removes assumption (11): whenever (11) fails, the scaled vector transport is used in place of T^R.
-
Fletcher–Reeves: a numerical example violating (11)
Let n = 20 and A = diag(1, ..., 20), and consider the sphere S^{n-1} := {x ∈ R^n | x^T x = 1}.
Problem 3.1:
  minimize f(x) = x^T A x, subject to x ∈ S^{n-1},
where S^{n-1} is endowed with the (non-standard) Riemannian metric
  g_x(ξ_x, η_x) := ξ_x^T G_x η_x, ξ_x, η_x ∈ T_x S^{n-1},
  G_x := diag(10⁴ (x^{(1)})² + 1, 1, 1, ..., 1),
and x^{(1)} denotes the first component of x.
-
Fletcher–Reeves: geometric ingredients for Problem 3.1
Gradient:
  grad f(x) = 2 (I_n - (G_x^{-1} x x^T)/(x^T G_x^{-1} x)) G_x^{-1} A x.
Retraction:
  R_x(η) = (x + η)/√((x + η)^T (x + η)), η ∈ T_x S^{n-1}, x ∈ S^{n-1}.
Vector transport:
  T^R_η(ξ) = (1/√((x + η)^T (x + η))) (I_n - ((x + η)(x + η)^T)/((x + η)^T (x + η))) ξ,
  ξ, η ∈ T_x S^{n-1}, x ∈ S^{n-1}.
The optimal solutions are x* = ±e_1, with optimal value f(x*) = 1.
-
Fletcher–Reeves
(Figure: objective value f(x_k) versus iteration, over 10⁵ iterations; vertical axis from 1.45 to 1.6, far above the optimal value 1.)
-
Fletcher–Reeves
(Figure: first component x_k^{(1)} versus iteration, over 10⁵ iterations; vertical axis from 0.6 to 0.85.)
-
Fletcher–Reeves
(Figure: the ratio ‖T^R_{α_k η_k}(η_k)‖_{x_{k+1}} / ‖η_k‖_{x_k} versus iteration, over 10⁵ iterations; vertical axis from 0 to 2.5, so the ratio exceeds 1 and condition (11) is violated.)
-
Fletcher–Reeves
(Figure: x_k^{(1)} together with the transport ratios versus iteration, over the first 2×10⁴ iterations.)
-
Fletcher–Reeves
(Figure: x_k^{(1)} versus iteration, over 200 iterations; vertical axis from 0 to 1.)
-
Fletcher–Reeves
(Figure: distance to the solution versus iteration, over 200 iterations; log scale from 10⁻⁸ to 10².)
-
Fletcher–Reeves: a second example
Let n = 100 and A = diag(1, ..., 100)/100.
Problem 3.2:
  minimize f(x) = x^T A x, subject to x ∈ S^{n-1},
where S^{n-1} now carries the standard induced metric
  g_x(ξ_x, η_x) := ξ_x^T η_x, ξ_x, η_x ∈ T_x S^{n-1}.
-
Fletcher–Reeves: geometric ingredients for Problem 3.2
Gradient:
  grad f(x) = 2 (I - x x^T) A x.
Retraction:
  R_x(η) = √(1 - η^T η) x + η, η ∈ T_x S^{n-1}, x ∈ S^{n-1}.
Vector transport:
  T^R_η(ξ) = ξ - (ξ^T η / √(1 - η^T η)) x,
  ξ, η ∈ T_x S^{n-1} with ⟨η, η⟩_x < 1, x ∈ S^{n-1}.
This transport never shrinks: ‖T^R_η(ξ)‖_{R_x(η)} ≥ ‖ξ‖_x, strictly whenever η^T ξ ≠ 0, so condition (11) fails.
-
Fletcher–Reeves
(Figure: distance to the solution versus iteration for Problem 3.2, over about 350 iterations; log scale from 10⁻⁶ to 10⁰.)
-
Dai–Yuan
Algorithm 3.3 (Dai–Yuan method in R^n [Dai & Yuan, 1999])
  1: Choose an initial point x_0 ∈ R^n.
  2: Set η_0 := -∇f(x_0).
  3: while ∇f(x_k) ≠ 0 do
  4:   Compute a step size α_k and set x_{k+1} := x_k + α_k η_k.
  5:   Set β_{k+1} = ‖g_{k+1}‖²/(η_k^T y_k) and η_{k+1} := -∇f(x_{k+1}) + β_{k+1} η_k,
       where g_k = ∇f(x_k) and y_k = g_{k+1} - g_k.
  6:   k := k + 1.
  7: end while
-
Dai–Yuan: convergence in R^n
Assumption 3.2: f is bounded below on the level set L = {x ∈ R^n | f(x) ≤ f(x_1)}, and in a neighborhood N of L, f is C¹ with Lipschitz continuous gradient, i.e., there exists L̄ > 0 such that
  ‖∇f(x) - ∇f(y)‖ ≤ L̄ ‖x - y‖, x, y ∈ N.
Theorem [Dai & Yuan, 1999]: under Assumption 3.2, with step sizes satisfying the weak Wolfe conditions, the sequence {x_k} generated by Algorithm 3.3 satisfies
  lim inf_{k→∞} ‖∇f(x_k)‖ = 0.
-
Dai–Yuan on a manifold
In R^n, with g_k = ∇f(x_k) and y_k = g_{k+1} - g_k, the Dai–Yuan parameter can be rewritten as
  β_{k+1} = ‖g_{k+1}‖²/(η_k^T y_k) = (g_{k+1}^T η_{k+1})/(g_k^T η_k).
(Indeed, η_{k+1} = -g_{k+1} + β_{k+1} η_k gives g_{k+1}^T η_{k+1} = -‖g_{k+1}‖² + β_{k+1} g_{k+1}^T η_k = β_{k+1} g_k^T η_k, using β_{k+1} η_k^T y_k = ‖g_{k+1}‖².)
On M, with g_k = grad f(x_k), this suggests
  β_{k+1} = ⟨g_{k+1}, η_{k+1}⟩_{x_{k+1}} / ⟨g_k, η_k⟩_{x_k}.
Here β_{k+1} and η_{k+1} each depend on the other, so we solve for β_{k+1}.
-
Dai–Yuan on a manifold: solving for β_{k+1}
  β_{k+1} = ⟨g_{k+1}, η_{k+1}⟩_{x_{k+1}} / ⟨g_k, η_k⟩_{x_k}
          = ⟨g_{k+1}, -g_{k+1} + β_{k+1} T^{(k)}_{α_k η_k}(η_k)⟩_{x_{k+1}} / ⟨g_k, η_k⟩_{x_k}
          = (-‖g_{k+1}‖²_{x_{k+1}} + β_{k+1} ⟨g_{k+1}, T^{(k)}_{α_k η_k}(η_k)⟩_{x_{k+1}}) / ⟨g_k, η_k⟩_{x_k}.
Solving this equation for β_{k+1} gives
  β_{k+1} = ‖g_{k+1}‖²_{x_{k+1}} / (⟨g_{k+1}, T^{(k)}_{α_k η_k}(η_k)⟩_{x_{k+1}} - ⟨g_k, η_k⟩_{x_k}).
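This formula is cheap to compute once the transported direction is available; as a sanity check, taking the transport to be the identity (as in R^n with the Euclidean inner product) recovers the Euclidean β^{DY} (illustrative sketch):

import numpy as np

def beta_dy(g, eta, g_new, t_eta):
    # beta_{k+1} = ||g_{k+1}||^2 / (<g_{k+1}, T(eta_k)> - <g_k, eta_k>)
    return (g_new @ g_new) / (g_new @ t_eta - g @ eta)

rng = np.random.default_rng(0)
g, g_new, eta = rng.standard_normal((3, 4))
y = g_new - g
print(beta_dy(g, eta, g_new, t_eta=eta))   # transport = identity
print((g_new @ g_new) / (eta @ y))         # Euclidean DY: same value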
-
Dai–Yuan on a manifold: analogy with the Euclidean formula
In R^n,
  β_{k+1} = (g_{k+1}^T η_{k+1})/(g_k^T η_k) = ‖g_{k+1}‖²/(η_k^T y_k), y_k = g_{k+1} - g_k.
On M, the formula just derived can likewise be written as
  β_{k+1} = ⟨g_{k+1}, η_{k+1}⟩_{x_{k+1}} / ⟨g_k, η_k⟩_{x_k} = ‖g_{k+1}‖²_{x_{k+1}} / ⟨T^{(k)}_{α_k η_k}(η_k), y_k⟩_{x_{k+1}},
with the Riemannian analogue of y_k defined by
  y_k = g_{k+1} - (⟨g_k, η_k⟩_{x_k} / ⟨T^{(k)}_{α_k η_k}(g_k), T^{(k)}_{α_k η_k}(η_k)⟩_{x_{k+1}}) T^{(k)}_{α_k η_k}(g_k).
-
Dai–Yuan on a manifold: global convergence
Assume f ∈ C¹ and that there exists L > 0 such that
  |D(f ∘ R_x)(tη)[η] - D(f ∘ R_x)(0)[η]| ≤ L t,
for all η ∈ T_xM with ‖η‖_x = 1, x ∈ M, t ≥ 0.
Theorem 3.3 (Sato, 2015): under this assumption, with step sizes satisfying the weak Wolfe conditions, the generated sequence {x_k} satisfies
  lim inf_{k→∞} ‖grad f(x_k)‖_{x_k} = 0.
-
Dai–Yuan: numerical experiments
f(x) = x^T A x, x ∈ S^{n-1}.
Figure 3.1: n = 100, A = diag(1, 2, ..., n), x_0 = 1_n/√n.
(Norm of the gradient versus iteration for DY + wWolfe, DY + sWolfe, FR + wWolfe, FR + sWolfe; log scale from 10⁻⁶ to 10², about 350 iterations.)
-
Dai–Yuan: numerical experiments
f(x) = x^T A x, x ∈ S^{n-1}.
Figure 3.2: n = 500, A = diag(1, 2, ..., n), x_0 = 1_n/√n.
(Norm of the gradient versus iteration for the same four methods; log scale from 10⁻⁶ to 10⁴, about 1000 iterations.)
-
Dai–Yuan: numerical experiments
f(x) = x^T A x, x ∈ S^{n-1}.

Table 3.1: n = 100, A = diag(1, 2, ..., n), x_0 = 1_n/√n.
  Method        Iterations  Function Evals.  Gradient Evals.  Computational time
  DY + wWolfe   149         210              206              0.0175
  DY + sWolfe   90          288              244              0.0187
  FR + wWolfe   318         619              577              0.0429
  FR + sWolfe   91          293              258              0.0191

Table 3.2: n = 500, A = diag(1, 2, ..., n), x_0 = 1_n/√n.
  Method        Iterations  Function Evals.  Gradient Evals.  Computational time
  DY + wWolfe   340         373              367              0.0522
  DY + sWolfe   232         657              467              0.0658
  FR + wWolfe   960         1902             1757             0.1988
  FR + sWolfe   300         723              529              0.0730
-
Other choices of β_k and three-term methods
Recall the classical choices in R^n:
  β^{PRP}_{k+1} = (g_{k+1}^T y_k)/‖g_k‖²,  β^{HS}_{k+1} = (g_{k+1}^T y_k)/(η_k^T y_k),  β^{LS}_{k+1} = -(g_{k+1}^T y_k)/(η_k^T g_k),
  β^{FR}_{k+1} = ‖g_{k+1}‖²/‖g_k‖²,  β^{DY}_{k+1} = ‖g_{k+1}‖²/(η_k^T y_k),  β^{CD}_{k+1} = -‖g_{k+1}‖²/(η_k^T g_k).
A three-term conjugate gradient method in R^n [Narushima et al., 2011]: η_0 := -g_0 and, for k ≥ 0,
  η_{k+1} := -g_{k+1} if g_{k+1}^T p_{k+1} = 0,
  η_{k+1} := -g_{k+1} + β_{k+1} (η_k - ((g_{k+1}^T η_k)/(g_{k+1}^T p_{k+1})) p_{k+1}) otherwise,
where p_k ∈ R^n is an arbitrary parameter vector. Either way, g_{k+1}^T η_{k+1} = -‖g_{k+1}‖², so the direction always satisfies the sufficient descent condition; see the sketch below.
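A sketch of the three-term update together with a numerical check of the sufficient-descent identity g_{k+1}^T η_{k+1} = -‖g_{k+1}‖² (illustrative; β and p are arbitrary here):

import numpy as np

def three_term_direction(g_new, eta, p, beta):
    # [Narushima et al., 2011]-type update; descent holds for any beta and p.
    gp = g_new @ p
    if gp == 0:
        return -g_new
    return -g_new + beta * (eta - (g_new @ eta) / gp * p)

rng = np.random.default_rng(2)
g_new, eta, p = rng.standard_normal((3, 5))
d = three_term_direction(g_new, eta, p, beta=0.7)
print(g_new @ d, -(g_new @ g_new))   # equal up to rounding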
-
4. Applications
-
Singular value decomposition [Sato & Iwai, 2013]
Let A ∈ R^{m×n} with m ≥ n, let p ≤ n, and let N = diag(μ_1, ..., μ_p) with μ_1 > ... > μ_p > 0.
Problem 4.1:
  minimize -tr(U^T A V N),
  subject to (U, V) ∈ St(p, m) × St(p, n).
An optimal (U, V) consists of the p dominant left and right singular vectors of A as the columns of U and V, respectively. The search space is the product of two Stiefel manifolds.
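An illustrative check of Problem 4.1 with numpy's SVD: at the p dominant singular vectors, the cost equals -(μ_1 σ_1 + ... + μ_p σ_p); the test matrix and weights are made up.

import numpy as np

rng = np.random.default_rng(3)
m, n, p = 8, 6, 2
A = rng.standard_normal((m, n))
mu = np.array([2.0, 1.0])                      # mu_1 > mu_2 > 0
U_full, s, Vt = np.linalg.svd(A)
U, V = U_full[:, :p], Vt[:p].T                 # points on St(p, m), St(p, n)
print(-np.trace(U.T @ A @ V @ np.diag(mu)))    # cost at the SVD factors
print(-(mu * s[:p]).sum())                     # same value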
-
Canonical correlation analysis [Yger et al., 2012]
Given two zero-mean data matrices X ∈ R^{T×m} and Y ∈ R^{T×n}, set
  C_X = X^T X, C_Y = Y^T Y, C_{XY} = X^T Y.
For u ∈ R^m and v ∈ R^n, the projections f = Xu and g = Yv have correlation
  ρ = Cov(f, g)/(√Var(f) √Var(g)) = (u^T C_{XY} v)/(√(u^T C_X u) √(v^T C_Y v)).
Problem 4.2:
  maximize u^T C_{XY} v, subject to u^T C_X u = v^T C_Y v = 1.
-
Canonical correlation analysis [Yger et al., 2012]
Seeking p pairs of directions instead of a single pair (u, v) leads to
Problem 4.3:
  maximize tr(U^T C_{XY} V), subject to (U, V) ∈ St_{C_X}(p, m) × St_{C_Y}(p, n),
where, for an n×n symmetric positive-definite matrix G, the generalized Stiefel manifold is
  St_G(p, n) = {Y ∈ R^{n×p} | Y^T G Y = I_p}.
The search space is again a product of two (generalized Stiefel) manifolds.
-
H² optimal model reduction [Sato & Sato, 2015]
Consider the linear time-invariant system
  ẋ = A x + B u,  y = C x,
with input u ∈ R^p, output y ∈ R^q, and state x ∈ R^n. A reduced model of order m < n is sought in the form
  ẋ_m = A_m x_m + B_m u,  y_m = C_m x_m,
with A_m = U^T A U, B_m = U^T B, C_m = C U, where U ∈ R^{n×m} satisfies U^T U = I_m.
-
H² optimal model reduction [Sato & Sato, 2015]
Problem 4.4:
  minimize J(U), subject to U ∈ St(m, n).
The objective J is the squared H² norm of the error system:
  J(U) := ‖G_e‖²_{H²} = tr(C_e E_c C_e^T) = tr(B_e^T E_o B_e),
where
  A_e = [[A, 0], [0, U^T A U]],  B_e = [B; U^T B],  C_e = (C, -C U),
and the Gramians E_c, E_o solve the Lyapunov equations
  A_e E_c + E_c A_e^T + B_e B_e^T = 0,  A_e^T E_o + E_o A_e + C_e^T C_e = 0.
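J(U) can be evaluated by solving the two Lyapunov equations; this sketch uses scipy.linalg.solve_continuous_lyapunov and a made-up stable test system, and checks that both trace expressions agree:

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(5)
n, m, p, q = 6, 2, 1, 1
A = -np.diag(np.arange(1.0, n + 1))          # stable test system (made up)
B = rng.standard_normal((n, p))
C = rng.standard_normal((q, n))
U, _ = np.linalg.qr(rng.standard_normal((n, m)))   # U on St(m, n)

Ae = np.block([[A, np.zeros((n, m))], [np.zeros((m, n)), U.T @ A @ U]])
Be = np.vstack([B, U.T @ B])
Ce = np.hstack([C, -C @ U])
Ec = solve_continuous_lyapunov(Ae, -Be @ Be.T)     # Ae Ec + Ec Ae^T = -Be Be^T
Eo = solve_continuous_lyapunov(Ae.T, -Ce.T @ Ce)   # Ae^T Eo + Eo Ae = -Ce^T Ce
print(np.trace(Ce @ Ec @ Ce.T), np.trace(Be.T @ Eo @ Be))   # equal: J(U)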
-
Tensor completion [Kasai & Mishra, 2015]
Let X ∈ R^{n₁×n₂×n₃} be a third-order tensor with entries X_{i₁i₂i₃}, and let Ω ⊂ {(i₁, i₂, i₃) | i_d ∈ {1, 2, ..., n_d}, d ∈ {1, 2, 3}} be the set of observed indices. Define
  P_Ω(X)_{(i₁,i₂,i₃)} = X_{i₁i₂i₃} if (i₁, i₂, i₃) ∈ Ω, and 0 otherwise.
Given a multilinear rank r = (r₁, r₂, r₃) and the partially observed tensor X*:
Problem 4.5:
  minimize (1/|Ω|) ‖P_Ω(X) - P_Ω(X*)‖²_F,
  subject to X ∈ R^{n₁×n₂×n₃}, rank(X) = r.
-
Tensor completion [Kasai & Mishra, 2015]
A tensor X ∈ R^{n₁×n₂×n₃} of multilinear rank r admits the Tucker decomposition
  X = G ×₁ U₁ ×₂ U₂ ×₃ U₃, G ∈ R^{r₁×r₂×r₃}, U_d ∈ St(r_d, n_d), d = 1, 2, 3.
Consider the total space M̄ := St(r₁, n₁) × St(r₂, n₂) × St(r₃, n₃) × R^{r₁×r₂×r₃}. For any O_d ∈ O(r_d), d = 1, 2, 3, the transformation
  (U₁, U₂, U₃, G) ↦ (U₁O₁, U₂O₂, U₃O₃, G ×₁ O₁^T ×₂ O₂^T ×₃ O₃^T)
leaves X unchanged, so the problem is naturally posed on the quotient manifold M̄/(O(r₁) × O(r₂) × O(r₃)).
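The invariance that motivates the quotient can be checked numerically; this sketch (numpy only, random data) verifies that transforming (U₁, G) by some O₁ ∈ O(r₁) leaves X unchanged:

import numpy as np

rng = np.random.default_rng(4)
r, n = (2, 3, 2), (4, 5, 6)
G = rng.standard_normal(r)
Us = [np.linalg.qr(rng.standard_normal((n[d], r[d])))[0] for d in range(3)]

def tucker(G, Us):
    # X = G x_1 U1 x_2 U2 x_3 U3
    return np.einsum('abc,ia,jb,kc->ijk', G, *Us)

O1, _ = np.linalg.qr(rng.standard_normal((r[0], r[0])))   # O1 in O(r1)
X1 = tucker(G, Us)
X2 = tucker(np.einsum('abc,da->dbc', G, O1.T),            # G x_1 O1^T
            [Us[0] @ O1, Us[1], Us[2]])
print(np.allclose(X1, X2))   # True: the factorization is not unique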
-
Inverse eigenvalue problems [Yao et al., 2016]
DSIEP (Doubly Stochastic Inverse Eigenvalue Problem): given a self-conjugate set of complex numbers {λ₁, λ₂, ..., λₙ}, construct an n×n doubly stochastic matrix C whose eigenvalues are λ₁, λ₂, ..., λₙ.
-
Inverse eigenvalue problems [Yao et al., 2016]
Let OB := {Z ∈ R^{n×n} | diag(Z Z^T) = I_n} denote the oblique manifold, let Λ := diag(λ₁, λ₂, ..., λₙ), and let U be a prescribed set of structured matrices accounting for the complex-conjugate eigenvalue pairs. Then:
  1. For Z ∈ OB, the Hadamard square Z ⊙ Z is nonnegative with unit row sums; it is doubly stochastic if, in addition, (Z ⊙ Z)^T 1_n - 1_n = 0.
  2. Z ⊙ Z has eigenvalues λ₁, λ₂, ..., λₙ if Z ⊙ Z = Q(Λ + U)Q^T for some Q ∈ O(n) and U ∈ U.
-
Inverse eigenvalue problems [Yao et al., 2016]
Define
  H₁(Z, Q, U) := Z ⊙ Z - Q(Λ + U)Q^T,  H₂(Z) := (Z ⊙ Z)^T 1_n - 1_n,
  H(Z, Q, U) := (H₁(Z, Q, U), H₂(Z)).
Problem 4.6:
  minimize h(Z, Q, U) := (1/2) ‖H(Z, Q, U)‖²_F,
  subject to (Z, Q, U) ∈ OB × O(n) × U.
The search space is the product manifold OB × O(n) × U.
-
5. Summary
-
References I
[1] Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ (2008)
[2] Dai, Y.H., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM Journal on Optimization 10(1), 177–182 (1999)
[3] Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications 20(2), 303–353 (1998)
[4] Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. The Computer Journal 7(2), 149–154 (1964)
-
References II
[5] Kasai, H., Mishra, B.: Riemannian preconditioning for tensor completion. arXiv preprint arXiv:1506.02159v1 (2015)
[6] Narushima, Y., Yabe, H., Ford, J.A.: A three-term conjugate gradient method with sufficient descent property for unconstrained optimization. SIAM Journal on Optimization 21(1), 212–230 (2011)
[7] Ring, W., Wirth, B.: Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization 22(2), 596–627 (2012)
[8] Sato, H.: A Dai–Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Computational Optimization and Applications (2015)
-
References III
[9] Sato, H., Iwai, T.: A Riemannian optimization approach to the matrix singular value decomposition. SIAM Journal on Optimization 23(1), 188–212 (2013)
[10] Sato, H., Iwai, T.: A new, globally convergent Riemannian conjugate gradient method. Optimization 64(4), 1011–1031 (2015)
[11] Sato, H., Sato, K.: Riemannian trust-region methods for H² optimal model reduction. In: Proceedings of the 54th IEEE Conference on Decision and Control, pp. 4648–4655 (2015)
[12] Tan, M., Tsang, I.W., Wang, L., Vandereycken, B., Pan, S.J.: Riemannian pursuit for big matrix recovery. In: Proceedings of the 31st International Conference on Machine Learning, pp. 1539–1547 (2014)
-
References IV
[13] Yao, T.T., Bai, Z.J., Zhao, Z., Ching, W.K.: A Riemannian Fletcher–Reeves conjugate gradient method for doubly stochastic inverse eigenvalue problems. SIAM Journal on Matrix Analysis and Applications 37(1), 215–234 (2016)
[14] Yger, F., Berar, M., Gasso, G., Rakotomamonjy, A.: Adaptive canonical correlation analysis based on matrix manifolds. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp. 1071–1078 (2012)