Announcements:
• Midterm exams
• Return it next Tuesday
Non-smooth Objective Functions
• Sub-gradient: typically slow, and no good termination criteria (other than cross-validation)
• Proximal gradient: fast, assuming each iteration is easy
• Block coordinate descent: also helpful for exploiting multi-block structure
• Alternating Direction Method of Multipliers (ADMM): will be covered later
Multi-Block Structure and BCD Method
Block Coordinate Descent (BCD) Method: simple and scalable (Lasso example).
At iteration r, choose an index i and update that block while keeping all other blocks fixed:
  x_i^{r+1} = \arg\min_{x_i \in X_i} f(x_1^r, \ldots, x_{i-1}^r, x_i, x_{i+1}^r, \ldots, x_K^r)
Choice of index i: cyclic, randomized, or greedy.
Very different from the previous incremental GD, SGD, ...
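As an illustration (not from the slides), here is a minimal NumPy sketch of cyclic BCD for the Lasso problem min_x 0.5||Ax − b||^2 + λ||x||_1, where each single-coordinate sub-problem has a closed-form soft-thresholding solution; the function names `lasso_bcd` and `soft_threshold` are our own choices.

```python
import numpy as np

def soft_threshold(z, t):
    # closed-form minimizer of 0.5*(x - z)^2 + t*|x|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_bcd(A, b, lam, n_iters=100):
    """Cyclic coordinate descent for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    m, n = A.shape
    x = np.zeros(n)
    col_norms = (A ** 2).sum(axis=0)
    r = b - A @ x  # residual, maintained incrementally for efficiency
    for _ in range(n_iters):
        for i in range(n):
            # remove coordinate i's contribution from the residual
            r += A[:, i] * x[i]
            rho = A[:, i] @ r
            # exact minimizer of the one-dimensional sub-problem
            x[i] = soft_threshold(rho, lam) / col_norms[i]
            r -= A[:, i] * x[i]
    return x
```

With A = I the method recovers the soft-thresholded data in a single cyclic pass, which is a quick sanity check on the update.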
Geometric Interpretation
Convergence of BCD Method
Assumptions (for the cyclic update rule; see "Nonlinear Programming," D. P. Bertsekas):
• Separable constraints
• Differentiable/smooth objective
• Unique minimizer at each step
Are these assumptions necessary? Proof?
Necessity of the Smoothness Assumption
When the objective is non-smooth and not "regular," BCD can get stuck at a non-stationary point: every coordinate direction may look optimal even though a joint move would still decrease the objective.
Example: the Lasso objective is a smooth loss plus a separable non-smooth term, so it avoids this failure mode.
BCD and Non-smooth Objectives
Theorem [Tseng 2001]. Assume:
1) the feasible set is compact;
2) the minimizer at each step is unique;
3) the constraints are separable;
4) the objective function is regular.
Then every limit point of the iterates is a stationary point. (What is the right definition of stationarity for non-smooth functions? The result holds for the cyclic/randomized/greedy rules.)
Rate of convergence of BCD:
• Similar to GD: sublinear for general convex problems, linear for strongly convex ones
• The same results can be shown for most popular non-smooth objectives
Uniqueness of the Minimizer
[Michael J. D. Powell, 1973]: Powell constructed a smooth example for which cyclic BCD with exact per-block minimization cycles indefinitely and fails to converge to a stationary point.
Tensor PARAFAC Decomposition
Computing the PARAFAC (CP) decomposition is NP-hard [Håstad 1990].
[Carroll 1970], [Harshman 1970]: Alternating Least Squares (ALS).
[Figure: error vs. iterates (0 to 300) for ALS; the error plateaus for long stretches before dropping — the "swamp" effect.]
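A minimal NumPy sketch of ALS for a third-order CP decomposition (the function names `cp_als` and `khatri_rao` are our own; `pinv` guards against ill-conditioned Gram matrices). Each factor update is a linear least-squares problem, i.e., one exact BCD step on that factor block:

```python
import numpy as np

def khatri_rao(B, C):
    # column-wise Kronecker product: (J*K) x R, row index j*K + k
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, B.shape[1])

def cp_als(T, rank, n_iters=500, seed=0):
    """Alternating least squares for a rank-`rank` CP decomposition
    of a 3-way tensor T[i, j, k] ~ sum_r A[i,r] B[j,r] C[k,r]."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    T1 = T.reshape(I, -1)                     # mode-1 unfolding
    T2 = np.moveaxis(T, 1, 0).reshape(J, -1)  # mode-2 unfolding
    T3 = np.moveaxis(T, 2, 0).reshape(K, -1)  # mode-3 unfolding
    for _ in range(n_iters):
        # each line is an exact least-squares solve for one factor
        A = T1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = T2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = T3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

On a random exact low-rank tensor this usually fits to high accuracy quickly, but — as the figure above illustrates — ALS can also stall in long "swamps" before making progress.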
BCD Limitations
• Uniqueness of the minimizer
• Each sub-problem needs to be easily solvable
Popular solution: inexact BCD. At iteration r, choose an index i and minimize a local approximation of the objective function rather than the objective itself.
Related frameworks: block successive upper-bound minimization, block successive convex approximation, convex-concave procedure, majorization-minimization, DC programming, BCGD, ...
Idea of Block Successive Upper-bound Minimization
At each step, minimize a surrogate u_i(x_i; x^r) of f over block i that is:
• A global upper bound: u_i(x_i; x^r) >= f(x_1^r, \ldots, x_i, \ldots, x_K^r) for all feasible x_i
• Locally tight: u_i(x_i^r; x^r) = f(x^r)
This gives a monotone algorithm, and every limit point is a stationary point.
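The upper-bound idea can be sketched in the single-block case: for an L-smooth f, the quadratic u(x; x^r) = f(x^r) + ⟨∇f(x^r), x − x^r⟩ + (L/2)||x − x^r||^2 is a global, locally tight upper bound, and minimizing it gives a gradient step of size 1/L. A toy quadratic (our own choice of Q and b, not from the slides) verifies the monotone decrease:

```python
import numpy as np

# f(x) = 0.5 x^T Q x - b^T x, a convex quadratic with gradient Q x - b
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
L = np.linalg.eigvalsh(Q).max()  # smoothness constant of f

x = np.zeros(2)
values = [f(x)]
for _ in range(100):
    # minimizing u(.; x) -- tight at x, a global bound since L >= lambda_max(Q) --
    # is exactly a gradient step with step size 1/L
    x = x - grad(x) / L
    values.append(f(x))

# the surrogate property forces f(x^{r+1}) <= u(x^{r+1}; x^r) <= u(x^r; x^r) = f(x^r)
assert all(values[k + 1] <= values[k] + 1e-12 for k in range(len(values) - 1))
```

The iterates also converge to the unique minimizer Q^{-1} b, consistent with every limit point being stationary.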
Example 1: Block Coordinate (Proximal) Gradient Descent
Smooth scenario: the surrogate is the quadratic upper bound
  u_i(x_i; x^r) = f(x^r) + \langle \nabla_i f(x^r), x_i - x_i^r \rangle + (L_i/2) \|x_i - x_i^r\|^2,
so minimizing it is a gradient step on block i.
Non-smooth scenario: keep the non-smooth term inside the surrogate, so minimizing it is a proximal gradient step on block i.
Alternating proximal minimization: minimize f itself plus a proximal term (1/2\gamma)\|x_i - x_i^r\|^2, possibly using a Bregman divergence in place of the quadratic.
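A minimal sketch of the non-smooth scenario, assuming a Lasso-type objective and user-chosen blocks (the function name `bcpg_lasso` is our own): each block update is a single proximal gradient step using that block's Lipschitz constant, rather than an exact block minimization.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def bcpg_lasso(A, b, lam, blocks, n_iters=200):
    """Block coordinate proximal gradient for 0.5*||Ax - b||^2 + lam*||x||_1:
    one prox-gradient step per block, cycled over the blocks."""
    n = A.shape[1]
    x = np.zeros(n)
    # per-block Lipschitz constants of the smooth part 0.5*||Ax - b||^2
    Ls = [np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    for _ in range(n_iters):
        for blk, L in zip(blocks, Ls):
            g = A[:, blk].T @ (A @ x - b)          # gradient on this block
            x[blk] = soft(x[blk] - g / L, lam / L)  # prox step on the l1 term
    return x
```

Each sub-problem is now trivially solvable (a soft-threshold), which is exactly the point of inexact BCD.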
Example 2: Expectation Maximization Algorithm
The E-step uses Jensen's inequality to build a locally tight upper bound on the negative log-likelihood; the M-step minimizes that bound. EM is therefore a special case of successive upper-bound minimization.
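EM's E/M steps as an upper-bound-minimization iteration can be sketched on a toy two-component, unit-variance 1-D Gaussian mixture (the model choice and the name `em_gmm_1d` are our own, not from the slides):

```python
import numpy as np

def em_gmm_1d(x, n_iters=100, seed=0):
    """EM for a two-component 1-D Gaussian mixture with unit variances.
    E-step: Jensen's inequality gives a locally tight upper bound on the
    negative log-likelihood; M-step: minimize that bound in closed form."""
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal(2)      # component means
    pi = np.array([0.5, 0.5])        # mixing weights
    for _ in range(n_iters):
        # E-step: posterior responsibilities (the tight point of the bound)
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form minimizer of the upper bound
        Nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        pi = Nk / len(x)
    return mu, pi
```

By the BSUM argument above, each iteration monotonically increases the likelihood.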
Example 3: Transcript Abundance Estimation
[Excerpt from a March 10, 2015 draft, p. 24:]
The likelihood of the reads R_1, \ldots, R_N given the abundance levels \rho_1, \ldots, \rho_M can be written as

  \Pr(R_1, \ldots, R_N; \rho_1, \ldots, \rho_M)
    = \prod_{n=1}^{N} \Pr(R_n; \rho_1, \ldots, \rho_M)
    = \prod_{n=1}^{N} \Big( \sum_{m=1}^{M} \Pr(R_n \mid \text{read } R_n \text{ from sequence } s_m) \Pr(s_m) \Big)
    = \prod_{n=1}^{N} \Big( \sum_{m=1}^{M} \alpha_{nm} \rho_m \Big),

where \alpha_{nm} \triangleq \Pr(R_n \mid \text{read } R_n \text{ from sequence } s_m) can be obtained efficiently using an alignment algorithm such as the ones based on the Burrows-Wheeler transform; see, e.g., [85], [86]. Therefore, given \{\alpha_{nm}\}_{n,m}, the maximum likelihood estimation of the abundance levels can be stated as

  \hat{\rho}_{\mathrm{ML}} = \arg\min_{\rho} \; -\sum_{n=1}^{N} \log \Big( \sum_{m=1}^{M} \alpha_{nm} \rho_m \Big)
  \quad \text{s.t.} \; \sum_{m=1}^{M} \rho_m = 1, \ \text{and} \ \rho_m \ge 0, \ \forall m = 1, \ldots, M.    (36)

As a special case of the EM algorithm, a popular approach for solving this optimization problem is to successively minimize a locally tight upper bound of the objective function. In particular, the eXpress software [87] solves the following optimization problem at the r-th iteration of the algorithm:

  \rho^{r+1} = \arg\min_{\rho} \; -\sum_{n=1}^{N} \Bigg( \sum_{m=1}^{M} \frac{\alpha_{nm} \rho_m^r}{\sum_{m'=1}^{M} \alpha_{nm'} \rho_{m'}^r} \log \Big( \frac{\rho_m}{\rho_m^r} \Big) + \log \Big( \sum_{m=1}^{M} \alpha_{nm} \rho_m^r \Big) \Bigg)
  \quad \text{s.t.} \; \sum_{m=1}^{M} \rho_m = 1, \ \text{and} \ \rho_m \ge 0, \ \forall m = 1, \ldots, M.    (37)

Using Jensen's inequality, it is not hard to check that (37) is a valid upper bound of (36) in the BSUM framework. Moreover, (37) has a closed-form solution given by

  \rho_m^{r+1} = \frac{1}{N} \sum_{n=1}^{N} \frac{\alpha_{nm} \rho_m^r}{\sum_{m'=1}^{M} \alpha_{nm'} \rho_{m'}^r}, \quad \forall m = 1, \ldots, M,

which makes the algorithm computationally efficient at each step.
For another application of the BSUM algorithm in classical genetics, the readers are referred to the traditional gene counting algorithm [88].
2) Tensor decomposition: The CANDECOMP/PARAFAC (CP) decomposition has applications in different areas such as chemometrics [89], [90], clustering [91], and compression [92].
Closed-form update!
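The closed-form update above translates directly into a few lines of NumPy (the function name `abundance_em` is our own; in practice `alpha` would come from a read aligner):

```python
import numpy as np

def abundance_em(alpha, n_iters=200):
    """EM / BSUM iteration (37) for transcript abundance estimation.
    alpha[n, m] = Pr(read R_n | sequence s_m); rows index reads."""
    N, M = alpha.shape
    rho = np.full(M, 1.0 / M)             # uniform start on the simplex
    for _ in range(n_iters):
        w = alpha * rho                    # unnormalized posteriors
        w /= w.sum(axis=1, keepdims=True)  # Pr(s_m | R_n) under current rho
        rho = w.sum(axis=0) / N            # closed-form minimizer of (37)
    return rho
```

Each iteration stays on the probability simplex by construction, and the per-iteration cost is a single O(NM) pass over the alignment weights.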