Transcript
Page 1

Large Scale Optimization for Machine Learning

Meisam Razaviyayn, Lecture 21

razaviya@usc.edu

Page 2

Announcements:

• Midterm exams

• Return it next Tuesday


Page 3

Non-smooth Objective Function

• Sub-gradient

  • Typically slow and no good termination criteria (other than cross-validation)

• Proximal Gradient
  • Fast, assuming each iteration is easy

• Block Coordinate Descent
  • Also helpful for exploiting multi-block structure

• Alternating Direction Method of Multipliers (ADMM)
  • Will be covered later


Page 4

Multi-Block Structure and BCD Method

Block Coordinate Descent (BCD) Method:

Simple and scalable: Lasso example

At iteration r, choose an index i and minimize the objective over block i while keeping all other blocks fixed.

Choice of index i: cyclic, randomized, or greedy


Very different from the earlier incremental GD, SGD, …

Geometric Interpretation
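To make the Lasso example concrete, here is a minimal NumPy sketch of cyclic BCD (coordinate descent) for minimizing 0.5*||Ax - b||^2 + lam*||x||_1, where each single-coordinate subproblem has a closed-form soft-thresholding solution. The function names and the residual-tracking trick are illustrative choices, not taken from the slides.

```python
import numpy as np

def soft_threshold(z, t):
    """Solution of min_x 0.5*(x - z)^2 + t*|x| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cyclic_bcd(A, b, lam, num_epochs=100):
    """Cyclic BCD for 0.5*||Ax - b||^2 + lam*||x||_1, one coordinate per block."""
    n, d = A.shape
    x = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0)     # ||A_i||^2 for every column
    residual = b - A @ x              # maintained residual b - A x
    for _ in range(num_epochs):
        for i in range(d):            # cyclic choice of the index i
            if col_sq[i] == 0.0:
                continue
            rho = A[:, i] @ residual + col_sq[i] * x[i]   # A_i^T (b - A_{-i} x_{-i})
            x_new = soft_threshold(rho, lam) / col_sq[i]  # exact block minimizer
            residual += A[:, i] * (x[i] - x_new)          # keep residual consistent
            x[i] = x_new
    return x
```

The cyclic rule in the inner loop can be swapped for a randomized or greedy choice of i without changing the per-coordinate update.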

Page 5

Convergence of BCD Method


Assumptions:
• Separable constraints

• Differentiable/smooth objective

• Unique minimizer at each step

Necessary assumptions?

“Nonlinear Programming”, D. P. Bertsekas, for the cyclic update rule

Proof?

Page 6

Necessity of Smoothness Assumption


Not “regular”

Example: Lasso. Its objective is a smooth loss plus a separable non-smooth term, which keeps it “regular”.

Page 7

BCD and Non-smooth Objective


Theorem [Tseng 2001]. Assume:
1) The feasible set is compact.
2) Uniqueness of the minimizer at each step.
3) Separable constraints.
4) Regular objective function.

Then every limit point of the iterates is a stationary point.

Rate of convergence of BCD:
• Similar to GD: sublinear for general convex and linear for strongly convex objectives

• The same results can be shown for most popular non-smooth objectives

Definition of stationarity for non-smooth objectives?

True for the cyclic/randomized/greedy rules
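For reference, one common way to make the stationarity question precise (the convention used in Tseng's line of work; the lecture's exact definition may differ):

```latex
% x* is stationary for a (possibly non-smooth) f if no feasible direction is a descent direction:
\[
  f'(x^\star; d) \;:=\; \lim_{t \downarrow 0} \frac{f(x^\star + t\,d) - f(x^\star)}{t} \;\ge\; 0
  \qquad \text{for every feasible direction } d.
\]
% For composite objectives f = g + h with g smooth and h convex (e.g., Lasso),
% this is equivalent to 0 \in \nabla g(x^\star) + \partial h(x^\star).
```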

Page 8

Uniqueness of the Minimizer


[Michael J. D. Powell, 1973]: a classical example in which cyclic coordinate minimization cycles indefinitely without approaching a stationary point.

Page 9

Uniqueness of the Minimizer


Tensor PARAFAC Decomposition

NP-hard [Håstad 1990]

[Carroll 1970], [Harshman 1970]: Alternating Least Squares

[Figure: reconstruction error vs. iterates for ALS, illustrating the “swamp” effect]
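For completeness, a minimal NumPy sketch of Alternating Least Squares for a rank-R CP/PARAFAC fit of a 3-way tensor: each factor matrix is the exact least-squares minimizer with the other two held fixed, i.e., a three-block BCD scheme. The function name and the einsum/pseudo-inverse formulation are illustrative choices.

```python
import numpy as np

def cp_als(T, R, num_iters=100, seed=0):
    """ALS for a rank-R CP decomposition of a 3-way tensor T (three-block BCD)."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(num_iters):
        # Each update is an exact least-squares solve with the other factors fixed.
        A = np.einsum('ijk,jr,kr->ir', T, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Relative reconstruction error of the current factors (the quantity plotted above):
# err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
```

Tracking this error over the iterations is what produces plots like the one above; the long flat stretches are the “swamp” effect.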

Page 10

BCD Limitations


• Uniqueness of the minimizer

• Each sub-problem needs to be easily solvable

Page 11

BCD Limitations


• Uniqueness of the minimizer

• Each sub-problem needs to be easily solvable

Popular Solution: Inexact BCD

At iteration r, choose an index i and minimize an approximation of the objective over block i, keeping the other blocks fixed.

Page 12

BCD Limitations


• Uniqueness of the minimizer

• Each sub-problem needs to be easily solvable

Popular Solution: Inexact BCD

At iteration r, choose an index i and minimize a local approximation of the objective over block i, keeping the other blocks fixed.

Local approximation of the objective function

Block successive upper-bound minimization, block successive convex approximation, convex-concave procedure, majorization-minimization, DC programming, BCGD, …

Page 13

Idea of Block Successive Upper-bound Minimization


Global upper-bound:

Page 14

Idea of Block Successive Upper-bound Minimization


Locally tight:

Global upper-bound:

Page 15

Idea of Block Successive Upper-bound Minimization


Locally tight:

Global upper-bound:

Monotone Algorithm

Every limit point is a stationary point
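Putting the two conditions and the resulting update into symbols (notation is mine: objective f over blocks x_1, …, x_n, surrogate u_i for block i):

```latex
% Surrogate u_i( . ; y) for block i, built around the current point y:
\[
  \textbf{Global upper bound: } \; u_i(x_i;\, y) \;\ge\; f(x_i,\, y_{-i}) \quad \forall\, x_i,\; \forall\, y,
  \qquad
  \textbf{Locally tight: } \; u_i(y_i;\, y) \;=\; f(y).
\]
% BSUM update at iteration r with selected block i:
\[
  x_i^{r+1} \in \arg\min_{x_i \in X_i} u_i\!\left(x_i;\, x^r\right), \qquad x_j^{r+1} = x_j^r \;\; (j \neq i).
\]
% Monotonicity follows immediately:
%   f(x^{r+1}) \le u_i(x_i^{r+1}; x^r) \le u_i(x_i^r; x^r) = f(x^r).
```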

Page 16

Example 1: Block Coordinate (Proximal) Gradient Descent


Smooth scenario:

Page 17

Example 1: Block Coordinate (Proximal) Gradient Descent


Smooth scenario:

Non-smooth scenario:

Page 18

Example 1: Block Coordinate (Proximal) Gradient Descent


Smooth scenario:

Non-smooth scenario:

Using Bregman divergence

Page 19

Example 1: Block Coordinate (Proximal) Gradient Descent


Smooth scenario:

Non-smooth scenario:

Alternating Proximal Minimization:

Using Bregman divergence
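As a minimal sketch, one such inexact BCD step for a composite objective g(x) + lam*||x||_1 with g smooth (the Lasso again): the smooth part is linearized at the current iterate and a quadratic proximal term is added, which is a valid BSUM surrogate whenever the step size is at most 1/L_i for the block Lipschitz constant L_i. All names and the step-size rule are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # prox of t*||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_prox_grad_step(x, i, grad_g, lam, step, blocks):
    """One block coordinate proximal gradient step on block i for g(x) + lam*||x||_1:
    minimize  g(x^r) + <grad_i g(x^r), x_i - x_i^r> + (1/(2*step))*||x_i - x_i^r||^2 + lam*||x_i||_1,
    whose minimizer is a proximal (soft-thresholded) gradient update on block i."""
    idx = blocks[i]                  # indices belonging to block i
    g_i = grad_g(x)[idx]             # partial gradient for block i
    x_new = x.copy()
    x_new[idx] = soft_threshold(x[idx] - step * g_i, step * lam)
    return x_new

# Illustrative usage with the Lasso smooth part g(x) = 0.5*||A x - b||^2:
# grad_g = lambda x: A.T @ (A @ x - b)
# step   = 1.0 / np.linalg.norm(A[:, blocks[i]], 2) ** 2   # 1/L_i for block i
```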

Page 20

Example 2: Expectation Maximization Algorithm


Page 21

Example 2: Expectation Maximization Algorithm


Page 22

Example 2: Expectation Maximization Algorithm


Jensen’s inequality
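The connection to BSUM, written out: Jensen's inequality turns the negative log-likelihood with latent variables into a locally tight global upper bound, and the M-step minimizes that bound. A generic sketch (not necessarily the lecture's exact notation):

```latex
% Negative log-likelihood with latent variable z and parameter \theta:
\[
  -\log p(x;\theta)
  \;=\; -\log \sum_{z} q(z)\,\frac{p(x,z;\theta)}{q(z)}
  \;\le\; -\sum_{z} q(z)\,\log \frac{p(x,z;\theta)}{q(z)}
  \qquad \text{(Jensen, for any distribution } q \text{ over } z\text{)}.
\]
% Choosing q(z) = p(z | x; \theta^r) makes the bound tight at \theta = \theta^r,
% so minimizing the bound over \theta (the M-step) is a BSUM-type update.
```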

Page 23

Example 3: Transcript Abundance Estimation


(Paper excerpt shown on the slide:)

… levels $\rho_1, \dots, \rho_M$ can be written as
$$
\Pr(R_1, \dots, R_N; \rho_1, \dots, \rho_M)
= \prod_{n=1}^{N} \Pr(R_n; \rho_1, \dots, \rho_M)
= \prod_{n=1}^{N} \left( \sum_{m=1}^{M} \Pr(R_n \mid \text{read } R_n \text{ from sequence } s_m)\, \Pr(s_m) \right)
= \prod_{n=1}^{N} \left( \sum_{m=1}^{M} \alpha_{nm} \rho_m \right),
$$
where $\alpha_{nm} \triangleq \Pr(R_n \mid \text{read } R_n \text{ from sequence } s_m)$ can be obtained efficiently using an alignment algorithm such as the ones based on the Burrows-Wheeler transform; see, e.g., [85], [86]. Therefore, given $\{\alpha_{nm}\}_{n,m}$, the maximum likelihood estimation of the abundance levels can be stated as
$$
\hat{\rho}_{\mathrm{ML}} = \arg\min_{\rho} \; -\sum_{n=1}^{N} \log\!\left( \sum_{m=1}^{M} \alpha_{nm} \rho_m \right)
\quad \text{s.t.} \;\; \sum_{m=1}^{M} \rho_m = 1, \;\; \rho_m \ge 0, \;\; \forall\, m = 1, \dots, M.
\tag{36}
$$
As a special case of the EM algorithm, a popular approach for solving this optimization problem is to successively minimize a locally tight upper-bound of the objective function. In particular, the eXpress software [87] solves the following optimization problem at the r-th iteration of the algorithm:
$$
\rho^{r+1} = \arg\min_{\rho} \; -\sum_{n=1}^{N} \left( \sum_{m=1}^{M} \frac{\alpha_{nm} \rho_m^r}{\sum_{m'=1}^{M} \alpha_{nm'} \rho_{m'}^r} \log\!\left( \frac{\rho_m}{\rho_m^r} \right) + \log\!\left( \sum_{m=1}^{M} \alpha_{nm} \rho_m^r \right) \right)
\quad \text{s.t.} \;\; \sum_{m=1}^{M} \rho_m = 1, \;\; \rho_m \ge 0, \;\; \forall\, m = 1, \dots, M.
\tag{37}
$$
Using Jensen’s inequality, it is not hard to check that (37) is a valid upper-bound of (36) in the BSUM framework. Moreover, (37) has a closed-form solution given by
$$
\rho_m^{r+1} = \frac{1}{N} \sum_{n=1}^{N} \frac{\alpha_{nm} \rho_m^r}{\sum_{m'=1}^{M} \alpha_{nm'} \rho_{m'}^r},
\quad \forall\, m = 1, \dots, M,
$$
which makes the algorithm computationally efficient at each step.

For another application of the BSUM algorithm in classical genetics, the readers are referred to the traditional gene counting algorithm [88].

2) Tensor decomposition: The CANDECOMP/PARAFAC (CP) decomposition has applications in different areas such as chemometrics [89], [90], clustering [91], and compression [92]. For the ease …
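A small NumPy sketch of the closed-form update above, assuming the alignment probabilities alpha are given as an N-by-M array; the function names are illustrative.

```python
import numpy as np

def em_abundance_update(alpha, rho):
    """One BSUM/EM iteration of (37): per-read responsibilities, then averaging.
    alpha: (N, M) array, alpha[n, m] = Pr(R_n | read R_n from sequence s_m).
    rho:   (M,) current abundance estimate on the simplex."""
    weights = alpha * rho                          # alpha_{nm} * rho_m^r
    weights /= weights.sum(axis=1, keepdims=True)  # normalize over m for each read n
    return weights.mean(axis=0)                    # rho_m^{r+1} = (1/N) * sum_n weights[n, m]

def estimate_abundances(alpha, num_iters=200):
    """Run the update from the uniform initialization rho_m = 1/M."""
    N, M = alpha.shape
    rho = np.full(M, 1.0 / M)
    for _ in range(num_iters):
        rho = em_abundance_update(alpha, rho)
    return rho
```

Each iteration is a single O(NM) pass, and the iterates stay on the probability simplex by construction.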

Page 24

Example 3: Transcript Abundance Estimation


(Same paper excerpt as on Page 23.)

Page 25

Example 3: Transcript Abundance Estimation


(Same paper excerpt as on Page 23.)

Page 26

Example 3: Transcript Abundance Estimation


(Same paper excerpt as on Page 23.)

Closed-form update!

