complementairty formulations for -...

Complementairty Formulations forL0-norm Optimization Problems1

John E. Mitchell

Department of Mathematical SciencesRPI, Troy, NY 12180 USA

SIAM Conference on OptimizationSan Diego, May 2014

1Joint work with Mingbin Feng, Jong-Shi Pang, Xin Shen,and Andreas Wächter.Supported by AFOSR and NSF.

Mitchell (RPI) Complementarity for L0 SIOPT 2014 1 / 22

Outline

1 Introduction and Applications

2 L0-norm minimization

3 Conclusions


Introduction and Applications

Outline



3 Conclusions



L0-norm minimization

Want to solve:

minx{f (x) + γ||x ||0 : Ax ≥ b, x ∈ IRn}

where A ∈ IRm×n, b ∈ IRm, and ||x ||0 = card{xi : xi 6= 0}.

The parameter γ reflects the relative importance of sparsity and f (x).

L0-norm minimization is of interest in feature selection, compressedsensing, misclassification minimization, and almost any problem wherea sparse solution is desired.

Can be modeled as a Mathematical Program with ComplementarityConstraints or MPCC.



Eg: Misclassification minimization

Given (C, γ, ε) > 0, and points x i with observed values yi :

minimize(w ,b)

Cn∑

i=1

max{|wT x i + b − yi | − ε, 0

}+ 1

2 wT w︸︷︷︸standard SVM objective

+ γ

n∑i=1

{1 if |wT x i + b − yi | > ε

0 if |wT x i + b − yi | ≤ ε︸︷︷︸number of misclassified points

(Mangasarian)



Eg: Minimum portfolio revision

Given portfolio x0 in standard unit simplex ∆n , {x ∈ Rn+ | 1T x = 1},

consider a portfolio revision problem as follows:

minimizex∈∆n

c2

xT Vx︸︷︷︸variance

+c ′ VaRβ(x)

︸︷︷︸composite risk measure

+ K ‖ x − x0 ‖0︸︷︷︸portfolio revision cost

subject to µT x ≥ R (expected portfolio return),

where VaRβ(x) is the β-value-at-risk of the portfolio x with β ∈ (0,1).


L0-norm minimization

Outline



3 Conclusions


L0-norm minimization Formulation

Complementarity formulationWant to solve:

minx{f (x) + γ||x ||0 : Ax ≥ b, x ∈ IRn}

Equivalently:

minx±,ξ f (x) +∑n

i=1(1− ξi)

subject to Ax ≥ b

0 ≤ ξ ≤ 1

0 ≤ ξ ⊥ x

Note that if xi = 0 then we can choose ξi = 1,and if xi 6= 0 then we must set ξi = 0.

Thus, the objective counts the number of nonzero components.

No need for any big-M terms in this LPCC formulation.Mitchell (RPI) Complementarity for L0 SIOPT 2014 8 / 22


Complementarity formulation


i=1(1− ξi)

subject to Ax ≥ b

0 ≤ ξ ≤ 1

0 ≤ ξ ⊥ x

This is a half-complementary formulation;



Complementarity formulation


i=1(1− ξi)

subject to Ax ≥ b

0 ≤ ξ ≤ 1

0 ≤ ξ ⊥ x+ + x− ≥ 0, 0 ≤ x+ ⊥ x− ≥ 0, x = x+ − x−

This is a half-complementary formulation;can also get a full-complementary formulation by splitting x = x+ − x−.



KKT points

The Constant Rank Constraint Qualification holds.It also holds under additional assumptions with nonlinear constraints.

Let x satisfy Ax ≥ b and assume x minimizes f (x) for someassignment of the complementarities,set ξ to count the number of nonzero components:

This point is a local minimizer and a strongly stationary pointof the MPCC formulation.



Relaxed NLP formulationLet ε > 0. Assume f (x) ≡ 0. Can get a relaxed formulation:

minx±,ξ∑n

i=1(1− ξi)

subject to Ax+ − Ax− ≥ b

0 ≤ x+, x−

0 ≤ ξ ≤ 1

ξi (x+i + x−i ) ≤ ε i = 1, . . . ,n

CQ holds for this problem.

Not every x is a local minimizer.

A sequence of global minimizers as ε→ 0 will have a subsequencethat converges to a global minimizer of the L0-norm problem.

Under certain conditions, the solution to the relaxed formulation is asolution to a reweighted version of the L1-norm minimization problem.



Penalty method

Can penalize violations of complementarity in many ways:

• Add (x+ + x−)T ξ to the objective function.

• Add (x+ + x−)T ξ + (x+)T x− to the objective function.Corresponds to the AMPL keyword “complements”when using knitro.

• Add (((x+ + x−)T ξ)2 to the objective function.

• Add ((x+ + x−)T ξ + (x+)T x−))2 to the objective function.

• Use the Fischer-Burmeister function, or some other NCP function.

Theoretical investigation is ongoing.

Surprisingly, the fourth formulation has worked well in our tests.


L0-norm minimization Computational Results

Test problems

Examine three classes of problems:

• Minimize L0-norm, without another objective f (x).

• Look at weighted combinations of f (x) and L0-norm.

• Look at signal recovery problems.

In each case, compare solutions obtained by a nonlinear programmingpackage (KNITRO, SNOPT, CONOPT, MINOS) for ourcomplementarity formulation with an L1-norm formulation.

Have no guarantee of global optimality.


L0-norm minimization Minimize L0-norm

Minimize number of non zeroes

minx∈IRn ‖x‖0,1s.t. Ax ≥ b.

Look at four formulations:

MILP: integer programming formulation with a big-M

L1: L1-approximation solved as an LP.Objective reweighted iteratively (Candes et al).

fullcomp: complementarity formulation solved as NLP

halfcomp: modified complementarity formulation, solved as NLP

Test problemsA and b generated randomly, each entry in U[−1,1].



A is 30× 50:50 test problems, various solvers and start points



A is 300× 500: performance profile of sparsity


L0-norm minimization Pareto frontier

Weighted combinations

minx∈IRn q(x) + β‖x‖0,1s.t. Ax ≥ b.

Solve for each norm for various different choices of β.

In our tests, A is 30× 50, q(x) is strictly convex quadratic function.

The L0-norm formulations are solved using KNITRO under AMPL.


L0-norm minimization Pareto frontier

Pareto frontier


L0-norm minimization Signal recovery

Signal recovery

minx∈IRn

‖Ax − b‖22 + β‖x‖0,1

Look at different choices of regularizing parameter β.

A is 256× 1024.In original signal, 40 entries of x are nonzero.Random noise is added to give b.

L1-norm results are improved by debiasing:fix zeroes in L1 solution at zero, then look for best least squares fit.L1-debiased leads to many small nonzero components in our tests.


L0-norm minimization Signal recovery

Signal recovery sparsity

10−6 10−4 10−2 100100

101

102

103

104

y=40

Regularizing Parameter

Num

ber o

f NZ

in o

ptim

al s

olut

ion

y=40y=40

L0L1L1−Debiased


Conclusions

Outline



3 Conclusions


Conclusions

Conclusions

A complementarity formulation can be very effective at finding sparsesolutions to L0-norm optimization problems, far more quickly than anMILP approach but more slowly than an L1-norm LP approach.

In the first class, outperformed an L1-norm formulation even withsophisticated iterative schemes for the L1 approach.

For Pareto frontier, get better results than L1 for small instance. So far,these results haven’t yet extended well to larger instances.

For signal recovery, able to recover signal effectively. Currentlyinvestigating different distributions and test instances.

At present, we only have a very partial understanding as to why theL0-norm approach works so well. Theoretical investigation is ongoing!


complementairty formulations for -...

Documents