2008 : a case study: how to speed up the simplex algorithms on problems minimizing l1 and linf norms

http://www.mosek.com

A Case Study: How to Speed Up the Simplex Algorithms on Problems Minimizing L1 and Linf Norms

Bo Jensen, MOSEK ApS, Fruebjergvej 3, Box 16, 2100 Copenhagen, Denmark. Email: [email protected]

Introduction


What is MOSEK

Introduction

Aim of talk

The case

Formulations

Getting feasible

Getting optimal

Computational results

Conclusions


■ A software package for solving large-scale optimization problems.

■ Solves linear, conic, and nonlinear convex problems.

■ Has mixed-integer capabilities.

■ Used to solve problems with up to millions of constraints and variables.

For further information see www.mosek.com.

Aim of talk


Focusing on good old simplex issues, we demonstrate through a case study:

■ How to speed up the simplex optimizers on specially structured problems.

■ The importance of modeling issues.

The computational results will show that we can reduce the simplex solve time by a factor of several by exploiting norm structures!

Due to limited presentation time we will focus on the primal simplex optimizer.

The case


Norm minimization


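■ The problem(s):

$$
\begin{aligned}
(P)\quad \min\ & \|x\|_p \\
\text{s.t.}\ & Ax = b, \\
& b_l \le x \le b_u.
\end{aligned}
$$

Assume $b_l < 0$ and $b_u > 0$.

■ We will look at:

◆ The L1 norm, i.e. $\|x\|_1 = \sum_{j \in K} |x_j|$.

◆ The L∞ norm, i.e. $\|x\|_\infty = \max_{j \in K} |x_j|$.

Here K is a set of variable indices.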

Applications


Norm minimization is used in many applications, since a norm can be seen as a measure of 'distance' from one solution to another.

■ Statistical applications such as curve fitting, or more specialized applications such as data protection.

■ Sparse solution estimates (L1), e.g. basis pursuit (van den Berg and Friedlander [EWOUT:08]).

■ Applications where 'errors' should be minimized.

Formulations


How should we formulate the problem ?


Absolute values, i.e. terms $c_j |x_j|$ in the objective, can be formulated in several ways:

■ Split $x_j$ into two variables, i.e. a positive and a negative part (variable approach).

◆ $x_j = x_j^+ + x_j^-$ where $x_j^+ \ge 0$, $x_j^- \le 0$ and $c_j^+ = -c_j^- = c_j$.

■ Add two extra constraints and one extra variable for each variable (constraint approach).

◆ Add $x_j \le t_j$ and $-x_j \le t_j$ where $t_j \ge 0$ and $c_{t_j} = c_j$.

Other variations involving $|x_j|$ can occur.
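To make the two approaches concrete, here is a minimal sketch that takes a fixed vector x and checks that both formulations charge the same cost, namely the weighted sum of absolute values. The helper names `split_parts` and `tightest_t` and the sample data are hypothetical and only serve this illustration.

```python
def split_parts(x):
    """Return (x_plus, x_minus) with x_plus >= 0, x_minus <= 0 and x = x_plus + x_minus."""
    x_plus = [max(v, 0.0) for v in x]
    x_minus = [min(v, 0.0) for v in x]
    return x_plus, x_minus


def tightest_t(x):
    """Smallest t_j satisfying x_j <= t_j and -x_j <= t_j, i.e. t_j = |x_j|."""
    return [max(v, -v) for v in x]


# Hypothetical sample data.
x = [3.0, -1.5, 0.0]
c = [1.0, 2.0, 4.0]
x_plus, x_minus = split_parts(x)

# Variable approach: cost c_j on x_plus_j and -c_j on x_minus_j.
cost_variable = sum(cj * p - cj * m for cj, p, m in zip(c, x_plus, x_minus))
# Constraint approach: cost c_j on t_j.
cost_constraint = sum(cj * t for cj, t in zip(c, tightest_t(x)))

print(cost_variable, cost_constraint)  # both are sum_j c_j * |x_j|
```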

Example L1 norm


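■ Variable approach:

$$
\begin{aligned}
(P1)\quad \min\ & 1^T x^+ - 1^T x^- \\
\text{s.t.}\ & A x^+ + A x^- = b, \\
& 0 \le x^+ \le b_u, \quad b_l \le x^- \le 0.
\end{aligned}
$$

■ Constraint approach (most likely customer formulation):

$$
\begin{aligned}
(P2)\quad \min\ & 1^T t \\
\text{s.t.}\ & Ax = b, \\
& Ix \le t, \\
& -Ix \le t, \\
& b_l \le x \le b_u.
\end{aligned}
$$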

Good formulation L1 norm


Normally the variable approach is better. Why?

■ The problem dimensions are smaller, i.e. the same number of columns but fewer rows. There are more nonzeros in A, but not in the basis, since $x^+$ and $x^-$ have identical columns and therefore cannot both be basic.

■ We can better exploit this structure (see later slides).
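The size argument can be sketched by counting rows, columns and nonzeros of (P1) and (P2) for an m-by-n matrix A. The function names and the sample sizes below are hypothetical; the counts follow directly from the two formulations (bounds are not counted as rows).

```python
def p1_size(m, n, nnz_a):
    """Variable approach (P1): rows of Ax+ + Ax- = b, columns x+ and x-."""
    return {"rows": m, "columns": 2 * n, "nonzeros": 2 * nnz_a}


def p2_size(m, n, nnz_a):
    """Constraint approach (P2): rows Ax = b, Ix <= t, -Ix <= t; columns x and t."""
    # Each of the 2n norm rows has two nonzeros (one for x_j, one for t_j).
    return {"rows": m + 2 * n, "columns": 2 * n, "nonzeros": nnz_a + 4 * n}


# Hypothetical sizes: 100 rows, 1000 columns, 5000 nonzeros in A.
print(p1_size(100, 1000, 5000))
print(p2_size(100, 1000, 5000))
```

Both forms have 2n columns, but (P2) carries 2n additional rows, which directly enlarges the basis the simplex method has to factorize.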

Example L∞ norm


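■ Variable approach:

$$
\begin{aligned}
(P3)\quad \min\ & t_\infty \\
\text{s.t.}\ & A x^+ + A x^- = b, \\
& I x^+ - I x^- \le 1 t_\infty, \\
& 0 \le x^+ \le b_u, \quad b_l \le x^- \le 0.
\end{aligned}
$$

■ Constraint approach (most likely customer formulation):

$$
\begin{aligned}
(P4)\quad \min\ & t_\infty \\
\text{s.t.}\ & Ax = b, \\
& Ix \le 1 t_\infty, \\
& -Ix \le 1 t_\infty, \\
& b_l \le x \le b_u.
\end{aligned}
$$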

Good formulation L∞ norm


Normally the constraint approach is better. Why?

■ There are many constraints compared to variables.

■ $x^+$ and $x^-$ are not duplicated columns as in the L1 case.

■ Dualizing the L∞ problem gives a structure similar to the L1 problem.

■ If we dualize (P4) we get duplicated columns.

■ Since we choose to dualize, we do not mind the extra constraints in the formulation (they become variables in the dual problem).
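The claim that dualizing the L∞ problem yields an L1-like structure can be illustrated with a short derivation. This is only a sketch under a simplifying assumption: the bounds $b_l \le x \le b_u$ of (P4) are dropped. The multiplier names $y$, $u$, $v$ are introduced here and do not appear in the slides.

```latex
% Assumption: the bounds b_l <= x <= b_u of (P4) are dropped to keep the sketch short.
% y: multipliers of Ax = b;  u, v >= 0: multipliers of x <= 1t and -x <= 1t.
\begin{aligned}
\text{(primal)}\quad & \min_{x,\,t}\ t
  \quad \text{s.t.}\quad Ax = b,\ \ x - \mathbf{1}t \le 0,\ \ -x - \mathbf{1}t \le 0, \\
\text{(dual)}\quad & \max_{y,\,u,\,v}\ b^T y
  \quad \text{s.t.}\quad A^T y = u - v,\ \ \mathbf{1}^T u + \mathbf{1}^T v = 1,\ \ u, v \ge 0.
\end{aligned}
% Equivalently: max b^T y  s.t.  ||A^T y||_1 <= 1, i.e. an L1-norm structure.
```

The pair $(u, v)$ plays the same role in the dual as the split $(x^+, x^-)$ does in the L1 formulation, and every extra norm constraint of the primal has become a dual variable.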

Reformulation of the L1 norm


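$$
\begin{aligned}
(P5)\quad \min\ & t_1 + t_2 \\
\text{s.t.}\ & x_1 + x_2 = 2, \\
& x_1 \le t_1, \quad -x_1 \le t_1, \\
& x_2 \le t_2, \quad -x_2 \le t_2.
\end{aligned}
$$

$$
\begin{aligned}
(P6)\quad \min\ & x_3 - x_4 + x_5 - x_6 \\
\text{s.t.}\ & x_3 + x_4 + x_5 + x_6 = 2, \\
& 0 \le x_3, \quad x_4 \le 0, \quad 0 \le x_5, \quad x_6 \le 0.
\end{aligned}
$$

Here $x_1 = x_3 + x_4$ and $x_2 = x_5 + x_6$.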
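As a quick sanity check of the small example (not part of the original slides), the optimal value of (P5) follows from the triangle inequality:

```latex
t_1 + t_2 \;\ge\; |x_1| + |x_2| \;\ge\; |x_1 + x_2| \;=\; 2
```

Equality holds, for instance, at $x_1 = x_2 = 1$, $t_1 = t_2 = 1$, so the optimal value is 2. In (P6) the same point corresponds to $x_3 = x_5 = 1$, $x_4 = x_6 = 0$.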

Reformulation of the L∞ norm


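$$
\begin{aligned}
(P7)\quad \min\ & t \\
\text{s.t.}\ & x_1 + x_2 = 2, \\
& x_1 \le t, \quad -x_1 \le t, \\
& x_2 \le t, \quad -x_2 \le t.
\end{aligned}
$$

$$
\begin{aligned}
(P8)\quad \min\ & t \\
\text{s.t.}\ & x_3 + x_4 + x_5 + x_6 = 2, \\
& x_3 - x_4 \le t, \\
& x_5 - x_6 \le t, \\
& 0 \le x_3, \quad x_4 \le 0, \quad 0 \le x_5, \quad x_6 \le 0.
\end{aligned}
$$

Here $x_1 = x_3 + x_4$ and $x_2 = x_5 + x_6$.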
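A similar sanity check (again not part of the original slides) gives the optimal value of (P7):

```latex
t \;\ge\; \max(|x_1|, |x_2|) \;\ge\; \tfrac{1}{2}\,(|x_1| + |x_2|) \;\ge\; \tfrac{1}{2}\,|x_1 + x_2| \;=\; 1
```

Equality holds at $x_1 = x_2 = 1$, $t = 1$, so the optimal value is 1. In (P8) this point corresponds to $x_3 = x_5 = 1$, $x_4 = x_6 = 0$, $t = 1$.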

Getting feasible


Improving the primal simplex optimizer for L1 and L∞ norm


■ Exploit duplicated columns when primal infeasible:

◆ Find duplicated columns (i.e. $x^+$ and $x^-$) and merge them into one variable.

◆ This reduces the number of primal breakpoints, so we can take longer steps.

Example: For L1 the split variables can be merged.

$$
\begin{aligned}
\min\ & x_3 - x_4 + x_5 - x_6 \\
\text{s.t.}\ & x_3 + x_4 + x_5 + x_6 = 2, \\
& 0 \le x_3, \quad x_4 \le 0, \quad 0 \le x_5, \quad x_6 \le 0.
\end{aligned}
$$
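Finding the duplicated columns can be sketched as follows. This is an illustrative helper, not MOSEK's implementation: the function name is hypothetical, columns are compared entry by entry for exact equality, and the small matrix is made-up data of the form [A A].

```python
from collections import defaultdict


def duplicated_column_groups(columns):
    """Group the indices of columns that are entry-by-entry identical."""
    groups = defaultdict(list)
    for j, column in enumerate(columns):
        groups[tuple(column)].append(j)
    # Keep only groups with at least two members, in order of first appearance.
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical constraint matrix [A A] stored column by column,
# with A = [[1, 2], [0, 1]]: columns 0, 1 belong to x+, columns 2, 3 to x-.
columns = [[1.0, 0.0], [2.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
print(duplicated_column_groups(columns))
```

Each group (j, j + n) can then be treated as a single variable $x_j = x_j^+ + x_j^-$ with bounds $b_l \le x_j \le b_u$ while the solver is still looking for a feasible point.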

Improving the primal simplex optimizer for L1 and L∞ norm (Cont.)


■ Exploit scalable variables when primal infeasible: variables that can always make a set of constraints feasible, regardless of the level of the other variables.

Example: For L∞, t can always be set at a level that makes the norm constraints feasible.

$$
\begin{aligned}
\min\ & t \\
\text{s.t.}\ & x_1 + x_2 = 2, \\
& x_1 \le t, \quad -x_1 \le t, \\
& x_2 \le t, \quad -x_2 \le t.
\end{aligned}
$$
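The role of t as a scalable variable can be sketched in a few lines: whatever values the x variables take, choosing t as the largest absolute value satisfies every norm constraint. The function name and the sample point are hypothetical.

```python
def feasible_t(x):
    """Smallest t with x_j <= t and -x_j <= t for all j, i.e. t = max_j |x_j|."""
    return max(abs(v) for v in x)


# Hypothetical point satisfying x1 + x2 = 2.
x = [2.5, -0.5]
t = feasible_t(x)

# All four norm constraints of the example hold for this t.
norm_rows_ok = all(v <= t and -v <= t for v in x)
print(t, norm_rows_ok)
```

So the norm constraints never need to contribute to the infeasibility measure; only the rows Ax = b and the bounds do.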

Getting optimal



Improving the primal simplex optimizer for L1 and L∞ norm (Cont.)


■ Take longer primal steps when primal feasible: consider the case where $(x^+, x^-)$, $(y^+, y^-)$ and $(z^+, z^-)$ are pairs of norm variables, and we want to find the primal step length for a move on a nonbasic variable.



■ Since the swapped columns are identical in A, no basis update is needed for a swap.

■ We can do many mini-iterations very cheaply (we need one extra solve with the basis to update the dual variables).

■ Involves a line search for finding the optimal breakpoint.

■ This is the dual version of bound swaps in the dual simplex (very important for speed!).

This is not a new idea and can be generalized to all duplicated columns (i.e. $a_j = \alpha a_k$ is a matter of rescaling).
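The line search over breakpoints can be sketched generically. This is an assumption-laden illustration, not the solver's actual code: the objective along the step direction is modeled as a piecewise-linear convex function, each breakpoint is where a basic norm variable reaches zero and is swapped with its twin, and the swap raises the slope by a given nonnegative amount. The function name and all numbers are hypothetical.

```python
def long_step(initial_slope, breakpoints):
    """Walk along a piecewise-linear convex objective and return the best step.

    initial_slope: objective change per unit step at step 0 (negative = improving).
    breakpoints:   list of (step, slope_increase) pairs with slope_increase >= 0.
    Returns (step, objective_change, breakpoints_reached).
    """
    slope = initial_slope
    step = 0.0
    change = 0.0
    reached = 0
    for theta, increase in sorted(breakpoints):
        if slope >= 0:
            # Moving further would no longer improve the objective.
            break
        change += slope * (theta - step)
        step = theta
        slope += increase  # a norm variable swaps from x+ to x- (or back)
        reached += 1
    # If the slope is still negative here, other bounds would have to block the move.
    return step, change, reached


# Hypothetical data: initial slope -5, every swap raises the slope by 2.
print(long_step(-5.0, [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0), (4.0, 2.0)]))
```

With these numbers a textbook ratio test would stop at the first breakpoint (step 1, improvement 5), while the long step continues to step 3 for a total improvement of 9, all within a single basis change.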


Computational results

Test setup


Data:

■ We focus on one data set arising from statistical data protection.

■ Publicly available data, see Castro's homepage http://www-eio.upc.es/~jcastro/.

■ Both L1 and L∞ objectives on the same set of constraints.

Test environment:

■ Double quad-core 2.5 GHz.

■ 12 GB RAM.

■ Running Linux.

L1 norm instances infeasible iterations primal simplex


Problem               Standard Specialized       Ratio
L1 bts4                  91044        7319       12.44
L1 cbs                       0           7        0.00
L1 dale                    301         145        2.08
L1 five20b              303639        9329       32.55
L1 five20c              348438        9421       36.98
L1 hier13                 5491         689        7.96
L1 hier13x13x13a          5621         663        8.42
L1 hier13x13x13d          4538         665        6.82
L1 hier13x13x13e          5729         661        8.66
L1 hier13x13x7d           1030         301        3.42
L1 hier13x7x7d             245          77        3.18
L1 hier16                14587        1083       14.05
L1 hier16x16x16a         14699        1128       13.05
L1 hier16x16x16d         14317        1146       12.44
L1 jjtabeltest3            772         355        2.17
L1 nine12                35451        2048       17.31
L1 nine5d                34837        2961       11.76
L1 ninenew               14825        1407       10.53
L1 osorio                    0           0        1.00
L1 table1                   90          35        2.57
L1 table3                 2460         366        6.72
L1 table6                  204          30        6.80
L1 table7                   19          15        1.27
L1 table8                    3           3        1.00
L1 targus                    8           4        2.00
L1 toy3dsarah              199         167        1.19
L1 two5in6               13061        1338        9.76
Num.                        27          27          NA
Num. first                   3          26          NA
Total iter              911608       41363       22.04
G. avg.                8670.35     1035.91        8.37

L1 norm instances total iterations primal simplex


Problem               Standard Specialized       Ratio
L1 bts4                 124598       27587        4.50
L1 cbs                       0          10         0.0
L1 dale                    725         315         2.3
L1 five20b              894976       94595        9.46
L1 five20c              871334       77500       11.24
L1 hier13                 8674        1203        7.21
L1 hier13x13x13a          8202        1118        7.34
L1 hier13x13x13d          7819        1098        7.12
L1 hier13x13x13e          8471        1099        7.71
L1 hier13x13x7d           1952         652        2.99
L1 hier13x7x7d             488         257        1.90
L1 hier16                30909        4381        7.06
L1 hier16x16x16a         31884        3796        8.40
L1 hier16x16x16d         28345        4266        6.64
L1 jjtabeltest3           1219         643        1.90
L1 nine12               183296       50581        3.62
L1 nine5d                83201       14137        5.89
L1 ninenew               48774       10928        4.46
L1 osorio                  645         777        0.83
L1 table1                  160         145        1.10
L1 table3                 5696        1547        3.68
L1 table6                  337         125         2.7
L1 table7                   66          87        0.76
L1 table8                    5          74        0.07
L1 targus                   25          25        1.00
L1 toy3dsarah              224         197        1.14
L1 two5in6               32046        4706        6.81
Num.                        27          27          NA
Num. first                   5          23          NA
Total iter             2374095      301894        7.86
G. avg.                9098.66     2249.05        4.05

L1 norm instances total time primal simplex


Problem               Standard Specialized       Ratio
L1 bts4                 151.26       45.24        3.34
L1 cbs                    0.03        0.05        0.60
L1 dale                   0.11        0.14        0.79
L1 five20b             8416.45     1168.03        7.21
L1 five20c             7970.91      891.02        8.95
L1 hier13                 2.07        0.38        5.45
L1 hier13x13x13a          1.93        0.33        5.85
L1 hier13x13x13d          1.94        0.34        5.71
L1 hier13x13x13e          2.13        0.34        6.26
L1 hier13x13x7d           0.24        0.09        2.67
L1 hier13x7x7d            0.03        0.02        1.50
L1 hier16                16.86        2.90        5.81
L1 hier16x16x16a         17.12        2.18        7.85
L1 hier16x16x16d         14.70        2.50        5.88
L1 jjtabeltest3           0.07        0.05        1.40
L1 nine12               359.94      123.51        2.91
L1 nine5d                88.81       19.07        4.66
L1 ninenew               50.36       11.14        4.52
L1 osorio                 0.08        0.11        0.73
L1 table1                 0.02        0.02        1.00
L1 table3                 1.78        0.30        5.93
L1 table6                 0.02        0.02        1.00
L1 table7                 0.01        0.01        1.00
L1 table8                 0.01        0.01        1.00
L1 targus                 0.00        0.00        1.00
L1 toy3dsarah             0.02        0.03        0.67
L1 two5in6               18.48        2.99        6.18
Num.                        27          27          NA
Num. first                   9          23          NA
Total time            17115.39     2270.84        7.54
G. avg.                2950.40      510.10        5.78

L∞ norm instances total time “primal simplex”


Problem               Standard dual on primal  Specialized primal on dual   Ratio
Linf bts4                               59.01                       24.92    2.37
Linf cbs                                 0.19                        0.47    0.40
Linf dale                                0.86                        2.55    0.34
Linf hier13                              2.56                        0.98    2.61
Linf hier13x13x13a                       2.18                        0.86    2.53
Linf hier13x13x13d                       2.31                        0.77    3.00
Linf hier13x13x13e                       2.45                        0.76    3.22
Linf hier13x13x7d                        0.43                        0.24    1.79
Linf hier13x7x7d                         0.05                        0.09    0.56
Linf hier16                             13.64                        3.78    3.61
Linf hier16x16x16a                      13.60                        2.77    4.91
Linf hier16x16x16d                      12.57                        2.99    4.20
Linf hier16x16x16e                      12.52                        2.92    4.29
Linf jjtabeltest3                        0.16                        0.28    0.57
Linf nine12                            272.58                      142.98    1.91
Linf nine5d                             89.82                       11.27    7.97
Linf ninenew                           132.43                        4.30   30.80
Linf osorio                              4.86                       17.20    0.28
Linf table3                              6.41                        1.09    5.88
Linf table7                              0.03                        0.06    0.50
Linf table8                              0.27                        0.38    0.71
Linf targus                              0.01                        0.03    0.33
Linf toy3dsarah                          0.06                        0.17    0.35
Linf two5in6                            18.43                        3.32    5.55
Num.                                       24                          24      NA
Num. first                                  9                          15      NA
Total time                             691.90                      234.38    2.95
G. avg.                                 16.26                        3.59    4.53

Conclusions


■ Norm structures can be exploited within the simplex optimizers with surprisingly good results.

■ Modeling issues can be very important for the simplex optimizers.

■ Simple 'tricks' using structural knowledge about models can speed up the solve dramatically.

References


[CASTRO:87] J. Castro, "Minimum-distance controlled perturbation methods for large-scale tabular data protection", European Journal of Operational Research, 171 (2006).

[EWOUT:08] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions", Technical Report TR 2008-01, Department of Computer Science, University of British Columbia.
