2008: A case study: How to speed up the simplex algorithms on problems minimizing L1 and L∞ norms
A talk I gave at the INFORMS Annual Meeting in Washington, 2008, based on some work on special client problems.
http://www.mosek.com
A Case Study: How to Speed Up the Simplex Algorithms on Problems Minimizing L1 and L∞ Norms.
Bo Jensen, MOSEK ApS, Fruebjergvej 3, Box 16, 2100 Copenhagen, Denmark. Email: [email protected]
Introduction
Outline: Introduction, Aim of talk, The case, Formulations, Getting feasible, Getting optimal, Computational results, Conclusions.
What is MOSEK
■ A software package for solving large-scale optimization problems.
■ Solves linear, conic, and nonlinear convex problems.
■ Has mixed-integer capabilities.
■ Used to solve problems with up to millions of constraints and variables.
For further information see www.mosek.com.
Aim of talk
We focus on good old simplex issues and demonstrate through a case study:
■ How to speed up the simplex optimizers on specially structured problems.
■ The importance of modeling issues.
The computational results will show that exploiting norm structures can reduce the simplex solve time several-fold!
Due to limited presentation time, we will focus on the primal simplex optimizer.
The case
Norm minimization
■ The problem(s):
$$ (P)\quad \min\ \|x\|_p \quad \text{s.t.}\quad Ax = b,\quad b_l \le x \le b_u. $$
Assume $b_l < 0$ and $b_u > 0$.
■ We will look at:
◆ The L1 norm, i.e. $\|x\|_1 = \sum_{j \in K} |x_j|$.
◆ The L∞ norm, i.e. $\|x\|_\infty = \max_{j \in K} |x_j|$.
Here K is a set of variable indices.
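Both norms can be evaluated directly when restricted to an index set K. The following is a minimal Python sketch. The function names `l1_norm` and `linf_norm` are illustrative and are not part of MOSEK.

```python
def l1_norm(x, K):
    """||x||_1 over the index set K: the sum of |x_j| for j in K."""
    return sum(abs(x[j]) for j in K)


def linf_norm(x, K):
    """||x||_inf over the index set K: the largest |x_j| for j in K (0.0 if K is empty)."""
    return max((abs(x[j]) for j in K), default=0.0)


x = [3.0, -4.0, 0.5, 10.0]
K = [0, 1, 2]  # x[3] is not part of the norm
print(l1_norm(x, K))    # 7.5
print(linf_norm(x, K))  # 4.0
```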
Applications
Norm minimization is used in many applications, since the norm can be seen as a measure of 'distance' from one solution to another.
■ Statistical applications such as curve fitting, or more specialized applications such as data protection.
■ Sparse solution estimates (L1) (basis pursuit, e.g. van den Berg and Friedlander 08).
■ Applications where 'errors' should be minimized.
Formulations
How should we formulate the problem?
Absolute values, i.e. $c_j|x_j|$ in the objective, can be formulated in several ways (both are exact when $c_j \ge 0$, as in norm minimization):
■ Split $x_j$ into two variables, i.e. a positive and a negative part (variable approach).
◆ $x_j = x_j^+ + x_j^-$ where $x_j^+ \ge 0$, $x_j^- \le 0$ and $c_j^+ = -c_j^- = c_j$.
■ Add two extra constraints and one variable for each variable (constraint approach).
◆ Add $x_j \le t_j$ and $-x_j \le t_j$ where $t_j \ge 0$ and $c_{t_j} = c_j$.
Other variations involving $|x_j|$ can occur.
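Both constructions reproduce $c_j|x_j|$ at an optimum when $c_j \ge 0$. In the variable approach, the cheapest split has at most one nonzero part. In the constraint approach, the smallest feasible $t_j$ is $|x_j|$.

The following is a minimal Python sketch of that check. The helpers `split_cost`, `best_split` and `constraint_cost` are hypothetical names used only for illustration.

```python
def split_cost(x_plus, x_minus, c_j):
    """Variable approach: c_j^+ x_j^+ + c_j^- x_j^- with c_j^+ = -c_j^- = c_j."""
    if x_plus < 0 or x_minus > 0:
        raise ValueError("need x_plus >= 0 and x_minus <= 0")
    return c_j * x_plus - c_j * x_minus


def best_split(x_j):
    """Split with at most one nonzero part; for c_j >= 0 this is the cheapest split."""
    return max(x_j, 0.0), min(x_j, 0.0)


def constraint_cost(x_j, c_j):
    """Constraint approach: the smallest t_j with x_j <= t_j, -x_j <= t_j, t_j >= 0 is |x_j|."""
    t_j = max(x_j, -x_j, 0.0)
    return c_j * t_j


x_j, c_j = -2.5, 3.0
print(split_cost(*best_split(x_j), c_j))  # 7.5 = c_j * |x_j|
print(constraint_cost(x_j, c_j))          # 7.5
print(split_cost(3.0, -2.0, 1.0))         # 5.0: a wasteful split of x_j = 1 costs more than |x_j|
```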
Example L1 norm
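■ Variable approach:
$$ (P1)\quad \min\ \mathbf{1}^T x^+ - \mathbf{1}^T x^- \quad \text{s.t.}\quad Ax^+ + Ax^- = b,\quad 0 \le x^+ \le b_u,\quad b_l \le x^- \le 0. $$
■ Constraint approach (most likely customer formulation):
$$ (P2)\quad \min\ \mathbf{1}^T t \quad \text{s.t.}\quad Ax = b,\quad Ix \le t,\quad -Ix \le t,\quad b_l \le x \le b_u. $$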
Good formulation L1 norm
Normally the variable approach is better. Why?
■ The problem dimensions are smaller, i.e. the same number of columns but fewer rows. A has more nonzeros, but the basis does not, since x+ and x− have identical columns and therefore cannot both be basic.
■ We can better exploit this structure (see later slides).
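The dimension argument can be made concrete by counting rows, columns and nonzeros. The sketch below assumes every one of the n variables is in K, so all are split in P1 and all get a $t_j$ in P2. The names `LPSize`, `size_p1` and `size_p2` are illustrative.

The basis has one row per constraint. P1 therefore works with an m×m basis, while P2 needs an (m+2n)×(m+2n) one.

```python
from dataclasses import dataclass


@dataclass
class LPSize:
    rows: int
    cols: int
    nonzeros: int


def size_p1(m, n, nnz_a):
    """Variable approach (P1): columns [A A] for (x+, x-), rows only Ax = b."""
    return LPSize(rows=m, cols=2 * n, nonzeros=2 * nnz_a)


def size_p2(m, n, nnz_a):
    """Constraint approach (P2): columns (x, t), rows Ax = b plus 2n norm rows.

    Each norm row x_j - t_j <= 0 or -x_j - t_j <= 0 has exactly two nonzeros.
    """
    return LPSize(rows=m + 2 * n, cols=2 * n, nonzeros=nnz_a + 4 * n)


# Hypothetical instance: 1,000 equality rows, 5,000 variables, 30,000 nonzeros in A.
print(size_p1(1000, 5000, 30000))  # LPSize(rows=1000, cols=10000, nonzeros=60000)
print(size_p2(1000, 5000, 30000))  # LPSize(rows=11000, cols=10000, nonzeros=50000)
```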
Example L∞ norm
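■ Variable approach:
$$ (P3)\quad \min\ t_\infty \quad \text{s.t.}\quad Ax^+ + Ax^- = b,\quad Ix^+ - Ix^- \le \mathbf{1}t_\infty,\quad 0 \le x^+ \le b_u,\quad b_l \le x^- \le 0. $$
■ Constraint approach (most likely customer formulation):
$$ (P4)\quad \min\ t_\infty \quad \text{s.t.}\quad Ax = b,\quad Ix \le \mathbf{1}t_\infty,\quad -Ix \le \mathbf{1}t_\infty,\quad b_l \le x \le b_u. $$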
Good formulation L∞ norm
Normally the constraint approach is better. Why?
■ There are many constraints compared to variables.
■ x+ and x− are not duplicated columns as in the L1 case.
■ Dualizing the L∞ problem gives a structure similar to L1.
■ If we dualize P4, we get duplicated columns.
■ Since we choose to dualize, we do not mind the extra constraints in the formulation (they become variables in the dual problem).
Reformulation of the L1 norm
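$$ (P5)\quad \min\ t_1 + t_2 \quad \text{s.t.}\quad x_1 + x_2 = 2,\quad x_1 \le t_1,\quad -x_1 \le t_1,\quad x_2 \le t_2,\quad -x_2 \le t_2. $$
$$ (P6)\quad \min\ x_3 - x_4 + x_5 - x_6 \quad \text{s.t.}\quad x_3 + x_4 + x_5 + x_6 = 2,\quad x_3 \ge 0,\ x_4 \le 0,\ x_5 \ge 0,\ x_6 \le 0. $$
Here $x_1 = x_3 + x_4$ and $x_2 = x_5 + x_6$, following P1.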
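P6 has a single equality row, so each of its basic solutions has exactly one nonzero variable, and the small example can be checked by enumeration. The sketch below makes two assumptions:
- It uses the cost vector (1, −1, 1, −1) implied by the P1 objective $\mathbf{1}^T x^+ - \mathbf{1}^T x^-$.
- It relies on the LP being bounded, which holds here because every term of that objective is nonnegative on the feasible set.

The helper `best_basic_solution` is hypothetical.

```python
def best_basic_solution(c, a, b, sign):
    """Enumerate the basic solutions of the one-row LP

        min c^T x  s.t.  a^T x = b,  sign[j] * x_j >= 0,

    and return (objective, j, x_j) for the best feasible one, or None.
    With a single equality row, every basic solution has one nonzero entry.
    """
    best = None
    for j, (c_j, a_j) in enumerate(zip(c, a)):
        if a_j == 0:
            continue
        x_j = b / a_j
        if sign[j] * x_j < 0:
            continue  # x_j would violate its sign bound
        value = c_j * x_j
        if best is None or value < best[0]:
            best = (value, j, x_j)
    return best


# P6: variables (x3, x4, x5, x6) = (x1+, x1-, x2+, x2-).
c = [1.0, -1.0, 1.0, -1.0]
a = [1.0, 1.0, 1.0, 1.0]
sign = [1, -1, 1, -1]  # x3 >= 0, x4 <= 0, x5 >= 0, x6 <= 0
print(best_basic_solution(c, a, 2.0, sign))  # (2.0, 0, 2.0): x3 = x1+ = 2
```

The optimal value 2 matches the smallest possible $|x_1| + |x_2|$ subject to $x_1 + x_2 = 2$.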
Reformulation of the L∞ norm
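$$ (P7)\quad \min\ t \quad \text{s.t.}\quad x_1 + x_2 = 2,\quad x_1 \le t,\quad -x_1 \le t,\quad x_2 \le t,\quad -x_2 \le t. $$
$$ (P8)\quad \min\ t \quad \text{s.t.}\quad x_3 + x_4 + x_5 + x_6 = 2,\quad x_3 - x_4 \le t,\quad x_5 - x_6 \le t,\quad x_3 \ge 0,\ x_4 \le 0,\ x_5 \ge 0,\ x_6 \le 0. $$
Here $x_1 = x_3 + x_4$ and $x_2 = x_5 + x_6$, following P3.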
Getting feasible
Improving the primal simplex optimizer for L1 and L∞ norms
■ Exploit duplicated columns when primal infeasible:
◆ Find duplicated columns (i.e. x+ and x−) and merge them into one variable.
◆ This reduces the number of primal breakpoints, so we can take longer steps.
Example: for L1 the split variables can be merged.
$$ \min\ x_3 - x_4 + x_5 - x_6 \quad \text{s.t.}\quad x_3 + x_4 + x_5 + x_6 = 2,\quad x_3 \ge 0,\ x_4 \le 0,\ x_5 \ge 0,\ x_6 \le 0. $$
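Duplicated columns can be detected by normalizing each sparse column by its first nonzero and hashing the result. This also catches the rescaled case $a_j = \alpha a_k$.

The following is a minimal Python sketch. Its assumptions:
- `group_parallel_columns` is a hypothetical helper.
- The rounding tolerance is an assumption.
- In a norm model the solver may simply know the (x+, x−) pairs from the formulation instead of detecting them.

The merge itself, which replaces each group by one variable with a piecewise-linear cost, is not shown.

```python
from collections import defaultdict


def group_parallel_columns(columns, digits=12):
    """Group sparse columns that are scalar multiples of each other (a_j = alpha * a_k).

    columns: list of dicts {row_index: value}.
    Returns a list of groups with at least two members; each member is
    (column_index, alpha), where alpha scales the group's first column into it.
    Values are compared after rounding to `digits` decimals (a heuristic tolerance).
    """
    buckets = defaultdict(list)
    for j, col in enumerate(columns):
        entries = sorted((i, v) for i, v in col.items() if v != 0.0)
        if not entries:
            continue  # empty columns are not merged here
        pivot = entries[0][1]  # normalize by the first nonzero
        key = tuple((i, round(v / pivot, digits)) for i, v in entries)
        buckets[key].append((j, pivot))
    groups = []
    for members in buckets.values():
        if len(members) > 1:
            base = members[0][1]
            groups.append([(j, p / base) for j, p in members])
    return groups


# Hypothetical 2-row example: (x1+, x1-) and (x2+, x2-) pairs plus one rescaled column.
cols = [
    {0: 1.0, 1: 2.0},    # x1+
    {0: 1.0, 1: 2.0},    # x1-
    {0: 3.0},            # x2+
    {0: 3.0},            # x2-
    {0: -2.0, 1: -4.0},  # -2 times the x1 column
]
print(group_parallel_columns(cols))
# [[(0, 1.0), (1, 1.0), (4, -2.0)], [(2, 1.0), (3, 1.0)]]
```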
Improving the primal simplex optimizer for L1 and L∞ norms (Cont.)
■ Exploit scalable variables when primal infeasible: variables that can always make a set of constraints feasible, regardless of the levels of the other variables.
Example: for L∞, t can always be set at a level that makes the norm constraints feasible.
$$ \min\ t \quad \text{s.t.}\quad x_1 + x_2 = 2,\quad x_1 \le t,\quad -x_1 \le t,\quad x_2 \le t,\quad -x_2 \le t. $$
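One way to read this property: whatever values x currently has, setting $t = \max_{j \in K} |x_j|$ satisfies every norm row, so those rows never need to stay infeasible in phase 1. The following is a minimal Python sketch of that property. The helper names are hypothetical, and the code is not MOSEK's phase-1 logic.

```python
def scalable_t_level(x, K):
    """Smallest t >= 0 making every norm row x_j <= t and -x_j <= t (j in K) feasible."""
    return max((abs(x[j]) for j in K), default=0.0)


def norm_row_violation(x, K, t):
    """Sum of violations of the rows x_j - t <= 0 and -x_j - t <= 0 over j in K."""
    return sum(max(x[j] - t, 0.0) + max(-x[j] - t, 0.0) for j in K)


# Any x with x1 + x2 = 2, e.g. x1 = 5, x2 = -3.
x, K = [5.0, -3.0], [0, 1]
print(norm_row_violation(x, K, 1.0))  # 6.0: with t = 1 the norm rows are violated
t = scalable_t_level(x, K)
print(t, norm_row_violation(x, K, t))  # 5.0 0.0
```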
Getting optimal
Improving the primal simplex optimizer for L1 and L∞ norms (Cont.)
■ Take longer primal steps when primal feasible: consider the case where (x+, x−), (y+, y−) and (z+, z−) are pairs of norm variables, and we want to find the primal step length for a move on a nonbasic variable.
Improving the primal simplex optimizer for L1 and L∞ norms (Cont.)
■ Since the swapped columns are identical in A, no basis update is needed for a swap.
■ We can do many mini-iterations very cheaply (one extra solve with the basis is needed to update the dual variables).
■ Involves a line search for finding the optimal breakpoint.
■ This is the primal counterpart of the bound swaps in the dual simplex (very important for speed!).
This is not a new idea, and it can be generalized to all duplicated columns (i.e. $a_j = \alpha a_k$ is a matter of rescaling).
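The line search over breakpoints can be sketched as a long-step ratio test for a piecewise-linear objective. The sketch assumes that:
- Each merged norm variable $x_i = x_i^+ + x_i^-$ has cost $c_i|x_i|$ and changes at rate $\alpha_i$ along the step.
- When $x_i$ passes through zero (the x+/x− swap), the objective slope increases by $2c_i|\alpha_i|$.

This is a generic construction under those assumptions, not MOSEK's implementation. The function name `long_step_ratio_test` is hypothetical.

```python
import math


def long_step_ratio_test(slope, breakpoints):
    """Piecewise-linear ratio test along an improving primal direction.

    slope: derivative of the objective at step 0 (must be < 0, i.e. improving).
    breakpoints: list of (theta, jump) with theta >= 0, where the objective's
        slope increases by `jump` once the step passes theta. Use math.inf as
        the jump for an ordinary bound that blocks the step outright.
    Returns (step, index): the step where the slope first becomes >= 0 and the
    index of that breakpoint, or (math.inf, None) if the direction is unbounded.
    """
    if slope >= 0:
        raise ValueError("direction is not improving")
    for i in sorted(range(len(breakpoints)), key=lambda k: breakpoints[k][0]):
        theta, jump = breakpoints[i]
        slope += jump
        if slope >= 0:
            return theta, i  # going further would increase the objective
    return math.inf, None


# A textbook ratio test stops at the first breakpoint (theta = 0.5); here three
# norm variables pass through zero before the fourth breakpoint stops the step.
bps = [(1.0, 1.0), (2.0, 1.5), (0.5, 0.2), (4.0, 10.0)]
print(long_step_ratio_test(-3.0, bps))  # (4.0, 3)
```

Sorting the breakpoints costs $O(k \log k)$ for k candidates. Each breakpoint passed is one of the cheap mini-iterations described above.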
Computational results
Test setup
Data:
■ We focus on one data set arising from statistical data protection.
■ Publicly available data; see Castro's homepage http://www-eio.upc.es/ jcastro/.
■ Both L1 and L∞ objectives on the same set of constraints.
Test environment:
■ Double quad-core 2.5 GHz.
■ 12 GB RAM.
■ Running Linux.
L1 norm instances: infeasible iterations, primal simplex
Problem | Standard | Specialized | Ratio
L1 bts4 | 91044 | 7319 | 12.44
L1 cbs | 0 | 7 | 0.00
L1 dale | 301 | 145 | 2.08
L1 five20b | 303639 | 9329 | 32.55
L1 five20c | 348438 | 9421 | 36.98
L1 hier13 | 5491 | 689 | 7.96
L1 hier13x13x13a | 5621 | 663 | 8.42
L1 hier13x13x13d | 4538 | 665 | 6.82
L1 hier13x13x13e | 5729 | 661 | 8.66
L1 hier13x13x7d | 1030 | 301 | 3.42
L1 hier13x7x7d | 245 | 77 | 3.18
L1 hier16 | 14587 | 1083 | 14.05
L1 hier16x16x16a | 14699 | 1128 | 13.05
L1 hier16x16x16d | 14317 | 1146 | 12.44
L1 jjtabeltest3 | 772 | 355 | 2.17
L1 nine12 | 35451 | 2048 | 17.31
L1 nine5d | 34837 | 2961 | 11.76
L1 ninenew | 14825 | 1407 | 10.53
L1 osorio | 0 | 0 | 1.00
L1 table1 | 90 | 35 | 2.57
L1 table3 | 2460 | 366 | 6.72
L1 table6 | 204 | 30 | 6.80
L1 table7 | 19 | 15 | 1.27
L1 table8 | 3 | 3 | 1.00
L1 targus | 8 | 4 | 2.00
L1 toy3dsarah | 199 | 167 | 1.19
L1 two5in6 | 13061 | 1338 | 9.76
Num. | 27 | 27 | NA
Num. first | 3 | 26 | NA
Total iter | 911608 | 41363 | 22.04
G. avg. | 8670.35 | 1035.91 | 8.37
L1 norm instances: total iterations, primal simplex
Problem | Standard | Specialized | Ratio
L1 bts4 | 124598 | 27587 | 4.50
L1 cbs | 0 | 10 | 0.00
L1 dale | 725 | 315 | 2.30
L1 five20b | 894976 | 94595 | 9.46
L1 five20c | 871334 | 77500 | 11.24
L1 hier13 | 8674 | 1203 | 7.21
L1 hier13x13x13a | 8202 | 1118 | 7.34
L1 hier13x13x13d | 7819 | 1098 | 7.12
L1 hier13x13x13e | 8471 | 1099 | 7.71
L1 hier13x13x7d | 1952 | 652 | 2.99
L1 hier13x7x7d | 488 | 257 | 1.90
L1 hier16 | 30909 | 4381 | 7.06
L1 hier16x16x16a | 31884 | 3796 | 8.40
L1 hier16x16x16d | 28345 | 4266 | 6.64
L1 jjtabeltest3 | 1219 | 643 | 1.90
L1 nine12 | 183296 | 50581 | 3.62
L1 nine5d | 83201 | 14137 | 5.89
L1 ninenew | 48774 | 10928 | 4.46
L1 osorio | 645 | 777 | 0.83
L1 table1 | 160 | 145 | 1.10
L1 table3 | 5696 | 1547 | 3.68
L1 table6 | 337 | 125 | 2.70
L1 table7 | 66 | 87 | 0.76
L1 table8 | 5 | 74 | 0.07
L1 targus | 25 | 25 | 1.00
L1 toy3dsarah | 224 | 197 | 1.14
L1 two5in6 | 32046 | 4706 | 6.81
Num. | 27 | 27 | NA
Num. first | 5 | 23 | NA
Total iter | 2374095 | 301894 | 7.86
G. avg. | 9098.66 | 2249.05 | 4.05
L1 norm instances: total time, primal simplex
Problem | Standard | Specialized | Ratio
L1 bts4 | 151.26 | 45.24 | 3.34
L1 cbs | 0.03 | 0.05 | 0.60
L1 dale | 0.11 | 0.14 | 0.79
L1 five20b | 8416.45 | 1168.03 | 7.21
L1 five20c | 7970.91 | 891.02 | 8.95
L1 hier13 | 2.07 | 0.38 | 5.45
L1 hier13x13x13a | 1.93 | 0.33 | 5.85
L1 hier13x13x13d | 1.94 | 0.34 | 5.71
L1 hier13x13x13e | 2.13 | 0.34 | 6.26
L1 hier13x13x7d | 0.24 | 0.09 | 2.67
L1 hier13x7x7d | 0.03 | 0.02 | 1.50
L1 hier16 | 16.86 | 2.90 | 5.81
L1 hier16x16x16a | 17.12 | 2.18 | 7.85
L1 hier16x16x16d | 14.70 | 2.50 | 5.88
L1 jjtabeltest3 | 0.07 | 0.05 | 1.40
L1 nine12 | 359.94 | 123.51 | 2.91
L1 nine5d | 88.81 | 19.07 | 4.66
L1 ninenew | 50.36 | 11.14 | 4.52
L1 osorio | 0.08 | 0.11 | 0.73
L1 table1 | 0.02 | 0.02 | 1.00
L1 table3 | 1.78 | 0.30 | 5.93
L1 table6 | 0.02 | 0.02 | 1.00
L1 table7 | 0.01 | 0.01 | 1.00
L1 table8 | 0.01 | 0.01 | 1.00
L1 targus | 0.00 | 0.00 | 1.00
L1 toy3dsarah | 0.02 | 0.03 | 0.67
L1 two5in6 | 18.48 | 2.99 | 6.18
Num. | 27 | 27 | NA
Num. first | 9 | 23 | NA
Total time | 17115.39 | 2270.84 | 7.54
G. avg. | 2950.40 | 510.10 | 5.78
L∞ norm instances: total time, “primal simplex”
Problem | Standard (dual on primal) | Specialized (primal on dual) | Ratio
Linf bts4 | 59.01 | 24.92 | 2.37
Linf cbs | 0.19 | 0.47 | 0.40
Linf dale | 0.86 | 2.55 | 0.34
Linf hier13 | 2.56 | 0.98 | 2.61
Linf hier13x13x13a | 2.18 | 0.86 | 2.53
Linf hier13x13x13d | 2.31 | 0.77 | 3.00
Linf hier13x13x13e | 2.45 | 0.76 | 3.22
Linf hier13x13x7d | 0.43 | 0.24 | 1.79
Linf hier13x7x7d | 0.05 | 0.09 | 0.56
Linf hier16 | 13.64 | 3.78 | 3.61
Linf hier16x16x16a | 13.60 | 2.77 | 4.91
Linf hier16x16x16d | 12.57 | 2.99 | 4.20
Linf hier16x16x16e | 12.52 | 2.92 | 4.29
Linf jjtabeltest3 | 0.16 | 0.28 | 0.57
Linf nine12 | 272.58 | 142.98 | 1.91
Linf nine5d | 89.82 | 11.27 | 7.97
Linf ninenew | 132.43 | 4.30 | 30.80
Linf osorio | 4.86 | 17.20 | 0.28
Linf table3 | 6.41 | 1.09 | 5.88
Linf table7 | 0.03 | 0.06 | 0.50
Linf table8 | 0.27 | 0.38 | 0.71
Linf targus | 0.01 | 0.03 | 0.33
Linf toy3dsarah | 0.06 | 0.17 | 0.35
Linf two5in6 | 18.43 | 3.32 | 5.55
Num. | 24 | 24 | NA
Num. first | 9 | 15 | NA
Total time | 691.90 | 234.38 | 2.95
G. avg. | 16.26 | 3.59 | 4.53
Conclusions
Conclusions
■ Norm structures can be exploited within the simplex optimizers with surprisingly good results.
■ Modeling issues can be very important for the simplex optimizers.
■ Simple 'tricks' using structural knowledge about models can give dramatic speedups.
References
[CASTRO:87] J. Castro, "Minimum-distance controlled perturbation methods for large-scale tabular data protection," European Journal of Operational Research, 171 (2006).
[EWOUT:08] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," Technical Report TR-2008-01, Department of Computer Science, University of British Columbia, 2008.