
MATHEMATICS OF OPERATIONS RESEARCH
Vol. 00, No. 0, Xxxxx 0000, pp. 000–000
ISSN 0364-765X | EISSN 1526-5471 | 00 | 0000 | 0001

INFORMS
doi 10.1287/xxxx.0000.0000
© 0000 INFORMS


Random projections for linear programming

Ky Vu
ITCSC, Chinese University of Hong Kong, [email protected], www.lix.polytechnique.fr/~vu

Pierre-Louis Poirion
Huawei Research Center, Paris, France, [email protected], sites.google.com/site/plpoirion86/

Leo Liberti
CNRS LIX, Ecole Polytechnique, 91128 Palaiseau, France, [email protected], www.lix.polytechnique.fr/~liberti

Random projections are random linear maps, sampled from appropriate distributions, that approximately preserve certain geometrical invariants so that the approximation improves as the dimension of the space grows. The well-known Johnson-Lindenstrauss lemma states that there are random matrices with surprisingly few rows that approximately preserve pairwise Euclidean distances among a set of points. This is commonly used to speed up algorithms based on Euclidean distances. We prove that these matrices also preserve other quantities, such as the distance to a cone. We exploit this result to devise a probabilistic algorithm to solve linear programs approximately. We show that this algorithm can approximately solve very large randomly generated LP instances. We also showcase its application to an error correction coding problem.

Key words: Johnson-Lindenstrauss lemma, concentration of measure, dimension reduction
MSC2000 subject classification: Primary: 90C05; secondary: 60D05, 52A23
OR/MS subject classification: Primary: programming: linear theory; secondary: mathematics: systems solutions
History: Submitted July 26, 2017

1. Introduction A deep and surprising result, called the Johnson-Lindenstrauss Lemma (JLL) [15], states that a set of high dimensional points can be projected to a much lower dimensional space while keeping Euclidean distances approximately the same. The JLL was previously exploited in purely Euclidean distance based algorithms, such as k-means [6] and k nearest neighbours [14]. The JLL has rarely been employed in mathematical optimization. The few occurrences are related to reasonably natural cases such as linear regression [25], where the error minimization is encoded by means of a Euclidean norm. One reason for this is that the very proof of the JLL exploits rotational invariance, naturally exhibited by sets of distances, but which feasible sets commonly occurring in Linear Programming (LP), such as orthants, obviously do not. In this paper we lay the theoretical foundations of solving LPs approximately using random projections, and showcase their usefulness in practice. More precisely, we address LPs in standard form

P ≡ min{ c^⊤x | Ax = b ∧ x ∈ R^n_+ },   (1)

where A is an m×n matrix. For each i ≤ m we let A_i be the i-th row of A, and for each j ≤ n we let A_j be the j-th column of A. If I is a set of row indices, we indicate the submatrix of A consisting of those rows by A_I; if J is a set of column indices, we indicate the submatrix of A consisting of those


columns by A_J. We let cone(A) be the cone spanned by the column vectors A_j (for j ≤ n), and conv(A) be the convex hull of the column vectors A_j (for j ≤ n). We denote by v(P) the optimal objective function value of the problem P, and by F(P) its feasible region. Note that determining whether F(P) ≠ ∅ is exactly the same problem as determining whether b ∈ cone(A). Throughout this paper, all norms will be Euclidean unless specified otherwise.

We often assume that c, b and all the column vectors of A have unit Euclidean norm. This assumption does not lose generality: let b̄ and Ā be b, A after scaling b and all columns of A to unit norm. If x̄ is the optimal solution of the LP with constraints Āx = b̄, we can retrieve the optimal solution x* of Eq. (1) as follows:
∀j ∈ {1, . . . , n}   x*_j = ‖b‖ x̄_j / ‖A_j‖.
The vector c can simply be replaced by the scaled vector c̄ = c/‖c‖: the optimum will not change, and the optimal objective function value will simply be scaled by 1/‖c‖.
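This normalization and recovery step is easy to script. The following is a minimal sketch assuming numpy and nonzero b, c and columns of A (the function names are ours, for illustration only):

```python
import numpy as np

def normalize_lp(A, b, c):
    """Scale b, c and the columns of A to unit Euclidean norm (assumes none are zero)."""
    col_norms = np.linalg.norm(A, axis=0)
    A_bar = A / col_norms                 # column j divided by ||A_j||
    b_bar = b / np.linalg.norm(b)
    c_bar = c / np.linalg.norm(c)
    return A_bar, b_bar, c_bar, col_norms

def recover_solution(x_bar, b, col_norms):
    """Map the optimum x_bar of the scaled LP back to the original variables:
    x*_j = ||b|| * x_bar_j / ||A_j||."""
    return np.linalg.norm(b) * x_bar / col_norms
```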

A random projector is a k×m matrix T, sampled from appropriate distributions (more details on this below), which preserves certain geometrical properties of sets of points in R^m. We denote by
P_T ≡ min{ c^⊤x | TAx = Tb ∧ x ∈ R^n_+ }   (2)

the randomly projected version of P. Our main result (Thm. 4) states that we can construct a random projector T, with k ≪ n, such that, for some given ε > 0, we have |v(P) − v(P_T)| ≤ ε with arbitrarily high probability (w.a.h.p.). By this we mean that the probability of the concerned event is 1 − f(k), where f(k) tends to zero extremely fast as k tends to infinity (typically, the function f is O(e^{−k})). Moreover, for fixed ε, k turns out to be O(ln n). Since the complexity of solving LPs depends on both m and n, a logarithmic reduction on m (even as a function of n) is very appealing.

So far so good; unfortunately, there is some bad news too. First, we prove that the optimum of the projected problem P_T is infeasible w.r.t. F(P) (the original region) with probability 1 (Prop. 3), which appears to severely limit the usefulness of Thm. 4; we address the retrieval of an approximate solution of the original problem in Sect. 5. Second, sampling T and performing the matrix multiplication T(A, b) is time consuming, since T is a dense matrix. Third, even though the original LP is sparse, the projected LP is dense as a result of T being dense, which means that solving it has an added computational cost. The ill effects of density on CPU time can be moderated in computational experiments by using sparse random projectors (see e.g. [1]). Last, but not least, we have no idea how to estimate, much less compute, the multiplicative constant in the term O(ln n). We know that the term 1/ε², which is large if we want the approximation to be tight, plays a role; but there are other universal constants that also play a role. We also know that the probability of the event |v(P) − v(P_T)| ≤ ε approaches 1 as 1 − O(e^{−k}). All this suggests that any practical usefulness of this methodology will come from very large and/or very dense instances.

As we shall indeed see in Sect. 7, our experience suggests that errors decrease as sizes increase. This is true for our randomly generated test set, but also in regard to our coding application.

We have also started testing random projections on LPs coming from various other applications. Though these results are not conclusive enough to be presented (yet), we had some positive experience with: (i) quantile regression [17]; (ii) large and dense diet problems with unit costs [9]; (iii) LPs arising from the basis pursuit methodology for compressed sensing [8].

1.1. Differences with existing literature Randomized dimension reduction techniques are widely used in the analysis of large data sets, but much less so in Mathematical Programming (MP). Specifically, in the field of LP we are aware of three main results [7, 25, 12]. We set compressed sensing [7] aside, as strictly speaking this is not a solution or reformulation method, but rather a theoretical analysis which explains why ℓ1-norm minimization of the error of an underdetermined linear system is an excellent proxy for reconstructing sparse solutions. Although we are only citing


the paper [7] for compressed sensing, this line of work gave rise to a very large number of papers by many different authors. We shall see in Sect. 8 that compressed sensing, in the setting of our coding application, can be "further compressed" using our methodology.

In [25], it is shown how matrix sketching (which is strongly related to random projections) can help decrease the dimensionality of some convex quadratic minimization problems over an arbitrary convex set C from a given R^m to R^k for some k ≤ m. Sect. 3-4 below emphasize some of the differences between the present work and [25]; [25, Eq. (28) §3.4], for example, encodes the problem of deciding whether zero is in the convex hull of the columns of a given matrix B. Unlike our development, the analysis provided in [25] requires the projected dimension k to be bounded below by a function of several parameters before any probability estimation can be made. Another remarkable difference is that the framework described in [25] requires a convex purely quadratic objective function: to encode a linear objective c^⊤x using a quadratic, the most direct way involves the introduction of a new scalar variable y, and then rewriting min c^⊤x as min y² subject to y ≥ c^⊤x and y ≥ 0. This reformulation, however, makes the application of the method impossible, since the quadratic form y² = y(1)y is represented by a 1×1 matrix, whose dimensions obviously cannot be further reduced by sketching. Lastly, in [25] we find that the projected dimension k is of the order of magnitude of the Gaussian width W of C. Requiring k ≪ m thus implies working with convex sets C having small Gaussian width. By contrast, our technique optimizes over orthants, which have (large) Gaussian width O(n).

The paper [12] proposes a randomized dimensionality reduction based on PAC learning [3]: from a small training set, it is possible to forecast some properties of large data sets while keeping the error low. This is exploited in LPs with very few variables and huge numbers of inequality constraints: it is found that this number can be greatly reduced while keeping the optimality error bounded. In order to make the PAC learning assumptions work, the authors focus on application cases which have a specific structure, i.e. there is an order on the constraints which makes their slope vary in a controlled way (an example is given by the piecewise linear approximation of a two-dimensional closed convex curve: one can take many tangents, but few of these suffice to give almost the same approximation). The prominent difference with the method proposed in this paper is that we make no such assumption.

1.2. Contents The rest of the paper is organized as follows. Section 2 reports the basic concepts about the JLL. In Sect. 3 we show that random projections approximately preserve LP feasibility with high probability. The proof of our main theorem is offered in Sect. 4, where we argue that random projections also preserve LP optimality with high probability. In Sect. 5 we address the limitation referred to above, and provide a method to work out the solution of the original LP given the solution of the projected LP. In Sect. 6 we make some remarks about computational complexity. Sect. 7 reports some computational results, and Sect. 8 showcases an application to error correcting codes.

2. The Johnson-Lindenstrauss lemma The JLL is stated as follows:

Theorem 1 (Johnson-Lindenstrauss Lemma [15]). Given ε ∈ (0,1) and an m×n matrix A, there exists a k×m matrix T such that
∀ 1 ≤ i < j ≤ n   (1 − ε)‖A_i − A_j‖ ≤ ‖TA_i − TA_j‖ ≤ (1 + ε)‖A_i − A_j‖,   (3)
where k is O(ε^{−2} ln n).

Thus, all sets of n points can be projected to a subspace having dimension logarithmic in n (and, surprisingly, independent of the original number m of dimensions), such that no distance is distorted by more than 1 + 2ε. The JLL can be established as a consequence of a general property


(see Lemma 1 below) of sub-gaussian random mappings T = (1/√k)U [19]. Some of the most popular choices for U are:
1. orthogonal projections on a random k-dimensional linear subspace of R^m [15];
2. random k×m matrices with each entry independently drawn from the standard normal distribution N(0,1) [14];
3. random k×m matrices with each entry independently taking values +1 and −1, each with probability 1/2 [1];
4. random k×m matrices with entries independently taking values +1, 0, −1, respectively with probability 1/6, 2/3, 1/6 [1] (we call this the Achlioptas random projector).
Other, sparser projectors have been proposed in [1, 10, 16, 2]. In this paper we just limit our attention to the normally distributed T ∼ N(0, 1/k) (where 1/k is the variance, not the standard deviation) and its discrete approximation in Item 4 above. Our reason for ignoring this issue is that we believe that the true bottleneck lies in the unknown "large constants" referred to above. The matrix product operation (on which the choice of random projector would have the greatest impact) is one of the most common in scientific computing, and many ways are known to optimize and streamline it. In our computational experiments (Sect. 7-8) we use the Achlioptas projector and the most obvious matrix product implementation.
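As an illustration (ours, not part of the original development), a minimal numpy sketch of sampling these two projectors follows; the √(3/k) factor is the usual Achlioptas scaling that makes E‖Ty‖² = ‖y‖².

```python
import numpy as np

def achlioptas_projector(k, m, rng=None):
    """Sample a k x m Achlioptas random projector: entries +1, 0, -1 with
    probabilities 1/6, 2/3, 1/6, scaled by sqrt(3/k) so that E||Ty||^2 = ||y||^2."""
    rng = np.random.default_rng(rng)
    entries = rng.choice([1.0, 0.0, -1.0], size=(k, m), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * entries

def gaussian_projector(k, m, rng=None):
    """Sample a k x m projector with i.i.d. N(0, 1/k) entries (variance 1/k)."""
    rng = np.random.default_rng(rng)
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
```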

Note that all the random projectors we consider have zero mean. This is necessary in order to ensure that our randomized algorithms will yield the result we want in expectation. This also explains why we consider LPs in standard rather than canonical form: we cannot apply the random projection to the inequality system Ax ≤ b to yield TAx ≤ Tb: this is almost always false, since the signs of the components of the matrix T are distributed uniformly.

The JLL can be derived from a more fundamental result [20].

Lemma 1 (Random projection lemma). For all ε ∈ (0,1) and all vectors y ∈ R^m, let T be a k×m random projector from one of the choices (1-4) above; then
Prob( (1 − ε)‖y‖ ≤ ‖Ty‖ ≤ (1 + ε)‖y‖ ) ≥ 1 − 2e^{−Cε²k}   (4)
for some constant C > 0 (independent of m, k, ε).

It can easily be proved that the JLL is a consequence of Lemma 1, by setting y = A_i − A_j for all pairs (i, j) and then applying the union bound. Moreover, Lemma 1 shows that the probability of finding a good T is very high for large enough values of k. Indeed, from Lemma 1, the probability that Eq. (3) holds for all i ≠ j ≤ n is at least
1 − 2\binom{n}{2} e^{−Cε²k} = 1 − n(n − 1) e^{−Cε²k}.   (5)
Therefore, if we want this probability to be larger than, say, 99.9%, we simply choose any k such that 1/(1000 n(n−1)) > e^{−Cε²k}. This means k can be chosen as k = ⌈(ln(1000) + 2 ln(n)) / (Cε²)⌉, which is O(ε^{−2}(ln(n) + 3.5)).
Note that the distributions from which T is sampled are such that the average of ‖Ty‖ over T is equal to ‖y‖. Lemma 1 is a concentration of measure result, and it states that the probability of a single sampling of T yielding a value of ‖Ty‖ very close to its mean approaches 1 as fast as a negative exponential of k approaches zero.
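As a worked example of this choice of k (a sketch, ours; the universal constant C is unknown in practice, so the value below is only a placeholder):

```python
import math

def projected_dimension(n, eps, C=1.0, failure_prob=1e-3):
    """Smallest k with n(n-1)*exp(-C*eps^2*k) <= failure_prob, i.e.
    k = ceil((ln(1/failure_prob) + 2 ln n) / (C eps^2)).
    C is the unknown universal constant; C=1.0 is a placeholder assumption."""
    return math.ceil((math.log(1.0 / failure_prob) + 2.0 * math.log(n)) / (C * eps ** 2))

# e.g. projected_dimension(10**6, 0.1) -> 3454 with the placeholder C = 1
```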

We shall also need a squared version of the random projection lemma [11].

Lemma 2 (Random projection lemma, squared version). For all ε ∈ (0,1) and all vectors y ∈ R^m, let T be a k×m random projector from one of the choices (1-4) above; then
Prob( (1 − ε)‖y‖² ≤ ‖Ty‖² ≤ (1 + ε)‖y‖² ) ≥ 1 − 2e^{−C(ε²−ε³)k}   (6)
for some constant C > 0 (independent of m, k, ε).


Another relevant result about the JLL is the preservation of angles (or scalar products) with high probability. Indeed, given any x, y ∈ R^m, and T a k×m random projector from one of the choices (1-4) above, by applying Lemma 2 to the two vectors x + y, x − y and using the union bound, we have
|⟨Tx, Ty⟩ − ⟨x, y⟩| = (1/4) | ‖T(x+y)‖² − ‖T(x−y)‖² − ‖x+y‖² + ‖x−y‖² |
  ≤ (1/4) | ‖T(x+y)‖² − ‖x+y‖² | + (1/4) | ‖T(x−y)‖² − ‖x−y‖² |
  ≤ (ε/4) ( ‖x+y‖² + ‖x−y‖² ) = (ε/2) ( ‖x‖² + ‖y‖² ),   (7)
with probability at least 1 − 4e^{−Cε²k}. We can strengthen this further to obtain the following useful result.

Proposition 1. Let T : R^m → R^k be a k×m random projector from one of the choices (1-4) above and let 0 < ε < 1. Then there is a universal constant C such that, for any x, y ∈ R^m:
−ε‖x‖‖y‖ ≤ ⟨Tx, Ty⟩ − ⟨x, y⟩ ≤ ε‖x‖‖y‖
with probability at least 1 − 4e^{−Cε²k}.
Proof. Apply Eq. (7) with x replaced by x/‖x‖ and y replaced by y/‖y‖. This yields |⟨Tx, Ty⟩ − ⟨x, y⟩| / (‖x‖‖y‖) ≤ ε; multiplying both sides by ‖x‖‖y‖ yields the desired result. □

Lemma 2 and Proposition 1 will be used extensively throughout the paper in order to estimate the distance between a vector and a linear combination of other vectors. In particular, we can bound ‖Tx − Σ_i λ_i Ty_i‖² by first expanding it and then approximating each ⟨Tx, Ty_i⟩ by ⟨x, y_i⟩ and each ⟨Ty_i, Ty_j⟩ by ⟨y_i, y_j⟩. We then leverage the weights λ_i to get the desired estimation. In the rest of the paper, we will refer to a k×m random matrix from one of the choices (1-4) in Section 2 as a "random projector".

We remark that Lemma 1 is central to all random projection theory, and will be used again later in Sect. 3. Prop. 1 will be used later on in Sect. 4. If Ax = b ∧ x ≥ 0 is infeasible, then b is not in the cone spanned by the columns of A. By proving angle preservation w.a.h.p., Prop. 1 is key to understanding why, when we pre-multiply Ax = b by a random projector, the resulting system is still infeasible w.a.h.p.
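Before moving on, here is a quick numerical sanity check of Lemma 1 and Prop. 1 (ours, for illustration only): with k in the hundreds, a single sampled projector already preserves norms and scalar products of random high-dimensional vectors quite well.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 10000, 500                                        # original and projected dimensions
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))       # N(0, 1/k) projector

x, y = rng.standard_normal(m), rng.standard_normal(m)
Tx, Ty = T @ x, T @ y

print(np.linalg.norm(Tx) / np.linalg.norm(x))            # close to 1 (Lemma 1)
print((Tx @ Ty - x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))  # small (Prop. 1)
```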

3. Preserving LP feasibility Consider the Linear Feasibility Problem (LFP)
F = F(P) ≡ { x ∈ R^n_+ | Ax = b }
and its randomly projected version
TF = F(P_T) ≡ { x ∈ R^n_+ | TAx = Tb }.
In this section we prove that F ≠ ∅ if and only if TF ≠ ∅ w.a.h.p.
We remark that, for any k×m matrix T, any feasible solution for F is also a feasible solution for TF by linearity. So the real issue is proving that if F is infeasible then TF is also infeasible w.a.h.p. This is where we exploit the fact that T is a random projector. More precisely, we prove the following statements about linear infeasibility w.a.h.p.:
1. a nonzero vector is randomly projected to a nonzero vector;
2. if x is not feasible in F, then it is not feasible in TF;
3. if x is not feasible in F for all x in a finite set X, then the same follows for TF;
4. if b is not in the convex hull of A, then Tb is not in the convex hull of TA;
5. if b is not in the cone of A, then Tb is not in the cone of TA.
The first result is actually a corollary of Lemma 1. We denote by E^c the complement of an event E.


Corollary 1. Let T be a k×m random projector and y ∈ R^m with y ≠ 0. Then we have
Prob(Ty ≠ 0) ≥ 1 − 2e^{−Ck}   (8)
for some constant C > 0 (independent of n, k).

Proof. For any ε ∈ (0,1), we define the following events:
A = { Ty ≠ 0 },   B = { (1 − ε)‖y‖ ≤ ‖Ty‖ ≤ (1 + ε)‖y‖ }.
By Lemma 1 it follows that Prob(B) ≥ 1 − 2e^{−Cε²k} for some constant C > 0 independent of m, k, ε. On the other hand, A^c ∩ B = ∅, since otherwise there would be a mapping T_1 such that T_1(y) = 0 and (1 − ε)‖y‖ ≤ ‖T_1(y)‖, which together imply that y = 0 (a contradiction). Therefore B ⊆ A, and we have Prob(A) ≥ Prob(B) ≥ 1 − 2e^{−Cε²k}. This holds for all 0 < ε < 1, so Prob(A) ≥ 1 − 2e^{−Ck}. □

The following theorem settles points 2-3 above.

Theorem 2. Let T be a k×m random projector and F ≡ { x ≥ 0 | Ax = b } with A an m×n matrix. Then for any x ∈ R^n, we have:
(i) If b = Σ_{j=1}^n x_j A_j then Tb = Σ_{j=1}^n x_j TA_j;
(ii) If b ≠ Σ_{j=1}^n x_j A_j then Prob[ Tb ≠ Σ_{j=1}^n x_j TA_j ] ≥ 1 − 2e^{−Ck};
(iii) If b ≠ Σ_{j=1}^n x_j A_j for all x ∈ X ⊆ R^n, where |X| is finite, then
Prob[ ∀x ∈ X   Tb ≠ Σ_{j=1}^n x_j TA_j ] ≥ 1 − 2|X| e^{−Ck};
for some constant C > 0 (independent of n, k).

Proof. Point (i) follows by linearity of T, and (ii) by applying Cor. 1 to Ax − b. For (iii), the union bound on (ii) yields:
Prob[ ∀x ∈ X   Tb ≠ Σ_{j=1}^n x_j TA_j ] = Prob[ ⋂_{x∈X} { Tb ≠ Σ_{j=1}^n x_j TA_j } ]
  = 1 − Prob[ ⋃_{x∈X} { Tb ≠ Σ_{j=1}^n x_j TA_j }^c ] ≥ 1 − Σ_{x∈X} Prob[ { Tb ≠ Σ_{j=1}^n x_j TA_j }^c ]
  [by (ii)] ≥ 1 − Σ_{x∈X} 2e^{−Ck} = 1 − 2|X| e^{−Ck},
as claimed. □

Thm. 2 can be used to project certain types of integer programs. It also gives us an indication of why estimating the probability that Tb ∉ cone(A) is not straightforward. This event can be written as an intersection of uncountably many events { Tb ≠ Σ_{j=1}^n x_j TA_j } where x ∈ R^n_+. Even if each of these occurs w.a.h.p., their intersection might still be small. As these events are dependent, however, we shall show that there is hope yet.


3.1. Convex hull feasibility Next, we show that if the distance between a point and a closed set is positive, it remains positive with high probability after applying a random projection. We consider the convex hull membership problem: given vectors b, A_1, . . . , A_n ∈ R^m, decide whether b ∈ conv({A_1, . . . , A_n}). Although we do not use this result directly in the following, we believe it is of independent interest.

We have the following result:

Proposition 2. Given A_1, . . . , A_n ∈ R^m, let C = conv({A_1, . . . , A_n}), b ∈ R^m such that b ∉ C, d = min_{x∈C} ‖b − x‖ and D = max_{1≤j≤n} ‖b − A_j‖. Let T : R^m → R^k be a random projector. Then
Prob[ Tb ∉ TC ] ≥ 1 − 2n² e^{−C(ε²−ε³)k}   (9)
for some constant C (independent of m, n, k, d, D) and any ε < d²/D².

This proposition is based on the fact that any vector in C can be represented as a convex combination of the A_j for j ∈ {1, . . . , n}. Since the distance between Tb and all the TA_j is positive with high probability and the total weight, i.e. Σ_{j=1}^n λ_j, is always 1, we can bound the distance between Tb and all vectors in TC.

Proof. Let S_ε be the event that both
(1 − ε)‖x − y‖² ≤ ‖T(x − y)‖² ≤ (1 + ε)‖x − y‖²
and
(1 − ε)‖x + y‖² ≤ ‖T(x + y)‖² ≤ (1 + ε)‖x + y‖²
hold for all x, y ∈ {0, b − A_1, . . . , b − A_n}. Assume S_ε occurs. Then for all real λ_j ≥ 0 with Σ_{j=1}^n λ_j = 1, we have:
‖Tb − Σ_{j=1}^n λ_j TA_j‖² = ‖Σ_{j=1}^n λ_j T(b − A_j)‖²   (by linearity of T and Σ_j λ_j = 1)
  = Σ_{j=1}^n λ_j² ‖T(b − A_j)‖² + 2 Σ_{1≤i<j≤n} λ_i λ_j ⟨T(b − A_i), T(b − A_j)⟩
  = Σ_{j=1}^n λ_j² ‖T(b − A_j)‖² + (1/2) Σ_{1≤i<j≤n} λ_i λ_j ( ‖T(b − A_i + b − A_j)‖² − ‖T(A_i − A_j)‖² ).   (10)
Here the last equality follows from the fact that ⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²) for all vectors x, y. Moreover, since S_ε occurs, we have
‖T(b − A_j)‖² ≥ (1 − ε)‖b − A_j‖²
and
‖T(b − A_i + b − A_j)‖² − ‖T(A_i − A_j)‖² ≥ (1 − ε)‖b − A_i + b − A_j‖² − (1 + ε)‖A_i − A_j‖²
for all 1 ≤ i < j ≤ n. Therefore, the RHS in (10) is greater than or equal to
(1 − ε) Σ_{j=1}^n λ_j² ‖b − A_j‖² + (1/2) Σ_{1≤i<j≤n} λ_i λ_j ( (1 − ε)‖b − A_i + b − A_j‖² − (1 + ε)‖A_i − A_j‖² )
  = ‖b − Σ_{j=1}^n λ_j A_j‖² − ε ( Σ_{j=1}^n λ_j² ‖b − A_j‖² + (1/2) Σ_{1≤i<j≤n} λ_i λ_j ( ‖b − A_i + b − A_j‖² + ‖A_i − A_j‖² ) )
  = ‖b − Σ_{j=1}^n λ_j A_j‖² − ε ( Σ_{j=1}^n λ_j² ‖b − A_j‖² + Σ_{1≤i<j≤n} λ_i λ_j ( ‖b − A_i‖² + ‖b − A_j‖² ) ).


From the definitions of d and D, we have ‖b − Σ_{j=1}^n λ_j A_j‖² ≥ d² and ‖b − A_i‖² ≤ D² for all 1 ≤ i ≤ n. Therefore:
‖Tb − Σ_{j=1}^n λ_j TA_j‖² ≥ d² − εD² ( Σ_{j=1}^n λ_j² + 2 Σ_{1≤i<j≤n} λ_i λ_j ) = d² − εD² ( Σ_{j=1}^n λ_j )² = d² − εD² > 0
due to the fact that Σ_{j=1}^n λ_j = 1 and the choice of ε < d²/D².
Now, since ‖Tb − Σ_{j=1}^n λ_j TA_j‖² > 0 for all choices of λ ≥ 0 with Σ_{j=1}^n λ_j = 1, it follows that Tb ∉ conv({TA_1, . . . , TA_n}).
In summary, if S_ε occurs, then Tb ∉ conv({TA_1, . . . , TA_n}). Thus, by Lemma 2 and the union bound,
Prob(Tb ∉ TC) ≥ Prob(S_ε) ≥ 1 − 2( n + 2\binom{n}{2} ) e^{−C(ε²−ε³)k} = 1 − 2n² e^{−C(ε²−ε³)k}
for some constant C > 0. □

As an interesting aside, we remark that this proof can also be extended to show that disjoint polytopes project to disjoint polytopes with high probability.
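To make the convex-hull statement concrete, the following sketch (ours, assuming scipy is available) tests membership of b in conv(A) via a feasibility LP, before and after projection; on random data both tests typically print False, i.e. b stays outside the hull after projection, as Prop. 2 predicts.

```python
import numpy as np
from scipy.optimize import linprog

def in_conv_hull(A, b):
    """Test b in conv(columns of A) via the LP: find lambda >= 0 with sum = 1 and A lambda = b."""
    m, n = A.shape
    A_eq = np.vstack([A, np.ones((1, n))])
    b_eq = np.append(b, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
m, n, k = 2000, 50, 200
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)                                # generically outside conv(A)
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
print(in_conv_hull(A, b), in_conv_hull(T @ A, T @ b))     # typically: False False
```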

3.2. Cone feasibility We now deal with the last (and most relevant) result: if b is not in the cone of the columns of A, then Tb is not in the cone of the columns of TA w.a.h.p. We first define the A-norm of x ∈ cone(A) as
‖x‖_A = min{ Σ_{j=1}^n λ_j | λ ≥ 0 ∧ x = Σ_{j=1}^n λ_j A_j }.
For each x ∈ cone(A), we say that λ ∈ R^n_+ yields a minimal A-representation of x if and only if Σ_{j=1}^n λ_j = ‖x‖_A. We define μ_A = max{ ‖x‖_A | x ∈ cone(A) ∧ ‖x‖ ≤ 1 }; then, for all x ∈ cone(A), we have
‖x‖ ≤ ‖x‖_A ≤ μ_A ‖x‖.
In particular μ_A ≥ 1. Note that μ_A serves as a measure of worst-case distortion when we move from the Euclidean norm to the ‖·‖_A norm.
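Note that ‖x‖_A is itself the optimal value of a small LP, so it can be computed directly; a minimal sketch (ours, assuming scipy) is:

```python
import numpy as np
from scipy.optimize import linprog

def a_norm(A, x):
    """||x||_A = min sum(lambda) s.t. A @ lambda = x, lambda >= 0.
    Returns +inf if x is not in cone(A)."""
    n = A.shape[1]
    res = linprog(c=np.ones(n), A_eq=A, b_eq=x, bounds=[(0, None)] * n, method="highs")
    return res.fun if res.success else np.inf
```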

For the next result, we assume we are given an estimate of a lower bound ∆ to d = min_{x∈cone(A)} ‖b − x‖, and also (without loss of generality) that b and the column vectors of A have unit Euclidean norm.

Theorem 3. Given an m×n matrix A and b ∈ R^m s.t. b ∉ cone(A). Then for any 0 < ε < ∆² / ( μ_A² + 2μ_A √(1 − ∆²) + 1 ) and any k×m random projector T (such as one in Section 2), we have
Prob( Tb ∉ cone(TA) ) ≥ 1 − 2(n + 1)(n + 2) e^{−C(ε²−ε³)k}   (11)
for some constant C (independent of m, n, k, ∆).

The proof of this theorem (below) is based on the fact that any vector x in the cone generated by A can be represented as a nonnegative combination of the A_j for j ∈ {1, . . . , n}. The distance between Tb and all the TA_j is positive with high probability, so if we can bound the total weight Σ_{j=1}^n λ_j, it is possible to bound the distance between Tb and all vectors in the cone. However, vectors in the cone might have many different such representations. Therefore we choose a minimal representation, which we obtain by minimizing the corresponding total weight. This motivates our definition of the A-norm. With this definition, we can then estimate
‖Tb − Tx‖² ≥ ‖b − x‖² − ε(1 + ‖x‖_A)²


with high probability. The problem is that ‖x‖_A can go to infinity when ‖x‖ → ∞, so the right-hand side of this estimate might become negative. To overcome this difficulty, we argue that ‖b − x‖² also scales up when ‖x‖ → ∞, so that it always dominates ε(1 + ‖x‖_A)². This is the fact we prove later in the claim that
‖b − x‖² ≥ ‖x‖² − 2‖x‖‖p‖ + 1,
where p is the projection of b onto the cone.
Proof. For any ε chosen as in the theorem statement, let S_ε be the event that both
(1 − ε)‖x − y‖² ≤ ‖T(x − y)‖² ≤ (1 + ε)‖x − y‖²
and
(1 − ε)‖x + y‖² ≤ ‖T(x + y)‖² ≤ (1 + ε)‖x + y‖²
hold for all x, y ∈ {0, b, A_1, . . . , A_n}. By Lemma 2, we have
Prob(S_ε) ≥ 1 − 4\binom{n+2}{2} e^{−C(ε²−ε³)k} = 1 − 2(n + 1)(n + 2) e^{−C(ε²−ε³)k}
for some constant C (independent of m, n, k, d). We will prove that if S_ε occurs, then Tb ∉ cone{TA_1, . . . , TA_n}. Assume that S_ε occurs. Consider an arbitrary x ∈ cone{A_1, . . . , A_n} and let Σ_{j=1}^n λ_j A_j be a minimal A-representation of x. Then we have:

‖Tb − Tx‖² = ‖Tb − Σ_{j=1}^n λ_j TA_j‖²
  = ‖Tb‖² + Σ_{j=1}^n λ_j² ‖TA_j‖² − 2 Σ_{j=1}^n λ_j ⟨Tb, TA_j⟩ + 2 Σ_{1≤i<j≤n} λ_i λ_j ⟨TA_i, TA_j⟩
  = ‖Tb‖² + Σ_{j=1}^n λ_j² ‖TA_j‖² + Σ_{j=1}^n (λ_j/2) ( ‖T(b − A_j)‖² − ‖T(b + A_j)‖² ) + Σ_{1≤i<j≤n} (λ_i λ_j/2) ( ‖T(A_i + A_j)‖² − ‖T(A_i − A_j)‖² ).   (12)
Here the last equality follows from the fact that ⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²) for all vectors x, y. Moreover, since S_ε occurs, we have
‖Tb‖² ≥ (1 − ε)‖b‖²,   ‖TA_j‖² ≥ (1 − ε)‖A_j‖²   for all 1 ≤ j ≤ n
and
‖T(b − A_j)‖² − ‖T(b + A_j)‖² ≥ (1 − ε)‖b − A_j‖² − (1 + ε)‖b + A_j‖²
‖T(A_i + A_j)‖² − ‖T(A_i − A_j)‖² ≥ (1 − ε)‖A_i + A_j‖² − (1 + ε)‖A_i − A_j‖²
for all 1 ≤ i < j ≤ n. Therefore, the RHS in (12) is greater than or equal to
(1 − ε)‖b‖² + (1 − ε) Σ_{j=1}^n λ_j² ‖A_j‖² + Σ_{j=1}^n (λ_j/2) ( (1 − ε)‖b − A_j‖² − (1 + ε)‖b + A_j‖² ) + Σ_{1≤i<j≤n} (λ_i λ_j/2) ( (1 − ε)‖A_i + A_j‖² − (1 + ε)‖A_i − A_j‖² ).   (13)
Since we have assumed that ‖b‖ = ‖A_1‖ = · · · = ‖A_n‖ = 1, this can be rewritten as
‖b − Σ_{j=1}^n λ_j A_j‖² − ε ( 1 + Σ_{j=1}^n λ_j² + 2 Σ_{j=1}^n λ_j + 2 Σ_{1≤i<j≤n} λ_i λ_j ) = ‖b − Σ_{j=1}^n λ_j A_j‖² − ε ( 1 + Σ_{j=1}^n λ_j )²
  = ‖b − x‖² − ε ( 1 + ‖x‖_A )²   (by the definition of the A-norm).


In summary, we have proved that, when the event S_ε occurs,
‖Tb − Tx‖² ≥ ‖b − x‖² − ε(1 + ‖x‖_A)².   (14)
Denote α = ‖x‖ and let p be the orthogonal projection of b onto cone{A_1, . . . , A_n}, which means ‖b − p‖ = min{ ‖b − x‖ | x ∈ cone{A_1, . . . , A_n} }. We will need the following claim:
Claim. For all b, x, α, p given above, we have ‖b − x‖² ≥ α² − 2α‖p‖ + 1.
By this claim (proved later), from inequality (14), we have:
‖Tb − Tx‖² ≥ α² − 2α‖p‖ + 1 − ε(1 + ‖x‖_A)²
  ≥ α² − 2α‖p‖ + 1 − ε(1 + μ_A α)²   (since ‖x‖_A ≤ μ_A ‖x‖)
  = (1 − εμ_A²) α² − 2(‖p‖ + εμ_A) α + (1 − ε).
The last expression can be viewed as a quadratic function with respect to α. We will prove this function is positive for all α ∈ R. This is equivalent to¹
(‖p‖ + εμ_A)² − (1 − εμ_A²)(1 − ε) < 0
  ⇔ (μ_A² + 2‖p‖μ_A + 1) ε < 1 − ‖p‖²
  ⇔ ε < (1 − ‖p‖²) / (μ_A² + 2‖p‖μ_A + 1) = d² / (μ_A² + 2‖p‖μ_A + 1),
which holds for the choice of ε as in the hypothesis. In conclusion, if the event S_ε occurs, then ‖Tb − Tx‖² > 0 for all x ∈ cone{A_1, . . . , A_n}, i.e. Tb ∉ cone{TA_1, . . . , TA_n}. Thus,
Prob(Tb ∉ cone(TA)) ≥ Prob(S_ε) ≥ 1 − 2(n + 1)(n + 2) e^{−C(ε²−ε³)k}
as claimed. The result follows since ‖p‖² + d² = 1 by Pythagoras' theorem, and ∆ ≤ d.
Proof of the claim that ‖b − x‖² ≥ α² − 2α‖p‖ + 1: If x = 0 then the claim is trivially true, since ‖b − x‖² = ‖b‖² = 1 = α² − 2α‖p‖ + 1. Hence we assume x ≠ 0. First consider the case p ≠ 0. By Pythagoras' theorem, we must have d² = 1 − ‖p‖². Denote z = (‖p‖/α) x, so that ‖z‖ = ‖p‖, and set δ = α/‖p‖. We have
‖b − x‖² = ‖b − δz‖²
  = (1 − δ)‖b‖² + (δ² − δ)‖z‖² + δ‖b − z‖²
  = (1 − δ) + (δ² − δ)‖p‖² + δ‖b − z‖²
  ≥ (1 − δ) + (δ² − δ)‖p‖² + δd²
  = (1 − δ) + (δ² − δ)‖p‖² + δ(1 − ‖p‖²)
  = δ²‖p‖² − 2δ‖p‖² + 1 = α² − 2α‖p‖ + 1.
Next, we consider the case p = 0. In this case we have b^⊤x ≤ 0 for all x ∈ cone{A_1, . . . , A_n}. Indeed, for an arbitrary δ > 0,
0 ≤ (1/δ)(‖b − δx‖² − 1) = (1/δ)(1 + δ²‖x‖² − 2δ b^⊤x − 1) = δ‖x‖² − 2 b^⊤x,
which tends to −2 b^⊤x when δ → 0⁺. Therefore
‖b − x‖² = 1 − 2 b^⊤x + ‖x‖² ≥ ‖x‖² + 1 = α² − 2α‖p‖ + 1,
which proves the claim. □

Since cone membership is the same as LP feasibility, Thm. 3 establishes that LFPs can be randomly projected accurately w.a.h.p.

¹ Here we use the fact that a quadratic function ax² + bx + c > 0 for all x ∈ R if and only if a > 0 and b² − 4ac < 0.


4. Preserving optimality In this section we show that, if the projected dimension k is large enough, v(P) ≈ v(P_T) w.a.h.p. (Thm. 4). We assume all along, and without loss of generality, that b, c and the columns of A have unit Euclidean norms.

The proof of Thm. 4 is divided into two main parts.
• In the first part, we write v(P) ≈ v(P_T) formally as "given δ > 0 there is a random projector T such that v(P) − δ ≤ v(P_T) ≤ v(P) w.a.h.p.", formalize some infeasible LFPs which encode v(P) − δ and v(P_T), and emphasize their relationship.
• In the second part, we formally argue the "overwhelming probability" by means of an ε > 0 (a function of δ) which ensures that the probability of v(P) − δ ≤ v(P_T) approaches 1 fast enough (as a function of ε). This ε refers to the projected (infeasible) LFP of the first part, but for technical reasons we cannot simply "inherit it" from Thm. 3. Instead, from the cone of the infeasible LFP we carefully construct a new pointed cone which allows us to carry out a projected separation argument based on inner product preservation (Prop. 1).
Our proof assumes that the feasible region of P is non-empty and bounded. Specifically, we assume that a constant θ > 0 is given such that there exists an optimal solution x* of P (see Eq. (1)) satisfying
Σ_{j=1}^n x*_j < θ.   (15)
For the sake of simplicity (and without loss of generality), we assume further that θ ≥ 1. This assumption is used to control the excessive flatness of the involved cones, which is required in the projected separation argument.

4.1. A cone transformation operation Before introducing Thm. 4 and its proof, we explain how to construct a pointed cone from the cone of the LFP in such a way as to preserve a certain membership property.

Consider a polyhedral cone
K = { Σ_{j≤n} x_j C_j | x ∈ R^n_+ },
in which C_1, . . . , C_n are the column vectors of an m×n matrix C; in other words, K = cone(C). For any u ∈ R^m, we consider the following transformation φ_{u,θ}, defined by:
φ_{u,θ}(K) := { Σ_{j=1}^n x_j ( C_j − (1/θ) u ) | x ∈ R^n_+ }.
In particular, φ_{u,θ} moves the origin in the direction u by a step 1/θ (see Figure 1). For θ defined in Eq. (15), we also consider the following set
K_θ = { Σ_{j=1}^n x_j C_j | x ∈ R^n_+ ∧ Σ_{j=1}^n x_j < θ }.
K_θ can be seen as a set truncated from K (in particular, it is not a cone anymore). We shall show that φ_{u,θ} preserves the membership of the vector u in the "truncated cone" K_θ.

Lemma 3. For any u ∈ R^m, we have u ∈ K_θ if and only if u ∈ φ_{u,θ}(K).

Proof. First of all, let us denote t = 1 − (1/θ) Σ_{j=1}^n x_j.


Figure 1. The effect of φu when u does not belong to the cone (left) and when it does (right).

(⇒) If u ∈ K_θ, then there exists x ∈ R^n_+ such that u = Σ_{j=1}^n x_j C_j and Σ_{j=1}^n x_j < θ. Then u can be written as Σ_{j=1}^n x'_j ( C_j − (1/θ) u ) with x' = (1/t) x. Indeed,
Σ_{j=1}^n x'_j ( C_j − (1/θ) u ) = (1/t) Σ_{j=1}^n x_j ( C_j − (1/θ) u )
  = (1/t) Σ_{j=1}^n x_j C_j − (1/t) ( Σ_{j=1}^n (1/θ) x_j ) u
  = (1/t) u − (1/t) ( Σ_{j=1}^n (1/θ) x_j ) u
  = (1/t) ( 1 − (1/θ) Σ_{j=1}^n x_j ) u
  = u   (by definition of t).
Moreover, due to the assumption that Σ_{j=1}^n x_j < θ, we have t > 0 and hence x' ≥ 0. It follows that u ∈ φ_{u,θ}(K).
(⇐) If u ∈ φ_{u,θ}(K), then there exists x ∈ R^n_+ such that u = Σ_{j=1}^n x_j ( C_j − (1/θ) u ). This is equivalent to ( 1 + (1/θ) Σ_{j=1}^n x_j ) u = Σ_{j=1}^n x_j C_j. Thus u can also be written as Σ_{j=1}^n x'_j C_j, where x'_j = x_j / ( 1 + (1/θ) Σ_{i=1}^n x_i ). Note that Σ_{j=1}^n x'_j < θ because
Σ_{j=1}^n x'_j = ( Σ_{j=1}^n x_j ) / ( 1 + (1/θ) Σ_{j=1}^n x_j ) < θ,
which implies that u ∈ K_θ. □
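A tiny numerical illustration of the (⇒) direction (ours, not from the paper): given a representation of u in K_θ, rescaling its coefficients by 1/t with t = 1 − (1/θ)Σ_j x_j yields a representation of u in φ_{u,θ}(K).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, theta = 5, 8, 10.0
C = rng.random((m, n))                    # generators of the cone K
x = rng.random(n)                         # x >= 0 with sum(x) < theta (here sum(x) <= 8)
u = C @ x                                 # so u lies in K_theta

t = 1.0 - x.sum() / theta
x_prime = x / t                           # coefficients used in the (=>) direction
u_reconstructed = (C - u[:, None] / theta) @ x_prime
print(np.allclose(u, u_reconstructed))    # True: u also lies in phi_{u,theta}(K)
```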


Note that this result is still valid when the transformation φ_{u,θ} is only applied to a subset of the columns of C. Given any vector u and an index set J ⊆ {1, . . . , n}, we define, for all j ≤ n,
C^{Ju}_j = C_j − (1/θ) u   if j ∈ J,   and   C^{Ju}_j = C_j   otherwise.
We extend φ_{u,θ} to
φ^J_{u,θ}(K) = { Σ_{j=1}^n x_j C^{Ju}_j | x ∈ R^n_+ } = cone( C^{Ju}_j | 1 ≤ j ≤ n ),   (16)
and define
K^J_θ = { Σ_{j=1}^n x_j C_j | x ∈ R^n_+ ∧ Σ_{j∈J} x_j < θ }.
The following corollary can be proved in the same way as Lemma 3, in which φ_{u,θ} is replaced by φ^J_{u,θ}.
Corollary 2. For any vector u ∈ R^m and any index set J ⊆ {1, . . . , n}, we have u ∈ K^J_θ if and only if u ∈ φ^J_{u,θ}(K).

4.2. The main theorem Consider an LFP instance Ax = b ∧ x ≥ 0, where A is an m×n matrix and T is a k×m random projector. By Thm. 3, we know that
∃x ≥ 0 (Ax = b)   ⇔   ∃x ≥ 0 (TAx = Tb)
w.a.h.p. We remark that this also holds for a (k + h)×(m + h) random projector of the form [ I_h 0 ; 0 T ] (block rows separated by semicolons), where T is a k×m random matrix. This allows us to claim the feasibility equivalence w.a.h.p. even when we only want to project a subset of the rows of A. In the following, we will use this observation to handle constraints and the objective function separately. In particular, we only project the constraints while keeping the objective function unchanged.

If we add the constraint Σ_{j=1}^n x_j ≤ θ to the problem P_T (defined in Eq. (2)), we obtain the following:
P_{T,θ} ≡ min{ c^⊤x | TAx = Tb ∧ Σ_{j=1}^n x_j ≤ θ ∧ x ∈ R^n_+ }.   (17)
So we come to our main theorem, which asserts that the optimal objective value of P can be well-approximated by that of P_{T,θ}.

Theorem 4. Assume F(P) is bounded and non-empty. Let y* be an optimal dual solution of P of minimal Euclidean norm. Given 0 < δ ≤ |v(P)|, we have
v(P) − δ ≤ v(P_{T,θ}) ≤ v(P),   (18)
with probability at least p = 1 − 4n e^{−C(ε²−ε³)k}, where ε = O( δ / (θ² ‖y*‖) ).


First, we informally explain the idea of the proof. Since v(P) is the optimal objective value of problem P, for any positive δ the problem
Ax = b ∧ x ≥ 0 ∧ c^⊤x ≤ v(P) − δ
is infeasible (because we cannot obtain a lower objective value than v(P)). That problem can now be projected in such a way that it remains infeasible w.a.h.p. By rewriting this original problem in standard form as
[ c^⊤ 1 ; A 0 ] [ x ; s ] = [ v(P) − δ ; b ],   where (x, s) ≥ 0   (19)
(semicolons separate block rows), and applying a random projection of the form [ 1 0 ; 0 T ], where T is a k×m random projector, we obtain the following problem, which is supposed to be infeasible w.a.h.p.:
c^⊤x + s = v(P) − δ,   TAx = Tb,   s ≥ 0,   x ≥ 0.   (20)
The main idea is that the prior information about the optimal solution x* (i.e. the condition Σ_{j=1}^n x*_j ≤ θ) can now be added to this new projected problem. This does not change its feasibility, but can later be used to transform the corresponding cone into one which is easier to deal with. Therefore, w.a.h.p., the problem
c^⊤x ≤ v(P) − δ,   TAx = Tb,   Σ_{j=1}^n x_j ≤ θ,   x ≥ 0   (21)
is infeasible. Hence we deduce that c^⊤x ≥ v(P) − δ holds w.a.h.p. for any feasible solution x of the problem P_{T,θ}, and that proves the LHS of Eq. (18). For the RHS, the proof is trivial since P_{T,θ} is a relaxation of P (intersected with a constraint satisfied by x*) with the same objective function. We now turn to the formal proof.

Proof. Let
Ā = [ c^⊤ 1 ; A 0 ],   x̄ = [ x ; s ]   and   b̄ = [ v(P) − δ ; b ].
Furthermore, let
T̄ = [ 1 0 ; 0 T ],
where T is a k×m random projector. In the rest of the proof, we prove that b̄ ∉ cone(Ā) if and only if T̄b̄ ∉ cone(T̄Ā) w.a.h.p.


Let J be the index set of the first n columns of Ā. Consider the transformation φ^J_{b̄,θ'} as defined above, using a step 1/θ' instead of 1/θ, in which θ' ∈ (θ, θ + 1). We define the following matrix:
Ā' = ( Ā_1 − (1/θ') b̄   · · ·   Ā_n − (1/θ') b̄   Ā_{n+1} ).
Since Eq. (19) is infeasible, it is easy to verify that the system
Āx̄ = b̄,   Σ_{j=1}^n x̄_j < θ',   x̄ ≥ 0   (22)
is also infeasible. This is equivalent to
b̄ ∉ { Σ_j x̄_j Ā_j | x̄ ≥ 0 ∧ Σ_{j∈J} x̄_j < θ' }.
Then, by Cor. 2, it follows that b̄ ∉ cone(Ā').
Let y* ∈ R^m be an optimal dual solution of P of minimal Euclidean norm. By the strong duality theorem, we have (y*)^⊤A ≤ c^⊤ and (y*)^⊤b = v(P). We define
ȳ = [ 1 ; −y* ].
We will prove that ȳ^⊤Ā' > 0 and ȳ^⊤b̄ < 0. Indeed, since
ȳ^⊤Ā = ( c^⊤ − (y*)^⊤A ,  1 ) ≥ 0
and ȳ^⊤b̄ = v(P) − δ − (y*)^⊤b = −δ < 0, we have
ȳ^⊤Ā' = ( c^⊤ − (y*)^⊤A + (δ/θ') 1^⊤ ,  1 ) ≥ (δ/θ') 1^⊤ ≥ ( δ/(θ + 1) ) 1^⊤   and   ȳ^⊤b̄ = −δ   (23)
(where 1 is the all-one vector), which proves the claim.
Now we can apply the scalar product preservation property. By Proposition 1 and the union bound, we have that
∀j ≤ n   | ( (T̄ȳ)^⊤ (T̄Ā') − ȳ^⊤Ā' )_j | ≤ εη   (24)
| (T̄ȳ)^⊤ (T̄b̄) − ȳ^⊤b̄ | ≤ εη   (25)
hold with probability at least p = 1 − 4n e^{−C(ε²−ε³)k}. Here, η is the normalization constant (to scale vectors to unit norm)
η = max{ ‖ȳ‖‖b̄‖,  max_{1≤j≤n} ‖ȳ‖‖Ā'_j‖ },
for which we can easily estimate η = O(θ‖y*‖) (the proof is given at the end). Let us now fix ε = δ / (2(θ + 1)η). It is easy to see that
ε = δ / (2(θ + 1)η) = O( δ / (θ² ‖y*‖) ).
Then with this choice of ε, by (23), (24) and (25), we have, with probability at least p,
(T̄ȳ)^⊤ (T̄Ā') ≥ ȳ^⊤Ā' − εη 1^⊤ ≥ ( δ/(θ + 1) − εη ) 1^⊤ ≥ 0
(T̄ȳ)^⊤ (T̄b̄) ≤ ȳ^⊤b̄ + εη ≤ −δ + εη < 0,


which then implies that the problem
T̄Ā'x̄ = T̄b̄,   x̄ ≥ 0
is infeasible (by Farkas' Lemma). By definition, T̄Ā'x̄ = T̄Āx̄ − (1/θ') ( Σ_{j=1}^n x̄_j ) T̄b̄, which implies that the system
T̄Āx̄ = T̄b̄,   Σ_{j=1}^n x̄_j < θ',   x̄ ≥ 0
is also infeasible with probability at least p (the proof is similar to that of Corollary 2). Therefore, with probability at least p, the following optimization problem:
inf{ c^⊤x | TAx = Tb ∧ Σ_{j=1}^n x_j < θ' ∧ x ∈ R^n_+ }
has its optimal value greater than v(P) − δ. Since θ' > θ, it follows that with probability at least p we have v(P_{T,θ}) ≥ v(P) − δ, as claimed. The proof is done.

Proof of the claim that η = O(θ‖y*‖): We have
‖b̄‖² = ‖b‖² + (v(P) − δ)²   (by the definition of b̄)
  ≤ ‖b‖² + 2(v(P))² + 2δ²   (using the inequality (x − y)² ≤ 2x² + 2y² for all x, y)
  ≤ ‖b‖² + 4(v(P))²   (by the assumption that |δ| ≤ |v(P)|)
  = 1 + 4|c^⊤x*|²
  ≤ 1 + 4( ‖c‖_∞ ‖x*‖_1 )²   (by Hölder's inequality)
  ≤ 1 + 4θ²   (since ‖c‖_∞ ≤ ‖c‖_2 = 1 and Σ_i x*_i ≤ θ)
  ≤ 5θ²   (by the assumption that θ ≥ 1).
Therefore, we conclude that
η = max{ ‖ȳ‖‖b̄‖,  max_{1≤j≤n} ‖ȳ‖‖Ā'_j‖ } = O(θ ‖y*‖). □

5. Solution retrieval In this section we explain how to retrieve an approximation of the optimal solution x* of problem P. Let δ > 0; by Theorem 4, we can build a vector x' ∈ R^n_+ such that v(P) − δ ≤ c^⊤x' ≤ v(P) and TAx' = Tb for some k×m projection matrix T.

5.1. Infeasibility of projected solutions We first prove that Ax' ≠ b almost surely, which means that the projected problem directly gives us an approximate optimal objective function value, but not the optimum itself. Let 0 ≤ ν ≤ δ be such that v(P_T) = v(P) − ν.

Let Ā = [ c^⊤ ; A ] (the (m + 1)×n matrix obtained by prepending the row c^⊤ to A), b̄ = [ v(P) − ν ; b ], and T̄ = [ 1 0 ; 0 T ]. We assume here that the projected solution x' (s.t. c^⊤x' = v(P) − ν) is found uniformly in the projected solution set F' = { x ∈ R^n_+ | T̄Āx = T̄b̄ }. We denote F = { x ∈ R^n_+ | Āx = b̄ }.
Proposition 3. Assume that cone(Ā) is full dimensional in R^{m+1} and that any optimal solution of P has at least m non-zero components. Let x' be uniformly chosen in F'. Then, almost surely, Ax' = b does not hold.


Proof. If ν > 0 then obviously Ax' = b does not hold, because otherwise it would contradict the minimality of v(P). Hence we assume in the rest of the proof that ν = 0, i.e., the value of the projected problem is the same as the value of the original one.
In order to aim at a contradiction, we assume that
Prob(x' ∈ F) = p > 0.
For each ε ∈ ker(T̄), let
F_ε = { x ≥ 0 | Āx − b̄ = ε } ∩ F'.
We will prove that there exist d > 0 and a family V of infinitely many ε ∈ ker(T̄) such that Prob(x' ∈ F_ε) ≥ d > 0. Since (F_ε)_{ε∈V} is a family of disjoint sets, we deduce that Prob( x' ∈ ⋃_{ε∈V} F_ε ) ≥ Σ_{ε∈V} d = +∞, leading to a contradiction.
Claim: b̄ belongs to the relative interior of a facet of the (m + 1)-dimensional cone cone(Ā).
Proof of claim. Notice first that if b̄ belongs to the relative interior of cone(Ā) then we can find a feasible solution for P with a smaller cost. Hence b̄ belongs to a face of dimension at most m. Assume now, to aim at a contradiction, that b̄ belongs to the relative interior of a face of dimension d ≤ m − 1 of cone(Ā). Then we could write b̄ as a positive sum of d extreme rays Ā_j, j ∈ J. Hence there exists an optimal solution x* of P with d non-zero components. Since d < m there is a contradiction.
Hence 0 belongs to a facet of { Āx − b̄ | x ≥ 0 }, and since dim(ker(T̄)) ≥ 2 (w.l.o.g.), there exists a segment [−u, u] (for ‖u‖ small enough) that is contained in the intersection ker(T̄) ∩ { Āx − b̄ | x ≥ 0 }.
Let Ā_j, j ∈ J, be the rays of cone(Ā) that belong to the same facet of cone(Ā) as b̄. There exists x̂ ≥ 0 such that Āx̂ = b̄ and x̂_j > 0 for all j ∈ J (because b̄ belongs to the relative interior of this facet). Since [−u, u] belongs to this facet, there exists x̃ ∈ R^n such that Āx̃ = −u and x̃_j = 0 for all j ∉ J. We can hence compute N̄ > 0 large enough such that 2x̃ ≤ N̄ x̂.
For all N ≥ N̄ and for all x ∈ F, we denote x'_N = (x + x̂)/2 − (1/N) x̃. Then we have Āx'_N = b̄ − (1/N) Āx̃ = b̄ + u/N and x'_N = x/2 + ( x̂/2 − x̃/N ) ≥ 0. Therefore,
(x̂ + F)/2 − (1/N) x̃ ⊆ F_{u/N},
which implies that, for all N ≥ N̄,
Prob(x' ∈ F_{u/N}) = μ(F_{u/N}) ≥ μ( (x̂ + F)/2 − (1/N) x̃ ) ≥ α μ(F) = αp > 0
for some constant α > 0, where μ is a uniform measure on F'. □

5.2. Approximate solution retrieval Let y* be an optimal solution of the following dual problem:
D ≡ max{ b^⊤y | y^⊤A ≤ c^⊤ ∧ y ∈ R^m }   (26)
and let y_T be an optimal solution of the dual of the projected problem:
D_T ≡ max{ (Tb)^⊤y | y^⊤TA ≤ c^⊤ ∧ y ∈ R^k }.   (27)
Define y_prox = T^⊤ y_T. It is easy to see that y_prox is also a feasible solution for the dual problem D in (26).


In this section we will assume that the vector b ∈ R^m belongs to the relative interior of the normal cone at some vertex of the dual polyhedron. Under this assumption, the dual solution y* is uniquely determined.
Let C_t(y*) be the tangent cone of the dual polyhedron F(D) ≡ { y ∈ R^m | y^⊤A ≤ c^⊤ } at y*, which is defined as
C_t(y*) = closure( { d : ∃λ > 0 such that y* + λd ∈ F(D) } ).
In other words, C_t(y*) is the closure of the set of all feasible directions of the dual polyhedron F(D) at y*. Moreover, it is a convex cone generated by the set of vectors v_i = y_i − y*, where the y_i (for i ≤ p) are the neighbouring vertices of y*. Notice that, by the previous hypothesis, we have b^⊤v_i < 0 for all i ≤ p.
For each 1 ≤ i ≤ p, let α_i denote the angle between the vectors −b and v_i, and let α* be an angle attaining min_{1≤i≤p} cos(α_i).
We first prove the following lemma, which states that y_prox is close to y*.

Lemma 4. For any ε > 0, there is a constant C such that
‖y* − y_prox‖_2 ≤ ( Cθ²ε / (cos(α*)‖b‖_2) ) ‖y*‖_2   (28)
with probability at least p = 1 − 4n e^{−C(ε²−ε³)k}.

Proof. By definition, y_prox is a feasible solution for the dual problem D. Furthermore, by Theorem 4, there is a constant C such that
b^⊤y_prox ≥ b^⊤y* − Cθ²ε‖y*‖_2   (29)
with probability at least p = 1 − 4n e^{−C(ε²−ε³)k}.
Since y_prox − y* belongs to the tangent cone C_t(y*), there exist non-negative scalars λ_i (for i ≤ p) such that y_prox − y* = Σ_{i=1}^p λ_i v_i. Hence
‖y* − y_prox‖_2 = ‖ Σ_{i=1}^p λ_i v_i ‖_2 ≤ Σ_{i=1}^p λ_i ‖v_i‖_2.
By inequality (29), we also have
Cθ²ε‖y*‖_2 ≥ b^⊤(y* − y_prox) = Σ_{i=1}^p λ_i (−b^⊤v_i)   (we recall that −b^⊤v_i > 0 for all i).
Let us consider the following LP:
max Σ_{i=1}^p λ_i ‖v_i‖_2
s.t. Σ_{i=1}^p λ_i (−b^⊤v_i) ≤ Cθ²ε‖y*‖_2,   λ ≥ 0.   (30)
The LP above is a simple continuous knapsack problem whose solution can be computed easily by a greedy algorithm: let j be such that ‖v_j‖_2 / (−b^⊤v_j) ≥ ‖v_i‖_2 / (−b^⊤v_i) for all i ∈ {1, . . . , p}; then
( ‖v_j‖_2 / (−b^⊤v_j) ) Cθ²ε‖y*‖_2 = ( 1 / (cos(α*)‖b‖_2) ) Cθ²ε‖y*‖_2
is the optimal value of (30). The lemma is proved. □


We consider the following algorithm, which retrieves an approximate solution for the original LP from an optimal basis of the projected problem.

Algorithm 1 Retrieving an approximate solution of P
  Let y_T be the associated basic dual solution of the projected dual problem (D_T).
  Define y_prox = T^⊤ y_T.
  for all 1 ≤ j ≤ n do
    z_j := (c_j − A_j^⊤ y_prox) / ‖A_j‖_2
  Let B be the set of indices j corresponding to the m smallest values of z_j.
  return x := A_B^{−1} b.

Notice that, for all 1 ≤ j ≤ n, z_j := (c_j − A_j^⊤ y_prox) / ‖A_j‖_2 is the distance between y_prox and the hyperplane defined by A_j^⊤y = c_j. Hence, Algorithm 1 searches for the m facets of the dual polyhedron that are closest to y_prox and returns the corresponding basis.
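A direct numpy transcription of Algorithm 1 (ours, for illustration; it assumes the selected basis matrix A_B is invertible) could look as follows; it returns the full-length primal vector with zeros outside the selected basis.

```python
import numpy as np

def retrieve_basic_solution(A, b, c, T, y_T):
    """Algorithm 1 (sketch): lift the projected dual solution y_T to y_prox = T^T y_T,
    rank the dual constraints by their distance to y_prox, and solve for the basis."""
    m, n = A.shape
    y_prox = T.T @ y_T
    z = (c - A.T @ y_prox) / np.linalg.norm(A, axis=0)   # distance to each hyperplane A_j^T y = c_j
    B = np.argsort(z)[:m]                                # indices of the m smallest z_j
    x = np.zeros(n)
    x[B] = np.linalg.solve(A[:, B], b)                   # x_B := A_B^{-1} b
    return x, B
```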

Let B* be the optimal basis. We consider the shortest distance from y* to any hyperplane A_j^⊤y = c_j for j ∉ B*:
d* = min_{j∉B*} ( c_j − A_j^⊤ y* ) / ‖A_j‖_2.

Proposition 4. Assume that the LP problem P satisfies the following two assumptions:
(a) there is no degenerate vertex in the dual polyhedron;
(b) the vector b ∈ R^m belongs to the relative interior of the normal cone at some vertex of the dual polyhedron.
If
( Cθ²ε / (cos(α*)‖b‖_2) ) ‖y*‖_2 < d*/2,
where C is the universal constant in Lemma 4, then with probability at least p = 1 − 4n e^{−C(ε²−ε³)k}, Algorithm 1 returns an optimal basic solution.

Proof. By Lemma 4, we have that, with probability at least p = 1 − 4n e^{−C(ε²−ε³)k},
‖y* − y_prox‖_2 ≤ ( Cθ²ε / (cos(α*)‖b‖_2) ) ‖y*‖_2.
Let B* be the optimal basis. Since ‖y* − y_prox‖_2 < d*/2, we deduce that for all j ∈ B*, z_j ≤ ‖y* − y_prox‖_2 < d*/2.
Now, let us consider j ∉ B*. We have z_j ≥ d*/2, otherwise y* would be at a distance less than d* from the hyperplane A_j^⊤y = c_j. Since y* is non-degenerate we have d* > 0. This ends the proof. □

Note that both assumptions (a)-(b) in Prop. 4 hold almost surely for random instances.

6. Computational complexity The main aim of this paper is to prove that random projections can be applied to the given LP P with some probabilistic bounds on feasibility and optimality errors. The projected LP P_T can be solved by any method, e.g. simplex or interior point. Formally, we envisage the following solution methodology:
1. sample a random projection matrix T;


2. perform the multiplication T(A, b);
3. solve P_T (Eq. (2));
4. retrieve a solution for P,
where (A, b) is the m×(n + 1) matrix consisting of A with the column b appended.
A very coarse computational complexity estimation is as follows: we assume computing each component of T takes O(1), so computing T is O(km). The best practical algorithm for serial matrix multiplication is only very slightly better than the naive algorithm, which takes O(kmn) = O(mn log n), but more efficient parallel and distributed algorithms exist. For solution retrieval, Alg. 1 runs in time O(km + mn + n log n + m²) = O(n(m + log n)). The complexity O(mn log n) of matrix multiplication therefore dominates the complexity of sampling.

The last step, solution retrieval, is essentially dominated by taking the inverse of the m×m matrix A_B in Alg. 1, which we can assume to have complexity O(m³).
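A minimal sketch of steps 1-3 above (ours, assuming numpy and scipy are available; the projected LP is handed to an off-the-shelf solver, and step 4 is Algorithm 1):

```python
import numpy as np
from scipy.optimize import linprog

def solve_projected_lp(A, b, c, k, rng=None):
    """Sample an Achlioptas projector T, project (A, b), and solve the projected LP
    min c^T x s.t. TAx = Tb, x >= 0 with an off-the-shelf LP solver."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    T = np.sqrt(3.0 / k) * rng.choice([1.0, 0.0, -1.0], size=(k, m), p=[1 / 6, 2 / 3, 1 / 6])
    TA, Tb = T @ A, T @ b
    res = linprog(c=c, A_eq=TA, b_eq=Tb, bounds=[(0, None)] * n, method="highs")
    return res, T
```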

We focus our discussion on the most computationally costly step, i.e. that of solving the projected LP P_T. Exact polynomial-time methods for LP, such as the ellipsoid method or the interior point method, have complexity estimates ranging from O(n⁴L) to O((n³/log n)L), where
$$ L = \sum_{i=0}^{m}\sum_{j=0}^{n} \lceil \log(|a_{ij}|+1) + 1 \rceil, $$
with a_{i0} = b_i for all i ≤ m and a_{0j} = c_j for all j ≤ n [26]. Obviously, these LP complexity bounds are impacted by replacing the number m of rows in P by the corresponding number k = O(ln n) in P_T. Also note that, since m ≤ n, the complexity of solving an LP always exceeds (asymptotically) the complexity of the other steps, so the overall worst-case asymptotic complexity of our solution methodology does not change with respect to solving the original LP. On the other hand, m appears implicitly as part of L. If we assume we can write L as mL′ for some L′, then the complexity goes from O((n³/log n) mL′) to O((n³/log n)(ln n)L′) = O(n³L′).

The simplex method has exponential time complexity in the worst case. On the other hand, its average complexity is O(mn⁴) [5, Eq. (0.5.15)] in terms of the number of pivot steps, each taking O(m²L) in a naive implementation [24], where L represents a factor due to the encoding length (assumed multiplicative). This yields an overall average complexity bound of O(m³n⁴L). Replacing m by O(ln n) yields an improved bound of O(n⁴(ln n)³L).

7. Computational results A sizable majority of works on the applications of the JLL are theoretical in nature (with some exceptions, e.g. [27, 28]). In this section we provide some empirical evidence that our ideas hold solid promise of practical applicability.

We started our empirical study by considering the NetLib public LP instance library [23], but it turns out that its instances are too small and sparse to yield any CPU improvement. We therefore decided to generate and test a set of random LP instances in standard form. Our test set consists of 360 infeasible LPs and 360 feasible LPs. We considered pairs (m, n) as shown in Table 1.

     m     500    1000    1500
     n     600    1200    1800
           700    1400    2100
           800    1600    2400

Table 1. Instance sizes.

For each (m, n) we test constraint matrix densities dens ∈ {0.1, 0.3, 0.5, 0.7}. For each triplet (m, n, dens) we generate 10 instances where each component of the constraint matrix A is sampled from a uniform distribution on [0,1]. The objective function vector is always c = 1. Infeasible instances are generated using Farkas' lemma: we sample a dual solution vector y such that y^⊤A ≥ 0 and then choose b such that y^⊤b < 0. Feasible instances are generated by sampling a nonnegative primal solution vector x and letting b = Ax.
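A minimal sketch of this instance generator in Julia follows, under our reading of the construction above; the specific choice b = −y in the infeasible case is ours (any nonzero y ≥ 0 works, since A ≥ 0 entrywise implies y^⊤A ≥ 0).

    using SparseArrays, LinearAlgebra

    # Feasible instance: sample x ≥ 0 and set b = A x.
    function feasible_instance(m, n, dens)
        A = sprand(m, n, dens)           # nonzero entries uniform on [0,1]
        x = rand(n)
        return A, A * x, ones(n)         # (A, b, c) with c = 1
    end

    # Infeasible instance via Farkas' lemma: find y with yᵀA ≥ 0 and yᵀb < 0.
    function infeasible_instance(m, n, dens)
        A = sprand(m, n, dens)
        y = rand(m) .+ 0.1               # y > 0 entrywise, hence yᵀA ≥ 0 since A ≥ 0
        b = -y                           # then yᵀb = -‖y‖² < 0, so Ax = b, x ≥ 0 is infeasible
        return A, b, ones(n)
    end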


We employ Achlioptas random projectors in order to decrease the density of the projected constraint matrix. One of the foremost difficulties in using random projections in practice is that the theory behind them gives no hint as regards the "universal constants", e.g. C and the constant implicit in the definition of k as O((1/ε²) ln n). In theory, one should be able to work out appropriate values of ε and of the number σ of samplings of the random projector T for the problems at hand. In practice, following the theory would yield such small ε and large σ values that the smallest LPs for which our methodology becomes efficient would be expected to have billions of rows, defying all computation on modest hardware such as today's laptops. In fact, we are defending the point of view that random projections are useful in day-to-day work involving large but not necessarily huge LPs and common hardware platforms. For such LPs, a good deal of guesswork and trial-and-error is needed. In our computational results we use k = (1.8/ε²) ln n following an indication found in [27], ε = 0.2 after testing some values between 0.1 and 0.3, and σ = 1, again after some testing. The choice σ = 1 implies that, occasionally, a few pairwise distances might fall outside their bounds; but enforcing every pairwise distance to satisfy the JLL would require excessive amounts of samplings of T. Besides, concentration of measure ensures that very few pairwise distances will be projected wrongly w.a.h.p.
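For concreteness, this is how an Achlioptas projector with the above constants can be sampled; a sketch, where the scaling 1/√(2pk) is the standard one making E[T_ij²] = 1/k, and p = 1/6 is the classical choice from [1] (the function name is ours).

    using SparseArrays

    # Achlioptas projector: T_ij = (1/√(2pk))·s_ij with s_ij ∈ {+1, −1, 0}
    # taken with probabilities p, p, 1−2p.
    function achlioptas_projector(k, m; p = 1/6)
        U = rand(k, m)
        T = zeros(k, m)
        T[U .< p] .= 1.0
        T[U .> 1 - p] .= -1.0
        return sparse(T) ./ sqrt(2 * p * k)
    end

Lowering p (e.g. to 1/100, as we do in Sect. 8) makes T, and hence the cost of computing TA, sparser still.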

All results are obtained using a Julia [4] JuMP [18] script calling the CPLEX [13] barrier solver (without crossover) on four virtual cores of a dual-core Intel i7-7500U CPU at 2.70GHz with 16GB RAM (we remark that Julia is a just-in-time compiled language, so aside from a small lag to initially compile the script, CPU times should be similar to those of compiled rather than interpreted programs). The CPLEX barrier solver is, in our opinion, the solver of choice when solving very large and possibly dense LPs: our preliminary tests with the simplex method showed repeated failures due to excessive resource usage (both CPU and RAM), and high standard deviations in evaluating the computational advantage between original and projected problems. Eliminating the crossover phase is a choice we made after some experimentation with these instances. Some preliminary results show that this choice may need to be re-evaluated when solving problems with different structures.

7.1. Infeasible instances We benchmark infeasible instances on CPU time and accuracy. The latter is expressed in terms of mismatches, i.e. an infeasible original LP that is mapped into a feasible projected LP (recall that the converse can never happen, by linearity). The results are shown in Table 2. Each line is obtained as an average over the 10 instances with the same m, n, dens. We denote by m the number of rows, by n the number of columns, and by dens the density of the constraint matrix A. We then report the number of rows k in the projected problem, the time orgCPU taken to solve the original LP, the time prjCPU taken to solve the projected LP, and the accuracy acc ("zero" means that no instance was incorrectly classified as feasible in the projection). While for smaller instances the proposed methodology is not competitive as regards CPU time, the trend clearly shows that the larger the size of the original LP, the higher the chances of our methodology being faster, in accordance with the theory. We remark that prjCPU is the sum of the times taken to sample T, to perform the matrix multiplication TA, and to solve the projected problem.

7.2. Feasible instances Feasible instances are benchmarked on CPU time as well as on three discrepancy measures that ascertain the quality of the approximate solution x* of the projected LP. In particular, we look at feasibility with respect to both Ax = b and x ≥ 0, as well as at the optimality gap between the approximate and the guaranteed optimal objective function value. Unfortunately, we found very high errors in the application of the solution retrieval method in Alg. 1, which at this time we are only able to justify by claiming our test LPs are "too small" (but we are looking for a more detailed reason; for example, numerical errors leading to values close to zero might yield an invalid basis). We therefore also tested a different solution retrieval method


   m     n   dens    k   orgCPU  prjCPU  acc
  500   600  0.1   289     2.40    2.66  0.0
  500   600  0.3   289     2.15    2.80  0.0
  500   600  0.5   289     2.48    2.95  0.0
  500   600  0.7   289     2.91    3.12  0.0
  500   700  0.1   296     2.46    2.99  0.0
  500   700  0.3   296     2.24    2.93  0.0
  500   700  0.5   296     2.72    3.34  0.0
  500   700  0.7   296     3.49    3.38  0.0
  500   800  0.1   302     2.01    3.11  0.0
  500   800  0.3   302     2.35    3.17  0.0
  500   800  0.5   302     2.95    3.58  0.0
  500   800  0.7   302     3.60    3.95  0.0
 1000  1200  0.1   321     5.47    4.50  0.0
 1000  1200  0.3   321     6.92    5.76  0.0
 1000  1200  0.5   321     9.54    6.87  0.0
 1000  1200  0.7   321    13.75    7.79  0.0
 1000  1400  0.1   327     5.34    5.40  0.0
 1000  1400  0.3   327     7.89    6.48  0.0
 1000  1400  0.5   327    12.02    8.47  0.0
 1000  1400  0.7   327    20.93    9.73  0.0
 1000  1600  0.1   333     5.64    6.29  0.0
 1000  1600  0.3   333     8.26    8.23  0.0
 1000  1600  0.5   333    13.20   10.15  0.0
 1000  1600  0.7   333    20.26   13.34  0.0
 1500  1800  0.1   339     7.40    8.04  0.0
 1500  1800  0.3   339    14.38   10.84  0.0
 1500  1800  0.5   339    24.83   13.97  0.0
 1500  1800  0.7   339    41.98   19.02  0.0
 1500  2100  0.1   346     7.98   10.05  0.0
 1500  2100  0.3   346    17.27   12.20  0.0
 1500  2100  0.5   346    33.35   16.27  0.0
 1500  2100  0.7   346    66.81   19.72  0.0
 1500  2400  0.1   352     8.52   13.54  0.0
 1500  2400  0.3   352    20.00   17.78  0.0
 1500  2400  0.5   352    39.01   24.75  0.0
 1500  2400  0.7   352    65.85   31.95  0.0

Table 2. Results on infeasible instances.

based on the pseudoinverse: it consists in replacing A_B x = b (see the last line of Alg. 1) by the reduced system A_H^⊤ A_H x = A_H^⊤ b, where H is a basis of the projected problem P_T (the reconstruction of the full solution from the projected basic components is heuristic). Accordingly, we present two sets of statistics for feasible instances: one labelled "1", referring to Alg. 1, and the other labelled "2", referring to the pseudoinverse variant.
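A minimal sketch of this pseudoinverse variant, under our reading that the non-basic components of the retrieved vector are simply set to zero (the function name is ours):

    using LinearAlgebra

    # Pseudoinverse-based retrieval: H indexes a basis of the projected problem P_T.
    # Solve the reduced normal equations A_Hᵀ A_H x_H = A_Hᵀ b; non-basic entries stay 0.
    function retrieve_pseudoinverse(A, b, H)
        AH = Matrix(A[:, H])             # densify in case A is sparse
        xH = (AH' * AH) \ (AH' * b)
        x = zeros(size(A, 2))
        x[H] = xH
        return x
    end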

The results on the feasible instances are given in Table 3. Again, each line is obtained as an average over the 10 instances with the same m, n, dens. The CPU time comparison takes three columns: orgCPU refers to the time taken by CPLEX to solve the original LP; prjCPU1 is the sum of the times taken to sample T, multiply T by A, solve the projected LP, and retrieve the original solution by Alg. 1; and prjCPU2 is the same as prjCPU1 but using the solution retrieval method based on the pseudoinverse. The solution quality is evaluated in the six columns feas1, feas2 (verifying feasibility with respect to Ax = b using the two retrieval methods), neg1, neg2 (verifying feasibility with respect to x ≥ 0 using the two retrieval methods), and obj1, obj2 (evaluating the optimality gap using the two retrieval methods), defined as follows:

• $\mathrm{feas} = \frac{1}{\|b\|_1}\sum_{i\le m} |A_i x^* - b_i|$;

• $\mathrm{neg} = \frac{1}{\|x^*\|_1}\sum_{x^*_j<0} |x^*_j|$;

• $\mathrm{obj} = \frac{|v(P)-v(P_T)|}{|v(P)|}$.
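These measures are straightforward to compute once x* and the optimal objective values are available; a minimal sketch (v(P) and v(P_T) denote the optimal values returned by the solver; names are ours):

    using LinearAlgebra

    # Discrepancy measures for a retrieved solution xstar of the original LP.
    feas(A, b, xstar) = sum(abs.(A * xstar .- b)) / norm(b, 1)
    neg(xstar)        = sum(abs, xstar[xstar .< 0]) / norm(xstar, 1)
    obj(vP, vPT)      = abs(vP - vPT) / abs(vP)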

These results are reported in Table 3.

   m     n   dens    k   orgCPU  prjCPU1  prjCPU2  feas1  feas2   neg1   neg2   obj1   obj2
  500   600  0.1   289     2.42     9.97     6.76  0.000  0.000  0.437  0.033  0.079  0.055
  500   600  0.3   289     2.41    10.24     7.08  0.000  0.000  0.442  0.035  0.029  0.027
  500   600  0.5   289     3.06    10.53     7.37  0.000  0.000  0.444  0.037  0.023  0.020
  500   600  0.7   289     3.89    10.81     7.72  0.000  0.000  0.454  0.036  0.042  0.014
  500   700  0.1   296     2.53    10.41     7.11  0.000  0.000  0.467  0.039  0.246  0.050
  500   700  0.3   296     2.46    10.72     7.58  0.000  0.000  0.453  0.045  0.068  0.025
  500   700  0.5   296     3.43    11.10     7.97  0.000  0.000  0.475  0.043  0.065  0.017
  500   700  0.7   296     4.45    11.40     8.45  0.000  0.000  0.468  0.038  0.028  0.012
  500   800  0.1   302     2.01    10.67     7.58  0.000  0.000  0.472  0.059  0.102  0.045
  500   800  0.3   302     2.55    11.10     8.02  0.000  0.000  0.463  0.060  0.053  0.023
  500   800  0.5   302     3.69    11.60     8.48  0.000  0.000  0.474  0.061  0.068  0.015
  500   800  0.7   302     5.03    12.03     9.02  0.000  0.000  0.473  0.054  0.044  0.011
 1000  1200  0.1   321     6.49    14.03    10.04  0.000  0.000  0.466  0.012  0.036  0.067
 1000  1200  0.3   321     9.16    15.82    11.61  0.000  0.000  0.468  0.012  0.054  0.030
 1000  1200  0.5   321    14.71    17.52    13.46  0.000  0.000  0.487  0.013  0.277  0.021
 1000  1200  0.7   321    26.89    19.45    14.44  0.000  0.000  0.464  0.013  0.092  0.014
 1000  1400  0.1   327     6.88    15.54    11.50  0.000  0.000  0.484  0.013  0.222  0.058
 1000  1400  0.3   327    10.34    17.05    12.91  0.000  0.000  0.495  0.016  0.411  0.026
 1000  1400  0.5   327    22.80    19.85    16.23  0.000  0.000  0.488  0.013  0.144  0.016
 1000  1400  0.7   327    34.73    21.64    16.47  0.000  0.000  0.484  0.013  0.111  0.012
 1000  1600  0.1   333     7.16    16.98    12.93  0.000  0.000  0.487  0.021  0.857  0.056
 1000  1600  0.3   333    11.39    20.11    15.93  0.000  0.000  0.480  0.016  0.102  0.021
 1000  1600  0.5   333    25.44    22.42    18.73  0.000  0.000  0.486  0.017  0.073  0.014
 1000  1600  0.7   333    40.84    26.31    21.28  0.000  0.000  0.483  0.016  0.066  0.010
 1500  1800  0.1   339     9.77    21.64    15.68  0.000  0.000  0.479  0.005  0.069  0.064
 1500  1800  0.3   339    20.81    26.33    18.89  0.000  0.000  0.477  0.004  0.042  0.027
 1500  1800  0.5   339    42.95    29.95    22.36  0.000  0.000  0.473  0.004  0.054  0.018
 1500  1800  0.7   339    74.23    35.63    27.82  0.000  0.000  0.472  0.005  0.016  0.013
 1500  2100  0.1   346    10.38    24.78    19.02  0.000  0.000  0.485  0.007  0.095  0.057
 1500  2100  0.3   346    25.74    29.22    21.88  0.000  0.000  0.487  0.007  0.156  0.022
 1500  2100  0.5   346    52.21    34.06    26.06  0.000  0.000  0.483  0.007  0.046  0.015
 1500  2100  0.7   346    90.18    36.81    29.58  0.000  0.000  0.487  0.005  0.064  0.010
 1500  2400  0.1   352    11.26    27.90    22.12  0.000  0.000  0.485  0.006  0.121  0.050
 1500  2400  0.3   352    29.85    35.97    28.58  0.000  0.000  0.485  0.006  0.134  0.019
 1500  2400  0.5   352    61.25    42.47    34.99  0.000  0.000  0.489  0.006  0.253  0.011
 1500  2400  0.7   352   104.58    49.98    43.00  0.000  0.000  0.492  0.006  0.126  0.008

Table 3. Results on feasible instances.

Again, we see an encouraging trend: the CPU time for creating and solving the projected LP becomes smaller than the time taken to solve the original LP as size and density increase. According to our theoretical development, increasing size/density further will give a definite advantage to our methodology based on random projections. It is clear that feasibility w.r.t. Ax = b is never a problem. On the other hand, feasibility w.r.t. non-negativity is an issue, which we expected with the pseudoinverse-based solution retrieval method, but not with Alg. 1. After checking it (and its implementation) multiple times, we came to two possible conclusions: (i) that our arbitrary choice of universal constants is wrong for Alg. 1, which would require larger instances than those we tested in order to work effectively; (ii) that the choice of the basis B in Alg. 1 is heavily affected by numerical errors, and is therefore wrong. We have been unable to establish which of these reasons is most impactful, and delegate this investigation to future work. For the time being, we propose the pseudoinverse variant as the method of choice.


8. An application to error correcting codes In this section we showcase an application of our methodology to a problem of error correcting coding and decoding [21, §8.5].

A binary word w of length m can be encoded as a word z of length n (with m < n) such that z = Qw, where Q is an n × m real matrix which we assume to have rank m. After transmission on an analogue noisy channel, the other party receives z̄. We assume z̄ = z + x, where the transmission error x_j on the j-th character is uniformly distributed in [−δ, δ] for some given δ > 0 with some given (reasonably small) probability ε > 0, and x_j = 0 with probability 1 − ε. In other words, x is a sparse vector with density ε.

The decoding of z̄ into w is carried out as follows. We find an m × n matrix A orthogonal to Q (so AQ = 0), we compute b = Az̄ and note that
$$ b = A\bar{z} = A(z + x) = A(Qw + x) = AQw + Ax = Ax. $$
If the system Ax = b can be solved, we can find z′ = z̄ − x, and recover w using the projection matrix (Q^⊤Q)^{−1}Q^⊤ followed by rounding to the nearest integer:
$$ w = \lfloor (Q^\top Q)^{-1} Q^\top z' \rceil. $$

The protocol rests on finding a sparse solution of the under-determined linear system Ax = b. Minimizing the number of non-zero components of a vector that also satisfies Ax = b is known as "zero-norm minimization", and is NP-hard [22]. In a celebrated line of work later called compressed sensing, Candes, Romberg, Tao and Donoho showed that the zero-norm is well approximated by the ℓ1-norm [7, 8]. We therefore consider the following problem

$$ \min\{\|x\|_1 \mid Ax = b\}, $$

which can be readily reformulated to the LP
$$ \min\Big\{\sum_j s_j \;\Big|\; -s \le x \le s \wedge Ax = b\Big\}. \qquad (31) $$

We propose to compare the solution of Eq. (31) with that of its randomly projected version:
$$ \min\Big\{\sum_j s_j \;\Big|\; -s \le x \le s \wedge TAx = Tb\Big\}, \qquad (32) $$

where T is an Achlioptas random projector. The computational set-up for this test is similar to that of Sect. 7, except that we enable the crossover in the CPLEX barrier solver.
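A compact sketch of the whole decoding protocol follows, under illustrative choices: the nullspace-based construction is one way to obtain a matrix with AQ = 0 (it yields (n−m) rows rather than m), GLPK again stands in for CPLEX, and passing T = I recovers Eq. (31) while a random k-row T gives Eq. (32); all names are ours.

    using JuMP, GLPK, LinearAlgebra

    # Given the encoding matrix Q and the received word zbar = Q*w + noise,
    # recover w by ℓ1-minimization of the error x satisfying A x = A zbar.
    function decode(Q, zbar, T = I)
        n = size(Q, 1)
        A = Matrix(nullspace(Q')')               # rows orthogonal to Q, so A*Q = 0
        b = A * zbar
        model = Model(GLPK.Optimizer)
        @variable(model, x[1:n])
        @variable(model, s[1:n] >= 0)
        @constraint(model, x .<= s)
        @constraint(model, x .>= -s)
        @constraint(model, (T * A) * x .== T * b)
        @objective(model, Min, sum(s))
        optimize!(model)
        zprime = zbar .- value.(x)               # z' = z̄ − x
        return round.(Int, (Q' * Q) \ (Q' * zprime))   # w = ⌊(QᵀQ)⁻¹ Qᵀ z'⌉
    end

Passing, for instance, an Achlioptas projector as sketched in Sect. 7 (with parameter lowered to 1/100) reproduces the projected variant used in the experiments below.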

We compare Eq. (31) and Eq. (32) on the sentence that the Sybilla of Delphos spoke to the hapless soldier who asked her whether he would get back from the war or die in it: Ibis redibis non morieris in bello [Alberico delle Tre Fontane, Chronicon], at which the soldier rejoiced. When the wife heard her husband had actually died in the war, she contacted the Sybilla for a full refund. The Sybilla, unperturbed, pointed out that the small print in the legal terms granted her the right to insert commas in sentences as she saw fit, which turned her prophecy into the more reality-oriented Ibis redibis non, morieris in bello. We test here the comma-free version: much more cryptic, ambiguous, and therefore worthy of the Sybilla.

The original sentence is encoded in ASCII-128 and then in binary without padding (1001001 1100010 1101001 1110011 1000001 1100101 1001011 1001001 1010011 1000101 1010011 1100111 0000011 0111011 0111111 0111010 0000110 1101110 1111111 0010110 1001110 0101111 0010110 1001111 0011100 0001101 0011101 1101000 0011000 1011001 0111011 0011011 0011011 11). The binary string has m = 233 characters, is encoded into n = 256 characters (assuming an error rate of 10%, typical of the Sybilla muttering incantations with a low and guttural voice), and is then


projected into k = 61 characters. We modified the parameter of the Achlioptas projector from 1/6 down to 1/100 after verifying with many examples that this particular application is extremely robust to random projections.

While the original LP took 0.296s to solve, the projected LP only took 0.028s. The accuracy in retrieving the original text was perfect. In fact, in this application it is very hard to make mistakes in the recovery; so much so that we could set the JLL ε to 0.3. This might be partly due to the fact that the LP in Eq. (31) does not include nonnegativity constraints, which are generally problematic because of their large Gaussian width (see Sect. 1.1).

We also tested a slightly longer word sequence from a well-known poem about aviary permanence on Greek sculptures: Once upon a midnight dreary, while I pondered, weak and weary. The 421-character binary string is encoded into 463 characters and projected into 67. The original LP took 1.332s and the projected LP took 0.064s to solve; again, the retrieval accuracy was perfect.

9. Conclusion This paper is about the application of random projections to LP in standard form. We prove that feasibility and optimality are both approximately preserved by sub-gaussian random projections. Moreover, we show how to retrieve solutions of the original LPs from those of the projected LPs using duality arguments. These findings make it possible to approximately solve very large-scale LPs with high probability, as showcased by our computational results and by the application to error correcting codes.

Acknowledgments We are grateful to A.A. Ahmadi and D. Malioutov for introducing us to the Johnson-Lindenstrauss lemma, and to F. Tardella for helpful discussions. We are also grateful to two referees and an associate editor who, through accurate and constructive criticism, helped us improve this paper considerably. This research is partly supported by the SO-grid project (www.so-grid.com) funded by ADEME, by the EU Grant FP7-PEOPLE-2012-ITN No. 316647 "Mixed-Integer Nonlinear Optimization", and by the ANR "Bip:Bip" project under contract ANR-10-BINF-0003. The first author was supported by a Microsoft Research Ph.D. fellowship.

References

[1] Achlioptas D (2003) Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences 66:671–687.

[2] Allen-Zhu Z, Gelashvili R, Micali S, Shavit N (2014) Sparse sign-consistent Johnson-Lindenstrauss matrices: Compression with neuroscience-based constraints. Proceedings of the National Academy of Sciences 111(47):16872–16876.

[3] Anthony M, Biggs N (1992) Computational Learning Theory: an Introduction. Cambridge Tracts in Theoretical Computer Science (Cambridge: Cambridge University Press).

[4] Bezanson J, Edelman A, Karpinski S, Shah V (2017) Julia: A fresh approach to numerical computing. SIAM Review 59(1):65–98, URL http://dx.doi.org/10.1137/141000671.

[5] Borgwardt K (1987) The Simplex Method: a Probabilistic Analysis (Berlin: Springer).

[6] Boutsidis C, Zouzias A, Drineas P (2010) Random projections for k-means clustering. Advances in Neural Information Processing Systems, 298–306 (La Jolla: NIPS Foundation).

[7] Candes E, Tao T (2005) Decoding by Linear Programming. IEEE Transactions on Information Theory 51(12):4203–4215.

[8] Candes E, Tao T (2008) Reflections on compressed sensing. IEEE Information Theory Society Newsletter 58(4):14–17.

[9] Dantzig G (1990) The Diet Problem. Interfaces 20(4):43–47.

[10] Dasgupta A, Kumar R, Sarlos T (2010) A sparse Johnson-Lindenstrauss transform. Proceedings of the Symposium on the Theory of Computing, volume 10 of STOC (Cambridge: ACM).

[11] Dasgupta S, Gupta A (2002) An elementary proof of a theorem by Johnson and Lindenstrauss. Random Structures and Algorithms 22:60–65.


[12] de Farias DP, Roy BV (2004) On constraint sampling in the Linear Programming approach to approximate Dynamic Programming. Mathematics of Operations Research 29(3):462–478.

[13] IBM (2014) ILOG CPLEX 12.6 User's Manual. IBM.

[14] Indyk P, Naor A (2007) Nearest neighbor preserving embeddings. ACM Transactions on Algorithms 3(3):Art. 31.

[15] Johnson W, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. Hedlund G, ed., Conference in Modern Analysis and Probability, volume 26 of Contemporary Mathematics, 189–206 (Providence: American Mathematical Society).

[16] Kane D, Nelson J (2014) Sparser Johnson-Lindenstrauss transforms. Journal of the ACM 61(1):4.

[17] Koenker R (2005) Quantile regression (Cambridge: Cambridge University Press).

[18] Lubin M, Dunning I (2015) Computing in operations research using Julia. INFORMS Journal on Computing 27(2):238–248, URL http://dx.doi.org/10.1287/ijoc.2014.0623.

[19] Matousek J (2008) On variants of the Johnson-Lindenstrauss lemma. Random Structures and Algorithms 33:142–156.

[20] Matousek J (2013) Lecture notes on metric embeddings. Technical report, ETH Zurich.

[21] Matousek J, Gartner B (2007) Understanding and using Linear Programming (Berlin: Springer).

[22] Natarajan B (1995) Sparse approximate solutions to linear systems. SIAM Journal on Computing 24(2):227–234.

[23] NetLib (2015) LP instance library. http://www.netlib.org/lp/.

[24] Pan V (1985) On the complexity of a pivot step of the revised simplex algorithm. Computers & Mathematics with Applications 11(11):1127–1140.

[25] Pilanci M, Wainwright M (2014) Randomized sketches of convex programs with sharp guarantees. International Symposium on Information Theory (ISIT), 921–925 (Piscataway: IEEE).

[26] Potra F, Wright S (2000) Interior-point methods. Journal of Computational and Applied Mathematics 124:281–302.

[27] Venkatasubramanian S, Wang Q (2011) The Johnson-Lindenstrauss transform: An empirical study. Algorithm Engineering and Experiments, volume 13 of ALENEX, 164–173 (Providence: SIAM).

[28] Yang J, Meng X, Mahoney M (2014) Quantile regression for large-scale applications. SIAM Journal on Scientific Computing 36(5):S78–S110.