
Cluster-based Distributed Augmented Lagrangian Algorithm

for a Class of Constrained Convex Optimization Problems 1

Hossein Moradian a, Solmaz S. Kia a

a Department of Mechanical and Aerospace Engineering, University of California, Irvine

Abstract

We propose a distributed solution for a constrained convex optimization problem over a network of clustered agents, each consisting of a set of subagents. The communication range of the clustered agents is such that they can form a connected undirected graph topology. The total cost in this optimization problem is the sum of the local convex costs of the subagents of each cluster. We seek a minimizer of this cost subject to a set of affine equality constraints, and a set of affine inequality constraints specifying the bounds on the decision variables if such bounds exist. We design our distributed algorithm in a cluster-based framework, which results in a significant reduction in communication and computation costs. Our proposed distributed solution is a novel continuous-time algorithm that is linked to the augmented Lagrangian approach. It converges asymptotically when the local cost functions are convex, and exponentially when they are strongly convex and have Lipschitz gradients. Moreover, we use an ε-exact penalty function to address the inequality constraints and derive an explicit lower bound on the penalty function weight to guarantee convergence to the ε-neighborhood of the global minimum value of the cost. A numerical example demonstrates our results.

Key words: distributed constrained convex optimization, augmented Lagrangian, primal-dual solutions, optimal resource allocation, penalty function methods

1 Introduction

We consider a group of N clustered agents V = {1, · · · , N} with communication and computation capabilities, whose communication range is such that they can form a connected undirected graph topology, see Fig. 1. These agents aim to solve, in a distributed manner, the optimization problem

\begin{align}
x^\star = \arg\min_{x\in\mathbb{R}^m}\ &\textstyle\sum_{i=1}^{N} f^i(x^i), \quad \text{subject to} \tag{1a}\\
&[\mathsf{w}^1]_j x^1 + \cdots + [\mathsf{w}^N]_j x^N - \mathsf{b}_j = 0, \quad j \in \{1, \cdots, p\}, \tag{1b}\\
&\underline{x}^i_l \leq x^i_l, \quad l \in \underline{\mathcal{B}}^i \subseteq \{1, \cdots, n_i\},\ i \in \mathcal{V}, \tag{1c}\\
&x^i_l \leq \bar{x}^i_l, \quad l \in \bar{\mathcal{B}}^i \subseteq \{1, \cdots, n_i\},\ i \in \mathcal{V}, \tag{1d}
\end{align}

where f^i(x^i) = Σ_{l=1}^{n_i} f^i_l(x^i_l). In this setting, each agent i ∈ V is a cluster of local 'subagents' l ∈ {1, . . . , n_i} whose decision variable is x^i = [x^i_1, · · · , x^i_{n_i}]^⊤ ∈ R^{n_i}. The weighting factor matrix w^i ∈ R^{p×n_i} of each agent i ∈ V is known only to agent i itself. Moreover, x̲^i_l, x̄^i_l ∈ R, with x̲^i_l < x̄^i_l, are respectively the lower and upper bounds on the lth decision variable of agent i ∈ V, if such a bound exists. In a distributed solution, each agent i ∈ V should obtain its respective component of x⋆ = [x^{1⋆⊤}, · · · , x^{N⋆⊤}]^⊤ by interacting only with the agents that are in its communication range. Problem (1), explicitly or implicitly, captures various in-network optimization problems. One example is optimal in-network resource allocation, which appears in many optimal decision-making tasks such as economic dispatch over power networks [2,3], optimal routing [4,5], and network resource allocation for wireless systems [6,7]. In such problems, a group of agents with limited resources, e.g., a group of generators in a power network, add up their local resources to meet a demand in a way that the overall cost is optimal for the entire network. Another family of problems that can be modeled as (1) is in-network model predictive control over a finite horizon for a group of agents with linear dynamics [8,9].

⋆ Corresponding author: H. Moradian. Email addresses: [email protected] (Hossein Moradian), [email protected] (Solmaz S. Kia).
1 A preliminary version of this paper is presented in [1].

In recent years, there has been a surge in the design of distributed algorithms for large-scale in-network optimization problems. The major developments have been in the unconstrained convex optimization setting where the global cost is the sum of the local costs of the agents

Preprint submitted to Automatica 30 October 2020

arXiv:1908.06634v3 [cs.MA] 29 Oct 2020


Fig. 1. A group of clustered agents (generators) with undirected connected graph topology aim to solve x⋆ = arg min_{x∈R^{12}} Σ_{i=1}^{6} f^i(x^i), subject to [1 1]x^1 + x^2 + [0.5 0.5 0.5]x^3 + [1 1 1]x^4 = 450, [0.5 0.5 0.5]x^3 + [1 1]x^5 + x^6 = 700, and x̲^i_l ≤ x^i_l ≤ x̄^i_l, i ∈ Z^6_1, l ∈ Z^{n_i}_1, in a distributed manner. Here, f^i(x^i) = Σ_{l=1}^{n_i} f^i_l(x^i_l), where f^i_l(x^i_l) = α^i_l (x^i_l)^2 + β^i_l x^i_l + γ^i_l. In the physical-layer plot, a cluster agent can communicate with another cluster if it is inside the other cluster's communication disk. To solve this optimal resource allocation problem in a distributed manner, we form subgraphs G_1(V_1, E_1) and G_2(V_2, E_2), which are associated, respectively, with the first and the second equality constraints. Here, V_1 = {1, 2, 3, 4} and V_2 = {3, 4, 5, 6}. Agent 4 acts as a connectivity helper node in G_2. A solution to this problem using our proposed algorithm is given in Section 4.
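As a quick sanity check of the clustering logic in the caption, the sets C_k of agents coupled through constraint k can be read off the sparsity pattern of the weight matrices. The sketch below uses the w^i data stated above; the dictionary-based encoding is our own illustrative choice, not the paper's notation:

```python
import numpy as np

# Weight matrices w^i in R^{p x n_i} from the Fig. 1 example (p = 2 constraints).
w = {1: np.array([[1., 1.], [0., 0.]]),
     2: np.array([[1.], [0.]]),
     3: np.array([[.5, .5, .5], [.5, .5, .5]]),
     4: np.array([[1., 1., 1.], [0., 0., 0.]]),
     5: np.array([[0., 0.], [1., 1.]]),
     6: np.array([[0.], [1.]])}
p = 2

# C_k = set of agents whose decision variables enter equality constraint k,
# i.e. agents i with a nonzero k-th row [w^i]_k.
C = {k: {i for i, wi in w.items() if np.any(wi[k - 1] != 0)} for k in range(1, p + 1)}
print(C)  # {1: {1, 2, 3, 4}, 2: {3, 5, 6}}
```

Note that C_2 = {3, 5, 6} does not contain agent 4: agent 4 enters V_2 = {3, 4, 5, 6} only as a connectivity helper.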

(see e.g. [10,11] for algorithms in discrete time, and [12–14] for algorithms in continuous time). In-network constrained convex optimization problems have also been studied in the literature. For example, in the context of the power generator economic dispatch problem, [15–17] offer distributed solutions that solve a special case of (1) with local quadratic costs subject to bounded decision variables and a single demand equation, p = 1 and w^i = 1 for i ∈ V. Distributed algorithm design for special cases of (1) with non-quadratic costs is presented in [8,18,19] in discrete-time form, and in [20–24] in continuous-time form. Except for [19], all these algorithms consider the case that the local decision variable of each agent i ∈ V is a scalar. Moreover, with the exception of [8,19,21], these algorithms only solve (1) when the equality constraint is the unweighted sum of the local decision variables, i.e., p = 1 and w^i = 1 for i ∈ V. Also, only [23] and [24] consider local inequality constraints, which are in the form of local box inequality constraints on all the decision variables of the problem. Lastly, the algorithms in [18,23,24] require the agents to communicate the gradients of their local cost functions to their neighbors. Such a requirement can be of concern for privacy-sensitive applications.

In this paper, we propose a novel distributed algorithm to solve the optimization problem (1). We start by considering the case that B̲^i = B̄^i = {} for i ∈ V, i.e., when there is no inequality constraint. For this problem, we propose a continuous-time distributed primal-dual algorithm. To induce robustness and also to yield convergence without strict convexity of the local cost functions, we adapt an augmented Lagrangian framework [25]. The augmented Lagrangian method has been used in [26], [27], and [19] to improve the transient response of distributed algorithms for, respectively, an unconstrained convex optimization, an online optimization, and a discrete-time constrained optimization problem. Different from the customary practice of using a common augmented Lagrangian penalty parameter as in [19,26,27], in our design, to reduce the coordination overhead among the agents, we allow each agent to choose its own penalty parameter locally. The structure of our distributed solution is inspired by the primal-dual centralized solution of [28] (see (6)), where the coupling in the differential solver is in the dual state dynamics. In decentralized primal-dual algorithms, e.g. [22,29,30], the adopted practice is to give every agent a copy of the dual variables and use a consensus mechanism to make the agents eventually arrive at the same dual variable. We follow the same approach, but in our design we pay particular attention to computation and communication resource management by adopting a cluster-based approach. First, we exploit the sparsity in the equality constraints and give a copy of a dual variable to an agent only if a decision variable of that agent is involved in the equality constraint corresponding to that dual variable. Then, only the cluster of agents that have a copy of the dual variable need to form a connected graph and use a consensus mechanism to reach agreement on their dual variable, see Fig. 1. Next, in our design, we assign only a single copy of the dual variable to an agent i regardless of how many subagents it has. We note that if we use the algorithms in [8,18–24] to solve problems where x^i ∈ R^{n_i} of an agent i ∈ V is a vector (n_i > 1), we need to treat each component of x^i as an agent and assign a copy of a dual variable to it. Such a treatment increases the local storage, computation, and communication costs of agent i. Our convergence analysis is based on the Lyapunov and LaSalle invariant set methods, and also on semistability analysis [31], to show that our algorithm is guaranteed to converge to a point in the set of optimal decision values when the local costs are convex. When the local cost functions are strongly convex and their local gradients are globally Lipschitz, the convergence guarantee of our proposed algorithm over connected graphs is exponential and can also be extended to dynamic graphs.

To address scenarios where all or some of the decision variables in (1) are bounded, we use a variation of the exact penalty function method [32], called the ε-exact penalty function method [33]. Unlike the exact penalty method, this method uses a smooth differentiable penalty function to converge to the ε-neighborhood of the global minimum value of the cost. The advantage of exact penalty function methods is in the possibility of using


a finite penalty weight to arrive at a practical and numerically well-posed optimization solution. However, as shown in [32,33], the penalty function weight is lower bounded by the bounds on the Lagrange multipliers. Since the Lagrange multipliers are generally unknown, the bound on the penalty function weight is not known either. Much of the literature that uses penalty function methods in a distributed optimization framework simply states that a large enough value for the weight is used [34,35], with no guarantees on the feasibility of that choice. [36], [37], [24, Lemma 5.1], and [30, Proposition 4] are among the few results in the literature that address the problem of establishing an exact upper bound on the size of the Lagrange multipliers, which can be used to obtain a lower bound on the size of a valid penalty function weight. However, [36] considers problems with inequality constraints only, while [24, Lemma 5.1] and [30, Proposition 4] are developed for the resource allocation problem described by (1) when there exists only one equality constraint (p = 1) with w^i = 1, i ∈ V, and all the decision variables have box inequality constraints. On the other hand, [37] proposes a numerical procedure. As part of our contribution in this paper, we obtain an explicit closed-form upper bound on the Lagrange multipliers of problem (1), which enables determining the size of a suitable penalty function weight for both the exact and ε-exact penalty function methods.

In summary, the contribution of this paper is twofold. (a) We propose a novel distributed algorithm to solve problem (1). This design uses an augmented Lagrangian approach, which, similar to the case of centralized solvers, extends the convergence guarantees of our proposed distributed algorithm to convex cost functions as well. Our design also incorporates a cluster-based approach to reduce computational and communication costs. (b) We establish a well-defined upper bound on the Lagrange multipliers of problem (1). This result is of fundamental importance and its impact is beyond our proposed algorithm. It is useful in identifying the value of the weight factor of the exact and ε-exact penalty functions that are used to address inequality constraints.

2 Preliminaries

Let R, R_{≥0}, Z, and Z_{>0} be, respectively, the set of real, nonnegative real, integer, and positive integer numbers. For given i, j ∈ Z, i < j, we define Z^j_i = {x ∈ Z | i ≤ x ≤ j}. We denote the cardinality of a set A by |A|. For a matrix A = [a_{ij}] ∈ R^{n×m}, we denote its transpose by A^⊤, its kth row by [A]_k, its kth column by [A]^k, and its element-wise max-norm by ‖A‖_max. We let 1_n (resp. 0_n) denote the vector of n ones (resp. n zeros), I_n denote the n×n identity matrix, and Π_n = I_n − (1/n) 1_n 1_n^⊤. When clear from the context, we do not specify the matrix dimensions. For a vector x ∈ R^n we denote the standard Euclidean and infinity norms by, respectively, ‖x‖ = √(x^⊤x) and ‖x‖_∞ = max_i |x_i|. Given a set of vectors, we use [{p^i}_{i∈M}] to indicate the aggregate vector obtained from stacking the vectors {p^i}_{i∈M} whose indices belong to the ordered set M ⊂ Z_{>0}. In a network of N agents, to distinguish and emphasize that a variable is local to an agent i ∈ Z^N_1, we use superscripts, e.g., f^i(x^i) is the local function of agent i ∈ Z^N_1 evaluated at its own local value x^i ∈ R^{n_i}. The lth element of a vector x^i ∈ R^{n_i} at agent i ∈ Z^N_1 is denoted by x^i_l. Moreover, if p^i ∈ R^{d_i} is a variable of agent i ∈ V = {1, · · · , N}, the aggregated p^i's of the network form the vector p = [{p^i}_{i∈V}] = [p^{1⊤}, · · · , p^{N⊤}]^⊤ ∈ R^d, and Blkdiag(p) = blockdiag(p^1, · · · , p^N) ∈ R^{d×N}, with d = Σ_{i=1}^{N} d^i.

For a differentiable function f : R^d → R, ∇f(x) represents its gradient. A differentiable function f : R^d → R is convex (resp. α-strongly convex, α ∈ R_{>0}) over a convex set C ⊆ R^d if and only if (z − x)^⊤(∇f(z) − ∇f(x)) ≥ 0 (resp. α‖z − x‖^2 ≤ (z − x)^⊤(∇f(z) − ∇f(x)), which implies α‖z − x‖ ≤ ‖∇f(z) − ∇f(x)‖) for all x, z ∈ C. Moreover, it is strictly convex over a convex set C ⊆ R^d if and only if (z − x)^⊤(∇f(z) − ∇f(x)) > 0 for all distinct x, z ∈ C.
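The monotonicity characterization above is easy to probe numerically. A minimal sketch, using a quadratic cost of our own choosing (for f(x) = x^⊤Qx with Q ≻ 0, a valid strong-convexity constant is α = 2λ_min(Q)):

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = x^T Q x with Q positive definite; grad f(x) = 2 Q x.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
alpha = 2 * np.linalg.eigvalsh(Q).min()   # strong-convexity constant
grad = lambda x: 2 * Q @ x

# Check alpha * ||z - x||^2 <= (z - x)^T (grad f(z) - grad f(x)) on random pairs.
for _ in range(100):
    x, z = rng.standard_normal(2), rng.standard_normal(2)
    lhs = alpha * np.linalg.norm(z - x) ** 2
    rhs = (z - x) @ (grad(z) - grad(x))
    assert lhs <= rhs + 1e-12
```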

Next, we briefly review basic concepts from algebraic graph theory following [38]. A weighted graph is a triplet G = (V, E, A), where V = {1, . . . , N} is the node set, E ⊆ V × V is the edge set, and A = [a_{ij}] ∈ R^{N×N} is a weighted adjacency matrix such that a_{ij} > 0 if (i, j) ∈ E and a_{ij} = 0 otherwise. An edge from i to j, denoted by (i, j), means that agent j can send information to agent i. A graph is undirected if (i, j) ∈ E anytime (j, i) ∈ E. An undirected graph, whose weights satisfy a_{ij} = a_{ji} for all i, j ∈ V, is connected if there is a path from every node to every other node in the network. The (out-)Laplacian matrix of a graph is L = Diag(A 1_N) − A. Note that L 1_N = 0. An undirected graph is connected if and only if 1^⊤_N L = 0 and rank(L) = N − 1. Therefore, for a connected graph, zero is a simple eigenvalue of L. For a connected graph, we denote the eigenvalues of L by λ_1, . . . , λ_N, where λ_1 = 0 and λ_i ≤ λ_j for i < j.
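These Laplacian properties can be verified numerically. A sketch for a small connected undirected graph of our own choosing (a path on 4 nodes):

```python
import numpy as np

# Adjacency matrix of an undirected path graph on N = 4 nodes.
N = 4
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Laplacian L = Diag(A 1_N) - A.
L = np.diag(A @ np.ones(N)) - A

assert np.allclose(L @ np.ones(N), 0)        # L 1_N = 0
assert np.allclose(np.ones(N) @ L, 0)        # 1_N^T L = 0 (undirected)
assert np.linalg.matrix_rank(L) == N - 1     # rank N-1 <=> connected
eig = np.sort(np.linalg.eigvalsh(L))
assert abs(eig[0]) < 1e-10 and eig[1] > 0    # zero is a simple eigenvalue
```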

3 Distributed Continuous-Time Solvers

In this section, we present our distributed algorithm to solve the constrained optimization problem (1), first when there is no inequality constraint, i.e., B̲^i = B̄^i = {} for i ∈ V. Then, we extend our results to solve the constrained optimization problem (1) with inequality constraints. Our standing assumptions are given below.

Assumption 3.1 (Problem specifications): The cost function f^i_l : R → R of each subagent l ∈ Z^{n_i}_1 of each agent i ∈ V is convex and differentiable. Moreover, ∇f^i : R^{n_i} → R^{n_i} of each agent i ∈ V is locally Lipschitz. Also,

\begin{equation}
\mathsf{W} = [\mathsf{w}^1, \dots, \mathsf{w}^N] \in \mathbb{R}^{p \times m} \tag{2}
\end{equation}


is full row rank and the feasible set

\begin{equation}
\mathcal{X}_{\mathrm{fe}} = \{x \in \mathbb{R}^m \mid \text{(1b), (1c), (1d) hold}\} \tag{3}
\end{equation}

is non-empty for the local inequalities (1c) and (1d). Lastly, the optimization problem (1) has a finite optimum f⋆ = f(x⋆) = Σ_{i=1}^{N} f^i(x^{i⋆}). □

Local Lipschitzness of ∇f i, i ∈ V, guarantees existenceand uniqueness of the solution of our proposed algo-rithm (7), which is a differential equation.

To solve problem (1) subject to only the equality constraints, we consider the augmented cost function with a penalty term on violating the affine constraint, i.e.,

\begin{align}
x^\star = \arg\min_{x\in\mathbb{R}^m}\ &\textstyle\sum_{i=1}^{N} f^i(x^i) + \frac{\rho}{2}\|\mathsf{W}x - \mathsf{b}\|^2, \tag{4a}\\
&[\mathsf{w}^1]_k x^1 + \cdots + [\mathsf{w}^N]_k x^N = \mathsf{b}_k, \quad k \in \mathbb{Z}^p_1, \tag{4b}
\end{align}

where ρ ∈ R_{≥0} is the penalty parameter. This augmentation results in the so-called augmented Lagrangian formulation of iterative optimization algorithms. As stated in [10], augmented Lagrangian methods were developed in part to bring robustness to the dual ascent method, and in particular, to yield convergence without assumptions like strict convexity or finiteness of the cost function (see also [25]). As shown below, such positive effects also hold for the continuous-time algorithms we study. Augmenting the cost with the penalty function as in (4a), however, presents a challenge in the design of distributed solutions, as the total cost in (4a) is no longer separable. Nevertheless, we are able to address this challenge in our distributed solution.

Lemma 3.1 (KKT conditions characterizing the solution set of (4) [39]): Consider the constrained optimization problem (4). Let Assumption 3.1 hold and f^i : R^{n_i} → R, i ∈ V, be a differentiable and convex function on R^{n_i}. For any ρ ∈ R_{≥0}, a point x⋆ ∈ R^m is a solution of (4) if and only if there exists a ν⋆ ∈ R^p such that, for i ∈ V,

\begin{align}
\nabla f^i(x^{i\star}) + \mathsf{w}^{i\top}\nu^\star = 0, \tag{5a}\\
[\mathsf{w}^1]_k x^{1\star} + \cdots + [\mathsf{w}^N]_k x^{N\star} = \mathsf{b}_k, \quad k \in \mathbb{Z}^p_1. \tag{5b}
\end{align}

Moreover, the ν⋆ corresponding to every x⋆ is unique and finite. If the local cost functions are strongly convex, then for any ρ ∈ R_{≥0} the KKT equations (5) have a unique solution (ν⋆, x⋆), i.e., (4) has a unique solution. □
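For quadratic local costs, the KKT conditions (5) reduce to one linear system in (x, ν). A minimal sketch on toy data of our own choosing (two scalar agents, f^i(x_i) = (x_i − c_i)^2, one equality constraint), not the paper's example:

```python
import numpy as np

# Toy data: c_i are cost centers; one constraint x_1 + x_2 = 10.
c = np.array([2.0, 4.0])
W = np.array([[1.0, 1.0]])   # p = 1, m = 2
b = np.array([10.0])

# KKT (5): 2(x - c) + W^T nu = 0 and W x = b, stacked as one linear system.
K = np.block([[2 * np.eye(2), W.T],
              [W, np.zeros((1, 1))]])
rhs = np.concatenate([2 * c, b])
sol = np.linalg.solve(K, rhs)
x_star, nu_star = sol[:2], sol[2]   # x* = [4, 6], nu* = -4
```

Substituting back confirms ∇f^i(x^{i⋆}) + ν⋆ = 0 for both agents, matching the uniqueness claim of the lemma.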

Let L(ν, x) = f(x) + (ρ/2)‖w^1 x^1 + · · · + w^N x^N − b‖^2 + ν^⊤(w^1 x^1 + · · · + w^N x^N − b) be the augmented Lagrangian of the optimization problem (4). Following [28], a central solver for the optimal resource allocation problem (4) is

\begin{align}
\dot{\nu}_k &= \frac{\partial L(\nu,x)}{\partial \nu_k} = [\mathsf{w}^1]_k x^1 + \cdots + [\mathsf{w}^N]_k x^N - \mathsf{b}_k, \tag{6a}\\
\dot{x}^i &= -\frac{\partial L(\nu,x)}{\partial x^i} = -\nabla f^i(x^i) - \textstyle\sum_{j=1}^{p} [\mathsf{w}^i]_j^\top \nu_j - \rho\,\mathsf{w}^{i\top}(\mathsf{w}^1 x^1 + \cdots + \mathsf{w}^N x^N - \mathsf{b}), \tag{6b}
\end{align}

where k ∈ Z^p_1 and i ∈ V. The algorithm studied in [28] is for the un-augmented Lagrangian, i.e., ρ = 0, and the guaranteed convergence holds only for a strictly convex cost function f(x). However, we can show that the central solver (6) with ρ > 0 is guaranteed to converge for a convex cost function f(x) as well (the details are omitted for brevity). A numerical example demonstrating this positive role is presented in Appendix B.
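A minimal numerical sketch of the central flow (6), forward-Euler integrated on an illustrative two-variable quadratic problem (our own toy data, not the paper's example):

```python
import numpy as np

# min (x1-2)^2 + (x2-4)^2  s.t.  x1 + x2 = 10; grad f^i = 2(x_i - c_i).
c = np.array([2.0, 4.0])
W = np.array([[1.0, 1.0]])
b = np.array([10.0])
rho, dt = 1.0, 0.005

x, nu = np.zeros(2), np.zeros(1)
for _ in range(20000):                                 # integrate up to t = 100
    r = W @ x - b                                      # constraint residual
    nu_dot = r                                         # (6a)
    x_dot = -2 * (x - c) - W.T @ nu - rho * W.T @ r    # (6b)
    nu, x = nu + dt * nu_dot, x + dt * x_dot

assert np.allclose(x, [4.0, 6.0], atol=1e-4)   # primal optimum
assert np.isclose(nu[0], -4.0, atol=1e-4)      # Lagrange multiplier nu*
```

The limit point matches the KKT solution of Lemma 3.1 for this toy instance.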

The source of coupling in (4) is the set of equality constraints (4b), which appear in the central solver (6) as well. To design our distributed algorithm, we adapt the structural constitution of (6), but aim to create the coupling terms [w^1]_k x^1 + · · · + [w^N]_k x^N − b_k, k ∈ Z^p_1, in a distributed manner. We note that for every equality constraint k ∈ Z^p_1, the coupling is among the set of agents C_k = {i ∈ V | [w^i]_k ≠ 0}. To have efficient communication and computation resource management, we seek an algorithm that handles every coupled equality constraint among only those agents that are involved. In this regard, for every equality constraint k ∈ Z^p_1, we let G_k(V_k, E_k) be a connected undirected subgraph of G that contains the set of agents C_k (see Fig. 1 for an example). We assume that V_k ⊂ V is a monotonically increasing ordered set. It is very likely that the agents coupled through an equality constraint are geographically close, and thus in the communication range of each other. Nevertheless, V_k, k ∈ Z^p_1, may contain agents i ∈ V that have [w^i]_k = 0 but are needed to make G_k connected (see Fig. 1 for an example). We let N_k = |V_k|, k ∈ Z^p_1. In our distributed solution for (4), we also seek an algorithm that allows each agent to use a local penalty parameter ρ^i ∈ R_{>0}, so that we can eliminate the need for coordination among the agents to choose the penalty parameter ρ. In what follows, we define T^i = {j ∈ Z^p_1 | i ∈ V_j}, i ∈ V, and {b^l_k}_{l∈V_k} such that Σ_{l∈V_k} b^l_k = b_k, for k ∈ Z^p_1 (possible options include b^l_k = b_k/|C_k|, l ∈ C_k, while b^j_k = 0, j ∈ V∖C_k; or b^j_k = b_k for a particular agent j ∈ V_k and b^l_k = 0 for any l ∈ V∖{j}).

With the right notation at hand, our proposed distributed algorithm to solve the optimization problem (4) is

\begin{align}
\dot{y}^l_k &= \beta_k \textstyle\sum_{j\in\mathcal{V}_k} a_{lj}(v^l_k - v^j_k), \tag{7a}\\
\dot{v}^l_k &= ([\mathsf{w}^l]_k x^l - \mathsf{b}^l_k) - \beta_k \textstyle\sum_{j\in\mathcal{V}_k} a_{lj}(v^l_k - v^j_k) - y^l_k, \tag{7b}\\
\dot{x}^i &= -(1+\rho^i)\nabla f^i(x^i) - \rho^i \textstyle\sum_{k\in\mathcal{T}^i} [\mathsf{w}^i]_k^\top([\mathsf{w}^i]_k x^i - \mathsf{b}^i_k) + \rho^i \textstyle\sum_{k\in\mathcal{T}^i} ([\mathsf{w}^i]_k^\top y^i_k) - (1+\rho^i)\textstyle\sum_{k\in\mathcal{T}^i} ([\mathsf{w}^i]_k^\top v^i_k), \tag{7c}
\end{align}

with β_k ∈ R_{>0} and ρ^i ∈ R_{≥0} for i ∈ V, k ∈ Z^p_1, and l ∈ V_k. To comprehend the connection with the centralized


dynamical solver (6), take the summation of (7a) and (7b) over every connected G_k, k ∈ Z^p_1, to obtain

\begin{align}
\textstyle\sum_{l\in\mathcal{V}_k} \dot{y}^l_k = 0 &\implies \textstyle\sum_{l\in\mathcal{V}_k} y^l_k(t) = \textstyle\sum_{l\in\mathcal{V}_k} y^l_k(0), \tag{8a}\\
\textstyle\sum_{l\in\mathcal{V}_k} \dot{v}^l_k &= [\mathsf{w}^1]_k x^1 + \cdots + [\mathsf{w}^N]_k x^N - \mathsf{b}_k - \textstyle\sum_{l\in\mathcal{V}_k} y^l_k, \tag{8b}
\end{align}

which shows that, whenever the y^l_k's are initialized with Σ_{l∈V_k} y^l_k(0) = 0, the dynamics of the sum of the v^l_k's duplicates the Lagrange multiplier dynamics (6a) of the central augmented Lagrangian method. Therefore, in a convergent (7), ultimately, for each k ∈ Z^p_1, all the v^l_k's converge to the same value, indicating that ultimately every agent obtains a local copy of (6a) for any k ∈ Z^p_1. On the other hand, if we factor out (1 + ρ^i) from the right-hand side of (7c) and exclude the third component, which is a technical term added to induce agreement between the agents, (7c) mimics the dynamics (6b) of the central augmented Lagrangian solver.
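To illustrate how (7) behaves, the sketch below forward-Euler integrates the three dynamics on a minimal two-agent instance of our own construction (scalar decision variables, a single equality constraint with w^i = 1 so T^i = {1}, complete graph, uniform ρ^i and β_k); it is a sketch under these assumptions, not the paper's numerical example:

```python
import numpy as np

# Two scalar agents, f^i(x_i) = (x_i - c_i)^2, constraint x1 + x2 = 10.
c = np.array([2.0, 4.0])
b_split = np.array([5.0, 5.0])   # b^1_1 + b^2_1 = b_1 = 10
beta, rho, dt = 1.0, 1.0, 0.005

x, v, y = np.zeros(2), np.zeros(2), np.zeros(2)   # y(0) sums to zero

for _ in range(20000):                            # integrate up to t = 100
    disagree = np.array([v[0] - v[1], v[1] - v[0]])
    y_dot = beta * disagree                                   # (7a)
    v_dot = (x - b_split) - beta * disagree - y               # (7b)
    x_dot = (-(1 + rho) * 2 * (x - c)                         # (7c), w^i = 1
             - rho * (x - b_split) + rho * y - (1 + rho) * v)
    y, v, x = y + dt * y_dot, v + dt * v_dot, x + dt * x_dot

assert np.allclose(x, [4.0, 6.0], atol=1e-4)    # primal optimum of (4)
assert np.allclose(v, [-4.0, -4.0], atol=1e-4)  # both copies agree on nu* = -4
assert abs(y.sum()) < 1e-9                      # (8a): sum of y is conserved
```

Both local copies v^1, v^2 reach the centralized multiplier of the toy KKT system, consistent with the interpretation of (8) above.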

Remark 3.1 (Benefits of the cluster-based approach): First, we note that regardless of the size of n_i, in algorithm (7) we associate at most one copy of the Lagrange multiplier generator dynamics, i.e., (7a) and (7b), to every agent i ∈ V. Specifically, every agent i ∈ V maintains |T^i| ≤ p pairs of the (7a) and (7b) dynamics and consequently has to broadcast the same number of variables to the network. In comparison, if we use the algorithms in [8,18–24] when n_i > 1, for any i ∈ V, we need to treat each component of x^i as an agent and assign a copy of the dynamics that generates the dual variable to every subagent l ∈ Z^{n_i}_1. This results in a storage, computation, and communication cost of order n_i × p per agent i ∈ V. See our numerical examples for a comparison. Next, notice that algorithm (7) can always be implemented by using G_k = G, k ∈ Z^p_1, where G = (V, E) is the connected interaction topology that all the agents form. However, the flexibility to use a smaller cyber-layer formed by only the cluster of agents that are coupled by an equality constraint reduces the communication and computational cost of implementing algorithm (7). Moreover, in some problems, similar to our numerical example in Section 4, the coupling equation is between neighboring agents. In such cases, the subgraphs G_k can be formed easily. Moreover, as one can expect, and as our numerical example also highlights, using a smaller subgraph G_k can result in a faster convergence of the (7a) and (7b) dynamics and, as a result, a faster convergence of algorithm (7). □
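The bookkeeping in this remark can be made concrete for the Fig. 1 example. The counts below (cluster sizes n_i and index sets T^i read off the figure; the encoding is our own) compare the number of dual (y, v) pairs maintained network-wide under the two schemes:

```python
# Fig. 1 example: p = 2 constraints, subagents per cluster n_i, and
# T^i = constraints whose subgraph V_k contains agent i (agent 4 is a helper in V_2).
n = {1: 2, 2: 1, 3: 3, 4: 3, 5: 2, 6: 1}
T = {1: {1}, 2: {1}, 3: {1, 2}, 4: {1, 2}, 5: {2}, 6: {2}}
p = 2

cluster_pairs = sum(len(T[i]) for i in n)     # cluster-based (7): one pair per
subagent_pairs = sum(n[i] * p for i in n)     # constraint; vs. n_i * p per agent
print(cluster_pairs, subagent_pairs)          # 8 vs. 24 dual pairs network-wide
```

Even on this six-agent example, the cluster-based design carries a third of the dual states and broadcasts of a per-subagent design.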

The equilibrium points of algorithm (7), when every G_k, k ∈ Z^p_1, is a connected graph, are given by

\begin{align}
\mathcal{S}_e = \Big\{ (\{v_k\}_{k=1}^{p}, \{y_k\}_{k=1}^{p}, \{x^i\}_{i=1}^{N}) \in \prod_{k=1}^{p}\mathbb{R}^{N_k} \times \prod_{k=1}^{p}\mathbb{R}^{N_k} \times \prod_{i=1}^{N}\mathbb{R}^{n_i} \;\Big|\; v_k = \theta_k \mathbf{1}_{N_k},\ \theta_k \in \mathbb{R},\ \nabla f^i(x^i) + \textstyle\sum_{j\in\mathcal{T}^i}[\mathsf{w}^i]_j^\top \theta_j = 0,\ \textstyle\sum_{j=1}^{N}[\mathsf{w}^j]_k x^j = \mathsf{b}_k + \textstyle\sum_{j\in\mathcal{V}_k} y^j_k,\ y^l_k = [\mathsf{w}^l]_k x^l - \mathsf{b}^l_k,\ i \in \mathcal{V},\ l \in \mathcal{V}_k,\ k \in \mathbb{Z}^p_1 \Big\}. \tag{9}
\end{align}

Due to (8a), if algorithm (7) is initialized such that Σ_{l∈V_k} y^l_k(0) = 0, we have Σ_{l∈V_k} y^l_k(t) = Σ_{l∈V_k} y^l_k(0) for t ∈ R_{≥0}. In that case, if algorithm (7) converges to an equilibrium point ({v_k}^p_{k=1}, {y_k}^p_{k=1}, {x^i}^N_{i=1}) ∈ S_e, we have ({v_k}^p_{k=1}, {y_k}^p_{k=1}, {x^i}^N_{i=1}) = ({ν⋆_k 1_{N_k}}^p_{k=1}, {[{[w^l]_k x^{l⋆} − b^l_k}_{l∈V_k}]}^p_{k=1}, {x^{i⋆}}^N_{i=1}), where ({x^{i⋆}}^N_{i=1}, {ν⋆_k}^p_{k=1}) satisfies the KKT equations (5). The following theorem shows that, indeed, under the stated initialization, algorithm (7) converges to a minimizer of the optimization problem (4). To establish the proof of this theorem we use the following notation. We let A ∈ R^{N×N} be the adjacency matrix of G. Then, the adjacency matrix of G_k ⊂ G, k ∈ Z^p_1, is A_k, which is the submatrix of A corresponding to the rows and the columns associated with the agents in V_k, i.e., A_k = M_k^⊤ A M_k, where M_k ∈ R^{N×N_k} is defined such that [M_k]^l = [I]^{V_k(l)}, l ∈ {1, . . . , N_k}, with V_k(l) being the lth element of the ordered set V_k. Then, L_k = Diag(A_k 1_{N_k}) − A_k is the Laplacian matrix of G_k, k ∈ Z^p_1. Next, we define r_k = (1/√N_k) 1_{N_k} and R_k = [v_{2k}, · · · , v_{N_k k}], with (r_k, {v_{jk}}^{N_k}_{j=2}) being the normalized eigenvectors of L_k. Note here that we have

\begin{align}
\mathsf{r}_k^\top \mathsf{R}_k = 0, \quad \mathsf{R}_k^\top \mathsf{R}_k = \mathsf{I}_{N_k-1}, \quad \mathsf{R}_k \mathsf{R}_k^\top = \Pi_{N_k}, \tag{10a}\\
[\mathsf{r}_k\ \mathsf{R}_k]^\top \mathsf{L}_k [\mathsf{r}_k\ \mathsf{R}_k] = \mathrm{Diag}([0, \lambda_{2k}, \cdots, \lambda_{N_k k}]). \tag{10b}
\end{align}

The eigenvectors are ordered such that λ_{2k} and λ_{N_k k} are, respectively, the smallest and the largest non-zero eigenvalues of L_k. The next two theorems, whose proofs are given in Appendix A, examine the stability and convergence of (7) over connected graphs.
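The identities (10) can be confirmed numerically for any small connected graph. A sketch with a 4-cycle of our own choosing, using the eigendecomposition of its Laplacian:

```python
import numpy as np

# Laplacian of an undirected 4-cycle.
N = 4
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

lam, V = np.linalg.eigh(L)          # eigenvalues ascending, orthonormal columns
r = np.ones(N) / np.sqrt(N)         # normalized eigenvector for lambda_1 = 0
R = V[:, 1:]                        # eigenvectors for lambda_2, ..., lambda_N

Pi = np.eye(N) - np.ones((N, N)) / N
assert np.allclose(r @ R, 0)                       # r^T R = 0
assert np.allclose(R.T @ R, np.eye(N - 1))         # R^T R = I_{N-1}
assert np.allclose(R @ R.T, Pi)                    # R R^T = Pi_N
assert np.allclose(R.T @ L @ R, np.diag(lam[1:]))  # diagonalization in (10b)
```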

Theorem 3.1 (Asymptotic convergence of (7) over connected graphs when the local costs are convex): Let every G_k, k ∈ Z^p_1, be a connected graph and Assumption 3.1 hold. For every k ∈ Z^p_1, suppose {b^l_k}_{l∈V_k} ⊂ R is defined such that Σ_{l∈V_k} b^l_k = b_k. Then, for each i ∈ V and l ∈ V_k, starting from x^i(0) ∈ R^{n_i} and y^l_k(0), v^l_k(0) ∈ R with Σ_{l∈V_k} y^l_k(0) = 0, algorithm (7), for any ρ^i ∈ R_{>0}, makes t ↦ ({v_k(t)}^p_{k=1}, {x^i(t)}^N_{i=1}) converge asymptotically to ({ν⋆_k 1_{N_k}}^p_{k=1}, {x^{i⋆}}^N_{i=1}), where ({ν⋆_k}^p_{k=1}, {x^{i⋆}}^N_{i=1}) is a point satisfying the KKT conditions (5) of problem (4). □

The initialization condition Σ_{l∈V_k} y^l_k(0) = 0 of Theorem 3.1 is trivially satisfied by every agent l ∈ V_k, k ∈ Z^p_1, using y^l_k(0) = 0. The asymptotic convergence guarantee for algorithm (7) in Theorem 3.1 is established for convex local cost functions. For such cost functions, similar to the centralized algorithm (6), (7) fails to converge when ρ^i = 0 for all i ∈ V. Next, we show that if the local costs are strongly convex and have Lipschitz gradients, then the convergence is in fact exponentially fast for ρ^i ∈ R_{>0}, i ∈ V. Recall that for strongly convex local cost functions, the minimizer of (4) is unique.

Theorem 3.2 (Exponential convergence of (7) over connected graphs when the local costs are strongly convex and have Lipschitz gradients): Let every G_k, k ∈ Z^p_1, be connected and Assumption 3.1 hold. Also, assume each cost function f^i_l, l ∈ Z^{n_i}_1, i ∈ V, is m^i_l-strongly convex and has an M^i_l-Lipschitz gradient. Let m = max{{m^i_l}^{n_i}_{l=1}}^N_{i=1} ∈ R_{>0} and M = max{{M^i_l}^{n_i}_{l=1}}^N_{i=1} ∈ R_{>0}. Then, starting from x^i(0) ∈ R^{n_i} and y^l_k(0), v^l_k(0) ∈ R for each i ∈ V, l ∈ V_k, and given Σ_{l∈V_k} y^l_k(0) = 0 and Σ_{l∈V_k} b^l_k = b_k in (7), algorithm (7) makes t ↦ ({v_k(t)}^p_{k=1}, {x^i(t)}^N_{i=1}) converge exponentially fast to ({ν⋆_k 1_{N_k}}^p_{k=1}, {x^{i⋆}}^N_{i=1}) for any ρ^i ∈ R_{>0}, where ({ν⋆_k}^p_{k=1}, {x^{i⋆}}^N_{i=1}) is the unique solution of the KKT conditions (5) of problem (4). Moreover, when ρ^i = 0 for some i ∈ V, the convergence to the unique solution of the KKT conditions (5) is asymptotic. □

The proof of Theorem 3.2 is given in Appendix A.

Remark 3.2 (Convergence of (7) over dynamically changing connected graphs): The proof of Theorem 3.2 relies on a Lyapunov function that is independent of the system parameters, and its derivative for ρ^i ∈ R_{>0}, i ∈ V, is negative definite with a quadratic upper bound. Hence, we can also show that algorithm (7), when ρ^i ∈ R_{>0} for i ∈ V, converges exponentially fast to the unique solution of the KKT conditions (5) of problem (4) over any time-varying topology G_k, k ∈ Z^p_1, that is connected at all times and whose adjacency matrix is uniformly bounded and piecewise constant. □

3.1 Problem subject to both equality and inequality constraints

To address the inequality constraints, we use a penalty function method to eliminate the local inequality constraints (1c) and (1d). That is, we seek to solve

x_p^⋆ = argmin_{x∈R^m} ∑_{i=1}^N f_p^i(x^i), subject to (11a)

[w^1]_j x^1 + · · · + [w^N]_j x^N = b_j, j ∈ Z_1^p, (11b)

with

f_p^i(x^i) = f^i(x^i) + γ(∑_{l∈B̲^i} p_ε(x̲_l^i − x_l^i) + ∑_{l∈B̄^i} p_ε(x_l^i − x̄_l^i)), (12)

i ∈ V, where γ ∈ R_{>0} is the weight of the smooth penalty function

p_ε(y) = { 0, y ≤ 0;  y²/(2ε), 0 ≤ y ≤ ε;  y − ε/2, y ≥ ε },

for some ε ∈ R_{>0}.
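The piecewise definition of p_ε above can be implemented directly. The sketch below (plain NumPy; the function names are our own) also includes the derivative, which is what a gradient-based solver such as (7) would evaluate:

```python
import numpy as np

def p_eps(y, eps):
    """Smooth eps-exact penalty (12): zero for y <= 0, a quadratic ramp
    on [0, eps], and affine with unit slope for y >= eps."""
    y = np.asarray(y, dtype=float)
    return np.where(y <= 0.0, 0.0,
           np.where(y <= eps, y**2 / (2.0 * eps), y - eps / 2.0))

def p_eps_grad(y, eps):
    """Derivative of p_eps; continuous at the breakpoints 0 and eps."""
    y = np.asarray(y, dtype=float)
    return np.where(y <= 0.0, 0.0, np.where(y <= eps, y / eps, 1.0))
```

Note that the two branches agree at y = ε (both give ε/2), so p_ε is continuously differentiable; for y ≥ ε it grows with unit slope, which is what lets a finite weight γ enforce the bounds up to an ε-order violation.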

This approach allows us to use algorithm (7) to solve the optimization (1) by using f_p^i(x^i) in place of f^i(x^i) in (7c). We note that f_p^i(x^i) is convex and differentiable if f^i(x^i) is a convex function on R^{n_i}. Following this penalty method approach, when the global cost function of (1) is evaluated at the limit point of algorithm (7), it is in an ε-order neighborhood of the global optimal value of the optimization problem (1) (see Proposition 3.1 below). In what follows, we investigate when the penalty function weight γ has a finite value and give a well-defined admissible range for it.

Given Assumption 3.1, the Slater condition [39] is satisfied. Thus, the KKT conditions below give a set of necessary and sufficient conditions that characterize the solution set of the convex optimization problem (1).

Lemma 3.2 (Solution set of (1) [39]): Consider the constrained optimization problem (1) under Assumption 3.1. A point x^⋆ ∈ R^m is a solution of (1) if and only if there exist ν^⋆ ∈ R^p and {µ̲_l^{i⋆}}_{l∈B̲^i} ⊂ R_{≥0}, {µ̄_l^{i⋆}}_{l∈B̄^i} ⊂ R_{≥0}, i ∈ V, such that

∇f^i(x^{i⋆}) + w^{i⊤}ν^⋆ − µ̲^{i⋆} + µ̄^{i⋆} = 0, (13a)

W x^⋆ − b = 0, (13b)

µ̲_l^{i⋆}(x̲_l^i − x_l^{i⋆}) = 0, x̲_l^i − x_l^{i⋆} ≤ 0, µ̲_l^{i⋆} ≥ 0, l ∈ B̲^i, (13c)

µ̄_l^{i⋆}(x_l^{i⋆} − x̄_l^i) = 0, x_l^{i⋆} − x̄_l^i ≤ 0, µ̄_l^{i⋆} ≥ 0, l ∈ B̄^i, (13d)

where µ̲^{i⋆} = [µ̲_1^{i⋆}, · · · , µ̲_{n_i}^{i⋆}]^⊤ with µ̲_l^{i⋆} = 0 for l ∈ Z_1^{n_i} \ B̲^i, and µ̄^{i⋆} = [µ̄_1^{i⋆}, · · · , µ̄_{n_i}^{i⋆}]^⊤ with µ̄_l^{i⋆} = 0 for l ∈ Z_1^{n_i} \ B̄^i. If the local cost functions are strongly convex, then the optimization problem (1) has a unique solution. □

Let X_fe^ε be the ε-feasible set of optimization problem (1),

X_fe^ε = {x ∈ R^m | W x = b, x̲_l^i − x_l^i ≤ ε, l ∈ B̲^i, x_j^i − x̄_j^i ≤ ε, j ∈ B̄^i, i ∈ V}. (14)

The result below states that for some admissible values of γ, the minimizer of problem (11) belongs to the ε-feasible set X_fe^ε, and the optimal value of optimization problem (11) is in an ε-order neighborhood of the optimal value of the original optimization problem (1).
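Membership in X_fe^ε is straightforward to check numerically. The helper below is an illustrative sketch; the names W, b, lo, hi for the problem data are our own (lo and hi stand for the stacked bounds x̲ and x̄):

```python
import numpy as np

def in_eps_feasible(x, W, b, lo, hi, eps, tol=1e-9):
    """Membership test for the eps-feasible set (14): the equality
    constraints hold and each box bound is violated by at most eps."""
    x, lo, hi = (np.asarray(v, dtype=float) for v in (x, lo, hi))
    eq_ok = bool(np.allclose(np.asarray(W, dtype=float) @ x, b, atol=tol))
    box_ok = bool(np.all(lo - x <= eps + tol) and np.all(x - hi <= eps + tol))
    return eq_ok and box_ok
```

For instance, with W = [[1, 1]], b = [2] and the box [0, 1]², the point (1.005, 0.995) is ε-feasible for ε = 0.01 but not for ε = 0.001.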

Proposition 3.1 (Relationship between the solution of (1) and (11) [33]): Let (x^⋆, ν^⋆, {µ̲_l^{i⋆}}_{l∈B̲^i}, {µ̄_l^{i⋆}}_{l∈B̄^i}) be any solution of the KKT equations (5). Let x_p^⋆ be a minimizer of optimization problem (11) for some γ, ε ∈ R_{>0}. If γ = ((1−N)/(1−√N)) γ^⋆, where γ^⋆ > max{max{µ̲_l^{i⋆}}_{l∈B̲^i}, max{µ̄_l^{i⋆}}_{l∈B̄^i}}_{i=1}^N, then

x_p^⋆ ∈ X_fe^ε, 0 ≤ f^⋆ − f(x_p^⋆) ≤ ε γ N, (15)

where f^⋆ = f(x^⋆) is the optimal value of (1). □


We note that if ε → 0, we have p_ε(y) → p(y) = max{0, y}, where p(y) is the well-known non-smooth penalty function [32] with exact equivalency guarantees when γ > γ^⋆ in Proposition 3.1.

Remark 3.3 (Comment on the feasibility of the solution of (11)) The use of the ε-exact penalty function approach is motivated by keeping the cost smooth and differentiable, which is desirable from a practical perspective compared to the exact penalty method, whose penalty function is non-smooth. Using an ε-exact penalty function, we have the guarantee that the approximate solution x_p^⋆ is in (14). Therefore, only the inequality constraints may be violated, and by at most ε. Since the value of ε can be selected very small, the possible violation of the inequality constraints is small too. One may select the value of ε in accordance with the expected accuracy of the algorithm. Note that by slightly tightening the inequality constraints according to x_l^i ≤ x̄_l^i − ε and x̲_l^i + ε ≤ x_l^i, and using these adjusted inequalities in the penalty function, we can guarantee that x_p^⋆ ∈ X_fe. But this may result in a slight increase in the optimality gap in (15).

Considering Proposition 3.1, a practical and numerically well-posed solution via the penalty optimization method (11) is achieved when the Lagrange multipliers are bounded. Thus, in what follows we seek a µ_bound in

max{max{µ̲_l^{i⋆}}_{l∈B̲^i}, max{µ̄_l^{i⋆}}_{l∈B̄^i}}_{i=1}^N ≤ µ_bound, (16)

with the objective of choosing a penalty function weight γ that satisfies the condition set by Proposition 3.1 by setting γ ≥ ((1−N)/(1−√N)) µ_bound.

For any solution of the KKT conditions (5), we let A̲^i ⊂ B̲^i and Ā^i ⊂ B̄^i respectively be the sets of indices of the active lower bound and the active upper bound inequality constraints of agent i ∈ V. We note that A̲^i ∩ Ā^i = {}. Because for the inactive inequalities µ̲_l^{i⋆} = 0 for l ∈ B̲^i \ A̲^i and i ∈ V (resp. µ̄_l^{i⋆} = 0 for l ∈ B̄^i \ Ā^i) [40], we obtain

max{max{µ̲_l^{i⋆}}_{l∈B̲^i}, max{µ̄_l^{i⋆}}_{l∈B̄^i}}_{i=1}^N = max{max{µ̲_l^{i⋆}}_{l∈A̲^i}, max{µ̄_l^{i⋆}}_{l∈Ā^i}}_{i=1}^N. (17)

Therefore, to find µ_bound, it suffices to find an upper bound on max{max{µ̲_l^{i⋆}}_{l∈A̲^i}, max{µ̄_l^{i⋆}}_{l∈Ā^i}}_{i=1}^N.

As is known, the set of the Lagrange multipliers of an optimization problem of form (1) is nonempty and bounded if and only if the Mangasarian–Fromovitz constraint qualification (MFCQ) holds [41]. It is straightforward to show that the MFCQ condition is satisfied for a resource allocation problem of form (1) with one equality constraint (i.e., p = 1) and upper and lower bounded decision variables (i.e., B̲^i = B̄^i = Z_1^{n_i}). For such a problem, the following result specifies a µ_bound that satisfies (16).

Proposition 3.2 (µ_bound for the resource allocation problem with one equality constraint and bounded decision variables): Consider problem (1) under Assumption 3.1 when p = 1, w_l^i > 0 for l ∈ {1, · · · , n_i} and B̲^i = B̄^i = Z_1^{n_i}, i ∈ V. Let (x^⋆, ν^⋆, {µ̲_l^{i⋆}}_{l∈B̲^i}, {µ̄_l^{i⋆}}_{l∈B̄^i}) be an arbitrary solution of the KKT conditions (5) for this problem. Then, µ_bound in (16) satisfies

µ_bound ≤ (1 + w̄/w̲) max{max_{x^i∈X_ineq^i} ‖∇f^i(x^i)‖_∞}_{i=1}^N, (18)

where X_ineq^i = {x^i ∈ R^{n_i} | x̲_l^i ≤ x_l^i ≤ x̄_l^i, l ∈ Z_1^{n_i}}, w̲ = min{{w_l^i}_{l=1}^{n_i}}_{i=1}^N and w̄ = max{{w_l^i}_{l=1}^{n_i}}_{i=1}^N.

PROOF. For any given (x^⋆, ν^⋆, {µ̲_l^{i⋆}}_{l∈B̲^i}, {µ̄_l^{i⋆}}_{l∈B̄^i}), we note that the KKT conditions (5) can be written as

∇f_l^i(x_l^{i⋆}) + w_l^i ν^⋆ = 0, l ∈ Z_1^{n_i} \ {A̲^i ∪ Ā^i}, (19a)

∇f_l^i(x_l^{i⋆}) + w_l^i ν^⋆ + µ̄_l^{i⋆} = 0, l ∈ Ā^i, (19b)

∇f_l^i(x_l^{i⋆}) + w_l^i ν^⋆ − µ̲_l^{i⋆} = 0, l ∈ A̲^i. (19c)

Since {w_l^i}_{l=1}^{n_i} ⊂ R_{>0}, it follows from Assumption 3.1, which states that the feasible set is non-empty with respect to the strict local inequalities, that the upper bounds (similarly, the lower bounds) of all the decision variables cannot be active simultaneously. Therefore, for any given minimizer, either (a) at least for one subagent k ∈ Z_1^{n_i} of an agent i ∈ V we have x̲_k^i < x_k^{i⋆} < x̄_k^i, or (b) some of the decision variables are equal to their upper bound and the remaining ones are equal to their lower bound. If case (a) holds, it follows from (19a) that ν^⋆ = −∇f_k^i(x_k^{i⋆})/w_k^i, which means that we have the guarantee that |ν^⋆| ≤ max{‖∇f^i(x^{i⋆})‖_∞}_{i=1}^N / w̲. On the other hand,

if (b) holds, then there exist at least one agent k ∈ V with Ā^k ≠ {} and one agent j ∈ V with A̲^j ≠ {} (k = j is possible). Therefore, for l ∈ Ā^k it follows from (19b) that ν^⋆ = (1/w_l^k)(−∇f_l^k(x_l^{k⋆}) − µ̄_l^{k⋆}), and for l ∈ A̲^j it follows from (19c) that ν^⋆ = (1/w_l^j)(−∇f_l^j(x_l^{j⋆}) + µ̲_l^{j⋆}). Consequently, because µ̄_l^{k⋆} ≥ 0 and µ̲_l^{j⋆} ≥ 0, we conclude that −(1/w_l^j)∇f_l^j(x_l^{j⋆}) ≤ ν^⋆ ≤ −(1/w_l^k)∇f_l^k(x_l^{k⋆}), which leads to |ν^⋆| ≤ max{|∇f_l^j(x_l^{j⋆})/w_l^j|, |∇f_l^k(x_l^{k⋆})/w_l^k|} ≤ max{‖∇f^i(x^{i⋆})‖_∞}_{i=1}^N / w̲. Therefore, we conclude that for any given (x^⋆, ν^⋆, {µ̲_l^{i⋆}}_{l∈B̲^i}, {µ̄_l^{i⋆}}_{l∈B̄^i}), we have |ν^⋆| ≤ max{‖∇f^i(x^{i⋆})‖_∞}_{i=1}^N / w̲ ≤ max{max_{x^i∈X_ineq^i} ‖∇f^i(x^i)‖_∞}_{i=1}^N / w̲.

Consequently, it follows from (19c) that µ̲_l^{i⋆} ≤ |∇f_l^i(x_l^{i⋆})| + |w_l^i ν^⋆| ≤ ‖∇f^i(x^{i⋆})‖_∞ + w̄|ν^⋆|, and from (19b) that µ̄_l^{i⋆} ≤ |∇f_l^i(x_l^{i⋆})| + |w_l^i ν^⋆| ≤ ‖∇f^i(x^{i⋆})‖_∞ + w̄|ν^⋆|. Therefore, given (17), we have the guarantee that (18) holds. □

To compute the upper bound in (18) in a distributed manner, agents can run a set of max-consensus algorithms.
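A minimal sketch of such a max-consensus iteration (synchronous rounds over a fixed undirected graph; the function and variable names are ours, not the paper's). Each agent would initialize its value with its local quantity max_{x^i∈X_ineq^i} ‖∇f^i(x^i)‖_∞, and after at most diam(G) rounds every agent holds the network-wide maximum needed in (18):

```python
import numpy as np

def max_consensus(values, neighbors, rounds):
    """Synchronous max-consensus: in each round every agent replaces its
    value with the max over itself and its graph neighbors. Over a fixed
    connected graph, all agents hold the global max after diam(G) rounds."""
    v = np.asarray(values, dtype=float)
    for _ in range(rounds):
        v = np.array([max(v[i], *(v[j] for j in neighbors[i]))
                      for i in range(len(v))])
    return v

# line graph on 4 agents; each entry stands in for an agent's local bound
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
out = max_consensus([3.0, 1.0, 7.0, 2.0], nbrs, rounds=3)
```

The iterate is monotone and saturates at the true maximum, so running extra rounds is harmless; similar runs with min-consensus would give the weight w̲ in (18).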

To demonstrate the tightness of the bound in (18), consider the following numerical example:

x^⋆ = argmin_{x∈R^{10}} ∑_{i=1}^{10} f^i(x^i), subject to

w_1 x^1 + w_2 x^2 + · · · + w_{10} x^{10} = b, 0 ≤ x^i ≤ 1, i ∈ Z_1^{10},

in which the local cost functions are assumed quadratic, f^i(x^i) = α_i (x^i)² + β_i x^i + γ_i, with the parameters chosen randomly according to α_i ∈ (0, 1], β_i ∈ (0, 3], γ_i ∈ (0, 4], b ∈ (0, 4]. The affine constraint weights are also chosen randomly according to w_i ∈ (0, 2]. For this problem, finding the exact value of the Lagrange multipliers is possible by solving the KKT equations. To do this calculation, we use the fmincon function of MATLAB to obtain the optimal solution, and then compute the corresponding Lagrange multipliers by solving the KKT conditions. Table 1 shows the values of µ_max, the maximum of the Lagrange multipliers, and the values of µ_bound in (18) for five different runs of the algorithm. As we can see, for this problem the values of µ_bound are at most one order of magnitude larger than µ_max.

Table 1
The values of the actual maximum multiplier µ_max and the bound µ_bound in (18)

case:             1      2      3      4      5
µ_max             2.33   2.68   1.95   2.38   1.95
µ_bound in (18)   13.34  17.91  11.6   52.1   18.48
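For this single-equality-constraint quadratic case, the experiment is easy to reproduce without fmincon: the stationarity conditions (19) reduce to a one-dimensional root-finding problem in ν^⋆ (a standard waterfilling-style bisection), after which the active-bound multiplier magnitudes follow from stationarity. The sketch below uses our own random parameter ranges (not the exact runs of Table 1) and checks that the recovered multipliers stay below the bound (18):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
alpha = rng.uniform(0.01, 1.0, N)   # f_i(x) = alpha_i x^2 + beta_i x (illustrative)
beta = rng.uniform(0.0, 3.0, N)
w = rng.uniform(0.1, 2.0, N)        # positive equality-constraint weights
lo, hi = 0.0, 1.0                    # box bounds on each decision variable

def x_of_nu(nu):
    # stationarity (19a) solved for x, then clipped to the box
    return np.clip(-(beta + w * nu) / (2.0 * alpha), lo, hi)

# choose b strictly inside the attainable range so the problem is feasible
b = 0.5 * w.sum() * (lo + hi)

# nu -> w @ x_of_nu(nu) is continuous and nonincreasing: bisect for the
# equality multiplier nu* with w @ x(nu*) = b
a, c = -1e3, 1e3
for _ in range(200):
    m = 0.5 * (a + c)
    if w @ x_of_nu(m) > b:
        a = m
    else:
        c = m
nu = 0.5 * (a + c)
x = x_of_nu(nu)

# magnitudes of the multipliers of the active box constraints, from (19b)/(19c)
grad = 2.0 * alpha * x + beta
mu = np.where((x <= lo + 1e-9) | (x >= hi - 1e-9), np.abs(grad + w * nu), 0.0)

# bound (18): (1 + w_max/w_min) * max over the box of |grad f^i|
grad_box_max = np.maximum(np.abs(2 * alpha * lo + beta),
                          np.abs(2 * alpha * hi + beta)).max()
mu_bound = (1.0 + w.max() / w.min()) * grad_box_max
```

Since the gradient of each quadratic cost is monotone on the box, its maximum magnitude is attained at an endpoint, which is what the last step exploits.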

Evaluating the MFCQ condition generally is challenging for other classes of optimization problems. A common sufficient condition for the MFCQ is the linear independence constraint qualification (LICQ), which also guarantees the uniqueness of the Lagrange multipliers for any solution of the optimization problem (1) [42] (see [12] and [43] for examples of optimization solvers that are developed under the assumption that the LICQ holds). For a constrained optimization problem, we say that the LICQ holds at the optimal solution x^⋆ ∈ R^m if the gradients of the equality constraints and the active inequality constraints at x^⋆ are linearly independent. The following result finds a µ_bound for problem (1) when the LICQ condition holds at the minimizers.
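Checking the LICQ at a candidate point amounts to a rank test on the stacked constraint gradients. The helper below is a sketch with our own encoding of the active box constraints (each active bound contributes a signed coordinate vector: sign −1 for an active lower bound, +1 for an active upper bound):

```python
import numpy as np

def licq_holds(W, active, tol=1e-10):
    """Rank test for the LICQ: stack the equality-constraint gradients
    (the rows of W) with the gradients of the active box constraints,
    given as (index, sign) pairs, and test for full row rank."""
    p, m = W.shape
    rows = [W]
    for idx, sign in active:
        e = np.zeros(m)
        e[idx] = sign
        rows.append(e[None, :])
    G = np.vstack(rows)
    return bool(np.linalg.matrix_rank(G, tol=tol) == G.shape[0])

W = np.array([[1.0, 1.0, 1.0]])   # one equality constraint, m = 3
```

With a single equality constraint and m = 3 variables, the LICQ holds for up to two active bounds but necessarily fails once more than m − p = 2 bounds are active, mirroring the counting argument used in the proof of Theorem 3.3 below.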

Theorem 3.3 (Bounds on the Lagrange multipliers corresponding to the inequality constraints when the LICQ holds at the minimizers): Consider problem (1) under Assumption 3.1. Assume also that the LICQ holds at the minimizers of (1). Let (x^⋆, ν^⋆, {µ̲_l^{i⋆}}_{l∈B̲^i}, {µ̄_l^{i⋆}}_{l∈B̄^i}) be an arbitrary solution of the KKT conditions (5) for this problem. Then, the bound µ_bound in (16) satisfies

µ_bound ≤ (1 + w̄/ω) max{max_{x^i∈X_ineq^i} ‖∇f^i(x^i)‖_∞}_{i=1}^N, (20)

where w̄ = ‖W‖_max = max{‖w^i‖_max}_{i=1}^N and ω = min{σ_min(W_c) | W_c ∈ Q(W^⊤)}. Here, Q(W^⊤) is the set of all the invertible p × p sub-matrices of W^⊤ ∈ R^{m×p} (recall (2)).

PROOF. For any (x^⋆, ν^⋆, {µ̲_l^{i⋆}}_{l∈B̲^i}, {µ̄_l^{i⋆}}_{l∈B̄^i}), we note that the KKT conditions (5) can be written as

∇f_l^i(x_l^{i⋆}) + ([w^i]^l)^⊤ν^⋆ = 0, l ∈ Z_1^{n_i} \ {A̲^i ∪ Ā^i}, (21a)

∇f_l^i(x_l^{i⋆}) + ([w^i]^l)^⊤ν^⋆ + µ̄_l^{i⋆} = 0, l ∈ Ā^i, (21b)

∇f_l^i(x_l^{i⋆}) + ([w^i]^l)^⊤ν^⋆ − µ̲_l^{i⋆} = 0, l ∈ A̲^i, (21c)

∑_{i=1}^N ∑_{l=1}^{n_i} [w^i]_j^l x_l^{i⋆} = b_j, j ∈ Z_1^p, (21d)

x_l^{i⋆} = x̄_l^i, l ∈ Ā^i, (21e)

x_l^{i⋆} = x̲_l^i, l ∈ A̲^i, (21f)

i ∈ V. Under the LICQ assumption, the gradients of the equality constraints (a set of p vectors in R^m) and of the active inequality constraints (a set of ∑_{i=1}^N |A̲^i ∪ Ā^i| vectors in R^m) at the minimizer should be linearly independent. This necessitates that ∑_{i=1}^N |A̲^i ∪ Ā^i| ≤ m − p. As a result, we can conclude that q = ∑_{i=1}^N |Z_1^{n_i} \ (A̲^i ∪ Ā^i)| ≥ p. Thus, the number of KKT equations of the form (21a) is q ≥ p. As a result, we can write all these q equations as

W_e^⊤ ν^⋆ = −J_e,

where J_e ∈ R^q stacks the gradients ∇f_l^i(x_l^{i⋆}) for l ∈ Z_1^{n_i} \ (A̲^i ∪ Ā^i), i ∈ V, and W_e ∈ R^{p×q} is the corresponding sub-matrix of W ∈ R^{p×m}. Recall that under the LICQ assumption, (ν^⋆ ∈ R^p, {µ̲_l^{i⋆}}_{l∈B̲^i}, {µ̄_l^{i⋆}}_{l∈B̄^i}) corresponding to every x^⋆ is unique. Thus, rank(W_e^⊤) = p and there always exists a sub-matrix W_se ∈ R^{p×p} of W_e^⊤ ∈ R^{q×p} such that

ν^⋆ = −W_se^{−1} J, (22)

where J collects the components of J_e associated with the rows of W_se. Therefore, we can write

‖ν^⋆‖_∞ ≤ (1/σ_min(W_se)) ‖J‖_∞ ≤ (1/ω) max{max_{x^i∈X_ineq^i} ‖∇f^i(x^i)‖_∞}_{i=1}^N, (23)

where ω is defined in the statement. Here, we used |∇f_l^i(x_l^{i⋆})| ≤ max{max_{x^i∈X_ineq^i} ‖∇f^i(x^i)‖_∞}_{i=1}^N, l ∈ Z_1^{n_i}, i ∈ V.


On the other hand, given (21b) and (21c), we can write

max{max{µ̲_l^{i⋆}}_{l∈A̲^i}, max{µ̄_l^{i⋆}}_{l∈Ā^i}}_{i=1}^N ≤ max{max_{x^i∈X_ineq^i} ‖∇f^i(x^i)‖_∞}_{i=1}^N + w̄ ‖ν^⋆‖_∞,

where w̄ is defined in the statement. Therefore, given (23), we have the guarantee that (20) holds. □
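The constant ω in (20) can be computed by brute force for small problems: enumerate the p × p sub-matrices of W^⊤ and keep the smallest minimum singular value among the invertible ones. The enumeration is combinatorial in m, so this is only a sanity-check tool; the function name and tolerance below are our own choices:

```python
import itertools
import numpy as np

def omega_const(W, tol=1e-10):
    """omega in (20): the smallest minimum singular value over all
    invertible p x p sub-matrices of W^T (rows drawn from W^T)."""
    p, m = W.shape
    WT = W.T                               # m x p
    best = np.inf
    for rows in itertools.combinations(range(m), p):
        s = np.linalg.svd(WT[list(rows), :], compute_uv=False)
        if s[-1] > tol:                    # keep only invertible sub-matrices
            best = min(best, s[-1])
    return best

W = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])            # p = 2 equality constraints, m = 3
```

For this W, the three 2 × 2 sub-matrices of W^⊤ are all invertible and the smallest of their minimum singular values is √2 − 1, so ω = √2 − 1 here.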

4 Numerical examples

In what follows, we demonstrate the performance of algorithm (7) via two numerical examples.

As a first demonstrative example, we consider the in-network resource allocation problem described in Fig. 1. We choose the parameters of the costs and the generation limits of the generators randomly from the table below, which lists the parameters of the generators of the IEEE 118-bus test model [44] located at buses (4, 10, 18, 26, 54, 69).

IEEE bus number   α [mu/MW²]   β [mu/MW]   γ [mu]   x̲ [MW]   x̄ [MW]
4                 0.0696629    26.24382    31.67    5        30
10                0.010875     12.8875     6.78     150      300
18                0.0128       17.82       10.15    25       100
26                0.003        10.76       32.96    100      350
54                0.0024014    12.32989    28       50       250
69                0.010875     12.8875     6.78     80       300

Figure 2 shows the time history of the x_l^i's generated by implementing the distributed optimization algorithm (7) (using f_p^i(x^i) as defined in (12) in place of f^i(x^i) in (7c)) in comparison to the solution obtained using MATLAB's constrained optimization solver ‘fmincon’. As expected, the decision variable x^i of each agent i ∈ {1, . . . , 6} converges closely to its corresponding minimizer, using ε = 0.001. Figure 3 depicts the time history of the equality constraint violation, which, as shown, vanishes over time. For this problem, to generate the dual dynamics, the agents {1, · · · , 6} maintain and communicate variables of order {1, 1, 2, 2, 1, 1}, respectively, when we implement algorithm (7). Whereas, if we implement the algorithms of [8, 18–24], the corresponding variables to generate the dual dynamics are of order {4, 2, 6, 6, 4, 2}.

For the second example, we consider a simple distributed self-localizing deployment problem concerned with the optimal deployment of 3 sensors, labeled S_i, i ∈ {1, 3, 5}, on a line to monitor a set of events that are horizontally located at P = [{p_i}_{i=1}^{10}] = [12, 11, 9, 3, 2, −1, −2, −8, −11, −13] for t ∈ [0, 100), and P = [{p_i}_{i=1}^{10}] = [24, 22, 17, 15, 13, 8, 7, 3, −2, −4] for t ∈ [100, 200); see Fig. 4. Agent 1 is monitoring {p_i}_{i=1}^3, agent 3 is monitoring {p_i}_{i=4}^7, and agent 5 is monitoring {p_i}_{i=8}^{10}. The sensors should find their positions cooperatively to keep their positions in

Fig. 2. Execution of algorithm (7) over the network depicted in Fig. 1. The colored solid curves depict the time history of the decision variable of each agent. Horizontal dashed lines depict the centralized solution obtained using MATLAB's constrained optimization solver ‘fmincon’.

Fig. 3. Constraint violation error [Wx − b]_k, k ∈ {1, 2}, while solving the optimization problem described in Fig. 1 using algorithm (7).

the communication range of each other, as well as stay close to the targets to improve the detection accuracy. Due to the limited communication range, two relay nodes R_i, i ∈ {2, 4}, as shown in Fig. 4, are used to guarantee the connectivity of the sensors during the operation. The problem is formulated as

x^⋆ = argmin_{x∈R^5} ∑_{i=1}^5 f^i(x^i), subject to (24)

x^j − x^{j+1} ≤ 5, j ∈ {1, · · · , 4},

where f^i(x^i) = ∑_{j∈E^i} ‖x^i − p_j‖² for i ∈ {1, 3, 5}, with E^1 = {1, · · · , 3}, E^3 = {4, · · · , 7} and E^5 = {8, · · · , 10}, and f^i(x^i) = 0 for i ∈ {2, 4}. Here, x^i with i ∈ {1, 3, 5} (resp. i ∈ {2, 4}) is the horizontal position of sensor S_i (resp. relay node R_i). To transform problem (24) into the standard form described in (1), we introduce slack variables x_2^i ∈ R, i ∈ {1, · · · , 4}, to rewrite (24) as

x^⋆ = argmin_{x∈R^9} ∑_{i=1}^5 f^i(x^i), subject to (25)

x_1^j − x_1^{j+1} + x_2^j = 5, x_2^j ≥ 0, j ∈ {1, · · · , 4},

where x^i ∈ R² for i ∈ {1, 2, 3, 4}, x^5 ∈ R, and f^i(x^i) = f^i(x_1^i) for any i ∈ {1, · · · , 5}, i.e., the local costs do not depend on the slack variables x_2^i. We can run algorithm (7) by choosing the cyber layer equivalent to the physical connected topology between all the agents, i.e., G_k = G for k ∈ {1, 2, 3, 4}, where G is the line graph connecting all 5 agents. However, as stated earlier, this configuration leads to extra computational and communication effort. Here, instead, we form 4 cyber layers G_k, k ∈ {1, 2, 3, 4}, where V_1 = {1, 2}, V_2 = {2, 3}, V_3 = {3, 4} and V_4 = {4, 5}. We note that our proposed approach of forming the cyber layers in correspondence with the equality constraints leads to an efficient communication topology here. More specifically, to generate the dual dynamics, the agents {1, · · · , 5} maintain and communicate variables of order {1, 2, 2, 2, 1}, respectively. Whereas, if we implement the algorithms of [19, 21], the corresponding variables to generate the dual dynamics are of order {8, 8, 8, 8, 4}.

Fig. 4. Schematic representation of the events, sensors and relay nodes in the second example.

Figure 5 shows the trajectories of the distributed optimization algorithm (7) (using f_p^i(x_1^i) as defined in (12) in place of f^i(x^i) in (7c)) for problem (25). As shown, the locations of the sensors remain in their communication range and converge to the optimum values during the execution of the algorithm (the optimal solution is shown by the grey lines and is obtained by MATLAB's constrained optimization solver ‘fmincon’). Our choice of smooth penalty function (12) uses γ = 200 and ε = 0.01, which satisfies the condition of Proposition 3.1. What is interesting to note in Fig. 5 is how the convergence of the algorithm is slowed down when we use G_k = G for k ∈ Z_1^4. This is expected, as in this case the coordination to generate the dual variables has to happen over a larger graph.

Table 2 gives the global cost value and the inequality constraint evaluation at x_p^⋆ obtained by using our distributed algorithm with the ε-exact penalty function method for three simulation scenarios. The first and the second scenarios use ε = 0.01 and ε = 0.001, respectively. As we can see, when ε = 0.01 only one of the inequalities is violated, and only slightly (by 2.1e−4). When the smaller ε = 0.001 is used, this violation is also removed. Table 2 also shows that if we use the 'adjusted box inequalities' that we introduced in Remark 3.3, the inequality constraints are all respected, with only a negligible increase in the cost value.

5 Conclusions

We proposed a novel cluster-based distributed augmented Lagrangian algorithm for a class of constrained convex optimization problems. In the design of our distributed algorithm, we paid special attention to

Fig. 5. Trajectories of {x_1^i}_{i=1}^5 generated by implementing distributed algorithm (7): the grey lines show the optimum positions of the agents on the line, obtained by using MATLAB's ‘fmincon’. The thick curves show the trajectories when algorithm (7) is implemented over cluster-based cyber layers; the thin lines show the trajectories when algorithm (7) is implemented with G_k = G, k ∈ {1, 2, 3, 4}.

efficient communication and computation resource management, and required only the agents that are coupled through an equality constraint to form a communication topology to address that coupling in a distributed manner. We showed that if the communication topology corresponding to each equality constraint is a connected graph, the proposed algorithm converges asymptotically when the local cost functions are convex, and exponentially when the local cost functions are strongly convex and have Lipschitz gradients. We invoked the ε-exact penalty function method to address the inequality constraints and obtained an explicit lower bound on the penalty function weight to guarantee convergence to an ε-neighborhood of the global minimum value of the cost. Simulations demonstrated the performance of our proposed algorithm. As future work, we will study the event-triggered communication implementation of our algorithm.

References

[1] S. S. Kia, “An augmented lagrangian distributed algorithmfor an in-network optimal resource allocation problem,” inAmerican Control Conference, (WA, USA), 2017.

[2] A. J. Wood, F. Wollenberg, and G. B. Sheble, PowerGeneration, Operation and Control. New York: John Wiley,3rd ed., 2013.

[3] A. Cherukuri and J. Cortes, “Initialization-free distributedcoordination for economic dispatch under varying loads andgenerator commitment,” Automatica, vol. 74, pp. 183–193,2016.

[4] L. Xiao, M. Johansson, and S. P. Boyd, “Simultaneousrouting and resource allocation via dual decomposition,”IEEE Transactions on Communications, vol. 52, no. 7,pp. 1136–1144, 2004.

[5] R. Madan and S. Lall, “Distributed algorithms formaximum lifetime routing in wireless sensor networks,” IEEETransactions on Wireless Communications, vol. 5, no. 8,pp. 2185–2193, 2006.

[6] J. Chen and V. K. N. Lau, “Convergence analysis ofsaddle point problems in time varying wireless systems –control theoretical approach,” IEEE Transactions on SignalProcessing, vol. 60, no. 1, pp. 443–452, 2012.


Table 2
The global cost value and the inequality constraint evaluation at x_p^⋆ obtained by using the ε-exact penalty function method

                             x^1−x^2−5   x^2−x^3−5   x^3−x^4−5   x^4−x^5−5   f(x_p^⋆)
ε = 0.01                     -5.8e-3     -2.06e-2    -2.63e-2    2.1e-4      -680.4
ε = 0.001                    -5.92e-3    -3.46e-2    -3.75e-2    -3.5e-2     -680.4
ε = 0.01, adjusted bounds    -1.25e-2    -1.3e-2     -3.92e-2    -8.2e-3     -680.23

[7] A. Ferragut and F. Paganini, “Network resource allocationfor users with multiple connections: fairness and stability,”IEEE/ACM Transactions on Networking, vol. 22, no. 2,pp. 349–362, 2014.

[8] S. A. Alghunaim, K. Yuan, and A. H. Sayed, “Dual coupleddiffusion for distributed optimization with affine constraints,”in IEEE Conf. on Decision and Control, (FL, USA), 2018.

[9] R. Rostami, G. Costantini, and D. Gorges, “ADMM-based distributed model predictive control: Primal anddual approaches,” in IEEE Conf. on Decision and Control,(Melbourne, Australia), 2017.

[10] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein,“Distributed optimization and statistical learning via thealternating direction method of multipliers,” Foundationsand Trends in Machine Learning, vol. 3, pp. 1–122, 2010.

[11] J. Duchi, A. Agarwal, and M. Wainwright, “Dual averagingfor distributed optimization: Convergence analysis andnetwork scaling,” IEEE Transactions on Automatic Control,vol. 57, no. 3, pp. 592–606, 2012.

[12] J. Wang and N. Elia, “A control perspective for centralizedand distributed convex optimization,” in IEEE Conf. onDecision and Control, (FL, USA), 2011.

[13] S. S. Kia, J. Cortes, and S. Martinez, “Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication,” Automatica, vol. 55, pp. 254–264, 2014.

[14] D. Varagnolo, F. Zanella, A. Cenedese, G. Pillonetto, andL. Schenato, “Newton-raphson consensus for distributedconvex optimization,” IEEE Transactions on AutomaticControl, vol. 61, no. 4, pp. 994 – 1009, 2015.

[15] Z. Zhang and M. Chow, “Convergence analysis of theincremental cost consensus algorithm under differentcommunication network topologies in a smart grid,” IEEETransactions on Power Systems, vol. 27, no. 4, pp. 1761–1768, 2012.

[16] S. Kar and G. Hug, “Distributed robust economic dispatchin power systems: A consensus + innovations approach,” inPower & Energy Society General Meeting, (San Diego, CA),pp. 1–8, July 2012.

[17] A. D. Dominguez-Garcia, S. T. Cady, and C. N. Hadjicostis,“Decentralized optimal dispatch of distributed energyresources,” in IEEE Conf. on Decision and Control, (Hawaii,USA), pp. 3688–3693, Dec. 2012.

[18] L. Xiao and S. Boyd, “Optimal scaling of a gradient methodfor distributed resource allocation,” Journal of optimizationtheory and applications, vol. 129, no. 3, pp. 469–488, 2006.

[19] Y. Zhang and M. M. Zavlanos, “A consensus-baseddistributed augmented lagrangian method,” in IEEE Conf.on Decision and Control, (CA, USA), 2018.

[20] P. Yi, Y. Hong, and F. Liu, “Initialization-free distributedalgorithms for optimal resource allocation with feasibilityconstraints and its application to economic dispatch of powersystems,” Automatica, vol. 74, pp. 259–269, 2016.

[21] S. S. Kia, “Distributed optimal in-network resource allocationalgorithm design via a control theoretic approach,” Systemsand Control Letters, vol. 107, pp. 49–57, 2017.

[22] D. Ding and M. Jovanovic, “A primal-dual Laplacian gradient flow dynamics for distributed resource allocation problems,” in American Control Conference, (WI, USA), 2018.

[23] A. Cherukuri and J. Cortes, “Initialization-free distributedcoordination for economic dispatch under varying loads andgenerator commitment,” Automatica, vol. 74, pp. 183–193,2016.

[24] A. Cherukuri and J. Cortes, “Distributed generatorcoordination for initialization and anytime optimizationin economic dispatch,” IEEE Transactions on Control ofNetwork Systems, vol. 2, no. 3, pp. 226–237, 2015.

[25] D. Bertsekas and J. Tsitsiklis, Parallel and DistributedComputation: Numerical Methods. 1997.

[26] D. Jakovetic, J. Moura, and J. Xavier, “Linear convergence rate of a class of distributed augmented Lagrangian algorithms,” IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 922–936, 2015.

[27] M. Vaquero and J. Cortes, “Distributed augmentation-regularization for robust online convex optimization,” IFAC-PapersOnLine, vol. 51, no. 23, pp. 230–235, 2018.

[28] K. J. Arrow, L. Hurwicz, and H. Uzawa, Studies in linearand nonlinear programming. 1958.

[29] D. Ding, B. Hu, N. Dhingra, and M. Jovanovic, “An exponentially convergent primal-dual algorithm for nonsmooth composite minimization,” in IEEE Conf. on Decision and Control, (FL, USA), 2018.

[30] S. S. Kia, “Distributed optimal resource allocation overnetworked systems and use of an epsilon-exact penaltyfunction,” in IFAC Symposium on Large Scale ComplexSystems, (CA, USA), 2016.

[31] W. Haddad and V. Chellaboina, Nonlinear DynamicalSystems and Control. Princeton University Press, 2008.

[32] D. P. Bertsekas, “Nondifferentiable optimization viaapproximation,” Mathematical Programing Study, vol. 3,pp. 1–25, 1975.

[33] M. C. Pinar and S. A. Zenios, “On smoothing exactpenalty functions for convex constrained optimization,”IEEE Transactions on Communications, vol. 4, no. 3,pp. 1136–1144, 1994.

[34] W. Wei, J. Wang, N. Li, and S. Mei, “Optimal power flowof radial networks and its variations: A sequential convexoptimization approach,” IEEE Transactions on Smart Grid,vol. 8, no. 6, pp. 2974–2987, 2017.

[35] M. Zholbaryssov, D. Fooladivanda, and A. D. Dominguez-Garcia, “Resilient distributed optimal generation dispatch for lossy AC microgrids,” Systems and Control Letters, vol. 123, pp. 47–54, 2019.


[36] O. Mangasarian, “Computable numerical bounds for lagrangemultipliers of stationary points of non-convex differentiablenon-linear programs,” Operations Research Letters, vol. 4,no. 2, pp. 1757–1780, 1985.

[37] S. Richter, M. Morari, and C. Jones, “Towards computationalcomplexity certification for constrained MPC based onLagrange relaxation and the fast gradient method,” in IEEEConf. on Decision and Control, (Orlando, Florida, USA),pp. 5223 – 5229, 2011.

[38] F. Bullo, J. Cortes, and S. Martinez, Distributed Control of Robotic Networks. Applied Mathematics Series, Princeton University Press, 2009.

[39] S. Boyd and L. Vandenberghe, Convex Optimization.Cambridge University Press, 2004.

[40] D. Bertsekas, Nonlinear Programming. 1999.

[41] O. L. Mangasarian and S. Fromovitz, “The Fritz John necessary optimality conditions in the presence of equality and inequality constraints,” Journal of Mathematical Analysis and Applications, vol. 17, pp. 37–47, 1967.

[42] G. Wachsmuth, “On LICQ and the uniqueness of Lagrangemultipliers,” Operations Research Letters, vol. 41, no. 1,pp. 78–80, 2013.

[43] P. Srivastava and J. Cortes, “Distributed algorithm viacontinuously differentiable exact penalty method for networkoptimization,” in IEEE Conf. on Decision and Control, (FL,USA), 2018.

[44] 2004. http://motor.ece.iit.edu/data/JEAS_IEEE118.doc.

[45] H. K. Khalil, Nonlinear Control. Prentice Hall, 2002.

Appendix A

PROOF. [Proof of Theorem 3.1] Let ({x^{i⋆}}_{i=1}^N, ν^⋆) satisfy the KKT equations (5) and y_k^⋆ = [{[w^l]_k x^{l⋆} − b_k^l}_{l∈V_k}]. For convenience in the analysis, we apply the change of variables

q_k = [r_k^⊤; R_k^⊤](y_k − y_k^⋆), p_k = v_k − ν_k^⋆ 1_{N_k}, χ^i = x^i − x^{i⋆}, (A.26)

to write the algorithm (7), under the stated initialization conditions, in the equivalent form

˙q̄_k = 0, (A.27a)

˙q̃_k = β_k (R_k^⊤ L_k R_k) R_k^⊤ p_k, (A.27b)

ṗ_k = ψ_k χ_k − β_k L_k p_k − R_k q̃_k − r_k q̄_k, (A.27c)

χ̇^i = −(ρ_i + 1)(∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆})) + ∑_{k∈T^i} (−ρ_i [w^i]_k^⊤ [w^i]_k χ^i − (ρ_i + 1)[w^i]_k^⊤ p_k^i + ρ_i [w^i]_k^⊤ [R_k q̃_k]^i + ρ_i [w^i]_k^⊤ q̄_k), (A.27d)

where we used q_k = (q̄_k, q̃_k) with q̄_k ∈ R and q̃_k ∈ R^{N_k−1}. Here, we also used R_k R_k^⊤ L_k = L_k, ψ_k = Blkdiag({[w^i]_k}_{i∈V_k}) and χ_k = [{χ^{i⊤}}_{i∈V_k}]^⊤. Under the given initial conditions, for any t ∈ R_{≥0} we obtain

q̄_k(t) = (1/√N_k)(∑_{l∈V_k} y_k^l(t) − ([W]_k x^⋆ − b_k)) = 0. (A.28)

To study the stability in the other variables, we let q̄_k(t) = 0 in (A.27c) and (A.27d), and consider the radially unbounded candidate Lyapunov function

V({q̃_k}_{k=1}^p, {p_k}_{k=1}^p, {χ^i}_{i=1}^N) = (1/2) ∑_{i=1}^N χ^{i⊤}χ^i + (1/2) ∑_{k=1}^p (q̃_k^⊤ (Γ_k + I)(β_k R_k^⊤ L_k R_k)^{−1} q̃_k + p_k^⊤ p_k + (p_k + R_k q̃_k)^⊤ Γ_k (p_k + R_k q̃_k)), (A.29)

where Γ_k = Blkdiag({ρ_i}_{i∈V_k}). Note that (β_k R_k^⊤ L_k R_k)^{−1} and Γ_k + I are positive definite matrices; thus q̃_k^⊤(Γ_k + I)(β_k R_k^⊤ L_k R_k)^{−1} q̃_k > 0 for q̃_k ≠ 0. Taking the derivative of V along the trajectories of (A.27b)–(A.27d) gives

V̇ = −∑_{i=1}^N (ρ_i + 1) χ^{i⊤}(∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆})) − ∑_{k=1}^p (β_k p_k^⊤ L_k p_k + (ψ_k χ_k − R_k q̃_k)^⊤ Γ_k (ψ_k χ_k − R_k q̃_k)). (A.30)

Convexity of the local cost functions ensures χ^{i⊤}(∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆})) = ((χ^i + x^{i⋆}) − x^{i⋆})^⊤(∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆})) ≥ 0, i ∈ V. The connectivity of each sub-graph G_k, k ∈ Z_1^p, also ensures −p_k^⊤ L_k p_k ≤ 0. Thus, V̇ ≤ 0, and consequently the trajectories of (A.27b)–(A.27d) starting from any initial condition are bounded.

Next, we invoke the invariant set stability results to prove that the trajectories of (A.27b)–(A.27d) converge to a point in their set of equilibrium points. Let S = {({q̃_k}_{k=1}^p, {p_k}_{k=1}^p, {χ^i}_{i=1}^N) ∈ ∏_{k=1}^p R^{N_k−1} × ∏_{k=1}^p R^{N_k} × ∏_{i=1}^N R^{n_i} | V̇ ≡ 0}. Given (A.30), we have S = {({q̃_k}_{k=1}^p, {p_k}_{k=1}^p, {χ^i}_{i=1}^N) ∈ ∏_{k=1}^p R^{N_k−1} × ∏_{k=1}^p R^{N_k} × ∏_{i=1}^N R^{n_i} | p_k = 0, ψ_k χ_k = R_k q̃_k, χ^{i⊤}(∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆})) = 0, i ∈ V, k ∈ Z_1^p}.

Since χ^{i⊤}(∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆})) = ∑_{j=1}^{n_i} χ_j^i (∇f_j^i(χ_j^i + x_j^{i⋆}) − ∇f_j^i(x_j^{i⋆})), due to the convexity of the cost functions f_j^i, j ∈ Z_1^{n_i}, i ∈ V, from χ^{i⊤}(∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆})) = 0 we conclude that, for each j, either χ_j^i = 0 or ∇f_j^i(χ_j^i + x_j^{i⋆}) − ∇f_j^i(x_j^{i⋆}) = 0. Consequently, the points in S satisfy ∇f^i(χ^i + x^{i⋆}) − ∇f^i(x^{i⋆}) = 0. As a result, given (A.28), a trajectory t ↦ ({q̃_k(t)}_{k=1}^p, {p_k(t)}_{k=1}^p, {χ^i(t)}_{i=1}^N) of (A.27b)–(A.27d) belonging to S for all t ≥ 0 must satisfy (˙q̃_k ≡ 0, ṗ_k ≡ 0, χ̇^i ≡ 0). Therefore, the largest invariant set in S is the set of equilibrium points of (A.27b)–(A.27d). Then, invoking the LaSalle invariance theorem [31, Theorem 3.4], we conclude that the trajectories of (A.27b)–(A.27d) converge asymptotically to the set of its equilibrium points.

Next, we show that the convergence is indeed to a point in the equilibria set. To that end, by virtue of the semistability theorem [31, Theorem 4.20], we show that every equilibrium point of (A.27b)-(A.27d) is Lyapunov stable. Let $(\{\bar q_k\}_{k=1}^{p},\{\bar p_k\}_{k=1}^{p},\{\bar\chi^i\}_{i=1}^{N})$ be an equilibrium point of (A.27b)-(A.27d) (recall that $q_k(t)=0$ due to (A.28)). Now, consider the change of variables $\tilde q_k=q_k-\bar q_k$ and $\tilde p_k=p_k-\bar p_k$ for $k\in\mathbb{Z}_1^{p}$, and $r^i=\chi^i-\bar\chi^i$ for $i\in\mathcal{V}$, to write (A.27b)-(A.27d) as
\begin{align}
\dot{\tilde q}_k=&\,\beta_k\,(R_k^\top L_k R_k)\,R_k^\top\tilde p_k, \tag{A.31a}\\
\dot{\tilde p}_k=&\,\psi_k r_k-\beta_k L_k\tilde p_k-R_k\tilde q_k, \tag{A.31b}\\
\dot r^i=&-(\rho_i+1)\big(\nabla f^i(r^i+\bar\chi^i+x^{i\star})-\nabla f^i(x^{i\star})\big)\notag\\
&+\sum_{k\in\mathcal{T}^i}\Big(-\rho_i\,[w^i]_k^\top[w^i]_k\,r^i-(\rho_i+1)\,[w^i]_k^\top\tilde p_k^{\,i}+\rho_i\,[w^i]_k^\top[R_k\tilde q_k]^i\Big). \tag{A.31c}
\end{align}

Next, consider the Lyapunov function (A.29) where $(\{q_k\}_{k=1}^{p},\{p_k\}_{k=1}^{p},\{\chi^i\}_{i=1}^{N})$ is substituted by $(\{\tilde q_k\}_{k=1}^{p},\{\tilde p_k\}_{k=1}^{p},\{r^i\}_{i=1}^{N})$. Following the same argument used to show $\dot V\leq 0$ in (A.30), we can show that the derivative of $V(\{\tilde q_k\}_{k=1}^{p},\{\tilde p_k\}_{k=1}^{p},\{r^i\}_{i=1}^{N})$ along the trajectories of (A.27b)-(A.27d), when (A.28) holds, is also negative semi-definite. Thus, any equilibrium point $(\{\bar q_k\}_{k=1}^{p},\{\bar p_k\}_{k=1}^{p},\{\bar\chi^i\}_{i=1}^{N})$ of (A.27b)-(A.27d) is Lyapunov stable (recall (A.28)). Therefore, since the trajectories of (A.27b)-(A.27d) approach the set of stable equilibrium points starting from any initial condition, they converge to a point in the equilibrium set. Consequently, given the change of variables (A.26), we conclude that, starting from the initial conditions stated in the theorem, the trajectories of (7) converge, as $t\to\infty$, to a point in its set of equilibrium points (9), where $\dot v_k^l=0$, $\dot y_k^l=0$ for $l\in\mathcal{V}_k$ and $\dot x^i=0$ for $i\in\mathcal{V}$. Therefore, under the stated initial conditions, as $t\to\infty$, the limit point $(\{v_k^l\},\{y_k^l\},\{x^i\})$, $i\in\mathcal{V}$, $l\in\mathcal{V}_k$, $k\in\mathbb{Z}_1^{p}$, that satisfies $\dot v_k^l=0$, $\dot y_k^l=0$, $\dot x^i=0$ in (7) is equal to $(\nu_k^\star\mathbf{1}_{N_k},y^\star,\{x^{i\star}\}_{i=1}^{N})$, where $(\{\nu_k^\star\}_{k=1}^{p},\{x^{i\star}\}_{i=1}^{N})$ is a point satisfying the KKT conditions (5) of problem (4) (this point is not necessarily the point used in the change of variables (A.26)).

PROOF. [Proof of Theorem 3.2] Follow the proof of Theorem 3.1 until the choice of the candidate Lyapunov function, where we instead use the candidate function below, consisting of $V$ in (A.29) plus an extra positive quadratic term:
\[
\bar V(\{q_k\}_{k=1}^{p},\{p_k\}_{k=1}^{p},\{\chi^i\}_{i=1}^{N})=V+\sum_{k=1}^{p}\frac{\phi_k}{2}\,(\chi_k+\psi_k^\top\Gamma_k p_k)^\top(\chi_k+\psi_k^\top\Gamma_k p_k)=\zeta^\top\mathsf{E}\,\zeta,
\]
where $\phi_k\in\mathbb{R}_{>0}$ satisfies
\[
\phi_k<\min\Big\{\frac{2(1+\underline\rho)\,m}{p\,(M^2(\bar\rho+1)^2+1)},\ \frac{2\beta_k\lambda_{2_k}}{(\beta_k^2\lambda_{N_k}^2\bar\rho^2+\bar\rho+1)\,\|\psi_k\|^2}\Big\},
\]
with $\underline\rho=\min\{\rho_i\}_{i=1}^{N}$ and $\bar\rho=\max\{\rho_i\}_{i=1}^{N}$. Here $\zeta=[\{q_k^\top\}_{k=1}^{p},\{p_k^\top\}_{k=1}^{p},\{\chi^{i\top}\}_{i=1}^{N}]^\top$ and $\mathsf{E}>0$ is the matrix describing the coefficients of the quadratic terms of $\bar V$. When every $\mathcal{G}_k$, $k\in\mathbb{Z}_1^{p}$, is a connected graph, $\bar V$ is a radially unbounded and positive definite function. Then,

\begin{align*}
\dot{\bar V}=&-\sum_{i=1}^{N}(\rho_i+1)\,\chi^{i\top}h(\chi^i)+\sum_{k=1}^{p}\Big(-\beta_k\,p_k^\top L_k p_k\\
&-(\psi_k\chi_k-R_kq_k)^\top\Gamma_k(\psi_k\chi_k-R_kq_k)\\
&-\frac{\phi_k}{2}\|\psi_k^\top\Gamma_k p_k+(\Gamma_k+\mathsf{I})h(\chi_k)\|^2+\frac{\phi_k}{2}\chi_k^\top\chi_k\\
&-\frac{\phi_k}{2}\|\chi_k+\beta_k\psi_k^\top\Gamma_k L_k p_k+\psi_k^\top(\Gamma_k+\mathsf{I})p_k\|^2\\
&-\phi_k\,\chi_k^\top(\Gamma_k+\mathsf{I})h(\chi_k)+\frac{\phi_k}{2}h(\chi_k)^\top(\Gamma_k+\mathsf{I})^2h(\chi_k)\\
&+\frac{\beta_k^2\phi_k}{2}\|p_k^\top L_k\Gamma_k\psi_k\|^2+\frac{\phi_k}{2}p_k^\top\psi_k(\Gamma_k+\mathsf{I})\psi_k^\top p_k\\
&-\frac{\beta_k\phi_k}{2}p_k^\top(\Gamma_k+\mathsf{I})\psi_k\psi_k^\top\Gamma_k L_k p_k\Big),
\end{align*}
where $h(\chi_k)=\nabla f(\chi_k+x_k^\star)-\nabla f(x_k^\star)$. When $\rho_i\in\mathbb{R}_{>0}$ for all $i\in\mathcal{V}$, we can write

\begin{align*}
\dot{\bar V}\leq&-(1+\underline\rho)\,m\,\chi^\top\chi+\sum_{k=1}^{p}\Big(-\beta_k\lambda_{2_k}\,p_k^\top p_k\\
&-(\psi_k\chi_k-R_kq_k)^\top\Gamma_k(\psi_k\chi_k-R_kq_k)\\
&+\frac{\phi_k}{2}\big(M^2(\bar\rho+1)^2+1\big)\,\chi^\top\chi+\frac{\phi_k}{2}\big(\beta_k^2\lambda_{N_k}^2\bar\rho^2+\bar\rho+1\big)\|\psi_k\|^2\,p_k^\top p_k\Big).
\end{align*}

Here, we used the $M^i_l$-Lipschitzness property of the local gradients to write $h(\chi_k)^\top(\Gamma_k+\mathsf{I})^2h(\chi_k)\leq\sum_{i=1}^{N_k}(\rho_i+1)^2M^2\|\chi^i\|^2\leq M^2(\bar\rho+1)^2\chi^\top\chi$. We also used $-\sum_{i=1}^{N}(\rho_i+1)\chi^{i\top}h(\chi^i)\leq-m(\underline\rho+1)\chi^\top\chi$, due to the $m^i_l$-strong convexity of the local cost functions $f^i_l$, and $-p_k^\top L_k p_k\leq 0$, which is true because every $\mathcal{G}_k$, $k\in\mathbb{Z}_1^{p}$, is a connected graph. We also used $\|p_k^\top L_k\Gamma_k\psi_k\|^2\leq\lambda_{N_k}^2\bar\rho^2\|\psi_k\|^2\,p_k^\top p_k$, where $\lambda_{N_k}$ is the maximum eigenvalue of $L_k$. We note that for
\[
0<\phi_k<\min\Big\{\frac{2(1+\underline\rho)\,m}{p\,(M^2(\bar\rho+1)^2+1)},\ \frac{2\beta_k\lambda_{2_k}}{(\beta_k^2\lambda_{N_k}^2\bar\rho^2+\bar\rho+1)\,\|\psi_k\|^2}\Big\},
\]
we have $\dot{\bar V}<0$. Next, note that we can bound $\dot{\bar V}$ by a negative definite quadratic upper bound as

\begin{align}
\dot{\bar V}\leq&-\Big((1+\underline\rho)m-\frac{p\,\phi_k}{2}\big(M^2(\bar\rho+1)^2+1\big)\Big)\chi^\top\chi \tag{A.32}\\
&+\sum_{k=1}^{p}\Big(-\Big(\beta_k\lambda_{2_k}-\frac{\phi_k}{2}\big(\beta_k^2\lambda_{N_k}^2\bar\rho^2+\bar\rho+1\big)\|\psi_k\|^2\Big)p_k^\top p_k\notag\\
&-(\psi_k\chi_k-R_kq_k)^\top\Gamma_k(\psi_k\chi_k-R_kq_k)\Big)=-\zeta^\top\mathsf{F}\,\zeta,\notag
\end{align}

where $\mathsf{F}>0$ is the matrix describing the coefficients of the quadratic terms of the upper bound on $\dot{\bar V}$. Because $\bar V$ is a quadratic positive definite function and the upper bound on $\dot{\bar V}$ is a quadratic negative definite function, by virtue of [45, Theorem 4.10], (A.27b)-(A.27d) is exponentially stable, and its trajectories converge to the origin with a rate no worse than $\frac{\lambda_{\min}(\mathsf{F})}{2\lambda_{\max}(\mathsf{E})}$, where $\lambda_{\min}(\mathsf{F})$ is the minimum eigenvalue of $\mathsf{F}$ and $\lambda_{\max}(\mathsf{E})$ is the maximum eigenvalue of $\mathsf{E}$. Consequently, starting from any initial condition given in the statement, the trajectories $t\mapsto(\{v_k(t)\}_{k=1}^{p},\{x^i(t)\}_{i=1}^{N})$ converge exponentially fast, with the rate given above, to $(\nu_k^\star\mathbf{1}_{N_k},\{x^{i\star}\}_{i=1}^{N})$ as $t\to\infty$.

If $\rho_i=0$ for any $i\in\mathcal{V}$, we can only guarantee that $\dot{\bar V}\leq 0$, with
\[
\mathcal{S}=\Big\{(\{q_k\}_{k=1}^{p},\{p_k\}_{k=1}^{p},\{\chi^i\}_{i=1}^{N})\in\prod_{k=1}^{p}\mathbb{R}^{N_k-1}\times\prod_{k=1}^{p}\mathbb{R}^{N_k}\times\prod_{i=1}^{N}\mathbb{R}^{n_i}\;\Big|\;\dot{\bar V}\equiv 0\Big\}=\Big\{(\{q_k\}_{k=1}^{p},\{p_k\}_{k=1}^{p},\{\chi^i\}_{i=1}^{N})\;\Big|\;p_k=0,\ \chi^i=0,\ \Gamma_kR_kq_k=0,\ i\in\mathcal{V},\ k\in\mathbb{Z}_1^{p}\Big\}.
\]
Next, we note that since $R_k$ is a full column rank matrix, given (A.28), the only trajectory $t\mapsto(\{q_k(t)\}_{k=1}^{p},\{p_k(t)\}_{k=1}^{p},\{\chi^i(t)\}_{i=1}^{N})$ of (A.27b)-(A.27d) that belongs to $\mathcal{S}$ for all $t\in\mathbb{R}_{\geq 0}$ is $(\{q_k(t)\equiv 0\}_{k=1}^{p},\{p_k(t)\equiv 0\}_{k=1}^{p},\{\chi^i(t)\equiv 0\}_{i=1}^{N})$. Therefore, using the LaSalle invariant set analysis of [45, Corollary 4.1], and recalling the change of variables (A.26) and (A.28), we conclude that $t\mapsto(\{v_k(t)\}_{k=1}^{p},\{x^i(t)\}_{i=1}^{N})$ of (7) converges exponentially fast to $(\nu_k^\star\mathbf{1}_{N_k},\{x^{i\star}\}_{i=1}^{N})$.
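The exponential rate $\lambda_{\min}(\mathsf{F})/(2\lambda_{\max}(\mathsf{E}))$ follows from the standard comparison argument $\dot{\bar V}\leq-\zeta^\top\mathsf{F}\zeta\leq-\frac{\lambda_{\min}(\mathsf{F})}{\lambda_{\max}(\mathsf{E})}\bar V$. A small numerical sketch of this step, using random positive definite stand-ins for $\mathsf{E}$ and $\mathsf{F}$ (not the paper's actual coefficient matrices), could be:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
E = A @ A.T + 4 * np.eye(4)      # hypothetical positive definite Lyapunov matrix E
B = rng.normal(size=(4, 4))
F = B @ B.T + np.eye(4)          # hypothetical positive definite decay matrix F

# eigvalsh returns eigenvalues in ascending order
lam_min_F = np.linalg.eigvalsh(F)[0]
lam_max_E = np.linalg.eigvalsh(E)[-1]
rate = lam_min_F / (2 * lam_max_E)
print(f"guaranteed exponential rate >= {rate:.4f}")

# Key inequality behind the rate: z^T F z >= c * z^T E z with
# c = lambda_min(F)/lambda_max(E), i.e. F - c E is positive semi-definite.
c = lam_min_F / lam_max_E
assert np.linalg.eigvalsh(F - c * E)[0] >= -1e-9
```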

Appendix B

Consider the optimization problem
\[
x^\star=\arg\min_{x\in\mathbb{R}^2}\ \sum_{i=1}^{2}f^i(x^i)\quad\text{subject to}\quad x^1+x^2=2, \tag{B.1}
\]
where
\[
f^i(x^i)=\begin{cases}
0, & |x^i|\leq 2,\\[2pt]
\frac{1}{2\alpha}(|x^i|-2)^2, & 2<|x^i|\leq 2+\alpha,\\[2pt]
|x^i|-2-\frac{\alpha}{2}, & |x^i|>2+\alpha,
\end{cases}
\]
with $\alpha=0.01$. Here, the cost function is convex.
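The piecewise definition above can be encoded and sanity-checked directly; the following sketch (function name `f_i` is our own) verifies continuity at the breakpoints and convexity on a grid:

```python
import numpy as np

ALPHA = 0.01

def f_i(x, alpha=ALPHA):
    # Piecewise cost f^i from (B.1): zero inside [-2, 2], quadratic smoothing
    # band of width alpha, then linear growth.
    a = abs(x)
    if a <= 2:
        return 0.0
    if a <= 2 + alpha:
        return (a - 2) ** 2 / (2 * alpha)
    return a - 2 - alpha / 2

# Continuity at the breakpoints |x| = 2 and |x| = 2 + alpha:
assert f_i(2.0) == 0.0
assert abs(f_i(2 + ALPHA) - ALPHA / 2) < 1e-12
# Midpoint convexity test on a grid over [-4, 4]:
xs = np.linspace(-4, 4, 801)
for x, y in zip(xs[:-2], xs[2:]):
    assert f_i((x + y) / 2) <= 0.5 * (f_i(x) + f_i(y)) + 1e-12
print("f_i is continuous and (numerically) convex")
```

Since the quadratic band matches the linear branch in value and slope at $|x^i|=2+\alpha$, $f^i$ is continuously differentiable; it is convex but flat on $[-2,2]$, hence not strictly convex.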

Fig. 6. Trajectories $x^i(t)$, $i\in\{1,2\}$, of algorithm (6) when it is used to solve optimization problem (B.1) with $\rho=0$ (top) and $\rho=1$ (bottom).

Note that the optimization problem (B.1) has an infinite number of minimizers, all corresponding to the minimum cost $f^\star=0$. One of these minimizers is $(x^{1\star},x^{2\star})=(0,2)$. Figure 6 shows the $x^i$ trajectories of the central solver (6) over time. As shown, the algorithm does not converge when $\rho=0$, while convergence is achieved when we use the augmented Lagrangian with $\rho=1$.
