
Graphs, Principal Minors, and Eigenvalue Problems

by

John C. Urschel

Submitted to the Department of Mathematics in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Mathematics

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2021

© John C. Urschel, MMXXI. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author ..............................................................
Department of Mathematics

August 6, 2021

Certified by .........................................................
Michel X. Goemans

RSA Professor of Mathematics and Department Head
Thesis Supervisor

Accepted by .........................................................
Jonathan Kelner

Chairman, Department Committee on Graduate Theses


Graphs, Principal Minors, and Eigenvalue Problems

by

John C. Urschel

Submitted to the Department of Mathematics on August 6, 2021, in partial fulfillment of the

requirements for the degree of Doctor of Philosophy in Mathematics

Abstract

This thesis considers four independent topics within linear algebra: determinantal point processes, extremal problems in spectral graph theory, force-directed layouts, and eigenvalue algorithms. For determinantal point processes (DPPs), we consider the classes of symmetric and signed DPPs, respectively, and in both cases connect the problem of learning the parameters of a DPP to a related matrix recovery problem. Next, we consider two conjectures in spectral graph theory regarding the spread of a graph, and resolve both. For force-directed layouts of graphs, we connect the layout of the boundary of a Tutte spring embedding to trace theorems from the theory of elliptic PDEs, and we provide a rigorous theoretical analysis of the popular Kamada-Kawai objective, proving hardness of approximation and structural results regarding optimal layouts, and providing a polynomial time randomized approximation scheme for low diameter graphs. Finally, we consider the Lanczos method for computing extremal eigenvalues of a symmetric matrix and produce new error estimates for this algorithm.

Thesis Supervisor: Michel X. Goemans
Title: RSA Professor of Mathematics and Department Head


Acknowledgments

I would like to thank my advisor, Michel Goemans. From the moment I met him, I was impressed and inspired by the way he thinks about mathematics. Despite his busy schedule as head of the math department, he always made time when I needed him. He has been the most supportive advisor a student could possibly ask for. He's the type of mathematician I aspire to be. I would also like to thank the other members of my thesis committee, Alan Edelman and Philippe Rigollet. They have been generous with their time and their mathematical knowledge. In my later years as a PhD candidate, Alan has served as a secondary advisor of sorts, and I'm thankful for our weekly Saturday meetings.

I've also been lucky to work with a number of great collaborators during my time at MIT. This thesis would not have been possible without my mathematical collaborations and conversations with Jane Breen, Victor-Emmanuel Brunel, Erik Demaine, Adam Hesterberg, Fred Koehler, Jayson Lynch, Ankur Moitra, Alex Riasanovsky, Michael Tait, and Ludmil Zikatanov. And, though the associated work does not appear in this thesis, I am also grateful for collaborations with Xiaozhe Hu, Dhruv Rohatgi, and Jake Wellens during my PhD.

I've been the beneficiary of a good deal of advice, mentorship, and support from the larger mathematics community. This group is too long to list, but I'm especially grateful to Rediet Abebe, Michael Brenner, David Mordecai, Jelani Nelson, Peter Sarnak, Steven Strogatz, and Alex Townsend, among many others, for always being willing to give me some much needed advice during my time at MIT.

I would like to thank my office mates, Fred Koehler and Jake Wellens (and Megan Fu), for making my time in office 2-342 so enjoyable. With Jake's wall mural and very large and dusty rug, our office felt lived in and was a comfortable place to do math. I've had countless interesting conversations with fellow grad students, including Michael Cohen, Younhun Kim, Mehtaab Sawhney, and Jonathan Tidor, to name a few. My interactions with MIT undergrads constantly reminded me just how fulfilling and rewarding mentorship can be. I'm thankful to all the support staff members at MIT math, and have particularly fond memories of grabbing drinks with André at the Muddy, constantly bothering Michele with whatever random MIT problem I could not solve, and making frequent visits to Rosalee for more espresso pods. More generally, I have to thank the entire math department for fostering such a warm and welcoming environment to do math. MIT math really does feel like home.

I'd be remiss if I did not thank my friends and family for their non-academic contributions. I have to thank Ty Howle for supporting me every step of the way, and for trying to claim responsibility for nearly all of my mathematical work. This is a hard claim to refute, as it was often only once I took a step back and looked at things from a different perspective that breakthroughs were made. I'd like to thank Robert Hess for constantly encouraging and believing in me, despite knowing absolutely no math. I'd like to thank my father and mother for instilling a love of math in me from a young age. My fondest childhood memories of math were of going through puzzle books at the kitchen table with my mother, and leafing through the math section in the local Chapters in St. Catharines while my father worked in the cafe. I'd like to thank Louisa for putting up with me, moving with me (eight times during this PhD), and proofreading nearly all of my academic papers. I would thank Joanna, but, frankly, she was of little to no help.

This research was supported in part by ONR Research Contract N00014-17-1-2177. During my time at MIT, I was previously supported by a Dean of Science Fellowship and am currently supported by a Mathworks Fellowship, both of which have allowed me to place a greater emphasis on research, and without which this thesis, in its current form, would not have been possible.


For John and Venita


Contents

1 Introduction 11

2 Determinantal Point Processes and Principal Minor Assignment Problems 17

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1.1 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.1.2 Magnitude-Symmetric Matrices . . . . . . . . . . . . . . . . . . . 20

2.2 Learning Symmetric Determinantal Point Processes . . . . . . . . . . . . . 22

2.2.1 An Associated Principal Minor Assignment Problem . . . . . . . . 22

2.2.2 Definition of the Estimator . . . . . . . . . . . . . . . . . . . . . . 25

2.2.3 Information Theoretic Lower Bounds . . . . . . . . . . . . . . . . 29

2.2.4 Algorithmic Aspects and Experiments . . . . . . . . . . . . . . . . 32

2.3 Recovering a Magnitude-Symmetric Matrix from its Principal Minors . . . 38

2.3.1 Principal Minors and Magnitude-Symmetric Matrices . . . . . . . . 40

2.3.2 Efficiently Recovering a Matrix from its Principal Minors . . . . . 60

2.3.3 An Algorithm for Principal Minors with Noise . . . . . . . . . . . 66

3 The Spread and Bipartite Spread Conjecture 75

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

3.2 The Bipartite Spread Conjecture . . . . . . . . . . . . . . . . . . . . . . . 78

3.3 A Sketch for the Spread Conjecture . . . . . . . . . . . . . . . . . . . . . 85

3.3.1 Properties of Spread-Extremal Graphs . . . . . . . . . . . . . . . . 88


4 Force-Directed Layouts 95

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

4.2 Tutteโ€™s Spring Embedding and a Trace Theorem . . . . . . . . . . . . . . . 98

4.2.1 Spring Embeddings and a Schur Complement . . . . . . . . . . . . 101

4.2.2 A Discrete Trace Theorem and Spectral Equivalence . . . . . . . . 109

4.3 The Kamada-Kawai Objective and Optimal Layouts . . . . . . . . . . . . . 127

4.3.1 Structural Results for Optimal Embeddings . . . . . . . . . . . . . 131

4.3.2 Algorithmic Lower Bounds . . . . . . . . . . . . . . . . . . . . . 138

4.3.3 An Approximation Algorithm . . . . . . . . . . . . . . . . . . . . 147

5 Error Estimates for the Lanczos Method 151

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

5.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

5.1.2 Contributions and Remainder of Chapter . . . . . . . . . . . . . . 157

5.2 Preliminary Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

5.3 Asymptotic Lower Bounds and Improved Upper Bounds . . . . . . . . . . 161

5.4 Distribution Dependent Bounds . . . . . . . . . . . . . . . . . . . . . . . 175

5.5 Estimates for Arbitrary Eigenvalues and Condition Number . . . . . . . . . 181

5.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187


Chapter 1

Introduction

In this thesis, we consider a number of different mathematical topics in linear algebra, with a focus on determinantal point processes and eigenvalue problems. Below we provide a brief non-technical summary of each topic in this thesis, as well as a short summary of other interesting projects that I had the pleasure of working on during my PhD that do not appear in this thesis.

Determinantal Point Processes and Principal Minor Assignment Problems

A determinantal point process is a probability distribution over the subsets of a finite ground set where the distribution is characterized by the principal minors of some fixed matrix. Given a finite set, say, [N] = {1, . . . , N}, where N is a positive integer, a DPP is a random subset Y ⊆ [N] such that P(J ⊆ Y) = det(K_J) for all fixed J ⊆ [N], where K ∈ R^{N×N} is a given matrix called the kernel of the DPP and K_J = (K_{i,j})_{i,j∈J} is the principal submatrix of K associated with the set J. In Chapter 2, we focus on DPPs with kernels that have two different types of structure. First, we restrict ourselves to DPPs with symmetric kernels, and produce an algorithm that learns the kernel of a DPP from its samples. Then, we consider the slightly broader class of DPPs with kernels that have corresponding off-diagonal entries equal in magnitude, i.e., K satisfies |K_{i,j}| = |K_{j,i}| for all i, j ∈ [N], and produce an algorithm to recover such a matrix K from (possibly perturbed versions of) its principal minors. Sections 2.2 and 2.3 are written so that they may be read independently of each other. This chapter is joint work with Victor-Emmanuel Brunel, Michel Goemans, Ankur Moitra, and Philippe Rigollet, and based on [119, 15].

The Spread and Bipartite Spread Conjecture

Given a graph G = ([n], E), [n] = {1, . . . , n}, the adjacency matrix A ∈ R^{n×n} of G is defined as the matrix with A_{i,j} = 1 if {i, j} ∈ E and A_{i,j} = 0 otherwise. A is a real symmetric matrix, and so has real eigenvalues λ_1 ≥ . . . ≥ λ_n. One interesting quantity is the difference between extreme eigenvalues, λ_1 − λ_n, commonly referred to as the spread of the graph G. In [48], the authors proposed two conjectures. First, the authors conjectured that if a graph G of size |E| ≤ n²/4 maximizes the spread over all graphs of order n and size |E|, then the graph is bipartite. The authors also conjectured that if a graph G maximizes the spread over all graphs of order n, then the graph is the join (i.e., with all edges in between) K_{⌊2n/3⌋} ∨ E_{⌈n/3⌉} of a clique of order ⌊2n/3⌋ and an independent set of order ⌈n/3⌉. We refer to these conjectures as the bipartite spread and spread conjectures, respectively. In Chapter 3, we consider both of these conjectures. We provide an infinite class of counterexamples to the bipartite spread conjecture, and prove an asymptotic version of the bipartite spread conjecture that is as tight as possible. In addition, for the spread conjecture, we make a number of observations regarding any spread-optimal graph, and use these observations to sketch a proof of the spread conjecture for all n sufficiently large. The full proof of the spread conjecture for all n sufficiently large is rather long and technical, and can be found in [12]. This chapter is joint work with Jane Breen, Alex Riasanovsky, and Michael Tait, and based on [12].
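To make the spread concrete, it can be computed directly from the adjacency spectrum. The sketch below is an illustration only (the helper functions are not from the thesis or [12]): it builds the conjectured extremal join for n = 9 and compares its spread to that of the complete graph.

```python
import numpy as np

def spread(A):
    """Spread of a graph: lambda_1 - lambda_n of its adjacency matrix."""
    eigs = np.linalg.eigvalsh(A)
    return eigs[-1] - eigs[0]

def join_clique_independent(n):
    """Adjacency matrix of K_{floor(2n/3)} joined with an independent set
    of size ceil(n/3): a clique on the first floor(2n/3) vertices, no edges
    among the rest, and all edges in between."""
    k = (2 * n) // 3
    A = np.ones((n, n)) - np.eye(n)   # start from the complete graph K_n
    A[k:, k:] = 0.0                   # delete edges inside the independent set
    return A

n = 9
A = join_clique_independent(n)        # K_6 joined with 3 isolated vertices
K9 = np.ones((n, n)) - np.eye(n)      # complete graph K_9 for comparison
print(spread(A), spread(K9))          # the join has strictly larger spread
```

For this small case the join's spread has the closed form √((s−1)² + 4st) with clique size s = 6 and independent set size t = 3, i.e., √97 ≈ 9.85, already beating the spread 9 of K_9.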

Force-Directed Layouts

A force-directed layout, broadly speaking, is a technique for drawing a graph in a low-dimensional Euclidean space (usually dimension ≤ 3) by applying "forces" between the set of vertices and/or edges. In a force-directed layout, vertices connected by an edge (or at a small graph distance from each other) tend to be close to each other in the resulting layout. Two well-known examples of a force-directed layout are Tutte's spring embedding and metric multidimensional scaling. In his 1963 work titled "How to Draw a Graph," Tutte found an elegant technique to produce planar embeddings of planar graphs that also minimize the sum of squared edge lengths in some sense. In particular, for a three-connected planar graph, he showed that if the outer face of the graph is fixed as the complement of some convex region in the plane, and every other point is located at the mass center of its neighbors, then the resulting embedding is planar. This result is now known as Tutte's spring embedding theorem, and is considered by many to be the first example of a force-directed layout [117]. One of the major questions that this result does not treat is how to best embed the outer face. In Chapter 4, we investigate this question, consider connections to a Schur complement, and provide some theoretical results for this Schur complement using a discrete energy-only trace theorem. In addition, we also consider the later proposed Kamada-Kawai objective, which can provide a force-directed layout of an arbitrary graph. In particular, we prove a number of structural results regarding layouts that minimize this objective, provide algorithmic lower bounds for the optimization problem, and propose a polynomial time approximation scheme for drawing low diameter graphs. Sections 4.2 and 4.3 are written so that they may be read independently of each other. This chapter is joint work with Erik Demaine, Adam Hesterberg, Fred Koehler, Jayson Lynch, and Ludmil Zikatanov, and is based on [124, 31].
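The mass-center condition in Tutte's theorem is linear, so a spring embedding reduces to one Laplacian solve: with the boundary positions fixed, the interior positions satisfy L_II x_I = −L_IB x_B. The following is an illustrative sketch (not code from the thesis), computing the embedding of the 3-cube graph with its outer face fixed on a square.

```python
import numpy as np

# Tutte spring embedding sketch for the 3-cube graph: fix the outer face
# on a convex polygon, then place every interior vertex at the centroid
# of its neighbors by solving a Laplacian linear system.
edges = [(0, 1), (1, 2), (2, 3), (3, 0),      # outer face
         (4, 5), (5, 6), (6, 7), (7, 4),      # inner face
         (0, 4), (1, 5), (2, 6), (3, 7)]      # connecting edges
n = 8
L = np.zeros((n, n))                          # graph Laplacian
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1
    L[u, v] -= 1; L[v, u] -= 1

boundary = [0, 1, 2, 3]
interior = [4, 5, 6, 7]
# outer face fixed at the corners of a square (a convex position)
xb = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])

# centroid condition on interior vertices: L_II x_I = -L_IB x_B
L_II = L[np.ix_(interior, interior)]
L_IB = L[np.ix_(interior, boundary)]
xi = np.linalg.solve(L_II, -L_IB @ xb)
print(xi)   # the inner face is the same square, scaled toward the center
```

By symmetry, the interior vertices land at exactly one third of the corresponding outer positions, giving the familiar "square inside a square" planar drawing of the cube.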

Error Estimates for the Lanczos Method

The Lanczos method is one of the most powerful and fundamental techniques for solving an extremal symmetric eigenvalue problem. Convergence-based error estimates depend heavily on the eigenvalue gap, but in practice, this gap is often relatively small, resulting in significant overestimates of error. One way to avoid this issue is through the use of uniform error estimates, namely, bounds that depend only on the dimension of the matrix and the number of iterations. In Chapter 5, we prove upper uniform error estimates for the Lanczos method, and provide a number of lower bounds through the use of orthogonal polynomials. In addition, we prove more specific results for matrices that possess some level of eigenvalue regularity or whose eigenvalues converge to some limiting empirical spectral distribution. Through numerical experiments, we show that the theoretical estimates of this chapter do apply to practical computations for reasonably sized matrices. This chapter has no collaborators, and is based on [122].
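For readers unfamiliar with the algorithm itself, the following is a minimal sketch of the Lanczos method (my own illustration, with full reorthogonalization for numerical robustness, not the implementation studied in Chapter 5): it builds a tridiagonal matrix whose largest Ritz value estimates λ_max.

```python
import numpy as np

def lanczos_extremal(A, m, seed=0):
    """Run m Lanczos iterations on symmetric A from a random start vector
    and return the largest Ritz value, an estimate of lambda_max(A)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(m):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # full reorthogonalization against all previous Lanczos vectors
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j + 1 < m:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:          # invariant subspace found early
                T = (np.diag(alpha[: j + 1]) + np.diag(beta[: j], 1)
                     + np.diag(beta[: j], -1))
                return np.linalg.eigvalsh(T)[-1]
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)[-1]

# demonstration on a random 200 x 200 symmetric matrix
rng = np.random.default_rng(1)
B = rng.standard_normal((200, 200))
A = (B + B.T) / 2
estimate = lanczos_extremal(A, 30)
exact = np.linalg.eigvalsh(A)[-1]
print(estimate, exact)   # the Ritz value approaches lambda_max from below
```

By the Rayleigh-Ritz characterization, the returned Ritz value is (up to roundoff) a lower bound on λ_max, and the gap between the two is exactly the kind of error the estimates in Chapter 5 control.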

Topics not in this Thesis

During my PhD, I have had the chance to work on a number of interesting topics that do not appear in this thesis. Below I briefly describe some of these projects. In [52] (joint with X. Hu and L. Zikatanov), we consider graph disaggregation, a technique to break high degree nodes of a graph into multiple smaller degree nodes, and prove a number of results regarding the spectral approximation of a graph by a disaggregated version of it. In [17] (joint with V.E. Brunel, A. Moitra, and P. Rigollet), we analyze the curvature of the expected log-likelihood around its maximum and characterize when the maximum likelihood estimator converges at a parametric rate. In [120], we study centroidal Voronoi tessellations from a variational perspective and show that, for any density function, there does not exist a unique two-generator centroidal Voronoi tessellation for dimensions greater than one. In [121], we prove a generalization of Courant's theorem for discrete graphs, namely, that for the kth eigenvalue of a generalized Laplacian of a discrete graph, there exists a set of corresponding eigenvectors such that each eigenvector can be decomposed into at most k nodal domains, and that this set is of co-dimension zero with respect to the entire eigenspace. In [123] (joint with J. Wellens), we show that, given a graph with local crossing number either at most k or at least 2k, it is NP-complete to decide whether the local crossing number is at most k or at least 2k. In [97] (joint with D. Rohatgi and J. Wellens), we treat two conjectures – one regarding the maximum possible value of cc(G) + cc(Ḡ) (where cc(G) is the minimum number of cliques needed to cover the edges of G exactly once, and Ḡ denotes the complement of G), due to de Caen, Erdős, Pullman and Wormald, and the other regarding bc_k(K_n) (where bc_k(G) is the minimum number of bicliques needed to cover each edge of G exactly k times), due to de Caen, Gregory and Pritikin. We disprove the first, obtaining improved lower and upper bounds on max_G cc(G) + cc(Ḡ), and we prove an asymptotic version of the second, showing that bc_k(K_n) = (1 + o(1))n. If any of the topics (very briefly) described here interest you, I encourage you to take a look at the associated paper. This thesis was constructed to center around a topic and to be of a manageable length, and I am no less proud of some of the results described here that do not appear in this thesis.


Chapter 2

Determinantal Point Processes and Principal Minor Assignment Problems

2.1 Introduction

A determinantal point process (DPP) is a probability distribution over the subsets of a finite ground set where the distribution is characterized by the principal minors of some fixed matrix. DPPs emerge naturally in many probabilistic setups such as random matrices and integrable systems [10]. They have also attracted interest in machine learning in the past years for their ability to model random choices while being tractable and mathematically elegant [71]. Given a finite set, say, [N] = {1, . . . , N}, where N is a positive integer, a DPP is a random subset Y ⊆ [N] such that P(J ⊆ Y) = det(K_J) for all fixed J ⊆ [N], where K ∈ R^{N×N} is a given matrix called the kernel of the DPP and K_J = (K_{i,j})_{i,j∈J} is the principal submatrix of K associated with the set J. Assumptions on K that yield the existence of a DPP can be found in [14], and it is easily seen that uniqueness of the DPP is then automatically guaranteed. For instance, if I − K is invertible (I being the identity matrix), then K is the kernel of a DPP if and only if K(I − K)^{-1} is a P_0-matrix, i.e., all its principal minors are nonnegative. We refer to [54] for further definitions and properties of P_0-matrices. Note that in that case, the DPP is also an L-ensemble, with probability mass function given by P(Y = J) = det(L_J)/det(I + L) for all J ⊆ [N], where L = K(I − K)^{-1}.

DPPs with a symmetric kernel have attracted a lot of interest in machine learning because they satisfy a property called negative association, which models repulsive interactions between items [8]. Following the seminal work of Kulesza and Taskar [72], discrete symmetric DPPs have found numerous applications in machine learning, including in document and timeline summarization [76, 132], image search [70, 1] and segmentation [73], audio signal processing [131], bioinformatics [6] and neuroscience [106]. What makes such models appealing is that they exhibit repulsive behavior and lend themselves naturally to tasks where returning a diverse set of objects is important. For instance, when applied to recommender systems, DPPs enforce diversity in the items within one basket [45].

From a statistical point of view, DPPs raise two essential questions. First, what are the families of kernels that give rise to one and the same DPP? Second, given observations of independent copies of a DPP, how can we recover its kernel (which, as foreshadowed by the first question, is not necessarily unique)? These two questions are directly related to the principal minor assignment problem (PMA). Given a class of matrices: (1) describe the set of all matrices in this class that have a prescribed list of principal minors (this set may be empty); (2) find one such matrix. While the first task is theoretical in nature, the second one is algorithmic and should be solved using as few queries of the prescribed principal minors as possible. In this work, we focus on the question of recovering a kernel from observations, which is closely related to the problem of recovering a matrix with given principal minors. In this chapter, we focus on symmetric DPPs (which, when clear from context, we simply refer to as DPPs) and signed DPPs, a slightly larger class of kernels that only requires corresponding off-diagonal entries to be symmetric in magnitude.

2.1.1 Symmetric Matrices

In the symmetric case, there are fast algorithms for sampling (or approximately sampling) from a DPP [32, 94, 74, 75]. Marginalizing the distribution on a subset I ⊆ [N] and conditioning on the event that J ⊆ Y both result in new DPPs, and closed form expressions for their kernels are known [11]. All of this work pertains to how to use a DPP once we have learned its parameters. However, there has been much less work on the problem of learning the parameters of a symmetric DPP. A variety of heuristics have been proposed, including Expectation-Maximization [46], MCMC [1], and fixed point algorithms [80]. All of these attempt to solve a non-convex optimization problem, and no guarantees on their statistical performance are known. Recently, Brunel et al. [16] studied the rate of estimation achieved by the maximum likelihood estimator, but the question of efficient computation remains open. Apart from positive results on sampling, marginalization and conditioning, most provable results about DPPs are actually negative. It is conjectured that the maximum likelihood estimator is NP-hard to compute [69]. In fact, approximating the mode of size k of a DPP to within a c^k factor is known to be NP-hard for some c > 1 [22, 108]. The best known algorithms currently obtain an e^{k+o(k)} approximation factor [88, 89].

In Section 2.2, we bypass the difficulties associated with maximum likelihood estimation by using the method of moments to achieve optimal sample complexity. In the setting of DPPs, the method of moments has a close connection to a principal minor assignment problem, which asks, given some set of principal minors of a matrix, to recover the original matrix, up to some equivalence class. We introduce a parameter ℓ, called the cycle sparsity of the graph induced by the kernel K, which governs the number of moments that need to be considered and, thus, the sample complexity. The cycle sparsity of a graph is the smallest integer ℓ so that the cycles of length at most ℓ yield a basis for the cycle space of the graph. We use a refined version of Horton's algorithm [51, 2] to implement the method of moments in polynomial time. Even though there are in general exponentially many cycles in a graph to consider, Horton's algorithm constructs a minimum weight cycle basis and, in doing so, also reveals the parameter ℓ together with a collection of at most ℓ induced cycles spanning the cycle space. We use such cycles in order to construct our method of moments estimator. For any fixed ℓ ≥ 2 and kernel K satisfying either |K_{i,j}| ≥ α or K_{i,j} = 0 for all i, j ∈ [N], our algorithm has sample complexity

n = O( ((C/α)^{2ℓ} + log N) / (α²ε²) )

for some constant C > 1, runs in time polynomial in n and N, and learns the parameters up to an additive ε with high probability. The (C/α)^{2ℓ} term corresponds to the number of samples needed to recover the signs of the entries in K. We complement this result with a minimax lower bound (Theorem 2) to show that this sample complexity is in fact near optimal. In particular, we show that there is an infinite family of graphs with cycle sparsity ℓ (namely, length-ℓ cycles) on which any algorithm requires at least (C′α)^{−2ℓ} samples to recover the signs of the entries of K, for some constant C′ > 1. We also provide experimental results that confirm many quantitative aspects of our theoretical predictions. Together, our upper bounds, lower bounds, and experiments present a nuanced understanding of which symmetric DPPs can be learned provably and efficiently.

2.1.2 Magnitude-Symmetric Matrices

Recently, DPPs with non-symmetric kernels have gained interest in the machine learning community [14, 44, 3, 93]. However, for these classes, questions of recovery are significantly harder in general. In [14], Brunel proposes DPPs with kernels that are symmetric in magnitude, i.e., |K_{i,j}| = |K_{j,i}| for all i, j = 1, . . . , N. This class is referred to as signed DPPs. A signed DPP is a generalization of DPPs that allows for both attractive and repulsive behavior. Such matrices are relevant in machine learning applications because they increase the modeling power of DPPs. An essential assumption made in [14] is that K is dense, which simplifies the combinatorial analysis. We consider magnitude-symmetric matrices, but without any density assumption. The problem becomes significantly harder because it requires a fine analysis of the combinatorial properties of a graphical representation of the sparsity of the matrix and the products of corresponding off-diagonal entries. In Section 2.3, we treat both theoretical and algorithmic questions around recovering a magnitude-symmetric matrix from its principal minors. First, in Subsection 2.3.1, we show that, for a given magnitude-symmetric matrix K, the principal minors of order at most ℓ, for some graph invariant ℓ depending only on principal minors of order one and two, uniquely determine principal minors of all orders. Next, in Subsection 2.3.2, we describe an efficient algorithm that, given the principal minors of a magnitude-symmetric matrix, computes a matrix with those principal minors. This algorithm queries only O(n²) principal minors, all of a bounded order that depends solely on the sparsity of the matrix. Finally, in Subsection 2.3.3, we consider the question of recovery when principal minors are only known approximately, and construct an algorithm that, for magnitude-symmetric matrices that are sufficiently generic in a sense, recovers the matrix almost exactly. This algorithm immediately implies a procedure for learning a signed DPP from its samples, as basic probabilistic techniques (e.g., a union bound) can be used to show that, with high probability, all estimators are sufficiently close to the principal minors they are approximating.

2.2 Learning Symmetric Determinantal Point Processes

Let ๐‘Œ1, . . . , ๐‘Œ๐‘› be ๐‘› independent copies of ๐‘Œ โˆผ DPP(๐พ), for some unknown kernel ๐พ such

that 0 โชฏ ๐พ โชฏ ๐ผ๐‘ . It is well known that ๐พ is identified by DPP(๐พ) only up to flips of the

signs of its rows and columns: If ๐พ โ€ฒ is another symmetric matrix with 0 โชฏ ๐พ โ€ฒ โชฏ ๐ผ๐‘ , then

DPP(๐พ โ€ฒ)=DPP(๐พ) if and only if ๐พ โ€ฒ = ๐ท๐พ๐ท for some ๐ท โˆˆ ๐’Ÿ๐‘ , where ๐’Ÿ๐‘ denotes the

class of all ๐‘ ร—๐‘ diagonal matrices with only 1 and โˆ’1 on their diagonals [69, Theorem

4.1]. We call such a transform a ๐’Ÿ๐‘ -similarity of ๐พ.

In view of this equivalence class, we define the following pseudo-distance between

kernels ๐พ and ๐พ โ€ฒ:

๐œŒ(๐พ,๐พ โ€ฒ) = inf๐ทโˆˆ๐’Ÿ๐‘

|๐ท๐พ๐ท โˆ’๐พ โ€ฒ|โˆž ,

where for any matrix ๐พ, |๐พ|โˆž = max๐‘–,๐‘—โˆˆ[๐‘ ] |๐พ๐‘–,๐‘—| denotes the entrywise sup-norm.

For any ๐‘† โŠ‚ [๐‘ ], we write โˆ†๐‘† = det(๐พ๐‘†), where ๐พ๐‘† denotes the |๐‘†| ร— |๐‘†| sub-matrix

of ๐พ obtained by keeping rows and columns with indices in ๐‘†. Note that for 1 โ‰ค ๐‘– = ๐‘— โ‰ค ๐‘ ,

we have the following relations:

๐พ๐‘–,๐‘– = P[๐‘– โˆˆ ๐‘Œ ], โˆ†๐‘–,๐‘— = P[๐‘–, ๐‘— โŠ† ๐‘Œ ],

and |๐พ๐‘–,๐‘—| =โˆš

๐พ๐‘–,๐‘–๐พ๐‘—,๐‘— โˆ’โˆ†๐‘–,๐‘—. Therefore, the principal minors of size one and two of

๐พ determine ๐พ up to the sign of its off-diagonal entries. What remains is to compute the

signs of the off-diagonal entries. In fact, for any ๐พ, there exists an โ„“ depending only on the

graph ๐บ๐พ induced by ๐พ, such that ๐พ can be recovered up to a ๐’Ÿ๐‘ -similarity with only the

knowledge of its principal minors of size at most โ„“. We will show that this โ„“ is exactly the

cycle sparsity.
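The first step of this program is purely mechanical. The sketch below (my own illustration, with a randomly generated kernel standing in for the unknown K) recovers the diagonal and all off-diagonal magnitudes of K from principal minors of sizes one and two only.

```python
import numpy as np

# Random symmetric kernel with 0 <= K <= I, playing the role of the unknown.
rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
G = B @ B.T
K = 0.9 * G / np.linalg.eigvalsh(G)[-1]
N = K.shape[0]

# Size-one minors give the diagonal; size-two minors give |K_ij| via
# |K_ij| = sqrt(K_ii * K_jj - Delta_ij), since Delta_ij = K_ii K_jj - K_ij^2.
diag = K.diagonal().copy()
abs_K = np.diag(diag)
for i in range(N):
    for j in range(i + 1, N):
        delta_ij = np.linalg.det(K[np.ix_([i, j], [i, j])])  # 2x2 minor
        abs_K[i, j] = abs_K[j, i] = np.sqrt(
            max(diag[i] * diag[j] - delta_ij, 0.0))          # clip roundoff

assert np.allclose(abs_K, np.abs(K))   # K recovered up to off-diagonal signs
print("entrywise magnitudes recovered from minors of sizes one and two")
```

What remains, exactly as the text says, is the sign pattern of the off-diagonal entries, and that is where larger principal minors and the cycle sparsity enter.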

2.2.1 An Associated Principal Minor Assignment Problem

In this subsection, we consider the following related principal minor assignment problem: given a symmetric matrix K ∈ R^{N×N}, what is the minimal ℓ such that K can be recovered (up to 𝒟_N-similarity) using principal minors Δ_S of size at most ℓ?


This problem has a clear relation to learning DPPs, as, in our setting, we can approximate

the principal minors of ๐พ by empirical averages. However the accuracy of our estimator

deteriorates with the size of the principal minor, and we must therefore estimate the smallest

possible principal minors in order to achieve optimal sample complexity. Answering the

above question tells us how small the principal minors we consider can be. The relationship

between the principal minors of ๐พ and recovery of DPP(๐พ) has also been considered

elsewhere. There has been work regarding the symmetric principal minor assignment

problem, namely the problem of computing a matrix given an oracle that gives any principal

minor in constant time [96]. Here, we prove that the smallest โ„“ such that all the principal

minors of ๐พ are uniquely determined by those of size at most โ„“ is exactly the cycle sparsity

of the graph induced by ๐พ.

We begin by recalling some standard graph-theoretic notions. Let 𝐺 = ([𝑁], 𝐸), |𝐸| = 𝑚. A cycle 𝐶 of 𝐺 is any connected subgraph in which each vertex has even degree. Each cycle 𝐶 is associated with an incidence vector $x \in GF(2)^m$ such that 𝑥𝑒 = 1 if 𝑒 is an edge of 𝐶 and 𝑥𝑒 = 0 otherwise. The cycle space 𝒞 of 𝐺 is the subspace of $GF(2)^m$ spanned by the incidence vectors of the cycles of 𝐺. The dimension 𝜈𝐺 of the cycle space is called the cyclomatic number, and it is well known that 𝜈𝐺 = 𝑚 − 𝑁 + 𝜅(𝐺), where 𝜅(𝐺) denotes the number of connected components of 𝐺. Recall that a simple cycle is a graph in which every vertex has degree two or zero and the vertices of degree two form a connected set. A cycle basis is a basis of $\mathcal{C} \subseteq GF(2)^m$ such that every element is the incidence vector of a simple cycle. It is well known that every cycle space has a cycle basis of induced cycles.

Definition 1. The cycle sparsity of a graph ๐บ is the minimal โ„“ for which ๐บ admits a cycle

basis of induced cycles of length at most โ„“, with the convention that โ„“ = 2 whenever the

cycle space is empty. A corresponding cycle basis is called a shortest maximal cycle basis.
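Definition 1 can be checked by brute force on small graphs: enumerate the induced cycles (vertex subsets whose induced subgraph is connected and 2-regular), and find the smallest ℓ whose cycles already span the cycle space over GF(2). A sketch (all function names are ours; only practical for small graphs):

```python
from itertools import combinations

def induced_cycles(n, edges):
    """Yield (length, GF(2) incidence vector) for each induced cycle: a vertex
    subset whose induced subgraph is connected and 2-regular."""
    idx = {e: k for k, e in enumerate(edges)}
    for r in range(3, n + 1):
        for S in combinations(range(n), r):
            inside = [e for e in edges if e[0] in S and e[1] in S]
            if len(inside) != r:              # a cycle on r vertices has r edges
                continue
            deg = {v: 0 for v in S}
            adj = {v: [] for v in S}
            for u, v in inside:
                deg[u] += 1
                deg[v] += 1
                adj[u].append(v)
                adj[v].append(u)
            if any(d != 2 for d in deg.values()):
                continue
            seen, stack = {S[0]}, [S[0]]
            while stack:                       # connectivity check
                for w in adj[stack.pop()]:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            if len(seen) == r:
                yield r, sum(1 << idx[e] for e in inside)

def gf2_rank(vectors):
    basis = {}                                 # leading bit -> basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]
            else:
                basis[lead] = v
                break
    return len(basis)

def cycle_sparsity(n, edges):
    edges = [tuple(sorted(e)) for e in edges]
    cycles = list(induced_cycles(n, edges))
    nu = gf2_rank([v for _, v in cycles])      # induced cycles span the cycle space
    if nu == 0:
        return 2                               # convention for an empty cycle space
    return min(l for l in range(3, n + 1)
               if gf2_rank([v for r, v in cycles if r <= l]) == nu)

# Two triangles sharing an edge have cycle sparsity 3; a 5-cycle has sparsity 5.
print(cycle_sparsity(4, [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]),
      cycle_sparsity(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))  # 3 5
```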

Shortest maximal cycle bases were also studied, for other reasons, in [21]. We defer a discussion of computing such a basis to a later subsection. For any subset 𝑆 ⊆ [𝑁], denote by 𝐺𝐾(𝑆) = (𝑆, 𝐸(𝑆)) the subgraph of 𝐺𝐾 induced by 𝑆. A matching of 𝐺𝐾(𝑆) is a subset 𝑀 ⊆ 𝐸(𝑆) such that no two distinct edges of 𝑀 are adjacent in 𝐺𝐾(𝑆). The set of vertices incident to some edge of 𝑀 is denoted by 𝑉(𝑀). We denote by


โ„ณ(๐‘†) the collection of all matchings of ๐บ๐พ(๐‘†). Then, if ๐บ๐พ(๐‘†) is an induced cycle, we

can write the principal minor โˆ†๐‘† = det(๐พ๐‘†) as follows:

โˆ†๐‘† =โˆ‘

๐‘€โˆˆโ„ณ(๐‘†)

(โˆ’1)|๐‘€ |โˆ

๐‘–,๐‘—โˆˆ๐‘€

๐พ2๐‘–,๐‘—

โˆ๐‘– โˆˆ๐‘‰ (๐‘€)

๐พ๐‘–,๐‘– + 2ร— (โˆ’1)|๐‘†|+1โˆ

๐‘–,๐‘—โˆˆ๐ธ(๐‘†)

๐พ๐‘–,๐‘—. (2.1)

Proposition 1. Let ๐พ โˆˆ R๐‘ร—๐‘ be a symmetric matrix, ๐บ๐พ be the graph induced by ๐พ, and

โ„“ โ‰ฅ 3 be some integer. The kernel ๐พ is completely determined up to ๐’Ÿ๐‘ -similarity by its

principal minors of size at most โ„“ if and only if the cycle sparsity of ๐บ๐พ is at most โ„“.

Proof. Note first that all the principal minors of ๐พ completely determine ๐พ up to a ๐’Ÿ๐‘ -

similarity [96, Theorem 3.14]. Moreover, recall that principal minors of size at most 2

determine the diagonal entries of ๐พ as well as the magnitude of its off-diagonal entries.

In particular, given these principal minors, one only needs to recover the signs of the off-

diagonal entries of ๐พ. Let the sign of cycle ๐ถ in ๐พ be the product of the signs of the entries

of ๐พ corresponding to the edges of ๐ถ.

Suppose ๐บ๐พ has cycle sparsity โ„“ and let (๐ถ1, . . . , ๐ถ๐œˆ) be a cycle basis of ๐บ๐พ where each

๐ถ๐‘–, ๐‘– โˆˆ [๐œˆ] is an induced cycle of length at most โ„“. By (2.1), the sign of any ๐ถ๐‘–, ๐‘– โˆˆ [๐œˆ] is

completely determined by the principal minor โˆ†๐‘† , where ๐‘† is the set of vertices of ๐ถ๐‘– and is

such that |๐‘†| โ‰ค โ„“. Moreover, for ๐‘– โˆˆ [๐œˆ], let ๐‘ฅ๐‘– โˆˆ ๐บ๐น (2)๐‘š denote the incidence vector of ๐ถ๐‘–.

By definition, the incidence vector ๐‘ฅ of any cycle ๐ถ is given byโˆ‘

๐‘–โˆˆโ„ ๐‘ฅ๐‘– for some subset

โ„ โŠ‚ [๐œˆ]. The sign of ๐ถ is then given by the product of the signs of ๐ถ๐‘–, ๐‘– โˆˆ โ„ and thus by

corresponding principal minors. In particular, the signs of all cycles are determined by the

principal minors โˆ†๐‘† with |๐‘†| โ‰ค โ„“. In turn, by Theorem 3.12 in [96], the signs of all cycles

completely determine ๐พ, up to a ๐’Ÿ๐‘ -similarity.

Next, suppose the cycle sparsity of 𝐺𝐾 is at least ℓ + 1, and let 𝒞ℓ be the subspace of $GF(2)^m$ spanned by the induced cycles of length at most ℓ in 𝐺𝐾. Let 𝑥1, . . . , 𝑥𝜈 be a basis of 𝒞ℓ made of the incidence column vectors of induced cycles of length at most ℓ in 𝐺𝐾, and form the matrix $A \in GF(2)^{m \times \nu}$ by concatenating the 𝑥𝑖's. Since 𝒞ℓ does not span the cycle space of 𝐺𝐾, we have $\nu < \nu_{G_K} \le m$. Hence, the rank of 𝐴 is less than 𝑚, so the null space of $A^\top$ is nontrivial. Let $\bar{x}$ be the incidence column vector of an induced cycle $\bar{C}$ that is not in 𝒞ℓ, and let $h \in GF(2)^m$ satisfy $A^\top h = 0$, $h \neq 0$, and $\bar{x}^\top h = 1$. These three conditions are compatible because $\bar{C} \notin \mathcal{C}_\ell$. We are now in a position to define an alternate kernel 𝐾′ as follows: Let $K'_{i,i} = K_{i,i}$ and $|K'_{i,j}| = |K_{i,j}|$ for all $i, j \in [N]$. We define the signs of the off-diagonal entries of 𝐾′ as follows: for every edge $e = \{i,j\}$, $i \neq j$, set $\operatorname{sgn}(K'_e) = \operatorname{sgn}(K_e)$ if $h_e = 0$ and $\operatorname{sgn}(K'_e) = -\operatorname{sgn}(K_e)$ otherwise. We now check that 𝐾 and 𝐾′ have the same principal minors of size at most ℓ but differ on a principal minor of size larger than ℓ. To that end, let 𝑥 be the incidence vector of a cycle $C \in \mathcal{C}_\ell$, so that $x = Aw$ for some $w \in GF(2)^{\nu}$. Thus the sign of 𝐶 in 𝐾 is given by

$$\prod_{e :\, x_e = 1} K_e = (-1)^{x^\top h} \prod_{e :\, x_e = 1} K'_e = (-1)^{w^\top A^\top h} \prod_{e :\, x_e = 1} K'_e = \prod_{e :\, x_e = 1} K'_e,$$

because ๐ดโŠคโ„Ž = 0. Therefore, the sign of any ๐ถ โˆˆ ๐’žโ„“ is the same in ๐พ and ๐พ โ€ฒ. Now, let

๐‘† โŠ† [๐‘ ] with |๐‘†| โ‰ค โ„“, and let ๐บ = ๐บ๐พ๐‘†= ๐บ๐พโ€ฒ

๐‘†be the graph corresponding to ๐พ๐‘† (or,

equivalently, to ๐พ โ€ฒ๐‘†). For any induced cycle ๐ถ in ๐บ, ๐ถ is also an induced cycle in ๐บ๐พ and

its length is at most โ„“. Hence, ๐ถ โˆˆ ๐’žโ„“ and the sign of ๐ถ is the same in ๐พ and ๐พ โ€ฒ. By [96,

Theorem 3.12], det(๐พ๐‘†) = det(๐พ โ€ฒ๐‘†). Next observe that the sign of ๐ถ in ๐พ is given by

โˆ๐‘’ : ๐‘’=1

๐พ๐‘’ = (โˆ’1)โŠคโ„Ž

โˆ๐‘’ : ๐‘’=1

๐พ โ€ฒ๐‘’ = โˆ’

โˆ๐‘’ :๐‘ฅ๐‘’=1

๐พ โ€ฒ๐‘’.

Note also that since ๐ถ is an induced cycle of ๐บ๐พ = ๐บ๐พโ€ฒ , the above quantity is nonzero. Let

๐‘† be the set of vertices in ๐ถ. By (2.1) and the above display, we have det(๐พ๐‘†) = det(๐พ โ€ฒ๐‘†).

Together with [96, Theorem 3.14], it yields ๐พ = ๐ท๐พ โ€ฒ๐ท for all ๐ท โˆˆ ๐’Ÿ๐‘ .

2.2.2 Definition of the Estimator

Our procedure is based on the previous result and can be summarized as follows. We first

estimate the diagonal entries (i.e., the principal minors of size one) of ๐พ by the method of

moments. By the same method, we estimate the principal minors of size two of ๐พ, and we

deduce estimates of the magnitude of the off-diagonal entries. To use these estimates to


deduce an estimate of ๐บ๐พ , we make the following assumption on the kernel ๐พ.

Assumption 1. Fix ๐›ผ โˆˆ (0, 1). For all 1 โ‰ค ๐‘– < ๐‘— โ‰ค ๐‘ , either ๐พ๐‘–,๐‘— = 0, or |๐พ๐‘–,๐‘—| โ‰ฅ ๐›ผ.

Finally, we find a shortest maximal cycle basis of our graph estimate $\hat{G}$ (defined below), and we set the signs of our non-zero off-diagonal entry estimates by using estimators of the principal minors induced by the elements of the basis, again obtained by the method of moments.

For ๐‘† โŠ† [๐‘ ], set โˆ†๐‘† =1

๐‘›

๐‘›โˆ‘๐‘=1

1๐‘†โŠ†๐‘Œ๐‘ , and define

๐‘–,๐‘– = โˆ†๐‘– and ๐‘–,๐‘— = ๐‘–,๐‘–๐‘—,๐‘— โˆ’ โˆ†๐‘–,๐‘—,

where ๐‘–,๐‘– and ๐‘–,๐‘— are our estimators of ๐พ๐‘–,๐‘– and ๐พ2๐‘–,๐‘— , respectively. Define = ([๐‘ ], ),

where, for ๐‘– = ๐‘—, ๐‘–, ๐‘— โˆˆ if and only if ๐‘–,๐‘— โ‰ฅ 12๐›ผ2. The graph is our estimator of ๐บ๐พ .

Let ๐ถ1, ..., ๐ถ๐œˆ be a shortest maximal cycle basis of the cycle space of . Let ๐‘†๐‘– โŠ† [๐‘ ]

be the subset of vertices of ๐ถ๐‘–, for 1 โ‰ค ๐‘– โ‰ค ๐œˆ. We define

๐‘– = โˆ†๐‘†๐‘–โˆ’

โˆ‘๐‘€โˆˆโ„ณ(๐‘†๐‘–)

(โˆ’1)|๐‘€ |โˆ

๐‘–,๐‘—โˆˆ๐‘€

๐‘–,๐‘—

โˆ๐‘– โˆˆ๐‘‰ (๐‘€)

๐‘–,๐‘–,

for 1 โ‰ค ๐‘– โ‰ค ๐œˆ. In light of (2.1), for large enough ๐‘›, this quantity should be close to

๐ป๐‘– = 2ร— (โˆ’1)|๐‘†๐‘–|+1โˆ

๐‘–,๐‘—โˆˆ๐ธ(๐‘†๐‘–)

๐พ๐‘–,๐‘— .

We note that this definition is only symbolic in nature, and computing $\hat{H}_i$ in this fashion is extremely inefficient. Instead, to compute it in practice, we will use the determinant of an auxiliary matrix, computed via a matrix factorization. Namely, let us define the matrix $\tilde{K} \in \mathbb{R}^{N \times N}$ such that $\tilde{K}_{i,i} = \hat{K}_{i,i}$ for $1 \le i \le N$, and $\tilde{K}_{i,j} = \hat{B}_{i,j}^{1/2}$ for $i \neq j$. We have

$$\det \tilde{K}_{S_i} = \sum_{M \in \mathcal{M}(S_i)} (-1)^{|M|} \prod_{\{i,j\} \in M} \hat{B}_{i,j} \prod_{i \notin V(M)} \hat{K}_{i,i} + 2 \times (-1)^{|S_i|+1} \prod_{\{i,j\} \in \hat{E}(S_i)} \hat{B}_{i,j}^{1/2},$$


so that we may equivalently write

๐‘– = โˆ†๐‘†๐‘–โˆ’ det( ๐พ๐‘†๐‘–

) + 2ร— (โˆ’1)|๐‘†๐‘–|+1โˆ

๐‘–,๐‘—โˆˆ(๐‘†๐‘–)

1/2๐‘–,๐‘— .

Finally, let $\hat{m} = |\hat{E}|$. Set the matrix $A \in GF(2)^{\nu \times \hat{m}}$ with $i$-th row representing $C_i$ in $GF(2)^{\hat{m}}$, $1 \le i \le \nu$; set $b = (b_1, \ldots, b_\nu) \in GF(2)^{\nu}$ with $b_i = \frac{1}{2}[\operatorname{sgn}(\hat{H}_i) + 1]$, $1 \le i \le \nu$; and let $x \in GF(2)^{\hat{m}}$ be a solution to the linear system $Ax = b$ if a solution exists, and $x = \mathbf{1}_{\hat{m}}$ otherwise. We define $\hat{K}_{i,j} = 0$ if $\{i,j\} \notin \hat{E}$, and $\hat{K}_{i,j} = \hat{K}_{j,i} = (2x_{i,j} - 1)\hat{B}_{i,j}^{1/2}$ for all $\{i,j\} \in \hat{E}$.

Next, we prove the following lemma which relates the quality of estimation of ๐พ in

terms of ๐œŒ to the quality of estimation of the principal minors โˆ†๐‘† .

Lemma 1. Let ๐พ satisfy Assumption 1, and let โ„“ be the cycle sparsity of ๐บ๐พ . Let ๐œ€ > 0. If

|โˆ†๐‘† โˆ’โˆ†๐‘†| โ‰ค ๐œ€ for all ๐‘† โŠ† [๐‘ ] with |๐‘†| โ‰ค 2 and if |โˆ†๐‘† โˆ’โˆ†๐‘†| โ‰ค (๐›ผ/4)|๐‘†| for all ๐‘† โŠ† [๐‘ ]

with 3 โ‰ค |๐‘†| โ‰ค โ„“, then

๐œŒ(,๐พ) < 4๐œ€/๐›ผ .

Proof. We first bound $|\hat{B}_{i,j} - K_{i,j}^2|$; namely,
$$\hat{B}_{i,j} \le (K_{i,i} + \alpha^2/16)(K_{j,j} + \alpha^2/16) - (\Delta_{i,j} - \alpha^2/16) \le K_{i,j}^2 + \alpha^2/4$$
and
$$\hat{B}_{i,j} \ge (K_{i,i} - \alpha^2/16)(K_{j,j} - \alpha^2/16) - (\Delta_{i,j} + \alpha^2/16) \ge K_{i,j}^2 - 3\alpha^2/16,$$
giving $|\hat{B}_{i,j} - K_{i,j}^2| < \alpha^2/4$. Thus, we can correctly determine whether $K_{i,j} = 0$ or $|K_{i,j}| \ge \alpha$, yielding $\hat{G} = G_K$. In particular, the cycle basis $C_1, \ldots, C_\nu$ of $\hat{G}$ is a cycle basis of $G_K$.


Let 1 โ‰ค ๐‘– โ‰ค ๐œˆ. Denote by ๐‘ก = (๐›ผ/4)|๐‘†๐‘–|. We have

๐‘– โˆ’๐ป๐‘–

โ‰ค |โˆ†๐‘†๐‘–

โˆ’โˆ†๐‘†๐‘–|+ |โ„ณ(๐‘†๐‘–)|max

๐‘ฅโˆˆยฑ1

[(1 + 4๐‘ก๐‘ฅ)|๐‘†๐‘–| โˆ’ 1

]โ‰ค (๐›ผ/4)|๐‘†๐‘–| + |โ„ณ(๐‘†๐‘–)|

[(1 + 4๐‘ก)|๐‘†๐‘–| โˆ’ 1

]โ‰ค (๐›ผ/4)|๐‘†๐‘–| + ๐‘‡

(|๐‘†๐‘–|,

โŒŠ|๐‘†๐‘–|2

โŒ‹)4๐‘ก ๐‘‡ (|๐‘†๐‘–|, |๐‘†๐‘–|)

โ‰ค (๐›ผ/4)|๐‘†๐‘–| + 4๐‘ก (2|๐‘†๐‘–|2 โˆ’ 1)(2|๐‘†๐‘–| โˆ’ 1)

โ‰ค (๐›ผ/4)|๐‘†๐‘–| + ๐‘ก22|๐‘†๐‘–|

< 2๐›ผ|๐‘†๐‘–| โ‰ค |๐ป๐‘–|,

where, for positive integers $p \le q$, we denote $T(q, p) = \sum_{i=1}^{p} \binom{q}{i}$. Therefore, we can determine the sign of the product $\prod_{\{i,j\} \in E(S_i)} K_{i,j}$ for every element of the cycle basis and recover the signs of the non-zero off-diagonal entries of 𝐾. Hence,
$$\rho(\hat{K}, K) = \max_{1 \le i,j \le N} \big|\, |\hat{K}_{i,j}| - |K_{i,j}| \,\big|.$$

For ๐‘– = ๐‘—,|๐‘–,๐‘—| โˆ’ |๐พ๐‘–,๐‘—|

= |๐‘–,๐‘– โˆ’๐พ๐‘–,๐‘–| โ‰ค ๐œ€. For ๐‘– = ๐‘— with ๐‘–, ๐‘— โˆˆ = ๐ธ, one can

easily show that๐‘–,๐‘— โˆ’๐พ2

๐‘–,๐‘—

โ‰ค 4๐œ€, yielding

|1/2๐‘–,๐‘— โˆ’ |๐พ๐‘–,๐‘—|| โ‰ค

4๐œ€

1/2๐‘–,๐‘— + |๐พ๐‘–,๐‘—|

โ‰ค 4๐œ€

๐›ผ,

which completes the proof.

We are now in a position to establish a sufficient sample size to estimate ๐พ within

distance ๐œ€.

Theorem 1. Let ๐พ satisfy Assumption 1, and let โ„“ be the cycle sparsity of ๐บ๐พ . Let ๐œ€ > 0.

For any ๐ด > 0, there exists ๐ถ > 0 such that

๐‘› โ‰ฅ ๐ถ( 1

๐›ผ2๐œ€2+ โ„“( 4

๐›ผ

)2โ„“)log๐‘ ,


yields ๐œŒ(,๐พ) โ‰ค ๐œ€ with probability at least 1โˆ’๐‘โˆ’๐ด.

Proof. Using the previous lemma and applying a union bound,
$$\mathbb{P}\big[\rho(\hat{K}, K) > \varepsilon\big] \le \sum_{|S| \le 2} \mathbb{P}\big[|\hat{\Delta}_S - \Delta_S| > \alpha\varepsilon/4\big] + \sum_{2 \le |S| \le \ell} \mathbb{P}\big[|\hat{\Delta}_S - \Delta_S| > (\alpha/4)^{|S|}\big] \le 2N^2 e^{-n\alpha^2\varepsilon^2/8} + 2N^{\ell+1} e^{-2n(\alpha/4)^{2\ell}}, \tag{2.2}$$
where we used Hoeffding's inequality.

2.2.3 Information Theoretic Lower Bounds

We prove an information-theoretic lower bound that holds already if ๐บ๐พ is an โ„“-cycle.

Let ๐ท(๐พโ€–๐พ โ€ฒ) and H(๐พ,๐พ โ€ฒ) denote respectively the Kullback-Leibler divergence and the

Hellinger distance between DPP(๐พ) and DPP(๐พ โ€ฒ).

Lemma 2. For ๐œ‚ โˆˆ โˆ’,+, let ๐พ๐œ‚ be the โ„“ร— โ„“ matrix with elements given by

๐พ๐‘–,๐‘— =

โŽงโŽชโŽชโŽชโŽชโŽชโŽชโŽจโŽชโŽชโŽชโŽชโŽชโŽชโŽฉ

1/2 if ๐‘— = ๐‘–

๐›ผ if ๐‘— = ๐‘–ยฑ 1

๐œ‚๐›ผ if (๐‘–, ๐‘—) โˆˆ (1, โ„“), (โ„“, 1)

0 otherwise

.

Then, for any ๐›ผ โ‰ค 1/8, it holds

๐ท(๐พโ€–๐พ โ€ฒ) โ‰ค 4(6๐›ผ)โ„“, and H(๐พ,๐พ โ€ฒ) โ‰ค (8๐›ผ2)โ„“ .

Proof. It is straightforward to see that
$$\det(K^+_J) - \det(K^-_J) = \begin{cases} 2\alpha^{\ell} & \text{if } J = [\ell], \\ 0 & \text{otherwise.} \end{cases}$$

If ๐‘Œ is sampled from DPP(๐พ๐œ‚), we denote by ๐‘๐œ‚(๐‘†) = P[๐‘Œ = ๐‘†], for ๐‘† โŠ† [โ„“]. It follows


from the inclusion–exclusion principle that for all $S \subseteq [\ell]$,
$$p^+(S) - p^-(S) = \sum_{J \subseteq [\ell] \setminus S} (-1)^{|J|} \left(\det K^+_{S \cup J} - \det K^-_{S \cup J}\right) = (-1)^{\ell - |S|} \left(\det K^+ - \det K^-\right) = \pm 2\alpha^{\ell}, \tag{2.3}$$
where $|J|$ denotes the cardinality of $J$. The inclusion–exclusion principle also yields that $p^{\eta}(S) = |\det(K^{\eta} - I_S)|$ for all $S \subseteq [\ell]$, where $I_S$ stands for the $\ell \times \ell$ diagonal matrix with ones on its entries $(i,i)$ for $i \notin S$ and zeros elsewhere.

We denote by ๐ท(๐พ+โ€–๐พโˆ’) the Kullback Leibler divergence between DPP(๐พ+) and

DPP(๐พโˆ’):

๐ท(๐พ+โ€–๐พโˆ’) =โˆ‘๐‘†โŠ†[โ„“]

๐‘+(๐‘†) log

(๐‘+(๐‘†)

๐‘โˆ’(๐‘†)

)โ‰คโˆ‘๐‘†โŠ†[โ„“]

๐‘+(๐‘†)

๐‘โˆ’(๐‘†)(๐‘+(๐‘†)โˆ’ ๐‘โˆ’(๐‘†))

โ‰ค 2๐›ผโ„“โˆ‘๐‘†โŠ†[โ„“]

| det(๐พ+ โˆ’ ๐ผ๐‘†)|| det(๐พโˆ’ โˆ’ ๐ผ๐‘†)|

, (2.4)

by (2.3). Using the fact that 0 < ๐›ผ โ‰ค 1/8 and the Gershgorin circle theorem, we conclude

that the absolute value of all eigenvalues of ๐พ๐œ‚ โˆ’ ๐ผ๐‘† are between 1/4 and 3/4. Thus we

obtain from (2.4) the bound ๐ท(๐พ+โ€–๐พโˆ’) โ‰ค 4(6๐›ผ)โ„“.

Using the same arguments as above, the squared Hellinger distance $H^2(K^+, K^-)$ between DPP($K^+$) and DPP($K^-$) satisfies
$$H^2(K^+, K^-) = \sum_{J \subseteq [\ell]} \left(\frac{p^+(J) - p^-(J)}{\sqrt{p^+(J)} + \sqrt{p^-(J)}}\right)^2 \le \sum_{J \subseteq [\ell]} \frac{4\alpha^{2\ell}}{4 \cdot 4^{-\ell}} = (8\alpha^2)^{\ell},$$
which completes the proof.

The sample complexity lower bound now follows from standard arguments.


Theorem 2. Let 0 < ๐œ€ โ‰ค ๐›ผ โ‰ค 1/8 and 3 โ‰ค โ„“ โ‰ค ๐‘ . There exists a constant ๐ถ > 0 such

that if

๐‘› โ‰ค ๐ถ( 8โ„“

๐›ผ2โ„“+

log(๐‘/โ„“)

(6๐›ผ)โ„“+

log๐‘

๐œ€2

),

then the following holds: for any estimator $\hat{K}$ based on 𝑛 samples, there exists a kernel 𝐾 that satisfies Assumption 1, whose graph 𝐺𝐾 has cycle sparsity ℓ, and for which $\rho(\hat{K}, K) \ge \varepsilon$ with probability at least 1/3.

Proof. Recall the notation of Lemma 2. First consider the ๐‘ ร—๐‘ block diagonal matrix ๐พ

(resp. ๐พ โ€ฒ) where its first block is ๐พ+ (resp. ๐พโˆ’) and its second block is ๐ผ๐‘โˆ’โ„“. By a standard

argument, the Hellinger distance H๐‘›(๐พ,๐พ โ€ฒ) between the product measures DPP(๐พ)โŠ—๐‘› and

DPP(๐พ โ€ฒ)โŠ—๐‘› satisfies

$$1 - \frac{H_n^2(K, K')}{2} = \left(1 - \frac{H^2(K, K')}{2}\right)^n \ge \left(1 - \frac{(8\alpha^2)^{\ell}}{2}\right)^n,$$

which yields the first term in the desired lower bound.

Next, by padding with zeros, we can assume that ๐ฟ = ๐‘/โ„“ is an integer. Let ๐พ(0) be

a block diagonal matrix where each block is ๐พ+ (using the notation of Lemma 2). For

๐‘— = 1, . . . , ๐ฟ, define the ๐‘ ร— ๐‘ block diagonal matrix ๐พ(๐‘—) as the matrix obtained from

๐พ(0) by replacing its ๐‘—th block with ๐พโˆ’ (again using the notation of Lemma 2).

Since DPP(๐พ(๐‘—)) is the product measure of ๐ฟ lower dimensional DPPs that are each

independent of each other, using Lemma 2 we have ๐ท(๐พ(๐‘—)โ€–๐พ(0)) โ‰ค 4(6๐›ผ)โ„“. Hence, by

Fanoโ€™s lemma (see, e.g., Corollary 2.6 in [115]), the sample complexity to learn the kernel

of a DPP within a distance ๐œ€ โ‰ค ๐›ผ is

$$\Omega\left(\frac{\log(N/\ell)}{(6\alpha)^{\ell}}\right),$$

which yields the second term.

The third term follows from considering ๐พ0 = (1/2)๐ผ๐‘ and letting ๐พ๐‘— be obtained from

๐พ0 by adding ๐œ€ to the ๐‘—th entry along the diagonal. It is easy to see that ๐ท(๐พ๐‘—โ€–๐พ0) โ‰ค 8๐œ€2.

Hence, a second application of Fanoโ€™s lemma yields that the sample complexity to learn the


kernel of a DPP within a distance 𝜀 is $\Omega\big(\log N / \varepsilon^2\big)$.

The third term in the lower bound is the standard parametric term and is unavoidable

in order to estimate the magnitude of the coefficients of ๐พ. The other terms are more

interesting. They reveal that the cycle sparsity of ๐บ๐พ , namely, โ„“, plays a key role in the task

of recovering the sign pattern of 𝐾. Moreover, the theorem shows that the sample complexity

of our method of moments estimator is near optimal.

2.2.4 Algorithmic Aspects and Experiments

We first give an algorithm to compute the estimator defined in Subsection 2.2.2. A

well-known algorithm of Horton [51] computes a cycle basis of minimum total length in

time ๐‘‚(๐‘š3๐‘). Subsequently, the running time was improved to ๐‘‚(๐‘š2๐‘/ log๐‘) time [2].

Also, it is known that a cycle basis of minimum total length is a shortest maximal cycle

basis [21]. Together, these results imply the following.

Lemma 3. Let ๐บ = ([๐‘ ], ๐ธ), |๐ธ| = ๐‘š. There is an algorithm to compute a shortest

maximal cycle basis in ๐‘‚(๐‘š2๐‘/ log๐‘) time.

In addition, we recall the following standard result regarding the complexity of Gaussian

elimination [47].

Lemma 4. Let ๐ด โˆˆ ๐บ๐น (2)๐œˆร—๐‘š, ๐‘ โˆˆ ๐บ๐น (2)๐œˆ . Then Gaussian elimination will find a vector

๐‘ฅ โˆˆ ๐บ๐น (2)๐‘š such that ๐ด๐‘ฅ = ๐‘ or conclude that none exists in ๐‘‚(๐œˆ2๐‘š) time.

We give our procedure for computing the estimator in Algorithm 1. In the following

theorem, we bound the running time of Algorithm 1 and establish an upper bound on the

sample complexity needed to solve the recovery problem as well as the sample complexity

needed to compute an estimate that is close to ๐พ.

Theorem 3. Let ๐พ โˆˆ R๐‘ร—๐‘ be a symmetric matrix satisfying 0 โชฏ ๐พ โชฏ ๐ผ , and satisfying

Assumption 1. Let ๐บ๐พ be the graph induced by ๐พ and โ„“ be the cycle sparsity of ๐บ๐พ . Let


Algorithm 1 Compute Estimator
Input: samples 𝑌1, ..., 𝑌𝑛, parameter 𝛼 > 0.
Compute $\hat{\Delta}_S$ for all $|S| \le 2$.
Set $\hat{K}_{i,i} = \hat{\Delta}_i$ for $1 \le i \le N$.
Compute $\hat{B}_{i,j}$ for $1 \le i < j \le N$.
Form $\tilde{K} \in \mathbb{R}^{N \times N}$ and $\hat{G} = ([N], \hat{E})$.
Compute a shortest maximal cycle basis $v_1, ..., v_\nu$.
Compute $\hat{\Delta}_{S_i}$ for $1 \le i \le \nu$.
Compute $\hat{H}_i$ using $\det \tilde{K}_{S_i}$ for $1 \le i \le \nu$.
Construct $A \in GF(2)^{\nu \times \hat{m}}$ and $b \in GF(2)^{\nu}$.
Solve $Ax = b$ using Gaussian elimination.
Set $\hat{K}_{i,j} = \hat{K}_{j,i} = (2x_{i,j} - 1)\hat{B}_{i,j}^{1/2}$ for all $\{i,j\} \in \hat{E}$.

๐‘Œ1, ..., ๐‘Œ๐‘› be samples from DPP(๐พ) and ๐›ฟ โˆˆ (0, 1). If

๐‘› >log(๐‘ โ„“+1/๐›ฟ)

(๐›ผ/4)2โ„“,

then with probability at least 1โˆ’ ๐›ฟ, Algorithm 1 computes an estimator which recovers

the signs of ๐พ up to a ๐’Ÿ๐‘ -similarity and satisfies

๐œŒ(๐พ, ) <1

๐›ผ

(8 log(4๐‘ โ„“+1/๐›ฟ)

๐‘›

)1/2

(2.5)

in ๐‘‚(๐‘š3 + ๐‘›๐‘2) time.

Proof. Inequality (2.5) follows directly from (2.2) in the proof of Theorem 1. That same proof also shows that, with probability at least 1 − 𝛿, the support of 𝐺𝐾 and the signs of 𝐾 are recovered up to a 𝒟𝑁 -similarity. What remains is to upper bound the worst-case running time of Algorithm 1. We will perform this analysis line by line. Initialization requires $O(N^2)$ operations. Computing $\hat{\Delta}_S$ for all subsets $|S| \le 2$ requires $O(nN^2)$ operations. Setting $\hat{K}_{i,i}$ requires $O(N)$ operations. Computing $\hat{B}_{i,j}$ for $1 \le i < j \le N$ requires $O(N^2)$ operations. Forming $\tilde{K}$ requires $O(N^2)$ operations. Forming $\hat{G}$ requires $O(N^2)$ operations. By Lemma 3, computing a shortest maximal cycle basis requires $O(m^2 N/\log N)$ operations. Constructing the subsets $S_i$, $1 \le i \le \nu$, requires $O(mN)$ operations. Computing $\hat{\Delta}_{S_i}$ for $1 \le i \le \nu$ requires $O(nm)$ operations. Computing $\hat{H}_i$ using $\det(\tilde{K}_{S_i})$ for $1 \le i \le \nu$ requires $O(m\ell^3)$ operations, where a factorization of each $\tilde{K}_{S_i}$ is used to compute each determinant in $O(\ell^3)$ operations. Constructing 𝐴 and 𝑏 requires $O(m\ell)$ operations. By Lemma 4, finding a solution 𝑥 using Gaussian elimination requires $O(m^3)$ operations. Setting $\hat{K}_{i,j}$ for all edges $\{i,j\} \in \hat{E}$ requires $O(m)$ operations. Putting this all together, Algorithm 1 runs in $O(m^3 + nN^2)$ time.

Chordal Graphs

Here we show that it is possible to obtain faster algorithms by exploiting the structure of 𝐺𝐾. Specifically, in the case where 𝐺𝐾 is chordal, we give an 𝑂(𝑚) time algorithm to determine the signs of the off-diagonal entries of the estimator $\hat{K}$, resulting in an improved overall runtime of $O(m + nN^2)$. Recall that a graph 𝐺 = ([𝑁], 𝐸) is said to be chordal if every induced cycle in 𝐺 is of length three. Moreover, a graph 𝐺 = ([𝑁], 𝐸) has a perfect elimination ordering (PEO) if there exists an ordering of the vertex set 𝑣1, ..., 𝑣𝑁 such that, for all 𝑖, the graph induced by $\{v_i\} \cup \{v_j \mid \{i,j\} \in E,\ j > i\}$ is a clique. It is well known that a graph is chordal if and only if it has a PEO. A PEO of a chordal graph with 𝑚 edges can be computed in 𝑂(𝑚) operations using lexicographic breadth-first search [98].

Lemma 5. Let ๐บ = ([๐‘ ], ๐ธ), be a chordal graph and ๐‘ฃ1, ..., ๐‘ฃ๐‘› be a PEO. Given ๐‘–,

let ๐‘–* := min๐‘—|๐‘— > ๐‘–, ๐‘ฃ๐‘–, ๐‘ฃ๐‘— โˆˆ ๐ธ. Then the graph ๐บโ€ฒ = ([๐‘ ], ๐ธ โ€ฒ), where ๐ธ โ€ฒ =

๐‘ฃ๐‘–, ๐‘ฃ๐‘–*๐‘โˆ’๐œ…(๐บ)๐‘–=1 , is a spanning forest of ๐บ.

Proof. We first show that there are no cycles in 𝐺′. Suppose to the contrary that there is a cycle 𝐶 of length 𝑘 on the vertices 𝑣𝑗1 , ..., 𝑣𝑗𝑘. Let 𝑣 be the vertex of smallest index. Then 𝑣 is adjacent in 𝐶 to two vertices of larger index, contradicting the construction, since each vertex 𝑣𝑖 is joined in 𝐺′ to at most one vertex of larger index, namely 𝑣𝑖*.

What remains is to show that |𝐸′| = 𝑁 − 𝜅(𝐺). It suffices to prove the case 𝜅(𝐺) = 1. Suppose to the contrary that there exists a vertex 𝑣𝑖, 𝑖 < 𝑁, with no neighbor of larger index. Let 𝑃 be a shortest path in 𝐺 from 𝑣𝑖 to 𝑣𝑁; by connectivity, such a path exists. Let 𝑣𝑘 be the vertex of smallest index in the path. It has two neighbors in the path of larger index, which must therefore be adjacent to each other. Hence there is a shorter path from 𝑣𝑖 to 𝑣𝑁, a contradiction.
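The construction in Lemma 5 is immediate to code. The sketch below assumes the input vertex order is already a PEO (the example graph and helper names are ours) and verifies, via union–find, that the resulting edge set is a forest:

```python
def peo_forest(n, edges):
    """Edges {v_i, v_{i*}} of Lemma 5, where i* is the smallest-index neighbor
    of i above i (vertices 0..n-1, listed in what we assume is already a PEO)."""
    nbrs = {v: set() for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    forest = []
    for i in range(n):
        above = [j for j in nbrs[i] if j > i]
        if above:
            forest.append((i, min(above)))
    return forest

def is_forest(n, edges):
    """Union-find cycle check: True iff the edge set is acyclic."""
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

# Chordal example: two triangles glued along the edge {1, 2}; 0,1,2,3 is a PEO.
E = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
F = peo_forest(4, E)
print(F, is_forest(4, F))  # [(0, 1), (1, 2), (2, 3)] True
```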


Algorithm 2 Compute Signs of Edges in Chordal Graph
Input: 𝐺𝐾 = ([𝑁], 𝐸) chordal, $\hat{\Delta}_S$ for $|S| \le 3$.
Compute a perfect elimination ordering $v_1, ..., v_N$.
Compute the spanning forest $G' = ([N], E')$.
Set all edges in $E'$ to have positive sign.
Compute $C_{i,j,i^*}$ for all $\{i,j\} \in E \setminus E'$, $j < i$.
Order the edges $E \setminus E' = \{e_1, ..., e_\nu\}$ so that $i > j$ if $\max e_i < \max e_j$.
Visit the edges in sorted order and, for $e = \{i,j\}$, $j > i$, set
$$\operatorname{sgn}(i,j) = \operatorname{sgn}(C_{i,j,i^*})\,\operatorname{sgn}(i, i^*)\,\operatorname{sgn}(j, i^*).$$

Now, given the chordal graph ๐บ๐พ induced by ๐พ and the estimates of principal minors

of size at most three, we provide an algorithm to determine the signs of the edges of ๐บ๐พ , or,

equivalently, the off-diagonal entries of ๐พ.

Theorem 4. If ๐บ๐พ is chordal, Algorithm 2 correctly determines the signs of the edges of

๐บ๐พ in ๐‘‚(๐‘š) time.

Proof. We will simultaneously perform a count of the operations and a proof of the correctness of the algorithm. Computing a PEO requires 𝑂(𝑚) operations. Computing the

spanning forest requires 𝑂(𝑚) operations. The edges of the spanning forest can be given arbitrary signs, because a forest is a cycle-free graph. This assigns a sign to two edges of each

3-cycle. Computing each ๐ถ๐‘–,๐‘—,๐‘–* requires a constant number of operations because โ„“ = 3,

requiring a total of ๐‘‚(๐‘šโˆ’๐‘) operations. Ordering the edges requires ๐‘‚(๐‘š) operations.

Setting the signs of each remaining edge requires ๐‘‚(๐‘š) operations.

Therefore, when ๐บ๐พ is chordal, the overall complexity required by our algorithm to

compute is reduced to ๐‘‚(๐‘š + ๐‘›๐‘2).

Experiments

Here we present experiments to supplement the theoretical results of the section. We test

our algorithm on two types of random matrices. First, we consider the matrix ๐พ โˆˆ R๐‘ร—๐‘


Figure 2-1: Plots of the proportion of successful graph recovery, and graph and sign recovery, for random matrices with cycle and clique graph structure, respectively ((a) graph recovery, cycle; (b) graph and sign recovery, cycle; (c) graph recovery, clique; (d) graph and sign recovery, clique). The darker the box, the higher the proportion of trials that were recovered successfully.

corresponding to the cycle on ๐‘ vertices,

$$K = \frac{1}{2} I + \frac{1}{4} A,$$

where ๐ด is symmetric and has non-zero entries only on the edges of the cycle, either +1 or

โˆ’1, each with probability 1/2. By the Gershgorin circle theorem, 0 โชฏ ๐พ โชฏ ๐ผ . Next, we

consider the matrix ๐พ โˆˆ R๐‘ร—๐‘ corresponding to the clique on ๐‘ vertices,

$$K = \frac{1}{2} I + \frac{1}{4\sqrt{N}} A,$$

where ๐ด is symmetric and has all entries either +1 or โˆ’1, each with probability 1/2. It is

well known that $-2\sqrt{N} \preceq A \preceq 2\sqrt{N}$ with high probability, implying 0 ⪯ 𝐾 ⪯ 𝐼.

For both cases and for a range of values of matrix dimension ๐‘ and samples ๐‘›, we

run our algorithm on 50 randomly generated instances. We record the proportion of trials

where we recover the graph induced by ๐พ, and the proportion of the trials where we recover


both the graph and correctly determine the signs of the entries. In Figure 2-1, the shade of

each box represents the proportion of trials where recovery was successful for a given pair

๐‘, ๐‘›. A completely white box corresponds to zero success rate, black to a perfect success

rate. The plots corresponding to the cycle and the clique are telling. We note that for the

clique, recovering the sparsity pattern and recovering the signs of the off-diagonal entries

come hand-in-hand. However, for the cycle, there is a noticeable gap between the number

of samples required to recover the sparsity pattern and the number of samples required to

recover the signs of the off-diagonal entries. This empirically confirms the central role that

cycle sparsity plays in parameter estimation, and further corroborates our theoretical results.


2.3 Recovering a Magnitude-Symmetric Matrix from its

Principal Minors

In this section, we consider matrices ๐พ โˆˆ R๐‘ร—๐‘ satisfying |๐พ๐‘–,๐‘—| = |๐พ๐‘—,๐‘–| for ๐‘–, ๐‘— โˆˆ [๐‘ ],

which we refer to as magnitude-symmetric matrices, and investigate the algorithmic question

of recovering such a matrix from its principal minors. First, we require a number of key

definitions and notation. For completeness (and to allow independent reading), we are

including terms and notations that we have defined in Section 2.2. If ๐พ โˆˆ R๐‘ร—๐‘ and

๐‘† โŠ† [๐‘ ], ๐พ๐‘† := (๐พ๐‘–,๐‘—)๐‘–,๐‘—โˆˆ๐‘† and โˆ†๐‘†(๐พ) := det๐พ๐‘† is the principal minor of ๐พ associated

with the set ๐‘† (โˆ†โˆ…(๐พ) = 1 by convention). When it is clear from the context, we simply

write โˆ†๐‘† instead of โˆ†๐‘†(๐พ), and for sets of order at most four, we replace the set itself by

its elements, i.e., write โˆ†1,2,3 as โˆ†1,2,3.

In addition, we recall a number of relevant graph-theoretic definitions. In this work, all

graphs ๐บ are simple, undirected graphs. An articulation point or cut vertex of a graph ๐บ is a

vertex whose removal disconnects the graph. A graph ๐บ is said to be two-connected if it has

at least two vertices and has no articulation points. A maximal two-connected subgraph ๐ป

of ๐บ is called a block of ๐บ. Given a graph ๐บ, a cycle ๐ถ is a subgraph of ๐บ in which every

vertex of ๐ถ has even degree (within ๐ถ). A simple cycle ๐ถ is a connected subgraph of ๐บ in

which every vertex of ๐ถ has degree two. For simplicity, we may sometimes describe ๐ถ by a

traversal of its vertices along its edges, i.e., ๐ถ = ๐‘–1 ๐‘–2 ... ๐‘–๐‘˜ ๐‘–1; notice we have the choice of

the start vertex and the orientation of the cycle in this description. We denote the subgraph

induced by a vertex subset 𝑆 ⊆ 𝑉 by 𝐺[𝑆]. A simple cycle 𝐶 of a graph 𝐺 on vertex set 𝑉(𝐶) is not necessarily induced: while the cycle itself has all vertices of degree two, it may have some number of chords in the original graph 𝐺, and we denote this set $E(G[V(C)]) \setminus E(C)$ of chords by 𝛾(𝐶).

Given any subgraph ๐ป of ๐บ, we can associate with ๐ป an incidence vector ๐œ’๐ป โˆˆ ๐บ๐น (2)๐‘š,

๐‘š = |๐ธ|, where ๐œ’๐ป(๐‘’) = 1 if and only if ๐‘’ โˆˆ ๐ป , and ๐œ’๐ป(๐‘’) = 0 otherwise. Given two

subgraphs ๐ป1, ๐ป2 โŠ‚ ๐บ of ๐บ, we define their sum ๐ป1 + ๐ป2 as the graph containing all

edges in exactly one of ๐ธ(๐ป1) and ๐ธ(๐ป2) (i.e., their symmetric difference) and no isolated


vertices. This corresponds to the graph resulting from the sum of their incidence vectors.

The cycle space of ๐บ is given by

๐’ž(๐บ) = span๐œ’๐ถ |๐ถ is a cycle of ๐บ โŠ‚ ๐บ๐น (2)๐‘š,

and has dimension ๐œˆ = ๐‘š โˆ’ ๐‘ + ๐œ…(๐บ), where ๐œ…(๐บ) denotes the number of connected

components of ๐บ. The quantity ๐œˆ is commonly referred to as the cyclomatic number. The

cycle sparsity โ„“ of the graph ๐บ is the smallest number for which the set of incidence vectors

๐œ’๐ถ of cycles of edge length at most โ„“ spans ๐’ž(๐บ).

In this section, we require the use and analysis of graphs G = ([N], E) endowed with a linear Boolean function ε that maps subgraphs of G to {−1, +1}, i.e., ε(e) ∈ {−1, +1} for all e ∈ E(G) and

ε(H) = ∏_{e ∈ E(H)} ε(e)

for all subgraphs ๐ป . The graph ๐บ combined with a linear Boolean function ๐œ– is denoted

by ๐บ = ([๐‘ ], ๐ธ, ๐œ–) and referred to as a charged graph. If ๐œ–(๐ป) = +1 (resp. ๐œ–(๐ป) = โˆ’1),

then we say the subgraph ๐ป is positive (resp. negative). For the sake of space, we often

denote ๐œ–(๐‘–, ๐‘—), ๐‘–, ๐‘— โˆˆ ๐ธ(๐บ), by ๐œ–๐‘–,๐‘— . Given a magnitude-symmetric matrix ๐พ โˆˆ R๐‘ร—๐‘ ,

|๐พ๐‘–,๐‘—| = |๐พ๐‘—,๐‘–| for ๐‘–, ๐‘— โˆˆ [๐‘ ], we define the charged sparsity graph ๐บ๐พ as ๐บ๐พ = ([๐‘ ], ๐ธ, ๐œ–),

๐ธ := ๐‘–, ๐‘— โˆˆ [๐‘ ] | ๐‘– = ๐‘—, |๐พ๐‘–,๐‘—| = 0, ๐œ–(๐‘–, ๐‘—) := sgn(๐พ๐‘–,๐‘—๐พ๐‘—,๐‘–), and when clear from

context, we simply write ๐บ instead of ๐บ๐พ .

We define the span of the incidence vectors of positive cycles as

๐’ž+(๐บ) = span๐œ’๐ถ |๐ถ is a cycle of ๐บ, ๐œ–(๐ถ) = +1.

As we will see, when ๐ป is a block, the space ๐’ž+(๐ป) is spanned by positive simple cycles

(Proposition 3). We define the simple cycle sparsity โ„“+ of ๐’ž+(๐บ) to be the smallest number

for which the set of incidence vectors ๐œ’๐ถ of positive simple cycles of edge length at most

โ„“+ spans ๐’ž+(๐บ) (or, equivalently, the set ๐’ž+(๐ป) for every block ๐ป of ๐บ). Unlike ๐’ž(๐บ), for

๐’ž+(๐บ) this quantity โ„“+ depends on whether we consider a basis of cycles or of only simple


cycles. The study of bases of ๐’ž+(๐ป), for blocks ๐ป , consisting of incidence vectors of simple

cycles constitutes the major graph-theoretic subject of this work, and has connections to the

recovery of a magnitude-symmetric matrix from its principal minors.

2.3.1 Principal Minors and Magnitude-Symmetric Matrices

In this subsection, we are primarily concerned with recovering a magnitude-symmetric

matrix ๐พ from its principal minors. Using only principal minors of order one and two,

the quantities ๐พ๐‘–,๐‘–, ๐พ๐‘–,๐‘—๐พ๐‘—,๐‘–, and the charged graph ๐บ๐พ can be computed immediately, as

๐พ๐‘–,๐‘– = โˆ†๐‘– and ๐พ๐‘–,๐‘—๐พ๐‘—,๐‘– = โˆ†๐‘–โˆ†๐‘— โˆ’โˆ†๐‘–,๐‘— for all ๐‘– = ๐‘—. The main focus of this subsection is

to obtain further information on ๐พ using principal minors of order greater than two, and

to quantify the extent to which ๐พ can be identified from its principal minors. To avoid the

unintended cancellation of terms in a principal minor, in what follows we assume that ๐พ is

generic in the sense that

If ๐พ๐‘–,๐‘—๐พ๐‘—,๐‘˜๐พ๐‘˜,โ„“๐พโ„“,๐‘– = 0 for ๐‘–, ๐‘—, ๐‘˜, โ„“ โˆˆ [๐‘ ] distinct, then |๐พ๐‘–,๐‘—๐พ๐‘˜,โ„“| = |๐พ๐‘—,๐‘˜๐พโ„“,๐‘–|, (2.6)

and ๐œ‘1๐พ๐‘–,๐‘—๐พ๐‘—,๐‘˜๐พ๐‘˜,โ„“๐พโ„“,๐‘– + ๐œ‘2๐พ๐‘–,๐‘—๐พ๐‘—,โ„“๐พโ„“,๐‘˜๐พ๐‘˜,๐‘– + ๐œ‘3๐พ๐‘–,๐‘˜๐พ๐‘˜,๐‘—๐พ๐‘—,โ„“๐พโ„“,๐‘– = 0

for all ๐œ‘1, ๐œ‘2, ๐œ‘3 โˆˆ โˆ’1, 1.

The first part of the condition in (2.6) implies that the three terms (corresponding to the three

distinct cycles on 4 vertices) in the second part are all distinct in magnitude; the second

part strengthens this requirement. Condition (2.6) can be thought of as a no-cancellation

requirement for four-cycles of principal minors of order four. As we will see, this condition

is quite important for the recovery of a magnitude-symmetric matrix from its principal

minors. The results of this section, slightly modified, hold under weaker conditions than (2.6), albeit at the cost of simplicity. This condition rules out dense matrices with a high degree of symmetry, as well as the large majority of rank-one matrices, but it is satisfied for almost all matrices.
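As a quick numerical check of the order-one and order-two identities above, the following sketch (the magnitude-symmetric matrix below is invented for illustration, not taken from the text) verifies K_{i,i} = Δ_i and K_{i,j}K_{j,i} = Δ_iΔ_j − Δ_{i,j}:

```python
def det(M):
    # Cofactor expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

def minor(K, S):
    """Principal minor Delta_S = det(K[S, S])."""
    S = sorted(S)
    return det([[K[i][j] for j in S] for i in S])

# A magnitude-symmetric example: |K_ij| = |K_ji| for all i, j.
K = [[2.0, 1.0, -3.0],
     [-1.0, 1.0, 2.0],
     [3.0, 2.0, 4.0]]

N = len(K)
for i in range(N):
    assert abs(minor(K, [i]) - K[i][i]) < 1e-12           # K_ii = Delta_i
for i in range(N):
    for j in range(i + 1, N):
        prod = minor(K, [i]) * minor(K, [j]) - minor(K, [i, j])
        assert abs(prod - K[i][j] * K[j][i]) < 1e-12      # K_ij * K_ji
```

From these quantities the charged sparsity graph G_K (edges and charges ε_{i,j}) can be read off directly.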

We denote the set of ๐‘ ร—๐‘ magnitude-symmetric matrices satisfying Property (2.6)

by ๐’ฆ๐‘ , and, when the dimension is clear from context, we often simply write ๐’ฆ. We note


that if ๐พ โˆˆ ๐’ฆ, then any matrix ๐พ โ€ฒ satisfying |๐พ โ€ฒ๐‘–,๐‘—| = |๐พ๐‘–,๐‘—|, ๐‘–, ๐‘— โˆˆ [๐‘ ], is also in ๐’ฆ. In this

subsection, we answer the following question:

Given ๐พ โˆˆ ๐’ฆ, what is the minimal โ„“+ such that any ๐พ โ€ฒ โˆˆ ๐’ฆ with (2.7)

โˆ†๐‘†(๐พ) = โˆ†๐‘†(๐พ โ€ฒ) for all |๐‘†| โ‰ค โ„“+ also satisfies โˆ†๐‘†(๐พ) = โˆ†๐‘†(๐พ โ€ฒ)

for all ๐‘† โŠ‚ [๐‘ ]?

Question (2.7) asks for the smallest ℓ+ such that the principal minors of order at most ℓ+ uniquely determine the principal minors of all orders. In Theorem 5, we show that the answer

is the simple cycle sparsity of ๐’ž+(๐บ). In Subsections 2.3.2 and 2.3.3, we build on the

analysis of this subsection, and produce a polynomial-time algorithm for recovering a matrix

๐พ with prescribed principal minors (possibly given up to some error term). The polynomial-

time algorithm (of Subsection 2.3.3) for recovery given perturbed principal minors has key

connections to learning signed DPPs.

In addition, it is also reasonable to ask:

Given ๐พ โˆˆ ๐’ฆ, what is the set of ๐พ โ€ฒ โˆˆ ๐’ฆ that satisfy โˆ†๐‘†(๐พ) = โˆ†๐‘†(๐พ โ€ฒ) (2.8)

for all ๐‘† โŠ‚ [๐‘ ]?

Question (2.8) treats the extent to which we can hope to recover a matrix from its principal

minors. For instance, the transpose operation K ↦ Kᵀ and the similarity transformation K ↦ DKD, where D is a diagonal matrix with entries in {−1, +1} on the diagonal, both clearly preserve

principal minors. In fact, these two operations suitably combined completely define this

set. This result follows fairly quickly from a more general result of Loewy [79, Theorem 1]

regarding matrices with all principal minors equal. In particular, Loewy shows that if two

๐‘›ร— ๐‘›, ๐‘› โ‰ฅ 4, matrices ๐ด and ๐ต have all principal minors equal, and ๐ด is irreducible and

rank(๐ด๐‘†,๐‘‡ ) โ‰ฅ 2 or rank(๐ด๐‘‡,๐‘†) โ‰ฅ 2 (where ๐ด๐‘†,๐‘‡ is the submatrix of ๐ด containing the rows

in ๐‘† and the columns in ๐‘‡ ) for all partitions ๐‘† โˆช ๐‘‡ = [๐‘›], ๐‘† โˆฉ ๐‘‡ = โˆ…, |๐‘†|, |๐‘‡ | โ‰ฅ 2, then

either ๐ต or ๐ต๐‘‡ is diagonally similar to ๐ด. In Proposition 4, we include an alternate proof


answering Question (2.8), as Loewyโ€™s result and proof, though more general and quite nice,

are more involved and not as illuminating for the specific case that we consider here.

To answer Question (2.7), we study properties of certain bases of the space ๐’ž+(๐บ). We

recall that, given a matrix ๐พ โˆˆ ๐’ฆ, we can define the charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–)

of ๐พ, where ๐บ is the simple graph on vertex set [๐‘ ] with an edge ๐‘–, ๐‘— โˆˆ ๐ธ, ๐‘– = ๐‘—, if

๐พ๐‘–,๐‘— = 0 and endowed with a function ๐œ– mapping subgraphs of ๐บ to โˆ’1,+1. Viewing the

possible values ยฑ1 for ๐œ– as the two elements of ๐บ๐น (2), we note that ๐œ–(๐ป) is additive over

๐บ๐น (2), i.e. ๐œ–(๐ป1 + ๐ป2) = ๐œ–(๐ป1)๐œ–(๐ป2). We can make a number of observations regarding

the connectivity of ๐บ. If ๐บ is disconnected, with connected components given by vertex

sets ๐‘‰1, ..., ๐‘‰๐‘˜, then any principal minor โˆ†๐‘† satisfies

โˆ†๐‘† =๐‘˜โˆ

๐‘—=1

โˆ†๐‘†โˆฉ๐‘‰๐‘—. (2.9)

In addition, if ๐บ has a cut vertex ๐‘–, whose removal results in connected components with

vertex sets ๐‘‰1, ..., ๐‘‰๐‘˜, then the principal minor โˆ†๐‘† satisfies

โˆ†๐‘† =๐‘˜โˆ‘

๐‘—1=1

โˆ†[๐‘–โˆช๐‘‰๐‘—1]โˆฉ๐‘†

โˆ๐‘—2 =๐‘—1

โˆ†๐‘‰๐‘—2โˆฉ๐‘† โˆ’ (๐‘˜ โˆ’ 1)โˆ†๐‘–โˆฉ๐‘†

๐‘˜โˆ๐‘—=1

โˆ†๐‘‰๐‘—โˆฉ๐‘†. (2.10)

This implies that the principal minors of ๐พ are uniquely determined by principal minors

corresponding to subsets of blocks of ๐บ. Given this property, we focus on matrices ๐พ

without an articulation point, i.e. ๐บ is two-connected. Given results for matrices without an

articulation point and Equations (2.9) and (2.10), we can then answer Question (2.7) in the

more general case.
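Identity (2.9) is easy to verify numerically; in this sketch the block-diagonal matrix (invented for illustration) has a sparsity graph with two connected components, and every principal minor factors over them:

```python
# Principal minors factor over connected components of the sparsity
# graph, as in (2.9).  Toy matrix with components {0,1} and {2,3}.

def det(M):
    # Cofactor expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

def minor(K, S):
    S = sorted(S)
    return det([[K[i][j] for j in S] for i in S])

K = [[1.0, 2.0, 0.0, 0.0],
     [-2.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, -1.0]]

components = [[0, 1], [2, 3]]
for S in ([0, 1, 2, 3], [0, 2, 3], [1, 2], [0, 3]):
    product = 1.0
    for V in components:
        inter = [i for i in S if i in V]
        if inter:                       # Delta over the empty set is 1
            product *= minor(K, inter)
    assert abs(minor(K, S) - product) < 1e-12
```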

Next, we make an important observation regarding the contribution of certain terms

in the Laplace expansion of a principal minor. Recall that the Laplace expansion of the

determinant is given by

det(๐พ) =โˆ‘๐œŽโˆˆ๐’ฎ๐‘

sgn(๐œŽ)๐‘โˆ๐‘–=1

๐พ๐‘–,๐œŽ(๐‘–),


where sgn(๐œŽ) is multiplicative over the (disjoint) cycles forming the permutation, and is

preserved when taking the inverse of the permutation, or of any cycle therein. Consider

now an arbitrary, possibly non-induced, simple cycle ๐ถ (in the graph-theoretic sense) of

the sparsity graph ๐บ, without loss of generality given by ๐ถ = 1 2 ... ๐‘˜ 1, that satisfies

๐œ–(๐ถ) = โˆ’1. Consider the sum of all terms in the Laplace expansion that contains either

the cycle (1 2 ... ๐‘˜) or its inverse (๐‘˜ ๐‘˜ โˆ’ 1 ... 1) in the associated permutation. Because

๐œ–(๐ถ) = โˆ’1,

๐พ๐‘˜,1

๐‘˜โˆ’1โˆ๐‘–=1

๐พ๐‘–,๐‘–+1 + ๐พ1,๐‘˜

๐‘˜โˆ๐‘–=2

๐พ๐‘–,๐‘–โˆ’1 = 0,

and the sum over all permutations containing the cyclic permutations (๐‘˜ ๐‘˜ โˆ’ 1 ... 1) or

(1 2 ... ๐‘˜) is zero. Therefore, terms associated with permutations containing negative cycles

(๐œ–(๐ถ) = โˆ’1) do not contribute to principal minors.

To illustrate the additional information contained in higher order principal minors โˆ†๐‘† ,

|๐‘†| > 2, we first consider principal minors of order three and four. Consider the principal

minor โˆ†1,2,3, given by

โˆ†1,2,3 = ๐พ1,1๐พ2,2๐พ3,3 โˆ’๐พ1,1๐พ2,3๐พ3,2 โˆ’๐พ2,2๐พ1,3๐พ3,1 โˆ’๐พ3,3๐พ1,2๐พ2,1

+ ๐พ1,2๐พ2,3๐พ3,1 + ๐พ3,2๐พ2,1๐พ1,3

= โˆ†1โˆ†2โˆ†3 โˆ’โˆ†1

[โˆ†2โˆ†3 โˆ’โˆ†2,3

]โˆ’โˆ†2

[โˆ†1โˆ†3 โˆ’โˆ†1,3

]โˆ’โˆ†3

[โˆ†1โˆ†2 โˆ’โˆ†1,2

]+[1 + ๐œ–1,2๐œ–2,3๐œ–3,1

]๐พ1,2๐พ2,3๐พ3,1.

If the corresponding graph G[{1, 2, 3}] is not a cycle, or if the cycle is negative (ε_{1,2}ε_{2,3}ε_{3,1} = −1), then Δ_{1,2,3} can be written in terms of principal minors of order at most two, and contains no additional information about K. If G[{1, 2, 3}] is a positive cycle, then we can write K_{1,2}K_{2,3}K_{3,1} as a function of principal minors of order at most three,

๐พ1,2๐พ2,3๐พ3,1 = โˆ†1โˆ†2โˆ†3 โˆ’1

2

[โˆ†1โˆ†2,3 + โˆ†2โˆ†1,3 + โˆ†3โˆ†1,2

]+

1

2โˆ†1,2,3,

which allows us to compute sgn(๐พ1,2๐พ2,3๐พ3,1). This same procedure holds for any posi-

tively charged induced simple cycle in ๐บ. However, when a simple cycle is not induced,


further analysis is required. To illustrate some of the potential issues for a non-induced

positive simple cycle, we consider principal minors of order four.
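The recovery formula above for K_{1,2}K_{2,3}K_{3,1} can likewise be checked on a toy positive 3-cycle (here a symmetric matrix, so every edge has charge +1; the matrix is invented for illustration):

```python
def det(M):
    # Cofactor expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

def minor(K, S):
    S = sorted(S)
    return det([[K[i][j] for j in S] for i in S])

K = [[1.0, 2.0, 3.0],
     [2.0, 2.0, 1.0],
     [3.0, 1.0, 5.0]]   # symmetric: eps = +1 on every edge

d1, d2, d3 = minor(K, [0]), minor(K, [1]), minor(K, [2])
d12, d13, d23 = minor(K, [0, 1]), minor(K, [0, 2]), minor(K, [1, 2])

# K_12 K_23 K_31 from principal minors of order at most three:
recovered = (d1 * d2 * d3 - 0.5 * (d1 * d23 + d2 * d13 + d3 * d12)
             + 0.5 * minor(K, [0, 1, 2]))
assert abs(recovered - K[0][1] * K[1][2] * K[2][0]) < 1e-12
```

In particular sgn(K_{1,2}K_{2,3}K_{3,1}) is read off from the recovered value.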

Consider the principal minor โˆ†1,2,3,4. By our previous analysis, all terms in the Laplace

expansion of โˆ†1,2,3,4 corresponding to permutations with cycles of length at most three can

be written in terms of principal minors of size at most three. What remains is to consider

the sum of the terms in the Laplace expansion corresponding to permutations containing a cycle of length four (of which there are three pairs, each pair corresponding to the two orientations of a cycle of length four in the graph sense), which we denote by N, given by

๐‘ = โˆ’๐พ1,2๐พ2,3๐พ3,4๐พ4,1 โˆ’๐พ1,2๐พ2,4๐พ4,3๐พ3,1 โˆ’๐พ1,3๐พ3,2๐พ2,4๐พ4,1

โˆ’๐พ4,3๐พ3,2๐พ2,1๐พ1,4 โˆ’๐พ4,2๐พ2,1๐พ1,3๐พ3,4 โˆ’๐พ4,2๐พ2,3๐พ3,1๐พ1,4.

If ๐บ[1, 2, 3, 4] does not contain a positive simple cycle of length four, then ๐‘ = 0 and

โˆ†1,2,3,4 can be written in terms of principal minors of order at most three. If ๐บ[1, 2, 3, 4]

has exactly one positive cycle of length four, without loss of generality given by ๐ถ :=

1 2 3 4 1, then

๐‘ = โˆ’[1 + ๐œ–(๐ถ)

]๐พ1,2๐พ2,3๐พ3,4๐พ4,1 = โˆ’2๐พ1,2๐พ2,3๐พ3,4๐พ4,1,

and we can write ๐พ1,2๐พ2,3๐พ3,4๐พ4,1 as a function of principal minors of order at most four,

which allows us to compute sgn(๐พ1,2๐พ2,3๐พ3,4๐พ4,1). Finally, we treat the case in which

there is more than one positive simple cycle of length four. This implies that all simple

cycles of length four are positive and ๐‘ is given by

๐‘ = โˆ’2[๐พ1,2๐พ2,3๐พ3,4๐พ4,1 + ๐พ1,2๐พ2,4๐พ4,3๐พ3,1 + ๐พ1,3๐พ3,2๐พ2,4๐พ4,1

]. (2.11)

By Condition (2.6), the magnitudes of these three terms are distinct, N ≠ 0, and we can compute the sign of each of these terms, as there is a unique choice of φ_1, φ_2, φ_3 ∈ {−1, +1}


satisfying

φ_1 |K_{1,2}K_{2,3}K_{3,4}K_{4,1}| + φ_2 |K_{1,2}K_{2,4}K_{4,3}K_{3,1}| + φ_3 |K_{1,3}K_{3,2}K_{2,4}K_{4,1}| = −N/2.
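Because condition (2.6) forces all eight signed combinations of the three magnitudes to be distinct, the sign pattern can be recovered by brute force; a sketch with invented magnitudes:

```python
from itertools import product

# Hypothetical magnitudes of the three order-four cycle terms; under the
# genericity condition (2.6), every signed combination below is distinct.
mags = [6.0, 10.0, 15.0]
true_signs = (1, -1, 1)
target = sum(s * m for s, m in zip(true_signs, mags))   # plays the role of -N/2

solutions = [phi for phi in product([-1, 1], repeat=3)
             if abs(sum(p * m for p, m in zip(phi, mags)) - target) < 1e-9]
assert solutions == [true_signs]                        # unique recovery
```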

In order to better understand the behavior of principal minors of order greater than four, we

investigate the set of positive simple cycles (i.e., simple cycles ๐ถ satisfying ๐œ–(๐ถ) = +1). In

the following two propositions, we compute the dimension of ๐’ž+(๐บ), and note that when

๐บ is two-connected, this space is spanned by incidence vectors corresponding to positive

simple cycles.

Proposition 2. Let ๐บ = ([๐‘ ], ๐ธ, ๐œ–) be a charged graph. Then dim(๐’ž+(๐บ)) = ๐œˆ โˆ’ 1 if ๐บ

contains at least one negative cycle, and equals ๐œˆ otherwise.

Proof. Consider a basis ๐‘ฅ1, ..., ๐‘ฅ๐œˆ for ๐’ž(๐บ). If all associated cycles are positive, then

dim(๐’ž+(๐บ)) = ๐œˆ and we are done. If ๐บ has a negative cycle, then, by the linearity of ๐œ–, at

least one element of ๐‘ฅ1, ..., ๐‘ฅ๐œˆ must be the incidence vector of a negative cycle. Without

loss of generality, suppose that ๐‘ฅ1, ..., ๐‘ฅ๐‘– are incidence vectors of negative cycles. Then

๐‘ฅ1 + ๐‘ฅ2, ..., ๐‘ฅ1 + ๐‘ฅ๐‘–, ๐‘ฅ๐‘–+1, ..., ๐‘ฅ๐œˆ is a linearly independent set of incidence vectors of positive

cycles. Therefore, dim(𝒞+(G)) ≥ ν − 1. However, x_1 is the incidence vector of a negative cycle, and so cannot be in the span of x_1 + x_2, ..., x_1 + x_i, x_{i+1}, ..., x_ν; hence dim(𝒞+(G)) = ν − 1.
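Proposition 2 can be illustrated computationally: for a charged K_4 with a single negative edge (a toy example, not from the text), dim 𝒞(G) = ν = 3 while the positive cycles span only a (ν − 1)-dimensional subspace. A sketch over GF(2), with cycles encoded as edge-subset bitmasks:

```python
from itertools import combinations

# Charged K4 on vertices 0..3; one negative edge creates negative cycles.
edges = list(combinations(range(4), 2))          # 6 edges
charge = {e: 1 for e in edges}
charge[(0, 1)] = -1

def rank_gf2(rows):
    # Gaussian elimination over GF(2) with integer bitmasks.
    pivots = {}
    rank = 0
    for r in rows:
        while r:
            h = r.bit_length() - 1
            if h in pivots:
                r ^= pivots[h]
            else:
                pivots[h] = r
                rank += 1
                break
    return rank

cycle_vecs, positive_vecs = [], []
for mask in range(1, 1 << len(edges)):
    subset = [edges[i] for i in range(len(edges)) if mask >> i & 1]
    deg = [0] * 4
    for u, v in subset:
        deg[u] += 1
        deg[v] += 1
    if all(d % 2 == 0 for d in deg):             # every vertex has even degree
        cycle_vecs.append(mask)
        eps = 1
        for e in subset:
            eps *= charge[e]
        if eps == 1:
            positive_vecs.append(mask)

nu = len(edges) - 4 + 1                          # m - N + kappa(G) = 3
assert rank_gf2(cycle_vecs) == nu
assert rank_gf2(positive_vecs) == nu - 1         # one negative cycle exists
```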

Proposition 3. Let ๐บ = ([๐‘ ], ๐ธ, ๐œ–) be a two-connected charged graph. Then

๐’ž+(๐บ) = span๐œ’๐ถ |๐ถ is a simple cycle of ๐บ, ๐œ–(๐ถ) = +1.

Proof. If ๐บ does not contain a negative cycle, then the result follows immediately, as then

๐’ž+(๐บ) = ๐’ž(๐บ). Suppose then that ๐บ has at least one negative cycle. Decomposing this

negative cycle into the union of edge-disjoint simple cycles, we note that ๐บ must also have

at least one negative simple cycle.

Since ๐บ is a two-connected graph, it admits a proper (also called open) ear decomposition

with ๐œˆ โˆ’ 1 proper ears (Whitney [128], see also [63]) starting from any simple cycle. We

choose our initial simple cycle to be a negative simple cycle denoted by ๐บ0. Whitneyโ€™s proper


ear decomposition says that we can obtain a sequence of graphs G_0, G_1, ..., G_{ν−1} = G

where ๐บ๐‘– is obtained from ๐บ๐‘–โˆ’1 by adding a path ๐‘ƒ๐‘– between two distinct vertices ๐‘ข๐‘– and ๐‘ฃ๐‘–

of ๐‘‰ (๐บ๐‘–โˆ’1) with its internal vertices not belonging to ๐‘‰ (๐บ๐‘–โˆ’1) (a proper or open ear). By

construction, ๐‘ƒ๐‘– is also a path between ๐‘ข๐‘– and ๐‘ฃ๐‘– in ๐บ๐‘–.

We will prove a stronger statement by induction on ๐‘–, namely that for suitably constructed

positive simple cycles ๐ถ๐‘— , ๐‘— = 1, ..., ๐œˆ โˆ’ 1, we have that, for every ๐‘– = 0, 1, ..., ๐œˆ โˆ’ 1:

(i) ๐’ž+(๐บ๐‘–) = span๐œ’๐ถ๐‘—: 1 โ‰ค ๐‘— โ‰ค ๐‘–,

(ii) for every pair of distinct vertices ๐‘ข, ๐‘ฃ โˆˆ ๐‘‰ (๐บ๐‘–), there exists both a positive path in ๐บ๐‘–

between ๐‘ข and ๐‘ฃ and a negative path in ๐บ๐‘– between ๐‘ข and ๐‘ฃ.

For ๐‘– = 0, (i) is clear and (ii) follows from the fact that the two paths between ๐‘ข and ๐‘ฃ in

the cycle ๐บ0 must have opposite charge since their sum is ๐บ0 and ๐œ–(๐บ0) = โˆ’1. For ๐‘– > 0,

we assume that we have constructed C_1, ..., C_{i−1} satisfying (i) and (ii) for smaller values

of ๐‘–. To construct ๐ถ๐‘–, we take the path ๐‘ƒ๐‘– between ๐‘ข๐‘– and ๐‘ฃ๐‘– and complete it with a path of

charge ๐œ–(๐‘ƒ๐‘–) between ๐‘ข๐‘– and ๐‘ฃ๐‘– in ๐บ๐‘–โˆ’1. The existence of this latter path follows from (ii),

and these two paths together form a simple cycle ๐ถ๐‘– with ๐œ–(๐ถ๐‘–) = +1. It is clear that this ๐ถ๐‘–

is linearly independent from all the previously constructed cycles ๐ถ๐‘— since ๐ถ๐‘– contains ๐‘ƒ๐‘–

but ๐‘ƒ๐‘– was not even part of ๐บ๐‘–โˆ’1. This implies (i) for ๐‘–.

To show (ii) for G_i, we need to consider three cases for u ≠ v. If u, v ∈ V(G_{i−1}), we

can use the corresponding positive and negative paths in ๐บ๐‘–โˆ’1 (whose existence follows

from (ii) for ๐‘–โˆ’ 1). If ๐‘ข, ๐‘ฃ โˆˆ ๐‘‰ (๐‘ƒ๐‘–), one of the paths can be the subpath ๐‘ƒ๐‘ข๐‘ฃ of ๐‘ƒ๐‘– between

๐‘ข and ๐‘ฃ and the other path can be ๐‘ƒ๐‘– โˆ– ๐‘ƒ๐‘ข๐‘ฃ together with a path in ๐บ๐‘–โˆ’1 between ๐‘ข๐‘– and ๐‘ฃ๐‘– of

charge equal to โˆ’๐œ–(๐‘ƒ๐‘–) (so that together they form a negative cycle). Otherwise, we must

have ๐‘ข โˆˆ ๐‘‰ (๐‘ƒ๐‘–) โˆ– ๐‘ข๐‘–, ๐‘ฃ๐‘– and ๐‘ฃ โˆˆ ๐‘‰ (๐บ๐‘–โˆ’1) โˆ– ๐‘ข๐‘–, ๐‘ฃ๐‘– (or vice versa), and we can select the

paths to be the path in ๐‘ƒ๐‘– from ๐‘ข to ๐‘ข๐‘– together with two oppositely charged paths in ๐บ๐‘–โˆ’1

between ๐‘ข๐‘– and ๐‘ฃ. In all these cases, we have shown property (ii) holds for ๐บ๐‘–.

For the remainder of the analysis in this subsection, we assume that ๐บ contains a

negative cycle, otherwise ๐’ž+(๐บ) = ๐’ž(๐บ) and we inherit all of the desirable properties of

๐’ž(๐บ). Next, we study the properties of simple cycle bases (bases consisting of incidence


vectors of simple cycles) of ๐’ž+(๐บ) that are minimal in some sense. We say that a simple

cycle basis ๐œ’๐ถ1 , ..., ๐œ’๐ถ๐œˆโˆ’1 for ๐’ž+(๐บ) is lexicographically minimal if it lexicographically

minimizes (|๐ถ1|, |๐ถ2|, ..., |๐ถ๐œˆโˆ’1|), i.e., minimizes |๐ถ1|, minimizes |๐ถ2| conditional on |๐ถ1|

being minimal, etc. Any lexicographically minimal basis also minimizes ∑|C_i| and max|C_i|

(by optimality of the greedy algorithm for (linear) matroids). For brevity, we will simply

refer to such a basis as โ€œminimal." One complicating issue is that, while a minimal cycle

basis for ๐’ž(๐บ) always consists of induced simple cycles, this is no longer the case for ๐’ž+(๐บ).

This can happen, for example, for an appropriately constructed graph with only two short negative simple cycles far away from each other; a minimal cycle basis for 𝒞+(G) then contains a cycle consisting of the union of these two negative simple cycles.
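The greedy matroid construction invoked above can be sketched abstractly: scan candidate incidence vectors in order of increasing length and keep each one that enlarges the GF(2) span. The candidate bitmasks below are invented for illustration:

```python
def add_to_basis(pivots, r):
    # Try to insert r into a GF(2) basis kept as {leading bit: vector}.
    while r:
        h = r.bit_length() - 1
        if h in pivots:
            r ^= pivots[h]
        else:
            pivots[h] = r
            return True          # r enlarged the span
    return False                 # r was already in the span

# Candidate "cycles" as bitmasks over 6 edges; length = number of edges.
candidates = [0b000111, 0b110100, 0b110011, 0b101101]
candidates.sort(key=lambda m: bin(m).count("1"))   # shortest first

pivots, basis = {}, []
for m in candidates:
    if add_to_basis(pivots, m):
        basis.append(m)

# 0b110011 = 0b000111 ^ 0b110100 is dependent and gets skipped.
assert len(basis) == 3
assert sorted(bin(m).count("1") for m in basis) == [3, 3, 4]
```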

However, we can make a number of statements regarding chords of simple cycles

corresponding to elements of a minimal simple cycle basis of ๐’ž+(๐บ). In the following

lemma, we show that a minimal simple cycle basis satisfies a number of desirable properties.

However, before we do so, we introduce some useful notation. Given a simple cycle ๐ถ, it is

convenient to fix an orientation, say C = i_1 i_2 ... i_k i_1. Now, for any chord {i_{k_1}, i_{k_2}} ∈ γ(C), k_1 < k_2, we denote the two cycles created by this chord by

C(k_1, k_2) = i_{k_1} i_{k_1+1} ... i_{k_2} i_{k_1} and C(k_2, k_1) = i_{k_2} i_{k_2+1} ... i_k i_1 ... i_{k_1} i_{k_2}.

We have the following result.

Lemma 6. Let ๐บ = ([๐‘ ], ๐ธ, ๐œ–) be a two-connected charged graph, and ๐ถ := ๐‘–1 ๐‘–2 ... ๐‘–๐‘˜ ๐‘–1

be a cycle corresponding to an incidence vector of a minimal simple cycle basis ๐‘ฅ1, ..., ๐‘ฅ๐œˆโˆ’1

of ๐’ž+(๐บ). Then

(i) ๐œ–(๐ถ(๐‘˜1, ๐‘˜2)

)= ๐œ–(๐ถ(๐‘˜2, ๐‘˜1)

)= โˆ’1 for all ๐‘–๐‘˜1 , ๐‘–๐‘˜2 โˆˆ ๐›พ(๐ถ),

(ii) if ๐‘–๐‘˜1 , ๐‘–๐‘˜2, ๐‘–๐‘˜3 , ๐‘–๐‘˜4 โˆˆ ๐›พ(๐ถ) satisfy ๐‘˜1 < ๐‘˜3 < ๐‘˜2 < ๐‘˜4 (crossing chords), then either

๐‘˜3 โˆ’ ๐‘˜1 = ๐‘˜4 โˆ’ ๐‘˜2 = 1 or ๐‘˜1 = 1, ๐‘˜2 โˆ’ ๐‘˜3 = 1, ๐‘˜4 = ๐‘˜, i.e. these two chords form a

four-cycle with two edges of the cycle,

(iii) there does not exist ๐‘–๐‘˜1 , ๐‘–๐‘˜2, ๐‘–๐‘˜3 , ๐‘–๐‘˜4, ๐‘–๐‘˜5 , ๐‘–๐‘˜6 โˆˆ ๐›พ(๐ถ) satisfying either ๐‘˜1 < ๐‘˜2 โ‰ค

47

๐‘˜3 < ๐‘˜4 โ‰ค ๐‘˜5 < ๐‘˜6 or ๐‘˜6 โ‰ค ๐‘˜1 < ๐‘˜2 โ‰ค ๐‘˜3 < ๐‘˜4 โ‰ค ๐‘˜5.

Proof. Without loss of generality, suppose that C = 1 2 ... k 1. Given a chord {k_1, k_2}, C(k_1, k_2) + C(k_2, k_1) = C, and so ε(C) = +1 implies that ε(C(k_1, k_2)) = ε(C(k_2, k_1)). If both these cycles were positive, then this would contradict the minimality of the basis, as |C(k_1, k_2)|, |C(k_2, k_1)| < k. This completes the proof of Property (i).

Given two chords {k_1, k_2} and {k_3, k_4} satisfying k_1 < k_3 < k_2 < k_4, we consider the cycles

C_1 := C(k_1, k_2) + C(k_3, k_4),
C_2 := C(k_1, k_2) + C(k_4, k_3).

By Property (i), ε(C_1) = ε(C_2) = +1. In addition, C_1 + C_2 = C, and |C_1|, |C_2| ≤ |C|. By the minimality of the basis, either |C_1| = |C| or |C_2| = |C|, which implies that either |C_1| = 4 or |C_2| = 4 (or both). This completes the proof of Property (ii).

Given three non-crossing chords {k_1, k_2}, {k_3, k_4}, and {k_5, k_6}, with k_1 < k_2 ≤ k_3 < k_4 ≤ k_5 < k_6 (the other case is identical, up to rotation), we consider the cycles

C_1 := C(k_3, k_4) + C(k_5, k_6) + C,
C_2 := C(k_1, k_2) + C(k_5, k_6) + C,
C_3 := C(k_1, k_2) + C(k_3, k_4) + C.

By Property (i), ε(C_1) = ε(C_2) = ε(C_3) = +1. In addition, C_1 + C_2 + C_3 = C and |C_1|, |C_2|, |C_3| < |C|, a contradiction to the minimality of the basis. This completes the proof of Property (iii).

The above lemma tells us that in a minimal simple cycle basis for ๐’ž+(๐บ), in each cycle

two crossing chords always form a positive cycle of length four (and thus any chord has at

most one other crossing chord), and there do not exist three non-crossing chords without

one chord in between the other two. Using a simple cycle basis of ๐’ž+(๐บ) that satisfies the

three properties of the above lemma, we aim to show that the principal minors of order at


most the length of the longest cycle in the basis uniquely determine the principal minors of

all orders. To do so, we prove a series of three lemmas that show that from the principal

minor corresponding to a cycle ๐ถ = ๐‘–1 ... ๐‘–๐‘˜ ๐‘–1 in our basis and ๐‘‚(1) principal minors of

subsets of ๐‘‰ (๐ถ), we can compute the quantity

s(C) := ∏_{{i,j} ∈ E(C), i<j} sgn(K_{i,j}).

Once we have computed ๐‘ (๐ถ๐‘–) for every simple cycle in our basis, we can compute ๐‘ (๐ถ)

for any simple positive cycle by writing ๐ถ as a sum of some subset of the simple cycles

๐ถ1, ..., ๐ถ๐œˆโˆ’1 corresponding to incidence vectors in our basis, and taking the product of ๐‘ (๐ถ๐‘–)

for all indices ๐‘– in the aforementioned sum. To this end, we prove the following three

lemmas.
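The multiplicativity used here, s(C_1 + C_2) = s(C_1)s(C_2) for cycles summed over GF(2) (shared edges contribute sgn² = +1), is easy to check on a toy signed K_4 (the edge signs are invented):

```python
from itertools import combinations

edges = list(combinations(range(4), 2))
sgn = {(0, 1): 1, (0, 2): -1, (0, 3): 1, (1, 2): 1, (1, 3): -1, (2, 3): 1}

def s(cycle_edges):
    # Product of sgn(K_ij) over the edges of the cycle.
    prod = 1
    for e in cycle_edges:
        prod *= sgn[e]
    return prod

t1 = {(0, 2), (2, 3), (0, 3)}        # triangle 0-2-3
t2 = {(1, 2), (1, 3), (2, 3)}        # triangle 1-2-3
summed = t1 ^ t2                     # symmetric difference: 4-cycle 0-2-1-3-0
assert summed == {(0, 2), (0, 3), (1, 2), (1, 3)}
assert s(summed) == s(t1) * s(t2)    # shared edge (2,3) cancels in the product
```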

Lemma 7. Let ๐ถ be a positive simple cycle of ๐บ = ([๐‘ ], ๐ธ, ๐œ–) with ๐‘‰ (๐ถ) = ๐‘–1, ๐‘–2, ยท ยท ยท , ๐‘–๐‘˜

whose chords ๐›พ(๐ถ) satisfy Properties (i)-(iii) of Lemma 6. Then there exist two edges of ๐ถ,

say, ๐‘–1, ๐‘–๐‘˜ and ๐‘–โ„“, ๐‘–โ„“+1, where ๐ถ := ๐‘–1 ๐‘–2 ... ๐‘–๐‘˜ ๐‘–1 such that

(i) all chords ๐‘–๐‘, ๐‘–๐‘ž โˆˆ ๐›พ(๐ถ), ๐‘ < ๐‘ž, satisfy ๐‘ โ‰ค โ„“ < ๐‘ž,

(ii) any positive simple cycle in ๐บ[๐‘‰ (๐ถ)] containing ๐‘–1, ๐‘–๐‘˜ spans ๐‘‰ (๐ถ) and contains

only edges in ๐ธ(๐ถ) and pairs of crossed chords.

Proof. We first note that Property (i) of Lemma 6 implies that the charge of a cycle C′ ⊂ G[V(C)] corresponds to the parity of E(C′) ∩ γ(C), i.e., if C′ contains an even number of chords then ε(C′) = +1, otherwise ε(C′) = −1. This follows from constructing a cycle basis for G[V(C)] consisting of incidence vectors corresponding to C and C(u, v) for every chord {u, v} ∈ γ(C).

Let the cycle ๐ถ(๐‘–, ๐‘—) be a shortest cycle in ๐บ[๐‘‰ (๐ถ)] containing exactly one chord of

๐ถ. If this is a non-crossed chord, then we take an arbitrary edge in ๐ธ(๐ถ(๐‘–, ๐‘—))โˆ–๐‘–, ๐‘— to be

๐‘–1, ๐‘–๐‘˜. If this chord crosses another, denoted by ๐‘–โ€ฒ, ๐‘—โ€ฒ, then either ๐ถ(๐‘–โ€ฒ, ๐‘—โ€ฒ) or ๐ถ(๐‘—โ€ฒ, ๐‘–โ€ฒ) is

also a shortest cycle, and we take an arbitrary edge in the intersection of these two shortest

49

cycles. Without loss of generality, let 1, ๐‘˜ denote this edge and let the cycle be 1 2 ... ๐‘˜ 1,

where for simplicity we have assumed that ๐‘‰ (๐ถ) = [๐‘˜]. Because of Property (iii) of Lemma

6, there exists โ„“ such that all chords ๐‘–, ๐‘— โˆˆ ๐›พ(๐ถ) satisfy ๐‘– โ‰ค โ„“ and ๐‘— > โ„“. This completes

the proof of Property (i).

Next, consider an arbitrary positive simple cycle C′ in G[V(C)] containing {1, k}. We aim to show that this cycle contains only edges in E(C) and pairs of crossed chords, from which the equality V(C′) = V(C) immediately follows. Since C′ is positive, it contains an even number of chords of γ(C), and either all the chords in γ(C) ∩ E(C′) are pairs of crossed chords or there exist two chords {p_1, q_1}, {p_2, q_2} ∈ γ(C) ∩ E(C′), w.l.o.g. p_1 ≤ p_2 ≤ ℓ < q_2 < q_1 (we can simply reverse the orientation if q_1 = q_2), that do not cross any other chord in γ(C) ∩ E(C′). In this case, there would be a 1−q_2 path in C′ containing neither p_1 nor q_1 (as {p_1, q_1} would be along the other path between 1 and q_2 in C′ and q_1 ≠ q_2). However, this is a contradiction, as no such path exists in C′, since {p_1, q_1} does not cross any chord in γ(C) ∩ E(C′). Therefore, the chords γ(C) ∩ E(C′) are all pairs of crossed chords. Furthermore, since all pairs of crossed chords must define cycles of length four, we have V(C′) = V(C). This completes the proof of Property (ii).

Although this is not needed in what follows, observe that, in the above Lemma 7, {i_ℓ, i_{ℓ+1}} plays the same role as {i_1, i_k}, in the sense that any positive simple cycle containing {i_ℓ, i_{ℓ+1}} also spans V(C) and contains only pairs of crossed chords and edges in E(C).

Lemma 8. Let ๐พ โˆˆ ๐’ฆ have charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–), and ๐ถ = ๐‘–1 ... ๐‘–๐‘˜ ๐‘–1

be a positive simple cycle of ๐บ whose chords ๐›พ(๐ถ) satisfy Properties (i)-(iii) of Lemma

6, and with vertices ordered as in Lemma 7. Then the principal minor corresponding to

S = V(C) is given by

Δ_S = Δ_{i_1}Δ_{S∖{i_1}} + Δ_{i_k}Δ_{S∖{i_k}} + [Δ_{i_1,i_k} − 2Δ_{i_1}Δ_{i_k}]Δ_{S∖{i_1,i_k}}
− 2K_{i_1,i_{k−1}}K_{i_{k−1},i_k}K_{i_k,i_2}K_{i_2,i_1}Δ_{S∖{i_1,i_2,i_{k−1},i_k}}
− [Δ_{i_1,i_2} − Δ_{i_1}Δ_{i_2}][Δ_{i_{k−1},i_k} − Δ_{i_{k−1}}Δ_{i_k}]Δ_{S∖{i_1,i_2,i_{k−1},i_k}}
+ [Δ_{i_1,i_{k−1}} − Δ_{i_1}Δ_{i_{k−1}}][Δ_{i_2,i_k} − Δ_{i_2}Δ_{i_k}]Δ_{S∖{i_1,i_2,i_{k−1},i_k}}
+ [Δ_{i_1,i_2} − Δ_{i_1}Δ_{i_2}][Δ_{S∖{i_1,i_2}} − Δ_{i_k}Δ_{S∖{i_1,i_2,i_k}}]
+ [Δ_{i_{k−1},i_k} − Δ_{i_{k−1}}Δ_{i_k}][Δ_{S∖{i_{k−1},i_k}} − Δ_{i_1}Δ_{S∖{i_1,i_{k−1},i_k}}] + N,

where ๐‘ is the sum of terms in the Laplace expansion of โˆ†๐‘† corresponding to a permutation

where ๐œŽ(๐‘–1) = ๐‘–๐‘˜ or ๐œŽ(๐‘–๐‘˜) = ๐‘–1, but not both.

Proof. Without loss of generality, suppose that C = 1 2 ... k 1. Each term in Δ_S corresponds to a partition of S = V(C) into a disjoint collection of vertices, pairs corresponding to edges of G[S], and simple cycles of G[S], which can be assumed to be positive. We can decompose the Laplace expansion of Δ_S into seven sums of terms, based on how the associated permutation for each term treats the elements 1 and k. Let X_{j_1,j_2}, j_1, j_2 ∈ {1, k, *}, equal the sum of terms corresponding to permutations where σ(1) = j_1 and σ(k) = j_2, where * denotes any element in {2, ..., k − 1}. The case σ(1) = σ(k) obviously cannot occur, and so

Δ_S = X_{1,k} + X_{1,*} + X_{*,k} + X_{k,1} + X_{k,*} + X_{*,1} + X_{*,*}.

By definition, ๐‘ = ๐‘‹๐‘˜,* + ๐‘‹*,1. What remains is to compute the remaining five terms. We

have

๐‘‹๐‘˜,1 = โˆ’๐พ1,๐‘˜๐พ๐‘˜,1โˆ†๐‘†โˆ–1,๐‘˜

=[โˆ†1,๐‘˜ โˆ’โˆ†1โˆ†๐‘˜

]โˆ†๐‘†โˆ–1,๐‘˜,

51

๐‘‹1,๐‘˜ = โˆ†1โˆ†๐‘˜โˆ†๐‘†โˆ–1,๐‘˜,

๐‘‹1,๐‘˜ + ๐‘‹1,* = โˆ†1โˆ†๐‘†โˆ–1,

๐‘‹1,๐‘˜ + ๐‘‹*,๐‘˜ = โˆ†๐‘˜โˆ†๐‘†โˆ–๐‘˜,

and so

โˆ†๐‘† = โˆ†1โˆ†๐‘†โˆ–1 + โˆ†๐‘˜โˆ†๐‘†โˆ–๐‘˜ +[โˆ†1,๐‘˜ โˆ’ 2โˆ†1โˆ†๐‘˜

]โˆ†๐‘†โˆ–1,๐‘˜ + ๐‘‹*,* + ๐‘.

The sum ๐‘‹*,* corresponds to permutations where ๐œŽ(1) /โˆˆ 1, ๐‘˜ and ๐œŽ(๐‘˜) /โˆˆ 1, ๐‘˜.

We first show that the only permutations that satisfy these two properties take the form

๐œŽ2(1) = 1 or ๐œŽ2(๐‘˜) = ๐‘˜ (or both), or contain both 1 and ๐‘˜ in the same cycle. Suppose to

the contrary, that there exists some permutation where 1 and ๐‘˜ are not in the same cycle,

and both are in cycles of length at least three. Then in ๐บ[๐‘†] both vertices 1 and ๐‘˜ contain

a chord emanating from them. By Property (ii) of Lemma 6, each has exactly one chord

and these chords are 1, ๐‘˜ โˆ’ 1 and 2, ๐‘˜. The cycle containing 1 must also contain 2 and

๐‘˜ โˆ’ 1, and the cycle containing ๐‘˜ must also contain 2 and ๐‘˜ โˆ’ 1, a contradiction.

In addition, we note that, also by the above analysis, the only cycles (of a permutation) that can contain both 1 and k without having σ(1) = k or σ(k) = 1 are given by (1 2 k k−1) and (1 k−1 k 2). Therefore, we can decompose X_{*,*} further into the sum of five terms. Let

• Y_1 = the sum of terms corresponding to permutations containing either (1 2 k k−1) or (1 k−1 k 2),

• Y_2 = the sum of terms corresponding to permutations containing (1 2) and (k−1 k),

• Y_3 = the sum of terms corresponding to permutations containing (1 k−1) and (2 k),

• Y_4 = the sum of terms corresponding to permutations containing (1 2) and where σ(k) ∉ {k − 1, k},

• Y_5 = the sum of terms corresponding to permutations containing (k−1 k) and where σ(1) ∉ {1, 2}.


๐‘Œ1 and ๐‘Œ3 are only non-zero if 1, ๐‘˜ โˆ’ 1 and 2, ๐‘˜ are both chords of ๐ถ. ๐‘Œ4 is only

non-zero if there is a chord incident to ๐‘˜, and ๐‘Œ5 is only non-zero if there is a chord incident

to 1. We have ๐‘‹*,* = ๐‘Œ1 + ๐‘Œ2 + ๐‘Œ3 + ๐‘Œ4 + ๐‘Œ5, and

๐‘Œ1 = โˆ’2๐พ1,๐‘˜โˆ’1๐พ๐‘˜โˆ’1,๐‘˜๐พ๐‘˜,2๐พ2,1โˆ†๐‘†โˆ–1,2,๐‘˜โˆ’1,๐‘˜,

๐‘Œ2 = ๐พ1,2๐พ2,1๐พ๐‘˜โˆ’1,๐‘˜๐พ๐‘˜,๐‘˜โˆ’1โˆ†๐‘†โˆ–1,2,๐‘˜โˆ’1,๐‘˜

=[โˆ†1,2 โˆ’โˆ†1โˆ†2

][โˆ†๐‘˜โˆ’1,๐‘˜ โˆ’โˆ†๐‘˜โˆ’1โˆ†๐‘˜

]โˆ†๐‘†โˆ–1,2,๐‘˜โˆ’1,๐‘˜,

๐‘Œ3 = ๐พ1,๐‘˜โˆ’1๐พ๐‘˜โˆ’1,1๐พ2,๐‘˜๐พ๐‘˜,2โˆ†๐‘†โˆ–1,2,๐‘˜โˆ’1,๐‘˜

=[โˆ†1,๐‘˜โˆ’1 โˆ’โˆ†1โˆ†๐‘˜โˆ’1

][โˆ†2,๐‘˜ โˆ’โˆ†2โˆ†๐‘˜

]โˆ†๐‘†โˆ–1,2,๐‘˜โˆ’1,๐‘˜,

๐‘Œ2 + ๐‘Œ4 = โˆ’๐พ1,2๐พ2,1

[โˆ†๐‘†โˆ–1,2 โˆ’โˆ†๐‘˜โˆ†๐‘†โˆ–1,2,๐‘˜

]=[โˆ†1,2 โˆ’โˆ†1โˆ†2

][โˆ†๐‘†โˆ–1,2 โˆ’โˆ†๐‘˜โˆ†๐‘†โˆ–1,2,๐‘˜

],

๐‘Œ2 + ๐‘Œ5 = โˆ’๐พ๐‘˜โˆ’1,๐‘˜๐พ๐‘˜,๐‘˜โˆ’1

[โˆ†๐‘†โˆ–๐‘˜โˆ’1,๐‘˜ โˆ’โˆ†1โˆ†๐‘†โˆ–1,๐‘˜โˆ’1,๐‘˜

]=[โˆ†๐‘˜โˆ’1,๐‘˜ โˆ’โˆ†๐‘˜โˆ’1โˆ†๐‘˜

][โˆ†๐‘†โˆ–๐‘˜โˆ’1,๐‘˜ โˆ’โˆ†2โˆ†๐‘†โˆ–1,๐‘˜โˆ’1,๐‘˜

].

Combining all of these terms gives us
\[
\begin{aligned}
X_{*,*} ={}& -2K_{1,k-1}K_{k-1,k}K_{k,2}K_{2,1}\,\Delta_{S\setminus\{1,2,k-1,k\}} \\
&- \big[\Delta_{1,2}-\Delta_1\Delta_2\big]\big[\Delta_{k-1,k}-\Delta_{k-1}\Delta_k\big]\Delta_{S\setminus\{1,2,k-1,k\}} \\
&+ \big[\Delta_{1,k-1}-\Delta_1\Delta_{k-1}\big]\big[\Delta_{2,k}-\Delta_2\Delta_k\big]\Delta_{S\setminus\{1,2,k-1,k\}} \\
&+ \big[\Delta_{1,2}-\Delta_1\Delta_2\big]\big[\Delta_{S\setminus\{1,2\}}-\Delta_k\Delta_{S\setminus\{1,2,k\}}\big] \\
&+ \big[\Delta_{k-1,k}-\Delta_{k-1}\Delta_k\big]\big[\Delta_{S\setminus\{k-1,k\}}-\Delta_1\Delta_{S\setminus\{1,k-1,k\}}\big].
\end{aligned}
\]


Combining our formula for ๐‘‹*,* with our formula for โˆ†๐‘† leads to the desired result
\[
\begin{aligned}
\Delta_S ={}& \Delta_1\Delta_{S\setminus 1} + \Delta_k\Delta_{S\setminus k} + \big[\Delta_{1,k} - 2\Delta_1\Delta_k\big]\Delta_{S\setminus\{1,k\}} \\
&- 2K_{1,k-1}K_{k-1,k}K_{k,2}K_{2,1}\,\Delta_{S\setminus\{1,2,k-1,k\}} \\
&- \big[\Delta_{1,2}-\Delta_1\Delta_2\big]\big[\Delta_{k-1,k}-\Delta_{k-1}\Delta_k\big]\Delta_{S\setminus\{1,2,k-1,k\}} \\
&+ \big[\Delta_{1,k-1}-\Delta_1\Delta_{k-1}\big]\big[\Delta_{2,k}-\Delta_2\Delta_k\big]\Delta_{S\setminus\{1,2,k-1,k\}} \\
&+ \big[\Delta_{1,2}-\Delta_1\Delta_2\big]\big[\Delta_{S\setminus\{1,2\}}-\Delta_k\Delta_{S\setminus\{1,2,k\}}\big] \\
&+ \big[\Delta_{k-1,k}-\Delta_{k-1}\Delta_k\big]\big[\Delta_{S\setminus\{k-1,k\}}-\Delta_1\Delta_{S\setminus\{1,k-1,k\}}\big] + N.
\end{aligned}
\]

Lemma 9. Let ๐พ โˆˆ ๐’ฆ have two-connected charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–), and

๐ถ = ๐‘–1 ... ๐‘–๐‘˜ ๐‘–1 be a positive simple cycle of ๐บ whose chords ๐›พ(๐ถ) satisfy Properties (i)-(iii)

of Lemma 6, with vertices ordered as in Lemma 7. Let ๐‘ equal the sum of terms in the

Laplace expansion of โˆ†๐‘† , ๐‘† = ๐‘‰ (๐ถ), corresponding to a permutation where ๐œŽ(๐‘–1) = ๐‘–๐‘˜

or ๐œŽ(๐‘–๐‘˜) = ๐‘–1, but not both. Let ๐‘ˆ โŠ‚ ๐‘†2 be the set of pairs (๐‘Ž, ๐‘), ๐‘Ž < ๐‘, for which

๐‘–๐‘Ž, ๐‘–๐‘โˆ’1, ๐‘–๐‘Ž+1, ๐‘–๐‘ โˆˆ ๐›พ(๐ถ), i.e., cycle edges ๐‘–๐‘Ž, ๐‘–๐‘Ž+1 and ๐‘–๐‘โˆ’1, ๐‘–๐‘) have a pair of

crossed chords between them. Then ๐‘ is equal to

2 (โˆ’1)๐‘˜+1๐พ๐‘–๐‘˜,๐‘–1

๐‘˜โˆ’1โˆ๐‘—=1

๐พ๐‘–๐‘— ,๐‘–๐‘—+1

โˆ(๐‘Ž,๐‘)โˆˆ๐‘ˆ

[1โˆ’

๐œ–๐‘–๐‘Ž,๐‘–๐‘Ž+1๐พ๐‘–๐‘โˆ’1,๐‘–๐‘Ž๐พ๐‘–๐‘Ž+1,๐‘–๐‘

๐พ๐‘–๐‘Ž,๐‘–๐‘Ž+1๐พ๐‘–๐‘โˆ’1,๐‘–๐‘

].

Proof. Without loss of generality, suppose that ๐บ is labelled so that ๐ถ = 1 2 ... ๐‘˜ 1. Edge {1, ๐‘˜} satisfies Property (ii) of Lemma 7, and so every positive simple cycle of ๐บ[๐‘†] containing {1, ๐‘˜} spans ๐‘† and contains only edges in ๐ธ(๐ถ) and pairs of crossing chords. Therefore, we may assume without loss of generality that ๐บ[๐‘†] contains ๐‘ pairs of crossing chords, and no other chords. If ๐‘ = 0, then ๐ถ is an induced cycle and the result follows immediately.

There are 2๐‘ Hamiltonian 1 โˆ’ ๐‘˜ paths in ๐บ[๐‘†] not containing 1, ๐‘˜, corresponding

to the ๐‘ binary choices of whether to use each pair of crossing chords or not. Let us

54

denote these paths from 1 to ๐‘˜ by ๐‘ƒ 1,๐‘˜๐œƒ , where ๐œƒ โˆˆ Z๐‘

2, and ๐œƒ๐‘– equals one if the ๐‘–๐‘กโ„Ž crossing

is used in the path (ordered based on increasing distance to 1, ๐‘˜), and zero otherwise.

In particular, ๐‘ƒ 1,๐‘˜0 corresponds to the path = 1 2 ... ๐‘˜. We only consider paths with

the orientation 1 โ†’ ๐‘˜, as all cycles under consideration are positive, and the opposite

orientation has the same sum. Denoting the product of the terms of ๐พ corresponding to the

path ๐‘ƒ 1,๐‘˜๐œƒ = 1 = โ„“1 โ„“2 ... โ„“๐‘˜ = ๐‘˜, by

๐พ(๐‘ƒ 1,๐‘˜๐œƒ

)=

๐‘˜โˆ’1โˆ๐‘—=1

๐พโ„“๐‘— ,โ„“๐‘—+1,

we have

๐‘ = 2 (โˆ’1)๐‘˜+1๐พ๐‘˜,1๐พ(๐‘ƒ 1,๐‘˜0

)โˆ‘๐œƒโˆˆZ๐‘

2

๐พ(๐‘ƒ 1,๐‘˜๐œƒ

)๐พ(๐‘ƒ 1,๐‘˜0

) ,where

๐พ(๐‘ƒ 1,๐‘˜0

)=

๐‘˜โˆ’1โˆ๐‘—=1

๐พ๐‘—,๐‘—+1.

Suppose that the first possible crossing occurs at cycle edges {๐‘Ž, ๐‘Ž + 1} and {๐‘ โˆ’ 1, ๐‘}, ๐‘Ž < ๐‘, i.e., {๐‘Ž, ๐‘ โˆ’ 1} and {๐‘Ž + 1, ๐‘} are crossing chords. Then
\[
N = 2\,(-1)^{k+1} K_{k,1} \prod_{j=1}^{k-1} K_{j,j+1} \sum_{\theta\in\mathbb{Z}_2^p} \frac{K\big(P^{a,b}_\theta\big)}{K\big(P^{a,b}_0\big)}.
\]
We have
\[
\sum_{\theta\in\mathbb{Z}_2^p} K\big(P^{a,b}_\theta\big) = \sum_{\theta_1=0} K\big(P^{a,b}_\theta\big) + \sum_{\theta_1=1} K\big(P^{a,b}_\theta\big),
\]

and
\[
\sum_{\theta_1=0} K\big(P^{a,b}_\theta\big) = K_{a,a+1}K_{b-1,b} \sum_{\theta'\in\mathbb{Z}_2^{p-1}} K\big(P^{a+1,b-1}_{\theta'}\big),
\]
\[
\sum_{\theta_1=1} K\big(P^{a,b}_\theta\big) = K_{a,b-1}K_{a+1,b} \sum_{\theta'\in\mathbb{Z}_2^{p-1}} K\big(P^{b-1,a+1}_{\theta'}\big) = K_{a,b-1}K_{a+1,b} \sum_{\theta'\in\mathbb{Z}_2^{p-1}} K\big(P^{a+1,b-1}_{\theta'}\big)\,\epsilon\big(P^{a+1,b-1}_{\theta'}\big).
\]

By Property (i) of Lemma 6, $\epsilon\big(C(a, b-1)\big) = -1$, and so
\[
\epsilon\big(P^{a+1,b-1}_{\theta'}\big) = -\epsilon_{b-1,a}\,\epsilon_{a,a+1} \qquad\text{and}\qquad K_{a,b-1}\,\epsilon\big(P^{a+1,b-1}_{\theta'}\big) = -\epsilon_{a,a+1}\,K_{b-1,a}.
\]

This implies that
\[
\sum_{\theta\in\mathbb{Z}_2^p} K\big(P^{a,b}_\theta\big) = \big(K_{a,a+1}K_{b-1,b} - \epsilon_{a,a+1}K_{b-1,a}K_{a+1,b}\big) \sum_{\theta'\in\mathbb{Z}_2^{p-1}} K\big(P^{a+1,b-1}_{\theta'}\big),
\]
and that
\[
\sum_{\theta\in\mathbb{Z}_2^p} \frac{K\big(P^{a,b}_\theta\big)}{K\big(P^{a,b}_0\big)} = \left(1 - \frac{\epsilon_{a,a+1}K_{b-1,a}K_{a+1,b}}{K_{a,a+1}K_{b-1,b}}\right) \sum_{\theta'\in\mathbb{Z}_2^{p-1}} \frac{K\big(P^{a+1,b-1}_{\theta'}\big)}{K\big(P^{a+1,b-1}_0\big)}.
\]
Repeating the above procedure for the remaining $p-1$ crossings completes the proof.

Equipped with Lemmas 6, 7, 8, and 9, we can now make a key observation. Suppose that

we have a simple cycle basis ๐‘ฅ1, ..., ๐‘ฅ๐œˆโˆ’1 for ๐’ž+(๐บ) whose corresponding cycles all satisfy

Properties (i)-(iii) of Lemma 6. Of course, a minimal simple cycle basis satisfies this, but in

Subsections 2.3.2 and 2.3.3 we will consider alternate bases that also satisfy these conditions

and may be easier to compute in practice. For cycles ๐ถ of length at most four, we have

already detailed how to compute ๐‘ (๐ถ), and this computation requires only principal minors

corresponding to subsets of ๐‘‰ (๐ถ). When ๐ถ is of length greater than four, by Lemmas 8

and 9, we can also compute ๐‘ (๐ถ), using only principal minors corresponding to subsets

of ๐‘‰ (๐ถ), as the quantity sgn(๐พ๐‘–1,๐‘–๐‘˜โˆ’1๐พ๐‘–๐‘˜โˆ’1,๐‘–๐‘˜๐พ๐‘–๐‘˜,๐‘–2๐พ๐‘–2,๐‘–1) in Lemma 8 corresponds to a

positive four-cycle, and in Lemma 9 the quantity

\[
\operatorname{sgn}\!\left( 1 - \frac{\epsilon_{i_a,i_{a+1}}\, K_{i_{b-1},i_a} K_{i_{a+1},i_b}}{K_{i_a,i_{a+1}} K_{i_{b-1},i_b}} \right)
\]
equals $+1$ if $|K_{i_a,i_{a+1}}K_{i_{b-1},i_b}| > |K_{i_{b-1},i_a}K_{i_{a+1},i_b}|$, and equals
\[
-\epsilon_{i_a,i_{a+1}}\epsilon_{i_{b-1},i_b}\, \operatorname{sgn}\big( K_{i_a,i_{a+1}} K_{i_{a+1},i_b} K_{i_b,i_{b-1}} K_{i_{b-1},i_a} \big)
\]
if $|K_{i_a,i_{a+1}}K_{i_{b-1},i_b}| < |K_{i_{b-1},i_a}K_{i_{a+1},i_b}|$. By Condition (2.6), we have $|K_{i_a,i_{a+1}}K_{i_{b-1},i_b}| \ne |K_{i_{b-1},i_a}K_{i_{a+1},i_b}|$. Therefore, given a simple cycle basis ๐‘ฅ1, ..., ๐‘ฅ๐œˆโˆ’1 whose corresponding

cycles satisfy Properties (i)-(iii) of Lemma 6, we can compute ๐‘ (๐ถ) for every such cycle

in the basis using only principal minors of size at most the length of the longest cycle in

the basis. Given this fact, we are now prepared to answer Question (2.7) and provide an

alternate proof for Question (2.8) through the following proposition and theorem.

Proposition 4. Let ๐พ โˆˆ ๐’ฆ, with charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–). The set of ๐พ โ€ฒ โˆˆ ๐’ฆ

that satisfy โˆ†๐‘†(๐พ) = โˆ†๐‘†(๐พ โ€ฒ) for all ๐‘† โŠ‚ [๐‘ ] is exactly the set generated by ๐พ and the

operations

๐’Ÿ๐‘ -similarity: ๐พ โ†’ ๐ท๐พ๐ท, where ๐ท โˆˆ R๐‘ร—๐‘ is an arbitrary involutory diagonal

matrix, i.e., ๐ท has entries ยฑ1 on the diagonal, and

block transpose: ๐พ โ†’ ๐พ โ€ฒ, where ๐พ โ€ฒ๐‘–,๐‘— = ๐พ๐‘—,๐‘– for all ๐‘–, ๐‘— โˆˆ ๐‘‰ (๐ป) for some block ๐ป , and

๐พ โ€ฒ๐‘–,๐‘— = ๐พ๐‘–,๐‘— otherwise.

Proof. We first verify that the ๐’Ÿ๐‘ -similarity and block transpose operations both preserve

principal minors. The determinant is multiplicative, and so principal minors are immediately

preserved under ๐’Ÿ๐‘ -similarity, as the principal submatrix of ๐ท๐พ๐ท corresponding to ๐‘† is

equal to the product of the principal submatrices corresponding to ๐‘† of the three matrices ๐ท,

๐พ, ๐ท, and so

โˆ†๐‘†(๐ท๐พ๐ท) = โˆ†๐‘†(๐ท)โˆ†๐‘†(๐พ)โˆ†๐‘†(๐ท) = โˆ†๐‘†(๐พ).

For the block transpose operation, we note that the transpose leaves the determinant unchanged. By equation (2.10), the principal minors of a matrix are uniquely determined by the principal minors of the matrices corresponding to the blocks of ๐บ. As the transpose of any block also leaves the principal minors corresponding to subsets of other blocks unchanged, principal minors are preserved under the block transpose operation.
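Both invariances are easy to check numerically. The sketch below is our own illustration, not code from the thesis; `principal_minors` is a helper we introduce, and the random matrix is treated as a single block, so the block transpose reduces to a full transpose.

```python
# Numerical sanity check (our sketch) that the two operations of
# Proposition 4 preserve every principal minor of a small matrix.
import itertools

import numpy as np

def principal_minors(K):
    """Map each nonempty index subset S to det(K[S, S])."""
    n = K.shape[0]
    return {S: np.linalg.det(K[np.ix_(S, S)])
            for r in range(1, n + 1)
            for S in itertools.combinations(range(n), r)}

rng = np.random.default_rng(0)
K = rng.standard_normal((4, 4))

D = np.diag([1.0, -1.0, -1.0, 1.0])   # involutory diagonal matrix
K_sim = D @ K @ D                      # D_N-similarity
K_bt = K.T                             # transpose of a (single-block) matrix

ref = principal_minors(K)
for K2 in (K_sim, K_bt):
    out = principal_minors(K2)
    assert all(abs(ref[S] - out[S]) < 1e-9 for S in ref)
```

For a matrix whose sparsity graph has several blocks, the block transpose would transpose only the principal submatrix of one block and leave the rest fixed.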


What remains is to show that if ๐พ โ€ฒ satisfies โˆ†๐‘†(๐พ) = โˆ†๐‘†(๐พ โ€ฒ) for all ๐‘† โŠ‚ [๐‘ ], then ๐พ โ€ฒ

is generated by ๐พ and the above two operations. Without loss of generality, we may suppose

that the shared charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–) of ๐พ and ๐พ โ€ฒ is two-connected, as the

general case follows from this one. We will transform ๐พ โ€ฒ to ๐พ by first making them agree

for entries corresponding to a spanning tree and a negative edge of ๐บ, and then observing

that by Lemmas 8 and 9, this property implies that all entries agree.

Let ๐ถ โ€ฒ be an arbitrary negative simple cycle of ๐บ, ๐‘’ โˆˆ ๐ธ(๐ถ โ€ฒ) be a negative edge in ๐ถ โ€ฒ,

and ๐‘‡ be a spanning tree of ๐บ containing the edges ๐ธ(๐ถ โ€ฒ) โˆ– ๐‘’. By applying ๐’Ÿ๐‘ -similarity

and block transpose to ๐พ โ€ฒ, we can produce a matrix that agrees with ๐พ for all entries

corresponding to edges in ๐ธ(๐‘‡ ) โˆช ๐‘’. We perform this procedure in three steps, first by

making ๐พ โ€ฒ agree with ๐พ for edges in ๐ธ(๐‘‡ ), then for edge ๐‘’, and then finally fixing any

edges in ๐ธ(๐‘‡ ) for which the matrices no longer agree.

First, we make ๐พ โ€ฒ and ๐พ agree for edges in ๐ธ(๐‘‡ ). Suppose that ๐พ โ€ฒ๐‘,๐‘ž = โˆ’๐พ๐‘,๐‘ž for some

๐‘, ๐‘ž โˆˆ ๐ธ(๐‘‡ ). Let ๐‘ˆ โŠ‚ [๐‘ ] be the set of vertices connected to ๐‘ in the forest ๐‘‡ โˆ– ๐‘, ๐‘ž

(the removal of edge ๐‘, ๐‘ž from ๐‘‡ ), and be the diagonal matrix with ๐‘–,๐‘– = +1 if ๐‘– โˆˆ ๐‘ˆ

and ๐‘–,๐‘– = โˆ’1 if ๐‘– โˆˆ ๐‘ˆ . The matrix ๐พ โ€ฒ satisfies

[๐พ โ€ฒ

]๐‘,๐‘ž

= ๐‘,๐‘๐พโ€ฒ๐‘,๐‘ž๐‘ž,๐‘ž = โˆ’๐พ โ€ฒ

๐‘,๐‘ž = ๐พ๐‘,๐‘ž,

and[๐พ โ€ฒ

]๐‘–,๐‘—

= ๐พ โ€ฒ๐‘–,๐‘— for any ๐‘–, ๐‘— either both in ๐‘ˆ or neither in ๐‘ˆ (and therefore for all

edges ๐‘–, ๐‘— โˆˆ ๐ธ(๐‘‡ ) โˆ– ๐‘, ๐‘ž). Repeating this procedure for every edge ๐‘, ๐‘ž โˆˆ ๐ธ(๐‘‡ ) for

which ๐พ โ€ฒ๐‘,๐‘ž = โˆ’๐พ๐‘,๐‘ž results in a matrix that agrees with ๐พ for all entries corresponding to

edges in ๐ธ(๐‘‡ ).

Next, we make our matrix and ๐พ agree for the edge ๐‘’. If our matrix already agrees for edge ๐‘’, then we are done with this part of the proof, and we denote the resulting matrix by $\widehat{K}$. If our resulting matrix does not agree, then, by taking the transpose of this matrix, we have a new matrix that now agrees with ๐พ for all edges ๐ธ(๐‘‡) โˆช ๐‘’, except for negative edges in ๐ธ(๐‘‡). By repeating the ๐’Ÿ๐‘ -similarity operation again on negative edges of ๐ธ(๐‘‡), we again obtain a matrix that agrees with ๐พ on the edges of ๐ธ(๐‘‡), but now also agrees on the edge ๐‘’, as there is an even number of negative edges in the path between the vertices of ๐‘’ in the tree ๐‘‡. We now have a matrix that agrees with ๐พ for all entries corresponding to edges in ๐ธ(๐‘‡) โˆช ๐‘’, and we denote this matrix by $\widehat{K}$.

Finally, we aim to show that agreement on the edges ๐ธ(๐‘‡) โˆช ๐‘’ already implies that $\widehat{K} = K$. Let {๐‘–, ๐‘—} be an arbitrary edge not in ๐ธ(๐‘‡) โˆช ๐‘’, and ๐ถ be the simple cycle containing edge {๐‘–, ๐‘—} and the unique ๐‘– โˆ’ ๐‘— path in the tree ๐‘‡. By Lemmas 8 and 9, the value of $s_K(C)$ can be computed for every cycle corresponding to an incidence vector in a minimal simple cycle basis of ๐’ž+(๐บ) using only principal minors. Then $s_K(C)$ can be computed using principal minors and $s_K(C')$, as the incidence vector for ๐ถ โ€ฒ combined with a minimal positive simple cycle basis forms a basis for ๐’ž(๐บ). By assumption, $\Delta_S(K) = \Delta_S(\widehat{K})$ for all ๐‘† โŠ‚ [๐‘ ], and, by construction, $s_K(C') = s_{\widehat{K}}(C')$. Therefore, $s_K(C) = s_{\widehat{K}}(C)$ for all {๐‘–, ๐‘—} not in ๐ธ(๐‘‡) โˆช ๐‘’, and so $\widehat{K} = K$.

Theorem 5. Let ๐พ โˆˆ ๐’ฆ, with charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–). Let โ„“+ โ‰ฅ 3 be the simple cycle sparsity of ๐’ž+(๐บ). Then any matrix ๐พ โ€ฒ โˆˆ ๐’ฆ with โˆ†๐‘†(๐พ) = โˆ†๐‘†(๐พ โ€ฒ) for all |๐‘†| โ‰ค โ„“+ also satisfies โˆ†๐‘†(๐พ) = โˆ†๐‘†(๐พ โ€ฒ) for all ๐‘† โŠ‚ [๐‘ ], and there exists a matrix $\widehat{K} \in \mathcal{K}$ with $\Delta_S(\widehat{K}) = \Delta_S(K)$ for all |๐‘†| < โ„“+ but $\Delta_S(\widehat{K}) \ne \Delta_S(K)$ for some ๐‘†.

Proof. The first part of the theorem follows almost immediately from Lemmas 8 and 9. It suffices to consider a matrix ๐พ โˆˆ ๐’ฆ with a two-connected charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–), as the more general case follows from this one. By Lemmas 8 and 9, the

quantity ๐‘ (๐ถ) is computable for all simple cycles ๐ถ in a minimal cycle basis for ๐’ž+(๐บ)

(and therefore for all positive simple cycles), using only principal minors of size at most

the length of the longest cycle in the basis, which in this case is the simple cycle sparsity

โ„“+ of ๐’ž+(๐บ). The values ๐‘ (๐ถ) for positive simple cycles combined with the magnitude of

the entries of ๐พ and the charge function ๐œ– uniquely determines all principal minors, as each

term in the Laplace expansion of some โˆ†๐‘†(๐พ) corresponds to a partitioning of ๐‘† into a

disjoint collection of vertices, pairs corresponding to edges, and oriented versions of positive

simple cycles of ๐ธ(๐‘†). This completes the first portion of the proof.

Next, we explicitly construct a matrix that agrees with ๐พ in the first โ„“+ โˆ’ 1 principal

minors (โ„“+ โ‰ฅ 3), but disagrees on a principal minor of order โ„“+. To do so, we consider a


minimal simple cycle basis $\chi_{C_1}, \dots, \chi_{C_{\nu-1}}$ of ๐’ž+(๐บ), ordered so that $|C_1| \ge |C_2| \ge \dots \ge |C_{\nu-1}|$. By definition, โ„“+ = |๐ถ1|. Let $\widehat{K}$ be a matrix satisfying
\[
\widehat{K}_{i,i} = K_{i,i} \ \text{for } i \in [N], \qquad \widehat{K}_{i,j}\widehat{K}_{j,i} = K_{i,j}K_{j,i} \ \text{for } i, j \in [N],
\]
and
\[
s_{\widehat{K}}(C_i) = \begin{cases} s_K(C_i) & \text{if } i > 1, \\ -s_K(C_i) & \text{if } i = 1. \end{cases}
\]

The matrix and ๐พ agree on all principal minors of order at most โ„“+ โˆ’ 1, for if there was

a principal minor where they disagreed, this would imply that there is a positive simple

cycle of length less than โ„“+ whose incidence vector is not in the span of ๐œ’๐ถ2 , ..., ๐œ’๐ถ๐œˆโˆ’1, a

contradiction. To complete the proof, it suffices to show that โˆ†๐‘‰ (๐ถ1)() = โˆ†๐‘‰ (๐ถ1)(๐พ).

To do so, we consider three different cases, depending on the length of ๐ถ1. If ๐ถ1 is

a three-cycle, then ๐ถ1 is an induced cycle and the result follows immediately. If ๐ถ1 is a

four-cycle, ๐บ[๐‘‰ (๐ถ1)] may possibly have multiple positive four-cycles. However, in this

case the quantity ๐‘ from Equation (2.11) is distinct for and ๐พ, as ๐พ โˆˆ ๐’ฆ. Finally,

we consider the general case when |๐ถ1| > 4. By Lemma 8, all terms of โˆ†๐‘‰ (๐ถ)(๐พ) (and

โˆ†๐‘‰ (๐ถ)()) except for ๐‘ depend only on principal minors of order at most โ„“+ โˆ’ 1. We

denote the quantity ๐‘ from Lemma 8 for ๐พ and by ๐‘ and ๐‘ respectively. By Lemma 9,

the magnitude of ๐‘ and ๐‘ depend only on principal minors of order at most four, and so

|๐‘| = |๐‘|. In addition, because ๐พ โˆˆ ๐’ฆ, ๐‘ = 0. The quantities ๐‘ and ๐‘ are equal in sign if

and only if ๐‘ (๐ถ1) = ๐‘ ๐พ(๐ถ1), and therefore โˆ†๐‘‰ (๐ถ1)() = โˆ†๐‘‰ (๐ถ1)(๐พ).

2.3.2 Efficiently Recovering a Matrix from its Principal Minors

In Subsection 2.3.1, we characterized both the set of magnitude-symmetric matrices in ๐’ฆ

that share a given set of principal minors, and noted that principal minors of order โ‰ค โ„“+

uniquely determine principal minors of all orders, where โ„“+ is the simple cycle sparsity of

๐’ž+(๐บ) and is computable using principal minors of order one and two. In this subsection,

we make use of a number of theoretical results of Subsection 2.3.1 to formally describe a


polynomial time algorithm to produce a magnitude-symmetric matrix ๐พ with prescribed

principal minors. We provide a high-level description and discussion of this algorithm below,

and save the description of a key subroutine for computing a positive simple cycle basis for

after the overall description and proof.

An Efficient Algorithm

Our formal algorithm proceeds by completing a number of high-level tasks, which we

describe below. This procedure is very similar in nature to the algorithm implicitly described

and used in the proofs of Proposition 4 and Theorem 5. The main difference between the

algorithmic procedure alluded to in Subsection 2.3.1 and this algorithm is the computation

of a positive simple cycle basis. Unlike ๐’ž(๐บ), there is no known polynomial time algorithm

to compute a minimal simple cycle basis for ๐’ž+(๐บ), and a decision version of this problem

may indeed be NP-hard. Our algorithm, which we denote by RECOVERK$\big(\{\Delta_S\}_{S\subset[N]}\big)$, proceeds in five main steps:

Step 1: Compute |๐พ๐‘–,๐‘—|, ๐‘–, ๐‘— โˆˆ [๐‘ ], and the charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–).

We recall that ๐พ๐‘–,๐‘– = โˆ†๐‘– and $|K_{i,j}| = |K_{j,i}| = \sqrt{|\Delta_i\Delta_j - \Delta_{i,j}|}$. The edges {๐‘–, ๐‘—} โˆˆ ๐ธ(๐บ) correspond to non-zero off-diagonal entries, i.e., $|K_{i,j}| \ne 0$, and the function ๐œ– is given by ๐œ–๐‘–,๐‘— = sgn(โˆ†๐‘–โˆ†๐‘— โˆ’ โˆ†๐‘–,๐‘—).

Step 2: For every block ๐ป of ๐บ, compute a simple cycle basis ๐‘ฅ1, ..., ๐‘ฅ๐‘˜ of ๐’ž+(๐ป).

In the proof of Proposition 3, we defined an efficient algorithm to compute a simple

cycle basis of ๐’ž+(๐ป) for any two-connected graph ๐ป . This algorithm makes use of the

property that every two-connected graph has an open ear decomposition. Unfortunately,

this algorithm has no provable guarantees on the length of the longest cycle in the basis.

For this reason, we introduce an alternate efficient algorithm below that computes a

simple cycle basis of ๐’ž+(๐ป) consisting of cycles all of length at most 3๐œ‘๐ป , where ๐œ‘๐ป is


the maximum length of a shortest cycle between any two edges in ๐ป, i.e.,
\[
\varphi_H := \max_{e, e' \in E(H)} \ \min_{\substack{\text{simple cycle } C \\ \text{s.t. } e, e' \in E(C)}} |C|,
\]
with ๐œ‘๐ป := 2 if ๐ป is acyclic. The existence of a simple cycle through any two edges

of a 2-connected graph can be deduced from the existence of two vertex-disjoint paths

between newly added midpoints of these two edges. The resulting simple cycle basis

also maximizes the number of three- and four-cycles it contains. This is a key property

which allows us to limit the number of principal minors that we query.
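For intuition, $\varphi_H$ can be computed on small graphs by brute-force enumeration of simple cycles. The helper below is our own exponential-time illustration of the definition, not the efficient procedure the thesis relies on.

```python
# Brute-force phi_H on a small undirected graph (adjacency-list dict):
# enumerate all simple cycles, then take, over all pairs of edges, the
# length of the shortest simple cycle through both. Illustration only.
from itertools import combinations

def simple_cycles(adj):
    """All simple cycles, each as a frozenset of edges (frozenset pairs)."""
    cycles = set()
    def dfs(start, v, visited, path_edges):
        for w in adj[v]:
            e = frozenset((v, w))
            if w == start and len(path_edges) >= 2:
                cycles.add(frozenset(path_edges | {e}))
            elif w not in visited and w > start:
                dfs(start, w, visited | {w}, path_edges | {e})
    for s in sorted(adj):
        dfs(s, s, {s}, frozenset())
    return cycles

def phi(adj):
    cycles = simple_cycles(adj)
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    best = 2                                   # phi_H := 2 if H is acyclic
    for e, f in combinations(edges, 2):
        through = [len(C) for C in cycles if e in C and f in C]
        if through:
            best = max(best, min(through))
    return best
```

For example, on a four-cycle with one chord, the two cycle edges not meeting the chord lie on no common triangle, so phi is 4.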

Step 3: For every block ๐ป of ๐บ, convert ๐‘ฅ1, ..., ๐‘ฅ๐‘˜ into a simple cycle basis satisfying

Properties (i)-(iii) of Lemma 6.

If there was an efficient algorithm for computing a minimal simple cycle basis for

๐’ž+(๐ป), by Lemma 6, we would be done (and there would be no need for the previous

step). However, there is currently no known algorithm for this, and a decision version of

this problem may be NP-hard. By using the simple cycle basis from Step 2 and iteratively

removing chords, we can create a basis that satisfies the same three key properties (in

Lemma 6) that a minimal simple cycle basis does. In addition, the lengths of the cycles

in this basis are no larger than those of Step 2, i.e., no procedure in Step 3 ever increases

the length of a cycle.

The procedure for this is quite intuitive. Given a cycle ๐ถ in the basis, we efficiently

check that ๐›พ(๐ถ) satisfies Properties (i)-(iii) of Lemma 6. If all properties hold, we are

done, and check another cycle in the basis, until all cycles satisfy the desired properties.

In each case, if a given property does not hold, then, by the proof of Lemma 6, we can

efficiently compute an alternate cycle ๐ถ โ€ฒ, |๐ถ โ€ฒ| < |๐ถ|, that can replace ๐ถ in the simple

cycle basis for ๐’ž+(๐ป), decreasing the sum of cycle lengths in the basis by at least one.

Because of this, the described procedure is a polynomial time algorithm.


Step 4: For every block ๐ป of ๐บ, compute ๐‘ (๐ถ) for every cycle ๐ถ in the basis.

This calculation relies heavily on the results of Subsection 2.3.1. The simple cycle basis

for ๐’ž+(๐ป) satisfies Properties (i)-(iii) of Lemma 6, and so we may apply Lemmas 8

and 9. We compute the quantities ๐‘ (๐ถ) iteratively based on cycle length, beginning

with the shortest cycle in the basis and finishing with the longest. By the analysis at

the beginning of Subsection 2.3.1, we recall that when ๐ถ is a three- or four-cycle, ๐‘ (๐ถ)

can be computed efficiently using ๐‘‚(1) principal minors, all corresponding to subsets

of ๐‘‰ (๐ถ). When |๐ถ| > 4, by Lemma 8, the quantity ๐‘ (defined in Lemma 8) can be

computed efficiently using ๐‘‚(1) principal minors, all corresponding to subsets of ๐‘‰ (๐ถ).

By Lemma 9, the quantity ๐‘ (๐ถ) can be computed efficiently using ๐‘, ๐‘ (๐ถ โ€ฒ) for positive

four-cycles ๐ถ โ€ฒ โŠ‚ ๐บ[๐‘‰ (๐ถ)], and ๐‘‚(1) principal minors all corresponding to subsets of

๐‘‰ (๐ถ). Because our basis maximizes the number of three- and four-cycles it contains,

any such four-cycle ๐ถ โ€ฒ is either in the basis (and so ๐‘ (๐ถ โ€ฒ) has already been computed)

or is a linear combination of three- and four-cycles in the basis, in which case ๐‘ (๐ถ โ€ฒ) can

be computed using Gaussian elimination, without any additional querying of principal

minors.

Step 5: Output a matrix ๐พ satisfying โˆ†๐‘†(๐พ) = โˆ†๐‘† for all ๐‘† โŠ‚ [๐‘ ].

The procedure for producing this matrix is quite similar to the proof of Proposition 4. It suffices to fix the signs of the upper triangular entries of ๐พ, as the lower triangular entries can be computed using ๐œ–. For each block ๐ป, we find a negative simple cycle ๐ถ (if one exists), fix a negative edge ๐‘’ โˆˆ ๐ธ(๐ถ), and extend ๐ธ(๐ถ)โˆ–๐‘’ to a spanning tree ๐‘‡ of ๐ป, i.e., ๐ธ(๐ถ)โˆ–๐‘’ โŠ‚ ๐ธ(๐‘‡). We give the entries ๐พ๐‘–,๐‘—, ๐‘– < ๐‘—, {๐‘–, ๐‘—} โˆˆ ๐ธ(๐‘‡) โˆช ๐‘’ an arbitrary

sign, and note our choice of ๐‘ ๐พ(๐ถ). We extend the simple cycle basis for ๐’ž+(๐ป) to a

basis for ๐’ž(๐ป) by adding ๐ถ. On the other hand, if no negative cycle exists, then we

simply fix an arbitrary spanning tree ๐‘‡, and give the entries ๐พ๐‘–,๐‘—, ๐‘– < ๐‘—, {๐‘–, ๐‘—} โˆˆ ๐ธ(๐‘‡)

an arbitrary sign. In both cases, we have a basis for ๐’ž(๐ป) consisting of cycles ๐ถ๐‘– for


which we have computed ๐‘ (๐ถ๐‘–). For each edge {๐‘–, ๐‘—} โˆˆ ๐ธ(๐ป) corresponding to an entry ๐พ๐‘–,๐‘— whose sign we have not yet fixed, we consider the cycle ๐ถ โ€ฒ consisting of the edge {๐‘–, ๐‘—} and the unique ๐‘– โˆ’ ๐‘— path in ๐‘‡. Using Gaussian elimination, we can write ๐ถ โ€ฒ

as a sum of a subset of the cycles in our basis. As noted in Subsection 2.3.1, the quantity

๐‘ (๐ถ โ€ฒ) is simply the product of the quantities ๐‘ (๐ถ๐‘–) for cycles ๐ถ๐‘– in this sum.
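The Gaussian-elimination step can be sketched over GF(2) with incidence vectors encoded as integer bitmasks over the edge set; the helper names below are ours, not the thesis's.

```python
# Our sketch of the elimination step of Step 5: express chi_{C'} in the
# cycle basis over GF(2); s(C') is then the product of the corresponding
# s(C_i) for the basis cycles appearing in the decomposition.

def decompose_gf2(target, basis):
    """Indices I such that the XOR of basis[i], i in I, equals target; else None."""
    pivot_rows = {}                          # top bit -> (vector, combination)
    for i, v in enumerate(basis):
        c = 1 << i                           # tracks which inputs were mixed in
        while v:
            b = v.bit_length() - 1
            if b not in pivot_rows:
                pivot_rows[b] = (v, c)
                break
            pv, pc = pivot_rows[b]
            v ^= pv
            c ^= pc
    used = 0
    while target:
        b = target.bit_length() - 1
        if b not in pivot_rows:
            return None                      # target not in the span
        pv, pc = pivot_rows[b]
        target ^= pv
        used ^= pc
    return [i for i in range(len(basis)) if used >> i & 1]

def sign_of_cycle(target, basis, s_values):
    """s(C') as the product of s(C_i) over the decomposition of chi_{C'}."""
    prod = 1
    for i in decompose_gf2(target, basis):
        prod *= s_values[i]
    return prod

# Example: chi_{C'} = 0b1010 is the XOR of the two basis vectors below.
assert decompose_gf2(0b1010, [0b0111, 0b1101]) == [0, 1]
assert sign_of_cycle(0b1010, [0b0111, 0b1101], [1, -1]) == -1
```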

A few comments are in order. Conditional on the analysis of Step 2, the above algorithm runs in time polynomial in ๐‘. Of course, the set $\{\Delta_S\}_{S\subset[N]}$ is not polynomial in ๐‘, but rather than take the entire set as input, we assume the existence of some querying operation, in which the value of any principal minor can be queried in polynomial time. Combining the analysis of each step, we can give the following guarantee for the RECOVERK$\big(\{\Delta_S\}_{S\subset[N]}\big)$ algorithm.

Theorem 6. Let $\{\Delta_S\}_{S\subset[N]}$ be the principal minors of some matrix in ๐’ฆ. The algorithm RECOVERK$\big(\{\Delta_S\}_{S\subset[N]}\big)$ computes a matrix ๐พ โˆˆ ๐’ฆ satisfying โˆ†๐‘†(๐พ) = โˆ†๐‘† for all ๐‘† โŠ‚ [๐‘ ]. This algorithm runs in time polynomial in ๐‘, and queries at most $O(N^2)$ principal minors, all of order at most 3๐œ‘๐บ, where ๐บ is the sparsity graph of $\{\Delta_S\}_{|S|\le 2}$ and ๐œ‘๐บ is the maximum of ๐œ‘๐ป over all blocks ๐ป of ๐บ. In addition, there exists a matrix ๐พ โ€ฒ โˆˆ ๐’ฆ, |๐พ โ€ฒ๐‘–,๐‘—| = |๐พ๐‘–,๐‘—|, ๐‘–, ๐‘— โˆˆ [๐‘ ], such that any algorithm that computes a matrix with principal minors $\{\Delta_S(K')\}_{S\subset[N]}$ must query a principal minor of order at least ๐œ‘๐บ.

Proof. Conditional on the existence of the algorithm described in Step 2, i.e., an efficient algorithm to compute a simple cycle basis for ๐’ž+(๐ป) consisting of cycles of length at most 3๐œ‘๐ป that maximizes the number of three- and four-cycles it contains, by the above analysis we have already shown that the RECOVERK$\big(\{\Delta_S\}_{S\subset[N]}\big)$ algorithm runs in time polynomial in ๐‘ and queries at most $O(N^2)$ principal minors.

To construct ๐พ โ€ฒ, we first consider the uncharged sparsity graph ๐บ = ([๐‘ ], ๐ธ) of ๐พ, and

let ๐‘’, ๐‘’โ€ฒ โˆˆ ๐ธ(๐บ) be a pair of edges for which the quantity ๐œ‘๐บ is achieved (if ๐œ‘๐บ = 2, we

are done). We aim to define an alternate charge function for ๐บ, show that the simple cycle

sparsity of ๐’ž+(๐บ) is at least ๐œ‘๐บ, find a matrix ๐พ โ€ฒ that has this charged sparsity graph, and


then make use of Theorem 5. Consider the charge function ๐œ– satisfying ๐œ–(๐‘’) = ๐œ–(๐‘’โ€ฒ) = โˆ’1,

and equal to +1 otherwise. Any simple cycle basis for the block containing ๐‘’, ๐‘’โ€ฒ must have

a simple cycle containing both edges ๐‘’, ๐‘’โ€ฒ. By definition, the length of this cycle is at least

๐œ‘๐บ, and so the simple cycle sparsity of ๐’ž+(๐บ) is at least ๐œ‘๐บ. Next, let ๐พ โ€ฒ be an arbitrary

matrix with |๐พ โ€ฒ๐‘–,๐‘—| = |๐พ๐‘–,๐‘—|, ๐‘–, ๐‘— โˆˆ [๐‘ ], and charged sparsity graph ๐บ = ([๐‘ ], ๐ธ, ๐œ–), with ๐œ–

as defined above. Since ๐พ โˆˆ ๐’ฆ, we have ๐พ โ€ฒ โˆˆ ๐’ฆ, and therefore, by Theorem 5, any algorithm that computes a matrix with principal minors $\{\Delta_S(K')\}_{S\subset[N]}$ must query a principal minor of order at least โ„“+ โ‰ฅ ๐œ‘๐บ.

Computing a Simple Cycle Basis with Provable Guarantees

Next, we describe an efficient algorithm to compute a simple cycle basis for ๐’ž+(๐ป)

consisting of cycles of length at most 3๐œ‘๐ป . Suppose we have a two-connected graph

๐ป = ([๐‘ ], ๐ธ, ๐œ–). We first compute a minimal cycle basis ๐œ’๐ถ1 , ..., ๐œ’๐ถ๐œˆ of ๐’ž(๐ป), where

each ๐ถ๐‘– is an induced simple cycle (as argued previously, any lexicographically minimal

basis for ๐’ž(๐ป) consists only of induced simple cycles). Without loss of generality, sup-

pose that ๐ถ1 is the shortest negative cycle in the basis (if no negative cycle exists, we are

done). The set of incidence vectors ๐œ’๐ถ1 + ๐œ’๐ถ๐‘–, ๐œ–(๐ถ๐‘–) = โˆ’1, combined with the vectors ๐œ’๐ถ๐‘—, ๐œ–(๐ถ๐‘—) = +1, forms a basis for ๐’ž+(๐ป), which we denote by โ„ฌ. We will build a simple cycle basis for ๐’ž+(๐ป) by iteratively choosing incidence vectors of the form ๐œ’๐ถ1 + ๐œ’๐ถ๐‘–, ๐œ–(๐ถ๐‘–) = โˆ’1, replacing each with an incidence vector $\chi_{\widehat{C}_i}$ corresponding to a positive simple cycle $\widehat{C}_i$, and iteratively updating โ„ฌ.

Let ๐ถ := ๐ถ1 + ๐ถ๐‘– be a positive cycle in our basis โ„ฌ. If ๐ถ is a simple cycle, we are done,

otherwise ๐ถ is the union of edge-disjoint simple cycles ๐น1, ..., ๐น๐‘ for some ๐‘ > 1. If one of

these simple cycles is positive and satisfies
\[
\chi_{F_j} \in \operatorname{span}\big\{ \mathcal{B} \setminus \{\chi_{C_1} + \chi_{C_i}\} \big\},
\]

we are done. Otherwise there is a negative simple cycle, without loss of generality given by

๐น1. We can construct a set of ๐‘ โˆ’ 1 positive cycles by adding ๐น1 to each negative simple

cycle, and at least one of these cycles is not in $\operatorname{span}\{\mathcal{B}\setminus\{\chi_{C_1}+\chi_{C_i}\}\}$. We now have a

positive cycle not in this span, given by the sum of two edge-disjoint negative simple cycles

๐น1 and ๐น๐‘— . These two cycles satisfy, by construction, |๐น1| + |๐น๐‘—| โ‰ค |๐ถ| โ‰ค 2โ„“, where โ„“ is

the cycle sparsity of ๐’ž(๐ป). If |๐‘‰ (๐น1) โˆฉ ๐‘‰ (๐น๐‘—)| > 1, then ๐น1 โˆช ๐น๐‘— is two-connected, and we

may compute a simple cycle basis for ๐’ž+(๐น1 โˆช ๐น๐‘—) using the ear decomposition approach of

Proposition 3. At least one positive simple cycle in this basis satisfies our desired condition,

and the length of this positive simple cycle is at most |๐ธ(๐น1 โˆช ๐น๐‘—)| โ‰ค 2โ„“. If ๐น1 โˆช ๐น๐‘— is not two-connected, then we compute a shortest cycle $\widehat{C}$ in ๐ป containing both an edge in ๐ธ(๐น1) and an edge in ๐ธ(๐น๐‘—) (computed using Suurballeโ€™s algorithm, see [109] for details). The graph
\[
H' = \big( V(F_1) \cup V(F_j) \cup V(\widehat{C}),\; E(F_1) \cup E(F_j) \cup E(\widehat{C}) \big)
\]
is two-connected, and we can repeat the same procedure as above for ๐ป โ€ฒ, where, in this case,

the resulting cycle is of length at most |๐ธ(๐ป โ€ฒ)| โ‰ค 2โ„“ + ๐œ‘๐ป . The additional guarantee for

maximizing the number of three- and four-cycles can be obtained easily by simply listing

all positive cycles of length three and four, combining them with the computed basis, and

performing a greedy algorithm. What remains is to note that โ„“ โ‰ค ๐œ‘๐ป . This follows quickly

from the observation that for any vertex ๐‘ข in any simple cycle ๐ถ of a minimal cycle basis for ๐’ž(๐ป), there exists an edge {๐‘ฃ, ๐‘ค} โˆˆ ๐ธ(๐ถ) such that ๐ถ is the disjoint union of a shortest ๐‘ข โˆ’ ๐‘ฃ path, a shortest ๐‘ข โˆ’ ๐‘ค path, and the edge {๐‘ฃ, ๐‘ค} [51, Theorem 3].

2.3.3 An Algorithm for Principal Minors with Noise

So far, we have exclusively considered the situation in which principal minors are known (or

can be queried) exactly. In applications related to this work, this is often not the case. The

key application of a large part of this work is signed determinantal point processes, and the

algorithmic question of learning the kernel of a signed DPP from some set of samples. Here,

for the sake of readability, we focus on the non-probabilistic setting in which, given some

matrix ๐พ with spectral radius ๐œŒ(๐พ) โ‰ค 1, each principal minor โˆ†๐‘† can be queried/estimated

66

up to some absolute error term 0 < ๐›ฟ < 1, i.e., we are given a set โˆ†๐‘†๐‘†โŠ‚[๐‘ ], satisfying

โˆ†๐‘† โˆ’โˆ†๐‘†

< ๐›ฟ for all ๐‘† โŠ‚ [๐‘ ],

where โˆ†๐‘†๐‘†โŠ‚[๐‘ ] are the principal minors of some magnitude-symmetric matrix ๐พ, and

asked to compute a matrix ๐พ โ€ฒ minimizing

๐œŒ(๐พ,๐พ โ€ฒ) := min s.t.

ฮ”๐‘†()=ฮ”๐‘†(๐พ),๐‘†โŠ‚[๐‘ ]

| โˆ’๐พ โ€ฒ|โˆž,

where |๐พ โˆ’๐พ โ€ฒ|โˆž = max๐‘–,๐‘— |๐พ๐‘–,๐‘— โˆ’๐พ โ€ฒ๐‘–,๐‘—|.

In this setting, we require both a separation condition for the entries of ๐พ and a stronger version of Condition (2.6). Let ๐’ฆ๐‘(๐›ผ, ๐›ฝ) be the set of matrices ๐พ โˆˆ R๐‘ร—๐‘, ๐œŒ(๐พ) โ‰ค 1, satisfying: |๐พ๐‘–,๐‘—| = |๐พ๐‘—,๐‘–| for ๐‘–, ๐‘— โˆˆ [๐‘ ], |๐พ๐‘–,๐‘—| > ๐›ผ or |๐พ๐‘–,๐‘—| = 0 for all ๐‘–, ๐‘— โˆˆ [๐‘ ], and

If $K_{i,j}K_{j,k}K_{k,\ell}K_{\ell,i} \ne 0$ for ๐‘–, ๐‘—, ๐‘˜, โ„“ โˆˆ [๐‘ ] distinct, (2.12)

then
\[
\Big| |K_{i,j}K_{k,\ell}| - |K_{j,k}K_{\ell,i}| \Big| > \beta,
\]
and the sum
\[
\big| \varphi_1 K_{i,j}K_{j,k}K_{k,\ell}K_{\ell,i} + \varphi_2 K_{i,j}K_{j,\ell}K_{\ell,k}K_{k,i} + \varphi_3 K_{i,k}K_{k,j}K_{j,\ell}K_{\ell,i} \big| > \beta \qquad \forall \varphi \in \{-1, 1\}^3.
\]

The algorithm RECOVERK$(\{\Delta_S\}_{S\subset[N]})$, slightly modified, performs well for this more general problem. Here, we briefly describe a variant of the above algorithm that performs nearly optimally on any $K \in \mathcal{K}_N(\alpha,\beta)$, $\alpha,\beta > 0$, for $\delta$ sufficiently small, and quantify the relationship between $\alpha$ and $\beta$. The algorithm to compute a $K'$ sufficiently close to $K$, which we denote by RECOVERNOISYK$(\{\hat\Delta_S\}_{S\subset[N]}; \delta)$, proceeds in three main steps:

Step 1: Define $|K'_{i,j}|$, $i,j \in [N]$, and the charged sparsity graph $G = ([N], E, \epsilon)$.

We define
\[ K'_{i,i} = \begin{cases} \hat\Delta_i & \text{if } |\hat\Delta_i| > \delta \\ 0 & \text{otherwise,} \end{cases} \]
and
\[ K'_{i,j}K'_{j,i} = \begin{cases} \hat\Delta_i\hat\Delta_j - \hat\Delta_{i,j} & \text{if } |\hat\Delta_i\hat\Delta_j - \hat\Delta_{i,j}| > 3\delta + \delta^2 \\ 0 & \text{otherwise.} \end{cases} \]
The edges $\{i,j\} \in E(G)$ correspond to non-zero off-diagonal entries $|K'_{i,j}| \ne 0$, and the function $\epsilon$ is given by $\epsilon_{i,j} = \operatorname{sgn}(K'_{i,j}K'_{j,i})$.

Step 2: For every block $H$ of $G$, compute a simple cycle basis $x_1, \ldots, x_k$ of $\mathcal{C}^+(H)$ satisfying Properties (i)-(iii) of Lemma 6, and define $s_{K'}(C)$ for every cycle $C$ in the basis.

The computation of the simple cycle basis depends only on the graph $G$, and so is identical to Steps 1 and 2 of the RECOVERK algorithm. The simple cycle basis for $\mathcal{C}^+(H)$ satisfies Properties (i)-(iii) of Lemma 6, and so we can apply Lemmas 8 and 9. We define the quantities $s_{K'}(C)$ iteratively based on cycle length, beginning with the shortest cycle. We begin by detailing the cases of $|C| = 3$, then $|C| = 4$, and then finally $|C| > 4$.

Case I: |๐ถ| = 3.

Suppose that the three-cycle ๐ถ = ๐‘– ๐‘— ๐‘˜ ๐‘–, ๐‘– < ๐‘— < ๐‘˜, is in our basis. If ๐บ was the charged

sparsity graph of ๐พ, then the equation

๐พ๐‘–,๐‘—๐พ๐‘—,๐‘˜๐พ๐‘˜,๐‘– = โˆ†๐‘–โˆ†๐‘—โˆ†๐‘˜ โˆ’1

2

[โˆ†๐‘–โˆ†๐‘—,๐‘˜ + โˆ†๐‘—โˆ†๐‘–,๐‘˜ + โˆ†๐‘˜โˆ†๐‘–,๐‘—

]+

1

2โˆ†๐‘–,๐‘—,๐‘˜

would hold. For this reason, we define ๐‘ ๐พโ€ฒ(๐ถ) as

๐‘ ๐พโ€ฒ(๐ถ) = ๐œ–๐‘–,๐‘˜ sgn(2โˆ†๐‘–โˆ†๐‘—โˆ†๐‘˜ โˆ’

[โˆ†๐‘–โˆ†๐‘—,๐‘˜ + โˆ†๐‘—โˆ†๐‘–,๐‘˜ + โˆ†๐‘˜โˆ†๐‘–,๐‘—

]+ โˆ†๐‘–,๐‘—,๐‘˜

).
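The displayed three-cycle identity can be checked numerically on a random symmetric matrix (for which every triangle is positive, so the identity applies); this snippet is an illustrative sanity check, not part of the algorithm:

```python
import numpy as np

# Check: K_ij K_jk K_ki = D_i D_j D_k - (D_i D_jk + D_j D_ik + D_k D_ij)/2 + D_ijk/2
# for a random symmetric 3x3 matrix, writing D_S for principal minors.
rng = np.random.default_rng(1)
K = rng.standard_normal((3, 3))
K = (K + K.T) / 2  # symmetric, so the triangle 1-2-3 is positive

def minor(M, S):
    S = list(S)
    return np.linalg.det(M[np.ix_(S, S)])

D1, D2, D3 = K[0, 0], K[1, 1], K[2, 2]
D12, D13, D23 = minor(K, [0, 1]), minor(K, [0, 2]), minor(K, [1, 2])
D123 = minor(K, [0, 1, 2])

lhs = K[0, 1] * K[1, 2] * K[2, 0]
rhs = D1 * D2 * D3 - 0.5 * (D1 * D23 + D2 * D13 + D3 * D12) + 0.5 * D123
print(abs(lhs - rhs))  # ~ machine precision: the identity holds
```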


Case II: |๐ถ| = 4.

Suppose that the four-cycle ๐ถ = ๐‘– ๐‘— ๐‘˜ โ„“ ๐‘–, with (without loss of generality) ๐‘– < ๐‘— < ๐‘˜ < โ„“,

is in our basis. We note that

โˆ†๐‘–,๐‘—,๐‘˜,โ„“ = โˆ†๐‘–,๐‘—โˆ†๐‘˜,โ„“ + โˆ†๐‘–,๐‘˜โˆ†๐‘—,โ„“ + โˆ†๐‘–,โ„“โˆ†๐‘—,๐‘˜ + โˆ†๐‘–โˆ†๐‘—,๐‘˜,โ„“ + โˆ†๐‘—โˆ†๐‘–,๐‘˜,โ„“ + โˆ†๐‘˜โˆ†๐‘–,๐‘—,โ„“

+ โˆ†โ„“โˆ†๐‘–,๐‘—,๐‘˜ โˆ’ 2โˆ†๐‘–โˆ†๐‘—โˆ†๐‘˜,โ„“ โˆ’ 2โˆ†๐‘–โˆ†๐‘˜โˆ†๐‘—,โ„“ โˆ’ 2โˆ†๐‘–โˆ†โ„“โˆ†๐‘—,๐‘˜ โˆ’ 2โˆ†๐‘—โˆ†๐‘˜โˆ†๐‘–,โ„“

โˆ’ 2โˆ†๐‘—โˆ†โ„“โˆ†๐‘–,๐‘˜ โˆ’ 2โˆ†๐‘˜โˆ†โ„“โˆ†๐‘–,๐‘— + 6โˆ†๐‘–โˆ†๐‘—โˆ†๐‘˜โˆ†โ„“ + ๐‘,

where ๐‘ is the sum of the terms in the Laplace expansion of โˆ†๐‘–,๐‘—,๐‘˜,โ„“ corresponding to a

four-cycle of ๐บ[๐‘–, ๐‘—, ๐‘˜, โ„“], and define the quantity ๐‘ analogously:

๐‘ = โˆ†๐‘–,๐‘—,๐‘˜,โ„“ โˆ’ โˆ†๐‘–,๐‘—โˆ†๐‘˜,โ„“ โˆ’ โˆ†๐‘–,๐‘˜โˆ†๐‘—,โ„“ โˆ’ โˆ†๐‘–,โ„“โˆ†๐‘—,๐‘˜ โˆ’ โˆ†๐‘–โˆ†๐‘—,๐‘˜,โ„“ โˆ’ โˆ†๐‘—โˆ†๐‘–,๐‘˜,โ„“

โˆ’ โˆ†๐‘˜โˆ†๐‘–,๐‘—,โ„“ โˆ’ โˆ†โ„“โˆ†๐‘–,๐‘—,๐‘˜ + 2โˆ†๐‘–โˆ†๐‘—โˆ†๐‘˜,โ„“ + 2โˆ†๐‘–โˆ†๐‘˜โˆ†๐‘—,โ„“ + 2โˆ†๐‘–โˆ†โ„“โˆ†๐‘—,๐‘˜

+ 2โˆ†๐‘—โˆ†๐‘˜โˆ†๐‘–,โ„“ + 2โˆ†๐‘—โˆ†โ„“โˆ†๐‘–,๐‘˜ + 2โˆ†๐‘˜โˆ†โ„“โˆ†๐‘–,๐‘— โˆ’ 6โˆ†๐‘–โˆ†๐‘—โˆ†๐‘˜โˆ†โ„“.

We consider two cases, depending on the number of positive four-cycles in $G[\{i,j,k,\ell\}]$. First, suppose that $C$ is the only positive four-cycle in $G[\{i,j,k,\ell\}]$. If $G$ was the charged sparsity graph of $K$, then $c$ would equal $-2K_{i,j}K_{j,k}K_{k,\ell}K_{\ell,i}$. For this reason, we define $s_{K'}(C)$ as
\[ s_{K'}(C) = \epsilon_{i,\ell}\, \operatorname{sgn}(-\hat c) \]
in this case. Next, we consider the second case, in which there is more than one positive four-cycle in $G[\{i,j,k,\ell\}]$, which actually implies that all three four-cycles in $G[\{i,j,k,\ell\}]$ are positive. If $G$ was the charged sparsity graph of $K$, then $c$ would be given by
\[ c = -2\big( K_{i,j}K_{j,k}K_{k,\ell}K_{\ell,i} + K_{i,j}K_{j,\ell}K_{\ell,k}K_{k,i} + K_{i,k}K_{k,j}K_{j,\ell}K_{\ell,i} \big). \]
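The four-cycle decomposition above can likewise be checked numerically: for a random symmetric matrix all three four-cycles on $\{i,j,k,\ell\} = \{1,2,3,4\}$ are positive, and subtracting every non-four-cycle term of the expansion from $\Delta_{i,j,k,\ell}$ should leave exactly $-2$ times the sum of the three undirected four-cycle products. This is an illustrative check only:

```python
import numpy as np
from itertools import combinations

# Check: the four-cycle part c of det(K[{i,j,k,l}]) equals
#   -2 (K_ij K_jk K_kl K_li + K_ij K_jl K_lk K_ki + K_ik K_kj K_jl K_li)
# for symmetric K (all three four-cycles positive).
rng = np.random.default_rng(2)
K = rng.standard_normal((4, 4))
K = (K + K.T) / 2

def m(S):
    S = list(S)
    return np.linalg.det(K[np.ix_(S, S)])

D = {S: m(S) for r in (1, 2, 3, 4) for S in combinations(range(4), r)}
d = [D[(i,)] for i in range(4)]

c = (D[(0, 1, 2, 3)]
     - D[(0, 1)] * D[(2, 3)] - D[(0, 2)] * D[(1, 3)] - D[(0, 3)] * D[(1, 2)]
     - d[0] * D[(1, 2, 3)] - d[1] * D[(0, 2, 3)]
     - d[2] * D[(0, 1, 3)] - d[3] * D[(0, 1, 2)]
     + 2 * d[0] * d[1] * D[(2, 3)] + 2 * d[0] * d[2] * D[(1, 3)]
     + 2 * d[0] * d[3] * D[(1, 2)] + 2 * d[1] * d[2] * D[(0, 3)]
     + 2 * d[1] * d[3] * D[(0, 2)] + 2 * d[2] * d[3] * D[(0, 1)]
     - 6 * d[0] * d[1] * d[2] * d[3])

cycles = (K[0, 1] * K[1, 2] * K[2, 3] * K[3, 0]
          + K[0, 1] * K[1, 3] * K[3, 2] * K[2, 0]
          + K[0, 2] * K[2, 1] * K[1, 3] * K[3, 0])
print(abs(c - (-2 * cycles)))  # ~ machine precision
```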

For this reason, we compute an assignment of values $\varphi_1, \varphi_2, \varphi_3 \in \{-1,+1\}$ that minimizes (not necessarily uniquely) the quantity
\[ \Big| \hat c/2 + \varphi_1 \big| K'_{i,j}K'_{j,k}K'_{k,\ell}K'_{\ell,i} \big| + \varphi_2 \big| K'_{i,j}K'_{j,\ell}K'_{\ell,k}K'_{k,i} \big| + \varphi_3 \big| K'_{i,k}K'_{k,j}K'_{j,\ell}K'_{\ell,i} \big| \Big|, \]
and define $s_{K'}(C) = \epsilon_{i,\ell}\, \varphi_1$ in this case.

Case III: |๐ถ| > 4.

Suppose that the cycle ๐ถ = ๐‘–1 ... ๐‘–๐‘˜ ๐‘–1, ๐‘˜ > 4, is in our basis, with vertices ordered to

match the ordering of Lemma 8, and define ๐‘† = ๐‘–1, ..., ๐‘–๐‘˜. Based on the equation in

Lemma 8, we define ๐‘ to equal

๐‘ = โˆ†๐‘† โˆ’ โˆ†๐‘–1โˆ†๐‘†โˆ–๐‘–1 โˆ’ โˆ†๐‘–๐‘˜โˆ†๐‘†โˆ–๐‘–๐‘˜ โˆ’[โˆ†๐‘–1,๐‘–๐‘˜ โˆ’ 2โˆ†๐‘–1โˆ†๐‘–๐‘˜

]โˆ†๐‘†โˆ–๐‘–1,๐‘–๐‘˜

+ 2๐พ โ€ฒ๐‘–1,๐‘–๐‘˜โˆ’1

๐พ โ€ฒ๐‘–๐‘˜โˆ’1,๐‘–๐‘˜

๐พ โ€ฒ๐‘–๐‘˜,๐‘–2

๐พ โ€ฒ๐‘–2,๐‘–1

โˆ†๐‘†โˆ–๐‘–1,๐‘–2,๐‘–๐‘˜โˆ’1,๐‘–๐‘˜

+[โˆ†๐‘–1,๐‘–2 โˆ’ โˆ†๐‘–1โˆ†๐‘–2

][โˆ†๐‘–๐‘˜โˆ’1,๐‘–๐‘˜ โˆ’ โˆ†๐‘–๐‘˜โˆ’1

โˆ†๐‘–๐‘˜

]โˆ†๐‘†โˆ–๐‘–1,๐‘–2,๐‘–๐‘˜โˆ’1,๐‘–๐‘˜

โˆ’[โˆ†๐‘–1,๐‘–๐‘˜โˆ’1

โˆ’ โˆ†๐‘–1โˆ†๐‘–๐‘˜โˆ’1

][โˆ†๐‘–2,๐‘–๐‘˜ โˆ’ โˆ†๐‘–2โˆ†๐‘–๐‘˜

]โˆ†๐‘†โˆ–๐‘–1,๐‘–2,๐‘–๐‘˜โˆ’1,๐‘–๐‘˜

โˆ’[โˆ†๐‘–1,๐‘–2 โˆ’ โˆ†๐‘–1โˆ†๐‘–2

][โˆ†๐‘†โˆ–๐‘–1,๐‘–2 โˆ’ โˆ†๐‘–๐‘˜โˆ†๐‘†โˆ–๐‘–1,๐‘–2,๐‘–๐‘˜

]โˆ’[โˆ†๐‘–๐‘˜โˆ’1,๐‘–๐‘˜ โˆ’ โˆ†๐‘–๐‘˜โˆ’1

โˆ†๐‘–๐‘˜

][โˆ†๐‘†โˆ–๐‘–๐‘˜โˆ’1,๐‘–๐‘˜ โˆ’ โˆ†๐‘–2โˆ†๐‘†โˆ–๐‘–1,๐‘–๐‘˜โˆ’1,๐‘–๐‘˜

],

and note that the quantity
\[ \operatorname{sgn}\big( K'_{i_1,i_{k-1}} K'_{i_{k-1},i_k} K'_{i_k,i_2} K'_{i_2,i_1} \big) \]
is, by construction, computable using the signs of three- and four-cycles in our basis.

Based on the equation in Lemma 9, we set
\[ \operatorname{sgn}\Bigg( 2\,(-1)^{k+1} K'_{i_k,i_1} \prod_{j=1}^{k-1} K'_{i_j,i_{j+1}} \prod_{(a,b)\in U} \bigg[ 1 - \frac{\epsilon_{i_a,i_{a+1}}\, K'_{i_{b-1},i_a} K'_{i_{a+1},i_b}}{K'_{i_a,i_{a+1}} K'_{i_{b-1},i_b}} \bigg] \Bigg) = \operatorname{sgn}(\hat c),
\]

with the set ๐‘ˆ defined as in Lemma 9. From this equation, we can compute ๐‘ ๐พโ€ฒ(๐ถ), as

the quantity

sgn

[1โˆ’

๐œ–๐‘–๐‘Ž,๐‘–๐‘Ž+1๐พโ€ฒ๐‘–๐‘โˆ’1,๐‘–๐‘Ž

๐พ โ€ฒ๐‘–๐‘Ž+1,๐‘–๐‘

๐พ โ€ฒ๐‘–๐‘Ž,๐‘–๐‘Ž+1

๐พ โ€ฒ๐‘–๐‘โˆ’1,๐‘–๐‘

]is computable using the signs of three- and four-cycles in our basis (see Subsection 2.3.1

for details). In the case of |๐ถ| > 4, we note that ๐‘ ๐พโ€ฒ(๐ถ) was defined using ๐‘‚(1) principal

minors, all corresponding to subsets of ๐‘‰ (๐ถ), and previously computed information

regarding three- and four-cycles (which requires no additional querying).

Step 3: Define ๐พ โ€ฒ.

We have already defined ๐พ โ€ฒ๐‘–,๐‘–, ๐‘– โˆˆ [๐‘ ], |๐พ โ€ฒ

๐‘–,๐‘—|, ๐‘–, ๐‘— โˆˆ [๐‘ ], and ๐‘ ๐พโ€ฒ(๐ถ) for every cycle in

a simple cycle basis. The procedure for producing this matrix is identical to Step 5 of

the RECOVERK algorithm.

The RECOVERNOISYK(โˆ†๐‘†๐‘†โŠ‚[๐‘ ], ๐›ฟ

)algorithm relies quite heavily on ๐›ฟ being small

enough so that the sparsity graph of ๐พ can be recovered. In fact, we can quantify the size of

๐›ฟ required so that the complete signed structure of ๐พ can be recovered, and only uncertainty

in the magnitude of entries remains. In the following theorem, we show that, in this regime,

the RECOVERNOISYK algorithm is nearly optimal.

Theorem 7. Let $K \in \mathcal{K}_N(\alpha,\beta)$ and $\{\hat\Delta_S\}_{S\subset[N]}$ satisfy $|\hat\Delta_S - \Delta_S(K)| < \delta < 1$ for all $S \subset [N]$, for some
\[ \delta < \min\big\{ \alpha^{3\varphi_G},\, \beta^{3\varphi_G}/2,\, \alpha^2/400 \big\}, \]
where $G$ is the sparsity graph of $\{\hat\Delta_S\}_{|S|\le 2}$ and $\varphi_G$ is the maximum of $\varphi_H$ over all blocks $H$ of $G$. The RECOVERNOISYK$(\{\hat\Delta_S\}_{S\subset[N]}, \delta)$ algorithm outputs a matrix $K'$ satisfying
\[ \rho(K,K') < 2\delta/\alpha. \]
This algorithm runs in time polynomial in $N$, and queries at most $O(N^2)$ principal minors, all of order at most $3\varphi_G$.

Proof. This proof consists primarily of showing that, for $\delta$ sufficiently small, the charged sparsity graph $G = ([N], E, \epsilon)$ and the cycle signs $s(C)$ of $K'$ and $K$ agree, and quantifying the size of $\delta$ required to achieve this. If this holds, then the quantity $\rho(K,K')$ is simply given by $\max_{i,j} \big|\, |K_{i,j}| - |K'_{i,j}| \,\big|$. We note that, because $\rho(K) \le 1$ and $\delta < 1$,
\[ \big| \hat\Delta_{S_1} \cdots \hat\Delta_{S_k} - \Delta_{S_1} \cdots \Delta_{S_k} \big| < (1+\delta)^k - 1 < (2^k - 1)\delta \tag{2.13} \]
for any $S_1, \ldots, S_k \subset [N]$, $k \in \mathbb{N}$. This is the key error estimate we will use throughout the proof.

We first consider error estimates for the magnitudes of the entries of $K$ and $K'$. We have $|K_{i,i} - \hat\Delta_i| < \delta$ and either $|K_{i,i}| > \alpha$ or $K_{i,i} = 0$. If $K_{i,i} = 0$, then $|\hat\Delta_i| < \delta$, and $K'_{i,i} = K_{i,i} = 0$. If $|K_{i,i}| > \alpha$, then $|\hat\Delta_i| > \delta$, as $\alpha > 2\delta$, and $K'_{i,i} = \hat\Delta_i$. Combining these two cases, we have
\[ |K_{i,i} - K'_{i,i}| < \delta \quad \text{for all } i \in [N]. \]
Next, we consider off-diagonal entries $K'_{i,j}$. By Equation (2.13),
\[ \big| (\hat\Delta_i\hat\Delta_j - \hat\Delta_{i,j}) - K_{i,j}K_{j,i} \big| \le \big[ (1+\delta)^2 - 1 \big] + \delta < 4\delta. \]
Therefore, if $K_{i,j} = 0$, then $K'_{i,j} = 0$. If $|K_{i,j}| > \alpha$, then $|K'_{i,j}| = \sqrt{|\hat\Delta_i\hat\Delta_j - \hat\Delta_{i,j}|}$, as $\alpha^2 > 8\delta$. Combining these two cases, we have
\[ |K_{i,j}K_{j,i} - K'_{i,j}K'_{j,i}| < 4\delta \quad \text{for all } i,j \in [N], \]
and therefore
\[ \big|\, |K_{i,j}| - |K'_{i,j}| \,\big| < 2\delta/\alpha \quad \text{for all } i,j \in [N],\ i \ne j. \tag{2.14} \]

Our above analysis implies that $K$ and $K'$ share the same charged sparsity graph $G = ([N], E, \epsilon)$. What remains is to show that the cycle signs $s(C)$ of $K$ and $K'$ agree for the positive simple cycle basis of our algorithm. Similar to the structure in Step 2 of the description of the RECOVERNOISYK algorithm, we will treat the cases of $|C| = 3$, $|C| = 4$, and $|C| > 4$ separately, and rely heavily on the associated notation defined above.

Case I: |๐ถ| = 3.

We have |๐พ๐‘–,๐‘—๐พ๐‘—,๐‘˜๐พ๐‘˜,๐‘–| > ๐›ผ3, and, by Equation (2.13), the difference between ๐พ๐‘–,๐‘—๐พ๐‘—,๐‘˜๐พ๐‘˜,๐‘–

and its estimated value

โˆ†๐‘–โˆ†๐‘—โˆ†๐‘˜ โˆ’1

2

[โˆ†๐‘–โˆ†๐‘—,๐‘˜ + โˆ†๐‘—โˆ†๐‘–,๐‘˜ + โˆ†๐‘˜โˆ†๐‘–,๐‘—

]+

1

2โˆ†๐‘–,๐‘—,๐‘˜

is at most

[(1 + ๐›ฟ)3 โˆ’ 1] +3

2[(1 + ๐›ฟ)2 โˆ’ 1] +

1

2[(1 + ๐›ฟ)โˆ’ 1] < 12๐›ฟ < ๐›ผ3.

Therefore, ๐‘ ๐พ(๐ถ) equals ๐‘ ๐พโ€ฒ(๐ถ).

Case II: |๐ถ| = 4.

By repeated application of Equation (2.13),

|๐‘ โˆ’ ๐‘| < 6[(1 + ๐›ฟ)4 โˆ’ 1] + 12[(1 + ๐›ฟ)3 โˆ’ 1] + 7[(1 + ๐›ฟ)2 โˆ’ 1] + [(1 + ๐›ฟ)โˆ’ 1] < 196 ๐›ฟ.

If ๐ถ is the only positive four-cycle of ๐บ[๐‘–, ๐‘—, ๐‘˜, โ„“], then ๐‘ ๐พ(๐ถ) equals ๐‘ ๐พโ€ฒ(๐ถ), as 2๐›ฝ >

73

196๐›ฟ. If ๐บ[๐‘–, ๐‘—, ๐‘˜, โ„“] has three positive four-cycles, then

๐‘ = โˆ’2(๐พ๐‘–,๐‘—๐พ๐‘—,๐‘˜๐พ๐‘˜,โ„“๐พโ„“,๐‘– + ๐พ๐‘–,๐‘—๐พ๐‘—,โ„“๐พโ„“,๐‘˜๐พ๐‘˜,๐‘– + ๐พ๐‘–,๐‘˜๐พ๐‘˜,๐‘—๐พ๐‘—,โ„“๐พโ„“,๐‘–

),

and, by Condition (2.12), any two signings of the three terms of the above equation for ๐‘

are at distance at least 4๐›ฝ from each other. As 2๐›ฝ > 196๐›ฟ, ๐‘ ๐พ(๐ถ) equals ๐‘ ๐พโ€ฒ(๐ถ) in this case

as well.

Case III: |๐ถ| > 4.

Using equations (2.13) and (2.14), and noting that ๐œŒ(๐พ) < 1 implies |๐พ๐‘–,๐‘—| <โˆš

2, we can

bound the difference between ๐‘ and ๐‘ by

|๐‘ โˆ’ ๐‘| < [(1 + ๐›ฟ)โˆ’ 1] + 5[(1 + ๐›ฟ)2 โˆ’ 1] + 8[(1 + ๐›ฟ)3 โˆ’ 1] + 6[(1 + ๐›ฟ)4 โˆ’ 1]

+ 2[(1 + ๐›ฟ)5 โˆ’ 1] + 2[(โˆš

2 + 2๐›ฟ/๐›ผ)4 โˆ’ 4][(1 + ๐›ฟ)โˆ’ 1]

โ‰ค (224 + 120๐›ฟ/๐›ผ)๐›ฟ โ‰ค 344๐›ฟ.

Based on the equation of Lemma 9 for ๐‘, the quantity ๐‘ is at least

min๐›ผ|๐ถ|, ๐›ฝ|๐ถ|/2 > 344๐›ฟ,

and so ๐‘ ๐พ(๐ถ) equals ๐‘ ๐พโ€ฒ(๐ถ). This completes the analysis of the final case.

This implies that ๐œŒ(๐พ,๐พ โ€ฒ) is given by max๐‘–,๐‘—

|๐พ๐‘–,๐‘—| โˆ’ |๐พ โ€ฒ

๐‘–,๐‘—|< 2๐›ฟ/๐›ผ.


Chapter 3

The Spread and Bipartite Spread Conjecture

3.1 Introduction

The spread ๐‘ (๐‘€) of an arbitrary ๐‘›ร— ๐‘› complex matrix ๐‘€ is the diameter of its spectrum;

that is,

๐‘ (๐‘€) := max๐‘–,๐‘—|๐œ†๐‘– โˆ’ ๐œ†๐‘—|,

where the maximum is taken over all pairs of eigenvalues of ๐‘€ . This quantity has been

well-studied in general, see [33, 53, 83, 129] for details and additional references. Most

notably, Johnson, Kumar, and Wolkowicz produced the lower bound

๐‘ (๐‘€) โ‰ฅโˆ‘

๐‘– =๐‘— ๐‘š๐‘–,๐‘—

/(๐‘›โˆ’ 1)

for normal matrices ๐‘€ = (๐‘š๐‘–,๐‘—) [53, Theorem 2.1], and Mirsky produced the upper bound

๐‘ (๐‘€) โ‰คโˆš

2โˆ‘

๐‘–,๐‘— |๐‘š๐‘–,๐‘—|2 โˆ’ (2/๐‘›)โˆ‘

๐‘– ๐‘š๐‘–,๐‘–

2for any ๐‘›ร— ๐‘› matrix ๐‘€ , which is tight for normal matrices with ๐‘›โˆ’ 2 of its eigenvalues all

equal and equal to the arithmetic mean of the other two [83, Theorem 2].
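Both bounds are easy to check numerically; the following sketch (illustrative only, using a random symmetric and hence normal matrix) verifies them with numpy:

```python
import numpy as np

# Spread of a matrix and the two classical bounds quoted above:
#   Johnson-Kumar-Wolkowicz (normal M): s(M) >= |sum_{i!=j} m_ij| / (n-1)
#   Mirsky (any M):                     s(M) <= sqrt(2 ||M||_F^2 - (2/n) |tr M|^2)
rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
M = (M + M.T) / 2            # real symmetric => normal, real spectrum

eig = np.linalg.eigvalsh(M)
spread = eig[-1] - eig[0]

lower = abs(M.sum() - np.trace(M)) / (n - 1)
upper = np.sqrt(2 * (M ** 2).sum() - (2 / n) * np.trace(M) ** 2)

assert lower <= spread <= upper
print(f"{lower:.3f} <= s(M) = {spread:.3f} <= {upper:.3f}")
```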


The spread of a matrix has also received interest in certain particular cases. Consider a simple undirected graph $G = (V,E)$ of order $n$. The adjacency matrix $A$ of a graph $G$ is the $n \times n$ matrix whose rows and columns are indexed by the vertices of $G$, with entries satisfying
\[ A_{u,v} = \begin{cases} 1 & \text{if } \{u,v\} \in E(G) \\ 0 & \text{otherwise.} \end{cases} \]
This matrix is real and symmetric, and so its eigenvalues are real and can be ordered $\lambda_1(G) \ge \lambda_2(G) \ge \cdots \ge \lambda_n(G)$. When considering the spread of the adjacency matrix $A$ of some graph $G$, the spread is simply the distance between $\lambda_1(G)$ and $\lambda_n(G)$, denoted by
\[ s(G) := \lambda_1(G) - \lambda_n(G). \]
In this instance, $s(G)$ is referred to as the spread of the graph. The spectrum of a bipartite graph is symmetric with respect to the origin, and, in particular, $\lambda_n(G) = -\lambda_1(G)$ and so $s(G) = 2\lambda_1(G)$.

In [48], the authors investigated a number of properties regarding the spread of a graph, determining upper and lower bounds on $s(G)$. Furthermore, they made two key conjectures. Let us denote the maximum spread over all $n$-vertex graphs by $s(n)$, the maximum spread over all $n$-vertex graphs of size $m$ by $s(n,m)$, and the maximum spread over all $n$-vertex bipartite graphs of size $m$ by $s_b(n,m)$. Let $K_k$ be the clique of order $k$ and $G(n,k) := K_k \vee \overline{K}_{n-k}$ be the join of the clique $K_k$ and the independent set $\overline{K}_{n-k}$ (i.e., the disjoint union of $K_k$ and $\overline{K}_{n-k}$, with all possible edges between the two graphs added). The conjectures addressed in this article are as follows.

Conjecture 1 ([48], Conjecture 1.3). For any positive integer $n$, the graph of order $n$ with maximum spread is $G(n, \lfloor 2n/3 \rfloor)$; that is, $s(n)$ is attained only by $G(n, \lfloor 2n/3 \rfloor)$.

Conjecture 2 ([48], Conjecture 1.4). If $G$ is a graph with $n$ vertices and $m$ edges attaining the maximum spread $s(n,m)$, and if $m \le \lfloor n^2/4 \rfloor$, then $G$ must be bipartite. That is, $s_b(n,m) = s(n,m)$ for all $m \le \lfloor n^2/4 \rfloor$.


Conjecture 1 is referred to as the Spread Conjecture, and Conjecture 2 is referred to as the Bipartite Spread Conjecture. In this chapter, we investigate both conjectures. In Section 3.2, we prove an asymptotic version of the Bipartite Spread Conjecture, and provide an infinite family of counterexamples to illustrate that our asymptotic version is as tight as possible, up to lower order terms. In Section 3.3, we provide a high-level sketch of a proof that the Spread Conjecture holds for all $n$ sufficiently large (the full proof can be found in [12]). The exact associated results are given by Theorems 8 and 9.

Theorem 8. There exists a constant $c$ so that the following holds: Suppose $G$ is a graph on $n \ge c$ vertices with maximum spread; then $G$ is the join of a clique on $\lfloor 2n/3 \rfloor$ vertices and an independent set on $\lceil n/3 \rceil$ vertices.

Theorem 9.
\[ s(n,m) - s_b(n,m) \le \frac{1 + 16m^{-3/4}}{m^{3/4}}\, s(n,m) \]
for all $n,m \in \mathbb{N}$ satisfying $m \le \lfloor n^2/4 \rfloor$. In addition, for any $\varepsilon > 0$, there exists some $n_\varepsilon$ such that
\[ s(n,m) - s_b(n,m) \ge \frac{1 - \varepsilon}{m^{3/4}}\, s(n,m) \]
for all $n \ge n_\varepsilon$ and some $m \le \lfloor n^2/4 \rfloor$ depending on $n$.

The proof of Theorem 8 is quite involved, and for this reason we provide only a sketch, illustrating the key high-level details involved in the proof, and referring some of the exact details to the main paper [12]. The general technique consists of showing that a spread-extremal graph has certain desirable properties, considering and solving an analogous problem for graph limits, and then using this result to say something about the Spread Conjecture for sufficiently large $n$. In comparison, the proof of Theorem 9 is surprisingly short, making use of the theory of equitable partitions and a well-chosen class of counterexamples.


3.2 The Bipartite Spread Conjecture

In [48], the authors investigated the structure of graphs which maximize the spread over all graphs with a fixed number of vertices $n$ and edges $m$, denoted by $s(n,m)$. In particular, they proved the upper bound
\[ s(G) \le \lambda_1 + \sqrt{2m - \lambda_1^2} \le 2\sqrt{m}, \tag{3.1} \]
and noted that equality holds throughout if and only if $G$ is the union of isolated vertices and $K_{p,q}$, for some $p + q \le n$ satisfying $m = pq$ [48, Thm. 1.5]. This led the authors to conjecture that if $G$ has $n$ vertices, $m \le \lfloor n^2/4 \rfloor$ edges, and spread $s(n,m)$, then $G$ is bipartite [48, Conj. 1.4]. In this section, we prove a weak asymptotic form of this conjecture and provide an infinite family of counterexamples to the exact conjecture which verifies that the error in the aforementioned asymptotic result is of the correct order of magnitude. Recall that $s_b(n,m)$, $m \le \lfloor n^2/4 \rfloor$, is the maximum spread over all bipartite graphs with $n$ vertices and $m$ edges. To explicitly compute the spread of certain graphs, we make use of the theory of equitable partitions. In particular, we note that if $\varphi$ is an automorphism of $G$, then the quotient matrix of $A(G)$ with respect to $\varphi$, denoted by $A_\varphi$, satisfies $\Lambda(A_\varphi) \subset \Lambda(A)$, and therefore $s(G)$ is at least the spread of $A_\varphi$ (for details, see [13, Section 2.3]). Additionally, we require two propositions, one regarding the largest spectral radius of subgraphs of $K_{p,q}$ of a given size, and another regarding the largest gap between sizes which correspond to a complete bipartite graph of order at most $n$.

Let ๐พ๐‘š๐‘,๐‘ž, 0 โ‰ค ๐‘๐‘ž โˆ’๐‘š < min๐‘, ๐‘ž, be the subgraph of ๐พ๐‘,๐‘ž resulting from removing

๐‘๐‘ž โˆ’๐‘š edges all incident to some vertex in the larger side of the bipartition (if ๐‘ = ๐‘ž, the

vertex can be from either set). In [78], the authors proved the following result.

Proposition 5. If 0 โ‰ค ๐‘๐‘ž โˆ’๐‘š < min๐‘, ๐‘ž, then ๐พ๐‘š๐‘,๐‘ž maximizes ๐œ†1 over all subgraphs of

๐พ๐‘,๐‘ž of size ๐‘š.

We also require estimates regarding the longest sequence of consecutive sizes $m < \lfloor n^2/4 \rfloor$ for which there does not exist a complete bipartite graph on at most $n$ vertices and exactly $m$ edges. As pointed out by [4], the result follows quickly by induction. However, for completeness, we include a brief proof.

Proposition 6. The length of the longest sequence of consecutive sizes $m < \lfloor n^2/4 \rfloor$ for which there does not exist a complete bipartite graph on at most $n$ vertices and exactly $m$ edges is zero for $n \le 4$ and at most $\sqrt{2n-1} - 1$ for $n \ge 5$.

Proof. We proceed by induction. By inspection, for every $n \le 4$, $m \le \lfloor n^2/4 \rfloor$, there exists a complete bipartite graph of size $m$ and order at most $n$, and so the length of the longest sequence is trivially zero for $n \le 4$. When $n = m = 5$, there is no complete bipartite graph of order at most five with exactly five edges. This is the only such instance for $n = 5$, and so the length of the longest sequence for $n = 5$ is one.

Now, suppose that the statement holds for graphs of order at most $n-1$, for some $n > 5$. We aim to show the statement for graphs of order at most $n$. By our inductive hypothesis, it suffices to consider only sizes $m \ge \lfloor (n-1)^2/4 \rfloor$ and complete bipartite graphs on $n$ vertices. We have
\[ \Big( \frac{n}{2} + k \Big)\Big( \frac{n}{2} - k \Big) \ge \frac{(n-1)^2}{4} \quad \text{for } |k| \le \frac{\sqrt{2n-1}}{2}. \]
When $1 \le k \le \sqrt{2n-1}/2$, the difference between the sizes of $K_{n/2+k-1,\,n/2-k+1}$ and $K_{n/2+k,\,n/2-k}$ is at most
\[ E\big( K_{\frac{n}{2}+k-1,\frac{n}{2}-k+1} \big) - E\big( K_{\frac{n}{2}+k,\frac{n}{2}-k} \big) = 2k - 1 \le \sqrt{2n-1} - 1. \]
Let $k^*$ be the largest value of $k$ satisfying $k \le \sqrt{2n-1}/2$ and $n/2 + k \in \mathbb{N}$. Then
\[ E\big( K_{\frac{n}{2}+k^*,\frac{n}{2}-k^*} \big) < \Big( \frac{n}{2} + \frac{\sqrt{2n-1}}{2} - 1 \Big)\Big( \frac{n}{2} - \frac{\sqrt{2n-1}}{2} + 1 \Big) = \sqrt{2n-1} + \frac{(n-1)^2}{4} - 1, \]
and the difference between the sizes of $K_{n/2+k^*,\,n/2-k^*}$ and $K_{\lceil \frac{n-1}{2} \rceil, \lfloor \frac{n-1}{2} \rfloor}$ is at most
\[ E\big( K_{\frac{n}{2}+k^*,\frac{n}{2}-k^*} \big) - E\big( K_{\lceil \frac{n-1}{2} \rceil, \lfloor \frac{n-1}{2} \rfloor} \big) < \sqrt{2n-1} + \frac{(n-1)^2}{4} - \Big\lfloor \frac{(n-1)^2}{4} \Big\rfloor - 1 < \sqrt{2n-1}. \]
Combining these two estimates completes our inductive step, and the proof.
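Proposition 6 is also easy to confirm by brute force for small $n$; the following sketch (an illustrative verification, not part of the proof) checks the claimed bound for $5 \le n \le 60$:

```python
import math

# Brute-force check of Proposition 6: for each n, the longest run of
# consecutive sizes m < floor(n^2/4) with no complete bipartite graph on
# at most n vertices and exactly m edges is at most sqrt(2n-1) - 1.
for n in range(5, 61):
    sizes = {p * q for p in range(1, n) for q in range(1, n + 1 - p)}
    longest = run = 0
    for m in range(1, n * n // 4):
        run = 0 if m in sizes else run + 1
        longest = max(longest, run)
    assert longest <= math.sqrt(2 * n - 1) - 1
print("Proposition 6 verified for 5 <= n <= 60")
```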

We are now prepared to prove an asymptotic version of [48, Conjecture 1.4], and provide an infinite class of counterexamples that illustrates that the asymptotic version under consideration is the tightest version of this conjecture possible.

Theorem 10.
\[ s(n,m) - s_b(n,m) \le \frac{1 + 16m^{-3/4}}{m^{3/4}}\, s(n,m) \]
for all $n,m \in \mathbb{N}$ satisfying $m \le \lfloor n^2/4 \rfloor$. In addition, for any $\epsilon > 0$, there exists some $n_\epsilon$ such that
\[ s(n,m) - s_b(n,m) \ge \frac{1 - \epsilon}{m^{3/4}}\, s(n,m) \]
for all $n \ge n_\epsilon$ and some $m \le \lfloor n^2/4 \rfloor$ depending on $n$.

Proof. The main idea of the proof is as follows. To obtain an upper bound on $s(n,m) - s_b(n,m)$, we upper bound $s(n,m)$ by $2\sqrt{m}$ (using Inequality (3.1)) and lower bound $s_b(n,m)$ by the spread of some specific bipartite graph. To obtain a lower bound on $s(n,m) - s_b(n,m)$ for a specific $n$ and $m$, we explicitly compute $s_b(n,m)$ using Proposition 5, and lower bound $s(n,m)$ by the spread of some specific non-bipartite graph.

First, we analyze the spread of $K^m_{p,q}$, $0 < pq - m < q \le p$, a quantity that will be used in the proof of both the upper and lower bound. Let us denote the vertices in the bipartition of $K^m_{p,q}$ by $u_1, \ldots, u_p$ and $v_1, \ldots, v_q$, and suppose without loss of generality that $u_1$ is not adjacent to $v_1, \ldots, v_{pq-m}$. Then
\[ \varphi = (u_1)(u_2, \ldots, u_p)(v_1, \ldots, v_{pq-m})(v_{pq-m+1}, \ldots, v_q) \]

is an automorphism of ๐พ๐‘š๐‘,๐‘ž. The corresponding quotient matrix is given by

๐ด๐œ‘ =

โŽ›โŽœโŽœโŽœโŽœโŽœโŽœโŽ0 0 0 ๐‘šโˆ’ (๐‘โˆ’ 1)๐‘ž

0 0 ๐‘๐‘ž โˆ’๐‘š ๐‘šโˆ’ (๐‘โˆ’ 1)๐‘ž

0 ๐‘โˆ’ 1 0 0

1 ๐‘โˆ’ 1 0 0

โŽžโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽ  ,

has characteristic polynomial

๐‘„(๐‘, ๐‘ž,๐‘š) = det[๐ด๐œ‘ โˆ’ ๐œ†๐ผ] = ๐œ†4 โˆ’๐‘š๐œ†2 + (๐‘โˆ’ 1)(๐‘šโˆ’ (๐‘โˆ’ 1)๐‘ž)(๐‘๐‘ž โˆ’๐‘š),

and, therefore,

๐‘ (๐พ๐‘š

๐‘,๐‘ž

)โ‰ฅ 2

(๐‘š +

โˆš๐‘š2 โˆ’ 4(๐‘โˆ’ 1)(๐‘šโˆ’ (๐‘โˆ’ 1)๐‘ž)(๐‘๐‘ž โˆ’๐‘š)

2

)1/2

. (3.2)
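A quick numerical sanity check of this quotient-matrix computation (illustrative only; the parameters $p = 5$, $q = 4$, $m = 18$ are an arbitrary admissible choice):

```python
import numpy as np

# Quotient-matrix check for K^m_{p,q}: the eigenvalues of A_phi lie in the
# spectrum of the full adjacency matrix, and the spread of A_phi matches the
# right-hand side of (3.2).
p, q, m = 5, 4, 18           # 0 < pq - m = 2 < q <= p
A = np.zeros((p + q, p + q))
A[:p, p:] = 1
A[p:, :p] = 1
A[0, p:p + (p * q - m)] = 0  # remove pq - m edges incident to u_1
A[p:p + (p * q - m), 0] = 0

A_phi = np.array([[0, 0, 0, m - (p - 1) * q],
                  [0, 0, p * q - m, m - (p - 1) * q],
                  [0, p - 1, 0, 0],
                  [1, p - 1, 0, 0]], dtype=float)

eig_full = np.linalg.eigvalsh(A)
eig_quot = np.linalg.eigvals(A_phi)
# every eigenvalue of the quotient appears in the full spectrum
for mu in eig_quot:
    assert np.min(np.abs(eig_full - mu)) < 1e-8

rhs = 2 * np.sqrt((m + np.sqrt(m**2 - 4*(p-1)*(m-(p-1)*q)*(p*q-m))) / 2)
spread_quot = eig_quot.real.max() - eig_quot.real.min()
assert np.isclose(spread_quot, rhs)
print(spread_quot)
```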

For ๐‘๐‘ž = ฮฉ(๐‘›2) and ๐‘› sufficiently large, this upper bound is actually an equality, as ๐ด(๐พ๐‘š๐‘,๐‘ž)

is a perturbation of the adjacency matrix of a complete bipartite graph with ฮฉ(๐‘›) bipartite

sets by an ๐‘‚(โˆš๐‘›) norm matrix. For the upper bound, we only require the inequality, but for

the lower bound, we assume ๐‘› is large enough so that this is indeed an equality.

Next, we prove the upper bound. For some fixed $n$ and $m \le \lfloor n^2/4 \rfloor$, let $m = pq - r$, where $p,q,r \in \mathbb{N}$, $p + q \le n$, and $r$ is as small as possible. If $r = 0$, then by [48, Thm. 1.5] (described above), $s(n,m) = s_b(n,m)$ and we are done. Otherwise, we note that $0 < r < \min\{p,q\}$, and so Inequality (3.2) is applicable (in fact, by Proposition 6, $r = O(\sqrt{n})$). Using the upper bound $s(n,m) \le 2\sqrt{m}$ and Inequality (3.2), we have

๐‘ (๐‘›, ๐‘๐‘ž โˆ’ ๐‘Ÿ)โˆ’ ๐‘ (๐พ๐‘š

๐‘,๐‘ž

)๐‘ (๐‘›, ๐‘๐‘ž โˆ’ ๐‘Ÿ)

โ‰ค 1โˆ’

(1

2+

1

2

โˆš1โˆ’ 4(๐‘โˆ’ 1)(๐‘ž โˆ’ ๐‘Ÿ)๐‘Ÿ

(๐‘๐‘ž โˆ’ ๐‘Ÿ)2

)1/2

. (3.3)

To upper bound ๐‘Ÿ, we use Proposition 6 with ๐‘›โ€ฒ = โŒˆ2โˆš๐‘šโŒ‰ โ‰ค ๐‘› and ๐‘š. This implies that

๐‘Ÿ โ‰คโˆš

2โŒˆ2โˆš๐‘šโŒ‰ โˆ’ 1โˆ’ 1 <

โˆš2(2โˆš๐‘š + 1)โˆ’ 1โˆ’ 1 =

โˆš4โˆš๐‘š + 1โˆ’ 1 โ‰ค 2๐‘š1/4.

81

Recall that $\sqrt{1-x} \ge 1 - x/2 - x^2/2$ for all $x \in [0,1]$, and so
\[ \begin{aligned} 1 - \left( \tfrac{1}{2} + \tfrac{1}{2}\sqrt{1-x} \right)^{1/2} &\le 1 - \left( \tfrac{1}{2} + \tfrac{1}{2}\big( 1 - \tfrac{1}{2}x - \tfrac{1}{2}x^2 \big) \right)^{1/2} = 1 - \left( 1 - \tfrac{1}{4}(x + x^2) \right)^{1/2} \\ &\le 1 - \left( 1 - \tfrac{1}{8}(x + x^2) - \tfrac{1}{32}(x + x^2)^2 \right) \le \tfrac{1}{8}x + \tfrac{1}{4}x^2 \end{aligned} \]
for $x \in [0,1]$. To simplify Inequality (3.3), we observe that
\[ \frac{4(p-1)(q-r)r}{(pq-r)^2} \le \frac{4r}{m} \le \frac{8}{m^{3/4}}. \]
Therefore,
\[ \frac{s(n, pq-r) - s\big( K^m_{p,q} \big)}{s(n, pq-r)} \le \frac{1}{m^{3/4}} + \frac{16}{m^{3/2}}. \]
This completes the proof of the upper bound.

Finally, we proceed with the proof of the lower bound. Let us fix some $0 < \epsilon < 1$, and consider some arbitrarily large $n$. Let $m = (n/2+k)(n/2-k) + 1$, where $k$ is the smallest number satisfying $n/2 + k \in \mathbb{N}$ and $\tilde\epsilon := 1 - 2k^2/n < \epsilon/2$ (here we require $n = \Omega(1/\epsilon^2)$). Denote the vertices in the bipartition of $K_{n/2+k,\,n/2-k}$ by $u_1, \ldots, u_{n/2+k}$ and $v_1, \ldots, v_{n/2-k}$, and consider the graph $K^+_{n/2+k,\,n/2-k} := K_{n/2+k,\,n/2-k} \cup \{(v_1, v_2)\}$ resulting from adding one edge to $K_{n/2+k,\,n/2-k}$ between two vertices in the smaller side of the bipartition. Then
\[ \varphi = (u_1, \ldots, u_{n/2+k})(v_1, v_2)(v_3, \ldots, v_{n/2-k}) \]
is an automorphism of $K^+_{n/2+k,\,n/2-k}$, and
\[ A_\varphi = \begin{pmatrix} 0 & 2 & n/2-k-2 \\ n/2+k & 1 & 0 \\ n/2+k & 0 & 0 \end{pmatrix} \]

has characteristic polynomial
\[ \begin{aligned} \det[A_\varphi - \lambda I] &= -\lambda^3 + \lambda^2 + \big( n^2/4 - k^2 \big)\lambda - (n/2+k)(n/2-k-2) \\ &= -\lambda^3 + \lambda^2 + \left( \frac{n^2}{4} - (1-\tilde\epsilon)\frac{n}{2} \right)\lambda - \left( \frac{n^2}{4} - (3-\tilde\epsilon)\frac{n}{2} - \sqrt{2(1-\tilde\epsilon)n} \right). \end{aligned} \]

By matching higher order terms, we obtain
\[ \lambda_{\max}(A_\varphi) = \frac{n}{2} - \frac{1-\tilde\epsilon}{2} + \frac{8 - (1-\tilde\epsilon)^2}{4n} + o(1/n), \qquad \lambda_{\min}(A_\varphi) = -\frac{n}{2} + \frac{1-\tilde\epsilon}{2} + \frac{8 + (1-\tilde\epsilon)^2}{4n} + o(1/n), \]
and
\[ s\big( K^+_{n/2+k,\,n/2-k} \big) \ge n - (1-\tilde\epsilon) - \frac{(1-\tilde\epsilon)^2}{2n} + o(1/n). \]

Next, we aim to compute $s_b(n,m)$, $m = (n/2+k)(n/2-k) + 1$. By Proposition 5, $s_b(n,m)$ is equal to the maximum of $s(K^m_{n/2+\ell,\,n/2-\ell})$ over all $\ell \in [0, k-1]$, $k - \ell \in \mathbb{N}$. As previously noted, for $n$ sufficiently large, the quantity $s(K^m_{n/2+\ell,\,n/2-\ell})$ is given exactly by Equation (3.2), and so the optimal choice of $\ell$ minimizes
\[ \begin{aligned} f(\ell) :={}& (n/2+\ell-1)(k^2 - \ell^2 - 1)\big( n/2 - \ell - (k^2 - \ell^2 - 1) \big) \\ ={}& (n/2+\ell)\big( (1-\tilde\epsilon)n/2 - \ell^2 \big)\big( \tilde\epsilon n/2 + \ell^2 - \ell \big) + O(n^2). \end{aligned} \]

We have
\[ f(k-1) = (n/2+k-2)(2k-2)(n/2-3k+3), \]
and if $\ell \le \tfrac{4}{5}k$, then $f(\ell) = \Omega(n^3)$. Therefore the minimizing $\ell$ is in $[\tfrac{4}{5}k, k]$. The derivative of $f(\ell)$ is given by
\[ \begin{aligned} f'(\ell) ={}& (k^2 - \ell^2 - 1)(n/2 - \ell - k^2 + \ell^2 + 1) - 2\ell(n/2+\ell-1)(n/2 - \ell - k^2 + \ell^2 + 1) \\ &+ (2\ell-1)(n/2+\ell-1)(k^2 - \ell^2 - 1). \end{aligned} \]

For โ„“ โˆˆ [45๐‘˜, ๐‘˜],

๐‘“ โ€ฒ(โ„“) โ‰ค ๐‘›(๐‘˜2 โˆ’ โ„“2)

2โˆ’ โ„“๐‘›(๐‘›/2โˆ’ โ„“โˆ’ ๐‘˜2 + โ„“2) + 2โ„“(๐‘›/2 + โ„“)(๐‘˜2 โˆ’ โ„“2)

โ‰ค 9๐‘˜2๐‘›

50โˆ’ 4

5๐‘˜๐‘›(๐‘›/2โˆ’ ๐‘˜ โˆ’ 9

25๐‘˜2) + 18

25(๐‘›/2 + ๐‘˜)๐‘˜3

=81๐‘˜3๐‘›

125โˆ’ 2๐‘˜๐‘›2

5+ ๐‘‚(๐‘›2)

= ๐‘˜๐‘›2

(81(1โˆ’ ๐œ–)

250โˆ’ 2

5

)+ ๐‘‚(๐‘›2) < 0

for sufficiently large ๐‘›. This implies that the optimal choice is โ„“ = ๐‘˜ โˆ’ 1, and ๐‘ ๐‘(๐‘›,๐‘š) =

๐‘ (๐พ๐‘š๐‘›/2+๐‘˜โˆ’1,๐‘›/2โˆ’๐‘˜+1). The characteristic polynomial ๐‘„(๐‘›/2+๐‘˜โˆ’1, ๐‘›/2โˆ’๐‘˜+1, ๐‘›2/4โˆ’๐‘˜2+1)

equals

๐œ†4 โˆ’(๐‘›2/4โˆ’ ๐‘˜2 + 1

)๐œ†2 + 2(๐‘›/2 + ๐‘˜ โˆ’ 2)(๐‘›/2โˆ’ 3๐‘˜ + 3)(๐‘˜ โˆ’ 1).

By matching higher order terms, the extreme root of $Q$ is given by
\[ \lambda = \frac{n}{2} - \frac{1-\tilde\epsilon}{2} - \sqrt{\frac{2(1-\tilde\epsilon)}{n}} + \frac{27 - 14\tilde\epsilon - \tilde\epsilon^2}{4n} + o(1/n), \]
and so
\[ s_b(n,m) = n - (1-\tilde\epsilon) - 2\sqrt{\frac{2(1-\tilde\epsilon)}{n}} + \frac{27 - 14\tilde\epsilon - \tilde\epsilon^2}{2n} + o(1/n), \]
and
\[ \begin{aligned} \frac{s(n,m) - s_b(n,m)}{s(n,m)} &\ge \frac{2^{3/2}(1-\tilde\epsilon)^{1/2}}{n^{3/2}} - \frac{14 - 8\tilde\epsilon}{n^2} + o(1/n^2) \\ &= \frac{(1-\tilde\epsilon)^{1/2}}{m^{3/4}} + \frac{(1-\tilde\epsilon)^{1/2}}{(n/2)^{3/2}}\left[ 1 - \frac{(n/2)^{3/2}}{m^{3/4}} \right] - \frac{14 - 8\tilde\epsilon}{n^2} + o(1/n^2) \\ &\ge \frac{1 - \epsilon/2}{m^{3/4}} + o(1/m^{3/4}). \end{aligned} \]
This completes the proof.


3.3 A Sketch for the Spread Conjecture

Here, we provide a concise, high-level description of an asymptotic proof of the Spread

Conjecture, proving only the key graph-theoretic structural results for spread-extremal

graphs. The full proof itself is quite involved (and long), making use of interval arith-

metic and a number of fairly complicated symbolic calculations, but conceptually, is quite

intuitive. For the full proof, we refer the reader to [12]. The proof consists of four main steps:

Step 1: Graph-Theoretic Results

In Subsection 3.3.1, we observe a number of important structural properties of any graph

that maximizes the spread for a given order ๐‘›. In particular, in Theorem 11, we show that

โ€ข any graph that maximizes spread is be the join of two threshold graphs, each with

order linear in ๐‘›,

โ€ข the unit eigenvectors x and z corresponding to ๐œ†1(๐บ) and ๐œ†๐‘›(๐บ) have infinity

norms of order ๐‘›โˆ’1/2 ,

โ€ข the quantities ๐œ†1x2๐‘ข โˆ’ ๐œ†๐‘›z

2๐‘ข, ๐‘ข โˆˆ ๐‘‰ , are all nearly equal, up to a term of order ๐‘›โˆ’1.

This last structural property serves as the key backbone of our proof. In addition, we

note that, by a tensor argument (replacing vertices by independent sets $\overline{K_t}$ and edges by
bicliques $K_{t,t}$), an asymptotic upper bound for $s(n)$ implies a bound for all $n$.

Step 2: Graph Limits, Averaging, and a Finite-Dimensional Eigenvalue Problem

For this step, we provide only a brief overview, and refer the reader to [12] for additional
details. Here, we make use of graph limits to understand how spread-extremal
graphs behave as $n$ tends to infinity. By proving a continuous version of Theorem 11

for graphons, and using an averaging argument inspired by [112], we can show that


the spread-extremal graphon takes the form of a step-graphon with a fixed structure of

symmetric seven by seven blocks, illustrated below.

The lengths ๐›ผ = (๐›ผ1, ..., ๐›ผ7), ๐›ผ๐‘‡1 = 1, of each row/column in the spread-optimal

step-graphon is unknown. For any choice of lengths ๐›ผ, we can associate a seven by

seven matrix whose spread is identical to that of the associated step-graphon pictured

above. Let ๐ต be the seven by seven matrix with ๐ต๐‘–,๐‘— equal to the value of the above

step-graphon on block ๐‘–, ๐‘—, and ๐ท = diag(๐›ผ1, ..., ๐›ผ7) be a diagonal matrix with ๐›ผ on the

diagonal. Then the matrix $D^{1/2} B D^{1/2}$, given by
\[
\operatorname{diag}(\alpha_1, \ldots, \alpha_7)^{1/2}
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 0 & 1 & 1 & 1\\
1 & 1 & 0 & 0 & 1 & 1 & 1\\
1 & 0 & 0 & 0 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1 & 1 & 0\\
1 & 1 & 1 & 1 & 1 & 0 & 0\\
1 & 1 & 1 & 1 & 0 & 0 & 0
\end{pmatrix}
\operatorname{diag}(\alpha_1, \ldots, \alpha_7)^{1/2},
\]

has spread equal to the spread of the associated step-graphon. This process has reduced

the graph limit version of our problem to a finite-dimensional eigenvalue optimization

problem, which we can endow with additional structural constraints resulting from the

graphon version of Theorem 11.
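As a concrete illustration of this reduction (the code below is our own sketch, not the computer-assisted proof of [118]), the spread of the seven by seven matrix $D^{1/2}BD^{1/2}$ can be evaluated directly for any choice of $\alpha$; the choice $\alpha_1 = 2/3$, $\alpha_6 = 1/3$ identified in Step 3 gives spread $2/\sqrt{3}$:

```python
import numpy as np

# 0/1 block values of the step-graphon (the seven by seven matrix B above).
B = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0],
], dtype=float)

def spread_graphon(alpha):
    """Spread of D^{1/2} B D^{1/2} for block lengths alpha (summing to 1)."""
    d = np.sqrt(np.asarray(alpha, dtype=float))
    w = np.linalg.eigvalsh(d[:, None] * B * d[None, :])
    return w[-1] - w[0]

alpha_star = [2/3, 0, 0, 0, 0, 1/3, 0]
print(spread_graphon(alpha_star))  # 2/sqrt(3), approximately 1.1547
```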


(a) ๐›ผ๐‘– = 0 for all ๐‘– (b) ๐›ผ2 = ๐›ผ3 = ๐›ผ4 = 0

Figure 3-1: Contour plots of the spread for some choices of ๐›ผ. Each point (๐‘ฅ, ๐‘ฆ) of Plot (a)illustrates the maximum spread over all choices of ๐›ผ satisfying ๐›ผ3+๐›ผ4 = ๐‘ฅ and ๐›ผ6+๐›ผ7 = ๐‘ฆ(and therefore, ๐›ผ1 +๐›ผ2 +๐›ผ5 = 1โˆ’ ๐‘ฅโˆ’ ๐‘ฆ) on a grid of step size 1/100. Each point (๐‘ฅ, ๐‘ฆ) ofPlot (b) illustrates the maximum spread over all choices of ๐›ผ satisfying ๐›ผ2 = ๐›ผ3 = ๐›ผ4 = 0,๐›ผ5 = ๐‘ฆ, and ๐›ผ7 = ๐‘ฅ on a grid of step size 1/100. The maximum spread of Plot (a) isachieved at the black x, and implies that, without loss of generality, ๐›ผ3 + ๐›ผ4 = 0, andtherefore ๐›ผ2 = 0 (indices ๐›ผ1 and ๐›ผ2 can be combined when ๐›ผ3 + ๐›ผ4 = 0). Plot (b) treatsthis case when ๐›ผ2 = ๐›ผ3 = ๐›ผ4 = 0, and the maximum spread is achieved on the black line.This implies that either ๐›ผ5 = 0 or ๐›ผ7 = 0. In both cases, this reduces to the block two bytwo case ๐›ผ1, ๐›ผ7 = 0 (or, if ๐›ผ7 = 0, then ๐›ผ1, ๐›ผ6 = 0).

Step 3: Computer-Assisted Proof of a Finite-Dimensional Eigenvalue Problem

Using a computer-assisted proof, we can show that the optimizing choice of ๐›ผ๐‘–, ๐‘– =

1, ..., 7, is, without loss of generality, given by ๐›ผ1 = 2/3, ๐›ผ6 = 1/3, and all other ๐›ผ๐‘– = 0.

This is exactly the limit of the conjectured spread optimal graph as ๐‘› tends to infinity.

The proof of this fact is extremely technical. Though not a proof, in Figure 3-1 we provide
intuitive visual justification that this result is true. In this figure, we provide contour
plots resulting from numerical computations of the spread of the above matrix for various
values of $\alpha$. The numerical results suggest that the $2/3$--$1/3$ two by two block
step-graphon is indeed optimal. See Figure 3-1 and the associated caption for details. The


actual proof of this fact consists of three parts. First, we can reduce the possible choices
of non-zero $\alpha_i$ from $2^7$ to 17 different cases by removing different representations of the
same matrix. Second, using eigenvalue equations, the graph limit version of the fact that the
quantities $\lambda_1 \mathbf{x}_u^2 - \lambda_n \mathbf{z}_u^2$ are all nearly equal, and interval arithmetic, we can prove that, of the 17 cases, only the
cases $\alpha_1, \alpha_7 \ne 0$ or $\alpha_4, \alpha_5, \alpha_7 \ne 0$ need to be considered. Finally, using basic results
from the theory of cubic polynomials and computer-assisted symbolic calculations, we
can restrict ourselves to the case $\alpha_1, \alpha_7 \ne 0$. This case corresponds to a quadratic
polynomial and can easily be shown to be maximized by $\alpha_1 = 2/3$, $\alpha_7 = 1/3$. We refer

the reader to [118] for the code (based on [95]) and output of the interval arithmetic

algorithm (the key portion of this step), and [12] for additional details regarding this step.

Step 4: From Graph Limits to an Asymptotic Proof of the Spread Conjecture

We can convert our result for the spread-optimal graph limit to a statement for graphs.

This process consists of showing that any spread optimal graph takes the form $(K_{n_1} \cup
\overline{K_{n_2}}) \vee \overline{K_{n_3}}$ for $n_1 = (2/3 + o(1))n$, $n_2 = o(n)$, and $n_3 = (1/3 + o(1))n$, i.e., any
spread optimal graph is equal, up to a set of $o(n)$ vertices, to the conjectured optimal
graph $K_{\lfloor 2n/3 \rfloor} \vee \overline{K_{\lceil n/3 \rceil}}$, and then showing that, for $n$ sufficiently large, the spread of
$(K_{n_1} \cup \overline{K_{n_2}}) \vee \overline{K_{n_3}}$, $n_1 + n_2 + n_3 = n$, is maximized when $n_2 = 0$. This step is fairly

straightforward, and we refer the reader to [12] for details.
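The limiting graph in this step can be checked against the closed-form spread of a join: for $K_k \vee \overline{K_{n-k}}$ the extreme eigenvalues are the roots of $\lambda^2 - (k-1)\lambda - k(n-k)$, so the spread equals $\sqrt{(k-1)^2 + 4k(n-k)}$. A small numerical sketch (helper names are our own):

```python
import math
import numpy as np

def join_clique_independent(k, m):
    """Adjacency matrix of K_k joined to an independent set of size m."""
    n = k + m
    A = np.zeros((n, n))
    A[:k, :k] = 1 - np.eye(k)   # clique on the first k vertices
    A[:k, k:] = 1               # join edges
    A[k:, :k] = 1
    return A

def spread(A):
    w = np.linalg.eigvalsh(A)
    return w[-1] - w[0]

# n = 30, k = 20 corresponds to the conjectured extremal graph for n = 30.
n, k = 30, 20
assert math.isclose(spread(join_clique_independent(k, n - k)),
                    math.sqrt((k - 1) ** 2 + 4 * k * (n - k)), rel_tol=1e-9)
```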

3.3.1 Properties of Spread-Extremal Graphs

In this subsection, we consider properties of graphs that maximize the quantity ๐œ†1(๐บ) โˆ’

๐‘ ๐œ†๐‘›(๐บ) over all graphs of order ๐‘›, for some fixed ๐‘ > 0. When ๐‘ = 1, this is exactly

the class of spread-extremal graphs. Let x be the unit eigenvector of ๐œ†1 and z be the unit

eigenvector of ๐œ†๐‘›. As noted in [48] (for the case of ๐‘ = 1), we have

๐œ†1 โˆ’ ๐‘ ๐œ†๐‘› =โˆ‘๐‘ขโˆผ๐‘ฃ

x๐‘ขx๐‘ฃ โˆ’ ๐‘ z๐‘ขz๐‘ฃ,


and if a graph ๐บ maximizes ๐œ†1 โˆ’ ๐‘ ๐œ†๐‘› over all ๐‘› vertex graphs, then ๐‘ข โˆผ ๐‘ฃ, ๐‘ข = ๐‘ฃ, if

x๐‘ขx๐‘ฃ โˆ’ ๐‘ z๐‘ขz๐‘ฃ > 0 and ๐‘ข โˆผ ๐‘ฃ if x๐‘ขx๐‘ฃ โˆ’ ๐‘ z๐‘ขz๐‘ฃ < 0. We first produce some weak lower

bounds for the maximum of ๐œ†1 โˆ’ ๐‘ ๐œ†๐‘› over all ๐‘› vertex graphs.

Proposition 7.
\[
\max_{G} \; \lambda_1(G) - c\,\lambda_n(G) \ge n\left[1 + \min\{1, c^2\}/50\right]
\]
for all $c > 0$ and $n \ge 100 \max\{1, c^{-2}\}$.

Proof. We consider two different graphs, depending on the choice of $c$. When $c \ge 5/4$,
$n \ge 100$, we consider the biclique $K_{\lfloor n/2 \rfloor, \lceil n/2 \rceil}$ and note that
\[
\lambda_1(K_{\lfloor n/2 \rfloor, \lceil n/2 \rceil}) - c\,\lambda_n(K_{\lfloor n/2 \rfloor, \lceil n/2 \rceil}) \ge \frac{(c+1)(n-1)}{2} > n[1 + 1/50].
\]

When ๐‘ โ‰ค 5/4, we consider the graph ๐บ(๐‘›, ๐‘˜) = ๐พ๐‘˜ โˆจ๐พ๐‘›โˆ’๐‘˜, ๐‘˜ > 0, with characteristic

polynomial ๐œ†๐‘›โˆ’๐‘˜โˆ’1(๐œ† + 1)๐‘˜โˆ’1(๐œ†2 โˆ’ (๐‘˜ โˆ’ 1)๐œ†โˆ’ ๐‘˜(๐‘›โˆ’ ๐‘˜)

), and note that

๐œ†1(๐บ(๐‘›, ๐‘˜))โˆ’ ๐‘ ๐œ†๐‘›(๐บ(๐‘›, ๐‘˜)) =(1โˆ’ ๐‘)(๐‘˜ โˆ’ 1)

2+

1 + ๐‘

2

โˆš(๐‘˜ โˆ’ 1)2 + 4(๐‘›โˆ’ ๐‘˜)๐‘˜.

Setting ๐‘˜ = โŒˆ(1โˆ’๐‘/3)๐‘›โŒ‰+1 and using the inequalities (1โˆ’๐‘/3)๐‘›+1 โ‰ค ๐‘˜ < (1โˆ’๐‘/3)๐‘›+2,

we have(1โˆ’ ๐‘)(๐‘˜ โˆ’ 1)

2โ‰ฅ

(1โˆ’ ๐‘)(1โˆ’ ๐‘3)๐‘›

2= ๐‘›

[1

2โˆ’ 2๐‘

3+

๐‘2

6

],

and
\[
\begin{aligned}
(k-1)^2 + 4(n-k)k &\ge \left(1 - \frac{c}{3}\right)^2 n^2 + 4\left(\frac{cn}{3} - 2\right)\left(n - \frac{cn}{3} + 1\right)\\
&= n^2\left[1 + \frac{2c}{3} - \frac{c^2}{3} + \frac{4c-8}{n} - \frac{8}{n^2}\right]
\ge n^2\left[1 + \frac{2c}{3} - \frac{c^2}{2}\right]
\end{aligned}
\]

for ๐‘› โ‰ฅ 100/๐‘2. Therefore,

๐œ†1(๐บ(๐‘›, ๐‘˜))โˆ’ ๐‘ ๐œ†๐‘›(๐บ(๐‘›, ๐‘˜)) โ‰ฅ ๐‘›

[1

2โˆ’ 2๐‘

3+

๐‘2

6+

1 + ๐‘

2

โˆš1 +

2๐‘

3โˆ’ ๐‘2

2

].


Using the inequality $\sqrt{1+x} \ge 1 + x/2 - x^2/8$ for all $x \in [0,1]$,
\[
\sqrt{1 + \frac{2c}{3} - \frac{c^2}{2}} \ge 1 + \left(\frac{c}{3} - \frac{c^2}{4}\right) - \frac{1}{8}\left(\frac{2c}{3} - \frac{c^2}{2}\right)^2,
\]

and, after a lengthy calculation, we obtain
\[
\lambda_1(G(n,k)) - c\,\lambda_n(G(n,k)) \ge n\left[1 + \frac{13c^2}{72} - \frac{c^3}{9} + \frac{5c^4}{192} - \frac{c^5}{64}\right] \ge n\left[1 + c^2/50\right].
\]

Combining these two estimates completes the proof.
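The lower bound can be spot-checked numerically using the closed-form expression for $\lambda_1 - c\,\lambda_n$ on $G(n,k)$ derived above; the sketch below (our own code) uses the choice $k = \lceil (1-c/3)n \rceil + 1$ from the proof:

```python
import math

def cspread_join(n, k, c):
    """lambda_1 - c*lambda_n for G(n,k), using the roots of the quadratic
    factor lambda^2 - (k-1)*lambda - k(n-k)."""
    root = math.sqrt((k - 1) ** 2 + 4 * (n - k) * k)
    return (1 - c) * (k - 1) / 2 + (1 + c) * root / 2

# Check the bound of Proposition 7 for a few pairs (c, n) with c <= 5/4
# and n >= 100 * max(1, c**-2).
for c, n in [(1.0, 200), (0.5, 400), (1.25, 100)]:
    k = math.ceil((1 - c / 3) * n) + 1
    assert cspread_join(n, k, c) >= n * (1 + min(1, c * c) / 50)
```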

This implies that for ๐‘› sufficiently large and ๐‘ fixed, any extremal graph must satisfy

|๐œ†๐‘›| = ฮฉ(๐‘›). In the following theorem, we make a number of additional observations

regarding the structure of these extremal graphs. Similar results for the specific case of ๐‘ = 1

can be found in the full paper [12].

Theorem 11. Let ๐บ = (๐‘‰,๐ธ), ๐‘› โ‰ฅ 3, maximize the quantity ๐œ†1(๐บ) โˆ’ ๐‘ ๐œ†๐‘›(๐บ), for some

๐‘ > 0, over all ๐‘› vertex graphs, and x and z be unit eigenvectors corresponding to ๐œ†1 and

๐œ†๐‘›, respectively. Then

1. ๐บ = ๐บ[๐‘ƒ ] โˆจ๐บ[๐‘ ], where ๐‘ƒ = ๐‘ข โˆˆ ๐‘‰ | z๐‘ข โ‰ฅ 0 and ๐‘ = ๐‘‰ โˆ–๐‘ƒ ,

2. x๐‘ขx๐‘ฃ โˆ’ ๐‘ z๐‘ขz๐‘ฃ = 0 for all ๐‘ข = ๐‘ฃ,

3. ๐บ[๐‘ƒ ] and ๐บ[๐‘ ] are threshold graphs, i.e., there exists an ordering ๐‘ข1, ..., ๐‘ข|๐‘ƒ | of the

vertices of ๐บ[๐‘ƒ ] (and ๐บ[๐‘ ]) such that if ๐‘ข๐‘– โˆผ ๐‘ข๐‘— , ๐‘– < ๐‘—, then ๐‘ข๐‘— โˆผ ๐‘ข๐‘˜, ๐‘˜ = ๐‘–, implies

that ๐‘ข๐‘– โˆผ ๐‘ข๐‘˜,

4. |๐‘ƒ |, |๐‘ | > min1, ๐‘4๐‘›/2500 for ๐‘› โ‰ฅ 100 max1, ๐‘โˆ’2,

5. โ€–xโ€–โˆž โ‰ค 6/๐‘›1/2 and โ€–zโ€–โˆž โ‰ค 103 max1, ๐‘โˆ’3/๐‘›1/2 for ๐‘› โ‰ฅ 100 max1, ๐‘โˆ’2,

6.๐‘›(๐œ†1x

2๐‘ข โˆ’ ๐‘ ๐œ†๐‘›z

2๐‘ข) โˆ’ (๐œ†1 โˆ’ ๐‘ ๐œ†๐‘›)

โ‰ค 5 ยท 1012 max๐‘, ๐‘โˆ’12 for all ๐‘ข โˆˆ ๐‘‰ for ๐‘› โ‰ฅ

4 ยท 106 max1, ๐‘โˆ’6.


Proof. Property 1 follows immediately, as $\mathbf{x}_u \mathbf{x}_v - c\,\mathbf{z}_u \mathbf{z}_v > 0$ for all $u \in P$, $v \in N$. To
prove Property 2, we proceed by contradiction. Suppose that this is not the case, that
$\mathbf{x}_{u_1} \mathbf{x}_{u_2} - c\,\mathbf{z}_{u_1} \mathbf{z}_{u_2} = 0$ for some $u_1 \ne u_2$. Let $G'$ be the graph resulting from either
adding edge $\{u_1, u_2\}$ to $G$ (if $\{u_1, u_2\} \notin E(G)$) or removing edge $\{u_1, u_2\}$ from $G$ (if
$\{u_1, u_2\} \in E(G)$). Then,
\[
\lambda_1(G) - c\,\lambda_n(G) = \sum_{\{u,v\} \in E(G')} \mathbf{x}_u \mathbf{x}_v - c\,\mathbf{z}_u \mathbf{z}_v \le \lambda_1(G') - c\,\lambda_n(G'),
\]

and so, by the optimality of ๐บ, x and z are also eigenvectors of the adjacency matrix of ๐บโ€ฒ.

This implies that
\[
\left(\lambda_1(G) - \lambda_1(G')\right)\mathbf{x}_{u_1} = \pm\,\mathbf{x}_{u_2} \quad \text{and} \quad \left(\lambda_1(G) - \lambda_1(G')\right)\mathbf{x}_v = 0
\]
for any $v \ne u_1, u_2$. If $n \ge 3$, then there exists some $v$ such that $\mathbf{x}_v = 0$, and so, by the

Perron-Frobenius theorem, ๐บ is disconnected. By Property 1, either ๐‘ƒ = โˆ… or ๐‘ = โˆ…, and

so ๐œ†๐‘› โ‰ฅ 0, a contradiction, as the trace of ๐ด is zero. This completes the proof of Property

2. To show Property 3, we simply order the vertices of $P$ so that the quantity $\mathbf{z}_{u_i}/\mathbf{x}_{u_i}$ is
non-decreasing in $i$. If $u_i \sim u_j$, $i < j$, and $u_j \sim u_k$, without loss of generality with $i < k$,
then
\[
\mathbf{x}_{u_i}\mathbf{x}_{u_k} - c\,\mathbf{z}_{u_i}\mathbf{z}_{u_k} \ge \mathbf{x}_{u_i}\mathbf{x}_{u_k} - c\,\frac{\mathbf{x}_{u_i}}{\mathbf{x}_{u_j}}\,\mathbf{z}_{u_j}\mathbf{z}_{u_k} = \frac{\mathbf{x}_{u_i}}{\mathbf{x}_{u_j}}\left(\mathbf{x}_{u_j}\mathbf{x}_{u_k} - c\,\mathbf{z}_{u_j}\mathbf{z}_{u_k}\right) > 0.
\]

To show Properties 4--6, we assume $n \ge 100 \max\{1, c^{-2}\}$, and so, by Proposition 7,
$\lambda_n(G) < -\min\{1, c^2\}\,n/50$. We begin with Property 4. Suppose, to the contrary, that
$|P| \le \min\{1, c^4\}\,n/2500$, and let $v$ maximize $|\mathbf{z}_v|$. If $v \in N$, then
\[
\lambda_n = \sum_{u \sim v} \frac{\mathbf{z}_u}{\mathbf{z}_v} \ge \sum_{u \in P} \frac{\mathbf{z}_u}{\mathbf{z}_v} \ge -|P| \ge -\min\{1, c^4\}\,n/2500,
\]


a contradiction to ๐œ†๐‘›(๐บ) < โˆ’min1, ๐‘2๐‘›/50. If ๐‘ฃ โˆˆ ๐‘ƒ , then

๐œ†2๐‘› =

โˆ‘๐‘ขโˆผ๐‘ฃ

๐œ†๐‘›z๐‘ขz๐‘ฃ

=โˆ‘๐‘ขโˆผ๐‘ฃ

โˆ‘๐‘คโˆผ๐‘ข

z๐‘คz๐‘ฃโ‰คโˆ‘๐‘ขโˆผ๐‘ฃ

โˆ‘๐‘คโˆˆ๐‘ƒ,๐‘คโˆผ๐‘ข

z๐‘คz๐‘ฃโ‰ค ๐‘› |๐‘ƒ | โ‰ค min1, ๐‘4๐‘›2/2500,

again, a contradiction. This completes the proof of Property 4.

We now prove Property 5. Suppose that $u$ maximizes $|\mathbf{x}_u|$ and $v$ maximizes $|\mathbf{z}_v|$, and,
without loss of generality, $v \in N$. Let
\[
A = \left\{ w \;:\; \mathbf{x}_w > \frac{\mathbf{x}_u}{3} \right\} \quad \text{and} \quad B = \left\{ w \;:\; |\mathbf{z}_w| > \frac{\min\{1, c^2\}\,|\mathbf{z}_v|}{100} \right\}.
\]

We have
\[
\lambda_1 \mathbf{x}_u = \sum_{w \sim u} \mathbf{x}_w \le \mathbf{x}_u\left(|A| + (n - |A|)/3\right)
\]
and $\lambda_1 \ge n/2$, and so $|A| \ge n/4$. In addition,
\[
1 \ge \sum_{w \in A} \mathbf{x}_w^2 \ge |A|\,\frac{\|\mathbf{x}\|_\infty^2}{9} \ge \frac{n\,\|\mathbf{x}\|_\infty^2}{36},
\]
so $\|\mathbf{x}\|_\infty \le 6/n^{1/2}$. Similarly,

๐œ†๐‘›z๐‘ฃ =โˆ‘๐‘ขโˆผ๐‘ฃ

z๐‘ข โ‰ค |z๐‘ฃ|(|๐ต|+ min1, ๐‘2

100(๐‘›โˆ’ |๐ต|)

)

and |๐œ†๐‘›(๐บ)| โ‰ฅ min1, ๐‘2๐‘›/50, and so |๐ต| โ‰ฅ min1, ๐‘2๐‘›/100. In addition,

1 โ‰ฅโˆ‘๐‘ขโˆˆ๐ต

z2๐‘ข โ‰ฅ |๐ต|min1, ๐‘4โ€–zโ€–2โˆž

104โ‰ฅ min1, ๐‘6๐‘› โ€–zโ€–

2โˆž

106,

so โ€–zโ€–โˆž โ‰ค 103 max1, ๐‘โˆ’3/๐‘›1/2. This completes the proof of Property 5.

Finally, we prove Property 6. Let $\widetilde{G} = (V(\widetilde{G}), E(\widetilde{G}))$ be the graph resulting from
deleting $u$ from $G$ and cloning $v$, namely,
\[
V(\widetilde{G}) = \{v'\} \cup \left[V(G)\setminus\{u\}\right] \quad \text{and} \quad E(\widetilde{G}) = E(G \setminus u) \cup \left\{\{v', w\} \;:\; \{v, w\} \in E(G)\right\}.
\]
Let $\widetilde{A}$ be the adjacency matrix of $\widetilde{G}$, and let $\widetilde{\mathbf{x}}$ and $\widetilde{\mathbf{z}}$ equal
\[
\widetilde{\mathbf{x}}_w = \begin{cases} \mathbf{x}_w & w \ne v' \\ \mathbf{x}_v & w = v', \end{cases}
\qquad
\widetilde{\mathbf{z}}_w = \begin{cases} \mathbf{z}_w & w \ne v' \\ \mathbf{z}_v & w = v'. \end{cases}
\]

Then
\[
\widetilde{\mathbf{x}}^T\widetilde{\mathbf{x}} = 1 - \mathbf{x}_u^2 + \mathbf{x}_v^2, \qquad \widetilde{\mathbf{z}}^T\widetilde{\mathbf{z}} = 1 - \mathbf{z}_u^2 + \mathbf{z}_v^2,
\]
and
\[
\widetilde{\mathbf{x}}^T\widetilde{A}\widetilde{\mathbf{x}} = \lambda_1 - 2\lambda_1\mathbf{x}_u^2 + 2\lambda_1\mathbf{x}_v^2 - 2A_{uv}\mathbf{x}_u\mathbf{x}_v, \qquad
\widetilde{\mathbf{z}}^T\widetilde{A}\widetilde{\mathbf{z}} = \lambda_n - 2\lambda_n\mathbf{z}_u^2 + 2\lambda_n\mathbf{z}_v^2 - 2A_{uv}\mathbf{z}_u\mathbf{z}_v.
\]

By the optimality of ๐บ,

๐œ†1(๐บ)โˆ’ ๐‘ ๐œ†๐‘›(๐บ)โˆ’

(x๐‘‡๐ดx

x๐‘‡ xโˆ’ ๐‘

z๐‘‡๐ดz

z๐‘‡ z

)โ‰ฅ 0.

Rearranging terms, we have
\[
\frac{\lambda_1\mathbf{x}_u^2 - \lambda_1\mathbf{x}_v^2 + 2A_{u,v}\mathbf{x}_u\mathbf{x}_v}{1 - \mathbf{x}_u^2 + \mathbf{x}_v^2} - c\,\frac{\lambda_n\mathbf{z}_u^2 - \lambda_n\mathbf{z}_v^2 + 2A_{u,v}\mathbf{z}_u\mathbf{z}_v}{1 - \mathbf{z}_u^2 + \mathbf{z}_v^2} \ge 0,
\]

and so
\[
(\lambda_1\mathbf{x}_u^2 - c\,\lambda_n\mathbf{z}_u^2) - (\lambda_1\mathbf{x}_v^2 - c\,\lambda_n\mathbf{z}_v^2) \ge -\left[\frac{\lambda_1(\mathbf{x}_u^2 - \mathbf{x}_v^2)^2 + 2A_{u,v}\mathbf{x}_u\mathbf{x}_v}{1 - \mathbf{x}_u^2 + \mathbf{x}_v^2} - c\,\frac{\lambda_n(\mathbf{z}_u^2 - \mathbf{z}_v^2)^2 + 2A_{u,v}\mathbf{z}_u\mathbf{z}_v}{1 - \mathbf{z}_u^2 + \mathbf{z}_v^2}\right].
\]

The infinity norms of $\mathbf{x}$ and $\mathbf{z}$ are at most $6/n^{1/2}$ and $10^3\max\{1, c^{-3}\}/n^{1/2}$, respectively,
and so we can upper bound
\[
\left|\frac{\lambda_1(\mathbf{x}_u^2 - \mathbf{x}_v^2)^2 + 2A_{u,v}\mathbf{x}_u\mathbf{x}_v}{1 - \mathbf{x}_u^2 + \mathbf{x}_v^2}\right| \le \frac{4n\|\mathbf{x}\|_\infty^4 + 2\|\mathbf{x}\|_\infty^2}{1 - 2\|\mathbf{x}\|_\infty^2} \le \frac{4\cdot 6^4 + 2\cdot 6^2}{n/2},
\]

and๐œ†๐‘›(z2๐‘ข โˆ’ z2๐‘ฃ)

2 + 2๐ด๐‘ข,๐‘ฃz๐‘ขz๐‘ฃ1โˆ’ z2๐‘ข + z2๐‘ฃ

โ‰ค2๐‘›โ€–zโ€–4โˆž + 2โ€–zโ€–2โˆž

1โˆ’ 2โ€–zโ€–2โˆž

โ‰ค 2(1012 + 106) max1, ๐‘โˆ’12

๐‘›/2

for ๐‘› โ‰ฅ 4 ยท 106 max1, ๐‘โˆ’6. Our choice of ๐‘ข and ๐‘ฃ was arbitrary, and so

(๐œ†1x

2๐‘ข โˆ’ ๐‘ ๐œ†๐‘›z

2๐‘ข)โˆ’ (๐œ†1x

2๐‘ฃ โˆ’ ๐‘ ๐œ†๐‘›z

2๐‘ฃ)โ‰ค 5 ยท 1012 max๐‘, ๐‘โˆ’12/๐‘›

for all ๐‘ข, ๐‘ฃ โˆˆ ๐‘‰ . Noting thatโˆ‘

๐‘ข(๐œ†1x2๐‘ข โˆ’ ๐‘ ๐œ†๐‘›z

2๐‘ข) = ๐œ†1 โˆ’ ๐‘ ๐œ†๐‘› completes the proof.

The generality of this result implies that techniques similar to those used to prove the
spread conjecture could be used to understand the behavior of graphs that maximize $\lambda_1 - c\,\lambda_n$,
and how these graphs vary with the choice of $c \in (0, \infty)$.


Chapter 4

Force-Directed Layouts

4.1 Introduction

Graph drawing is an area at the intersection of mathematics, computer science, and more

qualitative fields. Despite the extensive literature in the field, in many ways the concept

of what constitutes the optimal drawing of a graph is heuristic at best, and subjective at

worst. For a general review of the major areas of research in graph drawing, we refer the

reader to [7, 57]. In this chapter, we briefly introduce the broad class of force-directed

layouts of graphs and consider two prominent force-based techniques for drawing a graph in

low-dimensional Euclidean space: Tutteโ€™s spring embedding and metric multidimensional

scaling.

The concept of a force-directed layout is somewhat ambiguous, but, loosely defined, it

is a technique for drawing a graph in a low-dimensional Euclidean space (usually dimension

โ‰ค 3) by applying โ€œforcesโ€ between the set of vertices and/or edges. In a force-directed

layout, vertices connected by an edge (or at a small graph distance from each other) tend

to be close to each other in the resulting layout. Below we briefly introduce a number of

prominent force-directed layouts, including Tutteโ€™s spring embedding, Eadesโ€™ algorithm, the

Kamada-Kawai objective and algorithm, and the more recent UMAP algorithm.

In his 1963 work titled "How to Draw a Graph" (received in May 1962 but published in 1963; for this reason, the year is referred to as 1962 in [124]), Tutte found an elegant technique


to produce planar embeddings of planar graphs that also minimize โ€œenergyโ€ (the sum of

squared edge lengths) in some sense [116]. In particular, for a three-connected planar graph,

he showed that if the outer face of the graph is fixed as the complement of some convex

region in the plane, and every other point is located at the mass center of its neighbors, then

the resulting embedding is planar. This result is now known as Tutteโ€™s spring embedding

theorem, and is considered by many to be the first example of a force-directed layout. One

of the major questions that this result does not treat is how to best embed the outer face. In

Section 4.2, we investigate this question, consider connections to a Schur complement, and

provide some theoretical results for this Schur complement using a discrete energy-only

trace theorem.

An algorithm for general graphs was later proposed by Eades in 1984, in which vertices

are placed in a random initial position in the plane, a logarithmic spring force is applied

to each edge and a repulsive inverse square-root law force is applied to each pair of non-

adjacent vertices [38]. The algorithm proceeds by iteratively moving each vertex towards

its local equilibrium. Five years later, Kamada and Kawai proposed an objective function

corresponding to the situation in which each pair of vertices is connected by a spring with

equilibrium length given by their graph distance and spring constant given by the inverse

square of their graph distance [55]. Their algorithm for locally minimizing this objective

consists of choosing vertices iteratively based on the magnitude of the associated derivative

of the objective, and locally minimizing with respect to that vertex. The associated objective

function is a specific example of metric multidimensional scaling, and both the objective

function and the proposed algorithm are quite popular in practice (see, for instance, popular

packages such as GRAPHVIZ [39] and the SMACOF package in R). In Section 4.3, we provide

a theoretical analysis of the Kamada-Kawai objective function. In particular, we prove a

number of structural results regarding optimal layouts, provide algorithmic lower bounds

for the optimization problem, and propose a polynomial time randomized approximation

scheme for drawing low diameter graphs.
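The Kamada--Kawai objective described above can be written down in a few lines; here is a minimal sketch (our own code, not the GRAPHVIZ or SMACOF implementation), where `D` holds pairwise graph distances and `X` a candidate layout:

```python
import numpy as np
from itertools import combinations

def kamada_kawai_stress(X, D):
    """Sum over pairs i < j of (|X_i - X_j| - d_ij)^2 / d_ij^2: springs with
    equilibrium length d_ij and spring constant 1/d_ij^2."""
    total = 0.0
    for i, j in combinations(range(len(D)), 2):
        total += (np.linalg.norm(X[i] - X[j]) - D[i][j]) ** 2 / D[i][j] ** 2
    return total

# A path on three vertices: the straight-line layout with unit spacing
# realizes every graph distance exactly, so the stress is zero.
D = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
X = np.array([[0, 0], [1, 0], [2, 0]], dtype=float)
print(kamada_kawai_stress(X, D))  # 0.0
```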

Most recently, a new force-based layout algorithm called UMAP was proposed as an



alternative to the popular t-SNE algorithm [82]. The UMAP algorithm takes a data set as

input, constructs a weighted graph, and performs a fairly complex force-directed layout in

which attractive and repulsive forces governed by a number of hyper-parameters are applied.

For the exact details of this approach, we refer the reader to [82, Section 3.2]. Though

the UMAP algorithm is not specifically analyzed in this chapter, it is likely that some of the
techniques proposed in this chapter can be applied (given a much more complicated analysis)

to more complicated force-directed layout objectives, such as that of the UMAP algorithm.

In addition, the popularity of the UMAP algorithm illustrates the continued importance and

relevance of force-based techniques and their analysis.

In the remainder of this chapter, we investigate both Tutteโ€™s spring embedding and the

Kamada-Kawai objective function. In Section 4.2, we formally describe Tutteโ€™s spring

embedding and consider theoretical questions regarding the best choice of boundary. In

particular, we consider connections to a Schur complement and, using a discrete version of

a trace theorem from the theory of elliptic PDEs, provide a spectral equivalence result for

this matrix. In Section 4.3, we formally define the Kamada-Kawai objective, and prove a

number of theoretical results for this optimization problem. We lower bound the objective

value and upper bound the diameter of any optimal layout, prove that a gap version of the

optimization problem is NP-hard even for bounded diameter graph metrics, and provide a

polynomial time randomized approximation scheme for low diameter graphs.


4.2 Tutteโ€™s Spring Embedding and a Trace Theorem

The question of how to โ€œbestโ€ draw a graph in the plane is not an easy one to answer, largely

due to the ambiguity in the objective. When energy (i.e. Hallโ€™s energy, the sum of squared

distances between adjacent vertices) minimization is desired, the optimal embedding in

the plane is given by the two-dimensional diffusion map induced by the eigenvectors of

the two smallest non-zero eigenvalues of the graph Laplacian [60, 61, 62]. This general

class of graph drawing techniques is referred to as spectral layouts. When drawing a planar

graph, often a planar embedding (a drawing in which edges do not intersect) is desirable.

However, spectral layouts of planar graphs are not guaranteed to be planar. When looking

at a triangulation of a given domain, it is commonplace for the near-boundary points of

the spectral layout to โ€œgrowโ€ out of the boundary, or lack any resemblance to a planar

embedding. For instance, see the spectral layout of a random triangulation of a disk in

Figure 4-1.

Tutteโ€™s spring embedding is an elegant technique to produce planar embeddings of planar

graphs that also minimize energy in some sense [116]. In particular, for a three-connected

planar graph, if the outer face of the graph is fixed as the complement of some convex region

in the plane, and every other point is located at the mass center of its neighbors, then the

resulting embedding is planar. This embedding minimizes Hallโ€™s energy, conditional on the

embedding of the boundary face. This result is now known as Tutteโ€™s spring embedding

theorem. While this result is well known (see [59], for example), it is not so obvious how

to embed the outer face. This, of course, should vary from case to case, depending on the

dynamics of the interior.

In this section, we investigate how to embed the boundary face so that the resulting

drawing is planar and Hallโ€™s energy is minimized (subject to some normalization). In what

follows, we produce an algorithm with theoretical guarantees for a large class of three-

connected planar graphs. Our analysis begins by observing that the Schur complement of

the graph Laplacian with respect to the interior vertices is, in some sense, the correct matrix

to consider when choosing an optimal embedding of boundary vertices. See Figure 4-2


(a) Circle (b) Spectral Layout (c) Schur Complement Layout

Figure 4-1: A Delaunay triangulation of 1250 points randomly generated on the disk (A), its non-planar spectral layout (B), and a planar layout using a spring embedding of the Schur complement of the graph Laplacian with respect to the interior vertices (C).

for a visual example of a spring embedding using the two minimal non-trivial eigenvectors

of the Schur complement. In order to theoretically understand the behavior of the Schur

complement, we prove a discrete trace theorem. Trace theorems are a class of results in

the theory of partial differential equations relating norms on the domain to norms on the

boundary, which are used to provide a priori estimates on the Dirichlet integral of functions

with given data on the boundary. We construct a discrete version of a trace theorem in the

plane for energy-only semi-norms. Using a discrete trace theorem, we show that this Schur

complement is spectrally equivalent to the boundary Laplacian to the one-half power. This

spectral equivalence proves the existence of low energy (w.r.t. the Schur complement) convex

and planar embeddings of the boundary, but is also of independent interest and applicability

in the study of spectral properties of planar graphs. These theoretical guarantees give rise to

a simple graph drawing algorithm with provable guarantees.

The remainder of this section is as follows. In Subsection 4.2.1, we formally introduce

Tutteโ€™s spring embedding theorem, describe the optimization problem under consideration,

illustrate the connection to a Schur complement, and, conditional on a spectral equivalence,

describe a simple algorithm with provable guarantees. In Subsection 4.2.2, we prove the

aforementioned spectral equivalence for a large class of three-connected planar graphs. In

particular, we consider trace theorems for Lipschitz domains from the theory of elliptic


(a) Laplacian Embedding (b) Schur Complement Embedding

Figure 4-2: A visual example of embeddings of the 2D finite element discretization graph 3elt, taken from the SuiteSparse Matrix Collection [27]. Figure (A) is the non-planar spectral layout of this 2D mesh, and Figure (B) is a planar spring embedding of the mesh using the minimal non-trivial eigenvectors of the Schur complement to embed the boundary.

partial differential equations, prove discrete energy-only variants of these results for a large

class of graphs in the plane, and show that the Schur complement with respect to the interior

is spectrally equivalent to the boundary Laplacian to the one-half power.

Definitions and Notation

Let ๐บ = (๐‘‰,๐ธ), ๐‘‰ = 1, ..., ๐‘›, ๐ธ โŠ‚ ๐‘’ โŠ‚ ๐‘‰ | |๐‘’| = 2, be a simple, connected, undirected

graph. A graph ๐บ is ๐‘˜-connected if it remains connected upon the removal of any ๐‘˜ โˆ’ 1

vertices, and is planar if it can be drawn in the plane such that no edges intersect (save for

adjacent edges at their mutual endpoint). A face of a planar embedding of a graph is a region

of the plane bounded by edges (including the outer infinite region, referred to as the outer

face). Let ๐’ข๐‘› be the set of all ordered pairs (๐บ,ฮ“), where ๐บ is a simple, undirected, planar,

three-connected graph of order ๐‘› > 4, and ฮ“ โŠ‚ ๐‘‰ is the set of vertices of some face of

๐บ. Three-connectedness is an important property for planar graphs, which, by Steinitzโ€™s

theorem, guarantees that the graph is the skeleton of a convex polyhedron [107]. This

characterization implies that, for three-connected graphs, the edges corresponding to each

face in a planar embedding are uniquely determined by the graph. In particular, the set of


faces is simply the set of induced cycles, so we may refer to faces of the graph without

specifying an embedding. One important corollary of this result is that, for ๐‘› โ‰ฅ 3, the

vertices of any face form an induced simple cycle.

Let ๐‘๐บ(๐‘–) be the neighborhood of vertex ๐‘–, ๐‘๐บ(๐‘†) be the union of the neighborhoods

of the vertices in ๐‘†, and ๐‘‘๐บ(๐‘–, ๐‘—) be the distance between vertices ๐‘– and ๐‘— in the graph

๐บ. When the associated graph is obvious, we may remove the subscript. Let ๐‘‘(๐‘–) be the

degree of vertex ๐‘–. Let ๐บ[๐‘†] be the graph induced by the vertices ๐‘†, and ๐‘‘๐‘†(๐‘–, ๐‘—) be the

distance between vertices ๐‘– and ๐‘— in ๐บ[๐‘†]. If ๐ป is a subgraph of ๐บ, we write ๐ป โŠ‚ ๐บ. The

Cartesian product ๐บ1๐บ2 between ๐บ1 = (๐‘‰1, ๐ธ1) and ๐บ2 = (๐‘‰2, ๐ธ2) is the graph with

vertices (๐‘ฃ1, ๐‘ฃ2) โˆˆ ๐‘‰1 ร— ๐‘‰2 and edge (๐‘ข1, ๐‘ข2), (๐‘ฃ1, ๐‘ฃ2) โˆˆ ๐ธ if (๐‘ข1, ๐‘ฃ1) โˆˆ ๐ธ1 and ๐‘ข2 = ๐‘ฃ2,

or ๐‘ข1 = ๐‘ฃ1 and (๐‘ข2, ๐‘ฃ2) โˆˆ ๐ธ2. The graph Laplacian ๐ฟ๐บ โˆˆ R๐‘›ร—๐‘› of ๐บ is the symmetric

positive semi-definite matrix defined by

โŸจ๐ฟ๐บ๐‘ฅ, ๐‘ฅโŸฉ =โˆ‘

๐‘–,๐‘—โˆˆ๐ธ

(๐‘ฅ๐‘– โˆ’ ๐‘ฅ๐‘—)2,

and, in general, a matrix is the graph Laplacian of some weighted graph if it is symmetric

diagonally dominant, has non-positive off-diagonal entries, and the vector 1 := (1, ..., 1)๐‘‡

lies in its nullspace. Given a matrix ๐ด, we denote the ๐‘–๐‘กโ„Ž row by ๐ด๐‘–,ยท, the ๐‘—๐‘กโ„Ž column by ๐ดยท,๐‘— ,

and the entry in the ๐‘–๐‘กโ„Ž row and ๐‘—๐‘กโ„Ž column by ๐ด๐‘–,๐‘— .

4.2.1 Spring Embeddings and a Schur Complement

Here and in what follows, we refer to ฮ“ as the โ€œboundaryโ€ of the graph ๐บ, ๐‘‰ โˆ–ฮ“ as the

โ€œinterior,โ€ and generally assume ๐‘›ฮ“ := |ฮ“| to be relatively large (typically ๐‘›ฮ“ = ฮ˜(๐‘›1/2)).

The concept of a โ€œboundaryโ€ face is somewhat arbitrary, but, depending on the application

from which the graph originated (i.e., a discretization of some domain), one face is often

already designated as the boundary face. If a face has not been designated, choosing the

largest induced cycle is a reasonable choice. By producing a planar drawing of ๐บ in the

plane and traversing the embedding, one can easily find all the induced cycles of ๐บ in linear

time and space [20].


Without loss of generality, suppose that $\Gamma = \{n - n_\Gamma + 1, \ldots, n\}$. A matrix $X \in \mathbb{R}^{n\times 2}$ is
said to be a planar embedding of $G$ if the drawing of $G$ using straight lines and with vertex $i$
located at coordinates $X_{i,\cdot}$ for all $i \in V$ is a planar drawing. A matrix $X_\Gamma \in \mathbb{R}^{n_\Gamma\times 2}$ is said
to be a convex embedding of $\Gamma$ if the embedding is planar and every point is an extreme
point of the convex hull $\mathrm{conv}\big(\{[X_\Gamma]_{i,\cdot}\}_{i=1}^{n_\Gamma}\big)$. Tutte's spring embedding theorem states that if
$X_\Gamma$ is a convex embedding of $\Gamma$, then the system of equations
\[
X_{i,\cdot} = \begin{cases} \dfrac{1}{d(i)} \displaystyle\sum_{j \in N(i)} X_{j,\cdot} & i = 1, \ldots, n - n_\Gamma \\[2mm] [X_\Gamma]_{i - (n - n_\Gamma),\cdot} & i = n - n_\Gamma + 1, \ldots, n \end{cases}
\]
has a unique solution $X$, and this solution is a planar embedding of $G$ [116].

We can write both the Laplacian and embedding of ๐บ in block-notation, differentiating

between interior and boundary vertices as follows:

๐ฟ๐บ =

โŽ›โŽ๐ฟ๐‘œ + ๐ท๐‘œ โˆ’๐ด๐‘œ,ฮ“

โˆ’๐ด๐‘‡๐‘œ,ฮ“ ๐ฟฮ“ + ๐ทฮ“

โŽžโŽ  โˆˆ R๐‘›ร—๐‘›, ๐‘‹ =

โŽ›โŽ๐‘‹๐‘œ

๐‘‹ฮ“

โŽžโŽ  โˆˆ R๐‘›ร—2,

where ๐ฟ๐‘œ, ๐ท๐‘œ โˆˆ R(๐‘›โˆ’๐‘›ฮ“)ร—(๐‘›โˆ’๐‘›ฮ“), ๐ฟฮ“, ๐ทฮ“ โˆˆ R๐‘›ฮ“ร—๐‘›ฮ“ , ๐ด๐‘œ,ฮ“ โˆˆ R(๐‘›โˆ’๐‘›ฮ“)ร—๐‘›ฮ“ , ๐‘‹๐‘œ โˆˆ R(๐‘›โˆ’๐‘›ฮ“)ร—2,

๐‘‹ฮ“ โˆˆ R๐‘›ฮ“ร—2, and ๐ฟ๐‘œ and ๐ฟฮ“ are the Laplacians of ๐บ[๐‘‰ โˆ–ฮ“] and ๐บ[ฮ“], respectively. Using

block notation, the system of equations for the Tutte spring embedding of some convex
embedding $X_\Gamma$ is given by
\[
X_o = (D_o + D[L_o])^{-1}\left[(D[L_o] - L_o)X_o + A_{o,\Gamma}X_\Gamma\right],
\]
where $D[A]$ is the diagonal matrix with diagonal entries given by the diagonal of $A$. The
unique solution to this system is
\[
X_o = (L_o + D_o)^{-1}A_{o,\Gamma}X_\Gamma.
\]

We note that this choice of ๐‘‹๐‘œ not only guarantees a planar embedding of ๐บ, but also


minimizes Hallโ€™s energy, namely,

arg min๐‘‹๐‘œ

โ„Ž(๐‘‹) = (๐ฟ๐‘œ + ๐ท๐‘œ)โˆ’1๐ด๐‘œ,ฮ“๐‘‹ฮ“,

where โ„Ž(๐‘‹) := Tr(๐‘‹๐‘‡๐ฟ๐‘‹) (see [62] for more on Hallโ€™s energy).

While Tutte's theorem is a very powerful result, guaranteeing that, given a convex embedding of any face, the energy-minimizing embedding of the remaining vertices results in a planar embedding, it gives no direction as to how this outer face should be embedded. We consider embeddings of the boundary that minimize Hall's energy, subject to some normalization. We consider embeddings that satisfy $X_\Gamma^T X_\Gamma = I$ and $X_\Gamma^T \mathbf{1} = 0$, though other normalizations, such as $X^T X = I$ and $X^T \mathbf{1} = 0$, would be equally appropriate. The analysis that follows in this section can be readily applied to this alternate normalization, but it does require the additional step of verifying a norm equivalence between $V$ and $\Gamma$ for the harmonic extension of low-energy vectors, which can be produced relatively easily for the class of graphs considered in Subsection 4.2.2. In addition, alternate normalization techniques, such as the introduction of negative weights on the boundary cycle, can also be considered. Let $\mathcal{X}$ be the set of all planar embeddings $X_\Gamma$ that satisfy $X_\Gamma^T X_\Gamma = I$, $X_\Gamma^T \mathbf{1} = 0$, and for which the spring embedding $X$ with $X_o = (L_o + D_o)^{-1} A_{o,\Gamma} X_\Gamma$ is planar. We consider the optimization problem
\[
\min \; h(X) \quad \text{s.t.} \quad X_\Gamma \in \mathrm{cl}(\mathcal{X}), \tag{4.1}
\]

where $\mathrm{cl}(\cdot)$ is the closure of a set. The set $\mathcal{X}$ is not closed, and the minimizer of (4.1) may be non-planar, but it must be arbitrarily close to a planar embedding. The normalizations $X_\Gamma^T \mathbf{1} = 0$ and $X_\Gamma^T X_\Gamma = I$ ensure that the solution does not degenerate into a single point or line. In what follows, we are primarily concerned with approximately solving this optimization problem and connecting it to the Schur complement of $L_G$ with respect to $V \setminus \Gamma$. It is unclear whether there exists an efficient algorithm to solve (4.1) exactly or whether the associated decision problem is NP-hard. If (4.1) is NP-hard, it seems rather difficult to verify that this is indeed the case.

A Schur Complement

Given some choice of $X_\Gamma$, by Tutte's theorem the minimum value of $h(X)$ is attained when $X_o = (L_o + D_o)^{-1} A_{o,\Gamma} X_\Gamma$, and is given by
\[
\mathrm{Tr}\left[ \begin{pmatrix} [(L_o + D_o)^{-1} A_{o,\Gamma} X_\Gamma]^T & X_\Gamma^T \end{pmatrix} \begin{pmatrix} L_o + D_o & -A_{o,\Gamma} \\ -A_{o,\Gamma}^T & L_\Gamma + D_\Gamma \end{pmatrix} \begin{pmatrix} (L_o + D_o)^{-1} A_{o,\Gamma} X_\Gamma \\ X_\Gamma \end{pmatrix} \right]
\]
\[
= \mathrm{Tr}\Big( X_\Gamma^T \big[ L_\Gamma + D_\Gamma - A_{o,\Gamma}^T (L_o + D_o)^{-1} A_{o,\Gamma} \big] X_\Gamma \Big) = \mathrm{Tr}\big( X_\Gamma^T S_\Gamma X_\Gamma \big),
\]
where $S_\Gamma$ is the Schur complement of $L_G$ with respect to $V \setminus \Gamma$,
\[
S_\Gamma = L_\Gamma + D_\Gamma - A_{o,\Gamma}^T (L_o + D_o)^{-1} A_{o,\Gamma}.
\]

For this reason, we can instead consider the optimization problem
\[
\min \; h_\Gamma(X_\Gamma) \quad \text{s.t.} \quad X_\Gamma \in \mathrm{cl}(\mathcal{X}), \tag{4.2}
\]
where
\[
h_\Gamma(X_\Gamma) := \mathrm{Tr}\big( X_\Gamma^T S_\Gamma X_\Gamma \big).
\]
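The identity $h_\Gamma(X_\Gamma) = \mathrm{Tr}(X_\Gamma^T S_\Gamma X_\Gamma) = h(X)$ at the harmonic extension can be checked directly on a small example. The wheel graph below is a hypothetical toy, not one of the discretizations considered in the thesis; the Schur complement is formed densely, which is harmless here since the interior block is $1 \times 1$.

```python
# Toy wheel graph: interior vertex 0 joined to the boundary 4-cycle 1..4.
# We check that Hall's energy of the harmonically extended embedding equals
# the boundary energy computed with the Schur complement S_Gamma.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (4, 1)]
n = 5
L = [[0.0] * n for _ in range(n)]
for p, q in edges:
    L[p][p] += 1; L[q][q] += 1
    L[p][q] -= 1; L[q][p] -= 1

bnd = [1, 2, 3, 4]
# S = L_bb - L_bo L_oo^{-1} L_ob; the interior block L_oo is the scalar L[0][0]
S = [[L[p][q] - L[p][0] * L[0][q] / L[0][0] for q in bnd] for p in bnd]

# boundary embedding at the corners of a square (a convex position)
XG = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
# harmonic extension of the interior vertex: the average of its neighbors
x0 = (sum(x for x, _ in XG) / 4.0, sum(y for _, y in XG) / 4.0)
X = [x0] + XG

def energy(M, pts):  # Tr(P^T M P) for a 2-column embedding P
    return sum(M[i][j] * (pts[i][0] * pts[j][0] + pts[i][1] * pts[j][1])
               for i in range(len(M)) for j in range(len(M)))

h_full = energy(L, X)   # Hall's energy h(X) of the full embedding
h_bnd = energy(S, XG)   # boundary energy Tr(X_G^T S X_G)
```

For this embedding both energies equal the total squared edge length of the drawing.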

Therefore, if the two minimal non-trivial eigenvectors of $S_\Gamma$ produce a planar spring embedding, then this is the exact solution of (4.2). However, a priori, there is no reason to think that this drawing would be planar. It turns out that, for a large class of graphs with some macroscopic structure, this is often the case (see, for example, Figures 4-1 and 4-2), and, in the rare instance that it is not, through a spectral equivalence result, a low-energy planar and convex embedding of the boundary always exists. This fact follows from the spectral equivalence of $S_\Gamma$ and $L_\Gamma^{1/2}$, which is shown in Subsection 4.2.2.

First, we present a number of basic properties of the Schur complement of a graph

Laplacian. For more information on the Schur complement, we refer the reader to [18, 40,

134].


Proposition 8. Let $G = (V, E)$, $n = |V|$, be a graph and $L_G \in \mathbb{R}^{n \times n}$ the associated graph Laplacian. Let $L_G$ and vectors $v \in \mathbb{R}^n$ be written in block form
\[
L_G = \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix}, \qquad v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix},
\]
where $L_{22} \in \mathbb{R}^{m \times m}$, $v_2 \in \mathbb{R}^m$, and $L_{12} \neq 0$. Then

(1) $S = L_{22} - L_{21} L_{11}^{-1} L_{12}$ is a graph Laplacian,

(2) $\sum_{i=1}^m (e_i^T L_{22} \mathbf{1}_m) \, e_i e_i^T - L_{21} L_{11}^{-1} L_{12}$ is a graph Laplacian,

(3) $\langle S w, w \rangle = \inf \{ \langle L v, v \rangle \,|\, v_2 = w \}$.

Proof. Let $P = \begin{pmatrix} -L_{11}^{-1} L_{12} \\ I \end{pmatrix} \in \mathbb{R}^{n \times m}$. Then
\[
P^T L P = \begin{pmatrix} -L_{21} L_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} -L_{11}^{-1} L_{12} \\ I \end{pmatrix} = L_{22} - L_{21} L_{11}^{-1} L_{12} = S.
\]
Because $L_{11} \mathbf{1}_{n-m} + L_{12} \mathbf{1}_m = 0$, we have $\mathbf{1}_{n-m} = -L_{11}^{-1} L_{12} \mathbf{1}_m$. Therefore $P \mathbf{1}_m = \mathbf{1}_n$, and, as a result,
\[
S \mathbf{1}_m = P^T L P \mathbf{1}_m = P^T L \mathbf{1}_n = P^T 0 = 0.
\]
In addition,
\[
\left[ \sum_{i=1}^m (e_i^T L_{22} \mathbf{1}_m) \, e_i e_i^T - L_{21} L_{11}^{-1} L_{12} \right] \mathbf{1}_m = \left[ \sum_{i=1}^m (e_i^T L_{22} \mathbf{1}_m) \, e_i e_i^T - L_{22} \right] \mathbf{1}_m + S \mathbf{1}_m
\]
\[
= \sum_{i=1}^m (e_i^T L_{22} \mathbf{1}_m) \, e_i - L_{22} \mathbf{1}_m = \left[ \sum_{i=1}^m e_i e_i^T - I_m \right] L_{22} \mathbf{1}_m = 0.
\]
$L_{11}$ is an M-matrix, so $L_{11}^{-1}$ is a non-negative matrix. The off-diagonal blocks $-L_{21}$ and $-L_{12}$ are also non-negative, so $L_{21} L_{11}^{-1} L_{12} = (-L_{21}) L_{11}^{-1} (-L_{12})$ is the product of three non-negative matrices, and must also be non-negative. Therefore, the off-diagonal entries of $S$ and $\sum_{i=1}^m (e_i^T L_{22} \mathbf{1}_m) \, e_i e_i^T - L_{21} L_{11}^{-1} L_{12}$ are non-positive, and so both are graph Laplacians.

Consider
\[
\langle L v, v \rangle = \langle L_{11} v_1, v_1 \rangle + 2 \langle L_{12} v_2, v_1 \rangle + \langle L_{22} v_2, v_2 \rangle,
\]
with $v_2$ fixed. Because $L_{11}$ is symmetric positive definite, the minimum occurs when
\[
\frac{\partial}{\partial v_1} \langle L v, v \rangle = 2 L_{11} v_1 + 2 L_{12} v_2 = 0.
\]
Setting $v_1 = -L_{11}^{-1} L_{12} v_2$, the desired result follows.

The above result illustrates that the Schur complement Laplacian $S_\Gamma$ is the sum of two Laplacians $L_\Gamma$ and $D_\Gamma - A_{o,\Gamma}^T (L_o + D_o)^{-1} A_{o,\Gamma}$, where the first is the Laplacian of $G[\Gamma]$, and the second is a Laplacian representing the dynamics of the interior.
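Proposition 8 can also be verified numerically on a small instance. The 6-vertex graph and the dense Gaussian-elimination solver below are hypothetical illustrations: they check that the Schur complement is a graph Laplacian (property (1)) and that property (3) holds at the harmonic extension $v_1 = -L_{11}^{-1} L_{12} v_2$.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# hypothetical 6-vertex test graph: interior {0, 1}, boundary {2, 3, 4, 5}
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 3), (3, 4), (4, 5), (5, 2)]
n = 6
L = [[0.0] * n for _ in range(n)]
for p, q in edges:
    L[p][p] += 1; L[q][q] += 1
    L[p][q] -= 1; L[q][p] -= 1

interior, bnd = [0, 1], [2, 3, 4, 5]
Loo = [[L[p][q] for q in interior] for p in interior]
# S = L_22 - L_21 L_11^{-1} L_12, built column by column
S = []
for p in bnd:
    row = []
    for q in bnd:
        y = solve(Loo, [L[r][q] for r in interior])
        row.append(L[p][q] - sum(L[p][interior[i]] * y[i] for i in range(2)))
    S.append(row)

# property (3): <Sw, w> equals <Lv, v> at the harmonic extension of w
w = [1.0, 2.0, 0.0, -1.0]
v1 = solve(Loo, [-sum(L[p][bnd[j]] * w[j] for j in range(4)) for p in interior])
v = v1 + w
qL = sum(L[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
qS = sum(S[i][j] * w[i] * w[j] for i in range(4) for j in range(4))
```

The assertions one would check here are exactly the Laplacian properties: symmetry, zero row sums, non-positive off-diagonal entries, and the equality of the two quadratic forms.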

An Algorithm

Given the above analysis for $S_\Gamma$, and assuming a spectral equivalence result of the form
\[
\frac{1}{c_1} \langle L_\Gamma^{1/2} x, x \rangle \le \langle S_\Gamma x, x \rangle \le c_2 \, \langle L_\Gamma^{1/2} x, x \rangle \tag{4.3}
\]
for all $x \in \mathbb{R}^{n_\Gamma}$ and some constants $c_1$ and $c_2$ that are not too large, we can construct a simple algorithm for producing a spring embedding. The constants $c_1$ and $c_2$ can be computed explicitly using the results of Subsection 4.2.2, and are reasonably sized if the graph resembles a discretization of some planar domain (i.e., a graph that satisfies the conditions of Theorem 15). Given these two facts, a very natural algorithm consists of the following steps. First, we compute the two eigenvectors corresponding to the two minimal non-trivial eigenvalues of the Schur complement $S_\Gamma$. If these two eigenvectors produce a planar embedding, we are done, and have solved the optimization problem (4.2) exactly. Depending on the structure of the graph, this may often be the case. For instance, this occurs for the graphs in Figures 4-1 and 4-2, and the numerical results of [124] confirm that this also occurs often for discretizations of circular and rectangular domains. If this is not the case, then we consider


Algorithm 3 A Schur Complement-Based Spring Embedding Algorithm

function SchurSPRING($G, \Gamma$)
    Compute the two minimal non-trivial eigenpairs $X_{alg} \in \mathbb{R}^{n_\Gamma \times 2}$ of $S_\Gamma$
    if spring embedding of $X_{alg}$ is non-planar then
        $X_{new} \leftarrow \sqrt{2/n_\Gamma}\,\big(\cos \tfrac{2\pi j}{n_\Gamma}, \sin \tfrac{2\pi j}{n_\Gamma}\big)_{j=1}^{n_\Gamma}$; gap $\leftarrow 1$
        while spring embedding of $X_{new}$ is planar, gap $> 0$ do
            $X_{new} \leftarrow$ smooth($X_{new}$); $X_{alg} \leftarrow X_{new}$
            $X_{new} \leftarrow (I - \mathbf{1}\mathbf{1}^T/n_\Gamma)\, X_{new}$
            Solve $[X_{new}^T X_{new}]\, Q = Q \Lambda$, $Q$ orthogonal, $\Lambda$ diagonal; set $X_{new} \leftarrow X_{new} Q \Lambda^{-1/2}$
            gap $\leftarrow h_\Gamma(X_{alg}) - h_\Gamma(X_{new})$
        end while
    end if
    Return $X_{alg}$
end function

the boundary embedding
\[
[X_C]_{j,\cdot} = \sqrt{\frac{2}{n_\Gamma}} \left( \cos \frac{2\pi j}{n_\Gamma}, \; \sin \frac{2\pi j}{n_\Gamma} \right), \qquad j = 1, \ldots, n_\Gamma,
\]
namely, the embedding of the two minimal non-trivial eigenvectors of $L_\Gamma^{1/2}$. This choice of boundary is convex and planar, and so the associated spring embedding is planar. By spectral equivalence,
\[
h_\Gamma(X_C) \le 4 c_2 \sin \frac{\pi}{n_\Gamma} \le c_1 c_2 \min_{X_\Gamma \in \mathrm{cl}(\mathcal{X})} h_\Gamma(X_\Gamma),
\]
and therefore, this algorithm already produces a $c_1 c_2$ approximation guarantee for (4.2).

The choice of $X_C$ can often be improved, and a natural technique consists of smoothing $X_C$ using $S_\Gamma$ and renormalizing until either the resulting drawing is no longer planar or the objective value no longer decreases. We describe this procedure in Algorithm 3.
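The renormalization inside the smoothing loop of Algorithm 3 can be sketched as follows; `renormalize` is a hypothetical helper (not verbatim from the algorithm) that re-centers the columns so $X^T \mathbf{1} = 0$ and then restores $X^T X = I$ via a closed-form eigendecomposition of the $2 \times 2$ Gram matrix.

```python
# Sketch of the normalization step: center the columns, then compute the
# eigendecomposition X^T X = Q Lam Q^T of the 2x2 Gram matrix and set
# X <- X Q Lam^{-1/2}, so that X^T X = I and X^T 1 = 0 afterwards.
import math

def renormalize(X):
    n = len(X)
    mx = sum(x for x, _ in X) / n
    my = sum(y for _, y in X) / n
    X = [(x - mx, y - my) for x, y in X]     # project out the mean
    a = sum(x * x for x, _ in X)
    b = sum(x * y for x, y in X)
    c = sum(y * y for _, y in X)
    # closed-form eigenpairs of the symmetric 2x2 matrix [[a, b], [b, c]]
    t = math.hypot(a - c, 2 * b)
    l1, l2 = (a + c + t) / 2, (a + c - t) / 2
    if abs(b) > 1e-12:
        q1, q2 = (l1 - c, b), (l2 - c, b)
    elif a >= c:
        q1, q2 = (1.0, 0.0), (0.0, 1.0)
    else:
        q1, q2 = (0.0, 1.0), (1.0, 0.0)
    q1 = (q1[0] / math.hypot(*q1), q1[1] / math.hypot(*q1))
    q2 = (q2[0] / math.hypot(*q2), q2[1] / math.hypot(*q2))
    return [((x * q1[0] + y * q1[1]) / math.sqrt(l1),
             (x * q2[0] + y * q2[1]) / math.sqrt(l2)) for x, y in X]

Y = renormalize([(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5), (0.0, 4.0), (2.0, 2.0)])
```

Centering before the Gram step matters: the linear map $Q \Lambda^{-1/2}$ preserves the zero column sums, so both normalizations hold simultaneously on exit.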

We now discuss some of the finer details of the SchurSPRING($G, \Gamma$) algorithm (Algorithm 3). Determining whether a drawing is planar can be done in near-linear time using the sweep line algorithm [102]. Also, in practice, it is advisable to replace conditions of the form gap $> 0$ in Algorithm 3 by gap $>$ tol for some small value of tol, in order to ensure that the algorithm terminates after some finite number of steps. For a reasonable choice of


smoother, a constant tolerance, and $n_\Gamma = \Theta(n^{1/2})$, the SchurSPRING($G, \Gamma$) algorithm has complexity near-linear in $n$. The main cost of this procedure is due to the computations that involve $S_\Gamma$.

The SchurSPRING($G, \Gamma$) algorithm requires the repeated application of $S_\Gamma$ or $S_\Gamma^{-1}$ in order to compute the minimal eigenvectors of $S_\Gamma$ and also to perform smoothing. The Schur complement $S_\Gamma$ is a dense matrix and requires the inversion of an $(n - n_\Gamma) \times (n - n_\Gamma)$ matrix, but can be represented as the composition of functions of sparse matrices. In practice, $S_\Gamma$ should never be formed explicitly. Rather, the operation of applying $S_\Gamma$ to a vector $x$ should occur in two steps. First, the sparse Laplacian system $(L_o + D_o) y = A_{o,\Gamma} x$ should be solved for $y$, and then the product $S_\Gamma x$ is given by $S_\Gamma x = (L_\Gamma + D_\Gamma) x - A_{o,\Gamma}^T y$. Each application of $S_\Gamma$ is therefore an $O(n)$ procedure (using a nearly-linear time Laplacian solver). The application of the inverse $S_\Gamma^{-1}$, defined on the subspace $\{ x \,|\, \langle x, \mathbf{1} \rangle = 0 \}$, also requires the solution of a Laplacian system. As noted in [130], the action of $S_\Gamma^{-1}$ on a vector $x \in \{ x \,|\, \langle x, \mathbf{1} \rangle = 0 \}$ is given by
\[
S_\Gamma^{-1} x = \begin{pmatrix} 0 & I \end{pmatrix} \begin{pmatrix} L_o + D_o & -A_{o,\Gamma} \\ -A_{o,\Gamma}^T & L_\Gamma + D_\Gamma \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ x \end{pmatrix},
\]

as verified by the computation
\[
S_\Gamma \left[ S_\Gamma^{-1} x \right] = S_\Gamma \begin{pmatrix} 0 & I \end{pmatrix} \left[ \begin{pmatrix} I & 0 \\ -A_{o,\Gamma}^T (L_o + D_o)^{-1} & I \end{pmatrix} \begin{pmatrix} L_o + D_o & -A_{o,\Gamma} \\ 0 & S_\Gamma \end{pmatrix} \right]^{-1} \begin{pmatrix} 0 \\ x \end{pmatrix}
\]
\[
= S_\Gamma \begin{pmatrix} 0 & I \end{pmatrix} \begin{pmatrix} L_o + D_o & -A_{o,\Gamma} \\ 0 & S_\Gamma \end{pmatrix}^{-1} \begin{pmatrix} I & 0 \\ A_{o,\Gamma}^T (L_o + D_o)^{-1} & I \end{pmatrix} \begin{pmatrix} 0 \\ x \end{pmatrix}
= S_\Gamma \begin{pmatrix} 0 & I \end{pmatrix} \begin{pmatrix} L_o + D_o & -A_{o,\Gamma} \\ 0 & S_\Gamma \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ x \end{pmatrix} = x.
\]
Given that the application of $S_\Gamma^{-1}$ has the same complexity as an application of $S_\Gamma$, the inverse power method is naturally preferred over the shifted power method for both smoothing and the computation of low-energy eigenvectors.
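The two-step application of $S_\Gamma$ can be demonstrated on the toy wheel graph used earlier (a hypothetical example): solve $(L_o + D_o) y = A_{o,\Gamma} x$, then form $(L_\Gamma + D_\Gamma) x - A_{o,\Gamma}^T y$, and compare against the densely formed Schur complement.

```python
# Toy wheel graph: interior vertex 0 joined to the boundary 4-cycle 1..4.
# apply_S never forms S_Gamma: it solves the (here 1x1) interior system
# (L_o + D_o) y = A_{o,Gamma} x and returns (L_Gamma + D_Gamma) x - A^T y.
LGDG = [[3, -1, 0, -1],
        [-1, 3, -1, 0],
        [0, -1, 3, -1],
        [-1, 0, -1, 3]]          # L_Gamma + D_Gamma for the 4-cycle boundary
A = [1.0, 1.0, 1.0, 1.0]         # A_{o,Gamma}: the single interior row
LoDo = 4.0                       # L_o + D_o: the 1x1 interior block

def apply_S(x):
    y = sum(A[j] * x[j] for j in range(4)) / LoDo   # solve (L_o + D_o) y = A x
    return [sum(LGDG[i][j] * x[j] for j in range(4)) - A[i] * y
            for i in range(4)]

# dense reference: S = (L_Gamma + D_Gamma) - A^T (L_o + D_o)^{-1} A
dense_S = [[LGDG[i][j] - A[i] * A[j] / LoDo for j in range(4)] for i in range(4)]
x = [1.0, -2.0, 3.0, 0.0]
sx = apply_S(x)
```

On a real instance the interior solve would be delegated to a sparse (nearly-linear time) Laplacian solver, but the two-step structure is identical.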


4.2.2 A Discrete Trace Theorem and Spectral Equivalence

The main result of this subsection takes classical trace theorems from the theory of partial differential equations and extends them to a class of planar graphs. However, for our purposes, we require a stronger form of trace theorem, one between energy semi-norms (i.e., with no $\ell^2$ term), which we refer to as "energy-only" trace theorems. These energy-only trace theorems imply their classical variants with $\ell^2$ terms almost immediately. We then use these new results to prove the spectral equivalence of $S_\Gamma$ and $L_\Gamma^{1/2}$ for the class of graphs under consideration. This class of graphs is rigorously defined below, but includes planar three-connected graphs that have some regular structure (such as graphs of finite element discretizations). For the sake of space and readability, a number of the results of this subsection are stated for arbitrary constants, or constants larger than necessary. More complicated proofs with explicit constants (or, in some cases, improved constants) can be found in [124]. We begin by formally describing a classical trace theorem.

Let $\Omega \subset \mathbb{R}^d$ be a domain with boundary $\Gamma = \partial \Omega$ that, locally, is the graph of a Lipschitz function. $H^1(\Omega)$ is the Sobolev space of square integrable functions with square integrable weak gradient, with norm
\[
\|u\|_{1,\Omega}^2 = \|\nabla u\|_{L^2(\Omega)}^2 + \|u\|_{L^2(\Omega)}^2, \quad \text{where} \quad \|u\|_{L^2(\Omega)}^2 = \int_\Omega u^2 \, dx.
\]
Let
\[
\|\phi\|_{1/2,\Gamma}^2 = \|\phi\|_{L^2(\Gamma)}^2 + \int\!\!\int_{\Gamma \times \Gamma} \frac{(\phi(x) - \phi(y))^2}{|x - y|^d} \, dx \, dy
\]

for functions defined on $\Gamma$, and denote by $H^{1/2}(\Gamma)$ the Sobolev space of functions defined on the boundary $\Gamma$ for which $\| \cdot \|_{1/2,\Gamma}$ is finite. The trace theorem for functions in $H^1(\Omega)$ is one of the most important and frequently used trace theorems in the theory of partial differential equations. More general results for traces on boundaries of Lipschitz domains, which involve $L^p$ norms and fractional derivatives, are due to E. Gagliardo [42] (see also [24]). Gagliardo's theorem, when applied to the case of $H^1(\Omega)$ and $H^{1/2}(\Gamma)$, states that if $\Omega \subset \mathbb{R}^d$ is a Lipschitz domain, then the norm equivalence
\[
\|\phi\|_{1/2,\Gamma} \asymp \inf \{ \|u\|_{1,\Omega} \,|\, u|_\Gamma = \phi \}
\]

holds (the right-hand side is indeed a norm on $H^{1/2}(\Gamma)$). These results are key tools in proving a priori estimates on the Dirichlet integral of functions with given data on the boundary of a domain $\Omega$. Roughly speaking, a trace theorem gives a bound on the energy of a harmonic function via the norm of the trace of the function on $\Gamma = \partial \Omega$. In addition to the classical references given above, further details on trace theorems and their role in the analysis of PDEs (including the case of Lipschitz domains) can be found in [77, 87]. There are several analogues of this theorem for finite element spaces (finite dimensional subspaces of $H^1(\Omega)$). For instance, in [86] it is shown that the finite element discretization of the Laplace-Beltrami operator on the boundary to the one-half power provides a norm which is equivalent to the $H^{1/2}(\Gamma)$-norm. Here we prove energy-only analogues of the classical trace theorem for graphs $(G, \Gamma) \in \mathcal{G}_n$, using the energy semi-norms
\[
|u|_G^2 = \langle L_G u, u \rangle \quad \text{and} \quad |\phi|_\Gamma^2 = \sum_{p,q \in \Gamma, \; p < q} \frac{(\phi(p) - \phi(q))^2}{d_G^2(p, q)}.
\]

The energy semi-norm $| \cdot |_G$ is a discrete analogue of $\|\nabla u\|_{L^2(\Omega)}$, and the boundary semi-norm $| \cdot |_\Gamma$ is a discrete analogue of the quantity $\int\!\!\int_{\Gamma \times \Gamma} \frac{(\phi(x) - \phi(y))^2}{|x - y|^2} \, dx \, dy$. In addition, by connectivity, $| \cdot |_G$ and $| \cdot |_\Gamma$ are norms on the quotient space orthogonal to $\mathbf{1}$. We aim to prove that for any $\phi \in \mathbb{R}^{n_\Gamma}$,
\[
\frac{1}{c_1} |\phi|_\Gamma \le \min_{u|_\Gamma = \phi} |u|_G \le c_2 \, |\phi|_\Gamma
\]
for some constants $c_1, c_2$ that do not depend on $n_\Gamma$, $n$. We begin by proving these results for a simple class of graphs, and then extend our analysis to more general graphs. Some of the proofs of the results below are rather technical, and are omitted for the sake of readability and space.
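The claimed equivalence can be probed numerically before any proof. The sketch below assumes a small instance of the cycle-times-path graph $G_{k,\ell}$ (defined immediately below) with a sample cosine boundary function; the minimum-energy extension is approximated by Jacobi relaxation, and the two semi-norms are compared against the explicit constants appearing later in Theorem 12.

```python
# Empirical check of the trace inequality on G_{k,l} = C_k x P_l with l = 2:
# the interior row is relaxed to the (minimum-energy) harmonic extension of
# the boundary data, then |u|_G and |phi|_Gamma are compared.
import math

k, l, b = 12, 2, 4                    # parameters satisfying 4l < k < 2bl
phi = [math.cos(2 * math.pi * i / k) for i in range(k)]

# Jacobi relaxation toward the harmonic extension on the second row
u2 = [0.0] * k
for _ in range(2000):
    u2 = [(u2[(i - 1) % k] + u2[(i + 1) % k] + phi[i]) / 3.0 for i in range(k)]

# |u|_G^2: squared differences over all edges (two cycle rows + rungs)
eG = sum((phi[i] - phi[(i + 1) % k]) ** 2 + (u2[i] - u2[(i + 1) % k]) ** 2
         + (phi[i] - u2[i]) ** 2 for i in range(k))

# |phi|_Gamma^2: boundary graph distances here are just cycle distances
ePhi = sum((phi[p] - phi[q]) ** 2 / min(q - p, k - (q - p)) ** 2
           for p in range(k) for q in range(p + 1, k))

lo = math.sqrt(ePhi) / (4 * math.sqrt(b))
hi = 5 * math.sqrt(b) * math.sqrt(ePhi)
```

For this instance the harmonic energy lands comfortably inside the interval $[\,|\phi|_\Gamma / (4\sqrt{b}), \; 5\sqrt{b}\,|\phi|_\Gamma\,]$, as the theory predicts.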


Trace Theorems for a Simple Class of Graphs

Let $G_{k,\ell} = C_k \times P_\ell$ be the Cartesian product of the $k$-vertex cycle $C_k$ and the $\ell$-vertex path $P_\ell$, where $4\ell < k < 2b\ell$ for some constant $b \in \mathbb{N}$. The lower bound $4\ell < k$ is arbitrary in some sense, but is natural, given that the ratio of boundary length to in-radius of a convex region is at least $2\pi$. Vertex $(i, j)$ in $G_{k,\ell}$ corresponds to the product of $i \in C_k$ and $j \in P_\ell$, $i = 1, \ldots, k$, $j = 1, \ldots, \ell$. The boundary of $G_{k,\ell}$ is defined to be $\Gamma = \{(i, 1)\}_{i=1}^k$. Let $u \in \mathbb{R}^{k \times \ell}$ and $\phi \in \mathbb{R}^k$ be functions on $G_{k,\ell}$ and $\Gamma$, respectively, with $u[(i,j)]$ denoted by $u_{i,j}$ and $\phi[(i,1)]$ denoted by $\phi_i$. For the remainder of the section, we consider the natural periodic extension of the vertices $(i,j)$ and the functions $u_{i,j}$ and $\phi_i$ to the indices $i \in \mathbb{Z}$. In particular, if $i \notin \{1, \ldots, k\}$, then $(i,j) := (i^*, j)$, $\phi_i := \phi_{i^*}$, and $u_{i,j} := u_{i^*,j}$, where $i^* \in \{1, \ldots, k\}$ and $i^* \equiv i \bmod k$. Let $G^*_{k,\ell}$ be the graph resulting from adding to $G_{k,\ell}$ all edges of the form $\{(i,j), (i-1, j+1)\}$ and $\{(i,j), (i+1, j+1)\}$, $i = 1, \ldots, k$, $j = 1, \ldots, \ell - 1$. We provide a visual example of $G_{k,\ell}$ and $G^*_{k,\ell}$ in Figure 4-3. First, we prove a trace theorem for $G_{k,\ell}$. The proof of the trace theorem consists of two lemmas. Lemma 10 shows that the discrete trace operator is bounded, and Lemma 11 shows that it has a continuous right inverse. Taken together, these lemmas imply our desired result for $G_{k,\ell}$. We then extend this result to all graphs $H$ satisfying $G_{k,\ell} \subset H \subset G^*_{k,\ell}$.

Lemma 10. Let $G = G_{k,\ell}$, $4\ell < k < 2b\ell$, $b \in \mathbb{N}$, with boundary $\Gamma = \{(i,1)\}_{i=1}^k$. For any $u \in \mathbb{R}^{k \times \ell}$, the vector $\phi = u|_\Gamma$ satisfies $|\phi|_\Gamma \le 4\sqrt{b} \, |u|_G$.

Proof. We can decompose $\phi_{p+h} - \phi_p$ into a sum of differences, given by
\[
\phi_{p+h} - \phi_p = \sum_{i=1}^{s-1} (u_{p+h,i} - u_{p+h,i+1}) + \sum_{i=1}^{h} (u_{p+i,s} - u_{p+i-1,s}) + \sum_{i=1}^{s-1} (u_{p,s-i+1} - u_{p,s-i}),
\]


where $s = \lceil h/b \rceil$ is a function of $h$. By Cauchy-Schwarz,
\[
\sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \left( \frac{\phi_{p+h} - \phi_p}{h} \right)^2 \le 3 \sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \left( \frac{1}{h} \sum_{i=1}^{s-1} u_{p+h,i} - u_{p+h,i+1} \right)^2 + 3 \sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \left( \frac{1}{h} \sum_{i=1}^{h} u_{p+i,s} - u_{p+i-1,s} \right)^2 + 3 \sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \left( \frac{1}{h} \sum_{i=1}^{s-1} u_{p,s-i+1} - u_{p,s-i} \right)^2.
\]

We bound the first and the second term separately. The third term is identical to the first. Using Hardy's inequality [50, Theorem 326], we can bound the first term by
\[
\sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \left( \frac{1}{h} \sum_{i=1}^{s-1} u_{p,i} - u_{p,i+1} \right)^2 = \sum_{p=1}^{k} \sum_{s=1}^{\ell} \left( \frac{1}{s} \sum_{i=1}^{s-1} u_{p,i} - u_{p,i+1} \right)^2 \sum_{\substack{h : \lceil h/b \rceil = s \\ 1 \le h \le \lfloor k/2 \rfloor}} \frac{s^2}{h^2} \le 4 \sum_{p=1}^{k} \sum_{s=1}^{\ell-1} \left( u_{p,s} - u_{p,s+1} \right)^2 \sum_{\substack{h : \lceil h/b \rceil = s \\ 1 \le h \le \lfloor k/2 \rfloor}} \frac{s^2}{h^2}.
\]

We have
\[
\sum_{\substack{h : \lceil h/b \rceil = s \\ 1 \le h \le \lfloor k/2 \rfloor}} \frac{s^2}{h^2} \le s^2 \sum_{i=b(s-1)+1}^{bs} \frac{1}{i^2} \le \frac{s^2 (b-1)}{(b(s-1)+1)^2} \le \frac{4(b-1)}{(b+1)^2} \le \frac{1}{2}
\]
for $s \ge 2$ ($b \ge 3$, by definition), and for $s = 1$,
\[
\sum_{\substack{h : \lceil h/b \rceil = 1 \\ 1 \le h \le \lfloor k/2 \rfloor}} \frac{1}{h^2} \le \sum_{i=1}^{\infty} \frac{1}{i^2} = \frac{\pi^2}{6}.
\]

112

(a) ๐บ16,3 = ๐ถ16๐‘ƒ3 (b) ๐บ*16,3

Figure 4-3: A visual example of ๐บ๐‘˜,โ„“ and ๐บ*๐‘˜,โ„“ for ๐‘˜ = 16, โ„“ = 3. The boundary ฮ“ is given

by the outer (or, by symmetry, inner) cycle.

Therefore, we can bound the first term by
\[
\sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \left( \frac{1}{h} \sum_{i=1}^{s-1} u_{p,i} - u_{p,i+1} \right)^2 \le \frac{2\pi^2}{3} \sum_{p=1}^{k} \sum_{s=1}^{\ell-1} \left( u_{p,s} - u_{p,s+1} \right)^2.
\]

For the second term, we have
\[
\sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \left( \frac{1}{h} \sum_{i=1}^{h} u_{p+i,s} - u_{p+i-1,s} \right)^2 \le \sum_{p=1}^{k} \sum_{h=1}^{\lfloor k/2 \rfloor} \frac{1}{h} \sum_{i=1}^{h} \left( u_{p+i,s} - u_{p+i-1,s} \right)^2 \le b \sum_{p=1}^{k} \sum_{s=1}^{\ell} \left( u_{p+1,s} - u_{p,s} \right)^2.
\]
Combining these bounds produces the result $|\phi|_\Gamma^2 \le \max\{4\pi^2, 3b\} \, |u|_G^2$. Noting that $b \ge 3$ gives the desired result.

In order to show that the discrete trace operator has a continuous right inverse, we need to produce a provably low-energy extension of an arbitrary function on $\Gamma$. Let
\[
a = \frac{1}{k} \sum_{p=1}^{k} \phi_p \quad \text{and} \quad a_{i,j} = \frac{1}{2j-1} \sum_{h=1-j}^{j-1} \phi_{i+h}.
\]


We consider the extension
\[
u_{i,j} = \frac{j-1}{\ell-1} \, a + \left( 1 - \frac{j-1}{\ell-1} \right) a_{i,j}. \tag{4.4}
\]
The proof of the following inverse result for the discrete trace operator is similar in technique to that of Lemma 10, but significantly more involved.

Lemma 11. Let $G = G_{k,\ell}$, $4\ell < k < 2b\ell$, $b \in \mathbb{N}$, with boundary $\Gamma = \{(i,1)\}_{i=1}^k$. For any $\phi \in \mathbb{R}^k$, the vector $u$ defined by (4.4) satisfies $|u|_G \le 5\sqrt{b} \, |\phi|_\Gamma$.

Proof. We can decompose $|u|_G^2$ into two parts, namely,
\[
|u|_G^2 = \sum_{i=1}^{k} \sum_{j=1}^{\ell} (u_{i+1,j} - u_{i,j})^2 + \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} (u_{i,j+1} - u_{i,j})^2.
\]
We bound each sum separately, beginning with the first. We have
\[
u_{i+1,j} - u_{i,j} = \left( 1 - \frac{j-1}{\ell-1} \right) (a_{i+1,j} - a_{i,j}) = \left( 1 - \frac{j-1}{\ell-1} \right) \frac{\phi_{i+j} - \phi_{i+1-j}}{2j-1}.
\]
Squaring both sides and noting that $4\ell < k$, we have
\[
\sum_{i=1}^{k} \sum_{j=1}^{\ell} (u_{i+1,j} - u_{i,j})^2 \le \sum_{i=1}^{k} \sum_{j=1}^{\ell} \left[ \frac{\phi_{i+j} - \phi_{i+1-j}}{2j-1} \right]^2 \le \sum_{p=1}^{k} \sum_{h=1}^{2\ell-1} \left[ \frac{\phi_{p+h} - \phi_p}{h} \right]^2 \le |\phi|_\Gamma^2.
\]

We now consider the second sum. Each term can be decomposed as
\[
u_{i,j+1} - u_{i,j} = \frac{a - a_{i,j}}{\ell-1} + \left( 1 - \frac{j}{\ell-1} \right) \left[ a_{i,j+1} - a_{i,j} \right],
\]
which leads to the upper bound
\[
\sum_{i=1}^{k} \sum_{j=1}^{\ell-1} (u_{i,j+1} - u_{i,j})^2 \le 2 \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \left[ \frac{a - a_{i,j}}{\ell-1} \right]^2 + 2 \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} (a_{i,j+1} - a_{i,j})^2.
\]

We estimate the two terms in the previous equation separately, beginning with the first. The difference $a - a_{i,j}$ can be written as
\[
a - a_{i,j} = \frac{1}{k} \sum_{p=1}^{k} \phi_p - \frac{1}{2j-1} \sum_{h=1-j}^{j-1} \phi_{i+h} = \frac{1}{k(2j-1)} \sum_{p=1}^{k} \sum_{h=1-j}^{j-1} \phi_p - \phi_{i+h}.
\]
Squaring both sides,
\[
(a - a_{i,j})^2 = \frac{1}{k^2 (2j-1)^2} \left( \sum_{p=1}^{k} \sum_{h=1-j}^{j-1} \phi_p - \phi_{i+h} \right)^2 \le \frac{1}{k(2j-1)} \sum_{p=1}^{k} \sum_{h=1-j}^{j-1} (\phi_p - \phi_{i+h})^2.
\]
Summing over all $i$ and $j$ gives
\[
\sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \left[ \frac{a - a_{i,j}}{\ell-1} \right]^2 \le \frac{1}{(\ell-1)^2} \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \frac{1}{k(2j-1)} \sum_{p=1}^{k} \sum_{h=1-j}^{j-1} (\phi_p - \phi_{i+h})^2
\]
\[
= \frac{k}{4(\ell-1)^2} \sum_{j=1}^{\ell-1} \frac{1}{2j-1} \sum_{h=1-j}^{j-1} \sum_{i,p=1}^{k} \frac{(\phi_p - \phi_{i+h})^2}{k^2/4} \le \frac{k}{4(\ell-1)} |\phi|_\Gamma^2 \le b \, |\phi|_\Gamma^2.
\]

This completes the analysis of the first term. For the second term, we have
\[
a_{i,j+1} - a_{i,j} = \frac{1}{2j+1} \left[ \phi_{i+j} + \phi_{i-j} - \frac{2}{2j-1} \sum_{h=1-j}^{j-1} \phi_{i+h} \right].
\]

Next, we note that
\[
\phi_{i+j} - \frac{\phi_i}{2j-1} - \frac{2}{2j-1} \sum_{h=1}^{j-1} \phi_{i+h} = \frac{\phi_{i+j} - \phi_i}{2j-1} + 2 \sum_{h=1}^{j-1} \frac{\phi_{i+j} - \phi_{i+h}}{2j-1} \le 2 \sum_{h=0}^{j-1} \frac{|\phi_{i+j} - \phi_{i+h}|}{2j-1},
\]
and, similarly,
\[
\phi_{i-j} - \frac{\phi_i}{2j-1} - \frac{2}{2j-1} \sum_{h=1}^{j-1} \phi_{i-h} = \frac{\phi_{i-j} - \phi_i}{2j-1} + 2 \sum_{h=1}^{j-1} \frac{\phi_{i-j} - \phi_{i-h}}{2j-1} \le 2 \sum_{h=0}^{j-1} \frac{|\phi_{i-j} - \phi_{i-h}|}{2j-1},
\]

Hence,
\[
\sum_{j=1}^{\ell-1} (a_{i,j+1} - a_{i,j})^2 \le \sum_{j=1}^{\ell-1} \frac{8}{(2j+1)^2} \left[ \left( \sum_{h=0}^{j-1} \frac{|\phi_{i+j} - \phi_{i+h}|}{2j-1} \right)^2 + \left( \sum_{h=0}^{j-1} \frac{|\phi_{i-j} - \phi_{i-h}|}{2j-1} \right)^2 \right].
\]
Once we sum over all $i$, the sums of the first and second term are identical, and therefore
\[
\sum_{i=1}^{k} \sum_{j=1}^{\ell-1} (a_{i,j+1} - a_{i,j})^2 \le 16 \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \left( \sum_{h=0}^{j-1} \frac{|\phi_{i+j} - \phi_{i+h}|}{(2j-1)(2j+1)} \right)^2.
\]

We have
\[
\sum_{h=0}^{j-1} \frac{|\phi_{i+j} - \phi_{i+h}|}{(2j-1)(2j+1)} \le \frac{1}{3j} \sum_{p=i}^{i+j-1} \frac{|\phi_{i+j} - \phi_p|}{j} \le \frac{1}{3j} \sum_{p=i}^{i+j-1} \frac{|\phi_{i+j} - \phi_p|}{i+j-p},
\]

which implies that
\[
16 \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \left( \sum_{h=0}^{j-1} \frac{|\phi_{i+j} - \phi_{i+h}|}{(2j-1)(2j+1)} \right)^2 \le \frac{16}{9} \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \left( \frac{1}{j} \sum_{p=i}^{i+j-1} \frac{|\phi_{i+j} - \phi_p|}{i+j-p} \right)^2 \le \frac{16}{9} \sum_{q=1}^{k+\ell-1} \sum_{m=1}^{q-1} \left( \frac{1}{q-m} \sum_{c=m}^{q-1} \frac{|\phi_q - \phi_c|}{q-c} \right)^2,
\]
where $q = i + j$ and $m = i$. Letting $r = q - m$, $s = q - c$, and using Hardy's inequality [50, Theorem 326], we obtain

\[
\frac{16}{9} \sum_{q=1}^{k+\ell-1} \sum_{m=1}^{q-1} \left( \frac{1}{q-m} \sum_{c=m}^{q-1} \frac{|\phi_q - \phi_c|}{q-c} \right)^2 = \frac{16}{9} \sum_{q=1}^{k+\ell-1} \sum_{r=1}^{q-1} \left( \frac{1}{r} \sum_{s=1}^{r} \frac{|\phi_q - \phi_{q-s}|}{s} \right)^2 \le \frac{64}{9} \sum_{q=1}^{k+\ell-1} \sum_{r=1}^{q-1} \left[ \frac{\phi_q - \phi_{q-r}}{r} \right]^2
\]
\[
\le \frac{64}{9} \sum_{q=1}^{k+\ell-1} \sum_{r=1}^{q-1} \left[ \frac{\phi_q - \phi_{q-r}}{d_G\big((q,1), (q-r,1)\big)} \right]^2 \le \frac{256}{9} |\phi|_\Gamma^2,
\]

where we note that the sum over the indices $q$ and $r$ involves some amount of over-counting, with some terms $(\phi_q - \phi_{q-r})^2$ appearing up to four times. Improved estimates can be obtained by noting that the choice of indexing for $\Gamma$ is arbitrary (see [124] for details), but, for the sake of readability, we focus on simplicity over tight constants. Combining our estimates and noting that $b \ge 3$, we obtain the desired result
\[
|u|_G \le \sqrt{1 + 2\left(b + \frac{256}{9}\right)} \, |\phi|_\Gamma = \sqrt{2b + \frac{521}{9}} \, |\phi|_\Gamma \le 5\sqrt{b} \, |\phi|_\Gamma.
\]

Combining Lemmas 10 and 11, we obtain our desired trace theorem.

Theorem 12. Let $G = G_{k,\ell}$, $4\ell < k < 2b\ell$, $b \in \mathbb{N}$, with boundary $\Gamma = \{(i,1)\}_{i=1}^k$. For any $\phi \in \mathbb{R}^k$,
\[
\frac{1}{4\sqrt{b}} |\phi|_\Gamma \le \min_{u|_\Gamma = \phi} |u|_G \le 5\sqrt{b} \, |\phi|_\Gamma.
\]
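Theorem 12 can be checked numerically: the extension (4.4) witnesses the upper bound, and Lemma 10 supplies the lower bound for any extension of $\phi$. The boundary function below is an arbitrary sample; the parameters $k = 20$, $\ell = 3$, $b = 4$ satisfy $4\ell < k < 2b\ell$.

```python
# Build the explicit extension (4.4) on G_{k,l} and verify both trace bounds:
# |u|_G <= 5 sqrt(b) |phi|_Gamma (Lemma 11) and |phi|_Gamma <= 4 sqrt(b) |u|_G
# (Lemma 10, which holds for any extension of phi).
import math

k, l, b = 20, 3, 4
phi = [math.sin(2 * math.pi * i / k) + 0.3 * math.cos(4 * math.pi * i / k)
       for i in range(k)]

a = sum(phi) / k                 # global mean of the boundary data
def a_win(i, j):                 # centered window average over 2j - 1 values
    return sum(phi[(i + h) % k] for h in range(1 - j, j)) / (2 * j - 1)

# extension (4.4): row j blends the window average toward the global mean
u = [[(j - 1) / (l - 1) * a + (1 - (j - 1) / (l - 1)) * a_win(i, j)
      for j in range(1, l + 1)] for i in range(k)]

eG = sum((u[i][j] - u[(i + 1) % k][j]) ** 2 for i in range(k) for j in range(l)) \
   + sum((u[i][j] - u[i][j + 1]) ** 2 for i in range(k) for j in range(l - 1))
ePhi = sum((phi[p] - phi[q]) ** 2 / min(q - p, k - (q - p)) ** 2
           for p in range(k) for q in range(p + 1, k))
uG, phiG = math.sqrt(eG), math.sqrt(ePhi)
```

By construction the first row of the extension reproduces $\phi$ exactly, and the last row is the constant $a$.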

With a little more work, we can prove a similar result for a slightly more general class of graphs. Using Theorem 12, we can almost immediately prove a trace theorem for any graph $H$ satisfying $G_{k,\ell} \subset H \subset G^*_{k,\ell}$. In fact, Lemma 10 carries over immediately. In order to prove a new version of Lemma 11, it suffices to bound the energy of $u$ on the edges in $G^*_{k,\ell}$ not contained in $G_{k,\ell}$. By Cauchy-Schwarz,

\[
|u|_{G^*}^2 = |u|_G^2 + \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \left[ (u_{i,j+1} - u_{i-1,j})^2 + (u_{i,j+1} - u_{i+1,j})^2 \right] \le 3 \sum_{i=1}^{k} \sum_{j=1}^{\ell} (u_{i+1,j} - u_{i,j})^2 + 2 \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} (u_{i,j+1} - u_{i,j})^2 \le 3 \, |u|_G^2,
\]
and therefore Corollary 1 follows immediately from Lemmas 10 and 11.

Corollary 1. Let $H$ satisfy $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, $4\ell < k < 2b\ell$, $b \in \mathbb{N}$, with boundary $\Gamma = \{(i,1)\}_{i=1}^k$. For any $\phi \in \mathbb{R}^k$,
\[
\frac{1}{4\sqrt{b}} |\phi|_\Gamma \le \min_{u|_\Gamma = \phi} |u|_H \le 5\sqrt{3b} \, |\phi|_\Gamma.
\]

Trace Theorems for General Graphs

In order to extend Corollary 1 to more general graphs, we introduce a graph operation which is similar in concept to an aggregation (a partition of $V$ into connected subsets) in which the size of aggregates is bounded. In particular, we give the following definition.

Definition 2. The graph $H$, $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, is said to be an $M$-aggregation of $(G, \Gamma) \in \mathcal{G}_n$ if there exists a partition $\mathcal{A} = \{a^*\} \cup \{a_{i,j}\}^{j=1,\ldots,\ell}_{i=1,\ldots,k}$ of $V(G)$ satisfying

1. $G[a_{i,j}]$ is connected and $|a_{i,j}| \le M$ for all $i = 1, \ldots, k$, $j = 1, \ldots, \ell$,

2. $\Gamma \subset \bigcup_{i=1}^{k} a_{i,1}$, and $\Gamma \cap a_{i,1} \neq \emptyset$ for all $i = 1, \ldots, k$,

3. $N_G(a^*) \subset a^* \cup \bigcup_{i=1}^{k} a_{i,\ell}$,

4. the aggregation graph of $\mathcal{A} \setminus \{a^*\}$, given by
\[
\big( \mathcal{A} \setminus \{a^*\}, \; \{ \{a_{i_1,j_1}, a_{i_2,j_2}\} \,|\, N_G(a_{i_1,j_1}) \cap a_{i_2,j_2} \neq \emptyset \} \big),
\]
is isomorphic to $H$.


(a) graph $(G, \Gamma)$ \qquad (b) partition $\mathcal{A}$ \qquad (c) $G_{6,2} \subset H \subset G^*_{6,2}$

Figure 4-4: An example of an $M$-aggregation. Figure (A) provides a visual representation of a graph $G$, with boundary vertices $\Gamma$ enlarged. Figure (B) shows a partition $\mathcal{A}$ of $G$, in which each aggregate (enclosed by dotted lines) has order at most four. The set $a^*$ is denoted by a shaded region. Figure (C) shows the aggregation graph $H$ of $\mathcal{A} \setminus \{a^*\}$. The graph $H$ satisfies $G_{6,2} \subset H \subset G^*_{6,2}$, and is therefore a 4-aggregation of $(G, \Gamma)$.

We provide a visual example in Figure 4-4, and later show that this operation applies to a fairly large class of graphs. However, the $M$-aggregation procedure is not the only operation for which we can control the behavior of the energy and boundary semi-norms. For instance, the behavior of our semi-norms under the deletion of some number of edges can be bounded easily if each edge can be replaced by a path of constant length, and no remaining edge is contained in more than a constant number of such paths. In addition, the behavior of these semi-norms under the disaggregation of large-degree vertices is also relatively well-behaved (see [52] for details). Here, we focus on the $M$-aggregation procedure. We give the following result regarding graphs $(G, \Gamma)$ for which some $H$, $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, is an $M$-aggregation of $(G, \Gamma)$, but note that a large number of minor refinements are possible, such as the two briefly mentioned above.

Theorem 13. If $H$, $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, $4\ell < k < 2b\ell$, $b \in \mathbb{N}$, is an $M$-aggregation of $(G, \Gamma) \in \mathcal{G}_n$, then for any $\phi \in \mathbb{R}^{n_\Gamma}$,
\[
\frac{1}{C_1} |\phi|_\Gamma \le \min_{u|_\Gamma = \phi} |u|_G \le C_2 \, |\phi|_\Gamma
\]
for some fixed constants $C_1, C_2 > 0$ that depend only on $b$ and $M$.


Proof. We first prove that there is an extension $u$ of $\phi$ which satisfies $|u|_G \le C_2 |\phi|_\Gamma$ for some $C_2 > 0$ that depends only on $b$ and $M$. To do so, we define auxiliary functions $\hat{u}$ and $\hat{\phi}$ on $(G^*_{2k,\ell}, \Gamma_{2k,\ell})$. Let
\[
\hat{\phi}(p) = \begin{cases} \max_{q \in \Gamma \cap a_{(p+1)/2,1}} \phi(q) & \text{if } p \text{ is odd}, \\ \min_{q \in \Gamma \cap a_{p/2,1}} \phi(q) & \text{if } p \text{ is even}, \end{cases}
\]
and let $\hat{u}$ be extension (4.4) of $\hat{\phi}$. The idea is to upper bound the semi-norm for $u$ by $\hat{u}$, for $\hat{u}$ by $\hat{\phi}$ (using Corollary 1), and for $\hat{\phi}$ by $\phi$. On each aggregate $a_{i,j}$, let $u$ take values between $\hat{u}(2i-1, j)$ and $\hat{u}(2i, j)$, and let $u$ equal $a$, the mean of $\hat{\phi}$, on $a^*$. We can decompose $|u|_G^2$ into

\[
|u|_G^2 = \sum_{i=1}^{k} \sum_{j=1}^{\ell} \sum_{\substack{p,q \in a_{i,j}, \\ p \sim q}} (u(p) - u(q))^2 + \sum_{i=1}^{k} \sum_{j=1}^{\ell} \sum_{\substack{p \in a_{i,j}, \\ q \in a_{i+1,j}, \, p \sim q}} (u(p) - u(q))^2 + \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \sum_{\substack{p \in a_{i,j}, \\ q \in a_{i-1,j+1}, \, p \sim q}} (u(p) - u(q))^2
\]
\[
+ \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \sum_{\substack{p \in a_{i,j}, \\ q \in a_{i+1,j+1}, \, p \sim q}} (u(p) - u(q))^2 + \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} \sum_{\substack{p \in a_{i,j}, \\ q \in a_{i,j+1}, \, p \sim q}} (u(p) - u(q))^2,
\]

and bound each term of |๐‘ข|2๐บ separately, beginning with the first. The maximum energy

semi-norm of an ๐‘š vertex graph that takes values in the range [๐‘Ž, ๐‘] is bounded above by

(๐‘š/2)2(๐‘โˆ’ ๐‘Ž)2. Therefore,

โˆ‘๐‘,๐‘žโˆˆ๐‘Ž๐‘–,๐‘— ,

๐‘โˆผ๐‘ž

(๐‘ข(๐‘)โˆ’ ๐‘ข(๐‘ž))2 โ‰ค ๐‘€2

4(๐‘ข(2๐‘–โˆ’ 1, ๐‘—)โˆ’ ๐‘ข(2๐‘–, ๐‘—))2 .


For the second term,
$$\sum_{\substack{p \in a_{i,j},\, q \in a_{i+1,j},\\ p\sim q}} (u(p)-u(q))^2 \;\le\; M^2 \max_{\substack{i_1 \in \{2i-1,\,2i\},\\ i_2 \in \{2i+1,\,2i+2\}}} \big(\bar u(i_1,j) - \bar u(i_2,j)\big)^2$$
$$\le\; 3M^2\Big[\big(\bar u(2i-1,j)-\bar u(2i,j)\big)^2 + \big(\bar u(2i,j)-\bar u(2i+1,j)\big)^2 + \big(\bar u(2i+1,j)-\bar u(2i+2,j)\big)^2\Big].$$

The exact same type of bound holds for the third and fourth terms. For the fifth term,
$$\sum_{\substack{p \in a_{i,j},\, q \in a_{i,j+1},\\ p\sim q}} (u(p)-u(q))^2 \;\le\; M^2 \max_{\substack{i_1 \in \{2i-1,\,2i\},\\ i_2 \in \{2i-1,\,2i\}}} \big(\bar u(i_1,j) - \bar u(i_2,j+1)\big)^2,$$
and, unlike terms two, three, and four, this maximum appears in $|\bar u|^2_{G^*_{2k,\ell}}$. Combining these three estimates, we obtain the upper bound $|u|^2_G \le \big(6\times 3 + \tfrac14\big) M^2\, |\bar u|^2_{G^*_{2k,\ell}} = \tfrac{73}{4}\, M^2\, |\bar u|^2_{G^*_{2k,\ell}}$.

Next, we lower bound $|\phi|_\Gamma$ by a constant times $|\bar\phi|_{\Gamma_{2k,\ell}}$. By definition, in $\Gamma \cap a_{i,1}$ there is a vertex that takes value $\bar\phi(2i-1)$ and a vertex that takes value $\bar\phi(2i)$. This implies that every term in $|\bar\phi|_{\Gamma_{2k,\ell}}$ is a term in $|\phi|_\Gamma$, with a possibly different denominator. Distances between vertices on $\Gamma$ can be decreased by at most a factor of $2M$ on $\Gamma_{2k,\ell}$. In addition, it may be the case that an aggregate contains only one vertex of $\Gamma$, which results in $\bar\phi(2i-1) = \bar\phi(2i)$. Therefore, a given term in $|\phi|^2_\Gamma$ could appear four times in $|\bar\phi|^2_{\Gamma_{2k,\ell}}$. Combining these two facts, we immediately obtain the bound $|\bar\phi|^2_{\Gamma_{2k,\ell}} \le 16 M^2 |\phi|^2_\Gamma$. Combining these two estimates with Corollary 1 completes the first half of the proof.

All that remains is to show that for any $u$, $|\phi|_\Gamma \le C_1 |u|_G$ for some $C_1 > 0$ that depends only on $c$ and $M$. To do so, we define auxiliary functions $\tilde u$ and $\tilde\phi$ on $(G_{2k,2\ell}, \Gamma_{2k,2\ell})$. Let
$$\tilde u(i,j) = \begin{cases} \max_{p \in a_{\lceil i/2\rceil, \lceil j/2\rceil}} u(p) & \text{if } i = j \bmod 2, \\ \min_{p \in a_{\lceil i/2\rceil, \lceil j/2\rceil}} u(p) & \text{if } i \ne j \bmod 2. \end{cases}$$

Here, the idea is to lower bound the semi-norm of $u$ by that of $\tilde u$, of $\tilde u$ by that of $\tilde\phi$ (using Corollary 1), and of $\tilde\phi$ by that of $\phi$. We can decompose $|\tilde u|^2_{G_{2k,2\ell}}$ into
$$|\tilde u|^2_{G_{2k,2\ell}} = 4\sum_{i=1}^{k}\sum_{j=1}^{\ell} \big(\tilde u(2i-1,2j-1)-\tilde u(2i,2j-1)\big)^2$$
$$+\; \sum_{i=1}^{k}\sum_{j=1}^{\ell} \big(\tilde u(2i,2j-1)-\tilde u(2i+1,2j-1)\big)^2 + \big(\tilde u(2i,2j)-\tilde u(2i+1,2j)\big)^2$$
$$+\; \sum_{i=1}^{k}\sum_{j=1}^{\ell-1} \big(\tilde u(2i-1,2j)-\tilde u(2i-1,2j+1)\big)^2 + \big(\tilde u(2i,2j)-\tilde u(2i,2j+1)\big)^2.$$

The minimum squared energy semi-norm of an $m$ vertex graph that takes value $a$ at some vertex and value $b$ at some vertex is bounded below by $(b-a)^2/m$. Therefore,
$$\big(\tilde u(2i-1,2j-1)-\tilde u(2i,2j-1)\big)^2 \;\le\; M \sum_{\substack{p,q \in a_{i,j},\\ p\sim q}} (u(p)-u(q))^2,$$
and
$$\max_{\substack{i_1 \in \{2i-1,\,2i\},\\ i_2 \in \{2i+1,\,2i+2\}}} \big(\tilde u(i_1,2j)-\tilde u(i_2,2j)\big)^2 \;\le\; 2M \sum_{\substack{p,q \in a_{i,j}\cup a_{i+1,j},\\ p\sim q}} (u(p)-u(q))^2.$$

Repeating the above estimate for each term results in the desired bound.

Next, we upper bound $|\phi|_\Gamma$ by a constant multiple of $|\tilde\phi|_{\Gamma_{2k,2\ell}}$. We can write $|\phi|^2_\Gamma$ as
$$|\phi|^2_\Gamma = \sum_{i=1}^{k}\;\sum_{p,q \in \Gamma\cap a_{i,1}} \frac{(\phi(p)-\phi(q))^2}{d^2_G(p,q)} \;+\; \sum_{i_1=1}^{k-1}\sum_{i_2=i_1+1}^{k}\;\sum_{\substack{p \in \Gamma\cap a_{i_1,1},\\ q \in \Gamma\cap a_{i_2,1}}} \frac{(\phi(p)-\phi(q))^2}{d^2_G(p,q)},$$

and bound each term separately. The first term is bounded by
$$\sum_{p,q \in \Gamma\cap a_{i,1}} \frac{(\phi(p)-\phi(q))^2}{d^2_G(p,q)} \;\le\; \frac{M^2}{4}\,\big(\tilde\phi(2i-1)-\tilde\phi(2i)\big)^2.$$

For the second term, we first note that $d_G(p,q) \ge \tfrac13\, d_{\Gamma_{2k,2\ell}}\big((m_1,1),(m_2,1)\big)$ for $p \in \Gamma\cap a_{i_1,1}$, $q \in \Gamma\cap a_{i_2,1}$, $m_1 \in \{2i_1-1,\,2i_1\}$, $m_2 \in \{2i_2-1,\,2i_2\}$, which allows us to bound the second term by
$$\sum_{\substack{p \in \Gamma\cap a_{i_1,1},\\ q \in \Gamma\cap a_{i_2,1}}} \frac{(\phi(p)-\phi(q))^2}{d^2_G(p,q)} \;\le\; 9M^2 \max_{\substack{m_1 \in \{2i_1-1,\,2i_1\},\\ m_2 \in \{2i_2-1,\,2i_2\}}} \frac{\big(\tilde\phi(m_1)-\tilde\phi(m_2)\big)^2}{d^2_{\Gamma_{2k,2\ell}}(m_1,m_2)}.$$

Combining the lower bound for $|u|_G$ in terms of $|\tilde u|_{G_{2k,2\ell}}$, the upper bound for $|\phi|_\Gamma$ in terms of $|\tilde\phi|_{\Gamma_{2k,2\ell}}$, and Corollary 1 completes the second half of the proof.

The proof of Theorem 13 also immediately implies a similar result. Let $L \in \mathbb{R}^{n_\Gamma \times n_\Gamma}$ be the Laplacian of the complete graph on $\Gamma$ with weights $w(i,j) = d^{-2}_\Gamma(i,j)$. The same proof implies the following.

Corollary 2. If $H$, $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, $4\ell < k < 2c\ell$, $c \in \mathbb{N}$, is an $M$-aggregation of $(G,\Gamma) \in \mathcal{G}_n$, then for any $\phi \in \mathbb{R}^{n_\Gamma}$,
$$\frac{1}{C_1}\,\langle L\phi,\phi\rangle^{1/2} \;\le\; \min_{u|_\Gamma = \phi} |u|_G \;\le\; C_2\,\langle L\phi,\phi\rangle^{1/2},$$
for some fixed constants $C_1, C_2 > 0$ that depend only on $c$ and $M$.

Spectral Equivalence of $S_\Gamma$ and $L^{1/2}_\Gamma$

By Corollary 2, and the property $\langle \phi, S_\Gamma \phi\rangle = \min_{u|_\Gamma = \phi} |u|^2_G$ (see Proposition 8), in order to prove spectral equivalence between $S_\Gamma$ and $L^{1/2}_\Gamma$, it suffices to show that $L^{1/2}_\Gamma$ and $L$ are spectrally equivalent. This can be done relatively easily, and leads to a proof of the main result of the section.

Theorem 14. If $H$, $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, $4\ell < k < 2c\ell$, $c \in \mathbb{N}$, is an $M$-aggregation of $(G,\Gamma) \in \mathcal{G}_n$, then for any $\phi \in \mathbb{R}^{n_\Gamma}$,
$$\frac{1}{C_1}\,\langle L^{1/2}_\Gamma \phi,\phi\rangle \;\le\; \langle S_\Gamma \phi,\phi\rangle \;\le\; C_2\,\langle L^{1/2}_\Gamma \phi,\phi\rangle,$$
for some fixed constants $C_1, C_2 > 0$ that depend only on $c$ and $M$.


Proof. Let $\varphi(i,j) = \min\{\, i-j \bmod n_\Gamma,\; j-i \bmod n_\Gamma \,\}$. $G[\Gamma]$ is a cycle, so $L(i,j) = -\varphi(i,j)^{-2}$ for $i \ne j$. The spectral decomposition of $L_\Gamma$ is well known, namely,
$$L_\Gamma = \sum_{k=1}^{\lfloor n_\Gamma/2 \rfloor} \lambda_k(L_\Gamma)\left[\frac{x_k x_k^T}{\|x_k\|^2} + \frac{y_k y_k^T}{\|y_k\|^2}\right],$$
where $\lambda_k(L_\Gamma) = 2 - 2\cos\frac{2\pi k}{n_\Gamma}$ and $x_k(j) = \sin\frac{2\pi k j}{n_\Gamma}$, $y_k(j) = \cos\frac{2\pi k j}{n_\Gamma}$, $j = 1,\dots,n_\Gamma$. If $n_\Gamma$ is odd, then $\lambda_{(n_\Gamma-1)/2}$ has multiplicity two, but if $n_\Gamma$ is even, then $\lambda_{n_\Gamma/2}$ has only multiplicity one, as $x_{n_\Gamma/2} = 0$. If $k \ne n_\Gamma/2$, we have

โ€–๐‘ฅ๐‘˜โ€–2 =

๐‘›ฮ“โˆ‘๐‘—=1

sin2

(2๐œ‹๐‘˜๐‘—

๐‘›ฮ“

)=

๐‘›ฮ“

2โˆ’ 1

2

๐‘›ฮ“โˆ‘๐‘—=1

cos

(4๐œ‹๐‘˜๐‘—

๐‘›ฮ“

)

=๐‘›ฮ“

2โˆ’ 1

4

[sin(2๐œ‹๐‘˜(2 + 1

๐‘›ฮ“))

sin 2๐œ‹๐‘˜๐‘›ฮ“

โˆ’ 1

]=

๐‘›ฮ“

2,

and so โ€–๐‘ฆ๐‘˜โ€–2 = ๐‘›ฮ“

2as well. If ๐‘˜ = ๐‘›ฮ“/2, then โ€–๐‘ฆ๐‘˜โ€–2 = ๐‘›ฮ“. If ๐‘›ฮ“ is odd,

๐ฟ1/2ฮ“ (๐‘–, ๐‘—) =

2โˆš

2

๐‘›ฮ“

๐‘›ฮ“โˆ’1

2โˆ‘๐‘˜=1

[1โˆ’ cos

2๐‘˜๐œ‹

๐‘›ฮ“

]1/2 [sin

2๐œ‹๐‘˜๐‘–

๐‘›ฮ“

sin2๐œ‹๐‘˜๐‘—

๐‘›ฮ“

โˆ’ cos2๐œ‹๐‘˜๐‘–

๐‘›ฮ“

cos2๐œ‹๐‘˜๐‘—

๐‘›ฮ“

]

=4

๐‘›ฮ“

๐‘›ฮ“โˆ’1

2โˆ‘๐‘˜=1

sin

(๐œ‹

2

2๐‘˜

๐‘›ฮ“

)cos

(๐œ‘(๐‘–, ๐‘—)๐œ‹

2๐‘˜

๐‘›ฮ“

)

=2

๐‘›ฮ“

๐‘›ฮ“โˆ‘๐‘˜=0

sin

(๐œ‹

2

2๐‘˜

๐‘›ฮ“

)cos

(๐œ‘(๐‘–, ๐‘—)๐œ‹

2๐‘˜

๐‘›ฮ“

),

and if ๐‘›ฮ“ is even,

๐ฟ1/2ฮ“ (๐‘–, ๐‘—) =

2

๐‘›ฮ“

(โˆ’1)๐‘–+๐‘— +4

๐‘›ฮ“

๐‘›ฮ“2โˆ’1โˆ‘

๐‘˜=1

sin

(๐œ‹

2

2๐‘˜

๐‘›ฮ“

)cos

(๐œ‘(๐‘–, ๐‘—)๐œ‹

2๐‘˜

๐‘›ฮ“

)

=2

๐‘›ฮ“

๐‘›ฮ“โˆ‘๐‘˜=0

sin

(๐œ‹

2

2๐‘˜

๐‘›ฮ“

)cos

(๐œ‘(๐‘–, ๐‘—)๐œ‹

2๐‘˜

๐‘›ฮ“

).

๐ฟ1/2ฮ“ (๐‘–, ๐‘—) is simply the trapezoid rule applied to the integral of sin(๐œ‹

2๐‘ฅ) cos(๐œ‘(๐‘–, ๐‘—)๐œ‹๐‘ฅ) on

124

the interval [0, 2]. Therefore,๐ฟ1/2ฮ“ (๐‘–, ๐‘—) +

2

๐œ‹(4๐œ‘(๐‘–, ๐‘—)2 โˆ’ 1)

=

๐ฟ1/2ฮ“ (๐‘–, ๐‘—)โˆ’

โˆซ 2

0

sin(๐œ‹

2๐‘ฅ)

cos (๐œ‘(๐‘–, ๐‘—)๐œ‹๐‘ฅ) ๐‘‘๐‘ฅ

โ‰ค 2

3๐‘›2ฮ“

,

where we have used the fact that if ๐‘“ โˆˆ ๐ถ2([๐‘Ž, ๐‘]), then โˆซ ๐‘

๐‘Ž

๐‘“(๐‘ฅ)๐‘‘๐‘ฅโˆ’ ๐‘“(๐‘Ž) + ๐‘“(๐‘)

2(๐‘โˆ’ ๐‘Ž)

โ‰ค (๐‘โˆ’ ๐‘Ž)3

12max๐œ‰โˆˆ[๐‘Ž,๐‘]

|๐‘“ โ€ฒโ€ฒ(๐œ‰)|.

Noting that $n_\Gamma \ge 3$, it quickly follows that
$$\left(\frac{1}{2\pi} - \frac{\sqrt{2}}{12}\right)\langle L\phi,\phi\rangle \;\le\; \langle L^{1/2}_\Gamma \phi,\phi\rangle \;\le\; \left(\frac{2}{3\pi} + \frac{\sqrt{2}}{27}\right)\langle L\phi,\phi\rangle.$$

Combining this result with Corollary 2, and noting that $\langle \phi, S_\Gamma \phi\rangle = |u|^2_G$, where $u$ is the harmonic extension of $\phi$, we obtain the desired result.

An Illustrative Example

While the concept of a graph $(G,\Gamma)$ having some $H$, $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, as an $M$-aggregation seems somewhat abstract, this simple formulation in itself is quite powerful. As an example, we illustrate that this implies a trace theorem (and, therefore, spectral equivalence) for all three-connected planar graphs with bounded face degree (number of edges in the associated induced cycle) and for which there exists a planar spring embedding with a convex hull that is not too thin (a bounded distance to Hausdorff distance ratio for the boundary with respect to some point in the convex hull) and satisfies bounded edge length and small angle conditions. Let $\mathcal{G}^{f\le b}_n$ be the elements $(G,\Gamma) \in \mathcal{G}_n$ for which every face other than the outer face $\Gamma$ has at most $b$ edges. The exact proof of the following theorem is rather long, and can be found in the Appendix of [124]. The intuition is that, by taking the planar spring embedding, rescaling it so that the vertices of the boundary face lie on the unit circle, and overlaying a natural embedding of $G^*_{k,\ell}$ on the unit disk, an intuitive $M$-aggregation of the graph $G$ can be formed.

Theorem 15. If there exists a planar spring embedding $X$ of $(G,\Gamma) \in \mathcal{G}^{f\le c_1}_n$ for which

(1) $K = \mathrm{conv}\big(\{[X_\Gamma]_{i,\cdot}\}_{i=1}^{n_\Gamma}\big)$ satisfies
$$\sup_{u\in K}\,\inf_{v\in\partial K}\,\sup_{w\in\partial K} \frac{\|u-v\|}{\|u-w\|} \;\ge\; c_2 > 0,$$

(2) $X$ satisfies
$$\max_{\substack{\{i_1,i_2\}\in E,\\ \{j_1,j_2\}\in E}} \frac{\|X_{i_1,\cdot}-X_{i_2,\cdot}\|}{\|X_{j_1,\cdot}-X_{j_2,\cdot}\|} \le c_3 \qquad\text{and}\qquad \min_{\substack{i\in V,\\ j_1,j_2\in N(i)}} \angle\, X_{j_1,\cdot}\, X_{i,\cdot}\, X_{j_2,\cdot} \;\ge\; c_4 > 0,$$

then there exists an $H$, $G_{k,\ell} \subset H \subset G^*_{k,\ell}$, $\ell \le k < 2c\ell$, $c \in \mathbb{N}$, such that $H$ is an $M$-aggregation of $(G,\Gamma)$, where $c$ and $M$ are constants that depend on $c_1$, $c_2$, $c_3$, and $c_4$.


4.3 The Kamada-Kawai Objective and Optimal Layouts

Given the distances between data points in a high dimensional space, how can we meaning-

fully visualize their relationships? This is a fundamental task in exploratory data analysis,

for which a variety of different approaches have been proposed. Many of these techniques

seek to visualize high-dimensional data by embedding it into lower dimensional, e.g. two

or three-dimensional, space. Metric multidimensional scaling (MDS or mMDS) [64, 66]

is a classical approach that attempts to find a low-dimensional embedding that accurately

represents the distances between points. Originally motivated by applications in psychomet-

rics, MDS has now been recognized as a fundamental tool for data analysis across a broad

range of disciplines. See [66, 9] for more details, including a discussion of applications to

data from scientific, economic, political, and other domains. Compared to other classical

visualization tools like PCA2, metric multidimensional scaling has the advantage of not

being restricted to linear projections of the data, and of being applicable to data from an

arbitrary metric space, rather than just Euclidean space. Because of this versatility, MDS

has also become one of the most popular algorithms in the field of graph drawing, where the

goal is to visualize relationships between nodes (e.g. people in a social network). In this

context, MDS was independently proposed by Kamada and Kawai [55] as a force-directed

graph drawing method.

In this section, we consider the algorithmic problem of computing an optimal drawing

under the MDS/Kamada-Kawai objective, and structural properties of optimal drawings. The

Kamada-Kawai objective is to minimize the following energy/stress functional $E : \mathbb{R}^{rn} \to \mathbb{R}$,
$$E(x_1,\dots,x_n) = \sum_{i<j}\left(\frac{\|x_i - x_j\|}{d(i,j)} - 1\right)^2, \tag{4.5}$$
which corresponds to the physical situation where $x_1,\dots,x_n \in \mathbb{R}^r$ are particles and, for each $i \ne j$, particles $i$ and $j$ are connected by an idealized spring with equilibrium length $d(i,j)$ following Hooke's law with spring constant $k_{ij} = \frac{1}{d(i,j)^2}$. In applications to visualization,

²In the literature, PCA is sometimes referred to as classical multidimensional scaling, in contrast to metric multidimensional scaling, which we study in this work.


the choice of dimension is often small, i.e. $r \le 3$. We also note that in (4.5) the terms in

the sum are sometimes re-weighted with vertex or edge weights, which we discuss in more

detail later.
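As a concrete reference point, the stress functional (4.5) is straightforward to evaluate; the following short Python sketch (illustrative only; the function name and input encoding are our own) computes it for a given layout and metric:

```python
import math
from itertools import combinations

def stress(X, d):
    # Kamada-Kawai / metric-MDS stress (4.5): sum over pairs i < j of
    # (||x_i - x_j|| / d(i, j) - 1)^2
    total = 0.0
    for i, j in combinations(range(len(X)), 2):
        total += (math.dist(X[i], X[j]) / d[i, j] - 1.0) ** 2
    return total

# path metric on three vertices: d(0,1) = d(1,2) = 1, d(0,2) = 2
d = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
print(stress([(0.0,), (1.0,), (2.0,)], d))  # isometric layout achieves 0.0
```

An isometric embedding (when one exists, as for this path in one dimension) achieves stress zero; in general the metric need not embed exactly.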

In practice, the MDS/Kamada-Kawai objective (4.5) is optimized via a heuristic proce-

dure like gradient descent [65, 135] or stress majorization [29, 43]. Because the objective is

non-convex, these algorithms may not reach the global minimum, but instead may terminate

at approximate critical points of the objective function. Heuristics such as restarting an

algorithm from different initializations and using modified step size schedules have been

proposed to improve the quality of results. In practice, these heuristic methods do seem to

work well for the Kamada-Kawai objective and are implemented in popular packages like

GRAPHVIZ [39] and the SMACOF package in R.
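A minimal version of such a first-order heuristic can be sketched as follows (plain gradient descent on (4.5) with a fixed step size; the names and parameter choices are our own, and coincident points, where the gradient is undefined, are skipped):

```python
import math, random
from itertools import combinations

def stress(X, d):
    return sum((math.dist(X[i], X[j]) / d[i, j] - 1.0) ** 2
               for i, j in combinations(range(len(X)), 2))

def descend(X, d, step=0.01, iters=200):
    # Each pair (i, j) contributes 2*(||xi-xj||/d - 1)/(d*||xi-xj||) * (xi - xj)
    # to the gradient at xi (and the negative of that at xj).
    n, r = len(X), len(X[0])
    X = [list(p) for p in X]
    for _ in range(iters):
        g = [[0.0] * r for _ in range(n)]
        for i, j in combinations(range(n), 2):
            diff = [X[i][t] - X[j][t] for t in range(r)]
            nrm = math.sqrt(sum(v * v for v in diff))
            if nrm == 0.0:
                continue  # gradient undefined at coincident points
            coef = 2.0 * (nrm / d[i, j] - 1.0) / (d[i, j] * nrm)
            for t in range(r):
                g[i][t] += coef * diff[t]
                g[j][t] -= coef * diff[t]
        for i in range(n):
            for t in range(r):
                X[i][t] -= step * g[i][t]
    return [tuple(p) for p in X]

random.seed(0)
d = {(i, j): abs(i - j) for i in range(4) for j in range(i + 1, 4)}  # path metric
X0 = [(random.random(), random.random()) for _ in range(4)]
X1 = descend(X0, d)
print(stress(X1, d) <= stress(X0, d))
```

As discussed above, such a scheme only finds (approximate) critical points in general; the initialization and step-size schedule matter in practice.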

We revisit this problem from an approximation algorithms perspective. First, we resolve

the computational complexity of minimizing (4.5) by proving that finding the global mini-

mum is NP-hard, even for graph metrics (where the metric is the shortest path distance on a

graph). Consider the gap decision version of stress minimization over graph metrics, which

we formally define below:

GAP STRESS MINIMIZATION

Input: Graph $G = ([n], E)$, $r \in \mathbb{N}$, $L \ge 0$.

Output: TRUE if there exists $x = (x_1,\dots,x_n) \in \mathbb{R}^{nr}$ such that $E(x) \le L$; FALSE if for every $x$, $E(x) \ge L + 1$.

Theorem 16. Gap Stress Minimization in dimension $r = 1$ is NP-hard. Furthermore,

the problem is hard even restricted to input graphs with diameter bounded by an absolute

constant.

As a gap problem, the output is allowed to be arbitrary if neither case holds; the hardness of the gap formulation shows that there cannot exist a Fully-Polynomial Randomized Approximation Scheme (FPRAS) for this problem if P $\ne$ NP, i.e. the runtime cannot be polynomial in the desired approximation guarantee. Our reduction shows this problem is


hard even when the input graph has diameter bounded above by an absolute constant. This

is a natural setting to consider, since many real world graphs (for example, social networks

[35]) and random graph models [127] indeed have low diameter due to the โ€œsmall-world

phenomena.โ€ Other key aspects of this hardness proof are that we show the problem is

hard even when the input ๐‘‘ is a graph metric, and we show it is hard even in its canonical

unweighted formulation (4.5).

Given that computing the minimizer is NP-hard, a natural question is whether there exist polynomial time approximation algorithms for minimizing (4.5). We show that if the input graph has bounded diameter $D = O(1)$, then there indeed exists a Polynomial-Time Approximation Scheme (PTAS) to minimize (4.5), i.e. for fixed $\epsilon > 0$ and fixed $D$ there exists an algorithm to approximate the global minimum of an $n$ vertex diameter $D$ graph up to multiplicative error $(1+\epsilon)$ in time $f(\epsilon, D)\cdot \mathrm{poly}(n)$. More generally, we show the following result, where KKSCHEME is a simple greedy algorithm described in Subsection 4.3.3 below.

Theorem 17. For any input metric over $[n]$ with $\min_{i,j\in[n]} d(i,j) = 1$ and any $R > \epsilon > 0$, Algorithm KKSCHEME with $\epsilon_1 = O(\epsilon/R)$ and $\epsilon_2 = O(\epsilon/R^2)$ runs in time $n^2 (R/\epsilon)^{O(rR^4/\epsilon^2)}$ and outputs $x_1,\dots,x_n \in \mathbb{R}^r$ with $\|x_i\| \le R$ such that
$$\mathbb{E}\,[E(x_1,\dots,x_n)] \;\le\; E(x_1^*,\dots,x_n^*) + \epsilon n^2$$
for any $x_1^*,\dots,x_n^*$ with $\|x_i^*\| \le R$ for all $i$, where $\mathbb{E}$ is the expectation over the randomness of the algorithm.

The fact that this result is a PTAS for bounded diameter graphs follows from combining

it with the two structural results regarding optimal Kamada-Kawai embeddings, which are

of independent interest. The first (Lemma 13) shows that the optimal objective value for low

diameter graphs must be of order $\Omega(n^2)$, and the second (Lemma 15) shows that the optimal

KK embedding is โ€œcontractiveโ€ in the sense that the diameter of the output is never much

larger than the diameter of the input. Both lemmas are proven in Subsection 4.3.1.

In the multidimensional scaling literature, there has been some study of the local

convergence of algorithms like stress majorization (for example, [28]), which shows that stress majorization will converge quickly if initialized in a sufficiently small neighborhood of a local minimum. To the best of our knowledge, this work provides the first provable guarantees for global optimization. The closest previous hardness result is the work of [19], which showed that a similar

problem is hard. In their problem, the terms in (4.5) are weighted by ๐‘‘(๐‘–, ๐‘—), absolute

value loss replaces the squared loss, and the input is an arbitrary pseudometric where nodes

in the input are allowed to be at distance zero from each other. The second assumption

makes the diameter (ratio of max to min distance in the input) infinite, which is a major

obstruction to modifying their approach to show Theorem 16. See Remark 1 for further

discussion. In the approximation algorithms literature, there has been a great deal of interest

in optimizing the worst-case distortion of metric embeddings into various spaces, see e.g.

[5] for approximation algorithms for embeddings into one dimension, and [34, 85] for more

general surveys of low distortion metric embeddings. Though conceptually related, the

techniques used in this literature are not generally targeted for minimizing a measure of

average pairwise distortion like (4.5).

In what follows, we generally assume the input is given as an unweighted graph to

simplify notation. However, for the upper bound (Theorem 17) we do handle the general

case of arbitrary metrics with distances in [1, ๐ท] (the lower bound of 1 is without loss of

generality after re-scaling). In the lower bound (Theorem 16), we prove the (stronger) result

that the problem is hard when restricted to graph metrics, instead of just for arbitrary metrics.

Throughout this section, we use techniques involving estimating different components of the objective function $E(x_1,\dots,x_n)$ given by (4.5). For convenience, we use the notation
$$E_{i,j}(x) := \left(\frac{\|x_i - x_j\|}{d(i,j)} - 1\right)^2, \qquad E_S(x) := \sum_{\substack{i,j \in S,\\ i<j}} E_{i,j}(x), \qquad E_{S,T}(x) := \sum_{i\in S}\sum_{j\in T} E_{i,j}(x).$$

The remainder of this section is as follows. In Subsection 4.3.1, we prove two structural

results regarding the objective value and diameter of an optimal layout. In Subsection 4.3.2,

we provide a proof of Theorem 16, the main algorithmic lower bound of the section. Finally,

in Subsection 4.3.3, we formally describe an approximation algorithm for low diameter

graphs and prove Theorem 17.


4.3.1 Structural Results for Optimal Embeddings

In this subsection, we present two results regarding optimal layouts of a given graph. In

particular, we provide a lower bound for the energy of a graph layout and an upper bound

for the diameter of an optimal layout. First, we recall the following standard ๐œ–-net estimate.

Lemma 12 (Corollary 4.2.13 of [126]). Let $B_R = \{x : \|x\| \le R\} \subset \mathbb{R}^r$ be the origin-centered radius $R$ ball in $r$ dimensions. For any $\epsilon \in (0,R)$ there exists a subset $S_\epsilon \subset B_R$ with $|S_\epsilon| \le (3R/\epsilon)^r$ such that $\max_{\|x\|\le R}\min_{y\in S_\epsilon}\|x-y\| \le \epsilon$, i.e. $S_\epsilon$ is an $\epsilon$-net of $B_R$.

We use this result to prove a lower bound for the objective value of any layout of a diameter $D$ graph in $\mathbb{R}^r$.

Lemma 13. Let $G = ([n], E)$ have diameter
$$D \le \frac{(n/2)^{1/r}}{10}.$$
Then any layout $x \in \mathbb{R}^{rn}$ has energy
$$E(x) \ge \frac{n^2}{81\,(10D)^r}.$$

Proof. Let $G = ([n], E)$ have diameter $D \le (n/2)^{1/r}/10$, and suppose that there exists a layout $x \in \mathbb{R}^{rn}$ of $G$ in dimension $r$ with energy $E(x) = cn^2$ for some $c \le 1/810$. If no such layout exists, then we are done. We aim to lower bound the possible values of $c$. For each vertex $i \in [n]$, we consider the quantity $E_{i,V\setminus\{i\}}(x)$. The sum
$$\sum_{i\in[n]} E_{i,V\setminus\{i\}}(x) = 2cn^2,$$
and so there exists some $i' \in [n]$ such that $E_{i',V\setminus\{i'\}}(x) \le 2cn$. By Markov's inequality,
$$\big|\{\, j \in [n] \;|\; E_{i',j}(x) > 10c \,\}\big| < n/5,$$
and so at least $4n/5$ vertices (including $i'$) in $[n]$ satisfy
$$\left(\frac{\|x_{i'} - x_j\|}{d(i',j)} - 1\right)^2 \le 10c,$$
and also
$$\|x_{i'} - x_j\| \le d(i',j)\big(1 + \sqrt{10c}\big) \le \frac{10}{9}D.$$

The remainder of the proof consists of taking the $r$-dimensional ball with center $x_{i'}$ and radius $10D/9$ (which contains at least $4n/5$ vertices), partitioning it into smaller sub-regions, and then lower bounding the energy resulting from the interactions between vertices within each sub-region.

By applying Lemma 12 with $R := 10D/9$ and $\epsilon := 1/3$, we may partition the $r$ dimensional ball with center $x_{i'}$ and radius $10D/9$ into $(10D)^r$ disjoint regions, each of diameter at most $2/3$. For each of these regions, we denote by $S_j \subset [n]$, $j \in [(10D)^r]$, the subset of vertices whose corresponding point lies in the corresponding region. As each region is of diameter at most $2/3$ and the graph distance between any two distinct vertices is at least one, either
$$E_{S_j}(x) \ge \binom{|S_j|}{2}(2/3 - 1)^2 = \frac{|S_j|(|S_j|-1)}{18}$$
or $|S_j| = 0$. Empty regions provide no benefit and can be safely ignored. A lower bound on the total energy can be produced by the following optimization problem
$$\min \sum_{k=1}^{\ell} m_k(m_k - 1) \quad \text{s.t.} \quad \sum_{k=1}^{\ell} m_k = m, \;\; m_k \ge 1, \;\; k \in [\ell],$$
where the $m_k$ have been relaxed to real values. This optimization problem has a non-empty feasible region for $m \ge \ell$, and the solution is given by $m(m/\ell - 1)$ (achieved when $m_k = m/\ell$ for all $k$). In our situation, $m := 4n/5$ and $\ell := (10D)^r$, and, by assumption, $m \ge \ell$. This leads to the lower bound
$$cn^2 = E(x) \ge \sum_{j=1}^{\ell} E_{S_j}(x) \ge \frac{4n}{90}\left[\frac{4n}{5(10D)^r} - 1\right],$$
which implies that
$$c \ge \frac{16}{450\,(10D)^r}\left(1 - \frac{5(10D)^r}{4n}\right) \ge \frac{1}{75\,(10D)^r}$$
for $D \le (n/2)^{1/r}/10$. This completes the proof.

The above estimate has the correct dependence for $r = 1$. For instance, consider the lexicographic product of a path $P_D$ and a clique $K_{n/D}$ (i.e., a graph with $D$ cliques in a line, and complete bipartite graphs between neighboring cliques). This graph has diameter $D$, and the layout in which the "vertices" (each corresponding to a copy of $K_{n/D}$) of $P_D$ lie exactly at the integer values $[D]$ has objective value $\frac{n}{2}\big(\frac{n}{D} - 1\big)$. This estimate is almost certainly not tight for dimensions $r > 1$, as there is no higher dimensional analogue of the path (i.e., a graph with $O(D^r)$ vertices and diameter $D$ that embeds isometrically in $\mathbb{R}^r$).

Next, we provide a proof of Lemma 15, which upper bounds the diameter of any optimal layout of a diameter $D$ graph. However, before we proceed with the proof, we first prove an estimate for the concentration of points $x_i$ at some distance from the marginal median in any optimal layout. The marginal median $\mu \in \mathbb{R}^r$ of a set of points $x_1,\dots,x_n \in \mathbb{R}^r$ is the vector whose $i^{th}$ component $\mu(i)$ is the univariate median of $x_1(i),\dots,x_n(i)$ (with the convention that, if $n$ is even, the univariate median is given by the mean of the $(n/2)^{th}$ and $(n/2+1)^{th}$ ordered values). The below result, by itself, is not strong enough to prove our desired diameter estimate, but serves as a key ingredient in the proof of Lemma 15.
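The marginal median just defined is computed componentwise; a short Python sketch (function name ours):

```python
def marginal_median(points):
    # componentwise median; for even n, average the (n/2)-th and (n/2+1)-th
    # order statistics, matching the convention in the text
    n, r = len(points), len(points[0])
    med = []
    for c in range(r):
        vals = sorted(p[c] for p in points)
        if n % 2 == 1:
            med.append(vals[n // 2])
        else:
            med.append(0.5 * (vals[n // 2 - 1] + vals[n // 2]))
    return tuple(med)

print(marginal_median([(0, 0), (1, 2), (2, 1)]))  # (1, 1)
```

Unlike the geometric (spatial) median, the marginal median is coordinate-dependent, which is exactly what the coordinate-by-coordinate argument in the proofs below exploits.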

Lemma 14. Let $G = ([n], E)$ have diameter $D$. Then, for any optimal layout $x \in \mathbb{R}^{rn}$, i.e., such that $E(x) \le E(y)$ for all $y \in \mathbb{R}^{rn}$,
$$\big|\{\, i \in [n] \;|\; \|\mu - x_i\|_\infty \ge (C+k)D \,\}\big| \;\le\; 2rn\,C^{-\sqrt{2^k}}$$
for all $C > 0$ and $k \in \mathbb{N}$, where $\mu$ is the marginal median of $x_1,\dots,x_n$.

Proof. Let $G = ([n], E)$ have diameter $D$, and let $x$ be an optimal layout. Without loss of generality, we order the vertices so that $x_1(1) \le \cdots \le x_n(1)$, and shift $x$ so that $x_{\lfloor n/2\rfloor}(1) = 0$. Next, we fix some $C > 0$, and define the subsets $S_0 := \big[\lfloor n/2\rfloor\big]$ and
$$S_k := \big\{\, i \in [n] \;|\; x_i(1) \in \big[(C+k)D,\,(C+k+1)D\big) \,\big\}, \qquad T_k := \big\{\, i \in [n] \;|\; x_i(1) \ge (C+k)D \,\big\},$$
$k \in \mathbb{N}$. Our primary goal is to estimate the quantity $|T_k|$, for an arbitrary $k$, from which the desired result will follow quickly.

The objective value $E(x)$ is at most $\binom{n}{2}$; otherwise we could replace $x$ by a layout with all $x_i$ equal. To obtain a first estimate on $|T_k|$, we consider a crude lower bound on $E_{S_0,T_k}(x)$. We have
$$E_{S_0,T_k}(x) \ge |S_0||T_k|\left(\frac{(C+k)D}{D} - 1\right)^2 = (C+k-1)^2 \lfloor n/2\rfloor\, |T_k|,$$
and therefore
$$|T_k| \le \binom{n}{2}\frac{1}{(C+k-1)^2\lfloor n/2\rfloor} \le \frac{n}{(C+k-1)^2}.$$

From here, we aim to prove the following claim
$$|T_k| \le \frac{n}{\big(C+k-(2\ell-1)\big)^{2^\ell}}, \qquad k,\ell \in \mathbb{N}, \;\; k \ge 2\ell - 1, \tag{4.6}$$
by induction on $\ell$. The above estimate serves as the base case for $\ell = 1$. Assuming the above statement holds for some fixed $\ell$, we aim to prove that it holds for $\ell + 1$.

To do so, we consider the effect of collapsing the set of points in $S_k$, $k \ge 2\ell$, into a hyperplane with a fixed first coordinate. In particular, consider the alternate layout $x'$ given by
$$x'_i(j) = \begin{cases} (C+k)D & j = 1,\; i \in S_k, \\ x_i(j) - D & j = 1,\; i \in T_{k+1}, \\ x_i(j) & \text{otherwise}. \end{cases}$$

For this new layout, we have
$$E_{S_{k-1}\cup S_k\cup S_{k+1}}(x') \le E_{S_{k-1}\cup S_k\cup S_{k+1}}(x) + \frac{|S_k|^2}{2} + \big(|S_{k-1}| + |S_{k+1}|\big)|S_k|$$
and
$$E_{S_0,T_{k+1}}(x') \le E_{S_0,T_{k+1}}(x) - |S_0||T_{k+1}|\Big(\big((C+k+1)-1\big)^2 - \big((C+k)-1\big)^2\Big) = E_{S_0,T_{k+1}}(x) - (2C+2k-1)\lfloor n/2\rfloor\,|T_{k+1}|.$$
Combining these estimates, we note that this layout has objective value bounded above by
$$E(x') \le E(x) + \big(|S_{k-1}| + |S_k|/2 + |S_{k+1}|\big)|S_k| - (2C+2k-1)\lfloor n/2\rfloor\,|T_{k+1}| \le E(x) + |T_{k-1}||T_k| - (2C+2k-1)\lfloor n/2\rfloor\,|T_{k+1}|,$$

and so, by the optimality of $x$,
$$|T_{k+1}| \le \frac{|T_{k-1}||T_k|}{(2C+2k-1)\lfloor n/2\rfloor} \le \frac{n^2}{\big(C+(k-1)-(2\ell-1)\big)^{2^\ell}\big(C+k-(2\ell-1)\big)^{2^\ell}(2C+2k-1)\lfloor n/2\rfloor}$$
$$\le \frac{n}{2\lfloor n/2\rfloor}\cdot\frac{n}{\big(C+(k-1)-(2\ell-1)\big)^{2^{\ell+1}}} \le \frac{n}{\big(C+(k+1)-(2(\ell+1)-1)\big)^{2^{\ell+1}}}$$
for all $k+1 \ge 2(\ell+1)-1$ and $C+k \le n+1$. If $C+k > n+1$, then
$$|T_{k+1}| \le \frac{|T_{k-1}||T_k|}{(2C+2k-1)\lfloor n/2\rfloor} \le \frac{|T_{k-1}||T_k|}{(2n+1)\lfloor n/2\rfloor} < 1,$$
and so $|T_{k+1}| = 0$. This completes the proof of claim (4.6), and implies that $|T_k| \le n\,C^{-\sqrt{2^k}}$.

Repeating this argument, with indices reversed, and also for the remaining dimensions $j = 2,\dots,r$, leads to the desired result
$$\big|\{\, i \in [n] \;|\; \|\mu - x_i\|_\infty \ge (C+k)D \,\}\big| \;\le\; 2rn\,C^{-\sqrt{2^k}}$$
for all $k \in \mathbb{N}$ and $C > 0$.

Using the estimates in the proof of Lemma 14 (in particular, the bound $|T_k| \le n\,C^{-\sqrt{2^k}}$), we are now prepared to prove the following diameter bound.

Lemma 15. Let $G = ([n], E)$ have diameter $D$. Then, for any optimal layout $x \in \mathbb{R}^{rn}$, i.e., such that $E(x) \le E(y)$ for all $y \in \mathbb{R}^{rn}$,
$$\|x_i - x_j\| \le 8D + 4D \log_2\log_2 2D$$
for all $i,j \in [n]$.

Proof. Let $G = ([n], E)$ have diameter $D$, and let $x$ be an optimal layout. Without loss of generality, suppose that the largest distance between any two points of $x$ is realized between two points lying in $\mathrm{span}\{e_1\}$ (i.e., on the axis of the first dimension). In addition, we order the vertices so that $x_1(1) \le \cdots \le x_n(1)$ and translate $x$ so that $x_{\lfloor n/2\rfloor}(1) = 0$.

Let $x_1 = -\alpha e_1$, and suppose that $\alpha \ge 5D$. If this is not the case, then we are done. By differentiating $E$ with respect to $x_1(1)$, we obtain
$$\frac{\partial E}{\partial x_1(1)} = -\sum_{j=2}^{n}\left(\frac{\|x_j - x_1\|}{d(1,j)} - 1\right)\frac{x_j(1) + \alpha}{d(1,j)\,\|x_j - x_1\|} = 0.$$

Let
$$T_1 := \{\, j \in [n] : \|x_j - x_1\| \ge d(1,j) \,\}, \qquad T_2 := \{\, j \in [n] : \|x_j - x_1\| < d(1,j) \,\},$$
and compare the sum of the terms in the above equation corresponding to $T_1$ and $T_2$, respectively. We begin with the former. Note that for all $j \ge \lfloor n/2\rfloor$ we must have $j \in T_1$, because $\|x_j - x_1\| \ge \alpha \ge 5D > d(1,j)$. Therefore we have

โˆ‘๐‘—โˆˆ๐‘‡1

(โ€–๐‘— โˆ’ 1โ€–๐‘‘(1, ๐‘—)

โˆ’ 1

)๐‘—(1) + ๐›ผ

๐‘‘(1, ๐‘—)โ€–๐‘— โˆ’ 1โ€–โ‰ฅ

๐‘›โˆ‘๐‘—=โŒŠ๐‘›/2โŒ‹

(โ€–๐‘— โˆ’ 1โ€–๐‘‘(1, ๐‘—)

โˆ’ 1

)๐‘—(1) + ๐›ผ

๐‘‘(1, ๐‘—)โ€–๐‘— โˆ’ 1โ€–

โ‰ฅ โŒˆ๐‘›/2โŒ‰(โ€–๐‘— โˆ’ 1โ€–

๐ทโˆ’ 1

)๐›ผ

๐ทโ€–๐‘— โˆ’ 1โ€–

โ‰ฅ โŒˆ๐‘›/2โŒ‰(๐›ผโˆ’๐ท)

๐ท2.

where in the last inequality we used โ€–๐‘ฅ๐‘— โˆ’ ๐‘ฅ1โ€– โ‰ฅ ๐›ผ and โ€–๐‘ฅ๐‘— โˆ’ ๐‘ฅ1โ€–/๐ท โˆ’ 1 โ‰ฅ ๐›ผ/๐ท โˆ’ 1.

Next, we estimate the sum of terms corresponding to $T_2$. Let
$$T_3 := \{\, i \in [n] : x_i(1) + \alpha \le D \,\}$$
and note that $T_2 \subset T_3$. We have
$$\sum_{j\in T_2}\left(1 - \frac{\|x_j - x_1\|}{d(1,j)}\right)\frac{x_j(1)+\alpha}{d(1,j)\,\|x_j - x_1\|} \;\le\; \sum_{j\in T_3}\left|1 - \frac{\|x_j - x_1\|}{d(1,j)}\right|_+ \frac{x_j(1)+\alpha}{d(1,j)\,\|x_j - x_1\|}$$
$$\le\; \sum_{j\in T_3}\left|1 - \frac{\|x_j - x_1\|}{d(1,j)}\right|_+ \frac{1}{d(1,j)} \;\le\; \sum_{j\in T_3}\frac{1}{d(1,j)} \;\le\; |T_3|,$$
where $|\cdot|_+ := \max\{\cdot,\, 0\}$. Combining these estimates, we have

|๐‘‡3| โˆ’โŒˆ๐‘›/2โŒ‰(๐›ผโˆ’๐ท)

๐ท2โ‰ฅ 0,

137

or equivalently,

๐›ผ โ‰ค ๐ท +|๐‘‡3|๐ท2

โŒˆ๐‘›/2โŒ‰.

By the estimate in the proof of Lemma 14, |T_3| ≤ n C^{−√(2^k)} for any k ∈ N and C > 0 satisfying (C + k)D ≤ α − D. Taking C = 2 and k = ⌊α/D − 3⌋ > 2 log_2 log_2(2D), we have

    α ≤ D + 2D^2 · 2^{−√(2^k)} ≤ D + 2D^2/(2D) = 2D,

a contradiction. Therefore, α is at most 4D + 2D log_2 log_2 2D. Repeating this argument with the order of the indices reversed completes the proof.

While the above estimate is sufficient for our purposes, we conjecture that it is not tight, and that the diameter of an optimal layout of a diameter-D graph is always at most 2D.

4.3.2 Algorithmic Lower Bounds

In this subsection, we provide a proof of Theorem 16, the algorithmic lower bound of this

section. Our proof is based on a reduction from a version of Max All-Equal 3SAT. The

Max All-Equal 3SAT decision problem asks whether, given variables t_1, . . . , t_ℓ, clauses C_1, . . . , C_m ⊂ { t_1, . . . , t_ℓ, t̄_1, . . . , t̄_ℓ }, each consisting of at most three literals (variables or their negations), and some value L, there exists an assignment of the variables such that at least

๐ฟ clauses have all literals equal. The Max All-Equal 3SAT decision problem is known to be

APX-hard, as it does not satisfy the conditions of the Max CSP classification theorem for a

polynomial time optimizable Max CSP [58] (setting all variables true or all variables false

does not satisfy all clauses, and all clauses cannot be written in disjunctive normal form as

two terms, one with all unnegated variables and one with all negated variables).

However, we require a much more restrictive version of this problem. In particular, we

require a version in which all clauses have exactly three literals, no literal appears in a clause

more than once, the number of copies of a clause is equal to the number of copies of its

complement (defined as the negation of all its elements), and each literal appears in exactly


๐‘˜ clauses. We refer to this restricted version as Balanced Max All-Equal EU3SAT-E๐‘˜. This

is indeed still APX-hard, even for ๐‘˜ = 6. The proof adds little to the understanding of the

hardness of the Kamada-Kawai objective, and for this reason is omitted. The proof can be

found in [31].

Lemma 16. Balanced Max All-Equal EU3SAT-E6 is APX-hard.

Reduction. Suppose we have an instance of Balanced Max All-Equal EU3SAT-Ek with variables t_1, . . . , t_ℓ and clauses C_1, . . . , C_{2m}. Let ℒ = { t_1, . . . , t_ℓ, t̄_1, . . . , t̄_ℓ } be the set of literals and 𝒞 = { C_1, . . . , C_{2m} } be the multiset of clauses. Consider the graph G = (V, E), with

    V = { v_i }_{i∈[N_v]} ∪ { t_i }_{t∈ℒ, i∈[N_t]} ∪ { C_i }_{C∈𝒞, i∈[N_c]},

    E = V^{(2)} ∖ [ { (t̄_i, t_j) }_{t∈ℒ; i,j∈[N_t]} ∪ { (t_i, C_j) }_{t∈C, C∈𝒞; i∈[N_t], j∈[N_c]} ],

where V^{(2)} := { U ⊂ V : |U| = 2 }, N_v = N_c^{20}, N_t = N_c^2, and N_c = 2000 ℓ^{20} m^{20}. The graph G

consists of a central clique of size ๐‘๐‘ฃ, a clique of size ๐‘๐‘ก for each literal, and a clique of

size ๐‘๐‘ for each clause; the central clique is connected to all other cliques (via bicliques);

the clause cliques are connected to each other; each literal clique is connected to all other

literals, save for the clique corresponding to its negation; and each literal clique is connected

to the cliques of all clauses it does not appear in. Note that the distance between any two

nodes in this graph is at most two.
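The stated structure can be checked on a toy instance. The following sketch is our own construction, not part of the thesis: the clique sizes 3, 2, 2 stand in for the astronomically large N_v, N_t, N_c, and the clause list is an arbitrary placeholder. It builds the graph by the adjacency rules above and verifies that every pair of vertices is at distance at most two.

```python
# Toy instantiation of the reduction graph (tiny stand-in sizes for N_v, N_t, N_c).
from itertools import combinations

lits = ["t1", "t2", "~t1", "~t2"]
clauses = [("t1", "t2", "~t1"), ("~t1", "~t2", "t1")]  # arbitrary toy clauses
Nv, Nt, Nc = 3, 2, 2  # hypothetical small clique sizes

V = [("v", i) for i in range(Nv)]
V += [(t, i) for t in lits for i in range(Nt)]
V += [("C", k, i) for k in range(len(clauses)) for i in range(Nc)]

def neg(t):
    return t[1:] if t.startswith("~") else "~" + t

def adjacent(a, b):
    # non-edges: a literal clique vs. its negation's clique,
    # and a literal clique vs. any clause clique containing that literal
    if a[0] in lits and b[0] in lits and neg(a[0]) == b[0]:
        return False
    for x, y in ((a, b), (b, a)):
        if x[0] in lits and y[0] == "C" and x[0] in clauses[y[1]]:
            return False
    return True

adj = {u: {w for w in V if w != u and adjacent(u, w)} for u in V}
# distance is 1 for adjacent pairs, else 2 when a common neighbor exists
diam = max(1 if w in adj[u] else (2 if adj[u] & adj[w] else 99)
           for u, w in combinations(V, 2))
assert diam == 2  # every non-adjacent pair shares the anchor clique
print("diameter of the toy reduction graph:", diam)
```

The anchor clique is adjacent to everything, which is what forces the diameter down to two regardless of the clause structure.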

Our goal is to show that, for some fixed L = L(ℓ, m, n, p) ∈ N, if our instance of Balanced Max All-Equal EU3SAT-Ek can satisfy 2p clauses, then there exists a layout x with E(x) ≤ L, and if it can satisfy at most 2p − 2 clauses, then every layout x′ has objective value E(x′) ≥ L + 1. To this end, we propose a layout and calculate its objective function up to lower-order terms. Later we will show that, for any layout of an instance that satisfies at most 2p − 2 clauses, the objective value is strictly higher than that of our proposed construction for the previous case. The anticipated difference in objective value is of order Θ(N_t N_c), and so we will attempt to correctly place each literal vertex up to accuracy o(N_c/(ℓn)) and each clause vertex up to accuracy o(N_t/(mn)).

The general idea of this reduction is that the clique { v_i } serves as an "anchor" of sorts that forces all other vertices to be almost exactly at the correct distance from its center. Without loss of generality, assume this anchor clique is centered at 0. Roughly speaking, this forces each literal clique to lie at roughly either −1 or +1, and the distance between negations forces negations to be on opposite sides, i.e., t̄_i ≈ −t_i. Clause cliques also lie at roughly either −1 or +1, and the distance to literals forces each clause to be opposite the side containing the majority of its literals, i.e., clause C = { t_{i_1}, t_{i_2}, t_{i_3} } lies at C_i ≈ −χ{ t_{i_1} + t_{i_2} + t_{i_3} ≥ 0 }, where χ is the indicator variable. The optimal embedding of G, i.e., the location of the variable cliques at either +1 or −1, corresponds to an optimal assignment for the Max All-Equal 3SAT instance.

Remark 1 (Comparison to [19]). As mentioned in the Introduction, the reduction here is

significantly more involved than the hardness proof for a related problem in [19]. At a high

level, the key difference is that in [19] they were able to use a large number of distance-zero

vertices to create a simple structure around the origin. This is no longer possible in our

setting (in particular, with bounded diameter graph metrics), which results in graph layouts

with much less structure. For this reason, we require a graph that exhibits as much structure

as possible. To this end, a reduction from Max All-Equal 3SAT using both literals and

clauses in the graph is a much more suitable technique than a reduction from Not All-Equal

3SAT using only literals. In fact, it is not at all obvious that the same approach in [19],

applied to unweighted graphs, would lead to a computationally hard instance.

Structure of optimal layout. Let x be a globally optimal layout of G, and let us label the vertices of G based on their value in x, i.e., such that x_1 ≤ · · · ≤ x_n. In addition, we define

    n̂ := n − N_v = 2ℓN_t + 2mN_c.

Without loss of generality, we assume that ∑_i x_i = 0. We consider the first-order conditions for an arbitrary vertex i. Let S := { (i, j) | i < j, d(i, j) = 2 }, S_i := { j ∈ [n] | d(i, j) = 2 }, and

    S_i^< := { j < i | d(i, j) = 2 },                S_i^> := { j > i | d(i, j) = 2 },
    S_i^{<,+} := { j < i | d(i, j) = 2, x_j ≥ 0 },   S_i^{>,+} := { j > i | d(i, j) = 2, x_j ≥ 0 },
    S_i^{<,−} := { j < i | d(i, j) = 2, x_j < 0 },   S_i^{>,−} := { j > i | d(i, j) = 2, x_j < 0 }.

We have

    ∂E/∂x_i = 2 ∑_{j<i} (x_i − x_j − 1) − 2 ∑_{j>i} (x_j − x_i − 1) − 2 ∑_{j∈S_i^<} (x_i − x_j − 1)
              + 2 ∑_{j∈S_i^>} (x_j − x_i − 1) + (1/2) ∑_{j∈S_i^<} (x_i − x_j − 2) − (1/2) ∑_{j∈S_i^>} (x_j − x_i − 2)

            = ( 2n − (3/2)|S_i| ) x_i + 2(n + 1 − 2i) + (1/2) [ ∑_{j∈S_i^<} (3x_j + 2) + ∑_{j∈S_i^>} (3x_j − 2) ]

            = ( 2n − (3/2)|S_i| ) x_i + 2(n + 1 − 2i)
              + (1/2) ( |S_i^{>,+}| − 5|S_i^{>,−}| + 5|S_i^{<,+}| − |S_i^{<,−}| ) + (3/2) C_i = 0,

where

    C_i := ∑_{j∈S_i^{<,−} ∪ S_i^{>,−}} (x_j + 1) + ∑_{j∈S_i^{<,+} ∪ S_i^{>,+}} (x_j − 1).

The i-th vertex therefore has location

    x_i = (2i − (n + 1)) / ( n − (3/4)|S_i| ) − ( |S_i^{>,+}| − 5|S_i^{>,−}| + 5|S_i^{<,+}| − |S_i^{<,−}| ) / ( 4n − 3|S_i| ) − 3C_i / ( 4n − 3|S_i| ).    (4.7)

By Lemma 15,

    |C_i| ≤ |S_i| · max_{j∈[n]} max{ |x_j − 1|, |x_j + 1| } ≤ 75 N_t,

which implies that |x_i| ≤ 1 + 1/N_t^5 for all i ∈ [n].


Good layout for positive instances. Next, we will formally describe a layout that we

will then show to be nearly optimal (up to lower-order terms). Before giving the formal

description, we describe the layoutโ€™s structure at a high level first. Given some assignment

of variables that corresponds to 2p clauses being satisfied, for each literal t, we place the clique { t_i }_{i∈[N_t]} at roughly +1 if the corresponding literal t is true; otherwise we place the clique at roughly −1. For each clause C, we place the corresponding clique { C_i }_{i∈[N_c]} at roughly +1 if the majority of its literals are near −1; otherwise we place it at roughly −1. The anchor clique { v_i }_{i∈[N_v]} lies in the middle of the interval [−1, +1], separating the literal

and clause cliques on both sides.

The order of the vertices, from most negative to most positive, consists of

    T_1, T_2, T_3^0, . . . , T_3^k, T_4, T_5^k, . . . , T_5^0, T_6, T_7,

where

    T_1 = { clause cliques near −1 with all corresponding literal cliques near +1 },
    T_2 = { clause cliques near −1 with two corresponding literal cliques near +1 },
    T_3^φ = { literal cliques near −1 with φ corresponding clauses near −1 },  φ = 0, . . . , k,
    T_4 = { the anchor clique },
    T_5^φ = { literal cliques near +1 with φ corresponding clauses near +1 },  φ = 0, . . . , k,
    T_6 = { clause cliques near +1 with two corresponding literal cliques near −1 },
    T_7 = { clause cliques near +1 with all corresponding literal cliques near −1 },

T_3 := ⋃_{φ=0}^{k} T_3^φ, T_5 := ⋃_{φ=0}^{k} T_5^φ, T_c := T_1 ∪ T_2 ∪ T_6 ∪ T_7, and T_t := T_3 ∪ T_5. Let us define y_i := (2i − (n + 1))/n, i.e., the optimal layout of the clique K_n in one dimension. We can write our proposed optimal layout x as a perturbation of y.

Using the above Equation (4.7) for x_i, we obtain x_i = y_i for i ∈ T_4, and, by ignoring both the contribution of C_i and o(1/n) terms, we obtain

    x_i = y_i − 3N_t/n,      i ∈ T_1,          x_i = y_i + 3N_t/n,      i ∈ T_7,
    x_i = y_i − 3N_t/(2n),   i ∈ T_2,          x_i = y_i + 3N_t/(2n),   i ∈ T_6,
    x_i = y_i − (N_t + (k − φ/2)N_c)/n,  i ∈ T_3^φ,    x_i = y_i + (N_t + (k − φ/2)N_c)/n,  i ∈ T_5^φ.

Next, we upper bound the objective value of x. We proceed in steps, estimating the different components up to o(N_c N_t). We recall the useful formulas

    ∑_{i=1}^{r−1} ∑_{j=i+1}^{r} ( 2(j − i)/n − 1 )^2 = (r − 1) r ( 3n^2 − 4n(r + 1) + 2r(r + 1) ) / (6n^2),    (4.8)

and

    ∑_{i=1}^{r} ∑_{j=1}^{s} ( (2(j − i) + q)/n − 1 )^2
        = ( rs / (3n^2) ) ( 3n^2 + 3q^2 + 4r^2 + 4s^2 − 6nq + 6nr − 6ns − 6qr + 6qs − 6rs − 2 )
        ≤ r s^3 / (3n^2) + ( 13rs / (3n^2) ) max{ (n − s)^2, (r − q)^2, r^2 }.    (4.9)

For ๐‘– โˆˆ ๐‘‡๐‘, |1โˆ’ |๐‘–|| โ‰ค 4๐‘๐‘ก/๐‘›, and so

๐ธ๐‘‡๐‘() โ‰ค |๐‘‡๐‘|2 max๐‘–,๐‘—โˆˆ๐‘‡๐‘

(|๐‘– โˆ’ ๐‘—| โˆ’ 1)2 โ‰ค 8๐‘š2๐‘2๐‘ .

In addition, for ๐‘– โˆˆ ๐‘‡๐‘ก, |1โˆ’ |๐‘–|| โ‰ค 3โ„“๐‘๐‘ก/๐‘›, and so

๐ธ๐‘‡๐‘ก() โ‰ค |๐‘–, ๐‘— โˆˆ ๐‘‡๐‘ก | ๐‘‘(๐‘–, ๐‘—) = 1|(

1 +6โ„“๐‘๐‘ก

๐‘›

)2

+ |๐‘–, ๐‘— โˆˆ ๐‘‡๐‘ก | ๐‘‘(๐‘–, ๐‘—) = 2| 1

4

(6โ„“๐‘๐‘ก

๐‘›

)2

โ‰ค (2โ„“โˆ’ 1)โ„“๐‘2๐‘ก + 40

โ„“3๐‘3๐‘ก

๐‘›

143

and

    E_{T_c,T_t}(x) ≤ ( 1 + 6ℓN_t/n )^2 |{ i ∈ T_c, j ∈ T_t | d(i, j) = 1 }|
                   + 2 × (1/4) ( 2 + 6ℓN_t/n )^2 |{ i ∈ T_2, j ∈ T_3 | d(i, j) = 2 }|
                   + 2 × (1/4) ( 6ℓN_t/n )^2 |{ i ∈ T_1 ∪ T_2, j ∈ T_5 | d(i, j) = 2 }|

                 ≤ ( 1 + 18ℓN_t/n ) (2ℓ − 3) N_t · 2mN_c
                   + (1/2) ( 4 + 30ℓN_t/n ) (m − p) N_c N_t
                   + (1/2) ( 6ℓN_t/n ) (2m + p) N_c N_t

                 ≤ (4ℓ − 6) m N_t N_c + 2(m − p) N_t N_c + 100 m ℓ^2 N_t^2 N_c / n.

The quantity ๐ธ๐‘‡4() is given by (4.8) with ๐‘Ÿ = ๐‘๐‘ฃ, and so

๐ธ๐‘‡4() =๐‘๐‘ฃ(๐‘๐‘ฃ โˆ’ 1)(3๐‘›2 โˆ’ 4๐‘›(๐‘๐‘ฃ + 1) + 2๐‘๐‘ฃ(๐‘๐‘ฃ + 1))

6๐‘›2

โ‰ค (๐‘›โˆ’ 1)(๐‘›โˆ’ 2)โˆ’ 2๐‘› + 32 + 3

6.

The quantity ๐ธ๐‘‡1,๐‘‡4() is given by (4.9) with

๐‘ž = 3๐‘๐‘ก + /2, ๐‘Ÿ = ๐‘๐‘๐‘, and ๐‘  = ๐‘๐‘ฃ,

and so

๐ธ๐‘‡1,๐‘‡4() โ‰ค ๐‘๐‘๐‘๐‘3๐‘ฃ

3๐‘›2+

13๐‘๐‘๐‘๐‘๐‘ฃ2

3๐‘›2โ‰ค ๐‘๐‘๐‘

3(๐‘›โˆ’ 3) +

16๐‘๐‘๐‘2

3๐‘›.

Similarly, the quantity E_{T_2,T_4}(x) is given by (4.9) with

    q = (3/2)N_t + n̂/2 − pN_c,  r = (m − p)N_c,  and  s = N_v,

and so

    E_{T_2,T_4}(x) ≤ (m − p) N_c N_v^3 / (3n^2) + 13 (m − p) N_c N_v n̂^2 / (3n^2)
                  ≤ ( (m − p) N_c / 3 )( n − 3n̂ ) + 16 (m − p) N_c n̂^2 / (3n).

Next, we estimate E_{T_3^φ,T_4}(x). Using formula (4.9) with

    q = N_t + (k − φ/2) N_c + ∑_{j=φ}^{k} ℓ_j,  r = ℓ_φ,  and  s = N_v,

where ℓ_φ := |T_3^φ|, we have

    E_{T_3^φ,T_4}(x) ≤ ℓ_φ N_v^3 / (3n^2) + 13 ℓ_φ N_v n̂^2 / (3n^2) ≤ ( ℓ_φ / 3 )( n − 3n̂ ) + 16 ℓ_φ n̂^2 / (3n).

Combining the individual estimates for each T_3^φ gives us

    E_{T_3,T_4}(x) ≤ ( ℓ N_t / 3 )( n − 3n̂ ) + 16 ℓ N_t n̂^2 / (3n).

Finally, we can combine all of the above estimates to obtain an upper bound on E(x). We have

    E(x) ≤ (2ℓ − 1) ℓ N_t^2 + (4ℓ − 6) m N_t N_c + 2(m − p) N_t N_c + (n − 1)(n − 2)/6 − n̂n/3 + n̂^2/2 + n̂/2
           + (2/3) m N_c ( n − 3n̂ ) + (2/3) ℓ N_t ( n − 3n̂ ) + 200 m^2 N_c^2

         = (n − 1)(n − 2)/6 − n̂^2/2 + n̂/2 + (2ℓ − 1) ℓ N_t^2 + [ (4ℓ − 6)m + 2(m − p) ] N_t N_c + 200 m^2 N_c^2

         ≤ (n − 1)(n − 2)/6 − ℓ N_t^2 − 2(2m + p) N_t N_c + 200 m^2 N_c^2.

We define the ceiling of this final upper bound to be the quantity ๐ฟ. The remainder of

the proof consists of showing that if our given instance satisfies at most 2๐‘โˆ’ 2 clauses, then

any layout has objective value at least ๐ฟ + 1.


Suppose, to the contrary, that there exists some layout x′ (shifted so that ∑_i x′_i = 0) with E(x′) < L + 1. From the analysis above, |x′_i| ≤ 1 + 1/N_t^5 for all i. Intuitively, an optimal layout should have a large fraction of the vertex pairs at distance two placed on opposite sides. To make this intuition precise, we first note that

Lemma 17. Let x ∈ R^n be a layout of the clique K_n. Then E(x) ≥ (n − 1)(n − 2)/6.

Proof. The first-order conditions (4.7) imply that the optimal layout (up to translation and vertex reordering) is given by x′_i = (2i − (n + 1))/n. By (4.8), E(x′) = (n − 1)(n − 2)/6.

Using Lemma 17, we can lower bound E(x′) by

    E(x′) = ∑_{i<j} (x′_j − x′_i − 1)^2 − ∑_{(i,j)∈S} (x′_j − x′_i − 1)^2 + (1/4) ∑_{(i,j)∈S} (x′_j − x′_i − 2)^2

          ≥ (n − 1)(n − 2)/6 − ∑_{(i,j)∈S} [ (3/4)(x′_j − x′_i)^2 − (x′_j − x′_i) ].

Therefore, by assumption,

    ∑_{(i,j)∈S} [ (3/4)(x′_j − x′_i)^2 − (x′_j − x′_i) ] ≥ (n − 1)(n − 2)/6 − (L + 1)
                                                        ≥ ℓ N_t^2 + 2(2m + p) N_t N_c − 200 m^2 N_c^2 − 2.

We note that the function (3/4)x^2 − x equals one at x = 2 and is nonpositive for x ∈ [0, 4/3). Because |x′_i| ≤ 1 + 1/N_t^5 for all i,

    max_{(i,j)∈S} [ (3/4)(x′_j − x′_i)^2 − (x′_j − x′_i) ] ≤ (3/4) ( 2 + 2/N_t^5 )^2 − ( 2 + 2/N_t^5 ) ≤ 1 + 7/N_t^5.

Let

    T′ := { (i, j) ∈ S | x′_i ≤ −1/6 and x′_j ≥ 1/6 }.

By assumption, |T′| is at most ℓ N_t^2 + 2(2m + p − 1) N_t N_c, as otherwise the corresponding instance could satisfy at least 2p clauses, a contradiction.


However, the quantity (3/4)(x′_j − x′_i)^2 − (x′_j − x′_i) is nonpositive for all (i, j) ∈ S ∖ T′. Therefore,

    ( 1 + 7/N_t^5 ) |T′| ≥ ℓ N_t^2 + 2(2m + p) N_t N_c − 200 m^2 N_c^2 − 2,

which implies that

    |T′| ≥ ( 1 − 7/N_t^5 ) ( ℓ N_t^2 + 2(2m + p) N_t N_c − 200 m^2 N_c^2 − 2 )
         ≥ ℓ N_t^2 + 2(2m + p) N_t N_c − 200 m^2 N_c^2 − 2000
         > ℓ N_t^2 + 2(2m + p − 1) N_t N_c + N_t N_c,

a contradiction. This completes the proof, with a gap of one.

4.3.3 An Approximation Algorithm

In this subsection, we formally describe an approximation algorithm using tools from the

Dense CSP literature, and prove theoretical guarantees for the algorithm.

Preliminaries: Greedy Algorithms for Max-CSP

A long line of work studies the feasibility of solving the Max-CSP problem under various

related pseudorandomness and density assumptions. In our case, an algorithm with mild

dependence on the alphabet size is extremely important. A very simple greedy approach,

proposed and analyzed by Mathieu and Schudy [81, 101] (see also [133]), satisfies this

requirement.

Theorem 18 ([81, 101]). Suppose that Σ is a finite alphabet, n ≥ 1 is a positive integer, and for every pair {i, j} ∈ ([n] choose 2) we have a function f_ij : Σ × Σ → [−M, M]. Then for any ε > 0, Algorithm GREEDYCSP with t_0 = O(1/ε^2) runs in time n^2 |Σ|^{O(1/ε^2)} and returns x_1, . . . , x_n ∈ Σ such that

    E ∑_{i≠j} f_ij(x_i, x_j) ≥ ∑_{i≠j} f_ij(x*_i, x*_j) − ε M n^2

for any x*_1, . . . , x*_n ∈ Σ, where E denotes the expectation over the randomness of the algorithm.

Algorithm 4 Greedy Algorithm for Dense CSPs [81, 101]

function GreedyCSP(Σ, n, t_0, {f_ij})
    Shuffle the order of the variables x_1, . . . , x_n by a random permutation.
    for all assignments (x_1, . . . , x_{t_0}) ∈ Σ^{t_0} do
        for i = t_0 + 1, . . . , n do
            Choose x_i ∈ Σ to maximize ∑_{j<i} f_ji(x_j, x_i)
        end for
        Record x and its objective value ∑_{i≠j} f_ij(x_i, x_j).
    end for
    Return the assignment x found with maximum objective value.
end function
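A minimal sketch of this greedy procedure might look as follows. The function names and the toy instance are ours, not from [81, 101], and no attempt is made at the theorem's precise parameter choices:

```python
# Sketch of GreedyCSP: enumerate all assignments of the first t0 (shuffled)
# variables, greedily extend each to a full assignment, and keep the best.
import itertools
import random

def greedy_csp(sigma, n, t0, f):
    """f(i, j, a, b) plays the role of f_ij(a, b); returns (assignment, value)."""
    order = list(range(n))
    random.shuffle(order)
    best_x, best_val = None, float("-inf")
    for prefix in itertools.product(sigma, repeat=min(t0, n)):
        x = {order[k]: prefix[k] for k in range(len(prefix))}
        for k in range(len(prefix), n):
            i = order[k]
            # greedily choose the value maximizing agreement with earlier variables
            x[i] = max(sigma, key=lambda a: sum(
                f(order[j], i, x[order[j]], a) for j in range(k)))
        val = sum(f(i, j, x[i], x[j]) for i in range(n) for j in range(n) if i != j)
        if val > best_val:
            best_val, best_x = val, [x[i] for i in range(n)]
    return best_x, best_val

# toy instance: reward equal values, so any constant assignment is optimal
random.seed(1)
f = lambda i, j, a, b: 1.0 if a == b else 0.0
x, val = greedy_csp([0, 1], 8, 2, f)
assert val == 8 * 7  # all ordered pairs agree
```

The point of the prefix enumeration is exactly the theorem's trade-off: the cost is exponential only in t_0 = O(1/ε^2), not in n.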

In comparison, we note that computing the maximizer using brute force would run in time

|ฮฃ|๐‘›, i.e. exponentially slower in terms of ๐‘›. This guarantee is stated in expectation but,

if desired, can be converted to a high probability guarantee by using Markovโ€™s inequality

and repeating the algorithm multiple times. We use GREEDYCSP to solve a minimization

problem instead of maximization, which corresponds to negating all of the functions ๐‘“๐‘–๐‘— . To

do so, we require estimates on the error due to discretization of our domain.

Lemma 18. For c, R > 0 and y ∈ R^r with ‖y‖ ≤ R, the function x ↦ (‖x − y‖/c − 1)^2 is (2/c) max{1, 2R/c}-Lipschitz on B_R = { x : ‖x‖ ≤ R }.

Proof. We first prove that the function x ↦ (x/c − 1)^2 is (2/c) max{1, R/c}-Lipschitz on the interval [0, R]. Because the derivative of the function is (2/c)(x/c − 1) and |(2/c)(x/c − 1)| ≤ (2/c) max{1, R/c} on [0, R], this result follows from the mean value theorem.


Algorithm 5 Approximation Algorithm KKSCHEME

function KKSCHEME(ε_1, ε_2, R)
    Build an ε_1-net S_{ε_1} of B_R = { x : ‖x‖ ≤ R } ⊂ R^r as in Lemma 12.
    Apply the GREEDYCSP algorithm of Theorem 18 with ε = ε_2 to approximately minimize E(x_1, . . . , x_n) over x_1, . . . , x_n ∈ S_{ε_1}^n.
    Return x_1, . . . , x_n.
end function

Because the function x ↦ ‖x − y‖ is 1-Lipschitz and ‖x − y‖ ≤ ‖x‖ + ‖y‖ ≤ 2R by the triangle inequality, the result follows from the fact that x ↦ (x/c − 1)^2 is (2/c) max{1, 2R/c}-Lipschitz on [0, 2R] and the fact that a composition of Lipschitz functions is Lipschitz.

Combining this result with Lemma 12, we can bound the loss in objective value due to restricting to a well-chosen discretization.

Lemma 19. Let x_1, . . . , x_n ∈ R^r be arbitrary vectors such that ‖x_i‖ ≤ R for all i, and let ε > 0 be arbitrary. Define S_ε to be an ε-net of B_R as in Lemma 12, so that |S_ε| ≤ (3R/ε)^r. For any input metric over [n] with min_{i,j∈[n]} d(i, j) = 1, there exist x′_1, . . . , x′_n ∈ S_ε such that

    E(x′_1, . . . , x′_n) ≤ E(x_1, . . . , x_n) + 4εRn^2,

where E is (4.5) defined with respect to an arbitrary graph with n vertices.

Proof. By Lemma 18, the energy E(x_1, . . . , x_n) is the sum of (n choose 2) ≤ n^2/2 terms which, for each i and j, are individually 4R-Lipschitz in x_i and x_j. Therefore, defining x′_i to be the closest point of S_ε to x_i for all i gives the desired result.

This result motivates a straightforward algorithm for producing nearly-optimal layouts

of a graph. Given some graph, by Lemma 15, a sufficiently large radius can be chosen so

that there exists an optimal layout in a ball of that radius. By constructing an ๐œ–-net of that

ball, and applying the GREEDYCSP algorithm to the resulting discretization, we obtain a

nearly-optimal layout with theoretical guarantees. This technique is formally described in

the algorithm KKSCHEME. We are now prepared to prove Theorem 17.


By combining Lemma 19 with Theorem 18 (used as a minimization rather than a maximization algorithm), the output x_1, . . . , x_n of KKSCHEME satisfies

    E(x_1, . . . , x_n) ≤ E(x*_1, . . . , x*_n) + 4ε_1 R n^2 + ε_2 R^2 n^2

and runs in time n^2 2^{O(1/ε_2^2) r log(3R/ε_1)}. Taking ε_2 = O(ε/R^2) and ε_1 = O(ε/R) completes the proof of Theorem 17.

The runtime can be improved to n^2 + (R/ε)^{O(dR^4/ε^2)} using a slightly more complex greedy CSP algorithm [81]. Also, by the usual argument, a high-probability guarantee can be derived by repeating the algorithm O(log(2/δ)) times, where δ > 0 is the desired failure probability.


Chapter 5

Error Estimates for the Lanczos Method

5.1 Introduction

The computation of extremal eigenvalues of matrices is one of the most fundamental

problems in numerical linear algebra. When a matrix is large and sparse, methods such

as the Jacobi eigenvalue algorithm and QR algorithm become computationally infeasible,

and, therefore, techniques that take advantage of the sparsity of the matrix are required.

Krylov subspace methods are a powerful class of techniques for approximating extremal

eigenvalues, most notably the Arnoldi iteration for non-symmetric matrices and the Lanczos

method for symmetric matrices.

The Lanczos method is a technique that, given a symmetric matrix A ∈ R^{n×n} and an initial vector b ∈ R^n, iteratively computes a tridiagonal matrix T_m ∈ R^{m×m} that satisfies T_m = Q_m^T A Q_m, where Q_m ∈ R^{n×m} is an orthonormal basis for the Krylov subspace

    𝒦_m(A, b) = span{ b, Ab, . . . , A^{m−1} b }.

The eigenvalues of T_m, denoted by λ_1^{(m)}(A, b) ≥ · · · ≥ λ_m^{(m)}(A, b), are the Rayleigh-Ritz approximations to the eigenvalues λ_1(A) ≥ · · · ≥ λ_n(A) of A on 𝒦_m(A, b), and, therefore, are given by

    λ_i^{(m)}(A, b) = min_{U ⊂ 𝒦_m(A,b), dim(U) = m+1−i}  max_{x ∈ U, x ≠ 0}  (x^T A x)/(x^T x),    i = 1, . . . , m,    (5.1)

or, equivalently,

    λ_i^{(m)}(A, b) = max_{U ⊂ 𝒦_m(A,b), dim(U) = i}  min_{x ∈ U, x ≠ 0}  (x^T A x)/(x^T x),    i = 1, . . . , m.    (5.2)

This description of the Lanczos method is sufficient for a theoretical analysis of error

(i.e., without round-off error), but, for completeness, we provide a short description of the

Lanczos method (when ๐’ฆ๐‘š(๐ด, ๐‘) is full-rank) in Algorithm 6 [113]. For a more detailed

discussion of the nuances of practical implementation and techniques to minimize the effects

of round-off error, we refer the reader to [47, Section 10.3]. If ๐ด has ๐œˆ๐‘› non-zero entries,

then Algorithm 6 outputs ๐‘‡๐‘š after approximately (2๐œˆ + 8)๐‘š๐‘› floating point operations.

A number of different techniques, such as the divide-and-conquer algorithm with the fast

multipole method [23], can easily compute the eigenvalues of a tridiagonal matrix. The

complexity of this computation is negligible compared to the Lanczos algorithm, as, in

practice, ๐‘š is typically orders of magnitude less than ๐‘›.

Equation (5.1) for the Ritz values λ_i^{(m)} illustrates the significant improvement that the Lanczos method provides over the power method for extremal eigenvalues. Whereas the power method uses only the iterate A^m b as an approximation of an eigenvector associated with the largest-magnitude eigenvalue, the Lanczos method uses the span of all of the iterates of the power method (given by 𝒦_{m+1}(A, b)). However, the analysis of the Lanczos method is

significantly more complicated than that of the power method. Error estimates for extremal

eigenvalue approximation using the Lanczos method have been well studied, most notably

by Kaniel [56], Paige [91], Saad [99], and Kuczynski and Wozniakowski [67] (other notable

work includes [30, 36, 68, 84, 92, 104, 105, 125]). The work of Kaniel, Paige, and Saad

focused on the convergence of the Lanczos method as ๐‘š increases, and, therefore, their

results have strong dependence on the spectrum of the matrix ๐ด and the choice of initial


vector ๐‘. For example, a standard result of this type is the estimate

๐œ†1(๐ด)โˆ’ ๐œ†(๐‘š)1 (๐ด, ๐‘)

๐œ†1(๐ด)โˆ’ ๐œ†๐‘›(๐ด)โ‰ค(

tanโˆ (๐‘, ๐œ™1)

๐‘‡๐‘šโˆ’1(1 + 2๐›พ)

)2

, ๐›พ =๐œ†1(๐ด)โˆ’ ๐œ†2(๐ด)

๐œ†2(๐ด)โˆ’ ๐œ†๐‘›(๐ด), (5.3)

where φ_1 is the eigenvector corresponding to λ_1, and T_{m−1} is the (m − 1)-th degree Chebyshev polynomial of the first kind [100, Theorem 6.4]. Kuczynski and Wozniakowski took a quite different approach, and estimated the maximum expected relative error (λ_1 − λ_1^{(m)})/λ_1 over all n × n symmetric positive definite matrices, resulting in error estimates that depend only on the dimension n and the number of iterations m. They produced the estimate

    sup_{A ∈ 𝒮^n_{++}}  E_{b ∼ 𝒰(S^{n−1})} [ (λ_1(A) − λ_1^{(m)}(A, b)) / λ_1(A) ] ≤ 0.103 ln^2( n(m − 1)^4 ) / (m − 1)^2,    (5.4)

for all n ≥ 8 and m ≥ 4, where 𝒮^n_{++} is the set of all n × n symmetric positive definite matrices, and the expectation is with respect to the uniform probability measure on the hypersphere S^{n−1} = { x ∈ R^n | ‖x‖ = 1 } [67, Theorem 3.2]. One can quickly verify that equation (5.4) also holds when the λ_1(A) term in the denominator is replaced by λ_1(A) − λ_n(A), and the supremum over 𝒮^n_{++} is replaced by the maximum over the set of all n × n symmetric matrices, denoted by 𝒮^n.

Both of these approaches have benefits and drawbacks. If an extremal eigenvalue is

known to have a reasonably large eigenvalue gap (based on application or construction),

then a distribution dependent estimate provides a very good approximation of error, even for

small ๐‘š. However, if the eigenvalue gap is not especially large, then distribution dependent

estimates can significantly overestimate error, and estimates that depend only on ๐‘› and ๐‘š

are preferable. This is illustrated by the following elementary, yet enlightening, example.

Example 1. Let ๐ด โˆˆ ๐‘†๐‘›++ be the tridiagonal matrix resulting from the discretization of the

Laplacian operator on an interval with Dirichlet boundary conditions, namely, ๐ด๐‘–,๐‘– = 2

for ๐‘– = 1, ..., ๐‘› and ๐ด๐‘–,๐‘–+1 = ๐ด๐‘–+1,๐‘– = โˆ’1 for ๐‘– = 1, ..., ๐‘› โˆ’ 1. The eigenvalues of ๐ด

are given by ๐œ†๐‘–(๐ด) = 2 + 2 cos (๐‘–๐œ‹/(๐‘› + 1)), ๐‘– = 1, ..., ๐‘›. Consider the approximation

of ๐œ†1(๐ด) by ๐‘š iterations of the Lanczos method. For a random choice of ๐‘, the expected


Algorithm 6 Lanczos Method

Input: symmetric matrix A ∈ R^{n×n}, vector b ∈ R^n, number of iterations m.
Output: symmetric tridiagonal matrix T_m ∈ R^{m×m}, T_m(i, i) = α_i, T_m(i, i + 1) = β_i, satisfying T_m = Q_m^T A Q_m, where Q_m = [q_1 . . . q_m].

Set β_0 = 0, q_0 = 0, q_1 = b/‖b‖
For i = 1, . . . , m:
    v = A q_i
    α_i = q_i^T v
    v = v − α_i q_i − β_{i−1} q_{i−1}
    β_i = ‖v‖
    q_{i+1} = v/β_i

value of tan^2 ∠(b, φ_1) is (1 + o(1))n. If tan^2 ∠(b, φ_1) = Cn for some constant C, then (5.3) produces the estimate

    (λ_1(A) − λ_1^{(m)}(A, b)) / (λ_1(A) − λ_n(A)) ≤ Cn [ T_{m−1}( 1 + 2 tan(π/(2(n + 1))) tan(3π/(2(n + 1))) ) ]^{−2}
                                                  ≃ n (1 + O(n^{−1}))^{−m} ≃ n.

In this instance, the estimate is a trivial one for all choices of m, which varies greatly from the error estimate (5.4) of order ln^2 n / m^2. The exact same estimate holds for the smallest eigenvalue λ_n(A) when the corresponding bounds are applied, since 4I − A is similar to A.

Now, consider the approximation of the largest eigenvalue of B = A^{−1} by m iterations of the Lanczos method. The matrix B possesses a large gap between the largest and second-largest eigenvalues, which results in a value of γ for (5.3) that remains bounded below by a constant independent of n, namely

    γ = ( 2 cos(π/(n + 1)) + 1 ) / ( 2 cos(π/(n + 1)) − 1 ).


Therefore, in this instance, the estimate (5.3) illustrates a constant convergence rate, produces non-trivial bounds (for typical b) for m = Θ(ln n), and is preferable to the error estimate (5.4) of order ln^2 n / m^2.

More generally, if a matrix A has eigenvalue gap γ ≲ n^{−α} and the initial vector b satisfies tan^2 ∠(b, φ_1) ≳ n, then the error estimate (5.3) is a trivial one for m ≲ n^{α/2}. This implies that the estimate (5.3) is most useful when the eigenvalue gap is constant or tends to zero very slowly as n increases. When the gap is not especially large (say, n^{−α} for constant α), then uniform error estimates are preferable for small values of m. In this work, we focus on uniform bounds, namely, error estimates that hold uniformly over some large set of matrices (typically 𝒮^n_{++} or 𝒮^n). We begin by recalling some of the key existing uniform error estimates for the Lanczos method.

5.1.1 Related Work

Uniform error estimates for the Lanczos method have been produced almost exclusively

for symmetric positive definite matrices, as error estimates for extremal eigenvalues of

symmetric matrices can be produced from estimates for $\mathcal{S}^n_{++}$ relatively easily. The majority of results apply only to $\lambda_1(A)$, $\lambda_n(A)$, or some function of the two (e.g., the condition number). All estimates are probabilistic and take the initial vector $b$ to be uniformly

distributed on the hypersphere. Here we provide a brief description of some key uniform

error estimates previously produced for the Lanczos method.

In [67], Kuczynski and Wozniakowski produced a complete analysis of the power

method and provided a number of upper bounds for the Lanczos method. Most notably, they

produced error estimate (5.4) and provided the following upper bound for the probability

that the relative error $(\lambda_1 - \lambda_1^{(m)})/\lambda_1$ is greater than some value $\epsilon$:
\[
\sup_{A \in \mathcal{S}^n_{++}} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)} > \epsilon\right] \le 1.648\,\sqrt{n}\; e^{-\sqrt{\epsilon}\,(2m-1)}. \tag{5.5}
\]

However, the authors were unable to produce any lower bounds for (5.4) or (5.5), and stated

that a more sophisticated analysis would most likely be required. In the same paper, they


performed numerical experiments for the Lanczos method that produced errors of the order

๐‘šโˆ’2, which led the authors to suggest that the error estimate (5.4) may be an overestimate,

and that the ln2 ๐‘› term may be unnecessary.

In [68], Kuczynski and Wozniakowski noted that the above estimates immediately

translate to relative error estimates for minimal eigenvalues when the error $\lambda_m^{(m)} - \lambda_n$ is considered relative to $\lambda_1$ or $\lambda_1 - \lambda_n$ (both normalizations can be shown to produce the same bound). However, we can quickly verify that there exist sequences in $\mathcal{S}^n_{++}$ for which the quantity $\mathbb{E}\big[(\lambda_m^{(m)} - \lambda_n)/\lambda_n\big]$ is unbounded. These results for minimal eigenvalues, combined with (5.4), led to error bounds for estimating the condition number of a matrix. Unfortunately, error estimates for the condition number face the same issue as the quantity $(\lambda_m^{(m)} - \lambda_n)/\lambda_n$, and, therefore, only estimates that depend on the value of the condition number can be produced.

The proof technique used to produce (5.4) works specifically for the quantity $\mathbb{E}\big[(\lambda_1 - \lambda_1^{(m)})/\lambda_1\big]$ (i.e., the 1-norm), and does not carry over to more general $p$-norms of the form $\mathbb{E}\big[\big((\lambda_1 - \lambda_1^{(m)})/\lambda_1\big)^p\big]^{1/p}$, $p \in (1,\infty)$. Later, in [30, Theorem 5.2, $r = 1$], Del Corso and Manzini produced an upper bound of order $(\sqrt{n}/m)^{1/p}$ for arbitrary $p$-norms, given by
\[
\sup_{A \in \mathcal{S}^n_{++}} \; \mathbb{E}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^{\!p}\right]^{\frac{1}{p}} \lesssim \frac{1}{m^{1/p}} \left(\frac{\Gamma\!\left(p-\tfrac{1}{2}\right)\Gamma\!\left(\tfrac{n}{2}\right)}{\Gamma(p)\,\Gamma\!\left(\tfrac{n-1}{2}\right)}\right)^{\frac{1}{p}}. \tag{5.6}
\]

This bound is clearly worse than (5.4), and better bounds can be produced for arbitrary ๐‘

simply by making use of (5.5). Again, the authors were unable to produce any lower bounds.

More recently, the machine learning and optimization community has become increas-

ingly interested in the problem of approximating the top singular vectors of a symmetric

matrix by randomized techniques (for a review, see [49]). This has led to a wealth of new

results in recent years regarding classical techniques, including the Lanczos method. In

[84], Musco and Musco considered a block Krylov subspace algorithm similar to the block

Lanczos method and showed that with high probability their algorithm achieves an error of

๐œ– in ๐‘š = ๐‘‚(๐œ–โˆ’1/2 ln๐‘›) iterations, matching the bound (5.5) for a block size of one. Very

recently, a number of lower bounds for a wide class of randomized algorithms were shown


in [104, 105], all of which can be applied to the Lanczos method as a corollary. First, the authors showed that the dependence on $n$ in the above upper bounds was in fact necessary. In particular, it follows from [105, Theorem A.1] that $O(\log n)$ iterations are necessary to obtain a non-trivial error. In addition, as a corollary of their main theorem [105, Theorem 1], there exist $c_1, c_2 > 0$ such that for all $\epsilon \in (0,1)$ there exists an $n_0 = \mathrm{poly}(\epsilon^{-1})$ such that for all $n \ge n_0$ there exists a random matrix $A \in \mathcal{S}^n$ such that
\[
\mathbb{P}\!\left[\frac{\rho(A)-\rho(T)}{\rho(A)} \ge \frac{\epsilon}{12}\right] \ge 1 - e^{-n^{c_2}} \quad \text{for all } m \le c_1\, \epsilon^{-1/2} \ln n, \tag{5.7}
\]
where $\rho(\cdot)$ is the spectral radius of a matrix, and the randomness is over both the initial vector and the matrix.

5.1.2 Contributions and Remainder of Chapter

In what follows, we prove improved upper bounds for the maximum expected error of the

Lanczos method in the ๐‘-norm, ๐‘ โ‰ฅ 1, and combine this with nearly-matching asymptotic

lower bounds with explicit constants. These estimates can be found in Theorem 19. The

upper bounds result from using a slightly different proof technique than that of (5.4), which

is more robust to arbitrary ๐‘-norms. Comparing the lower bounds of Theorem 19 to the

estimate (5.7), we make a number of observations. Whereas (5.7) results from a statistical

analysis of random matrices, our estimates follow from taking an infinite sequence of non-

random matrices with explicit eigenvalues and using the theory of orthogonal polynomials.

Our estimate for $m = O(\ln n)$ (in Theorem 19) is slightly worse than (5.7) by a factor of $\ln^2 \ln n$, but makes up for this in the form of an explicit constant. The estimate for $m = o\big(n^{1/2}\ln^{-1/2} n\big)$ (also in Theorem 19) does not have $n$ dependence, but illustrates a useful lower bound, as it has an explicit constant and the $\ln n$ term becomes negligible as

๐‘š increases. The results (5.7) do not fully apply to this regime, as the dependence of the

required lower bound on ๐‘› is at least cubic in the inverse of the eigenvalue gap (see [105,

Theorem 6.1] for details).

To complement these bounds, we also provide an error analysis for matrices that have a


certain structure. In Theorem 20, we produce improved dimension-free upper bounds for

matrices that have some level of eigenvalue “regularity” near $\lambda_1$. In addition, in Theorem 21,

we prove a powerful result that can be used to determine, for any fixed ๐‘š, the asymptotic

relative error for any sequence of matrices ๐‘‹๐‘› โˆˆ ๐’ฎ๐‘›, ๐‘› = 1, 2, ..., that exhibits suitable

convergence of its empirical spectral distribution. Later, we perform numerical experiments

that illustrate the practical usefulness of this result. Theorem 21, combined with estimates

for Jacobi polynomials (see Proposition 13), illustrates that the inverse quadratic dependence

on the number of iterations ๐‘š in the estimates produced throughout this chapter does not

simply illustrate the worst case, but is actually indicative of the typical case in some sense.

In Corollary 3 and Theorem 22, we produce results similar to Theorem 19 for arbitrary

eigenvalues ๐œ†๐‘–. The lower bounds follow relatively quickly from the estimates for ๐œ†1, but

the upper bounds require some mild assumptions on the eigenvalue gaps of the matrix.

These results mark the first uniform-type bounds for estimating arbitrary eigenvalues by

the Lanczos method. In addition, in Corollary 4, we translate our error estimates for the

extremal eigenvalues ๐œ†1(๐ด) and ๐œ†๐‘›(๐ด) into error bounds for the condition number of a

symmetric positive definite matrix, but, as previously mentioned, the relative error of the

condition number of a matrix requires estimates that depend on the condition number itself.

Finally, we present numerical experiments that support the accuracy and practical usefulness

of the theoretical estimates detailed above.

The remainder of the chapter is organized as follows. In Section 5.2, we prove basic results

regarding relative error and make a number of fairly straightforward observations. In Section

5.3, we prove asymptotic lower bounds and improved upper bounds for the relative error

in an arbitrary ๐‘-norm. In Section 5.4, we produce a dimension-free error estimate for a

large class of matrices and prove a theorem that can be used to determine the asymptotic

relative error for any fixed ๐‘š and sequence of matrices ๐‘‹๐‘› โˆˆ ๐’ฎ๐‘›, ๐‘› = 1, 2, ..., with suitable

convergence of its empirical spectral distribution. In Section 5.5, under some mild additional

assumptions, we prove a version of Theorem 19 for arbitrary eigenvalues ๐œ†๐‘–, and extend

our results for ๐œ†1 and ๐œ†๐‘› to the condition number of a symmetric positive definite matrix.

Finally, in Section 5.6, we perform a number of experiments and discuss how the numerical


results compare to the theoretical estimates in this work.

5.2 Preliminary Results

Because the Lanczos method applies only to symmetric matrices, all matrices in this

chapter are assumed to belong to $\mathcal{S}^n$. The Lanczos method and the quantity $(\lambda_1(A) - \lambda_1^{(m)}(A,b))/(\lambda_1(A) - \lambda_n(A))$ are both unaffected by shifting and scaling, and so any maximum over $A \in \mathcal{S}^n$ can be replaced by a maximum over all $A \in \mathcal{S}^n$ with $\lambda_1(A)$ and $\lambda_n(A)$ fixed. When producing upper bounds, it is often convenient to choose $\lambda_1(A) = 1$ and $\lambda_n(A) = 0$. For the sake of brevity, we will often write $\lambda_i(A)$ and $\lambda_j^{(m)}(A,b)$ as $\lambda_i$ and $\lambda_j^{(m)}$ when the associated matrix $A$ and vector $b$ are clear from context.

We begin by rewriting expression (5.2) for $\lambda_1^{(m)}$ in terms of polynomials. The Krylov subspace $\mathcal{K}_m(A,b)$ can alternatively be defined as
\[
\mathcal{K}_m(A,b) = \{P(A)\,b \;|\; P \in \mathcal{P}_{m-1}\},
\]
where $\mathcal{P}_{m-1}$ is the set of all real-valued polynomials of degree at most $m-1$. Suppose $A \in \mathcal{S}^n$ has eigendecomposition $A = Q\Lambda Q^T$, where $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix and $\Lambda \in \mathbb{R}^{n \times n}$ is the diagonal matrix satisfying $\Lambda(i,i) = \lambda_i(A)$, $i = 1, \dots, n$. Then we have the relation
\[
\lambda_1^{(m)}(A,b) = \max_{\substack{x \in \mathcal{K}_m(A,b)\\ x \ne 0}} \frac{x^T A x}{x^T x} = \max_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{b^T P^2(A)\, A\, b}{b^T P^2(A)\, b} = \max_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\tilde{b}^T P^2(\Lambda)\, \Lambda\, \tilde{b}}{\tilde{b}^T P^2(\Lambda)\, \tilde{b}},
\]
where $\tilde{b} = Q^T b$. The relative error is given by
\[
\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} = \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{i=2}^n \tilde{b}_i^2\, P^2(\lambda_i)\,(\lambda_1 - \lambda_i)}{(\lambda_1 - \lambda_n) \sum_{i=1}^n \tilde{b}_i^2\, P^2(\lambda_i)},
\]


and the expected $p$th moment of the relative error is given by
\[
\mathbb{E}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\left(\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n}\right)^{\!p}\right] = \int_{S^{n-1}} \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \left[\frac{\sum_{i=2}^n \tilde{b}_i^2\, P^2(\lambda_i)\,(\lambda_1 - \lambda_i)}{(\lambda_1 - \lambda_n) \sum_{i=1}^n \tilde{b}_i^2\, P^2(\lambda_i)}\right]^p d\sigma(\tilde{b}),
\]

where ๐œŽ is the uniform probability measure on ๐‘†๐‘›โˆ’1. Because the relative error does not

depend on the norm of or the sign of any entry, we can replace the integral over ๐‘†๐‘›โˆ’1 by

an integral of ๐‘ฆ = (๐‘ฆ1, ..., ๐‘ฆ๐‘›) over [0,โˆž)๐‘› with respect to the joint chi-square probability

density function

๐‘“๐‘Œ (๐‘ฆ) =1

(2๐œ‹)๐‘›/2exp

โˆ’1

2

๐‘›โˆ‘๐‘–=1

๐‘ฆ๐‘–

๐‘›โˆ

๐‘–=1

๐‘ฆโˆ’12

๐‘– (5.8)

of ๐‘› independent chi-square random variables ๐‘Œ1, ..., ๐‘Œ๐‘› โˆผ ๐œ’21 with one degree of freedom

each. In particular, we have

E๐‘โˆผ๐’ฐ(๐‘†๐‘›โˆ’1)

[(๐œ†1 โˆ’ ๐œ†

(๐‘š)1

๐œ†1 โˆ’ ๐œ†๐‘›

)๐‘]=

โˆซ[0,โˆž)๐‘›

min๐‘ƒโˆˆ๐’ซ๐‘šโˆ’1

๐‘ƒ =0

[ โˆ‘๐‘›๐‘–=2 ๐‘ฆ๐‘–๐‘ƒ

2(๐œ†๐‘–)(๐œ†1 โˆ’ ๐œ†๐‘–)

(๐œ†1 โˆ’ ๐œ†๐‘›)โˆ‘๐‘›

๐‘–=1 ๐‘ฆ๐‘–๐‘ƒ2(๐œ†๐‘–)

]๐‘๐‘“๐‘Œ (๐‘ฆ) ๐‘‘๐‘ฆ.

(5.9)

Similarly, probabilistic estimates with respect to $b \sim \mathcal{U}(S^{n-1})$ can be replaced by estimates with respect to $Y_1, \dots, Y_n \sim \chi_1^2$, as
\[
\mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \ge \epsilon\right] = \mathbb{P}_{Y_i \sim \chi_1^2}\!\left[\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{i=2}^n Y_i\, P^2(\lambda_i)\,(\lambda_1 - \lambda_i)}{(\lambda_1 - \lambda_n) \sum_{i=1}^n Y_i\, P^2(\lambda_i)} \ge \epsilon\right]. \tag{5.10}
\]
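The reduction from the sphere to independent chi-square variables rests on the ratio in (5.9) and (5.10) being $0$-homogeneous in the squared coefficients: writing $b = g/\|g\|$ for a standard Gaussian vector $g$, the $\|g\|^2$ factors cancel, leaving the variables $Y_i = g_i^2 \sim \chi_1^2$. A small sketch (with an arbitrary spectrum and test polynomial of our own choosing) makes the cancellation concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
lam = np.sort(rng.uniform(0.0, 1.0, n))[::-1]   # a fixed spectrum, lambda_1 first

def rel_err_ratio(w, P):
    """The bracketed ratio in (5.9)/(5.10) for squared coefficients w."""
    P2 = np.polyval(P, lam) ** 2
    num = np.sum(w[1:] * P2[1:] * (lam[0] - lam[1:]))
    den = (lam[0] - lam[-1]) * np.sum(w * P2)
    return num / den

g = rng.standard_normal(n)
b = g / np.linalg.norm(g)          # b ~ Uniform(S^{n-1})
Y = g ** 2                         # Y_i ~ chi-square, one degree of freedom
P = np.array([1.0, -0.3, 0.2])     # an arbitrary fixed polynomial

# the ratio is scale-invariant, so b_i^2 and Y_i give identical values
same = np.isclose(rel_err_ratio(b ** 2, P), rel_err_ratio(Y, P))
```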

For the remainder of the chapter, we almost exclusively use equation (5.9) for the expected relative error and equation (5.10) for probabilistic bounds on the relative error. If $P$ minimizes the expression in equation (5.9) or (5.10) for a given $y$ or $Y$, then any polynomial of the form $\alpha P$, $\alpha \in \mathbb{R}\setminus\{0\}$, is also a minimizer. Therefore, without any loss of generality, we can alternatively minimize over the set $\mathcal{P}_{m-1}(1) = \{Q \in \mathcal{P}_{m-1} \;|\; Q(1) = 1\}$. For the sake of brevity, we will omit the subscripts under $\mathbb{E}$ and $\mathbb{P}$ when the underlying distribution is clear from context. In this work, we make use of asymptotic notation to express the limiting behavior of a function with respect to $n$. A function $f(n)$ is $O(g(n))$ if there exist $M, n_0 > 0$ such that $|f(n)| \le M g(n)$ for all $n \ge n_0$; $o(g(n))$ if for every $\epsilon > 0$ there exists an $n_\epsilon$ such that $|f(n)| \le \epsilon g(n)$ for all $n \ge n_\epsilon$; $\omega(g(n))$ if $|g(n)| = o(|f(n)|)$; and $\Theta(g(n))$ if $f(n) = O(g(n))$ and $g(n) = O(f(n))$.

5.3 Asymptotic Lower Bounds and Improved Upper Bounds

In this section, we obtain improved upper bounds for $\mathbb{E}\big[\big((\lambda_1 - \lambda_1^{(m)})/\lambda_1\big)^p\big]^{1/p}$, $p \ge 1$, and produce nearly-matching lower bounds. In particular, we prove the following theorem for the behavior of $m$ as a function of $n$ as $n$ tends to infinity.

Theorem 19.
\[
\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge 1 - o(1)\right] \ge 1 - o(1/n) \quad \text{for } m = o(\ln n),
\]
\[
\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{.015 \ln^2 n}{m^2 \ln^2 \ln n}\right] \ge 1 - o(1/n) \quad \text{for } m = \Theta(\ln n),
\]
\[
\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{1.08}{m^2}\right] \ge 1 - o(1/n) \quad \text{for } m = o\big(n^{1/2}\ln^{-1/2} n\big) \text{ and } \omega(1),
\]
and
\[
\max_{A \in \mathcal{S}^n} \; \mathbb{E}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)}\right)^{\!p}\right]^{1/p} \le \frac{.068 \ln^2\!\big(n(m-1)^{8p}\big)}{(m-1)^2} \quad \text{for } n \ge 100,\; m \ge 10,\; p \ge 1.
\]

Proof. The first result is a corollary of [105, Theorem A.1], and the remaining results follow

from Lemmas 20, 21, and 22.

By Hölder's inequality, the lower bounds in Theorem 19 also hold for arbitrary $p$-norms, $p \ge 1$. All results are equally applicable to $\lambda_n$, as the Krylov subspace is unaffected by shifting and scaling, namely $\mathcal{K}_m(A,b) = \mathcal{K}_m(\alpha A + \beta I, b)$ for all $\alpha \ne 0$.

We begin by producing asymptotic lower bounds. The general technique is as follows.

We choose an infinite sequence of matrices $\{A_n\}_{n=1}^\infty$, $A_n \in \mathcal{S}^n$, treat $m$ as a function of $n$, and show that, as $n$ tends to infinity, for “most” choices of an initial vector $b$, the relative error

of this sequence of matrices is well approximated by an integral polynomial minimization

problem for which bounds can be obtained. First, we recall a number of useful propositions

regarding Gauss-Legendre quadrature, orthogonal polynomials, and Chernoff bounds for

chi-square random variables.

Proposition 9. ([41],[114]) Let $P \in \mathcal{P}_{2k-1}$, let $\{x_i\}_{i=1}^k$ be the zeros of the $k$th degree Legendre polynomial $P_k(x)$, $P_k(1) = 1$, in descending order ($x_1 > \dots > x_k$), and let $w_i = 2\big(1 - x_i^2\big)^{-1}\big[P_k'(x_i)\big]^{-2}$, $i = 1, \dots, k$, be the corresponding weights. Then

1. $\displaystyle\int_{-1}^1 P(x)\, dx = \sum_{i=1}^k w_i\, P(x_i)$,

2. $x_i = \left(1 - \dfrac{1}{8k^2}\right)\cos\!\left(\dfrac{(4i-1)\pi}{4k+2}\right) + O(k^{-3})$, $i = 1, \dots, k$,

3. $\dfrac{\pi}{k+\frac{1}{2}}\sqrt{1 - x_1^2}\left(1 - \dfrac{1}{8\left(k+\frac{1}{2}\right)^2\left(1 - x_1^2\right)}\right) \le w_1 < \dots < w_{\lfloor \frac{k+1}{2}\rfloor} \le \dfrac{\pi}{k+\frac{1}{2}}$.
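Both the exactness statement and the weight formula in Proposition 9 can be checked numerically; a sketch using NumPy's Gauss-Legendre routine (which returns the nodes in ascending order):

```python
import numpy as np
from numpy.polynomial import legendre as L

k = 6
x, w = L.leggauss(k)                  # nodes (ascending) and weights

# exactness on P_{2k-1}: integrate x^10 + x^3 over [-1,1] (true value 2/11)
val = np.sum(w * (x ** 10 + x ** 3))

# weight formula w_i = 2 (1 - x_i^2)^{-1} [P_k'(x_i)]^{-2}
ck = np.zeros(k + 1)
ck[k] = 1.0                           # P_k in the Legendre basis
dPk = L.legval(x, L.legder(ck))       # P_k'(x_i)
w_formula = 2 / ((1 - x ** 2) * dPk ** 2)
```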

Proposition 10. ([110, Section 7.72], [111]) Let $\omega(x)\,dx$ be a measure on $[-1,1]$ with infinitely many points of increase, with orthogonal polynomials $\{p_k(x)\}_{k \ge 0}$, $p_k \in \mathcal{P}_k$. Then
\[
\max_{\substack{P \in \mathcal{P}_k\\ P \ne 0}} \frac{\int_{-1}^1 x\, P^2(x)\, \omega(x)\, dx}{\int_{-1}^1 P^2(x)\, \omega(x)\, dx} = \max\{x \in [-1,1] \;|\; p_{k+1}(x) = 0\}.
\]
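Proposition 10 can be illustrated for the Lebesgue weight $\omega = 1$: expanding $P$ in the monomial basis turns the Rayleigh quotient into a generalized eigenvalue problem for a pair of moment matrices, and its largest eigenvalue should equal the largest zero of the Legendre polynomial $p_{k+1}$. A sketch:

```python
import numpy as np
from scipy.linalg import eigh
from numpy.polynomial import legendre as L

k = 4
i = np.arange(k + 1)
I, J = np.meshgrid(i, i, indexing="ij")

def moment(p):
    # int_{-1}^{1} x^p dx = 2/(p+1) for even p, 0 for odd p
    return np.where((p % 2) == 0, 2.0 / (p + 1), 0.0)

M0 = moment(I + J)        # Gram matrix  <x^i, x^j>
M1 = moment(I + J + 1)    # <x^i, x * x^j>
max_rayleigh = eigh(M1, M0, eigvals_only=True).max()

ck = np.zeros(k + 2)
ck[k + 1] = 1.0
largest_zero = L.legroots(ck).max()   # largest zero of the Legendre P_{k+1}
```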

Proposition 11. Let $Z \sim \chi_k^2$. Then $\mathbb{P}[Z \le x] \le \big[\frac{x}{k}\exp\big(1 - \frac{x}{k}\big)\big]^{k/2}$ for $x \le k$, and $\mathbb{P}[Z \ge x] \le \big[\frac{x}{k}\exp\big(1 - \frac{x}{k}\big)\big]^{k/2}$ for $x \ge k$.

Proof. The result follows from taking [26, Lemma 2.2] and letting ๐‘‘โ†’โˆž.
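These one-sided bounds can be compared against exact chi-square tail probabilities; a quick check using SciPy:

```python
import math
from scipy.stats import chi2

# Proposition 11 versus exact chi-square tails, for one arbitrary k
k = 5
bound = lambda x: ((x / k) * math.exp(1 - x / k)) ** (k / 2)

upper_ok = chi2.sf(10.0, k) <= bound(10.0)   # right tail, x >= k
lower_ok = chi2.cdf(1.0, k) <= bound(1.0)    # left tail,  x <= k
```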

We are now prepared to prove a lower bound of order ๐‘šโˆ’2.


Lemma 20. If $m = o\big(n^{1/2}\ln^{-1/2} n\big)$ and $\omega(1)$, then
\[
\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{1.08}{m^2}\right] \ge 1 - o(1/n).
\]

Proof. The structure of the proof is as follows. We choose a matrix with eigenvalues based

on the zeros of a Legendre polynomial, and show that a large subset of [0,โˆž)๐‘› (with

respect to ๐‘“๐‘Œ (๐‘ฆ)) satisfies conditions that allow us to lower bound the relative error using an

integral minimization problem. The choice of Legendre polynomials is based on connections

to Gaussian quadrature, but, as we will later see (in Theorem 21 and the experiments), the $1/m^2$ dependence is indicative of a large class of matrices with connections to orthogonal polynomials. Let $x_1 > \dots > x_{2m}$ be the zeros of the $(2m)$th degree Legendre polynomial. Let $A \in \mathcal{S}^n$ have eigenvalue $x_1$ with multiplicity one, and remaining eigenvalues given by $x_j$, $j = 2, \dots, 2m$, each with multiplicity at least $\lfloor (n-1)/(2m-1) \rfloor$. By equation (5.10),

\[
\mathbb{P}\!\left[\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \ge \epsilon\right] = \mathbb{P}\!\left[\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{i=2}^n Y_i\, P^2(\lambda_i)\,(\lambda_1 - \lambda_i)}{(\lambda_1 - \lambda_n) \sum_{i=1}^n Y_i\, P^2(\lambda_i)} \ge \epsilon\right] = \mathbb{P}\!\left[\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=2}^{2m} \hat{Y}_j\, P^2(x_j)\,(x_1 - x_j)}{(x_1 - x_{2m}) \sum_{j=1}^{2m} \hat{Y}_j\, P^2(x_j)} \ge \epsilon\right],
\]
where $Y_1, \dots, Y_n \sim \chi_1^2$, and $\hat{Y}_j$ is the sum of the $Y_i$ that satisfy $\lambda_i = x_j$. $\hat{Y}_1$ has one degree of freedom, and $\hat{Y}_j$, $j = 2, \dots, 2m$, each have at least $\lfloor (n-1)/(2m-1) \rfloor$ degrees of freedom.

Let ๐‘ค1, ..., ๐‘ค2๐‘š be the weights of Gaussian quadrature associated with ๐‘ฅ1, ..., ๐‘ฅ2๐‘š. By

Proposition 9,

๐‘ฅ1 = 1โˆ’ 4 + 9๐œ‹2

128๐‘š2+ ๐‘‚(๐‘šโˆ’3), ๐‘ฅ2 = 1โˆ’ 4 + 49๐œ‹2

128๐‘š2+ ๐‘‚(๐‘šโˆ’3),

and, therefore, 1โˆ’ ๐‘ฅ21 = 4+9๐œ‹2

64๐‘š2 + ๐‘‚(๐‘šโˆ’3). Again, by Proposition 9, we can lower bound

163

the smallest ratio between weights by

\[
\frac{w_1}{w_m} \ge \sqrt{1 - x_1^2}\left(1 - \frac{1}{8(2m + 1/2)^2 (1 - x_1^2)}\right) = \left(\frac{\sqrt{4+9\pi^2}}{8m} + O(m^{-2})\right)\left(1 - \frac{1}{(4+9\pi^2)/2 + O(m^{-1})}\right) = \frac{2 + 9\pi^2}{8\sqrt{4+9\pi^2}\; m} + O(m^{-2}).
\]

Therefore, by Proposition 11,
\[
\mathbb{P}\!\left[\min_{j \ge 2} \hat{Y}_j \ge \frac{w_m}{w_1} \hat{Y}_1\right] \ge \mathbb{P}\!\left[\hat{Y}_j \ge \frac{1}{3}\left\lfloor\frac{n-1}{2m-1}\right\rfloor,\; j \ge 2\right] \times \mathbb{P}\!\left[\hat{Y}_1 \le \frac{w_1}{3 w_m}\left\lfloor\frac{n-1}{2m-1}\right\rfloor\right]
\]
\[
\ge \left[1 - \left(\frac{e^{2/3}}{3}\right)^{\frac{1}{2}\lfloor\frac{n-1}{2m-1}\rfloor}\right]^{2m-1}\left[1 - \left(\frac{w_1}{3 w_m}\left\lfloor\frac{n-1}{2m-1}\right\rfloor e^{1 - \frac{w_1}{3w_m}\lfloor\frac{n-1}{2m-1}\rfloor}\right)^{\frac{1}{2}}\right] = 1 - o(1/n).
\]

We now restrict our attention to values of $Y = (Y_1, \dots, Y_n) \in [0,\infty)^n$ that satisfy the inequality $w_1 \min_{j \ge 2} \hat{Y}_j \ge w_m \hat{Y}_1$. If, for some fixed choice of $Y$,
\[
\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=2}^{2m} \hat{Y}_j\, P^2(x_j)\,(x_1 - x_j)}{\sum_{j=1}^{2m} \hat{Y}_j\, P^2(x_j)} \le x_1 - x_2, \tag{5.11}
\]

then, by Proposition 9 and Proposition 10 for $\omega(x) = 1$,
\[
\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=2}^{2m} \hat{Y}_j\, P^2(x_j)\,(x_1 - x_j)}{\sum_{j=1}^{2m} \hat{Y}_j\, P^2(x_j)} \ge \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=2}^{2m} w_j\, P^2(x_j)\,(x_1 - x_j)}{\sum_{j=1}^{2m} w_j\, P^2(x_j)}
\]
\[
= \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=1}^{2m} w_j\, P^2(x_j)\,(1 - x_j)}{\sum_{j=1}^{2m} w_j\, P^2(x_j)} - (1 - x_1) = \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\int_{-1}^1 P^2(y)\,(1-y)\, dy}{\int_{-1}^1 P^2(y)\, dy} - \frac{4+9\pi^2}{128 m^2} + O(m^{-3})
\]
\[
= \frac{4+9\pi^2}{32 m^2} - \frac{4+9\pi^2}{128 m^2} + O(m^{-3}) = \frac{12 + 27\pi^2}{128 m^2} + O(m^{-3}).
\]


Alternatively, if equation (5.11) does not hold, then we can lower bound the left-hand side of (5.11) by $x_1 - x_2 = \frac{5\pi^2}{16 m^2} + O(m^{-3})$. Noting that $x_1 - x_{2m} \le 2$ completes the proof, as $\frac{12+27\pi^2}{256}$ and $\frac{5\pi^2}{32}$ are both greater than $1.08$.
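The construction in the proof is easy to simulate. The sketch below (our own illustration, not part of the proof) places the eigenvalues at the zeros of the degree-$2m$ Legendre polynomial with the stated multiplicities, and computes $\lambda_1^{(m)}$ directly as the maximum Rayleigh quotient over an orthonormalized Krylov basis, which agrees with the Lanczos recurrence in exact arithmetic:

```python
import numpy as np
from numpy.polynomial import legendre as L

m = 8
nodes, _ = L.leggauss(2 * m)          # zeros of the degree-2m Legendre polynomial
mult = 66                             # roughly floor((n-1)/(2m-1)) for n ~ 1000
lam = np.concatenate(([nodes[-1]], np.repeat(nodes[:-1], mult)))
n = lam.size                          # x_1 has multiplicity one, the rest mult

rng = np.random.default_rng(2)
b = rng.standard_normal(n)

# lambda_1^(m) as the maximum Rayleigh quotient over K_m(A, b), A = diag(lam)
K = np.column_stack([lam ** j * b for j in range(m)])
Q, _ = np.linalg.qr(K)
ritz = np.linalg.eigvalsh(Q.T @ (lam[:, None] * Q)).max()
rel_err = (lam.max() - ritz) / (lam.max() - lam.min())
```

For $m = 8$ the observed relative error should be of the same rough order as the $1.08/m^2 \approx .017$ lower bound.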

The previous lemma illustrates that the inverse quadratic dependence of the relative

error on the number of iterations $m$ persists up to $m = o\big(n^{1/2}/\ln^{1/2} n\big)$. As we will see in Sections 5.4 and 5.6, this $m^{-2}$ error is indicative of the behavior of the Lanczos method

for a wide range of typical matrices encountered in practice. Next, we aim to produce a

lower bound of the form $\ln^2 n/(m^2 \ln^2 \ln n)$. To do so, we will make use of more general

Jacobi polynomials instead of Legendre polynomials. In addition, in order to obtain a result

that contains some dependence on ๐‘›, the choice of Jacobi polynomial must vary with ๐‘›

(i.e., ๐›ผ and ๐›ฝ are a function of ๐‘›). However, due to the distribution of eigenvalues required

to produce this bound, Gaussian quadrature is no longer exact, and we must make use of

estimates for basic composite quadrature. We recall the following propositions regarding

composite quadrature and Jacobi polynomials.

Proposition 12. If $f \in \mathcal{C}^1[a,b]$, then for $a = x_0 < \dots < x_n = b$,
\[
\left|\int_a^b f(x)\, dx - \sum_{i=1}^n (x_i - x_{i-1})\, f(x_i^*)\right| \le \frac{b-a}{2}\, \max_{x \in [a,b]} |f'(x)| \, \max_{i=1,\dots,n} (x_i - x_{i-1}),
\]
where $x_i^* \in [x_{i-1}, x_i]$, $i = 1, \dots, n$.

Proposition 13. ([103, Chapter 3.2]) Let $\{P_k^{(\alpha,\beta)}(x)\}_{k=0}^\infty$, $\alpha, \beta > -1$, be the orthogonal system of Jacobi polynomials over $[-1,1]$ with respect to the weight function $\omega_{\alpha,\beta}(x) = (1-x)^\alpha (1+x)^\beta$, namely,
\[
P_k^{(\alpha,\beta)}(x) = \frac{\Gamma(k+\alpha+1)}{k!\,\Gamma(k+\alpha+\beta+1)} \sum_{i=0}^k \binom{k}{i} \frac{\Gamma(k+i+\alpha+\beta+1)}{\Gamma(i+\alpha+1)} \left(\frac{x-1}{2}\right)^{\!i}.
\]
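The explicit representation can be checked against SciPy's Jacobi evaluator (the degree, parameters, and test point below are arbitrary choices):

```python
import math
from scipy.special import eval_jacobi

def jacobi_sum(k, a, b, x):
    """Explicit representation of P_k^{(a,b)}(x) from Proposition 13."""
    pref = math.gamma(k + a + 1) / (math.factorial(k) * math.gamma(k + a + b + 1))
    s = sum(
        math.comb(k, i)
        * math.gamma(k + i + a + b + 1)
        / math.gamma(i + a + 1)
        * ((x - 1) / 2) ** i
        for i in range(k + 1)
    )
    return pref * s

val = jacobi_sum(4, 2.0, 0.5, 0.3)
ref = eval_jacobi(4, 2.0, 0.5, 0.3)
```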

Then

(i) $\displaystyle\int_{-1}^1 \big[P_k^{(\alpha,\beta)}(x)\big]^2\, \omega_{\alpha,\beta}(x)\, dx = \frac{2^{\alpha+\beta+1}\,\Gamma(k+\alpha+1)\,\Gamma(k+\beta+1)}{(2k+\alpha+\beta+1)\, k!\,\Gamma(k+\alpha+\beta+1)}$,

(ii) $\displaystyle\max_{x \in [-1,1]} \big|P_k^{(\alpha,\beta)}(x)\big| = \max\left\{\frac{\Gamma(k+\alpha+1)}{k!\,\Gamma(\alpha+1)}, \frac{\Gamma(k+\beta+1)}{k!\,\Gamma(\beta+1)}\right\}$ for $\max\{\alpha,\beta\} \ge -\frac{1}{2}$,

(iii) $\displaystyle\frac{d}{dx} P_k^{(\alpha,\beta)}(x) = \frac{k+\alpha+\beta+1}{2}\, P_{k-1}^{(\alpha+1,\beta+1)}(x)$,

(iv) $\displaystyle\max\{x \in [-1,1] \;|\; P_k^{(\alpha,\beta)}(x) = 0\} \le \sqrt{1 - \left(\frac{\alpha+3/2}{k+\alpha+1/2}\right)^2}$ for $\alpha \ge \beta > -\frac{11}{12}$ and $k > 1$.

Proof. Equations (i)–(iii) are standard results, and can be found in [103, Chapter 3.2] or [110]. What remains is to prove (iv). By [90, Lemma 3.5], the largest zero $x_1$ of the $k$th degree Gegenbauer polynomial (i.e., $P_k^{(\lambda-1/2,\lambda-1/2)}(x)$), with $\lambda > -5/12$, satisfies
\[
x_1^2 \le \frac{(k-1)(k+2\lambda+1)}{(k+\lambda)^2 + 3\lambda + \frac{5}{4} + 3\left(\lambda+\frac{1}{2}\right)^2/(k-1)} \le \frac{(k-1)(k+2\lambda+1)}{(k+\lambda)^2} = 1 - \left(\frac{\lambda+1}{k+\lambda}\right)^2.
\]
By [37, Theorem 2.1], the largest zero of $P_k^{(\alpha,\beta+t)}(x)$ is strictly greater than the largest zero of $P_k^{(\alpha,\beta)}(x)$ for any $t > 0$. As $\alpha \ge \beta$, combining these two facts provides our desired result.

For the sake of brevity, the inner product and norm on $[-1,1]$ with respect to $\omega_{\alpha,\beta}(x) = (1-x)^\alpha(1+x)^\beta$ will be denoted by $\langle\cdot,\cdot\rangle_{\alpha,\beta}$ and $\|\cdot\|_{\alpha,\beta}$. We are now prepared to prove the following lower bound.

Lemma 21. If $m = \Theta(\ln n)$, then
\[
\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\!\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{.015 \ln^2 n}{m^2 \ln^2 \ln n}\right] \ge 1 - o(1/n).
\]

Proof. The structure of the proof is similar in concept to that of Lemma 20. We choose

a matrix with eigenvalues based on a function corresponding to an integral minimization

problem, and show that a large subset of [0,โˆž)๐‘› (with respect to ๐‘“๐‘Œ (๐‘ฆ)) satisfies conditions

that allow us to lower bound the relative error using the aforementioned integral minimization

problem. The main difficulty is that, to obtain improved bounds, we must use Proposition 10


with a weight function that requires quadrature for functions that are no longer polynomials,

and, therefore, cannot be represented exactly using Gaussian quadrature. In addition, the

required quadrature is for functions whose derivative has a singularity at ๐‘ฅ = 1, and so

Proposition 12 is not immediately applicable. Instead, we perform a two-part error analysis. In particular, if a function is $\mathcal{C}^1[0,a]$, $a < 1$, and monotonic on $[a,1]$, then using Proposition 12 on $[0,a]$ and a monotonicity argument on $[a,1]$ results in an error bound for the quadrature.

Let $\ell = \left\lfloor \frac{.2495 \ln n}{\ln \ln n} \right\rfloor$, $k = \big\lfloor m^{4.004\ell} \big\rfloor$, and $m = \Theta(\ln n)$. We assume that $n$ is sufficiently large, so that $\ell > 0$. $\ell$ is quite small for practical values of $n$, but the growth of $\ell$ with respect to $n$ is required for a result with dependence on $n$. Consider a matrix $A \in \mathcal{S}^n$ with eigenvalues given by $f(x_j)$, $j = 1, \dots, k$, each with multiplicity either $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$, where $f(x) = 1 - 2(1-x)^{1/\ell}$ and $x_j = j/k$, $j = 1, \dots, k$. Then

\[
\mathbb{P}\!\left[\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \ge \epsilon\right] = \mathbb{P}\!\left[\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{i=2}^n Y_i\, P^2(\lambda_i)\,(\lambda_1 - \lambda_i)}{(\lambda_1 - \lambda_n) \sum_{i=1}^n Y_i\, P^2(\lambda_i)} \ge \epsilon\right] \ge \mathbb{P}\!\left[\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=1}^{k} \hat{Y}_j\, P^2(f(x_j))\,(1 - f(x_j))}{2 \sum_{j=1}^{k} \hat{Y}_j\, P^2(f(x_j))} \ge \epsilon\right],
\]
where $Y_1, \dots, Y_n \sim \chi_1^2$, and $\hat{Y}_j$ is the sum of the $Y_i$'s that satisfy $\lambda_i = f(x_j)$. Each $\hat{Y}_j$, $j = 1, \dots, k$, has either $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$ degrees of freedom. Because $m = \Theta(\ln n)$, we have

๐‘˜ = ๐‘œ(๐‘›.999) and, by Proposition 11,

P[.999โŒŠ๐‘›

๐‘˜โŒ‹ โ‰ค ๐‘Œ๐‘— โ‰ค 1.001โŒˆ๐‘›

๐‘˜โŒ‰, ๐‘— = 1, ..., ๐‘˜

]โ‰ฅ

(1โˆ’

(.999๐‘’.001

)โŒŠ๐‘›๐‘˜โŒ‹ โˆ’(1.001๐‘’โˆ’.001

)โŒˆ๐‘›๐‘˜โŒ‰)๐‘˜

= 1โˆ’ ๐‘œ(1/๐‘›).

Therefore, with probability $1 - o(1/n)$,
\[
\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=1}^{k} \hat{Y}_j\, P^2(f(x_j))\,(1 - f(x_j))}{2 \sum_{j=1}^{k} \hat{Y}_j\, P^2(f(x_j))} \ge \frac{.998}{2} \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{j=1}^{k} P^2(f(x_j))\,(1 - f(x_j))}{\sum_{j=1}^{k} P^2(f(x_j))}.
\]


Let ๐‘ƒ (๐‘ฆ) =โˆ‘๐‘šโˆ’1

๐‘Ÿ=0 ๐‘๐‘Ÿ๐‘ƒ(โ„“,0)๐‘Ÿ (๐‘ฆ), ๐‘๐‘Ÿ โˆˆ R, ๐‘Ÿ = 0, ...,๐‘šโˆ’ 1, and define

๐‘”๐‘Ÿ,๐‘ (๐‘ฅ) = ๐‘ƒ (โ„“,0)๐‘Ÿ (๐‘“(๐‘ฅ))๐‘ƒ (โ„“,0)

๐‘  (๐‘“(๐‘ฅ)) and ๐‘”๐‘Ÿ,๐‘ (๐‘ฅ) = ๐‘”๐‘Ÿ,๐‘ (๐‘ฅ)(1โˆ’ ๐‘“(๐‘ฅ)),

๐‘Ÿ, ๐‘  = 0, ...,๐‘šโˆ’ 1. Then we have

โˆ‘๐‘˜๐‘—=1 ๐‘ƒ

2(๐‘“(๐‘ฅ๐‘—))(1โˆ’ ๐‘“(๐‘ฅ๐‘—))โˆ‘๐‘˜๐‘—=1 ๐‘ƒ

2(๐‘“(๐‘ฅ๐‘—))=

โˆ‘๐‘šโˆ’1๐‘Ÿ,๐‘ =0 ๐‘๐‘Ÿ๐‘๐‘ 

โˆ‘๐‘˜๐‘—=1 ๐‘”๐‘Ÿ,๐‘ (๐‘ฅ๐‘—)โˆ‘๐‘šโˆ’1

๐‘Ÿ,๐‘ =0 ๐‘๐‘Ÿ๐‘๐‘ โˆ‘๐‘˜

๐‘—=1 ๐‘”๐‘Ÿ,๐‘ (๐‘ฅ๐‘—).

We now replace each quadrature $\sum_{j=1}^k g_{r,s}(x_j)$ or $\sum_{j=1}^k \hat{g}_{r,s}(x_j)$ in the previous expression by its corresponding integral, plus a small error term. The functions $g_{r,s}$ and $\hat{g}_{r,s}$ are not elements of $\mathcal{C}^1[0,1]$, as $f'(x)$ has a singularity at $x = 1$, and, therefore, we cannot use Proposition 12 directly. Instead, we break the error analysis of the quadrature into two components. We have $g_{r,s}, \hat{g}_{r,s} \in \mathcal{C}^1[0,a]$ for any $0 < a < 1$, and, if $a$ is chosen to be large enough, both $g_{r,s}$ and $\hat{g}_{r,s}$ will be monotonic on the interval $[a,1]$. In that case, we can apply Proposition 12 to bound the error over the interval $[0,a]$ and use basic properties of Riemann sums of monotonic functions to bound the error over the interval $[a,1]$.

The function ๐‘“(๐‘ฅ) is increasing on [0, 1], and, by equations (iii) and (iv) of Proposition

13, the function ๐‘ƒ(โ„“,0)๐‘Ÿ (๐‘ฆ), ๐‘Ÿ = 2, ...,๐‘šโˆ’ 1, is increasing on the interval

โŽกโŽฃโˆš1โˆ’(

โ„“ + 5/2

๐‘š + โ„“โˆ’ 1/2

)2

, 1

โŽคโŽฆ . (5.12)

The functions ๐‘ƒ(โ„“,0)0 (๐‘ฆ) = 1 and ๐‘ƒ

(โ„“,0)1 (๐‘ฆ) = โ„“ + 1 + (โ„“ + 2)(๐‘ฆ โˆ’ 1)/2 are clearly non-

decreasing over interval (5.12). By the inequalityโˆš

1โˆ’ ๐‘ฆ โ‰ค 1โˆ’๐‘ฆ/2, ๐‘ฆ โˆˆ [0, 1], the function

๐‘ƒ(โ„“,0)๐‘Ÿ (๐‘“(๐‘ฅ)), ๐‘Ÿ = 0, ...,๐‘šโˆ’ 1, is non-decreasing on the interval

[1โˆ’

(โ„“ + 5/2

2๐‘š + 2โ„“โˆ’ 1

)2โ„“

, 1

]. (5.13)

Therefore, the functions $g_{r,s}(x)$ are non-decreasing and the functions $\hat{g}_{r,s}(x)$ are non-increasing on the interval (5.13) for all $r, s = 0, \dots, m-1$. The term $\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell} = \omega(1/k)$, and so, for sufficiently large $n$, there exists an index $j^* \in \{1, \dots, k-2\}$ such that
\[
1 - \left(\frac{\ell + 5/2}{2m + 2\ell - 1}\right)^{2\ell} \le x_{j^*} < 1 - \left(\frac{\ell + 5/2}{2m + 2\ell - 1}\right)^{2\ell} + \frac{1}{k} < 1.
\]

We now upper bound the derivatives of $g_{r,s}$ and $\hat{g}_{r,s}$ on the interval $[0, x_{j^*}]$. By equation (iii) of Proposition 13, the first derivatives of $g_{r,s}$ and $\hat{g}_{r,s}$ are given by
\[
g_{r,s}'(x) = f'(x)\left(\left[\frac{d}{dy} P_r^{(\ell,0)}(y)\right]_{y=f(x)} P_s^{(\ell,0)}(f(x)) + P_r^{(\ell,0)}(f(x))\left[\frac{d}{dy} P_s^{(\ell,0)}(y)\right]_{y=f(x)}\right)
\]
\[
= \frac{(1-x)^{-\frac{\ell-1}{\ell}}}{\ell}\Big((r+\ell+1)\, P_{r-1}^{(\ell+1,1)}(f(x))\, P_s^{(\ell,0)}(f(x)) + (s+\ell+1)\, P_r^{(\ell,0)}(f(x))\, P_{s-1}^{(\ell+1,1)}(f(x))\Big),
\]
and
\[
\hat{g}_{r,s}'(x) = g_{r,s}'(x)\,(1 - f(x)) - 2\, g_{r,s}(x)\, \frac{(1-x)^{-\frac{\ell-1}{\ell}}}{\ell}.
\]

By equation (ii) of Proposition 13, and the inequality $\binom{x}{y} \le \left(\frac{ex}{y}\right)^{y}$, $x, y \in \mathbb{N}$, $y \le x$,
\[
\max_{x \in [0,x_{j^*}]} |g_{r,s}'(x)| \le \frac{(1-x_{j^*})^{-\frac{\ell-1}{\ell}}}{\ell}\left((r+\ell+1)\binom{r+\ell}{\ell+1}\binom{s+\ell}{\ell} + (s+\ell+1)\binom{r+\ell}{\ell}\binom{s+\ell}{\ell+1}\right)
\]
\[
\le \frac{2(m+\ell)}{\ell}\left(\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell} - \frac{1}{k}\right)^{-\frac{\ell-1}{\ell}}\left(\frac{e(m+\ell-1)}{\ell}\right)^{2\ell+1},
\]

and

max๐‘ฅโˆˆ[0,๐‘ฅ๐‘—* ]

๐‘”โ€ฒ๐‘Ÿ,๐‘ (๐‘ฅ)

โ‰ค max

๐‘ฅโˆˆ[0,๐‘ฅ๐‘—* ]|๐‘”โ€ฒ๐‘Ÿ,๐‘ (๐‘ฅ)|+ 2

(1โˆ’ ๐‘ฅ๐‘—*)โˆ’โ„“โˆ’1โ„“

โ„“

(๐‘Ÿ + โ„“

โ„“

)(๐‘  + โ„“

โ„“

)

โ‰ค 4๐‘š + โ„“

โ„“

((โ„“ + 5/2

2๐‘š + 2โ„“โˆ’ 1

)2โ„“

โˆ’ 1

๐‘˜

)โˆ’ โ„“โˆ’1โ„“ (

๐‘’(๐‘š + โ„“โˆ’ 1)

โ„“

)2โ„“+1

.

Therefore, both $\max_{x \in [0, x_{j^*}]} |g'_{r,s}(x)|$ and $\max_{x \in [0, x_{j^*}]} |\hat{g}'_{r,s}(x)|$ are $o\!\left(m^{4.002\ell}\right)$. Then, by Proposition 12, equation (ii) of Proposition 13, and monotonicity on the interval $[x_{j^*}, 1]$, we have

$$\left| \frac{1}{k}\sum_{j=1}^{k} g_{r,s}(x_j) - \int_0^1 g_{r,s}(x)\,dx \right| \le \left| \frac{1}{k}\sum_{j=1}^{j^*} g_{r,s}(x_j) - \int_0^{x_{j^*}} g_{r,s}(x)\,dx \right| + \left| \frac{1}{k}\sum_{j=j^*+1}^{k} g_{r,s}(x_j) - \int_{x_{j^*}}^1 g_{r,s}(x)\,dx \right|$$

$$\le \frac{1+o(1)}{2k}\, m^{4.002\ell} + \frac{g_{r,s}(1)}{k} \le \frac{1+o(1)}{2m^{.002\ell}} + \frac{1}{k}\left( \frac{e(m+\ell+1)}{\ell+1} \right)^{2(\ell+1)} \le \frac{1+o(1)}{2m^{.002\ell}},$$

and, similarly,

$$\left| \frac{1}{k}\sum_{j=1}^{k} \hat{g}_{r,s}(x_j) - \int_0^1 \hat{g}_{r,s}(x)\,dx \right| \le \left| \frac{1}{k}\sum_{j=1}^{j^*} \hat{g}_{r,s}(x_j) - \int_0^{x_{j^*}} \hat{g}_{r,s}(x)\,dx \right| + \left| \frac{1}{k}\sum_{j=j^*+1}^{k} \hat{g}_{r,s}(x_j) - \int_{x_{j^*}}^1 \hat{g}_{r,s}(x)\,dx \right|$$

$$\le \frac{1+o(1)}{2m^{.002\ell}} + \frac{\hat{g}_{r,s}(x_{j^*})}{k} \le \frac{1+o(1)}{2m^{.002\ell}} + \frac{g_{r,s}(1)}{k} \le \frac{1+o(1)}{2m^{.002\ell}}.$$

Let us denote this upper bound by $M = (1+o(1))/(2m^{.002\ell})$. By using the substitution $x = 1 - \left( \frac{1-y}{2} \right)^\ell$, we have

$$\int_0^1 \hat{g}_{r,s}(x)\,dx = \frac{\ell}{2^\ell} \left\langle P_r^{(\ell,0)}, P_s^{(\ell,0)} \right\rangle_{\ell,0} \qquad \text{and} \qquad \int_0^1 g_{r,s}(x)\,dx = \frac{\ell}{2^\ell} \left\langle P_r^{(\ell,0)}, P_s^{(\ell,0)} \right\rangle_{\ell-1,0}.$$

Then

$$\max_{\substack{P \in \mathcal{P}_{m-1} \\ P \neq 0}} \frac{\sum_{j=1}^{k} P^2(f(x_j))}{\sum_{j=1}^{k} P^2(f(x_j))(1-f(x_j))} \le \max_{\substack{c \in \mathbb{R}^m \setminus \{0\} \\ |\epsilon_{r,s}| \le M, \; |\hat{\epsilon}_{r,s}| \le M}} \frac{\sum_{r,s=0}^{m-1} c_r c_s \left[ \frac{\ell}{2^\ell} \left\langle P_r^{(\ell,0)}, P_s^{(\ell,0)} \right\rangle_{\ell-1,0} + \epsilon_{r,s} \right]}{\sum_{r,s=0}^{m-1} c_r c_s \left[ \frac{\ell}{2^\ell} \left\langle P_r^{(\ell,0)}, P_s^{(\ell,0)} \right\rangle_{\ell,0} + \hat{\epsilon}_{r,s} \right]}.$$

Letting $\hat{c}_r = c_r \left\| P_r^{(\ell,0)} \right\|_{\ell,0}$, $r = 0, \ldots, m-1$, and noting that, by equation (i) of Proposition 13, $\left\| P_r^{(\ell,0)} \right\|_{\ell,0} = \left( \frac{2^{\ell+1}}{2r+\ell+1} \right)^{1/2}$, we obtain the bound

$$\max_{\substack{P \in \mathcal{P}_{m-1} \\ P \neq 0}} \frac{\sum_{j=1}^{k} P^2(f(x_j))}{\sum_{j=1}^{k} P^2(f(x_j))(1-f(x_j))} \le \max_{\substack{\hat{c} \in \mathbb{R}^m \setminus \{0\} \\ |\epsilon_{r,s}| \le \epsilon, \; |\hat{\epsilon}_{r,s}| \le \epsilon}} \frac{\sum_{r,s=0}^{m-1} \hat{c}_r \hat{c}_s \left[ \frac{\left\langle P_r^{(\ell,0)}, P_s^{(\ell,0)} \right\rangle_{\ell-1,0}}{\left\| P_r^{(\ell,0)} \right\|_{\ell,0} \left\| P_s^{(\ell,0)} \right\|_{\ell,0}} + \epsilon_{r,s} \right]}{\sum_{r=0}^{m-1} \hat{c}_r^{\,2} + \sum_{r,s=0}^{m-1} \hat{c}_r \hat{c}_s \, \hat{\epsilon}_{r,s}},$$

where

$$\epsilon = \frac{M(2m+\ell-1)}{\ell} = (1+o(1))\, \frac{2m+\ell-1}{2m^{.002\ell}\,\ell} = o(1/m).$$

Let ๐ต โˆˆ R๐‘šร—๐‘š be given by ๐ต(๐‘Ÿ, ๐‘ ) = โŸจ๐‘ƒ (โ„“,0)๐‘Ÿ , ๐‘ƒ

(โ„“,0)๐‘  โŸฉโ„“โˆ’1,0โ€–๐‘ƒ (โ„“,0)

๐‘Ÿ โ€–โˆ’1โ„“,0โ€–๐‘ƒ

(โ„“,0)๐‘  โ€–โˆ’1

โ„“,0 , ๐‘Ÿ, ๐‘  =

0, ...,๐‘šโˆ’ 1. By Proposition 10 applied to ๐œ”โ„“โˆ’1,0(๐‘ฅ) and equation (iv) of Proposition 13, we

have

max๐‘โˆˆR๐‘šโˆ–0

๐‘๐‘‡๐ต๐‘

๐‘๐‘‡ ๐‘โ‰ค 1

1โˆ’โˆš

1โˆ’(

โ„“+1/2๐‘š+โ„“โˆ’1/2

)2 โ‰ค 2

(๐‘š + โ„“โˆ’ 1/2

โ„“ + 1/2

)2

.

This implies that

$$\max_{\substack{P \in \mathcal{P}_{m-1} \\ P \neq 0}} \frac{\sum_{j=1}^{k} P^2(f(x_j))}{\sum_{j=1}^{k} P^2(f(x_j))(1-f(x_j))} \le \max_{c \in \mathbb{R}^m \setminus \{0\}} \frac{c^T \left( B + \epsilon \mathbf{1}\mathbf{1}^T \right) c}{c^T \left( I - \epsilon \mathbf{1}\mathbf{1}^T \right) c} \le \frac{1}{1-\epsilon m} \max_{c \in \mathbb{R}^m \setminus \{0\}} \left[ \frac{c^T B c}{c^T c} \right] + \frac{\epsilon m}{1-\epsilon m} \le \frac{2}{1-\epsilon m} \left( \frac{m+\ell-1/2}{\ell+1/2} \right)^2 + \frac{\epsilon m}{1-\epsilon m},$$

and that, with probability $1 - o(1/n)$,

$$\min_{\substack{P \in \mathcal{P}_{m-1} \\ P \neq 0}} \frac{\sum_{j=1}^{k} Y_j P^2(f(x_j))(1-f(x_j))}{\sum_{j=1}^{k} Y_j P^2(f(x_j))} \ge (1-o(1))\,\frac{.998}{4} \left( \frac{\ell+1/2}{m+\ell-1/2} \right)^2 \ge \frac{.015 \ln^2 n}{m^2 \ln^2 \ln n}.$$

This completes the proof.

In the previous proof, a number of very small constants were used, exclusively for the purpose of obtaining an estimate with constant as close to 1/64 as possible. The constants used in the definition of $\ell$ and $k$ can be replaced by more practical numbers (that begin to exhibit asymptotic convergence for reasonably sized $n$), at the cost of a constant worse than $.015$. This completes the analysis of asymptotic lower bounds.

We now move to upper bounds for relative error in the $p$-norm. Our estimate for relative error in the one-norm is of the same order as the estimate (5.4), but with an improved constant. Our technique for obtaining these estimates differs from the technique of [67] in one key way. Rather than integrating first by $b_1$ and using properties of the arctan function, we replace the ball $B^n$ by $n$ chi-square random variables on $[0,\infty)^n$, and iteratively apply Cauchy-Schwarz to our relative error until we obtain an exponent $p$ for which the inverse chi-square distribution with one degree of freedom has a convergent $p^{\text{th}}$ moment.
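The moment fact driving this approach is easy to check numerically. The sketch below (an illustration, not part of the argument) computes $\mathbb{E}[Y^{-1/4}]$ for $Y \sim \chi_1^2$ by quadrature and compares it with the closed form $\Gamma(1/4)/\left(2^{1/4}\Gamma(1/2)\right)$, the constant that appears in the proof of Lemma 22 below; the function name and the substitution used are my own.

```python
import math

def inv_quarter_moment_chi2_1(upper=12.0, steps=120_000):
    """E[Y^(-1/4)] for Y ~ chi^2 with one degree of freedom, computed by
    trapezoidal quadrature after the substitution y = u^4, which removes
    the integrable singularity of the integrand at the origin:
    E[Y^(-1/4)] = (1/(sqrt(2) Gamma(1/2))) * int_0^inf y^(-3/4) e^(-y/2) dy
                = (4/(sqrt(2) Gamma(1/2))) * int_0^inf e^(-u^4/2) du."""
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-upper**4 / 2))
    for k in range(1, steps):
        u = k * h
        total += math.exp(-u**4 / 2)
    integral = total * h
    return 4.0 * integral / (math.sqrt(2.0) * math.gamma(0.5))

# Closed form used in the proof of Lemma 22.
exact = math.gamma(0.25) / (2**0.25 * math.gamma(0.5))
approx = inv_quarter_moment_chi2_1()
```

The finiteness of this moment is exactly why the Cauchy-Schwarz iteration stops once the exponent on the inverse chi-square variable reaches $1/4$.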

Lemma 22. Let $n \ge 100$, $m \ge 10$, and $p \ge 1$. Then

$$\max_{A \in \mathcal{S}^n} \; \mathbb{E}_{b \sim \mathcal{U}(S^{n-1})} \left[ \left( \frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \right)^p \right]^{\frac{1}{p}} \le .068\, \frac{\ln^2\left( n (m-1)^{8p} \right)}{(m-1)^2}.$$

Proof. Without loss of generality, suppose $\lambda_1(A) = 1$ and $\lambda_n(A) = 0$. By repeated application of Cauchy-Schwarz,

$$\frac{\sum_{i=1}^{n} y_i Q^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \le \left[ \frac{\sum_{i=1}^{n} y_i Q^2(\lambda_i)(1-\lambda_i)^2}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right]^{\frac{1}{2}} \le \cdots \le \left[ \frac{\sum_{i=1}^{n} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right]^{\frac{1}{2^q}}$$

for $q \in \mathbb{N}$. Choosing $q$ to satisfy $2p < 2^q \le 4p$, and using equation (5.9) with polynomial normalization $Q(1) = 1$, we have

$$\mathbb{E} \left[ \left( \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \right)^p \right]^{\frac{1}{p}} \le \min_{Q \in \mathcal{P}_{m-1}(1)} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i=2}^{n} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}}$$

$$\le \min_{Q \in \mathcal{P}_{m-1}(1)} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i < \beta} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}} + \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i \ge \beta} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}}$$

for any $\beta \in (0,1)$. The integrand of the first term satisfies

$$\left( \frac{\sum_{i:\lambda_i < \beta} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{p}{2^q}} \le \left( \frac{\sum_{i:\lambda_i < \beta} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{1}{4}} \le \max_{x \in [0,\beta]} |Q(x)|^{1/2} (1-x)^{2^{q-2}} \left( \frac{\sum_{i:\lambda_i < \beta} y_i}{y_1} \right)^{\frac{1}{4}},$$

and the second term is always bounded above by $\max_{\lambda \in [\beta,1]} (1-\lambda) = 1-\beta$. We replace the minimizing polynomial in $\mathcal{P}_{m-1}(1)$ by $Q(x) = \frac{T_{m-1}\left( \frac{2}{\beta}x - 1 \right)}{T_{m-1}\left( \frac{2}{\beta} - 1 \right)}$, where $T_{m-1}(\cdot)$ is the Chebyshev polynomial of the first kind. The Chebyshev polynomials $T_{m-1}(\cdot)$ are bounded by one in magnitude on the interval $[-1,1]$, and this bound is tight at the endpoints. Therefore, our maximum is achieved at $x = 0$, and

$$\max_{x \in [0,\beta]} |Q(x)|^{1/2} (1-x)^{2^{q-2}} = T_{m-1}\left( \frac{2}{\beta} - 1 \right)^{-1/2}.$$

By the definition $T_{m-1}(x) = \frac{1}{2}\left( \left( x - \sqrt{x^2-1} \right)^{m-1} + \left( x + \sqrt{x^2-1} \right)^{m-1} \right)$, $|x| \ge 1$, and the standard inequality $e^{2x} \le \frac{1+x}{1-x}$, $x \in [0,1]$,

$$T_{m-1}\left( \frac{2}{\beta} - 1 \right) \ge \frac{1}{2} \left( \frac{2}{\beta} - 1 + \sqrt{\left( \frac{2}{\beta} - 1 \right)^2 - 1} \right)^{m-1} = \frac{1}{2} \left( \frac{1 + \sqrt{1-\beta}}{1 - \sqrt{1-\beta}} \right)^{m-1} \ge \frac{1}{2} \exp\left( 2\sqrt{1-\beta}\,(m-1) \right).$$

In addition,

$$\int_{[0,\infty)^n} \left( \frac{\sum_{i=2}^{n} y_i}{y_1} \right)^{1/4} f_Y(y)\,dy = \frac{\Gamma(n/2 - 1/4)\,\Gamma(1/4)}{\Gamma(n/2 - 1/2)\,\Gamma(1/2)} \le \frac{\Gamma(1/4)}{2^{1/4}\,\Gamma(1/2)}\, n^{1/4},$$

which gives us

$$\mathbb{E} \left[ \left( \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \right)^p \right]^{\frac{1}{p}} \le \left[ \frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)}\, n^{1/4} \right]^{1/p} e^{-\gamma(m-1)/p} + \gamma^2,$$

where $\gamma = \sqrt{1-\beta}$. Setting $\gamma = \frac{p}{m-1} \ln\left( n^{1/4p}(m-1)^2 \right)$ (assuming $\gamma < 1$, otherwise our bound is already greater than one, and trivially holds), we obtain

$$\mathbb{E} \left[ \left( \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \right)^p \right]^{\frac{1}{p}} \le \frac{\left( \frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)} \right)^{1/p} + p^2 \ln^2\left( n^{1/4p}(m-1)^2 \right)}{(m-1)^2} = \frac{\left( \frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)} \right)^{1/p} + \frac{1}{16} \ln^2\left( n(m-1)^{8p} \right)}{(m-1)^2} \le .068\, \frac{\ln^2\left( n(m-1)^{8p} \right)}{(m-1)^2},$$

for $m \ge 10$, $n \ge 100$. The constant of $.068$ is produced by upper bounding $\frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)}$ by a constant¹ times $\ln^2\left[ n(m-1)^{8p} \right]$. This completes the proof.

A similar proof, paired with probabilistic bounds on the quantity $\sum_{i=2}^{n} Y_i / Y_1$, where $Y_1, \ldots, Y_n \sim \chi_1^2$, gives a probabilistic upper estimate. Combining the lower bounds in Lemmas 20 and 21 with the upper bound of Lemma 22 completes the proof of Theorem 19.

¹The best constant in this case is given by the ratio of $\frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)}$ to the minimum value of $\ln^2\left[ n(m-1)^{8p} \right]$ over $m \ge 10$, $n \ge 100$.

5.4 Distribution Dependent Bounds

In this section, we consider improved estimates for matrices with certain specific properties. First, we show that if a matrix $A$ has a reasonable number of eigenvalues near $\lambda_1(A)$, then we can produce an error estimate that depends only on the number of iterations $m$. The intuition is that if there are not too few eigenvalues near the maximum eigenvalue, then the initial vector has a large inner product with a vector with Rayleigh quotient close to the maximum eigenvalue. In particular, we suppose that the eigenvalues of our matrix $A$ are such that, once scaled, there are at least $n/(m-1)^\alpha$ eigenvalues in the range $[\beta, 1]$, for a specific value of $\beta$ satisfying $1 - \beta = O\left( m^{-2} \ln^2 m \right)$. For a large number of matrices for which the Lanczos method is used, this assumption holds true. Under this assumption, we prove the following error estimate.

Theorem 20. Let $A \in \mathcal{S}^n$, $m \ge 10$, $p \ge 1$, $\alpha > 0$, and $n \ge m(m-1)^\alpha$. If

$$\# \left\{ \lambda_i(A) \;\middle|\; \frac{\lambda_1(A) - \lambda_i(A)}{\lambda_1(A) - \lambda_n(A)} \le \left( \frac{(2p + \alpha/4)\ln(m-1)}{m-1} \right)^2 \right\} \ge \frac{n}{(m-1)^\alpha},$$

then

$$\mathbb{E}_{b \sim \mathcal{U}(S^{n-1})} \left[ \left( \frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \right)^p \right]^{\frac{1}{p}} \le .077\, \frac{(2p + \alpha/4)^2 \ln^2(m-1)}{(m-1)^2}.$$

In addition, if

$$\# \left\{ \lambda_i(A) \;\middle|\; \frac{\lambda_1(A) - \lambda_i(A)}{\lambda_1(A) - \lambda_n(A)} \le \left( \frac{(\alpha+2)\ln(m-1)}{4(m-1)} \right)^2 \right\} \ge \frac{n}{(m-1)^\alpha},$$

then

$$\mathbb{P}_{b \sim \mathcal{U}(S^{n-1})} \left[ \frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \le .126\, \frac{(\alpha+2)^2 \ln^2(m-1)}{(m-1)^2} \right] \ge 1 - O(e^{-m}).$$

Proof. We begin by bounding expected relative error. The main idea is to proceed as in the proof of Lemma 22, but take advantage of the number of eigenvalues near $\lambda_1$. For simplicity, let $\lambda_1 = 1$ and $\lambda_n = 0$. We consider eigenvalues in the ranges $[0, 2\beta-1)$ and $[2\beta-1, 1]$ separately, $1/2 < \beta < 1$, and then make use of the lower bound for the number of eigenvalues in $[\beta, 1]$. From the proof of Lemma 22, we have

$$\mathbb{E} \left[ \left( \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \right)^p \right]^{\frac{1}{p}} \le \min_{Q \in \mathcal{P}_{m-1}(1)} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i < 2\beta-1} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}} + \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i \ge 2\beta-1} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}},$$

where $q \in \mathbb{N}$, $2p < 2^q \le 4p$, $f_Y(y)$ is given by (5.8), and $\beta \in (1/2, 1)$. The second term is at most $2(1-\beta)$, and the integrand in the first term is bounded above by

$$\left( \frac{\sum_{i:\lambda_i < 2\beta-1} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{p}{2^q}} \le \left( \frac{\sum_{i:\lambda_i < 2\beta-1} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^{n} y_i Q^2(\lambda_i)} \right)^{\frac{1}{4}} \le \left( \frac{\sum_{i:\lambda_i < 2\beta-1} y_i Q^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i:\lambda_i \ge \beta} y_i Q^2(\lambda_i)} \right)^{\frac{1}{4}}$$

$$\le \frac{\max_{x \in [0, 2\beta-1]} |Q(x)|^{1/2} (1-x)^{2^{q-2}}}{\min_{x \in [\beta,1]} |Q(x)|^{1/2}} \left( \frac{\sum_{i:\lambda_i < 2\beta-1} y_i}{\sum_{i:\lambda_i \ge \beta} y_i} \right)^{\frac{1}{4}}.$$

Let $\sqrt{1-\beta} = \frac{(2p + \alpha/4)\ln(m-1)}{m-1} < 1/4$ (if $\beta \le 1/2$, then our estimate is a trivial one, and we are already done). By the condition of the theorem, there are at least $n/(m-1)^\alpha$ eigenvalues in the interval $[\beta, 1]$. Therefore,

$$\int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i < 2\beta-1} y_i}{\sum_{i:\lambda_i \ge \beta} y_i} \right)^{1/4} f_Y(y)\,dy \le \mathbb{E}_{Y \sim \chi^2_n}\left[ Y^{1/4} \right] \, \mathbb{E}_{Y \sim \chi^2_{\lceil n/(m-1)^\alpha \rceil}}\left[ Y^{-1/4} \right] = \frac{\Gamma(n/2 + 1/4)\,\Gamma(\lceil n/(m-1)^\alpha \rceil/2 - 1/4)}{\Gamma(n/2)\,\Gamma(\lceil n/(m-1)^\alpha \rceil/2)}$$

$$\le \frac{\Gamma(m(m-1)^\alpha/2 + 1/4)\,\Gamma(m/2 - 1/4)}{\Gamma(m(m-1)^\alpha/2)\,\Gamma(m/2)} \le 1.04\,(m-1)^{\alpha/4}$$

for $n \ge m(m-1)^\alpha$ and $m \ge 10$. Replacing the minimizing polynomial by $Q(x) = \frac{T_{m-1}\left( \frac{2x}{2\beta-1} - 1 \right)}{T_{m-1}\left( \frac{2}{2\beta-1} - 1 \right)}$, we obtain

$$\frac{\max_{x \in [0, 2\beta-1]} |Q(x)|^{1/2} (1-x)^{2^{q-2}}}{\min_{x \in [\beta,1]} |Q(x)|^{1/2}} = \frac{1}{T^{1/2}_{m-1}\left( \frac{2\beta}{2\beta-1} - 1 \right)} \le \frac{1}{T^{1/2}_{m-1}\left( \frac{2}{\beta} - 1 \right)} \le \sqrt{2}\, e^{-\sqrt{1-\beta}\,(m-1)}.$$

Combining our estimates results in the bound

$$\mathbb{E} \left[ \left( \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \right)^p \right]^{\frac{1}{p}} \le \left( 1.04\sqrt{2}\,(m-1)^{\alpha/4} \right)^{1/p} e^{-\sqrt{1-\beta}\,(m-1)/p} + 2(1-\beta)$$

$$= \frac{\left( 1.04\sqrt{2} \right)^{1/p}}{(m-1)^2} + \frac{(2p + \alpha/4)^2 \ln^2(m-1)}{(m-1)^2} \le .077\, \frac{(2p + \alpha/4)^2 \ln^2(m-1)}{(m-1)^2}$$

for ๐‘š โ‰ฅ 10, ๐‘ โ‰ฅ 1, and ๐›ผ > 0. This completes the bound for expected relative error. We

177

now produce a probabilistic bound for relative error. Letโˆš

1โˆ’ ๐›ฝ = (๐›ผ+2) ln(๐‘šโˆ’1)4(๐‘šโˆ’1)

. We have

๐œ†1 โˆ’ ๐œ†(๐‘š)1

๐œ†1 โˆ’ ๐œ†๐‘›

= min๐‘„โˆˆ๐’ซ๐‘šโˆ’1(1)

โˆ‘๐‘›๐‘–=2 ๐‘Œ๐‘–๐‘„

2(๐œ†๐‘–)(1โˆ’ ๐œ†๐‘–)โˆ‘๐‘›๐‘–=1 ๐‘Œ๐‘–๐‘„2(๐œ†๐‘–)

โ‰ค min๐‘„โˆˆ๐’ซ๐‘šโˆ’1(1)

max๐‘ฅโˆˆ[0,2๐›ฝโˆ’1]๐‘„2(๐‘ฅ)(1โˆ’ ๐‘ฅ)

min๐‘ฅโˆˆ[๐›ฝ,1]๐‘„2(๐‘ฅ)

โˆ‘๐‘–:๐œ†๐‘–<2๐›ฝโˆ’1 ๐‘Œ๐‘–โˆ‘๐‘–:๐œ†๐‘–โ‰ฅ๐›ฝ ๐‘Œ๐‘–

+ 2(1โˆ’ ๐›ฝ)

โ‰ค ๐‘‡โˆ’2๐‘šโˆ’1

(2

๐›ฝโˆ’ 1

)โˆ‘๐‘–:๐œ†๐‘–<2๐›ฝโˆ’1 ๐‘Œ๐‘–โˆ‘๐‘–:๐œ†๐‘–โ‰ฅ๐›ฝ ๐‘Œ๐‘–

+ 2(1โˆ’ ๐›ฝ)

โ‰ค 4 expโˆ’4โˆš

1โˆ’ ๐›ฝ(๐‘šโˆ’ 1)โˆ‘

๐‘–:๐œ†๐‘–<2๐›ฝโˆ’1 ๐‘Œ๐‘–โˆ‘๐‘–:๐œ†๐‘–โ‰ฅ๐›ฝ ๐‘Œ๐‘–

+ 2(1โˆ’ ๐›ฝ).

By Proposition 11,

$$\mathbb{P} \left[ \frac{\sum_{i:\lambda_i < 2\beta-1} Y_i}{\sum_{i:\lambda_i \ge \beta} Y_i} \ge 4(m-1)^\alpha \right] \le \mathbb{P} \left[ \sum_{i:\lambda_i < 2\beta-1} Y_i \ge 2n \right] + \mathbb{P} \left[ \sum_{i:\lambda_i \ge \beta} Y_i \le \frac{n}{2(m-1)^\alpha} \right] \le (2/e)^{n/2} + \left( \sqrt{e}/2 \right)^{\frac{n}{2(m-1)^\alpha}} = O(e^{-m}).$$

Then, with probability $1 - O(e^{-m})$,

$$\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \le 16(m-1)^\alpha e^{-4\sqrt{1-\beta}\,(m-1)} + 2(1-\beta) = \frac{16}{(m-1)^2} + \frac{(\alpha+2)^2 \ln^2(m-1)}{8(m-1)^2}.$$

The $16/(m-1)^2$ term is dominated by the log term as $m$ increases, and, therefore, with probability $1 - O(e^{-m})$,

$$\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \le .126\, \frac{(\alpha+2)^2 \ln^2(m-1)}{(m-1)^2}.$$

This completes the proof.

The above theorem shows that, for matrices whose distribution of eigenvalues is independent of $n$, we can obtain dimension-free estimates. For example, the above theorem holds for the matrix from Example 1, for $\alpha = 2$.

When a matrix has eigenvalues known to converge to a limiting distribution as the dimension increases, or a random matrix $X_n$ exhibits suitable convergence of its empirical spectral distribution $L_{X_n} := \frac{1}{n}\sum_{i=1}^{n} \delta_{\lambda_i(X_n)}$, improved estimates can be obtained by simply estimating the corresponding integral polynomial minimization problem. However, to do so, we first require a law of large numbers for weighted sums of independent identically distributed (i.i.d.) random variables. We recall the following result.

Proposition 14 ([25]). Let $a_1, a_2, \ldots \in [a,b]$ and $X_1, X_2, \ldots$ be i.i.d. random variables, with $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] < \infty$. Then $\frac{1}{n}\sum_{i=1}^{n} a_i X_i \to 0$ almost surely.

We present the following theorem regarding the error of random matrices that exhibit

suitable convergence.

Theorem 21. Let $X_n \in \mathcal{S}^n$, $\Lambda(X_n) \subset [a,b]$, $n = 1, 2, \ldots$ be a sequence of random matrices, such that $L_{X_n}$ converges in probability to $\sigma(x)\,dx$ in $L^2([a,b])$, where $\sigma(x) \in C([a,b])$, $a, b \in \operatorname{supp}(\sigma)$. Then, for all $m \in \mathbb{N}$ and $\epsilon > 0$,

$$\lim_{n \to \infty} \mathbb{P}\left( \left| \frac{\lambda_1(X_n) - \lambda_1^{(m)}(X_n, b)}{\lambda_1(X_n) - \lambda_n(X_n)} - \frac{b - \xi^{(m)}}{b - a} \right| > \epsilon \right) = 0,$$

where $\xi^{(m)}$ is the largest zero of the $m^{th}$ degree orthogonal polynomial of $\sigma(x)$ in the interval $[a,b]$.

Proof. The main idea of the proof is to use Proposition 14 to control the behavior of $Y$, and the convergence of $L_{X_n}$ to $\sigma(x)\,dx$ to show convergence to our integral minimization problem. We first write our polynomial $P \in \mathcal{P}_{m-1}$ as $P(x) = \sum_{j=0}^{m-1} c_j x^j$ and our unnormalized error as

$$\lambda_1(X_n) - \lambda_1^{(m)}(X_n, b) = \min_{\substack{P \in \mathcal{P}_{m-1} \\ P \neq 0}} \frac{\sum_{i=2}^{n} Y_i P^2(\lambda_i)(\lambda_1 - \lambda_i)}{\sum_{i=1}^{n} Y_i P^2(\lambda_i)} = \min_{\substack{c \in \mathbb{R}^m \\ c \neq 0}} \frac{\sum_{j_1, j_2 = 0}^{m-1} c_{j_1} c_{j_2} \sum_{i=2}^{n} Y_i \lambda_i^{j_1+j_2}(\lambda_1 - \lambda_i)}{\sum_{j_1, j_2 = 0}^{m-1} c_{j_1} c_{j_2} \sum_{i=1}^{n} Y_i \lambda_i^{j_1+j_2}},$$

where $Y_1, \ldots, Y_n$ are i.i.d. chi-square random variables with one degree of freedom each. The functions $x^j$, $j = 0, \ldots, 2m-2$, are bounded on $[a,b]$, and so, by Proposition 14, for

any $\epsilon_1, \epsilon_2 > 0$,

$$\left| \frac{1}{n}\sum_{i=2}^{n} Y_i \lambda_i^j (\lambda_1 - \lambda_i) - \frac{1}{n}\sum_{i=2}^{n} \lambda_i^j (\lambda_1 - \lambda_i) \right| < \epsilon_1, \qquad j = 0, \ldots, 2m-2,$$

and

$$\left| \frac{1}{n}\sum_{i=1}^{n} Y_i \lambda_i^j - \frac{1}{n}\sum_{i=1}^{n} \lambda_i^j \right| < \epsilon_2, \qquad j = 0, \ldots, 2m-2,$$

with probability $1 - o(1)$. $L_{X_n}$ converges in probability to $\sigma(x)\,dx$, and so, for any $\epsilon_3, \epsilon_4 > 0$,

$$\left| \frac{1}{n}\sum_{i=2}^{n} \lambda_i^j (\lambda_1 - \lambda_i) - \int_a^b x^j (b-x)\,\sigma(x)\,dx \right| < \epsilon_3, \qquad j = 0, \ldots, 2m-2,$$

and

$$\left| \frac{1}{n}\sum_{i=1}^{n} \lambda_i^j - \int_a^b x^j \sigma(x)\,dx \right| < \epsilon_4, \qquad j = 0, \ldots, 2m-2,$$

with probability $1 - o(1)$. This implies that

$$\lambda_1(X_n) - \lambda_1^{(m)}(X_n, b) = \min_{\substack{c \in \mathbb{R}^m \\ c \neq 0}} \frac{\sum_{j_1, j_2 = 0}^{m-1} c_{j_1} c_{j_2} \left[ \int_a^b x^{j_1+j_2}(b-x)\,\sigma(x)\,dx + \mathcal{E}(j_1, j_2) \right]}{\sum_{j_1, j_2 = 0}^{m-1} c_{j_1} c_{j_2} \left[ \int_a^b x^{j_1+j_2}\,\sigma(x)\,dx + E(j_1, j_2) \right]},$$

where $|\mathcal{E}(j_1, j_2)| < \epsilon_1 + \epsilon_3$ and $|E(j_1, j_2)| < \epsilon_2 + \epsilon_4$, $j_1, j_2 = 0, \ldots, m-1$, with probability $1 - o(1)$.

By the linearity of integration, the minimization problem

$$\min_{\substack{P \in \mathcal{P}_{m-1} \\ P \neq 0}} \frac{\int_a^b P^2(x) \left( \frac{b-x}{b-a} \right) \sigma(x)\,dx}{\int_a^b P^2(x)\,\sigma(x)\,dx},$$

when rewritten in terms of the polynomial coefficients $c_i$, $i = 0, \ldots, m-1$, corresponds to a generalized Rayleigh quotient $\frac{c^T A c}{c^T B c}$, where $A, B \in \mathcal{S}^m_{++}$ and $\lambda_{max}(A)$, $\lambda_{max}(B)$, and $\lambda_{min}(B)$ are all constants independent of $n$, and $c = (c_0, \ldots, c_{m-1})^T$. By choosing $\epsilon_1, \epsilon_2, \epsilon_3, \epsilon_4$ sufficiently small,

$$\left| \frac{c^T (A + \mathcal{E}) c}{c^T (B + E) c} - \frac{c^T A c}{c^T B c} \right| = \left| \frac{(c^T \mathcal{E} c)\, c^T B c - (c^T E c)\, c^T A c}{(c^T B c + c^T E c)\, c^T B c} \right| \le \frac{(\epsilon_1 + \epsilon_3)\, m\, \lambda_{max}(B) + (\epsilon_2 + \epsilon_4)\, m\, \lambda_{max}(A)}{(\lambda_{min}(B) - (\epsilon_2 + \epsilon_4) m)\, \lambda_{min}(B)} \le \epsilon$$

for all $c \in \mathbb{R}^m$ with probability $1 - o(1)$. Applying Proposition 10 to the above integral minimization problem completes the proof.

The above theorem is a powerful tool for explicitly computing the error in the Lanczos method for certain types of matrices, as the computation of the extremal eigenvalue of an $m \times m$ matrix (corresponding to the largest zero) is a nominal computation compared to one application of an $n$ dimensional matrix. In Section 6, we will see that this convergence occurs quickly in practice. In addition, the above result provides strong evidence that the inverse quadratic dependence on $m$ in the error estimates throughout this chapter is not so much a worst case estimate, but actually indicative of error rates in practice. For instance, if the eigenvalues of a matrix are sampled from a distribution bounded above and below by some multiple of a Jacobi weight function, then, by equation (iv) of Proposition 13 and Theorem 21, it immediately follows that the error is of order $m^{-2}$. Of course, we note that Theorem 21 is equally applicable for estimating $\lambda_n$.
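As an illustration of how $\xi^{(m)}$ can be computed in practice, the following sketch (hypothetical code; the helper name and the choice of the semicircle density as $\sigma$ are my own) forms the $m \times m$ symmetric tridiagonal Jacobi matrix of the three-term recurrence for the orthogonal polynomials of $\sigma$ and takes its largest eigenvalue, which is the largest zero $\xi^{(m)}$ (the Golub-Welsch construction); the limiting error in Theorem 21 is then $(b - \xi^{(m)})/(b-a)$.

```python
import math
import numpy as np

def largest_zero(alpha, beta):
    """Largest zero of the degree-m orthogonal polynomial with monic
    three-term recurrence p_{k+1} = (x - alpha_k) p_k - beta_k p_{k-1}:
    it equals the largest eigenvalue of the Jacobi matrix (Golub-Welsch)."""
    J = np.diag(np.asarray(alpha, dtype=float))
    off = np.sqrt(np.asarray(beta, dtype=float))  # beta_1, ..., beta_{m-1}
    J += np.diag(off, 1) + np.diag(off, -1)
    return float(np.linalg.eigvalsh(J).max())

# Example: sigma(x) = (2/pi) sqrt(1 - x^2) on [-1, 1] (semicircle density).
# Its orthogonal polynomials are the Chebyshev polynomials of the second
# kind, with alpha_k = 0 and beta_k = 1/4, so xi^(m) = cos(pi / (m + 1)).
m = 8
xi = largest_zero([0.0] * m, [0.25] * (m - 1))
limiting_error = (1.0 - xi) / 2.0   # (b - xi^(m)) / (b - a) with [a, b] = [-1, 1]
```

For the semicircle example this predicts a limiting relative error of roughly $0.03$ at $m = 8$, consistent with the $m^{-2}$ behavior discussed above.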

5.5 Estimates for Arbitrary Eigenvalues and Condition Number

Up to this point, we have concerned ourselves almost exclusively with the extremal eigenvalues $\lambda_1$ and $\lambda_n$ of a matrix. In this section, we extend the techniques of this chapter to arbitrary eigenvalues, and also obtain bounds for the condition number of a positive definite matrix. The results of this section provide the first uniform error estimates for arbitrary eigenvalues and the first lower bounds for the relative error with respect to the condition number. Lower bounds for arbitrary eigenvalues follow relatively quickly from our previous work. However, our proof technique for upper bounds requires some mild assumptions regarding the eigenvalue gaps of the matrix. We begin with asymptotic lower bounds for an arbitrary eigenvalue, and present the following corollary of Theorem 19.

Corollary 3.

$$\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})} \left[ \frac{\lambda_i(A) - \lambda_i^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge 1 - o(1) \right] \ge 1 - o(1/n)$$

for $m = o(\ln n)$,

$$\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})} \left[ \frac{\lambda_i(A) - \lambda_i^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{.015 \ln^2 n}{m^2 \ln^2 \ln n} \right] \ge 1 - o(1/n)$$

for $m = \Theta(\ln n)$, and

$$\max_{A \in \mathcal{S}^n} \; \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})} \left[ \frac{\lambda_i(A) - \lambda_i^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{1.08}{m^2} \right] \ge 1 - o(1/n)$$

for $m = o\left( n^{1/2} \ln^{-1/2} n \right)$ and $\omega(1)$.

Proof. Let $\phi_1, \ldots, \phi_n$ be orthonormal eigenvectors corresponding to $\lambda_1(A) \ge \cdots \ge \lambda_n(A)$, and let $\tilde{b} = b - \sum_{j=1}^{i-1} \left( \phi_j^T b \right) \phi_j$. By the inequalities

$$\lambda_i^{(m)}(A, b) \le \max_{\substack{x \in \mathcal{K}_m(A, b) \setminus \{0\} \\ x^T \phi_j = 0, \; j = 1, \ldots, i-1}} \frac{x^T A x}{x^T x} \le \max_{x \in \mathcal{K}_m(A, \tilde{b}) \setminus \{0\}} \frac{x^T A x}{x^T x} = \lambda_1^{(m)}\left( A, \tilde{b} \right),$$

the relative error satisfies

$$\frac{\lambda_i(A) - \lambda_i^{(m)}(A, b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{\lambda_i(A) - \lambda_1^{(m)}\left( A, \tilde{b} \right)}{\lambda_1(A) - \lambda_n(A)}.$$

The right-hand side corresponds to an extremal eigenvalue problem of dimension $n - i + 1$. Setting the largest eigenvalue of $A$ to have multiplicity $i$, and defining the eigenvalues $\lambda_i, \ldots, \lambda_n$ based on the eigenvalues (corresponding to dimension $n - i + 1$) used in the proofs of Lemmas 20 and 21 completes the proof.

Next, we provide an upper bound for the relative error in approximating $\lambda_i$ under the assumption of non-zero gaps between eigenvalues $\lambda_1, \ldots, \lambda_i$.

Theorem 22. Let $n \ge 100$, $m \ge 9 + i$, $p \ge 1$, and $A \in \mathcal{S}^n$. Then

$$\mathbb{E}_{b \sim \mathcal{U}(S^{n-1})} \left[ \left( \frac{\lambda_i(A) - \lambda_i^{(m)}(A,b)}{\lambda_i(A) - \lambda_n(A)} \right)^p \right]^{1/p} \le .068\, \frac{\ln^2\left( \delta^{-2(i-1)} n (m-i)^{8p} \right)}{(m-i)^2}$$

and

$$\mathbb{P}_{b \sim \mathcal{U}(S^{n-1})} \left[ \frac{\lambda_i(A) - \lambda_i^{(m)}(A,b)}{\lambda_i(A) - \lambda_n(A)} \le .571\, \frac{\ln^2\left( \delta^{-2(i-1)/3} n (m-i)^{2/3} \right)}{(m-i)^2} \right] \ge 1 - o(1/n),$$

where

$$\delta = \frac{1}{2} \min_{k = 2, \ldots, i} \frac{\lambda_{k-1}(A) - \lambda_k(A)}{\lambda_1(A) - \lambda_n(A)}.$$

Proof. As in previous cases, it suffices to prove the theorem for matrices $A$ with $\lambda_i(A) = 1$ and $\lambda_n(A) = 0$ (if $\lambda_i = \lambda_n$, we are done). Similar to the polynomial representation of $\lambda_1^{(m)}(A,b)$, the Ritz value $\lambda_i^{(m)}(A,b)$ corresponds to finding the polynomial in $\mathcal{P}_{m-1}$ with zeros $\lambda_k^{(m)}$, $k = 1, \ldots, i-1$, that maximizes the corresponding Rayleigh quotient. For the sake of brevity, let $\varphi_i(x) = \prod_{k=1}^{i-1} \left( \lambda_k^{(m)} - x \right)^2$. Then $\lambda_i^{(m)}(A,b)$ can be written as

$$\lambda_i^{(m)}(A,b) = \max_{P \in \mathcal{P}_{m-i}} \frac{\sum_{j=1}^{n} b_j^2 P^2(\lambda_j)\,\varphi_i(\lambda_j)\,\lambda_j}{\sum_{j=1}^{n} b_j^2 P^2(\lambda_j)\,\varphi_i(\lambda_j)},$$

and, therefore, the error is bounded above by

$$\lambda_i(A) - \lambda_i^{(m)}(A,b) \le \min_{P \in \mathcal{P}_{m-i}} \frac{\sum_{j=i+1}^{n} b_j^2 P^2(\lambda_j)\,\varphi_i(\lambda_j)(1 - \lambda_j)}{\sum_{j=1}^{n} b_j^2 P^2(\lambda_j)\,\varphi_i(\lambda_j)}.$$

The main idea of the proof is very similar to that of Lemma 22, paired with a pigeonhole principle. The intervals $[\lambda_j(A) - \delta\lambda_1, \lambda_j(A) + \delta\lambda_1]$, $j = 1, \ldots, i$, are disjoint, and so there exists some index $j^*$ such that the corresponding interval does not contain any of the Ritz values $\lambda_k^{(m)}(A,b)$, $k = 1, \ldots, i-1$. We begin by bounding expected relative error. As in the proof of Lemma 22, by integrating over chi-square random variables and using Cauchy-Schwarz, we have

$$\mathbb{E}\left[ \left( \lambda_i - \lambda_i^{(m)} \right)^p \right]^{\frac{1}{p}} \le \min_{P \in \mathcal{P}_{m-i}} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{j=i+1}^{n} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^{n} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}}$$

$$\le \min_{P \in \mathcal{P}_{m-i}} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{j:\lambda_j < \beta} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^{n} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}} + \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{j:\lambda_j \in [\beta,1]} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^{n} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)} \right)^{\frac{p}{2^q}} f_Y(y)\,dy \right]^{\frac{1}{p}}$$

for $q \in \mathbb{N}$, $2p < 2^q \le 4p$, and any $\beta \in (0,1)$. The second term on the right-hand side is bounded above by $1 - \beta$ (the integrand is at most $\max_{\lambda \in [\beta,1]}(1-\lambda) = 1-\beta$), and the integrand in the first term is bounded above by

$$\left( \frac{\sum_{j:\lambda_j < \beta} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^{n} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)} \right)^{\frac{p}{2^q}} \le \left( \frac{\sum_{j:\lambda_j < \beta} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^{n} y_j P^2(\lambda_j)\,\varphi_i(\lambda_j)} \right)^{\frac{1}{4}} \le \frac{\max_{x \in [0,\beta]} |P(x)|^{\frac{1}{2}}\,\varphi_i^{\frac{1}{4}}(x)\,(1-x)^{2^{q-2}}}{|P(\lambda_{j^*})|^{\frac{1}{2}}\,\varphi_i^{\frac{1}{4}}(\lambda_{j^*})} \left( \frac{\sum_{i:\lambda_i < \beta} y_i}{y_{j^*}} \right)^{\frac{1}{4}}.$$

By replacing the minimizing polynomial in $\mathcal{P}_{m-i}$ by $T_{m-i}\left( \frac{2}{\beta}x - 1 \right)$, the maximum is achieved at $x = 0$, and, by the monotonicity of $T_{m-i}$ on $[1,\infty)$,

$$T_{m-i}\left( \frac{2}{\beta}\lambda_{j^*} - 1 \right) \ge T_{m-i}\left( \frac{2}{\beta} - 1 \right) \ge \frac{1}{2}\exp\left( 2\sqrt{1-\beta}\,(m-i) \right).$$

In addition,

$$\frac{\varphi_i^{1/4}(0)}{\varphi_i^{1/4}(\lambda_{j^*})} = \prod_{k=1}^{i-1} \left| \frac{\lambda_k^{(m)}}{\lambda_k^{(m)} - \lambda_{j^*}} \right|^{1/2} \le \delta^{-(i-1)/2}.$$

We can now bound the $p$-norm by

$$\mathbb{E}\left[ \left( \lambda_i - \lambda_i^{(m)} \right)^p \right]^{\frac{1}{p}} \le \left[ \frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)}\, \frac{n^{1/4}}{\delta^{(i-1)/2}} \right]^{1/p} e^{-\gamma(m-i)/p} + \gamma^2,$$

where $\gamma = \sqrt{1-\beta}$. Setting $\gamma = \frac{p}{m-i}\ln\left( \delta^{-(i-1)/2p}\, n^{1/4p}\, (m-i)^2 \right)$ (assuming $\gamma < 1$, otherwise our bound is already greater than one, and trivially holds), we obtain

$$\mathbb{E}\left[ \left( \lambda_i - \lambda_i^{(m)} \right)^p \right]^{\frac{1}{p}} \le \frac{\left( \frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)} \right)^{1/p} + \frac{1}{16}\ln^2\left( \delta^{-2(i-1)} n (m-i)^{8p} \right)}{(m-i)^2} \le .068\, \frac{\ln^2\left( \delta^{-2(i-1)} n (m-i)^{8p} \right)}{(m-i)^2},$$

for ๐‘š โ‰ฅ 9 + ๐‘–, ๐‘› โ‰ฅ 100. This completes the proof of the expected error estimate. We now

focus on the probabilistic estimate. We have

๐œ†๐‘–(๐ด)โˆ’ ๐œ†(๐‘š)๐‘– (๐ด, ๐‘) โ‰ค min

๐‘ƒโˆˆ๐’ซ๐‘šโˆ’๐‘–

โˆ‘๐‘›๐‘—=๐‘–+1 ๐‘Œ๐‘—๐‘ƒ

2(๐œ†๐‘—)๐œ‘๐‘–(๐œ†๐‘—)(1โˆ’ ๐œ†๐‘—)โˆ‘๐‘›๐‘—=1 ๐‘Œ๐‘—๐‘ƒ 2(๐œ†๐‘—)๐œ‘๐‘–(๐œ†๐‘—)

โ‰ค min๐‘ƒโˆˆ๐’ซ๐‘šโˆ’๐‘–

max๐‘ฅโˆˆ[0,๐›ฝ]

๐‘ƒ 2(๐‘ฅ)๐œ‘๐‘–(๐‘ฅ)(1โˆ’ ๐‘ฅ)

๐‘ƒ 2(๐œ†๐‘—*)๐œ‘๐‘–(๐œ†๐‘—*)

โˆ‘๐‘—:๐œ†๐‘—<๐›ฝ ๐‘Œ๐‘—

๐‘Œ๐‘—*+ (1โˆ’ ๐›ฝ)

โ‰ค ๐›ฟโˆ’2(๐‘–โˆ’1) ๐‘‡โˆ’2๐‘šโˆ’๐‘–

(2

๐›ฝโˆ’ 1

)โˆ‘๐‘—:๐œ†๐‘—<๐›ฝ ๐‘Œ๐‘—

๐‘Œ๐‘—*+ (1โˆ’ ๐›ฝ)

โ‰ค 4 ๐›ฟโˆ’2(๐‘–โˆ’1) expโˆ’4โˆš

1โˆ’ ๐›ฝ(๐‘šโˆ’ ๐‘–)โˆ‘

๐‘—:๐œ†๐‘—<๐›ฝ ๐‘Œ๐‘—

๐‘Œ๐‘—*+ (1โˆ’ ๐›ฝ).

By Proposition 11,

$$\mathbb{P}\left[ \frac{\sum_{j:\lambda_j < \beta} Y_j}{Y_{j^*}} \ge n^{3.02} \right] \le \mathbb{P}\left[ Y_{j^*} \le n^{-2.01} \right] + \mathbb{P}\left[ \sum_{j \neq j^*} Y_j \ge n^{1.01} \right] \le \left( e/n^{2.01} \right)^{1/2} + \left( n^{.01} e^{1 - n^{.01}} \right)^{(n-1)/2} = o(1/n).$$

Let $\sqrt{1-\beta} = \frac{\ln\left( \delta^{-2(i-1)} n^{3.02} (m-i)^2 \right)}{4(m-i)}$. Then, with probability $1 - o(1/n)$,

$$\lambda_i(A) - \lambda_i^{(m)}(A) \le \frac{4}{(m-i)^2} + \frac{\ln^2\left( \delta^{-2(i-1)} n^{3.02} (m-i)^2 \right)}{16(m-i)^2}.$$

The $4/(m-i)^2$ term is dominated by the log term as $n$ increases, and, therefore, with probability $1 - o(1/n)$,

$$\lambda_i(A) - \lambda_i^{(m)}(A) \le .571\, \frac{\ln^2\left( \delta^{-2(i-1)/3} n (m-i)^{2/3} \right)}{(m-i)^2}.$$

This completes the proof.

For typical matrices with no repeated eigenvalues, $\delta$ is usually a very low degree polynomial in $n$, and, for $i$ constant, the estimates for $\lambda_i$ are not much worse than that of $\lambda_1$. In addition, it seems likely that, given any matrix $A$, a small random perturbation of $A$ before the application of the Lanczos method will satisfy the condition of the theorem with high probability, and change the eigenvalues by a negligible amount. Of course, the bounds from Theorem 22 for maximal eigenvalues $\lambda_i$ also apply to minimal eigenvalues $\lambda_{n-i}$.
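The perturbation idea mentioned above can be sketched as follows. This is only an illustration of the heuristic (the noise scaling and the test matrix are our own choices, not the thesis's procedure), relying on the fact that, by Weyl's inequality, a symmetric perturbation $E$ moves each eigenvalue by at most $\|E\|_2$.

```python
import numpy as np

# Sketch of the random-perturbation heuristic (illustration only).
# A = I has a fully repeated spectrum; a tiny random symmetric perturbation
# makes the eigenvalues simple, while Weyl's inequality guarantees each
# eigenvalue moves by at most the spectral norm of the perturbation.
rng = np.random.default_rng(1)
n, eps = 200, 1e-8
A = np.eye(n)
G = rng.standard_normal((n, n))
E = eps * (G + G.T) / (2 * np.sqrt(n))        # symmetric, spectral norm O(eps)
vals_orig = np.linalg.eigvalsh(A)
vals_pert = np.linalg.eigvalsh(A + E)
shift = np.max(np.abs(vals_pert - vals_orig))  # <= ||E||_2, i.e., O(eps)
gap = np.min(np.diff(vals_pert))               # now strictly positive
```

The eigenvalues of the perturbed matrix are distinct almost surely, at the cost of an $O(\varepsilon)$ shift in every eigenvalue.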

Next, we make use of Theorem 19 to produce error estimates for the condition number of a symmetric positive definite matrix. Let $\kappa(A) = \lambda_1(A)/\lambda_n(A)$ denote the condition number of a matrix $A \in \mathcal{S}^n_{++}$ and $\kappa^{(m)}(A,b) = \lambda_1^{(m)}(A,b)/\lambda_m^{(m)}(A,b)$ denote the condition number of the tridiagonal matrix $T_m \in \mathcal{S}^m_{++}$ resulting from $m$ iterations of the Lanczos method applied to a matrix $A \in \mathcal{S}^n_{++}$ with initial vector $b$. Their difference can be written as

๐œ…(๐ด)โˆ’ ๐œ…(๐‘š)(๐ด, ๐‘) =๐œ†1

๐œ†๐‘›

โˆ’ ๐œ†(๐‘š)1

๐œ†(๐‘š)๐‘š

=๐œ†1๐œ†

(๐‘š)๐‘š โˆ’ ๐œ†๐‘›๐œ†

(๐‘š)1

๐œ†๐‘›๐œ†(๐‘š)๐‘š

=๐œ†(๐‘š)๐‘š

(๐œ†1 โˆ’ ๐œ†

(๐‘š)1

)+ ๐œ†

(๐‘š)1

(๐œ†(๐‘š)๐‘š โˆ’ ๐œ†๐‘›

)๐œ†๐‘›๐œ†

(๐‘š)๐‘š

= (๐œ…(๐ด)โˆ’ 1)

[๐œ†1 โˆ’ ๐œ†

(๐‘š)1

๐œ†1 โˆ’ ๐œ†๐‘›

+ ๐œ…(๐‘š)(๐ด, ๐‘)๐œ†(๐‘š)๐‘š โˆ’ ๐œ†๐‘›

๐œ†1 โˆ’ ๐œ†๐‘›

],

which leads to the bounds

๐œ…(๐ด)โˆ’ ๐œ…(๐‘š)(๐ด, ๐‘)

๐œ…(๐ด)โ‰ค ๐œ†1 โˆ’ ๐œ†

(๐‘š)1

๐œ†1 โˆ’ ๐œ†๐‘›

+ ๐œ…(๐ด)๐œ†(๐‘š)๐‘š โˆ’ ๐œ†๐‘›

๐œ†1 โˆ’ ๐œ†๐‘›


and๐œ…(๐ด)โˆ’ ๐œ…(๐‘š)(๐ด, ๐‘)

๐œ…(๐‘š)(๐ด, ๐‘)โ‰ฅ (๐œ…(๐ด)โˆ’ 1)

๐œ†(๐‘š)๐‘š โˆ’ ๐œ†๐‘›

๐œ†1 โˆ’ ๐œ†๐‘›

.
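The identity and the two bounds above are easy to sanity-check numerically. The sketch below is our own: it uses stand-in Ritz values squeezed inside $[\lambda_n, \lambda_1]$ (as the Lanczos method guarantees) rather than an actual Lanczos run.

```python
import numpy as np

# Numerical sanity check (ours) of the identity and bounds for the condition
# number error. lam1m and lammm stand in for lambda_1^(m) and lambda_m^(m);
# Lanczos guarantees lam_n <= lammm <= lam1m <= lam_1.
rng = np.random.default_rng(2)
lam = np.sort(rng.uniform(0.1, 10.0, 50))    # spectrum of an SPD matrix
lam_n, lam_1 = lam[0], lam[-1]
lam1m = lam_1 - 0.01 * (lam_1 - lam_n)       # slightly below lambda_1
lammm = lam_n + 0.02 * (lam_1 - lam_n)       # slightly above lambda_n
kappa = lam_1 / lam_n
kappa_m = lam1m / lammm
lhs = kappa - kappa_m
rhs = (kappa - 1) * ((lam_1 - lam1m) / (lam_1 - lam_n)
                     + kappa_m * (lammm - lam_n) / (lam_1 - lam_n))
upper = ((lam_1 - lam1m) / (lam_1 - lam_n)
         + kappa * (lammm - lam_n) / (lam_1 - lam_n))
lower = (kappa - 1) * (lammm - lam_n) / (lam_1 - lam_n)
```

The exact identity holds because $(\kappa(A)-1)/(\lambda_1-\lambda_n) = 1/\lambda_n$; the upper bound then follows from $\kappa^{(m)} \le \kappa$ and $(\kappa-1)/\kappa \le 1$, and the lower bound from dropping the nonnegative first term.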

Using Theorem 19 and Minkowskiโ€™s inequality, we have the following corollary.

Corollary 4.

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar{\kappa}}} \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\left[\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge (1 - o(1))\,(\bar{\kappa} - 1)\right] \ge 1 - o(1/n)
\]

for $m = o(\ln n)$,

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar{\kappa}}} \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\left[\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge .015\, \frac{(\bar{\kappa} - 1) \ln^2 n}{m^2 \ln^2 \ln n}\right] \ge 1 - o(1/n)
\]

for $m = \Theta(\ln n)$,

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar{\kappa}}} \mathbb{P}_{b \sim \mathcal{U}(S^{n-1})}\left[\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge 1.08\, \frac{\bar{\kappa} - 1}{m^2}\right] \ge 1 - o(1/n)
\]

for $m = o\big(n^{1/2} \ln^{-1/2} n\big)$ and $\omega(1)$, and

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar{\kappa}}} \mathbb{E}_{b \sim \mathcal{U}(S^{n-1})}\left[\left(\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa(A)}\right)^{p}\right]^{1/p} \le .068\, (\bar{\kappa} + 1)\, \frac{\ln^2\!\big(n (m-1)^{8p}\big)}{(m-1)^2}
\]

for $n \ge 100$, $m \ge 10$, $p \ge 1$, where $\bar{\kappa}$ denotes the common condition number over which each supremum is taken.

As previously mentioned, it is not possible to produce uniform bounds for the relative

error of ๐œ…(๐‘š)(๐ด, ๐‘), and so some dependence on ๐œ…(๐ด) is necessary.

5.6 Experimental Results

In this section, we present a number of experimental results that illustrate the error of the

Lanczos method in practice. We consider:


โ€ข eigenvalues of 1D Laplacian with Dirichlet boundary conditions ฮ›๐‘™๐‘Ž๐‘ = 2+2 cos(๐‘–๐œ‹/(๐‘›+

1))๐‘›๐‘–=1,

โ€ข a uniform partition of [0, 1], ฮ›๐‘ข๐‘›๐‘–๐‘“ = (๐‘›โˆ’ ๐‘–)/(๐‘›โˆ’ 1)๐‘›๐‘–=1,

โ€ข eigenvalues from the semi-circle distribution, ฮ›๐‘ ๐‘’๐‘š๐‘– = ๐œ†๐‘–๐‘›๐‘–=1, where

1/2 +(๐œ†๐‘–

โˆš1โˆ’ ๐œ†2

๐‘– + arcsin๐œ†๐‘–

)/๐œ‹ = (๐‘›โˆ’ ๐‘–)/(๐‘›โˆ’ 1), ๐‘– = 1, ..., ๐‘›,

โ€ข eigenvalues corresponding to Lemma 21, ฮ›๐‘™๐‘œ๐‘” = 1โˆ’ [(๐‘› + 1โˆ’ ๐‘–)/๐‘›]ln ln๐‘›ln๐‘› ๐‘›๐‘–=1.

For each one of these spectra, we perform tests for dimensions $n = 10^i$, $i = 5, 6, 7, 8$. For each dimension, we generate 100 random vectors $b \sim \mathcal{U}(S^{n-1})$, and perform $m = 100$ iterations of the Lanczos method on each vector. In Figure 5-1, we report the results of

these experiments. In particular, for each spectrum, we plot the empirical average relative error $(\lambda_1 - \lambda_1^{(m)})/(\lambda_1 - \lambda_n)$ for each dimension as $m$ varies from 1 to 100. In addition, for $n = 10^8$, we present a box plot for each spectrum, illustrating the variability in relative error that each spectrum possesses.
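A minimal version of one such experiment is sketched below (our own code, not the thesis's; it uses full reorthogonalization for numerical robustness, while the analysis in this chapter is in exact arithmetic): run $m$ Lanczos iterations on $A = \mathrm{diag}(\Lambda_{lap})$ with a Gaussian start and report the relative error $(\lambda_1 - \lambda_1^{(m)})/(\lambda_1 - \lambda_n)$.

```python
import numpy as np

def lanczos_top_ritz(lam, b, m):
    """Lanczos on A = diag(lam) with start b; returns the largest Ritz value.

    Full reorthogonalization is a practical choice for finite precision;
    the error analysis in the text assumes exact arithmetic."""
    n = len(lam)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        w = lam * Q[:, k]                          # A @ q_k for diagonal A
        alpha[k] = Q[:, k] @ w
        w -= Q[:, :k + 1] @ (Q[:, :k + 1].T @ w)   # full reorthogonalization
        if k < m - 1:
            beta[k] = np.linalg.norm(w)
            Q[:, k + 1] = w / beta[k]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)[-1]

rng = np.random.default_rng(3)
n, m = 5000, 50
i = np.arange(1, n + 1)
lam = 2 + 2 * np.cos(i * np.pi / (n + 1))          # Lambda_lap
b = rng.standard_normal(n)                          # normalized inside
rel_err = (lam.max() - lanczos_top_ritz(lam, b, m)) / (lam.max() - lam.min())
```

Since $A$ is diagonal here, the matrix-vector product is an elementwise multiply, which is what makes dimensions as large as $n = 10^8$ feasible in the experiments.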

In Figure 5-1,² the plots of $\Lambda_{lap}$, $\Lambda_{unif}$, and $\Lambda_{semi}$ all illustrate an empirical average error estimate that scales with $m^{-2}$ and has no dependence on dimension $n$ (for $n$ sufficiently large). This is consistent with the theoretical results of the chapter, most notably Theorem 21, as all of these spectra exhibit suitable convergence of their empirical spectral distribution, and the related integral minimization problems all have solutions of order $m^{-2}$. In addition, the box plots corresponding to $n = 10^8$ illustrate that the relative error for a given iteration number has a relatively small variance. For instance, all extreme values remain within a range of length less than $.01\,m^{-2}$ for $\Lambda_{lap}$, $.1\,m^{-2}$ for $\Lambda_{unif}$, and $.4\,m^{-2}$ for $\Lambda_{semi}$. This is also consistent with the convergence of Theorem 21. The empirical spectral distributions of all three spectra converge to shifted and scaled versions of Jacobi weight functions, namely, $\Lambda_{lap}$ corresponds to $\omega_{-1/2,-1/2}(x)$, $\Lambda_{unif}$ to $\omega_{0,0}(x)$, and $\Lambda_{semi}$ to $\omega_{1/2,1/2}(x)$. The limiting value of $m^2(1 - \xi(m))/2$ for each of these three cases is given by $j_{1,\alpha}^2/4$, where $j_{1,\alpha}$ is

²In Figure 5-1, the box plots are labelled as follows. For a given $m$, the 25th and 75th percentiles of the values, denoted by $q_1$ and $q_3$, are the bottom and top of the corresponding box, and the red line in the box is the median. The whiskers extend to the most extreme points in the interval $[q_1 - 1.5(q_3 - q_1),\, q_3 + 1.5(q_3 - q_1)]$, and outliers not in this interval correspond to the '+' symbol.


(a) Plot, ฮ›๐‘™๐‘Ž๐‘ (b) Box Plot, ฮ›๐‘™๐‘Ž๐‘

(c) Plot, ฮ›๐‘ข๐‘›๐‘–๐‘“ (d) Box Plot, ฮ›๐‘ข๐‘›๐‘–๐‘“

(e) Plot, ฮ›๐‘ ๐‘’๐‘š๐‘– (f) Box Plot, ฮ›๐‘ ๐‘’๐‘š๐‘–

(g) Plot, ฮ›๐‘™๐‘œ๐‘” (h) Box Plot, ฮ›๐‘™๐‘œ๐‘”

Figure 5-1: Plot and box plot of relative error times ๐‘š2 vs iteration number ๐‘š for ฮ›๐‘™๐‘Ž๐‘,ฮ›๐‘ข๐‘›๐‘–๐‘“ , ฮ›๐‘ ๐‘’๐‘š๐‘–, and ฮ›๐‘™๐‘œ๐‘”. The plot contains curves for each dimension ๐‘› tested. Each curverepresents the empirical average relative error for each value of ๐‘š, averaged over 100random initializations. The box plot illustrates the variability of relative error for ๐‘› = 108.


the first positive zero of the Bessel function $J_\alpha(x)$, namely, the scaled error converges to $\pi^2/16 \approx .617$ for $\Lambda_{lap}$, $\approx 1.45$ for $\Lambda_{unif}$, and $\pi^2/4 \approx 2.47$ for $\Lambda_{semi}$. Two additional properties suggested by Figure 5-1 are that the variance and the dimension $n$ required to observe asymptotic behavior both appear to increase with $\alpha$. This is consistent with the theoretical results, as Lemma 21 (and $\Lambda_{log}$) results from considering $\omega_{\alpha,0}(x)$ with $\alpha$ as a function of $n$.

The plot of relative error for $\Lambda_{log}$ illustrates that relative error does indeed depend on $n$ in a tangible way, and is increasing with $n$ in what appears to be a logarithmic fashion. For instance, when looking at the average relative error scaled by $m^2$, the maximum over all iteration numbers $m$ appears to increase somewhat logarithmically ($\approx 4$ for $n = 10^5$, $\approx 5$ for $n = 10^6$, $\approx 6$ for $n = 10^7$, and $\approx 7$ for $n = 10^8$). In addition, the box plot for $n = 10^8$ illustrates that this spectrum exhibits a large degree of variability and is susceptible to extreme outliers. These numerical results support the theoretical lower bounds of Section 3, and illustrate that the asymptotic theoretical lower bounds which depend on $n$ do occur in practical computations.

