Graphs, Principal Minors, and Eigenvalue Problems
by
John C. Urschel
Submitted to the Department of Mathematics
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Mathematics
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2021
© John C. Urschel, MMXXI. All rights reserved.
The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Mathematics
August 6, 2021

Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Michel X. Goemans
RSA Professor of Mathematics and Department Head
Thesis Supervisor

Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Jonathan Kelner
Chairman, Department Committee on Graduate Theses
Graphs, Principal Minors, and Eigenvalue Problems
by
John C. Urschel
Submitted to the Department of Mathematics on August 6, 2021, in partial fulfillment of the
requirements for the degree of Doctor of Philosophy in Mathematics
Abstract

This thesis considers four independent topics within linear algebra: determinantal point processes, extremal problems in spectral graph theory, force-directed layouts, and eigenvalue algorithms. For determinantal point processes (DPPs), we consider the classes of symmetric and signed DPPs, respectively, and in both cases connect the problem of learning the parameters of a DPP to a related matrix recovery problem. Next, we consider two conjectures in spectral graph theory regarding the spread of a graph, and resolve both. For force-directed layouts of graphs, we connect the layout of the boundary of a Tutte spring embedding to trace theorems from the theory of elliptic PDEs, and we provide a rigorous theoretical analysis of the popular Kamada-Kawai objective, proving hardness of approximation and structural results regarding optimal layouts, and providing a polynomial time randomized approximation scheme for low diameter graphs. Finally, we consider the Lanczos method for computing extremal eigenvalues of a symmetric matrix and produce new error estimates for this algorithm.
Thesis Supervisor: Michel X. Goemans
Title: RSA Professor of Mathematics and Department Head
Acknowledgments
I would like to thank my advisor, Michel Goemans. From the moment I met him, I was
impressed and inspired by the way he thinks about mathematics. Despite his busy schedule
as head of the math department, he always made time when I needed him. He has been the
most supportive advisor a student could possibly ask for. He's the type of mathematician
I aspire to be. I would also like to thank the other members of my thesis committee,
Alan Edelman and Philippe Rigollet. They have been generous with their time and their
mathematical knowledge. In my later years as a PhD candidate, Alan has served as a
secondary advisor of sorts, and I'm thankful for our weekly Saturday meetings.
I've also been lucky to work with a number of great collaborators during my time at
MIT. This thesis would not have been possible without my mathematical collaborations and
conversations with Jane Breen, Victor-Emmanuel Brunel, Erik Demaine, Adam Hesterberg,
Fred Koehler, Jayson Lynch, Ankur Moitra, Alex Riasanovsky, Michael Tait, and Ludmil
Zikatanov. And, though the associated work does not appear in this thesis, I am also grateful
for collaborations with Xiaozhe Hu, Dhruv Rohatgi, and Jake Wellens during my PhD.
I've been the beneficiary of a good deal of advice, mentorship, and support from the larger
mathematics community. This group is too large to list in full, but I'm especially grateful to Rediet
Abebe, Michael Brenner, David Mordecai, Jelani Nelson, Peter Sarnak, Steven Strogatz,
and Alex Townsend, among many others, for always being willing to give me some much
needed advice during my time at MIT.
I would like to thank my office mates, Fred Koehler and Jake Wellens (and Megan Fu),
for making my time in office 2-342 so enjoyable. With Jake's wall mural and very large
and dusty rug, our office felt lived in and was a comfortable place to do math. I've had
countless interesting conversations with fellow grad students, including Michael Cohen,
Younhun Kim, Mehtaab Sawhney, and Jonathan Tidor, to name a few. My interactions with
MIT undergrads constantly reminded me just how fulfilling and rewarding mentorship can
be. I'm thankful to all the support staff members at MIT math, and have particularly fond
memories of grabbing drinks with André at the Muddy, constantly bothering Michele with
whatever random MIT problem I could not solve, and making frequent visits to Rosalee for
more espresso pods. More generally, I have to thank the entire math department for fostering
such a warm and welcoming environment to do math. MIT math really does feel like home.
I'd be remiss if I did not thank my friends and family for their non-academic contributions.
I have to thank Ty Howle for supporting me every step of the way, and trying to claim
responsibility for nearly all of my mathematical work. This is a hard claim to refute, as it
was often only once I took a step back and looked at things from a different perspective that
breakthroughs were made. I'd like to thank Robert Hess for constantly encouraging and
believing in me, despite knowing absolutely no math. I'd like to thank my father and mother
for instilling a love of math in me from a young age. My fondest childhood memories of
math were of going through puzzle books at the kitchen table with my mother, and leafing
through the math section in the local Chapters in St. Catherines while my father worked
in the cafe. I'd like to thank Louisa for putting up with me, moving with me (eight times
during this PhD), and proofreading nearly all of my academic papers. I would thank Joanna,
but, frankly, she was of little to no help.
This research was supported in part by ONR Research Contract N00014-17-1-2177.
During my time at MIT, I was previously supported by a Dean of Science Fellowship and
am currently supported by a Mathworks Fellowship, both of which have allowed me to place
a greater emphasis on research, and without which this thesis, in its current form, would not
have been possible.
Contents
1 Introduction
2 Determinantal Point Processes and Principal Minor Assignment Problems
   2.1 Introduction
      2.1.1 Symmetric Matrices
      2.1.2 Magnitude-Symmetric Matrices
   2.2 Learning Symmetric Determinantal Point Processes
      2.2.1 An Associated Principal Minor Assignment Problem
      2.2.2 Definition of the Estimator
      2.2.3 Information Theoretic Lower Bounds
      2.2.4 Algorithmic Aspects and Experiments
   2.3 Recovering a Magnitude-Symmetric Matrix from its Principal Minors
      2.3.1 Principal Minors and Magnitude-Symmetric Matrices
      2.3.2 Efficiently Recovering a Matrix from its Principal Minors
      2.3.3 An Algorithm for Principal Minors with Noise
3 The Spread and Bipartite Spread Conjecture
   3.1 Introduction
   3.2 The Bipartite Spread Conjecture
   3.3 A Sketch for the Spread Conjecture
      3.3.1 Properties of Spread-Extremal Graphs
4 Force-Directed Layouts
   4.1 Introduction
   4.2 Tutte's Spring Embedding and a Trace Theorem
      4.2.1 Spring Embeddings and a Schur Complement
      4.2.2 A Discrete Trace Theorem and Spectral Equivalence
   4.3 The Kamada-Kawai Objective and Optimal Layouts
      4.3.1 Structural Results for Optimal Embeddings
      4.3.2 Algorithmic Lower Bounds
      4.3.3 An Approximation Algorithm
5 Error Estimates for the Lanczos Method
   5.1 Introduction
      5.1.1 Related Work
      5.1.2 Contributions and Remainder of Chapter
   5.2 Preliminary Results
   5.3 Asymptotic Lower Bounds and Improved Upper Bounds
   5.4 Distribution Dependent Bounds
   5.5 Estimates for Arbitrary Eigenvalues and Condition Number
   5.6 Experimental Results
Chapter 1
Introduction
In this thesis, we consider a number of different mathematical topics in linear algebra, with
a focus on determinantal point processes and eigenvalue problems. Below we provide a
brief non-technical summary of each topic in this thesis, as well as a short summary of other
interesting projects that I had the pleasure of working on during my PhD that do not appear
in this thesis.
Determinantal Point Processes and Principal Minor Assignment Problems
A determinantal point process is a probability distribution over the subsets of a finite
ground set where the distribution is characterized by the principal minors of some fixed
matrix. Given a finite set, say, [N] = {1, . . . , N}, where N is a positive integer, a DPP
is a random subset Y ⊆ [N] such that P(J ⊆ Y) = det(K_J), for all fixed J ⊆ [N],
where K ∈ R^{N×N} is a given matrix called the kernel of the DPP and K_J = (K_{i,j})_{i,j∈J} is
the principal submatrix of K associated with the set J. In Chapter 2, we focus on DPPs
with kernels that have two different types of structure. First, we restrict ourselves to DPPs
with symmetric kernels, and produce an algorithm that learns the kernel of a DPP from
its samples. Then, we consider the slightly broader class of DPPs with kernels that have
corresponding off-diagonal entries equal in magnitude, i.e., K satisfies |K_{i,j}| = |K_{j,i}| for all
i, j ∈ [N], and produce an algorithm to recover such a matrix K from (possibly perturbed
versions of) its principal minors. Sections 2.2 and 2.3 are written so that they may be read
independently of each other. This chapter is joint work with Victor-Emmanuel Brunel,
Michel Goemans, Ankur Moitra, and Philippe Rigollet, and based on [119, 15].
The Spread and Bipartite Spread Conjecture
Given a graph G = ([n], E), [n] = {1, . . . , n}, the adjacency matrix A ∈ R^{n×n} of G
is defined as the matrix with A_{i,j} = 1 if {i, j} ∈ E and A_{i,j} = 0 otherwise. A is a real
symmetric matrix, and so has real eigenvalues λ_1 ≥ . . . ≥ λ_n. One interesting quantity is
the difference between extreme eigenvalues, λ_1 − λ_n, commonly referred to as the spread of
the graph G. In [48], the authors proposed two conjectures. First, the authors conjectured
that if a graph G of size |E| ≤ n²/4 maximizes the spread over all graphs of order n
and size |E|, then the graph is bipartite. The authors also conjectured that if a graph G
maximizes the spread over all graphs of order n, then the graph is the join (i.e., with all
edges in between) K_{⌊2n/3⌋} ∨ E_{⌈n/3⌉} of a clique of order ⌊2n/3⌋ and an independent set
of order ⌈n/3⌉. We refer to these conjectures as the bipartite spread and spread conjectures,
respectively. In Chapter 3, we consider both of these conjectures. We provide an infinite
class of counterexamples to the bipartite spread conjecture, and prove an asymptotic version
of the bipartite spread conjecture that is as tight as possible. In addition, for the spread
conjecture, we make a number of observations regarding any spread-optimal graph, and use
these observations to sketch a proof of the spread conjecture for all n sufficiently large. The
full proof of the spread conjecture for all n sufficiently large is rather long and technical,
and can be found in [12]. This chapter is joint work with Jane Breen, Alex Riasanovsky,
and Michael Tait, and based on [12].
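To make the conjectured extremal graph concrete, here is a small numerical illustration (not part of the chapter; the helper names `spread` and `join_clique_independent` are ours). For n = 6 the conjectured spread-maximal graph is K_4 ∨ E_2, whose spread is √41 ≈ 6.40, already larger than the spread 6 of the complete graph K_6.

```python
import numpy as np

def spread(A):
    """Spread of a graph: difference of the extreme adjacency eigenvalues."""
    eig = np.linalg.eigvalsh(A)
    return eig[-1] - eig[0]

def join_clique_independent(s, t):
    """Adjacency matrix of K_s ∨ E_t: a clique on s vertices joined to an
    independent set on t vertices (all edges in between are present)."""
    n = s + t
    A = np.zeros((n, n))
    A[:s, :] = 1.0   # clique vertices are adjacent to everyone
    A[:, :s] = 1.0
    A[s:, s:] = 0.0  # independent set: no internal edges
    np.fill_diagonal(A, 0.0)
    return A

n = 6
A_join = join_clique_independent(2 * n // 3, n - 2 * n // 3)  # K_4 ∨ E_2
A_complete = np.ones((n, n)) - np.eye(n)                      # K_6

# The join beats the complete graph: spread sqrt(41) ≈ 6.40 versus 6.
print(spread(A_join), spread(A_complete))
```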
Force-Directed Layouts
A force-directed layout, broadly speaking, is a technique for drawing a graph in a low-dimensional Euclidean space (usually dimension ≤ 3) by applying "forces" between the
set of vertices and/or edges. In a force-directed layout, vertices connected by an edge (or
at a small graph distance from each other) tend to be close to each other in the resulting
layout. Two well-known examples of a force-directed layout are Tutteโs spring embedding
and metric multidimensional scaling. In his 1963 work titled "How to Draw a Graph,"
Tutte found an elegant technique to produce planar embeddings of planar graphs that also
minimize the sum of squared edge lengths in some sense. In particular, for a three-connected
planar graph, he showed that if the outer face of the graph is fixed as the complement of
some convex region in the plane, and every other point is located at the mass center of its
neighbors, then the resulting embedding is planar. This result is now known as Tutteโs spring
embedding theorem, and is considered by many to be the first example of a force-directed
layout [117]. One of the major questions that this result does not treat is how to best embed
the outer face. In Chapter 4, we investigate this question, consider connections to a Schur
complement, and provide some theoretical results for this Schur complement using a discrete
energy-only trace theorem. In addition, we also consider the later proposed Kamada-Kawai
objective, which can provide a force-directed layout of an arbitrary graph. In particular, we
prove a number of structural results regarding layouts that minimize this objective, provide
algorithmic lower bounds for the optimization problem, and propose a polynomial time
approximation scheme for drawing low diameter graphs. Sections 4.2 and 4.3 are written
so that they may be read independently of each other. This chapter is joint work with Erik
Demaine, Adam Hesterberg, Fred Koehler, Jayson Lynch, and Ludmil Zikatanov, and is
based on [124, 31].
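Tutte's construction is itself a small linear-algebra computation. The following sketch (an illustration on the cube graph, not code from the chapter) fixes one face of the cube on a convex quadrilateral and solves the spring system that places every interior vertex at the mass center of its neighbors:

```python
import itertools
import numpy as np

# Cube graph Q3: vertices are 3-bit strings, edges join strings at Hamming distance 1.
n = 8
A = np.zeros((n, n))
for u, v in itertools.combinations(range(n), 2):
    if bin(u ^ v).count("1") == 1:
        A[u, v] = A[v, u] = 1.0

# Fix the outer face (the 4-cycle 0-1-3-2) on the corners of a convex quadrilateral.
boundary = [0, 1, 3, 2]
pos = np.zeros((n, 2))
for k, v in enumerate(boundary):
    angle = 2 * np.pi * k / len(boundary)
    pos[v] = [np.cos(angle), np.sin(angle)]

# Tutte's system: every interior vertex sits at the mass center of its neighbors.
# With L = D - A the graph Laplacian, this is the linear system L_II x_I = -L_IB x_B.
interior = [v for v in range(n) if v not in boundary]
L = np.diag(A.sum(axis=1)) - A
pos[interior] = np.linalg.solve(L[np.ix_(interior, interior)],
                                -L[np.ix_(interior, boundary)] @ pos[boundary])

# Check the mass-center condition at each interior vertex.
for v in interior:
    nbrs = np.nonzero(A[v])[0]
    assert np.allclose(pos[v], pos[nbrs].mean(axis=0))
print("Tutte embedding computed; interior vertices at neighbor centroids")
```

By Tutte's theorem, the resulting straight-line drawing of this three-connected planar graph is planar.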
Error Estimates for the Lanczos Method
The Lanczos method is one of the most powerful and fundamental techniques for solving
an extremal symmetric eigenvalue problem. Convergence-based error estimates depend
heavily on the eigenvalue gap, but in practice, this gap is often relatively small, resulting
in significant overestimates of error. One way to avoid this issue is through the use of
uniform error estimates, namely, bounds that depend only on the dimension of the matrix
and the number of iterations. In Chapter 5, we prove upper uniform error estimates for
the Lanczos method, and provide a number of lower bounds through the use of orthogonal
polynomials. In addition, we prove more specific results for matrices that possess some level
of eigenvalue regularity or whose eigenvalues converge to some limiting empirical spectral
distribution. Through numerical experiments, we show that the theoretical estimates of this
chapter do apply to practical computations for reasonably sized matrices. This chapter has
no collaborators, and is based on [122].
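For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of the Lanczos method (illustrative code with full reorthogonalization for numerical robustness; the test matrix and iteration count are arbitrary choices, not ones from the chapter):

```python
import numpy as np

def lanczos_extremal(A, m, rng):
    """Run m iterations of the Lanczos method (with full reorthogonalization)
    and return the largest Ritz value, an estimate of the largest eigenvalue
    of the symmetric matrix A."""
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(m):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full reorthogonalization
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    # The eigenvalues of the tridiagonal matrix T_m are the Ritz values.
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)[-1]

rng = np.random.default_rng(0)
# Illustrative test matrix with known spectrum, uniformly spaced in [0, 1].
A = np.diag(np.linspace(0.0, 1.0, 100))

estimate = lanczos_extremal(A, m=60, rng=rng)
print(abs(estimate - 1.0))  # error of the largest Ritz value
assert abs(estimate - 1.0) < 1e-3
```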
Topics not in this Thesis
During my PhD, I have had the chance to work on a number of interesting topics that
do not appear in this thesis. Below I briefly describe some of these projects. In [52] (joint
with X. Hu and L. Zikatanov), we consider graph disaggregation, a technique to break
high degree nodes of a graph into multiple smaller degree nodes, and prove a number of
results regarding the spectral approximation of a graph by a disaggregated version of it.
In [17] (joint with V.E. Brunel, A. Moitra, and P. Rigollet), we analyze the curvature of
the expected log-likelihood around its maximum and characterize when the maximum
likelihood estimator converges at a parametric rate. In [120], we study centroidal Voronoi
tessellations from a variational perspective and show that, for any density function, there
does not exist a unique two generator centroidal Voronoi tessellation for dimensions greater
than one. In [121], we prove a generalization of Courant's theorem for discrete graphs,
namely, that for the k-th eigenvalue of a generalized Laplacian of a discrete graph, there
exists a set of corresponding eigenvectors such that each eigenvector can be decomposed
into at most k nodal domains, and that this set is of co-dimension zero with respect to the
entire eigenspace. In [123] (joint with J. Wellens), we show that, given a graph with local
crossing number either at most k or at least 2k, it is NP-complete to decide whether the local
crossing number is at most k or at least 2k. In [97] (joint with D. Rohatgi and J. Wellens),
we treat two conjectures: one regarding the maximum possible value of cp(G) + cp(Ḡ)
(where cp(G) is the minimum number of cliques needed to cover the edges of G exactly
once), due to de Caen, Erdős, Pullman, and Wormald, and the other regarding bp_k(K_n)
(where bp_k(G) is the minimum number of bicliques needed to cover each edge of G exactly
k times), due to de Caen, Gregory, and Pritikin. We disprove the first, obtaining improved
lower and upper bounds on max_G cp(G) + cp(Ḡ), and we prove an asymptotic version
of the second, showing that bp_k(K_n) = (1 + o(1))n. If any of the topics (very briefly)
described here interest you, I encourage you to take a look at the associated paper. This
thesis was constructed to center around a topic and to be of a manageable length, and I am
no less proud of some of the results described here that do not appear in this thesis.
Chapter 2
Determinantal Point Processes and
Principal Minor Assignment Problems
2.1 Introduction
A determinantal point process (DPP) is a probability distribution over the subsets of a finite
ground set where the distribution is characterized by the principal minors of some fixed
matrix. DPPs emerge naturally in many probabilistic setups such as random matrices and
integrable systems [10]. They also have attracted interest in machine learning in the past
years for their ability to model random choices while being tractable and mathematically
elegant [71]. Given a finite set, say, [N] = {1, . . . , N}, where N is a positive integer, a
DPP is a random subset Y ⊆ [N] such that P(J ⊆ Y) = det(K_J), for all fixed J ⊆ [N],
where K ∈ R^{N×N} is a given matrix called the kernel of the DPP and K_J = (K_{i,j})_{i,j∈J}
is the principal submatrix of K associated with the set J. Assumptions on K that yield
the existence of a DPP can be found in [14], and it is easily seen that uniqueness of the
DPP is then automatically guaranteed. For instance, if I − K is invertible (I being the
identity matrix), then K is the kernel of a DPP if and only if K(I − K)^{−1} is a P_0-matrix,
i.e., all its principal minors are nonnegative. We refer to [54] for further definitions and
properties of P_0-matrices. Note that in that case, the DPP is also an L-ensemble, with
probability mass function given by P(Y = J) = det(L_J)/det(I + L), for all J ⊆ [N],
where L = K(I − K)^{−1}.
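The relationship between the two kernels can be checked numerically on a toy example (the 3 × 3 kernel below is illustrative, not one from this chapter): enumerating all subsets shows that the L-ensemble mass function reproduces the inclusion probabilities det(K_J).

```python
import itertools
import numpy as np

# Illustrative symmetric kernel with eigenvalues in (0, 1), so I - K is invertible.
K = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.4, 0.1],
              [0.0, 0.1, 0.3]])
N = K.shape[0]
I = np.eye(N)

# Corresponding L-ensemble kernel: L = K (I - K)^{-1}.
L = K @ np.linalg.inv(I - K)

def minor(M, S):
    """Principal minor det(M_S); the empty minor is 1 by convention."""
    S = list(S)
    return np.linalg.det(M[np.ix_(S, S)]) if S else 1.0

# Probability mass function of the L-ensemble: P(Y = J) = det(L_J) / det(I + L).
Z = np.linalg.det(I + L)
pmf = {J: minor(L, J) / Z
       for r in range(N + 1) for J in itertools.combinations(range(N), r)}

# The masses sum to one, and the inclusion probabilities match the principal
# minors of K:  P(J ⊆ Y) = sum of P(Y = S) over S ⊇ J  =  det(K_J).
assert abs(sum(pmf.values()) - 1.0) < 1e-12
for r in range(N + 1):
    for J in itertools.combinations(range(N), r):
        p_contains = sum(p for S, p in pmf.items() if set(J) <= set(S))
        assert abs(p_contains - minor(K, J)) < 1e-10
print("P(J ⊆ Y) = det(K_J) verified for all J")
```

Note also that P(Y = ∅) = 1/det(I + L) = det(I − K), consistent with I + L = (I − K)^{−1}.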
DPPs with a symmetric kernel have attracted a lot of interest in machine learning because
they satisfy a property called negative association, which models repulsive interactions
between items [8]. Following the seminal work of Kulesza and Taskar [72], discrete
symmetric DPPs have found numerous applications in machine learning, including in
document and timeline summarization [76, 132], image search [70, 1] and segmentation
[73], audio signal processing [131], bioinformatics [6] and neuroscience [106]. What makes
such models appealing is that they exhibit repulsive behavior and lend themselves naturally
to tasks where returning a diverse set of objects is important. For instance, when applied to
recommender systems, DPPs enforce diversity in the items within one basket [45].
From a statistical point of view, DPPs raise two essential questions. First, what are the
families of kernels that give rise to one and the same DPP? Second, given observations
of independent copies of a DPP, how to recover its kernel (which, as foreseen by the first
question, is not necessarily unique)? These two questions are directly related to the principal
minor assignment problem (PMA). Given a class of matrices, (1) describe the set of all
matrices in this class that have a prescribed list of principal minors (this set may be empty),
(2) find one such matrix. While the first task is theoretical in nature, the second one is
algorithmic and should be solved using as few queries of the prescribed principal minors as
possible. In this work, we focus on the question of recovering a kernel from observations,
which is closely related to the problem of recovering a matrix with given principal minors.
In this chapter, we focus on symmetric DPPs (which, when clear from context, we simply
refer to as DPPs) and signed DPPs, a slightly larger class of kernels that only require
corresponding off-diagonal entries be symmetric in magnitude.
2.1.1 Symmetric Matrices
In the symmetric case, there are fast algorithms for sampling (or approximately sampling)
from a DPP [32, 94, 74, 75]. Marginalizing the distribution on a subset I ⊆ [N] and
conditioning on the event that J ⊆ Y both result in new DPPs, and closed form expressions
for their kernels are known [11]. All of this work pertains to how to use a DPP once we
have learned its parameters. However, there has been much less work on the problem of
learning the parameters of a symmetric DPP. A variety of heuristics have been proposed,
including Expectation-Maximization [46], MCMC [1], and fixed point algorithms [80]. All
of these attempt to solve a non-convex optimization problem, and no guarantees on their
statistical performance are known. Recently, Brunel et al. [16] studied the rate of estimation
achieved by the maximum likelihood estimator, but the question of efficient computation
remains open. Apart from positive results on sampling, marginalization and conditioning,
most provable results about DPPs are actually negative. It is conjectured that the maximum
likelihood estimator is NP-hard to compute [69]. Actually, approximating the mode of size
k of a DPP to within a c^k factor is known to be NP-hard for some c > 1 [22, 108]. The best
known algorithms currently obtain an e^{k+o(k)} approximation factor [88, 89].
In Section 2.2, we bypass the difficulties associated with maximum likelihood estimation
by using the method of moments to achieve optimal sample complexity. In the setting
of DPPs, the method of moments has a close connection to a principal minor assignment
problem, which asks, given some set of principal minors of a matrix, to recover the original
matrix, up to some equivalence class. We introduce a parameter ℓ, called the cycle sparsity
of the graph induced by the kernel K, which governs the number of moments that need to
be considered and, thus, the sample complexity. The cycle sparsity of a graph is the smallest
integer ℓ so that the cycles of length at most ℓ yield a basis for the cycle space of the graph.
We use a refined version of Horton's algorithm [51, 2] to implement the method of moments
in polynomial time. Even though there are in general exponentially many cycles in a graph
to consider, Horton's algorithm constructs a minimum weight cycle basis and, in doing so,
also reveals the parameter ℓ together with a collection of induced cycles of length at most ℓ
spanning the cycle space. We use such cycles in order to construct our method of moments estimator.
For any fixed ℓ ≥ 2 and kernel K satisfying either |K_{i,j}| ≥ α or K_{i,j} = 0 for all i, j ∈ [N],
our algorithm has sample complexity

n = O((C/α)^{2ℓ} + (log N)/(α²ε²))

for some constant C > 1, runs in time polynomial in N and n, and learns the parameters
up to an additive ε with high probability. The (C/α)^{2ℓ} term corresponds to the number of
samples needed to recover the signs of the entries in K. We complement this result with
a minimax lower bound (Theorem 2) to show that this sample complexity is in fact near
optimal. In particular, we show that there is an infinite family of graphs with cycle sparsity
ℓ (namely, length-ℓ cycles) on which any algorithm requires at least (C′α)^{−2ℓ} samples to
recover the signs of the entries of K for some constant C′ > 1. We also provide experimental
results that confirm many quantitative aspects of our theoretical predictions. Together, our
upper bounds, lower bounds, and experiments present a nuanced understanding of which
symmetric DPPs can be learned provably and efficiently.
2.1.2 Magnitude-Symmetric Matrices
Recently, DPPs with non-symmetric kernels have gained interest in the machine learning
community [14, 44, 3, 93]. However, for these classes, questions of recovery are significantly
harder in general. In [14], Brunel proposes DPPs with kernels that are symmetric in
magnitudes, i.e., |K_{i,j}| = |K_{j,i}|, for all i, j = 1, . . . , N. This class is referred to as signed
DPPs. A signed DPP is a generalization of DPPs which allows for both attractive and
repulsive behavior. Such matrices are relevant in machine learning applications, because
they increase the modeling power of DPPs. An essential assumption made in [14] is
that K is dense, which simplifies the combinatorial analysis. We consider magnitude-symmetric matrices, but without any density assumption. The problem becomes significantly
harder because it requires a fine analysis of the combinatorial properties of a graphical
representation of the sparsity of the matrix and the products of corresponding off-diagonal
entries. In Section 2.3, we treat both theoretical and algorithmic questions around recovering
a magnitude-symmetric matrix from its principal minors. First, in Subsection 2.3.1, we
show that, for a given magnitude-symmetric matrix K, the principal minors of order at
most ℓ, for some graph invariant ℓ depending only on principal minors of order one and
two, uniquely determine principal minors of all orders. Next, in Subsection 2.3.2, we
describe an efficient algorithm that, given the principal minors of a magnitude-symmetric
matrix, computes a matrix with those principal minors. This algorithm queries only O(N²)
principal minors, all of a bounded order that depends solely on the sparsity of the matrix.
Finally, in Subsection 2.3.3, we consider the question of recovery when principal minors
are only known approximately, and construct an algorithm that, for magnitude-symmetric
matrices that are sufficiently generic in a sense, recovers the matrix almost exactly. This
algorithm immediately implies a procedure for learning a signed DPP from its samples, as
basic probabilistic techniques (e.g., a union bound) can be used to show that, with high
probability, all estimators are sufficiently close to the principal minors they are approximating.
2.2 Learning Symmetric Determinantal Point Processes
Let Y_1, . . . , Y_n be n independent copies of Y ∼ DPP(K), for some unknown kernel K such
that 0 ⪯ K ⪯ I_N. It is well known that K is identified by DPP(K) only up to flips of the
signs of its rows and columns: if K′ is another symmetric matrix with 0 ⪯ K′ ⪯ I_N, then
DPP(K′) = DPP(K) if and only if K′ = DKD for some D ∈ D_N, where D_N denotes the
class of all N × N diagonal matrices with only 1 and −1 on their diagonals [69, Theorem
4.1]. We call such a transform a D_N-similarity of K.

In view of this equivalence class, we define the following pseudo-distance between
kernels K and K′:

ρ(K, K′) = inf_{D ∈ D_N} |DKD − K′|_∞,

where for any matrix K, |K|_∞ = max_{i,j∈[N]} |K_{i,j}| denotes the entrywise sup-norm.

For any S ⊆ [N], we write Δ_S = det(K_S), where K_S denotes the |S| × |S| sub-matrix
of K obtained by keeping rows and columns with indices in S. Note that for 1 ≤ i ≠ j ≤ N,
we have the following relations:

K_{i,i} = P[i ∈ Y],   Δ_{i,j} = P[{i, j} ⊆ Y],

and |K_{i,j}| = √(K_{i,i} K_{j,j} − Δ_{i,j}). Therefore, the principal minors of size one and two of
K determine K up to the sign of its off-diagonal entries. What remains is to compute the
signs of the off-diagonal entries. In fact, for any K, there exists an ℓ depending only on the
graph G_K induced by K, such that K can be recovered up to a D_N-similarity with only the
knowledge of its principal minors of size at most ℓ. We will show that this ℓ is exactly the
cycle sparsity.
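These relations, and the invariance of principal minors under sign flips of rows and columns, are easy to check numerically (a toy sketch with an arbitrarily chosen 4 × 4 kernel; the helper `minor` is ours):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Illustrative symmetric kernel (random PSD matrix, scaled so 0 ⪯ K ⪯ I).
B = rng.standard_normal((4, 4))
K = B @ B.T
K = K / (np.linalg.eigvalsh(K)[-1] + 1.0)
N = K.shape[0]

def minor(M, S):
    S = list(S)
    return np.linalg.det(M[np.ix_(S, S)])

# Principal minors of size one and two determine the diagonal and the
# magnitudes of the off-diagonal entries:
#   K_ii = Δ_{i},   |K_ij| = sqrt(K_ii K_jj − Δ_{i,j}).
for i, j in itertools.combinations(range(N), 2):
    mag = np.sqrt(minor(K, [i]) * minor(K, [j]) - minor(K, [i, j]))
    assert abs(mag - abs(K[i, j])) < 1e-10

# D_N-similarity: flipping signs of rows/columns leaves every principal minor
# unchanged, so DPP(DKD) = DPP(K) for any diagonal D with ±1 entries.
D = np.diag([1.0, -1.0, 1.0, -1.0])
K2 = D @ K @ D
for r in range(1, N + 1):
    for S in itertools.combinations(range(N), r):
        assert abs(minor(K, S) - minor(K2, S)) < 1e-10
print("size-1 and size-2 minors recover |K_ij|; D_N-similarity preserves all minors")
```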
2.2.1 An Associated Principal Minor Assignment Problem
In this subsection, we consider the following related principal minor assignment problem:
Given a symmetric matrix K ∈ R^{N×N}, what is the minimal ℓ such that K can be
recovered (up to D_N-similarity) using principal minors Δ_S of size at most ℓ?
This problem has a clear relation to learning DPPs, as, in our setting, we can approximate
the principal minors of K by empirical averages. However, the accuracy of our estimator
deteriorates with the size of the principal minor, and we must therefore estimate the smallest
possible principal minors in order to achieve optimal sample complexity. Answering the
above question tells us how small the principal minors we consider can be. The relationship
between the principal minors of K and recovery of DPP(K) has also been considered
elsewhere. There has been work regarding the symmetric principal minor assignment
problem, namely the problem of computing a matrix given an oracle that gives any principal
minor in constant time [96]. Here, we prove that the smallest ℓ such that all the principal
minors of K are uniquely determined by those of size at most ℓ is exactly the cycle sparsity
of the graph induced by K.
We begin by recalling some standard graph theoretic notions. Let G = ([N], E),
|E| = m. A cycle C of G is any connected subgraph in which each vertex has even degree.
Each cycle C is associated with an incidence vector x ∈ GF(2)^m such that x_e = 1 if e is
an edge in C and x_e = 0 otherwise. The cycle space 𝒞 of G is the subspace of GF(2)^m
spanned by the incidence vectors of the cycles in G. The dimension ν_G of the cycle space is
called the cyclomatic number, and it is well known that ν_G := m − N + κ(G), where κ(G)
denotes the number of connected components of G. Recall that a simple cycle is a graph
where every vertex has either degree two or zero and the set of vertices with degree two
form a connected set. A cycle basis is a basis of 𝒞 ⊆ GF(2)^m such that every element is a
simple cycle. It is well known that every cycle space has a cycle basis of induced cycles.
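The standard construction behind these facts, a fundamental cycle basis built from a spanning forest, can be sketched as follows (an illustration with helper names of our choosing; note this is not the minimum-weight basis produced by Horton's algorithm discussed above):

```python
import itertools

def cycle_space_data(N, edges):
    """Spanning-forest computation: returns the cyclomatic number
    nu = m - N + kappa(G) and a fundamental cycle basis, one cycle per
    non-tree edge, each as a frozenset of edge indices (a GF(2) incidence vector)."""
    parent = list(range(N))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    adj = {v: [] for v in range(N)}   # adjacency of the spanning forest only
    non_tree = []
    for idx, (u, v) in enumerate(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            adj[u].append((v, idx)); adj[v].append((u, idx))
        else:
            non_tree.append(idx)
    kappa = sum(1 for v in range(N) if find(v) == v)
    nu = len(edges) - N + kappa

    def tree_path(s, t):
        """Edge indices of the unique forest path from s to t (DFS)."""
        stack, seen = [(s, [])], {s}
        while stack:
            v, path = stack.pop()
            if v == t:
                return path
            for w, idx in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, path + [idx]))

    basis = [frozenset(tree_path(u, v) + [idx])
             for idx in non_tree for (u, v) in [edges[idx]]]
    assert len(basis) == nu  # fundamental cycles form a basis of the cycle space
    return nu, basis

# Example: K_4 has m = 6, N = 4, kappa = 1, so its cycle space has dimension 3.
edges = list(itertools.combinations(range(4), 2))
nu, basis = cycle_space_data(4, edges)
print(nu, [sorted(c) for c in basis])
```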
Definition 1. The cycle sparsity of a graph G is the minimal ℓ for which G admits a cycle
basis of induced cycles of length at most ℓ, with the convention that ℓ = 2 whenever the
cycle space is empty. A corresponding cycle basis is called a shortest maximal cycle basis.
A shortest maximal cycle basis of the cycle space was also studied, for other reasons, in
[21]. We defer a discussion of computing such a basis to a later subsection. For any subset
S ⊆ [N], denote by G_K(S) = (S, E(S)) the subgraph of G_K induced by S. A matching
of G_K(S) is a subset M ⊆ E(S) such that no two distinct edges in M are adjacent in
G_K(S). The set of vertices incident to some edge in M is denoted by V(M). We denote by
ℳ(S) the collection of all matchings of G_K(S). Then, if G_K(S) is an induced cycle, we
can write the principal minor ∆_S = det(K_S) as follows:

    ∆_S = ∑_{M ∈ ℳ(S)} (−1)^{|M|} ∏_{{i,j} ∈ M} K_{i,j}² ∏_{i ∈ S∖V(M)} K_{i,i} + 2 × (−1)^{|S|+1} ∏_{{i,j} ∈ E(S)} K_{i,j}.   (2.1)
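Identity (2.1) can be checked mechanically on a small example. The following sketch is our own illustration (the matrix entries are arbitrary choices, assuming only that the sparsity pattern restricted to S is an induced 4-cycle): it enumerates the matchings of the cycle with exact rational arithmetic and compares the right-hand side of (2.1) against a direct determinant.

```python
# Check identity (2.1) on an induced cycle: Delta_S equals the matching sum
# plus the orientation term 2*(-1)^{|S|+1} * prod of cycle-edge entries.
# The specific 4x4 symmetric matrix below is our own illustrative example.
from fractions import Fraction
from itertools import combinations

n = 4
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
diag = [Fraction(1, 2)] * n
off = {(0, 1): Fraction(1, 4), (1, 2): Fraction(-1, 4),
       (2, 3): Fraction(1, 4), (3, 0): Fraction(1, 4)}

K = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    K[i][i] = diag[i]
for (i, j), v in off.items():
    K[i][j] = K[j][i] = v

def det(M):
    """Exact determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = Fraction(0)
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_matching(edge_set):
    verts = [v for e in edge_set for v in e]
    return len(verts) == len(set(verts))

rhs = Fraction(0)
for k in range(len(cycle_edges) + 1):  # k = |M|, including the empty matching
    for M in combinations(cycle_edges, k):
        if not is_matching(M):
            continue
        term = Fraction((-1) ** k)
        covered = {v for e in M for v in e}
        for (i, j) in M:
            term *= K[i][j] ** 2
        for i in range(n):
            if i not in covered:
                term *= K[i][i]
        rhs += term
prod_edges = Fraction(1)
for (i, j) in cycle_edges:
    prod_edges *= K[i][j]
rhs += 2 * (-1) ** (n + 1) * prod_edges

lhs = det(K)
```

Only the last, orientation-dependent term sees the signs of the off-diagonal entries; the matching sum depends on their squares alone, which is exactly why (2.1) exposes the sign of a cycle.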
Proposition 1. Let K ∈ R^{N×N} be a symmetric matrix, G_K be the graph induced by K, and
ℓ ≥ 3 be some integer. The kernel K is completely determined, up to D_N-similarity, by its
principal minors of size at most ℓ if and only if the cycle sparsity of G_K is at most ℓ.
Proof. Note first that all the principal minors of K completely determine K up to a D_N-
similarity [96, Theorem 3.14]. Moreover, recall that the principal minors of size at most 2
determine the diagonal entries of K as well as the magnitudes of its off-diagonal entries.
In particular, given these principal minors, one only needs to recover the signs of the off-
diagonal entries of K. Let the sign of a cycle C in K be the product of the signs of the entries
of K corresponding to the edges of C.

Suppose G_K has cycle sparsity ℓ, and let (C_1, ..., C_ν) be a cycle basis of G_K where each
C_i, i ∈ [ν], is an induced cycle of length at most ℓ. By (2.1), the sign of any C_i, i ∈ [ν], is
completely determined by the principal minor ∆_S, where S is the set of vertices of C_i and is
such that |S| ≤ ℓ. Moreover, for i ∈ [ν], let x_i ∈ GF(2)^m denote the incidence vector of C_i.
By definition, the incidence vector x of any cycle C is given by ∑_{i ∈ I} x_i for some subset
I ⊆ [ν]. The sign of C is then given by the product of the signs of the C_i, i ∈ I, and thus by the
corresponding principal minors. In particular, the signs of all cycles are determined by the
principal minors ∆_S with |S| ≤ ℓ. In turn, by Theorem 3.12 in [96], the signs of all cycles
completely determine K, up to a D_N-similarity.
Next, suppose the cycle sparsity of G_K is at least ℓ + 1, and let 𝒞_ℓ be the subspace of
GF(2)^m spanned by the induced cycles of length at most ℓ in G_K. Let x_1, ..., x_k be a basis
of 𝒞_ℓ made of the incidence column vectors of induced cycles of length at most ℓ in G_K, and
form the matrix A ∈ GF(2)^{m×k} by concatenating the x_i's. Since 𝒞_ℓ does not span the cycle
space of G_K, we have k < ν_{G_K} ≤ m. Hence, the rank of A is less than m, so the null space of A^⊤ is
non-trivial. Let x̄ be the incidence column vector of an induced cycle C̄ that is not in 𝒞_ℓ, and
let h ∈ GF(2)^m be such that A^⊤h = 0, h ≠ 0, and x̄^⊤h = 1. These three conditions are compatible
because C̄ ∉ 𝒞_ℓ. We are now in a position to define an alternate kernel K′ as follows: let
K′_{i,i} = K_{i,i} and |K′_{i,j}| = |K_{i,j}| for all i, j ∈ [N]. We define the signs of the off-diagonal
entries of K′ as follows: for all edges e = {i, j}, i ≠ j, sgn(K′_e) = sgn(K_e) if h_e = 0, and
sgn(K′_e) = −sgn(K_e) otherwise. We now check that K and K′ have the same principal
minors of size at most ℓ but differ on a principal minor of size larger than ℓ. To that end, let
x be the incidence vector of a cycle C in 𝒞_ℓ, so that x = Aw for some w ∈ GF(2)^k. Thus
the sign of C in K is given by

    ∏_{e : x_e = 1} K_e = (−1)^{x^⊤h} ∏_{e : x_e = 1} K′_e = (−1)^{w^⊤A^⊤h} ∏_{e : x_e = 1} K′_e = ∏_{e : x_e = 1} K′_e

because A^⊤h = 0. Therefore, the sign of any C ∈ 𝒞_ℓ is the same in K and K′. Now, let
S ⊆ [N] with |S| ≤ ℓ, and let G = G_{K_S} = G_{K′_S} be the graph corresponding to K_S (or,
equivalently, to K′_S). For any induced cycle C in G, C is also an induced cycle in G_K and
its length is at most ℓ. Hence, C ∈ 𝒞_ℓ and the sign of C is the same in K and K′. By [96,
Theorem 3.12], det(K_S) = det(K′_S). Next, observe that the sign of C̄ in K is given by

    ∏_{e : x̄_e = 1} K_e = (−1)^{x̄^⊤h} ∏_{e : x̄_e = 1} K′_e = −∏_{e : x̄_e = 1} K′_e.

Note also that since C̄ is an induced cycle of G_K = G_{K′}, the above quantity is nonzero. Let
S be the set of vertices in C̄. By (2.1) and the above display, we have det(K_S) ≠ det(K′_S).
Together with [96, Theorem 3.14], this yields K ≠ DK′D for all D ∈ D_N.
2.2.2 Definition of the Estimator

Our procedure is based on the previous result and can be summarized as follows. We first
estimate the diagonal entries (i.e., the principal minors of size one) of K by the method of
moments. By the same method, we estimate the principal minors of size two of K, and we
deduce estimates of the magnitudes of the off-diagonal entries. To use these estimates to
deduce an estimate of G_K, we make the following assumption on the kernel K.

Assumption 1. Fix α ∈ (0, 1). For all 1 ≤ i < j ≤ N, either K_{i,j} = 0, or |K_{i,j}| ≥ α.

Finally, we find a shortest maximal cycle basis of the estimated graph Ĝ, and we set the signs
of our non-zero off-diagonal entry estimates by using estimators of the principal minors
induced by the elements of the basis, again obtained by the method of moments.
For S ⊆ [N], set

    ∆̂_S = (1/n) ∑_{p=1}^{n} 1_{S ⊆ Y_p},

where Y_1, ..., Y_n denote the observed samples, and define

    K̂_{i,i} = ∆̂_i   and   B̂_{i,j} = K̂_{i,i} K̂_{j,j} − ∆̂_{i,j},

where K̂_{i,i} and B̂_{i,j} are our estimators of K_{i,i} and K_{i,j}², respectively. Define Ĝ = ([N], Ê),
where, for i ≠ j, {i, j} ∈ Ê if and only if B̂_{i,j} ≥ α²/2. The graph Ĝ is our estimator of G_K.
Let C_1, ..., C_k be a shortest maximal cycle basis of the cycle space of Ĝ. Let S_i ⊆ [N]
be the subset of vertices of C_i, for 1 ≤ i ≤ k. We define

    Ĥ_i = ∆̂_{S_i} − ∑_{M ∈ ℳ(S_i)} (−1)^{|M|} ∏_{{u,v} ∈ M} B̂_{u,v} ∏_{u ∈ S_i∖V(M)} K̂_{u,u},

for 1 ≤ i ≤ k. In light of (2.1), for large enough n, this quantity should be close to

    H_i = 2 × (−1)^{|S_i|+1} ∏_{{u,v} ∈ E(S_i)} K_{u,v}.

We note that this definition is only symbolic in nature, and computing Ĥ_i in this fashion
is extremely inefficient. Instead, to compute it in practice, we will use the determinant of
an auxiliary matrix, computed via a matrix factorization. Namely, let us define the matrix
K̃ ∈ R^{N×N} such that K̃_{i,i} = K̂_{i,i} for 1 ≤ i ≤ N, and K̃_{i,j} = B̂_{i,j}^{1/2} for i ≠ j. We have

    det(K̃_{S_i}) = ∑_{M ∈ ℳ(S_i)} (−1)^{|M|} ∏_{{u,v} ∈ M} B̂_{u,v} ∏_{u ∈ S_i∖V(M)} K̂_{u,u} + 2 × (−1)^{|S_i|+1} ∏_{{u,v} ∈ E(S_i)} B̂_{u,v}^{1/2},

so that we may equivalently write

    Ĥ_i = ∆̂_{S_i} − det(K̃_{S_i}) + 2 × (−1)^{|S_i|+1} ∏_{{u,v} ∈ E(S_i)} B̂_{u,v}^{1/2}.
Finally, let m̂ = |Ê|. Set the matrix Ā ∈ GF(2)^{k×m̂} with i-th row representing C_i in
GF(2)^{m̂}, 1 ≤ i ≤ k, set b = (b_1, ..., b_k) ∈ GF(2)^k with b_i = (1/2)[sgn(Ĥ_i) + 1], 1 ≤ i ≤ k,
and let x ∈ GF(2)^{m̂} be a solution to the linear system Āx = b if a solution exists, x = 1_{m̂}
otherwise. We define K̂_{i,j} = 0 if {i, j} ∉ Ê, and K̂_{i,j} = K̂_{j,i} = (2x_{{i,j}} − 1) B̂_{i,j}^{1/2} for all
{i, j} ∈ Ê.
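The moment-based first steps of this construction are simple to implement. The sketch below is our own illustration, with hand-made toy samples rather than genuine DPP draws (function and variable names are ours): it forms ∆̂_S for |S| ≤ 2, the estimates K̂_{i,i} and B̂_{i,j}, and the edge set Ê via the threshold B̂_{i,j} ≥ α²/2.

```python
# Sketch of the moment estimates: Delta_hat_S = (1/n) * #{p : S subseteq Y_p},
# K_hat[i] = Delta_hat_{i}, B_hat[(i,j)] = K_hat[i]*K_hat[j] - Delta_hat_{i,j},
# and {i,j} is kept as an edge iff B_hat[(i,j)] >= alpha^2 / 2.
# The samples below are a hand-made toy data set, not actual DPP samples.

def delta_hat(S, samples):
    S = set(S)
    return sum(1 for Y in samples if S <= Y) / len(samples)

def estimate_graph(N, samples, alpha):
    K_diag = [delta_hat({i}, samples) for i in range(N)]
    B = {}
    edges = set()
    for i in range(N):
        for j in range(i + 1, N):
            B[(i, j)] = K_diag[i] * K_diag[j] - delta_hat({i, j}, samples)
            if B[(i, j)] >= alpha ** 2 / 2:
                edges.add((i, j))
    return K_diag, B, edges

# Items 0 and 1 rarely co-occur (negative association), so B_hat[(0,1)] is large.
samples = [{0}, {1}, {0}, {1}, {0, 1}, {2}, {0}, {1}]
K_diag, B, edges = estimate_graph(3, samples, alpha=0.5)
```

Here ∆̂_0 = ∆̂_1 = 1/2 and ∆̂_{0,1} = 1/8, so B̂_{0,1} = 1/8 ≥ α²/2 = 1/8 and the edge {0, 1} is retained, while the other pairs fall below the threshold.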
Next, we prove the following lemma, which relates the quality of estimation of K in
terms of ρ to the quality of estimation of the principal minors ∆_S.

Lemma 1. Let K satisfy Assumption 1, and let ℓ be the cycle sparsity of G_K. Let ε > 0. If
|∆̂_S − ∆_S| ≤ ε for all S ⊆ [N] with |S| ≤ 2, and if |∆̂_S − ∆_S| ≤ (α/4)^{|S|} for all S ⊆ [N]
with 3 ≤ |S| ≤ ℓ, then

    ρ(K̂, K) < 4ε/α.
Proof. We can bound |B̂_{i,j} − K_{i,j}²|; namely,

    B̂_{i,j} ≤ (K_{i,i} + α²/16)(K_{j,j} + α²/16) − (∆_{i,j} − α²/16) ≤ K_{i,j}² + α²/4

and

    B̂_{i,j} ≥ (K_{i,i} − α²/16)(K_{j,j} − α²/16) − (∆_{i,j} + α²/16) ≥ K_{i,j}² − 3α²/16,

giving |B̂_{i,j} − K_{i,j}²| < α²/4. Thus, we can correctly determine whether K_{i,j} = 0 or |K_{i,j}| ≥
α, yielding Ĝ = G_K. In particular, the cycle basis C_1, ..., C_k of Ĝ is a cycle basis of G_K.

Let 1 ≤ i ≤ k, and denote t = (α/4)^{|S_i|}. We have

    |Ĥ_i − H_i| ≤ |∆̂_{S_i} − ∆_{S_i}| + |ℳ(S_i)| max_{x = ±1} [(1 + 4tx)^{|S_i|} − 1]
                ≤ (α/4)^{|S_i|} + |ℳ(S_i)| [(1 + 4t)^{|S_i|} − 1]
                ≤ (α/4)^{|S_i|} + P(|S_i|, ⌊|S_i|/2⌋) · 4t · P(|S_i|, |S_i|)
                ≤ (α/4)^{|S_i|} + 4t (2^{|S_i|/2} − 1)(2^{|S_i|} − 1)
                ≤ (α/4)^{|S_i|} + t · 2^{2|S_i|}
                < 2α^{|S_i|} ≤ |H_i|,

where, for positive integers q ≤ p, we denote P(p, q) = ∑_{i=1}^{q} C(p, i). Therefore, we can
determine the sign of the product ∏_{{u,v} ∈ E(S_i)} K_{u,v} for every element in the cycle basis and
recover the signs of the non-zero off-diagonal entries of K. Hence,

    ρ(K̂, K) = max_{1 ≤ i ≤ j ≤ N} | |K̂_{i,j}| − |K_{i,j}| |.

For i = j, | |K̂_{i,i}| − |K_{i,i}| | = |K̂_{i,i} − K_{i,i}| ≤ ε. For i ≠ j with {i, j} ∈ Ê = E, one can
easily show that |B̂_{i,j} − K_{i,j}²| ≤ 4ε, yielding

    |B̂_{i,j}^{1/2} − |K_{i,j}|| ≤ 4ε / (B̂_{i,j}^{1/2} + |K_{i,j}|) ≤ 4ε/α,

which completes the proof.
We are now in a position to establish a sufficient sample size for estimating K within
distance ε.

Theorem 1. Let K satisfy Assumption 1, and let ℓ be the cycle sparsity of G_K. Let ε > 0.
For any A > 0, there exists C > 0 such that

    n ≥ C ( 1/(α²ε²) + ℓ (4/α)^{2ℓ} ) log N

yields ρ(K̂, K) ≤ ε with probability at least 1 − N^{−A}.
Proof. Using the previous lemma and applying a union bound,

    P[ρ(K̂, K) > ε] ≤ ∑_{|S| ≤ 2} P[|∆̂_S − ∆_S| > αε/4] + ∑_{2 ≤ |S| ≤ ℓ} P[|∆̂_S − ∆_S| > (α/4)^{|S|}]
                   ≤ 2N² e^{−nα²ε²/8} + 2N^{ℓ+1} e^{−2n(α/4)^{2ℓ}},   (2.2)

where we used Hoeffding's inequality.
2.2.3 Information Theoretic Lower Bounds

We prove an information-theoretic lower bound that holds already if G_K is an ℓ-cycle.
Let D(K‖K′) and H(K, K′) denote, respectively, the Kullback-Leibler divergence and the
squared Hellinger distance between DPP(K) and DPP(K′).

Lemma 2. For η ∈ {−, +}, let K^η be the ℓ × ℓ matrix with elements given by

    K^η_{i,j} =  1/2  if i = j,
                 α    if j = i ± 1,
                 ηα   if (i, j) ∈ {(1, ℓ), (ℓ, 1)},
                 0    otherwise.

Then, for any α ≤ 1/8, it holds that

    D(K^+‖K^−) ≤ 4(6α)^ℓ   and   H(K^+, K^−) ≤ (8α²)^ℓ.
Proof. It is straightforward to see that

    det(K^+_J) − det(K^−_J) = 2α^ℓ if J = [ℓ], and 0 otherwise.

If Y is sampled from DPP(K^η), we denote p_η(S) = P[Y = S], for S ⊆ [ℓ]. It follows
from the inclusion-exclusion principle that, for all S ⊆ [ℓ],

    p_+(S) − p_−(S) = ∑_{J ⊆ [ℓ]∖S} (−1)^{|J|} (det K^+_{S∪J} − det K^−_{S∪J})
                    = (−1)^{ℓ−|S|} (det K^+ − det K^−) = ±2α^ℓ,   (2.3)

where |J| denotes the cardinality of J. The inclusion-exclusion principle also yields that
p_η(S) = |det(K^η − I_{S̄})| for all S ⊆ [ℓ], where I_{S̄} stands for the ℓ × ℓ diagonal matrix with
ones on its entries (i, i) for i ∉ S, and zeros elsewhere.
We denote by D(K^+‖K^−) the Kullback-Leibler divergence between DPP(K^+) and
DPP(K^−):

    D(K^+‖K^−) = ∑_{S ⊆ [ℓ]} p_+(S) log( p_+(S)/p_−(S) )
               ≤ ∑_{S ⊆ [ℓ]} ( p_+(S)/p_−(S) ) ( p_+(S) − p_−(S) )
               ≤ 2α^ℓ ∑_{S ⊆ [ℓ]} |det(K^+ − I_{S̄})| / |det(K^− − I_{S̄})|,   (2.4)

by (2.3). Using the fact that 0 < α ≤ 1/8 and the Gershgorin circle theorem, we conclude
that the absolute values of all eigenvalues of K^η − I_{S̄} lie between 1/4 and 3/4. Thus, we
obtain from (2.4) the bound D(K^+‖K^−) ≤ 4(6α)^ℓ.
Using the same arguments as above, the squared Hellinger distance H(K^+, K^−) between
DPP(K^+) and DPP(K^−) satisfies

    H(K^+, K^−) = ∑_{J ⊆ [ℓ]} ( (p_+(J) − p_−(J)) / (√p_+(J) + √p_−(J)) )²
                ≤ ∑_{J ⊆ [ℓ]} 4α^{2ℓ} / (4 · 4^{−ℓ}) = (8α²)^ℓ,

which completes the proof.
The sample complexity lower bound now follows from standard arguments.

Theorem 2. Let 0 < ε ≤ α ≤ 1/8 and 3 ≤ ℓ ≤ N. There exists a constant C > 0 such
that, if

    n ≤ C ( 1/(8α²)^ℓ + log(N/ℓ)/(6α)^ℓ + log N/ε² ),

then the following holds: for any estimator K̂ based on n samples, there exists a kernel
K that satisfies Assumption 1 and such that the cycle sparsity of G_K is ℓ and for which
ρ(K̂, K) ≥ ε with probability at least 1/3.
Proof. Recall the notation of Lemma 2. First consider the N × N block diagonal matrix K
(resp. K′) whose first block is K^+ (resp. K^−) and whose second block is I_{N−ℓ}. By a standard
argument, the squared Hellinger distance H_n(K, K′) between the product measures DPP(K)^{⊗n} and
DPP(K′)^{⊗n} satisfies

    1 − H_n(K, K′)/2 = (1 − H(K, K′)/2)^n ≥ (1 − (8α²)^ℓ/2)^n,

which yields the first term in the desired lower bound.

Next, by padding with zeros, we can assume that L = N/ℓ is an integer. Let K^{(0)} be
the block diagonal matrix each of whose blocks is K^+ (using the notation of Lemma 2). For
j = 1, ..., L, define the N × N block diagonal matrix K^{(j)} as the matrix obtained from
K^{(0)} by replacing its j-th block with K^− (again using the notation of Lemma 2).

Since DPP(K^{(j)}) is the product measure of L lower dimensional DPPs that are mutually
independent, using Lemma 2 we have D(K^{(j)}‖K^{(0)}) ≤ 4(6α)^ℓ. Hence, by
Fano's lemma (see, e.g., Corollary 2.6 in [115]), the sample complexity to learn the kernel
of a DPP within a distance ε ≤ α is

    Ω( log(N/ℓ) / (6α)^ℓ ),

which yields the second term.

The third term follows from considering K_0 = (1/2) I_N and letting K_j be obtained from
K_0 by adding ε to the j-th entry along the diagonal. It is easy to see that D(K_j‖K_0) ≤ 8ε².
Hence, a second application of Fano's lemma yields that the sample complexity to learn the
kernel of a DPP within a distance ε is Ω(log N / ε²).
The third term in the lower bound is the standard parametric term and is unavoidable
in order to estimate the magnitudes of the coefficients of K. The other terms are more
interesting. They reveal that the cycle sparsity of G_K, namely ℓ, plays a key role in the task
of recovering the sign pattern of K. Moreover, the theorem shows that the sample complexity
of our method of moments estimator is near optimal.
2.2.4 Algorithmic Aspects and Experiments

We first give an algorithm to compute the estimator K̂ defined in Subsection 2.2.2. A
well-known algorithm of Horton [51] computes a cycle basis of minimum total length in
O(m³N) time. Subsequently, the running time was improved to O(m²N/log N) [2].
Also, it is known that a cycle basis of minimum total length is a shortest maximal cycle
basis [21]. Together, these results imply the following.

Lemma 3. Let G = ([N], E), |E| = m. There is an algorithm to compute a shortest
maximal cycle basis in O(m²N/log N) time.

In addition, we recall the following standard result regarding the complexity of Gaussian
elimination over GF(2) [47].

Lemma 4. Let A ∈ GF(2)^{k×m} and b ∈ GF(2)^k. Then Gaussian elimination will find a vector
x ∈ GF(2)^m such that Ax = b, or conclude that none exists, in O(k²m) time.
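For concreteness, Gaussian elimination over GF(2), as used in Lemma 4, can be sketched as follows (our own minimal implementation, using dense 0/1 lists rather than the packed bit representations one would use in practice):

```python
# Solve A x = b over GF(2) by row reduction; return a solution or None.
def solve_gf2(A, b):
    A = [row[:] for row in A]  # work on copies
    b = b[:]
    k, m = len(A), len(A[0])
    pivot_cols = []
    row = 0
    for col in range(m):
        piv = next((r for r in range(row, k) if A[r][col]), None)
        if piv is None:
            continue
        A[row], A[piv] = A[piv], A[row]
        b[row], b[piv] = b[piv], b[row]
        for r in range(k):
            if r != row and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[row])]  # XOR = GF(2) addition
                b[r] ^= b[row]
        pivot_cols.append(col)
        row += 1
        if row == k:
            break
    # Inconsistent iff some all-zero row of A has b = 1.
    for r in range(row, k):
        if b[r] and not any(A[r]):
            return None
    x = [0] * m  # free variables set to 0
    for r, col in enumerate(pivot_cols):
        x[col] = b[r]
    return x

x = solve_gf2([[1, 1, 0], [0, 1, 1]], [1, 0])
```

In Algorithm 1, the rows of the system are the incidence vectors of the basis cycles and the right-hand side encodes the estimated cycle signs; if the system is inconsistent (which happens only when some Ĥ_i has the wrong sign), a default assignment is returned instead.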
We give our procedure for computing the estimator K̂ in Algorithm 1. In the following
theorem, we bound the running time of Algorithm 1 and establish an upper bound on the
sample complexity needed to solve the recovery problem, as well as the sample complexity
needed to compute an estimate that is close to K.

Theorem 3. Let K ∈ R^{N×N} be a symmetric matrix satisfying 0 ⪯ K ⪯ I and satisfying
Assumption 1. Let G_K be the graph induced by K and ℓ be the cycle sparsity of G_K. Let
Algorithm 1 Compute Estimator K̂
Input: samples Y_1, ..., Y_n, parameter α > 0.
  Compute ∆̂_S for all |S| ≤ 2.
  Set K̂_{i,i} = ∆̂_i for 1 ≤ i ≤ N.
  Compute B̂_{i,j} for 1 ≤ i < j ≤ N.
  Form K̃ ∈ R^{N×N} and Ĝ = ([N], Ê).
  Compute a shortest maximal cycle basis C_1, ..., C_k.
  Compute ∆̂_{S_i} for 1 ≤ i ≤ k.
  Compute Ĥ_i using det(K̃_{S_i}) for 1 ≤ i ≤ k.
  Construct Ā ∈ GF(2)^{k×m̂} and b ∈ GF(2)^k.
  Solve Āx = b using Gaussian elimination.
  Set K̂_{i,j} = K̂_{j,i} = (2x_{{i,j}} − 1) B̂_{i,j}^{1/2} for all {i, j} ∈ Ê.
Y_1, ..., Y_n be samples from DPP(K), and let δ ∈ (0, 1). If

    n > log(N^{ℓ+1}/δ) / (α/4)^{2ℓ},

then, with probability at least 1 − δ, Algorithm 1 computes an estimator K̂ which recovers
the signs of K up to a D_N-similarity and satisfies

    ρ(K, K̂) < (1/α) ( 8 log(4N^{ℓ+1}/δ) / n )^{1/2}   (2.5)

in O(m³ + nN²) time.
Proof. Inequality (2.5) follows directly from (2.2) in the proof of Theorem 1. That same proof also
shows that, with probability at least 1 − δ, the support of G_K and the signs of K are recovered
up to a D_N-similarity. What remains is to upper bound the worst-case running time of Algorithm
1, which we do line by line. Initializing K̂ requires O(N²) operations.
Computing ∆̂_S for all subsets |S| ≤ 2 requires O(nN²) operations. Setting K̂_{i,i} requires
O(N) operations. Computing B̂_{i,j} for 1 ≤ i < j ≤ N requires O(N²) operations. Forming
K̃ requires O(N²) operations. Forming Ĝ requires O(N²) operations. By Lemma 3,
computing a shortest maximal cycle basis requires O(m²N/log N) operations. Constructing the
subsets S_i, 1 ≤ i ≤ k, requires O(mN) operations. Computing ∆̂_{S_i} for 1 ≤ i ≤ k
requires O(nm) operations. Computing Ĥ_i using det(K̃_{S_i}) for 1 ≤ i ≤ k requires
O(mℓ³) operations, where a factorization of each K̃_{S_i} is used to compute each determinant
in O(ℓ³) operations. Constructing Ā and b requires O(mℓ) operations. By Lemma 4, finding
a solution x using Gaussian elimination requires O(m³) operations. Setting K̂_{i,j} for all
edges {i, j} ∈ Ê requires O(m) operations. Putting this all together, Algorithm 1 runs in
O(m³ + nN²) time.
Chordal Graphs

Here we show that it is possible to obtain faster algorithms by exploiting the structure
of G_K. Specifically, in the case where G_K is chordal, we give an O(m) time algorithm to
determine the signs of the off-diagonal entries of the estimator K̂, resulting in an improved
overall runtime of O(m + nN²). Recall that a graph G = ([N], E) is said to be chordal if
every induced cycle in G is of length three. Moreover, a graph G = ([N], E) has a perfect
elimination ordering (PEO) if there exists an ordering of the vertex set v_1, ..., v_N such that,
for all i, the graph induced by {v_i} ∪ {v_j | {i, j} ∈ E, j > i} is a clique. It is well known
that a graph is chordal if and only if it has a PEO. A PEO of a chordal graph with m edges
can be computed in O(m) operations using lexicographic breadth-first search [98].
Lemma 5. Let G = ([N], E) be a chordal graph and v_1, ..., v_N be a PEO. Given i,
let i* := min{j | j > i, {v_i, v_j} ∈ E}. Then the graph G′ = ([N], E′), where E′ =
{{v_i, v_{i*}}}_{i=1}^{N−κ(G)}, is a spanning forest of G.
Proof. We first show that there are no cycles in G′. Suppose, to the contrary, that there is
an induced cycle C of length k on the vertices v_{i_1}, ..., v_{i_k}. Let v be the vertex of C with
smallest index. Then v is connected to two other vertices in the cycle of larger index, which
contradicts the construction, since each vertex is incident to at most one edge of E′ leading
to a vertex of larger index.

What remains is to show that |E′| = N − κ(G). It suffices to prove the case κ(G) = 1.
Suppose, to the contrary, that there exists a vertex v_i, i < N, with no neighbor of larger
index. Let P be a shortest path in G from v_i to v_N. By connectivity, such a path exists.
Let v_j be the vertex of smallest index in the path. However, it has two neighbors in the path
of larger index, which must be adjacent to each other. Therefore, there is a shorter path, a
contradiction.
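The forest of Lemma 5 is immediate to construct from an ordering. The sketch below is our own illustration (the example graph is ours), assuming the identity ordering of its vertices is a PEO: each vertex is attached to its smallest-indexed neighbor of larger index.

```python
# Build the Lemma-5 forest: for each vertex i with a neighbor of larger index,
# keep the edge {i, i*}, where i* is the smallest such neighbor.
# The ordering 0, 1, ..., N-1 of the example graph below is assumed to be a PEO.

def lemma5_forest(N, edges):
    adj = {v: set() for v in range(N)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    forest = []
    for i in range(N):
        later = [j for j in adj[i] if j > i]
        if later:
            forest.append((i, min(later)))
    return forest

# A chordal example: triangles {0,1,2} and {1,2,3} (a 4-clique minus edge {0,3}).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
forest = lemma5_forest(4, edges)
```

The result has N − κ(G) = 3 edges and no cycle, as the lemma asserts.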
Algorithm 2 Compute Signs of Edges in Chordal Graph
Input: G_K = ([N], E) chordal, ∆̂_S for |S| ≤ 3.
  Compute a perfect elimination ordering v_1, ..., v_N.
  Compute the spanning forest G′ = ([N], E′).
  Set all edges in E′ to have positive sign.
  Compute Ĥ_{i,j,i*} for all {i, j} ∈ E ∖ E′, i < j.
  Order the edges E ∖ E′ = {e_1, ..., e_k} such that i > j if max e_i < max e_j.
  Visit the edges in sorted order and, for e = {i, j}, j > i, set

      sgn(i, j) = sgn(Ĥ_{i,j,i*}) sgn(i, i*) sgn(j, i*).
Now, given the chordal graph G_K induced by K and the estimates of the principal minors
of size at most three, we provide an algorithm to determine the signs of the edges of G_K or,
equivalently, of the off-diagonal entries of K.

Theorem 4. If G_K is chordal, Algorithm 2 correctly determines the signs of the edges of
G_K in O(m) time.
Proof. We will simultaneously perform a count of the operations and a proof of the correctness
of the algorithm. Computing a PEO requires O(m) operations. Computing the
spanning forest requires O(m) operations. The edges of the spanning forest can be given
arbitrary signs, because it is a cycle-free graph. This assigns a sign to two edges of each
3-cycle {i, j, i*}. Computing each Ĥ_{i,j,i*} requires a constant number of operations because ℓ = 3,
requiring a total of O(m − N) operations. Ordering the edges requires O(m) operations.
Setting the signs of each remaining edge requires O(m) operations.

Therefore, when G_K is chordal, the overall complexity required by our algorithm to
compute K̂ is reduced to O(m + nN²).
Experiments

Here we present experiments to supplement the theoretical results of this section. We test
our algorithm on two types of random matrices. First, we consider the matrix K ∈ R^{N×N}
(a) graph recovery, cycle  (b) graph and sign recovery, cycle
(c) graph recovery, clique  (d) graph and sign recovery, clique

Figure 2-1: Plots of the proportion of successful graph recovery, and graph and sign recovery,
for random matrices with cycle and clique graph structure, respectively. The darker the box,
the higher the proportion of trials that were recovered successfully.
corresponding to the cycle on N vertices,

    K = (1/2) I + (1/4) A,

where A is symmetric and has non-zero entries only on the edges of the cycle, either +1 or
−1, each with probability 1/2. By the Gershgorin circle theorem, 0 ⪯ K ⪯ I. Next, we
consider the matrix K ∈ R^{N×N} corresponding to the clique on N vertices,

    K = (1/2) I + (1/(4√N)) A,

where A is symmetric and has all entries either +1 or −1, each with probability 1/2. It is
well known that −2√N ⪯ A ⪯ 2√N with high probability, implying 0 ⪯ K ⪯ I.
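Both ensembles are easy to generate. The following sketch (our own, for the cycle case; names are not from the thesis) builds K = (1/2)I + (1/4)A and checks the Gershgorin condition: every row has diagonal 1/2 and off-diagonal absolute sum 1/2, so all eigenvalues lie in [0, 1].

```python
# Generate the cycle-structured test matrix K = (1/2) I + (1/4) A, where A has
# +/-1 entries (fair coin) on the edges of the N-cycle and zeros elsewhere.
# Gershgorin: each row has diagonal 1/2 and off-diagonal absolute sum 1/2,
# so every eigenvalue of K lies in [0, 1], i.e., 0 <= K <= I.
import random

def random_cycle_kernel(N, rng):
    K = [[0.0] * N for _ in range(N)]
    for i in range(N):
        K[i][i] = 0.5
    for i in range(N):
        j = (i + 1) % N  # edge {i, i+1} of the cycle
        sign = rng.choice([-1.0, 1.0])
        K[i][j] = K[j][i] = sign / 4.0
    return K

rng = random.Random(0)
K = random_cycle_kernel(6, rng)
row_radii = [sum(abs(K[i][j]) for j in range(6) if j != i) for i in range(6)]
```

The clique case is analogous, with every off-diagonal entry set to ±1/(4√N).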
For both cases, and for a range of values of the matrix dimension N and the number of samples n, we
run our algorithm on 50 randomly generated instances. We record the proportion of trials
in which we recover the graph induced by K, and the proportion of trials in which we recover
both the graph and correctly determine the signs of the entries. In Figure 2-1, the shade of
each box represents the proportion of trials in which recovery was successful for a given pair
(N, n). A completely white box corresponds to a zero success rate, a black box to a perfect success
rate. The plots corresponding to the cycle and the clique are telling. We note that for the
clique, recovering the sparsity pattern and recovering the signs of the off-diagonal entries
go hand-in-hand. However, for the cycle, there is a noticeable gap between the number
of samples required to recover the sparsity pattern and the number of samples required to
recover the signs of the off-diagonal entries. This empirically confirms the central role that
cycle sparsity plays in parameter estimation, and further corroborates our theoretical results.
2.3 Recovering a Magnitude-Symmetric Matrix from its
Principal Minors

In this section, we consider matrices K ∈ R^{N×N} satisfying |K_{i,j}| = |K_{j,i}| for i, j ∈ [N],
which we refer to as magnitude-symmetric matrices, and investigate the algorithmic question
of recovering such a matrix from its principal minors. First, we require a number of key
definitions and notation. For completeness (and to allow independent reading), we are
including terms and notation that we have defined in Section 2.2. If K ∈ R^{N×N} and
S ⊆ [N], then K_S := (K_{i,j})_{i,j ∈ S} and ∆_S(K) := det K_S is the principal minor of K associated
with the set S (∆_∅(K) = 1 by convention). When it is clear from the context, we simply
write ∆_S instead of ∆_S(K), and for sets of order at most four, we replace the set itself by
its elements, i.e., we write ∆_{1,2,3} instead of ∆_{{1,2,3}}.
In addition, we recall a number of relevant graph-theoretic definitions. In this work, all
graphs G are simple, undirected graphs. An articulation point or cut vertex of a graph G is a
vertex whose removal disconnects the graph. A graph G is said to be two-connected if it has
at least two vertices and has no articulation points. A maximal two-connected subgraph H
of G is called a block of G. Given a graph G, a cycle C is a subgraph of G in which every
vertex of C has even degree (within C). A simple cycle C is a connected subgraph of G in
which every vertex of C has degree two. For simplicity, we may sometimes describe C by a
traversal of its vertices along its edges, i.e., C = i_1 i_2 ... i_k i_1; notice we have the choice of
the start vertex and the orientation of the cycle in this description. We denote the subgraph
induced by a vertex subset S ⊆ V by G[S]. A simple cycle C of a graph G on vertex set
V(C) is not necessarily induced, and while the cycle itself has all vertices of degree two,
this cycle may contain some number of chords in the original graph G; we denote this
set E(G[V(C)]) ∖ E(C) of chords by K(C).
Given any subgraph H of G, we can associate with H an incidence vector χ_H ∈ GF(2)^m,
m = |E|, where χ_H(e) = 1 if and only if e ∈ H, and χ_H(e) = 0 otherwise. Given two
subgraphs H_1, H_2 ⊆ G, we define their sum H_1 + H_2 as the graph containing all
edges in exactly one of E(H_1) and E(H_2) (i.e., their symmetric difference) and no isolated
vertices. This corresponds to the graph resulting from the sum of their incidence vectors.
The cycle space of G is given by

    𝒞(G) = span{χ_C | C is a cycle of G} ⊆ GF(2)^m,

and has dimension ν = m − N + κ(G), where κ(G) denotes the number of connected
components of G. The quantity ν is commonly referred to as the cyclomatic number. The
cycle sparsity ℓ of the graph G is the smallest number for which the set of incidence vectors
χ_C of cycles of edge length at most ℓ spans 𝒞(G).
In this section, we require the use and analysis of graphs G = ([N], E) endowed with a
linear Boolean function φ that maps subgraphs of G to {−1, +1}, i.e., φ(e) ∈ {−1, +1} for
all e ∈ E(G) and

    φ(H) = ∏_{e ∈ E(H)} φ(e)

for all subgraphs H. The graph G combined with a linear Boolean function φ is denoted
by G = ([N], E, φ) and referred to as a charged graph. If φ(H) = +1 (resp. φ(H) = −1),
then we say the subgraph H is positive (resp. negative). For the sake of space, we often
denote φ({i, j}), {i, j} ∈ E(G), by φ_{i,j}. Given a magnitude-symmetric matrix K ∈ R^{N×N},
|K_{i,j}| = |K_{j,i}| for i, j ∈ [N], we define the charged sparsity graph G_K as G_K = ([N], E, φ),
E := {{i, j}, i, j ∈ [N] | i ≠ j, |K_{i,j}| ≠ 0}, φ({i, j}) := sgn(K_{i,j}K_{j,i}), and when clear from
context, we simply write G instead of G_K.
We define the span of the incidence vectors of positive cycles as

    𝒞₊(G) = span{χ_C | C is a cycle of G, φ(C) = +1}.

As we will see, when H is a block, the space 𝒞₊(H) is spanned by positive simple cycles
(Proposition 3). We define the simple cycle sparsity ℓ₊ of 𝒞₊(G) to be the smallest number
for which the set of incidence vectors χ_C of positive simple cycles of edge length at most
ℓ₊ spans 𝒞₊(G) (or, equivalently, the set 𝒞₊(H) for every block H of G). Unlike for 𝒞(G), for
𝒞₊(G) this quantity ℓ₊ depends on whether we consider a basis of cycles or of only simple
cycles. The study of bases of 𝒞₊(H), for blocks H, consisting of incidence vectors of simple
cycles constitutes the major graph-theoretic subject of this work, and has connections to the
recovery of a magnitude-symmetric matrix from its principal minors.
2.3.1 Principal Minors and Magnitude-Symmetric Matrices

In this subsection, we are primarily concerned with recovering a magnitude-symmetric
matrix K from its principal minors. Using only principal minors of order one and two,
the quantities K_{i,i}, K_{i,j}K_{j,i}, and the charged graph G_K can be computed immediately, as
K_{i,i} = ∆_i and K_{i,j}K_{j,i} = ∆_i∆_j − ∆_{i,j} for all i ≠ j. The main focus of this subsection is
to obtain further information on K using principal minors of order greater than two, and
to quantify the extent to which K can be identified from its principal minors. To avoid the
unintended cancellation of terms in a principal minor, in what follows we assume that K is
generic in the sense that

    if K_{i,j}K_{j,k}K_{k,l}K_{l,i} ≠ 0 for i, j, k, l ∈ [N] distinct, then |K_{i,j}K_{k,l}| ≠ |K_{i,k}K_{l,j}|,   (2.6)
    and ϵ_1 K_{i,j}K_{j,k}K_{k,l}K_{l,i} + ϵ_2 K_{i,j}K_{j,l}K_{l,k}K_{k,i} + ϵ_3 K_{i,k}K_{k,j}K_{j,l}K_{l,i} ≠ 0
    for all ϵ_1, ϵ_2, ϵ_3 ∈ {−1, 1}.

The first part of the condition in (2.6) implies that the three terms (corresponding to the three
distinct cycles on four vertices) in the second part are all distinct in magnitude; the second
part strengthens this requirement. Condition (2.6) can be thought of as a no-cancellation
requirement for four-cycles in principal minors of order four. As we will see, this condition
is quite important for the recovery of a magnitude-symmetric matrix from its principal
minors. The results of this section, slightly modified, also hold under slightly weaker
conditions than (2.6), albeit at the cost of simplicity. This condition rules out dense matrices
with a high degree of symmetry, as well as the large majority of rank one matrices, but it is
satisfied for almost all matrices.

We denote the set of N × N magnitude-symmetric matrices satisfying Property (2.6)
by 𝒦_N, and, when the dimension is clear from context, we often simply write 𝒦. We note
that if K ∈ 𝒦, then any matrix K′ satisfying |K′_{i,j}| = |K_{i,j}|, i, j ∈ [N], is also in 𝒦. In this
subsection, we answer the following question:

    Given K ∈ 𝒦, what is the minimal ℓ₊ such that any K′ ∈ 𝒦 with   (2.7)
    ∆_S(K) = ∆_S(K′) for all |S| ≤ ℓ₊ also satisfies ∆_S(K) = ∆_S(K′)
    for all S ⊆ [N]?
Question (2.7) asks for the smallest ℓ₊ such that the principal minors of order at most ℓ₊
uniquely determine the principal minors of all orders. In Theorem 5, we show that the answer
is the simple cycle sparsity of 𝒞₊(G). In Subsections 2.3.2 and 2.3.3, we build on the
analysis of this subsection and produce a polynomial-time algorithm for recovering a matrix
K with prescribed principal minors (possibly given up to some error term). The polynomial-
time algorithm (of Subsection 2.3.3) for recovery given perturbed principal minors has key
connections to learning signed DPPs.

In addition, it is also reasonable to ask:

    Given K ∈ 𝒦, what is the set of K′ ∈ 𝒦 that satisfy ∆_S(K) = ∆_S(K′)   (2.8)
    for all S ⊆ [N]?

Question (2.8) treats the extent to which we can hope to recover a matrix from its principal
minors. For instance, the transpose operation K^⊤ and the similarity transformation DKD,
where D is a diagonal matrix with entries {−1, +1} on the diagonal, both clearly preserve
principal minors. In fact, these two operations, suitably combined, completely define this
set. This result follows fairly quickly from a more general result of Loewy [79, Theorem 1]
regarding matrices with all principal minors equal. In particular, Loewy shows that if two
n × n, n ≥ 4, matrices A and B have all principal minors equal, and A is irreducible and
satisfies rank(A_{R,T}) ≥ 2 or rank(A_{T,R}) ≥ 2 (where A_{R,T} is the submatrix of A containing the rows
in R and the columns in T) for all partitions R ∪ T = [n], R ∩ T = ∅, |R|, |T| ≥ 2, then
either B or B^⊤ is diagonally similar to A. In Proposition 4, we include an alternate proof
answering Question (2.8), as Loewy's result and proof, though more general and quite nice,
is more involved and not as illuminating for the specific case that we consider here.
To answer Question (2.7), we study properties of certain bases of the space 𝒞₊(G). We
recall that, given a matrix K ∈ 𝒦, we can define the charged sparsity graph G = ([N], E, φ)
of K, where G is the simple graph on vertex set [N] with an edge {i, j} ∈ E, i ≠ j, if
K_{i,j} ≠ 0, endowed with a function φ mapping subgraphs of G to {−1, +1}. Viewing the
possible values ±1 for φ as the two elements of GF(2), we note that φ(H) is additive over
GF(2), i.e., φ(H_1 + H_2) = φ(H_1)φ(H_2). We can make a number of observations regarding
the connectivity of G. If G is disconnected, with connected components given by vertex
sets V_1, ..., V_k, then any principal minor ∆_S satisfies

    ∆_S = ∏_{i=1}^{k} ∆_{S ∩ V_i}.   (2.9)

In addition, if G has a cut vertex u whose removal results in connected components with
vertex sets V_1, ..., V_k, then the principal minor ∆_S satisfies

    ∆_S = ∑_{j_1=1}^{k} ∆_{[{u} ∪ V_{j_1}] ∩ S} ∏_{j_2 ≠ j_1} ∆_{V_{j_2} ∩ S} − (k − 1) ∆_{{u} ∩ S} ∏_{j=1}^{k} ∆_{V_j ∩ S}.   (2.10)

This implies that the principal minors of K are uniquely determined by the principal minors
corresponding to subsets of blocks of G. Given this property, we focus on matrices K
without an articulation point, i.e., such that G is two-connected. Given results for matrices without an
articulation point and Equations (2.9) and (2.10), we can then answer Question (2.7) in the
more general case.
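Equation (2.9) is the block-diagonal factorization of a principal minor restricted to S, and can be verified directly. The toy matrix below is our own illustration of a magnitude-symmetric K whose sparsity graph has two components:

```python
# Check (2.9): if G_K is disconnected with components V_1, ..., V_k, then
# Delta_S factors as the product of the Delta_{S intersect V_i}.
# Toy magnitude-symmetric K (|K_ij| = |K_ji|) with components {0,1} and {2,3}.
from fractions import Fraction

K = [[Fraction(2), Fraction(1), Fraction(0), Fraction(0)],
     [Fraction(-1), Fraction(3), Fraction(0), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(1), Fraction(2)],
     [Fraction(0), Fraction(0), Fraction(2), Fraction(2)]]

def principal_minor(K, S):
    S = sorted(S)
    M = [[K[i][j] for j in S] for i in S]

    def det(M):
        if not M:
            return Fraction(1)  # Delta_emptyset = 1 by convention
        total = Fraction(0)
        for j in range(len(M)):
            sub = [row[:j] + row[j + 1:] for row in M[1:]]
            total += (-1) ** j * M[0][j] * det(sub)
        return total

    return det(M)

components = [{0, 1}, {2, 3}]
S = {0, 1, 2}
lhs = principal_minor(K, S)
rhs = Fraction(1)
for V in components:
    rhs *= principal_minor(K, S & V)
```

Here ∆_{0,1} = 7 and ∆_{2} = 1, so both sides equal 7.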
Next, we make an important observation regarding the contribution of certain terms in the Laplace expansion of a principal minor. Recall that the Laplace expansion of the determinant is given by
\[ \det(K) = \sum_{\tau \in \mathcal{S}_N} \operatorname{sgn}(\tau) \prod_{i=1}^{N} K_{i,\tau(i)}, \]
where $\operatorname{sgn}(\tau)$ is multiplicative over the (disjoint) cycles forming the permutation, and is preserved when taking the inverse of the permutation, or of any cycle therein. Consider now an arbitrary, possibly non-induced, simple cycle $C$ (in the graph-theoretic sense) of the sparsity graph $G$, without loss of generality given by $C = 1 \, 2 \, \cdots \, \ell \, 1$, that satisfies $\rho(C) = -1$. Consider the sum of all terms in the Laplace expansion that contain either the cycle $(1 \, 2 \, \cdots \, \ell)$ or its inverse $(\ell \; \ell-1 \, \cdots \, 1)$ in the associated permutation. Because $\rho(C) = -1$,
\[ K_{\ell,1} \prod_{i=1}^{\ell-1} K_{i,i+1} \;+\; K_{1,\ell} \prod_{i=2}^{\ell} K_{i,i-1} = 0, \]
and the sum over all permutations containing the cyclic permutations $(\ell \; \ell-1 \, \cdots \, 1)$ or $(1 \, 2 \, \cdots \, \ell)$ is zero. Therefore, terms associated with permutations containing negative cycles ($\rho(C) = -1$) do not contribute to principal minors.
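This cancellation is easy to observe numerically. In the sketch below (an arbitrarily chosen magnitude-symmetric matrix, not taken from the thesis), the 3-cycle is negative, the two orientations of the cycle cancel exactly, and the determinant is consequently determined by principal minors of order at most two:

```python
import numpy as np

# Magnitude-symmetric K with |K_{i,j}| = |K_{j,i}|; the charge of the 3-cycle is
# rho_12 * rho_23 * rho_13 = (+1)(+1)(-1) = -1, i.e., the cycle is negative.
K = np.array([[ 1.0, 2.0, 5.0],
              [ 2.0, 1.0, 3.0],
              [-5.0, 3.0, 1.0]])

forward  = K[2, 0] * K[0, 1] * K[1, 2]   # cycle (1 2 3):   K_31 K_12 K_23
backward = K[0, 2] * K[1, 0] * K[2, 1]   # inverse (3 2 1): K_13 K_21 K_32
cancel = forward + backward              # = 0, so the cycle terms vanish

# det(K) is then a function of order-1 and order-2 principal minors alone:
d23 = np.linalg.det(K[1:, 1:])
d13 = np.linalg.det(K[np.ix_([0, 2], [0, 2])])
d12 = np.linalg.det(K[:2, :2])
expr = 1.0 - (1.0 - d23) - (1.0 - d13) - (1.0 - d12)  # diagonal entries are all 1
```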
To illustrate the additional information contained in higher-order principal minors $\Delta_S$, $|S| > 2$, we first consider principal minors of order three and four. Consider the principal minor $\Delta_{1,2,3}$, given by
\[
\begin{aligned}
\Delta_{1,2,3} &= K_{1,1}K_{2,2}K_{3,3} - K_{1,1}K_{2,3}K_{3,2} - K_{2,2}K_{1,3}K_{3,1} - K_{3,3}K_{1,2}K_{2,1} \\
&\qquad + K_{1,2}K_{2,3}K_{3,1} + K_{3,2}K_{2,1}K_{1,3} \\
&= \Delta_1\Delta_2\Delta_3 - \Delta_1\big[\Delta_2\Delta_3 - \Delta_{2,3}\big] - \Delta_2\big[\Delta_1\Delta_3 - \Delta_{1,3}\big] - \Delta_3\big[\Delta_1\Delta_2 - \Delta_{1,2}\big] \\
&\qquad + \big[1 + \rho_{1,2}\,\rho_{2,3}\,\rho_{3,1}\big]\, K_{1,2}K_{2,3}K_{3,1}.
\end{aligned}
\]
If the corresponding graph $G[\{1,2,3\}]$ is not a cycle, or if the cycle is negative ($\rho_{1,2}\,\rho_{2,3}\,\rho_{3,1} = -1$), then $\Delta_{1,2,3}$ can be written in terms of principal minors of order at most two, and contains no additional information about $K$. If $G[\{1,2,3\}]$ is a positive cycle, then we can write $K_{1,2}K_{2,3}K_{3,1}$ as a function of principal minors of order at most three,
\[ K_{1,2}K_{2,3}K_{3,1} = \Delta_1\Delta_2\Delta_3 - \tfrac{1}{2}\big[\Delta_1\Delta_{2,3} + \Delta_2\Delta_{1,3} + \Delta_3\Delta_{1,2}\big] + \tfrac{1}{2}\Delta_{1,2,3}, \]
which allows us to compute $\operatorname{sgn}(K_{1,2}K_{2,3}K_{3,1})$. This same procedure holds for any positively charged induced simple cycle in $G$. However, when a simple cycle is not induced, further analysis is required. To illustrate some of the potential issues for a non-induced positive simple cycle, we consider principal minors of order four.
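Before moving to order four, the order-three identity above is easy to check numerically. The following sketch (with an arbitrary symmetric matrix, so that the 3-cycle is positive; not thesis code) recovers the cycle product $K_{1,2}K_{2,3}K_{3,1}$, and hence its sign, from principal minors alone:

```python
import numpy as np

# Symmetric K, so every cycle is positive (rho_{i,j} = +1 for all edges).
K = np.array([[2.0, 1.0, 4.0],
              [1.0, 3.0, 2.0],
              [4.0, 2.0, 5.0]])

def pm(K, idx):
    """Principal minor of K on the index list idx."""
    return np.linalg.det(K[np.ix_(idx, idx)])

d1, d2, d3 = K[0, 0], K[1, 1], K[2, 2]
d12, d13, d23 = pm(K, [0, 1]), pm(K, [0, 2]), pm(K, [1, 2])
d123 = pm(K, [0, 1, 2])

# Right-hand side of the displayed identity; here it should equal
# K_{1,2} K_{2,3} K_{3,1} = 1 * 2 * 4 = 8 up to rounding.
cycle_term = d1 * d2 * d3 - 0.5 * (d1 * d23 + d2 * d13 + d3 * d12) + 0.5 * d123
```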
Consider the principal minor $\Delta_{1,2,3,4}$. By our previous analysis, all terms in the Laplace expansion of $\Delta_{1,2,3,4}$ corresponding to permutations with cycles of length at most three can be written in terms of principal minors of size at most three. What remains is to consider the sum of the terms in the Laplace expansion corresponding to permutations containing a cycle of length four (of which there are three pairs, each pair corresponding to the two orientations of a cycle of length four in the graph sense), which we denote by $\psi$, given by
\[
\begin{aligned}
\psi &= -K_{1,2}K_{2,3}K_{3,4}K_{4,1} - K_{1,2}K_{2,4}K_{4,3}K_{3,1} - K_{1,3}K_{3,2}K_{2,4}K_{4,1} \\
&\qquad - K_{4,3}K_{3,2}K_{2,1}K_{1,4} - K_{4,2}K_{2,1}K_{1,3}K_{3,4} - K_{4,2}K_{2,3}K_{3,1}K_{1,4}.
\end{aligned}
\]
If $G[\{1,2,3,4\}]$ does not contain a positive simple cycle of length four, then $\psi = 0$ and $\Delta_{1,2,3,4}$ can be written in terms of principal minors of order at most three. If $G[\{1,2,3,4\}]$ has exactly one positive cycle of length four, without loss of generality given by $C := 1\,2\,3\,4\,1$, then
\[ \psi = -\big[1 + \rho(C)\big]\, K_{1,2}K_{2,3}K_{3,4}K_{4,1} = -2\,K_{1,2}K_{2,3}K_{3,4}K_{4,1}, \]
and we can write $K_{1,2}K_{2,3}K_{3,4}K_{4,1}$ as a function of principal minors of order at most four, which allows us to compute $\operatorname{sgn}(K_{1,2}K_{2,3}K_{3,4}K_{4,1})$. Finally, we treat the case in which there is more than one positive simple cycle of length four. This implies that all simple cycles of length four are positive and $\psi$ is given by
\[ \psi = -2\big[K_{1,2}K_{2,3}K_{3,4}K_{4,1} + K_{1,2}K_{2,4}K_{4,3}K_{3,1} + K_{1,3}K_{3,2}K_{2,4}K_{4,1}\big]. \tag{2.11} \]
By Condition (2.6), the magnitudes of these three terms are distinct, $\psi \neq 0$, and we can compute the sign of each of these terms, as there is a unique choice of $\epsilon_1, \epsilon_2, \epsilon_3 \in \{-1,+1\}$ satisfying
\[ \epsilon_1\big|K_{1,2}K_{2,3}K_{3,4}K_{4,1}\big| + \epsilon_2\big|K_{1,2}K_{2,4}K_{4,3}K_{3,1}\big| + \epsilon_3\big|K_{1,3}K_{3,2}K_{2,4}K_{4,1}\big| = -\frac{\psi}{2}. \]
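The sign-recovery step in the last display can be carried out by brute force over the eight sign patterns. The sketch below uses hypothetical magnitudes chosen to be generic in the sense of Condition (2.6); it is illustrative only, not the thesis algorithm:

```python
from itertools import product

def recover_signs(mags, target, tol=1e-9):
    """Return the unique (eps1, eps2, eps3) in {-1,+1}^3 with
    sum(eps_i * mags[i]) == target.  Genericity of the magnitudes
    (Condition (2.6)) guarantees at most one pattern can match."""
    hits = [eps for eps in product((-1, 1), repeat=3)
            if abs(sum(e * m for e, m in zip(eps, mags)) - target) < tol]
    if len(hits) != 1:
        raise ValueError("magnitudes are not generic")
    return hits[0]

# Example: term magnitudes 3, 5, 9 and -psi/2 = +3 - 5 + 9 = 7.
signs = recover_signs([3.0, 5.0, 9.0], 7.0)
```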
In order to better understand the behavior of principal minors of order greater than four, we investigate the set of positive simple cycles (i.e., simple cycles $C$ satisfying $\rho(C) = +1$). In the following two propositions, we compute the dimension of $\mathcal{C}_+(G)$, and note that when $G$ is two-connected, this space is spanned by incidence vectors corresponding to positive simple cycles.

Proposition 2. Let $G = ([N], E, \rho)$ be a charged graph. Then $\dim(\mathcal{C}_+(G)) = \nu - 1$ if $G$ contains at least one negative cycle, and equals $\nu$ otherwise.

Proof. Consider a basis $x_1, \dots, x_\nu$ for $\mathcal{C}(G)$. If all associated cycles are positive, then $\dim(\mathcal{C}_+(G)) = \nu$ and we are done. If $G$ has a negative cycle, then, by the linearity of $\rho$, at least one element of $x_1, \dots, x_\nu$ must be the incidence vector of a negative cycle. Without loss of generality, suppose that $x_1, \dots, x_\ell$ are incidence vectors of negative cycles. Then $x_1 + x_2, \dots, x_1 + x_\ell, x_{\ell+1}, \dots, x_\nu$ is a linearly independent set of incidence vectors of positive cycles. Therefore, $\dim(\mathcal{C}_+(G)) \geq \nu - 1$. However, $x_1$ is the incidence vector of a negative cycle, and so cannot be in the span of $x_1 + x_2, \dots, x_1 + x_\ell, x_{\ell+1}, \dots, x_\nu$.
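The proof of Proposition 2 is constructive and translates directly into a few lines over $GF(2)$. In this sketch, cycles are represented as edge bitmasks (an encoding assumed here purely for illustration), and XOR implements addition of incidence vectors:

```python
def positive_cycle_space(basis):
    """Given a cycle basis as (edge_bitmask, charge) pairs, return incidence
    vectors spanning C_+(G), following the construction in Proposition 2."""
    neg = [x for x, c in basis if c == -1]
    pos = [x for x, c in basis if c == +1]
    if not neg:
        return pos                       # every basis cycle positive: dim = nu
    x1 = neg[0]
    # x1 + x_i (XOR over GF(2)) is the incidence vector of a positive cycle;
    # together with the positive basis vectors this gives dim = nu - 1.
    return [x1 ^ x for x in neg[1:]] + pos

# Three basis cycles on edges encoded as bits; two of them are negative.
basis = [(0b0111, -1), (0b1101, -1), (0b1011, +1)]
span = positive_cycle_space(basis)
```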
Proposition 3. Let $G = ([N], E, \rho)$ be a two-connected charged graph. Then
\[ \mathcal{C}_+(G) = \operatorname{span}\big\{\chi_C \;\big|\; C \text{ is a simple cycle of } G,\; \rho(C) = +1\big\}. \]

Proof. If $G$ does not contain a negative cycle, then the result follows immediately, as then $\mathcal{C}_+(G) = \mathcal{C}(G)$. Suppose then that $G$ has at least one negative cycle. Decomposing this negative cycle into the union of edge-disjoint simple cycles, we note that $G$ must also have at least one negative simple cycle.

Since $G$ is a two-connected graph, it admits a proper (also called open) ear decomposition with $\nu - 1$ proper ears (Whitney [128], see also [63]) starting from any simple cycle. We choose our initial simple cycle to be a negative simple cycle denoted by $G_0$. Whitney's proper ear decomposition says that we can obtain a sequence of graphs $G_0, G_1, \dots, G_{\nu-1} = G$, where $G_i$ is obtained from $G_{i-1}$ by adding a path $P_i$ between two distinct vertices $u_i$ and $v_i$ of $V(G_{i-1})$ with its internal vertices not belonging to $V(G_{i-1})$ (a proper or open ear). By construction, $P_i$ is also a path between $u_i$ and $v_i$ in $G_i$.

We will prove a stronger statement by induction on $j$, namely that for suitably constructed positive simple cycles $C_i$, $i = 1, \dots, \nu - 1$, we have that, for every $j = 0, 1, \dots, \nu - 1$:

(i) $\mathcal{C}_+(G_j) = \operatorname{span}\{\chi_{C_i} : 1 \leq i \leq j\}$,

(ii) for every pair of distinct vertices $u, v \in V(G_j)$, there exist both a positive path in $G_j$ between $u$ and $v$ and a negative path in $G_j$ between $u$ and $v$.

For $j = 0$, (i) is clear and (ii) follows from the fact that the two paths between $u$ and $v$ in the cycle $G_0$ must have opposite charge, since their sum is $G_0$ and $\rho(G_0) = -1$. For $j > 0$, we assume that we have constructed $C_1, \dots, C_{j-1}$ satisfying (i) and (ii) for smaller values of $j$. To construct $C_j$, we take the path $P_j$ between $u_j$ and $v_j$ and complete it with a path of charge $\rho(P_j)$ between $u_j$ and $v_j$ in $G_{j-1}$. The existence of this latter path follows from (ii), and these two paths together form a simple cycle $C_j$ with $\rho(C_j) = +1$. It is clear that this $C_j$ is linearly independent from all the previously constructed cycles $C_i$, since $C_j$ contains $P_j$ but $P_j$ was not even part of $G_{j-1}$. This implies (i) for $j$.

To show (ii) for $G_j$, we need to consider three cases for $u \neq v$. If $u, v \in V(G_{j-1})$, we can use the corresponding positive and negative paths in $G_{j-1}$ (whose existence follows from (ii) for $j - 1$). If $u, v \in V(P_j)$, one of the paths can be the subpath $P_{uv}$ of $P_j$ between $u$ and $v$, and the other path can be $P_j - P_{uv}$ together with a path in $G_{j-1}$ between $u_j$ and $v_j$ of charge equal to $-\rho(P_j)$ (so that together they form a negative cycle). Otherwise, we must have $u \in V(P_j) - \{u_j, v_j\}$ and $v \in V(G_{j-1}) - \{u_j, v_j\}$ (or vice versa), and we can select the paths to be the path in $P_j$ from $u$ to $u_j$ together with two oppositely charged paths in $G_{j-1}$ between $u_j$ and $v$. In all these cases, we have shown property (ii) holds for $G_j$.
For the remainder of the analysis in this subsection, we assume that $G$ contains a negative cycle; otherwise $\mathcal{C}_+(G) = \mathcal{C}(G)$ and we inherit all of the desirable properties of $\mathcal{C}(G)$. Next, we study the properties of simple cycle bases (bases consisting of incidence vectors of simple cycles) of $\mathcal{C}_+(G)$ that are minimal in some sense. We say that a simple cycle basis $\chi_{C_1}, \dots, \chi_{C_{\nu-1}}$ for $\mathcal{C}_+(G)$ is lexicographically minimal if it lexicographically minimizes $(|C_1|, |C_2|, \dots, |C_{\nu-1}|)$, i.e., minimizes $|C_1|$, minimizes $|C_2|$ conditional on $|C_1|$ being minimal, etc. Any lexicographically minimal basis also minimizes $\sum_i |C_i|$ and $\max_i |C_i|$ (by optimality of the greedy algorithm for (linear) matroids). For brevity, we will simply refer to such a basis as "minimal." One complicating issue is that, while a minimal cycle basis for $\mathcal{C}(G)$ always consists of induced simple cycles, this is no longer the case for $\mathcal{C}_+(G)$. This can happen, for example, in an appropriately constructed graph with only two short negative simple cycles, far away from each other; a minimal cycle basis for $\mathcal{C}_+(G)$ then contains a cycle consisting of the union of these two negative simple cycles.
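The greedy matroid argument mentioned above is simple to sketch: sort candidate cycles by length and keep each one whose incidence vector is $GF(2)$-independent of those already kept. The bitmask encoding and hypothetical candidate list below are assumptions made only for illustration:

```python
def greedy_cycle_basis(cycles, dim):
    """Greedily select a lexicographically minimal cycle basis.

    cycles: iterable of (length, edge_bitmask) pairs for candidate positive
    simple cycles, assumed to span a space of dimension dim over GF(2).
    """
    pivots = {}                       # leading bit -> reduced vector
    chosen = []
    for length, vec in sorted(cycles):
        v = vec
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:    # independent of chosen cycles: keep it
                pivots[lead] = v
                chosen.append((length, vec))
                break
            v ^= pivots[lead]         # reduce by the pivot and re-test
        if len(chosen) == dim:
            break
    return chosen

# The 4-cycle 0b1100 = 0b0111 XOR 0b1011 is dependent, so it is skipped.
basis = greedy_cycle_basis([(3, 0b0111), (4, 0b1100), (3, 0b1011)], dim=2)
```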
However, we can make a number of statements regarding chords of simple cycles corresponding to elements of a minimal simple cycle basis of $\mathcal{C}_+(G)$. In the following lemma, we show that a minimal simple cycle basis satisfies a number of desirable properties. However, before we do so, we introduce some useful notation. Given a simple cycle $C$, it is convenient to fix an orientation, say $C = v_1 \, v_2 \, \cdots \, v_k \, v_1$. Now for any chord $\{v_{j_1}, v_{j_2}\} \in K(C)$, $j_1 < j_2$, we denote the two cycles created by this chord by
\[ C(j_1, j_2) = v_{j_1} \, v_{j_1+1} \, \cdots \, v_{j_2} \, v_{j_1} \quad \text{and} \quad C(j_2, j_1) = v_{j_2} \, v_{j_2+1} \, \cdots \, v_k \, v_1 \, \cdots \, v_{j_1} \, v_{j_2}. \]
We have the following result.
Lemma 6. Let $G = ([N], E, \rho)$ be a two-connected charged graph, and let $C := v_1 \, v_2 \, \cdots \, v_k \, v_1$ be a cycle corresponding to an incidence vector of a minimal simple cycle basis $x_1, \dots, x_{\nu-1}$ of $\mathcal{C}_+(G)$. Then

(i) $\rho\big(C(j_1, j_2)\big) = \rho\big(C(j_2, j_1)\big) = -1$ for all $\{v_{j_1}, v_{j_2}\} \in K(C)$,

(ii) if $\{v_{j_1}, v_{j_2}\}, \{v_{j_3}, v_{j_4}\} \in K(C)$ satisfy $j_1 < j_3 < j_2 < j_4$ (crossing chords), then either $j_3 - j_1 = j_4 - j_2 = 1$ or $j_1 = 1$, $j_2 - j_3 = 1$, $j_4 = k$, i.e., these two chords form a four-cycle with two edges of the cycle,

(iii) there do not exist $\{v_{j_1}, v_{j_2}\}, \{v_{j_3}, v_{j_4}\}, \{v_{j_5}, v_{j_6}\} \in K(C)$ satisfying either $j_1 < j_2 \leq j_3 < j_4 \leq j_5 < j_6$ or $j_6 \leq j_1 < j_2 \leq j_3 < j_4 \leq j_5$.
Proof. Without loss of generality, suppose that ๐ถ = 1 2 ... ๐. Given a chord ๐1, ๐2,
๐ถ(๐1, ๐2) + ๐ถ(๐2, ๐1) = ๐ถ, and so ๐(๐ถ) = +1 implies that ๐(๐ถ(๐1, ๐2)
)= ๐(๐ถ(๐2, ๐1)
).
If both these cycles were positive, then this would contradict the minimality of the basis, as
|๐ถ(๐1, ๐2)|, |๐ถ(๐2, ๐1)| < ๐. This completes the proof of Property (i).
Given two chords ๐1, ๐2 and ๐3, ๐4 satisfying ๐1 < ๐3 < ๐2 < ๐4, we consider the
cycles
๐ถ1 := ๐ถ(๐1, ๐2) + ๐ถ(๐3, ๐4),
๐ถ2 := ๐ถ(๐1, ๐2) + ๐ถ(๐4, ๐3).
By Property (i), ๐(๐ถ1) = ๐(๐ถ2) = +1. In addition, ๐ถ1 + ๐ถ2 = ๐ถ, and |๐ถ1|, |๐ถ2| โค |๐ถ|.
By the minimality of the basis, either |๐ถ1| = |๐ถ| or |๐ถ2| = |๐ถ|, which implies that either
|๐ถ1| = 4 or |๐ถ2| = 4 (or both). This completes the proof of Property (ii).
Given three non-crossing chords ๐1, ๐2, ๐3, ๐4, and ๐5, ๐6, with ๐1 < ๐2 โค ๐3 <
๐4 โค ๐5 < ๐6 (the other case is identical, up to rotation), we consider the cycles
๐ถ1 := ๐ถ(๐3, ๐4) + ๐ถ(๐5, ๐6) + ๐ถ,
๐ถ2 := ๐ถ(๐1, ๐2) + ๐ถ(๐5, ๐6) + ๐ถ,
๐ถ3 := ๐ถ(๐1, ๐2) + ๐ถ(๐3, ๐4) + ๐ถ.
By Property (i), ๐(๐ถ1) = ๐(๐ถ2) = ๐(๐ถ3) = +1. In addition, ๐ถ1 + ๐ถ2 + ๐ถ3 = ๐ถ and
|๐ถ1|, |๐ถ2|, |๐ถ3| < |๐ถ|, a contradiction to the minimality of the basis. This completes the
proof of Property (iii).
The above lemma tells us that in a minimal simple cycle basis for $\mathcal{C}_+(G)$, in each cycle two crossing chords always form a positive cycle of length four (and thus any chord has at most one other crossing chord), and there do not exist three non-crossing chords without one chord lying in between the other two. Using a simple cycle basis of $\mathcal{C}_+(G)$ that satisfies the three properties of the above lemma, we aim to show that the principal minors of order at most the length of the longest cycle in the basis uniquely determine the principal minors of all orders. To do so, we prove a series of three lemmas showing that from the principal minor corresponding to a cycle $C = v_1 \, \cdots \, v_k \, v_1$ in our basis and $O(1)$ principal minors of subsets of $V(C)$, we can compute the quantity
\[ s(C) := \prod_{\{i,j\} \in E(C),\; i < j} \operatorname{sgn}\big(K_{i,j}\big). \]
Once we have computed $s(C_i)$ for every simple cycle in our basis, we can compute $s(C)$ for any simple positive cycle by writing $C$ as a sum of some subset of the simple cycles $C_1, \dots, C_{\nu-1}$ corresponding to incidence vectors in our basis, and taking the product of $s(C_i)$ for all indices $i$ in the aforementioned sum. To this end, we prove the following three lemmas.
Lemma 7. Let $C$ be a positive simple cycle of $G = ([N], E, \rho)$ with $V(C) = \{v_1, v_2, \dots, v_k\}$ whose chords $K(C)$ satisfy Properties (i)-(iii) of Lemma 6. Then there exist two edges of $C$, say, $\{v_1, v_k\}$ and $\{v_\ell, v_{\ell+1}\}$, where $C := v_1 \, v_2 \, \cdots \, v_k \, v_1$, such that

(i) all chords $\{v_i, v_j\} \in K(C)$, $i < j$, satisfy $i \leq \ell < j$,

(ii) any positive simple cycle in $G[V(C)]$ containing $\{v_1, v_k\}$ spans $V(C)$ and contains only edges in $E(C)$ and pairs of crossed chords.
Proof. We first note that Property (i) of Lemma 6 implies that the charge of a cycle $C' \subseteq G[V(C)]$ corresponds to the parity of $|E(C') \cap K(C)|$, i.e., if $C'$ contains an even number of chords then $\rho(C') = +1$, otherwise $\rho(C') = -1$. This follows from constructing a cycle basis for $G[V(C)]$ consisting of incidence vectors corresponding to $C$ and $C(u, v)$ for every chord $\{u, v\} \in K(C)$.

Let the cycle $C(i, j)$ be a shortest cycle in $G[V(C)]$ containing exactly one chord of $C$. If this is a non-crossed chord, then we take an arbitrary edge in $E(C(i, j)) - \{i, j\}$ to be $\{v_1, v_k\}$. If this chord crosses another, denoted by $\{i', j'\}$, then either $C(i', j')$ or $C(j', i')$ is also a shortest cycle, and we take an arbitrary edge in the intersection of these two shortest cycles. Without loss of generality, let $\{1, k\}$ denote this edge and let the cycle be $1 \, 2 \, \cdots \, k \, 1$, where for simplicity we have assumed that $V(C) = [k]$. Because of Property (iii) of Lemma 6, there exists $\ell$ such that all chords $\{i, j\} \in K(C)$ satisfy $i \leq \ell$ and $j > \ell$. This completes the proof of Property (i).

Next, consider an arbitrary positive simple cycle $C'$ in $G[V(C)]$ containing $\{1, k\}$. We aim to show that this cycle contains only edges in $E(C)$ and pairs of crossed chords, from which the equality $s(C') = s(C)$ immediately follows. Since $C'$ is positive, it contains an even number of chords of $K(C)$, and either all the chords in $K(C) \cap E(C')$ are pairs of crossed chords, or there exist two chords $\{i_1, j_1\}, \{i_2, j_2\} \in K(C) \cap E(C')$, w.l.o.g. $i_1 \leq i_2 \leq \ell < j_2 < j_1$ (we can simply reverse the orientation if $i_1 = i_2$), that do not cross any other chord in $K(C) \cap E(C')$. In this case, there would be a $1 - i_2$ path in $C'$ containing neither $i_1$ nor $j_1$ (as $\{i_1, j_1\}$ would be along the other path between $1$ and $i_2$ in $C'$, and $i_1 \neq i_2$). However, this is a contradiction, as no such path exists in $C'$, since $\{i_1, j_1\}$ does not cross any chord in $K(C) \cap E(C')$. Therefore, the chords in $K(C) \cap E(C')$ are all pairs of crossed chords. Furthermore, since all pairs of crossed chords must define cycles of length four, we have $s(C') = s(C)$. This completes the proof of Property (ii).
Although this is not needed in what follows, observe that, in the above Lemma 7, $\{v_\ell, v_{\ell+1}\}$ also plays the same role as $\{v_1, v_k\}$, in the sense that any positive simple cycle containing $\{v_\ell, v_{\ell+1}\}$ also spans $V(C)$ and contains only pairs of crossed chords and edges in $E(C)$.
Lemma 8. Let $K \in \mathcal{K}$ have charged sparsity graph $G = ([N], E, \rho)$, and let $C = v_1 \, \cdots \, v_k \, v_1$ be a positive simple cycle of $G$ whose chords $K(C)$ satisfy Properties (i)-(iii) of Lemma 6, and with vertices ordered as in Lemma 7. Then the principal minor corresponding to $S = V(C)$ is given by
\[
\begin{aligned}
\Delta_S &= \Delta_{v_1}\Delta_{S - v_1} + \Delta_{v_k}\Delta_{S - v_k} + \big[\Delta_{v_1,v_k} - 2\Delta_{v_1}\Delta_{v_k}\big]\,\Delta_{S - \{v_1,v_k\}} \\
&\quad - 2\,K_{v_1,v_{k-1}}K_{v_{k-1},v_k}K_{v_k,v_2}K_{v_2,v_1}\,\Delta_{S - \{v_1,v_2,v_{k-1},v_k\}} \\
&\quad - \big[\Delta_{v_1,v_2} - \Delta_{v_1}\Delta_{v_2}\big]\big[\Delta_{v_{k-1},v_k} - \Delta_{v_{k-1}}\Delta_{v_k}\big]\,\Delta_{S - \{v_1,v_2,v_{k-1},v_k\}} \\
&\quad + \big[\Delta_{v_1,v_{k-1}} - \Delta_{v_1}\Delta_{v_{k-1}}\big]\big[\Delta_{v_2,v_k} - \Delta_{v_2}\Delta_{v_k}\big]\,\Delta_{S - \{v_1,v_2,v_{k-1},v_k\}} \\
&\quad + \big[\Delta_{v_1,v_2} - \Delta_{v_1}\Delta_{v_2}\big]\big[\Delta_{S - \{v_1,v_2\}} - \Delta_{v_k}\Delta_{S - \{v_1,v_2,v_k\}}\big] \\
&\quad + \big[\Delta_{v_{k-1},v_k} - \Delta_{v_{k-1}}\Delta_{v_k}\big]\big[\Delta_{S - \{v_{k-1},v_k\}} - \Delta_{v_1}\Delta_{S - \{v_1,v_{k-1},v_k\}}\big] + \psi,
\end{aligned}
\]
where $\psi$ is the sum of terms in the Laplace expansion of $\Delta_S$ corresponding to a permutation $\tau$ where $\tau(v_1) = v_k$ or $\tau(v_k) = v_1$, but not both.
Proof. Without loss of generality, suppose that $C = 1 \, 2 \, \cdots \, k \, 1$. Each term in $\Delta_S$ corresponds to a partition of $S = V(C)$ into a disjoint collection of vertices, pairs corresponding to edges of $G[S]$, and simple cycles of $G[S]$, which can be assumed to be positive. We can decompose the Laplace expansion of $\Delta_S$ into seven sums of terms, based on how the associated permutation for each term treats the elements $1$ and $k$. Let $T_{a_1,a_2}$, $a_1, a_2 \in \{1, k, *\}$, equal the sum of terms corresponding to permutations $\tau$ where $\tau(1) = a_1$ and $\tau(k) = a_2$, where $*$ denotes any element in $\{2, \dots, k-1\}$. The case $\tau(1) = \tau(k)$ obviously cannot occur, and so
\[ \Delta_S = T_{1,k} + T_{1,*} + T_{*,k} + T_{k,1} + T_{k,*} + T_{*,1} + T_{*,*}. \]
By definition, $\psi = T_{k,*} + T_{*,1}$. What remains is to compute the remaining five sums. We have
\[
\begin{aligned}
T_{k,1} &= -K_{1,k}K_{k,1}\,\Delta_{S - \{1,k\}} = \big[\Delta_{1,k} - \Delta_1\Delta_k\big]\,\Delta_{S - \{1,k\}}, \\
T_{1,k} &= \Delta_1\Delta_k\,\Delta_{S - \{1,k\}}, \\
T_{1,k} + T_{1,*} &= \Delta_1\,\Delta_{S - 1}, \\
T_{1,k} + T_{*,k} &= \Delta_k\,\Delta_{S - k},
\end{aligned}
\]
and so
\[ \Delta_S = \Delta_1\Delta_{S-1} + \Delta_k\Delta_{S-k} + \big[\Delta_{1,k} - 2\Delta_1\Delta_k\big]\,\Delta_{S - \{1,k\}} + T_{*,*} + \psi. \]
The sum $T_{*,*}$ corresponds to permutations where $\tau(1) \notin \{1, k\}$ and $\tau(k) \notin \{1, k\}$. We first show that the only permutations that satisfy these two properties either satisfy $\tau^2(1) = 1$ or $\tau^2(k) = k$ (or both), or contain both $1$ and $k$ in the same cycle. Suppose, to the contrary, that there exists some permutation where $1$ and $k$ are not in the same cycle, and both are in cycles of length at least three. Then in $G[S]$ both vertices $1$ and $k$ have a chord emanating from them. By Property (ii) of Lemma 6, each has exactly one chord, and these chords are $\{1, k-1\}$ and $\{2, k\}$. The cycle containing $1$ must also contain $2$ and $k-1$, and the cycle containing $k$ must also contain $2$ and $k-1$, a contradiction.

In addition, we note that, also by the above analysis, the only cycles (of a permutation) that can contain both $1$ and $k$ without having $\tau(1) = k$ or $\tau(k) = 1$ are given by $(1 \; 2 \; k \; k-1)$ and $(1 \; k-1 \; k \; 2)$. Therefore, we can decompose $T_{*,*}$ further into the sum of five terms. Let

• $T_1$ = the sum of terms corresponding to permutations containing either $(1 \; 2 \; k \; k-1)$ or $(1 \; k-1 \; k \; 2)$,

• $T_2$ = the sum of terms corresponding to permutations containing $(1 \; 2)$ and $(k-1 \; k)$,

• $T_3$ = the sum of terms corresponding to permutations containing $(1 \; k-1)$ and $(2 \; k)$,

• $T_4$ = the sum of terms corresponding to permutations containing $(1 \; 2)$ and where $\tau(k) \notin \{k-1, k\}$,

• $T_5$ = the sum of terms corresponding to permutations containing $(k-1 \; k)$ and where $\tau(1) \notin \{1, 2\}$.

$T_1$ and $T_3$ are only non-zero if $\{1, k-1\}$ and $\{2, k\}$ are both chords of $C$. $T_4$ is only non-zero if there is a chord incident to $k$, and $T_5$ is only non-zero if there is a chord incident to $1$. We have $T_{*,*} = T_1 + T_2 + T_3 + T_4 + T_5$, and
\[
\begin{aligned}
T_1 &= -2\,K_{1,k-1}K_{k-1,k}K_{k,2}K_{2,1}\,\Delta_{S - \{1,2,k-1,k\}}, \\
T_2 &= K_{1,2}K_{2,1}K_{k-1,k}K_{k,k-1}\,\Delta_{S - \{1,2,k-1,k\}} \\
&= \big[\Delta_{1,2} - \Delta_1\Delta_2\big]\big[\Delta_{k-1,k} - \Delta_{k-1}\Delta_k\big]\,\Delta_{S - \{1,2,k-1,k\}}, \\
T_3 &= K_{1,k-1}K_{k-1,1}K_{2,k}K_{k,2}\,\Delta_{S - \{1,2,k-1,k\}} \\
&= \big[\Delta_{1,k-1} - \Delta_1\Delta_{k-1}\big]\big[\Delta_{2,k} - \Delta_2\Delta_k\big]\,\Delta_{S - \{1,2,k-1,k\}}, \\
T_2 + T_4 &= -K_{1,2}K_{2,1}\big[\Delta_{S - \{1,2\}} - \Delta_k\Delta_{S - \{1,2,k\}}\big] = \big[\Delta_{1,2} - \Delta_1\Delta_2\big]\big[\Delta_{S - \{1,2\}} - \Delta_k\Delta_{S - \{1,2,k\}}\big], \\
T_2 + T_5 &= -K_{k-1,k}K_{k,k-1}\big[\Delta_{S - \{k-1,k\}} - \Delta_1\Delta_{S - \{1,k-1,k\}}\big] \\
&= \big[\Delta_{k-1,k} - \Delta_{k-1}\Delta_k\big]\big[\Delta_{S - \{k-1,k\}} - \Delta_1\Delta_{S - \{1,k-1,k\}}\big].
\end{aligned}
\]
Combining all of these terms gives us
\[
\begin{aligned}
T_{*,*} &= -2\,K_{1,k-1}K_{k-1,k}K_{k,2}K_{2,1}\,\Delta_{S - \{1,2,k-1,k\}} \\
&\quad - \big[\Delta_{1,2} - \Delta_1\Delta_2\big]\big[\Delta_{k-1,k} - \Delta_{k-1}\Delta_k\big]\,\Delta_{S - \{1,2,k-1,k\}} \\
&\quad + \big[\Delta_{1,k-1} - \Delta_1\Delta_{k-1}\big]\big[\Delta_{2,k} - \Delta_2\Delta_k\big]\,\Delta_{S - \{1,2,k-1,k\}} \\
&\quad + \big[\Delta_{1,2} - \Delta_1\Delta_2\big]\big[\Delta_{S - \{1,2\}} - \Delta_k\Delta_{S - \{1,2,k\}}\big] \\
&\quad + \big[\Delta_{k-1,k} - \Delta_{k-1}\Delta_k\big]\big[\Delta_{S - \{k-1,k\}} - \Delta_1\Delta_{S - \{1,k-1,k\}}\big].
\end{aligned}
\]
Combining our formula for $T_{*,*}$ with our formula for $\Delta_S$ leads to the desired result
\[
\begin{aligned}
\Delta_S &= \Delta_1\Delta_{S-1} + \Delta_k\Delta_{S-k} + \big[\Delta_{1,k} - 2\Delta_1\Delta_k\big]\,\Delta_{S - \{1,k\}} \\
&\quad - 2\,K_{1,k-1}K_{k-1,k}K_{k,2}K_{2,1}\,\Delta_{S - \{1,2,k-1,k\}} \\
&\quad - \big[\Delta_{1,2} - \Delta_1\Delta_2\big]\big[\Delta_{k-1,k} - \Delta_{k-1}\Delta_k\big]\,\Delta_{S - \{1,2,k-1,k\}} \\
&\quad + \big[\Delta_{1,k-1} - \Delta_1\Delta_{k-1}\big]\big[\Delta_{2,k} - \Delta_2\Delta_k\big]\,\Delta_{S - \{1,2,k-1,k\}} \\
&\quad + \big[\Delta_{1,2} - \Delta_1\Delta_2\big]\big[\Delta_{S - \{1,2\}} - \Delta_k\Delta_{S - \{1,2,k\}}\big] \\
&\quad + \big[\Delta_{k-1,k} - \Delta_{k-1}\Delta_k\big]\big[\Delta_{S - \{k-1,k\}} - \Delta_1\Delta_{S - \{1,k-1,k\}}\big] + \psi.
\end{aligned}
\]
Lemma 9. Let $K \in \mathcal{K}$ have two-connected charged sparsity graph $G = ([N], E, \rho)$, and let $C = v_1 \, \cdots \, v_k \, v_1$ be a positive simple cycle of $G$ whose chords $K(C)$ satisfy Properties (i)-(iii) of Lemma 6, with vertices ordered as in Lemma 7. Let $\psi$ equal the sum of terms in the Laplace expansion of $\Delta_S$, $S = V(C)$, corresponding to a permutation $\tau$ where $\tau(v_1) = v_k$ or $\tau(v_k) = v_1$, but not both. Let $T \subset [k]^2$ be the set of pairs $(i, j)$, $i < j$, for which $\{v_i, v_{j-1}\}, \{v_{i+1}, v_j\} \in K(C)$, i.e., cycle edges $\{v_i, v_{i+1}\}$ and $\{v_{j-1}, v_j\}$ have a pair of crossed chords between them. Then $\psi$ is equal to
\[ 2\,(-1)^{k+1}\, K_{v_k,v_1} \prod_{i=1}^{k-1} K_{v_i,v_{i+1}} \prod_{(i,j) \in T} \bigg[1 - \frac{\rho_{v_i,v_{i+1}}\, K_{v_{j-1},v_i}\, K_{v_{i+1},v_j}}{K_{v_i,v_{i+1}}\, K_{v_{j-1},v_j}}\bigg]. \]

Proof. Without loss of generality, suppose that $G$ is labelled so that $C = 1 \, 2 \, \cdots \, k \, 1$. The edge $\{1, k\}$ satisfies Property (ii) of Lemma 7, and so every positive simple cycle of $G[S]$ containing $\{1, k\}$ spans $S$ and contains only edges in $E(C)$ and pairs of crossing chords. Therefore, we may assume without loss of generality that $G[S]$ contains $p$ pairs of crossing chords, and no other chords. If $p = 0$, then $C$ is an induced cycle and the result follows immediately.
There are $2^p$ Hamiltonian $1 - k$ paths in $G[S]$ not containing $\{1, k\}$, corresponding to the $p$ binary choices of whether or not to use each pair of crossing chords. Let us denote these paths from $1$ to $k$ by $P^{1,k}_\omega$, where $\omega \in \mathbb{Z}_2^p$, and $\omega_i$ equals one if the $i^{th}$ crossing is used in the path (ordered based on increasing distance to $\{1, k\}$), and zero otherwise. In particular, $P^{1,k}_0$ corresponds to the path $1 \, 2 \, \cdots \, k$. We only consider paths with the orientation $1 \to k$, as all cycles under consideration are positive, and the opposite orientation has the same sum. Denoting the product of the entries of $K$ corresponding to the path $P^{1,k}_\omega = \ell_1 \, \ell_2 \, \cdots \, \ell_k$, where $\ell_1 = 1$ and $\ell_k = k$, by
\[ K\big(P^{1,k}_\omega\big) = \prod_{i=1}^{k-1} K_{\ell_i,\ell_{i+1}}, \]
we have
\[ \psi = 2\,(-1)^{k+1}\, K_{k,1}\, K\big(P^{1,k}_0\big) \sum_{\omega \in \mathbb{Z}_2^p} \frac{K\big(P^{1,k}_\omega\big)}{K\big(P^{1,k}_0\big)}, \quad \text{where} \quad K\big(P^{1,k}_0\big) = \prod_{i=1}^{k-1} K_{i,i+1}. \]
Suppose that the first possible crossing occurs at cycle edges $\{i, i+1\}$ and $\{j-1, j\}$, $i < j$, i.e., $\{i, j-1\}$ and $\{i+1, j\}$ are crossing chords. Then
\[ \psi = 2\,(-1)^{k+1}\, K_{k,1} \prod_{t=1}^{k-1} K_{t,t+1} \sum_{\omega \in \mathbb{Z}_2^p} \frac{K\big(P^{i,j}_\omega\big)}{K\big(P^{i,j}_0\big)}. \]
We have
\[ \sum_{\omega \in \mathbb{Z}_2^p} K\big(P^{i,j}_\omega\big) = \sum_{\omega_1 = 0} K\big(P^{i,j}_\omega\big) + \sum_{\omega_1 = 1} K\big(P^{i,j}_\omega\big), \]
and
\[ \sum_{\omega_1 = 0} K\big(P^{i,j}_\omega\big) = K_{i,i+1}\, K_{j-1,j} \sum_{\omega' \in \mathbb{Z}_2^{p-1}} K\big(P^{i+1,j-1}_{\omega'}\big), \]
\[ \sum_{\omega_1 = 1} K\big(P^{i,j}_\omega\big) = K_{i,j-1}\, K_{i+1,j} \sum_{\omega' \in \mathbb{Z}_2^{p-1}} K\big(P^{j-1,i+1}_{\omega'}\big) = K_{i,j-1}\, K_{i+1,j} \sum_{\omega' \in \mathbb{Z}_2^{p-1}} K\big(P^{i+1,j-1}_{\omega'}\big)\, \rho\big(P^{i+1,j-1}_{\omega'}\big). \]
By Property (i) of Lemma 6, $\rho\big(C(i, j-1)\big) = -1$, and so
\[ \rho\big(P^{i+1,j-1}_{\omega'}\big) = -\rho_{j-1,i}\,\rho_{i,i+1} \quad \text{and} \quad K_{i,j-1}\, \rho\big(P^{i+1,j-1}_{\omega'}\big) = -\rho_{i,i+1}\, K_{j-1,i}. \]
This implies that
\[ \sum_{\omega \in \mathbb{Z}_2^p} K\big(P^{i,j}_\omega\big) = \big(K_{i,i+1}\, K_{j-1,j} - \rho_{i,i+1}\, K_{j-1,i}\, K_{i+1,j}\big) \sum_{\omega' \in \mathbb{Z}_2^{p-1}} K\big(P^{i+1,j-1}_{\omega'}\big), \]
and that
\[ \sum_{\omega \in \mathbb{Z}_2^p} \frac{K\big(P^{i,j}_\omega\big)}{K\big(P^{i,j}_0\big)} = \bigg(1 - \frac{\rho_{i,i+1}\, K_{j-1,i}\, K_{i+1,j}}{K_{i,i+1}\, K_{j-1,j}}\bigg) \sum_{\omega' \in \mathbb{Z}_2^{p-1}} \frac{K\big(P^{i+1,j-1}_{\omega'}\big)}{K\big(P^{i+1,j-1}_0\big)}. \]
Repeating the above procedure for the remaining $p - 1$ crossings completes the proof.
Equipped with Lemmas 6, 7, 8, and 9, we can now make a key observation. Suppose that we have a simple cycle basis $x_1, \dots, x_{\nu-1}$ for $\mathcal{C}_+(G)$ whose corresponding cycles all satisfy Properties (i)-(iii) of Lemma 6. Of course, a minimal simple cycle basis satisfies this, but in Subsections 2.3.2 and 2.3.3 we will consider alternate bases that also satisfy these conditions and may be easier to compute in practice. For cycles $C$ of length at most four, we have already detailed how to compute $s(C)$, and this computation requires only principal minors corresponding to subsets of $V(C)$. When $C$ is of length greater than four, by Lemmas 8 and 9, we can also compute $s(C)$ using only principal minors corresponding to subsets of $V(C)$, as the quantity $\operatorname{sgn}(K_{v_1,v_{k-1}} K_{v_{k-1},v_k} K_{v_k,v_2} K_{v_2,v_1})$ in Lemma 8 corresponds to a positive four-cycle, and in Lemma 9 the quantity
\[ \operatorname{sgn}\bigg(1 - \frac{\rho_{v_i,v_{i+1}}\, K_{v_{j-1},v_i}\, K_{v_{i+1},v_j}}{K_{v_i,v_{i+1}}\, K_{v_{j-1},v_j}}\bigg) \]
equals $+1$ if $|K_{v_i,v_{i+1}} K_{v_{j-1},v_j}| > |K_{v_{j-1},v_i} K_{v_{i+1},v_j}|$, and equals
\[ -\rho_{v_i,v_{i+1}}\,\rho_{v_{j-1},v_i}\, \operatorname{sgn}\big(K_{v_i,v_{i+1}}\, K_{v_{i+1},v_j}\, K_{v_i,v_{j-1}}\, K_{v_{j-1},v_j}\big) \]
if $|K_{v_i,v_{i+1}} K_{v_{j-1},v_j}| < |K_{v_{j-1},v_i} K_{v_{i+1},v_j}|$. By Condition (2.6), we have $|K_{v_i,v_{i+1}} K_{v_{j-1},v_j}| \neq |K_{v_{j-1},v_i} K_{v_{i+1},v_j}|$. Therefore, given a simple cycle basis $x_1, \dots, x_{\nu-1}$ whose corresponding cycles satisfy Properties (i)-(iii) of Lemma 6, we can compute $s(C)$ for every such cycle in the basis using only principal minors of size at most the length of the longest cycle in the basis. Given this fact, we are now prepared to answer Question (2.7) and provide an alternate proof for Question (2.8) through the following proposition and theorem.
Proposition 4. Let $K \in \mathcal{K}$, with charged sparsity graph $G = ([N], E, \rho)$. The set of $K' \in \mathcal{K}$ that satisfy $\Delta_S(K) = \Delta_S(K')$ for all $S \subseteq [N]$ is exactly the set generated by $K$ and the operations

$D_N$-similarity: $K \to DKD$, where $D \in \mathbb{R}^{N \times N}$ is an arbitrary involutory diagonal matrix, i.e., $D$ has entries $\pm 1$ on the diagonal, and

block transpose: $K \to K'$, where $K'_{i,j} = K_{j,i}$ for all $i, j \in V(H)$ for some block $H$, and $K'_{i,j} = K_{i,j}$ otherwise.

Proof. We first verify that the $D_N$-similarity and block transpose operations both preserve principal minors. The determinant is multiplicative, and so principal minors are immediately preserved under $D_N$-similarity, as the principal submatrix of $DKD$ corresponding to $S$ is equal to the product of the principal submatrices corresponding to $S$ of the three matrices $D$, $K$, $D$, and so
\[ \Delta_S(DKD) = \Delta_S(D)\,\Delta_S(K)\,\Delta_S(D) = \Delta_S(K). \]
For the block transpose operation, we note that the transpose leaves the determinant unchanged. By Equation (2.10), the principal minors of a matrix are uniquely determined by the principal minors of the matrices corresponding to the blocks of $G$. As the transpose of any block also leaves the principal minors corresponding to subsets of other blocks unchanged, principal minors are preserved under block transpose.
What remains is to show that if $K'$ satisfies $\Delta_S(K) = \Delta_S(K')$ for all $S \subseteq [N]$, then $K'$ is generated by $K$ and the above two operations. Without loss of generality, we may suppose that the shared charged sparsity graph $G = ([N], E, \rho)$ of $K$ and $K'$ is two-connected, as the general case follows from this one. We will transform $K'$ to $K$ by first making them agree for entries corresponding to a spanning tree and a negative edge of $G$, and then observing that, by Lemmas 8 and 9, this property implies that all entries agree.

Let $C'$ be an arbitrary negative simple cycle of $G$, let $e \in E(C')$ be a negative edge in $C'$, and let $T$ be a spanning tree of $G$ containing the edges $E(C') - e$. By applying $D_N$-similarity and block transpose to $K'$, we can produce a matrix that agrees with $K$ for all entries corresponding to edges in $E(T) \cup \{e\}$. We perform this procedure in three steps, first by making $K'$ agree with $K$ for edges in $E(T)$, then for the edge $e$, and then finally fixing any edges in $E(T)$ for which the matrices no longer agree.

First, we make $K'$ and $K$ agree for edges in $E(T)$. Suppose that $K'_{i,j} = -K_{i,j}$ for some $\{i,j\} \in E(T)$. Let $U \subset [N]$ be the set of vertices connected to $i$ in the forest $T - \{i,j\}$ (the removal of the edge $\{i,j\}$ from $T$), and let $D$ be the diagonal matrix with $D_{\ell,\ell} = +1$ if $\ell \in U$ and $D_{\ell,\ell} = -1$ if $\ell \notin U$. The matrix $DK'D$ satisfies
\[ \big[DK'D\big]_{i,j} = D_{i,i}\, K'_{i,j}\, D_{j,j} = -K'_{i,j} = K_{i,j}, \]
and $\big[DK'D\big]_{k,\ell} = K'_{k,\ell}$ for any $k, \ell$ either both in $U$ or neither in $U$ (and therefore for all edges $\{k,\ell\} \in E(T) - \{i,j\}$). Repeating this procedure for every edge $\{i,j\} \in E(T)$ for which $K'_{i,j} = -K_{i,j}$ results in a matrix that agrees with $K$ for all entries corresponding to edges in $E(T)$.

Next, we make our matrix and $K$ agree for the edge $e$. If our matrix already agrees for the edge $e$, then we are done with this part of the proof, and we denote the resulting matrix by $\widehat{K}$. If our resulting matrix does not agree, then, by taking the transpose of this matrix, we have a new matrix that now agrees with $K$ for all edges in $E(T) \cup \{e\}$, except for negative edges in $E(T)$. By repeating the $D_N$-similarity operation again on negative edges of $E(T)$, we again obtain a matrix that agrees with $K$ on the edges of $E(T)$, but now also agrees on the edge $e$, as there is an even number of negative edges in the path between the vertices of $e$ in the tree $T$. We now have a matrix that agrees with $K$ for all entries corresponding to edges in $E(T) \cup \{e\}$, and we denote this matrix by $\widehat{K}$.

Finally, we aim to show that agreement on the edges $E(T) \cup \{e\}$ already implies that $\widehat{K} = K$. Let $\{i,j\}$ be an arbitrary edge not in $E(T) \cup \{e\}$, and let $C$ be the simple cycle containing the edge $\{i,j\}$ and the unique $i - j$ path in the tree $T$. By Lemmas 8 and 9, the value of $s_K(C)$ can be computed for every cycle corresponding to an incidence vector in a minimal simple cycle basis of $\mathcal{C}_+(G)$ using only principal minors. Then $s_K(C)$ can be computed using principal minors and $s_K(C')$, as the incidence vector for $C'$ combined with a minimal positive simple cycle basis forms a basis for $\mathcal{C}(G)$. By assumption, $\Delta_S(K) = \Delta_S(\widehat{K})$ for all $S \subseteq [N]$, and, by construction, $s_K(C') = s_{\widehat{K}}(C')$. Therefore, $s_K(C) = s_{\widehat{K}}(C)$ for all $\{i,j\}$ not in $E(T) \cup \{e\}$, and so $\widehat{K} = K$.
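Both operations of Proposition 4 are cheap to verify numerically. The sketch below (a random matrix, with the whole index set treated as a single block, so the block transpose reduces to the ordinary transpose) confirms that every principal minor is preserved:

```python
import numpy as np
from itertools import combinations

def principal_minors(K):
    """All principal minors of K, keyed by index subset."""
    n = K.shape[0]
    return {S: np.linalg.det(K[np.ix_(S, S)])
            for r in range(1, n + 1) for S in combinations(range(n), r)}

rng = np.random.default_rng(1)
K = rng.normal(size=(4, 4))
ref = principal_minors(K)

# D_N-similarity: conjugation by an involutory (+-1) diagonal matrix.
D = np.diag([1.0, -1.0, -1.0, 1.0])
sim_ok = all(abs(ref[S] - v) < 1e-9
             for S, v in principal_minors(D @ K @ D).items())

# Block transpose, with [N] as the single block.
tr_ok = all(abs(ref[S] - v) < 1e-9
            for S, v in principal_minors(K.T).items())
```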
Theorem 5. Let $K \in \mathcal{K}$, with charged sparsity graph $G = ([N], E, \rho)$. Let $\ell_+ \geq 3$ be the simple cycle sparsity of $\mathcal{C}_+(G)$. Then any matrix $K' \in \mathcal{K}$ with $\Delta_S(K) = \Delta_S(K')$ for all $|S| \leq \ell_+$ also satisfies $\Delta_S(K) = \Delta_S(K')$ for all $S \subseteq [N]$, and there exists a matrix $\widehat{K} \in \mathcal{K}$ with $\Delta_S(\widehat{K}) = \Delta_S(K)$ for all $|S| < \ell_+$ but $\Delta_S(\widehat{K}) \neq \Delta_S(K)$ for some $S$.

Proof. The first part of the theorem follows almost immediately from Lemmas 8 and 9. It suffices to consider a matrix $K \in \mathcal{K}$ with a two-connected charged sparsity graph $G = ([N], E, \rho)$, as the more general case follows from this one. By Lemmas 8 and 9, the quantity $s(C)$ is computable for all simple cycles $C$ in a minimal cycle basis for $\mathcal{C}_+(G)$ (and therefore for all positive simple cycles), using only principal minors of size at most the length of the longest cycle in the basis, which in this case is the simple cycle sparsity $\ell_+$ of $\mathcal{C}_+(G)$. The values $s(C)$ for positive simple cycles, combined with the magnitudes of the entries of $K$ and the charge function $\rho$, uniquely determine all principal minors, as each term in the Laplace expansion of some $\Delta_S(K)$ corresponds to a partition of $S$ into a disjoint collection of vertices, pairs corresponding to edges, and oriented versions of positive simple cycles of $G[S]$. This completes the first portion of the proof.

Next, we explicitly construct a matrix that agrees with $K$ in the first $\ell_+ - 1$ principal minors ($\ell_+ \geq 3$), but disagrees on a principal minor of order $\ell_+$. To do so, we consider a minimal simple cycle basis $\chi_{C_1}, \dots, \chi_{C_{\nu-1}}$ of $\mathcal{C}_+(G)$, ordered so that $|C_1| \geq |C_2| \geq \dots \geq |C_{\nu-1}|$. By definition, $\ell_+ = |C_1|$. Let $\widehat{K}$ be a matrix satisfying
\[ \widehat{K}_{i,i} = K_{i,i} \;\text{ for } i \in [N], \qquad \widehat{K}_{i,j}\widehat{K}_{j,i} = K_{i,j}K_{j,i} \;\text{ for } i, j \in [N], \]
and
\[ s_{\widehat{K}}(C_i) = \begin{cases} s_K(C_i) & \text{if } i > 1, \\ -s_K(C_i) & \text{if } i = 1. \end{cases} \]
The matrices $\widehat{K}$ and $K$ agree on all principal minors of order at most $\ell_+ - 1$, for if there were a principal minor on which they disagreed, this would imply that there is a positive simple cycle of length less than $\ell_+$ whose incidence vector is not in the span of $\chi_{C_2}, \dots, \chi_{C_{\nu-1}}$, a contradiction. To complete the proof, it suffices to show that $\Delta_{V(C_1)}(\widehat{K}) \neq \Delta_{V(C_1)}(K)$.

To do so, we consider three different cases, depending on the length of $C_1$. If $C_1$ is a three-cycle, then $C_1$ is an induced cycle and the result follows immediately. If $C_1$ is a four-cycle, $G[V(C_1)]$ may possibly have multiple positive four-cycles. However, in this case the quantity $\psi$ from Equation (2.11) is distinct for $\widehat{K}$ and $K$, as $K \in \mathcal{K}$. Finally, we consider the general case when $|C_1| > 4$. By Lemma 8, all terms of $\Delta_{V(C_1)}(K)$ (and $\Delta_{V(C_1)}(\widehat{K})$) except for $\psi$ depend only on principal minors of order at most $\ell_+ - 1$. We denote the quantity $\psi$ from Lemma 8 for $K$ and $\widehat{K}$ by $\psi$ and $\widehat{\psi}$, respectively. By Lemma 9, the magnitudes of $\psi$ and $\widehat{\psi}$ depend only on principal minors of order at most four, and so $|\psi| = |\widehat{\psi}|$. In addition, because $K \in \mathcal{K}$, $\psi \neq 0$. The quantities $\psi$ and $\widehat{\psi}$ are equal in sign if and only if $s_{\widehat{K}}(C_1) = s_K(C_1)$, and therefore $\Delta_{V(C_1)}(\widehat{K}) \neq \Delta_{V(C_1)}(K)$.
2.3.2 Efficiently Recovering a Matrix from its Principal Minors

In Subsection 2.3.1, we characterized the set of magnitude-symmetric matrices in $\mathcal{K}$ that share a given set of principal minors, and noted that principal minors of order $\leq \ell_+$ uniquely determine principal minors of all orders, where $\ell_+$ is the simple cycle sparsity of $\mathcal{C}_+(G)$ and is computable using principal minors of order one and two. In this subsection, we make use of a number of the theoretical results of Subsection 2.3.1 to formally describe a polynomial-time algorithm that produces a magnitude-symmetric matrix $K$ with prescribed principal minors. We provide a high-level description and discussion of this algorithm below, and save the description of a key subroutine for computing a positive simple cycle basis until after the overall description and proof.
An Efficient Algorithm
Our formal algorithm proceeds by completing a number of high-level tasks, which we
describe below. This procedure is very similar in nature to the algorithm implicitly described
and used in the proofs of Proposition 4 and Theorem 5. The main difference between the
algorithmic procedure alluded to in Subsection 2.3.1 and this algorithm is the computation
of a positive simple cycle basis. Unlike for $\mathcal{C}(G)$, there is no known polynomial time algorithm to compute a minimal simple cycle basis for $\mathcal{C}_+(G)$, and a decision version of this problem may indeed be NP-hard. Our algorithm, which we denote by $\textsc{RecoverK}(\{\Delta_S\}_{S \subseteq [N]})$, proceeds in five main steps:
Step 1: Compute $|K_{i,j}|$, $i, j \in [N]$, and the charged sparsity graph $G = ([N], E, q)$.
We recall that $K_{i,i} = \Delta_i$ and $|K_{i,j}| = |K_{j,i}| = \sqrt{|\Delta_i \Delta_j - \Delta_{i,j}|}$. The edges $\{i, j\} \in E(G)$ correspond to non-zero off-diagonal entries $|K_{i,j}| \ne 0$, and the function $q$ is given by $q_{i,j} = \mathrm{sgn}(\Delta_i \Delta_j - \Delta_{i,j})$.
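The order-one and order-two computations in Step 1 amount to a few arithmetic operations per pair of indices. A minimal sketch, assuming exact minor queries; the matrix `K` in the test below is a hypothetical magnitude-symmetric example used only to generate those queries:

```python
import math

def principal_minor_2(K, i, j):
    # Delta_{i,j} = K_{i,i} K_{j,j} - K_{i,j} K_{j,i}
    return K[i][i] * K[j][j] - K[i][j] * K[j][i]

def step1(K):
    """Recover the diagonal, off-diagonal magnitudes, and edge charges
    from principal minors of order one and two only."""
    n = len(K)
    diag = [K[i][i] for i in range(n)]            # Delta_i = K_{i,i}
    mag = [[0.0] * n for _ in range(n)]
    charge = {}                                    # q_{i,j}, defined on edges
    for i in range(n):
        for j in range(i + 1, n):
            # Delta_i Delta_j - Delta_{i,j} = K_{i,j} K_{j,i}
            prod = diag[i] * diag[j] - principal_minor_2(K, i, j)
            if prod != 0:
                mag[i][j] = mag[j][i] = math.sqrt(abs(prod))
                charge[(i, j)] = 1 if prod > 0 else -1
    return diag, mag, charge
```

Only the zero/non-zero pattern of `prod` is needed to build the sparsity graph; its sign gives the charge function.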
Step 2: For every block $H$ of $G$, compute a simple cycle basis $\phi_1, \ldots, \phi_k$ of $\mathcal{C}_+(H)$.
In the proof of Proposition 3, we defined an efficient algorithm to compute a simple cycle basis of $\mathcal{C}_+(H)$ for any two-connected graph $H$. This algorithm makes use of the property that every two-connected graph has an open ear decomposition. Unfortunately, this algorithm has no provable guarantees on the length of the longest cycle in the basis. For this reason, we introduce an alternate efficient algorithm below that computes a simple cycle basis of $\mathcal{C}_+(H)$ consisting of cycles all of length at most $3 m_H$, where $m_H$ is the maximum length of a shortest cycle between any two edges in $H$, i.e.,
$$ m_H := \max_{e, e' \in E(H)} \;\; \min_{\substack{\text{simple cycle } C \\ \text{s.t. } e, e' \in E(C)}} |C|, $$
with $m_H := 2$ if $H$ is acyclic. The existence of a simple cycle through any two edges
of a 2-connected graph can be deduced from the existence of two vertex-disjoint paths between newly added midpoints of these two edges. The resulting simple cycle basis also maximizes the number of three- and four-cycles it contains. This is a key property which allows us to limit the number of principal minors that we query.
Step 3: For every block $H$ of $G$, convert $\phi_1, \ldots, \phi_k$ into a simple cycle basis satisfying Properties (i)-(iii) of Lemma 6.
If there were an efficient algorithm for computing a minimal simple cycle basis for $\mathcal{C}_+(H)$, by Lemma 6, we would be done (and there would be no need for the previous step). However, there is currently no known algorithm for this, and a decision version of this problem may be NP-hard. By using the simple cycle basis from Step 2 and iteratively removing chords, we can create a basis that satisfies the same three key properties (in Lemma 6) that a minimal simple cycle basis does. In addition, the lengths of the cycles in this basis are no larger than those of Step 2, i.e., no procedure in Step 3 ever increases the length of a cycle.
The procedure for this is quite intuitive. Given a cycle $C$ in the basis, we efficiently check whether $C$ satisfies Properties (i)-(iii) of Lemma 6. If all properties hold, we are done, and check another cycle in the basis, until all cycles satisfy the desired properties. In each case, if a given property does not hold, then, by the proof of Lemma 6, we can efficiently compute an alternate cycle $C'$, $|C'| < |C|$, that can replace $C$ in the simple cycle basis for $\mathcal{C}_+(H)$, decreasing the sum of cycle lengths in the basis by at least one. Because of this, the described procedure is a polynomial time algorithm.
Step 4: For every block $H$ of $G$, compute $s(C)$ for every cycle $C$ in the basis.
This calculation relies heavily on the results of Subsection 2.3.1. The simple cycle basis for $\mathcal{C}_+(H)$ satisfies Properties (i)-(iii) of Lemma 6, and so we may apply Lemmas 8 and 9. We compute the quantities $s(C)$ iteratively based on cycle length, beginning with the shortest cycle in the basis and finishing with the longest. By the analysis at the beginning of Subsection 2.3.1, we recall that when $C$ is a three- or four-cycle, $s(C)$ can be computed efficiently using $O(1)$ principal minors, all corresponding to subsets of $V(C)$. When $|C| > 4$, by Lemma 8, the quantity $T$ (defined in Lemma 8) can be computed efficiently using $O(1)$ principal minors, all corresponding to subsets of $V(C)$. By Lemma 9, the quantity $s(C)$ can be computed efficiently using $T$, $s(C')$ for positive four-cycles $C' \subseteq G[V(C)]$, and $O(1)$ principal minors all corresponding to subsets of $V(C)$. Because our basis maximizes the number of three- and four-cycles it contains, any such four-cycle $C'$ is either in the basis (and so $s(C')$ has already been computed) or is a linear combination of three- and four-cycles in the basis, in which case $s(C')$ can be computed using Gaussian elimination, without any additional querying of principal minors.
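The Gaussian elimination in Step 4 is over GF(2) on edge-incidence vectors. A sketch (function names are ours, not the thesis's) of expressing a cycle as a sum of basis cycles; multiplying the signs $s(C_i)$ over the returned index set then yields $s(C')$ without further minor queries:

```python
def express_in_basis(basis, target):
    """Write `target` (an edge-incidence vector over GF(2)) as a sum of the
    vectors in `basis`; return the set of basis indices used, or None if
    `target` is not in their span."""
    reduced = {}  # pivot position -> (echelon vector, set of original indices)
    for idx, vec in enumerate(basis):
        v, rep = list(vec), {idx}
        for p in sorted(reduced):          # forward-eliminate against pivots
            if v[p]:
                w, wrep = reduced[p]
                v = [(a + b) % 2 for a, b in zip(v, w)]
                rep = rep ^ wrep
        lead = next((p for p, bit in enumerate(v) if bit), None)
        if lead is not None:
            reduced[lead] = (v, rep)
    v, rep = list(target), set()
    for p in sorted(reduced):              # reduce the target the same way
        if v[p]:
            w, wrep = reduced[p]
            v = [(a + b) % 2 for a, b in zip(v, w)]
            rep = rep ^ wrep
    return rep if not any(v) else None
```

For example, with two triangles sharing an edge, the surrounding four-cycle is their GF(2) sum, so its sign is the product of the two triangle signs.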
Step 5: Output a matrix $K$ satisfying $\Delta_S(K) = \Delta_S$ for all $S \subseteq [N]$.
The procedure for producing this matrix is quite similar to the proof of Proposition 4. It suffices to fix the signs of the upper triangular entries of $K$, as the lower triangular entries can be computed using $q$. For each block $H$, we find a negative simple cycle $C$ (if one exists), fix a negative edge $e \in E(C)$, and extend $E(C) \setminus e$ to a spanning tree $T$ of $H$, i.e., $[E(C) \setminus e] \subseteq E(T)$. We give the entries $K_{i,j}$, $i < j$, $\{i, j\} \in E(T) \cup e$ an arbitrary sign, and note our choice of $s_K(C)$. We extend the simple cycle basis for $\mathcal{C}_+(H)$ to a basis for $\mathcal{C}(H)$ by adding $C$. On the other hand, if no negative cycle exists, then we simply fix an arbitrary spanning tree $T$, and give the entries $K_{i,j}$, $i < j$, $\{i, j\} \in E(T)$ an arbitrary sign. In both cases, we have a basis for $\mathcal{C}(H)$ consisting of cycles $C_i$ for which we have computed $s(C_i)$. For each edge $\{i, j\} \in E(H)$ corresponding to an entry $K_{i,j}$ whose sign we have not yet fixed, we consider the cycle $C'$ consisting of the edge $\{i, j\}$ and the unique $i$-$j$ path in $T$. Using Gaussian elimination, we can write $C'$ as a sum of a subset of the cycles in our basis. As noted in Subsection 2.3.1, the quantity $s(C')$ is simply the product of the quantities $s(C_i)$ for cycles $C_i$ in this sum.
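The sign fixing in Step 5 can be pictured as propagation along the spanning tree: once tree entries carry arbitrary signs and each fundamental cycle $C'$ has a known sign $s(C')$, the sign of each non-tree entry is forced. A hedged sketch with hypothetical interfaces:

```python
from collections import deque

def fix_signs(tree_sign, tree_adj, cycle_sign, nontree_edges):
    """Force the sign of each non-tree entry K_{i,j}: the product of edge
    signs around the fundamental cycle of {i,j} must equal s(C')."""
    def tree_path_sign(i, j):
        parent = {i: None}
        queue = deque([i])
        while queue:                      # BFS for the unique tree path i -> j
            u = queue.popleft()
            if u == j:
                break
            for w in tree_adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        sign, u = 1, j
        while parent[u] is not None:      # multiply edge signs along the path
            sign *= tree_sign[frozenset((u, parent[u]))]
            u = parent[u]
        return sign
    # sign(K_{i,j}) * path_sign = s(C')  =>  sign(K_{i,j}) = s(C') * path_sign
    return {e: cycle_sign[e] * tree_path_sign(*e) for e in nontree_edges}
```

On a triangle with tree edges $\{0,1\}$ (sign $+1$) and $\{1,2\}$ (sign $-1$), a fundamental cycle of sign $+1$ forces the entry on edge $\{0,2\}$ to be negative.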
A few comments are in order. Conditional on the analysis of Step 2, the above algorithm runs in time polynomial in $N$. Of course, the set $\{\Delta_S\}_{S \subseteq [N]}$ is not polynomial in $N$, but rather than take the entire set as input, we assume the existence of some querying operation, in which the value of any principal minor can be queried in polynomial time. Combining the analysis of each step, we can give the following guarantee for the $\textsc{RecoverK}(\{\Delta_S\}_{S \subseteq [N]})$ algorithm.
Theorem 6. Let $\{\Delta_S\}_{S \subseteq [N]}$ be the principal minors of some matrix in $\mathcal{K}$. The algorithm $\textsc{RecoverK}(\{\Delta_S\}_{S \subseteq [N]})$ computes a matrix $K \in \mathcal{K}$ satisfying $\Delta_S(K) = \Delta_S$ for all $S \subseteq [N]$. This algorithm runs in time polynomial in $N$, and queries at most $O(N^2)$ principal minors, all of order at most $3 m_G$, where $G$ is the sparsity graph of $\{\Delta_S\}_{|S| \le 2}$ and $m_G$ is the maximum of $m_H$ over all blocks $H$ of $G$. In addition, there exists a matrix $K' \in \mathcal{K}$, $|K'_{i,j}| = |K_{i,j}|$, $i, j \in [N]$, such that any algorithm that computes a matrix with principal minors $\{\Delta_S(K')\}_{S \subseteq [N]}$ must query a principal minor of order at least $m_G$.
Proof. Conditional on the existence of the algorithm described in Step 2, i.e., an efficient algorithm to compute a simple cycle basis for $\mathcal{C}_+(H)$ consisting of cycles of length at most $3 m_H$ that maximizes the number of three- and four-cycles it contains, the above analysis already shows that the $\textsc{RecoverK}(\{\Delta_S\}_{S \subseteq [N]})$ algorithm runs in time polynomial in $N$ and queries at most $O(N^2)$ principal minors.
To construct $K'$, we first consider the uncharged sparsity graph $G = ([N], E)$ of $K$, and let $e, e' \in E(G)$ be a pair of edges for which the quantity $m_G$ is achieved (if $m_G = 2$, we are done). We aim to define an alternate charge function for $G$, show that the simple cycle sparsity of $\mathcal{C}_+(G)$ is at least $m_G$, find a matrix $K'$ that has this charged sparsity graph, and then make use of Theorem 5. Consider the charge function $q$ satisfying $q(e) = q(e') = -1$, and equal to $+1$ otherwise. Any simple cycle basis for the block containing $e, e'$ must have a simple cycle containing both edges $e, e'$. By definition, the length of this cycle is at least $m_G$, and so the simple cycle sparsity of $\mathcal{C}_+(G)$ is at least $m_G$. Next, let $K'$ be an arbitrary matrix with $|K'_{i,j}| = |K_{i,j}|$, $i, j \in [N]$, and charged sparsity graph $G = ([N], E, q)$, with $q$ as defined above. We have $K \in \mathcal{K}$, and so $K' \in \mathcal{K}$, and therefore, by Theorem 5, any algorithm that computes a matrix with principal minors $\{\Delta_S(K')\}_{S \subseteq [N]}$ must query a principal minor of order at least $\ell_+ \ge m_G$.
Computing a Simple Cycle Basis with Provable Guarantees
Next, we describe an efficient algorithm to compute a simple cycle basis for $\mathcal{C}_+(H)$ consisting of cycles of length at most $3 m_H$. Suppose we have a two-connected graph $H = ([N], E, q)$. We first compute a minimal cycle basis $\phi_{C_1}, \ldots, \phi_{C_\nu}$ of $\mathcal{C}(H)$, where each $C_i$ is an induced simple cycle (as argued previously, any lexicographically minimal basis for $\mathcal{C}(H)$ consists only of induced simple cycles). Without loss of generality, suppose that $C_1$ is the shortest negative cycle in the basis (if no negative cycle exists, we are done). The set of incidence vectors $\phi_{C_1} + \phi_{C_i}$, $q(C_i) = -1$, combined with the vectors $\phi_{C_i}$, $q(C_i) = +1$, forms a basis for $\mathcal{C}_+(H)$, which we denote by $\mathcal{B}$. We will build a simple cycle basis for $\mathcal{C}_+(H)$ by iteratively choosing incidence vectors of the form $\phi_{C_1} + \phi_{C_i}$, $q(C_i) = -1$, replacing each with an incidence vector $\phi_{\hat{C}_i}$ corresponding to a positive simple cycle $\hat{C}_i$, and iteratively updating $\mathcal{B}$.
Let $C := C_1 + C_i$ be a positive cycle in our basis $\mathcal{B}$. If $C$ is a simple cycle, we are done; otherwise $C$ is the union of edge-disjoint simple cycles $F_1, \ldots, F_k$ for some $k > 1$. If one of these simple cycles is positive and satisfies
$$ \phi_{F_j} \notin \mathrm{span}\big(\mathcal{B} \setminus \{\phi_{C_1} + \phi_{C_i}\}\big), $$
we are done. Otherwise there is a negative simple cycle, without loss of generality given by $F_1$. We can construct a set of $k - 1$ positive cycles by adding $F_1$ to each remaining negative simple cycle (and keeping each positive $F_j$ as is), and at least one of these cycles is not in $\mathrm{span}\big(\mathcal{B} \setminus \{\phi_{C_1} + \phi_{C_i}\}\big)$. We now have a positive cycle not in this span, given by the sum of two edge-disjoint negative simple cycles $F_1$ and $F_j$. These two cycles satisfy, by construction, $|F_1| + |F_j| \le |C| \le 2\ell$, where $\ell$ is the cycle sparsity of $\mathcal{C}(H)$. If $|V(F_1) \cap V(F_j)| > 1$, then $F_1 \cup F_j$ is two-connected, and we may compute a simple cycle basis for $\mathcal{C}_+(F_1 \cup F_j)$ using the ear decomposition approach of Proposition 3. At least one positive simple cycle in this basis satisfies our desired condition, and the length of this positive simple cycle is at most $|E(F_1 \cup F_j)| \le 2\ell$. If $F_1 \cup F_j$ is not two-connected, then we compute a shortest cycle $\hat{C}$ in $H$ containing both an edge in $E(F_1)$ and an edge in $E(F_j)$ (computed using Suurballe's algorithm; see [109] for details). The graph
$$ H' = \big( V(F_1) \cup V(F_j) \cup V(\hat{C}),\; E(F_1) \cup E(F_j) \cup E(\hat{C}) \big) $$
is two-connected, and we can repeat the same procedure as above for $H'$, where, in this case, the resulting cycle is of length at most $|E(H')| \le 2\ell + m_H$. The additional guarantee of maximizing the number of three- and four-cycles can be obtained easily by simply listing all positive cycles of length three and four, combining them with the computed basis, and performing a greedy algorithm. What remains is to note that $\ell \le m_H$. This follows quickly from the observation that for any vertex $u$ in any simple cycle $C$ of a minimal cycle basis for $\mathcal{C}(H)$, there exists an edge $\{v, w\} \in E(C)$ such that $C$ is the disjoint union of a shortest $u$-$v$ path, a shortest $u$-$w$ path, and the edge $\{v, w\}$ [51, Theorem 3].
2.3.3 An Algorithm for Principal Minors with Noise
So far, we have exclusively considered the situation in which principal minors are known (or can be queried) exactly. In applications related to this work, this is often not the case. The key application of a large part of this work is signed determinantal point processes, and the algorithmic question of learning the kernel of a signed DPP from some set of samples. Here, for the sake of readability, we focus on the non-probabilistic setting in which, given some matrix $K$ with spectral radius $\rho(K) \le 1$, each principal minor $\Delta_S$ can be queried/estimated up to some absolute error term $0 < \delta < 1$; i.e., we are given a set $\{\hat{\Delta}_S\}_{S \subseteq [N]}$ satisfying
$$ \big| \Delta_S - \hat{\Delta}_S \big| < \delta \quad \text{for all } S \subseteq [N], $$
where $\{\Delta_S\}_{S \subseteq [N]}$ are the principal minors of some magnitude-symmetric matrix $K$, and asked to compute a matrix $K'$ minimizing
$$ \rho(K, K') := \min_{\hat{K} \,:\, \Delta_S(\hat{K}) = \Delta_S(K),\, S \subseteq [N]} \; |\hat{K} - K'|_\infty, $$
where $|K - K'|_\infty = \max_{i,j} |K_{i,j} - K'_{i,j}|$.
In this setting, we require both a separation condition for the entries of $K$ and a stronger version of Condition (2.6). Let $\mathcal{K}_N(\alpha, \beta)$ be the set of matrices $K \in \mathbb{R}^{N \times N}$, $\rho(K) \le 1$, satisfying: $|K_{i,j}| = |K_{j,i}|$ for $i, j \in [N]$; $|K_{i,j}| > \alpha$ or $|K_{i,j}| = 0$ for all $i, j \in [N]$; and
$$ \text{if } K_{i,j} K_{j,k} K_{k,\ell} K_{\ell,i} \ne 0 \text{ for } i, j, k, \ell \in [N] \text{ distinct}, \quad (2.12) $$
then $\big| |K_{i,j} K_{k,\ell}| - |K_{i,k} K_{j,\ell}| \big| > \beta$, and the sum
$$ \big| \sigma_1 K_{i,j} K_{j,k} K_{k,\ell} K_{\ell,i} + \sigma_2 K_{i,j} K_{j,\ell} K_{\ell,k} K_{k,i} + \sigma_3 K_{i,k} K_{k,j} K_{j,\ell} K_{\ell,i} \big| > \beta \quad \text{for all } \sigma \in \{-1, 1\}^3. $$
The algorithm $\textsc{RecoverK}(\{\Delta_S\}_{S \subseteq [N]})$, slightly modified, performs well for this more general problem. Here, we briefly describe a variant of the above algorithm that performs nearly optimally on any $K \in \mathcal{K}_N(\alpha, \beta)$, $\alpha, \beta > 0$, for $\delta$ sufficiently small, and quantify the relationship between $\alpha$ and $\beta$. The algorithm to compute a $K'$ sufficiently close to $K$, which we denote by $\textsc{RecoverNoisyK}(\{\hat{\Delta}_S\}_{S \subseteq [N]}; \delta)$, proceeds in three main steps:
Step 1: Define $|K'_{i,j}|$, $i, j \in [N]$, and the charged sparsity graph $G = ([N], E, q)$.
We define
$$ K'_{i,i} = \begin{cases} \hat{\Delta}_i & \text{if } |\hat{\Delta}_i| > \delta \\ 0 & \text{otherwise,} \end{cases} $$
and
$$ K'_{i,j} K'_{j,i} = \begin{cases} \hat{\Delta}_i \hat{\Delta}_j - \hat{\Delta}_{i,j} & \text{if } |\hat{\Delta}_i \hat{\Delta}_j - \hat{\Delta}_{i,j}| > 3\delta + \delta^2 \\ 0 & \text{otherwise.} \end{cases} $$
The edges $\{i, j\} \in E(G)$ correspond to non-zero off-diagonal entries $|K'_{i,j}| \ne 0$, and the function $q$ is given by $q_{i,j} = \mathrm{sgn}(K'_{i,j} K'_{j,i})$.
Step 2: For every block $H$ of $G$, compute a simple cycle basis $\phi_1, \ldots, \phi_k$ of $\mathcal{C}_+(H)$ satisfying Properties (i)-(iii) of Lemma 6, and define $s_{K'}(C)$ for every cycle $C$ in the basis.
The computation of the simple cycle basis depends only on the graph $G$, and so is identical to Steps 1 and 2 of the $\textsc{RecoverK}$ algorithm. The simple cycle basis for $\mathcal{C}_+(H)$ satisfies Properties (i)-(iii) of Lemma 6, and so we can apply Lemmas 8 and 9. We define the quantities $s_{K'}(C)$ iteratively based on cycle length, beginning with the shortest cycle. We begin by detailing the case of $|C| = 3$, then $|C| = 4$, and then finally $|C| > 4$.
Case I: $|C| = 3$.
Suppose that the three-cycle $C = i \, j \, k \, i$, $i < j < k$, is in our basis. If $G$ were the charged sparsity graph of $K$, then the equation
$$ K_{i,j} K_{j,k} K_{k,i} = \Delta_i \Delta_j \Delta_k - \tfrac{1}{2} \big[ \Delta_i \Delta_{j,k} + \Delta_j \Delta_{i,k} + \Delta_k \Delta_{i,j} \big] + \tfrac{1}{2} \Delta_{i,j,k} $$
would hold. For this reason, we define $s_{K'}(C)$ as
$$ s_{K'}(C) = q_{i,j} \, \mathrm{sgn}\Big( 2 \hat{\Delta}_i \hat{\Delta}_j \hat{\Delta}_k - \big[ \hat{\Delta}_i \hat{\Delta}_{j,k} + \hat{\Delta}_j \hat{\Delta}_{i,k} + \hat{\Delta}_k \hat{\Delta}_{i,j} \big] + \hat{\Delta}_{i,j,k} \Big). $$
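The three-cycle identity above can be checked exactly on a small example. A sketch using rational arithmetic; the particular `K` is hypothetical, chosen magnitude-symmetric with a positive triangle:

```python
from fractions import Fraction as F

def det(M):
    # Laplace expansion along the first row; fine for these tiny matrices.
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def minor(K, S):
    return det([[K[i][j] for j in S] for i in S])

K = [[F(1), F(2), F(-1)],
     [F(-2), F(1), F(3)],
     [F(1), F(3), F(2)]]
i, j, k = 0, 1, 2
D = lambda *S: minor(K, list(S))          # D(i, j) = Delta_{i,j}, etc.
lhs = K[i][j] * K[j][k] * K[k][i]
rhs = (D(i) * D(j) * D(k)
       - F(1, 2) * (D(i) * D(j, k) + D(j) * D(i, k) + D(k) * D(i, j))
       + F(1, 2) * D(i, j, k))
assert lhs == rhs == 6
```

Since every quantity on the right is a principal minor, the sign of the triangle product is recoverable from minor queries alone, which is exactly what the noisy estimate mimics.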
Case II: $|C| = 4$.
Suppose that the four-cycle $C = i \, j \, k \, \ell \, i$, with (without loss of generality) $i < j < k < \ell$, is in our basis. We note that
$$ \begin{aligned} \Delta_{i,j,k,\ell} = {} & \Delta_{i,j} \Delta_{k,\ell} + \Delta_{i,k} \Delta_{j,\ell} + \Delta_{i,\ell} \Delta_{j,k} + \Delta_i \Delta_{j,k,\ell} + \Delta_j \Delta_{i,k,\ell} + \Delta_k \Delta_{i,j,\ell} \\ & + \Delta_\ell \Delta_{i,j,k} - 2 \Delta_i \Delta_j \Delta_{k,\ell} - 2 \Delta_i \Delta_k \Delta_{j,\ell} - 2 \Delta_i \Delta_\ell \Delta_{j,k} - 2 \Delta_j \Delta_k \Delta_{i,\ell} \\ & - 2 \Delta_j \Delta_\ell \Delta_{i,k} - 2 \Delta_k \Delta_\ell \Delta_{i,j} + 6 \Delta_i \Delta_j \Delta_k \Delta_\ell + \psi, \end{aligned} $$
where $\psi$ is the sum of the terms in the Laplace expansion of $\Delta_{i,j,k,\ell}$ corresponding to a four-cycle of $G[\{i,j,k,\ell\}]$, and define the quantity $\hat{\psi}$ analogously:
$$ \begin{aligned} \hat{\psi} = {} & \hat{\Delta}_{i,j,k,\ell} - \hat{\Delta}_{i,j} \hat{\Delta}_{k,\ell} - \hat{\Delta}_{i,k} \hat{\Delta}_{j,\ell} - \hat{\Delta}_{i,\ell} \hat{\Delta}_{j,k} - \hat{\Delta}_i \hat{\Delta}_{j,k,\ell} - \hat{\Delta}_j \hat{\Delta}_{i,k,\ell} \\ & - \hat{\Delta}_k \hat{\Delta}_{i,j,\ell} - \hat{\Delta}_\ell \hat{\Delta}_{i,j,k} + 2 \hat{\Delta}_i \hat{\Delta}_j \hat{\Delta}_{k,\ell} + 2 \hat{\Delta}_i \hat{\Delta}_k \hat{\Delta}_{j,\ell} + 2 \hat{\Delta}_i \hat{\Delta}_\ell \hat{\Delta}_{j,k} \\ & + 2 \hat{\Delta}_j \hat{\Delta}_k \hat{\Delta}_{i,\ell} + 2 \hat{\Delta}_j \hat{\Delta}_\ell \hat{\Delta}_{i,k} + 2 \hat{\Delta}_k \hat{\Delta}_\ell \hat{\Delta}_{i,j} - 6 \hat{\Delta}_i \hat{\Delta}_j \hat{\Delta}_k \hat{\Delta}_\ell. \end{aligned} $$
We consider two cases, depending on the number of positive four-cycles in $G[\{i,j,k,\ell\}]$. First, suppose that $C$ is the only positive four-cycle in $G[\{i,j,k,\ell\}]$. If $G$ were the charged sparsity graph of $K$, then $\psi$ would equal $-2 K_{i,j} K_{j,k} K_{k,\ell} K_{\ell,i}$. For this reason, we define $s_{K'}(C)$ as
$$ s_{K'}(C) = q_{i,\ell} \, \mathrm{sgn}\big( {-\hat{\psi}} \big) $$
in this case. Next, we consider the second case, in which there is more than one positive four-cycle in $G[\{i,j,k,\ell\}]$, which actually implies that all three four-cycles in $G[\{i,j,k,\ell\}]$ are positive. If $G$ were the charged sparsity graph of $K$, then $\psi$ would be given by
$$ \psi = -2 \big( K_{i,j} K_{j,k} K_{k,\ell} K_{\ell,i} + K_{i,j} K_{j,\ell} K_{\ell,k} K_{k,i} + K_{i,k} K_{k,j} K_{j,\ell} K_{\ell,i} \big). $$
For this reason, we compute an assignment of values $\sigma_1, \sigma_2, \sigma_3 \in \{-1, +1\}$ that minimizes (not necessarily uniquely) the quantity
$$ \Big| \hat{\psi}/2 + \sigma_1 \big| K'_{i,j} K'_{j,k} K'_{k,\ell} K'_{\ell,i} \big| + \sigma_2 \big| K'_{i,j} K'_{j,\ell} K'_{\ell,k} K'_{k,i} \big| + \sigma_3 \big| K'_{i,k} K'_{k,j} K'_{j,\ell} K'_{\ell,i} \big| \Big|, $$
and define $s_{K'}(C) = q_{i,\ell} \, \sigma_1$ in this case.
Case III: $|C| > 4$.
Suppose that the cycle $C = v_1 \ldots v_k v_1$, $k > 4$, is in our basis, with vertices ordered to match the ordering of Lemma 8, and define $V = \{v_1, \ldots, v_k\}$. Based on the equation in Lemma 8, we define $\hat{T}$ to equal
$$ \begin{aligned} \hat{T} = {} & \hat{\Delta}_V - \hat{\Delta}_{v_1} \hat{\Delta}_{V \setminus v_1} - \hat{\Delta}_{v_k} \hat{\Delta}_{V \setminus v_k} - \big[ \hat{\Delta}_{v_1, v_k} - 2 \hat{\Delta}_{v_1} \hat{\Delta}_{v_k} \big] \hat{\Delta}_{V \setminus \{v_1, v_k\}} \\ & + 2 K'_{v_1, v_{k-1}} K'_{v_{k-1}, v_k} K'_{v_k, v_2} K'_{v_2, v_1} \hat{\Delta}_{V \setminus \{v_1, v_2, v_{k-1}, v_k\}} \\ & + \big[ \hat{\Delta}_{v_1, v_2} - \hat{\Delta}_{v_1} \hat{\Delta}_{v_2} \big] \big[ \hat{\Delta}_{v_{k-1}, v_k} - \hat{\Delta}_{v_{k-1}} \hat{\Delta}_{v_k} \big] \hat{\Delta}_{V \setminus \{v_1, v_2, v_{k-1}, v_k\}} \\ & - \big[ \hat{\Delta}_{v_1, v_{k-1}} - \hat{\Delta}_{v_1} \hat{\Delta}_{v_{k-1}} \big] \big[ \hat{\Delta}_{v_2, v_k} - \hat{\Delta}_{v_2} \hat{\Delta}_{v_k} \big] \hat{\Delta}_{V \setminus \{v_1, v_2, v_{k-1}, v_k\}} \\ & - \big[ \hat{\Delta}_{v_1, v_2} - \hat{\Delta}_{v_1} \hat{\Delta}_{v_2} \big] \big[ \hat{\Delta}_{V \setminus \{v_1, v_2\}} - \hat{\Delta}_{v_k} \hat{\Delta}_{V \setminus \{v_1, v_2, v_k\}} \big] \\ & - \big[ \hat{\Delta}_{v_{k-1}, v_k} - \hat{\Delta}_{v_{k-1}} \hat{\Delta}_{v_k} \big] \big[ \hat{\Delta}_{V \setminus \{v_{k-1}, v_k\}} - \hat{\Delta}_{v_1} \hat{\Delta}_{V \setminus \{v_1, v_{k-1}, v_k\}} \big], \end{aligned} $$
and note that the quantity
$$ \mathrm{sgn}\big( K'_{v_1, v_{k-1}} K'_{v_{k-1}, v_k} K'_{v_k, v_2} K'_{v_2, v_1} \big) $$
is, by construction, computable using the signs of three- and four-cycles in our basis.
Based on the equation in Lemma 9, we set
$$ \mathrm{sgn}\Bigg( 2 (-1)^{k+1} K'_{v_k, v_1} \prod_{i=1}^{k-1} K'_{v_i, v_{i+1}} \prod_{(i,j) \in S} \bigg[ 1 - q_{v_i, v_{i+1}} \frac{K'_{v_{j-1}, v_j} \, K'_{v_{j+1}, v_j}}{K'_{v_j, v_{j+1}} \, K'_{v_{j-1}, v_j}} \bigg] \Bigg) = \mathrm{sgn}(\hat{T}), $$
with the set $S$ defined as in Lemma 9. From this equation, we can compute $s_{K'}(C)$, as the quantity
$$ \mathrm{sgn} \bigg[ 1 - q_{v_i, v_{i+1}} \frac{K'_{v_{j-1}, v_j} \, K'_{v_{j+1}, v_j}}{K'_{v_j, v_{j+1}} \, K'_{v_{j-1}, v_j}} \bigg] $$
is computable using the signs of three- and four-cycles in our basis (see Subsection 2.3.1 for details). In the case of $|C| > 4$, we note that $s_{K'}(C)$ was defined using $O(1)$ principal minors, all corresponding to subsets of $V(C)$, and previously computed information regarding three- and four-cycles (which requires no additional querying).
Step 3: Define $K'$.
We have already defined $K'_{i,i}$, $i \in [N]$, $|K'_{i,j}|$, $i, j \in [N]$, and $s_{K'}(C)$ for every cycle in a simple cycle basis. The procedure for producing this matrix is identical to Step 5 of the $\textsc{RecoverK}$ algorithm.
The $\textsc{RecoverNoisyK}(\{\hat{\Delta}_S\}_{S \subseteq [N]}, \delta)$ algorithm relies quite heavily on $\delta$ being small enough so that the sparsity graph of $K$ can be recovered. In fact, we can quantify the size of $\delta$ required so that the complete signed structure of $K$ can be recovered, and only uncertainty in the magnitude of entries remains. In the following theorem, we show that, in this regime, the $\textsc{RecoverNoisyK}$ algorithm is nearly optimal.
Theorem 7. Let $K \in \mathcal{K}_N(\alpha, \beta)$ and $\{\hat{\Delta}_S\}_{S \subseteq [N]}$ satisfy $|\hat{\Delta}_S - \Delta_S(K)| < \delta < 1$ for all $S \subseteq [N]$, for some
$$ \delta < \min\big\{ \alpha^{3 m_G}, \; \beta^{3 m_G}/2, \; \alpha^2/400 \big\}, $$
where $G$ is the sparsity graph of $\{\hat{\Delta}_S\}_{|S| \le 2}$ and $m_G$ is the maximum of $m_H$ over all blocks $H$ of $G$. The $\textsc{RecoverNoisyK}(\{\hat{\Delta}_S\}_{S \subseteq [N]}, \delta)$ algorithm outputs a matrix $K'$ satisfying
$$ \rho(K, K') < 2\delta/\alpha. $$
This algorithm runs in time polynomial in $N$, and queries at most $O(N^2)$ principal minors, all of order at most $3 m_G$.
Proof. This proof consists primarily of showing that, for $\delta$ sufficiently small, the charged sparsity graph $G = ([N], E, q)$ and the cycle signs $s(C)$ of $K'$ and $K$ agree, and quantifying the size of $\delta$ required to achieve this. If this holds, then the quantity $\rho(K, K')$ is simply given by $\max_{i,j} \big| |K_{i,j}| - |K'_{i,j}| \big|$. We note that, because $\rho(K) \le 1$ and $\delta < 1$,
$$ \big| \hat{\Delta}_{S_1} \cdots \hat{\Delta}_{S_k} - \Delta_{S_1} \cdots \Delta_{S_k} \big| < (1 + \delta)^k - 1 < (2^k - 1) \, \delta \quad (2.13) $$
for any $S_1, \ldots, S_k \subseteq [N]$, $k \in \mathbb{N}$. This is the key error estimate we will use throughout the proof.
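In scalar form, estimate (2.13) rests on the bound $(1+\delta)^k - 1 \le (2^k - 1)\delta$ for $\delta \in (0, 1]$, with equality only at $k = 1$. A quick numeric spot-check:

```python
# (1+d)^k = sum_i C(k,i) d^i <= 1 + d * sum_{i>=1} C(k,i) = 1 + (2^k - 1) d
# whenever 0 < d <= 1, since d^i <= d for every i >= 1.
for k in range(1, 12):
    for d in (0.001, 0.01, 0.1, 0.5, 0.9, 1.0):
        assert (1 + d) ** k - 1 <= (2 ** k - 1) * d + 1e-12
```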
We first consider error estimates for the magnitudes of the entries of $K$ and $K'$. We have $|K_{i,i} - \hat{\Delta}_i| < \delta$ and either $|K_{i,i}| > \alpha$ or $K_{i,i} = 0$. If $K_{i,i} = 0$, then $|\hat{\Delta}_i| < \delta$, and $K'_{i,i} = K_{i,i}$. If $|K_{i,i}| > \alpha$, then $|\hat{\Delta}_i| > \delta$, as $\alpha > 2\delta$, and $K'_{i,i} = \hat{\Delta}_i$. Combining these two cases, we have
$$ |K_{i,i} - K'_{i,i}| < \delta \quad \text{for all } i \in [N]. $$
Next, we consider off-diagonal entries $K'_{i,j}$. By Equation (2.13),
$$ \big| (\hat{\Delta}_i \hat{\Delta}_j - \hat{\Delta}_{i,j}) - K_{i,j} K_{j,i} \big| \le \big[ (1 + \delta)^2 - 1 \big] + \delta < 4\delta. $$
Therefore, if $K_{i,j} = 0$, then $K'_{i,j} = 0$. If $|K_{i,j}| > \alpha$, then $|K'_{i,j}| = \sqrt{|\hat{\Delta}_i \hat{\Delta}_j - \hat{\Delta}_{i,j}|}$, as $\alpha^2 > 8\delta$. Combining these two cases, we have
$$ |K_{i,j} K_{j,i} - K'_{i,j} K'_{j,i}| < 4\delta \quad \text{for all } i, j \in [N], $$
and therefore
$$ \big| |K_{i,j}| - |K'_{i,j}| \big| < 2\delta/\alpha \quad \text{for all } i, j \in [N], \; i \ne j. \quad (2.14) $$
Our above analysis implies that $K$ and $K'$ share the same charged sparsity graph $G = ([N], E, q)$. What remains is to show that the cycle signs $s(C)$ of $K$ and $K'$ agree for the positive simple cycle basis of our algorithm. Similar to the structure in Step 2 of the description of the $\textsc{RecoverNoisyK}$ algorithm, we will treat the cases of $|C| = 3$, $|C| = 4$, and $|C| > 4$ separately, and rely heavily on the associated notation defined above.
Case I: $|C| = 3$.
We have $|K_{i,j} K_{j,k} K_{k,i}| > \alpha^3$, and, by Equation (2.13), the difference between $K_{i,j} K_{j,k} K_{k,i}$ and its estimated value
$$ \hat{\Delta}_i \hat{\Delta}_j \hat{\Delta}_k - \tfrac{1}{2} \big[ \hat{\Delta}_i \hat{\Delta}_{j,k} + \hat{\Delta}_j \hat{\Delta}_{i,k} + \hat{\Delta}_k \hat{\Delta}_{i,j} \big] + \tfrac{1}{2} \hat{\Delta}_{i,j,k} $$
is at most
$$ \big[ (1+\delta)^3 - 1 \big] + \tfrac{3}{2} \big[ (1+\delta)^2 - 1 \big] + \tfrac{1}{2} \big[ (1+\delta) - 1 \big] < 12\delta < \alpha^3. $$
Therefore, $s_K(C)$ equals $s_{K'}(C)$.
Case II: $|C| = 4$.
By repeated application of Equation (2.13),
$$ |\psi - \hat{\psi}| < 6 \big[ (1+\delta)^4 - 1 \big] + 12 \big[ (1+\delta)^3 - 1 \big] + 7 \big[ (1+\delta)^2 - 1 \big] + \big[ (1+\delta) - 1 \big] < 196 \, \delta. $$
If $C$ is the only positive four-cycle of $G[\{i,j,k,\ell\}]$, then $s_K(C)$ equals $s_{K'}(C)$, as $2\beta > 196\delta$. If $G[\{i,j,k,\ell\}]$ has three positive four-cycles, then
$$ \psi = -2 \big( K_{i,j} K_{j,k} K_{k,\ell} K_{\ell,i} + K_{i,j} K_{j,\ell} K_{\ell,k} K_{k,i} + K_{i,k} K_{k,j} K_{j,\ell} K_{\ell,i} \big), $$
and, by Condition (2.12), any two signings of the three terms of the above equation for $\psi$ are at distance at least $4\beta$ from each other. As $2\beta > 196\delta$, $s_K(C)$ equals $s_{K'}(C)$ in this case as well.
Case III: $|C| > 4$.
Using Equations (2.13) and (2.14), and noting that $\rho(K) \le 1$ implies $|K_{i,j}| < \sqrt{2}$, we can bound the difference between $T$ and $\hat{T}$ by
$$ \begin{aligned} |T - \hat{T}| < {} & \big[ (1+\delta) - 1 \big] + 5 \big[ (1+\delta)^2 - 1 \big] + 8 \big[ (1+\delta)^3 - 1 \big] + 6 \big[ (1+\delta)^4 - 1 \big] \\ & + 2 \big[ (1+\delta)^5 - 1 \big] + 2 \big[ (\sqrt{2} + 2\delta/\alpha)^4 - 4 \big] \big[ (1+\delta) - 1 \big] \\ \le {} & (224 + 120 \, \delta/\alpha) \, \delta \le 344 \, \delta. \end{aligned} $$
Based on the equation of Lemma 9 for $T$, the quantity $|T|$ is at least
$$ \min\big\{ \alpha^{|C|}, \; \beta^{|C|}/2 \big\} > 344 \, \delta, $$
and so $s_K(C)$ equals $s_{K'}(C)$. This completes the analysis of the final case.
This implies that $\rho(K, K')$ is given by $\max_{i,j} \big| |K_{i,j}| - |K'_{i,j}| \big| < 2\delta/\alpha$.
Chapter 3
The Spread and Bipartite Spread
Conjecture
3.1 Introduction
The spread $s(M)$ of an arbitrary $n \times n$ complex matrix $M$ is the diameter of its spectrum; that is,
$$ s(M) := \max_{i,j} |\lambda_i - \lambda_j|, $$
where the maximum is taken over all pairs of eigenvalues of $M$. This quantity has been well-studied in general; see [33, 53, 83, 129] for details and additional references. Most notably, Johnson, Kumar, and Wolkowicz produced the lower bound
$$ s(M) \ge \Big| \sum_{i \ne j} m_{i,j} \Big| \big/ (n-1) $$
for normal matrices $M = (m_{i,j})$ [53, Theorem 2.1], and Mirsky produced the upper bound
$$ s(M) \le \sqrt{ 2 \sum_{i,j} |m_{i,j}|^2 - \frac{2}{n} \Big| \sum_i m_{i,i} \Big|^2 } $$
for any $n \times n$ matrix $M$, which is tight for normal matrices with $n - 2$ of its eigenvalues equal to each other and equal to the arithmetic mean of the other two [83, Theorem 2].
The spread of a matrix has also received interest in certain particular cases. Consider a simple undirected graph $G = (V, E)$ of order $n$. The adjacency matrix $A$ of a graph $G$ is the $n \times n$ matrix whose rows and columns are indexed by the vertices of $G$, with entries satisfying
$$ A_{u,v} = \begin{cases} 1 & \text{if } \{u, v\} \in E(G) \\ 0 & \text{otherwise.} \end{cases} $$
This matrix is real and symmetric, and so its eigenvalues are real and can be ordered $\lambda_1(G) \ge \lambda_2(G) \ge \cdots \ge \lambda_n(G)$. When considering the spread of the adjacency matrix $A$ of some graph $G$, the spread is simply the distance between $\lambda_1(G)$ and $\lambda_n(G)$, denoted by
$$ s(G) := \lambda_1(G) - \lambda_n(G). $$
In this instance, $s(G)$ is referred to as the spread of the graph. The spectrum of a bipartite graph is symmetric with respect to the origin; in particular, $\lambda_n(G) = -\lambda_1(G)$ and so $s(G) = 2\lambda_1(G)$.
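For a concrete bipartite example, $\lambda_1(K_{a,b}) = \sqrt{ab}$, so $s(K_{a,b}) = 2\sqrt{ab}$. A numeric sketch using power iteration on $A^2$ (iterating on $A$ itself would oscillate, since $+\lambda_1$ and $-\lambda_1$ tie in magnitude for bipartite graphs):

```python
import math, random

def adjacency_K(a, b):
    # Adjacency matrix of the complete bipartite graph K_{a,b}.
    n = a + b
    A = [[0] * n for _ in range(n)]
    for u in range(a):
        for v in range(a, n):
            A[u][v] = A[v][u] = 1
    return A

def lambda1(A, iters=100):
    # Power iteration on A^2; the limit vector lies in the top eigenspace,
    # and ||A v|| then recovers lambda_1 for a unit vector v.
    n = len(A)
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in Av))

# s(K_{a,b}) = 2 * lambda1 = 2 * sqrt(a * b) by the bipartite symmetry above.
```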
In [48], the authors investigated a number of properties regarding the spread of a graph, determining upper and lower bounds on $s(G)$. Furthermore, they made two key conjectures. Let us denote the maximum spread over all $n$-vertex graphs by $s(n)$, the maximum spread over all $n$-vertex graphs of size $m$ by $s(n, m)$, and the maximum spread over all $n$-vertex bipartite graphs of size $m$ by $s_b(n, m)$. Let $K_k$ be the clique of order $k$ and $G(n, k) := K_k \vee \overline{K_{n-k}}$ be the join of the clique $K_k$ and the independent set $\overline{K_{n-k}}$ (i.e., the disjoint union of $K_k$ and $\overline{K_{n-k}}$, with all possible edges between the two graphs added). The conjectures addressed in this article are as follows.
Conjecture 1 ([48], Conjecture 1.3). For any positive integer $n$, the graph of order $n$ with maximum spread is $G(n, \lfloor 2n/3 \rfloor)$; that is, $s(n)$ is attained only by $G(n, \lfloor 2n/3 \rfloor)$.
Conjecture 2 ([48], Conjecture 1.4). If $G$ is a graph with $n$ vertices and $m$ edges attaining the maximum spread $s(n, m)$, and if $m \le \lfloor n^2/4 \rfloor$, then $G$ must be bipartite. That is, $s_b(n, m) = s(n, m)$ for all $m \le \lfloor n^2/4 \rfloor$.
Conjecture 1 is referred to as the Spread Conjecture, and Conjecture 2 is referred to as the Bipartite Spread Conjecture. In this chapter, we investigate both conjectures. In Section 3.2, we prove an asymptotic version of the Bipartite Spread Conjecture, and provide an infinite family of counterexamples to illustrate that our asymptotic version is as tight as possible, up to lower order terms. In Section 3.3, we provide a high-level sketch of a proof that the Spread Conjecture holds for all $n$ sufficiently large (the full proof can be found in [12]). The exact associated results are given by Theorems 8 and 9.
Theorem 8. There exists a constant $N$ so that the following holds: Suppose $G$ is a graph on $n \ge N$ vertices with maximum spread; then $G$ is the join of a clique on $\lfloor 2n/3 \rfloor$ vertices and an independent set on $\lceil n/3 \rceil$ vertices.
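The clique size in Theorem 8 can be sanity-checked against the known spectrum of the join (a complete split graph), which we assume here since it is not derived in the text above: $K_k \vee \overline{K_{n-k}}$ has eigenvalue $-1$ with multiplicity $k-1$, eigenvalue $0$ with multiplicity $n-k-1$, and the two roots of $t^2 - (k-1)t - k(n-k)$, so its spread is $\sqrt{(k-1)^2 + 4k(n-k)}$. A brute-force sketch:

```python
import math

def spread_join(n, k):
    # Spread of K_k v \bar{K}_{n-k}, assuming the complete split graph
    # spectrum stated in the lead-in: the gap between the two roots of
    # t^2 - (k-1) t - k(n-k) = 0.
    return math.sqrt((k - 1) ** 2 + 4 * k * (n - k))

# For each n, find the clique size k maximizing the spread of the join.
best = {n: max(range(1, n), key=lambda k: spread_join(n, k)) for n in range(6, 60)}
```

The maximizer agrees with $\lfloor 2n/3 \rfloor$ for every $n$ in this range, matching the statement of the theorem.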
Theorem 9.
$$ s(n, m) - s_b(n, m) \le \frac{1 + 16 \, m^{-3/4}}{m^{3/4}} \, s(n, m) $$
for all $n, m \in \mathbb{N}$ satisfying $m \le \lfloor n^2/4 \rfloor$. In addition, for any $\varepsilon > 0$, there exists some $n_\varepsilon$ such that
$$ s(n, m) - s_b(n, m) \ge \frac{1 - \varepsilon}{m^{3/4}} \, s(n, m) $$
for all $n \ge n_\varepsilon$ and some $m \le \lfloor n^2/4 \rfloor$ depending on $n$.
The proof of Theorem 8 is quite involved, and for this reason we provide only a sketch, illustrating the key high-level details involved in the proof and referring some of the exact details to the main paper [12]. The general technique consists of showing that a spread-extremal graph has certain desirable properties, considering and solving an analogous problem for graph limits, and then using this result to say something about the Spread Conjecture for sufficiently large $n$. In comparison, the proof of Theorem 9 is surprisingly short, making use of the theory of equitable decompositions and a well-chosen class of counterexamples.
3.2 The Bipartite Spread Conjecture
In [48], the authors investigated the structure of graphs which maximize the spread over all graphs with a fixed number of vertices $n$ and edges $m$, denoted by $s(n, m)$. In particular, they proved the upper bound
$$ s(G) \le \lambda_1 + \sqrt{2m - \lambda_1^2} \le 2\sqrt{m}, \quad (3.1) $$
and noted that equality holds throughout if and only if $G$ is the union of isolated vertices and $K_{a,b}$, for some $a + b \le n$ satisfying $m = ab$ [48, Thm. 1.5]. This led the authors to conjecture that if $G$ has $n$ vertices, $m \le \lfloor n^2/4 \rfloor$ edges, and spread $s(n, m)$, then $G$ is bipartite [48, Conj. 1.4]. In this section, we prove a weak asymptotic form of this conjecture and provide an infinite family of counterexamples to the exact conjecture which verifies that the error in the aforementioned asymptotic result is of the correct order of magnitude. Recall that $s_b(n, m)$, $m \le \lfloor n^2/4 \rfloor$, is the maximum spread over all bipartite graphs with $n$ vertices and $m$ edges. To explicitly compute the spread of certain graphs, we make use of the theory of equitable partitions. In particular, we note that if $\sigma$ is an automorphism of $G$, then the quotient matrix of $A(G)$ with respect to $\sigma$, denoted by $A_\sigma$, satisfies $\Lambda(A_\sigma) \subseteq \Lambda(A)$, and therefore $s(G)$ is at least the spread of $A_\sigma$ (for details, see [13, Section 2.3]). Additionally, we require two propositions, one regarding the largest spectral radius of subgraphs of $K_{a,b}$ of a given size, and another regarding the largest gap between sizes which correspond to a complete bipartite graph of order at most $n$.
Let $K^m_{a,b}$, $0 \le ab - m < \min\{a, b\}$, be the subgraph of $K_{a,b}$ resulting from removing $ab - m$ edges all incident to some vertex in the larger side of the bipartition (if $a = b$, the vertex can be from either side). In [78], the authors proved the following result.
Proposition 5. If $0 \le ab - m < \min\{a, b\}$, then $K^m_{a,b}$ maximizes $\lambda_1$ over all subgraphs of $K_{a,b}$ of size $m$.
We also require estimates regarding the longest sequence of consecutive sizes $m < \lfloor n^2/4 \rfloor$ for which there does not exist a complete bipartite graph on at most $n$ vertices and exactly $m$ edges. As pointed out by [4], the result follows quickly by induction. However, for completeness, we include a brief proof.
Proposition 6. The length of the longest sequence of consecutive sizes $m < \lfloor n^2/4 \rfloor$ for which there does not exist a complete bipartite graph on at most $n$ vertices and exactly $m$ edges is zero for $n \le 4$ and at most $\sqrt{2n-1} - 1$ for $n \ge 5$.
Proof. We proceed by induction. By inspection, for every $n \le 4$, $m \le \lfloor n^2/4 \rfloor$, there exists a complete bipartite graph of size $m$ and order at most $n$, and so the length of the longest sequence is trivially zero for $n \le 4$. When $n = m = 5$, there is no complete bipartite graph of order at most five with exactly five edges. This is the only such instance for $n = 5$, and so the length of the longest sequence for $n = 5$ is one.
Now, suppose that the statement holds for graphs of order at most $n - 1$, for some $n > 5$. We aim to show the statement for graphs of order at most $n$. By our inductive hypothesis, it suffices to consider only sizes $m \ge \lfloor (n-1)^2/4 \rfloor$ and complete bipartite graphs on $n$ vertices. We have
$$ \Big( \frac{n}{2} + c \Big) \Big( \frac{n}{2} - c \Big) \ge \frac{(n-1)^2}{4} \quad \text{for } |c| \le \frac{\sqrt{2n-1}}{2}. $$
When $1 \le c \le \sqrt{2n-1}/2$, the difference between the sizes of $K_{n/2+c-1,\, n/2-c+1}$ and $K_{n/2+c,\, n/2-c}$ is at most
$$ \Big| E\big( K_{\frac{n}{2}+c-1,\, \frac{n}{2}-c+1} \big) \Big| - \Big| E\big( K_{n/2+c,\, n/2-c} \big) \Big| = 2c - 1 \le \sqrt{2n-1} - 1. $$
Let $c^*$ be the largest value of $c$ satisfying $c \le \sqrt{2n-1}/2$ and $n/2 + c \in \mathbb{N}$. Then
$$ \Big| E\big( K_{\frac{n}{2}+c^*,\, \frac{n}{2}-c^*} \big) \Big| < \Big( \frac{n}{2} + \frac{\sqrt{2n-1}}{2} - 1 \Big) \Big( \frac{n}{2} - \frac{\sqrt{2n-1}}{2} + 1 \Big) = \sqrt{2n-1} + \frac{(n-1)^2}{4} - 1, $$
and the difference between the sizes of $K_{n/2+c^*,\, n/2-c^*}$ and $K_{\lceil \frac{n-1}{2} \rceil,\, \lfloor \frac{n-1}{2} \rfloor}$ is at most
$$ \Big| E\big( K_{\frac{n}{2}+c^*,\, \frac{n}{2}-c^*} \big) \Big| - \Big| E\big( K_{\lceil \frac{n-1}{2} \rceil,\, \lfloor \frac{n-1}{2} \rfloor} \big) \Big| < \sqrt{2n-1} + \frac{(n-1)^2}{4} - \Big\lfloor \frac{(n-1)^2}{4} \Big\rfloor - 1 < \sqrt{2n-1}. $$
Combining these two estimates completes our inductive step, and the proof.
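Proposition 6 is also easy to verify exhaustively for small $n$: enumerate all products $ab$ with $a + b \le n$ and measure the longest run of unachievable sizes below $\lfloor n^2/4 \rfloor$. A sketch:

```python
import math

def longest_gap(n):
    """Longest run of consecutive sizes m < floor(n^2/4) that are NOT of the
    form a*b with a + b <= n, i.e., for which no complete bipartite graph of
    order at most n has exactly m edges."""
    achievable = {a * b for a in range(1, n) for b in range(1, n - a + 1)}
    run = best = 0
    for m in range(1, n * n // 4):
        run = 0 if m in achievable else run + 1
        best = max(best, run)
    return best

# Proposition 6: zero for n <= 4, and at most sqrt(2n - 1) - 1 for n >= 5.
assert longest_gap(4) == 0 and longest_gap(5) == 1
assert all(longest_gap(n) <= math.sqrt(2 * n - 1) - 1 for n in range(5, 30))
```

For $n = 5$ the single gap is $m = 5$, matching the base case of the proof.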
We are now prepared to prove an asymptotic version of [48, Conjecture 1.4], and to provide an infinite class of counterexamples illustrating that the asymptotic version under consideration is the tightest version of this conjecture possible.
Theorem 10.
$$ s(n, m) - s_b(n, m) \le \frac{1 + 16 \, m^{-3/4}}{m^{3/4}} \, s(n, m) $$
for all $n, m \in \mathbb{N}$ satisfying $m \le \lfloor n^2/4 \rfloor$. In addition, for any $\varepsilon > 0$, there exists some $n_\varepsilon$ such that
$$ s(n, m) - s_b(n, m) \ge \frac{1 - \varepsilon}{m^{3/4}} \, s(n, m) $$
for all $n \ge n_\varepsilon$ and some $m \le \lfloor n^2/4 \rfloor$ depending on $n$.
Proof. The main idea of the proof is as follows. To obtain an upper bound on $s(n, m) - s_b(n, m)$, we upper bound $s(n, m)$ by $2\sqrt{m}$ (using Inequality 3.1) and lower bound $s_b(n, m)$ by the spread of some specific bipartite graph. To obtain a lower bound on $s(n, m) - s_b(n, m)$ for a specific $n$ and $m$, we explicitly compute $s_b(n, m)$ using Proposition 5, and lower bound $s(n, m)$ by the spread of some specific non-bipartite graph.
First, we analyze the spread of $K^m_{a,b}$, $0 < ab - m < a \le b$, a quantity that will be used in the proof of both the upper and lower bound. Let us denote the vertices in the bipartition of $K^m_{a,b}$ by $u_1, \ldots, u_a$ and $v_1, \ldots, v_b$, and suppose without loss of generality that $u_1$ is not adjacent to $v_1, \ldots, v_{ab-m}$. Then
$$ \sigma = (u_1)(u_2, \ldots, u_a)(v_1, \ldots, v_{ab-m})(v_{ab-m+1}, \ldots, v_b) $$
is an automorphism of ๐พ๐๐,๐. The corresponding quotient matrix is given by
๐ด๐ =
โโโโโโโโ0 0 0 ๐โ (๐โ 1)๐
0 0 ๐๐ โ๐ ๐โ (๐โ 1)๐
0 ๐โ 1 0 0
1 ๐โ 1 0 0
โโโโโโโโ ,
has characteristic polynomial
$$p(\lambda, n_1, n_2, m) = \det[A_\sigma - \lambda I] = \lambda^4 - m\lambda^2 + (n_1 - 1)\left(m - (n_1 - 1)n_2\right)\left(n_1 n_2 - m\right),$$
and, therefore,
$$s(K^m_{n_1,n_2}) \ge 2\left(\frac{m + \sqrt{m^2 - 4(n_1 - 1)(m - (n_1 - 1)n_2)(n_1 n_2 - m)}}{2}\right)^{1/2}. \tag{3.2}$$
For $n_1 n_2 = \Omega(n^2)$ and $n$ sufficiently large, this bound is actually an equality, as $A(K^m_{n_1,n_2})$
is a perturbation of the adjacency matrix of a complete bipartite graph with bipartition classes of
size $\Omega(n)$ by a matrix of norm $O(\sqrt{n})$. For the upper bound we only require the inequality, but for
the lower bound we assume $n$ is large enough so that this is indeed an equality.
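The quotient-matrix computation and Inequality (3.2) can be sanity-checked numerically. Since $K^m_{n_1,n_2}$ is bipartite, its spectrum is symmetric and (3.2) in fact holds with equality; a minimal sketch with hypothetical small parameters $n_1 = 7$, $n_2 = 5$:

```python
import numpy as np

def K_m(n1, n2, m):
    """Adjacency matrix of K^m_{n1,n2}: the complete bipartite graph
    K_{n1,n2} with the n1*n2 - m missing edges all incident to u_1."""
    e = n1 * n2 - m
    assert 0 < e < n2 <= n1
    A = np.zeros((n1 + n2, n1 + n2))
    A[:n1, n1:] = 1.0
    A[n1:, :n1] = 1.0
    A[0, n1:n1 + e] = 0.0   # u_1 is not adjacent to v_1, ..., v_{n1*n2-m}
    A[n1:n1 + e, 0] = 0.0
    return A

def bound_32(n1, n2, m):
    """Right-hand side of Inequality (3.2)."""
    d = (n1 - 1) * (m - (n1 - 1) * n2) * (n1 * n2 - m)
    return 2 * np.sqrt((m + np.sqrt(m * m - 4 * d)) / 2)

n1, n2 = 7, 5            # hypothetical small sizes
m = n1 * n2 - 3          # three edges at u_1 removed
ev = np.linalg.eigvalsh(K_m(n1, n2, m))
s = ev[-1] - ev[0]
# the graph is bipartite, so (3.2) holds with equality here
assert np.isclose(bound_32(n1, n2, m), s, atol=1e-8)
```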
Next, we prove the upper bound. For some fixed $n$ and $m \le \lfloor n^2/4\rfloor$, let $m = n_1 n_2 - p$,
where $n_1, n_2, p \in \mathbb{N}$, $n_1 + n_2 \le n$, and $p$ is as small as possible. If $p = 0$, then by [48,
Thm. 1.5] (described above), $s(n,m) = s_b(n,m)$ and we are done. Otherwise, we note
that $0 < p < \min\{n_1, n_2\}$, and so Inequality 3.2 is applicable (in fact, by Proposition 6,
$p = O(\sqrt{m})$). Using the upper bound $s(n,m) \le 2\sqrt{m}$ and Inequality 3.2, we have
$$\frac{s(n, n_1 n_2 - p) - s(K^m_{n_1,n_2})}{s(n, n_1 n_2 - p)} \le 1 - \left(\frac{1}{2} + \frac{1}{2}\sqrt{1 - \frac{4(n_1 - 1)(n_2 - p)p}{(n_1 n_2 - p)^2}}\right)^{1/2}. \tag{3.3}$$
To upper bound ๐, we use Proposition 6 with ๐โฒ = โ2โ๐โ โค ๐ and ๐. This implies that
๐ โคโ
2โ2โ๐โ โ 1โ 1 <
โ2(2โ๐ + 1)โ 1โ 1 =
โ4โ๐ + 1โ 1 โค 2๐1/4.
81
Recall that $\sqrt{1-x} \ge 1 - x/2 - x^2/2$ for all $x \in [0,1]$, and so
$$1 - \left(\frac{1}{2} + \frac{1}{2}\sqrt{1-x}\right)^{1/2} \le 1 - \left(\frac{1}{2} + \frac{1}{2}\left(1 - \frac{1}{2}x - \frac{1}{2}x^2\right)\right)^{1/2} = 1 - \left(1 - \frac{1}{4}(x + x^2)\right)^{1/2}$$
$$\le 1 - \left(1 - \frac{1}{8}(x + x^2) - \frac{1}{32}(x + x^2)^2\right) \le \frac{1}{8}x + \frac{1}{4}x^2$$
for $x \in [0,1]$. To simplify Inequality 3.3, we observe that
$$\frac{4(n_1 - 1)(n_2 - p)p}{(n_1 n_2 - p)^2} \le \frac{4p}{m} \le \frac{8}{m^{3/4}}.$$
Therefore,
$$\frac{s(n, n_1 n_2 - p) - s(K^m_{n_1,n_2})}{s(n, n_1 n_2 - p)} \le \frac{1}{m^{3/4}} + \frac{16}{m^{3/2}}.$$
This completes the proof of the upper bound.
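The scalar inequality used above can be checked numerically on a grid (a sketch; the grid size and tolerance are arbitrary choices of ours):

```python
import numpy as np

# check 1 - (1/2 + sqrt(1-x)/2)^{1/2} <= x/8 + x^2/4 on a grid of [0, 1]
x = np.linspace(0.0, 1.0, 10001)
lhs = 1 - np.sqrt(0.5 + 0.5 * np.sqrt(1 - x))
rhs = x / 8 + x**2 / 4
assert np.all(lhs <= rhs + 1e-12)
```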
Finally, we proceed with the proof of the lower bound. Let us fix some $0 < \epsilon < 1$, and
consider some arbitrarily large $n$. Let $m = (n/2 + a)(n/2 - a) + 1$, where $a$ is the smallest
number satisfying $n/2 + a \in \mathbb{N}$ and $\delta := 1 - 2a^2/n < \epsilon/2$ (here we require $n = \Omega(1/\epsilon^2)$).
Denote the vertices in the bipartition of $K_{n/2+a,\,n/2-a}$ by $u_1, \dots, u_{n/2+a}$ and $v_1, \dots, v_{n/2-a}$,
and consider the graph $K^+_{n/2+a,\,n/2-a} := K_{n/2+a,\,n/2-a} \cup \{v_1, v_2\}$ resulting from adding
one edge to $K_{n/2+a,\,n/2-a}$ between two vertices in the smaller side of the bipartition. Then
$$\sigma = (u_1 \cdots u_{n/2+a})(v_1\, v_2)(v_3 \cdots v_{n/2-a})$$
is an automorphism of $K^+_{n/2+a,\,n/2-a}$, and
$$A_\sigma = \begin{pmatrix} 0 & 2 & n/2 - a - 2 \\ n/2 + a & 1 & 0 \\ n/2 + a & 0 & 0 \end{pmatrix}$$
has characteristic polynomial
$$\det[A_\sigma - \lambda I] = -\lambda^3 + \lambda^2 + \left(\frac{n^2}{4} - a^2\right)\lambda - \left(\frac{n}{2} + a\right)\left(\frac{n}{2} - a - 2\right)$$
$$= -\lambda^3 + \lambda^2 + \left(\frac{n^2}{4} - \frac{(1-\delta)n}{2}\right)\lambda - \left(\frac{n^2}{4} - \frac{(3-\delta)n}{2} - \sqrt{2(1-\delta)n}\right).$$
By matching higher order terms, we obtain
$$\lambda_{max}(A_\sigma) = \frac{n}{2} - \frac{1-\delta}{2} + \frac{8 - (1-\delta)^2}{4n} + o(1/n),$$
$$\lambda_{min}(A_\sigma) = -\frac{n}{2} + \frac{1-\delta}{2} + \frac{8 + (1-\delta)^2}{4n} + o(1/n),$$
and
$$s(K^+_{n/2+a,\,n/2-a}) \ge n - (1-\delta) - \frac{(1-\delta)^2}{2n} + o(1/n).$$
Next, we aim to compute $s_b(n,m)$, $m = (n/2 + a)(n/2 - a) + 1$. By Proposition 5,
$s_b(n,m)$ is equal to the maximum of $s(K^m_{n/2+h,\,n/2-h})$ over all $h \in [0, a-1]$, $a - h \in \mathbb{N}$. As
previously noted, for $n$ sufficiently large, the quantity $s(K^m_{n/2+h,\,n/2-h})$ is given exactly by
Equation (3.2), and so the optimal choice of $h$ minimizes
$$f(h) := \left(\frac{n}{2} + h - 1\right)\left(a^2 - h^2 - 1\right)\left(\frac{n}{2} - h - (a^2 - h^2 - 1)\right)$$
$$= \left(\frac{n}{2} + h\right)\left(\frac{(1-\delta)n}{2} - h^2\right)\left(\frac{\delta n}{2} + h^2 - h\right) + O(n^2).$$
We have
$$f(a-1) = \left(\frac{n}{2} + a - 2\right)(2a - 2)\left(\frac{n}{2} - 3a + 3\right),$$
and if $h \le \frac{4}{5}a$, then $f(h) = \Omega(n^3)$. Therefore the minimizing $h$ is in $[\frac{4}{5}a, a]$. The derivative
of $f(h)$ is given by
$$f'(h) = (a^2 - h^2 - 1)\left(\frac{n}{2} - h - a^2 + h^2 + 1\right) - 2h\left(\frac{n}{2} + h - 1\right)\left(\frac{n}{2} - h - a^2 + h^2 + 1\right) + (2h - 1)\left(\frac{n}{2} + h - 1\right)(a^2 - h^2 - 1).$$
For $h \in [\frac{4}{5}a, a]$,
$$f'(h) \le \frac{n(a^2 - h^2)}{2} - hn\left(\frac{n}{2} - h - a^2 + h^2\right) + 2h\left(\frac{n}{2} + h\right)(a^2 - h^2)$$
$$\le \frac{9a^2 n}{50} - \frac{4}{5}an\left(\frac{n}{2} - a - \frac{9}{25}a^2\right) + \frac{18}{25}\left(\frac{n}{2} + a\right)a^3$$
$$= \frac{81a^3 n}{125} - \frac{2an^2}{5} + O(n^2) = an^2\left(\frac{81(1-\delta)}{250} - \frac{2}{5}\right) + O(n^2) < 0$$
for sufficiently large $n$. This implies that the optimal choice is $h = a - 1$, and $s_b(n,m) =
s(K^m_{n/2+a-1,\,n/2-a+1})$. The characteristic polynomial $p(\lambda, n/2 + a - 1, n/2 - a + 1, n^2/4 - a^2 + 1)$
equals
$$\lambda^4 - \left(\frac{n^2}{4} - a^2 + 1\right)\lambda^2 + 2\left(\frac{n}{2} + a - 2\right)\left(\frac{n}{2} - 3a + 3\right)(a - 1).$$
By matching higher order terms, the extreme root of $p$ is given by
$$\lambda = \frac{n}{2} - \frac{1-\delta}{2} - \sqrt{\frac{2(1-\delta)}{n}} + \frac{27 - 14\delta - \delta^2}{4n} + o(1/n),$$
and so
$$s_b(n,m) = n - (1-\delta) - 2\sqrt{\frac{2(1-\delta)}{n}} + \frac{27 - 14\delta - \delta^2}{2n} + o(1/n),$$
and
$$\frac{s(n,m) - s_b(n,m)}{s(n,m)} \ge \frac{2^{3/2}(1-\delta)^{1/2}}{n^{3/2}} - \frac{14 - 8\delta}{n^2} + o(1/n^2)$$
$$= \frac{(1-\delta)^{1/2}}{m^{3/4}} + \frac{(1-\delta)^{1/2}}{(n/2)^{3/2}}\left[1 - \frac{(n/2)^{3/2}}{m^{3/4}}\right] - \frac{14 - 8\delta}{n^2} + o(1/n^2)$$
$$\ge \frac{1 - \epsilon/2}{m^{3/4}} + o(1/m^{3/4}).$$
This completes the proof.
3.3 A Sketch for the Spread Conjecture
Here, we provide a concise, high-level description of an asymptotic proof of the Spread
Conjecture, proving only the key graph-theoretic structural results for spread-extremal
graphs. The full proof itself is quite involved (and long), making use of interval arith-
metic and a number of fairly complicated symbolic calculations, but it is conceptually quite
intuitive. For the full proof, we refer the reader to [12]. The proof consists of four main steps:
Step 1: Graph-Theoretic Results
In Subsection 3.3.1, we observe a number of important structural properties of any graph
that maximizes the spread for a given order $n$. In particular, in Theorem 11, we show that
• any graph that maximizes spread is the join of two threshold graphs, each with
order linear in $n$,
• the unit eigenvectors $\mathbf{x}$ and $\mathbf{z}$ corresponding to $\lambda_1(G)$ and $\lambda_n(G)$ have infinity
norms of order $n^{-1/2}$,
• the quantities $\lambda_1\mathbf{x}_u^2 - \lambda_n\mathbf{z}_u^2$, $u \in V$, are all nearly equal, up to a term of order $n^{-1}$.
This last structural property serves as the key backbone of our proof. In addition, we
note that, by a tensor argument (replacing vertices by independent sets $\overline{K_t}$ and edges by
bicliques $K_{t,t}$), an asymptotic upper bound for $s(n)$ implies a bound for all $n$.
Step 2: Graph Limits, Averaging, and a Finite-Dimensional Eigenvalue Problem
For this step, we provide only a brief overview, and refer the reader to [12] for additional
details. In this step, we can make use of graph limits to understand how spread-extremal
graphs behave as ๐ tends to infinity. By proving a continuous version of Theorem 11
for graphons, and using an averaging argument inspired by [112], we can show that
the spread-extremal graphon takes the form of a step-graphon with a fixed structure of
symmetric seven by seven blocks, illustrated below.
The lengths $\alpha = (\alpha_1, \dots, \alpha_7)$, $\sum_i \alpha_i = 1$, of the rows/columns of the spread-optimal
step-graphon are unknown. For any choice of lengths $\alpha$, we can associate a seven by
seven matrix whose spread is identical to that of the associated step-graphon pictured
above. Let $B$ be the seven by seven matrix with $B_{i,j}$ equal to the value of the above
step-graphon on block $i,j$, and $D = \mathrm{diag}(\alpha_1, \dots, \alpha_7)$ be the diagonal matrix with $\alpha$ on the
diagonal. Then the matrix $D^{1/2}BD^{1/2}$, given by
$$\mathrm{diag}(\alpha_1, \dots, \alpha_7)^{1/2}
\begin{pmatrix}
1&1&1&1&1&1&1\\
1&1&1&0&1&1&1\\
1&1&0&0&1&1&1\\
1&0&0&0&1&1&1\\
1&1&1&1&1&1&0\\
1&1&1&1&1&0&0\\
1&1&1&1&0&0&0
\end{pmatrix}
\mathrm{diag}(\alpha_1, \dots, \alpha_7)^{1/2},$$
has spread equal to the spread of the associated step-graphon. This process has reduced
the graph limit version of our problem to a finite-dimensional eigenvalue optimization
problem, which we can endow with additional structural constraints resulting from the
graphon version of Theorem 11.
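This reduced problem is easy to explore numerically. The sketch below (our own code, mirroring the kind of computations behind the contour plots of Figure 3-1) evaluates the spread of $D^{1/2}BD^{1/2}$, checks that the optimum $\alpha_1 = 2/3$, $\alpha_6 = 1/3$ attains the value $2/\sqrt{3}$, and checks that random points on the simplex do not exceed it — numerical evidence, not a proof:

```python
import numpy as np

# the fixed 7x7 step-graphon block structure from the text
B = np.array([[1,1,1,1,1,1,1],
              [1,1,1,0,1,1,1],
              [1,1,0,0,1,1,1],
              [1,0,0,0,1,1,1],
              [1,1,1,1,1,1,0],
              [1,1,1,1,1,0,0],
              [1,1,1,1,0,0,0]], dtype=float)

def spread(alpha):
    """Spread of diag(alpha)^(1/2) B diag(alpha)^(1/2), alpha on the simplex."""
    d = np.sqrt(np.asarray(alpha, dtype=float))
    M = d[:, None] * B * d[None, :]
    ev = np.linalg.eigvalsh(M)
    return ev[-1] - ev[0]

# optimum: alpha_1 = 2/3, alpha_6 = 1/3, all others zero, with spread 2/sqrt(3)
best = np.zeros(7)
best[0], best[5] = 2 / 3, 1 / 3
target = 2 / np.sqrt(3)
assert np.isclose(spread(best), target, atol=1e-10)

# random search over the simplex never beats it
rng = np.random.default_rng(1)
for _ in range(2000):
    alpha = rng.dirichlet(np.ones(7))
    assert spread(alpha) <= target + 1e-9
```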
(a) ๐ผ๐ = 0 for all ๐ (b) ๐ผ2 = ๐ผ3 = ๐ผ4 = 0
Figure 3-1: Contour plots of the spread for some choices of ๐ผ. Each point (๐ฅ, ๐ฆ) of Plot (a)illustrates the maximum spread over all choices of ๐ผ satisfying ๐ผ3+๐ผ4 = ๐ฅ and ๐ผ6+๐ผ7 = ๐ฆ(and therefore, ๐ผ1 +๐ผ2 +๐ผ5 = 1โ ๐ฅโ ๐ฆ) on a grid of step size 1/100. Each point (๐ฅ, ๐ฆ) ofPlot (b) illustrates the maximum spread over all choices of ๐ผ satisfying ๐ผ2 = ๐ผ3 = ๐ผ4 = 0,๐ผ5 = ๐ฆ, and ๐ผ7 = ๐ฅ on a grid of step size 1/100. The maximum spread of Plot (a) isachieved at the black x, and implies that, without loss of generality, ๐ผ3 + ๐ผ4 = 0, andtherefore ๐ผ2 = 0 (indices ๐ผ1 and ๐ผ2 can be combined when ๐ผ3 + ๐ผ4 = 0). Plot (b) treatsthis case when ๐ผ2 = ๐ผ3 = ๐ผ4 = 0, and the maximum spread is achieved on the black line.This implies that either ๐ผ5 = 0 or ๐ผ7 = 0. In both cases, this reduces to the block two bytwo case ๐ผ1, ๐ผ7 = 0 (or, if ๐ผ7 = 0, then ๐ผ1, ๐ผ6 = 0).
Step 3: Computer-Assisted Proof of a Finite-Dimensional Eigenvalue Problem
Using a computer-assisted proof, we can show that the optimizing choice of ๐ผ๐, ๐ =
1, ..., 7, is, without loss of generality, given by ๐ผ1 = 2/3, ๐ผ6 = 1/3, and all other ๐ผ๐ = 0.
This is exactly the limit of the conjectured spread optimal graph as ๐ tends to infinity.
The proof of this fact is extremely technical. Though not a proof, in Figure 3-1 we provide
intuitive visual justification that this result is true. In this figure, we provide contour
plots resulting from numerical computations of the spread of the above matrix for var-
ious values of $\alpha$. The numerical results suggest that the $2/3-1/3$ two by two block
step-graphon is indeed optimal. See Figure 3-1 and the associated caption for details. The
actual proof of this fact consists of three parts. First, we can reduce the possible choices
of non-zero $\alpha_i$ from $2^7$ to 17 different cases by removing different representations of the
same matrix. Second, using eigenvalue equations, the graph limit version of the fact that
the quantities $\lambda_1\mathbf{x}_u^2 - \lambda_n\mathbf{z}_u^2$ are all nearly equal, and interval arithmetic, we can
prove that, of the 17 cases, only the cases $\alpha_1, \alpha_7 \ne 0$ or $\alpha_4, \alpha_5, \alpha_7 \ne 0$ need to be
considered. Finally, using basic results from the theory of cubic polynomials and
computer-assisted symbolic calculations, we can restrict ourselves to the case $\alpha_1, \alpha_7 \ne 0$.
This case corresponds to a quadratic polynomial and can easily be shown to be maximized
by $\alpha_1 = 2/3$, $\alpha_7 = 1/3$. We refer the reader to [118] for the code (based on [95]) and
output of the interval arithmetic algorithm (the key portion of this step), and [12] for
additional details regarding this step.
Step 4: From Graph Limits to an Asymptotic Proof of the Spread Conjecture
We can convert our result for the spread-optimal graph limit to a statement for graphs.
This process consists of showing that any spread optimal graph takes the form $(K_{n_1} \cup
\overline{K_{n_2}}) \vee \overline{K_{n_3}}$ for $n_1 = (2/3 + o(1))n$, $n_2 = o(n)$, and $n_3 = (1/3 + o(1))n$, i.e., any
spread optimal graph is equal, up to a set of $o(n)$ vertices, to the conjectured optimal
graph $K_{\lfloor 2n/3\rfloor} \vee \overline{K_{\lceil n/3\rceil}}$, and then showing that, for $n$ sufficiently large, the spread of
$(K_{n_1} \cup \overline{K_{n_2}}) \vee \overline{K_{n_3}}$, $n_1 + n_2 + n_3 = n$, is maximized when $n_2 = 0$. This step is fairly
straightforward, and we refer the reader to [12] for details.
3.3.1 Properties of Spread-Extremal Graphs
In this subsection, we consider properties of graphs that maximize the quantity ๐1(๐บ) โ
๐ ๐๐(๐บ) over all graphs of order ๐, for some fixed ๐ > 0. When ๐ = 1, this is exactly
the class of spread-extremal graphs. Let x be the unit eigenvector of ๐1 and z be the unit
eigenvector of ๐๐. As noted in [48] (for the case of ๐ = 1), we have
๐1 โ ๐ ๐๐ =โ๐ขโผ๐ฃ
x๐ขx๐ฃ โ ๐ z๐ขz๐ฃ,
88
and if a graph ๐บ maximizes ๐1 โ ๐ ๐๐ over all ๐ vertex graphs, then ๐ข โผ ๐ฃ, ๐ข = ๐ฃ, if
x๐ขx๐ฃ โ ๐ z๐ขz๐ฃ > 0 and ๐ข โผ ๐ฃ if x๐ขx๐ฃ โ ๐ z๐ขz๐ฃ < 0. We first produce some weak lower
bounds for the maximum of ๐1 โ ๐ ๐๐ over all ๐ vertex graphs.
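The identity above is easy to verify numerically. In the sketch below (our own code), $u \sim v$ is read as ranging over ordered pairs of adjacent vertices, so each edge contributes twice — a convention assumption on our part:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
a = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (a + a.T).astype(float)   # random symmetric 0/1 adjacency matrix

ev, V = np.linalg.eigh(A)
lam1, lamn = ev[-1], ev[0]
x, z = V[:, -1], V[:, 0]      # unit eigenvectors for lambda_1 and lambda_n
c = 0.7                       # an arbitrary fixed c > 0

# lambda_1 - c*lambda_n = sum over ordered pairs u ~ v of (x_u x_v - c z_u z_v)
total = sum(A[u, v] * (x[u] * x[v] - c * z[u] * z[v])
            for u in range(n) for v in range(n))
assert np.isclose(total, lam1 - c * lamn, atol=1e-9)
```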
Proposition 7.
$$\max_G\; \lambda_1(G) - c\,\lambda_n(G) \ge n\left[1 + \frac{\min\{1, c^2\}}{50}\right] \quad \text{for all } c > 0 \text{ and } n \ge 100\max\{1, c^{-2}\}.$$

Proof. We consider two different graphs, depending on the choice of $c$. When $c \ge 5/4$,
$n \ge 100$, we consider the bicliques $K_{\lceil n/2\rceil,\lfloor n/2\rfloor}$ and note that
$$\lambda_1(K_{\lceil n/2\rceil,\lfloor n/2\rfloor}) - c\,\lambda_n(K_{\lceil n/2\rceil,\lfloor n/2\rfloor}) \ge \frac{(c+1)(n-1)}{2} > n\left[1 + 1/50\right].$$
When ๐ โค 5/4, we consider the graph ๐บ(๐, ๐) = ๐พ๐ โจ๐พ๐โ๐, ๐ > 0, with characteristic
polynomial ๐๐โ๐โ1(๐ + 1)๐โ1(๐2 โ (๐ โ 1)๐โ ๐(๐โ ๐)
), and note that
๐1(๐บ(๐, ๐))โ ๐ ๐๐(๐บ(๐, ๐)) =(1โ ๐)(๐ โ 1)
2+
1 + ๐
2
โ(๐ โ 1)2 + 4(๐โ ๐)๐.
Setting ๐ = โ(1โ๐/3)๐โ+1 and using the inequalities (1โ๐/3)๐+1 โค ๐ < (1โ๐/3)๐+2,
we have(1โ ๐)(๐ โ 1)
2โฅ
(1โ ๐)(1โ ๐3)๐
2= ๐
[1
2โ 2๐
3+
๐2
6
],
and
(๐ โ 1)2 + 4(๐โ ๐)๐ โฅ(
1โ ๐
3
)2
๐2 + 4
(๐๐
3โ 2
)(๐โ ๐๐
3+ 1
)= ๐2
[1 +
2๐
3โ ๐2
3+
4๐โ 8
๐โ 8
๐2
]โฅ 1 +
2๐
3โ ๐2
2
for ๐ โฅ 100/๐2. Therefore,
๐1(๐บ(๐, ๐))โ ๐ ๐๐(๐บ(๐, ๐)) โฅ ๐
[1
2โ 2๐
3+
๐2
6+
1 + ๐
2
โ1 +
2๐
3โ ๐2
2
].
Using the inequality $\sqrt{1+x} \ge 1 + x/2 - x^2/8$ for all $x \in [0,1]$,
$$\sqrt{1 + \frac{2c}{3} - \frac{c^2}{2}} \ge 1 + \left(\frac{c}{3} - \frac{c^2}{4}\right) - \frac{1}{8}\left(\frac{2c}{3} - \frac{c^2}{2}\right)^2,$$
and, after a lengthy calculation, we obtain
$$\lambda_1(G(n,k)) - c\,\lambda_n(G(n,k)) \ge n\left[1 + \frac{13c^2}{72} - \frac{c^3}{9} + \frac{5c^4}{192} - \frac{c^5}{64}\right] \ge n\left[1 + \frac{c^2}{50}\right].$$
Combining these two estimates completes the proof.
This implies that for ๐ sufficiently large and ๐ fixed, any extremal graph must satisfy
|๐๐| = ฮฉ(๐). In the following theorem, we make a number of additional observations
regarding the structure of these extremal graphs. Similar results for the specific case of ๐ = 1
can be found in the full paper [12].
Theorem 11. Let $G = (V, E)$, $n \ge 3$, maximize the quantity $\lambda_1(G) - c\,\lambda_n(G)$, for some
$c > 0$, over all $n$ vertex graphs, and let $\mathbf{x}$ and $\mathbf{z}$ be unit eigenvectors corresponding to $\lambda_1$ and
$\lambda_n$, respectively. Then

1. $G = G[P] \vee G[N]$, where $P = \{u \in V \mid \mathbf{z}_u \ge 0\}$ and $N = V \setminus P$,

2. $\mathbf{x}_u\mathbf{x}_v - c\,\mathbf{z}_u\mathbf{z}_v \ne 0$ for all $u \ne v$,

3. $G[P]$ and $G[N]$ are threshold graphs, i.e., there exists an ordering $u_1, \dots, u_{|P|}$ of the
vertices of $G[P]$ (and $G[N]$) such that if $u_i \sim u_j$, $i < j$, then $u_i \sim u_k$ for all $k < j$,
$k \ne i$,

4. $|P|, |N| > \min\{1, c^4\}\,n/2500$ for $n \ge 100\max\{1, c^{-2}\}$,

5. $\|\mathbf{x}\|_\infty \le 6/n^{1/2}$ and $\|\mathbf{z}\|_\infty \le 10^3\max\{1, c^{-3}\}/n^{1/2}$ for $n \ge 100\max\{1, c^{-2}\}$,

6. $\left|n\left(\lambda_1\mathbf{x}_u^2 - c\,\lambda_n\mathbf{z}_u^2\right) - \left(\lambda_1 - c\,\lambda_n\right)\right| \le 5\cdot 10^{12}\max\{c, c^{-12}\}$ for all $u \in V$, for $n \ge
4\cdot 10^6\max\{1, c^{-6}\}$.
Proof. Property 1 follows immediately, as $\mathbf{x}_u\mathbf{x}_v - c\,\mathbf{z}_u\mathbf{z}_v > 0$ for all $u \in P$, $v \in N$. To
prove Property 2, we proceed by contradiction. Suppose that this is not the case, that
$\mathbf{x}_{u_1}\mathbf{x}_{u_2} - c\,\mathbf{z}_{u_1}\mathbf{z}_{u_2} = 0$ for some $u_1 \ne u_2$. Let $G'$ be the graph resulting from either
adding edge $\{u_1, u_2\}$ to $G$ (if $\{u_1, u_2\} \not\in E(G)$) or removing edge $\{u_1, u_2\}$ from $G$ (if
$\{u_1, u_2\} \in E(G)$). Then,
$$\lambda_1(G) - c\,\lambda_n(G) = \sum_{\{u,v\}\in E(G')} \left(\mathbf{x}_u\mathbf{x}_v - c\,\mathbf{z}_u\mathbf{z}_v\right) \le \lambda_1(G') - c\,\lambda_n(G'),$$
and so, by the optimality of $G$, $\mathbf{x}$ and $\mathbf{z}$ are also eigenvectors of the adjacency matrix of $G'$.
This implies that
$$\left(\lambda_1(G) - \lambda_1(G')\right)\mathbf{x}_{u_1} = \pm\mathbf{x}_{u_2} \quad \text{and} \quad \left(\lambda_1(G) - \lambda_1(G')\right)\mathbf{x}_v = 0$$
for any $v \ne u_1, u_2$. If $n \ge 3$, then there exists some $v$ such that $\mathbf{x}_v = 0$, and so, by the
Perron-Frobenius theorem, $G$ is disconnected. By Property 1, either $P = \emptyset$ or $N = \emptyset$, and
so $\lambda_n \ge 0$, a contradiction, as the trace of $A$ is zero. This completes the proof of Property 2.
To show Property 3, we simply order the vertices of $P$ so that the quantity $\mathbf{z}_{u_i}/\mathbf{x}_{u_i}$ is
non-decreasing in $i$. If $u_i \sim u_j$, $i < j$, and $k < j$, $k \ne i$, then
$$\mathbf{x}_{u_i}\mathbf{x}_{u_k} - c\,\mathbf{z}_{u_i}\mathbf{z}_{u_k} \ge \mathbf{x}_{u_i}\mathbf{x}_{u_k} - c\,\frac{\mathbf{x}_{u_k}}{\mathbf{x}_{u_j}}\,\mathbf{z}_{u_i}\mathbf{z}_{u_j} = \frac{\mathbf{x}_{u_k}}{\mathbf{x}_{u_j}}\left(\mathbf{x}_{u_i}\mathbf{x}_{u_j} - c\,\mathbf{z}_{u_i}\mathbf{z}_{u_j}\right) > 0,$$
and so $u_i \sim u_k$.
To show Properties 4-6, we assume $n \ge 100\max\{1, c^{-2}\}$, and so, by Proposition 7,
$\lambda_n(G) < -\min\{1, c^2\}\,n/50$. We begin with Property 4. Suppose, to the contrary, that
$|N| \le \min\{1, c^4\}\,n/2500$, and let $v$ maximize $|\mathbf{z}_v|$. If $v \in P$, then
$$\lambda_n = \sum_{u \sim v} \frac{\mathbf{z}_u}{\mathbf{z}_v} \ge \sum_{u \in N} \frac{\mathbf{z}_u}{\mathbf{z}_v} \ge -|N| \ge -\min\{1, c^4\}\,n/2500,$$
a contradiction to $\lambda_n(G) < -\min\{1, c^2\}\,n/50$. If $v \in N$, then
$$\lambda_n^2 = \sum_{u \sim v} \lambda_n\frac{\mathbf{z}_u}{\mathbf{z}_v} = \sum_{u \sim v}\sum_{w \sim u} \frac{\mathbf{z}_w}{\mathbf{z}_v} \le \sum_{u \sim v}\,\sum_{w \in N,\, w \sim u} \frac{\mathbf{z}_w}{\mathbf{z}_v} \le n|N| \le \min\{1, c^4\}\,n^2/2500,$$
again, a contradiction. This completes the proof of Property 4.
We now prove Property 5. Suppose that $u$ maximizes $|\mathbf{x}_u|$ and $v$ maximizes $|\mathbf{z}_v|$, and,
without loss of generality, $v \in N$. Let
$$A = \left\{w \;\middle|\; \mathbf{x}_w > \frac{\|\mathbf{x}\|_\infty}{3}\right\} \quad \text{and} \quad B = \left\{w \;\middle|\; |\mathbf{z}_w| > \frac{\min\{1, c^2\}\,|\mathbf{z}_v|}{100}\right\}.$$
We have
$$\lambda_1\mathbf{x}_u = \sum_{w \sim u} \mathbf{x}_w \le \|\mathbf{x}\|_\infty\left(|A| + (n - |A|)/3\right)$$
and $\lambda_1 \ge n/2$, and so $|A| \ge n/4$. In addition,
$$1 \ge \sum_{w \in A} \mathbf{x}_w^2 \ge |A|\,\frac{\|\mathbf{x}\|_\infty^2}{9} \ge \frac{n\,\|\mathbf{x}\|_\infty^2}{36},$$
so $\|\mathbf{x}\|_\infty \le 6/n^{1/2}$. Similarly,
$$|\lambda_n\mathbf{z}_v| = \Big|\sum_{u \sim v} \mathbf{z}_u\Big| \le |\mathbf{z}_v|\left(|B| + \frac{\min\{1, c^2\}}{100}\,(n - |B|)\right)$$
and $|\lambda_n(G)| \ge \min\{1, c^2\}\,n/50$, and so $|B| \ge \min\{1, c^2\}\,n/100$. In addition,
$$1 \ge \sum_{u \in B} \mathbf{z}_u^2 \ge |B|\,\frac{\min\{1, c^4\}\,\|\mathbf{z}\|_\infty^2}{10^4} \ge \frac{\min\{1, c^6\}\,n\,\|\mathbf{z}\|_\infty^2}{10^6},$$
so $\|\mathbf{z}\|_\infty \le 10^3\max\{1, c^{-3}\}/n^{1/2}$. This completes the proof of Property 5.
Finally, we prove Property 6. Let $\hat{G} = (V(\hat{G}), E(\hat{G}))$ be the graph resulting from
deleting $u$ from $G$ and cloning $v$, namely,
$$V(\hat{G}) = \{v'\} \cup \left[V(G)\setminus\{u\}\right] \quad \text{and} \quad E(\hat{G}) = E(G - u) \cup \left\{\{v', w\} \mid \{v, w\} \in E(G)\right\}.$$
Let $\hat{A}$ be the adjacency matrix of $\hat{G}$, and $\hat{\mathbf{x}}$ and $\hat{\mathbf{z}}$ equal
$$\hat{\mathbf{x}}_w = \begin{cases} \mathbf{x}_w & w \ne v' \\ \mathbf{x}_v & w = v' \end{cases} \quad \text{and} \quad \hat{\mathbf{z}}_w = \begin{cases} \mathbf{z}_w & w \ne v' \\ \mathbf{z}_v & w = v'. \end{cases}$$
Then
$$\hat{\mathbf{x}}^T\hat{\mathbf{x}} = 1 - \mathbf{x}_u^2 + \mathbf{x}_v^2, \qquad \hat{\mathbf{z}}^T\hat{\mathbf{z}} = 1 - \mathbf{z}_u^2 + \mathbf{z}_v^2,$$
and
$$\hat{\mathbf{x}}^T\hat{A}\hat{\mathbf{x}} = \lambda_1 - 2\lambda_1\mathbf{x}_u^2 + 2\lambda_1\mathbf{x}_v^2 - 2A_{u,v}\mathbf{x}_u\mathbf{x}_v,$$
$$\hat{\mathbf{z}}^T\hat{A}\hat{\mathbf{z}} = \lambda_n - 2\lambda_n\mathbf{z}_u^2 + 2\lambda_n\mathbf{z}_v^2 - 2A_{u,v}\mathbf{z}_u\mathbf{z}_v.$$
By the optimality of $G$,
$$\lambda_1(G) - c\,\lambda_n(G) - \left(\frac{\hat{\mathbf{x}}^T\hat{A}\hat{\mathbf{x}}}{\hat{\mathbf{x}}^T\hat{\mathbf{x}}} - c\,\frac{\hat{\mathbf{z}}^T\hat{A}\hat{\mathbf{z}}}{\hat{\mathbf{z}}^T\hat{\mathbf{z}}}\right) \ge 0.$$
Rearranging terms, we have
$$\frac{\lambda_1\mathbf{x}_u^2 - \lambda_1\mathbf{x}_v^2 + 2A_{u,v}\mathbf{x}_u\mathbf{x}_v}{1 - \mathbf{x}_u^2 + \mathbf{x}_v^2} - c\,\frac{\lambda_n\mathbf{z}_u^2 - \lambda_n\mathbf{z}_v^2 + 2A_{u,v}\mathbf{z}_u\mathbf{z}_v}{1 - \mathbf{z}_u^2 + \mathbf{z}_v^2} \ge 0,$$
and so
$$\left(\lambda_1\mathbf{x}_u^2 - c\,\lambda_n\mathbf{z}_u^2\right) - \left(\lambda_1\mathbf{x}_v^2 - c\,\lambda_n\mathbf{z}_v^2\right) \ge -\left[\frac{\lambda_1\left(\mathbf{x}_u^2 - \mathbf{x}_v^2\right)^2 + 2A_{u,v}\mathbf{x}_u\mathbf{x}_v}{1 - \mathbf{x}_u^2 + \mathbf{x}_v^2} - c\,\frac{\lambda_n\left(\mathbf{z}_u^2 - \mathbf{z}_v^2\right)^2 + 2A_{u,v}\mathbf{z}_u\mathbf{z}_v}{1 - \mathbf{z}_u^2 + \mathbf{z}_v^2}\right].$$
The infinity norms of $\mathbf{x}$ and $\mathbf{z}$ are at most $6/n^{1/2}$ and $10^3\max\{1, c^{-3}\}/n^{1/2}$, respectively,
and so we can upper bound
$$\left|\frac{\lambda_1\left(\mathbf{x}_u^2 - \mathbf{x}_v^2\right)^2 + 2A_{u,v}\mathbf{x}_u\mathbf{x}_v}{1 - \mathbf{x}_u^2 + \mathbf{x}_v^2}\right| \le \frac{4n\|\mathbf{x}\|_\infty^4 + 2\|\mathbf{x}\|_\infty^2}{1 - 2\|\mathbf{x}\|_\infty^2} \le \frac{4\cdot 6^4 + 2\cdot 6^2}{n/2},$$
and
$$\left|\frac{\lambda_n\left(\mathbf{z}_u^2 - \mathbf{z}_v^2\right)^2 + 2A_{u,v}\mathbf{z}_u\mathbf{z}_v}{1 - \mathbf{z}_u^2 + \mathbf{z}_v^2}\right| \le \frac{2n\|\mathbf{z}\|_\infty^4 + 2\|\mathbf{z}\|_\infty^2}{1 - 2\|\mathbf{z}\|_\infty^2} \le \frac{2(10^{12} + 10^6)\max\{1, c^{-12}\}}{n/2}$$
for $n \ge 4\cdot 10^6\max\{1, c^{-6}\}$. Our choice of $u$ and $v$ was arbitrary, and so
$$\left|\left(\lambda_1\mathbf{x}_u^2 - c\,\lambda_n\mathbf{z}_u^2\right) - \left(\lambda_1\mathbf{x}_v^2 - c\,\lambda_n\mathbf{z}_v^2\right)\right| \le 5\cdot 10^{12}\max\{c, c^{-12}\}/n$$
for all $u, v \in V$. Noting that $\sum_u \left(\lambda_1\mathbf{x}_u^2 - c\,\lambda_n\mathbf{z}_u^2\right) = \lambda_1 - c\,\lambda_n$ completes the proof.
The generality of this result implies that similar techniques to those used to prove the
spread conjecture could be used to understand the behavior of graphs that maximize ๐1โ๐ ๐๐
and how the graphs vary with the choice of ๐ โ (0,โ).
Chapter 4
Force-Directed Layouts
4.1 Introduction
Graph drawing is an area at the intersection of mathematics, computer science, and more
qualitative fields. Despite the extensive literature in the field, in many ways the concept
of what constitutes the optimal drawing of a graph is heuristic at best, and subjective at
worst. For a general review of the major areas of research in graph drawing, we refer the
reader to [7, 57]. In this chapter, we briefly introduce the broad class of force-directed
layouts of graphs and consider two prominent force-based techniques for drawing a graph in
low-dimensional Euclidean space: Tutteโs spring embedding and metric multidimensional
scaling.
The concept of a force-directed layout is somewhat ambiguous, but, loosely defined, it
is a technique for drawing a graph in a low-dimensional Euclidean space (usually dimension
โค 3) by applying โforcesโ between the set of vertices and/or edges. In a force-directed
layout, vertices connected by an edge (or at a small graph distance from each other) tend
to be close to each other in the resulting layout. Below we briefly introduce a number of
prominent force-directed layouts, including Tutteโs spring embedding, Eadesโ algorithm, the
Kamada-Kawai objective and algorithm, and the more recent UMAP algorithm.
In his 1963 work titled "How to Draw a Graph" (the paper was received in May 1962 but published in 1963; for this reason, in [124] the year is referred to as 1962), Tutte found an elegant technique
to produce planar embeddings of planar graphs that also minimize โenergyโ (the sum of
squared edge lengths) in some sense [116]. In particular, for a three-connected planar graph,
he showed that if the outer face of the graph is fixed as the complement of some convex
region in the plane, and every other point is located at the mass center of its neighbors, then
the resulting embedding is planar. This result is now known as Tutteโs spring embedding
theorem, and is considered by many to be the first example of a force-directed layout. One
of the major questions that this result does not treat is how to best embed the outer face. In
Section 4.2, we investigate this question, consider connections to a Schur complement, and
provide some theoretical results for this Schur complement using a discrete energy-only
trace theorem.
An algorithm for general graphs was later proposed by Eades in 1984, in which vertices
are placed in a random initial position in the plane, a logarithmic spring force is applied
to each edge and a repulsive inverse square-root law force is applied to each pair of non-
adjacent vertices [38]. The algorithm proceeds by iteratively moving each vertex towards
its local equilibrium. Five years later, Kamada and Kawai proposed an objective function
corresponding to the situation in which each pair of vertices are connected by a spring with
equilibrium length given by their graph distance and spring constant given by the inverse
square of their graph distance [55]. Their algorithm for locally minimizing this objective
consists of choosing vertices iteratively based on the magnitude of the associated derivative
of the objective, and locally minimizing with respect to that vertex. The associated objective
function is a specific example of metric multidimensional scaling, and both the objective
function and the proposed algorithm are quite popular in practice (see, for instance, popular
packages such as GRAPHVIZ [39] and the SMACOF package in R). In Section 4.3, we provide
a theoretical analysis of the Kamada-Kawai objective function. In particular, we prove a
number of structural results regarding optimal layouts, provide algorithmic lower bounds
for the optimization problem, and propose a polynomial time randomized approximation
scheme for drawing low diameter graphs.
Most recently, a new force-based layout algorithm called UMAP was proposed as an
alternative to the popular t-SNE algorithm [82]. The UMAP algorithm takes a data set as
input, constructs a weighted graph, and performs a fairly complex force-directed layout in
which attractive and repulsive forces governed by a number of hyper-parameters are applied.
For the exact details of this approach, we refer the reader to [82, Section 3.2]. Though
the UMAP algorithm is not specifically analyzed in this chapter, it is likely that some of the
techniques proposed in this chapter can be applied (given a much more involved analysis)
to more complex force-directed layout objectives, such as that of the UMAP algorithm.
In addition, the popularity of the UMAP algorithm illustrates the continued importance and
relevance of force-based techniques and their analysis.
In the remainder of this chapter, we investigate both Tutteโs spring embedding and the
Kamada-Kawai objective function. In Section 4.2, we formally describe Tutteโs spring
embedding and consider theoretical questions regarding the best choice of boundary. In
particular, we consider connections to a Schur complement and, using a discrete version of
a trace theorem from the theory of elliptic PDEs, provide a spectral equivalence result for
this matrix. In Section 4.3, we formally define the Kamada-Kawai objective, and prove a
number of theoretical results for this optimization problem. We lower bound the objective
value and upper bound the diameter of any optimal layout, prove that a gap version of the
optimization problem is NP-hard even for bounded diameter graph metrics, and provide a
polynomial time randomized approximation scheme for low diameter graphs.
4.2 Tutteโs Spring Embedding and a Trace Theorem
The question of how to โbestโ draw a graph in the plane is not an easy one to answer, largely
due to the ambiguity in the objective. When energy (i.e. Hallโs energy, the sum of squared
distances between adjacent vertices) minimization is desired, the optimal embedding in
the plane is given by the two-dimensional diffusion map induced by the eigenvectors of
the two smallest non-zero eigenvalues of the graph Laplacian [60, 61, 62]. This general
class of graph drawing techniques is referred to as spectral layouts. When drawing a planar
graph, often a planar embedding (a drawing in which edges do not intersect) is desirable.
However, spectral layouts of planar graphs are not guaranteed to be planar. When looking
at a triangulation of a given domain, it is commonplace for the near-boundary points of
the spectral layout to โgrowโ out of the boundary, or lack any resemblance to a planar
embedding. For instance, see the spectral layout of a random triangulation of a disk in
Figure 4-1.
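A minimal sketch of such a spectral layout (our own illustration, not code from the thesis): for a cycle, the embedding induced by the eigenvectors of the two smallest non-zero Laplacian eigenvalues places the vertices on a circle.

```python
import numpy as np

def spectral_layout(adj):
    """Embed a graph using eigenvectors of the two smallest non-zero
    Laplacian eigenvalues (the energy-minimizing spectral layout)."""
    L = np.diag(adj.sum(axis=1)) - adj
    ev, V = np.linalg.eigh(L)
    return V[:, 1:3]  # column 0 is the constant vector; take the next two

# cycle C_n: the spectral layout is a circle
n = 12
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
X = spectral_layout(adj)
radii = np.linalg.norm(X, axis=1)
assert np.allclose(radii, radii[0], atol=1e-8)
```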
Tutteโs spring embedding is an elegant technique to produce planar embeddings of planar
graphs that also minimize energy in some sense [116]. In particular, for a three-connected
planar graph, if the outer face of the graph is fixed as the complement of some convex region
in the plane, and every other point is located at the mass center of its neighbors, then the
resulting embedding is planar. This embedding minimizes Hallโs energy, conditional on the
embedding of the boundary face. This result is now known as Tutteโs spring embedding
theorem. While this result is well known (see [59], for example), it is not so obvious how
to embed the outer face. This, of course, should vary from case to case, depending on the
dynamics of the interior.
In this section, we investigate how to embed the boundary face so that the resulting
drawing is planar and Hallโs energy is minimized (subject to some normalization). In what
follows, we produce an algorithm with theoretical guarantees for a large class of three-
connected planar graphs. Our analysis begins by observing that the Schur complement of
the graph Laplacian with respect to the interior vertices is, in some sense, the correct matrix
to consider when choosing an optimal embedding of boundary vertices. See Figure 4-2
(a) Circle (b) Spectral Layout (c) Schur Complement Layout

Figure 4-1: A Delaunay triangulation of 1250 points randomly generated on the disk (A), its
non-planar spectral layout (B), and a planar layout using a spring embedding of the Schur
complement of the graph Laplacian with respect to the interior vertices (C).
for a visual example of a spring embedding using the two minimal non-trivial eigenvectors
of the Schur complement. In order to theoretically understand the behavior of the Schur
complement, we prove a discrete trace theorem. Trace theorems are a class of results in
the theory of partial differential equations relating norms on the domain to norms on the
boundary, which are used to provide a priori estimates on the Dirichlet integral of functions
with given data on the boundary. We construct a discrete version of a trace theorem in the
plane for energy-only semi-norms. Using a discrete trace theorem, we show that this Schur
complement is spectrally equivalent to the boundary Laplacian to the one-half power. This
spectral equivalence proves the existence of low energy (w.r.t. the Schur complement) convex
and planar embeddings of the boundary, but is also of independent interest and applicability
in the study of spectral properties of planar graphs. These theoretical guarantees give rise to
a simple graph drawing algorithm with provable guarantees.
The remainder of this section is as follows. In Subsection 4.2.1, we formally introduce
Tutteโs spring embedding theorem, describe the optimization problem under consideration,
illustrate the connection to a Schur complement, and, conditional on a spectral equivalence,
describe a simple algorithm with provable guarantees. In Subsection 4.2.2, we prove the
aforementioned spectral equivalence for a large class of three-connected planar graphs. In
particular, we consider trace theorems for Lipschitz domains from the theory of elliptic
(a) Laplacian Embedding (b) Schur Complement Embedding

Figure 4-2: A visual example of embeddings of the 2D finite element discretization graph
3elt, taken from the SuiteSparse Matrix Collection [27]. Figure (A) is the non-planar spectral
layout of this 2D mesh, and Figure (B) is a planar spring embedding of the mesh using the
minimal non-trivial eigenvectors of the Schur complement to embed the boundary.
partial differential equations, prove discrete energy-only variants of these results for a large
class of graphs in the plane, and show that the Schur complement with respect to the interior
is spectrally equivalent to the boundary Laplacian to the one-half power.
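A basic structural fact underlying this approach is that the Schur complement of a graph Laplacian with respect to a subset of vertices is again a (weighted) graph Laplacian. A small sketch (the grid graph and index choices are our own):

```python
import numpy as np

def laplacian(adj):
    return np.diag(adj.sum(axis=1)) - adj

def schur_onto_boundary(L, boundary):
    """Schur complement of the Laplacian with respect to the interior:
    L / interior = L_bb - L_bi L_ii^{-1} L_ib."""
    idx = np.arange(L.shape[0])
    interior = np.setdiff1d(idx, boundary)
    Lbb = L[np.ix_(boundary, boundary)]
    Lbi = L[np.ix_(boundary, interior)]
    Lii = L[np.ix_(interior, interior)]
    return Lbb - Lbi @ np.linalg.solve(Lii, Lbi.T)

# a 4x4 grid graph; take the 12 outer vertices as the boundary
k = 4
n = k * k
adj = np.zeros((n, n))
for r in range(k):
    for c in range(k):
        if c + 1 < k:
            adj[r*k+c, r*k+c+1] = adj[r*k+c+1, r*k+c] = 1.0
        if r + 1 < k:
            adj[r*k+c, (r+1)*k+c] = adj[(r+1)*k+c, r*k+c] = 1.0
boundary = np.array([i for i in range(n)
                     if i // k in (0, k-1) or i % k in (0, k-1)])
S = schur_onto_boundary(laplacian(adj), boundary)

# the Schur complement is again a graph Laplacian:
assert np.allclose(S @ np.ones(len(boundary)), 0, atol=1e-10)  # 1 in nullspace
off = S - np.diag(np.diag(S))
assert np.all(off <= 1e-10)                                    # off-diagonals <= 0
```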
Definitions and Notation
Let ๐บ = (๐,๐ธ), ๐ = 1, ..., ๐, ๐ธ โ ๐ โ ๐ | |๐| = 2, be a simple, connected, undirected
graph. A graph ๐บ is ๐-connected if it remains connected upon the removal of any ๐ โ 1
vertices, and is planar if it can be drawn in the plane such that no edges intersect (save for
adjacent edges at their mutual endpoint). A face of a planar embedding of a graph is a region
of the plane bounded by edges (including the outer infinite region, referred to as the outer
face). Let ๐ข๐ be the set of all ordered pairs (๐บ,ฮ), where ๐บ is a simple, undirected, planar,
three-connected graph of order ๐ > 4, and ฮ โ ๐ is the set of vertices of some face of
๐บ. Three-connectedness is an important property for planar graphs, which, by Steinitzโs
theorem, guarantees that the graph is the skeleton of a convex polyhedron [107]. This
characterization implies that, for three-connected graphs, the edges corresponding to each
face in a planar embedding are uniquely determined by the graph. In particular, the set of
faces is simply the set of induced cycles, so we may refer to faces of the graph without
specifying an embedding. One important corollary of this result is that, for ๐ โฅ 3, the
vertices of any face form an induced simple cycle.
Let ๐๐บ(๐) be the neighborhood of vertex ๐, ๐๐บ(๐) be the union of the neighborhoods
of the vertices in ๐, and ๐๐บ(๐, ๐) be the distance between vertices ๐ and ๐ in the graph
๐บ. When the associated graph is obvious, we may remove the subscript. Let ๐(๐) be the
degree of vertex ๐. Let ๐บ[๐] be the graph induced by the vertices ๐, and ๐๐(๐, ๐) be the
distance between vertices ๐ and ๐ in ๐บ[๐]. If ๐ป is a subgraph of ๐บ, we write ๐ป โ ๐บ. The
Cartesian product $G_1 \square G_2$ between $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ is the graph with
vertices $(v_1, v_2) \in V_1 \times V_2$ and edge $\{(u_1, u_2), (v_1, v_2)\} \in E$ if $(u_1, v_1) \in E_1$ and $u_2 = v_2$,
or $u_1 = v_1$ and $(u_2, v_2) \in E_2$. The graph Laplacian $L_G \in \mathbb{R}^{n \times n}$ of $G$ is the symmetric
positive semi-definite matrix defined by
$$\langle L_G x, x \rangle = \sum_{\{i,j\} \in E} (x_i - x_j)^2,$$
and, in general, a matrix is the graph Laplacian of some weighted graph if it is symmetric
diagonally dominant, has non-positive off-diagonal entries, and the vector $\mathbf{1} := (1, \dots, 1)^T$
lies in its nullspace. Given a matrix $A$, we denote the $i^{th}$ row by $A_{i,\cdot}$, the $j^{th}$ column by $A_{\cdot,j}$,
and the entry in the $i^{th}$ row and $j^{th}$ column by $A_{i,j}$.
4.2.1 Spring Embeddings and a Schur Complement
Here and in what follows, we refer to ฮ as the โboundaryโ of the graph ๐บ, ๐ โฮ as the
โinterior,โ and generally assume ๐ฮ := |ฮ| to be relatively large (typically ๐ฮ = ฮ(๐1/2)).
The concept of a โboundaryโ face is somewhat arbitrary, but, depending on the application
from which the graph originated (i.e., a discretization of some domain), one face is often
already designated as the boundary face. If a face has not been designated, choosing the
largest induced cycle is a reasonable choice. By producing a planar drawing of ๐บ in the
plane and traversing the embedding, one can easily find all the induced cycles of ๐บ in linear
time and space [20].
Without loss of generality, suppose that $\Gamma = \{n - n_\Gamma + 1, \dots, n\}$. A matrix $X \in \mathbb{R}^{n\times 2}$ is
said to be a planar embedding of $G$ if the drawing of $G$ using straight lines and with vertex $i$
located at coordinates $X_{i,\cdot}$ for all $i \in V$ is a planar drawing. A matrix $X_\Gamma \in \mathbb{R}^{n_\Gamma\times 2}$ is said
to be a convex embedding of $\Gamma$ if the embedding is planar and every point is an extreme
point of the convex hull $\mathrm{conv}\left(\{[X_\Gamma]_{i,\cdot}\}_{i=1}^{n_\Gamma}\right)$. Tutte's spring embedding theorem states that if
$X_\Gamma$ is a convex embedding of $\Gamma$, then the system of equations
$$X_{i,\cdot} = \begin{cases} \dfrac{1}{d(i)}\displaystyle\sum_{j\in N(i)} X_{j,\cdot} & i = 1, \dots, n - n_\Gamma \\[2ex] [X_\Gamma]_{i-(n-n_\Gamma),\cdot} & i = n - n_\Gamma + 1, \dots, n \end{cases}$$
has a unique solution $X$, and this solution is a planar embedding of $G$ [116].
We can write both the Laplacian and embedding of $G$ in block notation, differentiating between interior and boundary vertices, as follows:
\[
L_G = \begin{pmatrix} L_O + D_O & -A_{O,\Gamma} \\ -A_{O,\Gamma}^T & L_\Gamma + D_\Gamma \end{pmatrix} \in \mathbb{R}^{n \times n},
\qquad
X = \begin{pmatrix} X_O \\ X_\Gamma \end{pmatrix} \in \mathbb{R}^{n \times 2},
\]
where $L_O, D_O \in \mathbb{R}^{(n - n_\Gamma) \times (n - n_\Gamma)}$, $L_\Gamma, D_\Gamma \in \mathbb{R}^{n_\Gamma \times n_\Gamma}$, $A_{O,\Gamma} \in \mathbb{R}^{(n - n_\Gamma) \times n_\Gamma}$, $X_O \in \mathbb{R}^{(n - n_\Gamma) \times 2}$, $X_\Gamma \in \mathbb{R}^{n_\Gamma \times 2}$, and $L_O$ and $L_\Gamma$ are the Laplacians of $G[V \setminus \Gamma]$ and $G[\Gamma]$, respectively. Using block notation, the system of equations for the Tutte spring embedding of some convex embedding $X_\Gamma$ is given by
\[
X_O = (D_O + D[L_O])^{-1}\big[(D[L_O] - L_O) X_O + A_{O,\Gamma} X_\Gamma\big],
\]
where $D[A]$ is the diagonal matrix with diagonal entries given by the diagonal of $A$. The unique solution to this system is
\[
X_O = (L_O + D_O)^{-1} A_{O,\Gamma} X_\Gamma.
\]
We note that this choice of $X_O$ not only guarantees a planar embedding of $G$, but also minimizes Hall's energy, namely,
\[
\operatorname*{arg\,min}_{X_O} \mathcal{H}(X) = (L_O + D_O)^{-1} A_{O,\Gamma} X_\Gamma,
\]
where $\mathcal{H}(X) := \operatorname{Tr}(X^T L_G X)$ (see [62] for more on Hall's energy).
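Under the block conventions above, computing the spring embedding reduces to a single positive definite solve. A minimal dense sketch (the helper name and the $K_4$ instance are ours, chosen only for illustration):

```python
import numpy as np

def tutte_embedding(L, boundary, X_gamma):
    """Solve the Tutte system: X_O = (L_O + D_O)^{-1} A_{O,Gamma} X_Gamma.

    L is the full graph Laplacian, `boundary` indexes Gamma, and X_gamma is a
    convex (planar) embedding of the boundary cycle.
    """
    n = L.shape[0]
    interior = [i for i in range(n) if i not in set(boundary)]
    # (L_O + D_O) is the interior principal submatrix of L_G;
    # A_{O,Gamma} is minus the interior-boundary block.
    L_int = L[np.ix_(interior, interior)]
    A = -L[np.ix_(interior, boundary)]
    X = np.zeros((n, 2))
    X[boundary] = X_gamma
    X[interior] = np.linalg.solve(L_int, A @ X_gamma)
    return X

# K4: outer triangle {1, 2, 3}, one interior vertex 0 joined to all three.
L = np.array([[3., -1, -1, -1], [-1, 3, -1, -1], [-1, -1, 3, -1], [-1, -1, -1, 3]])
tri = np.array([[0.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
X = tutte_embedding(L, [1, 2, 3], tri)
# The interior vertex is the average of its neighbors: the triangle's centroid.
assert np.allclose(X[0], tri.mean(axis=0))
```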
While Tutte's theorem is a very powerful result, guaranteeing that, given a convex embedding of any face, the energy-minimizing embedding of the remaining vertices results in a planar embedding, it gives no direction as to how this outer face should be embedded. We consider embeddings of the boundary that minimize Hall's energy, subject to some normalization. In particular, we consider embeddings that satisfy $X_\Gamma^T X_\Gamma = I$ and $X_\Gamma^T \mathbf{1} = 0$, though other normalizations, such as $X^T X = I$ and $X^T \mathbf{1} = 0$, would be equally appropriate. The analysis that follows in this section can be readily applied to this alternate normalization, but it does require the additional step of verifying a norm equivalence between $V$ and $\Gamma$ for the harmonic extension of low energy vectors, which can be produced relatively easily for the class of graphs considered in Subsection 4.2.2. In addition, alternate normalization techniques, such as the introduction of negative weights on the boundary cycle, can also be considered. Let $\mathcal{X}$ be the set of all planar embeddings $X_\Gamma$ that satisfy $X_\Gamma^T X_\Gamma = I$ and $X_\Gamma^T \mathbf{1} = 0$, and for which the spring embedding $X$ with $X_O = (L_O + D_O)^{-1} A_{O,\Gamma} X_\Gamma$ is planar.
We consider the optimization problem
\[
\min \; \mathcal{H}(X) \quad \text{s.t.} \quad X_\Gamma \in \mathrm{cl}(\mathcal{X}), \tag{4.1}
\]
where $\mathrm{cl}(\cdot)$ denotes the closure of a set. The set $\mathcal{X}$ is not closed, and so the minimizer of (4.1) may be non-planar, but it must be arbitrarily close to a planar embedding. The normalizations $X_\Gamma^T \mathbf{1} = 0$ and $X_\Gamma^T X_\Gamma = I$ ensure that the solution does not degenerate into a single point or line. In what follows, we are primarily concerned with approximately solving this optimization problem and connecting it to the Schur complement of $L_G$ with respect to $V \setminus \Gamma$. It is unclear whether there exists an efficient algorithm to solve (4.1) exactly, or whether the associated decision problem is NP-hard. If (4.1) is NP-hard, it seems rather difficult to verify that this is indeed the case.
A Schur Complement
Given some choice of $X_\Gamma$, by Tutte's theorem the minimum value of $\mathcal{H}(X)$ is attained when $X_O = (L_O + D_O)^{-1} A_{O,\Gamma} X_\Gamma$, and is given by
\[
\operatorname{Tr}\left[
\begin{pmatrix} \big[(L_O + D_O)^{-1} A_{O,\Gamma} X_\Gamma\big]^T & X_\Gamma^T \end{pmatrix}
\begin{pmatrix} L_O + D_O & -A_{O,\Gamma} \\ -A_{O,\Gamma}^T & L_\Gamma + D_\Gamma \end{pmatrix}
\begin{pmatrix} (L_O + D_O)^{-1} A_{O,\Gamma} X_\Gamma \\ X_\Gamma \end{pmatrix}
\right]
= \operatorname{Tr}\Big( X_\Gamma^T \big[ L_\Gamma + D_\Gamma - A_{O,\Gamma}^T (L_O + D_O)^{-1} A_{O,\Gamma} \big] X_\Gamma \Big)
= \operatorname{Tr}\big( X_\Gamma^T S_\Gamma X_\Gamma \big),
\]
where $S_\Gamma$ is the Schur complement of $L_G$ with respect to $V \setminus \Gamma$,
\[
S_\Gamma = L_\Gamma + D_\Gamma - A_{O,\Gamma}^T (L_O + D_O)^{-1} A_{O,\Gamma}.
\]
For this reason, we can instead consider the optimization problem
\[
\min \; \mathcal{H}_\Gamma(X_\Gamma) \quad \text{s.t.} \quad X_\Gamma \in \mathrm{cl}(\mathcal{X}), \tag{4.2}
\]
where
\[
\mathcal{H}_\Gamma(X_\Gamma) := \operatorname{Tr}\big( X_\Gamma^T S_\Gamma X_\Gamma \big).
\]
Therefore, if the two eigenvectors corresponding to the two minimal non-trivial eigenvalues of $S_\Gamma$ produce a planar spring embedding, then this is the exact solution of (4.2). However, a priori, there is no reason to think that this drawing would be planar. It turns out that, for a large class of graphs with some macroscopic structure, this is often the case (see, for example, Figures 4-1 and 4-2), and, in the rare instance that it is not, through a spectral equivalence result, a low energy planar and convex embedding of the boundary always exists. This fact follows from the spectral equivalence of $S_\Gamma$ and $L_\Gamma^{1/2}$, which is shown in Subsection 4.2.2.

First, we present a number of basic properties of the Schur complement of a graph Laplacian. For more information on the Schur complement, we refer the reader to [18, 40, 134].
Proposition 8. Let $G = (V, E)$, $n = |V|$, be a graph and $L_G \in \mathbb{R}^{n \times n}$ the associated graph Laplacian. Let $L_G$ and vectors $v \in \mathbb{R}^n$ be written in block form
\[
L_G = \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix}, \qquad v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix},
\]
where $L_{22} \in \mathbb{R}^{m \times m}$, $v_2 \in \mathbb{R}^m$, and $L_{12} \neq 0$. Then

(1) $S = L_{22} - L_{21} L_{11}^{-1} L_{12}$ is a graph Laplacian,

(2) $\sum_{i=1}^{m} (e_i^T L_{22} \mathbf{1}_m)\, e_i e_i^T - L_{21} L_{11}^{-1} L_{12}$ is a graph Laplacian,

(3) $\langle S w, w \rangle = \inf\{ \langle L_G v, v \rangle \mid v_2 = w \}$.
Proof. Let $P = \begin{pmatrix} -L_{11}^{-1} L_{12} \\ I \end{pmatrix} \in \mathbb{R}^{n \times m}$. Then
\[
P^T L_G P = \begin{pmatrix} -L_{21} L_{11}^{-1} & I \end{pmatrix}
\begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} -L_{11}^{-1} L_{12} \\ I \end{pmatrix}
= L_{22} - L_{21} L_{11}^{-1} L_{12} = S.
\]
Because $L_{11} \mathbf{1}_{n-m} + L_{12} \mathbf{1}_m = 0$, we have $\mathbf{1}_{n-m} = -L_{11}^{-1} L_{12} \mathbf{1}_m$. Therefore $P \mathbf{1}_m = \mathbf{1}_n$, and, as a result,
\[
S \mathbf{1}_m = P^T L_G P \mathbf{1}_m = P^T L_G \mathbf{1}_n = P^T 0 = 0.
\]
In addition,
\[
\left[ \sum_{i=1}^{m} (e_i^T L_{22} \mathbf{1}_m)\, e_i e_i^T - L_{21} L_{11}^{-1} L_{12} \right] \mathbf{1}_m
= \left[ \sum_{i=1}^{m} (e_i^T L_{22} \mathbf{1}_m)\, e_i e_i^T - L_{22} \right] \mathbf{1}_m + S \mathbf{1}_m
= \sum_{i=1}^{m} (e_i^T L_{22} \mathbf{1}_m)\, e_i - L_{22} \mathbf{1}_m
= \left[ \sum_{i=1}^{m} e_i e_i^T - I_m \right] L_{22} \mathbf{1}_m = 0.
\]
$L_{11}$ is an M-matrix, so $L_{11}^{-1}$ is a non-negative matrix. $L_{21} L_{11}^{-1} L_{12} = (-L_{21}) L_{11}^{-1} (-L_{12})$ is the product of three non-negative matrices, and so must also be non-negative. Therefore, the off-diagonal entries of $S$ and $\sum_{i=1}^{m} (e_i^T L_{22} \mathbf{1}_m)\, e_i e_i^T - L_{21} L_{11}^{-1} L_{12}$ are non-positive, and so both are graph Laplacians.

Consider
\[
\langle L_G v, v \rangle = \langle L_{11} v_1, v_1 \rangle + 2 \langle L_{12} v_2, v_1 \rangle + \langle L_{22} v_2, v_2 \rangle,
\]
with $v_2$ fixed. Because $L_{11}$ is symmetric positive definite, the minimum occurs when
\[
\frac{d}{d v_1} \langle L_G v, v \rangle = 2 L_{11} v_1 + 2 L_{12} v_2 = 0.
\]
Setting $v_1 = -L_{11}^{-1} L_{12} v_2$, the desired result follows.
The above result illustrates that the Schur complement Laplacian $S_\Gamma$ is the sum of two Laplacians $L_\Gamma$ and $D_\Gamma - A_{O,\Gamma}^T (L_O + D_O)^{-1} A_{O,\Gamma}$, where the first is the Laplacian of $G[\Gamma]$, and the second is a Laplacian representing the dynamics of the interior.
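Properties (1) and (3) of Proposition 8 can be sanity checked numerically; the sketch below uses an arbitrary random weighted complete graph and is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 7, 3
W = np.triu(rng.random((n, n)), 1)
W = W + W.T                                   # weighted complete graph: connected
L = np.diag(W.sum(axis=1)) - W

L11, L12 = L[: n - m, : n - m], L[: n - m, n - m:]
L21, L22 = L[n - m:, : n - m], L[n - m:, n - m:]
S = L22 - L21 @ np.linalg.solve(L11, L12)

# (1) S is a graph Laplacian: symmetric, non-positive off-diagonals, S 1 = 0.
assert np.allclose(S, S.T)
assert (S - np.diag(np.diag(S)) <= 1e-12).all()
assert np.allclose(S @ np.ones(m), 0)

# (3) <S w, w> equals the minimal energy <L v, v> over v with v_2 = w,
#     attained at v_1 = -L11^{-1} L12 w.
w = rng.standard_normal(m)
v = np.concatenate([-np.linalg.solve(L11, L12 @ w), w])
assert np.isclose(w @ S @ w, v @ L @ v)
```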
An Algorithm
Given the above analysis for $S_\Gamma$, and assuming a spectral equivalence result of the form
\[
\frac{1}{c_1} \langle L_\Gamma^{1/2} x, x \rangle \le \langle S_\Gamma x, x \rangle \le c_2 \langle L_\Gamma^{1/2} x, x \rangle \tag{4.3}
\]
for all $x \in \mathbb{R}^{n_\Gamma}$ and some constants $c_1$ and $c_2$ that are not too large, we can construct a simple algorithm for producing a spring embedding. The constants $c_1$ and $c_2$ can be computed explicitly using the results of Subsection 4.2.2, and are reasonably sized if the graph resembles a discretization of some planar domain (i.e., a graph that satisfies the conditions of Theorem 15). Given these two facts, a very natural algorithm consists of the following steps. First, we compute the two eigenvectors corresponding to the two minimal non-trivial eigenvalues of the Schur complement $S_\Gamma$. If these two eigenvectors produce a planar embedding, we are done, and have solved the optimization problem (4.2) exactly. Depending on the structure of the graph, this may often be the case. For instance, this occurs for the graphs in Figures 4-1 and 4-2, and the numerical results of [124] confirm that this also occurs often for discretizations of circular and rectangular domains. If this is not the case, then we consider
Algorithm 3 A Schur Complement-Based Spring Embedding Algorithm
function SchurSPRING($G, \Gamma$)
    Compute the two minimal non-trivial eigenpairs $X_{\mathrm{old}} \in \mathbb{R}^{n_\Gamma \times 2}$ of $S_\Gamma$
    if spring embedding of $X_{\mathrm{old}}$ is non-planar then
        $X_{\mathrm{new}} \leftarrow \big\{ \sqrt{2/n_\Gamma}\, \big( \cos \tfrac{2\pi i}{n_\Gamma},\, \sin \tfrac{2\pi i}{n_\Gamma} \big) \big\}_{i=1}^{n_\Gamma}$; $\mathrm{gap} \leftarrow 1$
        while spring embedding of $X_{\mathrm{new}}$ is planar and $\mathrm{gap} > 0$ do
            $Y \leftarrow \mathrm{smooth}(X_{\mathrm{new}})$; $X_{\mathrm{old}} \leftarrow X_{\mathrm{new}}$
            $Y \leftarrow Y - \mathbf{1}\mathbf{1}^T Y / n_\Gamma$
            Solve $[Y^T Y]\, Q = Q \Lambda$, $Q$ orthogonal, $\Lambda$ diagonal; set $X_{\mathrm{new}} \leftarrow Y Q \Lambda^{-1/2}$
            $\mathrm{gap} \leftarrow \mathcal{H}_\Gamma(X_{\mathrm{old}}) - \mathcal{H}_\Gamma(X_{\mathrm{new}})$
        end while
    end if
    Return $X_{\mathrm{old}}$
end function
the boundary embedding
\[
[X_C]_{i,\cdot} = \sqrt{\frac{2}{n_\Gamma}} \left( \cos \frac{2\pi i}{n_\Gamma},\; \sin \frac{2\pi i}{n_\Gamma} \right), \qquad i = 1, \ldots, n_\Gamma,
\]
namely, the embedding given by the two minimal non-trivial eigenvectors of $L_\Gamma^{1/2}$. This choice of boundary is convex and planar, and so the associated spring embedding is planar. By spectral equivalence,
\[
\mathcal{H}_\Gamma(X_C) \le 4 c_2 \sin \frac{\pi}{n_\Gamma} \le c_1 c_2 \min_{X_\Gamma \in \mathrm{cl}(\mathcal{X})} \mathcal{H}_\Gamma(X_\Gamma),
\]
and therefore this algorithm already produces a $c_1 c_2$ approximation guarantee for (4.2). The choice of $X_C$ can often be improved, and a natural technique consists of smoothing $X_C$ using $S_\Gamma$ and renormalizing until either the resulting drawing is no longer planar or the objective value no longer decreases. We describe this procedure in Algorithm 3.
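The centering and renormalization step inside the loop of Algorithm 3 can be sketched as follows (a minimal sketch under our reading of the pseudocode, in which $Y^T Y = Q \Lambda Q^T$ and $X_{\mathrm{new}} = Y Q \Lambda^{-1/2}$; the `smooth` step itself is omitted here):

```python
import numpy as np

def renormalize(Y):
    """Center the columns of Y and restore the normalization X^T X = I,
    X^T 1 = 0, as in the inner loop of Algorithm 3."""
    n = Y.shape[0]
    Y = Y - Y.mean(axis=0)               # Y <- Y - 1 1^T Y / n_Gamma
    lam, Q = np.linalg.eigh(Y.T @ Y)     # Y^T Y = Q diag(lam) Q^T
    return Y @ Q @ np.diag(lam ** -0.5)  # X_new <- Y Q Lambda^{-1/2}

rng = np.random.default_rng(1)
X = renormalize(rng.standard_normal((40, 2)))
assert np.allclose(X.T @ X, np.eye(2))       # X^T X = I
assert np.allclose(X.T @ np.ones(40), 0)     # X^T 1 = 0
```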
We now discuss some of the finer details of the SchurSPRING($G, \Gamma$) algorithm (Algorithm 3). Determining whether a drawing is planar can be done in near-linear time using the sweep line algorithm [102]. Also, in practice, it is advisable to replace conditions of the form $\mathrm{gap} > 0$ in Algorithm 3 by $\mathrm{gap} > \mathrm{tol}$ for some small value of tol, in order to ensure that the algorithm terminates after some finite number of steps. For a reasonable choice of smoother, a constant tolerance, and $n_\Gamma = \Theta(n^{1/2})$, the SchurSPRING($G, \Gamma$) algorithm has complexity near-linear in $n$. The main cost of this procedure is due to the computations that involve $S_\Gamma$.
The SchurSPRING($G, \Gamma$) algorithm requires the repeated application of $S_\Gamma$ or $S_\Gamma^{-1}$ in order to compute the minimal eigenvectors of $S_\Gamma$ and also to perform smoothing. The Schur complement $S_\Gamma$ is a dense matrix whose explicit computation requires the inversion of an $(n - n_\Gamma) \times (n - n_\Gamma)$ matrix, but it can be represented as the composition of functions of sparse matrices. In practice, $S_\Gamma$ should never be formed explicitly. Rather, the operation of applying $S_\Gamma$ to a vector $x$ should occur in two steps. First, the sparse Laplacian system $(L_O + D_O)\, y = A_{O,\Gamma}\, x$ should be solved for $y$, and then the product is given by $S_\Gamma x = (L_\Gamma + D_\Gamma) x - A_{O,\Gamma}^T y$. Each application of $S_\Gamma$ is therefore a nearly-linear time procedure (using a nearly-linear time Laplacian solver). The application of the inverse $S_\Gamma^{-1}$, defined on the subspace $\{ x \mid \langle x, \mathbf{1} \rangle = 0 \}$, also requires the solution of a Laplacian system. As noted in [130], the action of $S_\Gamma^{-1}$ on a vector $x \in \{ x \mid \langle x, \mathbf{1} \rangle = 0 \}$ is given by
\[
S_\Gamma^{-1} x = \begin{pmatrix} 0 & I \end{pmatrix}
\begin{pmatrix} L_O + D_O & -A_{O,\Gamma} \\ -A_{O,\Gamma}^T & L_\Gamma + D_\Gamma \end{pmatrix}^{-1}
\begin{pmatrix} 0 \\ x \end{pmatrix},
\]
as verified by the computation
\[
S_\Gamma \big[ S_\Gamma^{-1} x \big]
= S_\Gamma \begin{pmatrix} 0 & I \end{pmatrix}
\left[
\begin{pmatrix} I & 0 \\ -A_{O,\Gamma}^T (L_O + D_O)^{-1} & I \end{pmatrix}
\begin{pmatrix} L_O + D_O & -A_{O,\Gamma} \\ 0 & S_\Gamma \end{pmatrix}
\right]^{-1}
\begin{pmatrix} 0 \\ x \end{pmatrix}
= S_\Gamma \begin{pmatrix} 0 & I \end{pmatrix}
\begin{pmatrix} L_O + D_O & -A_{O,\Gamma} \\ 0 & S_\Gamma \end{pmatrix}^{-1}
\begin{pmatrix} I & 0 \\ A_{O,\Gamma}^T (L_O + D_O)^{-1} & I \end{pmatrix}
\begin{pmatrix} 0 \\ x \end{pmatrix}
= S_\Gamma \begin{pmatrix} 0 & I \end{pmatrix}
\begin{pmatrix} L_O + D_O & -A_{O,\Gamma} \\ 0 & S_\Gamma \end{pmatrix}^{-1}
\begin{pmatrix} 0 \\ x \end{pmatrix}
= x.
\]
Given that the application of $S_\Gamma^{-1}$ has the same complexity as an application of $S_\Gamma$, the inverse power method is naturally preferred over the shifted power method for both smoothing and the computation of low energy eigenvectors.
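The two-step, matrix-free application of $S_\Gamma$ described above can be sketched as follows; a dense solve stands in for the nearly-linear time Laplacian solver, and the example graph is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_gamma = 12, 4
W = np.triu((rng.random((n, n)) < 0.4) * rng.random((n, n)), 1)
W = W + W.T
for i in range(n):                      # add a Hamiltonian cycle so G is connected
    j = (i + 1) % n
    W[i, j] += 1.0
    W[j, i] += 1.0
L = np.diag(W.sum(axis=1)) - W

O = slice(0, n - n_gamma)               # interior block
B = slice(n - n_gamma, n)               # boundary block
L_int = L[O, O]                         # L_O + D_O
A = -L[O, B]                            # A_{O,Gamma}
L_bnd = L[B, B]                         # L_Gamma + D_Gamma

def apply_schur(x):
    y = np.linalg.solve(L_int, A @ x)   # (L_O + D_O) y = A_{O,Gamma} x
    return L_bnd @ x - A.T @ y          # S_Gamma x = (L_Gamma + D_Gamma) x - A^T y

x = rng.standard_normal(n_gamma)
S = L_bnd - A.T @ np.linalg.solve(L_int, A)   # explicit S_Gamma, for comparison only
assert np.allclose(apply_schur(x), S @ x)
```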
4.2.2 A Discrete Trace Theorem and Spectral Equivalence
The main result of this subsection takes classical trace theorems from the theory of partial differential equations and extends them to a class of planar graphs. However, for our purposes, we require a stronger form of trace theorem, one between energy semi-norms (i.e., with no $\ell^2$ term), which we refer to as an ``energy-only'' trace theorem. These energy-only trace theorems imply their classical variants with $\ell^2$ terms almost immediately. We then use these new results to prove the spectral equivalence of $S_\Gamma$ and $L_\Gamma^{1/2}$ for the class of graphs under consideration. This class of graphs is rigorously defined below, but includes planar three-connected graphs that have some regular structure (such as graphs of finite element discretizations). For the sake of space and readability, a number of the results of this subsection are stated for arbitrary constants, or constants larger than necessary. More complicated proofs with explicit constants (or, in some cases, improved constants) can be found in [124]. We begin by formally describing a classical trace theorem.
Let $\Omega \subset \mathbb{R}^d$ be a domain whose boundary $\Gamma = \partial\Omega$ is, locally, the graph of a Lipschitz function. $H^1(\Omega)$ is the Sobolev space of square integrable functions with square integrable weak gradient, with norm
\[
\|u\|_{1,\Omega}^2 = \|\nabla u\|_{L^2(\Omega)}^2 + \|u\|_{L^2(\Omega)}^2, \quad \text{where} \quad \|u\|_{L^2(\Omega)}^2 = \int_\Omega u^2 \, dx.
\]
Let
\[
\|f\|_{1/2,\Gamma}^2 = \|f\|_{L^2(\Gamma)}^2 + \int\!\!\int_{\Gamma \times \Gamma} \frac{(f(x) - f(y))^2}{|x - y|^d} \, dx \, dy
\]
for functions defined on $\Gamma$, and denote by $H^{1/2}(\Gamma)$ the Sobolev space of functions defined on the boundary $\Gamma$ for which $\| \cdot \|_{1/2,\Gamma}$ is finite. The trace theorem for functions in $H^1(\Omega)$ is one of the most important and frequently used trace theorems in the theory of partial differential equations. More general results for traces on boundaries of Lipschitz domains, which involve $L^p$ norms and fractional derivatives, are due to E. Gagliardo [42] (see also [24]). Gagliardo's theorem, when applied to the case of $H^1(\Omega)$ and $H^{1/2}(\Gamma)$, states that if $\Omega \subset \mathbb{R}^d$ is a Lipschitz domain, then the norm equivalence
\[
\|f\|_{1/2,\Gamma} \asymp \inf \{ \|u\|_{1,\Omega} \mid u|_\Gamma = f \}
\]
holds (the right hand side is indeed a norm on $H^{1/2}(\Gamma)$). These results are key tools in proving a priori estimates on the Dirichlet integral of functions with given data on the boundary of a domain $\Omega$. Roughly speaking, a trace theorem gives a bound on the energy of a harmonic function via the norm of the trace of the function on $\Gamma = \partial\Omega$. In addition to the classical references given above, further details on trace theorems and their role in the analysis of PDEs (including the case of Lipschitz domains) can be found in [77, 87]. There are several analogues of this theorem for finite element spaces (finite dimensional subspaces of $H^1(\Omega)$). For instance, in [86] it is shown that the finite element discretization of the Laplace-Beltrami operator on the boundary, taken to the one-half power, provides a norm which is equivalent to the $H^{1/2}(\Gamma)$-norm. Here we prove energy-only analogues of the classical trace
theorem for graphs $(G, \Gamma) \in \mathcal{G}_N$, using the energy semi-norms
\[
|u|_G^2 = \langle L_G u, u \rangle \qquad \text{and} \qquad |f|_\Gamma^2 = \sum_{\substack{i,j \in \Gamma \\ i < j}} \frac{(f(i) - f(j))^2}{d_G^2(i, j)}.
\]
The energy semi-norm $| \cdot |_G$ is a discrete analogue of $\|\nabla u\|_{L^2(\Omega)}$, and the boundary semi-norm $| \cdot |_\Gamma$ is a discrete analogue of the quantity $\int\!\!\int_{\Gamma \times \Gamma} \frac{(f(x) - f(y))^2}{|x - y|^2} \, dx \, dy$. In addition, by connectivity, $| \cdot |_G$ and $| \cdot |_\Gamma$ are norms on the quotient space orthogonal to $\mathbf{1}$. We aim to prove that, for any $f \in \mathbb{R}^{n_\Gamma}$,
\[
\frac{1}{c_1}\, |f|_\Gamma \le \min_{u|_\Gamma = f} |u|_G \le c_2\, |f|_\Gamma
\]
for some constants $c_1, c_2$ that do not depend on $n_\Gamma$, $N$. We begin by proving these results for a simple class of graphs, and then extend our analysis to more general graphs. Some of the proofs of the results below are rather technical, and are omitted for the sake of readability and space.
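Both discrete semi-norms can be computed directly from their definitions. A small sketch (the helper names and the example graph are ours; the graph distance is computed by breadth-first search):

```python
import numpy as np
from itertools import combinations

def energy_seminorm(L, u):
    """|u|_G = <L_G u, u>^{1/2}."""
    return np.sqrt(u @ L @ u)

def bfs_dist(adj, s):
    """Graph distances from s by breadth-first search."""
    d = {s: 0}
    frontier = [s]
    while frontier:
        nxt = []
        for v in frontier:
            for w in adj[v]:
                if w not in d:
                    d[w] = d[v] + 1
                    nxt.append(w)
        frontier = nxt
    return d

def boundary_seminorm(adj, gamma, f):
    """|f|_Gamma^2 = sum over pairs i < j in Gamma of (f_i - f_j)^2 / d_G(i, j)^2."""
    total = 0.0
    for a, b in combinations(range(len(gamma)), 2):
        d = bfs_dist(adj, gamma[a])[gamma[b]]
        total += (f[a] - f[b]) ** 2 / d ** 2
    return np.sqrt(total)

# A 6-cycle, with Gamma taken to be all of V.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
L = np.zeros((6, 6))
for v, nbrs in adj.items():
    L[v, v] = len(nbrs)
    for w in nbrs:
        L[v, w] = -1.0

u = np.arange(6.0)
assert np.isclose(energy_seminorm(L, u) ** 2, 30.0)     # edge diffs 1,1,1,1,1,5
assert energy_seminorm(L, np.ones(6)) < 1e-12           # both vanish on constants
assert boundary_seminorm(adj, list(range(6)), np.ones(6)) < 1e-12
```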
Trace Theorems for a Simple Class of Graphs
Let $G_{n,\ell} = C_n \,\square\, P_\ell$ be the Cartesian product of the $n$ vertex cycle $C_n$ and the $\ell$ vertex path $P_\ell$, where $4\ell < n < 2k\ell$ for some constant $k \in \mathbb{N}$. The lower bound $4\ell < n$ is arbitrary in some sense, but is natural, given that the ratio of boundary length to in-radius of a convex region is at least $2\pi$. Vertex $(i, j)$ in $G_{n,\ell}$ corresponds to the product of $i \in C_n$ and $j \in P_\ell$, $i = 1, \ldots, n$, $j = 1, \ldots, \ell$. The boundary of $G_{n,\ell}$ is defined to be $\Gamma = \{(i, 1)\}_{i=1}^{n}$. Let $u \in \mathbb{R}^{n \times \ell}$ and $f \in \mathbb{R}^n$ be functions on $G_{n,\ell}$ and $\Gamma$, respectively, with $u[(i, j)]$ denoted by $u_{i,j}$ and $f[(i, 1)]$ denoted by $f_i$. For the remainder of the section, we consider the natural periodic extension of the vertices $(i, j)$ and the functions $u_{i,j}$ and $f_i$ to the indices $i \in \mathbb{Z}$. In particular, if $i \notin \{1, \ldots, n\}$, then $(i, j) := (i^*, j)$, $f_i := f_{i^*}$, and $u_{i,j} := u_{i^*,j}$, where $i^* \in \{1, \ldots, n\}$ and $i^* \equiv i \bmod n$. Let $G^*_{n,\ell}$ be the graph resulting from adding to $G_{n,\ell}$ all edges of the form $\{(i, j), (i - 1, j + 1)\}$ and $\{(i, j), (i + 1, j + 1)\}$, $i = 1, \ldots, n$, $j = 1, \ldots, \ell - 1$. We provide a visual example of $G_{n,\ell}$ and $G^*_{n,\ell}$ in Figure 4-3. First, we prove a trace theorem for $G_{n,\ell}$. The proof consists of two lemmas. Lemma 10 shows that the discrete trace operator is bounded, and Lemma 11 shows that it has a continuous right inverse. Taken together, these lemmas imply our desired result for $G_{n,\ell}$. We then extend this result to all graphs $H$ satisfying $G_{n,\ell} \subset H \subset G^*_{n,\ell}$.
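The graphs $G_{n,\ell}$ and $G^*_{n,\ell}$ are easy to generate and sanity check by their edge counts; a sketch (with zero-based vertex indices, unlike the text):

```python
# Edge sets of G_{n,l} = C_n x P_l and G*_{n,l}, with vertices (i, j),
# i in {0, ..., n-1} cyclic and j in {0, ..., l-1}.
def cycle_path_edges(n, l, diagonals=False):
    edges = set()
    for j in range(l):
        for i in range(n):
            edges.add(frozenset({(i, j), ((i + 1) % n, j)}))      # cycle edges
            if j + 1 < l:
                edges.add(frozenset({(i, j), (i, j + 1)}))        # path edges
                if diagonals:                                     # G* diagonals
                    edges.add(frozenset({(i, j), ((i - 1) % n, j + 1)}))
                    edges.add(frozenset({(i, j), ((i + 1) % n, j + 1)}))
    return edges

n, l = 16, 3                                       # the instance of Figure 4-3
G = cycle_path_edges(n, l)
G_star = cycle_path_edges(n, l, diagonals=True)
assert len(G) == n * l + n * (l - 1)               # n*l cycle + n*(l-1) path edges
assert len(G_star) == len(G) + 2 * n * (l - 1)     # two diagonal families added
assert G <= G_star                                 # G_{n,l} is a subgraph of G*_{n,l}
```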
Lemma 10. Let $G = G_{n,\ell}$, $4\ell < n < 2k\ell$, $k \in \mathbb{N}$, with boundary $\Gamma = \{(i, 1)\}_{i=1}^{n}$. For any $u \in \mathbb{R}^{n \times \ell}$, the vector $f = u|_\Gamma$ satisfies $|f|_\Gamma \le 4\sqrt{k}\, |u|_G$.
Proof. We can decompose $f_{i+h} - f_i$ into a sum of differences, given by
\[
f_{i+h} - f_i = \sum_{j=1}^{s-1} u_{i+h,j} - u_{i+h,j+1} + \sum_{q=1}^{h} u_{i+q,s} - u_{i+q-1,s} + \sum_{j=1}^{s-1} u_{i,s-j+1} - u_{i,s-j},
\]
where $s = \lceil \sqrt{h}\, \rceil$ is a function of $h$. By Cauchy-Schwarz,
\[
\sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \left( \frac{f_{i+h} - f_i}{h} \right)^2
\le 3 \sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \left( \frac{1}{h} \sum_{j=1}^{s-1} u_{i+h,j} - u_{i+h,j+1} \right)^2
+ 3 \sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \left( \frac{1}{h} \sum_{q=1}^{h} u_{i+q,s} - u_{i+q-1,s} \right)^2
+ 3 \sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \left( \frac{1}{h} \sum_{j=1}^{s-1} u_{i,s-j+1} - u_{i,s-j} \right)^2 .
\]
We bound the first and the second term separately. The third term is identical to the first. Using Hardy's inequality [50, Theorem 326], we can bound the first term by
\[
\sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \left( \frac{1}{h} \sum_{j=1}^{s-1} u_{i,j} - u_{i,j+1} \right)^2
= \sum_{i=1}^{n} \sum_{s=1}^{\ell} \left( \frac{1}{s} \sum_{j=1}^{s-1} u_{i,j} - u_{i,j+1} \right)^2 \sum_{\substack{h : \lceil \sqrt{h}\, \rceil = s \\ 1 \le h \le \lfloor n/2 \rfloor}} \frac{s^2}{h^2}
\le 4 \sum_{i=1}^{n} \sum_{s=1}^{\ell - 1} \big( u_{i,s} - u_{i,s+1} \big)^2 \sum_{\substack{h : \lceil \sqrt{h}\, \rceil = s \\ 1 \le h \le \lfloor n/2 \rfloor}} \frac{s^2}{h^2} .
\]
We have
\[
\sum_{\substack{h : \lceil \sqrt{h}\, \rceil = s \\ 1 \le h \le \lfloor n/2 \rfloor}} \frac{s^2}{h^2}
\le s^2 \sum_{q = s(s-1)+1}^{s^2} \frac{1}{q^2}
\le \frac{s^2 (s-1)}{(s(s-1)+1)^2}
\le \frac{4(s-1)}{(s+1)^2}
\le \frac{1}{2}
\]
for $s \ge 2$ ($k \ge 3$, by definition), and for $s = 1$,
\[
\sum_{\substack{h : \lceil \sqrt{h}\, \rceil = 1 \\ 1 \le h \le \lfloor n/2 \rfloor}} \frac{1}{h^2} \le \sum_{q=1}^{\infty} \frac{1}{q^2} = \frac{\pi^2}{6} .
\]
(a) $G_{16,3} = C_{16} \,\square\, P_3$ \qquad (b) $G^*_{16,3}$

Figure 4-3: A visual example of $G_{n,\ell}$ and $G^*_{n,\ell}$ for $n = 16$, $\ell = 3$. The boundary $\Gamma$ is given by the outer (or, by symmetry, inner) cycle.
Therefore, we can bound the first term by
\[
\sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \left( \frac{1}{h} \sum_{j=1}^{s-1} u_{i,j} - u_{i,j+1} \right)^2
\le \frac{2 \pi^2}{3} \sum_{i=1}^{n} \sum_{s=1}^{\ell - 1} \big( u_{i,s} - u_{i,s+1} \big)^2 .
\]
For the second term, we have
\[
\sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \left( \frac{1}{h} \sum_{q=1}^{h} u_{i+q,s} - u_{i+q-1,s} \right)^2
\le \sum_{i=1}^{n} \sum_{h=1}^{\lfloor n/2 \rfloor} \frac{1}{h} \sum_{q=1}^{h} \big( u_{i+q,s} - u_{i+q-1,s} \big)^2
\le k \sum_{i=1}^{n} \sum_{s=1}^{\ell} \big( u_{i+1,s} - u_{i,s} \big)^2 .
\]
Combining these bounds produces the result $|f|_\Gamma^2 \le \max\{ 4\pi^2, 3k \}\, |u|_G^2$. Noting that $k \ge 3$ gives the desired result.
In order to show that the discrete trace operator has a continuous right inverse, we need to produce a provably low-energy extension of an arbitrary function on $\Gamma$. Let
\[
\bar f = \frac{1}{n} \sum_{i=1}^{n} f_i \qquad \text{and} \qquad f_{i,j} = \frac{1}{2j - 1} \sum_{h=1-j}^{j-1} f_{i+h} .
\]
We consider the extension
\[
u_{i,j} = \frac{j - 1}{\ell - 1}\, \bar f + \left( 1 - \frac{j - 1}{\ell - 1} \right) f_{i,j} . \tag{4.4}
\]
The proof of the following inverse result for the discrete trace operator is similar in technique to that of Lemma 10, but significantly more involved.

Lemma 11. Let $G = G_{n,\ell}$, $4\ell < n < 2k\ell$, $k \in \mathbb{N}$, with boundary $\Gamma = \{(i, 1)\}_{i=1}^{n}$. For any $f \in \mathbb{R}^n$, the vector $u$ defined by (4.4) satisfies $|u|_G \le 5\sqrt{k}\, |f|_\Gamma$.
Proof. We can decompose $|u|_G^2$ into two parts, namely,
\[
|u|_G^2 = \sum_{i=1}^{n} \sum_{j=1}^{\ell} (u_{i+1,j} - u_{i,j})^2 + \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} (u_{i,j+1} - u_{i,j})^2 .
\]
We bound each sum separately, beginning with the first. We have
\[
u_{i+1,j} - u_{i,j} = \left( 1 - \frac{j - 1}{\ell - 1} \right) (f_{i+1,j} - f_{i,j}) = \left( 1 - \frac{j - 1}{\ell - 1} \right) \frac{f_{i+j} - f_{i+1-j}}{2j - 1} .
\]
Squaring both sides and noting that $4\ell < n$, we have
\[
\sum_{i=1}^{n} \sum_{j=1}^{\ell} (u_{i+1,j} - u_{i,j})^2
\le \sum_{i=1}^{n} \sum_{j=1}^{\ell} \left[ \frac{f_{i+j} - f_{i+1-j}}{2j - 1} \right]^2
\le \sum_{i=1}^{n} \sum_{h=1}^{2\ell - 1} \left[ \frac{f_{i+h} - f_i}{h} \right]^2
\le |f|_\Gamma^2 .
\]
We now consider the second sum. Each term can be decomposed as
\[
u_{i,j+1} - u_{i,j} = \frac{\bar f - f_{i,j}}{\ell - 1} + \left( 1 - \frac{j}{\ell - 1} \right) \big[ f_{i,j+1} - f_{i,j} \big],
\]
which leads to the upper bound
\[
\sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} (u_{i,j+1} - u_{i,j})^2
\le 2 \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \left[ \frac{\bar f - f_{i,j}}{\ell - 1} \right]^2
+ 2 \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \big( f_{i,j+1} - f_{i,j} \big)^2 .
\]
We estimate the two terms in the previous equation separately, beginning with the first. The
difference $\bar f - f_{i,j}$ can be written as
\[
\bar f - f_{i,j} = \frac{1}{n} \sum_{p=1}^{n} f_p - \frac{1}{2j - 1} \sum_{h=1-j}^{j-1} f_{i+h}
= \frac{1}{n (2j - 1)} \sum_{p=1}^{n} \sum_{h=1-j}^{j-1} f_p - f_{i+h} .
\]
Squaring both sides,
\[
\big( \bar f - f_{i,j} \big)^2 = \frac{1}{n^2 (2j - 1)^2} \left( \sum_{p=1}^{n} \sum_{h=1-j}^{j-1} f_p - f_{i+h} \right)^2
\le \frac{1}{n (2j - 1)} \sum_{p=1}^{n} \sum_{h=1-j}^{j-1} (f_p - f_{i+h})^2 .
\]
Summing over all $i$ and $j$ gives
\[
\sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \left[ \frac{\bar f - f_{i,j}}{\ell - 1} \right]^2
\le \frac{1}{(\ell - 1)^2} \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \frac{1}{n (2j - 1)} \sum_{p=1}^{n} \sum_{h=1-j}^{j-1} (f_p - f_{i+h})^2
= \frac{n}{4 (\ell - 1)^2} \sum_{j=1}^{\ell - 1} \frac{1}{2j - 1} \sum_{h=1-j}^{j-1} \sum_{p, i = 1}^{n} \frac{(f_p - f_{i+h})^2}{n^2 / 4}
\le \frac{n}{4 (\ell - 1)}\, |f|_\Gamma^2 \le k\, |f|_\Gamma^2 .
\]
This completes the analysis of the first term. For the second term, we have
\[
f_{i,j+1} - f_{i,j} = \frac{1}{2j + 1} \left[ f_{i+j} + f_{i-j} - \frac{2}{2j - 1} \sum_{h=1-j}^{j-1} f_{i+h} \right].
\]
Next, we note that
\[
\left| f_{i+j} - \frac{f_i}{2j - 1} - \frac{2}{2j - 1} \sum_{h=1}^{j-1} f_{i+h} \right|
= \left| \frac{f_{i+j} - f_i}{2j - 1} + 2 \sum_{h=1}^{j-1} \frac{f_{i+j} - f_{i+h}}{2j - 1} \right|
\le 2 \sum_{h=0}^{j-1} \frac{|f_{i+j} - f_{i+h}|}{2j - 1},
\]
and, similarly,
\[
\left| f_{i-j} - \frac{f_i}{2j - 1} - \frac{2}{2j - 1} \sum_{h=1}^{j-1} f_{i-h} \right|
= \left| \frac{f_{i-j} - f_i}{2j - 1} + 2 \sum_{h=1}^{j-1} \frac{f_{i-j} - f_{i-h}}{2j - 1} \right|
\le 2 \sum_{h=0}^{j-1} \frac{|f_{i-j} - f_{i-h}|}{2j - 1} .
\]
Hence,
\[
\sum_{j=1}^{\ell - 1} \big( f_{i,j+1} - f_{i,j} \big)^2
\le \sum_{j=1}^{\ell - 1} \frac{8}{(2j + 1)^2} \left[ \left( \sum_{h=0}^{j-1} \frac{|f_{i+j} - f_{i+h}|}{2j - 1} \right)^2 + \left( \sum_{h=0}^{j-1} \frac{|f_{i-j} - f_{i-h}|}{2j - 1} \right)^2 \right].
\]
Once we sum over all $i$, the sums of the first and second terms are identical, and therefore
\[
\sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \big( f_{i,j+1} - f_{i,j} \big)^2
\le 16 \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \left( \sum_{h=0}^{j-1} \frac{|f_{i+j} - f_{i+h}|}{(2j - 1)(2j + 1)} \right)^2 .
\]
We have
\[
\sum_{h=0}^{j-1} \frac{|f_{i+j} - f_{i+h}|}{(2j - 1)(2j + 1)}
\le \frac{1}{3j} \sum_{q=i}^{i+j-1} \frac{|f_{i+j} - f_q|}{j}
\le \frac{1}{3j} \sum_{q=i}^{i+j-1} \frac{|f_{i+j} - f_q|}{i + j - q},
\]
which implies that
\[
16 \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \left( \sum_{h=0}^{j-1} \frac{|f_{i+j} - f_{i+h}|}{(2j - 1)(2j + 1)} \right)^2
\le \frac{16}{9} \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \left( \frac{1}{j} \sum_{q=i}^{i+j-1} \frac{|f_{i+j} - f_q|}{i + j - q} \right)^2
\le \frac{16}{9} \sum_{b=1}^{n+\ell-1} \sum_{j=1}^{\ell - 1} \left( \frac{1}{j} \sum_{q=b-j}^{b-1} \frac{|f_b - f_q|}{b - q} \right)^2,
\]
where $b = i + j$. Letting $t = b - q$ and using Hardy's inequality [50, Theorem 326], we obtain
\[
\frac{16}{9} \sum_{b=1}^{n+\ell-1} \sum_{j=1}^{\ell - 1} \left( \frac{1}{j} \sum_{q=b-j}^{b-1} \frac{|f_b - f_q|}{b - q} \right)^2
= \frac{16}{9} \sum_{b=1}^{n+\ell-1} \sum_{j=1}^{\ell - 1} \left( \frac{1}{j} \sum_{t=1}^{j} \frac{|f_b - f_{b-t}|}{t} \right)^2
\le \frac{64}{9} \sum_{b=1}^{n+\ell-1} \sum_{j=1}^{\ell - 1} \left[ \frac{f_b - f_{b-j}}{j} \right]^2
\le \frac{64}{9} \sum_{b=1}^{n+\ell-1} \sum_{j=1}^{\ell - 1} \left[ \frac{f_b - f_{b-j}}{d_G\big( (b, 1), (b - j, 1) \big)} \right]^2
\le \frac{256}{9}\, |f|_\Gamma^2,
\]
where we note that the sum over the indices $b$ and $j$ involves some amount of over-counting, with some terms $(f(p) - f(p - j))^2$ appearing up to four times. Improved estimates can be obtained by noting that the choice of indexing for $\Gamma$ is arbitrary (see [124] for details), but, for the sake of readability, we focus on simplicity over tight constants. Combining our estimates and noting that $k \ge 3$, we obtain the desired result
\[
|u|_G \le \sqrt{1 + 2 \left( k + \frac{256}{9} \right)}\, |f|_\Gamma = \sqrt{2k + \frac{521}{9}}\, |f|_\Gamma \le 5 \sqrt{k}\, |f|_\Gamma .
\]
Combining Lemmas 10 and 11, we obtain our desired trace theorem.

Theorem 12. Let $G = G_{n,\ell}$, $4\ell < n < 2k\ell$, $k \in \mathbb{N}$, with boundary $\Gamma = \{(i, 1)\}_{i=1}^{n}$. For any $f \in \mathbb{R}^n$,
\[
\frac{1}{4\sqrt{k}}\, |f|_\Gamma \le \min_{u|_\Gamma = f} |u|_G \le 5\sqrt{k}\, |f|_\Gamma .
\]
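Theorem 12 can be checked numerically on a small instance: the minimal extension energy equals $\langle S_\Gamma f, f \rangle^{1/2}$ by Proposition 8, and the boundary semi-norm uses cyclic distances along $\Gamma$. A sketch under the stated parameter constraints (the instance, seed, and dense solve are ours):

```python
import numpy as np

n, l = 20, 4                      # 4l < n < 2kl holds with k = 3
k = 3
idx = lambda i, j: i + n * j      # flatten (i, j); j = 0 is the boundary row

N = n * l
L = np.zeros((N, N))
def add_edge(a, b):
    L[a, a] += 1; L[b, b] += 1; L[a, b] -= 1; L[b, a] -= 1
for j in range(l):
    for i in range(n):
        add_edge(idx(i, j), idx((i + 1) % n, j))      # cycle edges
        if j + 1 < l:
            add_edge(idx(i, j), idx(i, j + 1))        # path edges

bnd = [idx(i, 0) for i in range(n)]
intr = [v for v in range(N) if v >= n]
rng = np.random.default_rng(3)
f = rng.standard_normal(n)

# Minimal extension energy: <S f, f>^{1/2} with S the Schur complement onto Gamma.
L11 = L[np.ix_(intr, intr)]
L12 = L[np.ix_(intr, bnd)]
L22 = L[np.ix_(bnd, bnd)]
S = L22 - L12.T @ np.linalg.solve(L11, L12)
min_energy = np.sqrt(f @ S @ f)

# Boundary semi-norm |f|_Gamma, with d_G between boundary vertices the cycle distance.
fg2 = sum((f[i] - f[p]) ** 2 / min(p - i, n - (p - i)) ** 2
          for i in range(n) for p in range(i + 1, n))
fg = np.sqrt(fg2)

assert fg / (4 * np.sqrt(k)) <= min_energy <= 5 * np.sqrt(k) * fg
```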
With a little more work, we can prove a similar result for a slightly more general class of graphs. Using Theorem 12, we can almost immediately prove a trace theorem for any graph $H$ satisfying $G_{n,\ell} \subset H \subset G^*_{n,\ell}$. In fact, Lemma 10 carries over immediately. In order to prove a new version of Lemma 11, it suffices to bound the energy of $u$ on the edges in $G^*_{n,\ell}$ not contained in $G_{n,\ell}$. By Cauchy-Schwarz,
\[
|u|_{G^*}^2 = |u|_G^2 + \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \Big[ (u_{i,j+1} - u_{i-1,j})^2 + (u_{i,j+1} - u_{i+1,j})^2 \Big]
\le 3 \sum_{i=1}^{n} \sum_{j=1}^{\ell} (u_{i+1,j} - u_{i,j})^2 + 2 \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} (u_{i,j+1} - u_{i,j})^2 \le 3\, |u|_G^2,
\]
and therefore Corollary 1 follows immediately from Lemmas 10 and 11.
Corollary 1. Let $H$ satisfy $G_{n,\ell} \subset H \subset G^*_{n,\ell}$, $4\ell < n < 2k\ell$, $k \in \mathbb{N}$, with boundary $\Gamma = \{(i, 1)\}_{i=1}^{n}$. For any $f \in \mathbb{R}^n$,
\[
\frac{1}{4\sqrt{k}}\, |f|_\Gamma \le \min_{u|_\Gamma = f} |u|_H \le 5\sqrt{3k}\, |f|_\Gamma .
\]
Trace Theorems for General Graphs
In order to extend Corollary 1 to more general graphs, we introduce a graph operation which is similar in concept to an aggregation (a partition of $V$ into connected subsets) in which the size of the aggregates is bounded. In particular, we give the following definition.

Definition 2. The graph $H$, $G_{n,\ell} \subset H \subset G^*_{n,\ell}$, is said to be an $m$-aggregation of $(G, \Gamma) \in \mathcal{G}_N$ if there exists a partition $V = V^* \cup \{V_{i,j}\}_{i=1,\ldots,n}^{j=1,\ldots,\ell}$ of $V(G)$ satisfying

1. $G[V_{i,j}]$ is connected and $|V_{i,j}| \le m$ for all $i = 1, \ldots, n$, $j = 1, \ldots, \ell$,

2. $\Gamma \subset \bigcup_{i=1}^{n} V_{i,1}$, and $\Gamma \cap V_{i,1} \neq \emptyset$ for all $i = 1, \ldots, n$,

3. $N_G(V^*) \subset V^* \cup \bigcup_{i=1}^{n} V_{i,\ell}$,

4. the aggregation graph of $V \setminus V^*$, given by
\[
\Big( V \setminus V^*, \; \big\{ \{ V_{i_1,j_1}, V_{i_2,j_2} \} \;\big|\; N_G(V_{i_1,j_1}) \cap V_{i_2,j_2} \neq \emptyset \big\} \Big),
\]
is isomorphic to $H$.
(a) graph $(G, \Gamma)$ \qquad (b) partition $V$ \qquad (c) $G_{6,2} \subset H \subset G^*_{6,2}$

Figure 4-4: An example of an $m$-aggregation. Figure (A) provides a visual representation of a graph $G$, with boundary vertices $\Gamma$ enlarged. Figure (B) shows a partition $V$ of $G$, in which each aggregate (enclosed by dotted lines) has order at most four. The set $V^*$ is denoted by a shaded region. Figure (C) shows the aggregation graph $H$ of $V \setminus V^*$. The graph $H$ satisfies $G_{6,2} \subset H \subset G^*_{6,2}$, and is therefore a 4-aggregation of $(G, \Gamma)$.
We provide a visual example in Figure 4-4, and later show that this operation applies to a fairly large class of graphs. However, the $m$-aggregation procedure is not the only operation for which we can control the behavior of the energy and boundary semi-norms. For instance, the behavior of our semi-norms under the deletion of some number of edges can be bounded easily if each deleted edge can be replaced by a path of constant length, and no remaining edge is contained in more than a constant number of such paths. In addition, the behavior of these semi-norms under the disaggregation of large degree vertices is also relatively well-behaved (see [52] for details). Here, we focus on the $m$-aggregation procedure. We give the following result regarding graphs $(G, \Gamma)$ for which some $H$, $G_{n,\ell} \subset H \subset G^*_{n,\ell}$, is an $m$-aggregation of $(G, \Gamma)$, but note that a large number of minor refinements are possible, such as the two briefly mentioned above.

Theorem 13. If $H$, $G_{n,\ell} \subset H \subset G^*_{n,\ell}$, $4\ell < n < 2k\ell$, $k \in \mathbb{N}$, is an $m$-aggregation of $(G, \Gamma) \in \mathcal{G}_N$, then for any $f \in \mathbb{R}^{n_\Gamma}$,
\[
\frac{1}{C_1}\, |f|_\Gamma \le \min_{u|_\Gamma = f} |u|_G \le C_2\, |f|_\Gamma
\]
for some fixed constants $C_1, C_2 > 0$ that depend only on $k$ and $m$.
Proof. We first prove that there is an extension $u$ of $f$ which satisfies $|u|_G \le C_2 |f|_\Gamma$ for some $C_2 > 0$ that depends only on $k$ and $m$. To do so, we define auxiliary functions $\hat u$ and $\hat f$ on $(G^*_{2n,\ell}, \Gamma_{2n,\ell})$. Let
\[
\hat f(i) = \begin{cases} \max_{v \in \Gamma \cap V_{(i+1)/2,\,1}} f(v) & \text{if } i \text{ is odd}, \\[4pt] \min_{v \in \Gamma \cap V_{i/2,\,1}} f(v) & \text{if } i \text{ is even}, \end{cases}
\]
and let $\hat u$ be the extension (4.4) of $\hat f$. The idea is to upper bound the semi-norm for $u$ by $\hat u$, for $\hat u$ by $\hat f$ (using Corollary 1), and for $\hat f$ by $f$. On each aggregate $V_{i,j}$, let $u$ take values between $\hat u(2i - 1, j)$ and $\hat u(2i, j)$, and let $u$ equal $\bar{\hat f}$ on $V^*$. We can decompose $|u|_G^2$ into
\[
|u|_G^2 = \sum_{i=1}^{n} \sum_{j=1}^{\ell} \sum_{\substack{p, q \in V_{i,j} \\ p \sim q}} (u(p) - u(q))^2
+ \sum_{i=1}^{n} \sum_{j=1}^{\ell} \sum_{\substack{p \in V_{i,j},\; q \in V_{i+1,j} \\ p \sim q}} (u(p) - u(q))^2
+ \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \sum_{\substack{p \in V_{i,j},\; q \in V_{i-1,j+1} \\ p \sim q}} (u(p) - u(q))^2
+ \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \sum_{\substack{p \in V_{i,j},\; q \in V_{i+1,j+1} \\ p \sim q}} (u(p) - u(q))^2
+ \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \sum_{\substack{p \in V_{i,j},\; q \in V_{i,j+1} \\ p \sim q}} (u(p) - u(q))^2,
\]
and bound each term of $|u|_G^2$ separately, beginning with the first. The maximum energy semi-norm of an $m$ vertex graph that takes values in the range $[a, b]$ is bounded above by $(m/2)^2 (a - b)^2$. Therefore,
\[
\sum_{\substack{p, q \in V_{i,j} \\ p \sim q}} (u(p) - u(q))^2 \le \frac{m^2}{4} \big( \hat u(2i - 1, j) - \hat u(2i, j) \big)^2 .
\]
For the second term,
\[
\sum_{\substack{p \in V_{i,j},\; q \in V_{i+1,j} \\ p \sim q}} (u(p) - u(q))^2
\le m^2 \max_{\substack{i_1 \in \{2i-1,\, 2i\} \\ i_2 \in \{2i+1,\, 2i+2\}}} \big( \hat u(i_1, j) - \hat u(i_2, j) \big)^2
\le 3 m^2 \Big[ \big( \hat u(2i-1, j) - \hat u(2i, j) \big)^2 + \big( \hat u(2i, j) - \hat u(2i+1, j) \big)^2 + \big( \hat u(2i+1, j) - \hat u(2i+2, j) \big)^2 \Big].
\]
The exact same type of bound holds for the third and fourth terms. For the fifth term,
\[
\sum_{\substack{p \in V_{i,j},\; q \in V_{i,j+1} \\ p \sim q}} (u(p) - u(q))^2
\le m^2 \max_{\substack{i_1 \in \{2i-1,\, 2i\} \\ i_2 \in \{2i-1,\, 2i\}}} \big( \hat u(i_1, j) - \hat u(i_2, j+1) \big)^2,
\]
and, unlike terms two, three, and four, this maximum appears in $|\hat u|_{G^*_{2n,\ell}}^2$. Combining these three estimates, we obtain the upper bound $|u|_G^2 \le \big( 6 \times 3 + \tfrac{1}{4} \big) m^2\, |\hat u|_{G^*_{2n,\ell}}^2 = \tfrac{73}{4}\, m^2\, |\hat u|_{G^*_{2n,\ell}}^2$.

Next, we lower bound $|f|_\Gamma$ by a constant times $|\hat f|_{\Gamma_{2n,\ell}}$. By definition, in $\Gamma \cap V_{i,1}$ there is a vertex that takes value $\hat f(2i-1)$ and a vertex that takes value $\hat f(2i)$. This implies that every term in $|\hat f|_{\Gamma_{2n,\ell}}$ is a term in $|f|_\Gamma$, with a possibly different denominator. Distances between vertices on $\Gamma$ can be decreased by at most a factor of $2m$ on $\Gamma_{2n,\ell}$. In addition, it may be the case that an aggregate contains only one vertex of $\Gamma$, which results in $\hat f(2i-1) = \hat f(2i)$. Therefore, a given term in $|f|_\Gamma^2$ could appear four times in $|\hat f|_{\Gamma_{2n,\ell}}^2$. Combining these two facts, we immediately obtain the bound $|\hat f|_{\Gamma_{2n,\ell}}^2 \le 16 m^2\, |f|_\Gamma^2$. Combining these two estimates with Corollary 1 completes the first half of the proof.
All that remains is to show that, for any $u$, $|f|_\Gamma \le C_1 |u|_G$ for some $C_1 > 0$ that depends only on $k$ and $m$. To do so, we define auxiliary functions $\check u$ and $\check f$ on $(G_{2n,2\ell}, \Gamma_{2n,2\ell})$. Let
\[
\check u(i, j) = \begin{cases} \max_{p \in V_{\lceil i/2 \rceil, \lceil j/2 \rceil}} u(p) & \text{if } i \equiv j \bmod 2, \\[4pt] \min_{p \in V_{\lceil i/2 \rceil, \lceil j/2 \rceil}} u(p) & \text{if } i \not\equiv j \bmod 2. \end{cases}
\]
Here, the idea is to lower bound the semi-norm for $u$ by $\check u$, for $\check u$ by $\check f$ (using Corollary 1), and for $\check f$ by $f$. We can decompose $|\check u|_{G_{2n,2\ell}}^2$ into
\[
|\check u|_{G_{2n,2\ell}}^2 = 4 \sum_{i=1}^{n} \sum_{j=1}^{\ell} \big( \check u(2i-1, 2j-1) - \check u(2i, 2j-1) \big)^2
+ \sum_{i=1}^{n} \sum_{j=1}^{\ell} \big( \check u(2i, 2j-1) - \check u(2i+1, 2j-1) \big)^2 + \big( \check u(2i, 2j) - \check u(2i+1, 2j) \big)^2
+ \sum_{i=1}^{n} \sum_{j=1}^{\ell - 1} \big( \check u(2i-1, 2j) - \check u(2i-1, 2j+1) \big)^2 + \big( \check u(2i, 2j) - \check u(2i, 2j+1) \big)^2 .
\]
The minimum squared energy semi-norm of an $m$ vertex graph that takes value $a$ at some vertex and value $b$ at some vertex is bounded below by $(a - b)^2 / m$. Therefore,
\[
\big( \check u(2i-1, 2j-1) - \check u(2i, 2j-1) \big)^2 \le m \sum_{\substack{p, q \in V_{i,j} \\ p \sim q}} (u(p) - u(q))^2,
\]
and
\[
\max_{\substack{i_1 \in \{2i-1,\, 2i\} \\ i_2 \in \{2i+1,\, 2i+2\}}} \big( \check u(i_1, 2j) - \check u(i_2, 2j) \big)^2 \le 2m \sum_{\substack{p, q \in V_{i,j} \cup V_{i+1,j} \\ p \sim q}} (u(p) - u(q))^2 .
\]
Repeating the above estimate for each term results in the desired bound.
Next, we upper bound $|f|_\Gamma$ by a constant multiple of $|\check f|_{\Gamma_{2n,2\ell}}$. We can write $|f|_\Gamma^2$ as
\[
|f|_\Gamma^2 = \sum_{i=1}^{n} \sum_{p, q \in \Gamma \cap V_{i,1}} \frac{(f(p) - f(q))^2}{d_G^2(p, q)}
+ \sum_{i_1=1}^{n-1} \sum_{i_2=i_1+1}^{n} \sum_{\substack{p \in \Gamma \cap V_{i_1,1} \\ q \in \Gamma \cap V_{i_2,1}}} \frac{(f(p) - f(q))^2}{d_G^2(p, q)},
\]
and bound each term separately. The first term is bounded by
\[
\sum_{p, q \in \Gamma \cap V_{i,1}} \frac{(f(p) - f(q))^2}{d_G^2(p, q)} \le \frac{m^2}{4} \big( \check f(2i-1) - \check f(2i) \big)^2 .
\]
For the second term, we first note that $d_G(p, q) \ge \frac{1}{3} d_{\Gamma_{2n,2\ell}}\big( (\iota_1, 1), (\iota_2, 1) \big)$ for $p \in \Gamma \cap V_{i_1,1}$, $q \in \Gamma \cap V_{i_2,1}$, $\iota_1 \in \{2i_1 - 1, 2i_1\}$, $\iota_2 \in \{2i_2 - 1, 2i_2\}$, which allows us to bound the second term by
\[
\sum_{\substack{p \in \Gamma \cap V_{i_1,1} \\ q \in \Gamma \cap V_{i_2,1}}} \frac{(f(p) - f(q))^2}{d_G^2(p, q)}
\le 9 m^2 \max_{\substack{\iota_1 \in \{2i_1-1,\, 2i_1\} \\ \iota_2 \in \{2i_2-1,\, 2i_2\}}} \frac{\big( \check f(\iota_1) - \check f(\iota_2) \big)^2}{d_{\Gamma_{2n,2\ell}}^2(\iota_1, \iota_2)} .
\]
Combining the lower bound for $|\check u|$ in terms of $|u|$, the upper bound for $|f|_\Gamma$ in terms of $|\check f|_{\Gamma_{2n,2\ell}}$, and Corollary 1 completes the second half of the proof.
The proof of Theorem 13 also immediately implies a similar result. Let $\widetilde{L} \in \mathbb{R}^{n_\Gamma \times n_\Gamma}$ be the Laplacian of the complete graph on $\Gamma$ with weights $w(i, j) = d_\Gamma^{-2}(i, j)$. The same proof implies the following.

Corollary 2. If $H$, $G_{n,\ell} \subset H \subset G^*_{n,\ell}$, $4\ell < n < 2k\ell$, $k \in \mathbb{N}$, is an $m$-aggregation of $(G, \Gamma) \in \mathcal{G}_N$, then for any $f \in \mathbb{R}^{n_\Gamma}$,
\[
\frac{1}{C_1} \langle \widetilde{L} f, f \rangle^{1/2} \le \min_{u|_\Gamma = f} |u|_G \le C_2 \langle \widetilde{L} f, f \rangle^{1/2},
\]
for some fixed constants $C_1, C_2 > 0$ that depend only on $k$ and $m$.
Spectral Equivalence of $S_\Gamma$ and $L_\Gamma^{1/2}$

By Corollary 2, and the property $\langle f, S_\Gamma f \rangle = \min_{u|_\Gamma = f} |u|_G^2$ (see Proposition 8), in order to prove spectral equivalence between $S_\Gamma$ and $L_\Gamma^{1/2}$, it suffices to show that $L_\Gamma^{1/2}$ and $\widetilde{L}$ are spectrally equivalent. This can be done relatively easily, and leads to a proof of the main result of the section.

Theorem 14. If $H$, $G_{n,\ell} \subset H \subset G^*_{n,\ell}$, $4\ell < n < 2k\ell$, $k \in \mathbb{N}$, is an $m$-aggregation of $(G, \Gamma) \in \mathcal{G}_N$, then for any $f \in \mathbb{R}^{n_\Gamma}$,
\[
\frac{1}{C_1} \langle L_\Gamma^{1/2} f, f \rangle \le \langle S_\Gamma f, f \rangle \le C_2 \langle L_\Gamma^{1/2} f, f \rangle
\]
for some fixed constants $C_1, C_2 > 0$ that depend only on $k$ and $m$.
Proof. Let $d(i, j) = \min\{ i - j \bmod n_\Gamma,\; j - i \bmod n_\Gamma \}$. $G[\Gamma]$ is a cycle, so $\widetilde{L}(i, j) = -d(i, j)^{-2}$ for $i \neq j$. The spectral decomposition of $L_\Gamma$ is well known, namely,
\[
L_\Gamma = \sum_{m=1}^{\lfloor n_\Gamma / 2 \rfloor} \lambda_m(L_\Gamma) \left[ \frac{x_m x_m^T}{\|x_m\|^2} + \frac{y_m y_m^T}{\|y_m\|^2} \right],
\]
where $\lambda_m(L_\Gamma) = 2 - 2 \cos \frac{2\pi m}{n_\Gamma}$ and $x_m(j) = \sin \frac{2\pi m j}{n_\Gamma}$, $y_m(j) = \cos \frac{2\pi m j}{n_\Gamma}$, $j = 1, \ldots, n_\Gamma$. If $n_\Gamma$ is odd, then $\lambda_{(n_\Gamma - 1)/2}$ has multiplicity two, but if $n_\Gamma$ is even, then $\lambda_{n_\Gamma / 2}$ has only multiplicity one, as $x_{n_\Gamma / 2} = 0$. If $m \neq n_\Gamma / 2$, we have
\[
\|x_m\|^2 = \sum_{j=1}^{n_\Gamma} \sin^2 \left( \frac{2\pi m j}{n_\Gamma} \right)
= \frac{n_\Gamma}{2} - \frac{1}{2} \sum_{j=1}^{n_\Gamma} \cos \left( \frac{4\pi m j}{n_\Gamma} \right)
= \frac{n_\Gamma}{2} - \frac{1}{4} \left[ \frac{\sin\big( 2\pi m \big( 2 + \frac{1}{n_\Gamma} \big) \big)}{\sin \frac{2\pi m}{n_\Gamma}} - 1 \right]
= \frac{n_\Gamma}{2},
\]
and so $\|y_m\|^2 = \frac{n_\Gamma}{2}$ as well. If $m = n_\Gamma / 2$, then $\|y_m\|^2 = n_\Gamma$. If $n_\Gamma$ is odd,
\[
L_\Gamma^{1/2}(i, j) = \frac{2\sqrt{2}}{n_\Gamma} \sum_{m=1}^{\frac{n_\Gamma - 1}{2}} \left[ 1 - \cos \frac{2\pi m}{n_\Gamma} \right]^{1/2} \left[ \sin \frac{2\pi m i}{n_\Gamma} \sin \frac{2\pi m j}{n_\Gamma} + \cos \frac{2\pi m i}{n_\Gamma} \cos \frac{2\pi m j}{n_\Gamma} \right]
= \frac{4}{n_\Gamma} \sum_{m=1}^{\frac{n_\Gamma - 1}{2}} \sin \left( \frac{m}{2} \frac{2\pi}{n_\Gamma} \right) \cos \left( d(i, j)\, m\, \frac{2\pi}{n_\Gamma} \right)
= \frac{2}{n_\Gamma} \sum_{m=0}^{n_\Gamma} \sin \left( \frac{m}{2} \frac{2\pi}{n_\Gamma} \right) \cos \left( d(i, j)\, m\, \frac{2\pi}{n_\Gamma} \right),
\]
and if $n_\Gamma$ is even,
\[
L_\Gamma^{1/2}(i, j) = \frac{2}{n_\Gamma} (-1)^{i+j} + \frac{4}{n_\Gamma} \sum_{m=1}^{\frac{n_\Gamma}{2} - 1} \sin \left( \frac{m}{2} \frac{2\pi}{n_\Gamma} \right) \cos \left( d(i, j)\, m\, \frac{2\pi}{n_\Gamma} \right)
= \frac{2}{n_\Gamma} \sum_{m=0}^{n_\Gamma} \sin \left( \frac{m}{2} \frac{2\pi}{n_\Gamma} \right) \cos \left( d(i, j)\, m\, \frac{2\pi}{n_\Gamma} \right).
\]
$L_\Gamma^{1/2}(i, j)$ is simply the trapezoid rule applied to the integral of $\sin(\frac{\pi}{2} x) \cos(d(i, j) \pi x)$ on the interval $[0, 2]$. Therefore,
\[
\left| L_\Gamma^{1/2}(i, j) + \frac{2}{\pi \big( 4 d(i, j)^2 - 1 \big)} \right|
= \left| L_\Gamma^{1/2}(i, j) - \int_0^2 \sin\left( \frac{\pi}{2} x \right) \cos\big( d(i, j) \pi x \big) \, dx \right|
\le \frac{2}{3 n_\Gamma^2},
\]
where we have used the fact that if $g \in C^2([a, b])$, then
\[
\left| \int_a^b g(x) \, dx - \frac{g(a) + g(b)}{2} (b - a) \right| \le \frac{(b - a)^3}{12} \max_{\xi \in [a, b]} |g''(\xi)| .
\]
Noting that $n_\Gamma \ge 3$, it quickly follows that
\[
\left( \frac{1}{2\pi} - \frac{\sqrt{2}}{12} \right) \langle \widetilde{L} f, f \rangle
\le \langle L_\Gamma^{1/2} f, f \rangle
\le \left( \frac{2}{3\pi} + \frac{\sqrt{2}}{27} \right) \langle \widetilde{L} f, f \rangle .
\]
Combining this result with Corollary 2, and noting that $\langle f, S_\Gamma f \rangle = |u|_G^2$, where $u$ is the harmonic extension of $f$, we obtain the desired result.
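The spectral decomposition used at the start of the proof, including the norm computation $\|x_m\|^2 = \|y_m\|^2 = n_\Gamma / 2$, can be verified directly; a sketch for one arbitrary odd value of $n_\Gamma$:

```python
import numpy as np

ng = 9   # an arbitrary odd n_Gamma
# Cycle Laplacian L_Gamma = 2I - (shift) - (shift)^T.
L = 2 * np.eye(ng) - np.roll(np.eye(ng), 1, axis=1) - np.roll(np.eye(ng), -1, axis=1)

p = np.arange(1, ng + 1)
recon = np.zeros((ng, ng))
for m in range(1, (ng - 1) // 2 + 1):
    lam = 2 - 2 * np.cos(2 * np.pi * m / ng)
    x = np.sin(2 * np.pi * m * p / ng)
    y = np.cos(2 * np.pi * m * p / ng)
    assert np.isclose(x @ x, ng / 2) and np.isclose(y @ y, ng / 2)
    recon += lam * (np.outer(x, x) / (x @ x) + np.outer(y, y) / (y @ y))

# The eigenpairs (plus the trivial zero eigenvalue) reconstruct L_Gamma exactly.
assert np.allclose(recon, L)
```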
An Illustrative Example
While the concept of a graph $(G, \Gamma)$ having some $H$, $G_{n,\ell} \subset H \subset G^*_{n,\ell}$, as an $m$-aggregation may seem somewhat abstract, this simple formulation is in itself quite powerful. As an example, we illustrate that it implies a trace theorem (and, therefore, spectral equivalence) for all three-connected planar graphs that have bounded face degree (number of edges in the associated induced cycle) and for which there exists a planar spring embedding with a convex hull that is not too thin (a bounded distance to Hausdorff distance ratio for the boundary with respect to some point in the convex hull) and that satisfies bounded edge length and small angle conditions. Let $\mathcal{G}_N^{\le k}$ be the elements $(G, \Gamma) \in \mathcal{G}_N$ for which every face other than the outer face $\Gamma$ has at most $k$ edges. The exact proof of the following theorem is rather long, and can be found in the Appendix of [124]. The intuition is that, by taking the planar spring embedding, rescaling it so that the vertices of the boundary face lie on the unit circle, and overlaying a natural embedding of $G^*_{n,\ell}$ on the unit disk, an intuitive $m$-aggregation of the graph $G$ can be formed.
Theorem 15. If there exists a planar spring embedding $\mathcal{X}$ of $(G,\Gamma) \in \mathcal{U}^3_{\le c_1 k}$ for which

(1) $K = \mathrm{conv}\big(\{\mathcal{X}_{\Gamma(i),\cdot}\}_{i=1}^{n_\Gamma}\big)$ satisfies
\[
\sup_{u\in K}\,\inf_{v\in\partial K}\,\sup_{w\in\partial K}\frac{\|u-v\|}{\|u-w\|} \ge c_2 > 0,
\]

(2) $\mathcal{X}$ satisfies
\[
\max_{e_1,e_2\in E}\frac{\|\mathcal{X}_{i_1,\cdot}-\mathcal{X}_{j_1,\cdot}\|}{\|\mathcal{X}_{i_2,\cdot}-\mathcal{X}_{j_2,\cdot}\|} \le c_3 \quad\text{and}\quad \min_{\substack{i\in V\\ j_1,j_2\in N(i)}}\angle\, \mathcal{X}_{j_1,\cdot}\,\mathcal{X}_{i,\cdot}\,\mathcal{X}_{j_2,\cdot} \ge c_4 > 0,
\]
where $e_1 = \{i_1, j_1\}$ and $e_2 = \{i_2, j_2\}$, then there exists an $H$, $G_{m,\ell} \subseteq H \subseteq G^*_{m,\ell}$, $\ell \le m < 2^k\ell$, $m,\ell \in \mathbb{N}$, such that $H$ is an $R$-aggregation of $(G,\Gamma)$, where $R$ and $k$ are constants that depend on $c_1$, $c_2$, $c_3$, and $c_4$.
4.3 The Kamada-Kawai Objective and Optimal Layouts
Given the distances between data points in a high dimensional space, how can we meaning-
fully visualize their relationships? This is a fundamental task in exploratory data analysis,
for which a variety of different approaches have been proposed. Many of these techniques
seek to visualize high-dimensional data by embedding it into a lower-dimensional, e.g. two- or three-dimensional, space. Metric multidimensional scaling (MDS or mMDS) [64, 66]
is a classical approach that attempts to find a low-dimensional embedding that accurately
represents the distances between points. Originally motivated by applications in psychomet-
rics, MDS has now been recognized as a fundamental tool for data analysis across a broad
range of disciplines. See [66, 9] for more details, including a discussion of applications to
data from scientific, economic, political, and other domains. Compared to other classical
visualization tools like PCA², metric multidimensional scaling has the advantage of not
being restricted to linear projections of the data, and of being applicable to data from an
arbitrary metric space, rather than just Euclidean space. Because of this versatility, MDS
has also become one of the most popular algorithms in the field of graph drawing, where the
goal is to visualize relationships between nodes (e.g. people in a social network). In this
context, MDS was independently proposed by Kamada and Kawai [55] as a force-directed
graph drawing method.
In this section, we consider the algorithmic problem of computing an optimal drawing under the MDS/Kamada-Kawai objective, and structural properties of optimal drawings. The Kamada-Kawai objective is to minimize the following energy/stress functional $E : \mathbb{R}^{rn} \to \mathbb{R}$,
\[
E(x_1, \ldots, x_n) = \sum_{i<j}\left(\frac{\|x_i - x_j\|}{d(i,j)} - 1\right)^2, \tag{4.5}
\]
which corresponds to the physical situation where $x_1, \ldots, x_n \in \mathbb{R}^r$ are particles and for each $i \ne j$, particles $i$ and $j$ are connected by an idealized spring with equilibrium length $d(i,j)$ following Hooke's law with spring constant $k_{ij} = \frac{1}{d(i,j)^2}$. In applications to visualization,
²In the literature, PCA is sometimes referred to as classical multidimensional scaling, in contrast to metric multidimensional scaling, which we study in this work.
the choice of dimension is often small, i.e. $r \le 3$. We also note that in (4.5) the terms in
the sum are sometimes re-weighted with vertex or edge weights, which we discuss in more
detail later.
In practice, the MDS/Kamada-Kawai objective (4.5) is optimized via a heuristic proce-
dure like gradient descent [65, 135] or stress majorization [29, 43]. Because the objective is
non-convex, these algorithms may not reach the global minimum, but instead may terminate
at approximate critical points of the objective function. Heuristics such as restarting an
algorithm from different initializations and using modified step size schedules have been
proposed to improve the quality of results. In practice, these heuristic methods do seem to
work well for the Kamada-Kawai objective and are implemented in popular packages like
GRAPHVIZ [39] and the SMACOF package in R.
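To make the objective and its heuristic optimization concrete, here is a minimal sketch of stress minimization by plain gradient descent on a toy graph metric. This is only an illustration of objective (4.5), not the implementation used by the packages above:

```python
import random

def stress(X, d):
    """Kamada-Kawai stress (4.5): sum over pairs of (||x_i - x_j||/d(i,j) - 1)^2.
    X: list of points in R^r; d: dict mapping pairs (i, j), i < j, to distances."""
    total = 0.0
    for (i, j), dij in d.items():
        dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5
        total += (dist / dij - 1.0) ** 2
    return total

def gradient_step(X, d, lr=0.01, eps=1e-9):
    """One step of gradient descent on the stress objective."""
    n, r = len(X), len(X[0])
    grad = [[0.0] * r for _ in range(n)]
    for (i, j), dij in d.items():
        diff = [a - b for a, b in zip(X[i], X[j])]
        dist = sum(c * c for c in diff) ** 0.5 + eps
        coef = 2.0 * (dist / dij - 1.0) / (dij * dist)
        for k in range(r):
            grad[i][k] += coef * diff[k]
            grad[j][k] -= coef * diff[k]
    return [[X[i][k] - lr * grad[i][k] for k in range(r)] for i in range(n)]

# Toy input: the shortest-path metric of the 4-cycle, drawn in the plane.
d = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1, (0, 2): 2, (1, 3): 2}
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(4)]
before = stress(X, d)
for _ in range(1000):
    X = gradient_step(X, d)
after = stress(X, d)
assert after <= before  # descent reduces the stress on this instance
```

As the section discusses, such a descent may stop at a local critical point; it carries no global guarantee.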
We revisit this problem from an approximation algorithms perspective. First, we resolve
the computational complexity of minimizing (4.5) by proving that finding the global mini-
mum is NP-hard, even for graph metrics (where the metric is the shortest path distance on a
graph). Consider the gap decision version of stress minimization over graph metrics, which
we formally define below:
GAP STRESS MINIMIZATION
Input: Graph $G = ([n], E)$, $r \in \mathbb{N}$, $L \ge 0$.
Output: TRUE if there exists $x = (x_1, \ldots, x_n) \in \mathbb{R}^{rn}$ such that $E(x) \le L$; FALSE if for every $x$, $E(x) \ge L + 1$.
Theorem 16. Gap Stress Minimization in dimension ๐ = 1 is NP-hard. Furthermore,
the problem is hard even restricted to input graphs with diameter bounded by an absolute
constant.
As a gap problem, the output is allowed to be arbitrary if neither case holds; the hardness of the gap formulation shows that there cannot exist a Fully-Polynomial Randomized Approximation Scheme (FPRAS) for this problem unless P = NP, i.e. the runtime cannot be polynomial in the desired approximation guarantee. Our reduction shows this problem is
hard even when the input graph has diameter bounded above by an absolute constant. This
is a natural setting to consider, since many real world graphs (for example, social networks
[35]) and random graph models [127] indeed have low diameter due to the โsmall-world
phenomena.โ Other key aspects of this hardness proof are that we show the problem is
hard even when the input ๐ is a graph metric, and we show it is hard even in its canonical
unweighted formulation (4.5).
Given that computing the minimizer is NP-hard, a natural question is whether there exist polynomial time approximation algorithms for minimizing (4.5). We show that if the input graph has bounded diameter $D = O(1)$, then there indeed exists a Polynomial-Time Approximation Scheme (PTAS) to minimize (4.5), i.e. for fixed $\epsilon > 0$ and fixed $D$ there exists an algorithm to approximate the global minimum of an $n$ vertex diameter $D$ graph up to multiplicative error $(1+\epsilon)$ in time $f(\epsilon, D)\cdot \mathrm{poly}(n)$. More generally, we show the following result, where KKSCHEME is a simple greedy algorithm described in Subsection 4.3.3 below.
Theorem 17. For any input metric over $[n]$ with $\min_{i\ne j\in[n]} d(i,j) = 1$ and any $R > \epsilon > 0$, Algorithm KKSCHEME with $\epsilon_1 = \Theta(\epsilon/R)$ and $\epsilon_2 = \Theta(\epsilon/R^2)$ runs in time $n^2 (R/\epsilon)^{O(rR^4/\epsilon^2)}$ and outputs $x_1, \ldots, x_n \in \mathbb{R}^r$ with $\|x_i\| \le R$ such that
\[
\mathbb{E}\,[E(x_1, \ldots, x_n)] \le E(x^*_1, \ldots, x^*_n) + \epsilon n^2
\]
for any $x^*_1, \ldots, x^*_n$ with $\|x^*_i\| \le R$ for all $i$, where $\mathbb{E}$ is the expectation over the randomness of the algorithm.
The fact that this result is a PTAS for bounded diameter graphs follows from combining
it with the two structural results regarding optimal Kamada-Kawai embeddings, which are
of independent interest. The first (Lemma 13) shows that the optimal objective value for low
diameter graphs must be of order ฮฉ(๐2), and the second (Lemma 15) shows that the optimal
KK embedding is โcontractiveโ in the sense that the diameter of the output is never much
larger than the diameter of the input. Both lemmas are proven in Subsection 4.3.1.
In the multidimensional scaling literature, there has been some study of the local convergence of algorithms like stress majorization (for example [28]), which shows that stress majorization converges quickly if initialized in a sufficiently small neighborhood of a local minimum. Our work appears to provide the first provable guarantees for global optimization.
The closest previous hardness result is the work of [19], where they showed that a similar problem is hard. In their problem, the terms in (4.5) are weighted by $d(i,j)$, absolute value loss replaces the squared loss, and the input is an arbitrary pseudometric where nodes in the input are allowed to be at distance zero from each other. The second assumption makes the diameter (the ratio of the maximum to minimum distance in the input) infinite, which is a major obstruction to modifying their approach to show Theorem 16. See Remark 1 for further
discussion. In the approximation algorithms literature, there has been a great deal of interest
in optimizing the worst-case distortion of metric embeddings into various spaces, see e.g.
[5] for approximation algorithms for embeddings into one dimension, and [34, 85] for more
general surveys of low distortion metric embeddings. Though conceptually related, the
techniques used in this literature are not generally targeted for minimizing a measure of
average pairwise distortion like (4.5).
In what follows, we generally assume the input is given as an unweighted graph to
simplify notation. However, for the upper bound (Theorem 17) we do handle the general
case of arbitrary metrics with distances in [1, ๐ท] (the lower bound of 1 is without loss of
generality after re-scaling). In the lower bound (Theorem 16), we prove the (stronger) result
that the problem is hard when restricted to graph metrics, instead of just for arbitrary metrics.
Throughout this section, we use techniques involving estimating different components of the objective function $E(x_1, \ldots, x_n)$ given by (4.5). For convenience, we use the notation
\[
E_{i,j}(x) := \left(\frac{\|x_i - x_j\|}{d(i,j)} - 1\right)^2, \qquad E_S(x) := \sum_{\substack{i,j\in S\\ i<j}} E_{i,j}(x), \qquad E_{S,T}(x) := \sum_{i\in S}\sum_{j\in T} E_{i,j}(x).
\]
The remainder of this section is as follows. In Subsection 4.3.1, we prove two structural
results regarding the objective value and diameter of an optimal layout. In Subsection 4.3.2,
we provide a proof of Theorem 16, the main algorithmic lower bound of the section. Finally,
in Subsection 4.3.3, we formally describe an approximation algorithm for low diameter
graphs and prove Theorem 17.
4.3.1 Structural Results for Optimal Embeddings
In this subsection, we present two results regarding optimal layouts of a given graph. In
particular, we provide a lower bound for the energy of a graph layout and an upper bound
for the diameter of an optimal layout. First, we recall the following standard ๐-net estimate.
Lemma 12 (Corollary 4.2.13 of [126]). Let $B_R = \{x : \|x\| \le R\} \subset \mathbb{R}^r$ be the origin-centered radius-$R$ ball in $r$ dimensions. For any $\epsilon \in (0, R)$ there exists a subset $N_\epsilon \subset B_R$ with $|N_\epsilon| \le (3R/\epsilon)^r$ such that $\max_{\|x\|\le R}\min_{y\in N_\epsilon}\|x - y\| \le \epsilon$, i.e. $N_\epsilon$ is an $\epsilon$-net of $B_R$.
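The covering property is easy to illustrate computationally (this is a greedy construction on a grid sample, not the volumetric argument of [126]); the net size stays within the $(3R/\epsilon)^r$ bound:

```python
import math

def greedy_net(points, eps):
    """Greedily keep any point farther than eps from all kept centers;
    every input point then lies within eps of some kept center."""
    net = []
    for p in points:
        if all(math.dist(p, q) > eps for q in net):
            net.append(p)
    return net

# Grid sample of the radius-1 ball in the plane (r = 2).
R, eps, step = 1.0, 0.3, 0.05
grid = [i * step - R for i in range(int(2 * R / step) + 1)]
ball = [(x, y) for x in grid for y in grid if x * x + y * y <= R * R]
net = greedy_net(ball, eps)
covered = all(any(math.dist(p, q) <= eps for q in net) for p in ball)
assert covered and len(net) <= (3 * R / eps) ** 2
```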
We use this result to prove a lower bound for the objective value of any layout of a diameter $D$ graph in $\mathbb{R}^r$.

Lemma 13. Let $G = ([n], E)$ have diameter
\[
D \le \frac{(n/2)^{1/r}}{10}.
\]
Then any layout $x \in \mathbb{R}^{rn}$ has energy
\[
E(x) \ge \frac{n^2}{81\,(10D)^r}.
\]
Proof. Let $G = ([n], E)$ have diameter $D \le (n/2)^{1/r}/10$, and suppose that there exists a layout $x \in \mathbb{R}^{rn}$ of $G$ in dimension $r$ with energy $E(x) = cn^2$ for some $c \le 1/810$. If no such layout exists, then we are done. We aim to lower bound the possible values of $c$. For each vertex $i \in [n]$, we consider the quantity $E_{i,[n]\setminus i}(x)$. The sum
\[
\sum_{i\in[n]} E_{i,[n]\setminus i}(x) = 2cn^2,
\]
and so there exists some $i' \in [n]$ such that $E_{i',[n]\setminus i'}(x) \le 2cn$. By Markov's inequality,
\[
\big|\{j\in[n] \mid E_{i',j}(x) > 10c\}\big| < n/5,
\]
and so at least $4n/5$ vertices (including $i'$) in $[n]$ satisfy
\[
\left(\frac{\|x_{i'} - x_j\|}{d(i',j)} - 1\right)^2 \le 10c,
\]
and also
\[
\|x_{i'} - x_j\| \le d(i',j)\big(1 + \sqrt{10c}\big) \le \frac{10}{9}D.
\]
The remainder of the proof consists of taking the $r$-dimensional ball with center $x_{i'}$ and radius $10D/9$ (which contains $\ge 4n/5$ vertices), partitioning it into smaller sub-regions, and then lower bounding the energy resulting from the interactions between vertices within each sub-region.

By applying Lemma 12 with $R := 10D/9$ and $\epsilon := 1/3$, we may partition the $r$-dimensional ball with center $x_{i'}$ and radius $10D/9$ into $(10D)^r$ disjoint regions, each of diameter at most $2/3$. For each of these regions, we denote by $P_i \subset [n]$, $i \in [(10D)^r]$, the subset of vertices whose corresponding point lies in the corresponding region. As each region is of diameter at most $2/3$ and the graph distance between any two distinct vertices is at least one, either
\[
E_{P_i}(x) \ge \binom{|P_i|}{2}(2/3 - 1)^2 = \frac{|P_i|(|P_i| - 1)}{18}
\]
or $|P_i| = 0$. Empty regions contribute nothing and can be safely ignored. A lower bound on the total energy can be produced by the following optimization problem
\[
\min \sum_{i=1}^{\ell} a_i(a_i - 1) \quad \text{s.t.} \quad \sum_{i=1}^{\ell} a_i = m, \quad a_i \ge 1, \; i \in [\ell],
\]
where the $a_i$ have been relaxed to real values. This optimization problem has a non-empty feasible region for $m \ge \ell$, and its optimal value is $m(m/\ell - 1)$ (achieved when $a_i = m/\ell$ for all $i$). In our situation, $m := 4n/5$ and $\ell := (10D)^r$, and, by assumption, $m \ge \ell$. This leads to the lower bound
\[
cn^2 = E(x) \ge \sum_{i=1}^{\ell} E_{P_i}(x) \ge \frac{4n}{90}\left[\frac{4n}{5(10D)^r} - 1\right],
\]
which implies that
\[
c \ge \frac{16}{450\,(10D)^r}\left(1 - \frac{5(10D)^r}{4n}\right) \ge \frac{1}{75\,(10D)^r}
\]
for $D \le (n/2)^{1/r}/10$. This completes the proof.
The above estimate has the correct dependence for $r = 1$. For instance, consider the lexicographic product of a path $P_D$ and a clique $K_{n/D}$ (i.e., a graph with $D$ cliques in a line, and complete bipartite graphs between neighboring cliques). This graph has diameter $D$, and the layout in which the "vertices" (each corresponding to a copy of $K_{n/D}$) of $P_D$ lie exactly at the integer values $[D]$ has objective value $\frac{n}{2}(n/D - 1)$. This estimate is almost certainly not tight for dimensions $r > 1$, as there is no higher dimensional analogue of the path (i.e., a graph with $\Theta(D^r)$ vertices and diameter $D$ that embeds isometrically in $\mathbb{R}^r$).
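The objective value of that layout can be checked directly for small parameters; a quick sketch with the graph and layout as described above:

```python
from itertools import combinations

def path_of_cliques_energy(D, k):
    """Energy (4.5) of the layout placing the i-th clique of the lexicographic
    product of P_D and K_k (D cliques of size k in a line, with consecutive
    cliques fully joined) at the integer position i."""
    n = D * k
    clique = lambda v: v // k  # which clique a vertex belongs to
    graph_dist = lambda u, v: max(abs(clique(u) - clique(v)), 1)
    return sum((abs(clique(u) - clique(v)) / graph_dist(u, v) - 1.0) ** 2
               for u, v in combinations(range(n), 2))

# Within-clique pairs each contribute (0/1 - 1)^2 = 1 and all cross pairs
# contribute 0, giving D * k * (k - 1) / 2 = (n / 2)(n / D - 1) in total.
assert path_of_cliques_energy(4, 5) == 4 * 5 * 4 / 2
```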
Next, we provide a proof of Lemma 15, which upper bounds the diameter of any optimal layout of a diameter $D$ graph. Before we proceed with the proof, we first prove an estimate for the concentration of points $x_i$ at a given distance from the marginal median in any optimal layout. The marginal median $y \in \mathbb{R}^r$ of a set of points $x_1, \ldots, x_n \in \mathbb{R}^r$ is the vector whose $j^{th}$ component $y(j)$ is the univariate median of $x_1(j), \ldots, x_n(j)$ (with the convention that, if $n$ is even, the univariate median is given by the mean of the $(n/2)^{th}$ and $(n/2+1)^{th}$ ordered values). The below result, by itself, is not strong enough to prove our desired diameter estimate, but serves as a key ingredient in the proof of Lemma 15.
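The marginal median is straightforward to compute coordinate-wise; a small sketch with the even-$n$ convention above:

```python
def marginal_median(points):
    """Coordinate-wise median of a list of r-dimensional points; for an even
    number of points, average the two middle order statistics."""
    n, r = len(points), len(points[0])
    med = []
    for j in range(r):
        vals = sorted(p[j] for p in points)
        med.append(vals[n // 2] if n % 2 == 1
                   else (vals[n // 2 - 1] + vals[n // 2]) / 2)
    return med

pts = [(0.0, 5.0), (2.0, 1.0), (1.0, 3.0), (4.0, 2.0)]
assert marginal_median(pts) == [1.5, 2.5]
```

Note that, unlike a geometric median, the marginal median need not coincide with any input point.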
Lemma 14. Let $G = ([n], E)$ have diameter $D$. Then, for any optimal layout $x \in \mathbb{R}^{rn}$, i.e., such that $E(x) \le E(\tilde{x})$ for all $\tilde{x} \in \mathbb{R}^{rn}$,
\[
\big|\{i\in[n] \mid \|y - x_i\|_\infty \ge (C + k)D\}\big| \le 2rn\,C^{-2^k}
\]
for all $C > 0$ and $k \in \mathbb{N}$, where $y$ is the marginal median of $x_1, \ldots, x_n$.
Proof. Let $G = ([n], E)$ have diameter $D$, and let $x$ be an optimal layout. Without loss of generality, we order the vertices so that $x_1(1) \le \cdots \le x_n(1)$, and shift $x$ so that $x_{\lceil n/2\rceil}(1) = 0$. Next, we fix some $C > 0$, and define the subsets $S_0 := [\lceil n/2\rceil]$ and
\[
T_k := \{i\in[n] \mid x_i(1) \in [\,(C+k)D, (C+k+1)D\,)\}, \qquad S_k := \{i\in[n] \mid x_i(1) \ge (C+k)D\},
\]
$k \in \mathbb{N}$. Our primary goal is to estimate the quantity $|S_k|$, for an arbitrary $k$, from which the desired result will follow quickly.

The objective value $E(x)$ is at most $\binom{n}{2}$; otherwise we could replace $x$ by a layout with all $x_i$ equal. To obtain a first estimate on $|S_k|$, we consider a crude lower bound on $E_{S_0,S_k}(x)$. We have
\[
E_{S_0,S_k}(x) \ge |S_0||S_k|\left(\frac{(C+k)D}{D} - 1\right)^2 = (C+k-1)^2\,\lceil n/2\rceil\,|S_k|,
\]
and therefore
\[
|S_k| \le \binom{n}{2}\frac{1}{(C+k-1)^2\lceil n/2\rceil} \le \frac{n}{(C+k-1)^2}.
\]
From here, we aim to prove the following claim
\[
|S_k| \le \frac{n}{\big(C + k - (2^\ell - 1)\big)^{2^\ell}}, \qquad k, \ell \in \mathbb{N}, \; k \ge 2^\ell - 1, \tag{4.6}
\]
by induction on $\ell$. The above estimate serves as the base case $\ell = 1$. Assuming the above statement holds for some fixed $\ell$, we aim to prove that it holds for $\ell + 1$.

To do so, we consider the effect of collapsing the set of points in $S_k$, $k \ge 2^\ell$, into a hyperplane with a fixed first coordinate. In particular, consider the alternate layout $x'$ given by
\[
x'_i(j) = \begin{cases} (C+k)D & j = 1, \; i \in T_k, \\ x_i(j) - D & j = 1, \; i \in S_{k+1}, \\ x_i(j) & \text{otherwise.} \end{cases}
\]
For this new layout, we have
\[
E_{T_{k-1}\cup T_k\cup S_{k+1}}(x') \le E_{T_{k-1}\cup T_k\cup S_{k+1}}(x) + \frac{|T_k|^2}{2} + \big(|T_{k-1}| + |S_{k+1}|\big)|T_k|
\]
and
\[
E_{S_0,S_{k+1}}(x') \le E_{S_0,S_{k+1}}(x) - |S_0||S_{k+1}|\Big(\big((C+k+1)-1\big)^2 - \big((C+k)-1\big)^2\Big) = E_{S_0,S_{k+1}}(x) - (2C+2k-1)\lceil n/2\rceil|S_{k+1}|.
\]
Combining these estimates, we note that this layout has objective value bounded above by
\[
E(x') \le E(x) + \big(|T_{k-1}| + |T_k|/2 + |S_{k+1}|\big)|T_k| - (2C+2k-1)\lceil n/2\rceil|S_{k+1}| \le E(x) + |S_{k-1}||T_k| - (2C+2k-1)\lceil n/2\rceil|S_{k+1}|,
\]
and so
\[
|S_{k+1}| \le \frac{|S_{k-1}||T_k|}{(2C+2k-1)\lceil n/2\rceil} \le \frac{n^2}{\big(C+(k-1)-(2^\ell-1)\big)^{2^\ell}\big(C+k-(2^\ell-1)\big)^{2^\ell}(2C+2k-1)\lceil n/2\rceil}
\]
\[
\le \frac{n}{2\lceil n/2\rceil}\cdot\frac{n}{\big(C+(k-1)-(2^\ell-1)\big)^{2^\ell}\big(C+k-(2^\ell-1)\big)^{2^\ell}} \le \frac{n}{\big(C+(k-1)-(2^\ell-1)\big)^{2^{\ell+1}}} \le \frac{n}{\big(C+(k+1)-(2^{\ell+1}-1)\big)^{2^{\ell+1}}}
\]
for all $k + 1 \ge 2(\ell+1) - 1$ and $C + k \le n + 1$.
If $C + k > n + 1$, then
\[
|S_{k+1}| \le \frac{|S_{k-1}||T_k|}{(2C+2k-1)\lceil n/2\rceil} \le \frac{|S_{k-1}||T_k|}{(2n+1)\lceil n/2\rceil} < 1,
\]
and so $|S_{k+1}| = 0$. This completes the proof of claim (4.6), and implies that $|S_k| \le nC^{-2^k}$. Repeating this argument, with the indices reversed, and also for the remaining dimensions $j = 2, \ldots, r$, leads to the desired result
\[
\big|\{i\in[n] \mid \|y - x_i\|_\infty \ge (C + k)D\}\big| \le 2rn\,C^{-2^k}
\]
for all $k \in \mathbb{N}$ and $C > 0$.
Using the estimates in the proof of Lemma 14 (in particular, the bound $|S_k| \le nC^{-2^k}$), we are now prepared to prove the following diameter bound.

Lemma 15. Let $G = ([n], E)$ have diameter $D$. Then, for any optimal layout $x \in \mathbb{R}^{rn}$, i.e., such that $E(x) \le E(\tilde{x})$ for all $\tilde{x} \in \mathbb{R}^{rn}$,
\[
\|x_i - x_j\|_2 \le 8D + 4D\log_2\log_2 2D
\]
for all $i, j \in [n]$.
Proof. Let $G = ([n], E)$ have diameter $D$, and let $x$ be an optimal layout. Without loss of generality, suppose that the largest distance between any two points of $x$ is realized between two points lying in $\mathrm{span}\{e_1\}$ (i.e., on the axis of the first dimension). In addition, we order the vertices so that $x_1(1) \le \cdots \le x_n(1)$ and translate $x$ so that $x_{\lceil n/2\rceil}(1) = 0$.

Let $x_1 = -\alpha e_1$, and suppose that $\alpha \ge 5D$. If this is not the case, then we are done. By differentiating $E$ with respect to $x_1(1)$, we obtain
\[
\frac{\partial E}{\partial x_1(1)} = -\sum_{j=2}^{n}\left(\frac{\|x_j - x_1\|}{d(1,j)} - 1\right)\frac{x_j(1) + \alpha}{d(1,j)\,\|x_j - x_1\|} = 0.
\]
Let
\[
S_1 := \{j\in[n] : \|x_j - x_1\| \ge d(1,j)\}, \qquad S_2 := \{j\in[n] : \|x_j - x_1\| < d(1,j)\},
\]
and compare the sums of the terms in the above equation corresponding to $S_1$ and $S_2$, respectively. We begin with the former. Note that for all $j \ge \lceil n/2\rceil$ we must have $j \in S_1$, because $\|x_j - x_1\| \ge \alpha \ge 5D > d(1,j)$. Therefore we have
\[
\sum_{j\in S_1}\left(\frac{\|x_j - x_1\|}{d(1,j)} - 1\right)\frac{x_j(1) + \alpha}{d(1,j)\|x_j - x_1\|} \ge \sum_{j=\lceil n/2\rceil}^{n}\left(\frac{\|x_j - x_1\|}{d(1,j)} - 1\right)\frac{x_j(1) + \alpha}{d(1,j)\|x_j - x_1\|} \ge \lceil n/2\rceil\left(\frac{\|x_j - x_1\|}{D} - 1\right)\frac{\alpha}{D\,\|x_j - x_1\|} \ge \frac{\lceil n/2\rceil(\alpha - D)}{D^2},
\]
where in the last inequality we used $\|x_j - x_1\| \ge \alpha$ and $\|x_j - x_1\|/D - 1 \ge \alpha/D - 1$.
Next, we estimate the sum of the terms corresponding to $S_2$. Let
\[
S_3 := \{j\in[n] : x_j(1) + \alpha \le D\},
\]
and note that $S_2 \subset S_3$. We have
\[
\sum_{j\in S_2}\left(1 - \frac{\|x_j - x_1\|}{d(1,j)}\right)\frac{x_j(1) + \alpha}{d(1,j)\|x_j - x_1\|} = \sum_{j\in S_3}\left|1 - \frac{\|x_j - x_1\|}{d(1,j)}\right|_{+}\frac{x_j(1) + \alpha}{d(1,j)\|x_j - x_1\|} \le \sum_{j\in S_3}\frac{1}{d(1,j)} \le |S_3|,
\]
where $|\cdot|_{+} := \max\{\cdot\,, 0\}$. Combining these estimates, we have
\[
|S_3| - \frac{\lceil n/2\rceil(\alpha - D)}{D^2} \ge 0,
\]
or equivalently,
\[
\alpha \le D + \frac{|S_3|\,D^2}{\lceil n/2\rceil}.
\]
By the estimate in the proof of Lemma 14, $|S_3| \le nC^{-2^k}$ for any $k \in \mathbb{N}$ and $C > 0$ satisfying $(C+k)D \le \alpha - D$. Taking $C = 2$ and $k = \lceil \alpha/D - 3\rceil > \log_2\log_2(2D)$, we have
\[
\alpha \le D + 2D\,2^{\,2 - 2^k} \le D + \frac{2D^2}{2D} = 2D,
\]
a contradiction. Therefore, $\alpha$ is at most $4D + 2D\log_2\log_2 2D$. Repeating this argument with the indices reversed completes the proof.

While the above estimate is sufficient for our purposes, we conjecture that it is not tight, and that the diameter of an optimal layout of a diameter $D$ graph is always at most $2D$.
4.3.2 Algorithmic Lower Bounds
In this subsection, we provide a proof of Theorem 16, the algorithmic lower bound of this section. Our proof is based on a reduction from a version of Max All-Equal 3SAT. The Max All-Equal 3SAT decision problem asks whether, given variables $t_1, \ldots, t_\ell$, clauses $C_1, \ldots, C_m \subset \{t_1, \ldots, t_\ell, \bar{t}_1, \ldots, \bar{t}_\ell\}$ each consisting of at most three literals (variables or their negations), and some value $L$, there exists an assignment of variables such that at least $L$ clauses have all literals equal. The Max All-Equal 3SAT decision problem is known to be APX-hard, as it does not satisfy the conditions of the Max CSP classification theorem for a polynomial time optimizable Max CSP [58] (setting all variables true or all variables false does not satisfy all clauses, and all clauses cannot be written in disjunctive normal form as two terms, one with all unnegated variables and one with all negated variables).
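For intuition, the Max All-Equal 3SAT objective can be evaluated by brute force on small instances; a minimal sketch (the literal encoding is an illustrative choice):

```python
from itertools import product

def max_all_equal(num_vars, clauses):
    """Brute-force Max All-Equal 3SAT: literal +i denotes variable i and -i
    its negation; returns the maximum number of clauses whose literals all
    evaluate to the same truth value, over all assignments."""
    best = 0
    for assign in product([False, True], repeat=num_vars):
        value = lambda lit: assign[abs(lit) - 1] ^ (lit < 0)
        best = max(best, sum(1 for c in clauses
                             if len({value(lit) for lit in c}) == 1))
    return best

# A clause and its complement are all-equal together; the third clause
# conflicts with them, so only two of the three can be all-equal at once.
assert max_all_equal(3, [(1, 2, 3), (-1, -2, -3), (1, -2, 3)]) == 2
```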
However, we require a much more restrictive version of this problem. In particular, we require a version in which all clauses have exactly three literals, no literal appears in a clause more than once, the number of copies of a clause is equal to the number of copies of its complement (defined as the negation of all of its elements), and each literal appears in exactly $p$ clauses. We refer to this restricted version as Balanced Max All-Equal EU3SAT-E$p$. This is indeed still APX-hard, even for $p = 6$. The proof adds little to the understanding of the hardness of the Kamada-Kawai objective, and for this reason it is omitted here; it can be found in [31].
Lemma 16. Balanced Max All-Equal EU3SAT-E6 is APX-hard.
Reduction. Suppose we have an instance of Balanced Max All-Equal EU3SAT-E$p$ with variables $t_1, \ldots, t_\ell$ and clauses $C_1, \ldots, C_{2m}$. Let $\mathcal{L} = \{t_1, \ldots, t_\ell, \bar{t}_1, \ldots, \bar{t}_\ell\}$ be the set of literals and $\mathcal{C} = \{C_1, \ldots, C_{2m}\}$ be the multiset of clauses. Consider the graph $G = (V, E)$, with
\[
V = \{v_i\}_{i\in[n_v]} \,\cup\, \{t_i\}_{t\in\mathcal{L},\, i\in[n_t]} \,\cup\, \{C_i\}_{C\in\mathcal{C},\, i\in[n_c]},
\]
\[
E = V^{(2)} \setminus \Big[\{(t_i, \bar{t}_j)\}_{t\in\mathcal{L},\, i,j\in[n_t]} \,\cup\, \{(t_i, C_j)\}_{t\in C,\, C\in\mathcal{C},\, i\in[n_t],\, j\in[n_c]}\Big],
\]
where $V^{(2)} := \{S \subset V \mid |S| = 2\}$, $n_v = n_c^{20}$, $n_t = n_c^2$, $n_c = 2000\,\ell^{20} m^{20}$. The graph $G$ consists of a central clique of size $n_v$, a clique of size $n_t$ for each literal, and a clique of size $n_c$ for each clause; the central clique is connected to all other cliques (via bicliques); the clause cliques are connected to each other; each literal clique is connected to all other literal cliques, save for the clique corresponding to its negation; and each literal clique is connected to the cliques of all clauses it does not appear in. Note that the distance between any two nodes in this graph is at most two.
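The adjacency structure just described can be mirrored on a toy scale. The sketch below uses tiny illustrative clique sizes (nothing like the polynomially large $n_v$, $n_t$, $n_c$ of the proof) and confirms that the resulting graph has diameter two:

```python
from itertools import combinations

def build_reduction_graph(num_vars, clauses, n_v, n_t, n_c):
    """Toy version of the reduction graph: all pairs of vertices are adjacent
    EXCEPT literal vs. its negation and literal vs. a clause containing it."""
    nodes, group = [], []
    def add_group(name, size):
        for _ in range(size):
            group.append(name)
            nodes.append(len(nodes))
    add_group(('anchor',), n_v)
    for v in range(1, num_vars + 1):
        add_group(('lit', v), n_t)   # literal +v
        add_group(('lit', -v), n_t)  # its negation
    for idx in range(len(clauses)):
        add_group(('cls', idx), n_c)
    def forbidden(g1, g2):
        if g1[0] == 'lit' and g2[0] == 'lit':
            return g1[1] == -g2[1]            # literal vs. its negation
        a, b = sorted([g1, g2])
        if a[0] == 'cls' and b[0] == 'lit':
            return b[1] in clauses[a[1]]      # literal appears in clause
        return False
    adj = {u: set() for u in nodes}
    for u, w in combinations(nodes, 2):
        if not forbidden(group[u], group[w]):
            adj[u].add(w); adj[w].add(u)
    return adj

adj = build_reduction_graph(3, [(1, 2, 3), (-1, -2, -3)], n_v=3, n_t=2, n_c=2)
# Any two non-adjacent vertices share a neighbor (e.g. in the anchor clique),
# so the distance between any two nodes is at most two.
assert all(adj[u] & adj[w] for u in adj for w in adj
           if u != w and w not in adj[u])
```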
Our goal is to show that, for some fixed $L = L(\ell, m, n, p) \in \mathbb{N}$, if our instance of Balanced Max All-Equal EU3SAT-E$p$ can satisfy $2q$ clauses, then there exists a layout $x$ with $E(x) \le L$, and if it can satisfy at most $2q - 2$ clauses, then all layouts $x'$ have objective value at least $E(x') \ge L + 1$. To this end, we propose a layout and calculate its objective function up to lower-order terms. Later we will show that, for any layout of an instance that satisfies at most $2q - 2$ clauses, the objective value is strictly higher than our proposed construction for the previous case. The anticipated difference in objective value is of order $\Theta(n_t n_c)$, and so we will attempt to correctly place each literal vertex up to accuracy $o\big(n_c/(\ell n)\big)$ and each clause vertex up to accuracy $o\big(n_t/(m n)\big)$.
The general idea of this reduction is that the clique $\{v_i\}$ serves as an "anchor" of sorts that forces all other vertices to be almost exactly at the correct distance from its center. Without loss of generality, assume this anchor clique is centered at $0$. Roughly speaking, this forces each literal clique to lie at roughly either $-1$ or $+1$, and the distance between a literal and its negation forces negations to be on opposite sides, i.e., $t_i \approx -\bar{t}_i$. Clause cliques are also roughly at either $-1$ or $+1$, and the distance to its literals forces each clause to lie opposite the side containing the majority of its literals: a clause $C = \{t^1, t^2, t^3\}$ lies at roughly $-1$ when the positions of $t^1$, $t^2$, $t^3$ sum to a non-negative value, and at roughly $+1$ otherwise. The optimal embedding of $G$, i.e. the location of the variable cliques at either $+1$ or $-1$, corresponds to an optimal assignment for the Max All-Equal 3SAT instance.
Remark 1 (Comparison to [19]). As mentioned in the Introduction, the reduction here is
significantly more involved than the hardness proof for a related problem in [19]. At a high
level, the key difference is that in [19] they were able to use a large number of distance-zero
vertices to create a simple structure around the origin. This is no longer possible in our
setting (in particular, with bounded diameter graph metrics), which results in graph layouts
with much less structure. For this reason, we require a graph that exhibits as much structure
as possible. To this end, a reduction from Max All-Equal 3SAT using both literals and
clauses in the graph is a much more suitable technique than a reduction from Not All-Equal
3SAT using only literals. In fact, it is not at all obvious that the same approach in [19],
applied to unweighted graphs, would lead to a computationally hard instance.
Structure of optimal layout. Let $x$ be a globally optimal layout of $G$, and let us label the vertices of $G$ based on their value in $x$, i.e., such that $x_1 \le \cdots \le x_n$. In addition, we define
\[
s := n - n_v = 2\ell n_t + 2m n_c.
\]
Without loss of generality, we assume that $\sum_i x_i = 0$. We consider the first-order conditions for an arbitrary vertex $i$. Let $P := \{(i,j) \mid i < j, \; d(i,j) = 2\}$, $P_i := \{j\in[n] \mid d(i,j) = 2\}$, and
\[
P_i^{<} := \{j < i \mid d(i,j) = 2\}, \qquad P_i^{>} := \{j > i \mid d(i,j) = 2\},
\]
\[
P_i^{<,+} := \{j < i \mid d(i,j) = 2,\; x_j \ge 0\}, \qquad P_i^{>,+} := \{j > i \mid d(i,j) = 2,\; x_j \ge 0\},
\]
\[
P_i^{<,-} := \{j < i \mid d(i,j) = 2,\; x_j < 0\}, \qquad P_i^{>,-} := \{j > i \mid d(i,j) = 2,\; x_j < 0\}.
\]
We have
\[
\frac{\partial E}{\partial x_i} = 2\sum_{j<i}(x_i - x_j - 1) - 2\sum_{j>i}(x_j - x_i - 1) - 2\sum_{j\in P_i^{<}}(x_i - x_j - 1) + 2\sum_{j\in P_i^{>}}(x_j - x_i - 1) + \frac{1}{2}\sum_{j\in P_i^{<}}(x_i - x_j - 2) - \frac{1}{2}\sum_{j\in P_i^{>}}(x_j - x_i - 2)
\]
\[
= \Big(2n - \frac{3}{2}|P_i|\Big)x_i + 2(n + 1 - 2i) + \frac{1}{2}\Bigg[\sum_{j\in P_i^{<}}(3x_j + 2) + \sum_{j\in P_i^{>}}(3x_j - 2)\Bigg]
\]
\[
= \Big(2n - \frac{3}{2}|P_i|\Big)x_i + 2(n + 1 - 2i) + \frac{1}{2}\Big(|P_i^{>,+}| - 5|P_i^{>,-}| + 5|P_i^{<,+}| - |P_i^{<,-}|\Big) + \frac{3}{2}C_i = 0,
\]
where
\[
C_i := \sum_{j\in P_i^{<,-}\cup P_i^{>,-}}(x_j + 1) + \sum_{j\in P_i^{<,+}\cup P_i^{>,+}}(x_j - 1).
\]
The $i^{th}$ vertex has location
\[
x_i = \frac{2i - (n+1)}{n - \frac{3}{4}|P_i|} - \frac{|P_i^{>,+}| - 5|P_i^{>,-}| + 5|P_i^{<,+}| - |P_i^{<,-}|}{4n - 3|P_i|} - \frac{3C_i}{4n - 3|P_i|}. \tag{4.7}
\]
By Lemma 15,
\[
|C_i| \le |P_i|\max_{j\in[n]}\max\{|x_j - 1|, |x_j + 1|\} \le 75 n_t,
\]
which implies that $|x_i| \le 1 + 1/n_t^5$ for all $i \in [n]$.
Good layout for positive instances. Next, we formally describe a layout that we will then show to be nearly optimal (up to lower-order terms). Before giving the formal description, we describe the layout's structure at a high level. Given some assignment of the variables under which $2q$ clauses are satisfied, for each literal $t$, we place the clique $\{t_i\}_{i\in[n_t]}$ at roughly $+1$ if the corresponding literal $t$ is true; otherwise we place the clique at roughly $-1$. For each clause $C$, we place the corresponding clique $\{C_i\}_{i\in[n_c]}$ at roughly $+1$ if the majority of its literals are near $-1$; otherwise we place it at roughly $-1$. The anchor clique $\{v_i\}_{i\in[n_v]}$ lies in the middle of the interval $[-1,+1]$, separating the literal and clause cliques on both sides.

The order of the vertices, from most negative to most positive, consists of
\[
B_1,\; B_2,\; B_3^0, \ldots, B_3^p,\; B_4,\; B_5^p, \ldots, B_5^0,\; B_6,\; B_7,
\]
where
\[
\begin{aligned}
B_1 &= \{\text{clause cliques near $-1$ with all corresponding literal cliques near $+1$}\},\\
B_2 &= \{\text{clause cliques near $-1$ with two corresponding literal cliques near $+1$}\},\\
B_3^j &= \{\text{literal cliques near $-1$ with $j$ corresponding clauses near $-1$}\}, \quad j = 0, \ldots, p,\\
B_4 &= \{\text{the anchor clique}\},\\
B_5^j &= \{\text{literal cliques near $+1$ with $j$ corresponding clauses near $+1$}\}, \quad j = 0, \ldots, p,\\
B_6 &= \{\text{clause cliques near $+1$ with two corresponding literal cliques near $-1$}\},\\
B_7 &= \{\text{clause cliques near $+1$ with all corresponding literal cliques near $-1$}\},
\end{aligned}
\]
$B_3 := \bigcup_{j=0}^{p} B_3^j$, $B_5 := \bigcup_{j=0}^{p} B_5^j$, $B_c := B_1\cup B_2\cup B_6\cup B_7$, and $B_t := B_3\cup B_5$. Let us define
\[
y_i := \frac{2i - (n+1)}{n},
\]
i.e., the optimal layout of a clique $K_n$ in one dimension. We can write our proposed optimal layout as a perturbation of $y$.
Using Equation (4.7) for $x$, we obtain $x_i = y_i$ for $i \in B_4$, and, by ignoring both the contribution of $C_i$ and $O(1/n)$ terms, we obtain
\[
\begin{aligned}
x_i &= y_i - \frac{3n_t}{n}, \; i \in B_1, &\qquad x_i &= y_i + \frac{3n_t}{n}, \; i \in B_7,\\
x_i &= y_i - \frac{3n_t}{2n}, \; i \in B_2, &\qquad x_i &= y_i + \frac{3n_t}{2n}, \; i \in B_6,\\
x_i &= y_i - \frac{n_t + (j - p/2)n_c}{n}, \; i \in B_3^j, &\qquad x_i &= y_i + \frac{n_t + (j - p/2)n_c}{n}, \; i \in B_5^j.
\end{aligned}
\]
Next, we upper bound the objective value of $x$. We proceed in steps, estimating the different components up to $o(n_c n_t)$. We recall the useful formulas
\[
\sum_{i=1}^{a-1}\sum_{j=i+1}^{a}\left(\frac{2(j-i)}{n} - 1\right)^2 = \frac{(a-1)a\big(3n^2 - 4n(a+1) + 2a(a+1)\big)}{6n^2}, \tag{4.8}
\]
and
\[
\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\frac{2(j-i)+c}{n} - 1\right)^2 = \frac{ab}{3n^2}\big(3c^2 + 3n^2 + 4a^2 + 4b^2 - 6ab - 6cn + 6bc - 6ac + 6an - 6bn - 2\big) \le \frac{ab^3}{3n^2} + \frac{13ab}{3n^2}\max\{(n-b)^2, (n-a-b)^2, c^2\}. \tag{4.9}
\]
For $i \in B_c$, $\big|1 - |x_i|\big| \le 4n_t/n$, and so
\[
E_{B_c}(x) \le |B_c|^2\max_{i,j\in B_c}\big(|x_i - x_j| - 1\big)^2 \le 8m^2 n_c^2.
\]
In addition, for $i \in B_t$, $\big|1 - |x_i|\big| \le 3\ell n_t/n$, and so
\[
E_{B_t}(x) \le \big|\{i,j\in B_t \mid d(i,j) = 1\}\big|\left(1 + \frac{6\ell n_t}{n}\right)^2 + \big|\{i,j\in B_t \mid d(i,j) = 2\}\big|\,\frac{1}{4}\left(\frac{6\ell n_t}{n}\right)^2 \le (2\ell - 1)\ell n_t^2 + \frac{40\,\ell^3 n_t^3}{n}
\]
and
\[
E_{B_c,B_t}(x) \le \left(1 + \frac{6\ell n_t}{n}\right)^2\big|\{i\in B_c,\, j\in B_t \mid d(i,j) = 1\}\big| + 2\times\frac{1}{4}\left(2 + \frac{6\ell n_t}{n}\right)^2\big|\{i\in B_2,\, j\in B_3 \mid d(i,j) = 2\}\big| + 2\times\frac{1}{4}\left(\frac{6\ell n_t}{n}\right)^2\big|\{i\in B_1\cup B_2,\, j\in B_5 \mid d(i,j) = 2\}\big|
\]
\[
\le \left(1 + \frac{18\ell n_t}{n}\right)(2\ell - 3)n_t\,2mn_c + \frac{1}{2}\left(4 + \frac{30\ell n_t}{n}\right)(m - q)n_c n_t + \frac{1}{2}\,\frac{6\ell n_t}{n}(2m + q)n_c n_t
\]
\[
\le (4\ell - 6)mn_t n_c + 2(m - q)n_t n_c + \frac{100\,m\ell^2 n_t^2 n_c}{n}.
\]
The quantity $E_{B_4}(x)$ is given by (4.8) with $a = n_v$, and so
\[
E_{B_4}(x) = \frac{n_v(n_v - 1)\big(3n^2 - 4n(n_v + 1) + 2n_v(n_v + 1)\big)}{6n^2} \le \frac{(n-1)(n-2) - 2ns + 3s^2 + 3s}{6}.
\]
The quantity $E_{B_1,B_4}(x)$ is given by (4.9) with
\[
c = 3n_t + s/2, \qquad a = mn_c, \qquad b = n_v,
\]
and so
\[
E_{B_1,B_4}(x) \le \frac{mn_c n_v^3}{3n^2} + \frac{13\,mn_c n_v s^2}{3n^2} \le \frac{mn_c(n - 3s)}{3} + \frac{16\,mn_c s^2}{3n}.
\]
Similarly, the quantity $E_{B_2,B_4}(x)$ is given by (4.9) with
\[
c = \tfrac{3}{2}n_t + s/2 - qn_c, \qquad a = (m - q)n_c, \qquad b = n_v,
\]
and so
\[
E_{B_2,B_4}(x) \le \frac{(m - q)n_c n_v^3}{3n^2} + \frac{13\,(m - q)n_c n_v s^2}{3n^2} \le \frac{(m - q)n_c(n - 3s)}{3} + \frac{16\,(m - q)n_c s^2}{3n}.
\]
Next, we estimate $E_{B_3^j,B_4}(x)$. Using formula (4.9) with
\[
c = n_t + (j - p/2)n_c + \sum_{i=j}^{p}\ell_i, \qquad a = \ell_j, \qquad b = n_v,
\]
where $\ell_j := |B_3^j|$, we have
\[
E_{B_3^j,B_4}(x) \le \frac{\ell_j n_v^3}{3n^2} + \frac{13\,\ell_j n_v s^2}{3n^2} \le \frac{\ell_j(n - 3s)}{3} + \frac{16\,\ell_j s^2}{3n}.
\]
Combining the individual estimates for each $B_3^j$ gives us
\[
E_{B_3,B_4}(x) \le \frac{\ell n_t(n - 3s)}{3} + \frac{16\,\ell n_t s^2}{3n}.
\]
Finally, we can combine all of our above estimates to obtain an upper bound on $E(x)$. We have
\[
E(x) \le (2\ell - 1)\ell n_t^2 + (4\ell - 6)mn_t n_c + 2(m - q)n_t n_c + \frac{(n-1)(n-2)}{6} - \frac{ns}{3} + \frac{s^2 + s}{2} + \frac{2}{3}mn_c(n - 3s) + \frac{2}{3}\ell n_t(n - 3s) + 200\,m^2 n_c^2
\]
\[
= \frac{(n-1)(n-2)}{6} - \frac{s^2}{2} + \frac{s}{2} + (2\ell - 1)\ell n_t^2 + \big[(4\ell - 6)m + 2(m - q)\big]n_t n_c + 200\,m^2 n_c^2
\]
\[
\le \frac{(n-1)(n-2)}{6} - \ell n_t^2 - 2(2m + q)n_t n_c + 200\,m^2 n_c^2.
\]
We define the ceiling of this final upper bound to be the quantity $L$. The remainder of the proof consists of showing that if our given instance satisfies at most $2q - 2$ clauses, then any layout has objective value at least $L + 1$.
Suppose, to the contrary, that there exists some layout $x'$ (shifted so that $\sum_i x'_i = 0$) with $E(x') < L + 1$. From the analysis above, $|x'_i| \le 1 + 1/n_t^5$ for all $i$. Intuitively, an optimal layout should have a large fraction of the vertices at distance two on opposite sides. To make this intuition precise, we first note the following.

Lemma 17. Let $x \in \mathbb{R}^n$ be a layout of the clique $K_n$. Then $E(x) \ge (n-1)(n-2)/6$.

Proof. The first order conditions (4.7) imply that the optimal layout (up to translation and vertex reordering) is given by $x'_i = \big(2i - (n+1)\big)/n$. By (4.8), $E(x') = (n-1)(n-2)/6$.
Using Lemma 17, we can lower bound $E(x')$ by
\[
E(x') = \sum_{i<j}\big(|x'_i - x'_j| - 1\big)^2 - \sum_{(i,j)\in P}\big(|x'_i - x'_j| - 1\big)^2 + \frac{1}{4}\sum_{(i,j)\in P}\big(|x'_i - x'_j| - 2\big)^2 \ge \frac{(n-1)(n-2)}{6} - \sum_{(i,j)\in P}\Big[\frac{3}{4}(x'_i - x'_j)^2 - |x'_i - x'_j|\Big].
\]
Therefore, by assumption,
\[
\sum_{(i,j)\in P}\Big[\frac{3}{4}(x'_i - x'_j)^2 - |x'_i - x'_j|\Big] \ge \frac{(n-1)(n-2)}{6} - (L + 1) \ge \ell n_t^2 + 2(2m + q)n_t n_c - 200\,m^2 n_c^2 - 2.
\]
We note that the function $\frac{3}{4}x^2 - x$ equals one at $x = 2$ and is non-positive for $x \in [0, \frac{4}{3}]$. Because $|x'_i| \le 1 + 1/n_t^5$ for all $i$,
\[
\max_{(i,j)\in P}\Big[\frac{3}{4}(x'_i - x'_j)^2 - |x'_i - x'_j|\Big] \le \frac{3}{4}\left(2 + \frac{2}{n_t^5}\right)^2 - \left(2 + \frac{2}{n_t^5}\right) \le 1 + \frac{7}{n_t^5}.
\]
Let
\[
P' := \Big\{(i,j)\in P \;\Big|\; x'_i \le -\frac{1}{6} \text{ and } x'_j \ge \frac{1}{6}\Big\}.
\]
By assumption, $|P'|$ is at most $\ell n_t^2 + 2(2m + q - 1)n_t n_c$, as otherwise the corresponding instance could satisfy at least $2q$ clauses, a contradiction.
However, the quantity $\big[\frac{3}{4}(x'_i - x'_j)^2 - |x'_i - x'_j|\big]$ is non-positive for all $(i,j) \in P\setminus P'$. Therefore,
\[
\left(1 + \frac{7}{n_t^5}\right)|P'| \ge \ell n_t^2 + 2(2m + q)n_t n_c - 200\,m^2 n_c^2 - 2,
\]
which implies that
\[
|P'| \ge \left(1 - \frac{7}{n_t^5}\right)\Big(\ell n_t^2 + 2(2m + q)n_t n_c - 200\,m^2 n_c^2 - 2\Big) \ge \ell n_t^2 + 2(2m + q)n_t n_c - 200\,m^2 n_c^2 - 2000 > \ell n_t^2 + 2(2m + q - 1)n_t n_c + n_t n_c,
\]
a contradiction. This completes the proof, with a gap of one.
4.3.3 An Approximation Algorithm
In this subsection, we formally describe an approximation algorithm using tools from the
Dense CSP literature, and prove theoretical guarantees for the algorithm.
Preliminaries: Greedy Algorithms for Max-CSP
A long line of work studies the feasibility of solving the Max-CSP problem under various
related pseudorandomness and density assumptions. In our case, an algorithm with mild
dependence on the alphabet size is extremely important. A very simple greedy approach,
proposed and analyzed by Mathieu and Schudy [81, 101] (see also [133]), satisfies this
requirement.
Theorem 18 ([81, 101]). Suppose that $\Sigma$ is a finite alphabet, $n \ge 1$ is a positive integer, and for every $\{i, j\} \in \binom{[n]}{2}$ we have a function $f_{ij} : \Sigma \times \Sigma \to [-M, M]$. Then for any $\epsilon > 0$, Algorithm GREEDYCSP with $t_0 = O(1/\epsilon^2)$ runs in time $n^2 |\Sigma|^{O(1/\epsilon^2)}$ and returns $x_1, \ldots, x_n \in \Sigma$ such that
$$\mathbb{E}\sum_{i \ne j} f_{ij}(x_i, x_j) \;\ge\; \sum_{i \ne j} f_{ij}(x_i^*, x_j^*) - \epsilon M n^2$$
for any $x_1^*, \ldots, x_n^* \in \Sigma$, where $\mathbb{E}$ denotes the expectation over the randomness of the algorithm.

Algorithm 4 Greedy Algorithm for Dense CSPs [81, 101]
function GreedyCSP($\Sigma$, $n$, $t_0$, $\{f_{ij}\}$)
    Shuffle the order of the variables $x_1, \ldots, x_n$ by a random permutation.
    for all assignments $(x_1, \ldots, x_{t_0}) \in \Sigma^{t_0}$ do
        for $i = t_0 + 1, \ldots, n$ do
            Choose $x_i \in \Sigma$ to maximize $\sum_{j < i} f_{ij}(x_j, x_i)$
        end for
        Record $x$ and the objective value $\sum_{i \ne j} f_{ij}(x_i, x_j)$.
    end for
    Return the assignment $x$ found with maximum objective value.
end function
In comparison, we note that computing the maximizer by brute force would run in time $|\Sigma|^n$, i.e., exponentially slower in $n$. This guarantee is stated in expectation but, if desired, can be converted to a high-probability guarantee by using Markov's inequality and repeating the algorithm multiple times. We use GREEDYCSP to solve a minimization problem instead of a maximization, which corresponds to negating all of the functions $f_{ij}$. To do so, we require estimates on the error due to discretization of our domain.
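As an illustration, the greedy procedure of Theorem 18 can be sketched in a few lines of Python (a toy sketch, not the authors' implementation; the pairwise payoff interface `f(i, j, a, b)`, the seeded shuffle, and the smoke-test payoff at the end are our own choices):

```python
import itertools
import random

def greedy_csp(sigma, n, t0, f, seed=0):
    """Greedy algorithm for dense Max-CSPs, after Mathieu-Schudy.

    sigma: list of alphabet symbols; f(i, j, a, b): payoff when x_i = a, x_j = b.
    Brute-forces the first t0 variables (in a random order), then assigns each
    remaining variable greedily against the variables already placed.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)

    def total(assign):
        return sum(f(i, j, assign[i], assign[j])
                   for i in range(n) for j in range(n) if i != j)

    best, best_val = None, float("-inf")
    for prefix in itertools.product(sigma, repeat=t0):
        assign = {}
        for pos, sym in zip(order[:t0], prefix):
            assign[pos] = sym
        for idx in range(t0, n):
            v = order[idx]
            # Greedy choice: maximize payoff against the placed prefix.
            assign[v] = max(sigma, key=lambda a: sum(
                f(u, v, assign[u], a) + f(v, u, a, assign[u])
                for u in order[:idx]))
        val = total(assign)
        if val > best_val:
            best, best_val = assign, val
    return [best[i] for i in range(n)], best_val

# Tiny smoke example (hypothetical payoff: reward matching a fixed target).
target = [0, 1, 2, 0]
def f(i, j, a, b):
    return 1 if (a == target[i] and b == target[j]) else 0

xs, val = greedy_csp([0, 1, 2], 4, 2, f)
```

Since every prefix assignment is enumerated, the branch whose prefix agrees with the target recovers it exactly, so the returned objective equals the optimum $n(n-1) = 12$ here.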
Lemma 18. For $d, R > 0$ and $y \in \mathbb{R}^r$ with $\|y\| \le R$, the function $x \mapsto \big(\|x - y\|/d - 1\big)^2$ is $\frac{2}{d}\max(1, 2R/d)$-Lipschitz on $B_R = \{x : \|x\| \le R\}$.

Proof. We first prove that the function $x \mapsto (x/d - 1)^2$ is $\frac{2}{d}\max(1, R/d)$-Lipschitz on the interval $[0, R]$. Because the derivative of the function is $\frac{2}{d}(x/d - 1)$ and $\big|\frac{2}{d}(x/d - 1)\big| \le \frac{2}{d}\max(1, R/d)$ on $[0, R]$, this result follows from the mean value theorem.
Algorithm 5 Approximation Algorithm KKSCHEME
function KKScheme($\epsilon_1$, $\epsilon_2$, $R$):
    Build an $\epsilon_1$-net $N_{\epsilon_1}$ of $B_R = \{x : \|x\| \le R\} \subset \mathbb{R}^r$ as in Lemma 12.
    Apply the GREEDYCSP algorithm of Theorem 18 with $\epsilon = \epsilon_2$ to approximately minimize $E(x_1, \ldots, x_n)$ over $x_1, \ldots, x_n \in N_{\epsilon_1}$.
    Return $x_1, \ldots, x_n$.
end function

Because the function $\|x - y\|$ is 1-Lipschitz and $\|x - y\| \le \|x\| + \|y\| \le 2R$ by the triangle inequality, the result follows from the fact that $x \mapsto (x/d - 1)^2$ is $\frac{2}{d}\max(1, 2R/d)$-Lipschitz on $[0, 2R]$ (applying the first part with $2R$ in place of $R$) and a composition of Lipschitz functions is Lipschitz.
Combining this result with Lemma 12 we can bound the loss in objective value due to
restricting to a well-chosen discretization.
Lemma 19. Let $x_1, \ldots, x_n \in \mathbb{R}^r$ be arbitrary vectors such that $\|x_i\| \le R$ for all $i$, and let $\epsilon > 0$ be arbitrary. Define $N_\epsilon$ to be an $\epsilon$-net of $B_R$ as in Lemma 12, so $|N_\epsilon| \le (3R/\epsilon)^r$. For any input metric over $[n]$ with $\min_{i,j\in[n]} d(i,j) = 1$, there exist $x'_1, \ldots, x'_n \in N_\epsilon$ such that
$$E(x'_1, \ldots, x'_n) \;\le\; E(x_1, \ldots, x_n) + 4\epsilon R n^2,$$
where $E$ is (4.5) defined with respect to an arbitrary graph with $n$ vertices.

Proof. By Lemma 18, the energy $E(x_1, \ldots, x_n)$ is the sum of $\binom{n}{2} \le n^2/2$ terms which, for each $i$ and $j$, are individually $4R$-Lipschitz in $x_i$ and $x_j$ (since $d(i,j) \ge 1$). Therefore, defining $x'_i$ to be the closest point of $N_\epsilon$ to $x_i$ for all $i$ gives the desired result.
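The discretization loss in Lemma 19 can be probed numerically in one dimension (a toy sketch under our own assumptions: a path graph, a uniform grid as the net, and arbitrary choices of $R$ and $\epsilon$):

```python
def kk_energy(pos, d):
    """Kamada-Kawai energy E = sum_{i<j} (|x_i - x_j|/d_ij - 1)^2."""
    n = len(pos)
    return sum((abs(pos[i] - pos[j]) / d[i][j] - 1.0) ** 2
               for i in range(n) for j in range(i + 1, n))

# Path graph on 4 vertices; d_ij = graph distance (minimum distance 1).
d = [[max(abs(i - j), 1) for j in range(4)] for i in range(4)]
R, eps = 2.0, 0.05
x = [-1.53, -0.49, 0.51, 1.5]                       # a layout inside B_R
net = [-R + eps * t for t in range(int(2 * R / eps) + 1)]
x_snapped = [min(net, key=lambda p: abs(p - xi)) for xi in x]

# Lemma 19 bounds the snapping loss by 4 * eps * R * n^2.
gap = kk_energy(x_snapped, d) - kk_energy(x, d)
```

Each point moves by at most $\epsilon/2$ when snapped to the grid, so the observed `gap` sits far below the worst-case bound.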
This result motivates a straightforward algorithm for producing nearly-optimal layouts of a graph. Given some graph, by Lemma 15, a sufficiently large radius can be chosen so that there exists an optimal layout in a ball of that radius. By constructing an $\epsilon$-net of that ball, and applying the GREEDYCSP algorithm to the resulting discretization, we obtain a nearly-optimal layout with theoretical guarantees. This technique is formally described in the algorithm KKSCHEME. We are now prepared to prove Theorem 17.
By combining Lemma 19 with Theorem 18 (used as a minimization instead of a maximization algorithm), the output $x_1, \ldots, x_n$ of KKSCHEME satisfies
$$E(x_1, \ldots, x_n) \;\le\; E(x_1^*, \ldots, x_n^*) + 4\epsilon_1 R n^2 + \epsilon_2 R^2 n^2$$
and runs in time $n^2\, 2^{O(1/\epsilon_2^2)\, r \log(3R/\epsilon_1)}$. Taking $\epsilon_2 = \Theta(\epsilon/R^2)$ and $\epsilon_1 = \Theta(\epsilon/R)$ completes the proof of Theorem 17.
The runtime can be improved to $n^2 + (R/\epsilon)^{O(rR^4/\epsilon^2)}$ using a slightly more complex greedy CSP algorithm [81]. Also, by the usual argument, a high-probability guarantee can be derived by repeating the algorithm $O(\log(2/\delta))$ times, where $\delta > 0$ is the desired failure probability.
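To make the pipeline concrete, here is a one-dimensional toy version of KKSCHEME (our own simplification, not the analyzed algorithm: the net is a uniform grid on $[-R, R]$ and the greedy CSP step brute-forces only the first `t0` placements):

```python
import itertools

def kk_scheme_1d(d, R, eps1, t0=1):
    """Sketch of KKSCHEME in one dimension: build an eps1-net of [-R, R]
    and run a greedy CSP heuristic (as a minimizer) over the net points.
    d: symmetric matrix of target distances with min off-diagonal d_ij = 1."""
    n = len(d)
    net = [-R + eps1 * t for t in range(int(2 * R / eps1) + 1)]

    def pair_cost(i, j, a, b):
        return (abs(a - b) / d[i][j] - 1.0) ** 2

    best, best_val = None, float("inf")
    for prefix in itertools.product(net, repeat=t0):
        pos = list(prefix)
        for v in range(t0, n):
            # Greedy: place vertex v to minimize cost against placed vertices.
            pos.append(min(net, key=lambda a: sum(
                pair_cost(u, v, pos[u], a) for u in range(v))))
        val = sum(pair_cost(i, j, pos[i], pos[j])
                  for i in range(n) for j in range(i + 1, n))
        if val < best_val:
            best, best_val = pos, val
    return best, best_val

# Path graph on 3 vertices: the layout (-1, 0, 1) realizes every distance.
d = [[1, 1, 2], [1, 1, 1], [2, 1, 1]]
pos, best_val = kk_scheme_1d(d, R=1.5, eps1=0.5)
```

On this instance the net contains an exact zero-energy layout, so the greedy search recovers objective value 0.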
Chapter 5
Error Estimates for the Lanczos Method
5.1 Introduction
The computation of extremal eigenvalues of matrices is one of the most fundamental
problems in numerical linear algebra. When a matrix is large and sparse, methods such
as the Jacobi eigenvalue algorithm and QR algorithm become computationally infeasible,
and, therefore, techniques that take advantage of the sparsity of the matrix are required.
Krylov subspace methods are a powerful class of techniques for approximating extremal
eigenvalues, most notably the Arnoldi iteration for non-symmetric matrices and the Lanczos
method for symmetric matrices.
The Lanczos method is a technique that, given a symmetric matrix $A \in \mathbb{R}^{n\times n}$ and an initial vector $b \in \mathbb{R}^n$, iteratively computes a tridiagonal matrix $T_m \in \mathbb{R}^{m\times m}$ that satisfies $T_m = Q_m^T A Q_m$, where $Q_m \in \mathbb{R}^{n\times m}$ is an orthonormal basis for the Krylov subspace
$$\mathcal{K}_m(A, b) = \operatorname{span}\{b, Ab, \ldots, A^{m-1}b\}.$$
The eigenvalues of $T_m$, denoted by $\lambda_1^{(m)}(A, b) \ge \ldots \ge \lambda_m^{(m)}(A, b)$, are the Rayleigh-Ritz approximations to the eigenvalues $\lambda_1(A) \ge \ldots \ge \lambda_n(A)$ of $A$ on $\mathcal{K}_m(A, b)$, and, therefore, are given by
$$\lambda_i^{(m)}(A, b) = \min_{\substack{U \subset \mathcal{K}_m(A,b) \\ \dim(U) = m+1-i}}\; \max_{\substack{x \in U \\ x \ne 0}} \frac{x^T A x}{x^T x}, \qquad i = 1, \ldots, m, \tag{5.1}$$
or, equivalently,
$$\lambda_i^{(m)}(A, b) = \max_{\substack{U \subset \mathcal{K}_m(A,b) \\ \dim(U) = i}}\; \min_{\substack{x \in U \\ x \ne 0}} \frac{x^T A x}{x^T x}, \qquad i = 1, \ldots, m. \tag{5.2}$$
This description of the Lanczos method is sufficient for a theoretical analysis of error (i.e., without round-off error), but, for completeness, we provide a short description of the Lanczos method (when $\mathcal{K}_m(A, b)$ is full-rank) in Algorithm 6 [113]. For a more detailed discussion of the nuances of practical implementation and techniques to minimize the effects of round-off error, we refer the reader to [47, Section 10.3]. If $A$ has $N$ non-zero entries, then Algorithm 6 outputs $T_m$ after approximately $(2N + 8n)m$ floating point operations. A number of different techniques, such as the divide-and-conquer algorithm with the fast multipole method [23], can easily compute the eigenvalues of a tridiagonal matrix. The complexity of this computation is negligible compared to that of the Lanczos algorithm, as, in practice, $m$ is typically orders of magnitude smaller than $n$.
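For concreteness, a bare-bones NumPy version of this iteration (our own sketch, without the reorthogonalization any practical implementation would add) looks as follows:

```python
import numpy as np

def lanczos(A, b, m):
    """Build the m x m tridiagonal matrix T_m = Q_m^T A Q_m.

    Plain three-term recurrence; in floating point, practical codes add
    (partial) reorthogonalization to combat loss of orthogonality.
    """
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m)
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(m):
        v = A @ Q[:, i]
        alpha[i] = Q[:, i] @ v
        v -= alpha[i] * Q[:, i]
        if i > 0:
            v -= beta[i - 1] * Q[:, i - 1]
        beta[i] = np.linalg.norm(v)
        if beta[i] == 0:                 # invariant subspace found
            break
        Q[:, i + 1] = v / beta[i]
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

rng = np.random.default_rng(0)
M = rng.standard_normal((200, 200))
A = (M + M.T) / 2                        # arbitrary symmetric test matrix
T = lanczos(A, rng.standard_normal(200), 40)
ritz_max = np.linalg.eigvalsh(T).max()   # top Ritz value approximates lambda_1(A)
```

The Ritz values are Rayleigh quotients over the Krylov subspace, so the top Ritz value approaches $\lambda_1(A)$ from below as $m$ grows.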
Equation (5.1) for the Ritz values $\lambda_i^{(m)}$ illustrates the significant improvement that the Lanczos method provides over the power method for extremal eigenvalues. Whereas the power method uses only the iterate $A^m b$ as an approximation of an eigenvector associated with the largest-magnitude eigenvalue, the Lanczos method uses the span of all of the iterates of the power method (given by $\mathcal{K}_{m+1}(A, b)$). However, the analysis of the Lanczos method is significantly more complicated than that of the power method. Error estimates for extremal eigenvalue approximation using the Lanczos method have been well studied, most notably by Kaniel [56], Paige [91], Saad [99], and Kuczynski and Wozniakowski [67] (other notable work includes [30, 36, 68, 84, 92, 104, 105, 125]). The work of Kaniel, Paige, and Saad focused on the convergence of the Lanczos method as $m$ increases, and, therefore, their results have strong dependence on the spectrum of the matrix $A$ and the choice of initial
vector $b$. For example, a standard result of this type is the estimate
$$\frac{\lambda_1(A) - \lambda_1^{(m)}(A, b)}{\lambda_1(A) - \lambda_n(A)} \;\le\; \left(\frac{\tan \angle(b, \varphi_1)}{T_{m-1}(1 + 2\gamma)}\right)^{2}, \qquad \gamma = \frac{\lambda_1(A) - \lambda_2(A)}{\lambda_2(A) - \lambda_n(A)}, \tag{5.3}$$
where $\varphi_1$ is the eigenvector corresponding to $\lambda_1$, and $T_{m-1}$ is the $(m-1)^{th}$ degree Chebyshev polynomial of the first kind [100, Theorem 6.4]. Kuczynski and Wozniakowski took a quite different approach, and estimated the maximum expected relative error $(\lambda_1 - \lambda_1^{(m)})/\lambda_1$ over all $n \times n$ symmetric positive definite matrices, resulting in error estimates that depend only on the dimension $n$ and the number of iterations $m$. They produced the estimate
$$\sup_{A \in \mathcal{S}^n_{++}} \mathbb{E}_{b \sim \mathcal{U}(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A, b)}{\lambda_1(A)}\right] \;\le\; .103\; \frac{\ln^2\!\big(n(m-1)^4\big)}{(m-1)^2} \tag{5.4}$$
for all $n \ge 8$ and $m \ge 4$, where $\mathcal{S}^n_{++}$ is the set of all $n \times n$ symmetric positive definite matrices, and the expectation is with respect to the uniform probability measure on the hypersphere $S^{n-1} = \{x \in \mathbb{R}^n \mid \|x\| = 1\}$ [67, Theorem 3.2]. One can quickly verify that equation (5.4) also holds when the $\lambda_1(A)$ term in the denominator is replaced by $\lambda_1(A) - \lambda_n(A)$, and the supremum over $\mathcal{S}^n_{++}$ is replaced by the maximum over the set of all $n \times n$ symmetric matrices, denoted by $\mathcal{S}^n$.
Both of these approaches have benefits and drawbacks. If an extremal eigenvalue is known to have a reasonably large eigenvalue gap (based on application or construction), then a distribution-dependent estimate provides a very good approximation of error, even for small $m$. However, if the eigenvalue gap is not especially large, then distribution-dependent estimates can significantly overestimate error, and estimates that depend only on $n$ and $m$ are preferable. This is illustrated by the following elementary, yet enlightening, example.
Example 1. Let $A \in \mathcal{S}^n_{++}$ be the tridiagonal matrix resulting from the discretization of the Laplacian operator on an interval with Dirichlet boundary conditions, namely, $A_{i,i} = 2$ for $i = 1, \ldots, n$ and $A_{i,i+1} = A_{i+1,i} = -1$ for $i = 1, \ldots, n-1$. The eigenvalues of $A$ are given by $\lambda_i(A) = 2 + 2\cos(i\pi/(n+1))$, $i = 1, \ldots, n$. Consider the approximation of $\lambda_1(A)$ by $m$ iterations of the Lanczos method. For a random choice of $b$, the expected
Algorithm 6 Lanczos Method
Input: symmetric matrix $A \in \mathbb{R}^{n\times n}$, vector $b \in \mathbb{R}^n$, number of iterations $m$.
Output: symmetric tridiagonal matrix $T_m \in \mathbb{R}^{m\times m}$, $T_m(i,i) = \alpha_i$, $T_m(i,i+1) = \beta_i$, satisfying $T_m = Q_m^T A Q_m$, where $Q_m = [q_1\ \ldots\ q_m]$.
Set $\beta_0 = 0$, $q_0 = 0$, $q_1 = b/\|b\|$.
For $i = 1, \ldots, m$:
    $v = A q_i$
    $\alpha_i = q_i^T v$
    $v = v - \alpha_i q_i - \beta_{i-1} q_{i-1}$
    $\beta_i = \|v\|$
    $q_{i+1} = v/\beta_i$
value of $\tan^2\angle(b, \varphi_1)$ is $(1 + o(1))\,n$. If $\tan^2\angle(b, \varphi_1) = Cn$ for some constant $C$, then (5.3) produces the estimate
$$\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \;\le\; C n\, T_{m-1}\!\left(1 + 2\tan\Big(\frac{\pi}{2(n+1)}\Big)\tan\Big(\frac{3\pi}{2(n+1)}\Big)\right)^{-2} \approx n\,\big(1 + O(n^{-1})\big)^{-m} \approx n.$$
In this instance, the estimate is a trivial one for all choices of $m$, which differs greatly from the error estimate (5.4) of order $\ln^2 n/m^2$. The exact same estimate holds for the smallest eigenvalue $\lambda_n(A)$ when the corresponding bounds are applied, since $4I - A$ is similar to $A$.

Now, consider the approximation of the largest eigenvalue of $B = A^{-1}$ by $m$ iterations of the Lanczos method. The matrix $B$ possesses a large gap between the largest and second-largest eigenvalue, which results in a value of $\gamma$ for (5.3) that remains bounded below by a constant independent of $n$, namely
$$\gamma = \frac{2\cos(\pi/(n+1)) + 1}{2\cos(\pi/(n+1)) - 1}.$$
Therefore, in this instance, the estimate (5.3) illustrates a constant convergence rate, produces non-trivial bounds (for typical $b$) for $m = \Theta(\ln n)$, and is preferable to the error estimate (5.4) of order $\ln^2 n/m^2$.
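The quantities in Example 1 are easy to check numerically (a quick sketch; the choice $n = 400$ is arbitrary):

```python
import numpy as np

n = 400
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam = np.sort(np.linalg.eigvalsh(A))[::-1]            # descending order

# Closed form: lambda_i = 2 + 2 cos(i pi/(n+1)), i = 1..n.
i = np.arange(1, n + 1)
closed_form_ok = np.allclose(lam, 2 + 2 * np.cos(i * np.pi / (n + 1)),
                             atol=1e-8)

# The gap parameter gamma of (5.3) vanishes for A ...
gamma_A = (lam[0] - lam[1]) / (lam[1] - lam[-1])

# ... but for B = A^{-1} it stays near 3, matching (2cos+1)/(2cos-1).
mu = np.sort(1 / lam)[::-1]
gamma_B = (mu[0] - mu[1]) / (mu[1] - mu[-1])
c = 2 * np.cos(np.pi / (n + 1))
```

In fact, using $\cos(2\theta) = 2\cos^2\theta - 1$, the identity $\gamma_B = (2\cos(\pi/(n+1))+1)/(2\cos(\pi/(n+1))-1)$ holds exactly, not just in the limit.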
More generally, if a matrix $A$ has eigenvalue gap $\gamma \lesssim n^{-\alpha}$ and the initial vector $b$ satisfies $\tan^2\angle(b, \varphi_1) \gtrsim n$, then the error estimate (5.3) is a trivial one for $m \lesssim n^{\alpha/2}$. This implies that the estimate (5.3) is most useful when the eigenvalue gap is constant or tends to zero very slowly as $n$ increases. When the gap is not especially large (say, $n^{-\alpha}$, $\alpha$ constant), then uniform error estimates are preferable for small values of $m$. In this work, we focus on uniform bounds, namely, error estimates that hold uniformly over some large set of matrices (typically $\mathcal{S}^n_{++}$ or $\mathcal{S}^n$). We begin by recalling some of the key existing uniform error estimates for the Lanczos method.
5.1.1 Related Work
Uniform error estimates for the Lanczos method have been produced almost exclusively
for symmetric positive definite matrices, as error estimates for extremal eigenvalues of
symmetric matrices can be produced from estimates for ๐๐++ relatively easily. The majority
of results apply only to either ๐1(๐ด), ๐๐(๐ด), or some function of the two (i.e., condition
number). All estimates are probabilistic and take the initial vector ๐ to be uniformly
distributed on the hypersphere. Here we provide a brief description of some key uniform
error estimates previously produced for the Lanczos method.
In [67], Kuczynski and Wozniakowski produced a complete analysis of the power method and provided a number of upper bounds for the Lanczos method. Most notably, they produced the error estimate (5.4) and provided the following upper bound for the probability that the relative error $(\lambda_1 - \lambda_1^{(m)})/\lambda_1$ is greater than some value $\epsilon$:
$$\sup_{A \in \mathcal{S}^n_{++}} \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)} > \epsilon\right] \;\le\; 1.648\,\sqrt{n}\; e^{-\sqrt{\epsilon}\,(2m-1)}. \tag{5.5}$$
However, the authors were unable to produce any lower bounds for (5.4) or (5.5), and stated that a more sophisticated analysis would most likely be required. In the same paper, they performed numerical experiments for the Lanczos method that produced errors of order $m^{-2}$, which led the authors to suggest that the error estimate (5.4) may be an overestimate, and that the $\ln^2 n$ term may be unnecessary.
In [68], Kuczynski and Wozniakowski noted that the above estimates immediately translate to relative error estimates for minimal eigenvalues when the error $\lambda_n^{(m)} - \lambda_n$ is considered relative to $\lambda_1$ or $\lambda_1 - \lambda_n$ (both normalizations can be shown to produce the same bound). However, one can quickly verify that there exist sequences in $\mathcal{S}^n_{++}$ for which the quantity $\mathbb{E}\big[(\lambda_n^{(m)} - \lambda_n)/\lambda_n\big]$ is unbounded. These results for minimal eigenvalues, combined with (5.4), led to error bounds for estimating the condition number of a matrix. Unfortunately, error estimates for the condition number face the same issue as the quantity $(\lambda_n^{(m)} - \lambda_n)/\lambda_n$, and, therefore, only estimates that depend on the value of the condition number can be produced.
The proof technique used to produce (5.4) works specifically for the quantity $\mathbb{E}\big[(\lambda_1 - \lambda_1^{(m)})/\lambda_1\big]$ (i.e., the one-norm), and does not carry over to more general $q$-norms of the form $\mathbb{E}\big[\big((\lambda_1 - \lambda_1^{(m)})/\lambda_1\big)^q\big]^{1/q}$, $q \in (1, \infty)$. Later, in [30, Theorem 5.2], Del Corso and Manzini produced an upper bound of order $(\sqrt{n}/m)^{1/q}$ for arbitrary $q$-norms, given by
$$\sup_{A \in \mathcal{S}^n_{++}} \mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[\left(\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^{q}\right]^{\frac1q} \;\lesssim\; \frac{1}{m^{1/q}}\left(\frac{\Gamma\big(q - \frac12\big)\,\Gamma\big(\frac{n}{2}\big)}{\Gamma(q)\,\Gamma\big(\frac{n-1}{2}\big)}\right)^{\frac1q}. \tag{5.6}$$
This bound is clearly worse than (5.4), and better bounds can be produced for arbitrary $q$ simply by making use of (5.5). Again, the authors were unable to produce any lower bounds.
More recently, the machine learning and optimization community has become increasingly interested in the problem of approximating the top singular vectors of a symmetric matrix by randomized techniques (for a review, see [49]). This has led to a wealth of new results in recent years regarding classical techniques, including the Lanczos method. In [84], Musco and Musco considered a block Krylov subspace algorithm similar to the block Lanczos method and showed that, with high probability, their algorithm achieves error $\epsilon$ in $m = O(\epsilon^{-1/2}\ln n)$ iterations, matching the bound (5.5) for a block size of one. Very recently, a number of lower bounds for a wide class of randomized algorithms were shown
in [104, 105], all of which can be applied to the Lanczos method as a corollary. First, the authors showed that the dependence on $n$ in the above upper bounds is in fact necessary. In particular, it follows from [105, Theorem A.1] that $\Omega(\log n)$ iterations are necessary to obtain a non-trivial error. In addition, as a corollary of their main theorem [105, Theorem 1], there exist $c_1, c_2 > 0$ such that for all $\epsilon \in (0, 1)$ there exists an $n_\epsilon = \mathrm{poly}(\epsilon^{-1})$ such that for all $n \ge n_\epsilon$ there exists a random matrix $A \in \mathcal{S}^n$ such that
$$\mathbb{P}\left[\frac{\rho(A) - \rho^{(m)}}{\rho(A)} \ge \frac{\epsilon}{12}\right] \;\ge\; 1 - e^{-n^{c_2}} \qquad \text{for all } m \le c_1\, \epsilon^{-1/2}\ln n, \tag{5.7}$$
where $\rho(\cdot)$ is the spectral radius of a matrix, $\rho^{(m)}$ denotes its Lanczos approximation after $m$ iterations, and the randomness is over both the initial vector and the matrix.
5.1.2 Contributions and Remainder of Chapter
In what follows, we prove improved upper bounds for the maximum expected error of the
Lanczos method in the ๐-norm, ๐ โฅ 1, and combine this with nearly-matching asymptotic
lower bounds with explicit constants. These estimates can be found in Theorem 19. The
upper bounds result from using a slightly different proof technique than that of (5.4), which
is more robust to arbitrary ๐-norms. Comparing the lower bounds of Theorem 19 to the
estimate (5.7), we make a number of observations. Whereas (5.7) results from a statistical
analysis of random matrices, our estimates follow from taking an infinite sequence of non-
random matrices with explicit eigenvalues and using the theory of orthogonal polynomials.
Our estimate for ๐ = ๐(ln๐) (in Theorem 19) is slightly worse than (5.7) by a factor
of ln2 ln๐, but makes up for this in the form of an explicit constant. The estimate for
๐ = ๐(๐1/2 lnโ1/2 ๐) (also in Theorem 19) does not have ๐ dependence, but illustrates a
useful lower bound, as it has an explicit constant and the ln๐ term becomes negligible as
๐ increases. The results (5.7) do not fully apply to this regime, as the dependence of the
required lower bound on ๐ is at least cubic in the inverse of the eigenvalue gap (see [105,
Theorem 6.1] for details).
To complement these bounds, we also provide an error analysis for matrices that have a
certain structure. In Theorem 20, we produce improved dimension-free upper bounds for matrices that have some level of eigenvalue "regularity" near $\lambda_1$. In addition, in Theorem 21, we prove a powerful result that can be used to determine, for any fixed $m$, the asymptotic relative error for any sequence of matrices $X_n \in \mathcal{S}^n$, $n = 1, 2, \ldots$, that exhibits suitable convergence of its empirical spectral distribution. Later, we perform numerical experiments that illustrate the practical usefulness of this result. Theorem 21, combined with estimates for Jacobi polynomials (see Proposition 13), illustrates that the inverse quadratic dependence on the number of iterations $m$ in the estimates produced throughout this chapter does not simply illustrate the worst case, but is actually indicative of the typical case in some sense.

In Corollary 3 and Theorem 22, we produce results similar to Theorem 19 for arbitrary eigenvalues $\lambda_i$. The lower bounds follow relatively quickly from the estimates for $\lambda_1$, but the upper bounds require some mild assumptions on the eigenvalue gaps of the matrix. These results mark the first uniform-type bounds for estimating arbitrary eigenvalues by the Lanczos method. In addition, in Corollary 4, we translate our error estimates for the extremal eigenvalues $\lambda_1(A)$ and $\lambda_n(A)$ into error bounds for the condition number of a symmetric positive definite matrix, but, as previously mentioned, the relative error of the condition number requires estimates that depend on the condition number itself. Finally, we present numerical experiments that support the accuracy and practical usefulness of the theoretical estimates detailed above.
The remainder of the chapter is organized as follows. In Section 5.2, we prove basic results regarding relative error and make a number of fairly straightforward observations. In Section 5.3, we prove asymptotic lower bounds and improved upper bounds for the relative error in an arbitrary $q$-norm. In Section 5.4, we produce a dimension-free error estimate for a large class of matrices and prove a theorem that can be used to determine the asymptotic relative error for any fixed $m$ and sequence of matrices $X_n \in \mathcal{S}^n$, $n = 1, 2, \ldots$, with suitable convergence of its empirical spectral distribution. In Section 5.5, under some mild additional assumptions, we prove a version of Theorem 19 for arbitrary eigenvalues $\lambda_i$, and extend our results for $\lambda_1$ and $\lambda_n$ to the condition number of a symmetric positive definite matrix. Finally, in Section 5.6, we perform a number of experiments and discuss how the numerical
results compare to the theoretical estimates in this work.
5.2 Preliminary Results
Because the Lanczos method applies only to symmetric matrices, all matrices in this
chapter are assumed to belong to ๐ฎ๐. The Lanczos method and the quantity (๐1(๐ด) โ
๐(๐)1 (๐ด, ๐))/(๐1(๐ด) โ ๐๐(๐ด)) are both unaffected by shifting and scaling, and so any
maximum over ๐ด โ ๐ฎ๐ can be replaced by a maximum over all ๐ด โ ๐ฎ๐ with ๐1(๐ด) and
๐๐(๐ด) fixed. Often for producing upper bounds, it is convenient to choose ๐1(๐ด) = 1 and
๐๐(๐ด) = 0. For the sake of brevity, we will often write ๐๐(๐ด) and ๐(๐)๐ (๐ด, ๐) as ๐๐ and ๐
(๐)๐
when the associated matrix ๐ด and vector ๐ are clear from context.
We begin by rewriting expression (5.2) for $\lambda_1^{(m)}$ in terms of polynomials. The Krylov subspace $\mathcal{K}_m(A,b)$ can be alternatively defined as
$$\mathcal{K}_m(A,b) = \{\,p(A)\,b \mid p \in \mathcal{P}_{m-1}\,\},$$
where $\mathcal{P}_{m-1}$ is the set of all real-valued polynomials of degree at most $m-1$. Suppose $A \in \mathcal{S}^n$ has eigendecomposition $A = Q\Lambda Q^T$, where $Q \in \mathbb{R}^{n\times n}$ is an orthogonal matrix and $\Lambda \in \mathbb{R}^{n\times n}$ is the diagonal matrix satisfying $\Lambda(i,i) = \lambda_i(A)$, $i = 1, \ldots, n$. Then we have the relation
$$\lambda_1^{(m)}(A,b) = \max_{\substack{x\in\mathcal{K}_m(A,b)\\ x\ne0}} \frac{x^T A x}{x^T x} = \max_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{b^T p^2(A)\,A\,b}{b^T p^2(A)\,b} = \max_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\tilde b^T p^2(\Lambda)\,\Lambda\,\tilde b}{\tilde b^T p^2(\Lambda)\,\tilde b},$$
where $\tilde b = Q^T b$. The relative error is given by
$$\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} = \min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{i=2}^n \tilde b_i^2\, p^2(\lambda_i)\,(\lambda_1 - \lambda_i)}{(\lambda_1 - \lambda_n)\sum_{i=1}^n \tilde b_i^2\, p^2(\lambda_i)},$$
and the expected $q^{th}$ moment of the relative error is given by
$$\mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[\left(\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n}\right)^{q}\right] = \int_{S^{n-1}} \min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \left[\frac{\sum_{i=2}^n \tilde b_i^2\, p^2(\lambda_i)\,(\lambda_1-\lambda_i)}{(\lambda_1-\lambda_n)\sum_{i=1}^n \tilde b_i^2\, p^2(\lambda_i)}\right]^{q} d\sigma(\tilde b),$$
where $\sigma$ is the uniform probability measure on $S^{n-1}$. Because the relative error does not depend on the norm of $\tilde b$ or the sign of any entry, we can replace the integral over $S^{n-1}$ by an integral of $y = (y_1, \ldots, y_n)$ over $[0,\infty)^n$ with respect to the joint chi-square probability density function
$$f_Y(y) = \frac{1}{(2\pi)^{n/2}}\,\exp\left\{-\frac12\sum_{i=1}^n y_i\right\}\prod_{i=1}^n y_i^{-\frac12} \tag{5.8}$$
of $n$ independent chi-square random variables $Y_1, \ldots, Y_n \sim \chi_1^2$ with one degree of freedom each. In particular, we have
$$\mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[\left(\frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n}\right)^{q}\right] = \int_{[0,\infty)^n} \min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \left[\frac{\sum_{i=2}^n y_i\, p^2(\lambda_i)\,(\lambda_1-\lambda_i)}{(\lambda_1-\lambda_n)\sum_{i=1}^n y_i\, p^2(\lambda_i)}\right]^{q} f_Y(y)\,dy. \tag{5.9}$$
Similarly, probabilistic estimates with respect to $b \sim \mathcal{U}(S^{n-1})$ can be replaced by estimates with respect to $Y_1, \ldots, Y_n \sim \chi_1^2$, as
$$\mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1-\lambda_n} \ge \epsilon\right] = \mathbb{P}_{Y_i\sim\chi_1^2}\left[\min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{i=2}^n Y_i\, p^2(\lambda_i)\,(\lambda_1-\lambda_i)}{(\lambda_1-\lambda_n)\sum_{i=1}^n Y_i\, p^2(\lambda_i)} \ge \epsilon\right]. \tag{5.10}$$
For the remainder of the chapter, we almost exclusively use equation (5.9) for expected relative error and equation (5.10) for probabilistic bounds on relative error. If $p$ minimizes the expression in equation (5.9) or (5.10) for a given $y$ or $Y$, then any polynomial of the form $\alpha p$, $\alpha \in \mathbb{R}\setminus\{0\}$, is also a minimizer. Therefore, without any loss of generality, we can alternatively minimize over the set $\mathcal{P}_{m-1}(1) = \{p \in \mathcal{P}_{m-1} \mid p(1) = 1\}$. For the sake of brevity, we will omit the subscripts under $\mathbb{E}$ and $\mathbb{P}$ when the underlying distribution is clear from context. In this work, we make use of asymptotic notation to express the limiting behavior of a function with respect to $n$. A function $f(n)$ is $O(g(n))$ if there exist $M, n_0 > 0$ such that $|f(n)| \le M g(n)$ for all $n \ge n_0$; $o(g(n))$ if for every $\epsilon > 0$ there exists an $n_\epsilon$ such that $|f(n)| \le \epsilon\, g(n)$ for all $n \ge n_\epsilon$; $\omega(g(n))$ if $|g(n)| = o(|f(n)|)$; and $\Theta(g(n))$ if $f(n) = O(g(n))$ and $g(n) = O(f(n))$.
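The equivalence underlying (5.9) and (5.10) — a uniform $b$ on the sphere versus independent $\chi_1^2$ weights — can be sanity-checked by simulation (a rough sketch; the diagonal test matrix, the value of $m$, and the sample counts are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, trials = 60, 5, 400
lam = np.linspace(0.0, 1.0, n)          # spectrum of a diagonal test matrix
A = np.diag(lam)

def ritz_top(b):
    # Orthonormalize the Krylov matrix [b, Ab, ..., A^{m-1}b] and take
    # the top eigenvalue of the projected matrix Q^T A Q.
    K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(m)])
    Q, _ = np.linalg.qr(K)
    return np.linalg.eigvalsh(Q.T @ A @ Q).max()

# Relative error with b uniform on the sphere (here lambda_1 - lambda_n = 1) ...
errs_sphere = []
for _ in range(trials):
    b = rng.standard_normal(n)
    errs_sphere.append(1.0 - ritz_top(b / np.linalg.norm(b)))

# ... and with components |b_i| = sqrt(Y_i), Y_i ~ chi^2_1, as in (5.9).
errs_chi = []
for _ in range(trials):
    b = np.sqrt(rng.chisquare(1.0, size=n))
    errs_chi.append(1.0 - ritz_top(b / np.linalg.norm(b)))
```

Since the relative error is invariant under sign flips of the entries of $b$, the two sampling schemes produce the same error distribution, and the empirical means agree up to Monte Carlo noise.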
5.3 Asymptotic Lower Bounds and Improved Upper Bounds
In this section, we obtain improved upper bounds for $\mathbb{E}\big[\big((\lambda_1 - \lambda_1^{(m)})/\lambda_1\big)^q\big]^{1/q}$, $q \ge 1$, and produce nearly-matching lower bounds. In particular, we prove the following theorem for the behavior of the relative error as a function of $n$ and $m$ as $n$ tends to infinity.
Theorem 19.
$$\max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge 1 - o(1)\right] \ge 1 - o(1/n) \qquad \text{for } m = o(\ln n),$$
$$\max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{.015\,\ln^2 n}{m^2\,\ln^2\ln n}\right] \ge 1 - o(1/n) \qquad \text{for } m = \Theta(\ln n),$$
$$\max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{1.08}{m^2}\right] \ge 1 - o(1/n) \qquad \text{for } m = o\big(n^{1/2}\ln^{-1/2} n\big) \text{ and } \omega(1),$$
and
$$\max_{A\in\mathcal{S}^n}\; \mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)}\right)^{q}\right]^{1/q} \le .068\;\frac{\ln^2\!\big(n(m-1)^{8q}\big)}{(m-1)^2} \qquad \text{for } n \ge 100,\; m \ge 10,\; q \ge 1.$$

Proof. The first result is a corollary of [105, Theorem A.1], and the remaining results follow from Lemmas 20, 21, and 22.
By Hölder's inequality, the lower bounds in Theorem 19 also hold for arbitrary $q$-norms, $q \ge 1$. All results are equally applicable to $\lambda_n$, as the Krylov subspace is unaffected by shifting and scaling, namely $\mathcal{K}_m(A, b) = \mathcal{K}_m(\alpha A + \beta I, b)$ for all $\alpha \ne 0$.
We begin by producing asymptotic lower bounds. The general technique is as follows. We choose an infinite sequence of matrices $\{A_n\}_{n=1}^\infty$, $A_n \in \mathcal{S}^n$, treat $m$ as a function of $n$, and show that, as $n$ tends to infinity, for "most" choices of an initial vector $b$, the relative error of this sequence of matrices is well approximated by an integral polynomial minimization problem for which bounds can be obtained. First, we recall a number of useful propositions regarding Gauss-Legendre quadrature, orthogonal polynomials, and Chernoff bounds for chi-square random variables.
Proposition 9 ([41], [114]). Let $f \in \mathcal{P}_{2k-1}$, let $\{x_i\}_{i=1}^k$ be the zeros of the $k^{th}$ degree Legendre polynomial $P_k(x)$, $P_k(1) = 1$, in descending order ($x_1 > \ldots > x_k$), and let $w_i = 2(1-x_i^2)^{-1}\big[P'_k(x_i)\big]^{-2}$, $i = 1, \ldots, k$, be the corresponding weights. Then

1. $\displaystyle\int_{-1}^1 f(x)\,dx = \sum_{i=1}^k w_i\, f(x_i)$,

2. $x_i = \Big(1 - \dfrac{1}{8k^2}\Big)\cos\Big(\dfrac{(4i-1)\pi}{4k+2}\Big) + O(k^{-3})$, $\quad i = 1, \ldots, k$,

3. $\dfrac{\pi}{k+\frac12}\,\sqrt{1-x_1^2}\,\Big(1 - \dfrac{1}{8(k+\frac12)^2(1-x_1^2)}\Big) \le w_1 < \ldots < w_{\lfloor\frac{k+1}{2}\rfloor} \le \dfrac{\pi}{k+\frac12}$.
Proposition 10 ([110, Section 7.72], [111]). Let $\omega(x)\,dx$ be a measure on $[-1,1]$ with infinitely many points of increase, and let $\{p_k(x)\}_{k\ge0}$, $p_k \in \mathcal{P}_k$, be the associated orthogonal polynomials. Then
$$\max_{\substack{f\in\mathcal{P}_m\\ f\ne0}} \frac{\int_{-1}^1 x\,f^2(x)\,\omega(x)\,dx}{\int_{-1}^1 f^2(x)\,\omega(x)\,dx} \;=\; \max\big\{\,x \in [-1,1] \;\big|\; p_{m+1}(x) = 0\,\big\}.$$
Proposition 11. Let $X \sim \chi_k^2$. Then
$$\mathbb{P}[X \le x] \le \Big[\frac{x}{k}\exp\Big(1 - \frac{x}{k}\Big)\Big]^{k/2} \;\text{ for } x \le k, \qquad \mathbb{P}[X \ge x] \le \Big[\frac{x}{k}\exp\Big(1 - \frac{x}{k}\Big)\Big]^{k/2} \;\text{ for } x \ge k.$$

Proof. The result follows from taking [26, Lemma 2.2] and letting $\lambda \to \infty$.
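The lower-tail bound is loose but easy to check by simulation (a sketch; the parameters and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
k, x = 8, 3.0                        # lower-tail regime, x <= k
emp = float((rng.chisquare(k, size=200_000) <= x).mean())
bound = ((x / k) * np.exp(1 - x / k)) ** (k / 2)
# Empirically P[X <= 3] is about 0.066, well under the bound of about 0.24.
```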
We are now prepared to prove a lower bound of order $m^{-2}$.
Lemma 20. If $m = o\big(n^{1/2}\ln^{-1/2} n\big)$ and $\omega(1)$, then
$$\max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{1.08}{m^2}\right] \ge 1 - o(1/n).$$
Proof. The structure of the proof is as follows. We choose a matrix with eigenvalues based on the zeros of a Legendre polynomial, and show that a large subset of $[0,\infty)^n$ (with respect to $f_Y(y)$) satisfies conditions that allow us to lower bound the relative error using an integral minimization problem. The choice of Legendre polynomials is based on connections to Gaussian quadrature, but, as we will later see (in Theorem 21 and the experiments), the $1/m^2$ dependence is indicative of a large class of matrices with connections to orthogonal polynomials. Let $x_1 > \ldots > x_{2m}$ be the zeros of the $(2m)^{th}$ degree Legendre polynomial. Let $A \in \mathcal{S}^n$ have eigenvalue $x_1$ with multiplicity one, and remaining eigenvalues given by $x_j$, $j = 2, \ldots, 2m$, each with multiplicity at least $\lfloor(n-1)/(2m-1)\rfloor$. By equation (5.10),
$$\mathbb{P}\left[\frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n} \ge \epsilon\right] = \mathbb{P}\left[\min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{i=2}^n Y_i\,p^2(\lambda_i)\,(\lambda_1-\lambda_i)}{(\lambda_1-\lambda_n)\sum_{i=1}^n Y_i\,p^2(\lambda_i)} \ge \epsilon\right] = \mathbb{P}\left[\min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{j=2}^{2m} \hat Y_j\,p^2(x_j)\,(x_1-x_j)}{(x_1-x_{2m})\sum_{j=1}^{2m} \hat Y_j\,p^2(x_j)} \ge \epsilon\right],$$
where $Y_1, \ldots, Y_n \sim \chi_1^2$, and $\hat Y_j$ is the sum of the $Y_i$ over the indices $i$ that satisfy $\lambda_i = x_j$. $\hat Y_1$ has one degree of freedom, and $\hat Y_j$, $j = 2, \ldots, 2m$, each have at least $\lfloor(n-1)/(2m-1)\rfloor$ degrees of freedom.
Let $w_1, \ldots, w_{2m}$ be the weights of Gaussian quadrature associated with $x_1, \ldots, x_{2m}$. By Proposition 9,
$$x_1 = 1 - \frac{4+9\pi^2}{128\,m^2} + O(m^{-3}), \qquad x_2 = 1 - \frac{4+49\pi^2}{128\,m^2} + O(m^{-3}),$$
and, therefore, $1 - x_1^2 = \frac{4+9\pi^2}{64\,m^2} + O(m^{-3})$. Again, by Proposition 9, we can lower bound the smallest ratio between weights by
$$\frac{w_1}{w_j} \;\ge\; \sqrt{1-x_1^2}\,\Big(1 - \frac{1}{8(2m+\frac12)^2(1-x_1^2)}\Big) = \Big(\frac{\sqrt{4+9\pi^2}}{8m} + O(m^{-2})\Big)\Big(1 - \frac{1}{(4+9\pi^2)/2 + O(m^{-1})}\Big) = \frac{2+9\pi^2}{8\sqrt{4+9\pi^2}\;m} + O(m^{-2}).$$
Therefore, by Proposition 11,
$$\mathbb{P}\left[\min_{j\ge2}\hat Y_j \ge \frac{w_j}{w_1}\,\hat Y_1\right] \;\ge\; \mathbb{P}\left[\hat Y_j \ge \frac13\Big\lfloor\frac{n-1}{2m-1}\Big\rfloor,\; j \ge 2\right]\times \mathbb{P}\left[\hat Y_1 \le \frac{w_1}{3w_j}\Big\lfloor\frac{n-1}{2m-1}\Big\rfloor\right]$$
$$\ge\; \left[1 - \Big(\frac{e^{2/3}}{3}\Big)^{\frac12\big\lfloor\frac{n-1}{2m-1}\big\rfloor}\right]^{2m-1}\left[1 - \Big(\frac{w_1}{3w_j}\Big\lfloor\frac{n-1}{2m-1}\Big\rfloor\, e^{\,1 - \frac{w_1}{3w_j}\big\lfloor\frac{n-1}{2m-1}\big\rfloor}\Big)^{\frac12}\right] \;=\; 1 - o(1/n).$$
We now restrict our attention to values of $Y = (Y_1, \ldots, Y_n) \in [0,\infty)^n$ that satisfy the inequality $w_1 \min_{j\ge2}\hat Y_j \ge w_j\,\hat Y_1$. If, for some fixed choice of $Y$,
$$\min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{j=2}^{2m}\hat Y_j\,p^2(x_j)\,(x_1-x_j)}{\sum_{j=1}^{2m}\hat Y_j\,p^2(x_j)} \;\le\; x_1 - x_2, \tag{5.11}$$
then, by Proposition 9 and Proposition 10 with $\omega(x) = 1$,
$$\min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{j=2}^{2m}\hat Y_j\,p^2(x_j)\,(x_1-x_j)}{\sum_{j=1}^{2m}\hat Y_j\,p^2(x_j)} \;\ge\; \min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{j=2}^{2m} w_j\,p^2(x_j)\,(x_1-x_j)}{\sum_{j=1}^{2m} w_j\,p^2(x_j)} \;=\; \min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\sum_{j=1}^{2m} w_j\,p^2(x_j)\,(1-x_j)}{\sum_{j=1}^{2m} w_j\,p^2(x_j)} - (1 - x_1)$$
$$=\; \min_{\substack{p\in\mathcal{P}_{m-1}\\ p\ne0}} \frac{\int_{-1}^1 p^2(y)\,(1-y)\,dy}{\int_{-1}^1 p^2(y)\,dy} - \frac{4+9\pi^2}{128\,m^2} + O(m^{-3}) \;=\; \frac{4+9\pi^2}{32\,m^2} - \frac{4+9\pi^2}{128\,m^2} + O(m^{-3}) \;=\; \frac{12+27\pi^2}{128\,m^2} + O(m^{-3}).$$
Alternatively, if equation (5.11) does not hold, then we can lower bound the left-hand side of (5.11) by $x_1 - x_2 = \frac{5\pi^2}{16\,m^2} + O(m^{-3})$. Noting that $x_1 - x_{2m} \le 2$ completes the proof, as $\frac{12+27\pi^2}{256}$ and $\frac{5\pi^2}{32}$ are both greater than $1.08$.
The previous lemma illustrates that the inverse quadratic dependence of the relative error on the number of iterations $m$ persists up to $m = o(n^{1/2}/\ln^{1/2} n)$. As we will see in Sections 5.4 and 5.6, this $m^{-2}$ error is indicative of the behavior of the Lanczos method for a wide range of typical matrices encountered in practice. Next, we aim to produce a lower bound of the form $\ln^2 n/(m^2\ln^2\ln n)$. To do so, we will make use of more general Jacobi polynomials instead of Legendre polynomials. In addition, in order to obtain a result that contains some dependence on $n$, the choice of Jacobi polynomial must vary with $n$ (i.e., $\alpha$ and $\beta$ are functions of $n$). However, due to the distribution of eigenvalues required to produce this bound, Gaussian quadrature is no longer exact, and we must make use of estimates for basic composite quadrature. We recall the following propositions regarding composite quadrature and Jacobi polynomials.
Proposition 12. If $f \in C^1[a,b]$, then for $a = x_0 < \ldots < x_k = b$,
$$\left|\int_a^b f(x)\,dx - \sum_{i=1}^k (x_i - x_{i-1})\,f(x_i^*)\right| \;\le\; \frac{b-a}{2}\,\max_{x\in[a,b]}|f'(x)|\,\max_{i=1,\ldots,k}(x_i - x_{i-1}),$$
where $x_i^* \in [x_{i-1}, x_i]$, $i = 1, \ldots, k$.
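For instance, taking the right endpoints as the $x_i^*$ on a uniform mesh gives a concrete check of the bound (a quick sketch; the integrand and mesh size are arbitrary):

```python
import numpy as np

k = 100
xs = np.linspace(0.0, 1.0, k + 1)
# f(x) = x^3 with x_i^* = x_i (right endpoints); exact integral is 1/4.
approx = float(np.sum((xs[1:] - xs[:-1]) * xs[1:] ** 3))
err = abs(approx - 0.25)
bound = 0.5 * 3.0 * (1.0 / k)   # (b-a)/2 * max|f'| * max mesh width
```

The observed error is roughly $1/(2k)$, comfortably inside the Proposition 12 bound of $3/(2k)$.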
Proposition 13 ([103, Chapter 3.2]). Let $\{P_k^{(\alpha,\beta)}(x)\}_{k=0}^\infty$, $\alpha, \beta > -1$, be the orthogonal system of Jacobi polynomials over $[-1,1]$ with respect to the weight function $\omega_{\alpha,\beta}(x) = (1-x)^\alpha(1+x)^\beta$, namely,
$$P_k^{(\alpha,\beta)}(x) = \frac{\Gamma(k+\alpha+1)}{k!\;\Gamma(k+\alpha+\beta+1)}\sum_{j=0}^{k}\binom{k}{j}\frac{\Gamma(k+j+\alpha+\beta+1)}{\Gamma(j+\alpha+1)}\left(\frac{x-1}{2}\right)^{j}.$$
Then

(i) $\displaystyle\int_{-1}^1\big[P_k^{(\alpha,\beta)}(x)\big]^2\,\omega_{\alpha,\beta}(x)\,dx = \frac{2^{\alpha+\beta+1}\,\Gamma(k+\alpha+1)\,\Gamma(k+\beta+1)}{(2k+\alpha+\beta+1)\,k!\;\Gamma(k+\alpha+\beta+1)}$,

(ii) $\displaystyle\max_{x\in[-1,1]}\big|P_k^{(\alpha,\beta)}(x)\big| = \max\left\{\frac{\Gamma(k+\alpha+1)}{k!\;\Gamma(\alpha+1)},\;\frac{\Gamma(k+\beta+1)}{k!\;\Gamma(\beta+1)}\right\}$ for $\max\{\alpha,\beta\} \ge -\frac12$,

(iii) $\displaystyle\frac{d}{dx}P_k^{(\alpha,\beta)}(x) = \frac{k+\alpha+\beta+1}{2}\,P_{k-1}^{(\alpha+1,\beta+1)}(x)$,

(iv) $\displaystyle\max\big\{x\in[-1,1]\;\big|\;P_k^{(\alpha,\beta)}(x) = 0\big\} \le \sqrt{1 - \left(\frac{\alpha+3/2}{k+\alpha+1/2}\right)^2}$ for $\alpha \ge \beta > -\frac{11}{12}$ and $k > 1$.
Proof. Equations (i)-(iii) are standard results, and can be found in [103, Chapter 3.2] or [110]. What remains is to prove (iv). By [90, Lemma 3.5], the largest zero $x_1$ of the $k^{th}$ degree Gegenbauer polynomial (i.e., $P_k^{(\lambda-1/2,\lambda-1/2)}(x)$), with $\lambda > -5/12$, satisfies
\[ x_1^2 \le \frac{(k-1)(k+2\lambda+1)}{(k+\lambda)^2 + 3\lambda + \frac54 + 3(\lambda+\frac12)^2/(k-1)} \le \frac{(k-1)(k+2\lambda+1)}{(k+\lambda)^2} = 1 - \left( \frac{\lambda+1}{k+\lambda} \right)^2. \]
By [37, Theorem 2.1], the largest zero of $P_k^{(\alpha,\beta+t)}(x)$ is strictly greater than the largest zero of $P_k^{(\alpha,\beta)}(x)$ for any $t > 0$. As $\alpha \ge \beta$, combining these two facts provides our desired result.
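The identities in Proposition 13 can be sanity-checked directly from the explicit sum formula. The sketch below (an illustration, not part of the thesis) implements $P_k^{(\alpha,\beta)}$ via that formula and verifies the derivative identity (iii) by finite differences and the squared norm (i) by a fine midpoint rule, for sample parameter values:

```python
import math

def jacobi(k, a, b, x):
    """P_k^{(a,b)}(x) via the explicit sum in Proposition 13."""
    g = math.gamma
    c = g(k + a + 1) / (math.factorial(k) * g(k + a + b + 1))
    return c * sum(
        math.comb(k, j) * g(k + j + a + b + 1) / g(j + a + 1) * ((x - 1) / 2) ** j
        for j in range(k + 1)
    )

# (iii): d/dx P_k^{(a,b)} = (k+a+b+1)/2 * P_{k-1}^{(a+1,b+1)},
# checked with a central finite difference at a sample point.
k, a, b, x = 4, 2.0, 1.0, 0.3
h = 1e-6
fd = (jacobi(k, a, b, x + h) - jacobi(k, a, b, x - h)) / (2 * h)
rhs = (k + a + b + 1) / 2 * jacobi(k - 1, a + 1, b + 1, x)

# (i): squared norm of P_3^{(1,0)} under w_{1,0}(x) = (1 - x), midpoint rule.
N = 200_000
quad = sum(
    (2 / N) * jacobi(3, 1.0, 0.0, t) ** 2 * (1 - t)
    for t in (-1 + (i + 0.5) * 2 / N for i in range(N))
)
predicted = 2 ** 2 * math.gamma(5) * math.gamma(4) / (8 * math.factorial(3) * math.gamma(5))
```

For $k=3$, $\alpha=1$, $\beta=0$ the norm formula evaluates to exactly $1/2$, and the quadrature agrees to high accuracy.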
For the sake of brevity, the inner product and norm on $[-1,1]$ with respect to $\omega_{\alpha,\beta}(x) = (1-x)^\alpha(1+x)^\beta$ will be denoted by $\langle\cdot,\cdot\rangle_{\alpha,\beta}$ and $\|\cdot\|_{\alpha,\beta}$. We are now prepared to prove the following lower bound.

Lemma 21. If $m = \Theta(\ln n)$, then
\[ \max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{.015\,\ln^2 n}{m^2 \ln^2\ln n} \right] \ge 1 - o(1/n). \]
Proof. The structure of the proof is similar in concept to that of Lemma 20. We choose a matrix with eigenvalues based on a function corresponding to an integral minimization problem, and show that a large subset of $[0,\infty)^n$ (with respect to $f_Y(y)$) satisfies conditions that allow us to lower bound the relative error using the aforementioned integral minimization problem. The main difficulty is that, to obtain improved bounds, we must use Proposition 10 with a weight function that requires quadrature for functions that are no longer polynomials, and, therefore, cannot be represented exactly using Gaussian quadrature. In addition, the required quadrature is for functions whose derivative has a singularity at $x = 1$, and so Proposition 12 is not immediately applicable. Instead, we perform a two-part error analysis. In particular, if a function is $C^1[0,c]$, $c < 1$, and monotonic on $[c,1]$, then using Proposition 12 on $[0,c]$ and a monotonicity argument on $[c,1]$ results in an error bound for quadrature.
Let $\ell = \lfloor .2495 \ln n/\ln\ln n \rfloor$, $q = \lceil m^{4.004\,\ell} \rceil$, and $m = \Theta(\ln n)$. We assume that $n$ is sufficiently large, so that $\ell > 0$. $\ell$ is quite small for practical values of $n$, but the growth of $\ell$ with respect to $n$ is required for a result with dependence on $n$. Consider a matrix $A \in \mathcal{S}^n$ with eigenvalues given by $\lambda(x_i)$, $i = 1, \ldots, q$, each with multiplicity either $\lfloor n/q \rfloor$ or $\lceil n/q \rceil$, where $\lambda(x) = 1 - 2(1-x)^{1/\ell}$ and $x_i = i/q$, $i = 1, \ldots, q$. Then
\[ \mathbb{P}\left[ \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \ge \epsilon \right] = \mathbb{P}\left[ \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=2}^n Y_i P^2(\lambda_i)(\lambda_1-\lambda_i)}{(\lambda_1-\lambda_n)\sum_{i=1}^n Y_iP^2(\lambda_i)} \ge \epsilon \right] \ge \mathbb{P}\left[ \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=1}^{q} \hat Y_i P^2(\lambda(x_i))(1-\lambda(x_i))}{2\sum_{i=1}^{q} \hat Y_i P^2(\lambda(x_i))} \ge \epsilon \right], \]
where $Y_1, \ldots, Y_n \sim \chi_1^2$, and $\hat Y_i$ is the sum of the $Y_j$'s corresponding to eigenvalues $\lambda_j = \lambda(x_i)$. Each $\hat Y_i$, $i = 1, \ldots, q$, is chi-square distributed with either $\lfloor n/q \rfloor$ or $\lceil n/q \rceil$ degrees of freedom. Because $m = \Theta(\ln n)$, we have $q = o(n^{.999})$ and, by Proposition 11,
\[ \mathbb{P}\left[ .999\,\tfrac{n}{q} \le \hat Y_i \le 1.001\,\tfrac{n}{q},\; i = 1, \ldots, q \right] \ge \left( 1 - \left(.999\,e^{.001}\right)^{\lfloor n/q\rfloor} - \left(1.001\,e^{-.001}\right)^{\lfloor n/q\rfloor} \right)^{q} = 1 - o(1/n). \]
Therefore, with probability $1 - o(1/n)$,
\[ \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=1}^q \hat Y_i P^2(\lambda(x_i))(1-\lambda(x_i))}{2\sum_{i=1}^q \hat Y_i P^2(\lambda(x_i))} \ge \frac{.998}{2}\, \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=1}^q P^2(\lambda(x_i))(1-\lambda(x_i))}{\sum_{i=1}^q P^2(\lambda(x_i))}. \]
Let $P(y) = \sum_{i=0}^{m-1} c_i P_i^{(\ell,0)}(y)$, $c_i \in \mathbb{R}$, $i = 0, \ldots, m-1$, and define
\[ \phi_{i,j}(x) = P_i^{(\ell,0)}(\lambda(x))\, P_j^{(\ell,0)}(\lambda(x)) \qquad\text{and}\qquad \psi_{i,j}(x) = \phi_{i,j}(x)\,(1-\lambda(x)), \]
$i,j = 0, \ldots, m-1$. Then we have
\[ \frac{\sum_{i=1}^q P^2(\lambda(x_i))(1-\lambda(x_i))}{\sum_{i=1}^q P^2(\lambda(x_i))} = \frac{\sum_{i,j=0}^{m-1} c_ic_j \sum_{k=1}^q \psi_{i,j}(x_k)}{\sum_{i,j=0}^{m-1} c_ic_j \sum_{k=1}^q \phi_{i,j}(x_k)}. \]
We now replace each quadrature $\sum_{k=1}^q \psi_{i,j}(x_k)$ or $\sum_{k=1}^q \phi_{i,j}(x_k)$ in the previous expression by its corresponding integral, plus a small error term. The functions $\phi_{i,j}$ and $\psi_{i,j}$ are not elements of $C^1[0,1]$, as $\lambda'(x)$ has a singularity at $x = 1$, and, therefore, we cannot use Proposition 12 directly. Instead we break the error analysis of the quadrature into two components. We have $\phi_{i,j}, \psi_{i,j} \in C^1[0,c]$ for any $0 < c < 1$, and, if $c$ is chosen to be large enough, both $\phi_{i,j}$ and $\psi_{i,j}$ will be monotonic on the interval $[c,1]$. In that case, we can apply Proposition 12 to bound the error over the interval $[0,c]$ and use basic properties of Riemann sums of monotonic functions to bound the error over the interval $[c,1]$.
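The two-part strategy can be illustrated concretely. The sketch below (an illustration under our own choice of parameters, not from the thesis) applies it to $\lambda(x) = 1 - 2(1-x)^{1/\ell}$, whose derivative blows up at $x = 1$: Proposition 12 bounds the error on $[0, c]$, and for a monotone function on $[c, 1]$ the Riemann error is at most the total variation times the spacing:

```python
# Two-part quadrature error bound for a function with a derivative
# singularity at x = 1, split at c (here ell = 4, c = 0.9, both chosen
# arbitrarily for illustration).
L = 4
lam = lambda x: 1.0 - 2.0 * (1.0 - x) ** (1.0 / L)
dlam = lambda x: (2.0 / L) * (1.0 - x) ** (1.0 / L - 1.0)

q, c = 10_000, 0.9
h = 1.0 / q
riemann = sum(h * lam(i * h) for i in range(1, q + 1))
exact = 1.0 - 2.0 * L / (L + 1.0)        # closed form of int_0^1 lam(x) dx

# Proposition 12 on [0, c]: dlam is increasing, so its max on [0, c] is at c.
smooth_part = (c / 2.0) * dlam(c) * h
# Monotonicity on [c, 1]: error of a monotone Riemann sum is at most
# (lam(1) - lam(c)) * spacing.
monotone_part = (lam(1.0) - lam(c)) * h
bound = smooth_part + monotone_part
```

The observed quadrature error stays within the combined bound even though $\lambda \notin C^1[0,1]$.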
The function $\lambda(x)$ is increasing on $[0,1]$, and, by equations (iii) and (iv) of Proposition 13, the function $P_i^{(\ell,0)}(y)$, $i = 2, \ldots, m-1$, is increasing on the interval
\[ \left[ \sqrt{1 - \left( \frac{\ell+5/2}{m+\ell-1/2} \right)^2},\; 1 \right]. \tag{5.12} \]
The functions $P_0^{(\ell,0)}(y) = 1$ and $P_1^{(\ell,0)}(y) = \ell + 1 + (\ell+2)(y-1)/2$ are clearly non-decreasing over interval (5.12). By the inequality $\sqrt{1-y} \le 1 - y/2$, $y \in [0,1]$, the function $P_i^{(\ell,0)}(\lambda(x))$, $i = 0, \ldots, m-1$, is non-decreasing on the interval
\[ \left[ 1 - \left( \frac{\ell+5/2}{2m+2\ell-1} \right)^{2\ell},\; 1 \right]. \tag{5.13} \]
Therefore, the functions $\phi_{i,j}(x)$ are non-decreasing and $\psi_{i,j}(x)$ are non-increasing on the interval (5.13) for all $i,j = 0, \ldots, m-1$. The term $\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell} = \omega(1/q)$, and so, for sufficiently large $n$, there exists an index $j^* \in \{1, \ldots, q-2\}$ such that
\[ 1 - \left( \frac{\ell+5/2}{2m+2\ell-1} \right)^{2\ell} \le x_{j^*} < 1 - \left( \frac{\ell+5/2}{2m+2\ell-1} \right)^{2\ell} + \frac{1}{q} < 1. \]
We now upper bound the derivatives of $\phi_{i,j}$ and $\psi_{i,j}$ on the interval $[0, x_{j^*}]$. By equation (iii) of Proposition 13, the first derivatives of $\phi_{i,j}$ and $\psi_{i,j}$ are given by
\begin{align*} \phi'_{i,j}(x) &= \lambda'(x)\left( \left[\tfrac{d}{dy} P_i^{(\ell,0)}(y)\right]_{y=\lambda(x)} P_j^{(\ell,0)}(\lambda(x)) + P_i^{(\ell,0)}(\lambda(x)) \left[\tfrac{d}{dy} P_j^{(\ell,0)}(y)\right]_{y=\lambda(x)} \right) \\ &= \frac{(1-x)^{-\frac{\ell-1}{\ell}}}{\ell}\left( (i+\ell+1)\, P_{i-1}^{(\ell+1,1)}(\lambda(x))\, P_j^{(\ell,0)}(\lambda(x)) + (j+\ell+1)\, P_i^{(\ell,0)}(\lambda(x))\, P_{j-1}^{(\ell+1,1)}(\lambda(x)) \right), \end{align*}
and
\[ \psi'_{i,j}(x) = \phi'_{i,j}(x)\,(1-\lambda(x)) - \frac{2\,\phi_{i,j}(x)\,(1-x)^{-\frac{\ell-1}{\ell}}}{\ell}. \]
By equation (ii) of Proposition 13, and the inequality $\binom{x}{y} \le \left(\frac{ex}{y}\right)^y$, $x, y \in \mathbb{N}$, $y \le x$,
\begin{align*} \max_{x\in[0,x_{j^*}]} |\phi'_{i,j}(x)| &\le \frac{(1-x_{j^*})^{-\frac{\ell-1}{\ell}}}{\ell}\left( (i+\ell+1)\binom{m+\ell-1}{\ell+1}\binom{m+\ell-1}{\ell} + (j+\ell+1)\binom{m+\ell-1}{\ell}\binom{m+\ell-1}{\ell+1} \right) \\ &\le \frac{2(m+\ell)}{\ell}\left( \left( \frac{\ell+5/2}{2m+2\ell-1} \right)^{2\ell} - \frac1q \right)^{-\frac{\ell-1}{\ell}} \left( \frac{e(m+\ell-1)}{\ell} \right)^{2\ell+1}, \end{align*}
and
\begin{align*} \max_{x\in[0,x_{j^*}]} |\psi'_{i,j}(x)| &\le \max_{x\in[0,x_{j^*}]} |\phi'_{i,j}(x)| + \frac{2\,(1-x_{j^*})^{-\frac{\ell-1}{\ell}}}{\ell} \binom{m+\ell-1}{\ell}\binom{m+\ell-1}{\ell} \\ &\le \frac{4(m+\ell)}{\ell}\left( \left( \frac{\ell+5/2}{2m+2\ell-1} \right)^{2\ell} - \frac1q \right)^{-\frac{\ell-1}{\ell}} \left( \frac{e(m+\ell-1)}{\ell} \right)^{2\ell+1}. \end{align*}
Therefore, both $\max_{x\in[0,x_{j^*}]} |\phi'_{i,j}(x)|$ and $\max_{x\in[0,x_{j^*}]} |\psi'_{i,j}(x)|$ are $o\big(m^{4.002\,\ell}\big)$. Then, by Proposition 12, equation (ii) of Proposition 13, and monotonicity on the interval $[x_{j^*},1]$, we have
\[ \left| \frac1q \sum_{k=1}^q \phi_{i,j}(x_k) - \int_0^1 \phi_{i,j}(x)\,dx \right| \le \left| \frac1q\sum_{k=1}^{j^*} \phi_{i,j}(x_k) - \int_0^{x_{j^*}} \phi_{i,j}(x)\,dx \right| + \left| \frac1q\sum_{k=j^*+1}^{q} \phi_{i,j}(x_k) - \int_{x_{j^*}}^{1} \phi_{i,j}(x)\,dx \right| \]
\[ \le \frac{(1+o(1))\, m^{4.002\,\ell}}{2q} + \frac{\phi_{i,j}(1)}{q} \le \frac{1+o(1)}{2\,m^{.002\,\ell}} + \frac1q\left( \frac{e(m+\ell+1)}{\ell+1} \right)^{2(\ell+1)} \le \frac{1+o(1)}{2\,m^{.002\,\ell}}, \]
and, similarly,
\[ \left| \frac1q\sum_{k=1}^q \psi_{i,j}(x_k) - \int_0^1 \psi_{i,j}(x)\,dx \right| \le \left| \frac1q\sum_{k=1}^{j^*} \psi_{i,j}(x_k) - \int_0^{x_{j^*}} \psi_{i,j}(x)\,dx \right| + \left| \frac1q\sum_{k=j^*+1}^{q} \psi_{i,j}(x_k) - \int_{x_{j^*}}^1 \psi_{i,j}(x)\,dx \right| \]
\[ \le \frac{1+o(1)}{2\,m^{.002\,\ell}} + \frac{\psi_{i,j}(x_{j^*})}{q} \le \frac{1+o(1)}{2\,m^{.002\,\ell}} + \frac{\phi_{i,j}(1)}{q} \le \frac{1+o(1)}{2\,m^{.002\,\ell}}. \]
Let us denote this upper bound by $E = (1+o(1))/(2\,m^{.002\,\ell})$. By using the substitution $x = 1 - \left(\frac{1-y}{2}\right)^{\ell}$, we have
\[ \int_0^1 \psi_{i,j}(x)\,dx = \frac{\ell}{2^{\ell}}\,\big\langle P_i^{(\ell,0)}, P_j^{(\ell,0)} \big\rangle_{\ell,0} \qquad\text{and}\qquad \int_0^1 \phi_{i,j}(x)\,dx = \frac{\ell}{2^{\ell}}\,\big\langle P_i^{(\ell,0)}, P_j^{(\ell,0)} \big\rangle_{\ell-1,0}. \]
Then
\[ \max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=1}^q P^2(\lambda(x_i))}{\sum_{i=1}^q P^2(\lambda(x_i))(1-\lambda(x_i))} \le \max_{\substack{c\in\mathbb{R}^m,\,c\ne0\\ |\epsilon_{i,j}|\le E,\ |\bar\epsilon_{i,j}|\le E}} \frac{\sum_{i,j=0}^{m-1} c_ic_j\left[ \frac{\ell}{2^\ell}\langle P_i^{(\ell,0)}, P_j^{(\ell,0)}\rangle_{\ell-1,0} + \epsilon_{i,j} \right]}{\sum_{i,j=0}^{m-1} c_ic_j\left[ \frac{\ell}{2^\ell}\langle P_i^{(\ell,0)}, P_j^{(\ell,0)}\rangle_{\ell,0} + \bar\epsilon_{i,j} \right]}. \]
Letting $\hat c_i = c_i\,\|P_i^{(\ell,0)}\|_{\ell,0}$, $i = 0, \ldots, m-1$, and noting that, by equation (i) of Proposition 13, $\|P_i^{(\ell,0)}\|_{\ell,0} = \left( \frac{2^{\ell+1}}{2i+\ell+1} \right)^{1/2}$, we obtain the bound
\[ \max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=1}^q P^2(\lambda(x_i))}{\sum_{i=1}^q P^2(\lambda(x_i))(1-\lambda(x_i))} \le \max_{\substack{\hat c\in\mathbb{R}^m,\,\hat c\ne0\\ |\hat\epsilon_{i,j}|\le\hat E}} \frac{\sum_{i,j=0}^{m-1} \hat c_i\hat c_j\left[ \frac{\langle P_i^{(\ell,0)}, P_j^{(\ell,0)}\rangle_{\ell-1,0}}{\|P_i^{(\ell,0)}\|_{\ell,0}\,\|P_j^{(\ell,0)}\|_{\ell,0}} + \hat\epsilon_{i,j} \right]}{\sum_{i=0}^{m-1} \hat c_i^2 + \sum_{i,j=0}^{m-1} \hat c_i\hat c_j\,\hat\epsilon_{i,j}}, \]
where
\[ \hat E = \frac{E\,(2m+\ell-1)}{\ell} = (1+o(1))\,\frac{2m+\ell-1}{2\,m^{.002\,\ell}\,\ell} = o(1/m). \]
Let $B \in \mathbb{R}^{m\times m}$ be given by $B(i,j) = \frac{\langle P_i^{(\ell,0)}, P_j^{(\ell,0)}\rangle_{\ell-1,0}}{\|P_i^{(\ell,0)}\|_{\ell,0}\,\|P_j^{(\ell,0)}\|_{\ell,0}}$, $i,j = 0, \ldots, m-1$. By Proposition 10 applied to $\omega_{\ell-1,0}(x)$ and equation (iv) of Proposition 13, we have
\[ \max_{\substack{\hat c\in\mathbb{R}^m\\\hat c\ne0}} \frac{\hat c^T B \hat c}{\hat c^T \hat c} \le \frac{1}{1 - \sqrt{1 - \left( \frac{\ell+1/2}{m+\ell-1/2} \right)^2}} \le 2\left( \frac{m+\ell-1/2}{\ell+1/2} \right)^2. \]
This implies that
\[ \max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=1}^q P^2(\lambda(x_i))}{\sum_{i=1}^q P^2(\lambda(x_i))(1-\lambda(x_i))} \le \max_{\substack{\hat c\in\mathbb{R}^m\\\hat c\ne0}} \frac{\hat c^T\big(B + \hat E\,\mathbf{1}\mathbf{1}^T\big)\hat c}{\hat c^T\big(I - \hat E\,\mathbf{1}\mathbf{1}^T\big)\hat c} \le \frac{1}{1-\hat Em}\,\max_{\substack{\hat c\in\mathbb{R}^m\\\hat c\ne0}} \left[ \frac{\hat c^TB\hat c}{\hat c^T\hat c} \right] + \frac{\hat Em}{1-\hat Em} \le \frac{2}{1-\hat Em}\left( \frac{m+\ell-1/2}{\ell+1/2} \right)^2 + \frac{\hat Em}{1-\hat Em}, \]
and that, with probability $1 - o(1/n)$,
\[ \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=1}^q \hat Y_i P^2(\lambda(x_i))(1-\lambda(x_i))}{2\sum_{i=1}^q \hat Y_i P^2(\lambda(x_i))} \ge (1-o(1))\,\frac{.998}{4}\left( \frac{\ell+1/2}{m+\ell-1/2} \right)^2 \ge \frac{.015\,\ln^2 n}{m^2\ln^2\ln n}. \]
This completes the proof.
In the previous proof, a number of very small constants were used, exclusively for the purpose of obtaining an estimate with constant as close to $1/64$ as possible. The constants used in the definitions of $\ell$ and $q$ can be replaced by more practical numbers (that begin to exhibit asymptotic convergence for reasonably sized $n$), at the cost of a constant worse than $.015$. This completes the analysis of asymptotic lower bounds.
We now move to upper bounds for relative error in the $p$-norm. Our estimate for relative error in the one-norm is of the same order as the estimate (5.4), but with an improved constant. Our technique for obtaining these estimates differs from the technique of [67] in one key way. Rather than integrating first by $b_1$ and using properties of the arctan function, we replace the uniform distribution on the sphere by $n$ chi-square random variables on $[0,\infty)^n$, and iteratively apply Cauchy-Schwarz to our relative error until we obtain an exponent for which the corresponding moment of an inverse chi-square random variable with one degree of freedom is finite.
Lemma 22. Let $n \ge 100$, $m \ge 10$, and $p \ge 1$. Then
\[ \max_{A\in\mathcal{S}^n}\; \mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[ \left( \frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \right)^{p} \right]^{\frac1p} \le .068\, \frac{\ln^2\big( n(m-1)^{8p} \big)}{(m-1)^2}. \]
Proof. Without loss of generality, suppose $\lambda_1(A) = 1$ and $\lambda_n(A) = 0$. By repeated application of Cauchy-Schwarz,
\[ \frac{\sum_{i=1}^n y_iP^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n y_iP^2(\lambda_i)} \le \left[ \frac{\sum_{i=1}^n y_iP^2(\lambda_i)(1-\lambda_i)^2}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right]^{\frac12} \le \cdots \le \left[ \frac{\sum_{i=1}^n y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right]^{\frac{1}{2^r}} \]
for $r \in \mathbb{N}$. Choosing $r$ to satisfy $2p < 2^r \le 4p$, and using equation (5.9) with polynomial normalization $P(1) = 1$, we have
\begin{align*} \mathbb{E}\left[\left( \frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n} \right)^p\right]^{\frac1p} &\le \min_{P\in\mathcal{P}_{m-1}(1)} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i=2}^n y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p} \\ &\le \min_{P\in\mathcal{P}_{m-1}(1)} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i<\beta} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p} \\ &\qquad + \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i\ge\beta} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p} \end{align*}
for any $\beta \in (0,1)$. The integrand of the first term satisfies
\[ \left( \frac{\sum_{i:\lambda_i<\beta} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac{p}{2^r}} \le \left( \frac{\sum_{i:\lambda_i<\beta} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac14} \le \max_{x\in[0,\beta]} |P(x)|^{\frac12}(1-x)^{2^{r-2}} \left( \frac{\sum_{i:\lambda_i<\beta} y_i}{y_1} \right)^{\frac14}, \]
and the second term is always bounded above by $\max_{\lambda\in[\beta,1]}(1-\lambda) = 1-\beta$. We replace the minimizing polynomial in $\mathcal{P}_{m-1}(1)$ by $P(x) = \frac{T_{m-1}\left( \frac{2}{\beta}x - 1 \right)}{T_{m-1}\left( \frac{2}{\beta} - 1 \right)}$, where $T_{m-1}(\cdot)$ is the Chebyshev polynomial of the first kind. The Chebyshev polynomials $T_{m-1}(\cdot)$ are bounded by one in magnitude on the interval $[-1,1]$, and this bound is tight at the endpoints. Therefore, our maximum is achieved at $x = 0$, and
\[ \max_{x\in[0,\beta]} |P(x)|^{\frac12}(1-x)^{2^{r-2}} = \left| T_{m-1}\left( \frac{2}{\beta} - 1 \right) \right|^{-1/2}. \]
By the definition $T_{m-1}(x) = \frac12\left( \left( x - \sqrt{x^2-1} \right)^{m-1} + \left( x + \sqrt{x^2-1} \right)^{m-1} \right)$, $|x| \ge 1$, and the standard inequality $e^{2x} \le \frac{1+x}{1-x}$, $x \in [0,1)$,
\[ T_{m-1}\left( \frac{2}{\beta} - 1 \right) \ge \frac12\left( \frac{2}{\beta} - 1 + \sqrt{\left( \frac{2}{\beta} - 1 \right)^2 - 1} \right)^{m-1} = \frac12\left( \frac{1+\sqrt{1-\beta}}{1-\sqrt{1-\beta}} \right)^{m-1} \ge \frac12 \exp\left( 2\sqrt{1-\beta}\,(m-1) \right). \]
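This exponential lower bound on the Chebyshev polynomial is easy to verify numerically from the explicit formula above. The following sketch (illustrative values of $\beta$ and $m$ chosen by us, not from the thesis) checks it at a few points:

```python
import math

def cheb(n, x):
    # T_n(x) for x >= 1 via the explicit formula above.
    s = math.sqrt(x * x - 1.0)
    return 0.5 * ((x - s) ** n + (x + s) ** n)

checks = []
for beta, m in [(0.5, 5), (0.9, 10), (0.99, 50)]:
    lhs = cheb(m - 1, 2.0 / beta - 1.0)
    rhs = 0.5 * math.exp(2.0 * math.sqrt(1.0 - beta) * (m - 1))
    checks.append(lhs >= rhs)
```

The bound is tightest when $\beta$ is close to one (the third case above holds with only a few percent of slack), which is exactly the regime the proof operates in.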
In addition,
\[ \int_{[0,\infty)^n} \left( \frac{\sum_{i=2}^n y_i}{y_1} \right)^{1/4} f_Y(y)\,dy = \frac{\Gamma(n/2-1/4)\,\Gamma(1/4)}{\Gamma(n/2-1/2)\,\Gamma(1/2)} \le \frac{\Gamma(1/4)}{2^{1/4}\,\Gamma(1/2)}\, n^{1/4}, \]
which gives us
\[ \mathbb{E}\left[\left( \frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n} \right)^p\right]^{\frac1p} \le \left[ \frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)}\, n^{1/4} \right]^{1/p} e^{-\gamma(m-1)/p} + \gamma^2, \]
where $\gamma = \sqrt{1-\beta}$. Setting $\gamma = \frac{p}{m-1}\ln\left( n^{1/4p}\,(m-1)^2 \right)$ (assuming $\gamma < 1$; otherwise our bound is already greater than one, and trivially holds), we obtain
\[ \mathbb{E}\left[\left( \frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n} \right)^p\right]^{\frac1p} \le \left( \frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)} \right)^{1/p} \frac{1}{(m-1)^2} + \frac{p^2\ln^2\left(n^{1/4p}(m-1)^2\right)}{(m-1)^2} = \left( \frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)} \right)^{1/p} \frac{1}{(m-1)^2} + \frac1{16}\, \frac{\ln^2\big( n(m-1)^{8p} \big)}{(m-1)^2} \le .068\, \frac{\ln^2\big( n(m-1)^{8p} \big)}{(m-1)^2}, \]
for $m \ge 10$, $n \ge 100$. The constant of $.068$ is produced by upper bounding $\left(2^{1/4}\Gamma(1/4)/\Gamma(1/2)\right)^{1/p}$ by a constant$^1$ times $\ln^2\big[ n(m-1)^{8p} \big]$. This completes the proof.

$^1$The best constant in this case is given by the ratio of $2^{1/4}\Gamma(1/4)/\Gamma(1/2)$ to the minimum value of $\ln^2\big[ n(m-1)^{8p} \big]$ over $n \ge 100$, $m \ge 10$.

A similar proof, paired with probabilistic bounds on the quantity $\sum_{i=2}^n Y_i/Y_1$, where $Y_1, \ldots, Y_n \sim \chi_1^2$, gives a probabilistic upper estimate. Combining the lower bounds in Lemmas 20 and 21 with the upper bound of Lemma 22 completes the proof of Theorem 19.
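Bounds of this kind are straightforward to probe empirically. The following sketch (not from the thesis; a standard Lanczos iteration with full reorthogonalization, applied to a diagonal matrix whose spectrum we chose to be uniform on $[0,1]$) estimates the relative error $(\lambda_1-\lambda_1^{(m)})/(\lambda_1-\lambda_n)$ for a uniformly random start vector:

```python
import numpy as np

def lanczos_extreme_ritz(eigs, m, rng):
    """Run m Lanczos iterations on A = diag(eigs) with a uniformly random
    start vector, returning the largest Ritz value lambda_1^{(m)}."""
    n = len(eigs)
    b = rng.standard_normal(n)
    b /= np.linalg.norm(b)                       # uniform on the sphere
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q = b.copy()
    for j in range(m):
        Q[:, j] = q
        w = eigs * q                             # A @ q for diagonal A
        alpha[j] = q @ w
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)   # full reorthogonalization
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)[-1]

rng = np.random.default_rng(0)
n = 2000
eigs = np.linspace(0.0, 1.0, n)                  # uniform spectrum on [0, 1]
err = {m: 1.0 - lanczos_extreme_ritz(eigs, m, rng) for m in (5, 20)}
```

Because the Krylov spaces are nested, the error is non-increasing in $m$ for a fixed start vector, and for a uniform spectrum the error at $m = 5$ typically lands near the $\approx .047$ limiting value predicted by Theorem 21 below.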
5.4 Distribution Dependent Bounds
In this section, we consider improved estimates for matrices with certain specific properties. First, we show that if a matrix $A$ has a reasonable number of eigenvalues near $\lambda_1(A)$, then we can produce an error estimate that depends only on the number of iterations $m$. The intuition is that if there are not too few eigenvalues near the maximum eigenvalue, then the initial vector has a large inner product with a vector with Rayleigh quotient close to the maximum eigenvalue. In particular, we suppose that the eigenvalues of our matrix $A$ are such that, once scaled, there are at least $n/(m-1)^\alpha$ eigenvalues in the range $[\beta, 1]$, for a specific value of $\beta$ satisfying $1-\beta = O\big( m^{-2}\ln^2 m \big)$. For a large number of matrices for which the Lanczos method is used, this assumption holds true. Under this assumption, we prove the following error estimate.
Theorem 20. Let $A \in \mathcal{S}^n$, $m \ge 10$, $p \ge 1$, $\alpha > 0$, and $n \ge m(m-1)^\alpha$. If
\[ \#\left\{ \lambda_i(A) \;:\; \frac{\lambda_1(A)-\lambda_i(A)}{\lambda_1(A)-\lambda_n(A)} \le \left( \frac{(2p+\alpha/4)\ln(m-1)}{m-1} \right)^2 \right\} \ge \frac{n}{(m-1)^\alpha}, \]
then
\[ \mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[ \left( \frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \right)^p \right]^{\frac1p} \le 2.077\, \frac{(2p+\alpha/4)^2\ln^2(m-1)}{(m-1)^2}. \]
In addition, if
\[ \#\left\{ \lambda_i(A) \;:\; \frac{\lambda_1(A)-\lambda_i(A)}{\lambda_1(A)-\lambda_n(A)} \le \left( \frac{(\alpha+2)\ln(m-1)}{4(m-1)} \right)^2 \right\} \ge \frac{n}{(m-1)^\alpha}, \]
then
\[ \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \le .126\, \frac{(\alpha+2)^2\ln^2(m-1)}{(m-1)^2} \right] \ge 1 - O(e^{-n}). \]
Proof. We begin by bounding expected relative error. The main idea is to proceed as in the proof of Lemma 22, but take advantage of the number of eigenvalues near $\lambda_1$. For simplicity, let $\lambda_1 = 1$ and $\lambda_n = 0$. We consider eigenvalues in the ranges $[0, 2\beta-1)$ and $[2\beta-1, 1]$ separately, $1/2 < \beta < 1$, and then make use of the lower bound for the number of eigenvalues in $[\beta, 1]$. From the proof of Lemma 22, we have
\begin{align*} \mathbb{E}\left[\left( \frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n} \right)^p\right]^{\frac1p} &\le \min_{P\in\mathcal{P}_{m-1}(1)} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i<2\beta-1} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p} \\ &\qquad + \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i\ge2\beta-1} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p}, \end{align*}
where $r \in \mathbb{N}$, $2p < 2^r \le 4p$, $f_Y(y)$ is given by (5.8), and $\beta \in (1/2,1)$. The second term is at most $2(1-\beta)$, and the integrand in the first term is bounded above by
\[ \left( \frac{\sum_{i:\lambda_i<2\beta-1} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)} \right)^{\frac{p}{2^r}} \le \left( \frac{\sum_{i:\lambda_i<2\beta-1} y_iP^2(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i:\lambda_i\ge\beta} y_iP^2(\lambda_i)} \right)^{\frac14} \le \frac{\max_{x\in[0,2\beta-1]} |P(x)|^{\frac12}(1-x)^{2^{r-2}}}{\min_{x\in[\beta,1]} |P(x)|^{\frac12}} \left( \frac{\sum_{i:\lambda_i<2\beta-1} y_i}{\sum_{i:\lambda_i\ge\beta} y_i} \right)^{\frac14}. \]
Let $\sqrt{1-\beta} = \frac{(2p+\alpha/4)\ln(m-1)}{m-1} < \frac14$ (if $\beta \le 1/2$, then our estimate is a trivial one, and we are already done). By the condition of the theorem, there are at least $n/(m-1)^\alpha$ eigenvalues
in the interval $[\beta, 1]$. Therefore,
\[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i<2\beta-1} y_i}{\sum_{i:\lambda_i\ge\beta} y_i} \right)^{1/4} f_Y(y)\,dy \le \mathbb{E}_{X\sim\chi^2_n}\left[ X^{1/4} \right]\, \mathbb{E}_{X\sim\chi^2_{\lceil n/(m-1)^\alpha\rceil}}\left[ X^{-1/4} \right] = \frac{\Gamma(n/2+1/4)\,\Gamma(\lceil n/(m-1)^\alpha\rceil/2 - 1/4)}{\Gamma(n/2)\,\Gamma(\lceil n/(m-1)^\alpha\rceil/2)} \]
\[ \le \frac{\Gamma(m(m-1)^\alpha/2+1/4)\,\Gamma(m/2-1/4)}{\Gamma(m(m-1)^\alpha/2)\,\Gamma(m/2)} \le 1.04\,(m-1)^{\alpha/4} \]
for $n \ge m(m-1)^\alpha$ and $m \ge 10$. Replacing the minimizing polynomial by $P(x) = \frac{T_{m-1}\left( \frac{2x}{2\beta-1} - 1 \right)}{T_{m-1}\left( \frac{2}{2\beta-1} - 1 \right)}$, we obtain
\[ \frac{\max_{x\in[0,2\beta-1]} |P(x)|^{\frac12}(1-x)^{2^{r-2}}}{\min_{x\in[\beta,1]} |P(x)|^{\frac12}} = \frac{1}{T^{1/2}_{m-1}\left( \frac{2\beta}{2\beta-1} - 1 \right)} \le \frac{1}{T^{1/2}_{m-1}\left( \frac{2}{\beta} - 1 \right)} \le \sqrt{2}\, e^{-\sqrt{1-\beta}\,(m-1)}. \]
Combining our estimates results in the bound
\[ \mathbb{E}\left[\left( \frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n} \right)^p\right]^{\frac1p} \le \left( 1.04\sqrt{2}\,(m-1)^{\alpha/4} \right)^{1/p} e^{-\sqrt{1-\beta}\,(m-1)/p} + 2(1-\beta) = \frac{\left( 1.04\sqrt{2} \right)^{1/p}}{(m-1)^2} + \frac{2\,(2p+\alpha/4)^2\ln^2(m-1)}{(m-1)^2} \le 2.077\, \frac{(2p+\alpha/4)^2\ln^2(m-1)}{(m-1)^2} \]
for $m \ge 10$, $p \ge 1$, and $\alpha > 0$. This completes the bound for expected relative error. We
now produce a probabilistic bound for relative error. Let $\sqrt{1-\beta} = \frac{(\alpha+2)\ln(m-1)}{4(m-1)}$. We have
\begin{align*} \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} &= \min_{P\in\mathcal{P}_{m-1}(1)} \frac{\sum_{i=2}^n Y_iP^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n Y_iP^2(\lambda_i)} \le \min_{P\in\mathcal{P}_{m-1}(1)} \frac{\max_{x\in[0,2\beta-1]} P^2(x)(1-x)}{\min_{x\in[\beta,1]} P^2(x)}\, \frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} + 2(1-\beta) \\ &\le T^{-2}_{m-1}\left( \frac{2}{\beta} - 1 \right) \frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} + 2(1-\beta) \le 4\exp\left( -4\sqrt{1-\beta}\,(m-1) \right) \frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} + 2(1-\beta). \end{align*}
By Proposition 11,
\[ \mathbb{P}\left[ \frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} \ge 4(m-1)^\alpha \right] \le \mathbb{P}\left[ \sum_{i:\lambda_i<2\beta-1} Y_i \ge 2n \right] + \mathbb{P}\left[ \sum_{i:\lambda_i\ge\beta} Y_i \le \frac{n}{2(m-1)^\alpha} \right] \le \left( 2/e \right)^{n/2} + \left( \sqrt{e}/2 \right)^{\frac{n}{2(m-1)^\alpha}} = O(e^{-n}). \]
Then, with probability $1 - O(e^{-n})$,
\[ \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} \le 16\,(m-1)^\alpha\, e^{-4\sqrt{1-\beta}\,(m-1)} + 2(1-\beta) = \frac{16}{(m-1)^2} + \frac{(\alpha+2)^2\ln^2(m-1)}{8(m-1)^2}. \]
The $16/(m-1)^2$ term is dominated by the log term as $m$ increases, and, therefore, with probability $1 - O(e^{-n})$,
\[ \frac{\lambda_1-\lambda_1^{(m)}}{\lambda_1-\lambda_n} \le .126\, \frac{(\alpha+2)^2\ln^2(m-1)}{(m-1)^2}. \]
This completes the proof.
The above theorem shows that, for matrices whose distribution of eigenvalues is independent of $n$, we can obtain dimension-free estimates. For example, the above theorem holds for the matrix from Example 1, for $\alpha = 2$.

When a matrix has eigenvalues known to converge to a limiting distribution as the dimension increases, or a random matrix $X_n$ exhibits suitable convergence of its empirical spectral distribution $\mu_{X_n} := \frac1n \sum_{i=1}^n \delta_{\lambda_i(X_n)}$, improved estimates can be obtained by simply estimating the corresponding integral polynomial minimization problem. However, to do so, we first require a law of large numbers for weighted sums of independent identically distributed (i.i.d.) random variables. We recall the following result.
Proposition 14 ([25]). Let $a_1, a_2, \ldots \in [a,b]$ and $X_1, X_2, \ldots$ be i.i.d. random variables, with $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] < \infty$. Then $\frac1n \sum_{i=1}^n a_iX_i \to 0$ almost surely.
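A quick simulation makes the weighted strong law tangible. The sketch below (illustrative, with weights of our own choosing) uses centered chi-square variables, matching the $Y_i - 1$ centering used in the proofs of this chapter:

```python
import numpy as np

# Weighted law of large numbers: a_i bounded fixed weights, X_i i.i.d. with
# mean 0 and finite variance (here X_i = chi^2_1 - 1).
rng = np.random.default_rng(1)
n = 500_000
a_i = 0.5 + 0.5 * np.cos(np.arange(n))      # bounded weights in [0, 1]
X = rng.chisquare(1, size=n) - 1.0          # mean 0, variance 2
avg = float(np.mean(a_i * X))
```

With $n = 5 \times 10^5$ samples, the weighted average is within a few thousandths of zero, consistent with almost sure convergence.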
We present the following theorem regarding the error of random matrices that exhibit suitable convergence.

Theorem 21. Let $X_n \in \mathcal{S}^n$, $\Lambda(X_n) \subset [a,b]$, $n = 1, 2, \ldots$, be a sequence of random matrices, such that $\mu_{X_n}$ converges in probability to $\sigma(x)\,dx$ in $L^2([a,b])$, where $\sigma(x) \in C([a,b])$, $a, b \in \mathrm{supp}(\sigma)$. Then, for all $m \in \mathbb{N}$ and $\epsilon > 0$,
\[ \lim_{n\to\infty} \mathbb{P}\left( \left| \frac{\lambda_1(X_n) - \lambda_1^{(m)}(X_n, b)}{\lambda_1(X_n) - \lambda_n(X_n)} - \frac{b - \xi^{(m)}}{b - a} \right| > \epsilon \right) = 0, \]
where $\xi^{(m)}$ is the largest zero of the $m^{th}$ degree orthogonal polynomial of $\sigma(x)$ in the interval $[a,b]$.
Proof. The main idea of the proof is to use Proposition 14 to control the behavior of $b$, and the convergence of $\mu_{X_n}$ to $\sigma(x)\,dx$ to show convergence to our integral minimization problem. We first write our polynomial $P \in \mathcal{P}_{m-1}$ as $P(x) = \sum_{i=0}^{m-1} c_ix^i$ and our unnormalized error as
\[ \lambda_1(X_n) - \lambda_1^{(m)}(X_n, b) = \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=2}^n Y_iP^2(\lambda_i)(\lambda_1-\lambda_i)}{\sum_{i=1}^n Y_iP^2(\lambda_i)} = \min_{\substack{c\in\mathbb{R}^m\\c\ne0}} \frac{\sum_{j_1,j_2=0}^{m-1} c_{j_1}c_{j_2} \sum_{i=2}^n Y_i\lambda_i^{j_1+j_2}(\lambda_1-\lambda_i)}{\sum_{j_1,j_2=0}^{m-1} c_{j_1}c_{j_2} \sum_{i=1}^n Y_i\lambda_i^{j_1+j_2}}, \]
where $Y_1, \ldots, Y_n$ are i.i.d. chi-square random variables with one degree of freedom each.
The functions $x^j$, $j = 0, \ldots, 2m-2$, are bounded on $[a,b]$, and so, by Proposition 14, for any $\epsilon_1, \epsilon_2 > 0$,
\[ \left| \frac1n\sum_{i=2}^n Y_i\lambda_i^j(\lambda_1-\lambda_i) - \frac1n\sum_{i=2}^n \lambda_i^j(\lambda_1-\lambda_i) \right| < \epsilon_1, \qquad j = 0, \ldots, 2m-2, \]
and
\[ \left| \frac1n\sum_{i=1}^n Y_i\lambda_i^j - \frac1n\sum_{i=1}^n \lambda_i^j \right| < \epsilon_2, \qquad j = 0, \ldots, 2m-2, \]
with probability 1โ๐(1). ๐ฟ๐๐ converges in probability to ๐(๐ฅ) ๐๐ฅ, and so, for any ๐3, ๐4 > 0,
1
๐
๐โ๐=2
๐๐๐ (๐1 โ ๐๐)โ
โซ ๐
๐
๐ฅ๐(๐โ ๐ฅ)๐(๐ฅ) ๐๐ฅ
< ๐3, ๐ = 0, ..., 2๐โ 2,
and 1
๐
๐โ๐=1
๐๐๐ โ
โซ ๐
๐
๐ฅ๐๐(๐ฅ) ๐๐ฅ
< ๐4, ๐ = 0, ..., 2๐โ 2,
with probability 1โ ๐(1). This implies that
๐1(๐๐)โ ๐(๐)1 (๐๐, ๐) = min
๐โR๐
๐ =0
โ๐โ1๐1,๐2=0 ๐๐1๐๐2
[โซ ๐
๐๐ฅ๐1+๐2(๐โ ๐ฅ)๐(๐ฅ) ๐๐ฅ + (๐1, ๐2)
]โ๐โ1
๐1,๐2=0 ๐๐1๐๐2
[โซ ๐
๐๐ฅ๐1+๐2๐(๐ฅ) ๐๐ฅ + ๐ธ(๐1, ๐2)
] ,
where |(๐1, ๐2)| < ๐1 + ๐3 and |๐ธ(๐1, ๐2)| < ๐2 + ๐4, ๐1, ๐2 = 0, ...,๐โ 1, with probability
1โ ๐(1).
By the linearity of integration, the minimization problem
\[ \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\int_a^b P^2(x)\,\frac{b-x}{b-a}\,\sigma(x)\,dx}{\int_a^b P^2(x)\,\sigma(x)\,dx}, \]
when rewritten in terms of the polynomial coefficients $c_j$, $j = 0, \ldots, m-1$, corresponds to a generalized Rayleigh quotient $\frac{c^TAc}{c^TBc}$, where $A, B \in \mathcal{S}^m_{++}$ and $\lambda_{max}(A)$, $\lambda_{max}(B)$, and $\lambda_{min}(B)$ are all constants independent of $n$, and $c = (c_0, \ldots, c_{m-1})^T$. By choosing $\epsilon_1, \epsilon_2, \epsilon_3, \epsilon_4$ sufficiently small,
\[ \left| \frac{c^T(A+\mathcal{E})c}{c^T(B+E)c} - \frac{c^TAc}{c^TBc} \right| = \left| \frac{(c^T\mathcal{E}c)\,c^TBc - (c^TEc)\,c^TAc}{(c^TBc + c^TEc)\,c^TBc} \right| \le \frac{(\epsilon_1+\epsilon_3)\,m\,\lambda_{max}(B) + (\epsilon_2+\epsilon_4)\,m\,\lambda_{max}(A)}{(\lambda_{min}(B) - (\epsilon_2+\epsilon_4)\,m)\,\lambda_{min}(B)} \le \epsilon \]
for all $c \in \mathbb{R}^m$ with probability $1-o(1)$. Applying Proposition 10 to the above integral minimization problem completes the proof.
The above theorem is a powerful tool for explicitly computing the error in the Lanczos method for certain types of matrices, as the computation of the extremal eigenvalue of an $m \times m$ matrix (corresponding to the largest zero) is a nominal computation compared to one application of an $n$ dimensional matrix. In Section 6, we will see that this convergence occurs quickly in practice. In addition, the above result provides strong evidence that the inverse quadratic dependence on $m$ in the error estimates throughout this chapter is not so much a worst case estimate, but actually indicative of error rates in practice. For instance, if the eigenvalues of a matrix are sampled from a distribution bounded above and below by some multiple of a Jacobi weight function, then, by equation (iv) of Proposition 13 and Theorem 21, it immediately follows that the error is of order $m^{-2}$. Of course, we note that Theorem 21 is equally applicable for estimating $\lambda_n$.
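For a concrete instance of Theorem 21, take a uniform spectrum on $[a,b] = [0,1]$, so $\sigma \equiv 1$ and the orthogonal polynomials are shifted Legendre polynomials. The sketch below (illustrative, not from the thesis) computes the limiting relative error $(b-\xi^{(m)})/(b-a)$ from the largest zero of the degree-$m$ Legendre polynomial, obtained via Gauss-Legendre nodes:

```python
import numpy as np

m = 8
nodes, _ = np.polynomial.legendre.leggauss(m)   # zeros of Legendre P_m on [-1, 1]
z_max = nodes.max()
# Map the largest zero to [0, 1]: xi^{(m)} = (1 + z_max)/2, so the limiting
# relative error on [0, 1] is (1 - xi^{(m)})/(1 - 0).
predicted_err = 1.0 - (1.0 + z_max) / 2.0
```

Running Lanczos on a large diagonal matrix with eigenvalues equally spaced in $[0,1]$ (as in the sketch following Lemma 22's proof) should reproduce this $\approx .02$ error at $m = 8$ up to finite-$n$ fluctuations, which is the content of Theorem 21.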
5.5 Estimates for Arbitrary Eigenvalues and Condition Number
Up to this point, we have concerned ourselves almost exclusively with the extremal eigenvalues $\lambda_1$ and $\lambda_n$ of a matrix. In this section, we extend the techniques of this chapter to arbitrary eigenvalues, and also obtain bounds for the condition number of a positive definite matrix. The results of this section provide the first uniform error estimates for arbitrary eigenvalues and the first lower bounds for the relative error with respect to the condition number. Lower bounds for arbitrary eigenvalues follow relatively quickly from our previous work. However, our proof technique for upper bounds requires some mild assumptions regarding the eigenvalue gaps of the matrix. We begin with asymptotic lower bounds for an arbitrary eigenvalue, and present the following corollary of Theorem 19.
Corollary 3.
\[ \max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\lambda_k(A) - \lambda_k^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge 1 - o(1) \right] \ge 1 - o(1/n) \]
for $m = o(\ln n)$,
\[ \max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\lambda_k(A) - \lambda_k^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{.015\,\ln^2 n}{m^2\ln^2\ln n} \right] \ge 1 - o(1/n) \]
for $m = \Theta(\ln n)$, and
\[ \max_{A\in\mathcal{S}^n}\; \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\lambda_k(A) - \lambda_k^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{1.08}{m^2} \right] \ge 1 - o(1/n) \]
for $m = o\big( n^{1/2}\ln^{-1/2} n \big)$ and $\omega(1)$.
Proof. Let $v_1, \ldots, v_n$ be orthonormal eigenvectors corresponding to $\lambda_1(A) \ge \cdots \ge \lambda_n(A)$, and $\tilde b = b - \sum_{i=1}^{k-1}(v_i^Tb)\,v_i$. By the inequalities
\[ \lambda_k^{(m)}(A,b) \le \max_{\substack{x\in\mathcal{K}_m(A,b)\setminus\{0\}\\ x^Tv_i = 0,\ i=1,\ldots,k-1}} \frac{x^TAx}{x^Tx} \le \max_{x\in\mathcal{K}_m(A,\tilde b)\setminus\{0\}} \frac{x^TAx}{x^Tx} = \lambda_1^{(m)}\big( A, \tilde b \big), \]
the relative error satisfies
\[ \frac{\lambda_k(A) - \lambda_k^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{\lambda_k(A) - \lambda_1^{(m)}(A,\tilde b)}{\lambda_1(A) - \lambda_n(A)}. \]
The right-hand side corresponds to an extremal eigenvalue problem of dimension $n-k+1$. Setting the largest eigenvalue of $A$ to have multiplicity $k$, and defining the eigenvalues $\lambda_k, \ldots, \lambda_n$ based on the eigenvalues (corresponding to dimension $n-k+1$) used in the proofs of Lemmas 20 and 21 completes the proof.
Next, we provide an upper bound for the relative error in approximating $\lambda_k$ under the assumption of non-zero gaps between the eigenvalues $\lambda_1, \ldots, \lambda_k$.
Theorem 22. Let $n \ge 100$, $m \ge 9+k$, $p \ge 1$, and $A \in \mathcal{S}^n$. Then
\[ \mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[ \left( \frac{\lambda_k(A) - \lambda_k^{(m)}(A,b)}{\lambda_k(A) - \lambda_n(A)} \right)^p \right]^{1/p} \le .068\, \frac{\ln^2\big( \delta^{-2(k-1)}\, n\,(m-k)^{8p} \big)}{(m-k)^2} \]
and
\[ \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\lambda_k(A) - \lambda_k^{(m)}(A,b)}{\lambda_k(A) - \lambda_n(A)} \le .571\, \frac{\ln^2\big( \delta^{-2(k-1)/3}\, n\,(m-k)^{2/3} \big)}{(m-k)^2} \right] \ge 1 - o(1/n), \]
where
\[ \delta = \frac12 \min_{i=2,\ldots,k} \frac{\lambda_{i-1}(A) - \lambda_i(A)}{\lambda_1(A) - \lambda_n(A)}. \]
Proof. As in previous cases, it suffices to prove the theorem for matrices $A$ with $\lambda_k(A) = 1$ and $\lambda_n(A) = 0$ (if $\lambda_k = \lambda_n$, we are done). Similar to the polynomial representation of $\lambda_1^{(m)}(A,b)$, the Ritz value $\lambda_k^{(m)}(A,b)$ corresponds to finding the polynomial in $\mathcal{P}_{m-1}$ with zeros $\lambda_j^{(m)}$, $j = 1, \ldots, k-1$, that maximizes the corresponding Rayleigh quotient. For the sake of brevity, let $Q_k(x) = \prod_{j=1}^{k-1}\big( \lambda_j^{(m)} - x \big)^2$. Then $\lambda_k^{(m)}(A,b)$ can be written as
\[ \lambda_k^{(m)}(A,b) = \max_{P\in\mathcal{P}_{m-k}} \frac{\sum_{i=1}^n b_i^2\, P^2(\lambda_i)\,Q_k(\lambda_i)\,\lambda_i}{\sum_{i=1}^n b_i^2\, P^2(\lambda_i)\,Q_k(\lambda_i)}, \]
and, therefore, the error is bounded above by
\[ \lambda_k(A) - \lambda_k^{(m)}(A,b) \le \min_{P\in\mathcal{P}_{m-k}} \frac{\sum_{i=k+1}^n b_i^2\, P^2(\lambda_i)\,Q_k(\lambda_i)\,(1-\lambda_i)}{\sum_{i=1}^n b_i^2\, P^2(\lambda_i)\,Q_k(\lambda_i)}. \]
The main idea of the proof is very similar to that of Lemma 22, paired with a pigeonhole principle. The intervals $[\lambda_i(A) - \delta(\lambda_1-\lambda_n),\, \lambda_i(A) + \delta(\lambda_1-\lambda_n)]$, $i = 1, \ldots, k$, are disjoint, and so there exists some index $i^*$ such that the corresponding interval does not contain any of the Ritz values $\lambda_j^{(m)}(A,b)$, $j = 1, \ldots, k-1$. We begin by bounding expected relative error. As in the proof of Lemma 22, by integrating over chi-square random variables and using Cauchy-Schwarz, we have
\begin{align*} \mathbb{E}\left[ \left( \lambda_k - \lambda_k^{(m)} \right)^p \right]^{\frac1p} &\le \min_{P\in\mathcal{P}_{m-k}} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i=k+1}^n y_iP^2(\lambda_i)Q_k(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)Q_k(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p} \\ &\le \min_{P\in\mathcal{P}_{m-k}} \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i<\beta} y_iP^2(\lambda_i)Q_k(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)Q_k(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p} \\ &\qquad + \left[ \int_{[0,\infty)^n} \left( \frac{\sum_{i:\lambda_i\in[\beta,1]} y_iP^2(\lambda_i)Q_k(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)Q_k(\lambda_i)} \right)^{\frac{p}{2^r}} f_Y(y)\,dy \right]^{\frac1p} \end{align*}
for $r \in \mathbb{N}$, $2p < 2^r \le 4p$, and any $\beta \in (0,1)$. The second term on the right-hand side is bounded above by $1-\beta$ (the integrand is at most $\max_{\lambda\in[\beta,1]}(1-\lambda) = 1-\beta$), and the integrand in the first term is bounded above by
\[ \left( \frac{\sum_{i:\lambda_i<\beta} y_iP^2(\lambda_i)Q_k(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)Q_k(\lambda_i)} \right)^{\frac{p}{2^r}} \le \left( \frac{\sum_{i:\lambda_i<\beta} y_iP^2(\lambda_i)Q_k(\lambda_i)(1-\lambda_i)^{2^r}}{\sum_{i=1}^n y_iP^2(\lambda_i)Q_k(\lambda_i)} \right)^{\frac14} \le \frac{\max_{x\in[0,\beta]} |P(x)|^{\frac12}\, Q_k^{\frac14}(x)\,(1-x)^{2^{r-2}}}{|P(\lambda_{i^*})|^{\frac12}\, Q_k^{\frac14}(\lambda_{i^*})} \left( \frac{\sum_{i:\lambda_i<\beta} y_i}{y_{i^*}} \right)^{\frac14}. \]
By replacing the minimizing polynomial in $\mathcal{P}_{m-k}$ by $T_{m-k}\left( \frac{2x}{\beta} - 1 \right)$, the maximum is achieved at $x = 0$, and, by the monotonicity of $T_{m-k}$ on $[1,\infty)$,
\[ T_{m-k}\left( \frac{2\lambda_{i^*}}{\beta} - 1 \right) \ge T_{m-k}\left( \frac{2}{\beta} - 1 \right) \ge \frac12 \exp\left( 2\sqrt{1-\beta}\,(m-k) \right). \]
In addition,
\[ \frac{Q_k^{1/4}(0)}{Q_k^{1/4}(\lambda_{i^*})} = \prod_{j=1}^{k-1} \left| \frac{\lambda_j^{(m)}}{\lambda_j^{(m)} - \lambda_{i^*}} \right|^{1/2} \le \delta^{-(k-1)/2}. \]
We can now bound the $p$-norm by
\[ \mathbb{E}\left[ \left( \lambda_k - \lambda_k^{(m)} \right)^p \right]^{\frac1p} \le \left[ \frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)}\, \frac{n^{1/4}}{\delta^{(k-1)/2}} \right]^{1/p} e^{-\gamma(m-k)/p} + \gamma^2, \]
where $\gamma = \sqrt{1-\beta}$. Setting $\gamma = \frac{p}{m-k}\ln\left( \delta^{-(k-1)/2p}\, n^{1/4p}\,(m-k)^2 \right)$ (assuming $\gamma < 1$, otherwise our bound is already greater than one, and trivially holds), we obtain
\[ \mathbb{E}\left[ \left( \lambda_k - \lambda_k^{(m)} \right)^p \right]^{\frac1p} \le \left( \frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)} \right)^{1/p} \frac{1}{(m-k)^2} + \frac1{16}\, \frac{\ln^2\big( \delta^{-2(k-1)}\, n\,(m-k)^{8p} \big)}{(m-k)^2} \le .068\, \frac{\ln^2\big( \delta^{-2(k-1)}\, n\,(m-k)^{8p} \big)}{(m-k)^2}, \]
for $m \ge 9+k$, $n \ge 100$. This completes the proof of the expected error estimate. We now focus on the probabilistic estimate. We have
\begin{align*} \lambda_k(A) - \lambda_k^{(m)}(A,b) &\le \min_{P\in\mathcal{P}_{m-k}} \frac{\sum_{i=k+1}^n Y_iP^2(\lambda_i)Q_k(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n Y_iP^2(\lambda_i)Q_k(\lambda_i)} \\ &\le \min_{P\in\mathcal{P}_{m-k}} \max_{x\in[0,\beta]} \frac{P^2(x)\,Q_k(x)\,(1-x)}{P^2(\lambda_{i^*})\,Q_k(\lambda_{i^*})}\, \frac{\sum_{i:\lambda_i<\beta} Y_i}{Y_{i^*}} + (1-\beta) \\ &\le \delta^{-2(k-1)}\, T^{-2}_{m-k}\left( \frac{2}{\beta} - 1 \right) \frac{\sum_{i:\lambda_i<\beta} Y_i}{Y_{i^*}} + (1-\beta) \le 4\,\delta^{-2(k-1)} \exp\left( -4\sqrt{1-\beta}\,(m-k) \right) \frac{\sum_{i:\lambda_i<\beta} Y_i}{Y_{i^*}} + (1-\beta). \end{align*}
By Proposition 11,
\[ \mathbb{P}\left[ \frac{\sum_{i:\lambda_i<\beta} Y_i}{Y_{i^*}} \ge n^{3.02} \right] \le \mathbb{P}\left[ Y_{i^*} \le n^{-2.01} \right] + \mathbb{P}\left[ \sum_{i\ne i^*} Y_i \ge n^{1.01} \right] \le \left( n^{-2.01} \right)^{1/2} + \left( n^{.01}\, e^{1-n^{.01}} \right)^{(n-1)/2} = o(1/n). \]
Let $\sqrt{1-\beta} = \frac{\ln\left( \delta^{-2(k-1)}\, n^{3.02}\,(m-k)^2 \right)}{4(m-k)}$. Then, with probability $1 - o(1/n)$,
\[ \lambda_k(A) - \lambda_k^{(m)}(A,b) \le \frac{4}{(m-k)^2} + \frac{\ln^2\left( \delta^{-2(k-1)}\, n^{3.02}\,(m-k)^2 \right)}{16\,(m-k)^2}. \]
The $4/(m-k)^2$ term is dominated by the log term as $m$ increases, and, therefore, with probability $1 - o(1/n)$,
\[ \lambda_k(A) - \lambda_k^{(m)}(A,b) \le .571\, \frac{\ln^2\big( \delta^{-2(k-1)/3}\, n\,(m-k)^{2/3} \big)}{(m-k)^2}. \]
This completes the proof.
For typical matrices with no repeated eigenvalues, $\delta^{-1}$ is usually a very low degree polynomial in $n$, and, for $k$ constant, the estimates for $\lambda_k$ are not much worse than those for $\lambda_1$. In addition, it seems likely that, given any matrix $A$, a small random perturbation of $A$ before the application of the Lanczos method will satisfy the condition of the theorem with high probability, and change the eigenvalues by a negligible amount. Of course, the bounds from Theorem 22 for maximal eigenvalues $\lambda_k$ also apply to minimal eigenvalues $\lambda_{n-k}$.
Next, we make use of Theorem 19 to produce error estimates for the condition number of a symmetric positive definite matrix. Let $\kappa(A) = \lambda_1(A)/\lambda_n(A)$ denote the condition number of a matrix $A \in \mathcal{S}^n_{++}$ and $\kappa^{(m)}(A,b) = \lambda_1^{(m)}(A,b)/\lambda_n^{(m)}(A,b)$ denote the condition number of the tridiagonal matrix $T_m \in \mathcal{S}^m_{++}$ resulting from $m$ iterations of the Lanczos method applied to a matrix $A \in \mathcal{S}^n_{++}$ with initial vector $b$. Their difference can be written as
\begin{align*} \kappa(A) - \kappa^{(m)}(A,b) &= \frac{\lambda_1}{\lambda_n} - \frac{\lambda_1^{(m)}}{\lambda_n^{(m)}} = \frac{\lambda_1\lambda_n^{(m)} - \lambda_n\lambda_1^{(m)}}{\lambda_n\lambda_n^{(m)}} = \frac{\lambda_n^{(m)}\big( \lambda_1 - \lambda_1^{(m)} \big) + \lambda_1^{(m)}\big( \lambda_n^{(m)} - \lambda_n \big)}{\lambda_n\lambda_n^{(m)}} \\ &= (\kappa(A) - 1)\left[ \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} + \kappa^{(m)}(A,b)\, \frac{\lambda_n^{(m)} - \lambda_n}{\lambda_1 - \lambda_n} \right], \end{align*}
which leads to the bounds
\[ \frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa(A)} \le \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} + \kappa(A)\, \frac{\lambda_n^{(m)} - \lambda_n}{\lambda_1 - \lambda_n} \]
and
\[ \frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge (\kappa(A) - 1)\, \frac{\lambda_n^{(m)} - \lambda_n}{\lambda_1 - \lambda_n}. \]
Using Theorem 19 and Minkowski's inequality, we have the following corollary.

Corollary 4. For any $\bar\kappa > 1$,
\[ \sup_{\substack{A\in\mathcal{S}^n_{++}\\ \kappa(A) = \bar\kappa}} \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge (1-o(1))\,(\bar\kappa - 1) \right] \ge 1 - o(1/n) \]
for $m = o(\ln n)$,
\[ \sup_{\substack{A\in\mathcal{S}^n_{++}\\ \kappa(A) = \bar\kappa}} \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge .015\,(\bar\kappa - 1)\, \frac{\ln^2 n}{m^2\ln^2\ln n} \right] \ge 1 - o(1/n) \]
for $m = \Theta(\ln n)$,
\[ \sup_{\substack{A\in\mathcal{S}^n_{++}\\ \kappa(A) = \bar\kappa}} \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\left[ \frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge 1.08\, \frac{\bar\kappa - 1}{m^2} \right] \ge 1 - o(1/n) \]
for $m = o\big( n^{1/2}\ln^{-1/2} n \big)$ and $\omega(1)$, and
\[ \sup_{\substack{A\in\mathcal{S}^n_{++}\\ \kappa(A) = \bar\kappa}} \mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\left[ \left( \frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa(A)} \right)^p \right]^{1/p} \le .068\,(\bar\kappa + 1)\, \frac{\ln^2\big( n(m-1)^{8p} \big)}{(m-1)^2} \]
for $n \ge 100$, $m \ge 10$, $p \ge 1$.
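Note that $\kappa^{(m)}(A,b)$ always underestimates $\kappa(A)$, since $\lambda_1^{(m)} \le \lambda_1$ and $\lambda_n^{(m)} \ge \lambda_n$. The sketch below (illustrative, not from the thesis; Lanczos with full reorthogonalization on a diagonal SPD matrix of our choosing) demonstrates this one-sided behavior:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 1500, 25
eigs = np.linspace(0.1, 10.0, n)         # SPD spectrum; kappa(A) = 100
b = rng.standard_normal(n)
b /= np.linalg.norm(b)

Q = np.zeros((n, m))
alpha, beta = np.zeros(m), np.zeros(m - 1)
q = b.copy()
for j in range(m):
    Q[:, j] = q
    w = eigs * q                         # A @ q for diagonal A
    alpha[j] = q @ w
    w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
    if j < m - 1:
        beta[j] = np.linalg.norm(w)
        q = w / beta[j]
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
ritz = np.linalg.eigvalsh(T)
kappa_true = eigs[-1] / eigs[0]
kappa_ritz = ritz[-1] / ritz[0]          # condition number of T_m
```

Here `kappa_ritz` approaches `kappa_true` from below as $m$ grows, consistent with the lower bound on $(\kappa - \kappa^{(m)})/\kappa^{(m)}$ above.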
As previously mentioned, it is not possible to produce uniform bounds for the relative error of $\kappa^{(m)}(A,b)$, and so some dependence on $\kappa(A)$ is necessary.
5.6 Experimental Results
In this section, we present a number of experimental results that illustrate the error of the
Lanczos method in practice. We consider:
โข the eigenvalues of the one-dimensional Laplacian with Dirichlet boundary conditions, $\Lambda_{lap} = \{2 + 2\cos(i\pi/(n+1))\}_{i=1}^{n}$,

โข a uniform partition of $[0,1]$, $\Lambda_{unif} = \{(n-i)/(n-1)\}_{i=1}^{n}$,

โข eigenvalues from the semi-circle distribution, $\Lambda_{semi} = \{\lambda_i\}_{i=1}^{n}$, where
\[ \frac12 + \frac{\lambda_i\sqrt{1-\lambda_i^2} + \arcsin \lambda_i}{\pi} = \frac{n-i}{n-1}, \qquad i = 1, \ldots, n, \]

โข eigenvalues corresponding to Lemma 21, $\Lambda_{log} = \left\{ 1 - \left[ (n+1-i)/n \right]^{\ln\ln n/\ln n} \right\}_{i=1}^{n}$.
For each one of these spectra, we perform tests for dimensions $n = 10^i$, $i = 5, 6, 7, 8$. For each dimension, we generate 100 random vectors $b \sim \mathcal{U}(S^{n-1})$, and perform $m = 100$ iterations of the Lanczos method on each vector. In Figure 5-1, we report the results of these experiments. In particular, for each spectrum, we plot the empirical average relative error $(\lambda_1 - \lambda_1^{(m)})/(\lambda_1 - \lambda_n)$ for each dimension as $m$ varies from 1 to 100. In addition, for $n = 10^8$, we present a box plot for each spectrum, illustrating the variability in relative error that each spectrum possesses.
In Figure 5-1$^2$, the plots of $\Lambda_{lap}$, $\Lambda_{unif}$, and $\Lambda_{semi}$ all illustrate an empirical average error estimate that scales with $m^{-2}$ and has no dependence on dimension $n$ (for $n$ sufficiently large). This is consistent with the theoretical results of the chapter, most notably Theorem 21, as all of these spectra exhibit suitable convergence of their empirical spectral distribution, and the related integral minimization problems all have solutions of order $m^{-2}$. In addition, the box plots corresponding to $n = 10^8$ illustrate that the relative error for a given iteration number has a relatively small variance. For instance, all extreme values remain within a range of length less than $.01\,m^{-2}$ for $\Lambda_{lap}$, $.1\,m^{-2}$ for $\Lambda_{unif}$, and $.4\,m^{-2}$ for $\Lambda_{semi}$. This is also consistent with the convergence of Theorem 21. The empirical spectral distributions of all three spectra converge to shifted and scaled versions of Jacobi weight functions, namely, $\Lambda_{lap}$ corresponds to $\omega_{-1/2,-1/2}(x)$, $\Lambda_{unif}$ to $\omega_{0,0}(x)$, and $\Lambda_{semi}$ to $\omega_{1/2,1/2}(x)$. The limiting value of $m^2\big( 1 - \xi^{(m)} \big)/2$ for each of these three cases is given by $j^2_{\alpha,1}/4$, where $j_{\alpha,1}$ is

$^2$In Figure 5-1, the box plots are labelled as follows. For a given $m$, the $25^{th}$ and $75^{th}$ percentiles of the values, denoted by $q_1$ and $q_3$, are the bottom and top of the corresponding box, and the red line in the box is the median. The whiskers extend to the most extreme points in the interval $[q_1 - 1.5(q_3-q_1),\, q_3 + 1.5(q_3-q_1)]$, and outliers not in this interval correspond to the "+" symbol.
[Figure 5-1, panels (a)-(h): plot and box plot for each of $\Lambda_{lap}$, $\Lambda_{unif}$, $\Lambda_{semi}$, and $\Lambda_{log}$.]

Figure 5-1: Plot and box plot of relative error times $m^2$ vs. iteration number $m$ for $\Lambda_{lap}$, $\Lambda_{unif}$, $\Lambda_{semi}$, and $\Lambda_{log}$. The plot contains curves for each dimension $n$ tested. Each curve represents the empirical average relative error for each value of $m$, averaged over 100 random initializations. The box plot illustrates the variability of relative error for $n = 10^8$.
the first positive zero of the Bessel function J_α(x); namely, the scaled error converges to
π^2/16 ≈ .617 for Λ_cos, ≈ 1.45 for Λ_unif, and π^2/4 ≈ 2.47 for Λ_semi. Two additional
properties suggested by Figure 5-1 are that the variance and the dimension n required to
observe asymptotic behavior both appear to increase with α. This is consistent with the
theoretical results, as Lemma 21 (and Λ_log) results from considering ω^{α,0}(x) with α as a
function of n.
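These limiting constants are straightforward to check numerically. The sketch below (illustrative only; the helper functions are not from the thesis) locates j_{1,α} from the power series of J_α and evaluates j_{1,α}^2/4 for the three weights above:

```python
import math

def bessel_j(alpha, x, terms=40):
    """J_alpha(x) via its power series; ample accuracy for 0 < x <= 6."""
    return sum((-1) ** k / (math.factorial(k) * math.gamma(k + alpha + 1))
               * (x / 2) ** (2 * k + alpha) for k in range(terms))

def first_bessel_zero(alpha):
    """First positive zero j_{1,alpha} of J_alpha: scan for a sign
    change, then refine it by bisection."""
    step = 0.01
    x_prev, f_prev = step, bessel_j(alpha, step)
    x = x_prev + step
    while x < 6:
        f = bessel_j(alpha, x)
        if f_prev * f < 0:
            lo, hi = x_prev, x
            for _ in range(60):
                mid = (lo + hi) / 2
                if bessel_j(alpha, lo) * bessel_j(alpha, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            return (lo + hi) / 2
        x_prev, f_prev = x, f
        x += step
    raise ValueError("no zero found below 6")

# j_{1,-1/2} = pi/2, j_{1,0} ~= 2.4048, and j_{1,1/2} = pi, so j^2/4
# gives pi^2/16 ~= 0.617, ~1.446, and pi^2/4 ~= 2.467, respectively.
for alpha in (-0.5, 0.0, 0.5):
    j1 = first_bessel_zero(alpha)
    print(alpha, j1 ** 2 / 4)
```

The half-integer cases can be sanity-checked by hand, since J_{-1/2} and J_{1/2} are proportional to cos(x)/√x and sin(x)/√x.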
The plot of relative error for Λ_log illustrates that the relative error does indeed depend on
n in a tangible way, increasing with n in what appears to be a logarithmic fashion. For
instance, looking at the average relative error scaled by m^2, the maximum over all iteration
numbers m appears to increase roughly logarithmically (≈ 4 for n = 10^5, ≈ 5 for n = 10^6,
≈ 6 for n = 10^7, and ≈ 7 for n = 10^8). In addition, the box plot for n = 10^8 illustrates
that this spectrum exhibits a large degree of variability and is susceptible to extreme
outliers. These numerical results support the theoretical lower bounds of Section 3, and
illustrate that the asymptotic theoretical lower bounds that depend on n do occur in
practical computations.
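The qualitative m^{-2} behavior discussed above is easy to reproduce at small scale. The following sketch (illustrative only; the thesis experiments use dimensions up to 10^8, while this toy version uses n = 2000 and a diagonal test matrix) runs Lanczos with full reorthogonalization on a Chebyshev-distributed spectrum and reports the relative error scaled by m^2:

```python
import numpy as np

def lanczos_max_eig(eigs, m, rng):
    """Largest Ritz value after m Lanczos iterations applied to
    A = diag(eigs), with a random unit start vector and full
    reorthogonalization for numerical stability."""
    n = len(eigs)
    Q = np.zeros((n, m))
    alphas, betas = [], []
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(m):
        Q[:, j] = q
        w = eigs * q                                # A @ q, A diagonal
        alphas.append(q @ w)
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)    # reorthogonalize
        b = np.linalg.norm(w)
        if j == m - 1 or b < 1e-12:
            break
        betas.append(b)
        q = w / b
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.linalg.eigvalsh(T).max()

# Spectrum whose empirical distribution follows the Chebyshev weight
n = 2000
eigs = np.cos(np.pi * (2 * np.arange(1, n + 1) - 1) / (2 * n))
rng = np.random.default_rng(0)
for m in (8, 16, 32):
    err = np.mean([(eigs.max() - lanczos_max_eig(eigs, m, rng))
                   / (eigs.max() - eigs.min()) for _ in range(20)])
    # theory predicts the scaled error approaches pi^2/16 ~= 0.617
    # in the large-n, large-m limit
    print(m, m ** 2 * err)
```

The relative error here is measured against the spectral width, matching the scaling used in the figure.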
Bibliography
[1] Raja Hafiz Affandi, Emily B. Fox, Ryan P. Adams, and Benjamin Taskar. Learning the parameters of determinantal point process kernels. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 1224–1232, 2014.
[2] Edoardo Amaldi, Claudio Iuliano, and Romeo Rizzi. Efficient deterministic algorithms for finding a minimum cycle basis in undirected graphs. In International Conference on Integer Programming and Combinatorial Optimization, pages 397–410. Springer, 2010.
[3] Nima Anari and Thuy-Duong Vuong. Simple and near-optimal MAP inference for nonsymmetric DPPs. arXiv preprint arXiv:2102.05347, 2021.
[4] Vishal Arul. Personal communication.
[5] Mihai Badoiu, Kedar Dhamdhere, Anupam Gupta, Yuri Rabinovich, Harald Räcke, Ramamoorthi Ravi, and Anastasios Sidiropoulos. Approximation algorithms for low-distortion embeddings into low-dimensional spaces. In SODA, volume 5, pages 119–128. Citeseer, 2005.
[6] Nematollah Kayhan Batmanghelich, Gerald Quon, Alex Kulesza, Manolis Kellis, Polina Golland, and Luke Bornn. Diversifying sparsity using variational determinantal point processes. arXiv preprint arXiv:1411.6307, 2014.
[7] Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis G. Tollis. Graph drawing: algorithms for the visualization of graphs. Prentice Hall PTR, 1998.
[8] Julius Borcea, Petter Brändén, and Thomas Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22(2):521–567, 2009.
[9] Ingwer Borg and Patrick J. F. Groenen. Modern multidimensional scaling: Theory and applications. Springer Science & Business Media, 2005.
[10] Alexei Borodin. Determinantal point processes. arXiv preprint arXiv:0911.1153, 2009.
[11] Alexei Borodin and Eric M. Rains. Eynard–Mehta theorem, Schur process, and their Pfaffian analogs. Journal of Statistical Physics, 121(3):291–317, 2005.
[12] Jane Breen, Alex Riasanovsky, Michael Tait, and John Urschel. Maximum spread of graphs and bipartite graphs. In preparation.
[13] Andries E. Brouwer and Willem H. Haemers. Spectra of graphs. Springer Science & Business Media, 2011.
[14] Victor-Emmanuel Brunel. Learning signed determinantal point processes through the principal minor assignment problem. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
[15] Victor-Emmanuel Brunel, Michel Goemans, and John Urschel. Recovering a magnitude-symmetric matrix from its principal minors. In preparation.
[16] Victor-Emmanuel Brunel, Ankur Moitra, Philippe Rigollet, and John Urschel. Maximum likelihood estimation of determinantal point processes. arXiv preprint arXiv:1701.06501, 2017.
[17] Victor-Emmanuel Brunel, Ankur Moitra, Philippe Rigollet, and John Urschel. Rates of estimation for determinantal point processes. In Conference on Learning Theory, pages 343–345. PMLR, 2017.
[18] David Carlson. What are Schur complements, anyway? Linear Algebra and its Applications, 74:257–275, 1986.
[19] Lawrence Cayton and Sanjoy Dasgupta. Robust Euclidean embedding. In Proceedings of the 23rd International Conference on Machine Learning, pages 169–176, 2006.
[20] Norishige Chiba, Takao Nishizeki, Shigenobu Abe, and Takao Ozawa. A linear algorithm for embedding planar graphs using PQ-trees. Journal of Computer and System Sciences, 30(1):54–76, 1985.
[21] David M. Chickering, Dan Geiger, and David Heckerman. On finding a cycle basis with a shortest maximal cycle. Information Processing Letters, 54(1):55–58, 1995.
[22] Ali Çivril and Malik Magdon-Ismail. On selecting a maximum volume sub-matrix of a matrix and related problems. Theoretical Computer Science, 410(47-49):4801–4811, 2009.
[23] Ed S. Coakley and Vladimir Rokhlin. A fast divide-and-conquer algorithm for computing the spectra of real symmetric tridiagonal matrices. Applied and Computational Harmonic Analysis, 34(3):379–414, 2013.
[24] Martin Costabel. Boundary integral operators on Lipschitz domains: elementary results. SIAM J. Math. Anal., 19(3):613–626, 1988.
[25] Jack Cuzick. A strong law for weighted sums of i.i.d. random variables. Journal of Theoretical Probability, 8(3):625–641, 1995.
[26] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
[27] Timothy A. Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS), 38(1):1–25, 2011.
[28] Jan De Leeuw. Convergence of the majorization method for multidimensional scaling. Journal of Classification, 5(2):163–180, 1988.
[29] Jan De Leeuw. Applications of convex analysis to multidimensional scaling. In J. R. Barra, F. Brodeau, G. Romier, B. Van Cutsem, et al., editors, Recent Developments in Statistics. Citeseer, 1977.
[30] Gianna M. Del Corso and Giovanni Manzini. On the randomized error of polynomial methods for eigenvector and eigenvalue estimates. Journal of Complexity, 13(4):419–456, 1997.
[31] Erik Demaine, Adam Hesterberg, Frederic Koehler, Jayson Lynch, and John Urschel. Multidimensional scaling: Approximation and complexity. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 2568–2578. PMLR, 18–24 Jul 2021.
[32] Amit Deshpande and Luis Rademacher. Efficient volume sampling for row/column subset selection. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 329–338. IEEE, 2010.
[33] Emeric Deutsch. On the spread of matrices and polynomials. Linear Algebra and its Applications, 22:49–55, 1978.
[34] Michel Marie Deza and Monique Laurent. Geometry of cuts and metrics, volume 15. Springer, 2009.
[35] Peter Sheridan Dodds, Roby Muhamad, and Duncan J. Watts. An experimental study of search in global social networks. Science, 301(5634):827–829, 2003.
[36] Petros Drineas, Ilse C. F. Ipsen, Eugenia-Maria Kontopoulou, and Malik Magdon-Ismail. Structural convergence results for approximation of dominant subspaces from block Krylov spaces. SIAM Journal on Matrix Analysis and Applications, 39(2):567–586, 2018.
[37] Kathy Driver, Kerstin Jordaan, and Norbert Mbuyi. Interlacing of the zeros of Jacobi polynomials with different parameters. Numerical Algorithms, 49(1-4):143, 2008.
[38] Peter Eades. A heuristic for graph drawing. Congressus Numerantium, 42:149–160, 1984.
[39] John Ellson, Emden Gansner, Lefteris Koutsofios, Stephen C. North, and Gordon Woodhull. Graphviz – open source graph drawing tools. In International Symposium on Graph Drawing, pages 483–484. Springer, 2001.
[40] Miroslav Fiedler. Remarks on the Schur complement. Linear Algebra Appl., 39:189–195, 1981.
[41] Klaus-Jürgen Förster and Knut Petras. On estimates for the weights in Gaussian quadrature in the ultraspherical case. Mathematics of Computation, 55(191):243–264, 1990.
[42] Emilio Gagliardo. Caratterizzazioni delle tracce sulla frontiera relative ad alcune classi di funzioni in n variabili. Rend. Sem. Mat. Univ. Padova, 27:284–305, 1957.
[43] Emden R. Gansner, Yehuda Koren, and Stephen North. Graph drawing by stress majorization. In International Symposium on Graph Drawing, pages 239–250. Springer, 2004.
[44] Mike Gartrell, Victor-Emmanuel Brunel, Elvis Dohmatob, and Syrine Krichene. Learning nonsymmetric determinantal point processes. arXiv preprint arXiv:1905.12962, 2019.
[45] Mike Gartrell, Ulrich Paquet, and Noam Koenigstein. Low-rank factorization of determinantal point processes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
[46] Jennifer A. Gillenwater, Alex Kulesza, Emily Fox, and Ben Taskar. Expectation-maximization for learning determinantal point processes. In NIPS, 2014.
[47] Gene H. Golub and Charles F. Van Loan. Matrix computations, volume 3. JHU Press, 2012.
[48] David A. Gregory, Daniel Hershkowitz, and Stephen J. Kirkland. The spread of the spectrum of a graph. Linear Algebra and its Applications, 332:23–35, 2001.
[49] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
[50] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 1988. Reprint of the 1952 edition.
[51] Joseph Douglas Horton. A polynomial-time algorithm to find the shortest cycle basis of a graph. SIAM Journal on Computing, 16(2):358–366, 1987.
[52] Xiaozhe Hu, John C. Urschel, and Ludmil T. Zikatanov. On the approximation of Laplacian eigenvalues in graph disaggregation. Linear and Multilinear Algebra, 65(9):1805–1822, 2017.
[53] Charles R. Johnson, Ravinder Kumar, and Henry Wolkowicz. Lower bounds for the spread of a matrix. Linear Algebra and its Applications, 71:161–173, 1985.
[54] Charles R. Johnson and Michael J. Tsatsomeros. Convex sets of nonsingular and P-matrices. Linear and Multilinear Algebra, 38(3):233–239, 1995.
[55] Tomihisa Kamada, Satoru Kawai, et al. An algorithm for drawing general undirected graphs. Information Processing Letters, 31(1):7–15, 1989.
[56] Shmuel Kaniel. Estimates for some computational techniques in linear algebra. Mathematics of Computation, 20(95):369–378, 1966.
[57] Michael Kaufmann and Dorothea Wagner. Drawing graphs: methods and models, volume 2025. Springer, 2003.
[58] Sanjeev Khanna, Madhu Sudan, Luca Trevisan, and David P. Williamson. The approximability of constraint satisfaction problems. SIAM Journal on Computing, 30(6):1863–1920, 2001.
[59] Kevin Knudson and Evelyn Lamb. My Favorite Theorem, episode 23 – Ingrid Daubechies.
[60] Y. Koren. Drawing graphs by eigenvectors: theory and practice. Comput. Math. Appl., 49(11-12):1867–1888, 2005.
[61] Yehuda Koren. On spectral graph drawing. In Computing and Combinatorics, volume 2697 of Lecture Notes in Comput. Sci., pages 496–508. Springer, Berlin, 2003.
[62] Yehuda Koren, Liran Carmel, and David Harel. Drawing huge graphs by algebraic multigrid optimization. Multiscale Model. Simul., 1(4):645–673 (electronic), 2003.
[63] B. Korte and J. Vygen. Combinatorial Optimization: Theory and Algorithms. Springer, 2011.
[64] Joseph B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.
[65] Joseph B. Kruskal. Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29(2):115–129, 1964.
[66] Joseph B. Kruskal. Multidimensional scaling. Number 11. Sage, 1978.
[67] Jacek Kuczynski and Henryk Wozniakowski. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. SIAM Journal on Matrix Analysis and Applications, 13(4):1094–1122, 1992.
[68] Jacek Kuczynski and Henryk Wozniakowski. Probabilistic bounds on the extremal eigenvalues and condition number by the Lanczos algorithm. SIAM Journal on Matrix Analysis and Applications, 15(2):672–691, 1994.
[69] A. Kulesza. Learning with determinantal point processes. PhD thesis, University of Pennsylvania, 2012.
[70] Alex Kulesza and Ben Taskar. k-DPPs: Fixed-size determinantal point processes. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 1193–1200, 2011.
[71] Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. arXiv preprint arXiv:1207.6083, 2012.
[72] Alex Kulesza and Ben Taskar. Determinantal Point Processes for Machine Learning. Now Publishers Inc., Hanover, MA, USA, 2012.
[73] Donghoon Lee, Geonho Cha, Ming-Hsuan Yang, and Songhwai Oh. Individualness and determinantal point processes for pedestrian detection. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI, pages 330–346, 2016.
[74] Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast DPP sampling for Nyström with application to kernel methods. International Conference on Machine Learning (ICML), 2016.
[75] Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast sampling for strongly Rayleigh measures with application to determinantal point processes. arXiv preprint arXiv:1607.03559, 2016.
[76] Hui Lin and Jeff A. Bilmes. Learning mixtures of submodular shells with application to document summarization. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, August 14-18, 2012, pages 479–490, 2012.
[77] J.-L. Lions and E. Magenes. Non-homogeneous boundary value problems and applications. Vol. I. Springer-Verlag, New York-Heidelberg, 1972. Translated from the French by P. Kenneth, Die Grundlehren der mathematischen Wissenschaften, Band 181.
[78] Chia-an Liu and Chih-wen Weng. Spectral radius of bipartite graphs. Linear Algebra and its Applications, 474:30–43, 2015.
[79] Raphael Loewy. Principal minors and diagonal similarity of matrices. Linear Algebra and its Applications, 78:23–64, 1986.
[80] Zelda Mariet and Suvrit Sra. Fixed-point algorithms for learning determinantal point processes. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 2389–2397, 2015.
[81] Claire Mathieu and Warren Schudy. Yet another algorithm for dense max cut: go greedy. In SODA, pages 176–182, 2008.
[82] Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
[83] Leon Mirsky. The spread of a matrix. Mathematika, 3(2):127–130, 1956.
[84] Cameron Musco and Christopher Musco. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Advances in Neural Information Processing Systems, pages 1396–1404, 2015.
[85] Assaf Naor. An introduction to the Ribe program. Japanese Journal of Mathematics, 7(2):167–233, 2012.
[86] S. V. Nepomnyaschikh. Mesh theorems on traces, normalizations of function traces and their inversion. Soviet J. Numer. Anal. Math. Modelling, 6(3):223–242, 1991.
[87] Jindřich Nečas. Direct methods in the theory of elliptic equations. Springer Monographs in Mathematics. Springer, Heidelberg, 2012. Translated from the 1967 French original by Gérard Tronel and Alois Kufner, editorial coordination and preface by Šárka Nečasová and a contribution by Christian G. Simader.
[88] Aleksandar Nikolov. Randomized rounding for the largest simplex problem. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, pages 861–870. ACM, 2015.
[89] Aleksandar Nikolov and Mohit Singh. Maximizing determinants under partition constraints. In STOC, pages 192–201, 2016.
[90] Geno Nikolov. Inequalities of Duffin-Schaeffer type II. Institute of Mathematics and Informatics at the Bulgarian Academy of Sciences, 2003.
[91] Christopher Conway Paige. The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London, 1971.
[92] Beresford N. Parlett, H. Simon, and L. M. Stringer. On estimating the largest eigenvalue with the Lanczos algorithm. Mathematics of Computation, 38(157):153–165, 1982.
[93] Jack Poulson. High-performance sampling of generic determinantal point processes. Philosophical Transactions of the Royal Society A, 378(2166):20190059, 2020.
[94] Patrick Rebeschini and Amin Karbasi. Fast mixing for discrete point processes. In COLT, pages 1480–1500, 2015.
[95] Alex W. N. Riasanovsky. spread_numeric. https://github.com/ariasanovsky/spread_numeric, commit c75e5d1726361eba04292c46c41009ea76e401e9, 2021.
[96] Justin Rising, Alex Kulesza, and Ben Taskar. An efficient algorithm for the symmetric principal minor assignment problem. Linear Algebra and its Applications, 473:126–144, 2015.
[97] Dhruv Rohatgi, John C. Urschel, and Jake Wellens. Regarding two conjectures on clique and biclique partitions. arXiv preprint arXiv:2005.02529, 2020.
[98] Donald J. Rose, R. Endre Tarjan, and George S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM Journal on Computing, 5(2):266–283, 1976.
[99] Yousef Saad. On the rates of convergence of the Lanczos and the block-Lanczos methods. SIAM Journal on Numerical Analysis, 17(5):687–706, 1980.
[100] Yousef Saad. Numerical methods for large eigenvalue problems: revised edition, volume 66. SIAM, 2011.
[101] Warren Schudy. Approximation Schemes for Inferring Rankings and Clusterings from Pairwise Data. PhD thesis, Brown University, 2012.
[102] Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In 17th Annual Symposium on Foundations of Computer Science (SFCS 1976), pages 208–215. IEEE, 1976.
[103] Jie Shen, Tao Tang, and Li-Lian Wang. Spectral methods: algorithms, analysis and applications, volume 41. Springer Science & Business Media, 2011.
[104] Max Simchowitz, Ahmed El Alaoui, and Benjamin Recht. On the gap between strict-saddles and true convexity: An Omega(log d) lower bound for eigenvector approximation. arXiv preprint arXiv:1704.04548, 2017.
[105] Max Simchowitz, Ahmed El Alaoui, and Benjamin Recht. Tight query complexity lower bounds for PCA via finite sample deformed Wigner law. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1249–1259, 2018.
[106] Jasper Snoek, Richard S. Zemel, and Ryan Prescott Adams. A determinantal point process latent variable model for inhibition in neural spiking data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 1932–1940, 2013.
[107] Ernst Steinitz. Polyeder und Raumeinteilungen. Encyk. der Math. Wiss., 12:38–43, 1922.
[108] Marco Di Summa, Friedrich Eisenbrand, Yuri Faenza, and Carsten Moldenhauer. On largest volume simplices and sub-determinants. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 315–323. Society for Industrial and Applied Mathematics, 2015.
[109] J. W. Suurballe. Disjoint paths in a network. Networks, 4(2):125–145, 1974.
[110] Gábor Szegő. Orthogonal polynomials, volume 23. American Mathematical Soc., 1939.
[111] P. L. Tchebichef. Sur le rapport de deux intégrales étendues aux mêmes valeurs de la variable. Zapiski Akademii Nauk, vol. 44, Supplement no. 2, 1883. Oeuvres. Vol. 2, pp. 375-402.
[112] Tamás Terpai. Proof of a conjecture of V. Nikiforov. Combinatorica, 31(6):739–754, 2011.
[113] Lloyd N. Trefethen and David Bau III. Numerical linear algebra, volume 50. SIAM, 1997.
[114] Francesco G. Tricomi. Sugli zeri dei polinomi sferici ed ultrasferici. Annali di Matematica Pura ed Applicata, 31(4):93–97, 1950.
[115] Alexandre B. Tsybakov. Introduction to nonparametric estimation. Springer Series in Statistics. Springer, New York, 2009.
[116] W. T. Tutte. How to draw a graph. Proc. London Math. Soc. (3), 13:743–767, 1963.
[117] William Thomas Tutte. How to draw a graph. Proceedings of the London Mathematical Society, 3(1):743–767, 1963.
[118] John Urschel. spread_numeric+. https://math.mit.edu/~urschel/spread_numeric+.zip, April 2021. [Online].
[119] John Urschel, Victor-Emmanuel Brunel, Ankur Moitra, and Philippe Rigollet. Learning determinantal point processes with moments and cycles. In International Conference on Machine Learning, pages 3511–3520. PMLR, 2017.
[120] John C. Urschel. On the characterization and uniqueness of centroidal Voronoi tessellations. SIAM Journal on Numerical Analysis, 55(3):1525–1547, 2017.
[121] John C. Urschel. Nodal decompositions of graphs. Linear Algebra and its Applications, 539:60–71, 2018.
[122] John C. Urschel. Uniform error estimates for the Lanczos method. SIAM Journal on Matrix Analysis and Applications, to appear.
[123] John C. Urschel and Jake Wellens. Testing gap k-planarity is NP-complete. Information Processing Letters, 169:106083, 2021.
[124] John C. Urschel and Ludmil T. Zikatanov. Discrete trace theorems and energy minimizing spring embeddings of planar graphs. Linear Algebra and its Applications, 609:73–107, 2021.
[125] Jos L. M. Van Dorsselaer, Michiel E. Hochstenbach, and Henk A. Van Der Vorst. Computing probabilistic bounds for extreme eigenvalues of symmetric matrices with the Lanczos method. SIAM Journal on Matrix Analysis and Applications, 22(3):837–852, 2001.
[126] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.
[127] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.
[128] H. Whitney. Non-separable and planar graphs. Transactions of the American Mathematical Society, 34:339–362, 1932.
[129] Junliang Wu, Pingping Zhang, and Wenshi Liao. Upper bounds for the spread of a matrix. Linear Algebra and its Applications, 437(11):2813–2822, 2012.
[130] Yangqingxiang Wu and Ludmil Zikatanov. Fourier method for approximating eigenvalues of indefinite Stekloff operator. In International Conference on High Performance Computing in Science and Engineering, pages 34–46. Springer, 2017.
[131] Haotian Xu and Haotian Ou. Scalable discovery of audio fingerprint motifs in broadcast streams with determinantal point process based motif clustering. IEEE/ACM Trans. Audio, Speech & Language Processing, 24(5):978–989, 2016.
[132] Jin-ge Yao, Feifan Fan, Wayne Xin Zhao, Xiaojun Wan, Edward Y. Chang, and Jianguo Xiao. Tweet timeline generation with determinantal point processes. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 3080–3086, 2016.
[133] Grigory Yaroslavtsev. Going for speed: Sublinear algorithms for dense R-CSPs. arXiv preprint arXiv:1407.7887, 2014.
[134] Fuzhen Zhang, editor. The Schur complement and its applications, volume 4 of Numerical Methods and Algorithms. Springer-Verlag, New York, 2005.
[135] Jonathan X. Zheng, Samraat Pawar, and Dan F. M. Goodman. Graph drawing by stochastic gradient descent. IEEE Transactions on Visualization and Computer Graphics, 25(9):2738–2748, 2018.