1 a new spectral clustering algorithm - arxivclustering methods, the reader is referred to [18]; for...

1

A New Spectral Clustering AlgorithmW.R. Casper1 and Balu Nadiga2

Abstract—We present a new clustering algorithm that is based on searching for natural gaps in the components of thelowest energy eigenvectors of the Laplacian of a graph. In comparing the performance of the proposed method with a setof other popular methods (KMEANS, spectral-KMEANS, and an agglomerative method) in the context of theLancichinetti-Fortunato-Radicchi (LFR) Benchmark for undirected weighted overlapping networks, we find that the newmethod outperforms the other spectral methods considered in certain parameter regimes. Finally, in an application toclimate data involving one of the most important modes of interannual climate variability, the El Niño Southern Oscillationphenomenon, we demonstrate the ability of the new algorithm to readily identify different flavors of the phenomenon.

Index Terms—Clustering, eigenvectors, machine learning

F

1 INTRODUCTION

C LUSTERING is an unsupervised learning tech-nique used to identify natural subgroups of a

set of data wherein the members of a subgroup sharecertain properties. On representing the set of data as agraph, clustering methods group vertices of the graphinto clusters based on the edge structure of the graph[e.g., see 23]. We note here that the edge structureitself may be derived purely from interactions betweenvertices1, e.g., as in a social network, or based ona measure of similarity between the vertices them-selves, or some combination of the two. Needless tomention, clustering methods find use in an extremelywide range of disciplines (e.g., neuroscience [7], so-cial networks [17], computer vision [8], and digitalforensic analysis [4]). Indeed most modes of climatevariability are typically first identified by applyingclustering methods to climatological data sets [e.g.,14, 3, 29, 10, 26, and others].

Since approaches to clustering are extremely var-ied, we do not attempt even a brief overview of suchmethods. For that the reader is referred to reviewssuch as [23]. Instead, we shift focus directly to the newalgorithm after briefly considering common aspects ofspectral approaches. In this article, we propose a spec-tral clustering algorithm based on direct analysis ofthe magnitudes of components of the eigenvectors (ofthe Laplacian matrix; see next section) with smallesteigenvalues. We call this algorithm spectral maximumgap method (SP-MGM). We compare the performanceof this algorithm with popular clustering algorithms

1Department of Mathematics, Louisiana State University, BatonRouge LA. Email:[email protected], Los Alamos National Lab, Los Alamos, New Mexico.Email:[email protected]

1. and in which context clustering is commonly referred to ascommunity-detection

including k-means (KMEAN), more typical spectralclustering based on running kmeans on eigenvectors(SP-KMEANS), and agglomerative clustering (AGG).To determine the performance of these algorithms,we rely on a series of LFR benchmark graphs [13].Notably, it is seen that on a certain series of thesegraphs with varying size, average degree, and weightmixing parameter, our algorithm consistently outper-form these other methods. Following our benchmarkanalysis, we apply SP-MGM to a collection of monthlyaveraged sea surface temperatures in the Niño 3.4region of the Pacific Ocean. We demonstrate that ourmethod correctly and readily identifies the variousflavors of El Niño and La Niña events.

2 SPECTRAL CLUSTERING ALGORITHMS

Spectral clustering algorithms rely on using spectralcharacteristics of the Laplacian of a (weighted) graphto partition the vertices of the graph into naturalclusters. Here, the Laplacian can refer to the nor-malized Laplacian, the non-normalized Laplacian, oreven stranger entities like the p-Laplacian [2]. Theuse of spectral characteristics to identify clusters ina graph may be justified in numerous ways, such asby considering random walks, minimal cuts, or blockmatrix diagonalization. For a recent survey on spectralclustering methods, the reader is referred to [18]; foran introduction to spectral clustering itself, see, e.g.,[31].

To present a brief and intuitive description of spec-tral clustering methods, we consider the eigenvectorsof the Laplacian of a disconnected graph. Let G be agraph with adjacence matrix A and degree matrix D(the diagonal matrix whose nonzero entries are the de-grees of the vertices of G). Then the (non-normalized)

arX

iv:1

710.

0275

6v1

[cs

.LG

] 7

Oct

201

7

Laplacian of G is defined as L = D − A. The matrixL is positive semidefinite, and the kernel has a basisspanned by the indicator functions of the connectedcomponents of G. In particular, if G is disconnectedthen the connected components of G may be deducedfrom the kernel of L.

Clusters in a graph G are intuitively regions inG where the vertices are connected by edges withstrong weights, relative to the weights of edges goingbetween clusters. For this reason, the Laplacian L ofthe graph G should be very close to the Laplacian L̃ ofthe disconnected graph G̃ formed from G by deletingintercluster edges. This means that the eigenvalues ofL will correspond approximately to the eigenvaluesof L̃. Since L̃ is positive semidefinite, the smallesteigenvalue of L̃ is zero. Therefore the eigenvectors ofthe smallest eigenvalues of L should be small pertur-bations of vectors in the kernel of L̃. Thus by analyzingthe structure of the eigenvectors corresponding to thesmallest handful of eigenvalues ofL, we should expectto retrieve information about the clusters of G. It isexactly this idea that all spectral clustering algorithmsexploit. Note that if G is connected, then the kernelof L consists of vectors whose entries are all identical,and therefore does not provide any information. Forthis reason, the kernel of L is typically ignored inspectral clustering algorithms.

To further clarify this idea, consider the exampleof three cliques of size 4 joined together by singlebonds in general position, as in Figure 1a. The non-normalized Laplacian of this graph in block matrixform is

L =

K +K ′ Q QT

QT K +K ′ Q

Q QT K +K ′

,K the Laplacian of a 4× 4 clique, K ′ = QQT +QTQ,andQ the 4×4 matrix whose only nonzero entry is−1and occurs in the bottom left corner. The eigenvaluesof this matrix are 0 and 6 with multiplicity 1, 6−λ andλ with multiplicity 2, and 4 with multiplicity 6, wherehere λ ≈ 0.5051. A basis for the for the eigenspace ofthe lowest nonzero eigenvalue λ is plotted in Figure1b. The stucture of these eigenvectors successfullyreveals the presence of the three clusters.

In our algorithm, spectral clustering on a givendata set occurs in three parts. We first create a similar-ity matrixA describing the similarity or connectednessbetween each of our pieces of data. Secondly, wecalculate the eigenvectors of the Laplacian of A, andsort them in terms of the magnitudes of the associ-ated eigenvalues. Lastly, we use some number of theeigenvectors with the smallest nonzero eigenvaluesto calculate a natural partition of the vertices of thegraph.

(a) A graph with three clusters formed bycliques of size 4. Vertex color indicates inclu-sion in a particular clique, and all edges haveweight 1. The nodes are also indexed in thesame order as their appearance in the Lapla-cian matrix.

(b) A basis for the eigenspace of eigenvectors ofthe clique cluster graph with smallest nonzeroeigenvalue λ ≈ 0.5051. The eigenspace is two-dimensional, and the basis elements are color-coded. The structure of these eigenvectors re-flects cluster membership in that the values (forany given eigenvector) at indices belonging tothe same cluster are similar.

Fig. 1: An illustration of the basis for spectral cluster-ing.

A similarity matrix for a collection of n indexeddata points is an n × n symmetric matrix A whosei, j’th entry is a nonnegative value measuring thesimilarity between the i and j’th data values. It is as-sumed that larger values indicate more similar objects,and the diagonal elements are typically taken to be0. Similarity matrices can be created out of data in avariety of ways. One popular method is to start withthe distance dij between i and j (under some metric),and then define Aij = exp(−d2ij/2σ) for i 6= j, wherehere σ is a parameter.

The (symmetric) normalized Laplacian of A is thegraph Laplacian of the unique weighted graph G(A)

2

whose weight matrix is A. In particular, it is definedby

L = I −D−1/2AD−1/2

where here D is a diagonal matrix whose entries arethe sums of each of the column vectors of A. Thematrix L is positive-semidefinite and will always haveat least one eigenvector ~v with eigenvalue 0, givenby ~v = D1/2[1 . . . 1]T . However, assuming that thegraph G(A) associated to A is connected, this will bethe only eigenvector with eigenvalue 0, up to constantmultiples.

The smallest nonzero eigenvectors of L encodeclustering information for L, and it is these eigen-vectors which we wish to parse in the final step.The reason these eigenvectors contain clustering in-formation can be made intuitive by the followingargument. Clusters consist of regions in the graph withstrong interconnectivity. Specifically, given a vertexin a cluster we would expect that on average thestrength of its connection with other vertices in thesame cluster should be much greater than the strengthof its connection with vertices in other clusters. Fromthis point of view, we can decompose the adjacencymatrix A as A = B + C where B is a block diagonalmatrix whose blocks are formed by eliminating inter-cluster bonds, and where C = B−A is relatively smallcompared to B. In this case, the Laplacian L of A maybe written as L = LB + L′ where LB is the Laplacianof B and L′ is small relative to LB . Consequentlythe eigenvectors of LB will be comparable ot theeigenvectors of L. In particular the 0 eigenvectors ofLB correspond precisely with the eigenvectors of Lwith small eigenvalues. Moreover, these eigenvectorsdescribe the connected components of the graph G(B)of B, and these are precisely the clusters of G(A).In this way, using the eigenvectors of the smallestnonzero eigenvalues makes sense.

Where various clustering algorithms differ is inthis third step, where the eigenvector data is used tocalculate clusters. Before proposing our own cluster-ing scheme, we will recount two common methodsfound in the literature, the simplest method SP-G1 andthe most common method SP-KMEANS.

2.1 SP-G1 clusteringThe oldest and most basic algorithm is SP-G1, whichdecomposes the data into two clusters based on theentries of the eigenvector ~v = [v1 . . . vn]T with thelowest nonzero eigenvalue [6][9]. Specifically, a thesh-old value r needs to be chosen. Then the i’th data pointis put into the first cluster if vi < r, and otherwiseit is put into the second cluster. Of course, there areseveral natural choices for r, including r = n−1

∑i vi.

Of course, more sophisticated methods for the choiceof r exist[25].

Algorithm 1: SP-G1 clusteringInput : weight matrix AOutput: two clusters

1 Calculate normalized Laplacian L of A2 Calculate smallest positive eigenvalue λ1 of L3 Calculate an eigenvector ~v of λ14 Choose a threshold r5 if vi < r then6 put node i in cluster 17 else8 put node i in cluster 29 end

2.2 SP-KMEANS

A more complicated and very reliable clustering al-gorithm instead uses eigenvectors ~v1, . . . , ~vk of the ksmallest positive eigenvalues 0 < λ1 ≤ · · · ≤ λk ofthe Laplacian L of A. These eigenvectors are used asthe column vectors of an n × k matrix V . The matrixV is then normalized so that each of the rows hasnorm 1, and the associated row vectors are viewedas n points on the surface of a sphere in Rk. Thenby running KMEANS on these points, we return kclusters of the data C1, . . . , Ck which partition theoriginal graph [19][16]. The reliability and familiarityof this algorithm has lead to its popularity as one ofthe most common forms of spectral clustering, withimplementations in both R and python [20][11].

Algorithm 2: SP-KMEANS clusteringInput : weight matrix A

number of clusters kOutput: k clusters

1 Calculate normalized Laplacian L of A2 Calculate k smallest positive eigenvaluesλ1 < · · · < λk of L

3 Calculate an eigenvector ~vj of λi4 Construct an n× k matrix V = [~v1 . . . ~vk]5 Normalize V so that each row vector has norm 16 Run KMEANS on the row vectors of V

2.3 Spectral Maximum Gap Method (SP-MGM)

In the new SP-MGM algorithm we propose, the eigen-vectors ~v1, . . . , ~vk of the k smallest positive eigen-values 0 < λ1 ≤ · · · ≤ λk of the Laplacian L ofA are first calculated. Next, for each j we create anew vector ~uj whose entries are the entries of ~vj inincreasing order. Let vji and uji denote the entriesof ~vj and ~uj , respectively. Differences uj(i+1) − ujibetween subsequent entries of ~uj represent “gaps" in~vj .

3

We choose `j so that ~uj(`j+1) − ~uj`j is the largestgap. Then we create a partition Pj , Qj of {0, . . . , n−1}by placing vertex i in Pj if vji <= uj`j , and placingvertex i in Qj otherwise. Finally, for each subset S ofU = {1, . . . , k} we define a cluster

CS =

(⋂i∈S

Pi

)∩

⋂i∈U\S

Qi

,where here if S or U\S is empty, the associatedintersection is interpreted to be {0, 1, . . . , n − 1}. It isclear from the construction that {CS : S ⊆ U} forms apartition of {0, 1, . . . , n− 1}.

The number of clusters in this partition is at least 2,but can be up to 2n. However, in practice it is observedthat many of the clusters CS are repeated, and thealgorithm tends to find far fewer than 2n clusters. Infact, the number of clusters created in this way tendsto be close to the actual right number of clusters. Thiswill be discussed further in the benchmark sectionbelow. Note that we experimented with both using thenormalized and non-normalized Laplacian. We foundthat the non-normalized Laplacian performs better forthe benchmark graphs that we considered. We notethat a common and effective strategy in applying spec-tral methods to large data sets consists of adopting ahierarchical approach wherein spectral techniques areused on subsets of the data that are then recombined[e.g., see 15, 28, 24, 21].

Algorithm 3: SP-MGM clusteringInput : weight matrix A

number of modes kOutput: m clusters (2 ≤ m ≤ 2k)

1 Calculate non-normalized Laplacian L of A2 Calculate k smallest positive eigenvaluesλ1 < · · · < λk of L

3 Calculate an eigenvector ~vj of λi4 Set ~uj = sort(~vj)5 Choose `j so that uj(`j+1) − uj`j is largest6 if vji < uj`j then7 put node i in cluster Pj8 else9 put node i in cluster Qj

10 end11 For each S ⊆ {1, . . . , k} define

CS =

(⋂i∈S

Pi

)∩

⋂i∈U\S

Qi

,

2.3.1 A mathematical justification of the new algo-rithmWe provide a mathematical justification of the SP-MGM algorithm by applying the theory of eigenvector

permutations to the Laplacian of the graph G. Weprovide full mathematical details in the appendix forthe interested reader. However, we restrict ourselvesto a brief description of the justification here so as tokeep the narrative more generally accessible: First, weshow mathematically that if G consists of several well-defined clusters C1, . . . , Ck of size n1, . . . , nk then

maxi 6=m

(ni − 1)ρ1λi,2

<1

4n1/2m

,

for all 1 ≤ m ≤ k. Here λi,2 and ρ1 are spec-tral data which measure the quality of the clusters.Specifically, λi,2 is the smallest nonzero eigenvalueof cluster Ci and measures the connectivity of thecluster Ci. For this reason, it is sometimes called thealgebraic connectivity of the cluster Ci. The numberρ1 is the spectral radius of the Laplacian of the graphG formed by deleting all of the intra-cluster edges inG. In this way, it measures the strength of the inter-cluster connections. For a strongly clustered graph, ittherefore makes sense that ρ1 is small relative to λi,2for all i. Further, if G consists of several such clustersC1, . . . , Ck, then, we can mathematically guarantee thatSP-MGM will correctly identify the clusters. Again, thereader is referred to the appendix for details.

3 BENCHMARK EXPERIMENTS

3.1 LFR BenchmarkTo test the quality of our clustering/community-detection algorithm, we choose to use the LFR bench-mark. Benchmark graphs consist of randomly gen-erated graphs with built-in network ties. The prop-erties of the output graph are controlled by a se-ries of parameters. The parameters we consider andvary are the total number of nodes n, the averagedegree davg , the maximum degree dmax, and themixing parameter µ. Each node is given a degreebased on a power law distribution. The minimumand maximum degrees in the network are chosenso that the mean from the power law distributionis davg . Nodes are assigned to clusters such that the(weighted) fraction of intercluster edges is 0 ≤ µ ≤ 1.Source code for generating LFR Benchmark graphsis available at https://github.com/eXascaleInfolab/LFR-Benchmark_UndirWeightOvp.

Note also that each of the algorithms describedabove requires an important piece of user input:namely the number of clusters to look for. In practice,there are several ways of deciding how many clustersin a graph to look for, such as the elbow technique, thejump method, and silhouette analysis [12]. However,these seem outside the scope of the current paper.Instead, we supply each algorithm with the correctnumber of clusters to look for. The exception to this isSP-MGM, which does not take in a number of clusters

4

https://github.com/eXascaleInfolab/LFR-Benchmark_UndirWeightOvp

https://github.com/eXascaleInfolab/LFR-Benchmark_UndirWeightOvp

to look for. Instead, it takes in a number of eigenmodesm. However, in practice, the number of clusters it findsis m+ 1, so we choose m accordingly.

3.2 The Quality of a Clustering PartitionAfter a clustering algorithm identifies one or moreclusters in a graph, it is natural to want to identify thequality of the clusters. Are the vertices in a cluster nat-urally correlated in some fashion, or is the cluster onlyan artifact of the specified clustering algorithm? Whilethere is no single criteria for reliably determining thequality of a cluster, there exist several metrics whichmay be used in combination to judge the reasonabilityof ones choice of clusters. For a comparison of severalsuch metrics, see [1].

The LFR Benchmark produces a weighted graph,along with a predetermined true value for the neigh-borhoods. To determine the performance of each clus-tering algorithm we use two scoring metrics: the nor-malized mutual info score (NMI) and the adjustedrand score (ARS). In both metrics, the scores can rangebetween 0 and 1, with 1 representing perfect clusterprediction and 0 representing largely independentcluster prediction from the true clusters.

3.3 ResultsWe will now compare the performance of our algo-rithm to the performance of a traditional spectral clus-tering SP-KMEANS, as well as traditional KMEANSand an agglomerative clustering algorithm. This col-lection of algorithms was chosen as a basis of compar-ison since they are notably diverse in their methodolo-gies. At the same time, they are also popular enoughto have been implemented in the widely-used pythonmachine learning package scikit-learn [20]. While SP-KMEANS has already been described, the reader isreferred to [22] for descriptions of KMEANS and theagglomerative methods.

3.3.1 Variations in Mixing ParameterThe scores of clustering algorithms for the LFR bench-mark on graphs with 50 and 100 nodes with varyingmixing parameter are shown in Figures 2 and 3. Fromthese figures, it is evident, and notably so, that SP-MGM consistently outperforms the other clusteringalgorithms considered for small networks over theentire range of the mixing parameter considered.

Next, we examined the performance of SP-MGMrelative to the other algorithms considered while vary-ing the average vertex degree, and the network size.Figure 4 shows the results when average vertex degreewas varied. As the average network degree increased,all algorithms considered showed an increase in score.However, SP-MGM consistently out-performed theother algorithms considered. Furthermore, SP-MGM

Fig. 2: The average of the normalized mutual infoscore (NMI) and adjusted rand score (ARS) for variousvalues of the mixing parameter µ. The benchmarkgraphs used here were all 50 nodes each with anaverage degree of 5 and a maximum degree of 20.The scores for each value of µ are averaged over 50randomly generated benchmark graphs.

Fig. 3: The average of the normalized mutual infoscore (NMI) and adjusted rand score (ARS) for variousvalues of the mixing parameter µ. The benchmarkgraphs used here were all 100 nodes each with anaverage degree of 5 and a maximum degree of 20.The scores for each value of µ are averaged over 50randomly generated benchmark graphs.

5

Fig. 4: The average of the normalized mutual infoscore (NMI) and adjusted rand score (ARS) vs. averagevertex degree k. The benchmark graphs used hereall had mixing parameter µ = 0.1 and network sizen = 100. Each score is averaged over 50 randomlygenerated benchmark graphs.

Fig. 5: The average of the normalized mutual infoscore (NMI) and adjusted rand score (ARS) for variousnetwork sizes n. The benchmark graphs used here allhad mixing parameter µ = 0.1 and average connectiv-ity k = 10. Each score is averaged over 100 randomlygenerated benchmark graphs. The data shows that thespectral gap method tends to have comparable or bet-ter performance than other methods for a wide rangeof network sizes. However, performance worsens forlarger network sizes.

was the only algorithm to perform reasonably whenthe average vertex degree was small, and the mixingparameter was simultaneously low.

The scores of clustering algorithms for the LFRbenchmark on graphs with 50 and 100 nodes withvarying mixing number are shown in Figures 2 and3. From these figures, it is evident, and notably so,that SP-MGM consistently outperforms the other clus-tering algorithms considered for small networks overthe entire range of the mixing parameter considered.

For larger networks (not shown), SP-MGM per-forms comparably or better than the other algorithmsconsidered when the algorithms are scoring highenough. In other words, when any of the algorithmsperform well, SP-MGM performs well as well. Wenote here that when edges are distributed uniformlyacross the nodes, clusters are not well-defined, andas such, the computed clustering is likely not robust(and likely arbitrary) and therefore not meaningfulor useful. Notably, for small values of the mixingparameter on small graphs, the prediction-skill of SP-MGM is far superior.

3.3.2 Variations in Average Degree and Network SizeNext, we examined the performance of SP-MGM rela-tive to the other algorithms considered while varyingthe average vertex degree, and the network size. Fig-ure 4 shows the results when average vertex degreewas varied. As the average network degree increased,all algorithms considered showed an increase in score.However, SP-MGM consistently out-performed theother algorithms considered. Furthermore, SP-MGMwas the only algorithm to perform reasonably whenthe average vertex degree was small, and the mixingparameter was simultaneously low.

Figure 5 shows the results when the network sizewas varied. SP-MGM is seen to perform well overa range of network sizes, with only the agglomera-tive clustering algorithm out-performing it for largergraph sizes. We note, however, that the new algo-rithm tended to outperform its spectral counterpartSP-KMEAN over the full range of network sizes con-sidered.

3.3.3 The Number of Clusters SP-MGM DetectsFrom the description of the SP-MGM algorithm, itis seen that we need to specify only the number mof eigenmodes to consider. For this reason, a priorithe algorithm should be able to predict anywherefrom 2 and min{2m, n} clusters, where here n is thetotal number of vertices in the graph. However, inpractice the algorithm almost always produces m + 1clusters, at least for all the test graphs examined in thispaper. This result is quite surprising, since from anintuitive perspective one would suspect the numberof clusters produced to be far more volatile. It seems

6

like this is additional computational evidence that the“maximum gaps" we are computing must have a lot todo with the structure of the graphs themselves, closelytied to the clusters.

4 CLUSTERING OF SST DATA IN EL NIÑO 3.4In this section we consider the application of the newclustering algorithm to a real world data set in thecontext of the climate system. Specifically we considerthe El Niño Southern Oscillation (ENSO) since it is oneof the most important modes of interannual variabilityof the climate system. It can be characterised in termsof the sea surface temperature anomaly occuring in aregion of the equatorial Pacific Ocean. The ENSO tem-perature anomaly roughly fluctuates between threephases: a warm El Niño phase, a cold La Niña phase,and a neutral phase. The phases of ENSO are corre-lated with changes in precipitation in various regionsaround the globe, with strong correlations in coastalPacific regions. As such, predictions of ENSO havehigh socio-economic value, Luckily, ENSO also hap-pens to be one of the few modes of interannual vari-ability that has a useful level of verified predictability.

4.1 Sea Surface Temperature DataWe use version 4 of the Extended ReconstructedSea Surface Temperature (ERSST) dataset availableat https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v4.html. The data has a spatialresolution of 2 degree, and while the monthly-averaged data that we use are available from 1854-present, we focus attention on data over the years1955-2016, given lower uncertainties over this morerecent (instrumented) period. For further descriptionof the data, the reader is referred to https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v4.

4.2 El Niño EventsThe average sea surface temperature is calculatedusing a 30-year average of the NOAA ERSST datastarting Jan 1, 1950. SST anomaly is calulated relativeto this mean [27].

We consider the data from ERSSTv4 in the Niño 3.4region (the region between 170W and 120W longitudeand between -5N and 5N latitude), as shown in Figure6. El Niño events are classified by a persistent largeaverage temperature anomaly in this region. Quanti-tatively, El Niño events are determined by the OceanicNiño Index (ONI), which is a 3-month running meanof the SST anomaly spatially averaged over the Niño3.4 region. A Niño event is characterized by ONIvalues greater than 0.5 degrees C for at least threemonths in a row. The years of El Niño events between1950 and 2000 may be seen from the graph in Figure(7) below.

4.3 Clustering Results

While ONI is a conventional and simple measurethat is used to classify the state of ENSO, the phe-nomenon itself is more complicated. That is, althoughthe ENSO phenomenon can be characterized as a low-dimensional dynamical phenomenon occuring in aninfinite-dimensional dynamical system, it is unlikelyto be fully or adequately characterized by just ONI. Assuch, we conduct a clustering analysis of the spatially-extended sea surface temperature anomaly (SSTA)dataset T(x, y, t) where t runs from years 1955 through2016 and x and y correspond to the Niño 3.4 regiondescribed above. The data vector for each year itselfis a 3-month average starting from November of theprevious year.

In order to conduct cluster analysis, we next trans-form the SSTA dataset into a graph whose verticesare the yearly SSTA datapoints embedded in a high-dimensional space, the number of dimensions cor-responding to the number of distinct spatial loca-tions in the Niño 3.4 region. Next we consider anedge structure that reflects the mutual similarity ofpairs of vertices. The resultant similarity matrix hasthe form Aij = exp(−dij/σ) for i 6= j, wheredij = ~T (x, y, i)TΣ(x, y)−1 ~T (x, y, j), and where Σ thespatially-extended covariance matrix for the periodconsidered. Note that Σ(x, y)−1 is also sometimescalled the precision matrix. For simplicity, we considera diagonal form for Σ(x, y) that comprises the vari-ances of the SSTA at each of the spatial locations inthe Niño 3.4 region. The parameter σ toggles controlslocalization of the similarity matrix; we use σ = 10−2.

In Figure 8, the nodes are colored-coded based oncluster-membership and the nodes are placed at thetwo-dimensional point corresponding to the year (x-axis) and ONI of that year (y-axis). Finally, the thick-ness of the edges reflect similarity between the pairof vertices they connect. In this figure, it is seen thatthere is good correlation between the clusters and theconventional ONI (y-axis). Thus, the new clusteringalgorithm SP-MGM is seen to successfully distinguishbetween El Niño , La Niña , and neutral events.

Next, it is seen that the SP-MGM clustering al-gorithm identifies 7 distinct clusters and that clustermembership is not solely determined by ONI. Indeed,that there are different flavors of ENSO besides ElNiño and La Niña is now widely recognized and howthese different flavors influence climate in differentregions is an active area of research. The cluster pat-terns we find are consistent, e.g., with the patternsfound in [10] who use a neural-network based analysisin conjunction with other statistical distinguishabilitytests. We note, however, that there are significant dif-ferences in the data used by [10] in that they considera six month average starting in September and a much

7

https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v4.html

https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v4.html

https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v4



Fig. 6: Monthly averaged sea surface temperature on Jan 1, 2017. The region surrounded by the dashed line isthe Niño 3.4 region, which lies in 170W-120W, 5S-5N.

more extensive region of the Pacific.

5 CONCLUSIONS

In this paper, we presented a new spectral clusteringalgorithm which we call the Spectral-Maximum GapMethod SP-MGM. This algorithm is based on identi-fying gaps in the structure of eigenvectors. We thenwent on to provide a mathematical justification for thealgorithm. While it would be interesting to see wherethis algorithm fits in a systematic comparison of spec-tral algorithms, such as in [30], such a comparison isoutside the scope of the current paper. Further, as withother spectral schemes, this method can be appliedto large datasets in conjunction with a hierarchicalapproach as in [15, 28, 24, 21, and others].

Next, we examined the performance of this algo-rithm in comparison to a few other popular algorithmsusing the LFR benchmark graphs. Results showed thatthe SP-MGM algorithm performs either comparbly orbetter than its counterparts in a variety of parameterregimes. We also demonstrated how SP-MGM may be

used to automatically detect an appropriate number ofclusters for a specified graph. Finally, we applied thisalgorithm to analyze and identify a variety of flavorsof the El Niño Southern Oscillation using spatiallyextended sea surface temperature data.

ACKNOWLEDGEMENT

This work was performed during one authors visit toCNLS at Los Alamos National Lab.

APPENDIX

Eigenvector PerturbationThe eigenvalue perturbation problem is the problemof estimating the eigenvectors and eigenvalues of aperturbed matrix T = T0 + εT1 based on the eigenval-ues of the matrix T0, where here ε is a small positivenumber which estimates the size of the perturbation.In such a case one should rightfully assume that theeigenvalues and eigenvectors of T and T0 are nearlythe same. In this section, we briefly recount the idea

8

Fig. 7: Values of the Oceanic Niño Index (ONI) between 1950 and 2000. Years where an El Niño event occurredare marked. Since impacts of El Niño are largest in the (northern-hemisphere), winter some of the events spanthe calendar year boundary.

of eigenvector perturbation analysis, deriving in par-ticular the linear perturbation equations for the eigen-data of a perturbed matrix. This equation specificallyrelates the eigenvectors of T0 and T = T0 + εT1 forε sufficiently small. Our exposition is loosely basedon that found in the standard source [5]. However,standard perturbation analysis formulas seems to em-phasize the reqirement that the eigenvalues of T0 aredistinct, whereas our brief derivation shows that thisis not the case, as long as the formulas are appropri-ately adjusted. This is important for our application tograph theory, where we wish T0 to be the Laplacian ofa disconnected graph.

Assume that T0, T1 are symmetric n × n matricesand let ~u1, . . . , ~un be an orthonormal eigenbasis forT1, with T1~ui = λi~ui. Then the linear perturbationequation says that for all i there exists an eigenvector

~vi of T satisfying

~vi = ~ui +∑

j:λj 6=λi

ε〈~uj , T1~ui〉λj − λi

~ui +O(ε2). (1)

Furthermore the eigenvalue λ′i of ~vi is given by

λ′i = λi + ε〈~uj , T1~ui〉+O(ε2). (2)

where here the braces 〈·, ·〉 denote the usual innerproduct on Rn.

To prove the above two equations, we can use astandard but important linear algebra tool: the resol-vent of a matrix. Given any matrix L, the resolventR(L; z) is a matrix-valued rational function on thecomplex plane, defined by R(L; z) = (Iz−L)−1. Oneuseful property of the resolvent is that acts like a pro-jection operator onto the eigenspaces of L. Specifically,if λ is an eigenvalue of R(L; z) and C is a closed

9

Fig. 8: Results of clustering NDJ SST states in the Niño 3.4 region. Nodes are colored based on the cluster towhich they belong. Years with similar ONI tend to be clustered together.

contour in the complex plane, then for any constantvector ~v ∈ Rn, the integral

1

2πi

∮CR(L; z)~vdz

is an eigenvector of L with eigenvalue λ, as long as itis nonzero.

Now since T = T0 + εT1, a simple calculationshows

R(T ; z) = (I −R(T0; z)T1)−1R(T0; z)

=∞∑m=0

εm(R(T0; z)T1)mR(T0; z).

For each i, let λ′i be an eigenvalue of T closest to λi.Take Ci to be a closed contour in the complex planecontaining λi and λ′i, but no other (distinct) eigenvalueof T or T0. Then we know that

~vi :=1

2πi

∮Ci

R(T, z)~uidz

will be an eigenvector with eigenvalue λ′i. However,using our expression for R(T, z) we see that

R(T, z)~ui =∞∑m=0

εm(R(T0; z)T1)m1

z − λi~ui

=1

z − λi

~ui +n∑j=i

ε〈~uj , T~ui〉(z − λj)

~uj

+O(ε2).

Therefore by Cauchy’s residue theorem

~vi = ~ui +∑

j:λj 6=λi

ε〈~uj , T1~ui〉λj − λi

~ui +O(ε2).

This proves Equation 1. To get the associated eigen-value λ′i, we can use the fact that

λ′i〈~ui, ~vi〉 = 〈~ui, T~vi〉.From our expression for ~vi, we have 〈~ui, ~vi〉 = 1 +O(ε2) and 〈~ui, T~vi〉 = λi + ε〈~ui, T1~ui〉+O(ε2). Thus

λ′i = λi + ε〈~uj , T1~ui〉+O(ε2).

This proves Equation 2.

10

Mathematical Justification for SP-MGMLet G be a graph with n vertices and let C1, . . . , Ck bethe natural subclusters of G. To show mathematicallyhow SP-MGM detects the clusters C1, . . . , Ck fromthe graph G, we will consider two subgraphs of G,which we will call G0 and G1, and which will beformed by deleting the intercluster bonds and intra-cluster bonds of G, respectively. Also to simplify thenotation, throughout this section we will think aboutthe Laplacian of G as acting not on Rn, but ratheron the space of real-valued functions on G. For thisreason, we will switch from talking about eigenvectorsto talking about eigenfunctions. We will denote theLaplacians of G, G0 and G1 as L, L0 and L1. ClearlyL = L0 + L1.

To better understand why SP-MGM should workwell, we need to consider in more detail the eigen-structure of the Laplacian L0 of G0. The eigenfunc-tions of G0 clearly restrict to each cluster Ci to aneigenfunction on Ci. Each cluster Ci is connected,and therefore has a kernel spanned by its indicatorfunction. Thus an orthonormal basis for the kernel ofL0 is given by

fi(x) =

{n−1/2i , x ∈ Ci0 x /∈ Ci

for i = 1, . . . , k. More generally, let λi,1 ≤ · · · ≤ λi,ni

be the eigenvalues of Ci. Then the λij are also eigen-values of L0 and there exists an orthonormal eigenba-sis for L0 such that fi,j(x) has eigenvalue λi,j and issupported on Ci. By definition, fi,1(x) = fi(x).

Assuming that the clusters are really, honestly clus-ters and not just part of some artificial partition of G,we should expect the spectral radius ρ1 of L1 to besmall compared to the spectral radius ρ0 of L0. Thenthe eigendata of L and L0 will be related via the eigen-vector perturbation theory discussed above. Each ofthe fi(x) will correspond to an eigenfunction f ′i(x)of the matrix L via this perturbation theory, and theassociated eigenvalues should be expected to be thesmallest k eigenvalues of the matrix L. Moreover, thiseigenvalue perturbation theory gives us an estimate offi(x)′ relative to fi(x). If we can show that

‖fi(x)− f ′i(x)‖∞ <1

4n1/2i

,

then the location of the maximum gap in fi(x) andthe location of the maximum gap in fi(x) must be thesame! In this case, SP-MGM is guaranteed to identify apartition which correctly separates one of the clustersfrom the others.

From our eigenvalue perturbation theory, we knowthat for m = 1, . . . , k

f ′m(x) = fm(x) +k∑i=1

ni∑j=2

〈fi,j(x), L1fm1(x)〉λi,j

fi,j(x)

to order O(ε2), where ε = ρ1/ρ0. Since fij is sup-ported on Ci and L1 sends fm to a function supportedon the complement of Cm, we find that to order (ε2)

‖f ′m(x)− fm(x)‖ ≤ maxi 6=m

(ni − 1)ρ1λi,2

.

Thus we arrive at the conclusion that if ε = ρ1/ρ0 issmall and for all m we have

maxi 6=m

(ni − 1)ρ1λi,2

<1

4n1/2m

.

Then SP-MGM will correctly identify the desired clus-ters.

BIBLIOGRAPHY

[1] Hélio Almeida, Dorgival Guedes, Wagner Meira,and Mohammed J Zaki. Is there a best qualitymetric for graph clusters? In Joint European Con-ference on Machine Learning and Knowledge Discov-ery in Databases, pages 44–59. Springer, 2011.

[2] Thomas Bühler and Matthias Hein. Spectral clus-tering based on the graph p-laplacian. In Proceed-ings of the 26th Annual International Conference onMachine Learning, pages 81–88. ACM, 2009.

[3] João Corte-Real, Budong Qian, and Hong Xu.Regional climate change in portugal: precipita-tion variability associated with large-scale atmo-spheric circulation. International Journal of Clima-tology, 18(6):619–635, 1998.

[4] Sergio Decherchi, Simone Tacconi, Judith Redi,Alessio Leoncini, Fabio Sangiacomo, and RodolfoZunino. Text clustering for digital forensics anal-ysis. Computational Intelligence in Security for In-formation Systems, pages 29–36, 2009.

[5] Demmel. Applied numerical linear algebra. 1997.[6] Miroslav Fiedler. A property of eigenvectors of

nonnegative symmetric matrices and its applica-tion to graph theory. Czechoslovak MathematicalJournal, 25(4):619–633, 1975.

[7] Sally Galbraith, James A Daniel, and Bryce Vissel.A study of clustered data and approaches to itsanalysis. Journal of Neuroscience, 30(32):10601–10608, 2010.

[8] Robert M Haralick and Linda G Shapiro. Imagesegmentation techniques. Computer vision, graph-ics, and image processing, 29(1):100–132, 1985.

[9] Anil K Jain and Richard C Dubes. Algorithms forclustering data. Prentice-Hall, Inc., 1988.

[10] Nathaniel C Johnson. How many enso flavors canwe distinguish? Journal of Climate, 26(13):4816–4827, 2013.

[11] Alexandros Karatzoglou, Alex Smola, KurtHornik, and Achim Zeileis. kernlab – an S4 pack-age for kernel methods in R. Journal of StatisticalSoftware, 11(9):1–20, 2004.

11

[12] Trupti M Kodinariya and Prashant R Makwana.Review on determining number of cluster in k-means clustering. International Journal, 1(6):90–95,2013.

[13] Andrea Lancichinetti, Santo Fortunato, and Fil-ippo Radicchi. Benchmark graphs for testingcommunity detection algorithms. Physical reviewE, 78(4):046110, 2008.

[14] Robert Lund and Bo Li. Revisiting climate re-gion definitions via clustering. Journal of Climate,22(7):1787–1800, 2009.

[15] Ulrike V Luxburg, Olivier Bousquet, and MikhailBelkin. Limits of spectral clustering. In Advancesin neural information processing systems, pages 857–864, 2005.

[16] Marina Meila and Jianbo Shi. Learning segmen-tation by random walks. In Advances in neural in-formation processing systems, pages 873–879, 2001.

[17] Nina Mishra, Robert Schreiber, Isabelle Stanton,and Robert E Tarjan. Clustering social networks.In WAW, volume 4863, pages 56–67. Springer,2007.

[18] Maria CV Nascimento and Andre CPLF De Car-valho. Spectral methods for graph clustering–asurvey. European Journal of Operational Research,211(2):221–231, 2011.

[19] Andrew Y Ng, Michael I Jordan, and Yair Weiss.On spectral clustering: Analysis and an algo-rithm. In Advances in neural information processingsystems, pages 849–856, 2002.

[20] F. Pedregosa, G. Varoquaux, A. Gramfort,V. Michel, B. Thirion, O. Grisel, M. Blondel,P. Prettenhofer, R. Weiss, V. Dubourg, J. Van-derplas, A. Passos, D. Cournapeau, M. Brucher,M. Perrot, and E. Duchesnay. Scikit-learn: Ma-chine learning in Python. Journal of MachineLearning Research, 12:2825–2830, 2011.

[21] Diego H Peluffo-Ordónez, John A Lee, andMichel Verleysen. Short review of dimensionalityreduction methods based on stochastic neighbourembedding. In Advances in Self-Organizing Mapsand Learning Vector Quantization, pages 65–74.Springer, 2014.

[22] Lior Rokach and Oded Maimon. The data miningand knowledge discovery handbook: A completeguide for researchers and practitioners, 2005.

[23] Satu Elisa Schaeffer. Graph clustering. Computerscience review, 1(1):27–64, 2007.

[24] Theodoros Semertzidis, Dimitrios Rafailidis,Michael G Strintzis, and Petros Daras. Large-scalespectral clustering based on pairwise constraints.Information Processing & Management, 51(5):616–624, 2015.

[25] Jianbo Shi and Jitendra Malik. Normalized cutsand image segmentation. IEEE Transactions onpattern analysis and machine intelligence, 22(8):888–

905, 2000.[26] Karsten Steinhaeuser, Nitesh Chawla, and Au-

roop Ganguly. Comparing predictive power inclimate data: Clustering matters. Advances inSpatial and Temporal Databases, pages 39–55, 2011.

[27] Kevin E Trenberth. The definition of el nino.Bulletin of the American Meteorological Society,78(12):2771–2777, 1997.

[28] Frederick Tung, Alexander Wong, and David AClausi. Enabling scalable spectral clusteringfor image segmentation. Pattern Recognition,43(12):4069–4076, 2010.

[29] Yurdanur Unal, Tayfun Kindap, and MehmetKaraca. Redefining the climate zones of turkeyusing cluster analysis. International journal ofclimatology, 23(9):1045–1055, 2003.

[30] Deepak Verma and Marina Meila. A comparisonof spectral clustering algorithms. University ofWashington Tech Rep UWCSE030501, 1:1–18, 2003.

[31] Ulrike Von Luxburg. A tutorial on spectral clus-tering. Statistics and computing, 17(4):395–416,2007.

W.R. Casper received BS degrees inmathematics and physics, as well as anMS in mathematics from North DakotaState University in 2010. He later receiveda PhD in mathematics from the Univer-sity of Washington, Seattle in 2017. He iscurrently a postdoctoral researcher in theDepartment of Mathematics at LouisianaState University in Baton Rouge. His re-search interests are diverse, and includeintegrable systems, algebraic geometry,

noncommutative algebra, geophysical fluid dynamics, computa-tional physics and machine learning.

Balu Nadiga is a scientist at the LosAlamos National Lab. With a specializa-tion in fluid dynamics, he has workedon problems ranging from climate andocean circulation to turbulence model-ing and multiphase flows. His other inter-ests include dynamical systems, uncer-tainty quantification and Bayesian analy-sis, and using machine learning for turbu-lence modeling.

12

1 a new spectral clustering algorithm - arxivclustering methods, the reader is referred to [18]; for...

Documents