

odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks

Ilie Sarpe
Department of Information Engineering, University of Padova, Padova, Italy
[email protected]

Fabio Vandin
Department of Information Engineering, University of Padova, Padova, Italy
[email protected]

ABSTRACT

Counting the number of occurrences of small connected subgraphs, called temporal motifs, has become a fundamental primitive for the analysis of temporal networks, whose edges are annotated with the time of the event they represent. One of the main complications in studying temporal motifs is the large number of motifs that can be built even with a limited number of vertices or edges. As a consequence, since in many applications motifs are employed for exploratory analyses, the user needs to iteratively select and analyze several motifs that represent different aspects of the network, resulting in an inefficient, time-consuming process. This problem is exacerbated in large networks, where the analysis of even a single motif is computationally demanding. As a solution, in this work we propose and study the problem of simultaneously counting the number of occurrences of multiple temporal motifs, all corresponding to the same (static) topology (e.g., a triangle). Given that for large temporal networks computing the exact counts is unfeasible, we propose odeN, a sampling-based algorithm that provides an accurate approximation of all the counts of the motifs. We provide analytical bounds on the number of samples required by odeN to compute rigorous, probabilistic, relative approximations. Our extensive experimental evaluation shows that odeN enables the approximation of the counts of motifs in temporal networks in a fraction of the time needed by state-of-the-art methods, and that it also reports more accurate approximations than such methods.

CCS CONCEPTS

• Mathematics of computing → Probabilistic algorithms; • Theory of computation → Graph algorithms analysis.

KEYWORDS

temporal motifs, sampling algorithm, temporal networks, randomized algorithm

ACM Reference Format:

Ilie Sarpe and Fabio Vandin. 2021. odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM ’21), November 1–5, 2021, Virtual Event, QLD, Australia. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3459637.3482459

CIKM ’21, November 1–5, 2021, Virtual Event, QLD, Australia. © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM ’21), November 1–5, 2021, Virtual Event, QLD, Australia, https://doi.org/10.1145/3459637.3482459.

1 INTRODUCTION

Networks are ubiquitous representations that model a wide range of real-world systems, such as social networks [9], citation networks [10], biological systems [12], and many others [32]. One of the most fundamental primitives in network analysis is the mining of motifs [30, 31, 42] (or graphlets [7, 36]), which requires counting the occurrences of small connected subgraphs of $k$ nodes. Motifs represent key building blocks of networks, and they provide useful insights in a wide range of applications such as network classification [29, 43], network clustering [3], and community detection [2].

Modern networks contain rich information about their edges or nodes [8, 20, 39, 50] in addition to graph structure. One of the most important pieces of information is the time at which the interactions, represented by edges, occur. Networks for which such information is available are called temporal [15, 16]; novel insights about the underlying dynamics of the systems can be uncovered by the analysis of such networks [22–24]. In recent years, many primitives [17, 21, 34, 41] have been proposed as counterparts, in temporal networks, to the study of subgraph patterns for nontemporal networks, with each primitive capturing different temporal aspects of a network. One of the most important such primitives is the study of temporal motifs [34]. Temporal motifs are small connected subgraphs with $k$ nodes and $\ell$ edges occurring with a prescribed order within a time interval of duration $\delta$. Temporal motifs describe the patterns shaping interactions over the network, e.g., networks from similar domains tend to have similar temporal motif counts [34], and their analysis is useful in many applications, e.g., anomaly detection [4], network classification [45], and social networks [6].

The temporal dimension poses several challenges in the analysis of motifs. A major challenge is the large number of temporal motifs that can be built even with a limited number of vertices and edges. For example, even considering directed (and connected) temporal motifs with only 3 vertices and 3 edges, there are 32 such motifs. In several domains, when motifs are studied in the exploratory analysis of a temporal network, it is almost impossible for the data analyst to know a priori which motif is the most interesting and useful. In social networks, a set of 3 vertices represents the smallest nontrivial community, and different temporal motifs with 3 vertices describe different patterns of interactions in such a community. Hence, studying all such motifs can provide novel insights on the interactions within such communities. In network classification, considering the counts of all the 32 motifs with 3 vertices and 3 edges leads to models with improved accuracy [45].

However, since state-of-the-art approaches for general temporal motifs only allow the analysis of one motif at a time, the user needs to iteratively select and analyze the various motifs, resulting in an inefficient and time-consuming process, in particular for large networks.

arXiv:2108.08734v1 [cs.SI] 19 Aug 2021

In this paper, we define and study the problem of simultaneously counting the occurrences of various temporal motifs. In particular, we consider all motifs corresponding to the same static target template (e.g., all triangles; see Fig. 1a). This problem is extremely challenging, since computing the count of even a single temporal motif is NP-Hard in general [26], with existing state-of-the-art approaches having complexity exponential in the number of edges of the motif to obtain even a single motif’s count [26, 40, 47].

The task of counting temporal motifs is hindered by the sheer size of modern datasets and, therefore, scalable techniques are needed to deal with such amounts of data. Since exact approaches [13, 27, 34] are impractical, rigorous and efficient approximation algorithms providing tight guarantees are needed. In this work we develop odeN, a sampling algorithm that provides a high quality approximation for the problem of counting multiple temporal motifs with the same static topology. Our main contributions are as follows:

• We propose the motif template counting problem, where, given a temporal network, a $k$-node target template graph $H$, the number $\ell$ of edges of each temporal motif, and a bound $\delta$ on the duration of the temporal motifs, the task is to output all the counts of the temporal motifs whose static topology corresponds to $H$ and that have exactly $\ell$ temporal edges, occurring within $\delta$-time.

• We propose odeN, a randomized sampling algorithm providing a high quality approximation for the motif template counting problem. odeN’s approach is to sample a set of motif occurrences, ensuring that they all share the same static topology $H$. Thus, odeN takes advantage of the constraint that all motifs must share a common target template $H$, aggregating the computation of all motif counts in a sample. odeN’s approximation, as in other data mining applications, is controlled by two parameters $\varepsilon, \eta$, which control respectively the quality and the confidence of the approximations.

• We show a tight and efficiently computable bound on the number of samples required by odeN for the approximation to be within $\varepsilon$ error with confidence $> 1 - \eta$ for all temporal motifs’ counts.

• We perform large scale experiments using datasets with up to billions of temporal edges, showing that odeN requires a fraction of the time required by state-of-the-art approximation algorithms for single motif counts, and that it reports sharper estimates. We then provide a parallel implementation of odeN displaying almost linear speedup in many configurations. We also show how odeN provides novel insights on the dynamics of a real-world temporal network.

2 PRELIMINARIES

In this section we introduce the basic notions that we will use throughout the work, and we define the computational problem of counting multiple temporal motifs sharing a common target template graph. We start by defining temporal networks.

Definition 2.1. A temporal network is a pair $T = (V, E)$ where $V = \{v_1, \dots, v_n\}$ and $E = \{(x, y, t) : x, y \in V, x \neq y, t \in \mathbb{R}^+\}$ with $|V| = n$ and $|E| = m$.

[Figure 1, panels (a)–(c): (a) a target template $H$ and a temporal network $T$ with timestamped edges; (b) a temporal motif on nodes $v_1, v_2, v_3$ with ordering $\sigma = \langle (v_1, v_2), (v_3, v_1), (v_2, v_3) \rangle$ and timestamps $t_1, t_2, t_3$; (c) four edge sequences among nodes {2, 5, 6}.]

Figure 1: (1a): Motif template counting problem overview: given a temporal network and a (static) target template, compute the counts of all temporal motifs that map on the template. (1b): Temporal motif, with $k = 3$, $\ell = 3$, and its ordering $\sigma$. (1c): Sequences of edges of the network in (1a) among nodes {2, 5, 6} that map topologically on the motif in (1b). For $\delta = 15$ only the green sequence is a $\delta$-instance of the motif, since the timestamps respect $\sigma$ and $t'_\ell - t'_1 = 20 - 6 \leq \delta$. The red sequences are not $\delta$-instances, since they do not respect such constraint or do not respect the ordering $\sigma$.

Given $(x, y, t) \in E$, we say that $t$ is the timestamp of the directed edge $(x, y)$. Given a temporal network $T$, by ignoring the timestamps of its edges we obtain the associated undirected projected static network, defined as follows.

Definition 2.2. The undirected projected static network of a temporal network $T = (V, E)$ is the pair $G_T = (V, E_T)$ that is an undirected network, such that $E_T = \{\{x, y\} : (x, y, t) \in E\}$.
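As a minimal sketch of Definition 2.2, assuming a temporal network stored as a plain Python list of (x, y, t) tuples (the function name and the toy edge list below are illustrative and do not come from the paper), the projected static edge set can be built by dropping both direction and timestamp:

```python
def projected_static_edges(temporal_edges):
    """Return E_T = {{x, y} : (x, y, t) in E}.

    Each temporal edge (x, y, t) is mapped to the unordered pair {x, y},
    so parallel edges and opposite directions collapse to one static edge.
    """
    return {frozenset((x, y)) for x, y, _t in temporal_edges}


# Hypothetical toy network: three temporal edges between nodes 1 and 2
# (in both directions) and one between nodes 2 and 3.
edges = [(1, 2, 3.0), (2, 1, 8.0), (1, 2, 11.0), (2, 3, 5.0)]
static = projected_static_edges(edges)
print(sorted(sorted(e) for e in static))
```

Using `frozenset` for each static edge is one convenient choice here, since it makes {x, y} and {y, x} compare equal without extra normalization.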

We will often use the term static network to denote a network whose edges are without timestamps. Next we introduce the definition of temporal motifs as defined by Paranjape et al. [34], which are small, connected subgraphs representing patterns of interest.

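Definition 2.3. A $k$-node $\ell$-edge temporal motif $M$ is a pair $M = (\mathcal{K}, \sigma)$ where $\mathcal{K} = (V_\mathcal{K}, E_\mathcal{K})$ is a directed and weakly connected multigraph where $V_\mathcal{K} = \{v_1, \dots, v_k\}$, $E_\mathcal{K} = \{(x, y) : x, y \in V_\mathcal{K}, x \neq y\}$ s.t. $|V_\mathcal{K}| = k$ and $|E_\mathcal{K}| = \ell$, and $\sigma$ is an ordering of $E_\mathcal{K}$.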

Note that a $k$-node $\ell$-edge temporal motif $M = (\mathcal{K}, \sigma)$ is also identified by the sequence $\langle (x_1, y_1), \dots, (x_\ell, y_\ell) \rangle$ of edges ordered according to $\sigma$; we will often use such representation for a motif $M$ (see Fig. (1b) for an example). Given a $k$-node $\ell$-edge temporal motif $M$, the values of $k$ and $\ell$ are determined by $V_\mathcal{K}$ and $E_\mathcal{K}$. We will therefore use the term temporal motif, or simply motif, when $k$ and $\ell$ are clear from context. Given a temporal motif $M = ((V_\mathcal{K}, E_\mathcal{K}), \sigma)$, we denote with $G_u[M]$ the undirected graph corresponding to the underlying undirected graph structure of the multigraph $\mathcal{K}$ of $M$, that is $G_u[M] = (V_\mathcal{K}, E^u_M)$ where $E^u_M = \{\{x, y\} : (x, y) \vee (y, x) \in E_\mathcal{K}\}$ (i.e., $E^u_M$ is the set of undirected edges associated to the multiset $E_\mathcal{K}$). Notice that directed edges of the form $(x, y), (y, x)$ as well as multiple directed edges $(x, y), (x, y), \dots$ from $E_\mathcal{K}$ are represented by the same undirected edge $\{x, y\}$ in $E^u_M$.

For a fixed temporal motif $M$, we are interested in identifying its realizations in $T$ appearing within at most $\delta$-time duration, as captured by the following definition.

Definition 2.4. Given a temporal network $T = (V, E)$ and $\delta \in \mathbb{R}^+$, a time ordered sequence $S = \langle (x'_1, y'_1, t'_1), \dots, (x'_\ell, y'_\ell, t'_\ell) \rangle$ of $\ell$ unique temporal edges from $T$ is a $\delta$-instance of the temporal motif $M = \langle (x_1, y_1), \dots, (x_\ell, y_\ell) \rangle$ if:

(1) there exists a bijection $f$ on the vertices such that $f(x'_i) = x_i$ and $f(y'_i) = y_i$, $i = 1, \dots, \ell$; and

(2) the edges of $S$ occur within $\delta$ time, i.e., $t'_\ell - t'_1 \leq \delta$.
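The two conditions of Definition 2.4 can be checked directly on a candidate sequence. The sketch below is illustrative only: the function name is hypothetical, timestamps are assumed to be strictly increasing (ties are rejected), and the example reuses the timestamps 6, 9, 20 of Figure 1c with edge directions chosen for illustration rather than read from the figure.

```python
def is_delta_instance(seq, motif, delta):
    """Check whether seq (list of (x, y, t)) is a delta-instance of motif
    (list of (x, y) pairs given in the order sigma)."""
    if not motif or len(seq) != len(motif) or len(set(seq)) != len(seq):
        return False
    times = [t for _, _, t in seq]
    # Assumption: a "time ordered" sequence has strictly increasing timestamps.
    if any(a >= b for a, b in zip(times, times[1:])):
        return False
    # Condition (2): all edges occur within delta time.
    if times[-1] - times[0] > delta:
        return False
    # Condition (1): build the vertex map f edge by edge, rejecting any
    # conflict (f not a function) or collision (f not injective).
    f, used = {}, set()
    for (xs, ys, _), (xm, ym) in zip(seq, motif):
        for a, b in ((xs, xm), (ys, ym)):
            if a in f:
                if f[a] != b:
                    return False
            else:
                if b in used:
                    return False
                f[a] = b
                used.add(b)
    return True


motif = [("v1", "v2"), ("v3", "v1"), ("v2", "v3")]   # ordering of Fig. 1b
good = [(2, 5, 6), (6, 2, 9), (5, 6, 20)]            # hypothetical directions
print(is_delta_instance(good, motif, 15))  # True: 20 - 6 = 14 <= 15
print(is_delta_instance(good, motif, 10))  # False: window too short
```

Because every motif vertex appears in at least one edge, an injective map built this way is automatically onto the motif's vertex set, so no separate surjectivity check is needed.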

Exploring different values of $\delta$ in the above definition often leads to different insights on the temporal network that may be discovered through the analysis of the motifs [1, 15, 21, 33]. Note that in a $\delta$-instance of the temporal motif $M = (\mathcal{K}, \sigma)$ the edge timestamps must be sorted according to the ordering $\sigma$ (see Fig. (1c) for an example). In fact, $\sigma$ plays a key role in defining a temporal motif, with different orderings of the same multigraph $\mathcal{K}$ reflecting diverse dynamic properties captured by the motif.

For a given directed multigraph $\mathcal{K}$ with $|E_\mathcal{K}| = \ell$ edges, in general not all the $\ell!$ orderings of its edges define distinct temporal motifs. We therefore introduce the following equivalence relation.

Definition 2.5. Let $M_1$ and $M_2$ be two temporal motifs. Let $M_1 = \langle (x^1_1, y^1_1), \dots, (x^1_\ell, y^1_\ell) \rangle$ and $M_2 = \langle (x^2_1, y^2_1), \dots, (x^2_\ell, y^2_\ell) \rangle$ be the sequences of edges of $M_1$ and $M_2$, respectively. We say that $M_1$ and $M_2$ are not distinct (denoted with $M_1 \cong_\tau M_2$) if there exists a bijection $g$ on the vertices such that $g(x^1_i) = x^2_i$ and $g(y^1_i) = y^2_i$, $i = 1, \dots, \ell$.
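Definition 2.5 can be tested by trying to build the bijection $g$ position by position. The following is a small illustrative sketch (the function name is hypothetical); the first pair of sequences is the one given in the caption of Figure 2, while the third sequence is a made-up reordering used only to show a negative case.

```python
def not_distinct(m1, m2):
    """Return True if the edge sequences m1 and m2 (lists of (x, y) pairs)
    are related by a vertex bijection g applied position by position."""
    if len(m1) != len(m2):
        return False
    g, image = {}, set()
    for (x1, y1), (x2, y2) in zip(m1, m2):
        for a, b in ((x1, x2), (y1, y2)):
            # setdefault records g(a) = b the first time a is seen and
            # returns the earlier value otherwise, exposing conflicts.
            if g.setdefault(a, b) != b:
                return False
            image.add(b)
    # g is injective exactly when no two vertices share an image.
    return len(image) == len(g)


m1 = [("y", "x"), ("y", "z"), ("x", "z")]
m2 = [("x'", "z'"), ("x'", "y'"), ("z'", "y'")]
m3 = [("y", "x"), ("x", "z"), ("y", "z")]  # hypothetical reordering of m1
print(not_distinct(m1, m2))  # True, via g(x) = z', g(y) = x', g(z) = y'
print(not_distinct(m1, m3))  # False
```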

We provide an example of the definition above in Figure 2.

Given two networks (undirected or temporal) $G, G'$ we say that $G' = (V', E')$ is a subgraph of $G = (V, E)$ (denoted with $G' \subseteq G$) if $V' \subseteq V$ and $E' \subseteq E$. Note that we require a subgraph to be edge induced. To conclude the preliminary notions, we recall the definition of static graph isomorphism.

Definition 2.6. Given two graphs $G = (V_G, E_G)$ and $H = (V_H, E_H)$ we say that the two graphs are isomorphic, denoted with $G \simeq H$, if and only if there exists a bijection $f : V_G \mapsto V_H$ on the vertices such that $e = (u, v) \in E_G \Leftrightarrow e' = (f(u), f(v)) \in E_H$.
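For the small templates considered here (a handful of nodes), Definition 2.6 can be checked by brute force over all vertex bijections. This is only a sketch for tiny simple undirected graphs, with a hypothetical function name; it is not meant as a practical isomorphism test for large graphs.

```python
from itertools import permutations


def is_isomorphic(vg, eg, vh, eh):
    """Brute-force isomorphism test for small simple undirected graphs.

    vg, vh: iterables of vertices; eg, eh: iterables of 2-element edges.
    """
    vg, vh = list(vg), list(vh)
    eg = {frozenset(e) for e in eg}
    eh = {frozenset(e) for e in eh}
    if len(vg) != len(vh) or len(eg) != len(eh):
        return False
    for perm in permutations(vh):
        f = dict(zip(vg, perm))  # candidate bijection V_G -> V_H
        # f is injective, so the mapped edge set has |E_G| elements;
        # equality with E_H gives both directions of the "iff".
        if {frozenset((f[u], f[v])) for u, v in eg} == eh:
            return True
    return False


triangle = ([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
other_triangle = (["a", "b", "c"], [("b", "a"), ("c", "b"), ("a", "c")])
wedge = ([1, 2, 3], [(1, 2), (2, 3)])
print(is_isomorphic(*triangle, *other_triangle))  # True
print(is_isomorphic(*triangle, *wedge))           # False
```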

Let $\mathcal{U}(M, \delta) = \{I : I \text{ is a } \delta\text{-instance of } M\}$ be the set of (all) $\delta$-instances of the motif $M$ in $T$. The count of $M$ is $C_M(\delta) = |\mathcal{U}(M, \delta)|$, denoted with $C_M$ when $\delta$ is clear from the context.

Given a static undirected graph $H$, which we call the target template, we are interested in solving the problem of computing the number of $\delta$-instances of all temporal motifs with $\ell$ edges and all corresponding to the same static graph $H$. More formally, given the target template $H = (V_H, E_H)$, which is a simple and connected graph, and $\ell \in \mathbb{Z}^+$ with $\ell \geq |E_H|$, let $\mathcal{M}(H, \ell)$ be the set of distinct temporal motifs with $\ell$ edges whose underlying undirected graph structure corresponds to $H$, that is $\mathcal{M}(H, \ell)$ contains motifs $M_i = ((V^i_\mathcal{K}, E^i_\mathcal{K}), \sigma_i)$, $i = 1, 2, \dots$, such that i) $G_u[M_i] \simeq H$; ii) $|E^i_\mathcal{K}| = \ell$; and iii) $M_i \ncong_\tau M_j$, $\forall j \neq i$.

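[Figure 2, two panels: (left) motifs $M_1$ on nodes $x, y, z$ and $M_2$ on nodes $x', y', z'$, with edges labeled $t_1, t_2, t_3$, related by $M_1 \cong_\tau M_2$; (right) motifs $M_1$ and $M_3$, related by $M_1 \ncong_\tau M_3$.]

Figure 2: (Left): The two motifs are not distinct: let $\sigma_1 = \langle (y, x), (y, z), (x, z) \rangle$ and $\sigma_2 = \langle (x', z'), (x', y'), (z', y') \rangle$ correspond to $M_1$ and $M_2$; then the function $f : V^1_\mathcal{K} \mapsto V^2_\mathcal{K}$ defined by $f(x) = z'$, $f(y) = x'$, $f(z) = y'$ preserves both the topology and the ordering as from Definition 2.5. (Right): The two motifs are distinct since there is no map $f : V^1_\mathcal{K} \mapsto V^3_\mathcal{K}$ preserving both the topology and ordering.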

Let us explain intuitively the constraints above. First, $H$ imposes a constraint on the undirected static topology the temporal motifs of interest (that are directed subgraphs) should have. That is, it requires all the motifs to have the same underlying graph structure ($G_u[M]$), which must be isomorphic to $H$. This is a useful way to represent multiple related temporal motifs. For example, in social network analysis by fixing $H$ as an undirected triangle we consider in $\mathcal{M}(H, \ell)$ all temporal motifs that characterize the communication between groups of three friends (i.e., each motif will represent a different form of communication among all such groups [34]). The second constraint requires each motif $M_i \in \mathcal{M}(H, \ell)$ to have exactly $\ell \geq |E_H|$ edges, with $\ell$ provided in input by the user. Fixing the parameter $\ell$ is motivated by the fact that motifs with different values of $\ell$ (even with the same target template structure $H$) reflect different patterns of interaction (e.g., a group of friends that exchanges $\ell = 3$ or $\ell = 4$ messages). As we will show empirically in Section 5.4, such counts vary significantly with $\ell$ for fixed $H$ and $\delta$. Finally, the third constraint ensures that we only count distinct motifs, i.e., motifs representing different patterns.

We now define the motif template counting problem.

Problem 1. Motif template counting problem. Given a temporal network $T$, a static undirected target graph $H = (V_H, E_H)$, $\ell \in \mathbb{Z}^+$, $\ell \geq |E_H|$, and a parameter $\delta \in \mathbb{R}^+$, find the counts $C_{M_i}(\delta)$ of motifs $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$ in $T$.

We now provide an example of the different motifs to be counted for different values of $\ell$ with a fixed target template $H$.

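Example 2.7. Let $H = (\{v_1, v_2\}, \{\{v_1, v_2\}\})$, that is, the target template is an edge. Let $e_1 = (v_1, v_2)$ and $e_2 = (v_2, v_1)$. By varying $\ell \in \{2, 3\}$ the motifs in $\mathcal{M}(H, \ell)$, for which we want to compute the counts, are: $M_1 = \langle e_1, e_1 \rangle$ and $M_2 = \langle e_1, e_2 \rangle$ for $\ell = 2$ (i.e., $|\mathcal{M}(H, 2)| = 2$) while $M_1 = \langle e_1, e_1, e_1 \rangle$, $M_2 = \langle e_1, e_2, e_1 \rangle$, $M_3 = \langle e_1, e_2, e_2 \rangle$, $M_4 = \langle e_1, e_1, e_2 \rangle$ for $\ell = 3$ (i.e., $|\mathcal{M}(H, 3)| = 4$).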
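The set $\mathcal{M}(H, \ell)$ of Example 2.7 can be reproduced by brute force for small templates. The sketch below is an illustration under stated assumptions, not the procedure used by odeN (which avoids generating the motifs explicitly): it enumerates every length-$\ell$ sequence of directed edges over the template, keeps those that use every template edge at least once, and merges sequences that are not distinct by relabeling vertices in order of first appearance. The function names are hypothetical.

```python
from itertools import product


def canonical(seq):
    """Relabel vertices 0, 1, 2, ... in order of first appearance, so two
    sequences related by a vertex bijection get the same representation."""
    ids, out = {}, []
    for x, y in seq:
        for v in (x, y):
            ids.setdefault(v, len(ids))
        out.append((ids[x], ids[y]))
    return tuple(out)


def enumerate_motifs(template_edges, ell):
    """Brute-force enumeration of M(H, ell) for a small template H given
    as a list of undirected edges (u, v)."""
    directed = [d for u, v in template_edges for d in ((u, v), (v, u))]
    needed = {frozenset(e) for e in template_edges}
    motifs = set()
    for seq in product(directed, repeat=ell):
        # Keep sequences whose underlying undirected graph is exactly H.
        if {frozenset(e) for e in seq} == needed:
            motifs.add(canonical(seq))
    return motifs


edge = [(1, 2)]
print(len(enumerate_motifs(edge, 2)), len(enumerate_motifs(edge, 3)))
```

For the single-edge template this prints the two sizes stated in Example 2.7. The enumeration costs $(2|E_H|)^\ell$ sequences, so it is only meant for very small $H$ and $\ell$.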

Since solving the counting problem exactly is NP-Hard in general¹ even for one single temporal motif, we aim at providing high-quality approximations to the motif counts as follows.

¹ The hardness depends on the topology of the motif. For example for triangles and single edges there exist polynomial-time algorithms, even if they are impracticable on very large networks. Interestingly, counting temporal star-shaped motifs is NP-Hard [26], while on static networks such motifs can be counted in polynomial time.

Problem 2. Motif template approximation problem. Given the input parameters of Problem 1 and additional parameters $\varepsilon \in \mathbb{R}^+$, $\eta \in (0, 1)$, compute approximations $C'_{M_i}(\delta)$ of counts $C_{M_i}(\delta)$ of motifs $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, such that $\mathbb{P}[\exists i \in \{1, \dots, |\mathcal{M}(H, \ell)|\} : |C'_{M_i}(\delta) - C_{M_i}(\delta)| \geq \varepsilon C_{M_i}(\delta)] \leq \eta$, that is $C'_{M_i}(\delta)$ is a relative $\varepsilon$-approximation to the count $C_{M_i}(\delta)$ with probability $\geq 1 - \eta$ for all $i = 1, \dots, |\mathcal{M}(H, \ell)|$ simultaneously.
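To make the "simultaneously" in Problem 2 concrete, the sketch below checks whether one set of estimates is a relative approximation for every motif at once. It assumes exact counts are available for comparison (as in an evaluation on a small network) and that they are strictly positive; the function name and the numbers are purely illustrative.

```python
def is_relative_approximation(estimates, exact, eps):
    """Return True if |C'_M - C_M| < eps * C_M holds for every motif M.

    estimates, exact: dicts mapping a motif identifier to a count.
    Assumption: every exact count is strictly positive.
    """
    return all(
        abs(estimates.get(motif, 0.0) - count) < eps * count
        for motif, count in exact.items()
    )


exact = {"M1": 100, "M2": 40}            # hypothetical exact counts
close = {"M1": 104.0, "M2": 38.5}        # both within 5% relative error
one_off = {"M1": 104.0, "M2": 45.0}      # M2 is off by 12.5%
print(is_relative_approximation(close, exact, 0.05))    # True
print(is_relative_approximation(one_off, exact, 0.05))  # False
```

A single motif outside its tolerance makes the whole run count as a failure, which is the event whose probability Problem 2 bounds by $\eta$.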

3 RELATED WORKS

Much work has been done on enumerating and approximating $k$-node motifs in (nontemporal) networks. We refer the interested reader to the surveys [38, 48]. However, such works cannot be easily adapted to temporal motifs since they do not properly account for the temporal information [15, 34]. Many different definitions of temporal networks and temporal patterns have been proposed: here we will focus only on those works that are relevant for our work; the interested reader may refer to [15, 16, 18, 28] for a more general overview.

Our work builds on the work of Paranjape et al. [34], which first introduced the definition of temporal motif used here and the problem of counting single temporal motifs. The authors provided a general algorithm for counting a single temporal motif by enumerating all the subsequences of edges that map on a single static subgraph. Their approach is not feasible on large datasets since it requires exhaustive enumeration of all subgraphs of the undirected projected static network $G_T$ that are isomorphic to the target template $H$. The authors also proposed efficient algorithms and data-structures for counting 3-node 3-edge motifs, which may be used for the exact counting subroutines within odeN’s sampling framework. In addition to the algorithmic contributions, the authors also showed that networks from similar domains tend to exhibit similar temporal motif counts. They also showed how motif counts can provide significant insights on the communication patterns in many networks, highlighting the importance of studying temporal motifs in temporal networks.

Other exact algorithms have been proposed for the problem of counting a single motif, or for slightly different problems. Mackey et al. [27] presented a backtracking algorithm for counting a single temporal motif that can be used for any motif. Boekhout et al. [6] developed exact algorithms for counting temporal motifs in multi-layer temporal networks (i.e., each edge is a tuple $(x, y, t, a)$ with $a$ denoting the layer of each edge); they also discuss efficient data-structures for counting 4-node 4-edge motifs, which may also be adapted for the exact counting subroutines in our sampling framework odeN. Being exact, both such algorithms do not scale on massive datasets due to large time and memory requirements.

Several approximation algorithms have been proposed in recent years for estimating the count of a single motif. Liu et al. [26] proposed a temporal-partition based sampling approach. Wang et al. [47] introduced a sampling-based algorithm that selects temporal edges with a fixed probability specified by the user. Lastly, Sarpe and Vandin [40] proposed PRESTO, an algorithm based on uniform sampling of small windows of the temporal network $T$. All such sampling algorithms can be used to analyze a single temporal motif but become inefficient as the number of motifs to be counted grows, such as in Problem 2. In fact, they cannot leverage the additional information that all motifs $M_1, \dots, M_{|\mathcal{M}(H, \ell)|}$ must share a common static topology isomorphic to $H$. As stated in Section 1, when analysing a temporal network it is hard to know a priori which motif is representing important functions for the network, therefore one often relies on testing all possible orderings $\sigma$ over one fixed target template $H$ for fixed $\ell, \delta$ [34, 45] (as in Prob. 1), resulting in a time-consuming and inefficient procedure. Our approach instead supports the direct analysis of multiple temporal motifs, enabling the study of hundreds of temporal motifs on massive networks in a very limited time.

4 ODEN

In this section we present odeN, our algorithm to address the motif template approximation problem (Prob. 2). We start in Section 4.1 with an overview of odeN. We then describe the algorithm in Section 4.2, analyze its time complexity in Section 4.3 and its theoretical guarantees, including an efficiently computable bound on the number of samples required to obtain the desired probabilistic guarantees, in Section 4.4.

4.1 Overview of odeN

Our algorithm odeN estimates the counts of motifs in $\mathcal{M}(H, \ell)$. The main idea is to avoid explicitly generating all the motifs $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, and counting them one at a time, as is required by existing algorithms that approximate a single motif count. odeN instead leverages the fact that the topology of all motifs must be isomorphic to the target template $H$, by reusing the computation while estimating the motif counts.

An overview of the main strategy adopted by our algorithm is presented in Figure 3. Given the input parameters of Problem 2, where $H$ is the target template, the idea behind our procedure is to consider the undirected static projected graph $G_T$ of the input temporal network $T$ and proceed as follows: i) find a set of subgraphs in the static graph $G_T$ that are isomorphic to $H$ by first sampling an edge $e_R$ of $G_T$ with some probability $p_{e_R}$, where $p_{e_R}$ depends, potentially, on $e_R$ and the temporal network $T$, and then enumerating all subgraphs of $G_T$ isomorphic to $H$ and containing $e_R$; ii) for each such subgraph, consider the corresponding temporal subgraph and compute all the counts of the subsequences of $\ell$ edges occurring within $\delta$-time in such temporal subgraph; iii) for each such subsequence identified, find the corresponding motif in $\mathcal{M}(H, \ell)$ of which the subsequence is a $\delta$-instance, and update a count for each motif identified; iv) weight each motif count appropriately in order to maintain an unbiased estimate of global motif counts; v) repeat steps i)-iv) for a sufficient number of iterations to guarantee the desired $(\varepsilon, \eta)$-approximation (see Problem 2).

4.2 Algorithm Description

odeN is described in Algorithm 1. It first computes 𝐺𝑇 = (𝑉 , 𝐸𝑇 ),the undirected projected static graph of 𝑇 (line 1), and initializes𝐶𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 (line 2) used to store the estimates of motif counts, whichare used to compute the estimators 𝐶 ′

𝑀𝑖, 𝑖 = 1, . . . , |M(𝐻, ℓ) |. Then

it repeats 𝑠 times (line 3) the following procedure: i) pick a randomedge 𝑒𝑅 from𝐺𝑇 (line 4) according to some probability distributionover the edges of 𝐸𝑇 ; ii) enumerate all the subgraphs ℎ of 𝐺𝑇 suchthat ℎ ≃ 𝐻 and 𝑒𝑅 ∈ ℎ (line 5); note that this enumeration step islocal to 𝑒𝑅 ; iii) for each such ℎ (line 6), collect the correspondingtemporal graph, i.e., all edges in 𝑇 for which their static projected

Page 5: odeN: Simultaneous Approximation of Multiple Motif Counts

odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks CIKM ’21, November 1–5, 2021, Virtual Event, QLD, Australia

Figure 3: Overview of odeN’s approximation strategy. Let $H$ be a triangle, and $\ell = 3$, $\delta = 40$. odeN first collects the static projected network $G_T$, then samples an edge $e_R \in G_T$ randomly ($e_R = \{1, 2\}$ in the figure) and enumerates all the subgraphs of $G_T$ isomorphic to $H$ containing $e_R$. For each subgraph it collects the corresponding temporal network, counts the $\delta$-instances of the motifs, and combines the different counts to obtain unbiased estimates of motif counts. This procedure is repeated to obtain concentrated estimates.

edge is an edge of $h$ (line 7), sort the sequence of edges of such graph by increasing timestamps and apply some pruning criteria (lines 8-9); iv) if the sequence is not pruned, then update the estimates of the number of $\delta$-instances of each temporal motif by calling the routine FastUpdate (line 10). FastUpdate features an efficient implementation of the general algorithm by Paranjape et al. [34], for which we devised efficient encodings of the motifs within integers through bitwise operations. Such function updates $C_{estimates}$ in order to maintain for each motif the count that will be used to output its unbiased estimate (see Appendix B). Let $C_{M_i}(e)$ be the number of $\delta$-instances in $T$ of $M_i$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, whose undirected projected static network contains edge $e \in G_T$. FastUpdate updates the estimate of the counts for each motif $M_i$ by summing its unbiased estimate obtained at the $j$-th iteration (i.e., $X^j_{M_i} = C_{M_i}(e_R) / (|E_H| p_{e_R})$). Once the procedure is repeated $s$ times, for each motif $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, odeN computes the final estimate $C'_{M_i} = \frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}$, where $X^j_{M_i} = \frac{1}{|E_H|} \sum_{e \in G_T} C_{M_i}(e) X_e / p_e$ is the estimate obtained at the $j$-th iteration (with $X_e$ being a Bernoulli random variable denoting if edge $e \in G_T$ is sampled at the $j$-th iteration, s.t. $\mathbb{P}[X_e = 1] = p_e$), and outputs it together with the motif (we output $\sigma_i$ over the node-set $V_H$) (lines 12-13). We show in Lemma 4.1 that odeN outputs unbiased estimates for all the motif counts.

We briefly discuss the pruning criteria used in line 9. Given a candidate temporal graph $S$ for which $G_S \simeq H$ holds, we check in linear time if $S$ can contain a $\delta$-instance of a motif or not: since $S$ is already sorted by increasing timestamps (see line 8), we efficiently check if there are at least $\ell$ edges within $\delta$-time. If not, then we

Algorithm 1: odeN
Input: $T = (V, E)$, $H = (V_H, E_H)$, $\delta$, $s$, $\ell$
Output: $(M_i, C'_{M_i})$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, where $C'_{M_i}$ is an estimate of $C_{M_i}$ for the motifs in $\mathcal{M}(H, \ell)$.
 1  $G_T = (V, E_T) \leftarrow$ UndirectedStaticProjection($T$)
 2  $C_{estimates} \leftarrow \{\}$
 3  for $j \leftarrow 1$ to $s$ do
 4      $e_R = \{x_R, y_R\} \leftarrow$ RandomEdge($p(e) : e \in E_T$)
 5      $\mathcal{H} \leftarrow \{h \subseteq G_T : h \simeq H, \{x_R, y_R\} \in h\}$
 6      foreach $h \in \mathcal{H}$ do
 7          $S \leftarrow \{(x, y, t), (y, x, t) \in E : \{x, y\} \in h\}$
 8          SortInPlace($S$)    ⊲ By increasing timestamps
 9          if *Pruning criteria are not met* then
10              FastUpdate($\delta$, $S$, $C_{estimates}$, $p(e_R)$, $H$)
11  foreach $(M, X_M) \in C_{estimates}$ do
12      $C'_M \leftarrow X_M / s$
13      output $(M, C'_M)$

prune the sequence (since by definition a $\delta$-instance of a motif with $k$ nodes and $\ell$ edges must have $\ell$ edges occurring within $\delta$-time). We thus avoid calling the subroutine FastUpdate, which has an exponential complexity in general (see Section 4.3), on $S$.
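The pruning check can be illustrated with a short Python sketch. It is not taken from the odeN implementation: the function name `may_contain_instance`, the tuple layout `(source, destination, timestamp)`, and the use of "span at most $\delta$" as the meaning of "within $\delta$-time" are assumptions made for illustration. Because the edges are already sorted, any $\ell$ edges that fit within $\delta$-time imply that some $\ell$ consecutive edges do, so a single linear scan suffices.

```python
from typing import List, Tuple

# Hypothetical layout of a temporal edge: (source, destination, timestamp).
TemporalEdge = Tuple[int, int, int]


def may_contain_instance(sorted_edges: List[TemporalEdge], ell: int, delta: int) -> bool:
    """Return True if some ell consecutive edges span at most delta time units.

    sorted_edges must already be sorted by increasing timestamp.
    Returning False means the sequence can be pruned.
    """
    for i in range(len(sorted_edges) - ell + 1):
        first_t = sorted_edges[i][2]
        last_t = sorted_edges[i + ell - 1][2]
        if last_t - first_t <= delta:
            return True
    return False


# Toy candidate S, sorted by timestamp.
S = [(1, 2, 0), (2, 3, 10), (3, 1, 100), (1, 2, 130), (2, 3, 135)]
keep = may_contain_instance(S, ell=3, delta=40)   # the last three edges span 35
prune = not may_contain_instance(S, ell=3, delta=30)
```

A sequence with fewer than $\ell$ edges is pruned automatically, since the loop body never runs.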

We now discuss the probability distribution used to sample a random edge $e_R$ from $G_T$ (line 4). For space constraints, the subroutine FastUpdate that updates the motif estimates at each iteration (line 10) and the algorithms employed for the static enumeration are described in Appendix B (Sections B.1 and B.2).

Since our final estimate is an average over $s$ samples of the variables $X^j_{M_i}$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, $j = 1, \dots, s$, and given that $X^j_{M_i}$ is an unbiased estimate (see Lemma 4.1), the final estimate is also a consistent estimator (i.e., it converges to $C_{M_i}$ as $s \to \infty$) if each edge has a positive probability of being sampled (footnote 2). Thus any probability mass function assigning a positive probability to every edge can be adopted. We considered different distributions over the edges of $E_T$:

(1) Uniform: $p_e = 1/|E_T|$, $e \in E_T$;

(2) Static degree based: $p_e = d(e) / \sum_{e' \in E_T} d(e')$, $e \in E_T$, where $d(e = \{x, y\}) = d(x) + d(y)$ is the degree of the edge as the sum of the degrees of its nodes $x, y \in V$ in $G_T$;

(3) Temporal degree based: $p_e = \phi(e) / \sum_{e' \in E_T} \phi(e')$ with $\phi(e = \{x, y\}) = |\{t : \exists (x, z, t) \vee (z, x, t) \in E\}| + |\{t : \exists (z, y, t) \vee (y, z, t) \in E, z \neq x\}|$, $e \in E_T$;

(4) Temporal edge weight based: $p_{e = \{x, y\}} = |\{(x, y, t), (y, x, t) \in E\}| / m$, $e \in E_T$.

We empirically found the distribution (4) to be the fastest to converge for a small number $s$ of iterations, thus we use it in our analysis. We observe that many other candidate distributions can be designed (e.g., combining two of those already listed with weights $\xi$, $1 - \xi$, $\xi \in (0, 1)$), making our framework extremely versatile.
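As a rough illustration of distributions (1), (2) and (4), the following Python sketch computes them on a toy temporal network; distribution (3) follows the same pattern and is omitted. All function names are hypothetical, temporal edges are assumed to be `(x, y, t)` tuples without self-loops, and exact fractions are used only to make the toy values easy to check. This is a sketch under these assumptions, not the code used in the experiments.

```python
import random
from collections import Counter
from fractions import Fraction


def static_projection(temporal_edges):
    """Undirected static edges of G_T, each represented as frozenset({x, y})."""
    return {frozenset((x, y)) for x, y, _ in temporal_edges}


def uniform_dist(temporal_edges):
    """Distribution (1): every static edge has probability 1/|E_T|."""
    static_edges = static_projection(temporal_edges)
    return {e: Fraction(1, len(static_edges)) for e in static_edges}


def static_degree_dist(temporal_edges):
    """Distribution (2): p_e proportional to d(x) + d(y) in G_T."""
    static_edges = static_projection(temporal_edges)
    degree = Counter(v for e in static_edges for v in e)
    weight = {e: sum(degree[v] for v in e) for e in static_edges}
    total = sum(weight.values())
    return {e: Fraction(w, total) for e, w in weight.items()}


def temporal_weight_dist(temporal_edges):
    """Distribution (4): p_e = (# temporal edges mapping onto e) / m."""
    multiplicity = Counter(frozenset((x, y)) for x, y, _ in temporal_edges)
    m = len(temporal_edges)
    return {e: Fraction(c, m) for e, c in multiplicity.items()}


def sample_edge(dist, rng):
    """Draw one static edge according to dist (line 4 of Algorithm 1)."""
    edges = list(dist)
    weights = [float(dist[e]) for e in edges]
    return rng.choices(edges, weights=weights, k=1)[0]


# Toy temporal network with m = 5 temporal edges on a static triangle.
E = [(1, 2, 0), (2, 1, 5), (2, 3, 7), (1, 3, 9), (1, 2, 11)]
p4 = temporal_weight_dist(E)
e_R = sample_edge(p4, random.Random(0))
```

On this toy input the static edge {1, 2} carries three of the five temporal edges, so distribution (4) samples it with probability 3/5, while distributions (1) and (2) both assign 1/3 to each static edge.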

We conclude by summarizing some nice properties of our algorithm: 1) it computes the estimates only for the temporal motifs

2 More formally it is only necessary to assign to each $\delta$-instance a known positive sampling probability.


occurring in the input temporal network $T$ (except for the very impractical case where the motifs in $\mathcal{M}(H, \ell)$ all have zero counts) without generating all the possible candidates, while existing sampling techniques require first generating all the candidates and then executing the algorithms on such candidates, even for motifs with zero counts; 2) it takes advantage of the constraint that all motifs share the same underlying topology ($H$), saving computation when estimating the different counts; 3) it is trivially parallelizable: all the $s$ iterations can be executed in parallel; 4) it can easily use most of the fast state-of-the-art subgraph enumeration algorithms developed for the exact subgraph isomorphism problem (see Appendix B.2).

4.3 Time Complexity

In this section we briefly describe the time complexity of odeN. odeN needs to compute the probabilities $p(e)$ of edges in advance, which requires an $O(|E_T|)$ preprocessing step. Interestingly, this step does not depend on the target template $H$, so it can be reused for different target templates. One of the most expensive steps in Algorithm 1 is the local enumeration to identify the set $\mathcal{H}$, which in general requires exponential time (line 5). For specific topologies this step can be implemented very efficiently with symmetry breaking conditions and min-degree expansion. For example, if $H$ is a triangle this enumeration, "local" to $e_R = \{x_R, y_R\}$, can be done in $O(\min(d_{x_R}, d_{y_R}))$ time. Let $|\mathcal{H}^*|$ be the maximum cardinality of a set of subgraphs isomorphic to $H$ and adjacent to an edge in $G_T$. Let $|S^*|$ denote the maximum cardinality of a set $S$ collected (in line 7) by our algorithm odeN. Sorting $S^*$ requires $O(|S^*| \log |S^*|)$ time. The subroutine FastUpdate has a complexity dominated by $O((|S^*| + \ell) |E_H|^{\ell})$ (see [34] and App. B.1 for more details). So overall the complexity of our procedure is $O(|E_T| + s(\zeta_{enum} + |\mathcal{H}^*| (|S^*| \log(|S^*|) + |E_H|^{\ell} (|S^*| + \ell))))$, where $\zeta_{enum}$ is the time required by the static enumerator used as a subroutine to compute the set $\mathcal{H}^*$. Such step in general is exponential in the number of edges $|E_T|$ and depends on the exact technique used as subroutine. The final complexity accounts for the cycle (in line 3) that is repeated $s$ times. The parallel version of our algorithm, which executes the cycle of line 3 in parallel on $\omega$ available processing units, leads to a time complexity of $O(|E_T| + (s/\omega)(\zeta_{enum} + |\mathcal{H}^*| (|S^*| \log(|S^*|) + |E_H|^{\ell} (|S^*| + \ell))))$.
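For the triangle case mentioned above, the local enumeration can be sketched in Python as follows. This is an illustrative sketch, not the enumerator of Appendix B.2: the names `build_adjacency` and `triangles_containing` are hypothetical, and the expected $O(\min(d_{x_R}, d_{y_R}))$ cost assumes hash-set adjacency lists with constant expected lookup time.

```python
from collections import defaultdict


def build_adjacency(static_edges):
    """Adjacency sets of the undirected static graph G_T."""
    adj = defaultdict(set)
    for x, y in static_edges:
        adj[x].add(y)
        adj[y].add(x)
    return adj


def triangles_containing(adj, x, y):
    """All triangles (x, y, z) of G_T that contain the edge {x, y}.

    Only the neighbours of the lower-degree endpoint are scanned, and each
    one is checked against the other endpoint's adjacency set.
    """
    small, large = (x, y) if len(adj[x]) <= len(adj[y]) else (y, x)
    return [(x, y, z) for z in adj[small] if z != large and z in adj[large]]


# Toy static graph: two triangles share the edge {1, 2}; node 5 hangs off node 4.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (1, 4), (2, 4), (4, 5)])
found = sorted(triangles_containing(adj, 1, 2))
```

Here `found` holds the two triangles through edge {1, 2}, while the pendant edge {4, 5} yields an empty list.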

4.4 Theoretical Guarantees

In this section we present the theoretical guarantees provided by odeN. All proofs are provided in Appendix D.

Recall that our algorithm outputs, for each motif $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, the following estimate: $C'_{M_i} = \frac{1}{s} \sum_{j=1}^{s} X^j_{M_i} = \frac{1}{s |E_H|} \sum_{j=1}^{s} \sum_{e \in G_T} C_{M_i}(e) X_e / p_e$. The following lemma shows that such estimates are unbiased estimates of $C_{M_i}$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$.

Lemma 4.1. For each motif-count pair $(M_i, C'_{M_i})$ reported in output by odeN, $C'_{M_i}$ is an unbiased estimate of $C_{M_i}$, that is $\mathbb{E}[C'_{M_i}] = C_{M_i}$.
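The intuition behind Lemma 4.1 can be checked numerically on a toy input (the proof itself is in Appendix D). Each $\delta$-instance maps onto exactly $|E_H|$ static edges, so it is counted in $C_{M_i}(e)$ for $|E_H|$ different edges $e$, and the factor $1/(|E_H| p_e)$ compensates for both the multiple counting and the sampling probability. The Python sketch below computes the exact expectation of a single-iteration estimate; the function names and the toy data are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction


def per_edge_counts(instances):
    """C_M(e): number of instances whose static projection contains edge e."""
    counts = Counter()
    for static_edges in instances:
        counts.update(set(static_edges))
    return counts


def expected_estimate(instances, p, num_template_edges):
    """Exact E[X] for one iteration: sum over e of p_e * C_M(e) / (|E_H| * p_e)."""
    counts = per_edge_counts(instances)
    return sum(p[e] * Fraction(counts[e]) / (num_template_edges * p[e]) for e in p)


# Toy input: three instances of one triangle motif, each listed by the three
# static edges it maps onto (edge labels are arbitrary strings).
instances = [("ab", "bc", "ac"), ("ab", "bc", "ac"), ("ab", "bd", "ad")]

# Any distribution with positive mass on every edge works; this one is arbitrary.
p = {"ab": Fraction(1, 2), "bc": Fraction(1, 8), "ac": Fraction(1, 8),
     "bd": Fraction(1, 8), "ad": Fraction(1, 8)}

expectation = expected_estimate(instances, p, num_template_edges=3)
```

On this input `expectation` equals the true count of three instances, regardless of how the probability mass is split among the edges.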

Let $\alpha = \min_{\{x, y\} \in E_T} \{|\{(x, y, t), (y, x, t) \in E\}|\}$, i.e., the minimum number of temporal edges of $T$ that map on an edge in $G_T$. We now give an upper bound to the variance of the estimates provided by Algorithm 1 for each motif reported in output.

Lemma 4.2. For each motif-count pair $(M_i, C'_{M_i})$ reported in output by odeN, it holds $\mathrm{Var}[C'_{M_i}] \le \frac{C_{M_i}^2}{s} \left( \frac{m}{\alpha |E_H|} - 1 \right)$.

To give a bound on the number $s$ of samples required by odeN to output an $\varepsilon$-approximation that holds on all motifs in output with probability $> 1 - \eta$, we combine Bennett’s inequality [5], an advanced result on the concentration of sums of independent random variables as reported in [40], with a union bound, obtaining the following main result.

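Theorem 4.3. Let $s$ be the number of iterations of odeN, let $\varepsilon \in \mathbb{R}^+$, and $\eta \in (0, 1)$. If $s \ge \left( \frac{m}{\alpha |E_H|} - 1 \right) \frac{1}{(1 + \varepsilon) \ln(1 + \varepsilon) - \varepsilon} \ln\left( \frac{2 |\mathcal{M}(H, \ell)|}{\eta} \right)$ then $\mathbb{P}[\exists i \in \{1, \dots, |\mathcal{M}(H, \ell)|\} : |C'_{M_i} - C_{M_i}| \ge \varepsilon C_{M_i}] \le \eta$.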
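The bound of Theorem 4.3 is straightforward to evaluate. The following Python helper is a minimal sketch: the function name `sample_bound` and its parameter names are hypothetical, and rounding up to the next integer is a choice made here so that the returned value is a valid number of iterations.

```python
import math


def sample_bound(m, alpha, num_template_edges, num_motifs, eps, eta):
    """Smallest integer s satisfying the inequality of Theorem 4.3.

    m: number of temporal edges; alpha: minimum number of temporal edges
    mapping onto one static edge; num_template_edges: |E_H|;
    num_motifs: |M(H, l)|; eps > 0 and 0 < eta < 1 as in the theorem.
    """
    variance_factor = m / (alpha * num_template_edges) - 1
    bennett_term = (1 + eps) * math.log(1 + eps) - eps
    union_term = math.log(2 * num_motifs / eta)
    return math.ceil(variance_factor / bennett_term * union_term)


# Tiny illustrative values: m = 6, alpha = 1, triangle template, one motif.
s_min = sample_bound(m=6, alpha=1, num_template_edges=3, num_motifs=1, eps=1.0, eta=0.5)
```

The bound grows only logarithmically with the number of motifs and with $1/\eta$, while it is linear in $m / (\alpha |E_H|)$, so tightening $\varepsilon$ is what increases the required number of iterations most quickly.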

5 EXPERIMENTAL EVALUATION

We implemented odeN and tested it on several large datasets (see Section 5.1 for details on setup and data). Our experimental evaluation has the following goals: compare odeN with state-of-the-art algorithms for approximating motif counts (Section 5.2); evaluate the scalability of a simple parallel implementation of odeN (Section 5.3); provide a case study highlighting the usefulness of using odeN (Section 5.4) to analyze real-world temporal networks.

5.1 Setup, and Datasets

We briefly describe the setup and the large-scale datasets used in our experimental evaluation.

We implemented our algorithm odeN in C++20 and compiled it under gcc 9.3 with optimization flag enabled (implementation available at https://github.com/VandinLab/odeN); additional details on the implementation are in Appendix C. We compared odeN with four different baselines, denoted as PRESTO-A (PR-A), PRESTO-E (PR-E) [40], LS [26], and ES [47]. We used the original implementations available from the authors. We performed all experiments under Ubuntu 20.04 on a machine with 64 cores, Intel Xeon E5-2698 2.3GHz, running each algorithm single threaded and with 300GB of maximum RAM allowed.

The datasets used in our experimental evaluation are reported in Table 1, which shows the number of nodes and edges of $T$, the precision of the timestamps, the timespan of the network, the number $|E_T|$ of undirected edges in the corresponding undirected projected static network $G_T$, the maximum degree $d_{max}$ of a node in $G_T$ and the maximum number $w_{max}$ of temporal edges that are mapped on the same static edge in $G_T$. The datasets are from different domains: SO is a network that models interactions from the Stack-Overflow platform [34], BI is a network of Bitcoin transactions [26], RE is a network built from comments on the platform Reddit [26], and EC is a bipartite temporal network built from IPv4 packets exchanged between Chicago and Seattle [40]. See the original papers for more details on the networks and the processes they model.

When measuring the running times for the various algorithms we exclude the time to read the dataset. Since ES’s implementation supports only values of $\ell$ up to 4, we do not report results for ES and $\ell > 4$. Unless otherwise stated we used $\delta = 86400$ for SO and RE, $\delta = 43200$ on BI, and $\delta = 50000$ on EC, as done in previous works [26, 34, 47]. Since all algorithms used in our comparison have different


Table 1: Datasets used and their statistics. See Section 5.1 for details on the statistics reported.

Name   $n$      $m$     $|E_T|$   $d_{max}$   $w_{max}$   Precision   Timespan
SO     2.58M    47.9M   28.1M     44K         594         sec         2774 (days)
BI     48.1M    113M    84.3M     2.4M        24.2K       sec         2585 (days)
RE     8.40M    636M    435.3M    0.3M        165K        sec         3687 (days)
EC     11.16M   2.32B   66.8M     0.3M        3.8M        μ-sec       62.0 (mins)

parameters and only odeN counts multiple motifs simultaneously, we used the following procedure to choose the parameters. For a given target template $H$ and $\ell$, we ran PRESTO-A, PRESTO-E, LS, and ES for each motif in $\mathcal{M}(H, \ell)$ with fixed parameters, and computed their running time as the sum of the running times required by the single motifs in $\mathcal{M}(H, \ell)$. We then fixed the parameters of odeN so that its running time would be at most the same as the other methods, or be close to it. All the parameters used in the experiments (including sample sizes) are reported with the source code. To extract the exact counts of motifs we used a modified version of the algorithm by Mackey et al. [27]. We do not report the running times of such algorithm since, even though it employs parallelism, it still runs several orders of magnitude slower than approximate approaches.

5.2 Approximation Quality and Running Time

In this section we compare the quality of the estimates and the running times of odeN and the baseline sampling approaches.

To evaluate the quality of the approximations we used the MAPE (Mean Absolute Percentage Error) metric over ten executions of each algorithm and parameter configuration. The MAPE is computed as follows: let $C'_{M_i}$ be the estimate of $C_{M_i}$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, returned by an algorithm, then the relative error of such estimate is $|C'_{M_i} - C_{M_i}| / C_{M_i}$. The MAPE is the average over the ten runs of the relative errors, in percentage. On each of the ten runs we also measured the running time of each algorithm, for which we will report the arithmetic mean.
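The metric just described amounts to a few lines of Python. The function name `mape` and the toy numbers below are illustrative assumptions, not values from the experiments.

```python
def mape(estimates, exact):
    """Average relative error, in percent, of several estimates of one exact count."""
    relative_errors = [abs(c - exact) / exact for c in estimates]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Four hypothetical runs estimating a motif whose exact count is 100.
error_percent = mape([90, 110, 100, 120], exact=100)
```

The relative errors of the four runs are 10%, 10%, 0% and 20%, so their average is 10%. The definition requires a nonzero exact count.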

We first discuss the quality of the estimates for different datasets when $H$ is a triangle and $\ell \in \{4, 5\}$. For $\ell = 4$ there are $|\mathcal{M}(H, \ell)| = 96$ triangles, while for $\ell = 5$, $|\mathcal{M}(H, \ell)|$ is 800. So as $\ell$ increases the approximation task becomes more challenging, due to the exponential growth of the number of motifs. We also observe that, to the best of our knowledge, such a huge number of temporal motifs was never tested before on large datasets due to the limitations of existing algorithms, while, as we will show, odeN renders the approximation task practical even on hundreds of motifs.

The results on the SO dataset are shown in Figure 4a. odeN provides much sharper estimates than state-of-the-art sampling techniques for single motif estimations on motifs $M_1, \dots, M_{|\mathcal{M}(H, \ell)|}$: the relative error on $\ell = 4$-edge triangles is bounded by 5%, and for $\ell = 5$-edge triangles (where $|\mathcal{M}(H, \ell)| = 800$) the relative error is bounded by 12%, while state-of-the-art algorithms report much less accurate estimates, with twice the relative error of odeN on each configuration. We report the running times to obtain such estimates in Table 2. Interestingly, odeN is more than 3× faster with $\ell = 4$ than any sampling algorithm and 1.7× faster with $\ell = 5$. For the other datasets, since extracting all the exact counts for $\ell > 4$ is extremely time consuming, requiring up to months of computation,

[Figure 4 plots: % relative error (MAPE, log scale) for each method (PRESTO-A, PRESTO-E, LS, ES, odeN). Panel titles: (a) "Methods comparison StackOverflow, ℓ = 4" and "Methods comparison StackOverflow, ℓ = 5" (ES not shown); (b) "Methods comparison Bitcoin, ℓ = 4" and "Methods comparison Reddit, ℓ = 4"; (c) "Methods comparison EquinixChicago, ℓ = 4" (ES not shown) and "Methods comparison Stackoverflow, ℓ = 4".]

Figure 4: Approximation error on different datasets. (4a): SO dataset, $H$ is a triangle, for $\ell = 4$ (left) and $\ell = 5$ (right). (4b): $H$ is a triangle, $\ell = 4$, BI dataset (left) and RE dataset (right). (4c): EC dataset, $H$ is an edge, $\ell = 4$ (left); SO dataset, $H$ is a square, $\ell = 4$ (right).

we will not discuss the approximation qualities for $\ell = 5$ (since we do not have the exact counts to evaluate them).

On dataset BI (Figure 4b left) odeN provides more concentrated estimates for the $|\mathcal{M}(H, \ell)| = 96$ triangles than all other algorithms except ES, which also has a smaller running time than odeN. This may be related to the static graph structure of BI, which has some very high-degree nodes (see Table 1). Therefore odeN may sample edges with very high degree nodes, introducing an overcounting in its estimates. Nonetheless, for higher values of $\ell$ this issue is amortized over the growing number of motifs $|\mathcal{M}(H, \ell)|$.

On dataset RE (Figure 4b right) the estimates by odeN are all within 13% of relative error and improve significantly over state-of-the-art sampling algorithms, by up to one order of magnitude of precision. Such estimates were notably obtained with a significantly smaller running time than state-of-the-art sampling algorithms, improving the running time by up to 2× over ES and 1.4× over PRESTO (as reported in Table 2).

Finally, on the EC dataset, which is a bipartite temporal network with more than 2 billion edges, we evaluated the approximation qualities with $H$ being an edge and $\ell = 4$ (for which $|\mathcal{M}(H, \ell)| = 8$); such motifs have fundamental importance in the analysis of temporal networks since they can be seen as building blocks [16,


Table 2: Running times (in seconds) to obtain the results in Figure 4 (results are shown following the order in Figure 4). Under $H$ we report the topology of $H$ used: T for triangles, E for edges, and S for squares. “-” denotes not applicable, while “✗” denotes out of RAM.

Dataset   $\ell$   $H$   PR-A      PR-E      LS        ES        odeN
SO        4        T     533.4     537.7     555.5     567.2     174.4
SO        5        T     4405      4408      4390      -         2515
BI        4        T     2048.6    2065.2    2754.6    1602.9    1948.9
RE        4        T     9787.1    10165.8   14289.7   13172.3   6814.9
EC        4        E     2581.5    3014.9    2981.9    ✗         1234.3
SO        4        S     15613.7   16718.7   14344.6   26118.3   4517.9

49]. We report the results on such motifs in Figure 4c (left) (ES is not shown since it did not terminate with the allowed memory budget). The estimates of odeN are well concentrated and within 20% of relative error, while other sampling approaches provide approximations with a relative error up to 90% or more. Moreover, odeN’s results were obtained with a speedup of at least 2× over all the other sampling algorithms, rendering the approximation task feasible in a small amount of time on very large temporal networks.

To illustrate the enormous advantage of odeN over existing state-of-the-art exact and approximation algorithms, we compared the various algorithms on dataset SO when $H$ is set to be a square and $\ell = 4$, for which $|\mathcal{M}(H, \ell)| = 48$. As [47] observed, among the 4-edge square motifs there are 16 motifs that do not grow as a single component (i.e., their orderings start with $\langle (1, 2) (3, 4) \cdots \rangle$). Estimating the counts of such motifs is particularly hard for most of the current state-of-the-art sampling algorithms since they generate a large number of partial matchings, while such aspect does not impact odeN. The results are shown in Figure 4c (right). odeN provides tight approximations under 9% of relative error for all four-edge square motifs, while other sampling algorithms fail to provide sharp estimates for some of the motifs. Surprisingly, as shown in Table 2, to obtain such estimates odeN required less than 1.3 hours of computation while the exact computation of the counts required more than two weeks; odeN is at least 3× faster than all other algorithms, and 5.4× faster than ES.

Overall, these results show that our algorithm odeN achieves much more precise estimates within a significantly smaller running time than state-of-the-art sampling algorithms when estimating the counts $C_{M_1}, \dots, C_{M_{|\mathcal{M}(H, \ell)|}}$ for different values of $\ell$ and different topologies of the target template $H$ (see Problem 1 in Section 2).

5.3 Parallel Implementation

In this section we briefly describe the advantages of a simple parallel implementation of Algorithm 1. As discussed in Section 4.2 the for cycle (from line 3) can be trivially parallelized, therefore we implemented such strategy through a thread pooling design pattern.

We describe the results obtained with $H$ set to be a triangle, $\ell = 4$, and on the dataset SO; similar results are observed for other datasets. We tested the speedup achieved with $\omega \in \{2, 4, 8, 16\}$ threads over the sequential implementation. Let $T_\omega$ be the average running time with $\omega$ threads over ten executions of odeN with fixed

[Figure 5 plots: speedup over sequential (y-axis) versus number of threads (x-axis: 2, 4, 8, 16). Left panel legend: $s = 1 \cdot 10^6$, $s = 2 \cdot 10^6$, $s = 3 \cdot 10^6$. Right panel legend: $\delta = 43200$, $\delta = 86400$, $\delta = 129600$.]

Figure 5: Speed-up of odeN’s parallel implementation. (Left): Varying $s$ and fixed $\delta$; (Right): Varying $\delta$ and fixed $s$.

parameters, with $T_1$ being the average time for running the algorithm sequentially. We report the value of $T_1 / T_\omega$, $\omega \in \{2, 4, 8, 16\}$, i.e., the speedup over the sequential implementation. Fig. 5 (Left) shows the speedup across different values of the sample size $s$, with $\delta = 86400$. We observe an almost linear speedup up to 4 threads and then a slightly worse performance, especially for small sample sizes, that may be related to the time needed to process each sample. Fig. 5 (Right) shows how the speedup changes for $s = 2 \cdot 10^6$ and different values of $\delta$. We note that our algorithm odeN seems not to be impacted by the value of $\delta$, always attaining similar performance. Interestingly, as captured by our analysis in Section 4.3, the algorithm does not reach a fully linear speedup since we did not parallelize the computation of the sampling probabilities $p(e)$, $e \in E_T$. As a remark, our parallel implementation is not optimized, and more advanced parallel strategies may substantially increase its speedup.
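The structure of such a thread-pool parallelization can be sketched in Python as follows. This is only an illustration of how the $s$ independent iterations can be split among $\omega$ workers and recombined, not the C++ implementation evaluated above: all names are hypothetical, the per-edge counts stand in for the work done by lines 5-10 of Algorithm 1, and Python threads will generally not reproduce the reported speedups because of the interpreter's global lock.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(num_iterations, seed, edges, probs, per_edge_count, num_template_edges):
    """Sum of the single-iteration estimates X^j over one worker's share of iterations."""
    rng = random.Random(seed)  # one independent generator per worker
    total = 0.0
    for _ in range(num_iterations):
        idx = rng.choices(range(len(edges)), weights=probs, k=1)[0]
        total += per_edge_count[edges[idx]] / (num_template_edges * probs[idx])
    return total


def parallel_estimate(s, workers, edges, probs, per_edge_count, num_template_edges):
    """Split s iterations among the workers and average all the estimates."""
    chunk_sizes = [s // workers + (1 if i < s % workers else 0) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_chunk, n, seed, edges, probs, per_edge_count, num_template_edges)
            for seed, n in enumerate(chunk_sizes)
        ]
        return sum(f.result() for f in futures) / s


# Toy input: per-edge counts of one triangle motif, sampled proportionally to the counts.
edges = ["ab", "bc", "ac"]
per_edge_count = {"ab": 2, "bc": 1, "ac": 3}
probs = [2 / 6, 1 / 6, 3 / 6]
estimate = parallel_estimate(1000, 4, edges, probs, per_edge_count, num_template_edges=3)
```

In this toy input the ratio between count and probability is the same for every edge, so every iteration returns the same value and the estimate matches the true count of 2 up to floating-point rounding. The workers share only read-only data, which is why no locking is needed until the partial sums are combined.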

5.4 A Case Study

In this section we illustrate how counting multiple motifs, corresponding to the same target template $H$, with odeN can be used to extract useful insights from a temporal network. We consider a real-world activity network from Facebook [46]. In such network, each node represents a user and a temporal edge $(u, v, t)$ indicates that user $u$ posted on $v$’s wall at time $t$ (see the original publication [46] for more details). The network contains information collected from September 2006 to January 2009. After removing self-loops, the network has $n = 45.7$K nodes, $m = 826$K temporal edges, and $|E_T| = 179$K static (undirected) edges. We will first show how analyzing the motif counts obtained with odeN provides complementary insights to those in [46], which relied on mostly static analyses. We then conclude by discussing how the counts of the network evolve by varying only the parameter $\ell$ (i.e., fixing $H$, $\delta$), showing that such counts surprisingly differ with different values of such parameter.

In the original paper [46], the authors partitioned the Facebook network in nine different snapshots (obtaining nine projected static networks), with each snapshot spanning 90 days of interactions in the network. The authors observed that consecutive snapshots have small resemblance, i.e., on average only 45% of the edges are preserved through consecutive snapshots. The authors also observed that despite this difference all the snapshots have similar, almost invariant, structural properties in terms of their clustering coefficient, average degree distribution, and others. We used odeN (with


$\varepsilon = 1$, $\eta = 0.1$) to compare the temporal networks associated to the snapshots by computing the counts of the 8 temporal motifs in $\mathcal{M}(H, \ell = 3)$ with $H$ being a triangle and $\delta = 86400$ (i.e., 1 day). On each snapshot, after extracting the motif counts, we computed for each motif $M$ its normalized count on the snapshot as $C_M / \sum_{i=1}^{8} C_{M_i}$. The results are reported in Fig. (6a) (see Appendix E for a visual representation of the motifs). Interestingly, even if in [46] the authors highlight small resemblance through different snapshots, the counts of the motifs are stable across the different snapshots, especially by looking at the first three and the last two snapshots. Surprisingly on snapshots 6 and 7, which correspond to the period of observation of mid-2008, we observe that there is a significant variation in the motif counts w.r.t. the previous months. This is the period where the authors of [46] observed a change in Facebook’s interface (that led to a drop in the growth of the network) that seems to be correlated to the variation on the motif counts. Even more surprisingly, this aspect is not captured by a static analysis of the snapshots as performed in [46]. Thus, our temporal motifs analysis through odeN is able to capture a variation in the growth of the network that the static analysis cannot highlight. (We discuss how the motifs and their counts can be used to characterize the activity on the network in Appendix E).

We then analyzed how the different motif counts of the whole network change by varying the parameter $\ell$. We fixed $H$ to be a triangle and ran odeN with $\varepsilon = 1$, $\eta = 0.1$, $\delta = 86400$. The results are shown in Figure (6b). We observe that the counts of $M_1, \dots, M_{|\mathcal{M}(H, \ell)|}$ vary significantly by increasing $\ell$. For $\ell = 3$ almost all the motifs have the same counts, while for larger $\ell$ there are some motifs with very high counts (i.e., overrepresented) and some other motifs that are underrepresented. Overall the highest counts range from $10^4$ to $10^6$ from $\ell = 3$ up to $\ell = 6$. To understand if these counts increase only by chance, we performed a widely used statistical test (e.g., [11, 22]) by computing the $Z$-scores of the different motif counts under the following null model [31]. We generated 500 random networks by the timeline shuffling random model [11], which redistributes all the timestamps by fixing the directed projected static network. For each motif $M_i$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, we computed a $Z$-score that is defined as follows: let $C_{M_i}$ be the count of the motif in the original network and let $C^j_{M_i}$ be its count on the $j$-th random network, $j \in \{1, \dots, 500\}$. The $Z$-score is computed as $Z_{M_i} = (C_{M_i} - \sum_{j=1}^{500} C^j_{M_i} / 500) / \mathrm{std}(C^1_{M_i}, \dots, C^{500}_{M_i})$, where $\mathrm{std}(\cdot)$ denotes the standard deviation. The results are in Fig. (6c), and they show that the counts in Fig. (6b) are very significant and not due to random fluctuations (higher $Z$-scores indicate that such motif counts are significantly more frequent in $T$ than in the randomly permuted networks). Interestingly, the $Z$-scores in Figure (6c) follow a similar law to the counts in Figure (6b), with the highest $Z$-scores increasing significantly every time $\ell$ increases. Notably the highest $Z$-scores of motifs with $\ell = 6$ are more than 3 orders of magnitude larger than the $Z$-scores of motifs with $\ell = 3$. (We discuss some of the significant motifs in Appendix E).
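The $Z$-score computation can be sketched in Python as below. The function name `z_score` is hypothetical, the null counts are toy values, and the use of the population standard deviation is an assumption, since the text does not say whether the population or the sample version is used (with 500 networks the difference is small).

```python
import statistics


def z_score(observed, null_counts):
    """(observed - mean of null counts) / standard deviation of null counts.

    Assumption: population standard deviation (statistics.pstdev).
    """
    mean = statistics.fmean(null_counts)
    std = statistics.pstdev(null_counts)
    return (observed - mean) / std


# Toy example: a motif counted 20 times in the original network and
# 8 and 12 times in two shuffled networks (mean 10, standard deviation 2).
z = z_score(20, [8, 12])
```

If all the shuffled networks yield the same count the standard deviation is zero and the score is undefined; the sketch then raises `ZeroDivisionError` rather than returning a value.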

6 CONCLUSIONS

In this work we introduced odeN, our algorithm to obtain rigorous, high-quality, probabilistic approximations of the counts of multiple motifs with the same static topology in large temporal

[Figure 6 plots. (a) Normalized count (y-axis) of motifs M1-M8 for each temporal network snapshot (x-axis: 1-9). (b) "Distribution of the motif counts with varying ℓ": motif count (log scale) versus motifs sorted by count, for ℓ = 3, 4, 5, 6. (c) "Distribution of the motif counts Z-scores with varying ℓ": Z-score of the motif (log scale) versus motifs sorted by Z-score, for ℓ = 3, 4, 5, 6.]

Figure 6: (6a): Counts of the motifs in $\mathcal{M}(H, 3)$ with $H$ a triangle on each temporal network corresponding to one snapshot in [46]. (6b): Counts on the full Facebook network with varying $\ell$. (6c): $Z$-scores of the motif counts with varying $\ell$.

networks. Our experimental evaluation shows that odeN allows analyzing several motifs in large networks in a fraction of the time required by state-of-the-art approaches. We believe that our algorithm odeN will be of practical interest in the analysis of temporal networks, complementing many of the existing tools and helping in understanding complex networked systems and their patterns.

There are several interesting directions for future research, including devising better edge probability distributions for odeN and choosing such distribution based on the characteristics of the dataset, since different datasets can have very different temporal edge distributions (e.g., with skewed behaviours [40]) and, thus, there may not exist a unique distribution that is effective for all temporal networks. Another direction of future research is the derivation of improved bounds for the number of samples required by odeN, using for example statistical learning theory concepts, such as pseudodimensions or Rademacher averages.

ACKNOWLEDGMENTS

This work was supported, in part, by MIUR of Italy, under PRIN Project n. 20174LF3T8 AHeAD, and grant L. 232 (Dipartimenti di Eccellenza), and by the U. of Padova project “SID 2020: RATED-X”.


REFERENCES

[1] Paolo Bajardi, Alain Barrat, Fabrizio Natale, Lara Savini, and Vittoria Colizza. 2011. Dynamical Patterns of Cattle Trade Movements. PLoS ONE 6, 5 (may 2011), e19869. https://doi.org/10.1371/journal.pone.0019869

[2] V. Batagelj and M. Zaversnik. 2003. An O(m) Algorithm for Cores Decomposition of Networks. Advances in Data Analysis and Classification, 2011. Volume 5, Number 2, 129-145 (Oct. 2003). arXiv:cs.DS/cs/0310049

[3] Jeffrey Baumes, Mark K. Goldberg, Mukkai S. Krishnamoorthy, Malik Magdon-Ismail, and Nathan Preston. 2005. Finding communities by clustering a graph into overlapping subgraphs. In AC 2005, Proceedings of the IADIS International Conference on Applied Computing, Algarve, Portugal, February 22-25, 2005, Volume 1, Nuno Guimarães and Pedro T. Isaías (Eds.). IADIS, 97–104.

[4] Caleb Belth, Xinyi Zheng, and Danai Koutra. 2020. Mining Persistent Activity in Continually Evolving Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM. https://doi.org/10.1145/3394486.3403136

[5] George Bennett. 1962. Probability Inequalities for the Sum of Independent Random Variables. J. Amer. Statist. Assoc. 57, 297 (mar 1962), 33–45. https://doi.org/10.1080/01621459.1962.10482149

[6] Hanjo D Boekhout, Walter A Kosters, and Frank W Takes. 2019. Efficiently counting complex multilayer temporal motifs in large-scale networks. Computational Social Networks 6, 1 (2019), 1–34.

[7] Marco Bressan, Stefano Leucci, and Alessandro Panconesi. 2019. Motivo. Proceedings of the VLDB Endowment 12, 11 (jul 2019), 1651–1663. https://doi.org/10.14778/3342263.3342640

[8] Matteo Ceccarello, Carlo Fantozzi, Andrea Pietracaprina, Geppino Pucci, and Fabio Vandin. 2017. Clustering uncertain graphs. Proceedings of the VLDB Endowment 11, 4 (dec 2017), 472–484. https://doi.org/10.1145/3186728.3164143

[9] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. 2011. Friendship and mobility.In Proceedings of the 17th ACM SIGKDD international conference on Knowledgediscovery and data mining - KDD '11. ACM Press. https://doi.org/10.1145/2020408.2020579

[10] Ying Ding. 2011. Scientific collaboration and endorsement: Network analysisof coauthorship and citation networks. Journal of Informetrics 5, 1 (jan 2011),187–203. https://doi.org/10.1016/j.joi.2010.10.008

[11] Laetitia Gauvin, Mathieu Génois, Márton Karsai, Mikko Kivelä, Taro Takaguchi,Eugenio Valdano, and Christian L. Vestergaard. 2018. Randomized referencemodels for temporal networks. (June 2018). arXiv:physics.soc-ph/1806.04032

[12] M. Girvan and M. E. J. Newman. 2002. Community structure in social andbiological networks. Proceedings of the National Academy of Sciences 99, 12 (jun2002), 7821–7826. https://doi.org/10.1073/pnas.122653799

[13] Saket Gurukar, Sayan Ranu, and Balaraman Ravindran. 2015. COMMIT. InProceedings of the 2015 ACM SIGMOD International Conference on Management ofData. ACM. https://doi.org/10.1145/2723372.2737791

[14] Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: towardsultrafast and robust subgraph isomorphism search in large graph databases. InProceedings of the ACM SIGMOD International Conference on Management of Data,SIGMOD 2013, New York, NY, USA, June 22-27, 2013, Kenneth A. Ross, DiveshSrivastava, and Dimitris Papadias (Eds.). ACM, 337–348. https://doi.org/10.1145/2463676.2465300

[15] Petter Holme and Jari Saramäki. 2012. Temporal networks. Physics Reports 519,3 (oct 2012), 97–125. https://doi.org/10.1016/j.physrep.2012.03.001

[16] Petter Holme and Jari Saramäki (Eds.). 2019. Temporal Network Theory. Springer International Publishing. https://doi.org/10.1007/978-3-030-23495-9

[17] Y. Hulovatyy, H. Chen, and T. Milenković. 2015. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31, 12 (jun 2015), i171–i180. https://doi.org/10.1093/bioinformatics/btv227

[18] Ali Jazayeri and Christopher C Yang. 2020. Motif discovery algorithms in static and temporal networks: A survey. Journal of Complex Networks 8, 4 (aug 2020). https://doi.org/10.1093/comnet/cnaa031

[19] Alpár Jüttner and Péter Madarasi. 2018. VF2++ - An improved subgraph isomorphism algorithm. Discret. Appl. Math. 242 (2018), 69–81. https://doi.org/10.1016/j.dam.2018.02.018

[20] Chrysanthi Kosyfaki, Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2018. Flow Motifs in Interaction Networks. (Oct. 2018). arXiv:cs.SI/1810.08408

[21] Lauri Kovanen, Márton Karsai, Kimmo Kaski, János Kertész, and Jari Saramäki. 2011. Temporal motifs in time-dependent networks. Journal of Statistical Mechanics: Theory and Experiment 2011, 11 (nov 2011), P11005. https://doi.org/10.1088/1742-5468/2011/11/p11005

[22] L. Kovanen, K. Kaski, J. Kertesz, and J. Saramaki. 2013. Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences. Proceedings of the National Academy of Sciences 110, 45 (oct 2013), 18070–18075. https://doi.org/10.1073/pnas.1307941110

[23] Rohit Kumar and Toon Calders. 2018. 2SCENT. Proceedings of the VLDB Endowment 11, 11 (jul 2018), 1441–1453. https://doi.org/10.14778/3236187.3269460

[24] Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06. ACM Press. https://doi.org/10.1145/1150402.1150476

[25] Jinsoo Lee, Wook-Shin Han, Romans Kasperovics, and Jeong-Hoon Lee. 2012. An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases. Proc. VLDB Endow. 6, 2 (2012), 133–144. https://doi.org/10.14778/2535568.2448946

[26] Paul Liu, Austin R. Benson, and Moses Charikar. 2019. Sampling Methods for Counting Temporal Motifs. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM ’19). Association for Computing Machinery, New York, NY, USA, 294–302. https://doi.org/10.1145/3289600.3290988

[27] Patrick Mackey, Katherine Porterfield, Erin Fitzhenry, Sutanay Choudhury, and George Chin Jr. 2018. A Chronological Edge-Driven Approach to Temporal Subgraph Isomorphism. (Jan. 2018). arXiv:cs.DS/1801.08098

[28] Naoki Masuda and Renaud Lambiotte. 2016. A Guide to Temporal Networks. WORLD SCIENTIFIC (EUROPE). https://doi.org/10.1142/q0033

[29] Tijana Milenković and Nataša Pržulj. 2008. Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 6 (jan 2008), CIN.S680. https://doi.org/10.4137/cin.s680

[30] R. Milo. 2002. Network Motifs: Simple Building Blocks of Complex Networks. Science 298, 5594 (oct 2002), 824–827. https://doi.org/10.1126/science.298.5594.824

[31] R. Milo. 2004. Superfamilies of Evolved and Designed Networks. Science 303, 5663 (mar 2004), 1538–1542. https://doi.org/10.1126/science.1089167

[32] Mark Newman. 2010. Networks. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199206650.001.0001

[33] Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. 2009. Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60, 5 (may 2009), 911–932. https://doi.org/10.1002/asi.21015

[34] Ashwin Paranjape, Austin R Benson, and Jure Leskovec. 2017. Motifs in temporal networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 601–610.

[35] Noujan Pashanasangi and C. Seshadhri. 2019. Efficiently Counting Vertex Orbits of All 5-vertex Subgraphs, by EVOKE. CoRR abs/1911.10616 (2019). https://doi.org/10.1145/3336191.3371773 arXiv:1911.10616

[36] N. Przulj. 2007. Biological network comparison using graphlet degree distribution. Bioinformatics 23, 2 (jan 2007), e177–e183. https://doi.org/10.1093/bioinformatics/btl301

[37] Xuguang Ren and Junhu Wang. 2015. Exploiting Vertex Relationships in Speeding up Subgraph Isomorphism over Large Graphs. Proc. VLDB Endow. 8, 5 (2015), 617–628. https://doi.org/10.14778/2735479.2735493

[38] Pedro Ribeiro, Pedro Paredes, Miguel EP Silva, David Aparicio, and Fernando Silva. 2019. A survey on subgraph counting: concepts, algorithms and applications to network motifs and graphlets. arXiv preprint arXiv:1910.13011 (2019).

[39] Ryan A. Rossi, Nesreen K. Ahmed, Aldo Carranza, David Arbour, Anup Rao, Sungchul Kim, and Eunyee Koh. 2021. Heterogeneous Graphlets. ACM Transactions on Knowledge Discovery from Data 15, 1 (jan 2021), 1–43. https://doi.org/10.1145/3418773

[40] Ilie Sarpe and Fabio Vandin. 2021. PRESTO: Simple and Scalable Sampling Techniques for the Rigorous Approximation of Temporal Motif Counts. SIAM International Conference on Data Mining (2021). https://doi.org/10.1137/1.9781611976700.17

[41] Alice C. Schwarze and Mason A. Porter. 2020. Motifs for processes on networks. (July 2020). arXiv:physics.soc-ph/2007.07447

[42] Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 1 (apr 2002), 64–68. https://doi.org/10.1038/ng881

[43] Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, AISTATS 2009, Clearwater Beach, Florida, USA, April 16-18, 2009 (JMLR Proceedings), David A. Van Dyk and Max Welling (Eds.), Vol. 5. JMLR.org, 488–495. http://proceedings.mlr.press/v5/shervashidze09a.html

[44] Shixuan Sun, Xibo Sun, Yulin Che, Qiong Luo, and Bingsheng He. 2020. RapidMatch: a holistic approach to subgraph query processing. Proceedings of the VLDB Endowment 14 (2020), 176–188. https://doi.org/10.14778/3425879.3425888

[45] Kun Tu, Jian Li, Don Towsley, Dave Braines, and Liam D. Turner. 2019. gl2vec. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM. https://doi.org/10.1145/3341161.3342908

[46] Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. 2009. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM workshop on Online social networks - WOSN '09. ACM Press. https://doi.org/10.1145/1592665.1592675

[47] Jingjing Wang, Yanhao Wang, Wenjun Jiang, Yuchen Li, and Kian-Lee Tan. 2020. Efficient Sampling Algorithms for Approximate Temporal Motif Counting. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM. https://doi.org/10.1145/3340531.3411862


odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks CIKM ’21, November 1–5, 2021, Virtual Event, QLD, Australia

[48] Shuo Yu, Yufan Feng, Da Zhang, Hayat Dino Bedru, Bo Xu, and Feng Xia. 2020. Motif discovery in networks: A survey. Computer Science Review 37 (2020), 100267.

[49] Qiankun Zhao, Yuan Tian, Qi He, Nuria Oliver, Ruoming Jin, and Wang-Chien Lee. 2010. Communication motifs. In Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10. ACM Press. https://doi.org/10.1145/1871437.1871694

[50] Bo Zong, Xusheng Xiao, Zhichun Li, Zhenyu Wu, Zhiyun Qian, Xifeng Yan, Ambuj K. Singh, and Guofei Jiang. 2015. Behavior query discovery in system-generated temporal graphs. Proceedings of the VLDB Endowment 9, 4 (dec 2015), 240–251. https://doi.org/10.14778/2856318.2856320

Table 3: Notation table.

| Symbol | Description |
|---|---|
| $T = (V, E)$ | Temporal network |
| $n, m$ | Number of nodes and temporal edges of $T$ |
| $G_T$ | Undirected projected static network of $T$ |
| $M_i$, $i \in [1, \lvert\mathcal{M}(H, \ell)\rvert]$ | Motifs in $\mathcal{M}(H, \ell)$ |
| $k$ | Nodes in the motifs |
| $\ell$ | Edges of the motifs |
| $\delta$ | Duration limit of $\delta$-instances |
| $M = (\mathcal{K}, \sigma)$ | Motif as pair (multigraph, ordering) |
| $\mathcal{U}(M, \delta)$ | Set of $\delta$-instances of $M$ from $T$ |
| $C_M$ | Number of $\delta$-instances of $M$ in $T$ |
| $G_u[M]$ | Undirected graph associated to $\mathcal{K}$ |
| $\mathcal{M}(H, \ell)$ | Set of distinct motifs with $\ell$ edges s.t. it holds $G_u[M_i] \simeq H$ $\forall M_i \in \mathcal{M}(H, \ell)$ |
| $H$ | Static undirected target template |
| $V_H, E_H$ | Set of nodes and edges of the target $H$ |
| $C_{M_i}(e)$ | Number of $\delta$-instances containing $e \in G_T$ |
| $s$ | Number of samples collected by odeN |
| $X_e$ | Indicator variable denoting if $e \in G_T$ is sampled |
| $p_e, p(e)$ | Probability of sampling edge $e \in G_T$ |
| $X^j_{M_i}$ | Estimate of motif $M_i$ obtained at odeN’s $j$-th step |
| $C'_{M_i}$ | Final odeN’s estimate of $C_{M_i}$ |
| $\varepsilon, \eta$ | Quality and confidence parameters |
| $\omega$ | Number of threads in odeN parallel |

A NOTATION

The notation used throughout this work is summarized in Table 3.

B ODEN’S SUBROUTINES

B.1 FastUpdate and its Subroutines

We now discuss the FastUpdate routine, which is called in line 10 of Algorithm 1 to keep $C_{estimates}$ updated. The FastUpdate subroutine is shown in Algorithm 2. $C_{estimates}$ maintains the weighted counts of the motif sequences identified; therefore, to keep it updated we first count the $\delta$-instances of $M_i$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, within the sampled temporal network $S$, and then rescale each count appropriately. The routine has two main features: i) an efficient adaptation of the algorithm by Paranjape et al. [34], and ii) an efficient encoding, within integers, of the sequences representing the motif occurrences, which allows for fast operations (comparisons to distinguish between different motifs and fast updates to the data structures).

We now discuss how FastUpdate counts all the $\delta$-instances in $S$. First observe that we already know that $G_S \simeq H$, and that $S$ can be rewritten as $S = (((x_1, y_1), t_1), \dots, ((x_\ell, y_\ell), t_\ell))$. We first compute the set $E_{unique} = \{(x, y) : ((x, y), t) \in S\}$ and we assign to each edge in $E_{unique}$ a unique identifier (lines 4-5). Then we run an efficient implementation of the algorithm by Paranjape et al. [34], which computes through dynamic programming the counts of all the subsequences of edges $(x, y)$ s.t. $(x, y, t) \in S$ having length $\ell$ and occurring within $\delta$-time (lines 6-10). In Algorithm 3 we show our implementation of the subroutines needed to execute lines 6-10 (see the original paper [34] for full details and correctness). Intuitively, lines 6-10 of Algorithm 2 scan the input sequence $S$ linearly, maintaining in memory information about the edges within $\delta$ time from the processed one. Through such scan the algorithm updates $Map_{counts}$ to keep the counts of the sequences having at most $\ell$ edges over the set $E_{unique}$. When the loop in line 11 starts, $Map_{counts}$ contains the counts of all the length-$\ell$ subsequences of edges from $S$ over the set $E_{unique}$. We highlight that we assign to each static edge of $S$ an ID of $b$ bits. This allows us to encode each sequence of $j = 1, \dots, \ell$ edges, occurring within $\delta$ time, in an integer using $j \cdot b$ bits through bitwise operations (“<<” denotes left shift and “|” denotes bitwise or) to allow for fast updates to $Map_{counts}$.

Algorithm 2: FastUpdate

Input: δ, S, C_estimates, p(e_R), H
 1  E_unique ← {(x, y) : (x, y, t) ∈ S}
 2  Map_id ← {}, E_rev ← [], Map_counts ← {}, start ← 1
 3  id ← 0
 4  foreach e ∈ E_unique do
 5      E_rev[id] ← e, Map_id{e} ← id++
 6  foreach (x, y, t) ∈ S do
 7      while t − t_start > δ do
 8          Decrement(Map_id[(x_start, y_start)], Map_counts)
 9          start ← start + 1
10      Increment(Map_id[(x, y)], Map_counts)
11  foreach key k of length ℓ ∈ Map_counts.keys do
12      M′ ← ReconstructMotif(k, E_rev)
13      if G_u[M′] ≃ H then
14          M_i ← EncodeAndClassifyMotif(M′)
15          X_{M_i} ← Map_counts{k} / (|E_H| p(e_R))
16          X′_{M_i} ← C_estimates{M_i}
17          C_estimates{M_i} ← X′_{M_i} + X_{M_i}
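The linear scan of lines 6-10 and the bit-packed keys can be illustrated with the following minimal Python sketch. It is not the implementation used by odeN. The function name `count_delta_sequences`, the use of one dictionary per sequence length in place of a single $Map_{counts}$ with a key-length attribute, and the assumptions that $\ell \ge 2$ and that the input is sorted by distinct timestamps are ours.

```python
from collections import defaultdict


def count_delta_sequences(temporal_edges, delta, ell):
    """Count ordered length-`ell` sequences of static edges spanning at most `delta`.

    `temporal_edges` is a list of (x, y, t) tuples sorted by distinct timestamps.
    Assumes ell >= 2. Returns {tuple of static edges: count}.
    """
    unique = sorted({(x, y) for x, y, _ in temporal_edges})   # E_unique
    edge_id = {e: i for i, e in enumerate(unique)}            # Map_id
    b = max(1, (len(unique) - 1).bit_length())                # bits per edge ID
    mask = (1 << b) - 1
    # counts[j] maps a packed key of j edge IDs to its count in the window.
    counts = [defaultdict(int) for _ in range(ell + 1)]

    def increment(eid):
        # Extend longer prefixes first, so a freshly created key is not reused.
        for j in range(ell - 1, 0, -1):
            for key, c in counts[j].items():
                counts[j + 1][(key << b) | eid] += c
        counts[1][eid] += 1

    def decrement(eid):
        # Drop the sequences shorter than ell that start with the expired edge.
        counts[1][eid] -= 1
        for j in range(1, ell - 1):
            for key, c in counts[j].items():
                counts[j + 1][(eid << (j * b)) | key] -= c

    start = 0
    for x, y, t in temporal_edges:
        while t - temporal_edges[start][2] > delta:
            xs, ys, _ = temporal_edges[start]
            decrement(edge_id[(xs, ys)])
            start += 1
        increment(edge_id[(x, y)])

    result = {}
    for key, c in counts[ell].items():
        if c > 0:
            # Unpack the ell IDs, most significant (earliest edge) first.
            ids = [(key >> (b * (ell - 1 - i))) & mask for i in range(ell)]
            result[tuple(unique[i] for i in ids)] = c
    return result
```

For instance, on the input `[("u", "v", 1), ("v", "w", 2), ("u", "v", 3)]` with `delta=1` and `ell=2`, the sketch reports one occurrence each of the sequences ⟨(u,v),(v,w)⟩ and ⟨(v,w),(u,v)⟩, while ⟨(u,v),(u,v)⟩ is discarded because it spans two time units.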

To obtain the estimates of motifs $M_1, \dots, M_{|\mathcal{M}(H,\ell)|}$, for each length-$\ell$ sequence of edges identified we reconstruct in line 12 the corresponding graph, and thus the motif $M'$ that the sequence is an instance of (the multigraph is given by the edge IDs, while the ordering of the edges is given by the sequence itself). We then check if $G_u[M']$ is isomorphic to $H$ (constraint (1) from Problem 1). If so, we encode the motif in a sequence of $2b\ell$ bits that allows us to classify such motif (line 14), in order to distinguish between distinct motifs (recall we want $M_i \not\cong_\tau M_j$, $i \neq j$). The encoding is computed as follows: given $M' = \langle (x_1, y_1), \dots, (x_\ell, y_\ell) \rangle$ we assign to each node an incremental ID according to its first appearance in $M'$ and we obtain the final encoding as $\langle ID(x_1) ID(y_1) \dots ID(x_\ell) ID(y_\ell) \rangle$. It is easily seen that two motifs $M_1, M_2$ share the same encoding iff it holds $M_1 \cong_\tau M_2$ as desired, given that the motifs are directed and the definition of distinct motifs accounts for the ordering in which edges appear. We provide an example below.

Algorithm 3: Subroutines of FastUpdate

Function Increment(id, Map_counts)
 1  foreach k ∈ SortByDecLength(Map_counts.keys) do
 2      if k.length < ℓ then
 3          κ ← (k << b) | id
 4          Map_counts[κ] ← Map_counts[κ] + Map_counts[k]
 5  Map_counts[id] ← Map_counts[id] + 1

Function Decrement(id, Map_counts)
 6  Map_counts[id] ← Map_counts[id] − 1
 7  foreach k ∈ SortByIncLength(Map_counts.keys) do
 8      if k.length < ℓ − 1 then
 9          κ ← (id << (k.length · b)) | k
10          Map_counts[κ] ← Map_counts[κ] − Map_counts[k]

Example B.1. Let us consider $M_1$, $M_2$, and $M_3$ from Figure 2. Consider $\sigma_1 = \langle (y, x), (y, z), (x, z) \rangle$; then by assigning an incremental ID to each node according to its first appearance in $\sigma_1$ we get $ID(y) = 1$, $ID(x) = 2$, $ID(z) = 3$, so the final encoding of $M_1$ is $\langle 121323 \rangle$. Following a similar procedure the encoding of $M_2$ is $\langle 121323 \rangle$, while the encoding of $M_3$ is $\langle 121332 \rangle$. The encodings of $M_1$ and $M_2$ coincide while differing from the one of $M_3$, as desired.
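The first-appearance relabelling used in Example B.1 can be sketched in a few lines of Python. This is only an illustration under our own conventions: the function name `encode_motif` is hypothetical, and the code returns a tuple of node IDs rather than the packed $2b\ell$-bit integer described above.

```python
def encode_motif(edge_sequence):
    """Relabel nodes by order of first appearance and concatenate the IDs.

    `edge_sequence` is the ordered list of directed edges (x, y) of a motif.
    """
    node_id = {}
    code = []
    for x, y in edge_sequence:
        for node in (x, y):
            if node not in node_id:
                node_id[node] = len(node_id) + 1  # incremental IDs start at 1
            code.append(node_id[node])
    return tuple(code)


# The ordering sigma_1 of Example B.1.
sigma_1 = [("y", "x"), ("y", "z"), ("x", "z")]
code_1 = encode_motif(sigma_1)
```

Here `code_1` equals `(1, 2, 1, 3, 2, 3)`, matching the encoding ⟨121323⟩ of the example. Renaming the nodes of the sequence leaves the code unchanged, while reversing the direction of the last edge yields `(1, 2, 1, 3, 3, 2)`.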

After this step we update the global data structure $C_{estimates}$ by adding to each motif’s estimate its count in $S$ divided by $|E_H| p(e_R)$, where $p(e_R)$ is the probability of edge $e_R$ being sampled (lines 15-17), which we prove in Section 4.4 to be the correct weighting scheme to output an unbiased estimate.

B.2 Exact Subgraph Enumeration

In this section we briefly discuss the algorithms for subgraph enumeration that can be adapted to our Algorithm 1 (in line 5). Unfortunately, we cannot easily use the algorithms for extracting $k$-node motifs mentioned in Section 3 as is, since they do not provide the local enumeration step required by odeN.

In fact, the problem most related to the exact enumeration we require is the labelled query graph matching problem. In such setting one is provided a labelled query graph $H = (V_H, E_H, L_H)$ and a labelled graph $G = (V, E, L)$ (where labels can be colours, for example, see [25]); $L$ may be defined on either edges or vertices. The problem requires finding all the subgraphs $h' \subseteq G$ isomorphic to $H$, which could be either induced or not but must preserve the labelling properties (i.e., if $(x, y) \in E$ is mapped to $(x', y') \in H$ then $(L(x), L(y)) = (L_H(x'), L_H(y'))$). To explain how we take advantage of the algorithms developed for the problem above we need to introduce the following definitions (adapted from [35]).

Definition B.2. Let $H = (V_H, E_H)$ be an undirected graph; an automorphism is a bijection $\pi : V_H \mapsto V_H$ such that $(x, y) \in E_H$ iff $(\pi(x), \pi(y)) \in E_H$.

Definition B.3. Let $H = (V_H, E_H)$ be an undirected graph; we say that two edges $e = (x, y), e' = (x', y') \in E_H$ belong to the same edge-orbit iff there exists an automorphism that maps $e$ on $e'$.
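To make Definitions B.2 and B.3 concrete, the following Python sketch computes the edge-orbits of a small pattern by brute-force enumeration of its automorphisms. The function names are hypothetical, the pattern is assumed to be a simple graph without self-loops, and the running time is factorial in $|V_H|$, which is only acceptable because the patterns considered here have very few nodes.

```python
from itertools import permutations


def automorphisms(nodes, edges):
    """Yield every bijection pi on `nodes` that maps edges onto edges."""
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(nodes):
        pi = dict(zip(nodes, perm))
        # A bijection sending every edge to an edge also preserves non-edges,
        # because the edge set is finite.
        if all(frozenset(pi[v] for v in e) in edge_set for e in edge_set):
            yield pi


def edge_orbits(nodes, edges):
    """Return the edge-orbits as a set of frozensets of undirected edges."""
    edge_set = {frozenset(e) for e in edges}
    autos = list(automorphisms(nodes, edges))
    orbits = set()
    for e in edge_set:
        # The orbit of e is the set of its images under all automorphisms.
        orbits.add(frozenset(frozenset(pi[v] for v in e) for pi in autos))
    return orbits


triangle_orbits = edge_orbits([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
path_orbits = edge_orbits([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
```

A triangle has a single edge-orbit, whereas the path on four nodes has two: the two outer edges form one orbit and the middle edge forms the other.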

In order to adapt the algorithms for the labelled query graph matching problem we proceed in the following way:
1) colour the nodes of $G_T$ with a fixed colour (say red);
2) once $e_R \in G_T$ is sampled, colour its endpoint nodes with a different colour (say blue), and call the map from the last two points $L_{G_T}$;
3) compute the different edge-orbits of the pattern $H$ (by enumerating the automorphisms of $H$) and for each edge-orbit choose an edge, colour its endpoint nodes with the same colour assigned to $e_R$, and keep the colour on the other edges the same as $G_T$; call this map $L_H$;
4) run an algorithm for the labelled query graph matching problem with graph $G_T = (V_T, E_T, L_{G_T})$ and pattern $H = (V_H, E_H, L_H)$;
5) the desired subgraphs ($\mathcal{H}$) are the union over the different edge-orbits enumeration steps.

C IMPLEMENTATION DETAILS

In this section we provide additional implementation details, complementing the description of Section 5.1.

In our implementation, we used two main structures: first, an adjacency list (see footnote 3), which allows querying for an edge between $u, v \in V$ in $O(\log(\min(d_u, d_v)))$. Second, we used a hashmap to store, for each static directed edge, the timestamps of the temporal edges that map on that edge, leading to $O(1)$ complexity for querying the timestamps of a static edge in $G_T$. The initialization of such structures is done in $O(1)$ per processed temporal edge while loading the dataset, by knowing the number of nodes $n$. Many state-of-the-art algorithms exist for the local enumeration of motifs (e.g., [14, 37, 44]); we provide in our code a general algorithm based on VF2++ [19]. However, instead of using the general procedure described in Section B.2, in our tests we relied on a simple algorithm that locally enumerates the subgraphs isomorphic to $H$ containing an edge $e = \{x, y\}$: for triangles the algorithm runs in $O(\min(d_x, d_y) \log(n))$, while when $H$ is a square the algorithm runs in $O(\min(d_x, d_y) d_{max} \log(n))$, with $d_{max}$ the maximum degree of a node in $G_T$.
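A minimal sketch of the local enumeration for triangles is given below. It is an illustration rather than the code used in the experiments: the function name is hypothetical, and neighbourhoods are stored as Python hash sets, so membership tests take expected constant time instead of the logarithmic time of the sorted adjacency lists discussed above.

```python
def triangles_containing(adj, x, y):
    """List the triangles of an undirected graph that contain the edge {x, y}.

    `adj` maps each node to the set of its neighbours in the static graph.
    Each triangle is reported as (x, y, z), sorted by the third node z.
    """
    if y not in adj.get(x, ()):
        return []  # {x, y} is not an edge, so no triangle contains it
    # Scan the smaller neighbourhood and probe the larger one.
    small, large = (x, y) if len(adj[x]) <= len(adj[y]) else (y, x)
    thirds = [z for z in adj[small] if z != x and z != y and z in adj[large]]
    return [(x, y, z) for z in sorted(thirds)]


# Hypothetical toy graph: two triangles sharing the edge {1, 3}.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
found = triangles_containing(adj, 1, 3)
```

On this toy graph, `found` contains the two triangles `(1, 3, 2)` and `(1, 3, 4)`. The scan touches only the smaller of the two neighbourhoods, which is where the $\min(d_x, d_y)$ factor comes from.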

D PROOFS

In this section we provide the proofs not included in the main text. First we recall that $C_{M_i}(e)$ is the number of $\delta$-instances of motif $M_i$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, from $T$ whose undirected projected static network contains edge $e \in G_T$, i.e.,

$$C_{M_i}(e) = \sum_{h \subseteq G_T,\; h \simeq H \,:\, e \in h} |\mathcal{U}(h, M_i)|, \quad e \in G_T,$$

where $\mathcal{U}(h, M_i)$ is the set of $\delta$-instances of motif $M_i$ whose static projected graph is $h \subseteq G_T$. Then, based on the above, it is simple to notice that the following formula holds for each motif $M_i$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$:

$$\sum_{e \in G_T} C_{M_i}(e) = |E_H| \, C_{M_i},$$

since each $\delta$-instance is counted once for each of the $|E_H|$ edges of its static projection. This relation will be the key for proving the unbiasedness of the estimates provided by odeN, as we show next.
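The relation above, and the unbiasedness it leads to, can be checked on a toy instance with exact rational arithmetic. The data below are hypothetical: two triangles of $G_T$ sharing one edge, with made-up numbers of $\delta$-instances of a single motif, and an arbitrary sampling distribution over the edges. The function names are ours.

```python
from fractions import Fraction


def per_edge_counts(instances_per_subgraph):
    """C_M(e): sum of |U(h, M)| over the subgraphs h that contain edge e."""
    counts = {}
    for subgraph_edges, num_instances in instances_per_subgraph.items():
        for e in subgraph_edges:
            counts[e] = counts.get(e, 0) + num_instances
    return counts


def expected_single_step_estimate(edge_counts, probs, num_pattern_edges):
    """E[X] when edge e is drawn with probability p_e and X = C_M(e) / (|E_H| p_e)."""
    return sum(
        probs[e] * Fraction(edge_counts[e], num_pattern_edges) / probs[e]
        for e in edge_counts
    )


# Hypothetical toy data: subgraph (as a set of static edges) -> number of instances.
instances = {
    frozenset({("a", "b"), ("b", "c"), ("a", "c")}): 5,
    frozenset({("a", "c"), ("c", "d"), ("a", "d")}): 2,
}
# Hypothetical sampling probabilities over the five static edges (they sum to 1).
probs = {
    ("a", "b"): Fraction(1, 10),
    ("b", "c"): Fraction(2, 10),
    ("a", "c"): Fraction(4, 10),
    ("c", "d"): Fraction(1, 10),
    ("a", "d"): Fraction(2, 10),
}
edge_counts = per_edge_counts(instances)
true_count = sum(instances.values())           # C_M = 7
expectation = expected_single_step_estimate(edge_counts, probs, 3)
```

Here the per-edge counts sum to $21 = 3 \cdot 7$, and the expectation of the single-step estimate equals the true count 7 regardless of the chosen probabilities, as long as every edge has non-zero probability.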

Proof of Lemma 4.1. First let us consider the expectation of $X^j_{M_i}$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, $j = 1, \dots, s$:

$$\mathbb{E}\left[\frac{1}{|E_H|} \sum_{e \in G_T} C_{M_i}(e) \frac{X_e}{p_e}\right] = \frac{1}{|E_H|} \sum_{e \in G_T} C_{M_i}(e) \frac{\mathbb{E}[X_e]}{p_e} = C_{M_i},$$

where we used the linearity of expectation and the facts that $\mathbb{E}[X_e] = p_e$, $e \in G_T$, and $\sum_{e \in G_T} C_{M_i}(e) = |E_H| C_{M_i}$; thus $X^j_{M_i}$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, $j = 1, \dots, s$, are unbiased estimates of $C_{M_i}$. Combining such result with the definition of $C'_{M_i}$ we obtain

$$\mathbb{E}[C'_{M_i}] = \mathbb{E}\left[\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}\right] = \frac{1}{s} \sum_{j=1}^{s} \mathbb{E}[X^j_{M_i}] = \frac{s \, C_{M_i}}{s} = C_{M_i}$$

by the linearity of expectation. □

Footnote 3: We used the one provided by SNAP: https://github.com/snap-stanford/snap; more efficient implementations can also be adopted, improving the global running times.

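Proof of Lemma 4.2. We need to bound the variance of the estimate $C'_{M_i}$. First we rewrite the estimator:

$$C'_{M_i} = \frac{1}{s} \sum_{j=1}^{s} \frac{1}{|E_H|} \sum_{e \in G_T} C_{M_i}(e) \frac{X_e}{p_e} = \frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}.$$

Since the $s$ variables $X^j_{M_i}$, $j \in [1, s]$, are independent (edges are drawn independently at each iteration of the outer for loop in Algorithm 1), it holds $\mathrm{var}(C'_{M_i}) = \mathrm{var}\left(\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}\right) = \frac{1}{s} \mathrm{var}(X_{M_i})$; we thus only need to compute the variance of the variable $X_{M_i}$. Let us recall $\mathrm{var}(X_{M_i}) = \mathbb{E}[X^2_{M_i}] - \mathbb{E}[X_{M_i}]^2 = \mathbb{E}[X^2_{M_i}] - C^2_{M_i}$ by the previous lemma. We will now bound $\mathbb{E}[X^2_{M_i}]$:

$$\mathbb{E}[X^2_{M_i}] = \mathbb{E}\left[\frac{1}{|E_H|^2} \sum_{e_1 \in G_T} \sum_{e_2 \in G_T} C_{M_i}(e_1) C_{M_i}(e_2) \frac{X_{e_1} X_{e_2}}{p_{e_1} p_{e_2}}\right] = \frac{1}{|E_H|^2} \sum_{e_2 \in G_T} C^2_{M_i}(e_2) \frac{1}{p_{e_2}} \le \frac{1}{|E_H|^2} \sum_{e_2 \in G_T} C^2_{M_i}(e_2) \frac{m}{\alpha}$$

$$= \frac{m}{\alpha |E_H|^2} \sum_{e_2 \in G_T} C^2_{M_i}(e_2) \overset{(1.)}{\le} \frac{m}{\alpha |E_H|^2} |E_H| C^2_{M_i} = \frac{m \, C^2_{M_i}}{\alpha |E_H|},$$

where we used the linearity of expectation, the fact that $\mathbb{E}[X_{e_1} X_{e_2}] = p_{e_1}$ only for $e_1 = e_2$ and is 0 otherwise, and a bound on the minimum probability $p_e$, namely $p_e \ge \alpha/m$, $\forall e \in G_T$, for $\alpha$ defined as in Section 4.4. In (1.) we used the fact that $C_{M_i}(e) = \lambda_e C_{M_i}$, $e \in G_T$, $\lambda_e \in [0, 1]$; then

$$\sum_{e_2 \in G_T} C^2_{M_i}(e_2) = \sum_{e_2 \in G_T} \lambda^2_{e_2} C^2_{M_i} \le C^2_{M_i} \sum_{e_2 \in G_T} \lambda_{e_2} = |E_H| C^2_{M_i},$$

since $\lambda_{e_2} \in [0, 1]$ and further $\sum_{e \in G_T} \lambda_e = |E_H|$ by $\sum_{e \in G_T} \lambda_e C_{M_i} = |E_H| C_{M_i}$. Thus the variance of $X_{M_i}$ is bounded by

$$\mathrm{var}(X_{M_i}) \le \frac{m \, C^2_{M_i}}{\alpha |E_H|} - C^2_{M_i} = C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right).$$

Combining everything together we obtain that $\mathrm{var}(C'_{M_i}) \le \frac{C^2_{M_i}}{s} \left(\frac{m}{\alpha |E_H|} - 1\right)$, concluding the proof. □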
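As a sanity check of the bound, the sketch below computes the exact single-step variance on a hypothetical toy instance and compares it with $C^2_{M_i}\left(\frac{m}{\alpha |E_H|} - 1\right)$. Since the definition of $\alpha$ from Section 4.4 is not restated here, the sketch assumes only that $\alpha/m$ is a lower bound on every $p_e$ and uses the tightest such bound, $\min_e p_e$, in place of $\alpha/m$. All names and numbers are ours.

```python
from fractions import Fraction


def single_step_variance(edge_counts, probs, num_pattern_edges, true_count):
    """Exact var(X) when one edge e is drawn with probability p_e per step
    and X = C_M(e) / (|E_H| p_e)."""
    second_moment = sum(
        Fraction(c * c, num_pattern_edges ** 2) / probs[e]
        for e, c in edge_counts.items()
    )
    return second_moment - true_count ** 2


def variance_bound(true_count, min_prob, num_pattern_edges):
    """C_M^2 (m / (alpha |E_H|) - 1), with m / alpha replaced by 1 / min_prob."""
    return true_count ** 2 * (1 / (min_prob * num_pattern_edges) - 1)


# Hypothetical per-edge counts C_M(e) for a triangle pattern (|E_H| = 3, C_M = 7).
edge_counts = {"ab": 5, "bc": 5, "ac": 7, "cd": 2, "ad": 2}
probs = {
    "ab": Fraction(1, 10),
    "bc": Fraction(2, 10),
    "ac": Fraction(4, 10),
    "cd": Fraction(1, 10),
    "ad": Fraction(2, 10),
}
variance = single_step_variance(edge_counts, probs, 3, 7)
bound = variance_bound(7, min(probs.values()), 3)
```

On this instance the exact variance is $233/18 \approx 12.9$, well below the bound $343/3 \approx 114.3$; averaging $s$ independent steps divides both quantities by $s$.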

Proof of Theorem 4.3. Let us fix $M_i$, $i \in [1, |\mathcal{M}(H, \ell)|]$; we first show a bound on the probability $\mathbb{P}[|C'_{M_i} - C_{M_i}| \ge \varepsilon C_{M_i}]$. We want to derive such bound through the application of Bennett’s inequality to the summation $\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}$. We already know that $\mathbb{E}[X^j_{M_i}] = C_{M_i}$ and $\mathbb{E}[(X^j_{M_i} - C_{M_i})^2] \le C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right) = v_j^2$ for $j = 1, \dots, s$. Moreover, it holds:

$$X^j_{M_i} = \frac{1}{|E_H|} \sum_{e \in G_T} C_{M_i}(e) \frac{X_e}{p_e} \le \frac{1}{|E_H|} \sum_{e \in G_T} C_{M_i}(e) \frac{m}{\alpha} = \frac{m \, C_{M_i}}{\alpha |E_H|}.$$

As argued by [40], Bennett’s inequality holds even if we only have an upper bound on the variance of the estimates. Therefore let us compute the quantities needed to apply Bennett’s bound (see [40] for the statement): clearly $B = C_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)$, combining what we already showed with the unbiasedness of $X^j_{M_i}$; moreover $v \le v_j^2$, since the bound $v_j^2$ is equal for each $j \in [1, s]$. Then,

$$\frac{v_j^2}{B^2} = \frac{C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)}{C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)^2} = \frac{1}{\frac{m}{\alpha |E_H|} - 1},$$

also

$$\frac{tB}{v_j^2} = \frac{\varepsilon C_{M_i} \, C_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)}{C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)} = \varepsilon.$$

Combining everything together, by Bennett’s inequality we obtain

$$\mathbb{P}\left(\left|\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i} - C_{M_i}\right| \ge \varepsilon C_{M_i}\right) \le 2 \exp\left(-\frac{s}{\frac{m}{\alpha |E_H|} - 1} h(\varepsilon)\right). \quad (1)$$

Now, let $A_i$ = “$|C'_{M_i} - C_{M_i}| \ge \varepsilon C_{M_i}$”, $i = 1, \dots, |\mathcal{M}(H, \ell)|$; namely, $A_i$ is the event that the estimate of motif $M_i$ is distant more than $\varepsilon C_{M_i}$ from $C_{M_i}$. We already showed that for an arbitrary $A_i$ inequality (1) holds for $\mathbb{P}[A_i]$, so

$$\mathbb{P}\left(\bigcup_{i=1}^{|\mathcal{M}(H,\ell)|} A_i\right) \le \sum_{i=1}^{|\mathcal{M}(H,\ell)|} \mathbb{P}[A_i] \le |\mathcal{M}(H, \ell)| \, 2 \exp\left(-\frac{s}{\frac{m}{\alpha |E_H|} - 1} h(\varepsilon)\right) \le \eta,$$

combining the union bound and the choice of $s$ as in the statement. □
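Solving the last inequality for $s$ gives $s \ge \left(\frac{m}{\alpha |E_H|} - 1\right) \frac{1}{h(\varepsilon)} \ln\left(\frac{2 |\mathcal{M}(H, \ell)|}{\eta}\right)$. The Python sketch below evaluates this quantity. It assumes $h(x) = (1 + x) \ln(1 + x) - x$, as in the usual statement of Bennett’s inequality, and $\frac{m}{\alpha |E_H|} > 1$. The function names and the numerical values are ours, and the exact form of the sample size in the statement of Theorem 4.3 may be phrased differently.

```python
import math


def bennett_h(x):
    """h(x) = (1 + x) ln(1 + x) - x, assumed from Bennett's inequality."""
    return (1.0 + x) * math.log1p(x) - x


def failure_bound(s, m, alpha, num_pattern_edges, eps, num_motifs):
    """Right-hand side of the union bound for a given number of samples s."""
    ratio = m / (alpha * num_pattern_edges) - 1.0
    return 2.0 * num_motifs * math.exp(-s * bennett_h(eps) / ratio)


def required_samples(m, alpha, num_pattern_edges, eps, eta, num_motifs):
    """Smallest integer s for which the union bound is at most eta."""
    ratio = m / (alpha * num_pattern_edges) - 1.0
    if ratio <= 0:
        raise ValueError("the sketch assumes m / (alpha * |E_H|) > 1")
    return math.ceil(ratio * math.log(2.0 * num_motifs / eta) / bennett_h(eps))


# Hypothetical parameters: 10^6 temporal edges, alpha = 1000, triangle pattern,
# 8 motifs, eps = eta = 0.1.
s = required_samples(10**6, 1000, 3, 0.1, 0.1, 8)
```

The resulting $s$ grows linearly in $\frac{m}{\alpha |E_H|}$, only logarithmically in the number of motifs and in $1/\eta$, and roughly as $1/\varepsilon^2$ for small $\varepsilon$, since $h(\varepsilon) \approx \varepsilon^2/2$.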

E CASE STUDY - MOTIF ANALYSIS

[Figure 7 image not reproduced: eight panels, one for each of the triangle motifs $M_1, \dots, M_8$ on nodes $v_1, v_2, v_3$, each annotated with the ordering of its three edges.]

Figure 7: Graphical representation of the motifs in Figure (6a).

Motifs on the Snapshots of the Facebook Network. Thanks to our analysis of Section 5.4 we are able to characterize the user behaviour on the Facebook network of wall posts by looking at different motifs (topology and their orderings) and their counts. We first show in Fig. 7 the motifs corresponding to the labels of Figure (6a) in Section 5.4. Then, let $H = \{v_1, v_2, v_3\}$ be a triangle; the most frequent motifs (i.e., those with the highest normalized counts on each snapshot) seem to share a common pattern: a first node ($v_3$), after posting on $v_1$’s (or $v_2$’s) wall, triggers $v_1$ (or $v_2$) to post on the remaining node’s wall, with $v_1$ posting also on such node’s wall to close the triangle, as captured by motifs $M_3$, $M_7$ and $M_8$. Observe that by identifying the users that mostly act as $v_3$ in the occurrences of such frequent motifs one is able to identify, for example, the nodes more engaged in spreading most of the information over the Facebook network in a short period of time (recall that we set $\delta$ to one day). Not surprisingly, motif $M_5$ is the least frequent one, since its occurrences require node $v_2$ to post on $v_3$’s wall before receiving the post from $v_2$, therefore without being “triggered” by such node, that received the post from $v_3$. Interestingly, without considering the orderings of occurrence among such patterns we would not be able to distinguish between the most frequent motifs and the least frequent ones, since for example $M_4$ and $M_5$ have the same static directed graph structure but they have very different counts on the different snapshots of the Facebook network.

[Figure 8 image not reproduced: eight motifs on nodes 1, 2, 3, with edges labelled by the timestamps $t_1, \dots, t_6$; the four motifs with highest $Z$-scores are on the top row and the four with lowest $Z$-scores on the bottom row. The values printed next to each motif are listed below, in the order in which they appear.]

| Motif | Exact count | Relative error of odeN | $Z$-score |
|---|---|---|---|
| $M^Z_1$ | 1188894 | 7.1% | 1496494 |
| $M^Z_2$ | 1072282 | 7.1% | 1215602 |
| $M^Z_3$ | 1018769 | 7.1% | 1215069 |
| $M^Z_4$ | 1110062 | 7.1% | 1165630 |
| $M^Z_5$ | 8825 | 2.5% | 666 |
| $M^Z_6$ | 6170 | 2.4% | 800 |
| $M^Z_7$ | 5535 | 2.5% | 907 |
| $M^Z_8$ | 3890 | 2.3% | 908 |

Figure 8: Graphical representation of the 4 motifs with highest (top) and lowest (bottom) $Z$-scores in Figure (6c) for $\ell = 6$. For each motif we report the exact count (which we computed for such representation) and the relative error in the approximation obtained with odeN in brackets; we additionally report each $Z$-score of the motif as obtained from Section 5.4 (i.e., by using only odeN).

Motifs with varying ℓ - Frequent vs Infrequent. In this section we briefly discuss the properties and show visually the motifs with highest and lowest $Z$-scores obtained in Section 5.4 on the Facebook wall post network for $\ell = 6$. The motifs are reported in Figure 8, where we report the top 4 motifs ranked by $Z$-score on the top and the 4 lowest motifs by $Z$-score on the bottom. Note that the top 4 motifs share a similar structure, both temporal and topological. Interestingly, in the original paper [46] the authors noted that there were very few pairs of nodes that exchanged more than 5 messages (with median 2). The most frequent temporal motifs seem to involve a pair of highly active nodes (which exchanged many messages between them, i.e., more than 4) and a third node that is reached by such pair of nodes. Unfortunately, we do not have the original posts to better understand the information captured by such frequent motifs, but it is surprising that the top 4 motifs all share similar properties, especially in the orderings of their edges. Additionally, triangles involving nodes that are pairwise very active seem to be the rarest type of interaction, as captured by the 4 motifs with lowest $Z$-score, reported in Figure 8 (bottom).