Triadic data analysis in temporal and higher-order networks
Austin Benson · Cornell University
DynaMo@Networks 2021
The humble triangle is fundamental in network science.
2
The Strength of Weak Ties, Granovetter, 1973.
Collective dynamics of ‘small-world’ networks, Watts & Strogatz, 1998.
3
The Structure of Positive Interpersonal Relations in Small Groups, 1967.
James Davis and Samuel Leinhardt analyzing triangles to test a sociological theory of George Homans using data from Theodore Newcomb.
4
Network Motifs: Simple Building Blocks of Complex Networks, Milo et al., 2002.
Structure and function of the feed-forward loop network motif, Mangan & Alon, 2003.
The Coherent Feedforward Loop Serves as a Sign-sensitive Delay Element in Transcription Networks, Mangan, Zaslaver, & Alon, 2003.
5
Modern network data is rich…
• Higher-order / multi-way interactions
• Temporal information
• Multilayer, multiplex, heterogeneous, attributed
• Features / covariates
• Large-scale with millions or billions of edges
@zhangqian_rach
Triangles are super useful for this rich data!
Triadic analysis for modern network data.
6
w/ R Abebe, M Schaub, J Kleinberg, A Jadbabaie
1. Open and closed triangles in temporal, higher-order interactions.
   Simplicial closure and higher-order link prediction, PNAS, 2018.
2. Triadic motifs in temporal networks.
   Motifs in temporal networks, WSDM, 2017.
   Sampling methods for counting temporal motifs, WSDM, 2019.
Real-world systems are composed of “higher-order” interactions that we often reduce to pairwise ones.
7
Commerce: nodes are products; several products can be purchased at once.
Communications: nodes are people/accounts; emails often have several recipients, not just one.
Physical proximity: nodes are people; people gather in groups.
Cell biology: nodes are proteins; protein complexes may involve several proteins.
We collected many datasets of timestamped hyperedges
8
1. Coauthorship in different domains.
2. Emails with multiple recipients.
3. Tags on Q&A forums.
4. Threads on Q&A forums.
5. Contact/proximity measurements.
6. Musical artist collaboration.
7. Substance makeup and classification codes applied to drugs the FDA examines.
8. U.S. Congress committee memberships and bill sponsorship.
9. Combinations of drugs seen in patients in ER visits.
https://math.stackexchange.com/q/80181
bit.ly/sc-holp-data
Thinking of higher-order data as a weighted projected graph with filled-in structures is a convenient viewpoint.
9
[Figure: projected graph on nodes 1 through 9, with each edge labeled by its weight.]
Data.
t1: {1, 2, 3, 4}; t2: {1, 3, 5}; t3: {1, 6}; t4: {2, 6}; t5: {1, 7, 8}; t6: {3, 9}; t7: {5, 8}; t8: {1, 2, 6}
Projected graph W.
W_ij = # of hyperedges containing nodes i and j.
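The projection can be sketched in a few lines of Python. This is a minimal illustration built from the eight example hyperedges on this slide; the function name `projected_graph` and the choice of a `Counter` keyed by sorted node pairs are assumptions made here, not something taken from the talk.

```python
from collections import Counter
from itertools import combinations

# Example hyperedges t1..t8 from the slide, in time order.
hyperedges = [
    {1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
    {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6},
]

def projected_graph(hyperedges):
    """Map each pair (i, j), i < j, to the number of hyperedges containing both."""
    W = Counter()
    for nodes in hyperedges:
        for i, j in combinations(sorted(nodes), 2):
            W[(i, j)] += 1
    return W

W = projected_graph(hyperedges)
print(W[(1, 2)], W[(3, 9)], W[(4, 5)])
```

Pairs that never co-occur are simply absent from the counter and read back as weight 0.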
10
11
[Figure: two triangles on nodes i, j, k.]
Open triangle: each pair has been in a hyperedge together, but all 3 nodes have never been in the same hyperedge.
or
Closed triangle: there is some hyperedge that contains all 3 nodes.
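The two definitions translate directly into set operations. The sketch below is an illustration on the eight example hyperedges from the earlier slide; the function name `open_and_closed_triangles` is hypothetical, and the brute-force loop over all node triples is only sensible for tiny inputs.

```python
from itertools import combinations

# Example hyperedges t1..t8 from the talk's running example.
hyperedges = [
    {1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
    {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6},
]

def open_and_closed_triangles(hyperedges):
    """Return (open, closed) sets of sorted node triples."""
    pairs, closed = set(), set()
    for nodes in hyperedges:
        ordered = sorted(nodes)
        pairs.update(combinations(ordered, 2))   # pairs that co-occur somewhere
        closed.update(combinations(ordered, 3))  # triples inside one hyperedge
    all_nodes = sorted({v for h in hyperedges for v in h})
    open_ = {
        t for t in combinations(all_nodes, 3)
        if t not in closed and all(p in pairs for p in combinations(t, 2))
    }
    return open_, closed

open_tris, closed_tris = open_and_closed_triangles(hyperedges)
frac_open = len(open_tris) / (len(open_tris) + len(closed_tris))
```

On this toy data the only open triangle is {1, 5, 8}: each pair appears together in some hyperedge (t2, t5, t7), but no hyperedge contains all three.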
What’s more common in empirical data?
[Figure: fraction of triangles that are open (y-axis, 0 to 1) vs. edge density in the projected graph (x-axis, log scale, 10^-5 to 10^-1), one point per dataset: music-rap-genius, NDC-substances, NDC-classes, DAWN, coauth-DBLP, coauth-MAG-geology, coauth-MAG-history, congress-bills, congress-committees, tags-stack-overflow, tags-math-sx, tags-ask-ubuntu, email-Eu, email-Enron, threads-stack-overflow, threads-math-sx, threads-ask-ubuntu, contact-high-school, contact-primary-school.]
There is lots of variation in the fraction of triangles that are open, but datasets from the same domain are similar.
12
See also Topological analysis of data, Patania, Vaccarino, & Petri, 2017.
Dataset domain separation also occurs at the local level.
13
• Randomly sample 100 egonets per dataset and measure log of average degree and fraction of open triangles.
• Logistic regression model to predict domain (coauthorship, tags, threads, email, contact).
• 75% model accuracy vs. 21% with random guessing.
Triangles close over time.
14
t1: {1, 2, 3, 4}; t2: {1, 3, 5}; t3: {1, 6}; t4: {2, 6}; t5: {1, 7, 8}; t6: {3, 9}; t7: {5, 8}; t8: {1, 2, 6}
15
Substances in marketed drugs recorded in the National Drug Code directory.
Bin weighted edges into “weak” and “strong” ties in the projected graph W.
W_ij = # of simplices containing nodes i and j.
• Weak ties. W_ij = 1 (exactly one hyperedge contains i and j)
• Strong ties. W_ij ≥ 2 (at least two hyperedges contain i and j)
Weak and strong ties are useful characterizations.
Closure depends on structure in projected graph.
16
• First 80% of the data (in time) ⟶ record configurations of triplets not in a closed triangle.
• Remainder of data ⟶ find the fraction that are now closed triangles.
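The split-and-count procedure above can be sketched as follows. This is a rough illustration, not the code behind the talk's plots: the function names are hypothetical, a configuration is encoded here as the sorted tuple of tie types on the three pairs (0 = no edge, 1 = weak, 2 = strong, with strong assumed to mean two or more shared hyperedges), and the loop over all triples is only feasible for toy data.

```python
from collections import Counter
from itertools import combinations

def pair_weights(hyperedges):
    W = Counter()
    for nodes in hyperedges:
        for i, j in combinations(sorted(nodes), 2):
            W[(i, j)] += 1
    return W

def closed_triples(hyperedges):
    closed = set()
    for nodes in hyperedges:
        closed.update(combinations(sorted(nodes), 3))
    return closed

def tie(w):
    # 0 = no edge, 1 = weak tie, 2 = strong tie (assumed: weight >= 2).
    return min(w, 2)

def closure_probabilities(timed_hyperedges, train_frac=0.8):
    """Estimate P(closure) per configuration from a temporal train/test split."""
    ordered = [h for _, h in sorted(timed_hyperedges, key=lambda x: x[0])]
    split = int(train_frac * len(ordered))
    train, test = ordered[:split], ordered[split:]
    W = pair_weights(train)
    closed_before, closed_after = closed_triples(train), closed_triples(test)
    nodes = sorted({v for h in train for v in h})
    seen, closed = Counter(), Counter()
    for i, j, k in combinations(nodes, 3):
        if (i, j, k) in closed_before:
            continue  # only triplets not yet in a closed triangle
        config = tuple(sorted(tie(W[p]) for p in ((i, j), (i, k), (j, k))))
        seen[config] += 1
        closed[config] += (i, j, k) in closed_after
    return {c: closed[c] / seen[c] for c in seen}

data = [(1, {1, 2, 3, 4}), (2, {1, 3, 5}), (3, {1, 6}), (4, {2, 6}),
        (5, {1, 7, 8}), (6, {3, 9}), (7, {5, 8}), (8, {1, 2, 6})]
probs = closure_probabilities(data)
```

On the toy data the first six hyperedges form the training portion; the single open triangle {1, 2, 6} (three weak ties) is closed by t8 in the remainder.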
Increased edge density increases closure probability.
Increased tie strength increases closure probability.
Tension between edge density and tie strength.
[Plots: closure probability by triplet configuration, one panel per observation above.]
We used this for a new higher-order link prediction task.
17
t1: {1, 2, 3, 4}; t2: {1, 3, 5}; t3: {1, 6}; t4: {2, 6}; t5: {1, 7, 8}; t6: {3, 9}; t7: {5, 8}; t8: {1, 2, 6}
Data.
• Observe simplices up to time t.
• Predict which groups of > 2 nodes will appear after time t.
We predict structure that graph models would not even consider!
18
[Figure: open triangle on nodes i, j, k with edge weights W_ij, W_jk, W_ik. Will it close?]
$\mathrm{score}_p(i, j, k) = \left(W_{ij}^p + W_{jk}^p + W_{ik}^p\right)^{1/p}$
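A small sketch of how such a score could be used to rank candidate triangles. The candidate triples and their weights below are made up for illustration, and the function name `score_p` simply mirrors the slide's notation; p must be nonzero.

```python
def score_p(w_ij, w_jk, w_ik, p):
    """Score of an open triangle from its three projected-graph weights (p != 0)."""
    return (w_ij ** p + w_jk ** p + w_ik ** p) ** (1.0 / p)

# Hypothetical open triangles mapped to their three edge weights.
candidates = {
    ("a", "b", "c"): (1, 1, 1),
    ("a", "b", "d"): (3, 1, 2),
    ("b", "c", "d"): (2, 2, 2),
}

def rank(candidates, p):
    """Candidates sorted from most to least likely to close under score_p."""
    return sorted(candidates, key=lambda t: score_p(*candidates[t], p), reverse=True)

ranked_harmonic = rank(candidates, p=-1)
```

With p = 1 the score is the plain sum of the weights; negative p puts more emphasis on the weakest of the three ties, so (b, c, d) outranks (a, b, d) at p = -1 even though both sum to 6.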
Finding the top-k weighted triangles in large graphs required new algorithms.
19
$\mathrm{score}_p(i, j, k) = \left(W_{ij}^p + W_{jk}^p + W_{ik}^p\right)^{1/p}$
[Figure: triangle on nodes i, j, k with edge weights W_ij, W_jk, W_ik.]
w/ R Kumar, P Liu, M Charikar
Retrieving Top Weighted Triangles in Graphs, WSDM, 2020.
Finding top 1000 triangles
dataset        # nodes   # edges   time (existing)   time (ours)
reddit-reply   8.4M      435M      1.1 hours         5 seconds
spotify        3.6M      1.9B      > 24 hours        31 seconds
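For context on what the problem asks, here is a brute-force baseline that enumerates every triangle and keeps the k heaviest. It is explicitly not the algorithm from the WSDM 2020 paper, only a naive reference point; the function name, the edge-dictionary input format, and the example weights are assumptions made for this sketch.

```python
import heapq
from collections import defaultdict

def top_k_triangles(weights, k, p=1.0):
    """Naive top-k: weights maps (u, v) to a positive edge weight; p != 0."""
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w

    def scored():
        for (u, v), w_uv in weights.items():
            u, v = min(u, v), max(u, v)
            # Common neighbors larger than v, so each triangle is emitted once.
            for x in adj[u].keys() & adj[v].keys():
                if x > v:
                    s = (w_uv ** p + adj[u][x] ** p + adj[v][x] ** p) ** (1.0 / p)
                    yield s, (u, v, x)

    return heapq.nlargest(k, scored())

# Hypothetical weighted graph on four nodes.
weights = {(1, 2): 5, (1, 3): 4, (2, 3): 3, (2, 4): 1, (3, 4): 1, (1, 4): 2}
top = top_k_triangles(weights, k=2)
```

This touches every triangle in the graph, which is the cost a faster method has to avoid on inputs with hundreds of millions of edges.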
Triadic analysis for modern network data.
20
1. Open and closed triangles in temporal, higher-order interactions.
   Simplicial closure and higher-order link prediction, PNAS, 2018.
2. Triadic motifs in temporal networks.
   Motifs in temporal networks, WSDM, 2017.
   Sampling methods for counting temporal motifs, WSDM, 2019.
w/ A Paranjape, J Leskovec, P Liu, M Charikar
Temporal network data is extremely common.
21
Private communication: e-mail, phone calls, text messages, instant messages
Public communication: Q&A forums, Facebook walls, Wikipedia edits
Payment systems: credit card transactions, cryptocurrencies, Venmo
Technical infrastructure: packets over the Internet, messages over supercomputer
22
source   destination   timestamp
a        d             14s
c        a             15s
a        c             17s
a        b             25s
a        c             28s
a        c             30s
c        d             31s
c        a             32s
a        c             35s
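Temporal network motif
1. Directed multigraph with k edges
2. Edge ordering
3. Maximum time span δ
[Figure: example motif with edges labeled 1, 2, 3 and δ = 10s.]
Motif instance: k temporal edges that match the pattern and all occur within δ time.
[Figure: instance on nodes a, b, c with edges at 25s, 28s, 32s.]
[Figure: non-instance on nodes a, d, c with edges at 14s, 17s, 15s.] Wrong order! (c, a) before (a, c).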
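The definition can be checked by brute force on the nine example edges. In the sketch below the motif shape (edge 1: u→v, edge 2: u→w, edge 3: w→u) is inferred from the instance and non-instance shown on the slide and should be read as an assumption; the function names are hypothetical, and the cubic enumeration is for illustration only, unrelated to the fast counting algorithms mentioned in the talk.

```python
from itertools import combinations

# (source, destination, timestamp in seconds) from the example table.
edges = [
    ("a", "d", 14), ("c", "a", 15), ("a", "c", 17),
    ("a", "b", 25), ("a", "c", 28), ("a", "c", 30),
    ("c", "d", 31), ("c", "a", 32), ("a", "c", 35),
]

# Assumed motif: ordered directed edges over placeholder nodes u, v, w.
MOTIF = [("u", "v"), ("u", "w"), ("w", "u")]

def matches(motif, instance):
    """True if the time-ordered edges map onto the motif one-to-one on nodes."""
    fwd, back = {}, {}
    for (mu, mv), (src, dst, _) in zip(motif, instance):
        for m, x in ((mu, src), (mv, dst)):
            if fwd.setdefault(m, x) != x or back.setdefault(x, m) != m:
                return False
    return True

def motif_instances(edges, motif, delta):
    """All edge tuples that follow the motif's order within a span of delta."""
    ordered = sorted(edges, key=lambda e: e[2])
    found = []
    for inst in combinations(ordered, len(motif)):
        times = [t for _, _, t in inst]
        increasing = all(s < t for s, t in zip(times, times[1:]))
        if increasing and times[-1] - times[0] <= delta and matches(motif, inst):
            found.append(inst)
    return found

instances = motif_instances(edges, MOTIF, delta=10)
```

Under these assumptions the example data contains three instances, including the one at 25s, 28s, 32s; the triple at 14s, 17s, 15s is rejected because (c, a) occurs before (a, c).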
See also Temporal Network Motifs: Models, Limitations, Evaluation, Liu, Guarrasi, & Sarıyüce, 2021.
We developed a model for temporal motifs.
Motifs in Temporal Networks, WSDM, 2017.
We also developed fast counting algorithms.
23
[Figure: grid of the temporal motifs M1,1 through M6,6, each drawn with its edges labeled 1, 2, 3 in time order.]
It takes ~2.5 hours to count all instances of these motifs in a 2B edge phone call network (single threaded).
Cyclic triangles are much more frequent in payment networks than in social networks.
24
Sampling algorithms let us go even faster for large datasets and more complicated motifs.
25
δ = 1 day, 16 threads
running time (seconds)
dataset          # temporal edges   exact   sampling   par. sampling   error
EquinixChicago   345M               481.2   45.50      5.666           1.3%
RedditComments   636M               X       6739       2262            –
26
THANKS!
Austin Benson
http://cs.cornell.edu/~arb
Triadic data analysis in temporal and higher-order networks
Supported by ARO MURI, ARO Award W911NF19-1-0057, NSF Award DMS-1830274, and JP Morgan Chase & Co.
Lots of data available at https://www.cs.cornell.edu/~arb/data/
Santa Fe, NM