networks all around us: extracting networks from your problem domain
TRANSCRIPT
METIS MEETUP
Networks All Around Us: Analyzing Networks in your Problem Domain | 3/3/2016
Russell Jurney
http://bit.ly/socialnetworkanalysis2
BACKGROUND
Serial Entrepreneur
Contributed code to Apache Druid, Apache Pig, Apache DataFu, Apache Whirr, Azkaban, MongoDB
Apache Committer
Three-time O'Reilly Author
Started & Shipped Product at E8 Security
Ning, LinkedIn, Hortonworks veteran
FOUNDER NETWORKS
node = company
edge = employment transition
as in people who worked at one startup, then founded another
PROPERTY GRAPHS IN YOUR DOMAIN
identify entities
identify relationships
specify schema (or not)
populate graph database
learn to think in graph walks (hard)
query in batch
query in realtime
POPULATING A PROPERTY GRAPH
// Add nodes
while((json = company_reader.readLine()) != null) {
  document = jsonSlurper.parseText(json)
  v = graph.addVertex('company')
  v.property("_id", document._id)
  v.property("domain", document.domain)
  v.property("name", document.name)
}
POPULATING A PROPERTY GRAPH
// Get a graph traverser
g = graph.traversal()

while((json = links_reader.readLine()) != null) {
  document = jsonSlurper.parseText(json)

  // Add edges to graph
  v1 = g.V().has('domain', document.home_domain).next()
  v2 = g.V().has('domain', document.link_domain).next()
  v1.addEdge(document.type, v2)
}
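The two loading loops above can also be sketched without a graph database. This is a minimal plain-Python illustration, not the speaker's code: it reuses the field names from the slides (`_id`, `domain`, `name`, `home_domain`, `link_domain`, `type`), but the sample records and the dict-based storage are assumptions made for the example.

```python
import json
from collections import defaultdict

# Hypothetical sample records, one JSON document per line, using the
# field names that appear in the slides.
company_lines = [
    '{"_id": 1, "domain": "a.com", "name": "A"}',
    '{"_id": 2, "domain": "b.com", "name": "B"}',
]
link_lines = [
    '{"home_domain": "a.com", "link_domain": "b.com", "type": "partnership"}',
    '{"home_domain": "a.com", "link_domain": "missing.com", "type": "partnership"}',
]

vertices = {}              # domain -> property dict
edges = defaultdict(list)  # domain -> list of (edge label, target domain)

# Add nodes, keyed by domain so the edge loop can look them up.
for line in company_lines:
    doc = json.loads(line)
    vertices[doc["domain"]] = {"label": "company", **doc}

# Add edges, skipping links whose endpoints were never loaded.
for line in link_lines:
    doc = json.loads(line)
    src, dst = doc["home_domain"], doc["link_domain"]
    if src in vertices and dst in vertices:
        edges[src].append((doc["type"], dst))
```

Keying vertices by `domain` mirrors the `has('domain', ...)` lookup in the Gremlin version. The membership check is an addition here so that a link to an unknown domain is skipped rather than raising an error.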
EXPORT LINKS AS JSON
final Graph g = TinkerFactory.createClassic();
try (final OutputStream os = new FileOutputStream("jsondump/links.json")) {
  GraphSONWriter.build().create().writeGraph(os, g);
}
THEN USE SNA LIBRARIES
#
# Example - calculate friendship dispersion
#
import networkx as nx

di_graph = nx.DiGraph()

# util is a helper module that is not shown in these slides
all_edges = util.json_cr_file_2_array('jsondump/links.json')

for edge in all_edges:
    if 'type' in edge and edge['type'] == 'partnership':
        di_graph.add_edge(edge['domain1'], edge['domain2'])

dispersion = nx.dispersion(di_graph)
TOOLS OF SNA
SNA = Social Network Analysis
centrality
clustering
block models
cores
dispersion
center-pieces
CENTRALITY
Centrality is a way of measuring how central or important a particular node is in a social network.
OR
What nodes should I care about?
SINGLE-RELATIONAL CENTRALITY(S)
// all-links-the-same-type-centrality
g.V().out().groupCount()

// things-humans-walk-centrality
g.V().hasLabel('human').out('walks').groupCount()

// things-dogs-eat-centrality
g.V().hasLabel('dog').out('eats').groupCount()
MULTI-RELATIONAL CENTRALITY(S)
// things-eaten-by-things-humans-walk-centrality
g.V().hasLabel('human').out('walks').out('eats').groupCount()

// things-hated-by-things-humans-pet-centrality
g.V().hasLabel('human').out('pets').out('hates').groupCount()

// things-that-pet-things-that-eat-mice-centrality
g.V().in('eats').in('pets').groupCount()
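The single- and multi-relational walks above all follow one pattern: start from vertices with a given label, follow a sequence of edge labels, and count where the walks end. The sketch below imitates that pattern in plain Python. The toy vertices, edges, and the `walk_counts` helper are hypothetical and are not part of Gremlin.

```python
from collections import Counter

# Hypothetical toy graph: vertex -> label, plus (source, edge label, target) triples.
labels = {"alice": "human", "bob": "human", "rex": "dog", "tom": "cat",
          "kibble": "food", "mouse": "animal"}
edges = [
    ("alice", "walks", "rex"),
    ("bob", "walks", "rex"),
    ("alice", "pets", "tom"),
    ("rex", "eats", "kibble"),
    ("tom", "eats", "mouse"),
]

def walk_counts(start_label, edge_labels):
    """Count how many walks end at each vertex, in the spirit of groupCount()."""
    frontier = [v for v, lab in labels.items() if lab == start_label]
    for edge_label in edge_labels:
        # Keep duplicates: each list entry stands for one walk in progress.
        frontier = [dst for v in frontier
                    for (src, lab, dst) in edges
                    if src == v and lab == edge_label]
    return Counter(frontier)

walked = walk_counts("human", ["walks"])                   # single-relational
eaten_by_walked = walk_counts("human", ["walks", "eats"])  # multi-relational
```

Here `rex` is reached by two walks because two humans walk him, and that count carries through to whatever he eats. That carry-through is what makes the two-hop count a centrality rather than a simple set of reachable vertices.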
DEGREE CENTRALITY
in-degree centrality is nice… it works even if you’re missing a node’s outbound links
DEGREE CENTRALITY
# computation
count connections
…it's that simple
in-degree centrality = popularity
out-degree centrality = gregariousness
# meaning
risk of catching cold
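Counting connections can be sketched in a few lines of Python. The edge list is a hypothetical example. Note that in-degree is computed only from the target side of each edge, which is why it still works when a node's own outbound links are missing.

```python
from collections import Counter

# Hypothetical directed edges as (source, target) pairs.
edges = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d")]

in_degree = Counter(dst for _, dst in edges)   # popularity
out_degree = Counter(src for src, _ in edges)  # gregariousness
```

A `Counter` returns 0 for nodes it has never seen, so `in_degree["a"]` is 0 even though `a` never appears as a target.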
CLOSENESS CENTRALITY
# computation
count hops of all shortest paths
distance from all other nodes
reciprocal of farness
# meaning
communication efficiency
spread of information
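A minimal sketch of the computation above, assuming an undirected, unweighted graph stored as an adjacency dict: breadth-first search gives the hop count to every reachable node, farness is the sum of those hop counts, and closeness is its reciprocal. The path graph and the `closeness` function name are made up for the example, and no normalization is applied.

```python
from collections import deque

# Hypothetical undirected path graph: a - b - c - d
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

def closeness(graph, source):
    """Reciprocal of farness: 1 / (sum of hop counts to every reachable node)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:  # first visit is the shortest path in BFS
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    farness = sum(dist.values())
    return 1.0 / farness if farness else 0.0

scores = {v: closeness(adjacency, v) for v in adjacency}
```

The middle nodes `b` and `c` score higher than the end nodes, matching the intuition that information spreads fastest from the center of the path.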
CLOSENESS CENTRALITY IN GREMLIN
closenessCentrality = g.V().as("a").
  repeat(both('relationship_type').simplePath()).emit().as("b").
  dedup().by(select("a", "b")).
  path().
  group().by(limit(local, 1)).
          by(count(local).map {1/it.get()}.sum())
BETWEENNESS CENTRALITY
# computation
count of times node appears in shortest paths…
…between all pairs of nodes
# meaning
control of communication between other nodes
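One common way to count appearances on shortest paths is Brandes' algorithm. It is swapped in here as an illustration, since the slides do not name a method. The sketch assumes an undirected, unweighted graph as an adjacency dict; the path graph and the `betweenness` function name are hypothetical.

```python
from collections import deque

# Hypothetical undirected path graph: a - b - c - d
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, recording shortest-path counts and predecessors.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

scores = betweenness(adjacency)
```

On the path graph, `b` sits on the shortest paths for the pairs (a, c) and (a, d), so it scores 2. The end nodes sit between no one and score 0.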
EIGENVECTOR CENTRALITY
# computation
counts connections of connected nodes
more connected neighbors matter more
# meaning
influence of one node on others
PageRank is an eigenvector centrality
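The idea that better-connected neighbors matter more can be sketched with power iteration: each node repeatedly takes the sum of its neighbors' scores, and the scores are rescaled after every round. This is a minimal sketch for an undirected graph and is not PageRank itself (there is no damping factor). The graph, the fixed iteration count, and the function name are assumptions for the example.

```python
# Hypothetical undirected graph: a triangle a-b-c with a pendant node d on a.
adjacency = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

def eigenvector_centrality(graph, iterations=100):
    """Power iteration on the adjacency structure, L2-normalized each round."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # A node's new score is the sum of its neighbors' current scores.
        new = {v: sum(score[n] for n in graph[v]) for v in graph}
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {v: x / norm for v, x in new.items()}
    return score

scores = eigenvector_centrality(adjacency)
```

The example graph contains a triangle on purpose: on a bipartite graph such as a simple path, plain power iteration can oscillate instead of settling, so a real implementation would need a convergence check or a shift.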
EIGENVECTOR CENTRALITY IN GREMLIN
g.V().
  repeat(out('relationship_type').groupCount('m').by('unique_key')).
  times(n).
  cap('m')
CLUSTERING
property based clustering: k-means
graph based clustering: modularity
property graph based clustering: CESNA
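For the graph based case, modularity scores how much denser the links inside each cluster are than chance would predict. The sketch below only evaluates Newman's modularity Q for a partition that is already given. It does not search for the best partition, and the example graph and function name are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2
    """
    m = len(edges)
    degree = {}
    internal = {}  # community -> number of edges with both ends inside it
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    degree_sum = {}  # community -> total degree of its members
    for node, d in degree.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + d
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical graph: two triangles joined by a single bridge edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
```

Splitting at the bridge gives a clearly positive Q, while lumping everything into one cluster gives 0. That difference is the signal a modularity-based clustering algorithm tries to maximize.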
DISPERSION
Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook
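As a rough sketch of the measure that paper introduces: the dispersion of a link u-v counts pairs of their mutual friends who are not themselves linked and who share no other contacts inside u's network besides u and v. The code below computes that raw, unnormalized count on a hypothetical graph. The function is an illustration written for this transcript, not the paper's or any library's implementation.

```python
from itertools import combinations

# Hypothetical undirected friendship graph as an adjacency dict.
friends = {
    "u": ["v", "s", "t", "w"],
    "v": ["u", "s", "t", "w"],
    "s": ["u", "v", "t"],
    "t": ["u", "v", "s"],
    "w": ["u", "v"],
}

def raw_dispersion(graph, u, v):
    """Count pairs of mutual friends of u and v that are 'far apart'."""
    u_nbrs = set(graph[u])
    common = u_nbrs & set(graph[v])
    total = 0
    for s, t in combinations(sorted(common), 2):
        # Contacts of s inside u's network, ignoring u and v themselves.
        s_nbrs = (u_nbrs & set(graph[s])) - {u, v}
        # Far apart: not directly linked and no shared contact in u's network.
        if t not in s_nbrs and s_nbrs.isdisjoint(graph[t]):
            total += 1
    return total

uv_dispersion = raw_dispersion(friends, "u", "v")
```

In the example, `s` and `t` know each other, so that pair does not count, while `w` is disconnected from both, so the pairs (s, w) and (t, w) do. A high count means the two people share friends from otherwise separate circles.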
CENTER-PIECE SUBGRAPHS
*Slide stolen from Tong, Faloutsos, Pan
Russell Jurney, CEO
[email protected]
twitter.com/rjurney
404-317-3620
http://bit.ly/socialnetworkanalysis2