networkx & gephi tutorial #pydata nyc
DESCRIPTION
Slide deck from my presentation at NYC's #Pydata 2012 conference - http://nyc2012.pydata.org/abstracts/#gephi
Talk abstract: Are you interested in working with social data to map out communities and connections between friends, fans and followers? In this session I'll show ways in which we use the Python networkx library along with the open source Gephi visualization tool to make sense of social network data. We'll take a few examples from Twitter, look at how a hashtag spreads through the network, and then analyze the connections between users posting to the hashtag. We'll be constructing graphs, running stats on them and then visualizing the output.
TRANSCRIPT
Networkx & Gephi Tutorial #pydata
Gilad Lotan | @gilgul
#gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy
#palestine, #OWS, #immigration, #abortion
#republican, #dems, #economics, #amnesty
#Debates / Ohio
Politicos
OSU Students
Ohio based Media
• Node network properties
– from immediate connections
    • indegree: how many directed edges (arcs) are incident on a node
    • outdegree: how many directed edges (arcs) originate at a node
    • degree (in or out): number of edges incident on a node
– from the entire graph
    • centrality (betweenness, closeness)
outdegree=2
indegree=3
degree=5
Source: Lada Adamic (SI508-F08)
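The degree figures above can be reproduced in networkx. This is a minimal sketch on a made-up directed graph (the node names are hypothetical and not from the talk), built so that node "x" has the same indegree, outdegree and degree as the example.

```python
import networkx as nx

# Hypothetical directed graph: three arcs point at "x", two arcs leave it.
G = nx.DiGraph()
G.add_edges_from([
    ("a", "x"), ("b", "x"), ("c", "x"),  # arcs incident on x
    ("x", "d"), ("x", "e"),              # arcs originating at x
])

indegree = G.in_degree("x")    # number of arcs pointing at x
outdegree = G.out_degree("x")  # number of arcs leaving x
degree = G.degree("x")         # indegree + outdegree for a DiGraph

print(indegree, outdegree, degree)
```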
Example Graph Types
• Complete Graph
• Bipartite Graph
– Vertices can be divided into two disjoint sets
– Ex: students & schools
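Both graph types are easy to build in networkx. The sketch below uses invented student and school names purely for illustration; the `bipartite=0/1` node attribute is a networkx convention for marking the two sets, not something taken from the slides.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Complete graph: every pair of the 5 nodes is connected.
K5 = nx.complete_graph(5)

# Bipartite graph: students on one side, schools on the other (hypothetical data).
students = ["ana", "ben", "cho"]
schools = ["NYU", "OSU"]
B = nx.Graph()
B.add_nodes_from(students, bipartite=0)
B.add_nodes_from(schools, bipartite=1)
B.add_edges_from([("ana", "NYU"), ("ben", "NYU"), ("ben", "OSU"), ("cho", "OSU")])

is_bip = nx.is_bipartite(B)  # True when no edge joins two nodes of the same set
student_side, school_side = bipartite.sets(B, top_nodes=students)

print(K5.number_of_edges(), is_bip, sorted(school_side))
```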
Social Network Attributes
• Scale Free
– Degree distribution follows a power law
– Barabasi et al ('99): mapped the topology of a portion of the web
• Small World
– Most nodes are not neighbors, but can be reached by a small number of hops
– Watts & Strogatz ('98)
– Properties: cliques, sub networks with high clustering coefficient, most pairs of nodes connected by at least one short path
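To get a feel for these two attributes, networkx ships generators for both models. The sketch below is illustrative only: the sizes, the parameters `m`, `k`, `p` and the seed are arbitrary choices, not values from the talk.

```python
import networkx as nx

# Scale-free: Barabasi-Albert preferential attachment produces a few large hubs.
ba = nx.barabasi_albert_graph(n=500, m=2, seed=42)
degrees = [d for _, d in ba.degree()]
max_degree = max(degrees)
mean_degree = sum(degrees) / len(degrees)

# Small world: Watts-Strogatz ring lattice with a few rewired edges
# (the "connected" variant retries until the graph is connected).
ws = nx.connected_watts_strogatz_graph(n=500, k=6, p=0.05, seed=42)
ws_clustering = nx.average_clustering(ws)
ws_path = nx.average_shortest_path_length(ws)

ba_clustering = nx.average_clustering(ba)

print("BA max vs mean degree:", max_degree, round(mean_degree, 2))
print("WS clustering and avg path length:", round(ws_clustering, 3), round(ws_path, 2))
```

In a run like this the Barabasi-Albert graph should show a maximum degree far above the mean, while the Watts-Strogatz graph should combine high clustering with short average paths.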
(Zachary) Karate club graph
Social network of friendships between 34 members of a karate club at a US university in the 1970s.
Standard test network for clustering algorithms: during the observation period the club broke up into two separate clubs over a conflict.
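The karate club graph is bundled with networkx, so it can be loaded without any data files. A minimal sketch; the "club" node attribute is how networkx records which faction each member ended up in.

```python
from collections import Counter

import networkx as nx

G = nx.karate_club_graph()

# Each node carries a "club" attribute naming the faction it joined after the split.
factions = Counter(nx.get_node_attributes(G, "club").values())

print(G.number_of_nodes(), "members,", G.number_of_edges(), "friendships")
print(dict(factions))
```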
Graph Measures
• Centrality
– Betweenness
– Closeness
– Eigenvector
– Degree
• Clustering Coefficient (clique)
• Modularity
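Each of these measures has a networkx counterpart. The sketch below runs them on the bundled karate club graph. The community detection step uses networkx's greedy modularity method as a stand-in (an assumption for illustration; Gephi's modularity tool uses a different algorithm, so the partitions need not match).

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# Centrality: each call returns a dict mapping node -> score.
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)
eigenvector = nx.eigenvector_centrality(G)
degree = nx.degree_centrality(G)

# Clustering coefficient, averaged over all nodes.
avg_clustering = nx.average_clustering(G)

# Modularity of a partition found by greedy modularity maximization.
# weight=None scores the partition on the unweighted graph.
parts = community.greedy_modularity_communities(G)
Q = community.modularity(G, parts, weight=None)

top_between = max(betweenness, key=betweenness.get)
top_degree = max(degree, key=degree.get)
print("highest betweenness:", top_between, "highest degree:", top_degree)
print("communities:", len(parts), "modularity:", round(Q, 3))
```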
Graph Layout
• Open Ord
– Better distinguishes clusters
• Yifan Hu
• Force Atlas
• Fruchterman Reingold
– Graph as a system of mass particles (nodes: particles, edges: springs)
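These layouts are run inside Gephi, but the Fruchterman Reingold idea can also be tried directly in Python: networkx's `spring_layout` is a Fruchterman-Reingold force-directed implementation. A rough sketch (the seed value is arbitrary, and this is networkx's implementation rather than Gephi's):

```python
import networkx as nx

G = nx.karate_club_graph()

# spring_layout treats edges as springs pulling nodes together while
# all nodes repel each other; it returns {node: array([x, y])}.
pos = nx.spring_layout(G, seed=7)

for node in list(G.nodes)[:3]:
    x, y = pos[node]
    print(node, round(float(x), 3), round(float(y), 3))
```

The resulting `pos` dict can be passed to `nx.draw(G, pos)` if matplotlib is installed.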
Networkx
Graph Generators
Generate Twitter Graph
graphml file
nodes
edges
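The flow from tweets to a Gephi-readable file can be sketched as follows. Everything in the data is hypothetical: the user names, the follower counts, and the choice of "who mentions whom" as the edge definition are assumptions for illustration, not the talk's actual pipeline. The file is written to a temporary directory.

```python
import os
import tempfile

import networkx as nx

# Hypothetical (author, mentioned_user) pairs extracted from tweets.
mentions = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("dave", "alice"), ("alice", "bob"),
]
# Hypothetical per-user metadata to attach as a node attribute.
followers = {"alice": 120, "bob": 45, "carol": 300, "dave": 10}

G = nx.DiGraph()
for src, dst in mentions:
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1  # repeated mention: strengthen the edge
    else:
        G.add_edge(src, dst, weight=1)

for user in G.nodes:
    G.nodes[user]["followers"] = followers.get(user, 0)

# GraphML stores nodes, edges and their attributes; Gephi can open it directly.
path = os.path.join(tempfile.mkdtemp(), "twitter.graphml")
nx.write_graphml(G, path)

# Read it back to confirm the round trip.
H = nx.read_graphml(path)
print(H.number_of_nodes(), "nodes,", H.number_of_edges(), "edges")
```

Node and edge attributes written this way show up as columns in Gephi's Data Laboratory, where they can drive node size or color.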
Twitter Users with Python in their Bios
• 2 days of Twitter data (Oct 24th and 25th)
• Total: 4246 users (62k tweets)
• @mikanyan1 tweeted 795 times
Pythonistas on Twitter
English / European
Japanese
Python(the snake)
Chinese
Spanish Speakers
Musicians, Artists
Twitter User Community: Data Science
• Grepped from Twitter bios over 1 week: "data science|data scientist|machine learning|data strateg"
• 1053 Users
• 14k Tweets
• Most tweeting users:
– @data_nerd (659)
– @Chantel_Esworth (562)
– @Da5_12 (253)
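The bio filter and the per-user tweet counts could look roughly like this in Python. The pattern is the one quoted above; the tweet records, the field names `user` and `bio`, and the case-insensitive matching are assumptions made for the sketch.

```python
import re
from collections import Counter

# Pattern from the slide; IGNORECASE is an assumption.
BIO_PATTERN = re.compile(
    r"data science|data scientist|machine learning|data strateg",
    re.IGNORECASE,
)

# Hypothetical tweet records, one dict per tweet.
tweets = [
    {"user": "u1", "bio": "Data Scientist at a startup"},
    {"user": "u1", "bio": "Data Scientist at a startup"},
    {"user": "u2", "bio": "I like hiking and coffee"},
    {"user": "u3", "bio": "Machine learning researcher, data strategy"},
]

# Keep only tweets whose author's bio matches the pattern.
matching = [t for t in tweets if BIO_PATTERN.search(t["bio"])]

# Count tweets per matching user; most_common() sorts by count, descending.
tweet_counts = Counter(t["user"] for t in matching)
print(tweet_counts.most_common())
```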
Dataists on Twitter
Thank You
Gilad Lotan
Twitter: @gilgul
Github: giladlotan