Social network analysis (1)
LING 575, Fei Xia, 01/04/2011
Basic idea
• Build a graph
– A node represents a person
– A link represents the relation between two persons
– Question: define what kind of relation should be used
• Process the graph to answer questions such as
– what is the structure of the graph
– who is a key player in the graph
• Let’s start with paper #4 (Diesner and Carley, 2005), “Exploration of Communication Networks from the Enron Email Corpus”
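The basic idea above can be sketched in a few lines of Python. This is only an illustration: it assumes the relation is "A sent at least one email to B", which is one possible answer to the question of what relation to use, and the names and messages are made-up toy data, not taken from the Enron corpus.

```python
def build_graph(messages):
    """Build a directed graph from (sender, recipient) pairs.

    Returns a dict mapping each person to the set of people they wrote to.
    Every person appears as a key, even if they never sent anything.
    """
    graph = {}
    for sender, recipient in messages:
        graph.setdefault(sender, set())
        graph.setdefault(recipient, set())
        if sender != recipient:  # assumption: ignore self-addressed mail
            graph[sender].add(recipient)
    return graph


# Hypothetical toy data.
messages = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "ann")]
graph = build_graph(messages)
```

Other relations (for example, a link only when two people exchange mail in both directions, or links weighted by message count) would change only the body of the loop.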
(Diesner and Carley, 2005)
• Research questions:
– What are the structure and properties of the communication networks in Enron? How do these features relate to other networks?
– Who are key players or critical individuals in the system?
– How do structure and key players change over time?
Dataset
• Start with the ISI database
– 252,759 emails from 151 people
• Database refinement
– Add job position and job location info
  • there are 15 unique job titles (CEO, president, VP, etc.)
– Normalize email addresses
  • on average, each person has 1.9 email addresses
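Address normalization matters because one person with several addresses would otherwise show up as several nodes. A minimal sketch, assuming a hand-built alias table is available; the table, the addresses, and the `normalize` helper are all hypothetical and are not the procedure used in the paper:

```python
def normalize(address, alias_to_person):
    """Map a raw email address to a canonical person id.

    Unknown addresses fall back to their lowercased, trimmed form.
    """
    key = address.strip().lower()
    return alias_to_person.get(key, key)


# Hypothetical alias table: two addresses that belong to the same person.
alias_to_person = {
    "jane.doe@example.com": "jane_doe",
    "jdoe@example.com": "jane_doe",
}

same_person = (
    normalize(" JDoe@Example.com ", alias_to_person)
    == normalize("jane.doe@example.com", alias_to_person)
)
```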
Degree centrality
• Given a graph G=(V,E) with n vertices,
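• in-degree centrality: $C_{in}(v) = \deg_{in}(v) / (n-1)$, where $\deg_{in}(v)$ is the number of links pointing to v
• out-degree centrality: $C_{out}(v) = \deg_{out}(v) / (n-1)$, where $\deg_{out}(v)$ is the number of links leaving v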
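As an illustration, degree centrality can be computed directly from an adjacency dict. The sketch below assumes the common normalization in which each degree is divided by n - 1 (the largest possible degree in a graph without self-loops); the function name and the toy graph are made up for this example.

```python
def degree_centrality(graph):
    """Return (in_centrality, out_centrality) for a directed graph.

    `graph` maps each node to the set of nodes it links to, and every
    node must appear as a key. Degrees are divided by n - 1 (assumed
    normalization).
    """
    n = len(graph)
    if n < 2:
        raise ValueError("need at least two nodes to normalize by n - 1")
    in_deg = {v: 0 for v in graph}
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    out_deg = {v: len(targets) for v, targets in graph.items()}
    in_c = {v: d / (n - 1) for v, d in in_deg.items()}
    out_c = {v: d / (n - 1) for v, d in out_deg.items()}
    return in_c, out_c


# Hypothetical toy graph: ann writes to bob and cat, bob to cat, cat to ann.
toy = {"ann": {"bob", "cat"}, "bob": {"cat"}, "cat": {"ann"}}
in_c, out_c = degree_centrality(toy)
```

In this toy graph cat receives mail from both other people, so cat has the highest in-degree centrality, while ann has the highest out-degree centrality.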
Closeness centrality
• Loosely, Closeness is the inverse of the average distance in the network between the node and all other nodes.
• If every node is reachable from v: $C(v) = (n-1) / \sum_{t \neq v} d(v,t)$, where $d(v,t)$ is the shortest-path distance from v to t
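A small sketch of closeness for an unweighted graph, using breadth-first search to get the distances. It assumes the "inverse of the average distance" reading, that is (n - 1) divided by the sum of distances from v, and it returns None when some node is unreachable, which is just one way to handle that case. The function names and the toy graph are made up for this example.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path distances (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(graph, v):
    """Inverse of the average distance from v to all other nodes.

    Returns None if the graph has fewer than two nodes or if some node
    cannot be reached from v.
    """
    n = len(graph)
    dist = bfs_distances(graph, v)
    if n < 2 or len(dist) < n:
        return None
    return (n - 1) / sum(dist.values())


# Hypothetical undirected path a - b - c - d, stored with links in both directions.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
closeness_a = closeness(path, "a")  # distances 1, 2, 3
closeness_b = closeness(path, "b")  # distances 1, 1, 2
```

Node b sits nearer the middle of the path than a, so its average distance is smaller and its closeness is larger.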
Betweenness centrality
• Loosely, across all node pairs, the percentage whose shortest paths pass through v.
• sum = 0;
• For each pair of vertices (s,t), with s and t both different from v:
    compute all the shortest paths between s and t;
    determine the fraction of those shortest paths that go through v;
    sum += fraction;
• betweenness = sum / X, where X is (n-1)(n-2)/2 for an undirected graph and (n-1)(n-2) for a directed graph
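The loop above can be sketched in Python for an undirected, unweighted graph. Rather than enumerating paths, this version counts shortest paths with breadth-first search: v lies on a shortest s-t path exactly when d(s,v) + d(v,t) = d(s,t), and the number of such paths is the product of the path counts of the two halves. It is a brute-force illustration for small graphs, not an efficient algorithm; the function names and the toy graphs are made up for this example.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(graph, source):
    """BFS returning (dist, sigma): hop distance and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:            # first time w is seen
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph, v):
    """Normalized betweenness of v in an undirected graph (X = (n-1)(n-2)/2)."""
    n = len(graph)
    if n < 3:
        return 0.0
    dist_v, sigma_v = shortest_path_counts(graph, v)
    others = [u for u in graph if u != v]
    total = 0.0
    for s, t in combinations(others, 2):
        dist_s, sigma_s = shortest_path_counts(graph, s)
        if t not in dist_s or v not in dist_s:
            continue                     # s and t (or s and v) are disconnected
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total / ((n - 1) * (n - 2) / 2)


# Hypothetical toy graphs, stored with links in both directions.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
diamond = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
path_b = betweenness(path, "b")
diamond_b = betweenness(diamond, "b")
```

On the path a - b - c - d, node b lies on the only shortest path for the pairs (a,c) and (a,d) but not (c,d). In the diamond, b carries one of the two shortest a-d paths, so that pair contributes a fraction of one half.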