Social Network Analysis: What It Is, How It Works, and How You Can Do It
Prof. Paul Beckman
San Francisco State University
Agenda
“Social networks”
SNA: Social network analysis
The math behind SNA
My contribution to SNA research
My SNA research
SNA software
Large dataset research example
Commercially, What Are Social Networks?
Theoretically, What Are Social Networks?
Groups of individuals
  who are often humans, but not necessarily: gorillas, dolphins, birds, etc.
  who “interact” in some setting, creating “links” between the individuals
With humans, the setting is often professional
  for example, in the workplace
  and NOT necessarily social, because social interactions are often hard to record without some computerized programmatic process, such as that used by the firms on the previous slide
Social Network Analysis
SNA: research about/using social networks
Other criteria are often needed
  Frequently the “interaction” relates to a task
For my research purposes:
  group of individuals must be precisely defined
  task must be standardized
  task must have quantifiable performance measurements
  task must be completed many times
  sub-groups must form, break up, and re-form in different configurations to complete the task
The Math Behind SNA
In math, SNA is called “graph theory”
  one branch of mathematics
Graph theory is the study of groups of
  “nodes”: points in a network
  “edges”: links between points in a network
It is also the study of:
  measures of node interaction: things you can say about an individual node
  measures of network structure: things you can say about the network as a whole
Other Graph Theory Terms
Edge weight
  Did two nodes get linked just once or were they linked more than once?
  Did you just meet me or are you currently in ISYS 464?
Directed vs. non-directed graphs
  Is an edge from one node to the other or is the edge non-directional?
  Example: at a party, you may:
    know about someone: directional
    shake hands: non-directional
  Example: “recommendation networks”: each node recommends other nodes and can be recommended
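As a rough illustration of these two terms, the sketch below stores a weighted, non-directed set of edges and a directed set of edges using only the Python standard library. The names (`meetings`, `knows_about`) and the people in them are hypothetical and are not from the talk.

```python
from collections import Counter

# Hypothetical log of interactions: each pair is one recorded meeting.
meetings = [("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Cid")]

# Non-directed and weighted: sort each pair so (Ann, Bob) and (Bob, Ann)
# are the same edge; the count of meetings is the edge weight.
undirected_weights = Counter(tuple(sorted(pair)) for pair in meetings)

# Directed: "knows about" runs one way only, so the order of each pair matters.
knows_about = {("Ann", "Bob"), ("Cid", "Ann")}

print(undirected_weights[("Ann", "Bob")])  # weight of the Ann-Bob edge
print(("Bob", "Ann") in knows_about)       # Bob does not "know about" Ann here
```

A weight of 1 corresponds to "we just met"; a larger weight corresponds to repeated contact, such as sharing a class all semester.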
Example: Measuring “Degrees of Separation”

Firm #   Board Members
1        A, D
2        B, C
3        A, C
4        C, D, E
5        F, G

[Figure: graph of board members A through G (each node = board member); graph with 2 islands]

Shortest path between each pair of board members, with the mean (x̄) for each member:

     A      B      C      D      E
A    -      2      1      1      2
B    2      -      1      2      2
C    1      1      -      1      1
D    1      2      1      -      1
E    2      2      1      1      -
x̄    1.50   1.75   1.00   1.25   1.50

Calculating connectivity
1. for each node: calculate the shortest path to each other node
2. for each node: calculate mean of all shortest paths for that node
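The two steps can be sketched in Python with a breadth-first search. This is a minimal sketch, not the method used in the talk: it assumes two board members are linked whenever they sit on the same board, and it assumes unreachable nodes (the other island) are simply left out of the mean. The function names are illustrative.

```python
from collections import deque

# Links implied by the firm table: members of the same board are connected.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("C", "D"),
         ("C", "E"), ("D", "E"), ("F", "G")]

def build_adjacency(edges):
    """Map each node to the set of nodes it is directly linked to."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def shortest_paths(adjacency, start):
    """Step 1: breadth-first search gives the shortest path length
    from `start` to every node it can reach."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def mean_separation(adjacency, node):
    """Step 2: mean of the shortest paths to all reachable other nodes."""
    distances = shortest_paths(adjacency, node)
    others = [d for n, d in distances.items() if n != node]
    return sum(others) / len(others)

adjacency = build_adjacency(edges)
means = {node: mean_separation(adjacency, node) for node in "ABCDE"}
print(means)
```

For nodes A through E this reproduces the means in the table (1.50, 1.75, 1.00, 1.25, 1.50); F and G never appear in those calculations because no path connects the two islands.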
Other Connectivity Measures
In graph theory, we say “centrality” instead of “connectivity”
There are four common measures of centrality
  Degree centrality
    simply: the number of other nodes directly connected to a node
  Betweenness centrality
    a more complex measure of “degrees of separation”
  Closeness centrality
    average of all “degrees of separation” for a node
  Eigenvector centrality
    measures “importance” of a node (the idea behind Google’s PageRank)
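Two of these measures are short enough to sketch by hand. The code below computes degree centrality as a simple neighbor count and approximates eigenvector centrality by power iteration on the A-to-E board-member graph. It is a sketch under stated assumptions: the iteration count is arbitrary, the scores are normalized to unit length, and SNA tools may scale their values differently.

```python
import math

# The connected A-to-E island from the board-member example.
edges = [("A", "C"), ("A", "D"), ("B", "C"),
         ("C", "D"), ("C", "E"), ("D", "E")]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

# Degree centrality: how many nodes each node is directly connected to.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}

def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration: a node's score is the sum of its neighbors' scores,
    re-normalized each round so the values stay bounded."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        updated = {node: sum(scores[n] for n in adjacency[node])
                   for node in adjacency}
        norm = math.sqrt(sum(value * value for value in updated.values()))
        scores = {node: value / norm for node, value in updated.items()}
    return scores

eigenvector = eigenvector_centrality(adjacency)
print(degree)
print(sorted(eigenvector, key=eigenvector.get, reverse=True))
```

Both measures rank C highest and B lowest here, which matches the intuition that C sits on the most boards and B's only link is to C.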
My Contribution to SNA Research
Move beyond simply measuring network centrality or other graph-theoretic constructs and measures
Because: which is important in the real world?
  Who is most connected?
  or
  How does connectivity relate to real-world task performance?
My Contribution Requires . . .
Addition of a standard task
  because the real world cares about task performance, NOT connectivity values
  but most SNA research focuses strictly on information flow through the network and who is “important” in network information flow
Task has quantifiable performance measures
  so we can relate (in a mathematical way) network measures to performance measures
My SNA Research
MIS researchers
  Kevin Bacon, Degrees-of-Separation, and MIS Research
VC board members
  Do a Firm’s Board Member Linkages Relate to Perceived or Actual Firm Financial Performance?
Baseball players
  More Highly-Connected Baseball Players Have Better Offensive Performance
SNA Software
Powerful (and complex) tools:
  UCINET: for very complex network calculations
  Pajek: for very large datasets
  Netdraw: for visualizing networks
Weaker (and simpler) tools:
  NodeXL
SNA Software: NodeXL
Easy-to-use tool
  http://nodexl.codeplex.com/
Runs inside Microsoft Excel as a template
NodeXL Example: Using Our Previous Dataset
NodeXL Centrality Calculations
We need data in “edgelist” format
  a standard format for entering data into SNA tools
  this is sometimes a tricky transformation
Node1 Node2
A C
A D
B C
C D
C E
D E
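The transformation from board memberships to an edgelist can be sketched as follows: every pair of members on the same board becomes one row. This is a minimal sketch; the `boards` dictionary and the `to_edgelist` name are illustrative, and the rule "same board means one link" is an assumption consistent with the rows listed above.

```python
from itertools import combinations

# Firm number -> board members, as in the earlier example.
boards = {
    1: ["A", "D"],
    2: ["B", "C"],
    3: ["A", "C"],
    4: ["C", "D", "E"],
    5: ["F", "G"],
}

def to_edgelist(boards):
    """Emit one (Node1, Node2) row per pair of members sharing a board.
    A set removes duplicates when two people share more than one board."""
    edges = set()
    for members in boards.values():
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return sorted(edges)

for node1, node2 in to_edgelist(boards):
    print(node1, node2)
```

The output contains the six rows shown in the edgelist plus an F G row from firm 5, which forms the second island. Because duplicates are dropped, this version yields binary links; counting repeats instead would give edge weights.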
Large Dataset Example
U.S. professional baseball
Players are nodes
Links occur when players play together
  as measured from team rosters
My Five Research Criteria
1. Precisely defined group of nodes?
   Yes: you are either an MLB player or not
2. Nodes interact in precisely quantifiable sub-groups?
   Yes: team rosters are defined by specific rules
3. Standard task that sub-groups perform?
   Yes: a baseball game has a specific set of (exact) rules
4. Task has quantifiable performance measures?
   Yes: both for players (BA, RBIs, etc.) and teams (wins, runs, etc.)
5. Sub-groups break up, re-form, and re-do the task again?
   Yes: rosters change from day to day and year to year
Research Methodology
1. Get the dataset
   available online
2. Calculate centrality for each NON-pitcher over some time period
   Why non-pitchers?
3. Calculate task performance
   batting average, home runs, RBIs, slugging pct.
4. Calculate correlation between centrality and individual performance
Correlation Calculation
Correlations between centrality measures and individual performance measures
“Correlation” varies from -1.00 to +1.00
  example: list 1000 players by height vs. home runs
    exactly the same order? correlation = +1.00
    exact inverse order? correlation = -1.00
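The ordering example corresponds most directly to a rank correlation, so the sketch below ranks each list and then computes the Pearson correlation of the ranks (Spearman's coefficient). The talk does not say which coefficient was used in the research, so this is only an illustration; the player data are made up, and ties are not handled.

```python
import math

def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_correlation(xs, ys):
    """Spearman-style correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical players: heights in cm and two made-up home-run lists.
heights = [170, 175, 180, 185, 190]
same_order = [5, 8, 12, 20, 31]      # taller players always hit more
inverse_order = [31, 20, 12, 8, 5]   # taller players always hit fewer

print(rank_correlation(heights, same_order))
print(rank_correlation(heights, inverse_order))
```

Lists in exactly the same order give +1.00 and lists in exactly inverse order give -1.00; real data fall somewhere in between.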
Results
BA = batting average
HR = home runs hit
RBI = runs batted in
SLG = slugging percentage
FPCT = fielding percentage

        Degree   Betweenness   Closeness   Eigenvector
BA      0.30     0.20          0.34        0.26
HR      0.48     0.46          0.39        0.42
RBI     0.47     0.47          0.38        0.38
SLG     0.37     0.25          0.37        0.35
FPCT    0.04     0.03          0.07        0.05
Conclusions
Players with higher centrality have higher individual OFFENSIVE performance measures
  but not defensive performance measures
This does NOT mean higher centrality leads to higher performance
  only that they are correlated
Limitations
Simplistic measure of a “link”
  opening day roster
  misses subsequent changes in player connection
Further simplistic measure of a “link”
  binary, not weighted, links
  misses players who play together over a long time
Only measured correlation
  not causality
  don’t know if one causes the other or if both are caused by some other factor (age, experience, etc.)
So, Today We’ve Talked About:
What social networks are
The math behind social networks
  graph theory
A free social network analysis tool you can use
  NodeXL
One particular large SNA research project
  connectivity vs. performance of MLB players
Questions?