Introduction to Social Network Analysis with R
Kayleigh Bohemier and Breanne Chryst
April 22, 2016
Outline
Introduction
  Introduction to Network Data
  Data Format and Gathering
Basic Social Network Analysis (SNA) in R
  Soft Intro to R
  Network Visualizations
  Network Statistics
Try it out
Introduction
Vertices/Nodes and Edges
- Wikipedia defines a social network as "a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors."
- In network parlance, the social actors (people, groups, nations) are called nodes or vertices, and the ties between them are called edges.
Directed and Undirected
- Directed graphs have edges that originate at one node and end at another. An example is Twitter: if I follow you, there is an edge originating from me and ending at you, but you may not follow me back.
- Undirected graphs have reciprocal edges. An example is Facebook: if I am your friend, you are also my friend.
- Whether to treat a network as directed or undirected is an important decision that can have major implications for the analysis, so it deserves careful thought.
Adjacency Matrices
- The data for a social network can be organized as a matrix, with a non-zero value in the (i, j)th entry if the ith node shares an edge with the jth node.
- If the network is undirected, this matrix is symmetric.
- If the edges have weights, the non-zero entries are the edge weights.
- If the edges are not weighted, the non-zero entries are simply one.
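
The rules above can be sketched in a few lines. This is a minimal illustration in plain Python rather than R, and the node names, ties, and the helper names `adjacency_matrix` and `is_symmetric` are all hypothetical, chosen only for the example.

```python
# Hypothetical toy network: (source, target, weight). Unweighted ties use 1.
nodes = ["Ann", "Bob", "Cat", "Dan"]
edges = [("Ann", "Bob", 1), ("Ann", "Cat", 1), ("Bob", "Cat", 1), ("Cat", "Dan", 1)]

def adjacency_matrix(nodes, edges, directed=False):
    """Return a list-of-lists matrix A where A[i][j] is the weight of edge i -> j."""
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for source, target, weight in edges:
        i, j = index[source], index[target]
        matrix[i][j] = weight
        if not directed:
            matrix[j][i] = weight  # an undirected tie is mirrored across the diagonal
    return matrix

def is_symmetric(matrix):
    """Check whether A[i][j] equals A[j][i] for every pair of indices."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))

undirected = adjacency_matrix(nodes, edges)
directed = adjacency_matrix(nodes, edges, directed=True)
```

Here `is_symmetric(undirected)` holds, while the directed version is not symmetric because each tie is recorded in one direction only.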
Edgelists
- Data can also be stored as an edgelist, which lists each pair of nodes that have a tie between them in the graph.
- If the network is directed, the first entry is the start of the edge and the second is the end of the edge (i.e., there is an arrow connecting the two nodes, pointing toward the second).
- For undirected networks, there is only one entry per connection and the order doesn't matter.
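
To make the directed versus undirected reading of an edgelist concrete, here is a minimal sketch in plain Python rather than R. The names in `edgelist` and the helper `neighbors_from_edgelist` are hypothetical.

```python
# Hypothetical edgelist: each pair is one tie.
edgelist = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"), ("Cat", "Dan")]

def neighbors_from_edgelist(edgelist, directed=False):
    """Map each node to the set of nodes its edges lead to."""
    neighbors = {}
    for source, target in edgelist:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set())  # keep nodes that only receive ties
        if not directed:
            neighbors[target].add(source)  # order does not matter when undirected
    return neighbors

as_directed = neighbors_from_edgelist(edgelist, directed=True)
as_undirected = neighbors_from_edgelist(edgelist)
```

Read as directed, Dan receives a tie from Cat but sends none, so `as_directed["Dan"]` is empty. Read as undirected, the same entry makes Cat and Dan neighbors of each other.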
Network Graph
[Figure: network graph with nodes Bert, Ernie, Adam, Eve, Batman, Joker, Sherlock, Watson, Robin, Ivy, and Moriarty]
Publicly Available Datasets
- UC Irvine has a nicely curated list of publicly available network data at https://networkdata.ics.uci.edu/resources.php
- Stanford Large Network Dataset Collection: https://snap.stanford.edu/data/
- Of course, there are many more resources throughout the internet.
API
- An Application Programming Interface (API) allows users to interact with programs. Facebook and Twitter are two of the main sources that come to mind when thinking about social network data. You may be able to access some data from these sites through their APIs.
- Twitter: https://dev.twitter.com/overview/api
- Facebook: https://developers.facebook.com/docs/graph-api/overview
From the Field
- Gather your own social network data through observations, literature, etc.
- When gathering your own data, it is important to decide what constitutes a node and what constitutes an edge before you start collecting.
Basic Social Network Analysis (SNA) in R
Intro to R
- R is both a statistical package (like SPSS, Stata, SAS, etc.) and a programming language.
- More accurately, R is an environment within which you can do statistical programming and graphics.
- R provides an environment and a language that allow you to compute, analyze, and graph your data, and so much more.
- R is an object-oriented programming language.
- Everything you do and create in R, including statistical models, functions, graphics, and output, is an object stored in memory that can be manipulated, saved, and reused for greater efficiency and power.
Brief Intro in R
- Let's try out some R!
Reading Network Data into R
- How you read network data into R depends on the data file type.
- read.csv can be used for reading in .csv files.
- read.table can be used for reading in .txt files.
- For other file types, you may need packages like foreign.
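
As a language-neutral illustration of what reading an edgelist file involves, here is a minimal sketch using Python's standard library rather than R. The column names `from` and `to`, the inline sample data, and the helper `read_edgelist` are assumptions made for the example. In R, read.csv does the equivalent parsing.

```python
import csv
import io

# Hypothetical file contents; a real file would be opened with open(path, newline="").
sample_csv = """from,to
Ann,Bob
Ann,Cat
Bob,Cat
Cat,Dan
"""

def read_edgelist(handle, source_column="from", target_column="to"):
    """Return a list of (source, target) pairs read from a CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row[source_column], row[target_column]) for row in reader]

edges = read_edgelist(io.StringIO(sample_csv))
# The node set is everything that appears at either end of an edge.
nodes = sorted({name for edge in edges for name in edge})
```

Note that a node with no ties never appears in an edgelist, so isolated nodes would need to be listed separately.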
Local Network Statistics
- Centrality: a measure of the importance of vertices in the network.
- Common measures of centrality include degree, closeness, betweenness, and eigenvector centrality.
Degree
- The degree of a node is the number of edges attached to it (its number of alters, or social connections, in the network).
[Figure: example network with node degrees]
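
Degree can be computed by counting how often each node appears in an undirected edgelist. The sketch below is plain Python rather than R, and the edgelist and the helper name `degree` are hypothetical.

```python
from collections import Counter

# Hypothetical undirected edgelist.
edgelist = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"), ("Cat", "Dan")]

def degree(edgelist):
    """Count how many undirected edges touch each node."""
    counts = Counter()
    for a, b in edgelist:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)

node_degree = degree(edgelist)
```

In this toy network Cat has three ties and Dan has one, so Cat is the most central node by degree.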
Closeness
- Closeness measures how near a vertex is to all other vertices: it is the reciprocal of the sum of the geodesic (shortest-path) distances from that vertex to every other vertex.
[Figure: example network with node closeness values]
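
A minimal sketch of closeness, in plain Python rather than R, assuming a connected, undirected, unweighted network and the unnormalized definition (one over the sum of shortest-path distances). The toy network and function names are hypothetical.

```python
from collections import deque

# Hypothetical undirected network stored as neighbor sets.
neighbors = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat"},
}

def shortest_path_lengths(neighbors, start):
    """Breadth-first search: geodesic distance from start to each reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return distance

def closeness(neighbors, node):
    """Reciprocal of the summed distances from node to all reachable nodes."""
    total = sum(shortest_path_lengths(neighbors, node).values())
    return 1 / total if total else 0.0

closeness_by_node = {node: closeness(neighbors, node) for node in neighbors}
```

Cat is one step from everyone (distance sum 3), so it has the highest closeness. Dan is farthest out (distance sum 5) and has the lowest.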
Betweenness
- Betweenness measures the extent to which a vertex connects disparate groups, based on how many shortest paths between other vertices pass through it.
[Figure: example network with node betweenness values]
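
One way to compute betweenness is to count, for every pair of other vertices, the share of their shortest paths that run through a given vertex. The sketch below follows the approach of Brandes' algorithm for an undirected, unweighted network. It is plain Python rather than R, and the toy network and the function name `betweenness` are hypothetical.

```python
from collections import deque

# Hypothetical undirected network stored as neighbor sets.
neighbors = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat"},
}

def betweenness(neighbors):
    """Unnormalized betweenness for an undirected, unweighted network."""
    score = {node: 0.0 for node in neighbors}
    for source in neighbors:
        # Breadth-first search that also counts shortest paths from source.
        order = []
        parents = {node: [] for node in neighbors}
        paths = {node: 0 for node in neighbors}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for other in neighbors[node]:
                if other not in distance:
                    distance[other] = distance[node] + 1
                    queue.append(other)
                if distance[other] == distance[node] + 1:
                    paths[other] += paths[node]
                    parents[other].append(node)
        # Walk back from the farthest nodes, passing path credit to parents.
        credit = {node: 0.0 for node in neighbors}
        for node in reversed(order):
            for parent in parents[node]:
                credit[parent] += paths[parent] / paths[node] * (1 + credit[node])
            if node != source:
                score[node] += credit[node]
    # Each undirected pair was counted once from each end, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

betweenness_by_node = betweenness(neighbors)
```

Cat is the only route between Dan and the rest of the network, so it sits on the Ann to Dan and Bob to Dan shortest paths and scores 2. Every other node scores 0.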
Eigenvector
- Eigenvector centrality uses the first (leading) eigenvector of the adjacency matrix to create a centrality measure in which each vertex's score is a function of its neighbors' centrality.
[Figure: example network with node eigenvector centrality values]
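
The leading eigenvector can be approximated by power iteration: repeatedly replace each node's score with the sum of its neighbors' scores, then rescale. The sketch below is plain Python rather than R and assumes a connected, undirected network that is not bipartite (otherwise plain power iteration may not settle). The toy network, the function name, and the choice to scale the largest score to 1 are assumptions of the example.

```python
# Hypothetical undirected network stored as neighbor sets.
neighbors = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat"},
}

def eigenvector_centrality(neighbors, iterations=200):
    """Approximate the leading eigenvector by power iteration, scaled to a maximum of 1."""
    score = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # A node's new score is the sum of its neighbors' current scores.
        updated = {
            node: sum(score[other] for other in neighbors[node])
            for node in neighbors
        }
        largest = max(updated.values())
        score = {node: value / largest for node, value in updated.items()}
    return score

eigen_by_node = eigenvector_centrality(neighbors)
```

Dan has only one tie, but that tie is to the best-connected node, so Dan still receives a non-trivial score (roughly 0.46 when Cat is scaled to 1).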
Global Network Statistics
- Average Shortest Path: the mean length of the shortest paths between all pairs of vertices in the graph.
- Connectivity/Density: the proportion of possible edges that are realized in the graph.
- Average Degree: the mean of the degrees in the network.
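
These three statistics can be sketched as follows, in plain Python rather than R, assuming a connected, undirected, unweighted network with no self-loops. The toy network and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network stored as neighbor sets.
neighbors = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat"},
}

def distances_from(neighbors, start):
    """Breadth-first search distances from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return distance

def average_shortest_path(neighbors):
    """Mean geodesic distance over all unordered pairs (network must be connected)."""
    all_distances = {node: distances_from(neighbors, node) for node in neighbors}
    pairs = list(combinations(neighbors, 2))
    return sum(all_distances[a][b] for a, b in pairs) / len(pairs)

def density(neighbors):
    """Realized edges divided by the n * (n - 1) / 2 possible undirected edges."""
    n = len(neighbors)
    edge_count = sum(len(adjacent) for adjacent in neighbors.values()) / 2
    return edge_count / (n * (n - 1) / 2)

def average_degree(neighbors):
    """Mean number of ties per node."""
    return sum(len(adjacent) for adjacent in neighbors.values()) / len(neighbors)
```

For the toy network, 4 of the 6 possible edges exist, so the density is 4/6. The six pairwise distances sum to 8, giving an average shortest path of 8/6, and the average degree is 2.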
Global Network Statistics
- Degree Distribution: the distribution of degrees across the network, which summarizes its connectivity.
[Figure: bar charts of frequency by degree, in two panels: Degree Distribution for Kite Network and Degree Distribution for Partner Network]
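
A degree distribution is a tally of how many nodes have each degree. The sketch below is plain Python rather than R. It uses the edges of the well-known Krackhardt kite graph, on the assumption that this is the "Kite Network" in the figure; the node numbering and function name are choices made for the example.

```python
from collections import Counter

# Edges of the Krackhardt kite graph, nodes numbered 0 to 9.
# Assumption: this is the kite network the slides refer to.
kite_edges = [
    (0, 1), (0, 2), (0, 3), (0, 5), (1, 3), (1, 4), (1, 6), (2, 3), (2, 5),
    (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (5, 7), (6, 7), (7, 8), (8, 9),
]

def degree_distribution(edges):
    """Map each degree value to the number of nodes that have it."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    frequency = Counter(degree.values())
    return dict(sorted(frequency.items()))

distribution = degree_distribution(kite_edges)
```

For this graph the degrees run from 1 to 6, and the most common degree is 3, shared by three nodes.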
Global Network Statistics
- Clustering Coefficient/Transitivity: three times the number of triangles divided by the number of connected triples (two edges sharing a vertex) in the graph.
- Partitioning: separating the graph into k disjoint sets of vertices, usually based on the connectivity within partitions.
- Assortativity/Homophily: describes the tendency for "birds of a feather" to "flock together", that is, for vertices to connect to similar vertices.
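
Transitivity can be sketched by visiting every connected triple (a node with two of its neighbors) and checking whether those two neighbors are themselves tied. This is plain Python rather than R, for an undirected network; the toy network and function name are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected network stored as neighbor sets.
neighbors = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat"},
}

def transitivity(neighbors):
    """Share of connected triples that are closed into triangles."""
    closed = 0
    triples = 0
    for node, adjacent in neighbors.items():
        # Every pair of this node's neighbors forms a triple centered here.
        for a, b in combinations(adjacent, 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1  # each triangle is counted once per corner
    return closed / triples if triples else 0.0

network_transitivity = transitivity(neighbors)
```

The toy network has one triangle (Ann, Bob, Cat), counted three times, and five connected triples in total, so the transitivity is 3/5.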
Try it out