TRANSCRIPT
10/19/2015
BIG DATA SCIENTIFIC AND COMMERCIAL APPLICATIONS (ITNPD4)
LECTURE: COMPLEX NETWORKS
Gabriela Ochoa
http://www.cs.stir.ac.uk/~goc/
OUTLINE
Networks and Complexity
Social networks
Definition
Motivation
History
Networks in general
Representation and types
Properties and metrics
Software packages
Summary & What’s next?
Gabriela Ochoa, goc@cs.stir.ac.uk
NETWORKS AND GRAPHS
Network ≡ Graph
Graph refers to the mathematical abstraction, while
network to the real world instantiation
Graphs are just collections of “points” joined by “lines”
Points      Lines              Field
vertices    edges, arcs        Math
nodes       links              Computer Science
sites       bonds              Physics
actors      ties, relations    Sociology
WHAT IS A SOCIAL NETWORK?
A social network is a collection of people,
each of whom is acquainted with some
subset of the others
Represented as a set of points (or vertices)
denoting people, joined in pairs by lines (or
edges) denoting acquaintance.
One could, in principle, construct the social
network for a company or firm, for a school
or university, or for any other community up
to and including the entire world.
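The points-and-lines description above can be sketched in code. This is a minimal illustration, assuming an adjacency-set representation; the people, the ties and the helper name `build_network` are hypothetical and not part of the lecture.

```python
from collections import defaultdict

def build_network(acquaintances):
    """Return an adjacency map: person -> set of people they know."""
    network = defaultdict(set)
    for a, b in acquaintances:
        network[a].add(b)
        network[b].add(a)  # acquaintance is mutual, so the edge is undirected
    return dict(network)

# Hypothetical people and ties, for illustration only.
ties = [("Ana", "Ben"), ("Ana", "Cat"), ("Ben", "Cat"), ("Cat", "Dan")]
network = build_network(ties)

num_people = len(network)
# Each tie is stored twice (once per endpoint), so halve the total.
num_ties = sum(len(friends) for friends in network.values()) // 2
```

Here each vertex is a person and each edge an acquaintance; the same structure would work for a firm, a school or any other community.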
WHAT IS A COMPUTER NETWORK?
A collection of computing devices
connected in order to communicate
and share resources
Connections between computing devices can be
Physical: wires or cables
Wireless: radio waves or infrared signals
Resources
• Computers
• Data and data storage
• Printing
• Authentication
MANY OTHER COMPLEX NETWORKS …
Technological networks
Internet, WWW, Telephone, Railway, Airlines, …
Biological networks
Neural networks, metabolic networks, food web, …
Social networks
Friendships, scientific collaborations, …
Demos and links
Facebook social network graph
Mapping the Human ‘Diseasome’
Gallery of network images
COMPLEX SYSTEMS
Complex
[adj., v. kuhm-pleks, kom-pleks; n. kom-pleks]
–adjective
1. composed of many interconnected parts; compound; composite: a complex highway system.
2. characterized by a very complicated or involved arrangement of parts, units, etc.: complex machinery.
3. so complicated or intricate as to be hard to understand or deal with: a complex problem.
Source: Dictionary.com
Complexity
Complexity, a scientific theory which asserts that some systems display behavioral phenomena that are completely inexplicable by any conventional analysis of the systems’ constituent parts. These phenomena, commonly referred to as emergent behaviour, seem to occur in many complex systems involving living organisms, such as a stock market or the human brain.
Source: John L. Casti, Encyclopædia Britannica
Network Science: Introduction January 10, 2011
THE ROLE OF NETWORKS
Behind each complex system there is an
intricate wiring diagram, or a network, that
defines the interactions between the
components.
We will never understand complex systems
unless we map out and understand the
networks behind them.
RESOURCES
Books
Network Science Book Project by Albert-László Barabási et al.
Networks: An Introduction, M. E. J. Newman, Oxford University Press, Oxford (2010)
More books
Articles
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45(2):167–256
Newman, M. E. J. (2001). The structure of scientific collaboration networks.
A SOCIAL NETWORK
555 scientists and their co-authorships
Lothar Krempel, MPI für Gesellschaftsforschung,
Lothringerstr.78, 50677 Köln, Germany
AN EXAMPLE OF A SMALL COAUTHORSHIP NETWORK
Collaborations among scientists at a private research institution. Nodes in the network represent scientists, and a line between two of them indicates they coauthored a paper during the period of study. This particular network appears to divide into a number of subcommunities, as indicated by the shapes of the nodes, and these subcommunities correspond roughly to topics of research.
Newman M E J PNAS 2004;101:5200-5205
Potterat J J et al. Sex Transm Infect 2002;78:i159-i163
HIV/AIDS NETWORK
Largest connected component, early period (1980s), Colorado Springs (n = 250).
The stereotypic member was a white gay man nearly 30 years old who associated with injecting drug users.
Node labels:
G: Gay man
F: Female
M: Heterosexual man
+: HIV positive
−: HIV negative
?: Unknown HIV status
N: injecting drug using needles
WHY STUDY SOCIAL NETWORKS?
Inherent interest in the patterns of human
interaction
Their structure has important implications for
the spread of information and disease.
For example, varying the average number of
acquaintances individuals have (the average degree)
might substantially influence the propagation of
a rumour, a fashion, a joke, or this year’s flu.
Understanding influence and public opinion
formation
HISTORY OF SOCIAL NETWORK ANALYSIS
Two original sources
1. Graph theory: Euler (1735) solved the Seven Bridges of Königsberg problem
2. Sociometry: a quantitative method for measuring social relationships. Developed by J. L. Moreno (1930s) in his studies of the relationship between social structures and psychological well-being.
Social network analysis
• Started by H. White and R. Merton, Columbia University (Harvard revolution), 1970s
• Essential idea: people’s actions have to be related to their attributes, but to really understand them you also need to look at the networks that enable them to do something
SEMINAL PAPERS, MODELS OF COMPLEX NETWORKS
Small-world networks (Watts & Strogatz, Nature, 1998). Cited by 27,237
• Neither ordered nor completely random
• Nodes are highly clustered yet path length between them is small
Scale-free networks (Barabási & Albert, Science, 1999). Cited by 23,580
• The degree distribution is “right-skewed” with a heavy tail
• Most nodes have less-than-average degree, whilst a small fraction of hubs have a large number of connections
• Described mathematically by a power-law
TYPES OF NETWORKS
(a) unweighted, undirected
(b) discrete vertex and edge types, undirected
(c) varying vertex and edge weights, undirected
(d) directed (directed edges are also called arcs)
From (Newman, 2003)
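The four network types (a) to (d) can each be stored with small variations of one data structure. The sketch below is one possible encoding, assuming Python dictionaries and sets; the vertex names, type labels, weights and helper functions are hypothetical.

```python
# (a) Unweighted, undirected: every edge is stored in both directions.
undirected = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

# (b) Discrete vertex and edge types: keep the labels in side tables.
vertex_type = {"A": "person", "B": "person", "C": "company"}
edge_type = {
    frozenset(("A", "B")): "friend",
    frozenset(("A", "C")): "employee",
}

# (c) Weighted: map each neighbour to an edge weight; vertex weights aside.
weighted = {"A": {"B": 2.5, "C": 1.0}, "B": {"A": 2.5}, "C": {"A": 1.0}}
vertex_weight = {"A": 3.0, "B": 1.0, "C": 2.0}

# (d) Directed: an arc A -> B is stored only under its source vertex.
directed = {"A": {"B"}, "B": {"C"}, "C": set()}

def is_symmetric(adj):
    """True if every stored edge u -> v also appears as v -> u."""
    return all(u in adj[v] for u in adj for v in adj[u])

def in_degree(adj, v):
    """Number of arcs pointing at v in a directed adjacency map."""
    return sum(1 for u in adj if v in adj[u])
```

An undirected network is then simply one whose adjacency map is symmetric, which `is_symmetric` checks.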
GLOSSARY
Degree: The number of edges connected to a vertex. A directed graph has both an in-degree and an out-degree for each vertex, which are the numbers of in-coming and out-going edges respectively.
Component: The component to which a vertex belongs is the set of vertices that can be reached from it by paths running along edges of the graph.
Geodesic path: A geodesic path is the shortest path through the network from one vertex to another. Note that there may be, and often is, more than one geodesic path between two vertices.
Diameter: The diameter of a network is the length (in number of edges) of the longest geodesic path between any two vertices.
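The glossary terms map directly onto a breadth-first search. The following is a minimal sketch for an undirected, unweighted graph stored as adjacency sets. The function names and the example graph are hypothetical, and the diameter here ignores pairs of vertices in different components (an assumption, since the definition above does not say how to treat them).

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic distance (in edges) from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree(adj, v):
    """Number of edges connected to v."""
    return len(adj[v])

def component(adj, v):
    """Set of vertices reachable from v, including v itself."""
    return set(bfs_distances(adj, v))

def diameter(adj):
    """Longest geodesic distance over all connected pairs of vertices."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical graph: a path a-b-c-d plus a separate edge e-f.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}
```

In this example the vertex "a" belongs to the component {a, b, c, d}, and the longest geodesic path runs from "a" to "d".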
MAIN GLOBAL PROPERTIES OF NETWORKS
Topology (Degree distribution)
• Gives an idea of the spread in the number of links the nodes have
• P(k) is the probability that a randomly selected node has k links
Distance
• Number of links that make up the path between two points
• “Geodesic” = shortest path
Cohesion or Clustering
• Cliques in social network analysis
• Circles of friends in which every member knows each other
• Example: "6-degrees" of distance phenomenon
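The degree distribution P(k) listed under Topology can be estimated by counting. A minimal sketch, assuming an adjacency-set graph; the function name and the star-shaped example are hypothetical.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of nodes that have exactly k links."""
    n = len(adj)
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return {k: count / n for k, count in sorted(counts.items())}

# Hypothetical star network: one hub linked to three leaves.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
p = degree_distribution(star)
```

For the star, three of the four nodes have one link and one node has three, so P(1) = 0.75 and P(3) = 0.25.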
DIAMETER AND SHORTEST PATH
Small world: it is often claimed that only about 6 hops separate any two people in the world
How do we measure this property in a network?
Let $d_{ij}$ be the shortest-path distance between nodes i and j
Diameter (longest shortest-path distance)
Average shortest-path distance
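In symbols, the two quantities are commonly defined as follows. This assumes a connected network with n nodes, where n and the symbols D and the mean distance are notation introduced here for illustration.

```latex
\[
D = \max_{i,j} d_{ij},
\qquad
\bar{d} = \frac{1}{n(n-1)} \sum_{i \neq j} d_{ij}
\]
```

For a path of four nodes a-b-c-d, the ordered-pair distances sum to 20 over n(n-1) = 12 pairs. The average shortest-path distance is therefore 20/12 ≈ 1.67, while the diameter is 3.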
CLUSTERING COEFFICIENT OR TRANSITIVITY
In social networks: a friend of a friend is also frequently a friend
How do we measure this property?
To check whether “the friend of a friend is also frequently a friend”, we use:
The transitivity or clustering coefficient, which basically measures the probability that two of my friends are also friends
Metrics
• Global CC
• Local CC
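Both metrics can be computed by counting pairs of neighbours. The sketch below assumes that the local CC of a node is the fraction of its neighbour pairs that are linked, and that the global CC is the transitivity (closed triples divided by all connected triples). Some texts instead define the global value as the average of the local ones. The function names and the example graph are hypothetical.

```python
from itertools import combinations

def local_cc(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # no neighbour pairs to test
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def global_cc(adj):
    """Transitivity: closed triples / all connected triples."""
    closed = 0
    triples = 0
    for v in adj:
        # Every pair of neighbours of v forms a triple centred on v.
        for a, b in combinations(adj[v], 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical friendship graph: a triangle A-B-C with D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
```

In this graph A's two friends know each other (local CC of 1), while only one of the three pairs of C's friends is linked (local CC of 1/3). Of the five connected triples, three are closed, so the transitivity is 0.6.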
DEGREE DISTRIBUTION: RANDOM VS. SCALE-FREE NETWORKS
Linked: The New Science of Networks, by Albert-László Barabási, Perseus Publishing, April 2002, hardcover, 229 pages
• Random network: like a national highway network (nodes: cities, links: highways)
• Scale-free network: like an air traffic system (nodes: airports, links: flights)
SCALE-FREE NETWORKS
The degree distribution of many real-world networks approximately follows a power-law distribution:
$f_k = c\,k^{-\alpha}$
where α is a parameter whose value is typically in the range 2 < α < 3
“Heavy-tail” distribution: implies the existence of hubs (nodes with very high degree)
Called scale-free because power laws have the same functional form at all scales. A power law remains unchanged (other than by a multiplicative factor) when the independent variable k is rescaled
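The rescaling property can be checked in one line. Treat the distribution as a function f(k) = c k^{-α} of a continuous k, and let a be an arbitrary positive scale factor (a symbol introduced here for illustration):

```latex
\[
f(ak) = c\,(ak)^{-\alpha} = a^{-\alpha}\,c\,k^{-\alpha} = a^{-\alpha} f(k)
\]
```

Rescaling k by a only multiplies the distribution by the constant a^{-α}, so its shape is the same at every scale. For instance, with α = 2.5, doubling k multiplies the frequency by 2^{-2.5} ≈ 0.18, whatever the starting value of k.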
Examples:
• World Wide Web links
• Biological networks
• Social networks
PREFERENTIAL ATTACHMENT
“Rich get richer” dynamics
The more someone has, the more she is likely to get
Examples
the more friends you have, the easier it is to make new ones
the more business a firm has, the easier it is to win more
the more people there are at a restaurant, the more people want to go there
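The “rich get richer” dynamics can be simulated with a simple growth process. The sketch below is a simplification in the spirit of the Barabási-Albert model and not the lecture's own code. It assumes each new node adds exactly one link, choosing its target with probability proportional to the target's current degree. The function name, seed and network size are arbitrary choices.

```python
import random

def preferential_attachment(n, seed=0):
    """Grow a network to n nodes and return {node: degree}."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}   # start from a single edge between nodes 0 and 1
    # Each node appears in this list once per edge end, so a uniform draw
    # from it picks a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        degree[new] = 1
        degree[target] += 1
        endpoints.extend([new, target])
    return degree

degree = preferential_attachment(1000)
hubs = sorted(degree, key=degree.get, reverse=True)[:5]  # best-connected nodes
```

Nodes that join early and happen to gain links become ever more likely to gain further links, so a few hubs emerge while most nodes keep a degree of one or two.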
WHAT DO REAL NETWORKS LOOK LIKE?
A number of models have been proposed to study complex networks
Real networks exhibit:
Small diameter: also present in the Erdős-Rényi or random model
High clustering coefficient: also present in the Watts-Strogatz model
Power-law degree distribution: also present in the Barabási-Albert or preferential attachment model
CENTRALITY
Centrality measures a node’s importance relative to the other nodes
A central node is important and/or powerful
A central node has an influential position in the network
A central node has an advantageous position in the network
CENTRALITY MEASURES
There are various centrality measures. The key ones are:
1. degree – This counts how many people are connected to you.
2. closeness – If you are close to everyone, you have a high closeness score.
3. betweenness – People who connect people who are otherwise separate. If information goes through you, you have a high betweenness score.
4. eigenvector – A person who is popular with the popular kids has high eigenvector centrality. Google’s PageRank is a variant of this idea.
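The first two measures are simple enough to sketch directly. The code below assumes a connected, undirected graph stored as adjacency sets and uses common normalisations: degree divided by n - 1, and closeness as n - 1 divided by the sum of distances. Betweenness and eigenvector centrality are omitted for brevity, and the function names and example path are hypothetical.

```python
from collections import deque

def distances_from(adj, source):
    """Shortest-path distance (in edges) from source to each vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality(adj):
    """(n - 1) / total distance to all others; assumes a connected graph."""
    n = len(adj)
    scores = {}
    for v in adj:
        total = sum(distances_from(adj, v).values())
        scores[v] = (n - 1) / total if total else 0.0
    return scores

# Hypothetical chain of five people: a-b-c-d-e.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
deg = degree_centrality(path)
close = closeness_centrality(path)
```

The middle person "c" is closest to everyone else and so scores highest on closeness, while the two end nodes score lowest on both measures.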
SOFTWARE PACKAGES
NodeXL, a plugin for Excel
NetworkX for Python
igraph for both Python and R
statnet for R
Jure Leskovec at Stanford: book and network package (SNAP) for C++
SUMMARY
Many real-world complex systems can be represented as a network
Networks capture the connectivity pattern
Real-world networks have several common structural characteristics
Network metrics (distance, topology, cohesion, centrality)
What is next?
Seminar this week, Friday 23 Oct, by Dr Gabriela Ochoa
Seminar Friday 13 Nov by Dr Paweł Widera, Newcastle University