
Empirical Approach for Modelling Dynamic Human Contact Networks

Eiko Yoneki
http://www.cl.cam.ac.uk/~ey204
Systems Research Group
University of Cambridge Computer Laboratory

Outline

• New Communication Paradigm
  • Opportunistic Networks (proximity based communication)
• Empirical Approach to understand Network Structure
  • Human Connectivity Traces
  • Characteristics of Networks
• Social Communities
• Social-Based Forwarding Algorithms
• Communication to Epidemiology
• Towards Modelling Dynamic Human Contact Networks for Epidemiology


Wireless Epidemic

• The wireless epidemic (Nature 449, 287-288; 2007) by Jon Kleinberg: ‘Digital traffic flows not only over the wired backbone of the Internet, but also in small leaps through physical space as people pass one another on the street’
• New Communication Paradigm
  • Opportunistic Networks

EU FP6 Haggle, EU FP7 SOCIALNETS, EU FP7 RECOGNITION

Pocket Switched Networks (PSN)

Human-to-Human: Mobile Devices cover the Globe


Opportunistic Data Dissemination

• Store-Carry-Forward Paradigm
• Network holds Data
• Path existing over time
• Delay Tolerant Networks (DTN)
• Use of Mobility (e.g. Message Ferry)
• Use of Epidemic
• Power of Gossip
• Highly robust against disconnection, mobility, and node failures; simple, decentralised, and fast
• Control Flooding (e.g. Location, Count-based, Timer, History)
• Understanding Network Structure is important
• Logical Connection Topology: Backbone Structure (e.g. Social Networks – Hubs and Communities)

Human Contact Data Collection

• Robust data collection from real world
• Post-facto analysis and modelling yield insight into human interactions
• Data is useful for everything from building communication protocols to understanding disease spread

Modelling Contact Networks: Empirical Approach


Proximity Data Collection

• Sensor board (iMote), mobile phone
• Proximity detection by Bluetooth, and/or GPS
• Environmental information (e.g. in train, on road)

[Figure labels: AroundYou, FluPhone, iMote]

Proximity Detection by Bluetooth

• Bluetooth usage (e.g. Bath, UK 7.5%, San Francisco, USA 13.5% among all pedestrians in 2007)
• Scanning Interval
  • 2 mins iMote (one week battery life)
  • 5 mins phone (one day battery life)
  • or Continuous scanning by station nodes
• BT inquiry can only happen in 1.28-second intervals. 4 x 1.28 (5.12 seconds) gives >90% chance of finding device
• 5~10 m discovery range
• Phone – Bluetooth equipped in mobile phones
• Transform to Discrete Event Trace
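The last step on the slide, turning periodic Bluetooth sightings into a discrete event trace, could be sketched as follows. This is a minimal illustration, not the iMote or FluPhone code: the `(timestamp_seconds, peer_id)` input format, the function name, and the `slack` tolerance are all assumptions. Two sightings of the same peer are treated as one contact when they are no more than one scan interval (plus slack) apart.

```python
from collections import defaultdict


def sightings_to_contacts(sightings, scan_interval=120, slack=10):
    """Merge periodic sightings into (peer, start, end) contact intervals.

    sightings: iterable of (timestamp_seconds, peer_id) tuples (assumed format).
    scan_interval: seconds between scans (120 s mirrors the 2 min iMote setting).
    slack: tolerance in seconds for scan jitter (illustrative value).
    """
    by_peer = defaultdict(list)
    for ts, peer in sightings:
        by_peer[peer].append(ts)

    contacts = []
    for peer, times in by_peer.items():
        times.sort()
        start = prev = times[0]
        for ts in times[1:]:
            if ts - prev > scan_interval + slack:
                # Gap longer than one scan: the previous contact has ended.
                contacts.append((peer, start, prev))
                start = ts
            prev = ts
        contacts.append((peer, start, prev))
    # Order by start time, then peer, for a stable event trace.
    return sorted(contacts, key=lambda c: (c[1], c[0]))


if __name__ == "__main__":
    trace = [(0, "b"), (120, "b"), (240, "b"), (900, "b"), (120, "c")]
    for contact in sightings_to_contacts(trace):
        print(contact)
```

A shorter scan interval gives finer contact boundaries at the cost of battery life, which is the trade-off the scanning-interval bullets describe.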


Sensor Board or Phone or ...

• iMote needs disposable battery
  • Expensive
  • Third world experiment
• Mobile phone
  • Rechargeable
  • Additional functions (messaging, tracing)
  • Smart phone: location assist applications
• RF tag...
• Special radio based sensor (e.g. BAS)
• Provide device or software
• Combine with online information

Location Data

• Location data necessary?
  • Ethics approval gets tougher
• Use of WiFi Access Points or Cell Towers
• Use of GPS but not inside of buildings
• Infer location using various information
  • Online Data (Social Network Services, Google)
  • Use of limited location information – Post localisation


Target Population

• Provide devices to limited population or target general public
• For an epidemiology study, is ~100% coverage necessary?
• Or school as mixing centres

Experiment Parameters vs Data Quality

• Battery life vs Granularity of detection interval
• Duration of experiments
  • Day, week, month, or year?
• Data rate
• Data Storage
  • Contact/GPS data <50K per device per day (in compressed format)
  • Server data storage for receiving data from devices
  • Extend storage by larger memory card
• Collected data using different parameters or aggregated?


Data Retrieval Methods

• Retrieving collected data:
  • Tracking station
  • Online (3G, SMS)
  • Uploading via Web
  • via memory card
• Incentive for participating in experiments
• Collection cycle: real-time, day, or week?

Data Transformation for Analysis

• Transform to discrete version of contact data
• Deal with noise and missing data
  • e.g. transitive closure
• Post localisation
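One way to read "transform to discrete version of contact data" is to map contact intervals onto fixed time slots, giving one edge set per slot. The sketch below assumes a hypothetical `(node_a, node_b, start, end)` record format and a 6 minute slot; neither is specified on the slide. Treating every contact as undirected also fills in one simple kind of missing data, where A logged B but B failed to log A.

```python
from collections import defaultdict


def contacts_to_snapshots(contacts, slot=360):
    """Map contact intervals onto fixed time slots.

    contacts: iterable of (node_a, node_b, start, end) in seconds (assumed format).
    slot: slot length in seconds (360 s = 6 min, an illustrative choice).
    Returns {slot_index: set of undirected edges}, each edge a sorted tuple.
    """
    snapshots = defaultdict(set)
    for a, b, start, end in contacts:
        edge = tuple(sorted((a, b)))  # A-saw-B and B-saw-A become the same edge
        first, last = int(start // slot), int(end // slot)
        for idx in range(first, last + 1):
            snapshots[idx].add(edge)
    return dict(snapshots)


if __name__ == "__main__":
    log = [("a", "b", 0, 400), ("b", "a", 350, 370), ("c", "a", 800, 810)]
    for idx, edges in sorted(contacts_to_snapshots(log).items()):
        print(idx, sorted(edges))
```

The per-slot edge sets can then be used as time-dependent adjacency data for the analyses on later slides.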


Security and Privacy

• Current method: Basic anonymisation of identities (MAC address)
• Use of HTTPS for data transmission via 3G
• Anonymising identities may not be enough?
  • Simple anonymisation does not prevent the social graph from being discovered
• Ethics approval is tough!
  • Any collection of medical information makes it complex
  • 40 pages of study protocol document for ‘behaviour -FluPhone’ project – took several months to get approval

Human Connectivity Traces

• Capture Human Interactions
• ..thus far not large scale

Contact: 025d04b2b3f 4650000025d0 5416492246711621549 5416492246711644527
Location: 0025d0e113da [lon: -3.384610278596745E125; lat: 1.3168305280597862E182] 5066619950170431763


City of Bath: Scanner Location

Analyse Network Structure and Model

• Network structure of social systems to model dynamics
• Parameterise with interaction patterns, modularity, and details of time-dependent activity
  • Weighted networks
  • Modularity
  • Centrality (e.g. Degree, betweenness)
  • Community evolution
  • Network measurement metrics
  • Patterns of interactions

Publications at:
http://www.cl.cam.ac.uk/~ey204
http://www.haggleproject.org
http://www.social-nets.eu/


Basic Metrics

Encountering Pairs (BATH) – 5 Days

• Regularity of Encountering

[Figure: encountering pairs plotted against Timeline*]

* Timeline: 6 mins/unit


Inter Contact Time of Pair Nodes

• Hybrid Power Law Distribution?

[Figure: log-log histogram for times less than 12 hours (MIT trace); x-axis: Time]
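As a rough illustration of how such a histogram could be produced, the sketch below computes inter-contact times for a single node pair and bins them logarithmically. The `(start, end)` interval format and both function names are assumptions for illustration; the slide does not describe the actual processing used for the MIT trace.

```python
def inter_contact_times(contacts):
    """Return gaps between consecutive contacts of one node pair.

    contacts: list of (start, end) intervals in seconds for a single pair
    (assumed format). Overlapping intervals yield no gap.
    """
    gaps = []
    last_end = None
    for start, end in sorted(contacts):
        if last_end is not None and start > last_end:
            gaps.append(start - last_end)
        last_end = end if last_end is None else max(last_end, end)
    return gaps


def log_binned_histogram(values, base=2):
    """Count values in bins [base**k, base**(k+1)), as used for log-log plots."""
    hist = {}
    for v in values:
        if v < 1:
            continue  # values below 1 have no non-negative bin index
        k = 0
        while base ** (k + 1) <= v:
            k += 1
        hist[k] = hist.get(k, 0) + 1
    return hist


if __name__ == "__main__":
    pair = [(0, 10), (100, 120), (110, 130), (400, 410)]
    gaps = inter_contact_times(pair)
    print(gaps, log_binned_histogram(gaps))
```

Plotting bin counts against bin edges on logarithmic axes is one way to check by eye whether the distribution looks like a power law over part of its range.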

Edge Weight

I. High Contact No - Long Duration: Community
II. High Contact No - Short Duration: Familiar Stranger
III. Low Contact No - Short Duration: Stranger
IV. Low Contact No - Long Duration: Friend

[Figure: quadrants I to IV plotted by Contact Duration and Number of Contact; 90 seconds is marked]
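The four quadrants can be expressed as a small classification rule. In this sketch the 90 second value from the figure is assumed to be the boundary between short and long durations, and `count_threshold` is a purely hypothetical cut between low and high contact numbers; the slide gives no value for it.

```python
def classify_pair(num_contacts, contact_duration,
                  count_threshold=10, duration_threshold=90):
    """Map a node pair onto the four edge-weight quadrants.

    num_contacts: number of contacts observed for the pair.
    contact_duration: contact duration in seconds (how it is aggregated,
        e.g. mean per contact, is an assumption left to the caller).
    count_threshold: hypothetical cut between low and high contact numbers.
    duration_threshold: cut between short and long durations; 90 s is taken
        from the figure label and assumed to be this boundary.
    """
    high = num_contacts >= count_threshold
    long_lasting = contact_duration >= duration_threshold
    if high and long_lasting:
        return "Community"          # quadrant I
    if high:
        return "Familiar Stranger"  # quadrant II
    if long_lasting:
        return "Friend"             # quadrant IV
    return "Stranger"               # quadrant III


if __name__ == "__main__":
    for pair in [(20, 300), (20, 30), (2, 30), (2, 300)]:
        print(pair, classify_pair(*pair))
```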


Regularity of Network Activity

• 7500 nodes in Bath Data for 5 days

[Figure panels: Tuesday; 5 Days]

Time Dependent Networks

• Data paths may not exist at any one point in time but do exist over time

[Figure labels: Time, Source A, Destination B, X, Y]
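The idea that a path exists over time, even though no end-to-end path exists at any instant, can be sketched as an earliest-arrival computation over timestamped contacts. The `(time, node_a, node_b)` event format is an assumption, and contacts are simplified to instantaneous, undirected events; chains of hops within a single timestamp are only followed if they happen to sort in the right order.

```python
def earliest_arrival(contacts, source, start_time=0):
    """Earliest time each node can receive a message sent from `source`.

    contacts: iterable of (time, node_a, node_b) events (assumed format).
    A hop along a contact at time t is possible only if the carrier
    already holds the message at time t (store-carry-forward).
    Returns {node: arrival_time} for reachable nodes only.
    """
    arrival = {source: start_time}
    for t, a, b in sorted(contacts):
        if t < start_time:
            continue
        for u, v in ((a, b), (b, a)):
            if u in arrival and arrival[u] <= t < arrival.get(v, float("inf")):
                arrival[v] = t
    return arrival


if __name__ == "__main__":
    # A meets X, later X meets Y, later still Y meets B.
    events = [(1, "A", "X"), (5, "X", "Y"), (9, "Y", "B")]
    print(earliest_arrival(events, "A"))  # B is reachable from A over time
    print(earliest_arrival(events, "B"))  # but A is not reachable from B
```

The example also shows that reachability in a time-dependent network is not symmetric: the order of contacts matters.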


Centrality in Dynamic Networks

• Degree Centrality: Number of links
• Closeness Centrality: Shortest path to all other nodes
• Betweenness Centrality: Control over information flowing between others
  • High betweenness node is important as a relay node
  • Estimated from a large number of unlimited flooding runs: number of times a node lies on shortest delay deliveries
  • Analogous to Freeman centrality
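For comparison with the flooding-based estimate, the classic static (Freeman) betweenness can be computed on an aggregated contact graph. The sketch below uses Brandes' algorithm for unweighted, undirected graphs; it is not the dynamic measure described on the slide, and the adjacency-dictionary input format is an assumption.

```python
from collections import deque


def degree_centrality(adj):
    """Number of links per node. adj: {node: set of neighbours}."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


if __name__ == "__main__":
    chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
    print(degree_centrality(chain))
    print(betweenness_centrality(chain))
```

On a simple chain the two middle nodes carry all relayed traffic, which is the sense in which a high-betweenness node is a good relay.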

Party and Date Hubs

• High Degree Distribution: Party Hub connects to the same set of nodes, while Date Hub changes the neighbourhood nodes


Neighbourhood Similarity Rate

• Find High Degree Hub locally
• Neighbourhood Similarity Rate (NSR)*, where N is a set of neighbourhood nodes
• Neighbourhood plus Neighbourhood Similarity Rate (NNSR)
• Can NSR and NNSR characterise Party and Date Hubs?

* time unit without connectivity is suppressed in sparse networks

Dynamics of NSR (MIT Trace)

• Node 10: Continuous High NSR
• Node 17: Change of Neighbourhood
• On average, 30% (Party Hub), 40% (Date Hub), 30% (Combined)
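The NSR formula itself is not present in the transcript. Purely as an illustration of the idea, the sketch below assumes NSR compares a node's neighbour sets in consecutive time units using Jaccard similarity (intersection over union); this choice, and the function names, are assumptions rather than the definition used in the talk. Time units in which the node has no neighbours are skipped, following the footnote.

```python
def neighbourhood_similarity(previous, current):
    """Jaccard similarity of two neighbour sets (assumed stand-in for NSR)."""
    union = previous | current
    if not union:
        return None  # undefined when both sets are empty
    return len(previous & current) / len(union)


def nsr_series(neighbour_sets):
    """Similarity between consecutive non-empty neighbour sets of one node.

    neighbour_sets: list of sets, one per time unit (assumed format).
    Time units without connectivity are suppressed before comparing.
    """
    active = [n for n in neighbour_sets if n]
    return [neighbourhood_similarity(a, b) for a, b in zip(active, active[1:])]


if __name__ == "__main__":
    party_like = [{"a", "b"}, set(), {"a", "b"}, {"a", "b"}]
    date_like = [{"a", "b"}, {"c", "d"}, {"e"}]
    print(nsr_series(party_like))  # stays high: same neighbourhood
    print(nsr_series(date_like))   # stays low: neighbourhood keeps changing
```

Under this reading, a Party Hub would show a persistently high value and a Date Hub a low one, which matches the contrast drawn between Node 10 and Node 17.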


Cumulative Infectious Nodes

• Human Epidemic: TTL close to order of day
• Apply SI model
• 7500 nodes in urban environment (BATH trace)

[Figure labels: 1 DAY, 12 HOURS, 6 HOURS]

Three Stages of Epidemic Dynamics

• First Rapid Increase: Propagation within Cluster
• Second Slow Climbing
• Reach Upper Limit of Infection

[Figure labels: MIT Trace, 17 days]
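A cumulative infection curve of this kind can be produced by running a Susceptible-Infected (SI) process over a timestamped contact trace. The sketch below is a simplified illustration: it assumes `(time, node_a, node_b)` contact events, infection with probability 1 on every contact, and a TTL measured from time 0, after which spreading stops. None of these details are stated on the slides.

```python
def si_cumulative(contacts, seeds, ttl=None):
    """Run a deterministic SI process over timestamped contacts.

    contacts: iterable of (time, node_a, node_b) events (assumed format).
    seeds: nodes infected at time 0.
    ttl: optional time limit; contacts after it no longer spread infection.
    Returns a list of (time, cumulative_infected_count) points.
    """
    infected = set(seeds)
    curve = []
    for t, a, b in sorted(contacts):
        if ttl is not None and t > ttl:
            break
        if (a in infected) != (b in infected):
            # Exactly one side is infected, so the other becomes infected.
            infected.update((a, b))
            curve.append((t, len(infected)))
    return curve


if __name__ == "__main__":
    events = [(1, "a", "b"), (2, "c", "d"), (3, "b", "c"), (10, "c", "d")]
    print(si_cumulative(events, {"a"}))
    print(si_cumulative(events, {"a"}, ttl=5))
```

Comparing curves for different TTL values is one way to reproduce the kind of comparison shown for 6 hours, 12 hours, and 1 day.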


Three Stages of Epidemic Dynamics (continued)

• UCSD
• INFC06

[Figure labels: 16 days, 15 hours]

Uncovering Community

• Contact trace in form of weighted (multi) graphs
  • Contact Frequency and Duration
• Use community detection algorithms from complex network studies
  • K-clique [Palla04], Weighted network analysis [Newman05], Betweenness [Newman04], Modularity [Newman06], Fiedler Clustering etc.

[Figure panels: Fiedler Clustering; K-CLIQUE (K=5)]


K-CLIQUE Detection

• Union of k-cliques reachable through a series of adjacent k-cliques
• Adjacent k-cliques share k-1 nodes
• Members in a community reachable through well-connected subsets
• Examples
  • 2-clique (connected components)
  • 3-clique (overlapping triangles)
• Overlapping feature
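The definition above can be turned into a small brute-force sketch: enumerate all k-cliques, link two cliques when they share k-1 nodes, and take the union of each linked group. This is only practical for small graphs and is not the implementation used for the traces in the talk; the function names and the edge-list input are assumptions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build {node: set of neighbours} from undirected edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_clique_communities(adj, k):
    """Clique percolation by brute force; returns a sorted list of frozensets."""
    nodes = sorted(adj)
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:  # adjacent k-cliques
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


if __name__ == "__main__":
    # Two triangles sharing an edge, plus a third triangle sharing one node.
    edges = [(1, 2), (2, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (4, 6)]
    for community in k_clique_communities(build_adjacency(edges), 3):
        print(sorted(community))
```

In the example, node 4 belongs to both communities, which illustrates the overlapping feature: sharing a single node is not enough to merge two 3-clique communities.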

K-CLIQUES Communities (Conference), K=3

[Figure labels: Barcelona Group, Paris Group A, Paris Group B, Lausanne Group, Paris Groups]


K-CLIQUES Communities (Conference), K=4

[Figure labels: Barcelona Group, Paris Group A, Paris Group B, Lausanne Group, Paris Groups]

BUBBLE RAP Forwarding

• Optimisation of Epidemic Forwarding
  • Epidemic forwarding - highly robust against disconnection, mobility, and node failures; simple, decentralised, and fast
  • Control Flooding is necessary (e.g. Count-based, Timer, History)
• Use social hubs (e.g. celebrities and postmen), identified by betweenness centrality, combined with community structure for improved routing efficiency
• LABEL: Community based
• RANK: Centrality based
• BUBBLE RAP

[Figure labels: Global Community, Sub Community, Source, Destination]
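The combination of LABEL and RANK can be sketched as a single forwarding decision, following the rule as it is commonly described for BUBBLE RAP: bubble the message up the global ranking until it reaches the destination's community, then up the local ranking inside that community. The `Node` type, its fields, and the restriction to one community per node are simplifying assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    community: str      # single community label (real communities may overlap)
    global_rank: float  # centrality across the whole network
    local_rank: float   # centrality within the node's own community


def should_forward(carrier, encountered, destination):
    """Decide whether `carrier` hands the message to `encountered`."""
    if encountered.name == destination.name:
        return True  # direct delivery
    if carrier.community == destination.community:
        # Already inside the destination community: use local ranking only.
        return (encountered.community == destination.community
                and encountered.local_rank > carrier.local_rank)
    if encountered.community == destination.community:
        return True  # the message enters the destination community
    # Otherwise climb the global ranking towards better-connected hubs.
    return encountered.global_rank > carrier.global_rank


if __name__ == "__main__":
    dest = Node("d", "lab", 0.1, 0.2)
    carrier = Node("c", "town", 0.3, 0.5)
    hub = Node("h", "town", 0.9, 0.9)
    member = Node("m", "lab", 0.05, 0.6)
    print(should_forward(carrier, hub, dest))     # higher global rank
    print(should_forward(carrier, member, dest))  # in destination community
    print(should_forward(member, hub, dest))      # would leave the community
```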


Communication to Epidemiology

• Building communication protocol based on proximity
  • EU FP6 Haggle Project
• Inferring social interaction and opinion dynamics; applying results to networking and computer systems
  • EU FP7 Socialnets, EU FP7 Recognition
• "Bio-Inspired Computing and Communication": edited book, LNCS 5151, Springer, 2008; 2nd edition in progress
• Understanding behavioural response to infectious disease outbreaks: social and economic influences
  • ESRC FluPhone Project with LSHTM
• Network modelling for epidemiology
  • EPSRC Data Driven Network Modelling for Epidemiology

Data Driven Approach

• Threat to public health: e.g. SARS, AIDS
• Current understanding of disease spread dynamics
  • Epidemiology: Small scale empirical work
  • Physics/Math: Mostly large scale abstract/simplified models
• Real-world networks are far more complex
  • Advantage of real world data
  • Emergence of wireless technology for proximity data
• Goal: post-facto analysis and modelling yield insight into human interactions
  • How does community structure affect epidemic spread?
  • How do hubs and weak links influence temporal or spatial effects, and how does this affect the transmission characteristics of disease?
  • How does community topology of interpersonal connections and its hierarchical nature yield a multi-level structure?


Outcomes: Prediction of Epidemics

• Infectious disease control/prediction systems
  • Provide vaccination strategy
  • Predict potential outbreaks
• Incorporate human connectivity information to epidemic models
  • Mobility, interaction, behavioural assumption
  • Time dependent reproduction ratio
• Integrate online and web information
  • Capture behavioural response of nodes
  • Analyse web search and blog activities
  • Twitter could act as early warning system

[Figure labels: Google Flu Trend; Google: Reported Swine Flu Symptoms]

Twitter acts as early warning system

Vienna (AFP) April 13, 2010 - The micro-blogging site Twitter could act as an early warning system for epidemics, a team of experts at London's City University found in a new study published on Tuesday. According to a team of interdisciplinary experts, around three million messages -- or so-called "tweets" -- posted in English on Twitter between May and December 2009 contained the word "flu". Their study was presented to the European Congress of Clinical Microbiology and Infectious Diseases (ECCMID) being held in Vienna this week. "The numbers of tweets we collected by searching by keywords such as 'flu' or 'influenza' has been astronomical," one of the study's co-authors, Patty Kostkova, told AFP. "What we're looking at now is, what is the potential of this enormous data set for early warning systems. Because it's a real time media, it can call for an immediate response if required." Among the so-called "tweets", the experts counted 12,954 messages containing the phrase "I have swine flu" and 12,651 saying "I've got flu". They also counted the frequency of other terms, such as "H1N1" and "vaccine"....


Extending Data Collection to OSN

• Online Social Networks (e.g. Facebook, Twitter)
• Potential to obtain data of dynamic behaviour
• High volume of data

Does Facebook matter?

• Over 190 M users
• Growth rates for 2008 around the world
  • Italy: 2900%, Argentina: 2000%, Indonesia: 600

Power Law Degree Distribution

• Crawled original Stanford (15043 nodes), Harvard (18273 nodes) networks
  • From era when UIDs were assigned sequentially
• Obtains friends of each user, and their affiliations
• 2.1 million links, maximum degree 911


Cascade Symptom (Use of Geo-coding)

[Figure labels: Texas, Illinois, Florida]

The FluPhone Project

• Understanding behavioural responses to infectious disease outbreaks with London School of Hygiene and Tropical Medicine (LSHTM)
• Proximity data collection using mobile phone from general public in Cambridge

https://www.fluphone.org


FluPhone: Main Screen

FluPhone: Report Symptom


FluPhone: Report Time - Feedback

FluPhone Server – Data Collection

• Via GPRS/3G FluPhone server collects data
• Collection cycle: ~real-time, day, or week?
• Collection methods:
  • Online 3G
  • Uploading via Web


Study Status

• Pilot study (April 21 ~ May 15)
  • Computer Laboratory
• University scale study (May 15 ~ June 30)
  • Advertisement (all departments, 35 colleges, student union, industry support club, Twitter, Facebook...)
  • Employees and students of University of Cambridge, their families, and any residents or people who work in Cambridge
• Issues
  • Limited phone models are supported
  • Motivation to participate
  • Flu is not a threat at this moment

Encountered Bluetooth Devices

• A FluPhone Encountering History
• 1495 unique devices per 10 days
• Is he a party animal or a shy wallflower?

[Figure labels: April 16, 2010; May 14, 2010]


Simulation of Disease – SEIR Model

• Four states on each node: SUSCEPTIBLE → EXPOSED → INFECTED → RECOVERED
• Parameters
  • p: exposure probability
  • a: exposed time (incubation period)
  • t: infected time
• Diseases
  • D1 (SARS): p=0.8, a=24H, t=30H
  • D2 (FLU): p=0.4, a=48H, t=60H
  • D3 (COLD): p=0.2, a=72H, t=120H
• Seed nodes
  • Random selection of 20% of nodes (=7) among 36 nodes

SARS

• Exposure probability = 0.8
• Exposed time = 24H (average)
• Infected time = 30H (average)

[Figure labels: Day 1, Day 11]


Flu

• Exposure probability = 0.4
• Exposed time = 48H (average)
• Infected time = 60H (average)

[Figure labels: Day 1, Day 11]

SEIR – Normalised Form

[Figure legend: SUSCEPTIBLE, EXPOSED, INFECTED, RECOVERED]
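An SEIR run over a contact trace with the parameters p, a, and t could be sketched as below. The sketch makes several assumptions that the slides do not state: contacts are `(time_hours, node_u, node_v)` events, the exposed and infected periods are fixed rather than averages, and seed nodes are already infectious at time 0. The function names and the `DISEASES` table layout are illustrative.

```python
import random

# Disease parameters from the slide: p (exposure probability),
# a (exposed time in hours), t (infected time in hours).
DISEASES = {
    "SARS": {"p": 0.8, "a": 24, "t": 30},
    "FLU": {"p": 0.4, "a": 48, "t": 60},
    "COLD": {"p": 0.2, "a": 72, "t": 120},
}


def state_at(exposed_at, now, a, t):
    """Return S, E, I or R for a node exposed at `exposed_at` (None = never)."""
    if exposed_at is None or now < exposed_at:
        return "S"
    if now < exposed_at + a:
        return "E"
    if now < exposed_at + a + t:
        return "I"
    return "R"


def simulate_seir(contacts, nodes, seeds, p, a, t, rng=None):
    """Return {node: exposure time or None} after replaying the contacts."""
    rng = rng or random.Random(0)
    exposed_at = {n: None for n in nodes}
    for s in seeds:
        exposed_at[s] = -a  # assumption: seeds are infectious from time 0
    for now, u, v in sorted(contacts):
        for src, dst in ((u, v), (v, u)):
            if (state_at(exposed_at[src], now, a, t) == "I"
                    and exposed_at[dst] is None
                    and rng.random() < p):
                exposed_at[dst] = now
    return exposed_at


def normalised_counts(exposed_at, now, a, t):
    """Fraction of nodes in each state at time `now`."""
    counts = {"S": 0, "E": 0, "I": 0, "R": 0}
    for when in exposed_at.values():
        counts[state_at(when, now, a, t)] += 1
    return {s: c / len(exposed_at) for s, c in counts.items()}


if __name__ == "__main__":
    trace = [(10, "a", "b"), (20, "b", "c"), (40, "b", "c"), (50, "a", "c")]
    result = simulate_seir(trace, ["a", "b", "c"], ["a"], **DISEASES["SARS"])
    print(result, normalised_counts(result, 45, 24, 30))
```

Sampling `normalised_counts` at regular times would give curves comparable in form to the normalised SEIR plot, though the values depend on the trace and on the assumptions listed above.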


Time to Exposure vs # of Meetings

• Distribution of time to infection (black line) is strongly influenced by time dependent adjacency matrices of meetings

[Figure labels: Day 1, Day 11]

Simple Flood (3 Stages)

• First Rapid Increase: Propagation within Cluster
• Second Slow Climbing
• Reach Upper Limit of Infection

[Figure label: 5 days]


Virtual Disease Experiment

• Spread virtual disease via Bluetooth communication in proximity radio range
• Integrate SARS, FLU, and COLD in SEIR model
• Provide additional information (e.g. Infection status, news) to observe behavioural change

Conclusions

• Quantitative Contact Data from Real World!
• Analyse Network Structure of Social Systems to Model Dynamics
  • Emerging Research Area
  • Weighted networks
  • Modularity
  • Centrality (e.g. dynamic betweenness centrality)
  • Community evolution and dynamics
  • Network measurement metrics
• Integrate Background of Target Population
  • Location specific
  • Demography specific...
• Virtual Disease Experiment
  • Behavioural study
• Applying methodology to measure contact networks in Malawi, Africa (with diary-based survey)