measuring and predicting departures from routine in human mobility by dirk gorissen
Measuring and Predicting
Departures from Routine
in Human Mobility
Dirk Gorissen | @elazungu
PyData London - 23 February 2014
Human Mobility - Credits
University of Southampton: James McInerney, Sebastian Stein, Alex Rogers, Nick Jennings
BAE Systems ATC: Dave Nicholson
Reference: J. McInerney, S. Stein, A. Rogers, and N. R. Jennings (2013).
Breaking the habit: measuring and predicting departures from routine in individual human mobility. Pervasive and Mobile Computing, 9, (6), 808-822.
Submitted KDD paper
Human Mobility: Inference
Cuts across many fields: sociology, physics, network theory, computer science, epidemiology, …
© PNAS
© MIT
Project InMind
Project InMind announced on 12 Feb
$10m Yahoo-CMU collaboration on predicting human needs and intentions
Human Mobility
Human mobility is highly predictable
Average predictability in the next hour is 93% [Song 2010]
Distance travelled has little or no impact on predictability
High degree of spatial and temporal regularity
Spatial: centered around a small number of base locations
Temporal: e.g., workweek / weekend
“…we find a 93% potential predictability in user mobility across the whole user base. Despite the significant differences in the travel patterns, we find a remarkable lack of variability in predictability, which is largely independent of the distance users cover on a regular basis.”
Breaking the Habit
However, regular patterns are not the full story
e.g., travelling to another city on a weekend break or while on sick leave
Breaks in regular patterns signal potentially interesting events
Being in an unfamiliar place at an unfamiliar time requires extra context-aware assistance
E.g., higher demand for map & recommendation apps, mobile advertising more relevant, …
Predict future departures from routine?
Applications
Optimize public transport
Insight into social behaviour
Spread of disease
(Predictive) Recommender systems
Based on user habits (e.g., Google Now, Sherpa)
Context aware advertising
Crime investigation
Urban planning
…
Obvious privacy & de-anonymization concerns
-> Eric Drass’ talk
Modeling Mobility
Entropy measures typically used to determine regularity in fixed time slots
Well understood measures, wide applicability
Break down when considering prediction or higher level structure
Model based
Can consider different types of structure in mobility (i.e., sequential and temporal)
Can deal with heterogeneous data sources
Allows incorporation of domain knowledge (e.g., calendar information)
Can build extensions that deal with trust
Allows for prediction
Bayesian approach
distribution over locations
enables use as a generative model
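To make the entropy measures mentioned above concrete, here is a minimal sketch that computes the Shannon entropy of visited locations inside each fixed time slot. The visit log, the choice of hour-of-day slots and the function name `slot_entropy` are assumptions for illustration only and do not come from the talk.

```python
import math
from collections import Counter, defaultdict

def slot_entropy(visits):
    """Shannon entropy (bits) of the location distribution in each time slot.

    visits: iterable of (slot, location) pairs, e.g. (hour_of_day, "home").
    """
    by_slot = defaultdict(Counter)
    for slot, location in visits:
        by_slot[slot][location] += 1
    entropy = {}
    for slot, counts in by_slot.items():
        total = sum(counts.values())
        probs = [n / total for n in counts.values()]
        entropy[slot] = sum(p * math.log2(1 / p) for p in probs)
    return entropy

# Hypothetical log: always home at 01:00, split between work and gym at 18:00.
visits = [(1, "home")] * 4 + [(18, "work")] * 2 + [(18, "gym")] * 2
result = slot_entropy(visits)
print(result)
```

A slot with a single location has zero entropy (fully regular), while an even split over two locations gives one bit. The measure says how regular a slot is, but not where the person will go next, which is the gap the model-based approach addresses.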
Bayesian Networks
Bottom up: Grass is wet, what is the most likely cause?
Top down: It's cloudy, what is the probability the grass is wet?
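The two query directions can be sketched with brute-force enumeration over a small network. The slide only names "cloudy" and "wet grass"; the sprinkler and rain nodes and all probability values below are assumed textbook-style numbers, not taken from the talk.

```python
from itertools import product

# Assumed conditional probability tables (illustrative values only).
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}   # P(sprinkler on | cloudy)
P_RAIN = {True: 0.8, False: 0.2}        # P(rain | cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}  # P(wet | sprinkler, rain)

def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(c, s, r, w):
    """Joint probability of one full assignment, following the network structure."""
    return (bern(P_CLOUDY, c) * bern(P_SPRINKLER[c], s)
            * bern(P_RAIN[c], r) * bern(P_WET[(s, r)], w))

def prob(query, evidence):
    """P(query | evidence) by enumeration; both are dicts over 'c', 's', 'r', 'w'."""
    num = den = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        world = {"c": c, "s": s, "r": r, "w": w}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(c, s, r, w)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

top_down = prob({"w": True}, {"c": True})            # cloudy -> wet grass?
bottom_up_rain = prob({"r": True}, {"w": True})      # wet grass -> rain?
bottom_up_sprinkler = prob({"s": True}, {"w": True}) # wet grass -> sprinkler?
print(top_down, bottom_up_rain, bottom_up_sprinkler)
```

Under these assumed tables, rain comes out as a more likely cause of wet grass than the sprinkler. Enumeration only works for tiny networks; it is used here because it keeps both query directions in one function.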
Probabilistic Models
Model can be run forwards or backwards
Forwards (generation): parameters -> data
E.g., use a distribution over word pair frequencies to generate sentences
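The word-pair example can be sketched as a tiny bigram model: count which word follows which, then run the model forwards by sampling. The corpus, the boundary markers `<s>` and `</s>`, and the function names are assumptions made for this illustration.

```python
import random
from collections import Counter, defaultdict

def fit_bigrams(sentences):
    """Count word-pair frequencies, with <s> and </s> marking sentence boundaries."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng, max_words=20):
    """Run the model forwards: sample each word given the previous one."""
    word, out = "<s>", []
    while len(out) < max_words:
        options = counts[word]
        word = rng.choices(list(options), weights=list(options.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return out

corpus = ["i go home after work", "i go to the gym after work", "i stay home"]
counts = fit_bigrams(corpus)
sentence = generate(counts, random.Random(0))
print(" ".join(sentence))
```

Fitting the counts from the corpus is the backwards direction (data to parameters); sampling a sentence is the forwards direction (parameters to data).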
Building the model
We want to model departures from routine
Assume assignment of a person to a hidden location at all time steps (even when not observed)
Discrete latent locations
Correspond to “points of interest”
e.g., home, work, gym, train station, friend's house
Latent Locations
Augment with temporal structure
Temporal and periodic assumption to behaviour
e.g., tend to be home each night at 1am
e.g., often in shopping district on Sat afternoon
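A minimal sketch of the periodic assumption: estimate a distribution over locations for each hour-of-week slot, smoothed with a symmetric Dirichlet prior so that unseen locations keep some probability. The location names, the hour-of-week slotting and the prior strength `alpha` are assumptions; the talk's model is richer than this count-based stand-in.

```python
import numpy as np

LOCATIONS = ["home", "work", "shops"]  # hypothetical points of interest
N_SLOTS = 7 * 24                       # one slot per hour of the week

def slot_of(day, hour):
    """Map (day of week 0-6, hour 0-23) to an hour-of-week slot index."""
    return day * 24 + hour

def fit_profile(visits, alpha=1.0):
    """Posterior mean of P(location | slot) under a symmetric Dirichlet(alpha) prior."""
    counts = np.zeros((N_SLOTS, len(LOCATIONS)))
    for day, hour, location in visits:
        counts[slot_of(day, hour), LOCATIONS.index(location)] += 1
    smoothed = counts + alpha
    return smoothed / smoothed.sum(axis=1, keepdims=True)

# Hypothetical history: home at 01:00 on four Mondays,
# shops on three of four Saturday afternoons.
visits = [(0, 1, "home")] * 4 + [(5, 15, "shops")] * 3 + [(5, 15, "home")]
profile = fit_profile(visits)
print(dict(zip(LOCATIONS, profile[slot_of(5, 15)].round(2))))
```

Slots with no data fall back to the uniform prior, which is one simple way of saying "no routine known yet" for that time of the week.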
Add Sequential Structure
Added first-order Markov dynamics
e.g., usually go home after work
can extend to more complex sequential structures
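First-order Markov dynamics can be sketched as a transition matrix estimated from a location sequence, then used to predict where the person goes next. The sequence, the location names and the add-`alpha` smoothing are assumptions for illustration, not the talk's inference procedure.

```python
import numpy as np

LOCATIONS = ["home", "work", "gym"]  # hypothetical latent locations

def fit_transitions(sequence, alpha=1.0):
    """Estimate P(next | current) from one location sequence with add-alpha smoothing."""
    index = {name: i for i, name in enumerate(LOCATIONS)}
    counts = np.full((len(LOCATIONS), len(LOCATIONS)), float(alpha))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current], index[nxt]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def predict(transitions, current, steps=1):
    """Distribution over locations `steps` transitions after `current`."""
    state = np.zeros(len(LOCATIONS))
    state[LOCATIONS.index(current)] = 1.0
    return state @ np.linalg.matrix_power(transitions, steps)

history = ["home", "work", "home", "work", "gym", "home", "work", "home"]
T = fit_transitions(history)
after_work = predict(T, "work")
print(dict(zip(LOCATIONS, after_work.round(2))))
```

In this toy history, home is the most likely location after work, matching the "usually go home after work" example. Raising `steps` looks further ahead at the cost of a flatter, less informative distribution.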
Sensors
Noisy sensors, e.g., cell tower observations
observed: latitude/longitude
inferred: variance (of locations)
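A rough sketch of the noisy-sensor idea: treat each latitude/longitude fix as the latent location's centre plus isotropic Gaussian noise, estimate the variance from fixes at one location, and compute a posterior over latent locations for a new fix. Treating latitude/longitude as planar coordinates, the isotropic noise model and all names and numbers are simplifying assumptions for illustration.

```python
import numpy as np

def fit_variance(fixes):
    """Maximum-likelihood centre and isotropic variance of fixes from one location."""
    centre = fixes.mean(axis=0)
    variance = ((fixes - centre) ** 2).sum(axis=1).mean() / 2
    return centre, variance

def location_posterior(obs, centres, variances, prior):
    """P(latent location | one noisy lat/lon fix) under isotropic Gaussian noise."""
    sq_dist = ((centres - obs) ** 2).sum(axis=1)
    # Log density of a 2-D isotropic Gaussian, up to a constant shared by all locations.
    log_lik = -sq_dist / (2 * variances) - np.log(variances)
    log_post = np.log(prior) + log_lik
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

rng = np.random.default_rng(0)
home, work = np.array([51.50, -0.12]), np.array([51.52, -0.08])  # hypothetical
home_fixes = home + rng.normal(0.0, 0.01, size=(5000, 2))
centre, variance = fit_variance(home_fixes)

post = location_posterior(
    obs=np.array([51.505, -0.115]),
    centres=np.vstack([home, work]),
    variances=np.array([variance, variance]),
    prior=np.array([0.5, 0.5]),
)
print(variance, post)
```

A noisier sensor (larger variance) spreads the posterior more evenly over nearby locations, which is why inferring the variance matters for deciding where the person actually was.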
Trustworthiness
E.g., Eyewitness
observed: latitude/longitude, reported variance
inferred: trustworthiness of observation
single latent trust value (per time step & source)
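One simple way to picture a latent trust value is as a two-component mixture: a trustworthy report follows a Gaussian with the reported variance, while an untrustworthy one follows a much broader Gaussian. This is an assumed simplification for illustration; the exact trust model in the paper may differ, and the prior, the broad variance and the function names are hypothetical.

```python
import math

def gaussian2d(sq_dist, variance):
    """Density of a 2-D isotropic Gaussian at squared distance `sq_dist` from its mean."""
    return math.exp(-sq_dist / (2 * variance)) / (2 * math.pi * variance)

def trust_posterior(sq_dist, reported_var, prior_trust=0.8, untrusted_var=100.0):
    """P(report is trustworthy | squared distance from the currently believed location)."""
    trusted = prior_trust * gaussian2d(sq_dist, reported_var)
    untrusted = (1 - prior_trust) * gaussian2d(sq_dist, untrusted_var)
    return trusted / (trusted + untrusted)

# Hypothetical eyewitness reports, distances in km^2, both claiming 1 km^2 variance.
near = trust_posterior(sq_dist=0.5, reported_var=1.0)
far = trust_posterior(sq_dist=50.0, reported_var=1.0)
print(near, far)
```

A report that is far from the believed location yet claims a small variance is explained much better by the broad component, so its trust posterior collapses towards zero.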
Inference is Challenging
Exact inference intractable
Can perform approximate inference using:
Expectation maximisation algorithm
Fast
But point estimates of parameters
Gibbs sampling, or other Markov chain Monte Carlo
Full distributions (converges to exact)
But slow
Variational approximation
Full distributions based on induced factorisation of model
And fast
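To illustrate the first option in the list above, here is a minimal expectation-maximisation loop on a toy problem: a two-component 1-D Gaussian mixture. This is a stand-in rather than the mobility model itself, and the data, initialisation and iteration count are assumptions. It shows the trade-off named on the slide: the loop is fast, but it returns single point estimates instead of distributions over parameters.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture; returns point estimates only."""
    means = np.array([x.min(), x.max()], dtype=float)
    variances = np.full(2, x.var())
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        dens = (weights / np.sqrt(2 * np.pi * variances)
                * np.exp(-(x[:, None] - means) ** 2 / (2 * variances)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        n_k = resp.sum(axis=0)
        weights = n_k / len(x)
        means = (resp * x[:, None]).sum(axis=0) / n_k
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_k
    return weights, means, variances

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(10.0, 1.0, 700)])
weights, means, variances = em_gmm_1d(x)
print(weights.round(2), means.round(2), variances.round(2))
```

Gibbs sampling and variational inference would instead return (approximate) posterior distributions over the same quantities, which is what the model needs in order to express uncertainty about a prediction.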
Variational Approximation
Advantages
Straightforward parallelisation by user
Inference over months of mobility data takes ~ hours
Updating the previous day's parameters takes ~ minutes
Variational approximation amenable to fully online inference
M. Hoffman, D. Blei, C. Wang, and J. Paisley.
Stochastic variational inference.
arXiv:1206.7051, 2012
Model enables
Inference
location
departures from routine
noise characteristics of observations
trust characteristics of sensors
Exploration/summarisation
parameters have intuitive interpretations
Prediction
Future mobility (given time context)
Future departures from routine
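A simplified way to picture a departure from routine is as a location that is very unlikely under the learned routine for that time slot. The sketch below scores this with negative log-probability and a fixed threshold. In the talk, departures are inferred inside the Bayesian model; the threshold rule, the `ROUTINE` table and all numbers here are assumptions for illustration.

```python
import math

# Hypothetical routine profile: P(location | time slot), e.g. learned from past data.
ROUTINE = {
    "weekday_09": {"home": 0.05, "work": 0.90, "gym": 0.05},
    "saturday_15": {"home": 0.30, "work": 0.05, "shops": 0.65},
}

def surprise(slot, location, floor=1e-3):
    """Negative log-probability (bits) of a location under the routine for that slot.

    Locations never seen in the slot get the small probability `floor`.
    """
    p = ROUTINE[slot].get(location, floor)
    return -math.log2(p)

def is_departure(slot, location, threshold_bits=3.0):
    """Flag a departure from routine when the surprise exceeds the threshold."""
    return surprise(slot, location) > threshold_bits

print(is_departure("weekday_09", "work"))     # routine location
print(is_departure("weekday_09", "airport"))  # never-seen location
```

Predicting a future departure then amounts to asking how much probability mass the predictive distribution for a future slot places on locations outside the routine.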
Performance
Synthetic dataset with heterogeneous, untrustworthy observations.
Parameters of generating model learned from OpenPaths dataset
Implementation
Backend inference and data processing code is all Python
numpy
scipy
matplotlib
UI to explore model predictions & sanity check
flask
d3.js
leaflet.js
knockout.js
Future
Gensim, pymc, bayespy, …
Probabilistic programming
Conclusion & Future Work
Summary
Novel model for learning and predicting departures from routine
Limitations
Need better ground truth for validation
Finding ways to make the model explain why each departure from routine happened.
Needs more data (e.g., from people who know each other, using weather data, app usage data, …).
Future Work
Incorporating more advanced sequential structure into the model
e.g., hidden semi-Markov model, sequence memoizer
Supervised learning of what “interesting” mobility looks like
More data sources
Online inference
Taxi drivers
Questions?
Thank you.
@elazungu