Data fusion and mining in SPHERE
Challenges & Opportunities for Machine Learning
Tom Diethe
Intelligent Systems Laboratory
University of Bristol
irc-sphere.ac.uk
21st Century Healthcare Challenges
• UK: 1.4 million aged > 85, by 2035 -> 3.6 million
• Japan: will have the oldest population in human history by 2050 (52 yrs)
• China: a retired population larger than Europe
• Ageing populations living with long term health conditions: obesity, diabetes, depression, heart disease, dementia …
• Technological solution to fill the gap between expectations and reality of healthcare
Environmental
• Temperature, light level, humidity, air quality
• Water & electricity consumption
Video
• Emotion, gait, activity, interaction
Wearables
• Activity, sleep, etc.
Contextual information
• Medical history, demographics
Feedback
• Medical practitioner, users
Use Cases
• Clinician with information need
• Hip injury example – how is their gait, are they walking better?
• Causal relations – are there any common patterns of behaviour that lead to health issues?
• Information back to the user – better health-enhancing choices
• Early warning system
• Disease progress/treatment effectiveness
• Many more …
• 100-home deployment underway
What do we want to learn?
Prediction and Modelling
• Who is in the house
• What are they doing
• When are activities happening
• Where are these happening
• Why does this matter?
Is this Big Data?
• Sensors
  • Heterogeneous
  • Noisy/intermittent
  • Different spatial/temporal resolutions
• Velocity ✔ Variety ✔ Volume ?
What’s Important?
• Quantification of uncertainty
• Transparent models
• Online Learning: models must adapt to changing habits
• How to incorporate medical history
• Daily/Weekly/Seasonal patterns
• Personalisation
Model-Based Machine Learning
• Model uncertainty using probabilities
  • frequency
  • belief
• Model contains variables and factors
1. Build a model
2. Incorporate observations
3. Perform inference over “latent” variables
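What is a Model?
• A simulator programme
    // Each coin is heads with probability 0.5 (assumption 1)
    bool A = random.NextDouble() > 0.5;
    bool B = random.NextDouble() > 0.5;
    bool C = A & B;
[Figure: factor graph in which A and B feed an & factor that produces C]
• A set of assumptions
1. A coin has an equal chance of landing on heads or tails
2. Coin tosses are independent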
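The "build a model, incorporate observations, perform inference" loop can be illustrated on the two-coin model. The sketch below is a minimal Python illustration by brute-force enumeration, not the Infer.NET implementation; the function name `posterior_a_given_c` is hypothetical. It asks: having observed C, how likely is it that coin A was heads?

```python
from itertools import product

def posterior_a_given_c(observed_c: bool) -> float:
    """Return P(A = True | C = observed_c) for two fair, independent coins."""
    weight_a_true = 0.0
    weight_total = 0.0
    for a, b in product([True, False], repeat=2):
        prior = 0.5 * 0.5          # assumptions 1 and 2: fair and independent
        c = a and b                # the deterministic "&" factor
        if c == observed_c:        # keep only worlds consistent with the observation
            weight_total += prior
            if a:
                weight_a_true += prior
    return weight_a_true / weight_total

p_false = posterior_a_given_c(False)   # A is heads in 1 of the 3 remaining worlds
p_true = posterior_a_given_c(True)     # C can only be true if A is true
```

Enumeration only works for tiny discrete models; it is used here purely to make the three steps concrete.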
Rules of Probability
SUM RULE
PRODUCT RULE
Bayes’ Rule
posterior = likelihood × prior / normaliser
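The slide equations did not survive in the transcript. For reference, the three rules named here are usually written as follows; the symbols x and y are generic placeholders chosen for this sketch, not notation taken from the slides.

```latex
\begin{align}
  \text{Sum rule:} \quad & p(x) = \sum_{y} p(x, y) \\
  \text{Product rule:} \quad & p(x, y) = p(y \mid x)\, p(x) \\
  \text{Bayes' rule:} \quad &
    \underbrace{p(y \mid x)}_{\text{posterior}}
    = \frac{\overbrace{p(x \mid y)}^{\text{likelihood}}\;
            \overbrace{p(y)}^{\text{prior}}}
           {\underbrace{p(x)}_{\text{normaliser}}}
\end{align}
```

Bayes' rule follows from applying the product rule in both directions, and the normaliser p(x) is obtained from the sum rule.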
MBML Book
• Designed for the data scientist
• Case study based
www.mbmlbook.com
Experiments
• SPHERE House:
• Scripted Experiments
• Medium Term Stays (1-7 days)
• Long Stays (1-4 weeks)
• Full Deployments
SPHERE House Script
<<<<Downstairs>>>>
Living Room
Enter the room and close the door behind you
Stand facing the mirror and jump twice.
Turn light on
Go to window and open and close the curtains
Take off shoes
Artificial activities … repeat 5 times with 3 seconds between each
stand to bend
bend to stand
stand to kneel
kneel to stand
stand to sit
sit to lie (back)
lie (back) to lie (side) (on sofa)
lie (side) to lie (back)
lie (back) to sit
sit to stand
cough
Turn light off
SPHERE Challenge
• Task: predict posture and ambulation labels given the sensor data
• Accelerometer, RGB-D and environmental data
SPHERE Challenge
• Data was collected from a script in the SPHERE house
• 10 participants, ~20-30 minutes per script
• Even split between training and testing data
• Test data split into short 10-30s sequences
https://www.drivendata.org/competitions/42/senior-data-science-safe-aging-with-sphere
bit.ly/sphere-challenge
Targets
• Each sequence was annotated at least twice
• Not all annotators will agree all of the time
• Start/end time of annotations may not be aligned
• Actual label assigned to a time interval may not agree
• Task: predict mean annotation on a per-second basis — the targets are probabilistic
• Also provided localisation annotations (in training only)
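To make "mean annotation" concrete, here is a minimal Python sketch of how a probabilistic target for one second could be formed. It assumes, for simplicity, that each annotator contributes exactly one label per second; the label set `CLASSES` and the function `mean_annotation` are hypothetical and are not taken from the challenge code.

```python
from collections import Counter

CLASSES = ["sit", "stand", "walk"]  # hypothetical label set for illustration

def mean_annotation(labels_per_annotator):
    """Average the one-hot annotations of several annotators for one second."""
    counts = Counter(labels_per_annotator)
    n = len(labels_per_annotator)
    return [counts[c] / n for c in CLASSES]

# Two annotators disagree on this second, so the target mass is split.
target = mean_annotation(["sit", "stand"])
```

A model trained against such targets is rewarded for expressing the same uncertainty the annotators showed, rather than for committing to a single hard label.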
Challenge Participation
• ~ 80 teams
• > 100 participants
• ~ 400 total registrants
• > 770 individual submissions
Challenges 1 & 2
Shifting Sands
Humans are costly
Goal: Activity Recognition in Smart Homes
Deployment context differs from learning context (home/resident)
TRANSFER LEARNING
Labels costly and time-consuming to acquire
ACTIVE LEARNING
[Figure labels: ONLINE + TRANSFER; ACTIVE + TRANSFER]
Method
• Extension of the “Bayes Point Machine”
• Additional layer of hierarchy:
  • model “shared” and “individual” weights
  • can smoothly evolve from generic to personalised predictions
• Implemented using Infer.NET
  • http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
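The idea of "shared" and "individual" weights can be sketched with point estimates. The Python below is an illustration only: it assumes the two weight vectors combine additively and uses a probit link, whereas the actual method keeps full distributions over the weights and runs inference in Infer.NET. All names and numbers are hypothetical.

```python
import math

def probit(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predict(x, shared_w, individual_w):
    """P(y = 1 | x) using the sum of shared and individual weights (an assumption)."""
    score = sum(xi * (ws + wi) for xi, ws, wi in zip(x, shared_w, individual_w))
    return probit(score)

x = [1.0, -0.5]
shared = [0.8, 0.2]                             # hypothetical weights from source subjects
generic = predict(x, shared, [0.0, 0.0])        # new resident: no personal offset yet
personalised = predict(x, shared, [-0.3, 0.4])  # after some personal data has arrived
```

With zero individual weights the prediction is the generic one; as personal data moves the individual weights away from zero, the prediction shifts towards the resident's own behaviour.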
Accelerometer Data
• Source: 30 subjects, Smartphone, 50Hz, Video annotations
• Target: 14 subjects, MotionNode, 100Hz, Observer annotations
• Classes: Walking upstairs vs. Walking downstairs
• Features: 48 features based on the ‘body’ acceleration signal
https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
http://sipi.usc.edu/HAD/
[Figure panels: Online Learning; Active Transfer Learning]
The value-of-information (VOI) method fails to outperform uncertainty sampling (US)
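For readers unfamiliar with uncertainty sampling, a minimal Python sketch for a binary classifier follows: query the label of the instance whose predicted probability is closest to 0.5. This is the textbook form of the heuristic, not necessarily the exact variant used in the experiments; the pool values are hypothetical.

```python
def uncertainty_sampling(probabilities):
    """Return the index of the instance whose P(y = 1) is closest to 0.5."""
    return min(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))

pool = [0.95, 0.10, 0.55, 0.80]   # hypothetical predicted probabilities for unlabelled data
query_index = uncertainty_sampling(pool)
```

Here the third instance would be sent to the annotator, since the model is least sure about it.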
Summary
• Bayesian framework very appealing
• Transfer boosts initial accuracy to 70%
• Active Learning -> ~5 instances to personalise
Diethe, T., Twomey, N. and Flach, P., 2016. Active transfer learning for activity recognition. ESANN.
Challenge 3: Where did I put my sensor?
Unsupervised learning of sensor topologies
• signal processing and information-theoretic techniques
• learn an adjacency matrix
• enables us to determine combinations of sensors useful for classification
• Experiments using CASAS data: http://ailab.wsu.edu/casas/datasets/
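One way to picture an information-theoretic route to an adjacency matrix: compute the mutual information between every pair of binary sensor streams and link the pairs that exceed a threshold. The Python sketch below is a stand-in illustration under that assumption, not the method of the cited paper; the sensor streams, the threshold of 0.1 bits, and both function names are hypothetical.

```python
import math

def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length binary sequences."""
    n = len(a)
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = sum(1 for x, y in zip(a, b) if x == va and y == vb) / n
            p_a = sum(1 for x in a if x == va) / n
            p_b = sum(1 for y in b if y == vb) / n
            if p_ab > 0:
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

def adjacency_matrix(streams, threshold=0.1):
    """Mark sensor pairs whose mutual information exceeds the threshold."""
    k = len(streams)
    return [[1 if i != j and mutual_information(streams[i], streams[j]) > threshold else 0
             for j in range(k)] for i in range(k)]

streams = [
    [1, 1, 0, 0, 1, 1, 0, 0],   # hypothetical kitchen motion sensor
    [1, 1, 0, 0, 1, 1, 0, 0],   # hypothetical kitchen door sensor (fires together)
    [1, 0, 1, 0, 1, 0, 1, 0],   # hypothetical bedroom sensor (unrelated firing pattern)
]
adj = adjacency_matrix(streams)
```

In this toy data the two kitchen sensors are linked and the bedroom sensor is isolated, which is the kind of structure that can suggest which sensors to combine for classification.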
Modelling of CASAS datasets
• Experiments based on dataset 11 (Kyoto Daily Life 2010)
• Existing methods:
  • Naïve Bayes, HMM, CRF (segmented data) Krishnan & Cook 2012
  • SVM, Decision trees (streaming data) Cook 2012
  • SOTA: ~80-90% accuracy in controlled environments, some transfer learning
• Two approaches:
  • Undirected models: Further work using Linear Chain CRFs
  • Directed models: Online Bayesian classifiers
Dataset: CASAS twor2009
Results
• 5-10% boost in classification performance
• Can help
  • when transferring to new sensor configuration
  • disambiguating multiple residents
Twomey, N., Diethe, T., Craddock, I. and Flach, P., 2016. Unsupervised learning of sensor topologies for improving activity recognition in smart environments. Neurocomputing.
Challenge 4: What to compute, and when?
HyperStream
• Software for streaming data
• High-level interfaces
• Complex interlinked workflows
• Online and offline execution modes
https://github.com/IRC-SPHERE/HyperStream
• Diethe, T., Twomey, N., Kull, M., Sokol, K., Song, H., Tonkin, E., & Flach, P. (2017). IRC-SPHERE/HyperStream: First public pre-release version. Zenodo. http://doi.org/10.5281/zenodo.242227
• General purpose tool
• Domain-independent
• “Compute-on-request”
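The "compute-on-request" idea can be sketched generically: a derived stream does no work until a time window is asked for, and it remembers what it has already computed. The Python below illustrates the concept only and does not use or imitate the HyperStream API; the class `LazyStream` and the readings are hypothetical.

```python
class LazyStream:
    """A derived stream that computes values only when a time window is requested."""

    def __init__(self, source, func):
        self.source = source      # dict mapping timestamp -> raw value
        self.func = func          # transformation applied to each raw value
        self.cache = {}           # results already computed, keyed by timestamp
        self.computations = 0     # how many values were actually computed

    def window(self, start, end):
        """Return transformed values with start <= timestamp < end."""
        result = {}
        for t in sorted(self.source):
            if start <= t < end:
                if t not in self.cache:
                    self.cache[t] = self.func(self.source[t])
                    self.computations += 1
                result[t] = self.cache[t]
        return result

raw = {0: 20.0, 1: 21.0, 2: 22.0, 3: 23.0}      # hypothetical temperature readings
doubled = LazyStream(raw, lambda v: v * 2)
first = doubled.window(0, 2)     # computes timestamps 0 and 1 only
second = doubled.window(1, 3)    # reuses timestamp 1, computes timestamp 2
```

After both requests only three values have been computed, and timestamp 3 has never been touched because nobody asked for it.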
Challenge 5: Tick-tock-tick-tock
Circular Statistics
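Times of day live on a circle, so ordinary averages can mislead. A minimal Python sketch of the circular mean follows, assuming times are given as hours on a 24-hour clock; the bedtimes are hypothetical.

```python
import math

def circular_mean_hours(hours):
    """Mean time of day in hours, computed on the 24-hour circle."""
    angles = [2 * math.pi * h / 24 for h in hours]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos) % (2 * math.pi)
    return mean_angle * 24 / (2 * math.pi)

bedtimes = [23.0, 1.0]                       # hypothetical bedtimes: 23:00 and 01:00
naive = sum(bedtimes) / len(bedtimes)        # arithmetic mean gives 12.0 (midday)
circular = circular_mean_hours(bedtimes)     # lands at midnight, up to rounding
```

The arithmetic mean places the "average bedtime" at midday, while the circular mean places it at midnight, which is why daily, weekly and seasonal patterns call for circular statistics.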
Further Challenges
Opportunities!
Opportunities
• True house-to-house transfer learning
• How well does this all work with multiple residents?
• What happens when houses or people change/move?
• Complex activities
• Sleep
• Real medical applications
Summing up …
Resources
• SPHERE Code:
  • https://github.com/IRC-SPHERE
• SPHERE Challenge Dataset
  • http://irc-sphere.ac.uk/sphere-challenge/home
  • http://bit.ly/sphere-challenge
• Infer.NET
  • http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
• MBML Book
  • http://mbmlbook.com/