data fusion and mining in sphere

Data fusion and mining in SPHERE: Challenges & Opportunities for Machine Learning

Tom Diethe
Intelligent Systems Laboratory, University of Bristol

irc-sphere.ac.uk

21st Century Healthcare Challenges
• UK: 1.4 million people aged over 85; by 2035 this will rise to 3.6 million
• Japan: will have the oldest population in human history by 2050 (52 yrs)
• China: a retired population larger than Europe's
• Ageing populations living with long-term health conditions: obesity, diabetes, depression, heart disease, dementia, …
• Technological solutions to fill the gap between expectations and the reality of healthcare

• Environmental: temperature, light level, humidity, air quality, water & electricity consumption
• Video: emotion, gait, activity, interaction
• Wearables: activity, sleep, etc.
• Contextual information: medical history, demographics
• Feedback: medical practitioner, users

Use Cases
• Clinician with an information need
• Hip injury example: how is their gait? Are they walking better?
• Causal relations: are there common patterns of behaviour that lead to health issues?
• Information back to the user: better health-enhancing choices
• Early warning system
• Disease progress/treatment effectiveness
• Many more …
• 100-home deployment underway

What do we want to learn?


Prediction and Modelling
• Who is in the house?
• What are they doing?
• When are activities happening?
• Where are these happening?
• Why does this matter?

Is this Big Data?
• Sensors
  • Heterogeneous
  • Noisy/intermittent
  • Different spatial/temporal resolutions
• Velocity ✔ Variety ✔ Volume ?

What's Important?
• Quantification of uncertainty
• Transparent models
• Online learning: models must adapt to changing habits
• How to incorporate medical history
• Daily/weekly/seasonal patterns
• Personalisation

Model-Based Machine Learning
• Model uncertainty using probabilities
  • frequency
  • belief
• Model contains variables and factors

1. Build a model
2. Incorporate observations
3. Perform inference over "latent" variables
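The three steps can be illustrated on about the simplest latent variable there is: the unknown bias of a coin. The Python sketch below is a minimal illustration assuming a Beta prior over the probability of heads. Because the Beta prior is conjugate to the coin-toss likelihood, inference reduces to adding counts. The names `BetaBelief`, `build_model` and `infer` are hypothetical, and this is not how Infer.NET expresses the steps.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over the latent probability that the coin lands heads."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def build_model(alpha: float = 1.0, beta: float = 1.0) -> BetaBelief:
    # Step 1: build a model. Beta(1, 1) is uniform: every bias is equally plausible.
    return BetaBelief(alpha, beta)


def infer(prior: BetaBelief, tosses: list) -> BetaBelief:
    # Steps 2 and 3: incorporate observations (True = heads) and infer the latent bias.
    # The Beta prior is conjugate to the Bernoulli likelihood, so the posterior is
    # another Beta whose parameters are the prior ones plus the head/tail counts.
    heads = sum(tosses)
    tails = len(tosses) - heads
    return BetaBelief(prior.alpha + heads, prior.beta + tails)


posterior = infer(build_model(), [True, True, False, True])
print(posterior, round(posterior.mean, 3))  # BetaBelief(alpha=4.0, beta=2.0) 0.667
```

The posterior is itself a distribution, so it carries the quantification of uncertainty listed as important above: with more tosses the Beta concentrates around the true bias.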


What is a Model?


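• A simulator programme:

    var random = new System.Random();
    bool A = random.NextDouble() > 0.5;  // fair coin: true with probability 0.5
    bool B = random.NextDouble() > 0.5;  // second, independent coin
    bool C = A & B;                      // true only when both coins come up heads

[Factor graph: variables A and B feed an AND (&) factor whose output is C]

• A set of assumptions:
  1. A coin has an equal chance of landing on heads or tails
  2. Coin tosses are independent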
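Because the simulator above has only four possible outcomes, questions about it can be answered exactly by enumerating them. The Python sketch below is a hypothetical illustration, not Infer.NET code. It encodes the two stated assumptions and computes both a forward query, P(C = true), and a "backwards" query, P(A = true | C = false). Learning that C is false lowers our belief that A is true from 1/2 to 1/3.

```python
from fractions import Fraction
from itertools import product

P_HEADS = Fraction(1, 2)  # assumption 1: each coin is fair


def joint():
    """Yield every outcome (a, b, c) of the simulator together with its probability."""
    for a, b in product([True, False], repeat=2):
        # Assumption 2: tosses are independent, so their probabilities multiply.
        p = (P_HEADS if a else 1 - P_HEADS) * (P_HEADS if b else 1 - P_HEADS)
        yield a, b, a and b, p


p_c = sum(p for a, b, c, p in joint() if c)
p_not_c = sum(p for a, b, c, p in joint() if not c)
p_a_given_not_c = sum(p for a, b, c, p in joint() if a and not c) / p_not_c
print(p_c, p_a_given_not_c)  # 1/4 1/3
```

Enumeration grows exponentially with the number of variables, which is why tools such as Infer.NET use approximate message-passing inference on the factor graph instead.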

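Rules of Probability

SUM RULE: $p(x) = \sum_{y} p(x, y)$

PRODUCT RULE: $p(x, y) = p(y \mid x)\, p(x)$

Bayes' Rule

$p(y \mid x) = \dfrac{p(x \mid y)\, p(y)}{p(x)}$

POSTERIOR: $p(y \mid x)$
LIKELIHOOD: $p(x \mid y)$
PRIOR: $p(y)$
NORMALISER: $p(x) = \sum_{y} p(x \mid y)\, p(y)$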
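To make the roles of the four terms concrete, the Python sketch below applies Bayes' rule to a discrete latent variable y for one observation x, computing the normaliser with the sum rule. The numbers are hypothetical: a prior belief that the resident is walking, and how likely a high-variance accelerometer reading is under each hypothesis.

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule over a discrete latent variable y, for one fixed observation x.

    prior[y]      = p(y)
    likelihood[y] = p(x | y) for the observed x
    """
    # Product rule: p(x, y) = p(x | y) p(y)
    joint = {y: likelihood[y] * prior[y] for y in prior}
    # Sum rule: the normaliser p(x) = sum over y of p(x, y)
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}


# Hypothetical numbers: is the resident walking, given a high-variance accelerometer reading?
prior = {"walking": 0.2, "not walking": 0.8}
likelihood = {"walking": 0.9, "not walking": 0.1}
post = posterior(prior, likelihood)
print({y: round(p, 3) for y, p in post.items()})
```

Even with a likelihood ratio of 9 in favour of walking, the low prior keeps the posterior below 0.7, which is the kind of reasoning a probabilistic model makes explicit.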

MBML Book
• Designed for the data scientist
• Case study based
• http://www.mbmlbook.com/

Experiments
• SPHERE House:
  • Scripted experiments
  • Medium-term stays (1-7 days)
  • Long stays (1-4 weeks)
• Full deployments

SPHERE House Script
<<<<Downstairs>>>>
Living Room
• Enter the room and close the door behind you
• Stand facing the mirror and jump twice
• Turn light on
• Go to the window and open and close the curtains
• Take off shoes
• Artificial activities (repeat 5 times, with 3 seconds between each):
  • stand to bend
  • bend to stand
  • stand to kneel
  • kneel to stand
  • stand to sit
  • sit to lie (back)
  • lie (back) to lie (side) (on sofa)
  • lie (side) to lie (back)
  • lie (back) to sit
  • sit to stand
• Cough
• Turn light off

SPHERE Challenge
• Task: predict posture and ambulation labels given the sensor data
• Accelerometer, RGB-D and environmental data

SPHERE Challenge
• Data was collected from a script in the SPHERE house
• 10 participants, ~20-30 minutes per script
• Even split between training and testing data
• Test data split into short 10-30 s sequences

https://www.drivendata.org/competitions/42/senior-data-science-safe-aging-with-sphere

bit.ly/sphere-challenge


Targets
• Each sequence was annotated at least twice
• Not all annotators agree all of the time:
  • start/end times of annotations may not be aligned
  • the label assigned to a time interval may differ
• Task: predict the mean annotation on a per-second basis, so the targets are probabilistic
• Localisation annotations were also provided (training data only)
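The slides do not say exactly how overlapping or misaligned intervals were combined into a per-second mean. The Python sketch below is one plausible reading, offered as an assumption: each annotator votes for each label in proportion to how much of the one-second window their interval covers, and the votes are averaged over annotators. The function name and the toy annotations are hypothetical.

```python
from collections import defaultdict


def per_second_targets(annotations, duration, labels):
    """Average several annotators' interval labels into per-second label probabilities.

    annotations: one list per annotator of (start_s, end_s, label) tuples.
    For second t, each annotator contributes the fraction of [t, t + 1) that
    they assign to each label; contributions are averaged over annotators.
    """
    targets = []
    for t in range(duration):
        totals = defaultdict(float)
        for intervals in annotations:
            for start, end, label in intervals:
                overlap = min(end, t + 1) - max(start, t)
                if overlap > 0:
                    totals[label] += overlap
        targets.append({lab: totals[lab] / len(annotations) for lab in labels})
    return targets


# Hypothetical example: two annotators disagree on when "walk" starts.
ann = [
    [(0, 2, "stand"), (2, 4, "walk")],
    [(0, 2.5, "stand"), (2.5, 4, "walk")],
]
targets = per_second_targets(ann, 4, ["stand", "walk"])
print(targets[2])  # {'stand': 0.25, 'walk': 0.75}
```

If an annotator leaves a gap, their fractions for that second sum to less than one, so a real pipeline would also need a policy for unannotated time.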


Challenge Participation
• ~80 teams
• > 100 participants
• ~400 total registrants
• > 770 individual submissions

Challenges 1 & 2: Shifting Sands

Humans are costly

Goal: Activity Recognition in Smart Homes

Deployment context differs from learning context (home/resident)

TRANSFER LEARNING

Labels costly and time-consuming to acquire

ACTIVE LEARNING

[Diagram: online learning and active learning, each combined with transfer learning (+ TRANSFER)]

Method
• Extension of the "Bayes Point Machine"
• Additional layer of hierarchy:
  • model "shared" and "individual" weights
  • can smoothly evolve from generic to personalised predictions
• Implemented using Infer.NET
  • http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
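The effect of the extra layer of hierarchy can be seen in a deliberately simplified setting. The Python sketch below is not the Bayes Point Machine, which uses a step (probit-style) likelihood and in Infer.NET is typically trained with expectation propagation. Instead it swaps in a one-dimensional Bayesian linear regression with Gaussian noise so that the posterior is closed-form. A new person's weight gets a Gaussian prior centred on a hypothetical shared weight, so predictions start generic and move towards that individual as their own labelled data arrives.

```python
def personalise(w_shared, prior_var, xs, ys, noise_var=1.0):
    """Posterior over one individual's weight w in y = w * x + Gaussian noise.

    The individual weight has prior N(w_shared, prior_var): with no personal
    data the prediction is the generic (shared) one, and as data arrives the
    posterior moves towards the individual's own fit.
    Returns (posterior mean, posterior variance).
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (w_shared / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision


w_shared = 1.0  # hypothetical weight learned across many people
xs = [1.0, 2.0, 1.5, 0.5, 2.5, 1.0]
ys = [2.0 * x for x in xs]  # hypothetical new person whose true weight is 2.0
for n in (0, 1, 3, 6):
    mean, var = personalise(w_shared, 0.5, xs[:n], ys[:n])
    print(n, round(mean, 3), round(var, 3))  # mean moves from 1.0 towards 2.0
```

The prior variance controls how quickly the model departs from the population: a small `prior_var` keeps predictions generic for longer, a large one personalises after very few examples.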


Accelerometer Data
• Source: 30 subjects, smartphone, 50 Hz, video annotations
• Target: 14 subjects, MotionNode, 100 Hz, Observer annotations
• Classes: walking upstairs vs. walking downstairs
• Features: 48 features based on the 'body' acceleration signal
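The 'body' acceleration is what remains after the slowly varying gravity component is removed from the raw signal. The Python sketch below illustrates that idea under stated assumptions: gravity is tracked with a first-order exponential low-pass filter (a simpler stand-in for the Butterworth filter often used for this, with an assumed smoothing factor), and each window is summarised with per-axis mean, standard deviation, minimum and maximum plus a signal magnitude area. This yields 13 features, not the 48 used in the experiments, whose exact definitions the slides do not give. Function names are hypothetical.

```python
import math
import statistics


def body_acceleration(samples, alpha=0.9):
    """Split raw tri-axial acceleration into gravity and body components.

    A first-order exponential low-pass filter (smoothing factor alpha, an
    assumed value) tracks the slowly varying gravity vector; subtracting it
    leaves the 'body' acceleration caused by the person's movement.
    """
    gravity = list(samples[0])
    body = []
    for sample in samples:
        gravity = [alpha * g + (1 - alpha) * s for g, s in zip(gravity, sample)]
        body.append([s - g for s, g in zip(sample, gravity)])
    return body


def window_features(body):
    """Per-axis summary statistics for one window of body acceleration."""
    features = []
    for axis in zip(*body):
        features += [statistics.fmean(axis), statistics.pstdev(axis), min(axis), max(axis)]
    # Signal magnitude area: mean over samples of the summed absolute axis values.
    features.append(statistics.fmean([sum(abs(v) for v in row) for row in body]))
    return features


# Hypothetical 2 s window at 50 Hz: gravity on z plus a 2 Hz oscillation on x.
window = [(0.1 * math.sin(2 * math.pi * 2 * i / 50), 0.0, 9.81) for i in range(100)]
feats = window_features(body_acceleration(window))
print(len(feats))  # 13
```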


https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

http://sipi.usc.edu/HAD/


[Figure panels: Online Learning; Active Transfer Learning]

The value of information (VOI) method fails to outperform uncertainty sampling (US)
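Uncertainty sampling, the baseline that VOI failed to beat here, is simple to state: ask the annotator to label whichever unlabelled instance the current model is least certain about. The Python sketch below shows that selection rule for a binary task such as walking upstairs vs. downstairs. The logistic model and its parameters are hypothetical placeholders for whatever classifier is being personalised.

```python
import math


def uncertainty_sample(pool, predict_proba):
    """Return the index of the unlabelled instance the model is least sure about.

    For a binary task, uncertainty is highest when the predicted probability
    of the positive class is closest to 0.5.
    """
    return min(range(len(pool)), key=lambda i: abs(predict_proba(pool[i]) - 0.5))


def predict_proba(x, w=1.5, b=-0.3):
    """Hypothetical current model: logistic regression on a single feature."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


pool = [-2.0, 0.1, 0.25, 1.8]
print(uncertainty_sample(pool, predict_proba))  # 2
```

After the queried label arrives, the model is updated online and the selection is repeated, which is how a handful of labels can be enough to personalise.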


Summary
• Bayesian framework very appealing
• Transfer boosts initial accuracy to 70%
• Active learning: ~5 instances to personalise

Diethe, T., Twomey, N. and Flach, P., 2016. Active transfer learning for activity recognition. ESANN.

Challenge 3: Where did I put my sensor?


Unsupervised learning of sensor topologies
• Signal processing and information-theoretic techniques
• Learn an adjacency matrix
• Enables us to determine combinations of sensors useful for classification
• Experiments using CASAS data: http://ailab.wsu.edu/casas/datasets/
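The slides do not specify which information-theoretic measure is used. As one illustration (an assumption, not necessarily the method of the paper), pairwise mutual information between sensors' binned on/off activations can be thresholded into an adjacency matrix, so that sensors which tend to fire together, such as a kitchen motion sensor and the kitchen door, become neighbours. The Python sketch uses hypothetical data and a hypothetical threshold.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information in bits between two aligned discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (pa[x] * pb[y]))
    return mi


def adjacency_matrix(streams, threshold=0.1):
    """Connect sensor pairs whose activation streams share more than `threshold` bits."""
    k = len(streams)
    return [
        [1 if i != j and mutual_information(streams[i], streams[j]) > threshold else 0
         for j in range(k)]
        for i in range(k)
    ]


# Hypothetical per-minute activations (1 = sensor fired in that minute).
kitchen = [1, 1, 0, 0, 1, 1, 0, 0]
door = [1, 1, 0, 0, 1, 1, 0, 0]     # always fires with the kitchen sensor
bedroom = [1, 0, 1, 0, 1, 0, 1, 0]  # statistically independent of the kitchen
print(adjacency_matrix([kitchen, door, bedroom]))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```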


Modelling of CASAS datasets
• Experiments based on dataset 11 (Kyoto Daily Life 2010)
• Existing methods:
  • Naïve Bayes, HMM, CRF (segmented data): Krishnan & Cook 2012
  • SVM, decision trees (streaming data): Cook 2012
  • State of the art (SOTA): ~80-90% accuracy in controlled environments, some transfer learning
• Two approaches:
  • Undirected models: further work using linear-chain CRFs
  • Directed models: online Bayesian classifiers
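As a sketch of what an online Bayesian classifier means in practice, the Python below implements categorical naive Bayes whose counts are updated one labelled event at a time, so the model can predict at any point without retraining from scratch. Naive Bayes is used here only because it is the shortest example and is not claimed to be the classifier used in these experiments. The features and activity labels are hypothetical.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Categorical naive Bayes updated incrementally, one labelled example at a time.

    Laplace (add-one) smoothing stops an unseen feature value from zeroing out a class.
    """

    def __init__(self, n_values_per_feature):
        self.n_values = n_values_per_feature  # number of possible values for each feature
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # keyed by (class, feature index, value)
        self.total = 0

    def update(self, x, y):
        self.total += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i, v)] += 1

    def log_posterior(self, x):
        """Unnormalised log p(y | x) for every class seen so far."""
        scores = {}
        for y, cy in self.class_counts.items():
            score = math.log(cy / self.total)
            for i, v in enumerate(x):
                score += math.log((self.feature_counts[(y, i, v)] + 1) / (cy + self.n_values[i]))
            scores[y] = score
        return scores

    def predict(self, x):
        scores = self.log_posterior(x)
        return max(scores, key=scores.get)


# Hypothetical features: (which sensor fired, time-of-day bucket)
model = OnlineNaiveBayes(n_values_per_feature=[3, 4])
stream = [((0, 1), "cook"), ((0, 1), "cook"), ((2, 3), "sleep"), ((2, 3), "sleep"), ((1, 2), "cook")]
for x, y in stream:
    model.update(x, y)
print(model.predict((2, 3)))  # sleep
```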

Dataset: CASAS twor2009


Results
• 5-10% boost in classification performance
• Can help:
  • when transferring to a new sensor configuration
  • when disambiguating multiple residents

Twomey, N., Diethe, T., Craddock, I. and Flach, P., 2016. Unsupervised learning of sensor topologies for improving activity recognition in smart environments. Neurocomputing.

Challenge 4: What to compute, and when?

HyperStream
• Software for streaming data
• High-level interfaces
• Complex interlinked workflows
• Online and offline execution modes
• General-purpose tool
• Domain-independent
• "Compute-on-request"

https://github.com/IRC-SPHERE/HyperStream

• Diethe, T., Twomey, N., Kull, M., Sokol, K., Song, H., Tonkin, E., & Flach, P. (2017). IRC-SPHERE/HyperStream: First public pre-release version. Zenodo. http://doi.org/10.5281/zenodo.242227
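"Compute-on-request" means a derived stream is only calculated for the time range someone actually asks for, and results already computed are reused. The Python sketch below illustrates that idea with integer time steps and a dictionary cache. The class and its arguments are hypothetical and are not HyperStream's API; the real interfaces are in the repository linked above.

```python
class ComputeOnRequest:
    """Derived stream that computes values only for the time steps that are requested.

    Results are cached, so a later request overlapping an earlier one only
    computes the missing steps. Purely illustrative; not the HyperStream API.
    """

    def __init__(self, source, tool):
        self.source = source  # function: time step -> raw value
        self.tool = tool      # function: raw value -> derived value
        self.cache = {}
        self.calls = 0        # how many times the tool actually ran

    def window(self, start, end):
        """Return derived values for time steps start, ..., end - 1."""
        for t in range(start, end):
            if t not in self.cache:
                self.cache[t] = self.tool(self.source(t))
                self.calls += 1
        return [self.cache[t] for t in range(start, end)]


stream = ComputeOnRequest(source=lambda t: t * 0.5, tool=lambda v: v * v)
stream.window(0, 10)
stream.window(5, 15)  # only steps 10-14 are new
print(stream.calls)  # 15
```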


Challenge 5: Tick-tock-tick-tock

Circular Statistics
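Time of day is periodic, so ordinary averages go wrong near midnight: the arithmetic mean of 22:00 and 04:00 is 13:00, whereas the sensible answer is 01:00. A standard remedy from circular statistics is to map each time to an angle on the 24-hour clock, average the corresponding unit vectors, and map the resulting direction back to an hour. The Python sketch below assumes times are given as decimal hours.

```python
import math


def circular_mean_hour(hours):
    """Mean time of day on a 24-hour clock, treating each time as an angle."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(s, c)  # direction of the summed unit vectors
    return (mean_angle * 24 / (2 * math.pi)) % 24


bedtimes = [22, 4]
print(sum(bedtimes) / len(bedtimes))             # 13.0 (arithmetic mean, misleading)
print(round(circular_mean_hour(bedtimes), 3))    # 1.0
```

The length of the summed vector divided by the number of samples (the mean resultant length) also gives a measure of how concentrated the times are, which is useful for spotting changes in daily routine.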


Further Challenges: Opportunities!

Opportunities
• True house-to-house transfer learning
• How well does this all work with multiple residents?
• What happens when houses or people change/move?
• Complex activities
• Sleep
• Real medical applications

Summing up …


https://www.youtube.com/watch?v=dsIxMBYoo84

Resources
• SPHERE code: https://github.com/IRC-SPHERE
• SPHERE Challenge dataset: http://irc-sphere.ac.uk/sphere-challenge/home, http://bit.ly/sphere-challenge
• Infer.NET: http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
• MBML book: http://mbmlbook.com/
