Business Impact From IoT? Just Add Data Science

Business impact from IoT? Just add data science. Sarah Aerni, Principal Data Scientist, Pivotal (@itweetsarah). Strata + Hadoop World, New York, September 30th. © Copyright 2015 Pivotal. All rights reserved.

Upload: pivotal

Post on 07-Jan-2017

TRANSCRIPT

Page 1: Business Impact From IoT? Just Add Data Science

1 © Copyright 2015 Pivotal. All rights reserved. 1 © Copyright 2013 Pivotal. All rights reserved.

Business impact from IoT? Just add data science

Sarah Aerni, Principal Data Scientist Pivotal @itweetsarah Strata + Hadoop World, New York September 30th

Page 2: Business Impact From IoT? Just Add Data Science

Our everyday devices are smart and talk to us

Page 3: Business Impact From IoT? Just Add Data Science

These devices are now talking to each other

Page 4: Business Impact From IoT? Just Add Data Science

How can connected devices in our home be smart enough to make daily life easier?

Page 5: Business Impact From IoT? Just Add Data Science

How can we know a tree has fallen on a power line before the residents complain?

Page 6: Business Impact From IoT? Just Add Data Science

How can we use data to help prevent accidents like the Macondo disaster?

Page 7: Business Impact From IoT? Just Add Data Science

How does this…

Page 8: Business Impact From IoT? Just Add Data Science

How does this… …become this?

Page 9: Business Impact From IoT? Just Add Data Science

How does this… …become this?

By recognizing this

Page 10: Business Impact From IoT? Just Add Data Science

In all industries billions of data points represent opportunities for the Internet of Things

•  Gene Sequencing: cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014

•  Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive

•  Stock Market

•  Social Media: Facebook uploads 250 million photos each day

•  Oil Exploration: oil rigs generate 25,000 data points per second

•  Video Surveillance

•  Medical Imaging

•  Mobile Sensors
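The 3000x figure for smart meters on this slide is roughly what a simple count gives, assuming the baseline is one manual meter reading per month. The slide does not state the baseline, so the comparison below is an assumption, and the function name is hypothetical.

```python
# Assumption (not stated on the slide): the baseline is one manual
# meter reading per month, and a month is taken as 30 days.
MINUTES_PER_DAY = 24 * 60
READ_INTERVAL_MINUTES = 15
DAYS_PER_MONTH = 30


def readings_per_month(interval_minutes: int, days: int = DAYS_PER_MONTH) -> int:
    """Number of readings one meter produces in a month at a fixed interval."""
    return (MINUTES_PER_DAY // interval_minutes) * days


smart_meter_reads = readings_per_month(READ_INTERVAL_MINUTES)
baseline_reads = 1  # assumed: one reading per month
ratio = smart_meter_reads / baseline_reads
print(f"{smart_meter_reads} readings per month, about {ratio:.0f}x one monthly reading")
```

Under these assumptions the ratio is 2880, which rounds to the order of magnitude quoted on the slide.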

Page 11: Business Impact From IoT? Just Add Data Science

To realize this opportunity requires the right tools and techniques

Sensors & Actuators

Page 12: Business Impact From IoT? Just Add Data Science

To realize this opportunity requires the right tools and techniques

Sensors & Actuators

Data Lake

Page 13: Business Impact From IoT? Just Add Data Science

To realize this opportunity requires the right tools and techniques

Problem Formulation

Data Science for Building Models

Sensors & Actuators

Data Lake

Page 14: Business Impact From IoT? Just Add Data Science

To realize this opportunity requires the right tools and techniques

Problem Formulation

Data Step

Data Science for Building Models

Sensors & Actuators

Data Lake

Page 15: Business Impact From IoT? Just Add Data Science

To realize this opportunity requires the right tools and techniques

Problem Formulation

Modeling Step

Data Step

Data Science for Building Models

Sensors & Actuators

Data Lake

Page 16: Business Impact From IoT? Just Add Data Science

To realize this opportunity requires the right tools and techniques

Problem Formulation

Modeling Step

Data Step Application Step

Data Science for Building Models

Sensors & Actuators

Data Lake

Page 17: Business Impact From IoT? Just Add Data Science

What does it take to build data-driven models?

•  Data Cleansing

•  Libraries to Build Models At-Scale

•  Feature Engineering

Page 18: Business Impact From IoT? Just Add Data Science

What does it take to build data-driven models?

•  Data Cleansing

•  Libraries to Build Models At-Scale

•  Feature Engineering

Treating Patients, Vaccine Manufacturing, Oil Drilling

Page 19: Business Impact From IoT? Just Add Data Science

What does it take to build data-driven models?

•  Data Cleansing

•  Libraries to Build Models At-Scale

•  Feature Engineering

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

Treating Patients, Vaccine Manufacturing, Oil Drilling

Page 20: Business Impact From IoT? Just Add Data Science

What does it take to build data-driven models?

•  Data Cleansing

•  Libraries to Build Models At-Scale

•  Feature Engineering

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

Treating Patients, Vaccine Manufacturing, Oil Drilling

Vaccine Manufacturing

Page 21: Business Impact From IoT? Just Add Data Science

Opportunities for Data-Driven Decisions in Pharma

Page 22: Business Impact From IoT? Just Add Data Science

A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

Page 23: Business Impact From IoT? Just Add Data Science

A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

Sensors: Temp, Time, Absorbance, Elution volume, Velocity, Time

Page 24: Business Impact From IoT? Just Add Data Science

A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

Temp, Time, Absorbance, Elution volume, Velocity, Time

•  What opportunities exist for intervention, correction?

•  Which attributes should be used as features in a model?

•  When is the appropriate time to take action?

Page 25: Business Impact From IoT? Just Add Data Science

A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

Temp, Time, Absorbance, Elution volume, Velocity, Time

•  What opportunities exist for intervention, correction?

•  Which attributes should be used as features in a model?

•  When is the appropriate time to take action?

>6 months

Page 26: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: predicted potency versus true potency]

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

Page 27: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: scatter plot of noisy sensor readings; axes labeled Temperature and Time (plot labels df$ts_utc and df$wob)]

[Figure: predicted potency versus true potency]

Page 28: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: scatter plot of noisy sensor readings; axes labeled Temperature and Time (plot labels df$ts_utc and df$wob)]

•  Deriving signal from noisy sensor data requires data cleansing

Page 29: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: scatter plot of noisy sensor readings; axes labeled Temperature and Time (plot labels df$ts_utc and df$wob)]

•  Deriving signal from noisy sensor data requires data cleansing

Page 30: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: scatter plot of noisy sensor readings; axes labeled Temperature and Time (plot labels df$ts_utc and df$wob)]

•  Deriving signal from noisy sensor data requires data cleansing

Page 31: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: scatter plot of noisy sensor readings; axes labeled Temperature and Time (plot labels df$ts_utc and df$wob)]

A cleansing approach: use average across a window

•  Deriving signal from noisy sensor data requires data cleansing

•  Window functions in SQL allow us to perform smoothing seamlessly, at-scale
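A minimal sketch of the window-average cleansing idea, written in Python for illustration. The slides perform this step with SQL window functions inside the database; the function name `moving_average`, the window size, and the sample readings are assumptions and do not come from the talk.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a series with a trailing moving average.

    Each output point is the mean of the current reading and up to
    `window - 1` readings before it. For window = 3 this mirrors a SQL
    frame such as ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the reading that just fell out of the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


# Hypothetical noisy temperature readings (made up for illustration).
readings = [15.0, 18.0, 12.0, 16.0, 20.0, 14.0]
print(moving_average(readings, window=3))
```

A trailing window is used here so each smoothed point depends only on past readings; a centered window would smooth more symmetrically but needs future values.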

Page 32: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: scatter plot of noisy sensor readings; axes labeled Temperature and Time (plot labels df$ts_utc and df$wob)]

•  Deriving signal from noisy sensor data requires data cleansing

•  Window functions in SQL allow us to perform smoothing seamlessly, at-scale

Page 33: Business Impact From IoT? Just Add Data Science

How can noisy data create meaningful models?

[Figure: scatter plot of noisy sensor readings; axes labeled Temperature and Time (plot labels df$ts_utc and df$wob)]

•  Deriving signal from noisy sensor data requires data cleansing

•  Window functions in SQL allow us to perform smoothing seamlessly, at-scale

•  Test many hypotheses in parallel to examine if features have an effect on potency
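One way to picture testing many hypotheses in parallel is to score every candidate feature against potency at the same time. The sketch below assumes Pearson correlation as the per-feature test and a thread pool as the parallel mechanism; the talk does not specify either, and the feature names and numbers are made up.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def screen_features(features: Dict[str, List[float]], potency: List[float]) -> Dict[str, float]:
    """Score every candidate feature against potency concurrently.

    Threads only illustrate the pattern; in the talk the parallelism
    comes from running the tests inside the database.
    """
    names = list(features)
    with ThreadPoolExecutor() as pool:
        scores = pool.map(lambda name: pearson_r(features[name], potency), names)
        return dict(zip(names, scores))


# Hypothetical per-batch measurements (made up for illustration).
potency = [1.0, 2.0, 3.0, 4.0]
features = {
    "incubation_temp": [2.0, 4.0, 6.0, 8.0],  # rises with potency
    "step_duration": [4.0, 3.0, 2.0, 1.0],    # falls as potency rises
}
print(screen_features(features, potency))
```

A high correlation flags a feature for closer inspection; it does not by itself show that changing the feature would change potency.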

Page 34: Business Impact From IoT? Just Add Data Science

Interpreting the utility of a measure obtained during manufacturing based on model outcomes

Building insights from models

•  Some features may reveal tunable parameters to alter potency; others may simply be markers

[Figure: two scatter plots: potency versus assayed value (correlation = 0.45) and potency versus duration of a step (correlation = 0.38)]

Page 35: Business Impact From IoT? Just Add Data Science

What does it take to build data-driven models?

•  Data Cleansing

•  Libraries to Build Models At-Scale

•  Feature Engineering

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

Treating Patients, Vaccine Manufacturing, Oil Drilling

Treating Patients

Page 36: Business Impact From IoT? Just Add Data Science

Internet of Things in Healthcare

Improving Patient Outcomes and Increasing Efficiency

Page 37: Business Impact From IoT? Just Add Data Science

Beyond monitor alerts for crashing patients: prediction means prevention

Powering the Connected Hospital

Clinical Narratives

Page 38: Business Impact From IoT? Just Add Data Science

Use Cases in Healthcare

Building a case for leveraging data and data science within a hospital setting

SAMPLE USE CASES

•  Prevent unnecessary ED visits using air quality and patient histories to anticipate needed prescription refills

•  Avoid keeping patients longer than needed due to poor coordination by predicting patient length-of-stay, leading to on-time planning

•  Early alerts for deteriorating patients to increase monitoring and improve outcomes

•  Prevent discharging patients prematurely through patient readmission models

•  Improve treatment pathways via mortality models for sepsis

INFLUENCE CHANGE by finding drivers in the models

IMPROVE customer MODELS using data-driven approaches

LEVERAGE previously inaccessible DATA sources

Environment Approach Insights

Page 39: Business Impact From IoT? Just Add Data Science

Data & Platform Overview

Pivotal HD

Pivotal HAWQ

DATA PLATFORM

TOOLS

•  Data obtained from EPIC

•  Total unique encounters: 242,312,567

•  Total unique patient IDs: 11,195,934

•  Encounters from 6 healthcare settings (including hospitals, skilled nursing facilities, ambulance and dialysis)

   –  8 total hospitals used in LOS

   –  2 regions

•  9 years of data

EPIC

DIAGNOSES PROCEDURES

LABORATORY VALUES

MONITOR FEEDS

BED OCCUPANCY

ORDERS

Page 40: Business Impact From IoT? Just Add Data Science

Engineering over 300 features to improve models

Simple SQL enables rapid generation of many creative features

•  Processing performed in the database without having to move the data with very simple SQL code

•  Reduced time to generate and examine features enables rapid iterations

•  Test hypotheses rapidly to examine if features have an effect on LOS

Patient Demographics

Patient Medical History

Current Admission

Prior Hospitalizations

ED Stay

Outpatient Utilization

Hospital Attributes

Lab Results (last 72 hrs)
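
As a rough illustration of in-database feature generation, the sketch below uses Python's built-in sqlite3 module as a stand-in for the actual data platform (HAWQ is not used here). The table names (`encounters`, `labs`), their columns and all values are hypothetical; only the idea of computing features with plain SQL where the data lives is taken from the slide.

```python
import sqlite3

# Hypothetical toy schema; the real EHR tables are not described in the talk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE encounters (patient_id INT, encounter_id INT, admit_hour INT);
CREATE TABLE labs (encounter_id INT, hours_before_now REAL, value REAL);
INSERT INTO encounters VALUES (1, 10, 0), (1, 11, 500), (2, 12, 480);
INSERT INTO labs VALUES (11, 5, 1.2), (11, 30, 1.8), (11, 100, 0.9), (12, 10, 2.0);
""")

def build_features(conn):
    """Return {encounter_id: (prior_hospitalizations, mean_lab_last_72h)}."""
    rows = conn.execute("""
        SELECT e.encounter_id,
               -- number of earlier encounters for the same patient
               (SELECT COUNT(*) FROM encounters p
                 WHERE p.patient_id = e.patient_id
                   AND p.admit_hour < e.admit_hour) AS prior_hospitalizations,
               -- average lab value restricted to the last 72 hours
               (SELECT AVG(l.value) FROM labs l
                 WHERE l.encounter_id = e.encounter_id
                   AND l.hours_before_now <= 72) AS mean_lab_last_72h
          FROM encounters e
         ORDER BY e.encounter_id
    """).fetchall()
    return {r[0]: (r[1], r[2]) for r in rows}

features = build_features(conn)
```

Each new feature here is one more SQL expression in the same query, which is the kind of short iteration loop the slide describes. An encounter with no lab rows gets `None` for the lab feature.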

Page 41: Business Impact From IoT? Just Add Data Science

Understanding drivers of length of stay through model interpretation
Model Results and Insights into Patient Outcomes

Data-driven approaches improved model fit by 66% and predict patient length of stay in the hospital to within 22 hours of true discharge (on average).

Patient history offers less information for AMI (acute myocardial infarction)
Recent observations (from the current admission), labs and hospital features are more predictive of length of stay than patient medical history.

Figure: Variance Explained When Category Excluded. Categories excluded: Current Admission, Lab, Medical History, Demographics, Hospital, None (complete model).

Feature categories: Patient Demographics, Patient Medical History, Current Admission, Prior Hospitalizations, ED Stay, Outpatient Utilization, Hospital Attributes, Lab Results (last 72 hrs)
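
One way to produce a chart like "Variance Explained When Category Excluded" is a leave-one-group-out comparison: refit the model without one feature category and compare the resulting R². The talk does not state which model or procedure was used, so the sketch below is only an assumption-laden illustration using ordinary least squares in NumPy on synthetic data; the function names, the category-to-column mapping and the data are all made up.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def variance_explained_when_excluded(X, y, groups):
    """groups maps a category name to the column indices it owns."""
    all_cols = sorted(c for cols in groups.values() for c in cols)
    result = {"None (complete model)": r_squared(X[:, all_cols], y)}
    for name, cols in groups.items():
        kept = [c for c in all_cols if c not in cols]
        result[name] = r_squared(X[:, kept], y)
    return result

# Synthetic data (assumption): the outcome depends strongly on column 0,
# weakly on column 2, and not at all on column 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(scale=0.5, size=500)
groups = {"Current Admission": [0], "Medical History": [1], "Demographics": [2]}
scores = variance_explained_when_excluded(X, y, groups)
```

A category whose removal causes a large drop relative to the complete model carries information the other categories cannot replace.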

Page 42: Business Impact From IoT? Just Add Data Science

Insight into hospital operations
Length of stay is not only biology. Admission time, day of week, a hospital's size and a hospital's experience with cardiology matter.

Figures: number of admissions by hour of the day; number of discharges by hour of the day.

Page 43: Business Impact From IoT? Just Add Data Science

What does it take to build data-driven models?

Topics: Data Cleansing; Feature Engineering; Libraries to Build Models At-Scale; Derive insight from models to change processes; Tradeoffs between model accuracy and timeliness

Case studies: Treating Patients; Vaccine Manufacturing; Oil Drilling

Page 44: Business Impact From IoT? Just Add Data Science

Data: The New Oil
IoT in Oil & Gas

Page 45: Business Impact From IoT? Just Add Data Science

Predictive Maintenance

•  Failure costs estimated at $150,000/incident (billions annually)*
•  Oil & gas generates large amounts of data from sensors, enabling data-driven approaches to improve operations
•  Goals
   –  Early warning system
   –  Insights into prominent features impacting operation and failure
   –  Reduction of non-productive drill time
   –  Reduced incidents
•  But how do we build models?

Image: Drilling into the San Andreas Fault at Parkfield, California. Credit: Stephen H. Hickman, USGS

*http://blog.pivotal.io/pivotal/case-studies-2/data-as-the-new-oil-producing-value-for-the-oil-gas-industry

Page 50: Business Impact From IoT? Just Add Data Science

How are models built using sensor data?

Predictive use cases | Class of model | Specific models
Predict equipment failure in time window | Classification | Logistic Regression; Support Vector Machines; Random Forest
Predict remaining life of equipment | Survival | Cox Proportional Hazards Regression
Predict rate-of-penetration in drilling | Regression | Linear Regression; Elastic Net Regularized Regression (Gaussian); Random Forest
Identifying similar drilling sites | Clustering | K-means; Spectral clustering

Page 52: Business Impact From IoT? Just Add Data Science

How are models built using BIG sensor data?

Predictive use cases | Class of model | Specific models
Predict equipment failure in time window | Classification | Logistic Regression; Support Vector Machines; Random Forest
Predict remaining life of equipment | Survival | Cox Proportional Hazards Regression
Predict rate-of-penetration in drilling | Regression | Linear Regression; Elastic Net Regularized Regression (Gaussian); Random Forest
Identifying similar drilling sites | Clustering | K-means; Spectral clustering

Oil & Gas industries may produce billions of data points across thousands of sensors. Most implementations of these algorithms do not scale.

Page 53: Business Impact From IoT? Just Add Data Science

Linear Regression: Streaming Algorithm

•  Finding linear dependencies between variables
   –  $\mathrm{ROP} = c_0 + c_{\mathrm{WOB}} \cdot \mathrm{WOB}$

Figure: Rate of Penetration (ROP) plotted against Weight on Bit (WOB).

Page 55: Business Impact From IoT? Just Add Data Science

Linear Regression: Streaming Algorithm

•  Finding linear dependencies between variables

•  How to compute with a single scan?

Figure: Rate of Penetration (ROP) plotted against Weight on Bit (WOB).
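
A standard way to fit a linear regression in a single scan is to accumulate the sufficient statistics X^T X and X^T y row by row, then solve the normal equations once at the end. The sketch below illustrates that idea with NumPy. It is not MADlib's implementation; the class name is hypothetical and the WOB/ROP readings are made up.

```python
import numpy as np

class StreamingLinearRegression:
    """Accumulates X^T X and X^T y one row at a time (a single pass over the data)."""

    def __init__(self, n_features):
        d = n_features + 1                  # +1 for the intercept c0
        self.xtx = np.zeros((d, d))
        self.xty = np.zeros(d)

    def update(self, x, y):
        # Prepend a constant 1 so the first coefficient acts as the intercept.
        row = np.concatenate(([1.0], np.atleast_1d(x)))
        self.xtx += np.outer(row, row)
        self.xty += row * y

    def coefficients(self):
        # Solve (X^T X) c = X^T y; returns [c0, c1, ...].
        return np.linalg.solve(self.xtx, self.xty)

# Made-up readings that follow ROP = 10 + 4 * WOB exactly.
model = StreamingLinearRegression(n_features=1)
for wob in [0.0, 1.0, 2.0, 5.0, 8.0]:
    model.update(wob, 10.0 + 4.0 * wob)
c0, c_wob = model.coefficients()
```

Memory use depends only on the number of variables, not on the number of rows, so each row can be discarded as soon as it has been added to the running sums.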

Page 58: Business Impact From IoT? Just Add Data Science

Linear Regression: Parallel Computation

Segment 1 Segment 2
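
Because the sums X^T X and X^T y are additive, each segment can compute them over its own rows and the partial results can then be added together before solving. The following sketch simulates two segments in one process with NumPy; the function names and the synthetic data are assumptions made for illustration, not the library's actual code.

```python
import numpy as np

def segment_stats(X, y):
    """Partial sums computed locally on one segment: (X^T X, X^T y)."""
    A = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    return A.T @ A, A.T @ y

def merge(stats_a, stats_b):
    """Combining two segments is just element-wise addition."""
    return stats_a[0] + stats_b[0], stats_a[1] + stats_b[1]

def solve(stats):
    xtx, xty = stats
    return np.linalg.solve(xtx, xty)

# Synthetic data (assumption): intercept 1, slopes 2 and -3, small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = 1.0 + X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=1000)

# Pretend the rows live on two segments.
seg1 = segment_stats(X[:400], y[:400])
seg2 = segment_stats(X[400:], y[400:])
coef_parallel = solve(merge(seg1, seg2))
coef_single = solve(segment_stats(X, y))
```

The merged result matches the single-node fit up to floating-point rounding, and only the small per-segment matrices need to cross the network.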

Page 59: Business Impact From IoT? Just Add Data Science

Linear regression on 10 million rows in seconds

Figure: Execution time (s) versus number of independent variables, for 6, 12, 18 and 24 segments.

Hellerstein, Joseph M., et al. "The MADlib analytics library: or MAD skills, the SQL." Proceedings of the VLDB Endowment 5.12 (2012): 1700-1711.

Page 60: Business Impact From IoT? Just Add Data Science

BIG DATA MACHINE LEARNING IN SQL
http://madlib.net/

Predictive Modeling Library

Linear Systems
•  Sparse and Dense Solvers

Matrix Factorization
•  Singular Value Decomposition (SVD)
•  Low-Rank

Generalized Linear Models
•  Linear Regression
•  Logistic Regression
•  Multinomial Logistic Regression
•  Cox Proportional Hazards Regression
•  Elastic Net Regularization
•  Sandwich Estimators (Huber-White, clustered, marginal effects)

Machine Learning Algorithms
•  Principal Component Analysis (PCA)
•  Association Rules (Affinity Analysis, Market Basket)
•  Topic Modeling (Parallel LDA)
•  Decision Trees
•  Ensemble Learners (Random Forests)
•  Support Vector Machines
•  Conditional Random Field (CRF)
•  Clustering (K-means)
•  Cross Validation

Descriptive Statistics

Sketch-based Estimators
•  CountMin (Cormode-Muthukrishnan)
•  FM (Flajolet-Martin)
•  MFV (Most Frequent Values)

Correlation
Summary

Support Modules

Array Operations; Sparse Vectors; Random Sampling; Probability Functions; PMML Export

Page 61: Business Impact From IoT? Just Add Data Science

What does it take to build data-driven models?

Topics: Data Cleansing; Feature Engineering; Libraries to Build Models At-Scale; Derive insight from models to change processes; Tradeoffs between model accuracy and timeliness

Case studies: Treating Patients; Vaccine Manufacturing; Oil Drilling

Page 62: Business Impact From IoT? Just Add Data Science

Figure: calendar for September 2013.

Page 66: Business Impact From IoT? Just Add Data Science

Figure: calendars for September, October, November and December 2013, labeled "A Snapshot".

Page 67: Business Impact From IoT? Just Add Data Science

Figure: calendars for September, October, November and December 2013, labeled "A Snapshot" and "Another Snapshot".

Page 68: Business Impact From IoT? Just Add Data Science

CDC, 2011: Number of Health Care Visits Per Year (Age Adjusted)

Figure: calendars for September, October, November and December 2013, labeled "A Snapshot" and "Another Snapshot".

Page 69: Business Impact From IoT? Just Add Data Science

The Promise of Internet of Humans

•  Smart contact lenses and sensors to identify and alert patients before catastrophic events (e.g. blood sugar drop for diabetics)
•  Wearables to track patient disease progression using objective measures
•  Track patient adherence
•  Detect disease outbreaks using sequencing in sewer system samples
•  ECG monitoring on mobile phones for early alerting of stroke

Page 70: Business Impact From IoT? Just Add Data Science

http://blog.pivotal.io/data-science-pivotal

Check out the Pivotal Data Science Blog!

Page 75: Business Impact From IoT? Just Add Data Science

FOR FURTHER INFO, CHECK OUT…

•  Join us at our MeetUp tomorrow at 6:30 PM
   MADlib + HAWQ for advanced SQL machine learning on Hadoop
   Pivotal Labs, 625 Avenue of the Americas, 2nd
   http://www.meetup.com/Pivotal-NY/events/225074025/

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Academy @ https://pivotal.biglms.com