business impact from iot? just add data science

Post on 07-Jan-2017


1 © Copyright 2015 Pivotal. All rights reserved. 1 © Copyright 2013 Pivotal. All rights reserved.

Business impact from IoT? Just add data science

Sarah Aerni, Principal Data Scientist, Pivotal (@itweetsarah)

Strata + Hadoop World, New York, September 30th


Our everyday devices are smart and talk to us


These devices are now talking to each other


How can connected devices in our home be smart enough to make daily life easier?


How can we know a tree has fallen on a power line before the residents complain?


How can we use data to help prevent accidents like the Macondo Disaster?


How does this…


How does this… …become this?


How does this… …become this?

By recognizing this


In all industries billions of data points represent opportunities for the Internet of Things

Gene Sequencing: COST TO SEQUENCE ONE GENOME HAS FALLEN FROM $100M IN 2001 TO $10K IN 2011 TO $1K IN 2014

Smart Grids: READING SMART METERS EVERY 15 MINUTES IS 3000X MORE DATA INTENSIVE

Social Media: FACEBOOK UPLOADS 250 MILLION PHOTOS EACH DAY

Oil Exploration: OIL RIGS GENERATE 25000 DATA POINTS PER SECOND

Stock Market

Video Surveillance

Medical Imaging

Mobile Sensors


To realize this opportunity requires the right tools and techniques

Sensors & Actuators


To realize this opportunity requires the right tools and techniques

Sensors & Actuators

Data Lake


To realize this opportunity requires the right tools and techniques

Problem Formulation

Data Science for Building Models

Sensors & Actuators

Data Lake


To realize this opportunity requires the right tools and techniques

Problem Formulation

Data Step

Data Science for Building Models

Sensors & Actuators

Data Lake


To realize this opportunity requires the right tools and techniques

Problem Formulation

Modeling Step

Data Step

Data Science for Building Models

Sensors & Actuators

Data Lake


To realize this opportunity requires the right tools and techniques

Problem Formulation

Modeling Step

Data Step

Application Step

Data Science for Building Models

Sensors & Actuators

Data Lake


What does it take to build data-driven models?

Data Cleansing

Libraries to Build Models At-Scale

Feature Engineering


Treating Patients

What does it take to build data-driven models?

Data Cleansing

Libraries to Build Models At-Scale

Feature Engineering

Vaccine Manufacturing Oil Drilling


Treating Patients

What does it take to build data-driven models?

Data Cleansing

Libraries to Build Models At-Scale

Feature Engineering

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

Vaccine Manufacturing Oil Drilling


Treating Patients

What does it take to build data-driven models?

Data Cleansing

Libraries to Build Models At-Scale

Feature Engineering

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

Vaccine Manufacturing Oil Drilling Vaccine Manufacturing


Opportunities for Data-Driven Decisions in Pharma


A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product


A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

Sensors

[Figure: sensor plots of temperature vs. time, absorbance vs. elution volume, and velocity vs. time]


A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

[Figure: sensor plots of temperature vs. time, absorbance vs. elution volume, and velocity vs. time]

•  What opportunities exist for intervention, correction?

•  Which attributes should be used as features in a model?

•  When is the appropriate time to take action?


A pipeline of sensors and opportunities for optimizing output

Internet of Things in Manufacturing

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product

[Figure: sensor plots of temperature vs. time, absorbance vs. elution volume, and velocity vs. time]

•  What opportunities exist for intervention, correction?

•  Which attributes should be used as features in a model?

•  When is the appropriate time to take action?

>6 months


How can noisy data create meaningful models?

[Figure: predicted potency plotted against true potency]

Input materials → Mix → Incubate → Filter → Centrifuge → Final Product


How can noisy data create meaningful models?

[Figure: scatter plot of temperature against time; raw plot axes df$ts_utc (00:00 to 00:00 in 10:00 steps) and df$wob (10, 15, 20)]

[Figure: predicted potency plotted against true potency]


How can noisy data create meaningful models?

[Figure: scatter plot of temperature against time; raw plot axes df$ts_utc (00:00 to 00:00 in 10:00 steps) and df$wob (10, 15, 20)]

•  Deriving signal from noisy sensor data requires data cleansing


How can noisy data create meaningful models?

[Figure: scatter plot of temperature against time, two point series; raw plot axes df$ts_utc (00:00 to 00:00 in 10:00 steps) and df$wob (10, 15, 20)]

•  Deriving signal from noisy sensor data requires data cleansing


How can noisy data create meaningful models?

[Figure: scatter plot of temperature against time, two point series; raw plot axes df$ts_utc (00:00 to 00:00 in 10:00 steps) and df$wob (10, 15, 20)]

•  Deriving signal from noisy sensor data requires data cleansing


How can noisy data create meaningful models?

[Figure: scatter plot of temperature against time, two point series; raw plot axes df$ts_utc (00:00 to 00:00 in 10:00 steps) and df$wob (10, 15, 20)]

A cleansing approach: use average across a window

•  Deriving signal from noisy sensor data requires data cleansing

•  Window functions in SQL allow us to perform smoothing seamlessly, at-scale
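The windowed-average cleansing step named on this slide can be sketched in a few lines. The slide describes doing this with SQL window functions (in standard SQL, something like `AVG(value) OVER (ORDER BY ts ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)`); the Python below is only an illustrative equivalent, not the code used in the talk. The function name `moving_average`, the window size and the readings are all assumptions made for the example.

```python
from collections import deque


def moving_average(values, window):
    """Return the trailing moving average of `values`.

    Each output point is the mean of the current reading and up to
    `window - 1` readings before it, so the output has the same length
    as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # readings currently inside the window
    running_sum = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # oldest reading is about to drop out
        recent.append(value)
        running_sum += value
        smoothed.append(running_sum / len(recent))
    return smoothed


# Hypothetical temperature readings with two obvious spikes.
readings = [15.0, 15.2, 19.8, 15.1, 14.9, 10.3, 15.0, 15.1]
smoothed = moving_average(readings, window=3)
```

A wider window suppresses more noise but also delays and flattens real changes in the signal, so the window size is a modeling choice rather than a fixed constant.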


How can noisy data create meaningful models?

[Figure: scatter plot of temperature against time; raw plot axes df$ts_utc (00:00 to 00:00 in 10:00 steps) and df$wob (10, 15, 20)]

•  Deriving signal from noisy sensor data requires data cleansing

•  Window functions in SQL allow us to perform smoothing seamlessly, at-scale


How can noisy data create meaningful models?

[Figure: scatter plot of temperature against time; raw plot axes df$ts_utc (00:00 to 00:00 in 10:00 steps) and df$wob (10, 15, 20)]

•  Deriving signal from noisy sensor data requires data cleansing

•  Window functions in SQL allow us to perform smoothing seamlessly, at-scale

•  Test many hypotheses in parallel to examine if features have an effect on potency


Interpreting the utility of a measure obtained during manufacturing based on model outcomes

Building insights from models

•  Some features may reveal tunable parameters to alter potency, others may simply be markers

[Figure: two scatter plots of potency, one against assayed value (correlation = 0.45) and one against duration of a step (correlation = 0.38)]
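Correlations like the ones reported on this slide can be computed for many candidate features at once. The following is a minimal sketch using the Pearson coefficient; the slide does not say which correlation measure was used, and the feature names and batch values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-batch measurements; names and numbers are made up.
potency = [0.82, 0.91, 0.78, 0.95, 0.88, 0.80]
features = {
    "assayed_value": [4.1, 4.8, 4.3, 4.9, 4.2, 4.0],
    "step_duration_min": [61, 66, 58, 64, 70, 59],
    "batch_index": [1, 2, 3, 4, 5, 6],
}

# Score every candidate feature against potency and rank by strength.
scores = {name: pearson(values, potency) for name, values in features.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

A high score only flags a feature for follow-up. As the slide notes, it may be a tunable parameter or merely a marker, and the correlation alone cannot tell the two apart.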


Treating Patients

What does it take to build data-driven models?

Data Cleansing

Libraries to Build Models At-Scale

Feature Engineering

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

Vaccine Manufacturing Oil Drilling Treating Patients


Internet of Things in Healthcare

Improving Patient Outcomes and Increasing Efficiency


Beyond monitor alerts for crashing patients: prediction means prevention

Powering the Connected Hospital

Clinical Narratives


Use Cases in Healthcare

Building a case for leveraging data and data science within a hospital setting

SAMPLE USE CASES

•  Prevent unnecessary ED visits using air quality and patient histories to anticipate needed prescription refills

•  Avoid keeping patients longer than needed due to poor coordination by predicting patient length-of-stay, leading to on-time planning

•  Early alerts for deteriorating patients to increase monitoring and improve outcomes

•  Prevent discharging patients prematurely through patient readmission models

•  Improve treatment pathways via mortality models for sepsis

Environment: LEVERAGE previously inaccessible DATA sources

Approach: IMPROVE customer MODELS using data-driven approaches

Insights: INFLUENCE CHANGE by finding drivers in the models


Data & Platform Overview

Pivotal HD

Pivotal HAWQ

DATA PLATFORM

TOOLS

•  Data obtained from EPIC

•  Total unique encounters: 242,312,567

•  Total unique patient IDs: 11,195,934

•  Encounters from 6 healthcare settings (including hospitals, skilled nursing facilities, ambulance and dialysis)

   –  8 total hospitals used in LOS (length of stay)

   –  2 regions

•  9 years of data

EPIC

DIAGNOSES PROCEDURES

LABORATORY VALUES

MONITOR FEEDS

BED OCCUPANCY

ORDERS


Engineering over 300 features to improve models

Simple SQL enables rapid generation of many creative features

•  Processing is performed in the database with very simple SQL code, without having to move the data

•  Reduced time to generate and examine features enables rapid iterations

•  Test hypotheses rapidly to examine if features have an effect on LOS

Patient Demographics

Patient Medical History

Current Admission

Prior Hospitalizations

ED Stay

Outpatient Utilization

Hospital Attributes

Lab Results (last 72 hrs)
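As an illustration of the kind of feature listed above, the sketch below summarizes a patient's lab values from the last 72 hours into flat model inputs. The talk describes generating such features in-database with SQL; this Python version is only a stand-in, and the function name `lab_features`, the record layout and the sample values are assumptions.

```python
from datetime import datetime, timedelta


def lab_features(labs, as_of, hours=72):
    """Summarize one patient's lab values from the `hours` before `as_of`.

    `labs` is a list of (timestamp, test_name, value) tuples.
    Returns a flat dict of features such as "troponin_max".
    """
    cutoff = as_of - timedelta(hours=hours)
    by_test = {}
    for taken_at, test_name, value in labs:
        if cutoff <= taken_at <= as_of:
            by_test.setdefault(test_name, []).append(value)

    features = {}
    for test_name, values in by_test.items():
        features[f"{test_name}_count"] = len(values)
        features[f"{test_name}_min"] = min(values)
        features[f"{test_name}_max"] = max(values)
        features[f"{test_name}_mean"] = sum(values) / len(values)
    return features


# Hypothetical lab records for a single encounter.
now = datetime(2015, 9, 30, 12, 0)
labs = [
    (now - timedelta(hours=100), "troponin", 0.9),  # outside the window
    (now - timedelta(hours=48), "troponin", 0.4),
    (now - timedelta(hours=6), "troponin", 0.2),
    (now - timedelta(hours=6), "creatinine", 1.1),
]
features = lab_features(labs, as_of=now)
```

Each new summary (a count, an extreme, a trend) is cheap to add, which is what makes rapid iteration over hundreds of candidate features practical.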


Understanding drivers of length of stay through model interpretation

Model Results and Insights into Patient Outcomes

Data-driven approaches improved model fit by 66% and predict patient length of stay in the hospital within 22 hours of true discharge (on average)

Patient history offers less information for AMI

Recent observations (from current admission), labs and hospital features are more predictive of length of stay than patient medical history

[Figure: variance explained when each category is excluded; categories shown: Current Admission, Lab, Medical History, Demographics, Hospital, None (complete model)]

Patient Demographics

Patient Medical History

Current Admission

Prior Hospitalizations

ED Stay

Outpatient Utilization

Hospital Attributes

Lab Results (last 72 hrs)


Insight into hospital operations

Length of stay is not only biology. Admission time, day of week, a hospital's size and a hospital's experience with cardiology matter

Understanding drivers of length of stay through model interpretation

Model Results and Insights into Patient Outcomes

Data-driven approaches improved model fit by 66% and predict patient length of stay in the hospital within 22 hours of true discharge (on average)

Patient history offers less information for AMI

Recent observations (from current admission), labs and hospital features are more predictive of length of stay than patient medical history

[Figure: variance explained when each category is excluded; categories shown: Current Admission, Lab, Medical History, Demographics, Hospital, None (complete model)]

[Figure: number of admissions by hour of the day]

[Figure: number of discharges by hour of the day]


Treating Patients

What does it take to build data-driven models?

Data Cleansing

Libraries to Build Models At-Scale

Feature Engineering

Derive insight from models to change processes

Tradeoffs between model accuracy and timeliness

Vaccine Manufacturing Oil Drilling Oil Drilling


Data: The New Oil

IoT in Oil & Gas


Predictive Maintenance

[Image: Drilling into the San Andreas Fault at Parkfield, California. Credit: Stephen H. Hickman, USGS]

•  Failure costs estimated at $150,000/incident (billions annually)*

•  Oil & gas generates large amounts of data from sensors, enabling data-driven approaches to improve operations

•  Goals

   –  Early warning system

   –  Insights into prominent features impacting operation and failure

   –  Reduction of non-productive drill time

   –  Reduced incidents

•  But how do we build models?

*http://blog.pivotal.io/pivotal/case-studies-2/data-as-the-new-oil-producing-value-for-the-oil-gas-industry

How are models built using BIG sensor data?

Predictive use case | Class of model | Specific models
Predict equipment failure in time window | Classification | Logistic Regression; Support Vector Machines; Random Forest
Predict remaining life of equipment | Survival | Cox Proportional Hazards Regression
Predict rate-of-penetration in drilling | Regression | Linear Regression; Elastic Net Regularized Regression (Gaussian); Random Forest
Identifying similar drilling sites | Clustering | K-means; Spectral clustering

Oil & Gas industries may produce billions of data points across thousands of sensors. Most implementations of these algorithms do not scale.
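To treat "predict equipment failure in time window" as a classification problem, each sensor reading first needs a binary label saying whether a failure followed within the chosen window. The sketch below shows only that labeling step, assuming timestamps are plain numbers (hours here). The function name, window length and timeline are hypothetical, and any of the listed classifiers could then be trained on the labeled rows.

```python
import numpy as np


def label_failure_within(reading_times, failure_times, window):
    """Return 1 for each reading followed by a failure within `window`, else 0."""
    reading_times = np.asarray(reading_times, dtype=float)
    failure_times = np.sort(np.asarray(failure_times, dtype=float))
    if failure_times.size == 0:
        # No recorded failures: every reading is a negative example
        return np.zeros(len(reading_times), dtype=int)
    # Index of the first failure at or after each reading
    idx = np.searchsorted(failure_times, reading_times, side="left")
    has_next = idx < failure_times.size
    safe_idx = np.minimum(idx, failure_times.size - 1)
    next_failure = np.where(has_next, failure_times[safe_idx], np.inf)
    return (next_failure - reading_times <= window).astype(int)


# Hypothetical timeline: one reading per hour, failures at hours 5 and 12
readings = np.arange(0, 14)
labels = label_failure_within(readings, failure_times=[5, 12], window=2)
print(labels.tolist())
```

The window length is a modeling choice: a wider window gives earlier warnings but labels more readings as positive.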

Linear Regression: Streaming Algorithm

•  Finding linear dependencies between variables
   –  ROP = c_0 + WOB * c_WOB

[Figure: Rate of Penetration (ROP) plotted against Weight on Bit (WOB)]

•  How to compute with a single scan?
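One way to answer the single-scan question is to note that ordinary least squares only needs two running sums, X^T X and X^T y, and both can be updated one row at a time. The Python below is a minimal sketch under that assumption. The class name `StreamingOLS` and the sample WOB/ROP values are hypothetical and not taken from the talk.

```python
import numpy as np


class StreamingOLS:
    """Accumulates the sufficient statistics for least squares in one pass."""

    def __init__(self, n_features):
        d = n_features + 1  # one extra slot for the intercept c_0
        self.xtx = np.zeros((d, d))
        self.xty = np.zeros(d)
        self.n = 0

    def update(self, x, y):
        """Fold a single observation into the running sums."""
        row = np.concatenate(([1.0], np.atleast_1d(x).astype(float)))
        self.xtx += np.outer(row, row)
        self.xty += row * y
        self.n += 1

    def coefficients(self):
        """Solve the normal equations; lstsq tolerates a singular X^T X."""
        return np.linalg.lstsq(self.xtx, self.xty, rcond=None)[0]


# Hypothetical readings generated from ROP = 5 + 2 * WOB
model = StreamingOLS(n_features=1)
for wob in [0.0, 2.5, 5.0, 7.5, 10.0]:
    model.update(wob, 5.0 + 2.0 * wob)

c0, c_wob = model.coefficients()
print(f"c_0 = {c0:.3f}, c_WOB = {c_wob:.3f}")
```

Memory use depends only on the number of variables, not the number of rows, so each row can be discarded as soon as it has been folded in.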

Linear Regression: Parallel Computation

[Figure: Segment 1, Segment 2]
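The parallel version can be sketched from the fact that the sums X^T X and X^T y are additive: each segment summarizes its own rows independently, the partial results are added, and a single solve runs at the end. The Python below loosely follows the transition, merge and final steps of a database aggregate. Two in-memory arrays stand in for segments, and the function names and data are hypothetical.

```python
import numpy as np


def segment_state(wob, rop):
    """Transition step: summarize one segment's rows as (X^T X, X^T y)."""
    x = np.column_stack([np.ones(len(wob)), wob])  # intercept column plus WOB
    return x.T @ x, x.T @ np.asarray(rop, dtype=float)


def merge_states(a, b):
    """Merge step: partial sums from two segments simply add."""
    return a[0] + b[0], a[1] + b[1]


def solve(state):
    """Final step: solve the normal equations once on the merged state."""
    xtx, xty = state
    return np.linalg.solve(xtx, xty)


# Hypothetical data following ROP = 5 + 2 * WOB, split across two segments
wob_1, wob_2 = np.array([0.0, 1.0, 2.0]), np.array([3.0, 4.0, 5.0])
rop_1, rop_2 = 5.0 + 2.0 * wob_1, 5.0 + 2.0 * wob_2

merged = merge_states(segment_state(wob_1, rop_1), segment_state(wob_2, rop_2))
coef_parallel = solve(merged)

# The merged result should match a fit over all rows at once
all_rows = segment_state(np.concatenate([wob_1, wob_2]),
                         np.concatenate([rop_1, rop_2]))
coef_single = solve(all_rows)
print(coef_parallel, coef_single)
```

Because the merge is a plain sum, the order in which segments report does not matter, and only small matrices travel between segments rather than the rows themselves.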

Linear regression on 10 million rows in seconds

[Figure: execution time (s) versus # independent variables, with one series each for 6, 12, 18 and 24 segments]

Hellerstein, Joseph M., et al. "The MADlib analytics library: or MAD skills, the SQL." Proceedings of the VLDB Endowment 5.12 (2012): 1700-1711.

BIG DATA MACHINE LEARNING IN SQL
http://madlib.net/

Predictive Modeling Library

Linear Systems
•  Sparse and Dense Solvers

Matrix Factorization
•  Singular Value Decomposition (SVD)
•  Low-Rank

Generalized Linear Models
•  Linear Regression
•  Logistic Regression
•  Multinomial Logistic Regression
•  Cox Proportional Hazards Regression
•  Elastic Net Regularization
•  Sandwich Estimators (Huber-White, clustered, marginal effects)

Machine Learning Algorithms
•  Principal Component Analysis (PCA)
•  Association Rules (Affinity Analysis, Market Basket)
•  Topic Modeling (Parallel LDA)
•  Decision Trees
•  Ensemble Learners (Random Forests)
•  Support Vector Machines
•  Conditional Random Field (CRF)
•  Clustering (K-means)
•  Cross Validation

Descriptive Statistics

Sketch-based Estimators
•  CountMin (Cormode-Muthukrishnan)
•  FM (Flajolet-Martin)
•  MFV (Most Frequent Values)

Correlation
Summary

Support Modules

Array Operations
Sparse Vectors
Random Sampling
Probability Functions
PMML Export

CDC – 2011 – Number of Health Care Visits Per Year – Age Adjusted

[Figure: calendars for September, October, November and December 2013, captioned "A Snapshot" and "Another Snapshot"]

The Promise of Internet of Humans

•  Smart contact lenses and sensors to identify and alert patients before catastrophic events (e.g. blood sugar drop for diabetics)
•  Wearables to track patient disease progression using objective measures
•  Track patient adherence
•  Detect disease outbreaks using sequencing in sewer system samples
•  ECG monitoring on mobile phones for early alerting of stroke

http://blog.pivotal.io/data-science-pivotal

Check out the Pivotal Data Science Blog!

FOR FURTHER INFO, CHECK OUT…

•  Join us at our MeetUp tomorrow at 6:30 PM: MADlib + HAWQ for advanced SQL machine learning on Hadoop
   Pivotal Labs, 625 Avenue of the Americas, 2nd
   http://www.meetup.com/Pivotal-NY/events/225074025/

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Academy @ https://pivotal.biglms.com
