Cassandra Day London 2015: British Gas Connected Homes: 5 Things We Wish We Had Known Before...

Five Things... (we wish we had known) British Gas Connected Homes Josep Casals - Lead Data Engineer Jim Anning - Head of Data & Analytics

Upload: planet-cassandra

Post on 15-Jul-2015


TRANSCRIPT

Page 1: Cassandra Day London 2015: British Gas Connected Homes: 5 Things We Wish We Had Known Before Building a Data Platform

Five Things... (we wish we had known)

British Gas Connected Homes

Josep Casals - Lead Data Engineer

Jim Anning - Head of Data & Analytics

Page 2
Page 3

Hive: Control Your Heating from Your Phone

Connected Boiler: Proactive Maintenance

MyEnergy: Understand your Energy Usage

Page 4

Hive

170K - 2 minutes

Page 5
Page 6

600K - Smart Meter

3.8M Monthly

Future - 10 seconds

Page 7
Page 8
Page 9

Internet of Things

Data Science

Lots of Data

[Chart: vertical axis from 0 to 60000, horizontal axis from 2011 to Now]

Page 10

[Same chart as Page 9, with one added label: C* + Spark]

Page 11

Lesson 1: Not to race against bicycles

Page 12

Spark is for parallel execution

• It makes sense when we have jobs that can’t run on a single machine

• The Spark master needs to distribute the job to workers

• If the job shuffles all data to one single node, parallelism is lost

• For small tasks, a simple script is often the better choice
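
The point about shuffling everything to one node can be illustrated without a cluster. The following is a minimal sketch in plain Python, not Spark code: the partitioner (`key % num_partitions`), the record counts, and the function names are all assumptions made for illustration. It shows that when every record shares one key, a single partition holds all the work and the best possible speedup drops to 1.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition under key % num_partitions."""
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def best_case_speedup(sizes):
    """Total work divided by the largest partition.

    The slowest worker bounds the stage, so this is an upper limit on speedup.
    """
    return sum(sizes) / max(sizes)


# Hypothetical workload: 8,000 records with distinct integer keys, 8 partitions.
spread = partition_sizes(range(8000), 8)

# The same number of records, but every record carries the same key.
skewed = partition_sizes([42] * 8000, 8)

print("spread:", spread, "speedup <=", best_case_speedup(spread))
print("skewed:", skewed, "speedup <=", best_case_speedup(skewed))
```

With evenly spread keys each of the 8 partitions receives 1,000 records; with a single key one partition receives all 8,000 and the other seven workers sit idle.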

techblog.netflix.com

Page 13

Things that look like a Spark / C* cluster

A Large Hadron Collider

• It can achieve big energies

• It takes a lot of fine tuning

An Ion Thrust Engine

• It starts slow but in the long run goes very fast

Page 14

Who wins?

It depends on how far you go…
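
One way to make "how far you go" concrete is a toy cost model. The sketch below is an assumption-laden illustration, not a measurement from the talk: the per-record cost, the worker count, and the fixed startup time are hypothetical numbers, and the model ignores shuffles and network cost. It compares a single-machine script with a cluster job that pays a fixed startup cost but divides the work across workers.

```python
def script_seconds(n_records, per_record=0.001):
    """Single machine: no startup cost, all records processed in sequence."""
    return n_records * per_record


def cluster_seconds(n_records, per_record=0.001, workers=10, startup=30.0):
    """Cluster: fixed startup and scheduling cost, then work split across workers."""
    return startup + n_records * per_record / workers


def break_even_records(per_record=0.001, workers=10, startup=30.0):
    """Record count at which both approaches take the same time (needs workers > 1).

    Solves n * c = s + n * c / w for n, which gives n = s / (c * (1 - 1 / w)).
    """
    return startup / (per_record * (1 - 1 / workers))


for n in (1_000, 1_000_000):
    print(n, "records: script", script_seconds(n), "s, cluster", cluster_seconds(n), "s")
print("break-even at about", round(break_even_records()), "records")
```

Under these assumed numbers the script wins comfortably for a thousand records and the cluster wins for a million; the crossover moves with the startup cost and the number of workers.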

Page 15

Lesson 2: Not to use Spark too much

Page 16

Joining data from multiple sources

Think twice before you do that

Page 17

Upserting data from multiple sources

Do that if possible
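
Cassandra writes behave as upserts: writing to an existing primary key sets only the columns supplied and leaves the others in place. One reading of this slide is that each source can therefore write its own columns to the same row, and the combined row appears without a join job. The sketch below mimics that behaviour with a plain Python dict standing in for a table; the `upsert` helper, the home identifiers, and the column names are hypothetical.

```python
def upsert(table, key, columns):
    """Merge columns into the row for key, creating the row if it is missing."""
    table.setdefault(key, {}).update(columns)


# A dict keyed by primary key stands in for a Cassandra table (assumption).
homes = {}

# Hypothetical source 1: thermostat settings.
upsert(homes, "home-1", {"target_temp_c": 20.5})

# Hypothetical source 2: meter readings, written independently of source 1.
upsert(homes, "home-1", {"daily_kwh": 9.2})
upsert(homes, "home-2", {"daily_kwh": 4.1})

# Each row now holds the columns from every source that wrote to it.
print(homes)
```

Neither source needs to read the other's data first, which is what removes the join step.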

Page 18

Upserting data from multiple sources

Page 19

Lesson 3: Spark is stronger than Cassandra

Page 20
Page 21

Spark Properties & Cassandra-specific properties tuning
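
The slide does not list which properties were tuned. As a hedged illustration only, the sketch below builds `--conf` arguments for `spark-submit` from a small dictionary that limits how hard a Spark job can write to Cassandra. The property names are taken from Spark and from 1.x releases of the Spark Cassandra Connector and have been renamed in later connector versions, so check them against the version in use; the values are hypothetical starting points, not recommendations from the talk.

```python
# Hypothetical throttling settings; names may differ by connector version.
throttle = {
    # Spark: cap the total cores the application may claim on the cluster.
    "spark.cores.max": "8",
    # Connector: limit concurrent write batches per Spark task.
    "spark.cassandra.output.concurrent.writes": "2",
    # Connector: limit write throughput per core, in MB per second.
    "spark.cassandra.output.throughput_mb_per_sec": "5",
}


def to_submit_args(conf):
    """Render a dict of properties as spark-submit '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


print(" ".join(to_submit_args(throttle)))
```

Keeping the settings in one dictionary makes it easy to lower the limits when Cassandra nodes start timing out under Spark's write load.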

Page 22

Lesson 4: Mindset

Page 23
Page 24

Lesson 5: Velocity

Page 25

Idea, Value, Data Science, Data Engineering, Data Operations

Page 26

Page 27

Idea, Value, Data Science, Data Engineering, Data Operations

Creative, Experimental, Incremental, Robust, Defined, Maintainable, Research, Scalable, Testable

Page 28

[Same diagram as Page 27, with added labels]

R, Python, Java, Scala

Small Datasets, Offline, Single Machine

#BigData, Clustered, Realtime

Page 29

Page 30

x

Page 31

Thank you

@JimAnning : [email protected]

@Jcasals : [email protected]