Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected...

Streaming Customer Insights With DSE Cassandra and Apache Kafka At British Gas Connected Homes Josep Casals | @jcasals | 2016 London

Upload: datastax

Post on 06-Jan-2017


TRANSCRIPT

Page 1: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Streaming Customer Insights With DSE Cassandra and Apache Kafka

At British Gas Connected Homes

Josep Casals | @jcasals | 2016 London

Page 2: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Data Sources

• Gas and electricity meter readings

• Thermostat temperature data

• Connected boiler data

• Real time energy consumption data

• Introducing motion sensors, window and door sensors, etc.


Page 3: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Meter Data

• Millions of gas and electricity customers

• 2 Million smart meters

• Readings every 30 minutes from smart meters


Page 4: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Machine Learning applied to Meter Data

• Energy disaggregation

• Similar homes comparison

• Smart meters used in indirect algorithms for non-smart customers


Page 5: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Connected Thermostats

• 300k Connected Thermostats

• Temperature data time series


Page 6: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Boiler IQ

• Proactive maintenance

• Failure detection


Page 7: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

In Home Displays in a mobile App

• Data every 10 seconds

• Still needs an access device connected to the router

• Allows real time mobile alerts


Page 8: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Connected Homes’ Streaming Architecture

Page 9: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

What real time looks like

• Temperature updates via web socket

• We plot them on a map using postal codes

• Updates for 25 out of 100 partitions

Page 10: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Use Case: High Consumption Alerts

• The red dot on top is what we want to detect

• The green bottom dots are the baseline plus the fridge


Page 11: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

High Consumption Alerts: Data Ingest

• Very high volume of messages (every 10 seconds)

• Kafka partitions help us cope with volume

• We often miss reads, so the Samza job also does basic interpolation
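The slide does not say how the Samza job interpolates. As a minimal sketch of one common approach, the Python function below fills gaps linearly between the reads on either side. The function name, the `(timestamp, watts)` tuple layout, and the assumption that timestamps are aligned to the 10-second interval are all illustrative, not taken from the talk.

```python
from typing import List, Tuple

STEP_SECONDS = 10  # read interval quoted on the slide

Read = Tuple[int, float]  # (unix timestamp in seconds, watts); assumed layout


def fill_missing_reads(reads: List[Read], step: int = STEP_SECONDS) -> List[Read]:
    """Linearly interpolate gaps between consecutive reads.

    `reads` must be sorted by timestamp and aligned to `step`.
    """
    if not reads:
        return []
    filled = [reads[0]]
    for (t0, v0), (t1, v1) in zip(reads, reads[1:]):
        missing = (t1 - t0) // step - 1  # number of absent reads in this gap
        for i in range(1, missing + 1):
            # Value on the straight line between the two known reads
            value = v0 + (v1 - v0) * i / (missing + 1)
            filled.append((t0 + i * step, value))
        filled.append((t1, v1))
    return filled


if __name__ == "__main__":
    # Two reads are missing between t=10 and t=40
    print(fill_missing_reads([(0, 100.0), (10, 110.0), (40, 140.0)]))
```

Linear interpolation is only reasonable for short gaps; a real job would probably cap the gap length it is willing to fill.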


Page 12: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

High Consumption Alerts: Spark Streaming with Cassandra

• Real time data comes from Kafka

• Cassandra stores historical usage information

• A Spark Streaming job combines both and applies a machine learning algorithm to generate high usage alerts
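The talk does not name the machine learning algorithm. To illustrate only the shape of the join between live and historical data, the sketch below swaps in a simple z-score threshold: a reading is flagged when it sits far above the customer's stored history. It uses plain Python dictionaries in place of a Kafka stream and a Cassandra table, and every name and the threshold value are assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def is_high_usage(current_watts: float, history: Sequence[float],
                  z_threshold: float = 3.0) -> bool:
    """Flag a reading that sits far above the customer's historical usage."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current_watts > mu
    return (current_watts - mu) / sigma > z_threshold


def alerts_for_batch(batch: Dict[str, float],
                     historical: Dict[str, List[float]]) -> List[str]:
    """Join one micro-batch of live readings with stored history per customer."""
    return [
        customer_id
        for customer_id, watts in batch.items()
        if is_high_usage(watts, historical.get(customer_id, []))
    ]


if __name__ == "__main__":
    history = {"a": [200, 210, 190, 205, 195], "b": [200, 210, 190, 205, 195]}
    live = {"a": 3000.0, "b": 204.0, "c": 5000.0}  # "c" has no history
    print(alerts_for_batch(live, history))
```

In a Spark Streaming job the same comparison would run per partition, with the history read from Cassandra rather than from an in-memory dictionary.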

Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA

Page 13: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Design tips Cassandra (+ Spark)

• Partition your customers using buckets

• Use consistent partitioning across Spark & Cassandra as much as possible

• Don’t make your C* nodes too big (keep them under 1 TB); otherwise operations become painful

• Don’t put all your tables inside one schema (keyspace); it’s good to have flexibility when setting replication factors

Page 14: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Cassandra data modeling with buckets

• Using a hashing function that is uniform and deterministic, we can cope with time series data for any number of customers

• One of our preferred strategies is to use buckets
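As a rough illustration of bucketing, the sketch below derives a bucket from the customer id and pairs it with the day to form a composite partition key, so each partition holds a bounded slice of the time series. The key layout, the bucket count, and the use of `zlib.crc32` as the deterministic hash are assumptions made for this example; the slides do not show the actual table design.

```python
import zlib
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

NUM_BUCKETS = 100  # illustrative value

PartitionKey = Tuple[int, str]  # hypothetical key: (bucket, day)


def bucket_for(customer_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    # zlib.crc32 gives the same result in every process, unlike Python's
    # built-in hash() for strings, which is salted per interpreter run.
    return zlib.crc32(customer_id.encode("utf-8")) % num_buckets


def partition_key(customer_id: str, ts: datetime) -> PartitionKey:
    """Build the (bucket, day) key a reading would be stored under."""
    return (bucket_for(customer_id), ts.strftime("%Y-%m-%d"))


def group_by_partition(
    readings: List[Tuple[str, datetime, float]]
) -> Dict[PartitionKey, List[Tuple[str, datetime, float]]]:
    """Group (customer_id, timestamp, value) readings by partition key."""
    grouped: Dict[PartitionKey, list] = defaultdict(list)
    for customer_id, ts, value in readings:
        grouped[partition_key(customer_id, ts)].append((customer_id, ts, value))
    return dict(grouped)


if __name__ == "__main__":
    ts = datetime(2016, 10, 5, 12, 0, tzinfo=timezone.utc)
    print(partition_key("customer-42", ts))
```

Because the bucket depends only on the customer id, all readings for one customer on one day land in the same partition, and the same function can be reused on the Spark side to keep partitioning consistent.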


Page 15: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

h(k) = ⌊m * frac(kA)⌋

• Multiplicative hashing is our preferred simple partitioning algorithm

• m = number of partitions

• A ≈ (√5−1)/2 = 0.6180339887... (the reciprocal of the golden ratio)

• Online example: jsfiddle.net/joscas/yfp72fq5
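The formula translates almost directly into code. The sketch below is a Python rendering written for this transcript, not a copy of the linked jsfiddle; it assumes k is a non-negative integer key and frac(x) is the fractional part of x.

```python
import math
from collections import Counter

A = (math.sqrt(5) - 1) / 2  # ≈ 0.6180339887


def multiplicative_hash(k: int, m: int) -> int:
    """h(k) = floor(m * frac(k * A)) for a non-negative integer key k."""
    fractional_part = (k * A) % 1.0
    return math.floor(m * fractional_part)


if __name__ == "__main__":
    # Spread 10,000 sequential keys over 10 partitions and count each one
    counts = Counter(multiplicative_hash(k, 10) for k in range(10_000))
    for partition in sorted(counts):
        print(partition, counts[partition])
```

Sequential ids, which are common for customer keys, end up spread almost evenly across partitions. Because the product is computed in floating point, very large keys lose precision; an integer-only variant would be a safer choice for 64-bit ids.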


Page 16: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Design Tips Kafka + Spark Streaming

• Keep your own offsets (don’t rely on Spark checkpointing)

• Avro makes the learning curve steeper, but it’s worth the effort (convert into binary + schema as soon as possible)

• Kafka producers are expensive to create; avoid creating one for each RDD in each Spark Streaming micro-batch
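To make the first tip concrete, here is a minimal sketch of tracking offsets yourself: the next offset for a partition is saved only after a batch has been processed, so a restart resumes from the last successful position. The `OffsetStore` class is an in-memory stand-in for a durable store, a Python list stands in for a Kafka partition, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


class OffsetStore:
    """In-memory stand-in for a durable store such as a database table."""

    def __init__(self) -> None:
        self._next_offset: Dict[TopicPartition, int] = {}

    def load(self, tp: TopicPartition) -> int:
        return self._next_offset.get(tp, 0)

    def save(self, tp: TopicPartition, next_offset: int) -> None:
        self._next_offset[tp] = next_offset


def process_batch(log: List[int], tp: TopicPartition, store: OffsetStore,
                  handle: Callable[[int], None], max_records: int = 100) -> int:
    """Process up to `max_records` records starting at the stored offset."""
    start = store.load(tp)
    records = log[start:start + max_records]
    for record in records:
        handle(record)  # if this raises, the offset below is never saved
    store.save(tp, start + len(records))
    return len(records)


if __name__ == "__main__":
    store, seen = OffsetStore(), []
    partition_log = list(range(250))
    while process_batch(partition_log, ("readings", 0), store, seen.append):
        pass
    print(len(seen), store.load(("readings", 0)))
```

This gives at-least-once processing: a batch that fails halfway is replayed from its start. Writing the results and the offset in one transaction is the usual way to tighten that further.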

Page 17: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Design Tips Kafka + Spark Streaming

• Beware: Offset Out of Range Exception - ooore :-(

• Kafka manager is very useful

• Schema registry is a weak spot (log.cleanup.policy = compact)

Page 18: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Kafka producer factory for Spark Streaming
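The transcript carries no code for this slide. As a sketch of the general idea only, and not the author's implementation, the factory below creates a producer lazily and caches it per process, keyed by configuration, so repeated micro-batches reuse it. The `create` callable is an assumption: with a real Kafka client it would construct the producer; here a stub stands in so the example runs without a broker.

```python
import threading
from typing import Any, Callable, Dict, Tuple

Config = Dict[str, str]


class ProducerFactory:
    """Caches one producer per distinct configuration within a process."""

    _lock = threading.Lock()
    _producers: Dict[Tuple[Tuple[str, str], ...], Any] = {}

    @classmethod
    def get_or_create(cls, config: Config,
                      create: Callable[[Config], Any]) -> Any:
        key = tuple(sorted(config.items()))  # hashable form of the config
        with cls._lock:
            if key not in cls._producers:
                cls._producers[key] = create(config)  # the expensive step
            return cls._producers[key]


if __name__ == "__main__":
    created = []

    def stub_producer(config: Config) -> object:
        created.append(config)  # count how often a producer is built
        return object()

    config = {"bootstrap.servers": "localhost:9092"}
    for _ in range(3):  # stands in for three micro-batches
        ProducerFactory.get_or_create(config, stub_producer)
    print(len(created))
```

In a Spark job the call would sit inside the per-partition function, so each executor process builds its producer once instead of once per RDD.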

Page 19: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes
Page 20: Streaming Customer Insights with DataStax Cassandra & Apache Kafka at British Gas Connected Homes

Thank you

@jcasals
