Cassandra and IoT

Upload: russell-spitzer

Post on 31-Jul-2015

Category:

Software


TRANSCRIPT

Company Confidential© 2015 Aeris Communications, Inc. All Rights Reserved © 2015 DataStax, All Rights Reserved. 1

“ideal to store time series data”

“Apache Cassandra has never failed us.”

Performance, Availability, Scale

Startup Program

ToastrBox

Analytics

Search

In-memory

Visual Admin

Security

Certified Cassandra

IoT requires performance and reliability

Your System

Send Heating Coil Repair Man

Send 10% Off Bread Coupon

Offer Upgrade Suggestions

Integrate with your SaaS (Spread as a Service)

Your System: FAULT

App Down, Customers Lose Interest

Your System: SLOW

Send Heating Coil Repair Man: Three months after they get a competitor's toaster

Send 10% Off Bread Coupon: They've already restocked on bread

Offer Upgrade Suggestions: That are already out of date

Integrate with your SaaS (Spread as a Service): Toast got spread a long time ago

Scale

Benchmark: Cassandra client writes per second versus cluster size, from http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

50 nodes: 174,373

100 nodes: 366,828

150 nodes: 537,172

300 nodes: 1,099,837

Horizontal scale

Token Range Mapping Data To Nodes

Ring Architecture, Peer to Peer Communication

No Masters, No Slaves

[Diagram: a ring of two nodes (A, B) grows to a ring of four nodes (A, B, C, D), each owning a token range.]

Availability

"During Hurricane Sandy, we lost an entire data center. Completely. Lost. It.

Our data in Cassandra never went offline."

Peer-to-peer architecture

Client has a holistic view

Cluster cluster = Cluster.builder().addContactPoint("192.168.0.1").build();

[Diagram: a client connected to a Cassandra cluster, a ring of nodes A, B, C, D.]

Partition Keys are Hashed to a Token Range

DeviceID: 102349

Divided data responsibility across cluster
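As a rough illustration of how a partition key such as DeviceID 102349 could be mapped to a node, the sketch below hashes the key to a token and looks up the owning range on a four-node ring. It is a toy model: Cassandra's default partitioner uses Murmur3, while this sketch uses MD5 from the standard library, and the ring size, token positions, and the function names `token_for` and `owner_of` are assumptions made for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # toy token space; the real token space is much larger

# Hypothetical, evenly spaced tokens: each node owns the range ending at its token.
NODE_TOKENS = [
    (RING_SIZE // 4, "A"),
    (RING_SIZE // 2, "B"),
    (3 * RING_SIZE // 4, "C"),
    (RING_SIZE - 1, "D"),
]


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token in [0, RING_SIZE)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def owner_of(partition_key: str) -> str:
    """Return the node whose token range contains the key's token."""
    tokens = [token for token, _ in NODE_TOKENS]
    index = bisect.bisect_left(tokens, token_for(partition_key))
    return NODE_TOKENS[index % len(NODE_TOKENS)][1]


print(owner_of("102349"))  # the same key always maps to the same node
```

Because the mapping depends only on the key and the token assignments, any client or node can work out where a given device's data lives without asking a master.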

Controlling fault tolerance: replication factor

Server - Replication: How many copies of each piece of data should exist in the cluster?

ReplicationFactor=3

Replication strategies can span data centers and survive the failure of a whole AWS region!
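To make the replication factor concrete, here is a minimal sketch of ring-based replica placement in a single data center: the node that owns a range stores it, and so do the next nodes clockwise until the replication factor is reached. This mirrors the idea behind a simple ring strategy only; placement across data centers is not modelled, and the names `NODES`, `replicas_for`, and `ranges_held_by` are hypothetical.

```python
from typing import List

NODES = ["A", "B", "C", "D"]  # ring order, as drawn in the slides


def replicas_for(owner: str, replication_factor: int = 3) -> List[str]:
    """Return the owner plus the next nodes clockwise on the ring."""
    if replication_factor > len(NODES):
        raise ValueError("replication factor exceeds cluster size")
    start = NODES.index(owner)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


def ranges_held_by(node: str, replication_factor: int = 3) -> List[str]:
    """Return the ranges (named after their owners) stored on a node."""
    return [o for o in NODES if node in replicas_for(o, replication_factor)]


for node in NODES:
    print(node, "".join(ranges_held_by(node)))
```

With four nodes and ReplicationFactor=3, every node ends up holding three of the four ranges, so any single node can fail without making a range unavailable.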

[Diagram: the ring of nodes A, B, C, D, replicated in the US-West and US-East data centers. With ReplicationFactor=3, each node holds three of the four token ranges (ACD, ABC, ABD, BCD).]

Controlling fault tolerance: tunable consistency

Client - Consistency Level: How many replicas should we check before acknowledgement?

CL = One

Successful Toast Made!

CL = Quorum

Toaster Burst Into Flames!

Higher Consistency Levels Let Us Make Sure Events Are Persisted
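The trade-off between CL = One and CL = Quorum can be sketched as a count of required acknowledgements. The example below is a simplification that assumes every live replica responds in time; the function names `required_acks` and `write_succeeds` are hypothetical and are not part of any driver API.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements needed at a consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # a strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def write_succeeds(consistency_level: str, replication_factor: int,
                   replicas_up: int) -> bool:
    """A write is acknowledged once enough live replicas have responded."""
    return replicas_up >= required_acks(consistency_level, replication_factor)


# With ReplicationFactor=3 and only one replica reachable:
print(write_succeeds("ONE", 3, replicas_up=1))     # routine event still accepted
print(write_succeeds("QUORUM", 3, replicas_up=1))  # critical event is rejected
```

A routine event can be written at ONE for availability, while a critical event can be written at QUORUM so that a majority of replicas hold it before the client is told it succeeded.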

Performance

[Figure: benchmark chart from http://www.datastax.com/apache-cassandra-leads-nosql-benchmark, with axis ticks 1, 2, 4, 8 and 0 to 160,000.]

Unparalleled durable performance

[Diagram: each write is recorded in the Commit Log on disk and in a Memtable in memory; Memtables are flushed to disk as SSTables.]
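The write path in the diagram can be approximated with a toy model: every write is appended to a commit log, applied to an in-memory memtable, and the memtable is flushed to an immutable, sorted SSTable once it fills up. The class `ToyNode` and its `memtable_limit` parameter are hypothetical, and commit log cleanup and compaction are deliberately left out.

```python
class ToyNode:
    """A very small model of one node's write path."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # Sequential append first, so the write survives a crash.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted SSTable and start a fresh one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


node = ToyNode(memtable_limit=2)
node.write("t1", "Idle")
node.write("t2", "Toasting")
print(node.sstables)  # one flushed SSTable holding t1 and t2
```

Writes only ever append or update memory, which is why the write path avoids random disk IO.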

Reading data is fast but limited by disk IO

[Diagram: a read on a replica combines the Memtables in memory with the flushed SSTables on disk; when versions conflict, the last write wins (LWW).]
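The read path can be sketched in the same spirit: a replica may find versions of the same key in its memtable and in several SSTables, and it resolves them by keeping the one with the newest write timestamp (last write wins). The `read` function and the `(timestamp, value)` cell layout below are assumptions made for illustration, not Cassandra's storage format.

```python
def read(key, memtable, sstables):
    """Return the value with the newest timestamp across all sources."""
    candidates = []
    for source in [memtable, *sstables]:
        if key in source:
            candidates.append(source[key])  # each cell is (timestamp, value)
    if not candidates:
        return None
    # Last write wins: the cell with the highest timestamp is returned.
    return max(candidates, key=lambda cell: cell[0])[1]


memtable = {"toaster-1": (5, "Idle")}
sstables = [
    {"toaster-1": (3, "Toasting")},
    {"toaster-1": (1, "Idle"), "toaster-2": (2, "Toasting")},
]
print(read("toaster-1", memtable, sstables))
```

The more SSTables a key is spread across, the more places a read has to check, which is one reason reads are bounded by disk IO while writes are not.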

Data modeling for time series

Things Generating Events

Store Events ordered by TimeUUID

t1 t2 t3 t4 t5 t6 t7 t8 t9

[Diagram: two SSTables, one holding t1 to t10 and one holding t11 to t20.]

Data ends up stored on disk in time order

Additional tables with rollups, aggregates, etc.

With data stored sequentially by time, time-based queries become extremely fast!
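A rollup table of the kind mentioned above could be filled by bucketing raw events by time. The sketch below counts events per hour and event type; the function name `hourly_rollup`, the bucket size, and the sample events are all hypothetical, since the slides do not say how the rollups are computed.

```python
from collections import Counter
from datetime import datetime


def hourly_rollup(events):
    """Count events per (hour bucket, event type)."""
    counts = Counter()
    for timestamp, event in events:
        # Truncate the timestamp to the start of its hour.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(bucket, event)] += 1
    return counts


events = [
    (datetime(2015, 7, 31, 12, 0), "Idle"),
    (datetime(2015, 7, 31, 12, 1), "Toasting"),
    (datetime(2015, 7, 31, 12, 2), "Toasting"),
    (datetime(2015, 7, 31, 13, 5), "Idle"),
]
for (bucket, event), count in sorted(hourly_rollup(events).items()):
    print(bucket, event, count)
```

Each `(bucket, event)` pair would become one row in a separate rollup table, so dashboards can read a few pre-aggregated rows instead of scanning every raw event.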

Cassandra data modeling

CREATE TABLE example (toasterID UUID, eventTime TIMEUUID, event TEXT, PRIMARY KEY (toasterID, eventTime));

Here toasterID is the partition key (pk) and eventTime is the clustering key (ck).

Whole partition available on each replica

Data ordered within Partition by Clustering Key

               12:00   12:01      12:02      12:03            12:04
Partition Key  Idle    Toasting   Toasting   Toast Success!   Idle

Stored as Multiple SSTables, Each Internally Ordered

Easy to Search Ranges of Clustering Key

Difficult to Search Ranges of Partition Key
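Why clustering-key ranges are cheap can be shown with a small in-memory model: rows are grouped by partition key and kept sorted by clustering key, so a time range becomes two binary searches and one slice. The class `ToyTable` and its methods are hypothetical, and plain strings stand in for the UUID and TIMEUUID columns.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Rows grouped by partition key, sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, value)
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def range_query(self, partition_key, start, end):
        """Return rows with start <= clustering key < end in one partition."""
        rows = self.partitions.get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


table = ToyTable()
for time, event in [("12:03", "Toast Success!"), ("12:00", "Idle"),
                    ("12:02", "Toasting"), ("12:01", "Toasting"),
                    ("12:04", "Idle")]:
    table.insert("toaster-1", time, event)

print(table.range_query("toaster-1", "12:01", "12:03"))
```

A range over partition keys has no such shortcut, because partition keys are hashed and neighbouring keys land on unrelated parts of the ring.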

DataStax Spark-Cassandra connector

[Diagram: a Receiver turns incoming Events into a DStream, which is split into Batches, each processed as RDDs.]

https://github.com/datastax/spark-cassandra-connector

Streaming data direct to Cassandra

It's easier than ever to connect your incoming event data with Cassandra

Start free Apache Cassandra training at DataStax Academy