Cassandra & Spark for IoT_ Matthias Niehoff

Upload: matthias-niehoff

Post on 07-Jan-2017


Page 1: Cassandra & Spark for IoT

Cassandra & Spark for IoT_

Matthias Niehoff

Page 2: Cassandra & Spark for IoT

Cassandra


Page 3: Cassandra & Spark for IoT

Cassandra for IoT_

• Distributed database
• Highly available
• Horizontally and linearly scalable
• Multi-datacenter support
• No single point of failure
• Chooses availability over strong consistency

[Figure: ring of four nodes (Node 1 to Node 4) sharing the token ranges 1-25, 26-50, 51-75 and 76-0]
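The ring on this slide can be sketched in a few lines of plain Scala. This is only an illustration under assumptions: the token space is a toy range of 0 to 99, the four ranges are assigned to Node 1 to Node 4 in the order they appear on the slide, and `TokenRing`, `tokenFor` and `nodeForToken` are hypothetical names. Cassandra's own partitioner uses a far larger token space.

```scala
object TokenRing {
  // Inclusive upper bound of each node's token range on a toy ring of 100 tokens.
  // Node 4 owns 76..99 and wraps around to 0, matching "76-0" on the slide.
  private val upperBounds: Vector[(Int, String)] = Vector(
    25 -> "Node 1",
    50 -> "Node 2",
    75 -> "Node 3"
  )

  // Hypothetical stand-in for a partitioner: maps a partition key to a token in 0..99.
  def tokenFor(partitionKey: String): Int =
    java.lang.Math.floorMod(partitionKey.hashCode, 100)

  // The owner is the first node whose upper bound covers the token;
  // token 0 and tokens above 75 belong to Node 4.
  def nodeForToken(token: Int): String =
    if (token == 0) "Node 4"
    else upperBounds.collectFirst { case (upper, node) if token <= upper => node }.getOrElse("Node 4")

  def nodeFor(partitionKey: String): String = nodeForToken(tokenFor(partitionKey))
}
```

Because the owner depends only on the key's token, any node can route a request without asking a central coordinator, which is one reason the design has no single point of failure.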

Page 4: Cassandra & Spark for IoT

Great for Time Series Data_

CREATE TABLE sensors (
  sensorId uuid,
  time timeuuid,
  metricName text,
  metricValue double,
  PRIMARY KEY (sensorId, time)
)

[Figure: one partition per sensor id; its rows t1 ... t11 are stored sequentially on disk]
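Why this layout suits time series can be shown with a small in-memory model. The sketch below is not Cassandra code: it assumes plain `Long` timestamps instead of `timeuuid`, and `SensorStore`, `insert` and `readRange` are hypothetical names. The outer map plays the role of the partition key (`sensorId`), the sorted inner map plays the role of the clustering column (`time`).

```scala
import scala.collection.immutable.TreeMap

object SensorStore {
  // One reading; `time` is a plain Long timestamp here (the table uses timeuuid).
  final case class Reading(time: Long, metricName: String, metricValue: Double)

  // Partition key -> rows kept in order of the clustering column `time`.
  type Partitions = Map[String, TreeMap[Long, Reading]]

  def insert(store: Partitions, sensorId: String, r: Reading): Partitions = {
    val partition = store.getOrElse(sensorId, TreeMap.empty[Long, Reading])
    store.updated(sensorId, partition.updated(r.time, r))
  }

  // Range read inside a single partition: `from` inclusive, `until` exclusive.
  def readRange(store: Partitions, sensorId: String, from: Long, until: Long): Vector[Reading] =
    store.get(sensorId).map(_.range(from, until).values.toVector).getOrElse(Vector.empty)
}
```

A query for one sensor and a time interval touches a single partition and returns a contiguous, already sorted slice, regardless of the order in which the readings arrived.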

Page 5: Cassandra & Spark for IoT

Spark


Page 6: Cassandra & Spark for IoT

What Is Apache Spark_

• Open source since 2010, Apache project since 2013
• Data processing framework
  • Batch processing
  • Stream processing

Page 7: Cassandra & Spark for IoT

Why Use Spark_

• Fast
  • up to 100 times faster than Hadoop MapReduce
  • heavy use of in-memory processing
  • scales linearly with additional nodes
• Easy
  • Scala, Java and Python API
  • Clean code (e.g. with lambdas in Java 8)
  • Rich API: map, reduce, filter, groupBy, sort, union, join, reduceByKey, groupByKey, sample, take, first, count
• Fault-tolerant
  • easily reproducible

Page 8: Cassandra & Spark for IoT

Easily Reproducible?_

• RDDs: Resilient Distributed Datasets
  • Read-only description of a collection of objects
  • Partitioned for distribution
  • Defined by the transformations that produce them
  • Can be rebuilt automatically on failure
• Operations
  • Transformations (map, filter, reduceByKey, ...) -> new RDD
  • Actions (count, collect, save)
• Only actions start processing!
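The idea that an RDD is a description rather than data, and that only actions trigger work, can be imitated without Spark. The following is a toy model under stated assumptions: `MiniRDD`, `fromSource` and `sourceReads` are hypothetical names, there is no partitioning or distribution, and nothing here is Spark's implementation.

```scala
object LazyLineage {
  // A toy stand-in for an RDD: it stores only the recipe (`compute`), not the data.
  final class MiniRDD[A](compute: () => Vector[A]) {
    // Transformations return a new description and do no work yet.
    def map[B](f: A => B): MiniRDD[B] = new MiniRDD(() => compute().map(f))
    def filter(p: A => Boolean): MiniRDD[A] = new MiniRDD(() => compute().filter(p))

    // Actions run the whole chain of recipes, starting at the source.
    def count(): Long = compute().size.toLong
    def collect(): Vector[A] = compute()
  }

  // Counts how often the source is actually read.
  var sourceReads: Int = 0

  def fromSource[A](data: Vector[A]): MiniRDD[A] =
    new MiniRDD(() => { sourceReads += 1; data })
}
```

Chaining `filter` and `map` on a `MiniRDD` leaves `sourceReads` at 0; each call to `count()` or `collect()` replays the recipe from the source. Rebuilding lost data after a failure follows the same principle: the recipe is simply run again.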

Page 9: Cassandra & Spark for IoT

RDD Example_

scala> val textFile = sc.textFile("README.md")
textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09

scala> linesWithSpark.count()
res0: Long = 126

Page 10: Cassandra & Spark for IoT

Spark & Cassandra


Page 11: Cassandra & Spark for IoT

Connecting Spark With Cassandra_

• Spark Cassandra Connector by DataStax
  • https://github.com/datastax/spark-cassandra-connector
• Cassandra tables as Spark RDDs (read & write)
• Mapping of C* tables and rows onto Java/Scala objects
• Server-side filtering ("where")
• Included as a Maven / SBT dependency in your application

Page 12: Cassandra & Spark for IoT

Two Datacenters - Two Purposes_

[Figure: two Cassandra datacenters. DC1 - Online: C* nodes only. DC2 - Analytics: C* nodes together with Spark worker nodes (WN) and a Spark Master]

Page 13: Cassandra & Spark for IoT

Spark Streaming


Page 14: Cassandra & Spark for IoT

Stream Processing With Spark Streaming_

• Real-time processing using micro-batches
• Supported sources: files, TCP, MQTT, Kafka, Twitter, ...
• Data as a Discretized Stream (DStream)
• Same programming model as for batches
• All operations of Spark Core, SQL and MLlib
• Stateful operations & sliding windows
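Micro-batching and sliding windows can be illustrated on an ordinary collection. The sketch below is plain Scala, not the Spark Streaming API: `MicroBatches`, `toBatches` and `windowedAverages` are hypothetical names, timestamps are assumed to be non-negative milliseconds, and the window slides by exactly one batch.

```scala
object MicroBatches {
  final case class Event(timestampMs: Long, value: Double)

  // Cut events into consecutive batches of `batchMs` milliseconds.
  // Intervals without events are kept as empty batches so that windows stay aligned in time.
  def toBatches(events: Seq[Event], batchMs: Long): Vector[Vector[Event]] =
    if (events.isEmpty) Vector.empty
    else {
      val byInterval = events.groupBy(_.timestampMs / batchMs)
      val first = byInterval.keys.min
      val last = byInterval.keys.max
      (first to last).map(i => byInterval.getOrElse(i, Seq.empty).toVector).toVector
    }

  // Sliding window: average over `windowBatches` consecutive batches, advancing one batch at a time.
  // A window without any events yields None; fewer batches than the window size yield one partial window.
  def windowedAverages(batches: Vector[Vector[Event]], windowBatches: Int): Vector[Option[Double]] =
    batches.sliding(windowBatches).map { window =>
      val values = window.flatten.map(_.value)
      if (values.isEmpty) None else Some(values.sum / values.size)
    }.toVector
}
```

With a batch interval of 500 ms, events at 0, 100, 600 and 1700 ms fall into four batches, the third of which is empty. A window of two batches then covers one second of data per result.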

Page 15: Cassandra & Spark for IoT

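Spark Streaming - MQTT Example_

import com.datastax.spark.connector._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.streaming.mqtt.MQTTUtils

// sc is an existing SparkContext; one micro-batch every 500 ms
val ssc = new StreamingContext(sc, Milliseconds(500))
val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "foo", StorageLevel.MEMORY_ONLY_SER_2)

val data = lines.map(input => input.toLowerCase)

// write each micro-batch to keyspace "mqtt", table "sensors"
// (assumes the elements have first been parsed into tuples or case classes matching the table's columns)
data.foreachRDD(_.saveToCassandra("mqtt", "sensors"))

ssc.start()

// await manual termination or error
ssc.awaitTermination()

// manual termination
ssc.stop()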

Page 16: Cassandra & Spark for IoT

Use Cases


Page 17: Cassandra & Spark for IoT

Use Cases for Spark and Cassandra in IoT_

Ingestion

• Spark Streaming
  • Continuous data streams
  • MQTT, Kafka, ZeroMQ, ...
  • Easy to make reliable
• Spark Core
  • Existing data
  • SQL databases, CSV, JSON, ...
• Use the same programming model or even the same code!

Page 18: Cassandra & Spark for IoT

Use Cases for Spark and Cassandra in IoT_

Analysis

• Real-time analysis
  • React to events
  • Join with existing data
  • Apply ML models to events
• Batch analysis
  • Scheduled jobs
  • Analytics on the data
  • Train ML models
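"React to events" combined with "join with existing data" might look like the following when reduced to plain Scala. Everything here is illustrative: `Event`, `Sensor`, `Alert` and `alerts` are hypothetical names, the reference data is an in-memory map rather than a Cassandra table, and the per-sensor limit is an assumed example of existing data.

```scala
object RealTimeAnalysis {
  final case class Event(sensorId: String, value: Double)
  final case class Sensor(location: String, maxValue: Double)
  final case class Alert(sensorId: String, location: String, value: Double)

  // Join one micro-batch of events with existing reference data and
  // emit an alert for every reading above the sensor's limit.
  // Events from unknown sensors are dropped.
  def alerts(batch: Seq[Event], sensors: Map[String, Sensor]): Seq[Alert] =
    for {
      event  <- batch
      sensor <- sensors.get(event.sensorId).toList
      if event.value > sensor.maxValue
    } yield Alert(event.sensorId, sensor.location, event.value)
}
```

Since the function only depends on a batch of events, the same logic could in principle run on each streaming micro-batch and on historical data in a scheduled job, which is the point the deck makes about sharing one programming model.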

Page 19: Cassandra & Spark for IoT

Demo


Page 20: Cassandra & Spark for IoT

Questions?

Matthias Niehoff, IT-Consultant


codecentric AG Zeppelinstraße 2 76185 Karlsruhe, Germany

mobile: +49 (0) 172.1702676 [email protected]

www.codecentric.de blog.codecentric.de

matthiasniehoff